WO2003054218A2 - Nucleotide polymorphisms associated with osteoporosis - Google Patents

Nucleotide polymorphisms associated with osteoporosis

Info

Publication number
WO2003054218A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
dna
protein
gene
polymorphism
Prior art date
Application number
PCT/US2002/040948
Other languages
French (fr)
Other versions
WO2003054218A3 (en)
Inventor
Karen Anne Jones
Ana Valdes
David J. Townley
Jonathan Mangion
Nicolas Galwey
Simon Bennett
Ian Mckay
Alan Schafer
Original Assignee
Incyte Genomics, Inc.
Priority date
Filing date
Publication date
Application filed by Incyte Genomics, Inc.
Priority to AU2002366709A (AU2002366709A1)
Priority to CA002471376A (CA2471376A1)
Priority to EP02805650A (EP1466012A2)
Publication of WO2003054218A2
Publication of WO2003054218A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the invention relates in general to polymorphisms in genes associated with susceptibility to low bone mineral density and bone remodeling and methods of identifying individuals having a gene containing a polymorphism associated with osteoporosis.
  • the invention also relates to a method of detecting an increased susceptibility to a disease in an individual resulting from the presence of a polymorphism or mutation in the gene coding sequence of an osteoporosis and bone remodeling associated gene.
  • SNPs single nucleotide polymorphisms
  • association studies such as linkage equilibrium studies
  • SNP single nucleotide polymorphism
  • Nucleotide sequence mutations which occur in a gene or gene family, where the gene or gene family is associated with a given disease, indicate susceptibility to or development of the disease.
  • Osteoporosis is a common disease characterized by low bone mineral density (BMD), deterioration of bone micro-architecture and increased risk of bone damage, such as fracture.
  • BMD bone mineral density
  • Common types of osteoporosis include postmenopausal and senile osteoporosis, which generally occur later in life, e.g., 70+ years.
  • Osteoporosis is a major health problem in virtually all societies. It is estimated that 30 million Americans and 100 million people worldwide are at risk for osteoporosis. In European populations, one in three women and one in twelve men over the age of fifty is at risk. These numbers are growing as the elderly population increases.
  • osteoporosis is a major public health problem which affects quality of life and increases costs to health care providers.
  • Peak bone mass is mainly genetically determined, though dietary factors and physical activity can have positive effects. Peak bone mass is attained at the point when skeletal growth ceases, after which time bone loss starts. In contrast to the positive balance that occurs during growth, in osteoporosis, the resorbed cavity is not completely refilled by bone. Despite recent successes with drugs that inhibit bone resorption, there is a clear need for specific anabolic agents that will considerably increase bone formation in people who have already suffered substantial bone loss. There are no such drugs currently approved.
  • HRT hormone replacement therapy
  • bisphosphonates e.g., alendronate (Fosamax)
  • estrogen and estrogen receptor modulators progestin, calcitonin, and vitamin D.
  • Osteoporosis can be considered a complex genetic trait with variants of several genes underlying the genetic determination of the variability of the phenotype.
  • Low bone mineral density (BMD) is an important risk factor for fractures, the clinically most relevant feature of osteoporosis. Segregation analysis in families has shown that BMD is under polygenic control while, in addition, biochemical markers of bone turnover have also been shown to have strong genetic components.
  • VDR vitamin D receptor
  • the present invention is applicable to any disease in which low BMD and/or bone fracture is a factor, and is therefore particularly concerned with diseases such as osteoporosis.
  • Low BMD is defined as two standard deviations below the age-matched mean of bone mineral density for a given population.
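
As a rough illustration of the definition above, the following Python sketch flags a measurement as low BMD when it lies at least two standard deviations below an age-matched mean. The function names, the example reference values, and the choice to count a value at exactly -2 SD as low are assumptions made for illustration only.

```python
def bmd_z_score(bmd, ref_mean, ref_sd):
    """Number of standard deviations by which bmd differs from the age-matched mean."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return (bmd - ref_mean) / ref_sd


def is_low_bmd(bmd, ref_mean, ref_sd, threshold=-2.0):
    """True when bmd is at or below `threshold` SDs from the mean (assumed cut-off)."""
    return bmd_z_score(bmd, ref_mean, ref_sd) <= threshold


# Hypothetical age-matched reference: mean 1.00 g/cm^2, SD 0.10 g/cm^2
print(is_low_bmd(0.75, 1.00, 0.10))  # True  (2.5 SD below the mean)
print(is_low_bmd(0.95, 1.00, 0.10))  # False (0.5 SD below the mean)
```

The reference mean and standard deviation would have to come from a population matched to the subject's age, as the definition requires.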
  • Bone damage may be defined as any form of structural damage such as fractures, breaks or chips, and degradation or deterioration of the bone other than normal wear and tear resulting from low bone mineral density or another cause. Such low BMD and/or bone damage is associated with osteoporosis.
  • the invention may be practised on any mammalian subject.
  • the mammalian subject will be a human, and most preferably an adult, preferably female.
  • the polynucleotide of this invention is preferably DNA, or may be RNA or other options.
  • fragments of the nucleic acid sequences of the first aspect are provided, which comprise one or more nucleotide substitutions, insertions or deletions.
  • the novelty of a fragment according to the present embodiment may be easily ascertained using sequence comparison methods as previously described.
  • Preferred fragments may be 10 to 40 nucleotides in length. More preferably, the fragments are between 5 to 10, 5 to 20, or 10 to 20 nucleotides in length. For example, the fragments may be 5, 8, 10, 12, 15, 18, 20, 22, 25, 28, 30, or 35 nucleotides in length.
  • the fragments may be useful in a variety of diagnostic, prognostic or therapeutic methods, or may be useful as research tools for example in drug screening.
  • non-coding, complementary sequences which hybridize to a nucleic acid sequence of the first aspect.
  • anti-sense sequences are useful as probes or primers for detecting an allele of a polymorphism of the invention, or in the regulation of the genes. They may also be used as agents for use in the identification and/or treatment of individuals having or being susceptible to low bone mineral density.
  • the anti-sense polynucleotides of this embodiment may be the full length of sequence of the first aspect, or more preferably may be 5 to 30 nucleotides in length.
  • Preferred polynucleotides are 5 to 10 or 10 to 25 nucleotides in length.
  • Primers, in particular, are typically 10 to 15 nucleotides long, and may occasionally be 16 to 25.
  • the polynucleotides of the aforementioned aspects of the invention may be in the form of a vector, to enable the in vitro or in vivo expression of the polynucleotide sequence.
  • the polynucleotides may be operably linked to one or more regulatory elements including a promoter; regions upstream or downstream of a promoter such as enhancers which regulate the activity of the promoter; an origin of replication; appropriate restriction sites to enable cloning of inserts adjacent to the polynucleotide sequence; markers, for example antibiotic resistance genes; ribosome binding sites; RNA splice sites and transcription termination regions; polymerisation sites; or any other element which may facilitate the cloning and/or expression of the polynucleotide sequence.
  • each may be controlled by its own regulatory sequences, or all sequences may be controlled by the same regulatory sequences. In the same manner, each sequence may comprise a 3' polyadenylation site.
  • the vectors may be introduced into microbial, yeast or animal DNA, either chromosomal or mitochondrial, or may exist independently as plasmids. Examples of suitable vectors will be known to persons skilled in the art and include pBluescript II, LambdaZap, and pCMV-Script (Stratagene Cloning Systems, La Jolla (USA)).
  • host cell comprising a polynucleotide according to any of the aforementioned aspects, for expression of the polynucleotide.
  • the host cell may comprise an expression vector, or naked DNA encoding said polynucleotides.
  • suitable host cells are available, both eukaryotic and prokaryotic.
  • transgenic non-human animal comprising a polynucleotide according to an aforementioned aspect of the invention.
  • the transgenic, non-human animal comprises a polynucleotide according to the second or third aspects.
  • Transgenic non-human animals are useful for the analysis of the single nucleotide polymorphisms and their phenotypic effect.
  • a method of screening for agents for use in the prognosis, diagnosis or treatment of individuals having, or being susceptible to, low bone mineral density comprising contacting a putative agent with a polynucleotide or protein according to an aforementioned aspect of the present invention, and monitoring the reaction therebetween.
  • the method further comprises contacting a putative agent with a reference polynucleotide or protein, and comparing the reaction between (i) the agent and the polynucleotide or protein encoding the reference allele; and (ii) the agent and polynucleotide or protein of the invention.
  • Potential agents are those which react differently with a variant of the invention and a reference allele.
  • the present method may be carried out by contacting a putative agent with a host cell or transgenic non-human animal comprising a polynucleotide or protein according to the invention.
  • putative agents will include those known to persons skilled in the art, and include chemical or biological compounds, such as anti-sense polynucleotide sequences, complementary to the coding sequences of the first aspect, or polyclonal or monoclonal antibodies which bind to a product such as a protein or protein fragment of the second aspect. They may also be useful in determining susceptibility to low bone mineral density, or in the diagnosis, prognosis or treatment of related conditions.
  • a method of diagnosing, or determining susceptibility of a subject to low bone mineral density and/or bone damage comprising analysing the genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present.
  • the method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype.
  • the method comprises determining which allele of one or more of the polymorphisms of the invention is/are present.
  • the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype.
  • the method may comprise determining which allele is present in the protein.
  • the method comprises determining whether the allele of the polymorphism of the fourth aspect is present. Any method for determining the presence of an allele may be used.
  • One such method involves the use of antibodies in diagnosing or determining susceptibility to low bone mineral density.
  • the method may comprise removing a sample from a subject, contacting the sample with an antibody to an antigen of the protein, and detecting binding of the antibody to the antigen, wherein binding is indicative of the presence of a particular allele or form of the protein and thus risk of low BMD. Tissue samples as described above are suitable for this method.
  • a method of predicting the response of a subject to treatment comprising analysing genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method is carried out according to the ninth aspect.
  • This aspect of the invention is based upon the observation that the effectiveness of treatment depends upon the underlying cause of disease. Therefore, depending upon the presence of particular allele(s), and their effect, certain treatments may be effective, whereas others may not. This will be the case where different alleles or haplotypes result in low bone mineral density, but mediate their effect via different biological mechanisms.
  • the method preferably also comprises comparing the alleles present in a subject with those of the genes which require particular treatments.
  • the present invention provides a kit to determine which allele(s) of the gene is/are present.
  • the kit will be suitable for determining which alleles of the polymorphisms of the first aspect are present.
  • the kit may contain polynucleotides, most preferably anti-sense sequences such as those of the third aspect, for use as probes or primers; antibodies which bind to alleles of the protein, such as those of the fifth aspect; or restriction enzymes for use in detecting the presence of a polynucleotide, protein, or fragment thereof.
  • the kit will also comprise means for detection of a reaction, such as nucleotide label detection means, labelled secondary antibodies or size detection means.
  • the polynucleotides, or antibodies may be fixed to a substrate, for example an array.
  • the kit further comprises means for indicating correlation between the genotype of a subject and risk of low BMD. Such means may be in the form of a chart or visual aid, which indicate that presence of one or more alleles of the gene, including alleles of the polymorphisms of the invention, is/are associated with low BMD.
  • the invention provides novel polynucleotides and polymorphic polynucleotides associated with a given human disease, for example, with osteoporosis.
  • the invention also provides a gene sequence containing one or more polymorphic nucleotides associated with a predisposition to or the development of a given human disease such as osteoporosis.
  • the invention also relates to polypeptides encoded by the novel polynucleotides or the polymorphism-containing gene.
  • the invention also provides methods of detecting a polymorphism according to the invention in individuals at risk for osteoporosis, and for determining if a given polymorphism is associated with a predisposition to the disease.
  • the invention also discloses polymorphism(s) that are either associated with or are not associated with (i.e., are neutral) osteoporosis.
  • a polymorphism in a given gene can be utilized in various diagnostic and therapeutic methods and procedures, for example, in nucleic acid and peptide diagnosis, drug screening and design, and in gene and peptide therapy.
  • a polymorphism associated with a given gene can be utilized in various gene expression systems and assays designed to analyze gene regulation and expression.
  • polymorphism refers to a nucleotide alteration that either predisposes an individual to a disease or is not associated with a disease, which occurs as a result of a substitution, insertion or deletion. More particularly, a "polymorphism" or "polymorphic variation" may be a nucleic acid sequence variation, as compared to the naturally occurring sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution, which is present at a frequency of greater than 1% in a population.
  • neutral polymorphism refers to a polymorphism which is present at a frequency of greater than 1% in a population, which does not alter gene function or phenotype, and thus is not associated with a predisposition to or development of a disease.
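
The 1% frequency criterion in the two definitions above can be sketched as follows. This is a minimal Python illustration that assumes a diploid, biallelic site and genotype counts from a population sample; the function names and the counts are hypothetical.

```python
def variant_allele_frequency(n_ref_hom, n_het, n_var_hom):
    """Frequency of the variant allele from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_var_hom)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each heterozygote carries one variant allele, each variant homozygote two.
    variant_alleles = n_het + 2 * n_var_hom
    return variant_alleles / total_alleles


def is_polymorphism(frequency, cutoff=0.01):
    """True when the variant occurs at a frequency greater than 1%."""
    return frequency > cutoff


# Hypothetical sample of 1000 individuals
print(is_polymorphism(variant_allele_frequency(950, 48, 2)))  # True  (52 of 2000 alleles)
print(is_polymorphism(variant_allele_frequency(995, 5, 0)))   # False (5 of 2000 alleles)
```

Whether a polymorphism that passes this cut-off is neutral is a separate question that the frequency alone does not answer.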
  • polynucleotide sequence refers to a sense or antisense nucleic acid sequence comprising RNA, cDNA, genomic DNA, synthetic forms and mixed polymers, that may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases.
  • mutant refers to a variation in the nucleotide sequence of a gene or regulatory sequence as compared to the naturally occurring or normal nucleotide sequence.
  • a mutation may result from the deletion, insertion or substitution of more than one nucleotide (e.g.,
  • nucleotide change such as a deletion, insertion or substitution.
  • mutation also encompasses chromosomal rearrangements.
  • nucleic acid probe refers to an oligonucleotide, nucleotide or polynucleotide, and fragments and portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, which represents the sense or antisense strand.
  • DNA fragment refers to a length of polynucleotide, for example, as small as 5 nucleotides, 10, 20, 25, 40, 50, 75, 100, 250, 400, 500 and 1 kb, and as large as 5-10kb.
  • alteration refers to a change in either a nucleotide or amino acid sequence, as compared to the naturally occurring sequence, resulting from a deletion, an insertion or addition, or a substitution.
  • deletion refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, are absent.
  • insertion or “addition” refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, have been added.
  • substitution refers to a replacement of one or more nucleotides or amino acids by different nucleotides or amino acid residues, respectively.
  • specifically hybridizable refers to a nucleic acid or fragment thereof that hybridizes to another nucleic acid (or a complementary strand thereof) due to the presence of a region that is at least approximately 90% homologous, preferably at least approximately 90-95% homologous, and more preferably approximately 98-100% homologous, as are polynucleotides that hybridize to a partner under stringent hybridization conditions.
  • Stringent hybridization conditions are defined hereinbelow for various hybridization protocols.
  • a probe that is specifically hybridizable to a given sequence can be used to detect a 1 bp out of 10 bp (10%) or a 1 bp out of 20 bp (5%) difference between nucleic acid sequences and is therefore useful for discriminating between a wild type and a mutant form of a gene of interest.
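
The percentages quoted above follow from simple arithmetic, sketched here in Python. The sketch treats "homologous" as ungapped percent identity between two equal-length sequences, which is an assumption; the function name and sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# One mismatch in 10 bp is a 10% difference, i.e. 90% identity
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
# One mismatch in 20 bp is a 5% difference, i.e. 95% identity
print(percent_identity("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))  # 95.0
```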
  • amino acid sequence refers to the sequential array of amino acids that have been joined by peptide bonds between the carboxylic acid group of one amino acid and the amino group of the adjacent amino acid to form long linear polymers comprising proteins.
  • amino acid refers to protein subunit molecules that contain a carboxylic acid group, and an amino group, both linked to a single carbon atom.
  • a polypeptide is said to be "encoded" by a polynucleotide if the polynucleotide, either in its native state or in a recombinant form can be transcribed and/or translated to produce the mRNA for and/or the polypeptide or a fragment thereof.
  • gene refers to a region of DNA which includes a portion which can be transcribed into RNA, and which may contain an open reading frame or coding region (also referred to as an exon) which encodes a protein, a non-coding region (also referred to as an intron), and a specific regulatory region comprising the DNA regulatory elements which control expression of the transcribed region.
  • coding region refers to a region of DNA which encodes a protein, also known as an exon.
  • non-coding region refers to a region of DNA which does not encode a protein coding region, also known as an intron, and is not included in the RNA molecule that is synthesized from a particular gene.
  • regulatory region refers to DNA sequences which are located either 5' of the transcription start site, 3' of the transcription termination site, or within an intron or exon, and which are capable of ensuring that the gene is transcribed at the proper time and in the appropriate cell type.
  • consensus DNA sequence or wild-type DNA sequence refers to a sequence wherein every position represents the nucleotide that occurs with the highest frequency when many actual sequences are compared.
  • consensus DNA sequence or wild-type DNA sequence also refers to the normal, naturally occurring DNA sequence.
  • a given sequence (or mutation or polymorphism) "associated with" osteoporosis refers to a nucleic acid sequence that increases susceptibility to the disease, predisposes an individual to the disease or contributes to the disease, wherein the nucleic acid sequence is present at a higher frequency (at least 5%, preferably 10%, more preferably 25% higher) in individuals with the disease as compared to individuals who do not have the disease.
  • a sequence "not associated with" osteoporosis refers to a nucleic acid sequence that does not increase susceptibility to the disease, predispose an individual to the disease or contribute to the disease, wherein the nucleic acid sequence is not present at a higher frequency in individuals with the disease, and thus is present at a frequency about equal to its frequency in individuals who do not have the disease.
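
The frequency comparison in the two definitions above might be sketched as below. The text does not say whether "5% higher" is meant relative to the control frequency or in absolute percentage points; this Python sketch assumes the relative reading, and the function names and frequencies are hypothetical. It is an arithmetic illustration, not a statistical test of association.

```python
def relative_excess(freq_cases, freq_controls):
    """Fractional increase of the case frequency over the control frequency."""
    if freq_controls <= 0:
        raise ValueError("freq_controls must be positive")
    return (freq_cases - freq_controls) / freq_controls


def is_associated(freq_cases, freq_controls, min_excess=0.05):
    """True when the sequence is at least `min_excess` more frequent in cases (assumed relative)."""
    return relative_excess(freq_cases, freq_controls) >= min_excess


print(is_associated(0.30, 0.20))  # True  (about 50% higher in individuals with the disease)
print(is_associated(0.20, 0.20))  # False (equal frequency in both groups)
```

Setting `min_excess` to 0.10 or 0.25 corresponds to the preferred and more preferred thresholds under the same assumption.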
  • amplifying refers to producing additional copies of a nucleic acid sequence, preferably by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol. 155: 335).
  • oligonucleotide primers refer to single stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand. Oligonucleotide primers useful according to the invention are between 5 to 100 nucleotides in length, preferably 20-60 nucleotides in length, and more preferably 20-40 nucleotides in length.
  • sequencing refers to determining the precise nucleotide composition or sequence of a nucleic acid region by methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra).
  • comparing refers to determining if the nucleotides at one or more positions in a particular region of a nucleic acid fragment are identical for any two or more sequences. According to the invention, sequence comparisons can be performed by using computer program analysis as described below in Section F entitled “Identification and Characterization of Polymorphisms”.
  • sequence differences or “sequence variations” refer to nucleotide changes, at one or more positions between any two or more sequences being compared.
  • determining the presence of polymorphic variations refers to using methods well known in the art to identify a nucleotide, at one or more positions within a particular nucleic acid region, that is distinct from the nucleotide present in the naturally occurring, wild-type or consensus sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution.
  • determining the absence of polymorphic variations refers to using methods well known in the art to determine that the nucleotides present at every position analyzed in a particular nucleic acid region are identical to the nucleotides present in the naturally occurring, wild type or consensus sequence.
  • genotyping refers to determining the composition of the genetic material that is inherited by an organism from its parents.
  • biological sample refers to a tissue or fluid sample containing a polynucleotide or polypeptide of interest, and isolated from an individual including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • amplimers refer to a specific fragment of DNA generated by PCR that is at least 30 bp in length and is preferably between 50 and 100 bp in length, and is more preferably between 150-300bp in length, with a melting temperature in the range of approximately 60-62°C.
  • phenotype refers to the biological appearances of an organism or a tissue derived from an organism, wherein biological appearances include chemical, structural and behavioral attributes, and excludes genetic constitution.
  • genotype refers to the genetic material that is inherited by an organism from its parents.
  • genetic susceptibility to osteoporosis refers to an increased risk of developing osteoporosis resulting from specific DNA differences relative to non-susceptible individuals.
  • an individual who is genetically susceptible to osteoporosis has a 5-100%, and more preferably a 25-50% greater chance of developing osteoporosis, as compared to non-susceptible individuals.
  • diagnostic refers to the practice of identifying a disease from the signs and symptoms of an individual including the DNA sequences of genes that are associated with an increased susceptibility to the disease.
  • Diagnostic also refers to the practice of stratifying patient populations based on the efficacy or toxicity of a composition, and the predictive placement of an individual in a response stratum based on strata-associated parameters.
  • prognosis refers to the possibility of recovering from a particular disease or condition, and also refers to risk assessment of developing a particular disease or condition.
  • oligonucleotide primers are disclosed that are useful for determining the sequence of a particular allele of a gene.
  • the invention also discloses oligonucleotide primers designed to amplify a region of a gene that is known to contain a polymorphism.
  • the invention also discloses oligonucleotide primers designed to anneal specifically to a particular allele of a gene.
  • Oligonucleotide primers useful according to the invention are single-stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand.
  • oligonucleotide primers are prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a fragment thereof is naturally-occurring, and is isolated from its natural source or purchased from a commercial supplier. Oligonucleotide primers are 5 to 100 nucleotides in length, ideally from 20 to 40 nucleotides, although oligonucleotides of different length are of use.
  • Pairs of single-stranded DNA primers can be annealed to sequences within or surrounding a gene on chromosome Y in order to prime amplifying DNA synthesis of a region of a gene.
  • a complete set of gene primers will allow synthesis of all of the nucleotides of the coding sequences, e.g., the exons, introns and control regions.
  • the set of primers will also allow synthesis of both intron and exon sequences.
  • Allele-specific primers are also useful, according to the invention. Such primers will anneal only to a particular mutant allele (e.g. alleles containing a polymorphism), and thus will only amplify a product if the template also contains the polymorphism. Allele-specific primers that anneal only to a wild type gene sequence are also useful according to the invention.
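
The behaviour of an allele-specific primer can be pictured with a deliberately simplified Python model. It assumes that the primer is written in the same orientation as the displayed template strand, that its 3' end sits on the polymorphic position, and that annealing requires an exact match; real priming tolerates some mismatch, and all sequences and names here are hypothetical.

```python
def primer_anneals(primer, template, three_prime_pos):
    """True if primer matches template exactly with its 3' end at index three_prime_pos (0-based)."""
    start = three_prime_pos - len(primer) + 1
    if start < 0 or three_prime_pos >= len(template):
        return False
    return template[start:three_prime_pos + 1].upper() == primer.upper()


wild_type = "ACGTTGCAAGCT"
mutant    = "ACGTTGCATGCT"   # hypothetical A-to-T substitution at index 8
mutant_primer = "TTGCAT"     # 3' base corresponds to the variant nucleotide

print(primer_anneals(mutant_primer, mutant, 8))     # True: a product would be amplified
print(primer_anneals(mutant_primer, wild_type, 8))  # False: no product from the wild type
```

A primer ending in the wild-type base ("TTGCAA" in this example) would give the opposite pattern.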
  • selective hybridization occurs when two nucleic acid sequences are substantially complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa, M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that a certain degree of mismatch at the priming site is tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which are defined as regions in which there exists a mismatch in an uninterrupted series of four or more nucleotides.
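
The complementarity thresholds above can be illustrated with a short Python sketch that counts Watson-Crick pairs between a primer and an equal-length template site, read antiparallel. The function names, the ungapped comparison and the example sequences are assumptions for illustration; loops and other mismatch structures are not modelled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementary(primer, site):
    """Percent of primer bases that pair with the template site (both written 5'->3')."""
    if len(primer) != len(site) or not primer:
        raise ValueError("primer and site must be non-empty and of equal length")
    site_antiparallel = site.upper()[::-1]  # align the site 3'->5' under the primer
    pairs = sum(COMPLEMENT.get(p) == s for p, s in zip(primer.upper(), site_antiparallel))
    return 100.0 * pairs / len(primer)


def is_selective(primer, site, min_percent=65.0):
    """True when complementarity meets the stated minimum of about 65%."""
    return percent_complementary(primer, site) >= min_percent


primer = "AGGTCCATGACTGA"                                    # 14 nucleotides
print(percent_complementary(primer, "TCAGTCATGGACCT"))       # 100.0 (perfect complement)
print(round(percent_complementary(primer, "TCAGTCATGGACCA"), 1))  # 92.9 (one mismatch in 14)
```

Passing `min_percent=75.0` or `min_percent=90.0` applies the preferred thresholds.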
  • Numerous factors influence the efficiency and selectivity of hybridization of the primer to a second nucleic acid molecule. These factors, which include primer length, nucleotide sequence and/or composition, hybridization temperature, buffer composition and potential for steric hindrance in the region to which the primer is required to hybridize, will be considered when designing oligonucleotide primers according to the invention.
  • longer sequences have a higher melting temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target sequence, thereby minimizing promiscuous hybridization.
  • Tm melting temperature
  • Primer sequences with a high G-C content or that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution.
  • Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration of organic solvents, e.g. formamide, that might be included in a priming reaction or hybridization mixture, while increases in salt concentration facilitate binding.
  • concentration of organic solvents e.g. formamide
  • longer synthesis primers hybridize more efficiently than do shorter ones, which are sufficient under more permissive conditions.
  • Stringent hybridization conditions typically include salt concentrations of less than about 1M, more usually less than about 500 mM and preferably less than about 200 mM.
  • Hybridization temperatures range from as low as 0°C to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C. Longer fragments may require higher hybridization temperatures for specific hybridization. As several factors affect the stringency of hybridization, the combination of parameters is more important than the absolute measure of a single factor.
  • Oligonucleotide primers can be designed with these considerations in mind and synthesized according to the following methods.
  • the design of a particular oligonucleotide primer for the purpose of sequencing or PCR involves selecting a sequence that is capable of recognizing the target sequence, but has a minimal predicted secondary structure.
  • the oligonucleotide sequence binds only to a single site in the target nucleic acid.
  • the Tm of the oligonucleotide is optimized by analysis of the length and GC content of the oligonucleotide.
  • the selected primer sequence does not demonstrate significant matches to sequences in the GenBank database (or other available databases).
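
The dependence of Tm on length and GC content noted in the design criteria above can be approximated in a few lines of Python. The text does not give a formula; this sketch substitutes the widely used Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides. Function names and the example sequence are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees C using the Wallace rule (assumed here)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primer = "AGGTCCATGACTGA"   # 7 A/T and 7 G/C bases
print(gc_content(primer))   # 0.5
print(wallace_tm(primer))   # 42
```

Both lengthening the primer and raising its GC content increase the estimate, consistent with the qualitative statements in the text. Dedicated primer design programs use more detailed models.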
  • The design of a primer is facilitated by the use of readily available computer programs, developed to assist in the evaluation of the several parameters described above and the optimization of primer sequences. Examples of such programs are "PrimerSelect" of the DNAStar™ software package (DNAStar, Inc.; Madison, WI), OLIGO 4.0 (National Biosciences, Inc.), PRIMER, Oligonucleotide Selection Program, PGEN and Amplify (described in Ausubel et al., 1995, Short Protocols in Molecular Biology, 3rd Edition, John Wiley & Sons). Primers are designed with sequences that serve as targets for other primers to produce a PCR product that has known sequences on the ends which serve as targets for further amplification (e.g. to sequence the PCR product).
  • primers are designed with restriction enzyme site sequences appended to their 5' ends.
  • all nucleotides of the primers are derived from gene sequences or sequences adjacent to a gene, except for the few nucleotides necessary to form a restriction enzyme site.
  • restriction enzyme site is well known in the art. If the genomic sequence of a gene and the sequence of the open reading frame of a gene are known, design of particular primers is well within the skill of the art.
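As a minimal sketch of appending a restriction enzyme site to the 5' end of a gene-derived primer, the following Python fragment may be considered. The recognition sequences shown for EcoRI, BamHI and HindIII are the standard ones; the helper name `add_site` and the optional `clamp` argument (extra bases 5' of the site) are hypothetical and are not part of the description above.

```python
# Standard recognition sequences for three commonly used enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def add_site(gene_primer: str, enzyme: str, clamp: str = "") -> str:
    """Return clamp + restriction site + gene-derived primer, written 5' to 3'.

    `clamp` is an optional, illustrative 5' extension; by default only the
    nucleotides needed to form the site are added to the gene sequence.
    """
    return clamp.upper() + RECOGNITION_SITES[enzyme] + gene_primer.upper()
```

For example, `add_site("atgaccatg", "EcoRI")` yields a primer whose first six bases form the EcoRI site and whose remaining bases are gene-derived.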
  • oligonucleotides are prepared by a suitable method, e.g. the phosphoramidite method described by Beaucage and Caruthers (1981, Tetrahedron Lett., 22:1859) or the triester method according to Matteucci et al. (1981, J. Am. Chem. Soc., 103:3185), both incorporated herein by reference, or by other chemical methods using either a commercially available automated oligonucleotide synthesizer or VLSIPS™ technology.
  • the invention discloses polynucleotide sequences comprising polymorphisms.
  • the polynucleotide sequences of the invention are specifically hybridizable to a mutant form of a gene and are therefore useful for discriminating between a wild-type form of a gene and a mutant form of a gene.
  • the polynucleotide sequences of the invention may also be useful for expression of the encoded protein or a fragment thereof.
  • the invention also features antisense polynucleotide sequences complementary to polynucleotide sequences comprising polymorphisms. Antisense polynucleotide sequences are useful according to the invention for inhibiting expression of an allelic form of a gene.
  • the present invention utilizes polynucleotide sequences and fragments comprising RNA, cDNA, genomic DNA, synthetic forms, and mixed polymers.
  • the invention includes both sense and antisense strands of the polynucleotide sequences.
  • the polynucleotide sequences may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g. methyl phosphonates, phosphorodithioates).
  • pendent moieties e.g., polypeptides
  • intercalators e.g. acridine, psoralen, etc.
  • alkylators
  • modified linkages e.g. alpha anomeric nucleic acids, etc.
  • synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule.
  • the polynucleotide may be a naturally occurring polynucleotide, or may be a structurally related variant of such a polynucleotide having modified bases and/or sugars and/or linkages.
  • the term "polynucleotide" as used herein is intended to cover all such variants.
  • X or W NH) (Mag and Engels. 1988, Nucleic Acids Res., 16:3525)
  • purine derivatives lacking specific nitrogen atoms (e.g. 7-deaza adenine, hypoxanthine) or functionalized in the 8-position (e.g. 8-azido adenine, 8-bromo adenine)
  • Polynucleotides covalently linked to reactive functional groups e.g.: i) psoralens (Miller et al., 1988, Nucleic Acids Res. Special Pub. No.
  • modified polynucleotides while sharing features with polynucleotides designed as "anti-sense” inhibitors, are distinct in that the compounds correspond to sense-strand sequences and the mechanism of action depends on protein-nucleic acid interactions and does not depend upon interactions with nucleic acid sequences.
  • Polynucleotide sequences comprising DNA can be isolated from cDNA or genomic libraries (including YAC and BAC libraries) by cloning methods well known to those skilled in the art (Ausubel et al., supra). Briefly, isolation of a DNA clone comprising a particular polynucleotide sequence involves screening a recombinant DNA or cDNA library and identifying the clone containing the desired sequence. Cloning will involve the following steps. The clones of a particular library are spread onto plates, transferred to an appropriate substrate for screening, denatured, and probed for the presence of a particular sequence. A description of hybridization conditions and of methods for producing labeled probes is included below.
  • the desired clone is preferably identified by hybridization to a nucleic acid probe or by expression of a protein that can be detected by an antibody.
  • the desired clone is identified by polymerase chain amplification of a sequence defined by a particular set of primers according to the methods described below.
  • Polynucleotide sequences of the invention are amplified from genomic DNA.
  • Genomic DNA is isolated from tissues or cells according to the following method.
  • the tissue is isolated free from surrounding normal tissues.
  • genomic DNA from mammalian tissue
  • the tissue is minced and frozen in liquid nitrogen.
  • Frozen tissue is ground into a fine powder with a prechilled mortar and pestle, and suspended in digestion buffer (100 mM NaCl, 10 mM TrisCl, pH 8.0, 25 mM EDTA, pH 8.0, 0.5% (w/v) SDS, 0.1 mg/ml proteinase K) at 1.2 ml digestion buffer per 100 mg of tissue.
  • cells are pelleted by centrifugation for 5 min at 500 x g, resuspended in 1-10 ml ice-cold PBS, repelleted for 5 min at 500 x g and resuspended in 1 volume of digestion buffer.
  • Samples in digestion buffer are incubated (with shaking) for 12-18 hours at 50°C, and then extracted with an equal volume of phenol/chloroform/isoamyl alcohol. If the phases are not resolved following a centrifugation step (10 min at 1700 x g), another volume of digestion buffer (without proteinase K) is added and the centrifugation step is repeated. If a thick white material is evident at the interface of the two phases, the organic extraction step is repeated. Following extraction, the upper, aqueous layer is transferred to a new tube, to which 1/2 volume of 7.5 M ammonium acetate and 2 volumes of 100% ethanol are added.
  • the nucleic acid is pelleted by centrifugation for 2 min at 1700 x g, washed with 70% ethanol, air dried and resuspended in TE buffer (10 mM TrisCl, pH 8.0, 1 mM EDTA, pH 8.0) at 1 mg/ml. Residual RNA is removed by incubating the sample for 1 hour at 37°C in the presence of 0.1% SDS and 1 mg/ml DNase-free RNase, and repeating the extraction and ethanol precipitation steps.
  • the yield of genomic DNA according to this method is expected to be approximately 2 mg DNA/1 g cells or tissue (Ausubel et al., supra).
  • Genomic DNA isolated according to this method can be used for Southern blot analysis, restriction enzyme digestion, dot blot analysis or PCR analysis, according to the invention.
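The quantities stated in the genomic DNA isolation method above (1.2 ml digestion buffer per 100 mg tissue, an expected yield of approximately 2 mg DNA per gram, and resuspension in TE at 1 mg/ml) lend themselves to a simple bench calculation. The following sketch assumes those figures as given; the function names are illustrative only.

```python
def digestion_buffer_ml(tissue_mg: float) -> float:
    """Digestion buffer volume at 1.2 ml per 100 mg of tissue."""
    return 1.2 * tissue_mg / 100.0


def expected_yield_mg(tissue_g: float, mg_per_g: float = 2.0) -> float:
    """Approximate genomic DNA yield, using the stated ~2 mg DNA per g tissue."""
    return tissue_g * mg_per_g


def te_volume_ml(dna_mg: float, target_mg_per_ml: float = 1.0) -> float:
    """TE buffer volume needed to resuspend the pellet at the target concentration."""
    return dna_mg / target_mg_per_ml
```

For a 250 mg tissue sample this gives roughly 3 ml of digestion buffer, an expected yield on the order of 0.5 mg DNA, and about 0.5 ml TE for resuspension; actual yields will vary with tissue type.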
  • Restriction digest (of cDNA or genomic DNA) Following the identification of a desired cDNA or genomic clone containing a particular sequence, polynucleotides of the invention are isolated from these clones by digestion with restriction enzymes.
  • restriction enzyme digestion is well known to those skilled in the art (Ausubel et al., supra). Reagents useful for restriction enzyme digestion are readily available from commercial vendors including New England Biolabs, Boehringer Mannheim, Promega, as well as other sources.
  • d. PCR
  • Polynucleotide sequences of the invention are amplified from genomic DNA or other natural sources by the polymerase chain reaction (PCR). PCR methods are well-known to those skilled in the art.
  • PCR provides a method for rapidly amplifying a particular DNA sequence by using multiple cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify the target sequence of interest.
  • PCR requires the presence of a nucleic acid to be amplified, two single stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase, deoxyribonucleoside triphosphates, a buffer and salts.
  • PCR is well known in the art. PCR is performed as described in Mullis and Faloona, 1987, Methods Enzymol., 155:335, herein incorporated by reference.
  • PCR is performed using template DNA (at least 1 pg; more usefully, 1 - 1000 ng) and at least 25 pmol of oligonucleotide primers.
  • a typical reaction mixture includes: 2 µl of DNA, 25 pmol of oligonucleotide primer, 2.5 µl of 10X PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4 µl of 1.25 mM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin-Elmer, Foster City, CA) and deionized water to a total volume of 25 µl.
  • Mineral oil is overlaid and the PCR is performed using a programmable thermal cycler.
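A minimal sketch of scaling the typical reaction mixture to a master mix for several samples follows. It assumes that the stated volumes are in microliters (as is usual for a 25-unit PCR reaction volume), that the template is added to each tube separately, and that each primer is delivered in a fixed volume set by the user's stock concentration; the `primer_ul_each` and `overage` parameters are hypothetical conveniences, not values from the description.

```python
# Per-reaction component volumes (assumed to be microliters).
PER_REACTION_UL = {
    "10X PCR buffer": 2.5,
    "1.25 mM dNTP": 0.4,
    "Taq DNA polymerase": 0.15,
}
TEMPLATE_UL = 2.0   # added to each tube individually, not to the master mix
TOTAL_UL = 25.0


def master_mix(n_reactions: int, primer_ul_each: float = 1.0,
               n_primers: int = 2, overage: float = 0.1) -> dict:
    """Return master-mix volumes (microliters) for n reactions plus overage."""
    per_reaction = dict(PER_REACTION_UL)
    per_reaction["primers"] = primer_ul_each * n_primers
    water = TOTAL_UL - TEMPLATE_UL - sum(per_reaction.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    per_reaction["deionized water"] = water
    scale = n_reactions * (1.0 + overage)
    return {name: round(volume * scale, 2) for name, volume in per_reaction.items()}
```

The overage allowance compensates for pipetting losses when the mix is dispensed into individual tubes or wells.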
  • the length and temperature of each step of a PCR cycle, as well as the number of cycles, are adjusted according to the stringency requirements in effect.
  • Annealing temperature and timing are determined both by the efficiency with which a primer is expected to anneal to a template and the degree of mismatch that is to be tolerated.
  • the ability to optimize the stringency of primer annealing conditions is well within the knowledge of one of moderate skill in the art.
  • An annealing temperature of between 30°C and 72°C is used.
  • Initial denaturation of the template molecules normally occurs at between 92°C and 99°C for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99°C for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and extension (72°C for 1 minute).
  • the final extension step is generally carried out for 4 minutes at 72°C, and may be followed by an indefinite (0-24 hour) step at 4°C.
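The cycling parameters described above can be expressed as a simple program table. The sketch below is one possible encoding under stated assumptions: the default temperatures and times are picked from within the ranges given above, and the `Step` class and function names are illustrative rather than the interface of any particular thermal cycler.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(anneal_c: float, cycles: int = 30,
                  denature_s: int = 30, anneal_s: int = 60) -> list:
    """Build a PCR program using values within the ranges described above."""
    if not 30 <= anneal_c <= 72:
        raise ValueError("annealing temperature should lie between 30 and 72 deg C")
    if not 20 <= cycles <= 40:
        raise ValueError("cycle number should lie between 20 and 40")
    program = [Step("initial denaturation", 94, 240)]
    for _ in range(cycles):
        program.append(Step("denaturation", 94, denature_s))
        program.append(Step("annealing", anneal_c, anneal_s))
        program.append(Step("extension", 72, 60))
    program.append(Step("final extension", 72, 240))
    return program


def total_minutes(program: list) -> float:
    """Total programmed time, excluding ramping and any 4 deg C hold."""
    return sum(step.seconds for step in program) / 60.0
```

With a 55°C annealing step and 30 cycles, the programmed time under these defaults is 83 minutes, not counting ramp times between temperatures.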
  • When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the probe bound to the template by virtue of its 5'-to-3' nucleolytic activity. In the absence of the quenchers, the reporters now fluoresce. The color change in the reporters is proportional to the amount of each specific product and is measured by a fluorometer; therefore, the amount of each color can be measured and the PCR product can be quantified.
  • the PCR reactions can be performed in 96 well plates so that samples derived from many individuals can be processed and measured simultaneously.
  • the Taqman™ system has the additional advantage of not requiring gel electrophoresis and allows for quantification when used with a standard curve.
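Quantification against a standard curve can be illustrated as follows. The sketch assumes a log-linear relationship between the measured signal and the starting quantity of the standards (in real-time instruments the signal is commonly a threshold cycle, which is an assumption here and is not stated above), and fits it by ordinary least squares. All names are illustrative.

```python
import math


def fit_standard_curve(quantities, signals):
    """Least-squares fit of signal = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(signal, slope, intercept):
    """Invert the fitted curve to estimate the quantity of an unknown sample."""
    return 10 ** ((signal - intercept) / slope)
```

Standards of known quantity are run on the same 96 well plate as the unknowns, the curve is fitted to the standards, and each unknown is read off the fitted line with `quantify`.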
  • the present invention also provides a polynucleotide sequence comprising RNA.
  • a polynucleotide comprising RNA is useful for detecting SNPs and polymorphisms by techniques including but not limited to hybridization methods or the RNase protection method.
  • a polynucleotide comprising RNA is also useful as a template for the in vitro production of protein.
  • a polynucleotide comprising RNA is also useful for detecting and localizing specific mRNA sequences by in situ hybridization.
  • Polynucleotide sequences comprising RNA can be produced according to the method of in vitro transcription.
  • the technique of in vitro transcription is well known to those of skill in the art. Briefly, the gene of interest is inserted into a vector containing an SP6, T3 or T7 promoter.
  • the vector is linearized with an appropriate restriction enzyme that digests the vector at a single site located downstream of the coding sequence. Following a phenol/chloroform extraction, the DNA is ethanol precipitated, washed in 70% ethanol, dried and resuspended in sterile water.
  • the in vitro transcription reaction is performed by incubating the linearized DNA with transcription buffer (200 mM TrisCl, pH 8.0, 40 mM MgCl2, 10 mM spermidine, 250 mM NaCl [T7 or T3] or 200 mM TrisCl, pH 7.5, 30 mM MgCl2, 10 mM spermidine [SP6]), dithiothreitol, RNase inhibitors, each of the four ribonucleoside triphosphates, and either SP6, T7 or T3 RNA polymerase for 30 min at 37°C.
  • dithiothreitol 200 mM
  • unlabeled UTP will be omitted and 35S-UTP will be included in the reaction mixture.
  • the DNA template is then removed by incubation with DNase I. Following ethanol precipitation, an aliquot of the radiolabeled RNA is counted in a scintillation counter to determine the cpm/ml (Ausubel et al., supra).
  • polynucleotide sequences comprising RNA are prepared by chemical synthesis techniques such as solid phase phosphoramidite (described above).
  • 3. Polynucleotide Sequences Comprising Oligonucleotides
  • a polynucleotide sequence comprising oligonucleotides can be made by using oligonucleotide synthesizing machines which are commercially available (described above).
  • Polynucleotide sequences of the invention can be used to express the protein product (or fragment thereof) of the gene of interest by inserting the polynucleotide sequence into an expression vector.
  • Expression vectors suitable for protein expression in mammalian cells, bacterial cells, insect cells or plant cells are well known in the art and are described in Section H entitled "Production of a Mutant Protein".
  • Polynucleotide sequences of the invention can be used to prepare hybrid polynucleotides comprising a sequence of a gene adjacent to a sequence encoding a foreign protein or a fragment thereof (e.g. lacZ, trpE, glutathione S-transferase or thioredoxin) or a protein tag (hemagglutinin or FLAG).
  • hybrid polynucleotides produce fusion proteins that are useful, according to the invention, for improved expression and/or rapid isolation of a protein or protein fragment, encoded by the sequence of a gene.
  • Hybrid polynucleotides are also useful as a source of antigen for the production of antibodies.
  • Nucleic acid constructs comprising a polynucleotide of genomic, cDNA, synthetic or semi-synthetic origin in association with a polynucleotide sequence encoding a foreign protein or a fragment thereof (carrier sequence) can be generated by recombinant nucleic acid techniques well known in the art (see Ausubel et al., supra). According to this method, the cloned gene is introduced into an expression vector at a position located 3' to a carrier sequence coding for the amino terminus of a highly expressed protein, an entire functional moiety of a highly expressed protein or the entire protein. It is preferable to use a carrier sequence from an E. coli gene or from any gene that is expressed at high levels in E. coli.
  • the purification protocol can be designed in accordance with the unique physical properties of the carrier protein (e.g. heat stability).
  • the tag sequence may encode a protein (e.g. glutathione-S-transferase (GST)) which can be purified by a chemical interaction (for example, glutathione purification of GST).
  • some carrier proteins, such as thioredoxin (Trx) can be selectively released from intact cells by osmotic shock or freeze/thaw procedures.
  • proteins that are fused to these carrier proteins can be purified away from intracellular contaminants by virtue of the physical attributes of the carrier protein (Ausubel et al., supra).
  • the temperature at which expression is induced can affect inclusion body formation since inclusion body formation is induced at higher temperatures (37°C and 42°C) and inhibited at lower temperatures (30°C). In certain instances, lowering the total level of protein expression can lead to an increase in the proportion of soluble protein that is produced.
  • the strain background of the cells in which the protein is being produced can affect the proportion of a particular protein that is expressed in a soluble form.
  • the choice of carrier protein can affect the solubility of an expressed fusion protein (Ausubel et al., supra).
  • An additional problem that can be encountered when producing fusion proteins in E. coli is formation of an unstable protein, or a protein that is cleaved at the site of the junction between the carrier sequence and the sequence of the protein of interest. To decrease complications due to protein instability, one can arrange for the fusion protein to be expressed as insoluble aggregates. Alternatively, one can express the fusion protein in E. coli strains that are deficient in proteases (Ausubel et al., supra).
  • Methods for cleavage of fusion proteins to remove the carrier are known to those skilled in the art.
  • the choice of a method is usually determined by the composition, sequence, and physical characteristics of the particular protein.
  • Reagents such as cyanogen bromide, hydroxylamine or low pH can be used to chemically cleave fusion proteins.
  • enzymatic cleavage methods can be used.
  • Enzymatic cleavage protocols are advantageous because they can be carried out under relatively mild reaction conditions, and because they involve highly specific cleavage reactions.
  • Enzymes useful for enzymatic cleavage of fusion proteins include factor Xa, thrombin, enterokinase, renin and collagenase (Ausubel et al., supra).
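Because enzymatic cleavage depends on a specific recognition sequence at the carrier junction, it can be useful to check that the same sequence does not also occur within the protein of interest. The sketch below scans an amino acid sequence for the commonly cited recognition motifs of three of the listed enzymes (factor Xa: I-E/D-G-R; thrombin: L-V-P-R followed by G-S; enterokinase: D-D-D-D-K). These motifs are simplified consensus sites and are an assumption of this sketch; renin and collagenase are omitted.

```python
import re

# Simplified consensus motifs; cleavage is taken to occur after the matched residues.
CLEAVAGE_PATTERNS = {
    "factor Xa": r"I[ED]GR",
    "thrombin": r"LVPR(?=GS)",
    "enterokinase": r"DDDDK",
}


def cleavage_positions(protein: str) -> dict:
    """Map each enzyme to the 0-based indices of the residue following each cut."""
    protein = protein.upper()
    return {
        enzyme: [match.end() for match in re.finditer(pattern, protein)]
        for enzyme, pattern in CLEAVAGE_PATTERNS.items()
    }
```

An enzyme whose motif is found only at the engineered junction is a candidate for carrier removal; additional hits within the protein of interest suggest choosing a different enzyme.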
  • Recombinant constructs encoding fusion proteins wherein the carrier sequence is on the order of 9-15 codons can be generated by PCR methods. According to this method, a PCR primer will be designed to contain at least 13 nucleotides that are identical to the target sequence on either side of the nucleotide sequence encoding the carrier sequence.
  • the PCR primer will also contain a restriction enzyme site to facilitate cloning of the amplified product into an appropriate expression vector. PCR will be carried out as described above and the sequence of the amplified product will be confirmed by sequence analysis as described in Section D entitled "Isolation of a Wild Type Gene".
  • recombinant constructs encoding fusion proteins can be generated by site/oligonucleotide directed mutagenesis (Ausubel et al., supra).
  • For site directed mutagenesis, the DNA to be mutated is inserted into a plasmid which has an f1 origin of replication.
  • a mutagenesis oligonucleotide is designed to contain 13 bp that are 100% identical to the target sequence, on either side of a sequence coding for the 9-15 codons of carrier sequence that is to be added by the mutagenesis protocol.
  • a single stranded preparation of the vector is prepared by the following method.
  • After the addition of 2.6 ml of 20% polyethylene glycol 200-800/2M NaCl to 20 ml of bacterial supernatant, the sample is incubated for 1 - 1.5 hours on ice. The sample is pelleted by centrifugation at 9000 rpm for 20 minutes. Following removal of the supernatant, residual supernatant is removed by centrifugation at 3000 rpm for 5 minutes. The pellet is resuspended in 400 µl of TE, extracted twice with phenol and four times with phenol:chloroform and ethanol precipitated. The resulting pellet is resuspended in 40 µl TE.
  • Mutagenesis is performed by using a Muta-Gene kit (Bio-Rad, Hercules, CA) according to the following method.
  • To kinase the oligonucleotide primer, 1 µl (200 ng) of oligonucleotide is incubated in the presence of 2 µl of 10X kinase buffer (0.5 M Tris, pH 8.0, 70 mM MgCl2, 10 mM DTT), 2 µl 10 mM rATP, 2 µl polynucleotide kinase and 13 µl H2O at 37°C for 1 hour.
  • DNA is isolated from the transformed E. coli cells by mini prep methods known in the art (Ausubel et al., supra), and sequenced according to methods known in the art (described in Section D entitled "Isolation of a Wild Type Gene").
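The layout of the mutagenesis oligonucleotide described above (13 nucleotides identical to the target on either side of a carrier-encoding insert of 9-15 codons) can be sketched as follows. The function name and the convention that `insert_pos` is the 0-based index in the target at which the carrier sequence is to be inserted are assumptions made for illustration.

```python
def design_insertion_oligo(target: str, insert_pos: int,
                           carrier_dna: str, flank: int = 13) -> str:
    """Return flank + carrier coding sequence + flank, written 5' to 3'.

    The flanks are copied from the target immediately upstream and downstream
    of the insertion point, so they are 100% identical to the target.
    """
    target, carrier_dna = target.upper(), carrier_dna.upper()
    if len(carrier_dna) % 3 != 0:
        raise ValueError("carrier sequence must be a whole number of codons")
    if not 9 <= len(carrier_dna) // 3 <= 15:
        raise ValueError("carrier sequence should encode 9-15 codons")
    if insert_pos < flank or insert_pos + flank > len(target):
        raise ValueError("not enough target sequence for both flanks")
    left = target[insert_pos - flank:insert_pos]
    right = target[insert_pos:insert_pos + flank]
    return left + carrier_dna + right
```

With the minimum 9-codon carrier the resulting oligonucleotide is 53 nucleotides long (13 + 27 + 13); the same layout applies to the PCR primer approach, with a restriction enzyme site added as described.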
  • the invention discloses nucleic acid probes.
  • the nucleic acid probes of the invention are specifically hybridizable to a mutant gene but not to a wild type form of a gene due to the presence of one or more polymorphisms.
  • These allele specific probes can be used to screen DNA sequences of a gene which have been amplified by PCR, or are present in a genomic DNA or RNA test sample. Hybridization of a particular allele specific probe to an amplified gene sequence, under stringent conditions (described below), indicates that the polymorphism contained in the probe is present in the amplified sequence.
  • Nucleic acid probes that are specifically hybridizable to a wild type form of a gene but not to a mutant form of a gene are also useful according to the invention.
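A pair of allele specific probes centred on a polymorphic position can be derived from a reference sequence as sketched below. Exact string matching is used here as a crude stand-in for hybridization under stringent conditions, which is an idealization; the 7-base flank (giving 15-mer probes) and the function names are assumptions of this sketch.

```python
def allele_specific_probes(reference: str, snp_index: int,
                           alt_base: str, flank: int = 7):
    """Return (wild_type_probe, mutant_probe) centred on the polymorphic base."""
    reference = reference.upper()
    start, end = snp_index - flank, snp_index + flank + 1
    if start < 0 or end > len(reference):
        raise ValueError("polymorphism too close to the end of the reference")
    wild_type = reference[start:end]
    mutant = wild_type[:flank] + alt_base.upper() + wild_type[flank + 1:]
    return wild_type, mutant


def probe_calls(sample: str, wild_type: str, mutant: str) -> dict:
    """Report which probe finds a perfect match in the sample sequence."""
    sample = sample.upper()
    return {"wild_type": wild_type in sample, "mutant": mutant in sample}
```

A sample that matches only the mutant probe carries the polymorphism on the tested strand; a heterozygous individual would be expected to give a signal with both probes.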
  • the probes of the claimed invention will be specific for a nucleic acid region that is adjacent to a region that is thought to contain one or more polymorphisms. These probes will be useful for detecting the presence of one or more polymorphisms in the adjacent region by the method of primer extension (as described in Section F entitled "Identification and Characterization of Polymorphisms").
  • probes of the claimed invention will be used to detect a gain or loss of a restriction enzyme site known to contain one or more polymorphisms of the claimed invention.
  • Nucleic acid probes are able to detect a restriction enzyme fragment that is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis. Probes that are useful according to this embodiment of the claimed invention can be specific for any region within a gene or outside of a gene.
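Detection of a polymorphism through gain or loss of a restriction enzyme site can be illustrated by comparing the predicted digestion fragments of the two alleles. The sketch below treats the sequence as linear and takes the cut position as an offset within the recognition site (for example, 1 for EcoRI, which cuts G^AATTC); the function names are illustrative.

```python
def find_sites(seq: str, site: str) -> list:
    """Return the 0-based start positions of every occurrence of the site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Lengths of the fragments produced by complete digestion of a linear sequence."""
    cuts = [position + cut_offset for position in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def site_change(wild_type: str, mutant: str, site: str) -> str:
    """Classify the effect of the polymorphism on the number of sites."""
    before, after = len(find_sites(wild_type, site)), len(find_sites(mutant, site))
    if after > before:
        return "gain"
    if after < before:
        return "loss"
    return "none"
```

A change in the predicted fragment lengths between alleles corresponds to the band shift that would be observed on an agarose gel or Southern blot.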
  • the nucleic acid probes of the invention are useful for a variety of hybridization-based analyses including but not limited to Southern hybridization to genomic DNA, cDNA sequences or PCR amplification products, Northern hybridization to mRNA and RNase protection assays, DNA sequencing and isolation of genomic or cDNA clones of a gene.
  • the probes may also be used to determine whether mRNA encoded for by a gene is present in a cell or tissue by the method of in situ hybridization. These techniques are well known in the art and can be performed as described in Ausubel et al., supra.
  • polymorphisms associated with alleles of a gene which either predispose to a particular disease (e.g. osteoporosis) or are not associated with a particular disease (e.g. osteoporosis), will be detected by the formation of a stable hybrid consisting of a polynucleotide probe comprising one or more polymorphisms and a target sequence, that also comprises one or more polymorphisms, under stringent to moderately stringent hybridization and wash conditions. If it is expected that the probes will be perfectly complementary to the target sequence, stringent conditions will be used.
  • Hybridization stringency may be lessened if some mismatching is expected, for example, if variants are expected with the result that the probe will not be completely complementary. Conditions are chosen which rule out nonspecific/adventitious bindings, that is, which minimize noise. Since such indications identify neutral DNA polymorphisms as well as mutations, these indications need further analysis (such as assays described in Section F entitled "Identification and Characterization of Polymorphisms") to demonstrate detection of a susceptibility allele of a gene. Probes for alleles of a gene may be derived from genomic DNA or cDNA sequences specific for the gene of interest. The probes may be of any suitable length, and may span all or a portion of the region containing the gene.
  • the probes may be short, e.g., in the range of about 8-30 base pairs, since the hybrid will be relatively stable under even stringent conditions. If some degree of mismatch is expected with the probe, i.e., if it is suspected that the probe will hybridize to a variant region, a longer probe may be employed which hybridizes to the target sequence with the requisite specificity.
  • Probes according to the invention also include an isolated polynucleotide attached to a label or a reporter molecule which may be useful for isolating other polynucleotide sequences having sequence similarity by standard methods, including but not limited to the above-referenced hybridization-based assays. Techniques for preparing and labeling probes (as described in Ausubel et al., supra) are included below. A wide variety of labels and conjugation techniques are known by those skilled in the art and can be used in various nucleic acid and amino acid assays.
  • Means for producing labeled hybridization or PCR probes for detecting related sequences include oligolabeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide.
  • the protein-encoding sequence, or any portion of it may be cloned into a vector for the production of an mRNA probe.
  • Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labeled nucleotides.
  • reporter molecules or labels include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles and the like.
  • Patents teaching the use of such labels include US Patents 3,817,838; 3,350,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241.
  • recombinant immunoglobulins may be produced as shown in US Patent No. 4,816,567 incorporated herein by reference.
  • Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention may be derived from naturally occurring or recombinant single- or double-stranded polynucleotides, or be chemically synthesized.
  • Portions of the polynucleotide sequence having at least approximately 5 nucleotides, preferably 9-15 nucleotides, fewer than about 6 kb and usually fewer than about 1 kb, from a polynucleotide sequence encoding a gene are preferred as probes.
  • a DNA probe useful according to the present invention can be isolated from a gene or a polynucleotide construct derived from a gene, or from a cDNA sequence specific for a gene or a cDNA construct specific for a gene by the methods of PCR or restriction enzyme digestion, as described above.
  • Riboprobes useful according to the invention can be synthesized by the method of in vitro transcription, or by chemical synthesis methods, as described above.
  • An oligonucleotide probe useful according to the invention can be designed, as described above, and synthesized in a commercially available automated synthesizer.
  • Nucleic acid hybridization rate and stability will be affected by a variety of experimental parameters including salt concentration, temperature, the presence of organic solvents, the viscosity of the hybridization solution, the base composition of the probe, the length of the duplex, and the number of mismatches between the hybridizing nucleic acids (Ausubel et al., supra), and as described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
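The dependence of hybrid stability on salt concentration, base composition, duplex length and mismatching can be illustrated with a widely used empirical approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N, reduced by roughly 1°C for each 1% of mismatched bases. This formula is a general rule of thumb and is not taken from the present description; the function name and the mismatch penalty parameter are assumptions of this sketch, and the effects of organic solvents and viscosity are not modelled.

```python
import math


def hybrid_tm(seq: str, sodium_molar: float, percent_mismatch: float = 0.0,
              penalty_per_percent: float = 1.0) -> float:
    """Approximate melting temperature (deg C) of a probe-target duplex.

    Uses Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 675/N, then subtracts an
    approximate penalty for the percentage of mismatched bases.
    """
    seq = seq.upper()
    length = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / length
    tm = 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * gc_percent - 675.0 / length
    return tm - penalty_per_percent * percent_mismatch
```

Under this approximation, lowering the salt concentration or shortening the duplex lowers the Tm, which is why the combination of parameters, rather than any single factor, sets the effective stringency.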
  • Southern blot analysis can be used to detect sequence variations in a gene from a PCR amplified product or from a total genomic DNA test sample via a non-PCR based assay.
  • the hybridization conditions can be varied as necessary according to the parameters described in Section A entitled "Design and Synthesis of Oligonucleotide Primers". Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS and at 65°C in 0.2X SSC/0.1% SDS, and exposed to film.
  • the stringency of the wash buffers can also be varied depending on the amount of the background signal (Ausubel et al., supra).
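The stringency difference between the two washes described above can be made concrete by computing the sodium ion concentration of each SSC dilution. The sketch assumes the standard composition of 1X SSC (150 mM sodium chloride and 15 mM trisodium citrate), which is not spelled out in the description, and counts three sodium ions per citrate.

```python
NACL_MOLAR_1X = 0.150      # sodium chloride in 1X SSC (standard recipe, assumed)
CITRATE_MOLAR_1X = 0.015   # trisodium citrate in 1X SSC (standard recipe, assumed)


def ssc_sodium_molar(fold: float) -> float:
    """Total Na+ concentration (mol/l) of an SSC solution at the given strength."""
    return fold * (NACL_MOLAR_1X + 3 * CITRATE_MOLAR_1X)
```

On this basis the 2X SSC wash contains about 0.39 M sodium and the 0.2X SSC wash about 0.039 M, so the second wash, performed at 65°C, is the more stringent of the two.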
  • Detection of a nucleic acid probe-target nucleic acid hybrid will include the step of hybridizing a nucleic acid probe to the DNA target.
  • This probe may be radioactively labeled or covalently linked to an enzyme such that the covalent linkage does not interfere with the specificity of the hybridization.
  • a resulting hybrid can be detected with a labeled probe.
  • Methods for radioactively labeling a probe include random oligonucleotide primed synthesis, nick translation or kinase reactions (see Ausubel et al., supra).
  • a hybrid can be detected via non-isotopic methods.
  • Non-isotopically labeled probes can be produced by the addition of biotin or digoxigenin, fluorescent groups, chemiluminescent groups (e.g. dioxetanes, particularly triggered dioxetanes), enzymes or antibodies.
  • non-isotopic probes are detected by fluorescence or enzymatic methods. Detection of a radiolabeled probe-target nucleic acid complex can be accomplished by separating the complex from free probe and measuring the level of complex by autoradiography or scintillation counting. If the probe is covalently linked to an enzyme, the enzyme-probe conjugate-target nucleic acid complex will be isolated away from the free probe-enzyme conjugate and a substrate will be added for enzyme detection.
  • Enzymatic activity will be observed as a change in color development or luminescent output resulting in a 10³-10⁶ increase in sensitivity.
  • An example of the preparation and use of nucleic acid probe-enzyme conjugates as hybridization probes (wherein the enzyme is alkaline phosphatase) is described in Jablonski et al., 1986, Nucleic Acids Res., 14:6115.
  • Two-step label amplification methodologies are known in the art. These assays are based on the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid probe capable of specifically binding to a gene. Allele specific gene probes are also useful according to this method.
  • the small ligand attached to the nucleic acid probe will be specifically recognized by an antibody-enzyme conjugate.
  • digoxigenin will be attached to the nucleic acid probe and hybridization will be detected by an antibody-alkaline phosphatase conjugate wherein the alkaline phosphatase reacts with a chemiluminescent substrate.
  • the small ligand will be recognized by a second ligand-enzyme conjugate that is capable of specifically complexing to the first ligand.
  • A well known example of this manner of small ligand interaction is the biotin-avidin interaction. Methods for labeling nucleic acid probes and their use in biotin-avidin based assays are described in Rigby et al., 1977, J. Mol. Biol., 113:237 and Nguyen et al., 1992, BioTechniques, 13:116.
  • Variations of the basic hybrid detection protocol are known in the art, and include modifications that facilitate separation of the hybrids to be detected from extraneous materials and/or that employ the signal from the labeled moiety. A number of these modifications are reviewed in, e.g., Matthews & Kricka, 1988, Anal. Biochem., 169:1; Landegren et al., 1988, Science, 242:229; Mittlin, 1989, Clinical Chem., 35:1819; U.S. Pat. No. 4,868,105; and EPO Publication No. 225,807.
  • a wild type version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."
  • sequence of the cloned gene will be determined by sequencing methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra). Methods of sequencing employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical
  • the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI 377 DNA sequencers (Perkin Elmer).
  • a mutant version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled “Production of a Polynucleotide Sequence.”
  • the sequence of the cloned gene will be determined by sequencing methods described in Section D entitled "Isolation of a Wild Type Gene.”
  • the starting point is a set of experimentally derived nucleic acid sequences.
  • In order to be useful for SNP discovery by the invention, it is preferred that the sequences have complete chromatogram files from a gel or capillary electrophoresis sequencing machine. When these are not available, quality score data, which assign a score to each base in the sequence indicating the likelihood of error for the base call, may be used. If neither is available, the sequence may be used to assist the clustering of other sequences and in some cases to provide additional verification for a discovered SNP, but is not used by the invention for the identification of the polymorphism.
  • the population of sequences used may constitute either a database of cDNA-derived sequences or genomic sequence.
  • sequences used by the invention are from an assembled cDNA database, such as the LifeSeqGold database (Incyte Genomics, Inc. (Incyte), Palo Alto, CA).
  • Derivation of Nucleic Acid Sequences: cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines.
  • the human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Pharmaceuticals, Inc. (Incyte), Palo Alto CA).
  • Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
  • Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
  • Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
  • Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
  • Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed.
  • Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Perkin-Elmer). Sequencing can be carried out using, for example, the ABI 373 or 377 (Perkin-Elmer) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
  • nucleotide sequences have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art.
  • Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
  • Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single transcript or from many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
  • cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed and verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Pharmaceuticals, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed.
  • the processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available.
  • a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves.
  • the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
  • a resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
  • the cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, supra. Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853). These analyses comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
  • BLAST Basic Local Alignment Search Tool
  • BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci.
  • Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
  • the method comprises a series of filters to distinguish isSNPs from other sequencing variants and errors.
  • the filters can be grouped into the following five sets of filters by the order of application in the method:
  • Preliminary Filters: the main filter in the first group removes the majority of base call errors by requiring a minimum phred quality score of 15. Additional filters at this stage deal with sequence alignment errors as well as errors resulting from improper trimming of vector sequence, chimeras and splice junctions.
  • Finishing Filters: these filters remove duplicate and redundant SNPs from the generated list of SNPs, and remove SNPs which are from the hypervariable regions of hypervariable genes such as immunoglobulin and T cell receptors.
  • sequences must first be trimmed to eliminate vector sequence, contamination and repetitive sequences. Then certain low information content sequences (for example, long runs of a single base, or two or three-base repeats) and repetitive sequences (for example Alu sequences in humans) must be masked (changed to N's) to prevent over-clustering errors.
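  • As a brief illustration of the phred threshold used by the preliminary filters, the sketch below converts a phred quality score to an estimated base-call error probability using the standard definition Q = -10 log10(P). The function names are hypothetical and are not taken from the source; only the threshold of 15 comes from the text.

```python
def phred_to_error_probability(q: float) -> float:
    """Estimated probability that a base call with phred score q is wrong."""
    return 10 ** (-q / 10)


def passes_preliminary_quality(q: float, threshold: float = 15) -> bool:
    """True if the base call meets the minimum phred score (15 in the text)."""
    return q >= threshold


if __name__ == "__main__":
    # A phred score of 15 corresponds to roughly a 3% chance of a miscall.
    print(round(phred_to_error_probability(15), 4))
    print(passes_preliminary_quality(14), passes_preliminary_quality(15))
```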
  • the clustering process then identifies the sets of sequences that are believed to be derived from the same original DNA sequence or gene.
  • the preferred processes are Block 1 for trimming and masking, a variety of different algorithms for clustering, and phrap for the alignment. It will be recognized by those skilled in the art that phrap and other alignment methods carry out a secondary clustering step which divides clusters into contigs, and carry out a secondary trimming step which defines the end points of the portion of each sequence which participates in the contig. The contigs then may be searched for the occurrence of SNPs.
  • the first step in identifying candidate SNP sequences is to redefine the end points of each sequence as the points within the previous end points where a stretch of at least 10 consecutive base calls, containing at least eight base changes, matches the consensus sequence exactly. Sequence trimming errors (both at the single sequence stage and at the alignment stage) contribute to the false positives when foreign sequence (vector, chimera or splice variant) is similar to the real sequence and the true boundary is difficult to determine. This step is a conservative approach to avoid false positives and also filters out lower-quality sequence at the ends. The reason the length of the match with the consensus is measured in base changes is to avoid low significance matches on repetitive sequence such as polyA.
  • the next step is, at each position of the alignment, to compare the base calls of all the aligned sequences which are between their start and end positions, which have quality scores greater than a set threshold, and whose neighboring base calls agree with the consensus sequence and also have quality scores greater than the threshold.
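  • The end-point redefinition can be sketched as follows. This is only one possible reading of the step: it assumes an ungapped read already aligned base-for-base to the consensus, treats a "base change" as a position whose base differs from the preceding base, and uses a fixed 10-base window rather than any stretch of 10 or more bases. All names are hypothetical.

```python
def count_base_changes(segment: str) -> int:
    """Number of positions whose base differs from the preceding base.

    A homopolymer run such as polyA scores 0, so it cannot anchor an end point.
    """
    return sum(1 for a, b in zip(segment, segment[1:]) if a != b)


def redefine_end_points(read, consensus, start, end, min_len=10, min_changes=8):
    """Return (new_start, new_end) for a read, or None if no anchor exists.

    `start` is inclusive and `end` is exclusive. An anchor is a window of
    `min_len` bases that matches the consensus exactly and contains at least
    `min_changes` base changes (assumed interpretation, see lead-in).
    """
    def is_anchor(i):
        window = read[i:i + min_len]
        return (window == consensus[i:i + min_len]
                and count_base_changes(window) >= min_changes)

    anchors = [i for i in range(start, end - min_len + 1) if is_anchor(i)]
    if not anchors:
        return None
    # Leftmost anchor defines the new start; rightmost anchor the new end.
    return anchors[0], anchors[-1] + min_len


if __name__ == "__main__":
    consensus = "ACGTACGTACGTACGTACGT"
    read = "TCGTACGTACGTACGTACGA"  # mismatches at both ends
    print(redefine_end_points(read, consensus, 0, len(read)))
```

Under these assumptions the mismatching terminal bases fall outside the new end points, and a read consisting only of a homopolymer run yields no anchor at all.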
  • the threshold is a phred quality score greater than or equal to 15.
  • the possibilities are A, C, G, T, and -(deletion).
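  • The per-position comparison described above can be sketched as a small predicate. The data layout (an `AlignedRead` with aligned bases, per-position phred scores and end points) is an assumption made for illustration; the threshold of 15 and the requirement that both neighboring calls agree with the consensus and meet the threshold come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedRead:
    bases: str        # aligned base calls: A, C, G, T or '-' for a deletion
    quals: List[int]  # phred score for each aligned position
    start: int        # first usable position (inclusive)
    end: int          # end of usable region (exclusive)


def usable_call(read: AlignedRead, consensus: str, pos: int,
                threshold: int = 15) -> Optional[str]:
    """Return the base call at `pos` if it may be compared, else None."""
    # The position and both neighbours must lie within the read's end points.
    if not (read.start < pos < read.end - 1):
        return None
    # The call and both neighbours must meet the quality threshold.
    if any(read.quals[p] < threshold for p in (pos - 1, pos, pos + 1)):
        return None
    # Both neighbouring calls must agree with the consensus.
    if (read.bases[pos - 1] != consensus[pos - 1]
            or read.bases[pos + 1] != consensus[pos + 1]):
        return None
    return read.bases[pos]


if __name__ == "__main__":
    consensus = "ACGTA"
    read = AlignedRead(bases="ACTTA", quals=[20] * 5, start=0, end=5)
    print(usable_call(read, consensus, 2))
```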
  • the next step is a Clone Filter: if there has been more than one base call for a sequence position, then the clone for each sequence is identified and the sequences corresponding to each clone are compared. If the base calls for different sequences from the same clone disagree, then all the sequences for this clone at this base position are removed from consideration. After all of these filters, positions for which there is more than one base call are candidate SNPs. The "wild type" base call is the one in the consensus sequence and the others are designated candidate SNPs. If the wild type base call is a deletion, then the SNP is considered to be an insertion at the previous base.
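  • A minimal sketch of the Clone Filter at a single alignment position follows. It assumes the surviving base calls are available as (clone identifier, base) pairs; that representation and the function names are hypothetical.

```python
from collections import defaultdict


def clone_filter(calls):
    """Drop every call from a clone whose sequences disagree at this position.

    `calls` is a list of (clone_id, base) pairs for one alignment position.
    """
    bases_by_clone = defaultdict(set)
    for clone_id, base in calls:
        bases_by_clone[clone_id].add(base)
    return [(c, b) for c, b in calls if len(bases_by_clone[c]) == 1]


def candidate_snp_alleles(calls, consensus_base):
    """Return the non-consensus alleles if the position remains polymorphic."""
    observed = {base for _, base in clone_filter(calls)}
    if len(observed) < 2:
        return set()  # only one base call left: not a candidate SNP
    return observed - {consensus_base}


if __name__ == "__main__":
    calls = [("c1", "A"), ("c1", "G"), ("c2", "A"), ("c3", "G")]
    # Clone c1 is internally inconsistent and is removed; c2 and c3 remain.
    print(candidate_snp_alleles(calls, consensus_base="A"))
```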
  • the next filters require opening of the chromatogram files for the sequences identified as containing candidate SNPs. At each candidate SNP position, the chromatogram data of each sequence passing the Identification Filters is extracted.
  • the first step in this process utilizes a program ABIdump to translate binary ABI chromatogram files into usable form.
  • Intensity Filter: if the SNP is a single base change (this step is skipped for insertions and deletions), then the processed intensity values for each of the four bases at the called chromatogram location of the candidate SNP base are used to compute a ratio. If we call the intensity of the wild type base "wt", the intensity of the SNP base "snp", the minimum of the other two "min", and the phred quality of the base call "Q", then the wild type sequences must have
  • the candidate SNP passes only if at least one wild type sequence passes and at least one SNP sequence passes.
  • the quality of the candidate SNP is the lower of the highest wild type pass level and the highest SNP pass level (if there is a high-quality wild type sequence but only low quality SNP sequences, then the candidate is low quality).
  • a SNP quality value is returned.
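  • The rule for combining pass levels into a single SNP quality value can be sketched as below. The intensity criteria that assign a pass level to each sequence are not reproduced here; the sketch simply assumes each sequence has already been given an integer level, with 0 meaning the sequence failed. That encoding is an assumption.

```python
def snp_quality(wild_type_levels, snp_levels):
    """Combine per-sequence pass levels into a candidate SNP quality.

    Levels are assumed to be integers where higher is better and 0 means the
    sequence failed the intensity filter. Returns 0 if the candidate fails,
    i.e. no wild type sequence passes or no SNP sequence passes.
    """
    best_wild_type = max(wild_type_levels, default=0)
    best_snp = max(snp_levels, default=0)
    if best_wild_type == 0 or best_snp == 0:
        return 0
    # The candidate is only as good as its weaker allele.
    return min(best_wild_type, best_snp)


if __name__ == "__main__":
    # High-quality wild type support but only low-quality SNP support.
    print(snp_quality([3, 1], [1, 0]))
```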
  • Polymerase errors are specific to the type of sequencing protocol used. For example, reverse transcriptase is involved in EST sequencing but not genomic clone sequencing. Polymerase is involved in the creation of extension clones (polymerase is used in all sequencing reactions, but errors are less likely to arise because only a fraction of the templates are affected in contrast to the extension process where a single polymerase product becomes a template for the entire reaction). This filter is not applied to genomic sequences in the current embodiment on the premise that the genomic sequences do not have polymerase errors, and that somatic mutations are likely to have the same profile as real SNPs.
  • This filter also filters out rare SNPs as well as apparent SNPs which are not real. It is difficult to determine and confirm by experiments to what extent SNP candidates are too rare to be confirmed vs. simply not real. For many applications, very rare SNPs are of less utility than common ones such that this is not a problem; however in some applications it may be advisable to turn this filter off.
  • This filter is based on the premise that the probabilities of different mutations differ depending on the source. For example, true SNPs may be mostly transitions whereas reverse transcriptase mutations could be primarily G to T mutations. While this does not allow one to determine for sure that a given change is a true SNP, it allows one to evaluate the relative likelihood that a given mutation is a true SNP.
  • SNP confirmation data suggest that G/T SNP candidates in which there is only one clone having the T allele have a very low probability of being real SNPs. These SNP candidates are excluded from the high confidence set (they are kept in a different file; their confirmation rate is well below 50 percent). The other set which had a very low confirmation rate is any A/T SNP.
  • Frequency Filter This filter is based on the concept that true SNPs have a different frequency profile than clone errors and that a candidate SNP which is evident in only one clone in a deep alignment is less likely to be real than one which appears in one clone in a shallow alignment.
  • the likelihood of finding a SNP at a given sequence location is a function of the number of chromosomes sequenced. This curve is distinctly non-linear, as most SNPs are sufficiently frequent to be found with relatively few sequences. The probability of an error of this type, however, is essentially linear in the number of sequences since the chance of the change occurring in two different sequences is independent.
  • This filter is the basis of a secondary method used to develop the base change sequence analysis filter. Comparing the set of single clone SNPs from shallow alignments with those from deep alignments, which are more likely to be errors, will reveal base changes which are more likely to be associated with polymerase errors and somatic mutations.
  • These filters are intended to remove candidate SNPs which result from the incorrect clustering of similar sequences such as highly homologous genes, similar genomic sequences, and contamination from other species where the sequences of the species have been mislabeled as human.
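  • The contrast between the two curves can be illustrated numerically. The sketch assumes independent sampling of chromosomes, a minor allele frequency `freq` for a true SNP, and a small per-sequence error rate `err` at the position; these parameters and the example values are illustrative assumptions, not figures from the source.

```python
def prob_snp_observed(freq: float, n: int) -> float:
    """Chance that at least one of n sampled sequences carries the minor allele."""
    return 1 - (1 - freq) ** n


def prob_error_observed(err: float, n: int) -> float:
    """Chance that at least one of n sequences carries an error at the position.

    For small err this is approximately n * err, i.e. linear in depth.
    """
    return 1 - (1 - err) ** n


if __name__ == "__main__":
    for depth in (2, 5, 10, 50):
        print(depth,
              round(prob_snp_observed(0.2, depth), 4),
              round(prob_error_observed(1e-4, depth), 6))
```

With these assumed values the SNP curve saturates quickly while the error curve keeps growing roughly in proportion to depth, which is why a single-clone variant in a deep alignment is the more suspicious case.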
  • Number of base change filter: This filter distinguishes homologous sequences from SNPs on the basis of the frequency of variants. True SNPs occur about once per kb when comparing two sequences, or once per 2 kb if the length of sequences is included, and this fraction decreases as the depth of the alignment increases. Since EST sequences tend to be about 500 bp or less in length, one would expect no more than one SNP per four sequences. The number of SNPs in the cluster is divided by the number of sequences in the cluster, and SNPs for which this number is larger than one are discarded. The higher the number, the less likely the SNP is to be real. The threshold value of one was chosen because it appears to correspond to roughly a 50 percent success rate; however, the threshold value could be adjusted to a higher value to accept lower confidence SNPs.
  • This filter calculates the number of SNPs for which the sequence is the only representative within a window of 100 bases on either side, and discards any of the SNPs for which there is more than one other SNP in this window.
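  • The ratio test in this filter is simple arithmetic and can be sketched directly. The function name is hypothetical; the threshold of one and the rule that ratios larger than the threshold are discarded come from the text.

```python
def passes_base_change_filter(num_snps: int, num_sequences: int,
                              threshold: float = 1.0) -> bool:
    """Keep a cluster's SNPs only if SNPs per sequence does not exceed threshold."""
    if num_sequences <= 0:
        raise ValueError("cluster must contain at least one sequence")
    return num_snps / num_sequences <= threshold


if __name__ == "__main__":
    print(passes_base_change_filter(3, 12))   # about one SNP per four sequences
    print(passes_base_change_filter(13, 12))  # more SNPs than sequences
```

Raising `threshold` corresponds to accepting lower confidence SNPs, as the text notes.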
  • This threshold can be set higher, but the actual fraction of SNP candidates which are true SNPs drops off to less than 50 percent.
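  • One possible reading of the window filter is sketched below. It assumes the input is the list of distinct alignment positions of candidate SNPs that are supported only by a single given sequence, and it discards a candidate when more than one other such candidate lies within 100 bases on either side. The function name and input format are hypothetical.

```python
def window_filter(positions, window=100, max_others=1):
    """Return the positions that survive the neighbouring-SNP window filter.

    `positions` holds distinct positions of candidate SNPs for which one
    sequence is the only representative (assumed input format).
    """
    kept = []
    for p in positions:
        others = sum(1 for q in positions if q != p and abs(q - p) <= window)
        if others <= max_others:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Three candidates crowded within 100 bases are all discarded.
    print(window_filter([10, 50, 90, 500]))
    # Two nearby candidates are tolerated at the default setting.
    print(window_filter([10, 60, 500]))
```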
  • Cluster total has proven to be empirically correlated with the confirmation rate, probably because it predicts clusters which contain paralogs, homologs and contamination from other species. Candidate SNPs which have a cluster number of less than eight are kept. This threshold value for the cluster total can be varied.
  • Redundancy/finishing filters
  • Redundant SNP filter: SNPs in different contigs of the same gene which have the same base change and surrounding sequence are flagged as redundant. To accommodate possible splice variants, this redundancy filter also applies to SNPs whose surrounding sequence matches on only one side.
  • Sequences containing SNPs are filtered to remove SNPs in sequences that are homologs to T cell receptors and immunoglobulin genes because both types of genes have hypervariable regions which could result in false positives.
  • SNP related data: with each candidate SNP a variety of data is kept, including the number and sources of all contributing sequences (for example gene album, HTPS, FL, WashU/Merck, etc.), the surrounding sequence, measures of the ratio and quality scores for the "best" sequence representing each allele, etc.
  • Sequence related data: for each sequence associated with each SNP, the following data is kept: the distance in each direction to the end of the sequence, the distance in each direction to the next base different from the consensus and passing the initial quality filters, the library, tissue ID, donor ID and comments (for example tumor, diseases, normal).
  • the invention provides methods for detecting the presence of polymorphisms in candidate genes of the invention.
  • the invention also provides methods for distinguishing polymorphisms which contribute to a particular disease (e.g. osteoporosis) over polymorphisms which do not contribute to the disease.
  • Identification of Polymorphisms in a candidate gene will involve the steps of isolating the candidate gene, determining its genomic structure and identifying polymorphisms in the DNA sequences in any portion of the entire protein-coding region.
  • the invention also provides methods for identifying polymorphisms in the DNA sequences corresponding to RNA splice junctions.
  • the invention also provides methods for identifying polymorphisms in the DNA sequence corresponding to the regulatory (promoter) region of the candidate gene.
  • a candidate gene is isolated by cloning methods well known in the art (described above).
  • the genomic structure of a candidate gene is determined by Southern blot analysis, as described in Section C. It is expected that the entire sequence of an open reading frame (ORF) of an average entire gene can be spanned by 16 PCR-amplified DNA fragments or amplimers of an average length of 225 bp. It is expected that a smaller gene can be spanned by 1-2 amplimers and that >50 amplimers are required to span extremely large genes.
  • Primers useful for production of the amplimers of a particular candidate gene are designed based on preexisting knowledge of the sequence of the wild type gene, according to the primer design strategies described in Section A entitled "Design and Synthesis of Oligonucleotide Primers."
  • primers that amplify overlapping regions of the candidate gene. If a sequence variation is located in a region of a candidate gene that corresponds to the region to which the primers hybridize, the primers will likely not bind, the region containing this sequence variation will not be amplified and the variation will not be detected in PCR based assays.
  • By producing overlapping amplimers it is expected that virtually all of the sequence variations in a particular candidate gene will be detected.
  • the amount of overlap in the amplimers is somewhat variable (approximately 20%) and the precise location of the overlapping regions will depend on the location of regions comprising a sequence that is an appropriate primer sequence.
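  • The arithmetic of covering a region with overlapping amplimers can be sketched as a simple tiling. The sketch uses the 225 bp average length and the approximately 20% overlap mentioned in the text, but it ignores primer placement entirely: in practice the tile boundaries would shift to wherever suitable primer sequences exist. The function name and the fixed-step scheme are assumptions.

```python
def tile_amplimers(region_length: int, amplimer_length: int = 225,
                   overlap_fraction: float = 0.20):
    """Return (start, end) pairs, end exclusive, tiling a region with overlap."""
    overlap = round(amplimer_length * overlap_fraction)
    step = amplimer_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the amplimer length")
    tiles = []
    start = 0
    while True:
        end = min(start + amplimer_length, region_length)
        tiles.append((start, end))
        if end >= region_length:
            break
        start += step
    return tiles


if __name__ == "__main__":
    for tile in tile_amplimers(1000):
        print(tile)
```

Note that the final tile in this naive scheme may be shorter than a practical amplimer; a real design would adjust the last primer pair rather than accept a truncated fragment.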
  • each polymorphism will be detected in the context of an SSCP fragment.
  • Polymorphism analysis by fluorescent SSCP uses PCR to generate an amplimer of DNA to be studied.
  • the region to be tested is defined as the region between the primers (i.e. the region that is incorporated into the PCR product and reflects the sequence of the DNA sample being tested).
  • the PCR primers reflect the sequence of the DNA sample being tested and are incorporated into the PCR product as one end of each strand of DNA in the PCR product.
  • fSSCP provides a method of screening a DNA sequence located between PCR primers for the presence of polymorphisms.
  • the sensitivity of the technique of fSSCP for detecting a polymorphism is affected by length, such that there is a substantial decrease in the detection of polymorphisms in amplimers that are greater than 300 bp in length.
  • primer3 program (Copyright (c) 1996 Whitehead Institute for Biomedical Research) is employed to design pairs of primers suitable for use in a single PCR reaction.
  • program parameters are set so that multiple amplimers are designed in the length range of 150-300 bp, with predicted primer melting temperatures in the narrow range 60-62°C.
  • the narrow temperature range increases the likelihood that a single set of PCR conditions can be used to generate a wide variety of different amplimers.
  • SSCP does not detect 100% of polymorphisms.
  • the invention provides for detection of polymorphisms with an efficiency of 95% under a single set of conditions using single coverage of sequences; a 2-fold screening strategy can be employed if it is necessary to increase this detection efficiency.
  • a polymorphism can be located and detected anywhere in the SSCP fragment except in the regions at each end that correspond to the sequence of the PCR primers.
  • the precise location and identity of the sequence variation(s) of a particular SSCP fragment can be confirmed by sequencing the fragment as described in Section D entitled "Isolation of a Wild Type Gene".
  • the sequence of a candidate gene will be compared to the known sequence of a wild-type version of the gene by using the following DNA/protein sequence analysis programs and methods.
  • PSI-BLAST is a more sensitive variant of BLAST that operates by iteratively searching the database while simultaneously refining the query pattern based on the results of the searches.
  • Other packages of programs that are available and which have different specific properties include the HMMER, SAM, WISE, STADEN and FASTA packages, and the programs est_genome, dotter, e-PCR, Clustal, cross_match and phrap (Pearson, 1996, Methods Enzymol., 266:227).
  • primers can be designed to produce amplimers useful for identifying polymorphisms located in the RNA splice junctions.
  • primers can be designed to produce amplimers useful for identifying polymorphisms located in the promoter region.
  • Additional methods for detecting and isolating polymorphisms include, but are not limited to, fluorescent polarization-TDI, mass spectroscopy, denaturing gradient gel electrophoresis, chemical cleavage of mismatch, constant denaturant capillary electrophoresis, RNase cleavage, heteroduplex analysis, sequencing by hybridization, DNA sequencing, representational difference analysis, and denaturing high performance liquid chromatography, described below in Section F entitled "Identification and Characterization of Polymorphisms".
  • polymorphisms do not alter gene function and are called neutral polymorphisms. Some polymorphisms do have an effect on gene function, for example by changing the amino acid sequence of a protein, or by altering control sequences such as promoters or RNA splicing or degradation signals.
  • Polymorphisms can be used in genetic studies to identify a gene involved in a disease. If a polymorphism alters a gene function such that it increases disease susceptibility, then it will be present more often in individuals with the disease than in those without the disease. Alternatively, if a particular DNA variant is protective against a disease, it will be found more often in individuals without the disease than in those with the disease. Statistical methods are used to evaluate polymorphism frequencies found in diseased as compared to normal populations, and provide a means for establishing a causal link between a polymorphism and a phenotype. To detect a significant association between a disease and a polymorphic site, different tests may be used with either genotypic or allelic distributions.
  • the simplest test consists of a t-test wherein the frequency of the polymorphic alleles in normal individuals and individuals with the disease phenotype is compared.
  • a comparison of the genotypic distribution in normal individuals and individuals with the disease phenotype can also be performed using a chi-square test of homogeneity.
  • These tests are implemented in all commercially or freely available statistical packages, for example SAS and S+, and are even included in Microsoft Excel. More sophisticated analyses will be performed by incorporating covariates, using methods such as linear regression or logistic regression, and by accounting for the information provided by adjacent polymorphic sites (multipoint analysis).
  • studies which test polymorphism frequencies within groups exhibiting different phenotypes, and which use statistical methods to compare the group polymorphism frequencies and identify correlations with phenotypes, are known as "association studies". Some polymorphisms that occur in a single gene can alter the function of a gene sufficiently such that the polymorphism results in a disease (monogenic disease). However, many common human diseases are polygenic; that is, they are the result of complex interactions of various forms of multiple genes.
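  • The chi-square test of homogeneity on genotypic distributions can be sketched in a few lines without a statistics package. The genotype counts below are invented purely for illustration and are not data from the source; the critical value 5.991 is the standard chi-square cutoff for 2 degrees of freedom at the 0.05 level.

```python
def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    Rows are groups (e.g. cases and controls); columns are genotype classes.
    Every row and column total is assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, Aa, aa); illustrative only.
    cases = [30, 50, 20]
    controls = [50, 40, 10]
    stat, dof = chi_square_homogeneity([cases, controls])
    critical_value_df2 = 5.991  # chi-square cutoff, 2 d.f., alpha = 0.05
    print(round(stat, 3), dof, stat > critical_value_df2)
```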
  • DNA variants leading to monogenic diseases are usually rare in a population due to the process of natural selection against those carrying the disease gene.
  • because variants in genes that are involved in polygenic disease do not produce the disease phenotype unless they occur in the appropriate combination with other gene variants, normal individuals can carry a subset of the disease-contributing variants without suffering adverse effects.
  • disease-contributing gene variants that are associated with polygenic diseases may exist at a high frequency in a normal population.
  • Monogenic diseases tend to be rare within the population, and therefore few patients may be available for studies of these diseases.
  • a polymorphism in a single specific gene is necessary and usually sufficient to cause a monogenic disease, such that associations between the variant gene and the phenotype are usually readily apparent.
  • the polymorphism present in the disease gene will not be found upon examination of a large number of normal individuals. If there is not complete penetrance then some apparently normal individuals will contain the mutation; the difference in frequency of occurrence of the variant gene in the disease group as compared to the normal population will reveal that the variant is associated with the disease.
  • variation at different genes occurs in a combination which alters susceptibility to the disease.
  • although genes may have variant forms which can contribute to a disease phenotype, it is not always necessary for a contributing variant to be present at every gene potentially contributing to the disease in a given affected individual.
  • a hypothetical disease could be caused by a particular combination of variants at three of four genes, designated as A, B, C, and D.
  • Appropriate susceptibility variants in combination at any three of the genes can cause the susceptibility, i.e. one person with increased susceptibility may have susceptibility variants in genes A, B, and C, while another individual with increased susceptibility to the same disease will have susceptibility variants in genes B, C, and D. Therefore, although not all affected individuals will have the same susceptibility variants, the net result is that a diseased population will have susceptibility variant forms of genes A, B, C, and D at a higher frequency than an unaffected population (as detected by association studies).
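  • The hypothetical three-of-four-gene model can be made concrete with a short enumeration. The sketch simply encodes the rule stated above (susceptibility variants at any three of genes A, B, C and D); the function name is hypothetical.

```python
from itertools import combinations

GENES = ("A", "B", "C", "D")


def is_susceptible(variant_genes, required: int = 3) -> bool:
    """True if susceptibility variants are present in at least `required` genes."""
    return len(set(variant_genes) & set(GENES)) >= required


if __name__ == "__main__":
    # Every minimal combination that confers susceptibility under the model.
    for combo in combinations(GENES, 3):
        print(combo, is_susceptible(combo))
    # An individual carrying variants in only two genes is unaffected.
    print(("A", "B"), is_susceptible(("A", "B")))
```

Under this model each gene appears in three of the four minimal combinations, which is consistent with every one of the four variant forms being enriched in an affected population even though no single variant is shared by all affected individuals.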
  • the polymorphisms which contribute to the polygenic disease are also present in a normal population.
  • an individual with susceptibility polymorphisms in only one or two of the genes potentially contributing to the disease susceptibility will be normal with regard to disease susceptibility. Therefore, normal populations can be used to identify polymorphic regions of the genome in the population, and these regions can then be specifically tested in larger patient and control populations.
  • a gene is analyzed for the presence of polymorphisms by testing between 2 and 100 normal individuals in order to establish if a particular polymorphism is present for that gene in the population. Once a polymorphic site(s) has been defined, the polymorphic site is then tested in case (disease) and control (normal) populations and statistical analyses are performed to identify polymorphisms which occur at significantly different frequencies in the two populations.
  • the determination of the statistical significance of polymorphism frequency differences is dependent upon the size of the observed frequency difference between the populations, and on the size of the populations being studied. If a significant difference is found, then it can be concluded that an association exists between the polymorphism and the phenotype being studied.
  • a statistically significant difference is a frequency difference at a particular site between populations which would be expected to occur by chance in only 5 out of 100 tests. That is, a difference which has a 95% probability of being a true difference due to the effect of the gene.
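One way to test a frequency difference between case and control populations is a chi-square test on a 2x2 table of allele counts. The sketch below is an assumption about how such a test might be run (Pearson chi-square, 1 degree of freedom, no continuity correction); the text does not specify a particular test, and the counts in the example are invented for illustration.

```python
import math


def chi_square_2x2(case_with, case_without, control_with, control_without):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 2x2 count table."""
    a, b, c, d = case_with, case_without, control_with, control_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = chi_square_2x2(60, 40, 40, 60)  # illustrative counts only
print(chi2, p, p < 0.05)
```

A p-value below 0.05 corresponds to the "5 out of 100 tests" criterion; both the size of the frequency difference and the number of chromosomes counted enter the statistic.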
  • polymorphisms which do not directly contribute to a disease can also be used to identify regions of the genome which contain genes that contribute to the disease by virtue of their proximity to disease-contributing polymorphisms.
  • DNA exists as 23 homologous pairs of linear molecules (chromosomes). Recombination is a process which results in reciprocal exchanges of short homologous DNA segments between these homologous DNA pairs. Only one of each of the 23 pairs of chromosomes is inherited by the offspring. The inherited chromosome is thus made up of tandemly arrayed segments of DNA derived from both of a pair of chromosomes. Consequently, DNA is transferred in segments from one generation to the next. Although the boundaries of each inherited segment may vary in each generation, the net effect is that sequences of DNA which are adjacent along the length of the molecule are inherited together at a higher frequency than sequences that are farther apart.
  • If a region (continuous linear segment) of DNA has two or more polymorphisms that are close together, they will be co-inherited at a higher frequency than polymorphisms that are farther apart, as they are more likely to remain on the same segment of DNA during recombination. Therefore, if two or more polymorphisms are close together, they will occur together at a higher frequency in a population than would be expected by random segregation. This effect is known as linkage. Linkage studies are performed using multiply affected individuals within families; the most commonly used approach is to test markers located throughout the genome in many sets of affected sib pairs that share the same phenotype.
  • Markers which are located in the region of a genome that contributes to the phenotype will be inherited in both siblings, along with the phenotype, at a higher frequency than expected by chance. Studies wherein data from many such families is compared can be used to implicate a region of a genome as one that contributes to a particular phenotype.
  • Linkage disequilibrium (LD) association studies provide another method for using polymorphisms in genetic studies. The method of LD involves making a correlation, at the population level, between the alleles (alternative polymorphic forms of the same sequence site) present at different genomic sites.
  • site 1 has two variant forms, A and a
  • site 2 has two variant forms B and b
  • the observation in a population that allele A at site 1 is more often found with allele B at locus 2 than with allele b is an example of LD.
  • If allele B is a disease-contributing polymorphism, then testing for allele A may show an association with the disease.
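The two-site example (alleles A/a at site 1 and B/b at site 2) can be quantified with the usual LD coefficient D and the squared correlation r². These statistics are standard population-genetics measures and are not named in the text; the function name and the example frequencies are illustrative assumptions.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, r_squared) from the A-B haplotype frequency and the
    frequencies of allele A (site 1) and allele B (site 2)."""
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


print(ld_statistics(0.25, 0.5, 0.5))  # A and B occur together only as often as chance predicts
print(ld_statistics(0.5, 0.5, 0.5))   # A is always found with B
```

A value of D above zero means that allele A is found with allele B more often than expected from the allele frequencies alone, which is the observation described as LD.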
  • Linkage disequilibrium may be generated in several ways. Maintenance of LD in a population allows a disease association to be detected many generations after the formation of LD. The maintenance of LD is explained by linkage: the closer the two loci, the longer (in terms of number of generations) that particular LD is maintained.
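The statement that closer loci maintain LD for more generations can be illustrated with the textbook decay relation D_t = D_0 (1 - r)^t, where r is the recombination fraction between the two loci. This relation is an assumption brought in for illustration (random mating, large population, no selection); it is not given in the text.

```python
import math


def ld_after_generations(d0, recomb_fraction, generations):
    """LD coefficient remaining after a number of generations of recombination."""
    if not 0.0 < recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    return d0 * (1.0 - recomb_fraction) ** generations


def generations_to_halve(recomb_fraction):
    """Number of generations needed for LD to fall to half its starting value."""
    if not 0.0 < recomb_fraction <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - recomb_fraction)


print(round(generations_to_halve(0.01)))   # loosely linked loci
print(round(generations_to_halve(0.001)))  # tightly linked loci
```

Under this model a tenfold smaller recombination fraction extends the half-life of LD roughly tenfold.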
  • polymorphisms which do not directly contribute to a disease can be used to identify regions of the genome which contain a disease contributing polymorphism. If a polymorphism affects gene function such that it contributes to a phenotype being studied and is found to be associated with the phenotype, nearby (neutral) polymorphisms which are in LD with the disease polymorphism may also show an association with the disease.
  • If a polymorphism does not affect gene function but is found to be associated with a particular phenotype, this polymorphism is in LD with a different, but adjacent polymorphism that affects gene function such that it contributes to the phenotype being studied. If a neutral polymorphism is always inherited with a phenotype-contributing polymorphism, then the strength of the association of the neutral polymorphism to the phenotype will be equal to that of the polymorphism which affects gene function and is contributing to the phenotype.
  • a polymorphism which shows an association with a phenotype is a marker for that phenotype and implicates the region in which the polymorphism resides as a region containing a polymorphism which contributes to the phenotype. Additional flanking polymorphisms can be tested to determine the precise location of the true phenotype-contributing variant.
  • Linkage studies on families and LD studies on populations have different degrees of resolution with regard to defining the size of a DNA region which contains the phenotype-contributing polymorphism. In general, linkage studies define an interval which potentially contains tens to hundreds of genes, while LD studies have been used to implicate single genes in the development of a particular phenotype.
  • 3. Test Populations Useful for Polymorphism Genotyping
  • the invention provides methods of determining allelic frequencies by performing genotypic analyses in appropriate test populations.
  • Bone Fracture Cohort: 1000 multiple or low trauma fracture cases and 1000 control cases to determine genetic association with fracture.
  • BMD (Bone Mineral Density) Cohort: 300 high and 300 low BMD cases to study genetic association with high or low BMD.
  • BMD Case Control Cohort: 500 low BMD and normal BMD case controls to study genetic association with low BMD/fracture.
  • treatment of osteoporosis is most effective at the time when bone loss is increasing and before the bones have become fragile and prone to fracturing.
  • Established diagnostic techniques use x-ray and ultrasonography to measure skeletal parameters of bone size, volume and mineral density to predict fracture risk and to assess response to therapy. Such measurements give a "static" value which can be compared to normal values to aid diagnosis of low bone mass and fracture risk (Schott, Cormier et al. 1998).
  • the World Health Organization defines osteoporosis as present when the bone mineral density levels are more than 2.5 standard deviations below the young normal mean.
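The World Health Organization criterion can be expressed as a T-score calculation. In the sketch below the function names and the reference values used in the example (a young normal mean of 1.0 and a standard deviation of 0.1, in arbitrary BMD units) are hypothetical; only the 2.5 standard deviation cut-off comes from the text.

```python
def t_score(bmd, young_mean, young_sd):
    """Number of standard deviations by which a BMD value differs from the young normal mean."""
    if young_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (bmd - young_mean) / young_sd


def is_osteoporotic(bmd, young_mean, young_sd):
    """True when BMD is more than 2.5 standard deviations below the young normal mean."""
    return t_score(bmd, young_mean, young_sd) < -2.5


# Hypothetical reference values, for illustration only.
print(is_osteoporotic(0.70, young_mean=1.0, young_sd=0.1))
print(is_osteoporotic(0.80, young_mean=1.0, young_sd=0.1))
```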
  • DXA: dual energy X-ray absorptiometry
  • QCT: quantitative computed tomography
  • SXA: single-energy X-ray absorptiometry
  • An alternative method to predict fracture independently of bone mass is to measure bone turnover.
  • High turnover bone resorption and formation
  • This is a "dynamic" measurement which is assessed with biochemical markers in urine or serum and can be used very effectively in therapy monitoring in preference to BMD measurements which alter more slowly (results of PEPI trial and Merck Research Laboratories).
  • biomarkers can provide more accurate fracture predictions over bone mass measurement alone.
  • markers for bone resorption: deoxypyridinoline crosslinks
  • markers for bone formation: bone alkaline phosphatase, osteocalcin
  • the invention discloses methods for performing polymorphism genotyping. These methods can be used to detect the presence of a polymorphism in a sample comprising DNA or RNA.
  • a DNA sample for analysis according to the invention may be prepared from any tissue or cell line, and preparative procedures are well-known in the art. The preparation of genomic DNA is performed as described in Section B.
  • RNA samples may also be useful for genotyping according to the invention. Isolation of RNA can be performed according to the following methods.
  • RNA is purified from mammalian tissue according to the following method. Following removal of the tissue of interest, pieces of tissue of ≤2 g are cut and quick frozen in liquid nitrogen to prevent degradation of RNA. Upon the addition of 20 ml tissue guanidinium solution per 2 g of tissue, tissue samples are ground in a tissuemizer with two or three 10-second bursts. To prepare tissue guanidinium solution (1 L), 590.8 g guanidinium isothiocyanate is dissolved in approximately 400 ml DEPC-treated H2O.
  • Homogenized tissue samples are subjected to centrifugation for 10 min at 12,000 x g at 12°C.
  • the resulting supernatant is incubated for 2 min at 65°C in the presence of 0.1 volume of 20% Sarkosyl, layered over 9 ml of a 5.7 M CsCl solution (0.1 g CsCl/ml), and separated by centrifugation overnight at 113,000 x g at 22°C. After careful removal of the supernatant, the tube is inverted and drained.
  • the bottom of the tube (containing the RNA pellet) is placed in a 50 ml plastic tube and incubated overnight (or longer) at 4°C in the presence of 3 ml tissue resuspension buffer (5 mM EDTA, 0.5% (v/v) Sarkosyl, 5% (v/v) 2-ME) to allow complete resuspension of the RNA pellet.
  • RNA solution is extracted sequentially with 25:24:1 phenol/chloroform/isoamyl alcohol, followed by 24:1 chloroform/isoamyl alcohol, precipitated by the addition of 3 M sodium acetate, pH 5.2, and 2.5 volumes of 100% ethanol, and resuspended in DEPC water (Chirgwin et al., 1979, Biochemistry, 18: 5294).
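The quantities in the guanidinium solution step can be checked with a small calculation. The molecular weight of guanidinium isothiocyanate (118.16 g/mol) is an assumed value not stated in the text, and the function names are hypothetical; the 590.8 g per litre and 20 ml per 2 g figures come from the protocol.

```python
GUANIDINIUM_ITC_MW = 118.16  # g/mol, assumed molecular weight of guanidinium isothiocyanate


def molarity(grams, litres, mol_weight):
    """Molar concentration of a solute of given mass in a given final volume."""
    return grams / mol_weight / litres


def guanidinium_volume_ml(tissue_g, ml_per_2g=20.0):
    """Volume of tissue guanidinium solution for a tissue mass (20 ml per 2 g of tissue)."""
    return tissue_g / 2.0 * ml_per_2g


print(round(molarity(590.8, 1.0, GUANIDINIUM_ITC_MW), 2))  # concentration of the 1 L stock
print(guanidinium_volume_ml(0.5))                           # ml needed for 0.5 g of tissue
```

With the assumed molecular weight, 590.8 g in a final volume of 1 L corresponds to a solution of about 5 M.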
  • RNA is isolated from mammalian tissue according to the following single step protocol.
  • the tissue of interest is prepared by homogenization in a glass teflon homogenizer in 1 ml denaturing solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7.0, 0.1 M 2-ME, 0.5% (w/v) N-lauroylsarcosine) per 100 mg tissue.
  • 0.1 ml of 2 M sodium acetate, pH 4, 1 ml water-saturated phenol, and 0.2 ml of 49:1 chloroform/isoamyl alcohol are added sequentially.
  • the sample is mixed after the addition of each component, and incubated for 15 min at 0-4°C after all components have been added.
  • the sample is separated by centrifugation for 20 min at 10,000 x g, 4°C, precipitated by the addition of 1 ml of 100% isopropanol, incubated for 30 minutes at -20°C and pelleted by centrifugation for 10 minutes at 10,000 x g, 4°C.
  • the resulting RNA pellet is dissolved in 0.3 ml denaturing solution, transferred to a microfuge tube, precipitated by the addition of 0.3 ml of 100% isopropanol for 30 minutes at -20°C, and centrifuged for 10 minutes at 10,000 x g at 4°C.
  • RNA pellet is washed in 70% ethanol, dried, and resuspended in 100-200 µl DEPC-treated water or DEPC-treated 0.5% SDS (Chomczynski and Sacchi, 1987, Anal. Biochem., 162: 156).
  • RNA prepared according to either of these methods can be used for genotyping by the methods of Northern blot analysis, S1 nuclease analysis and primer extension analysis (Ausubel et al., supra).
  • cDNA samples also may be prepared according to the invention, i.e., DNA that is complementary to RNA such as mRNA. The preparation of cDNA is well-known and well-documented in the prior art. cDNA is prepared according to the following method.
  • Total cellular RNA is isolated (as described) and passed through a column of oligo(dT)-cellulose to isolate polyA RNA.
  • the bound polyA mRNAs are eluted from the column with a low ionic strength buffer.
  • short deoxythymidine oligonucleotides (12-20 nucleotides) are hybridized to the polyA tails to be used as primers for reverse transcriptase, an enzyme that uses RNA as a template for DNA synthesis.
  • mRNA species can be primed from many positions by using short oligonucleotide fragments comprising numerous sequences complementary to the mRNA of interest as primers for cDNA synthesis.
  • RNA-DNA hybrid can be converted to a double stranded DNA molecule by a variety of enzymatic steps well-known in the art (Watson et al., 1992, Recombinant DNA, 2nd edition, Scientific American Books, New York).
  • Tissues or fluids which are useful for obtaining a DNA or RNA sample according to the invention include but are not limited to plasma, serum, spinal fluid, lymph fluid, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • Genotyping methods which are useful according to the invention, i.e., for the detection of polymorphisms in nucleic acid samples isolated from individuals, are disclosed below.
  • SSCP Single Strand Conformation Polymorphism
  • fSSCP Fluorescent SSCP Screening
  • In SSCP, single stranded DNAs that contain sequence variations are identified by an abnormal mobility on polyacrylamide gels.
  • SSCP detects all types of point mutations and short insertions or deletions that are located between the PCR primers (within the probe region) with apparently equal efficiency. This technique has proven useful for detection of multiple mutations and polymorphisms, including SNPs.
  • SSCP sensitivity varies dramatically with the size of the DNA fragment being analysed. The optimal size fragment for sensitive detection by SSCP is approximately 125-300bp.
  • the mobility of a single stranded DNA or double stranded DNA fragment during electrophoresis through a gel matrix is dependent on its size. Small molecules migrate more rapidly than large molecules because they pass through the pores in the matrix more easily.
  • electrophoresis of single stranded DNA involves a 'denaturing' gel which maintains the single strandedness of the molecules.
  • the denaturant is typically urea in polyacrylamide gels, and typically formamide or sodium hydroxide in agarose gels.
  • single-stranded DNA is analysed on a 'nondenaturing' gel.
  • test DNA samples are prepared for analysis as described above, and subjected to PCR amplification.
  • Oligonucleotide primers are designed and synthesized as described above.
  • Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 µl [α-33P]dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 µM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 µl [α-33P]dCTP (1,000-3,000 Ci mmol⁻¹
  • Dried gels are exposed to X-OMAT AR film (Kodak) and the autoradiographs are analysed and scored for aberrant migration of bands (band shifts).
  • SSCP may be optimized, as desired, as taught in Glavac et al., 1993, Hum. Mut. 2:404.
  • fSSCP Analysis
  • fSSCP does not require handling of radioactive materials. Furthermore, the fSSCP technique allows for automated data collection and automated data analysis programs that detect aberrantly migrating samples. In contrast, SSCP evaluation involves visual examination by an individual, and does not provide a means for correcting for lane to lane variations in electrophoretic conditions, as does fSSCP analysis. fSSCP Analysis is performed as follows.
  • Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, dCTP, 0.2 µM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.
  • Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, dCTP, 0.2 µM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides.
  • the PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.
  • Two µl of fluorescent PCR products are added to 3 µl formamide dye (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), denatured at 100°C for 5 min, then placed on ice. Thereafter, 0.5-1 µl of Genescan™ 1500 size markers are added as an internal standard.
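The two PCR cycling profiles described above can be captured as data, which makes it easy to estimate the programmed run time. This is a minimal sketch: the class and field names are hypothetical, ramp times between temperatures are ignored, and the 5 min hold at 98°C before enzyme addition is counted as the preheat step of the Vent profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProfile:
    preheat_s: int
    denature_s: int
    anneal_s: int
    extend_s: int
    cycles: int
    final_extension_s: int

    def total_seconds(self):
        """Programmed hold time, excluding ramping between temperatures."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.preheat_s + self.cycles * per_cycle + self.final_extension_s


# Taq: 94 C 3 min; 35 x (94 C 1 min, anneal 30 s, 72 C 45 s); 72 C 5 min.
TAQ_PROFILE = CyclingProfile(180, 60, 30, 45, 35, 300)
# Vent: 98 C 5 min; 35 x (98 C 1 min, anneal 45 s, 72 C 1 min); 72 C 5 min.
VENT_PROFILE = CyclingProfile(300, 60, 45, 60, 35, 300)

print(TAQ_PROFILE.total_seconds() / 60, VENT_PROFILE.total_seconds() / 60)
```

The annealing temperature is left out of the structure because it differs for each primer pair.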
  • Although SSCP and fSSCP techniques are preferred according to the invention, other methods for detecting sequence variations, including DNA sequencing, can be employed. Additional techniques for detecting DNA sequence variations useful according to the invention are described below.
  • Fluorescence polarization-TDI is another preferred technique according to the invention for the detection of sequence variations.
  • Template-directed primer extension is a dideoxy chain terminating DNA sequencing protocol designed to ascertain the nature of the one base immediately 3' to the sequencing primer that is annealed to the target DNA immediately upstream from the polymorphic site.
  • ddNTP dideoxyribonucleoside triphosphate
  • the primer is extended specifically by one base as dictated by the target DNA sequence at the polymorphic site. By determining which ddNTP is incorporated, the alleles present in the target DNA can be determined.
  • Fluorescence polarization is based on the observation that when a fluorescent molecule is excited by plane-polarized light, it emits polarized fluorescent light into a fixed plane if the molecules remain stationary between excitation and emission. However, because the molecule rotates and tumbles in solution, fluorescence polarization is not observed fully by an external detector.
  • the fluorescence polarization of a molecule is proportional to the molecule's rotational relaxation time, which is related to the viscosity of the solvent, absolute temperature, molecular volume, and the gas constant. If the viscosity and temperature are held constant, then fluorescence polarization is directly proportional to the molecular volume, which is directly proportional to the molecular weight.
  • If the fluorescent molecule is large (with high molecular weight), it rotates and tumbles more slowly in solution and fluorescence polarization is preserved. If the molecule is small (with low molecular weight), it rotates and tumbles faster and fluorescence polarization is largely lost (depolarized).
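The dependence of rotational relaxation time on viscosity, temperature, molecular volume and the gas constant is commonly written as ρ = 3ηV/(RT). This formula is an assumption brought in for illustration (the text names the variables but does not give the equation), and the numeric inputs in the example are invented.

```python
GAS_CONSTANT = 8.314  # J / (mol K)


def rotational_relaxation_time(viscosity_pa_s, molar_volume_m3, temperature_k):
    """Rotational relaxation time in seconds, rho = 3 * eta * V / (R * T)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return 3.0 * viscosity_pa_s * molar_volume_m3 / (GAS_CONSTANT * temperature_k)


# Illustrative values: water-like viscosity at 25 C, two molar volumes differing tenfold.
small = rotational_relaxation_time(0.00089, 0.0005, 298.15)
large = rotational_relaxation_time(0.00089, 0.005, 298.15)
print(small, large, large / small)
```

At constant viscosity and temperature the relaxation time scales linearly with molecular volume, which is why a dye incorporated into a larger molecule retains more of its polarization.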
  • the sequencing primer is an unmodified primer with its 3' end immediately upstream from a polymorphic or mutation site.
  • the allele-specific dye ddNTP is incorporated onto the TDI primer in the presence of DNA polymerase and target DNA.
  • the genotype of the target DNA molecule can be determined simply by exciting the fluorescent dye in the reaction and determining whether a change in fluorescence polarization occurs. Chen et al., 1999, Genome Res., 9:492.
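Genotype assignment from fluorescence polarization readings can be sketched as a simple threshold rule, one reading per allele-specific dye. The function name, the baseline readings and the minimum increase of 40 mP are hypothetical assumptions; the text only states that a change in polarization indicates incorporation of the dye ddNTP.

```python
def call_fp_tdi_genotype(fp_allele1, fp_allele2, baseline1, baseline2, min_increase=40.0):
    """Call a genotype from FP readings (mP) of two allele-specific dye terminators.

    A dye whose FP rises by at least min_increase over its no-template baseline
    is taken to have been incorporated onto the TDI primer.
    """
    allele1 = (fp_allele1 - baseline1) >= min_increase
    allele2 = (fp_allele2 - baseline2) >= min_increase
    if allele1 and allele2:
        return "heterozygous"
    if allele1:
        return "homozygous allele 1"
    if allele2:
        return "homozygous allele 2"
    return "no call"


print(call_fp_tdi_genotype(160.0, 55.0, baseline1=50.0, baseline2=50.0))
print(call_fp_tdi_genotype(150.0, 140.0, baseline1=50.0, baseline2=50.0))
```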
  • One or more test DNA samples are prepared for analysis as described above, and subjected to PCR amplification.
  • Oligonucleotide primers are designed and synthesized as described above. Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 µl [α-33P]dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 µM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows : preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min.
  • Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above.
  • Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 µl [α-33P]dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 µM of each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides.
  • the PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min.
  • the length and temperature of each step of a PCR cycle, as well as the number of cycles, is adjusted in accordance to the stringency requirements, as described above.
  • unused PCR primers and dNTPs are destroyed by adding 2 µl of PCR product to 2 µl of SAP/Exonuclease cocktail (0.1 U shrimp alkaline phosphatase (1 U/µl, Amersham Pharmacia Biotech, Inc., Piscataway, NJ) and 0.2 U E. coli exonuclease I (10 U/µl, Amersham) in SAP buffer (20 mM Tris-HCl, pH 8.0; 10 mM MgCl2, Amersham)) per well of a 384-well Black PCR plate (ABT). The mixtures are incubated at 37°C for 60 min before the enzymes are heat inactivated at 95°C for 15 min. The mixture is held at 4°C until used in the FP-TDI assay.
  • TDI reaction cocktail containing TDI buffer (50 mM Tris-HCl (pH 9.0), 50 mM KCl, 5 mM NaCl, 2 mM MgCl2, 8% glycerol), 1 µM TDI primer, 12.5 nM of each of two allele-specific dye-labeled ddNTPs (ROX-ddGTP, BFL-ddATP, Tamra-ddCTP, or R6G-ddUTP; NEN Life Science Products, Inc., Boston, MA), and 0.32 U Thermo Sequenase (Amersham).
  • the reaction mixtures are incubated at 94°C for 15 min, followed by 34 cycles of 94°C for 30 seconds and 55°C for 15 seconds. Upon completion of the reaction cycles, the samples are held at 4°C.
  • Denaturing gradient gel electrophoresis (DGGE) is a gel system which allows electrophoretic separation of DNA fragments differing in sequence by a single base pair. The separation is based upon differences in the temperature of strand dissociation of the wild-type and mutant molecules.
  • fragments migrating through the gel are exposed to an increasing concentration of denaturant in the gel.
  • the DNA strands begin to dissociate. This dissociation causes a significant reduction in the mobility of the fragment.
  • the position in the gel at which the level of denaturant is critical for a particular DNA fragment is a function of the Tm of the DNA fragment and is therefore different for wild-type versus mutant fragments.
  • the mutation detection rate of DGGE approaches 100%.
  • DGGE can only be used to analyze fragments between 100 and 800 bp due to the resolution limit of polyacrylamide gels.
  • DGGE is advantageous over other methods useful for detecting sequence variations because the behavior of DNA molecules on DGGE gels can be modeled by computer thereby making it possible to accurately predict the detectability of a mutation in a given fragment. Genomic DNA fragments can be efficiently transferred from the gel following DGGE as described in US Patent No. 5,190,856.
  • Chemical Cleavage of Mismatches is another technique for detection of sequence variations that is useful according to the invention.
  • CCM is based upon the ability of hydroxylamine and osmium tetroxide to react with the mismatch in a DNA heteroduplex and the ability of piperidine to cleave the heteroduplex at the point of mismatch.
  • sequence variations are detected by the appearance of fragments that are smaller than the untreated heteroduplex following denaturing polyacrylamide gel electrophoresis.
  • DNA fragments up to 1 kb in size can be analysed by CCM with a probable 100% detection rate for sequence variation.
  • CCM is particularly useful for either detecting all of the sequence variations in a particular fragment of DNA or for determining that there are no sequence variations in a particular fragment of DNA.
  • Constant denaturant capillary electrophoresis (CDCE) analysis is particularly useful in high throughput screening, i.e., wherein large numbers of DNA samples are analysed.
  • CDCE analysis combines several elements of both replaceable linear polyacrylamide capillary electrophoresis and constant denaturant gel electrophoresis.
  • the technique of CDCE is a rapid, high resolution procedure that demonstrates a high dynamic range, and is automatable.
  • the method of CDCE as described in detail in Khrapko et al., 1994, Nucleic Acids Res. 22:364, involves the use of a zone of constant temperature and a denaturant concentration in capillary electrophoresis. Linear polyacrylamide gel electrophoresis is performed at viscosity levels that permit facile replacement of the matrix after each run.
  • point mutation-containing heteroduplexes are separated from wild type homoduplexes in less than 30 minutes.
  • the system has an absolute limit of detection of 3 x 10^4 molecules with a linear dynamic range of six orders of magnitude.
  • the relative limit of detection is about 3/10,000, i.e., 100,000 mutant sequences are recognized among 3 x 10^8 wild type sequences. This approach is applicable to analysis of low frequency mutations, and to genetic screening of pooled samples for detection of rare variants.
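The absolute and relative limits of detection quoted for CDCE can be combined into a simple check of whether a mutant population in a pooled sample is expected to be detectable. The function name and the decision to require both limits at once are assumptions made for illustration.

```python
ABSOLUTE_LIMIT = 3e4         # molecules, absolute limit of detection
RELATIVE_LIMIT = 3 / 10_000  # mutant sequences per wild type sequence


def cdce_detectable(mutant_copies, wild_type_copies):
    """True if the mutant population meets both the absolute and relative limits."""
    if wild_type_copies <= 0:
        raise ValueError("wild_type_copies must be positive")
    above_absolute = mutant_copies >= ABSOLUTE_LIMIT
    above_relative = mutant_copies / wild_type_copies >= RELATIVE_LIMIT
    return above_absolute and above_relative


print(cdce_detectable(100_000, 3e8))  # the example given in the text
print(cdce_detectable(50_000, 3e8))   # below the relative limit
```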
  • RNASE Cleavage
  • RNASE A, RNASE T1 and RNASE T2 specifically digest single stranded RNA.
  • When RNA is annealed to form double stranded RNA or an RNA/DNA duplex, it can no longer be digested with these enzymes.
  • cleavage at the point of mismatch may occur.
  • RNASE Cleavage is preferably performed with RNASE A.
  • Ribonuclease A specifically digests single stranded RNA but can also cleave heteroduplex molecules at the point of mismatch. The extent of cleavage at single base mismatches depends on both the type of mismatch, and the sequence of DNA flanking the mismatch. Sequence variations leading to mismatch are indicated by the presence of fragments that are smaller than the uncleaved heteroduplex on denaturing polyacrylamide gels.
  • RNASE Cleavage involves forming a heteroduplex between a radiolabeled single stranded RNA probe (riboprobe) and a PCR product derived from a biological sample. If a point mutation is present in the PCR product, following treatment of the resulting RNA/DNA heteroduplex with RNASE A, the RNA strand of the duplex may be cleaved. The sample is then denatured by heating and analysed on a denaturing polyacrylamide gel. If the RNA probe has not been cleaved, it will be the same size as the PCR product. If the probe has been cleaved, it will be smaller than the PCR product.
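The expected gel result of an RNASE Cleavage assay can be sketched as a list of riboprobe fragment sizes. The sketch assumes complete cleavage at a single mismatch (the text notes that cleavage may be only partial and depends on the mismatch type and flanking sequence); the function name and the 300 nt probe length are hypothetical.

```python
def riboprobe_fragments(probe_length, cleavage_after=None):
    """Fragment lengths (nt) of the RNA probe seen on a denaturing gel.

    cleavage_after is the number of probe nucleotides 5' of the cleavage point,
    or None when the PCR product contains no mismatch.
    """
    if cleavage_after is None:
        return [probe_length]  # uncleaved probe, same size as the PCR product
    if not 0 < cleavage_after < probe_length:
        raise ValueError("cleavage point must lie inside the probe")
    return sorted([cleavage_after, probe_length - cleavage_after])


print(riboprobe_fragments(300))       # no sequence variation
print(riboprobe_fragments(300, 120))  # point mutation 120 nt into the probe
```

Any fragment shorter than the full-length probe indicates a sequence variation, and the fragment sizes place it within the amplified region.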
  • RNASE Cleavage can be used to easily detect a 1 bp deletion. However, small insertions may not be as easily detected as small deletions, by RNASE Cleavage, as 'looping-out' occurs on the target strand rather than the probe strand.
  • Heteroduplex Analysis
  • Another method for genotyping according to the invention is heteroduplex analysis.
  • Heteroduplex molecules i.e., double stranded DNA molecules containing a mismatch
  • the exact rate of detection of sequence variations by heteroduplex analysis is unknown, but is clearly significantly lower than 100%. Presumably, the sequence of DNA flanking the mismatch, rather than the actual mismatch affects the detectability. Mismatches that are located in the middle of a DNA fragment are detected most easily.
  • Although heteroduplex analysis is less sensitive than some of the other genotyping methods described, it may be considered useful according to the invention due to its simplicity.
  • Mismatch repair detection (MRD) is an in vivo method that detects DNA sequence variation by the occurrence of a change in bacterial colony color. DNA fragments to be screened for variation are cloned into two MRD plasmids, and bacteria are transformed with heteroduplexes of these constructs. The resulting colonies are blue in the absence of a mismatch and white in the presence of a mismatch. MRD can be used to detect a single mismatch in a DNA fragment as large as 10 kb in size. MRD permits high-throughput screening of genetic mutations, and is described in detail in Faham et al., 1995, Genome Research 5:474.
  • Mismatch Recognition by DNA Repair Enzymes
  • Another technique that is useful for detecting sequence variations according to the invention is Mismatch Recognition by DNA Repair Enzymes.
  • the E.coli mismatch correction systems are well-understood.
  • Three of the proteins required for the methyl-directed DNA repair pathway: MutS, MutL and MutH are sufficient to recognize 7 of the possible 8 single base-pair mismatches (C/C mismatches are not recognized) and cut/nick the DNA at the nearest GATC sequence.
  • the MutY protein which is involved in a distinct repair system can also be used to detect A/G and A/C mismatches.
  • thymidine glycosylase can recognize all types of T mismatch and 'all-type endonuclease' or Topoisomerase I is capable of detecting all 8 mismatches, but does so with varying efficiencies, depending on both the type of mismatch and the neighboring sequence.
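The mismatch specificities listed above can be collected into a small lookup. The function name and the string labels are hypothetical; the recognition rules are taken from the text, and the varying efficiencies mentioned for the all-type endonuclease and topoisomerase I are not modeled.

```python
# The eight possible single base-pair mismatches, written with bases in alphabetical order.
MISMATCHES = ["A/A", "C/C", "G/G", "T/T", "A/C", "A/G", "C/T", "G/T"]


def recognizing_systems(mismatch):
    """List the repair systems described as able to recognize a given mismatch."""
    m = "/".join(sorted(mismatch.upper().split("/")))
    if m not in MISMATCHES:
        raise ValueError("not a single base-pair mismatch: " + mismatch)
    systems = []
    if m != "C/C":
        systems.append("MutS/MutL/MutH")
    if m in ("A/G", "A/C"):
        systems.append("MutY")
    if "T" in m:
        systems.append("thymidine glycosylase")
    systems.append("all-type endonuclease / topoisomerase I")
    return systems


for mm in MISMATCHES:
    print(mm, recognizing_systems(mm))
```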
  • the MutS gene product is the methyl-directed repair protein which binds to the mismatch.
  • Purified MutS protein has been used to detect mutations by several different methods. Gel mobility assays can be performed in which DNA bound to the MutS protein migrates more slowly through an acrylamide gel than free DNA. This method has been used to detect single base mismatches.
  • An alternative method for the use of MutS in mismatch recognition, which does not require gel electrophoresis, involves the immobilization of MutS protein on nitrocellulose membranes. Labeled heteroduplexed DNA is used to probe the membrane in a dot-blot format. When both DNA strands are used, all mismatches can be recognized by binding of the DNA to the protein attached to the membrane. Although C/C mismatches are not detected, the corresponding G/G mismatch derived from the other strand is recognized. This technique is particularly useful because it is simple, inexpensive, and amenable to automation. However, the detection efficiency of this method may be limited by the size of the DNA fragment. In particular, this method works well for very short fragments.
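The reason every base change is detectable when both strands are used can be shown with a short calculation. A base change converts a wild type pair W:W' into a variant pair M:M', so annealing wild type and variant DNA gives two heteroduplexes, W/M' and M/W'. The function names below are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def partner_mismatch(mismatch):
    """Mismatch carried by the heteroduplex formed from the other two strands."""
    x, y = mismatch.upper().split("/")
    return "/".join(sorted((COMPLEMENT[y], COMPLEMENT[x])))


def muts_detects_either_strand(mismatch):
    """True if MutS (which does not bind C/C) recognizes at least one of the two heteroduplexes."""
    own = "/".join(sorted(mismatch.upper().split("/")))
    return any(m != "C/C" for m in (own, partner_mismatch(mismatch)))


print(partner_mismatch("C/C"))            # the undetected C/C is paired with G/G
print(muts_detects_either_strand("C/C"))
```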
  • An alternative method for detecting sequence variations according to the invention is sequencing by hybridization (SBH).
  • arrays of short (8-10 base long) oligonucleotides are immobilized on a solid support in a manner similar to the reverse dot-blot protocol, and probed with a target DNA fragment.
  • oligonucleotides are synthesized together and directly onto the support.
  • the synthesis system begins with a silicon chip coated with a nucleotide linked to a light-sensitive blocking group; light is then used to illuminate particular grid co-ordinates, removing the blocking group at these positions.
  • the chip is then exposed to the next photoprotected nucleotide, which polymerizes onto the exposed nucleotides.
  • oligonucleotides of different sequences can be synthesized at different positions on the solid support. Thirty-two cycles of specific additions (i.e., 8 additions of each of the four nucleotides) should enable the production of all 65,536 possible 8-mer oligonucleotides at defined positions on the chip.
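The figures quoted for the array can be verified by enumeration: four nucleotides at each of eight positions give 4^8 = 65,536 sequences, and one addition per nucleotide per position gives 32 cycles. The function names are hypothetical.

```python
from itertools import product


def all_oligos(length=8, alphabet="ACGT"):
    """Every possible oligonucleotide of the given length."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


def synthesis_cycles(length=8, alphabet="ACGT"):
    """One addition of each nucleotide per position of the oligonucleotide."""
    return length * len(alphabet)


print(len(all_oligos()), synthesis_cycles())
```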
  • a DNA molecule e.g., a fluorescently labeled PCR product
  • fully matched hybrids should give a high intensity of fluorescence and hybrids with one or more mismatches should give substantially less intense fluorescence.
  • the combination of the position and intensity of the signals on the chip enables computers to derive the sequence of the DNA molecule being analysed for the presence of sequence variations.
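The probe arithmetic and the read-out logic described for SBH can be sketched in a few lines of Python. This is a minimal illustration, not the method of the invention: it assumes an idealized chip on which every fully matched probe gives a signal and every mismatched probe gives none, and it treats probes as reading the target strand directly rather than its complement. The function names (`all_kmers`, `synthesis_cycles`, `spectrum`, `probe_changes`) are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Enumerate every possible oligonucleotide of length k."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def synthesis_cycles(k):
    """Specific additions needed: one per base for each of the k positions."""
    return len(BASES) * k


def spectrum(target, k):
    """Set of k-mers present in the target (probes expected to light up)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def probe_changes(reference, sample, k):
    """Probes that lose signal and probes that gain signal in the sample."""
    ref, obs = spectrum(reference, k), spectrum(sample, k)
    return ref - obs, obs - ref


# 8-mers: 4**8 probes, made in 4 * 8 specific additions.
print(len(all_kmers(8)), synthesis_cycles(8))  # 65536 32

# A single substitution (C to T at the fourth base) changes k overlapping probes.
lost, gained = probe_changes("AACCGGTT", "AACTGGTT", 3)
print(sorted(lost), sorted(gained))
```

Under these assumptions a single base change alters up to k overlapping probes, which is why the pattern of lost and gained signals localizes the variation.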
  • The technique of allele-specific oligonucleotide (ASO) hybridization or the 'dot-blot' is also useful for genotyping according to the invention.
  • under suitably stringent hybridization conditions, an oligonucleotide will only bind to a PCR product if the two are perfectly complementary.
  • a single base pair mismatch is sufficient to prevent hybridization.
  • a pair of oligonucleotides, one carrying the wild type base and the other carrying a single base change, as compared to the wild type sequence, can be used to determine if a PCR product is homozygous wild type, heterozygous or homozygous mutant for a particular base change.
  • When performing conventional dot blots, the PCR product is fixed onto a nylon membrane and probed with a labeled oligonucleotide.
  • When performing a 'reverse dot blot', an oligonucleotide is fixed to a membrane and probed with a labeled PCR product.
  • the probe may be isotopically labeled, or non-isotopically labeled.
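The genotype call from a pair of allele-specific probes can be written as a small decision rule. The sketch below is illustrative only: the signal scale, the `threshold` value and the function name `call_genotype` are assumptions introduced here, and a real assay would calibrate the cut-off against control samples.

```python
def call_genotype(wt_signal, mut_signal, threshold=0.5):
    """Call a genotype from two dot-blot signals (arbitrary units).

    wt_signal  -- signal from the probe carrying the wild type base
    mut_signal -- signal from the probe carrying the single base change
    threshold  -- hypothetical cut-off separating binding from background
    """
    wt_bound = wt_signal >= threshold
    mut_bound = mut_signal >= threshold
    if wt_bound and mut_bound:
        return "heterozygous"
    if wt_bound:
        return "homozygous wild type"
    if mut_bound:
        return "homozygous mutant"
    return "no call"  # neither probe bound: failed PCR or failed hybridization


for signals in [(0.9, 0.1), (0.8, 0.7), (0.05, 0.95), (0.1, 0.1)]:
    print(signals, call_genotype(*signals))
```

Keeping an explicit "no call" outcome avoids reporting a genotype when neither probe hybridizes.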
  • the allele-specific polymerase chain reaction (also called the amplification refractory mutation system or ARMS) comprises an assay that occurs during the PCR reaction itself.
  • ARMS requires the use of sequence-specific PCR primers which differ from each other at their terminal 3' nucleotide and are designed to amplify only the normal allele in one reaction, and only the mutant allele in another reaction.
  • amplification occurs only when the 3' terminal nucleotide of the primer matches the template.
  • Agarose gel electrophoresis is used to detect the presence of an amplified product.
  • a heterozygous sample is characterized by amplification products in both reactions, and a homozygous mutant sample generates product in only the mutant reaction.
  • This technique can be modified so that the 5' ends of the allele-specific primers are labeled with different fluorescent labels, and the 5' ends of the common primers are biotin labeled.
  • the wild-type specific and the mutant-specific reactions are performed in a single tube.
  • the advantages of this approach are that a gel electrophoresis step is not required, and the method is amenable to automation.
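The ARMS primer pair and the expected outcome of the two reactions can be sketched as follows. This is a simplified model under stated assumptions: primers are taken from the forward strand, a primer is assumed to extend only if its 3' terminal nucleotide matches the allele, and the example sequence, position and function names (`arms_primers`, `arms_reactions`) are hypothetical.

```python
def arms_primers(ref_seq, pos, alt_base, length=20):
    """Return (normal_primer, mutant_primer) ending at the variant position.

    ref_seq  -- wild type sequence, 5'->3'
    pos      -- 0-based index of the variant base in ref_seq
    alt_base -- base found at that position in the mutant allele
    """
    start = pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for a primer of this length")
    stem = ref_seq[start:pos]
    # The two primers differ only at their 3' terminal nucleotide.
    return stem + ref_seq[pos], stem + alt_base


def arms_reactions(sample_alleles, normal_primer, mutant_primer):
    """Predict (product in normal reaction, product in mutant reaction)."""
    normal = any(a == normal_primer[-1] for a in sample_alleles)
    mutant = any(a == mutant_primer[-1] for a in sample_alleles)
    return normal, mutant


ref = "ACGTTGCAAGGCTTAACCGGTATCGA"  # hypothetical wild type sequence
normal_p, mutant_p = arms_primers(ref, pos=20, alt_base="C", length=10)
print(normal_p, mutant_p)

for alleles in [("T", "T"), ("T", "C"), ("C", "C")]:
    print(alleles, arms_reactions(alleles, normal_p, mutant_p))
```

Product in both reactions corresponds to a heterozygous sample, and product in only one reaction to the corresponding homozygous genotype.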
  • the method of primer-introduced restriction analysis (PIRA) can also be used for genotyping according to the invention.
  • PIRA is a technique which allows known sequence variations to be detected by restriction digestion.
  • By introducing a base change close to the position of a known sequence variation, for example by using a PCR primer containing a mismatch as compared to the target sequence, it is possible to create a restriction endonuclease recognition site that indicates the presence of a particular sequence change.
  • the combination of the altered base in the primer sequence and the altered base at the mutation site creates a new restriction enzyme target site. This approach may be used to create a new restriction enzyme site in either the wild-type allele or the mutant allele.
  • the homozygous wild-type form would produce a single band of the full-length size
  • the homozygous mutant form would produce a single band of the reduced size
  • the heterozygous form would produce both full length and reduced sized bands. Band size will be analysed by gel electrophoresis.
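The band patterns listed above can be reproduced with a small simulation. The sketch assumes a hypothetical primer whose introduced mismatch yields the sequence GAATT at its 3' end, so that a C at the variant position completes an EcoRI site (GAATTC, cut after the first G) in the mutant allele only; the sequences and function names are invented for illustration.

```python
def digest(amplicon, site, cut_offset):
    """Fragment lengths after cutting at the first occurrence of site."""
    i = amplicon.find(site)
    if i < 0:
        return [len(amplicon)]  # no site: product stays full length
    cut = i + cut_offset
    return [cut, len(amplicon) - cut]


def gel_bands(allele_amplicons, site, cut_offset):
    """Distinct band sizes expected on the gel, largest first."""
    bands = set()
    for amplicon in allele_amplicons:
        bands.update(digest(amplicon, site, cut_offset))
    return sorted(bands, reverse=True)


PRIMER = "CCTGACGTTAGGCTAGAATT"           # hypothetical mismatch primer
TAIL = "AGGCTTACGGATCAGTCCATGCAAGTCTAG"   # hypothetical downstream sequence
wild_type = PRIMER + "T" + TAIL           # GAATTT: no site
mutant = PRIMER + "C" + TAIL              # GAATTC: site created

print(gel_bands([wild_type, wild_type], "GAATTC", 1))  # [51]
print(gel_bands([mutant, mutant], "GAATTC", 1))        # [35, 16]
print(gel_bands([wild_type, mutant], "GAATTC", 1))     # [51, 35, 16]
```

In this toy case the 16 bp fragment would be hard to see on a gel, which matches the description of a single reduced-size band for the homozygous mutant.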
  • Oligonucleotide Ligation Assay. The technique of oligonucleotide ligation can also be used for genotyping according to the invention.
  • oligonucleotide ligation is based on the following observations. If two oligonucleotides are annealed to a strand of DNA and are exactly juxtaposed, they can be joined by the enzyme DNA ligase. If there is a single base pair mismatch at the junction of the two oligonucleotides then ligation will not occur. According to the method of oligonucleotide ligation, the two oligonucleotides used in the assay are modified by the addition of two different labels.
  • the assay detects a ligated product by the appearance of the labels of the two oligonucleotides on a single molecule, rather than by visualization of a new, larger sized DNA fragment by gel electrophoresis.
  • the oligonucleotide ligation assay can be performed by a robot and the results can be analysed by a plate reader and fed directly into a computer. This method is therefore extremely useful for detecting the presence of a sequence variation in a large number of samples.
  • the oligonucleotide ligation assay is performed on PCR-amplified DNA.
  • a modification of this assay, termed the ligase chain reaction, is performed on genomic DNA and involves amplification with a thermostable DNA ligase.
  • Direct DNA Sequencing Genotyping may also be carried out by directly sequencing the
  • Mini-Sequencing also known as single nucleotide primer extension
  • the technique of mini-sequencing can also be used to detect any known point mutation, deletion or insertion, according to the invention.
  • Obtaining sequence information for just a single base pair only requires the sequencing of that particular base. This can be done by including only one base in the sequencing reaction rather than all four. When this base is labeled and complementary to the first base immediately 3' to the primer (on the target strand), the label will be incorporated. Thus, a given base pair can be sequenced on the basis of label incorporation or failure of incorporation without the need for electrophoretic size separation.
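The incorporation logic of mini-sequencing can be illustrated with a short sketch. It assumes the template is written 5'->3', that the primer anneals at exactly one place, and that the polymerase adds the base complementary to the template position just beyond the primer's 3' end; the sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def next_base(template, primer):
    """Base the polymerase adds first when the primer anneals to the template.

    Both sequences are written 5'->3'. The primer pairs with the reverse
    complement of itself on the template and is extended towards the
    template's 5' end.
    """
    site = reverse_complement(primer)
    i = template.find(site)
    if i <= 0:
        raise ValueError("primer does not anneal, or no template base to copy")
    return COMPLEMENT[template[i - 1]]


def label_incorporated(template, primer, labeled_base):
    """True if a reaction containing only labeled_base extends the primer."""
    return next_base(template, primer) == labeled_base


template = "GGATCCATGCAAGT"   # hypothetical target strand
primer = "ACTTGCAT"           # anneals to ATGCAAGT on the template
for base in "ACGT":
    print(base, label_incorporated(template, primer, base))
```

Only the reaction containing the complementary base (G in this example) incorporates label, so four single-base reactions identify the queried position.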
  • Genotyping according to the invention can also be performed by the method of 5' nuclease assay.
  • the 5' nuclease assay is a technique that monitors the extent of amplification in a PCR reaction on the basis of the degree of fluorescence in the reaction mix. A low level of fluorescence indicates no amplification or very poor amplification and a high level of fluorescence indicates good amplification.
  • This system can be adapted to permit identification of known sequence variations, without the need for any post-PCR analysis other than fluorescence emission analysis.
  • PCR amplification is detected by measuring the 5' to 3' exonuclease activity of Taq polymerase.
  • Taq polymerase cleaves 5' terminal nucleotides of double stranded DNA.
  • the preferred substrate for Taq polymerase is a partially double stranded molecule.
  • Taq polymerase cleaves the strand that contains the closest free 5' end.
  • an oligonucleotide 'probe' which is phosphorylated at its 3' end so as to render it incapable of serving as a DNA synthesis primer, is included in the PCR reaction.
  • the probe is designed to anneal to a position between the two amplification primers.
  • the probe is labeled in a manner that permits detection of the removal of the probe.
  • the probe is labeled at different positions with two different fluorescent labels.
  • One label has a localized quenching effect on the fluorescence of the other (reporter) label. This effect is mediated by energy transfer from one dye to the other, and requires that the two dyes are in close proximity to each other.
  • Genotyping according to the invention can also be carried out by Representational Difference Analysis (RDA).
  • RDA is described in detail in Lisitsyn et al., 1993, Science 259:946, and an adaptation which combines selective breeding with RDA is described in Lisitsyn et al., 1993, Nature Genet. 6:57.
  • RDA identifies sequence dissimilarities through the application of a powerful approach to subtractive hybridization.
  • An amplicon can comprise, for example, the set of BglII fragments that are small enough to be amplified by the PCR.
  • the iterative subtraction step begins with the ligation of a special adaptor to the 5' end of fragments contained in the amplicon derived from the test sample (tester amplicon).
  • the tester amplicon is then melted and briefly reannealed in the presence of a large excess of amplicon derived from the wild type sample (driver amplicon).
  • Those tester fragments that reanneal can serve as a template for the addition of the adaptor sequence to the 3'-end of the "partner" fragment.
  • these tester fragments can be exponentially amplified by PCR. This procedure is then repeated to achieve successively higher enrichment.
  • RDA may be used to clone sequences that are either wholly absent from the wild type sample or are present in the wild type DNA, but are contained in a restriction fragment that is too large to be amplified in the amplicon.
  • the former case may arise from a total deletion; the latter from a restriction fragment length polymorphism with the short allele present in the tester but not the wild type DNA.
  • RDA is useful for subtracting DNA from an individual with a particular disease from normal DNA so as to identify regions showing homozygous or heterozygous deletions; locating fragments present in a parent with a dominant disorder but absent in his unaffected offspring; and locating mRNAs expressed in normal tissue but not present in tissue isolated from an individual with a particular disease.
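The two situations in which RDA recovers a fragment can be modelled as a set difference between amplicons. This is only a conceptual sketch: fragments are represented by a hypothetical locus name and a restriction fragment length, and `max_len` (the largest fragment assumed to amplify) is an illustrative value not taken from the text.

```python
def amplicon(fragments, max_len):
    """Loci whose restriction fragment is small enough to be amplified."""
    return {locus for locus, length in fragments.items() if length <= max_len}


def rda_targets(tester, driver, max_len=1000):
    """Loci represented in the tester amplicon but not in the driver amplicon."""
    return amplicon(tester, max_len) - amplicon(driver, max_len)


# Hypothetical digests: locus name -> fragment length in bp.
tester = {"locusA": 400, "locusB": 600, "locusC": 900, "locusD": 5000}
driver = {"locusA": 400, "locusB": 3500, "locusD": 5000}

print(sorted(rda_targets(tester, driver)))  # ['locusB', 'locusC']
```

Here locusC is wholly absent from the driver, while locusB is present in the driver only on a fragment too large to amplify, the short allele being in the tester.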
  • Denaturing high-performance liquid chromatography (DHPLC)
  • partial heat denaturation and a linear acetonitrile gradient are used to identify polymorphisms in DNA fragments.
  • DHPLC provides a method of comparative DNA sequencing based on the capability of ion-pair reverse phase liquid chromatography on alkylated nonporous poly(styrene divinylbenzene) particles to resolve homo- from heteroduplex molecules under conditions of partial denaturation. This method can potentially be automated to allow for rapid analysis of a large number of samples (Underhill et al., 1996, Proc. Natl. Acad. Sci. USA, 93:196).
  • Matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectroscopy is another method according to the invention by which genotyping can be performed.
  • the method of MALDI-TOF mass spectroscopy is based on the irradiation of crystals formed by suitable small organic molecules (referred to as the matrix) with a short laser pulse at a wavelength close to the resonant absorption band of the matrix molecules. This causes an energy transfer and desorption process producing matrix ions. Low concentrations of nucleic acid molecules are added to the matrix molecules while in solution and become embedded in the solid matrix crystals upon drying of the mixture.
  • the intact nucleic acids are then desorbed into the gas phase and ionized upon irradiation with a laser allowing their mass analysis.
  • MALDI is used primarily with time-of-flight spectrometers where the time of flight is related to the mass-to-charge ratio of the nucleic acid molecules. Reviewed in Griffin T.J. and Smith L.M., 2000, Trends Biotech 18:77. Genotyping can be performed by any of the following MALDI-TOF mass spectroscopy approaches including sequencing of PCR products (Fu, D-J et al., 1998, Nat. Biotechnol. 16:381; Kirpekar, F. et al., Nucleic Acids Res.
  • the invention provides methods for specifying a particular polymorphism.
  • by specifying a polymorphism is meant defining a polymorphism in the context of a larger region of nucleic acid which contains the polymorphism, and is of sufficient length to be easily differentiated from any other position in the genome.
  • a unique nucleotide position (e.g. a polymorphic site) in the human genome can be specified by describing a unique sequence of DNA within the genome, and providing the location of the unique nucleotide position relative to that sequence. Preferably this is done by providing the sequence identity of a length of unique DNA containing the polymorphism, and indicating which of the nucleotide sites is polymorphic.
  • 16 bp would uniquely define a sequence in the genome.
  • the genome is not composed of random sequence and does not contain equal amounts of A, G, C and T.
  • 10-12 bp sequences are likely to be specific for 95% of genes. Some sequences may even be specified by as few as 8 nucleotides.
  • the minimum sequence length that is useful according to the invention for identifying polymorphisms in most gene and intergenic sequences is approximately 9-15 bp.
  • In the case of repeat sequences and sequences associated with gene families, the probability of observing a particular sequence is greatly increased and it becomes difficult to specify a polymorphism in the context of a sequence that is only on the order of 9-15 bp.
  • There are many types of repeats including tandem repeats, where a larger sequence block has within it smaller repeat units (e.g. microsatellites). Tandem repeats usually occur within non-genic areas, but can also occur within genes and subsequently affect gene function; they can be 10-1000s of bp long, or, if located in centromeres and telomeres, be megabase sized. Some repeats are composed of blocks which do not have sub-repeat units and are non-functional (e.g. ~300 bp Alu repeats). These occur by duplication/dispersal throughout the genome.
  • a larger region of nucleic acid which contains the polymorphism will be required to define a polymorphism in a gene that is a member of a gene family. It is predicted that a sequence of 9-15 bp will be sufficient to define a polymorphism in 99% of all cases.
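The 16 bp figure follows from simple counting, which the sketch below reproduces. It rests on the idealization the text itself flags as unrealistic (a random genome with equal amounts of A, G, C and T), considers one strand only, and uses a genome size of 3,000,000,000 bp; the function names are hypothetical.

```python
def min_unique_length(genome_size, alphabet=4):
    """Smallest k for which there are at least genome_size distinct k-mers."""
    k = 0
    while alphabet ** k < genome_size:
        k += 1
    return k


def expected_occurrences(k, genome_size, alphabet=4):
    """Expected number of chance matches to one k-mer in a random genome."""
    return genome_size / alphabet ** k


GENOME = 3_000_000_000

print(min_unique_length(GENOME))  # 16
for k in (8, 12, 16):
    print(k, expected_occurrences(k, GENOME))
```

Shorter sequences can still be specific in practice because real genomes are not random and because specificity is often needed only within genes or within a PCR-amplified fragment rather than across the whole genome.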
  • An oligonucleotide is designed such that it is specific for a target sequence, and hybridizes only at the target sequence site. This oligonucleotide will not hybridize if the target sequence differs at the position in the sequence to be tested.
  • Another oligonucleotide is designed such that it hybridizes with the polymorphic form of the sequence.
  • a DNA sample is tested for hybridization with each of the two probes independently. If the DNA hybridizes to only one of the probes, it can be concluded that the individual is homozygous for the corresponding sequence. If both probes hybridize to a test DNA sample, then the individual is heterozygous. Hybridization will be detected by the method of Southern blot analysis (as described in Section C entitled "Production of a Nucleic Acid Probe").
  • An alternative method for specifying a particular polymorphism involves a PCR-based strategy.
  • a region of a candidate gene to be tested is amplified by PCR (as described).
  • the amplified fragment is digested with a restriction enzyme that will not cut a fragment that contains a polymorphism, due to the location of the polymorphism within the recognition site of this restriction enzyme.
  • the products of the digestion reaction mixture are size separated in an agarose gel, stained with ethidium bromide, and visualized under ultraviolet light to determine if the amplified product has been digested.
  • the PCR primers provide the specificity for a particular polymorphism by virtue of the specific sequence of the two primers, as well as by the location of the primer binding sites in the target DNA.
  • although multiple sites for primer binding may exist in a target DNA sequence, only the sites that are close enough together will produce an amplified product that includes the nucleic acid region containing the polymorphism.
  • a PCR reaction is carried out with PCR primers that contain polymorphisms. According to this embodiment, if the template nucleic acid lacks the polymorphism present in the primers there will be no PCR product. Thus, according to this embodiment of the invention, the absence of a PCR product indicates that a polymorphism is not present in the target sequence.
  • a DNA fragment comprising the region containing a polymorphism is PCR amplified from an individual to be tested.
  • the PCR product is denatured and one strand is retained for analysis.
  • An oligonucleotide probe is designed such that it is specific for a region in the sequence and hybridizes such that its 3' terminal nucleotide is paired with the nucleotide adjacent to the one to be tested.
  • the PCR product and probe are combined with a polymerase and terminating, differentially colored, nucleotides.
  • the polymerase extends the probe by one base, and only the base which is complementary to the site being tested is added. The reaction is washed, and the color of the reaction indicates the nucleotide that has been added and the sequence at the position of interest.
  • the PCR step provides one level of specificity by amplifying a region (1 - 10000 bp as desired between the PCR primers) from a complex (3,000,000,000 bp) mixture.
  • the PCR primers must be unique in both their hybridization specificity and their proximity to one another. Since proximity of the two PCR primers is needed (i.e. a distance across which a polymerase can extend to join the primers), shorter PCR primers can be used, e.g. in theory a small enough region could be amplified with an 8-10 bp binding site for a PCR primer. To ensure that a primer hybridizes with specificity, a primer must be at least 5 bp.
  • a second level of specificity is provided by the primer which is extended in the primer extension reaction. Since this primer is hybridizing to a short piece of DNA, it can be short and unique for the fragment with which it binds.
  • the primer is at least 5 bp and preferably 8 bp.
  • since the primer used for the primer extension step is located adjacent to the polymorphic site, the PCR primers should not overlap with the polymorphic site being tested.
  • One method for detecting a previously defined polymorphism involves Southern blot analysis of wild type and mutant DNA following digestion with a restriction enzyme which has a recognition sequence which includes the polymorphic site to be tested.
  • a particular restriction enzyme cuts wild type DNA but does not cut mutant DNA due to the presence of a polymorphism within the recognition site of this restriction enzyme.
  • Many restriction enzymes exist which recognize 4 bp sequences.
  • the resulting fragments will be size separated in an agarose gel, transferred to a membrane and probed with a nucleic acid probe. If the site is uncut, the fragment is one length and if the site is cut the fragment will be of a shorter length.
  • the nucleic acid hybridization probe will provide specificity to the particular polymorphism being tested by defining the polymorphism in the context of a larger stretch of nucleic acid sequence.
  • the nucleic acid probe may comprise the nucleic acid sequence corresponding to the region known to contain the polymorphism.
  • the sequence-specific probe may be located 10, 100, 1000, or even 100s of thousands of bases from the region containing the polymorphism. If the probe is located some distance from the region containing the polymorphism, an intervening recognition site for the restriction enzyme must not be located between the probe hybridization site and the region of interest containing the polymorphism site.
  • a hybridization probe useful according to this method will be much larger than the minimum length of a sequence (9-15 bp) required to give specificity to, or define a particular polymorphism.
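The effect of a polymorphic restriction site on the fragment seen by a Southern blot probe, and the reason an intervening site defeats the assay, can be sketched as below. Coordinates, site positions and the function name `detected_fragment` are hypothetical, and the model assumes complete digestion and a probe that hybridizes at a single point.

```python
from bisect import bisect_right


def detected_fragment(cut_sites, probe_pos, region_len):
    """Length of the restriction fragment that contains the probe site."""
    sites = sorted(cut_sites)
    i = bisect_right(sites, probe_pos)
    left = sites[i - 1] if i > 0 else 0
    right = sites[i] if i < len(sites) else region_len
    return right - left


REGION = 10_000

# Polymorphic site at 5000, probe at 3000, flanking sites at 2000 and 9000.
print(detected_fragment([2000, 5000, 9000], 3000, REGION))  # 3000 (site cut)
print(detected_fragment([2000, 9000], 3000, REGION))        # 7000 (site lost)

# Polymorphic site at 9000 with a constant intervening site at 5000:
# the probe at 3000 reports the same fragment for both alleles.
print(detected_fragment([2000, 5000, 9000], 3000, REGION))  # 3000
print(detected_fragment([2000, 5000], 3000, REGION))        # 3000
```

The last pair shows why no recognition site may lie between the probe and the polymorphic site: the probe then detects the same fragment whichever allele is present.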
  • a chemical or enzyme which recognizes a unique pair of nucleotides at the site of a polymorphism can be used to detect the polymorphism.
  • the amount of sequence required for recognition by a chemical or enzyme is 2 bp (providing that the 2 bp sequence is unique in a region large enough to produce a fragment which can then be bound by a specific probe).
  • a labeled chemical or enzyme which binds to one sequence of the polymorphic recognition site and not another is used.
  • This method involves the steps of digesting the DNA with a restriction enzyme, and adding a labeled, sequence-specific binding protein (e.g. a restriction enzyme that lacks cleavage capability).
  • the sequence-specific binding protein will bind to multiple sites in the genome, including the site to be tested.
  • the fragments will be separated on a gel and then probed with a probe specific for the test sequence. If the fragment identified by the second probe is identical to a fragment identified by the first probe (e.g. the labeled chemical or enzyme), then the sequence being tested for is present.
  • the invention provides methods for performing polymorphism genotyping in appropriate populations (described above).
  • the invention also provides in vitro and in vivo assays useful for determining the phenotypic outcome of a polymorphism in a candidate gene.
  • Every polymorphism has the potential to alter the genetic activity of an individual.
  • the effect of a polymorphism can range from an inconsequential, silent change to a change that causes a complete loss of protein function to a gain of aberrant or detrimental function mutation.
  • the severity of the effect of a polymorphism on gene activity will depend on the exact molecular consequences of the particular polymorphism. For example, alterations of a single pre-mRNA splicing dinucleotide could have profound effects on both the quantitative and qualitative properties of gene activity since alterations in splicing efficiency can both reduce the overall level of normal transcription as well as cause "exon skipping".
  • In vitro assays useful for determining the effects of a polymorphism on gene expression and protein function include, but are not limited to, the following.
  • i. Transcriptional Regulation
  • The transcriptional regulation of a candidate gene containing a polymorphism may be altered, as compared to the wild type gene.
  • promoter assays wherein the altered promoter of the candidate gene is used to drive the expression of a reporter gene (e.g. CAT, luciferase, GFP) are performed.
  • Changes in the transcriptional regulation of a candidate gene due to the presence of a polymorphism can also be detected by methods useful for measuring the level of mRNA including S1 nuclease mapping and RT-PCR.
  • the S1 enzyme is a single-stranded endonuclease that will digest both single-stranded RNA and DNA.
  • a probe that has been efficiently labeled to a high specific activity at the 5' end through the use of a kinase is used to determine either the amount of an mRNA species or the 5' end of a message.
  • a single stranded probe that is complementary to the sequence of the RNA species of interest is utilized in S1 analysis. If the structure of a particular mRNA species is known, S1 analysis is performed with oligonucleotide probes of at least 40 bp that are complementary to the RNA of interest.
  • it is preferable to use oligonucleotides wherein the 5' end of the oligonucleotide is complementary to the RNA. It is also preferable to use oligonucleotides wherein the 5' terminal residues contain dG or dC residues. If S1 nuclease analysis will be utilized to determine the 5' termini of an RNA species, the 3' end of the oligonucleotide should extend at least 4 nucleotides beyond the RNA coding sequence. The inclusion of additional nucleotides facilitates differentiation of a band resulting from an RNA:DNA duplex and a band representing the probe.
  • a hybridization probe for S1 analysis is prepared by incubating 2 pmol of an oligonucleotide in the presence of 150 µCi [γ-32P]ATP (3000-7000 Ci/mmol), 2.5 µl 10X T4 polynucleotide kinase buffer (700 mM Tris-Cl, pH 7.5, 100 mM MgCl2, 50 mM dithiothreitol, 1 mM spermidine-Cl, 1 mM EDTA), and 10 U T4 polynucleotide kinase at 37°C for 30-60 minutes.
  • the radiolabeled probe is ethanol precipitated and resuspended at 1 µl/0.3 ng oligonucleotide or 10^5 cpm.
  • the hybridization reaction is performed as follows. An amount of probe equal to 5 x 10^4 Cerenkov counts is added to 50 µg RNA on ice and ethanol precipitated. The resulting pellet is resuspended in 20 µl S1 hybridization solution (80% deionized formamide, 40 mM PIPES, pH 6.4, 400 mM NaCl, 1 mM EDTA, pH 8), denatured for 10 min at 65°C and hybridized overnight at 30°C.
  • RT-PCR (reverse transcription/polymerase chain reaction)
  • the RNA is converted to first strand cDNA, which is relatively stable and is a suitable template for a PCR reaction.
  • the cDNA template of interest is amplified using PCR. This is accomplished by repeated rounds of annealing sequence-specific primers to either strand of the template and synthesizing new strands of complementary DNA from them using a thermostable DNA polymerase.
  • the RNA sample is ethanol precipitated with a cDNA primer. It may be preferable to use a cDNA primer that is identical to one of the amplification primers.
  • To the pellet is added 12 µl H2O, 4 µl 400 mM TrisCl, pH 8.3, and 4 µl 400 mM KCl. The mixture is heated to 90°C, slow cooled to 67°C, microfuged and incubated for 3 hours at 52°C.
  • the resulting cDNA pellet is resuspended in 40 µl H2O. 5 µl of the cDNA sample is mixed with 5 µl of each amplification primer (~20 µM each), 4 µl 5 mM 4dNTP mix, 10 µl 10X amplification buffer (500 mM KCl, 100 mM TrisCl, pH 8.4, 1 mg/ml gelatin) and 70.5 µl H2O. After the mixture is heated for 2 minutes at 94°C, 0.5 µl (2.5 U) Taq DNA polymerase is added and the sample is overlaid with mineral oil.
  • PCR amplification of the cDNA will be performed using the following automated amplification cycles: 39 cycles (2 minutes at 55°C, 2 minutes at 72°C, 1 minute at 94°C), 1 cycle (2 minutes at 55°C, 7 minutes at 72°C). The number of cycles can be varied in accordance with the abundance of RNA (Ausubel et al., supra).
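The amplification program can be written down as data, which makes it easy to check the total number of cycles and the minimum run time. The representation below is an illustrative assumption (no thermal cycler API is implied), and ramp times between temperatures are ignored.

```python
# Each stage: (number of cycles, [(step name, temperature in C, minutes), ...])
PROGRAM = [
    (39, [("anneal", 55, 2), ("extend", 72, 2), ("denature", 94, 1)]),
    (1, [("anneal", 55, 2), ("final extension", 72, 7)]),
]


def total_cycles(program):
    """Total number of amplification cycles across all stages."""
    return sum(cycles for cycles, _ in program)


def total_minutes(program):
    """Sum of programmed hold times, ignoring temperature ramping."""
    return sum(
        cycles * sum(minutes for _, _, minutes in steps)
        for cycles, steps in program
    )


print(total_cycles(PROGRAM))   # 40
print(total_minutes(PROGRAM))  # 204
```

Varying the number of cycles for RNA of different abundance then only requires changing the first entry of the first stage.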
  • assays including but not limited to the yeast two-hybrid assay (Fields et al., 1994, Trends Genet., 10:286) can be used to determine the effects of a polymorphism on transcription factor binding.
  • the protein product of the gene of interest is a DNA binding protein
  • the phenotypic outcome of a polymorphism may be impaired nuclear transport, DNA binding, chromatin assembly or chromatin structure, methylation or histone deacetylation.
  • Immunocytochemical methods or cell fractionation techniques are used to determine if the protein is correctly localized in the nucleus.
  • DNA binding properties of a transcription factor are determined by gel shift analysis (as described in Ausubel et al., supra), oligonucleotide selection, southwestern assays or by immunohistochemical analysis of fixed chromosomes.
  • the method of gel shift analysis is used to detect sequence specific DNA-binding proteins from crude extracts. According to this method, proteins that bind to an end-labeled DNA fragment will retard the mobility of the fragment. The change in the mobility of the labeled fragment is detected by the appearance of a discrete band comprising the DNA-protein complex.
  • a number of methods for preparing nuclear and cytoplasmic extracts useful for gel shift analysis are known in the art. For example, nuclear extracts are prepared according to the following method.
  • a cell pellet is washed in PBS, resuspended in a volume of hypotonic buffer (10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.2 mM PMSF, 0.5 mM DTT) that is approximately equal to 3 times the packed cell volume and allowed to swell on ice for 10 minutes.
  • Cells are homogenized in a glass Dounce homogenizer and the nuclei are collected by centrifugation and resuspended in a volume of low-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 0.02 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume.
  • a volume of high-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 1.2 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume is added dropwise, with stirring, to the nuclei.
  • nuclear extraction is carried out for 30 minutes with continuous gentle stirring.
  • the nuclei are collected by centrifugation and the nuclear extract is dialyzed against 50 volumes of dialysis buffer (20 mM HEPES, pH 7.9, 20% (v/v) glycerol, 100 mM KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) until the conductivities of extract and buffer are equivalent.
  • the extract is removed from the dialysis tubing and analysed for protein concentration (Ausubel et al., supra).
  • Probes useful for gel shift analysis include a fragment of plasmid DNA or a gel-purified double stranded oligonucleotide.
  • the probe is labeled with Klenow fragment by incubating a 100 µl solution of plasmid DNA or oligonucleotide with 100 µCi of the desired [α-32P]dNTP, 4 µl of 5 mM 3dNTP mix and 2.5 U Klenow fragment for 20 minutes at room temperature.
  • 4 µl of a solution comprising 5 mM of the dNTP corresponding to the radioactive dNTP is added, and the sample is incubated for 5 minutes at room temperature.
  • the radiolabeled probe is ethanol precipitated, resuspended in TE buffer and gel purified.
  • DNA binding activity is an essential property of proteins involved in many basic cell biological events, such as chromatin structure, transcriptional regulation, DNA replication and repair.
  • the biological activity of a DNA binding protein can be assayed by defining the optimal target DNA binding site.
  • the canonical nucleotide sequence defining the binding site is elucidated in vitro by mixing purified full length protein, or just the DNA binding domain of a protein of interest, with an oligonucleotide duplex pool containing a completely randomized central region flanked by primer-annealing sites. Multiple rounds of immunoprecipitation and amplification by PCR enrich for high affinity sites, which are cloned and sequenced in order to define a canonical binding site.
  • The ability of a DNA binding protein to correctly regulate chromatin assembly and structure can be determined by DNase hypersensitivity assays. Alternatively, coimmunoprecipitation experiments or Western blot analysis can be used to determine if the DNA binding protein is associated with a component of the chromatin.
  • the ability of a protein to bind DNA is measured by using the "Southwestern" blot technique (for example see Antalis et al., 1993, Gene, 134:201). According to this method, radiolabelled DNA is incubated with protein that has been immobilized on nitrocellulose filters and the amount of bound DNA is measured by scintillation counting or autoradiography followed by densitometry.
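Once the selected sites have been cloned and sequenced, a canonical site can be summarized as a simple majority consensus. The sketch below assumes the sequences are already aligned and of equal length; the example sites and the function names are hypothetical, and real analyses usually report a position weight matrix rather than a single consensus string.

```python
from collections import Counter


def position_counts(sites):
    """Base counts for each column of a set of aligned, equal-length sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    return [Counter(column) for column in zip(*sites)]


def consensus(sites):
    """Most frequent base at each aligned position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(sites))


# Hypothetical sequences recovered after several rounds of selection.
selected = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]

print(consensus(selected))          # TGACTCA
print(position_counts(selected)[3])
```

The per-position counts retain information about tolerated substitutions that the consensus string alone discards.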
  • the protein to be tested can be pure protein, immunoprecipitated protein, crude cell lysates or even recombinant protein denatured directly from bacterial colonies, yeast or cell culture.
  • immunoprecipitation can be used to test for the presence of the protein (Otto and Lee, 1993, Methods Cell Biol., 37: 119, Banting, 1995, In Gene Probes 1: A practical approach. Chapter 8: Antibody probes, pp. 225-227, IRL press.).
  • the following methods are used for determining if a protein of interest is associated with a particular subcellular component.
  • proteins are immunoprecipitated with an antibody specific for a cellular component (e.g.
  • the immunoprecipitated material is analysed on a gel by denaturing polyacrylamide gel electrophoresis and western blot analysis is performed with an antibody specific for the protein of interest, to determine if a physical association exists between the cellular component and the protein of interest.
  • Various incubation and wash treatments of the cell lysate are used to remove background contamination and enhance the sensitivity of detection (Banting, 1995, supra).
  • the initial immunoprecipitation can be carried out with the antibody specific for the protein of interest, and the western blot analysis can be performed with an antibody specific for a cellular component.
  • prior to immunoprecipitation the cells can be treated with a protein crosslinker to ensure that protein-protein interactions are maintained during immunoprecipitation.
  • proteins can be cross-linked to DNA and then precipitated (Dedon et al., 1991, Anal. Biochem., 197:83). If DNA coprecipitates with a particular protein, this suggests that DNA is associated with, and presumably bound to the protein. The coprecipitating DNA can be sequenced to identify the bound sequence.
  • the transcriptionally active promoter region of a gene can be analysed for susceptibility to cleavage by DNase I (Montecino et al., 1994, Biochemistry, 33:348). Efficient cleavage of genomic DNA is dependent on the accessibility of the DNA to this enzyme, and is influenced by several factors, including nucleosome packaging, overall chromatin configuration, and the presence of DNA binding proteins such as transcription factors. DNA sequence variations within the promoter DNA may have profound effects on these factors and result in aberrant regulation of gene transcription and ultimately abnormal biological activity of the gene. Therefore, altered gene activity around a polymorphic site can be detected as increased or decreased DNase I hypersensitivity (Vaishnaw et al., 1995, Immunogenetics, 41:354).
  • methylation-specific PCR (Herman et al., 1996, Proc Natl Acad Sci USA., 93:9821) is used to determine the methylation status of CpG islands without the use of methylation-specific restriction enzymes.
  • chromatin-packaged genes involves highly regulated changes in nucleosome structure that control DNA accessibility. Changes in nucleosome structure can be mediated by enzymatic complexes which control the acetylation and deacetylation of histones. Transcription elongation is required for the formation of the unfolded structure of transcribing nucleosomes, and histone acetylation is required for the maintenance of these structures (Walia et al., 1998, J. Biol. Chem., 3:14516). Deacetylation can be prevented by incubating cells with histone deacetylase inhibitors such as sodium butyrate or trichostatin A. To assay for changes in acetylation and the state of transcriptional activity, chromatin fractions are purified using organomercury and hydroxylapatite dissociation chromatographic techniques (Walia et al., supra).
  • a polymorphism causes a change in the transcriptional start site of a candidate gene
  • S1 nuclease mapping and primer extension can be performed.
  • the presence of a polymorphism may cause an mRNA to be aberrantly expressed.
  • a polymorphism may change the tissue specificity or developmental expression pattern of an mRNA species.
  • a variety of molecular methods for detecting mRNA known in the art can be performed to determine the expression pattern of an mRNA. These methods include, but are not limited to, the following: Northern blot analysis, RT-PCR, S1 analysis, RNase protection analysis, or in situ hybridization analysis of sections, wherein the samples are derived from multiple different tissues or from a tissue at different stages of development.
  • Northern blot analysis, RT-PCR and S1 analysis can also be used to determine if a polymorphism results in an altered pattern of mRNA splicing.
  • Northern blotting. The method of Northern blotting is well known in the art. This technique involves the transfer of RNA from an electrophoresis gel to a membrane support to allow the detection of specific sequences in RNA preparations.
  • RNA sample (prepared by the addition of MOPS buffer, formaldehyde and formamide) is separated on an agarose/formaldehyde gel in 1X MOPS buffer. Following staining with ethidium bromide and visualization under ultraviolet light to determine the integrity of the RNA, the RNA is hydrolyzed by treatment with 0.05 M NaOH/1.5 M NaCl followed by incubation with 0.5 M Tris-Cl (pH 7.4)/1.5 M NaCl. The RNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g.
  • Hybond-N membrane Amersham, Arlington Heights, IL
  • the membrane is hybridized with a radiolabeled probe in hybridization solution (e.g. in 50% formamide/2.5% Denhardt's/100-200mg denatured salmon sperm DNA/0.1% SDS/5X SSPE) at 42°C.
  • the hybridization conditions can be varied as necessary as described in Ausubel et al., supra and Sambrook et al., supra.
  • the membrane is washed at room temperature in 2X SSC/0.1% SDS, at 42°C in 1X SSC/0.1% SDS, at 65°C in 0.2X SSC/0.1% SDS, and exposed to film.
  • the stringency of the wash buffers can also be varied depending on the amount of background signal (Ausubel et al., supra).
  • RNase protection analysis can be used to analyze RNA structure and amount and determine the endpoint of a specific RNA.
  • RNase protection is more sensitive than S1 analysis since it utilizes a sequence-specific hybridization probe that is labeled to a high specific activity.
  • the probe is hybridized to sample RNAs and treated with ribonuclease to remove free probe. Following ribonuclease treatment, the fragments comprising probe annealed to homologous sequences in the sample RNA are recovered by ethanol precipitation, and analysed by electrophoresis on a sequencing gel. The presence of the target mRNA is indicated by the presence of an appropriately sized fragment of the probe.
  • a probe is labeled by the method of in vitro transcription (in the presence of [α-32P]CTP), as described in Section B entitled "Production of a Polynucleotide Sequence".
  • the RNA sample to be analysed is ethanol precipitated and resuspended in 30ml hybridization buffer (4 parts formamide/1 part 200 mM PIPES, pH 6.4, 2 M NaCl, 5 mM EDTA) containing 5 × 10^5 cpm of the probe RNA.
  • the mixture is denatured 5 minutes at 85°C and incubated at the desired hybridization temperature (30°C to 60°C) for >8 hours.
  • ribonuclease digestion buffer (10 mM Tris-Cl, pH 7.5, 300 mM NaCl, 5 mM EDTA) containing 40mg/ml ribonuclease A and 2mg/ml ribonuclease T1.
  • the sample is incubated for 30-60 minutes at 30°C.
  • 10 ml 20% SDS and 2.5ml 20 mg/ml proteinase K the sample is incubated for 15 minutes at 37°C.
  • RNA loading buffer 80% (v/v) formamide, 1 mM EDTA, pH 8.0, 0.1 % bromophenol blue, 0.1 % xylene cyanol
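The RNase protection hybridization buffer above is specified as a mixing ratio rather than as final concentrations. As a small worked calculation, the sketch below converts "4 parts formamide/1 part 200 mM PIPES, 2 M NaCl, 5 mM EDTA" into final values, assuming the parts are volumes and that the salt component is a single pre-mixed stock; the variable and function names are illustrative only.

```python
def dilute(stock_conc, stock_parts, total_parts):
    """Final concentration after diluting a stock into the total volume (same units as stock)."""
    return stock_conc * stock_parts / total_parts

FORMAMIDE_PARTS = 4   # volume parts of formamide
SALT_PARTS = 1        # volume parts of the PIPES/NaCl/EDTA stock
TOTAL_PARTS = FORMAMIDE_PARTS + SALT_PARTS

# Stock concentrations in mM, as given in the text (2 M NaCl = 2000 mM).
stock_mM = {"PIPES": 200, "NaCl": 2000, "EDTA": 5}

final_mM = {name: dilute(conc, SALT_PARTS, TOTAL_PARTS) for name, conc in stock_mM.items()}
formamide_percent = 100 * FORMAMIDE_PARTS / TOTAL_PARTS

print(final_mM)            # final salt concentrations in mM
print(formamide_percent)   # final formamide, % (v/v)
```

Under these assumptions the working buffer comes out at 80% formamide, 40 mM PIPES, 0.4 M NaCl and 1 mM EDTA.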
  • primer extension is used to map the 5' end of an RNA and to quantitate the amount of an RNA of interest by using reverse transcriptase to extend a primer that is complementary to a region of a given RNA.
  • oligonucleotide primer is labeled in a kinase reaction as described for S1 analysis.
  • the primer extension reaction is performed by mixing 10-50mg total cellular RNA (in 10ml) with 1.5ml 10X Hybridization buffer (1.5 M KCl, 0.1 M Tris-Cl, pH 8.3, 10 mM EDTA) and 3.5 ml labeled oligonucleotide. Samples are heated to 65°C for 90 minutes and allowed to cool slowly to room temperature.
  • primer extension reaction mixture (0.9ml Tris-Cl, pH 8.3, 0.9ml 0.5 M MgCl2, 0.25ml DTT, 6.75ml 1 mg/ml actinomycin D, 1.33 ml 5 mM 4dNTP mix, 20 ml H2O, 0.2ml 25U/ml AMV reverse transcriptase).
  • Samples are incubated for 1 hour at 42°C, and then, following the addition of 105ml RNase reaction mix (100 mg/ml salmon sperm DNA, 20 mg/ml RNase A), for 15 minutes at 37°C.
  • Samples are extracted in phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in stop/loading dye (20 mM EDTA, pH 8.0, 0.05% bromophenol blue, 0.05% xylene cyanol in formamide), heated at 65°C and analysed by electrophoresis on a 9% acrylamide/7 M urea gel and autoradiography.
  • Cytological techniques well known in the art can be used to determine the temporal and spatial expression patterns of mRNA (in situ hybridization of tissue sections) and protein (immunohistochemistry in individual cells).
  • Tissue samples intended for use in in situ detection of either RNA or protein are fixed using conventional reagents; such samples may comprise whole or squashed cells, or sectioned tissue.
  • Fixatives useful for such procedures include, but are not limited to, formalin, 4% paraformaldehyde in an isotonic buffer, formaldehyde (each of which confers a measure of RNase resistance to the nucleic acid molecules of the sample) or a multi-component fixative, such as FAAG (85% ethanol, 4% formaldehyde, 5% acetic acid, 1% EM grade glutaraldehyde).
  • RNase-free, i.e. treated with 0.1% diethylpyrocarbonate (DEPC) at room temperature overnight and subsequently autoclaved for 1.5 to 2 hours.
  • Tissue will be fixed at 4°C, either on a sample roller or a rocking platform, for 12 to 48 hours in order to allow the fixative to reach the center of the sample.
  • Prior to embedding, excess fixative will be removed and the sample will be dehydrated by a series of two- to ten-minute washes in increasingly high concentrations of ethanol, beginning at 60% and ending with two washes in 95% and another two in 100% ethanol, followed by two ten-minute washes in xylene.
  • Samples will be embedded in one of a variety of sectioning supports, e.g. paraffin, plastic polymers or a mixed paraffin/polymer medium (e.g. Paraplast®Plus Tissue Embedding Medium, supplied by Oxford Labware).
  • fixed, dehydrated tissue will be transferred from the second xylene wash
  • the paraffin or a paraffin/polymer resin will be replaced three to six times over a period of approximately three hours to dilute out residual xylene.
  • the sample will be incubated overnight at 58°C under a vacuum, in order to optimize infiltration of the embedding medium into the tissue.
  • BSA bovine serum albumin
  • fixation and embedding are also applicable for use according to the methods of the invention; examples of these are found in Humason, G.L., 1979, Animal Tissue Techniques, 4th ed. (W.H. Freeman & Co., San Francisco), as is frozen sectioning (Serrano et al., 1989, supra).
  • In situ Hybridization Analysis. According to the method of in situ hybridization, a specifically labeled nucleic acid probe is hybridized to cellular RNA present in individual cells or tissue sections. In situ hybridization can be performed on either paraffin or frozen sections. Depending on the desired sensitivity and resolution, either film or emulsion autoradiography can be utilized to detect the hybridized radioactive probe.
  • the following method of in situ hybridization is performed by incubating slides containing cell or tissue specimens in a slide rack contained within a glass staining dish. According to this method, it is preferable to use solutions that have been prepared fresh. Prior to the hybridization steps, slides are dewaxed to remove the sectioning support material.
  • the dewaxing protocol involves sequential washes in xylene, rehydration by sequential washes in 100%, 95%, 70% and 50% ethanol, and denaturation in 0.2N HCl.
  • samples are postfixed in a freshly prepared solution of 4% PFA, washed in PBS, incubated in 10 mM DTT (10 min at 45°C) and blocked in 400 ml PBS containing 0.617 g DTT, 0.74 g iodoacetamide and 0.5 g N-ethylmaleimide, for 30 min at 45°C in a water bath covered with aluminum foil, because iodoacetamide and N-ethylmaleimide are light sensitive.
  • the samples are washed in PBS and equilibrated sequentially in freshly prepared 0.
  • TEA buffer 1M triethanolamine
  • TEA buffer/0.25% acetic anhydride 1M triethanolamine
  • TEA buffer/0.5% acetic anhydride 1M triethanolamine
  • the samples are dehydrated by sequential washes in 50%, 70%, 95%, and 100% ethanol and air dried.
  • 35S-labeled riboprobes and competitor probes prepared in the absence of a radiolabel (prepared as described in Section B entitled "Production of a Polynucleotide Sequence") or double-stranded DNA probes (prepared with [35S]dNTPs by methods well known in the art including nick translation or random oligonucleotide-primed synthesis) are heated to 100°C for 3 min and diluted to a final probe concentration of 0.3mg/ml in 50% formamide, 0.3 M NaCl, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA, 1X Denhardt solution, 500mg/ml yeast tRNA, 500mg/ml poly(A) (Pharmacia), 50 mM DTT, 10% polyethylene glycol (MW 6000).
  • the hybridization step is carried out by covering the sample with an appropriate amount of probe, and incubating for 30 min to 4 hours at 45°C in a chamber designed to prevent dilution or concentration of the hybridization solution. Samples are washed sequentially at 55°C in solution A (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol) and solution B (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol, 0.5% (v/v) Triton-X-100), and at room temperature in solution C (2X SSC, 20 mM 2-mercaptoethanol).
  • Gene-expression can be regulated by variations in mRNA stability (Liebhaber, 1997, Nucleic Acids Symp Ser., 36:29 and Ross J. 1996, Trends Genet., 5:171). Any gene variation occurring within the cis-acting elements which control mRNA abundance may influence gene expression levels (Peltz et al., 1992, Curr Opin Cell Biol., 4:979). Quantitative RT-PCR (Kohler, et al, 1995, Quantitation of mRNA by polymerase chain reaction, Springer) and mRNA radiolabelling techniques are two methods for measuring relative mRNA abundance and stability.
  • Quantitative PCR employs an internal standard to provide a direct comparison between alternative reactions, enabling comparison of low abundance transcripts or transcripts derived from a sample that is only available in a limited quantity (McPherson MJ et al., eds, 1995, PCR 2: A Practical Approach. IRL Press).
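The role of the internal standard can be illustrated with a short calculation: each target signal is first divided by the signal of the standard amplified in the same reaction, and only the normalized values are compared between reactions. The sketch below assumes that product signals (for example band intensities) have already been measured in arbitrary units; the numbers and function names are hypothetical.

```python
def normalized_signal(target, standard):
    """Target signal expressed relative to the internal standard in the same reaction."""
    if standard <= 0:
        raise ValueError("internal standard signal must be positive")
    return target / standard

def relative_abundance(sample, reference):
    """Ratio of normalized target signal in `sample` to that in `reference`.

    Each argument is a (target_signal, standard_signal) pair from one reaction.
    """
    return normalized_signal(*sample) / normalized_signal(*reference)

# Hypothetical measurements: (target, internal standard) in arbitrary units.
variant_allele = (1200.0, 800.0)
common_allele = (2000.0, 1000.0)

print(relative_abundance(variant_allele, common_allele))
```

Normalizing within each reaction first means that differences in input amount or amplification efficiency between tubes largely cancel before the two samples are compared.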
  • RNA Transcription Rates. Genetic polymorphism within the regulatory regions of a gene can significantly alter transcription rate and mRNA stability, resulting in reduced biological activity of the encoded protein.
  • One of the most sensitive assays for measuring the rate of gene transcription is the nuclear runoff assay (Groudine and Casimir, 1984, Nucleic Acids Res 12: 1427). Nuclei isolated from cell lines expressing the target gene of interest are treated with radiolabelled UTP and the level of incorporation of radiolabel into nascent RNA transcripts is determined by filter hybridization to immobilized cDNA derived from the target gene.
  • a genetic variation can cause a change in the localization of a particular mRNA species
  • RNA localization. Changes in RNA localization can be detected by immunohistochemical methods well known in the art (e.g. in situ analysis described above).
  • mRNA like protein
  • the Xenopus oocyte is a popular, experimentally tractable system for studying intracellular trafficking of mRNA (Nakielny et al., 1997, Annu. Rev. Neurosci., 20:269).
  • Fluorescently labelled RNA is microinjected into the large oocyte cell where its location can be detected using standard microscopy methods.
  • Polymorphic variants of a particular mRNA species may differ in their response to cellular mechanisms responsible for partitioning mRNA within the cell. This method has been useful for demonstrating that sequence variations can affect sub-cellular localization (Grimm et al., 1997, EMBO J., 16:793).
  • Post-Translational alterations resulting from premature stop codons, translational readthrough or multiple open reading frames and translational suppression may occur as a result of a polymorphism.
  • a polynucleotide comprising one or more polymorphisms is subjected to in vitro transcription and in vitro translation (as described in sections B and J entitled "Production of a Polynucleotide Sequence" and "Preparation of a Labeled Protein").
  • the translation product(s) are analysed for the appearance of aberrantly sized proteins.
  • Additional post-translational alterations that may occur as a result of a polymorphism include changes in localization due to an altered signal sequence, and changes in glycosylation, myristoylation, and susceptibility to or sites of proteolytic cleavage.
  • the method of immunocytochemistry can be used to determine if a protein is incorrectly localized, due to the presence of an altered signal sequence.
  • Immunohistochemical techniques including indirect immunofluorescence, immunoperoxidase labeling or immunogold labeling, are used for protein localization.
  • Immunofluorescent labeling of tissue sections (prepared as for in situ analysis, described above) is performed by the following method. Slides containing the sample of interest are equilibrated to room temperature, washed in PBS, incubated with an appropriate dilution of primary antibody (1 hour at room temperature), washed in PBS, incubated with an appropriate dilution of secondary antibody (1 hour at room temperature), washed in PBS and analysed under a microscope (Ausubel et al., supra). Alternatively, the sensitivity of the immunohistochemical reaction is increased by using a streptavidin-secondary antibody conjugate reacted with a biotin-fluorochrome conjugate. Alternatively, immunogold labeling is used to detect a protein of interest by using an immunogold-conjugated secondary antibody. Immunoperoxidase labeling of tissue sections is performed by the following method.
  • Slides are pretreated in 0.25% hydrogen peroxide, incubated with primary antibody, washed in PBS and incubated (1 hour at room temperature) with a specific secondary bridging antibody capable of recognizing both the primary antibody and a horseradish peroxidase-antiperoxidase (PAP) complex.
  • the slides are washed in PBS and developed in diaminobenzidine substrate solution (0.03% (w/v) 3,3'-diaminobenzidine in 200 ml PBS) at room temperature (Ausubel et al., supra).
  • protein localization is determined by cell fractionation wherein cells are biosynthetically labeled, the labeled material is fractionated, and the radiolabeled proteins in each fraction are analysed by immunoprecipitation with an antibody specific for the protein of interest.
  • Changes in protein glycosylation can be detected by radiolabelling a protein of interest with sugars, determining if a change in the cellular localization (by immunocytochemistry) of the protein in culture has occurred due to aberrant glycosylation, or by determining the effects of inhibitors of glycosylation on the migration pattern of proteins analysed by polyacrylamide gel electrophoresis.
  • Protein glycosylation can be inhibited by tunicamycin, an antibiotic, as well as by several sugar analogues (Schwarz, 1991, Behring Inst Mitt., 89:198). These reagents are used to characterize the effects of sequence changes on protein glycosylation.
  • Changes in protein modification with lipids are detected by radiolabelling a protein of interest with myristic acid or by determining if a change in the cellular localization of the protein in culture has occurred as a result of aberrant lipid modification (by immunocytochemistry).
  • Covalent attachment of lipids is a mechanism by which eukaryotic cells direct and, in some cases, control, membrane localization of proteins (Casey, 1994, Curr. Opin. Cell. Biol., 2:219). Such post-translational addition of myristyl, palmityl or prenyl side-chains has a key role in the functional regulation of many proteins (Chow et al., 1992, Curr. Opin. Cell. Biol., 4:629; Resh, 1994, Cell, 763:411). Assays for detecting proteins that are covalently modified by the attachment of lipids include labeling with [3H]myristate (Stevenson et al., 1992, J. Exp.
  • Proteolytic Cleavage. Post-translational cleavage of polypeptides is an important mechanism for modulating protein function in many physiological processes. Protease activity is involved in zymogen processing, activation of enzyme catalysis, tissue/cell remodelling, signal transduction cascades, protein degradation and cell death pathways (Rappay, 1989, Prog Histochem Cytochem., 18:1). A protein that is predicted to be a protease or the target of a protease can be assayed in vitro using purified proteins or cell extracts (Muta et al., 1995, J. Biol. Chem. 270:892) where cleavage efficiency is monitored by standard PAGE or western blotting.
  • proteases and/or their targets can be expressed from expression plasmids in in vivo cell culture systems in order to monitor their biological activity (Zhang, et al., 1998, J. Biol. Chem. 273: 1144).
  • the specificity of proteolytic cleavage is determined using inhibitors that selectively block serine, cysteine, aspartic and metallo proteolytic activity (e.g. pepstatin A selectively inhibits aspartic proteases) (Rich, et al., 1985, Biochemistry., 24: 3165).
  • pulse chase experiments with radiolabeled protein can be carried out to determine the precursor-product relationship following digestion with a protease of a given specificity.
  • the method of pulse chase labeling is described in Ausubel et al., supra.
  • inhibitors of proteases, e.g. acid proteases or serine proteases
  • a polymorphism may modify the properties of the receptor such that receptor binding/turnover or activation is altered. Receptor formation can be impaired if a polymorphism causes improper receptor localization or assembly.
  • the receptor can be localized by immunocytochemical techniques.
  • cells that are expressing the receptor can be fractionated and subjected to Western blot analysis or biosynthetically labeled, fractionated and analysed by immunoprecipitation.
  • a number of methods can be used to determine if a receptor is colocalized with the appropriate protein partner.
  • a protein may be dependent on the ability of the protein to interact with other proteins as part of a large complex.
  • certain cell surface receptors consist of a receptor complex that is composed of several homo- or heteromeric protein subunits, and activation by ligand can result in altered protein-protein interactions both within the receptor complex and with "downstream" targets such as G-proteins (Okada and Pessin, 1996, J. Biol. Chem., 271:25533).
  • Protein-protein interactions can be assayed immunologically by co-immunoprecipitation of native (Gilboa et al., 1998, J. Biol. Chem., 140:767) or chemically cross-linked complexes (Haniu et al., 1997, J.
  • Receptor binding/Turnover. Receptor-ligand interaction is essential for the functionality of the bound complex.
  • Receptor binding/turnover can be measured by standard Scatchard analysis of radiolabelled ligand binding in vitro (Culouscou et al., 1993, J. Biol. Chem. 268:10458) or in cellular based assays (Greenlund et al., 1993, J. Biol. Chem. 268: 18103).
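Scatchard analysis linearizes single-site binding data: plotting bound/free against bound gives a line with slope -1/Kd and an x-intercept equal to Bmax. The following sketch fits that line by ordinary least squares. It assumes a single class of non-interacting sites and uses noise-free synthetic data generated from hypothetical values (Kd = 2.0, Bmax = 100); none of the numbers or names come from the source.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def scatchard(free, bound):
    """Estimate (Kd, Bmax) from free and bound ligand concentrations.

    Fits bound/free versus bound; slope = -1/Kd, intercept = Bmax/Kd.
    """
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept = linear_fit(bound, ratios)
    kd = -1.0 / slope
    bmax = intercept * kd
    return kd, bmax

# Synthetic single-site data from hypothetical parameters, for illustration only.
TRUE_KD, TRUE_BMAX = 2.0, 100.0
free = [0.5, 1.0, 2.0, 4.0, 8.0]
bound = [TRUE_BMAX * f / (TRUE_KD + f) for f in free]

kd, bmax = scatchard(free, bound)
print(round(kd, 3), round(bmax, 3))
```

With real measurements the transformed points carry correlated errors, so direct non-linear fitting of the binding curve is often preferred for final estimates; the linear form remains a convenient way to compare a variant receptor against the reference side by side.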
  • affinity chromatography methods can be employed to determine if a receptor is demonstrating aberrant binding characteristics. According to the method of affinity chromatography, receptor-ligand interactions are allowed to occur, and the binding efficiency of receptor and ligand and/or turnover of receptor-ligand complexes is measured. Alternatively, affinity chromatography can be used to isolate one or more components of a receptor-ligand interaction for further analysis (March et al., 1974, Adv. Exp. Med. Biol., 42:3).
  • the method of affinity chromatography typically involves immobilizing on a solid support one component, for example a known ligand for a receptor, and then incubating the immobilized ligand with radiolabelled protein under optimal binding conditions. To measure the exact binding affinity of a given ligand-receptor pair, an increasing amount of non-labeled competitor is added. This assay can be used to assess altered binding efficiency resulting from the presence of a polymorphism in a protein of interest.
  • Receptor Activation Assays: Phosphorylation, Kinase Activity and Mitogenic Stimulation
  • the biological function of a receptor is usually assayed in cell culture following over-expression.
  • the phosphorylated state of a receptor can be assayed directly by immunological methods by employing an antibody that specifically recognizes a phosphorylated residue (Bangalore, 1992., Proc Natl Acad Sci USA., 89:11637).
  • Endogenous kinase activity associated with a receptor is measured via the incorporation of radiolabelled phosphate in immunoprecipitated receptor complex (Kazlauskas and Cooper, 1989, Cell 58:1121).
  • Downstream events of receptor activity including mitogenic stimulation or MAP kinase activity can be measured by tritiated thymidine incorporation (Luo et al., 1996, Cancer Res. 56:4983), or by mobility-shift analysis of MAP kinase on western blots (Vietor, 1993., J. Biol. Chem. 268:18994), respectively.
  • Immunocytochemical methods can be used to determine if a receptor-ligand complex is correctly translocated to the nucleus.
  • nuclear preparations prepared as described below
  • Western blot or immunoprecipitation for the presence of the receptor protein.
  • a receptor is a transcriptional activator
  • the ability of the receptor to induce gene expression can be measured by a variety of methods including Northern blot analysis, or reporter gene assays wherein the promoter region isolated from a gene that is activated by the receptor regulates the expression of a reporter protein.
  • the gene of interest may encode a protein that has an enzymatic activity wherein the enzyme catalyzes a reaction that is critical to the general metabolism of a cell.
  • assays can be performed to measure the enzymatic activity of the protein.
  • the protein of interest may also be involved in various aspects of DNA synthesis or replication.
  • In vitro assays for the enzymatic reactions involved in DNA synthesis or replication e.g. polymerase, ligase, exonuclease or helicase activity
  • the biological activity of the proteins catalyzing these activities is assayed in vitro using standard enzymatic techniques (Adams, 199, DNA Replication: A Practical Approach I, Rickwood, et al., Eds., IRL Press. Oxford, England).
  • assays for measuring transporter activity or the activity of ATP dependent pumps are useful, according to the invention, for determining if a mutated protein is impaired in these functions.
  • the full-length cDNA clone is isolated by standard expression cloning strategies, and a change in activity of the full-length cDNA or antisense cDNA upon microinjection into Xenopus laevis oocytes is determined by measuring changes in influx/efflux transport of radiolabelled amino acid molecules (Broer et al., 1995, Biochem J., 312(Pt 3):863), neurotransmitters or their metabolites.
  • the coupling ratios e.g. moles substrate transported/mole ATP hydrolyzed
  • the gene of interest may encode a protein that is a component of an ion channel. Immunocytochemical methods can be used to determine if an ion channel protein demonstrates the appropriate cell type specificity.
  • the activity of an ion channel can be measured by electrophysiological methods in oocytes. Alternatively, the sensitivity of ion channel activity to a particular inhibitor can be determined.
  • This technology represents a useful system for studying various aspects of ion channels encoded for by foreign mRNAs including channel expression, single-channel behavior, and the response of channels to the action of pharmacologically active substances (Sigel, 1987 J. Physiol., 386: 73).
  • the function of individual channel proteins is determined by the high resolution patch clamp technique.
  • This technique (which is useful in a variety of cell types, including Xenopus oocytes described above) involves measuring changes in transmembrane current across the cell membrane in vitro (Sachs et al., 1983, Methods Enzymol., 103: 147). Processes such as signaling, secretion, and synaptic transmission are examined at the cellular level by the patch clamp method.
  • the gene expression pattern and protein structure of ionic channels can be determined by combining information derived from high-resolution electrophysiological recordings obtained by the patch clamp method with molecular biological analysis (Liem et al., 1995, Neurosurgery, 36: 382).
  • a polymorphic variation in a gene that encodes a protein that is a member of a multimeric protein complex, such as an ion channel or a cytoskeletal structural component, can alter the assembly and function of the multimeric protein complex (Lee et al., 1994., Biophys J., 66: 667).
  • a gene variation may affect protein-protein interaction, or disrupt the production of components of a multimeric complex, thereby disrupting stoichiometry and consequently decreasing stability.
  • In vitro assembly assays (described above) can be performed to determine if a polymorphism has affected the assembly of an ion channel.
  • the influence of a polymorphism on general aspects of cell behavior, including cell morphology, adhesive properties, differentiation and proliferation, can be assessed using a combination of methods including microscopic observation of cell cultures (Azuma et al., 1994, Histol. Histopathol., 9:781), immunohistochemistry, and FACS analysis techniques (Beesley, 1993, Immunocytochemistry: A Practical Approach, Rickwood, et al., (Eds), IRL Press and Ormerod, 1994, Flow Cytometry: A Practical Approach, Rickwood et al., (Eds), IRL Press. Oxford, England).
  • Apoptosis has been implicated in the etiology and pathophysiology of a variety of human diseases.
  • Gene variants which influence the process of apoptosis can be assessed by a variety of methods of analysis involving either the tissues or cells (Allen et al., 1997, J Pharmacol Toxicol Methods, 37: 215).
  • Cell cultures expressing the gene variants of interest are analysed using Annexin V which interacts strongly with phosphatidylserine residues that have been exposed as a result of plasma membrane breakdown occurring in the early stages of apoptosis.
  • TdT-mediated deoxyuridine triphosphate (dUTP)-biotin nick end-labeling (TUNEL) is a preferred method for specific staining of apoptotic cells in histological sections and cytology specimens (Labat-Moleur et al., 1998, J. Histochem Cytochem., 46:327; Sasano et al., 1998., Diagn Cytopathol., 18:398).
  • Apoptosis is also detected by quantification of DNA fragmentation by ethidium bromide staining and gel electrophoresis, or by the use of saturation labeling of 3' ends of DNA fragments (Peng and Liu, 1997, Lab Invest., 77:547).
  • Assay for In Vivo Receptor Function: Growth Cone Guidance Assay. Activation of cell-surface receptors can result in the stimulation of cell motility.
  • signaling molecules, for example the netrins (Serafini et al., 1994, Cell. 78: 409), which are responsible for both contact-mediated or chemo-mediated attraction and repulsion of migrating cells.
  • a classic model for this activity is the trajectory that the leading edge "growth cone" takes when a neuron is stimulated to grow out from explanted neural tissue in cell culture (Goodman, 1996, Annu Rev Neurosci. 19: 341).
  • Ligands present in the culture medium or immobilized on a substrate bind to receptors on the cell-surface of the growth cone and trigger second-messenger signals thereby dictating an appropriate steering response.
  • the biological activity of such receptors or ligands can be measured by overexpressing the receptor or ligand protein in culture and then monitoring growth cone guidance (Kremoser et al., 1995, Cell 82: 359). Attraction or repulsion of cells which is observed to be different than normal is an indication of the role of this protein in growth guidance, and identifies the polymorphisms as altering function.
  • Changes in gene expression or protein function that result from the presence of a polymorphism can be detected by in vivo assays including the production of transgenic animals, knock out animals or the analysis of naturally occurring animal models of a particular disease.
  • Transgenic mice provide a useful tool for genetic and developmental biology studies and for the determination of a function of a novel sequence. According to the method of conventional transgenesis, additional copies of normal or modified genes are injected into the male pronucleus of the zygote and become integrated into the genomic DNA of the recipient mouse. The transgene is transmitted in a Mendelian manner in established transgenic strains.
  • Constructs useful for creating transgenic animals comprise genes under the control of either their normal promoters or an inducible promoter, reporter genes under the control of promoters to be analysed with respect to their patterns of tissue expression and regulation, and constructs containing dominant mutations, mutant promoters, and artificial fusion genes to be studied with regard to their specific developmental outcome.
  • Transgenic mice are useful according to the invention for analysis of the dominant effects of overexpressing a candidate gene in the mouse. Typically, DNA fragments on the order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998, New. Anat, 253: 19).
  • Transgenic animals can be created with a construct comprising a candidate gene containing one or more polymorphisms according to the invention.
  • Transgenic mice engineered to overexpress a number of genes, including PCK1 (Valera et al., 1994, Proc. Natl. Acad. Sci. USA, 91: 9151), INS (Mitanchez et al., FEBS Letters, 421: 285), IAPP (D'Alession et al., 1994, Osteoporosis, 43: 1457), Asp (Klebig et al., Proc. Natl. Acad. Sci. USA, 92: 4728) and Agrt (Graham et al., Nature Genetics, 17:273), have been prepared and may be useful for studying osteoporosis.
  • Knock out animals are produced by the method of creating gene deletions with homologous recombination. This technique is based on the development of embryonic stem (ES) cells that are derived from embryos, are maintained in culture and have the capacity to participate in the development of every tissue in the mouse when introduced into a host blastocyst. A knock out animal is produced by directing homologous recombination to a specific target gene in the ES cells, thereby producing a null allele of the gene. The potential phenotypic consequences of this null allele (either in heterozygous or homozygous offspring) can be analysed (Reeves, supra).
  • Single or double knock out mice that may be useful for studying osteoporosis have been produced for a number of genes including IRS1 (Araki et al., 1994, Nature, 372:186; Tamemoto et al., 1994, Nature, 372:182), IRS2 (Withers et al., 1998, Nature, 391:900), INSR, BJJRKO, MJJRKO, INSR (Lamothe et al., 1998, FEBS Letter, 426:381), GLUT2, GLUT4 (Katz et al., 1995, Nature, 377:151), GLP1R (Gallwitz and Schmidt, 1997, Z.
  • The method of targeted homologous recombination has been improved by the development of a system for site-specific recombination based on the bacteriophage P1 site-specific recombinase Cre.
  • The Cre-loxP site-specific DNA recombinase from bacteriophage P1 is used in transgenic mouse assays in order to create gene knockouts restricted to defined tissues or developmental stages. Regionally restricted genetic deletion, as opposed to global gene knockout, has the advantage that a phenotype can be attributed to a particular cell/tissue (Marth, 1996, Clin. Invest. 97: 1999).
  • In the Cre-loxP system, one transgenic mouse strain is engineered such that loxP sites flank one or more exons of the gene of interest.
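  • As an illustration only, the effect of Cre on a floxed exon can be sketched in Python. The sketch assumes two loxP sites in the same orientation (direct repeats) and that Cre is present; it uses the canonical 34-bp loxP sequence, while the flanking and exon sequences are short hypothetical placeholders. Inverted sites, which give inversion rather than deletion, are not modeled.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34-bp loxP site


def cre_excise(seq: str) -> str:
    """Return seq after excision between the first two direct-repeat loxP sites.

    The sequence is returned unchanged if fewer than two sites are found.
    """
    first = seq.find(LOXP)
    if first == -1:
        return seq
    second = seq.find(LOXP, first + len(LOXP))
    if second == -1:
        return seq
    # Recombination between direct repeats deletes the flanked segment
    # and leaves a single loxP site behind.
    return seq[: first + len(LOXP)] + seq[second + len(LOXP):]


# Hypothetical floxed allele: upstream + loxP + "exon" + loxP + downstream
floxed = "GGGG" + LOXP + "CCCCCC" + LOXP + "TTTT"
deleted = cre_excise(floxed)
print(len(floxed), len(deleted))
```

  • In this toy model the deleted allele keeps one loxP site and loses the flanked placeholder exon, which is the null allele that a tissue-restricted knockout relies on.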
  • Amplified products useful according to the invention can be prepared by utilizing the method of PCR as described in Section B entitled "Production of a Polynucleotide Sequence".
  • Primers useful for producing an amplified product according to the invention can be designed and synthesized as described in Section A entitled “Design and Synthesis of Oligonucleotide Primers”.
  • The invention provides methods (e.g. Southern blot analysis, PCR, primer extension and oligonucleotide hybridization) of detecting a polymorphism in an amplified product.
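  • Detection by oligonucleotide hybridization can be pictured with a small Python sketch. This is an idealized model, not the method of the invention: perfectly stringent hybridization is approximated by exact string matching of an allele-specific probe against either strand of the amplified product, and the amplicon and probe sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_probes(amplicon: str, probes: dict[str, str]) -> list[str]:
    """Return names of probes that match the amplicon exactly on either strand."""
    hits = []
    for name, probe in probes.items():
        if probe in amplicon or reverse_complement(probe) in amplicon:
            hits.append(name)
    return hits


# Hypothetical amplicon and allele-specific probes differing at one base
amplicon = "GATTACAGCTTGACCGTA"
probes = {"allele_T": "CAGCTTGAC", "allele_C": "CAGCTCGAC"}
print(matching_probes(amplicon, probes))
```

  • A heterozygous sample would be represented by two amplicons, one per allele, each matching a different probe. Real hybridization tolerates some mismatch depending on wash stringency, which this sketch does not model.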
  • Polynucleotide sequences which encode candidate gene protein fragments, fusion proteins or functional equivalents thereof may be used in recombinant DNA molecules that direct the expression of a candidate gene protein in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be used to clone and express the candidate gene protein. As will be understood by those of skill in the art, it may be advantageous to produce candidate gene-encoding nucleotide sequences possessing non-naturally occurring codons.
  • Codons preferred by a particular prokaryotic or eukaryotic host can be selected, for example, to increase the rate of protein expression or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life as compared to transcripts produced from the naturally occurring sequence.
  • Nucleotide sequences of the present invention can be engineered in order to alter a candidate gene-encoding sequence for a variety of reasons, including but not limited to, alterations which modify the cloning, processing and/or expression of the gene product. For example, mutations may be introduced using techniques which are well known in the art, e.g., site-directed mutagenesis to insert new restriction sites, to alter glycosylation patterns, to change codon preference or to produce splice variants.
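  • The degeneracy argument can be made concrete with a short Python sketch that swaps codons for synonymous ones and confirms that the encoded amino acid sequence is unchanged. The standard genetic code is used; the `preferred` mapping and the example sequence are hypothetical and do not represent any particular host's measured codon usage.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons of an uppercase DNA coding sequence."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna: str, preferred: dict[str, str]) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out) + dna[usable:]


# Hypothetical preference table and coding sequence (Met-Leu-Arg-stop)
preferred = {"L": "CTG", "R": "CGT"}
original = "ATGTTACGATAA"
recoded = recode(original, preferred)
print(recoded, translate(recoded) == translate(original))
```

  • The nucleotide sequence changes while the translation does not, which is why such recoded sequences still encode the same candidate gene protein.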
  • a natural, modified or recombinant candidate gene protein-encoding sequence may be ligated to a heterologous sequence to encode a fusion protein (as described in Section B entitled "Production of a Polynucleotide Sequence").
  • a fusion protein may also be engineered to contain a cleavage site located between a candidate protein and the heterologous protein sequence, so that the protein of interest may be substantially purified away from the heterologous moiety following cleavage.
  • The sequence encoding the candidate gene protein may be synthesized, whole or in part, using chemical methods well known in the art (see Caruthers et al., 1980, Nuc Acids Res Symp Ser, 7:215; Horn et al., 1980, Nuc Acids Res Symp Ser, 225, etc.).
  • The protein itself, or a portion thereof, could be produced using chemical methods of synthesis.
  • Peptide synthesis can be performed using various solid-phase techniques (Roberge et al., 1995, Science, 269:202) and automated synthesis may be achieved, for example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.
  • The newly synthesized peptide can be substantially purified by preparative high performance liquid chromatography (e.g., Creighton, 1983, Proteins, Structures and Molecular Principles, WH Freeman and Co., New York NY).
  • The composition of the synthetic peptides may be confirmed by amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra). Additionally, the amino acid sequence of interest, or any part thereof, may be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins, or any part thereof, to produce a variant polypeptide.
  • Expression Systems. In order to express a biologically active protein, the nucleotide sequence encoding the protein of interest, or its functional equivalent, is inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted coding sequence.
  • a variety of expression vector/host systems may be utilized to contain and express a protein product of a candidate gene according to the invention. These include but are not limited to microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with virus expression vectors (e.g., baculovirus); plant cell systems transfected with virus expression vector (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with bacterial expression vectors (e.g., Ti or pBR322 plasmid); or animal cell systems.
  • The control elements or “regulatory sequences” of these systems vary in their strength and specificities and are those nontranslated regions of the vector, enhancers, promoters, and 3' untranslated regions, which interact with host cellular proteins to carry out transcription and translation.
  • Any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used.
  • Inducible promoters such as the hybrid lacZ promoter of the Bluescript® phagemid (Stratagene, La Jolla CA) or pSport1 (Gibco BRL) and ptrp-lac hybrids and the like may be used.
  • The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers derived from the genomes of plant cells (e.g., heat shock, RUBISCO and storage protein genes) or from plant viruses (e.g., viral promoters or leader sequences) may be cloned into the vector. In mammalian cell systems, promoters from the mammalian genes or from mammalian viruses are most appropriate. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding the protein product of the gene of interest, vectors based on SV40 or EBV may be used with an appropriate selectable marker. In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the protein of interest.
  • vectors which direct high level expression of fusion proteins that are readily purified may be desirable.
  • Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as Bluescript® (Stratagene), in which the sequence encoding the protein of interest may be ligated into the vector in frame with sequences encoding the amino-terminal Met and the subsequent 27 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke & Schuster, 1989, J Biol Chem 264:5503); and the like.
  • pGEX vectors may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST).
  • Such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione.
  • Proteins made in such systems are designed to include heparin, thrombin or factor XA protease cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at will.
  • In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase and PGH may be used.
  • the expression of a sequence encoding a protein of interest may be driven by any of a number of promoters.
  • viral promoters such as the 35S and 19S promoters of CaMV (Brisson et al., 1984, Nature 310:511) may be used alone or in combination with the omega leader sequence from TMV (Takamatsu et al, 1987, EMBO J 3:17).
  • Plant promoters such as the small subunit of RUBISCO (Coruzzi et al., 1984, EMBO J 3:1671; Broglie et al., 1984, Science, 224:838) or heat shock promoters (Winter J and Sinibaldi RM, 1991, Results Probl Cell Differ., 17:85) may be used. These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. For reviews of such techniques, see Hobbs S or Murry LE in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill New York NY, pp 191-196 or Weissbach and Weissbach (1988) Methods for Plant Molecular Biology, Academic Press, New York, pp 421-463.
  • An alternative expression system which could be used to express a protein of interest is an insect system.
  • Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae.
  • The sequence encoding the protein of interest may be cloned into a nonessential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the sequence encoding the protein of interest will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein.
  • The recombinant viruses are then used to infect S. frugiperda cells or Trichoplusia larvae in which the protein of interest is expressed (Smith et al., 1983, J Virol 46:584; Engelhard et al., 1994, Proc Natl Acad Sci 91:3224).
  • a number of viral-based expression systems may be utilized.
  • A sequence encoding the protein of interest may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a nonessential E1 or E3 region of the viral genome will result in a viable virus capable of expressing the protein of interest in infected host cells (Logan and Shenk, 1984, Proc Natl Acad Sci, 81:3655).
  • Transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.
  • Specific initiation signals may also be required for efficient translation of a sequence encoding the protein of interest. These signals include the ATG initiation codon and adjacent sequences. In cases where the sequence encoding the protein, its initiation codon and upstream sequences are inserted into the most appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon must be provided. Furthermore, the initiation codon must be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons can be of various origins, both natural and synthetic.
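  • The reading-frame requirement can be checked mechanically. The following Python sketch assumes a hypothetical vector "leader" that supplies the ATG and ends immediately before the first codon of the insert; it reports whether the insert would be read in frame, i.e. whether the distance from the ATG to the junction is a multiple of three with no stop codon in between. The example leader sequences are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def junction_in_frame(leader: str) -> bool:
    """Return True if an insert placed directly after `leader` is read in frame.

    `leader` is the vector sequence up to (not including) the insert and is
    expected to contain the exogenous ATG initiation codon.
    """
    start = leader.find("ATG")
    if start == -1:
        return False  # no initiation codon supplied
    upstream = leader[start:]
    if len(upstream) % 3 != 0:
        return False  # insert's first codon is out of register with the ATG
    # An in-frame stop before the junction would end translation early.
    codons = (upstream[i:i + 3] for i in range(0, len(upstream), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(junction_in_frame("GCCACCATGGGA"))   # ATG followed by one full codon
print(junction_in_frame("GCCACCATGGGAT"))  # one extra base shifts the frame
```

  • A single extra or missing base between the ATG and the insert shifts every downstream codon, which is why the frame is worth verifying before cloning.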
  • Enhancers appropriate to the cell system in use (Scharf, et al., 1994, Results Probl Cell Differ, 20:125; Bittner et al, 1987, Methods in Enzymol, 153:516).
  • a host cell strain may be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion.
  • modifications of the polypeptide include but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation and acylation.
  • Post-translational processing which cleaves a "prepro" form of the protein may also be important for correct insertion, folding and/or function.
  • Different host cells such as CHO, HeLa, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms for such post-translational activities and may be chosen to ensure the correct modification and processing of the introduced, foreign protein.
  • cell lines which stably express a foreign protein may be transformed using expression vectors which contain viral origins of replication or endogenous expression elements and a selectable marker gene. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an enriched media before they are switched to selective media.
  • the purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clumps of stably transformed cells can be expanded using tissue culture techniques appropriate to the cell type.
  • Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler et al., 1977, Cell 11:223) and adenine phosphoribosyltransferase (Lowy et al., 1980, Cell 22:817) genes which can be employed in tk- or aprt- cells, respectively.
  • Antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad Sci 77:3567); npt, which confers resistance to the aminoglycosides neomycin and G-418 (Colbere-Garapin et al., 1981, J Mol Biol., 150:1); and als or pat, which confer resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively (Murry, supra).
  • Although marker gene expression suggests that the gene of interest is also present, its presence and expression should be confirmed.
  • recombinant cells containing the sequence encoding the foreign protein can be identified by the absence of marker gene function.
  • a marker gene can be placed in tandem with the sequence encoding the foreign protein under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem sequences as well.
  • host cells which contain the coding sequence for a protein of interest and express the protein of interest may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of the nucleic acid or protein.
  • the presence of the polynucleotide sequence encoding the protein of interest can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes, portions or fragments of the sequence encoding the foreign protein of interest.
  • A variety of protocols for detecting and measuring the expression of the foreign protein, using either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence-activated cell sorting (FACS).
  • A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on the protein of interest is preferred, but a competitive binding assay may be employed. These and other assays are described in Hampton et al., 1990, Serological Methods a Laboratory Manual, APS Press, St Paul MN and Maddox et al., 1983, J Exp Med 158:1211.
  • Host cells transformed with a nucleotide sequence encoding a protein of interest may be cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture.
  • the protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used.
  • Expression vectors containing a sequence encoding a protein of interest can be designed with signal sequences which direct secretion of the protein of interest through a prokaryotic or eukaryotic cell membrane.
  • Recombinant constructions may join the sequence encoding the protein of interest to the nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins (Kroll et al., 1993, DNA Cell Biol, 12:441).
  • The protein of interest may also be expressed as a recombinant protein with one or more additional polypeptide domains added to facilitate protein purification.
  • Purification-facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle WA).
  • The inclusion of cleavable linker sequences, such as Factor XA or enterokinase (Invitrogen, San Diego CA), between the purification domain and the protein of interest is useful for facilitating purification.
  • One such expression vector provides for expression of a fusion protein comprising the sequence encoding a foreign protein and nucleic acid sequence encoding 6 histidine residues followed by thioredoxin and an enterokinase cleavage site.
  • the histidine residues facilitate purification while the enterokinase cleavage site provides a means for purifying the foreign protein from the fusion protein.
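  • The purification logic of such a histidine/thioredoxin/enterokinase vector can be illustrated with a minimal Python sketch. It assumes the fusion is laid out from N- to C-terminus as six histidines, thioredoxin, the enterokinase site and then the foreign protein, and that enterokinase cuts after the lysine of its DDDDK recognition sequence. The thioredoxin and foreign-protein strings are short hypothetical placeholders, not real sequences.

```python
HIS_TAG = "HHHHHH"          # six histidine residues for metal-affinity capture
TRX_PLACEHOLDER = "GSGSGS"  # hypothetical stand-in for the thioredoxin domain
EK_SITE = "DDDDK"           # enterokinase recognition sequence


def build_fusion(protein: str) -> str:
    """Assemble the assumed tag-thioredoxin-site-protein fusion."""
    return HIS_TAG + TRX_PLACEHOLDER + EK_SITE + protein


def enterokinase_cleave(fusion: str) -> tuple[str, str]:
    """Split the fusion after the first DDDDK site into (tag part, released protein)."""
    idx = fusion.find(EK_SITE)
    if idx == -1:
        raise ValueError("no enterokinase site found")
    cut = idx + len(EK_SITE)
    return fusion[:cut], fusion[cut:]


fusion = build_fusion("MKTAYIAK")  # hypothetical foreign protein
tag_part, released = enterokinase_cleave(fusion)
print(tag_part, released)
```

  • After cleavage the histidine-bearing fragment can still bind the metal resin while the released protein no longer does, which is the separation the vector design relies on.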
  • Fragments of the protein of interest may be produced by direct peptide synthesis using solid-phase techniques (Stewart et al., 1969, Solid-Phase Peptide Synthesis, WH Freeman Co., San Francisco; Merrifield, 1963, J Am Chem Soc, 85:2149).
  • In vitro protein synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer, Foster City CA) in accordance with the instructions provided by the manufacturer.
  • Various fragments of a protein of interest may be chemically synthesized separately and combined using chemical methods to produce the full length molecule.
  • Antibodies specific for the protein products of the candidate genes of the invention are useful for protein purification, for the diagnosis and treatment of various diseases (e.g. osteoporosis) and for drug screening and drug design methods useful for identifying and developing compounds to be used in the treatment of various diseases (e.g. osteoporosis).
  • an antibody useful in the invention may comprise a whole antibody, an antibody fragment, a polyfunctional antibody aggregate, or in general a substance comprising one or more specific binding sites from an antibody.
  • The antibody fragment may be a fragment such as an Fv, Fab or F(ab')2 fragment or a derivative thereof, such as a single chain Fv fragment.
  • The antibody or antibody fragment may be non-recombinant, recombinant or humanized.
  • The antibody may be of an immunoglobulin isotype, e.g., IgG, IgM, and so forth.
  • an aggregate, polymer, derivative and conjugate of an immunoglobulin or a fragment thereof can be used where appropriate.
  • Neutralizing antibodies are especially useful according to the invention for diagnostics, therapeutics and methods of drug screening and drug design.
  • Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids and preferably at least 10 amino acids. Preferably, they should be identical to a region of the natural protein and may contain the entire amino acid sequence of a small, naturally occurring molecule. Short stretches of amino acids corresponding to the protein product of a candidate gene of the invention may be fused with amino acids from another protein such as keyhole limpet hemocyanin or GST, and antibody will be produced against the chimeric molecule.
  • Procedures well known in the art can be used for the production of antibodies to the protein products of the candidate genes of the invention.
  • Various hosts including goats, rabbits, rats, mice, etc. may be immunized by injection with the protein products (or any portion, fragment, or oligonucleotide thereof which retains immunogenic properties) of the candidate genes of the invention.
  • various adjuvants may be used to increase the immunological response.
  • adjuvants include but are not limited to Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol.
  • BCG (Bacilli Calmette-Guerin) and Corynebacterium parvum are potentially useful human adjuvants.
  • the antigen protein may be conjugated to a conventional carrier in order to increase its immunogenicity, and an antiserum to the peptide-carrier conjugate will be raised. Coupling of a peptide to a carrier protein and immunizations may be performed as described (Dymecki et al., 1992, J. Biol. Chem., 267: 4815).
  • the serum can be titered against protein antigen by ELISA (below) or alternatively by dot or spot blotting (Boersma and Van Leeuwen, 1994, J Neurosci. Methods, 51: 317).
  • The antiserum may be used in tissue sections prepared as described. A useful serum will react strongly with the appropriate peptides by ELISA, for example, following the procedures of Green et al., 1982, Cell, 28: 477.
  • Monoclonal antibodies. Techniques for preparing monoclonal antibodies are well known, and monoclonal antibodies may be prepared using a candidate antigen whose level is to be measured or which is to be either inactivated or affinity-purified, preferably bound to a carrier, as described by Arnheiter et al., 1981, Nature, 294:278.
  • Monoclonal antibodies are typically obtained from hybridoma tissue cultures or from ascites fluid obtained from animals into which the hybridoma tissue was introduced.
  • Monoclonal antibody-producing hybridomas (or polyclonal sera) can be screened for antibody binding to the target protein.
  • Antibody Detection Methods. Particularly preferred immunological tests rely on the use of either monoclonal or polyclonal antibodies and include enzyme-linked immunoassays (ELISA), immunoblotting and immunoprecipitation (see Voller, 1978, Diagnostic Horizons, 2:1, Microbiological Associates Quarterly Publication, Walkersville, MD; Voller et al., 1978, J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No. 31,006; UK Patent 2,019,408; Butler, 1981, Methods Enzymol, 73: 482; Maggio, E.
  • Labeling techniques are useful, according to the invention, for studying the biochemical properties, processing, intracellular transport, secretion and degradation of proteins.
  • Biosynthetic labeling of proteins produced by candidate genes of the invention is preferably performed with ³⁵S-methionine due to the high specific activity (>800 Ci/mmol) and ease of detection of this amino acid.
  • Another amino acid should be used to label a protein that contains little or no methionine.
  • Either suspension cells or adherent cells are labeled with ³⁵S-methionine. Briefly, cells are washed and incubated for 15 min at 37°C in short-term labeling medium (complete serum-free, methionine-free RPMI or DMEM containing 5% (v/v) dialyzed fetal bovine serum) to deplete intracellular pools of methionine.
  • Cells can be labeled in the presence of ³⁵S-methionine in long-term labeling medium (90% methionine-free RPMI or DMEM) for up to 16 hours (Ausubel et al., supra).
  • the protein product of the cloned candidate gene of the invention can be produced by the methods of in vitro transcription and in vitro translation.
  • In vitro transcription is performed essentially as described in Section B entitled "Production of a Polynucleotide Sequence" in the absence of a labeled ribonucleoside.
  • The RNA produced by the in vitro transcription reaction will be extracted with phenol, ethanol precipitated twice and resuspended in 10 µl of TE buffer.
  • In vitro translation is performed by adding 1 to 10 µl of RNA to an in vitro translation kit (e.g. wheat germ or reticulocyte lysate) in the presence of 15 µCi [³⁵S]methionine, following the directions provided by the manufacturer.
  • A typical reaction is carried out in a 30 µl volume at room temperature for 30 to 60 minutes (Ausubel et al., supra).
  • Mammalian cells expressing a nucleotide sequence comprising a polymorphism are useful, according to the invention for determining the biochemical and functional properties of the protein product of a nucleotide sequence comprising a polymorphism, for analyzing expression of a candidate gene, for large scale production of a protein of interest, for drug screening and for the production of transgenic animals or knockout mice.
  • Methods of efficiently introducing foreign DNA into mammalian cells include calcium phosphate transfection, DEAE-dextran transfection, electroporation and liposome-mediated transfection (Ausubel et al., supra).
  • The method of calcium phosphate transfection involves preparing a precipitate by slowly mixing a HEPES-buffered saline solution with a mixture of calcium chloride and DNA. According to this method, up to 10% of the cells on a dish will incorporate DNA. Cells to be transfected are split one day prior to transfection so that on the day of transfection cells are well-separated on the plate. A 10-cm dish of cells is fed with 9.0 ml of complete medium approximately 2 to 4 hours before the addition of the precipitate.
  • DNA to be transfected (10-50 µg/10-cm plate) is ethanol precipitated, resuspended in 450 µl sterile water and mixed with 50 µl of 2.5 M CaCl₂.
  • The DNA/CaCl₂ solution is added dropwise to a 15-ml conical tube containing 500 µl 2X HeBS (0.283 M NaCl, 0.023 M HEPES acid, 1.5 mM Na₂HPO₄, pH 7.05). It is preferable to bubble the HeBS solution during the addition of the DNA mixture.
  • After the precipitate has formed for 20 minutes at room temperature, it is added evenly to the cells.
  • The cells are incubated with the precipitate at 37°C in a humidified CO₂ incubator for 4-16 hours. Following removal of the precipitate, the cells are washed with PBS and fed in complete medium.
  • Glycerol or dimethyl sulfoxide shock can be used to increase the DNA uptake by certain types of cells (Ausubel et al., supra).
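  • The mixing steps of the calcium phosphate protocol imply simple dilution arithmetic, sketched below in Python. Volumes are treated as relative parts (450 parts DNA solution, 50 parts 2.5 M CaCl2, 500 parts 2X HeBS), so the result does not depend on the volume unit; the function name is illustrative only.

```python
def diluted(stock_conc: float, stock_parts: float, total_parts: float) -> float:
    """Concentration of a stock after dilution to total_parts (C1*V1 = C2*V2)."""
    return stock_conc * stock_parts / total_parts


dna_parts, cacl2_parts, hebs_parts = 450, 50, 500

# Step 1: DNA solution mixed with 2.5 M CaCl2
dna_mix_parts = dna_parts + cacl2_parts
cacl2_in_dna_mix = diluted(2.5, cacl2_parts, dna_mix_parts)

# Step 2: DNA/CaCl2 mix added to an equal volume of 2X HeBS
final_parts = dna_mix_parts + hebs_parts
cacl2_final = diluted(2.5, cacl2_parts, final_parts)
nacl_final = diluted(0.283, hebs_parts, final_parts)
hebs_strength = diluted(2.0, hebs_parts, final_parts)  # in "X" units

print(cacl2_in_dna_mix, cacl2_final, hebs_strength, round(nacl_final, 4))
```

  • Under these assumptions the DNA mix contains 0.25 M CaCl2, and the final precipitation mixture is 0.125 M CaCl2 in 1X HeBS.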
  • Cells to be transfected are plated at a concentration such that after 3 days of growth they are 30-50% confluent.
  • The DNA to be transfected (approximately 4 µg) is ethanol precipitated, resuspended in 40 µl TBS and added slowly while shaking to 80 µl of warm 10 mg/ml DEAE-dextran in TBS.
  • cells are shocked by the addition of 5 ml of 10% DMSO in PBS. After a 1 minute incubation at room temperature, cells are washed with PBS and fed with complete medium (Ausubel et al., supra).
  • DNA can be introduced into cells by the use of high-voltage electric shocks, a technique termed electroporation.
  • For electroporation, cells are suspended in an appropriate electroporation buffer and placed in an electroporation cuvette.
  • the cuvette is connected to a power supply and the cells are subjected to a high-voltage electrical pulse of a defined magnitude and length, optimized for the cell type being transfected. After a brief period of recovery, the cells are placed in normal culture medium.
  • A population of cells to be transfected by electroporation is grown to late-log phase in complete medium. Typically stable transfection requires 5 × 10⁶ cells, and transient transfection requires 1-4 × 10⁷ cells. Cells are harvested by centrifugation for 5 minutes at 640 × g at 4°C. The resulting cell pellet is resuspended in half of the original volume of ice-cold electroporation buffer (e.g. PBS without calcium or magnesium, HEPES-buffered saline, tissue culture medium without serum, or phosphate-buffered sucrose (272 mM sucrose/7 mM K₂HPO₄, pH 7.4/1 mM MgCl₂)). The choice of an electroporation buffer is dictated by the cell line.
  • Cells are then harvested by centrifugation for 5 minutes at 640 x g at 4°C, and resuspended at 1 x 10^7/ml in electroporation buffer at 0°C for stable transfection or at a higher concentration (up to 8 x 10^7/ml) for transient transfection. Aliquots of the cells (0.5 ml) are transferred into the desired number of electroporation cuvettes and placed on ice.
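  • The resuspension step can be checked with a little arithmetic. The sketch below is illustrative only: the function name and the example harvest of 4 x 10^7 cells are assumptions, while the target concentrations and the 0.5 ml aliquot size are taken from the passage above.

```python
import math


def resuspension_plan(total_cells, target_per_ml, aliquot_ml=0.5):
    """Return (buffer volume in ml, number of full cuvette aliquots).

    total_cells   -- number of cells in the pellet
    target_per_ml -- desired cell concentration after resuspension
    aliquot_ml    -- volume loaded per cuvette (0.5 ml in the text)
    """
    if total_cells <= 0 or target_per_ml <= 0 or aliquot_ml <= 0:
        raise ValueError("all inputs must be positive")
    volume_ml = total_cells / target_per_ml
    # Small epsilon guards against floating-point error at exact multiples.
    n_cuvettes = math.floor(volume_ml / aliquot_ml + 1e-9)
    return volume_ml, n_cuvettes


# Hypothetical harvest of 4 x 10^7 cells.
stable = resuspension_plan(4e7, 1e7)      # stable transfection, 1 x 10^7/ml
transient = resuspension_plan(4e7, 8e7)   # transient, upper limit 8 x 10^7/ml
```

Under these assumed inputs the same pellet fills several cuvettes at the stable-transfection concentration but only one at the highest transient concentration.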
  • DNA is added to the cell suspension in the cuvettes on ice.
  • DNA (optimally 1-10mg) should be linearized with a restriction enzyme that cuts at a site in a non-essential region, purified by phenol extraction and ethanol precipitated.
  • Supercoiled DNA (optimally 10mg) may be used for transient transfection. The DNA/cell suspension is mixed, and incubated on ice for 5 minutes.
  • the cuvette is placed in the holder in the electroporation apparatus (at room temperature) and shocked one or more times at the desired voltage and capacitance settings.
  • An electroporation apparatus useful according to the invention is the Bio-Rad Gene Pulser.
  • the number of shocks and the voltage and capacitance settings will vary depending on the cell type, and should be optimized. The two parameters that are critical for successful electroporation are the maximum voltage for the shock and the duration of the current pulse.
  • the cuvette containing the mixture of cells and DNA is incubated on ice for 10 minutes.
  • the transfected cells are diluted 20-fold in complete culture medium.
  • For stable transfection, cells are grown for 48 hours in nonselective medium and then transferred to antibiotic-containing medium.
  • For transient transfection, cells are incubated 50-60 hours and then harvested for the desired transient assay.
  • Transgenic animals expressing a construct comprising a candidate gene containing a polymorphism according to the invention can be produced by methods well known in the art (reviewed in Reeves et al., supra). Knockout mice wherein a candidate gene according to the invention has been disrupted can be produced by methods well known in the art (reviewed in Moreadith and Radford, 1997, J. Mol. Med., 75:208 and Shastry, 1998, Mol. Cell. Biochem., 181:163). These animals provide useful models for studying the functional consequences of one or more polymorphisms in a gene of interest.
  • the invention provides a method of producing a candidate gene library comprising genes that are potentially associated with the susceptibility to, or pathogenesis of a disease.
  • a candidate gene library is useful for determining the genetic basis of a disease of interest. Genetic susceptibility to a disease must occur as a result of specific DNA differences relative to non-susceptible individuals. In the case of osteoporosis, many genes are known which are potentially involved in the susceptibility to, or pathogenesis of the disease. These genes are included in the candidate gene library and the association of these genes with osteoporosis is determined from population studies according to the invention.
  • Unlike linkage studies wherein a region of the genome that is thought to be involved in a disease is determined, the candidate gene strategy, including association studies, addresses the involvement of a particular gene in a disease.
  • the results of association studies of candidate genes are used to identify genes that should be intensively studied as potential therapeutics or therapeutic targets.
  • the full range of polymorphic sites within each candidate gene is identified and examined in diseased and normal populations. The frequency of each gene variant (allele) in each population is then compared to the other. If a specific polymorphism under analysis contributes to the disease phenotype, it will be present in the diseased population at a higher frequency than in the normal population.
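  • The comparison of allele frequencies between diseased and normal populations can be sketched as a 2 x 2 chi-square test. This is a minimal illustration, not the statistical method of the invention: the function name and the allele counts are hypothetical, and the test shown is the standard Pearson chi-square with one degree of freedom.

```python
import math


def allele_association(case_a, case_b, ctrl_a, ctrl_b):
    """Compare counts of alleles A and B in cases versus controls.

    Returns (frequency of A in cases, frequency of A in controls,
    chi-square statistic, p-value for 1 degree of freedom).
    """
    n_case, n_ctrl = case_a + case_b, ctrl_a + ctrl_b
    n_a, n_b = case_a + ctrl_a, case_b + ctrl_b
    total = n_case + n_ctrl
    if min(n_case, n_ctrl, n_a, n_b) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = total * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        n_case * n_ctrl * n_a * n_b
    )
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return case_a / n_case, ctrl_a / n_ctrl, chi2, p_value


# Hypothetical counts: 100 chromosomes sampled from each population.
freq_case, freq_ctrl, chi2, p_value = allele_association(60, 40, 40, 60)
```

With these assumed counts allele A is more frequent in the diseased group, and the small p-value indicates the difference is unlikely to arise by chance alone.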
  • If the specific polymorphism under analysis does not itself contribute to the disease phenotype but resides elsewhere in, or is near to, a gene containing a contributory polymorphism, a significant association may still be seen with the polymorphic marker being tested. This is because the two markers are in linkage disequilibrium with each other due to their close proximity.
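  • Linkage disequilibrium between two biallelic sites is commonly summarized by the coefficients D, D' and r^2. The sketch below computes them from haplotype counts; the function name and the example counts are assumptions made for illustration.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-site haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at site 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max, d * d / denom


# Hypothetical counts: complete association versus independence.
tight = linkage_disequilibrium(50, 0, 0, 50)
loose = linkage_disequilibrium(25, 25, 25, 25)
```

An r^2 near 1 means a tested marker is a good proxy for a nearby contributory polymorphism; an r^2 near 0 means it carries little information about it.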
  • the goal of linkage studies is to determine the approximate position of disease genes by studying related individuals in families. According to linkage strategies, DNA markers that are randomly spaced throughout the genome, but are rarely located within genes, are tested for the frequency of their presence along with the particular disease phenotype. There is approximately a 50% chance of an unlinked gene and marker gene co-localizing. If a particular marker is present at a significantly higher frequency than expected in disease individuals, this indicates that the marker is located in the vicinity of the disease gene. Usually the disease gene is delimited to a large region (containing tens to hundreds of genes). After a disease gene has been grossly mapped, this entire region must be extensively characterized to determine what genes are present in the region. Any gene that is identified according to this method becomes a candidate gene.
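  • Evidence for linkage between a marker and a disease locus is conventionally expressed as a LOD score, which compares the likelihood of the data at a recombination fraction theta with the likelihood under no linkage (theta = 0.5). The sketch below assumes phase-known, fully informative meioses; the function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio of linkage at theta versus theta = 0.5."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")

    def log_likelihood(t):
        non_recombinants = meioses - recombinants
        result = 0.0
        if recombinants:
            if t == 0:
                return float("-inf")  # recombinants are impossible at theta = 0
            result += recombinants * math.log10(t)
        if non_recombinants:
            result += non_recombinants * math.log10(1 - t)
        return result

    return log_likelihood(theta) - log_likelihood(0.5)


# Hypothetical family data: 2 recombinants among 20 informative meioses.
linked = lod_score(2, 20, 0.1)
unlinked = lod_score(10, 20, 0.1)
```

A LOD score of 3 or more is the conventional threshold for declaring linkage; a strongly negative score argues against linkage at the tested theta.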
  • a series of genetic crosses is performed in an animal model system of a particular defect that is characteristic of a disease of interest (e.g. osteoporosis) between individuals having an observable mutant phenotype and normal individuals of a control strain.
  • At least one disease-related locus is used as a marker in these crosses.
  • Linkage analysis can be performed using chromosomal markers that do not comprise a disease-related locus (described below). If non-random assortment of the mutant trait with a marker locus is observed, and if that non-random assortment is statistically significant (for example, if a Student's t test or ANOVA is applied to the results), the trait is linked to the marker locus.
  • Pedigree analysis is a useful technique for identifying genes for which variant alleles may contribute to the risk, onset or progression of a disease in a family containing multiple individuals afflicted with a disease; according to this method, numerous genetic loci from affected and unaffected family members are compared. Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
  • YAC: yeast artificial chromosome
  • BAC: bacterial artificial chromosome
  • An initial evaluation may be performed with the assistance of a computer program, such as the PathCalling™ (CuraGen) biological pathway discovery platform.
  • All or a subset of the open reading frames present in the region are then cloned (e.g., by PCR) from mutant animals or affected family members and from their healthy counterparts (either control animals or unaffected family members), and the sequences of these open reading frames are compared.
  • If a mutation or other allelic variant is found to be linked to individuals displaying the disease phenotype (in a statistically-significant, non-random manner), it can be concluded that this mutation is associated with a disease phenotype.
  • a nucleic acid fragment containing this gene can be labeled and used as a probe for in situ hybridization analysis of fixed chromosomes of the human or other mammal to determine precisely the physical location of the gene.
  • a gene that has been mapped and isolated in this manner may be useful as a candidate target for disease diagnosis and for drug targeting according to the invention (see below).
  • A candidate gene library according to the invention will include i. genes that are involved in known or predicted disease pathways, ii. new genes that are identified by a relevant pattern of specific tissue or cell expression, iii. genes that map to genomic regions of known linkage, and iv. gene sequences (from sequence databases) that are homologs of the above referenced categories of potential candidate genes.
  • the choice of potentially related genes to be selected from a database will depend on the percent identity as calculated by Fast DB and based upon mismatch penalty, gap penalty, gap size penalty and joining penalty.
  • Predictions can be made regarding a cell or tissue-type that would be expected to express high or low levels of candidate genes associated with a particular disease. For osteoporosis, it is expected that muscle, adipose, pancreas or liver tissue or tissue comprising insulin secreting pancreatic β-cells, would be useful for identifying candidate genes according to the invention.
  • SAGE depends on the following two principles. First, sufficient information is contained within a short nucleotide sequence (approximately 9-10 bp), isolated from a defined location within a transcript, to uniquely identify a transcript. Second, the concatenation of short tags of sequence allows transcripts to be analysed serially by sequencing multiple tags within a single clone.
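  • The first principle rests on simple combinatorics: a tag of n bases can take 4^n distinct values. The short sketch below (function name assumed) evaluates this for the tag lengths mentioned above.

```python
def distinct_tags(length_bp):
    """Number of different sequences possible for a tag of the given length."""
    if length_bp < 0:
        raise ValueError("length must be non-negative")
    return 4 ** length_bp


nine_bp = distinct_tags(9)    # 262,144 possible tags
ten_bp = distinct_tags(10)    # 1,048,576 possible tags
```

Both figures are large relative to the number of transcripts expressed in a typical cell type, which is why a short tag taken from a defined position can usually distinguish one transcript from another.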
  • The method of SAGE is performed by synthesizing double-stranded cDNA from mRNA, cleaving the resulting cDNA with an anchoring restriction endonuclease that is expected to cleave most transcripts at least one time, and isolating the most 3' region of the cleaved cDNA by binding to streptavidin beads.
  • This protocol allows for the identification of a unique site on a transcript that corresponds to the restriction site located closest to the polyA tail.
  • Replicate samples of the most 3' region of the cDNA are ligated to one of two linker molecules that contain a Type IIS restriction site for a tagging enzyme.
  • the cleavage site for Type IIS restriction endonucleases is located at a defined distance up to 20 bp from the asymmetric recognition site.
  • Linkers are designed such that upon cleavage of the ligation product with the tagging enzyme there is release of the linker and an attached short region of cDNA.
  • the two pools of released tags are ligated to each other and the resulting ligated product is used as a template for PCR amplification in the presence of primers that are specific for each linker.
  • The PCR product is cleaved with the anchoring enzyme and amplification products, comprising two tags linked tail to tail, are isolated, concatenated by ligation, cloned and sequenced (Velculescu et al., supra).
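  • The logic of tag definition can be mimicked in silico: locate the anchoring-enzyme site closest to the 3' end of a transcript and take the bases immediately downstream as the tag. The sketch below is an illustration under stated assumptions: the anchoring site CATG (the NlaIII recognition sequence), the 10 bp tag length, the function names and the toy transcripts are all chosen for the example and are not specified in the passage above.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: NlaIII recognition sequence
TAG_LENGTH = 10       # assumption: 10 bp tag


def sage_tag(cdna, anchor=ANCHOR_SITE, tag_len=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None."""
    cdna = cdna.upper()
    pos = cdna.rfind(anchor)          # site closest to the 3' end
    if pos == -1:
        return None                   # transcript is not cut by the enzyme
    start = pos + len(anchor)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def count_tags(transcripts):
    """Tally tags over a collection of transcript sequences."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)


# Toy transcripts (hypothetical sequences).
one_site = "GGCATGAAACCCGGGTTTAAAA"
two_sites = "CATGTTTTTTTTTTCATGACGTACGTACGG"
counts = count_tags([one_site, one_site, two_sites, "AAAAAA"])
```

Counting how often each tag occurs gives a digital estimate of the relative abundance of the corresponding transcript.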
  • Differential display provides a method for separating and cloning individual mRNAs by PCR analysis.
  • oligonucleotide primers are selected wherein one primer is anchored to the polyadenylate tail of a subset of mRNA species and the other primer is short and of an arbitrary sequence such that it anneals at different positions relative to the first primer.
  • the mRNA subpopulations that are identified with these primer pairs are subjected to reverse transcription, amplified and analysed on a DNA sequencing gel.
  • DNA sequences to be tested for expression are spotted onto a surface, usually at high-density to allow for the testing of many genes.
  • The surface containing the DNA sequences is typically referred to as a 'chip'.
  • The spotted DNA can be either cDNA clones or oligonucleotides.
  • RNA is prepared from the two cells or tissues to be compared. The RNA from one cell/tissue will be labeled red and the RNA from the other cell/tissue will be labeled yellow. Both RNA preparations are hybridized to the DNA array. The ratio of red to yellow is indicative of the relative levels of expression between the two cells/tissues.
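  • The ratio of the two label intensities at each spot is usually reported on a log2 scale, so that equal fold changes up and down are symmetric about zero. The sketch below is a minimal illustration; the function name, the intensity floor and the example intensities are assumptions.

```python
import math


def log2_ratio(channel_1, channel_2, floor=1.0):
    """log2 of the intensity ratio between the two labeled samples.

    Intensities below `floor` are raised to it so that background-level
    spots do not produce a division by zero or an extreme ratio.
    """
    return math.log2(max(channel_1, floor) / max(channel_2, floor))


# Hypothetical spot intensities (arbitrary units).
higher_in_first = log2_ratio(800.0, 200.0)   # 4-fold higher in sample 1
higher_in_second = log2_ratio(100.0, 400.0)  # 4-fold higher in sample 2
unchanged = log2_ratio(300.0, 300.0)
```

A value of +2 or -2 corresponds to a four-fold difference in either direction, and 0 to equal expression in the two cells or tissues.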
  • Linkage analysis provides a method for identifying genes mapping to genomic regions of known linkage.
  • Linkage analysis may be performed between an unmapped candidate gene and one or more of the disease-related loci or by analyzing the genetic linkage between the candidate gene and chromosomal markers which are not themselves linked to a disease-related locus, according to the same method.
  • the spacing of markers throughout the genome of the test organism is approximately one every cM or less. This spacing will ensure complete coverage of the genome and will facilitate accurate mapping.
  • Other methods for mapping a candidate gene are provided below.
  • Radiation hybrid (RH) mapping is a somatic cell hybrid technique that was developed to create high resolution, contiguous maps of mammalian chromosomes.
  • The method is useful for ordering DNA markers spanning millions of base pairs of DNA at a resolution not easily obtained by other mapping methods (Cox et al., 1990, Science, 250: 245; Burmeister et al., 1991, Genomics, 9:19; Warrington et al., 1992, Genomics, 13: 803; Abel et al., 1993, Genomics, 17:632).
  • Radiation hybrid mapping facilitates the mapping of non-polymorphic DNA markers that cannot be used for meiotic mapping.
  • A lethal dose of X-irradiation is used to fragment the chromosomes of the donor cell line. Chromosome fragments from the donor cell line are then retained, in a non-selective manner, following cell fusion with a recipient cell line. The resulting hybrid clones are then analysed for the presence or absence of specific donor chromosome markers. It is expected that markers that are further apart on a chromosome are more likely to be broken apart by radiation and to segregate independently in the RH cells than markers that are closer together.
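  • The expectation that distant markers separate more often can be illustrated by counting, for a pair of markers, the hybrid clones that retain one marker but not the other. The sketch below uses this discordant fraction as a crude proxy for distance; it is a simplification (published radiation hybrid estimators also correct for retention frequency), and the function name, the '1'/'0'/'?' scoring convention and the example data are assumptions.

```python
def discordance(marker_a, marker_b):
    """Fraction of typed hybrids that retain exactly one of two markers.

    Each argument is a string with one character per hybrid clone:
    '1' = marker retained, '0' = marker absent, '?' = not typed.
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("markers must be scored on the same hybrid panel")
    typed = [(a, b) for a, b in zip(marker_a, marker_b)
             if a in "01" and b in "01"]
    if not typed:
        raise ValueError("no hybrid was typed for both markers")
    return sum(a != b for a, b in typed) / len(typed)


# Hypothetical retention patterns across a panel of 10 hybrids.
m1 = "1100110010"
m2 = "1100110011"   # differs from m1 in one hybrid
m3 = "00?1001101"   # mostly differs from m1; one hybrid untyped
near = discordance(m1, m2)
far = discordance(m1, m3)
```

Marker pairs with a low discordant fraction are placed close together on the map, while pairs that approach independent retention are placed far apart or on different fragments.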
  • mRNA is isolated from a tissue of choice, wherein the tissue is obtained from two distinct organisms and wherein one organism displays a mutant phenotype with regard to a particular trait while the other is normal in that respect.
  • Methods well known in the art are used to prepare cDNA from the mRNA derived from the organism.
  • The mRNA template is then degraded, either by hydrolysis under alkaline conditions or by RNase H-mediated cleavage, and the cDNA is returned to a buffer in which mRNA is stable, and mixed with a molar excess of mRNA prepared from the second organism under conditions of stringent hybridization.
  • the mixture is then passed over a hydroxyapatite column, which binds double-stranded nucleic acids but allows single stranded nucleic acid molecules to pass through.
  • Reverse transcripts derived from the first sample which do not hybridize to mRNA molecules derived from the second organism (in other words, reverse transcripts specific to the first tissue sample) are present in the flow-through fraction and are cloned into a vector to create a subtraction library.
  • The reciprocal experiment (in which the cDNA is derived from the second mRNA preparation) is also carried out to create a complete set of transcripts specific to the tissue samples derived from the two organisms.
  • This procedure will provide transcripts that can be labeled and used as probes in in situ hybridization analysis of immobilized chromosomes.
  • The method of subtractive screening, therefore, yields both cloned genes and reagents useful for determining if the cloned genes co-localize with a locus of interest. If a particular gene is found to co-localize to a locus of interest, the gene may be analysed functionally (e.g., in a phenotypic rescue experiment, as described below or by the phenotypic assays described in Section F entitled "Identification and Characterization of Polymorphisms"). Ultimately, these genes may be used as targets for drugs or disease diagnostic methods, or even as therapeutic nucleic acids.
  • entrapment vectors first described in bacteria (Casadaban and Cohen, 1979, Proc. Natl. Acad. Sci. U.S.A., 76: 4530; Casadaban et al., 1980, J Bacteriol, 143: 971).
  • Entrapment vectors can be introduced into pluripotent ES cells in culture (for example, using electroporation or a retrovirus) and then passed into the germline via chimeras (Gossler et al., 1989, Science, 244: 463; Skarnes, 1990, Biotechnology, 8:827).
  • transgenic animals containing entrapment vectors may be generated by standard oocyte injection protocols.
  • Promoter or gene trap vectors often contain a reporter gene, e.g., lacZ, Cat or green fluorescent protein (Gfp) that lacks its own upstream promoter and/or splice acceptor sequence.
  • promoter gene traps contain a reporter gene with a splice site but no promoter. If the vector integrates within a gene and is spliced into the gene product, then the reporter gene will be expressed. Enhancer traps contain a reporter gene and have a minimal promoter which requires the activity of an enhancer in order to function. If the vector integrates near an enhancer (whether in a gene or not), then the reporter gene will be expressed. Activation of the reporter gene can only occur when the vector is integrated within an active host gene and generates a fusion transcript with the host gene. The activity of a reporter gene provides an easy assay for determining if a vector has been integrated into an expressed gene. Methods for detecting reporter gene activity in transfected cells or tissues of a transgenic animal are well known in the art.
  • the mutagenic vector may be mapped using standard cytogenetic techniques, such as in situ hybridization, wherein a labeled fragment comprising vector-specific sequence is used as a probe. Co-localization of the probe with a particular locus of interest indicates that the associated gene is a suitable candidate and should be subjected to further analysis. A gene that has been identified in this manner can be cloned as described.
  • N. Diagnostic Indicators, Screens and Disease Symptoms. In another embodiment of the invention, there is provided a method of diagnosing or determining susceptibility of a subject to low BMD and/or bone damage.
  • This method involves analyzing the genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present.
  • the method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype.
  • The method comprises determining which allele of one or more of the polymorphisms of the invention is/are present.
  • the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype.
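  • The determinations described above (which alleles are present, whether a subject is homozygous or heterozygous, and whether a risk haplotype is carried) can be sketched as simple lookups on genotype data. The allele letters, the three-site haplotype and the function names below are hypothetical and serve only to illustrate the logic.

```python
def zygosity(genotype, allele):
    """Classify a two-allele genotype with respect to one allele."""
    first, second = genotype
    copies = (first == allele) + (second == allele)
    return {0: "non-carrier", 1: "heterozygous", 2: "homozygous"}[copies]


def haplotype_copies(phased_haplotypes, risk_haplotype):
    """Count how many of a subject's two phased haplotypes match the risk one."""
    return sum(h == risk_haplotype for h in phased_haplotypes)


# Hypothetical subject typed at one site and phased across three sites.
status_tt = zygosity(("T", "T"), "T")
status_ct = zygosity(("C", "T"), "T")
status_cc = zygosity(("C", "C"), "T")
copies = haplotype_copies(("TGA", "CGA"), "TGA")
```

In practice the alleles would come from one of the genotyping methods described in this document, and haplotypes would need to be inferred or determined experimentally before such a count is meaningful.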
  • The polynucleotide sequences for these particular alleles may be used for diagnostic purposes.
  • the polynucleotides which may be used include oligonucleotides, complementary RNA and DNA molecules and PNAs.
  • the polynucleotides may be used to determine whether subjects are homozygous or heterozygous for a particular allele or haplotype making them susceptible to low BMD and/or bone damage, and hence, osteoporosis.
  • hybridization with a PCR probe which is capable of detecting a particular polymorphism; these probes may be used to identify nucleic acid sequences of particular alleles or haplotypes. These probes must be specific to these particular alleles and the stringency of the hybridization or amplification must be such that the probe identifies only this particular allele.
  • Means for producing specific hybridization probes for these polynucleotides of particular alleles include the cloning of these polynucleotide sequences into vectors for the production of mRNA probes. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides.
  • Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.
  • Polynucleotides of particular alleles or haplotype may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect susceptibility to low BMD and/or bone damage. Such qualitative methods are well known in the art.
  • polynucleotides of particular alleles or haplotype may be used in assays that detect susceptibility to low BMD and/or bone damage, particularly those mentioned above.
  • Polynucleotides complementary to sequences of a particular allele or haplotype may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and examined for a signal. If a signal is found, then the presence of the polynucleotide of a particular allele, alleles or haplotype in the sample indicates the susceptibility to low BMD and/or bone damage, and hence, osteoporosis.
  • Such assays may also be used to determine the particular therapeutic treatment regimen for an individual patient.
  • With respect to osteoporosis, the presence of a particular polymorphism or polymorphisms in a tissue sample from an individual may indicate a predisposition for low BMD and/or bone damage, or may provide a means for detecting osteoporosis prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of osteoporosis.
  • oligonucleotides designed from the polynucleotide sequences of a particular allele or haplotype may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will contain a fragment of a polynucleotide of a particular allele, alleles or haplotype or a fragment of a polynucleotide complementary to the polynucleotide of a particular allele, alleles or haplotype, and will be employed under optimized conditions for identification of a specific polymorphism, polymorphisms or haplotype. Oligomers may also be employed under very stringent conditions for detection of these particular DNA or RNA sequences.
  • oligonucleotides or longer fragments derived from any of the polynucleotides described herein may be used as elements on a microarray.
  • the microarray can be used in transcript imaging techniques to detect a particular polymorphism, polymorphisms or haplotype simultaneously as described below.
  • this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.
  • a method involves the use of antibodies in diagnosing or determining the susceptibility to low BMD and/or bone damage.
  • the antibodies would specifically bind to an epitope of a particular allele or form of the protein and may be used to determine susceptibility to low BMD and/or bone damage, and hence, osteoporosis.
  • Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for determining susceptibility to low BMD and/or bone damage include methods which utilize the antibody and a label to detect a particular allele or form of the protein in human body fluids or in extracts of cells or tissues.
  • the antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule.
  • A wide variety of reporter molecules, several of which are described above, are known in the art and may be used.
  • fragments of ABBR, or antibodies specific for ABBR may be used as elements on a microarray.
  • Microarrays may be prepared, used, and analysed using methods known in the art (Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93: 10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662).
  • Various types of microarrays are well known and thoroughly described in Schena, M., ed. (1999; DNA Microarrays: A Practical Approach, Oxford University Press, London).
  • Tissue or fluid samples containing a polynucleotide or polypeptide of interest include but are not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • Genomic DNA, cDNA or RNA can be prepared from the human sample according to the methods described above.
  • a biological sample such as blood is prepared and analysed for the presence or absence of susceptibility alleles of a gene containing a polymorphism, according to the invention. Results of these tests and interpretive information will be returned to the health care provider for communication to the tested individual.
  • diagnoses may be performed by diagnostic laboratories, or, alternatively, diagnostic kits are manufactured and sold to health care providers or to private individuals for self-diagnosis.
  • The screening method will involve amplification of the relevant gene sequences.
  • the screening method involves a non-PCR based strategy.
  • non-PCR based screening methods include Southern blot analysis to detect the presence of a variant form of a gene in a sample comprising total genomic DNA from the individual being tested.
  • northern blot analysis can be used to detect an aberrant mRNA encoded by a gene, that exhibits altered stability or is the result of alternative splicing in a sample comprising RNA from an individual being tested.
  • The methods of S1 nuclease analysis, RNase protection and primer extension can also be used to determine both the endpoint and the amount of a gene specific mRNA (Ausubel et al., supra). Both PCR and non-PCR based screening strategies can detect target sequences with a high level of sensitivity.
  • the preferred method is target amplification.
  • the target nucleic acid sequence is amplified with polymerases.
  • One particularly preferred method using polymerase-driven amplification is PCR (described above).
  • the polymerase chain reaction and other polymerase-driven amplification assays can achieve over a million-fold increase in copy number through the use of polymerase-driven amplification cycles.
  • PCR primers useful for target amplification according to the invention will be designed to amplify a region of DNA containing one or more polymorphisms. Allele specific primers (comprising one or more polymorphisms) are also useful for detecting gene sequence variations by PCR methodologies according to the invention.
  • the absence of a particular polymorphism will be indicated by the absence of an amplified product when the amplification step is carried out in the presence of allele specific primers.
  • the resulting nucleic acid can be sequenced and the specific sequence of the test DNA will be compared with the wild type sequence by using the computer programs described in Section F entitled "Identification and Characterization of Polymorphisms".
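  • Comparing a sequenced test fragment with the wild type sequence amounts to listing the positions at which the two differ. The sketch below handles only the simplest case of equal-length, already aligned sequences containing substitutions; it is not one of the computer programs referred to above, and the function name and example sequences are hypothetical.

```python
def find_substitutions(wild_type, test):
    """List (1-based position, wild-type base, test base) for each mismatch.

    Both sequences must already be aligned and of equal length;
    insertions and deletions are outside the scope of this sketch.
    """
    if len(wild_type) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(wild_type.upper(), test.upper())
    return [(i, w, t) for i, (w, t) in enumerate(pairs, start=1) if w != t]


# Hypothetical amplified region with a single A-to-G substitution.
differences = find_substitutions("ACGTACGT", "ACGTGCGT")
identical = find_substitutions("ACGTACGT", "acgtacgt")
```

Each reported position can then be checked against the known polymorphisms of the gene to decide whether the test DNA carries an allele of interest.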
  • the amplified product will be analysed by Southern blot assay with nucleic acid probes. Nucleic acid probes, useful according to the invention, will be specifically hybridizable to a mutant form of a gene but not to the wild type gene due to the presence of one or more polymorphisms.
  • The biological sample to be analysed, such as blood or serum, may be treated, if desired, to extract the nucleic acids (as described above).
  • the sample nucleic acids isolated from a biological sample or amplified by PCR
  • The targeted region of the nucleic acids being analysed must be at least partially single-stranded to form hybrids with the targeting sequence of the probe. If the sequence is naturally single-stranded, denaturation will not be required. However, if the sequence is double-stranded, the sequence will probably need to be denatured. Denaturation can be carried out by various techniques known in the art.
  • Analyte nucleic acid and probe will be incubated under conditions which promote stable hybrid formation of the target sequence in the probe with the putative targeted sequence in the sample DNA. If the region of the probe which is used to bind to the analyte is designed to be completely complementary to the targeted region, high stringency conditions are desirable in order to prevent false positives. However, conditions of high stringency will be used only if the probes are complementary to regions of the chromosome which are unique in the genome. The stringency of hybridization is determined by a number of factors (described above). Detection, if any, of the resulting hybrid is usually accomplished by the use of labeled probes.
  • the probe may be unlabeled, but may be detectable by specific binding with a ligand which is labeled, either directly or indirectly.
  • Suitable labels, and methods for labeling probes and ligand are known in the art, and are described in Section C entitled "Production of a Nucleic Acid Probe".
  • the foregoing screening method may be modified to identify individuals having a gene containing a neutral polymorphism not associated with osteoporosis, by preferably amplifying DNA fragments of a gene derived from a particular individual.
  • the amplified DNA fragments are sequenced and the sequence is compared to the consensus gene sequence containing neutral polymorphisms.
  • Differences between the individual's coding sequence for a gene and a consensus sequence for the same gene are determined, wherein the presence of any neutral polymorphisms and the absence of any polymorphisms not previously identified as neutral polymorphisms can be correlated with an absence of increased genetic susceptibility to osteoporosis resulting from a mutation in a gene coding sequence.
  • detection of a polymorphism will be performed by detecting loss of a restriction enzyme recognition site due to the presence of one or more polymorphisms.
  • a polymorphism will be detected with a polynucleotide probe that is capable of detecting a restriction enzyme fragment containing the polymorphism, wherein the fragment is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis.
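  • Detection by loss of a restriction site can be illustrated by digesting two alleles in silico and comparing fragment sizes. The sketch below is an assumption-laden example: the EcoRI site GAATTC (cut after the first G) is used only as a familiar enzyme, and the function name and toy sequences are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths from cutting a linear sequence at each site.

    cut_offset is the number of bases of the recognition site that stay
    on the upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    index = sequence.find(site)
    while index != -1:
        cuts.append(index + cut_offset)
        index = sequence.find(site, index + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy alleles: the variant changes GAATTC to GAGTTC and destroys the site.
wild_type = "A" * 20 + "GAATTC" + "A" * 30
variant = "A" * 20 + "GAGTTC" + "A" * 30
wild_fragments = digest(wild_type)
variant_fragments = digest(variant)
```

The allele retaining the site yields two smaller fragments, whereas the allele that has lost it yields a single larger fragment, and this size difference is what is resolved on the agarose gel.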
  • a polynucleotide probe according to this embodiment of the invention can be specific for a sequence within the candidate gene or outside of the candidate gene.
  • the nucleic acid probe assays of this invention will employ a mixture of nucleic acid probes capable of detecting a gene.
  • In one example, to detect the presence of a gene in a test sample, more than one probe complementary to a gene is employed, and in particular the number of different probes is alternatively 2, 3, or 5 different nucleic acid probe sequences.
  • The probe mixture includes probes capable of binding to the allele-specific mutations identified in populations of patients with alterations in a gene.
  • any number of probes can be used, and will preferably include probes corresponding to the major gene mutations identified as predisposing an individual to osteoporosis.
  • Northern blot analysis, S1 nuclease analysis, RNase protection and primer extension (Ausubel et al., supra) are also methods according to the invention for detecting changes in mRNA resulting from the presence of one or more polymorphisms in the sequence of a gene. Additionally, any of the methods of genotyping described in Section F entitled "Identification and Characterization of Polymorphisms" can be used for diagnostics according to the invention.
  • Peptide Diagnosis and Diagnostic Kits. Osteoporosis can also be detected on the basis of an alteration of the wild-type polypeptide. Such alterations can be determined by sequence analysis in accordance with conventional techniques. More preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the absence of, peptides derived from a gene of interest. The antibodies may be prepared as described above in Section I entitled "Preparation of Antibodies". Preferably, antibodies will immunoprecipitate the protein product of a gene from solution as well as react with the protein product of a gene on Western or immunoblots of polyacrylamide gels. Antibodies useful according to the invention will also detect the protein product of a gene in paraffin or frozen tissue sections, using immunocytochemical techniques.
  • Preferred embodiments relating to methods for detecting wild type or mutant forms of the protein product of a gene include enzyme linked immunosorbent assays (ELISA), radioimmunoassay (RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich assays using monoclonal and/or polyclonal antibodies.
  • Exemplary sandwich assays are described by David et al. in U.S. Pat. Nos. 4,376,110 and 4,486,530, hereby incorporated by reference.
  • This invention is particularly useful for screening therapeutic compounds by using the mutant gene or protein product or binding fragment of the gene in any of a variety of drug screening techniques.
  • the protein product or fragment of a gene employed in such a test may either be free in solution, affixed to a solid support, expressed on the surface of a cell, or located intracellularly.
  • One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a recombinant polynucleotide expressing the polypeptide or fragment, preferably in competitive binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. In particular, these cells can be used to measure formation of a complex comprising the protein product or fragment of a gene and the agent being tested.
  • these cells can be used to determine if the formation of a complex between the protein product or fragment of a gene and a known ligand is interfered with by an agent being tested.
  • the present invention discloses methods useful for drug screening wherein such methods comprise contacting a candidate drug with a polypeptide or fragment derived from a gene and assaying (i) for the presence of a complex between the drug and the polypeptide or fragment derived from a gene, or (ii) for the presence of a complex between the polypeptide or fragment derived from a gene and a ligand, by methods well known in the art.
  • the polypeptide or fragment derived from a gene is labeled for use in competitive binding assays.
  • Purified protein can be coated directly onto plates for use in the aforementioned drug screening techniques.
  • non-neutralizing antibodies to the polypeptide can be used to capture the polypeptide or peptide fragment of interest and immobilize it on the solid support.
  • An additional technique for drug screening involves the use of host eukaryotic cell lines or cells (such as described above) which have a gene that produces a defective protein.
  • the host cell lines or cells are grown in the presence of a test drug compound.
  • the rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of a gene.
  • the ability of the test compound to restore the function of the mutant gene protein can be measured by using an appropriate in vitro assay for function of the protein product of a gene. Suitable in vitro functional assays are described in Section F entitled "Identification and Characterization of Polymorphisms".
  • the ability of the test compound to alter the cellular localization of the protein will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art. A method of drug screening may involve the use of host eukaryotic cell lines or cells
  • By "aberrant pattern of expression" is meant that the level of expression is either abnormally high or low, or that the temporal pattern of expression is different from that of the wild type gene.
  • the ability of a test drug to alter the expression of a mutant form of a gene can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays.
  • cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test compound and the ability of a test compound to alter the level of activity of the mutant gene promoter can be determined by standard assays for each reporter gene which are well known in the art.
  • a “candidate drug” as used herein, is any compound with a potential to modulate a phenotype associated with a particular disease according to the invention.
  • a candidate drug is tested in a concentration range that depends upon the molecular weight of the drug and the type of assay.
  • small molecules (as defined below) may be tested in a concentration range of 1 pg - 100 mg/ml, preferably at about 100 pg - 10 ng/ml; large molecules, e.g., peptides, may be tested in the range of 10 ng - 100 mg/ml, preferably 100 ng - 10 mg/ml.
  • Candidate drug compounds from large libraries of synthetic or natural compounds can be screened. Numerous means are currently used for random and directed synthesis of saccharide, peptide, and nucleic acid based compounds.
  • Synthetic compound libraries are commercially available from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex (Princeton, NJ), Brandon Associates (Merrimack, NH), and Microsource (New Milford, CT).
  • a rare chemical library is available from Aldrich (Milwaukee, WI).
  • Combinatorial libraries are available and can be prepared.
  • libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available from e.g., Pan Laboratories (Bothell, WA) or MycoSearch (NC), or are readily produceable by methods well known in the art.
  • natural and synthetically produced libraries and compounds are readily modified through conventional chemical, physical, and biochemical means.
  • Useful compounds may be found within numerous chemical classes, though typically they are organic compounds, and preferably small organic compounds. Small organic compounds have a molecular weight of more than 50 yet less than about 2,500 daltons, preferably less than about 750 daltons, more preferably less than about 350 daltons. Exemplary classes include heterocycles, peptides, saccharides, steroids, and the like. The compounds may be modified to enhance efficacy, stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to identify, generate, or screen additional agents.
  • peptide agents may be modified in a variety of ways to enhance their stability, such as using an unnatural amino acid, such as a D-amino acid, particularly D-alanine, by functionalizing the amino or carboxylic terminus, e.g. for the amino group, acylation or alkylation, and for the carboxyl group, esterification or amidification, or the like.
  • Determination of Activity of a Drug. A drug is determined to be effective if its use results in a change of about 10% of a phenotype associated with a disease according to the invention.
  • the level of modulation by a candidate modulator of a phenotype associated with a disease may be quantified using any acceptable limits, for example, via the following formula, which describes detections performed with a radioactively labeled probe (e.g., a radiolabeled antibody in an immunobinding experiment or a radiolabeled nucleic acid probe in a Northern hybridization).
  • CPM Control is the average of the cpm in antibody/ligand complexes or on Northern blots resulting from assays that lack the candidate modulator (in other words, untreated controls).
  • CPM Sample is the cpm in antibody/ligand complexes or on Northern blots resulting from assays containing the candidate modulator.
  • the assay comprises use of a labeling system or system of measuring enzymatic activity in which there is a linear relationship between the amount of label detected and the amount of protein or nucleic acid being represented per unit of label or the amount of protein or nucleic acid represented by a unit of enzymatic activity.
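  • The formula itself is not reproduced in this passage. A minimal sketch, assuming that percent modulation is the difference between CPM Control and CPM Sample expressed relative to CPM Control, and that the roughly 10% change mentioned above is applied as a threshold on its absolute value, might look as follows. The function names and the example counts are hypothetical.

```python
def percent_modulation(control_cpms, sample_cpm):
    """Assumed formula: 100 * (CPM_control - CPM_sample) / CPM_control,
    where CPM_control is the average of the untreated control counts."""
    cpm_control = sum(control_cpms) / len(control_cpms)
    if cpm_control == 0:
        raise ValueError("average control cpm must be non-zero")
    return 100.0 * (cpm_control - sample_cpm) / cpm_control


def is_effective(modulation_pct, threshold_pct=10.0):
    """Treat a change of about 10% or more, in either direction, as effective."""
    return abs(modulation_pct) >= threshold_pct


# Hypothetical counts: three untreated controls and one treated sample.
controls = [1000, 1200, 800]
modulation = percent_modulation(controls, sample_cpm=850)
effective = is_effective(modulation)
```

  • The calculation is only meaningful under the linearity condition stated above, in which the amount of label detected is proportional to the amount of protein or nucleic acid.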
  • Rational drug design is useful for producing either structural analogs of biologically active polypeptides of interest or small molecules with which polypeptides of interest interact (e.g., agonists, antagonists, inhibitors) in order to design drugs which are, for example, more active or stable forms of the polypeptide, or which enhance or interfere with the function of a polypeptide in vivo. See, e.g., Hodgson, 1991, BioTechnology, 9:19.
  • the three-dimensional structure of a protein of interest (e.g., the polypeptide product of the gene) or of the complex comprising the protein product of a gene in association with its ligand is determined by x-ray crystallography, by computer modeling or, most typically, by a combination of approaches.
  • useful information regarding the structure of a polypeptide may be obtained by modeling based on the structure of homologous proteins. Rational drug design has been used successfully in the development of HIV protease inhibitors (Erickson et al., 1990, Science, 249: 527).
  • Rational drug design may also involve the analysis of peptides derived from the protein product of a gene by an alanine scan (Wells, 1991, Methods in Enzymol., 202: 390). According to this method, each of the amino acid residues of the peptide is sequentially replaced by alanine, and the effect of this amino acid substitution on the peptide's activity is determined. This technique can be used to determine the functionally relevant regions of the peptide.
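  • The set of variants examined in an alanine scan can be enumerated programmatically. The sketch below only generates the substituted sequences; the activity measurement is experimental and is not modeled. The function name and the example peptide are hypothetical, and skipping residues that are already alanine is a simplifying choice made here.

```python
def alanine_scan(peptide):
    """Return one variant per non-alanine residue, as tuples of
    (1-based position, original residue, variant sequence)."""
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # substitution would leave the peptide unchanged
        variant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


# Hypothetical four-residue peptide in one-letter code.
scan = alanine_scan("MKAL")
for position, original, variant in scan:
    print(f"{original}{position}A: {variant}")
```

  • Each variant would then be assayed, and positions where activity drops mark functionally relevant residues.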
  • Another experimental approach to rational drug design will involve the isolation of a target-specific antibody (selected by a functional assay) and the determination of the crystal structure of this antibody. Theoretically, this approach will yield a pharmacore upon which subsequent drug design can be based.
  • anti-idiotypic antibodies (anti-ids)
  • the binding site of the anti-ids will be an analog of the original receptor.
  • the anti-id could then be used to identify and isolate potentially therapeutic peptides from banks of chemically or biologically produced peptides. These selected peptides would then function as pharmacores.
  • the present invention also provides a method of supplying wild-type gene function to a cell which carries a mutant allele of a gene.
  • a full length version of the wild-type gene, or a fragment of the gene may be introduced into the cell in a vector such that the gene remains extrachromosomal and is expressed by the cell from the extrachromosomal location.
  • the wild-type gene or gene fragment should recombine with the endogenous mutant gene X already present in the cell. Such recombination requires a double recombination event which results in the correction of the gene mutation.
  • Vectors for introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector may be used.
  • Methods for introducing DNA into cells such as electroporation, calcium phosphate co-precipitation and lipofection are known in the art (described above).
  • Cells transformed with the wild-type gene can be used as model systems to study changes in the intensity of symptoms associated with osteoporosis and drug treatments which promote such changes.
  • a gene or a fragment thereof, where applicable may be used in gene therapy methods in order to increase the amount of the expression products of such genes in cells of patients with osteoporosis. It may also be useful to increase the level of expression of a gene even in those cells in which the mutant gene is expressed at a "normal" level, but the gene product is not fully functional.
  • Gene therapy can be carried out according to generally accepted methods, for example, as described by Friedman, 1991, In Therapy for Genetic Diseases; T. Friedman ed., Oxford
  • the vector will be injected into the patient, either locally at an appropriate site according to the invention or systemically.
  • Gene transfer systems known in the art may be useful in the practice of the gene therapy methods of the present invention. These include viral and nonviral transfer methods. A number of viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak et al., 1992, J Gen Virol, 73: 1533), adenovirus (Berkner, 1992, Curr. Top. Microbiol. Immunol, 158:39; Berkner et al, 1988, BioTechniques, 6:616; Gorziglia and Kapikian, 1992, J Virol, 66:4407; Quantin et al, 1992, Proc. Natl. Acad. Sci.
  • Nonviral gene transfer methods known in the art include chemical techniques such as calcium phosphate coprecipitation (Graham and van der Eb, 1973, Virology, 52:456; Pellicer et al, 1980, Science, 209: 1414); mechanical techniques, for example microinjection (Anderson et al, 1980, Proc. Natl. Acad. Sci. USA, 77: 5399; Gordon et al, 1980, Proc. Natl. Acad. Sci. USA, 77: 7380; Brinster et al, 1981, Cell, 27:223; Constantini and Lacy, 1981, Nature, 294:92); membrane fusion-mediated transfer via liposomes (Felgner et al, 1987, Proc. Natl. Acad.
  • DNA of any size is combined with a polylysine-conjugated antibody specific to the adenovirus hexon protein, and the resulting complex is bound to an adenovirus vector.
  • the trimolecular complex is then used to infect cells.
  • the adenovirus vector permits efficient binding, internalization, and degradation of the endosome before the coupled DNA is damaged.
  • Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene transfer. While in standard liposome preparations the gene transfer process is nonspecific, localized in vivo uptake and expression have been reported in tumor deposits, for example, following direct in situ administration (Nabel, 1992, Hum. Gen. Ther., 3:399). Gene transfer techniques which target DNA directly to an appropriate tissue, e.g., a tissue that normally expresses the protein product of the candidate gene of the invention, are preferred. Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA (usually in the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine.
  • Ligands are chosen on the basis of the presence of the corresponding ligand receptors on the cell surface of the target cell/tissue type. These ligand-DNA conjugates can be injected directly into the blood if desired and are directed to the target tissue where receptor binding and internalization of the DNA-protein complex occurs. To overcome the problem of intracellular destruction of DNA, coinfection with adenovirus can be included to disrupt endosome function.
  • Peptides which have gene activity can be supplied to cells which carry mutant or missing alleles of a gene.
  • peptides specific for a mutant form of the protein product of a gene can be supplied to cells carrying a wild type protein.
  • the protein product of a gene can be produced by expression of the cDNA sequence in bacteria, for example, using known expression vectors (as described in Section H entitled "Production of a Mutant Protein").
  • the protein product of a gene can be extracted from mammalian cells engineered to produce the protein product of a gene of interest.
  • the techniques of synthetic chemistry can be employed to synthesize the protein product of a gene. Any of the above techniques can provide a preparation of protein product of a gene that is substantially free of other human proteins.
  • Active gene molecules can be introduced into cells by microinjection or by the use of liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively or by diffusion. Extracellular application of the protein product of a gene may be sufficient to decrease or reverse the physiological effects of osteoporosis.
  • Other molecules with the activity of a protein product of a gene, for example, peptides, drugs or organic compounds, may also be used to effect such a reversal. Modified polypeptides having substantially similar function may also be useful for peptide therapy.
  • Transformed Hosts. Cells and animals which carry a mutant allele of a gene can be used as model systems to study and test for substances which have potential as therapeutic agents. Following application of a test substance to the cells, the phenotype of the cell will be determined. Any variety of phenotypic changes associated with osteoporosis can be assessed, including insulin resistance and combined insulin resistance/insulin secretion defect. Assays for each of these traits are known in the art.
  • Animals useful for testing therapeutic agents can be selected after mutagenesis of whole animals or after treatment of germline cells or zygotes. Such treatments include insertion of mutant alleles of a gene, usually from a second animal species, as well as insertion of disrupted homologous genes. Alternatively, the endogenous gene of the animals may be disrupted by insertion or deletion mutation or other genetic alterations using conventional techniques (Capecchi, 1989, Science, 244:1288; Valancius and Smithies, 1991, Mol. Cell.
  • Polynucleotides can be used to mark objects or substances for the purposes of later identification.
  • polynucleotides of the invention are useful for tracking the manufacture and distribution of a large number of diverse substances, including but not limited to: (1) natural resources such as animals, plants, oil, minerals, and water; (2) chemicals such as drugs, solvents, petroleum products, and explosives; (3) commercial by-products including pollutants such as radioactive or other hazardous waste; and (4) articles of manufacture such as guns, typewriters, automobiles and automobile parts.
  • a nucleic acid according to the invention when used as a marker, thus aids in the determination of product identity and so provides information useful to manufacturers and consumers.
  • Polynucleotides have the advantage over other marking materials of being readily amplifiable through the use of polymerase chain reaction (PCR) technology.
  • PCR polymerase chain reaction
  • the method of PCR is well known in the art. PCR is performed as described by Mullis & Faloona, 1987, Methods Enzymol, 155:335, herein incorporated by reference. It is the unique sequence of a polynucleotide which renders it useful as a marker, since the sequence, or a characteristic pattern derived from its sequence, confers a property on the polynucleotide which permits it to be tracked.
  • a novel polynucleotide sequence of the invention may be used as a marker by its attachment to or mixture in objects or substances to be marked. Methods for marking various classes of substances and later detection of the tags in those substances are disclosed in U.S. Patent Nos. 5,451,505 and 5,643,728.
  • a polynucleotide of the invention as a marker may entail combining a polynucleotide with the substance or object to be marked, using methods appropriate to that substance or object; and detecting the marker through amplification of the polynucleotide sequence using PCR technology, followed by either sequence analysis or identification by other means known in the art (e.g., hybridization assays).
  • a marker nucleic acid to a substance or object and subsequent detection of that nucleic acid will vary depending upon the nature of the substance or object and the environment to which it will be exposed.
  • inert solids such as paper, many pharmaceutical products, wood, some foodstuffs, etc.
  • Chemically active substances such as foodstuffs with enzymatic activity, polymers with charged groups, or acidic pharmaceuticals may require that a protective composition (e.g., liposomes) be added to the nucleic acid being used as a marker.
  • the nucleic acid may be mixed directly with the liquid, or, if the chemical nature of the liquid is not compatible with this approach (i.e., nucleic acids are not soluble in the liquid), the nucleic acid may be mixed with a detergent to enhance its solubility.
  • Containerized gases may be marked simply by adding a nucleic acid to the container in dry form, as it will be dispersed throughout the gas as the gas is released.
  • nucleic acid to add to a substance as a marker will also vary with the given situation, as will the detection strategy.
  • PCR technology allows the amplification and detection of as little as one molecule from a sample.
  • Other means of detection such as hybridization assays require that more nucleic acid be recovered from a sample to efficiently detect it.
  • PCR can be combined with a hybridization assay, however, to enhance the sensitivity of the method.
  • a nucleic acid sequence used as a marker will generally be from 20 to 1,000 bases long, and preferably will be 60 to 1,000 bases long when PCR is to be used to detect the marker.
  • Marked gunpowder may be prepared as follows: 1) add 16 ng of nucleic acid bearing the chosen marker sequence (derived from a polynucleotide of the invention) to 1 ml of distilled water; 2) mix the solution of nucleic acid with 1 g of nitrocellulose-based gunpowder; and 3) dry in air or under vacuum at 85°C.
  • Another example of a substance which may be marked with a nucleic acid according to the invention is ink.
  • the presence of an amplification product of the proper size (visualized, for example by gel electrophoresis alongside nucleic acid size markers followed by ethidium bromide staining of the gel, according to standard methods) will indicate the presence of the marker in the sample.
  • the PCR product may be further subjected to hybridization analysis or to sequencing to enhance the accuracy of the method. A method of hybridization analysis which can be used is described herein.
  • a polynucleotide of the invention is novel (that is, its sequence is unique), it is useful as a marker for chromosomal mapping.
  • methods of chromosomal mapping known in the art. Prominent among them is the variant of the in situ hybridization technique known as "Fluorescence In Situ Hybridization", or FISH. Details of methods and solutions used for in situ hybridization are well-known in the art. There are many variations of the FISH technique itself, however the basic approach is similar in each case.
  • in situ hybridization of cells, nuclei, or metaphase chromosome spreads is performed with a polynucleotide probe either directly labeled with a fluorochrome, or labeled with a moiety which will be bound by a fluorochrome tagged entity.
  • the hybridized probe is visualized by irradiation of the sample with light in the wavelength which excites fluorescence from the fluorochrome.
  • the location of the novel polynucleotide sequence on that chromosome may be further localized by in situ hybridization along with probes specific for known genes or sequences, labeled with other fluorescent tags which allow the differentiation of the signals from the different probes.
  • In addition to being able to determine the chromosomal location of the novel polynucleotide, similar technology, in which FISH is combined with flow cytometry, will allow the polynucleotide of the invention to be used to sort chromosomes, nuclei, or whole cells containing various dosages (i.e., gene copy numbers) of the gene encoding that polynucleotide.
  • Forensic science depends heavily on methods for determining the source of various compounds associated with criminal activity.
  • identification of individuals involved in criminal activity through analysis of substances found at the crime scenes is critical
  • genetic typing which involves the determination of the genotype of an individual with regard to loci which are polymorphic within the population.
  • polymorphic refers to a gene or other segment of DNA which shows nucleotide sequence variability from individual to individual.
  • the use of PCR techniques and nucleotide probes to detect even single nucleotide changes in a polynucleotide sequence has revolutionized the field of forensic serology (see Reynolds and Sensabaugh, 1991, Anal. Chem., 63:2).
  • polymorphisms useful for forensic identification and methods of typing samples with regard to those polymorphisms, see U.S. Patent # 5,273,883.
  • a polynucleotide of the invention is found to have nucleotide sequence variation among individuals within a population, it may be useful in the analysis of forensic samples.
  • methods known to those skilled in the art for typing nucleic acids with regard to polymorphisms. It should be understood that any such method is acceptable according to the invention.
  • One particular method is termed the "reverse dot blot" method.
  • 1) oligonucleotides bearing the sequences of various polymorphic forms of the polynucleotide region to be analysed are bound to membranes; 2) labeled, PCR-amplified fragments, derived from the sample to be genotyped, and corresponding to the polymorphic region ("target DNA") are allowed to hybridize to the bound oligonucleotides under conditions which only allow the hybridization of molecules with 100% complementary sequences; 3) unbound target DNA is removed; and 4) hybridized molecules are detected.
  • the specific genotype of the individual from whom the target sample was obtained (amplified), with regard to the polymorphic region of a polynucleotide of the invention, may thus be determined by screening a panel of probes containing the known polymorphic sequence variations of that region. It should be understood that the hybridization conditions may be adjusted by one of skill in the art so that limited amounts of non-complementarity, including single base mismatches, may be detected with this method.
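  • The logic of the reverse dot blot can be modeled in software. The sketch below is a simplified illustration rather than a laboratory protocol. It assumes that hybridization under fully stringent conditions can be represented as an exact match between a labeled target strand and the reverse complement of a bound oligonucleotide. The allele names, sequences, and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_dot_blot(bound_oligos, target_strands):
    """Return the alleles whose membrane-bound oligonucleotide is perfectly
    complementary to part of at least one labeled target strand."""
    positives = []
    for allele, oligo in bound_oligos.items():
        probe_site = reverse_complement(oligo)
        if any(probe_site in strand for strand in target_strands):
            positives.append(allele)  # this spot would give a signal
    return sorted(positives)


# Hypothetical panel: two polymorphic forms differing by a single base.
panel = {"allele_C": "GGATCCGTA", "allele_T": "GGATTCGTA"}

# Hypothetical amplified targets from a homozygous and a heterozygous sample.
homozygous = ["TT" + reverse_complement(panel["allele_C"]) + "AA"]
heterozygous = homozygous + ["TT" + reverse_complement(panel["allele_T"]) + "AA"]

genotype_hom = reverse_dot_blot(panel, homozygous)
genotype_het = reverse_dot_blot(panel, heterozygous)
```

  • One positive spot indicates a homozygous sample and two positive spots a heterozygous one; relaxing the exact-match test would correspond to the less stringent conditions mentioned above.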
  • Administration of pharmaceutical compositions is accomplished orally or parenterally.
  • Methods of parenteral delivery include topical, intra-arterial (directly to the tumor), intramuscular, subcutaneous, intramedullary, intrathecal, intraventricular, intravenous, intraperitoneal, or intranasal administration.
  • these pharmaceutical compositions may contain suitable pharmaceutically acceptable carrier preparations which can be used pharmaceutically.
  • compositions for oral administration can be formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for oral administration.
  • Such carriers enable the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions and the like, for ingestion by the patient.
  • compositions for oral use can be obtained through combination of active compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores.
  • Suitable excipients are carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose, hydroxypropylmethyl-cellulose, or sodium carboxymethyl cellulose; and gums including arabic and tragacanth; and proteins such as gelatin and collagen.
  • disintegrating or solubilizing agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate.
  • Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures.
  • Dyestuffs or pigments may be added to the tablets or dragee coatings for product identification or to characterize the quantity of active compound, i.e., dosage.
  • Push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol.
  • Push-fit capsules can contain active ingredients mixed with a filler or binders such as lactose or starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers.
  • the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol with or without stabilizers.
  • compositions for parenteral administration include aqueous solutions of active compounds.
  • the pharmaceutical compositions of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiologically buffered saline.
  • Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or dextran.
  • suspensions of the active compounds may be prepared as oily injection suspensions; suitable solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes.
  • the suspension may also contain suitable stabilizers or agents which increase the solubility of the compounds to allow for the preparation of highly concentrated solutions.
  • penetrants appropriate to the particular barrier to be permeated are used in the formulation.
  • penetrants are generally known in the art.
  • compositions of the present invention may be manufactured in a manner that is known in the art, e.g. by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.
  • the pharmaceutical composition may be provided as a salt and can be formed with many acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc. Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free base forms.
  • the preferred preparation may be a lyophilized powder in 1 mM-50 mM histidine, 0.1%-2% sucrose, 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer prior to use.
  • compositions suitable for use in the present invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose.
  • the determination of an effective dose is well within the capability of those skilled in the art.
  • the therapeutically effective dose can be estimated initially either in cell culture assays, or in animal models, usually mice, rabbits, dogs, or pigs. The animal model is also used to achieve a desirable concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.
  • a therapeutically effective dose refers to that amount of protein or its antibodies, antagonists, or inhibitors which ameliorate the symptoms or conditions.
  • Therapeutic efficacy and toxicity of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose lethal to 50% of the population). The dose ratio between therapeutic and toxic effects is the therapeutic index, and it can be expressed as the ratio LD50/ED50. Pharmaceutical compositions which exhibit large therapeutic indices are preferred.
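The therapeutic index defined above is a simple ratio, and a minimal sketch may make the comparison between compositions concrete. The function name and the example dose values below are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, expressed as the ratio LD50/ED50.

    Both doses must be in the same units (for example mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive doses")
    return ld50 / ed50


# Hypothetical values for two illustrative compositions (same units for both doses).
index_a = therapeutic_index(ld50=500.0, ed50=25.0)
index_b = therapeutic_index(ld50=500.0, ed50=100.0)

# A larger index means a wider margin between the effective and the lethal dose.
preferred = "A" if index_a > index_b else "B"
```

Under these assumed numbers, composition A would be the preferred one because its index is larger.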
  • the data obtained from cell culture assays and animal studies are used in formulating a range of dosage for human use.
  • the dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, sensitivity of the patient, and the route of administration.
  • the exact dosage is chosen by the individual physician in view of the patient to be treated. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Additional factors which may be taken into account include the severity of the disease state; age, weight and gender of the patient; diet, time and frequency of administration, drug combination(s), reaction sensitivities, and tolerance/response to therapy. Long acting pharmaceutical compositions might be administered every 3 to 4 days, every week, or once every two weeks depending on a half-life and clearance rate of the particular formulation.
  • Dosage amounts may vary from 0.1 to 100,000 micrograms per person per day, for example, 1 µg, 10 µg, 100 µg, 500 µg, 1 mg, 10 mg, and even up to a total dose of about 1 g per person per day, depending upon the route of administration.
  • Guidance as to particular dosages and methods of delivery is provided in the literature. See U.S. Patent Nos. 4,657,760; 5,206,344; or 5,225,212, hereby incorporated by reference. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotide or polypeptides will be specific to particular cells, conditions, locations, etc.
  • a polynucleotide sequence according to the invention containing a mutation which is believed to be associated with osteoporosis can be statistically linked to osteoporosis by linkage analysis.
  • An animal model system exhibiting a particular phenotypic defect that is characteristic of the osteoporosis is selected.
  • a series of genetic crosses is performed in this animal model system between individuals having an observable mutant phenotype and normal individuals of a control strain.
  • At least one disease-related locus or a chromosomal marker that does not comprise a disease related locus is used as a marker in these crosses. If a statistically significant pattern of non-random assortment of the mutant trait with a marker locus is observed, the trait is linked to the marker locus.
  • linkage analysis can be performed on an existing human or other mammalian pedigree.
  • numerous genetic loci from affected and unaffected family members are compared.
  • Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
  • a polynucleotide sequence according to the invention can be used as a marker for a normal phenotype or for a phenotype associated with osteoporosis.
  • this sequence can be used as a marker for osteoporosis.
  • a sequence of interest can be used as a probe to screen genomic DNA from individuals by Southern blot analysis according to the method described above. If the sequence of interest is detected by Southern blot analysis, and the presence of this sequence is confirmed by direct sequencing, it can be concluded that the individual from which the genomic DNA has been isolated has an increased frequency for the development of osteoporosis for which the sequence is a marker.
  • the marker can also be used as an osteoporosis indicator according to the method of PCR.
  • a genomic DNA sample of interest can be analysed in a PCR reaction wherein one of the primers contains the marker sequence. If the marker sequence is present in the sample DNA, a PCR product will be produced.
  • the PCR primers can be designed such that they amplify a region containing the marker sequence.
  • the amplified product can be analysed by hybridization methods, described above, to determine the presence of the sequence of interest.
  • a polynucleotide according to the invention, containing a mutation which is believed to be associated with osteoporosis, can be used as a target for drug screening.
  • One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a polynucleotide according to the invention and either exhibit a particular phenotype characteristic of the presence of the polynucleotide or express a polypeptide or fragment encoded by the polynucleotide.
  • Such cells can be used for standard competitive binding assays.
  • these cells can be used to measure formation of a complex comprising the protein product or fragment of a polynucleotide according to the invention and the agent being tested.
  • these cells can be used to determine if the formation of a complex between the protein product or fragment of a polynucleotide according to the invention and a known ligand is interfered with by an agent being tested.
  • An alternative method for drug screening involves the use of eukaryotic cell lines or cells (such as described above) which contain a polynucleotide according to the invention that produces a defective protein.
  • the host cell lines or cells are grown in the presence of a test drug.
  • the rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of the polynucleotide according to the invention.
  • a drug that is useful according to the invention will increase or decrease the growth rate of a cell by at least 10%.
  • the ability of the test compound to restore the function of the mutant gene protein by at least 10% can be measured by using an appropriate in vitro assay for function of the protein product of a gene (as described in Section F entitled "Identification and Characterization of Polymorphisms"). If the host cell lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the test compound to alter the cellular localization of the protein by at least 10% will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art.
  • a method of drug screening may also involve the use of host eukaryotic cell lines or cells (described above) which have an altered gene that demonstrates an aberrant pattern of expression where the level of expression is either abnormally high or low, or the temporal pattern of expression is different from that of the wild type gene.
  • the ability of a test drug to alter the expression of a mutant form of a gene by at least 10% can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays, as described above.
  • cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein).
  • a transgenic animal whose genomic DNA contains a polynucleotide associated with a particular phenotypic defect that is characteristic of osteoporosis and a normal, control animal (not containing the polynucleotide) can be treated with a candidate drug according to the invention.
  • the ability of a candidate drug to ameliorate symptoms of the disease, by at least 10%, will be analysed by assessing the disease symptoms and their amelioration.
  • osteoporosis candidate gene list was compiled using gene or gene sequences selected from literature sources, using sequence homology, library subtraction and expression analysis. Expression analysis was performed using "guilt-by-association" queries to identify
  • Polymorphism discovery was by fSSCP as described in section F "Identification and Characterization of Polymorphisms".
  • the polymorphisms were mapped to cDNA sequences in the LifeSeqGold database (Incyte Genomics, Inc., Palo Alto, CA) to identify the affected gene.
  • the genomic Human Diversity Panel will be used where full genomic structure for the selected candidate genes is available to allow screening of the open reading frame of the gene including splice junctions.
  • a cDNA version of the HDP (generated from lymphoblastoid cell lines to obviate the need for intron/exon structure in 50% of human genes) will be used where full genomic structure for the selected candidate genes may not be available to permit screening of the open reading frame of the gene.
  • This HDP is derived from 47 consenting individuals from four ethnic groups (Caucasian, African-American, Asian and Hispanic).
  • Families were identified through probands with a BMD Z score of at least -1.6 (equivalent to approximately the lower 5% of the normal distribution of BMD) at either the femoral neck or the lumbar spine (L2-L4).
  • a "proband" is defined as the first person identified with a particular phenotype (in this case low BMD) within a family.
  • the initial phase of family collection focused on nuclear families of European Caucasoid origin. These families were used primarily for a genome-wide scan for genetic determinants of BMD. BMD was measured in all participating family members and treated as a quantitative trait. First degree relatives of probands will be invited to participate. These included parents, siblings and offspring over the age of twenty. Spouses could take part to act as controls and to assist the analysis of their children's genotype.
  • the size and nature of families will therefore depend on a number of factors including the age of the proband, family history of osteoporosis or fractures and whether other family members are willing to participate. It is expected, judged from previous experience, that the average number of volunteers per family will be five individuals.
  • the absolute minimum family that was accepted into the study is a pair of siblings, either concordant or discordant for BMD where one of the siblings was a proband.
  • a collection of large numbers of simplex families for linkage disequilibrium studies was carried out for the finer mapping stages of positional cloning and for systematic mapping of functional candidate genes.
  • families from other ethnic groups will provide genetic diversity for haplotype analysis to help identify the primary disease-predisposing sequences. Cape Town and Singapore were selected to collect material from ethnic groups.
  • Probands were identified if they had a femoral neck or lumbar spine BMD Z score equal to or lower than -2.0, were between 20 and 85 years of age, European, white Caucasian and fully mobile. They were excluded from the study if they had secondary osteoporosis; prednisolone usage at a dose of 7.5 mg per day for six months or longer, or equivalent steroid doses of dexamethasone 0.75 mg per day or hydrocortisone 30 mg per day; were hypothyroid patients on thyroxine with a TSH below the laboratory normal range; had a malignancy (including myeloma) within five years; had malabsorption; had an inflammatory bowel disease; had premenopausal (aged less than 45 years) amenorrhoea of greater than six months, other than pregnancy; had previous or current alcohol intake estimated at greater than 30 units per week for more than six months; or had chronic renal failure (creatinine > 150 µmol/l) or chronic liver dysfunction (AST >
  • Volunteers gave blood samples for DNA extraction for genetic studies, as well as blood samples for calcium, creatinine, liver function (if over 60 years), TSH and vitamin D (if over 60 years) tests, and a second voided urine sample for markers of bone turnover.
  • at least 10 ml of venous blood was collected from a forearm vein into EDTA tubes. Blood collected into plastic tubes will be frozen straight away. Blood collected into glass tubes was transferred to plastic tubes before freezing. Once frozen the blood will not be thawed until DNA extraction takes place. DNA extraction was performed using standard procedures. The blood was frozen as quickly as possible to -70°C and then stored at -70°C. A 10-ml venous blood sample was also taken from all subjects for biochemical assays of calcium, creatinine, liver function, TSH and vitamin D. Blood was collected into a plain container. Separated serum was stored at -70°C.
  • Second voided urine samples for analysis of biochemical markers of bone turnover were taken. These samples were stored at -70°C. In addition to BMD at femoral neck and lumbar spine, height and weight were measured. Volunteers were scanned at the femoral neck and lower spine (L2-L4). For the femoral neck, the volunteer will be placed in the dorsal decubitus position with a 10-degree internal rotation of the hip, according to the manufacturer's protocol. For a satisfactory lumbar spine scan (e.g. no scoliosis, severe degenerative disease or obvious fracture), the volunteer will be placed, as described in the manufacturer's manual, in a comfortable supine position with legs raised and supported so as to ensure that the lumbar spine is as horizontal as possible. The axis of the spine should be parallel to the axis of the scanning machine.
  • Bone mineral density measurements were performed using dual energy X-ray absorptiometry (DXA) scanning.
  • the bone density data was standardized by the use at each center of the same male and female reference population databases for hip and spine.
  • the Z score was calculated by subtracting the predicted BMD from the actual BMD, and then dividing the difference by the reference population standard deviation.
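The Z score calculation described above can be sketched as follows. This is a minimal illustration only: the function and parameter names are assumptions, and the example BMD values are hypothetical rather than study data. The predicted BMD and the standard deviation are taken to come from the shared reference population database.

```python
def bmd_z_score(actual_bmd: float, predicted_bmd: float, reference_sd: float) -> float:
    """Z score = (actual BMD - predicted BMD) / reference population standard deviation."""
    if reference_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (actual_bmd - predicted_bmd) / reference_sd


# Hypothetical example in g/cm^2: measured BMD below the predicted value.
z = bmd_z_score(actual_bmd=0.75, predicted_bmd=1.0, reference_sd=0.125)
```

A negative result indicates a measured BMD below the value predicted from the reference population.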
  • Osteoporosis Bone mineral density was analysed as a quantitative trait in probands and family members. Selection of probands with a low BMD increased the power to detect linkage of genetic marker loci.
  • Power is defined here as the probability of observing positive evidence for linkage at a single additive quantitative locus, assuming that a genetic susceptibility locus exists, using the variance component model of Amos (1994). Positive evidence of linkage means a LOD score of 3.0 (p < 0.001) or greater, the accepted scientific standard.
  • Narrow sense heritability (a measure of the genetic contribution of a specific locus).
  • the broad sense heritability for osteoporosis was estimated to be between 0.3 and 0.8.
  • Theoretical calculations were based on the analysis of 108 nuclear families with an average of 2.9 phenotyped siblings per family using the same recruitment strategy as described above, assuming:
  • That the marker locus is highly informative.
  • family recruitment begins initially using the proband inclusion criterion of a BMD Z score of -2.0.
  • a more stringent proband criterion of Z score of -2.0 or less will be adopted.
  • BMD was corrected for height, weight, age and sex.
  • the experimental threshold for positive evidence for linkage was p < 0.001 at one locus or p < 0.01 at two or more adjacent loci.
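The experimental threshold rule above can be expressed as a small check over per-locus p-values. This is a sketch under stated assumptions: the p-values are supplied in map order along one chromosome, the comparison is taken as strictly "less than" each threshold, and the function name is hypothetical.

```python
from typing import Sequence


def positive_linkage_evidence(
    p_values: Sequence[float],
    single_locus_threshold: float = 0.001,
    adjacent_loci_threshold: float = 0.01,
) -> bool:
    """Return True if the ordered p-values meet the experimental threshold.

    Evidence is positive if any single locus falls below the single-locus
    threshold, or if two or more adjacent loci both fall below the
    adjacent-loci threshold.
    """
    if any(p < single_locus_threshold for p in p_values):
        return True
    # Check every pair of neighbouring loci in map order.
    return any(
        a < adjacent_loci_threshold and b < adjacent_loci_threshold
        for a, b in zip(p_values, p_values[1:])
    )
```

Two loci below 0.01 that are separated by a non-significant locus do not satisfy the rule, because they are not adjacent.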
  • SNPs Single nucleotide polymorphisms
  • Incyte's proprietary fSSCP method. Fluorescently labeled primers were synthesized and PCR was performed on 47 DNAs from a Coriell-derived Human Diversity Panel. The PCR products were electrophoresed on an ABI 377 machine and 8% nondenaturing, 12 cm SSCP gels were used. The resulting traces were aligned in ABI Genotyper software and where variant traces (indicating underlying polymorphisms) were found, examples of each variant were sequenced.
  • a pair of oligonucleotides for amplification by PCR was designed on either side of each biallelic polymorphism to produce a product size between 50bp and 350bp.
  • a sequencing oligonucleotide was designed to end within 30bp either 5' or 3' to each polymorphic site. All amplification oligonucleotides used to generate the complementary strand to the sequencing primer were labeled with a 5' - Biotin. Examples of the particular sequencing primers used are found in Table 10 of U.S.
  • Each reaction used 20 ng DNA (dried down), 0.6 units of AmpliTaq Gold™ DNA polymerase, 1X PCR Buffer R, 2.5 mM MgCl2, 1 mM dNTP, and 10 pmol of each PCR oligonucleotide in a final volume of 10ml.
  • the PCR cycling conditions used were: 95°C for 12 min; 45 cycles of 94°C for 15 sec, the annealing temperature (TA) for 15 sec, and 72°C for 30 sec; and 72°C for 5 min.
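The cycling conditions above can be written down as a small data structure, which also gives the total programmed hold time. This is a sketch only: the constant and step names are hypothetical labels, the middle step is assumed to be the annealing step, and ramp times between temperatures are ignored.

```python
# Durations in seconds, taken from the cycling conditions described above.
INITIAL_DENATURATION_S = 12 * 60   # 95 C for 12 min
CYCLES = 45
CYCLE_STEPS_S = {
    "denature_94C": 15,            # 94 C for 15 sec
    "anneal_TA": 15,               # annealing temperature for 15 sec
    "extend_72C": 30,              # 72 C for 30 sec
}
FINAL_EXTENSION_S = 5 * 60         # 72 C for 5 min


def total_hold_time_s() -> int:
    """Sum the programmed hold times; ramping between temperatures is not included."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


total_minutes = total_hold_time_s() / 60
```

Each cycle holds for 60 seconds, so the 45 cycles account for most of the programmed time.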
  • The sequencing oligonucleotide was annealed to the template by denaturing at 80°C for 2 min and then cooling to room temperature for 10 min. Each marker/sample combination was then sequenced/genotyped by pyrosequencing™ on a PSQ96™ (Pyrosequencing AB). Genotype results were stored in the PSQ Oracle® database ready for statistical analysis.
  • ACLP Aortic carboxypeptidase-like protein
  • NM_001129 Protein NP_001120
  • ACLP, also known as the adipocyte enhancer (AE)-binding protein 1 (AEBP1), is a transcriptional repressor with carboxypeptidase activity and may play a role in adipogenesis.
  • A kinase anchor protein 9 (AKAP9) mRNA: NM_005751 Protein: NP_005742; mRNA: NM_147166 Protein: NP_671695; mRNA: NM_147171 Protein: NP_671700; mRNA: NM_147185 Protein: NP_671714
  • AKAP9, also known as YOTIAO, is a scaffold protein that binds type I protein phosphatase (PP1) and cAMP-dependent protein kinase (PKA) to NMDA receptors. AKAP9 also anchors protein kinases and phosphatases to the centrosome and the Golgi apparatus.
  • 3) Bone morphogenetic protein receptor, type II (BMPR2)
  • Variant 1 mRNA: NM_001204 Protein: NP_001195
  • Variant 2 mRNA: NM_033346 Protein: NP_203132
  • BMPR2, also known as the serine/threonine kinase type II activin receptor-like kinase, is a transforming growth factor beta (TGF-beta) receptor that can also bind type I receptors and is involved in bone and other morphogenesis. Mutations in the gene are associated with familial primary pulmonary hypertension.
  • Fibroblast growth factor receptor 2 (FGFR2) mRNA: NM_000141 Protein: NP_000132; mRNA: NM_022969 Protein: NP_075258; mRNA: NM_022970 Protein: NP_075259; mRNA: NM_022971 Protein: NP_075260; mRNA: NM_022972 Protein: NP_075261; mRNA: NM_022973 Protein: NP_075262; mRNA: NM_022974 Protein: NP_075263; mRNA: NM_022975 Protein: NP_075264; mRNA: NM_022976 Protein: NP_075265; mRNA: NM_023028 Protein: NP_075417; mRNA: NM_023029 Protein: NP_075418; mRNA: NM_023030 Protein: NP_075419; mRNA: NM_023031 Protein: NP_075420
  • FGFR2 is a high-affinity receptor, depending on the isoform, for acidic, basic and/or keratinocyte growth factor.
  • This receptor is a member of the fibroblast growth factor receptor family, where amino acid sequence is highly conserved between members and throughout evolution.
  • FGFR family members differ from one another in their ligand affinities and tissue distribution.
  • a full-length representative protein consists of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation.
  • Mutations in this gene are associated with many craniosynostotic syndromes and bone malformations.
  • the genomic organization of this gene encompasses 20 exons.
  • FOSB FBJ murine osteosarcoma viral oncogene homolog B (FOSB) mRNA: NM_006732 Protein: NP_006723
  • FOSB is a DNA-binding member of the Fos family that forms the AP-1 transcription factor complex with Jun proteins. FOSB may be involved in the pathogenesis of breast tumors.
  • FSTL1 also known as follistatin-related protein, is a nuclear activin-binding protein that is induced by TGF beta 1 (TGFB1) and inhibits cell proliferation.
  • FSTL1 is also an autoantigen in systemic rheumatic diseases.
  • FSTL1 is abundantly expressed (0.33%) in trabecular bone libraries.
  • IGFBP5 Insulin-like growth factor binding protein 5
  • mRNA NM_000599
  • Protein NP_000590
  • IGFBP5 is a member of the insulin-like growth factor binding family of proteins that bind to and modulate insulin-like growth factor activity, regulates bone formation and may serve in muscle and cartilage development. IGFBP5 has tissue specificity with osteosarcoma, and at lower levels in liver, kidney, and brain. IGFBP5 can also alter the interaction of insulin growth factors with their cell surface receptors.
  • IRS1, also known as HIRS-1, is a cytoplasmic docking protein that mediates IGF1 signaling to SH2-containing effector molecules such as Grb2 and PI3-kinase and inhibits apoptosis. IRS1 also plays a role in cell proliferation and glucose transport.
  • Alpha V subunit integrin (ITGAV) mRNA: NM_002210 Protein: NP_002201 ITGAV is a subunit of the vitronectin receptor that is involved in cell-cell and cell-matrix interactions, plays a role in tumor angiogenesis and may contribute to tumorigenicity of cutaneous malignant melanoma. Integrins serve as major receptors for extracellular matrix-mediated cell adhesion and migration, cytoskeletal organization, cell proliferation, survival, and differentiation. Alpha-V integrins comprise a subset sharing a common alpha-V subunit combined with 1 of 5 beta subunits (beta-1, -3, -5, -6, or -8).
  • alpha-V integrins recognize the sequence RGD in a variety of ligands (vitronectin, fibronectin, osteopontin, bone sialoprotein, thrombospondin, fibrinogen, von Willebrand factor, tenascin, and agrin) and, in the case of alpha-V-8, laminin and type IV collagen.
  • Vitronectin is a multifunctional glycoprotein present in blood and in the extracellular matrix. It binds glycosaminoglycans, collagen, plasminogen and the urokinase-receptor, and also stabilizes the inhibitory conformation of plasminogen activation inhibitor-1. By its localization in the extracellular matrix and its binding to plasminogen activation inhibitor-1, vitronectin can potentially regulate the proteolytic degradation of this matrix. In addition, vitronectin binds to complement, to heparin and to thrombin-antithrombin III complexes, implicating its participation in the immune response and in the regulation of clot formation. The biological functions of vitronectin can be modulated by proteolytic enzymes, and by exo- and ecto-protein kinases present in blood.
  • Vitronectin contains an Arg-Gly-Asp (RGD) sequence, through which it binds to the integrin receptor alpha v beta 3, and is involved in the cell attachment, spreading and migration.
  • Bone resorption requires the tight attachment of the bone-resorbing cells, the osteoclasts, to the bone mineralized matrix.
  • Integrins, a class of cell surface adhesion glycoproteins, play a key role in the attachment process. Most integrins bind to their ligands via the RGD tripeptide present within the ligand sequence. The interaction between integrins and ligands results in bidirectional transfer of signals across the plasma membrane. Tyrosine phosphorylation occurs within cells as a result of integrin binding to ligands and probably plays a role in the formation of the osteoclast clear zone, a specialized region of the osteoclast membrane maintained by cytoskeletal structure and involved in bone resorption.
  • Human osteoclasts express alpha 2 beta 1 and alpha v beta 3 integrins on their surface.
  • the alpha v beta 3 integrin, a vitronectin receptor, plays an essential role in bone resorption.
  • echistatin, an RGD-containing protein from a snake venom, binds to the alpha v beta 3 integrin and blocks bone resorption both in vitro and in vivo (Dresner-Pollak R, Rosenblatt M. J Cell Biochem 1994 Nov;56(3):323-30).
  • Alpha-V integrins have been implicated in many developmental processes and are therapeutic targets for inhibition of angiogenesis and osteoporosis.
  • the ablation of the gene for the alpha-V integrin subunit, eliminating all 5 alpha-V integrins, although causing lethality, allows considerable development and organogenesis including, most notably, extensive vasculogenesis and angiogenesis.
  • These liveborn alpha-V-null mice consistently exhibit intracerebral and intestinal hemorrhages and cleft palates.
  • KJ_bonlib4 is also known as p97, DAP5, NAT1 and Eukaryotic translation initiation factor 4G-like 1.
  • KJ_bonlib4 is a translational repressor that binds EIF3 and EIF4A, but not EIF4E, promotes IFNG-induced programmed cell death and is cleaved by caspase-3 (CASP3) during apoptosis.
  • KJ_bonlib7 has an unknown function.
  • KJ_opgba1 is a member of the sulfatase family, which hydrolyze sulfate esters, and has a region of moderate similarity to a region of N-acetylglucosamine-6-sulfate sulfatase (human GNS), which is associated with Sanfilippo disease IIID upon deficiency.
  • KJ_opgba13 is also known as DEPP.
  • Pfam model results indicate that KJ_OPGBA13 is a thermophilic metalloprotease.
  • KJ_opgba14 is also known as NESHBP and TARSH.
  • KJ_opgba14 contains a fibronectin type III domain, which is involved in cell surface binding.
  • KJ_opgba47 is a member of the cytochrome b561 family, has moderate similarity to uncharacterized cytochrome b561 (human CYB561), which is an integral membrane protein found in neuroendocrine secretory vesicles.
  • KJ_opgba115 has a strong similarity to B cell phosphoinositide 3-kinase (PI3K) adaptor (mouse Bcap), which binds to SH2 domains of PI3K and may recruit PI3K to glycolipid-enriched microdomains leading to BCR-mediated PI3K activation.
  • KJ_opgba136 has an unknown function.
  • LUM is an extracellular matrix keratan sulfate proteoglycan that may be involved in the development and maintenance of corneal transparency.
  • MMP1 Matrix metalloproteinase 1
  • MMP1, also known as interstitial collagenase, is a matrix metalloprotease that cleaves fibrillar collagen type I to gelatin, functions in collagen turnover in most tissues and may play a role in cartilage destruction in rheumatoid arthritis.
  • Mitogen-activated protein kinase 8 (MAPK8)
  • MAPK8 isoform 1 mRNA: NM_139049 Protein: NP_620637
  • MAPK8 isoform 2 mRNA: NM_002750 Protein: NP_002741
  • MAPK8 isoform 3 mRNA: NM_139046 Protein: NP_620634
  • MAPK8 isoform 4 mRNA: NM_139047 Protein: NP_620635
  • MAPK8 is also known as JNK, JNK1, PRKM8, SAPK1, JNK1A2 and JNK21B1/2.
  • MAPK8 is a serine-threonine kinase that regulates c-Jun (JUN) and plays a role in the induction of apoptosis and other cellular responses to stressors such as ultraviolet light, reactive oxygen and hypoxia.
  • NFKB2 Nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (NFKB2) mRNA: NM_002502 Protein: NP_002493. NFKB2 is a transcription factor that is involved in immune response, may coordinate pre-mRNA splicing and transcription, and may play a role in HIV infection, leukemia, breast cancer and lymphoid neoplasia.
  • NOTCH3 encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch.
  • notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development.
  • Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL).
  • Alignment of available genomic sequence to the CDS contig identified at least 29 exons.
  • Notch may be a receptor with different functional domains, the intracellular domain having the signal-transducing activity of the intact protein and the extracellular domain possessing a ligand-binding and regulatory activity.
  • OSF2 Osteoblast specific factor 2
  • mRNA NM_006475
  • Protein NP_006466 OSF2 is also known as periostin.
  • OGN is a member of the keratan sulfate proteoglycan group of the small leucine-rich proteoglycan family and may play a role in regulating corneal transparency.
  • Plasminogen activator inhibitor 1 (PAI1) mRNA: NM_000602 Protein: NP_000593
  • PAI1 is a member of the serpin family of serine protease inhibitors and plays a role in regulating blood coagulation by inhibiting fibrinolysis, contributes to tumor progression and is a risk factor for cardiovascular diseases.
  • PTGS1, also known as COX1, catalyzes the conversion of arachidonic acid to prostaglandin H2 and may be involved in inflammation and blood coagulation. PTGS1's activity is irreversibly inhibited by aspirin.
  • SCYA2 CCL2 chemokine (C-C motif) ligand 2 (SCYA2) mRNA: NM_002982 Protein: NP_002973
  • SCYA2, also known as monocyte secretory protein JE; monocyte chemoattractant protein-1; monocyte chemotactic and activating factor; small inducible cytokine subfamily A (Cys-Cys), member 2; and monocyte chemotactic protein 1, and homologous to mouse Sig-je, is a cytokine A2, CC chemokine that attracts monocytes, memory T-cells, natural killer cells and endothelial cells.
  • SCYA2 plays a role in the inflammatory response to infection and in inflammatory diseases including arthritis, multiple sclerosis and atherosclerosis.
  • TIMP Tissue inhibitor of metalloproteinase 1 (TIMP1) mRNA: NM_003254 Protein: NP_003245
  • TIMP is also known as erythroid potentiating activity, EPA, EPO, HCI and CLGI. TIMP inhibits matrix metalloproteases including MMP2, stimulates growth of erythroid cells and attenuates metastasis of tumorigenic cells when overexpressed.
  • TGM1 Transglutaminase 2 (TGM1) mRNA: NM_000359 Protein: NP_000350
  • TGM1 is membrane bound and catalyzes the crosslinking of extracellular matrix (ECM) proteins and other cellular proteins, modulates the ECM, cell growth, adhesion, signaling, and apoptosis, and has been associated with Alzheimer's, Huntington, and celiac disease.
  • ECM extracellular matrix
  • TNFAIP6 Tumor Necrosis Factor-Alpha-Induced Protein 6
  • TNFAIP6 is a metalloprotease. TNFAIP6 is transcribed in normal fibroblasts and activated by binding of the TNFa. Similar to CD44, TNFAIP6 binds hyaluronate and is involved in plasmin inhibition and the inhibition of inflammation.
  • VEGF Vascular endothelial growth factor
  • VEGF, which is structurally related to platelet-derived growth factor, induces endothelial cell proliferation and migration, vascular permeability, angiogenesis and NO-mediated signal transduction.
  • Many polypeptide mitogens such as basic fibroblast growth factor and platelet-derived growth factors are active on a wide range of different cell types.
  • vascular endothelial growth factor is a mitogen primarily for vascular endothelial cells. Data suggest that mutations of p53 and activation of the Ras/MAPK pathway may play a role in the induction of VEGF expression in human colorectal cancer.
  • vascular endothelial growth factor by membrane-type 1 matrix metalloproteinase stimulates human glioma xenograft growth and angiogenesis. Both VEGF-induced PI 3-kinase activation and beta(1) integrin-mediated binding to fibronectin are required for the recruitment and activation of PKC alpha.
  • Quantitative transmission disequilibrium tests for association between 100 SNP loci and 13 phenotypic traits are reported.
  • BMD bone mineral density
  • For each marker-trait combination the significance of stratification is tested.
  • the significance of association between marker and trait is tested both unpartitioned, and partitioned into between-family and within-family components.
  • SNP-trait associations are significant at the 1 % level. The most notable of these is between SNP locus ITGA08 and BMD in lumbar vertebrae 2 to 4 in males. The effect of this association is 4.1 % of the mean value for calibrated BMD, and 0.237 units for the Z score.
  • the traits comprise calibrated bone mineral density (BMD) values for four skeletal sites, the corresponding Z scores, the occurrence of fractures, and four other traits.
  • the skeletal sites studied are lumbar vertebrae 2 to 4 (mean value), the neck of the femur, the trochanter, and the total of BMD values over three sites in the hip (neck of femur, trochanter and 'inter').
  • Calibrated BMD values are given in units of g/cm².
  • the four other traits, which are not directly related to osteoporosis (though they are associated with it) and which are included for purposes of comparison, are the ages of onset and cessation of periods in females, and height and weight in both
  • Statistical analysis is performed using the software QTDT. For each marker-trait combination, the significance of stratification is tested. If stratification is present, the between-pedigrees component of the marker-trait association is not entirely due to linkage and only the within-pedigree component can legitimately be used to measure the effect of the locus. However, if stratification is absent the unpartitioned association provides a stronger test of significance and a more precise measure of the effect of the locus. Therefore each marker-trait combination is tested for association both without partitioning of the association, and with partitioning into between- and within-pedigree components. The interpretation of the results then depends on the outcome of the test for stratification. These analyses are performed both for the sexes pooled, and also using phenotypic data from males only and females only.
  • the script contained in the file heritability/run_QTDT_heritability is then run.
  • This script fits a QTDT model with options -a- -We -Veg to each phenotypic trait for the sexes pooled, both for the complete set of phenotypic values and with the exclusion of outliers for BMD.
  • options indicate that no model of association is to be fitted, and that the variance components V_e (null model) and V_e + V_g (full model) are to be estimated.
  • Heritability is then estimated as
  • the corresponding analysis was performed using the phenotypic values of males only and using those of females only.
  • This script fits a QTDT model with options -at -Weg for each phenotypic trait, for the sexes pooled, in combination with each SNP locus. These options indicate that the association between the trait and the SNP locus is to be estimated, and that the model is also to include the variance components V_e + V_g. However, the association is not to be partitioned into between-pedigree and within-pedigree components.
  • the same model is fitted for the phenotypes of males only and for females only. The chi-square value for association of each SNP with each trait is extracted from these files and these values are stored.
  • the chi-square value needed to achieve significance following the Bonferroni correction is presented for all BMD traits at a single SNP locus (8 tests), for all SNP loci for a single trait (100 tests) and for all SNP-trait combinations (800 tests).
  • the frequency of the rare allele at each SNP locus is extracted from the file and is also presented in each of these worksheets.
  • the mean and heritability of each phenotypic trait, for the sexes pooled (with and without the inclusion of outliers) and for each sex individually, are presented in Table 1.
  • the exclusion of outliers makes little difference to either the mean or the heritability.
  • the BMD traits (calibrated BMD values and Z scores)
  • the exclusion of outliers consistently raises the mean value and lowers the heritability. This is to be expected, as the outliers were identified on the basis of high BMD.
  • Heritability of the BMD traits is strikingly lower in females than in males. In particular, that of CalL2_L4BMD is zero. Conversion of the BMD values to Z scores causes a small but consistent increase in heritability in females, but has no consistent effect in males.
  • the test for association between a phenotypic trait and a SNP within pedigrees is less powerful than the non-partitioned test, provided that stratification is absent. It is therefore to be expected that the chi-square value for non-partitioned association (-at model in QTDT) will be larger than that for association within pedigrees (-ao model), unless there is strong association due to stratification in the opposite direction to that caused by linkage. In the present case there were no exceptions to this expectation.
  • the significance tests for stratification (given by the -ap model) are summarized in Tables 2 and 3. There are a few significant values (P < 0.05), but not many more than the 5 % expected by chance. It is therefore concluded that stratification is not strong or widespread in these data, and attention is therefore focused on the non-partitioned model of association.
  • the SNP loci OGN_02 and OMD_03 show significant associations with phenotypic traits in all three sub-sets of the data (sexes pooled, males only and females only), and OMD_01 shows six associations that are significant at the 1 % level.
  • the difference between either homozygote and the heterozygote in these marker-trait associations ranges from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores. In the cases where stratification is significant, the within-family association effect is consistently much smaller than the unpartitioned effect.
  • the strongest and most consistent associations between SNP loci and phenotypic traits related to BMD are at loci OGN_02, OMD_03, OMD_01 and ITGA08.
  • the first three of these each show several associations significant at the 5 % level, with effects ranging from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores.
  • ITGA08 shows significant association only with BMD in lumbar vertebrae 2 to 4 in males, but this effect is significant at the 1 % level. Its magnitude lies in the same range as those at the other three loci.
  • Table 10 of this application provides a list of the polymorphism markers of the thirty-two (32) genes listed in Example 9 which have been found to have various effects on susceptibility to low BMD and bone damage.
  • Tables 11 and 12 rank the polymorphic markers into groups by the relevance of their association with susceptibility to low BMD, by sex (Table 11, males; Table 12, females). Markers ranked in Group A show the strongest association with susceptibility to low BMD, those in Group B show less association, and those in Group C show the least.
  • the gene by gene interactions were assessed by logistic regression.
  • the interaction was assessed for every pair of OMD-ITGAV SNPs (OMD01 and OMD03 versus ITGAV02, 08, 11 and 12).
  • the logistic regression models were
  • MEN: OP or Frx (0 or 1) OMD(snp) + ITGAV(snp) + OMG(snp) + ITGAV(snp) + age + weight
  • WOMEN: OP or Frx (0 or 1) OMD(snp) + ITGAV(snp) + OMG(snp) + ITGAV(snp) + age + height
  • the weight was not included for the women and height for the men since they did not show significant effect on OP or Frx in the sample set used.
  • the study group was made up of unrelated individuals from the FAMOS cohort with osteoporosis (OP), that is, individuals who (1) had been diagnosed with OP, (2) had fractures and a maximum Z (spine) of -1, or (3) were probands with a maximum Z (spine) of -0.5.
  • OP osteoporosis
  • the odds ratio of the predisposing variant was computed by logistic regression analyzing separately the two or three ITGAV genotypes.
  • the significance level of the association was determined by computing Wald's chi-square for the OMD SNP (model tested OP+OMDsnp + age + height (if women) + weight (if men)) independently for each ITGAV genotype. This is illustrated by the plots in Figure 1A-E.
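The Bonferroni-corrected chi-square thresholds mentioned in the list above (for 8, 100 and 800 tests) can be illustrated with a minimal sketch. It assumes a family-wise significance level of 0.05 and a chi-square statistic with one degree of freedom; neither value is stated in the text above, and the function name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_chi2_threshold(n_tests, alpha=0.05):
    """Chi-square value (1 df) needed for significance after Bonferroni correction.

    Assumptions (not taken from the source): a family-wise alpha of 0.05 and
    one degree of freedom, for which a chi-square variate is the square of a
    standard normal variate.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    per_test_alpha = alpha / n_tests
    # Two-sided standard normal quantile; its square is the 1-df chi-square quantile.
    z = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)
    return z * z


if __name__ == "__main__":
    # 8 BMD traits at one locus, 100 loci for one trait, 800 SNP-trait combinations.
    for n in (8, 100, 800):
        print(n, round(bonferroni_chi2_threshold(n), 2))
```

The threshold rises with the number of tests, which is why an association that is nominally significant for a single SNP-trait pair may not survive correction across all 800 combinations.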

Abstract

The invention relates to polynucleotides associated with susceptibility to low bone mineral density and/or bone damage generally associated with human diseases, and in particular to osteoporosis. The invention further relates to polymorphic polynucleotides associated with osteoporosis. The invention provides methods of determining if a particular polymorphism predisposes an individual to or is associated with the development of osteoporosis. The invention also provides methods of detecting the presence of one or more polymorphisms as an indicator of osteoporosis, and provides for use of novel polynucleotides of the invention in the development of drugs and in disease treatment.

Description

NUCLEOTIDE POLYMORPHISMS ASSOCIATED WITH OSTEOPOROSIS
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TECHNICAL FIELD
The invention relates in general to polymorphisms in genes associated with susceptibility to low bone mineral density and bone remodeling and methods of identifying individuals having a gene containing a polymorphism associated with osteoporosis. The invention also relates to a method of detecting an increased susceptibility to a disease in an individual resulting from the presence of a polymorphism or mutation in the gene coding sequence of an osteoporosis and bone remodeling associated gene.
BACKGROUND
Single nucleotide substitutions and small unique insertions and deletions are the most frequent form of DNA polymorphism and disease-causing mutation in the human genome. These DNA sequence variations, called single nucleotide polymorphisms (SNPs), have gained popularity and have been proposed as the genetic markers of choice for the study of complex genetic traits (Collins et al. 1997 Science 278: 1580-1581; Risch and Merikangas 1996 Science 273: 1516-1517). Despite the fact that on average approximately one nucleotide position in every 1000 bases along the human chromosome is estimated to differ between any two copies of the chromosome (Cooper et al. 1985 Human Genetics 69: 201-205; Kwok et al. 1996 Genomics 31: 123-126), developing SNP markers is not easy.
It has been suggested that association studies (such as linkage disequilibrium studies) with a set of single nucleotide polymorphism (SNP) markers evenly spaced across the genome at approximately 100 kb intervals would provide the necessary power to detect the small effects of each gene involved in a complex trait (Hauser et al. 1996 Genetic Epidemiology 13: 117-137 in Kwok and Chen 1998 Genetic Engineering 20: 125-134, Plenum Press, New York). Alternatively, one can take a candidate gene approach in performing association studies with the use of a set of gene-associated SNP markers to detect these genetic factors (ibid.).
Nucleotide sequence mutations which occur in a gene or gene family, where the gene or gene family is associated with a given disease, indicate susceptibility to or development of the disease.
Osteoporosis is a common disease characterized by low bone mineral density (BMD), deterioration of bone micro-architecture and increased risk of bone damage, such as fracture. Common types of osteoporosis include postmenopausal and senile osteoporosis, which generally occur later in life, e.g., 70+ years. Osteoporosis is a major health problem in virtually all societies. It is estimated that 30 million Americans and 100 million people worldwide are at risk for osteoporosis. In European populations, one in three women and one in twelve men over the age of fifty is at risk. These numbers are growing as the elderly population increases. It is estimated that by the middle of the next century the number of osteoporosis sufferers will double in the West, but may increase six-fold in Asia and South America. Fracture is the most serious endpoint of osteoporosis, particularly fracture of the hip which affects up to 1.7 million people worldwide each year. It is estimated that by the year 2050, the number of hip fractures worldwide will increase to over 6 million, as life expectancy and age of the population increase (See Spangler et al. "The Genetic Component of Osteoporosis Mini-review"; http://www.csa.com.osteointro.html). Thus, osteoporosis is a major public health problem which affects quality of life and increases costs to health care providers.
Peak bone mass is mainly genetically determined, though dietary factors and physical activity can have positive effects. Peak bone mass is attained at the point when skeletal growth ceases, after which time bone loss starts. In contrast to the positive balance that occurs during growth, in osteoporosis, the resorbed cavity is not completely refilled by bone. Despite recent successes with drugs that inhibit bone resorption, there is a clear need for specific anabolic agents that will considerably increase bone formation in people who have already suffered substantial bone loss. There are no such drugs currently approved.
Current treatment for osteoporosis helps stop further bone loss and fractures. Common therapeutics include HRT (hormone replacement therapy), bisphosphonates, e.g., alendronate (Fosamax), as well as estrogen and estrogen receptor modulators, progestin, calcitonin, and vitamin D. While there may be numerous factors that determine whether any particular person will develop osteoporosis, a step towards prevention, control or treatment of osteoporosis is determining whether one is at risk for osteoporosis. Genetic factors also play an important role in the pathogenesis of osteoporosis. Some attribute 50-60% of total bone variation (Bone Mineral Density; BMD), depending upon the bone area, to genetic effects. However, up to 85%-90% of the variance in bone mineral density may be genetically determined.
Studies have shown from family histories, twin studies, and racial factors, that there may be a predisposition for osteoporosis. Several candidate genes may be involved in this, most probably multigenic, process.
Osteoporosis can be considered a complex genetic trait with variants of several genes underlying the genetic determination of the variability of the phenotype. Low bone mineral density (BMD) is an important risk factor for fractures, the clinically most relevant feature of osteoporosis. Segregation analysis in families has shown that BMD is under polygenic control while, in addition, biochemical markers of bone turnover have also been shown to have strong genetic components. Several candidate genes have been analysed in relation to BMD but the most widely studied gene in this respect, the vitamin D receptor (VDR) gene, explains only a small part of the genetic effect on BMD. Numerous studies, focussing on the BsmI allele of the vitamin D receptor gene, have concluded that absence of the restriction site correlates with low bone mineral density.
Diagnosis of those at risk of developing osteoporosis allows more effective preventive measures. Strategies for the prevention of this disease include development of bone density in early adulthood, and minimisation of bone loss in later life. Changes in lifestyle, nutrition and hormonal factors have been shown to affect bone loss.
There is need for clinical and epidemiological research for the prevention and treatment of osteoporosis for gaining deeper knowledge of factors controlling bone cell activity and regulation of bone mineral and matrix formation and remodelling.
SUMMARY
One or more of these novel polymorphisms at the positions indicated in Table 2 found in U.S. Provisional Patent Application Serial No. 60/342,711 entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed December 20, 2001 (which is hereby incorporated herein by reference in its entirety) may be responsible for increased susceptibility to low bone mineral density (BMD) and/or bone fracture, which indicates bone damage and related conditions such as osteoporosis. In particular, the polymorphisms of the present invention, either alone or in combination with other polymorphisms, may be useful in identifying individuals susceptible or resistant to low BMD and/or bone damage, and for those individuals that are susceptible to low BMD and/or bone damage in the prevention or treatment of this condition.
The present invention is applicable to any disease in which low BMD and/or bone fracture is a factor, and is therefore particularly concerned with diseases such as osteoporosis. Low BMD is defined as two standard deviations below the age-matched mean of bone mineral density for a given population. Bone damage may be defined as any form of structural damage such as fractures, bones or chips, and degradation or deterioration of the bone other than normal wear and tear resulting from low bone mineral density or another cause. Such low BMD and/or bone damage is associated with osteoporosis.
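The definition of low BMD given above (two standard deviations below the age-matched population mean) corresponds to a Z score of -2. The following is a minimal sketch of that calculation; the function names, the choice to treat a Z score of exactly -2 as low, and the numerical values are illustrative assumptions and are not taken from the study data.

```python
def bmd_z_score(bmd, age_matched_mean, age_matched_sd):
    """Z score of a BMD value relative to an age-matched reference population."""
    if age_matched_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (bmd - age_matched_mean) / age_matched_sd


def is_low_bmd(bmd, age_matched_mean, age_matched_sd, cutoff=-2.0):
    """True when the BMD lies at or below the cutoff (assumed: Z <= -2)."""
    return bmd_z_score(bmd, age_matched_mean, age_matched_sd) <= cutoff


if __name__ == "__main__":
    # Hypothetical reference values in g/cm^2, for illustration only.
    mean, sd = 1.00, 0.10
    for value in (0.95, 0.78):
        print(value, round(bmd_z_score(value, mean, sd), 2), is_low_bmd(value, mean, sd))
```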
The invention may be practised on any mammalian subject. Preferably, the mammalian subject will be a human, and most preferably an adult, preferably female.
The polynucleotide of this invention is preferably DNA, or may be RNA or other options.
In a second aspect, fragments of the nucleic acid sequences of the first aspect are provided, which comprise one or more nucleotide substitutions, insertions or deletions. The novelty of a fragment according to the present embodiment may be easily ascertained using sequence comparison methods as previously described.
Preferred fragments may be 10 to 40 nucleotides in length. More preferably, the fragments are 5 to 10, 5 to 20, or 10 to 20 nucleotides in length. For example, the fragments may be 5, 8, 10, 12, 15, 18, 20, 22, 25, 28, 30, or 35 nucleotides in length. The fragments may be useful in a variety of diagnostic, prognostic or therapeutic methods, or may be useful as research tools, for example in drug screening.
In a third aspect of the invention, there are provided non-coding, complementary sequences which hybridize to a nucleic acid sequence of the first aspect. Such "anti-sense" sequences are useful as probes or primers for detecting an allele of a polymorphism of the invention, or in the regulation of the genes. They may also be used as agents for use in the identification and/or treatment of individuals having or being susceptible to low bone mineral density.
The anti-sense polynucleotides of this embodiment may be the full length of sequence of the first aspect, or more preferably may be 5 to 30 nucleotides in length. Preferred polynucleotides are 5 to 10 or 10 to 25 nucleotides in length. Primers, in particular, are typically 10 to 15 nucleotides long, and may occasionally be 16 to 25.
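As an illustration of the complementary "anti-sense" sequences described above, the sketch below derives the reverse complement of a short DNA sequence and checks it against the 5 to 30 nucleotide length range. The function names and the example sequence are hypothetical; the sequence is not one of the polymorphic sequences of the invention.

```python
# Translation table for Watson-Crick base pairing in DNA.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sequence):
    """Return the reverse complement (anti-sense strand, 5' to 3') of a DNA sequence."""
    invalid = set(sequence) - set("ACGTacgt")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return sequence.translate(_COMPLEMENT)[::-1]


def within_length_range(sequence, minimum=5, maximum=30):
    """Check a candidate against the 5 to 30 nucleotide range quoted in the text."""
    return minimum <= len(sequence) <= maximum


if __name__ == "__main__":
    sense = "ATGGCCTTAGCA"  # hypothetical 12-mer, for illustration only
    probe = antisense(sense)
    print(probe, within_length_range(probe))
```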
In a preferred embodiment, the polynucleotides of the aforementioned aspects of the invention may be in the form of a vector, to enable the in vitro or in vivo expression of the polynucleotide sequence. The polynucleotides may be operably linked to one or more regulatory elements including a promoter; regions upstream or downstream of a promoter such as enhancers which regulate the activity of the promoter; an origin of replication; appropriate restriction sites to enable cloning of inserts adjacent to the polynucleotide sequence; markers, for example antibiotic resistance genes; ribosome binding sites; RNA splice sites and transcription termination regions; polymerisation sites; or any other element which may facilitate the cloning and/or expression of the polynucleotide sequence. Where two or more polynucleotides of the invention are introduced into the same vector, each may be controlled by its own regulatory sequences, or all sequences may be controlled by the same regulatory sequences. In the same manner, each sequence may comprise a 3' polyadenylation site. The vectors may be introduced into microbial, yeast or animal DNA, either chromosomal or mitochondrial, or may exist independently as plasmids. Examples of suitable vectors will be known to persons skilled in the art and include pBluescript II, LambdaZap, and pCMV-Script (Stratagene Cloning Systems, La Jolla (USA)).
In another aspect of the present invention, there is provided a host cell comprising a polynucleotide according to any of the aforementioned aspects, for expression of the polynucleotide. The host cell may comprise an expression vector, or naked DNA encoding said polynucleotides. A wide variety of suitable host cells are available, both eukaryotic and prokaryotic.
In a further aspect of the present invention, there is provided a transgenic non-human animal comprising a polynucleotide according to an aforementioned aspect of the invention.
Preferably, the transgenic, non-human animal comprises a polynucleotide according to the second or third aspects. Transgenic non-human animals are useful for the analysis of the single nucleotide polymorphisms and their phenotypic effect.
In an eighth aspect of the present invention there is provided a method of screening for agents for use in the prognosis, diagnosis or treatment of individuals having, or being susceptible to, low bone mineral density, said method comprising contacting a putative agent with a polynucleotide or protein according to an aforementioned aspect of the present invention, and monitoring the reaction therebetween. Preferably, the method further comprises contacting a putative agent with a reference polynucleotide or protein, and comparing the reaction between (i) the agent and the polynucleotide or protein encoding the reference allele; and (ii) the agent and polynucleotide or protein of the invention. Potential agents are those which react differently with a variant of the invention and a reference allele. It is envisaged that the present method may be carried out by contacting a putative agent with a host cell or transgenic non-human animal comprising a polynucleotide or protein according to the invention. Putative agents will include those known to persons skilled in the art, and include chemical or biological compounds, such as anti-sense polynucleotide sequences, complementary to the coding sequences of the first aspect, or polyclonal or monoclonal antibodies which bind to a product such as a protein or protein fragment of the second aspect. They may also be useful in determining susceptibility to low bone mineral density, or in the diagnosis, prognosis or treatment of related conditions.
In a ninth aspect of the present invention, there is provided a method of diagnosing, or determining susceptibility of a subject to low bone mineral density and/or bone damage, said method comprising analysing the genetic material of a subject to determine which allele(s) of the gene is/are present. The method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present.
The method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype. In a preferred embodiment, the method comprises determining which allele of one or more of the polymorphisms of the invention is/are present. In particular, the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype.
In another preferred embodiment of the ninth aspect, the method may comprise determining which allele is present in the protein. Preferably, the method comprises determining whether the allele of the polymorphism of the fourth aspect is present. Any method for determining the presence of an allele may be used. One such method involves the use of antibodies in diagnosing or determining susceptibility to low bone mineral density. The method may comprise removing a sample from a subject, contacting the sample with an antibody to an antigen of the protein, and detecting binding of the antibody to the antigen, wherein binding is indicative of the presence of a particular allele or form of the protein and thus risk of low BMD. Tissue samples as described above are suitable for this method.
In a further aspect of the present invention, there is provided a method of predicting the response of a subject to treatment, said method comprising analysing genetic material of a subject to determine which allele(s) of the gene is/are present. Preferably, the method is carried out according to the ninth aspect. This aspect of the invention is based upon the observation that the effectiveness of treatment depends upon the underlying cause of disease. Therefore, depending upon the presence of particular allele(s), and their effect, certain treatments may be effective, whereas others may not. This will be the case where different alleles or haplotypes result in low bone mineral density, but mediate their effect via different biological mechanisms. The method preferably also comprises comparing the alleles present in a subject with those of the genes which require particular treatments. This may be done by use of a chart or visual aid detailing the therapies which are most appropriate for particular genotypes.
In a further aspect, the present invention provides a kit to determine which allele(s) of the gene is/are present.
Preferably, the kit will be suitable for determining which alleles of the polymorphisms of the first aspect are present. The kit may contain polynucleotides, most preferably anti-sense sequences such as those of the third aspect, for use as probes or primers; antibodies which bind to alleles of the protein, such as those of the fifth aspect; or restriction enzymes for use in detecting the presence of a polynucleotide, protein, or fragment thereof. Preferably, the kit will also comprise means for detection of a reaction, such as nucleotide label detection means, labelled secondary antibodies or size detection means. In yet a further preferred embodiment, the polynucleotides or antibodies may be fixed to a substrate, for example an array. The kit further comprises means for indicating correlation between the genotype of a subject and risk of low BMD. Such means may be in the form of a chart or visual aid, which indicates that presence of one or more alleles of the gene, including alleles of the polymorphisms of the invention, is/are associated with low BMD.
DESCRIPTION
The invention provides novel polynucleotides and polymorphic polynucleotides associated with a given human disease, for example, with osteoporosis. The invention also provides a gene sequence containing one or more polymorphic nucleotides associated with a predisposition to or the development of a given human disease such as osteoporosis. The invention also relates to polypeptides encoded by the novel polynucleotides or the polymorphism-containing gene. The invention also provides methods of detecting a polymorphism according to the invention in individuals at risk for osteoporosis, and for determining if a given polymorphism is associated with a predisposition to the disease. The invention also discloses polymorphism(s) that are either associated with or are not associated with (i.e., are neutral) osteoporosis. A polymorphism in a given gene can be utilized in various diagnostic and therapeutic methods and procedures, for example, in nucleic acid and peptide diagnosis, drug screening and design, and in gene and peptide therapy. A polymorphism associated with a given gene can be utilized in various gene expression systems and assays designed to analyze gene regulation and expression.
Definitions
As used herein, "polymorphism" refers to a nucleotide alteration that either predisposes an individual to a disease or is not associated with a disease, which occurs as a result of a substitution, insertion or deletion. More particularly, a "polymorphism" or "polymorphic variation" may be a nucleic acid sequence variation, as compared to the naturally occurring sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution, which is present at a frequency of greater than 1% in a population.
As used herein, "neutral polymorphism" refers to a polymorphism which is present at a frequency of greater than 1 % in a population, which does not alter gene function or phenotype, and thus is not associated with a predisposition to or development of a disease.
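The frequency criterion in the two definitions above (greater than 1% in a population) can be checked from genotype counts. The sketch below is illustrative only: it assumes a biallelic locus, uses the frequency of the rarer allele as the quantity compared with the threshold, and the counts shown are hypothetical rather than taken from the study cohort.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the rarer allele at a biallelic locus, from genotype counts."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each individual carries two alleles; heterozygotes carry one copy of each.
    alt_frequency = (n_het + 2 * n_hom_alt) / (2 * n_individuals)
    return min(alt_frequency, 1.0 - alt_frequency)


def meets_polymorphism_frequency(n_hom_ref, n_het, n_hom_alt, threshold=0.01):
    """True when the rarer allele is present at a frequency greater than 1%."""
    return minor_allele_frequency(n_hom_ref, n_het, n_hom_alt) > threshold


if __name__ == "__main__":
    # Hypothetical genotype counts: 90 reference homozygotes, 9 heterozygotes,
    # 1 variant homozygote.
    print(minor_allele_frequency(90, 9, 1), meets_polymorphism_frequency(90, 9, 1))
```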
As used herein "polynucleotide sequence" refers to a sense or antisense nucleic acid sequence comprising RNA, cDNA, genomic DNA, synthetic forms and mixed polymers, that may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases.
As used herein "mutation" refers to a variation in the nucleotide sequence of a gene or regulatory sequence as compared to the naturally occurring or normal nucleotide sequence. A mutation may result from the deletion, insertion or substitution of more than one nucleotide (e.g., 2, 3, 4, or more nucleotides) or a single nucleotide change such as a deletion, insertion or substitution. The term "mutation" also encompasses chromosomal rearrangements.
As used herein, "nucleic acid probe" refers to an oligonucleotide, nucleotide or polynucleotide, and fragments and portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, which represents the sense or antisense strand. Both terms "nucleic acid probe" and "DNA fragment" refer to a length of polynucleotide, for example, as small as 5 nucleotides, or 10, 20, 25, 40, 50, 75, 100, 250, 400 or 500 nucleotides or 1 kb, and as large as 5-10 kb.
As used herein, "alteration" refers to a change in either a nucleotide or amino acid sequence, as compared to the naturally occurring sequence, resulting from a deletion, an insertion or addition, or a substitution.
As used herein, "deletion" refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, are absent.
As used herein, "insertion" or "addition" refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, have been added.
As used herein, "substitution" refers to a replacement of one or more nucleotides or amino acids by different nucleotides or amino acid residues, respectively.
As used herein, "specifically hybridizable" refers to a nucleic acid or fragment thereof that hybridizes to another nucleic acid (or a complementary strand thereof) due to the presence of a region that is at least approximately 90% homologous, preferably at least approximately 90-95% homologous, and more preferably approximately 98-100% homologous, as are polynucleotides that hybridize to a partner under stringent hybridization conditions. "Stringent" hybridization conditions are defined hereinbelow for various hybridization protocols. A probe that is specifically hybridizable to a given sequence can be used to detect a 1 bp out of 10 bp (10%) or a 1 bp out of 20 bp (5%) difference between nucleic acid sequences and is therefore useful for discriminating between a wild type and a mutant form of a gene of interest.
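The percentages quoted in this definition (1 bp in 10 bp is 10%, 1 bp in 20 bp is 5%) can be illustrated with a minimal sketch. The function name and the example sequences below are hypothetical and are not taken from the specification; the sketch assumes two aligned sequences of equal length and counts substitutions only.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions at which two sequences differ.

    Assumption: the sequences are already aligned and of equal length, so only
    substitutions (not insertions or deletions) are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * mismatches / len(seq_a)


# Hypothetical 10 bp wild-type and mutant sequences differing at one position.
wild_type_10 = "ACGTACGTAC"
mutant_10 = "ACGTTCGTAC"

# The same single difference embedded in a 20 bp region.
wild_type_20 = wild_type_10 + wild_type_10
mutant_20 = mutant_10 + wild_type_10

print(percent_difference(wild_type_10, mutant_10))  # 1 bp out of 10 bp
print(percent_difference(wild_type_20, mutant_20))  # 1 bp out of 20 bp
```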
As used herein, "amino acid sequence" refers to the sequential array of amino acids that have been joined by peptide bonds between the carboxylic acid group of one amino acid and the amino group of the adjacent amino acid to form long linear polymers comprising proteins. As used herein, "amino acid" refers to protein subunit molecules that contain a carboxylic acid group, and an amino group, both linked to a single carbon atom.
A polypeptide is said to be "encoded" by a polynucleotide if the polynucleotide, either in its native state or in a recombinant form, can be transcribed and/or translated to produce the mRNA for and/or the polypeptide or a fragment thereof. As used herein, "gene" refers to a region of DNA which includes a portion which can be transcribed into RNA, and which may contain an open reading frame or coding region (also referred to as an exon) which encodes a protein, a non-coding region (also referred to as an intron), and a specific regulatory region comprising the DNA regulatory elements which control expression of the transcribed region. As used herein, "coding region" refers to a region of DNA which encodes a protein, also known as an exon.
As used herein, "non-coding region" refers to a region of DNA which does not encode a protein coding region, also known as an intron, and is not included in the RNA molecule that is synthesized from a particular gene.
As used herein, "regulatory region" refers to DNA sequences which are located either 5' of the transcription start site, 3' of the transcription termination site, or within an intron or exon, and which are capable of ensuring that the gene is transcribed at the proper time and in the appropriate cell type.
As used herein, "consensus DNA sequence" or "wild-type DNA sequence" refers to a sequence wherein every position represents the nucleotide that occurs with the highest frequency when many actual sequences are compared. As used herein, "consensus DNA sequence" or "wild-type DNA sequence" also refers to the normal, naturally occurring DNA sequence.
As used herein, a given sequence (or mutation or polymorphism) "associated with" osteoporosis refers to a nucleic acid sequence that increases susceptibility to the disease, predisposes an individual to the disease or contributes to the disease, wherein the nucleic acid sequence is present at a higher frequency (at least 5%, preferably 10%, more preferably 25% higher) in individuals with the disease as compared to individuals who do not have the disease.
As used herein, a sequence "not associated with" osteoporosis refers to a nucleic acid sequence that does not increase susceptibility to the disease, predispose an individual to the disease or contribute to the disease, wherein the nucleic acid sequence is not present at a higher frequency in individuals with the disease, and thus is present at a frequency about equal to its frequency in individuals who do not have the disease.
As used herein, "amplifying" refers to producing additional copies of a nucleic acid sequence, preferably by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol. 155: 335).
As used herein, "oligonucleotide primers" refer to single stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand. Oligonucleotide primers useful according to the invention are between 5 and 100 nucleotides in length, preferably 20-60 nucleotides in length, and more preferably 20-40 nucleotides in length.
As used herein, "sequencing" refers to determining the precise nucleotide composition or sequence of a nucleic acid region by methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra).
As used herein, "comparing" a sequence refers to determining if the nucleotides at one or more positions in a particular region of a nucleic acid fragment are identical for any two or more sequences. According to the invention, sequence comparisons can be performed by using computer program analysis as described below in Section F entitled "Identification and Characterization of Polymorphisms".
As used herein, "sequence differences" or "sequence variations" refer to nucleotide changes, at one or more positions between any two or more sequences being compared.
As used herein, "determining the presence of polymorphic variations" refers to using methods well known in the art to identify a nucleotide, at one or more positions within a particular nucleic acid region, that is distinct from the nucleotide present in the naturally occurring, wild-type or consensus sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution.
As used herein, "determining the absence of polymorphic variations" refers to using methods well known in the art to determine that the nucleotides present at every position analyzed in a particular nucleic acid region are identical to the nucleotides present in the naturally occurring, wild type or consensus sequence.
As used herein, "genotyping" refers to determining the composition of the genetic material that is inherited by an organism from its parents.
As used herein, "biological sample" refers to a tissue or fluid sample containing a polynucleotide or polypeptide of interest, and isolated from an individual, including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
As used herein, "amplimers" refer to a specific fragment of DNA generated by PCR that is at least 30 bp in length and is preferably between 50 and 100 bp in length, and is more preferably between 150 and 300 bp in length, with a melting temperature in the range of approximately 60-62°C.
As used herein, "phenotype" refers to the biological appearances of an organism or a tissue derived from an organism, wherein biological appearances include chemical, structural and behavioral attributes, and excludes genetic constitution.
As used herein, "genotype" refers to the genetic material that is inherited by an organism from its parents.
As used herein, "genetic susceptibility to osteoporosis" refers to an increased risk of developing osteoporosis resulting from specific DNA differences relative to non-susceptible individuals. Preferably an individual who is genetically susceptible to osteoporosis has a 5-100%, and more preferably a 25-50%, greater chance of developing osteoporosis, as compared to non-susceptible individuals.
As used herein, "diagnostic" refers to the practice of identifying a disease from the signs and symptoms of an individual, including the DNA sequences of genes that are associated with an increased susceptibility to the disease. "Diagnostic" also refers to the practice of stratifying patient populations based on the efficacy or toxicity of a composition, and the predictive placement of an individual in a response stratum based on strata-associated parameters.
As used herein, "prognosis" refers to the possibility of recovering from a particular disease or condition, and also refers to risk assessment of developing a particular disease or condition.
A. Design and Synthesis of Oligonucleotide Primers
According to the present invention, oligonucleotide primers are disclosed that are useful for determining the sequence of a particular allele of a gene. The invention also discloses oligonucleotide primers designed to amplify a region of a gene that is known to contain a polymorphism. The invention also discloses oligonucleotide primers designed to anneal specifically to a particular allele of a gene. Oligonucleotide primers useful according to the invention are single-stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand. The primer is complementary to a portion of a target molecule present in a pool of nucleic acid molecules. It is contemplated that oligonucleotide primers according to the invention are prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a fragment thereof is naturally-occurring, and is isolated from its natural source or purchased from a commercial supplier. Oligonucleotide primers are 5 to 100 nucleotides in length, ideally from 20 to 40 nucleotides, although oligonucleotides of different length are of use.
Pairs of single-stranded DNA primers can be annealed to sequences within or surrounding a gene on chromosome Y in order to prime amplifying DNA synthesis of a region of a gene. A complete set of gene primers will allow synthesis of all of the nucleotides of the coding sequences, e.g., the exons, introns and control regions. Preferably, the set of primers will also allow synthesis of both intron and exon sequences. Allele-specific primers are also useful, according to the invention. Such primers will anneal only to a particular mutant allele (e.g. alleles containing a polymorphism), and thus will only amplify a product if the template also contains the polymorphism.
Allele specific primers that anneal only to a wild type gene sequence are also useful according to the invention.
Typically, selective hybridization occurs when two nucleic acid sequences are substantially complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa, M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that a certain degree of mismatch at the priming site is tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which are defined as regions in which there exists a mismatch in an uninterrupted series of four or more nucleotides.
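The complementarity thresholds stated above can be checked with a short sketch. The function names, the example site and the default arguments are hypothetical; the sketch assumes an ungapped comparison of a primer against a candidate site of the same length, both written 5' to 3', and it does not model loops or thermodynamics.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(primer: str, site: str) -> float:
    """Percentage of primer positions that base-pair with the candidate site.

    Both sequences are written 5' to 3' and must have equal length; the primer
    is compared against the reverse complement of the site (no gaps).
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    ideal_primer = reverse_complement(site)
    matches = sum(p == i for p, i in zip(primer.upper(), ideal_primer))
    return 100.0 * matches / len(primer)


def is_selective(primer: str, site: str,
                 threshold: float = 65.0, min_length: int = 14) -> bool:
    """Apply the 'at least about 65% over at least 14 nucleotides' criterion."""
    return (len(primer) >= min_length
            and percent_complementarity(primer, site) >= threshold)


# Hypothetical 20-nucleotide priming site.
site = "ATGGCCATTGTAATGGGCCG"
perfect_primer = reverse_complement(site)
# Introduce a di-nucleotide mismatch at the 5' end of the primer.
mismatched_primer = "GC" + perfect_primer[2:]

print(percent_complementarity(perfect_primer, site))
print(percent_complementarity(mismatched_primer, site))
print(is_selective(mismatched_primer, site, threshold=90.0))
```

Raising `threshold` from 65 to 75 or 90 corresponds to the "preferably" and "more preferably" levels given in the text.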
Numerous factors influence the efficiency and selectivity of hybridization of the primer to a second nucleic acid molecule. These factors, which include primer length, nucleotide sequence and/or composition, hybridization temperature, buffer composition and potential for steric hindrance in the region to which the primer is required to hybridize, will be considered when designing oligonucleotide primers according to the invention.
A positive correlation exists between primer length and both the efficiency and accuracy with which a primer will anneal to a target sequence. In particular, longer sequences have a higher melting temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target sequence, thereby minimizing promiscuous hybridization. Primer sequences with a high G-C content or that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution. However, it is also important to design a primer that contains sufficient numbers of G-C nucleotide pairings since each G-C pair is bound by three hydrogen bonds, rather than the two that are found when A and T bases pair to bind the target sequence, and therefore forms a tighter, stronger bond. Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration of organic solvents, e.g. formamide, that might be included in a priming reaction or hybridization mixture, while increases in salt concentration facilitate binding. Under stringent annealing conditions, longer hybridization probes (of use, for example, in Northern analysis), or synthesis primers, hybridize more efficiently than do shorter ones, which are sufficient under more permissive conditions. Stringent hybridization conditions typically include salt concentrations of less than about 1M, more usually less than about 500 mM and preferably less than about 200 mM. Hybridization temperatures range from as low as 0°C to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C. Longer fragments may require higher hybridization temperatures for specific hybridization. As several factors affect the stringency of hybridization, the combination of parameters is more important than the absolute measure of a single factor.
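The dependence of melting temperature on primer length and G-C content can be illustrated with the widely used Wallace rule of thumb, Tm ≈ 2°C x (A+T) + 4°C x (G+C). This rule is not given in the specification; it is used here only as an assumed, rough approximation for short oligonucleotides, and the function names and example sequences are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: a rough estimate for short oligonucleotides only; it ignores
    salt, formamide and primer concentration, which the text notes also matter.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer may contain only A, C, G and T")
    return 2 * at + 4 * gc


# Hypothetical primers: equal length with different G-C content, then a longer one.
print(wallace_tm("ATATATATATATATAT"))      # A-T only
print(wallace_tm("ATGCATGCATGCATGC"))      # 50% G-C
print(wallace_tm("GCGCGCGCGCGCGCGC"))      # G-C only
print(wallace_tm("ATGCATGCATGCATGCATGC"))  # longer primer, 50% G-C
```

Under this approximation the estimate rises with both length and G-C content, which is the trend the paragraph describes.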
Oligonucleotide primers can be designed with these considerations in mind and synthesized according to the following methods.
1. Oligonucleotide Primer Design Strategy
The design of a particular oligonucleotide primer for the purpose of sequencing or PCR involves selecting a sequence that is capable of recognizing the target sequence, but has a minimal predicted secondary structure. The oligonucleotide sequence binds only to a single site in the target nucleic acid. Furthermore, the Tm of the oligonucleotide is optimized by analysis of the length and GC content of the oligonucleotide. Furthermore, when designing a PCR primer useful for the amplification of genomic DNA, the selected primer sequence does not demonstrate significant matches to sequences in the GenBank database (or other available databases).
The design of a primer is facilitated by the use of readily available computer programs, developed to assist in the evaluation of the several parameters described above and the optimization of primer sequences. Examples of such programs are "PrimerSelect" of the DNAStar™ software package (DNAStar, Inc.; Madison, WI), OLIGO 4.0 (National Biosciences, Inc.), PRIMER, Oligonucleotide Selection Program, PGEN and Amplify (described in Ausubel et al., 1995, Short Protocols in Molecular Biology, 3rd Edition, John Wiley & Sons). Primers are designed with sequences that serve as targets for other primers to produce a PCR product that has known sequences on the ends which serve as targets for further amplification (e.g. to sequence the PCR product). If many different genes are amplified with specific primers that share a common 'tail' sequence, the PCR products from these distinct genes can subsequently be sequenced with a single set of primers. Alternatively, in order to facilitate subsequent cloning of amplified sequences, primers are designed with restriction enzyme site sequences appended to their 5' ends. Thus, all nucleotides of the primers are derived from gene sequences or sequences adjacent to a gene, except for the few nucleotides necessary to form a restriction enzyme site. Such enzymes and sites are well known in the art. If the genomic sequence of a gene and the sequence of the open reading frame of a gene are known, design of particular primers is well within the skill of the art.
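The two 5' extensions described above, a restriction enzyme site and a common tail, can be sketched as simple string operations. The recognition sequences shown for EcoRI, BamHI and HindIII are the standard ones; the gene-specific sequence, the tail sequence and the function names are hypothetical placeholders and are not primers disclosed in the specification.

```python
# Standard recognition sequences, written 5' to 3'.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def add_five_prime_site(gene_specific: str, enzyme: str) -> str:
    """Prepend a restriction enzyme recognition site to the 5' end of a primer."""
    if enzyme not in RESTRICTION_SITES:
        raise KeyError(f"no recognition site recorded for {enzyme}")
    return RESTRICTION_SITES[enzyme] + gene_specific.upper()


def add_common_tail(gene_specific: str, tail: str) -> str:
    """Prepend a shared tail so that different amplicons carry the same ends."""
    return tail.upper() + gene_specific.upper()


# Hypothetical gene-specific portion and arbitrary placeholder tail.
gene_specific = "ATGGCCATTGTAATGGGCCG"
placeholder_tail = "ACGTACGTACGTACGT"

cloning_primer = add_five_prime_site(gene_specific, "EcoRI")
tailed_primer = add_common_tail(gene_specific, placeholder_tail)
print(cloning_primer)
print(tailed_primer)
```

In both cases the 3' portion of the primer remains gene-derived, consistent with the statement that only the few added nucleotides are not taken from the gene or its adjacent sequence.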
2. Synthesis
The primers themselves are synthesized using techniques which are also well known in the art. Once designed, oligonucleotides are prepared by a suitable method, e.g. the phosphoramidite method described by Beaucage and Carruthers (1981, Tetrahedron Lett., 22:1859) or the triester method according to Matteucci et al. (1981, J. Am. Chem. Soc., 103:3185), both incorporated herein by reference, or by other chemical methods using either a commercially available automated oligonucleotide synthesizer or VLSIPS™ technology.
B. Production of a Polynucleotide Sequence
The invention discloses polynucleotide sequences comprising polymorphisms. The polynucleotide sequences of the invention are specifically hybridizable to a mutant form of a gene and are therefore useful for discriminating between a wild-type form of a gene and a mutant form of a gene. The polynucleotide sequences of the invention may also be useful for expression of the encoded protein or a fragment thereof. The invention also features antisense polynucleotide sequences complementary to polynucleotide sequences comprising polymorphisms. Antisense polynucleotide sequences are useful according to the invention for inhibiting expression of an allelic form of a gene.
The present invention utilizes polynucleotide sequences and fragments comprising RNA, cDNA, genomic DNA, synthetic forms, and mixed polymers. The invention includes both sense and antisense strands of the polynucleotide sequences. According to the invention, the polynucleotide sequences may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g. methyl phosphonates, phosphorodithioates, etc.), pendent moieties (e.g. polypeptides), intercalators (e.g. acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g. alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule.
The polynucleotide may be a naturally occurring polynucleotide, or may be a structurally related variant of such a polynucleotide having modified bases and/or sugars and/or linkages. The term "polynucleotide" as used herein is intended to cover all such variants.
Modifications which may be made to the polynucleotide include (but are not limited to) the following types:
a) Backbone modifications
    i) phosphorothioates (X or Y or W or Z = S or any combination of two or more with the remainder as O), e.g. Y=S (Stein et al., 1988, Nucleic Acids Res., 15:3209), X=S (Cosstick and Vyle, 1989, Tetrahedron Letters, 30:4693), Y and Z=S (Brill et al., 1989, J. Amer. Chem. Soc. 111:2321)
    ii) methylphosphonates (e.g. Z=methyl (Miller et al., 1980, J. Biol. Chem., 255:9569))
    iii) phosphoramidates (Z = N-(alkyl)2, e.g. alkyl = methyl, ethyl, butyl) (Z=morpholine or piperazine) (Agrawal et al., 1988, Proc. Natl. Acad. Sci. USA, 85:7079) (X or W = NH) (Mag and Engels, 1988, Nucleic Acids Res., 16:3525)
    iv) phosphotriesters (Z=O-alkyl e.g. methyl, ethyl etc) (Miller et al., Biochemistry, 21:5468)
    v) phosphorus-free linkages (e.g. carbamate, acetamidate, acetate) (Gait et al., 1974, J Chem.Soc. Perkin I. 1684, Gait et al., 1979, J Chem.Soc. Perkin I. 1389)
b) Sugar modifications
    i) 2'-deoxynucleosides (R=H)
    ii) 2'-O-methylated nucleosides (R=OMe) (Sproat et al., 1989, Nucleic Acids Res., 17:3373)
    iii) 2'-fluoro-2'-deoxynucleosides (R=F) (Krug et al., 1989, Nucleosides and Nucleotides, 8:1473)
c) Base modifications (for a review see Jones, 1979, Int. J. Biolog. Macromolecules, 1:194)
    i) pyrimidine derivatives substituted in the 5-position (e.g. methyl, bromo, fluoro etc) or replacing a carbonyl group by an amino group (Piccirilli et al., 1990, Nature, 343:33)
    ii) purine derivatives lacking specific nitrogen atoms (e.g. 7-deaza adenine, hypoxanthine) or functionalized in the 8-position (e.g. 8-azido adenine, 8-bromo adenine)
d) Polynucleotides covalently linked to reactive functional groups, e.g.:
    i) psoralens (Miller et al., 1988, Nucleic Acids Res. Special Pub. No. 20:113), phenanthrolines (Sun et al., 1988, Biochemistry, 27:6039), mustards (Vlassov et al., 1988, Gene, 72:313) (irreversible cross-linking agents with or without the need for co-reagents)
    ii) acridine (intercalating agents) (Helene et al., 1985, Biochimie, 61:111)
    iii) thiol derivatives (reversible disulfide formation with proteins) (Connolly and Newman, 1989, Nucleic Acids Res., 17:4957)
    iv) aldehydes (Schiff's base formation)
    v) azido, bromo groups (UV cross-linking)
    vi) ellipticines (photolytic cross-linking) (Perrouault et al., 1990, Nature, 344:358)
e) Polynucleotides covalently linked to lipophilic groups or other reagents capable of improving uptake by cells, e.g.:
    i) cholesterol (Letsinger et al., 1989, Proc. Natl. Acad. Sci. USA, 86:6553), polyamines (Lemaitre et al., 1987, Proc. Natl. Acad. Sci. USA, 84:648), other soluble polymers (e.g. polyethylene glycol)
f) Polynucleotides containing alpha-nucleosides (Morvan et al., Nucleic Acids Res., 15:3421)
g) Combinations of modifications a)-f)
It should be noted that such modified polynucleotides, while sharing features with polynucleotides designed as "anti-sense" inhibitors, are distinct in that the compounds correspond to sense-strand sequences and the mechanism of action depends on protein-nucleic acid interactions and does not depend upon interactions with nucleic acid sequences.
1. Polynucleotide Sequences Comprising DNA
a. Cloning
Polynucleotide sequences comprising DNA can be isolated from cDNA or genomic libraries (including YAC and BAC libraries) by cloning methods well known to those skilled in the art (Ausubel et al., supra). Briefly, isolation of a DNA clone comprising a particular polynucleotide sequence involves screening a recombinant DNA or cDNA library and identifying the clone containing the desired sequence. Cloning will involve the following steps. The clones of a particular library are spread onto plates, transferred to an appropriate substrate for screening, denatured, and probed for the presence of a particular sequence. A description of hybridization conditions, and methods for producing labeled probes is included below. The desired clone is preferably identified by hybridization to a nucleic acid probe or by expression of a protein that can be detected by an antibody. Alternatively, the desired clone is identified by polymerase chain amplification of a sequence defined by a particular set of primers according to the methods described below.
The selection of an appropriate library involves identifying tissues or cell lines that are an abundant source of the desired sequence. Furthermore, if the polynucleotide sequence of interest contains regulatory sequence or intronic sequence, a genomic library is screened (Ausubel et al., supra).
b. Genomic DNA
Polynucleotide sequences of the invention are amplified from genomic DNA. Genomic DNA is isolated from tissues or cells according to the following method.
To facilitate detection of a variant form of a gene from a particular tissue, the tissue is isolated free from surrounding normal tissues. To isolate genomic DNA from mammalian tissue, the tissue is minced and frozen in liquid nitrogen. Frozen tissue is ground into a fine powder with a prechilled mortar and pestle, and suspended in digestion buffer (100 mM NaCl, 10 mM TrisCl, pH 8.0, 25 mM EDTA, pH 8.0, 0.5% (w/v) SDS, 0.1 mg/ml proteinase K) at 1.2 ml digestion buffer per 100 mg of tissue. To isolate genomic DNA from mammalian tissue culture cells, cells are pelleted by centrifugation for 5 min at 500 x g, resuspended in 1-10 ml ice-cold PBS, repelleted for 5 min at 500 x g and resuspended in 1 volume of digestion buffer.
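The digestion buffer ratio for tissue (1.2 ml per 100 mg) and the stated proteinase K concentration (0.1 mg/ml) lend themselves to a small scaling calculation. The sketch below is illustrative only; the function names and the 250 mg example are hypothetical.

```python
BUFFER_ML_PER_100_MG_TISSUE = 1.2   # from the tissue protocol above
PROTEINASE_K_MG_PER_ML = 0.1        # concentration in the digestion buffer


def digestion_buffer_ml(tissue_mg: float) -> float:
    """Volume of digestion buffer (ml) for a given mass of ground tissue (mg)."""
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    return BUFFER_ML_PER_100_MG_TISSUE * tissue_mg / 100.0


def proteinase_k_mg(buffer_ml: float) -> float:
    """Mass of proteinase K (mg) contained in the given buffer volume."""
    return PROTEINASE_K_MG_PER_ML * buffer_ml


# Hypothetical sample: 250 mg of ground tissue.
volume = digestion_buffer_ml(250)
print(f"{volume:.2f} ml digestion buffer")
print(f"{proteinase_k_mg(volume):.2f} mg proteinase K")
```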
Samples in digestion buffer are incubated (with shaking) for 12-18 hours at 50°C, and then extracted with an equal volume of phenol/chloroform/isoamyl alcohol. If the phases are not resolved following a centrifugation step (10 min at 1700 x g), another volume of digestion buffer (without proteinase K) is added and the centrifugation step is repeated. If a thick white material is evident at the interface of the two phases, the organic extraction step is repeated. Following extraction the upper, aqueous layer is transferred to a new tube to which will be added x volume of 7.5M ammonium acetate and 2 volumes of 100% ethanol. The nucleic acid is pelleted by centrifugation for 2 min at 1700 x g, washed with 70% ethanol, air dried and resuspended in TE buffer (10 mM TrisCl, pH 8.0, 1 mM EDTA, pH 8.0) at 1 mg/ml. Residual RNA is removed by incubating the sample for 1 hour at 37°C in the presence of 0.1% SDS and 1 mg/ml DNase-free RNase, and repeating the extraction and ethanol precipitation steps. The yield of genomic DNA according to this method is expected to be approximately 2 mg DNA/1 g cells or tissue (Ausubel et al., supra). Genomic DNA isolated according to this method can be used for Southern blot analysis, restriction enzyme digestion, dot blot analysis or PCR analysis, according to the invention.
c. Restriction digest (of cDNA or genomic DNA)
Following the identification of a desired cDNA or genomic clone containing a particular sequence, polynucleotides of the invention are isolated from these clones by digestion with restriction enzymes.
The technique of restriction enzyme digestion is well known to those skilled in the art (Ausubel et al., supra). Reagents useful for restriction enzyme digestion are readily available from commercial vendors including New England Biolabs, Boehringer Mannheim, Promega, as well as other sources.
d. PCR
Polynucleotide sequences of the invention are amplified from genomic DNA or other natural sources by the polymerase chain reaction (PCR). PCR methods are well-known to those skilled in the art.
PCR provides a method for rapidly amplifying a particular DNA sequence by using multiple cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify the target sequence of interest. PCR requires the presence of a nucleic acid to be amplified, two single stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase, deoxyribonucleoside triphosphates, a buffer and salts.
The method of PCR is well known in the art. PCR is performed as described in Mullis and Faloona, 1987, Methods Enzymol., 155: 335, herein incorporated by reference.
PCR is performed using template DNA (at least 1 pg; more usefully, 1-1000 ng) and at least 25 pmol of oligonucleotide primers. A typical reaction mixture includes: 2 µl of DNA, 25 pmol of oligonucleotide primer, 2.5 µl of 10X PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4 µl of 1.25 mM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin Elmer, Foster City, CA) and deionized water to a total volume of 25 µl. Mineral oil is overlaid and the PCR is performed using a programmable thermal cycler. The length and temperature of each step of a PCR cycle, as well as the number of cycles, are adjusted according to the stringency requirements in effect. Annealing temperature and timing are determined both by the efficiency with which a primer is expected to anneal to a template and the degree of mismatch that is to be tolerated. The ability to optimize the stringency of primer annealing conditions is well within the knowledge of one of moderate skill in the art. An annealing temperature of between 30°C and 72°C is used. Initial denaturation of the template molecules normally occurs at between 92°C and 99°C for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99°C for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and extension (72°C for 1 minute). The final extension step is generally carried out for 4 minutes at 72°C, and may be followed by an indefinite (0-24 hour) step at 4°C.
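The cycling parameters given above can be collected into a small data structure that checks the stated ranges and estimates the programmed run time. This is a sketch under stated assumptions: the class name, the default values chosen within each range (for example 30 cycles and a 55°C annealing temperature) and the omission of temperature ramp times and the optional 4°C hold are illustrative choices, not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class CyclingProgram:
    """Thermal cycling parameters; defaults are arbitrary picks within the stated ranges."""
    cycles: int = 30                        # text: 20-40 cycles
    annealing_temp_c: float = 55.0          # text: between 30 and 72 degrees C
    initial_denaturation_min: float = 4.0   # text: 4 minutes
    denaturation_min: float = 0.5           # text: 15 seconds to 1 minute
    annealing_min: float = 1.0              # text: 1-2 minutes
    extension_min: float = 1.0              # text: 1 minute at 72 degrees C
    final_extension_min: float = 4.0        # text: 4 minutes at 72 degrees C

    def validate(self) -> None:
        """Raise ValueError if a parameter falls outside the ranges in the text."""
        if not 20 <= self.cycles <= 40:
            raise ValueError("cycles must be between 20 and 40")
        if not 30.0 <= self.annealing_temp_c <= 72.0:
            raise ValueError("annealing temperature must be 30-72 degrees C")
        if not 0.25 <= self.denaturation_min <= 1.0:
            raise ValueError("denaturation must last 15 seconds to 1 minute")
        if not 1.0 <= self.annealing_min <= 2.0:
            raise ValueError("annealing must last 1-2 minutes")

    def total_minutes(self) -> float:
        """Programmed hold time in minutes, excluding ramping and the 4 degree C hold."""
        self.validate()
        per_cycle = self.denaturation_min + self.annealing_min + self.extension_min
        return (self.initial_denaturation_min
                + self.cycles * per_cycle
                + self.final_extension_min)


print(CyclingProgram().total_minutes())
print(CyclingProgram(cycles=20).total_minutes())
```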
Several techniques for detecting PCR products quantitatively without electrophoresis may be useful according to the invention in order to make it more suitable for easy clinical use. One of these techniques, for which there are commercially available kits such as Taqman™ (Perkin Elmer, Foster City, CA), is performed with a transcript-specific antisense probe. This probe is specific for the PCR product (e.g. a nucleic acid fragment derived from a gene) and is prepared with a quencher and fluorescent reporter probe complexed to the 5' end of the oligonucleotide. Different fluorescent markers can be attached to different reporters, allowing for measurement of two products in one reaction. When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the probe bound to the template by virtue of its 5'-to-3' nucleolytic activity. In the absence of the quenchers, the reporters now fluoresce. The color change in the reporters is proportional to the amount of each specific product and is measured by a fluorometer; therefore, the amount of each color can be measured and the PCR product can be quantified. The PCR reactions can be performed in 96 well plates so that samples derived from many individuals can be processed and measured simultaneously. The Taqman™ system has the additional advantage of not requiring gel electrophoresis and allows for quantification when used with a standard curve.
2. Polynucleotide Sequences Comprising RNA
The present invention also provides a polynucleotide sequence comprising RNA. A polynucleotide comprising RNA is useful for detecting SNPs and polymorphisms by techniques including but not limited to hybridization methods or the RNase protection method. A polynucleotide comprising RNA is also useful as a template for the in vitro production of protein. A polynucleotide comprising RNA is also useful for detecting and localizing specific mRNA sequences by in situ hybridization.
Polynucleotide sequences comprising RNA can be produced according to the method of in vitro transcription. The technique of in vitro transcription is well known to those of skill in the art. Briefly, the gene of interest is inserted into a vector containing an SP6, T3 or T7 promoter. The vector is linearized with an appropriate restriction enzyme that digests the vector at a single site located downstream of the coding sequence. Following a phenol/chloroform extraction, the DNA is ethanol precipitated, washed in 70% ethanol, dried and resuspended in sterile water. The in vitro transcription reaction is performed by incubating the linearized DNA with transcription buffer (200 mM TrisCl, pH 8.0, 40 mM MgCl2, 10 mM spermidine, 250 NaCl [T7 or T3] or 200 mM TrisCl, pH 7.5, 30 mM MgCl2, 10 mM spermidine [SP6]), dithiothreitol, RNase inhibitors, each of the four ribonucleoside triphosphates, and either SP6, T7 or T3 RNA polymerase for 30 min at 37°C. To prepare a radiolabeled polynucleotide comprising RNA, unlabeled UTP will be omitted and -SUTP will be included in the reaction mixture. The DNA template is then removed by incubation with DNase I. Following ethanol precipitation, an aliquot of the radiolabeled RNA is counted in a scintillation counter to determine the cpm/ml (Ausubel et al., supra).
Alternatively, polynucleotide sequences comprising RNA are prepared by chemical synthesis techniques such as solid phase phosphoramidite (described above).
3. Polynucleotide Sequences Comprising Oligonucleotides
A polynucleotide sequence comprising oligonucleotides can be made by using oligonucleotide synthesizing machines which are commercially available (described above).
4. Polynucleotide Sequences Encoding Fusion Proteins
Polynucleotide sequences of the invention can be used to express the protein product (or fragment thereof) of the gene of interest by inserting the polynucleotide sequence into an expression vector. Expression vectors suitable for protein expression in mammalian cells, bacterial cells, insect cells or plant cells are well known in the art and are described in Section H entitled "Production of a Mutant Protein".
Polynucleotide sequences of the invention can be used to prepare hybrid polynucleotides comprising a sequence of a gene adjacent to a sequence encoding a foreign protein or a fragment thereof (e.g. lacZ, trpE, glutathione S-transferase or thioredoxin) or a protein tag (hemagglutinin or FLAG). Such hybrid polynucleotides produce fusion proteins that are useful, according to the invention, for improved expression and/or rapid isolation of a protein or protein fragment encoded by the sequence of a gene. Hybrid polynucleotides are also useful as a source of antigen for the production of antibodies.
Nucleic acid constructs comprising a polynucleotide of genomic, cDNA, synthetic or semi-synthetic origin in association with a polynucleotide sequence encoding a foreign protein or a fragment thereof (carrier sequence) can be generated by recombinant nucleic acid techniques well known in the art (see Ausubel et al., supra). According to this method, the cloned gene is introduced into an expression vector at a position located 3' to a carrier sequence coding for the amino terminus of a highly expressed protein, an entire functional moiety of a highly expressed protein or the entire protein. It is preferable to use a carrier sequence from an E. coli gene or from any gene that is expressed at high levels in E. coli. It is often preferable to select a carrier sequence that will facilitate protein purification, either with antibodies or with an affinity purification protocol that is specific for the carrier protein being used. For example, the purification protocol can be designed in accordance with the unique physical properties of the carrier protein (e.g. heat stability). Alternatively, the tag sequence may encode a protein (e.g. glutathione-S-transferase (GST)) which can be purified by a chemical interaction (for example glutathione purification of GST). Alternatively, some carrier proteins, such as thioredoxin (Trx), can be selectively released from intact cells by osmotic shock or freeze/thaw procedures. Often, proteins that are fused to these carrier proteins can be purified away from intracellular contaminants by virtue of the physical attributes of the carrier protein (Ausubel et al., supra). To ensure that a fusion protein is useful according to the invention, it may be necessary to modify the expression protocol to produce a soluble protein. Because high-level expression of certain proteins can lead to the formation of inclusion bodies, it may be necessary to modify the following variables if a soluble protein is required.
The temperature at which expression is induced can affect inclusion body formation since inclusion body formation is induced at higher temperatures (37°C and 42°C) and inhibited at lower temperatures (30°C). In certain instances, lowering the total level of protein expression can lead to an increase in the proportion of soluble protein that is produced. The strain background of the cells in which the protein is being produced can affect the proportion of a particular protein that is expressed in a soluble form. Furthermore, the choice of carrier protein can affect the solubility of an expressed fusion protein (Ausubel et al., supra).
An additional problem that can be encountered when producing fusion proteins in E. coli is formation of an unstable protein, or a protein that is cleaved at the site of the junction between the carrier sequence and the sequence of the protein of interest. To decrease complications due to protein instability one can arrange for the fusion protein to be expressed as insoluble aggregates. Alternatively, one can express the fusion protein in E. coli strains that are deficient in proteases (Ausubel et al., supra).
Often it is useful to remove the carrier protein moiety from the protein of interest to facilitate biochemical and functional analyses. Methods for cleavage of fusion proteins to remove the carrier are known to those skilled in the art. The choice of a method is usually determined by the composition, sequence, and physical characteristics of the particular protein. Reagents such as cyanogen bromide, hydroxylamine or low pH can be used to chemically cleave fusion proteins. To avoid complications resulting from chemical cleavage (e.g. the presence of chemical cleavage sites in the protein of interest and/or the occurrence of side reactions resulting in protein modification), enzymatic cleavage methods can be used. Enzymatic cleavage protocols are advantageous because they can be carried out under relatively mild reaction conditions, and because they involve highly specific cleavage reactions. Enzymes useful for enzymatic cleavage of fusion proteins include factor Xa, thrombin, enterokinase, renin and collagenase (Ausubel et al., supra).
Recombinant constructs encoding fusion proteins wherein the carrier sequence is on the order of 9-15 codons can be generated by PCR methods. According to this method, a PCR primer is designed to contain at least 13 nucleotides that are identical to the target sequence on either side of the nucleotide sequence encoding the carrier sequence. Preferably, the PCR primer also contains a restriction enzyme site to facilitate cloning of the amplified product into an appropriate expression vector. PCR is carried out as described above and the sequence of the amplified product is confirmed by sequence analysis as described in Section D entitled "Isolation of a Wild type Gene".
Alternatively, recombinant constructs encoding fusion proteins can be generated by site/oligonucleotide directed mutagenesis (Ausubel et al., supra). According to the method of site directed mutagenesis, the DNA to be mutated is inserted into a plasmid which has an f1 origin of replication. A mutagenesis oligonucleotide is designed to contain 13 bp that are 100% identical to the target sequence on either side of a sequence coding for the 9-15 codons of carrier sequence that is to be added by the mutagenesis protocol. A single stranded preparation of the vector is prepared by the following method.
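The oligonucleotide layout described above (13 bases of target sequence on either side of a 9-15 codon carrier sequence) can be illustrated with a minimal sketch. The function name, the example target sequence and the particular hemagglutinin-tag coding sequence are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
def design_insertion_oligo(target, insert_pos, carrier, flank=13):
    """Return flank + carrier + flank, where the flanks are copied from
    `target` immediately 5' and 3' of `insert_pos` (0-based index)."""
    target = target.upper()
    carrier = carrier.upper()
    if len(carrier) % 3 != 0:
        raise ValueError("carrier must be a whole number of codons")
    if not 9 <= len(carrier) // 3 <= 15:
        raise ValueError("carrier is expected to be 9-15 codons long")
    if insert_pos < flank or insert_pos + flank > len(target):
        raise ValueError("not enough target sequence around the insertion point")
    left = target[insert_pos - flank:insert_pos]
    right = target[insert_pos:insert_pos + flank]
    return left + carrier + right


# One possible coding sequence for a 9-residue hemagglutinin tag (YPYDVPDYA);
# an illustrative assumption, other codon choices are equally valid.
HA_TAG = "TACCCATACGATGTTCCAGATTACGCT"

# Hypothetical 40-nt stretch of a cloned gene.
example_target = "ATGGCTAGCAAGGAGCTGTTCACCGGTGTTGTCCCAATCC"
oligo = design_insertion_oligo(example_target, 15, HA_TAG)
print(len(oligo), oligo)
```

The insertion point should fall on a codon boundary of the gene of interest so that the carrier sequence remains in frame; the sketch does not check this.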
Following transformation of an appropriate bacterial strain (e.g. CJ236) with the recombinant plasmid and plating of the bacteria on LB agar plates, a single resulting colony is grown in 4 x 5 ml of LB plus ampicillin for 1 hour at 37°C with vigorous shaking. M13K07 helper phage (2 ml, approximately 10^10-10^11 plaque forming units) is added and the bacteria are grown for an additional hour at 37°C with vigorous shaking. Following the addition of 7 µl of kanamycin (50 mg/ml), the bacteria are grown overnight at 37°C with vigorous shaking. The following day, bacterial cultures are pooled and cells are separated by centrifugation. After the addition of 2.6 ml of 20% polyethylene glycol 200-800/2 M NaCl to 20 ml of bacterial supernatant, the sample is incubated for 1-1.5 hours on ice. The sample is pelleted by centrifugation at 9000 rpm for 20 minutes. Following removal of the supernatant, residual supernatant is removed by centrifugation at 3000 rpm for 5 minutes. The pellet is resuspended in 400 µl of TE, extracted twice with phenol and four times with phenol:chloroform, and ethanol precipitated. The resulting pellet is resuspended in 40 µl TE.
Mutagenesis is performed using a Muta-Gene kit (Bio-Rad, Hercules, CA) according to the following method. To kinase the oligonucleotide primer, 1 µl (200 ng) of oligonucleotide is incubated in the presence of 2 µl of 10X kinase buffer (0.5 M Tris, pH 8.0, 70 mM MgCl2, 10 mM DTT), 2 µl 10 mM rATP, 2 µl polynucleotide kinase and 13 µl H2O at 37°C for 1 hour. To carry out the annealing and synthesis steps, 2.5 µl of single-stranded template are mixed with 1 µl of kinased oligonucleotide, 1.0 µl of 10X annealing buffer (200 mM Tris-HCl, pH 7.4, 20 mM MgCl2, 500 mM NaCl) and 5.5 µl H2O for 10 min at 65°C. The reaction mixture is slow-cooled to 37°C. Once the sample has reached 37°C, it is spun briefly in a microfuge. Following the addition of 1.0 µl of 10X synthesis buffer (5 mM each dATP, dCTP, dGTP, dTTP, 10 mM ATP, 100 mM Tris-HCl, pH 7.4, 50 mM MgCl2, 20 mM DTT), 1.0 µl T4 DNA ligase and 0.5 µl of T4 DNA polymerase, the sample is incubated for 5 minutes on ice, 5 minutes at room temperature and 1 hour at 37°C. A 2 µl aliquot of the sample is used to transform E. coli.
DNA is isolated from the transformed E. coli cells by mini prep methods known in the art (Ausubel et al., supra), and sequenced according to methods known in the art (described in Section D entitled "Isolation of a Wild Type Gene").
C. Production of a Nucleic acid Probe
The invention discloses nucleic acid probes. Preferably, the nucleic acid probes of the invention are specifically hybridizable to a mutant gene but not to a wild type form of a gene due to the presence of one or more polymorphisms. These allele specific probes can be used to screen DNA sequences of a gene which have been amplified by PCR, or are present in a genomic DNA or RNA test sample. Hybridization of a particular allele specific probe to an amplified gene sequence under stringent conditions (described below) indicates that the polymorphism contained in the probe is present in the amplified sequence. Hybridization of a particular allele specific probe to a test sample comprising genomic DNA or RNA under stringent conditions (described below) indicates that the polymorphism contained in the probe is present in the nucleic acid of the test sample. Nucleic acid probes that are specifically hybridizable to a wild type form of a gene but not to a mutant form of a gene are also useful according to the invention.
In another embodiment, the probes of the claimed invention will be specific for a nucleic acid region that is adjacent to a region that is thought to contain one or more polymorphisms. These probes will be useful for detecting the presence of one or more polymorphisms in the adjacent region by the method of primer extension (as described in Section F entitled "Identification and Characterization of Polymorphisms").
In other embodiments, probes of the claimed invention will be used to detect a gain or loss of a restriction enzyme site known to contain one or more polymorphisms of the claimed invention. Nucleic acid probes, according to this embodiment, are able to detect a restriction enzyme fragment that is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis. Probes that are useful according to this embodiment of the claimed invention can be specific for any region within a gene or outside of a gene.
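The gain or loss of a restriction enzyme site described above changes the set of fragment lengths obtained on digestion. The following minimal sketch computes fragment lengths for a linear sequence; the function name and example sequences are hypothetical, and the EcoRI site (GAATTC, cut after the first base) is used only as a familiar illustration.

```python
def fragment_sizes(seq, site, cut_offset=0):
    """Return fragment lengths after cutting a linear sequence at every
    occurrence of `site`; the cut falls `cut_offset` bases into the site."""
    seq = seq.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that arise when a cut falls at either end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b - a > 0]


# Hypothetical alleles: the variant changes GAATTC to GAGTTC and destroys the site.
allele_with_site = "AAAAGAATTCCCCCCCCC"
allele_without_site = "AAAAGAGTTCCCCCCCCC"

print(fragment_sizes(allele_with_site, "GAATTC", cut_offset=1))
print(fragment_sizes(allele_without_site, "GAATTC", cut_offset=1))
```

In practice the fragments of interest are those that overlap the probe and are large enough to resolve on an agarose gel; the sketch only reports lengths.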
The nucleic acid probes of the invention are useful for a variety of hybridization-based analyses including but not limited to Southern hybridization to genomic DNA, cDNA sequences or PCR amplification products, Northern hybridization to mRNA and RNase protection assays, DNA sequencing and isolation of genomic or cDNA clones of a gene. The probes may also be used to determine whether mRNA encoded by a gene is present in a cell or tissue by the method of in situ hybridization. These techniques are well known in the art and can be performed as described in Ausubel et al., supra.
According to the methods of the above-referenced hybridization assays, polymorphisms associated with alleles of a gene, which either predispose to a particular disease (e.g. osteoporosis) or are not associated with a particular disease (e.g. osteoporosis), will be detected by the formation of a stable hybrid consisting of a polynucleotide probe comprising one or more polymorphisms and a target sequence that also comprises one or more polymorphisms, under stringent to moderately stringent hybridization and wash conditions. If it is expected that the probes will be perfectly complementary to the target sequence, stringent conditions will be used. Hybridization stringency may be lessened if some mismatching is expected, for example, if variants are expected with the result that the probe will not be completely complementary. Conditions are chosen which rule out nonspecific/adventitious binding, that is, which minimize noise. Since such indications identify neutral DNA polymorphisms as well as mutations, these indications need further analysis (such as the assays described in Section F entitled "Identification and Characterization of Polymorphisms") to demonstrate detection of a susceptibility allele of a gene.
Probes for alleles of a gene may be derived from genomic DNA or cDNA sequences specific for the gene of interest. The probes may be of any suitable length and may span all or a portion of the region containing the gene. If the target sequence contains a sequence identical to that of the probe, the probes may be short, e.g., in the range of about 8-30 base pairs, since the hybrid will be relatively stable under even stringent conditions. If some degree of mismatch is expected with the probe, i.e., if it is suspected that the probe will hybridize to a variant region, a longer probe may be employed which hybridizes to the target sequence with the requisite specificity.
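As an illustration of the allele specific probes discussed above, the following minimal sketch builds a pair of short probes centered on a single polymorphic position and counts the mismatches each would have against a target region. The function names and the example sequence are hypothetical. For simplicity all sequences are written on the same strand; an actual probe would be the reverse complement of the strand it hybridizes to.

```python
def allele_specific_probes(reference, snp_index, alt_base, flank=8):
    """Return (reference_probe, variant_probe), each 2*flank + 1 bases long
    and centered on the polymorphic position `snp_index` (0-based)."""
    reference = reference.upper()
    alt_base = alt_base.upper()
    if snp_index - flank < 0 or snp_index + flank >= len(reference):
        raise ValueError("not enough flanking sequence around the polymorphic site")
    if alt_base == reference[snp_index]:
        raise ValueError("variant base must differ from the reference base")
    left = reference[snp_index - flank:snp_index]
    right = reference[snp_index + 1:snp_index + 1 + flank]
    return left + reference[snp_index] + right, left + alt_base + right


def count_mismatches(probe, target_region):
    """Count positions at which two equal-length sequences differ."""
    if len(probe) != len(target_region):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(probe, target_region) if a != b)


# Hypothetical reference sequence with a C/T polymorphism at index 10.
ref = "ACGTACGTAGCTAGCTAGGATCCA"
ref_probe, var_probe = allele_specific_probes(ref, 10, "T")
print(ref_probe, var_probe, count_mismatches(ref_probe, var_probe))
```

A 17-base probe with a single central mismatch is the kind of case in which stringent conditions are needed to discriminate the two alleles.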
Probes according to the invention also include an isolated polynucleotide attached to a label or a reporter molecule, which may be useful for isolating other polynucleotide sequences having sequence similarity by standard methods, including but not limited to the above-referenced hybridization-based assays. Techniques for preparing and labeling probes (as described in Ausubel et al., supra) are included below. A wide variety of labels and conjugation techniques are known by those skilled in the art and can be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization or PCR probes for detecting related sequences include oligolabeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the protein-encoding sequence, or any portion of it, may be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labeled nucleotides.
A number of companies such as Pharmacia Biotech (Piscataway NJ), Promega (Madison WI) and US Biochemical Corp (Cleveland OH) supply commercial kits and protocols for these procedures. Suitable reporter molecules or labels include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles and the like. Patents teaching the use of such labels include US Patents 3,817,838; 3,350,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant immunoglobulins may be produced as shown in US Patent No. 4,816,567, incorporated herein by reference. Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention may be derived from naturally occurring or recombinant single- or double-stranded polynucleotides, or be chemically synthesized.
Portions of the polynucleotide sequence having at least approximately 5 nucleotides, preferably 9-15 nucleotides, fewer than about 6 kb and usually fewer than about 1 kb, from a polynucleotide sequence encoding a gene are preferred as probes.
A DNA probe useful according to the present invention can be isolated from a gene or a polynucleotide construct derived from a gene, or from a cDNA sequence specific for a gene or a cDNA construct specific for a gene by the methods of PCR or restriction enzyme digestion, as described above. Riboprobes useful according to the invention can be synthesized by the method of in vitro transcription, or by chemical synthesis methods, as described above.
An oligonucleotide probe useful according to the invention can be designed, as described above, and synthesized in a commercially available automated synthesizer.
Nucleic acid hybridization rate and stability will be affected by a variety of experimental parameters including salt concentration, temperature, the presence of organic solvents, the viscosity of the hybridization solution, the base composition of the probe, the length of the duplex, and the number of mismatches between the hybridizing nucleic acids (Ausubel et al., supra), as described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
Southern blot analysis can be used to detect sequence variations in a gene from a PCR amplified product or from a total genomic DNA test sample via a non-PCR based assay. The method of Southern blot analysis is well known in the art (Ausubel et al., supra; Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY). This technique involves the transfer of DNA fragments from an electrophoresis gel to a membrane support, resulting in the immobilization of the DNA fragments. The resulting membrane carries a semipermanent reproduction of the banding pattern of the gel.
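The dependence of hybrid stability on probe length and base composition noted above can be illustrated with the Wallace rule, a widely used rule of thumb for short oligonucleotides (roughly 14-20 bases): Tm in °C is approximately 2 x (A + T) + 4 x (G + C). The rule is not part of this disclosure, and it ignores salt concentration, organic solvents and mismatches; the function name below is hypothetical.

```python
def wallace_tm(oligo):
    """Estimate the melting temperature (deg C) of a short oligonucleotide
    with the Wallace rule: 2 deg C per A/T and 4 deg C per G/C."""
    oligo = oligo.upper()
    if not oligo or set(oligo) - set("ACGT"):
        raise ValueError("oligo must be a non-empty string of A, C, G and T")
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# A GC-rich probe is predicted to form a more stable hybrid than an
# AT-rich probe of the same length.
print(wallace_tm("ACGTACGTACGTAC"))
print(wallace_tm("GCGCGCGCGCGCGC"), wallace_tm("ATATATATATATAT"))
```

Estimates of this kind are only a starting point; hybridization and wash conditions are normally adjusted empirically, as the text describes.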
Southern blot analysis is performed according to the following method. Genomic DNA (5-20 µg) is digested with the appropriate restriction enzyme and separated on a 0.6-1.0% agarose gel in TAE buffer. The DNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g. Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art (Ausubel et al., supra; Sambrook et al., supra). Following transfer and UV cross linking, the membrane is hybridized with a radiolabeled probe in hybridization solution (e.g. under stringent conditions in 5X SSC, 5X Denhardt solution, 1% SDS) at 65°C. Alternatively, high stringency hybridization can be performed at 68°C or in a hybridization buffer containing a decreased concentration of salt, for example 0.1X SSC. The hybridization conditions can be varied as necessary according to the parameters described in Section A entitled "Design and Synthesis of Oligonucleotide Primers". Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS and at 65°C in 0.2X SSC/0.1% SDS, and exposed to film. The stringency of the wash buffers can also be varied depending on the amount of background signal (Ausubel et al., supra).
Detection of a nucleic acid probe-target nucleic acid hybrid will include the step of hybridizing a nucleic acid probe to the DNA target. This probe may be radioactively labeled or covalently linked to an enzyme such that the covalent linkage does not interfere with the specificity of the hybridization. A resulting hybrid can be detected by means of the label on the probe. Methods for radioactively labeling a probe include random oligonucleotide primed synthesis, nick translation or kinase reactions (see Ausubel et al., supra). Alternatively, a hybrid can be detected via non-isotopic methods. Non-isotopically labeled probes can be produced by the addition of biotin or digoxigenin, fluorescent groups, chemiluminescent groups (e.g. dioxetanes, particularly triggered dioxetanes), enzymes or antibodies. Typically, non-isotopic probes are detected by fluorescence or enzymatic methods. Detection of a radiolabeled probe-target nucleic acid complex can be accomplished by separating the complex from free probe and measuring the level of complex by autoradiography or scintillation counting. If the probe is covalently linked to an enzyme, the enzyme-probe conjugate-target nucleic acid complex will be isolated away from the free probe-enzyme conjugate and a substrate will be added for enzyme detection. Enzymatic activity will be observed as a change in color development or luminescent output, resulting in a 10^3- to 10^6-fold increase in sensitivity. An example of the preparation and use of nucleic acid probe-enzyme conjugates as hybridization probes (wherein the enzyme is alkaline phosphatase) is described in Jablonski et al., 1986, Nucleic Acids Res., 14:6115.
Two-step label amplification methodologies are known in the art. These assays are based on the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid probe capable of specifically binding to a gene. Allele specific gene probes are also useful according to this method.
According to the method of two-step label amplification, the small ligand attached to the nucleic acid probe will be specifically recognized by an antibody-enzyme conjugate. For example, digoxigenin will be attached to the nucleic acid probe and hybridization will be detected by an antibody-alkaline phosphatase conjugate wherein the alkaline phosphatase reacts with a chemiluminescent substrate. For methods of preparing nucleic acid probe-small ligand conjugates, see Martin et al., 1990, BioTechniques, 9:762. Alternatively, the small ligand will be recognized by a second ligand-enzyme conjugate that is capable of specifically complexing to the first ligand. A well known example of this manner of small ligand interaction is the biotin-avidin interaction. Methods for labeling nucleic acid probes and their use in biotin-avidin based assays are described in Rigby et al., 1977, J. Mol. Biol., 113:237 and Nguyen et al., 1992, BioTechniques, 13:116.
Variations of the basic hybrid detection protocol are known in the art, and include modifications that facilitate separation of the hybrids to be detected from extraneous materials and/or that amplify the signal from the labeled moiety. A number of these modifications are reviewed in, e.g., Matthews & Kricka, 1988, Anal. Biochem., 169:1; Landegren et al., 1988, Science, 242:229; Mittlin, 1989, Clinical Chem. 35:1819; U.S. Pat. No. 4,868,105; and EPO Publication No. 225,807.
D. Isolation of a Wild type gene
A wild type version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."
The sequence of the cloned gene will be determined by sequencing methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra). Methods of sequencing employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical Corp, Cleveland, OH), Taq polymerase (Perkin Elmer, Norwalk, CT), thermostable T7 polymerase (Amersham, Chicago, IL), or combinations of recombinant polymerases and proofreading exonucleases such as the ELONGASE Amplification System (Gibco BRL, Gaithersburg, MD). Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI 377 DNA sequencers (Perkin Elmer).
E. Isolation of a Mutant Gene
A mutant version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."
The sequence of the cloned gene will be determined by sequencing methods described in Section D entitled "Isolation of a Wild Type Gene."
F. Identification and Characterization of Polymorphisms
a. Identification of SNPs by in silico methods (isSNPs)
1. Identification of Polymorphisms in Candidate Genes
The starting point is a set of experimentally derived nucleic acid sequences. In order to be useful for SNP discovery by the invention, it is preferred that the sequences have complete chromatogram files from a gel or capillary electrophoresis sequencing machine. When these are not available, quality score data which assign a score to each base in the sequence indicating the likelihood of error for the basecall may be used. If neither of these data are available, the sequence may be used to assist the clustering of other sequences and in some cases to provide additional verification for a discovered SNP, but is not used by the invention for the identification of the polymorphism. The population of sequences used may constitute either a database of cDNA-derived sequences or genomic sequence. In a preferred embodiment, sequences used by the invention are from an assembled cDNA database, such as the LifeSeqGold database (Incyte Genomics, Inc. (Incyte), Palo Alto, CA).
Derivation of Nucleic Acid Sequences
cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines. The human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Pharmaceuticals, Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources. Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
Sequencing of the cDNAs
Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S. Biochemical Corporation, Cleveland OH), Taq polymerase (The Perkin-Elmer Corporation (Perkin-Elmer), Norwalk CT), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Perkin-Elmer). Sequencing can be carried out using, for example, the ABI 373 or 377 (Perkin-Elmer) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
The nucleotide sequences have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art. Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
Assembly of cDNA Sequences
Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single transcript or from many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
Alternatively, cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed and verified, and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (see, e.g., the LIFESEQ Assembled User Guide, Incyte Pharmaceuticals, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available. When additional sequences are added into the RDMS, a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves. After the new sequences have been assigned to templates, the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
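The masking step described above, in which repetitive elements are replaced by "n's", can be sketched for the simple case of dinucleotide repeats. This is an illustrative simplification, not the Block 1 editing pathway itself; the function name and the minimum repeat length are assumptions.

```python
import re


def mask_dinucleotide_repeats(seq, min_units=6):
    """Replace runs of at least `min_units` copies of a dinucleotide
    (two different bases, e.g. CACACA...) with the same number of 'n'."""
    seq = seq.upper()
    # Group 1 is a dinucleotide whose second base differs from its first
    # (group 2); it must then repeat at least min_units - 1 more times.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_units - 1))
    return pattern.sub(lambda m: "n" * len(m.group(0)), seq)


# Hypothetical read containing a (CA)8 repeat.
read = "ACGT" + "CA" * 8 + "GGTT"
print(mask_dinucleotide_repeats(read))
```

Masking preserves the sequence length, so base coordinates in the masked read still correspond to positions in the original read.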
A resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
Analysis of the cDNA Sequences
The cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853). These analyses comprise reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches. Computer programs known to those of skill in the art for performing computer-assisted searches for amino acid and nucleic acid sequence similarity include, for example, the Basic Local Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be searched for sequences containing regions of homology to a query rbosm or RBOSM of the present invention. Other approaches to the identification, assembly, storage, and display of nucleotide and polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information," U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998, all of which are incorporated by reference herein in their entirety.
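The analysis of potential start and stop codons mentioned above can be illustrated with a minimal open reading frame scan. The sketch examines only the three forward reading frames and uses the standard ATG start codon and TAA, TAG and TGA stop codons; the function name, the minimum length and the example sequence are hypothetical, and the sketch is not one of the programs cited above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (frame, start, end) for each ATG...stop open reading frame on
    the forward strand; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Hypothetical sequence with one short ORF in frame 2: ATG AAA TTT GGG TAA.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```

A fuller analysis would also scan the reverse complement and report reading frames that run off the end of a partial cDNA without a stop codon.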
Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
Identification of Sequence Variants and Polymorphisms
The method comprises a series of filters to distinguish isSNPs from other sequencing variants and errors. The filters can be grouped into the following five sets of filters by the order of application in the method:
Preliminary Filters: the main filter in the first group removes the majority of base call errors by requiring a minimum phred quality score of 15. Additional filters at this stage deal with sequence alignment errors as well as errors resulting from improper trimming of vector sequence, chimeras and splice junctions.
Advanced Chromatogram Analysis: additional base call errors are then detected by examining the original chromatogram files in the vicinity of a putative SNP by an automated procedure, resulting in a set of SNPs wherein the base call error rate is reduced to less than 5%.
Clone Error Filters: errors introduced during laboratory processing, such as those caused by reverse transcriptase, polymerase or somatic mutation, are among the most difficult to distinguish from true SNPs. The Clone Error filters use statistically generated algorithms to identify these sources of error. A small percentage of actual SNPs will be discarded at this stage.
Clustering Error Filters: these types of errors result from the incorrect clustering of close homologs or pseudogenes, or from contamination by nonhuman sequences. The filters developed to minimize these clustering errors are also statistically based. As above, these filters may reject a fraction of actual SNPs.
Finishing Filters: these filters remove duplicate and redundant SNPs from the generated list of SNPs, and remove SNPs which are from the hypervariable regions of hypervariable genes such as immunoglobulin and T cell receptors.
Pre-processing steps
The sequences must first be trimmed to eliminate vector sequence, contamination and repetitive sequences. Then certain low information content sequences (for example, long runs of a single base, or two or three-base repeats) and repetitive sequences (for example Alu sequences in humans) must be masked (changed to N's) to prevent over-clustering errors. The clustering process then identifies the sets of sequences that are believed to be derived from the same original DNA sequence or gene. The sequences in each cluster are then aligned using a method such as phrap, which also defines a consensus sequence. It will be well recognized by those skilled in the art that there are numerous existing programs for carrying out these processes, and the SNP discovery process described herein will work equally well with any of them. In the instant embodiment, the preferred processes are Blocked 1 for trimming and masking, a variety of different algorithms for clustering, and phrap for the alignment. It will be recognized by those skilled in the art that phrap and other alignment methods carry out a secondary clustering step which divides clusters into contigs, and carry out a secondary trimming step which defines the end points of the portion of each sequence which participates in the contig. The contigs may then be searched for the occurrence of SNPs.
Errors in the trimming, clustering and alignment processes will cause SNP discovery errors, usually false positives (the prediction of SNPs where they do not exist). Additional filters which are the subject of the invention are designed to recognize and remove these errors by providing the ability to identify likely errors in the processes and to correct them.
In some instances, it is preferred, as an optional step, to unmask after clustering those regions of sequence which were masked (because of low information content or repetitive sequence) during the clustering process, to allow discovery of SNPs within these regions.
Identification of Candidate SNP Sequences
The first step in identifying candidate SNP sequences is to redefine the end points of each sequence as the points within the previous end points where a stretch of at least 10 consecutive base calls, containing at least eight base changes, matches the consensus sequence exactly. Sequence trimming errors (both at the single sequence stage and at the alignment stage) contribute to the false positives when foreign sequence (vector, chimera or splice variant) is similar to the real sequence and the true boundary is difficult to determine. This step is a conservative approach to avoid false positives and also filters out lower-quality sequence at the ends. The reason the length of the match with the consensus is measured in base changes is to avoid low significance matches on repetitive sequence such as polyA.
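The end-point rule above can be sketched as follows. This is a minimal illustration, not the implementation used in the invention: it assumes the read and consensus are already aligned as equal-length strings, and it interprets a "base change" as a position where a base differs from the base immediately before it. The function names are hypothetical.

```python
def count_base_changes(bases):
    """Count positions where a base differs from the one before it."""
    return sum(1 for a, b in zip(bases, bases[1:]) if a != b)


def redefine_end_points(read, consensus, min_len=10, min_changes=8):
    """Return (start, end) of the trusted part of an aligned read, or None.

    A window anchors an end point if it is min_len bases long, matches the
    consensus exactly, and contains at least min_changes base changes
    (so that runs such as polyA do not qualify). `end` is exclusive.
    """
    def anchors(i):
        window = read[i:i + min_len]
        return (window == consensus[i:i + min_len]
                and count_base_changes(window) >= min_changes)

    starts = [i for i in range(len(read) - min_len + 1) if anchors(i)]
    if not starts:
        return None
    # First anchoring window fixes the new start, last one fixes the new end.
    return starts[0], starts[-1] + min_len
```

A read consisting only of a homopolymer run never anchors under this rule, which is the behavior the text motivates for polyA stretches.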
The next step is, at each position of the alignment, to compare the base calls of all the aligned sequences which are between their start and end positions, which have quality scores greater than a set threshold, which have neighboring base calls that agree with the consensus sequence, and where the neighboring base calls also have a quality score greater than the threshold. Preferably the threshold is a phred quality score greater than or equal to 15. The possibilities are A, C, G, T, and - (deletion).
The next step is a Clone Filter where, if there has been more than one base call for a sequence position, the clone for each sequence is identified and the sequences corresponding to each clone are compared. If the base calls for different sequences from the same clone disagree, then all the sequences for this clone at this base position are removed from consideration. After all of these filters, positions for which there is more than one base call are candidate SNPs. The "wild type" base call is the one in the consensus sequence and the others are designated candidate SNPs. If the wild type base call is a deletion, then the SNP is considered to be an insertion at the previous base.
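The base call comparison and Clone Filter described above might look like the following sketch. It assumes each read has already been trimmed and padded to the length of the consensus, and it represents a read as a dictionary with hypothetical keys `clone`, `bases` and `quals`; none of these names come from the source.

```python
from collections import defaultdict

PHRED_MIN = 15  # preferred threshold stated in the text


def usable_call(read, pos, consensus):
    """True if the call and both neighbors pass quality and the neighbors match the consensus."""
    bases, quals = read["bases"], read["quals"]
    if pos - 1 < 0 or pos + 1 >= len(bases):
        return False
    if any(quals[p] < PHRED_MIN for p in (pos - 1, pos, pos + 1)):
        return False
    return (bases[pos - 1] == consensus[pos - 1]
            and bases[pos + 1] == consensus[pos + 1])


def candidate_alleles(reads, pos, consensus):
    """Return the set of alleles at pos if it is a candidate SNP, else an empty set."""
    calls_by_clone = defaultdict(set)
    for read in reads:
        if usable_call(read, pos, consensus):
            calls_by_clone[read["clone"]].add(read["bases"][pos])
    # Clone Filter: a clone whose own reads disagree is dropped at this position.
    alleles = {next(iter(calls))
               for calls in calls_by_clone.values() if len(calls) == 1}
    # More than one surviving base call makes the position a candidate SNP.
    return alleles if len(alleles) > 1 else set()
```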
Automated Chromatogram Checking
The next filters require opening of the chromatogram files for the sequences identified as containing candidate SNPs. At each candidate SNP position, the chromatogram data of each sequence passing the Identification Filters is extracted. The first step in this process utilizes a program ABIdump to translate binary ABI chromatogram files into usable form.
Multiple Base Call Algorithm filter: the ABI base calls for each sequence are compared to the phred base calls. If the base calls do not agree at the SNP position and the two adjacent flanking positions, then the sequences are removed from consideration.
Intensity Filter: if the SNP is a single base change (this step is skipped for insertions and deletions), then the processed intensity values for each of the four bases at the chromatogram location of the candidate SNP base call are used to compute a ratio. If we call the intensity of the wild type base "wt", the intensity of the SNP base "snp", the minimum of the other two "min", and the phred quality of the base call "Q", then the wild type sequences must have
(snp-min) < (wt-min)(Q-17)/37 and Q>=17 to be considered high-quality, and (snp-min) < (wt-min)(Q-4)/37 and Q>=15 to be considered a low-quality pass.
The basis for these formulas is that if a base is miscalled, then there is likely to be a residual peak for the correct base. The larger the peak for the wild type base, the less likely that the call of the SNP is correct. The actual thresholds in the formulas are based on empirical data from clones which were sequenced multiple times and which gave a set of confirmed SNPs and error rates for algorithm optimization.
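As an illustration only, the two inequalities given for wild type sequences can be written as a small classifier. The function name and the returned labels are hypothetical; the constants 17, 4, 15 and 37 are those stated in the formulas.

```python
def wild_type_pass_level(wt, snp, other_min, q):
    """Classify a wild type read at a candidate SNP position.

    wt        -- chromatogram intensity of the wild type base
    snp       -- intensity of the candidate SNP base
    other_min -- minimum intensity of the other two bases
    q         -- phred quality of the base call
    """
    residual = snp - other_min   # residual peak for the competing base
    signal = wt - other_min      # peak for the called (wild type) base
    if q >= 17 and residual < signal * (q - 17) / 37:
        return "high"
    if q >= 15 and residual < signal * (q - 4) / 37:
        return "low"
    return "fail"
```

Note that at Q = 17 the high-quality bound is zero, so such a read can pass only at the low-quality level.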
The candidate SNP passes only if at least one wild type sequence passes and at least one SNP sequence passes. The quality of the candidate SNP is the lower of the highest wild type pass level and the highest SNP pass level (if there is a high-quality wild type sequence but only low-quality SNP sequences, then the candidate is low quality). A SNP quality value is returned.
Clone Error Quality Filters (somatic mutation/reverse transcriptase/polymerase errors)
The purpose of these filters is to remove errors which are actually in the clone, that is, the clone sequence was correct but the clone does not represent the individual being sequenced. Three possible sources of these errors are somatic mutations, errors made by reverse transcriptase in the process of making cDNA, and DNA polymerase errors in those situations where the DNA has been amplified by PCR at some point prior to inserting in the cloning vector. Somatic mutations can be a particular problem in sequencing clones derived from cell lines.
Polymerase errors are specific to the type of sequencing protocol used. For example, reverse transcriptase is involved in EST sequencing but not genomic clone sequencing. Polymerase is involved in the creation of extension clones (polymerase is used in all sequencing reactions, but errors are less likely to arise because only a fraction of the templates are affected in contrast to the extension process where a single polymerase product becomes a template for the entire reaction). This filter is not applied to genomic sequences in the current embodiment on the premise that the genomic sequences do not have polymerase errors, and that somatic mutations are likely to have the same profile as real SNPs.
This filter also filters out rare SNPs as well as apparent SNPs which are not real. It is difficult to determine and confirm by experiments to what extent SNP candidates are too rare to be confirmed vs. simply not real. For many applications, very rare SNPs are of less utility than common ones such that this is not a problem; however in some applications it may be advisable to turn this filter off.
Base change sequence analysis filter
The premise of this filter is that the probabilities of different mutations differ depending on the source. For example, true SNPs may be mostly transitions whereas reverse transcriptase mutations could be primarily G to T mutations. While this does not allow one to determine for sure that a given change is a true SNP, it allows one to evaluate the relative likelihood that a given mutation is a true SNP. SNP confirmation data suggest that G/T SNP candidates in which there is only one clone having the T allele have a very low probability of being real SNPs. These SNP candidates are excluded from the high confidence set (they are kept in a different file; their confirmation rate is well below 50 percent). The other set which had a very low confirmation rate is any A/T SNP.
Frequency Filter
This filter is based on the concept that true SNPs have a different frequency profile than clone errors and that a candidate SNP which is evident in only one clone in a deep alignment is less likely to be real than one which appears in one clone in a shallow alignment. The likelihood of finding a SNP at a given sequence location is a function of the number of chromosomes sequenced. This curve is distinctly non-linear, as most SNPs are sufficiently frequent to be found with relatively few sequences. The probability of an error of this type, however, is essentially linear in the number of sequences since the chance of the change occurring in two different sequences is independent. This means that the probability that a candidate SNP observed in a single clone is a true SNP is lower if the alignment is deep than if it is shallow. Any SNP occurring in a single clone in an alignment of more than 20 clones (counting only high-quality sequences which have a chance of contributing a candidate SNP) is excluded from the high confidence set.
This filter is the basis of a secondary method used to develop the base change sequence analysis filter. Comparing the set of single clone SNPs from shallow alignments with those from deep alignments, which are more likely to be errors, will reveal base changes which are more likely to be associated with polymerase errors and somatic mutations.
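The base change and frequency rules described above can be combined into one membership test for the high confidence set. The sketch below assumes a biallelic candidate described by a hypothetical dictionary mapping each allele to the number of high-quality clones supporting it, and treats the alignment depth as the sum of those counts.

```python
DEEP_ALIGNMENT = 20  # clone count above which a single-clone SNP is excluded


def in_high_confidence_set(clone_counts):
    """clone_counts maps each of the two alleles to its number of supporting clones."""
    alleles = set(clone_counts)
    depth = sum(clone_counts.values())
    # Base change filter: any A/T candidate is excluded.
    if alleles == {"A", "T"}:
        return False
    # Base change filter: G/T candidates with a single clone carrying T are excluded.
    if alleles == {"G", "T"} and clone_counts["T"] == 1:
        return False
    # Frequency filter: single-clone candidates in deep alignments are excluded.
    if min(clone_counts.values()) == 1 and depth > DEEP_ALIGNMENT:
        return False
    return True
```

Candidates rejected here would be kept in a separate file rather than deleted, as the text describes for the G/T case.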
Clustering Error Filters
These filters are intended to remove candidate SNPs which result from the incorrect clustering of similar sequences such as highly homologous genes, similar genomic sequences, and contamination from other species where the sequences of the species have been mislabeled as human.
Number of base change filter
This filter distinguishes homologous sequences from SNPs on the basis of the frequency of variants. True SNPs occur about once per kb when comparing two sequences, or once per 2 kb if the length of both sequences is included, and this fraction decreases as the depth of the alignment increases. Since EST sequences tend to be about 500 bp or less in length, no more than one SNP per four sequences would be expected. The number of SNPs in the cluster is divided by the number of sequences in the cluster, and SNPs for which this number is larger than one are discarded. The higher the number, the less likely the SNP is to be real. The threshold value of one was chosen because it appears to correspond to roughly a 50 percent success rate; however, the threshold value could be adjusted to a higher value to accept lower confidence SNPs.
Distance from next polymorphism filter
This filter calculates the number of SNPs for which the sequence is the only representative within a window of 100 bases on either side, and discards any of the SNPs for which there is more than one other SNP in this window. This threshold can be set higher, but the actual fraction of SNP candidates which are true SNPs drops off to less than 50 percent.
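The two clustering error filters described so far reduce to simple counting, sketched below. This is one reading of the text, not a confirmed implementation: the second function assumes its input is the list of distinct alignment positions of the SNPs for which a given sequence is the only representative. Function and parameter names are hypothetical.

```python
def passes_base_change_ratio(num_snps, num_sequences, max_ratio=1.0):
    """Number of base change filter: discard when SNPs per sequence exceeds max_ratio."""
    return num_snps / num_sequences <= max_ratio


def filter_crowded_snps(positions, window=100, max_neighbors=1):
    """Distance filter: keep SNPs with at most max_neighbors other SNPs within the window."""
    kept = []
    for p in positions:
        neighbors = sum(1 for q in positions if q != p and abs(q - p) <= window)
        if neighbors <= max_neighbors:
            kept.append(p)
    return kept
```

Raising `max_ratio` or `max_neighbors` corresponds to the adjustable thresholds mentioned in the text, at the cost of a lower expected confirmation rate.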
Haplotype clustering filter
When sequences from different sources are inappropriately clustered, it is possible to divide them into two or more clusters which are consistent. In particular, if we take any two differences between homologs and consider the haplotypes of the clones which overlap both SNPs, there are only two haplotypes. In other words, a 2x2 matrix of haplotypes is diagonal, having only two non-zero entries. If there are only two sequences, then this is expected. For each SNP, a 2x2 haplotype matrix with each other SNP is computed. If it is diagonal, and there are more than two sequences, then the sum of the diagonal elements minus one is a "cluster total" for this SNP. This "cluster total" number has proven to be empirically correlated with the confirmation rate, probably because it predicts clusters which contain paralogs, homologs and contamination from other species. Candidate SNPs which have a cluster total of less than eight are kept. This threshold value for the cluster total can be varied.
Redundancy/finishing filters
Redundant SNP filter: SNPs in different contigs of the same gene which have the same base change and surrounding sequence are flagged as redundant. To accommodate possible splice variants, this redundancy filter also applies to SNPs for which the surrounding sequence matches on only one side.
T cell receptor/immunoglobulin filters
Sequences containing SNPs are filtered to remove SNPs in sequences that are homologous to T cell receptor and immunoglobulin genes, because both types of genes have hypervariable regions which could result in false positives.
Output file
SNP related data: With each candidate SNP a variety of data is kept, including the number and sources of all contributing sequences (for example gene album, HTPS, FL, WashU/Merck, etc.), the surrounding sequence, measures of the ratio and quality scores for the "best" sequence representing each allele, etc.
Sequence related data: for each sequence associated with each SNP, the following data is kept including the distance in each direction to the end of the sequence, the distance in each direction to the next base different from the consensus and passing the initial quality filters, the library, tissue ID, donor ID and comments (for example tumor, diseases, normal).
b. Identification of polymorphisms in osteoporosis associated genes by SSCP
The invention provides methods for detecting the presence of polymorphisms in candidate genes of the invention. The invention also provides methods for distinguishing polymorphisms which contribute to a particular disease (e.g. osteoporosis) over polymorphisms which do not contribute to the disease.
1. Identification of Polymorphisms in Candidate Genes
Identification of polymorphisms in a candidate gene, according to the invention, will involve the steps of isolating the candidate gene, determining its genomic structure and identifying polymorphisms in the DNA sequences in any portion of the entire protein-coding region. The invention also provides methods for identifying polymorphisms in the DNA sequences corresponding to RNA splice junctions. The invention also provides methods for identifying polymorphisms in the DNA sequence corresponding to the regulatory (promoter) region of the candidate gene.
A candidate gene is isolated by cloning methods well known in the art (described above). Preferably the genomic structure of a candidate gene is determined by Southern blot analysis, as described in Section C. It is expected that the entire sequence of an open reading frame (ORF) of an average entire gene can be spanned by 16 PCR-amplified DNA fragments or amplimers of an average length of 225 bp. It is expected that a smaller gene can be spanned by 1-2 amplimers and that >50 amplimers are required to span extremely large genes. Primers useful for production of the amplimers of a particular candidate gene are designed based on preexisting knowledge of the sequence of the wild type gene, according to the primer design strategies described in Section A entitled "Design and Synthesis of Oligonucleotide Primers."
For PCR amplification of a region to be tested by SSCP it is preferable to design primers that amplify overlapping regions of the candidate gene. If a sequence variation is located in a region of a candidate gene that corresponds to the region to which the primers hybridize, the primers will likely not bind, the region containing this sequence variation will not be amplified and the variation will not be detected in PCR based assays. By producing overlapping amplimers it is expected that virtually all of the sequence variations in a particular candidate gene will be detected. The amount of overlap in the amplimers is somewhat variable (approximately 20%) and the precise location of the overlapping regions will depend on the location of regions comprising a sequence that is an appropriate primer sequence. It is a possibility that a polymorphism will be located at a position just adjacent to the primer site. Consequently, sequence information will be available for only 20 bp on one side of the polymorphism and for 104-279 bp on the other side of the polymorphism. However, this should be a sufficient amount of sequence information to allow definition of a unique sequence context in which to define the particular polymorphism. Based on screening analysis of 92 samples (184 chromosomes), it is expected that about 50% of the amplimers will demonstrate polymorphisms, and that approximately 80% of these amplimers will detect changes at single positions while the remaining 20% will detect base changes at two positions. Based on these estimates, it is expected that there will be approximately 10 sequence variations per open reading frame. However, the number of amplimers that demonstrate polymorphisms will vary depending on the number of individuals tested, the ethnicity and structure of the population being tested, and the region of DNA being tested.
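The estimate of approximately 10 sequence variations per open reading frame follows from the figures quoted above, as the short calculation below shows. It assumes the average gene of 16 amplimers mentioned earlier; the function name is hypothetical.

```python
def expected_variations(num_amplimers=16, frac_polymorphic=0.5, frac_single=0.8):
    """Expected sequence variations per ORF from the screening estimates."""
    polymorphic_amplimers = num_amplimers * frac_polymorphic
    # 80% of polymorphic amplimers carry one change, the remaining 20% carry two.
    changes_per_amplimer = frac_single * 1 + (1 - frac_single) * 2
    return polymorphic_amplimers * changes_per_amplimer
```

With the default values this gives 8 polymorphic amplimers at 1.2 changes each, about 9.6 variations, which rounds to the stated figure of approximately 10.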
Preferably, each polymorphism will be detected in the context of an SSCP fragment. Polymorphism analysis by fluorescent SSCP (fSSCP, described in detail in Section F entitled "Identification and Characterization of Polymorphisms") uses PCR to generate an amplimer of DNA to be studied. The region to be tested is defined as the region between the primers (i.e. the region that is incorporated into the PCR product and reflects the sequence of the DNA sample being tested). The PCR primers do not reflect the sequence of the DNA sample being tested; they are incorporated into the PCR product as one end of each strand of DNA in the PCR product. If a polymorphism occurs in a primer binding site, either the PCR primer does not bind due to the mismatch and the PCR will not produce a product, or the primer binds and an amplification step occurs wherein the primer is incorporated, but the amplified product does not contain the polymorphism which occurs at the primer binding site. Therefore, fSSCP provides a method of screening a DNA sequence located between PCR primers for the presence of polymorphisms. The sensitivity of the technique of fSSCP for detecting a polymorphism is affected by length, such that there is a substantial decrease in the detection of polymorphisms in amplimers that are greater than 300 bp in length. However, different conditions for performing SSCP at high sensitivity with larger fragments, e.g. 800-1500 bp, have also been described. If the length of DNA screened per amplimer is decreased then more amplimers are required to screen a region of a given size. Therefore, efficient screening of a gene dictates that the lower limit of the size of an amplimer is 125 bp. To attain specificity for a particular gene sequence, primers are usually 20-25 bp in length, and additional criteria such as G:C content, and intra- and inter-primer complementarity are important considerations in primer design (as described above).
All of these considerations are addressed if the primer3 program (Copyright (c) 1996 Whitehead Institute for Biomedical Research) is employed to design pairs of primers suitable for use in a single PCR reaction. Typically, program parameters are set so that multiple amplimers are designed in the length range of 150-300 bp, with predicted primer melting temperatures in the narrow range 60-62°C. The narrow temperature range increases the likelihood that a single set of PCR conditions can be used to generate a wide variety of different amplimers.
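For illustration, the parameter choices described above could be passed to primer3 as a Boulder-IO input record. The sketch below only builds the record text. The tag names are those documented for current Primer3 releases and are an assumption here, since the 1996 release cited in the text may have used different names; the helper function is hypothetical.

```python
def primer3_record(seq_id, template):
    """Build a Boulder-IO record using the amplimer and primer constraints from the text."""
    settings = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "PRIMER_PRODUCT_SIZE_RANGE": "150-300",  # amplimer length range, bp
        "PRIMER_MIN_TM": "60.0",                 # narrow melting temperature range
        "PRIMER_MAX_TM": "62.0",
        "PRIMER_MIN_SIZE": "20",                 # primers are usually 20-25 bp
        "PRIMER_MAX_SIZE": "25",
    }
    lines = [f"{tag}={value}" for tag, value in settings.items()]
    lines.append("=")  # a lone "=" terminates a Boulder-IO record
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file or piped to the primer3 executable; how the program is invoked depends on the installed version.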
If it is desirable to screen a contiguous stretch of DNA which is larger than the maximum fragment size desired for sensitive polymorphism detection by fSSCP (300 bp), it is necessary to use multiple amplimers (which are assayed separately) which span the region of interest. Since the primer sites in an amplimer are not tested, these sequences need to be contained within another amplimer. To test the primer sequence, overlapping amplimers are designed by an algorithm that evaluates a large number of amplimers generated by the primer3 program for the optimum overlapping set according to a cost function. Thus, a series of overlapping PCR amplification products can be used to test a contiguous stretch of DNA. Constraints on primer design are such that the absolute minimum overlap is rarely possible. As a result, some regions of overlap occur that result in 'double testing' of a particular segment of DNA. The detection efficiency is affected by the sequence context of the polymorphism; it is possible that a polymorphic site will be detected in only one of two different amplimers which overlap the same site. One strategy that is useful for increasing polymorphism detection efficiency is to design overlapping amplimers to generate 2-fold coverage of all sequences.
SSCP does not detect 100% of polymorphisms. The invention provides for detection of polymorphisms with an efficiency of 95% under a single set of conditions using single coverage of sequences; a 2-fold screening strategy can be employed if it is necessary to increase this detection efficiency.
It is expected that the polymorphism can be located, and detected anywhere in the SSCP fragment except in the regions at each end that correspond to the sequence of the PCR primers. The precise location and identity of the sequence variation(s) of a particular SSCP fragment can be confirmed by sequencing the fragment as described in Section D entitled "Isolation of a Wild Type Gene". The sequence of a candidate gene will be compared to the known sequence of a wild-type version of the gene by using the following DNA/protein sequence analysis programs and methods.
There are a large number of freely available methods for performing sequence comparisons. These methods differ in their speed of execution, their sensitivity, and the type of comparisons they are able to make. For example one can compare two DNA sequences, two protein sequences, a DNA sequence to a protein sequence by conceptual translation, or DNA sequences as if they were protein sequences, again by conceptual translation. The BLAST suite of programs (Altschul et al., 1990, J. Mol. Biol. 215:403) is commonly used to perform the above-referenced type of analysis. Although the BLAST suite of programs provides a rapid method of determining multiple distinct similarities between two sequences, these programs are not guaranteed to find an optimal solution when comparing two sequences according to a particular set of parameters. PSI-BLAST is a more sensitive variant of BLAST that operates by iteratively searching the database while simultaneously refining the query pattern based on the results of the searches. Other packages of programs that are available and which have different specific properties include the HMMER, SAM, WISE, STADEN and FASTA packages, and the programs est_genome, dotter, e-PCR, Clustal, cross_match and phrap (Pearson, 1996, Methods Enzymol. 266:227).
If sequence information is available for the intron-exon boundaries and for a region of the intron (of approximately 30-150 bp) located immediately 5' of an intron-exon boundary, primers can be designed to produce amplimers useful for identifying polymorphisms located in the RNA splice junctions. Similarly, if the promoter region of a candidate gene has been sequenced, primers can be designed to produce amplimers useful for identifying polymorphisms located in the promoter region. Additional methods for detecting and isolating polymorphisms include, but are not limited to, fluorescent polarization-TDI, mass spectroscopy, denaturing gradient gel electrophoresis, chemical cleavage of mismatch, constant denaturant capillary electrophoresis, RNase cleavage, heteroduplex analysis, sequencing by hybridization, DNA sequencing, representational difference analysis, and denaturing high performance liquid chromatography, described below in Section F entitled "Identification and Characterization of Polymorphisms".
2. Methods of Determining if a Polymorphism Contributes to osteoporosis
No two individuals (excluding identical twins or other clones) have the same sequence of DNA in their genome. Variability in gene sequences between individuals accounts for many of the obvious phenotypic differences (such as pigmentation of hair, skin, etc.) and many non-obvious ones (such as drug tolerance and disease susceptibility). In a population, the DNA sequence that occurs at the highest frequency at any given site is commonly referred to as the wild type sequence. The term "wild type sequence" can be misleading, however, because in different populations an alternative form of a DNA sequence may be predominant and thus considered wild type for that particular population. DNA polymorphisms are located throughout the genome, within and between genes, and the various forms may or may not result in differential gene function (as determined by comparing the function of two alternative forms of the same sequence). Most polymorphisms do not alter gene function and are called neutral polymorphisms. Some polymorphisms do have an effect on gene function, for example by changing the amino acid sequence of a protein, or by altering control sequences such as promoters or RNA splicing or degradation signals.
Polymorphisms can be used in genetic studies to identify a gene involved in a disease. If a polymorphism alters a gene function such that it increases disease susceptibility, then it will be present more often in individuals with the disease than in those without the disease. Alternatively, if a particular DNA variant is protective against a disease, it will be found more often in individuals without the disease than in those with the disease. Statistical methods are used to evaluate polymorphism frequencies found in diseased as compared to normal populations, and provide a means for establishing a causal link between a polymorphism and a phenotype. To detect a significant association between a disease and a polymorphic site, different tests may be used with either genotypic or allelic distributions. The simplest test consists of a t-test wherein the frequency of the polymorphic alleles in normal individuals and individuals with the disease phenotype is compared. A comparison of the genotypic distribution in normal individuals and individuals with the disease phenotype can also be performed using a chi-square test of homogeneity. These tests are implemented in all commercially or freely available statistical packages, for example SAS and S+, and are even included in Microsoft Excel. More sophisticated analyses will be performed by incorporating covariates, using methods such as linear regression or logistic regression, and by accounting for the information provided by adjacent polymorphic sites (multipoint analysis). An example of this type of program is the freely available program "Analyze" by JD Terwilliger (currently available at the WWW site ftp://ftp.well.ox.ac.uk/pub/genetics/analyze). If a polymorphism has a phenotypic effect, a bias will exist in the distribution of polymorphisms between groups that have and do not have the disease phenotype.
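As a minimal sketch of the chi-square test of homogeneity mentioned above, the following computes the test statistic for genotype counts in cases and controls. The function names are hypothetical and the counts in the usage note are invented for illustration, not data from the invention. The closed-form p-value applies only to the two degrees of freedom that arise with three genotype classes, and every genotype class is assumed to be observed at least once.

```python
import math


def chi_square_homogeneity(cases, controls):
    """Return (statistic, degrees of freedom) for a 2 x k table of genotype counts."""
    col_totals = [a + b for a, b in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for row, row_total in ((cases, n_cases), (controls, n_controls)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


def p_value_two_df(stat):
    """Upper tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)
```

For example, hypothetical genotype counts of 30, 50 and 20 in cases against 50, 40 and 10 in controls give a statistic of about 9.44 on 2 degrees of freedom, which corresponds to a p-value below 0.01.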
This manner of analysis can be used to study a trait that is not necessarily a disease; any trait can be studied by comparing a group with a particular phenotypic form of a trait to a group with a different phenotypic form of that trait. It is important that the cases and controls are correctly matched with regard to ethnicity, environmental influences, and other factors which could affect the phenotype being studied. Studies which test polymorphism frequencies within groups exhibiting different phenotypes, and use statistical methods to compare the group polymorphism frequencies and identify correlations with phenotypes, are known as "association studies". Some polymorphisms that occur in a single gene can alter the function of a gene sufficiently such that the polymorphism results in a disease (monogenic disease). However, many common human diseases are polygenic; that is, they are the result of complex interactions of various forms of multiple genes. In the case of polygenic diseases, the alteration of a single gene may not be detrimental per se, but in combination with certain sequence variants of other genes, this altered DNA sequence may contribute to a disease phenotype. DNA variants leading to monogenic diseases are usually rare in a population due to the process of natural selection against those carrying the disease gene. As variants in genes that are involved in polygenic disease do not produce the disease phenotype unless they occur in the appropriate combination with other gene variants, normal individuals can carry a subset of the disease-contributing variants without suffering adverse effects. Thus, disease-contributing gene variants that are associated with polygenic diseases may exist at a high frequency in a normal population.
Selection against these disease variant forms of a gene will only occur when they are present in the appropriate disease-causing combination and there may not necessarily be selection against these gene variants in individuals carrying a subset of the disease-contributing variants. Neutral DNA variants do not alter gene function or contribute to a disease, are under no selective pressure and occur at variable frequencies within populations.
Monogenic diseases tend to be rare within the population, and therefore few patients may be available for studies of these diseases. A polymorphism in a single specific gene is necessary and usually sufficient to cause a monogenic disease, such that associations between the variant gene and the phenotype are usually readily apparent. In cases where the expression of a mutation phenotype is complete ("complete penetrance"), the polymorphism present in the disease gene will not be found upon examination of a large number of normal individuals. If there is not complete penetrance, then some apparently normal individuals will contain the mutation; the difference in frequency of occurrence of the variant gene in the disease group as compared to the normal population will reveal that the variant is associated with the disease. In polygenic diseases, variation at different genes occurs in a combination which alters susceptibility to the disease. Although several genes may have variant forms which can contribute to a disease phenotype, it is not always necessary for a contributing variant to be present at every gene potentially contributing to the disease in a given affected individual. For example, a hypothetical disease could be caused by a particular combination of variants at three of four genes, designated as A, B, C, and D. Appropriate susceptibility variants in combination at any three of the genes can cause the susceptibility; i.e., one person with increased susceptibility may have susceptibility variants in genes A, B, and C, while another individual with increased susceptibility to the same disease will have susceptibility variants in genes B, C, and D. Therefore, although not all affected individuals will have the same susceptibility variants, the net result is that a diseased population will have susceptibility variant forms of genes A, B, C, and D at a higher frequency than an unaffected population (as detected by association studies).
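The four-gene example above can be made concrete by enumerating every combination of variants. The following is a minimal sketch, not part of the disclosed method: it assumes that each gene carries a susceptibility variant independently with the same probability `p` (an illustrative value), and that an individual is susceptible when at least three of the four genes carry a variant. The function name and parameters are hypothetical.

```python
from itertools import product


def carrier_frequencies(p=0.3, n_genes=4, threshold=3):
    """Enumerate all variant combinations across n_genes loci.

    p         -- assumed probability of carrying the susceptibility
                 variant at any one gene (independent across genes)
    threshold -- number of variant genes needed for susceptibility

    Returns (P(affected),
             P(variant at gene A | affected),
             P(variant at gene A | unaffected)).
    """
    p_affected = 0.0
    p_a_and_affected = 0.0
    p_a_and_unaffected = 0.0
    for combo in product((0, 1), repeat=n_genes):
        # Probability of this exact combination of variants.
        prob = 1.0
        for has_variant in combo:
            prob *= p if has_variant else (1.0 - p)
        if sum(combo) >= threshold:
            p_affected += prob
            if combo[0]:  # gene A is the first position
                p_a_and_affected += prob
        elif combo[0]:
            p_a_and_unaffected += prob
    return (p_affected,
            p_a_and_affected / p_affected,
            p_a_and_unaffected / (1.0 - p_affected))


if __name__ == "__main__":
    affected, freq_cases, freq_controls = carrier_frequencies()
    print(f"P(affected) = {affected:.4f}")
    print(f"gene A variant among affected   = {freq_cases:.3f}")
    print(f"gene A variant among unaffected = {freq_controls:.3f}")
```

Under these assumptions the gene A variant is common in unaffected individuals yet clearly enriched among affected individuals, which is the pattern an association study is designed to detect.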
Unlike monogenic diseases which result from polymorphisms that are not present in control populations, the polymorphisms which contribute to the polygenic disease are also present in a normal population. As described in the example above, an individual with susceptibility polymorphisms in only one or two of the genes potentially contributing to the disease susceptibility will be normal with regard to disease susceptibility. Therefore, normal populations can be used to identify polymorphic regions of the genome in the population, and these regions can then be specifically tested in larger patient and control populations. Typically, a gene is analyzed for the presence of polymorphisms by testing between 2 and 100 normal individuals in order to establish if a particular polymorphism is present for that gene in the population. Once a polymorphic site(s) has been defined, the polymorphic site is then tested in case (disease) and control (normal) populations and statistical analyses are performed to identify polymorphisms which occur at significantly different frequencies in the two populations.
The determination of the statistical significance of polymorphism frequency differences is dependent upon the size of the observed frequency difference between the populations, and on the size of the populations being studied. If a significant difference is found, then it can be concluded that an association exists between the polymorphism and the phenotype being studied. A statistically significant difference is a frequency difference at a particular site between populations which would be expected to occur by chance in only 5 out of 100 tests. That is, a difference which has a 95% probability of being a true difference due to the effect of the gene.
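One common way to carry out such a comparison is a Pearson chi-square test on a 2x2 table of allele counts. The sketch below is an illustration only; the text does not name a specific test, and the allele counts in the usage example are invented. With one degree of freedom the p-value can be obtained from the complementary error function in the standard library.

```python
import math


def allele_association_test(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    case_counts, control_counts -- tuples of
        (variant allele count, reference allele count)

    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_variant, col_reference = a + c, b + d
    if 0 in (row_cases, row_controls, col_variant, col_reference):
        raise ValueError("every row and column must contain observations")
    chi2 = n * (a * d - b * c) ** 2 / (
        row_cases * row_controls * col_variant * col_reference)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1000 cases and 1000 controls, 2000 alleles each.
    chi2, p = allele_association_test((700, 1300), (600, 1400))
    print(f"chi2 = {chi2:.3f}, p = {p:.5f}, significant at 0.05: {p < 0.05}")
```

Both inputs named in the text appear here: a larger frequency difference or larger populations each increase the chi-square statistic and lower the p-value.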
The foregoing discussion describes a method of testing for an association between a polymorphism which is the direct contributor to a disease and the disease phenotype. However, polymorphisms which do not directly contribute to a disease can also be used to identify regions of the genome which contain genes that contribute to the disease by virtue of their proximity to disease-contributing polymorphisms.
In humans, DNA exists as 23 homologous pairs of linear molecules (chromosomes). Recombination is a process which results in reciprocal exchanges of short homologous DNA segments between these homologous DNA pairs. Only one of each of the 23 pairs of chromosomes is inherited by the offspring. The inherited chromosome is thus made up of tandemly arrayed segments of DNA derived from both of a pair of chromosomes. Consequently, DNA is transferred in segments from one generation to the next. Although the boundaries of each inherited segment may vary in each generation, the net effect is that sequences of DNA which are adjacent along the length of the molecule are inherited together at a higher frequency than sequences that are farther apart. If a region (continuous linear segment) of DNA has two or more polymorphisms that are close together, they will be co-inherited at a higher frequency than polymorphisms that are farther apart, as they are more likely to remain on the same segment of DNA during recombination. Therefore, if two or more polymorphisms are close together, they will occur together at a higher frequency in a population than would be expected by random segregation. This effect is known as linkage. Linkage studies are performed using multiply affected individuals within families; the most commonly used approach is to test markers located throughout the genome in many sets of affected sib pairs that share the same phenotype. Markers which are located in the region of a genome that contributes to the phenotype will be inherited in both siblings, along with the phenotype, at a higher frequency than expected by chance. Studies wherein data from many such families is compared can be used to implicate a region of a genome as one that contributes to a particular phenotype. Linkage disequilibrium (LD) association studies provide another method for using polymorphisms in genetic studies. 
The method of LD involves making a correlation, at the population level, between the alleles (alternative polymorphic forms of the same sequence site) present at different genomic sites. If site 1 has two variant forms, A and a, and site 2 has two variant forms, B and b, the observation in a population that allele A at site 1 is more often found with allele B at site 2 than with allele b is an example of LD. If allele B is a disease-contributing polymorphism, then testing for allele A may show an association with the disease.
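The degree of LD between allele A at site 1 and allele B at site 2 is often summarized by the coefficient D and the squared correlation r². These are standard population-genetic measures that the text does not itself define, so the sketch below is an illustration under that assumption; the frequencies in the usage example are invented.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for alleles A (site 1) and B (site 2).

    p_ab -- frequency of chromosomes carrying both A and B
    p_a  -- frequency of allele A at site 1
    p_b  -- frequency of allele B at site 2
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("both sites must be polymorphic")
    # D is the excess of the A-B combination over random association.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared


if __name__ == "__main__":
    d, r2 = linkage_disequilibrium(p_ab=0.40, p_a=0.50, p_b=0.50)
    print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

A value of D equal to zero corresponds to random association of the two sites; a positive D means A is found with B more often than expected by chance.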
Linkage disequilibrium may be generated in several ways. Maintenance of LD in a population allows a disease association to be detected many generations after the formation of LD. The maintenance of LD is explained by linkage: the closer the two loci, the longer (in terms of number of generations) that particular LD is maintained. As a result, polymorphisms which do not directly contribute to a disease can be used to identify regions of the genome which contain a disease-contributing polymorphism. If a polymorphism affects gene function such that it contributes to a phenotype being studied and is found to be associated with the phenotype, nearby (neutral) polymorphisms which are in LD with the disease polymorphism may also show an association with the disease. Conversely, if a polymorphism does not affect gene function but is found to be associated with a particular phenotype, this polymorphism is in LD with a different, but adjacent polymorphism that affects gene function such that it contributes to the phenotype being studied. If a neutral polymorphism is always inherited with a phenotype-contributing polymorphism, then the strength of the association of the neutral polymorphism to the phenotype will be equal to that of the polymorphism which affects gene function and is contributing to the phenotype. A polymorphism which shows an association with a phenotype (for instance with disease susceptibility) is a marker for that phenotype and implicates the region in which the polymorphism resides as a region containing a polymorphism which contributes to the phenotype. Additional flanking polymorphisms can be tested to determine the precise location of the true phenotype-contributing variant.
Linkage studies on families, and LD studies on populations, have different degrees of resolution with regard to defining the size of a DNA region which contains the phenotype-contributing polymorphism. In general, linkage studies define an interval which potentially contains tens to hundreds of genes, while LD studies have been used to implicate single genes in the development of a particular phenotype.
3. Test Populations Useful for Polymorphism Genotyping
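The statement that closer loci maintain LD for more generations can be illustrated with the textbook decay relation D_t = D_0 (1 - c)^t, where c is the recombination fraction between the two loci per generation. This relation is a standard population-genetic model (random mating, no selection) and is an assumption here, not something stated in the text; the recombination fractions in the example are illustrative.

```python
import math


def ld_after_generations(d0, recomb_fraction, generations):
    """LD coefficient after a number of generations: D0 * (1 - c)**t."""
    return d0 * (1.0 - recomb_fraction) ** generations


def generations_to_halve_ld(recomb_fraction):
    """Number of generations for LD to fall to half its initial value."""
    if not 0.0 < recomb_fraction < 1.0:
        raise ValueError("recombination fraction must be between 0 and 1")
    return math.log(0.5) / math.log(1.0 - recomb_fraction)


if __name__ == "__main__":
    for c in (0.01, 0.001):
        print(f"c = {c}: LD halves in about "
              f"{generations_to_halve_ld(c):.0f} generations")
```

Under this model a tenfold smaller recombination fraction extends the persistence of LD roughly tenfold.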
The invention provides methods of determining allelic frequencies by performing genotypic analyses in appropriate test populations.
The following study populations from the FAMOS study group may be utilized.
Bone Fracture Cohort: 1000 multiple or low trauma fracture cases and 1000 control cases to determine genetic association with fracture.
BMD (Bone Mass Density) Cohort: 300 high and 300 low BMD cases to study genetic association with high or low BMD.
BMD Case Control Cohort: 500 low BMD and normal BMD case controls to study genetic association with low BMD/fracture.
4. Assays Useful for Determining the Association of a Polymorphism with Osteoporosis
Preventative treatment for osteoporosis is most effective at the time when bone loss is increasing and before the bones have become fragile and prone to fracturing. Established diagnostic techniques use x-ray and ultrasonography to measure skeletal parameters of bone size, volume and mineral density to predict fracture risk and to assess response to therapy. Such measurements give a "static" value which can be compared to normal values to aid diagnosis of low bone mass and fracture risk (Schott, Cormier et al. 1998). The World Health Organization defines osteoporosis as present when the bone mineral density levels are more than 2.5 standard deviations below the young normal mean. The various techniques used to measure bone mineral density are:
dual energy X-ray absorptiometry (DXA) - used to measure bone mass at the lumbar spine and hip, but it can also be applied to measuring total skeletal bone mass, soft-tissue composition and other regional bone measurements. Considered the "gold standard" for BMD measurement.
high-resolution quantitative computed tomography (QCT) - highly sensitive, accurate and specific spinal measurements. This technique is more costly and involves higher radiation doses than other techniques and is not widely available.
single-energy x-ray absorptiometry (SXA) - provides accurate radius BMD measurements.
quantitative ultrasound (QUS) - new and promising technique which may have applications in both BMD measurement and assessment of architectural deterioration of bone tissue. Recent studies suggest QUS of calcaneus bone predicts hip fracture as well as DXA (Hans, Dargent-Molina et al. 1996).
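The World Health Organization criterion quoted above amounts to a threshold on the number of standard deviations separating a measured BMD from the young normal mean (commonly called a T-score). The sketch below illustrates that arithmetic only; the reference mean and standard deviation in the usage example are hypothetical values, not clinical reference data.

```python
def t_score(bmd, young_normal_mean, young_normal_sd):
    """Standard deviations between a BMD value and the young normal mean."""
    if young_normal_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (bmd - young_normal_mean) / young_normal_sd


def meets_osteoporosis_criterion(score):
    """True when BMD is more than 2.5 SD below the young normal mean."""
    return score < -2.5


if __name__ == "__main__":
    # Hypothetical reference values in g/cm^2, for illustration only.
    score = t_score(bmd=0.70, young_normal_mean=1.00, young_normal_sd=0.10)
    print(f"T-score = {score:.1f}, "
          f"osteoporosis criterion met: {meets_osteoporosis_criterion(score)}")
```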
An alternative method to predict fracture independently of bone mass is to measure bone turnover. High turnover (bone resorption and formation) is associated with rapid bone loss and is likely to contribute to micro-architectural deterioration (Ross and Knowlton 1998). This is a "dynamic" measurement which is assessed with biochemical markers in urine or serum and can be used very effectively in therapy monitoring in preference to BMD measurements, which alter more slowly (results of PEPI trial and Merck Research Laboratories). When used in combination with bone mass assessment, biomarkers can provide more accurate fracture predictions than bone mass measurement alone. Several markers for bone resorption (deoxypyridinoline crosslinks) and bone formation (bone alkaline phosphatase, osteocalcin) have been developed for use in diagnostic kits. The current challenge is to reduce the variability of the measurements and improve their reliability and applicability.
5. Methods of Genotyping Polymorphisms
The invention discloses methods for performing polymorphism genotyping. These methods can be used to detect the presence of a polymorphism in a sample comprising DNA or RNA.
A DNA sample for analysis according to the invention may be prepared from any tissue or cell line, and preparative procedures are well-known in the art. The preparation of genomic DNA is performed as described in Section B.
RNA samples may also be useful for genotyping according to the invention. Isolation of RNA can be performed according to the following methods.
RNA is purified from mammalian tissue according to the following method. Following removal of the tissue of interest, pieces of tissue of <2 g are cut and quick frozen in liquid nitrogen, to prevent degradation of RNA. Upon the addition of a volume of 20 ml tissue guanidinium solution per 2 g of tissue, tissue samples are ground in a tissuemizer with two or three 10-second bursts. To prepare tissue guanidinium solution (1 L), 590.8 g guanidinium isothiocyanate is dissolved in approximately 400 ml DEPC-treated H2O. 25 ml of 2 M Tris-Cl, pH 7.5 (0.05 M final) and 20 ml Na2EDTA (0.01 M final) are added, the solution is stirred overnight, the volume is adjusted to 950 ml, and 50 ml 2-ME is added.
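The final concentrations quoted in the recipe above can be checked with simple dilution arithmetic. The sketch below is an illustration only: the molecular weight of guanidinium (iso)thiocyanate, 118.16 g/mol, is a reference value not given in the text, and the concentration of the Na2EDTA stock is not stated in the text, so the code only reports the stock concentration that the stated figures imply.

```python
GUANIDINIUM_THIOCYANATE_MW = 118.16  # g/mol, reference value (not from the text)
FINAL_VOLUME_ML = 1000.0


def molarity_from_mass(grams, mol_weight, final_volume_ml):
    """Molar concentration of a solid dissolved to a final volume."""
    return grams / mol_weight / (final_volume_ml / 1000.0)


def diluted_molarity(stock_molar, stock_volume_ml, final_volume_ml):
    """Final molarity after diluting a stock (C1 * V1 = C2 * V2)."""
    return stock_molar * stock_volume_ml / final_volume_ml


def implied_stock_molarity(final_molar, stock_volume_ml, final_volume_ml):
    """Stock molarity implied by a stated final molarity and added volume."""
    return final_molar * final_volume_ml / stock_volume_ml


if __name__ == "__main__":
    gtc = molarity_from_mass(590.8, GUANIDINIUM_THIOCYANATE_MW, FINAL_VOLUME_ML)
    tris = diluted_molarity(2.0, 25.0, FINAL_VOLUME_ML)
    edta_stock = implied_stock_molarity(0.01, 20.0, FINAL_VOLUME_ML)
    two_me_percent = 100.0 * 50.0 / FINAL_VOLUME_ML
    print(f"guanidinium isothiocyanate: {gtc:.2f} M")
    print(f"Tris-Cl: {tris:.3f} M")
    print(f"implied Na2EDTA stock: {edta_stock:.2f} M")
    print(f"2-ME: {two_me_percent:.1f}% (v/v)")
```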
Homogenized tissue samples are subjected to centrifugation for 10 min at 12,000 x g at 12°C. The resulting supernatant is incubated for 2 min at 65°C in the presence of 0.1 volume of 20% Sarkosyl, layered over 9 ml of a 5.7 M CsCl solution (0.1 g CsCl/ml), and separated by centrifugation overnight at 113,000 x g at 22°C. After careful removal of the supernatant, the tube is inverted and drained. The bottom of the tube (containing the RNA pellet) is placed in a 50 ml plastic tube and incubated overnight (or longer) at 4°C in the presence of 3 ml tissue resuspension buffer (5 mM EDTA, 0.5% (v/v) Sarkosyl, 5% (v/v) 2-ME) to allow complete resuspension of the RNA pellet. The resulting RNA solution is extracted sequentially with 25:24:1 phenol/chloroform/isoamyl alcohol, followed by 24:1 chloroform/isoamyl alcohol, precipitated by the addition of 3 M sodium acetate, pH 5.2, and 2.5 volumes of 100% ethanol, and resuspended in DEPC water (Chirgwin et al., 1979, Biochemistry, 18: 5294).
Alternatively, RNA is isolated from mammalian tissue according to the following single step protocol. The tissue of interest is prepared by homogenization in a glass-Teflon homogenizer in 1 ml denaturing solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7.0, 0.1 M 2-ME, 0.5% (w/v) N-laurylsarkosine) per 100 mg tissue. Following transfer of the homogenate to a 5-ml polypropylene tube, 0.1 ml of 2 M sodium acetate, pH 4, 1 ml water-saturated phenol, and 0.2 ml of 49:1 chloroform/isoamyl alcohol are added sequentially. The sample is mixed after the addition of each component, and incubated for 15 min at 0-4°C after all components have been added. The sample is separated by centrifugation for 20 min at 10,000 x g, 4°C, precipitated by the addition of 1 ml of 100% isopropanol, incubated for 30 minutes at -20°C and pelleted by centrifugation for 10 minutes at 10,000 x g, 4°C. The resulting RNA pellet is dissolved in 0.3 ml denaturing solution, transferred to a microfuge tube, precipitated by the addition of 0.3 ml of 100% isopropanol for 30 minutes at -20°C, and centrifuged for 10 minutes at 10,000 x g at 4°C. The RNA pellet is washed in 70% ethanol, dried, and resuspended in 100-200 ul DEPC-treated water or DEPC-treated 0.5% SDS (Chomczynski and Sacchi, 1987, Anal. Biochem., 162: 156).
RNA prepared according to either of these methods can be used for genotyping by the methods of Northern blot analysis, S1 nuclease analysis and primer extension analysis (Ausubel et al., supra).
cDNA samples also may be prepared according to the invention, i.e., DNA that is complementary to RNA such as mRNA. The preparation of cDNA is well-known and well-documented in the prior art. cDNA is prepared according to the following method. Total cellular RNA is isolated (as described) and passed through a column of oligo(dT)-cellulose to isolate polyA RNA. The bound polyA mRNAs are eluted from the column with a low ionic strength buffer.
To produce cDNA molecules, short deoxythymidine oligonucleotides (12-20 nucleotides) are hybridized to the polyA tails to be used as primers for reverse transcriptase, an enzyme that uses RNA as a template for DNA synthesis. Alternatively, mRNA species can be primed from many positions by using short oligonucleotide fragments comprising numerous sequences complementary to the mRNA of interest as primers for cDNA synthesis. The resultant RNA-DNA hybrid can be converted to a double stranded DNA molecule by a variety of enzymatic steps well-known in the art (Watson et al., 1992, Recombinant DNA, 2nd edition, Scientific American Books, New York).
Tissues or fluids which are useful for obtaining a DNA or RNA sample according to the invention include but are not limited to plasma, serum, spinal fluid, lymph fluid, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
Genotyping methods which are useful according to the invention, i.e., for the detection of polymorphisms in nucleic acid samples isolated from individuals, are disclosed below.
Single Strand Conformation Polymorphism (SSCP) Screening and Fluorescent SSCP Screening (fSSCP)
SSCP Analysis
One technique for detecting DNA sequence variations in a biological sample is single strand conformation polymorphism (SSCP) (Glavac et al., 1993, Hum. Mut. 2:404; Sheffield et al., 1993, Genomics 16:325). SSCP is a simple and effective technique for the detection of single base changes. This technique is based on the principle that single-stranded DNA molecules assume specific sequence-based secondary structures (conformers) under nondenaturing conditions. The detection of point mutations by single stranded conformation polymorphism is believed to be due to an alteration in the structure of single stranded DNA. Molecules differing by only a single base substitution may assume different conformers and migrate differently in a nondenaturing polyacrylamide gel. Single stranded DNAs that contain sequence variations are identified by an abnormal mobility on polyacrylamide gels. SSCP detects all types of point mutations and short insertions or deletions that are located between the PCR primers (within the probe region) with apparently equal efficiency. This technique has proven useful for detection of multiple mutations and polymorphisms, including SNPs. SSCP sensitivity varies dramatically with the size of the DNA fragment being analysed. The optimal size fragment for sensitive detection by SSCP is approximately 125-300bp.
The mobility of a single stranded DNA or double stranded DNA fragment during electrophoresis through a gel matrix is dependent on its size. Small molecules migrate more rapidly than large molecules because they pass through the pores in the matrix more easily. Conventionally, electrophoresis of single stranded DNA involves a 'denaturing' gel which maintains the single strandedness of the molecules. The denaturant is typically urea in polyacrylamide gels, and typically formamide or sodium hydroxide in agarose gels. In contrast, according to the SSCP screening protocol, single-stranded DNA is analysed on a 'nondenaturing' gel. When single stranded DNA is analysed on a 'non-denaturing' gel, intramolecular interactions can occur. In particular, the single stranded DNA is able to (partially) bind to itself. Consequently, DNA that is separated by electrophoresis on an SSCP gel does not migrate as a linear molecule but rather, the mobility of the DNA on an SSCP gel is governed by both its size and tertiary structure (conformation). The tertiary structure of a single stranded DNA fragment is dependent on the sequence of the entire fragment. Therefore, if a polymorphism exists in a given fragment, the conformation will usually be altered. The technique is performed as follows.
One or more test DNA samples are prepared for analysis as described above, and subject to PCR amplification. Oligonucleotide primers are designed and synthesized as described above.
Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci/mmol; 10 mCi/ml), 0.2 uM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci/mmol; 10 mCi/ml), 0.2 uM of each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as well as the number of cycles, are adjusted in accordance with the stringency requirements, as described above.
SSCP analysis is performed as follows. Ten ul of formamide dye (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol) are added to 10 ul aliquots of radiolabeled PCR product. Following denaturation at 100°C for 5 min, the reaction mixture is placed on ice. Two ul aliquots are loaded onto 8% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM Tris-borate, 1 mM EDTA), 5% glycerol gels. Electrophoresis is carried out at 25 W at 4°C for 8 hours in 0.5X TBE. Dried gels are exposed to X-OMAT AR film (Kodak) and the autoradiographs are analysed and scored for aberrant migration of bands (band shifts). SSCP may be optimized, as desired, as taught in Glavac et al., 1993, Hum. Mut. 2:404.
fSSCP Analysis
Techniques for screening multiple DNA samples simultaneously are also useful for performing rapid genotyping analysis on a large number of samples according to the invention. By pooling and multiplexing DNA samples in fluorescent SSCP (fSSCP) assays, the high throughput required for detecting sequence variations in a large number of samples is achieved (Makino et al., 1992, PCR Methods Appl. 2:10; Ellison et al., 1993, BioTechniques 15:684). According to the method of fSSCP, PCR products are visualized and analysed using an ABI fluorescent DNA sequencing machine. Different primer pairs are identified by different color fluorochromes (4 different fluorochromes are now available). fSSCP offers the following advantages over SSCP. Unlike SSCP, fSSCP does not require handling of radioactive materials. Furthermore, the fSSCP technique allows for automated data collection and automated data analysis programs that detect aberrantly migrating samples. In contrast, SSCP evaluation involves visual examination by an individual, and does not provide a means for correcting for lane to lane variations in electrophoretic conditions, as does fSSCP analysis. fSSCP analysis is performed as follows.
Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, dCTP, 0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, dCTP, 0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair. Two ul of fluorescent PCR products are added to 3 ul formamide dye (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), denatured at 100°C for 5 min, then placed on ice. Thereafter, 0.5-1 ul of Genescan™ 1500 size markers are added as an internal standard. Two ul of the mix is loaded onto 8% or 10% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM Tris-borate, 1 mM EDTA), 5% glycerol gels and electrophoresis is performed on an ABI 377 DNA sequencing machine. Gel temperature is maintained between 4° and 10°C by an external cooling unit connected to the internal cooling plumbing and chambers. Electrophoresis is carried out at 2500-3500 volts for 4-10 hours in 0.5X TBE.
Data is automatically collected and analysed with Genescan and Genotype analysis software (ABI). The fSSCP procedure identifies regions of 150-300 base pairs containing a sequence variation. To identify the exact sequence change, the fragment which demonstrates the aberrant migration is amplified again from the same biological sample, using non fluorescent primers. The sequence is then determined using standard DNA sequencing methods well known to those skilled in the art (Ausubel et al., supra).
Although SSCP and fSSCP techniques are preferred according to the invention, other methods for detecting sequence variations, including DNA sequencing, can be employed. Additional techniques for detecting DNA sequence variations useful according to the invention are described below.
Fluorescence Polarization-TDI
Fluorescence polarization-TDI is another preferred technique according to the invention for the detection of sequence variations. Template-directed primer extension is a dideoxy chain terminating DNA sequencing protocol designed to ascertain the nature of the one base immediately 3' to the sequencing primer that is annealed to the target DNA immediately upstream from the polymorphic site. In the presence of DNA polymerase and the appropriate dideoxyribonucleoside triphosphate (ddNTP), the primer is extended specifically by one base as dictated by the target DNA sequence at the polymorphic site. By determining which ddNTP is incorporated, the alleles present in the target DNA can be determined. Fluorescence polarization is based on the observation that when a fluorescent molecule is excited by plane-polarized light, it emits polarized fluorescent light into a fixed plane if the molecules remain stationary between excitation and emission. However, because the molecule rotates and tumbles in solution, fluorescence polarization is not observed fully by an external detector. The fluorescence polarization of a molecule is proportional to the molecule's rotational relaxation time, which is related to the viscosity of the solvent, absolute temperature, molecular volume, and the gas constant. If the viscosity and temperature are held constant, then fluorescence polarization is directly proportional to the molecular volume, which is directly proportional to the molecular weight. If the fluorescent molecule is large (with high molecular weight), it rotates and tumbles more slowly in solution and fluorescence polarization is preserved. If the molecule is small (with low molecular weight), it rotates and tumbles faster and fluorescence polarization is largely lost (depolarized).
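The dependence of rotational relaxation time on viscosity, temperature, molecular volume and the gas constant is commonly written as ρ = 3ηV/(RT). That explicit form comes from the general fluorescence polarization literature and is an assumption here, since the text gives only the qualitative relationship. The sketch below uses illustrative molar volumes (not measured values) to show that, at fixed viscosity and temperature, a dye attached to a larger molecule relaxes proportionally more slowly.

```python
GAS_CONSTANT = 8.314  # J / (mol * K)


def rotational_relaxation_time(viscosity_pa_s, molar_volume_m3, temperature_k):
    """Rotational relaxation time in seconds, assuming rho = 3*eta*V/(R*T)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be in kelvin and positive")
    return 3.0 * viscosity_pa_s * molar_volume_m3 / (GAS_CONSTANT * temperature_k)


if __name__ == "__main__":
    viscosity = 1.0e-3    # Pa*s, roughly water near room temperature
    temperature = 293.0   # K
    # Hypothetical molar volumes: free dye terminator vs dye on an extended primer.
    free_dye = rotational_relaxation_time(viscosity, 7.0e-4, temperature)
    extended = rotational_relaxation_time(viscosity, 7.0e-3, temperature)
    print(f"free dye-ddNTP:  {free_dye:.2e} s")
    print(f"extended primer: {extended:.2e} s")
    print(f"ratio: {extended / free_dye:.1f}")
```

The slower rotation of the larger species is what lets incorporation of a dye-labeled ddNTP onto the primer be read out as a rise in fluorescence polarization.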
In the FP-TDI assay, the sequencing primer is an unmodified primer with its 3' end immediately upstream from a polymorphic or mutation site. When incubated in the presence of ddNTPs labeled with different fluorophores, the allele-specific dye ddNTP is incorporated onto the TDI primer in the presence of DNA polymerase and target DNA. The genotype of the target DNA molecule can be determined simply by exciting the fluorescent dye in the reaction and determining whether a change in fluorescence polarization occurs (Chen et al., 1999, Genome Res., 9:492).
One or more test DNA samples are prepared for analysis as described above, and subject to PCR amplification. Oligonucleotide primers are designed and synthesized as described above. Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci/mmol; 10 mCi/ml), 0.2 uM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci/mmol; 10 mCi/ml), 0.2 uM of each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides.
The PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as well as the number of cycles, is adjusted in accordance to the stringency requirements, as described above.
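The two cycling profiles described above can be written down as data and their programmed hold times totalled. This is a minimal sketch for illustration: it counts only the hold times stated in the text, ignores instrument ramp times, and uses a hypothetical function name.

```python
def pcr_program_seconds(initial_hold_s, cycle_steps_s, cycles, final_extension_s):
    """Total programmed hold time of a PCR profile, excluding ramp times."""
    return initial_hold_s + cycles * sum(cycle_steps_s) + final_extension_s


# Taq profile: 94 C 3 min; 35 x (94 C 1 min, anneal 30 sec, 72 C 45 sec); 72 C 5 min.
TAQ_SECONDS = pcr_program_seconds(180, (60, 30, 45), 35, 300)

# Vent profile: 98 C 5 min; 35 x (98 C 1 min, anneal 45 sec, 72 C 1 min); 72 C 5 min.
VENT_SECONDS = pcr_program_seconds(300, (60, 45, 60), 35, 300)

if __name__ == "__main__":
    print(f"Taq profile:  {TAQ_SECONDS / 60:.1f} min")
    print(f"Vent profile: {VENT_SECONDS / 60:.1f} min")
```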
Following PCR amplification, unused PCR primers and dNTPs are destroyed by adding 2 ul of PCR product to 2 ul of SAP/Exonuclease cocktail (0.1 U shrimp alkaline phosphatase (1 U/ul, Amersham Pharmacia Biotech, Inc., Piscataway, NJ) and 0.2 U E. coli exonuclease I (10 U/ul, Amersham) in SAP buffer (20 mM Tris-HCl, pH 8.0; 10 mM MgCl2, Amersham)) per well of a 384-well Black PCR plate (ABT). The mixtures are incubated at 37°C for 60 min before the enzymes are heat inactivated at 95°C for 15 min. The mixture is held at 4°C until used in the FP-TDI assay.
To the enzymatically treated PCR product is added 2 ul of TDI reaction cocktail containing TDI buffer (50 mM Tris-HCl (pH 9.0), 50 mM KCl, 5 mM NaCl, 2 mM MgCl2, 8% glycerol), 1 mM TDI primer, 12.5 nM of each of two allele-specific dye-labeled ddNTPs (ROX-ddGTP, BFL-ddATP, Tamra-ddCTP, or R6G-ddUTP; NEN Life Science Products, Inc., Boston, MA), and 0.32 U Thermo Sequenase (Amersham). The reaction mixtures are incubated at 94°C for 15 min, followed by 34 cycles of 94°C for 30 seconds and 55°C for 15 seconds. Upon completion of the reaction cycles, the samples are held at 4°C.
After the primer extension reaction, 24 ul of TE buffer/methanol (2:1) is added to each sample well, and the fluorescence polarization is measured using a LJL Analyst (LJL Biosystems, Sunnyvale, CA).
Denaturing Gradient Gel Electrophoresis
Denaturing gradient gel electrophoresis (DGGE) is a gel system which allows electrophoretic separation of DNA fragments differing in sequence by a single base pair. The separation is based upon differences in the temperature of strand dissociation of the wild-type and mutant molecules. During electrophoresis, fragments migrating through the gel are exposed to an increasing concentration of denaturant in the gel. When the DNA fragments are exposed to a critical level of denaturant, the DNA strands begin to dissociate. This dissociation causes a significant reduction in the mobility of the fragment. The position in the gel at which the level of denaturant is critical for a particular DNA fragment is a function of the Tm of the DNA fragment and is therefore different for wild-type versus mutant fragments. Consequently, upon migration to the position at which the level of denaturant is at the critical point, for either the wild-type or the mutant fragment, the mobility of these two molecules will become different, thus resulting in their separation. The mutation detection rate of DGGE approaches 100%. Although the technique of DGGE is relatively simple to perform, and does not require radioisotopes or toxic chemicals, it does require some specialized equipment. Furthermore, DGGE can only be used to analyze fragments between 100 and 800 bp due to the resolution limit of polyacrylamide gels. DGGE is advantageous over other methods useful for detecting sequence variations because the behavior of DNA molecules on DGGE gels can be modeled by computer, thereby making it possible to accurately predict the detectability of a mutation in a given fragment. Genomic DNA fragments can be efficiently transferred from the gel following DGGE as described in US Patent No. 5,190,856.
Chemical Cleavage of Mismatches
Chemical cleavage of mismatch (CCM) is another technique for detection of sequence variations that is useful according to the invention. CCM is based upon the ability of hydroxylamine and osmium tetroxide to react with the mismatch in a DNA heteroduplex and the ability of piperidine to cleave the heteroduplex at the point of mismatch. According to the method of CCM, sequence variations are detected by the appearance of fragments that are smaller than the untreated heteroduplex following denaturing polyacrylamide gel electrophoresis.
DNA fragments up to 1 kb in size can be analysed by CCM with a probable 100% detection rate for sequence variation. CCM is particularly useful for either detecting all of the sequence variations in a particular fragment of DNA or for determining that there are no sequence variations in a particular fragment of DNA.
Constant Denaturant Capillary Electrophoresis (CDCE) Analysis
CDCE analysis is particularly useful in high throughput screening, i.e., wherein large numbers of DNA samples are analysed. CDCE analysis combines several elements of both replaceable linear polyacrylamide capillary electrophoresis and constant denaturant gel electrophoresis. The technique of CDCE is a rapid, high resolution procedure that demonstrates a high dynamic range, and is automatable. The method of CDCE, as described in detail in Khrapko et al., 1994, Nucleic Acids Res. 22:364, involves the use of a zone of constant temperature and a denaturant concentration in capillary electrophoresis. Linear polyacrylamide gel electrophoresis is performed at viscosity levels that permit facile replacement of the matrix after each run. For a typical 100 bp fragment of DNA, point mutation-containing heteroduplexes are separated from wild type homoduplexes in less than 30 minutes. Using laser-induced fluorescence to detect fluorescent-tagged DNA, the system has an absolute limit of detection of 3 x 10^4 molecules with a linear dynamic range of six orders of magnitude. The relative limit of detection is about 3/10,000, i.e., 100,000 mutant sequences are recognized among 3 x 10^8 wild type sequences. This approach is applicable to analysis of low frequency mutations, and to genetic screening of pooled samples for detection of rare variants.
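The relative limit of detection quoted above can be checked with a short calculation. The following is only an illustrative sketch; the function name `relative_limit` is hypothetical and is not part of the CDCE method itself.

```python
def relative_limit(mutant_copies, wild_type_copies):
    """Fraction of mutant sequences relative to wild type sequences."""
    return mutant_copies / wild_type_copies


# Figures taken from the text: 100,000 mutants among 3 x 10^8 wild type copies.
ratio = relative_limit(100_000, 3e8)
print(f"relative limit of detection: {ratio:.2e}")
print(f"mutants per 10,000 wild type copies: {ratio * 10_000:.1f}")
```

The result is roughly 3 mutant sequences per 10,000 wild type sequences, which agrees with the stated figure of about 3/10,000.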
RNase Cleavage
An additional method for genotyping that is useful according to the invention is RNASE Cleavage. Various ribonuclease enzymes, including RNASE A, RNASE T1 and RNASE T2, specifically digest single stranded RNA. When RNA is annealed to form double stranded RNA or an RNA/DNA duplex, it can no longer be digested with these enzymes. However, when a mismatch is present in the double stranded molecule, cleavage at the point of mismatch may occur.
RNASE Cleavage is preferably performed with RNASE A. Ribonuclease A specifically digests single stranded RNA but can also cleave heteroduplex molecules at the point of mismatch. The extent of cleavage at single base mismatches depends on both the type of mismatch, and the sequence of DNA flanking the mismatch. Sequence variations leading to mismatch are indicated by the presence of fragments that are smaller than the uncleaved heteroduplex on denaturing polyacrylamide gels.
According to the invention, RNASE Cleavage involves forming a heteroduplex between a radiolabeled single stranded RNA probe (riboprobe) and a PCR product derived from a biological sample. If a point mutation is present in the PCR product, following treatment of the resulting RNA/DNA heteroduplex with RNASE A, the RNA strand of the duplex may be cleaved. The sample is then denatured by heating and analysed on a denaturing polyacrylamide gel. If the RNA probe has not been cleaved, it will be the same size as the PCR product. If the probe has been cleaved, it will be smaller than the PCR product. RNASE Cleavage can be used to easily detect a 1 bp deletion. However, small insertions may not be as easily detected as small deletions, by RNASE Cleavage, as 'looping-out' occurs on the target strand rather than the probe strand.
Heteroduplex Analysis
Another method for genotyping according to the invention is heteroduplex analysis. Heteroduplex molecules, i.e., double stranded DNA molecules containing a mismatch, can be separated from homoduplex molecules on ordinary gels. The exact rate of detection of sequence variations by heteroduplex analysis is unknown, but is clearly significantly lower than 100%. Presumably, the sequence of DNA flanking the mismatch, rather than the actual mismatch, affects the detectability. Mismatches that are located in the middle of a DNA fragment are detected most easily. Although heteroduplex analysis is less sensitive than some of the other genotyping methods described, it may be considered useful according to the invention due to its simplicity.
Mismatch Repair Detection (MRD)
Another technique that is useful for genotyping according to the invention is mismatch repair detection (MRD). MRD is an in vivo method that detects DNA sequence variation by the occurrence of a change in bacterial colony color. DNA fragments to be screened for variation are cloned into two MRD plasmids, and bacteria are transformed with heteroduplexes of these constructs. The resulting colonies are blue in the absence of a mismatch and white in the presence of a mismatch. MRD can be used to detect a single mismatch in a DNA fragment as large as 10 kb in size. MRD permits high-throughput screening of genetic mutations, and is described in detail in Faham et al., 1995, Genome Research 5:474.
Mismatch Recognition by DNA Repair Enzymes
Another technique that is useful for detecting sequence variations according to the invention is Mismatch Recognition by DNA Repair Enzymes. The E. coli mismatch correction systems are well-understood. Three of the proteins required for the methyl-directed DNA repair pathway, MutS, MutL and MutH, are sufficient to recognize 7 of the possible 8 single base-pair mismatches (C/C mismatches are not recognized) and cut/nick the DNA at the nearest GATC sequence. The MutY protein, which is involved in a distinct repair system, can also be used to detect A/G and A/C mismatches. Some mammalian enzymes are also useful for mismatch recognition: thymidine glycosylase can recognize all types of T mismatch and 'all-type endonuclease' or Topoisomerase I is capable of detecting all 8 mismatches, but does so with varying efficiencies, depending on both the type of mismatch and the neighboring sequence.
The MutS gene product is the methyl-directed repair protein which binds to the mismatch.
Purified MutS protein has been used to detect mutations by several different methods. Gel mobility assays can be performed in which DNA bound to the MutS protein migrates more slowly through an acrylamide gel than free DNA. This method has been used to detect single base mismatches.
An alternative method for the use of MutS in mismatch recognition, which does not require gel electrophoresis, involves the immobilization of MutS protein on nitrocellulose membranes. Labeled heteroduplexed DNA is used to probe the membrane in a dot-blot format. When both DNA strands are used, all mismatches can be recognized by binding of the DNA to the protein attached to the membrane. Although C/C mismatches are not detected, the corresponding G/G mismatch derived from the other strand is recognized. This technique is particularly useful because it is simple, inexpensive, and amenable to automation. However, the detection efficiency of this method may be limited by the size of the DNA fragment. In particular, this method works well for very short fragments.
Sequencing by Hybridization (SBH)
An alternative method for detecting sequence variations according to the invention is sequencing by hybridization (SBH). According to this method, arrays of short (8-10 base long) oligonucleotides are immobilized on a solid support in a manner similar to the reverse dot-blot protocol, and probed with a target DNA fragment. In particular, oligonucleotides are synthesized together and directly onto the support.
The synthesis system begins with a silicon chip coated with a nucleotide linked to a light-sensitive blocking group. Light is used to illuminate particular grid co-ordinates, removing the blocking group at these positions. The chip is then exposed to the next photoprotected nucleotide, which polymerizes onto the exposed nucleotides. In this manner, as a result of successive rounds of nucleotide additions, oligonucleotides of different sequences can be synthesized at different positions on the solid support. Thirty-two cycles of specific additions (i.e., 8 additions of each of the four nucleotides) should enable the production of all 65,536 possible 8-mer oligonucleotides at defined positions on the chip.
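The figures of 65,536 octamers and thirty-two addition cycles follow from simple counting, as the sketch below illustrates. The function names `all_kmers` and `synthesis_cycles` are hypothetical, and the cycle count assumes one addition step per base per position, as the text describes.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Return every possible oligonucleotide of length k over A, C, G and T."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def synthesis_cycles(k):
    """Addition cycles needed when each position takes one step per base."""
    return k * len(BASES)


octamers = all_kmers(8)
print(len(octamers))        # number of distinct 8-mers (4 to the power 8)
print(synthesis_cycles(8))  # addition cycles for an 8-mer array
```

The number of array features grows as 4^k while the number of synthesis cycles grows only as 4k, which is why a complete 8-mer array is practical to synthesize.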
When the chip is probed with a DNA molecule, e.g., a fluorescently labeled PCR product, fully matched hybrids should give a high intensity of fluorescence and hybrids with one or more mismatches should give substantially less intense fluorescence. The combination of the position and intensity of the signals on the chip enables computers to derive the sequence of the DNA molecule being analysed for the presence of sequence variations.
Allele-Specific Oligonucleotide Hybridization
The technique of allele-specific oligonucleotide (ASO) hybridization or the 'dot-blot' is also useful for genotyping according to the invention. Under specific hybridization conditions, an oligonucleotide will only bind to a PCR product if the two are 100% identical. A single base pair mismatch is sufficient to prevent hybridization. A pair of oligonucleotides, one carrying the wild type base and the other carrying a single base change, as compared to the wild type sequence, can be used to determine if a PCR product is homozygous wild type, heterozygous or homozygous mutant for a particular base change. When performing conventional dot blots, the PCR product is fixed onto a nylon membrane and probed with a labeled oligonucleotide. When performing a 'reverse dot blot', an oligonucleotide is fixed to a membrane and probed with a labeled PCR product. The probe may be isotopically labeled, or non-isotopically labeled. The technique allows for the genotyping of multiple PCR amplified samples for the presence of a single base change.
Allele-Specific PCR
Many methods for identifying sequence variations involve the analysis of PCR-amplified DNA. The allele-specific polymerase chain reaction (also called the amplification refractory mutation system or ARMS) comprises an assay that occurs during the PCR reaction itself. ARMS requires the use of sequence-specific PCR primers which differ from each other at their terminal 3' nucleotide and are designed to amplify only the normal allele in one reaction, and only the mutant allele in another reaction. When the 3' end of a specific primer is 100% identical to the target, amplification occurs. When the 3' end of a specific primer is not 100% identical to the target, amplification does not occur. Agarose gel electrophoresis is used to detect the presence of an amplified product. A heterozygous sample is characterized by amplification products in both reactions, a homozygous wild-type sample generates product in only the normal reaction, and a homozygous mutant sample generates product in only the mutant reaction.
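The interpretation of the two ARMS reactions can be expressed as a small decision table. The sketch below is illustrative only; the function `arms_genotype` and its return strings are hypothetical, and the inputs are simply whether a band was seen for each reaction.

```python
def arms_genotype(normal_product, mutant_product):
    """Call a genotype from the presence of product in the two ARMS reactions."""
    if normal_product and mutant_product:
        return "heterozygous"
    if normal_product:
        return "homozygous wild-type"
    if mutant_product:
        return "homozygous mutant"
    # Neither reaction amplified: the assay failed and no call can be made.
    return "no call"


print(arms_genotype(True, True))
print(arms_genotype(False, True))
```

Treating the absence of product in both reactions as a failed assay, rather than as a genotype, guards against scoring a PCR failure as a result.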
This technique can be modified so that the 5' ends of the allele-specific primers are labeled with different fluorescent labels, and the 5' ends of the common primers are biotin labeled. According to this alternate protocol, the wild-type specific and the mutant-specific reactions are performed in a single tube. The advantages of this approach are that a gel electrophoresis step is not required, and the method is amenable to automation.
Primer-Introduced Restriction Analysis
The method of primer-introduced restriction analysis (PIRA) can also be used for genotyping according to the invention. PIRA is a technique which allows known sequence variations to be detected by restriction digestion. By introducing a base change close to the position of a known sequence variation (for example by using a PCR primer containing a mismatch, as compared to the target sequence), it is possible to create a restriction endonuclease recognition site that indicates the presence of a particular sequence change. The combination of the altered base in the primer sequence and the altered base at the mutation site creates a new restriction enzyme target site. This approach may be used to create a new restriction enzyme site in either the wild-type allele or the mutant allele. If a novel restriction enzyme site is introduced in the mutant allele then, following digestion with the appropriate restriction enzyme, the homozygous wild-type form would produce a single band of the full-length size, the homozygous mutant form would produce a single band of the reduced size and the heterozygous form would produce both full length and reduced sized bands. Band size will be analysed by gel electrophoresis.
Oligonucleotide Ligation Assay
The technique of oligonucleotide ligation can also be used for genotyping according to the invention.
The method of oligonucleotide ligation is based on the following observations. If two oligonucleotides are annealed to a strand of DNA and are exactly juxtaposed, they can be joined by the enzyme DNA ligase. If there is a single base pair mismatch at the junction of the two oligonucleotides then ligation will not occur. According to the method of oligonucleotide ligation, the two oligonucleotides used in the assay are modified by the addition of two different labels. According to this method, the assay for a ligated product involves detecting a ligated product by assaying for the appearance of the labels of the two oligonucleotides on a single molecule rather than visualization of a new, larger sized DNA fragment by gel electrophoresis.
When ligation reactions are conducted in 96-well microtiter plates and ligation is scored by ELISA, the oligonucleotide ligation assay can be performed by a robot and the results can be analysed by a plate reader and fed directly into a computer. This method is therefore extremely useful for detecting the presence of a sequence variation in a large number of samples. The oligonucleotide ligation assay is performed on PCR-amplified DNA. A modification of this assay, termed the ligase chain reaction, is performed on genomic DNA and involves amplification with a thermostable DNA ligase.
Direct DNA Sequencing
Genotyping according to the invention may also be carried out by directly sequencing the DNA sample in the region of the gene of interest, using DNA sequencing procedures well-known in the art (described above in Section D, entitled "Isolation of a Wild Type Gene").
Mini-Sequencing
The technique of mini-sequencing (also known as single nucleotide primer extension) can also be used to detect any known point mutation, deletion or insertion, according to the invention. Obtaining sequence information for just a single base pair only requires the sequencing of that particular base. This can be done by including only one base in the sequencing reaction rather than all four. When this base is labeled and complementary to the first base immediately 3' to the primer (on the target strand), the label will be incorporated; when it is not complementary, the label will not be incorporated. Thus, a given base pair can be sequenced on the basis of label incorporation or failure of incorporation without the need for electrophoretic size separation.
5' Nuclease Assay
Genotyping according to the invention can also be performed by the method of 5' nuclease assay. The 5' nuclease assay is a technique that monitors the extent of amplification in a PCR reaction on the basis of the degree of fluorescence in the reaction mix. A low level of fluorescence indicates no amplification or very poor amplification and a high level of fluorescence indicates good amplification. This system can be adapted to permit identification of known sequence variations, without the need for any post-PCR analysis other than fluorescence emission analysis.
PCR amplification is detected by measuring the 5' to 3' exonuclease activity of Taq polymerase. Taq polymerase cleaves 5' terminal nucleotides of double stranded DNA. The preferred substrate for Taq polymerase is a partially double stranded molecule. Taq polymerase cleaves the strand that contains the closest free 5' end. According to the 5' nuclease assay, an oligonucleotide 'probe' which is phosphorylated at its 3' end so as to render it incapable of serving as a DNA synthesis primer, is included in the PCR reaction. The probe is designed to anneal to a position between the two amplification primers. When an actively extending Taq polymerase molecule reaches the probe molecule, it partially displaces the probe and then cleaves the probe at or near the single stranded/double stranded cleavage site. The polymerase continues this process of displacement and cleavage until the entire probe is broken up and removed from the template. The probe is labeled in a manner that permits detection of the removal of the probe. In particular, the probe is labeled at different positions with two different fluorescent labels. One label has a localized quenching effect on the fluorescence of the other (reporter) label. This effect is mediated by energy transfer from one dye to the other, and requires that the two dyes are in close proximity to each other. If the probe is cleaved at a position between the reporter and the quencher dyes, the two dyes become physically separated thereby resulting in an increase in fluorescence which is proportional to the yield of the PCR product.
Representational Difference Analysis (RDA)
Genotyping according to the invention can also be carried out by Representational Difference Analysis (RDA). RDA is described in detail in Lisitsyn et al., 1993, Science 259:946, and an adaptation which combines selective breeding with RDA is described in Lisitsyn et al., 1993, Nature Genet. 6:57. RDA identifies sequence dissimilarities through the application of a powerful approach to subtractive hybridization. According to the method of RDA, one first creates simplified representations, called amplicons, from two samples that are being compared. An amplicon can comprise, for example, the set of BglII fragments that are small enough to be amplified by the PCR. The iterative subtraction step begins with the ligation of a special adaptor to the 5' end of fragments contained in the amplicon derived from the test sample (tester amplicon). The tester amplicon is then melted and briefly reannealed in the presence of a large excess of amplicon derived from the wild type sample (driver amplicon). Those tester fragments that reanneal (presumably fragments absent from the wild type, driver amplicon) can serve as a template for the addition of the adaptor sequence to the 3' end of the "partner" fragment. As a result, these tester fragments can be exponentially amplified by PCR. This procedure is then repeated to achieve successively higher enrichment.
RDA may be used to clone sequences that are either wholly absent from the wild type sample or are present in the wild type DNA, but are contained in a restriction fragment that is too large to be amplified in the amplicon. The former case may arise from a total deletion; the latter from a restriction fragment length polymorphism with the short allele present in the tester but not the wild type DNA. RDA is useful for subtracting DNA from an individual with a particular disease from normal DNA so as to identify regions showing homozygous or heterozygous deletions; locating fragments present in a parent with a dominant disorder but absent in his unaffected offspring; and locating mRNAs expressed in normal tissue but not present in tissue isolated from an individual with a particular disease.
Denaturing High Performance Liquid Chromatography
According to the scanning method of Denaturing High Performance Liquid Chromatography (DHPLC), partial heat denaturation and a linear acetonitrile gradient are used to identify polymorphisms in DNA fragments. DHPLC provides a method of comparative DNA sequencing based on the capability of ion-pair reverse phase liquid chromatography on alkylated nonporous poly(styrene divinylbenzene) particles to resolve homo- from heteroduplex molecules under conditions of partial denaturation. This method can potentially be automated to allow for rapid analysis of a large number of samples (Underhill et al., 1996, Proc. Natl. Acad. Sci. USA, 93:196).
Mass Spectroscopy
Matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectroscopy is another method according to the invention by which genotyping can be performed. The method of MALDI-TOF mass spectroscopy is based on the irradiation of crystals formed by suitable small organic molecules (referred to as the matrix) with a short laser pulse at a wavelength close to the resonant absorption band of the matrix molecules. This causes an energy transfer and desorption process producing matrix ions. Low concentrations of nucleic acid molecules are added to the matrix molecules while in solution and become embedded in the solid matrix crystals upon drying of the mixture. The intact nucleic acids are then desorbed into the gas phase and ionized upon irradiation with a laser, allowing their mass analysis. MALDI is used primarily with time-of-flight spectrometers where the time of flight is related to the mass-to-charge ratio of the nucleic acid molecules. The approach is reviewed in Griffin, T.J. and Smith, L.M., 2000, Trends Biotech. 18:77. Genotyping can be performed by any of the following MALDI-TOF mass spectroscopy approaches, including sequencing of PCR products (Fu, D.-J. et al., 1998, Nat. Biotechnol. 16:381; Kirpekar, F. et al., Nucleic Acids Res. 26:2554), direct mass-analysis of PCR products (Ross, P.L. et al., 1998, Anal. Chem. 70:2067), analysis of allele-specific PCR (Taranenko, N.I. et al., 1996, Genet. Anal. Biomol. Eng. 13:87) or LCR (ligase chain reaction; Jurinke, C. et al., 1996, Anal. Biochem. 237:174) products, analysis of RFLP-PCR products (Srinivasan, J.R. et al., 1998, Rapid Commun. Mass Spectrom. 12:1045), minisequencing (Haff, L.A. and Smirnov, I.P., 1997, Genome Res. 7:378; Higgens, G.S. et al., 1997, BioTechniques 23:710), analysis of PNA (peptide nucleic acid) hybridization probes (Griffin, T.J. et al., 1997, Nat. Biotech. 15:1368; Ross, P.L., Anal. Chem. 69:4197; Jiang-Baucom, P. et al., 1997, Anal. Chem. 69:4894), or direct analysis of invasive cleavage products (Griffin, T.J. et al., 1999, Proc. Natl. Acad. Sci. USA 96:6301).
6. Methods of Specifying a Polymorphism
The invention provides methods for specifying a particular polymorphism. By "specifying a polymorphism" is meant defining a polymorphism in the context of a larger region of nucleic acid which contains the polymorphism, and is of sufficient length to be easily differentiated from any other position in the genome.
A unique nucleotide position (e.g. a polymorphic site) in the human genome can be specified by describing a unique sequence of DNA within the genome, and providing the location of the unique nucleotide position relative to that sequence. Preferably this is done by providing the sequence identity of a length of unique DNA containing the polymorphism, and indicating which of the nucleotide sites is polymorphic.
A calculation can be made to determine a sequence length which will be unique in the 3 billion nucleotide human genome. If it is assumed that the genome contains equal numbers of the nucleotides A, G, C and T, and that they occur randomly in the genome, one can determine the probability of any given sequence of a defined length occurring in the genome; a random 12mer will appear in a random 3,000,000,000 bp genome 179 times, a random 15mer will appear in a random 3,000,000,000 bp genome 3 times and a random 16mer will appear in a random 3,000,000,000 bp genome 1 time.
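The figures above follow from dividing the genome size by the number of possible sequences of a given length. The sketch below reproduces them under the stated assumptions of equal base frequencies and random order; the function name `expected_occurrences` is hypothetical, and only one strand is counted.

```python
def expected_occurrences(k, genome_size=3_000_000_000):
    """Expected number of times a given k-mer appears in a random genome.

    Assumes equal frequencies of A, C, G and T, independent positions,
    and counts a single strand only.
    """
    return genome_size / 4 ** k


for k in (12, 15, 16):
    print(f"{k}mer: {expected_occurrences(k):.2f} expected occurrences")
```

Rounded to whole numbers, the results for 12, 15 and 16 nucleotides are 179, 3 and 1, matching the values stated in the text.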
Thus, it would appear that specifying 16 bp would uniquely define a sequence in the genome. However, the genome is not composed of random sequence and does not contain equal amounts of A, G, C and T. In fact, 10-12 bp sequences are likely to be specific for 95% of genes. Some sequences may even be specified by as few as 8 nucleotides. The minimum sequence length that is useful according to the invention for identifying polymorphisms in most gene and intergenic sequences is approximately 9-15 bp.
In the case of repeat sequences and sequences associated with gene families, the probability of observing a particular sequence is greatly increased and it becomes difficult to specify a polymorphism in the context of a sequence that is only on the order of 9-15 bp. There are many types of repeats including tandem repeats, where a larger sequence block has within it smaller repeat units (e.g. microsatellites). Tandem repeats usually occur within non-genic areas, but can also occur within genes and subsequently affect gene function; they can be 10-1000s of bp long, or, if located in centromeres and telomeres, be megabase sized. Some repeats are composed of blocks which do not have sub-repeat units and are non-functional (e.g. ~300 bp Alu repeats). These occur by duplication/dispersal throughout the genome.
It may be difficult to specify a polymorphism that occurs in a gene that is a member of a gene family. Through the mechanism of gene duplication, gene families, comprising multiple copies of a gene in which some, but not all of the DNA sequence has diverged, have been formed. Thus, certain regions of a gene may be conserved in different gene family members. With time, a duplicated gene can lose function and the sequence of the duplicated gene can deteriorate; the amount of homology between the original gene and the duplicated version depends upon the time since duplication. Other duplications maintain function and retain some level of similarity with the original gene in the important domains. Some related genes can share nearly 100% homology across a region that is hundreds of bp long, and yet have no significant homology at any other location. In these cases, it may be necessary to specify dozens or more nucleotides to provide a unique sequence.
To identify a unique sequence, a search must be done wherein a specific sequence is compared to all known human sequences and the minimum unique sequence is defined. However, in the absence of a complete sequence for the human genome, it cannot be guaranteed that a sequence is truly unique. Empirical experimentation can be used to determine the minimum sequence for specificity/uniqueness. In the case of a gene family member, if sequence information is available for the region corresponding to the region of interest in other members of the gene family, then it may be possible to define a unique short (9-15 bp) sequence that contains a polymorphism and has specificity. In the event that a particular region cannot be defined as unique, a larger region of nucleic acid which contains the polymorphism will be required to define a polymorphism in a gene that is a member of a gene family. It is predicted that a sequence of 9-15 bp will be sufficient to define a polymorphism in 99% of all cases. Methods of specifying a polymorphism that involve using sequences which either encompass or overlap the polymorphic site to be tested or do not encompass or overlap the polymorphic site to be tested are useful according to the invention and are described below.
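The search for a minimum unique sequence can be illustrated on a toy reference string. The sketch below is an assumption-laden illustration, not the search procedure of the invention: the functions `count_occurrences` and `min_unique_window` are hypothetical, only the forward strand is examined, and a practical search would compare against all known human sequences with a dedicated alignment tool.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    i = text.find(pattern)
    while i != -1:
        count += 1
        i = text.find(pattern, i + 1)
    return count


def min_unique_window(reference, pos, max_len=50):
    """Shortest substring covering index pos that occurs once in reference.

    Returns (start, window), or None if no window up to max_len is unique.
    """
    for length in range(1, max_len + 1):
        first = max(0, pos - length + 1)
        last = min(pos, len(reference) - length)
        # Try every window of this length that still contains pos.
        for start in range(first, last + 1):
            window = reference[start:start + length]
            if count_occurrences(reference, window) == 1:
                return start, window
    return None


print(min_unique_window("ACGTACGTTACGA", 8))
```

In a repetitive reference the function returns None until the window is long enough, which mirrors the observation that repeat sequences and gene families require longer sequences to specify a polymorphism.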
Oligonucleotide Hybridization.
An oligonucleotide is designed such that it is specific for a target sequence, and hybridizes only at the target sequence site. This oligonucleotide will not hybridize if the target sequence differs at the position in the sequence to be tested. Another oligonucleotide is designed such that it hybridizes with the polymorphic form of the sequence. A DNA sample is tested for hybridization with each of the two probes independently. If the DNA hybridizes to only one of the probes, it can be concluded that the individual is homozygous for the corresponding sequence. If both probes hybridize to a test DNA sample, then the individual is heterozygous. Hybridization will be detected by the method of Southern blot analysis (as described in Section C entitled "Production of a Nucleic Acid Probe").
Specifying a Polymorphism by PCR
An alternative method for specifying a particular polymorphism involves a PCR-based strategy. According to this method, a region of a candidate gene to be tested is amplified by PCR (as described). The amplified fragment is digested with a restriction enzyme that will not cut a fragment that contains a polymorphism, due to the location of the polymorphism within the recognition site of this restriction enzyme. The products of the digestion reaction mixture are size separated in an agarose gel, stained with ethidium bromide, and visualized under ultraviolet light to determine if the amplified product has been digested. According to this method, the PCR primers provide the specificity for a particular polymorphism by virtue of the specific sequence of the two primers, as well as by the location of the primer binding sites in the target DNA. Although multiple sites for primer binding may exist in a target DNA sequence, only the sites that are close enough together will produce an amplified product that includes the nucleic acid region containing the polymorphism. Alternatively, a PCR reaction is carried out with PCR primers that contain polymorphisms. According to this embodiment, if the template nucleic acid lacks the polymorphism present in the primers there will be no PCR product. Thus, according to this embodiment of the invention, the absence of a PCR product indicates that a polymorphism is not present in the target sequence.
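The digestion step can be simulated to predict the band pattern for each allele. The following is a minimal sketch: the function `digest` and the two short amplicon sequences are invented for illustration, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example enzyme, not one named in this description.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of site.

    cut_offset is the number of bases into the site at which the cut falls.
    """
    fragments = []
    last = 0
    i = amplicon.find(site)
    while i != -1:
        cut = i + cut_offset
        fragments.append(cut - last)
        last = cut
        i = amplicon.find(site, i + 1)
    fragments.append(len(amplicon) - last)
    return fragments


wild_type = "ATGCCGAATTCGGTTA"  # hypothetical amplicon containing GAATTC
variant = "ATGCCGAGTTCGGTTA"    # same amplicon with the site destroyed

print(digest(wild_type, "GAATTC", 1))  # two fragments: the site is cut
print(digest(variant, "GAATTC", 1))    # one fragment: the site is lost
```

A single uncut band therefore indicates the allele in which the polymorphism lies within the recognition site, while two shorter bands indicate the allele that retains the site.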
Primer Extension
A DNA fragment comprising the region containing a polymorphism is PCR amplified from an individual to be tested. The PCR product is denatured and one strand is retained for analysis. An oligonucleotide probe is designed such that it is specific for a region in the sequence and hybridizes such that its 3' terminal nucleotide is paired with the nucleotide adjacent to the one to be tested. The PCR product and probe are combined with a polymerase and terminating, differentially colored, nucleotides. The polymerase extends the probe by one base, and only the base which is complementary to the site being tested is added. The reaction is washed, and the color of the reaction indicates the nucleotide that has been added and the sequence at the position of interest.
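The single-base extension described above can be modeled in a few lines. This is a sketch under stated assumptions: both sequences are written 5' to 3', the primer anneals perfectly at exactly one place, and the names `reverse_complement` and `extended_base` and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extended_base(template, primer):
    """Base added to the 3' end of primer when annealed to template.

    Returns None if the primer does not anneal or no template base remains.
    """
    site = reverse_complement(primer)  # primer binding site on the template
    i = template.find(site)
    if i <= 0:
        return None
    # The polymerase copies the template base just 5' of the binding site.
    return COMPLEMENT[template[i - 1]]


primer = reverse_complement("CCGTTAGC")
print(extended_base("GCTAACCGTTAGC", primer))  # allele with A at the test site
print(extended_base("GCTAGCCGTTAGC", primer))  # allele with G at the test site
```

The two alleles yield different added bases (T and C respectively), which corresponds to the different colors read out at the end of the reaction.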
The PCR step provides one level of specificity by amplifying a region (1-10,000 bp, as desired, between the PCR primers) from a complex (3,000,000,000 bp) mixture. The PCR primers must be unique in both their hybridization specificity and their proximity to one another. Since proximity of the two PCR primers is needed (i.e. a distance across which a polymerase can extend to join the primers), shorter PCR primers can be used, e.g. in theory a small enough region could be amplified with an 8-10 bp binding site for a PCR primer. To ensure that a primer hybridizes with specificity, a primer must be at least 5 bp.
A second level of specificity is provided by the primer which is extended in the primer extension reaction. Since this primer is hybridizing to a short piece of DNA, it can be short and unique for the fragment with which it binds. The primer is at least 5bp and preferably 8bp.
Although the primer used for the primer extension step is located adjacent to the polymorphic site, the PCR primers should not overlap with the polymorphic site being tested.
Southern Blotting
One method for detecting a previously defined polymorphism involves Southern blot analysis of wild type and mutant DNA following digestion with a restriction enzyme which has a recognition sequence which includes the polymorphic site to be tested. According to this method, a particular restriction enzyme cuts wild type DNA but does not cut mutant DNA due to the presence of a polymorphism within the recognition site of this restriction enzyme. Many restriction enzymes exist which recognize 4 bp sequences. The resulting fragments will be size separated in an agarose gel, transferred to a membrane and probed with a nucleic acid probe. If the site is uncut, the fragment is one length and if the site is cut the fragment will be of a shorter length. The nucleic acid hybridization probe will provide specificity to the particular polymorphism being tested by defining the polymorphism in the context of a larger stretch of nucleic acid sequence. The nucleic acid probe may comprise the nucleic acid sequence corresponding to the region known to contain the polymorphism. The sequence-specific probe may be located 10, 100, 1000, or even 100s of thousands of bases from the region containing the polymorphism. If the probe is located some distance from the region containing the polymorphism, an intervening recognition site for the restriction enzyme cannot be located between the probe hybridization site and the region of interest containing the polymorphism site. Typically, a hybridization probe useful according to this method will be much larger than the minimum length of a sequence (9-15 bp) required to give specificity to, or define a particular polymorphism.
Alternatively, a chemical or enzyme which recognizes a unique pair of nucleotides at the site of a polymorphism can be used to detect the polymorphism. According to this method, the amount of sequence required for recognition by a chemical or enzyme is 2 bp (providing that the 2 bp sequence is unique in a region large enough to produce a fragment which can then be bound by a specific probe).
According to a variation of the above method, a labeled chemical or enzyme which binds to one sequence of the polymorphic recognition site and not another is used. This method involves the steps of digesting the DNA with a restriction enzyme, and adding a labeled, sequence-specific binding protein (e.g. a restriction enzyme that lacks cleavage capability). The sequence-specific binding protein will bind to multiple sites in the genome, including the site to be tested. The fragments will be separated on a gel and then probed with a probe specific for the test sequence. If the fragment identified by the second probe is identical to a fragment identified by the first probe (e.g. the labeled chemical or enzyme), then the sequence being tested for is present.
7. Determination of the Phenotypic Outcome of a Polymorphism
To determine the phenotypic outcome of a polymorphism according to the invention, it is necessary to screen suitable populations to obtain a statistically significant measure of the association of a polymorphism with a particular disease (e.g. osteoporosis). The invention provides methods for performing polymorphism genotyping in appropriate populations (described above). The invention also provides in vitro and in vivo assays useful for determining the phenotypic outcome of a polymorphism in a candidate gene.
Every polymorphism has the potential to alter the genetic activity of an individual. At the level of a single gene, the effect of a polymorphism can range from an inconsequential, silent change, to a change that causes a complete loss of protein function, to a gain of aberrant or detrimental function. The severity of the effect of a polymorphism on gene activity will depend on the exact molecular consequences of the particular polymorphism. For example, alteration of a single pre-mRNA splicing dinucleotide could have profound effects on both the quantitative and qualitative properties of gene activity, since alterations in splicing efficiency can both reduce the overall level of normal transcript and cause "exon skipping". If the skipped exon is a coding exon, then exon skipping will lead to an alteration in the amino acid composition of the resulting protein and will likely affect protein activity. To accurately assess the role of a particular polymorphism in the regulation of various molecular events, appropriate assays for both gene expression and protein function must be carried out.
In vitro assays useful for determining the effects of a polymorphism on gene expression and protein function include, but are not limited to, the following.
i. Transcriptional Regulation
The transcriptional regulation of a candidate gene containing a polymorphism may be altered, as compared to the wild type gene.
Promoter Activity
If a polymorphism is located in the promoter, enhancer or repressor region of a candidate gene, promoter assays (well known in the art) wherein the altered promoter of the candidate gene is used to drive the expression of a reporter gene (e.g. CAT, luciferase, GFP) are performed. Changes in the transcriptional regulation of a candidate gene due to the presence of a polymorphism can also be detected by methods useful for measuring the level of mRNA including S1 nuclease mapping and RT-PCR.
S1 Analysis
The S1 enzyme is a single-strand-specific endonuclease that will digest both single-stranded RNA and DNA. According to the method of S1 analysis, a probe that has been efficiently labeled to a high specific activity at the 5' end through the use of a kinase is used to determine either the amount of an mRNA species or the 5' end of a message. A single stranded probe that is complementary to the sequence of the RNA species of interest is utilized in S1 analysis. If the structure of a particular mRNA species is known, S1 analysis is performed with oligonucleotide probes of at least 40 bp that are complementary to the RNA of interest. It is preferable to use oligonucleotides wherein the 5' end of the oligonucleotide is complementary to the RNA. It is also preferable to use oligonucleotides wherein the 5' terminal residues contain dG or dC residues. If S1 nuclease analysis will be utilized to determine the 5' termini of an RNA species, the 3' end of the oligonucleotide should extend at least 4 nucleotides beyond the RNA coding sequence. The inclusion of additional nucleotides facilitates differentiation of a band resulting from an RNA:DNA duplex and a band representing the probe.
A hybridization probe for S1 analysis is prepared by incubating 2 pmol of an oligonucleotide in the presence of 150 µCi [γ-32P]ATP (3000-7000 Ci/mmol), 2.5 µl 10X T4 polynucleotide kinase buffer (700 mM Tris-Cl, pH 7.5, 100 mM MgCl2, 50 mM dithiothreitol, 1 mM spermidine-Cl, 1 mM EDTA), and 10 U T4 polynucleotide kinase at 37°C for 30-60 minutes. The radiolabeled probe is ethanol precipitated and resuspended at 1 µl/0.3 ng oligonucleotide or 10^5 cpm.
The hybridization reaction is performed as follows. An amount of probe equal to 5 x 10^4 Cerenkov counts is added to 50 µg RNA on ice and ethanol precipitated. The resulting pellet is resuspended in 20 µl S1 hybridization solution (80% deionized formamide, 40 mM PIPES, pH 6.4, 400 mM NaCl, 1 mM EDTA, pH 8), denatured for 10 min at 65°C and hybridized overnight at 30°C. The following day, 300 µl of a mixture of 150 µl 2X S1 nuclease buffer (0.56 M NaCl, 0.1 M sodium acetate, pH 4.5, 9 mM ZnSO4), 3 µl 2 mg/ml single-stranded calf thymus DNA, 147 µl H2O and 300 U S1 nuclease is added to the hybridization reaction and incubated for 60 minutes at 30°C. Following the addition of 80 µl S1 stop buffer (4 M ammonium acetate, 20 mM EDTA, 40 µg/ml tRNA), the sample is ethanol precipitated, resuspended in formamide loading dye, denatured and analysed on a denaturing polyacrylamide/urea gel of the appropriate percentage for the expected size of the protected band (Ausubel et al., supra).
RT-PCR
The method of RT-PCR is useful according to the invention for RNA expression analysis. According to the method of reverse transcription/polymerase chain reaction (RT-PCR), during the reverse transcription (RT) step the RNA is converted to first strand cDNA, which is relatively stable and is a suitable template for a PCR reaction. In the second step, the cDNA template of interest is amplified using PCR. This is accomplished by repeated rounds of annealing sequence-specific primers to either strand of the template and synthesizing new strands of complementary DNA from them using a thermostable DNA polymerase.
An RNA sample is ethanol precipitated with a cDNA primer. It may be preferable to use a cDNA primer that is identical to one of the amplification primers. To the pellet is added 12 µl H2O, 4 µl 400 mM Tris-Cl, pH 8.3, and 4 µl 400 mM KCl. The mixture is heated to 90°C, slow cooled to 67°C, microfuged and incubated for 3 hours at 52°C. Following the addition of 29 µl reverse transcriptase buffer (per sample: 2.5 µl 400 mM Tris-Cl, pH 8.3, 2.5 µl 400 mM KCl, 1 µl 300 mM MgCl2, 5 µl 100 mM DTT, 5 µl 5 mM 4dNTP mix, 2 µl actinomycin D, 11 µl H2O) and 0.5 µl (16 U) AMV reverse transcriptase, the sample is incubated for 1 hour at a temperature between 37°C and 55°C. The temperature will be adjusted in accordance with the composition of the primer and the RNA of interest. The sample is then extracted sequentially with phenol and chloroform, and ethanol precipitated. The resulting cDNA pellet is resuspended in 40 µl H2O. 5 µl of the cDNA sample is mixed with 5 µl of each amplification primer (~20 µM each), 4 µl 5 mM 4dNTP mix, 10 µl 10X amplification buffer (500 mM KCl, 100 mM Tris-Cl, pH 8.4, 1 mg/ml gelatin) and 70.5 µl H2O. After the mixture is heated for 2 minutes at 94°C, 0.5 µl (2.5 U) Taq DNA polymerase is added and the sample is overlaid with mineral oil. PCR amplification of the cDNA will be performed using the following automated amplification cycles: 39 cycles (2 minutes at 55°C, 2 minutes at 72°C, 1 minute at 94°C), 1 cycle (2 minutes at 55°C, 7 minutes at 72°C). The number of cycles can be varied in accordance with the abundance of RNA (Ausubel et al., supra).
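As a consistency check on the amplification mix and cycling program described above, the arithmetic can be sketched as follows. The sketch assumes that all listed volumes share one unit (taken here to be microliters), that two amplification primers are used, and that temperature ramp times are ignored; the variable names are illustrative only.

```python
import math

# Volumes of the amplification mix (assumed to share one unit, e.g. microliters).
components = {
    "cDNA": 5.0,
    "primer_forward": 5.0,   # assumption: two amplification primers are used
    "primer_reverse": 5.0,
    "dNTP_mix_5mM": 4.0,
    "buffer_10X": 10.0,
    "water": 70.5,
    "Taq_polymerase": 0.5,
}

total_volume = sum(components.values())

# Final strength of the 10X buffer and final concentration of the 5 mM dNTP mix.
buffer_fold = 10.0 * components["buffer_10X"] / total_volume
dntp_final_mM = 5.0 * components["dNTP_mix_5mM"] / total_volume

# Cycling program: (number of cycles, step durations in minutes); ramp times are ignored.
program = [(39, [2, 2, 1]), (1, [2, 7])]
cycling_minutes = sum(n * sum(steps) for n, steps in program)

print(f"total volume: {total_volume}")
print(f"buffer strength: {buffer_fold}X")
print(f"dNTP mix: {dntp_final_mM} mM")
print(f"programmed cycling time: {cycling_minutes} min")
assert math.isclose(buffer_fold, 1.0)
```

Under these assumptions the components sum to 100 volume units, so the 10X buffer is diluted to 1X, and the programmed cycles take 204 minutes in total.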
If a polymorphism is located in a transcription factor binding site, assays including but not limited to the yeast two-hybrid assay (Fields et al., 1994, Trends Genet., 10:286) can be used to determine the effects of a polymorphism on transcription factor binding.
If the protein product of the gene of interest is a DNA binding protein, the phenotypic outcome of a polymorphism may be impaired nuclear transport, DNA binding, chromatin assembly or chromatin structure, methylation or histone deacetylation.
Nuclear Transport
Immunocytochemical methods or cell fractionation techniques (as described above) are used to determine if the protein is correctly localized in the nucleus.
The DNA binding properties of a transcription factor are determined by gel shift analysis (as described in Ausubel et al., supra), oligonucleotide selection, southwestern assays or by immunohistochemical analysis of fixed chromosomes.
Gel Shift Analysis
The method of gel shift analysis is used to detect sequence specific DNA-binding proteins from crude extracts. According to this method, proteins that bind to an end-labeled DNA fragment will retard the mobility of the fragment. The change in the mobility of the labeled fragment is detected by the appearance of a discrete band comprising the DNA-protein complex. A number of methods for preparing nuclear and cytoplasmic extracts useful for gel shift analysis are known in the art. For example, nuclear extracts are prepared according to the following method. A cell pellet is washed in PBS, resuspended in a volume of hypotonic buffer (10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.2 mM PMSF, 0.5 mM DTT) that is approximately equal to 3 times the packed cell volume and allowed to swell on ice for 10 minutes. Cells are homogenized in a glass Dounce homogenizer and the nuclei are collected by centrifugation and resuspended in a volume of low-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 0.02 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume. Following the addition of a volume of high-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 1.2 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume (dropwise with stirring) to the nuclei, nuclear extraction is carried out for 30 minutes with continuous gentle stirring. The nuclei are collected by centrifugation and the nuclear extract is dialyzed against 50 volumes of dialysis buffer (20 mM HEPES, pH 7.9, 20% (v/v) glycerol, 100 mM KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) until the conductivities of extract and buffer are equivalent. The extract is removed from the dialysis tubing and analysed for protein concentration (Ausubel et al., supra).
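The buffer volumes in the extraction procedure above scale with the packed cell volume and the packed nuclear volume. The following sketch computes them for hypothetical starting volumes, and estimates the KCl concentration of the combined low-salt and high-salt buffers. It deliberately ignores the volume contributed by the nuclei themselves, which is a simplifying assumption, and the function names are illustrative.

```python
import math


def extraction_volumes(packed_cell_volume: float, packed_nuclear_volume: float) -> dict:
    """Buffer volumes (same unit as the inputs) according to the ratios given in the text."""
    return {
        "hypotonic_buffer": 3.0 * packed_cell_volume,
        "low_salt_buffer": 0.5 * packed_nuclear_volume,
        "high_salt_buffer": 0.5 * packed_nuclear_volume,
    }


def mixed_kcl_molar(low_vol: float, high_vol: float,
                    low_kcl: float = 0.02, high_kcl: float = 1.2) -> float:
    """KCl molarity after combining the two buffers (nuclear volume not included)."""
    return (low_vol * low_kcl + high_vol * high_kcl) / (low_vol + high_vol)


# Hypothetical example: 2.0 ml packed cells yielding 1.0 ml packed nuclei.
volumes = extraction_volumes(packed_cell_volume=2.0, packed_nuclear_volume=1.0)
kcl = mixed_kcl_molar(volumes["low_salt_buffer"], volumes["high_salt_buffer"])

print(volumes)
print(f"KCl in combined buffers: {kcl:.2f} M")
```

Because equal volumes of the two buffers are combined, the estimated KCl concentration is simply the mean of 0.02 M and 1.2 M; the true concentration during extraction would be lower once the volume of the nuclei is taken into account.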
Probes useful for gel shift analysis include a fragment of plasmid DNA or a gel-purified double stranded oligonucleotide. Preferably the probe is labeled with Klenow fragment by incubating a 100 µl solution of plasmid DNA or oligonucleotide with 100 µCi of the desired [α-32P]dNTP, 4 µl of 5 mM 3dNTP mix and 2.5 U Klenow fragment for 20 minutes at room temperature. Upon the addition of 4 µl of a solution comprising 5 mM of the dNTP corresponding to the radioactive dNTP, the sample is incubated for 5 minutes at room temperature. The radiolabeled probe is ethanol precipitated, resuspended in TE buffer and gel purified.
Gel shift analysis is performed by incubating 10,000 cpm of the labeled probe (0.1-0.5 ng) with 2 µg poly(dI-dC)-poly(dI-dC), 300 µg BSA, and approximately 15 µg of a nuclear extract or buffered crude protein extract prepared, for example, as described above, for 15 minutes at 30°C. An aliquot of the binding reaction is analysed by electrophoresis on a prewarmed low-ionic strength gel (e.g. a 4% polyacrylamide gel in TBE) and autoradiography (Ausubel et al., supra).
Oligoselection Assays for DNA Binding Activity
DNA binding activity is an essential property of proteins involved in many basic cell biological events, such as chromatin structure, transcriptional regulation, DNA replication and repair. The biological activity of a DNA binding protein can be assayed by defining the optimal target DNA binding site. Using the PCR based primer selection technique (Blackwell, 1990, Science, 250:1104), the canonical nucleotide sequence defining the binding site is elucidated in vitro by mixing purified full length protein, or just the DNA binding domain of a protein of interest, with an oligonucleotide duplex pool containing a completely randomized central region flanked by primer-annealing sites. Multiple rounds of immunoprecipitation and amplification by PCR enrich for high affinity sites, which are cloned and sequenced in order to define a canonical binding site.
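Once the selected sites have been cloned, sequenced, and aligned, a canonical binding site can be summarized as the most frequent base at each position. The sketch below illustrates this with made-up aligned sequences; it is a simple majority consensus, offered as one possible way to summarize the data and not as the procedure used in the cited reference.

```python
from collections import Counter


def position_counts(sites: list) -> list:
    """Count the bases observed at each aligned position."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all aligned sites must have the same length")
    return [Counter(column) for column in zip(*sites)]


def consensus(sites: list) -> str:
    """Return the most frequent base at each aligned position."""
    return "".join(counts.most_common(1)[0][0] for counts in position_counts(sites))


# Hypothetical selected sites, already aligned and trimmed to the randomized core.
selected_sites = ["CACGTG", "CACGTG", "CATGTG", "CACGTT", "GACGTG"]

print(consensus(selected_sites))
print(position_counts(selected_sites)[2])  # base counts at the third position
```

Positions at which one base dominates strongly are the ones most likely to be required for high affinity binding, so a polymorphism at such a position would be a natural candidate for follow-up in a gel shift assay.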
The ability of a DNA binding protein to correctly regulate chromatin assembly and structure can be determined by DNase hypersensitivity assays. Alternatively, coimmunoprecipitation experiments or Western blot analysis can be used to determine if the DNA binding protein is associated with a component of the chromatin.
Southwestern Blot Assay for Protein-DNA Interactions
The ability of a protein to bind DNA is measured by using the "Southwestern" blot technique (for example see Antalis et al., 1993, Gene, 134:201). According to this method, radiolabelled DNA is incubated with protein that has been immobilized on nitrocellulose filters and the amount of bound DNA is measured by scintillation counting or autoradiography followed by densitometry. The protein to be tested can be pure protein, immunoprecipitated protein, crude cell lysates or even recombinant protein denatured directly from bacterial colonies, yeast or cell culture.
Assay of Protein Binding to Chromosomes in Vivo: Immunocytology of Fixed Chromosomes
Numerous biologically important nuclear proteins are in direct contact with genomic DNA. The presence of these proteins can be detected immunocytologically by fixing metaphase chromosomes such that the protein is permanently fixed at the region of DNA to which it normally binds. The presence and cytological location of the protein can then be determined by incubating the fixed chromosomes with an antibody directed against the protein of interest, and performing standard methods of immunohistochemical staining (Zink and Paro, 1989, Nature, 337:468).
Coimmunoprecipitation Assay for Chromatin Assembly/Structure.
If an antibody specific for a protein of interest exists, immunoprecipitation can be used to test for the presence of the protein (Otto and Lee, 1993, Methods Cell Biol., 37: 119, Banting, 1995, In Gene Probes 1: A practical approach. Chapter 8: Antibody probes, pp. 225-227, IRL press.). The following methods are used for determining if a protein of interest is associated with a particular subcellular component. According to one method, proteins are immunoprecipitated with an antibody specific for a cellular component (e.g. chromatin or nuclear antigens), the immunoprecipitated material is analysed on a gel by denaturing polyacrylamide gel electrophoresis and western blot analysis is performed with an antibody specific for the protein of interest, to determine if a physical association exists between the cellular component and the protein of interest. Various incubation and wash treatments of the cell lysate are used to remove background contamination and enhance the sensitivity of detection (Banting, 1995, supra). Alternatively, the initial immunoprecipitation can be carried out with the antibody specific for the protein of interest, and the western blot analysis can be performed with an antibody specific for a cellular component. According to a variation of this method, prior to immunoprecipitation the cells can be treated with a protein crosslinker to ensure that protein-protein interactions are maintained during immunoprecipitation. According to another variation of this method, proteins can be cross-linked to DNA and then precipitated (Dedon et al., 1991, Anal. Biochem., 197:83). If DNA coprecipitates with a particular protein, this suggests that DNA is associated with, and presumably bound to the protein. The coprecipitating DNA can be sequenced to identify the bound sequence.
DNase Hypersensitivity
The transcriptionally active promoter region of a gene can be analysed for susceptibility to cleavage by DNase I (Montecino et al., 1994, Biochemistry, 33:348). Efficient cleavage of genomic DNA is dependent on the accessibility of this enzyme to the DNA, and is influenced by several factors, including nucleosome packaging, overall chromatin configuration, and the presence of DNA binding proteins such as transcription factors. DNA sequence variations within the promoter DNA may have profound effects on these factors and result in aberrant regulation of gene transcription and ultimately abnormal biological activity of the gene. Therefore, altered gene activity around a polymorphic site can be detected as increased or decreased DNase I hypersensitivity (Vaishnaw et al., 1995, Immunogenetics, 41:354).
Assay for DNA Methylation
Accurate mapping of DNA methylation patterns, for example, in CpG islands which are unmethylated regions of DNA, is used to investigate and gain a better understanding of diverse biological processes such as the regulation of imprinted genes, X chromosome inactivation and tumor suppressor gene silencing in human cancer. DNA methylation at specific sites is most frequently studied by use of methylation-sensitive restriction endonucleases (for example HpaII) and Southern blotting (Sambrook et al., supra). The sensitivity of this method can be enhanced several hundred-fold by performing a ligation-mediated PCR step (as described in Steigerwald et al., 1990, Nucleic Acids Res., 6:1435) after enzyme treatment. An alternative strategy, termed methylation-specific PCR (Herman et al., 1996, Proc Natl Acad Sci USA., 93:9821), is used to determine the methylation status of CpG islands without the use of methylation-specific restriction enzymes.
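The logic of a methylation-sensitive digest can be sketched in a few lines. HpaII recognizes the sequence CCGG and does not cut when the internal cytosine of the site is methylated; the sequence and the set of methylated positions below are hypothetical, and the function names are illustrative.

```python
def hpaii_sites(seq: str) -> list:
    """Return the start index of every CCGG recognition site in the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def cleavable_sites(seq: str, methylated_positions: set) -> list:
    """Sites expected to be cut: those whose internal cytosine (index + 1) is unmethylated."""
    return [i for i in hpaii_sites(seq) if (i + 1) not in methylated_positions]


# Hypothetical sequence with two CCGG sites; the cytosine at index 11 is methylated.
sequence = "TTCCGGAATTCCGGTT"
methylated = {11}

print(hpaii_sites(sequence))                  # all recognition sites
print(cleavable_sites(sequence, methylated))  # sites still cut after methylation
```

Comparing the predicted cut sites with and without methylation gives the fragment pattern that the subsequent Southern blot would be expected to distinguish.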
Histone Deacetylation
Transcription of chromatin-packaged genes involves highly regulated changes in nucleosome structure that control DNA accessibility. Changes in nucleosome structure can be mediated by enzymatic complexes which control the acetylation and deacetylation of histones. Transcription elongation is required for the formation of the unfolded structure of transcribing nucleosomes, and histone acetylation is required for the maintenance of these structures (Walia et al., 1998, J. Biol. Chem., 3:14516). Deacetylation can be prevented by incubating cells with histone deacetylase inhibitors such as sodium butyrate or trichostatin A. To assay for changes in acetylation and the state of transcriptional activity, chromatin fractions are purified using organomercury and hydroxylapatite dissociation chromatographic techniques (Walia et al., supra).
ii. Transcription Start Site
To determine if a particular polymorphism causes a change in the transcriptional start site of a candidate gene, S1 nuclease mapping and primer extension can be performed. The presence of a polymorphism may cause an mRNA to be aberrantly expressed. In particular, a polymorphism may change the tissue specificity or developmental expression pattern of an mRNA species. A variety of molecular methods for detecting mRNA known in the art can be performed to determine the expression pattern of an mRNA. These methods include, but are not limited to, the following: Northern blot analysis, RT-PCR, S1 analysis, RNASE Protection analysis, or in situ hybridization analysis of sections, wherein the samples are derived from multiple different tissues or from a tissue at different stages of development. Northern blot analysis, RT-PCR and S1 analysis can also be used to determine if a polymorphism results in an altered pattern of mRNA splicing.
Northern Blotting
The method of Northern blotting is well known in the art. This technique involves the transfer of RNA from an electrophoresis gel to a membrane support to allow the detection of specific sequences in RNA preparations.
Northern blot analysis is performed according to the following method. An RNA sample (prepared by the addition of MOPS buffer, formaldehyde and formamide) is separated on an agarose/formaldehyde gel in 1X MOPS buffer. Following staining with ethidium bromide and visualization under ultraviolet light to determine the integrity of the RNA, the RNA is hydrolyzed by treatment with 0.05 M NaOH/1.5 M NaCl followed by incubation with 0.5 M Tris-Cl (pH 7.4)/1.5 M NaCl. The RNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g. Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art (Ausubel et al., supra, Sambrook et al., supra). Following transfer and UV cross linking, the membrane is hybridized with a radiolabeled probe in hybridization solution (e.g. in 50% formamide/2.5% Denhardt's/100-200 µg denatured salmon sperm DNA/0.1% SDS/5X SSPE) at 42°C. The hybridization conditions can be varied as necessary as described in Ausubel et al., supra and Sambrook et al., supra. Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS, at 42°C in 1X SSC/0.1% SDS, at 65°C in 0.2X SSC/0.1% SDS, and exposed to film. The stringency of the wash buffers can also be varied depending on the amount of background signal (Ausubel et al., supra).
RNASE Protection Analysis
RNASE Protection analysis can be used to analyze RNA structure and amount and determine the endpoint of a specific RNA.
The method of RNASE protection is more sensitive than S1 analysis since it utilizes a sequence specific hybridization probe that is labeled to a high specific activity. The probe is hybridized to sample RNAs and treated with ribonuclease to remove free probe. Following ribonuclease treatment, the fragments comprising probe annealed to homologous sequences in the sample RNA are recovered by ethanol precipitation, and analysed by electrophoresis on a sequencing gel. The presence of the target mRNA is indicated by the presence of an appropriately sized fragment of the probe.
A probe is labeled by the method of in vitro transcription in the presence of [α-32P]CTP, as described in Section B entitled "Production of a Polynucleotide Sequence". The RNA sample to be analysed is ethanol precipitated and resuspended in 30 µl hybridization buffer (4 parts formamide/1 part 200 mM PIPES, pH 6.4, 2 M NaCl, 5 mM EDTA) containing 5 x 10^5 cpm of the probe RNA. The mixture is denatured for 5 minutes at 85°C and incubated at the desired hybridization temperature (30°C to 60°C) for >8 hours. To each reaction mixture is added 350 µl ribonuclease digestion buffer (10 mM Tris-Cl, pH 7.5, 300 mM NaCl, 5 mM EDTA) containing 40 µg/ml ribonuclease A and 2 µg/ml ribonuclease T1. The sample is incubated for 30-60 minutes at 30°C. Following the addition of 10 µl 20% SDS and 2.5 µl 20 mg/ml proteinase K, the sample is incubated for 15 minutes at 37°C. The sample is extracted with phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in RNA loading buffer (80% (v/v) formamide, 1 mM EDTA, pH 8.0, 0.1% bromophenol blue, 0.1% xylene cyanol), denatured and analysed by electrophoresis on a denaturing polyacrylamide/urea gel and autoradiography (Ausubel et al., supra).
Primer Extension
The method of primer extension is used to map the 5' end of an RNA and to quantitate the amount of an RNA of interest by using reverse transcriptase to extend a primer that is complementary to a region of a given RNA.
An oligonucleotide primer is labeled in a kinase reaction as described for S1 analysis. The primer extension reaction is performed by mixing 10-50 µg total cellular RNA (in 10 µl) with 1.5 µl 10X hybridization buffer (1.5 M KCl, 0.1 M Tris-Cl, pH 8.3, 10 mM EDTA) and 3.5 µl labeled oligonucleotide. Samples are heated to 65°C for 90 minutes and allowed to slow cool at room temperature. To each sample is added 30 µl of primer extension reaction mixture (0.9 µl Tris-Cl, pH 8.3, 0.9 µl 0.5 M MgCl2, 0.25 µl DTT, 6.75 µl 1 mg/ml actinomycin D, 1.33 µl 5 mM 4dNTP mix, 20 µl H2O, 0.2 µl 25 U/µl AMV reverse transcriptase). Samples are incubated for 1 hour at 42°C, and then, following the addition of 105 µl RNASE reaction mix (100 µg/ml salmon sperm DNA, 20 µg/ml RNASE A), for 15 minutes at 37°C. Samples are extracted in phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in stop/loading dye (20 mM EDTA, pH 8.0, 0.05% bromophenol blue, 0.05% xylene cyanol in formamide), heated at 65°C and analysed by electrophoresis on a 9% acrylamide/7 M urea gel and autoradiography.
In Situ Hybridization
Cytological techniques well known in the art can be used to determine the temporal and spatial expression patterns of mRNA (in situ hybridization of tissue sections) and protein (immunohistochemistry in individual cells).
Preparation of histological samples
Tissue samples intended for use in in situ detection of either RNA or protein are fixed using conventional reagents; such samples may comprise whole or squashed cells, or sectioned tissue. Fixatives useful for such procedures include, but are not limited to, formalin, 4% paraformaldehyde in an isotonic buffer, formaldehyde (each of which confers a measure of RNAase resistance to the nucleic acid molecules of the sample) or a multi-component fixative, such as FAAG (85% ethanol, 4% formaldehyde, 5% acetic acid, 1% EM grade glutaraldehyde). For the detection of RNA, water used in the preparation of an aqueous component of a solution to which the tissue is exposed until it is embedded is RNAase-free, i.e. treated with 0.1% diethylpyrocarbonate (DEPC) at room temperature overnight and subsequently autoclaved for 1.5 to 2 hours. Tissue will be fixed at 4°C, either on a sample roller or a rocking platform, for 12 to 48 hours in order to allow the fixative to reach the center of the sample.
Prior to embedding, excess fixative will be removed and the sample will be dehydrated by a series of two- to ten-minute washes in increasingly high concentrations of ethanol, beginning at 60% and ending with two washes in 95% and another two in 100% ethanol, followed by two ten-minute washes in xylene. Samples will be embedded in one of a variety of sectioning supports, e.g. paraffin, plastic polymers or a mixed paraffin/polymer medium (e.g. Paraplast®Plus Tissue Embedding Medium, supplied by Oxford Labware). For example, fixed, dehydrated tissue will be transferred from the second xylene wash to paraffin or a paraffin/polymer resin in the liquid phase at about 58°C. The paraffin or paraffin/polymer resin will be replaced three to six times over a period of approximately three hours to dilute out residual xylene. The sample will be incubated overnight at 58°C under a vacuum, in order to optimize infiltration of the embedding medium into the tissue. The next day, following several additional changes of medium at 20 minute to one hour intervals, also at 58°C, the tissue sample will be positioned in a sectioning mold, the mold will be surrounded by ice water and the medium will be allowed to harden. Sections of 6 µm thickness will be taken and affixed to 'subbed' slides, which are slides coated with a proteinaceous substrate material, usually bovine serum albumin (BSA), to promote adhesion. Other methods of fixation and embedding are also applicable for use according to the methods of the invention; examples of these are found in Humason, G.L., 1979, Animal Tissue Techniques, 4th ed. (W.H. Freeman & Co., San Francisco), as is frozen sectioning (Serrano et al., 1989, supra).
In situ Hybridization Analysis
According to the method of in situ hybridization, a specifically labeled nucleic acid probe is hybridized to cellular RNA present in individual cells or tissue sections. In situ hybridization can be performed on either paraffin or frozen sections. Depending on the desired sensitivity and resolution, either film or emulsion autoradiography can be utilized to detect the hybridized radioactive probe.
The following method of in situ hybridization is performed by incubating slides containing cell or tissue specimens in a slide rack contained within a glass staining dish. According to this method, it is preferable to use solutions that have been prepared fresh. Prior to the hybridization steps, slides are dewaxed to remove the sectioning support material. The dewaxing protocol involves sequential washes in xylene, rehydration by sequential washes in 100%, 95%, 70% and 50% ethanol, and denaturation in 0.2N HCl. Following a heat denaturation step (70°C in 2X SSC), samples are postfixed in a freshly prepared solution of 4% PFA, washed in PBS, incubated in 10 mM DTT (10 min at 45°C) and blocked in 400 ml PBS containing 0.617 g DTT, 0.74 g iodoacetamide and 0.5 g N-ethylmaleimide, for 30 min at 45°C in a water bath covered with aluminum foil, due to the light sensitivity of iodoacetamide and N-ethylmaleimide. The samples are washed in PBS and equilibrated sequentially in freshly prepared 0.1 M triethanolamine (TEA buffer), TEA buffer/0.25% acetic anhydride, and TEA buffer/0.5% acetic anhydride. Following a blocking step in 2X SSC, the samples are dehydrated by sequential washes in 50%, 70%, 95%, and 100% ethanol and air dried. 35S-labeled riboprobes and competitor probes prepared in the absence of a radiolabel (prepared as described in Section B entitled "Production of a Polynucleotide Sequence") or double-stranded DNA probes (prepared with [35S]dNTPs by methods well known in the art including nick translation or random oligonucleotide-primed synthesis) are heated to 100°C for 3 min and diluted to a final probe concentration of 0.3 µg/ml in 50% formamide, 0.3 M NaCl, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA, 1X Denhardt solution, 500 µg/ml yeast tRNA, 500 µg/ml poly(A) (Pharmacia), 50 mM DTT, 10% polyethylene glycol (MW 6000).
The hybridization step is carried out by covering the sample with an appropriate amount of probe, and incubating for 30 minutes to 4 hours at 45°C in a chamber designed to prevent dilution or concentration of the hybridization solution. Samples are washed sequentially at 55°C in solution A (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol), and solution B (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol, 0.5% (v/v) Triton-X-100) and at room temperature in solution C (2X SSC, 20 mM 2-mercaptoethanol). Following a 15 minute incubation with RNASE, samples are washed at 50°C in solution C, and at room temperature in 2X SSC. Samples are dehydrated by sequential washes in 50% ethanol/0.3 M ammonium acetate, 70% ethanol/0.3 M ammonium acetate, 95% ethanol/0.3 M ammonium acetate, and 100% ethanol. Slides are air dried and analysed by film or by emulsion autoradiography (Ausubel et al., supra).
iii. mRNA Stability/Control of Turnover and mRNA Transcription Rate
Changes in mRNA stability/control of turnover and mRNA transcription rates due to the presence of a polymorphism can be detected by the following methods.
mRNA Stability
Gene expression can be regulated by variations in mRNA stability (Liebhaber, 1997, Nucleic Acids Symp Ser., 36:29 and Ross J., 1996, Trends Genet., 5:171). Any gene variation occurring within the cis-acting elements which control mRNA abundance may influence gene expression levels (Peltz et al., 1992, Curr Opin Cell Biol., 4:979). Quantitative RT-PCR (Kohler et al., 1995, Quantitation of mRNA by polymerase chain reaction, Springer) and mRNA radiolabelling techniques are two methods for measuring relative mRNA abundance and stability. Quantitative PCR employs an internal standard to provide a direct comparison between alternative reactions, enabling comparison of low abundance transcripts or transcripts derived from a sample that is only available in a limited quantity (McPherson MJ et al., eds, 1995, PCR 2: A practical approach, IRL Press).
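The role of the internal standard can be illustrated with a small calculation: each target signal is divided by the signal of the standard measured in the same reaction, and the normalized values are then compared between samples. The signal values and function names below are hypothetical, and the sketch assumes that signal is proportional to template amount in the range measured.

```python
def normalized_abundance(target_signal: float, standard_signal: float) -> float:
    """Target signal expressed relative to the internal standard of the same reaction."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal


def relative_abundance(sample: dict, reference: dict) -> float:
    """Ratio of the normalized abundance in a sample to that in a reference sample."""
    return (
        normalized_abundance(sample["target"], sample["standard"])
        / normalized_abundance(reference["target"], reference["standard"])
    )


# Made-up band intensities for a variant allele and a wild type allele.
variant = {"target": 1200.0, "standard": 800.0}
wild_type = {"target": 3000.0, "standard": 1000.0}

print(normalized_abundance(variant["target"], variant["standard"]))
print(relative_abundance(variant, wild_type))
```

Normalizing within each reaction is what allows two reactions with different amplification efficiencies or input amounts to be compared directly.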
Assay for mRNA Transcription Rates
Genetic polymorphism within the regulatory regions of a gene can significantly alter transcription rate and mRNA stability, resulting in reduced biological activity of the encoded protein. One of the most sensitive assays for measuring the rate of gene transcription is the nuclear run-off assay (Groudine and Casimir, 1984, Nucleic Acids Res 12:1427). Nuclei isolated from cell lines expressing the target gene of interest are treated with radiolabelled UTP and the level of incorporation of radiolabel into nascent RNA transcripts is determined by filter hybridization to immobilized cDNA derived from the target gene.
iv. Intracellular mRNA Localization
A genetic variation can cause a change in the localization of a particular mRNA species (e.g. to the cytoskeleton, or to the nuclear scaffold).
Immunohistochemistry
Changes in RNA localization can be detected by immunohistochemical methods well known in the art (e.g. in situ analysis described above).
Oocyte Injection Assays
In many cases mRNA, like protein, is localized in relation to the polarity of the cell or the cytoskeletal architecture (St. Johnston, 1995, Cell, 81:161). The Xenopus oocyte is a popular, experimentally tractable system for studying intracellular trafficking of mRNA (Nakielny et al., 1997, Annu. Rev. Neurosci., 20:269). Fluorescently labelled RNA is microinjected into the large oocyte cell where its location can be detected using standard microscopy methods. Polymorphic variants of a particular mRNA species may differ in their response to cellular mechanisms responsible for partitioning mRNA within the cell. This method has been useful for demonstrating that sequence variations can affect sub-cellular localization (Grimm et al., 1997, EMBO J., 16:793).
v. Post-Translational Alterations
Post-translational alterations resulting from premature stop codons, translational readthrough or multiple open reading frames and translational suppression may occur as a result of a polymorphism. To detect post-translational alterations, a polynucleotide comprising one or more polymorphisms is subjected to in vitro transcription and in vitro translation (as described in sections B and J entitled "Production of a Polynucleotide Sequence" and "Preparation of a Labeled Protein"). The translation product(s) are analysed for the appearance of aberrantly sized proteins. Additional post-translational alterations that may occur as a result of a polymorphism include changes in localization due to an altered signal sequence, and changes in glycosylation, myristoylation, and susceptibility to or sites of proteolytic cleavage. The method of immunocytochemistry can be used to determine if a protein is incorrectly localized due to the presence of an altered signal sequence.
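Before in vitro translation is carried out, the expected size of the translation product can be predicted from the sequence. The sketch below is an illustration only: it assumes the standard genetic code and a coding sequence that starts in frame at the initiator codon, and the two short sequences are hypothetical examples rather than sequences from this document.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_truncated(reference_cds, variant_cds):
    """True if the variant is predicted to give a shorter product than the reference."""
    return len(translate(variant_cds)) < len(translate(reference_cds))


# Hypothetical example: a single G-to-A change converts TGG (Trp) into TGA (stop).
reference_cds = "ATGGCTTGGAAATAA"
variant_cds = "ATGGCTTGAAAATAA"

print(translate(reference_cds), translate(variant_cds))
print(is_truncated(reference_cds, variant_cds))
```

A predicted truncation of this kind would be expected to appear as an aberrantly sized band when the translation products are analysed.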
Immunohistochemistry
Immunohistochemical techniques including indirect immunofluorescence, immunoperoxidase labeling or immunogold labeling, are used for protein localization.
Immunofluorescent labeling of tissue sections (prepared as for in situ analysis, described above) is performed by the following method. Slides containing the sample of interest are equilibrated to room temperature, washed in PBS, incubated with an appropriate dilution of primary antibody (1 hour at room temperature), washed in PBS, incubated with an appropriate dilution of secondary antibody (1 hour at room temperature), washed in PBS and analysed under a microscope (Ausubel et al., supra). Alternatively, the sensitivity of the immunohistochemical reaction is increased by using a streptavidin-secondary antibody conjugate reacted with a biotin-fluorochrome conjugate. Alternatively, immunogold labeling is used to detect a protein of interest by using an immunogold-conjugated secondary antibody.
Immunoperoxidase labeling of tissue sections is performed by the following method. Slides are pretreated in 0.25% hydrogen peroxide, incubated with primary antibody, washed in PBS and incubated (1 hour at room temperature) with a specific secondary bridging antibody capable of recognizing both the primary antibody and a horseradish peroxidase-antiperoxidase (PAP) complex. The slides are washed in PBS and developed in diaminobenzidine substrate solution (0.03% (w/v) 3,3'-diaminobenzidine in 200 ml PBS) at room temperature (Ausubel et al., supra).
Alternatively, protein localization is determined by cell fractionation wherein cells are biosynthetically labeled, the labeled material is fractionated, and the radiolabeled proteins in each fraction are analysed by immunoprecipitation with an antibody specific for the protein of interest.
Assay for Glycosylation Inhibition
Changes in protein glycosylation can be detected by radiolabelling a protein of interest with sugars, determining if a change in the cellular localization (by immunocytochemistry) of the protein in culture has occurred due to aberrant glycosylation, or by determining the effects of inhibitors of glycosylation on the migration pattern of proteins analysed by polyacrylamide gel electrophoresis.
Post-translational glycosylation of proteins plays an important role in defining protein function (Baeziger, 1994, FASEB J., 13:1019; Jacob, 1995, Curr. Opin. Struct. Biol., 5:605).
Protein glycosylation can be inhibited by tunicamycin, an antibiotic, as well as by several sugar analogues (Schwarz, 1991, Behring Inst Mitt., 89:198). These reagents are used to characterize the effects of sequence changes on protein glycosylation.
Assay for Post-Translational Modification with Lipids
Changes in protein modification with lipids (e.g. myristoylation) are detected by radiolabelling a protein of interest with myristic acid or by determining if a change in the cellular localization of the protein in culture has occurred as a result of aberrant lipid modification (by immunocytochemistry).
Covalent attachment of lipids is a mechanism by which eukaryotic cells direct and, in some cases, control membrane localization of proteins (Casey, 1994, Curr. Opin. Cell. Biol., 2:219). Such post-translational addition of myristyl, palmityl or prenyl side-chains has a key role in the functional regulation of many proteins (Chow et al., 1992, Curr. Opin. Cell. Biol., 4:629; Resh, 1994, Cell, 763:411). Assays for detecting proteins that are covalently modified by the attachment of lipids include labeling with [3H]myristate (Stevenson et al., 1992, J. Exp. Med., 176:1053), or a combination of enzymatic and chemical cleavage techniques performed in conjunction with tandem mass spectrometry to determine sites of modification (Papac et al., 1992, J. Biol. Chem., 267:16889).
Proteolytic Cleavage
Post-translational cleavage of polypeptides is an important mechanism for modulating protein function in many physiological processes. Protease activity is involved in zymogen processing, activation of enzyme catalysis, tissue/cell remodelling, signal transduction cascades, protein degradation and cell death pathways (Rappay, 1989, Prog Histochem Cytochem., 18:1). A protein that is predicted to be a protease or the target of a protease can be assayed in vitro using purified proteins or cell extracts (Muta et al., 1995, J. Biol. Chem. 270:892) where cleavage efficiency is monitored by standard PAGE or western blotting. Alternatively, proteases and/or their targets can be expressed from expression plasmids in in vivo cell culture systems in order to monitor their biological activity (Zhang et al., 1998, J. Biol. Chem. 273:1144). The specificity of proteolytic cleavage is determined using inhibitors that selectively block serine, cysteine, aspartic and metallo proteolytic activity (e.g. pepstatin A selectively inhibits aspartic proteases) (Rich et al., 1985, Biochemistry, 24:3165).
To determine if a protein has been modified such that the sites of proteolytic cleavage have been altered, or susceptibility to proteolytic cleavage has changed, pulse chase experiments with radiolabeled protein can be carried out to determine the precursor-product relationship following digestion with a protease of a given specificity. The method of pulse chase labeling is described in Ausubel et al., supra. Alternatively, inhibitors of proteases (e.g. acid proteases or serine proteases) can be used to identify protease cleavage sites.
vi. Changes in Receptor Properties
If the gene of interest encodes a receptor protein, a polymorphism may modify the properties of the receptor such that receptor binding/turnover or activation is altered. Receptor formation can be impaired if a polymorphism causes improper receptor localization or assembly.
Receptor Localization
To determine if a receptor protein is being expressed at the proper location (e.g. nucleus, cytoplasm, cell surface), the receptor can be localized by immunocytochemical techniques. Alternatively, cells that are expressing the receptor can be fractionated and subjected to Western blot analysis or biosynthetically labeled, fractionated and analysed by immunoprecipitation.
Protein-Protein Interactions/In Vitro Assembly Assays for Receptors
A number of methods can be used to determine if a receptor is colocalized with the appropriate protein partner.
The function of a protein may be dependent on the ability of the protein to interact with other proteins as part of a large complex. For example, certain cell surface receptors consist of a receptor complex that is composed of several homo- or heteromeric protein subunits, and activation by ligand can result in altered protein-protein interactions both within the receptor complex and with "downstream" targets such as G-proteins (Okada and Pessin, 1996, J. Biol. Chem., 271:25533). Protein-protein interactions can be assayed immunologically by co-immunoprecipitation of native (Gilboa et al., 1998, J. Biol. Chem., 140:767) or chemically cross-linked complexes (Haniu et al., 1997, J. Biol. Chem., 272:25296), or through protein-protein mobility shift assays (Stern and Frieden, 1993, Anal. Biochem., 212:221). If all of the components of a receptor complex have been identified, one can employ in vitro reconstitution assays to assess whether a single protein alteration can affect the functioning of the entire complex (Durovic et al., 1994, J. Biol. Chem., 269:30320).
Assay for In Vitro Assembly of Multimeric Protein Complexes
To determine whether these genetic variations have affected protein complex assembly, experiments are carried out wherein recombinant mutant subunits are transfected into cells and coexpressed with the other subunit components in vitro. Proper assembly is assessed by immunoprecipitation of the protein complex in question with antibodies specific for the various members of the complex followed by PAGE analysis (Koster et al., 1998, Biophys. J., 74:1821).
Assay for Receptor Binding/Turnover
Receptor-ligand interaction is essential for the functionality of the bound complex. Genetic changes that alter either ligand or receptor can dramatically affect receptor binding, turnover, and subsequent activation of downstream signaling events. Receptor binding/turnover can be measured by standard Scatchard analysis of radiolabelled ligand binding in vitro (Culouscou et al., 1993, J. Biol. Chem. 268:10458) or in cellular based assays (Greenlund et al., 1993, J. Biol. Chem. 268:18103).
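The Scatchard analysis mentioned above can be sketched numerically as follows. The example assumes a single class of non-interacting binding sites, for which bound/free plotted against bound is a straight line with slope -1/Kd and x-intercept Bmax. The data points are simulated, and the function names and units are hypothetical; this is an illustration, not the procedure of the cited references.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard_fit(bound, free):
    """Return (Kd, Bmax) from specific bound and free ligand concentrations.

    Single-site model: bound/free = (Bmax - bound) / Kd.
    """
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept = fit_line(bound, ratios)
    kd = -1.0 / slope
    bmax = intercept * kd  # x-intercept of the Scatchard line
    return kd, bmax


# Simulated data for a receptor with Kd = 2.0 nM and Bmax = 100 fmol (illustrative).
free_nM = [0.5, 1.0, 2.0, 4.0, 8.0]
bound_fmol = [100.0 * f / (2.0 + f) for f in free_nM]

kd_nM, bmax_fmol = scatchard_fit(bound_fmol, free_nM)
print(f"Kd = {kd_nM:.2f} nM, Bmax = {bmax_fmol:.1f} fmol")
```

A shift in the fitted Kd or Bmax between the reference and variant receptor would suggest that the polymorphism alters ligand affinity or the number of available binding sites, respectively.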
Ligand Binding as Measured by Affinity Chromatography
Alternatively, affinity chromatography methods (well known in the art) can be employed to determine if a receptor is demonstrating aberrant binding characteristics. According to the method of affinity chromatography, receptor-ligand interactions are allowed to occur, and the binding efficiency of receptor and ligand and/or turnover of receptor-ligand complexes is measured. Alternatively, affinity chromatography can be used to isolate one or more components of a receptor-ligand interaction for further analysis (March et al., 1974, Adv. Exp. Med. Biol., 42:3). The method of affinity chromatography typically involves immobilizing on a solid support one component, for example a known ligand for a receptor, and then incubating the immobilized ligand with radiolabelled protein under optimal binding conditions. To measure the exact binding affinity of a given ligand-receptor pair, an increasing amount of non-labeled competitor is added. This assay can be used to assess altered binding efficiency resulting from the presence of a polymorphism in a protein of interest.
Receptor Activation Assays: Phosphorylation, Kinase Activity and Mitogenic Stimulation
Almost all signaling that occurs through cell surface receptors is regulated by phosphorylation, a reversible post-translational event that occurs at specific amino acid residues and is catalyzed by a protein kinase activity present within the receptor itself (autophosphorylation) or in trans via direct interaction with an associated kinase (Hunter, 1997, Philos Trans R Soc Lond B Biol Sci., 353:583). The specific effect of phosphorylation on a biological activity depends on the receptor, but often results in modulation of endogenous receptor kinase activity or interaction with associated proteins, which are also often kinases. The results of a phosphorylation event are passed on through a cascade of protein kinases/phosphatases which ultimately affect downstream processes controlling gene transcription, cell proliferation, metabolism, movement and differentiation (Patarca, 1996, Crit Rev Oncog., 7:343). The biological function of a receptor is usually assayed in cell culture following over-expression. The phosphorylated state of a receptor can be assayed directly by immunological methods by employing an antibody that specifically recognizes a phosphorylated residue (Bangalore, 1992, Proc Natl Acad Sci USA., 89:11637). Endogenous kinase activity associated with a receptor is measured via the incorporation of radiolabelled phosphate in immunoprecipitated receptor complex (Kazlauskas and Cooper, 1989, Cell 58:1121). "Downstream" events of receptor activity, including mitogenic stimulation or MAP kinase activity, can be measured by tritiated thymidine incorporation (Luo et al., 1996, Cancer Res. 56:4983), or by mobility-shift analysis of MAP kinase on western blots (Vietor, 1993, J. Biol. Chem. 268:18994), respectively.
Immunocytochemical methods can be used to determine if a receptor-ligand complex is correctly translocated to the nucleus. Alternatively, nuclear preparations (prepared as described below) can be analysed by Western blot or immunoprecipitation for the presence of the receptor protein.
If a receptor is a transcriptional activator, the ability of the receptor to induce gene expression can be measured by a variety of methods including Northern blot analysis, or reporter gene assays wherein the promoter region isolated from a gene that is activated by the receptor regulates the expression of a reporter protein.
vii. Enzyme Catalysis
The gene of interest may encode a protein that has an enzymatic activity wherein the enzyme catalyzes a reaction that is critical to the general metabolism of a cell. To determine if a mutated protein is impaired in its enzymatic function, assays can be performed to measure the enzymatic activity of the protein. There are many important enzymatic activities associated with normal cellular metabolism, including glycosidation, esterification, amidation, hydroxylation, acetylation, sulfonylation and alkylation. Each of these activities is assayed using in vitro methods, well known in the art, that employ overexpressed or purified proteins (Eisenthal and Danson, 1992, Enzyme Assays: A Practical Approach, Rickwood et al., Eds., IRL Press. Oxford, England).
The protein of interest may also be involved in various aspects of DNA synthesis or replication. In vitro assays for the enzymatic reactions involved in DNA synthesis or replication (e.g. polymerase, ligase, exonuclease or helicase activity) are known in the art. The biological activity of the proteins catalyzing these reactions is assayed in vitro using standard enzymatic techniques (Adams, 199, DNA Replication: A Practical Approach I, Rickwood et al., Eds., IRL Press. Oxford, England).
If the protein of interest is involved in glycolysis or energy transport, assays for measuring transporter activity or the activity of ATP dependent pumps are useful, according to the invention, for determining if a mutated protein is impaired in these functions.
Transporter Activity
Mammalian cells possess a variety of transporter systems, for example amino acid transporters, which have overlapping substrate specificity (Van Winkle et al., 1993, Biochim Biophys Acta, 1154:157). To determine if a polymorphism in a candidate gene of interest has altered the function of the protein product of this gene as a molecular transporter, the full-length cDNA clone is isolated by standard expression cloning strategies, and a change in activity of the full-length cDNA or antisense cDNA upon microinjection into Xenopus laevis oocytes is determined by measuring changes in influx/efflux transport of radiolabelled amino acid molecules (Broer et al., 1995, Biochem J., 312(Pt 3):863), neurotransmitters or their metabolites.
ATP-Dependent Pump Activity
Mammalian cells possess a variety of molecules that are categorized as ATP-binding cassette or ATP-dependent transporters or pumps. These include the Na+-K+-ATPase ion pump, the calcium uptake pump, (K+ + H+)-ATPase and the human multidrug resistance protein termed P-glycoprotein. Alterations in pump activity are investigated by expressing the clone specific for the pump protein(s) of interest in Xenopus oocytes, and performing tracer studies which measure the changes in ATP-dependent uptake or extrusion of a radiolabelled substrate, and changes in the coupling ratios (e.g. moles substrate transported/mole ATP hydrolyzed) (Shapiro et al., 1998, Eur. J. Biochem., 254:189).
viii. Ion Channel
The gene of interest may encode a protein that is a component of an ion channel. Immunocytochemical methods can be used to determine if an ion channel protein demonstrates the appropriate cell type specificity.
The activity of an ion channel can be measured by electrophysiological methods in oocytes. Alternatively, the sensitivity of ion channel activity to a particular inhibitor can be determined.
Assays for Ion Channel Activity in Oocytes
Polymorphisms which alter ion channel function and regulation are studied using the oocytes of Xenopus laevis. Injection of the oocytes with exogenous in vitro transcribed mRNA results in the production and functional expression of foreign membrane proteins, including voltage- and neurotransmitter-operated ion channels (Dascal et al., 1987, CRC Crit Rev Biochem., 224:317). Changes in the oocyte transmembrane current in response to expression of an exogenous mRNA are measured. This technique has been improved by the development of rapid superfusion systems that utilize a dual role perfusion micropipette that controls internal solution as well as monitoring voltage (Costa et al., 1994, Biophys J., 67:395). This technology represents a useful system for studying various aspects of ion channels encoded by foreign mRNAs including channel expression, single-channel behavior, and the response of channels to the action of pharmacologically active substances (Sigel, 1987, J. Physiol., 386:73).
Patch Clamp Assays for Ion Channel Activity.
The function of individual channel proteins is determined by the high resolution patch clamp technique. This technique (which is useful in a variety of cell types, including Xenopus oocytes described above) involves measuring changes in transmembrane current across the cell membrane in vitro (Sachs et al., 1983, Methods Enzymol., 103:147). Processes such as signaling, secretion, and synaptic transmission are examined at the cellular level by the patch clamp method. The gene expression pattern and protein structure of ionic channels can be determined by combining information derived from high-resolution electrophysiological recordings obtained by the patch clamp method with molecular biological analysis (Liem et al., 1995, Neurosurgery, 36:382).
A polymorphic variation in a gene that encodes a protein that is a member of a multimeric protein complex, such as an ion channel or a cytoskeletal structural component, can alter the assembly and function of the multimeric protein complex (Lee et al., 1994, Biophys J., 66:667).
A gene variation may affect protein-protein interaction, or disrupt the production of components of a multimeric complex, thereby disrupting stoichiometry and consequently decreasing stability.
Assay for In Vitro Assembly of Multimeric Protein Complexes
In vitro assembly assays (described above) can be performed to determine if a polymorphism has affected the assembly of an ion channel.
ix. Cellular Properties
The influence of a polymorphism on general aspects of cell behavior, including cell morphology, adhesive properties, differentiation and proliferation can be assessed using a combination of methods including microscopic observation of cell cultures (Azuma et al., 1994, Histol. Histopathol., 9:781), immunohistochemistry, and FACS analysis techniques (Beesley, 1993, Immunocytochemistry: A Practical Approach, Rickwood et al., (Eds), IRL Press and Ormerod, 1994, Flow Cytometry: A Practical Approach, Rickwood et al., (Eds), IRL Press. Oxford, England).
Assays for Measuring Apoptosis
Apoptosis has been implicated in the etiology and pathophysiology of a variety of human diseases. Gene variants which influence the process of apoptosis can be assessed by a variety of methods of analysis involving either the tissues or cells (Allen et al., 1997, J Pharmacol Toxicol Methods, 37:215). Cell cultures expressing the gene variants of interest are analysed using Annexin V, which interacts strongly with phosphatidylserine residues that have been exposed as a result of plasma membrane breakdown occurring in the early stages of apoptosis. Either vital or fixed material can be analysed by Annexin V labeling in combination with microscopy and flow cytometry detection methods (van Engeland et al., 1998, Cytometry, 31:1). TdT-mediated deoxyuridine triphosphate (dUTP)-biotin nick end-labeling (TUNEL) is a preferred method for specific staining of apoptotic cells in histological sections and cytology specimens (Labat-Moleur et al., 1998, J. Histochem Cytochem., 46:327; Sasano et al., 1998, Diagn Cytopathol., 18:398). Apoptosis is also detected by quantification of DNA fragmentation by ethidium bromide staining and gel electrophoresis, or by the use of saturation labeling of 3' ends of DNA fragments (Peng and Liu, 1997, Lab Invest., 77:547).
Assay for In Vivo Receptor Function: Growth Cone Guidance Assay
Activation of cell-surface receptors can result in the stimulation of cell motility. There are many different families of signaling molecules, for example the netrins (Serafini et al., 1994, Cell 78:409), which are responsible for both contact-mediated and chemo-mediated attraction and repulsion of migrating cells. A classic model for this activity is the trajectory that the leading edge "growth cone" takes when a neuron is stimulated to grow out from explanted neural tissue in cell culture (Goodman, 1996, Annu Rev Neurosci. 19:341). Ligands present in the culture medium or immobilized on a substrate bind to receptors on the cell surface of the growth cone and trigger second-messenger signals, thereby dictating an appropriate steering response. The biological activity of such receptors or ligands can be measured by overexpressing the receptor or ligand protein in culture and then monitoring growth cone guidance (Kremoser et al., 1995, Cell 82:359). Attraction or repulsion of cells which is observed to be different from normal is an indication of the role of this protein in growth guidance, and identifies the polymorphisms as altering function.
x. Changes in gene expression or protein function that result from the presence of a polymorphism can be detected by in vivo assays including the production of transgenic animals, knock out animals or the analysis of naturally occurring animal models of a particular disease.
Transgenic Animals
Transgenic mice provide a useful tool for genetic and developmental biology studies and for the determination of a function of a novel sequence. According to the method of conventional transgenesis, additional copies of normal or modified genes are injected into the male pronucleus of the zygote and become integrated into the genomic DNA of the recipient mouse. The transgene is transmitted in a Mendelian manner in established transgenic strains.
Constructs useful for creating transgenic animals comprise genes under the control of either their normal promoters or an inducible promoter, reporter genes under the control of promoters to be analysed with respect to their patterns of tissue expression and regulation, and constructs containing dominant mutations, mutant promoters, and artificial fusion genes to be studied with regard to their specific developmental outcome. Transgenic mice are useful according to the invention for analysis of the dominant effects of overexpressing a candidate gene in mouse. Typically, DNA fragments on the order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998, New. Anat, 253:19). Transgenic animals can be created with a construct comprising a candidate gene containing one or more polymorphisms according to the invention. Alternatively, a transgenic animal expressing a candidate gene containing a single polymorphism can be crossed to a second transgenic animal expressing a candidate gene containing a different polymorphism and the combined effects of the two polymorphisms can be studied in the offspring animals. Transgenic mice engineered to overexpress a number of genes, including PCK1 (Valera et al., 1994, Proc. Natl. Acad. Sci. USA, 91:9151), INS (Mitanchez et al., FEBS Letters, 421:285), IAPP (D'Alession et al., 1994, Osteoporosis, 43:1457), Asp (Klebig et al., Proc. Natl. Acad. Sci. USA, 92:4728) and Agrt (Graham et al., Nature Genetics, 17:273), have been prepared and may be useful for studying osteoporosis.
Knock Out Animals
i. Standard
Knock out animals are produced by the method of creating gene deletions with homologous recombination. This technique is based on the development of embryonic stem (ES) cells that are derived from embryos, are maintained in culture and have the capacity to participate in the development of every tissue in the mouse when introduced into a host blastocyst. A knock out animal is produced by directing homologous recombination to a specific target gene in the ES cells, thereby producing a null allele of the gene. The potential phenotypic consequences of this null allele (either in heterozygous or homozygous offspring) can be analysed (Reeves, supra). Single or double knock out mice that may be useful for studying osteoporosis have been produced for a number of genes including IRS1 (Araki et al., 1994, Nature, 372:186; Tamemoto et al., 1994, Nature, 372:182), IRS2 (Withers et al., 1998, Nature, 391:900), INSR, BIRKO, MIRKO, INSR (Lamothe et al., 1998, FEBS Letter, 426:381), GLUT2, GLUT4 (Katz et al., 1995, Nature, 377:151), GLP1R (Gallwitz and Schmidt, 1997, Z. Gastroenterol, 35:655), GCK (Sakura et al., 1998, Diabetologia, 41:654), GCK/IRS1, IRS1/INSR, MC4R (Huszar et al., 1997, Cell, 88:131) and BRS3 (Ohki-Hamazaki et al., 1997, Nature, 390:165).
ii. In vivo Tissue Specific Knock Out in Mice Using Cre-lox.
The method of targeted homologous recombination has been improved by the development of a system for site-specific recombination based on the bacteriophage P1 site-specific recombinase Cre. The Cre-loxP site-specific DNA recombinase from bacteriophage P1 is used in transgenic mouse assays in order to create gene knockouts restricted to defined tissues or developmental stages. Regionally restricted genetic deletion, as opposed to global gene knockout, has the advantage that a phenotype can be attributed to a particular cell/tissue (Marth, 1996, Clin. Invest. 97:1999). In the Cre-loxP system one transgenic mouse strain is engineered such that loxP sites flank one or more exons of the gene of interest. Homozygotes for this so-called "floxed gene" are crossed with a second transgenic mouse that expresses the Cre gene under control of a cell/tissue type transcriptional promoter. Cre protein then excises DNA between loxP recognition sequences and effectively removes target gene function (Sauer, 1998, Methods, 14:381). There are now many in vivo examples of this method, including the inducible inactivation of mammary tissue specific genes (Wagner et al., 1997, Nucleic Acids Res., 25:4323).
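The excision step described above can be pictured with a toy string model. The sketch below is an illustration only: it treats DNA as a plain string, assumes exactly two loxP sites in the same orientation, and models excision as removal of the intervening segment together with one site, leaving a single loxP behind. The 34 bp loxP sequence is the commonly cited consensus and is not given in this text; the flanking and exon sequences are hypothetical.

```python
# Commonly cited 34 bp loxP site (13 bp repeat, 8 bp spacer, 13 bp repeat).
# Assumed here for illustration; the sequence does not appear in this text.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(seq, site=LOXP):
    """Model Cre excision between two directly repeated sites.

    Returns seq unchanged unless two copies of the site are found; otherwise
    removes the segment between them plus one copy of the site.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[:first + len(site)] + seq[second + len(site):]


# Hypothetical floxed allele: upstream flank, loxP, exon, loxP, downstream flank.
upstream, exon, downstream = "GGGCCC", "ATGAAACCCGGGTTT", "TTTAAA"
floxed_allele = upstream + LOXP + exon + LOXP + downstream

recombined = cre_excise(floxed_allele)
print(len(floxed_allele), len(recombined))
```

In this model the allele is unchanged in cells that lack Cre, which corresponds to the tissue restriction achieved by placing Cre under a cell/tissue type promoter.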
iii. BAC Rescue of Knock Out Phenotype
In order to verify that a particular genetic polymorphism/mutation is responsible for altered protein function in vivo, one can "rescue" the altered protein function by introducing a wild-type copy of the gene in question. In vivo complementation with bacterial artificial chromosome (BAC) clones expressed in transgenic mice can be used for these purposes. This method has been used for the identification of the mouse circadian Clock gene (Antoch et al., 1997, Cell 89:655).
G. Production of an Amplified Product
Amplified products useful according to the invention can be prepared by utilizing the method of PCR as described in Section B entitled "Production of a Polynucleotide Sequence".
Primers useful for producing an amplified product according to the invention (e.g. an amplified product comprising one or more polymorphisms) can be designed and synthesized as described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
The invention provides methods (e.g. Southern blot analysis, PCR, primer extension and oligonucleotide hybridization), of detecting a polymorphism in an amplified product.
H. Production of a Mutant Protein
1. Expression of the Nucleotide Sequence
In accordance with the present invention, polynucleotide sequences which encode candidate gene protein fragments, fusion proteins or functional equivalents thereof may be used in recombinant DNA molecules that direct the expression of a candidate gene protein in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be used to clone and express the candidate gene protein. As will be understood by those of skill in the art, it may be advantageous to produce candidate gene-encoding nucleotide sequences possessing non-naturally occurring codons. Codons preferred by a particular prokaryotic or eukaryotic host (Murray et al., 1989, Nucleic Acid Res 17:477) can be selected, for example, to increase the rate of protein expression or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life as compared to transcripts produced from the naturally occurring sequence.
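The use of host-preferred codons described above can be sketched as a simple recoding step that exploits the degeneracy of the genetic code. This is a minimal illustration under stated assumptions: the `PREFERRED_CODON` table is a hypothetical placeholder rather than measured codon usage for any host, the coding sequence is an invented example, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical host preferences (placeholder values for illustration only).
PREFERRED_CODON = {"M": "ATG", "A": "GCG", "W": "TGG", "K": "AAA", "L": "CTG"}


def codons(cds):
    """Split an in-frame coding sequence into upper-case codons."""
    return [cds[pos:pos + 3].upper() for pos in range(0, len(cds) - 2, 3)]


def translate(cds):
    """Translate every codon, writing stop codons as '*'."""
    return "".join(CODON_TABLE[codon] for codon in codons(cds))


def recode(cds, preferred):
    """Swap each codon for the host-preferred synonym, if one is listed."""
    return "".join(preferred.get(CODON_TABLE[codon], codon) for codon in codons(cds))


native_cds = "ATGGCTTGGAAGTTATAA"  # hypothetical example sequence
recoded_cds = recode(native_cds, PREFERRED_CODON)

print(recoded_cds)
print(translate(native_cds) == translate(recoded_cds))
```

Codons for amino acids missing from the preference table, and the stop codon, are left unchanged, so the encoded amino acid sequence is preserved by construction.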
The nucleotide sequences of the present invention can be engineered in order to alter a candidate gene-encoding sequence for a variety of reasons, including but not limited to alterations which modify the cloning, processing and/or expression of the gene product. For example, mutations may be introduced using techniques which are well known in the art, e.g., site-directed mutagenesis to insert new restriction sites, to alter glycosylation patterns, to change codon preference or to produce splice variants. In another embodiment of the invention, a natural, modified or recombinant candidate gene protein-encoding sequence may be ligated to a heterologous sequence to encode a fusion protein (as described in Section B entitled "Production of a Polynucleotide Sequence"). For example, for screening of peptide libraries for inhibitors of candidate gene protein activity, it may be useful to encode a chimeric protein that is recognized by a commercially available antibody. A fusion protein may also be engineered to contain a cleavage site located between a candidate protein and the heterologous protein sequence, so that the protein of interest may be substantially purified away from the heterologous moiety following cleavage.
In another embodiment of the invention, the sequence encoding the candidate gene protein may be synthesized, whole or in part, using chemical methods well known in the art (see Caruthers et al., 1980, Nuc Acids Res Symp Ser 7:215; Horn et al., 1980, Nuc Acids Res Symp Ser 225, etc.). Alternatively, the protein itself, or a portion thereof, could be produced using chemical methods of synthesis. For example, peptide synthesis can be performed using various solid-phase techniques (Roberge et al., 1995, Science 269:202) and automated synthesis may be achieved, for example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.
The newly synthesized peptide can be substantially purified by preparative high performance liquid chromatography (e.g., Creighton, 1983, Proteins, Structures and Molecular Principles, WH Freeman and Co., New York NY). The composition of the synthetic peptides may be confirmed by amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra). Additionally, the amino acid sequence of interest, or any part thereof, may be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins, or any part thereof, to produce a variant polypeptide.
2. Expression Systems
In order to express a biologically active protein, the nucleotide sequence encoding the protein of interest, or its functional equivalent, is inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted coding sequence. Methods which are well known to those skilled in the art can be used to construct expression vectors containing a protein-encoding sequence and appropriate transcriptional or translational controls. These methods include in vivo recombination or genetic recombination. Such techniques are described in Ausubel et al., supra and Sambrook et al., supra.
A variety of expression vector/host systems may be utilized to contain and express a protein product of a candidate gene according to the invention. These include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with virus expression vectors (e.g., baculovirus); plant cell systems transfected with virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with bacterial expression vectors (e.g., Ti or pBR322 plasmid); or animal cell systems.
The "control elements" or "regulatory sequences" of these systems vary in their strength and specificities and are those nontranslated regions of the vector, enhancers, promoters, and 3' untranslated regions, which interact with host cellular proteins to carry out transcription and translation. Depending on the vector system and host utilized, any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used. For example, when cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the Bluescript® phagemid (Stratagene, La Jolla CA) or pSport1 (Gibco BRL) and ptrp-lac hybrids and the like may be used. The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers derived from the genomes of plant cells (e.g., heat shock, RUBISCO and storage protein genes) or from plant viruses (e.g., viral promoters or leader sequences) may be cloned into the vector. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are most appropriate. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding the protein product of the gene of interest, vectors based on SV40 or EBV may be used with an appropriate selectable marker.
In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the protein of interest. For example, when large quantities of a protein are required for the production of antibodies, vectors which direct high level expression of fusion proteins that are readily purified may be desirable. Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as Bluescript® (Stratagene), in which the sequence encoding the protein of interest may be ligated into the vector in frame with sequences encoding the amino-terminal Met and the subsequent 27 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke & Schuster, 1989, J Biol Chem 264:5503); and the like. pGEX vectors (Promega, Madison WI) may also be used to express foreign polypeptides as fusion proteins with GST. In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins made in such systems are designed to include heparin, thrombin or factor XA protease cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at will. In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase and PGH may be used. For reviews, see Ausubel et al. (supra) and Grant et al., 1987, Methods in Enzymology 153:516.
In cases where plant expression vectors are used, the expression of a sequence encoding a protein of interest may be driven by any of a number of promoters. For example, viral promoters such as the 35S and 19S promoters of CaMV (Brisson et al., 1984, Nature 310:511) may be used alone or in combination with the omega leader sequence from TMV (Takamatsu et al., 1987, EMBO J 6:307). Alternatively, plant promoters such as the small subunit of RUBISCO (Coruzzi et al., 1984, EMBO J 3:1671; Broglie et al., 1984, Science 224:838) or heat shock promoters (Winter J and Sinibaldi RM, 1991, Results Probl Cell Differ 17:85) may be used. These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. For reviews of such techniques, see Hobbs S or Murry LE in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp 191-196 or Weissbach and Weissbach (1988) Methods for Plant Molecular Biology, Academic Press, New York, pp 421-463.
An alternative expression system which could be used to express a protein of interest is an insect system. In one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. The sequence encoding the protein of interest may be cloned into a nonessential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the sequence encoding the protein of interest will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The recombinant viruses are then used to infect S. frugiperda cells or Trichoplusia larvae in which the protein of interest is expressed (Smith et al., 1983, J Virol 46:584; Engelhard et al., 1994, Proc Natl Acad Sci 91:3224).
In mammalian host cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, a sequence encoding the protein of interest may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a nonessential E1 or E3 region of the viral genome will result in a viable virus capable of expressing the protein in infected host cells (Logan and Shenk, 1984, Proc Natl Acad Sci 81:3655). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.
Specific initiation signals may also be required for efficient translation of a sequence encoding the protein of interest. These signals include the ATG initiation codon and adjacent sequences. In cases where the sequence encoding the protein, its initiation codon and upstream sequences are inserted into the most appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon must be provided. Furthermore, the initiation codon must be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate to the cell system in use (Scharf et al., 1994, Results Probl Cell Differ 20:125; Bittner et al., 1987, Methods in Enzymol 153:516).
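The reading-frame requirement described above can be illustrated with a short check on a candidate coding insert. This is a minimal sketch under stated assumptions: the function name and the example sequences are hypothetical, the insert is assumed to begin at its intended initiation codon, and only the three standard stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_insert(cds: str) -> list[str]:
    """Return a list of problems that would prevent full-length translation."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("no ATG initiation codon at the 5' end")
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons in the frame set by the first base.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon is expected only as the final codon of the insert.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index}")
            break
    return problems


if __name__ == "__main__":
    print(check_insert("ATGAAACTGTAA"))   # in frame, stop only at the end
    print(check_insert("ATGTAAAAATAA"))   # stop codon in second position
```

An empty list indicates that the insert, as read from its first base, would be translated through to its final codon.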
In addition, a host cell strain may be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation and acylation. Post-translational processing which cleaves a "prepro" form of the protein may also be important for correct insertion, folding and/or function. Different host cells such as CHO, HeLa, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms for such post-translational activities and may be chosen to ensure the correct modification and processing of the introduced, foreign protein.
For long-term, high-yield production of recombinant proteins, stable expression is preferred. For example, cell lines which stably express a foreign protein may be transformed using expression vectors which contain viral origins of replication or endogenous expression elements and a selectable marker gene. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an enriched medium before they are switched to selective medium. The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clumps of stably transformed cells can be expanded using tissue culture techniques appropriate to the cell type.
Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler et al., 1977, Cell 11:223) and adenine phosphoribosyltransferase (Lowy et al., 1980, Cell 22:817) genes, which can be employed in tk- or aprt- cells, respectively. Also, antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad Sci 77:3567); npt, which confers resistance to the aminoglycosides neomycin and G-418 (Colbere-Garapin et al., 1981, J Mol Biol 150:1); and als or pat, which confer resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively (Murry, supra). Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman and Mulligan, 1988, Proc Natl Acad Sci 85:8047). Recently, the use of visible markers has gained popularity with such markers as anthocyanins, β-glucuronidase and its substrate, GUS, and luciferase and its substrate, luciferin, being widely used not only to identify transformants, but also to quantify the amount of transient or stable protein expression attributable to a specific vector system (Rhodes et al., 1995, Methods Mol Biol 55:121).
3. Identification of Transformants Containing the Polynucleotide Sequence
Although the presence/absence of marker gene expression suggests that the gene of interest is also present, its presence and expression should be confirmed. For example, if the sequence encoding a foreign protein is inserted within a marker gene sequence, recombinant cells containing the sequence encoding the foreign protein can be identified by the absence of marker gene function. Alternatively, a marker gene can be placed in tandem with the sequence encoding the foreign protein under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem sequences as well.
Alternatively, host cells which contain the coding sequence for a protein of interest and express the protein of interest may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of the nucleic acid or protein. The presence of the polynucleotide sequence encoding the protein of interest can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes, portions or fragments of the sequence encoding the foreign protein of interest.
A variety of protocols for detecting and measuring the expression of the foreign protein, using either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence-activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on the protein of interest is preferred, but a competitive binding assay may be employed. These and other assays are described in Hampton et al., 1990, Serological Methods, a Laboratory Manual, APS Press, St Paul MN and Maddox et al., 1983, J Exp Med 158:1211.
4. Purification of the Protein of Interest
Host cells transformed with a nucleotide sequence encoding a protein of interest may be cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. The protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing a sequence encoding a protein of interest can be designed with signal sequences which direct secretion of the protein of interest through a prokaryotic or eukaryotic cell membrane. Other recombinant constructions may join the sequence encoding the protein of interest to the nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins (Kroll et al., 1993, DNA Cell Biol 12:441).
The protein of interest may also be expressed as a recombinant protein with one or more additional polypeptide domains added to facilitate protein purification. Such purification-facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle WA). The inclusion of a cleavable linker sequence, such as Factor XA or enterokinase (Invitrogen, San Diego CA), between the purification domain and the protein of interest is useful for facilitating purification. One such expression vector provides for expression of a fusion protein comprising the sequence encoding a foreign protein and a nucleic acid sequence encoding 6 histidine residues followed by thioredoxin and an enterokinase cleavage site. The histidine residues facilitate purification while the enterokinase cleavage site provides a means for purifying the foreign protein from the fusion protein.
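The layout of such a fusion protein, and the release of the foreign protein at the enterokinase site, can be sketched at the amino acid level as follows. This is illustrative only: the carrier string is a hypothetical stand-in and not the real thioredoxin sequence, the function names are assumptions, and the enterokinase site is taken to be the commonly cited Asp-Asp-Asp-Asp-Lys motif with cleavage after the lysine.

```python
HIS_TAG = "HHHHHH"        # six histidine residues for metal-chelate purification
CARRIER = "GSGSGSGSGS"    # hypothetical placeholder, NOT the thioredoxin sequence
EK_SITE = "DDDDK"         # enterokinase recognition motif; cut follows the K


def build_fusion(protein: str) -> str:
    """Assemble Met + His tag + carrier + enterokinase site + foreign protein."""
    return "M" + HIS_TAG + CARRIER + EK_SITE + protein


def cleave(fusion: str) -> tuple[str, str]:
    """Split the fusion after the first enterokinase site (tag part, released part)."""
    cut = fusion.index(EK_SITE) + len(EK_SITE)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    fusion = build_fusion("ACDEFG")      # made-up foreign peptide
    tag_part, released = cleave(fusion)
    print(tag_part, released)
```

Placing the cleavage site immediately upstream of the foreign protein means that the released product carries no residues from the purification domain.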
In addition to recombinant production, fragments of the protein of interest may be produced by direct peptide synthesis using solid-phase techniques (Stewart et al., 1969, Solid-Phase Peptide Synthesis, WH Freeman Co., San Francisco; Merrifield, 1963, J Am Chem Soc 85:2149). In vitro protein synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using the Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer, Foster City CA) in accordance with the instructions provided by the manufacturer. Various fragments of a protein of interest may be chemically synthesized separately and combined using chemical methods to produce the full length molecule.
I. Preparation of Antibodies
Antibodies specific for the protein products of the candidate genes of the invention are useful for protein purification, for the diagnosis and treatment of various diseases (e.g., osteoporosis) and for drug screening and drug design methods useful for identifying and developing compounds to be used in the treatment of various diseases (e.g., osteoporosis). By antibody, we include constructions using the binding (variable) region of such an antibody, and other antibody modifications. Thus, an antibody useful in the invention may comprise a whole antibody, an antibody fragment, a polyfunctional antibody aggregate, or in general a substance comprising one or more specific binding sites from an antibody. The antibody fragment may be a fragment such as an Fv, Fab or F(ab')2 fragment or a derivative thereof, such as a single chain Fv fragment. The antibody or antibody fragment may be non-recombinant, recombinant or humanized. The antibody may be of an immunoglobulin isotype, e.g., IgG, IgM, and so forth. In addition, an aggregate, polymer, derivative and conjugate of an immunoglobulin or a fragment thereof can be used where appropriate. Neutralizing antibodies are especially useful according to the invention for diagnostics, therapeutics and methods of drug screening and drug design. Although a protein product (or fragment or oligopeptide thereof) of a candidate gene of the invention that is useful for the production of antibodies does not require biological activity, it must be antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids and preferably at least 10 amino acids. Preferably, they should be identical to a region of the natural protein and may contain the entire amino acid sequence of a small, naturally occurring molecule.
Short stretches of amino acids corresponding to the protein product of a candidate gene of the invention may be fused with amino acids from another protein such as keyhole limpet hemocyanin or GST, and antibody will be produced against the chimeric molecule. Procedures well known in the art can be used for the production of antibodies to the protein products of the candidate genes of the invention. For the production of antibodies, various hosts including goats, rabbits, rats, mice, etc. may be immunized by injection with the protein products (or any portion, fragment, or oligopeptide thereof which retains immunogenic properties) of the candidate genes of the invention. Depending on the host species, various adjuvants may be used to increase the immunological response. Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol. BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are potentially useful human adjuvants.
1. Polyclonal antibodies
The antigen protein may be conjugated to a conventional carrier in order to increase its immunogenicity, and an antiserum to the peptide-carrier conjugate will be raised. Coupling of a peptide to a carrier protein and immunizations may be performed as described (Dymecki et al., 1992, J. Biol. Chem., 267: 4815). The serum can be titered against protein antigen by ELISA (below) or alternatively by dot or spot blotting (Boersma and Van Leeuwen, 1994, J. Neurosci. Methods, 51: 317). At the same time, the antiserum may be used in tissue sections prepared as described. A useful serum will react strongly with the appropriate peptides by ELISA, for example, following the procedures of Green et al., 1982, Cell, 28: 477.
2. Monoclonal antibodies
Techniques for preparing monoclonal antibodies are well known, and monoclonal antibodies may be prepared using a candidate antigen whose level is to be measured or which is to be either inactivated or affinity-purified, preferably bound to a carrier, as described by Arnheiter et al., 1981, Nature, 294: 278.
Monoclonal antibodies are typically obtained from hybridoma tissue cultures or from ascites fluid obtained from animals into which the hybridoma tissue was introduced.
Monoclonal antibody-producing hybridomas (or polyclonal sera) can be screened for antibody binding to the target protein.
3. Antibody Detection Methods
Particularly preferred immunological tests rely on the use of either monoclonal or polyclonal antibodies and include enzyme-linked immunoassays (ELISA), immunoblotting and immunoprecipitation (see Voller, 1978, Diagnostic Horizons, 2:1, Microbiological Associates Quarterly Publication, Walkersville, MD; Voller et al., 1978, J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No. 31,006; UK Patent 2,019,408; Butler, 1981, Methods Enzymol., 73: 482; Maggio, E. (ed.), 1980, Enzyme Immunoassay, CRC Press, Boca Raton, FL) or radioimmunoassays (RIA) (Weintraub, B., Principles of radioimmunoassays, Seventh Training Course on Radioligand Assay Techniques, The Endocrine Society, March 1986, pp. 1-5, 46-49 and 68-78). For analysing tissues for the presence or absence of a protein produced by a candidate gene according to the present invention, immunohistochemistry techniques may be used. It will be apparent to one skilled in the art that the antibody molecule may have to be labelled to facilitate easy detection of a target protein. Techniques for labelling antibody molecules are well known to those skilled in the art (see Harlow and Lane, 1989, Antibodies, Cold Spring Harbor Laboratory).
J. Preparation of a Labeled Protein
1. Labeling of protein
Labeling techniques are useful, according to the invention, for studying the biochemical properties, processing, intracellular transport, secretion and degradation of proteins.
Biosynthetic labeling of proteins produced by candidate genes of the invention is preferably performed with 35S-methionine due to the high specific activity (>800 Ci/mmol) and ease of detection of this amino acid. Another amino acid should be used to label a protein that contains little or no methionine. According to the following protocol, either suspension cells or adherent cells are labeled with 35S-methionine. Briefly, cells are washed and incubated for 15 min at 37°C in short-term labeling medium (complete serum-free, methionine-free RPMI or DMEM containing 5% (v/v) dialyzed fetal bovine serum) to deplete intracellular pools of methionine. Cells are then incubated in the presence of 35S-methionine working solution (0.1 to 0.2 mCi/ml in 37°C short-term labeling medium) such that 4 ml of 35S-methionine working solution is added per 2 x 10^7 suspension cells and 2 to 4 ml of 35S-methionine working solution is added per 100 mm dish of adherent cells (0.5-2 x 10^7 cells), for a period of 30 min to 3 hours in a humidified, 37°C, 5% CO2 incubator. Upon completion of labeling, suspension cells are washed by centrifugation in ice-cold PBS. Following removal of labeling medium, adherent cells are washed with PBS, scraped and collected by centrifugation. Labeled cells are processed and analysed by immunoaffinity chromatography, immunoprecipitation and one- and two-dimensional gel electrophoresis (Ausubel et al., supra).
If the protein of interest is synthesized at a relatively low rate or is in a steady state, it may be necessary to label cells for an extended period of time. When performing long-term biosynthetic labeling of cells, it is necessary to include unlabeled methionine in the medium to maintain cell viability and to ensure that incorporation of label is maintained during the course of the experiment. According to this method, cells can be labeled in the presence of 35S-methionine in long-term labeling medium (90% methionine-free RPMI or DMEM) for up to 16 hours (Ausubel et al., supra).
2. In vitro Translation
The protein product of the cloned candidate gene of the invention can be produced by the methods of in vitro transcription and in vitro translation. In vitro transcription is performed essentially as described in Section B entitled "Production of a Polynucleotide Sequence" in the absence of a labeled ribonucleoside. The RNA produced by the in vitro transcription reaction is extracted with phenol, ethanol precipitated twice and resuspended in 10 µl of TE buffer. In vitro translation is performed by adding 1 to 10 µl of RNA to an in vitro translation kit (e.g., wheat germ or reticulocyte lysate) in the presence of 15 µCi [35S]methionine, following the directions provided by the manufacturer. A typical reaction is carried out in a 30 µl volume at room temperature for 30 to 60 minutes (Ausubel et al., supra).
K. Production of Cells Expressing a Nucleotide Sequence Comprising a Polymorphism
Mammalian cells expressing a nucleotide sequence comprising a polymorphism are useful, according to the invention for determining the biochemical and functional properties of the protein product of a nucleotide sequence comprising a polymorphism, for analyzing expression of a candidate gene, for large scale production of a protein of interest, for drug screening and for the production of transgenic animals or knockout mice. Methods of efficiently introducing foreign DNA into mammalian cells are known in the art and include calcium phosphate transfection, DEAE-dextran transfection, electroporation and liposome-mediated transfection (Ausubel et al., supra).
Transfection Protocols
1. Calcium-Phosphate Transfection
The method of calcium phosphate transfection involves preparing a precipitate by slowly mixing a HEPES-buffered saline solution with a mixture of calcium chloride and DNA. According to this method, up to 10% of the cells on a dish will incorporate DNA. Cells to be transfected are split one day prior to transfection so that on the day of transfection cells are well-separated on the plate. A 10 cm dish of cells is fed with 9.0 ml of complete medium approximately 2 to 4 hours before the addition of the precipitate. DNA to be transfected (10-50 µg/10-cm plate) is ethanol precipitated, resuspended in 450 µl sterile water and mixed with 50 µl of 2.5 M CaCl2. The DNA/CaCl2 solution is added dropwise to a 15-ml conical tube containing 500 µl 2X HeBS (0.283 M NaCl, 0.023 M HEPES acid, 1.5 mM Na2HPO4, pH 7.05). It is preferable to bubble the HeBS solution during the addition of the DNA mixture. After the precipitate has formed for 20 minutes at room temperature, it is added evenly to the cells. The cells are incubated with the precipitate at 37°C in a humidified CO2 incubator for 4-16 hours. Following removal of the precipitate, the cells are washed with PBS and fed with complete medium. Glycerol or dimethyl sulfoxide shock can be used to increase the DNA uptake by certain types of cells (Ausubel et al., supra).
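The mixing ratios in the calcium phosphate protocol above (450 parts DNA solution, 50 parts 2.5 M CaCl2, 500 parts 2X HeBS) fix the concentrations in the final precipitate mixture. The following minimal sketch works through that arithmetic; because only volume ratios matter, the result does not depend on the volume unit. The function names are hypothetical.

```python
def dilute(stock_conc: float, stock_vol: float, total_vol: float) -> float:
    """Concentration after diluting stock_vol of stock into total_vol."""
    return stock_conc * stock_vol / total_vol


def calcium_phosphate_mix(dna_vol=450.0, cacl2_vol=50.0, hebs_vol=500.0):
    """Return CaCl2 molarity before and after adding the 2X HeBS."""
    dna_mix_vol = dna_vol + cacl2_vol
    final_vol = dna_mix_vol + hebs_vol
    return {
        "cacl2_in_dna_mix_M": dilute(2.5, cacl2_vol, dna_mix_vol),
        "cacl2_final_M": dilute(2.5, cacl2_vol, final_vol),
        "hebs_final_x": dilute(2.0, hebs_vol, final_vol),
        "nacl_final_M": dilute(0.283, hebs_vol, final_vol),
    }


if __name__ == "__main__":
    for name, value in calcium_phosphate_mix().items():
        print(f"{name}: {value:.4f}")
```

Under these ratios the 2X HeBS is brought to 1X strength and the CaCl2 to 0.125 M in the final mixture.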
2. DEAE-Dextran Transfection
Cells to be transfected are plated at a concentration such that after 3 days of growth they are 30-50% confluent. The DNA to be transfected (approximately 4 µg) is ethanol precipitated, resuspended in 40 µl TBS and added slowly while shaking to 80 µl of warm 10 mg/ml DEAE-dextran in TBS. After cells have been washed with PBS and fed with 4 ml of DMEM containing 10% NuSerum per 10 cm dish, the DEAE-dextran/DNA mixture is evenly distributed over the entire plate. Cells are incubated with the DNA for approximately 4 hours in a humidified CO2 incubator. Following the removal of the DEAE-dextran/DNA mixture, cells are shocked by the addition of 5 ml of 10% DMSO in PBS. After a 1 minute incubation at room temperature, cells are washed with PBS and fed with complete medium (Ausubel et al., supra).
3. Electroporation
Alternatively, DNA can be introduced into cells by the use of high-voltage electric shocks, a technique termed electroporation. Briefly, according to the method of electroporation, cells are suspended in an appropriate electroporation buffer and placed in an electroporation cuvette. Following the addition of DNA, the cuvette is connected to a power supply and the cells are subjected to a high-voltage electrical pulse of a defined magnitude and length, optimized for the cell type being transfected. After a brief period of recovery, the cells are placed in normal culture medium.
A population of cells to be transfected by electroporation is grown to late-log phase in complete medium. Typically, stable transfection requires 5 x 10^6 cells, and transient transfection requires 1-4 x 10^7 cells. Cells are harvested by centrifugation for 5 minutes at 640 x g at 4°C. The resulting cell pellet is resuspended in half of the original volume of ice-cold electroporation buffer (e.g., PBS without calcium or magnesium, HEPES-buffered saline, tissue culture medium without serum, or phosphate-buffered sucrose (272 mM sucrose/7 mM K2HPO4, pH 7.4/1 mM MgCl2)). The choice of an electroporation buffer is dictated by the cell line. Cells are then harvested by centrifugation for 5 minutes at 640 x g at 4°C, and resuspended at 1 x 10^7/ml in electroporation buffer at 0°C for stable transfection or at a higher concentration (up to 8 x 10^7/ml) for transient transfection. Aliquots of the cells (0.5 ml) are transferred into the desired number of electroporation cuvettes and placed on ice.
DNA is added to the cell suspension in the cuvettes on ice. For stable transfection, DNA (optimally 1-10 µg) should be linearized with a restriction enzyme that cuts at a site in a non-essential region, purified by phenol extraction and ethanol precipitated. Supercoiled DNA (optimally 10 µg) may be used for transient transfection. The DNA/cell suspension is mixed and incubated on ice for 5 minutes.
The cuvette is placed in the holder in the electroporation apparatus (at room temperature) and shocked one or more times at the desired voltage and capacitance settings. An electroporation apparatus useful according to the invention is the Bio-Rad Gene Pulser. The number of shocks and the voltage and capacitance settings will vary depending on the cell type, and should be optimized. The two parameters that are critical for successful electroporation are the maximum voltage for the shock and the duration of the current pulse.
Following electroporation, the cuvette containing the mixture of cells and DNA is incubated on ice for 10 minutes. The transfected cells are diluted 20-fold in complete culture medium. For stable transfection cells are grown for 48 hours in nonselective medium and then transferred to antibiotic containing medium. For transient transfection, cells are incubated 50-60 hours and then harvested for the desired transient assay.
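The cell numbers and volumes in the electroporation protocol above can be cross-checked with simple arithmetic, reading the stated densities as powers of ten (1 x 10^7/ml for stable and up to 8 x 10^7/ml for transient transfection). The sketch below is illustrative only; the function names are hypothetical and the volume of added DNA is assumed to be negligible.

```python
def cells_per_cuvette(density_per_ml: float, aliquot_ml: float = 0.5) -> float:
    """Number of cells in one cuvette aliquot."""
    return density_per_ml * aliquot_ml


def volume_after_dilution(aliquot_ml: float = 0.5, fold: float = 20.0) -> float:
    """Total volume (ml) after diluting the shocked cells 'fold'-fold."""
    return aliquot_ml * fold


if __name__ == "__main__":
    print(f"stable:    {cells_per_cuvette(1e7):.1e} cells per cuvette")
    print(f"transient: {cells_per_cuvette(8e7):.1e} cells per cuvette (upper limit)")
    print(f"post-shock culture volume: {volume_after_dilution():.1f} ml")
```

A 0.5 ml aliquot at 1 x 10^7/ml contains 5 x 10^6 cells, and at 8 x 10^7/ml contains 4 x 10^7 cells, which agrees with the cell numbers stated for stable and transient transfection.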
L. Production of Animals Expressing a Nucleotide Sequence Comprising a Polymorphism
Transgenic animals expressing a construct comprising a candidate gene containing a polymorphism according to the invention can be produced by methods well known in the art (reviewed in Reeves et al., supra). Knockout mice wherein a candidate gene according to the invention has been disrupted can be produced by methods well known in the art (reviewed in Moreadith and Radford, 1997, J. Mol. Med., 75:208 and Shastry, 1998, Mol. Cell. Biochem., 181:163). These animals provide useful models for studying the functional consequences of one or more polymorphisms in a gene of interest.
M. Production of a Candidate Gene Library
The invention provides a method of producing a candidate gene library comprising genes that are potentially associated with the susceptibility to, or pathogenesis of a disease. A candidate gene library is useful for determining the genetic basis of a disease of interest. Genetic susceptibility to a disease must occur as a result of specific DNA differences relative to non-susceptible individuals. In the case of osteoporosis, many genes are known which are potentially involved in the susceptibility to, or pathogenesis of the disease. These genes are included in the candidate gene library and the association of these genes with osteoporosis is determined from population studies according to the invention. Unlike linkage studies wherein a region of the genome that is thought to be involved in a disease is determined, the candidate gene strategy, including association studies, addresses the involvement of a particular gene in a disease. The results of association studies of candidate genes are used to identify genes that should be intensively studied as potential therapeutics or therapeutic targets. According to the invention, the full range of polymorphic sites within each candidate gene is identified and examined in diseased and normal populations. The frequency of each gene variant (allele) in each population is then compared to the other. If a specific polymorphism under analysis contributes to the disease phenotype, it will be present in the diseased population at a higher frequency than in the normal population. In addition, if the specific polymorphism under analysis does not itself contribute to the disease phenotype but resides elsewhere in, or is near to a gene containing a contributory polymorphism, a significant association may be seen with the polymorphic marker being tested. This is because the two markers are in linkage disequilibrium with each other due to their close proximity.
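The comparison of allele frequencies between diseased and normal populations can be sketched as follows. This is a minimal illustration, assuming a simple 2x2 chi-square test on allele counts; the function name, the choice of test and the counts are hypothetical and are not taken from the specification.

```python
import math

def allele_association(case_counts, control_counts):
    """Compare allele counts between diseased and normal populations.

    Each argument is a (variant_allele_count, other_allele_count) tuple.
    Returns (case_freq, control_freq, chi2, p_value) for a 2x2 Pearson
    chi-square test with one degree of freedom (no continuity correction).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    case_freq = a / (a + b)
    control_freq = c / (c + d)
    # Pearson chi-square statistic for a 2x2 table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return case_freq, control_freq, chi2, p_value

# Hypothetical allele counts, for illustration only
case_freq, control_freq, chi2, p = allele_association((120, 280), (80, 320))
print(f"cases {case_freq:.2f}, controls {control_freq:.2f}, chi2 {chi2:.2f}, p {p:.4f}")
```

A significant result of this kind does not distinguish a contributory polymorphism from a marker in linkage disequilibrium with one, as noted above.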
1. Strategies for Identifying Genes Associated with a Disease
There are a number of methods known in the art for the identification of genes involved in a disease. These methods include familial linkage studies followed by positional cloning, differential gene expression studies on tissues, and population-based candidate gene association studies. Although positional cloning has proven to be useful for diseases resulting from a single mutation, this technique is not suitable for identifying genetic linkage in diseases where multiple genetic variants combine to create disease susceptibility. Furthermore, it has been demonstrated that the etiological basis of the majority of diseases comprises more than one gene.
The goal of linkage studies is to determine the approximate position of disease genes by studying related individuals in families. According to linkage strategies, DNA markers that are randomly spaced throughout the genome, but are rarely located within genes, are tested for the frequency of their presence along with the particular disease phenotype. There is approximately a 50% chance of an unlinked gene and marker gene co-localizing. If a particular marker is present at a significantly higher frequency than expected in disease individuals, this indicates that the marker is located in the vicinity of the disease gene. Usually the disease gene is delimited to a large region (containing tens to hundreds of genes). After a disease gene has been grossly mapped, this entire region must be extensively characterized to determine what genes are present in the region. Any gene that is identified according to this method becomes a candidate gene.
Linkage studies have been used successfully to identify the genes responsible for certain genetic diseases originating from mutations in a single gene (monogenic diseases). However, most common human diseases are of polygenic origin, wherein changes in multiple genes cause an increased susceptibility to or pathogenesis of a particular disease. Because the DNA changes associated with genes which contribute to polygenic diseases are common in the population, thereby diluting the contribution of a given region of the genome to the disease, it is difficult to perform linkage studies on diseases of polygenic origin.
Linkage analysis
A series of genetic crosses is performed in an animal model system of a particular defect that is characteristic of a disease of interest (e.g. osteoporosis) between individuals having an observable mutant phenotype and normal individuals of a control strain. At least one disease-related locus is used as a marker in these crosses. Alternatively, linkage analysis can be performed using chromosomal markers that do not comprise a disease-related locus (described below). If non-random assortment of the mutant trait with a marker locus is observed, and if that non-random assortment is statistically significant (for example, if a Student's t test or ANOVA is applied to the results), the trait is linked to the marker locus. Similarly, linkage analysis using an existing human or other mammalian pedigree may be performed. Pedigree analysis is a useful technique for identifying genes for which variant alleles may contribute to the risk, onset or progression of a disease in a family containing multiple individuals afflicted with a disease; according to this method, numerous genetic loci from affected and unaffected family members are compared. Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
If a non-random assortment of the disease-related phenotype with a marker locus is observed, using either approach, this is indicative of an association between the gene underlying the defect and that locus. Because the strength of any conclusion drawn from linkage analysis is statistically-based, the accuracy of the results is thought to be proportional to the number of crosses or family members and genetic loci analysed.
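One way to quantify non-random assortment is sketched below. The text names a Student's t test or ANOVA; this sketch instead uses the two-point LOD score, a conventional linkage statistic that is introduced here as an assumption and is not named in the specification. It compares the likelihood of the observed recombinant count at a recombination fraction theta with the likelihood under free assortment (theta = 0.5). The function names and counts are hypothetical.

```python
import math

def lod_score(recombinants, total, theta):
    """Two-point LOD score for a phase-known cross at recombination fraction theta."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = total - recombinants
    # log10 likelihood of the data if the loci are linked at distance theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood if the loci assort independently (theta = 0.5)
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked

def max_lod(recombinants, total, steps=50):
    """Scan theta over an even grid on (0, 0.5] and return (best_theta, best_lod)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    return max(((t, lod_score(recombinants, total, t)) for t in grid),
               key=lambda pair: pair[1])

# Hypothetical cross: 2 recombinants among 20 informative offspring
best_theta, best_lod = max_lod(2, 20)
print(f"theta = {best_theta:.2f}, LOD = {best_lod:.2f}")
```

Consistent with the paragraph above, the score grows with the number of informative crosses or family members, so larger data sets support stronger conclusions.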
Positional Cloning
If linkage is confirmed it is preferable to perform a molecular analysis of the region in which the peak of linkage maps. The wide availability of yeast artificial chromosome (YAC) or bacterial artificial chromosome (BAC) libraries facilitates this analysis. A nucleic acid sequence specific for a region encompassing a gene which is determined to occupy a map location of a particular locus of interest is examined, and open reading frames are evaluated to determine their relationship with the observed phenotype. An initial evaluation may be performed with the assistance of a computer program, such as the PathCalling™ (CuraGen) biological pathway discovery platform. All or a subset of the open reading frames present in the region are then cloned (e.g., by PCR) from mutant animals or affected family members and from their healthy counterparts (either control animals or unaffected family members), and the sequences of these open reading frames are compared. If a mutation or other allelic variant is found to be linked to individuals displaying the disease phenotype (in a statistically-significant, non-random manner), it can be concluded that this mutation is associated with a disease phenotype. A nucleic acid fragment containing this gene can be labeled and used as a probe for in situ hybridization analysis of fixed chromosomes of the human or other mammal to determine precisely the physical location of the gene. Furthermore, a gene that has been mapped and isolated in this manner may be useful as a candidate target for disease diagnosis and for drug targeting according to the invention (see below).
2. Identification of Genes to be Included in Candidate Gene Library
A candidate gene library according to the invention will include i. genes that are involved in known or predicted disease pathways, ii. new genes that are identified by a relevant pattern of specific tissue or cell expression, iii. genes that map to genomic regions of known linkage, and iv. gene sequences (from sequence databases) that are homologs of the above referenced categories of potential candidate genes. The choice of potentially related genes to be selected from a database will depend on the percent identity as calculated by Fast DB and based upon mismatch penalty, gap penalty, gap size penalty and joining penalty.
Based on the physiological changes associated with a disease of interest, predictions can be made regarding a cell or tissue-type that would be expected to express high or low levels of candidate genes associated with a particular disease. For osteoporosis, it is expected that muscle, adipose, pancreas or liver tissue, or tissue comprising insulin-secreting pancreatic β-cells, would be useful for identifying candidate genes according to the invention.
Differences in the expression of known and unknown genes in normal and disease tissue can be determined by methods known in the art including Serial Analysis of Gene Expression (SAGE) (Velculescu et al., 1995, Science, 270:484), subtractive hybridization/screening (described below), differential display (Liang and Pardee, 1992, Science, 257:967) and high-density microarray expression testing.
The technique of SAGE allows for the rapid, detailed analysis of thousands of transcripts.
SAGE depends on the following two principles. First, sufficient information is contained within a short nucleotide sequence (approximately 9-10 bp), isolated from a defined location within a transcript, to uniquely identify a transcript. Second, the concatenation of short tags of sequence allows transcripts to be analysed serially by sequencing multiple tags within a single clone.
The method of SAGE is performed by synthesizing double-stranded cDNA from mRNA, cleaving the resulting cDNA with an anchoring restriction endonuclease that is expected to cleave most transcripts at least one time, and isolating the most 3' region of the cleaved cDNA by binding to streptavidin beads. This protocol allows for the identification of a unique site on a transcript that corresponds to the restriction site located closest to the polyA tail. Replicate samples of the most 3' region of the cDNA are ligated to one of two linker molecules that contain a type IIS restriction site for a tagging enzyme. The cleavage site for Type IIS restriction endonucleases is located at a defined distance up to 20 bp from the asymmetric recognition site. Linkers are designed such that upon cleavage of the ligation product with the tagging enzyme there is release of the linker and an attached short region of cDNA.
Following the creation of blunt ends, the two pools of released tags are ligated to each other and the resulting ligated product is used as a template for PCR amplification in the presence of primers that are specific for each linker. The PCR product is cleaved with the anchoring enzyme and amplification products, comprising two tags linked tail to tail, are isolated, concatenated by ligation, cloned and sequenced (Velculescu et al., supra).
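The tag-definition principle of SAGE can be sketched in software. A 9 bp tag can take 4^9 = 262,144 distinct values and a 10 bp tag 4^10 = 1,048,576, which is why a short tag from a defined position can identify a transcript. The sketch below is an in silico simplification, assuming the recognition sequence CATG for the anchoring enzyme (the specification does not name the enzyme) and a 10 bp tag; the function names and sequences are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site; not specified in the text
TAG_LENGTH = 10       # within the approximately 9-10 bp range stated above

def extract_tag(cdna, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the anchoring site closest to the 3' end.

    Returns None if the site is absent or too close to the end of the
    sequence to yield a full-length tag.
    """
    pos = cdna.rfind(anchor)  # the 3'-most occurrence of the site
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(cdnas):
    """Count how often each tag occurs across a collection of cDNA sequences."""
    tags = (extract_tag(seq) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)

# Hypothetical cDNA sequences, written 5' to 3' without the polyA tail
library = [
    "AAACATGTTTTCATGACGTACGTACGG",
    "GGGCATGACGTACGTACTT",
    "TTTTTTTT",
]
print(count_tags(library))
```

Tag counts obtained this way stand in for transcript abundance, which is what allows two tissues to be compared.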
Differential display provides a method for separating and cloning individual mRNAs by PCR analysis. According to the method of differential display, oligonucleotide primers are selected wherein one primer is anchored to the polyadenylate tail of a subset of mRNA species and the other primer is short and of an arbitrary sequence such that it anneals at different positions relative to the first primer. The mRNA subpopulations that are identified with these primer pairs are subjected to reverse transcription, amplified and analysed on a DNA sequencing gel. By using multiple sets of primers, a reproducible pattern of amplified cDNA fragments that demonstrate a requirement for the sequence specificity of either primer can be obtained (Liang and Pardee, supra).
According to the method of high-density microarray expression testing, DNA sequences to be tested for expression are spotted onto a surface, usually at high density to allow for the testing of many genes. The surface containing the DNA sequences is typically referred to as a 'chip'. The spotted DNA can be either cDNA clones or oligonucleotides. RNA is prepared from the two cells or tissues to be compared. The RNA from one cell/tissue will be labeled red and the RNA from the other cell/tissue will be labeled yellow. Both RNA preparations are hybridized to the DNA array. The ratio of red to yellow is indicative of the relative levels of expression between the two cells/tissues.
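The red-to-yellow ratio described above can be computed per spot as sketched here. This is a minimal illustration, assuming background-corrected intensities are already available for each gene; the use of a base-2 logarithm, the intensity floor and the gene names are assumptions added for illustration.

```python
import math

def expression_log_ratios(red, yellow, floor=1.0):
    """Return {gene: log2(red / yellow)} for genes measured in both channels.

    Intensities below `floor` are raised to `floor` so that the ratio and
    its logarithm stay defined for very weak spots.
    """
    ratios = {}
    for gene in red.keys() & yellow.keys():
        r = max(red[gene], floor)
        y = max(yellow[gene], floor)
        ratios[gene] = math.log2(r / y)
    return ratios

# Hypothetical spot intensities, for illustration only
red = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
yellow = {"geneA": 200.0, "geneB": 100.0}
for gene, value in sorted(expression_log_ratios(red, yellow).items()):
    print(gene, round(value, 2))
```

A positive value indicates higher expression in the red-labeled sample, a negative value higher expression in the yellow-labeled sample, and zero indicates equal expression.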
3. Mapping a candidate gene
Molecular and cytogenetic methods of mapping candidate genes are known in the art and are summarized below. Linkage analysis provides a method for identifying genes mapping to genomic regions of known linkage.
Linkage analysis
As described above, linkage analysis may be performed between an unmapped candidate gene and one or more of the disease-related loci or by analyzing the genetic linkage between the candidate gene and chromosomal markers which are not themselves linked to a disease-related locus, according to the same method. For the latter type of analysis it is preferable that the spacing of markers throughout the genome of the test organism is approximately one every cM or less. This spacing will ensure complete coverage of the genome and will facilitate accurate mapping. Other methods for mapping a candidate gene are provided below.
Syntenic similarity
As a result of classical genetic studies and, more recently, multi-laboratory genomic sequencing collaborations such as the Human Genome Project and Mouse Genome Project, the human and mouse genomes have been extensively characterized. It is now known that there is a significant degree of co-linearity among humans, mice and rats, wherein the chromosomal map positions of numerous genes and groups of genes are conserved among these species relative to one another. Examination of the human and/or mouse chromosomal maps in the regions comparable to those to which a particular locus of interest maps in the rat will yield candidate genes which may be responsible for the physiological changes associated with a disease of interest. The methods of radiation hybrid mapping or fluorescence in situ hybridization at low stringency to rat chromosomes using labeled fragments derived from the human or mouse genes can be used to confirm that genes present in these regions of the human and/or mouse are present in the regions of interest in the rat. Radiation hybrid (RH) mapping is a somatic cell hybrid technique that was developed to create high resolution, contiguous maps of mammalian chromosomes. The method is useful for ordering DNA markers spanning millions of base pairs of DNA at a resolution not easily obtained by other mapping methods (Cox et al., 1990, Science, 250: 245; Burmeister et al., 1991, Genomics, 9:19; Warrington et al., 1992, Genomics, 13: 803; Abel et al., 1993, Genomics, 17:632). Radiation hybrid mapping facilitates the mapping of non-polymorphic DNA markers that cannot be used for meiotic mapping.
According to the method of radiation hybrid mapping, a lethal dose of X-irradiation is used to fragment the chromosomes of the donor cell line. Chromosome fragments from the donor cell line are then retained, in a non-selective manner, following cell fusion with a recipient cell line. The resulting hybrid clones are then analysed for the presence or absence of specific donor chromosome markers. It is expected that markers that are further apart on a chromosome are more likely to be broken apart by radiation and to segregate independently in the RH cells than markers that are closer together. By performing a statistical analysis of the co-segregation of various loci in hybrid clones, it is possible to construct a map that provides information regarding the relative order and distance of markers (Cox et al., 1990, supra; Warrington et al., 1991, Genomics, 11: 701; Ceccherini et al., 1992, Proc. Natl. Acad. Sci. USA, 89: 104).
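The co-segregation analysis can be illustrated with a crude proxy: the fraction of hybrid clones in which exactly one of two markers is retained. Markers that are close together are rarely separated and give a low value. This sketch is an assumption-laden simplification and not the statistical model of the cited references, which also accounts for marker retention frequencies; the marker names and retention patterns are hypothetical.

```python
from itertools import combinations

def discordance(retention, marker_a, marker_b):
    """Fraction of hybrid clones retaining exactly one of the two markers.

    `retention` maps a marker name to a string of '1' (retained) and
    '0' (absent), one character per hybrid clone, in the same clone order.
    """
    a, b = retention[marker_a], retention[marker_b]
    if not a or len(a) != len(b):
        raise ValueError("markers must be scored on the same non-empty clone panel")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_discordance(retention):
    """Discordance for every pair of markers, keyed by the sorted name pair."""
    return {(m1, m2): discordance(retention, m1, m2)
            for m1, m2 in combinations(sorted(retention), 2)}

# Hypothetical retention patterns across eight hybrid clones
panel = {
    "A": "11001100",
    "B": "11001110",
    "C": "10100101",
}
for pair, value in pairwise_discordance(panel).items():
    print(pair, value)
```

In this hypothetical panel, markers A and B are separated in only one clone and would be placed close together, whereas C segregates largely independently of both.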
Subtractive screening
In view of the observation that only a subset of an organism's genes are expressed in a given tissue, there is a high probability that transcripts which differ in expression between cells of the same tissue in a mutant and control animal are responsible for the observed mutant phenotype.
According to the method of subtractive cloning, mRNA is isolated from a tissue of choice, wherein the tissue is obtained from two distinct organisms and wherein one organism displays a mutant phenotype with regard to a particular trait while the other is normal in that respect. Methods well known in the art are used to prepare cDNA from the mRNA derived from the first organism. The mRNA template is then degraded, either by hydrolysis under alkaline conditions or by RNase H-mediated cleavage, and the cDNA is returned to a buffer in which mRNA is stable, and mixed with a molar excess of mRNA prepared from the second organism under conditions of stringent hybridization. The mixture is then passed over a hydroxyapatite column, which binds double-stranded nucleic acids but allows single-stranded nucleic acid molecules to pass through. Reverse transcripts derived from the first sample which do not hybridize to mRNA molecules derived from the second organism (in other words, reverse transcripts specific to the first tissue sample) are present in the flow-through fraction and are cloned into a vector to create a subtraction library. The reciprocal experiment (in which the cDNA is derived from the second mRNA preparation) is also carried out to create a complete set of transcripts specific to the tissue samples derived from the two organisms.
This procedure will provide transcripts that can be labeled and used as probes in in situ hybridization analysis of immobilized chromosomes. The method of subtractive screening, therefore, yields both cloned genes and reagents useful for determining if the cloned genes co-localize with a locus of interest. If a particular gene is found to co-localize to a locus of interest, the gene may be analysed functionally (e.g., in a phenotypic rescue experiment, as described below, or by the phenotypic assays described in Section F entitled "Identification and Characterization of Polymorphisms"). Ultimately, these genes may be used as targets for drugs or disease diagnostic methods, or even as therapeutic nucleic acids.
Mutagenic transposon mapping
The selection of insertional events that lie within genes (e.g., within coding or regulatory sequences) is facilitated by the use of entrapment vectors, first described in bacteria (Casadaban and Cohen, 1979, Proc. Natl. Acad. Sci. U.S.A., 76: 4530; Casadaban et al., 1980, J. Bacteriol., 143: 971). By employing animal models, entrapment vectors can be introduced into pluripotent ES cells in culture (for example, using electroporation or a retrovirus) and then passed into the germline via chimeras (Gossler et al., 1989, Science, 244: 463; Skarnes, 1990, Biotechnology, 8:827). Alternatively, transgenic animals containing entrapment vectors may be generated by standard oocyte injection protocols.
These methods result in DNA integrations that are highly mutagenic because they interrupt the endogenous coding sequence. It is estimated that the frequency of obtaining a mutation in some gene in the genome using a promoter or gene trap is about 45%. For a detailed description of retroviral insertion mutagenesis see Methods Enzymol., vol. 225, 1990. Genes which are expressed in a tissue of interest and for which a biochemical assay of a particular activity has been developed in animal models are most useful according to this method. Promoter or gene trap vectors often contain a reporter gene, e.g., lacZ, Cat or green fluorescent protein (Gfp), that lacks its own upstream promoter and/or splice acceptor sequence. That is, promoter gene traps contain a reporter gene with a splice site but no promoter. If the vector integrates within a gene and is spliced into the gene product, then the reporter gene will be expressed. Enhancer traps contain a reporter gene and have a minimal promoter which requires the activity of an enhancer in order to function. If the vector integrates near an enhancer (whether in a gene or not), then the reporter gene will be expressed. Activation of the reporter gene can only occur when the vector is integrated within an active host gene and generates a fusion transcript with the host gene. The activity of a reporter gene provides an easy assay for determining if a vector has been integrated into an expressed gene. Methods for detecting reporter gene activity in transfected cells or tissues of a transgenic animal are well known in the art.
The mutagenic vector may be mapped using standard cytogenetic techniques, such as in situ hybridization, wherein a labeled fragment comprising vector-specific sequence is used as a probe. Co-localization of the probe with a particular locus of interest indicates that the associated gene is a suitable candidate and should be subjected to further analysis. A gene that has been identified in this manner can be cloned as described.
N. Diagnostic Indicators, Screens and Disease Symptoms
In another embodiment of the invention, there is provided a method of diagnosing or determining susceptibility of a subject to low BMD and/or bone damage. This method involves analyzing the genetic material of a subject to determine which allele(s) of the gene is/are present. The method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present. The method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype.
In a preferred embodiment, the method comprises determining which allele of one or more of the polymorphisms of the invention is/are present. In particular, the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype. The polynucleotide sequences for these particular alleles may be used for diagnostic purposes. The polynucleotides which may be used include oligonucleotides, complementary RNA and DNA molecules and PNAs. The polynucleotides may be used to determine whether subjects are homozygous or heterozygous for a particular allele or haplotype making them susceptible to low BMD and/or bone damage, and hence, osteoporosis. In one aspect, hybridization with PCR probes which are capable of detecting a particular polymorphism may be used to identify nucleic acid sequences of particular alleles or haplotypes. These probes must be specific to these particular alleles and the stringency of the hybridization or amplification must be such that the probe identifies only this particular allele. Means for producing specific hybridization probes for these polynucleotides of particular alleles, including the cloning of these polynucleotide sequences into vectors for the production of mRNA probes, are well known to one skilled in the art. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides. Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.
Polynucleotides of particular alleles or haplotype may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect susceptibility to low BMD and/or bone damage. Such qualitative methods are well known in the art.
In a particular embodiment, polynucleotides of particular alleles or haplotype may be used in assays that detect susceptibility to low BMD and/or bone damage, particularly those mentioned above. Polynucleotides complementary to sequences of a particular allele or haplotype may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and examined for a signal. If a signal is found, then the presence of the polynucleotide of a particular allele, alleles or haplotype in the sample indicates the susceptibility to low BMD and/or bone damage, and hence, osteoporosis. Such assays may also be used to determine the particular therapeutic treatment regimen for an individual patient.
With respect to osteoporosis, the presence of a particular polymorphism or polymorphisms in a tissue sample from an individual may indicate a predisposition for low BMD and/or bone damage, or may provide a means for detecting osteoporosis prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of osteoporosis.
Additional diagnostic uses for oligonucleotides designed from the polynucleotide sequences of a particular allele or haplotype may involve the use of PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will contain a fragment of a polynucleotide of a particular allele, alleles or haplotype or a fragment of a polynucleotide complementary to the polynucleotide of a particular allele, alleles or haplotype, and will be employed under optimized conditions for identification of a specific polymorphism, polymorphisms or haplotype. Oligomers may also be employed under very stringent conditions for detection of these particular DNA or RNA sequences. Examples of particular primer sequences and annealing temperatures for specific polymorphism markers are found in Table 10 of U.S. Provisional Patent Application Serial Number 60/423559, entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed November 4, 2002, which is hereby incorporated herein by reference in its entirety, and in Tables 10 and 13 of this application.
In further embodiments, oligonucleotides or longer fragments derived from any of the polynucleotides described herein may be used as elements on a microarray. The microarray can be used in transcript imaging techniques to detect a particular polymorphism, polymorphisms or haplotype simultaneously as described below. In particular, this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.
In another embodiment, a method involves the use of antibodies in diagnosing or determining the susceptibility to low BMD and/or bone damage. The antibodies would specifically bind to an epitope of a particular allele or form of the protein and may be used to determine susceptibility to low BMD and/or bone damage, and hence, osteoporosis. Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for determining susceptibility to low BMD and/or bone damage include methods which utilize the antibody and a label to detect a particular allele or form of the protein in human body fluids or in extracts of cells or tissues. The antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter molecules, several of which are described above, are known in the art and may be used. A variety of protocols for measuring a particular allele or form of the protein, including ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing susceptibility to low BMD and/or bone damage.
In another embodiment, fragments of ABBR, or antibodies specific for ABBR, may be used as elements on a microarray.
Microarrays may be prepared, used, and analysed using methods known in the art (Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93: 10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662). Various types of microarrays are well known and thoroughly described in Schena, M., ed. (1999; DNA Microarrays: A Practical Approach, Oxford University Press, London).
O. Preparation of a Human Sample
The presence of an allelic form of a gene containing a sequence variation, according to the invention, can be detected by testing any tissue of a human subject. Human samples that are useful according to the invention include tissue or fluid samples containing a polynucleotide or polypeptide of interest, including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents. Genomic DNA, cDNA or RNA can be prepared from the human sample according to the methods described above.
P. Methods of Use
1. Nucleic Acid Diagnosis and Diagnostic Kits
In order to detect the presence of an allele of a gene predisposing an individual to osteoporosis, a biological sample such as blood is prepared and analysed for the presence or absence of susceptibility alleles of a gene containing a polymorphism, according to the invention. Results of these tests and interpretive information will be returned to the health care provider for communication to the tested individual. Such diagnoses may be performed by diagnostic laboratories, or, alternatively, diagnostic kits are manufactured and sold to health care providers or to private individuals for self-diagnosis.
Initially, the screening method will involve amplification of the relevant gene sequences. In another preferred embodiment of the invention, the screening method involves a non-PCR based strategy. Such non-PCR based screening methods include Southern blot analysis to detect the presence of a variant form of a gene in a sample comprising total genomic DNA from the individual being tested. Alternatively, northern blot analysis can be used to detect an aberrant mRNA encoded by a gene that exhibits altered stability or is the result of alternative splicing in a sample comprising RNA from an individual being tested. The methods of S1 nuclease analysis, RNase protection and primer extension can also be used to determine both the endpoint and the amount of a gene specific mRNA (Ausubel et al., supra). Both PCR and non-PCR based screening strategies can detect target sequences with a high level of sensitivity.
The preferred method, according to the invention, is target amplification. According to this method, the target nucleic acid sequence is amplified with polymerases. One particularly preferred method using polymerase-driven amplification is PCR (described above). The polymerase chain reaction and other polymerase-driven amplification assays can achieve over a million-fold increase in copy number through the use of polymerase-driven amplification cycles. PCR primers useful for target amplification according to the invention, will be designed to amplify a region of DNA containing one or more polymorphisms. Allele specific primers (comprising one or more polymorphisms) are also useful for detecting gene sequence variations by PCR methodologies according to the invention. The absence of a particular polymorphism will be indicated by the absence of an amplified product when the amplification step is carried out in the presence of allele specific primers. Once amplified, the resulting nucleic acid can be sequenced and the specific sequence of the test DNA will be compared with the wild type sequence by using the computer programs described in Section F entitled "Identification and Characterization of Polymorphisms". Alternatively, the amplified product will be analysed by Southern blot assay with nucleic acid probes. Nucleic acid probes, useful according to the invention, will be specifically hybridizable to a mutant form of a gene but not to the wild type gene due to the presence of one or more polymorphisms. When a probe comprising the target sequence, according to the invention, is used to detect the presence of the target sequences via non PCR-based strategies, (for example, in screening for osteoporosis susceptibility), the biological sample to be analysed, such as blood or serum, may be treated, if desired, to extract the nucleic acids (as described above). 
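The logic of allele-specific amplification, in which a product appears only when the primer matches the allele present, can be sketched as follows. This is a deliberately simplified model: it treats annealing as an exact string match on one strand and ignores reverse primers, mismatch tolerance and reaction conditions. The sequences, primer names and function names are hypothetical and are not the primers of the specification.

```python
def primer_anneals(template, primer, position):
    """True if the primer matches the template exactly, starting at `position`."""
    return template[position:position + len(primer)] == primer

def detect_alleles(sample_templates, allele_primers, position):
    """Return the set of alleles whose allele-specific primer finds a match.

    `sample_templates` holds the sequences present in the sample (for a
    diploid subject, the two chromosomal copies); `allele_primers` maps an
    allele name to a primer whose 3' terminal base lies on the polymorphism.
    """
    detected = set()
    for allele, primer in allele_primers.items():
        if any(primer_anneals(t, primer, position) for t in sample_templates):
            detected.add(allele)
    return detected

def zygosity(detected):
    """Interpret the detected alleles of a biallelic polymorphism."""
    if len(detected) == 1:
        return "homozygous"
    if len(detected) == 2:
        return "heterozygous"
    return "undetermined"

# Hypothetical templates differing at one base (index 6: A versus G)
copy_a = "GATTACAGGCTA"
copy_g = "GATTACGGGCTA"
primers = {"A": "GATTACA", "G": "GATTACG"}

found = detect_alleles([copy_a, copy_g], primers, position=0)
print(sorted(found), zygosity(found))
```

Running one reaction per allele-specific primer in this way also yields the homozygous or heterozygous status of the subject, since absence of a product indicates absence of the corresponding allele.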
The sample nucleic acids (isolated from a biological sample or amplified by PCR) may be prepared in various ways to facilitate detection of the target sequence; e.g. denaturation, restriction digestion, electrophoresis or dot blotting. Preferably, the targeted region of the nucleic acids being analysed is at least partially single-stranded to form hybrids with the targeting sequence of the probe. If the sequence is naturally single-stranded, denaturation will not be required. However, if the sequence is double-stranded, the sequence will probably need to be denatured. Denaturation can be carried out by various techniques known in the art.
To detect the presence of a sequence variation in a gene, according to the invention, analyte nucleic acid and probe will be incubated under conditions which promote stable hybrid formation of the target sequence in the probe with the putative targeted sequence in the sample DNA. If the region of the probe which is used to bind to the analyte is designed to be completely complementary to the targeted region, high stringency conditions are desirable in order to prevent false positives. However, conditions of high stringency will be used only if the probes are complementary to regions of the chromosome which are unique in the genome. The stringency of hybridization is determined by a number of factors (described above). Detection, if any, of the resulting hybrid is usually accomplished by the use of labeled probes. Alternatively, the probe may be unlabeled, but may be detectable by specific binding with a ligand which is labeled, either directly or indirectly. Suitable labels, and methods for labeling probes and ligand are known in the art, and are described in Section C entitled "Production of a Nucleic Acid Probe".
Accordingly, the foregoing screening method may be modified to identify individuals having a gene containing a neutral polymorphism not associated with osteoporosis, by preferably amplifying DNA fragments of a gene derived from a particular individual. The amplified DNA fragments are sequenced and the sequence is compared to the consensus gene sequence containing neutral polymorphisms. At this time, differences between the individual's coding sequence for a gene and a consensus sequence for the same gene are determined, wherein the presence of any neutral polymorphisms and the absence of polymorphisms not previously identified as neutral polymorphisms can be correlated with an absence of increased genetic susceptibility to osteoporosis resulting from a mutation in a gene coding sequence.
In another embodiment of the invention, detection of a polymorphism will be performed by detecting loss of a restriction enzyme recognition site due to the presence of one or more polymorphisms. According to this embodiment, a polymorphism will be detected with a polynucleotide probe that is capable of detecting a restriction enzyme fragment containing the polymorphism, wherein the fragment is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis. A polynucleotide probe according to this embodiment of the invention can be specific for a sequence within the candidate gene or outside of the candidate gene.
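The effect of a lost recognition site on fragment sizes can be sketched as follows. This is an illustration only: the two sequences are hypothetical, EcoRI (recognition site GAATTC, cutting after the first G) is chosen merely as a familiar example, the DNA is treated as linear, and only the top strand is scanned.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return 0-based cut coordinates for every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after complete digestion of a linear molecule."""
    bounds = [0] + cut_positions(seq, site, cut_offset) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_OFFSET = "GAATTC", 1  # EcoRI cuts G^AATTC

# Hypothetical sequences: the variant carries A->G inside the second site.
reference = "AAAAGAATTCTTTTTTGAATTCCC"
variant = "AAAAGAATTCTTTTTTGAGTTCCC"

print(fragment_lengths(reference, ECORI_SITE, ECORI_OFFSET))  # [5, 12, 7]
print(fragment_lengths(variant, ECORI_SITE, ECORI_OFFSET))    # [5, 19]
```

In the variant, the 12 and 7 base fragments are replaced by a single 19 base fragment, which is the kind of size shift that a probe hybridizing to that fragment would reveal on a Southern blot.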
It is also contemplated within the scope of this invention that the nucleic acid probe assays of this invention will employ a mixture of nucleic acid probes capable of detecting a gene. Thus, in one example to detect the presence of a gene in a test sample, more than one probe complementary to a gene is employed and in particular the number of different probes is alternatively 2, 3, or 5 different nucleic acid probe sequences. In another example, to detect the presence of mutations in the gene sequence in a patient, more than one probe complementary to a gene is employed wherein the probe mixture includes probes capable of binding to the allele-specific mutations identified in populations of patients with alterations in a gene. In this embodiment, any number of probes can be used, and will preferably include probes corresponding to the major gene mutations identified as predisposing an individual to osteoporosis.
Northern blot analysis, S1 nuclease analysis, RNase protection and primer extension (Ausubel et al., supra) are also methods according to the invention for detecting changes in mRNA resulting from the presence of one or more polymorphisms in the sequence of a gene. Additionally, the methods of genotyping described in Section F entitled "Identification and Characterization of Polymorphisms" can be used for diagnostics according to the invention.
2. Peptide Diagnosis and Diagnostic Kits
Osteoporosis can also be detected on the basis of an alteration of the wild-type polypeptide. Such alterations can be determined by sequence analysis in accordance with conventional techniques. More preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the absence of, peptides derived from a gene of interest. The antibodies may be prepared as described above in Section I entitled "Preparation of Antibodies". Preferably, antibodies will immunoprecipitate the protein product of a gene from solution as well as react with the protein product of a gene on Western or immunoblots of polyacrylamide gels. Antibodies useful according to the invention will also detect the protein product of a gene in paraffin or frozen tissue sections, using immunocytochemical techniques. Preferred embodiments relating to methods for detecting wild type or mutant forms of the protein product of a gene include enzyme linked immunosorbent assays (ELISA), radioimmunoassay (RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich assays using monoclonal and/or polyclonal antibodies. Exemplary sandwich assays are described by David et al. in U.S. Pat. Nos. 4,376,110 and 4,486,530, hereby incorporated by reference.
3. Drug Screening
This invention is particularly useful for screening therapeutic compounds by using the mutant gene or protein product or binding fragment of the gene in any of a variety of drug screening techniques.
The protein product or fragment of a gene employed in such a test may either be free in solution, affixed to a solid support, expressed on the surface of a cell, or located intracellularly. One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a recombinant polynucleotide expressing the polypeptide or fragment, preferably in competitive binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. In particular, these cells can be used to measure formation of a complex comprising the protein product or fragment of a gene and the agent being tested. Alternatively, these cells can be used to determine if the formation of a complex between the protein product or fragment of a gene and a known ligand is interfered with by an agent being tested. Thus, the present invention discloses methods useful for drug screening wherein such methods comprise contacting a candidate drug with a polypeptide or fragment derived from a gene and assaying (i) for the presence of a complex between the drug and the polypeptide or fragment derived from a gene, or (ii) for the presence of a complex between the polypeptide or fragment derived from a gene and a ligand, by methods well known in the art. Preferably, the polypeptide or fragment derived from a gene is labeled for use in competitive binding assays. Methods for producing a labeled protein by in vitro translation are described in Section J entitled "Preparation of a Labeled Protein". Free polypeptide or fragment will be separated from that present in a protein:protein complex, and the amount of free (i.e., uncomplexed) label will be used as a measure of the binding of the test drug to the polypeptide or the ability of the test drug to interfere with protein:ligand binding.
Another method of drug screening allows for high throughput screening for compounds exhibiting suitable binding affinity to the polypeptides and is described in detail in Geysen, WO 84/03564. According to this method, large numbers of different small peptide test compounds are synthesized on a solid substrate, such as plastic pins or another suitable surface. The peptide test compounds are reacted with the polypeptides or peptide fragments derived from a gene, and washed. Bound polypeptide is then detected by methods well known in the art.
Purified protein can be coated directly onto plates for use in the aforementioned drug screening techniques. Alternatively, non-neutralizing antibodies to the polypeptide can be used to capture the polypeptide or peptide fragment of interest and immobilize it on the solid support.
Competitive drug screening assays in which neutralizing antibodies capable of specifically binding the polypeptide of interest compete with a test compound for binding to the polypeptide or fragments thereof of interest are also useful according to the invention. According to this method, antibodies can be used to detect the presence of any test peptide which shares one or more antigenic determinants with the polypeptide of interest.
An additional technique for drug screening involves the use of host eukaryotic cell lines or cells (such as described above) which have a gene that produces a defective protein. According to this method, the host cell lines or cells are grown in the presence of a test drug compound. The rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of a gene. Alternatively, the ability of the test compound to restore the function of the mutant gene protein can be measured by using an appropriate in vitro assay for function of the protein product of a gene. Suitable in vitro functional assays are described in Section F entitled "Identification and Characterization of Polymorphisms". If the host cell lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the test compound to alter the cellular localization of the protein will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art.
A method of drug screening may involve the use of host eukaryotic cell lines or cells (described above) which have an altered gene that demonstrates an aberrant pattern of expression. By aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or that the temporal pattern of expression is different from that of the wild type gene. The ability of a test drug to alter the expression of a mutant form of a gene can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays. Alternatively, if a mutant form of a gene contains a polymorphism in the promoter region of a gene, cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test compound and the ability of a test compound to alter the level of activity of the mutant gene promoter can be determined by standard assays for each reporter gene which are well known in the art.
Candidate Drugs
A "candidate drug" as used herein, is any compound with a potential to modulate a phenotype associated with a particular disease according to the invention.
A candidate drug is tested in a concentration range that depends upon the molecular weight of the drug and the type of assay. For example, for inhibition of protein/protein complex formation, small molecules (as defined below) may be tested in a concentration range of 1 pg - 100 mg/ml, preferably at about 100 pg - 10 ng/ml; large molecules, e.g., peptides, may be tested in the range of 10 ng - 100 mg/ml, preferably 100 ng - 10 mg/ml. Candidate drug compounds from large libraries of synthetic or natural compounds can be screened. Numerous means are currently used for random and directed synthesis of saccharide, peptide, and nucleic acid based compounds. Synthetic compound libraries are commercially available from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex (Princeton, NJ), Brandon Associates (Merrimack, NH), and Microsource (New Milford, CT). A rare chemical library is available from Aldrich (Milwaukee, WI). Combinatorial libraries are available and can be prepared. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available from e.g., Pan Laboratories (Bothell, WA) or MycoSearch (NC), or are readily producible by methods well known in the art. Additionally, natural and synthetically produced libraries and compounds are readily modified through conventional chemical, physical, and biochemical means.
Useful compounds may be found within numerous chemical classes, though typically they are organic compounds, and preferably small organic compounds. Small organic compounds have a molecular weight of more than 50 yet less than about 2,500 daltons, preferably less than about 750 daltons, more preferably less than about 350 daltons. Exemplary classes include heterocycles, peptides, saccharides, steroids, and the like. The compounds may be modified to enhance efficacy, stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to identify, generate, or screen additional agents. For example, where peptide agents are identified, they may be modified in a variety of ways to enhance their stability, such as using an unnatural amino acid, such as a D-amino acid, particularly D-alanine, by functionalizing the amino or carboxylic terminus, e.g. for the amino group, acylation or alkylation, and for the carboxyl group, esterification or amidification, or the like.
Determination of Activity of a Drug
A candidate drug, assayed according to the invention as described above, is determined to be effective if its use results in a change of about 10% of a phenotype associated with a disease according to the invention.
The level of modulation by a candidate modulator of a phenotype associated with a disease according to the invention, may be quantified using any acceptable limits, for example, via the following formula, which describes detections performed with a radioactively labeled probe (e.g., a radiolabeled antibody in an immunobinding experiment or a radiolabeled nucleic acid probe in a Northern hybridization).
$$\text{Percent Modulation} = \frac{\mathrm{CPM}_{\mathrm{Control}} - \mathrm{CPM}_{\mathrm{Sample}}}{\mathrm{CPM}_{\mathrm{Control}}} \times 100$$
where $\mathrm{CPM}_{\mathrm{Control}}$ is the average of the cpm in antibody/ligand complexes or on Northern blots resulting from assays that lack the candidate modulator (in other words, untreated controls), and $\mathrm{CPM}_{\mathrm{Sample}}$ is the cpm in antibody/ligand complexes or on Northern blots resulting from assays containing the candidate modulator. A similar calculation is performed where the assay comprises use of a labeling system or system of measuring enzymatic activity in which there is a linear relationship between the amount of label detected and the amount of protein or nucleic acid being represented per unit of label or the amount of protein or nucleic acid represented by a unit of enzymatic activity.
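The percent modulation calculation can be sketched in a few lines. The function names and the cpm readings below are hypothetical. The 10% cutoff follows the "about 10%" criterion stated above, and applying it to the absolute value (so that increases and decreases both count) is an assumption of this sketch.

```python
from statistics import mean


def percent_modulation(control_cpms: list[float], sample_cpm: float) -> float:
    """Percent modulation relative to the mean of the untreated controls."""
    control = mean(control_cpms)
    if control == 0:
        raise ValueError("mean control cpm must be non-zero")
    return 100.0 * (control - sample_cpm) / control


def is_effective(modulation: float, threshold: float = 10.0) -> bool:
    """Assumed reading of the 'about 10%' criterion: magnitude of change >= threshold."""
    return abs(modulation) >= threshold


# Hypothetical readings: three untreated controls and one treated sample.
controls = [1000, 1100, 900]
treated = 850

pm = percent_modulation(controls, treated)
print(pm, is_effective(pm))  # 15.0 True
```

A negative result indicates that the treated sample gave more signal than the controls, i.e. modulation in the opposite direction.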
4. Rational Drug Design
Rational drug design is useful for producing either structural analogs of biologically active polypeptides of interest or small molecules with which polypeptides of interest interact (e.g., agonists, antagonists, inhibitors) in order to design drugs which are, for example, more active or stable forms of the polypeptide, or which enhance or interfere with the function of a polypeptide in vivo. See, e.g., Hodgson, 1991, BioTechnology, 9:19. According to one method of rational drug design, the three-dimensional structure of a protein of interest (e.g., the polypeptide product of the gene), or of the complex comprising the protein product of a gene in association with its ligand, is determined by x-ray crystallography, by computer modeling or, most typically, by a combination of approaches. Alternatively, useful information regarding the structure of a polypeptide may be obtained by modeling based on the structure of homologous proteins. Rational drug design has been used successfully in the development of HIV protease inhibitors (Erickson et al., 1990, Science, 249: 527). Rational drug design may also involve the analysis of peptides derived from the protein product of a gene by an alanine scan (Wells, 1991, Methods in Enzymol., 202: 390). According to this method, each of the amino acid residues of the peptide is sequentially replaced by alanine, and the effect of this amino acid substitution on the peptide's activity is determined. This technique can be used to determine the functionally relevant regions of the peptide.
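The set of peptides needed for an alanine scan can be enumerated with a short sketch. The function name and the example peptide are hypothetical, and skipping positions that are already alanine is a simplifying assumption made here (substituting alanine for alanine yields no new variant).

```python
def alanine_scan(peptide: str) -> list[tuple[str, str]]:
    """Return (mutation label, variant sequence) for each single alanine substitution.

    Labels follow the usual residue/position/replacement form, e.g. 'K2A'.
    """
    peptide = peptide.upper()
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # already alanine: no distinct variant to test
        variants.append((f"{residue}{i + 1}A", peptide[:i] + "A" + peptide[i + 1:]))
    return variants


# Hypothetical four-residue peptide in one-letter code.
for label, variant in alanine_scan("MKAL"):
    print(label, variant)
# M1A AKAL
# K2A MAAL
# L4A MKAA
```

Each variant would then be synthesized and assayed, and positions where activity drops mark the functionally relevant residues.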
Another experimental approach to rational drug design will involve the isolation of a target-specific antibody (selected by a functional assay) and the determination of the crystal structure of this antibody. Theoretically, this approach will yield a pharmacore upon which subsequent drug design can be based. Alternatively, if anti-idiotypic antibodies (anti-ids) specific for a functional, pharmacologically active antibody are generated, there is no need to determine the crystallographic structure of the target-specific antibody. It is expected that the binding site of the anti-ids will be an analog of the original receptor. The anti-id could then be used to identify and isolate potentially therapeutic peptides from banks of chemically or biologically produced peptides. These selected peptides would then function as pharmacores. According to these methods it may be possible to design drugs which demonstrate increased activity or stability of the protein product of a gene or which function as inhibitors, agonists, antagonists, etc. of the activity of a protein product of a gene. The availability of cloned gene sequences, including polymorphisms, ensures that sufficient amounts of the polypeptide product of a gene are available to facilitate analytical studies such as x-ray crystallography. Furthermore, the knowledge of the sequence of the protein product of a gene provided herein will guide those using computer modeling techniques in place of, or in addition to, x-ray crystallography.
5. Gene Therapy
The present invention also provides a method of supplying wild-type gene function to a cell which carries a mutant allele of a gene. By replacing a mutant gene with a wild type gene, it may be possible to reverse the symptoms of osteoporosis in the recipient cells. A full length version of the wild-type gene, or a fragment of the gene, may be introduced into the cell in a vector such that the gene remains extrachromosomal and is expressed by the cell from the extrachromosomal location. More preferably, following introduction into the mutant cell, the wild-type gene or gene fragment should recombine with the endogenous mutant gene X already present in the cell. Such recombination requires a double recombination event which results in the correction of the gene mutation. Vectors for introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector may be used. Methods for introducing DNA into cells such as electroporation, calcium phosphate co-precipitation and lipofection are known in the art (described above). Cells transformed with the wild-type gene can be used as model systems to study changes in the intensity of symptoms associated with osteoporosis and drug treatments which promote such changes.
As generally discussed above, a gene or a fragment thereof, where applicable, may be used in gene therapy methods in order to increase the amount of the expression products of such genes in cells of patients with osteoporosis. It may also be useful to increase the level of expression of a gene even in those cells in which the mutant gene is expressed at a "normal" level, but the gene product is not fully functional.
In other embodiments of the invention it may be useful to increase the amount of the expression products of a mutant form of a gene in a cell that expresses the wild type protein.
Gene therapy can be carried out according to generally accepted methods, for example, as described by Friedman (1991, in Therapy for Genetic Diseases, T. Friedman ed., Oxford University Press, pp. 105-121). Initially, the appropriate cells from a patient with osteoporosis would be analysed by the diagnostic methods described above, to determine the level of production of a polypeptide from a gene and the activity of a polypeptide product of a gene. A virus or plasmid vector (see further details below), comprising a copy of a gene and suitable expression control elements, and capable of replicating inside the cells, will be prepared. Suitable vectors are known and are disclosed in U.S. Pat. No. 5,252,479 and PCT published application WO 93/07282. The vector will be injected into the patient, either locally at an appropriate site according to the invention or systemically.
Gene transfer systems known in the art may be useful in the practice of the gene therapy methods of the present invention. These include viral and nonviral transfer methods. A number of viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak et al., 1992, J. Gen. Virol., 73:1533), adenovirus (Berkner, 1992, Curr. Top. Microbiol. Immunol., 158:39; Berkner et al., 1988, BioTechniques, 6:616; Gorziglia and Kapikian, 1992, J. Virol., 66:4407; Quantin et al., 1992, Proc. Natl. Acad. Sci. USA, 89:2581; Rosenfeld et al., 1992, Cell, 68:143; Wilkinson et al., 1992, Nucleic Acids Res., 20:2233; Stratford-Perricaudet et al., 1990, Hum. Gene Ther., 1:241), vaccinia virus (Moss, 1992, Curr. Top. Microbiol. Immunol., 158:25), adeno-associated virus (Muzyczka, 1992, Curr. Top. Microbiol. Immunol., 158:97; Ohi et al., 1990, Gene, 89:279), herpesviruses including HSV and EBV (Margolskee, 1992, Curr. Top. Microbiol. Immunol., 158:67; Johnson et al., 1992, J. Virol., 66:2952; Fink et al., 1992, Hum. Gene Ther., 3:11; Breakefield and Geller, 1987, Mol. Neurobiol., 1:337; Freese et al., 1990, Biochem. Pharmacol., 40:2189), and retroviruses of avian (Bandyopadhyay and Temin, 1984, Mol. Cell. Biol., 4:749; Petropoulos et al., 1992, J. Virol., 66:3391), murine (Miller, 1992, Curr. Top. Microbiol. Immunol., 158:1; Miller et al., 1985, Mol. Cell. Biol., 5:431; Sorge et al., 1984, Mol. Cell. Biol., 4:1730; Mann and Baltimore, 1985, J. Virol., 54:401; Miller et al., 1988, J. Virol., 62:4337), and human origin (Shimada et al., 1991, J. Clin. Invest., 88:1043; Helseth et al., 1990, J. Virol., 64:2416; Page et al., 1990, J. Virol., 64:5370; Buchschacher and Panganiban, 1992, J. Virol., 66:2731). Most human gene therapy protocols have been based on disabled murine retroviruses.
Nonviral gene transfer methods known in the art include chemical techniques such as calcium phosphate coprecipitation (Graham and van der Eb, 1973, Virology, 52:456; Pellicer et al., 1980, Science, 209:1414); mechanical techniques, for example microinjection (Anderson et al., 1980, Proc. Natl. Acad. Sci. USA, 77:5399; Gordon et al., 1980, Proc. Natl. Acad. Sci. USA, 77:7380; Brinster et al., 1981, Cell, 27:223; Costantini and Lacy, 1981, Nature, 294:92); membrane fusion-mediated transfer via liposomes (Felgner et al., 1987, Proc. Natl. Acad. Sci. USA, 84:7413; Wang and Huang, 1989, Biochemistry, 28:9508; Kaneda et al., 1989, J. Biol. Chem., 264:12126; Stewart et al., 1992, Hum. Gen. Ther., 3:267; Nabel et al., 1990, Science, 249:1285; Lim et al., 1992, Circulation, 83:2007); and direct DNA uptake and receptor-mediated DNA transfer (Wolff et al., 1990, Science, 247:1465; Wu et al., 1991, J. Biol. Chem., 266:14338; Zenke et al., 1990, Proc. Natl. Acad. Sci. USA, 87:3655; Wu et al., 1989b, J. Biol. Chem., 264:16985; Wolff et al., 1991, BioTechniques, 11:474; Wagner et al., 1990, Proc. Natl. Acad. Sci. USA, 87:3410; Wagner et al., 1991, Proc. Natl. Acad. Sci. USA, 88:4255; Cotten et al., 1990, Proc. Natl. Acad. Sci. USA, 87:4033; Curiel et al., 1991a, Proc. Natl. Acad. Sci. USA, 88:8850; Curiel et al., 1991b, Hum. Gene Ther., 3:147).
In an approach which combines biological and physical gene transfer methods, plasmid DNA of any size is combined with a polylysine-conjugated antibody specific to the adenovirus hexon protein, and the resulting complex is bound to an adenovirus vector. The trimolecular complex is then used to infect cells. The adenovirus vector permits efficient binding, internalization, and degradation of the endosome before the coupled DNA is damaged.
Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene transfer. While in standard liposome preparations the gene transfer process is nonspecific, localized in vivo uptake and expression have been reported in tumor deposits, for example, following direct in situ administration (Nabel, 1992, Hum. Gen. Ther., 3:399). Gene transfer techniques which target DNA directly to an appropriate tissue, e.g., a tissue that normally expresses the protein product of the candidate gene of the invention, are preferred. Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA (usually in the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine. Ligands are chosen on the basis of the presence of the corresponding ligand receptors on the cell surface of the target cell/tissue type. These ligand-DNA conjugates can be injected directly into the blood if desired and are directed to the target tissue where receptor binding and internalization of the DNA-protein complex occurs. To overcome the problem of intracellular destruction of DNA, coinfection with adenovirus can be included to disrupt endosome function.
6. Peptide Therapy
Peptides which have gene activity can be supplied to cells which carry mutant or missing alleles of a gene. Alternatively, peptides specific for a mutant form of the protein product of a gene can be supplied to cells carrying a wild type protein. The protein product of a gene can be produced by expression of the cDNA sequence in bacteria, for example, using known expression vectors (as described in Section H entitled "Production of a Mutant Protein"). Alternatively, the protein product of a gene can be extracted from mammalian cells engineered to produce the protein product of a gene of interest. In addition, the techniques of synthetic chemistry can be employed to synthesize the protein product of a gene. Any of the above techniques can provide a preparation of protein product of a gene that is substantially free of other human proteins. This is most readily accomplished by carrying out protein synthesis in a microorganism or in vitro. Active gene molecules can be introduced into cells by microinjection or by the use of liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively or by diffusion. Extracellular application of the protein product of a gene may be sufficient to decrease or reverse the physiological effects of osteoporosis. Other molecules with the activity of a protein product of a gene (for example, peptides, drugs or organic compounds) may also be used to effect such a reversal. Modified polypeptides having substantially similar function may also be useful for peptide therapy.
7. Transformed Hosts
Cells and animals which carry a mutant allele of a gene can be used as model systems to study and test for substances which have potential as therapeutic agents. Following application of a test substance to the cells, the phenotype of the cell will be determined. Any variety of phenotypic changes associated with osteoporosis can be assessed, including insulin resistance and combined insulin resistance/insulin secretion defect. Assays for each of these traits are known in the art.
Animals useful for testing therapeutic agents can be selected after mutagenesis of whole animals or after treatment of germline cells or zygotes. Such treatments include insertion of mutant alleles of a gene, usually from a second animal species, as well as insertion of disrupted homologous genes. Alternatively, the endogenous gene of the animals may be disrupted by insertion or deletion mutation or other genetic alterations using conventional techniques (Capecchi, 1989, Science, 244:1288; Valancius and Smithies, 1991, Mol. Cell. Biol., 11:1402; Hasty et al., 1991, Nature, 350:243; Shinkai et al., 1992, Cell, 68:855; Mombaerts et al., 1992, Cell, 68:869; Philpott et al., 1992, Science, 256:1448; Snouwaert et al., 1992, Science, 257:1083; Donehower et al., 1992, Nature, 356:215). Following the administration of test substances, the physiological changes associated with osteoporosis will be assessed. If the test substance prevents or suppresses any of these physiological changes, then the test substance will be considered a candidate therapeutic agent for the treatment of osteoporosis. These animal models provide an extremely important testing vehicle for potential therapeutic products.
8. Use of a Polynucleotide as a Unique Sequence Marker
Polynucleotides can be used to mark objects or substances for the purposes of later identification. Thus, polynucleotides of the invention are useful for tracking the manufacture and distribution of a large number of diverse substances, including but not limited to: (1) natural resources such as animals, plants, oil, minerals, and water; (2) chemicals such as drugs, solvents, petroleum products, and explosives; (3) commercial by-products including pollutants such as radioactive or other hazardous waste; and (4) articles of manufacture such as guns, typewriters, automobiles and automobile parts. A nucleic acid according to the invention, when used as a marker, thus aids in the determination of product identity and so provides information useful to manufacturers and consumers. Polynucleotides have the advantage over other marking materials of being readily amplifiable through the use of polymerase chain reaction (PCR) technology. The method of PCR is well known in the art. PCR is performed as described by Mullis & Faloona, 1987, Methods Enzymol., 155:335, herein incorporated by reference. It is the unique sequence of a polynucleotide which renders it useful as a marker, since the sequence, or a characteristic pattern derived from its sequence, confers a property on the polynucleotide which permits it to be tracked.
It is contemplated that a novel polynucleotide sequence of the invention, or fragments or derivatives of it, may be used as markers by their attachment to or mixture in objects or substances to be marked. Methods for marking various classes of substances and later detection of the tags in those substances are disclosed in U.S. Patent Nos. 5,451,505 and 5,643,728.
Briefly, the use of a polynucleotide of the invention as a marker may entail combining a polynucleotide with the substance or object to be marked, using methods appropriate to that substance or object; and detecting the marker through amplification of the polynucleotide sequence using PCR technology, followed by either sequence analysis or identification by other means known in the art (e.g., hybridization assays).
The methods of applying a marker nucleic acid to a substance or object and subsequent detection of that nucleic acid will vary depending upon the nature of the substance or object and the environment to which it will be exposed. For example, inert solids such as paper, many pharmaceutical products, wood, some foodstuffs, etc., can be either processed with the marker nucleic acid, or the nucleic acid may be sprayed onto their surfaces. Chemically active substances, such as foodstuffs with enzymatic activity, polymers with charged groups, or acidic pharmaceuticals may require that a protective composition (e.g., liposomes) be added to the nucleic acid being used as a marker.
In order to mark liquids, the nucleic acid may be mixed directly with the liquid, or, if the chemical nature of the liquid is not compatible with this approach (i.e., nucleic acids are not soluble in the liquid), the nucleic acid may be mixed with a detergent to enhance its solubility.
Containerized gases may be marked simply by adding a nucleic acid to the container in dry form, as it will be dispersed throughout the gas as the gas is released.
The amount of nucleic acid to add to a substance as a marker will also vary with the given situation, as will the detection strategy. PCR technology, however, allows the amplification and detection of as little as one molecule from a sample. Other means of detection, such as hybridization assays, require that more nucleic acid be recovered from a sample to efficiently detect it. PCR can be combined with a hybridization assay, however, to enhance the sensitivity of the method. A nucleic acid sequence used as a marker will generally be from 20 to 1,000 bases long, and preferably will be 60 to 1,000 bases long when PCR is to be used to detect the marker.
One example of a substance for which nucleic acid marking is suited is gunpowder. Marked gunpowder may be prepared as follows: 1) add 16 ng of nucleic acid bearing the chosen marker sequence (derived from a polynucleotide of the invention) to 1 ml of distilled water; 2) mix the solution of nucleic acid with 1 g of nitrocellulose-based gunpowder; and 3) dry in air or under vacuum at 85°C. To recover the marker from gunpowder: 1) wash the gunpowder sample with 1 ml of distilled water; 2) add 50 μl of the wash solution to a standard PCR mix, or, alternatively, place gunpowder flakes directly into a 100 μl PCR mix; and 3) amplify according to standard PCR methods using primers which anneal at opposite ends and on opposite strands of the sequence used as a marker (annealing and extension conditions will depend upon the exact sequences chosen for oligonucleotide primers, and may be adjusted according to methods known in the art).
Another example of a substance which may be marked with a nucleic acid according to the invention is ink. To prepare a marked ink sample: if the ink is water insoluble, mix the nucleic acid with detergents as for oil; if the ink is water soluble, add nucleic acid directly to the ink to a concentration of about 1 to 20 ng per ml. To recover the marker from ink, proceed as for oils and medicines.
In the above examples, the presence of an amplification product of the proper size (visualized, for example by gel electrophoresis alongside nucleic acid size markers followed by ethidium bromide staining of the gel, according to standard methods) will indicate the presence of the marker in the sample. In some instances, the PCR product may be further subjected to hybridization analysis or to sequencing to enhance the accuracy of the method. A method of hybridization analysis which can be used is described herein.
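By way of illustration only, the expected size of the amplification product can be worked out in advance from the marker sequence and the two primers. The following is a minimal sketch in Python, assuming exact primer matches and a single binding site for each primer; the function names and the 26-base marker are hypothetical and are not taken from the disclosure.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def expected_amplicon_length(template, forward_primer, reverse_primer):
    """Return the length of the product expected on the template's top strand,
    or None if either primer has no exact match (hypothetical helper)."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so its binding site on
    # that strand reads as the reverse complement of the primer.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start)
    if end == -1:
        return None
    return end + len(site) - start


# Hypothetical 26-base marker embedded in unrelated flanking sequence.
marker = "ATGCCGTA" + "T" * 10 + "GGCATCAA"
sample = "CCCC" + marker + "AAAA"
print(expected_amplicon_length(sample, "ATGCCGTA", "TTGATGCC"))
```

A sample lacking the marker returns None in this sketch, which corresponds to the absence of a band of the proper size on the gel.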
9. Use of a Polynucleotide of the Invention as a Marker for Chromosome Mapping:
Because a polynucleotide of the invention is novel (that is, its sequence is unique), it is useful as a marker for chromosomal mapping. There are a number of methods of chromosomal mapping known in the art. Prominent among them is the variant of the in situ hybridization technique known as "Fluorescence In Situ Hybridization", or FISH. Details of methods and solutions used for in situ hybridization are well-known in the art. There are many variations of the FISH technique itself; however, the basic approach is similar in each case. Essentially, in situ hybridization of cells, nuclei, or metaphase chromosome spreads is performed with a polynucleotide probe either directly labeled with a fluorochrome, or labeled with a moiety which will be bound by a fluorochrome-tagged entity. The hybridized probe is visualized by irradiation of the sample with light of the wavelength which excites fluorescence from the fluorochrome. When combined with standard methods of karyotyping known in the art, this method allows the polynucleotide sequence to be localized to a particular arm of a particular chromosome. Once mapped to a specific chromosome, the location of the novel polynucleotide sequence on that chromosome may be further localized by in situ hybridization along with probes specific for known genes or sequences, labeled with other fluorescent tags which allow the differentiation of the signals from the different probes. Such an approach and various adaptations of it allow the localization of the novel gene relative to a known gene. Methods of generating and using fluorescence-labeled polynucleotide probes for FISH and chromosome mapping are known in the art (for example, see Malcolm et al, 1981, Ann. Hum. Genet., 45:134; Bar-Am et al, 1992, Genes, Chromosomes & Cancer, 4:314; Pinkel et al, 1988, Proc. Natl. Acad. Sci. USA, 85:9138; U.S. Patent No. 5,728,527).
Additional variations of the chromosome mapping method utilize a PCR approach (Dionne et al, 1990, BioTechniques, 8(2):190 and Iggo et al, 1989, Proc. Natl. Acad. Sci. USA, 86:6211).
In addition to being able to determine the chromosomal location of the novel polynucleotide, similar technology, in which FISH is combined with flow cytometry, will allow the polynucleotide of the invention to be used to sort chromosomes, nuclei, or whole cells containing various dosages (i.e., gene copy numbers) of the gene encoding that polynucleotide (Hulfdin et al, 1998, Nuc. Acids Res., 26:3651).
10. Use of a Polynucleotide of the Invention as a Marker for Analysis of Forensic Materials:
Forensic science depends heavily on methods for determining the source of various compounds associated with criminal activity. In particular, the identification of individuals involved in criminal activity through analysis of substances found at the crime scenes is critical. Such identification is possible with genetic typing, which involves the determination of the genotype of an individual with regard to loci which are polymorphic within the population. As used herein, "polymorphic" refers to a gene or other segment of DNA which shows nucleotide sequence variability from individual to individual. The use of PCR techniques and nucleotide probes to detect even single nucleotide changes in a polynucleotide sequence has revolutionized the field of forensic serology (see Reynolds and Sensabaugh, 1991, Anal. Chem., 63:2). For an example of polymorphisms useful for forensic identification and methods of typing samples with regard to those polymorphisms, see U.S. Patent No. 5,273,883.
If a polynucleotide of the invention is found to have nucleotide sequence variation among individuals within a population, it may be useful in the analysis of forensic samples. There are a number of methods known to those skilled in the art for typing nucleic acids with regard to polymorphisms. It should be understood that any such method is acceptable according to the invention. One particular method is termed the "reverse dot blot" method. The basic steps involved are: 1) oligonucleotides bearing the sequences of various polymorphic forms of the polynucleotide region to be analysed are bound to membranes; 2) labeled, PCR-amplified fragments, derived from the sample to be genotyped, and corresponding to the polymorphic region ("target DNA") are allowed to hybridize to the bound oligonucleotides under conditions which only allow the hybridization of molecules with 100% complementary sequences; 3) unbound target DNA is removed; and 4) hybridized molecules are detected.
The specific genotype of the individual from whom the target sample was obtained (amplified), with regard to the polymorphic region of a polynucleotide of the invention, may thus be determined by screening a panel of probes containing the known polymorphic sequence variations of that region. It should be understood that the hybridization conditions may be adjusted by one of skill in the art so that limited amounts of non-complementarity, including single base mismatches, may be detected with this method.
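The logic of the reverse dot blot can be sketched, for illustration only, as an exact-match lookup against a probe panel. In the Python sketch below, hybridization under conditions requiring 100% complementarity is modelled as an exact substring match, and probes are written in the same orientation as the target for simplicity; the function name, allele names, and sequences are hypothetical.

```python
def reverse_dot_blot(target_fragments, probe_panel):
    """Return the names of the bound probes that capture at least one fragment.

    target_fragments: amplified target sequences (for example, one per allele).
    probe_panel: dict mapping an allele name to its probe sequence.
    """
    fragments = [fragment.upper() for fragment in target_fragments]
    return sorted(
        name
        for name, probe in probe_panel.items()
        # Exact substring match stands in for perfect-match hybridization.
        if any(probe.upper() in fragment for fragment in fragments)
    )


# Hypothetical two-allele panel differing at a single base (C versus T).
panel = {"allele_C": "GATTCCAGT", "allele_T": "GATTTCAGT"}
heterozygote = ["AAGATTCCAGTCC", "AAGATTTCAGTCC"]
homozygote = ["AAGATTCCAGTCC", "AAGATTCCAGTCC"]
print(reverse_dot_blot(heterozygote, panel))
print(reverse_dot_blot(homozygote, panel))
```

A signal at both dots indicates a heterozygote and a signal at one dot indicates a homozygote for that allele. Relaxed hybridization conditions, which tolerate mismatches, are not modelled here.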
Q. Pharmaceutical Compositions—Prevention and Treatment
1. Administration of Pharmaceutical Compositions
Administration of pharmaceutical compositions is accomplished orally or parenterally. Methods of parenteral delivery include topical, intra-arterial (directly to the tumor), intramuscular, subcutaneous, intramedullary, intrathecal, intraventricular, intravenous, intraperitoneal, or intranasal administration. In addition to the active ingredients, these pharmaceutical compositions may contain suitable pharmaceutically acceptable carrier preparations which can be used pharmaceutically.
Pharmaceutical compositions for oral administration can be formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for oral administration. Such carriers enable the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions and the like, for ingestion by the patient.
Pharmaceutical preparations for oral use can be obtained through combination of active compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable excipients are carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose, hydroxypropylmethyl-cellulose, or sodium carboxymethyl cellulose; and gums including arabic and tragacanth; and proteins such as gelatin and collagen. If desired, disintegrating or solubilizing agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate.
Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures.
Dyestuffs or pigments may be added to the tablets or dragee coatings for product identification or to characterize the quantity of active compound, i.e., dosage.
Pharmaceutical preparations which can be used orally include push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol. Push-fit capsules can contain active ingredients mixed with a filler or binders such as lactose or starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol with or without stabilizers.
Pharmaceutical formulations for parenteral administration include aqueous solutions of active compounds. For injection, the pharmaceutical compositions of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiologically buffered saline. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or dextran. Additionally, suspensions of the active compounds may be prepared as appropriate oily injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the compounds to allow for the preparation of highly concentrated solutions.
For topical or nasal administration, penetrants appropriate to the particular barrier to be permeated are used in the formulation. Such penetrants are generally known in the art.
2. Manufacture and Storage
The pharmaceutical compositions of the present invention may be manufactured in a manner that is known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.
The pharmaceutical composition may be provided as a salt and can be formed with many acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc. Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free base forms. In other cases, the preferred preparation may be a lyophilized powder in 1 mM-50 mM histidine, 0.1%-2% sucrose, 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer prior to use. After pharmaceutical compositions comprising a compound of the invention formulated in an acceptable carrier have been prepared, they can be placed in an appropriate container and labeled for treatment of an indicated condition with information including amount, frequency and method of administration.
3. Therapeutically Effective Dose
Pharmaceutical compositions suitable for use in the present invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose. The determination of an effective dose is well within the capability of those skilled in the art. For any compound, the therapeutically effective dose can be estimated initially either in cell culture assays, or in animal models, usually mice, rabbits, dogs, or pigs. The animal model is also used to achieve a desirable concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.
A therapeutically effective dose refers to that amount of protein or its antibodies, antagonists, or inhibitors which ameliorates the symptoms or conditions. Therapeutic efficacy and toxicity of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose lethal to 50% of the population). The dose ratio between therapeutic and toxic effects is the therapeutic index, and it can be expressed as the ratio, LD50/ED50. Pharmaceutical compositions which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and animal studies are used in formulating a range of dosage for human use. The dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, sensitivity of the patient, and the route of administration.
The exact dosage is chosen by the individual physician in view of the patient to be treated. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Additional factors which may be taken into account include the severity of the disease state; age, weight and gender of the patient; diet, time and frequency of administration, drug combination(s), reaction sensitivities, and tolerance/response to therapy. Long acting pharmaceutical compositions might be administered every 3 to 4 days, every week, or once every two weeks depending on the half-life and clearance rate of the particular formulation.
Dosage amounts may vary from 0.1 to 100,000 micrograms per person per day, for example, 1 μg, 10 μg, 100 μg, 500 μg, 1 mg, 10 mg, and even up to a total dose of about 1 g per person per day, depending upon the route of administration. Guidance as to particular dosages and methods of delivery is provided in the literature. See U.S. Patent Nos. 4,657,760; 5,206,344; or 5,225,212, hereby incorporated by reference. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotide or polypeptides will be specific to particular cells, conditions, locations, etc.
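As a worked illustration of the ratio LD50/ED50, a minimal Python sketch follows. The function name and the dose values are hypothetical and serve only to show the arithmetic; both doses must be expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index LD50/ED50 for doses in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must both be positive")
    return ld50 / ed50


# Hypothetical compound: lethal to 50% at 500 mg/kg, effective in 50% at 5 mg/kg.
print(therapeutic_index(500.0, 5.0))
```

A larger ratio indicates a wider margin between the effective and the lethal dose, which is why compositions with large therapeutic indices are preferred.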
Without further elaboration, it is believed that one skilled in the art can, using the preceding description, utilize the present invention to its fullest extent. The following embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
EXAMPLE 1
Establishment of an Association Between a Given Polynucleotide Sequence and Osteoporosis
A polynucleotide sequence according to the invention, containing a mutation which is believed to be associated with osteoporosis, can be statistically linked to osteoporosis by linkage analysis. An animal model system exhibiting a particular phenotypic defect that is characteristic of osteoporosis is selected. A series of genetic crosses is performed in this animal model system between individuals having an observable mutant phenotype and normal individuals of a control strain. At least one disease-related locus or a chromosomal marker that does not comprise a disease-related locus is used as a marker in these crosses. If a statistically significant pattern of non-random assortment of the mutant trait with a marker locus is observed, the trait is linked to the marker locus.
Similarly, linkage analysis can be performed on an existing human or other mammalian pedigree. According to this method, numerous genetic loci from affected and unaffected family members are compared. Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
If either approach demonstrates a non-random assortment of the osteoporosis-related phenotype with a marker locus, this is indicative of an association between the gene underlying the defect and that locus. Because the strength of any conclusion drawn from linkage analysis is statistically-based, the accuracy of the results is thought to be proportional to the number of crosses or family members and genetic loci analysed.
EXAMPLE 2
Screening Assay For Osteoporosis
A polynucleotide sequence according to the invention can be used as a marker for a normal phenotype or for a phenotype associated with osteoporosis.
If it can be demonstrated by the methods of phenotyping, described above, that a particular sequence is associated with an osteoporosis phenotype, this sequence can be used as a marker for osteoporosis. A sequence of interest can be used as a probe to screen genomic DNA from individuals by Southern blot analysis according to the method described above. If the sequence of interest is detected by Southern blot analysis, and the presence of this sequence is confirmed by direct sequencing, it can be concluded that the individual from whom the genomic DNA has been isolated has an increased likelihood of developing the osteoporosis for which the sequence is a marker.
The marker can also be used as an osteoporosis indicator according to the method of PCR. A genomic DNA sample of interest can be analysed in a PCR reaction wherein one of the primers contains the marker sequence. If the marker sequence is present in the sample DNA, a PCR product will be produced. Alternatively, the PCR primers can be designed such that they amplify a region containing the marker sequence. The amplified product can be analysed by hybridization methods, described above, to determine the presence of the sequence of interest.
EXAMPLE 3
Use of a Given Polynucleotide as a Target for Drug Screening
A polynucleotide according to the invention, containing a mutation which is believed to be associated with osteoporosis, can be used as a target for drug screening.
One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a polynucleotide according to the invention and either exhibit a particular phenotype characteristic of the presence of the polynucleotide or express a polypeptide or fragment encoded by the polynucleotide. Such cells, either in viable or fixed form, can be used for standard competitive binding assays. In particular, these cells can be used to measure formation of a complex comprising the protein product or fragment of a polynucleotide according to the invention and the agent being tested. Alternatively, these cells can be used to determine if the formation of a complex between the protein product or fragment of a polynucleotide according to the invention and a known ligand is interfered with by an agent being tested.
An alternative method for drug screening involves the use of eukaryotic cell lines or cells (such as described above) which contain a polynucleotide according to the invention that produces a defective protein. According to this method, the host cell lines or cells are grown in the presence of a test drug. The rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of the polynucleotide according to the invention. Preferably, a drug that is useful according to the invention will increase or decrease the growth rate of a cell by at least 10%. Alternatively, the ability of the test compound to restore the function of the mutant gene protein by at least 10% can be measured by using an appropriate in vitro assay for function of the protein product of a gene (as described in Section F entitled "Identification and Characterization of Polymorphisms"). If the host cell lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the test compound to alter the cellular localization of the protein by at least 10% will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art.
A method of drug screening may also involve the use of host eukaryotic cell lines or cells (described above) which have an altered gene that demonstrates an aberrant pattern of expression where the level of expression is either abnormally high or low, or the temporal pattern of expression is different from that of the wild type gene. The ability of a test drug to alter the expression of a mutant form of a gene by at least 10% can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays, as described above. Alternatively, if a mutant form of a gene contains a polymorphism in the promoter region of a gene, cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g., CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test compound and the ability of a test compound to alter the level of activity of the mutant gene promoter can be determined by standard assays for each reporter gene which are well known in the art.
A transgenic animal whose genomic DNA contains a polynucleotide associated with a particular phenotypic defect that is characteristic of osteoporosis and a normal, control animal (not containing the polynucleotide) can be treated with a candidate drug according to the invention. The ability of a candidate drug to ameliorate symptoms of the disease, by at least 10%, will be analysed by assessing the disease symptoms and their amelioration.
EXAMPLE 4
Polymorphisms in Genes Associated With Osteoporosis
The osteoporosis candidate gene list was compiled using gene or gene sequences selected from literature sources, using sequence homology, library subtraction and expression analysis. Expression analysis was performed using "guilt-by-association" queries to identify Incyte-novel and known genes not previously associated with osteoporosis which have similar expression patterns to genes known to be associated with osteoporosis. Guilt-by-association analysis was performed as described in Walker et al. 1999 Genome Res 9:1198 and U.S. Provisional Patent Application Serial No. 60/342,711 entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed December 20, 2001, both of which are incorporated by reference in their entirety.
Polymorphism discovery was by fSSCP as described in section F "Identification and Characterization of Polymorphisms". The polymorphisms were mapped to cDNA sequences in the LifeSeqGold database (Incyte Genomics, Inc., Palo Alto, CA) to identify the affected gene.
EXAMPLE 5
Frequency of polymorphisms in Osteoporosis associated genes and polynucleotides in various populations
Polymorphisms identified in Example 4 were genotyped against populations described below by fSSCP or FP-TDI as described above. The results of the population frequency studies are given in Table 2 found in U.S. Provisional Patent Application Serial No. 60/342,711 entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed December 20, 2001, which is hereby incorporated herein by reference in its entirety.
Two panels of human DNA have been developed to support the identification of frequent SNPs within an ethnically diverse population. The genomic Human Diversity Panel (HDP) will be used where full genomic structure for the selected candidate genes is available to allow screening of the open reading frame of the gene including splice junctions. A cDNA version of the HDP (generated from lymphoblastoid cell lines to obviate the need for intron/exon structure in 50% of human genes) will be used where full genomic structure for the selected candidate genes may not be available to permit screening of the open reading frame of the gene.
This HDP is derived from 47 consenting individuals from four ethnic groups (Caucasian, African-American, Asian and Hispanic). The panel is sufficiently sized to enable identification of 95% of SNPs with allele population frequencies >= 5%. Comparable utility of the HDP with the NIH Diversity panel was demonstrated by parallel screening of 90 kilobases of coding sequence from each panel.
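The relationship between panel size and the chance of observing a SNP can be sketched with a simple sampling calculation. The Python sketch below is illustrative only: it assumes that each of the two chromosomes of each individual is an independent draw from a single population and that the assay detects every copy present. It ignores assay sensitivity and the division of the panel among ethnic groups, so it is not expected to reproduce the 95% figure stated above. The function names are hypothetical.

```python
import math


def detection_probability(allele_frequency, individuals):
    """Probability that at least one copy of the allele is in the panel."""
    chromosomes = 2 * individuals  # diploid individuals
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


def min_individuals(allele_frequency, target_probability):
    """Smallest panel giving at least the target probability of one copy."""
    chromosomes = math.log(1.0 - target_probability) / math.log(1.0 - allele_frequency)
    return math.ceil(chromosomes / 2.0)


print(round(detection_probability(0.05, 47), 3))
print(min_individuals(0.05, 0.95))
```

Under these idealized assumptions, a 47-member panel leaves a margin over the minimum number of individuals needed to sample a 5% allele at least once.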
EXAMPLE 6
Osteoporosis study population recruitment and clinical data collection
Families were identified through probands with a BMD Z score of at least -1.6 (equivalent to approximately the lower 5% of the normal distribution of BMD) at either the femoral neck or the lumbar spine (L2-L4). A "proband" is defined as the first person identified with a particular phenotype (in this case low BMD) within a family.
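The correspondence between a Z score cutoff and a fraction of the normal distribution can be checked with the standard normal distribution. The following Python sketch uses the standard library class statistics.NormalDist; the helper names are hypothetical, and the calculation assumes that BMD Z scores are normally distributed.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def fraction_below(z_score):
    """Fraction of a normal population with a Z score below the cutoff."""
    return STANDARD_NORMAL.cdf(z_score)


def z_cutoff(lower_fraction):
    """Z score below which the given fraction of the population falls."""
    return STANDARD_NORMAL.inv_cdf(lower_fraction)


print(round(fraction_below(-1.6), 4))  # fraction selected by a cutoff of -1.6
print(round(z_cutoff(0.05), 3))        # cutoff for exactly the lower 5%
```

On this basis a cutoff of -1.6 selects slightly more than 5% of the distribution, consistent with the word "approximately" in the definition above.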
The initial phase of family collection focused on nuclear families of European Caucasoid origin. These families were used primarily for a genome-wide scan for genetic determinants of BMD. BMD was measured in all participating family members and treated as a quantitative trait. First-degree relatives of probands were invited to participate. These included parents, siblings and offspring over the age of twenty. Spouses could take part to act as controls and to assist the analysis of their children's genotype.
If a relative is found to have a low bone density (cut off at Z of approximately -1.28; equivalent to the lower 10% of the phenotypic range), the invitation to participate will be extended to their first-degree relatives. In some cases the parents of a proband will be deceased. If a strong family history suggesting osteoporosis in deceased parents is present, then secondary relatives such as aunts, uncles, and cousins will be invited to participate.
The size and nature of families will therefore depend on a number of factors including the age of the proband, family history of osteoporosis or fractures and whether other family members are willing to participate. It is expected, judged from previous experience, that the average number of volunteers per family will be five individuals. The absolute minimum family that was accepted into the study was a pair of siblings, either concordant or discordant for BMD, where one of the siblings was a proband. Large numbers of simplex families were also collected for linkage disequilibrium studies, to support the finer mapping stages of positional cloning and the systematic mapping of functional candidate genes. At a later stage, families from other ethnic groups will provide genetic diversity for haplotype analysis to help identify the primary disease-predisposing sequences. Cape Town and Singapore were selected as sites for collecting material from other ethnic groups.
Potential probands were identified if they had a femoral neck or lumbar spine BMD equal to or lower than Z -2.0, were between 20 and 85 years of age, were European white Caucasian, and were fully mobile. They were excluded from the study if they had secondary osteoporosis; prednisolone usage at a dose of 7.5 mg per day for six months or longer, or equivalent steroid doses of dexamethasone 0.75 mg per day or hydrocortisone 30 mg per day; hypothyroidism treated with thyroxine with a TSH below the laboratory normal range; a malignancy (including myeloma) within five years; malabsorption; an inflammatory bowel disease; premenopausal (aged less than 45 years) amenorrhoea of greater than six months, other than pregnancy; previous or current alcohol intake estimated at greater than 30 units per week for more than six months; chronic renal failure (creatinine > 150 μmol/l); or chronic liver dysfunction (AST > twice normal).
Volunteers gave blood samples for DNA extraction for genetic studies, as well as blood samples for calcium, creatinine, liver function (if over 60 years), TSH and vitamin D (if over 60 years) tests, and a second voided urine sample for markers of bone turnover. For genetic studies, at least 10 ml of venous blood was collected from a forearm vein into EDTA tubes. Blood collected into plastic tubes was frozen straight away. Blood collected into glass tubes was transferred to plastic tubes before freezing. The blood was frozen as quickly as possible to -70°C and then stored at -70°C. Once frozen, the blood was not thawed until DNA extraction took place. DNA extraction was performed using standard procedures. A 10-ml venous blood sample was also taken from all subjects for biochemical assays of calcium, creatinine, liver function, TSH and vitamin D. Blood was collected into a plain container. Separated serum was stored at -70°C.
Second voided urine samples for analysis of biochemical markers of bone turnover were taken. These samples were stored at -70°C. In addition to BMD at femoral neck and lumbar spine, height and weight were measured. Volunteers were scanned at the femoral neck and lower spine (L2-L4). For the femoral neck, the volunteer was placed in the dorsal decubitus position with a 10-degree internal rotation of the hip, according to the manufacturer's protocol. For a satisfactory lumbar spine scan (e.g., no scoliosis, severe degenerative disease or obvious fracture), the volunteer was placed, as described in the manufacturer's manual, in a comfortable supine position with legs raised and supported so as to ensure that the lumbar spine is as horizontal as possible. The axis of the spine should be parallel to the axis of the scanning machine.
Bone mineral density measurements were performed using dual energy X-ray absorptiometry (DXA) scanning. The bone density data were standardized by the use at each center of the same male and female reference population databases for hip and spine. The Z score at both the femoral neck and the lumbar spine for an individual volunteer was calculated using the regression line and standard deviation from the respective reference database. This is done by using the regression equation y = mx + c, where y is the absolute BMD, x is the volunteer's age and m and c are the slope and constant, respectively. From this, one can calculate the predicted BMD value. The Z score was calculated by subtracting the predicted BMD from the actual BMD, and then dividing the difference by the reference population standard deviation.
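The Z score calculation described above can be sketched in Python as follows. The function names are hypothetical, and the slope, constant, and standard deviation in the usage lines are invented placeholder values, not values from any reference database.

```python
def predicted_bmd(age, slope, constant):
    """Predicted BMD from the reference regression line y = m*x + c."""
    return slope * age + constant


def bmd_z_score(actual_bmd, age, slope, constant, reference_sd):
    """Z score: (actual BMD - predicted BMD) / reference population SD."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (actual_bmd - predicted_bmd(age, slope, constant)) / reference_sd


# Placeholder reference values (g/cm^2), chosen for illustration only.
z = bmd_z_score(actual_bmd=0.66, age=50, slope=-0.004, constant=1.1, reference_sd=0.12)
print(round(z, 2))
```

In practice a separate slope, constant, and standard deviation would be taken from the male or female reference database for the site being scanned (hip or spine).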
EXAMPLE 7
Data Handling and Statistical Analysis to Determine Association of Polymorphisms with Osteoporosis
Bone mineral density was analysed as a quantitative trait in probands and family members. Selection of probands with a low BMD increased the power to detect linkage of genetic marker loci.
Power is defined here as the probability of observing positive evidence for linkage at a single additive quantitative locus, assuming that a genetic susceptibility locus exists, using the variance component model of Amos (1994). Positive evidence of linkage means a LOD score of 3.0 (p<0.001) or greater, the accepted scientific standard.
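The relationship between a LOD score threshold and a p-value can be sketched as follows. The conversion assumes the usual large-sample approximation in which 2 ln(10) x LOD follows a chi-square distribution with one degree of freedom, and a one-sided test in which the tail probability is halved; both are assumptions of this illustration rather than statements from the analysis itself, and the function names are hypothetical.

```python
import math


def lod_to_chi_square(lod):
    """Convert a LOD score to the equivalent likelihood-ratio chi-square."""
    return 2.0 * math.log(10.0) * lod


def chi_square_tail_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lod_to_p_value(lod, one_sided=True):
    """Approximate pointwise p-value for a LOD score (asymptotic assumption)."""
    if lod < 0:
        raise ValueError("lod must be non-negative")
    p = chi_square_tail_1df(lod_to_chi_square(lod))
    return 0.5 * p if one_sided else p


p_at_threshold = lod_to_p_value(3.0)
print(f"chi-square = {lod_to_chi_square(3.0):.2f}, p = {p_at_threshold:.2e}")
```

Under these assumptions a LOD score of 3.0 corresponds to a pointwise p-value below the 0.001 level quoted above.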
Power to detect an additive locus was estimated from observed phenotypic data given assumed values for:
Broad sense heritability (a measure of the overall genetic component of the phenotype).
Narrow sense heritability (a measure of the genetic contribution of a specific locus).
The frequencies of the alleles at the quantitative locus.
The broad sense heritability for osteoporosis was estimated to be between 0.3 and 0.8. Theoretical calculations were based on the analysis of 108 nuclear families with an average of 2.9 phenotyped siblings per family using the same recruitment strategy as described above, assuming:
An intermediate value of 0.5 for broad sense heritability.
The presence of two alleles at a given quantitative trait locus.
That the locus behaves in an additive manner.
That the marker locus is highly informative.
That the recombination frequency between the trait locus and the marker is negligible.
While it is difficult to predict both the contribution of a specific locus to genetic susceptibility and the frequency of the associated allele, it is likely that for a multifactorial disease like osteoporosis, susceptibility will be due to common alleles with small gene effects. Power calculations were performed, therefore, for a range of narrow sense heritability and allele frequency combinations; narrow sense heritability > 8% and allele frequency of 0.22 or less, and narrow sense heritability > 12% and allele frequency of 0.28 or less are typical examples. Thus, if the modeling assumptions are reasonable approximations to reality, it was estimated that about 1200 families will be needed to detect a quantitative locus at 80% power using families in which the proband has a BMD Z score of -2.0 or less.
Similar calculations suggest that by tightening the proband inclusion criterion to a BMD Z score of -2.0 (approximately the lower 2.5% of the phenotypic range) the number of families required would be reduced to an estimated 800.
Thus, it is proposed that family recruitment begins initially using the proband inclusion criterion of a BMD Z score of -2.0. Depending on the rate of family recruitment, which will be monitored constantly, and also on the ongoing genetic analyses, a more stringent proband criterion of Z score of -2.0 or less will be adopted.
Statistical Analysis
The genome scan was performed using a strategy of replication. This is a powerful approach based on the premise that it is highly improbable that false positive evidence for linkage will be replicated in the analysis of additional data sets. There was an interim analysis on an initial population of 200 families from the Oxford region, followed by analysis of additional family data sets from the UK and The Netherlands. Linkage analysis was performed using the variance-components analysis program.
BMD was corrected for height, weight, age and sex. The experimental threshold for positive evidence for linkage was p <0.001 at one locus or p <0.01 at two or more adjacent loci.
EXAMPLE 8
Identification of Single Nucleotide Polymorphisms
Single nucleotide polymorphisms (SNPs) were identified using Incyte's proprietary fSSCP method. Fluorescently labeled primers were synthesized and PCR was performed on 47 DNAs from a Coriell-derived Human Diversity Panel. The PCR products were electrophoresed on an ABI 377 machine using 8% nondenaturing, 12cm SSCP gels. The resulting traces were aligned in ABI Genotyper software and, where variant traces (indicating underlying polymorphisms) were found, examples of each variant were sequenced.
Biallelic polymorphism genotyping by Pyrosequencing™
A pair of oligonucleotides for amplification by PCR was designed on either side of each biallelic polymorphism to produce a product size between 50bp and 350bp. A sequencing oligonucleotide was designed to end within 30bp either 5' or 3' to each polymorphic site. All amplification oligonucleotides used to generate the complementary strand to the sequencing primer were labeled with a 5'-biotin. Examples of the particular sequencing primers used are found in Table 10 of U.S. Provisional Patent Application Serial Number 60/423559, entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed November 4, 2002, which is incorporated by reference in its entirety, and in Tables 10 and 13 of this application. For each marker, all samples genotyped were amplified by PCR using the PCR amplification oligonucleotides. Each reaction used: 20ng DNA (dried down), 0.6 units of AmpliTaq Gold™ DNA polymerase, 1X PCR Buffer R, 2.5mM MgCl2, 1mM dNTP, and 10pmol of each PCR oligonucleotide in a final volume of 10µl. The PCR cycling conditions used were: 95°C for 12 min; 45 cycles of 94°C for 15 sec, TA (the marker-specific annealing temperature) for 15 sec and 72°C for 30 sec; and a final step of 72°C for 5 min.
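The oligonucleotide design constraints stated above (a product of 50bp to 350bp flanking the polymorphism, and a sequencing oligonucleotide ending within 30bp of the polymorphic site) can be checked with a short sketch. The class, field names and coordinates below are hypothetical and serve only to illustrate the constraints; they do not describe any particular assay.

```python
from dataclasses import dataclass


@dataclass
class AssayDesign:
    """Hypothetical 1-based coordinates on a reference sequence."""
    amplicon_start: int      # first base of the forward amplification oligo
    amplicon_end: int        # last base covered by the reverse amplification oligo
    snp_position: int        # position of the biallelic polymorphism
    seq_primer_end: int      # base at which the sequencing oligo ends


def product_size(design):
    """Length of the PCR product in base pairs."""
    return design.amplicon_end - design.amplicon_start + 1


def design_is_valid(design, min_product=50, max_product=350, max_distance=30):
    """Check the product size, flanking and sequencing-primer distance rules."""
    size_ok = min_product <= product_size(design) <= max_product
    flanked = design.amplicon_start < design.snp_position < design.amplicon_end
    distance = abs(design.snp_position - design.seq_primer_end)
    # The sequencing oligo must end beside the site (5' or 3'), not on it.
    primer_ok = 1 <= distance <= max_distance
    return size_ok and flanked and primer_ok


good = AssayDesign(amplicon_start=101, amplicon_end=300,
                   snp_position=200, seq_primer_end=195)
too_long = AssayDesign(amplicon_start=101, amplicon_end=500,
                       snp_position=200, seq_primer_end=195)
print(design_is_valid(good), design_is_valid(too_long))
```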
After amplification the DNA strand of each PCR template complementary to the sequencing primer was isolated, ready for pyrosequencing (PSQ). To do this, 1) 50µl of Dynabead solution (2mg/ml Dynabeads®, 5mM Tris-HCl, 1M NaCl, 0.5 mM EDTA, 0.05% Tween 20) was added to the PCR product and shaken at 65°C for 15 min, 2) the template was transferred using magnets to 50µl of 0.5M NaOH for 1 min, 3) the template was transferred using magnets to 100µl of 1X Annealing buffer (20mM Tris-Acetate, 5mM MgAc2) for 1 min, and 4) the template was transferred using magnets to 45µl of 1X Annealing buffer containing 15pmol of sequencing oligonucleotide. Examples of particular sequencing primers and specific annealing temperatures are found in Table 10 of U.S. Provisional Patent Application Serial Number 60/423559, entitled "Nucleotide Polymorphisms Associated with Osteoporosis" filed November 4, 2002, which is incorporated by reference in its entirety.
After template isolation, the sequencing oligonucleotide was annealed to the template by denaturing at 80°C for 2 min and then cooling to room temperature for 10 min. Each marker/sample combination was then sequenced/genotyped by Pyrosequencing™ on a PSQ96™ (Pyrosequencing AB). Genotype results were stored in the PSQ Oracle® database ready for statistical analysis.
EXAMPLE 9
Genes Analyzed for Polymorphism Association with Osteoporosis
The following genes were found to have polymorphism-associated effects on the susceptibility to low bone mineral density and/or bone damage, and hence osteoporosis:
1) Aortic carboxypeptidase-like protein (ACLP) mRNA: NM_001129 Protein: NP_001120
ACLP, also known as the adipocyte enhancer (AE)-binding protein 1 (AEBP1), is a transcriptional repressor with carboxypeptidase activity and may play a role in adipogenesis.
2) A kinase anchor protein 9 (AKAP9) mRNA: NM_005751 Protein: NP_005742 mRNA: NM_147166 Protein: NP_671695 mRNA: NM_147171 Protein: NP_671700 mRNA: NM_147185 Protein: NP_671714
AKAP9, also known as YOTIAO, is a scaffold protein that binds type I protein phosphatase (PP1) and cAMP-dependent protein kinase (PKA) to NMDA receptors. AKAP9 also anchors protein kinases and phosphatases to the centrosome and the Golgi apparatus.
3) Bone morphogenetic protein receptor, type II (BMPR2)
Variant 1: mRNA: NM_001204 Protein: NP_001195 Variant 2: mRNA: NM_033346 Protein: NP_203132
BMPR2, also known as the serine/threonine kinase type II activin receptor-like kinase, is a transforming growth factor beta (TGF-beta) receptor that can also bind type I receptors and is involved in bone and other morphogenesis. Mutations in the gene are associated with familial primary pulmonary hypertension.
4) Fibroblast growth factor receptor 2 (FGFR2) mRNA: NM_000141 Protein: NP_000132 mRNA: NM_022969 Protein: NP_075258 mRNA: NM_022970 Protein: NP_075259 mRNA: NM_022971 Protein: NP_075260 mRNA: NM_022972 Protein: NP_075261 mRNA: NM_022973 Protein: NP_075262 mRNA: NM_022974 Protein: NP_075263 mRNA: NM_022975 Protein: NP_075264 mRNA: NM_022976 Protein: NP_075265 mRNA: NM_023028 Protein: NP_075417 mRNA: NM_023029 Protein: NP_075418 mRNA: NM_023030 Protein: NP_075419 mRNA: NM_023031 Protein: NP_075420
FGFR2 is a high-affinity receptor, depending on the isoform, for acidic, basic and/or keratinocyte growth factor. This receptor is a member of the fibroblast growth factor receptor family, where amino acid sequence is highly conserved between members and throughout evolution. FGFR family members differ from one another in their ligand affinities and tissue distribution. A full-length representative protein consists of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation. Mutations in this gene are associated with many craniosynostotic syndromes and bone malformations. The genomic organization of this gene encompasses 20 exons. Alternative splicing in multiple exons, including those encoding the Ig-like domains, the transmembrane region and the carboxyl terminus, results in varied isoforms which differ in structure and specificity.
5) FBJ murine osteosarcoma viral oncogene homolog B (FOSB) mRNA: NM_006732 Protein: NP_006723
FOSB is a DNA-binding member of the Fos family that forms the AP-1 transcription factor complex with Jun proteins. FOSB may be involved in the pathogenesis of breast tumors. An alternative form, deltaFosB, plays a role in persistent neuroplasticity associated with cocaine addiction.
6) Follistatin-like 1 (FSTL1) mRNA: NM_007085 Protein: NP_009016
FSTL1, also known as follistatin-related protein, is a nuclear activin-binding protein that is induced by TGF beta 1 (TGFB1) and inhibits cell proliferation. FSTL1 is also an autoantigen in systemic rheumatic diseases. FSTL1 is abundantly expressed (0.33%) in trabecular bone libraries.
7) Insulin-like growth factor binding protein 5 (IGFBP5) mRNA: NM_000599 Protein: NP_000590
IGFBP5 is a member of the insulin-like growth factor binding family of proteins that bind to and modulate insulin-like growth factor activity. It regulates bone formation and may serve in muscle and cartilage development. IGFBP5 shows tissue-specific expression in osteosarcoma, and at lower levels in liver, kidney, and brain. IGFBP5 can also alter the interaction of insulin-like growth factors with their cell surface receptors.
8) Insulin receptor substrate 1 (IRS1) mRNA: NM_005544 Protein: NP_005535
IRS1, also known as HIRS-1, is a cytoplasmic docking protein that mediates IGF1 signaling to SH2-containing effector molecules such as Grb2 and PI3-kinase and inhibits apoptosis. IRS1 also plays a role in cell proliferation and glucose transport.
9) Alpha V subunit integrin (ITGAV) mRNA: NM_002210 Protein: NP_002201
ITGAV is a subunit of the vitronectin receptor that is involved in cell-cell and cell-matrix interactions, plays a role in tumor angiogenesis and may contribute to tumorigenicity of cutaneous malignant melanoma. Integrins serve as major receptors for extracellular matrix-mediated cell adhesion and migration, cytoskeletal organization, cell proliferation, survival, and differentiation. Alpha-V integrins comprise a subset sharing a common alpha-V subunit combined with 1 of 5 beta subunits (beta-1, -3, -5, -6, or -8). All or most alpha-V integrins recognize the sequence RGD in a variety of ligands (vitronectin, fibronectin, osteopontin, bone sialoprotein, thrombospondin, fibrinogen, von Willebrand factor, tenascin, and agrin) and, in the case of alpha-V-8, laminin and type IV collagen.
Vitronectin is a multifunctional glycoprotein present in blood and in the extracellular matrix. It binds glycosaminoglycans, collagen, plasminogen and the urokinase-receptor, and also stabilizes the inhibitory conformation of plasminogen activation inhibitor-1. By its localization in the extracellular matrix and its binding to plasminogen activation inhibitor-1, vitronectin can potentially regulate the proteolytic degradation of this matrix. In addition, vitronectin binds to complement, to heparin and to thrombin-antithrombin III complexes, implicating its participation in the immune response and in the regulation of clot formation. The biological functions of vitronectin can be modulated by proteolytic enzymes, and by exo- and ecto-protein kinases present in blood.
Vitronectin contains an Arg-Gly-Asp (RGD) sequence, through which it binds to the integrin receptor alpha v beta 3, and is involved in the cell attachment, spreading and migration.
Bone resorption requires the tight attachment of the bone-resorbing cells, the osteoclasts, to the bone mineralized matrix. Integrins, a class of cell surface adhesion glycoproteins, play a key role in the attachment process. Most integrins bind to their ligands via the RGD tripeptide present within the ligand sequence. The interaction between integrins and ligands results in bidirectional transfer of signals across the plasma membrane. Tyrosine phosphorylation occurs within cells as a result of integrin binding to ligands and probably plays a role in the formation of the osteoclast clear zone, a specialized region of the osteoclast membrane maintained by cytoskeletal structure and involved in bone resorption.
Human osteoclasts express alpha 2 beta 1 and alpha v beta 3 integrins on their surface. The alpha v beta 3 integrin, a vitronectin receptor, plays an essential role in bone resorption. For example, echistatin, an RGD-containing protein from a snake venom, binds to the alpha v beta 3 integrin and blocks bone resorption both in vitro and in vivo. (Dresner-Pollak R, Rosenblatt M. J Cell Biochem 1994 Nov;56(3):323-30).
The crystal structure of the extracellular portion of integrin alpha-V-beta-3 has been determined at 3.1-angstrom resolution. Its 12 domains assemble into an ovoid head and 2 tails. In the crystal, alpha-V-beta-3 is severely bent at a defined region in its tails, reflecting an unusual flexibility that may be linked to integrin regulation.
Alpha-V integrins have been implicated in many developmental processes and are therapeutic targets for inhibition of angiogenesis and osteoporosis. Surprisingly, the ablation of the gene for the alpha-V integrin subunit, eliminating all 5 alpha-V integrins, although causing lethality, allows considerable development and organogenesis including, most notably, extensive vasculogenesis and angiogenesis. Eighty percent of embryos die in midgestation, probably because of placental defects, but all embryos develop normally to E9.5, and 20% are born alive. These liveborn alpha-V-null mice consistently exhibit intracerebral and intestinal hemorrhages and cleft palates. These results necessitate reevaluation of the primacy of alpha-V integrins in many functions including vascular development, despite reports that blockade of these integrins with antibodies or peptides prevents angiogenesis.
10) KJ_bonlib4/Eukaryotic translation initiation factor 4 gamma 2 (EIF4G2) mRNA: NM_001418 Protein: NP_001409
KJ_bonlib4 is also known as p97, DAP5, NAT1 and Eukaryotic translation initiation factor 4G-like 1. KJ_bonlib4 is a translational repressor that binds EIF3 and EIF4A, but not EIF4E, promotes IFNG-induced programmed cell death and is cleaved by caspase-3 (CASP3) during apoptosis.
11) KJ_bonlib7 mRNA: NM_018067
KJ_bonlib7 has an unknown function.
12) KJ_opgba1 mRNA: XM_053496
KJ_opgba1 is a member of the sulfatase family, whose members hydrolyze sulfate esters. It has a region of moderate similarity to a region of N-acetylglucosamine-6-sulfate sulfatase (human GNS), deficiency of which is associated with Sanfilippo disease IIID.
13) KJ_opgba13 mRNA: NM_007021 Protein: NP_008952
KJ_opgba13 is also known as DEPP. Pfam model results indicate that KJ_opgba13 is a thermophilic metalloprotease.
14) KJ_opgba14 mRNA: NM_015429 Protein: NP_056244
KJ_opgba14 is also known as NESHBP and TARSH. KJ_opgba14 contains a fibronectin type III domain, which is involved in cell surface binding.
15) KJ_opgba47 mRNA: NM_024843 Protein: NP_079119
KJ_opgba47 is a member of the cytochrome b561 family and has moderate similarity to uncharacterized cytochrome b561 (human CYB561), which is an integral membrane protein found in neuroendocrine secretory vesicles.
16) KJ_opgba115 mRNA: NM_152309 Protein: NP_689522
KJ_opgba115 has a strong similarity to B cell phosphoinositide 3-kinase (PI3K) adaptor (mouse Bcap), which binds to SH2 domains of PI3K and may recruit PI3K to glycolipid-enriched microdomains leading to BCR-mediated PI3K activation.
17) KJ_opgba136 mRNA: NM_015493 Protein: NP_056308
KJ_opgba136 has an unknown function.
18) Lumican (LUM) mRNA: NM_002345 Protein: NP_002336
LUM is an extracellular matrix keratan sulfate proteoglycan that may be involved in the development and maintenance of corneal transparency.
19) Matrix metalloproteinase 1 (MMP1) mRNA: NM_002421 Protein: NP_002412
MMP1, also known as interstitial collagenase, is a matrix metalloprotease that cleaves fibrillar collagen type I to gelatin and functions in collagen turnover in most tissues and may play a role in cartilage destruction in rheumatoid arthritis.
20) Mitogen-activated protein kinase 8 (MAPK8)
MAPK8 isoform 1: mRNA: NM_139049 Protein: NP_620637
MAPK8 isoform 2: mRNA: NM_002750 Protein: NP_002741
MAPK8 isoform 3: mRNA: NM_139046 Protein: NP_620634
MAPK8 isoform 4: mRNA: NM_139047 Protein: NP_620635
MAPK8 is also known as JNK, JNK1, PRKM8, SAPK1, JNK1A2 and JNK21B1/2. MAPK8 is a serine-threonine kinase that regulates c-Jun (JUN) and plays a role in the induction of apoptosis and other cellular responses to stressors such as ultraviolet light, reactive oxygen and hypoxia.
21) Nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (NFKB2) mRNA: NM_002502 Protein: NP_002493
NFKB2 is a transcription factor involved in the immune response. It may coordinate pre-mRNA splicing and transcription and may play a role in HIV infection, leukemia, breast cancer and lymphoid neoplasia.
22) Notch (Drosophila) homolog 3 (NOTCH3) mRNA: NM_008716 Protein: NP_032742
NOTCH3 encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch. In Drosophila, notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development. Homologues of the notch-ligands in human function in CNS development and are upregulated in renal cell carcinoma. Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL).
Alignment of available genomic sequence to the CDS contig identified at least 29 exons.
Screening of 14 of these exons by SSCP revealed 11 conformational variants within 7 exons, 10 of which were observed in 14 unrelated patients and in none of 200 control chromosomes. Each was shown by nucleotide sequencing to be due to nucleotide substitutions resulting in an amino acid change. Cosegregation of the abnormal conformer with the affected phenotype was established in 6 pedigrees available in the set of 14 patients. The eleventh variant was seen in patients and in controls, and sequencing showed that it was due to a silent nucleotide change. Notch is known for its role in specifying cell fate during Drosophila development. The only human disorder previously implicating a Notch gene was an adult T-cell leukemia, which is associated with truncation of the NOTCH1 transcript. No developmental abnormality or neoplasia is associated with CADASIL. On the basis of an analysis of Drosophila mutants, it had been proposed that Notch may be a receptor with different functional domains, the intracellular domain having the signal-transducing activity of the intact protein and the extracellular domain possessing a ligand-binding and regulatory activity.
23) Osteoblast specific factor 2 (OSF2) mRNA: NM_006475 Protein: NP_006466
OSF2 is also known as periostin.
24) Osteoglycin (OGN) mRNA: NM_014057 Protein: NP_054776 mRNA: NM_024416 Protein: NP_077727 mRNA: NM_033014 Protein: NP_148935
OGN is a member of the keratan sulfate proteoglycan group of the small leucine-rich proteoglycan family and may play a role in regulating corneal transparency.
25) Osteomodulin (OMD) mRNA: NM_005014 Protein: NP_005005
OMD, also known as osteoadherin, is a leucine-rich repeat containing proteoglycan that may play a role in bone mineralization and has moderate similarity to proline arginine-rich and leucine-rich repeat protein (human PRELP).
26) Plasminogen activator inhibitor 1 (PAI1) mRNA: NM_000602 Protein: NP_000593
PAI1 is a member of the serpin family of serine protease inhibitors and plays a role in regulating blood coagulation by inhibiting fibrinolysis, contributes to tumor progression and is a risk factor for cardiovascular diseases.
27) Prostaglandin endoperoxide synthase 1 (PTGS1) mRNA: NM_000962 Protein: NP_000953 mRNA: NM_080591 Protein: NP_542158
PTGS1, also known as COX1, catalyzes the conversion of arachidonic acid to prostaglandin H2 and may be involved in inflammation and blood coagulation. The activity of PTGS1 is irreversibly inhibited by aspirin.
28) CCL2 chemokine (C-C motif) ligand 2 (SCYA2) mRNA: NM_002982 Protein: NP_002973
SCYA2 is also known as monocyte secretory protein JE; monocyte chemoattractant protein-1; monocyte chemotactic and activating factor; small inducible cytokine subfamily A (Cys-Cys), member 2; and monocyte chemotactic protein 1, and is homologous to mouse Sig-je. It is a CC chemokine (cytokine A2) that attracts monocytes, memory T-cells, natural killer cells and endothelial cells. SCYA2 plays a role in the inflammatory response to infection and in inflammatory diseases including arthritis, multiple sclerosis and atherosclerosis.
29) Tissue inhibitor of metalloproteinase 1 (TIMP1) mRNA: NM_003254 Protein: NP_003245
TIMP is also known as erythroid potentiating activity, EPA, EPO, HCI and CLGI. TIMP inhibits matrix metalloproteases including MMP2, stimulates growth of erythroid cells and attenuates metastasis of tumorigenic cells when overexpressed.
30) Transglutaminase 2 (TGM1) mRNA: NM_000359 Protein: NP_000350
TGM1 is membrane bound and catalyzes the crosslinking of extracellular matrix (ECM) proteins and other cellular proteins, modulates the ECM, cell growth, adhesion, signaling, and apoptosis, and has been associated with Alzheimer's, Huntington, and celiac disease.
31) Tumor Necrosis Factor-Alpha-Induced Protein 6 (TNFAIP6) mRNA: NM_007115 Protein: NP_009046
TNFAIP6 is a metalloprotease. TNFAIP6 is transcribed in normal fibroblasts and activated by binding of the TNFa. Similar to CD44, TNFAIP6 binds hyaluronate and is involved in plasmin inhibition and the inhibition of inflammation.
32) Vascular endothelial growth factor (VEGF) mRNA: NM_003376 Protein: NP_003367
VEGF, which is structurally related to platelet-derived growth factor, induces endothelial cell proliferation and migration, vascular permeability, angiogenesis and NO-mediated signal transduction. Many polypeptide mitogens, such as basic fibroblast growth factor and platelet-derived growth factors, are active on a wide range of different cell types. In contrast, vascular endothelial growth factor is a mitogen primarily for vascular endothelial cells. Data suggest that mutations of p53 and activation of the Ras/MAPK pathway may play a role in the induction of VEGF expression in human colorectal cancer. Up-regulation of vascular endothelial growth factor by membrane-type 1 matrix metalloproteinase stimulates human glioma xenograft growth and angiogenesis. Both VEGF-induced PI 3-kinase activation and beta(1) integrin-mediated binding to fibronectin are required for the recruitment and activation of PKC alpha.
EXAMPLE 10
QTDT analysis of results
Quantitative transmission disequilibrium tests (QTDT) for association between 100 SNP loci and 13 phenotypic traits are reported. The traits comprise calibrated bone mineral density (BMD) values for four skeletal sites, the corresponding Z scores (mean = 0, variance = 1), the occurrence of fractures, and four other traits not directly related to osteoporosis. For each marker-trait combination the significance of stratification is tested. The significance of association between marker and trait is tested both unpartitioned, and partitioned into between-family and within-family components. These analyses are performed for the sexes pooled, and using phenotypic data from males only and females only. There is little evidence of stratification, and interpretation is therefore focused on the unpartitioned association. Of the many significant associations, the most consistent are those with SNP loci OGN_02, OMD_03 and OMD_01. The first two loci show significant associations with phenotypic traits in all three sub-sets of the data (sexes pooled, males only and females only), while OMD_01 shows six associations that are significant at the 1 % level. The effect of an individual significant marker-trait association (measured as the difference between either homozygote and the heterozygote) ranges from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores. In those cases where stratification is significant, the within-family association effect is consistently much smaller than the unpartitioned effect. Six additional individual SNP-trait associations are significant at the 1 % level. The most notable of these is between SNP locus ITGA08 and BMD in lumbar vertebrae 2 to 4 in males. The effect of this association is 4.1 % of the mean value for calibrated BMD, and 0.237 units for the Z score.
The traits comprise calibrated bone mineral density (BMD) values for four skeletal sites, the corresponding Z scores, the occurrence of fractures, and four other traits. The skeletal sites studied are lumbar vertebrae 2 to 4 (mean value), the neck of the femur, the trochanter, and the total of BMD values over three sites in the hip (neck of femur, trochanter and 'inter'). Calibrated BMD values are given in units of g/cm2. The corresponding Z scores (calculated within Oxagen Limited) are obtained by adjusting these values for the age and sex of the individual, and scaling them so that mean = 0 and variance = 1. The occurrence of fractures is scored as 0 = no fractures, 1 = fractures. The four other traits, which are not directly related to osteoporosis (though they are associated with it) and which are included for purposes of comparison, are the ages of onset and cessation of periods in females, and height and weight in both sexes.
Statistical analysis is performed using the software QTDT. For each marker-trait combination, the significance of stratification is tested. If stratification is present, the between-pedigrees component of the marker-trait association is not entirely due to linkage and only the within-pedigree component can legitimately be used to measure the effect of the locus. However, if stratification is absent the unpartitioned association provides a stronger test of significance and a more precise measure of the effect of the locus. Therefore each marker-trait combination is tested for association both without partitioning of the association, and with partitioning into between- and within-pedigree components. The interpretation of the results then depends on the outcome of the test for stratification. These analyses are performed both for the sexes pooled, and also using phenotypic data from males only and females only.
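The interpretation rule described above, in which the result of the stratification test decides whether the unpartitioned or the within-pedigree association is used, can be sketched as follows. The function name and the example values are hypothetical, and the use of a 5 % critical value with one degree of freedom for the stratification test is an assumption of this illustration.

```python
# 5 % critical value of chi-square with 1 degree of freedom (assumed test level).
CHI2_CRITICAL_5PCT_1DF = 3.841


def select_effect_estimate(chi2_stratification, unpartitioned_effect,
                           within_pedigree_effect,
                           critical_value=CHI2_CRITICAL_5PCT_1DF):
    """Choose which association estimate to interpret for one marker-trait pair.

    If stratification is significant, only the within-pedigree component is
    used; otherwise the unpartitioned association is preferred because it is
    the stronger test and the more precise estimate.
    """
    if chi2_stratification > critical_value:
        return "within-pedigree", within_pedigree_effect
    return "unpartitioned", unpartitioned_effect


# Hypothetical marker-trait combination without significant stratification.
label, effect = select_effect_estimate(chi2_stratification=1.2,
                                       unpartitioned_effect=0.030,
                                       within_pedigree_effect=0.012)
print(label, effect)
```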
Table A. SNP loci subjected to QTDT analysis
ACLP02  AKA906  FGF201  IRS102  K11503  KJ1403  MAP803  NOT302  PAI108  SDF106
ACLP04  AKA908  FGF202  IRS104  K11504  KJ1405  MMP101  NOT303  PMX101  SOD201
ACLP05  BMPA01  FOSB01  IRS105  K13601  KJ1_01  MMP103  NOT304  PTG101  TGM102
ACLP06  BMPA03  FOSB04  IRS107  KJ1303  KJ1_02  MMP104  OGN_02  SC2001  TGM103
ACLP07  BMPA04  FST101  IRS108  KJ1304  KJ4701  MMP105  OGN_03  SC2002  TGM106
ACLP08  CHUK01  FST102  ITGA02  KJ1306  KJ4702  MMP107  OMD_01  SC2003  TGM111
ACLP09  CHUK02  FST103  ITGA08  KJ1307  KJ4703  NFK201  OMD_03  SCY201  TJJF102
ACLP10  CHUK03  FST104  ITGA11  KJ1308  KJ4704  NFK202  PAI102  SCY202  TNF601
AD1204  CY1701  IGF401  ITGA12  KJ1311  LIF_02  NFK203  PAI105  SDF101  TNF602
AKA901  CY1705  IGF503  K11501  KJ1401  LUM_01  NOT301  PAI107  SDF104  VEGF01
Table B. Phenotypic traits subjected to QTDT analysis
Age_Periods_Started  CalL2_L4BMD    Zox_L24
AgePeriodsStopped    CalNeckBMD     Zox_neck
ht_filled_in         CalTrochBMD    Zox_troch
wt_filled_in         CalHTotalBMD   Zox_ht
fracture_numeric
The script contained in the file heritability/run_QTDT_heritability is then run. This script fits a QTDT model with options -a- -We -Veg to each phenotypic trait for the sexes pooled, both for the complete set of phenotypic values and with the exclusion of outliers for BMD. These options indicate that no model of association is to be fitted, and that the variance components Ve (null model) and Ve+ Vg (full model) are to be estimated. Heritability is then estimated as
$$h^2 = \frac{V_g}{V_e + V_g}$$
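Heritability as the share of the genetic variance component Vg in the total variance Ve + Vg can be computed as in the following sketch. The function name and the variance values are hypothetical; they are not estimates from the data.

```python
def heritability(v_g, v_e):
    """Heritability h2 = Vg / (Ve + Vg) from estimated variance components."""
    if v_g < 0 or v_e < 0:
        raise ValueError("variance components must be non-negative")
    total = v_e + v_g
    if total == 0:
        raise ValueError("total variance must be positive")
    return v_g / total


# Hypothetical variance components from a fitted full model (Ve + Vg).
h2 = heritability(v_g=1.0, v_e=3.0)
print(f"h2 = {h2:.2f}")
```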
The corresponding analysis was performed using the phenotypic values of males only and using those of females only. This script fits a QTDT model with options -at -Weg for each phenotypic trait, for the sexes pooled, in combination with each SNP locus. These options indicate that the association between the trait and the SNP locus is to be estimated, and that the model is also to include the variance components Ve + Vg. However, the association is not to be partitioned into between-pedigree and within-pedigree components. The same model is fitted for the phenotypes of males only and for females only. The chi-square value for association of each SNP with each trait is extracted from these files and these values are stored. They are transferred to an Excel workbook. The unpartitioned effect of each SNP on each trait is extracted from the output files and is stored. These values are transferred to 3rd_export_QTDT.xls, worksheets. The effect is measured as the difference between the value of the trait for an individual homozygous for allele 1 and for a heterozygous individual. It is expressed both in the units of the trait (cm for height, kg for weight etc.) and as a percentage of the mean value for the trait. The script is then run. This script fits a QTDT model with options -ao -Weg. These options specify the same model as the previous set, except that the association between the phenotypic trait and the SNP locus is partitioned into between-pedigree and within-pedigree components. The same models are fitted for each sex separately. The chi-square value for within-family association of each SNP with each trait is extracted from these files and these values are stored. They are transferred to 3rd_export_QTDT.xls, worksheets together with the frequency of the rare allele at each SNP locus.
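The effect measure described above (the trait value for a homozygote for allele 1 minus that for a heterozygote, reported in trait units and as a percentage of the trait mean) can be illustrated with a short sketch. The function name and the numbers are hypothetical and are not results from the analysis.

```python
def snp_effect(homozygote_allele1_value, heterozygote_value, trait_mean):
    """Return the effect in trait units and as a percentage of the trait mean."""
    if trait_mean == 0:
        raise ValueError("trait_mean must be non-zero")
    difference = homozygote_allele1_value - heterozygote_value
    return difference, 100.0 * difference / trait_mean


# Hypothetical calibrated BMD values in g/cm2.
effect_units, effect_percent = snp_effect(homozygote_allele1_value=1.05,
                                          heterozygote_value=1.00,
                                          trait_mean=1.00)
print(f"{effect_units:.3f} g/cm2, {effect_percent:.1f} % of the mean")
```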
The within-family effect of each SNP on each trait (expressed both in the units of the trait and as a percentage of the mean value for the trait) is extracted from the output files and is stored. These values are transferred to 3rd_export_QTDT.xls, worksheets. The script is then run. This script fits a QTDT model with options -ap -Weg. These options specify that the models with and without partitioning of the association into between-pedigree and within-pedigree components are to be compared. The same models are fitted for each sex separately. The chi-square value for the improvement of fit due to partitioning of the association of each SNP with each trait is extracted from these files and these values are stored. They are transferred to 3rd_export_QTDT.xls, worksheets. In each worksheet that contains chi-square values, the background of each cell holding a chi-square value above the 5 % critical value (χ2 = 3.841, DF = 1) is highlighted, and the proportion of chi-square values that exceeds this value is presented for each SNP over the eight BMD traits (i.e. the four calibrated BMD values and the corresponding Z scores), for each trait over the 100 SNPs, and for the 800 (= 100 × 8) SNP-BMD trait combinations. The chi-square value needed to achieve significance following the Bonferroni correction is presented for all BMD traits at a single SNP locus (8 tests), for all SNP loci for a single trait (100 tests) and for all SNP-trait combinations (800 tests). The frequency of the rare allele at each SNP locus is extracted from the file and is also presented in each of these worksheets. The mean and heritability of each phenotypic trait, for the sexes pooled (with and without the inclusion of outliers) and for each sex individually, are presented in Table 1. As expected, for the non-BMD traits the exclusion of outliers makes little difference to either the mean or the heritability.
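The chi-square thresholds referred to above can be reproduced with a short sketch. It assumes one degree of freedom, an overall significance level of 5 %, and the simple Bonferroni rule of dividing that level by the number of tests; the function names are hypothetical. With one degree of freedom the critical value is the square of the two-sided standard normal quantile, so only the Python standard library is needed.

```python
from statistics import NormalDist


def chi_square_critical_1df(alpha):
    """Critical value of chi-square (1 DF) for upper-tail probability alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def bonferroni_critical_1df(alpha, n_tests):
    """Critical value after dividing alpha by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return chi_square_critical_1df(alpha / n_tests)


# 1 test, 8 BMD traits per SNP, 100 SNPs per trait, 800 SNP-trait combinations.
thresholds = {n: bonferroni_critical_1df(0.05, n) for n in (1, 8, 100, 800)}
for n_tests, value in thresholds.items():
    print(f"{n_tests:4d} tests: chi-square > {value:.3f}")
```

The single-test threshold printed by this sketch agrees with the 3.841 critical value quoted in the text, and the thresholds rise as the number of tests increases.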
For the BMD traits (calibrated BMD values and Z scores) the exclusion of outliers consistently raises the mean value and lowers the heritability. This is to be expected, as the outliers were identified on the basis of high BMD. Heritability of the BMD traits is strikingly lower in females than in males. In particular, that of CalL2_L4BMD in females is zero. Conversion of the BMD values to Z scores causes a small but consistent increase in heritability in females, but has no consistent effect in males.
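As a rough illustration of how a heritability figure relates to the variance components Ve and Vg fitted by the -Weg option, the sketch below uses the textbook ratio Vg / (Vg + Ve). This is an assumption: the text does not state how the heritabilities in Table 1 were computed, and the function name and the input values are hypothetical.

```python
def heritability(v_g: float, v_e: float) -> float:
    """Share of the total variance attributed to the genetic component.

    v_g and v_e stand for the variance components Vg and Ve; the ratio
    Vg / (Vg + Ve) is the standard textbook definition, used here as an
    assumption rather than as the method behind Table 1.
    """
    total = v_g + v_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return v_g / total


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    print(heritability(3.0, 1.0))
    # A genetic component of zero gives a heritability of zero.
    print(heritability(0.0, 2.0))
```

Under this definition, a heritability of zero (as reported for CalL2_L4BMD in females) would correspond to a fitted genetic variance component of zero.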
The test for association between a phenotypic trait and a SNP within pedigrees is less powerful than the non-partitioned test, provided that stratification is absent. It is therefore to be expected that the chi-square value for non-partitioned association (-at model in QTDT) will be larger than that for association within pedigrees (-ao model), unless there is strong association due to stratification in the opposite direction to that caused by linkage. In the present case there were no exceptions to this expectation. The significance tests for stratification (given by the -ap model) are summarized in Tables 2 and 3. There are a few significant values (P < 0.05), but not many more than the 5 % expected by chance. It is therefore concluded that stratification is not strong or widespread in these data, and attention is therefore focused on the non-partitioned model of association.
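The critical values used for these significance tests (3.841 at the 5 % level and 6.635 at the 1 % level, DF = 1), and the stricter thresholds implied by a Bonferroni correction over 8, 100 or 800 tests, can be reproduced with a short calculation. The sketch below is a minimal illustration that relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal deviate; the function name is hypothetical.

```python
from statistics import NormalDist


def chi2_critical_df1(alpha: float, n_tests: int = 1) -> float:
    """Critical chi-square value (DF = 1) at level alpha.

    When n_tests > 1 the level is divided by the number of tests
    (Bonferroni correction) before the critical value is computed.
    """
    if not 0.0 < alpha < 1.0 or n_tests < 1:
        raise ValueError("alpha must be in (0, 1) and n_tests >= 1")
    per_test_alpha = alpha / n_tests
    # Two-sided normal quantile, squared, gives the DF = 1 chi-square quantile.
    z = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)
    return z * z


if __name__ == "__main__":
    # 1 test, then the three Bonferroni settings described in the text.
    for n in (1, 8, 100, 800):
        print(n, round(chi2_critical_df1(0.05, n), 3))
```

The Bonferroni correction is conservative here, because the eight BMD traits are correlated with one another and the tests are therefore not independent.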
The significance tests for non-partitioned association are summarized in Tables 4 and 5. Summarizing over all BMD traits and SNPs, the proportion of significant chi-square values (P < 0.05) is substantially higher than the 5 % expected by chance in the sexes pooled and in males, but fairly close to 5 % in females. This suggests that less emphasis should be placed on the search for association between SNPs and traits in females, though individual SNP-trait combinations with highly significant chi-square values, and SNPs associated with several traits, are still worth pursuing. In the sexes pooled there are nine SNPs each of which is associated with four or more of the BMD traits, and in the males only there are eight SNPs that meet this criterion, whereas in the females only there are only five such SNPs. The identity of these SNPs, and the chi-square values for their association with each BMD trait, are presented in Table 6. The chi-square values for stratification are presented in Table 7 for the same SNP-trait combinations, and the magnitude of the effect of each of these SNPs on each of the BMD phenotypes is presented in Table 8. The SNP loci OGN_02 and OMD_03 show significant associations with phenotypic traits in all three sub-sets of the data (sexes pooled, males only and females only), and OMD_01 shows six associations that are significant at the 1 % level. The difference between either homozygote and the heterozygote in these marker-trait associations ranges from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores. In the cases where stratification is significant, the within-family association effect is consistently much smaller than the unpartitioned effect.
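The comparison between the observed proportion of significant chi-square values and the 5 % expected by chance can be sketched as follows. The first function computes the proportion of values above the critical value; the second gives the binomial probability of seeing at least k significant results out of n under the null hypothesis. The binomial step assumes independent tests, which is an assumption made here for illustration only (the BMD traits are correlated), and both function names are hypothetical.

```python
from math import comb

CRITICAL_5PCT_DF1 = 3.841  # 5 % critical value for chi-square, DF = 1


def proportion_significant(chi2_values, critical=CRITICAL_5PCT_DF1):
    """Fraction of chi-square values that lie above the critical value."""
    values = list(chi2_values)
    if not values:
        raise ValueError("at least one chi-square value is required")
    return sum(v > critical for v in values) / len(values)


def binomial_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p), assuming independent tests."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical chi-square values for one SNP over eight traits.
    example = [5.2, 4.1, 0.9, 6.8, 4.4, 3.9, 1.2, 7.0]
    print(proportion_significant(example))
    # Chance probability of six or more significant results out of eight.
    print(binomial_tail(6, 8))
```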
The six additional individual SNP-trait associations that are significant at the 1 % level are presented in Table 9. In none of these is there significant evidence of stratification. The most notable of these associations is that between SNP locus ITGA08 and BMD in lumbar vertebrae 2 to 4 in males. The effect of this association is 4.1 % of the mean value for calibrated BMD, and 0.237 units for the Z score.
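The conversion of an effect from trait units to a percentage of the trait mean is simple arithmetic, sketched below. The function name is hypothetical, and which mean the tables use (sex-specific, pooled, or with outliers excluded) is not stated in this passage, so the choice of the male mean from Table 1 in the example is an assumption and the result only approximates the tabulated figure.

```python
def effect_as_percent_of_mean(effect: float, trait_mean: float) -> float:
    """Express an effect given in trait units as a percentage of the trait mean."""
    if trait_mean == 0:
        raise ValueError("trait mean must be non-zero")
    return 100.0 * effect / trait_mean


if __name__ == "__main__":
    # ITGA08 in males: effect of -0.045 g/cm2 on CalL2_L4BMD (Table 9),
    # compared with the male mean of 1.080 g/cm2 from Table 1 (assumed reference).
    print(round(effect_as_percent_of_mean(-0.045, 1.080), 2))
```

With these inputs the result is roughly -4.2 %, close to the -4.1 % reported in Table 9; the small difference presumably reflects rounding of the published inputs or a different reference mean.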
The strongest and most consistent associations between SNP loci and phenotypic traits related to BMD are at loci OGN_02, OMD_03, OMD_01 and ITGA08. The first three of these each show several associations significant at the 5 % level, with effects ranging from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores. ITGA08 shows significant association only with BMD in lumbar vertebrae 2 to 4 in males, but this effect is significant at the 1 % level. Its magnitude lies in the same range as those at the other three loci.
Some of the polymorphism markers, and their association with various effects on susceptibility to low BMD and bone damage, and hence osteoporosis, are summarized in Table 10 of U.S. Provisional Patent Application Serial Number 60/423559, entitled "Nucleotide Polymorphisms Associated with Osteoporosis", filed November 4, 2002, which is incorporated by reference in its entirety.
In addition, Table 10 of this application provides a list of the polymorphism markers of the thirty-two (32) genes listed in Example 9 which have been found to have various effects on susceptibility to low BMD and bone damage. Tables 11 and 12 rank the polymorphic markers into groups by the relevance of their association with susceptibility to low BMD, by sex (Table 11, males; Table 12, females). Markers ranked in Group A show the strongest association with susceptibility to low BMD, those in Group B show less association, and those in Group C show the least association.
EXAMPLE 11
Gene-Gene Interaction between OMD and ITGAV
The gene-by-gene interactions were assessed by logistic regression. The interaction was assessed for every pair of OMD-ITGAV SNPs (OMD01 and OMD03 versus ITGAV02, 08, 11 and 12). The logistic regression models were:
MEN: OP or Frx (0 or 1) = OMD(snp) + ITGAV(snp) + OMD(snp) × ITGAV(snp) + age + weight
WOMEN: OP or Frx (0 or 1) = OMD(snp) + ITGAV(snp) + OMD(snp) × ITGAV(snp) + age + height
Weight was not included for the women, nor height for the men, since they did not show a significant effect on OP or Frx in the sample set used. The study group for osteoporosis (OP) was made up of unrelated individuals from the FAMOS cohort who (1) had been diagnosed with OP, (2) had fractures and a maximum Z (spine) of -1, or (3) were probands with a maximum Z (spine) of -0.5. For fractures (Frx), unrelated individuals who had self-reported fractures were used. Healthy individuals were unrelated subjects who were not probands, had not been diagnosed with OP, had no fractures and had a maximum Z (spine) > -1.
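The structure of such a model can be sketched as follows, assuming that it contains main effects for the OMD and ITGAV SNPs plus an OMD × ITGAV interaction term, with age and one body-size covariate (weight for men, height for women). The 0/1 genotype coding, the function names and the coefficient values are all hypothetical; the sketch shows only how the linear predictor is assembled and turned into a probability, not how the coefficients were fitted.

```python
import math


def design_row(omd, itgav, age, covariate):
    """Build one row: [intercept, OMD, ITGAV, OMD x ITGAV, age, covariate].

    omd and itgav are hypothetical 0/1 codes for carrying a given genotype;
    covariate is weight (men) or height (women).
    """
    return [1.0, float(omd), float(itgav), float(omd * itgav), float(age), float(covariate)]


def predicted_probability(row, coefficients):
    """Logistic transform of the linear predictor (numerically stable form)."""
    if len(row) != len(coefficients):
        raise ValueError("row and coefficients must have the same length")
    linear = sum(x * b for x, b in zip(row, coefficients))
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical coefficients: intercept, OMD, ITGAV, interaction, age, weight.
    coefs = [-3.0, 0.2, 0.1, 0.8, 0.03, -0.01]
    for omd, itgav in ((0, 0), (1, 0), (0, 1), (1, 1)):
        row = design_row(omd, itgav, age=60, covariate=80)
        print(omd, itgav, round(predicted_probability(row, coefs), 3))
```

The interaction column is non-zero only when both variants are present, so its coefficient captures the departure from the sum of the two main effects on the log-odds scale.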
Regardless of whether or not the individual SNPs had significant effect, the following interactions were found to be nominally or nearly-nominally significant:
Table C
ITGAV SNP OMD SNP TRAIT GENDER P-VALUE
11 1 OP Women 0.06
2 3 Frx Women 0.1
11 3 Frx Women 0.1
12 3 OP Women 0.05
2 3 OP Women 0.05
8 3 OP Men 0.02
11 3 OP Men 0.06
The odds ratio of the predisposing variant (AA for OMD.01 and A+ for OMD.03) was computed by logistic regression, analyzing separately the two or three ITGAV genotypes. The significance level of the association was determined by computing the Wald chi-square for the OMD SNP (model tested: OP = OMD snp + age + height (if women) + weight (if men)) independently for each ITGAV genotype. This is illustrated by the plots in Figure 1 A-E.
These results are the first evidence to demonstrate that OMD SNPs show a significant association with osteoporosis in both men and women. In addition, these results provide the first experimental evidence that ITGAV and OMD interact in bone metabolism, and confirm the hypothesis that OMD has a role in promoting integrin-mediated cell binding. These results further indicate pharmacogenetic implications, since various integrin-targeting compounds (e.g. from GSK and Pharmacia) are currently in development, and the OMD variants may affect the efficacy or ideal dosage of such compounds.
OTHER EMBODIMENTS
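For a single coefficient from a fitted logistic regression, the odds ratio and the Wald chi-square follow directly from the estimate and its standard error. The sketch below is a minimal illustration with hypothetical function names and input values; it is not taken from the analysis itself.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio corresponding to a logistic regression coefficient."""
    return math.exp(beta)


def wald_chi_square(beta: float, std_error: float) -> float:
    """Wald chi-square statistic (DF = 1) for a single coefficient."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (beta / std_error) ** 2


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with DF = 1."""
    if chi2 < 0:
        raise ValueError("chi-square must be non-negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical coefficient and standard error for an OMD SNP term.
    beta, se = 0.9, 0.4
    w = wald_chi_square(beta, se)
    print(round(odds_ratio(beta), 2), round(w, 2), round(p_value_df1(w), 3))
```

Computing these quantities separately within each ITGAV genotype group, as described above, shows whether the OMD effect differs between groups.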
Other embodiments will be evident to those of skill in the art. It should be understood that the foregoing detailed description is provided for clarity only and is merely exemplary. The spirit and scope of the present invention are not limited to the above examples, but are encompassed by the following claims.
The disclosures of all patents, applications and publications mentioned above are expressly incorporated by reference herein.
Table 1. Mean and heritability of each phenotypic trait in various sub-sets of the data
                            Sexes pooled              Sexes pooled, no outliers Males only                Females only
Trait                Units  mean      heritability    mean      heritability    mean      heritability    mean      heritability
Age_Periods_Started  years  13.27     0.61            13.26     0.61            -         -               13.27     0.61
AgePeriodsStopped    years  47.83     0.17            47.85     0.10            -         -               47.83     0.17
ht_filled_in         cm     167.09    0.64            167.06    0.64            176.15    0.77            161.68    0.81
wt_filled_in         kg     69.48     0.47            69.36     0.47            79.08     0.70            63.42     0.56
fracture_numeric     -      0.442     0.14            0.439     0.13            0.481     0.24            0.418     0.09
CalL2_L4BMD          g/cm²  1.005     0.40            0.996     0.27            1.080     0.62            0.961     0.00
CalNeckBMD           g/cm²  0.779     0.48            0.776     0.42            0.843     0.78            0.745     0.33
CalTrochBMD          g/cm²  0.697     0.58            0.692     0.52            0.779     0.61            0.644     0.25
CalHTotalBMD         g/cm²  0.922     0.51            0.918     0.47            1.014     0.89            0.862     0.21
Zox_L24              -      -0.237    0.44            -0.293    0.34            -0.047    0.61            -0.363    0.15
Zox_neck             -      -0.205    0.66            -0.239    0.62            -0.052    0.83            -0.304    0.57
Zox_troch            -      -0.227    0.51            -0.259    0.48            -0.079    0.60            -0.331    0.34
Zox_ht               -      -0.214    0.54            -0.244    0.51            -0.083    0.86            -0.312    0.36
Table 2. Proportion of chi-square values for each marker, over the eight BMD traits¹, from the test for stratification, that are above the 5 % critical value (χ² = 3.841, DF = 1)
        Frequency  Proportion of χ² values            Frequency  Proportion of χ² values
        of rare    above critical value               of rare    above critical value
Marker  allele     Pooled  Males   Females    Marker  allele     Pooled  Males   Females
ACLP02  0.258      0       0       0          KJ1403  0.407      0       0       0
ACLP04  0.000      -       -       -          KJ1405  0.003      -       -       -
ACLP05  0.051      0       0       0.750      KJ1_01  0.313      0       0       0
ACLP06  0.005      -       -       -          KJ1_02  0.183      0       0       0
ACLP07  0.273      0.375   0.250   0.250      KJ4701  0.145      0       0       0
ACLP08  0.005      -       -       -          KJ4702  0.298      0       0       0
ACLP09  0.298      0       0.250   0          KJ4703  0.318      0.375   0.500   0
ACLP10  0.370      0.250   0       0.125      KJ4704  0.462      0.250   0       0
AD1204  0.134      0       0       0          LIF_02  0.314      0       0       0
AKA901  0.375      0       0       0          LUM_01  0.093      0       0       0
AKA906  0.135      0       0       0          MAP803  0.060      0       0       0
AKA908  0.336      0       0       0          MMP101  0.261      0.750   0       0.250
BMPA01  0.129      0       0       0          MMP103  0.043      0       0       0
BMPA03  0.124      0       0       0          MMP104  0.062      0       0       0
BMPA04  0.112      0       0       0          MMP105  0.000      -       -       -
CHUK01  0.068      0       0       0          MMP107  0.039      0.125   -       0
CHUK02  0.000      -       -       -          NFK201  0.049      0       0       0
CHUK03  0.421      0       0.250   0          NFK202  0.044      0       0       0
CY1701  0.305      0       0       0          NFK203  0.337      0.125   0       0.125
CY1705  0.388      0.125   0       0.125      NOT301  0.018      0.250   -       0.625
FGF201  0.005      -       -       -          NOT302  0.288      0       0       0
FGF202  0.237      0       0       0          NOT303  0.133      0       0       0
FOSB01  0.475      0       0       0          NOT304  0.035      0       0       0
FOSB04  0.309      0.125   0       0          OGN_02  0.047      0       0       0
FST101  0.372      0       0       0          OGN_03  0.383      0       0       0
FST102  0.006      -       -       -          OMD_01  0.052      0.750   0       0.125
FST103  0.500      0       0       0          OMD_03  0.048      0       0       0
FST104  0.495      0       0       0          PAH 02  0.440      0.500   0       0.125
IGF401  0.301      0       0       0          PAH 05  0.014      0       -       -
IGF503  0.043      0       0       0          PAH 07  0.081      0       0       0
IRS102  0.016      0       -       -          PAH 08  0.114      0       0       0
IRS104  0.010      -       -       -          PMX101  0.159      0       0       0
IRS105  0.079      0       0       0          PTG101  0.117      0       0.125   0.250
IRS107  0.083      0       0       0          SC2001  0.000      -       -       -
IRS108  0.021      0       -       0.250      SC2002  0.015      0       -       0
ITGA02  0.246      0       0       0          SC2003  0.000      -       -       -
ITGA08  0.275      0.125   0       0.625      SCY201  0.350      0       0       0
ITGA11  0.227      0       0       0          SCY202  0.017      0       -       0
ITGA12  0.484      0       0       0          SDF101  0.344      0       0       0
K11501  0.199      0       0       0          SDF104  0.197      0       0       0
K11503  0.051      0       0       0          SDF106  0.282      0       0       0
K11504  0.274      0       0       0          SOD201  0.499      0       0       0
K13601  0.395      0       0       0          TGM102  0.139      0       0.125   0
KJ1303  0.316      0.250   0.375   0          TGM103  0.013      -       -       -
KJ1304  0.003      -       -       -          TGM106  0.133      0       0       0
KJ1306  0.094      0       0       0          TGM111  0.359      0       0       0
KJ1307  0.258      0       0       0          TIF102  0.000      -       -       -
KJ1308  0.134      0       0       0.250      TNF601  0.002      -       -       -
KJ1311  0.343      0       0       0          TNF602  0.144      0       0       0.125
KJ1401  0.144      0       0       0          VEGF01  0.317      0.250   0       0.250
¹ The four calibrated BMD values and the corresponding Z scores.
Table 3. Proportion of chi-square values for each trait, over the 100 SNPs, from the test for stratification, that are above the 5 % critical value (χ² = 3.841, DF = 1)
Trait                 Sexes pooled  Males only    Females only
Age_Periods_Started   0.060         -             0.060
AgePeriodsStopped     0.069         -             0.069
ht_filled_in          0.059         0.038         0.036
wt_filled_in          0.024         0.038         0.060
fracture_numeric      0.071         0.090         0.072
CalL2_L4BMD           0.024         0.000         0.048
CalNeckBMD            0.047         0.026         0.072
CalTrochBMD           0.024         0.038         0.024
CalHTotalBMD          0.047         0.026         0.060
Zox_L24               0.047         0.000         0.048
Zox_neck              0.094         0.026         0.084
Zox_troch             0.059         0.038         0.012
Zox_ht                0.094         0.039         0.060
Mean of BMD traits¹   0.054         0.024         0.051
¹ The four calibrated BMD values and the corresponding Z scores.
Table 4. Proportion of chi-square values for each marker, over the eight BMD traits¹, from the test for association between marker and trait without partitioning, that are above the 5 % critical value (χ² = 3.841, DF = 1)
Marker  Pooled  Males   Females    Marker  Pooled  Males   Females
ACLP02  0       0       0          KJ1403  0       0       0
ACLP04  0       0       0          KJ1405  0       0       0
ACLP05  0       0       0          KJ1_01  0       0.500   0
ACLP06  0       0       0          KJ1_02  0.750   0       0.125
ACLP07  0       0       0          KJ4701  0       0       0
ACLP08  0       0       0          KJ4702  0       0       0.250
ACLP09  0       0       0          KJ4703  0.875   0       0.375
ACLP10  0       0       0          KJ4704  0       0       0.250
AD1204  0       0       0          LIF_02  0       0       0
AKA901  0       0       0          LUM_01  0       0.250   0
AKA906  0.625   0.250   0          MAP803  0       0       0.125
AKA908  0.750   0       0.500      MMP101  0       0       0
BMPA01  0       0       0          MMP103  0
BMPA03  0       0       0          MMP104  0       0.125
BMPA04  0       0       0          MMP105  0       0       0
CHUK01  0       0       0          MMP107  0.125   0       0.125
CHUK02  0       0       0          NFK201  0       0       0
CHUK03  0.125   0       0          NFK202  0.750   0.500   0.125
CY1701  0       0       0          NFK203  0       0       0
CY1705  0       0       0          NOT301  0       0       0
FGF201  0       0       0          NOT302  0       0       0
FGF202  0.250   0.125   0.125      NOT303  0       0       0
FOSB01  0       0       0          NOT304  0       0       0.375
FOSB04  0.250   0       0.125      OGN_02  0.750   1.000   0.500
FST101  0       0       0          OGN_03  0       0       0
FST102  0       0       0          OMD_01  0.625   0.750   0.250
Figure imgf000190_0001
Table 5. Proportion of chi-square values for each trait, over the 100 SNPs, from the test for association between marker and trait without partitioning, that are above the 5 % critical value (χ² = 3.841, DF = 1)
Trait                 Sexes pooled  Males only    Females only
Age_Periods_Started   0.010         -             0.010
AgePeriodsStopped     0.000         -             0.000
ht_filled_in          0.061         0.111         0.091
wt_filled_in          0.051         0.101         0.030
fracture_numeric      0.020         0.030         0.020
CalL2_L4BMD           0.061         0.061         0.061
CalNeckBMD            0.101         0.111         0.051
CalTrochBMD           0.061         0.081         0.040
CalHTotalBMD          0.081         0.081         0.091
Zox_L24               0.071         0.051         0.051
Zox_neck              0.111         0.101         0.081
Zox_troch             0.061         0.071         0.020
Zox_ht                0.101         0.071         0.091
Mean of BMD traits¹   0.081         0.078         0.061
¹ The four calibrated BMD values and the corresponding Z scores.
Table 6. Chi-square values for association of each marker with each BMD trait, for those SNP loci at which four or more of these associations are significant¹
Marker  Frequency of rare allele  Sex  CalL2_L4BMD  CalNeckBMD  CalTrochBMD  CalHTotalBMD  Zox_L24  Zox_neck  Zox_troch  Zox_ht
Figure imgf000192_0001
¹ Light grey background indicates P < 0.05 (χ² = 3.841, DF = 1). Dark grey background indicates P < 0.01 (χ² = 6.635, DF = 1). None of the values in this table has P < 0.001.
Table 7. Chi-square values for stratification in the marker-trait combinations presented in Table 6¹
Marker  Sex     CalL2_L4BMD   CalNeckBMD    CalTrochBMD   CalHTotalBMD  Zox_L24       Zox_neck      Zox_troch     Zox_ht
AKA906  pooled  0.00          2.05          0.59          1.87          0.19          1.37          0.12          1.37
AKA908  pooled  0.02          0.37          1.12          1.65          0.00          0.26          1.04          1.45
        female  0.01          0.00          0.18          0.00          0.00          0.04          0.11          0.01
KJ1_01  male    0.00          0.01          0.19          0.09          0.00          0.11          0.22          0.19
KJ1_02  pooled  1.87          1.99          2.91          2.32          1.45          0.90          2.23          1.75
KJ1303  pooled  2.35          1.88          2.75          2.46          2.89          3.41
        female  1.77          1.47          1.03          0.86          1.53          1.62          0.97          0.97
KJ1306  female  1.14          0.38          1.53          0.88          0.19          0.09          0.71          0.09
Figure imgf000193_0001
        female  0.01          0.22          0.09          0.00          0.57          0.17          0.12          0.89
Figure imgf000193_0002
¹ Light grey background indicates P < 0.05. Dark grey background indicates P < 0.01. None of the values in this table has P < 0.001.
Table 8. Unpartitioned effect¹ of marker on trait for the marker-trait combinations presented in Table 6²
                CalL2_L4BMD         CalNeckBMD          CalTrochBMD         CalHTotalBMD        Zox_L24   Zox_neck  Zox_troch Zox_ht
Marker  Sex     effect   % of mean  effect   % of mean  effect   % of mean  effect   % of mean  effect    effect    effect    effect
AKA906  pooled  0.026    2.6        0.019    2.5        0.020    2.9        0.022    2.4        0.156     0.143     0.180     0.142
AKA908  pooled  -0.022   -2.2       -0.019   -2.4       -0.013   -1.9       -0.017   -1.8       -0.129    -0.150    -0.123    -0.121
        female  -0.023   -2.4       -0.020   -2.7       -0.010   -1.5       -0.018   -2.1       -0.134    -0.148    -0.090    -0.109
KJ1_01  male    0.015    1.4        0.023    2.8        0.021    2.7        0.018    1.8        0.076     0.155     0.170     0.103
KJ1_02  pooled  0.011    1.1        0.021    2.6        0.019    2.7        0.024    2.6        0.057     0.146     0.142     0.133
KJ1303  pooled  -0.022   -2.2       -0.021   -2.7       -0.014   -2.0       -0.024   -2.6       -0.098    -0.106    -0.072 (0.094)  -0.107 (0.
        female  -0.026   -2.7       -0.023   -3.1       -0.010   -1.6       -0.025   -2.9       -0.135    -0.150    -0.083    -0.158
KJ1306  female  -0.022   -2.3       -0.027   -3.7       -0.025   -4.0       -0.035   -4.2       -0.104    -0.163    -0.216    -0.223
KJ1311  male    0.025    2.3        0.023    2.8        0.027    3.5        0.029    2.9        0.131     0.191     0.197     0.174
KJ4703  pooled  0.023    2.3        0.020    2.6        0.014    2.0        0.022 (-0.004)  2.4 (-0.4)  0.119     0.112     0.097 (-0.047)  0.120 (-0.
NFK202  pooled  -0.026   -2.5       -0.045   -5.5       -0.034   -4.7       -0.053   -5.5       -0.152    -0.293    -0.275    -0.321
        male    -0.025   -2.3       -0.059   -6.6       -0.046   -5.6       -0.072   -6.7       -0.129    -0.367    -0.353    -0.401
OGN_02  pooled  0.059    6.2        0.036    4.8        0.020    2.9        0.048    5.5        0.310     0.255     0.209     0.346
        male    0.086    8.5        0.058    7.3        0.045    6.1        0.067    7.0        0.448     0.403     0.358     0.404
        female  0.059    6.5        0.027    3.7        0.022    3.5        0.054    6.6        0.299     0.163     0.181     0.355
OMD_01  pooled  0.029    3.0        0.0 2 (0.004)  5.7 (0.6)  0.042 (0.007)  6.4 (1.1)  0.049 (-0.003)  5.6 (-0.4)  0.114     0.187 (-0.046)  0.288 (-0.008)  0.229 (-0.
        male    0.051    4.9        0.080    10.4       0.047    6.4        0.072    7.6        0.261     0.475     0.375     0.398
OMD_03  pooled  -0.055   -5.8       -0.032   -4.3       -0.019   -2.8       -0.045   -5.1       -0.302    -0.251    -0.209    -0.346
        male    -0.083   -8.2       -0.055   -6.9       -0.045   -6.1       -0.066   -6.9       -0.432    -0.394    -0.359    -0.406
        female  -0.055   -6.0       -0.024   -3.3       -0.020   -3.2       -0.051   -6.2       -0.286    -0.155    -0.173    -0.344
PTG101  male    -0.048   -4.6       -0.042   -5.2       -0.038   -5.1       -0.047   - .8       -0.251    -0.282 (-0.585)  -0.302    -0.272
TNF602  male    -0.041   -3.7       -0.034   -3.9       -0.027   -3.4       -0.042   -4.0       -0.208    -0.205    -0.202    -0.232
¹ In those marker-trait combinations in which significant stratification is present, the unpartitioned effect includes effects of stratification. In these cases the within-pedigrees effect is also presented, in brackets.
² For calibrated BMD values, the effect of each marker is presented in the units of measurement (g/cm²) and as a percentage of the mean value. For Z scores, which are unitless and have a mean of zero and a variance of 1 (over the whole data set on which they are calculated), the actual effect of each marker is presented.
Table 9. Marker-trait combinations for which the chi-square value for unpartitioned association exceeds the 1 % critical value (χ² = 6.635, DF = 1), other than those already presented in Tables 6 to 8.
                               Chi-square      Chi-square
Marker  Sex     Trait          association¹    stratification   effect   % of mean
FGF202  pooled  Zox_L24        6.94            0.52             -0.152   -
ITGA08  male    CalL2_L4BMD    6.89            0.02             -0.045   -4.1
                Zox_L24        7.12            0.03             -0.237   -
KJ4704  female  CalHTotalBMD   6.96            0.30             -0.024   -2.8
NOT304  female  CalHTotalBMD   8.71            1.02             0.015    1.7
OMD_01  female  CalTrochBMD    7.24            3.31             0.033    4.0
¹ Unpartitioned marker-trait association
Figure imgf000195_0001
Table 10 (Figures imgf000196_0001 to imgf000230_0001)
Table 11 (Figures imgf000231_0001 to imgf000234_0001)
Table 12 (Figures imgf000235_0001 to imgf000238_0001)
Table 13 (Figures imgf000239_0001 to imgf000241_0001)

Claims

1. A method of determining whether an individual is predisposed to susceptibility to low bone mineral density (BMD) and/or bone damage comprising identifying whether the individual has at least one polymorphism in a polynucleotide encoding at least one of the proteins listed in Table 10.
2. The method of Claim 1 wherein the low BMD and/or bone damage is associated with a disease.
3. The method of Claim 2 where the disease is osteoporosis.
4. The method of Claim 1 where at least one of the polymorphisms is selected from the polymorphisms defined in Table 10.
5. The method of Claim 1 comprising contacting a sample from the individual with a specific binding agent for the polymorphism and determining whether the agent binds to the polymorphism.
6. The method of Claim 1 where the polymorphism in the polynucleotide is determined for both alleles of the individual.
7. A method for modulating an individual's susceptibility to low BMD comprising identifying the individual by the method of Claim 1 and administering to the individual a composition comprising an agent which modulates said susceptibility.
8. The method of Claim 7 wherein the low BMD and/or bone damage is associated with a disease.
9. The method of Claim 8 where the disease is osteoporosis.
10. A polynucleotide encoding a protein selected from Table 10 having at least one polymorphism in the polynucleotide selected from the group of polymorphisms listed in Table 10 for the polynucleotide.
11. A fragment of a polynucleotide encoding a protein selected from Table 10 having at least one polymorphism in the fragment selected from the group of polymorphisms listed in Table 10 where the fragment is selected from the group of fragments consisting of a) fragments having a length of 10 to 40 nucleotides, b) fragments having a length of 5 to 10 nucleotides, c) fragments having a length of 5 to 20 nucleotides, or d) fragments having a length of 10 to 20 nucleotides.
12. A method for identifying an agent for modulating an individual's susceptibility to low BMD and/or bone damage comprising a) contacting a test agent with a polypeptide or a polynucleotide encoding the polypeptide selected from the list of Table 10 having at least one of the polymorphisms selected from the list of Table 10,
b) determining whether the agent is capable of binding to the polypeptide or polynucleotide encoding the polypeptide, and
c) determining whether the activity or expression of the polypeptide or polynucleotide encoding the polypeptide is modulated.
13. A method of formulating a composition comprising a) identifying an agent for the prevention or treatment of a disease resulting in susceptibility to low BMD and/or bone damage by the method of Claim 12, and b) formulating the agent with a carrier or diluent.
14. An agent identified by the method of Claim 12.
15. A composition for modulating the susceptibility to low BMD and/or bone damage comprising an agent according to Claim 14 and a carrier or diluent.
16. Use of an agent according to Claim 14 in the manufacture of a medicament for use in modulating the susceptibility to low BMD and/or bone damage.
17. A probe, primer or antibody which is capable of selectively detecting a polymorphism listed in Table 10 in the corresponding polynucleotide encoding a protein listed in Table 10 associated with susceptibility to low BMD and/or bone damage.
18. A vector comprising the polynucleotide of Claim 10.
19. A host cell line comprising the vector of Claim 18.
20. A nonhuman animal which is transgenic for the polynucleotide of Claim 10.
21. A cell line comprising the polynucleotide of Claim 10.
22. The use of a cell line according to Claim 21 in screening for an agent for use in diagnosis, modulation of an individual having a genetic predisposition to the susceptibility to low BMD and/or bone damage.
23. The use of a nonhuman animal according to Claim 20 in screening for an agent for use in diagnosis, modulation of an individual having a genetic predisposition to susceptibility to low BMD and/or bone damage.
24. A kit for use in diagnosis of an individual having a genetic predisposition to susceptibility to low BMD and/or bone damage comprising an agent for detection of the polynucleotide of Claim 10.
25. A kit for use in diagnosis of an individual having a genetic predisposition to susceptibility to low BMD and/or bone damage comprising an agent for detection of the fragment of a polynucleotide of Claim 11.
26. A kit for use in diagnosis of an individual having a genetic predisposition to susceptibility to low BMD and/or bone damage comprising the probe, primer or antibody of Claim 17.
PCT/US2002/040948 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis WO2003054218A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2002366709A AU2002366709A1 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis
CA002471376A CA2471376A1 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis
EP02805650A EP1466012A2 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US34271101P 2001-12-20 2001-12-20
US60/342,711 2001-12-20
US42355902P 2002-11-04 2002-11-04
US60/423,559 2002-11-04

Publications (2)

Publication Number Publication Date
WO2003054218A2 true WO2003054218A2 (en) 2003-07-03
WO2003054218A3 WO2003054218A3 (en) 2004-02-19

Family

ID=26993153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/040948 WO2003054218A2 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis

Country Status (4)

Country Link
EP (1) EP1466012A2 (en)
AU (1) AU2002366709A1 (en)
CA (1) CA2471376A1 (en)
WO (1) WO2003054218A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004065630A1 (en) * 2003-01-24 2004-08-05 King's College London Detection of predisposition to osteoporosis
EP2289908A1 (en) 2003-07-11 2011-03-02 DeveloGen Aktiengesellschaft Use of DG177 secreted protein products for preventing and treating pancreatic diseases and/or obesity and/or metabolic syndrome
WO2020076900A1 (en) * 2018-10-09 2020-04-16 Genecentric Therapeutics, Inc. Detecting tumor mutation burden with rna substrate

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OHNO ET AL.: 'A cDNA cloning of human AEBP1 from primary cultured osteoblasts and its expression in a differentiating osteoblastic cell line' BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS vol. 228, 1996, pages 411 - 414, XP002970388 *

Also Published As

Publication number Publication date
CA2471376A1 (en) 2003-07-03
EP1466012A2 (en) 2004-10-13
AU2002366709A1 (en) 2003-07-09
WO2003054218A3 (en) 2004-02-19
AU2002366709A8 (en) 2003-07-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2471376

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2002805650

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2002805650

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2002805650

Country of ref document: EP

NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP