EP1466012A2 - Nucleotide polymorphisms associated with osteoporosis - Google Patents

Nucleotide polymorphisms associated with osteoporosis

Publication number
EP1466012A2
EP1466012A2 EP02805650A EP02805650A EP1466012A2 EP 1466012 A2 EP1466012 A2 EP 1466012A2 EP 02805650 A EP02805650 A EP 02805650A EP 02805650 A EP02805650 A EP 02805650A EP 1466012 A2 EP1466012 A2 EP 1466012A2
Authority
EP
European Patent Office
Prior art keywords
sequence
dna
protein
gene
polymorphism
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02805650A
Other languages
German (de)
English (en)
Inventor
Karen Anne Jones
Ana Valdes
David J. Townley
Jonathan Mangion
Nicolas Galwey (Incyte Genomics Inc.)
Simon Bennett (Incyte Genomics Inc.)
Ian McKay (Incyte Genomics Inc.)
Alan Schafer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Incyte Corp
Original Assignee
Incyte Genomics Inc
Application filed by Incyte Genomics Inc
Publication of EP1466012A2

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the invention relates in general to polymorphisms in genes associated with susceptibility to low bone mineral density and bone remodeling and methods of identifying individuals having a gene containing a polymorphism associated with osteoporosis.
  • the invention also relates to a method of detecting an increased susceptibility to a disease in an individual resulting from the presence of a polymorphism or mutation in the gene coding sequence of an osteoporosis and bone remodeling associated gene.
  • SNP (plural SNPs): single nucleotide polymorphism
  • association studies such as linkage disequilibrium studies
  • Nucleotide sequence mutations which occur in a gene or gene family, where the gene or gene family is associated with a given disease, indicate susceptibility to or development of the disease.
  • Osteoporosis is a common disease characterized by low bone mineral density (BMD), deterioration of bone micro-architecture and increased risk of bone damage, such as fracture.
  • BMD: bone mineral density
  • Common types of osteoporosis include postmenopausal and senile osteoporosis, which generally occur later in life, e.g., 70+ years.
  • Osteoporosis is a major health problem in virtually all societies. It is estimated that 30 million Americans and 100 million people worldwide are at risk for osteoporosis. In European populations, one in three women and one in twelve men over the age of fifty is at risk. These numbers are growing as the elderly population increases.
  • osteoporosis is a major public health problem which affects quality of life and increases costs to health care providers.
  • Peak bone mass is mainly genetically determined, though dietary factors and physical activity can have positive effects. Peak bone mass is attained at the point when skeletal growth ceases, after which time bone loss starts. In contrast to the positive balance that occurs during growth, in osteoporosis, the resorbed cavity is not completely refilled by bone. Despite recent successes with drugs that inhibit bone resorption, there is a clear need for specific anabolic agents that will considerably increase bone formation in people who have already suffered substantial bone loss. There are no such drugs currently approved.
  • HRT: hormone replacement therapy
  • bisphosphonates, e.g., alendronate (Fosamax)
  • estrogen and estrogen receptor modulators, progestin, calcitonin, and vitamin D.
  • Osteoporosis can be considered a complex genetic trait with variants of several genes underlying the genetic determination of the variability of the phenotype.
  • Low bone mineral density (BMD) is an important risk factor for fractures, the clinically most relevant feature of osteoporosis. Segregation analysis in families has shown that BMD is under polygenic control while, in addition, biochemical markers of bone turnover have also been shown to have strong genetic components.
  • VDR: vitamin D receptor
  • the present invention is applicable to any disease in which low BMD and/or bone fracture is a factor, and is therefore particularly concerned with diseases such as osteoporosis.
  • Low BMD is defined as two standard deviations below the age-matched mean of bone mineral density for a given population.
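As a rough illustration of the two-standard-deviation definition above, the following Python sketch computes how far a measured BMD lies from an age-matched reference mean. The function names, the reference values and the choice to treat a score of exactly -2 as low are assumptions made for illustration; none of them comes from the patent.

```python
from statistics import mean, stdev

def bmd_z_score(bmd, reference_bmds):
    """Number of standard deviations between a BMD value and the reference mean."""
    return (bmd - mean(reference_bmds)) / stdev(reference_bmds)

def is_low_bmd(bmd, reference_bmds, cutoff=-2.0):
    """True when the value lies at least two standard deviations below the mean."""
    return bmd_z_score(bmd, reference_bmds) <= cutoff

# Hypothetical age-matched reference measurements (g/cm^2), not real data.
reference = [0.90, 0.95, 1.00, 1.05, 1.10]
print(is_low_bmd(0.80, reference))
print(is_low_bmd(0.95, reference))
```

A real reference population would be far larger than five values; the short list only keeps the arithmetic easy to follow.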
  • Bone damage may be defined as any form of structural damage such as fractures, bones or chips, and degradation or deterioration of the bone other than normal wear and tear resulting from low bone mineral density or another cause. Such low BMD and/or bone damage is associated with osteoporosis.
  • the invention may be practised on any mammalian subject.
  • the mammalian subject will be a human, and most preferably an adult, preferably female.
  • the polynucleotide of this invention is preferably DNA, or may be RNA or other options.
  • fragments of the nucleic acid sequences of the first aspect are provided, which comprise one or more nucleotide substitutions, insertions or deletions.
  • the novelty of a fragment according to the present embodiment may be easily ascertained using sequence comparison methods as previously described.
  • Preferred fragments may be 10 to 40 nucleotides in length. More preferably, the fragments are between 5 to 10, 5 to 20, or 10 to 20 nucleotides in length. For example, the fragments may be 5, 8, 10, 12, 15, 18, 20, 22, 25, 28, 30, or 35 nucleotides in length.
  • the fragments may be useful in a variety of diagnostic, prognostic or therapeutic methods, or may be useful as research tools for example in drug screening.
  • non-coding, complementary sequences which hybridize to a nucleic acid sequence of the first aspect.
  • anti-sense sequences are useful as probes or primers for detecting an allele of a polymorphism of the invention, or in the regulation of the genes. They may also be used as agents for use in the identification and/or treatment of individuals having or being susceptible to low bone mineral density.
  • the anti-sense polynucleotides of this embodiment may be the full length of sequence of the first aspect, or more preferably may be 5 to 30 nucleotides in length.
  • Preferred polynucleotides are 5 to 10 or 10 to 25 nucleotides in length.
  • Primers, in particular, are typically 10 to 15 nucleotides long, and may occasionally be 16 to 25.
  • the polynucleotides of the aforementioned aspects of the invention may be in the form of a vector, to enable the in vitro or in vivo expression of the polynucleotide sequence.
  • the polynucleotides may be operably linked to one or more regulatory elements including a promoter; regions upstream or downstream of a promoter such as enhancers which regulate the activity of the promoter; an origin of replication; appropriate restriction sites to enable cloning of inserts adjacent to the polynucleotide sequence; markers, for example antibiotic resistance genes; ribosome binding sites; RNA splice sites and transcription termination regions; polymerisation sites; or any other element which may facilitate the cloning and/or expression of the polynucleotide sequence.
  • each may be controlled by its own regulatory sequences, or all sequences may be controlled by the same regulatory sequences. In the same manner, each sequence may comprise a 3' polyadenylation site.
  • the vectors may be introduced into microbial, yeast or animal DNA, either chromosomal or mitochondrial, or may exist independently as plasmids. Examples of suitable vectors will be known to persons skilled in the art and include pBluescript II, LambdaZap, and pCMV-Script (Stratagene Cloning Systems, La Jolla, USA).
  • host cell comprising a polynucleotide according to any of the aforementioned aspects, for expression of the polynucleotide.
  • the host cell may comprise an expression vector, or naked DNA encoding said polynucleotides.
  • suitable host cells are available, both eukaryotic and prokaryotic.
  • transgenic non-human animal comprising a polynucleotide according to an aforementioned aspect of the invention.
  • the transgenic, non-human animal comprises a polynucleotide according to the second third aspects.
  • Transgenic non-human animals are useful for the analysis of the single nucleotide polymorphisms and their phenotypic effect.
  • a method of screening for agents for use in the prognosis, diagnosis or treatment of individuals having, or being susceptible to, low bone mineral density comprising contacting a putative agent with a polynucleotide or protein according to an aforementioned aspect of the present invention, and monitoring the reaction between them.
  • the method further comprises contacting a putative agent with a reference polynucleotide or protein, and comparing the reaction between (i) the agent and the polynucleotide or protein encoding the reference allele; and (ii) the agent and polynucleotide or protein of the invention.
  • Potential agents are those which react differently with a variant of the invention and a reference allele.
  • the present method may be carried out by contacting a putative agent with a host cell or transgenic non-human animal comprising a polynucleotide or protein according to the invention.
  • putative agents will include those known to persons skilled in the art, and include chemical or biological compounds, such as anti-sense polynucleotide sequences, complementary to the coding sequences of the first aspect, or polyclonal or monoclonal antibodies which bind to a product such as a protein or protein fragment of the second aspect. They may also be useful in determining susceptibility to low bone mineral density, or in the diagnosis, prognosis or treatment of related conditions.
  • a method of diagnosing, or determining susceptibility of a subject to low bone mineral density and/or bone damage comprising analysing the genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present.
  • the method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype.
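The distinction between homozygous and heterozygous subjects can be sketched in a few lines of Python. This is only a minimal illustration: the representation of a genotype as a pair of allele labels and both function names are assumptions, not part of the described method.

```python
def zygosity(genotype):
    """Classify a diploid genotype given as a pair of allele labels."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"

def carries_allele(genotype, allele):
    """True when at least one copy of the allele is present."""
    return allele in genotype

print(zygosity(("A", "A")))
print(zygosity(("A", "G")))
```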
  • the method comprises determining which allele of one or more of the polymorphisms of the invention is/are present.
  • the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype.
  • the method may comprise determining which allele is present in the protein.
  • the method comprises determining whether the allele of the polymorphism of the fourth aspect is present. Any method for determining the presence of an allele may be used.
  • One such method involves the use of antibodies in diagnosing or determining susceptibility to low bone mineral density.
  • the method may comprise removing a sample from a subject, contacting the sample with an antibody to an antigen of the protein, and detecting binding of the antibody to the antigen, wherein binding is indicative of the presence of a particular allele or form of the protein and thus risk of low BMD. Tissue samples as described above are suitable for this method.
  • a method of predicting the response of a subject to treatment comprising analysing genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method is carried out according to the ninth aspect.
  • This aspect of the invention is based upon the observation that the effectiveness of treatment depends upon the underlying cause of disease. Therefore, depending upon the presence of particular allele(s), and their effect, certain treatments may be effective, whereas others may not. This will be the case where different alleles or haplotypes result in low bone mineral density, but mediate their effect via different biological mechanisms.
  • the method preferably also comprises comparing the alleles present in a subject with those of the genes which require particular treatments.
  • the present invention provides a kit to determine which allele(s) of the gene is/are present.
  • the kit will be suitable for determining which alleles of the polymorphisms of the first aspect are present.
  • the kit may contain polynucleotides, most preferably anti-sense sequences such as those of the third aspect, for use as probes or primers; antibodies which bind to alleles of the protein, such as those of the fifth aspect; or restriction enzymes for use in detecting the presence of a polynucleotide, protein, or fragment thereof.
  • the kit will also comprise means for detection of a reaction, such as nucleotide label detection means, labelled secondary antibodies or size detection means.
  • the polynucleotides, or antibodies may be fixed to a substrate, for example an array.
  • the kit further comprises means for indicating correlation between the genotype of a subject and risk of low BMD. Such means may be in the form of a chart or visual aid, which indicates that the presence of one or more alleles of the gene, including alleles of the polymorphisms of the invention, is/are associated with low BMD.
  • the invention provides novel polynucleotides and polymorphic polynucleotides associated with a given human disease, for example, with osteoporosis.
  • the invention also provides a gene sequence containing one or more polymorphic nucleotides associated with a predisposition to or the development of a given human disease such as osteoporosis.
  • the invention also relates to polypeptides encoded by the novel polynucleotides or the polymorphism-containing gene.
  • the invention also provides methods of detecting a polymorphism according to the invention in individuals at risk for osteoporosis, and for determining if a given polymorphism is associated with a predisposition to the disease.
  • the invention also discloses polymorphism(s) that are either associated with or are not associated with (i.e., are neutral) osteoporosis.
  • a polymorphism in a given gene can be utilized in various diagnostic and therapeutic methods and procedures, for example, in nucleic acid and peptide diagnosis, drug screening and design, and in gene and peptide therapy.
  • a polymorphism associated with a given gene can be utilized in various gene expression systems and assays designed to analyze gene regulation and expression.
  • polymorphism refers to a nucleotide alteration that either predisposes an individual to a disease or is not associated with a disease, which occurs as a result of a substitution, insertion or deletion. More particularly, a "polymorphism" or "polymorphic variation" may be a nucleic acid sequence variation, as compared to the naturally occurring sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution, which is present at a frequency of greater than 1% in a population.
  • neutral polymorphism refers to a polymorphism which is present at a frequency of greater than 1% in a population, which does not alter gene function or phenotype, and thus is not associated with a predisposition to or development of a disease.
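The 1% frequency criterion in the two definitions above can be illustrated with a short Python sketch that counts alleles across a set of diploid genotypes. The function names, the genotype representation and the sample counts are hypothetical; the sketch assumes the reference (naturally occurring) allele is known in advance.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Fraction of all chromosomes carrying each allele."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def variant_alleles_above(genotypes, reference_allele, threshold=0.01):
    """Non-reference alleles whose population frequency exceeds the threshold."""
    freqs = allele_frequencies(genotypes)
    return sorted(a for a, f in freqs.items()
                  if a != reference_allele and f > threshold)

# Hypothetical cohorts: 3 of 200 chromosomes versus 1 of 400 carry allele G.
common = [("A", "A")] * 97 + [("A", "G")] * 3
rare = [("A", "A")] * 199 + [("A", "G")]
print(variant_alleles_above(common, "A"))
print(variant_alleles_above(rare, "A"))
```

Whether a variant that passes the threshold is neutral or disease-associated is a separate question that frequency alone cannot answer.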
  • polynucleotide sequence refers to a sense or antisense nucleic acid sequence comprising RNA, cDNA, genomic DNA, synthetic forms and mixed polymers, that may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases.
  • mutant refers to a variation in the nucleotide sequence of a gene or regulatory sequence as compared to the naturally occurring or normal nucleotide sequence.
  • a mutation may result from the deletion, insertion or substitution of more than one nucleotide (e.g.,
  • nucleotide change such as a deletion, insertion or substitution.
  • mutation also encompasses chromosomal rearrangements.
  • nucleic acid probe refers to an oligonucleotide, nucleotide or polynucleotide, and fragments and portions thereof, and to DNA or RNA of genomic or synthetic origin which may be single- or double-stranded, which represents the sense or antisense strand.
  • DNA fragment refers to a length of polynucleotide, for example, as small as 5 nucleotides, 10, 20, 25, 40, 50, 75, 100, 250, 400, 500 and 1 kb, and as large as 5-10kb.
  • alteration refers to a change in either a nucleotide or amino acid sequence, as compared to the naturally occurring sequence, resulting from a deletion, an insertion or addition, or a substitution.
  • deletion refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, are absent.
  • insertion or “addition” refers to a change in either nucleotide or amino acid sequence wherein one or more nucleotides or amino acid residues, respectively, have been added.
  • substitution refers to a replacement of one or more nucleotides or amino acids by different nucleotides or amino acid residues, respectively.
  • specifically hybridizable refers to a nucleic acid or fragment thereof that hybridizes to another nucleic acid (or a complementary strand thereof) due to the presence of a region that is at least approximately 90% homologous, preferably at least approximately 90-95% homologous, and more preferably approximately 98-100% homologous, as are polynucleotides that hybridize to a partner under stringent hybridization conditions.
  • Stringent hybridization conditions are defined hereinbelow for various hybridization protocols.
  • a probe that is specifically hybridizable to a given sequence can be used to detect a 1 bp out of 10 bp (10%) or a 1 bp out of 20 bp (5%) difference between nucleic acid sequences and is therefore useful for discriminating between a wild type and a mutant form of a gene of interest.
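The percentages quoted above (1 bp in 10 is 10%, 1 bp in 20 is 5%) are simple mismatch fractions. A minimal Python sketch, with a hypothetical function name and made-up sequences, shows the arithmetic for two aligned sequences of equal length; it does not model hybridization itself.

```python
def mismatch_fraction(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)

# Made-up 10-mer and 20-mer pairs differing at a single position.
print(mismatch_fraction("ACGTACGTAC", "ACGTACGTAT"))
print(mismatch_fraction("ACGTACGTACACGTACGTAC", "ACGTACGTACACGTACGTAT"))
```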
  • amino acid sequence refers to the sequential array of amino acids that have been joined by peptide bonds between the carboxylic acid group of one amino acid and the amino group of the adjacent amino acid to form long linear polymers comprising proteins.
  • amino acid refers to protein subunit molecules that contain a carboxylic acid group, and an amino group, both linked to a single carbon atom.
  • a polypeptide is said to be "encoded" by a polynucleotide if the polynucleotide, either in its native state or in a recombinant form can be transcribed and/or translated to produce the mRNA for and/or the polypeptide or a fragment thereof.
  • gene refers to a region of DNA which includes a portion which can be transcribed into RNA, and which may contain an open reading frame or coding region (also referred to as an exon) which encodes a protein, a non-coding region (also referred to as an intron), and a specific regulatory region comprising the DNA regulatory elements which control expression of the transcribed region.
  • coding region refers to a region of DNA which encodes a protein, also known as an exon.
  • non-coding region refers to a region of DNA which does not encode a protein coding region, also known as an intron, and is not included in the RNA molecule that is synthesized from a particular gene.
  • regulatory region refers to DNA sequences which are located either 5' of the transcription start site, 3' of the transcription termination site, or within an intron or exon, and which are capable of ensuring that the gene is transcribed at the proper time and in the appropriate cell type.
  • consensus DNA sequence or wild-type DNA sequence refers to a sequence wherein every position represents the nucleotide that occurs with the highest frequency when many actual sequences are compared.
  • consensus DNA sequence or wild-type DNA sequence also refers to the normal, naturally occurring DNA sequence.
  • a given sequence (or mutation or polymorphism) "associated with" osteoporosis refers to a nucleic acid sequence that increases susceptibility to the disease, predisposes an individual to the disease or contributes to the disease, wherein the nucleic acid sequence is present at a higher frequency (at least 5%, preferably 10%, more preferably 25% higher) in individuals with the disease as compared to individuals who do not have the disease.
  • a sequence "not associated with" osteoporosis refers to a nucleic acid sequence that does not increase susceptibility to the disease, predispose an individual to the disease or contribute to the disease, wherein the nucleic acid sequence is not present at a higher frequency in individuals with the disease, and thus is present at a frequency about equal to its frequency in individuals who do not have the disease.
  • amplifying refers to producing additional copies of a nucleic acid sequence, preferably by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol. 155: 335).
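The frequency thresholds in the "associated with" definition can be sketched as a comparison of carrier frequencies in affected and unaffected groups. The text does not say whether "5% higher" means percentage points or a relative increase; the Python sketch below assumes percentage points, and that reading, the function names and the counts are all assumptions. Exact fractions are used so that threshold comparisons are not disturbed by floating-point rounding.

```python
from fractions import Fraction

# Tiers taken from the definition: at least 5%, preferably 10%, more preferably 25%.
TIERS = [
    (Fraction(25, 100), "more preferred (>=25%)"),
    (Fraction(10, 100), "preferred (>=10%)"),
    (Fraction(5, 100), "minimum (>=5%)"),
]

def frequency_excess(cases_with, cases_total, controls_with, controls_total):
    """Carrier frequency in cases minus carrier frequency in controls."""
    return Fraction(cases_with, cases_total) - Fraction(controls_with, controls_total)

def association_tier(excess):
    """Highest tier whose cutoff the excess frequency reaches."""
    for cutoff, label in TIERS:
        if excess >= cutoff:
            return label
    return "not associated by this criterion"

# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry the variant.
print(association_tier(frequency_excess(30, 100, 20, 100)))
```

A real association study would also test statistical significance; the sketch covers only the frequency comparison named in the definition.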
  • oligonucleotide primers refer to single stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand. Oligonucleotide primers useful according to the invention are between 5 to 100 nucleotides in length, preferably 20-60 nucleotides in length, and more preferably 20-40 nucleotides in length.
  • sequencing refers to determining the precise nucleotide composition or sequence of a nucleic acid region by methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra).
  • comparing refers to determining if the nucleotides at one or more positions in a particular region of a nucleic acid fragment are identical for any two or more sequences. According to the invention, sequence comparisons can be performed by using computer program analysis as described below in Section F entitled “Identification and Characterization of Polymorphisms”.
  • sequence differences or “sequence variations” refer to nucleotide changes, at one or more positions between any two or more sequences being compared.
  • determining the presence of polymorphic variations refers to using methods well known in the art to identify a nucleotide, at one or more positions within a particular nucleic acid region, that is distinct from the nucleotide present in the naturally occurring, wild-type or consensus sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution.
  • determining the absence of polymorphic variations refers to using methods well known in the art to determine that the nucleotides present at every position analyzed in a particular nucleic acid region are identical to the nucleotides present in the naturally occurring, wild type or consensus sequence.
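The position-by-position comparison described in the definitions above can be sketched as follows. This Python example is a simplification: it handles substitutions only, assumes the two sequences are already aligned and of equal length, and uses a hypothetical function name and made-up sequences. Detecting insertions or deletions would require a proper alignment step.

```python
def variant_positions(reference, sample):
    """List (1-based position, reference base, sample base) for each difference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]

# An empty list corresponds to "absence of polymorphic variations" in the region.
print(variant_positions("ACGTAC", "ACGTGC"))
print(variant_positions("ACGTAC", "ACGTAC"))
```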
  • genotyping refers to determining the composition of the genetic material that is inherited by an organism from its parents.
  • biological sample refers to a tissue or fluid sample containing a polynucleotide or polypeptide of interest, and isolated from an individual including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • amplimers refer to a specific fragment of DNA generated by PCR that is at least 30 bp in length and is preferably between 50 and 100 bp in length, and is more preferably between 150-300bp in length, with a melting temperature in the range of approximately 60-62°C.
  • phenotype refers to the biological appearances of an organism or a tissue derived from an organism, wherein biological appearances include chemical, structural and behavioral attributes, and excludes genetic constitution.
  • genotype refers to the genetic material that is inherited by an organism from its parents.
  • genetic susceptibility to osteoporosis refers to an increased risk of developing osteoporosis resulting from specific DNA differences relative to non-susceptible individuals.
  • an individual who is genetically susceptible to osteoporosis has a 5-100%, and more preferably a 25-50% greater chance of developing osteoporosis, as compared to non-susceptible individuals.
  • diagnostic refers to the practice of identifying a disease from the signs and symptoms of an individual including the DNA sequences of genes that are associated with an increased susceptibility to the disease.
  • Diagnostic also refers to the practice of stratifying patient populations based on the efficacy or toxicity of a composition, and the predictive placement of an individual in a response stratum based on strata-associated parameters.
  • prognosis refers to the possibility of recovering from a particular disease or condition, and also refers to risk assessment of developing a particular disease or condition.
  • oligonucleotide primers are disclosed that are useful for determining the sequence of a particular allele of a gene.
  • the invention also discloses oligonucleotide primers designed to amplify a region of a gene that is known to contain a polymorphism.
  • the invention also discloses oligonucleotide primers designed to anneal specifically to a particular allele of a gene.
  • Oligonucleotide primers useful according to the invention are single-stranded DNA or RNA molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid strand.
  • oligonucleotide primers are prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a fragment thereof is naturally-occurring, and is isolated from its natural source or purchased from a commercial supplier. Oligonucleotide primers are 5 to 100 nucleotides in length, ideally from 20 to 40 nucleotides, although oligonucleotides of different length are of use.
  • Pairs of single-stranded DNA primers can be annealed to sequences within or surrounding a gene on chromosome Y in order to prime amplifying DNA synthesis of a region of a gene.
  • a complete set of gene primers will allow synthesis of all of the nucleotides of the coding sequences, e.g., the exons, introns and control regions.
  • the set of primers will also allow synthesis of both intron and exon sequences.
  • Allele-specific primers are also useful, according to the invention. Such primers will anneal only to a particular mutant allele (e.g. alleles containing a polymorphism), and thus will only amplify a product if the template also contains the polymorphism. Allele-specific primers that anneal only to a wild type gene sequence are also useful according to the invention.
  • selective hybridization occurs when two nucleic acid sequences are substantially complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa, M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that a certain degree of mismatch at the priming site is tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which are defined as regions in which there exists a mismatch in an uninterrupted series of four or more nucleotides.
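The complementarity threshold and the notion of a loop (an uninterrupted run of four or more mismatches) described above can be sketched in Python. The function names and sequences are hypothetical, and the model is deliberately crude: it compares a primer with the exact reverse complement of an equal-length template site and ignores thermodynamics.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def complementarity(primer, site):
    """Return (fraction of complementary positions, longest mismatch run)."""
    expected = reverse_complement(site)
    if len(primer) != len(expected):
        raise ValueError("primer and site must be the same length")
    matched = [p == e for p, e in zip(primer, expected)]
    longest = run = 0
    for ok in matched:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return sum(matched) / len(matched), longest

def has_loop(primer, site, min_run=4):
    """True when four or more consecutive positions are mismatched."""
    return complementarity(primer, site)[1] >= min_run

site = "AAACCCGGGTTTAC"                 # made-up 14-nt template site
print(complementarity("GTAAACCCGGGTTT", site))   # perfectly complementary primer
print(has_loop("GTAAATTTTGGTTT", site))          # four adjacent mismatches
```

Under the definition above, the second primer is still about 71% complementary (10 of 14 positions), which exceeds the 65% minimum even though it contains a loop.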
  • Numerous factors influence the efficiency and selectivity of hybridization of the primer to a second nucleic acid molecule. These factors, which include primer length, nucleotide sequence and/or composition, hybridization temperature, buffer composition and potential for steric hindrance in the region to which the primer is required to hybridize, will be considered when designing oligonucleotide primers according to the invention.
  • longer sequences have a higher melting temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target sequence, thereby minimizing promiscuous hybridization.
  • Tm: melting temperature
  • Primer sequences with a high G-C content or that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution.
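The two properties flagged above, high G-C content and palindromic sequence, are easy to screen for in silico. The Python sketch below is a minimal illustration with hypothetical function names; it only detects sequences that are exactly equal to their own reverse complement and does not predict partial hairpins or dimers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_self_complementary(seq):
    """True when the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)

def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(is_self_complementary("GAATTC"))
print(is_self_complementary("GATTACA"))
print(gc_fraction("GGCCAT"))
```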
  • Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration of organic solvents, e.g. formamide, that might be included in a priming reaction or hybridization mixture, while increases in salt concentration facilitate binding.
  • longer primers hybridize more efficiently than do shorter ones, which are sufficient under more permissive conditions.
  • Stringent hybridization conditions typically include salt concentrations of less than about 1M, more usually less than about 500 mM and preferably less than about 200 mM.
  • Hybridization temperatures range from as low as 0°C to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C. Longer fragments may require higher hybridization temperatures for specific hybridization. As several factors affect the stringency of hybridization, the combination of parameters is more important than the absolute measure of a single factor.
  • Oligonucleotide primers can be designed with these considerations in mind and synthesized according to the following methods.
  • the design of a particular oligonucleotide primer for the purpose of sequencing or PCR involves selecting a sequence that is capable of recognizing the target sequence, but has a minimal predicted secondary structure.
  • the oligonucleotide sequence binds only to a single site in the target nucleic acid.
  • the Tm of the oligonucleotide is optimized by analysis of the length and GC content of the oligonucleotide.
  • the selected primer sequence does not demonstrate significant matches to sequences in the GenBank database (or other available databases).
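Two of the design criteria above, a single binding site in the target and a Tm that depends on length and GC content, can be sketched in Python. The Tm estimate uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a common rule of thumb for short oligonucleotides that is substituted here for illustration; the patent does not name a formula. All function names and sequences are hypothetical, and secondary-structure prediction and database searches are out of scope.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def count_occurrences(needle, haystack):
    """Count matches of needle in haystack, including overlapping ones."""
    return sum(haystack.startswith(needle, i)
               for i in range(len(haystack) - len(needle) + 1))

def binding_sites(primer, target_strand):
    """Exact-match sites on either strand of a duplex given by one strand."""
    rc = reverse_complement(primer)
    sites = count_occurrences(primer, target_strand)
    if rc != primer:  # avoid double-counting palindromic primers
        sites += count_occurrences(rc, target_strand)
    return sites

target = "TTTTGATTACAGGGGGCCCC"  # made-up target sequence
print(wallace_tm("GATTACA"), binding_sites("GATTACA", target))
```

A primer passing the single-site criterion in this sketch would have `binding_sites(...) == 1`; near matches, which also matter in practice, are not counted.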
  • The design of a primer is facilitated by the use of readily available computer programs, developed to assist in the evaluation of the several parameters described above and the optimization of primer sequences. Examples of such programs are "PrimerSelect" of the DNAStar™ software package (DNAStar, Inc.; Madison, WI), OLIGO 4.0 (National Biosciences, Inc.), PRIMER, Oligonucleotide Selection Program, PGEN and Amplify (described in Ausubel et al., 1995, Short Protocols in Molecular Biology, 3rd Edition, John Wiley & Sons). Primers are designed with sequences that serve as targets for other primers to produce a PCR product that has known sequences on the ends which serve as targets for further amplification (e.g. to sequence the PCR product).
  • primers are designed with restriction enzyme site sequences appended to their 5' ends.
  • all nucleotides of the primers are derived from gene sequences or sequences adjacent to a gene, except for the few nucleotides necessary to form a restriction enzyme site.
  • restriction enzyme site is well known in the art. If the genomic sequence of a gene and the sequence of the open reading frame of a gene are known, design of particular primers is well within the skill of the art.
  • oligonucleotides are prepared by a suitable method, e.g. the phosphoramidite method described by Beaucage and Caruthers (1981, Tetrahedron Lett., 22:1859) or the triester method according to Matteucci et al. (1981, J. Am. Chem. Soc., 103:3185), both incorporated herein by reference, or by other chemical methods using either a commercially available automated oligonucleotide synthesizer or VLSIPS™ technology.
  • the invention discloses polynucleotide sequences comprising polymorphisms.
  • the polynucleotide sequences of the invention are specifically hybridizable to a mutant form of a gene and are therefore useful for discriminating between a wild-type form of a gene and a mutant form of a gene.
  • the polynucleotide sequences of the invention may also be useful for expression of the encoded protein or a fragment thereof.
  • the invention also features antisense polynucleotide sequences complementary to polynucleotide sequences comprising polymorphisms. Antisense polynucleotide sequences are useful according to the invention for inhibiting expression of an allelic form of a gene.
  • the present invention utilizes polynucleotide sequences and fragments comprising RNA, cDNA, genomic DNA, synthetic forms, and mixed polymers.
  • the invention includes both sense and antisense strands of the polynucleotide sequences.
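As a small illustration of the relationship between sense and antisense strands, the sketch below computes the reverse complement of a DNA sequence. The function name is chosen here for illustration and does not come from the source.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

A palindromic restriction site such as GAATTC is its own reverse complement, which makes a convenient sanity check.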
  • the polynucleotide sequences may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g. methyl phosphonates, phosphorodithioates), pendent moieties (e.g. polypeptides), intercalators (e.g. acridine, psoralen, etc.), alkylators, and modified linkages (e.g. alpha anomeric nucleic acids, etc.).
  • synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule.
  • the polynucleotide may be a naturally occurring polynucleotide, or may be a structurally related variant of such a polynucleotide having modified bases and/or sugars and/or linkages.
  • the term "polynucleotide" as used herein is intended to cover all such variants.
  • X or W NH) (Mag and Engels. 1988, Nucleic Acids Res., 16:3525)
  • purine derivatives lacking specific nitrogen atoms (e.g.7-deaza adenine, hypoxanthine) or functionalized in the 8-position (e.g. 8-azido adenine, 8-bromo adenine)
  • Polynucleotides covalently linked to reactive functional groups e.g.: i) psoralens (Miller et al., 1988, Nucleic Acids Res. Special Pub. No.
  • modified polynucleotides while sharing features with polynucleotides designed as "anti-sense” inhibitors, are distinct in that the compounds correspond to sense-strand sequences and the mechanism of action depends on protein-nucleic acid interactions and does not depend upon interactions with nucleic acid sequences.
  • Polynucleotide sequences comprising DNA can be isolated from cDNA or genomic libraries (including YAC and BAC libraries) by cloning methods well known to those skilled in the art (Ausubel et al., supra). Briefly, isolation of a DNA clone comprising a particular polynucleotide sequence involves screening a recombinant DNA or cDNA library and identifying the clone containing the desired sequence. Cloning will involve the following steps. The clones of a particular library are spread onto plates, transferred to an appropriate substrate for screening, denatured, and probed for the presence of a particular sequence. A description of hybridization conditions, and methods for producing labeled probes is included below.
  • the desired clone is preferably identified by hybridization to a nucleic acid probe or by expression of a protein that can be detected by an antibody.
  • the desired clone is identified by polymerase chain amplification of a sequence defined by a particular set of primers according to the methods described below.
  • Polynucleotide sequences of the invention are amplified from genomic DNA.
  • Genomic DNA is isolated from tissues or cells according to the following method.
  • the tissue is isolated free from surrounding normal tissues.
  • genomic DNA from mammalian tissue
  • the tissue is minced and frozen in liquid nitrogen.
  • Frozen tissue is ground into a fine powder with a prechilled mortar and pestle, and suspended in digestion buffer (100 mM NaCl, 10 mM TrisCl, pH 8.0, 25 mM EDTA, pH 8.0, 0.5% (w/v) SDS, 0.1 mg/ml proteinase K) at 1.2 ml digestion buffer per 100 mg of tissue.
  • cells are pelleted by centrifugation for 5 min at 500 x g, resuspended in 1-10 ml ice-cold PBS, repelleted for 5 min at 500 x g and resuspended in 1 volume of digestion buffer.
  • Samples in digestion buffer are incubated (with shaking) for 12-18 hours at 50°C, and then extracted with an equal volume of phenol/chloroform/isoamyl alcohol. If the phases are not resolved following a centrifugation step (10 min at 1700 x g), another volume of digestion buffer (without proteinase K) is added and the centrifugation step is repeated. If a thick white material is evident at the interface of the two phases, the organic extraction step is repeated. Following extraction the upper, aqueous layer is transferred to a new tube to which will be added x volume of 7.5M ammonium acetate and 2 volumes of 100% ethanol.
  • the nucleic acid is pelleted by centrifugation for 2 min at 1700 x g, washed with 70% ethanol, air dried and resuspended in TE buffer (10 mM TrisCl, pH 8.0, 1 mM EDTA, pH 8.0) at 1 mg/ml. Residual RNA is removed by incubating the sample for 1 hour at 37°C in the presence of 0.1% SDS and 1 mg/ml DNase-free RNase, and repeating the extraction and ethanol precipitation steps.
  • the yield of genomic DNA according to this method is expected to be approximately 2 mg DNA/1 g cells or tissue (Ausubel et al., supra).
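The quantities in the isolation protocol above scale linearly with the amount of starting material, so they can be worked out with simple arithmetic. The sketch below uses only the figures stated in the text (1.2 ml digestion buffer per 100 mg tissue, roughly 2 mg DNA per g of tissue, resuspension at 1 mg/ml); the function names are illustrative, and the yield is an expectation rather than a guarantee.

```python
def digestion_buffer_ml(tissue_mg: float) -> float:
    """Digestion buffer volume: 1.2 ml per 100 mg of tissue."""
    return 1.2 * tissue_mg / 100.0


def expected_dna_yield_mg(tissue_g: float) -> float:
    """Expected yield: approximately 2 mg DNA per 1 g cells or tissue."""
    return 2.0 * tissue_g


def te_volume_ml(dna_mg: float, target_mg_per_ml: float = 1.0) -> float:
    """TE buffer volume needed to resuspend DNA at the target concentration."""
    return dna_mg / target_mg_per_ml


tissue_mg = 250.0  # example input, not taken from the source
buffer_ml = digestion_buffer_ml(tissue_mg)
dna_mg = expected_dna_yield_mg(tissue_mg / 1000.0)
print(buffer_ml, dna_mg, te_volume_ml(dna_mg))
```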
  • Genomic DNA isolated according to this method can be used for Southern blot analysis, restriction enzyme digestion, dot blot analysis or PCR analysis, according to the invention.
  • Restriction digest (of cDNA or genomic DNA)
  • Following the identification of a desired cDNA or genomic clone containing a particular sequence, polynucleotides of the invention are isolated from these clones by digestion with restriction enzymes.
  • restriction enzyme digestion is well known to those skilled in the art (Ausubel et al., supra). Reagents useful for restriction enzyme digestion are readily available from commercial vendors including New England Biolabs, Boehringer Mannheim, Promega, as well as other sources.
  • d. PCR
  • Polynucleotide sequences of the invention are amplified from genomic DNA or other natural sources by the polymerase chain reaction (PCR). PCR methods are well-known to those skilled in the art.
  • PCR provides a method for rapidly amplifying a particular DNA sequence by using multiple cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify the target sequence of interest.
  • PCR requires the presence of a nucleic acid to be amplified, two single stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase, deoxyribonucleoside triphosphates, a buffer and salts.
  • PCR is well known in the art. PCR is performed as described in Mullis and Faloona, 1987, Methods Enzymol., 155:335, herein incorporated by reference.
  • PCR is performed using template DNA (at least 1 pg; more usefully, 1 - 1000 ng) and at least 25 pmol of oligonucleotide primers.
  • a typical reaction mixture includes: 2 µl of DNA, 25 pmol of oligonucleotide primer, 2.5 µl of 10X PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4 µl of 1.25 mM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin Elmer, Foster City, CA) and deionized water to a total volume of 25 µl.
  • Mineral oil is overlaid and the PCR is performed using a programmable thermal cycler.
  • the length and temperature of each step of a PCR cycle, as well as the number of cycles, are adjusted according to the stringency requirements in effect.
  • Annealing temperature and timing are determined both by the efficiency with which a primer is expected to anneal to a template and the degree of mismatch that is to be tolerated.
  • the ability to optimize the stringency of primer annealing conditions is well within the knowledge of one of moderate skill in the art.
  • An annealing temperature of between 30°C and 72°C is used.
  • Initial denaturation of the template molecules normally occurs at between 92°C and 99°C for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99°C for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and extension (72°C for 1 minute).
  • the final extension step is generally carried out for 4 minutes at 72°C, and may be followed by an indefinite (0-24 hour) step at 4°C.
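To see how the cycling parameters above translate into instrument time, the following sketch adds up the step durations for a given program. It is an illustration only: it uses the ranges stated in the text (4 minute initial denaturation, 20-40 cycles, 1 minute extension, 4 minute final extension), ignores temperature ramp times and the optional 4°C hold, and the function name is hypothetical.

```python
def pcr_runtime_minutes(
    cycles: int,
    denature_min: float,
    anneal_min: float,
    extend_min: float = 1.0,
    initial_denature_min: float = 4.0,
    final_extend_min: float = 4.0,
) -> float:
    """Sum the programmed step times of a PCR run, in minutes.

    Ramp times between temperatures and the final 4 degree C hold
    are not included.
    """
    if not 20 <= cycles <= 40:
        raise ValueError("the text describes 20-40 cycles")
    per_cycle = denature_min + anneal_min + extend_min
    return initial_denature_min + cycles * per_cycle + final_extend_min


# 30 cycles with 30 s denaturation and 1 min annealing.
print(pcr_runtime_minutes(30, 0.5, 1.0))
```

With the shortest and longest settings described above, the programmed time ranges from 53 minutes (20 cycles, 15 s denaturation, 1 min annealing) to 168 minutes (40 cycles, 1 min denaturation, 2 min annealing).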
  • When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the probe bound to the template by virtue of its 5'-to-3' nucleolytic activity. In the absence of the quenchers, the reporters now fluoresce. The color change in the reporters is proportional to the amount of each specific product and is measured by a fluorometer; therefore, the amount of each color can be measured and the PCR product can be quantified.
  • the PCR reactions can be performed in 96 well plates so that samples derived from many individuals can be processed and measured simultaneously.
  • the TaqMan™ system has the additional advantage of not requiring gel electrophoresis and allows for quantification when used with a standard curve.
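Quantification against a standard curve can be sketched as a straight-line fit. The example below is an assumption about how such a curve is commonly built, not a description from the source: it fits the threshold cycle (Ct, a term not used in this document) against log10 of the known starting quantity for a dilution series, then inverts the line for an unknown sample. The data points are synthetic.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


# Synthetic ten-fold dilution series (illustrative values only).
standards = [1e1, 1e2, 1e3, 1e4]
cts = [36.7, 33.4, 30.1, 26.8]
slope, intercept = fit_standard_curve(standards, cts)
print(slope, intercept, quantity_from_ct(31.75, slope, intercept))
```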
  • the present invention also provides a polynucleotide sequence comprising RNA.
  • a polynucleotide comprising RNA is useful for detecting SNPs and polymorphisms by techniques including but not limited to hybridization methods or the RNase protection method.
  • a polynucleotide comprising RNA is also useful as a template for the in vitro production of protein.
  • a polynucleotide comprising RNA is also useful for detecting and localizing specific mRNA sequences by in situ hybridization.
  • Polynucleotide sequences comprising RNA can be produced according to the method of in vitro transcription.
  • the technique of in vitro transcription is well known to those of skill in the art. Briefly, the gene of interest is inserted into a vector containing an SP6, T3 or T7 promoter.
  • the vector is linearized with an appropriate restriction enzyme that digests the vector at a single site located downstream of the coding sequence. Following a phenol/chloroform extraction, the DNA is ethanol precipitated, washed in 70% ethanol, dried and resuspended in sterile water.
  • the in vitro transcription reaction is performed by incubating the linearized DNA with transcription buffer (200 mM TrisCl, pH 8.0, 40 mM MgCl2, 10 mM spermidine, 250 NaCl [T7 or T3] or 200 mM TrisCl, pH 7.5, 30 mM MgCl2, 10 mM spermidine [SP6]), dithiothreitol, RNase inhibitors, each of the four ribonucleoside triphosphates, and either SP6, T7 or T3 RNA polymerase for 30 min at 37°C.
  • dithiothreitol 200 mM
  • unlabeled UTP will be omitted and 35S-UTP will be included in the reaction mixture.
  • the DNA template is then removed by incubation with DNase I. Following ethanol precipitation, an aliquot of the radiolabeled RNA is counted in a scintillation counter to determine the cpm/ml (Ausubel et al., supra).
  • polynucleotide sequences comprising RNA are prepared by chemical synthesis techniques such as solid phase phosphoramidite (described above).
  • 3. Polynucleotide Sequences Comprising Oligonucleotides
  • a polynucleotide sequence comprising oligonucleotides can be made by using oligonucleotide synthesizing machines which are commercially available (described above).
  • Polynucleotide sequences of the invention can be used to express the protein product (or fragment thereof) of the gene of interest by inserting the polynucleotide sequence into an expression vector.
  • Expression vectors suitable for protein expression in mammalian cells, bacterial cells, insect cells or plant cells are well known in the art and are described in Section H entitled "Production of a Mutant Protein".
  • Polynucleotide sequences of the invention can be used to prepare hybrid polynucleotides comprising a sequence of a gene adjacent to a sequence encoding a foreign protein or a fragment thereof (e.g. lacZ, trpE, glutathione S-transferase or thioredoxin) or a protein tag (hemagglutinin or FLAG).
  • hybrid polynucleotides produce fusion proteins that are useful, according to the invention, for improved expression and/or rapid isolation of a protein or protein fragment, encoded by the sequence of a gene.
  • Hybrid polynucleotides are also useful as a source of antigen for the production of antibodies.
  • Nucleic acid constructs comprising a polynucleotide of genomic, cDNA, synthetic or semi-synthetic origin in association with a polynucleotide sequence encoding a foreign protein or a fragment thereof (carrier sequence) can be generated by recombinant nucleic acid techniques well known in the art (see Ausubel et al., supra). According to this method, the cloned gene is introduced into an expression vector at a position located 3' to a carrier sequence coding for the amino terminus of a highly expressed protein, an entire functional moiety of a highly expressed protein or the entire protein. It is preferable to use a carrier sequence from an E. coli gene or from any gene that is expressed at high levels in E. coli.
  • the purification protocol can be designed in accordance with the unique physical properties of the carrier protein (e.g. heat stability).
  • the tag sequence may encode a protein (e.g. glutathione-S-transferase (GST)) which can be purified by a chemical interaction (for example glutathione purification of GST).
  • some carrier proteins, such as thioredoxin (Trx) can be selectively released from intact cells by osmotic shock or freeze/thaw procedures.
  • proteins that are fused to these carrier proteins can be purified away from intracellular contaminants by virtue of the physical attributes of the carrier protein (Ausubel et al., supra).
  • the temperature at which expression is induced can affect inclusion body formation since inclusion body formation is induced at higher temperatures (37°C and 42°C) and inhibited at lower temperatures (30°C). In certain instances, lowering the total level of protein expression can lead to an increase in the proportion of soluble protein that is produced.
  • the strain background of the cells in which the protein is being produced can affect the proportion of a particular protein that is expressed in a soluble form.
  • the choice of carrier protein can affect the solubility of an expressed fusion protein (Ausubel et al., supra).
  • An additional problem that can be encountered when producing fusion proteins in E. coli is formation of an unstable protein, or a protein that is cleaved at the site of the junction between the carrier sequence and the sequence of the protein of interest. To decrease complications due to protein instability one can arrange for the fusion protein to be expressed as insoluble aggregates. Alternatively, one can express the fusion protein in E. coli strains that are deficient in proteases (Ausubel et al., supra).
  • Methods for cleavage of fusion proteins to remove the carrier are known to those skilled in the art.
  • the choice of a method is usually determined by the composition, sequence, and physical characteristics of the particular protein.
  • Reagents such as cyanogen bromide, hydroxylamine or low pH can be used to chemically cleave fusion proteins.
  • enzymatic cleavage methods can be used.
  • Enzymatic cleavage protocols are advantageous because they can be carried out under relatively mild reaction conditions, and because they involve highly specific cleavage reactions.
  • Enzymes useful for enzymatic cleavage of fusion proteins include factor Xa, thrombin, enterokinase, renin and collagenase (Ausubel et al., supra).
  • Recombinant constructs encoding fusion proteins wherein the carrier sequence is on the order of 9-15 codons can be generated by PCR methods. According to this method, a PCR primer will be designed to contain at least 13 nucleotides that are identical to the target sequence on either side of the nucleotide sequence encoding the carrier sequence.
  • the PCR primer will also contain a restriction enzyme site to facilitate cloning of the amplified product into an appropriate expression vector. PCR will be carried out as described above and the sequence of the amplified product will be confirmed by sequence analysis as described in Section D entitled "Isolation of a Wild Type Gene".
  • recombinant constructs encoding fusion proteins can be generated by site/oligonucleotide directed mutagenesis (Ausubel et al., supra).
  • In site directed mutagenesis, the DNA to be mutated is inserted into a plasmid which has an f1 origin of replication.
  • a mutagenesis oligonucleotide is designed to contain 13 bp that are 100% identical to the target sequence, on either side of a sequence coding for the 9-15 codons of carrier sequence that is to be added by the mutagenesis protocol.
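The layout of such an oligonucleotide (13 matching bases, then the carrier codons, then 13 more matching bases) can be sketched as a string operation. The function below is illustrative only: its name, the 0-based insertion coordinate, and the example sequences are assumptions, and it does not check reading frame or secondary structure.

```python
def mutagenesis_oligo(target: str, insert_pos: int, carrier: str,
                      flank: int = 13) -> str:
    """Build an oligo with `flank` target bases on each side of the carrier.

    insert_pos is the 0-based index in `target` before which the carrier
    sequence is to be inserted.
    """
    if len(carrier) % 3 != 0 or not 9 <= len(carrier) // 3 <= 15:
        raise ValueError("carrier must encode 9-15 whole codons")
    if insert_pos < flank or insert_pos + flank > len(target):
        raise ValueError("not enough target sequence for both flanks")
    left = target[insert_pos - flank:insert_pos]
    right = target[insert_pos:insert_pos + flank]
    return left + carrier + right


# Hypothetical target and a hypothetical 9-codon carrier sequence.
target = "A" * 13 + "C" * 13
oligo = mutagenesis_oligo(target, 13, "GAC" * 9)
print(len(oligo), oligo)
```

For a 9-codon carrier the resulting oligonucleotide is 13 + 27 + 13 = 53 nucleotides long.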
  • a single stranded preparation of the vector is prepared by the following method.
  • After the addition of 2.6 ml of 20% polyethylene glycol 200-800/2M NaCl to 20 ml of bacterial supernatant, the sample is incubated for 1 - 1.5 hours on ice. The sample is pelleted by centrifugation at 9000 rpm for 20 minutes. Following removal of the supernatant, residual supernatant is removed by centrifugation at 3000 rpm for 5 minutes. The pellet is resuspended in 400 ml of TE, extracted twice with phenol and four times with phenol:chloroform and ethanol precipitated. The resulting pellet is resuspended in 40 ml TE.
  • Mutagenesis is performed by using a muta-gene kit (Bio-Rad, Hercules, CA) according to the following method.
  • To kinase the oligonucleotide primer, 1 µl (200 ng) of oligonucleotide is incubated in the presence of 2 µl of 10X kinase buffer (0.5M Tris, pH 8.0, 70 mM MgCl2, 10 mM DTT), 2 µl 10 mM rATP, 2 µl polynucleotide kinase and 13 µl H2O at 37°C for 1 hour.
  • DNA is isolated from the transformed E. coli cells by mini prep methods known in the art (Ausubel et al., supra), and sequenced according to methods known in the art (described in Section D entitled "Isolation of a Wild Type Gene").
  • the invention discloses nucleic acid probes.
  • the nucleic acid probes of the invention are specifically hybridizable to a mutant gene but not to a wild type form of a gene due to the presence of one or more polymorphisms.
  • These allele specific probes can be used to screen DNA sequences of a gene which have been amplified by PCR, or are present in a genomic DNA or RNA test sample. Hybridization of a particular allele specific probe to an amplified gene sequence, under stringent conditions (described below), indicates that the polymorphism contained in the probe is present in the amplified sequence.
  • Nucleic acid probes that are specifically hybridizable to a wild type form of a gene but not to a mutant form of a gene are also useful according to the invention.
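The idea of paired allele-specific probes can be sketched with simple string handling. The code below is a deliberately simplified model and makes several assumptions not found in the source: the SNP position is 0-based, probes are written on the same strand as the sample rather than as its complement, and "stringent hybridization" is modelled as an exact substring match. The reference sequence is hypothetical.

```python
def allele_probes(reference: str, snp_pos: int, alt_base: str,
                  flank: int = 7):
    """Return (wild_type_probe, mutant_probe) centred on the SNP.

    With flank=7 each probe is 15 bases long.
    """
    start, end = snp_pos - flank, snp_pos + flank + 1
    if start < 0 or end > len(reference):
        raise ValueError("SNP too close to the end of the reference")
    wild = reference[start:end]
    mutant = reference[start:snp_pos] + alt_base + reference[snp_pos + 1:end]
    return wild, mutant


def hybridizes_stringently(probe: str, sample: str) -> bool:
    """Perfect-match model: the probe must occur exactly in the sample."""
    return probe in sample


reference = "ACGTACGTACGTACGTACGT"  # hypothetical wild-type sequence
wild, mutant = allele_probes(reference, 10, "A")
mutant_sample = reference[:10] + "A" + reference[11:]
print(hybridizes_stringently(wild, reference),
      hybridizes_stringently(mutant, reference),
      hybridizes_stringently(mutant, mutant_sample))
```

In this model each probe gives a signal only with the sample that carries its own allele, which is the behaviour the text describes for hybridization under stringent conditions.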
  • the probes of the claimed invention will be specific for a nucleic acid region that is adjacent to a region that is thought to contain one or more polymorphisms. These probes will be useful for detecting the presence of one or more polymorphisms in the adjacent region by the method of primer extension (as described in Section F entitled "Identification and Characterization of Polymorphisms").
  • probes of the claimed invention will be used to detect a gain or loss of a restriction enzyme site known to contain one or more polymorphisms of the claimed invention.
  • Nucleic acid probes are able to detect a restriction enzyme fragment that is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis. Probes that are useful according to this embodiment of the claimed invention can be specific for any region within a gene or outside of a gene.
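A polymorphism that creates or destroys a restriction site changes the fragment sizes seen on a gel. The sketch below illustrates this with an in-silico digest of two hypothetical alleles; it assumes a linear molecule and uses EcoRI (recognition site GAATTC, cut after the first G) purely as a familiar example, not as an enzyme named in the source.

```python
def digest_fragment_lengths(seq: str, site: str, cut_offset: int):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the distance from the start of the recognition site
    to the cut position on the given strand.
    """
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one base inside the site.
allele_with_site = "A" * 10 + "GAATTC" + "T" * 10
allele_without_site = "A" * 10 + "GAGTTC" + "T" * 10
print(digest_fragment_lengths(allele_with_site, "GAATTC", 1))
print(digest_fragment_lengths(allele_without_site, "GAATTC", 1))
```

The allele carrying the site yields two fragments, while the allele that has lost it remains a single uncut fragment, which is the size difference a Southern blot probe would reveal.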
  • the nucleic acid probes of the invention are useful for a variety of hybridization-based analyses including but not limited to Southern hybridization to genomic DNA, cDNA sequences or PCR amplification products, Northern hybridization to mRNA and RNase protection assays, DNA sequencing and isolation of genomic or cDNA clones of a gene.
  • the probes may also be used to determine whether mRNA encoded for by a gene is present in a cell or tissue by the method of in situ hybridization. These techniques are well known in the art and can be performed as described in Ausubel et al., supra.
  • polymorphisms associated with alleles of a gene which either predispose to a particular disease (e.g. osteoporosis) or are not associated with a particular disease (e.g. osteoporosis), will be detected by the formation of a stable hybrid consisting of a polynucleotide probe comprising one or more polymorphisms and a target sequence, that also comprises one or more polymorphisms, under stringent to moderately stringent hybridization and wash conditions. If it is expected that the probes will be perfectly complementary to the target sequence, stringent conditions will be used.
  • Hybridization stringency may be lessened if some mismatching is expected, for example, if variants are expected with the result that the probe will not be completely complementary. Conditions are chosen which rule out nonspecific/adventitious bindings, that is, which minimize noise. Since such indications identify neutral DNA polymorphisms as well as mutations, these indications need further analysis (such as assays described in Section F entitled "Identification and Characterization of Polymorphisms") to demonstrate detection of a susceptibility allele of a gene. Probes for alleles of a gene may be derived from genomic DNA or cDNA sequences specific for the gene of interest. The probes may be of any suitable length, which span all or a portion of the region containing the gene.
  • the probes may be short, e.g., in the range of about 8-30 base pairs, since the hybrid will be relatively stable under even stringent conditions. If some degree of mismatch is expected with the probe, i.e., if it is suspected that the probe will hybridize to a variant region, a longer probe may be employed which hybridizes to the target sequence with the requisite specificity.
  • Probes according to the invention also include an isolated polynucleotide attached to a label or a reporter molecule which may be useful for isolating other polynucleotide sequences having sequence similarity by standard methods, including but not limited to the above-referenced hybridization-based assays. Techniques for preparing and labeling probes (as described in Ausubel et al., supra) are included below. A wide variety of labels and conjugation techniques are known by those skilled in the art and can be used in various nucleic acid and amino acid assays.
  • Means for producing labeled hybridization or PCR probes for detecting related sequences include oligolabeling, nick translation, end-labeling or PCR amplification using a labeled nucleotide.
  • the protein-encoding sequence, or any portion of it may be cloned into a vector for the production of an mRNA probe.
  • Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labeled nucleotides.
  • reporter molecules or labels include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles and the like.
  • Patents teaching the use of such labels include US Patents 3,817,838; 3,350,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241.
  • recombinant immunoglobulins may be produced as shown in US Patent No. 4,816,567 incorporated herein by reference.
  • Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention may be derived from naturally occurring or recombinant single- or double- stranded polynucleotides, or be chemically synthesized.
  • Portions of the polynucleotide sequence having at least approximately 5 nucleotides, preferably 9-15 nucleotides, fewer than about 6 kb and usually fewer than about 1 kb, from a polynucleotide sequence encoding a gene are preferred as probes.
  • a DNA probe useful according to the present invention can be isolated from a gene or a polynucleotide construct derived from a gene, or from a cDNA sequence specific for a gene or a cDNA construct specific for a gene by the methods of PCR or restriction enzyme digestion, as described above.
  • Riboprobes useful according to the invention can be synthesized by the method of in vitro transcription, or by chemical synthesis methods, as described above.
  • An oligonucleotide probe useful according to the invention can be designed, as described above, and synthesized in a commercially available automated synthesizer.
  • Nucleic acid hybridization rate and stability will be affected by a variety of experimental parameters including salt concentration, temperature, the presence of organic solvents, the viscosity of the hybridization solution, the base composition of the probe, the length of the duplex, and the number of mismatches between the hybridizing nucleic acids (Ausubel et al., supra), and as described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
  • Southern blot analysis can be used to detect sequence variations in a gene from a PCR amplified product or from a total genomic DNA test sample via a non-PCR based assay.
  • the hybridization conditions can be varied as necessary according to the parameters described in Section A entitled "Design and Synthesis of Oligonucleotide Primers". Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS and at 65°C in 0.2X SSC/0.1% SDS, and exposed to film.
  • the stringency of the wash buffers can also be varied depending on the amount of the background signal (Ausubel et al., supra).
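The two wash steps above differ ten-fold in salt, which is what makes the second wash the more stringent one. As a worked illustration, the sketch below converts an SSC strength into molar concentrations and computes how much stock to dilute. It relies on the standard definition of SSC, which is not spelled out in this document: 1X SSC is 0.15 M NaCl and 0.015 M sodium citrate, so a 20X stock is 3 M NaCl and 0.3 M sodium citrate.

```python
def ssc_molarity(strength_x: float):
    """Return (NaCl M, sodium citrate M) for an SSC solution.

    Assumes the standard definition: 1X SSC = 0.15 M NaCl,
    0.015 M sodium citrate.
    """
    return 0.15 * strength_x, 0.015 * strength_x


def stock_volume_ml(final_ml: float, strength_x: float,
                    stock_x: float = 20.0) -> float:
    """Volume of concentrated stock needed for the final wash volume."""
    return final_ml * strength_x / stock_x


for strength in (2.0, 0.2):
    nacl, citrate = ssc_molarity(strength)
    print(strength, nacl, citrate, stock_volume_ml(500.0, strength))
```

For 500 ml of wash buffer this gives 50 ml of 20X stock for the 2X SSC wash and 5 ml for the 0.2X SSC wash, with SDS added separately to 0.1%.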
  • Detection of a nucleic acid probe-target nucleic acid hybrid will include the step of hybridizing a nucleic acid probe to the DNA target.
  • This probe may be radioactively labeled or covalently linked to an enzyme such that the covalent linkage does not interfere with the specificity of the hybridization.
  • a resulting hybrid can be detected with a labeled probe.
  • Methods for radioactively labeling a probe include random oligonucleotide primed synthesis, nick translation or kinase reactions (see Ausubel et al., supra).
  • a hybrid can be detected via non-isotopic methods.
  • Non-isotopically labeled probes can be produced by the addition of biotin or digoxigenin, fluorescent groups, chemiluminescent groups (e.g. dioxetanes, particularly triggered dioxetanes), enzymes or antibodies.
  • non-isotopic probes are detected by fluorescence or enzymatic methods. Detection of a radiolabeled probe-target nucleic acid complex can be accomplished by separating the complex from free probe and measuring the level of complex by autoradiography or scintillation counting. If the probe is covalently linked to an enzyme, the enzyme-probe-conjugate- target nucleic acid complex will be isolated away from the free probe enzyme conjugate and a substrate will be added for enzyme detection.
  • Enzymatic activity will be observed as a change in color development or luminescent output resulting in a 10^3-10^6 increase in sensitivity.
  • An example of the preparation and use of nucleic acid probe-enzyme conjugates as hybridization probes (wherein the enzyme is alkaline phosphatase) is described in Jablonski et al., 1986, Nucleic Acids Res., 14:6115.
  • Two-step label amplification methodologies are known in the art. These assays are based on the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid probe capable of specifically binding to a gene. Allele specific gene probes are also useful according to this method.
  • the small ligand attached to the nucleic acid probe will be specifically recognized by an antibody-enzyme conjugate.
  • digoxigenin will be attached to the nucleic acid probe and hybridization will be detected by an antibody-alkaline phosphatase conjugate wherein the alkaline phosphatase reacts with a chemiluminescent substrate.
  • the small ligand will be recognized by a second ligand-enzyme conjugate that is capable of specifically complexing to the first ligand.
  • A well known example of this manner of small ligand interaction is the biotin-avidin interaction. Methods for labeling nucleic acid probes and their use in biotin-avidin based assays are described in Rigby et al., 1977, J. Mol. Biol., 113:237 and Nguyen et al., 1992, BioTechniques, 13:116.
  • Variations of the basic hybrid detection protocol are known in the art, and include modifications that facilitate separation of the hybrids to be detected from extraneous materials and/or that employ the signal from the labeled moiety. A number of these modifications are reviewed in, e.g., Matthews & Kricka, 1988, Anal. Biochem., 169:1; Landegren et al., 1988, Science, 242:229; Mittlin, 1989, Clinical Chem. 35:1819; U.S. Pat. No. 4,868,105, and in EPO Publication No. 225,807.
  • a wild type version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."
  • sequence of the cloned gene will be determined by sequencing methods well known in the art (see Ausubel et al., supra and Sambrook et al., supra). Methods of sequencing employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical
  • the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI 377 DNA sequencers (Perkin Elmer).
  • a mutant version of a candidate gene according to the invention can be isolated by cloning from an appropriately selected genomic library according to methods well known in the art. Methods of cloning are described in Section B entitled “Production of a Polynucleotide Sequence.”
  • the sequence of the cloned gene will be determined by sequencing methods described in Section D entitled "Isolation of a Wild Type Gene.”
  • the starting point is a set of experimentally derived nucleic acid sequences.
  • In order to be useful for SNP discovery by the invention, it is preferred that the sequences have complete chromatogram files from a gel or capillary electrophoresis sequencing machine. When these are not available, quality score data which assign a score to each base in the sequence indicating the likelihood of error for the base call may be used. If neither of these data are available, the sequence may be used to assist the clustering of other sequences and in some cases to provide additional verification for a discovered SNP, but is not used by the invention for the identification of the polymorphism.
  • the population of sequences used may constitute either a database of cDNA-derived sequences or genomic sequence.
  • sequences used by the invention are from an assembled cDNA database, such as the LifeSeqGold database (Incyte Genomics, Inc. (Incyte), Palo Alto, CA).
  • Derivation of Nucleic Acid Sequences: cDNA was isolated from libraries constructed using RNA derived from normal and diseased human tissues and cell lines.
  • the human tissues and cell lines used for cDNA library construction were selected from a broad range of sources to provide a diverse population of cDNAs representative of gene transcription throughout the human body. Descriptions of the human tissues and cell lines used for cDNA library construction are provided in the LIFESEQ database (Incyte Pharmaceuticals, Inc. (Incyte), Palo Alto CA).
  • Human tissues were broadly selected from, for example, cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural, reproductive, and urologic sources.
  • Cell lines used for cDNA library construction were derived from, for example, leukemic cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
  • Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines commonly used and available from public depositories (American Type Culture Collection, Manassas VA).
  • Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as 5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.
  • Chain termination reaction products may be electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have been developed.
  • Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (Perkin-Elmer). Sequencing can be carried out using, for example, the ABI 373 or 377 (Perkin-Elmer) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the art.
  • nucleotide sequences have been prepared by current, state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance to practicing the invention for those skilled in the art.
  • Several methods employing standard recombinant techniques may be used to correct errors and complete the missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)
  • Human polynucleotide sequences may be assembled using programs or algorithms well known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single or many different transcripts. Assembly of the sequences can be performed using such programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG), or other methods known in the art.
  • cDNA sequences are used as "component" sequences that are assembled into "template" or "consensus" sequences as follows. Sequence chromatograms are processed and verified, and quality scores are obtained, using PHRED. Raw sequences are edited using an editing pathway known as Block 1 (see, e.g., the LIFESEQ Assembled User Guide, Incyte Pharmaceuticals, Palo Alto, CA). A series of BLAST comparisons is performed and low-information segments and repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed.
  • the processed sequences are then loaded into a relational database management system (RDMS) which assigns edited sequences to existing templates, if available.
  • a process is initiated which modifies existing templates or creates new templates from works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences themselves.
  • the templates can be merged into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
  • a resultant template sequence may contain either a partial or a full length open reading frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length. With current technology, cDNAs comprising the coding regions of large genes cannot be cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second strand" synthesis. Template sequences may be extended to include additional contiguous sequences derived from the parent RNA transcript using a variety of methods known to those of skill in the art. Extension may thus be used to achieve the full length coding sequence of a gene.
  • the cDNA sequences are analyzed using a variety of programs and algorithms which are well known in the art. (See, e.g., Ausubel, supra. Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853). These analyses comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and homology searches.
  • BLAST Basic Local Alignment Search Tool
  • BLAST is especially useful in determining exact matches and comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci.
  • Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif, BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.
  • the method comprises a series of filters to distinguish isSNPs from other sequencing variants and errors.
  • the filters can be grouped into the following five sets of filters by the order of application in the method:
  • Preliminary Filters: the main filter in the first group removes the majority of base call errors by requiring a minimum phred quality score of 15. Additional filters at this stage deal with sequence alignment errors as well as errors resulting from improper trimming of vector sequence, chimeras and splice junctions.
  • Finishing Filters: these filters remove duplicate and redundant SNPs from the generated list of SNPs, and remove SNPs which are from the hypervariable regions of hypervariable genes such as immunoglobulins and T cell receptors.
  • sequences must first be trimmed to eliminate vector sequence, contamination and repetitive sequences. Then certain low information content sequences (for example, long runs of a single base, or two- or three-base repeats) and repetitive sequences (for example Alu sequences in humans) must be masked (changed to N's) to prevent over-clustering errors.
  • the clustering process then identifies the sets of sequences that are believed to be derived from the same original DNA sequence or gene.
  • the preferred processes are Block 1 for trimming and masking, a variety of different algorithms for clustering, and phrap for the alignment. It will be recognized by those skilled in the art that phrap and other alignment methods carry out a secondary clustering step which divides clusters into contigs, and carry out a secondary trimming step which defines the end points of the portion of each sequence which participates in the contig. The contigs then may be searched for the occurrence of SNPs.
  • the first step in identifying candidate SNP sequences is to redefine the end points of each sequence as the points within the previous end points where a stretch of at least 10 consecutive base calls, containing at least eight base changes, matches the consensus sequence exactly. Sequence trimming errors (both at the single sequence stage and at the alignment stage) contribute to false positives when foreign sequence (vector, chimera or splice variant) is similar to the real sequence and the true boundary is difficult to determine. This step is a conservative approach to avoid false positives and also filters out lower-quality sequence at the ends. The length of the match with the consensus is measured in base changes in order to avoid low-significance matches on repetitive sequence such as polyA.
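  • The end-point rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the read and the consensus are already aligned column for column, and it interprets a "base change" as a position whose base differs from the base immediately before it (so a polyA run contributes none). The function names are hypothetical.

```python
def first_anchor(read, cons, min_len=10, min_changes=8):
    """Index of the first exact-match stretch with enough base changes, or None."""
    n = len(read)
    for i in range(n):
        changes = 0
        for j in range(i, n):
            if read[j] != cons[j]:
                break  # the exact match with the consensus ends here
            if j > i and read[j] != read[j - 1]:
                changes += 1  # base differs from its predecessor
            if j - i + 1 >= min_len and changes >= min_changes:
                return i
    return None


def redefine_end_points(read, cons, min_len=10, min_changes=8):
    """Return (start, end) with end exclusive, or None if no stretch qualifies."""
    if len(read) != len(cons):
        raise ValueError("read and consensus must be aligned to equal length")
    start = first_anchor(read, cons, min_len, min_changes)
    if start is None:
        return None
    # Scan the reversed strings to find the new right-hand end point.
    from_right = first_anchor(read[::-1], cons[::-1], min_len, min_changes)
    return start, len(read) - from_right


read = "TTTT" + "ACGTACGTACGT" + "GG"
cons = "CCCC" + "ACGTACGTACGT" + "AA"
trimmed = redefine_end_points(read, cons)
```

Under these assumptions the mismatching flanks are excluded, and a read consisting only of a single repeated base is rejected outright because it never accumulates eight base changes.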
  • the next step is, at each position of the alignment, to compare the base calls of all the aligned sequences which are between their start and end positions, which have quality scores at or above a set threshold, and which have neighboring base calls that agree with the consensus sequence and also have quality scores at or above the threshold.
  • the threshold is a phred quality score greater than or equal to 15.
  • the possibilities are A, C, G, T, and -(deletion).
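  • A minimal sketch of the per-position comparison described above. The read layout (a dict with `bases`, `quals`, `start`, `end`) is hypothetical, as is the choice to require both neighbors to lie inside the trimmed range; only the phred threshold of 15 and the five possible calls come from the text.

```python
from collections import Counter

MIN_PHRED = 15  # threshold stated in the text


def calls_at(reads, consensus, pos, min_q=MIN_PHRED):
    """Count eligible base calls (A, C, G, T or '-') at alignment column pos.

    Each read is a dict (hypothetical layout) with:
      bases: string aligned to the consensus, '-' marking a deletion
      quals: phred score per aligned column
      start, end: trimmed end points, end exclusive
    """
    counts = Counter()
    for r in reads:
        # The column and both neighbours must lie inside the trimmed range.
        if not (r["start"] < pos < r["end"] - 1):
            continue
        ok = True
        for p in (pos - 1, pos, pos + 1):
            if r["quals"][p] < min_q:
                ok = False
            # Neighbours must agree with the consensus; the column itself may differ.
            if p != pos and r["bases"][p] != consensus[p]:
                ok = False
        if ok:
            counts[r["bases"][pos]] += 1
    return counts


def make_read(bases, quals=None):
    """Helper for the example: a full-length read with uniform quality by default."""
    return {"bases": bases, "quals": quals or [30] * len(bases),
            "start": 0, "end": len(bases)}


example_reads = [
    make_read("ACGTA"),                        # agrees with the consensus
    make_read("ACTTA"),                        # high-quality variant call
    make_read("ACTTA", [30, 30, 10, 30, 30]),  # variant call below threshold
    make_read("AGTTA"),                        # neighbour disagrees with consensus
]
example_counts = calls_at(example_reads, "ACGTA", 2)
```

A column with more than one distinct eligible call then moves on to the later filters.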
  • the next step is a Clone Filter: if there has been more than one base call for a sequence position, then the clone for each sequence is identified and the sequences corresponding to each clone are compared. If the base calls for different sequences from the same clone disagree, then all the sequences for this clone at this base position are removed from consideration. After all of these filters, positions for which there is more than one base call are candidate SNPs. The "wild type" base call is the one in the consensus sequence and the others are designated candidate SNPs. If the wild type base call is a deletion, then the SNP is considered to be an insertion at the previous base.
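  • The Clone Filter can be sketched as below, assuming each eligible base call at a position is available as a `(clone_id, base)` pair. The function names and the returned dictionary are illustrative only.

```python
from collections import defaultdict


def clone_filter(calls):
    """Drop every call from a clone whose sequences disagree at this position."""
    by_clone = defaultdict(set)
    for clone, base in calls:
        by_clone[clone].add(base)
    return [(clone, base) for clone, base in calls if len(by_clone[clone]) == 1]


def candidate_alleles(calls, consensus_base):
    """Return wild type and variant calls, or None if one call or fewer remains."""
    kept = clone_filter(calls)
    bases = {base for _, base in kept}
    if len(bases) < 2:
        return None
    return {
        "wild_type": consensus_base,  # the call found in the consensus
        "variants": sorted(bases - {consensus_base}),
    }


calls = [("c1", "G"), ("c1", "G"), ("c2", "T"), ("c3", "G"), ("c3", "T")]
result = candidate_alleles(calls, "G")
```

Here clone `c3` is discarded because its two sequences disagree, while `c1` and `c2` still support two different calls, so the position remains a candidate.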
  • the next filters require opening of the chromatogram files for the sequences identified as containing candidate SNPs. At each candidate SNP position, the chromatogram data of each sequence passing the Identification Filters is extracted.
  • the first step in this process utilizes a program ABIdump to translate binary ABI chromatogram files into usable form.
  • Intensity Filter: if the SNP is a single base change (this step is skipped for insertions and deletions), then the processed intensity values for each of the four bases at the called chromatogram location of the candidate SNP base are used to compute a ratio. If we call the intensity of the wild type base "wt", the intensity of the SNP base "snp", the minimum of the other two "min", and the phred quality of the base call "Q", then the wild type sequences must have
  • the candidate SNP passes only if at least one wild type sequence passes and at least one SNP sequence passes.
  • the quality of the candidate SNP is the lower of the highest wild type pass level and the highest SNP pass level (if there is a high-quality wild type sequence but only low-quality SNP sequences, then the candidate is low quality).
  • a SNP quality value is returned.
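  • The pass rule and quality value can be illustrated as follows. The pass levels themselves depend on intensity criteria not reproduced here, so this sketch simply assumes each sequence has already been assigned an integer level, with 0 meaning "failed" and larger numbers meaning higher quality; that encoding is an assumption.

```python
def snp_quality(wt_levels, snp_levels):
    """Lower of the best wild type level and the best SNP level, or None on failure.

    Levels are hypothetical integers: 0 = failed, higher = better.
    """
    best_wt = max(wt_levels, default=0)
    best_snp = max(snp_levels, default=0)
    # At least one wild type sequence and one SNP sequence must pass.
    if best_wt == 0 or best_snp == 0:
        return None
    return min(best_wt, best_snp)


quality = snp_quality([3, 1], [1, 0])  # strong wild type, weak SNP evidence
```

In the example the candidate is limited by its weaker allele, which matches the parenthetical remark in the text.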
  • Polymerase errors are specific to the type of sequencing protocol used. For example, reverse transcriptase is involved in EST sequencing but not genomic clone sequencing. Polymerase is involved in the creation of extension clones (polymerase is used in all sequencing reactions, but errors are less likely to arise because only a fraction of the templates are affected in contrast to the extension process where a single polymerase product becomes a template for the entire reaction). This filter is not applied to genomic sequences in the current embodiment on the premise that the genomic sequences do not have polymerase errors, and that somatic mutations are likely to have the same profile as real SNPs.
  • This filter also filters out rare SNPs as well as apparent SNPs which are not real. It is difficult to determine and confirm by experiments to what extent SNP candidates are too rare to be confirmed vs. simply not real. For many applications, very rare SNPs are of less utility than common ones such that this is not a problem; however in some applications it may be advisable to turn this filter off.
  • The basis of this filter is that the probabilities of different mutations differ depending on the source. For example, true SNPs may be mostly transitions whereas reverse transcriptase mutations could be primarily G to T mutations. While this does not allow one to determine for sure that a given change is a true SNP, it allows one to evaluate the relative likelihood that a given mutation is a true SNP.
  • SNP confirmation data suggest that G/T SNP candidates in which there is only one clone having the T allele have a very low probability of being real SNPs. These SNP candidates are excluded from the high confidence set (they are kept in a different file; their confirmation rate is well below 50 percent). The other set which had a very low confirmation rate is any A/T SNP.
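  • The two low-confidence classes named above can be expressed as a small classifier. This is a sketch only; the function name, the input layout (a mapping from allele to the number of supporting clones) and the "high"/"low" labels are assumptions.

```python
def base_change_confidence(clones_per_allele):
    """Classify a two-allele candidate as 'high' or 'low' confidence.

    clones_per_allele maps each allele to the number of clones supporting it,
    e.g. {"G": 5, "T": 1}.
    """
    alleles = set(clones_per_allele)
    if alleles == {"A", "T"}:
        return "low"  # any A/T candidate had a very low confirmation rate
    if alleles == {"G", "T"} and clones_per_allele["T"] == 1:
        return "low"  # G/T with a single clone carrying the T allele
    return "high"


label = base_change_confidence({"G": 5, "T": 1})
```

Low-confidence candidates would be written to a separate file rather than dropped, as the text describes.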
  • Frequency Filter: This filter is based on the concept that true SNPs have a different frequency profile than clone errors and that a candidate SNP which is evident in only one clone in a deep alignment is less likely to be real than one which appears in one clone in a shallow alignment.
  • the likelihood of finding a SNP at a given sequence location is a function of the number of chromosomes sequenced. This curve is distinctly non-linear, as most SNPs are sufficiently frequent to be found with relatively few sequences. The probability of an error of this type, however, is essentially linear in the number of sequences, since the chance of the change occurring in two different sequences is independent.
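  • The contrast between the two curves can be made concrete with a simple model. The sketch assumes chromosomes are sampled independently, a true SNP has minor allele frequency `freq`, and a clone error occurs at a fixed small per-sequence rate `err`; these parameters and the numeric values are illustrative and are not taken from the text.

```python
def p_snp_seen(freq, n):
    """Chance that at least one of n sampled chromosomes carries the minor allele."""
    return 1.0 - (1.0 - freq) ** n


def p_error_seen(err, n):
    """Chance that at least one of n sequences carries an error at this position."""
    return 1.0 - (1.0 - err) ** n


depths = (5, 10, 20, 40)
snp_curve = [p_snp_seen(0.2, n) for n in depths]      # saturates quickly
error_curve = [p_error_seen(1e-4, n) for n in depths]  # grows roughly as n * err
```

Under this model the SNP curve flattens once a handful of chromosomes have been sampled, while the error curve keeps doubling as depth doubles, which is why a single-clone variant in a deep alignment is the more suspicious one.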
  • This filter is the basis of a secondary method used to develop the base change sequence analysis filter. Comparing the set of single clone SNPs from shallow alignments with those from deep alignments, which are more likely to be errors, will reveal base changes which are more likely to be associated with polymerase errors and somatic mutations.
  • These filters are intended to remove candidate SNPs which result from the incorrect clustering of similar sequences such as highly homologous genes, similar genomic sequences, and contamination from other species where the sequences of the species have been mislabeled as human.
  • Number of base change filter: This filter distinguishes homologous sequences from SNPs on the basis of the frequency of variants. True SNPs occur about once per kb when comparing two sequences, or once per 2 kb if the length of both sequences is included, and this fraction decreases as the depth of the alignment increases. Since EST sequences tend to be about 500 bp or less in length, not more than one SNP per four sequences would be expected. The number of SNPs in the cluster is divided by the number of sequences in the cluster and SNPs for which this number is larger than one are discarded. The higher the number, the less likely the SNP is to be real. The threshold value of one was chosen because it appears to correspond to roughly a 50 percent success rate; however, the threshold value could be adjusted to a higher value to accept lower confidence SNPs.
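  • The ratio test described above reduces to a single comparison. A minimal sketch, with a hypothetical function name and the threshold of one taken from the text:

```python
def passes_base_change_filter(n_snps, n_sequences, threshold=1.0):
    """Keep a cluster's SNPs only if SNPs per sequence does not exceed the threshold."""
    if n_sequences <= 0:
        raise ValueError("a cluster must contain at least one sequence")
    return n_snps / n_sequences <= threshold


keep_sparse = passes_base_change_filter(3, 12)   # 0.25 SNPs per sequence
keep_dense = passes_base_change_filter(13, 12)   # more SNPs than sequences
```

Raising `threshold` admits lower-confidence candidates, as the text notes.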
  • This filter calculates the number of SNPs for which the sequence is the only representative within a window of 100 bases on either side, and discards any of the SNPs for which there are more than one other SNP in this window.
  • This threshold can be set higher, but the actual fraction of SNP candidates which are true SNPs drops off to less than 50 percent.
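  • One reading of the window rule above is sketched here. It assumes each single-sequence SNP is given as a `(position, sequence_id)` pair in consensus coordinates, and that "window of 100 bases on either side" means an absolute distance of at most 100; both the data layout and that boundary choice are assumptions.

```python
def window_filter(singleton_snps, window=100, max_others=1):
    """Discard SNPs whose supporting sequence has too many other singleton SNPs nearby.

    singleton_snps: list of (position, sequence_id) for SNPs supported by
    exactly one sequence.
    """
    kept = []
    for pos, seq_id in singleton_snps:
        others = sum(
            1
            for other_pos, other_id in singleton_snps
            if other_id == seq_id and other_pos != pos and abs(other_pos - pos) <= window
        )
        if others <= max_others:
            kept.append((pos, seq_id))
    return kept


snps = [(100, "s1"), (150, "s1"), (180, "s1"), (400, "s1"), (120, "s2")]
surviving = window_filter(snps)
```

Three closely spaced variants unique to one sequence are treated as a sign of a misclustered or low-quality read, while the isolated variants are retained. `max_others` corresponds to the threshold that the text says can be raised at the cost of a lower confirmation rate.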
  • Cluster total has proven to be empirically correlated with the confirmation rate, probably because it predicts clusters which contain paralogs, homologs and contamination from other species. Candidate SNPs which have a cluster number of less than eight are kept. This threshold value for the cluster total can be varied.
  • Redundancy/finishing filters
  • Redundant SNP filter: SNPs in different contigs of the same gene which have the same base change and surrounding sequence are flagged as redundant. To accommodate possible splice variants, this redundancy filter also applies to SNPs for which the surrounding sequence matches on only one side.
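  • A sketch of the redundancy check, assuming each SNP record carries a gene identifier, a base change, and fixed-length left and right flanking sequences. The field names are hypothetical, and exact string equality stands in for whatever flank comparison is actually used.

```python
def flag_redundant(snps):
    """Return one boolean per SNP: True if it duplicates an earlier, kept SNP."""
    kept = []
    flags = []
    for snp in snps:
        duplicate = any(
            k["gene"] == snp["gene"]
            and k["change"] == snp["change"]
            # A match on either flank is enough, to allow for splice variants.
            and (k["left"] == snp["left"] or k["right"] == snp["right"])
            for k in kept
        )
        flags.append(duplicate)
        if not duplicate:
            kept.append(snp)
    return flags


records = [
    {"gene": "G1", "change": "G/T", "left": "ACGTA", "right": "TTGCA"},
    {"gene": "G1", "change": "G/T", "left": "ACGTA", "right": "TTGCA"},
    {"gene": "G1", "change": "G/T", "left": "CCCCC", "right": "TTGCA"},
    {"gene": "G1", "change": "C/T", "left": "ACGTA", "right": "TTGCA"},
    {"gene": "G2", "change": "G/T", "left": "ACGTA", "right": "TTGCA"},
]
redundant = flag_redundant(records)
```

The second and third records are flagged (full match and one-sided match respectively); a different base change or a different gene is not treated as redundant.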
  • Sequences containing SNPs are filtered to remove SNPs in sequences that are homologous to T cell receptor and immunoglobulin genes, because both types of genes have hypervariable regions which could result in false positives.
  • SNP-related data: with each candidate SNP a variety of data is kept, including the number and sources of all contributing sequences (for example gene album, HTPS, FL, WashU/Merck, etc.), the surrounding sequence, measures of the ratio and quality scores for the "best" sequence representing each allele, etc.
  • Sequence-related data: for each sequence associated with each SNP, the following data is kept: the distance in each direction to the end of the sequence, the distance in each direction to the next base different from the consensus and passing the initial quality filters, the library, tissue ID, donor ID and comments (for example tumor, diseased, normal).
  • the invention provides methods for detecting the presence of polymorphisms in candidate genes of the invention.
  • the invention also provides methods for distinguishing polymorphisms which contribute to a particular disease (e.g. osteoporosis) over polymorphisms which do not contribute to the disease.
  • Identification of Polymorphisms in a candidate gene will involve the steps of isolating the candidate gene, determining its genomic structure and identifying polymorphisms in the DNA sequences in any portion of the entire protein-coding region.
  • the invention also provides methods for identifying polymorphisms in the DNA sequences corresponding to RNA splice junctions.
  • the invention also provides methods for identifying polymorphisms in the DNA sequence corresponding to the regulatory (promoter) region of the candidate gene.
  • a candidate gene is isolated by cloning methods well known in the art (described above).
  • the genomic structure of a candidate gene is determined by Southern blot analysis, as described in Section C. It is expected that the entire sequence of an open reading frame (ORF) of an average entire gene can be spanned by 16 PCR-amplified DNA fragments or amplimers of an average length of 225 bp. It is expected that a smaller gene can be spanned by 1-2 amplimers and that >50 amplimers are required to span extremely large genes.
  • Primers useful for production of the amplimers of a particular candidate gene are designed based on preexisting knowledge of the sequence of the wild type gene, according to the primer design strategies described in Section A entitled "Design and Synthesis of Oligonucleotide Primers."
  • primers that amplify overlapping regions of the candidate gene. If a sequence variation is located in a region of a candidate gene that corresponds to the region to which the primers hybridize, the primers will likely not bind, the region containing this sequence variation will not be amplified and the variation will not be detected in PCR based assays.
  • By producing overlapping amplimers it is expected that virtually all of the sequence variations in a particular candidate gene will be detected.
  • the amount of overlap in the amplimers is somewhat variable (approximately 20%) and the precise location of the overlapping regions will depend on the location of regions comprising a sequence that is an appropriate primer sequence.
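  • The effect of overlap on the number of amplimers can be sketched with an idealized tiling. This ignores primer placement constraints entirely: it assumes fixed-length amplimers, a fixed fractional overlap, and a final amplimer placed flush with the end of the region. The 3,600 bp example length is simply 16 x 225 bp and is used for illustration.

```python
def tile_amplimers(region_len, amplimer_len=225, overlap_frac=0.20):
    """Return (start, end) intervals, end exclusive, covering the whole region."""
    if region_len <= amplimer_len:
        return [(0, region_len)]  # a small gene fits in a single amplimer
    step = amplimer_len - int(round(amplimer_len * overlap_frac))
    tiles = []
    start = 0
    while start + amplimer_len < region_len:
        tiles.append((start, start + amplimer_len))
        start += step
    # The last amplimer is placed flush with the end of the region.
    tiles.append((region_len - amplimer_len, region_len))
    return tiles


without_overlap = tile_amplimers(3600, overlap_frac=0.0)
with_overlap = tile_amplimers(3600)
```

In practice the real positions would be shifted to wherever acceptable primer sequences exist, so the overlap varies from amplimer to amplimer.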
  • each polymorphism will be detected in the context of an SSCP fragment.
  • Polymorphism analysis by fluorescent SSCP uses PCR to generate an amplimer of DNA to be studied.
  • the region to be tested is defined as the region between the primers (e.g. the region that is incorporated into the PCR product and reflects the sequence of the DNA sample being tested).
  • the PCR primers reflect the sequence of the DNA sample being tested and are incorporated into the PCR product as one end of each strand of DNA in the PCR product.
  • fSSCP provides a method of screening a DNA sequence located between PCR primers for the presence of polymorphisms.
  • the sensitivity of the technique of fSSCP for detecting a polymorphism is affected by length, such that there is a substantial decrease in the detection of polymorphisms in amplimers that are greater than 300 bp in length.
  • primer3 program (Copyright (c) 1996 Whitehead Institute for Biomedical Research) is employed to design pairs of primers suitable for use in a single PCR reaction.
  • program parameters are set so that multiple amplimers are designed in the length range of 150-300bp, with predicted primer melting temperatures in the narrow range 60- 62°C.
  • the narrow temperature range increases the likelihood that a single set of PCR conditions can be used to generate a wide variety of different amplimers.
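  • The parameter settings described above could be written as a Primer3 input record along the following lines. The tag names are those used by current Primer3 releases in its Boulder-IO input format and may differ from the 1996 version cited in the text; the optimum Tm of 61 and the number of pairs returned are assumptions, and the template sequence is a placeholder.

```python
def primer3_record(seq_id, template, n_pairs=5):
    """Build a Boulder-IO record requesting 150-300 bp products with Tm 60-62 C."""
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_TASK", "generic"),
        ("PRIMER_PRODUCT_SIZE_RANGE", "150-300"),  # amplimer length range from the text
        ("PRIMER_MIN_TM", "60.0"),                 # narrow Tm window from the text
        ("PRIMER_OPT_TM", "61.0"),                 # assumed midpoint
        ("PRIMER_MAX_TM", "62.0"),
        ("PRIMER_NUM_RETURN", str(n_pairs)),       # assumed number of primer pairs
    ]
    # A line containing only "=" terminates a Boulder-IO record.
    return "\n".join(f"{key}={value}" for key, value in tags) + "\n=\n"


record = primer3_record("candidate_exon_1", "ACGT" * 100)
```

The resulting text would be passed to the Primer3 command-line program on standard input; keeping the Tm window narrow is what allows one set of PCR conditions to serve many amplimers.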
  • SSCP does not detect 100% of polymorphisms.
  • the invention provides for detection of polymorphisms with an efficiency of 95% under a single set of conditions using single coverage of sequences; a 2-fold screening strategy can be employed if it is necessary to increase this detection efficiency.
  • a polymorphism can be located and detected anywhere in the SSCP fragment except in the regions at each end that correspond to the sequence of the PCR primers.
  • the precise location and identity of the sequence variation(s) of a particular SSCP fragment can be confirmed by sequencing the fragment as described in Section D entitled "Isolation of a Wild Type Gene".
  • the sequence of a candidate gene will be compared to the known sequence of a wild-type version of the gene by using the following DNA/protein sequence analysis programs and methods.
  • PSI-BLAST is a more sensitive variant of BLAST that operates by iteratively searching the database while simultaneously refining the query pattern based on the results of the searches.
  • Other packages of programs that are available and which have different specific properties include the HMMER, SAM, WISE, STADEN and FASTA packages, and the programs est_genome, dotter, e-PCR, Clustal, cross_match and phrap (Pearson, 1996, Methods Enzymol. 266:227).
  • primers can be designed to produce amplimers useful for identifying polymorphisms located in the RNA splice junctions.
  • primers can be designed to produce amplimers useful for identifying polymorphisms located in the promoter region.
  • Additional methods for detecting and isolating polymorphisms include, but are not limited to, fluorescent polarization-TDI, mass spectroscopy, denaturing gradient gel electrophoresis, chemical cleavage of mismatch, constant denaturant capillary electrophoresis, RNase cleavage, heteroduplex analysis, sequencing by hybridization, DNA sequencing, representational difference analysis, and denaturing high performance liquid chromatography, described below in Section F entitled "Identification and Characterization of Polymorphisms".
  • Most polymorphisms do not alter gene function and are called neutral polymorphisms. Some polymorphisms do have an effect on gene function, for example by changing the amino acid sequence of a protein, or by altering control sequences such as promoters or RNA splicing or degradation signals.
  • Polymorphisms can be used in genetic studies to identify a gene involved in a disease. If a polymorphism alters a gene function such that it increases disease susceptibility, then it will be present more often in individuals with the disease than in those without the disease. Alternatively, if a particular DNA variant is protective against a disease, it will be found more often in individuals without the disease than in those with the disease. Statistical methods are used to evaluate polymorphism frequencies found in diseased as compared to normal populations, and provide a means for establishing a causal link between a polymorphism and a phenotype. To detect a significant association between a disease and a polymorphic site, different tests may be used with either genotypic or allelic distributions.
  • the simplest test consists of a t-test wherein the frequency of the polymorphic alleles in normal individuals and individuals with the disease phenotype is compared.
  • a comparison of the genotypic distribution in normal individuals and individuals with the disease phenotype can also be performed using a chi-square test of homogeneity.
  • These tests are implemented in all commercially or freely available statistical packages, for example SAS and S+, and are even included in Microsoft Excel. More sophisticated analyses will be performed by incorporating covariates, using methods such as linear regression or logistic regression, and by accounting for the information provided by adjacent polymorphic sites (multipoint analysis).
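  • As an illustration of the chi-square test of homogeneity on genotypic distributions, the following sketch compares genotype counts in cases and controls. The counts are invented for the example. With three genotype classes the test has two degrees of freedom, for which the chi-square tail probability has the closed form exp(-x/2); the helper below is valid only for that case.

```python
import math


def chi_square_homogeneity(cases, controls):
    """Return (statistic, degrees_of_freedom) for two rows of genotype counts.

    cases and controls list counts in the same genotype order, e.g. (AA, Aa, aa).
    Every genotype column is assumed to have a nonzero total.
    """
    rows = [cases, controls]
    row_totals = [sum(cases), sum(controls)]
    col_totals = [a + b for a, b in zip(cases, controls)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(cases) - 1


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


stat, df = chi_square_homogeneity((30, 50, 20), (50, 40, 10))
p = p_value_df2(stat)
```

For tables with other dimensions, a statistical package would be used to obtain the p-value instead of the closed form.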
  • Studies which test polymorphism frequencies within groups exhibiting different phenotypes, and use statistical methods to compare the group polymorphism frequencies and identify correlations with phenotypes, are known as "association studies". Some polymorphisms that occur in a single gene can alter the function of a gene sufficiently such that the polymorphism results in a disease (monogenic disease). However, many common human diseases are polygenic; that is, they are the result of complex interactions of various forms of multiple genes.
  • DNA variants leading to monogenic diseases are usually rare in a population due to the process of natural selection against those carrying the disease gene.
  • Because variants in genes that are involved in polygenic disease do not produce the disease phenotype unless they occur in the appropriate combination with other gene variants, normal individuals can carry a subset of the disease-contributing variants without suffering adverse effects.
  • disease-contributing gene variants that are associated with polygenic diseases may exist at a high frequency in a normal population.
  • Monogenic diseases tend to be rare within the population, and therefore few patients may be available for studies of these diseases.
  • a polymorphism in a single specific gene is necessary and usually sufficient to cause a monogenic disease, such that associations between the variant gene and the phenotype are usually readily apparent.
  • the polymorphism present in the disease gene will not be found upon examination of a large number of normal individuals. If there is not complete penetrance then some apparently normal individuals will contain the mutation; the difference in frequency of occurrence of the variant gene in the disease group as compared to the normal population will reveal that the variant is associated with the disease.
  • variation at different genes occurs in a combination which alters susceptibility to the disease.
  • Although multiple genes may have variant forms which can contribute to a disease phenotype, it is not always necessary for a contributing variant to be present at every gene potentially contributing to the disease in a given affected individual.
  • a hypothetical disease could be caused by a particular combination of variants at three of four genes, designated as A, B, C, and D.
  • Appropriate susceptibility variants in combination at any three of the genes can cause the susceptibility, i.e. one person with increased susceptibility may have susceptibility variants in genes A, B, and C, while another individual with increased susceptibility to the same disease will have susceptibility variants in genes B, C, and D. Therefore, although not all affected individuals will have the same susceptibility variants, the net result is that a diseased population will have susceptibility variant forms of genes A, B, C, and D at a higher frequency than an unaffected population (as detected by association studies).
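  • The four-gene example can be checked by exact enumeration. The sketch below is a deliberately simplified model: each individual either carries or does not carry a susceptibility variant at each gene, independently and with the same probability `p` (an arbitrary illustrative value), and an individual is affected when at least three of the four genes carry a variant.

```python
from itertools import product


def variant_frequencies(p=0.3, genes="ABCD", needed=3):
    """Return (affected, unaffected) dicts of per-gene variant frequency."""
    mass = {True: 0.0, False: 0.0}
    carrying = {
        True: dict.fromkeys(genes, 0.0),
        False: dict.fromkeys(genes, 0.0),
    }
    # Enumerate all 2^4 carrier patterns and weight each by its probability.
    for combo in product((0, 1), repeat=len(genes)):
        prob = 1.0
        for has_variant in combo:
            prob *= p if has_variant else 1.0 - p
        affected = sum(combo) >= needed
        mass[affected] += prob
        for gene, has_variant in zip(genes, combo):
            if has_variant:
                carrying[affected][gene] += prob
    return tuple(
        {g: carrying[status][g] / mass[status] for g in genes}
        for status in (True, False)
    )


affected_freq, unaffected_freq = variant_frequencies()
```

Under these assumptions every gene's variant is enriched in the affected group even though no single variant is present in all affected individuals, and a substantial fraction of unaffected individuals still carry each variant.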
  • the polymorphisms which contribute to the polygenic disease are also present in a normal population.
  • an individual with susceptibility polymorphisms in only one or two of the genes potentially contributing to the disease susceptibility will be normal with regard to disease susceptibility. Therefore, normal populations can be used to identify polymorphic regions of the genome in the population, and these regions can then be specifically tested in larger patient and control populations.
  • a gene is analyzed for the presence of polymorphisms by testing between 2 and 100 normal individuals in order to establish if a particular polymorphism is present for that gene in the population. Once a polymorphic site(s) has been defined, the polymorphic site is then tested in case (disease) and control (normal) populations and statistical analyses are performed to identify polymorphisms which occur at significantly different frequencies in the two populations.
  • the determination of the statistical significance of polymorphism frequency differences is dependent upon the size of the observed frequency difference between the populations, and on the size of the populations being studied. If a significant difference is found, then it can be concluded that an association exists between the polymorphism and the phenotype being studied.
  • a statistically significant difference is a frequency difference at a particular site between populations which would be expected to occur by chance in only 5 out of 100 tests. That is, a difference which has a 95% probability of being a true difference due to the effect of the gene.
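  • One common way to test whether allele frequencies differ between case and control populations is a Pearson chi-square test on a 2 x 2 table of allele counts. The sketch below is an assumption about how such a test might be carried out (the text does not name a specific test); the function name and the example counts are hypothetical. It illustrates that the same frequency difference (30% versus 20%) reaches the 5-in-100 threshold only when the populations are large enough.

```python
import math

def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for a 2x2 table of allele counts; returns (chi2, p_value)."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

if __name__ == "__main__":
    # Hypothetical counts: 30% vs 20% allele frequency at two sample sizes.
    print(allele_chi_square(60, 140, 40, 160))  # 200 chromosomes per group
    print(allele_chi_square(6, 14, 4, 16))      # 20 chromosomes per group
```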
  • polymorphisms which do not directly contribute to a disease can also be used to identify regions of the genome which contain genes that contribute to the disease by virtue of their proximity to disease-contributing polymorphisms.
  • DNA exists as 23 homologous pairs of linear molecules (chromosomes). Recombination is a process which results in reciprocal exchanges of short homologous DNA segments between these homologous DNA pairs. Only one of each of the 23 pairs of chromosomes is inherited by the offspring from each parent. The inherited chromosome is thus made up of tandemly arrayed segments of DNA derived from both of a pair of chromosomes. Consequently, DNA is transferred in segments from one generation to the next. Although the boundaries of each inherited segment may vary in each generation, the net effect is that sequences of DNA which are adjacent along the length of the molecule are inherited together at a higher frequency than sequences that are farther apart.
  • If a region (continuous linear segment) of DNA has two or more polymorphisms that are close together, they will be co-inherited at a higher frequency than polymorphisms that are farther apart, as they are more likely to remain on the same segment of DNA during recombination. Therefore, if two or more polymorphisms are close together, they will occur together at a higher frequency in a population than would be expected by random segregation. This effect is known as linkage. Linkage studies are performed using multiply affected individuals within families; the most commonly used approach is to test markers located throughout the genome in many sets of affected sib pairs that share the same phenotype.
  • Markers which are located in the region of a genome that contributes to the phenotype will be inherited in both siblings, along with the phenotype, at a higher frequency than expected by chance. Studies wherein data from many such families is compared can be used to implicate a region of a genome as one that contributes to a particular phenotype.
  • Linkage disequilibrium (LD) association studies provide another method for using polymorphisms in genetic studies. The method of LD involves making a correlation, at the population level, between the alleles (alternative polymorphic forms of the same sequence site) present at different genomic sites.
  • site 1 has two variant forms, A and a
  • site 2 has two variant forms B and b
  • the observation in a population that allele A at site 1 is more often found with allele B at site 2 than with allele b is an example of LD.
  • If allele B is a disease-contributing polymorphism, then testing for allele A may show an association with the disease.
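  • The two-site example (alleles A/a at site 1 and B/b at site 2) can be quantified with the standard LD coefficient D and the squared correlation r². The sketch below assumes that phased haplotype counts are available and that both sites are polymorphic in the sample; the function name and the example counts are hypothetical.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from counts of the four two-site haplotypes.
    Both sites must be polymorphic, otherwise r_squared is undefined."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B                 # excess of AB haplotypes over independence
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared

if __name__ == "__main__":
    print(ld_statistics(40, 10, 10, 40))  # A is found with B more often than with b
    print(ld_statistics(25, 25, 25, 25))  # alleles at the two sites are independent
```

  • A positive D means that allele A is found with allele B more often than expected from the individual allele frequencies; D equal to zero corresponds to no LD.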
  • Linkage disequilibrium may be generated in several ways. Maintenance of LD in a population allows a disease association to be detected many generations after the formation of LD. The maintenance of LD is explained by linkage: the closer the two loci, the longer (in terms of number of generations) that particular LD is maintained.
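  • The statement that closer loci maintain LD for more generations follows from the textbook decay relation D_t = D_0 (1 - c)^t, where c is the recombination fraction between the two loci. The sketch below assumes random mating, a large population and no selection; these assumptions and the function names are not taken from the text.

```python
import math

def ld_after_generations(d0, recombination_fraction, generations):
    """Expected LD coefficient after a number of generations of random mating."""
    return d0 * (1.0 - recombination_fraction) ** generations

def generations_to_halve(recombination_fraction):
    """Number of generations needed for LD to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - recombination_fraction)

if __name__ == "__main__":
    for c in (0.5, 0.01, 0.001):  # unlinked, loosely linked, tightly linked
        print(c, round(generations_to_halve(c), 1))
```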
  • polymorphisms which do not directly contribute to a disease can be used to identify regions of the genome which contain a disease contributing polymorphism. If a polymorphism affects gene function such that it contributes to a phenotype being studied and is found to be associated with the phenotype, nearby (neutral) polymorphisms which are in LD with the disease polymorphism may also show an association with the disease.
  • If a polymorphism does not affect gene function but is found to be associated with a particular phenotype, this polymorphism is presumably in LD with a different, but adjacent, polymorphism that affects gene function such that it contributes to the phenotype being studied. If a neutral polymorphism is always inherited with a phenotype-contributing polymorphism, then the strength of the association of the neutral polymorphism to the phenotype will be equal to that of the polymorphism which affects gene function and is contributing to the phenotype.
  • a polymorphism which shows an association with a phenotype is a marker for that phenotype and implicates the region in which the polymorphism resides as a region containing a polymorphism which contributes to the phenotype. Additional flanking polymorphisms can be tested to determine the precise location of the true phenotype-contributing variant.
  • Linkage studies on families, and LD studies on populations, have different degrees of resolution with regard to defining the size of a DNA region which contains the phenotype-contributing polymorphism. In general, linkage studies define an interval which potentially contains tens to hundreds of genes, while LD studies have been used to implicate single genes in the development of a particular phenotype.
3. Test Populations Useful for Polymorphism Genotyping
  • the invention provides methods of determining allelic frequencies by performing genotypic analyses in appropriate test populations.
  • Bone Fracture Cohort: 1000 multiple or low trauma fracture cases and 1000 control cases to determine genetic association with fracture.
  • BMD (Bone Mineral Density) Cohort: 300 high and 300 low BMD cases to study genetic association with high or low BMD.
  • BMD Case Control Cohort: 500 low BMD and normal BMD case controls to study genetic association with low BMD/fracture.
  • osteoporosis is most effective at the time when bone loss is increasing and before the bones have become fragile and prone to fracturing.
  • Established diagnostic techniques use x-ray and ultrasonography to measure skeletal parameters of bone size, volume and mineral density to predict fracture risk and to assess response to therapy. Such measurements give a "static" value which can be compared to normal values to aid diagnosis of low bone mass and fracture risk (Schott, Cormier et al. 1998).
  • the World Health Organization defines osteoporosis as present when the bone mineral density levels are more than 2.5 standard deviations below the young normal mean.
  • DXA: dual-energy X-ray absorptiometry
  • QCT: quantitative computed tomography
  • SXA: single-energy X-ray absorptiometry
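  • The World Health Organization criterion quoted above (bone mineral density more than 2.5 standard deviations below the young normal mean) corresponds to a T-score below -2.5. The sketch below is illustrative only; the reference mean and standard deviation used in the example are hypothetical values, not data from the text, and in practice depend on the measurement site, instrument and reference population.

```python
def t_score(bmd, young_mean, young_sd):
    """Number of standard deviations by which a BMD value differs from the
    young normal mean (negative values are below the mean)."""
    return (bmd - young_mean) / young_sd

def is_osteoporotic(bmd, young_mean, young_sd):
    """Apply the threshold as worded in the text: more than 2.5 SD below the mean."""
    return t_score(bmd, young_mean, young_sd) < -2.5

if __name__ == "__main__":
    # Hypothetical reference values in g/cm^2, for illustration only.
    YOUNG_MEAN, YOUNG_SD = 1.00, 0.10
    for bmd in (0.95, 0.80, 0.70):
        print(bmd, round(t_score(bmd, YOUNG_MEAN, YOUNG_SD), 2),
              is_osteoporotic(bmd, YOUNG_MEAN, YOUNG_SD))
```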
  • An alternative method to predict fracture independently of bone mass is to measure bone turnover.
  • High turnover bone resorption and formation
  • This is a "dynamic" measurement which is assessed with biochemical markers in urine or serum and can be used very effectively in therapy monitoring in preference to BMD measurements which alter more slowly (results of PEPI trial and Merck Research Laboratories).
  • biomarkers can provide more accurate fracture predictions over bone mass measurement alone.
  • markers for bone resorption: deoxypyridinoline crosslinks
  • markers for bone formation: bone alkaline phosphatase, osteocalcin
  • the invention discloses methods for performing polymorphism genotyping. These methods can be used to detect the presence of a polymorphism in a sample comprising DNA or RNA.
  • a DNA sample for analysis according to the invention may be prepared from any tissue or cell line, and preparative procedures are well-known in the art. The preparation of genomic DNA is performed as described in Section B.
  • RNA samples may also be useful for genotyping according to the invention. Isolation of RNA can be performed according to the following methods.
  • RNA is purified from mammalian tissue according to the following method. Following removal of the tissue of interest, pieces of tissue of ≤2 g are cut and quick frozen in liquid nitrogen, to prevent degradation of RNA. Upon the addition of a volume of 20 ml tissue guanidinium solution per 2 g of tissue, tissue samples are ground in a tissuemizer with two or three 10-second bursts. To prepare tissue guanidinium solution (1 L), 590.8 g guanidinium isothiocyanate is dissolved in approximately 400 ml DEPC-treated H2O.
  • Homogenized tissue samples are subjected to centrifugation for 10 min at 12,000 x g at 12°C.
  • the resulting supernatant is incubated for 2 min at 65°C in the presence of 0.1 volume of 20% Sarkosyl, layered over 9 ml of a 5.7 M CsCl solution (0.1 g CsCl/ml), and separated by centrifugation overnight at 113,000 x g at 22°C. After careful removal of the supernatant, the tube is inverted and drained.
  • the bottom of the tube (containing the RNA pellet) is placed in a 50 ml plastic tube and incubated overnight (or longer) at 4°C in the presence of 3 ml tissue resuspension buffer (5 mM EDTA, 0.5% (v/v) Sarkosyl, 5% (v/v) 2-ME) to allow complete resuspension of the RNA pellet.
  • RNA solution is extracted sequentially with 25:24:1 phenol/chloroform/isoamyl alcohol, followed by 24:1 chloroform/isoamyl alcohol, precipitated by the addition of 3 M sodium acetate, pH 5.2, and 2.5 volumes of 100% ethanol, and resuspended in DEPC water (Chirgwin et al., 1979, Biochemistry, 18: 5294).
  • RNA is isolated from mammalian tissue according to the following single step protocol.
  • the tissue of interest is prepared by homogenization in a glass teflon homogenizer in 1 ml denaturing solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7.0, 0.1 M 2-ME, 0.5% (w/v) N-laurylsarkosine) per 100 mg tissue.
  • 0.1 ml of 2 M sodium acetate (pH 4), 1 ml of water-saturated phenol, and 0.2 ml of 49:1 chloroform/isoamyl alcohol are added sequentially.
  • the sample is mixed after the addition of each component, and incubated for 15 min at 0-4°C after all components have been added.
  • the sample is separated by centrifugation for 20 min at 10,000 x g, 4°C, precipitated by the addition of 1 ml of 100% isopropanol, incubated for 30 minutes at -20°C and pelleted by centrifugation for 10 minutes at 10,000 x g, 4°C.
  • the resulting RNA pellet is dissolved in 0.3 ml denaturing solution, transferred to a microfuge tube, precipitated by the addition of 0.3 ml of 100% isopropanol for 30 minutes at -20°C, and centrifuged for 10 minutes at 10,000 x g at 4°C.
  • RNA pellet is washed in 70% ethanol, dried, and resuspended in 100-200 ul DEPC-treated water or DEPC-treated 0.5% SDS (Chomczynski and Sacchi, 1987, Anal. Biochem., 162: 156).
  • RNA prepared according to either of these methods can be used for genotyping by the methods of Northern blot analysis, S1 nuclease analysis and primer extension analysis (Ausubel et al., supra).
  • cDNA samples also may be prepared according to the invention, i.e., DNA that is complementary to RNA such as mRNA. The preparation of cDNA is well-known and well-documented in the prior art. cDNA is prepared according to the following method.
  • Total cellular RNA is isolated (as described) and passed through a column of oligo(dT)-cellulose to isolate polyA RNA.
  • the bound polyA mRNAs are eluted from the column with a low ionic strength buffer.
  • short deoxythymidine oligonucleotides (12-20 nucleotides) are hybridized to the polyA tails to be used as primers for reverse transcriptase, an enzyme that uses RNA as a template for DNA synthesis.
  • mRNA species can be primed from many positions by using short oligonucleotide fragments comprising numerous sequences complementary to the mRNA of interest as primers for cDNA synthesis.
  • RNA-DNA hybrid can be converted to a double stranded DNA molecule by a variety of enzymatic steps well-known in the art (Watson et al., 1992, Recombinant DNA, 2nd edition, Scientific American Books, New York).
  • Tissues or fluids which are useful for obtaining a DNA or RNA sample according to the invention include but are not limited to plasma, serum, spinal fluid, lymph fluid, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • Genotyping methods which are useful according to the invention, i.e., for the detection of polymorphisms in nucleic acid samples isolated from individuals, are disclosed below.
  • SSCP Single Strand Conformation Polymorphism
  • fSSCP Fluorescent SSCP Screening
  • In SSCP, single stranded DNAs that contain sequence variations are identified by an abnormal mobility on polyacrylamide gels.
  • SSCP detects all types of point mutations and short insertions or deletions that are located between the PCR primers (within the probe region) with apparently equal efficiency. This technique has proven useful for detection of multiple mutations and polymorphisms, including SNPs.
  • SSCP sensitivity varies dramatically with the size of the DNA fragment being analysed. The optimal size fragment for sensitive detection by SSCP is approximately 125-300bp.
  • the mobility of a single stranded DNA or double stranded DNA fragment during electrophoresis through a gel matrix is dependent on its size. Small molecules migrate more rapidly than large molecules because they pass through the pores in the matrix more easily.
  • electrophoresis of single stranded DNA involves a 'denaturing' gel which maintains the single strandedness of the molecules.
  • the denaturant is typically urea in polyacrylamide gels, and typically formamide or sodium hydroxide in agarose gels.
  • In SSCP, by contrast, single-stranded DNA is analysed on a 'nondenaturing' gel.
  • test DNA samples are prepared for analysis as described above, and subjected to PCR amplification.
  • Oligonucleotide primers are designed and synthesized as described above.
  • Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 uM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci mmol⁻¹
  • Dried gels are exposed to X-OMAT AR film (Kodak) and the autoradiographs are analysed and scored for aberrant migration of bands (band shifts).
  • SSCP may be optimized, as desired, as taught in Glavac et al., 1993, Hum. Mut. 2:404.
fSSCP Analysis
  • fSSCP does not require handling of radioactive materials. Furthermore, the fSSCP technique allows for automated data collection and automated data analysis programs that detect aberrantly migrating samples. In contrast, SSCP evaluation involves visual examination by an individual, and does not provide a means for correcting for lane to lane variations in electrophoretic conditions, as does fSSCP analysis. fSSCP Analysis is performed as follows.
  • Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, dCTP, 0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.
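  • The Taq cycling profile described above can be written down as a small data structure, which also makes the programmed hold time easy to compute. This is a sketch only: the Step class and the constant names are introduced here for illustration, the annealing temperature is left unset because it is primer-pair specific, and instrument ramp times are ignored.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Step:
    label: str
    temp_c: Optional[float]  # None marks the primer-pair-specific annealing temperature
    seconds: int

PREHEAT = Step("preheat", 94.0, 3 * 60)
CYCLE = (
    Step("denature", 94.0, 60),
    Step("anneal", None, 30),
    Step("extend", 72.0, 45),
)
N_CYCLES = 35
FINAL_EXTENSION = Step("final extension", 72.0, 5 * 60)

def total_hold_seconds():
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return PREHEAT.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds

if __name__ == "__main__":
    print(total_hold_seconds() / 60.0, "minutes of programmed hold time")
```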
  • Vent Taq polymerase New England Biolabs
  • Amplifications using Vent Taq polymerase are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, dCTP, 0.2 uM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides.
  • the PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.
  • Two ul of fluorescent PCR products are added to 3 ul formamide dye (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), denatured at 100°C for 5 min, then placed on ice. Thereafter, 0.5-1 ul of Genescan™ 1500 size markers are added as an internal standard.
  • Although SSCP and fSSCP techniques are preferred according to the invention, other methods for detecting sequence variations, including DNA sequencing, can be employed. Additional techniques for detecting DNA sequence variations useful according to the invention are described below.
  • Fluorescence polarization-TDI is another preferred technique according to the invention for the detection of sequence variations.
  • Template-directed primer extension is a dideoxy chain terminating DNA sequencing protocol designed to ascertain the nature of the one base immediately 3' to the sequencing primer that is annealed to the target DNA immediately upstream from the polymorphic site.
  • ddNTP dideoxyribonucleoside triphosphate
  • the primer is extended specifically by one base as dictated by the target DNA sequence at the polymorphic site. By determining which ddNTP is incorporated, the alleles present in the target DNA can be determined.
  • Fluorescence polarization is based on the observation that when a fluorescent molecule is excited by plane-polarized light, it emits polarized fluorescent light into a fixed plane if the molecules remain stationary between excitation and emission. However, because the molecule rotates and tumbles in solution, fluorescence polarization is not observed fully by an external detector.
  • the fluorescence polarization of a molecule is proportional to the molecule's rotational relaxation time, which is related to the viscosity of the solvent, absolute temperature, molecular volume, and the gas constant. If the viscosity and temperature are held constant, then fluorescence polarization is directly proportional to the molecular volume, which is directly proportional to the molecular weight.
  • If the fluorescent molecule is large (with high molecular weight), it rotates and tumbles more slowly in solution and fluorescence polarization is preserved. If the molecule is small (with low molecular weight), it rotates and tumbles faster and fluorescence polarization is largely lost (depolarized).
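  • The quantities named above can be made concrete with two standard relations, which are stated here as assumptions because the text names the variables without giving formulas: polarization P = (I_parallel - I_perpendicular) / (I_parallel + I_perpendicular), usually reported in millipolarization units (mP), and the rotational relaxation time ρ = 3ηV / (RT), with η the solvent viscosity, V the molar volume, R the gas constant and T the absolute temperature. Function names and example values are hypothetical.

```python
GAS_CONSTANT = 8.314  # J mol^-1 K^-1

def polarization_mp(i_parallel, i_perpendicular):
    """Fluorescence polarization in mP from emission intensities measured
    parallel and perpendicular to the plane of the exciting light."""
    return 1000.0 * (i_parallel - i_perpendicular) / (i_parallel + i_perpendicular)

def rotational_relaxation_time(viscosity_pa_s, molar_volume_m3_per_mol, temperature_k):
    """Rotational relaxation time in seconds (rho = 3 * eta * V / (R * T))."""
    return 3.0 * viscosity_pa_s * molar_volume_m3_per_mol / (GAS_CONSTANT * temperature_k)

if __name__ == "__main__":
    print(polarization_mp(600.0, 400.0))  # partially polarized emission
    print(polarization_mp(500.0, 500.0))  # fully depolarized emission
    # At fixed viscosity and temperature, doubling the volume doubles rho.
    small = rotational_relaxation_time(1.0e-3, 1.0e-3, 298.0)
    large = rotational_relaxation_time(1.0e-3, 2.0e-3, 298.0)
    print(large / small)
```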
  • the sequencing primer is an unmodified primer with its 3' end immediately upstream from a polymorphic or mutation site.
  • the allele-specific dye ddNTP is incorporated onto the TDI primer in the presence of DNA polymerase and target DNA.
  • the genotype of the target DNA molecule can be determined simply by exciting the fluorescent dye in the reaction and determining whether a change in fluorescence polarization occurs. Chen et al., 1999, Genome Res., 9:492.
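  • The genotype call described above can be sketched as a simple decision rule: a dye-labeled ddNTP that has been incorporated onto the TDI primer shows an increase in fluorescence polarization, whereas an unincorporated one does not. The function name, the input format (increase in mP over a no-template control for each allele-specific dye) and the 40 mP cut-off are all hypothetical choices for this illustration, not values from the text.

```python
def call_genotype(delta_mp_by_allele, threshold_mp=40.0):
    """Call a genotype from the change in polarization observed for each of
    two allele-specific dye terminators.

    delta_mp_by_allele maps an allele base (e.g. "A") to the increase in
    polarization (mP) relative to a negative control.
    """
    incorporated = sorted(
        allele for allele, delta in delta_mp_by_allele.items() if delta >= threshold_mp
    )
    if len(incorporated) == 1:
        return f"{incorporated[0]}/{incorporated[0]}"   # homozygous
    if len(incorporated) == 2:
        return f"{incorporated[0]}/{incorporated[1]}"   # heterozygous
    return "no call"                                    # no dye incorporated

if __name__ == "__main__":
    print(call_genotype({"A": 85.0, "G": 3.0}))
    print(call_genotype({"A": 60.0, "G": 70.0}))
    print(call_genotype({"A": 2.0, "G": 1.0}))
```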
  • One or more test DNA samples are prepared for analysis as described above, and subjected to PCR amplification.
  • Oligonucleotide primers are designed and synthesized as described above. Amplifications are performed in a total volume of 10 ul containing 50 mM KCl, 10 mM Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of non-radioactive dCTP, 0.05 ul [α-33P] dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 uM each primer, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase.
  • the PCR cycling profile is as follows : preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min.
  • Annealing temperature is different for each PCR primer pair and can be optimized according to the parameters described above.
  • Vent Taq polymerase New England Biolabs
  • Amplifications using Vent Taq polymerase are performed in a total volume of 10 ul using the buffer provided by the manufacturer with 1 mM each of dGTP, dATP, dTTP, 0.02 mM dCTP, 0.25 ul [α-33P] dCTP (1,000-3,000 Ci mmol⁻¹; 10 mCi ml⁻¹), 0.2 uM of each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides.
  • the PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed by a final extension at 72°C for 5 min.
  • the length and temperature of each step of a PCR cycle, as well as the number of cycles, is adjusted in accordance to the stringency requirements, as described above.
  • unused PCR primers and dNTPs are destroyed by adding 2 ul of PCR product to 2 ul of SAP/Exonuclease cocktail (0.1 U shrimp alkaline phosphatase (1 U/ul, Amersham Pharmacia Biotech, Inc., Piscataway, NJ) and 0.2 U E. coli exonuclease I (10 U/ul, Amersham) in SAP buffer (20 mM Tris-HCl, pH 8.0; 10 mM MgCl2, Amersham)) per well of a 384-well Black PCR plate (ABT). The mixtures are incubated at 37°C for 60 min before the enzymes are heat inactivated at 95°C for 15 min. The mixture is held at 4°C until used in the FP-TDI assay.
  • TDI reaction cocktail containing TDI buffer (50 mM Tris-HCl (pH 9.0), 50 mM KCl, 5 mM NaCl, 2 mM MgCl2, 8% glycerol), 1 mM TDI primer, 12.5 nM of each of two allele-specific dye-labeled ddNTPs (ROX-ddGTP, BFL-ddATP, Tamra-ddCTP, or R6G-ddUTP; NEN Life Science Products, Inc., Boston, MA), and 0.32 U Thermo Sequenase (Amersham).
  • the reaction mixtures are incubated at 94°C for 15 min, followed by 34 cycles of 94°C for 30 seconds and 55°C for 15 seconds. Upon completion of the reaction cycles, the samples are held at 4°C.
  • Denaturing gradient gel electrophoresis (DGGE) is a gel system which allows electrophoretic separation of DNA fragments differing in sequence by a single base pair. The separation is based upon differences in the temperature of strand dissociation of the wild-type and mutant molecules.
  • fragments migrating through the gel are exposed to an increasing concentration of denaturant in the gel.
  • the DNA strands begin to dissociate. This dissociation causes a significant reduction in the mobility of the fragment.
  • the position in the gel at which the level of denaturant is critical for a particular DNA fragment is a function of the Tm of the DNA fragment and is therefore different for wild-type versus mutant fragments.
  • the mutation detection rate of DGGE approaches 100%.
  • DGGE can only be used to analyze fragments between 100 and 800 bp due to the resolution limit of polyacrylamide gels.
  • DGGE is advantageous over other methods useful for detecting sequence variations because the behavior of DNA molecules on DGGE gels can be modeled by computer thereby making it possible to accurately predict the detectability of a mutation in a given fragment. Genomic DNA fragments can be efficiently transferred from the gel following DGGE as described in US Patent No. 5,190,856.
  • Chemical Cleavage of Mismatches (CCM) is another technique for detection of sequence variations that is useful according to the invention.
  • CCM is based upon the ability of hydroxylamine and osmium tetroxide to react with the mismatch in a DNA heteroduplex and the ability of piperidine to cleave the heteroduplex at the point of mismatch.
  • sequence variations are detected by the appearance of fragments that are smaller than the untreated heteroduplex following denaturing polyacrylamide gel electrophoresis.
  • DNA fragments up to 1 kb in size can be analysed by CCM with a probable 100% detection rate for sequence variation.
  • CCM is particularly useful for either detecting all of the sequence variations in a particular fragment of DNA or for determining that there are no sequence variations in a particular fragment of DNA.
  • Constant denaturant capillary electrophoresis (CDCE) analysis is particularly useful in high throughput screening, i.e., wherein large numbers of DNA samples are analysed.
  • CDCE analysis combines several elements of both replaceable linear polyacrylamide capillary electrophoresis and constant denaturant gel electrophoresis.
  • the technique of CDCE is a rapid, high resolution procedure that demonstrates a high dynamic range, and is automatable.
  • the method of CDCE as described in detail in Khrapko et al., 1994, Nucleic Acids Res. 22:364, involves the use of a zone of constant temperature and a denaturant concentration in capillary electrophoresis. Linear polyacrylamide gel electrophoresis is performed at viscosity levels that permit facile replacement of the matrix after each run.
  • point mutation-containing heteroduplexes are separated from wild type homoduplexes in less than 30 minutes.
  • the system has an absolute limit of detection of 3 x 10^4 molecules with a linear dynamic range of six orders of magnitude.
  • the relative limit of detection is about 3/10,000, i.e., 100,000 mutant sequences are recognized among 3 x 10^8 wild type sequences. This approach is applicable to analysis of low frequency mutations, and to genetic screening of pooled samples for detection of rare variants.
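  • The detection limits quoted for CDCE can be checked with a few lines of arithmetic. The variable names below are introduced only for this illustration; the numbers are those given in the text.

```python
ABSOLUTE_LIMIT_MOLECULES = 3e4   # smallest detectable number of molecules
DYNAMIC_RANGE_ORDERS = 6         # linear dynamic range, in orders of magnitude

# Upper end of the linear range implied by the two figures above.
upper_limit_molecules = ABSOLUTE_LIMIT_MOLECULES * 10 ** DYNAMIC_RANGE_ORDERS

# Relative limit: 100,000 mutant sequences among 3 x 10^8 wild type sequences.
MUTANT_SEQUENCES = 100_000
WILD_TYPE_SEQUENCES = 3e8
relative_limit = MUTANT_SEQUENCES / WILD_TYPE_SEQUENCES

if __name__ == "__main__":
    print(f"upper limit of linear range: {upper_limit_molecules:.0e} molecules")
    print(f"relative limit: {relative_limit * 10_000:.2f} per 10,000")
```

  • The ratio works out to roughly 3.3 mutant sequences per 10,000 wild type sequences, in agreement with the stated relative limit of about 3/10,000.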
RNase Cleavage
  • The enzymes RNASE A, RNASE T1 and RNASE T2 specifically digest single stranded RNA.
  • When RNA is annealed to form double stranded RNA or an RNA/DNA duplex, it can no longer be digested with these enzymes. However, when a mismatch is present in the duplex, cleavage at the point of mismatch may occur.
  • RNASE Cleavage is preferably performed with RNASE A.
  • Ribonuclease A specifically digests single stranded RNA but can also cleave heteroduplex molecules at the point of mismatch. The extent of cleavage at single base mismatches depends on both the type of mismatch, and the sequence of DNA flanking the mismatch. Sequence variations leading to mismatch are indicated by the presence of fragments that are smaller than the uncleaved heteroduplex on denaturing polyacrylamide gels.
  • RNASE Cleavage involves forming a heteroduplex between a radiolabeled single stranded RNA probe (riboprobe) and a PCR product derived from a biological sample. If a point mutation is present in the PCR product, following treatment of the resulting RNA/DNA heteroduplex with RNASE A, the RNA strand of the duplex may be cleaved. The sample is then denatured by heating and analysed on a denaturing polyacrylamide gel. If the RNA probe has not been cleaved, it will be the same size as the PCR product. If the probe has been cleaved, it will be smaller than the PCR product.
  • RNASE Cleavage can be used to easily detect a 1 bp deletion. However, small insertions may not be as easily detected as small deletions, by RNASE Cleavage, as 'looping-out' occurs on the target strand rather than the probe strand.
Heteroduplex Analysis
  • Another method for genotyping according to the invention is heteroduplex analysis.
  • Heteroduplex molecules i.e., double stranded DNA molecules containing a mismatch
  • the exact rate of detection of sequence variations by heteroduplex analysis is unknown, but is clearly significantly lower than 100%. Presumably, the sequence of DNA flanking the mismatch, rather than the actual mismatch affects the detectability. Mismatches that are located in the middle of a DNA fragment are detected most easily.
  • Although heteroduplex analysis is less sensitive than some of the other genotyping methods described, it may be considered useful according to the invention due to its simplicity.
  • MRD mismatch repair detection
  • MRD is an in vivo method that detects DNA sequence variation by the occurrence of a change in bacterial colony color. DNA fragments to be screened for variation are cloned into two MRD plasmids, and bacteria are transformed with heteroduplexes of these constructs . The resulting colonies are blue in the absence of a mismatch and white in the presence of a mismatch. MRD can be used to detect a single mismatch in a DNA fragment as large as 10 kb in size. MRD permits high-throughput screening of genetic mutations, and is described in detail in Faham et al., 1995, Genome Research 5:474.
Mismatch Recognition by DNA Repair Enzymes
  • Another technique that is useful for detecting sequence variations according to the invention is Mismatch Recognition by DNA Repair Enzymes.
  • the E. coli mismatch correction systems are well-understood.
  • Three of the proteins required for the methyl-directed DNA repair pathway, MutS, MutL and MutH, are sufficient to recognize 7 of the possible 8 single base-pair mismatches (C/C mismatches are not recognized) and cut/nick the DNA at the nearest GATC sequence.
  • the MutY protein, which is involved in a distinct repair system, can also be used to detect A/G and A/C mismatches.
  • thymidine glycosylase can recognize all types of T mismatch and 'all-type endonuclease' or Topoisomerase I is capable of detecting all 8 mismatches, but does so with varying efficiencies, depending on both the type of mismatch and the neighboring sequence.
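  • The count of "8 possible single base-pair mismatches" can be reproduced by listing every unordered pairing of the four bases and discarding the two Watson-Crick pairs. The sketch below is illustrative; the variable names are assumptions, and the recognition sets simply restate what the text says about MutS/MutL/MutH and MutY.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"
WATSON_CRICK = {frozenset("AT"), frozenset("CG")}

# All unordered base pairings that are not Watson-Crick pairs.
mismatches = [
    f"{x}/{y}"
    for x, y in combinations_with_replacement(BASES, 2)
    if frozenset((x, y)) not in WATSON_CRICK
]

# As stated in the text: MutS, MutL and MutH recognize every mismatch except
# C/C, and MutY can be used for A/G and A/C mismatches.
recognized_by_mut_hls = [m for m in mismatches if m != "C/C"]
recognized_by_mut_y = [m for m in mismatches if m in {"A/C", "A/G"}]

if __name__ == "__main__":
    print(mismatches)
    print(len(recognized_by_mut_hls), "of", len(mismatches), "recognized by MutS/MutL/MutH")
```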
  • the MutS gene product is the methyl-directed repair protein which binds to the mismatch.
  • Purified MutS protein has been used to detect mutations by several different methods. Gel mobility assays can be performed in which DNA bound to the MutS protein migrates more slowly through an acrylamide gel than free DNA. This method has been used to detect single base mismatches.
  • An alternative method for the use of MutS in mismatch recognition, which does not require gel electrophoresis, involves the immobilization of MutS protein on nitrocellulose membranes. Labeled heteroduplexed DNA is used to probe the membrane in a dot-blot format. When both DNA strands are used, all mismatches can be recognized by binding of the DNA to the protein attached to the membrane. Although C/C mismatches are not detected, the corresponding G/G mismatch derived from the other strand is recognized. This technique is particularly useful because it is simple, inexpensive, and amenable to automation. However, the detection efficiency of this method may be limited by the size of the DNA fragment. In particular, this method works well for very short fragments.
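  • The point that a C/C mismatch is always accompanied by a G/G mismatch in the reciprocal heteroduplex can be verified by enumeration. In the sketch below (function names are hypothetical), a wild type strand annealed to the complementary strand of the variant gives one mismatch, and the variant strand annealed to the complementary wild type strand gives the other.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def heteroduplex_mismatches(wild_type_base, variant_base):
    """Return the mismatches present in the two possible heteroduplexes formed
    between a wild type and a variant duplex differing at one position."""
    first = "/".join(sorted((wild_type_base, COMPLEMENT[variant_base])))
    second = "/".join(sorted((variant_base, COMPLEMENT[wild_type_base])))
    return first, second

def detectable_by_muts(wild_type_base, variant_base):
    """True if at least one of the two heteroduplexes carries a mismatch other
    than C/C, the only mismatch the text says is not recognized."""
    return any(m != "C/C" for m in heteroduplex_mismatches(wild_type_base, variant_base))

if __name__ == "__main__":
    print(heteroduplex_mismatches("G", "C"))
    print(all(
        detectable_by_muts(wt, var)
        for wt in "ACGT" for var in "ACGT" if wt != var
    ))
```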
  • An alternative method for detecting sequence variations according to the invention is sequencing by hybridization (SBH).
  • arrays of short (8-10 base long) oligonucleotides are immobilized on a solid support in a manner similar to the reverse dot-blot protocol, and probed with a target DNA fragment.
  • oligonucleotides are synthesized together and directly onto the support.
  • the synthesis system begins with a silicon chip coated with a nucleotide linked to a light-sensitive chemical (blocking) group. Light is used to illuminate particular grid co-ordinates, removing the blocking group at these positions.
  • the chip is then exposed to the next photoprotected nucleotide, which polymerizes onto the exposed nucleotides.
  • oligonucleotides of different sequences can be synthesized at different positions on the solid support. Thirty-two cycles of specific additions (i.e., 8 additions of each of the four nucleotides) should enable the production of all 65,536 possible 8-mer oligonucleotides at defined positions on the chip.
  • a DNA molecule e.g., a fluorescently labeled PCR product
  • fully matched hybrids should give a high intensity of fluorescence and hybrids with one or more mismatches should give substantially less intense fluorescence.
  • the combination of the position and intensity of the signals on the chip enables computers to derive the sequence of the DNA molecule being analysed for the presence of sequence variations.
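  • The chip arithmetic above (four nucleotide additions per position, eight positions, every possible 8-mer) can be checked with a short sketch. This is illustrative Python only: the function names are hypothetical, and hybridization is idealised as perfect reverse-complement matching, which a real chip only approximates.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_kmers(k):
    """Every possible oligonucleotide of length k (4**k sequences)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def synthesis_cycles(k):
    """One addition of each of the four nucleotides per position."""
    return len(BASES) * k


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def matching_probes(target, k):
    """Probes that are fully complementary to some k-base window of the target.

    Assumption: only perfect matches are reported; partial (mismatched)
    hybrids, which give weaker signal, are ignored.
    """
    return {reverse_complement(target[i:i + k])
            for i in range(len(target) - k + 1)}


if __name__ == "__main__":
    print(len(all_kmers(8)), "possible 8-mers")
    print(synthesis_cycles(8), "synthesis cycles")
    print(sorted(matching_probes("AAAAAAAAC", 8)))
```

  • In this idealised model a single base change in the target alters up to k of the matching probes, which is what makes the signal pattern informative about sequence variation.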
  • ASO allele-specific oligonucleotide
  • The technique of allele-specific oligonucleotide (ASO) hybridization or the 'dot-blot' is also useful for genotyping according to the invention.
  • an oligonucleotide will only bind to a PCR product if the two are 100% identical.
  • a single base pair mismatch is sufficient to prevent hybridization.
  • a pair of oligonucleotides, one carrying the wild type base and the other carrying a single base change, as compared to the wild type sequence, can be used to determine if a PCR product is homozygous wild type, heterozygous or homozygous mutant for a particular base change.
  • When performing conventional dot blots, the PCR product is fixed onto a nylon membrane and probed with a labeled oligonucleotide.
  • When performing a 'reverse dot blot', an oligonucleotide is fixed to a membrane and probed with a labeled PCR product.
  • the probe may be isotopically labeled, or non-isotopically labeled.
  • the allele-specific polymerase chain reaction (also called the amplification refractory mutation system or ARMS) comprises an assay that occurs during the PCR reaction itself.
  • ARMS requires the use of sequence-specific PCR primers which differ from each other at their terminal 3' nucleotide and are designed to amplify only the normal allele in one reaction, and only the mutant allele in another reaction.
  • amplification occurs.
  • Agarose gel electrophoresis is used to detect the presence of an amplified product.
  • the genotype of a heterozygous sample is characterized by amplification products in both reactions, and a homozygous mutant sample generates product in only the mutant reaction.
  • This technique can be modified so that the 5' ends of the allele-specific primers are labeled with different fluorescent labels, and the 5' ends of the common primers are biotin labeled.
  • the wild-type specific and the mutant-specific reactions are performed in a single tube.
  • the advantages of this approach are that a gel electrophoresis step is not required, and the method is amenable to automation.
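  • The ARMS read-out described above can be sketched as a small decision table. The Python below is a simplified illustration, not the patented procedure: the sequences and function names are hypothetical, and amplification is idealised as "product forms only when the whole primer, including its 3' terminal base, matches the allele".

```python
def allele_specific_primers(flank, wild_type_base, mutant_base):
    """Two primers sharing a 5' flank and differing only at the 3' terminal base."""
    return flank + wild_type_base, flank + mutant_base


def amplifies(primer, allele):
    """Idealised ARMS: a 3' terminal mismatch prevents any amplification."""
    return primer in allele


def call_genotype(sample_alleles, wt_primer, mut_primer):
    """Interpret the wild-type-specific and mutant-specific reactions."""
    wt_product = any(amplifies(wt_primer, a) for a in sample_alleles)
    mut_product = any(amplifies(mut_primer, a) for a in sample_alleles)
    if wt_product and mut_product:
        return "heterozygous"
    if wt_product:
        return "homozygous wild type"
    if mut_product:
        return "homozygous mutant"
    return "no call"


if __name__ == "__main__":
    # Hypothetical alleles differing at one base (C in wild type, T in mutant).
    wt_allele = "TTGATTACACGG"
    mut_allele = "TTGATTACATGG"
    wt_primer, mut_primer = allele_specific_primers("GATTACA", "C", "T")
    print(call_genotype([wt_allele, mut_allele], wt_primer, mut_primer))
```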
  • PIRA primer-introduced restriction analysis
  • the method of primer-introduced restriction analysis can also be used for genotyping according to the invention.
  • PIRA is a technique which allows known sequence variations to be detected by restriction digestion.
  • By introducing a base change close to the position of a known sequence variation, for example by using a PCR primer containing a mismatch as compared to the target sequence, it is possible to create a restriction endonuclease recognition site that indicates the presence of a particular sequence change.
  • the combination of the altered base in the primer sequence and the altered base at the mutation site creates a new restriction enzyme target site. This approach may be used to create a new restriction enzyme site in either the wild-type allele or the mutant allele.
  • the homozygous wild-type form would produce a single band of the full-length size
  • the homozygous mutant form would produce a single band of the reduced size
  • the heterozygous form would produce both full length and reduced sized bands. Band size will be analysed by gel electrophoresis.
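  • The band patterns listed above can be reproduced with a toy digest. The Python below is a sketch under stated assumptions: the amplicon sequences are invented, the recognition site (GAATTC) is chosen only as an example, the site is assumed to be created in the mutant allele, the cut is placed at the start of the site for simplicity, and only the largest fragment of each allele is treated as a visible band.

```python
def digest(amplicon, site):
    """Fragment lengths after cutting at every occurrence of `site`.

    Assumption: the cut is placed at the first base of the site.
    """
    lengths, start = [], 0
    pos = amplicon.find(site, 1)
    while pos != -1:
        lengths.append(pos - start)
        start = pos
        pos = amplicon.find(site, pos + 1)
    lengths.append(len(amplicon) - start)
    return lengths


def call_genotype(alleles, site):
    """Call the genotype from full-length versus reduced-size bands."""
    full_length = max(len(a) for a in alleles)
    # The small primer-sized piece is ignored; keep the largest fragment.
    bands = {max(digest(a, site)) for a in alleles}
    if bands == {full_length}:
        return "homozygous wild type"
    if full_length in bands:
        return "heterozygous"
    return "homozygous mutant"


if __name__ == "__main__":
    site = "GAATTC"
    # The primer contributes "GAATT"; the variant base (G or C) completes the site or not.
    wild_type = "ACGTACGT" + "GAATT" + "G" + "T" * 15
    mutant = "ACGTACGT" + "GAATT" + "C" + "T" * 15
    print(digest(mutant, site), call_genotype([wild_type, mutant], site))
```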
  • Oligonucleotide Ligation Assay. The technique of oligonucleotide ligation can also be used for genotyping according to the invention.
  • oligonucleotide ligation is based on the following observations. If two oligonucleotides are annealed to a strand of DNA and are exactly juxtaposed, they can be joined by the enzyme DNA ligase. If there is a single base pair mismatch at the junction of the two oligonucleotides then ligation will not occur. According to the method of oligonucleotide ligation, the two oligonucleotides used in the assay are modified by the addition of two different labels.
  • the assay involves detecting a ligated product by assaying for the appearance of the labels of the two oligonucleotides on a single molecule rather than visualization of a new, larger sized DNA fragment by gel electrophoresis.
  • the oligonucleotide ligation assay can be performed by a robot and the results can be analysed by a plate reader and fed directly into a computer. This method is therefore extremely useful for detecting the presence of a sequence variation in a large number of samples.
  • the oligonucleotide ligation assay is performed on PCR-amplified DNA.
  • a modification of this assay, termed the ligase chain reaction, is performed on genomic DNA and involves amplification with a thermostable DNA ligase.
  • Direct DNA Sequencing Genotyping may also be carried out by directly sequencing the
  • Mini-Sequencing also known as single nucleotide primer extension
  • the technique of mini-sequencing can also be used to detect any known point mutation, deletion or insertion, according to the invention.
  • Obtaining sequence information for just a single base pair only requires the sequencing of that particular base. This can be done by including only one base in the sequencing reaction rather than all four. When this base is labeled and complementary to the first base immediately 3' to the primer (on the target strand), the label will be incorporated. Thus, a given base pair can be sequenced on the basis of label incorporation or failure of incorporation without the need for electrophoretic size separation.
  • Genotyping according to the invention can also be performed by the method of 5' nuclease assay.
  • the 5' nuclease assay is a technique that monitors the extent of amplification in a PCR reaction on the basis of the degree of fluorescence in the reaction mix. A low level of fluorescence indicates no amplification or very poor amplification and a high level of fluorescence indicates good amplification.
  • This system can be adapted to permit identification of known sequence variations, without the need for any post-PCR analysis other than fluorescence emission analysis.
  • PCR amplification is detected by measuring the 5' to 3' exonuclease activity of Taq polymerase.
  • Taq polymerase cleaves 5' terminal nucleotides of double stranded DNA.
  • the preferred substrate for Taq polymerase is a partially double stranded molecule.
  • Taq polymerase cleaves the strand that contains the closest free 5' end.
  • an oligonucleotide 'probe' which is phosphorylated at its 3' end so as to render it incapable of serving as a DNA synthesis primer, is included in the PCR reaction.
  • the probe is designed to anneal to a position between the two amplification primers.
  • the probe is labeled in a manner that permits detection of the removal of the probe.
  • the probe is labeled at different positions with two different fluorescent labels.
  • One label has a localized quenching effect on the fluorescence of the other (reporter) label. This effect is mediated by energy transfer from one dye to the other, and requires that the two dyes are in close proximity to each other.
  • Genotyping according to the invention can also be carried out by Representational Difference Analysis (RDA).
  • RDA is described in detail in Lisitsyn et al., 1993, Science 259:946, and an adaptation which combines selective breeding with RDA is described in Lisitsyn et al., 1993, Nature Genet. 6:57.
  • RDA identifies sequence dissimilarities through the application of a powerful approach to subtractive hybridization.
  • An amplicon can comprise, for example, the set of BglII fragments that are small enough to be amplified by the PCR.
  • the iterative subtraction step begins with the ligation of a special adaptor to the 5' end of fragments contained in the amplicon derived from the test sample (tester amplicon).
  • the tester amplicon is then melted and briefly reannealed in the presence of a large excess of amplicon, derived from the wild type sample (driver amplicon).
  • Those tester fragments that reanneal can serve as a template for the addition of the adaptor sequence to the 3'-end of the "partner" fragment.
  • these tester fragments can be exponentially amplified by PCR. This procedure is then repeated to achieve successively higher enrichment.
  • RDA may be used to clone sequences that are either wholly absent from the wild type sample or are present in the wild type DNA, but are contained in a restriction fragment that is too large to be amplified in the amplicon.
  • the former case may arise from a total deletion; the latter from a restriction fragment length polymorphism with the short allele present in the tester but not the wild type DNA.
  • RDA is useful for subtracting DNA from an individual with a particular disease from normal DNA so as to identify regions showing homozygous or heterozygous deletions; locating fragments present in a parent with a dominant disorder but absent in his unaffected offspring; and locating mRNAs expressed in normal tissue but not present in tissue isolated from an individual with a particular disease.
  • DHPLC Chromatography
  • partial heat denaturation and a linear acetonitrile gradient are used to identify polymorphisms in DNA fragments.
  • DHPLC provides a method of comparative DNA sequencing based on the capability of ion-pair reverse phase liquid chromatography on alkylated nonporous poly(styrene divinylbenzene) particles to resolve homo- from heteroduplex molecules under conditions of partial denaturation. This method can potentially be automated to allow for rapid analysis of a large number of samples (Underhill et al., 1996, Proc. Natl. Acad. Sci. USA, 93:196).
  • Matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectroscopy is another method according to the invention by which genotyping can be performed.
  • the method of MALDI-TOF mass spectroscopy is based on the irradiation of crystals formed by suitable small organic molecules (referred to as the matrix) with a short laser pulse at a wavelength close to the resonant absorption band of the matrix molecules. This causes an energy transfer and desorption process producing matrix ions. Low concentrations of nucleic acid molecules are added to the matrix molecules while in solution and become embedded in the solid matrix crystals upon drying of the mixture.
  • the intact nucleic acids are then desorbed into the gas phase and ionized upon irradiation with a laser allowing their mass analysis.
  • MALDI is used primarily with time-of-flight spectrometers where the time of flight is related to the mass-to-charge ratio of the nucleic acid molecules. Reviewed in Griffin T.J. and Smith L.M., 2000, Trends Biotech 18:77. Genotyping can be performed by any of the following MALDI-TOF mass spectroscopy approaches including sequencing of PCR products (Fu, D-J et al., 1998, Nat. Biotechnol. 16:381; Kirpekar, F. et al., Nucleic Acids Res.
  • the invention provides methods for specifying a particular polymorphism.
  • by "specifying a polymorphism" is meant defining a polymorphism in the context of a larger region of nucleic acid which contains the polymorphism, and is of sufficient length to be easily differentiated from any other position in the genome.
  • a unique nucleotide position (e.g. a polymorphic site) in the human genome can be specified by describing a unique sequence of DNA within the genome, and providing the location of the unique nucleotide position relative to that sequence. Preferably this is done by providing the sequence identity of a length of unique DNA containing the polymorphism, and indicating which of the nucleotide sites is polymorphic.
  • 16 bp would uniquely define a sequence in the genome.
  • the genome is not composed of random sequence and does not contain equal amounts of A, G, C and T.
  • 10-12 bp sequences are likely to be specific for 95% of genes. Some sequences may even be specified by as few as 8 nucleotides.
  • the minimum sequence length that is useful according to the invention for identifying polymorphisms in most gene and intergenic sequences is approximately 9-15 bp.
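  • The 16 bp figure quoted above follows from a random-sequence model, which can be reproduced in a few lines. The Python below is a sketch only: it assumes a 3,000,000,000 bp genome with equal base frequencies, which the text itself notes is not true of real genomes, and the function names are hypothetical.

```python
GENOME_SIZE = 3_000_000_000  # approximate human genome size in bp (assumption)


def min_unique_length(genome_size, alphabet=4):
    """Smallest k for which there are more possible k-mers than genome positions."""
    k = 1
    while alphabet ** k <= genome_size:
        k += 1
    return k


def expected_occurrences(k, genome_size, alphabet=4):
    """Expected number of times a given k-mer appears in a random genome."""
    return genome_size / alphabet ** k


if __name__ == "__main__":
    print("minimum unique length:", min_unique_length(GENOME_SIZE))
    for k in (8, 12, 15, 16):
        print(k, "bp ->", round(expected_occurrences(k, GENOME_SIZE), 2),
              "expected occurrences")
```

  • Under this model a 15 bp sequence is still expected to occur more than once by chance, while a 16 bp sequence is expected to occur less than once; shorter sequences can be specific in practice only because real genomic sequence is non-random.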
  • In the case of repeat sequences and sequences associated with gene families, the probability of observing a particular sequence is greatly increased and it becomes difficult to specify a polymorphism in the context of a sequence that is only on the order of 9-15 bp.
  • There are many types of repeats including tandem repeats, where a larger sequence block has within it smaller repeat units (e.g. microsatellites). Tandem repeats usually occur within non-genic areas, but can also occur within genes and subsequently affect gene function; they can be 10-1000s of bp long, or, if located in centromeres and telomeres, be megabase sized. Some repeats are composed of blocks which do not have sub-repeat units and are non-functional (e.g. ~300 bp Alu repeats). These occur by duplication/dispersal throughout the genome.
  • a larger region of nucleic acid which contains the polymorphism will be required to define a polymorphism in a gene that is a member of a gene family. It is predicted that a sequence of 9-15 bp will be sufficient to define a polymorphism in 99% of all cases.
  • An oligonucleotide is designed such that it is specific for a target sequence, and hybridizes only at the target sequence site. This oligonucleotide will not hybridize if the target sequence differs at the position in the sequence to be tested.
  • Another oligonucleotide is designed such that it hybridizes with the polymorphic form of the sequence.
  • a DNA sample is tested for hybridization with each of the two probes independently. If the DNA hybridizes to only one of the probes, it can be concluded that the individual is homozygous for the corresponding sequence. If both probes hybridize to a test DNA sample, then the individual is heterozygous. Hybridization will be detected by the method of Southern blot analysis (as described in Section C entitled "Production of a Nucleic Acid Probe").
  • An alternative method for specifying a particular polymorphism involves a PCR-based strategy.
  • a region of a candidate gene to be tested is amplified by PCR (as described).
  • the amplified fragment is digested with a restriction enzyme that will not cut a fragment that contains a polymorphism, due to the location of the polymorphism within the recognition site of this restriction enzyme.
  • the products of the digestion reaction mixture are size separated in an agarose gel, stained with ethidium bromide, and visualized under ultraviolet light to determine if the amplified product has been digested.
  • the PCR primers provide the specificity for a particular polymorphism by virtue of the specific sequence of the two primers, as well as by the location of the primer binding sites in the target DNA.
  • although multiple sites for primer binding may exist in a target DNA sequence, only the sites that are close enough together will produce an amplified product that includes the nucleic acid region containing the polymorphism.
  • a PCR reaction is carried out with PCR primers that contain polymorphisms. According to this embodiment, if the template nucleic acid lacks the polymorphism present in the primers there will be no PCR product. Thus, according to this embodiment of the invention, the absence of a PCR product indicates that a polymorphism is not present in the target sequence.
  • a DNA fragment comprising the region containing a polymorphism is PCR amplified from an individual to be tested.
  • the PCR product is denatured and one strand is retained for analysis.
  • An oligonucleotide probe is designed such that it is specific for a region in the sequence and hybridizes such that its 3' terminal nucleotide is paired with the nucleotide adjacent to the one to be tested.
  • the PCR product and probe are combined with a polymerase and terminating, differentially colored, nucleotides.
  • the polymerase extends the probe by one base, and only the base which is complementary to the site being tested is added. The reaction is washed, and the color of the reaction indicates the nucleotide that has been added and the sequence at the position of interest.
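  • The single-base extension step above can be sketched as follows. This is an illustrative Python model, not the assay itself: the template sequences and the colour assigned to each terminator are hypothetical, and the probe is assumed to anneal perfectly at exactly one place on the retained strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical colour assignment for the four terminating nucleotides.
TERMINATOR_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def extended_base(template, probe):
    """Terminator added to the 3' end of the probe.

    `template` is the retained strand (5'->3'); `probe` is written 5'->3'
    and anneals so that its 3' end pairs with the base adjacent to the
    tested position, which lies immediately 5' of the annealing site.
    """
    site = template.find(reverse_complement(probe))
    if site < 1:
        raise ValueError("probe does not anneal next to a testable base")
    tested = template[site - 1]
    return COMPLEMENT[tested]


def reaction_colour(template, probe):
    """Colour observed after washing, under the hypothetical dye scheme."""
    return TERMINATOR_COLOUR[extended_base(template, probe)]


if __name__ == "__main__":
    probe = reverse_complement("CCGTAAGC")
    print(reaction_colour("TTGCA" + "CCGTAAGC", probe))  # tested base A
    print(reaction_colour("TTGCG" + "CCGTAAGC", probe))  # tested base G
```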
  • the PCR step provides one level of specificity by amplifying a region (1 - 10000 bp as desired between the PCR primers) from a complex (3,000,000,000 bp) mixture.
  • the PCR primers must be unique in both their hybridization specificity and their proximity to one another. Since proximity of the two PCR primers is needed (i.e. a distance across which a polymerase can extend to join the primers), shorter PCR primers can be used, e.g. in theory a small enough region could be amplified with an 8-10 bp binding site for a PCR primer. To ensure that a primer hybridizes with specificity, a primer must be at least 5 bp.
  • a second level of specificity is provided by the primer which is extended in the primer extension reaction. Since this primer is hybridizing to a short piece of DNA, it can be short and unique for the fragment with which it binds.
  • the primer is at least 5 bp and preferably 8 bp.
  • because the primer used for the primer extension step is located adjacent to the polymorphic site, the PCR primers should not overlap with the polymorphic site being tested.
  • One method for detecting a previously defined polymorphism involves Southern blot analysis of wild type and mutant DNA following digestion with a restriction enzyme which has a recognition sequence which includes the polymorphic site to be tested.
  • a particular restriction enzyme cuts wild type DNA but does not cut mutant DNA due to the presence of a polymorphism within the recognition site of this restriction enzyme.
  • Many restriction enzymes exist which recognize 4 bp sequences.
  • the resulting fragments will be size separated in an agarose gel, transferred to a membrane and probed with a nucleic acid probe. If the site is uncut, the fragment is one length and if the site is cut the fragment will be of a shorter length.
  • the nucleic acid hybridization probe will provide specificity to the particular polymorphism being tested by defining the polymorphism in the context of a larger stretch of nucleic acid sequence.
  • the nucleic acid probe may comprise the nucleic acid sequence corresponding to the region known to contain the polymorphism.
  • the sequence-specific probe may be located 10, 100, 1000, or even hundreds of thousands of bases from the region containing the polymorphism. If the probe is located some distance from the region containing the polymorphism, an intervening recognition site for the restriction enzyme must not be located between the probe hybridization site and the region of interest containing the polymorphic site.
  • a hybridization probe useful according to this method will be much larger than the minimum length of a sequence (9-15 bp) required to give specificity to, or define a particular polymorphism.
  • a chemical or enzyme which recognizes a unique pair of nucleotides at the site of a polymorphism can be used to detect the polymorphism.
  • the amount of sequence required for recognition by a chemical or enzyme is 2 bp (providing that the 2 bp sequence is unique in a region large enough to produce a fragment which can then be bound by a specific probe).
  • a labeled chemical or enzyme which binds to one sequence of the polymorphic recognition site and not another is used.
  • This method involves the steps of digesting the DNA with a restriction enzyme, and adding a labeled, sequence-specific binding protein (e.g. a restriction enzyme that lacks cleavage capability).
  • the sequence-specific binding protein will bind to multiple sites in the genome, including the site to be tested.
  • the fragments will be separated on a gel and then probed with a probe specific for the test sequence. If the fragment identified by the second probe is identical to a fragment identified by the first probe (e.g. the labeled chemical or enzyme), then the sequence being tested for is present.
  • the invention provides methods for performing polymorphism genotyping in appropriate populations (described above).
  • the invention also provides in vitro and in vivo assays useful for determining the phenotypic outcome of a polymorphism in a candidate gene.
  • Every polymorphism has the potential to alter the genetic activity of an individual.
  • the effect of a polymorphism can range from an inconsequential, silent change to a change that causes a complete loss of protein function to a gain of aberrant or detrimental function mutation.
  • the severity of the effect of a polymorphism on gene activity will depend on the exact molecular consequences of the particular polymorphism. For example, alterations of a single pre-mRNA splicing dinucleotide could have profound effects on both the quantitative and qualitative properties of gene activity since alterations in splicing efficiency can both reduce the overall level of normal transcription as well as cause "exon skipping".
  • In vitro assays useful for determining the effects of a polymorphism on gene expression and protein function include, but are not limited to, the following.
  • i. Transcriptional Regulation
  • The transcriptional regulation of a candidate gene containing a polymorphism may be altered, as compared to the wild type gene.
  • promoter assays wherein the altered promoter of the candidate gene is used to drive the expression of a reporter gene (e.g. CAT, luciferase, GFP) are performed.
  • Changes in the transcriptional regulation of a candidate gene due to the presence of a polymorphism can also be detected by methods useful for measuring the level of mRNA including S1 nuclease mapping and RT-PCR.
  • the S1 enzyme is a single-strand-specific endonuclease that will digest both single-stranded RNA and DNA.
  • a probe that has been efficiently labeled to a high specific activity at the 5' end through the use of a kinase is used to determine either the amount of an mRNA species or the 5' end of a message.
  • a single stranded probe that is complementary to the sequence of the RNA species of interest is utilized in S1 analysis. If the structure of a particular mRNA species is known, S1 analysis is performed with oligonucleotide probes of at least 40 bp, that are complementary to the RNA of interest.
  • it is preferable to use oligonucleotides wherein the 5' end of the oligonucleotide is complementary to the RNA. It is also preferable to use oligonucleotides wherein the 5' terminal residues contain dG or dC residues. If S1 nuclease analysis will be utilized to determine the 5' termini of an RNA species, the 3' end of the oligonucleotide should extend at least 4 nucleotides beyond the RNA coding sequence. The inclusion of additional nucleotides facilitates differentiation of a band resulting from an RNA:DNA duplex and a band representing the probe.
  • a hybridization probe for S1 analysis is prepared by incubating 2 pmol of an oligonucleotide in the presence of 150 µCi [γ-32P]ATP (3000-7000 Ci/mmol), 2.5 µl 10X T4 polynucleotide kinase buffer (700 mM Tris-Cl, pH 7.5, 100 mM MgCl2, 50 mM dithiothreitol, 1 mM spermidine-Cl, 1 mM EDTA), and 10 U T4 polynucleotide kinase at 37°C for 30-60 minutes.
  • the radiolabeled probe is ethanol precipitated and resuspended at 1 µl/0.3 ng oligonucleotide or 10^5 cpm.
  • the hybridization reaction is performed as follows. An amount of probe equal to 5 x 10^4 Cerenkov counts is added to 50 µg RNA on ice and ethanol precipitated. The resulting pellet is resuspended in 20 µl S1 hybridization solution (80% deionized formamide, 40 mM PIPES, pH 6.4, 400 mM NaCl, 1 mM EDTA, pH 8), denatured for 10 min at 65°C and hybridized overnight at 30°C.
  • RT-PCR reverse transcription/polymerase chain reaction
  • the RNA is converted to first strand cDNA, which is relatively stable and is a suitable template for a PCR reaction.
  • the cDNA template of interest is amplified using PCR. This is accomplished by repeated rounds of annealing sequence-specific primers to either strand of the template and synthesizing new strands of complementary DNA from them using a thermostable DNA polymerase.
  • RNA sample is ethanol precipitated with a cDNA primer. It may be preferable to use a cDNA primer that is identical to one of the amplification primers.
  • To the pellet is added 12 µl H2O, 4 µl 400 mM Tris-Cl, pH 8.3, and 4 µl 400 mM KCl. The mixture is heated to 90°C, slow cooled to 67°C, microfuged and incubated for 3 hours at 52°C.
  • the resulting cDNA pellet is resuspended in 40 µl H2O. 5 µl of the cDNA sample is mixed with 5 µl of each amplification primer (approximately 20 µM each), 4 µl 5 mM 4dNTP mix, 10 µl 10X amplification buffer (500 mM KCl, 100 mM Tris-Cl, pH 8.4, 1 mg/ml gelatin) and 70.5 µl H2O. After the mixture is heated for 2 minutes at 94°C, 0.5 µl (2.5 U) Taq DNA polymerase is added and the sample is overlaid with mineral oil.
  • PCR amplification of the cDNA will be performed using the following automated amplification cycles: 39 cycles (2 minutes at 55°C, 2 minutes at 72°C, 1 minute at 94°C), 1 cycle (2 minutes at 55°C, 7 minutes at 72°C). The number of cycles can be varied in accordance with the abundance of RNA (Ausubel et al., supra).
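  • The reaction set-up and cycling program above lend themselves to a quick arithmetic check. The Python below is a sketch under one stated assumption: the volumes in the extracted text are read as microlitres and the primer stock as micromolar (the unit prefixes were lost in extraction). Variable and function names are hypothetical.

```python
from math import isclose

# Volumes in microlitres (assumed unit), as listed for one amplification reaction.
VOLUMES_UL = {
    "cDNA sample": 5.0,
    "primer 1": 5.0,
    "primer 2": 5.0,
    "4dNTP mix": 4.0,
    "10X amplification buffer": 10.0,
    "H2O": 70.5,
    "Taq DNA polymerase": 0.5,
}

# (number of repeats, step durations in minutes)
PROGRAM = [(39, [2, 2, 1]), (1, [2, 7])]


def total_volume(volumes):
    """Sum of all component volumes."""
    return sum(volumes.values())


def final_concentration(stock, volume, total):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * volume / total


def program_minutes(program):
    """Total hold time of the cycling program, ignoring ramp times."""
    return sum(repeats * sum(steps) for repeats, steps in program)


if __name__ == "__main__":
    total = total_volume(VOLUMES_UL)
    print("total volume (ul):", total)
    print("primer (uM):", final_concentration(20, VOLUMES_UL["primer 1"], total))
    print("each dNTP (mM):", final_concentration(5, VOLUMES_UL["4dNTP mix"], total))
    print("buffer (X):", final_concentration(10, VOLUMES_UL["10X amplification buffer"], total))
    print("program (min):", program_minutes(PROGRAM))
```

  • That the listed volumes sum to a round 100 is consistent with the microlitre reading, although it does not prove it.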
  • assays including but not limited to the yeast two-hybrid assay (Fields et al., 1994, Trends Genet., 10:286) can be used to determine the effects of a polymorphism on transcription factor binding.
  • the protein product of the gene of interest is a DNA binding protein
  • the phenotypic outcome of a polymorphism may be impaired nuclear transport, DNA binding, chromatin assembly or chromatin structure, methylation or histone deacetylation.
  • Immunocytochemical methods or cell fractionation techniques are used to determine if the protein is correctly localized in the nucleus.
  • DNA binding properties of a transcription factor are determined by gel shift analysis (as described in Ausubel et al., supra), oligonucleotide selection, southwestern assays or by immunohistochemical analysis of fixed chromosomes.
  • the method of gel shift analysis is used to detect sequence specific DNA-binding proteins from crude extracts. According to this method, proteins that bind to an end-labeled DNA fragment will retard the mobility of the fragment. The change in the mobility of the labeled fragment is detected by the appearance of a discrete band comprising the DNA-protein complex.
  • a number of methods for preparing nuclear and cytoplasmic extracts useful for gel shift analysis are known in the art. For example, nuclear extracts are prepared according to the following method.
  • a cell pellet is washed in PBS, resuspended in a volume of hypotonic buffer (10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.2 mM PMSF, 0.5 mM DTT) that is approximately equal to 3 times the packed cell volume and allowed to swell on ice for 10 minutes.
  • Cells are homogenized in a glass Dounce homogenizer and the nuclei are collected by centrifugation and resuspended in a volume of low-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 0.02 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume.
  • nuclear extraction is carried out for 30 minutes with continuous gentle stirring.
  • a volume of high-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 1.2 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume is added dropwise, with stirring, to the nuclei.
  • the nuclei are collected by centrifugation and the nuclear extract is dialyzed against 50 volumes of dialysis buffer (20 mM HEPES, pH 7.9, 20% (v/v) glycerol, 100 mM KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) until the conductivities of extract and buffer are equivalent.
  • the extract is removed from the dialysis tubing and analysed for protein concentration (Ausubel et al., supra).
  • Probes useful for gel shift analysis include a fragment of plasmid DNA or a gel-purified double stranded oligonucleotide.
  • the probe is labeled with Klenow fragment by incubating a 100 µl solution of plasmid DNA or oligonucleotide with 100 µCi of the desired [α-32P]dNTP, 4 µl of 5 mM 3dNTP mix and 2.5 U Klenow fragment for 20 minutes at room temperature.
  • after the addition of 4 µl of a solution comprising 5 mM of the dNTP corresponding to the radioactive dNTP, the sample is incubated for 5 minutes at room temperature.
  • the radiolabeled probe is ethanol precipitated, resuspended in TE buffer and gel purified.
  • DNA binding activity is an essential property of proteins involved in many basic cell biological events, such as chromatin structure, transcriptional regulation, DNA replication and repair.
  • the biological activity of a DNA binding protein can be assayed by defining the optimal target DNA binding site.
  • the canonical nucleotide sequence defining the binding site is elucidated in vitro by mixing purified full length protein, or just the DNA binding domain of a protein of interest, with an oligonucleotide duplex pool containing a completely randomized central region flanked by primer-annealing sites. Multiple rounds of immunoprecipitation and amplification by PCR enrich for high affinity sites, which are cloned and sequenced in order to define a canonical binding site.
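  • Once the enriched sites have been cloned and sequenced, the final step of defining a canonical site amounts to a per-position tally. The Python below is a minimal sketch of that step only: it assumes the selected sequences are already aligned and of equal length, the example sequences are invented, and the function names are hypothetical.

```python
from collections import Counter


def position_counts(sites):
    """Base counts at each aligned position of the selected sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be aligned and of equal length")
    return [Counter(column) for column in zip(*sites)]


def consensus(sites):
    """Most frequent base at each position (ties resolved by first seen)."""
    return "".join(counts.most_common(1)[0][0]
                   for counts in position_counts(sites))


if __name__ == "__main__":
    # Invented example of sites recovered after several selection rounds.
    selected = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
    print(consensus(selected))
    print(position_counts(selected)[3])
```

  • The per-position counts are worth keeping alongside the consensus string, since they show which positions tolerate variation.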
  • The ability of a DNA binding protein to correctly regulate chromatin assembly and structure can be determined by DNase hypersensitivity assays. Alternatively, coimmunoprecipitation experiments or Western blot analysis can be used to determine if the DNA binding protein is associated with a component of the chromatin.
  • the ability of a protein to bind DNA is measured by using the "Southwestern" blot technique (for example see Antalis et al., 1993, Gene, 134:201). According to this method, radiolabelled DNA is incubated with protein that has been immobilized on nitrocellulose filters and the amount of bound DNA is measured by scintillation counting or autoradiography followed by densitometry.
  • the protein to be tested can be pure protein, immunoprecipitated protein, crude cell lysates or even recombinant protein denatured directly from bacterial colonies, yeast or cell culture.
  • immunoprecipitation can be used to test for the presence of the protein (Otto and Lee, 1993, Methods Cell Biol., 37: 119, Banting, 1995, In Gene Probes 1: A practical approach. Chapter 8: Antibody probes, pp. 225-227, IRL press.).
  • the following methods are used for determining if a protein of interest is associated with a particular subcellular component.
  • proteins are immunoprecipitated with an antibody specific for a cellular component (e.g.
  • the immunoprecipitated material is analysed on a gel by denaturing polyacrylamide gel electrophoresis and western blot analysis is performed with an antibody specific for the protein of interest, to determine if a physical association exists between the cellular component and the protein of interest.
  • Various incubation and wash treatments of the cell lysate are used to remove background contamination and enhance the sensitivity of detection (Banting, 1995, supra).
  • the initial immunoprecipitation can be carried out with the antibody specific for the protein of interest, and the western blot analysis can be performed with an antibody specific for a cellular component.
  • prior to immunoprecipitation, the cells can be treated with a protein crosslinker to ensure that protein-protein interactions are maintained during immunoprecipitation.
  • proteins can be cross-linked to DNA and then precipitated (Dedon et al., 1991, Anal. Biochem., 197:83). If DNA coprecipitates with a particular protein, this suggests that DNA is associated with, and presumably bound to the protein. The coprecipitating DNA can be sequenced to identify the bound sequence.
  • the transcriptionally active promoter region of a gene can be analysed for susceptibility to cleavage by DNase I (Montecino et al., 1994, Biochemistry, 33:348). Efficient cleavage of genomic DNA is dependent on the accessibility of this enzyme to the DNA, and is influenced by several factors, including nucleosome packaging, overall chromatin configuration, and the presence of DNA binding proteins such as transcription factors. DNA sequence variations within the promoter DNA may have profound effects on these factors and result in aberrant regulation of gene transcription and ultimately abnormal biological activity of the gene. Therefore, altered gene activity around a polymorphic site can be detected as increased or decreased DNase I hypersensitivity (Vaishnaw et al., 1995, Immunogenetics, 41:354).
  • methylation-specific PCR (Herman et al., 1996, Proc Natl Acad Sci USA., 93:9821) is used to determine the methylation status of CpG islands without the use of methylation-specific restriction enzymes.
  • chromatin-packaged genes involves highly regulated changes in nucleosome structure that control DNA accessibility. Changes in nucleosome structure can be mediated by enzymatic complexes which control the acetylation and deacetylation of histones. Transcription elongation is required for the formation of the unfolded structure of transcribing nucleosomes, and histone acetylation is required for the maintenance of these structures (Walia et al., 1998, J. Biol. Chem., 3:14516). Deacetylation can be prevented by incubating cells with histone deacetylase inhibitors such as sodium butyrate or trichostatin A. To assay for changes in acetylation and the state of transcriptional activity, chromatin fractions are purified using organomercury and hydroxylapatite dissociation chromatographic techniques (Walia et al., supra).
  • a polymorphism causes a change in the transcriptional start site of a candidate gene
  • S1 nuclease mapping and primer extension can be performed.
  • the presence of a polymorphism may cause an mRNA to be aberrantly expressed.
  • a polymorphism may change the tissue specificity or developmental expression pattern of an mRNA species.
  • a variety of molecular methods for detecting mRNA known in the art can be performed to determine the expression pattern of an mRNA. These methods include, but are not limited to, the following: Northern blot analysis, RT-PCR, S1 analysis, RNase protection analysis, or in situ hybridization analysis of sections, wherein the samples are derived from multiple different tissues or from a tissue at different stages of development.
  • Northern blot analysis, RT-PCR and S1 analysis can also be used to determine if a polymorphism results in an altered pattern of mRNA splicing.
  • The method of Northern blotting is well known in the art. This technique involves the transfer of RNA from an electrophoresis gel to a membrane support to allow the detection of specific sequences in RNA preparations.
  • RNA sample (prepared by the addition of MOPS buffer, formaldehyde and formamide) is separated on an agarose/formaldehyde gel in 1X MOPS buffer. Following staining with ethidium bromide and visualization under ultraviolet light to determine the integrity of the RNA, the RNA is hydrolyzed by treatment with 0.05M NaOH/1.5M NaCl followed by incubation with 0.5M Tris-Cl (pH 7.4)/1.5M NaCl. The RNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g. Hybond-N membrane, Amersham, Arlington Heights, IL).
  • the membrane is hybridized with a radiolabeled probe in hybridization solution (e.g. in 50% formamide/2.5% Denhardt's/100-200mg denatured salmon sperm DNA/0.1% SDS/5X SSPE) at 42°C.
  • the hybridization conditions can be varied as necessary as described in Ausubel et al., supra and Sambrook et al., supra.
  • the membrane is washed at room temperature in 2X SSC/0.1% SDS, at 42°C in 1X SSC/0.1% SDS, at 65°C in 0.2X SSC/0.1% SDS, and exposed to film.
  • the stringency of the wash buffers can also be varied depending on the amount of background signal (Ausubel et al., supra).
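Because the stringency of the washes is set by the SSC concentration, it can help to compute each wash buffer from concentrated stocks using C1 x V1 = C2 x V2. The sketch below assumes a 20X SSC stock and a 10% SDS stock, which are common laboratory stocks but are not specified in the text; the function name and the 500 ml batch size are likewise hypothetical.

```python
def wash_buffer_recipe(final_ml, ssc_x, sds_pct, ssc_stock_x=20.0, sds_stock_pct=10.0):
    """Volumes (ml) of SSC stock, SDS stock and water for one wash buffer.

    Applies C1 * V1 = C2 * V2 to each component. The stock concentrations
    are assumptions and should be changed to match the stocks actually used.
    """
    ssc_ml = final_ml * ssc_x / ssc_stock_x
    sds_ml = final_ml * sds_pct / sds_stock_pct
    water_ml = final_ml - ssc_ml - sds_ml
    if water_ml < 0:
        raise ValueError("requested concentrations exceed what the stocks allow")
    return {"ssc_stock_ml": ssc_ml, "sds_stock_ml": sds_ml, "water_ml": water_ml}


# The three washes of increasing stringency described above, all at 0.1% SDS.
WASHES = [("room temperature", 2.0), ("42 C", 1.0), ("65 C", 0.2)]
for temperature, ssc_x in WASHES:
    print(temperature, ssc_x, wash_buffer_recipe(500, ssc_x, 0.1))
```

Lowering the SSC concentration or raising the temperature increases stringency, so the same helper can be reused when the wash conditions are varied to reduce background.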
  • RNase protection analysis can be used to analyze RNA structure and amount and determine the endpoint of a specific RNA.
  • RNase protection is more sensitive than S1 analysis since it utilizes a sequence specific hybridization probe that is labeled to a high specific activity.
  • the probe is hybridized to sample RNAs and treated with ribonuclease to remove free probe. Following ribonuclease treatment, the fragments comprising probe annealed to homologous sequences in the sample RNA are recovered by ethanol precipitation, and analysed by electrophoresis on a sequencing gel. The presence of the target mRNA is indicated by the presence of an appropriately sized fragment of the probe.
  • a probe is labeled by the method of in vitro transcription (in the presence of [α-32P]CTP) as described in Section B entitled "Production of a Polynucleotide Sequence".
  • the RNA sample to be analysed is ethanol precipitated and resuspended in 30ml hybridization buffer (4 parts formamide/1 part 200 mM PIPES, pH 6.4, 2M NaCl, 5 mM EDTA) containing 5 x 10^5 cpm of the probe RNA.
  • the mixture is denatured 5 minutes at 85°C and incubated at the desired hybridization temperature (30°C to 60°C) for >8 hours.
  • ribonuclease digestion buffer (10 mM Tris-Cl, pH 7.5, 300mM NaCl, 5mM EDTA) containing 40mg/ml ribonuclease A and 2mg/ml ribonuclease T1.
  • the sample is incubated for 30-60 minutes at 30°C.
  • after the addition of 10 ml 20% SDS and 2.5ml 20mg/ml proteinase K, the sample is incubated for 15 minutes at 37°C.
  • RNA loading buffer 80% (v/v) formamide, 1 mM EDTA, pH 8.0, 0.1 % bromophenol blue, 0.1 % xylene cyanol
  • primer extension is used to map the 5' end of an RNA and to quantitate the amount of an RNA of interest by using reverse transcriptase to extend a primer that is complementary to a region of a given RNA.
  • oligonucleotide primer is labeled in a kinase reaction as described for S1 analysis.
  • the primer extension reaction is performed by mixing 10-50mg total cellular RNA (in 10ml) with 1.5ml 10X Hybridization buffer (1.5M KCl, 0.1M Tris-Cl, pH 8.3, 10mM EDTA) and 3.5 ml labeled oligonucleotide. Samples are heated to 65°C for 90 minutes and allowed to slow cool at room temperature.
  • primer extension reaction mixture (0.9ml Tris-Cl, pH 8.3, 0.9ml 0.5M MgCl2, 0.25ml DTT, 6.75ml 1 mg/ml actinomycin D, 1.33 ml 5 mM 4dNTP mix, 20 ml H2O, 0.2ml 25U/ml AMV reverse transcriptase).
  • Samples are incubated for 1 hour at 42°C, and then, following the addition of 105ml RNase reaction mix (100 mg/ml salmon sperm DNA, 20 mg/ml RNase A), for 15 minutes at 37°C.
  • Samples are extracted in phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in stop/loading dye (20 mM EDTA, pH 8.0, 0.05% bromophenol blue, 0.05% xylene cyanol in formamide), heated at 65°C and analysed by electrophoresis on a 9% acrylamide/7M urea gel and autoradiography.
  • Cytological techniques well known in the art can be used to determine the temporal and spatial expression patterns of mRNA (in situ hybridization of tissue sections) and protein (immunohistochemistry in individual cells).
  • Tissue samples intended for use in in situ detection of either RNA or protein are fixed using conventional reagents; such samples may comprise whole or squashed cells, or sectioned tissue.
  • Fixatives useful for such procedures include, but are not limited to, formalin, 4% paraformaldehyde in an isotonic buffer, formaldehyde (each of which confers a measure of RNase resistance to the nucleic acid molecules of the sample) or a multi-component fixative, such as FAAG (85% ethanol, 4% formaldehyde, 5% acetic acid, 1% EM grade glutaraldehyde).
  • RNase-free, i.e. treated with 0.1% diethylpyrocarbonate (DEPC) at room temperature overnight and subsequently autoclaved for 1.5 to 2 hours.
  • Tissue will be fixed at 4°C, either on a sample roller or a rocking platform, for 12 to 48 hours in order to allow the fixative to reach the center of the sample.
  • DEPC diethylpyrocarbonate
  • Prior to embedding, excess fixative will be removed and the sample will be dehydrated by a series of two- to ten-minute washes in increasingly high concentrations of ethanol, beginning at 60% and ending with two washes in 95% and another two in 100% ethanol, followed by two ten-minute washes in xylene.
  • Samples will be embedded in one of a variety of sectioning supports, e.g. paraffin, plastic polymers or a mixed paraffin/polymer medium (e.g. Paraplast®Plus Tissue Embedding Medium, supplied by Oxford Labware).
  • fixed, dehydrated tissue will be transferred from the second xylene wash
  • the paraffin or a paraffin/polymer resin will be replaced three to six times over a period of approximately three hours to dilute out residual xylene.
  • the sample will be incubated overnight at 58°C under a vacuum, in order to optimize infiltration of the embedding medium into the tissue.
  • BSA bovine serum albumin
  • fixation and embedding are also applicable for use according to the methods of the invention; examples of these are found in Humason, G.L., 1979, Animal Tissue Techniques, 4th ed. (W.H. Freeman & Co., San Francisco), as is frozen sectioning (Serrano et al., 1989, supra).
  • In Situ Hybridization Analysis. According to the method of in situ hybridization, a specifically labeled nucleic acid probe is hybridized to cellular RNA present in individual cells or tissue sections. In situ hybridization can be performed on either paraffin or frozen sections. Depending on the desired sensitivity and resolution, either film or emulsion autoradiography can be utilized to detect the hybridized radioactive probe.
  • the following method of in situ hybridization is performed by incubating slides containing cell or tissue specimens in a slide rack contained within a glass staining dish. According to this method, it is preferable to use solutions that have been prepared fresh. Prior to the hybridization steps, slides are dewaxed to remove the sectioning support material.
  • the dewaxing protocol involves sequential washes in xylene, rehydration by sequential washes in 100%, 95%, 70% and 50% ethanol, and denaturation in 0.2N HCl.
  • samples are postfixed in a freshly prepared solution of 4% PFA, washed in PBS, incubated in 10 mM DTT (10 min at 45°C) and blocked in 400 ml PBS containing 0.617g DTT, 0.74 g iodoacetamide and 0.5g N-ethylmaleimide, for 30 min at 45°C in a water bath covered with aluminum foil, due to the light sensitivity of iodoacetamide and N-ethylmaleimide.
  • the samples are washed in PBS and equilibrated sequentially in freshly prepared 0.
  • TEA buffer 1M triethanolamine
  • TEA buffer/0.25% acetic anhydride 1M triethanolamine
  • TEA buffer/0.5% acetic anhydride 1M triethanolamine
  • the samples are dehydrated by sequential washes in 50%, 70%, 95%, and 100% ethanol and air dried.
  • 35S-labeled riboprobes and competitor probes prepared in the absence of a radiolabel (prepared as described in Section B entitled "Production of a Polynucleotide Sequence") or double-stranded DNA probes (prepared with [35S]dNTPs by methods well known in the art including nick translation or random oligonucleotide-primed synthesis) are heated to 100°C for 3 min and diluted to a concentration of 0.3mg/ml final probe concentration, in 50% formamide, 0.3M NaCl, 10mM Tris-Cl, pH 8.0, 1 mM EDTA, 1X Denhardt's solution, 500mg/ml yeast tRNA, 500mg/ml poly(A) (Pharmacia), 50 mM DTT, 10% polyethylene glycol (MW 6000).
  • the hybridization step is carried out by covering the sample with an appropriate amount of probe, and incubating for 30 min to 4 hours at 45°C in a chamber designed to prevent dilution or concentration of the hybridization solution. Samples are washed sequentially at 55°C in solution A (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol), and solution B (50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol, 0.5% (v/v) Triton X-100) and at room temperature in solution C (2X SSC, 20 mM 2-mercaptoethanol).
  • Gene expression can be regulated by variations in mRNA stability (Liebhaber, 1997, Nucleic Acids Symp Ser., 36:29 and Ross J. 1996, Trends Genet., 5:171). Any gene variation occurring within the cis-acting elements which control mRNA abundance may influence gene expression levels (Peltz et al., 1992, Curr Opin Cell Biol., 4:979). Quantitative RT-PCR (Kohler et al., 1995, Quantitation of mRNA by polymerase chain reaction, Springer) and mRNA radiolabelling techniques are two methods for measuring relative mRNA abundance and stability.
  • Quantitative PCR employs an internal standard to provide a direct comparison between alternative reactions, enabling comparison of low abundance transcripts or transcripts derived from a sample that is only available in a limited quantity (McPherson MJ et al., eds, 1995, PCR 2: A Practical Approach, IRL Press).
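The role of the internal standard, and the way relative abundance translates into a stability estimate, can be illustrated with a short calculation. This is a sketch under stated assumptions: the target signal is divided by the internal-standard signal from the same reaction, and the half-life is obtained by a least-squares fit of ln(abundance) against time, which assumes first-order decay after new transcription has been blocked. The function names and the example values are hypothetical and are not taken from the source.

```python
import math


def normalized_abundance(target_signal, standard_signal):
    """Express the target signal relative to the internal standard of the same reaction."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal


def half_life(times_h, abundances):
    """Fit ln(abundance) = ln(A0) - k*t by least squares and return ln(2)/k in hours."""
    ys = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    k = -sxy / sxx  # first-order decay constant
    if k <= 0:
        raise ValueError("abundance does not decrease over the time course")
    return math.log(2) / k


# Hypothetical time course: (hours, target signal, internal-standard signal).
time_course = [(0, 800.0, 400.0), (1, 410.0, 410.0), (2, 195.0, 390.0), (4, 50.0, 400.0)]
times = [t for t, _, _ in time_course]
relative = [normalized_abundance(target, std) for _, target, std in time_course]
print(round(half_life(times, relative), 2), "hours")
```

Comparing the fitted half-life of transcripts carrying each allele gives a simple readout of whether a polymorphism alters mRNA stability.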
  • RNA Transcription Rates. Genetic polymorphism within the regulatory regions of a gene can significantly alter transcription rate and mRNA stability, resulting in reduced biological activity of the encoded protein.
  • One of the most sensitive assays for measuring the rate of gene transcription is the nuclear runoff assay (Groudine and Casimir, 1984, Nucleic Acids Res 12: 1427). Nuclei isolated from cell lines expressing the target gene of interest are treated with radiolabelled UTP and the level of incorporation of radiolabel into nascent RNA transcripts is determined by filter hybridization to immobilized cDNA derived from the target gene.
  • a genetic variation can cause a change in the localization of a particular mRNA species
  • Changes in RNA localization can be detected by immunohistochemical methods well known in the art (e.g. in situ analysis described above).
  • mRNA like protein
  • the Xenopus oocyte is a popular, experimentally tractable system for studying intracellular trafficking of mRNA (Nakielny et al., 1997, Annu. Rev. Neurosci., 20:269).
  • Fluorescently labelled RNA is microinjected into the large oocyte cell where its location can be detected using standard microscopy methods.
  • Polymorphic variants of a particular mRNA species may differ in their response to cellular mechanisms responsible for partitioning mRNA within the cell. This method has been useful for demonstrating that sequence variations can affect sub-cellular localization (Grimm et al., 1997, EMBO J., 16:793).
  • Post-Translational alterations resulting from premature stop codons, translational readthrough or multiple open reading frames and translational suppression may occur as a result of a polymorphism.
  • a polynucleotide comprising one or more polymorphisms is subjected to in vitro transcription and in vitro translation (as described in sections B and J entitled “Production of a Polynucleotide Sequence” and "Preparation of a Labeled Protein”).
  • the translation product(s) are analysed for the appearance of aberrantly sized proteins.
  • Additional post-translational alterations that may occur as a result of a polymorphism include changes in localization due to an altered signal sequence, and changes in glycosylation, myristoylation, and susceptibility to or sites of proteolytic cleavage.
  • the method of immunocytochemistry can be used to determine if a protein is incorrectly localized, due to the presence of an altered signal sequence.
  • Immunohistochemical techniques including indirect immunofluorescence, immunoperoxidase labeling or immunogold labeling, are used for protein localization.
  • Immunofluorescent labeling of tissue sections (prepared as for in situ analysis, described above) is performed by the following method. Slides containing the sample of interest are equilibrated to room temperature, washed in PBS, incubated with an appropriate dilution of primary antibody (1 hour at room temperature), washed in PBS, incubated with an appropriate dilution of secondary antibody (1 hour at room temperature), washed in PBS and analysed under a microscope (Ausubel et al., supra). Alternatively, the sensitivity of the immunohistochemical reaction is increased by using a streptavidin-secondary antibody conjugate reacted with a biotin-fluorochrome conjugate. Alternatively, immunogold labeling is used to detect a protein of interest by using an immunogold-conjugated secondary antibody. Immunoperoxidase labeling of tissue sections is performed by the following method.
  • Slides are pretreated in 0.25% hydrogen peroxide, incubated with primary antibody, washed in PBS and incubated (1 hour at room temperature) with a specific secondary bridging antibody capable of recognizing both the primary antibody and a horseradish peroxidase anti-peroxidase (PAP) complex.
  • the slides are washed in PBS and developed in diaminobenzidine substrate solution (0.03% (w/v) 3,3' diaminobenzidine in 200 ml PBS) at room temperature (Ausubel et al., supra).
  • protein localization is determined by cell fractionation wherein cells are biosynthetically labeled, the labeled material is fractionated, and the radiolabeled proteins in each fraction are analysed by immunoprecipitation with an antibody specific for the protein of interest.
  • Changes in protein glycosylation can be detected by radiolabelling a protein of interest with sugars, determining if a change in the cellular localization (by immunocytochemistry) of the protein in culture has occurred due to aberrant glycosylation, or by determining the effects of inhibitors of glycosylation on the migration pattern of proteins analysed by polyacrylamide gel electrophoresis.
  • Protein glycosylation can be inhibited by tunicamycin, an antibiotic, as well as by several sugar analogues (Schwarz, 1991, Behring Inst Mitt., 89:198). These reagents are used to characterize the effects of sequence changes on protein glycosylation.
  • Changes in protein modification with lipids are detected by radiolabelling a protein of interest with myristic acid or by determining if a change in the cellular localization of the protein in culture has occurred as a result of aberrant lipid modification (by immunocytochemistry).
  • Covalent attachment of lipids is a mechanism by which eukaryotic cells direct and, in some cases, control, membrane localization of proteins (Casey, 1994, Curr. Opin. Cell. Biol., 2:219). Such post-translational addition of myristyl, palmityl or prenyl side-chains has a key role in the functional regulation of many proteins (Chow et al., 1992, Curr. Opin. Cell. Biol., 4:629; Resh, 1994, Cell, 763:411). Assays for detecting proteins that are covalently modified by the attachment of lipids include labeling with [3H]myristate (Stevenson et al., 1992, J. Exp.
  • Proteolytic Cleavage. Post-translational cleavage of polypeptides is an important mechanism for modulating protein function in many physiological processes. Protease activity is involved in zymogen processing, activation of enzyme catalysis, tissue/cell remodelling, signal transduction cascades, protein degradation and cell death pathways (Rappay, 1989, Prog Histochem Cytochem., 18:1). A protein that is predicted to be a protease or the target of a protease can be assayed in vitro using purified proteins or cell extracts (Muta et al., 1995, J. Biol. Chem. 270:892) where cleavage efficiency is monitored by standard PAGE or western blotting.
  • proteases and/or their targets can be expressed from expression plasmids in in vivo cell culture systems in order to monitor their biological activity (Zhang et al., 1998, J. Biol. Chem. 273:1144).
  • the specificity of proteolytic cleavage is determined using inhibitors that selectively block serine, cysteine, aspartic and metallo proteolytic activity (e.g. pepstatin A selectively inhibits aspartic proteases) (Rich et al., 1985, Biochemistry, 24:3165).
  • pulse chase experiments with radiolabeled protein can be carried out to determine the precursor-product relationship following digestion with a protease of a given specificity.
  • the method of pulse chase labeling is described in Ausubel et al., supra.
  • inhibitors of proteases, e.g. acid proteases or serine proteases
  • a polymorphism may modify the properties of the receptor such that receptor binding/turnover or activation is altered. Receptor formation can be impaired if a polymorphism causes improper receptor localization or assembly.
  • the receptor can be localized by immunocytochemical techniques.
  • cells that are expressing the receptor can be fractionated and subjected to Western blot analysis or biosynthetically labeled, fractionated and analysed by immunoprecipitation.
  • a number of methods can be used to determine if a receptor is colocalized with the appropriate protein partner.
  • a protein may be dependent on the ability of the protein to interact with other proteins as part of a large complex.
  • certain cell surface receptors consist of a receptor complex that is composed of several homo- or heteromeric protein subunits, and activation by ligand can result in altered protein-protein interactions both within the receptor complex and with "downstream" targets such as G-proteins (Okada and Pessin, 1996, J. Biol. Chem., 271:25533).
  • Protein-protein interactions can be assayed immunologically by co-immunoprecipitation of native (Gilboa et al., 1998, J. Biol. Chem., 140:767) or chemically cross-linked complexes (Haniu et al., 1997, J.
  • Receptor Binding/Turnover. Receptor-ligand interaction is essential for the functionality of the bound complex.
  • Receptor binding/turnover can be measured by standard Scatchard analysis of radiolabelled ligand binding in vitro (Culouscou et al., 1993, J. Biol. Chem. 268:10458) or in cellular based assays (Greenlund et al., 1993, J. Biol. Chem. 268: 18103).
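A Scatchard analysis plots bound/free ligand against bound ligand; for a single class of non-interacting sites the points fall on a line with slope -1/Kd and an x-intercept equal to Bmax. The sketch below fits that line by ordinary least squares. The function name, the units and the data are hypothetical (the data are generated from a one-site model with Kd = 2 and Bmax = 100 purely for illustration), and a linear Scatchard plot is itself an assumption that does not hold for cooperative or multi-site binding.

```python
def scatchard_fit(bound, free):
    """Return (Kd, Bmax) from a least-squares line through bound/free versus bound.

    Model: bound/free = (Bmax - bound) / Kd, so slope = -1/Kd and
    the x-intercept is Bmax. Assumes a single class of binding sites.
    """
    xs = list(bound)
    ys = [b / f for b, f in zip(bound, free)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    bmax = -intercept / slope
    return kd, bmax


# Hypothetical data: free ligand (nM) and specifically bound ligand (fmol).
free_nM = [0.5, 1.0, 2.0, 4.0, 8.0]
bound_fmol = [100.0 * f / (2.0 + f) for f in free_nM]
kd, bmax = scatchard_fit(bound_fmol, free_nM)
print(f"Kd = {kd:.2f} nM, Bmax = {bmax:.1f} fmol")
```

Running the same fit on receptors encoded by each allele allows a change in Kd (affinity) to be distinguished from a change in Bmax (number of binding sites).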
  • affinity chromatography methods can be employed to determine if a receptor is demonstrating aberrant binding characteristics. According to the method of affinity chromatography, receptor-ligand interactions are allowed to occur, and the binding efficiency of receptor and ligand and/or turnover of receptor-ligand complexes is measured. Alternatively, affinity chromatography can be used to isolate one or more components of a receptor ligand interaction for further analysis (March et al., 1974, Adv. Exp. Med. Biol., 42:3).
  • the method of affinity chromatography typically involves immobilizing on a solid support one component, for example a known ligand for a receptor, and then incubating the immobilized ligand with radiolabelled protein under optimal binding conditions. To measure the exact binding affinity of a given ligand-receptor pair, an increasing amount of non-labeled competitor is added. This assay can be used to assess altered binding efficiency resulting from the presence of a polymorphism in a protein of interest.
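Data from a competition experiment of this kind are commonly summarized as an IC50, the competitor concentration that displaces half of the labeled binding, and converted to an inhibition constant with the Cheng-Prusoff relation Ki = IC50 / (1 + [L]/Kd). The source does not name this analysis, so the sketch below is one conventional choice and not the described method; the function names, the log-linear interpolation and the example values are hypothetical.

```python
import math


def ic50_by_interpolation(competitor_conc, fraction_bound):
    """Interpolate on log10(concentration) to find where fraction_bound crosses 0.5.

    competitor_conc must be increasing and fraction_bound decreasing
    (labeled binding expressed as a fraction of the no-competitor control).
    """
    points = list(zip(competitor_conc, fraction_bound))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1:
            if f0 == f1:
                return c0
            step = (f0 - 0.5) / (f0 - f1)
            log_ic50 = math.log10(c0) + step * (math.log10(c1) - math.log10(c0))
            return 10 ** log_ic50
    raise ValueError("fraction bound never crosses 0.5 in the tested range")


def cheng_prusoff_ki(ic50, labeled_conc, labeled_kd):
    """Ki = IC50 / (1 + [L]/Kd) for simple competitive binding at one site."""
    return ic50 / (1.0 + labeled_conc / labeled_kd)


# Hypothetical competition curve (competitor in nM, fraction of control binding).
conc_nM = [1.0, 10.0, 100.0, 1000.0]
fraction = [0.90, 0.60, 0.20, 0.05]
ic50 = ic50_by_interpolation(conc_nM, fraction)
print(ic50, cheng_prusoff_ki(ic50, labeled_conc=1.0, labeled_kd=1.0))
```

A shift in Ki between the protein variants, measured under identical labeled-ligand conditions, would indicate altered binding efficiency attributable to the polymorphism.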
  • Receptor Activation Assays: Phosphorylation, Kinase Activity and Mitogenic Stimulation
  • the biological function of a receptor is usually assayed in cell culture following over-expression.
  • the phosphorylated state of a receptor can be assayed directly by immunological methods by employing an antibody that specifically recognizes a phosphorylated residue (Bangalore, 1992, Proc Natl Acad Sci USA., 89:11637).
  • Endogenous kinase activity associated with a receptor is measured via the incorporation of radiolabelled phosphate in immunoprecipitated receptor complex (Kazlauskas and Cooper, 1989, Cell 58:1121).
  • Downstream events of receptor activity including mitogenic stimulation or MAP kinase activity can be measured by tritiated thymidine incorporation (Luo et al., 1996, Cancer Res. 56:4983), or by mobility-shift analysis of MAP kinase on western blots (Vietor, 1993, J. Biol. Chem. 268:18994), respectively.
  • Immunocytochemical methods can be used to determine if a receptor-ligand complex is correctly translocated to the nucleus.
  • nuclear preparations prepared as described below
  • Western blot or immunoprecipitation for the presence of the receptor protein.
  • a receptor is a transcriptional activator
  • the ability of the receptor to induce gene expression can be measured by a variety of methods including Northern blot analysis, or reporter gene assays wherein the promoter region isolated from a gene that is activated by the receptor regulates the expression of a reporter protein.
  • the gene of interest may encode a protein that has an enzymatic activity wherein the enzyme catalyzes a reaction that is critical to the general metabolism of a cell.
  • assays can be performed to measure the enzymatic activity of the protein.
  • the protein of interest may also be involved in various aspects of DNA synthesis or replication.
  • In vitro assays for the enzymatic reactions involved in DNA synthesis or replication e.g. polymerase, ligase, exonuclease or helicase activity
  • the biological activity of the proteins catalyzing these activities is assayed in vitro using standard enzymatic techniques (Adams, 199, DNA Replication: A Practical Approach I, Rickwood, et al., Eds., IRL Press. Oxford, England).
  • assays for measuring transporter activity or the activity of ATP dependent pumps are useful, according to the invention, for determining if a mutated protein is impaired in these functions.
  • the full-length cDNA clone is isolated by standard expression cloning strategies, and a change in activity of the full-length cDNA or antisense cDNA upon microinjection into Xenopus laevis oocytes is determined by measuring changes in influx/efflux transport of radiolabelled amino acid molecules (Broer et al., 1995, Biochem J., 312(Pt 3):863), neurotransmitters or their metabolites.
  • the coupling ratios e.g. moles substrate transported/mole ATP hydrolyzed
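The coupling ratio is a direct quotient of the two measured quantities. A minimal sketch, assuming both amounts have already been converted to moles over the same time interval; the function name and the example values are hypothetical.

```python
def coupling_ratio(substrate_transported_mol, atp_hydrolyzed_mol):
    """Moles of substrate transported per mole of ATP hydrolyzed."""
    if atp_hydrolyzed_mol <= 0:
        raise ValueError("ATP hydrolyzed must be positive")
    return substrate_transported_mol / atp_hydrolyzed_mol


# Hypothetical measurements for a reference and a variant pump (mol per assay).
reference = coupling_ratio(3.0e-9, 1.5e-9)
variant = coupling_ratio(1.2e-9, 1.5e-9)
print(reference, variant, variant / reference)
```

A variant that hydrolyzes ATP at the normal rate but transports less substrate shows a reduced ratio, which points to impaired coupling and not to reduced expression.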
  • the gene of interest may encode for a protein that is a component of an ion channel. Immunocytochemical methods can be used to determine if an ion channel protein demonstrates the appropriate cell type specificity.
  • the activity of an ion channel can be measured by electrophysiological methods in oocytes. Alternatively, the sensitivity of ion channel activity to a particular inhibitor can be determined.
  • This technology represents a useful system for studying various aspects of ion channels encoded for by foreign mRNAs including channel expression, single-channel behavior, and the response of channels to the action of pharmacologically active substances (Sigel, 1987 J. Physiol., 386: 73).
  • the function of individual channel proteins is determined by the high resolution patch clamp technique.
  • This technique (which is useful in a variety of cell types, including Xenopus oocytes described above) involves measuring changes in transmembrane current across the cell membrane in vitro (Sachs et al., 1983, Methods Enzymol., 103:147). Processes such as signaling, secretion, and synaptic transmission are examined at the cellular level by the patch clamp method.
  • the gene expression pattern and protein structure of ionic channels can be determined by combining information derived from high-resolution electrophysiological recordings obtained by the patch clamp method with molecular biological analysis (Liem et al., 1995, Neurosurgery, 36: 382).
  • a polymorphic variation in a gene that encodes a protein that is a member of a multimeric protein complex, such as an ion channel or a cytoskeletal structural component, can alter the assembly and function of the multimeric protein complex (Lee et al., 1994, Biophys J., 66:667).
  • a gene variation may affect protein-protein interaction, or disrupt the production of components of a multimeric complex, thereby disrupting stoichiometry and consequently decreasing stability.
  • In vitro assembly assays (described above) can be performed to determine if a polymorphism has affected the assembly of an ion channel.
  • the influence of a polymorphism on general aspects of cell behavior, including cell morphology, adhesive properties, differentiation and proliferation can be assessed using a combination of methods including microscopic observation of cell cultures (Azuma et al., 1994, Histol. Histopathol., 9:781), immunohistochemistry, and FACS analysis techniques (Beesley, 1993, Immunocytochemistry: A Practical Approach, Rickwood et al., (Eds), IRL Press and Ormerod, 1994, Flow Cytometry: A Practical Approach, Rickwood et al., (Eds), IRL Press. Oxford, England).
  • Apoptosis has been implicated in the etiology and pathophysiology of a variety of human diseases.
  • Gene variants which influence the process of apoptosis can be assessed by a variety of methods of analysis involving either the tissues or cells (Allen et al., 1997, J Pharmacol Toxicol Methods, 37: 215).
  • Cell cultures expressing the gene variants of interest are analysed using Annexin V which interacts strongly with phosphatidylserine residues that have been exposed as a result of plasma membrane breakdown occurring in the early stages of apoptosis.
  • TdT-mediated deoxyuridine triphosphate (dUTP)-biotin nick end-labeling (TUNEL) is a preferred method for specific staining of apoptotic cells in histological sections and cytology specimens (Labat-Moleur et al., 1998, J. Histochem Cytochem., 46:327; Sasano et al., 1998, Diagn Cytopathol., 18:398).
  • Apoptosis is also detected by quantification of DNA fragmentation by ethidium bromide staining and gel electrophoresis, or by the use of saturation labeling of 3' ends of DNA fragments (Peng and Liu, 1997, Lab Invest., 77:547).
  • Assay for In Vivo Receptor Function: Growth Cone Guidance Assay. Activation of cell-surface receptors can result in the stimulation of cell motility.
  • signaling molecules, for example the netrins (Serafini et al., 1994, Cell 78:409), which are responsible for both contact mediated or chemo-mediated attraction and repulsion of migrating cells.
  • a classic model for this activity is the trajectory that the leading edge "growth cone" takes when a neuron is stimulated to grow out from explanted neural tissue in cell culture (Goodman, 1996, Annu Rev Neurosci. 19: 341).
  • Ligands present in the culture medium or immobilized on a substrate bind to receptors on the cell-surface of the growth cone and trigger second-messenger signals thereby dictating an appropriate steering response.
  • The biological activity of such receptors or ligands can be measured by overexpressing the receptor or ligand protein in culture and then monitoring growth cone guidance (Kremoser et al., 1995, Cell, 82: 359). Attraction or repulsion of cells that differs from normal indicates a role for the protein in growth guidance and identifies the polymorphism as altering function.
  • Changes in gene expression or protein function that result from the presence of a polymorphism can be detected by in vivo assays including the production of transgenic animals, knock out animals or the analysis of naturally occurring animal models of a particular disease.
  • Transgenic mice provide a useful tool for genetic and developmental biology studies and for the determination of a function of a novel sequence. According to the method of conventional transgenesis, additional copies of normal or modified genes are injected into the male pronucleus of the zygote and become integrated into the genomic DNA of the recipient mouse. The transgene is transmitted in a Mendelian manner in established transgenic strains.
  • Constructs useful for creating transgenic animals comprise genes under the control of either their normal promoters or an inducible promoter, reporter genes under the control of promoters to be analysed with respect to their patterns of tissue expression and regulation, and constructs containing dominant mutations, mutant promoters, and artificial fusion genes to be studied with regard to their specific developmental outcome.
  • Transgenic mice are useful according to the invention for analysis of the dominant effects of overexpressing a candidate gene in mouse. Typically, DNA fragments on the order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998, New Anat., 253: 19).
  • Transgenic animals can be created with a construct comprising a candidate gene containing one or more polymorphisms according to the invention.
  • Transgenic mice engineered to overexpress a number of genes, including PCK1 (Valera et al., 1994, Proc. Natl. Acad. Sci. USA, 91: 9151), INS (Mitanchez et al., FEBS Letters, 421: 285), IAPP (D'Alession et al., 1994, Osteoporosis, 43: 1457), Asp (Klebig et al., Proc. Natl. Acad. Sci. USA, 92: 4728) and Agrt (Graham et al., Nature Genetics, 17: 273), have been prepared and may be useful for studying osteoporosis.
  • Knock out animals are produced by the method of creating gene deletions with homologous recombination. This technique is based on the development of embryonic stem (ES) cells that are derived from embryos, are maintained in culture and have the capacity to participate in the development of every tissue in the mouse when introduced into a host blastocyst. A knock out animal is produced by directing homologous recombination to a specific target gene in the ES cells, thereby producing a null allele of the gene. The potential phenotypic consequences of this null allele (either in heterozygous or homozygous offspring) can be analysed (Reeves, supra).
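  • As a minimal illustration of the heterozygous and homozygous offspring mentioned above, the following sketch enumerates the genotypes expected when two mice heterozygous for a null allele are intercrossed. It assumes simple Mendelian segregation at a single targeted locus and no embryonic lethality; the allele symbols ("+" for wild type, "-" for null) and the function name are illustrative and are not taken from the cited references.

```python
from collections import Counter
from itertools import product


def intercross(parent_a=("+", "-"), parent_b=("+", "-")):
    """Count offspring genotypes, drawing one allele from each parent."""
    counts = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort the pair so that "+/-" and "-/+" are counted as one genotype.
        genotype = "/".join(sorted((allele_a, allele_b)))
        counts[genotype] += 1
    return counts


if __name__ == "__main__":
    for genotype, count in sorted(intercross().items()):
        print(genotype, count)
```

  • Under these assumptions one quarter of the offspring are homozygous for the null allele, which is the class in which a recessive loss-of-function phenotype would be examined.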
  • Single or double knock out mice that may be useful for studying osteoporosis have been produced for a number of genes including IRS1 (Araki et al., 1994, Nature, 372:186; Tamemoto et al., 1994, Nature, 372:182), IRS2 (Withers et al., 1998, Nature, 391:900), INSR, BJJRKO, MJJRKO, INSR (Lamothe et al., 1998, FEBS Letters, 426:381), GLUT2, GLUT4 (Katz et al., 1995, Nature, 377:151), GLP1R (Gallwitz and Schmidt, 1997, Z.
  • The method of targeted homologous recombination has been improved by the development of a system for site-specific recombination based on the bacteriophage P1 site-specific recombinase Cre.
  • The Cre-loxP site-specific DNA recombinase from bacteriophage P1 is used in transgenic mouse assays in order to create gene knockouts restricted to defined tissues or developmental stages. Regionally restricted genetic deletion, as opposed to global gene knockout, has the advantage that a phenotype can be attributed to a particular cell/tissue (Marth, 1996, J. Clin. Invest., 97: 1999).
  • In the Cre-loxP system, one transgenic mouse strain is engineered such that loxP sites flank one or more exons of the gene of interest.
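  • The logic of a tissue-restricted deletion can be sketched as follows. This is a toy model rather than a description of any particular construct: the gene is represented as a list of labels, the two loxP sites are assumed to be in the same orientation (so that recombination excises the intervening segment and leaves a single loxP site), and the tissue names and Cre expression pattern are hypothetical.

```python
def excise_floxed(elements, cre_active):
    """Return the gene structure after Cre action on a two-loxP allele."""
    if not cre_active:
        return list(elements)
    sites = [i for i, label in enumerate(elements) if label == "loxP"]
    if len(sites) != 2:
        # This sketch only models a single pair of loxP sites.
        return list(elements)
    first, second = sites
    # Keep everything up to and including the first loxP, drop the flanked
    # segment and the second loxP, and keep the downstream remainder.
    return list(elements[: first + 1]) + list(elements[second + 1:])


# Hypothetical floxed allele and hypothetical tissue-specific Cre expression.
FLOXED_GENE = ["exon1", "loxP", "exon2", "loxP", "exon3"]
CRE_EXPRESSION = {"bone": True, "liver": False}

if __name__ == "__main__":
    for tissue, cre_on in CRE_EXPRESSION.items():
        print(tissue, excise_floxed(FLOXED_GENE, cre_on))
```

  • In this model the flanked exon is lost only in the tissue where Cre is active, which is what allows a phenotype to be attributed to that tissue.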
  • Amplified products useful according to the invention can be prepared by utilizing the method of PCR as described in Section B entitled "Production of a Polynucleotide Sequence".
  • Primers useful for producing an amplified product according to the invention can be designed and synthesized as described in Section A entitled “Design and Synthesis of Oligonucleotide Primers”.
  • The invention provides methods (e.g., Southern blot analysis, PCR, primer extension and oligonucleotide hybridization) of detecting a polymorphism in an amplified product.
  • Polynucleotide sequences which encode candidate gene protein fragments, fusion proteins or functional equivalents thereof may be used in recombinant DNA molecules that direct the expression of a candidate gene protein in appropriate host cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the same or a functionally equivalent amino acid sequence may be used to clone and express the candidate gene protein. As will be understood by those of skill in the art, it may be advantageous to produce candidate gene-encoding nucleotide sequences possessing non-naturally occurring codons.
  • Codons preferred by a particular prokaryotic or eukaryotic host can be selected, for example, to increase the rate of protein expression or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life as compared to transcripts produced from the naturally occurring sequence.
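  • The use of codon degeneracy described above can be illustrated with a short sketch that recodes a coding sequence without changing the encoded amino acids. The preferred-codon table below is hypothetical and deliberately partial; in practice it would be derived from codon usage data for the chosen host. The example sequence and function names are likewise illustrative.

```python
# Subset of the standard genetic code, sufficient for the example sequence.
CODON_TO_AA = {
    "ATG": "M", "AAA": "K", "AAG": "K", "CTG": "L", "TTA": "L",
    "AGC": "S", "TCT": "S", "GAA": "E", "GAG": "E", "TAA": "*",
}

# Hypothetical host-preferred codon for each amino acid (assumption).
PREFERRED_CODON = {
    "M": "ATG", "K": "AAA", "L": "CTG", "S": "AGC", "E": "GAA", "*": "TAA",
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    return "".join(CODON_TO_AA[codon] for codon in codons)


def recode(dna):
    """Replace each codon with the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(dna))


if __name__ == "__main__":
    original = "ATGAAGTTATCTGAGTAA"
    optimized = recode(original)
    print(optimized)
    # The nucleotide sequence changes, the encoded protein does not.
    assert translate(optimized) == translate(original)
```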
  • Nucleotide sequences of the present invention can be engineered in order to alter a candidate gene-encoding sequence for a variety of reasons, including but not limited to, alterations which modify the cloning, processing and/or expression of the gene product. For example, mutations may be introduced using techniques which are well known in the art, e.g., site-directed mutagenesis to insert new restriction sites, to alter glycosylation patterns, to change codon preference or to produce splice variants.
  • A natural, modified or recombinant candidate gene protein-encoding sequence may be ligated to a heterologous sequence to encode a fusion protein (as described in Section B entitled "Production of a Polynucleotide Sequence").
  • A fusion protein may also be engineered to contain a cleavage site located between a candidate protein and the heterologous protein sequence, so that the protein of interest may be substantially purified away from the heterologous moiety following cleavage.
  • The sequence encoding the candidate gene protein may be synthesized, whole or in part, using chemical methods well known in the art (see Caruthers et al., 1980, Nuc Acids Res Symp Ser, 7:215; Horn et al., 1980, Nuc Acids Res Symp Ser, 225, etc.).
  • The protein itself, or a portion thereof, could be produced using chemical methods of synthesis.
  • Peptide synthesis can be performed using various solid-phase techniques (Roberge et al., 1995, Science, 269:202) and automated synthesis may be achieved, for example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the manufacturer.
  • The newly synthesized peptide can be substantially purified by preparative high performance liquid chromatography (e.g., Creighton, 1983, Proteins, Structures and Molecular Principles, WH Freeman and Co., New York NY).
  • The composition of the synthetic peptides may be confirmed by amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra). Additionally, the amino acid sequence of interest, or any part thereof, may be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins, or any part thereof, to produce a variant polypeptide.
  • Expression Systems. In order to express a biologically active protein, the nucleotide sequence encoding the protein of interest, or its functional equivalent, is inserted into an appropriate expression vector, i.e., a vector which contains the necessary elements for the transcription and translation of the inserted coding sequence.
  • A variety of expression vector/host systems may be utilized to contain and express a protein product of a candidate gene according to the invention. These include but are not limited to microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid or cosmid DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems infected with virus expression vectors (e.g., baculovirus); plant cell systems transfected with virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with bacterial expression vectors (e.g., Ti or pBR322 plasmid); or animal cell systems.
  • Control elements or “regulatory sequences” of these systems vary in their strength and specificities and are those nontranslated regions of the vector, enhancers, promoters, and 3' untranslated regions, which interact with host cellular proteins to carry out transcription and translation.
  • Any number of suitable transcription and translation elements, including constitutive and inducible promoters, may be used.
  • Inducible promoters such as the hybrid lacZ promoter of the Bluescript® phagemid (Stratagene, La Jolla CA) or pSport1 (Gibco BRL) and ptrp-lac hybrids and the like may be used.
  • The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers derived from the genomes of plant cells (e.g., heat shock, RUBISCO, and storage protein genes) or from plant viruses (e.g., viral promoters or leader sequences) may be cloned into the vector. In mammalian cell systems, promoters from mammalian genes or from mammalian viruses are most appropriate. If it is necessary to generate a cell line that contains multiple copies of the sequence encoding the protein product of the gene of interest, vectors based on SV40 or EBV may be used with an appropriate selectable marker. In bacterial systems, a number of expression vectors may be selected depending upon the use intended for the protein of interest.
  • Vectors which direct high level expression of fusion proteins that are readily purified may be desirable.
  • Such vectors include, but are not limited to, the multifunctional E. coli cloning and expression vectors such as Bluescript® (Stratagene), in which the sequence encoding the protein of interest may be ligated into the vector in frame with sequences encoding the amino-terminal Met and the subsequent 27 residues of β-galactosidase so that a hybrid protein is produced; pIN vectors (Van Heeke & Schuster, 1989, J Biol Chem 264:5503); and the like.
  • pGEX vectors may also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST).
  • Such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione.
  • Proteins made in such systems are designed to include heparin, thrombin or factor Xa protease cleavage sites so that the cloned polypeptide of interest can be released from the GST moiety at will.
  • In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or inducible promoters such as alpha factor, alcohol oxidase and PGH may be used.
  • The expression of a sequence encoding a protein of interest may be driven by any of a number of promoters.
  • Viral promoters such as the 35S and 19S promoters of CaMV (Brisson et al., 1984, Nature 310:511) may be used alone or in combination with the omega leader sequence from TMV (Takamatsu et al., 1987, EMBO J 3:17).
  • Plant promoters such as the small subunit of RUBISCO (Coruzzi et al., 1984, EMBO J 3:1671; Broglie et al., 1984, Science, 224:838) or heat shock promoters (Winter I and Sinibaldi RM, 1991, Results Probl Cell Differ., 17:85) may be used. These constructs can be introduced into plant cells by direct DNA transformation or pathogen-mediated transfection. For reviews of such techniques, see Hobbs S or Murry LE in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp 191-196 or Weissbach and Weissbach (1988) Methods for Plant Molecular Biology, Academic Press, New York, pp 421-463.
  • An alternative expression system which could be used to express a protein of interest is an insect system.
  • Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae.
  • The sequence encoding the protein of interest may be cloned into a nonessential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the sequence encoding the protein of interest will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein.
  • The recombinant viruses are then used to infect S. frugiperda cells or Trichoplusia larvae in which the protein of interest is expressed (Smith et al., 1983, J Virol 46:584; Engelhard et al., 1994, Proc Natl Acad Sci 91:3224).
  • A number of viral-based expression systems may be utilized.
  • A sequence encoding the protein of interest may be ligated into an adenovirus transcription/translation complex consisting of the late promoter and tripartite leader sequence. Insertion in a nonessential E1 or E3 region of the viral genome will result in a viable virus capable of expressing in infected host cells (Logan and Shenk, 1984, Proc Natl Acad Sci, 81:3655).
  • Transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used to increase expression in mammalian host cells.
  • Specific initiation signals may also be required for efficient translation of a sequence encoding the protein of interest. These signals include the ATG initiation codon and adjacent sequences. In cases where the sequence encoding the protein, its initiation codon and upstream sequences are inserted into the most appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous translational control signals including the ATG initiation codon must be provided. Furthermore, the initiation codon must be in the correct reading frame to ensure translation of the entire insert. Exogenous translational elements and initiation codons can be of various origins, both natural and synthetic.
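  • The reading-frame requirement can be checked computationally before a construct is made. The sketch below is a simplified illustration: it locates the first ATG in a sequence, reads codons in that frame until a stop codon is reached, and separately tests whether an insert position is in frame with a vector-supplied ATG. The example sequence, coordinates and function names are hypothetical, and real constructs would also need the adjacent sequence context to be considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq):
    """Return (start, end) of the ORF at the first ATG, or None.

    'end' is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None  # no in-frame stop codon found


def is_in_frame(atg_pos, insert_pos):
    """True if insert_pos lies a whole number of codons after atg_pos."""
    return (insert_pos - atg_pos) % 3 == 0


if __name__ == "__main__":
    construct = "GGATGAAACCCTAGTT"  # hypothetical sequence
    print(first_orf(construct))
    print(is_in_frame(2, 8), is_in_frame(2, 9))
```

  • An insert joined one or two nucleotides out of frame would be read as different codons from the junction onward, which is why the frame check is applied to the junction position rather than to the insert alone.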
  • Expression efficiency may be increased by including enhancers appropriate to the cell system in use (Scharf et al., 1994, Results Probl Cell Differ, 20:125; Bittner et al., 1987, Methods in Enzymol, 153:516).
  • A host cell strain may be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed protein in the desired fashion.
  • Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation and acylation.
  • Post-translational processing which cleaves a "prepro" form of the protein may also be important for correct insertion, folding and/or function.
  • Different host cells such as CHO, HeLa, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms for such post-translational activities and may be chosen to ensure the correct modification and processing of the introduced, foreign protein.
  • Cell lines which stably express a foreign protein may be transformed using expression vectors which contain viral origins of replication or endogenous expression elements and a selectable marker gene. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an enriched media before they are switched to selective media.
  • The purpose of the selectable marker is to confer resistance to selection, and its presence allows growth and recovery of cells which successfully express the introduced sequences. Resistant clumps of stably transformed cells can be expanded using tissue culture techniques appropriate to the cell type.
  • Any number of selection systems may be used to recover transformed cell lines. These include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler et al., 1977, Cell 11:223) and adenine phosphoribosyltransferase (Lowy et al., 1980, Cell 22:817) genes which can be employed in tk- or aprt- cells, respectively.
  • Antimetabolite, antibiotic or herbicide resistance can be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad Sci 77:3567); npt, which confers resistance to the aminoglycosides neomycin and G-418 (Colbere-Garapin et al., 1981, J Mol Biol, 150:1); and als or pat, which confer resistance to chlorsulfuron and phosphinothricin acetyltransferase, respectively (Murry, supra).
  • Although marker gene expression suggests that the gene of interest is also present, its presence and expression should be confirmed.
  • When the sequence encoding the foreign protein is inserted within a marker gene sequence, recombinant cells containing that sequence can be identified by the absence of marker gene function.
  • Alternatively, a marker gene can be placed in tandem with the sequence encoding the foreign protein under the control of a single promoter. Expression of the marker gene in response to induction or selection usually indicates expression of the tandem sequences as well.
  • Host cells which contain the coding sequence for a protein of interest and express the protein of interest may be identified by a variety of procedures known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and protein bioassay or immunoassay techniques which include membrane, solution, or chip based technologies for the detection and/or quantification of the nucleic acid or protein.
  • The presence of the polynucleotide sequence encoding the protein of interest can be detected by DNA-DNA or DNA-RNA hybridization or amplification using probes, portions or fragments of the sequence encoding the foreign protein of interest.
  • A variety of protocols for detecting and measuring the expression of the foreign protein, using either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence-activated cell sorting (FACS).
  • A two-site, monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two non-interfering epitopes on the protein of interest is preferred, but a competitive binding assay may be employed. These and other assays are described in Hampton et al., 1990, Serological Methods, a Laboratory Manual, APS Press, St Paul MN and Maddox et al., 1983, J Exp Med 158:1211.
  • Host cells transformed with a nucleotide sequence encoding a protein of interest may be cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture.
  • The protein produced by a recombinant cell may be secreted or contained intracellularly depending on the sequence and/or the vector used.
  • Expression vectors containing a sequence encoding a protein of interest can be designed with signal sequences which direct secretion of the protein of interest through a prokaryotic or eukaryotic cell membrane.
  • Recombinant constructions may join the sequence encoding the protein of interest to the nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble proteins (Kroll et al., 1993, DNA Cell Biol, 12:441).
  • The protein of interest may also be expressed as a recombinant protein with one or more additional polypeptide domains added to facilitate protein purification.
  • Such purification facilitating domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle WA).
  • The inclusion of cleavable linker sequences, such as those specific for Factor Xa or enterokinase (Invitrogen, San Diego CA), between the purification domain and the protein of interest is useful for facilitating purification.
  • One such expression vector provides for expression of a fusion protein comprising the sequence encoding a foreign protein and nucleic acid sequence encoding 6 histidine residues followed by thioredoxin and an enterokinase cleavage site.
  • The histidine residues facilitate purification while the enterokinase cleavage site provides a means for purifying the foreign protein from the fusion protein.
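  • The arrangement of such a fusion protein, and the effect of cleavage, can be sketched at the level of amino acid strings. The sketch assumes the usual enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys with cleavage after the lysine, and assumes that the foreign protein lies on the C-terminal side of the cleavage site. The thioredoxin segment is represented by a placeholder string rather than the real sequence, and the foreign protein sequence is invented for illustration.

```python
HIS_TAG = "H" * 6                  # six histidine residues
TRX_PLACEHOLDER = "GSGSGS"         # stand-in for thioredoxin (not the real sequence)
ENTEROKINASE_SITE = "DDDDK"        # cleavage occurs after the K


def build_fusion(protein):
    """Assemble tag, carrier, cleavage site and protein, N- to C-terminus."""
    return HIS_TAG + TRX_PLACEHOLDER + ENTEROKINASE_SITE + protein


def cleave(fusion):
    """Split the fusion after the first enterokinase site.

    Returns (tag_fragment, released_protein); if no site is present the
    fusion is returned uncut with an empty second element.
    """
    index = fusion.find(ENTEROKINASE_SITE)
    if index == -1:
        return fusion, ""
    cut = index + len(ENTEROKINASE_SITE)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    fusion = build_fusion("MKTAYIAK")  # hypothetical foreign protein
    tag_fragment, released = cleave(fusion)
    print(tag_fragment)
    print(released)
```

  • Because the histidine residues remain on the tag fragment after cleavage, a second pass over the immobilized metal would be expected to retain the tag and leave the released protein in the flow-through.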
  • Fragments of the protein of interest may be produced by direct peptide synthesis using solid-phase techniques (Stewart et al., 1969, Solid-Phase Peptide Synthesis, WH Freeman Co., San Francisco; Merrifield, 1963, J Am Chem Soc, 85: 2149).
  • In vitro protein synthesis may be performed using manual techniques or by automation. Automated synthesis may be achieved, for example, using the Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer, Foster City CA) in accordance with the instructions provided by the manufacturer.
  • Various fragments of a protein of interest may be chemically synthesized separately and combined using chemical methods to produce the full length molecule.
  • Antibodies specific for the protein products of the candidate genes of the invention are useful for protein purification, for the diagnosis and treatment of various diseases (e.g., osteoporosis) and for drug screening and drug design methods useful for identifying and developing compounds to be used in the treatment of various diseases (e.g., osteoporosis).
  • An antibody useful in the invention may comprise a whole antibody, an antibody fragment, a polyfunctional antibody aggregate, or in general a substance comprising one or more specific binding sites from an antibody.
  • The antibody fragment may be a fragment such as an Fv, Fab or F(ab')2 fragment or a derivative thereof, such as a single chain Fv fragment.
  • The antibody or antibody fragment may be non-recombinant, recombinant or humanized.
  • The antibody may be of an immunoglobulin isotype, e.g., IgG, IgM, and so forth.
  • An aggregate, polymer, derivative and conjugate of an immunoglobulin or a fragment thereof can be used where appropriate.
  • Neutralizing antibodies are especially useful according to the invention for diagnostics, therapeutics and methods of drug screening and drug design.
  • Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five amino acids and preferably at least 10 amino acids. Preferably, they should be identical to a region of the natural protein and may contain the entire amino acid sequence of a small, naturally occurring molecule. Short stretches of amino acids corresponding to the protein product of a candidate gene of the invention may be fused with amino acids from another protein such as keyhole limpet hemocyanin or GST, and antibody will be produced against the chimeric molecule.
  • Procedures well known in the art can be used for the production of antibodies to the protein products of the candidate genes of the invention.
  • Various hosts, including goats, rabbits, rats, mice, etc., may be immunized by injection with the protein products (or any portion, fragment, or oligopeptide thereof which retains immunogenic properties) of the candidate genes of the invention.
  • Various adjuvants may be used to increase the immunological response.
  • Such adjuvants include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, and dinitrophenol.
  • BCG (Bacilli Calmette-Guerin) and Corynebacterium parvum are potentially useful human adjuvants.
  • The antigen protein may be conjugated to a conventional carrier in order to increase its immunogenicity, and an antiserum to the peptide-carrier conjugate will be raised. Coupling of a peptide to a carrier protein and immunizations may be performed as described (Dymecki et al., 1992, J. Biol. Chem., 267: 4815).
  • The serum can be titered against protein antigen by ELISA (below) or alternatively by dot or spot blotting (Boersma and Van Leeuwen, 1994, J. Neurosci. Methods, 51: 317).
  • The antiserum may be used in tissue sections prepared as described. A useful serum will react strongly with the appropriate peptides by ELISA, for example, following the procedures of Green et al., 1982, Cell, 28: 477.
  • Monoclonal antibodies. Techniques for preparing monoclonal antibodies are well known, and monoclonal antibodies may be prepared using a candidate antigen whose level is to be measured or which is to be either inactivated or affinity-purified, preferably bound to a carrier, as described by Arnheiter et al., 1981, Nature, 294: 278.
  • Monoclonal antibodies are typically obtained from hybridoma tissue cultures or from ascites fluid obtained from animals into which the hybridoma tissue was introduced.
  • Monoclonal antibody-producing hybridomas (or polyclonal sera) can be screened for antibody binding to the target protein.
  • Antibody Detection Methods. Particularly preferred immunological tests rely on the use of either monoclonal or polyclonal antibodies and include enzyme-linked immunoassays (ELISA), immunoblotting and immunoprecipitation (see Voller, 1978, Diagnostic Horizons, 2:1, Microbiological Associates Quarterly Publication, Walkersville, MD; Voller et al., 1978, J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No. 31,006; UK Patent 2,019,408; Butler, 1981, Methods Enzymol, 73: 482; Maggio, E.
  • Labeling techniques are useful, according to the invention, for studying the biochemical properties, processing, intracellular transport, secretion and degradation of proteins.
  • Biosynthetic labeling of proteins produced by candidate genes of the invention is preferably performed with 35S-methionine due to the high specific activity (>800 Ci/mmol) and ease of detection of this amino acid.
  • Another amino acid should be used to label a protein that contains little or no methionine.
  • Either suspension cells or adherent cells are labeled with 35S-methionine. Briefly, cells are washed and incubated for 15 min at 37°C in short-term labeling medium (complete serum-free, methionine-free RPMI or DMEM containing 5% (v/v) dialyzed fetal bovine serum) to deplete intracellular pools of methionine.
  • Cells can be labeled in the presence of 35S-methionine in long-term labeling medium (90% methionine-free RPMI or DMEM) for up to 16 hours (Ausubel et al., supra).
  • The protein product of the cloned candidate gene of the invention can be produced by the methods of in vitro transcription and in vitro translation.
  • In vitro transcription is performed essentially as described in Section B entitled "Production of a Polynucleotide Sequence" in the absence of a labeled ribonucleoside.
  • The RNA produced by the in vitro transcription reaction will be extracted with phenol, ethanol precipitated twice and resuspended in 10 µl of TE buffer.
  • In vitro translation is performed by adding 1 to 10 µl of RNA to an in vitro translation kit (e.g., wheat germ or reticulocyte lysate) in the presence of 15 µCi [35S]methionine, following the directions provided by the manufacturer.
  • A typical reaction is carried out in a 30 µl volume at room temperature for 30 to 60 minutes (Ausubel et al., supra).
  • Mammalian cells expressing a nucleotide sequence comprising a polymorphism are useful, according to the invention, for determining the biochemical and functional properties of the protein product of a nucleotide sequence comprising a polymorphism, for analyzing expression of a candidate gene, for large scale production of a protein of interest, for drug screening and for the production of transgenic animals or knockout mice.
  • Methods of efficiently introducing foreign DNA into mammalian cells include calcium phosphate transfection, DEAE-dextran transfection, electroporation and liposome-mediated transfection (Ausubel et al., supra).
  • The method of calcium phosphate transfection involves preparing a precipitate by slowly mixing a HEPES-buffered saline solution with a mixture of calcium chloride and DNA. According to this method, up to 10% of the cells on a dish will incorporate DNA. Cells to be transfected are split one day prior to transfection so that on the day of transfection cells are well-separated on the plate. A 10 cm dish of cells is fed with 9.0 ml of complete medium approximately 2 to 4 hours before the addition of the precipitate.
  • DNA to be transfected (10-50 µg/10-cm plate) is ethanol precipitated, resuspended in 450 µl sterile water and mixed with 50 µl of 2.5 M CaCl2.
  • The DNA/CaCl2 solution is added dropwise to a 15-ml conical tube containing 500 µl 2X HeBS (0.283 M NaCl, 0.023 M HEPES acid, 1.5 mM Na2HPO4, pH 7.05). It is preferable to bubble the HeBS solution during the addition of the DNA mixture.
  • After the precipitate has formed for 20 minutes at room temperature, it is added evenly to the cells.
  • The cells are incubated with the precipitate at 37°C in a humidified CO2 incubator for 4-16 hours. Following removal of the precipitate, the cells are washed with PBS and fed in complete medium.
  • Glycerol or dimethyl sulfoxide shock can be used to increase the DNA uptake by certain types of cells (Ausubel et al., supra).
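  • The concentrations that result from the calcium phosphate mixing steps above follow from simple dilution arithmetic, sketched below. Only the ratios of the volumes matter, so the three volumes are entered as the figures quoted in the protocol (450, 50 and 500) in a common unit. The sketch ignores the volume contributed by the DNA pellet, and the variable and function names are illustrative.

```python
def dilute(concentration, volume_added, total_volume):
    """Concentration after a component is diluted into total_volume."""
    return concentration * volume_added / total_volume


# Volumes as quoted in the protocol, all in the same unit.
WATER_VOL = 450.0
CACL2_STOCK_VOL = 50.0
HEBS_2X_VOL = 500.0
CACL2_STOCK_M = 2.5

# Step 1: DNA in water plus CaCl2 stock.
dna_mix_vol = WATER_VOL + CACL2_STOCK_VOL
cacl2_in_dna_mix_m = dilute(CACL2_STOCK_M, CACL2_STOCK_VOL, dna_mix_vol)

# Step 2: the DNA/CaCl2 mix is added to an equal volume of 2X HeBS.
final_vol = dna_mix_vol + HEBS_2X_VOL
cacl2_final_m = dilute(cacl2_in_dna_mix_m, dna_mix_vol, final_vol)

hebs_2x = {"NaCl_M": 0.283, "HEPES_M": 0.023, "Na2HPO4_mM": 1.5}
hebs_final = {
    name: dilute(value, HEBS_2X_VOL, final_vol)
    for name, value in hebs_2x.items()
}

if __name__ == "__main__":
    print("CaCl2 in DNA mix (M):", cacl2_in_dna_mix_m)
    print("CaCl2 in final mixture (M):", cacl2_final_m)
    print("HeBS components in final mixture:", hebs_final)
```

  • The equal-volume mixing brings the 2X HeBS to its 1X working strength, which is the reason the buffer is prepared at twice the final concentration.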
  • Cells to be transfected are plated at a concentration such that after 3 days of growth they are 30-50% confluent.
  • The DNA to be transfected (approximately 4 µg) is ethanol precipitated, resuspended in 40 µl TBS and added slowly while shaking to 80 µl of warm 10 mg/ml DEAE-dextran in TBS.
  • Cells are shocked by the addition of 5 ml of 10% DMSO in PBS. After a 1 minute incubation at room temperature, cells are washed with PBS and fed with complete medium (Ausubel et al., supra).
  • DNA can be introduced into cells by the use of high-voltage electric shocks, a technique termed electroporation.
  • electroporation cells are suspended in an appropriate electroporation buffer and placed in an electroporation cuvette.
  • the cuvette is connected to a power supply and the cells are subjected to a high-voltage electrical pulse of a defined magnitude and length, optimized for the cell type being transfected. After a brief period of recovery, the cells are placed in normal culture medium.
  • A population of cells to be transfected by electroporation is grown to late-log phase in complete medium. Typically, stable transfection requires 5 x 10^6 cells, and transient transfection requires 1-4 x 10^7 cells. Cells are harvested by centrifugation for 5 minutes at 640 x g at 4°C. The resulting cell pellet is resuspended in half of the original volume of ice-cold electroporation buffer (e.g. PBS without calcium or magnesium, HEPES-buffered saline, tissue culture medium without serum, or phosphate-buffered sucrose (272 mM sucrose/7 mM K2HPO4, pH 7.4/1 mM MgCl2)). The choice of an electroporation buffer is dictated by the cell line.
  • Cells are then harvested by centrifugation for 5 minutes at 640 x g at 4°C, and resuspended at 1 x 10^7/ml in electroporation buffer at 0°C for stable transfection or at a higher concentration (up to 8 x 10^7/ml) for transient transfection. Aliquots of the cells (0.5 ml) are transferred into the desired number of electroporation cuvettes and placed on ice.
  • DNA is added to the cell suspension in the cuvettes on ice.
  • DNA (optimally 1-10 µg) should be linearized with a restriction enzyme that cuts at a site in a non-essential region, purified by phenol extraction and ethanol precipitated.
  • Supercoiled DNA (optimally 10 µg) may be used for transient transfection. The DNA/cell suspension is mixed and incubated on ice for 5 minutes.
  • the cuvette is placed in the holder in the electroporation apparatus (at room temperature) and shocked one or more times at the desired voltage and capacitance settings.
  • An electroporation apparatus useful according to the invention is the Bio-Rad Gene Pulser.
  • the number of shocks and the voltage and capacitance settings will vary depending on the cell type, and should be optimized. The two parameters that are critical for successful electroporation are the maximum voltage for the shock and the duration of the current pulse.
  • the cuvette containing the mixture of cells and DNA is incubated on ice for 10 minutes.
  • the transfected cells are diluted 20-fold in complete culture medium.
  • For stable transfection, cells are grown for 48 hours in nonselective medium and then transferred to antibiotic-containing medium.
  • For transient transfection, cells are incubated for 50-60 hours and then harvested for the desired transient assay.
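  • The cell numbers quoted for electroporation can be tied together with a small planning calculation: 0.5 ml aliquots at 1 x 10^7 cells/ml give 5 x 10^6 cells per cuvette, which matches the figure given for stable transfection. The helper below is a hypothetical sketch; its name and defaults are assumptions, not part of the protocol.

```python
# Hypothetical helper for planning electroporation aliquots.

def plan_cuvettes(total_cells, cells_per_ml=1e7, aliquot_ml=0.5):
    """Return (resuspension volume in ml, whole cuvettes, cells per cuvette)."""
    if total_cells <= 0 or cells_per_ml <= 0 or aliquot_ml <= 0:
        raise ValueError("all quantities must be positive")
    volume_ml = total_cells / cells_per_ml
    # Round down to whole cuvettes, tolerating tiny floating-point error.
    n_cuvettes = int(volume_ml / aliquot_ml + 1e-9)
    cells_per_cuvette = cells_per_ml * aliquot_ml
    return volume_ml, n_cuvettes, cells_per_cuvette

volume, n, per_cuvette = plan_cuvettes(4e7)
print(volume, n, per_cuvette)
```

  • For transient transfection, the same helper can be called with a higher density, for example `plan_cuvettes(4e7, cells_per_ml=8e7)`.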
  • Transgenic animals expressing a construct comprising a candidate gene containing a polymorphism according to the invention can be produced by methods well known in the art (reviewed in Reeves et al., supra). Knock-out mice wherein a candidate gene according to the invention has been disrupted can be produced by methods well known in the art (reviewed in Moreadith and Radford, 1997, J. Mol. Med., 75:208 and Shastry, 1998, Mol. Cell. Biochem., 181:163). These animals provide useful models for studying the functional consequences of one or more polymorphisms in a gene of interest.
  • the invention provides a method of producing a candidate gene library comprising genes that are potentially associated with the susceptibility to, or pathogenesis of a disease.
  • a candidate gene library is useful for determining the genetic basis of a disease of interest. Genetic susceptibility to a disease must occur as a result of specific DNA differences relative to non-susceptible individuals. In the case of osteoporosis, many genes are known which are potentially involved in the susceptibility to, or pathogenesis of the disease. These genes are included in the candidate gene library and the association of these genes with osteoporosis is determined from population studies according to the invention.
  • Unlike linkage studies, wherein a region of the genome that is thought to be involved in a disease is determined, the candidate gene strategy, including association studies, addresses the involvement of a particular gene in a disease.
  • the results of association studies of candidate genes are used to identify genes that should be intensively studied as potential therapeutics or therapeutic targets.
  • the full range of polymorphic sites within each candidate gene is identified and examined in diseased and normal populations. The frequency of each gene variant (allele) in each population is then compared to the other. If a specific polymorphism under analysis contributes to the disease phenotype, it will be present in the diseased population at a higher frequency than in the normal population.
  • Alternatively, if the specific polymorphism under analysis does not itself contribute to the disease phenotype but resides elsewhere in, or near, a gene containing a contributory polymorphism, a significant association may still be seen with the polymorphic marker being tested. This is because the two markers are in linkage disequilibrium with each other due to their close proximity.
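  • The comparison of allele frequencies between diseased and normal populations can be sketched as a 2 x 2 chi-square test on allele counts. This is one common way to test such an association, not necessarily the statistic used in the studies described here; the counts in the example are invented for illustration, and the function name is hypothetical.

```python
import math

def allele_association(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). The p-value uses the identity
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented example: 200 chromosomes sampled from each population.
chi2, p = allele_association(60, 140, 40, 160)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

  • A small p-value indicates only that the allele frequencies differ; as noted above, the tested marker may simply be in linkage disequilibrium with the contributory polymorphism.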
  • The goal of linkage studies is to determine the approximate position of disease genes by studying related individuals in families. According to linkage strategies, DNA markers that are randomly spaced throughout the genome, but are rarely located within genes, are tested for the frequency of their presence along with the particular disease phenotype. There is approximately a 50% chance of an unlinked gene and marker co-segregating. If a particular marker is present at a significantly higher frequency than expected in diseased individuals, this indicates that the marker is located in the vicinity of the disease gene. Usually the disease gene is delimited to a large region (containing tens to hundreds of genes). After a disease gene has been grossly mapped, this entire region must be extensively characterized to determine what genes are present in the region. Any gene that is identified according to this method becomes a candidate gene.
  • a series of genetic crosses is performed in an animal model system of a particular defect that is characteristic of a disease of interest (e.g. osteoporosis) between individuals having an observable mutant phenotype and normal individuals of a control strain.
  • At least one disease-related locus is used as a marker in these crosses.
  • Linkage analysis can be performed using chromosomal markers that do not comprise a disease-related locus (described below). If non-random assortment of the mutant trait with a marker locus is observed, and if that non-random assortment is statistically significant (for example, if a Student's t test or ANOVA is applied to the results), the trait is linked to the marker locus.
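  • One widely used way to quantify non-random assortment of a trait with a marker is the LOD score, which compares the likelihood of the observed recombinants at a recombination fraction theta against the 50% expected for unlinked loci. The text names Student's t test and ANOVA rather than LOD scores, so the sketch below is a swapped-in illustration for the simple phase-known case; the function name and example counts are assumptions.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """LOD = log10[ L(theta) / L(0.5) ] for phase-known meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("recombinants must be between 0 and n_meioses")
    non_recombinants = n_meioses - n_recombinants
    return (non_recombinants * math.log10(1.0 - theta)
            + n_recombinants * math.log10(theta)
            - n_meioses * math.log10(0.5))

# Invented example: 2 recombinants observed in 20 informative meioses.
print(round(lod_score(20, 2, 0.1), 2))
```

  • By convention, a LOD score of 3 or more is usually taken as evidence for linkage.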
  • Pedigree analysis is a useful technique for identifying genes for which variant alleles may contribute to the risk, onset or progression of a disease in a family containing multiple individuals afflicted with a disease; according to this method, numerous genetic loci from affected and unaffected family members are compared. Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
  • YAC yeast artificial chromosome
  • BAC bacterial artificial chromosome
  • An initial evaluation may be performed with the assistance of a computer program, such as the PathCalling™ (CuraGen) biological pathway discovery platform.
  • All or a subset of the open reading frames present in the region are then cloned (e.g., by PCR) from mutant animals or affected family members and from their healthy counterparts (either control animals or unaffected family members), and the sequences of these open reading frames are compared.
  • If a mutation or other allelic variant is found to be linked to individuals displaying the disease phenotype (in a statistically significant, non-random manner), it can be concluded that this mutation is associated with a disease phenotype.
  • a nucleic acid fragment containing this gene can be labeled and used as a probe for in situ hybridization analysis of fixed chromosomes of the human or other mammal to determine precisely the physical location of the gene.
  • a gene that has been mapped and isolated in this manner may be useful as a candidate target for disease diagnosis and for drug targeting according to the invention (see below).
  • A candidate gene library according to the invention will include (i) genes that are involved in known or predicted disease pathways, (ii) new genes that are identified by a relevant pattern of specific tissue or cell expression, (iii) genes that map to genomic regions of known linkage, and (iv) gene sequences (from sequence databases) that are homologs of the above-referenced categories of potential candidate genes.
  • the choice of potentially related genes to be selected from a database will depend on the percent identity as calculated by Fast DB and based upon mismatch penalty, gap penalty, gap size penalty and joining penalty.
  • Predictions can be made regarding a cell or tissue type that would be expected to express high or low levels of candidate genes associated with a particular disease. For osteoporosis, it is expected that muscle, adipose, pancreas or liver tissue, or tissue comprising insulin-secreting pancreatic β-cells, would be useful for identifying candidate genes according to the invention.
  • SAGE depends on the following two principles. First, sufficient information is contained within a short nucleotide sequence (approximately 9-10 bp), isolated from a defined location within a transcript, to uniquely identify a transcript. Second, the concatenation of short tags of sequence allows transcripts to be analysed serially by sequencing multiple tags within a single clone.
  • The method of SAGE is performed by synthesizing double-stranded cDNA from mRNA, cleaving the resulting cDNA with an anchoring restriction endonuclease that is expected to cleave most transcripts at least one time, and isolating the most 3' region of the cleaved cDNA by binding to streptavidin beads.
  • This protocol allows for the identification of a unique site on a transcript that corresponds to the restriction site located closest to the polyA tail.
  • Replicate samples of the most 3' region of the cDNA are ligated to one of two linker molecules that contain a Type IIS restriction site for a tagging enzyme.
  • the cleavage site for Type IIS restriction endonucleases is located at a defined distance up to 20 bp from the asymmetric recognition site.
  • Linkers are designed such that upon cleavage of the ligation product with the tagging enzyme there is release of the linker and an attached short region of cDNA.
  • the two pools of released tags are ligated to each other and the resulting ligated product is used as a template for PCR amplification in the presence of primers that are specific for each linker.
  • The PCR product is cleaved with the anchoring enzyme, and amplification products comprising two tags linked tail to tail are isolated, concatenated by ligation, cloned and sequenced (Velculescu et al., supra).
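  • The core idea of SAGE, that a short tag next to the 3'-most anchoring enzyme site identifies a transcript, can be illustrated in silico. The sketch below assumes an anchoring site of CATG (the recognition sequence of NlaIII, chosen here only as an example) and a 10 bp tag taken immediately 3' of that site; it ignores linkers, ditags and the tagging enzyme itself, and all function names are hypothetical.

```python
from collections import Counter

def sage_tag(cdna, anchor_site="CATG", tag_len=10):
    """Return the tag_len bases immediately 3' of the 3'-most anchor site.

    cdna is the sense strand written 5'->3'. Returns None when the site is
    absent or too close to the 3' end to yield a full-length tag.
    """
    pos = cdna.rfind(anchor_site)  # rfind gives the site closest to the polyA tail
    if pos == -1:
        return None
    start = pos + len(anchor_site)
    tag = cdna[start:start + tag_len]
    return tag if len(tag) == tag_len else None

def tag_counts(cdnas, anchor_site="CATG", tag_len=10):
    """Count tags over a collection of transcripts, skipping untaggable ones."""
    tags = (sage_tag(seq, anchor_site, tag_len) for seq in cdnas)
    return Counter(t for t in tags if t is not None)

transcript_a = "GGCATGAAACCCGGGTTTCATGACGTACGTACGGTAAAAAA"
transcript_b = "TTTCATGGGGGCCCCCTTAAAA"
print(tag_counts([transcript_a, transcript_a, transcript_b]))
```

  • Tag counts obtained in this way are proportional to transcript abundance, which is what makes serial sequencing of concatenated tags informative.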
  • Differential display provides a method for separating and cloning individual mRNAs by PCR analysis.
  • oligonucleotide primers are selected wherein one primer is anchored to the polyadenylate tail of a subset of mRNA species and the other primer is short and of an arbitrary sequence such that it anneals at different positions relative to the first primer.
  • the mRNA subpopulations that are identified with these primer pairs are subjected to reverse transcription, amplified and analysed on a DNA sequencing gel.
  • DNA sequences to be tested for expression are spotted onto a surface, usually at high-density to allow for the testing of many genes.
  • The surface containing the DNA sequences is typically referred to as a 'chip'.
  • The spotted DNA can be either cDNA clones or oligonucleotides.
  • RNA is prepared from the two cells or tissues to be compared. The RNA from one cell/tissue will be labeled red and the RNA from the other cell/tissue will be labeled yellow. Both RNA preparations are hybridized to the DNA array. The ratio of red to yellow is indicative of the relative levels of expression between the two cells/tissues.
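  • The ratio of the two label intensities at each spot is commonly reported on a log2 scale, so that equal expression gives 0 and a two-fold change gives +1 or -1. The sketch below is a minimal illustration; the intensity floor that guards against division by zero and the function name are assumptions, and real analyses also apply background correction and normalization.

```python
import math

def log2_ratio(intensity_a, intensity_b, floor=1.0):
    """log2 of the intensity ratio between the two labelled samples at one spot."""
    a = max(intensity_a, floor)  # floor avoids division by zero and log of zero
    b = max(intensity_b, floor)
    return math.log2(a / b)

# Invented spot intensities: (sample 1, sample 2) per gene.
spots = {"gene_x": (800.0, 200.0), "gene_y": (150.0, 150.0), "gene_z": (50.0, 400.0)}
for name, (a, b) in spots.items():
    print(name, log2_ratio(a, b))
```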
  • Linkage analysis provides a method for identifying genes mapping to genomic regions of known linkage.
  • Linkage analysis may be performed between an unmapped candidate gene and one or more of the disease-related loci, or by analyzing the genetic linkage between the candidate gene and chromosomal markers which are not themselves linked to a disease-related locus, according to the same method.
  • the spacing of markers throughout the genome of the test organism is approximately one every cM or less. This spacing will ensure complete coverage of the genome and will facilitate accurate mapping.
  • Other methods for mapping a candidate gene are provided below.
  • Radiation hybrid (RH) mapping is a somatic cell hybrid technique that was developed to create high resolution, contiguous maps of mammalian chromosomes.
  • The method is useful for ordering DNA markers spanning millions of base pairs of DNA at a resolution not easily obtained by other mapping methods (Cox et al., 1990, Science, 250:245; Burmeister et al., 1991, Genomics, 9:19; Warrington et al., 1992, Genomics, 13:803; Abel et al., 1993, Genomics, 17:632).
  • Radiation hybrid mapping facilitates the mapping of non-polymorphic DNA markers that cannot be used for meiotic mapping.
  • A lethal dose of X-irradiation is used to fragment the chromosomes of the donor cell line. Chromosome fragments from the donor cell line are then retained, in a non-selective manner, following cell fusion with a recipient cell line. The resulting hybrid clones are then analysed for the presence or absence of specific donor chromosome markers. It is expected that markers that are further apart on a chromosome are more likely to be broken apart by radiation and to segregate independently in the RH cells than markers that are closer together.
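  • The expectation that distant markers are separated more often can be turned into a number by comparing how often two markers are discordant across the hybrid panel with how often they would be discordant if they were always broken apart. The sketch below follows the commonly cited two-point estimator, theta = discordant / [T (R_A + R_B - 2 R_A R_B)], with distance in centiRays taken as -100 ln(1 - theta); the retention strings are invented, and the function names are hypothetical.

```python
import math

def breakage_frequency(marker_a, marker_b):
    """Two-point breakage frequency from retention patterns ('1' retained, '0' absent)."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("patterns must be non-empty and of equal length")
    total = len(marker_a)
    discordant = sum(x != y for x, y in zip(marker_a, marker_b))
    r_a = marker_a.count("1") / total  # retention frequency of marker A
    r_b = marker_b.count("1") / total  # retention frequency of marker B
    expected_if_unlinked = total * (r_a + r_b - 2.0 * r_a * r_b)
    if expected_if_unlinked == 0:
        raise ValueError("markers are uninformative (never or always retained)")
    return discordant / expected_if_unlinked

def centirays(theta):
    """Map distance in centiRays for a breakage frequency below 1."""
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return -100.0 * math.log(1.0 - theta)

theta = breakage_frequency("1100110010", "1100100110")
print(theta, round(centirays(theta), 1))
```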
  • mRNA is isolated from a tissue of choice, wherein the tissue is obtained from two distinct organisms and wherein one organism displays a mutant phenotype with regard to a particular trait while the other is normal in that respect.
  • Methods well known in the art are used to prepare cDNA from the mRNA derived from the organism.
  • The mRNA template is then degraded, either by hydrolysis under alkaline conditions or by RNase H-mediated cleavage, and the cDNA is returned to a buffer in which mRNA is stable and mixed with a molar excess of mRNA prepared from the second organism under conditions of stringent hybridization.
  • the mixture is then passed over a hydroxyapatite column, which binds double-stranded nucleic acids but allows single stranded nucleic acid molecules to pass through.
  • Reverse transcripts derived from the first sample which do not hybridize to mRNA molecules derived from the second organism (in other words, reverse transcripts specific to the first tissue sample) are present in the flow-through fraction and are cloned into a vector to create a subtraction library.
  • The reciprocal experiment (in which the cDNA is derived from the second mRNA preparation) is also carried out to create a complete set of transcripts specific to the tissue samples derived from the two organisms.
  • This procedure will provide transcripts that can be labeled and used as probes in in situ hybridization analysis of immobilized chromosomes.
  • The method of subtractive screening, therefore, yields both cloned genes and reagents useful for determining whether the cloned genes co-localize with a locus of interest. If a particular gene is found to co-localize with a locus of interest, the gene may be analysed functionally (e.g., in a phenotypic rescue experiment, as described below, or by the phenotypic assays described in Section F entitled "Identification and Characterization of Polymorphisms"). Ultimately, these genes may be used as targets for drugs or disease diagnostic methods, or even as therapeutic nucleic acids.
  • Entrapment vectors were first described in bacteria (Casadaban and Cohen, 1979, Proc. Natl. Acad. Sci. U.S.A., 76:4530; Casadaban et al., 1980, J. Bacteriol., 143:971).
  • Entrapment vectors can be introduced into pluripotent ES cells in culture (for example, using electroporation or a retrovirus) and then passed into the germline via chimeras (Gossler et al., 1989, Science, 244:463; Skarnes, 1990, Biotechnology, 8:827).
  • transgenic animals containing entrapment vectors may be generated by standard oocyte injection protocols.
  • Promoter or gene trap vectors often contain a reporter gene, e.g., lacZ, Cat or green fluorescent protein (Gfp) that lacks its own upstream promoter and/or splice acceptor sequence.
  • promoter gene traps contain a reporter gene with a splice site but no promoter. If the vector integrates within a gene and is spliced into the gene product, then the reporter gene will be expressed. Enhancer traps contain a reporter gene and have a minimal promoter which requires the activity of an enhancer in order to function. If the vector integrates near an enhancer (whether in a gene or not), then the reporter gene will be expressed. Activation of the reporter gene can only occur when the vector is integrated within an active host gene and generates a fusion transcript with the host gene. The activity of a reporter gene provides an easy assay for determining if a vector has been integrated into an expressed gene. Methods for detecting reporter gene activity in transfected cells or tissues of a transgenic animal are well known in the art.
  • the mutagenic vector may be mapped using standard cytogenetic techniques, such as in situ hybridization, wherein a labeled fragment comprising vector-specific sequence is used as a probe. Co-localization of the probe with a particular locus of interest indicates that the associated gene is a suitable candidate and should be subjected to further analysis. A gene that has been identified in this manner can be cloned as described.
  • N. Diagnostic Indicators, Screens and Disease Symptoms
  • In another embodiment of the invention, there is provided a method of diagnosing or determining susceptibility of a subject to low BMD and/or bone damage.
  • This method involves analyzing the genetic material of a subject to determine which allele(s) of the gene is/are present.
  • the method may include determining whether one or more particular alleles are present, or which combination of alleles (i.e. a haplotype) is present.
  • the method may also include determining whether subjects are homozygous or heterozygous for a particular allele or haplotype.
  • The method comprises determining which allele of one or more of the polymorphisms of the invention is/are present.
  • the method may include determining the presence of the polymorphism of the gene which in combination with polymorphisms defined herein or other polymorphisms may define a risk haplotype.
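  • The determinations listed above (which alleles are present, whether the subject is homozygous or heterozygous, and whether a combination of alleles forms a risk haplotype) can be expressed as a few simple checks. The sketch below is purely illustrative: the site names, alleles and risk haplotype are invented, and phased haplotypes are assumed to be already known.

```python
def zygosity(genotype, allele):
    """Classify a two-allele genotype with respect to one allele of interest."""
    copies = sum(1 for observed in genotype if observed == allele)
    return {0: "absent", 1: "heterozygous", 2: "homozygous"}[copies]

def carries_haplotype(phased_haplotypes, risk_haplotype):
    """Count how many of the subject's two phased haplotypes match the risk haplotype.

    Each haplotype is a dict mapping a polymorphic site name to the allele on
    that chromosome; only the sites named in risk_haplotype are compared.
    """
    return sum(
        all(hap.get(site) == allele for site, allele in risk_haplotype.items())
        for hap in phased_haplotypes
    )

# Invented subject with two phased chromosomes typed at two hypothetical sites.
subject = [{"site1": "T", "site2": "G"}, {"site1": "C", "site2": "G"}]
risk = {"site1": "T", "site2": "G"}
print(zygosity(("T", "C"), "T"), carries_haplotype(subject, risk))
```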
  • The polynucleotide sequences for these particular alleles may be used for diagnostic purposes.
  • the polynucleotides which may be used include oligonucleotides, complementary RNA and DNA molecules and PNAs.
  • the polynucleotides may be used to determine whether subjects are homozygous or heterozygous for a particular allele or haplotype making them susceptible to low BMD and/or bone damage, and hence, osteoporosis.
  • Hybridization with PCR probes capable of detecting a particular polymorphism may be used to identify nucleic acid sequences of particular alleles or haplotypes. These probes must be specific to these particular alleles, and the stringency of the hybridization or amplification must be such that the probe identifies only this particular allele.
  • Means for producing specific hybridization probes for polynucleotides of particular alleles include the cloning of these polynucleotide sequences into vectors for the production of mRNA probes. Such vectors are known in the art, are commercially available, and may be used to synthesize RNA probes in vitro by means of the addition of the appropriate RNA polymerases and the appropriate labeled nucleotides.
  • Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides such as 32 P or 35 S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via avidin/biotin coupling systems, and the like.
  • Polynucleotides of particular alleles or haplotype may be used in Southern or northern analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect susceptibility to low BMD and/or bone damage. Such qualitative methods are well known in the art.
  • polynucleotides of particular alleles or haplotype may be used in assays that detect susceptibility to low BMD and/or bone damage, particularly those mentioned above.
  • Polynucleotides complementary to sequences of a particular allele or haplotype may be labeled by standard methods and added to a fluid or tissue sample from a patient under conditions suitable for the formation of hybridization complexes. After a suitable incubation period, the sample is washed and examined for a signal. If a signal is found, the presence of the polynucleotide of a particular allele, alleles or haplotype in the sample indicates susceptibility to low BMD and/or bone damage, and hence, osteoporosis.
  • Such assays may also be used to determine the particular therapeutic treatment regimen for an individual patient.
  • With respect to osteoporosis, the presence of a particular polymorphism or polymorphisms in a tissue sample from an individual may indicate a predisposition for low BMD and/or bone damage, or may provide a means for detecting osteoporosis prior to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow health professionals to employ preventative measures or aggressive treatment earlier, thereby preventing the development or further progression of osteoporosis.
  • The use of oligonucleotides designed from the polynucleotide sequences of a particular allele or haplotype may involve PCR. These oligomers may be chemically synthesized, generated enzymatically, or produced in vitro. Oligomers will contain a fragment of a polynucleotide of a particular allele, alleles or haplotype, or a fragment of a polynucleotide complementary to the polynucleotide of a particular allele, alleles or haplotype, and will be employed under optimized conditions for identification of a specific polymorphism, polymorphisms or haplotype. Oligomers may also be employed under very stringent conditions for detection of these particular DNA or RNA sequences.
  • oligonucleotides or longer fragments derived from any of the polynucleotides described herein may be used as elements on a microarray.
  • the microarray can be used in transcript imaging techniques to detect a particular polymorphism, polymorphisms or haplotype simultaneously as described below.
  • this information may be used to develop a pharmacogenomic profile of a patient in order to select the most appropriate and effective treatment regimen for that patient. For example, therapeutic agents which are highly effective and display the fewest side effects may be selected for a patient based on his/her pharmacogenomic profile.
  • a method involves the use of antibodies in diagnosing or determining the susceptibility to low BMD and/or bone damage.
  • the antibodies would specifically bind to an epitope of a particular allele or form of the protein and may be used to determine susceptibility to low BMD and/or bone damage, and hence, osteoporosis.
  • Antibodies useful for diagnostic purposes may be prepared in the same manner as described above for therapeutics. Diagnostic assays for determining susceptibility to low BMD and/or bone damage include methods which utilize the antibody and a label to detect a particular allele or form of the protein in human body fluids or in extracts of cells or tissues.
  • the antibodies may be used with or without modification, and may be labeled by covalent or non-covalent attachment of a reporter molecule.
  • A wide variety of reporter molecules, several of which are described above, are known in the art and may be used.
  • fragments of ABBR, or antibodies specific for ABBR may be used as elements on a microarray.
  • Microarrays may be prepared, used, and analysed using methods known in the art (Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA 94:2150-2155; Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662).
  • Various types of microarrays are well known and thoroughly described in Schena, M., ed. (1999; DNA Microarrays: A Practical Approach, Oxford University Press, London).
  • Tissue or fluid samples containing a polynucleotide or polypeptide of interest include, but are not limited to, plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.
  • Genomic DNA, cDNA or RNA can be prepared from the human sample according to the methods described above.
  • a biological sample such as blood is prepared and analysed for the presence or absence of susceptibility alleles of a gene containing a polymorphism, according to the invention. Results of these tests and interpretive information will be returned to the health care provider for communication to the tested individual.
  • diagnoses may be performed by diagnostic laboratories, or, alternatively, diagnostic kits are manufactured and sold to health care providers or to private individuals for self-diagnosis.
  • The screening method will involve amplification of the relevant gene sequences.
  • the screening method involves a non-PCR based strategy.
  • non-PCR based screening methods include Southern blot analysis to detect the presence of a variant form of a gene in a sample comprising total genomic DNA from the individual being tested.
  • Northern blot analysis can be used to detect an aberrant mRNA encoded by a gene that exhibits altered stability, or that is the result of alternative splicing, in a sample comprising RNA from an individual being tested.
  • The methods of S1 nuclease analysis, RNase protection and primer extension can also be used to determine both the endpoint and the amount of a gene-specific mRNA (Ausubel et al., supra). Both PCR and non-PCR based screening strategies can detect target sequences with a high level of sensitivity.
  • the preferred method is target amplification.
  • the target nucleic acid sequence is amplified with polymerases.
  • One particularly preferred method using polymerase-driven amplification is PCR (described above).
  • the polymerase chain reaction and other polymerase-driven amplification assays can achieve over a million-fold increase in copy number through the use of polymerase-driven amplification cycles.
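  • The million-fold figure follows directly from exponential amplification: with perfect doubling, 20 cycles give 2^20, or about 1.05 million copies per starting template. The sketch below generalizes this to a per-cycle efficiency between 0 and 1; the efficiency parameter and function names are illustrative assumptions, and real reactions plateau in later cycles.

```python
import math

def fold_increase(cycles, efficiency=1.0):
    """Idealized copy-number increase after a number of cycles."""
    return (1.0 + efficiency) ** cycles

def cycles_for_fold(fold, efficiency=1.0):
    """Smallest number of cycles giving at least the requested fold increase."""
    if fold < 1 or not 0 < efficiency <= 1:
        raise ValueError("fold must be >= 1 and efficiency in (0, 1]")
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))

print(cycles_for_fold(1e6), fold_increase(20))
print(cycles_for_fold(1e6, efficiency=0.9))
```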
  • PCR primers useful for target amplification according to the invention will be designed to amplify a region of DNA containing one or more polymorphisms. Allele specific primers (comprising one or more polymorphisms) are also useful for detecting gene sequence variations by PCR methodologies according to the invention.
  • the absence of a particular polymorphism will be indicated by the absence of an amplified product when the amplification step is carried out in the presence of allele specific primers.
  • the resulting nucleic acid can be sequenced and the specific sequence of the test DNA will be compared with the wild type sequence by using the computer programs described in Section F entitled "Identification and Characterization of Polymorphisms".
  • the amplified product will be analysed by Southern blot assay with nucleic acid probes. Nucleic acid probes, useful according to the invention, will be specifically hybridizable to a mutant form of a gene but not to the wild type gene due to the presence of one or more polymorphisms.
  • The biological sample to be analysed, such as blood or serum, may be treated, if desired, to extract the nucleic acids (as described above).
  • the sample nucleic acids isolated from a biological sample or amplified by PCR
  • The targeted region of the nucleic acids being analysed must be at least partially single-stranded to form hybrids with the targeting sequence of the probe. If the sequence is naturally single-stranded, denaturation will not be required. However, if the sequence is double-stranded, the sequence will probably need to be denatured. Denaturation can be carried out by various techniques known in the art.
  • Analyte nucleic acid and probe will be incubated under conditions which promote stable hybrid formation of the target sequence in the probe with the putative targeted sequence in the sample DNA. If the region of the probe which is used to bind to the analyte is designed to be completely complementary to the targeted region, high stringency conditions are desirable in order to prevent false positives. However, conditions of high stringency will be used only if the probes are complementary to regions of the chromosome which are unique in the genome. The stringency of hybridization is determined by a number of factors (described above). Detection, if any, of the resulting hybrid is usually accomplished by the use of labeled probes.
  • the probe may be unlabeled, but may be detectable by specific binding with a ligand which is labeled, either directly or indirectly.
  • Suitable labels, and methods for labeling probes and ligand are known in the art, and are described in Section C entitled "Production of a Nucleic Acid Probe".
  • the foregoing screening method may be modified to identify individuals having a gene containing a neutral polymorphism not associated with osteoporosis, by preferably amplifying DNA fragments of a gene derived from a particular individual.
  • the amplified DNA fragments are sequenced and the sequence is compared to the consensus gene sequence containing neutral polymorphisms.
  • Differences between the individual's coding sequence for a gene and a consensus sequence for the same gene are determined, wherein the presence of any neutral polymorphisms and the absence of any polymorphisms not previously identified as neutral polymorphisms can be correlated with an absence of increased genetic susceptibility to osteoporosis resulting from a mutation in a gene coding sequence.
  • detection of a polymorphism will be performed by detecting loss of a restriction enzyme recognition site due to the presence of one or more polymorphisms.
  • a polymorphism will be detected with a polynucleotide probe that is capable of detecting a restriction enzyme fragment containing the polymorphism, wherein the fragment is of a size that can be easily separated on an agarose gel and visualized by Southern blot analysis.
  • a polynucleotide probe according to this embodiment of the invention can be specific for a sequence within the candidate gene or outside of the candidate gene.
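  • Detection of a polymorphism through loss of a restriction site can be illustrated by comparing in silico digests of two alleles. The sketch below uses GAATTC (the EcoRI recognition sequence) purely as an example site and, as a simplification, cuts at the start of the recognition sequence rather than at the enzyme's true cleavage position; the sequences and function names are invented.

```python
def fragment_lengths(seq, site):
    """Fragment lengths after cutting a linear sequence at each occurrence of site.

    Simplification: the cut is placed at the first base of the recognition
    sequence, so lengths differ slightly from a real digest.
    """
    lengths, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        if pos > start:  # skip zero-length fragments
            lengths.append(pos - start)
            start = pos
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths

def site_lost(reference, variant, site):
    """True when the variant allele abolishes every copy of a site present in the reference."""
    return site in reference and site not in variant

reference_allele = "A" * 10 + "GAATTC" + "T" * 10
variant_allele = "A" * 10 + "GAGTTC" + "T" * 10  # single base change within the site

print(fragment_lengths(reference_allele, "GAATTC"))
print(fragment_lengths(variant_allele, "GAATTC"))
print(site_lost(reference_allele, variant_allele, "GAATTC"))
```

  • On a gel, the allele that retains the site would give two smaller bands, whereas the allele that has lost it would give a single larger band.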
  • the nucleic acid probe assays of this invention will employ a mixture of nucleic acid probes capable of detecting a gene.
  • in one example, to detect the presence of a gene in a test sample, more than one probe complementary to a gene is employed; in particular, the number of different nucleic acid probe sequences is alternatively 2, 3, or 5.
  • the probe mixture includes probes capable of binding to the allele-specific mutations identified in populations of patients with alterations in a gene.
  • any number of probes can be used, and will preferably include probes corresponding to the major gene mutations identified as predisposing an individual to osteoporosis.
  • Northern blot analysis, S1 nuclease analysis, RNase protection and primer extension (Ausubel et al., supra) are also methods according to the invention for detecting changes in mRNA resulting from the presence of one or more polymorphisms in the sequence of a gene. Additionally, any of the methods of genotyping described in Section F entitled "Identification and Characterization of Polymorphisms" can be used for diagnostics according to the invention.
  • Peptide Diagnosis and Diagnostic Kits. Osteoporosis can also be detected on the basis of an alteration of the wild-type polypeptide. Such alterations can be determined by sequence analysis in accordance with conventional techniques. More preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the absence of, peptides derived from a gene of interest. The antibodies may be prepared as described above in Section I entitled "Preparation of Antibodies". Preferably, antibodies will immunoprecipitate the protein product of a gene from solution as well as react with the protein product of a gene on Western or immunoblots of polyacrylamide gels. Antibodies useful according to the invention will also detect the protein product of a gene in paraffin or frozen tissue sections, using immunocytochemical techniques.
  • Preferred embodiments relating to methods for detecting wild type or mutant forms of the protein product of a gene include enzyme linked immunosorbent assays (ELISA), radioimmunoassay (RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich assays using monoclonal and/or polyclonal antibodies.
  • Exemplary sandwich assays are described by David et al. in U.S. Pat. Nos. 4,376,110 and 4,486,530, hereby incorporated by reference.
  • This invention is particularly useful for screening therapeutic compounds by using the mutant gene or protein product or binding fragment of the gene in any of a variety of drug screening techniques.
  • the protein product or fragment of a gene employed in such a test may either be free in solution, affixed to a solid support, expressed on the surface of a cell, or located intracellularly.
  • One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a recombinant polynucleotide expressing the polypeptide or fragment, preferably in competitive binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. In particular, these cells can be used to measure formation of a complex comprising the protein product or fragment of a gene and the agent being tested.
  • these cells can be used to determine if the formation of a complex between the protein product or fragment of a gene and a known ligand is interfered with by an agent being tested.
  • the present invention discloses methods useful for drug screening wherein such methods comprise contacting a candidate drug with a polypeptide or fragment derived from a gene and assaying (i) for the presence of a complex between the drug and the polypeptide or fragment derived from a gene, or (ii) for the presence of a complex between the polypeptide or fragment derived from a gene and a ligand, by methods well known in the art.
  • the polypeptide or fragment derived from a gene is labeled for use in competitive binding assays.
  • Purified protein can be coated directly onto plates for use in the aforementioned drug screening techniques.
  • non-neutralizing antibodies to the polypeptide can be used to capture the polypeptide or peptide fragment of interest and immobilize it on the solid support.
  • An additional technique for drug screening involves the use of host eukaryotic cell lines or cells (such as described above) which have a gene that produces a defective protein.
  • the host cell lines or cells are grown in the presence of a test drug compound.
  • the rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of a gene.
  • the ability of the test compound to restore the function of the mutant gene protein can be measured by using an appropriate in vitro assay for function of the protein product of a gene. Suitable in vitro functional assays are described in Section F entitled "Identification and Characterization of Polymorphisms".
  • the ability of the test compound to alter the cellular localization of the protein will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art. A method of drug screening may involve the use of host eukaryotic cell lines or cells
  • by aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or that the temporal pattern of expression is different from that of the wild type gene.
  • the ability of a test drug to alter the expression of a mutant form of a gene can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays.
  • cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test compound and the ability of a test compound to alter the level of activity of the mutant gene promoter can be determined by standard assays for each reporter gene which are well known in the art.
  • a “candidate drug” as used herein, is any compound with a potential to modulate a phenotype associated with a particular disease according to the invention.
  • a candidate drug is tested in a concentration range that depends upon the molecular weight of the drug and the type of assay.
  • small molecules (as defined below) may be tested in a concentration range of 1 pg - 100 mg/ml, preferably at about 100 pg - 10 ng/ml; large molecules, e.g., peptides, may be tested in the range of 10 ng - 100 mg/ml, preferably 100 ng - 10 mg/ml.
  • Candidate drug compounds from large libraries of synthetic or natural compounds can be screened. Numerous means are currently used for random and directed synthesis of saccharide, peptide, and nucleic acid based compounds.
  • Synthetic compound libraries are commercially available from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex (Princeton, NJ), Brandon Associates (Merrimack, NH), and Microsource (New Milford, CT).
  • a rare chemical library is available from Aldrich (Milwaukee, WI).
  • Combinatorial libraries are available and can be prepared.
  • libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available from, e.g., Pan Laboratories (Bothell, WA) or MycoSearch (NC), or are readily producible by methods well known in the art.
  • natural and synthetically produced libraries and compounds are readily modified through conventional chemical, physical, and biochemical means.
  • Useful compounds may be found within numerous chemical classes, though typically they are organic compounds, and preferably small organic compounds. Small organic compounds have a molecular weight of more than 50 yet less than about 2,500 daltons, preferably less than about 750 daltons, more preferably less than about 350 daltons. Exemplary classes include heterocycles, peptides, saccharides, steroids, and the like. The compounds may be modified to enhance efficacy, stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to identify, generate, or screen additional agents.
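The molecular-weight tiers stated above can be written as a small classifier. This is only a sketch: the text says "about" for each upper bound, so treating 2,500, 750 and 350 daltons as hard cutoffs is an assumption, and the function name and tier labels are invented for illustration.

```python
def size_tier(mw_daltons):
    """Classify a compound by the molecular-weight tiers given in the text.

    Cutoffs are treated as exact, although the text qualifies them with 'about'.
    """
    if mw_daltons <= 50 or mw_daltons >= 2500:
        return "outside small-molecule range"
    if mw_daltons < 350:
        return "more preferred"
    if mw_daltons < 750:
        return "preferred"
    return "small"


# Hypothetical molecular weights, for illustration only.
tiers = [size_tier(mw) for mw in (30.0, 180.2, 500.0, 1200.0, 5000.0)]
```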
  • peptide agents may be modified in a variety of ways to enhance their stability, for example by using an unnatural amino acid, such as a D-amino acid, particularly D-alanine, or by functionalizing the amino or carboxyl terminus, e.g., acylation or alkylation for the amino group, and esterification or amidification for the carboxyl group, or the like.
  • Determination of Activity of a Drug. A drug is determined to be effective if its use results in a change of about 10% in a phenotype associated with a disease according to the invention.
  • the level of modulation by a candidate modulator of a phenotype associated with a disease may be quantified using any acceptable limits, for example, via the following formula, which describes detections performed with a radioactively labeled probe (e.g., a radiolabeled antibody in an immunobinding experiment or a radiolabeled nucleic acid probe in a Northern hybridization).
  • CPM Control is the average of the cpm in antibody/ligand complexes or on Northern blots resulting from assays that lack the candidate modulator (in other words, untreated controls).
  • CPM Sample is the cpm in antibody/ligand complexes or on Northern blots resulting from assays containing the candidate modulator.
  • the assay comprises use of a labeling system or system of measuring enzymatic activity in which there is a linear relationship between the amount of label detected and the amount of protein or nucleic acid being represented per unit of label or the amount of protein or nucleic acid represented by a unit of enzymatic activity.
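The formula referred to above did not survive in this text. As a hedged illustration only, one common way to express percent modulation from the two quantities defined above is (CPM Control - CPM Sample) / CPM Control x 100; this form is an assumption, not the document's own formula. The 10% threshold comes from the effectiveness criterion stated earlier; the function names and the example counts are invented.

```python
def percent_modulation(control_cpms, sample_cpm):
    """Percent change of a sample relative to the mean of untreated controls.

    Assumed form: 100 * (mean control cpm - sample cpm) / mean control cpm.
    A positive value means the candidate modulator lowered the signal.
    """
    mean_control = sum(control_cpms) / len(control_cpms)
    return 100.0 * (mean_control - sample_cpm) / mean_control


def is_effective(control_cpms, sample_cpm, threshold_percent=10.0):
    """True if the change, in either direction, reaches the threshold."""
    return abs(percent_modulation(control_cpms, sample_cpm)) >= threshold_percent


# Hypothetical counts per minute, for illustration only.
controls = [1000.0, 1100.0, 900.0]
change = percent_modulation(controls, 850.0)
```

The calculation is only meaningful under the linearity condition stated above, where the detected label is proportional to the amount of protein or nucleic acid.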
  • Rational drug design is useful for producing either structural analogs of biologically active polypeptides of interest or small molecules with which polypeptides of interest interact (e.g., agonists, antagonists, inhibitors) in order to design drugs which are, for example, more active or stable forms of the polypeptide, or which enhance or interfere with the function of a polypeptide in vivo. See, e.g., Hodgson, 1991, BioTechnology, 9:19.
  • the three-dimensional structure of a protein of interest (e.g., the polypeptide product of the gene), or of the complex comprising the protein product of a gene in association with its ligand, is determined by x-ray crystallography, by computer modeling or, most typically, by a combination of approaches.
  • useful information regarding the structure of a polypeptide may be obtained by modeling based on the structure of homologous proteins. Rational drug design has been used successfully in the development of HIV protease inhibitors (Erickson et al., 1990, Science, 249: 527).
  • Rational drug design may also involve the analysis of peptides derived from the protein product of a gene by an alanine scan (Wells, 1991, Methods in Enzymol., 202: 390). According to this method, each of the amino acid residues of the peptide is sequentially replaced by alanine, and the effect of this amino acid substitution on the peptide's activity is determined. This technique can be used to determine the functionally relevant regions of the peptide.
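The substitution step of an alanine scan can be sketched in a few lines of Python. The peptide below is a made-up one-letter sequence, and skipping positions that are already alanine is an assumption of this sketch rather than something the text specifies. Measuring the activity of each variant is the experimental part and is not modeled.

```python
def alanine_scan(peptide):
    """Return (position, original_residue, variant) for each single Ala substitution.

    Positions are 1-based. Residues that are already alanine are skipped,
    since replacing them would reproduce the parent peptide.
    """
    peptide = peptide.upper()
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue
        variants.append((i + 1, residue, peptide[:i] + "A" + peptide[i + 1:]))
    return variants


# Hypothetical peptide, for illustration only.
scan = alanine_scan("GAKL")
```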
  • Another experimental approach to rational drug design will involve the isolation of a target-specific antibody (selected by a functional assay) and the determination of the crystal structure of this antibody. Theoretically, this approach will yield a pharmacore upon which subsequent drug design can be based.
  • anti-idiotypic antibodies (anti-ids)
  • the binding site of the anti-ids will be an analog of the original receptor.
  • the anti-id could then be used to identify and isolate potentially therapeutic peptides from banks of chemically or biologically produced peptides. These selected peptides would then function as pharmacores.
  • the present invention also provides a method of supplying wild-type gene function to a cell which carries a mutant allele of a gene.
  • a full length version of the wild-type gene, or a fragment of the gene may be introduced into the cell in a vector such that the gene remains extrachromosomal and is expressed by the cell from the extrachromosomal location.
  • the wild-type gene or gene fragment should recombine with the endogenous mutant gene X already present in the cell. Such recombination requires a double recombination event which results in the correction of the gene mutation.
  • Vectors for introduction of genes both for recombination and for extrachromosomal maintenance are known in the art, and any suitable vector may be used.
  • Methods for introducing DNA into cells such as electroporation, calcium phosphate co-precipitation and lipofection are known in the art (described above).
  • Cells transformed with the wild-type gene can be used as model systems to study changes in the intensity of symptoms associated with osteoporosis and drug treatments which promote such changes.
  • a gene or a fragment thereof, where applicable, may be used in gene therapy methods in order to increase the amount of the expression products of such genes in cells of patients with osteoporosis. It may also be useful to increase the level of expression of a gene even in those cells in which the mutant gene is expressed at a "normal" level, but the gene product is not fully functional.
  • Gene therapy can be carried out according to generally accepted methods, for example, as described by Friedman, 1991, In Therapy for Genetic Diseases; T. Friedman ed., Oxford
  • the vector will be injected into the patient, either locally at an appropriate site according to the invention or systemically.
  • Gene transfer systems known in the art may be useful in the practice of the gene therapy methods of the present invention. These include viral and nonviral transfer methods. A number of viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak et al., 1992, J Gen Virol, 73: 1533), adenovirus (Berkner, 1992, Curr. Top. Microbiol. Immunol, 158:39; Berkner et al, 1988, BioTechniques, 6:616; Gorziglia and Kapikian, 1992, J Virol, 66:4407; Quantin et al, 1992, Proc. Natl. Acad. Sci.
  • Nonviral gene transfer methods known in the art include chemical techniques such as calcium phosphate coprecipitation (Graham and van der Eb, 1973, Virology, 52:456; Pellicer et al, 1980, Science, 209: 1414); mechanical techniques, for example microinjection (Anderson et al, 1980, Proc. Natl. Acad. Sci. USA, 77: 5399; Gordon et al, 1980, Proc. Natl. Acad. Sci. USA, 77: 7380; Brinster et al, 1981, Cell, 27:223; Costantini and Lacy, 1981, Nature, 294:92); membrane fusion-mediated transfer via liposomes (Felgner et al, 1987, Proc. Natl. Acad.
  • DNA of any size is combined with a polylysine-conjugated antibody specific to the adenovirus hexon protein, and the resulting complex is bound to an adenovirus vector.
  • the trimolecular complex is then used to infect cells.
  • the adenovirus vector permits efficient binding, internalization, and degradation of the endosome before the coupled DNA is damaged.
  • Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene transfer. While in standard liposome preparations the gene transfer process is nonspecific, localized in vivo uptake and expression have been reported in tumor deposits, for example, following direct in situ administration (Nabel, 1992, Hum. Gen. Ther., 3:399). Gene transfer techniques which target DNA directly to an appropriate tissue, e.g., a tissue that normally expresses the protein product of the candidate gene of the invention, are preferred. Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA (usually in the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine.
  • Ligands are chosen on the basis of the presence of the corresponding ligand receptors on the cell surface of the target cell/tissue type. These ligand-DNA conjugates can be injected directly into the blood if desired and are directed to the target tissue where receptor binding and internalization of the DNA-protein complex occurs. To overcome the problem of intracellular destruction of DNA, coinfection with adenovirus can be included to disrupt endosome function.
  • Peptides which have gene activity can be supplied to cells which carry mutant or missing alleles of a gene.
  • peptides specific for a mutant form of the protein product of a gene can be supplied to cells carrying a wild type protein.
  • the protein product of a gene can be produced by expression of the cDNA sequence in bacteria, for example, using known expression vectors (as described in Section H entitled "Production of a Mutant Protein").
  • the protein product of a gene can be extracted from mammalian cells engineered to produce the protein product of a gene of interest.
  • the techniques of synthetic chemistry can be employed to synthesize the protein product of a gene. Any of the above techniques can provide a preparation of protein product of a gene that is substantially free of other human proteins.
  • Active gene molecules can be introduced into cells by microinjection or by the use of liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively or by diffusion. Extracellular application of the protein product of a gene may be sufficient to decrease or reverse the physiological effects of osteoporosis.
  • Other molecules with the activity of a protein product of a gene (for example, peptides, drugs or organic compounds) may also be used to effect such a reversal. Modified polypeptides having substantially similar function may also be useful for peptide therapy.
  • Transformed Hosts. Cells and animals which carry a mutant allele of a gene can be used as model systems to study and test for substances which have potential as therapeutic agents. Following application of a test substance to the cells, the phenotype of the cell will be determined. Any variety of phenotypic changes associated with osteoporosis can be assessed, including insulin resistance and combined insulin resistance/insulin secretion defect. Assays for each of these traits are known in the art.
  • Animals useful for testing therapeutic agents can be selected after mutagenesis of whole animals or after treatment of germline cells or zygotes. Such treatments include insertion of mutant alleles of a gene, usually from a second animal species, as well as insertion of disrupted homologous genes. Alternatively, the endogenous gene of the animals may be disrupted by insertion or deletion mutation or other genetic alterations using conventional techniques (Capecchi, 1989, Science, 244:1288; Valancius and Smithies, 1991, Mol. Cell.
  • Polynucleotides can be used to mark objects or substances for the purposes of later identification.
  • polynucleotides of the invention are useful for tracking the manufacture and distribution of a large number of diverse substances, including but not limited to: (1) natural resources such as animals, plants, oil, minerals, and water; (2) chemicals such as drugs, solvents, petroleum products, and explosives; (3) commercial by-products including pollutants such as radioactive or other hazardous waste; and (4) articles of manufacture such as guns, typewriters, automobiles and automobile parts.
  • a nucleic acid according to the invention when used as a marker, thus aids in the determination of product identity and so provides information useful to manufacturers and consumers.
  • Polynucleotides have the advantage over other marking materials of being readily amplifiable through the use of polymerase chain reaction (PCR) technology.
  • the method of PCR is well known in the art. PCR is performed as described by Mullis & Faloona, 1987, Methods Enzymol, 155:335, herein incorporated by reference. It is the unique sequence of a polynucleotide which renders it useful as a marker, since the sequence, or a characteristic pattern derived from its sequence, confers a property on the polynucleotide which permits it to be tracked.
  • novel polynucleotide sequences of the invention may be used as markers by their attachment to, or mixture in, objects or substances to be marked. Methods for marking various classes of substances and later detection of the tags in those substances are disclosed in U.S. Patent Nos. 5,451,505 and 5,643,728.
  • use of a polynucleotide of the invention as a marker may entail combining a polynucleotide with the substance or object to be marked, using methods appropriate to that substance or object; and detecting the marker through amplification of the polynucleotide sequence using PCR technology, followed by either sequence analysis or identification by other means known in the art (e.g., hybridization assays).
  • the method of applying a marker nucleic acid to a substance or object, and of subsequently detecting that nucleic acid, will vary depending upon the nature of the substance or object and the environment to which it will be exposed.
  • inert solids such as paper, many pharmaceutical products, wood, some foodstuffs, etc.
  • Chemically active substances such as foodstuffs with enzymatic activity, polymers with charged groups, or acidic pharmaceuticals may require that a protective composition (e.g., liposomes) be added to the nucleic acid being used as a marker.
  • the nucleic acid may be mixed directly with the liquid, or, if the chemical nature of the liquid is not compatible with this approach (i.e., nucleic acids are not soluble in the liquid), the nucleic acid may be mixed with a detergent to enhance its solubility.
  • Containerized gases may be marked simply by adding a nucleic acid to the container in dry form, as it will be dispersed throughout the gas as the gas is released.
  • the amount of nucleic acid to add to a substance as a marker will also vary with the given situation, as will the detection strategy.
  • PCR technology allows the amplification and detection of as little as one molecule from a sample.
  • Other means of detection such as hybridization assays require that more nucleic acid be recovered from a sample to efficiently detect it.
  • PCR can be combined with a hybridization assay, however, to enhance the sensitivity of the method.
  • a nucleic acid sequence used as a marker will generally be from 20 to 1,000 bases long, and preferably will be 60 to 1,000 bases long when PCR is to be used to detect the marker.
  • Marked gunpowder may be prepared as follows: 1) add 16 ng of nucleic acid bearing the chosen marker sequence (derived from a polynucleotide of the invention) to 1 ml of distilled water; 2) mix the solution of nucleic acid with 1 g of nitrocellulose-based gunpowder; and 3) dry in air or under vacuum at 85°C.
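To put the marker length range and the 16 ng loading in perspective, the sketch below checks a candidate marker length against the stated ranges and estimates how many marker molecules 16 ng represents. The 100-base marker length and the average mass of about 330 g/mol per nucleotide of single-stranded DNA are assumptions made for this illustration; the function names are invented.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
AVG_NT_MASS_G_PER_MOL = 330.0  # rough average per nucleotide (assumption)


def marker_length_ok(length_nt, detect_by_pcr=True):
    """Check a marker length against the ranges given in the text."""
    lower = 60 if detect_by_pcr else 20
    return lower <= length_nt <= 1000


def approx_copies(mass_ng, length_nt):
    """Rough number of single-stranded marker molecules in `mass_ng` nanograms."""
    moles = (mass_ng * 1e-9) / (length_nt * AVG_NT_MASS_G_PER_MOL)
    return moles * AVOGADRO


# Hypothetical 100-base marker at the 16 ng per gram loading from the example.
copies_per_gram = approx_copies(16, 100)
```

Under these assumptions the loading corresponds to a few hundred billion marker molecules per gram, far more than the single molecule that PCR can in principle detect.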
  • Another example of a substance which may be marked with a nucleic acid according to the invention is ink.
  • the presence of an amplification product of the proper size (visualized, for example by gel electrophoresis alongside nucleic acid size markers followed by ethidium bromide staining of the gel, according to standard methods) will indicate the presence of the marker in the sample.
  • the PCR product may be further subjected to hybridization analysis or to sequencing to enhance the accuracy of the method. A method of hybridization analysis which can be used is described herein.
  • because a polynucleotide of the invention is novel (that is, its sequence is unique), it is useful as a marker for chromosomal mapping.
  • there are several methods of chromosomal mapping known in the art. Prominent among them is the variant of the in situ hybridization technique known as "Fluorescence In Situ Hybridization", or FISH. Details of methods and solutions used for in situ hybridization are well-known in the art. There are many variations of the FISH technique itself; however, the basic approach is similar in each case.
  • in situ hybridization of cells, nuclei, or metaphase chromosome spreads is performed with a polynucleotide probe either directly labeled with a fluorochrome, or labeled with a moiety which will be bound by a fluorochrome tagged entity.
  • the hybridized probe is visualized by irradiation of the sample with light in the wavelength which excites fluorescence from the fluorochrome.
  • the location of the novel polynucleotide sequence on that chromosome may be further localized by in situ hybridization along with probes specific for known genes or sequences, labeled with other fluorescent tags which allow the differentiation of the signals from the different probes.
  • In addition to being able to determine the chromosomal location of the novel polynucleotide, similar technology, in which FISH is combined with flow cytometry, will allow the polynucleotide of the invention to be used to sort chromosomes, nuclei, or whole cells containing various dosages (i.e., gene copy numbers) of the gene encoding that polynucleotide.
  • Forensic science depends heavily on methods for determining the source of various compounds associated with criminal activity.
  • identification of individuals involved in criminal activity through analysis of substances found at the crime scenes is critical
  • genetic typing which involves the determination of the genotype of an individual with regard to loci which are polymorphic within the population.
  • polymorphic refers to a gene or other segment of DNA which shows nucleotide sequence variability from individual to individual.
  • the use of PCR techniques and nucleotide probes to detect even single nucleotide changes in a polynucleotide sequence has revolutionized the field of forensic serology (see Reynolds and Sensabaugh, 1991, Anal. Chem., 63:2).
  • for polymorphisms useful for forensic identification and methods of typing samples with regard to those polymorphisms, see U.S. Patent No. 5,273,883.
  • if a polynucleotide of the invention is found to have nucleotide sequence variation among individuals within a population, it may be useful in the analysis of forensic samples.
  • there are many methods known to those skilled in the art for typing nucleic acids with regard to polymorphisms. It should be understood that any such method is acceptable according to the invention.
  • One particular method is termed the "reverse dot blot" method.
  • 1) oligonucleotides bearing the sequences of various polymorphic forms of the polynucleotide region to be analysed are bound to membranes; 2) labeled, PCR-amplified fragments, derived from the sample to be genotyped, and corresponding to the polymorphic region ("target DNA") are allowed to hybridize to the bound oligonucleotides under conditions which only allow the hybridization of molecules with 100% complementary sequences; 3) unbound target DNA is removed; and 4) hybridized molecules are detected.
  • the specific genotype of the individual from whom the target sample was obtained (amplified), with regard to the polymorphic region of a polynucleotide of the invention, may thus be determined by screening a panel of probes containing the known polymorphic sequence variations of that region. It should be understood that the hybridization conditions may be adjusted by one of skill in the art so that limited amounts of non-complementarity, including single base mismatches, may be detected with this method.
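The logic of the reverse dot blot can be mimicked in software as exact-match lookup. In the sketch below, hybridization under conditions allowing only 100% complementarity is modeled, as a simplifying assumption, by requiring that a target strand contain the exact reverse complement of a membrane-bound oligonucleotide. The allele names and sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_dot_blot(panel, target_strands):
    """Return the sorted names of bound oligos that capture some target strand.

    `panel` maps allele name -> bound oligonucleotide sequence.
    A spot is positive only on a perfect (100%) complementary match.
    """
    positives = set()
    for allele, oligo in panel.items():
        probe_site = reverse_complement(oligo)
        if any(probe_site in strand.upper() for strand in target_strands):
            positives.add(allele)
    return sorted(positives)


# Hypothetical panel: two forms of a polymorphic region differing at one base.
panel = {"allele_A": "GGTCACTGA", "allele_G": "GGTCGCTGA"}
strand_a = "AAA" + reverse_complement(panel["allele_A"]) + "TTT"
strand_g = "AAA" + reverse_complement(panel["allele_G"]) + "TTT"

homozygous = reverse_dot_blot(panel, [strand_a])
heterozygous = reverse_dot_blot(panel, [strand_a, strand_g])
```

Relaxed conditions that tolerate single-base mismatches, as mentioned above, would correspond to replacing the exact substring test with an approximate match.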
  • administration of pharmaceutical compositions is accomplished orally or parenterally.
  • Methods of parenteral delivery include topical, intra-arterial (directly to the tumor), intramuscular, subcutaneous, intramedullary, intrathecal, intraventricular, intravenous, intraperitoneal, or intranasal administration.
  • these pharmaceutical compositions may contain suitable pharmaceutically acceptable carrier preparations which can be used pharmaceutically.
  • compositions for oral administration can be formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for oral administration.
  • Such carriers enable the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions and the like, for ingestion by the patient.
  • compositions for oral use can be obtained through combination of active compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores.
  • Suitable excipients are carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose, hydroxypropylmethyl-cellulose, or sodium carboxymethyl cellulose; and gums including arabic and tragacanth; and proteins such as gelatin and collagen.
  • disintegrating or solubilizing agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such as sodium alginate.
  • Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures.
  • Dyestuffs or pigments may be added to the tablets or dragee coatings for product identification or to characterize the quantity of active compound, i.e., dosage.
  • pharmaceutical preparations which can be used orally include push-fit capsules made of gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol.
  • Push-fit capsules can contain active ingredients mixed with a filler or binders such as lactose or starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers.
  • the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene glycol with or without stabilizers.
  • compositions for parenteral administration include aqueous solutions of active compounds.
  • the pharmaceutical compositions of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer' solution, or physiologically buffered saline.
  • Aqueous injection suspensions may contain substances which increase the viscosity ofthe suspension, such as sodium carboxymethyl cellulose, sorbitol, or dextran.
  • suitable solvents or vehicles for suspensions of the active compounds include fatty oils such as sesame oil, synthetic fatty acid esters such as ethyl oleate or triglycerides, or liposomes.
  • the suspension may also contain suitable stabilizers or agents which increase the solubility of the compounds to allow for the preparation of highly concentrated solutions.
  • penetrants appropriate to the particular barrier to be permeated or used in the formulation.
  • penetrants are generally known in the art.
  • compositions of the present invention may be manufactured in a manner that is known in the art, e.g. by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes.
  • the pharmaceutical composition may be provided as a salt and can be formed with many acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc. Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free base forms.
  • the preferred preparation may be a lyophilized powder in 1 mM-50 mM histidine, 0.1%-2% sucrose, 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer prior to use.
  • compositions suitable for use in the present invention include compositions wherein the active ingredients are contained in an effective amount to achieve the intended purpose.
  • the determination of an effective dose is well within the capability of those skilled in the art.
  • the therapeutically effective dose can be estimated initially either in cell culture assays, or in animal models, usually mice, rabbits, dogs, or pigs. The animal model is also used to achieve a desirable concentration range and route of administration. Such information can then be used to determine useful doses and routes for administration in humans.
  • a therapeutically effective dose refers to that amount of protein or its antibodies, antagonists, or inhibitors which ameliorate the symptoms or conditions.
  • Therapeutic efficacy and toxicity of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose lethal to 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index, and it can be expressed as the ratio LD50/ED50. Pharmaceutical compositions which exhibit large therapeutic indices are preferred.
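  • The therapeutic index defined above is a simple ratio, and the following minimal sketch shows the arithmetic. The function name and the example doses are hypothetical and chosen only for illustration; they are not values from the specification.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the ratio LD50/ED50.

    Both doses must be expressed in the same units (for example mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses (mg/kg), for illustration only.
example_index = therapeutic_index(ld50=200.0, ed50=4.0)
print(example_index)
```

  • A larger value means a wider margin between the effective dose and the lethal dose, which is why compositions with large indices are preferred.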
  • the data obtained from cell culture assays and animal studies is used in formulating a range of dosage for human use.
  • the dosage of such compounds lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage varies within this range depending upon the dosage form employed, sensitivity of the patient, and the route of administration.
  • the exact dosage is chosen by the individual physician in view of the patient to be treated. Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain the desired effect. Additional factors which may be taken into account include the severity of the disease state; age, weight and gender of the patient; diet, time and frequency of administration, drug combination(s), reaction sensitivities, and tolerance/response to therapy. Long acting pharmaceutical compositions might be administered every 3 to 4 days, every week, or once every two weeks depending on a half-life and clearance rate of the particular formulation.
  • Dosage amounts may vary from 0.1 to 100,000 micrograms per person per day, for example, 1 µg, 10 µg, 100 µg, 500 µg, 1 mg, 10 mg, and even up to a total dose of about 1 g per person per day, depending upon the route of administration.
  • Guidance as to particular dosages and methods of delivery is provided in the literature. See U.S. Patent Nos. 4,657,760; 5,206,344; or 5,225,212, hereby incorporated by reference. Those skilled in the art will employ different formulations for nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotide or polypeptides will be specific to particular cells, conditions, locations, etc.
  • a polynucleotide sequence according to the invention containing a mutation which is believed to be associated with osteoporosis can be statistically linked to osteoporosis by linkage analysis.
  • An animal model system exhibiting a particular phenotypic defect that is characteristic of the osteoporosis is selected.
  • a series of genetic crosses is performed in this animal model system between individuals having an observable mutant phenotype and normal individuals of a control strain.
  • At least one disease-related locus or a chromosomal marker that does not comprise a disease related locus is used as a marker in these crosses. If a statistically significant pattern of non-random assortment of the mutant trait with a marker locus is observed, the trait is linked to the marker locus.
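  • One common way to test for a "statistically significant pattern of non-random assortment" in cross data is a Pearson chi-square test on a 2x2 table of offspring counts (trait status by marker allele). The specification does not name a particular test, so the sketch below is an assumption, and the counts are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) for the table

                      marker allele A   marker allele B
        mutant trait         a                 b
        normal trait         c                 d

    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical offspring counts from a cross, for illustration only.
chi2, p_value = chi_square_2x2(40, 10, 12, 38)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

  • Under independent assortment the four cells are expected in proportion to the row and column totals; a small p-value suggests that the trait and the marker locus are co-inherited more often than chance would predict.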
  • linkage analysis can be performed on an existing human or other mammalian pedigree.
  • numerous genetic loci from affected and unaffected family members are compared.
  • Non-random assortment of a given genetic marker between affected and unaffected family members relative to the distributions observed for other genetic loci indicates that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical proximity to another that does so.
  • a polynucleotide sequence according to the invention can be used as a marker for a normal phenotype or for a phenotype associated with osteoporosis.
  • this sequence can be used as a marker for osteoporosis.
  • a sequence of interest can be used as a probe to screen genomic DNA from individuals by Southern blot analysis according to the method described above. If the sequence of interest is detected by Southern blot analysis, and the presence of this sequence is confirmed by direct sequencing, it can be concluded that the individual from which the genomic DNA has been isolated has an increased frequency for the development of osteoporosis for which the sequence is a marker.
  • the marker can also be used as an osteoporosis indicator according to the method of PCR.
  • a genomic DNA sample of interest can be analysed in a PCR reaction wherein one of the primers contains the marker sequence. If the marker sequence is present in the sample DNA, a PCR product will be produced.
  • the PCR primers can be designed such that they amplify a region containing the marker sequence.
  • the amplified product can be analysed by hybridization methods, described above, to determine the presence of the sequence of interest.
  • a polynucleotide according to the invention, containing a mutation which is believed to be associated with osteoporosis, can be used as a target for drug screening.
  • One method of drug screening utilizes eukaryotic or procaryotic host cells which are stably transformed with a polynucleotide according to the invention and either exhibit a particular phenotype characteristic of the presence of the polynucleotide or express a polypeptide or fragment encoded by the polynucleotide.
  • Such cells can be used for standard competitive binding assays.
  • these cells can be used to measure formation of a complex comprising the protein product or fragment of a polynucleotide according to the invention and the agent being tested.
  • these cells can be used to determine if the formation of a complex between the protein product or fragment of a polynucleotide according to the invention and a known ligand is interfered with by an agent being tested.
  • An alternative method for drug screening involves the use of eukaryotic cell lines or cells (such as described above) which contain a polynucleotide according to the invention that produces a defective protein.
  • the host cell lines or cells are grown in the presence of a test drug.
  • the rate of growth of the host cells is measured to determine if the compound is capable of regulating the growth of cells expressing a nonfunctional protein product of the polynucleotide according to the invention.
  • a drug that is useful according to the invention will increase or decrease the growth rate of a cell by at least 10%.
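  • The 10% criterion can be checked with a relative-change calculation against an untreated control. The following is a minimal sketch; the function names and the example growth rates are hypothetical, and the choice of an untreated culture as the baseline is an assumption.

```python
def percent_change(control: float, treated: float) -> float:
    """Relative change of the treated value versus the control, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (treated - control) / control


def meets_threshold(control: float, treated: float,
                    threshold_pct: float = 10.0) -> bool:
    """True if the treated value differs from control by at least threshold_pct,
    in either direction (increase or decrease)."""
    return abs(percent_change(control, treated)) >= threshold_pct


# Hypothetical growth rates (doublings per day), for illustration only.
print(meets_threshold(control=1.0, treated=0.85))  # a decrease of about 15%
print(meets_threshold(control=1.0, treated=1.05))  # an increase of about 5%
```

  • The same comparison applies to the other "at least 10%" criteria in this section (protein function, cellular localization, expression level), provided the measured quantity has a meaningful positive control value.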
  • the ability of the test compound to restore the function of the mutant gene protein by at least 10% can be measured by using an appropriate in vitro assay for function of the protein product of a gene (as described in Section F entitled "Identification and Characterization of Polymorphisms"). If the host cell lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the test compound to alter the cellular localization of the protein by at least 10% will be determined. Changes in the cellular localization of a protein of interest will be detected by performing cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a protein of interest can be determined by immunocytochemical methods well known in the art.
  • a method of drug screening may also involve the use of host eukaryotic cell lines or cells (described above) which have an altered gene that demonstrates an aberrant pattern of expression where the level of expression is either abnormally high or low, or the temporal pattern of expression is different from that of the wild type gene.
  • the ability of a test drug to alter the expression of a mutant form of a gene by at least 10% can be measured by Northern blot analysis, S1 nuclease analysis, primer extension or RNase protection assays, as described above.
  • cells can be engineered to express a reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein).
  • a transgenic animal whose genomic DNA contains a polynucleotide associated with a particular phenotypic defect that is characteristic of osteoporosis and a normal, control animal (not containing the polynucleotide) can be treated with a candidate drug according to the invention.
  • the ability of a candidate drug to ameliorate symptoms of the disease, by at least 10%, will be analysed by assessing the disease symptoms and their amelioration.
  • osteoporosis candidate gene list was compiled using gene or gene sequences selected from literature sources, using sequence homology, library subtraction and expression analysis. Expression analysis was performed using "guilt-by-association" queries to identify
  • Polymorphism discovery was by fSSCP as described in section F "Identification and Characterization of Polymorphisms".
  • the polymorphisms were mapped to cDNA sequences in the LifeSeqGold database (Incyte Genomics, Inc., Palo Alto, CA) to identify the affected gene.
  • the genomic Human Diversity Panel will be used where full genomic structure for the selected candidate genes is available to allow screening of the open reading frame of the gene including splice junctions.
  • a cDNA version of the HDP (generated from lymphoblastoid cell lines to obviate the need for intron/exon structure in 50% of human genes) will be used where full genomic structure for the selected candidate genes may not be available to permit screening of the open reading frame of the gene.
  • This HDP is derived from 47 consenting individuals from four ethnic groups (Caucasian, African-American, Asian and Hispanic).
  • families were identified through probands with a BMD Z score of at least -1.6 (equivalent to approximately the lower 5% of the normal distribution of BMD) at either the femoral neck or the lumbar spine (L2-L4).
  • a "proband" is defined as the first person identified with a particular phenotype (in this case low BMD) within a family.
  • the initial phase of family collection focused on nuclear families of European Caucasoid origin. These families were used primarily for a genome-wide scan for genetic determinants of BMD. BMD was measured in all participating family members and treated as a quantitative trait. First degree relatives of probands were invited to participate. These included parents, siblings and offspring over the age of twenty. Spouses could take part to act as controls and to assist the analysis of their children's genotype.
  • the size and nature of families will therefore depend on a number of factors including the age of the proband, family history of osteoporosis or fractures and whether other family members are willing to participate. It is expected, judged from previous experience, that the average number of volunteers per family will be five individuals.
  • the absolute minimum family that was accepted into the study is a pair of siblings, either concordant or discordant for BMD where one of the siblings was a proband.
  • a collection of large numbers of simplex families for linkage disequilibrium studies was carried out for the finer mapping stages of positional cloning and for systematic mapping of functional candidate genes.
  • families from other ethnic groups will provide genetic diversity for haplotype analysis to help identify the primary disease- predisposing sequences. Cape Town and Singapore were selected to collect material from ethnic groups.
  • probands were identified if they had a femoral neck/lumbar spine BMD equal to or lower than Z -2.0, were between 20 and 85 years of age, European, white Caucasian and fully mobile. They were excluded from the study if they had secondary osteoporosis, prednisolone usage at a dose of 7.5 mg per day for six months or longer or equivalent steroid doses of dexamethasone 0.75 mg per day or hydrocortisone 30 mg per day, were hypothyroid patients on thyroxine with TSH below the laboratory normal range, had a malignancy (including myeloma) within five years, had malabsorption, had an inflammatory bowel disease, had premenopausal (aged less than 45 years) amenorrhoea of greater than six months, other than pregnancy, had previous or current alcohol intake estimated at greater than 30 units per week for more than six months, chronic renal failure (creatinine > 150 µmol/l) or chronic liver dysfunction (AST >
  • Volunteers gave blood samples for DNA extraction for genetic studies, as well as blood samples for calcium, creatinine, liver function (if over 60 years), TSH and vitamin D (if over 60 years) tests, and a second voided urine sample for markers of bone turnover.
  • at least 10 ml of venous blood was collected from a forearm vein into EDTA tubes. Blood collected into plastic tubes was frozen straight away. Blood collected into glass tubes was transferred to plastic tubes before freezing. Once frozen, the blood was not thawed until DNA extraction took place. DNA extraction was performed using standard procedures. The blood was frozen as quickly as possible to -70°C and then stored at -70°C. A 10-ml venous blood sample was also taken from all subjects for biochemical assays of calcium, creatinine, liver function, TSH and vitamin D. Blood was collected into a plain container. Separated serum was stored at -70°C.
  • Second voided urine samples for analysis of biochemical markers of bone turnover were taken. These samples were stored at -70°C. In addition to BMD at femoral neck and lumbar spine, height and weight were measured. Volunteers were scanned at the femoral neck and lower spine (L2-L4). For the femoral neck, the volunteer was placed in the dorsal decubitus position with a 10-degree internal rotation of the hip, according to the manufacturer's protocol. For a satisfactory lumbar spine scan (e.g. no scoliosis, severe degenerative disease or obvious fracture), the volunteer was placed, as described in the manufacturer's manual, in a comfortable supine position with legs raised and supported so as to ensure that the lumbar spine was as horizontal as possible. The axis of the spine should be parallel to the axis of the scanning machine.
  • Bone mineral density measurements were performed using dual energy X-ray absorptiometry (DXA) scanning.
  • the bone density data was standardized by the use at each center of the same male and female reference population databases for hip and spine.
  • the Z score was calculated by subtracting the predicted BMD from the actual BMD, and then dividing the difference by the reference population standard deviation.
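  • The Z score calculation described above can be sketched as follows. The function names and the example BMD values are hypothetical; the default cutoff of -2.0 follows the proband criterion stated earlier in this section, and the predicted BMD and standard deviation are assumed to come from the reference population database.

```python
def bmd_z_score(actual_bmd: float, predicted_bmd: float,
                reference_sd: float) -> float:
    """Z = (actual BMD - predicted BMD) / reference population SD.

    All three values must be in the same units (for example g/cm^2).
    """
    if reference_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (actual_bmd - predicted_bmd) / reference_sd


def meets_proband_cutoff(z_score: float, cutoff: float = -2.0) -> bool:
    """True if the Z score is equal to or lower than the cutoff."""
    return z_score <= cutoff


# Hypothetical measurements (g/cm^2), for illustration only.
z = bmd_z_score(actual_bmd=0.75, predicted_bmd=1.0, reference_sd=0.125)
print(z, meets_proband_cutoff(z))
```

  • A negative Z score means the measured BMD is below the value predicted for the reference population.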
  • Osteoporosis Bone mineral density was analysed as a quantitative trait in probands and family members. Selection of probands with a low BMD increased the power to detect linkage of genetic marker loci.
  • Power is defined here as the probability of observing positive evidence for linkage at a single additive quantitative locus, assuming that a genetic susceptibility locus exists, using the variance component model of Amos (1994). Positive evidence of linkage means a LOD score of 3.0 (p ≤ 0.001) or greater, the accepted scientific standard.
  • Narrow sense heritability (a measure of the genetic contribution of a specific locus).
  • the broad sense heritability for osteoporosis was estimated to be between 0.3 and 0.8.
  • Theoretical calculations were based on the analysis of 108 nuclear families with an average of 2.9 phenotyped siblings per family using the same recruitment strategy as described above, assuming:
  • That the marker locus is highly informative.
  • family recruitment begins initially using the proband inclusion criterion of a BMD Z score of -2.0.
  • a more stringent proband criterion of Z score of -2.0 or less will be adopted.
  • BMD was corrected for height, weight, age and sex.
  • the experimental threshold for positive evidence for linkage was p ≤ 0.001 at one locus or p ≤ 0.01 at two or more adjacent loci.
  • SNPs Single nucleotide polymorphisms
  • Incyte's proprietary fSSCP method: fluorescently labeled primers were synthesized and PCR was performed on 47 DNAs from a Coriell-derived Human Diversity Panel. The PCR products were electrophoresed on an ABI 377 machine and 8% nondenaturing, 12 cm SSCP gels were used. The resulting traces were aligned in ABI Genotyper software and where variant traces (indicating underlying polymorphisms) were found, examples of each variant were sequenced.
  • a pair of oligonucleotides for amplification by PCR was designed on either side of each biallelic polymorphism to produce a product size between 50bp and 350bp.
  • a sequencing oligonucleotide was designed to end within 30 bp either 5' or 3' to each polymorphic site. All amplification oligonucleotides used to generate the complementary strand to the sequencing primer were labeled with a 5'-biotin. Examples of the particular sequencing primers used are found in Table 10 of U.S.
  • Each reaction used 20 ng DNA (dried down), 0.6 units of AmpliTaq Gold™ DNA polymerase, 1X PCR Buffer R, 2.5 mM MgCl2, 1 mM dNTP, and 10 pmol of each PCR oligonucleotide in a final volume of 10 µl.
  • the PCR cycling conditions used were: 95°C for 12 min; 45 cycles of 94°C for 15 sec, TA (the annealing temperature) for 15 sec and 72°C for 30 sec; and 72°C for 5 min.
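  • The cycling conditions above can be written down as a small data structure, which makes the programmed hold time easy to check. This is only a sketch: the step labels are descriptive names added here, the annealing temperature is left unset because it is assay-specific, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    label: str                # descriptive name (not from the specification)
    temp_c: Optional[float]   # None where the temperature is assay-specific
    seconds: int


INITIAL_STEP = Step("initial 95 C hold", 95.0, 12 * 60)
CYCLE_STEPS = (
    Step("denature", 94.0, 15),
    Step("anneal", None, 15),   # annealing temperature depends on the primers
    Step("extend", 72.0, 30),
)
N_CYCLES = 45
FINAL_STEP = Step("final 72 C hold", 72.0, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(step.seconds for step in CYCLE_STEPS)
    return INITIAL_STEP.seconds + N_CYCLES * per_cycle + FINAL_STEP.seconds


print(total_hold_seconds() / 60, "minutes of programmed hold time")
```

  • Real run time on a thermal cycler is longer than this figure because of ramping between temperatures.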
  • sequencing oligonucleotide was annealed to the template by denaturing at 80°C for 2 min and then cooling to room temperature for 10 min. Each marker/sample combination was then sequenced/genotyped by Pyrosequencing™ on a PSQ96™ (Pyrosequencing AB). Genotype results were stored in the PSQ Oracle® database ready for statistical analysis.
  • ACLP Aortic carboxypeptidase-like protein
  • NM_001129 Protein NP_001120
  • the ACLP, also known as the adipocyte enhancer (AE)-binding protein 1 (AEBP1), is a transcriptional repressor with carboxypeptidase activity and may play a role in adipogenesis.
  • AEBP1 adipocyte enhancer(AE)-binding protein 1
  • A kinase anchor protein 9 (AKAP9) mRNA: NM_005751 Protein: NP_005742 mRNA: NM_147166 Protein: NP_671695 mRNA: NM_147171 Protein: NP_671700 mRNA: NM_147185 Protein: NP_671714
  • AKAP9, also known as YOTIAO, is a scaffold protein that binds type I protein phosphatase (PP1) and cAMP-dependent protein kinase (PKA) to NMDA receptors. AKAP9 also anchors protein kinases and phosphatases to the centrosome and the Golgi apparatus.
  • 3) Bone morphogenetic protein receptor, type II (BMPR2)
  • Variant 1 mRNA: NM_001204 Protein: NP_001195
  • Variant 2 mRNA: NM_033346 Protein: NP_203132
  • BMPR2, also known as the serine/threonine kinase type II activin receptor-like kinase, is a transforming growth factor beta (TGF-beta) receptor that can also bind type I receptors and is involved in bone and other morphogenesis. Mutations in the gene are associated with familial primary pulmonary hypertension.
  • TGF-beta transforming growth factor beta
  • Fibroblast growth factor receptor 2 (FGFR2) mRNA: NM_000141 Protein: NP_000132 mRNA: NM_022969 Protein: NP_075258 mRNA: NM_022970 Protein: NP_075259 mRNA: NM_022971 Protein: NP_075260 mRNA: NM_022972 Protein: NP_075261 mRNA: NM_022973 Protein: NP_075262 mRNA: NM_022974 Protein: NP_075263 mRNA: NM_022975 Protein: NP_075264 mRNA: NM_022976 Protein: NP_075265 mRNA: NM_023028 Protein: NP_075417 mRNA: NM_023029 Protein: NP_075418 mRNA: NM_023030 Protein: NP_075419 mRNA: NM_023031 Protein: NP_075420
  • FGFR2 is a high-affinity receptor, depending on the isoform, for acidic, basic and/or keratinocyte growth factor.
  • This receptor is a member of the fibroblast growth factor receptor family, where amino acid sequence is highly conserved between members and throughout evolution.
  • FGFR family members differ from one another in their ligand affinities and tissue distribution.
  • a full-length representative protein consists of an extracellular region, composed of three immunoglobulin-like domains, a single hydrophobic membrane-spanning segment and a cytoplasmic tyrosine kinase domain. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals, ultimately influencing mitogenesis and differentiation.
  • Mutations in this gene are associated with many craniosynostotic syndromes and bone malformations.
  • the genomic organization of this gene encompasses 20 exons.
  • FOSB FBJ murine osteosarcoma viral oncogene homolog B (FOSB) mRNA: NM_006732 Protein: NP_006723
  • FOSB is a DNA-binding member of the Fos family that forms the AP-1 transcription factor complex with Jun proteins. FOSB may be involved in the pathogenesis of breast tumors.
  • FSTL1, also known as follistatin-related protein, is a nuclear activin-binding protein that is induced by TGF beta 1 (TGFB1) and inhibits cell proliferation.
  • FSTL1 is also an autoantigen in systemic rheumatic diseases.
  • FSTL1 is abundantly expressed (0.33%) in trabecular bone libraries.
  • IGFBP5 Insulin-like growth factor binding protein 5
  • mRNA NM_000599
  • Protein NP_000590
  • IGFBP5 is a member of the insulin-like growth factor binding family of proteins that bind to and modulate insulin-like growth factor activity; it regulates bone formation and may serve in muscle and cartilage development. IGFBP5 shows tissue-specific expression, in osteosarcoma and, at lower levels, in liver, kidney, and brain. IGFBP5 can also alter the interaction of insulin-like growth factors with their cell surface receptors.
  • IRS1, also known as HIRS-1, is a cytoplasmic docking protein that mediates IGF1 signaling to SH2-containing effector molecules such as Grb2 and PI3-kinase and inhibits apoptosis. IRS1 also plays a role in cell proliferation and glucose transport.
  • Alpha V subunit integrin (ITGAV) mRNA: NM_002210 Protein: NP_002201 ITGAV is a subunit of the vitronectin receptor that is involved in cell-cell and cell-matrix interactions, plays a role in tumor angiogenesis and may contribute to tumorigenicity of cutaneous malignant melanoma. Integrins serve as major receptors for extracellular matrix-mediated cell adhesion and migration, cytoskeletal organization, cell proliferation, survival, and differentiation. Alpha-V integrins comprise a subset sharing a common alpha-V subunit combined with 1 of 5 beta subunits (beta-1, -3, -5, -6, or -8).
  • alpha-V integrins recognize the sequence RGD in a variety of ligands (vitronectin, fibronectin, osteopontin, bone sialoprotein, thrombospondin, fibrinogen, von Willebrand factor, tenascin, and agrin) and, in the case of alpha-V-8, laminin and type IV collagen.
  • Vitronectin is a multifunctional glycoprotein present in blood and in the extracellular matrix. It binds glycosaminoglycans, collagen, plasminogen and the urokinase-receptor, and also stabilizes the inhibitory conformation of plasminogen activation inhibitor-1. By its localization in the extracellular matrix and its binding to plasminogen activation inhibitor-1, vitronectin can potentially regulate the proteolytic degradation of this matrix. In addition, vitronectin binds to complement, to heparin and to thrombin-antithrombin III complexes, implicating its participation in the immune response and in the regulation of clot formation. The biological functions of vitronectin can be modulated by proteolytic enzymes, and by exo- and ecto-protein kinases present in blood.
  • Vitronectin contains an Arg-Gly-Asp (RGD) sequence, through which it binds to the integrin receptor alpha v beta 3, and is involved in the cell attachment, spreading and migration.
  • RGD Arg-Gly-Asp
  • Bone resorption requires the tight attachment of the bone-resorbing cells, the osteoclasts, to the bone mineralized matrix.
  • Integrins, a class of cell surface adhesion glycoproteins, play a key role in the attachment process. Most integrins bind to their ligands via the RGD tripeptide present within the ligand sequence. The interaction between integrins and ligands results in bidirectional transfer of signals across the plasma membrane. Tyrosine phosphorylation occurs within cells as a result of integrin binding to ligands and probably plays a role in the formation of the osteoclast clear zone, a specialized region of the osteoclast membrane maintained by cytoskeletal structure and involved in bone resorption.
  • Human osteoclasts express alpha 2 beta 1 and alpha v beta 3 integrins on their surface.
  • the alpha v beta 3 integrin, a vitronectin receptor, plays an essential role in bone resorption.
  • echistatin, an RGD-containing protein from a snake venom, binds to the alpha v beta 3 integrin and blocks bone resorption both in vitro and in vivo. (Dresner-Pollak R, Rosenblatt M. J Cell Biochem 1994 Nov;56(3):323-30).
  • Alpha-V integrins have been implicated in many developmental processes and are therapeutic targets for inhibition of angiogenesis and osteoporosis.
  • the ablation of the gene for the alpha-V integrin subunit, eliminating all 5 alpha-V integrins, although causing lethality, allows considerable development and organogenesis including, most notably, extensive vasculogenesis and angiogenesis.
  • These liveborn alpha-V-null mice consistently exhibit intracerebral and intestinal hemorrhages and cleft palates.
  • KJ_bonlib4 is also known as p97, DAP5, NAT1 and Eukaryotic translation initiation factor 4G-like 1.
  • KJ_bonlib4 is a translational repressor that binds EIF3 and EIF4A, but not EIF4E, promotes IFNG-induced programmed cell death and is cleaved by caspase-3 (CASP3) during apoptosis.
  • KJ_bonlib7 has an unknown function.
  • KJ_opgba1 is a member of the sulfatase family, which hydrolyzes sulfate esters, and has a region of moderate similarity to a region of N-acetylglucosamine-6-sulfate sulfatase (human GNS), which is associated with Sanfilippo disease IIID upon deficiency.
  • KJ_opgba13 is also known as DEPP.
  • Pfam model results indicate that KJ_opgba13 is a thermophilic metalloprotease.
  • KJ_opgba14 is also known as NESHBP and TARSH.
  • KJ_opgba14 contains a fibronectin type III domain, which is involved in cell surface binding.
  • KJ_opgba47 is a member of the cytochrome b561 family, has moderate similarity to uncharacterized cytochrome b561 (human CYB561), which is an integral membrane protein found in neuroendocrine secretory vesicles.
  • KJ_opgba115 has a strong similarity to B cell phosphoinositide 3-kinase (PI3K) adaptor (mouse Bcap), which binds to SH2 domains of PI3K and may recruit PI3K to glycolipid-enriched microdomains leading to BCR-mediated PI3K activation.
  • PI3K B cell phosphoinositide 3-kinase
  • KJ_opgba136 has an unknown function.
  • LUM is an extracellular matrix keratan sulfate proteoglycan that may be involved in the development and maintenance of corneal transparency.
  • MMP1 Matrix metalloproteinase 1
  • MMP1, also known as interstitial collagenase, is a matrix metalloprotease that cleaves fibrillar collagen type I to gelatin, functions in collagen turnover in most tissues and may play a role in cartilage destruction in rheumatoid arthritis.
  • Mitogen-activated protein kinase 8 (MAPK8)
  • MAPK8 isoform 1 mRNA: NM_139049 Protein: NP_620637
  • MAPK8 isoform 2 mRNA: NM_002750 Protein: NP_002741
  • MAPK8 isoform 3 mRNA: NM_139046 Protein: NP_620634
  • MAPK8 isoform 4 mRNA: NM_139047 Protein: NP_620635
  • MAPK8 is also known as JNK, JNK1, PRKM8, SAPK1, JNK1A2 and JNK21B1/2.
  • MAPK8 is a serine-threonine kinase that regulates c-Jun (JUN) and plays a role in the induction of apoptosis and other cellular responses to stressors such as ultraviolet light, reactive oxygen and hypoxia.
  • NFKB2 Nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (NFKB2) mRNA: NM_002502 Protein: NP_002493 NFKB2 is a transcription factor involved in immune response; it may coordinate pre-mRNA splicing and transcription and may play a role in HIV infection, leukemia, breast cancer and lymphoid neoplasia.
  • NOTCH3 encodes the third discovered human homologue of the Drosophila melanogaster type I membrane protein notch.
  • notch interaction with its cell-bound ligands (delta, serrate) establishes an intercellular signalling pathway that plays a key role in neural development.
  • Mutations in NOTCH3 have been identified as the underlying cause of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL).
  • CADASIL cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy
  • Alignment of available genomic sequence to the CDS contig identified at least 29 exons.
  • Notch may be a receptor with different functional domains, the intracellular domain having the signal-transducing activity of the intact protein and the extracellular domain possessing a ligand-binding and regulatory activity.
  • OSF2 Osteoblast specific factor 2
  • mRNA NM_006475
  • Protein NP_006466 OSF2 is also known as periostin.
  • OGN is a member of the keratan sulfate proteoglycan group of the small leucine-rich proteoglycan family and may play a role in regulating corneal transparency.
  • Plasminogen activator inhibitor 1 (PAI1) mRNA: NM_000602 Protein: NP_000593
  • PAI1 is a member of the serpin family of serine protease inhibitors and plays a role in regulating blood coagulation by inhibiting fibrinolysis, contributes to tumor progression and is a risk factor for cardiovascular diseases.
  • PTGS1, also known as COX1, catalyzes the conversion of arachidonic acid to prostaglandin H2 and may be involved in inflammation and blood coagulation. PTGS1's activity is irreversibly inhibited by aspirin.
  • SCYA2 CCL2 chemokine (C-C motif) ligand 2 (SCYA20) mRNA: NM_002982 Protein: NP_002973
  • SCYA2, also known as monocyte secretory protein JE; monocyte chemoattractant protein-1; monocyte chemotactic and activating factor; small inducible cytokine subfamily A (Cys-Cys), member 2; and monocyte chemotactic protein 1, and homologous to mouse Sig-je, is a cytokine A2, a CC chemokine that attracts monocytes, memory T-cells, natural killer cells and endothelial cells.
  • SCYA2 plays a role in the inflammatory response to infection and in inflammatory diseases including arthritis, multiple sclerosis and atherosclerosis.
  • TIMP Tissue inhibitor of metalloproteinase 1 (TIMP1) mRNA: NM_003254 Protein: NP_003245
  • TIMP is also known as erythroid potentiating activity, EPA, EPO, HCI and CLGI. TIMP inhibits matrix metalloproteases including MMP2, stimulates growth of erythroid cells and attenuates metastasis of tumorigenic cells when overexpressed.
  • TGM1 Transglutaminase 2 (TGM1) mRNA: NM_000359 Protein: NP_000350
  • TGM1 is membrane bound and catalyzes the crosslinking of extracellular matrix (ECM) proteins and other cellular proteins, modulates the ECM, cell growth, adhesion, signaling, and apoptosis, and has been associated with Alzheimer's, Huntington, and celiac disease.
  • ECM extracellular matrix
  • TNFAIP6 Tumor Necrosis Factor-Alpha-Induced Protein 6
  • TNFAIP6 is a metalloprotease. TNFAIP6 is transcribed in normal fibroblasts and activated by binding of TNF-alpha. Similar to CD44, TNFAIP6 binds hyaluronate and is involved in plasmin inhibition and the inhibition of inflammation.
  • VEGF Vascular endothelial growth factor
  • VEGF, which is structurally related to platelet-derived growth factor, induces endothelial cell proliferation and migration, vascular permeability, angiogenesis and NO-mediated signal transduction.
  • Many polypeptide mitogens such as basic fibroblast growth factor and platelet-derived growth factors are active on a wide range of different cell types.
  • vascular endothelial growth factor is a mitogen primarily for vascular endothelial cells. Data suggest that mutations of p53 and activation of the Ras/MAPK pathway may play a role in the induction of VEGF expression in human colorectal cancer.
  • vascular endothelial growth factor by membrane-type 1 matrix metalloproteinase stimulates human glioma xenograft growth and angiogenesis. Both VEGF-induced PI 3-kinase activation and beta(1) integrin-mediated binding to fibronectin are required for the recruitment and activation of PKC alpha.
  • Quantitative transmission disequilibrium tests for association between 100 SNP loci and 13 phenotypic traits are reported.
  • BMD bone mineral density
  • For each marker-trait combination the significance of stratification is tested.
  • the significance of association between marker and trait is tested both unpartitioned, and partitioned into between-family and within-family components.
  • SNP-trait associations are significant at the 1 % level. The most notable of these is between SNP locus ITGA08 and BMD in lumbar vertebrae 2 to 4 in males. The effect of this association is 4.1 % of the mean value for calibrated BMD, and 0.237 units for the Z score.
  • the traits comprise calibrated bone mineral density (BMD) values for four skeletal sites, the corresponding Z scores, the occurrence of fractures, and four other traits.
  • the skeletal sites studied are lumbar vertebrae 2 to 4 (mean value), the neck of the femur, the trochanter, and the total of BMD values over three sites in the hip (neck of femur, trochanter and 'inter').
  • Calibrated BMD values are given in units of g/cm².
  • the four other traits, which are not directly related to osteoporosis (though they are associated with it) and which are included for purposes of comparison, are the ages of onset and cessation of periods in females, and height and weight in both
  • Statistical analysis is performed using the software QTDT. For each marker-trait combination, the significance of stratification is tested. If stratification is present, the between-pedigrees component of the marker-trait association is not entirely due to linkage and only the within-pedigree component can legitimately be used to measure the effect of the locus. However, if stratification is absent the unpartitioned association provides a stronger test of significance and a more precise measure of the effect of the locus. Therefore each marker-trait combination is tested for association both without partitioning of the association, and with partitioning into between- and within-pedigree components. The interpretation of the results then depends on the outcome of the test for stratification. These analyses are performed both for the sexes pooled, and also using phenotypic data from males only and females only.
  • the script contained in the file heritability/run_QTDT_heritability is then run.
  • This script fits a QTDT model with options -a- -We -Veg to each phenotypic trait for the sexes pooled, both for the complete set of phenotypic values and with the exclusion of outliers for BMD.
  • options indicate that no model of association is to be fitted, and that the variance components Ve (null model) and Ve + Vg (full model) are to be estimated.
  • Heritability is then estimated as
  • the corresponding analysis was performed using the phenotypic values of males only and using those of females only.
  • This script fits a QTDT model with options -at -Weg for each phenotypic trait, for the sexes pooled, in combination with each SNP locus. These options indicate that the association between the trait and the SNP locus is to be estimated, and that the model is also to include the variance components Ve + Vg. However, the association is not to be partitioned into between-pedigree and within-pedigree components.
  • the same model is fitted for the phenotypes of males only and for females only. The chi-square value for association of each SNP with each trait is extracted from these files and these values are stored.
  • the chi-square value needed to achieve significance following the Bonferroni correction is presented for all BMD traits at a single SNP locus (8 tests), for all SNP loci for a single trait (100 tests) and for all SNP-trait combinations (800 tests).
  • the frequency of the rare allele at each SNP locus is extracted from the file and is also presented in each of these worksheets.
  • the mean and heritability of each phenotypic trait, for the sexes pooled (with and without the inclusion of outliers) and for each sex individually, are presented in Table 1.
  • the exclusion of outliers makes little difference to either the mean or the heritability.
  • the BMD traits calibrated BMD values and Z scores
  • the exclusion of outliers consistently raises the mean value and lowers the heritability. This is to be expected, as the outliers were identified on the basis of high BMD.
  • Heritability of the BMD traits is strikingly lower in females than in males. In particular, that of CalL2_L4BMD is zero. Conversion of the BMD values to Z scores causes a small but consistent increase in heritability in females, but has no consistent effect in males.
  • the test for association between a phenotypic trait and a SNP within pedigrees is less powerful than the non-partitioned test, provided that stratification is absent. It is therefore to be expected that the chi-square value for non-partitioned association (-at model in QTDT) will be larger than that for association within pedigrees (-ao model), unless there is strong association due to stratification in the opposite direction to that caused by linkage. In the present case there were no exceptions to this expectation.
  • the significance tests for stratification (given by the -ap model) are summarized in Tables 2 and 3. There are a few significant values (P < 0.05), but not many more than the 5 % expected by chance. It is therefore concluded that stratification is not strong or widespread in these data, and attention is therefore focused on the non-partitioned model of association.
  • the SNP loci OGN_02 and OMD_03 show significant associations with phenotypic traits in all three sub-sets of the data (sexes pooled, males only and females only), and OMD_01 shows six associations that are significant at the 1 % level.
  • the difference between either homozygote and the heterozygote in these marker-trait associations ranges from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores. In the cases where stratification is significant, the within-family association effect is consistently much smaller than the unpartitioned effect.
  • the strongest and most consistent associations between SNP loci and phenotypic traits related to BMD are at loci OGN_02, OMD_03, OMD_01 and ITGA08.
  • the first three of these each show several associations significant at the 5 % level, with effects ranging from 2.8 % to 10.4 % of the mean value for the trait in the case of the calibrated BMD traits, and from 0.114 to 0.448 units for the Z scores.
  • ITGA08 shows significant association only with BMD in lumbar vertebrae 2 to 4 in males, but this effect is significant at the 1 % level. Its magnitude lies in the same range as those at the other three loci.
  • Table 10 of this application provides a list of the polymorphism markers of the thirty-two (32) genes listed in Example 9 which have been found to have various effects on susceptibility to low BMD and bone damage.
  • Tables 11 and 12 rank the polymorphic markers into groups by the strength of their association with susceptibility to low BMD, separately by sex (Table 11: males; Table 12: females). Markers ranked in Group A show the strongest association with susceptibility to low BMD, those in Group B show less association, and those in Group C show the least.
  • the gene by gene interactions were assessed by logistic regression.
  • the interaction was assessed for every pair of OMD-ITGAV SNPs (OMD01 and OMD03 versus ITGAV02, 08, 11 and 12).
  • the logistic regression models were
  • MEN: OP or Frx (0 or 1) = OMD(snp) + ITGAV(snp) + OMG(snp) + ITGAV(snp) + age + weight
  • WOMEN: OP or Frx (0 or 1) = OMD(snp) + ITGAV(snp) + OMG(snp) + ITGAV(snp) + age + height
  • the weight was not included for the women and height for the men since they did not show significant effect on OP or Frx in the sample set used.
  • the study group was made up of unrelated individuals with osteoporosis (OP) from the FAMOS cohort who (1) had been diagnosed with OP, (2) had fractures and a maximum Z (spine) of -1, or (3) were probands with a maximum Z (spine) of -0.5.
  • OP osteoporosis
  • the odds ratio of the predisposing variant was computed by logistic regression analyzing separately the two or three ITGAV genotypes.
  • the significance level of the association was determined by computing the Wald chi-square for the OMD SNP (model tested: OP = OMDsnp + age + height (if women) + weight (if men)) independently for each ITGAV genotype. This is illustrated by the plots in Figure 1 A-E.
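
The heritability estimate mentioned above relies on the variance components Ve and Vg fitted by QTDT, but the expression itself is not reproduced in this text. As a hedged reading, assuming the usual variance-components definition (an assumption, not a formula taken from this application), heritability is the share of total phenotypic variance attributed to the genetic component:

```latex
% Assumed standard definition; V_g and V_e are the genetic and environmental
% variance components from the full (-Veg) model.
h^2 = \frac{V_g}{V_e + V_g}
```

Under this reading, a trait with an estimated Vg of zero has a heritability of zero, which is consistent with the zero heritability reported for CalL2_L4BMD in females.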
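
The Bonferroni-corrected chi-square thresholds described above (for 8, 100 and 800 tests) can be sketched as follows. This is a minimal illustration that assumes each association test has one degree of freedom and a family-wise significance level of 0.05; neither value is stated in the text, and the function name is hypothetical. For one degree of freedom the chi-square quantile equals the square of a two-sided standard normal quantile, so only the Python standard library is needed.

```python
from statistics import NormalDist


def bonferroni_chi2_threshold(n_tests, alpha=0.05):
    """Chi-square value (1 d.f., assumed) needed for significance after
    a Bonferroni correction over n_tests tests."""
    per_test_p = alpha / n_tests  # Bonferroni: divide alpha by the number of tests
    # For 1 d.f., P(chi2 > x) = 2 * (1 - Phi(sqrt(x))), so x = z**2 with
    # z the upper per_test_p / 2 quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - per_test_p / 2.0)
    return z * z


# Test counts taken from the text: 8 BMD traits at one SNP locus,
# 100 SNP loci for one trait, and 800 SNP-trait combinations.
thresholds = {m: bonferroni_chi2_threshold(m) for m in (8, 100, 800)}
for m, value in thresholds.items():
    print(f"{m:4d} tests: chi-square threshold = {value:.2f}")
```

The threshold rises with the number of tests, so an association that clears the 8-test threshold may still fall short of the 800-test one.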
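
The stratified analysis described above (an odds ratio and a Wald chi-square for the OMD variant, computed separately within each ITGAV genotype) can be illustrated with a simplified sketch. It assumes a single binary predictor (carrier versus non-carrier of the predisposing OMD variant) and omits the age, height and weight covariates, so it is not the model fitted in the application. In that reduced case the logistic-regression coefficient is the log odds ratio of the 2x2 table and its standard error is the square root of the summed reciprocal cell counts. The function name, stratum labels and counts are hypothetical.

```python
import math


def odds_ratio_wald(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Return (odds ratio, Wald chi-square) for one binary predictor.

    Equivalent to a logistic regression of case status on carrier status
    with no other covariates (a simplifying assumption).
    """
    counts = (case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((case_carrier * ctrl_noncarrier) /
                      (case_noncarrier * ctrl_carrier))
    std_err = math.sqrt(sum(1.0 / n for n in counts))
    wald_chi2 = (log_or / std_err) ** 2  # 1 d.f. Wald statistic
    return math.exp(log_or), wald_chi2


# Hypothetical counts, one 2x2 table per ITGAV genotype stratum.
strata = {
    "ITGAV genotype 1": (30, 20, 20, 30),
    "ITGAV genotype 2": (25, 25, 25, 25),
}
results = {name: odds_ratio_wald(*cells) for name, cells in strata.items()}
for name, (odds_ratio, wald) in results.items():
    print(f"{name}: OR = {odds_ratio:.2f}, Wald chi-square = {wald:.2f}")
```

An odds ratio that differs markedly between strata, as in these invented counts, is the pattern a gene-by-gene interaction would produce.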

Abstract

The present invention relates to polynucleotides associated with susceptibility to low bone mineral density and/or to the bone damage generally associated with human diseases such as, in particular, osteoporosis. The invention also relates to polymorphic polynucleotides associated with osteoporosis. The invention relates to methods for determining whether a particular polymorphism predisposes an individual to the development of osteoporosis or is associated with that development. The invention further relates to methods for detecting the presence of one or more polymorphisms that are indicators of osteoporosis, and to the use of the novel polynucleotides described above for drug development and the treatment of disease.
EP02805650A 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis Withdrawn EP1466012A2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US34271101P 2001-12-20 2001-12-20
US342711P 2001-12-20
US42355902P 2002-11-04 2002-11-04
US423559P 2002-11-04
PCT/US2002/040948 WO2003054218A2 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis

Publications (1)

Publication Number Publication Date
EP1466012A2 2004-10-13

Family

ID=26993153

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02805650A Withdrawn EP1466012A2 (en) 2001-12-20 2002-12-19 Nucleotide polymorphisms associated with osteoporosis

Country Status (4)

Country Link
EP (1) EP1466012A2 (fr)
AU (1) AU2002366709A1 (fr)
CA (1) CA2471376A1 (fr)
WO (1) WO2003054218A2 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0301715D0 (en) * 2003-01-24 2003-02-26 King's College London Detection of predisposition to osteoporosis
WO2005005471A2 (en) 2003-07-11 2005-01-20 Develogen Aktiengesellschaft Use of products of the secreted proteins DG153 or DG177 for the prevention and treatment of diseases of the pancreas and/or obesity and/or metabolic syndrome
WO2020076900A1 (en) * 2018-10-09 2020-04-16 Genecentric Therapeutics, Inc. Detection of tumor mutation burden with an RNA substrate

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO03054218A3 *

Also Published As

Publication number Publication date
AU2002366709A8 (en) 2003-07-09
AU2002366709A1 (en) 2003-07-09
WO2003054218A2 (fr) 2003-07-03
WO2003054218A3 (fr) 2004-02-19
CA2471376A1 (fr) 2003-07-03

Similar Documents

Publication Publication Date Title
US20090317816A1 (en) Methods for identifying risk of breast cancer and treatments thereof
JP2007185199A (en) Locus for idiopathic generalized epilepsy, mutations of the locus, and methods of using the locus for the assessment, diagnosis, prognosis, or treatment of epilepsy
WO2000019883A9 (en) Compositions and methods for diagnosing and treating diseases
JP2004520005A (en) Osteolevin gene polymorphisms
US6551812B1 (en) Compositions and methods relating to the peroxisomal proliferator activated receptor-α mediated pathway
WO2003054166A2 (en) Nucleotide polymorphisms associated with osteoarthritis
US20050064440A1 (en) Methods for identifying risk of melanoma and treatments thereof
JP4997113B2 (en) Methods and compositions for predicting drug response
US20050277118A1 (en) Methods for identifying subjects at risk of melanoma and treatments thereof
WO2001020031A2 (en) Polymorphisms in a klotho gene
US11473143B2 (en) Gene and mutations thereof associated with seizure and movement disorders
JP2009165473A (ja)
US20040018533A1 (en) Diagnosing predisposition to fat deposition and therapeutic methods for reducing fat deposition and treatment of associated conditions
US20050233321A1 (en) Identification of novel polymorphic sites in the human mglur8 gene and uses thereof
US20170253929A1 (en) Novel Homeobox Gene
EP1466012A2 (en) Nucleotide polymorphisms associated with osteoporosis
JP2006526986A (en) Method for diagnosing inflammatory colitis
US20030175797A1 (en) Association of protein kinase C zeta polymorphisms with diabetes
JP2006506988A (en) Human type II diabetes gene SLIT-3 located on chromosome 5q35
EP2112229A2 (en) Methods for identifying risk of breast cancer and treatments thereof

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20040720

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20050728