WO2002083721A1 - Protein cluster v - Google Patents


Info

Publication number
WO2002083721A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
nucleic acid
protein
polypeptide
acid molecule
Application number
PCT/SE2002/000730
Other languages
French (fr)
Inventor
Anneli Attersand
Original Assignee
Pharmacia Ab
Application filed by Pharmacia Ab
Priority to CA002440846A (CA2440846A1)
Priority to EP02718767A (EP1377603A1)
Priority to JP2002581476A (JP2005500020A)
Publication of WO2002083721A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals

Definitions

  • the present invention relates to the identification of a human gene family expressed in metabolically relevant tissues.
  • the genes encode a group of polypeptides referred to as "Protein Cluster V" which are predicted to be useful in the diagnosis of metabolic diseases, such as obesity and diabetes, as well as in the identification of agents useful in the treatment of the said diseases.
  • Metabolic diseases are defined as any of the diseases or disorders that disrupt normal metabolism. They may arise from nutritional deficiencies; in connection with diseases of the endocrine system, the liver, or the kidneys; or as a result of genetic defects. Metabolic diseases are conditions caused by an abnormality in one or more of the chemical reactions essential to producing energy, to regenerating cellular constituents, or to eliminating unneeded products arising from these processes. Depending on which metabolic pathway is involved, a single defective chemical reaction may produce consequences that are narrow, involving a single body function, or broad, affecting many organs and systems.
  • Insulin One of the major hormones that influence metabolism is insulin, which is synthesized in the beta cells of the islets of Langerhans of the pancreas. Insulin primarily regulates the direction of metabolism, shifting many processes toward the storage of substrates and away from their degradation. Insulin acts to increase the transport of glucose and amino acids as well as key minerals such as potassium, magnesium, and phosphate from the blood into cells. It also regulates a variety of enzymatic reactions within the cells, all of which have a common overall direction, namely the synthesis of large molecules from small units.
  • a deficiency in the action of insulin causes severe impairment in (i) the storage of glucose in the form of glycogen and the oxidation of glucose for energy; (ii) the synthesis and storage of fat from fatty acids and their precursors and the completion of fatty-acid oxidation; and (iii) the synthesis of proteins from amino acids.
  • Type I insulin-dependent diabetes mellitus
  • IDDM insulin-dependent diabetes mellitus
  • Type II non-insulin-dependent diabetes mellitus
  • NIDDM non-insulin-dependent diabetes mellitus
  • Obesity is usually defined in terms of the body mass index (BMI), i.e. weight (in kilograms) divided by the square of the height (in meters). Weight is regulated with great precision. Regulation of body weight is believed to occur not only in persons of normal weight but also among many obese persons, in whom obesity is attributed to an elevation in the set point around which weight is regulated. The determinants of obesity can be divided into genetic, environmental, and regulatory.
  • the β3-adrenergic receptor represents one of a number of potential anti-obesity drug targets for which selective agonists have been developed.
  • β3-AR mRNA is abundant in white adipose tissue (WAT) and brown adipose tissue (BAT). It has been demonstrated that mice lacking endogenous β3-adrenoreceptors have a slight increase in body fat, but otherwise appear normal (Susulic V.S., et al. (1995) J. Biol. Chem. 270(49): 29483-29492).
  • these mice are completely resistant to the specific β3 agonist CL-316,243, which has been shown to increase lipolysis and energy expenditure and to affect insulin and leptin levels.
  • when β3-AR was ectopically expressed in white and brown adipose tissue or in brown adipose tissue only, it was recently demonstrated that the anorectic and insulin secretagogue effects appeared to be mediated by white adipose tissue (Grujic D, et al. (1997) J Biol Chem. 272(28): 17686-93). How these effects are mediated by β3-AR agonists remains poorly understood.
  • Protein Cluster V a family of genes and encoded homologous proteins (hereinafter referred to as "Protein Cluster V") has been identified. Consequently, the present invention provides an isolated nucleic acid molecule selected from:
  • nucleic acid molecules comprising a nucleotide sequence as shown in SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19.
  • nucleic acid molecules comprising a nucleotide sequence capable of hybridizing, under stringent hybridization conditions, to a nucleotide sequence complementary to the polypeptide coding region of a nucleic acid molecule as defined in (a);
  • nucleic acid molecules comprising a nucleic acid sequence which is degenerate as a result of the genetic code to a nucleotide sequence as defined in (a) or (b).
  • the nucleic acid molecules according to the present invention include cDNA, chemically synthesized DNA, DNA isolated by PCR, genomic DNA, and combinations thereof. RNA transcribed from DNA is also encompassed by the present invention.
  • the term "stringent hybridization conditions" is known in the art from standard protocols (e.g. Ausubel et al., supra) and could be understood as e.g. hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at +65°C, and washing in 0.1× SSC / 0.1% SDS at +68°C.
  • the said nucleic acid molecule has a nucleotide sequence identical with SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19 of the Sequence Listing.
  • the nucleic acid molecule according to the invention is not to be limited strictly to the sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19. Rather the invention encompasses nucleic acid molecules carrying modifications like substitutions, small deletions, insertions or inversions, which nevertheless encode proteins having substantially the features of the Protein Cluster V polypeptide according to the invention.
  • nucleic acid molecules the nucleotide sequence of which is at least 90% homologous, preferably at least 95% homologous, with the nucleotide sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19 in the Sequence Listing.
  • nucleic acid molecules whose nucleotide sequence is degenerate, because of the genetic code, to the nucleotide sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19.
  • nucleic acid molecules according to the invention have numerous applications in techniques known to those skilled in the art of molecular biology. These techniques include their use as hybridization probes, for chromosome and gene mapping, in PCR technologies, in the production of sense or antisense nucleic acids, in screening for new therapeutic molecules, etc.
  • sequence information provided by the invention makes possible large-scale expression of the encoded polypeptides by techniques well known in the art.
  • Nucleic acid molecules of the invention also permit identification and isolation of nucleic acid molecules encoding related polypeptides, such as human allelic variants and species homologues, by well-known techniques including Southern and/or Northern hybridization, and PCR.
  • Knowledge of the sequence of a human DNA also makes possible, through use of Southern hybridization or PCR, the identification of genomic DNA sequences encoding the proteins in Cluster V, as well as expression control regulatory sequences such as promoters, operators, enhancers, repressors, and the like.
  • Nucleic acid molecules of the invention are also useful in hybridization assays to detect the capacity of cells to express the proteins in Cluster V.
  • Nucleic acid molecules of the invention may also provide a basis for diagnostic methods useful for identifying a genetic alteration(s) in a locus that underlies a disease state or states, which information is useful both for diagnosis and for selection of therapeutic strategies.
  • the invention provides an isolated polypeptide encoded by the nucleic acid molecule as defined above.
  • the said polypeptide has an amino acid sequence according to SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 of the Sequence Listing.
  • the polypeptide according to the invention is not to be limited strictly to a polypeptide with an amino acid sequence identical with SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 in the Sequence Listing. Rather the invention encompasses polypeptides carrying modifications like substitutions, small deletions, insertions or inversions, which polypeptides nevertheless have substantially the features of the Protein Cluster V polypeptide.
  • polypeptides the amino acid sequence of which is at least 90% homologous, preferably at least 95% homologous, with the amino acid sequence shown as SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 in the Sequence Listing.
  • the invention provides a vector harboring the nucleic acid molecule as defined above.
  • the said vector can e.g. be a replicable expression vector, which carries and is capable of mediating the expression of a DNA molecule according to the invention.
  • replicable means that the vector is able to replicate in a given type of host cell into which it has been introduced.
  • examples of vectors are viruses such as bacteriophages, as well as cosmids, plasmids and other recombination vectors.
  • Nucleic acid molecules are inserted into vector genomes by methods well known in the art.
  • a cultured host cell harboring a vector according to the invention.
  • a host cell can be a prokaryotic cell, a unicellular eukaryotic cell or a cell derived from a multicellular organism.
  • the host cell can thus e.g. be a bacterial cell such as an E. coli cell; a yeast cell such as Saccharomyces cerevisiae or Pichia pastoris; or a mammalian cell.
  • the methods employed to effect introduction of the vector into the host cell are standard methods well known to a person familiar with recombinant DNA methods.
  • the invention provides a process for production of a polypeptide, comprising culturing a host cell, according to the invention, under conditions whereby said polypeptide is produced, and recovering said polypeptide.
  • the medium used to grow the cells may be any conventional medium suitable for the purpose.
  • a suitable vector may be any of the vectors described above, and an appropriate host cell may be any of the cell types listed above.
  • the methods employed to construct the vector and effect introduction thereof into the host cell may be any methods known for such purposes within the field of recombinant DNA.
  • the recombinant polypeptide expressed by the cells may be secreted, i.e. exported through the cell membrane, dependent on the type of cell and the composition of the vector.
  • the invention provides a method for identifying an agent capable of modulating a nucleic acid molecule according to the invention, comprising
  • appropriate host cells can be transformed with a vector having a reporter gene under the control of the nucleic acid molecule according to this invention.
  • the expression of the reporter gene can be measured in the presence or absence of an agent with known activity (i.e. a standard agent) or putative activity (i.e. a “test agent” or “candidate agent”).
  • a change in the level of expression of the reporter gene in the presence of the test agent is compared with that effected by the standard agent. In this way, active agents are identified and their relative potency in this assay determined.
  • a transfection assay can be a particularly useful screening assay for identifying an effective agent.
  • a nucleic acid containing a gene such as a reporter gene that is operably linked to a nucleic acid molecule according to the invention is transfected into the desired cell type.
  • a test level of reporter gene expression is assayed in the presence of a candidate agent and compared to a control level of expression.
  • An effective agent is identified as an agent that results in a test level of expression that is different from a control level of reporter gene expression, which is the level of expression determined in the absence of the agent.
  • “standard protocols” and “standard procedures”, when used in the context of molecular biology techniques, are to be understood as protocols and procedures found in an ordinary laboratory manual such as: Current Protocols in Molecular Biology, editors F. Ausubel et al., John Wiley and Sons, Inc. 1994, or Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A laboratory manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY 1989.
  • Protein Cluster V A family of homologous proteins (hereinafter referred to as "Protein Cluster V") was identified by an "all-versus-all" BLAST procedure using all Caenorhabditis elegans proteins in the Wormpep20 database release (http://www.sanger.ac.uk/Projects/C_elegans/wormpep/index.shtml).
  • the Wormpep database contains the predicted proteins from the C. elegans genome sequencing project, carried out jointly by the Sanger Centre in Cambridge, UK and the Genome Sequencing Center in St. Louis, USA. A total of 18,940 proteins were retrieved from Wormpep20. The proteins were used in a Smith-Waterman clustering procedure to group similar proteins together (Smith T.F.
  • the obtained sequence clusters were compared to the Drosophila melanogaster proteins contained in the database Flybase (Berkeley Drosophila Genome Project; http://www.fruitfly.org), and annotated clusters were removed.
  • Non-annotated protein clusters conserved in both C. elegans and D. melanogaster were saved to a worm/fly data set, which was used in a BLAST procedure (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html) against the Celera Human Genome Database (http://www.celera.com).
  • the human part of this protein family includes seven different 150-250-residue polypeptides shown as SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20, encoded by the nucleic acid sequences shown as SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19.
  • the amino acid sequence shown as SEQ ID NO: 2 was found to correspond to a human 261 aa sequence encoded by the gene "WUGSC: H_DJ0747G1.5" (GenBank Accession No. AC004876). No function has been associated with the said gene.
  • Pfam is a large collection of protein families and domains. Pfam contains multiple protein alignments and profile-HMMs (Profile Hidden Markov Models) of these families. Profile-HMMs can be used for sensitive database searching using statistical descriptions of a sequence family's consensus. Pfam is available on the WWW at http://pfam.wustl.edu; http://www.sanger.ac.uk/Software/Pfam; and http://www.cgr.ki.se/Pfam. Version 4.3 of Pfam, the latest at the time of writing, contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9.
  • TM-HMM is a method to model and predict the location and orientation of alpha helices in membrane-spanning proteins (Sonnhammer et al. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. ISMB 6:175-182). TM-HMM analysis indicates that the human Cluster V proteins contain 3-4 transmembrane segments.
  • the Caenorhabditis elegans genome includes four genes, designated K07B1.4 (GenBank Accession No. AF003384), F59A1.10 (GenBank Accession No. Z81557), Y53G8B.2 (GenBank Accession No. AC006804), and W01A11.2 (GenBank Accession No. U64852), orthologous to the human Cluster V genes.
  • the closest worm homolog (K07B1.4) is on average 44% identical to the 10 human gene products.
  • the Drosophila melanogaster genome includes four genes orthologous to human Cluster V. The most closely related genes, designated "CG1942" (GenBank Accession No. AE003840_36) and "CG1946" (GenBank Accession No. AE003840_37), are 39% identical to the human gene products. (See also Adams et al. (2000) The genome sequence of Drosophila melanogaster, Science 287:2185-2195) is 42% identical to the human protein set.
  • the tissue distribution of the human genes was studied using the Incyte LifeSeq® database (http://www.incyte.com). The genes shown as SEQ ID NO: 1, 3, 5, 7, 9, 11, 13,
  • SEQ ID NO: 1 and 3 Liver, digestive system
  • SEQ ID NO: 7 and 9 Exocrine Glands, Connective Tissue, Germ Cells
  • SEQ ID NO: 11 Female genitalia, urinary tract
  • SEQ ID NO: 17 Female genitalia, nervous system
  • SEQ ID NO: 5 Cardiovascular system
  • nucleic acid molecules and the encoded polypeptides shown are proposed to be useful for differential identification of the tissues or cell types present in a biological sample and for diagnosis of diseases and disorders related to the tissues where the genes are expressed.
  • EXAMPLE 4 Effect of β3-AR agonists on cluster V genes.
  • Microarrays consist of a highly ordered matrix of thousands of different DNA sequences that can be used to measure DNA and RNA variation in applications that include gene expression profiling, comparative genomics and genotyping (for recent reviews, see e.g.: Harrington et al. (2000) Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol. 3(3):285-291; or Duggan et al. (1999) Expression profiling using cDNA microarrays. Nature Genetics Supplement 21:10-14).
  • β3-AR agonists affect gene regulation in adipose tissue in vivo
  • a study was carried out using Affymetrix GeneChip oligonucleotide arrays, comparing the transcript profiles of a large number of genes in white adipose tissue from C57BL/6J mice treated with the β3-AR agonist CL-316,243 with those from control mice injected with a saline solution.
  • Poly(A)+ mRNA was extracted from the white adipose tissue of control and β3-AR agonist-treated mice, respectively. It was reverse transcribed using a T7-tagged oligo-dT primer, and double-stranded cDNA was generated.
  • cDNAs were then amplified and labeled using In Vitro Transcription (IVT) with T7 RNA polymerase and biotinylated nucleotides.
  • IVT In Vitro Transcription
  • the populations of cRNAs obtained after IVT were purified and fragmented by heat to produce a distribution of RNA fragment sizes from approximately 35 to 200 bases.
  • the arrays were then washed and stained with R-phycoerythrin streptavidin with the help of an Affymetrix fluidics station.
  • the cartridges were scanned using a Hewlett-Packard confocal scanner and the images were analyzed with the GeneChip 3.1 software (Affymetrix).
  • a mouse gene (GenBank accession No. AA275948), orthologous to the worm gene F59A1.10, is down-regulated by β3-AR agonist treatment. It is hypothesized that the human genes in Cluster V are similarly involved in metabolically important signaling pathways.
  • MTN Multiple Tissue Northern blotting
  • MTN™ Multiple Tissue Northern Blots
  • MTN Blots http://www.clontech.com/mtn
  • MTN Blots can be used to analyze size and relative abundance of transcripts in different tissues.
  • MTN Blots can also be used to investigate gene families and alternate splice forms and to assess cross species homology.
  • EXAMPLE 6 Identification of polypeptides binding to Protein Cluster V
  • the two-hybrid screening method can be used.
  • the two-hybrid method, first described by Fields & Song (1989) Nature 340:245-247, is a yeast-based genetic assay to detect protein-protein interactions in vivo. The method enables not only identification of interacting proteins, but also results in the immediate availability of the cloned genes for these proteins.
  • the two-hybrid method can be used to determine if two known proteins (i.e. proteins for which the corresponding genes have been previously cloned) interact. Another important application of the two-hybrid method is to identify previously unknown proteins that interact with a target protein by screening a two-hybrid library.
  • the two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U.S.A. 88:9578-9582; Bartel PL, Fields (1995) Analyzing protein-protein interactions using two-hybrid system. Methods Enzymol.
  • the two-hybrid method uses the restoration of transcriptional activation to indicate the interaction between two proteins.
  • DNA-BD DNA-binding domain
  • AD activation domain
  • the DNA-BD vector is used to generate a fusion of the DNA-BD and a bait protein X
  • the AD vector is used to generate a fusion of the AD and another protein Y.
  • An entire library of hybrids with the AD can also be constructed to search for new or unknown proteins that interact with the bait protein.
  • when X and Y interact, the two functional domains responsible for DNA binding and activation are tethered, resulting in functional restoration of transcriptional activation.
  • the two hybrids are cotransformed into a yeast host strain harboring reporter genes containing appropriate upstream binding sites; expression of the reporter genes then indicates interaction between a candidate protein and the target protein.
  • PCR polymerase chain reaction
  • a DNA fragment corresponding to a nucleotide sequence selected from the group consisting of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17 or 19, or a portion thereof, can be used as a probe for hybridization screening of a phage cDNA library.
  • the DNA fragment is amplified by the polymerase chain reaction (PCR) method.
  • the primers are preferably 10 to 25 nucleotides in length and are determined by procedures well known to those skilled in the art.
  • a lambda phage library containing cDNAs cloned into lambda phage-vectors is plated on agar plates with E.
  • Plasmid DNA is isolated from the clones. The size of the insert is determined by digesting the plasmid with appropriate restriction enzymes. The sequence of the entire insert is determined by automated sequencing of the plasmids.
  • EXAMPLE 8 Recombinant expression of proteins in eukaryotic host cells
  • a polypeptide-encoding nucleic acid molecule is expressed in a suitable host cell using a suitable expression vector and standard genetic engineering techniques.
  • the polypeptide-encoding sequence is subcloned into a commercial expression vector and transfected into mammalian, e.g. Chinese Hamster Ovary (CHO), cells using a standard transfection reagent. Cells stably expressing a protein are selected.
  • the protein may be purified from the cells using standard chromatographic techniques. To facilitate purification, an antiserum is raised against one or more synthetic peptide sequences that correspond to portions of the amino acid sequence, and the antiserum is used to affinity purify the protein.
  • RNA interference offers a way of specifically and potently inactivating a cloned gene, and is proving a powerful tool for investigating gene function.
  • Fire RNA-triggered gene silencing. Trends in Genetics 15:358-363; or Kuwabara & Coulson (2000) RNAi-prospects for a general technique for determining gene function. Parasitology Today 16:347-349.
  • dsRNA double-stranded RNA
  • PTGS posttranscriptional gene silencing
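The transfection screening assay described above reduces to comparing a test level of reporter-gene expression with the control level measured in the absence of the agent. A minimal Python sketch of that comparison follows, assuming replicate readings averaged per condition and a two-fold change in either direction as the cut-off for an "effective" agent. The function name, the readings and the threshold are illustrative assumptions, not values given in this document.

```python
from statistics import mean

def classify_agent(test_readings, control_readings, fold_threshold=2.0):
    """Compare reporter expression with a candidate agent against the no-agent control.

    Returns (fold_change, is_effective). fold_threshold is an assumed cut-off:
    an agent counts as effective if mean expression rises or falls by at least
    that factor relative to the control level.
    """
    control_level = mean(control_readings)  # expression in the absence of the agent
    if control_level <= 0:
        raise ValueError("control level must be positive")
    fold_change = mean(test_readings) / control_level
    is_effective = fold_change >= fold_threshold or fold_change <= 1 / fold_threshold
    return fold_change, is_effective

# Hypothetical reporter readings in arbitrary units (not data from this document)
control = [100.0, 110.0, 90.0]
print(classify_agent([310.0, 290.0, 300.0], control))  # (3.0, True)
print(classify_agent([105.0, 95.0, 100.0], control))   # (1.0, False)
```

Expressing the result as a fold change also lets a test agent be ranked against a standard agent measured the same way, which is the relative-potency comparison mentioned above.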

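The clustering step described above groups Wormpep proteins by Smith-Waterman similarity. To illustrate the local-alignment score such a procedure relies on, here is a minimal Python sketch of the Smith-Waterman recurrence with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for readability; protein searches in practice use a substitution matrix such as BLOSUM62 with affine gap costs. The sketch returns only the best score, not the alignment itself or any clusters.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of sequences a and b (Smith-Waterman).

    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] + gap, H[i][j-1] + gap)
    Only two rows of H are kept, since each cell depends on the row above.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,                  # start a new local alignment
                          prev[j - 1] + s,    # align a_i with b_j
                          prev[j] + gap,      # a_i against a gap
                          curr[j - 1] + gap)  # b_j against a gap
            best = max(best, curr[j])
        prev = curr
    return best

# Two hypothetical peptides differing by one substitution (I/L)
print(smith_waterman_score("MKTAYIAK", "MKTAYLAK"))  # 13
```

Because the score is local, unrelated flanking sequence does not lower it, which is what allows a clustering procedure to group proteins that share only a conserved region.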
Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Peptides Or Proteins (AREA)
  • Biochemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Zoology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Toxicology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

The present invention relates to the identification of a human gene family expressed in metabolically relevant tissues. The genes encode a group of polypeptides referred to as 'Protein Cluster V' which are predicted to be useful in the diagnosis of metabolic diseases, such as obesity and diabetes, as well as in the identification of agents useful in the treatment of the said diseases.

Description

PROTEIN CLUSTER V
TECHNICAL FIELD
The present invention relates to the identification of a human gene family expressed in metabolically relevant tissues. The genes encode a group of polypeptides referred to as "Protein Cluster V" which are predicted to be useful in the diagnosis of metabolic diseases, such as obesity and diabetes, as well as in the identification of agents useful in the treatment of the said diseases.
BACKGROUND ART
Metabolic diseases are defined as any of the diseases or disorders that disrupt normal metabolism. They may arise from nutritional deficiencies; in connection with diseases of the endocrine system, the liver, or the kidneys; or as a result of genetic defects. Metabolic diseases are conditions caused by an abnormality in one or more of the chemical reactions essential to producing energy, to regenerating cellular constituents, or to eliminating unneeded products arising from these processes. Depending on which metabolic pathway is involved, a single defective chemical reaction may produce consequences that are narrow, involving a single body function, or broad, affecting many organs and systems.
One of the major hormones that influence metabolism is insulin, which is synthesized in the beta cells of the islets of Langerhans of the pancreas. Insulin primarily regulates the direction of metabolism, shifting many processes toward the storage of substrates and away from their degradation. Insulin acts to increase the transport of glucose and amino acids as well as key minerals such as potassium, magnesium, and phosphate from the blood into cells. It also regulates a variety of enzymatic reactions within the cells, all of which have a common overall direction, namely the synthesis of large molecules from small units. A deficiency in the action of insulin (diabetes mellitus) causes severe impairment in (i) the storage of glucose in the form of glycogen and the oxidation of glucose for energy; (ii) the synthesis and storage of fat from fatty acids and their precursors and the completion of fatty-acid oxidation; and (iii) the synthesis of proteins from amino acids.
There are two varieties of diabetes. Type I is insulin-dependent diabetes mellitus (IDDM), for which insulin injection is required; it was formerly referred to as juvenile onset diabetes. In this type, insulin is not secreted by the pancreas and hence must be taken by injection. Type II is non-insulin-dependent diabetes mellitus (NIDDM), which may be controlled by dietary restriction. It derives from insufficient pancreatic insulin secretion and tissue resistance to secreted insulin, which is complicated by subtle changes in the secretion of insulin by the beta cells. Despite their former classifications as juvenile or adult, either type can occur at any age; NIDDM, however, is the most common type, accounting for 90 percent of all diabetes. While the exact causes of diabetes remain obscure, it is evident that NIDDM is linked to heredity and obesity. There is clearly a genetic predisposition to NIDDM in those who become overweight or obese.
Obesity is usually defined in terms of the body mass index (BMI), i.e. weight (in kilograms) divided by the square of the height (in meters). Weight is regulated with great precision. Regulation of body weight is believed to occur not only in persons of normal weight but also among many obese persons, in whom obesity is attributed to an elevation in the set point around which weight is regulated. The determinants of obesity can be divided into genetic, environmental, and regulatory.
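As a worked illustration of the definition above, BMI is weight in kilograms divided by the square of height in meters. A short Python helper follows; the example weight and height are arbitrary values chosen for illustration.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Arbitrary example: 95 kg at 1.75 m (95 / 3.0625)
print(round(body_mass_index(95.0, 1.75), 1))  # 31.0
```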
Recent discoveries have helped explain how genes may determine obesity and how they may influence the regulation of body weight. For example, mutations in the ob gene have led to massive obesity in mice. Cloning the ob gene led to the identification of leptin, a protein coded by this gene; leptin is produced in adipose tissue cells and acts to control body fat. The existence of leptin supports the idea that body weight is regulated, because leptin serves as a signal between adipose tissue and the areas of the brain that control energy metabolism, which influences body weight. Metabolic diseases like diabetes and obesity are clinically and genetically heterogeneous disorders. Recent advances in molecular genetics have led to the recognition of genes involved in IDDM and in some subtypes of NIDDM, including maturity-onset diabetes of the young (MODY) (Velho & Froguel (1997) Diabetes Metab. 23 Suppl 2:34-37). However, several IDDM susceptibility genes have not yet been identified, and very little is known about genes contributing to common forms of NIDDM. Studies of candidate genes and of genes mapped in animal models of IDDM or NIDDM, as well as whole genome scanning of diabetic families from different populations, should allow the identification of most diabetes susceptibility genes and of the molecular targets for new potential drugs. The identification of genes involved in metabolic disorders will thus contribute to the development of novel predictive and therapeutic approaches.
The β3-adrenergic receptor (AR) represents one of a number of potential anti-obesity drug targets for which selective agonists have been developed. In rodents, β3-AR mRNA is abundant in white adipose tissue (WAT) and brown adipose tissue (BAT). It has been demonstrated that mice lacking endogenous β3-adrenoreceptors have a slight increase in body fat, but otherwise appear normal (Susulic V.S., et al. (1995) J. Biol. Chem. 270(49): 29483-29492). These mice are completely resistant to the specific β3 agonist CL-316,243, which has been shown to increase lipolysis and energy expenditure and to affect insulin and leptin levels. When the β3-AR was ectopically expressed in white and brown adipose tissue or in brown adipose tissue only, it was recently demonstrated that the anorectic and insulin secretagogue effects appeared to be mediated by white adipose tissue (Grujic D, et al. (1997) J Biol Chem. 272(28): 17686-93). How these effects are mediated by β3-AR agonists remains poorly understood.
Lardizabal, K.D. et al. (J. Biol. Chem. 276: 38862-38869) and Cases, S. et al. (J. Biol. Chem. 276: 38870-38876; both papers published 31 July 2001) disclose a new gene family, including members in fungi, plants and animals, which encode proteins corresponding to the "Cluster V" proteins according to the present invention. The proteins were shown to have acyl CoA:diacylglycerol acyltransferase (DGAT; EC 2.3.1.20) function. The gene family is unrelated to the previously identified DGAT1 family and was designated DGAT2. DGAT2 was shown to have high expression levels in liver and white adipose tissue, suggesting that it may play a significant role in mammalian triglyceride metabolism.
DISCLOSURE OF THE INVENTION
According to the present invention, a family of genes and encoded homologous proteins (hereinafter referred to as "Protein Cluster V") has been identified. Consequently, the present invention provides an isolated nucleic acid molecule selected from:
(a) nucleic acid molecules comprising a nucleotide sequence as shown in SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19;
(b) nucleic acid molecules comprising a nucleotide sequence capable of hybridizing, under stringent hybridization conditions, to a nucleotide sequence complementary to the polypeptide coding region of a nucleic acid molecule as defined in (a); and
(c) nucleic acid molecules comprising a nucleic acid sequence which is degenerate as a result of the genetic code to a nucleotide sequence as defined in (a) or (b).
The nucleic acid molecules according to the present invention include cDNA, chemically synthesized DNA, DNA isolated by PCR, genomic DNA, and combinations thereof. RNA transcribed from DNA is also encompassed by the present invention.
The term "stringent hybridization conditions" is known in the art from standard protocols (e.g. Ausubel et al., supra) and can be understood as, e.g., hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at +65°C, and washing in 0.1x SSC / 0.1% SDS at +68°C.
In a preferred form of the invention, the said nucleic acid molecule has a nucleotide sequence identical with SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19 of the Sequence Listing. However, the nucleic acid molecule according to the invention is not to be limited strictly to the sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19. Rather, the invention encompasses nucleic acid molecules carrying modifications like substitutions, small deletions, insertions or inversions, which nevertheless encode proteins having substantially the features of the Protein Cluster V polypeptide according to the invention. Consequently, the invention includes nucleic acid molecules whose nucleotide sequence is at least 90% homologous, preferably at least 95% homologous, with the nucleotide sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19 in the Sequence Listing.
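The description does not state how "% homologous" is to be calculated. The sketch below is a minimal example. It assumes homology is read as percent identity over an existing pairwise alignment, with gaps written as "-". Under that assumption, the check against the 90% and 95% thresholds could look as follows. The function names are hypothetical. Other conventions, such as dividing by the full alignment length, give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue.

    Assumes the inputs are already aligned (equal length, gaps written as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are not counted in this convention
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def within_claimed_range(aligned_query: str, aligned_reference: str,
                         threshold: float = 90.0) -> bool:
    """True if the query reaches the identity threshold (90 or 95 in the text)."""
    return percent_identity(aligned_query, aligned_reference) >= threshold


# One mismatch in ten aligned positions gives exactly 90% identity.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))            # 90.0
print(within_claimed_range("ACGTACGTAC", "ACGTACGTAA", 95.0))  # False
```

In practice, the alignment itself would come from a pairwise alignment program. The choice of denominator should be stated whenever a percentage is reported.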
Also included in the invention is a nucleic acid molecule whose nucleotide sequence is degenerate, because of the genetic code, to the nucleotide sequence shown as SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19. A sequential grouping of three nucleotides, a "codon", codes for one amino acid. Since there are 64 possible codons but only 20 natural amino acids, most amino acids are coded for by more than one codon. This natural "degeneracy", or "redundancy", of the genetic code is well known in the art. It will thus be appreciated that the nucleotide sequence shown in the Sequence Listing is only one example within a large but finite group of sequences that encode the Protein Cluster V polypeptide.
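The degeneracy argument can be made concrete with the standard genetic code. The sketch below builds the 64-codon table and counts the synonymous codons for each amino acid. It then shows that the number of coding sequences for a given polypeptide is the product of those counts. The function names are hypothetical.

```python
from collections import Counter
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks a stop codon.
# Codons are enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of synonymous codons per amino acid (and for '*', the stop signal).
SYNONYMOUS = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """Translate the complete codons of a DNA sequence in reading frame 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encoding_sequences(protein: str, include_stop: bool = False) -> int:
    """How many distinct coding sequences translate to `protein`."""
    n = prod(SYNONYMOUS[aa] for aa in protein.upper())
    return n * SYNONYMOUS["*"] if include_stop else n


# Two different DNA sequences encoding the same tripeptide.
print(translate("ATGCTGAGC"), translate("ATGCTTTCA"))  # MLS MLS
print(count_encoding_sequences("MLS"))                  # 1 * 6 * 6 = 36
```

For a polypeptide of a few hundred residues, this product is astronomically large. That is why the degenerate variants are described as a group rather than listed.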
The nucleic acid molecules according to the invention have numerous applications in techniques known to those skilled in the art of molecular biology. These techniques include their use as hybridization probes, for chromosome and gene mapping, in PCR technologies, in the production of sense or antisense nucleic acids, in screening for new therapeutic molecules, etc.
More specifically, the sequence information provided by the invention makes possible large-scale expression of the encoded polypeptides by techniques well known in the art. Nucleic acid molecules of the invention also permit identification and isolation of nucleic acid molecules encoding related polypeptides, such as human allelic variants and species homologues, by well-known techniques including Southern and/or Northern hybridization, and PCR. Knowledge of the sequence of a human DNA also makes possible, through the use of Southern hybridization or PCR, the identification of genomic DNA sequences encoding the proteins in Cluster V and of expression control regulatory sequences such as promoters, operators, enhancers, repressors, and the like. Nucleic acid molecules of the invention are also useful in hybridization assays to detect the capacity of cells to express the proteins in Cluster V. Nucleic acid molecules of the invention may also provide a basis for diagnostic methods useful for identifying one or more genetic alterations in a locus that underlies a disease state or states; this information is useful both for diagnosis and for the selection of therapeutic strategies.
In a further aspect, the invention provides an isolated polypeptide encoded by the nucleic acid molecule as defined above. In a preferred form, the said polypeptide has an amino acid sequence according to SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 of the Sequence Listing. However, the polypeptide according to the invention is not to be limited strictly to a polypeptide with an amino acid sequence identical with SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 in the Sequence Listing. Rather, the invention encompasses polypeptides carrying modifications like substitutions, small deletions, insertions or inversions, which nevertheless have substantially the features of the Protein Cluster V polypeptide. Consequently, the invention includes polypeptides whose amino acid sequence is at least 90% homologous, preferably at least 95% homologous, with the amino acid sequence shown as SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 in the Sequence Listing.
In a further aspect, the invention provides a vector harboring the nucleic acid molecule as defined above. The said vector can e.g. be a replicable expression vector, which carries and is capable of mediating the expression of a DNA molecule according to the invention. In the present context, the term "replicable" means that the vector is able to replicate in a given type of host cell into which it has been introduced. Examples of vectors are viruses such as bacteriophages, cosmids, plasmids and other recombination vectors. Nucleic acid molecules are inserted into vector genomes by methods well known in the art.
Also included in the invention is a cultured host cell harboring a vector according to the invention. Such a host cell can be a prokaryotic cell, a unicellular eukaryotic cell or a cell derived from a multicellular organism. The host cell can thus e.g. be a bacterial cell such as an E. coli cell; a yeast cell such as Saccharomyces cerevisiae or Pichia pastoris; or a mammalian cell. The methods employed to introduce the vector into the host cell are standard methods well known to a person familiar with recombinant DNA methods.
In yet another aspect, the invention provides a process for production of a polypeptide, comprising culturing a host cell according to the invention under conditions whereby said polypeptide is produced, and recovering said polypeptide. The medium used to grow the cells may be any conventional medium suitable for the purpose. A suitable vector may be any of the vectors described above, and an appropriate host cell may be any of the cell types listed above. The methods employed to construct the vector and introduce it into the host cell may be any methods known for such purposes within the field of recombinant DNA. Depending on the type of cell and the composition of the vector, the recombinant polypeptide expressed by the cells may be secreted, i.e. exported through the cell membrane.
In a further aspect, the invention provides a method for identifying an agent capable of modulating a nucleic acid molecule according to the invention, comprising
(i) providing a cell comprising the said nucleic acid molecule;
(ii) contacting said cell with a candidate agent; and
(iii) monitoring said cell for an effect that is not present in the absence of said candidate agent.
For screening purposes, appropriate host cells can be transformed with a vector having a reporter gene under the control of the nucleic acid molecule according to this invention. The expression of the reporter gene can be measured in the presence or absence of an agent with known activity (i.e. a standard agent) or putative activity (i.e. a "test agent" or "candidate agent"). A change in the level of expression of the reporter gene in the presence of the test agent is compared with that effected by the standard agent. In this way, active agents are identified and their relative potency in this assay determined.
A transfection assay can be a particularly useful screening assay for identifying an effective agent. In a transfection assay, a nucleic acid containing a gene, such as a reporter gene, that is operably linked to a nucleic acid molecule according to the invention is transfected into the desired cell type. A test level of reporter gene expression is assayed in the presence of a candidate agent and compared with a control level of expression, which is the level of reporter gene expression determined in the absence of the agent. An effective agent is identified as an agent that produces a test level of expression different from the control level. Methods for transfecting cells and a variety of convenient reporter genes are well known in the art (see, for example, Goeddel (ed.), Methods Enzymol., Vol. 185, San Diego: Academic Press, Inc. (1990); see also Sambrook, supra).
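The comparison of test, control and standard levels described above can be sketched in Python. This is an illustration under stated assumptions, not the assay protocol. The two-fold cutoff, the use of replicate means, the agent names and the readings are all hypothetical. A real screen would normally also apply a statistical test.

```python
from math import log2
from statistics import mean


def fold_change(test_values, control_values):
    """Mean reporter signal with the agent divided by the mean signal without it."""
    return mean(test_values) / mean(control_values)


def screen(candidates, control, standard, min_abs_log2=1.0):
    """Return {agent: (fold change, effect relative to the standard agent)} for hits.

    A candidate is a hit when its mean reporter signal differs from the control
    by at least 2 ** min_abs_log2-fold in either direction. The relative effect
    is the ratio of the log2 fold changes of the candidate and the standard agent.
    """
    standard_effect = log2(fold_change(standard, control))
    hits = {}
    for name, values in candidates.items():
        fc = fold_change(values, control)
        effect = log2(fc)
        if abs(effect) >= min_abs_log2:
            relative = effect / standard_effect if standard_effect else float("nan")
            hits[name] = (fc, relative)
    return hits


control = [100, 110, 90]    # hypothetical readings, no agent
standard = [400, 410, 390]  # hypothetical readings, known active agent
candidates = {"A": [200, 190, 210], "B": [120, 130, 110], "C": [25, 24, 26]}
print(sorted(screen(candidates, control, standard)))  # ['A', 'C']
```

Reporting the effect relative to the standard agent corresponds to the "relative potency" mentioned above. A negative value marks an agent that acts in the opposite direction to the standard.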
Throughout this description the terms "standard protocols" and "standard procedures", when used in the context of molecular biology techniques, are to be understood as protocols and procedures found in an ordinary laboratory manual such as: Current Protocols in Molecular Biology, editors F. Ausubel et al., John Wiley and Sons, Inc. 1994, or Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A laboratory manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY 1989.
EXAMPLES
EXAMPLE 1: Identification of protein clusters
A family of homologous proteins (hereinafter referred to as "Protein Cluster V") was identified by an "all-versus-all" BLAST procedure using all Caenorhabditis elegans proteins in the Wormpep20 database release (http://www.sanger.ac.uk/Projects/C_elegans/wormpep/index.shtml). The Wormpep database contains the predicted proteins from the C. elegans genome sequencing project, carried out jointly by the Sanger Centre in Cambridge, UK and the Genome Sequencing Center in St. Louis, USA. A total of 18,940 proteins were retrieved from Wormpep20. The proteins were used in a Smith-Waterman clustering procedure to group similar proteins together (Smith T.F. & Waterman M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147(1): 195-197; Pearson W.R. (1991) Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11: 635-650; Olsen et al. (1999) Optimizing Smith-Waterman alignments. Pac. Symp. Biocomput.: 302-313). Completely annotated proteins were filtered out, after which 10,130 proteins of unknown function could be grouped into 1,800 clusters.
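The all-versus-all comparison and clustering step can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used.

- Pairs are scored with a basic Smith-Waterman local alignment. It uses identity scoring and a linear gap penalty rather than a substitution matrix such as BLOSUM62 with affine gaps.
- Each score is normalized by the smaller self-score.
- Proteins are joined into single-linkage clusters when the normalized score reaches a hypothetical threshold of 0.5.

The sequence names and sequences are made up.

```python
from itertools import combinations


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith & Waterman 1981) with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_score(a, b):
    """Score divided by the smaller self-score (1.0 = shorter sequence fully matched)."""
    return smith_waterman(a, b) / min(smith_waterman(a, a), smith_waterman(b, b))


def cluster(seqs, threshold=0.5):
    """Single-linkage clusters from an all-versus-all comparison (union-find)."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(seqs, 2):
        if normalized_score(seqs[x], seqs[y]) >= threshold:
            parent[find(x)] = find(y)
    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


proteins = {"p1": "MKVLLAGHEAGAWG", "p2": "MKVLLAGHEAGAWA", "p3": "QQQRRRSSS"}
print(cluster(proteins))  # [['p1', 'p2'], ['p3']]
```

With roughly 19,000 proteins, the quadratic number of comparisons is why optimized Smith-Waterman implementations, such as those cited above, matter in practice.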
The obtained sequence clusters were compared to the Drosophila melanogaster proteins contained in the Flybase database (Berkeley Drosophila Genome Project; http://www.fruitfly.org), and annotated clusters were removed. Non-annotated protein clusters conserved in both C. elegans and D. melanogaster were saved to a worm/fly data set. This data set was used in a BLAST procedure (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html) against the Celera Human Genome Database (http://www.celera.com). Overlapping fragments were assembled into sequences as close to full-length proteins as possible using the PHRAP software, developed at the University of Washington (http://www.genome.washington.edu/UWGC/analysistools/phrap.htm). A group of homologous proteins ("Protein Cluster V") with unknown function was chosen for further studies.
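The filtering that produces the worm/fly data set can be expressed as simple set operations. The sketch below is a minimal example, assuming the following:

- Each worm cluster has already been mapped to the fly proteins it hits.
- A cluster counts as annotated when any of those fly proteins has a known function.

The description does not spell out the annotation rule at this level, so that rule, the data structures and all identifiers are hypothetical.

```python
def worm_fly_dataset(worm_clusters, fly_hits, annotated_fly):
    """Keep worm clusters that have fly homologues, none of which is annotated.

    worm_clusters: cluster id -> set of C. elegans protein ids (unknown function)
    fly_hits:      cluster id -> set of D. melanogaster protein ids hit by the cluster
    annotated_fly: set of fly protein ids that already have a known function
    Returns cluster id -> combined set of worm and fly members.
    """
    dataset = {}
    for cluster_id, worm_members in worm_clusters.items():
        fly_members = fly_hits.get(cluster_id, set())
        if not fly_members:
            continue  # not conserved in D. melanogaster
        if fly_members & annotated_fly:
            continue  # the cluster is annotated through its fly homologue
        dataset[cluster_id] = worm_members | fly_members
    return dataset


worm = {"c1": {"W1", "W2"}, "c2": {"W3"}, "c3": {"W4"}}
fly = {"c1": {"F1"}, "c2": {"F2"}}
print(worm_fly_dataset(worm, fly, annotated_fly={"F2"}))  # only c1 survives
```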
EST databases provided by the EMBL (http://www.embl.org/Services/index.html) were used to check whether the human proteins in Cluster V were expressed, in order to identify putative pseudogenes. One putative pseudogene was identified and excluded.
EXAMPLE 2: Analyses of Protein Cluster V
(a) Alignment
The human part of this protein family includes ten different 150-250 residue polypeptides, shown as SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18 and 20, which are encoded by the nucleic acid sequences shown as SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17 and 19. The amino acid sequence shown as SEQ ID NO: 2 was identified as corresponding to a human 261-residue sequence encoded by the gene "WUGSC:H_DJ0747G1.5" (GenBank Accession No. AC004876). No function has been associated with this gene.
An alignment of the human polypeptides included in Protein Cluster V, made with the ClustalW multiple alignment software (Thompson et al. (1994) Nucleic Acids Research 22: 4673-4680), is shown in Table I. The alignment showed a high degree of conservation over a region of about 125 residues (corresponding to positions 23-147 in SEQ ID NO: 2), indicating the presence of a novel domain.
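ClustalW itself is not reproduced here. The sketch below is a minimal example of how a conserved stretch, such as positions 23-147, could be located once a multiple alignment exists. It scores each alignment column by the fraction of sequences sharing its most common residue, with gaps counting against conservation. It then reports runs of columns above a cutoff. The toy alignment, the 0.8 cutoff and the minimum run length are hypothetical and are not the Table I data.

```python
def column_identity(alignment, gap="-"):
    """Fraction of sequences sharing the most common residue in each column."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        top = max((residues.count(r) for r in set(residues)), default=0)
        scores.append(top / len(column))  # gaps stay in the denominator
    return scores


def conserved_regions(scores, min_score=0.8, min_length=5):
    """1-based inclusive (start, end) runs of columns scoring >= min_score."""
    regions, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):  # sentinel closes a final run
        if s >= min_score and start is None:
            start = i
        elif s < min_score and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions


toy_alignment = ["QWTAYIAKPE", "RWTAYIAKLD", "SFTAYIAHNC", "T-TAYIA--G"]
print(conserved_regions(column_identity(toy_alignment)))  # [(3, 7)]
```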
(b) HMM-Pfam
An HMM-Pfam search was performed on the human family members. Pfam is a large collection of protein families and domains. Pfam contains multiple protein alignments and profile-HMMs (profile hidden Markov models) of these families. Profile-HMMs can be used for sensitive database searching using statistical descriptions of a sequence family's consensus. Pfam is available on the WWW at http://pfam.wustl.edu; http://www.sanger.ac.uk/Software/Pfam; and http://www.cgr.ki.se/Pfam. The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For references to Pfam, see Bateman et al. (2000) The Pfam protein families database. Nucleic Acids Res. 28:263-266; Sonnhammer et al. (1998) Pfam: Multiple Sequence Alignments and HMM-Profiles of Protein Domains. Nucleic Acids Research 26:322-325; Sonnhammer et al. (1997) Pfam: a Comprehensive Database of Protein Domain Families Based on Seed Alignments. Proteins 28: 405-420.
The HMM-Pfam search indicated that no previously known domains could be identified in Protein Cluster V.
(c) TM-HMM
The human proteins in Cluster V were analyzed using the TM-HMM tool, available e.g. at http://www.cbs.dtu.dk/services/TMHMM-1.0. TM-HMM is a method to model and predict the location and orientation of alpha helices in membrane-spanning proteins (Sonnhammer et al. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. ISMB 6:175-182). The results indicate that the human Cluster V proteins contain 3-4 transmembrane segments.
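TM-HMM is a hidden Markov model and is not reimplemented here. For orientation only, the sketch below uses a different and much simpler technique, the Kyte-Doolittle hydropathy plot (Kyte & Doolittle, 1982). In that method, a 19-residue window whose mean hydropathy is at least about 1.6 suggests a membrane-spanning helix. The window size and cutoff are the commonly quoted values and are assumed here. The function names are hypothetical and the test sequences are synthetic.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, one value per window start."""
    vals = [KD[aa] for aa in seq.upper()]  # non-standard residues raise KeyError
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def predicted_tm_segments(seq, window=19, cutoff=1.6):
    """Merge overlapping above-cutoff windows into 1-based (start, end) segments.

    Taking the union of whole windows tends to extend segment edges somewhat
    beyond the hydrophobic core; the number of segments is the robust output.
    """
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        start, end = i + 1, i + window
        if h >= cutoff:
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


# Synthetic sequence with two hydrophobic stretches separated by charged residues.
synthetic = "K" * 10 + "L" * 20 + "K" * 20 + "L" * 20 + "K" * 10
print(len(predicted_tm_segments(synthetic)))  # 2
```

Hydropathy plots give more false positives than TM-HMM, for example with hydrophobic signal peptides. They are shown only to make the idea of scanning for membrane-spanning stretches concrete.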
(d) Analysis of non-human orthologs
The Caenorhabditis elegans genome includes four genes orthologous to the human Cluster V genes, designated K07B1.4 (GenBank Accession No. AF003384), F59A1.10 (GenBank Accession No. Z81557), Y53G8B.2 (GenBank Accession No. AC006804), and W01A11.2 (GenBank Accession No. U64852). The most closely related of these, K07B1.4, is on average 44% identical to the 10 human gene products. (See also: Genome sequence of the nematode C. elegans: a platform for investigating biology; The C. elegans Sequencing Consortium. Science (1998) 282:2012-2018. Published errata appear in Science (1999) 283:35; 283:2103; and 285:1493.)
The Drosophila melanogaster genome includes four genes orthologous to human Cluster V. The most closely related genes, designated "CG1942" (GenBank Accession No. AE003840_36) and "CG1946" (GenBank Accession No. AE003840_37), are 39% identical to the human gene products (see also Adams et al. (2000) The genome sequence of Drosophila melanogaster. Science 287:2185-2195). [...] is 42% identical to the human protein set.
The human proteins in Cluster V show 27% identity to two yeast proteins: S. cerevisiae SCYOR245C_1 (GenBank Accession No. Z75153) and S. pombe SPCC548 (GenBank Accession No. AL359685). The yeast proteins are of unknown function.
EXAMPLE 3: Expression analysis
The tissue distribution of the human genes was studied using the Incyte LifeSeq® database (http://www.incyte.com). The genes shown as SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15 and 17 were found to be expressed primarily in the following tissues:
SEQ ID NO: 1 and 3: Liver, digestive system
SEQ ID NO: 7 and 9: Exocrine Glands, Connective Tissue, Germ Cells
SEQ ID NO: 11: Female genitalia, urinary tract
SEQ ID NO: 17: Female genitalia, nervous system
SEQ ID NO: 13 and 15: Digestive System
SEQ ID NO: 5: Cardiovascular system
Therefore, these nucleic acid molecules and the encoded polypeptides are proposed to be useful for differential identification of the tissues or cell types present in a biological sample, and for diagnosis of diseases and disorders related to the tissues where the genes are expressed.
EXAMPLE 4: Effect of β3-AR agonists on Cluster V genes
Microarrays consist of a highly ordered matrix of thousands of different DNA sequences that can be used to measure DNA and RNA variation in applications that include gene expression profiling, comparative genomics and genotyping (for recent reviews, see e.g.: Harrington et al. (2000) Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol. 3(3): 285-291; or Duggan et al. (1999) Expression profiling using cDNA microarrays. Nature Genetics Supplement 21: 10-14).
In order to investigate the mechanisms whereby β3-AR agonists affect gene regulation in adipose tissue in vivo, a study was carried out using Affymetrix GeneChip oligonucleotide arrays. The transcript profiles of a large number of genes were compared in white adipose tissue from C57BL/6J mice treated with the β3-AR agonist CL-316,243 and from control mice injected with a saline solution. PolyA+ mRNAs were extracted from white adipose tissue of control and β3-AR agonist-treated mice, respectively. They were reverse transcribed using a T7-tagged oligo-dT primer, and double-stranded cDNAs were generated. These cDNAs were then amplified and labeled by in vitro transcription (IVT) with T7 RNA polymerase and biotinylated nucleotides. The cRNA populations obtained after IVT were purified and fragmented by heat to produce a distribution of RNA fragment sizes from approximately 35 to 200 bases. Two Affymetrix Mu19K and Mu11K sets, of 3 arrays (subA, subB and subC) and 2 arrays (subA and subB) respectively, were hybridized (using the recommended buffer) overnight at 45°C with the denatured control or treated samples. The arrays were then washed and stained with R-phycoerythrin streptavidin using an Affymetrix fluidics station. The cartridges were scanned using a Hewlett-Packard confocal scanner, and the images were analyzed with the GeneChip 3.1 software (Affymetrix).
The results indicate that the mouse gene (GenBank accession No. AA275948), orthologous to the worm gene F59A1.10, is down-regulated by β3-AR agonist treatment. It is hypothesized that the human genes in Cluster V are similarly involved in metabolically important signaling pathways.
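The description does not state how down-regulation was called from the GeneChip data. The sketch below is a minimal example, assuming per-probe-set signals for the control and treated samples and a two-fold cutoff on their ratio. The signal floor, cutoff, probe-set names and values are hypothetical and are not the experimental results.

```python
from math import log2


def call_changes(control, treated, min_fold=2.0, floor=20.0):
    """Classify each probe set as 'up', 'down' or 'unchanged' by its log2 ratio.

    Signals below `floor` are raised to it, so that near-zero or negative
    values (possible with GeneChip average-difference scores) do not
    produce extreme ratios.
    """
    cutoff = log2(min_fold)
    calls = {}
    for probe, c in control.items():
        ratio = log2(max(treated[probe], floor) / max(c, floor))
        if ratio >= cutoff:
            calls[probe] = ("up", ratio)
        elif ratio <= -cutoff:
            calls[probe] = ("down", ratio)
        else:
            calls[probe] = ("unchanged", ratio)
    return calls


control_signal = {"probe_a": 800.0, "probe_b": 500.0, "probe_c": 10.0}  # hypothetical
treated_signal = {"probe_a": 200.0, "probe_b": 550.0, "probe_c": 90.0}  # hypothetical
for probe, (call, ratio) in call_changes(control_signal, treated_signal).items():
    print(probe, call, round(ratio, 2))
```

With replicate arrays, a consistency requirement across replicates would usually be added before a gene is reported as regulated.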
EXAMPLE 5: Multiple Tissue Northern blotting
Multiple Tissue Northern blotting (MTN) is performed to analyze the expression profiles of the proteins in Cluster V in more detail. Multiple Tissue Northern (MTN™) Blots (http://www.clontech.com/mtn) are pre-made Northern blots featuring Premium Poly A+ RNA from a variety of human, mouse, or rat tissues. MTN Blots can be used to analyze the size and relative abundance of transcripts in different tissues. MTN Blots can also be used to investigate gene families and alternative splice forms, and to assess cross-species homology.
EXAMPLE 6: Identification of polypeptides binding to Protein Cluster V
In order to assay for proteins interacting with Protein Cluster V, the two-hybrid screening method can be used. The two-hybrid method, first described by Fields & Song (1989) Nature 340:245-247, is a yeast-based genetic assay to detect protein-protein interactions in vivo. The method enables not only identification of interacting proteins, but also results in the immediate availability of the cloned genes for these proteins.
The two-hybrid method can be used to determine whether two known proteins (i.e. proteins for which the corresponding genes have been previously cloned) interact. Another important application of the two-hybrid method is to identify previously unknown proteins that interact with a target protein by screening a two-hybrid library. For reviews, see e.g.: Chien et al. (1991) The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc. Natl. Acad. Sci. U.S.A. 88:9578-9582; Bartel P.L. & Fields S. (1995) Analyzing protein-protein interactions using two-hybrid system. Methods Enzymol. 254:241-263; or Wallach et al. (1998) The yeast two-hybrid screening technique and its use in the study of protein-protein interactions in apoptosis. Curr. Opin. Immunol. 10(2):131-136. See also http://www.clontech.com/matchmaker.
The two-hybrid method uses the restoration of transcriptional activation to indicate the interaction between two proteins. Central to this technique is the fact that many eukaryotic transcriptional activators consist of two physically discrete modular domains. The DNA-binding domain (DNA-BD) binds to a specific promoter sequence. The activation domain (AD) directs the RNA polymerase II complex to transcribe the gene downstream of the DNA binding site. The DNA-BD vector is used to generate a fusion of the DNA-BD and a bait protein X, and the AD vector is used to generate a fusion of the AD and another protein Y. An entire library of hybrids with the AD can also be constructed to search for new or unknown proteins that interact with the bait protein. When interaction occurs between the bait protein X and a candidate protein Y, the two functional domains responsible for DNA binding and activation are tethered, which restores transcriptional activation. The two hybrids are cotransformed into a yeast host strain harboring reporter genes that contain appropriate upstream binding sites. Expression of the reporter genes then indicates interaction between a candidate protein and the target protein.
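The readout logic described above can be expressed as a toy model. In it, the reporter is switched on only when an activation-domain fusion is tethered to the promoter-bound DNA-binding-domain fusion. It is a logical caricature rather than a model of yeast biology. For instance, it ignores baits that activate transcription on their own. Protein names and the interaction set are hypothetical.

```python
def reporter_on(bd_fusion_partner, ad_fusion_partner, interactions):
    """Reporter is transcribed only when the AD fusion is tethered to the BD fusion.

    bd_fusion_partner: protein fused to the DNA-binding domain (the bait X)
    ad_fusion_partner: protein fused to the activation domain (prey Y), or None
                       for an empty AD vector control
    interactions: set of frozensets, each holding a pair of interacting proteins
    """
    if ad_fusion_partner is None:
        return False  # the BD fusion alone binds DNA but cannot activate transcription
    return frozenset((bd_fusion_partner, ad_fusion_partner)) in interactions


def screen_library(bait, ad_library, interactions):
    """Return the AD-fusion library members that switch on the reporter with the bait."""
    return [prey for prey in ad_library if reporter_on(bait, prey, interactions)]


# Hypothetical interaction data for a bait X and a three-member AD library.
known = {frozenset(("X", "Y1")), frozenset(("X", "Y3"))}
print(screen_library("X", ["Y1", "Y2", "Y3"], known))  # ['Y1', 'Y3']
print(reporter_on("X", None, known))                    # False
```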
EXAMPLE 7: Full-length cloning of Cluster V genes
The polymerase chain reaction (PCR), a well-known procedure for in vitro enzymatic amplification of a specific DNA segment, can be used for direct cloning of Protein Cluster V genes. Tissue cDNA can be amplified by PCR, cloned into an appropriate plasmid and sequenced. For reviews, see e.g. Hooft van Huijsduijnen (1998) PCR-assisted cDNA cloning: a guided tour of the minefield. Biotechniques 24:390-392; Lenstra (1995) The applications of the polymerase chain reaction in the life sciences. Cellular & Molecular Biology 41:603-614; or Rashtchian (1995) Novel methods for cloning and engineering genes using the polymerase chain reaction. Current Opinion in Biotechnology 6:30-36. Various methods for generating suitable ends to facilitate the direct cloning of PCR products are given e.g. in Ausubel et al., supra (section 15.7).
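PCR amplification rests on near-exponential growth of the target. With a per-cycle efficiency E, N0 starting copies give roughly N = N0(1 + E)^n copies after n cycles, until reagents become limiting. The sketch below is a minimal version of this idealized model. The efficiency values are assumptions, and real reactions plateau.

```python
def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N = N0 * (1 + E) ** n, ignoring the late-cycle plateau.

    efficiency E is the fraction of templates copied per cycle (1.0 = perfect doubling).
    """
    return n0 * (1.0 + efficiency) ** cycles


def cycles_needed(n0: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for which the idealized yield reaches `target`."""
    if n0 <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n0 > 0 and efficiency in (0, 1]")
    cycles, copies = 0, float(n0)
    while copies < target:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(cycles_needed(1, 1e6))       # 20, since 2**20 = 1,048,576
print(cycles_needed(1, 1e6, 0.9))  # 22 at 90% efficiency
```

The comparison shows why a modest drop in efficiency adds several cycles. It also shows why rare cDNAs may need nested or additional amplification before cloning.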
In an alternative approach to isolating a cDNA clone encoding a full-length protein of Protein Cluster V, a DNA fragment corresponding to a nucleotide sequence selected from the group consisting of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17 or 19, or a portion thereof, can be used as a probe for hybridization screening of a phage cDNA library. The DNA fragment is amplified by the polymerase chain reaction (PCR) method. The primers are preferably 10 to 25 nucleotides in length and are determined by procedures well known to those skilled in the art. A lambda phage library containing cDNAs cloned into lambda phage vectors is plated on agar plates with E. coli host cells and grown. Phage plaques are transferred to nylon membranes, which are hybridized with a DNA probe prepared as described above. Positive plaques are isolated from the plates. Plasmids containing cDNA are rescued from the isolated phages by standard methods, and plasmid DNA is isolated from the clones. The size of the insert is determined by digesting the plasmid with appropriate restriction enzymes. The sequence of the entire insert is determined by automated sequencing of the plasmids.
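The text fixes only the primer length, preferably 10 to 25 nucleotides. The sketch below is a minimal example of routine primer checks, assuming primers are taken from the two ends of the sequence to be amplified. The Wallace rule used for the melting temperature is a rough rule of thumb for short oligonucleotides, not a method stated in the description. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(template, length=20):
    """Forward primer from the 5' end; reverse primer = reverse complement of the 3' end."""
    if not 10 <= length <= 25:
        raise ValueError("primer length outside the 10-25 nt range")
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    return template[:length], reverse_complement(template[-length:])


print(reverse_complement("ATGGCC"), wallace_tm("ATGGCC"))  # GGCCAT 20
```

Both primers of a pair are usually chosen to have similar GC content and melting temperature, so that they anneal efficiently at the same temperature.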
EXAMPLE 8: Recombinant expression of proteins in eukaryotic host cells
To produce proteins of Cluster V, a polypeptide-encoding nucleic acid molecule is expressed in a suitable host cell using a suitable expression vector and standard genetic engineering techniques. For example, the polypeptide-encoding sequence is subcloned into a commercial expression vector and transfected into mammalian cells, e.g. Chinese Hamster Ovary (CHO) cells, using a standard transfection reagent. Cells stably expressing a protein are selected. Optionally, the protein may be purified from the cells using standard chromatographic techniques. To facilitate purification, antisera are raised against one or more synthetic peptides corresponding to portions of the amino acid sequence, and the antisera are used to affinity-purify the protein.
EXAMPLE 9: Determination of gene function
Methods are known in the art for elucidating the biological function or mode of action of individual genes. For instance, RNA interference (RNAi) offers a way of specifically and potently inactivating a cloned gene, and is proving a powerful tool for investigating gene function. For reviews, see e.g. Fire (1999) RNA-triggered gene silencing. Trends in Genetics 15:358-363; or Kuwabara & Coulson (2000) RNAi-prospects for a general technique for determining gene function. Parasitology Today 16:347-349. When double-stranded RNA (dsRNA) corresponding to a sense and antisense sequence of an endogenous mRNA is introduced into a cell, the cognate mRNA is degraded and the gene is silenced. This dsRNA-triggered form of post-transcriptional gene silencing (PTGS) was first described in C. elegans (Fire et al. (1998) Nature 391:806-811). RNA interference has recently been used to target nearly 90% of predicted genes on C. elegans chromosome I (Fraser et al. (2000) Nature 408:325-330) and 96% of predicted genes on C. elegans chromosome III (Gönczy et al. (2000) Nature 408:331-336).
TABLE I
Alignment of polypeptides in Protein Cluster V
SEQ 2
SEQ 4
SEQ 8
SEQ 10
SEQ 12
SEQ 14
SEQ 20 MVNGKSITSLQSNKNLAAIHGPKYLCGNFGPR QAFSLGTKLDPMEVFPKLLPSKVPVAQ 60
SEQ 16
SEQ 18
SEQ 6
SEQ 2
SEQ 4
SEQ 8
SEQ 10
SEQ 12
SEQ 14
SEQ 20 TLAPYSAPCFQRL SAAKVKAPSHNAKQGPKMDGQLVKTHDLSPKHNYIIANHPHGILS 120
SEQ 16 RPGGSEG 7
SEQ 18
SEQ 6
SEQ 2 EAP FSRCLAFHPPFILLNTPKLVKTAELPPDRNYVLGAHPHGIMCTGFLCNF 53
SEQ 4 LGTLLGWRAPLFSRCLAFHPPFILLNTPKLVKTAELPPDRNYVLGAHPHGIMCTGFLCNF 60
SEQ 8 --AFCNFSTEATEVSKKFPGIRPYLATLAGNFRMPVLREYLMSGGICPVSRDTIDYLLSK 58
SEQ 10
SEQ 12
SEQ 14 NLF 3
SEQ 20 FGVFINFATEATGIARIFPSITPFVGTLERIFWIPIVREYVMSMGVCPVSSSALKYLLTQ 180
SEQ 16 RFPKVTPVSGRVRAGTQAPP LSRLPSLQLVKTAELDPSRNYIAGFHPHGVLAVGAFANL 67
SEQ 18 SDYVPLKLLKTHDICPSRNYILVCHPHGLFAHG FGHF 38
SEQ 6 CSEIFASLRLPR IMAHSKQPSHFQSLMLLQ 31
SEQ 2 STESHGFSQLFPGLRP LSVLAG LFYLPVYRDYI SFGLCPVSRQSLD FIL 104
SEQ 4 STESNGFSQLFPGLRP LAVLAG LFYLPVYRDYIMSFGASLVPVYSFGENDIFRL 115
SEQ 8 NGSGNAIIIVVGGAAESLSSMPGKNAVTLRNRKGFVKLALRHGADLVPIYSFGENEVYKQ 118
SEQ 10 RNRKGFVKLALRHGADLVPIYSFGENEVYKQ 31
SEQ 12 KESLDAHPGKFTLFIRQRKGFVKIALTHGASLVPVVSFGENELFKQ 46
SEQ 14 EAHKLKFNIIVGGAQEALDARPGSFTLLLRNRKGFVRLALTHGAPLVXIFSFGENDLFDQ 63
SEQ 20 KGSGNAVVIVVGGAAEALLCRPGASTLFLKQRKGFVKMALQTGAYLVPSYSFGENEVFNQ 240
SEQ 16 CTESTGFSSIFPGIRPHLMMLTL WFRAPFFRDYIMSAGLVTSEKESAAHILNRKG 122
SEQ 18 ATEASGFSKIFPGITPYILTLGA FFWMPFLREYVMSTGACSVSRSSIDFLLTHKG 93
SEQ 6 PLSYLAIF ILQPLFVYLLFTSL PLPVLYFA LFLDWKTPERGGRRSA VRNWCVWTHI 91
TABLE I (continued)
SEQ 2 SQPQLG QAVVI VGGAEALYSVPGEHCLTLQKRKGFVRLALRHGASLVP 153
SEQ 4 KAFATGS QHWCQLTFKK LMGFSPCIFWGRGLFSATSWGLLPFAVPITTVVGRPIP 171
SEQ 8 VIFEEGS GR VQKKFQ KYIGFAPCIFHGRGLFSSDTWGLVPYSKPITTVVGEPIT 174
SEQ 10 VIFEEGSWGR VQKKFQ KYIGFAPCIFHGRGLFSSDT GLVPYSKPITTVGGGKIQ 87
SEQ 12 TDNPEGSWIRTVQNKLQ KIMGFALPLFHARGVFQYN-FGLMTYRKAIHTVVGRPIP 101
SEQ 14 IPNSSGS LRYIQNRLQ KIMG 84
SEQ 20 ETFPEGT LRLFQKTFQDTFKKILGLNFCTFHGRG-FTRGS GFLPFNRPITTVVGEPLP 299
SEQ 16 GGNLLGIIVG GAQEALDARPGSFTLLLRNRKGFVRLALTHG 163
SEQ 18 TGNMVIVVIG GLAECRYSLPGSSTLVLKNRSGFVRMALQHGVPLIP 139
SEQ 6 RDYFPITILK TKDLSPEHNYLMGVHPMGLLTFGAFCNFC 130
SEQ 2 VYS FGENDIFR KAFATGSWQHWCQLTFKKL-MGFSPCIFWVAV 196
SEQ 4 VPQRLHPTEEEVNHYHALYMTDLEQLFEEHKESCGVPASTCLTFI-- 216
SEQ 8 IPKLEHPTQQDIDLYHTMYMEALVKLFDKHKTKFGLPETEVLEVN-- 219
SEQ 10 S RSKKRKINXX QNDSCYSL 106
SEQ 12 VRQTLNPTQEQIEELHQTYMEELRKLFEEHKGKYGIPEHETLVLK— 146
SEQ 14
SEQ 20 IPRIKRPNQKTVDKYHALYISALRKLFDQHKVEYG PETQELTIT— 344
SEQ 16
SEQ 18 AYAFGETDL 148
SEQ 6

Claims

1. An isolated nucleic acid molecule selected from:
(a) nucleic acid molecules comprising a nucleotide sequence as shown in SEQ ID NO: 3, 5, 7, 9, 11, 13, 15, 17, or 19;
(b) nucleic acid molecules comprising a nucleotide sequence capable of hybridizing, under stringent hybridization conditions, to a nucleotide sequence complementary to the polypeptide coding region of a nucleic acid molecule as defined in (a); and
(c) nucleic acid molecules comprising a nucleic acid sequence which is degenerate as a result of the genetic code to a nucleotide sequence as defined in (a) or (b).
2. An isolated polypeptide encoded by the nucleic acid molecule according to claim 1.
3. The isolated polypeptide according to claim 2 having an amino acid sequence shown as SEQ ID NO: 4, 6, 8, 10, 12, 14, 16, 18 or 20 in the Sequence Listing.
4. A vector harboring the nucleic acid molecule according to claim 1.
5. A replicable expression vector which carries and is capable of mediating the expression of a nucleotide sequence according to claim 1.
6. A cultured host cell harboring a vector according to claim 4 or 5.
7. A process for production of a polypeptide, comprising culturing a host cell according to claim 6 under conditions whereby said polypeptide is produced, and recovering said polypeptide.
8. A method for identifying an agent capable of modulating a nucleic acid molecule according to claim 1, comprising
(i) providing a cell comprising the said nucleic acid molecule;
(ii) contacting said cell with a candidate agent; and
(iii) monitoring said cell for an effect that is not present in the absence of said candidate agent.
PCT/SE2002/000730 2001-04-12 2002-04-12 Protein cluster v WO2002083721A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA002440846A CA2440846A1 (en) 2001-04-12 2002-04-12 Protein cluster v
EP02718767A EP1377603A1 (en) 2001-04-12 2002-04-12 Protein cluster v
JP2002581476A JP2005500020A (en) 2001-04-12 2002-04-12 Protein cluster V

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE0101317A SE0101317D0 (en) 2001-04-12 2001-04-12 Protein cluster v
SE0101317-6 2001-04-12

Publications (1)

Publication Number Publication Date
WO2002083721A1 true WO2002083721A1 (en) 2002-10-24

Family

ID=20283777

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2002/000730 WO2002083721A1 (en) 2001-04-12 2002-04-12 Protein cluster v

Country Status (6)

Country Link
EP (1) EP1377603A1 (en)
JP (1) JP2005500020A (en)
CA (1) CA2440846A1 (en)
NZ (1) NZ527682A (en)
SE (1) SE0101317D0 (en)
WO (1) WO2002083721A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000001713A2 (en) * 1998-07-02 2000-01-13 Calgene Llc Diacylglycerol acyl transferase proteins
WO2000078961A1 (en) * 1999-06-23 2000-12-28 Genentech, Inc. Secreted and transmembrane polypeptides and nucleic acids encoding the same

Non-Patent Citations (1)

Title
DATABASE GENESEQ [online] "Secreted and transmembrane proteins and nucleic acids designated PRO, useful as hybridization probes, in chromosome and gene mapping and gene therapy", XP002971740, Database accession no. (AAB66170) *

Also Published As

Publication number Publication date
NZ527682A (en) 2005-04-29
EP1377603A1 (en) 2004-01-07
JP2005500020A (en) 2005-01-06
CA2440846A1 (en) 2002-10-24
SE0101317D0 (en) 2001-04-12

Similar Documents

Publication Publication Date Title
Hillier et al. Generation and analysis of 280,000 human expressed sequence tags.
Chai et al. Identification of four highly conserved genes between breakpoint hotspots BP1 and BP2 of the Prader-Willi/Angelman syndromes deletion region that have undergone evolutionary transposition mediated by flanking duplicons
Zhang et al. Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem/progenitor cells
Wlaschin et al. EST sequencing for gene discovery in Chinese hamster ovary cells
Lai et al. Characterization of the maize endosperm transcriptome and its comparison to the rice genome
Hammarsund et al. Identification and characterization of two novel human mitochondrial elongation factor genes, hEFG2 and hEFG1, phylogenetically conserved through evolution
Perelygin et al. The mammalian 2′-5′ oligoadenylate synthetase gene family: evidence for concerted evolution of paralogous Oas1 genes in Rodentia and Artiodactyla
WO2002053737A1 (en) Nf-kb activating gene
US6835556B2 (en) Protein cluster V
AU6117799A (en) Genes, proteins and biallelic markers related to central nervous system disease
Olivier et al. A novel set of hepatic mRNAs preferentially expressed during an acute inflammation in rat represents mostly intracellular proteins
Bennett et al. Characterization of the human secreted phosphoprotein 24 gene (SPP2) and comparison of the protein sequence in nine species
WO2000058510A2 (en) Schizophrenia associated genes, proteins and biallelic markers
Arnould et al. Identifying and characterizing a five‐gene cluster of ATP‐binding cassette transporters mapping to human chromosome 17q24: a new subgroup within the ABCA subfamily
WO2002083721A1 (en) Protein cluster v
AU2002249749A1 (en) Protein cluster V
US20050096269A1 (en) Protein Cluster II
WO2002051864A1 (en) Protein cluster ii
WO2002042324A1 (en) Gene encoding protein cluster i and the encoded protein
Mottus et al. Unique gene organization: alternative splicing in Drosophila produces two structurally unrelated proteins
US20020165182A1 (en) Gene encoding Protein Cluster I and the encoded protein
Vitale et al. Cysteine and tyrosine-rich 1 (CYYR1), a novel unpredicted gene on human chromosome 21 (21q21.2), encodes a cysteine and tyrosine-rich protein and defines a new family of highly conserved vertebrate-specific genes
Carlsson et al. Genomic structure of mouse SPI-C and genomic structure and expression pattern of human SPI-C
WO2001007607A2 (en) FULL LENGTH cDNA CLONES AND PROTEINS ENCODED THEREBY
Jin et al. Expression profile of mRNAs from human pancreatic islet tumors

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 2004-01-01)
WWE WIPO information: entry into national phase

Ref document number: 527682

Country of ref document: NZ

WWE WIPO information: entry into national phase

Ref document number: 2002249749

Country of ref document: AU

WWE WIPO information: entry into national phase

Ref document number: 2440846

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 2002718767

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 2002581476

Country of ref document: JP

WWP WIPO information: published in national office

Ref document number: 2002718767

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP WIPO information: published in national office

Ref document number: 527682

Country of ref document: NZ

WWG WIPO information: grant in national office

Ref document number: 527682

Country of ref document: NZ

WWW WIPO information: withdrawn in national office

Ref document number: 2002718767

Country of ref document: EP