EP1163338A2 - HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS

HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS

Info

Publication number
EP1163338A2
EP1163338A2 EP00923114A EP00923114A EP1163338A2 EP 1163338 A2 EP1163338 A2 EP 1163338A2 EP 00923114 A EP00923114 A EP 00923114A EP 00923114 A EP00923114 A EP 00923114A EP 1163338 A2 EP1163338 A2 EP 1163338A2
Authority
EP
European Patent Office
Prior art keywords
sequence
dna
sequences
polypeptide
dna sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00923114A
Other languages
German (de)
French (fr)
Inventor
Christian Van Den Bos
Gabriel Mbalaviele
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Osiris Therapeutics Inc
Original Assignee
Osiris Therapeutics Inc
Application filed by Osiris Therapeutics Inc
Publication of EP1163338A2

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals

Definitions

  • This invention relates to newly identified polynucleotide sequences corresponding to transcription products of human genes, and to complete gene sequences associated therewith and to gene expression products thereof and to uses for the foregoing.
  • Osteoblasts, key cells in bone formation, or osteogenesis, are formed from mesenchymal stem cells.
  • MSCs: mesenchymal stem cells
  • Osteogenesis, the differentiation into bone cells, has been reported as a means to generate replacement bone from cultured and implanted MSCs (Bruder et al., Growth Kinetics, Self-Renewal, and the Osteogenic Potential of Purified Human Mesenchymal Stem Cells During Extensive Subcultivation and Following Cryopreservation, J. Cell
  • mRNAs: messenger ribonucleic acids
  • Figure 1 shows the consensus sequence (SEQ ID NO: 27) for the novel DNA sequence of the invention as determined from different cDNA clones of said sequence, the latter being about 2.5 kb in length.
  • Figure 2 is a deduced amino acid sequence for the protein expressed from the sequence of Figure 1, residues 125 through 1717, and corresponding to SEQ ID NO: 29.
  • the amino acids set off between asterisks constitute a bipartite nuclear localization signal.
  • the isoelectric point and molecular weight were also calculated for the putative protein.
  • Figure 3 shows the results of a dot blot assay for the presence of the novel DNA sequence in a variety of human tissues.
  • a prefabricated dot blot from Clontech #7770-1
  • Figure 4 is a bar graph showing the distribution of the sequence of Figure 1 in a variety of human tissues based on relative mRNA abundance. The highest signal strength was in cells of adult heart and lowest was in fetal thymus. The bar graphs were generated using data from the dot blots of Figure 3 and were imported into an Excel spreadsheet.
  • One aspect of the present invention is directed to nucleic acids and isolated DNA sequences and molecules, and fragments thereof (and corresponding isolated RNA sequences, and fragments thereof), including sequences complementary to the foregoing, showing sequence similarity to, or capable of hybridizing to, the DNA sequences identified in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, and 27 or 28.
  • the present invention is also directed to fragments or portions of such sequences which contain at least 15 bases, preferably at least 30 bases, more preferably at least 50 bases and most preferably at least 80 bases, and to those sequences which are at least 60%, preferably at least 80%, and most preferably at least 95%, especially 98%, identical thereto, and to DNA (or RNA) sequences encoding the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, including fragments and portions thereof and, when derived from natural sources, includes alleles thereof.
  • the term "percent identity" or "percent identical," when referring to a sequence, means that a sequence is compared to a claimed or described sequence after alignment of the sequence to be compared (the "Compared Sequence") with the described or claimed sequence (the "Reference Sequence").
  • the Percent Identity is then determined according to the following formula: Percent Identity = 100 [1 - (C/R)]
  • wherein C is the number of differences between the Reference Sequence and the Compared Sequence over the length of alignment between the Reference Sequence and the Compared Sequence wherein (i) each base or amino acid in the Reference Sequence that does not have a corresponding aligned base or amino acid in the Compared Sequence and (ii) each gap in the Reference Sequence and (iii) each aligned base or amino acid in the Reference Sequence that is different from an aligned base or amino acid in the Compared Sequence, constitutes a difference; and R is the number of bases or amino acids in the Reference Sequence over the length of the alignment with the Compared Sequence with any gap created in the Reference Sequence also being counted as a base or amino acid.
  • If an alignment exists between the Compared Sequence and the Reference Sequence for which the Percent Identity as calculated above is equal to or greater than a specified minimum Percent Identity, then the Compared Sequence has the specified minimum percent identity to the Reference Sequence even though alignments may exist in which the hereinabove calculated Percent Identity is less than the specified Percent Identity.
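  • A minimal sketch of the calculation described above, assuming the formula is Percent Identity = 100 × [1 − (C/R)] with C and R as defined in the text. The function name, the "-" gap character and the example sequences are illustrative assumptions; the sketch scores an alignment that already exists and does not perform the alignment itself.

```python
def percent_identity(reference: str, compared: str, gap: str = "-") -> float:
    """Score an existing pairwise alignment as 100 * [1 - (C / R)].

    `reference` and `compared` are the two rows of the alignment and must
    have equal length; `gap` marks a position with no residue in that row.
    """
    if len(reference) != len(compared):
        raise ValueError("aligned rows must have equal length")
    if not reference:
        raise ValueError("alignment is empty")

    # R: positions of the Reference Sequence over the alignment,
    # counting any gap created in the Reference Sequence as a position.
    r = len(reference)

    # C: a reference residue with no aligned partner, a gap in the
    # reference, or a mismatch each count as one difference.
    c = 0
    for ref_res, cmp_res in zip(reference, compared):
        if ref_res == gap or cmp_res == gap or ref_res.upper() != cmp_res.upper():
            c += 1

    # Algebraically the same as 100 * (1 - c / r).
    return 100.0 * (r - c) / r


def meets_minimum(reference: str, compared: str, minimum: float) -> bool:
    """True if this particular alignment reaches the specified minimum."""
    return percent_identity(reference, compared) >= minimum
```

  • Because the text requires only that some alignment reach the specified minimum, a caller would apply `meets_minimum` to candidate alignments and accept the sequence as soon as one of them passes.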
  • Yet another aspect of the present invention is directed to an isolated DNA (or RNA) sequence or molecule comprising at least the coding region of a human gene (or a DNA sequence encoding the same polypeptide as such coding region), in particular an expressed human gene, which human gene comprises a DNA sequence homologous with, or contributing to, the sequence depicted in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27 or 28, or one at least 60%, preferably at least 80%, and most preferably at least 95%, especially 98%, identical thereto, including 100% identity, as well as fragments or portions of the coding region which encode a polypeptide having a similar function to the polypeptide encoded by said coding region.
  • the isolated DNA (or RNA) sequence may include only the coding region of the expressed gene (or fragment or portion thereof as hereinabove indicated) or may further include all or a portion of the non-coding DNA (or RNA) of the expressed human gene.
  • sequences homologous with and contributing to the sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27 or 28 are from the coding region of a human gene.
  • the present invention also relates to vectors or plasmids which include such DNA (or RNA) sequences, as well as the use of the DNA (or RNA) sequences.
  • 17, 19, 21, 23, 25 and 28 are hybridizable with actual DNA and RNA sequences as derived from different human tissues. These sequences represent cDNA clones.
  • The sequence depicted in Figure 1 (SEQ ID NO: 27) is hybridizable with actual DNA and RNA sequences as derived from different human tissues. A number of cDNA clones have been generated. The nucleotide sequence of Figure 1 (SEQ ID NO: 27) itself showed a nuclear location in the various tissues studied. The distribution of this sequence in various human tissues is shown in Figures 3 and 4. Some of these clones had an additional 3'-untranslated region, the presence of which is generally related to the extent to which the mRNA species remain in the cell before being turned over. See Kingman, Genetic Engineering, Blackwell, 1988, at page 313.
  • the 3'-untranslated region may also regulate the frequency at which the mRNA is translated and thus constitute a mechanism by which the expression of the protein can be regulated (Gray, N.K. & Wickens, M., Control of Translation Initiation in Animals, Ann. Rev. Cell Dev. Biol., 14:399-458 (1998)).
  • the polynucleotides of the present invention may be in the form of RNA or in the form of DNA, which DNA includes cDNA, genomic DNA, and synthetic DNA.
  • the DNA may be double-stranded or single-stranded, and if single-stranded may be the coding strand or non-coding (anti-sense) strand.
  • the coding sequence which encodes the mature polypeptide may be identical to the coding sequences present as open reading frames (ORFs) of the polynucleotide sequences disclosed herein or may be a different coding sequence, which coding sequence, as a result of the redundancy or degeneracy of the genetic code, encodes the same mature polypeptide as the polynucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 28.
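  • The degeneracy point above can be illustrated with a short sketch: two different coding sequences that translate to the same polypeptide under the standard genetic code. The function name and the two toy sequences are hypothetical and are not taken from any SEQ ID NO of this document.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    coding_dna = coding_dna.upper()
    peptide = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Two hypothetical coding sequences that differ at the DNA level
# (GCT vs GCC, TTA vs CTG, TAA vs TGA) but encode the same peptide.
variant_a = "ATGGCTTTATAA"
variant_b = "ATGGCCCTGTGA"
same_product = translate(variant_a) == translate(variant_b)
```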
  • polynucleotides that code for the polypeptides disclosed herein as putative proteins SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29 may include, but are not limited to: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence such as a leader or secretory sequence, a proprotein sequence and a membrane anchor; the coding sequence for the mature polypeptide (and optionally additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature polypeptide.
  • the polynucleotide which codes for the polypeptide of Figure 2 may include, but is not limited to: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence such as a leader or secretory sequence, a proprotein sequence and a membrane anchor; the coding sequence for the mature polypeptide (and optionally additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature polypeptide.
  • polynucleotide as used for the present invention encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequences.
  • the present invention further relates to variants of the hereinabove described polynucleotides which encode fragments, analogs and derivatives of the polypeptides having the amino acid sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29.
  • Variants of the polynucleotide may be naturally occurring allelic variants of the polynucleotides or a non-naturally occurring variant of the polynucleotides.
  • nucleic acids, or polynucleotides, according to the present invention may have coding sequences which are naturally occurring allelic variants of the coding sequence shown in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28.
  • an allelic variant is an alternate form of a polynucleotide sequence which may have a substitution, deletion or addition of one or more nucleotides, which does not substantially alter the function of the encoded polypeptide.
  • the present invention also includes polynucleotides, wherein the coding sequence for the mature polypeptide may be fused in the same reading frame to a polynucleotide sequence which aids in expression and secretion of a polypeptide from a host cell, for example, a leader sequence which functions as a secretory sequence for controlling transport of a polypeptide from the cell and a transmembrane anchor which facilitates attachment of the polypeptide to a cellular membrane.
  • the polypeptide having a leader sequence is a preprotein and may have the leader sequence cleaved by the host cell to form the mature polypeptide.
  • the polynucleotides may also encode for a proprotein which is the mature protein plus additional 5' amino acid residues.
  • a mature protein having a prosequence is a proprotein and is often an inactive form of the protein. Once the prosequence is cleaved an active mature protein remains.
  • the polynucleotide of the present invention may encode for a mature protein, for a protein having a prosequence, for a protein having a transmembrane anchor or for a polypeptide having a prosequence, a presequence (leader sequence) and a transmembrane anchor.
  • the polynucleotides of the present invention may also have the coding sequence fused in frame to a marker sequence which allows for purification of the polypeptide of the present invention.
  • the marker sequence may be a hexa-histidine tag supplied by a pQE-9 vector to provide for purification of the mature polypeptide fused to the marker in the case of a bacterial host, or, for example, the marker sequence may be a hemagglutinin (HA) tag when a mammalian host, e.g. COS-7 cells, is used.
  • the HA tag corresponds to an epitope derived from the influenza hemagglutinin protein (Wilson, I., et al., Cell, 37:767 (1984)).
  • Fragments of the full length polynucleotide of the present invention may be used as hybridization probes for a cDNA library to isolate the full length cDNA and to isolate other cDNAs which have a high sequence similarity to the gene or similar biological activity.
  • Probes of this type preferably have at least 15 bases, may have at least 30 bases and even 50 or more bases.
  • the probe may also be used to identify a cDNA clone corresponding to a full length transcript and a genomic clone or clones that contain the complete gene including regulatory and promoter regions, exons, and introns.
  • An example of a screen comprises isolating the coding region of the gene by using the known DNA sequence to synthesize an oligonucleotide probe. Labeled oligonucleotides having a sequence complementary to that of the gene of the present invention are used to screen a library of human cDNA, genomic DNA or mRNA to determine which members of the library the probe hybridizes to.
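  • As a rough illustration of the screen just described, the sketch below models hybridization as an exact match between a library member and the reverse complement of the probe. This is a deliberate simplification: real hybridization tolerates mismatches depending on stringency, and double-stranded clones present both strands. The function names, clone names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def screen_library(probe: str, library: dict) -> list:
    """Names of library members containing a perfect target for the probe.

    `library` maps a clone name to the sequence of one strand of its insert.
    """
    target = reverse_complement(probe)
    return [name for name, insert in library.items() if target in insert.upper()]


# Hypothetical 15-base stretch of known sequence and the probe made against it.
known_stretch = "GCCATTGTAATGGGC"
probe = reverse_complement(known_stretch)

library = {
    "clone_a": "TTTT" + known_stretch + "AAAA",   # contains the target
    "clone_b": "ACGTACGTACGTACGTACGTACGT",        # does not
}
hits = screen_library(probe, library)
```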
  • a polynucleotide according to the present invention may have at least 15 bases, preferably at least 30 bases, and more preferably at least 50 bases which hybridize to a polynucleotide of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 and which has an identity thereto, as hereinabove described, and which may or may not retain activity.
  • Such polynucleotides may be employed as probes for the polynucleotides or genes coding for the polypeptides of SEQ ID NOS: 2, 4,
  • polynucleotides according to the present invention may also occur in the form of mixtures of polynucleotides hybridizable to some extent with the gene sequences containing any of the nucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 including any and all fragments thereof, and which polynucleotide mixtures may be composed of any number of such polynucleotides, or fragments thereof, including mixtures having at least 10, perhaps at least 30 such sequences, or fragments thereof. Because coding regions comprise only a small portion of the human genome, identification and mapping of transcribed regions and coding regions of chromosomes is of significant interest.
  • Various aspects of the present invention include each of the individual sequences, corresponding partial and complete cDNAs, genomic DNA, mRNA, antisense strands, PCR primers, coding regions, and constructs.
  • Expression vectors and polypeptide expression products are also within the scope of the present invention, along with antibodies, especially monoclonal antibodies, to such expression products.
  • the term "gene" or "cistron" means the segment of DNA (or DNA segment) involved in producing a polypeptide chain; it includes regions preceding and following the coding region (5'- and 3'-untranslated regions, or UTRs, also called leader and trailer sequences, regions, or segments) as well as intervening sequences (introns) between individual coding segments (exons), which intronic regions are typically removed during processing of post-transcriptional RNA to form the final translatable mRNA product.
  • cDNAs contain no intronic sequences.
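  • To make the exon/intron relationship concrete, the sketch below joins the exon segments of a genomic sequence and drops the intervening sequence, which is why a cDNA carries no introns. The coordinates, the toy sequence and the function name are hypothetical; exon boundaries are supplied by the caller rather than predicted.

```python
def splice_exons(genomic: str, exons: list) -> str:
    """Join exon segments of a genomic sequence, omitting the introns.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive,
    given in 5'-to-3' order.
    """
    previous_end = 0
    parts = []
    for start, end in exons:
        if not previous_end <= start < end <= len(genomic):
            raise ValueError("exons must be ordered, non-overlapping and inside the sequence")
        parts.append(genomic[start:end])
        previous_end = end
    return "".join(parts)


# Hypothetical gene: exon 1 (6 bases), a 12-base intron, exon 2 (6 bases).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]
spliced = splice_exons(genomic, exons)
```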
  • DNA segment refers to a DNA polymer, in the form of a separate fragment or as a component of a larger DNA construct, which has been derived from DNA isolated at least once in substantially pure form, i.e., free of contaminating endogenous materials and in a quantity or concentration enabling identification, manipulation, and recovery of the segment and its component nucleotide sequences by standard biochemical methods, for example, using a cloning vector.
  • Such segments are provided in the form of an open reading frame uninterrupted by internal nontranslated sequences (introns), which are typically present in eukaryotic genes. Sequences of non-translated DNA may be present downstream from the open reading frame, where the same do not interfere with manipulation or expression of the coding regions.
  • nucleic acids and polypeptide expression products disclosed according to the present invention may be in "enriched" form.
  • enriched means that the concentration of the material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously 0.01%, by weight, preferably at least about 0.1% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated.
  • sequences, constructs, vectors, clones, and other materials comprising the present invention can advantageously be in enriched or isolated form. For example, removal, via the differential display techniques described herein, of clones corresponding to ribosomal RNA and "housekeeping" genes and clones without human cDNA inserts results in a library that is "enriched" in the desired clones.
  • isolated means that the material is removed from its original environment (e.g., the natural environment if it is naturally occurring).
  • a naturally-occurring polynucleotide, or DNA present in a living animal is not isolated, but the same polynucleotide or DNA, separated from some or all of the coexisting materials in the natural system, is isolated.
  • DNA could be part of a vector and/or such polynucleotide could be part of a composition, and still be isolated in that such vector or polynucleotide is not part of its natural environment.
  • the DNA and RNA sequences, and polypeptides, disclosed in accordance with the present invention may also be in "purified" form.
  • the term “purified” does not require absolute purity; rather, it is intended as a relative definition, and can include preparations that are highly purified or preparations that are only partially purified, as those terms are understood by those of skill in the relevant art.
  • Individual clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity.
  • the cDNA clones are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). By conversion of mRNA into a cDNA library, pure individual cDNA clones can be isolated from the synthetic library by clonal selection.
  • creating a cDNA library from RNA and subsequently isolating individual clones from that library results in an approximately 10^6-fold purification of the native message.
  • Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
  • claimed polynucleotide which has a purity of preferably 0.001%, or at least 0.01% or 0.1%; and even desirably 1% by weight or greater is expressly contemplated.
  • coding region refers to that portion of a human gene which either naturally or normally codes for the expression product of that gene in its natural genomic environment, i.e., the region coding in vivo for the native expression product of the gene.
  • the coding region can be from a normal, mutated or altered gene, or can even be from a DNA sequence, or gene, wholly synthesized in the laboratory using methods well known to those of skill in the art of DNA synthesis.
  • nucleotide sequence refers to a heteropolymer of deoxyribonucleotides.
  • DNA segments encoding the proteins provided by this invention are assembled from cDNA fragments and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
  • expression product means that polypeptide or protein that is the natural translation product of the gene and any nucleic acid sequence coding equivalents resulting from genetic code degeneracy and thus coding for the same amino acid(s).
  • fragment when referring to a coding sequence means a portion of DNA comprising less than the complete human coding region whose expression product retains essentially the same biological function or activity as the expression product of the complete coding region.
  • portion refers to a continuous sequence of residues, such as amino acid residues, which sequence forms a subset of a larger sequence.
  • segment refers to a continuous sequence of residues, such as amino acid residues, which sequence forms a subset of a larger sequence.
  • the oligopeptides resulting from such treatment would represent portions, segments or fragments of the starting polypeptide.
  • portions, segments or fragments of polynucleotides would include those products resulting from the treatment of such polynucleotides with endonucleases.
  • primer means a short nucleic acid sequence that is paired with one strand of DNA and provides a free 3'OH end at which a DNA polymerase starts synthesis of a deoxyribonucleotide chain.
  • promoter means a region of DNA involved in binding of RNA polymerase to initiate transcription.
  • ORF: open reading frame
  • exon means any segment of an interrupted gene that is represented in the mature RNA product.
  • reference to a DNA sequence includes both single stranded and double stranded DNA.
  • specific sequence unless the context indicates otherwise, refers to the single strand DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and the complement of such sequence.
  • the overall approach to identification of cDNAs from hMSCs involved measurement of gene expression during growth of human mesenchymal stem cells in culture.
  • Cells were harvested and the total RNA content thereof was recovered.
  • RT-PCR: reverse transcriptase and polymerase chain reaction procedures
  • the mRNA from the cells of interest (such as the hMSCs used in accordance with the present invention) is used to prepare a set or family of cDNAs corresponding to the expressed genes of the cell.
  • This cDNA preparation is then exhaustively hybridized with mRNA of cells not expressing the gene, resulting in removal of all sequences from the cDNA preparation that are common to the two cell samples. All of the cDNA sequences that hybridize with the other mRNA are removed, and those that remain are then hybridized with mRNA from the cells expressing the gene (for example, cells from a healthy person or cells from tissues known to express the gene) to confirm that they are in fact the desired coding sequences. Because these latter clones contain sequences specific to the mRNA population of the cells of interest, they can subsequently be amplified and characterized using further rounds of PCR and the general techniques of molecular biology.
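  • The logic of this subtraction step can be sketched with sets: remove everything shared with the non-expressing cells, then keep only what is confirmed against the expressing cells. Transcripts are reduced to labels here, which ignores hybridization kinetics and partial homology entirely; the labels and the function name are hypothetical.

```python
def subtractive_screen(cdna_pool, non_expressing_mrna, expressing_mrna):
    """Return cDNAs left after subtraction and confirmed in expressing cells.

    Each argument is an iterable of transcript labels. Order of the
    original cDNA pool is preserved in the result.
    """
    common = set(non_expressing_mrna)
    confirmed_in = set(expressing_mrna)

    # Step 1: exhaustive hybridization removes sequences common to both samples.
    remaining = [cdna for cdna in cdna_pool if cdna not in common]

    # Step 2: the remainder is confirmed against mRNA from expressing cells.
    return [cdna for cdna in remaining if cdna in confirmed_in]


# Hypothetical labels: two housekeeping transcripts and two candidate clones.
cdna_pool = ["actin", "gapdh", "clone_27", "clone_28"]
non_expressing = ["actin", "gapdh", "other"]
expressing = ["actin", "gapdh", "clone_27"]
specific = subtractive_screen(cdna_pool, non_expressing, expressing)
```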
  • a cDNA library was generated and corresponds to the sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28.
  • Probes based on these cDNAs can be used to identify the relevant transcripts, using Northern Blotting Analysis methods well known in the art to localize these sequences within cells of various tissues.
  • the heaviest distribution of the gene coding for the polypeptide of Figure 2 (SEQ ID NO: 29) was in heart tissue, as shown in Figures 3 and 4.
  • cDNA was quantified by spotting 0.5 µl aliquots of standards and samples on ethidium agarose plates prepared as suggested in the instructions from the manufacturer (Stratagene, La Jolla, CA). Plates were incubated at room temperature for 15 minutes and DNA was visualized by UV transillumination. The respective cDNAs were then quantified by comparing spot intensities of the samples with those of the standards (the latter consisting of appropriate dilutions of 1 kb ladders from Life Technology).
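  • The comparison against standards could be made numeric along the following lines. The source describes only a comparison of spot intensities, so the linear interpolation between bracketing standards, the nanogram units and all intensity values below are assumptions for illustration.

```python
def estimate_amount(sample_intensity: float, standards: list) -> float:
    """Estimate DNA amount from a spot intensity by linear interpolation.

    `standards` is a list of (amount, intensity) pairs from the dilution
    series spotted on the same plate. Values outside the range of the
    standards are rejected rather than extrapolated.
    """
    if len(standards) < 2:
        raise ValueError("at least two standards are needed")
    points = sorted(standards, key=lambda pair: pair[1])
    if not points[0][1] <= sample_intensity <= points[-1][1]:
        raise ValueError("sample intensity lies outside the standard range")

    for (amount_lo, int_lo), (amount_hi, int_hi) in zip(points, points[1:]):
        if int_lo <= sample_intensity <= int_hi:
            if int_hi == int_lo:
                return float(amount_lo)
            fraction = (sample_intensity - int_lo) / (int_hi - int_lo)
            return amount_lo + (amount_hi - amount_lo) * fraction
    raise AssertionError("unreachable: intensity was range-checked above")


# Hypothetical standards: (amount in ng, measured spot intensity).
standards = [(10, 100.0), (20, 200.0), (40, 400.0)]
sample_ng = estimate_amount(150.0, standards)
```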
  • plasmid DNA was digested with both EcoRI and XhoI nucleases (New England Biolabs) and the resulting restriction fragments were separated by 1.5% agarose gel electrophoresis.
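  • A minimal sketch of what a double digest predicts, using the recognition sites of EcoRI (G^AATTC) and XhoI (C^TCGAG). It treats the DNA as a linear molecule and reports top-strand fragment lengths, whereas the plasmid digested above is circular; the example sequence and the function name are hypothetical.

```python
# Recognition site and cut offset (bases from the start of the site
# on the top strand) for the two enzymes named in the text.
SITES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}


def digest_linear(seq: str, enzymes=("EcoRI", "XhoI")) -> list:
    """Return fragment lengths, 5' to 3', after cutting a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for enzyme in enzymes:
        site, offset = SITES[enzyme]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical insert flanked by one EcoRI site and one XhoI site.
example = "A" * 10 + "GAATTC" + "A" * 20 + "CTCGAG" + "A" * 5
fragments = digest_linear(example)
```

  • The predicted lengths are what one would compare against the bands resolved on the gel.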
  • Each of the DNA sequences identified herein can be used in numerous ways as polynucleotide reagents.
  • the sequences can be used as diagnostic probes for the presence of a specific mRNA in a particular cell type as well as in genetic linkage analysis (polymorphisms). Further, the sequences can be used as probes for locating gene regions associated with genetic disease.
  • the nucleotide and gene sequences of the present invention are also valuable for chromosome identification. Each sequence is specifically targeted to and can hybridize with a particular location on an individual human chromosome. Moreover, there is a current need for identifying particular sites on the chromosome.
  • the mapping of the polynucleotides to specific chromosomes according to the present invention is an important first step in correlating those sequences with genes associated with disease.
  • sequences can be mapped to chromosomes by preparing PCR primers (preferably 15-30 bp) from the sequences disclosed herein. Computer analysis of these sequences is used to rapidly select primers that do not span more than one exon in the corresponding genomic DNA, which would otherwise complicate the amplification process. These primers are then used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the sequences or subsequences disclosed herein will yield an amplified fragment.
  • PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular sequence to a particular chromosome. Three or more clones can be assigned per day using a single thermal cycler, as is well known in the art. Using the present invention with the same oligonucleotide primers, sublocalization can be achieved with panels of fragments from specific chromosomes or pools of large genomic clones in an analogous manner.
  • Other mapping strategies that can similarly be used to map a sequence, or part of a sequence, to its chromosome include in situ hybridization, prescreening with labeled flow-sorted chromosomes and preselection by hybridization to construct chromosome-specific cDNA libraries.
  • Fluorescence in situ hybridization (FISH) of a cDNA clone to a metaphase chromosomal spread can be used to provide a precise chromosomal location in one step.
  • This technique can be used with cDNA as short as 500 or 600 bases; however, clones larger than 2,000 bp have a higher likelihood of binding to a unique chromosomal location with sufficient signal intensity for simple detection.
  • FISH requires use of the clone from which the sequence was derived, and the longer the better. For example, 2,000 bp is good, 4,000 is better, but more than 4,000 is probably not necessary to get good results a reasonable percentage of the time.
  • Reagents for chromosome mapping can be used individually (to mark a single chromosome or a single site on that chromosome) or as panels of reagents (for marking multiple sites and/or multiple chromosomes). Reagents corresponding to noncoding regions of the genes actually are preferred for mapping purposes. Coding sequences are more likely to be conserved within gene families, thus increasing the chance of cross hybridizations during chromosomal mapping.
  • a cDNA precisely localized to a chromosomal region associated with the disease could be one of between 50 and 500 potential causative genes. (This assumes 1 megabase mapping resolution and one gene per 20 kb.)
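  • The lower figure follows directly from the stated assumptions, as this small calculation shows. The function name is hypothetical; the observation that 500 genes would correspond to a 10 megabase region is simple arithmetic under the same gene density, not a statement from the source.

```python
def candidate_genes(mapping_resolution_bp: int = 1_000_000,
                    bp_per_gene: int = 20_000) -> int:
    """Number of genes expected within a mapped chromosomal region."""
    return mapping_resolution_bp // bp_per_gene


at_1_mb = candidate_genes()              # 1 Mb resolution, one gene per 20 kb
at_10_mb = candidate_genes(10_000_000)   # same density over 10 Mb
```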
  • Comparison of affected and unaffected individuals generally involves first looking for structural alterations in the chromosomes, such as deletions or translocations that are visible from chromosome spreads or detectable using PCR based on that cDNA sequence. Ultimately, complete sequencing of genes from several individuals is required to confirm the presence of a mutation and to distinguish mutations from polymorphisms.
  • sequences of the invention can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on binding of a polynucleotide sequence to DNA or RNA.
  • Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al, Nucl.
  • Antisense RNA or oligonucleotide hybridization may also lead to RNAse H activation and hence destruction of the molecules involved in the hybrid.
  • the present invention is also a useful tool in gene therapy, which requires isolation of the disease-associated gene in question as a prerequisite to the insertion of a normal gene into an organism to correct a genetic defect.
  • the high specificity of the cDNA probes according to this invention holds promise of targeting such gene locations in a highly accurate manner.
  • sequences of the present invention are also useful for identification of individuals from minute biological samples.
  • The United States military, for example, is considering the use of restriction fragment length polymorphism (RFLP) for identification of its personnel.
  • RFLP: restriction fragment length polymorphism
  • an individual's genomic DNA is digested with one or more restriction enzymes, and probed on a Southern blot to yield unique bands for identifying personnel.
  • This method does not suffer from the current limitations of "Dog Tags", which can be lost, switched, or stolen, making positive identification difficult.
  • the sequences of the present invention are useful as additional DNA markers for RFLP.
  • RFLP is a pattern based technique, which does not require the DNA sequence of the individual to be sequenced.
  • Portions of the sequences of the present invention can be used to provide an alternative technique that determines the actual base-by-base DNA sequence of selected portions of an individual's genome.
  • These sequences can also be used to prepare PCR primers for amplifying and isolating such selected DNA.
  • One can, for example, take part of the sequence of the invention and prepare two PCR primers from the 5' and 3' ends of the sequence, or fragment of the sequence. These are used to amplify an individual's DNA, corresponding to the sequence. The amplified DNA is sequenced.
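  • The primer-pair idea can be sketched as follows: the forward primer is taken from the 5' end of the chosen sequence and the reverse primer is the reverse complement of its 3' end, after which a simple exact-match search stands in for amplification. The default primer length of 20 sits inside the 15-30 bp range mentioned elsewhere in this document; the sequences and function names are hypothetical, and real primer design would also weigh melting temperature and specificity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_end_primers(target: str, length: int = 20) -> tuple:
    """Forward primer from the 5' end, reverse primer from the 3' end."""
    target = target.upper()
    if len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse


def amplify(template: str, forward: str, reverse: str):
    """Return the product bounded by the two primers, or None if either fails to bind.

    Binding is modelled as an exact match on the top strand of `template`.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    end_site = reverse_complement(reverse)
    end = template.find(end_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(end_site)]


# Hypothetical 55-base target embedded in an individual's DNA.
target = "ATGGCTAGCTAGGATCCGATCGATTACGGCTAGCTAACGTTAGCCGATAGCTTGA"
individual_dna = "TTTTTTTT" + target + "AAAAAAAA"
fwd, rev = design_end_primers(target, length=15)
product = amplify(individual_dna, fwd, rev)
```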
  • Panels of corresponding DNA sequences from individuals can provide unique individual identifications, as each individual will have a unique set of such DNA sequences, due to allelic differences.
  • the sequences of the present invention can be used to particular advantage to obtain such identification sequences from individuals and from tissue. Allelic variation occurs to some degree in the coding regions of these sequences, and to a greater degree in the noncoding regions. It is estimated that allelic variation between individual humans occurs with a frequency of about once per each 500 bases.
  • Each of the fragments or complete coding sequences comprising a part of the present invention can, to some degree, be used as a standard against which DNA from an individual can be compared for identification purposes. Because greater numbers of polymorphisms occur in the noncoding regions, fewer sequences are necessary to differentiate individuals.
  • when a panel of reagents from the sequences according to the present invention is used to generate a unique ID database for an individual, those same reagents can later be used to identify tissue from that individual. Positive identification of that individual, living or dead, can be made from extremely small tissue samples.
  • DNA-based identification techniques are also used in forensic biology.
  • PCR technology can be used to amplify DNA sequences taken from very small biological samples.
  • gene sequences are amplified at specific loci known to contain a large number of allelic variations, for example the DQα class II HLA gene (Erlich, H., PCR Technology, Freeman and Co. (1992)).
  • once this specific area of the genome is amplified, it is digested with one or more restriction enzymes to yield an identifying set of bands on a Southern blot probed with DNA corresponding to the DQα class II HLA gene.
  • the novel gene signal according to the present invention is found in many different tissues of the body.
  • sequences of the present invention can be used to provide polynucleotide reagents specifically targeted to additional loci in the human genome, and can enhance the reliability of DNA-based forensic identifications. Those sequences targeted to noncoding regions are particularly appropriate. As mentioned above, actual base sequence information can be used for identification as an accurate alternative to patterns formed by restriction enzyme generated fragments. Reagents for obtaining such sequence information are within the scope of the present invention. Such reagents can comprise complete genes, parts of genes or corresponding coding regions, or fragments of at least 15 bp, preferably at least 18 bp.
  • there is also a need for reagents capable of identifying the source of a particular tissue. Such need arises, for example, in forensics when presented with tissue of unknown origin.
  • Appropriate reagents can comprise, for example, DNA probes or primers specific to particular tissue prepared from the sequences of the present invention. Panels of such reagents can identify tissue by species and/or by organ type. In a similar manner, these reagents can be used to screen tissue cultures for contamination.
  • Sequences that match perfectly to several different genes can be detected by hybridizing to chromosomes: if many chromosomal loci are observed, the sequence (or a close variant) is in more than one gene.
  • This problem can be circumvented by using the 3'-untranslated part of the cDNA alone as a probe for the chromosomal location or for the full-length cDNA or gene.
  • the 3'-untranslated region is more likely to be unique within gene families, since there is no evolutionary pressure to conserve a coding function of this region of the mRNA.
  • the cDNA libraries disclosed according to the present invention ideally use directional cloning methods so that either the 5' end of the cDNA (likely to contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be selectively obtained.
  • the polynucleotides of the present invention can be derived from natural sources or synthesized using known methods.
  • the sequences falling within the scope of the present invention are not limited to the specific sequences described, but include human allelic and species variations thereof. Allelic variations can be routinely determined by comparison of one sequence with a sequence from another individual of the same species.
  • the invention includes sequences coding for the same amino acid sequences as do the specific sequences disclosed herein. In other words, in a coding region, substitution of one codon for another which encodes the same amino acid is expressly contemplated. (Coding regions can be determined through routine sequence analysis.)
  • In a cDNA library there are many species of mRNA represented. Each cDNA clone can be interesting in its own right, but must be isolated from the library before further experimentation can be completed. In order to sequence any specific cDNA, it must be removed and separated (i.e. isolated and purified) from all the other sequences. This can be accomplished by many techniques known to those of skill in the art. These procedures normally involve identification of a bacterial colony containing the cDNA of interest and further amplification of that bacterium. Once a cDNA is separated from the mixed clone library, it can be used as a template for further procedures such as nucleotide sequencing.
  • the present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above.
  • the constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation.
  • the construct further comprises regulatory sequences, including for example, a promoter, operably linked to the sequence.
  • Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia).
  • Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
  • the present invention is not restricted to such constructs or sequences alone but also includes expression vehicles, which may include plasmids, viruses, or any other expression vectors, including cells and liposomes, containing any of the nucleic acids, nucleotide sequences, DNAs, RNAs, or fragments thereof, as disclosed according to the present invention. Furthermore, this will be true regardless of whether such sequences are coding sequences or non-coding sequences and whether such coding sequences code for all or part of the expression products as disclosed herein, so long as such expression products, or fragments thereof, exhibit some utility in keeping with the invention disclosed herein.
  • the present invention includes an isolated DNA sequence, or nucleic acid, that expresses a human protein when in a suitable expression system, for example, a cell-free, or in vitro, expression system.
  • a suitable expression vehicle or vector
  • Such expression systems, especially where part of an expression vehicle, will commonly require some promoter region that may include a promoter different from that normally associated in vivo with the genes coding for the gene expression products and proteins disclosed according to the present invention.
  • Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
  • Two appropriate vectors are pKK232-8 and pCM7.
  • Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc.
  • Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
  • the present invention relates to host cells containing the above-described construct(s).
  • the host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell.
  • Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L., Dibner, M., Battey, I.,
  • the constructs in host cells can be used in a conventional manner to produce the gene product coded by the recombinant sequence.
  • the encoded polypeptide, once the sequence is known from the cDNAs or from isolation of the pure product, can be synthetically produced by conventional methods of peptide synthesis, either manual or automated.
  • the present invention includes all polypeptides coded for by any and each of the DNA or RNA sequences disclosed herein, including fragments of said polypeptides, as well as derivatives and functional analogs thereof.
  • amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. (Fragments are useful, for example, in generating antibodies against the native polypeptide.)
  • the DNA encoding the desired polypeptide can be inserted into a host organism and expressed.
  • the organism can be a bacterium, yeast, cell line, or multicellular plant or animal.
  • the literature is replete with examples of suitable host organisms and expression techniques.
  • polynucleotide DNA or mRNA
  • This methodology can be used to deliver the polypeptide to the animal, or to generate an immune response against a foreign polypeptide.
  • the coding sequence can be inserted into a vector, which is then used to transfect a cell.
  • the cell (which may or may not be part of a larger organism) then expresses the polypeptide.
  • the present invention further relates to polypeptides having an amino acid sequence selected from SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, as well as fragments, analogs and derivatives of such polypeptides.
  • fragment, when referring to the polypeptides disclosed herein, also means polypeptides that retain essentially the same biological function or activity as said polypeptides.
  • an analog includes a proprotein which can be activated by cleavage of the proprotein portion to produce an active mature polypeptide.
  • Such fragments, derivatives and analogs must have sufficient similarity to the polypeptides disclosed herein so that activity of the native polypeptide is retained.
  • polypeptides of the present invention may be recombinant polypeptides, natural polypeptides or synthetic polypeptides, preferably recombinant polypeptides.
  • Recombinant means that a protein is derived from recombinant (e.g., microbial or mammalian) expression systems.
  • Microbial refers to recombinant proteins made in bacterial or fungal (e.g., yeast) expression systems.
  • recombinant microbial defines a protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Protein expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; protein expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
  • the fragment, derivative or analog of a polypeptide of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29 may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code, or (ii) one in which one or more of the amino acid residues includes a substituent group, or (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol), or (iv) one in which the additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence or a sequence which is employed for purification of the mature polypeptide or a proprotein sequence.
  • Such fragments, derivatives and analogs are deemed to be within the abilities of those skilled in the art in view
  • polypeptides of the present invention are preferably provided in an isolated form, and preferably are purified to homogeneity. When applied to polypeptides, the term "isolated" has its already stated meaning.
  • polypeptides of the present invention include the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, in particular the mature polypeptide, as well as polypeptides which have at least 70% identity to these polypeptides, or which have at least 90% identity to these polypeptides, still more preferably at least 95% identity to these polypeptides, and also include portions of such polypeptides, with such portions generally containing at least 30 amino acids and more preferably at least 50 amino acids.
  • Fragments or portions of the polypeptides of the present invention may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides. Fragments or portions of the polynucleotides of the present invention may be used to synthesize full-length polynucleotides of the present invention.
  • the present invention also relates to vectors which include polynucleotides of the present invention, host cells which are genetically engineered with vectors of the invention and the production of polypeptides of the invention by recombinant techniques.
  • Host cells are genetically engineered (transduced or transformed or transfected) with the vectors of this invention which may be, for example, a cloning vector or an expression vector, either of which may be in the form of a plasmid, a viral particle, a phage, etc.
  • the engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention.
  • the culture conditions such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.
  • the polynucleotides of the present invention may be employed for producing polypeptides by recombinant techniques.
  • the polynucleotide may be included in any one of a variety of expression vectors for expressing a polypeptide.
  • Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies.
  • any other vector may be used as long as it is replicable and viable in the host.
  • an appropriate DNA sequence or segment may be inserted into the vector by a variety of procedures.
  • the DNA sequence is inserted into the appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art.
  • the DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (for example, a promoter sequence) to direct mRNA synthesis.
  • LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL promoter and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses.
  • the expression vector also contains a ribosome binding site for translation initiation and a transcription terminator.
  • the vector may also include appropriate sequences for amplifying expression.
  • the expression vectors preferably contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.
  • the vector containing the appropriate DNA sequence as hereinabove described, as well as an appropriate promoter or control sequence, may be employed to transform an appropriate host to permit the host to express the protein.
  • bacterial cells such as E. coli, Streptomyces, Salmonella typhimurium
  • fungal cells such as yeast
  • insect cells such as Drosophila S2 and Spodoptera Sf9
  • animal cells such as CHO, COS or Bowes melanoma
  • adenoviruses; plant cells, etc.
  • Recombinant expression vehicle or vector refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence.
  • the expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences.
  • Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell.
  • where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
  • Recombinant expression system means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally.
  • the cells can be prokaryotic or eukaryotic.
  • Recombinant expression systems as defined herein will express heterologous protein upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
  • Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention.
  • Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, (Cold Spring Harbor, N.Y., 1989), Wu et al, Methods in Gene Biotechnology (CRC Press, New York, NY, 1997), and Recombinant Gene Expression Protocols, in Methods in Molecular Biology, Vol. 62, (Tuan, ed., Humana Press, Totowa, NJ, 1997), the disclosures of which are hereby incorporated by reference.
  • Transcription of the DNA encoding the polypeptides according to the present invention by higher eukaryotes can be increased by insertion of an enhancer sequence into the vector.
  • enhancers have been known for some time and are usually cis-acting elements of DNA, usually anywhere from 10 to 300 bp, that act on a promoter to increase transcription. Common examples include the SV40 enhancer, the cytomegalovirus early promoter enhancer, the polyoma enhancer and the enhancers found in adenovirus.
  • recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence.
  • promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others.
  • the heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
  • the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
  • Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter.
  • the vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and to, if desirable, provide amplification within the host.
  • Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.
  • useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017).
  • Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
  • the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period.
  • Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
  • mammalian cell culture systems can also be employed to express recombinant protein.
  • mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
  • Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
  • DNA sequences derived from the SV40 viral genome may be used to provide the required nontranscribed genetic elements.
  • Recombinant protein produced in bacterial culture is conveniently isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.
  • Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
  • the protein, its fragments or other derivatives, or analogs thereof, or cells expressing them, can be used as an immunogen to produce antibodies thereto.
  • These antibodies can be, for example, polyclonal, monoclonal, chimeric, single chain, Fab fragments, or the product of an Fab expression library.
  • Various procedures known in the art may be used for the production of polyclonal antibodies.
  • Antibodies generated against the polypeptide corresponding to a sequence of the present invention can be obtained by direct injection of the polypeptide into an animal or by administering the polypeptide to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide. Moreover, a panel of such antibodies, specific to a large number of polypeptides, can be used to identify and differentiate such tissue.
  • any technique which provides antibodies produced by continuous cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
  • the antibodies can be used in methods relating to the localization and activity of the protein sequences of the invention, e.g., for imaging these proteins, measuring levels thereof in appropriate physiological samples and the like.
  • the proteins encoded by the nucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 are expressed in U2OS cells. This is achieved by selectively PCR amplifying the coding regions thereof (based on the available open reading frames) and then cloning the resulting amplicon into a suitable mammalian expression vector.
  • One such vector is pcDNA3.1 (sold by Invitrogen - #K4800-01).
  • the expression of the protein encoded by the described polynucleotide sequence is detected in either of two ways: by use of specific antibodies raised against peptides derived from the amino acid sequence or by use of antibodies against tags added during the cloning procedure.
  • Such tags include the V5 epitope or a poly-histidine sequence as contained in the pcDNA3.1 vector.
  • cells will normally be transfected with the expression construct and cultured for 1 to 5 days. Cells will then be lysed and their protein content analyzed by western blotting using the above antibodies as appropriate. Cells will also be analyzed for the subcellular localization of the protein encoded by the described polynucleotide sequence by transfecting cells in suitable chambers, culturing them for 1 to 5 days and fixing them in situ. Such cells will then be analyzed for the presence and localization of the encoded protein by staining cells with the above-referenced antibodies.
  • cells will be transfected with an expression system in which the protein encoded by the described polynucleotide sequence is fused to a directly detectable tag such as green fluorescent protein (GFP).
  • the expression and localization of the protein encoded by the described polynucleotide sequence is then detected by analyzing that of GFP.
  • each such polypeptide is listed in the table below along with its calculated molecular weight (Daltons) and its expected isoelectric point (pI).
  • polypeptides of SEQ ID NOS: 8 and 20 corresponded only to partial sequences and thus no values could be calculated and such sequences are not in the table.
  • All of the polynucleotides from which these polypeptide sequences are derived are cDNAs isolated during a differential screen of osteogenic mesenchymal stem cells (MSCs) cultured for 4 days in the presence of osteogenic supplements.
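
The list above refers to a calculated molecular weight (Daltons) and an expected isoelectric point (pI) for each polypeptide. The source does not say how these values were computed, so the following is only a minimal sketch of one common approach: sum average residue masses plus one water for the molecular weight, and find by bisection the pH at which the net charge is zero. The pKa values and the example peptide are illustrative assumptions, not values taken from this document, and published pKa sets differ.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Assumed, illustrative pKa values; other published sets give slightly different pI.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def molecular_weight(seq):
    """Average molecular weight of a linear peptide in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch relation."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # free carboxyl terminus
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MKRDEYCH"  # hypothetical peptide, not from the sequence listing
    print(round(molecular_weight(peptide), 2), round(isoelectric_point(peptide), 2))
```

A partial sequence, such as those noted for SEQ ID NOS: 8 and 20, would give values for the fragment only, which is consistent with the text leaving them out of the table.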

Abstract

Human mesenchymal stem cell (hMSC) cDNAs and putative polypeptides derived from open reading frames contained therein are disclosed. Also disclosed are methods for utilizing the polynucleotides and polypeptides, including use as reagents for chromosomal mapping and identification, for DNA fingerprinting, for investigating the possible role played by genetic mutations in the disease process, and for the generation of polyclonal and/or monoclonal antibodies specific for said polypeptides.

Description

HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS
This application claims the priority of U.S. Provisional Applications 60/148,800, filed 13 August 1999, and 60/127,418, filed 1 April 1999, the disclosures of which are hereby incorporated by reference in their entirety.
BACKGROUND OF THE INVENTION
This invention relates to newly identified polynucleotide sequences corresponding to transcription products of human genes, and to complete gene sequences associated therewith and to gene expression products thereof and to uses for the foregoing.
Osteoblasts, key cells in bone formation, or osteogenesis, are formed from mesenchymal stem cells. Such mesenchymal stem cells (or MSCs) of numerous mammalian species can be induced to differentiate into connective tissue cell lineages by varying the in vitro culture conditions. Osteogenesis, the differentiation into bone cells, has been reported as a means to generate replacement bone from cultured and implanted MSCs (Bruder et al, Growth Kinetics, Self-Renewal, and the Osteogenic Potential of Purified Human Mesenchymal Stem Cells During Extensive Subcultivation and Following Cryopreservation, J. Cell Biochem., 64(2):278-294 (Feb. 1997); Jaiswal et al., Osteogenic Differentiation of Purified, Culture-Expanded Human Mesenchymal Stem Cells In Vitro, J. Cell Biochem., 64(2):295-312 (Feb. 1997); Kadiyala et al., Culture Expanded Canine Mesenchymal Stem Cells Possess Osteochondrogenic Potential In Vivo and In Vitro, Cell Transplant, 6(2):125-134 (Mar-Apr 1997)).
The process by which MSCs undergo osteogenic differentiation in culture is marked by the development of an osteoblastic morphology, the deposition of a hydroxyapatite mineralized extracellular matrix characteristic of osteoblasts and the presence of terminally differentiated osteocytes, as well as the expression of alkaline phosphatase (Jaiswal et al., Osteogenic Differentiation of Purified, Culture-Expanded Human Mesenchymal Stem Cells In Vitro, J. Cell Biochem., 64(2):295-312 (Feb. 1997)). Mechanisms underlying the osteogenic differentiation of human MSCs (hereafter, hMSCs) are poorly understood. Identification of proteins produced during this process would greatly facilitate the discovery and development of small molecules that target the osteoblast and its bone forming potential. Identification of these factors would be accelerated by the availability of relevant cDNA libraries constructed from hMSCs during various stages of their differentiation.
Identification and sequencing of human genes is a major goal of modern molecular biology. For example, by identifying genes and determining their sequences, scientists have been able to make large quantities of valuable human "gene products." These include human insulin, interferon, Factor VIII, tumor necrosis factor, human growth hormone, tissue plasminogen activator, and numerous other compounds. Additionally, knowledge of gene sequences can provide the key to treatment or cure of genetic diseases (such as muscular dystrophy and cystic fibrosis).
BRIEF SUMMARY OF THE INVENTION
In accordance with the present invention, mesenchymal stem cells (MSCs) have been isolated from humans and culture expanded, and new cDNA libraries have been constructed from messenger ribonucleic acids (hereafter, mRNAs) isolated from these hMSCs.
It is an object of the present invention to obtain cDNA libraries from purified and cultured MSCs and to use these isolated nucleic acids, isolated sequences, and fragments thereof, in the determination and preparation of the expression products of these nucleic acids and sequences, including fragments thereof.
It is a further object of the present invention to use the cDNAs so produced, and fragments thereof, as well as their expression products, as chromosomal markers for determining the location of genes within the genome, and alleles thereof, expressed during the development of differentiated mesenchymal cells.
It is yet another object of the present invention to provide DNA sequences for use in human "fingerprinting" whereby different individuals can be distinguished based on the sequences of the genes identified as wholly, or partly, identical to those disclosed herein.
It is still another object of the present invention to provide polynucleotide sequences corresponding to the genes coding for polypeptides as disclosed herein whereby such sequences can be compared with those found in similar chromosomal locations in animals, especially mammals, and most especially humans, where such animal is afflicted with a disease affecting bone growth, or such other disease, or diseases, as may be affected by such genes, and thus detecting the presence of mutations in said genes leading to such diseases.
It is a still further object of the present invention to provide genetically engineered cells, and vectors, containing one or more copies of the nucleic acids, or DNAs, or genes, or nucleotide sequences according to the present invention, capable of expressing said peptides, or polypeptides, or proteins for rapid cloning of genes according to the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the consensus sequence (SEQ ID NO: 27) for the novel DNA sequence of the invention as determined from different cDNA clones of said sequence, the latter being about 2.5 kb in length.
Figure 2 is a deduced amino acid sequence for the protein expressed from the sequence of Figure 1, residues 125 through 1717, and corresponding to SEQ ID NO: 29. The amino acids set off between asterisks constitute a bipartite nuclear localization signal. The isoelectric point and molecular weight were also calculated for the putative protein.
Figure 3 shows the results of a dot blot assay for the presence of the novel DNA sequence in a variety of human tissues. For this assay, a prefabricated dot blot from Clontech (#7770-1) was hybridized using a probe generated from the 2.5 kb cDNA of Figure 1 and treated according to the manufacturer's instructions. Signals due to bound probe were analyzed using a Storm 860 phosphorimager and ImageQuant software.
Figure 4 is a bar graph showing the distribution of the sequence of Figure 1 in a variety of human tissues based on relative mRNA abundance. The highest signal strength was in cells of adult heart and the lowest was in fetal thymus. The bar graphs were generated using data from the dot blots of Figure 3, which were imported into an Excel spreadsheet. The data were then analyzed as arbitrary signal strength per tissue after subtracting background (due to non-specific hybridization). The order of the tissues in the bar graph reflects signal strength (and therefore differs from that on the dot blot of Figure 3). Figure 4(b) is a continuation of Figure 4(a).
DETAILED DESCRIPTION OF THE INVENTION
One aspect of the present invention is directed to nucleic acids and isolated DNA sequences and molecules, and fragments thereof (and corresponding isolated RNA sequences, and fragments thereof), including sequences complementary to the foregoing, showing sequence similarity to, or capable of hybridizing to, the DNA sequences identified in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, and 27 or 28. The present invention is also directed to fragments or portions of such sequences which contain at least 15 bases, preferably at least 30 bases, more preferably at least 50 bases and most preferably at least 80 bases, and to those sequences which are at least 60%, preferably at least 80%, and most preferably at least 95%, especially 98%, identical thereto, and to DNA (or RNA) sequences encoding the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, including fragments and portions thereof and, when derived from natural sources, includes alleles thereof. In accordance with the present invention, the term "percent identity" or "percent identical," when referring to a sequence, means that a sequence is compared to a claimed or described sequence after alignment of the sequence to be compared (the "Compared Sequence") with the described or claimed sequence (the "Reference Sequence"). The Percent Identity is then determined according to the following formula:
Percent Identity = 100 [1 - (C/R)]
wherein C is the number of differences between the Reference Sequence and the Compared Sequence over the length of alignment between the Reference Sequence and the Compared Sequence wherein (i) each base or amino acid in the Reference Sequence that does not have a corresponding aligned base or amino acid in the Compared Sequence and (ii) each gap in the Reference Sequence and (iii) each aligned base or amino acid in the Reference Sequence that is different from an aligned base or amino acid in the Compared Sequence, constitutes a difference; and R is the number of bases or amino acids in the Reference Sequence over the length of the alignment with the Compared Sequence with any gap created in the Reference Sequence also being counted as a base or amino acid.
If an alignment exists between the Compared Sequence and the Reference Sequence in which the percent identity as calculated above is about equal to or greater than a specified minimum Percent Identity then the Compared Sequence has the specified minimum percent identity to the Reference Sequence even though alignments may exist in which the hereinabove calculated Percent Identity is less than the specified Percent Identity.
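As an illustrative sketch only, and not part of the original disclosure, the Percent Identity formula can be applied to a pair of already-aligned sequences as shown below. The function name `percent_identity`, the use of "-" as the gap character, and the example sequences are assumptions made for illustration; how the alignment itself is produced is left open.

```python
def percent_identity(reference_aligned: str, compared_aligned: str, gap: str = "-") -> float:
    """Percent Identity = 100 * [1 - (C / R)] for two pre-aligned sequences."""
    if len(reference_aligned) != len(compared_aligned):
        raise ValueError("aligned sequences must have equal length")
    if not reference_aligned:
        raise ValueError("alignment is empty")

    differences = 0  # C in the formula
    for ref, cmp_ in zip(reference_aligned.upper(), compared_aligned.upper()):
        if ref == gap and cmp_ == gap:
            raise ValueError("a column cannot be a gap in both sequences")
        # (i) Reference residue with no aligned partner (gap in Compared),
        # (ii) gap in the Reference, (iii) mismatched aligned residues.
        if ref == gap or cmp_ == gap or ref != cmp_:
            differences += 1

    # R: Reference residues over the alignment plus gaps created in the
    # Reference, which equals the number of alignment columns.
    reference_length = len(reference_aligned)
    return 100.0 * (1.0 - differences / reference_length)


# Hypothetical alignment: one mismatch and one gap over ten columns.
print(percent_identity("ACGTACGTAC", "ACGTTCGT-C"))
```

Because a Compared Sequence qualifies if any one alignment reaches the specified minimum, a caller would evaluate candidate alignments and keep the highest value returned.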
Yet another aspect of the present invention is directed to an isolated DNA (or RNA) sequence or molecule comprising at least the coding region of a human gene (or a DNA sequence encoding the same polypeptide as such coding region), in particular an expressed human gene, which human gene comprises a DNA sequence homologous with, or contributing to, the sequence depicted in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27 or 28, or one at least 60%, preferably at least 80%, and most preferably at least 95%, especially 98%, identical thereto, including 100% identity, as well as fragments or portions of the coding region which encode a polypeptide having a similar function to the polypeptide encoded by said coding region. Thus, the isolated DNA (or RNA) sequence may include only the coding region of the expressed gene (or fragment or portion thereof as hereinabove indicated) or may further include all or a portion of the non-coding DNA (or RNA) of the expressed human gene.
In general, sequences homologous with and contributing to the sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 27 or 28 (or one at least 60%, preferably at least 80%, and most preferably at least 95% identical or homologous thereto) are from the coding region of a human gene.
The present invention also relates to vectors or plasmids which include such DNA (or RNA) sequences, as well as the use of the DNA (or RNA) sequences.
The sequences depicted in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 28 are hybridizable with actual DNA and RNA sequences as derived from different human tissues. These sequences represent cDNA clones.
The sequence depicted in Figure 1 (SEQ ID NO: 27) is hybridizable with actual DNA and RNA sequences as derived from different human tissues. A number of cDNA clones have been generated. The nucleotide sequence of Figure 1 (SEQ ID NO: 27) itself showed a nuclear location in the various tissues studied. The distribution of this sequence in various human tissues is shown in Figures 3 and 4. Some of these clones had an additional 3'-untranslated region, the presence of which is generally related to the extent to which the mRNA species remain in the cell before being turned over. See Kingman, Genetic Engineering, Blackwell, 1988, at page 313. The 3'-untranslated region may also regulate the frequency at which the mRNA is translated and thus constitute a mechanism by which the expression of the protein can be regulated (Gray, N.K. & Wickens, M., Control of Translation Initiation in Animals, Ann. Rev. Cell Dev. Biol., 14:399-458 (1998)).
The polynucleotides of the present invention may be in the form of RNA or in the form of DNA, which DNA includes cDNA, genomic DNA, and synthetic DNA. The DNA may be double-stranded or single-stranded, and if single-stranded may be the coding strand or non-coding (anti-sense) strand. The coding sequence which encodes the mature polypeptide may be identical to the coding sequences present as open reading frames (ORFs) of the polynucleotide sequences disclosed herein or may be a different coding sequence, which coding sequence, as a result of the redundancy or degeneracy of the genetic code, encodes the same mature polypeptide as the polynucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25 and 28.
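The degeneracy point can be illustrated with a short sketch, offered here as an assumption-laden example rather than as part of the disclosure. It builds the standard genetic code table and shows two different coding sequences that translate to the same peptide. The two sequences and the helper name `translate` are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding-strand DNA sequence, one triplet at a time."""
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Two hypothetical coding sequences that differ at several codons.
seq_a = "ATGGCTCTGTCTAAA"
seq_b = "ATGGCCTTAAGCAAG"

print(translate(seq_a), translate(seq_b), seq_a == seq_b)
```

Both sequences yield the same peptide although their nucleotide sequences differ, which is the sense in which a "different coding sequence" can encode the same mature polypeptide.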
The polynucleotides that code for the polypeptides disclosed herein as putative proteins SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29 may include, but are not limited to: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence such as a leader or secretory sequence, a proprotein sequence and a membrane anchor; the coding sequence for the mature polypeptide (and optionally additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature polypeptide.
The polynucleotide which codes for the polypeptide of Figure 2 (SEQ ID NO:29) may include, but is not limited to: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence such as a leader or secretory sequence, a proprotein sequence and a membrane anchor; the coding sequence for the mature polypeptide (and optionally additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature polypeptide.
The term "polynucleotide" as used for the present invention encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequences.
The present invention further relates to variants of the hereinabove described polynucleotides which encode fragments, analogs and derivatives of the polypeptides having the amino acid sequences of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29. Variants of the polynucleotide may be naturally occurring allelic variants of the polynucleotides or a non-naturally occurring variant of the polynucleotides.
Thus, the nucleic acids, or polynucleotides, according to the present invention may have coding sequences which are naturally occurring allelic variants of the coding sequence shown in SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28. As known in the art, an allelic variant is an alternate form of a polynucleotide sequence which may have a substitution, deletion or addition of one or more nucleotides, which does not substantially alter the function of the encoded polypeptide. The present invention also includes polynucleotides, wherein the coding sequence for the mature polypeptide may be fused in the same reading frame to a polynucleotide sequence which aids in expression and secretion of a polypeptide from a host cell, for example, a leader sequence which functions as a secretory sequence for controlling transport of a polypeptide from the cell and a transmembrane anchor which facilitates attachment of the polypeptide to a cellular membrane. The polypeptide having a leader sequence is a preprotein and may have the leader sequence cleaved by the host cell to form the mature polypeptide. The polynucleotides may also encode for a proprotein which is the mature protein plus additional 5' amino acid residues. A mature protein having a prosequence is a proprotein and is often an inactive form of the protein. Once the prosequence is cleaved an active mature protein remains.
Thus, for example, the polynucleotide of the present invention may encode for a mature protein, for a protein having a prosequence, for a protein having a transmembrane anchor or for a polypeptide having a prosequence, a presequence (leader sequence) and a transmembrane anchor.
The polynucleotides of the present invention may also have the coding sequence fused in frame to a marker sequence which allows for purification of the polypeptide of the present invention. The marker sequence may be a hexa-histidine tag supplied by a pQE-9 vector to provide for purification of the mature polypeptide fused to the marker in the case of a bacterial host, or, for example, the marker sequence may be a hemagglutinin (HA) tag when a mammalian host, e.g. COS-7 cells, is used. The HA tag corresponds to an epitope derived from the influenza hemagglutinin protein (Wilson, I., et al., Cell, 37:767 (1984)).
Fragments of the full length polynucleotide of the present invention may be used as hybridization probes for a cDNA library to isolate the full length cDNA and to isolate other cDNAs which have a high sequence similarity to the gene or similar biological activity. Probes of this type preferably have at least 15 bases, may have at least 30 bases and even 50 or more bases. The probe may also be used to identify a cDNA clone corresponding to a full length transcript and a genomic clone or clones that contain the complete gene including regulatory and promoter regions, exons, and introns. An example of a screen comprises isolating the coding region of the gene by using the known DNA sequence to synthesize an oligonucleotide probe. Labeled oligonucleotides having a sequence complementary to that of the gene of the present invention are used to screen a library of human cDNA, genomic DNA or mRNA to determine which members of the library the probe hybridizes to.
A polynucleotide according to the present invention may have at least 15 bases, preferably at least 30 bases, and more preferably at least 50 bases which hybridize to a polynucleotide of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 and which has an identity thereto, as hereinabove described, and which may or may not retain activity. Such polynucleotides may be employed as probes for the polynucleotides or genes coding for the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, for example, for recovery of the polynucleotide or as a diagnostic probe or as a PCR primer.
The polynucleotides according to the present invention may also occur in the form of mixtures of polynucleotides hybridizable to some extent with the gene sequences containing any of the nucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 including any and all fragments thereof, and which polynucleotide mixtures may be composed of any number of such polynucleotides, or fragments thereof, including mixtures having at least 10, perhaps at least 30 such sequences, or fragments thereof. Because coding regions comprise only a small portion of the human genome, identification and mapping of transcribed regions and coding regions of chromosomes is of significant interest. There is a corresponding need for reagents for identifying and marking coding regions and transcribed regions of chromosomes. Furthermore, such human sequences are valuable for chromosome mapping, human identification, identification of tissue type and origin, forensic identification, and locating disease-associated genes (i.e., genes that are associated with an inherited human disease, whether through mutation, deletion, or faulty gene expression) on the chromosome.
Various aspects of the present invention include each of the individual sequences, corresponding partial and complete cDNAs, genomic DNA, mRNA, antisense strands, PCR primers, coding regions, and constructs. Expression vectors and polypeptide expression products are also within the scope of the present invention, along with antibodies, especially monoclonal antibodies, to such expression products.
As used herein and except as noted otherwise, all terms are defined as given below.
In accordance with the present invention, the term "gene" or "cistron" means the segment of DNA (or DNA segment) involved in producing a polypeptide chain; it includes regions preceding and following the coding region (5'- and 3'-untranslated regions, or UTRs, also called leader and trailer sequences, regions, or segments) as well as intervening sequences (introns) between individual coding segments (exons), which intronic regions are typically removed during post-transcriptional processing of RNA to form the final translatable mRNA product. Of course, by their nature, cDNAs contain no intronic sequences.
In accordance with the present invention, the term "DNA segment" refers to a DNA polymer, in the form of a separate fragment or as a component of a larger DNA construct, which has been derived from DNA isolated at least once in substantially pure form, i.e., free of contaminating endogenous materials and in a quantity or concentration enabling identification, manipulation, and recovery of the segment and its component nucleotide sequences by standard biochemical methods, for example, using a cloning vector. Such segments are provided in the form of an open reading frame uninterrupted by internal nontranslated sequences (introns), which are typically present in eukaryotic genes. Sequences of non-translated DNA may be present downstream from the open reading frame, where the same do not interfere with manipulation or expression of the coding regions.
The nucleic acids and polypeptide expression products disclosed according to the present invention, as well as expression vectors containing such nucleic acids, may be in "enriched form." As used herein, the term "enriched" means that the concentration of the material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously 0.01% by weight, preferably at least about 0.1% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated. The sequences, constructs, vectors, clones, and other materials comprising the present invention can advantageously be in enriched or isolated form. For example, removal, via the differential display techniques described herein, of clones corresponding to ribosomal RNA and "housekeeping" genes and clones without human cDNA inserts results in a library that is "enriched" in the desired clones.
The DNA and RNA sequences, and polypeptides, disclosed in accordance with the present invention will commonly be in isolated form. The term "isolated" means that the material is removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide, or DNA, present in a living animal is not isolated, but the same polynucleotide or DNA, separated from some or all of the coexisting materials in the natural system, is isolated. Such DNA could be part of a vector and/or such polynucleotide could be part of a composition, and still be isolated in that such vector or polynucleotide is not part of its natural environment.
The DNA and RNA sequences, and polypeptides, disclosed in accordance with the present invention may also be in "purified" form. The term "purified" does not require absolute purity; rather, it is intended as a relative definition, and can include preparations that are highly purified or preparations that are only partially purified, as those terms are understood by those of skill in the relevant art. Individual clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The cDNA clones are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). By conversion of mRNA into a cDNA library, pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from RNA and subsequently isolating individual clones from that library results in an approximately 10^6-fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated. Furthermore, a claimed polynucleotide which has a purity of preferably 0.001%, or at least 0.01% or 0.1%, and even desirably 1% by weight or greater is expressly contemplated.
The term "coding region" refers to that portion of a human gene which either naturally or normally codes for the expression product of that gene in its natural genomic environment, i.e., the region coding in vivo for the native expression product of the gene. The coding region can be from a normal, mutated or altered gene, or can even be from a DNA sequence, or gene, wholly synthesized in the laboratory using methods well known to those of skill in the art of DNA synthesis.
In accordance with the present invention, the term "nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the proteins provided by this invention are assembled from cDNA fragments and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
The term "expression product" means that polypeptide or protein that is the natural translation product of the gene and any nucleic acid sequence coding equivalents resulting from genetic code degeneracy and thus coding for the same amino acid(s).
The term "fragment" when referring to a coding sequence means a portion of DNA comprising less than the complete human coding region whose expression product retains essentially the same biological function or activity as the expression product of the complete coding region.
When referring to a portion of a polypeptide, as used herein, the terms "portion," "segment," and "fragment," refer to a continuous sequence of residues, such as amino acid residues, which sequence forms a subset of a larger sequence. For example, if a polypeptide were subjected to treatment with any of the common endopeptidases, such as trypsin or chymotrypsin, the oligopeptides resulting from such treatment would represent portions, segments or fragments of the starting polypeptide.
Similarly, portions, segments or fragments of polynucleotides would include those products resulting from the treatment of such polynucleotides with endonucleases.
The term "primer" means a short nucleic acid sequence that is paired with one strand of DNA and provides a free 3'OH end at which a DNA polymerase starts synthesis of a deoxyribonucleotide chain.
The term "promoter" means a region of DNA involved in binding of RNA polymerase to initiate transcription.
The term "open reading frame (ORF)" means a series of triplets coding for amino acids without any termination codons and is a sequence (potentially) translatable into protein.
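A minimal sketch of this definition, given for illustration only: the code below scans each of the three forward reading frames and reports stretches of triplets that contain no termination codon. It follows the definition as worded here, so it does not require an initiating ATG and does not examine the complementary strand. The function names and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_runs(dna: str, frame: int = 0) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of consecutive triplets with no stop codon."""
    seq = dna.upper()
    runs = []
    run_start = None
    last_end = frame
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            # A termination codon closes the current run, if one is open.
            if run_start is not None:
                runs.append((run_start, i))
                run_start = None
        elif run_start is None:
            run_start = i
        last_end = i + 3
    if run_start is not None:
        runs.append((run_start, last_end))
    return runs


def longest_open_reading_frame(dna: str) -> str:
    """Longest stop-free run of triplets over the three forward frames."""
    seq = dna.upper()
    best = ""
    for frame in range(3):
        for start, end in stop_free_runs(seq, frame):
            if end - start > len(best):
                best = seq[start:end]
    return best


example = "ATGAAATAGCCC"  # hypothetical sequence
print(stop_free_runs(example, 0))
print(longest_open_reading_frame(example))
```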
The term "exon" means any segment of an interrupted gene that is represented in the mature RNA product.
As used herein, reference to a DNA sequence includes both single stranded and double stranded DNA. Thus, the specific sequence, unless the context indicates otherwise, refers to the single strand DNA of such sequence, the duplex of such sequence with its complement (double stranded DNA) and the complement of such sequence.
In accordance with the present invention, the overall approach to identification of cDNAs from hMSCs involved measurement of gene expression during growth of human mesenchymal stem cells in culture. Cells were harvested and the total RNA content thereof was recovered. Next, using various primer combinations, reverse transcriptase and polymerase chain reaction procedures (RT-PCR) were used to produce and amplify the corresponding cDNAs, which were then screened to find regulated DNA sequences that were subsequently purified and cloned. These clones were then sequenced and used to determine a consensus sequence (one based upon the most commonly occurring bases at each nucleotide position in a sequence after the contributing sequences are aligned by residue position). The resulting sequences were then subjected to computer database searches for novelty, and any homology with known sequences, using, for example, the BLAST program and the GenBank database.
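The consensus rule described above (the most commonly occurring base at each aligned position) can be sketched as follows. This is an illustration under stated assumptions: the clone reads are taken to be already aligned and of equal length, ties are resolved in favor of the base seen first in the column, and the reads shown are invented for the example rather than taken from the clones of the invention.

```python
from collections import Counter


def consensus_sequence(aligned_reads: list[str]) -> str:
    """Majority-rule consensus over equal-length, pre-aligned sequences."""
    if not aligned_reads:
        raise ValueError("at least one sequence is required")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(base.upper() for base in column)
        # most_common(1) returns the base with the highest count; on a tie
        # it returns the base encountered first in this column.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Three hypothetical aligned clone reads.
reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(consensus_sequence(reads))
```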
Using the RT-PCR methodology, the mRNA from the cells of interest (such as the hMSCs used in accordance with the present invention) is used to prepare a set or family of cDNAs corresponding to the expressed genes of the cell. This cDNA preparation is then exhaustively hybridized with mRNA of cells not expressing the gene, resulting in removal of all sequences from the cDNA preparation that are common to the two cell samples. All of the cDNA sequences that hybridize with the other mRNA are discarded, and those that remain are then hybridized with mRNA from the cells expressing the gene (for example, cells from a healthy person or cells from tissues known to express the gene) to confirm that they are in fact the desired coding sequences. Because these latter clones contain sequences specific to the mRNA population of the cells of interest, they can subsequently be amplified and characterized using further rounds of PCR and the general techniques of molecular biology.
In accordance with the foregoing, a cDNA library was generated and corresponds to the sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28. Probes based on these cDNAs can be used to identify the relevant transcripts, using Northern blotting analysis methods well known in the art to localize these sequences within cells of various tissues. For example, the heaviest distribution of the gene coding for the polypeptide of Figure 2 (SEQ ID NO: 29) was in heart tissue, as shown in Figures 3 and 4.
In accordance with the present invention, cDNA was quantified by spotting 0.5 μl aliquots of standards and samples on ethidium agarose plates prepared as suggested in the instructions from the manufacturer (Stratagene, La Jolla, CA). Plates were incubated at room temperature for 15 minutes and DNA was visualized by UV transillumination. The respective cDNAs were then quantified by comparing spot intensities of the samples with those of the standards (the latter consisting of appropriate dilutions of 1 kb ladders from Life Technology).
Aliquots of each amplified library were excised and plasmids from randomly chosen colonies were analyzed by restriction nuclease analysis. In accordance with the present invention, plasmid DNA was digested with both EcoRI and XhoI nucleases (New England Biolabs) and the resulting restriction fragments were separated by 1.5% agarose gel electrophoresis. The cDNA inserts ranged in size from less than 1 kbp to larger than 4 kbp (where 1 kbp = 1,000 nucleotide base pairs of duplex DNA).
Each of the DNA sequences identified herein (and the corresponding complete gene sequences) can be used in numerous ways as polynucleotide reagents. The sequences can be used as diagnostic probes for the presence of a specific mRNA in a particular cell type as well as in genetic linkage analysis (polymorphisms). Further, the sequences can be used as probes for locating gene regions associated with genetic disease.
The nucleotide and gene sequences of the present invention are also valuable for chromosome identification. Each sequence is specifically targeted to and can hybridize with a particular location on an individual human chromosome. Moreover, there is a current need for identifying particular sites on the chromosome. The mapping of the polynucleotides to specific chromosomes according to the present invention is an important first step in correlating those sequences with genes associated with disease.
Briefly, sequences can be mapped to chromosomes by preparing PCR primers (preferably 15-30 bp) from the sequences disclosed herein. Computer analysis of these sequences is used to rapidly select primers that do not span more than one exon in the corresponding genomic DNA, which would otherwise complicate the amplification process. These primers are then used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the sequences or subsequences disclosed herein will yield an amplified fragment.
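The primer-selection step can be pictured with the sketch below, which is an assumption-based illustration and not the computer analysis actually used. It assumes the exon boundaries of the cDNA are already known as half-open coordinate pairs (the disclosure does not say how they are obtained) and lists every primer site of a chosen length in the preferred 15-30 bp range that lies wholly within one exon. The function name and the coordinates are hypothetical.

```python
def single_exon_primer_sites(
    exon_bounds: list[tuple[int, int]], primer_length: int = 20
) -> list[tuple[int, int]]:
    """List (start, end) primer sites that lie entirely inside a single exon.

    exon_bounds holds half-open (start, end) cDNA coordinates for each exon;
    these are assumed inputs for this illustration.
    """
    if not 15 <= primer_length <= 30:
        raise ValueError("primer length outside the preferred 15-30 bp range")

    sites = []
    for exon_start, exon_end in exon_bounds:
        # Slide a window through the exon; windows that would cross the
        # exon boundary are never generated.
        for start in range(exon_start, exon_end - primer_length + 1):
            sites.append((start, start + primer_length))
    return sites


# Hypothetical cDNA with two exons joined at position 18.
sites = single_exon_primer_sites([(0, 18), (18, 40)], primer_length=15)
print(len(sites), sites[:3])
```

A practical selection would apply further criteria to these candidate sites; none are specified here, so none are modeled.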
PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular sequence to a particular chromosome. Three or more clones can be assigned per day using a single thermal cycler, as is well known in the art. Using the present invention with the same oligonucleotide primers, sublocalization can be achieved with panels of fragments from specific chromosomes or pools of large genomic clones in an analogous manner. Other mapping strategies that can similarly be used to map a sequence, or part of a sequence, to its chromosome include in situ hybridization, prescreening with labeled flow-sorted chromosomes and preselection by hybridization to construct chromosome-specific cDNA libraries.
Fluorescence in situ hybridization (FISH) of a cDNA clone to a metaphase chromosomal spread can be used to provide a precise chromosomal location in one step. This technique can be used with cDNA as short as 500 or 600 bases; however, clones larger than 2,000 bp have a higher likelihood of binding to a unique chromosomal location with sufficient signal intensity for simple detection. FISH requires use of the clone from which the sequence was derived, and the longer the better. For example, 2,000 bp is good, 4,000 is better, but more than 4,000 is probably not necessary to get good results a reasonable percentage of the time. For a review of this technique, see Verma et al., Human Chromosomes: a Manual of Basic Techniques. Pergamon Press, New York (1988).
Reagents for chromosome mapping can be used individually (to mark a single chromosome or a single site on that chromosome) or as panels of reagents (for marking multiple sites and/or multiple chromosomes). Reagents corresponding to noncoding regions of the genes actually are preferred for mapping purposes. Coding sequences are more likely to be conserved within gene families, thus increasing the chance of cross hybridizations during chromosomal mapping.
Once a sequence has been mapped to a precise chromosomal location, the physical position of the sequence on the chromosome can be correlated with genetic map data. (Such data are found, for example, in V. McKusick, Mendelian Inheritance in Man (available on line through Johns Hopkins University Welch Medical Library).) The relationship between genes and diseases that have been mapped to the same chromosomal region is then identified through linkage analysis (coinheritance of physically close genes).
Next, it is necessary to determine if there are differences in the cDNA or genomic sequence between affected and unaffected individuals. If a mutation is observed in some or all of the affected individuals but not in any normal individuals, then the mutation is likely to be the causative agent of the disease.
With current resolution of physical mapping and genetic mapping techniques, a cDNA precisely localized to a chromosomal region associated with the disease could be one of between 50 and 500 potential causative genes. (This assumes 1 megabase mapping resolution and one gene per 20 kb.)
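The arithmetic behind this estimate can be checked with a short sketch. The stated assumptions (1 megabase resolution, one gene per 20 kb) give the lower figure of 50. The 10 megabase value used below to reach 500 is an inference made for this illustration and is not stated in the text. The function name is hypothetical.

```python
def candidate_gene_count(mapping_resolution_bp: int, bp_per_gene: int = 20_000) -> int:
    """Number of genes expected within a mapped region of the given size."""
    return mapping_resolution_bp // bp_per_gene


# Stated assumption: 1 megabase resolution, one gene per 20 kb.
print(candidate_gene_count(1_000_000))

# Inferred for illustration only: a coarser 10 megabase resolution.
print(candidate_gene_count(10_000_000))
```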
Comparison of affected and unaffected individuals generally involves first looking for structural alterations in the chromosomes, such as deletions or translocations that are visible from chromosome spreads or detectable using PCR based on that cDNA sequence. Ultimately, complete sequencing of genes from several individuals is required to confirm the presence of a mutation and to distinguish mutations from polymorphisms.
In addition to the foregoing, the sequences of the invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on binding of a polynucleotide sequence to DNA or RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription (triple helix - see Lee et al., Nucl. Acids Res., 6:3073 (1979); Cooney et al., Science, 241:456 (1988); and Dervan et al., Science, 251:1360 (1991)) or to the mRNA itself (antisense - Okano, J. Neurochem., 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention is necessary for the design of an antisense or triple helix oligonucleotide. Antisense RNA or oligonucleotide hybridization may also lead to RNase H activation and hence destruction of the molecules involved in the hybrid. The present invention is also a useful tool in gene therapy, which requires isolation of the disease-associated gene in question as a prerequisite to the insertion of a normal gene into an organism to correct a genetic defect. The high specificity of the cDNA probes according to this invention holds promise of targeting such gene locations in a highly accurate manner.
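As a minimal sketch of the antisense design step described above, the following derives a DNA oligonucleotide complementary to a chosen window of an mRNA. The function name, the example mRNA fragment, and the choice of window are hypothetical and introduced only for illustration; the 20 to 40 base length range is taken from the text. Practical design would also weigh target accessibility and specificity, which this sketch does not model.

```python
# Map each mRNA base to the DNA base that pairs with it.
RNA_TO_ANTISENSE_DNA = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligonucleotide (5'->3') complementary to mrna[start:start+length]."""
    if not 20 <= length <= 40:
        raise ValueError("the text suggests oligonucleotides of 20 to 40 bases")
    target = mrna.upper()[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the mRNA")
    # Complement each base, then reverse so the result reads 5'->3'.
    return target.translate(RNA_TO_ANTISENSE_DNA)[::-1]


# Hypothetical mRNA fragment, invented for illustration only.
mrna = "AUGGCUAGCUUGACCGAUCGUACGGAUCCAUGA"
print(antisense_oligo(mrna, start=0, length=20))
```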
The sequences of the present invention, as broadly defined, and including subsequences and fragments thereof, are also useful for identification of individuals from minute biological samples. The United States military, for example, is considering the use of restriction fragment length polymorphism (RFLP) for identification of its personnel. In this technique, an individual's genomic DNA is digested with one or more restriction enzymes, and probed on a Southern blot to yield unique bands for identifying personnel. This method does not suffer from the current limitations of "Dog Tags" which can be lost, switched, or stolen, making positive identification difficult. The sequences of the present invention are useful as additional DNA markers for RFLP.
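The principle behind RFLP can be illustrated with a small in silico digest: a single-base difference that destroys a restriction site changes the set of fragment lengths. This is a sketch under stated assumptions. The two alleles are invented, the DNA is treated as linear, and the Southern blot probing step is not modeled. The defaults use the EcoRI recognition site GAATTC, cut after the first G.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting linear `dna` at every occurrence of `site`.

    The defaults model EcoRI, which recognizes GAATTC and cuts after the first G.
    """
    dna = dna.upper()
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = dna.find(site, pos + 1)
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles, invented for illustration. allele_b carries a
# single-base change (GAATTC -> GAGTTC) that destroys the second site.
allele_a = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GAATTC" + "ATGC" * 3
allele_b = "ATGC" * 5 + "GAATTC" + "ATGC" * 10 + "GAGTTC" + "ATGC" * 3

print(digest(allele_a))
print(digest(allele_b))
```

The two alleles yield different fragment patterns, which is the band difference an RFLP analysis would detect.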
However, RFLP is a pattern based technique, which does not require the DNA sequence of the individual to be sequenced. Portions of the sequences of the present invention can be used to provide an alternative technique that determines the actual base-by-base DNA sequence of selected portions of an individual's genome. These sequences can also be used to prepare PCR primers for amplifying and isolating such selected DNA. One can, for example, take part of the sequence of the invention and prepare two PCR primers from the 5' and 3' ends of the sequence, or fragment of the sequence. These are used to amplify an individual's DNA, corresponding to the sequence. The amplified DNA is sequenced. Panels of corresponding DNA sequences from individuals, made this way, can provide unique individual identifications, as each individual will have a unique set of such DNA sequences, due to allelic differences. The sequences of the present invention can be used to particular advantage to obtain such identification sequences from individuals and from tissue. Allelic variation occurs to some degree in the coding regions of these sequences, and to a greater degree in the noncoding regions. It is estimated that allelic variation between individual humans occurs with a frequency of about once per each 500 bases. Each of the fragments or complete coding sequences comprising a part of the present invention can, to some degree, be used as a standard against which DNA from an individual can be compared for identification purposes. Because greater numbers of polymorphisms occur in the noncoding regions, fewer sequences are necessary to differentiate individuals.
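The primer preparation step described above can be sketched as follows: the forward primer copies the 5' end of the sequence, and the reverse primer is the reverse complement of the 3' end. The function names and the default primer length of 18 bases are assumptions made for illustration; real primer design would also consider melting temperature, self-complementarity, and specificity, none of which is modeled here.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def end_primers(template: str, length: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers taken from the 5' and 3' ends of `template`."""
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length]                       # identical to the 5' end
    reverse = reverse_complement(template[-length:])  # anneals at the 3' end
    return forward, reverse


# Hypothetical template, invented for illustration only.
template = "ATGGCTAGCTTGACCGATCGTACGGATCCATGAGGCTTAACCGGTTAAGC"
forward, reverse = end_primers(template)
print(forward)
print(reverse)
```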
If a panel of reagents from the sequences according to the present invention is used to generate a unique ID database for an individual, those same reagents can later be used to identify tissue from that individual. Positive identification of that individual, living or dead can be made from extremely small tissue samples.
Another use for DNA-based identification techniques is in forensic biology. PCR technology can be used to amplify DNA sequences taken from very small biological samples. In one prior art technique, gene sequences are amplified at specific loci known to contain a large number of allelic variations, for example the DQα class II HLA gene (Erlich, H., PCR Technology, Freeman and Co. (1992)). Once this specific area of the genome is amplified, it is digested with one or more restriction enzymes to yield an identifying set of bands on a Southern blot probed with DNA corresponding to the DQα class II HLA gene. In accordance with the present invention, it is clear from the results depicted in Figures 3 and 4 that the novel gene signal according to the present invention is found in many different tissues of the body.
The sequences of the present invention can be used to provide polynucleotide reagents specifically targeted to additional loci in the human genome, and can enhance the reliability of DNA-based forensic identifications. Those sequences targeted to noncoding regions are particularly appropriate. As mentioned above, actual base sequence information can be used for identification as an accurate alternative to patterns formed by restriction enzyme generated fragments. Reagents for obtaining such sequence information are within the scope of the present invention. Such reagents can comprise complete genes, parts of genes or corresponding coding regions, or fragments of at least 15 bp, preferably at least 18 bp.
There is also a need for reagents capable of identifying the source of a particular tissue. Such need arises, for example, in forensics when presented with tissue of unknown origin. Appropriate reagents can comprise, for example, DNA probes or primers specific to particular tissue prepared from the sequences of the present invention. Panels of such reagents can identify tissue by species and/or by organ type. In a similar manner, these reagents can be used to screen tissue cultures for contamination.
Sequences that match perfectly to several different genes can be detected by hybridizing to chromosomes: if many chromosomal loci are observed, the sequence (or a close variant) is in more than one gene. This problem can be circumvented by using the 3'-untranslated part of the cDNA alone as a probe for the chromosomal location or for the full-length cDNA or gene. The 3'-untranslated region is more likely to be unique within gene families, since there is no evolutionary pressure to conserve a coding function of this region of the mRNA. The cDNA libraries disclosed according to the present invention ideally use directional cloning methods so that either the 5' end of the cDNA (likely to contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be selectively obtained.
Using the sequence information provided herein, the polynucleotides of the present invention can be derived from natural sources or synthesized using known methods. The sequences falling within the scope of the present invention are not limited to the specific sequences described, but include human allelic and species variations thereof. Allelic variations can be routinely determined by comparison of one sequence with a sequence from another individual of the same species. Furthermore, to accommodate codon variability, the invention includes sequences coding for the same amino acid sequences as do the specific sequences disclosed herein. In other words, in a coding region, substitution of one codon for another which encodes the same amino acid is expressly contemplated. (Coding regions can be determined through routine sequence analysis.)
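Codon substitution of the kind contemplated above can be checked by translating two coding sequences and comparing the results. The sketch below builds the standard genetic code and tests whether two DNA sequences encode the same amino acid sequence. The function names and the example sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True when both coding sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


# Two hypothetical coding sequences that differ only by synonymous codons.
original = "ATGGCTTTAAAATAA"
variant = "ATGGCCCTGAAGTGA"
print(translate(original), translate(variant))
print(encode_same_polypeptide(original, variant))
```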
In a cDNA library there are many species of mRNA represented. Each cDNA clone can be interesting in its own right, but must be isolated from the library before further experimentation can be completed. In order to sequence any specific cDNA, it must be removed and separated (i.e., isolated and purified) from all the other sequences. This can be accomplished by many techniques known to those of skill in the art. These procedures normally involve identification of a bacterial colony containing the cDNA of interest and further amplification of that bacterium. Once a cDNA is separated from the mixed clone library, it can be used as a template for further procedures such as nucleotide sequencing.
The present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above. The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available. The following vectors are provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
Thus, the present invention is not restricted to such constructs or sequences alone but also includes expression vehicles, which may include plasmids, viruses, or any other expression vectors, including cells and liposomes, containing any of the nucleic acids, nucleotide sequences, DNAs, RNAs, or fragments thereof, as disclosed according to the present invention. Furthermore, this will be true regardless of whether such sequences are coding sequences or non-coding sequences and whether such coding sequences code for all or part of the expression products as disclosed herein, so long as such expression products, or fragments thereof, exhibit some utility in keeping with the invention disclosed herein. Thus, while the present invention includes an isolated DNA sequence, or nucleic acid, that expresses a human protein when in a suitable expression system, for example, a cell-free, or in vitro, expression system, such system may also be contained in, or part of, a suitable expression vehicle, or vector, be that a cell, a plasmid, a virus, or other operative expression vector. Such expression systems, especially where part of an expression vehicle, will commonly require some promoter region that may include a promoter different from that normally associated in vivo with the genes coding for the gene expression products and proteins disclosed according to the present invention. Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
In a further embodiment, the present invention relates to host cells containing the above-described construct(s). The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, 1986).
The constructs in host cells can be used in a conventional manner to produce the gene product coded by the recombinant sequence. Alternatively, the encoded polypeptide, once the sequence is known from the cDNAs, or from isolation of the pure product, can be synthetically produced by conventional methods of peptide synthesis, either manual or automated.
Thus, in accordance with the present invention, once the coding sequence is known, or the gene is cloned which encodes the polypeptide, conventional techniques in molecular biology can be used to obtain the polypeptide. More generally, the present invention includes all polypeptides coded for by any and each of the DNA or RNA sequences disclosed herein, including fragments of said polypeptides, as well as derivatives and functional analogs thereof.
At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. (Fragments are useful, for example, in generating antibodies against the native polypeptide.)
Alternatively, the DNA encoding the desired polypeptide can be inserted into a host organism and expressed. The organism can be a bacterium, yeast, cell line, or multicellular plant or animal. The literature is replete with examples of suitable host organisms and expression techniques. For example, polynucleotide (DNA or mRNA) can be injected directly into muscle tissue of mammals, where it is expressed. This methodology can be used to deliver the polypeptide to the animal, or to generate an immune response against a foreign polypeptide. Wolff, et al., Science, 247:1465 (1990); Felgner, et al., Nature, 349:351 (1991). Alternatively, the coding sequence, together with appropriate regulatory regions (i.e., a construct), can be inserted into a vector, which is then used to transfect a cell. The cell (which may or may not be part of a larger organism) then expresses the polypeptide.
The present invention further relates to polypeptides having an amino acid sequence selected from SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, as well as fragments, analogs and derivatives of such polypeptides.
The terms "fragment," "derivative" and "analog," when referring to the polypeptides disclosed herein also mean polypeptides that retain essentially the same biological function or activity as said polypeptides. Thus, an analog includes a proprotein which can be activated by cleavage of the proprotein portion to produce an active mature polypeptide. Such fragments, derivatives and analogs must have sufficient similarity to the polypeptides disclosed herein so that activity of the native polypeptide is retained.
The polypeptides of the present invention may be recombinant polypeptides, natural polypeptides or synthetic polypeptides, preferably recombinant polypeptides.
"Recombinant," as used herein, means that a protein is derived from recombinant (e.g., microbial or mammalian) expression systems. "Microbial" refers to recombinant proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Protein expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; protein expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
The fragment, derivative or analog of a polypeptide of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26 and 29 may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code, or (ii) one in which one or more of the amino acid residues includes a substituent group, or (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethylene glycol), or (iv) one in which the additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence or a sequence which is employed for purification of the mature polypeptide or a proprotein sequence. Such fragments, derivatives and analogs are deemed to be within the abilities of those skilled in the art in view of the teachings herein.
The polypeptides of the present invention are preferably provided in an isolated form, and preferably are purified to homogeneity. When applied to polypeptides, the term "isolated" has its already stated meaning.
The polypeptides of the present invention include the polypeptides of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29, in particular the mature polypeptide, as well as polypeptides which have at least 70% identity to these polypeptides, or which have at least 90% identity to these polypeptides, still more preferably at least 95% identity to these polypeptides, and also include portions of such polypeptides, with such portions generally containing at least 30 amino acids and more preferably at least 50 amino acids.
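The identity thresholds above can be made concrete with a short calculation. The sketch below assumes the two polypeptide sequences have already been aligned to equal length, with "-" marking gaps, and counts identical columns over all aligned columns. That denominator is one common convention and is an assumption here, since the text does not state how identity is to be computed. The function names and the example sequences are hypothetical.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned to the same length, with '-' for gaps.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_tier(pct: float) -> Optional[int]:
    """Return the highest threshold named in the text (95, 90 or 70) that pct meets."""
    for threshold in (95, 90, 70):
        if pct >= threshold:
            return threshold
    return None


# Hypothetical ten-residue sequences differing at one position.
pct = percent_identity("MKTAYIAKQR", "MKTAYIAKQA")
print(pct, identity_tier(pct))
```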
Fragments or portions of the polypeptides of the present invention may be employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may be employed as intermediates for producing the full-length polypeptides. Fragments or portions of the polynucleotides of the present invention may be used to synthesize full-length polynucleotides of the present invention.
The present invention also relates to vectors which include polynucleotides of the present invention, host cells which are genetically engineered with vectors of the invention and the production of polypeptides of the invention by recombinant techniques.
Host cells are genetically engineered (transduced or transformed or transfected) with the vectors of this invention which may be, for example, a cloning vector or an expression vector, either of which may be in the form of a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan.
The polynucleotides of the present invention may be employed for producing polypeptides by recombinant techniques. Thus, for example, the polynucleotide may be included in any one of a variety of expression vectors for expressing a polypeptide. Such vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived from combinations of plasmids and phage DNA; and viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. However, any other vector may be used as long as it is replicable and viable in the host.
In accordance with the present invention, an appropriate DNA sequence or segment may be inserted into the vector by a variety of procedures. In general, the DNA sequence is inserted into the appropriate restriction endonuclease site(s) by procedures known in the art. Such procedures and others are deemed to be within the scope of those skilled in the art.
The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (for example, a promoter sequence) to direct mRNA synthesis. As representative examples of such promoters, there may be mentioned: LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL promoter and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses. The expression vector also contains a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression.
In addition, the expression vectors preferably contain one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli.
The vector containing the appropriate DNA sequence as hereinabove described, as well as an appropriate promoter or control sequence, may be employed to transform an appropriate host to permit the host to express the protein.
As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Streptomyces, Salmonella typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.
"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) a genetic element or elements having a regulatory role in gene expression, for example, promoters or enhancers, (2) a structural or coding sequence which is transcribed into mRNA and translated into protein, and (3) appropriate transcription initiation and termination sequences. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous protein upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, (Cold Spring Harbor, N.Y., 1989), Wu et al., Methods in Gene Biotechnology (CRC Press, New York, NY, 1997), and Recombinant Gene Expression Protocols, in Methods in Molecular Biology, Vol. 62, (Tuan, ed., Humana Press, Totowa, NJ, 1997), the disclosures of which are hereby incorporated by reference.
Transcription of the DNA encoding the polypeptides according to the present invention by higher eukaryotes can be increased by insertion of an enhancer sequence into the vector. Such enhancers have been known for some time and are usually cis-acting elements of DNA, usually anywhere from 10 to 300 bp, that act on a promoter to increase transcription. Common examples include the SV40 enhancer, the cytomegalovirus early promoter enhancer, the polyoma enhancer and the enhancers found in adenovirus.
Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and to, if desirable, provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice. As a representative but nonlimiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is derepressed by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements. Recombinant protein produced in bacterial culture is conveniently isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
The protein, its fragments or other derivatives, or analogs thereof, or cells expressing them, can be used as an immunogen to produce antibodies thereto. These antibodies can be, for example, polyclonal, monoclonal, chimeric, single chain, Fab fragments, or the product of an Fab expression library. Various procedures known in the art may be used for the production of polyclonal antibodies.
Antibodies generated against the polypeptide corresponding to a sequence of the present invention can be obtained by direct injection of the polypeptide into an animal or by administering the polypeptide to an animal, preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a sequence encoding only a fragment of the polypeptide can be used to generate antibodies binding the whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from tissue expressing that polypeptide. Moreover, a panel of such antibodies, specific to a large number of polypeptides, can be used to identify and differentiate such tissue.
For preparation of monoclonal antibodies, any technique which provides antibodies produced by continuous cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single chain antibodies to immunogenic polypeptide products of this invention.
The antibodies can be used in methods relating to the localization and activity of the protein sequences of the invention, e.g., for imaging these proteins, measuring levels thereof in appropriate physiological samples and the like.
In carrying out the procedures of the present invention it is of course to be understood that reference to particular buffers, media, reagents, cells, culture conditions and the like is not intended to be limiting, but is to be read so as to include all related materials that one of ordinary skill in the art would recognize as being of interest or value in the particular context in which that discussion is presented. For example, it is often possible to substitute one buffer system or culture medium for another and still achieve similar, if not identical, results. Those of skill in the art will have sufficient knowledge of such systems and methodologies so as to be able, without undue experimentation, to make such substitutions as will optimally serve their purposes in using the methods and procedures disclosed herein.
Specific embodiments of the invention will now be further described in more detail in the following non-limiting examples and it will be appreciated that additional and different embodiments of the teachings of the present invention will doubtless suggest themselves to those of skill in the art and such other embodiments are considered to have been inferred from the disclosure herein.
EXAMPLE
The proteins encoded by the nucleotide sequences of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 are expressed in U2OS cells. This is achieved by selectively PCR amplifying the coding regions thereof (based on the available open reading frames) and then cloning the resulting amplicon into a suitable mammalian expression vector. One such vector is pcDNA3.1 (sold by Invitrogen - #K4800-01). The expression of the protein encoded by the described polynucleotide sequence is detected in either of two ways: by use of specific antibodies raised against peptides derived from the amino acid sequence or by use of antibodies against tags added during the cloning procedure. Examples of such tags are the V5 epitope or a poly-histidine sequence as contained in the pcDNA3.1 vector. In order to accomplish this, cells will normally be transfected with the expression construct and cultured for 1 to 5 days. Cells will then be lysed and their protein content analyzed by western blotting using the above antibodies as appropriate. Cells will also be analyzed for the subcellular localization of the protein encoded by the described polynucleotide sequence by transfecting cells in suitable chambers, culturing them for 1 to 5 days and fixing them in situ. Such cells will then be analyzed for the presence and localization of the encoded protein by staining cells with the above-referenced antibodies. Alternatively, cells will be transfected with an expression system in which the protein encoded by the described polynucleotide sequence is fused to a directly detectable tag such as green fluorescent protein (GFP). The expression and localization of the protein encoded by the described polynucleotide sequence is then detected by analyzing that of GFP. For purposes of identification of the polypeptides disclosed herein, each such polypeptide is listed in the table below along with its calculated molecular weight (Daltons) and its expected isoelectric point (pI).
Table 1.
SEQ ID NO:   # Residues   Mol. Wt. (Da)   pI
2            410          45786.9         8.96
4            227          26152.3         8.48
6            275          30781.6         10.00
10           84           8913.2          9.35
12           281          30386.7         9.35
14           322          32977.3         9.27
16           141          16444.4         9.34
18           219          24418.4         9.07
22           56           6356.3          7.85
24           344          37375.6         5.82
26           208          23864.9         9.71
29           531          60576.6         9.63
The polypeptides of SEQ ID NOS: 8 and 20 correspond only to partial sequences; no values could be calculated for them, and they are therefore omitted from the table.
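The molecular weight and isoelectric point values in the table are calculated from the amino acid sequences, but the source does not say which program or constants were used. The following Python sketch shows one common way such values can be estimated: summing average residue masses plus one water molecule for the molecular weight, and finding by bisection the pH at which the net charge is zero for the pI. The pKa set, the function names and the example peptide are all assumptions made for illustration, so the results will not necessarily match the tabulated values.

```python
# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water per chain (free N- and C-termini)

# Assumed pKa values (one of several published sets; not from the patent).
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    """Average molecular weight in Daltons of an unmodified polypeptide."""
    seq = seq.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue(s): {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def net_charge(seq, ph):
    """Net charge at a given pH using the Henderson-Hasselbalch relation."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))      # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))     # C-terminal carboxyl
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tolerance=1e-4):
    """Estimate the pI by bisection; net charge falls monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical peptide, not a sequence from the patent
    print(round(molecular_weight(peptide), 1), round(isoelectric_point(peptide), 2))
```

Different pKa sets shift the estimated pI by several tenths of a pH unit, which is one reason the values in the table should be read as expected rather than measured values.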
All of the polynucleotides from which these polypeptide sequences are derived are cDNAs isolated during a differential screen of osteogenic mesenchymal stem cells (MSCs) cultured for 4 days in the presence of osteogenic supplements.
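The expression constructs in the Example are built by amplifying coding regions chosen from the available open reading frames of these cDNAs. As a rough illustration of that selection step, the Python sketch below scans the three forward reading frames of a DNA string and reports the longest stretch that runs from an ATG to an in-frame stop codon. The function name and the toy sequence are hypothetical; the sketch ignores the reverse strand and partial ORFs that lack a start or stop codon, and it is not the procedure the inventors used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    Returns None when no complete ORF is present.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for further ORFs downstream
    return best


if __name__ == "__main__":
    toy_cdna = "CCATGAAATTTGGGTAACC"  # hypothetical sequence for illustration
    span = longest_orf(toy_cdna)
    if span is not None:
        print(span, toy_cdna[span[0]:span[1]])
```

In practice, PCR primers would be designed against the two ends of the reported span, omitting the stop codon if a C-terminal tag such as V5 or poly-histidine is to be read through.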

Claims

What Is Claimed Is:
1. An isolated nucleic acid comprising a polynucleotide that is at least 90% identical to a polynucleotide encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29.
2. An isolated nucleic acid comprising a polynucleotide that is at least 95% identical to a polynucleotide encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29.
3. An isolated nucleic acid comprising a polynucleotide that is at least 98% identical to a polynucleotide encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29.
4. An isolated nucleic acid comprising RNA corresponding to any of the DNA sequences or fragments of claims 1, 2 or 3.
5. An isolated nucleic acid comprising a DNA sequence identical to a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27 and 28 and the complements of these.
6. An isolated nucleic acid comprising RNA corresponding to the DNA sequence of claim 5.
7. An isolated nucleic acid comprising at least the polypeptide coding region of a human gene, said human gene containing a DNA sequence according to claim 1.
8. An isolated nucleic acid comprising at least the polypeptide coding region of a human gene which contains the DNA sequence of claim 5.
9. The isolated nucleic acid of claim 8 which expresses a human protein when in a suitable expression system.
10. A vector comprising the DNA sequence of claim 1.
11. A vector comprising the DNA sequence of claim 3.
12. A vector comprising the DNA sequence of claim 5.
13. A vector comprising the DNA sequence of claim 9.
14. A polypeptide coded for by the DNA sequence of claim 7 and active fragments, derivatives and functional analogs thereof.
15. A polypeptide coded for by the DNA sequence of claim 8 and active fragments, derivatives and functional analogs thereof.
16. A polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 29.
17. A genetically engineered cell having inserted into the genome thereof the DNA of claim 7.
18. A process for producing cells for expressing a polypeptide using genetically engineered cells of claim 17.
19. An isolated DNA sequence comprising a fragment of a DNA of claim 5, wherein said fragment comprises at least 15 sequential bases of said sequence.
20. An isolated DNA sequence comprising a fragment of DNA of claim 5, wherein said fragment comprises at least 30 sequential bases of said sequence.
21. An isolated DNA sequence comprising a fragment of DNA of claim 5, wherein said fragment comprises at least 50 sequential bases of said sequence.
22. An isolated DNA sequence comprising a fragment of DNA of claim 5, wherein said fragment comprises at least 80 sequential bases of said sequence.
23. A method of detecting genes within the human genome comprising contacting a sample of said genome with an isolated DNA selected from the group consisting of the DNAs of claims 19, 20, 21, and 22.
24. A monoclonal antibody against a polypeptide selected from the group consisting of the polypeptides of claims 14 and 15.
EP00923114A 1999-04-01 2000-03-31 HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS Withdrawn EP1163338A2 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US12741899P 1999-04-01 1999-04-01
US127418P 1999-04-01
US14880099P 1999-08-13 1999-08-13
US148800P 1999-08-13
PCT/US2000/008751 WO2000059933A2 (en) 1999-04-01 2000-03-31 HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS

Publications (1)

Publication Number Publication Date
EP1163338A2 (en) 2001-12-19

Family

ID=26825607

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00923114A Withdrawn EP1163338A2 (en) 1999-04-01 2000-03-31 HUMAN MESENCHYMAL DNAs AND EXPRESSION PRODUCTS

Country Status (4)

Country Link
EP (1) EP1163338A2 (en)
JP (1) JP2002540782A (en)
AU (1) AU4329300A (en)
WO (1) WO2000059933A2 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7227007B2 (en) 2000-12-28 2007-06-05 Asahi Kasei Pharma Corporation NF-κB activating gene
US9677042B2 (en) 2010-10-08 2017-06-13 Terumo Bct, Inc. Customizable methods and systems of growing and harvesting cells in a hollow fiber bioreactor system
EP3068867B1 (en) 2013-11-16 2018-04-18 Terumo BCT, Inc. Expanding cells in a bioreactor
JP6783143B2 (en) 2014-03-25 2020-11-11 テルモ ビーシーティー、インコーポレーテッド Passive replenishment of medium
WO2016049421A1 (en) 2014-09-26 2016-03-31 Terumo Bct, Inc. Scheduled feed
WO2017004592A1 (en) 2015-07-02 2017-01-05 Terumo Bct, Inc. Cell growth with mechanical stimuli
US11685883B2 (en) 2016-06-07 2023-06-27 Terumo Bct, Inc. Methods and systems for coating a cell growth surface
US11104874B2 (en) 2016-06-07 2021-08-31 Terumo Bct, Inc. Coating a bioreactor
US11624046B2 (en) 2017-03-31 2023-04-11 Terumo Bct, Inc. Cell expansion
JP7393945B2 (en) 2017-03-31 2023-12-07 テルモ ビーシーティー、インコーポレーテッド cell proliferation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998035022A1 (en) * 1997-02-06 1998-08-13 Osiris Therapeutics, Inc. p21CIP1 OR p27KIP1 EFFECTS ON THE REGULATION OF DIFFERENTIATION OF HUMAN MESENCHYMAL STEM CELLS

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0059933A3 *

Also Published As

Publication number Publication date
AU4329300A (en) 2000-10-23
WO2000059933A3 (en) 2001-02-15
WO2000059933A2 (en) 2000-10-12
JP2002540782A (en) 2002-12-03


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20040210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20040622