AU2002301051B2 - Extended cDNAs for secreted proteins - Google Patents

Publication number
AU2002301051B2
Authority
AU
Australia
Prior art keywords
sequences
cdna
seq
sequence
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2002301051A
Other versions
AU2002301051A1 (en)
AU2002301051B9 (en)
Inventor
Lydie Bougueleret
Aymeric Duclert
Jean-Baptiste Dumas Milne Edwards
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Merck Biodevelopment SAS
Original Assignee
Serono Genetics Institute SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU85547/98A (AU8554798A)
Priority claimed from AU10491/99A (AU753099B2)
Priority claimed from PCT/IB1998/001862 (WO1999025825A2)
Priority to AU2002301051A (AU2002301051B9)
Application filed by Serono Genetics Institute SA
Publication of AU2002301051A1
Assigned to GENSET S.A.: Amend patent request/document other than specification (104). Assignors: GENSET
Assigned to SERONO GENETICS INSTITUTE S.A.: Amend patent request/document other than specification (104). Assignors: GENSET S.A.
Publication of AU2002301051B2
Priority to AU2006202884A (AU2006202884A1)
Publication of AU2002301051B9
Application granted
Anticipated expiration
Ceased

Landscapes

  • Peptides Or Proteins (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Description

AUSTRALIA
Patents Act 1990
GENSET
COMPLETE SPECIFICATION
STANDARD PATENT
Invention Title: Extended cDNAs for secreted proteins
The following statement is a full description of this invention including the best method of performing it known to us:
EXTENDED cDNAs FOR SECRETED PROTEINS
This is a divisional of AU 10491/99, the contents of which are incorporated herein by reference.
Background of the Invention
The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find applications in the construction of high resolution chromosome maps and in the identification of individuals.
In the past, the characterization of even a single human gene was a painstaking process, requiring years of effort. Recent developments in the areas of cloning vectors, DNA sequencing, and computer technology have merged to greatly accelerate the rate at which human genes can be isolated, sequenced, mapped, and characterized. Cloning vectors such as yeast artificial chromosomes (YACs) and bacterial artificial chromosomes (BACs) are able to accept DNA inserts ranging from 300 to 1000 kilobases (kb) or 100-400 kb in length respectively, thereby facilitating the manipulation and ordering of DNA sequences distributed over great distances on the human chromosomes.
Automated DNA sequencing machines permit the rapid sequencing of human genes. Bioinformatics software enables the comparison of nucleic acid and protein sequences, thereby assisting in the characterization of human gene products.
Currently, two different approaches are being pursued for identifying and characterizing the genes distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using bio-informatics software. However, this approach entails sequencing large stretches of human DNA which do not encode proteins in order to find the protein encoding sequences scattered throughout the genome.
In addition to requiring extensive sequencing, the bio-informatics software may mischaracterize the genomic sequences obtained. Thus, the software may produce false positives in which non-coding DNA is mischaracterized as coding DNA or false negatives in which coding DNA is mislabeled as non-coding DNA.
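The open reading frame search described above can be illustrated with a minimal sketch. It is not the software referred to in the specification: the function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions made for illustration, and the scan covers only the three forward-strand frames.

```python
# Minimal open reading frame (ORF) scan: forward strand only, ATG to first in-frame stop.
# Hypothetical helper for illustration; real gene-finding software is far more elaborate.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs, end exclusive, stop codon included.

    min_codons counts every codon in the ORF, including the start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

toy = "CCATGAAATTTTAGGG"  # made-up sequence, not from the specification
print(find_orfs(toy))
```

A scan this simple shows why false positives arise: any ATG followed by an in-frame stop codon is reported, whether or not the region is actually transcribed and translated.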
An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein coding portions of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify extended cDNAs which include sequences adjacent to the EST sequences.
The extended cDNAs may contain all of the sequence of the EST which was used to obtain them or only a portion of the sequence of the EST which was used to obtain them. In addition, the extended cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, the extended cDNAs may include portions of the coding sequence of the gene from which the EST was derived. It will be appreciated that there may be several extended cDNAs which include the EST sequence as a result of alternate splicing or the activity of alternative promoters.
In the past, the short EST sequences which could be used to isolate or purify extended cDNAs were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams et al., Nature 377:174, 1996; Hillier et al., Genome Res. 6:807-828, 1996).
In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5' untranslated region of the mRNA from which the cDNA is derived. Such incomplete sequences may not include the first exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may not include some exons, often short ones, which are located upstream of splicing sites. Thus, there is a need to obtain sequences derived from the 5' ends of mRNAs which can be used to obtain extended cDNAs which may include the 5' sequences contained in the 5' ESTs.
While many sequences derived from human chromosomes have practical applications, approaches based on the identification and characterization of those chromosomal sequences which encode a protein product are particularly relevant to diagnostic and therapeutic uses. Of the 50,000-100,000 protein coding genes, those genes encoding proteins which are secreted from the cell in which they are synthesized, as well as the secreted proteins themselves, are particularly valuable as potential therapeutic agents. Such proteins are often involved in cell to cell communication and may be responsible for producing a clinically relevant response in their target cells.
In fact, several secretory proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma, chemotherapy induced neutropenia and multiple sclerosis. For these reasons, extended cDNAs encoding secreted proteins or portions thereof represent a particularly valuable source of therapeutic agents. Thus, there is a need for the identification and characterization of secreted proteins and the nucleic acids encoding them.
In addition to being therapeutically useful themselves, secretory proteins include short peptides, called signal peptides, at their amino termini which direct their secretion. These signal peptides are encoded by the signal sequences located at the 5' ends of the coding sequences of genes encoding secreted proteins.
Because these signal peptides will direct the extracellular secretion of any protein to which they are operably linked, the signal sequences may be exploited to direct the efficient secretion of any protein by operably linking the signal sequences to a gene encoding the protein for which secretion is desired. This may prove beneficial in gene therapy strategies in which it is desired to deliver a particular gene product to cells other than the cell in which it is produced. Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be selected. Thus, there exists a need to identify and characterize the portions of the genes for secretory proteins which encode signal peptides.
Public information on the number of human genes for which the promoters and upstream regulatory regions have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically too short to be utilized as probes for isolating promoters from human genomic libraries.
Recently, some approaches have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross, S.H. et al., Purification of CpG Islands using a Methylated DNA Binding Column, Nature Genetics 6: 236-244 (1994)). The second consists of isolating human genomic DNA sequences containing SP1 binding sites by the use of SP1 binding protein (Mortlock et al., Genome Res. 6:327-335, 1996). Both of these approaches have their limits due to a lack of specificity or of comprehensiveness.
5' ESTs and extended cDNAs obtainable therefrom may be used to efficiently identify and isolate upstream regulatory regions which control the location, developmental stage, rate, and quantity of protein synthesis, as well as the stability of the mRNA (Theil et al., BioFactors 4:87-93 (1993)). Once identified and characterized, these regulatory regions may be utilized in gene therapy or protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, reduce, or prevent the synthesis of undesirable gene products.
In addition, ESTs containing the 5' ends of secretory protein genes or extended cDNAs which include sequences adjacent to the sequences of the ESTs may include sequences useful as probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the sequences upstream of the 5' coding sequences of genes encoding secretory proteins.
Summary of the Invention
The present invention relates to purified, isolated, or recombinant extended cDNAs which encode secreted proteins or fragments thereof. Preferably, the purified, isolated or recombinant cDNAs contain the entire open reading frame of their corresponding mRNAs, including a start codon and a stop codon. For example, the extended cDNAs may include nucleic acids encoding the signal peptide as well as the mature protein. Alternatively, the extended cDNAs may contain a fragment of the open reading frame. In some embodiments, the fragment may encode only the sequence of the mature protein. Alternatively, the fragment may encode only a portion of the mature protein. A further aspect of the present invention is a nucleic acid which encodes the signal peptide of a secreted protein.
The present extended cDNAs were obtained using ESTs which include sequences derived from the authentic 5' ends of their corresponding mRNAs. As used herein, the terms "EST" or "5' EST" refer to the short cDNAs which were used to obtain the extended cDNAs of the present invention. As used herein, the term "extended cDNA" refers to the cDNAs which include sequences adjacent to the 5' EST used to obtain them. The extended cDNAs may contain all or a portion of the sequence of the EST which was used to obtain them. The term "corresponding mRNA" refers to the mRNA which was the template for the cDNA synthesis which produced the 5' EST.
As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition. Individual extended cDNA clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The extended cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10^4-10^6 fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
As used herein, the term "isolated" requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.
As used herein, the term "recombinant" means that the extended cDNA is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural environment. Additionally, to be "enriched" the extended cDNAs will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Preferably, the enriched extended cDNAs represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably, the enriched extended cDNAs represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched extended cDNAs represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. "Stringent", "moderate," and "low" hybridization conditions are as defined in Example 29.
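The enrichment thresholds given above (5%, 15%, 50%, and 90% of the nucleic acid inserts) can be expressed as a small classification routine. This is only a sketch: the function name `enrichment_level` and the returned labels are shorthand chosen here, not terms defined by the specification, and the insert counts in the usage lines are invented.

```python
def enrichment_level(target_inserts, total_inserts):
    """Classify a population of backbone molecules by the share of inserts
    that are the extended cDNAs of interest.

    Labels are illustrative shorthand for the 5% / 15% / 50% / 90% tiers.
    Integer arithmetic avoids floating-point edge cases at the thresholds.
    """
    if total_inserts <= 0 or not 0 <= target_inserts <= total_inserts:
        raise ValueError("counts must satisfy 0 <= target_inserts <= total_inserts, total > 0")

    def at_least(percent):
        return target_inserts * 100 >= percent * total_inserts

    if at_least(90):
        return "highly preferred (>= 90%)"
    if at_least(50):
        return "more preferred (>= 50%)"
    if at_least(15):
        return "preferred (>= 15%)"
    if at_least(5):
        return "enriched (>= 5%)"
    return "not enriched (< 5%)"

# Hypothetical counts: 12 of 200 inserts encode the cDNA of interest (6%).
print(enrichment_level(12, 200))
```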
Unless otherwise indicated, a "complementary" sequence is fully complementary. Thus, extended cDNAs encoding secreted polypeptides or fragments thereof which are present in cDNA libraries in which one or more extended cDNAs encoding secreted polypeptides or fragments thereof make up 5% or more of the number of nucleic acid inserts in the backbone molecules are "enriched recombinant extended cDNAs" as defined herein. Likewise, extended cDNAs encoding secreted polypeptides or fragments thereof which are in a population of plasmids in which one or more extended cDNAs of the present invention have been inserted such that they represent 5% or more of the number of inserts in the plasmid backbone are "enriched recombinant extended cDNAs" as defined herein. However, extended cDNAs encoding secreted polypeptides or fragments thereof which are in cDNA libraries in which the extended cDNAs encoding secreted polypeptides or fragments thereof constitute less than 5% of the number of nucleic acid inserts in the population of backbone molecules, such as libraries in which backbone molecules having a cDNA insert encoding a secreted polypeptide are extremely rare, are not "enriched recombinant extended cDNAs."
In particular, the present invention relates to extended cDNAs which were derived from genes encoding secreted proteins. As used herein, a "secreted" protein is one which, when expressed in a suitable host cell, is transported across or through a membrane, including transport as a result of signal peptides in its amino acid sequence. "Secreted" proteins include without limitation proteins secreted wholly (e.g., soluble proteins) or partially (e.g., receptors) from the cell in which they are expressed. "Secreted" proteins also include without limitation proteins which are transported across the membrane of the endoplasmic reticulum.
Extended cDNAs encoding secreted proteins may include nucleic acid sequences, called signal sequences, which encode signal peptides which direct the extracellular secretion of the proteins encoded by the extended cDNAs. Generally, the signal peptides are located at the amino termini of secreted proteins.
Secreted proteins are translated by ribosomes associated with the "rough" endoplasmic reticulum.
Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by the signal peptide. The signal peptide is typically cleaved following its cotranslational entry into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins may undergo posttranslational modification before entering secretory vesicles which transport them across the cell membrane.
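The relationship between the precursor, its amino-terminal signal peptide, and the mature protein left after cleavage can be sketched as a simple split of the amino acid string. The function name `split_precursor`, the example sequence, and the cleavage position are all invented for illustration; in practice the cleavage site has to be determined experimentally or predicted, which this sketch does not attempt.

```python
def split_precursor(precursor, signal_length):
    """Split a precursor protein into (signal_peptide, mature_protein).

    signal_length is the number of amino-terminal residues removed on cleavage;
    it is supplied by the caller and not predicted here.
    """
    if not 0 < signal_length < len(precursor):
        raise ValueError("signal_length must fall strictly inside the precursor")
    return precursor[:signal_length], precursor[signal_length:]

# Made-up 12-residue precursor with an assumed 7-residue signal peptide.
signal, mature = split_precursor("MKLLAVAGQPST", 7)
print(signal, mature)
```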
The extended cDNAs of the present invention have several important applications. For example, they may be used to express the entire secreted protein which they encode. Alternatively, they may be used to express portions of the secreted protein. The portions may comprise the signal peptides encoded by the extended cDNAs or the mature proteins encoded by the extended cDNAs (i.e., the proteins generated when the signal peptide is cleaved off). The portions may also comprise polypeptides having at least consecutive amino acids encoded by the extended cDNAs. Alternatively, the portions may comprise at least consecutive amino acids encoded by the extended cDNAs. In some embodiments, the portions may comprise at least 25 consecutive amino acids encoded by the extended cDNAs. In other embodiments, the portions may comprise at least 40 amino acids encoded by the extended cDNAs.
Antibodies which specifically recognize the entire secreted proteins encoded by the extended cDNAs or fragments thereof having at least 10 consecutive amino acids, at least 15 consecutive amino acids, at least 25 consecutive amino acids, or at least 40 consecutive amino acids may also be obtained as described below. Antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be obtained as described below. Similarly, antibodies which specifically recognize the signal peptides encoded by the extended cDNAs may also be obtained.
In some embodiments, the extended cDNAs include the signal sequence. In other embodiments, the extended cDNAs may include the full coding sequence for the mature protein (i.e., the protein generated when the signal polypeptide is cleaved off). In addition, the extended cDNAs may include regulatory regions upstream of the translation start site or downstream of the stop codon which control the amount, location, or developmental stage of gene expression.
As discussed above, secreted proteins are therapeutically important. Thus, the proteins expressed from the cDNAs may be useful in treating or controlling a variety of human conditions. The extended cDNAs may also be used to obtain the corresponding genomic DNA. The term "corresponding genomic DNA" refers to the genomic DNA which encodes mRNA which includes the sequence of one of the strands of the extended cDNA in which thymidine residues in the sequence of the extended cDNA are replaced by uracil residues in the mRNA.
The extended cDNAs or genomic DNAs obtained therefrom may be used in forensic procedures to identify individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal expression of the genes corresponding to the extended cDNAs. In addition, the present invention is useful for constructing a high resolution map of the human chromosomes.
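The two sequence relationships used in the definitions above, a fully complementary strand and an mRNA whose sequence matches a cDNA strand with uracil in place of thymidine, can be written as two short string operations. The function names and the toy sequences are assumptions for illustration only.

```python
# Translation table for the four DNA bases; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the fully complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

def cdna_strand_to_mrna(dna):
    """Return the mRNA sequence matching a cDNA strand (T replaced by U)."""
    return dna.upper().replace("T", "U")

# Made-up sequences, not taken from the sequence listing.
print(reverse_complement("ATGC"))
print(cdna_strand_to_mrna("ATGTTC"))
```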
The present invention also relates to secretion vectors capable of directing the secretion of a protein of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the purification of desired proteins.
The present invention also relates to expression vectors capable of directing the expression of an inserted gene in a desired spatial or temporal manner or at a desired level. Such vectors may include sequences upstream of the extended cDNAs such as promoters or upstream regulatory sequences.
In addition, the present invention may also be used for gene therapy to control or treat genetic diseases. Signal peptides may also be fused to heterologous proteins to direct their extracellular secretion.
One embodiment of the present invention is a purified or isolated nucleic acid comprising the sequence of one of SEQ ID NOs: 134-180 or a sequence complementary thereto. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid comprising at least 10 consecutive bases of the sequence of one of SEQ ID NOs: 134-180 or one of the sequences complementary thereto. In one aspect of this embodiment, the nucleic acid comprises at least 15, 25, 30, 75 or 100 consecutive bases of one of the sequences of SEQ ID NOs: 134-180 or one of the sequences complementary thereto. The nucleic acid may be a recombinant nucleic acid.
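Whether a nucleic acid comprises at least a given number of consecutive bases of a reference sequence, or of its complement, can be checked with a sliding window. The sketch below is illustrative only: `shares_span`, its `min_len` parameter, and the reference string are hypothetical and do not correspond to any entry in SEQ ID NOs: 134-180.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the fully complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

def shares_span(candidate, reference, min_len=10):
    """True if candidate contains at least min_len consecutive bases of
    reference or of the strand complementary to reference."""
    candidate = candidate.upper()
    for strand in (reference.upper(), reverse_complement(reference)):
        # Slide a window of min_len bases along the strand.
        for i in range(len(strand) - min_len + 1):
            if strand[i:i + min_len] in candidate:
                return True
    return False

reference = "ATGGCCAAGTTTCCGGA"          # made-up reference sequence
probe = "NNN" + reference[3:13] + "NNN"  # carries exactly 10 consecutive reference bases
print(shares_span(probe, reference))
```

Checking only the minimum window length is sufficient, because any longer shared span necessarily contains a shared span of `min_len` bases.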
Another embodiment of the present invention is a purified or isolated nucleic acid of at least bases capable of hybridizing under stringent conditions to the sequence of one of SEQ ID NOs: 134-180 or a sequence complementary to one of the sequences of SEQ ID NOs: 134-180. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid comprising the full coding sequences of one of SEQ ID NOs: 134-180, wherein the full coding sequence optionally comprises the sequence encoding signal peptide as well as the sequence encoding mature protein. In one aspect of this embodiment, the nucleic acid is recombinant.
A further embodiment of the present invention is a purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 134-180 which encode a mature protein. In one aspect of this embodiment, the nucleic acid is recombinant.
Yet another embodiment of the present invention is a purified or isolated nucleic acid comprising the nucleotides of one of SEQ ID NOs: 134-180 which encode the signal peptide. In one aspect of this embodiment, the nucleic acid is recombinant.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of one of the sequences of SEQ ID NOs: 181-227.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of a mature protein included in one of the sequences of SEQ ID NOs: 181-227.
Another embodiment of the present invention is a purified or isolated nucleic acid encoding a polypeptide having the sequence of a signal peptide included in one of the sequences of SEQ ID NOs: 181-227.
Yet another embodiment of the present invention is a purified or isolated protein comprising the sequence of one of SEQ ID NOs: 181-227.
Another embodiment of the present invention is a purified or isolated polypeptide comprising at least 10 consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227. In one aspect of this embodiment, the purified or isolated polypeptide comprises at least 15, 20, 25, 35, 50, 75, 100, 150 or 200 consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227. In still another aspect, the purified or isolated polypeptide comprises at least 25 consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227.
Another embodiment of the present invention is an isolated or purified polypeptide comprising a signal peptide of one of the polypeptides of SEQ ID NOs: 181-227.
Yet another embodiment of the present invention is an isolated or purified polypeptide comprising a mature protein of one of the polypeptides of SEQ ID NOs: 181-227.
A further embodiment of the present invention is a method of making a protein comprising one of the sequences of SEQ ID NOs: 181-227, comprising the steps of obtaining a cDNA comprising one of the sequences of SEQ ID NOs: 134-180, inserting the cDNA in an expression vector such that the cDNA is operably linked to a promoter, and introducing the expression vector into a host cell whereby the host cell produces the protein encoded by said cDNA. In one aspect of this embodiment, the method further comprises the step of isolating the protein.
Another embodiment of the present invention is a protein obtainable by the method described in the preceding paragraph.
Another embodiment of the present invention is a method of making a protein comprising the amino acid sequence of the mature protein contained in one of the sequences of SEQ ID NOs: 181-227, comprising the steps of obtaining a cDNA comprising the nucleotides of one of the sequences of SEQ ID NOs: 134-180 which encode the mature protein, inserting the cDNA in an expression vector such that the cDNA is operably linked to a promoter, and introducing the expression vector into a host cell whereby the host cell produces the mature protein encoded by the cDNA. In one aspect of this embodiment, the method further comprises the step of isolating the protein.
Another embodiment of the present invention is a mature protein obtainable by the method described in the preceding paragraph.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids comprising the sequence of one of SEQ ID NOs: 134-180 or a sequence complementary thereto described herein.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids comprising the full coding sequences of one of SEQ ID NOs: 134-180, wherein the full coding sequence comprises the sequence encoding signal peptide and the sequence encoding mature protein described herein.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids comprising the nucleotides of one of SEQ ID NOs: 134-180 which encode a mature protein which are described herein.
Another embodiment of the present invention is a host cell containing the purified or isolated nucleic acids comprising the nucleotides of one of SEQ ID NOs: 134-180 which encode the signal peptide which are described herein.
Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a protein having the sequence of one of SEQ ID NOs: 181-227. In one aspect of this embodiment, the antibody is capable of binding to a polypeptide comprising at least 10 consecutive amino acids of the sequence of one of SEQ ID NOs: 181-227.
Another embodiment of the present invention is an array of cDNAs or fragments thereof of at least nucleotides in length which includes at least one of the sequences of SEQ ID NOs: 134-180, or one of the sequences complementary to the sequences of SEQ ID NOs: 134-180, or a fragment thereof of at least 15 consecutive nucleotides. In one aspect of this embodiment, the array includes at least two of the sequences of SEQ ID NOs: 134-180, the sequences complementary to the sequences of SEQ ID NOs: 134-180, or fragments thereof of at least 15 consecutive nucleotides. In another aspect of this embodiment, the array includes at least five of the sequences of SEQ ID NOs: 134-180, the sequences complementary to the sequences of SEQ ID NOs: 134-180, or fragments thereof of at least 15 consecutive nucleotides.
A further embodiment of the invention encompasses purified polynucleotides comprising an insert from a clone deposited in ATCC accession No. 98619 or a fragment thereof comprising a contiguous span of at least 8, 10, 12, 15, 20, 25, 40, 60, 100, or 200 nucleotides of said insert. An additional embodiment of the invention encompasses purified polypeptides which comprise, consist of, or consist essentially of an amino acid sequence encoded by the insert from a clone deposited in ATCC accession No. 98619, as well as polypeptides which comprise a fragment of said amino acid sequence consisting of a signal peptide, a mature protein, or a contiguous span of at least 5, 8, 10, 12, 15, 20, 25, 40, 60, 100, or 200 amino acids encoded by said insert.
An additional embodiment of the invention encompasses purified polypeptides which comprise a contiguous span of at least 5, 8, 10, 12, 15, 40, 60, 100, or 200 amino acids of SEQ ID NOs: 185, 186, 191, 192, 200, 201, 213, 214, 215, or 227, wherein said contiguous span comprises at least one of the amino acid positions which was not shown to be identical to a public sequence in any of Figures 9 to 16. Also encompassed by the invention are purified polynucleotides encoding said polypeptides.
In another embodiment, the present invention provides a purified or isolated polypeptide comprising a sequence of any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227.
In another embodiment, the present invention provides a purified or isolated polypeptide comprising a mature protein of any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227.
In another embodiment, the present invention provides a purified or isolated polypeptide comprising a signal peptide of any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227.
In another embodiment, the present invention provides a purified or isolated polypeptide comprising an amino acid sequence selected from the group consisting of:
a) at least 58 consecutive amino acids of SEQ ID NO: 186;
b) at least 33 consecutive amino acids of SEQ ID NO: 197;
c) at least 64 consecutive amino acids of SEQ ID NO: 201;
d) at least 62 consecutive amino acids of SEQ ID NO: 214;
e) at least 23 consecutive amino acids of SEQ ID NO: 223;
f) at least 10 consecutive amino acids of SEQ ID NO: 227; and
g) any fragment comprising at least 10 amino acids of SEQ ID NO: 186, 197, 201, 214, 223, or 227,
wherein the polypeptide has at least one biological activity of a polypeptide of the invention.
In another embodiment, the present invention provides a purified or isolated polypeptide comprising a sequence which is at least 95% identical to any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227, wherein the polypeptide has at least one biological activity of a polypeptide of the invention.
Preferably, the biological activity is cell adhesion.
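A percent-identity threshold such as the 95% figure above depends on how identity is counted. The sketch below assumes one common convention: the two sequences are supplied already aligned (gaps written as "-"), and identity is the number of identical residue pairs divided by the number of alignment columns. The function name and the toy alignment are hypothetical, and other conventions (for example, dividing by the shorter sequence length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows are gaps; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Toy alignment for illustration only (hypothetical sequences).
row_a = "MKT-LVAGRGDSPLQWAAKK"
row_b = "MKTALVAGRGESPLQWAAKK"
identity = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identical; meets 95% threshold: {identity >= 95.0}")
```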
In another embodiment, the present invention provides a purified or isolated nucleic acid sequence encoding a polypeptide of the invention, or a sequence complementary thereto.
Preferably, the nucleic acid comprises a sequence selected from SEQ ID NOs: 139, 154, 150, 167, 176 or 180.
In another embodiment, the present invention provides a method of making a polypeptide comprising one of the sequences of SEQ ID NO: 186, 197, 201, 214, 223 or 227, the method comprising the steps of: obtaining a nucleic acid comprising one of the sequences of SEQ ID NO: 139, 154, 150, 167, 176 or 180; inserting said nucleic acid in an expression vector such that said nucleic acid is operably linked to a promoter; and introducing said expression vector into a host cell whereby said host cell produces the polypeptide encoded by said nucleic acid.
Preferably, the method further comprises the step of isolating said polypeptide.
In another embodiment, the present invention provides a polypeptide obtainable by a method of the invention.
In another embodiment, the present invention provides a host cell containing a recombinant nucleic acid according to the invention.
In another embodiment, the present invention provides a purified or isolated antibody capable of specifically binding to a polypeptide having a sequence of any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227.
In another embodiment, the present invention provides an array of polynucleotides comprising at least one polynucleotide selected from the group consisting of: a) a polynucleotide sequence of SEQ ID NOs: 139, 150, 154, 167, 176 or 180; b) a polynucleotide encoding a polypeptide fragment of SEQ ID NO: 186, 197, 201, 214, 223 or 227, wherein the polypeptide fragment has at least one biological activity of the polypeptide provided as SEQ ID NO: 186, 197, 201, 214, 223 or 227; c) a polynucleotide encoding a signal sequence of SEQ ID NO: 186, 197, 201, 214, 223 or 227; d) a polynucleotide encoding a polypeptide which is at least 90% identical to any one of SEQ ID NOs: 186, 197, 201, 214, 223 or 227, wherein the polypeptide has at least one biological activity of the polypeptide provided as SEQ ID NO: 186, 197, 201, 214, 223 or 227; and e) a polynucleotide sequence complementary to any one of a) to d).
In one aspect, the present invention provides a purified or isolated polypeptide comprising a sequence of SEQ ID NO: 185 or SEQ ID NO: 215.
In another aspect, the present invention provides a purified or isolated polypeptide comprising a sequence which is at least 80% identical to SEQ ID NO: 185 or SEQ ID NO: 215, wherein the polypeptide has at least one biological activity of the above-defined polypeptide.
Preferably, the biological activity is regulating protein-protein interaction in the MAP kinase pathway.
In a further aspect, the present invention provides a purified or isolated nucleic acid molecule encoding a polypeptide according to the invention, or a sequence complementary thereto.
In an embodiment, the nucleic acid comprises SEQ ID NO: 138 or SEQ ID NO: 168.
In another aspect, the present invention provides a purified or isolated nucleic acid molecule consisting of at least 30, 40, 50, 75, or 100 consecutive bases of SEQ ID NO: 138 or SEQ ID NO: 168, or a sequence complementary thereto.
In a further aspect, the present invention provides an expression vector comprising the nucleic acid molecule according to the invention.
In a further aspect, the present invention provides a host cell comprising a recombinant nucleic acid molecule according to the invention.
In a further aspect, the present invention provides a method of making a polypeptide of the invention comprising the steps of: (i) obtaining a cDNA comprising a nucleic acid molecule of the invention; (ii) inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter; and (iii) introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA.
Preferably, the method further comprises the step of isolating said polypeptide.
In a further aspect, the present invention provides a purified or isolated antibody capable of specifically binding to a polypeptide of the invention.
In a further aspect, the present invention provides for the use of the polypeptide of the invention for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
In a further aspect, the present invention provides for the use of the nucleic acid molecule of the invention for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
In a further aspect, the present invention provides for the use of the antibody of the invention for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
In a further aspect, the present invention provides a method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject a polypeptide of the invention.
In a further aspect, the present invention provides a method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject a nucleic acid molecule of the invention.
In a further aspect, the present invention provides a method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject an antibody of the invention.
In a further aspect, the present invention provides for the use of a polypeptide of the invention for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
In a further aspect, the present invention provides for the use of a nucleic acid molecule of the invention for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
In a further aspect, the present invention provides for the use of an antibody of the invention for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
Any discussion of documents, acts, materials, devices, articles or the like which has been included in the present specification is solely for the purpose of providing a context for the present invention. It is not to be taken as an admission that any or all of these matters form part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.
Brief Description of the Drawings
Figure 1 is a summary of a procedure for obtaining cDNAs which have been selected to include the 5' ends of the mRNAs from which they are derived.
Figure 2 is an analysis of the 43 amino terminal amino acids of all human SwissProt proteins to determine the frequency of false positives and false negatives using the techniques for signal peptide identification described herein.
Figure 3 shows the distribution of von Heijne scores of 5' ESTs in each of the categories described herein and the probability that these 5' ESTs encode a signal peptide.
Figure 4 shows the distribution of 5' ESTs in each category and the number of 5' ESTs in each category having a given minimum von Heijne's score.
Figure 5 shows the tissues from which the mRNAs corresponding to the ESTs in each of the categories described herein were obtained.
Figure 6 is a map of pED6dpc2. pED6dpc2 is derived from pED6dpc1 by insertion of a new polylinker to facilitate cDNA cloning. SST cDNAs are cloned between EcoRI and NotI. pED vectors are described in Kaufman et al. (1991), NAR 19:4485-4490.
Figure 7 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags.
Figure 8 describes the transcription factor binding sites present in each of these promoters.
Figure 9 depicts an amino acid alignment between SEQ ID NO: 214 and murine SH3BGRL (AF042081). Identities and conservative substitutions are marked with distinct symbols. The cell attachment motif (RGD) is in bold type and the proline rich region is underlined.
Figure 10 depicts a multiple amino acid alignment between SEQ ID NOs: 185 and 215, and murine MEK binding partner (AF082526). Positions conserved in all three proteins are marked.
Figure 11 depicts an amino acid alignment between SEQ ID NO: 186 and murine claudin-2 (AF072128). Identities and conservative substitutions are marked with distinct symbols.
Figure 12 depicts an amino acid alignment between SEQ ID NO: 213 and GMF-γ (AB001993). In the alignment presented, the translation starts at position 2 of SEQ ID NO: 166. The actual start methionine of SEQ ID NO: 213 appears to be at position 13. Identities and conservative substitutions are marked with distinct symbols.
Figure 13 depicts an amino acid alignment between SEQ ID NO: 191 and Derwent Protein Sequence Database Accession NO: W36955. Identities and conservative substitutions are marked with distinct symbols.
Figure 14 depicts an amino acid alignment between SEQ ID NO: 200 and human Ring zinc finger protein (AF037204). Amino acids defining an EGF-like domain are highlighted. The region defining an almost perfect Ring Finger domain is boxed. Identities and conservative substitutions are marked with distinct symbols.
Figure 15 depicts an amino acid alignment between SEQ ID NO: 192 and Y15286. Identities and conservative substitutions are marked with distinct symbols.
Figure 16 depicts a multiple amino acid alignment between SEQ ID NOs: 201 and 227, and human stomatin (X85116). Positions conserved in all three proteins are marked. The amino acid sequences in SEQ ID NOs: 201 and 227 differ in their N-terminal sequences: segment 1-76 (SEQ ID NO: 201) and segment 1-26 (SEQ ID NO: 227). The remainder of these two proteins are 99.5% identical. The band 7 protein family signature is boxed. The microbody C-terminal targeting signal appears in bold type.
Detailed Description of the Preferred Embodiment
1. Obtaining 5' ESTs
The present extended cDNAs were obtained using 5' ESTs which were isolated as described below.
A. Chemical Methods for Obtaining mRNAs having Intact 5' Ends
In order to obtain the 5' ESTs used to obtain the extended cDNAs of the present invention, mRNAs having intact 5' ends must be obtained. Currently, there are two approaches for obtaining such mRNAs.
One of these approaches is a chemical modification method involving derivatization of the 5' ends of the mRNAs and selection of the derivatized mRNAs. The 5' ends of eukaryotic mRNAs possess a structure referred to as a "cap" which comprises a guanosine methylated at the 7 position. The cap is joined to the first transcribed base of the mRNA by a 5',5'-triphosphate bond. In some instances, the 5' guanosine is methylated in both the 2 and 7 positions. Rarely, the 5' guanosine is trimethylated at the 2, 2 and 7 positions. In the chemical method for obtaining mRNAs having intact 5' ends, the 5' cap is specifically derivatized and coupled to a reactive group on an immobilizing substrate. This specific derivatization is based on the fact that only the ribose linked to the methylated guanosine at the 5' end of the mRNA and the ribose linked to the base at the 3' terminus of the mRNA possess 2',3'-cis diols. Optionally, where the 3' terminal ribose has a 2',3'-cis diol, the 2',3'-cis diol at the 3' end may be chemically modified, substituted, converted, or eliminated, leaving only the ribose linked to the methylated guanosine at the 5' end of the mRNA with a 2',3'-cis diol. A variety of techniques are available for eliminating the 2',3'-cis diol on the 3' terminal ribose. For example, controlled alkaline hydrolysis may be used to generate mRNA fragments in which the 3' terminal ribose is a 3'-phosphate, 2'-phosphate or (2',3')-cyclophosphate. Thereafter, the fragment which includes the original 3' ribose may be eliminated from the mixture through chromatography on an oligo-dT column. Alternatively, a base which lacks the 2',3'-cis diol may be added to the 3' end of the mRNA using an RNA ligase such as T4 RNA ligase. Example 1 below describes a method for ligation of pCp to the 3' end of messenger RNA.
EXAMPLE 1
Ligation of the Nucleoside Diphosphate pCp to the 3' End of Messenger RNA
1 μg of RNA was incubated in a final reaction medium of 10 μl in the presence of 5 U of T4 phage RNA ligase in the buffer provided by the manufacturer (Gibco-BRL), 40 U of the RNase inhibitor RNasin (Promega) and 2 μl of 32pCp (Amersham #PB 10208). The incubation was performed at 37°C for 2 hours or overnight at 7-8°C.
Following modification or elimination of the 2',3'-cis diol at the 3' ribose, the 2',3'-cis diol present at the 5' end of the mRNA may be oxidized using reagents such as NaBH4, NaBH3CN, or sodium periodate, thereby converting the 2',3'-cis diol to a dialdehyde. Example 2 describes the oxidation of the 2',3'-cis diol at the 5' end of the mRNA with sodium periodate.
EXAMPLE 2
Oxidation of 2',3'-cis diol at the 5' End of the mRNA
0.1 OD unit of either a capped oligoribonucleotide of 47 nucleotides (including the cap) or an uncapped oligoribonucleotide of 46 nucleotides was treated as follows. The oligoribonucleotides were produced by in vitro transcription using the transcription kit "AmpliScribe T7" (Epicentre Technologies).
As indicated below, the DNA template for the RNA transcript contained a single cytosine. To synthesize the uncapped RNA, all four NTPs were included in the in vitro transcription reaction. To obtain the capped RNA, GTP was replaced by an analogue of the cap, m7G(5')ppp(5')G. This compound, recognized by the polymerase, was incorporated into the 5' end of the nascent transcript during the step of initiation of transcription but was not capable of incorporation during the extension step. Consequently, the resulting RNA contained a cap at its 5' end. The sequences of the oligoribonucleotides produced by the in vitro transcription reaction were:
+Cap: 5'-m7GpppGCAUCCUACUCCCAUCCAAUUCCACCCUAACUCCUCCCAUCUCCAC-3' (SEQ ID NO: 1)
-Cap: 5'-pppGCAUCCUACUCCCAUCCAAUUCCACCCUAACUCCUCCCAUCUCCAC-3' (SEQ ID NO: 2)
The oligoribonucleotides were dissolved in 9 μl of acetate buffer (0.1 M sodium acetate, pH 5.2) and 3 μl of freshly prepared 0.1 M sodium periodate solution. The mixture was incubated for 1 hour in the dark at 4°C or room temperature. Thereafter, the reaction was stopped by adding 4 μl of 10% ethylene glycol. The product was ethanol precipitated, resuspended in 10 μl or more of water or appropriate buffer and dialyzed against water.
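The design logic of the test transcript in Example 2 can be verified directly: the transcribed sequence is 46 nucleotides long and contains guanosine only at its first position, so its DNA template carries a single cytosine and a cap analogue supplied in place of GTP can be incorporated only at initiation. The sketch below is illustrative; the helper name `template_strand` is an assumption introduced here.

```python
# Transcribed portion shared by SEQ ID NOs: 1 and 2 (cap and 5' triphosphate omitted).
TRANSCRIPT = "GCAUCCUACUCCCAUCCAAUUCCACCCUAACUCCUCCCAUCUCCAC"

# RNA base -> complementary DNA base on the template strand.
RNA_TO_TEMPLATE = {"A": "T", "U": "A", "G": "C", "C": "G"}


def template_strand(rna: str) -> str:
    """DNA template strand (written 5'->3') that would encode the given RNA."""
    return "".join(RNA_TO_TEMPLATE[base] for base in reversed(rna))


template = template_strand(TRANSCRIPT)
print("transcript length:", len(TRANSCRIPT))
print("G residues in transcript:", TRANSCRIPT.count("G"))
print("C residues in template:", template.count("C"))
print("transcript starts with G:", TRANSCRIPT.startswith("G"))
```

Counting the cap nucleotide as one additional residue accounts for the stated length of 47 for the capped species.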
The resulting aldehyde groups may then be coupled to molecules having a reactive amine group, such as hydrazine, carbazide, thiocarbazide or semicarbazide groups, in order to facilitate enrichment of the 5' ends of the mRNAs. Molecules having reactive amine groups which are suitable for use in selecting mRNAs having intact 5' ends include avidin, proteins, antibodies, vitamins, ligands capable of specifically binding to receptor molecules, or oligonucleotides. Example 3 below describes the coupling of the resulting dialdehyde to biotin.
EXAMPLE 3
Coupling of the Dialdehyde with Biotin
The oxidation product obtained in Example 2 was dissolved in 50 μl of sodium acetate at a pH of between 5 and 5.2 and 50 μl of a freshly prepared 0.02 M solution of biotin hydrazide in a methoxyethanol/water mixture (1:1) of formula:
[Structural formula of the biotin hydrazide; n denotes the number of repeating units]
In the compound used in these experiments, n=5, and the solid black dots represent oxygen.
However, it will be appreciated that other commercially available hydrazides may also be used, such as molecules of the formula above in which n has a different value.
The mixture was then incubated for 2 hours at 37°C. Following the incubation, the mixture was precipitated with ethanol and dialyzed against distilled water.
Example 4 demonstrates the specificity of the biotinylation reaction.
EXAMPLE 4
Specificity of Biotinylation
The specificity of the biotinylation for capped mRNAs was evaluated by gel electrophoresis of the following samples:
Sample 1. The 46 nucleotide uncapped in vitro transcript prepared as in Example 2 and labeled with 32pCp as described in Example 1.
Sample 2. The 46 nucleotide uncapped in vitro transcript prepared as in Example 2, labeled with 32pCp as described in Example 1, treated with the oxidation reaction of Example 2, and subjected to the biotinylation conditions of Example 3.
Sample 3. The 47 nucleotide capped in vitro transcript prepared as in Example 2 and labeled with 32pCp as described in Example 1.
Sample 4. The 47 nucleotide capped in vitro transcript prepared as in Example 2, labeled with 32pCp as described in Example 1, treated with the oxidation reaction of Example 2, and subjected to the biotinylation conditions of Example 3.
Samples 1 and 2 had identical migration rates, demonstrating that the uncapped RNAs were not oxidized and biotinylated. Sample 3 migrated more slowly than Samples 1 and 2, while Sample 4 exhibited the slowest migration. The difference in migration of the RNAs in Samples 3 and 4 demonstrates that the capped RNAs were specifically biotinylated.
In some cases, mRNAs having intact 5' ends may be enriched by binding the molecule containing a reactive amine group to a suitable solid phase substrate such as the inside of the vessel containing the mRNAs, magnetic beads, chromatography matrices, or nylon or nitrocellulose membranes. For example, where the molecule having a reactive amine group is biotin, the solid phase substrate may be coupled to avidin or streptavidin. Alternatively, where the molecule having the reactive amine group is an antibody or receptor ligand, the solid phase substrate may be coupled to the cognate antigen or receptor. Finally, where the molecule having a reactive amine group comprises an oligonucleotide, the solid phase substrate may comprise a complementary oligonucleotide.
The mRNAs having intact 5' ends may be released from the solid phase following the enrichment procedure. For example, where the dialdehyde is coupled to biotin hydrazide and the solid phase comprises streptavidin, the mRNAs may be released from the solid phase by simply heating to 95°C in 2% SDS. In some methods, the molecule having a reactive amine group may also be cleaved from the mRNAs having intact 5' ends following enrichment. Example 5 describes the capture of biotinylated mRNAs with streptavidin coated beads and the release of the biotinylated mRNAs from the beads following enrichment.
EXAMPLE 5
Capture and Release of Biotinylated mRNAs Using Streptavidin Coated Beads
The streptavidin-coated magnetic beads were prepared according to the manufacturer's instructions (CPG Inc., USA). The biotinylated mRNAs were added to a hybridization buffer (1.5 M NaCl, pH 5-6). After incubating for 30 minutes, the unbound and nonbiotinylated material was removed. The beads were washed several times in water with 1% SDS. The beads obtained were incubated for 15 minutes at 95°C in water containing 2% SDS.
Example 6 demonstrates the efficiency with which biotinylated mRNAs were recovered from the streptavidin coated beads.
EXAMPLE 6
Efficiency of Recovery of Biotinylated mRNAs
The efficiency of the recovery procedure was evaluated as follows. RNAs were labeled with 32pCp, oxidized, biotinylated and bound to streptavidin coated beads as described above. Subsequently, the bound RNAs were incubated for 5, 15 or 30 minutes at 95°C in the presence of 2% SDS.
The products of the reaction were analyzed by electrophoresis on 12% polyacrylamide gels under denaturing conditions (7 M urea). The gels were subjected to autoradiography. During this manipulation, the hydrazone bonds were not reduced.
Increasing amounts of nucleic acids were recovered as incubation times in 2% SDS increased, demonstrating that biotinylated mRNAs were efficiently recovered.
In an alternative method for obtaining mRNAs having intact 5' ends, an oligonucleotide which has been derivatized to contain a reactive amine group is specifically coupled to mRNAs having an intact cap.
Preferably, the 3' end of the mRNA is blocked prior to the step in which the aldehyde groups are joined to the derivatized oligonucleotide, as described above, so as to prevent the derivatized oligonucleotide from being joined to the 3' end of the mRNA. For example, pCp may be attached to the 3' end of the mRNA using T4 RNA ligase. However, as discussed above, blocking the 3' end of the mRNA is an optional step.
Derivatized oligonucleotides may be prepared as described below in Example 7.
EXAMPLE 7
Derivatization of the Oligonucleotide
An oligonucleotide phosphorylated at its 3' end was converted to a 3' hydrazide by treatment with an aqueous solution of hydrazine or of dihydrazide of the formula H2N(R)NH2 at about 1 to 3 M, and at pH 4.5, in the presence of a water-soluble carbodiimide type agent such as 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide at a final concentration of 0.3 M at a temperature of 8°C overnight.
The derivatized oligonucleotide was then separated from the other agents and products using a standard technique for isolating oligonucleotides.
As discussed above, the mRNAs to be enriched may be treated to eliminate the 3' OH groups which may be present thereon. This may be accomplished by enzymatic ligation of sequences lacking a 3' OH, such as pCp, as described above in Example 1. Alternatively, the 3' OH groups may be eliminated by alkaline hydrolysis as described in Example 8 below.
EXAMPLE 8
Alkaline Hydrolysis of mRNA
The mRNAs may be treated with alkaline hydrolysis as follows. In a total volume of 100 μl of 0.1 N sodium hydroxide, 1.5 μg mRNA is incubated for 40 to 60 minutes at 4°C. The solution is neutralized with acetic acid and precipitated with ethanol.
Following the optional elimination of the 3' OH groups, the diol groups at the 5' ends of the mRNAs are oxidized as described below in Example 9.
EXAMPLE 9
Oxidation of Diols
Up to 1 OD unit of RNA was dissolved in 9 μl of buffer (0.1 M sodium acetate, pH 6-7 or water) and 3 μl of freshly prepared 0.1 M sodium periodate solution. The reaction was incubated for 1 h in the dark at 4°C or room temperature. Following the incubation, the reaction was stopped by adding 4 μl of ethylene glycol. Thereafter the mixture was incubated at room temperature for 15 minutes. After ethanol precipitation, the product was resuspended in 10 μl or more of water or appropriate buffer and dialyzed against water.
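The working concentration of periodate in the oxidation reaction follows from the stated volumes by simple dilution arithmetic. The sketch below assumes that the dissolved RNA contributes no additional volume, so that 9 μl of buffer plus 3 μl of 0.1 M sodium periodate gives a 12 μl reaction; the function and variable names are hypothetical.

```python
def final_concentration(stock_molar: float, stock_ul: float, total_ul: float) -> float:
    """Concentration after diluting `stock_ul` of stock into a total of `total_ul`."""
    if total_ul <= 0 or stock_ul < 0 or stock_ul > total_ul:
        raise ValueError("volumes must satisfy 0 <= stock_ul <= total_ul and total_ul > 0")
    return stock_molar * stock_ul / total_ul


buffer_ul = 9.0        # buffer volume from the protocol
periodate_ul = 3.0     # 0.1 M sodium periodate stock
reaction_ul = buffer_ul + periodate_ul  # assumes the RNA adds no volume

periodate_final_molar = final_concentration(0.1, periodate_ul, reaction_ul)
print(f"reaction volume: {reaction_ul:.0f} ul")
print(f"periodate in reaction: {periodate_final_molar * 1000:.0f} mM")
```

Under these assumptions the periodate is present at 25 mM during the oxidation step.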
Following oxidation of the diol groups at the 5' ends of the mRNAs, the derivatized oligonucleotide was joined to the resulting aldehydes as described in Example 10.
EXAMPLE 10
Reaction of Aldehydes with Derivatized Oligonucleotides
The oxidized mRNA was dissolved in an acidic medium such as 50 μl of sodium acetate pH 4-6. A solution of the derivatized oligonucleotide was added such that an mRNA:derivatized oligonucleotide ratio of 1:20 was obtained, and the mixture was reduced with a borohydride. The mixture was allowed to incubate for 2 h at 37°C or overnight (14 h) at 10°C. The mixture was ethanol precipitated, resuspended in 10 μl or more of water or appropriate buffer and dialyzed against distilled water. If desired, the resulting product may be analyzed using acrylamide gel electrophoresis, HPLC analysis, or other conventional techniques.
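The 1:20 mRNA:derivatized oligonucleotide ratio is a molar ratio, so the amount of oligonucleotide to add depends on how many moles of mRNA are present. The sketch below is a rough estimate under assumptions that are not part of the protocol: an average mass of about 340 g/mol per RNA nucleotide, a mean mRNA length of 2,000 nucleotides, and an illustrative input of 7 μg of mRNA. All names are hypothetical.

```python
AVG_NT_MASS_G_PER_MOL = 340.0  # assumption: approximate mass of one RNA nucleotide


def mrna_pmol(mass_ug: float, mean_length_nt: float) -> float:
    """Approximate picomoles of mRNA molecules in a sample of the given mass."""
    grams = mass_ug * 1e-6
    moles = grams / (mean_length_nt * AVG_NT_MASS_G_PER_MOL)
    return moles * 1e12


def oligo_pmol_needed(mass_ug: float, mean_length_nt: float,
                      fold_excess: float = 20.0) -> float:
    """Picomoles of oligonucleotide giving the requested molar excess over mRNA."""
    return fold_excess * mrna_pmol(mass_ug, mean_length_nt)


# Illustrative inputs (assumptions, not protocol values).
mass_ug = 7.0
mean_length_nt = 2000.0
print(f"mRNA: {mrna_pmol(mass_ug, mean_length_nt):.1f} pmol")
print(f"oligonucleotide for 1:20: {oligo_pmol_needed(mass_ug, mean_length_nt):.1f} pmol")
```

Because the estimate scales inversely with the assumed mean transcript length, a population of shorter mRNAs requires proportionally more oligonucleotide for the same mass.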
Following the attachment of the derivatized oligonucleotide to the mRNAs, a reverse transcription reaction may be performed as described in Example 11 below.
EXAMPLE 11
Reverse Transcription of mRNAs
An oligodeoxyribonucleotide was derivatized as follows. 3 OD units of an oligodeoxyribonucleotide of sequence ATCAAGAATTCGCACGAGACCATTA (SEQ ID NO:3) having 5'-OH and 3'-P ends were dissolved in 70 μl of a 1.5 M hydroxybenzotriazole solution, pH 5.3, prepared in dimethylformamide/water (75:25) containing 2 μg of 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide. The mixture was incubated for 2 h 30 min at 22°C. The mixture was then precipitated twice in LiClO4/acetone.
The pellet was resuspended in 200 μl of 0.25 M hydrazine and incubated at 8°C from 3 to 14 h. Following the hydrazine reaction, the mixture was precipitated twice in LiClO4/acetone.
The messenger RNAs to be reverse transcribed were extracted from blocks of placenta having sides of 2 cm which had been stored at -80°C. The mRNA was extracted using conventional acidic phenol techniques. Oligo-dT chromatography was used to purify the mRNAs. The integrity of the mRNAs was checked by Northern blotting.
The diol groups on 7 μg of the placental mRNAs were oxidized as described above in Example 9.
The derivatized oligonucleotide was joined to the mRNAs as described in Example 10 above except that the precipitation step was replaced by an exclusion chromatography step to remove derivatized oligodeoxyribonucleotides which were not joined to mRNAs. Exclusion chromatography was performed as follows: an aliquot of AcA34 (BioSepra #230151) gel was equilibrated in 50 ml of a solution of 10 mM Tris pH 8.0, 300 mM NaCl, 1 mM EDTA, and 0.05% SDS. The mixture was allowed to sediment. The supernatant was eliminated and the gel was resuspended in 50 ml of buffer. This procedure was repeated 2 or 3 times.
A glass bead (diameter 3 mm) was introduced into a 2 ml disposable pipette (length 25 cm). The pipette was filled with the gel suspension until the height of the gel stabilized at 1 cm from the top of the pipette. The column was then equilibrated with 20 ml of equilibration buffer (10 mM Tris HCl pH 7.4, with NaCl).
An aliquot of the mRNA which had been reacted with the derivatized oligonucleotide was mixed in 39 μl of 10 mM urea and 2 μl of blue-glycerol buffer, which had been prepared by dissolving 5 mg of bromophenol blue in 60% glycerol (v/v) and passing the mixture through a filter with a pore diameter of 0.45 μm.
The column was loaded. As soon as the sample had penetrated, equilibration buffer was added. 100 μl fractions were collected. Derivatized oligonucleotide which had not been attached to mRNA appeared in fraction 16 and later fractions. Fractions 3 to 15 were combined and precipitated with ethanol.
The mRNAs which had been reacted with the derivatized oligonucleotide were spotted on a nylon membrane and hybridized to a radioactive probe using conventional techniques. The radioactive probe used in these hybridizations was an oligodeoxyribonucleotide of sequence TAATGGTCTCGTGCGAATTCTTGAT (SEQ ID NO:4) which was anticomplementary to the derivatized oligonucleotide and was labeled at its 5' end with 32P. 1/10th of the mRNAs which had been reacted with the derivatized oligonucleotide was spotted in two spots on the membrane and the membrane was visualized by autoradiography after hybridization of the probe. A signal was observed, indicating that the derivatized oligonucleotide had been joined to the mRNA.
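The relationship between the derivatized oligonucleotide (SEQ ID NO:3) and its anticomplementary probe can be checked by computing the reverse complement. The sketch below is illustrative only; the helper name `reverse_complement` is introduced here and is not part of the described procedure.

```python
# Map each DNA base to its Watson-Crick partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an unambiguous DNA sequence written 5'->3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


TAG = "ATCAAGAATTCGCACGAGACCATTA"  # derivatized oligonucleotide, SEQ ID NO:3
probe = reverse_complement(TAG)
print(probe)

# The tag carries an EcoRI recognition site (GAATTC), which is palindromic
# and therefore also appears in the probe.
print("GAATTC" in TAG, "GAATTC" in probe)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when transcribing oligonucleotide sequences by hand.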
The remaining 9/10 of the mRNAs which had been reacted with the derivatized oligonucleotide was reverse transcribed as follows. A reverse transcription reaction was carried out with reverse transcriptase following the manufacturer's instructions. To prime the reaction, 50 pmol of nonamers with random sequence were used.
A portion of the resulting cDNA was spotted on a positively charged nylon membrane using conventional methods. The cDNAs were spotted on the membrane after the cDNA:RNA heteroduplexes had been subjected to an alkaline hydrolysis in order to eliminate the RNAs. An oligonucleotide having a sequence identical to that of the derivatized oligonucleotide was labeled at its 5' end with 32P and hybridized to the cDNA blots using conventional techniques. Single-stranded cDNAs resulting from the reverse transcription reaction were spotted on the membrane. As controls, the blot contained a dilution series, ranging from 1 pmol down to 1 fmol and including 100 fmol and 10 fmol, of a control oligodeoxyribonucleotide of sequence identical to that of the derivatized oligonucleotide. The signal observed in the spots containing the cDNA indicated that approximately 15 fmol of the derivatized oligonucleotide had been reverse transcribed.
These results demonstrate that the reverse transcription can be performed through the cap and, in particular, that reverse transcriptase crosses the bond of the cap of eukaryotic messenger RNAs.
The single stranded cDNAs obtained after the above first strand synthesis were used as template for PCR reactions. Two types of reactions were carried out. First, specific amplification of the mRNAs for the alpha globin, dehydrogenase, pp15 and elongation factor E4 were carried out using the following pairs of oligodeoxyribonucleotide primers.
alpha-globin
GLO-S: CCG ACA AGA CCA ACG TCA AGG CCG C (SEQ ID NO:5)
GLO-As: TCA CCA GCA GGC AGT GGC TA GGA G 3' (SEQ ID NO:6)
dehydrogenase
3 DH-S: AGT GAT TCC TGC TAC TTT GGA TGG C (SEQ ID NO:7)
3 DH-As: GCT TGG TCT TGT TCT GGA GTT TAG A (SEQ ID NO:8)
pp15
PP15-S: TCC AGA ATG GGA GAC AAG CCA ATT T (SEQ ID NO:9)
PP15-As: AGG GAG GAG GAA ACA GCG TGA GTC C (SEQ ID NO:10)
Elongation factor E4
EFA1-S: ATG GGA AAG GAA AAG ACT CAT ATC A (SEQ ID NO:11)
EF1A-As: AGC AGC AAC AAT CAG GAC AGC ACA G (SEQ ID NO:12)
Non-specific amplifications were also carried out with the antisense (-As) oligodeoxyribonucleotides of the pairs described above and a primer chosen from the sequence of the derivatized oligodeoxyribonucleotide (ATCAAGAATTCGCACGAGACCATTA) (SEQ ID NO:13).
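Basic properties of such primers, for instance length and GC content, are often tabulated when a primer set is transcribed or reviewed. The sketch below computes both for three of the primers listed above; the dictionary labels and function names are introduced here for illustration, and GC fraction is offered only as a rough indicator rather than a melting-temperature prediction.

```python
# Primer sequences as listed above, keyed by descriptive labels (labels are hypothetical).
PRIMERS = {
    "alpha-globin sense": "CCG ACA AGA CCA ACG TCA AGG CCG C",
    "pp15 sense": "TCC AGA ATG GGA GAC AAG CCA ATT T",
    "elongation factor antisense": "AGC AGC AAC AAT CAG GAC AGC ACA G",
}


def normalize(seq: str) -> str:
    """Remove the triplet spacing and upper-case the sequence."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    bases = normalize(seq)
    if not bases:
        raise ValueError("empty sequence")
    return sum(bases.count(b) for b in "GC") / len(bases)


for label, seq in PRIMERS.items():
    print(f"{label}: {len(normalize(seq))} nt, GC {gc_fraction(seq):.0%}")
```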
A 1.5% agarose gel containing the following samples corresponding to the PCR products of reverse transcription was stained with ethidium bromide. (1/20th of the products of reverse transcription were used for each PCR reaction).
Sample 1: The products of a PCR reaction using the globin primers of SEQ ID NOs 5 and 6 in the presence of cDNA.
Sample 2: The products of a PCR reaction using the globin primers of SEQ ID NOs 5 and 6 in the absence of added cDNA.
Sample 3: The products of a PCR reaction using the dehydrogenase primers of SEQ ID NOs 7 and 8 in the presence of cDNA.
Sample 4: The products of a PCR reaction using the dehydrogenase primers of SEQ ID NOs 7 and 8 in the absence of added cDNA.
Sample 5: The products of a PCR reaction using the pp15 primers of SEQ ID NOs 9 and 10 in the presence of cDNA.
Sample 6: The products of a PCR reaction using the pp15 primers of SEQ ID NOs 9 and 10 in the absence of added cDNA.
Sample 7: The products of a PCR reaction using the EIE4 primers of SEQ ID NOs 11 and 12 in the presence of added cDNA.
Sample 8: The products of a PCR reaction using the EIE4 primers of SEQ ID NOs 11 and 12 in the absence of added cDNA.
In Samples 1, 3, 5 and 7, a band of the size expected for the PCR product was observed, indicating the presence of the corresponding sequence in the cDNA population.
PCR reactions were also carried out with the antisense oligonucleotides of the globin and dehydrogenase primers (SEQ ID NOs 6 and 8) and an oligonucleotide whose sequence corresponds to that of the derivatized oligonucleotide. The presence of PCR products of the expected size in the samples corresponding to samples 1 and 3 above indicated that the derivatized oligonucleotide had been incorporated.
The above examples summarize the chemical procedure for enriching mRNAs for those having intact 5' ends. Further detail regarding the chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No. WO 96/34981, published November 7, 1996.
Strategies based on the above chemical modifications to the 5' cap structure may be utilized to generate cDNAs which have been selected to include the 5' ends of the mRNAs from which they are derived. In one version of such procedures, the 5' ends of the mRNAs are modified as described above.
Thereafter, a reverse transcription reaction is conducted to extend a primer complementary to the mRNA to the 5' end of the mRNA. Single stranded RNAs are eliminated to obtain a population of cDNA/mRNA heteroduplexes in which the mRNA includes an intact 5' end. The resulting heteroduplexes may be captured on a solid phase coated with a molecule capable of interacting with the molecule used to derivatize the 5' end of the mRNA. Thereafter, the strands of the heteroduplexes are separated to recover single stranded first cDNA strands which include the 5' end of the mRNA. Second strand cDNA synthesis may then proceed using conventional techniques. For example, the procedures disclosed in WO 96/34981 or in Carninci, P. et al., High-Efficiency Full-Length cDNA Cloning by Biotinylated CAP Trapper. Genomics 37:327-336 (1996), may be employed to select cDNAs which include the sequence derived from the 5' end of the coding sequence of the mRNA.
Following ligation of the oligonucleotide tag to the 5' cap of the mRNA, a reverse transcription reaction is conducted to extend a primer complementary to the mRNA to the 5' end of the mRNA.
Following elimination of the RNA component of the resulting heteroduplex using standard techniques, second strand cDNA synthesis is conducted with a primer complementary to the oligonucleotide tag.
Figure 1 summarizes the above procedures for obtaining cDNAs which have been selected to include the 5' ends of the mRNAs from which they are derived.
B. Enzymatic Methods for Obtaining mRNAs having Intact 5' Ends
Other techniques for selecting cDNAs extending to the 5' end of the mRNA from which they are derived are fully enzymatic. Some versions of these techniques are disclosed in Dumas Milne-Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, Dec. 1993), EPO 625572 and Kato et al. Construction of a Human Full-Length cDNA Bank. Gene 150:243-250 (1994).
Briefly, in such approaches, isolated mRNA is treated with alkaline phosphatase to remove the phosphate groups present on the 5' ends of uncapped incomplete mRNAs. Following this procedure, the cap present on full length mRNAs is enzymatically removed with a decapping enzyme such as T4 polynucleotide kinase or tobacco acid pyrophosphatase. An oligonucleotide, which may be either a DNA oligonucleotide or a DNA-RNA hybrid oligonucleotide having RNA at its 3' end, is then ligated to the phosphate present at the 5' end of the decapped mRNA using T4 RNA ligase. The oligonucleotide may include a restriction site to facilitate cloning of the cDNAs following their synthesis. Example 12 below describes one enzymatic method based on the doctoral thesis of Dumas.
EXAMPLE 12
Enzymatic Approach for Obtaining 5' ESTs
Twenty micrograms of PolyA+ RNA were dephosphorylated using Calf Intestinal Phosphatase (Biolabs). After a phenol chloroform extraction, the cap structure of mRNA was hydrolyzed using the Tobacco Acid Pyrophosphatase (purified as described by Shinshi et al., Biochemistry 15: 2185-2190, 1976) and a hemi 5'DNA/RNA-3' oligonucleotide having an unphosphorylated 5' end, a stretch of adenosine ribophosphate at the 3' end, and an EcoRI site near the 5' end was ligated to the 5'P ends of mRNA using the T4 RNA ligase (Biolabs). Oligonucleotides suitable for use in this procedure are preferably 30-50 bases in length. Oligonucleotides having an unphosphorylated 5' end may be synthesized by adding a fluorochrome at the 5' end. The inclusion of a stretch of adenosine ribophosphates at the 3' end of the oligonucleotide increases ligation efficiency. It will be appreciated that the oligonucleotide may contain cloning sites other than EcoRI.
Following ligation of the oligonucleotide to the phosphate present at the 5' end of the decapped mRNA, first and second strand cDNA synthesis may be carried out using conventional methods or those specified in EPO 625,572 and Kato et al. Construction of a Human Full-Length cDNA Bank. Gene 150:243-250 (1994), and Dumas Milne-Edwards, supra. The resulting cDNA may then be ligated into vectors such as those disclosed in Kato et al. Construction of a Human Full-Length cDNA Bank. Gene 150:243-250 (1994) or other nucleic acid vectors known to those skilled in the art using techniques such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press (1989).
II. Characterization of 5' ESTs
The above chemical and enzymatic approaches for enriching mRNAs having intact 5' ends were employed to obtain 5' ESTs. First, mRNAs were prepared as described in Example 13 below.
EXAMPLE 13
Preparation of mRNA
Total human RNAs or PolyA+ RNAs derived from 29 different tissues were respectively purchased from LABIMO and CLONTECH and used to generate 44 cDNA libraries as described below. The purchased RNA had been isolated from cells or tissues using acid guanidinium thiocyanate-phenol-chloroform extraction (Chomczynski, P. and Sacchi, N., Analytical Biochemistry 162:156-159, 1987). PolyA+ RNA was isolated from total RNA (LABIMO) by two passes of oligo dT chromatography, as described by Aviv and Leder (Aviv, H. and Leder, P., Proc. Natl. Acad. Sci. USA 69:1408-1412, 1972) in order to eliminate ribosomal RNA.
The quality and the integrity of the PolyA+ RNA were checked. Northern blots hybridized with a globin probe were used to confirm that the mRNAs were not degraded. Contamination of the PolyA+ mRNAs by ribosomal sequences was checked using RNA blots and a probe derived from the sequence of the 28S RNA. Preparations of mRNAs with less than 5% of ribosomal RNAs were used in library construction. To avoid constructing libraries with RNAs contaminated by exogenous sequences (prokaryotic or fungal), the presence of bacterial 16S ribosomal sequences or of two highly expressed mRNAs was examined using PCR.
Following preparation of the mRNAs, the chemical and/or enzymatic procedures described above for enriching mRNAs having intact 5' ends were employed to obtain 5' ESTs from various tissues. In both approaches an oligonucleotide tag was attached to the cap at the 5' ends of the mRNAs. The oligonucleotide tag had an EcoRI site therein to facilitate later cloning procedures.
Following attachment of the oligonucleotide tag to the mRNA by either the chemical or enzymatic methods, the integrity of the mRNA was examined by performing a Northern blot with 200-500 ng of mRNA using a probe complementary to the oligonucleotide tag.
EXAMPLE 14
cDNA Synthesis Using mRNA Templates Having Intact 5' Ends
For the mRNAs joined to oligonucleotide tags using both the chemical and enzymatic methods, first strand cDNA synthesis was performed with reverse transcriptase using random nonamers as primers. In order to protect internal EcoRI sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was used for first strand synthesis. After removal of RNA by an alkaline hydrolysis, the first strand of cDNA was precipitated using isopropanol in order to eliminate residual primers.
For both the chemical and the enzymatic methods, synthesis of the second strand of the cDNA was conducted as follows. After removal of RNA by alkaline hydrolysis, the first strand of cDNA was precipitated using isopropanol in order to eliminate residual primers. The second strand of the cDNA was synthesized with Klenow using a primer corresponding to the 5' end of the ligated oligonucleotide described in Example 12. Preferably, the primer is 20-25 bases in length. Methylated dCTP was also used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.
Following cDNA synthesis, the cDNAs were cloned into pBlueScript as described in Example 15 below.
EXAMPLE 15
Insertion of cDNAs into BlueScript
Following second strand synthesis, the ends of the cDNA were blunted with T4 DNA polymerase (Biolabs) and the cDNA was digested with EcoRI. Since methylated dCTP was used during cDNA synthesis, the EcoRI site present in the tag was the only site which was hemi-methylated. Consequently, only the EcoRI site in the oligonucleotide tag was susceptible to EcoRI digestion. The cDNA was then size fractionated using exclusion chromatography (AcA, Biosepra). Fractions corresponding to cDNAs of more than 150 bp were pooled and ethanol precipitated. The cDNA was directionally cloned into the SmaI and EcoRI ends of the phagemid pBlueScript vector (Stratagene). The ligation mixture was electroporated into bacteria and propagated under appropriate antibiotic selection.
Clones containing the oligonucleotide tag attached were selected as described in Example 16 below.
EXAMPLE 16
Selection of Clones Having the Oligonucleotide Tag Attached Thereto
The plasmid DNAs containing 5' EST libraries made as described above were purified (Qiagen). A positive selection of the tagged clones was performed as follows. Briefly, in this selection procedure, the plasmid DNA was converted to single stranded DNA using gene II endonuclease of the phage F1 in combination with an exonuclease (Chang et al., Gene 127:95-8 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA was then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124-131 (1992). In this procedure, the single stranded DNA was hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide described in Example 13. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary to the biotinylated oligonucleotide were captured by incubation with streptavidin coated magnetic beads followed by magnetic selection. After capture of the positive clones, the plasmid DNA was released from the magnetic beads and converted into double stranded DNA using a DNA polymerase such as the ThermoSequenase obtained from Amersham Pharmacia Biotech. Alternatively, protocols such as the Gene Trapper kit (Gibco BRL) may be used. The double stranded DNA was then electroporated into bacteria. The percentage of positive clones having the 5' tag oligonucleotide was estimated by dot blot analysis to typically range between 90 and 98%.
Following electroporation, the libraries were ordered in 384-well microtiter plates (MTP). A copy of the MTP was stored for future needs. Then the libraries were transferred into 96-well MTP and sequenced as described below.
EXAMPLE 17
Sequencing of Inserts in Selected Clones
Plasmid inserts were first amplified by PCR on PE 9600 thermocyclers (Perkin-Elmer), using standard SETA-A and SETA-B primers (Genset SA), AmpliTaqGold (Perkin-Elmer), dNTPs (Boehringer), buffer and cycling conditions as recommended by the Perkin-Elmer Corporation.
PCR products were then sequenced using automatic ABI Prism 377 sequencers (Perkin Elmer, Applied Biosystems Division, Foster City, CA). Sequencing reactions were performed using PE 9600 thermocyclers (Perkin Elmer) with standard dye-primer chemistry and ThermoSequenase (Amersham Life Science). The primers used were either T7 or 21M13 (available from Genset SA) as appropriate. The primers were labeled with the JOE, FAM, ROX and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions were purchased from Boehringer. Sequencing buffer, reagent concentrations and cycling conditions were as recommended by Amersham.
Following the sequencing reaction, the samples were precipitated with EtOH, resuspended in formamide loading buffer, and loaded on a standard 4% acrylamide gel. Electrophoresis was performed for hours at 3000V on an ABI 377 sequencer, and the sequence data were collected and analyzed using the ABI Prism DNA Sequencing Analysis Software, version 2.1.2.
The sequence data from the 44 cDNA libraries made as described above were transferred to a proprietary database, where quality control and validation steps were performed. A proprietary base-caller ("Trace"), running on a Unix system, automatically flagged suspect peaks, taking into account the shape of the peaks, the inter-peak resolution, and the noise level. The proprietary base-caller also performed an automatic trimming. Any stretch of 25 or fewer bases having more than 4 suspect peaks was considered unreliable and was discarded. Sequences corresponding to cloning vector or ligation oligonucleotides were automatically removed from the EST sequences. However, the resulting EST sequences may contain 1 to bases belonging to the above mentioned sequences at their 5' end. If needed, these can easily be removed on a case by case basis.
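The discard rule stated above (a stretch of 25 or fewer bases containing more than 4 suspect peaks is unreliable) can be illustrated with a minimal sketch. This is not the proprietary base-caller: the function names, the per-base boolean flags, and the choice to mark only the span running from the first to the last suspect peak of each offending stretch are all assumptions made for illustration.

```python
def unreliable_mask(suspect, window=25, max_suspect=4):
    """Mark bases lying in a stretch of <= `window` bases that holds
    more than `max_suspect` suspect peaks.

    `suspect` is a list of booleans, one per base call (hypothetical
    representation of the base-caller's flags).
    """
    positions = [i for i, flag in enumerate(suspect) if flag]
    mask = [False] * len(suspect)
    # A stretch has more than `max_suspect` suspect peaks exactly when it
    # spans from one suspect peak to the one `max_suspect` peaks later.
    for i in range(len(positions) - max_suspect):
        start, end = positions[i], positions[i + max_suspect]
        if end - start + 1 <= window:
            for j in range(start, end + 1):
                mask[j] = True
    return mask


def discard_unreliable(sequence, suspect, window=25, max_suspect=4):
    """Return the sequence with the unreliable bases removed."""
    mask = unreliable_mask(suspect, window, max_suspect)
    return "".join(base for base, bad in zip(sequence, mask) if not bad)
```

In practice a trimming step would more likely clip the read at the first unreliable stretch than splice it out; the mask makes either policy easy to apply.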
Thereafter, the sequences were transferred to the proprietary NETGENE™ database for further analysis as described below.
Following sequencing as described above, the sequences of the 5' ESTs were entered in a proprietary database called NETGENE™ for storage and manipulation. It will be appreciated by those skilled in the art that the data could be stored and manipulated on any medium which can be read and accessed by a computer. Computer readable media include magnetically readable media, optically readable media, or electronically readable media. For example, the computer readable media may be a hard disc, a floppy disc, a magnetic tape, CD-ROM, RAM, or ROM as well as other types of media known to those skilled in the art.
In addition, the sequence data may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, the sequence data may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE.
The computer readable media on which the sequence information is stored may be in a personal computer, a network, a server or other computer systems known to those skilled in the art. The computer or other system preferably includes the storage media described above, and a processor for accessing and manipulating the sequence data.
Once the sequence data has been stored it may be manipulated and searched to locate those stored sequences which contain a desired nucleic acid sequence or which encode a protein having a particular functional domain. For example, the stored sequence information may be compared to other known sequences to identify homologies, motifs implicated in biological function, or structural motifs.
Programs which may be used to search or compare the stored sequences include the MacPattern (EMBL), BLAST, and BLAST2 program series (NCBI), basic local alignment search tool programs for nucleotide (BLASTN) and peptide (BLASTX) comparisons (Altschul et al., J. Mol. Biol. 215:403 (1990)) and FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444 (1988)). The BLAST programs then extend the alignments on the basis of defined match and mismatch criteria.
Motifs which may be detected using the above programs include sequences encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
Before searching the cDNAs in the NETGENE™ database for sequence motifs of interest, cDNAs derived from mRNAs which were not of interest were identified and eliminated from further consideration as described in Example 18 below.
EXAMPLE 18
Elimination of Undesired Sequences from Further Consideration
ESTs in the NETGENE™ database which were derived from undesired sequences such as transfer RNAs, ribosomal RNAs, mitochondrial RNAs, procaryotic RNAs, fungal RNAs, Alu sequences, L1 sequences, or repeat sequences were identified using the FASTA and BLASTN programs with the parameters listed in Table I.
To eliminate 5' ESTs encoding tRNAs from further consideration, the 5' EST sequences were compared to the sequences of 1190 known tRNAs obtained from EMBL release 38, of which 100 were human. The comparison was performed using FASTA on both strands of the 5' ESTs. Sequences having more than 80% homology over more than 60 nucleotides were identified as tRNA. Of the 144,341 sequences screened, 26 were identified as tRNAs and eliminated from further consideration.
To eliminate 5' ESTs encoding rRNAs from further consideration, the 5' EST sequences were compared to the sequences of 2497 known rRNAs obtained from EMBL release 38, of which 73 were human. The comparison was performed using BLASTN on both strands of the 5' ESTs with the parameter S=108. Sequences having more than 80% homology over stretches longer than 40 nucleotides were identified as rRNAs. Of the 144,341 sequences screened, 3,312 were identified as rRNAs and eliminated from further consideration.
To eliminate 5' ESTs encoding mtRNAs from further consideration, the 5' EST sequences were compared to the sequences of the two known mitochondrial genomes for which the entire genomic sequences are available and all sequences transcribed from these mitochondrial genomes including tRNAs, rRNAs, and mRNAs for a total of 38 sequences. The comparison was performed using BLASTN on both strands of the 5' ESTs with the parameter S=108. Sequences having more than 80% homology over stretches longer than 40 nucleotides were identified as mtRNAs. Of the 144,341 sequences screened, 6,1 were identified as mtRNAs and eliminated from further consideration.
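The identity and length cut-offs given above for tRNA, rRNA and mtRNA matches can be expressed as a small filter. The following is a minimal sketch, assuming that the "homology" figures are percent identities reported by the alignment program and that each alignment has already been reduced to a record; the `Hit` class, its field names and the function names are hypothetical and are not part of FASTA or BLASTN.

```python
from dataclasses import dataclass

# (minimum percent identity, minimum aligned length), both exclusive,
# taken from the thresholds stated in the text.
THRESHOLDS = {
    "tRNA": (80.0, 60),
    "rRNA": (80.0, 40),
    "mtRNA": (80.0, 40),
}


@dataclass
class Hit:
    est_id: str              # identifier of the 5' EST (hypothetical field)
    category: str            # "tRNA", "rRNA" or "mtRNA"
    percent_identity: float  # assumed to stand for "homology"
    aligned_length: int      # length of the aligned stretch in nucleotides


def is_undesired(hit):
    """True when the hit exceeds both cut-offs for its category."""
    min_identity, min_length = THRESHOLDS[hit.category]
    return hit.percent_identity > min_identity and hit.aligned_length > min_length


def filter_ests(est_ids, hits):
    """Return the EST identifiers that have no disqualifying hit."""
    rejected = {hit.est_id for hit in hits if is_undesired(hit)}
    return [est_id for est_id in est_ids if est_id not in rejected]
```

Keeping the cut-offs in one table makes it straightforward to add further categories once their thresholds are known.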
Sequences which might have resulted from exogenous contaminants were eliminated from further consideration by comparing the 5' EST sequences to release 46 of the EMBL bacterial and fungal divisions using BLASTN with the parameter S=144. All sequences having more than 90% homology over at least nucleotides were identified as exogenous contaminants. Of the 42 cDNA libraries examined, the average percentages of procaryotic and fungal sequences contained therein were 0.2% and 0.5% respectively.
Among these sequences, only one could be identified as a sequence specific to fungi. The others were either fungal or procaryotic sequences having homologies with vertebrate sequences or including repeat sequences which had not been masked during the electronic comparison.
In addition, the 5' ESTs were compared to 6093 Alu sequences and 1115 L1 sequences to mask ESTs containing such repeat sequences from further consideration. 5' ESTs including THE and MER repeats, SSTR sequences or satellite, micro-satellite, or telomeric repeats were also eliminated from further consideration. On average, 11.5% of the sequences in the libraries contained repeat sequences. Of this 11.5%, 7% contained Alu repeats, 3.3% contained L1 repeats and the remaining 1.2% were derived from the other types of repetitive sequences which were screened. These percentages are consistent with those found in cDNA libraries prepared by other groups. For example, the cDNA libraries of Adams et al. contained between 0% and 7.4% Alu repeats depending on the source of the RNA which was used to prepare the cDNA library (Adams et al., Nature 377:174, 1996).
The sequences of those 5' ESTs remaining after the elimination of undesirable sequences were compared with the sequences of known human mRNAs to determine the accuracy of the sequencing procedures described above.
EXAMPLE 19
Measurement of Sequencing Accuracy by Comparison to Known Sequences
To further determine the accuracy of the sequencing procedure described above, the sequences of ESTs derived from known sequences were identified and compared to the known sequences. First, a FASTA analysis with overhangs shorter than 5 bp on both ends was conducted on the 5' ESTs to identify those matching an entry in the public human mRNA database. The 6655 5' ESTs which matched a known human mRNA were then realigned with their cognate mRNA and dynamic programming was used to include substitutions, insertions, and deletions in the list of "errors" which would be recognized. Errors occurring in the last 10 bases of the 5' EST sequences were ignored to avoid the inclusion of spurious cloning sites in the analysis of sequencing accuracy.
This analysis revealed that the sequences incorporated in the NETGENE™ database had an accuracy of more than 99.5%.
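The error count described in Example 19 can be sketched with a standard edit-distance computation, which counts substitutions, insertions and deletions by dynamic programming. This is an illustrative simplification and not the procedure actually used: it assumes the cognate mRNA segment has already been cut to the region matched by the EST, and it ignores the tail by dropping the last 10 bases from both sequences. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence `a` into sequence `b`."""
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        cur = [i]
        for j, base_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # deletion from a
                cur[j - 1] + 1,                    # insertion into a
                prev[j - 1] + (base_a != base_b),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def sequencing_accuracy(est, reference_segment, ignore_tail=10):
    """Fraction of correct bases, ignoring the last `ignore_tail` bases."""
    est_core = est[:-ignore_tail] if ignore_tail else est
    ref_core = reference_segment[:-ignore_tail] if ignore_tail else reference_segment
    if not est_core:
        raise ValueError("EST too short once the tail is ignored")
    return 1.0 - edit_distance(est_core, ref_core) / len(est_core)
```

A real pipeline would use a local or semi-global alignment so that the reference segment need not be pre-cut; the edit distance is used here only to keep the sketch short.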
To determine the efficiency with which the above selection procedures select cDNAs which include the 5' ends of their corresponding mRNAs, the following analysis was performed.
EXAMPLE 20
Determination of Efficiency of 5' EST Selection
To determine the efficiency at which the above selection procedures isolated 5' ESTs which included sequences close to the 5' end of the mRNAs from which they were derived, the sequences of the ends of the 5' ESTs which were derived from the elongation factor 1 subunit α and ferritin heavy chain genes were compared to the known cDNA sequences for these genes. Since the transcription start sites for the elongation factor 1 subunit α and ferritin heavy chain are well characterized, they may be used to determine the percentage of 5' ESTs derived from these genes which included the authentic transcription start sites.
For both genes, more than 95% of the cDNAs included sequences close to or upstream of the 5' end of the corresponding mRNAs.
To extend the analysis of the reliability of the procedures for isolating 5' ESTs from ESTs in the NETGENE™ database, a similar analysis was conducted using a database composed of human mRNA sequences extracted from GenBank database release 97 for comparison. For those 5' ESTs derived from mRNAs included in the GenBank database, more than 85% had their 5' ends close to the 5' ends of the known sequence. As some of the mRNA sequences available in the GenBank database are deduced from genomic sequences, a 5' end matching with these sequences will be counted as an internal match. Thus, the method used here underestimates the yield of ESTs including the authentic 5' ends of their corresponding mRNAs.
The EST libraries made above included multiple 5' ESTs derived from the same mRNA. The sequences of such 5' ESTs were compared to one another and the longest 5' ESTs for each mRNA were identified. Overlapping cDNAs were assembled into continuous sequences (contigs). The resulting continuous sequences were then compared to public databases to gauge their similarity to known sequences, as described in Example 21 below.
EXAMPLE 21
Clustering of the 5' ESTs and Calculation of Novelty Indices for cDNA Libraries
For each sequenced EST library, the sequences were clustered by the 5' end. Each sequence in the library was compared to the others with BLASTN2 (direct strand, parameters S=107). ESTs with High Scoring Segment Pairs (HSPs) at least 25 bp long, having 95% identical bases and beginning closer than bp from each EST 5' end were grouped. The longest sequence found in the cluster was used as representative of the cluster. A global clustering between libraries was then performed leading to the definition of super-contigs.
To assess the yield of new sequences within the EST libraries, a novelty rate (NR) was defined as: NR = 100 X (Number of new unique sequences found in the library/Total number of sequences from the library). Typically, novelty rates range between 10% and 41% depending on the tissue from which the EST library was obtained. For most of the libraries, the random sequencing of 5' EST libraries was pursued until the novelty rate reached
Following characterization as described above, the collection of 5' ESTs in NETGENE™ was screened to identify those 5' ESTs bearing potential signal sequences as described in Example 22 below.
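The novelty rate (NR) defined in Example 21 is a simple percentage and can be computed as in the following sketch. The function name and the counts used in the usage line are hypothetical illustrations, not data from the libraries described here.

```python
def novelty_rate(new_unique_sequences, total_sequences):
    """NR = 100 x (new unique sequences / total sequences in the library)."""
    if total_sequences <= 0:
        raise ValueError("the library must contain at least one sequence")
    if not 0 <= new_unique_sequences <= total_sequences:
        raise ValueError("new unique sequences must lie between 0 and the total")
    return 100.0 * new_unique_sequences / total_sequences


# Hypothetical library: 410 new unique sequences among 1000 sequenced clones.
example_nr = novelty_rate(410, 1000)
```

Tracking this value as sequencing proceeds gives a stopping criterion of the kind described in Example 21, since the rate falls as a library is sampled more deeply.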
EXAMPLE 22
Identification of Potential Signal Sequences in 5' ESTs
The 5' ESTs in the NETGENE™ database were screened to identify those having an uninterrupted open reading frame (ORF) longer than 45 nucleotides beginning with an ATG codon and extending to the end of the EST. Approximately half of the cDNA sequences in NETGENE™ contained such an ORF. The ORFs of these 5' ESTs were searched to identify potential signal motifs using slight modifications of the procedures disclosed in Von Heijne, G. A New Method for Predicting Signal Sequence Cleavage Sites. Nucleic Acids Res. 14:4683-4690 (1986). Those 5' EST sequences encoding a 15 amino acid long stretch with a score of at least 3.5 in the Von Heijne signal peptide identification matrix were considered to possess a signal sequence. Those 5' ESTs which matched a known human mRNA or EST sequence and had a 5' end more than 20 nucleotides downstream of the known 5' end were excluded from further analysis. The remaining cDNAs having signal sequences therein were included in a database called SIGNALTAG™.
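The ORF criterion used in Example 22 (an ATG-initiated reading frame longer than 45 nucleotides that runs without a stop codon to the end of the EST) can be sketched as follows. The function name is hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and the sketch covers only the ORF screen, not the Von Heijne matrix scoring.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_orf(est, min_length=45):
    """Return the index of the first ATG whose reading frame is longer than
    `min_length` nucleotides and reaches the end of the EST without an
    in-frame stop codon, or None when no such ATG exists."""
    seq = est.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        if len(seq) - start <= min_length:
            break  # every later ATG would give an even shorter ORF
        codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            return start
    return None
```

Only the strand as given is scanned here; the 5' ESTs described above are directionally cloned, so scanning the sense strand is the natural choice for a sketch.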
To confirm the accuracy of the above method for identifying signal sequences, the analysis of Example 23 was performed.
EXAMPLE 23
Confirmation of Accuracy of Identification of Potential Signal Sequences in 5' ESTs
The accuracy of the above procedure for identifying signal sequences encoding signal peptides was evaluated by applying the method to the 43 amino-terminal amino acids of all human SwissProt proteins.
The computed Von Heijne score for each protein was compared with the known characterization of the protein as being a secreted protein or a non-secreted protein. In this manner, the number of non-secreted proteins having a score higher than 3.5 (false positives) and the number of secreted proteins having a score lower than 3.5 (false negatives) could be calculated.
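The tally of false positives and false negatives described above can be sketched as follows, assuming each protein has been reduced to a pair of its computed score and its known secretion status. The function name and the toy data in the usage line are hypothetical.

```python
def count_misclassifications(scored_proteins, threshold=3.5):
    """Count false positives and false negatives at a score threshold.

    `scored_proteins` is a sequence of (score, is_secreted) pairs.
    A false positive is a non-secreted protein scoring above the threshold;
    a false negative is a secreted protein scoring below it.
    """
    records = list(scored_proteins)
    false_positives = sum(
        1 for score, secreted in records if not secreted and score > threshold
    )
    false_negatives = sum(
        1 for score, secreted in records if secreted and score < threshold
    )
    return false_positives, false_negatives


# Hypothetical scores, for illustration only.
toy = [(5.2, True), (2.1, True), (4.0, False), (1.0, False), (6.3, True)]
toy_fp, toy_fn = count_misclassifications(toy)
```

Repeating the count over a range of thresholds yields the score-dependent error rates on which a probability estimate can be based.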
Using the results of the above analysis, the probability that a peptide encoded by the 5' region of the mRNA is in fact a genuine signal peptide based on its Von Heijne's score was calculated based on either the assumption that 10% of human proteins are secreted or the assumption that 20% of human proteins are secreted. The results of this analysis are shown in Figures 2 and 3.
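One standard way to turn such error rates into a probability under an assumed fraction of secreted proteins is Bayes' rule. The sketch below is offered under that assumption only; the text does not state the exact computation behind Figures 2 and 3, and the rates in the usage lines are hypothetical.

```python
def probability_genuine(sensitivity, false_positive_rate, prior_secreted):
    """Posterior probability that a peptide scoring above the threshold is a
    genuine signal peptide, by Bayes' rule.

    sensitivity          fraction of secreted proteins scoring above threshold
    false_positive_rate  fraction of non-secreted proteins scoring above it
    prior_secreted       assumed fraction of human proteins that are secreted
    """
    true_part = sensitivity * prior_secreted
    false_part = false_positive_rate * (1.0 - prior_secreted)
    if true_part + false_part == 0:
        raise ValueError("no protein scores above the threshold")
    return true_part / (true_part + false_part)


# Hypothetical rates, evaluated under the two priors considered in the text.
p_at_10_percent = probability_genuine(0.9, 0.1, 0.10)
p_at_20_percent = probability_genuine(0.9, 0.1, 0.20)
```

The same rates give a higher posterior under the 20% prior than under the 10% prior, which is why the analysis is reported for both assumptions.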
Using the above method of identifying secretory proteins, 5' ESTs for human glucagon, gamma interferon induced monokine precursor, secreted cyclophilin-like protein, human pleiotropin, and human biotinidase precursor, all of which are polypeptides which are known to be secreted, were obtained. Thus, the above method successfully identified those 5' ESTs which encode a signal peptide.
To confirm that the signal peptide encoded by the 5' ESTs actually functions as a signal peptide, the signal sequences from the 5' ESTs may be cloned into a vector designed for the identification of signal peptides. Some signal peptide identification vectors are designed to confer the ability to grow in selective medium on host cells which have a signal sequence operably inserted into the vector. For example, to confirm that a 5' EST encodes a genuine signal peptide, the signal sequence of the 5' EST may be inserted upstream and in frame with a non-secreted form of the yeast invertase gene in signal peptide selection vectors such as those described in U.S. Patent No. 5,536,637. Growth of host cells containing signal sequence selection vectors having the signal sequence from the 5' EST inserted therein confirms that the EST encodes a genuine signal peptide.
Alternatively, the presence of a signal peptide may be confirmed by cloning the extended cDNAs obtained using the ESTs into expression vectors such as pXT1 (as described below), or by constructing promoter-signal sequence-reporter gene vectors which encode fusion proteins between the signal peptide and an assayable reporter protein. After introduction of these vectors into a suitable host cell, such as COS cells or NIH 3T3 cells, the growth medium may be harvested and analyzed for the presence of the secreted protein. The medium from these cells is compared to the medium from cells containing vectors lacking the signal sequence or extended cDNA insert to identify vectors which encode a functional signal peptide or an authentic secreted protein.
Those 5' ESTs which encoded a signal peptide, as determined by the method of Example 22 above, were further grouped into four categories based on their homology to known sequences. The categorization of the 5' ESTs is described in Example 24 below.
EXAMPLE 24
Categorization of 5' ESTs Encoding a Signal Peptide
Those 5' ESTs having a sequence not matching any known vertebrate sequence nor any publicly available EST sequence were designated "new." Of the sequences in the SIGNALTAG™ database, 947 of the 5' ESTs having a Von Heijne's score of at least 3.5 fell into this category.
Those 5' ESTs having a sequence not matching any vertebrate sequence but matching a publicly known EST were designated "EST-ext", provided that the known EST sequence was extended by at least 40 nucleotides in the 5' direction. Of the sequences in the SIGNALTAG™ database, 150 of the 5' ESTs having a Von Heijne's score of at least 3.5 fell into this category.
Those ESTs not matching any vertebrate sequence but matching a publicly known EST without extending the known EST by at least 40 nucleotides in the 5' direction were designated "EST." Of the sequences in the SIGNALTAG™ database, 599 of the 5' ESTs having a Von Heijne's score of at least 3.5 fell into this category.
Those 5' ESTs matching a human mRNA sequence but extending the known sequence by at least nucleotides in the 5' direction were designated "VERT-ext." Of the sequences in the SIGNALTAG™ database, 23 of the 5' ESTs having a Von Heijne's score of at least 3.5 fell into this category. Included in this category was a 5' EST which extended the known sequence of the human translocase mRNA by more than 200 bases in the 5' direction. A 5' EST which extended the sequence of a human tumor suppressor gene in the 5' direction was also identified.
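The four categories of Example 24 amount to a short decision rule, sketched below. The function and parameter names are hypothetical. The 40-nucleotide cut-off is the one stated for the "EST" category; the cut-off for "VERT-ext" is not legible in the text above, so its default here is a placeholder assumption. Treating a match to a human mRNA as a match to a vertebrate sequence is also a simplification.

```python
def categorize_est(matches_vertebrate, matches_public_est, extension_nt=0,
                   est_min_extension=40, vert_min_extension=40):
    """Assign a 5' EST to one of the categories of Example 24.

    extension_nt        nucleotides by which the 5' EST extends the matched
                        known sequence in the 5' direction
    vert_min_extension  placeholder value (assumption, see lead-in)
    Returns None when the EST fits none of the four categories.
    """
    if not matches_vertebrate:
        if not matches_public_est:
            return "new"
        if extension_nt >= est_min_extension:
            return "EST-ext"
        return "EST"
    if extension_nt >= vert_min_extension:
        return "VERT-ext"
    return None
```

An EST that matches a known vertebrate sequence without extending it falls outside the four categories, which is why the sketch returns None in that case.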
Figure 4 shows the distribution of 5' ESTs in each category and the number of 5' ESTs in each category having a given minimum von Heijne's score.
Each of the 5' ESTs was categorized based on the tissue from which its corresponding mRNA was obtained, as described below in Example 25.
EXAMPLE 25
Categorization of Expression Patterns
Figure 5 shows the tissues from which the mRNAs corresponding to the 5' ESTs in each of the above described categories were obtained.
In addition to categorizing the 5' ESTs by the tissue from which the cDNA library in which they were first identified was obtained, the spatial and temporal expression patterns of the mRNAs corresponding to the 5' ESTs, as well as their expression levels, may be determined as described in Example 26 below. Characterization of the spatial and temporal expression patterns and expression levels of these mRNAs is useful for constructing expression vectors capable of producing a desired level of gene product in a desired spatial or temporal manner, as will be discussed in more detail below.
In addition, 5' ESTs whose corresponding mRNAs are associated with disease states may also be identified. For example, a particular disease may result from lack of expression, over expression, or under expression of an mRNA corresponding to a 5' EST. By comparing mRNA expression patterns and quantities in samples taken from healthy individuals with those from individuals suffering from a particular disease, 5' ESTs responsible for the disease may be identified.
It will be appreciated that the results of the above characterization procedures for 5' ESTs also apply to extended cDNAs (obtainable as described below) which contain sequences adjacent to the 5' ESTs.
It will also be appreciated that if it is desired to defer characterization until extended cDNAs have been obtained rather than characterizing the ESTs themselves, the above characterization procedures can be applied to characterize the extended cDNAs after their isolation.
EXAMPLE 26
Evaluation of Expression Levels and Patterns of mRNAs Corresponding to 5' ESTs or Extended cDNAs
Expression levels and patterns of mRNAs corresponding to 5' ESTs or extended cDNAs (obtainable as described below) may be analyzed by solution hybridization with long probes as described in International Patent Application No. WO 97/05277. Briefly, a 5' EST, extended cDNA, or fragment thereof corresponding to the gene encoding the mRNA to be characterized is inserted at a cloning site immediately downstream of a bacteriophage (T3, T7 or SP6) RNA polymerase promoter to produce antisense RNA.
Preferably, the 5' EST or extended cDNA has 100 or more nucleotides. The plasmid is linearized and transcribed in the presence of ribonucleotides comprising modified ribonucleotides (biotin-UTP and DIG-UTP). An excess of this doubly labeled RNA is hybridized in solution with mRNA isolated from cells or tissues of interest. The hybridizations are performed under standard stringent conditions (40-50°C for 16 hours in a pH-controlled buffer containing 80% formamide and 0.4 M NaCl). The unhybridized probe is removed by digestion with ribonucleases specific for single-stranded RNA (such as RNases CL3, T1, Phy M, or U2). The presence of the biotin-UTP modification enables capture of the hybrid on a microtitration plate coated with streptavidin. The presence of the DIG modification enables the hybrid to be detected and quantified by ELISA using an anti-DIG antibody coupled to alkaline phosphatase.
The 5' ESTs, extended cDNAs, or fragments thereof may also be tagged with nucleotide sequences for the serial analysis of gene expression (SAGE) as disclosed in UK Patent Application No. 2,305,241 A. In this method, cDNAs are prepared from a cell, tissue, organism or other source of nucleic acid for which it is desired to determine gene expression patterns. The resulting cDNAs are separated into two pools. The cDNAs in each pool are cleaved with a first restriction endonuclease, called an "anchoring enzyme," having a recognition site which is likely to be present at least once in most cDNAs. The fragments which contain the 5' or 3' most region of the cleaved cDNA are isolated by binding to a capture medium such as streptavidin coated beads. A first oligonucleotide linker having a first sequence for hybridization of an amplification primer and an internal restriction site for a "tagging endonuclease" is ligated to the digested cDNAs in the first pool. Digestion with the second endonuclease produces short "tag" fragments from the cDNAs. A second oligonucleotide having a second sequence for hybridization of an amplification primer and an internal restriction site is ligated to the digested cDNAs in the second pool. The cDNA fragments in the second pool are also digested with the "tagging endonuclease" to generate short "tag" fragments derived from the cDNAs in the second pool. The "tags" resulting from digestion of the first and second pools with the anchoring enzyme and the tagging endonuclease are ligated to one another to produce "ditags." In some embodiments, the ditags are concatamerized to produce ligation products containing from 2 to 200 ditags.
The tag sequences are then determined and compared to the sequences of the 5' ESTs or extended cDNAs to determine which 5' ESTs or extended cDNAs are expressed in the cell, tissue, organism, or other source of nucleic acids from which the tags were derived. In this way, the expression pattern of the 5' ESTs or extended cDNAs in the cell, tissue, organism, or other source of nucleic acids is obtained.
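The comparison of tags against the 5' ESTs or extended cDNAs can be sketched in silico as follows. This is a simplified illustration only: the anchoring-enzyme site (`CATG`), the tag length (10 nucleotides), and the function names are assumptions made for the example and are not taken from the text above.

```python
from collections import Counter


def extract_tag(cdna, anchor_site="CATG", tag_len=10):
    """Return the tag that follows the 3'-most anchor site, or None.

    anchor_site and tag_len are illustrative assumptions.
    """
    pos = cdna.rfind(anchor_site)
    if pos == -1:
        return None
    start = pos + len(anchor_site)
    tag = cdna[start:start + tag_len]
    # Reject truncated tags that run off the end of the sequence.
    return tag if len(tag) == tag_len else None


def expressed_sequences(observed_tags, reference):
    """Count observed tags that match the tag predicted for each reference.

    observed_tags: iterable of tag strings read from the ditag concatemers.
    reference: dict mapping a 5' EST / extended cDNA name to its sequence.
    Returns a dict of name -> tag count, for names seen at least once.
    """
    counts = Counter(observed_tags)
    result = {}
    for name, seq in reference.items():
        tag = extract_tag(seq)
        if tag is not None and counts[tag] > 0:
            result[name] = counts[tag]
    return result
```

The tag counts give a rough measure of relative expression; two references that happen to share the same tag cannot be distinguished by this sketch.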
Quantitative analysis of gene expression may also be performed using arrays. As used herein, the term array means a one dimensional, two dimensional, or multidimensional arrangement of full length cDNAs (i.e., extended cDNAs which include the coding sequence for the signal peptide, the coding sequence for the mature protein, and a stop codon), extended cDNAs, 5' ESTs or fragments of the full length cDNAs, extended cDNAs, or 5' ESTs of sufficient length to permit specific detection of gene expression. Preferably, the fragments are at least 15 nucleotides in length. More preferably, the fragments are at least 100 nucleotides in length. More preferably, the fragments are more than 100 nucleotides in length. In some embodiments the fragments may be more than 500 nucleotides in length.
For example, quantitative analysis of gene expression may be performed with full length cDNAs, extended cDNAs, 5' ESTs, or fragments thereof in a complementary DNA microarray as described by Schena et al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. U.S.A. 93:10614-10619, 1996). Full length cDNAs, extended cDNAs, 5' ESTs or fragments thereof are amplified by PCR and arrayed from 96-well microtiter plates onto silylated microscope slides using high-speed robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C.
Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency wash buffer (1 x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a fluorescence laser scanning device fitted with a custom filter set. Accurate differential expression measurements are obtained by taking the average of the ratios of two independent hybridizations.
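The averaging step in the last sentence above can be illustrated with a minimal sketch. The data layout (a dict mapping each array element to a pair of sample and reference signals) and the function name are assumptions made for the example.

```python
def differential_expression(hyb1, hyb2):
    """Average the sample/reference ratios of two independent hybridizations.

    hyb1, hyb2: dicts mapping an array element name to a
    (sample_signal, reference_signal) tuple (hypothetical layout).
    Elements missing from either hybridization, or with a non-positive
    reference signal, are skipped.
    """
    ratios = {}
    for element in hyb1.keys() & hyb2.keys():
        s1, r1 = hyb1[element]
        s2, r2 = hyb2[element]
        if r1 <= 0 or r2 <= 0:
            continue
        ratios[element] = (s1 / r1 + s2 / r2) / 2
    return ratios
```

For instance, an element measured at ratios 2.0 and 3.0 in the two hybridizations is reported at 2.5.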
Quantitative analysis of the expression of genes may also be performed with full length cDNAs, extended cDNAs, 5' ESTs, or fragments thereof in complementary DNA arrays as described by Pietu et al. (Genome Research 6:492-503, 1996). The full length cDNAs, extended cDNAs, 5' ESTs or fragments thereof are PCR amplified and spotted on membranes. Then, mRNAs originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization and washing in controlled conditions, the hybridized mRNAs are detected by phospho-imaging or autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially expressed mRNAs is then performed.
Alternatively, expression analysis of the 5' ESTs or extended cDNAs can be done through high density nucleotide arrays as described by Lockhart et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowski et al. (Proc. Natl. Acad. Sci. 94:1119-1123, 1997). Oligonucleotides of 15-50 nucleotides corresponding to sequences of the 5' ESTs or extended cDNAs are synthesized directly on the chip (Lockhart et al., supra) or synthesized and then addressed to the chip (Sosnowski et al., supra). Preferably, the oligonucleotides are about 20 nucleotides in length.
cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or fluorescent dye, are synthesized from the appropriate mRNA population and then randomly fragmented to an average size of 50 to 100 nucleotides. The probes are then hybridized to the chip. After washing as described in Lockhart et al., supra, and application of different electric fields (Sosnowski et al., Proc. Natl. Acad. Sci. 94:1119-1123), the dyes or labeling compounds are detected and quantified. Duplicate hybridizations are performed. Comparative analysis of the intensity of the signal originating from cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential expression of the mRNA corresponding to the 5' EST or extended cDNA from which the oligonucleotide sequence has been designed.
III. Use of 5' ESTs to Clone Extended cDNAs and to Clone the Corresponding Genomic DNAs
Once 5' ESTs which include the 5' end of the corresponding mRNAs have been selected using the procedures described above, they can be utilized to isolate extended cDNAs which contain sequences adjacent to the 5' ESTs. The extended cDNAs may include the entire coding sequence of the protein encoded by the corresponding mRNA, including the authentic translation start site, the signal sequence, and the sequence encoding the mature protein remaining after cleavage of the signal peptide. Such extended cDNAs are referred to herein as "full length cDNAs." Alternatively, the extended cDNAs may include only the sequence encoding the mature protein remaining after cleavage of the signal peptide, or only the sequence encoding the signal peptide.
Example 27 below describes a general method for obtaining extended cDNAs. Example 28 below describes the cloning and sequencing of several extended cDNAs, including extended cDNAs which include the entire coding sequence and authentic 5' end of the corresponding mRNA for several secreted proteins.
The methods of Examples 27, 28, and 29 can also be used to obtain extended cDNAs which encode less than the entire coding sequence of the secreted proteins encoded by the genes corresponding to the ESTs. In some embodiments, the extended cDNAs isolated using these methods encode at least 10 amino acids of one of the proteins encoded by the sequences of SEQ ID NOs: 134-180. In further embodiments, the extended cDNAs encode at least 20 amino acids of the proteins encoded by the sequences of SEQ ID NOs: 134-180. In further embodiments, the extended cDNAs encode at least 30 amino acids of the sequences of SEQ ID NOs: 134-180. In a preferred embodiment, the extended cDNAs encode a full length protein sequence, which includes the protein coding sequences of SEQ ID NOs: 134-180.
EXAMPLE 27
General Method for Using 5' ESTs to Clone and Sequence Extended cDNAs which Include the Entire Coding Region and the Authentic 5' End of the Corresponding mRNA
The following general method has been used to quickly and efficiently isolate extended cDNAs including sequence adjacent to the sequences of the 5' ESTs used to obtain them. This method may be applied to obtain extended cDNAs for any 5' EST in the NetGene™ database, including those 5' ESTs encoding secreted proteins. The method is summarized in Figure 6.
1. Obtaining Extended cDNAs
a) First strand synthesis
The method takes advantage of the known 5' sequence of the mRNA. A reverse transcription reaction is conducted on purified mRNA with a poly 14dT primer containing a 49 nucleotide sequence at its end allowing the addition of a known sequence at the end of the cDNA which corresponds to the 3' end of the mRNA. For example, the primer may have the following sequence: 5'-ATC GTT GAG ACT CGT ACC AGC AGA GTC ACG AGA GAG ACT ACA CGG TAC TGG TT TTT TT TTT T VN (SEQ ID NO: 14). Those skilled in the art will appreciate that other sequences may also be added to the poly dT sequence and used to prime the first strand synthesis. Using this primer and a reverse transcriptase such as the Superscript II (Gibco BRL) or RNase H Minus M-MLV (Promega) enzyme, a reverse transcript anchored at the 3' polyA site of the RNAs is generated.
After removal of the mRNA hybridized to the first cDNA strand by alkaline hydrolysis, the products of the alkaline hydrolysis and the residual poly dT primer are eliminated with an exclusion column such as an AcA34 (Biosepra) matrix as explained in Example 11.
b) Second strand synthesis
A pair of nested primers on each end is designed based on the known 5' sequence from the 5' EST and the known 3' end added by the poly dT primer used in the first strand synthesis. The software used to design primers is either based on GC content and melting temperatures of oligonucleotides, such as OSP (Hillier and Green, PCR Meth. Appl. 1:124-128, 1991), or based on the octamer frequency disparity method (Griffais et al., Nucleic Acids Res. 19:3887-3891, 1991), such as PC-Rare (http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html).
Preferably, the nested primers at the 5' end are separated from one another by four to nine bases.
The 5' primer sequences may be selected to have melting temperatures and specificities suitable for use in PCR.
Preferably, the nested primers at the 3' end are separated from one another by four to nine bases.
For example, the nested 3' primers may have the following sequences: 5'-CCA GCA GAG TCA CGA GAG AGA CTA CAC GG-3' (SEQ ID NO: 15) and 5'-CAC GAG AGA GAC TAC ACG GTA CTG G-3' (SEQ ID NO: 16). These primers were selected because they have melting temperatures and specificities compatible with their use in PCR. However, those skilled in the art will appreciate that other sequences may also be used as primers.
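As a rough illustration of the kind of screening by GC content and melting temperature mentioned above, the sketch below computes the GC fraction and a Wallace-rule melting temperature estimate, Tm = 2(A+T) + 4(G+C), for the two nested 3' primers. The Wallace rule is a substitute chosen for simplicity; it is not the method used by OSP or PC-Rare, it is only a coarse approximation intended for short oligonucleotides, and the function names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer (spaces are ignored)."""
    seq = primer.replace(" ", "").upper()
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer):
    """Wallace-rule estimate in degrees C: 2 per A/T plus 4 per G/C."""
    seq = primer.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Nested 3' primers quoted in the text (SEQ ID NOs: 15 and 16).
OUTER_3P = "CCA GCA GAG TCA CGA GAG AGA CTA CAC GG"
INNER_3P = "CAC GAG AGA GAC TAC ACG GTA CTG G"

for name, primer in (("outer", OUTER_3P), ("inner", INNER_3P)):
    print(name, round(gc_fraction(primer), 2), wallace_tm(primer))
```

For primers of this length a nearest-neighbor thermodynamic model would normally give a more realistic estimate than the Wallace rule.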
The first PCR run of 25 cycles is performed using the Advantage Tth Polymerase Mix (Clontech) and the outer primer from each of the nested pairs. A second 20 cycle PCR using the same enzyme and the inner primer from each of the nested pairs is then performed on 1/2500 of the first PCR product. Thereafter, the primers and nucleotides are removed.
2. Sequencing of Full Length Extended cDNAs or Fragments Thereof
Due to the lack of position constraints on the design of 5' nested primers compatible for PCR use using the OSP software, amplicons of two types are obtained. Preferably, the second 5' primer is located upstream of the translation initiation codon thus yielding a nested PCR product containing the whole coding sequence. Such a full length extended cDNA undergoes a direct cloning procedure as described in section a. However, in some cases, the second 5' primer is located downstream of the translation initiation codon, thereby yielding a PCR product containing only part of the ORF. Such incomplete PCR products are submitted to a modified procedure described in section b.
a) Nested PCR products containing complete ORFs
When the resulting nested PCR product contains the complete coding sequence, as predicted from the 5' EST sequence, it is cloned in an appropriate vector such as pED6dpc2, as described in section 3.
b) Nested PCR products containing incomplete ORFs
When the amplicon does not contain the complete coding sequence, intermediate steps are necessary to obtain both the complete coding sequence and a PCR product containing the full coding sequence. The complete coding sequence can be assembled from several partial sequences determined directly from different PCR products as described in the following section.
Once the full coding sequence has been completely determined, new primers compatible for PCR use are designed to obtain amplicons containing the whole coding region. However, in such cases, 3' primers compatible for PCR use are located inside the 3' UTR of the corresponding mRNA, thus yielding amplicons which lack part of this region, i.e. the polyA tract and sometimes the polyadenylation signal, as illustrated in figure 6. Such full length extended cDNAs are then cloned into an appropriate vector as described in section 3.
c) Sequencing extended cDNAs
Sequencing of extended cDNAs is performed using a Dye Terminator approach with the AmpliTaq DNA polymerase FS kit available from Perkin Elmer.
In order to sequence PCR fragments, primer walking is performed using software such as OSP to choose primers and automated computer software such as ASMG (Sutton et al., Genome Science Technol. 1:9-19, 1995) to construct contigs of walking sequences including the initial 5' tag using minimum overlaps of 32 nucleotides. Preferably, primer walking is performed until the sequences of full length cDNAs are obtained.
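The role of the minimum overlap in contig construction can be illustrated with a deliberately simple sketch that joins two walking reads only when an exact suffix/prefix overlap of at least 32 nucleotides exists. This is not a description of how ASMG works; real assemblers tolerate sequencing errors and handle both strands, and the function name is hypothetical.

```python
def merge_reads(left, right, min_overlap=32):
    """Join two reads on their longest exact suffix/prefix overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap nucleotides exists. Exact matching is a simplifying
    assumption made for this sketch.
    """
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None
```

Requiring a long overlap reduces the chance of joining reads that merely share a short repeated motif.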
Completion of the sequencing of a given extended cDNA fragment is assessed as follows. Since sequences located after a polyA tract are difficult to determine precisely in the case of uncloned products, sequencing and primer walking processes for PCR products are interrupted when a polyA tract is identified in extended cDNAs obtained as described in case b. The sequence length is compared to the size of the nested PCR product obtained as described above. Due to the limited accuracy of the determination of the PCR product size by gel electrophoresis, a sequence is considered complete if the size of the obtained sequence is at least 70% of the size of the first nested PCR product. If the length of the sequence determined from the computer analysis is not at least 70% of the length of the nested PCR product, these PCR products are cloned and the sequence of the insertion is determined. When Northern blot data are available, the size of the mRNA detected for a given PCR product is used to finally assess that the sequence is complete.
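The 70% length criterion above amounts to a one-line check, sketched here with integer arithmetic to avoid rounding issues. The function and parameter names are hypothetical.

```python
def sequence_is_complete(sequence_length, pcr_product_size, min_percent=70):
    """True when the determined sequence covers at least min_percent of
    the nested PCR product size estimated by gel electrophoresis."""
    return 100 * sequence_length >= min_percent * pcr_product_size
```

A 700-nucleotide sequence from a PCR product estimated at 1000 bp passes the check, whereas a 699-nucleotide sequence does not and the product would be cloned and the insert sequenced.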
Sequences which do not fulfill the above criteria are discarded and will undergo a new isolation procedure.
Sequence data of all extended cDNAs are then transferred to a proprietary database, where quality controls and validation steps are carried out as described in example 3. Cloning of Full Length Extended cDNAs The PCR product containing the full coding sequence is then cloned in an appropriate vector. For example, the extended cDNAs can be cloned into the expression vector pED6dpc2 (DiscoverEase, Genetics Institute, Cambridge, MA) as follows. The structure of pED6dpc2 is shown in Figure 7. pED6dpc2 vector DNA is prepared with blunt ends by performing an EcoRI digestion followed by a fill in reaction. The blunt ended vector is dephosphorylated. After removal of PCR primers and ethanol precipitation, the PCR product containing the full coding sequence or the extended cDNA obtained as described above is phosphorylated with a kinase subsequently removed by phenol-Sevag extraction and precipitation. The double stranded extended cDNA is then ligated to the vector and the resulting expression plasmid introduced into appropriate host cells.
Since the PCR products obtained as described above are blunt ended molecules that can be cloned in either direction, the orientation of several clones for each PCR product is determined. Then, 4 to clones are ordered in microtiter plates and subjected to a PCR reaction using a first primer located in the vector close to the cloning site and a second primer located in the portion of the extended cDNA corresponding to the 3' end of the mRNA. This second primer may be the antisense primer used in anchored PCR in the case of direct cloning (case a) or the antisense primer located inside the 3' UTR in the case of indirect cloning (case b). Clones in which the start codon of the extended cDNA is operably linked to the promoter in the vector so as to permit expression of the protein encoded by the extended cDNA are conserved and sequenced. In addition to the ends of cDNA inserts, approximately 50 bp of vector DNA on each side of the cDNA insert are also sequenced.
The cloned PCR products are then entirely sequenced according to the aforementioned procedure.
In this case, contig assembly of long fragments is then performed on walking sequences that have already been contigated for uncloned PCR products during primer walking. Sequencing of cloned amplicons is complete when the resulting contigs include the whole coding region as well as overlapping sequences with vector DNA on both ends.
4. Computer Analysis of Full Length Extended cDNA
Sequences of all full length extended cDNAs are then submitted to further analysis as described below and using the parameters found in Table I with the following modifications. For screening of miscellaneous subdivisions of Genbank, FASTA was used instead of BLASTN and 15 nucleotides of homology was the limit instead of 17. For Alu detection, BLASTN was used with the following parameters: S=72; identity=70%; and length = 40 nucleotides. The polyadenylation signal and polyA tail, which were not searched for in the 5' ESTs, were searched. For polyadenylation signal detection the signal (AATAAA) was searched with one permissible mismatch in the last ten nucleotides preceding the 5' end of the polyA.
For the polyA, a stretch of 8 A's in the last 20 nucleotides of the sequence was searched with BLAST2N in the sense strand with parameters including S=10 and E=1000. Finally, patented sequences and ORF homologies were searched using, respectively, BLASTN and BLASTP on GenSEQ (Derwent's database of patented nucleotide sequences) and SWISSPROT for ORFs with the following parameters (W=8 and B=10). Before examining the extended full length cDNAs for sequences of interest, extended cDNAs which are not of interest are searched as follows.
a) Elimination of undesired sequences
Although 5' ESTs were checked to remove contaminant sequences as described in Example 18, a last verification was carried out to identify extended cDNA sequences derived from undesired sequences such as vector RNAs, transfer RNAs, ribosomal RNAs, mitochondrial RNAs, prokaryotic RNAs and fungal RNAs using the FASTA and BLASTN programs on both strands of extended cDNAs as described below.
To identify the extended cDNAs encoding vector RNAs, extended cDNAs are compared to the known sequences of vector RNA using the FASTA program. Sequences of extended cDNAs with more than 90% homology over stretches of 15 nucleotides are identified as vector RNA.
To identify the extended cDNAs encoding tRNAs, extended cDNA sequences were compared to the sequences of 1190 known tRNAs obtained from EMBL release 38, of which 100 were human.
Sequences of extended cDNAs having more than 80% homology over 60 nucleotides using FASTA were identified as tRNA.
To identify the extended cDNAs encoding rRNAs, extended cDNA sequences were compared to the sequences of 2497 known rRNAs obtained from EMBL release 38, of which 73 were human.
Sequences of extended cDNAs having more than 80% homology over stretches longer than 40 nucleotides using BLASTN were identified as rRNAs.
To identify the extended cDNAs encoding mtRNAs, extended cDNA sequences were compared to the sequences of the two known mitochondrial genomes for which the entire genomic sequences are available and all sequences transcribed from these mitochondrial genomes including tRNAs, rRNAs, and mRNAs for a total of 38 sequences. Sequences of extended cDNAs having more than 80% homology over stretches longer than 40 nucleotides using BLASTN were identified as mtRNAs.
Sequences which might have resulted from other exogenous contaminants were identified by comparing extended cDNA sequences to release 105 of Genbank bacterial and fungal divisions. Sequences of extended cDNAs having more than 90% homology over 40 nucleotides using BLASTN were identified as exogenous prokaryotic or fungal contaminants.
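The contaminant thresholds given above can be collected into a small lookup table and applied to precomputed alignment hits, as in the sketch below. The hit format, the class labels, and the function name are assumptions made for illustration; the alignments themselves would come from FASTA or BLASTN, which are not run here. For simplicity the sketch treats every length threshold as "at least", although the text says "longer than 40 nucleotides" for rRNA and mtRNA.

```python
# (minimum percent identity, exclusive; minimum aligned length, inclusive)
# Values are those quoted in the text; the length boundary handling is a
# simplifying assumption.
CONTAMINANT_THRESHOLDS = {
    "vector": (90.0, 15),
    "tRNA": (80.0, 60),
    "rRNA": (80.0, 40),
    "mtRNA": (80.0, 40),
    "prokaryotic_or_fungal": (90.0, 40),
}


def contaminant_classes(hits):
    """Return the set of contaminant classes matched by an extended cDNA.

    hits: iterable of (db_class, percent_identity, aligned_length)
    tuples produced by a prior similarity search (hypothetical format).
    """
    flagged = set()
    for db_class, identity, length in hits:
        min_identity, min_length = CONTAMINANT_THRESHOLDS[db_class]
        if identity > min_identity and length >= min_length:
            flagged.add(db_class)
    return flagged
```

An extended cDNA for which the returned set is non-empty would be treated as derived from an undesired sequence.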
In addition, extended cDNAs were searched for different repeat sequences, including Alu sequences, L1 sequences, THE and MER repeats, SSTR sequences or satellite, micro-satellite, or telomeric repeats. Sequences of extended cDNAs with more than 70% homology over 40 nucleotide stretches using BLASTN were identified as repeat sequences and masked in further identification procedures. In addition, clones showing extensive homology to repeats (i.e., matches of either more than 50 nucleotides if the homology was at least 75% or more than 40 nucleotides if the homology was at least or more than 30 nucleotides if the homology was at least 90%) were flagged.
b) Identification of structural features
Structural features, e.g. polyA tail and polyadenylation signal, of the sequences of full length extended cDNAs are subsequently determined as follows.
A polyA tail is defined as a homopolymeric stretch of at least 11 A with at most one alternative base within it. The polyA tail search is restricted to the last 20 nt of the sequence and limited to stretches of 11 consecutive A's because sequencing reactions are often not readable after such a polyA stretch. Stretches with 100% homology over 6 nucleotides are identified as polyA tails.
To search for a polyadenylation signal, the polyA tail is clipped from the full-length sequence. The bp preceding the polyA tail are searched for the canonical polyadenylation AAUAAA signal allowing one mismatch to account for possible sequencing errors and known variation in the canonical sequence of the polyadenylation signal.
c) Identification of functional features
Functional features, e.g. ORFs and signal sequences, of the sequences of full length extended cDNAs were subsequently determined as follows.
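The two structural-feature searches described above can be sketched as follows. Several details are assumptions made for the example: a polyA tail is taken to be a window of 11 bases that starts with A and contains at most one non-A base, the size of the upstream region searched for the signal is left as a required argument because it is not legible in the text, and the function names are hypothetical. Sequences are handled as DNA, so the signal is written AATAAA.

```python
def find_polya_start(seq, min_len=11, tail_window=20, max_other=1):
    """Return the index where a polyA tail begins, or None.

    Only windows starting in the last tail_window bases are examined.
    A window qualifies if it starts with A and has at most max_other
    non-A bases (one possible reading of the definition in the text).
    """
    seq = seq.upper()
    first = max(0, len(seq) - tail_window)
    for start in range(first, len(seq) - min_len + 1):
        window = seq[start:start + min_len]
        if window[0] == "A" and sum(b != "A" for b in window) <= max_other:
            return start
    return None


def find_polya_signal(seq, polya_start, upstream_window,
                      signal="AATAAA", max_mismatch=1):
    """Return the index of the signal match closest to the tail, or None.

    The polyA tail is clipped by searching only seq[:polya_start];
    upstream_window bases before the tail are scanned.
    """
    region_start = max(0, polya_start - upstream_window)
    region = seq[region_start:polya_start].upper()
    best = None
    for i in range(len(region) - len(signal) + 1):
        candidate = region[i:i + len(signal)]
        mismatches = sum(a != b for a, b in zip(candidate, signal))
        if mismatches <= max_mismatch:
            best = region_start + i  # later matches overwrite earlier ones
    return best
```

Allowing one mismatch lets the search accept common variants such as ATTAAA as well as isolated sequencing errors.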
The 3 upper strand frames of extended cDNAs are searched for ORFs defined as the maximum length fragments beginning with a translation initiation codon and ending with a stop codon. ORFs encoding at least 20 amino acids are preferred.
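The ORF definition above (the longest fragment in a frame that begins with an initiation codon and ends with a stop codon) can be sketched as a scan over the three forward frames. The choice of ATG as the only initiation codon, the standard stop codons, and the function name are assumptions for the example; the default minimum of 20 amino acids reflects the stated preference.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=20, start_codon="ATG"):
    """Return (start, end, frame) tuples for ORFs on the given strand.

    start is the index of the first base of the initiation codon and
    end is exclusive and includes the stop codon. In each frame the
    first start codon after the previous stop is used, which yields
    the maximum-length ORF ending at a given stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == start_codon:
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                encoded_aa = (i - orf_start) // 3  # stop codon excluded
                if encoded_aa >= min_aa:
                    orfs.append((orf_start, i + 3, frame))
                orf_start = None
    return orfs
```

ORFs that run off the end of the sequence without a stop codon are not reported by this sketch.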
Each ORF found is then scanned for the presence of a signal peptide in the first 50 amino acids or, where appropriate, within shorter regions down to 20 amino acids or less in the ORF, using the matrix method of von Heijne (Nuc. Acids Res. 14: 4683-4690 (1986)), the disclosure of which is incorporated herein by reference and the modification described in Example 22.
d) Homology to either nucleotidic or proteic sequences
Sequences of full length extended cDNAs are then compared to known sequences on a nucleotidic or proteic basis.
Sequences of full length extended cDNAs are compared to the following known nucleic acid sequences: vertebrate sequences (Genbank release GB), EST sequences (Genbank release GB), patented sequences (Genseq release GSEQ) and recently identified sequences (Genbank daily release) available at the time of filing. Full length cDNA sequences are also compared to the sequences of a private database (Genset internal sequences) in order to find sequences that have already been identified by applicants.
Sequences of full length extended cDNAs with more than 90% homology over 30 nucleotides using either BLASTN or BLAST2N as indicated in Table II are identified as sequences that have already been described. Matching vertebrate sequences are subsequently examined using FASTA; full length extended cDNAs with more than 70% homology over 30 nucleotides are identified as sequences that have already been described.
ORFs encoded by full length extended cDNAs as defined in section c) are subsequently compared to known amino acid sequences found in Swissprot release CHP, PIR release PIR# and Genpept release (PEEPI' public databases using BLASTP with the parameter W=8 and allowing a maximum of 10 matches.
Sequences of full length extended cDNAs showing extensive homology to known protein sequences are recognized as already identified proteins.
In addition, the three-frame conceptual translation products of the top strand of full length extended cDNAs are compared to publicly known amino acid sequences of Swissprot using BLASTX with the parameter E=0.001. Sequences of full length extended cDNAs with more than 70% homology over amino acid stretches are detected as already identified proteins.
Selection of Cloned Full Length Sequences of the Present Invention
Cloned full length extended cDNA sequences that have already been characterized by the aforementioned computer analysis are then submitted to an automatic procedure in order to preselect full length extended cDNAs containing sequences of interest.
a) Automatic sequence preselection
All complete cloned full length extended cDNAs clipped for vector on both ends are considered.
First, a negative selection is performed in order to eliminate unwanted cloned sequences resulting from either contaminants or PCR artifacts as follows. Sequences matching contaminant sequences such as vector RNA, tRNA, mtRNA, rRNA sequences are discarded as well as those encoding ORF sequences exhibiting extensive homology to repeats as defined in section 4. Sequences obtained by direct cloning using nested primers on 5' and 3' tags (section 1, case a) but lacking polyA tail are discarded. Only ORFs containing a signal peptide and ending either before the polyA tail (case a) or before the end of the cloned 3' UTR (case b) are kept. Then, ORFs containing unlikely mature proteins such as mature proteins whose size is less than amino acids or less than 25% of the immature protein size are eliminated.
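The final filter on mature protein size can be written as a small predicate, sketched below. Because the absolute minimum size is not legible in the text, it is left as a required argument rather than given a default; the function and parameter names are hypothetical.

```python
def mature_protein_is_plausible(orf_aa_len, signal_peptide_len,
                                min_mature_aa, min_fraction=0.25):
    """Check the size of the mature protein left after signal cleavage.

    orf_aa_len: length of the immature protein in amino acids.
    signal_peptide_len: length of the predicted signal peptide.
    min_mature_aa: absolute minimum mature length (caller-supplied).
    min_fraction: minimum mature/immature size ratio (25% in the text).
    """
    mature_len = orf_aa_len - signal_peptide_len
    return (mature_len >= min_mature_aa
            and mature_len >= min_fraction * orf_aa_len)
```

An ORF failing either condition would be eliminated during the negative selection.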
In the selection of the ORF, priority was given to the ORF and the frame corresponding to the polypeptides described in SignalTag Patents (United States Patent Application Serial Nos: 08/905,223; 08/905,135; 08/905,051; 08/905,144; 08/905,279; 08/904,468; 08/905,134; and 08/905,133). If the ORF was not found among the ORFs described in the SignalTag Patents, the ORF encoding the signal peptide with the highest score according to Von Heijne method as defined in Example 22 was chosen. If the scores were identical, then the longest ORF was chosen.
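The priority order described above (a previously described ORF first, then the highest signal peptide score, then the longest ORF) can be sketched as follows. The record layout with `id`, `score` and `length` keys, the set of previously described ORF identifiers, and the function name are assumptions for illustration; returning the first previously described ORF when several are present is also an assumption.

```python
def choose_orf(orfs, known_orf_ids=frozenset()):
    """Pick one ORF from a non-empty list of candidate records.

    orfs: list of dicts with keys 'id', 'score' (von Heijne score of
    the signal peptide) and 'length' (ORF length), a hypothetical layout.
    known_orf_ids: identifiers of ORFs that were described previously.
    """
    known = [orf for orf in orfs if orf["id"] in known_orf_ids]
    if known:
        return known[0]
    # Highest score wins; ties are broken by the longest ORF.
    return max(orfs, key=lambda orf: (orf["score"], orf["length"]))
```

Sorting on the tuple `(score, length)` implements the tie-break in a single comparison.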
Sequences of full length extended cDNA clones are then compared pairwise with BLAST after masking of the repeat sequences. Sequences containing at least 90% homology over 30 nucleotides are clustered in the same class. Each cluster is then subjected to a cluster analysis that detects sequences resulting from internal priming or from alternative splicing, identical sequences or sequences with several frameshifts. This automatic analysis serves as a basis for manual selection of the sequences.
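Grouping clones into classes from pairwise comparisons can be sketched as single-linkage clustering with a union-find structure. The pairwise hits are assumed to have been computed beforehand (for example by BLAST after repeat masking); the hit format and function name are hypothetical, and the subsequent cluster analysis for internal priming, alternative splicing or frameshifts is not modeled.

```python
def cluster_sequences(names, hits, min_identity=90.0, min_length=30):
    """Group sequences linked by sufficiently strong pairwise hits.

    names: iterable of sequence names.
    hits: iterable of (name_a, name_b, percent_identity, aligned_length).
    Returns a sorted list of sorted clusters (lists of names).
    """
    names = list(names)
    parent = {name: name for name in names}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, length in hits:
        if identity >= min_identity and length >= min_length:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

With single linkage, two clones fall into the same class if they are connected through a chain of qualifying hits, even when they do not match each other directly.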
b) Manual sequence selection
Manual selection is carried out using automatically generated reports for each sequenced full length extended cDNA clone. During this manual procedure, a selection is made between clones belonging to the same class as follows. ORF sequences encoded by clones belonging to the same class are aligned and compared. If the homology between nucleotidic sequences of clones belonging to the same class is more than 90% over 30 nucleotide stretches or if the homology between amino acid sequences of clones belonging to the same class is more than 80% over 20 amino acid stretches, then the clones are considered as being identical. The chosen ORF is the best one according to the criteria mentioned below. If the nucleotide and amino acid homologies are less than 90% and 80% respectively, the clones are said to encode distinct proteins which can be both selected if they contain sequences of interest.
Selection of full length extended cDNA clones encoding sequences of interest is performed using the following criteria. Structural parameters (initial tag, polyadenylation site and signal) are first checked. Then, homologies with known nucleic acids and proteins are examined in order to determine whether the clone sequence matches a known nucleic/proteic sequence and, if so, its covering rate and the date at which the sequence became public. If there is no extensive match with sequences other than ESTs or genomic DNA, or if the clone sequence brings substantial new information, such as encoding a protein resulting from alternative splicing of an mRNA coding for an already known protein, the sequence is kept.
Examples of such cloned full length extended cDNAs containing sequences of interest are described in Example 28. Sequences resulting from chimera or double inserts as assessed by homology to other sequences are discarded during this procedure.
EXAMPLE 28
Cloning and Sequencing of Extended cDNAs
The procedure described in Example 27 above was used to obtain the extended cDNAs of the present invention. Using this approach, the full length cDNA of SEQ ID NO: 17 was obtained. This cDNA falls into the "EST-ext" category described above and encodes the signal peptide MKKVLLLITAILAVAVG (SEQ ID NO: 18) having a von Heijne score of 8.2.
The full length cDNA of SEQ ID NO:49 was also obtained using this procedure. This cDNA falls into the "EST-ext" category described above and encodes the signal peptide MWWFQQGLSFLPSALVIWTSA (SEQ ID NO:20) having a von Heijne score of Another full length cDNA obtained using the procedure described above has the sequence of SEQ ID NO:21. This cDNA falls into the "EST-ext" category described above and encodes the signal peptide MVLTILPSANSANSPVNMPTTGPNSLSYASSALSPCLT (SEQ ID NO:22) having a von Heijne score of 5.9.
The above procedure was also used to obtain a full length cDNA having the sequence of SEQ ID NO:23. This cDNA falls into the "EST-ext" category described above and encodes the signal peptide ILSTVTALTFAXA (SEQ ID NO:24) having a von Heijne score of The full length cDNA of SEQ ID NO:25 was also obtained using this procedure. This cDNA falls into the "new" category described above and encodes a signal peptide LVLTLCTLPLAVA (SEQ ID NO:26) having a von Heijne score of 10.1.
The full length cDNA of SEQ ID NO:27 was also obtained using this procedure. This cDNA falls into the "new" category described above and encodes a signal peptide LWLLFFLVTAIHA (SEQ ID NO:28) having a von Heijne score of 10.7.
The above procedures were also used to obtain the extended cDNAs of the present invention. ESTs expressed in a variety of tissues were obtained as described above. The appended sequence listing provides the tissues from which the extended cDNAs were obtained. It will be appreciated that the extended cDNAs may also be expressed in tissues other than the tissue listed in the sequence listing.
ESTs obtained as described above were used to obtain extended cDNAs having the sequences of SEQ ID NOs: 40-86. Table II provides the sequence identification numbers of the extended cDNAs of the present invention, the locations of the full coding sequences in SEQ ID NOs: 40-86 (i.e. the nucleotides encoding both the signal peptide and the mature protein, listed under the heading FCS location in Table II), the locations of the nucleotides in SEQ ID NOs: 40-86 which encode the signal peptides (listed under the heading SigPep Location in Table II), the locations of the nucleotides in SEQ ID NOs: 40-86 which encode the mature proteins generated by cleavage of the signal peptides (listed under the heading Mature Polypeptide Location in Table II), the locations in SEQ ID NOs: 40-86 of stop codons (listed under the heading Stop Codon Location in Table II), the locations in SEQ ID NOs: 40-86 of polyA signals (listed under the heading Poly A Signal Location in Table II) and the locations of polyA sites (listed under the heading Poly A Site Location in Table II).
The polypeptides encoded by the extended cDNAs were screened for the presence of known structural or functional motifs or for the presence of signatures, small amino acid sequences which are well conserved amongst the members of a protein family. The conserved regions have been used to derive consensus patterns or matrices included in the PROSITE data bank, in particular in the file prosite.dat (Release 13.0 of November 1995, located at http://expasy.hcuge.ch/sprot/prosite.html). The prosite_convert and prosite_scan programs (http://ulrec3.unil.ch/ftpserveur/prositescan) were used to find signatures on the extended cDNAs.
For each pattern obtained with the prosite_convert program from the prosite.dat file, the accuracy of the detection on a new protein sequence has been tested by evaluating the frequency of irrelevant hits on the population of human secreted proteins included in the data bank SWISSPROT. The ratio between the number of hits on shuffled proteins (with a window size of 20 amino acids) and the number of hits on native (unshuffled) proteins was used as an index. Every pattern for which the ratio was greater than 20% (one hit on shuffled proteins for 5 hits on native proteins) was skipped during the search with prosite_scan. The program used to shuffle protein sequences (dbshuffled) and the program used to determine the statistics for each pattern in the protein data banks (prosite_statistics) are available on the ftp site http://ulrec3.unil.ch/ftpserveur/prositescan.
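The pattern filter described above may be illustrated by the following sketch. It is not the dbshuffled or prosite_statistics programs; it assumes that "window size of 20" means residues are shuffled within consecutive 20-residue windows, and it uses Python regular expressions as a stand-in for PROSITE patterns. All function names are hypothetical.

```python
import random
import re


def shuffle_in_windows(seq, window=20, rng=None):
    """Shuffle residues inside consecutive windows, preserving local composition."""
    rng = rng or random.Random(0)
    out = []
    for i in range(0, len(seq), window):
        chunk = list(seq[i:i + window])
        rng.shuffle(chunk)
        out.append("".join(chunk))
    return "".join(out)


def count_hits(pattern, proteins):
    """Total number of non-overlapping regex matches across all proteins."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(p)) for p in proteins)


def keep_pattern(pattern, proteins, max_ratio=0.20, rng=None):
    """Keep a pattern only if (hits on shuffled) / (hits on native) <= max_ratio."""
    rng = rng or random.Random(0)
    native = count_hits(pattern, proteins)
    if native == 0:
        # Assumption: a pattern with no native hits is treated as uninformative.
        return False
    shuffled = count_hits(
        pattern, [shuffle_in_windows(p, rng=rng) for p in proteins]
    )
    return shuffled / native <= max_ratio
```

A pattern that matches shuffled sequences nearly as often as native ones carries little information and is rejected, which is the purpose of the 20% cutoff.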
The results of the search are provided in Table II. The first column provides the ID number of the sequence. The second column indicates the beginning and end positions of the signature. The Prosite definition of the signature is indicated in the third column.
Table IV lists the sequence identification numbers of the polypeptides of SEQ ID NOs: 87-133, the locations of the amino acid residues of SEQ ID NOs: 87-133 in the full length polypeptide (second column), the locations of the amino acid residues of SEQ ID NOs: 87-133 in the signal peptides (third column), and the locations of the amino acid residues of SEQ ID NOs: 87-133 in the mature polypeptide created by cleaving the signal peptide from the full length polypeptide (fourth column). In Table IV, the first amino acid of the signal peptide is designated as amino acid number 1. In the appended sequence listing, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1 and the first amino acid of the signal peptide is designated with the appropriate negative number, in accordance with the regulations governing sequence listings.
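The two numbering conventions described above may be related as in the following sketch. It assumes that the sequence listing numbering has no residue numbered zero, so that the last residue of the signal peptide is -1 and the first residue of the mature protein is 1; this is an assumption about the listing rules, and the function names are hypothetical.

```python
def table_to_listing(pos, signal_len):
    """Convert a Table IV position (1 = first signal peptide residue)
    to sequence listing numbering (1 = first mature residue, no zero)."""
    if pos < 1:
        raise ValueError("Table IV positions start at 1")
    if pos > signal_len:
        return pos - signal_len          # mature residues: 1, 2, 3, ...
    return pos - signal_len - 1          # signal residues: -signal_len, ..., -1


def listing_to_table(num, signal_len):
    """Inverse conversion; zero and numbers below -signal_len are invalid."""
    if num == 0 or num < -signal_len:
        raise ValueError("invalid sequence listing number")
    return num + signal_len if num > 0 else num + signal_len + 1
```

For a 17-residue signal peptide such as MKKVLLLITAILAVAVG, Table IV position 1 corresponds to -17 in the sequence listing and position 18 corresponds to 1.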
The extended cDNAs of the present invention were categorized based on their homology to known sequences. GenBank release #103, division ESTs, and Geneseq release #28 were used to scan the extended cDNAs using Blast. For each extended cDNA ID, the covering rate of the sequence by another sequence was determined as follows. The length in nucleotides of the matching segment was calculated (even when gaps were present) and divided by the length in nucleotides of the extended cDNA sequence. When more than one covering rate was obtained for a given extended cDNA, the higher covering rate was used to classify the extended cDNA. The Geneseq sequences have been categorized as either ESTs or vertebrate, with ESTs being those sequences obtained by random sequencing of cDNA libraries and vertebrate sequences being those sequences containing sequences resembling known functional motifs.
The results of this categorization are provided in Table V. The first column lists the sequence identification number of the sequence being categorized. The second column indicates those sequences having no matches with the database scanned. The third column indicates those sequences having a covering rate of less than 30%. The fourth column indicates those sequences having a covering rate greater than 30%. The fifth column indicates sequences partially or totally covered by vertebrate sequences as described above.
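The covering rate calculation and the 30% categorization described above may be sketched as follows. The sketch assumes that each database match is given as 1-based inclusive start and end coordinates on the extended cDNA; since the text does not state how a covering rate of exactly 30% is handled, placing it in the upper category is an assumption. The function names are hypothetical.

```python
def covering_rate(match_start, match_end, cdna_length):
    """Length of the matching segment (gaps included) over the cDNA length."""
    return (match_end - match_start + 1) / cdna_length


def best_covering_rate(hits, cdna_length):
    """Highest covering rate among (start, end) hits, or None if there are none."""
    if not hits:
        return None
    return max(covering_rate(s, e, cdna_length) for s, e in hits)


def classify(hits, cdna_length, threshold=0.30):
    """Assign an extended cDNA to one of the first three categories of Table V."""
    rate = best_covering_rate(hits, cdna_length)
    if rate is None:
        return "no match"
    if rate < threshold:
        return "covering rate < 30%"
    return "covering rate >= 30%"  # assumption: exactly 30% falls here
```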
The nucleotide sequences of SEQ ID NOs: 40-86 and 134-180, and the amino acid sequences encoded by SEQ ID NOs: 40-86 and 134-180 (i.e. amino acid sequences of SEQ ID NOs: 87-133 and 181-227) are provided in the appended sequence listing. In some instances, the sequences are preliminary and may include some incorrect or ambiguous sequences or amino acids. The sequences of SEQ ID NOs: 40-86 and 134-180 can readily be screened for any errors therein and any sequence ambiguities can be resolved by resequencing a fragment containing such errors or ambiguities on both strands. Nucleic acid fragments for resolving sequencing errors or ambiguities may be obtained from the deposited clones or can be isolated using the techniques described herein. Resolution of any such ambiguities or errors may be facilitated by using primers which hybridize to sequences located close to the ambiguous or erroneous sequences. For example, the primers may hybridize to sequences within 50-75 bases of the ambiguity or error. Upon resolution of an error or ambiguity, the corresponding corrections can be made in the protein sequences encoded by the DNA containing the error or ambiguity. The amino acid sequence of the protein encoded by a particular clone can also be determined by expression of the clone in a suitable host cell, collecting the protein, and determining its sequence.
For each amino acid sequence, Applicants have identified what they have determined to be the reading frame best identifiable with sequence information available at the time of filing. Some of the amino acid sequences may contain "Xaa" designators. These "Xaa" designators indicate either a residue which cannot be identified because of nucleotide sequence ambiguity or a stop codon in the determined sequence where Applicants believe one should not exist (if the sequence were determined more accurately).
Cells containing the 47 extended cDNAs (SEQ ID NOs: 134-180) of the present invention in the vector pED6dpc2, are maintained in permanent deposit by the inventors at Genset, 24 Rue Royale, 75008 Paris, France.
A pool of the cells containing the 47 extended cDNAs (SEQ ID NOs: 134-180), from which the cells containing a particular polynucleotide are obtainable, will be deposited with the American Type Culture Collection. Each extended cDNA clone will be transfected into separate bacterial cells (E. coli) in this composite deposit. A pool of cells containing the 43 extended cDNAs (SEQ ID NOs: 134, 136-143, 145-162, 164-174, and 176-180), from which the cells containing a particular polynucleotide are obtainable, was deposited with the American Type Culture Collection on December 16, 1997, under the name SignalTag 1-43, and ATCC accession No. 98619. A pool of cells comprising the 2 extended cDNAs (SEQ ID NOs: 144 and 163), from which the cells containing a particular polynucleotide are obtainable, was deposited with the American Type Culture Collection on October 15, 1998, under the name SignalTag 44-66, and ATCC accession No. 98923. Each extended cDNA can be removed from the pED6dpc2 vector in which it was deposited by performing a NotI, PstI double digestion to produce the appropriate fragment for each clone.
The proteins encoded by the extended cDNAs may also be expressed from the promoter in pED6dpc2.
Bacterial cells containing a particular clone can be obtained from the composite deposit as follows: An oligonucleotide probe or probes should be designed to the sequence that is known for that particular clone. This sequence can be derived from the sequences provided herein, or from a combination of those sequences. The design of the oligonucleotide probe should preferably follow these parameters: It should be designed to an area of the sequence which has the fewest ambiguous bases, if any; Preferably, the probe is designed to have a Tm of approx. 80°C (assuming 2 degrees for each A or T and 4 degrees for each G or C). However, probes having melting temperatures between 40°C and 80°C may also be used provided that specificity is not lost.
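The melting temperature estimate used for probe design above (2 degrees for each A or T and 4 degrees for each G or C) may be computed as in the following sketch. The function names are hypothetical, and rejecting any base other than A, C, G or T is a choice made here in keeping with the preference for regions having the fewest ambiguous bases.

```python
def probe_tm(probe):
    """Estimate Tm in degrees C as 2 per A or T plus 4 per G or C."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if at + gc != len(probe):
        raise ValueError("probe contains ambiguous or non-DNA characters")
    return 2 * at + 4 * gc


def tm_in_usable_range(probe, low=40, high=80):
    """True if the estimated Tm lies within the 40-80 degrees C range."""
    return low <= probe_tm(probe) <= high
```

A probe with an estimated Tm near 80°C is preferred; the range check only reflects the broader 40°C to 80°C window stated above.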
The oligonucleotide should preferably be labeled with γ-32P ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling oligonucleotides. Other labeling techniques can also be used. Unincorporated label should preferably be removed by gel filtration chromatography or other established methods. The amount of radioactivity incorporated into the probe should be quantified by measurement in a scintillation counter. Preferably, specific activity of the resulting probe should be approximately 4X 10" dpm/pmole.
The bacterial culture containing the pool of full-length clones should preferably be thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of sterile L-broth containing ampicillin at 100 µg/ml. The culture should preferably be grown to saturation at 37°C, and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of these dilutions should preferably be plated to determine the dilution and volume which will yield approximately 5000 distinct and well-separated colonies on solid bacteriological media containing L-broth containing ampicillin at 100 µg/ml and agar at in a 150 mm petri dish when grown overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can also be employed.
Standard colony hybridization procedures should then be used to transfer the colonies to nitrocellulose filters and lyse, denature and bake them.
The filter is then preferably incubated at 65°C for 1 hour with gentle agitation in 6X SSC (20X stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100 µg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter). Preferably, the probe is then added to the hybridization mix at a concentration greater than or equal to 1X 10 dpm/mL. The filter is then preferably incubated at 65°C with gentle agitation overnight. The filter is then preferably washed in 500 mL of 2X SSC/0.1% SDS at room temperature with gentle shaking for 15 minutes. A third wash with 0.1X SSC/0.5% SDS at 65°C for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to autoradiography for sufficient time to visualize the positives on the X-ray film. Other known hybridization methods can also be employed.
The positive colonies are picked, grown in culture, and plasmid DNA isolated using standard procedures. The clones can then be verified by restriction analysis, hybridization analysis, or DNA sequencing. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be done with primers designed at both ends of the extended cDNA insertion. For example, a PCR reaction may be conducted using a primer having the sequence GGCCATACACTTGAGTGAC (SEQ ID NO:38) and a primer having the sequence ATATAGACAAACGCACACC (SEQ ID NO:39). The PCR product which corresponds to the extended cDNA can then be manipulated using standard cloning techniques familiar to those skilled in the art.
In addition to PCR based methods for obtaining extended cDNAs, traditional hybridization based methods may also be employed. These methods may also be used to obtain the genomic DNAs which encode the mRNAs from which the 5' ESTs were derived, mRNAs corresponding to the extended cDNAs, or nucleic acids which are homologous to extended cDNAs or 5' ESTs. Example 29 below provides an example of such methods.
EXAMPLE 29 Methods for Obtaining Extended cDNAs or Nucleic Acids Homologous to Extended cDNAs or 5' ESTs A full length cDNA library can be made using the strategies described in Examples 13, 14, 15, and 16 above by replacing the random nonamer used in Example 14 with an oligo-dT primer. For instance, the oligonucleotide of SEQ ID NO: 14 may be used. Alternatively, a cDNA library or genomic DNA library may be obtained from a commercial source or made using techniques familiar to those skilled in the art. The library includes cDNAs which are derived from the mRNA corresponding to a 5' EST or which have homology to an extended cDNA or 5' EST. The cDNA library or genomic DNA library is hybridized to a detectable probe comprising at least 10 consecutive nucleotides from the 5' EST or extended cDNA using conventional techniques. Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the 5' EST or extended cDNA. More preferably, the probe comprises at least 20-30 consecutive nucleotides from the 5' EST or extended cDNA.
In some embodiments, the probe comprises more than 30 nucleotides from the 5' EST or extended cDNA.
Techniques for identifying cDNA clones in a cDNA library which hybridize to a given probe sequence are disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, (1989). The same techniques may be used to isolate genomic DNAs.
Briefly, cDNA or genomic DNA clones which hybridize to the detectable probe are identified and isolated for further manipulation as follows. A probe comprising at least 10 consecutive nucleotides from the 5' EST or extended cDNA is labeled with a detectable label such as a radioisotope or a fluorescent molecule. Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the 5' EST or extended cDNA. More preferably, the probe comprises 20-30 consecutive nucleotides from the 5' EST or extended cDNA. In some embodiments, the probe comprises more than 30 nucleotides from the 5' EST or extended cDNA.
Techniques for labeling the probe are well known and include phosphorylation with polynucleotide kinase, nick translation, in vitro transcription, and non-radioactive techniques. The cDNAs or genomic DNAs in the library are transferred to a nitrocellulose or nylon filter and denatured. After incubation of the filter with a blocking solution, the filter is contacted with the labeled probe and incubated for a sufficient amount of time for the probe to hybridize to cDNAs or genomic DNAs containing a sequence capable of hybridizing to the probe.
By varying the stringency of the hybridization conditions used to identify extended cDNAs or genomic DNAs which hybridize to the detectable probe, extended cDNAs having different levels of homology to the probe can be identified and isolated. To identify extended cDNAs or genomic DNAs having a high degree of homology to the probe sequence, the melting temperature of the probe may be calculated using the following formulas: For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated using the formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (600/N), where N is the length of the probe.
If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (0.63% formamide) - (600/N), where N is the length of the probe.
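The melting temperature calculations above may be illustrated by the following sketch. It assumes the widely used form of these equations, namely Tm = 81.5 + 16.6 log10[Na+] + 0.41 (G+C) - 600/N, with a further subtraction of 0.63 degrees per percent formamide. It is also an assumption of this sketch that [Na+] is in moles per liter, that the logarithm is base 10, and that G+C and formamide are entered as percentages. The function names are hypothetical.

```python
import math


def tm_aqueous(na_molar, gc_percent, length):
    """Tm in degrees C for a probe of `length` nucleotides in aqueous buffer."""
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 600.0 / length
    )


def tm_formamide(na_molar, gc_percent, length, formamide_percent):
    """Tm in degrees C when the hybridization solution contains formamide."""
    return tm_aqueous(na_molar, gc_percent, length) - 0.63 * formamide_percent
```

Under these assumptions, a 30-nucleotide probe with 50% G+C in 1 M Na+ has a calculated Tm of about 82°C, or about 50.5°C in 50% formamide.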
Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA, 50% formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.
Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to extended cDNAs or genomic DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 15-25°C below the Tm. Preferably, for hybridizations in 6X SSC, the hybridization is conducted at approximately 68°C. Preferably, for hybridizations in 50% formamide containing solutions, the hybridization is conducted at approximately 42°C.
All of the foregoing hybridizations would be considered to be under "stringent" conditions.
Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for minutes. The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour.
Thereafter, the solution is washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X SSC at room temperature.
Extended cDNAs, nucleic acids homologous to extended cDNAs or 5' ESTs, or genomic DNAs which have hybridized to the probe are identified by autoradiography or other conventional techniques.
The above procedure may be modified to identify extended cDNAs, nucleic acids homologous to extended cDNAs, or genomic DNAs having decreasing levels of homology to the probe sequence. For example, to obtain extended cDNAs, nucleic acids homologous to extended cDNAs, or genomic DNAs of decreasing homology to the detectable probe, less stringent conditions may be used. For example, the hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer having a Na+ concentration of approximately 1M. Following hybridization, the filter may be washed with 2X SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate" conditions above 50°C and "low" conditions below 50°C. Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with 6X SSC, 0.5% SDS at These conditions are considered to be "moderate" conditions above 25% formamide and "low" conditions below 25% formamide.
Extended cDNAs, nucleic acids homologous to extended cDNAs, or genomic DNAs which have hybridized to the probe are identified by autoradiography.
If it is desired to obtain nucleic acids homologous to extended cDNAs, such as allelic variants thereof or nucleic acids encoding proteins related to the proteins encoded by the extended cDNAs, the level of homology between the hybridized nucleic acid and the extended cDNA or 5' EST used as the probe may readily be determined. To determine the level of homology between the hybridized nucleic acid and the extended cDNA or 5' EST from which the probe was derived, the nucleotide sequences of the hybridized nucleic acid and the extended cDNA or 5' EST from which the probe was derived are compared. For example, using the above methods, nucleic acids having at least 95% nucleic acid homology to the extended cDNA or 5' EST from which the probe was derived may be obtained and identified. Similarly, by using progressively less stringent hybridization conditions one can obtain and identify nucleic acids having at least 90%, at least 85%, at least 80% or at least 75% homology to the extended cDNA or 5' EST from which the probe was derived.
To determine whether a clone encodes a protein having a given amount of homology to the protein encoded by the extended cDNA or 5' EST, the amino acid sequence encoded by the extended cDNA or EST is compared to the amino acid sequence encoded by the hybridizing nucleic acid. Homology is determined to exist when an amino acid sequence in the extended cDNA or 5' EST is closely related to an amino acid sequence in the hybridizing nucleic acid. A sequence is closely related when it is identical to that of the extended cDNA or 5' EST or when it contains one or more amino acid substitutions therein in which amino acids having similar characteristics have been substituted for one another. Using the above methods, one can obtain nucleic acids encoding proteins having at least 95%, at least 90%, at least 85%, at least 80% or at least 75% homology to the proteins encoded by the extended cDNA or 5'EST from which the probe was derived.
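The amino acid comparison described above may be sketched as follows. The sketch assumes that the two sequences are already aligned to equal length (with "-" marking gaps) and counts a position as related when the residues are identical or belong to the same group of similar amino acids. The grouping shown is purely illustrative and hypothetical; the text does not specify which substitutions are regarded as similar.

```python
# Hypothetical grouping of amino acids with similar characteristics.
SIMILAR_GROUPS = ("AVLIM", "FWY", "ST", "NQ", "DE", "KRH")


def similar(x, y):
    """True if two residues are identical or fall in the same group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def percent_homology(a, b):
    """Percentage of aligned positions that are identical or similar."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned")
    related = sum(
        1 for x, y in zip(a, b) if x != "-" and y != "-" and similar(x, y)
    )
    return 100.0 * related / len(a)
```

The resulting percentage can then be compared with the 95%, 90%, 85%, 80% or 75% levels mentioned above.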
Alternatively, extended cDNAs may be prepared by obtaining mRNA from the tissue, cell, or organism of interest using mRNA preparation procedures utilizing poly A selection procedures or other techniques known to those skilled in the art. A first primer capable of hybridizing to the poly A tail of the mRNA is hybridized to the mRNA and a reverse transcription reaction is performed to generate a first cDNA strand.
The first cDNA strand is hybridized to a second primer containing at least 10 consecutive nucleotides of the sequences of the 5' EST for which an extended cDNA is desired. Preferably, the primer comprises at least 12, 15, or 17 consecutive nucleotides from the sequences of the 5' EST. More preferably, the primer comprises 20-30 consecutive nucleotides from the sequences of the 5' EST. In some embodiments, the primer comprises more than 30 nucleotides from the sequences of the 5' EST. If it is desired to obtain extended cDNAs containing the full protein coding sequence, including the authentic translation initiation site, the second primer used contains sequences located upstream of the translation initiation site. The second primer is extended to generate a second cDNA strand complementary to the first cDNA strand. Alternatively, RT-PCR may be performed as described above using primers from both ends of the cDNA to be obtained.
Extended cDNAs containing 5' fragments of the mRNA may be prepared by contacting an mRNA comprising the sequence of the 5' EST for which an extended cDNA is desired with a primer comprising at least 10 consecutive nucleotides of the sequences complementary to the 5' EST, hybridizing the primer to the mRNAs, and reverse transcribing the hybridized primer to make a first cDNA strand from the mRNAs.
Preferably, the primer comprises at least 12, 15, or 17 consecutive nucleotides from the 5' EST. More preferably, the primer comprises 20-30 consecutive nucleotides from the 5' EST.
Thereafter, a second cDNA strand complementary to the first cDNA strand is synthesized. The second cDNA strand may be made by hybridizing a primer complementary to sequences in the first cDNA strand to the first cDNA strand and extending the primer to generate the second cDNA strand.
The double stranded extended cDNAs made using the methods described above are isolated and cloned. The extended cDNAs may be cloned into vectors such as plasmids or viral vectors capable of replicating in an appropriate host cell. For example, the host cell may be a bacterial, mammalian, avian, or insect cell.
Techniques for isolating mRNA, reverse transcribing a primer hybridized to mRNA to generate a first cDNA strand, extending a primer to make a second cDNA strand complementary to the first cDNA strand, isolating the double stranded cDNA and cloning the double stranded cDNA are well known to those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (1997); and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, (1989).
Alternatively, kits for obtaining full length cDNAs, such as the GeneTrapper (Cat. No. 10356-020, Gibco, BRL), may be used for obtaining full length cDNAs or extended cDNAs. In this approach, full length or extended cDNAs are prepared from mRNA and cloned into double stranded phagemids. The cDNA library in the double stranded phagemids is then rendered single stranded by treatment with an endonuclease, such as the Gene II product of the phage F1, and Exonuclease III as described in the manual accompanying the GeneTrapper kit. A biotinylated oligonucleotide comprising the sequence of a 5' EST, or a fragment containing at least 10 nucleotides thereof, is hybridized to the single stranded phagemids.
Preferably, the fragment comprises at least 12, 15, or 17 consecutive nucleotides from the 5' EST. More preferably, the fragment comprises 20-30 consecutive nucleotides from the 5' EST. In some procedures, the fragment may comprise more than 30 consecutive nucleotides from the 5' EST.
Hybrids between the biotinylated oligonucleotide and phagemids having inserts containing the EST sequence are isolated by incubating the hybrids with streptavidin coated paramagnetic beads and retrieving the beads with a magnet. Thereafter, the resulting phagemids containing the 5' EST sequence are released from the beads and converted into double stranded DNA using a primer specific for the 5' EST sequence. The resulting double stranded DNA is transformed into bacteria. Extended cDNAs containing the 5' EST sequence are identified by colony PCR or colony hybridization.
A plurality of extended cDNAs containing full length protein coding sequences or sequences encoding only the mature protein remaining after the signal peptide is cleaved may be provided as cDNA libraries for subsequent evaluation of the encoded proteins or use in diagnostic assays as described below.
IV. Expression of Proteins Encoded by Extended cDNAs Isolated Using 5' ESTs Extended cDNAs containing the full protein coding sequences of their corresponding mRNAs or portions thereof, such as cDNAs encoding the mature protein, may be used to express the secreted proteins or portions thereof which they encode as described in Example 30 below. If desired, the extended cDNAs may contain the sequences encoding the signal peptide to facilitate secretion of the expressed protein. It will be appreciated that a plurality of extended cDNAs containing the full protein coding sequences or portions thereof may be simultaneously cloned into expression vectors to create an expression library for analysis of the encoded proteins as described below.
EXAMPLE 30 Expression of the Proteins Encoded by Extended cDNAs or Portions Thereof To express the proteins encoded by the extended cDNAs or portions thereof, nucleic acids containing the coding sequence for the proteins or portions thereof to be expressed are obtained as described in Examples 27-29 and cloned into a suitable expression vector. If desired, the nucleic acids may contain the sequences encoding the signal peptide to facilitate secretion of the expressed protein. For example, the nucleic acid may comprise the sequence of one of SEQ ID NOs: 134-180 listed in Table VII and in the accompanying sequence listing. Alternatively, the nucleic acid may comprise those nucleotides which make up the full coding sequence of one of the sequences of SEQ ID NOs: 134-180 as defined in Table VII above.
It will be appreciated that should the extent of the full coding sequence (i.e. the sequence encoding the signal peptide and the mature protein resulting from cleavage of the signal peptide) differ from that listed in Table VII as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the full coding sequences in the sequences of SEQ ID NOs. 134-180. Accordingly, the scope of any claims herein relating to nucleic acids containing the full coding sequence of one of SEQ ID NOs. 134-180 is not to be construed as excluding any readily identifiable variations from or equivalents to the full coding sequences listed in Table VII. Similarly, should the extent of the full length polypeptides differ from those indicated in Table VIII as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the amino acid sequence of the full length polypeptides is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table VIII.
Alternatively, the nucleic acid used to express the protein or portion thereof may comprise those nucleotides which encode the mature protein (i.e. the protein created by cleaving the signal peptide off) encoded by one of the sequences of SEQ ID NOs: 134-180 as defined in Table VII.
It will be appreciated that should the extent of the sequence encoding the mature protein differ from that listed in Table VII as a result of a sequencing error, reverse transcription or amplification error, niRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the sequence encoding the mature protein in the sequences of SEQ ID NOs: 134-180. Accordingly, the scope of any claims herein relating to nucleic acids containing the sequence encoding the mature protein encoded by one of SEQ ID NOs: 134-180 is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table VII. Thus, claims relating to nucleic acids containing the sequence encoding the mature protein encompass equivalents to the sequences listed in Table VII, such as sequences encoding biologically active proteins resulting from post-translational modification, enzymatic cleavage, or other readily identifiable variations from or equivalents to the proteins in addition to cleavage of the signal peptide. Similarly, should the extent of the mature polypeptides differ from those indicated in Table VIII as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the sequence of a mature protein included in the sequence of one of SEQ ID NOs. 181-227 is not to be construed as excluding any readily identifiable variations from or equivalents to the sequences listed in Table VIII. Thus, claims relating to polypeptides comprising the sequence of the mature protein encompass equivalents to the sequences listed in Table VIII. such as biologically active proteins resulting from post-translational modification, enzymatic cleavage, or other readily identifiable variations from or equivalents to the proteins in addition to cleavage of the signal peptide. 
It will also be appreciated that should the biologically active form of the polypeptides included in the sequence of one of SEQ ID NOs: 181-227 or the nucleic acids encoding the biologically active form of the polypeptides differ from those identified as the mature polypeptide in Table VIII or the nucleotides encoding the mature polypeptide in Table VII as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the amino acids in the biologically active form of the polypeptides and the nucleic acids encoding the biologically active form of the polypeptides. In such instances, the claims relating to polypeptides comprising the mature protein included in one of SEQ ID NOs: 181-227 or nucleic acids comprising the nucleotides of one of SEQ ID NOs: 134-180 encoding the mature protein shall not be construed to exclude any readily identifiable variations from the sequences listed in Table VII and Table VIII.
In some embodiments, the nucleic acid used to express the protein or portion thereof may comprise those nucleotides which encode the signal peptide encoded by one of the sequences of SEQ ID NOs: 134-180 as defined in Table VII above.
It will be appreciated that should the extent of the sequence encoding the signal peptide differ from that listed in Table VII as a result of a sequencing error, reverse transcription or amplification error, mRNA splicing, post-translational modification of the encoded protein, enzymatic cleavage of the encoded protein, or other biological factors, one skilled in the art would be readily able to identify the extent of the sequence encoding the signal peptide in the sequences of SEQ ID NOs: 134-180. Accordingly, the scope of any claims herein relating to nucleic acids containing the sequence encoding the signal peptide encoded by one of SEQ ID NOs: 134-180 is not to be construed as excluding any readily identifiable variations from the sequences listed in Table VII. Similarly, should the extent of the signal peptides differ from those indicated in Table VIII as a result of any of the preceding factors, the scope of claims relating to polypeptides comprising the sequence of a signal peptide included in the sequence of one of SEQ ID NOs: 181-227 is not to be construed as excluding any readily identifiable variations from the sequences listed in Table VIII.
Alternatively, the nucleic acid may encode a polypeptide comprising at least 10 consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227. In some embodiments, the nucleic acid may encode a polypeptide comprising at least 15 consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227. In other embodiments, the nucleic acid may encode a polypeptide comprising at least consecutive amino acids of one of the sequences of SEQ ID NOs: 181-227.
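By way of illustration only, the "at least N consecutive amino acids" criterion can be checked computationally. The following Python sketch is not part of the described method; the function names and the example sequence are hypothetical, and the test simply asks whether a candidate polypeptide contains any window of N consecutive residues taken from a reference sequence.

```python
def consecutive_fragments(reference: str, min_len: int = 10):
    """Yield every window of exactly min_len consecutive residues of reference."""
    for start in range(len(reference) - min_len + 1):
        yield reference[start:start + min_len]


def contains_fragment(candidate: str, reference: str, min_len: int = 10) -> bool:
    """Return True if candidate contains at least min_len consecutive
    residues of reference (one-letter amino acid codes assumed)."""
    return any(
        fragment in candidate
        for fragment in consecutive_fragments(reference, min_len)
    )


if __name__ == "__main__":
    # Hypothetical reference sequence, used only to exercise the functions.
    reference = "MKTLLLTLVVVTIVCLDLGYT"
    candidate = "GG" + reference[:12] + "GG"
    print(contains_fragment(candidate, reference, min_len=10))
```

Because any stretch longer than N residues necessarily contains a window of exactly N residues, scanning fixed-length windows is sufficient for an "at least N" test. A fragment of N amino acids corresponds to 3N nucleotides of coding sequence.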
The nucleic acids inserted into the expression vectors may also contain sequences upstream of the sequences encoding the signal peptide, such as sequences which regulate expression levels or sequences which confer tissue specific expression.
The nucleic acid encoding the protein or polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional cloning technology. The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism in which the expression vector is introduced, as explained by Hatfield, et al., U.S. Patent No. 5,082,767.
The following is provided as one exemplary method to express the proteins encoded by the extended cDNAs corresponding to the 5' ESTs or the nucleic acids described above. First, the methionine initiation codon for the gene and the poly A signal of the gene are identified. If the nucleic acid encoding the polypeptide to be expressed lacks a methionine to serve as the initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using conventional techniques. Similarly, if the extended cDNA lacks a poly A signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection. The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene. The extended cDNA or portion thereof encoding the polypeptide to be expressed is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the extended cDNA or portion thereof and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding cDNA 3' primer, taking care to ensure that the extended cDNA is positioned in frame with the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A signal and digested with BglII.
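The primer design step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names, the annealing length and the example coding sequence are hypothetical. Only the recognition sequences for PstI (CTGCAG) and BglII (AGATCT) are standard. The sketch prepends each site to the 5' end of its primer and checks that the coding sequence starts with an initiating methionine codon and has a length divisible by three.

```python
PST_I_SITE = "CTGCAG"   # PstI recognition sequence
BGL_II_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(coding_seq: str, anneal_len: int = 18):
    """Return (forward, reverse) primers carrying 5' restriction sites.

    anneal_len is an illustrative choice; practical primers also need a
    few extra 5' bases for efficient digestion and a melting-temperature check.
    """
    coding_seq = coding_seq.upper()
    if not coding_seq.startswith("ATG"):
        raise ValueError("coding sequence lacks an initiating methionine codon")
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    forward = PST_I_SITE + coding_seq[:anneal_len]
    reverse = BGL_II_SITE + reverse_complement(coding_seq[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 27-nucleotide open reading frame, for demonstration only.
    fwd, rev = design_primers("ATGGCCAAGCTGACCTCGGTGCTGTAA", anneal_len=9)
    print(fwd)
    print(rev)
```

The frame check here covers only the insert itself. Whether the insert remains in frame with downstream vector sequences depends on the vector junction, which this sketch does not model.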
The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). Preferably the expressed protein is released into the culture medium, thereby facilitating purification.
Alternatively, the extended cDNAs may be cloned into pED6dpc2 as described above. The resulting pED6dpc2 constructs may be transfected into a suitable host cell, such as COS 1 cells.
Methotrexate resistant cells are selected and expanded. Preferably, the protein expressed from the extended cDNA is released into the culture medium thereby facilitating purification.
Proteins in the culture medium are separated by gel electrophoresis. If desired, the proteins may be ammonium sulfate precipitated or separated based on size or charge prior to electrophoresis.
As a control, the expression vector lacking a cDNA insert is introduced into host cells or organisms and the proteins in the medium are harvested. The secreted proteins present in the medium are detected using techniques such as Coomassie or silver staining or using antibodies against the protein encoded by the extended cDNA. Coomassie and silver staining techniques are familiar to those skilled in the art.
Antibodies capable of specifically recognizing the protein of interest may be generated using synthetic 15-mer peptides having a sequence encoded by the appropriate 5' EST, extended cDNA, or portion thereof. The synthetic peptides are injected into mice to generate antibody to the polypeptide encoded by the 5' EST, extended cDNA, or portion thereof.
Secreted proteins from the host cells or organisms containing an expression vector which contains the extended cDNA derived from a 5' EST or a portion thereof are compared to those from the control cells or organism. The presence of a band in the medium from the cells containing the expression vector which is absent in the medium from the control cells indicates that the extended cDNA encodes a secreted protein.
Generally, the band corresponding to the protein encoded by the extended cDNA will have a mobility near that expected based on the number of amino acids in the open reading frame of the extended cDNA.
However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
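As a rough illustration of how an expected mobility may be estimated from the number of amino acids in the open reading frame, the following Python sketch multiplies the residue count by an average residue mass. The value of about 110 Da per residue is a common rule of thumb and is an assumption here, not a figure from this specification; the function name and parameters are likewise hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption


def expected_mass_kda(n_residues: int, signal_peptide_len: int = 0) -> float:
    """Estimate the mass in kDa of the unmodified polypeptide.

    signal_peptide_len residues are subtracted to approximate the mature,
    secreted form after cleavage of the signal peptide.
    """
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if not 0 <= signal_peptide_len < n_residues:
        raise ValueError("signal peptide must be shorter than the protein")
    mature_residues = n_residues - signal_peptide_len
    return mature_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Hypothetical 300-residue precursor with a 20-residue signal peptide.
    print(expected_mass_kda(300))
    print(expected_mass_kda(300, signal_peptide_len=20))
```

Such an estimate ignores glycosylation, ubiquitination and further proteolytic processing, which is why an observed band may migrate differently from the calculated value.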
Alternatively, if the protein expressed from the above expression vectors does not contain sequences directing its secretion, the proteins expressed from host cells containing an expression vector containing an insert encoding a secreted protein or portion thereof can be compared to the proteins expressed in host cells containing the expression vector without an insert. The presence of a band in samples from cells containing the expression vector with an insert which is absent in samples from cells containing the expression vector without an insert indicates that the desired protein or portion thereof is being expressed. Generally, the band will have the mobility expected for the secreted protein or portion thereof. However, the band may have a mobility different than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic cleavage.
The protein encoded by the extended cDNA may be purified using standard immunochromatography techniques. In such procedures, a solution containing the secreted protein, such as the culture medium or a cell extract, is applied to a column having antibodies against the secreted protein attached to the chromatography matrix. The secreted protein is allowed to bind the immunochromatography column. Thereafter, the column is washed to remove non-specifically bound proteins. The specifically bound secreted protein is then released from the column and recovered using standard techniques.
If antibody production is not possible, the extended cDNA sequence or portion thereof may be incorporated into expression vectors designed for use in purification schemes employing chimeric polypeptides. In such strategies the coding sequence of the extended cDNA or portion thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the chimera may be β-globin or a nickel binding polypeptide encoding sequence. A chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel binding polypeptide and the extended cDNA or portion thereof. Thus, the two polypeptides of the chimera may be separated from one another by protease digestion.
One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques as described are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al. (Basic Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey, ed., Elsevier Press, NY, 1986) and many of the methods are available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be produced from the construct using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).
Following expression and purification of the secreted proteins encoded by the 5' ESTs, extended cDNAs, or fragments thereof, the purified proteins may be tested for the ability to bind to the surface of various cell types as described in Example 31 below. It will be appreciated that a plurality of proteins expressed from these cDNAs may be included in a panel of proteins to be simultaneously evaluated for the activities specifically described below, as well as other biological roles for which assays for determining activity are available.
EXAMPLE 31
Analysis of Secreted Proteins to Determine Whether They Bind to the Cell Surface
The proteins encoded by the 5' ESTs, extended cDNAs, or fragments thereof are cloned into expression vectors such as those described in Example 30. The proteins are purified by size, charge, immunochromatography or other techniques familiar to those skilled in the art. Following purification, the proteins are labeled using techniques known to those skilled in the art. The labeled proteins are incubated with cells or cell lines derived from a variety of organs or tissues to allow the proteins to bind to any receptor present on the cell surface. Following the incubation, the cells are washed to remove nonspecifically bound protein. The labeled proteins are detected by autoradiography. Alternatively, unlabeled proteins may be incubated with the cells and detected with antibodies having a detectable label, such as a fluorescent molecule, attached thereto.
Specificity of cell surface binding may be analyzed by conducting a competition analysis in which various amounts of unlabeled protein are incubated along with the labeled protein. The amount of labeled protein bound to the cell surface decreases as the amount of competitive unlabeled protein increases. As a control, various amounts of an unlabeled protein unrelated to the labeled protein are included in some binding reactions. The amount of labeled protein bound to the cell surface does not decrease in binding reactions containing increasing amounts of unrelated unlabeled protein, indicating that the protein encoded by the cDNA binds specifically to the cell surface.
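The expected outcome of such a competition analysis can be illustrated with a simple equilibrium model. The Python sketch below assumes a single class of binding site, equal affinity of labeled and unlabeled protein for that site, and no depletion of free protein; none of these assumptions is stated in this example, and the function and parameter names are hypothetical. Under these assumptions, the fraction of sites occupied by labeled protein is L / (L + U + Kd), where L and U are the labeled and unlabeled concentrations and Kd is the dissociation constant.

```python
def labeled_fraction_bound(
    labeled: float,
    unlabeled: float,
    kd: float,
    competitor_binds: bool = True,
) -> float:
    """Fraction of binding sites occupied by labeled protein at equilibrium.

    All concentrations share the same (arbitrary) units as kd. An unrelated
    control protein is modeled as not binding the site at all.
    """
    if labeled < 0 or unlabeled < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and kd positive")
    effective_competitor = unlabeled if competitor_binds else 0.0
    return labeled / (labeled + effective_competitor + kd)


if __name__ == "__main__":
    # Illustrative values only: labeled protein at a concentration equal to kd.
    for competitor in (0.0, 2.0, 8.0):
        specific = labeled_fraction_bound(1.0, competitor, kd=1.0)
        control = labeled_fraction_bound(
            1.0, competitor, kd=1.0, competitor_binds=False
        )
        print(competitor, specific, control)
```

In this model the signal falls as homologous unlabeled protein is added and stays constant when an unrelated protein is added, which is the pattern described above as indicating specific binding.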
As discussed above, secreted proteins have been shown to have a number of important physiological effects and, consequently, represent a valuable therapeutic resource. The secreted proteins encoded by the extended cDNAs or portions thereof made according to Examples 27-29 may be evaluated to determine their physiological activities as described below.
EXAMPLE 32
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Cytokine, Cell Proliferation or Cell Differentiation Activity
As discussed above, secreted proteins may act as cytokines or may affect cellular proliferation or differentiation. Many protein factors discovered to date, including all known cytokines, have exhibited activity in one or more factor dependent cell proliferation assays, and hence the assays serve as a convenient confirmation of cytokine activity. The activity of a protein of the present invention is evidenced by any one of a number of routine factor dependent cell proliferation assays for cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and CMK. The proteins encoded by the above extended cDNAs or portions thereof may be evaluated for their ability to regulate T cell or thymocyte proliferation in assays such as those described above or in the following references: Current Protocols in Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500 (1986); Bertagnolli et al., J. Immunol. 145:1706-1712 (1990); Bertagnolli et al., Cellular Immunology 133:327-341 (1991); Bertagnolli et al., J. Immunol. 149:3778-3783 (1992); and Bowman et al., J. Immunol. 152:1756-1761 (1994).
In addition, numerous assays for cytokine production and/or the proliferation of spleen cells, lymph node cells and thymocytes are known. These include the techniques disclosed in Current Protocols in Immunology, J.E. Coligan et al. Eds., Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. (1994); and Schreiber, R.D. Current Protocols in Immunology, supra, Vol 1 pp. 6.8.1-6.8.8, John Wiley and Sons, Toronto. (1994).
The proteins encoded by the cDNAs may also be assayed for the ability to regulate the proliferation and differentiation of hematopoietic or lymphopoietic cells. Many assays for such activity are familiar to those skilled in the art, including the assays in the following references: Bottomly, Davis, L.S. and Lipsky, Measurement of Human and Murine Interleukin 2 and Interleukin 4, Current Protocols in Immunology, J.E. Coligan et al. Eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons, Toronto. (1991); deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692 (1988); Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938 (1983); Nordan, Measurement of Mouse and Human Interleukin 6, Current Protocols in Immunology, J.E. Coligan et al. Eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and Sons, Toronto. (199 Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Bennett, F., Giannotti, Clark, S.C. and Turner, Measurement of Human Interleukin 11, Current Protocols in Immunology, J.E. Coligan et al. Eds. Vol 1 pp. 6.15.1 John Wiley and Sons, Toronto. (199 and Ciarletta, Giannotti, J., Clark, S.C. and Turner, Measurement of Mouse and Human Interleukin 9, Current Protocols in Immunology, J.E. Coligan et al., Eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. (1991).
The proteins encoded by the cDNAs may also be assayed for their ability to regulate T-cell responses to antigens. Many assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function), Chapter 6 (Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic Studies in Humans), Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095 (1980); Weinberger et al., Eur. J. Immunol. 11:405-411 (198 Takai et al., J. Immunol. 137:3494-3500 (1986); and Takai et al., J. Immunol. 140:508-512 (1988).
Those proteins which exhibit cytokine, cell proliferation, or cell differentiation activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which induction of cell proliferation or differentiation is beneficial. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 33
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Activity as Immune System Regulators
The proteins encoded by the cDNAs may also be evaluated for their effects as immune regulators. For example, the proteins may be evaluated for their activity to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are familiar to those skilled in the art, including the assays described in the following references: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans), Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492 (198 Herrmann et al., J. Immunol. 128:1968-1974 (1982); Handa et al., J. Immunol. 135:1564-1572 (1985); Takai et al., J. Immunol. 137:3494-3500 (1986); Takai et al., J. Immunol. 140:508-512 (1988); Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492 (198 Herrmann et al., J. Immunol. 128:1968-1974 (1982); Handa et al., J. Immunol. 135:1564-1572 (1985); Takai et al., J. Immunol. 137:3494-3500 (1986); Bowman et al., J. Virology 61:1992-1998; Takai et al., J. Immunol. 140:508-512 (1988); Bertagnolli et al., Cellular Immunology 133:327-341 (1991); and Brown et al., J. Immunol. 153:3079-3092 (1994).
The proteins encoded by the cDNAs may also be evaluated for their effects on T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Maliszewski, J. Immunol. 144:3028-3033 (1990); and Mond, J.J. and Brunswick, M. Assays for B Cell Function: In vitro Antibody Production, Vol 1 pp. 3.8.1-3.8.16, Current Protocols in Immunology, J.E. Coligan et al. Eds., John Wiley and Sons, Toronto. (1994).
The proteins encoded by the cDNAs may also be evaluated for their effect on immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 3 (In Vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic Studies in Humans), Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500 (1986); Takai et al., J. Immunol. 140:508-512 (1988); and Bertagnolli et al., J. Immunol. 149:3778-3783 (1992).
The proteins encoded by the cDNAs may also be evaluated for their effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Guery et al., J. Immunol. 134:536-544 (1995); Inaba et al., Journal of Experimental Medicine 173:549-559 (199 Macatonia et al., J. Immunol. 154:5071-5079 (1995); Porgador et al., Journal of Experimental Medicine 182:255-260 (1995); Nair et al., Journal of Virology 67:4062-4069 (1993); Huang et al., Science 264:961-965 (1994); Macatonia et al., Journal of Experimental Medicine 169:1255-1264 (1989); Bhardwaj et al., Journal of Clinical Investigation 94:797-807 (1994); and Inaba et al., Journal of Experimental Medicine 172:631-640 (1990).
The proteins encoded by the cDNAs may also be evaluated for their influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Darzynkiewicz et al., Cytometry 13:795-808 (1992); Gorczyca et al., Leukemia 7:659-670 (1993); Gorczyca et al., Cancer Research 53:1945-1951 (1993); Itoh et al., Cell 66:233-243 (1991); Zacharchuk et al., J. Immunol. 145:4037-4045 (1990); Zamai et al., Cytometry 14:891-897 (1993); and Gorczyca et al., International Journal of Oncology 1:639-648 (1992).
Assays for proteins that influence early steps of T-cell commitment and development include, without limitation, those described in: Antica et al., Blood 84:111-117 (1994); Fine et al., Cellular Immunology 155:111-122 (1994); Galy et al., Blood 85:2770-2778 (1995); and Toki et al., Proc. Natl. Acad. Sci. USA 88:7548-7551 (1991).
Those proteins which exhibit activity as immune system regulators may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of immune activity is beneficial.
For example, the protein may be useful in the treatment of various immune deficiencies and disorders (including severe combined immunodeficiency (SCID)), in regulating (up or down) growth and proliferation of T and/or B lymphocytes, as well as affecting the cytolytic activity of NK cells and other cell populations. These immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal infections, or may result from autoimmune disorders. More specifically, infectious diseases caused by viral, bacterial, fungal or other infection may be treatable using a protein of the present invention, including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., malaria spp. and various fungal infections such as candidiasis. Of course, in this regard, a protein of the present invention may also be useful where a boost to the immune system generally may be desirable, i.e., in the treatment of cancer.
Autoimmune disorders which may be treated using a protein of the present invention include, for example, connective tissue disease, multiple sclerosis, systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, Guillain-Barre syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia gravis, graft-versus-host disease and autoimmune inflammatory eye disease.
Such a protein of the present invention may also be useful in the treatment of allergic reactions and conditions, such as asthma (particularly allergic asthma) or other respiratory problems. Other conditions in which immune suppression is desired (including, for example, organ transplantation) may also be treatable using a protein of the present invention.
Using the proteins of the invention it may also be possible to regulate immune responses in a number of ways. Down regulation may be in the form of inhibiting or blocking an immune response already in progress or may involve preventing the induction of an immune response. The functions of activated T cells may be inhibited by suppressing T cell responses or by inducing specific tolerance in T cells, or both. Immunosuppression of T cell responses is generally an active, non-antigen-specific process which requires continuous exposure of the T cells to the suppressive agent. Tolerance, which involves inducing nonresponsiveness or anergy in T cells, is distinguishable from immunosuppression in that it is generally antigen-specific and persists after exposure to the tolerizing agent has ceased. Operationally, tolerance can be demonstrated by the lack of a T cell response upon reexposure to specific antigen in the absence of the tolerizing agent.
Down regulating or preventing one or more antigen functions (including without limitation B lymphocyte antigen functions (such as, for example, B7)), e.g., preventing high level lymphokine synthesis by activated T cells, will be useful in situations of tissue, skin and organ transplantation and in graft-versus-host disease (GVHD). For example, blockage of T cell function should result in reduced tissue destruction in tissue transplantation. Typically, in tissue transplants, rejection of the transplant is initiated through its recognition as foreign by T cells, followed by an immune reaction that destroys the transplant. The administration of a molecule which inhibits or blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as a soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior to transplantation can lead to the binding of the molecule to the natural ligand(s) on the immune cells without transmitting the corresponding costimulatory signal. Blocking B lymphocyte antigen function in this manner prevents cytokine synthesis by immune cells, such as T cells, and thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of long-term tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated administration of these blocking reagents. To achieve sufficient immunosuppression or tolerance in a subject, it may also be necessary to block the function of a combination of B lymphocyte antigens.
The efficacy of particular blocking reagents in preventing organ transplant rejection or GVHD can be assessed using animal models that are predictive of efficacy in humans. Examples of appropriate systems which can be used include allogeneic cardiac grafts in rats and xenogeneic pancreatic islet cell grafts in mice, both of which have been used to examine the immunosuppressive effects of CTLA4Ig fusion proteins in vivo as described in Lenschow et al., Science 257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci. USA 89:11102-11105 (1992). In addition, murine models of GVHD (see Paul ed., Fundamental Immunology, Raven Press, New York, (1989), pp. 846-847) can be used to determine the effect of blocking B lymphocyte antigen function in vivo on the development of that disease.
Blocking antigen function may also be therapeutically useful for treating autoimmune diseases.
Many autoimmune disorders are the result of inappropriate activation of T cells that are reactive against self tissue and which promote the production of cytokines and autoantibodies involved in the pathology of the diseases. Preventing the activation of autoreactive T cells may reduce or eliminate disease symptoms. Administration of reagents which block costimulation of T cells by disrupting receptor ligand interactions of B lymphocyte antigens can be used to inhibit T cell activation and prevent production of autoantibodies or T cell-derived cytokines which may be involved in the disease process. Additionally, blocking reagents may induce antigen-specific tolerance of autoreactive T cells which could lead to long-term relief from the disease. The efficacy of blocking reagents in preventing or alleviating autoimmune disorders can be determined using a number of well-characterized animal models of human autoimmune diseases. Examples include murine experimental autoimmune encephalitis, systemic lupus erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine autoimmune collagen arthritis, diabetes mellitus in NOD mice and BB rats, and murine experimental myasthenia gravis (see Paul ed., Fundamental Immunology, Raven Press, New York, (1989), pp. 840-856).
Upregulation of an antigen function (preferably a B lymphocyte antigen function), as a means of up regulating immune responses, may also be useful in therapy. Upregulation of immune responses may be in the form of enhancing an existing immune response or eliciting an initial immune response. For example, enhancing an immune response through stimulating B lymphocyte antigen function may be useful in cases of viral infection. In addition, systemic viral diseases such as influenza, the common cold, and encephalitis might be alleviated by the administration of a stimulatory form of B lymphocyte antigens systemically.
Alternatively, anti-viral immune responses may be enhanced in an infected patient by removing T cells from the patient, costimulating the T cells in vitro with viral antigen-pulsed APCs either expressing a peptide of the present invention or together with a stimulatory form of a soluble peptide of the present invention and reintroducing the in vitro activated T cells into the patient. The infected cells would now be capable of delivering a costimulatory signal to T cells in vivo, thereby activating the T cells.
In another application, up regulation or enhancement of antigen function (preferably B lymphocyte antigen function) may be useful in the induction of tumor immunity. Tumor cells (e.g., sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma) transfected with a nucleic acid encoding at least one peptide of the present invention can be administered to a subject to overcome tumor-specific tolerance in the subject. If desired, the tumor cell can be transfected to express a combination of peptides. For example, tumor cells obtained from a patient can be transfected ex vivo with an expression vector directing the expression of a peptide having B7-2-like activity alone, or in conjunction with a peptide having B7-1-like activity and/or B7-3-like activity. The transfected tumor cells are returned to the patient to result in expression of the peptides on the surface of the transfected cell. Alternatively, gene therapy techniques can be used to target a tumor cell for transfection in vivo.
The presence of the peptide of the present invention having the activity of a B lymphocyte antigen(s) on the surface of the tumor cell provides the necessary costimulation signal to T cells to induce a T cell mediated immune response against the transfected tumor cells. In addition, tumor cells which lack MHC class I or MHC class II molecules, or which fail to reexpress sufficient amounts of MHC class I or MHC class II molecules, can be transfected with nucleic acids encoding all or a portion of (e.g., a cytoplasmic-domain truncated portion) an MHC class I α chain protein and β2 microglobulin protein or an MHC class II α chain protein and an MHC class II β chain protein to thereby express MHC class I or MHC class II proteins on the cell surface. Expression of the appropriate class I or class II MHC in conjunction with a peptide having the activity of a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated immune response against the transfected tumor cell. Optionally, a gene encoding an antisense construct which blocks expression of an MHC class II associated protein, such as the invariant chain, can also be cotransfected with a DNA encoding a peptide having the activity of a B lymphocyte antigen to promote presentation of tumor associated antigens and induce tumor specific immunity. Thus, the induction of a T cell mediated immune response in a human subject may be sufficient to overcome tumor-specific tolerance in the subject.
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 34
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Hematopoiesis Regulating Activity
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for their hematopoiesis regulating activity. For example, the effect of the proteins on embryonic stem cell differentiation may be evaluated. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Johansson et al., Cellular Biology 15:141-151 (1995); Keller et al., Molecular and Cellular Biology 13:473-486 (1993); and McClanahan et al., Blood 81:2903-2915 (1993).
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for their influence on the lifetime of stem cells and stem cell differentiation. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Freshney, M.G. Methylcellulose Colony Forming Assays, Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 265-268, Wiley-Liss, Inc., New York, NY. (1994); Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911 (1992); McNiece, I.K. and Briddell, R.A. Primitive Hematopoietic Colony Forming Cells with High Proliferative Potential, Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, NY. (1994); Neben et al., Experimental Hematology 22:353-359 (1994); Ploemacher, R.E. Cobblestone Area Forming Cell Assay, Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New York, NY. (1994); Spooncer, Dexter, M. and Allen, T. Long Term Bone Marrow Cultures in the Presence of Stromal Cells, Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New York, NY. (1994); and Sutherland, H.J. Long Term Culture Initiating Cell Assay, Culture of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc., New York, NY. (1994).
Those proteins which exhibit hematopoiesis regulatory activity may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of hematopoiesis is beneficial. For example, a protein of the present invention may be useful in regulation of hematopoiesis and, consequently, in the treatment of myeloid or lymphoid cell deficiencies. Even marginal biological activity in support of colony forming cells or of factor-dependent cell lines indicates involvement in regulating hematopoiesis, e.g. in supporting the growth and proliferation of erythroid progenitor cells alone or in combination with other cytokines, thereby indicating utility, for example, in treating various anemias or for use in conjunction with irradiation/chemotherapy to stimulate the production of erythroid precursors and/or erythroid cells; in supporting the growth and proliferation of myeloid cells such as granulocytes and monocytes/macrophages (i.e., traditional CSF activity) useful, for example, in conjunction with chemotherapy to prevent or treat consequent myelo-suppression; in supporting the growth and proliferation of megakaryocytes and consequently of platelets, thereby allowing prevention or treatment of various platelet disorders such as thrombocytopenia, and generally for use in place of or complementary to platelet transfusions; and/or in supporting the growth and proliferation of hematopoietic stem cells which are capable of maturing to any and all of the above-mentioned hematopoietic cells and therefore find therapeutic utility in various stem cell disorders (such as those usually treated with transplantation, including, without limitation, aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well as in repopulating the stem cell compartment post irradiation/chemotherapy, either in vivo or ex vivo (i.e., in conjunction with bone marrow transplantation or with peripheral progenitor cell transplantation (homologous or heterologous)) as normal cells or genetically manipulated for gene therapy. Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Regulation of Tissue Growth
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for their effect on tissue growth. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in International Patent Publication No. WO95/16035, International Patent Publication No. WO95/05846 and International Patent Publication No. WO91/07491.
Assays for wound healing activity include, without limitation, those described in: Winter, Epidermal Wound Healing, pps. 71-112 (Maibach, HI and Rovee, DT, eds.), Year Book Medical Publishers, Inc., Chicago, as modified by Eaglstein and Mertz, J. Invest. Dermatol. 71:382-84 (1978).
Those proteins which are involved in the regulation of tissue growth may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of tissue growth is beneficial. For example, a protein of the present invention also may have utility in compositions used for bone, cartilage, tendon, ligament and/or nerve tissue growth or regeneration, as well as for wound healing and tissue repair and replacement, and in the treatment of burns, incisions and ulcers.
A protein of the present invention, which induces cartilage and/or bone growth in circumstances where bone is not normally formed, has application in the healing of bone fractures and cartilage damage or defects in humans and other animals. Such a preparation employing a protein of the invention may have prophylactic use in closed as well as open fracture reduction and also in the improved fixation of artificial joints. De novo bone formation induced by an osteogenic agent contributes to the repair of congenital, trauma induced, or oncologic resection induced craniofacial defects, and also is useful in cosmetic plastic surgery.
A protein of this invention may also be used in the treatment of periodontal disease, and in other tooth repair processes. Such agents may provide an environment to attract bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of progenitors of bone-forming cells. A protein of the invention may also be useful in the treatment of osteoporosis or osteoarthritis, such as through stimulation of bone and/or cartilage repair or by blocking inflammation or processes of tissue destruction (collagenase activity, osteoclast activity, etc.) mediated by inflammatory processes.
Another category of tissue regeneration activity that may be attributable to the protein of the present invention is tendon/ligament formation. A protein of the present invention, which induces tendon/ligament-like tissue or other tissue formation in circumstances where such tissue is not normally formed, has application in the healing of tendon or ligament tears, deformities and other tendon or ligament defects in humans and other animals. Such a preparation employing a tendon/ligament-like tissue inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing defects to tendon or ligament tissue.
De novo tendon/ligament-like tissue formation induced by a composition of the present invention contributes to the repair of congenital, trauma induced, or other tendon or ligament defects of other origin, and is also useful in cosmetic plastic surgery for attachment or repair of tendons or ligaments. The compositions of the present invention may provide an environment to attract tendon- or ligament-forming cells, stimulate growth of tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or ligament-forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to effect tissue repair. The compositions of the invention may also be useful in the treatment of tendinitis, carpal tunnel syndrome and other tendon or ligament defects. The compositions may also include an appropriate matrix and/or sequestering agent as a carrier as is well known in the art.
The protein of the present invention may also be useful for proliferation of neural cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic disorders, which involve degeneration, death or trauma to neural cells or nerve tissue. More specifically, a protein may be used in the treatment of diseases of the peripheral nervous system, such as peripheral nerve injuries, peripheral neuropathy and localized neuropathies, and central nervous system diseases, such as Alzheimer's, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, and Shy-Drager syndrome. Further conditions which may be treated in accordance with the present invention include mechanical and traumatic disorders, such as spinal cord disorders, head trauma and cerebrovascular diseases such as stroke. Peripheral neuropathies resulting from chemotherapy or other medical therapies may also be treatable using a protein of the invention.
Proteins of the invention may also be useful to promote better or faster closure of non-healing wounds, including without limitation pressure ulcers, ulcers associated with vascular insufficiency, surgical and traumatic wounds, and the like. It is expected that a protein of the present invention may also exhibit activity for generation or regeneration of other tissues, such as organs (including, for example, pancreas, liver, intestine, kidney, skin, endothelium), muscle (smooth, skeletal or cardiac) and vascular (including vascular endothelium) tissue, or for promoting the growth of cells comprising such tissues. Part of the desired effects may be by inhibition or modulation of fibrotic scarring to allow normal tissue to generate. A protein of the invention may also exhibit angiogenic activity.
A protein of the present invention may also be useful for gut protection or regeneration and treatment of lung or liver fibrosis, reperfusion injury in various tissues, and conditions resulting from systemic cytokine damage.
A protein of the present invention may also be useful for promoting or inhibiting differentiation of tissues described above from precursor tissues or cells; or for inhibiting the growth of tissues described above.
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 36
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Regulation of Reproductive Hormones or Cell Movement
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for their ability to regulate reproductive hormones, such as follicle stimulating hormone. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Vale et al., Endocrinology 91:562-572 (1972); Ling et al., Nature 321:779-782 (1986); Vale et al., Nature 321:776-779 (1986); Mason et al., Nature 318:659-663 (1985); Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095 (1986); Chapter 6.12 (Measurement of Alpha and Beta Chemokines) Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Taub et al., J. Clin. Invest. 95:1370-1376 (1995); Lind et al., APMIS 103:140-146 (1995); Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of Immunol. 152:5860-5867 (1994); and Johnston et al., J. of Immunol. 153:1762-1768 (1994).
Those proteins which exhibit activity as reproductive hormones or regulators of cell movement may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of reproductive hormones or cell movement is beneficial. For example, a protein of the present invention may also exhibit activin- or inhibin-related activities. Inhibins are characterized by their ability to inhibit the release of follicle stimulating hormone (FSH), while activins are characterized by their ability to stimulate the release of follicle stimulating hormone (FSH). Thus, a protein of the present invention, alone or in heterodimers with a member of the inhibin α family, may be useful as a contraceptive based on the ability of inhibins to decrease fertility in female mammals and decrease spermatogenesis in male mammals.
Administration of sufficient amounts of other inhibins can induce infertility in these mammals.
Alternatively, the protein of the invention, as a homodimer or as a heterodimer with other protein subunits of the inhibin-β group, may be useful as a fertility inducing therapeutic, based upon the ability of activin molecules to stimulate FSH release from cells of the anterior pituitary. See, for example, United States Patent 4,798,885. A protein of the invention may also be useful for advancement of the onset of fertility in sexually immature mammals, so as to increase the lifetime reproductive performance of domestic animals such as cows, sheep and pigs.
Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 36A
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Chemotactic/Chemokinetic Activity
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for chemotactic/chemokinetic activity. For example, a protein of the present invention may have chemotactic or chemokinetic activity (e.g., act as a chemokine) for mammalian cells, including, for example, monocytes, fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epithelial and/or endothelial cells. Chemotactic and chemokinetic proteins can be used to mobilize or attract a desired cell population to a desired site of action.
Chemotactic or chemokinetic proteins provide particular advantages in treatment of wounds and other trauma to tissues, as well as in treatment of localized infections. For example, attraction of lymphocytes, monocytes or neutrophils to tumors or sites of infection may result in improved immune responses against the tumor or infecting agent.
A protein or peptide has chemotactic activity for a particular cell population if it can stimulate, directly or indirectly, the directed orientation or movement of such cell population. Preferably, the protein or peptide has the ability to directly stimulate directed movement of cells. Whether a particular protein has chemotactic activity for a population of cells can be readily determined by employing such protein or peptide in any known assay for cell chemotaxis.
The activity of a protein of the invention may, among other means, be measured by the following methods: Assays for chemotactic activity (which will identify proteins that induce or prevent chemotaxis) consist of assays that measure the ability of a protein to induce the migration of cells across a membrane as well as the ability of a protein to induce the adhesion of one cell population to another cell population.
Suitable assays for movement and adhesion include, without limitation, those described in: Current Protocols in Immunology, Ed. by J.E. Coligan, A.M. Kruisbeek, Margulies, E.M. Shevach, W. Strober, Pub. Greene Publishing Associates and Wiley-Interscience (Chapter 6.12, Measurement of alpha and beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin. Invest. 95:1370-1376 (1995); Lind et al., APMIS 103:140-146 (1995); Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J. of Immunol. 152:5860-5867 (1994); and Johnston et al., J. of Immunol. 153:1762-1768 (1994).
EXAMPLE 37
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Regulation of Blood Clotting
The proteins encoded by the extended cDNAs or portions thereof may also be evaluated for their effects on blood clotting. Numerous assays for such activity are familiar to those skilled in the art, including the assays disclosed in the following references: Linet et al., J. Clin. Pharmacol. 26:131-140 (1986); Burdick et al., Thrombosis Res. 45:413-419 (1987); Humphrey et al., Fibrinolysis 5:71-79 (1991); and Schaub, Prostaglandins 35:467-474 (1988).
Those proteins which are involved in the regulation of blood clotting may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of blood clotting is beneficial. For example, a protein of the invention may also exhibit hemostatic or thrombolytic activity. As a result, such a protein is expected to be useful in treatment of various coagulation disorders (including hereditary disorders, such as hemophilias) or to enhance coagulation and other hemostatic events in treating wounds resulting from trauma, surgery or other causes. A protein of the invention may also be useful for dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions resulting therefrom (such as, for example, infarction of cardiac and central nervous system vessels (e.g., stroke)). Alternatively, as described in more detail below, genes encoding these proteins or nucleic acids regulating the expression of these proteins may be introduced into appropriate host cells to increase or decrease the expression of the proteins as desired.
EXAMPLE 38
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Involvement in Receptor/Ligand Interactions
The proteins encoded by the extended cDNAs or a portion thereof may also be evaluated for their involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to those skilled in the art, including the assays disclosed in the following references: Chapter 7.28 (Measurement of Cellular Adhesion under Static Conditions 7.28.1-7.28.22) Current Protocols in Immunology, J.E. Coligan et al. Eds., Greene Publishing Associates and Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868 (1987); Bierer et al., J. Exp. Med. 168:1145-1156 (1988); Rosenstein et al., J. Exp. Med. 169:149-160 (1989); Stoltenborg et al., J. Immunol. Methods 175:59-68 (1994); Stitt et al., Cell 80:661-670 (1995); and Gyuris et al., Cell 75:791-803 (1993).
For example, the proteins of the present invention may also demonstrate activity as receptors, receptor ligands or inhibitors or agonists of receptor/ligand interactions. Examples of such receptors and ligands include, without limitation, cytokine receptors and their ligands, receptor kinases and their ligands, receptor phosphatases and their ligands, receptors involved in cell-cell interactions and their ligands (including without limitation, cellular adhesion molecules (such as selectins, integrins and their ligands) and receptor/ligand pairs involved in antigen presentation, antigen recognition and development of cellular and humoral immune responses). Receptors and ligands are also useful for screening of potential peptide or small molecule inhibitors of the relevant receptor/ligand interaction. A protein of the present invention (including, without limitation, fragments of receptors and ligands) may itself be useful as an inhibitor of receptor/ligand interactions.
EXAMPLE 38A
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Anti-Inflammatory Activity
The proteins encoded by the extended cDNAs or a portion thereof may also be evaluated for anti-inflammatory activity. The anti-inflammatory activity may be achieved by providing a stimulus to cells involved in the inflammatory response, by inhibiting or promoting cell-cell interactions (such as, for example, cell adhesion), by inhibiting or promoting chemotaxis of cells involved in the inflammatory process, inhibiting or promoting cell extravasation, or by stimulating or suppressing production of other factors which more directly inhibit or promote an inflammatory response. Proteins exhibiting such activities can be used to treat inflammatory conditions (including chronic or acute conditions), including without limitation inflammation associated with infection (such as septic shock, sepsis or systemic inflammatory response syndrome (SIRS)), ischemia-reperfusion injury, endotoxin lethality, arthritis, complement-mediated hyperacute rejection, nephritis, cytokine or chemokine-induced lung injury, inflammatory bowel disease, Crohn's disease or resulting from over production of cytokines such as TNF or IL-1. Proteins of the invention may also be useful to treat anaphylaxis and hypersensitivity to an antigenic substance or material.
EXAMPLE 38B
Assaying the Proteins Expressed from Extended cDNAs or Portions Thereof for Tumor Inhibition Activity
The proteins encoded by the extended cDNAs or a portion thereof may also be evaluated for tumor inhibition activity. In addition to the activities described above for immunological treatment or prevention of tumors, a protein of the invention may exhibit other anti-tumor activities. A protein may inhibit tumor growth directly or indirectly (such as, for example, via ADCC). A protein may exhibit its tumor inhibitory activity by acting on tumor tissue or tumor precursor tissue, by inhibiting formation of tissues necessary to support tumor growth (such as, for example, by inhibiting angiogenesis), by causing production of other factors, agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, agents or cell types which promote tumor growth.
A protein of the invention may also exhibit one or more of the following additional activities or effects: inhibiting the growth, infection or function of, or killing, infectious agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting (suppressing or enhancing) bodily characteristics, including, without limitation, height, weight, hair color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ or body part size or shape (such as, for example, breast augmentation or diminution, change in bone form or shape); affecting biorhythms or circadian cycles or rhythms; affecting the fertility of male or female subjects; affecting the metabolism, catabolism, anabolism, processing, utilization, storage or elimination of dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional factors or component(s); affecting behavioral characteristics, including, without limitation, appetite, libido, stress, cognition (including cognitive disorders), depression (including depressive disorders) and violent behaviors; providing analgesic effects or other pain reducing effects; promoting differentiation and growth of embryonic stem cells in lineages other than hematopoietic lineages; hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of the enzyme and treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for example, psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or complement); and the ability to act as an antigen in a vaccine composition to raise an immune response against such protein or another material or entity which is cross-reactive with such protein.
EXAMPLE 39
Identification of Proteins which Interact with Polypeptides Encoded by Extended cDNAs
Proteins which interact with the polypeptides encoded by extended cDNAs or portions thereof, such as receptor proteins, may be identified using two hybrid systems such as the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), the extended cDNAs or portions thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA binding domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA library which encode proteins which might interact with the polypeptides encoded by the extended cDNAs or portions thereof are inserted into a second expression vector such that they are in frame with DNA encoding the activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are plated on selection medium which selects for expression of selectable markers on each of the expression vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on medium lacking histidine are screened for GAL4 dependent lacZ expression. Those cells which are positive in both the histidine selection and the lacZ assay contain plasmids encoding proteins which interact with the polypeptide encoded by the extended cDNAs or portions thereof.
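The two-reporter selection described above can be sketched as a simple filter. This is a minimal illustration only, not part of the Matchmaker protocol: the `Transformant` record, its field names, and the clone identifiers are hypothetical, and the two reporter read-outs are assumed to have already been scored as yes/no.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformant:
    """One yeast transformant and its two reporter read-outs (hypothetical record)."""
    clone_id: str
    grows_without_histidine: bool  # GAL4-dependent HIS3 expression
    lacz_positive: bool            # GAL4-dependent lacZ expression


def select_interactors(transformants):
    """Return clone IDs that are positive in both the HIS3 selection and the lacZ assay."""
    return [
        t.clone_id
        for t in transformants
        if t.grows_without_histidine and t.lacz_positive
    ]


# Hypothetical screen results, for illustration only.
screen = [
    Transformant("clone-A", True, True),
    Transformant("clone-B", True, False),   # grows without histidine, lacZ negative
    Transformant("clone-C", False, False),  # fails the histidine selection
]
candidates = select_interactors(screen)
```

Only clones scoring positive for both reporters are kept as candidate interactors, mirroring the requirement that cells be positive in both the histidine selection and the lacZ assay.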
Alternatively, the system described in Lustig et al., Methods in Enzymology 283:83-99 (1997), may be used for identifying molecules which interact with the polypeptides encoded by extended cDNAs. In such systems, in vitro transcription reactions are performed on a pool of vectors containing extended cDNA inserts cloned downstream of a promoter which drives in vitro transcription. The resulting pools of mRNAs are introduced into Xenopus laevis oocytes. The oocytes are then assayed for a desired activity.
Alternatively, the pooled in vitro transcription products produced as described above may be translated in vitro. The pooled in vitro translation products can be assayed for a desired activity or for interaction with a known polypeptide.
Proteins or other molecules interacting with polypeptides encoded by extended cDNAs can be found by a variety of additional techniques. In one method, affinity columns containing the polypeptide encoded by the extended cDNA or a portion thereof can be constructed. In some versions of this method, the affinity column contains chimeric proteins in which the protein encoded by the extended cDNA or a portion thereof is fused to glutathione S-transferase. A mixture of cellular proteins or pool of expressed proteins as described above is applied to the affinity column. Proteins interacting with the polypeptide attached to the column can then be isolated and analyzed on 2-D electrophoresis gel as described in Rasmussen et al., Electrophoresis 18:588-598 (1997). Alternatively, the proteins retained on the affinity column can be purified by electrophoresis based methods and sequenced. The same method can be used to isolate antibodies, to screen phage display products, or to screen phage display human antibodies.
Proteins interacting with polypeptides encoded by extended cDNAs or portions thereof can also be screened by using an Optical Biosensor as described in Edwards and Leatherbarrow, Analytical Biochemistry 246:1-6 (1997). The main advantage of the method is that it allows the determination of the association rate between the protein and other interacting molecules. Thus, it is possible to specifically select interacting molecules with a high or low association rate. Typically a target molecule is linked to the sensor surface (through a carboxymethyl dextran matrix) and a sample of test molecules is placed in contact with the target molecules. The binding of a test molecule to the target molecule causes a change in the refractive index and/or thickness. This change is detected by the Biosensor provided it occurs in the evanescent field (which extends a few hundred nanometers from the sensor surface). In these screening assays, the target molecule can be one of the polypeptides encoded by extended cDNAs or a portion thereof and the test sample can be a collection of proteins extracted from tissues or cells, a pool of expressed proteins, combinatorial peptide and/or chemical libraries, or phage displayed peptides.
The tissues or cells from which the test proteins are extracted can originate from any species.
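As an illustration of how an association rate might be extracted from biosensor data, the sketch below assumes a simple 1:1 pseudo-first-order binding model, in which the observed binding rate at analyte concentration C is k_obs = k_on * C + k_off. The model, the function name, and the example numbers are assumptions made for illustration and are not taken from the cited reference.

```python
def fit_rate_constants(concentrations, observed_rates):
    """Least-squares line through (C, k_obs): slope is k_on, intercept is k_off.

    Assumes the 1:1 pseudo-first-order model k_obs = k_on * C + k_off.
    """
    n = len(concentrations)
    if n < 2 or n != len(observed_rates):
        raise ValueError("need at least two matched (concentration, rate) pairs")
    mean_c = sum(concentrations) / n
    mean_k = sum(observed_rates) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum(
        (c - mean_c) * (k - mean_k)
        for c, k in zip(concentrations, observed_rates)
    )
    k_on = sxy / sxx                 # association rate constant, 1/(M*s)
    k_off = mean_k - k_on * mean_c   # dissociation rate constant, 1/s
    return k_on, k_off


# Synthetic, noise-free example values (hypothetical).
conc = [1e-8, 2e-8, 5e-8, 1e-7]             # analyte concentration, M
rates = [1e5 * c + 1e-3 for c in conc]      # observed rates, 1/s
k_on, k_off = fit_rate_constants(conc, rates)
```

Under this model, test molecules could then be ranked by the fitted k_on to select those with a high or low association rate.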
In other methods, a target protein is immobilized and the test population is a collection of unique polypeptides encoded by the extended cDNAs or portions thereof.
To study the interaction of the proteins encoded by the extended cDNAs or portions thereof with drugs, the microdialysis coupled to HPLC method described by Wang et al., Chromatographia 44:205-208 (1997) or the affinity capillary electrophoresis method described by Busch et al., J. Chromatogr. 777:311-328 (1997) can be used.
The system described in U.S. Patent No. 5,654,150 may also be used to identify molecules which interact with the polypeptides encoded by the extended cDNAs. In this system, pools of extended cDNAs are transcribed and translated in vitro and the reaction products are assayed for interaction with a known polypeptide or antibody.
It will be appreciated by those skilled in the art that the proteins expressed from the extended cDNAs or portions thereof may be assayed for numerous activities in addition to those specifically enumerated above. For example, the expressed proteins may be evaluated for applications involving control and regulation of inflammation, tumor proliferation or metastasis, infection, or other clinical conditions. In addition, the proteins expressed from the extended cDNAs or portions thereof may be useful as nutritional agents or cosmetic agents.
The proteins expressed from the extended cDNAs or portions thereof may be used to generate antibodies capable of specifically binding to the expressed protein or fragments thereof as described in Example 40 below. The antibodies may be capable of binding a full length protein encoded by one of the sequences of SEQ ID NOs. 134-180, a mature protein encoded by one of the sequences of SEQ ID NOs. 134-180, or a signal peptide encoded by one of the sequences of SEQ ID NOs. 134-180. Alternatively, the antibodies may be capable of binding fragments of the proteins expressed from the extended cDNAs which comprise at least 10 amino acids of the sequences of SEQ ID NOs: 181-227. In some embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the extended cDNAs which comprise at least 15 amino acids of the sequences of SEQ ID NOs: 181-227. In other embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the extended cDNAs which comprise at least 25 amino acids of the sequences of SEQ ID NOs: 181-227. In further embodiments, the antibodies may be capable of binding fragments of the proteins expressed from the extended cDNAs which comprise at least 40 amino acids of the sequences of SEQ ID NOs: 181-227.
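To make the fragment-length thresholds concrete, the sketch below enumerates the shortest qualifying fragments of a protein sequence for a given minimum length. It assumes the fragments are contiguous stretches of the sequence, which the passage does not state explicitly, and the example sequence is hypothetical rather than any of SEQ ID NOs: 181-227.

```python
def contiguous_fragments(sequence, length):
    """Return every contiguous fragment of exactly `length` residues.

    Any fragment of at least `length` residues contains one of these windows.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return [
        sequence[start:start + length]
        for start in range(len(sequence) - length + 1)
    ]


# Hypothetical 16-residue sequence, for illustration only.
example = "MKTLLVAGSLCLAAQA"
windows_10 = contiguous_fragments(example, 10)
windows_15 = contiguous_fragments(example, 15)
windows_25 = contiguous_fragments(example, 25)  # sequence too short: no fragments
```

A sequence shorter than the chosen threshold yields no fragments, so the 25 and 40 residue embodiments only apply to proteins at least that long.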
EXAMPLE
Production of an Antibody to a Human Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells as described in Example 30. The concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
A. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, Nature 256:495 (1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein or peptides derived therefrom over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al. Basic Methods in Molecular Biology, Elsevier, New York. Section 21-2.
B. Polyclonal Antibody Production by Immunization Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein or peptides derived therefrom described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, D. Weir (ed.), Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose and Friedman, Eds.), Amer. Soc. For Microbiol., Washington, D.C. (1980).
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the protein or reducing the levels of the protein in the body.
V. Use of Extended cDNAs or Portions Thereof as Reagents The extended cDNAs of the present invention may be used as reagents in isolation procedures, diagnostic assays, and forensic procedures. For example, sequences from the extended cDNAs (or genomic DNAs obtainable therefrom) may be detectably labeled and used as probes to isolate other sequences capable of hybridizing to them. In addition, sequences from the extended cDNAs (or genomic DNAs obtainable therefrom) may be used to design PCR primers to be used in isolation, diagnostic, or forensic procedures.
EXAMPLE 41 Preparation of PCR Primers and Amplification of DNA The extended cDNAs (or genomic DNAs obtainable therefrom) may be used to prepare PCR primers for a variety of applications, including isolation procedures for cloning nucleic acids capable of hybridizing to such sequences, diagnostic techniques and forensic techniques. The PCR primers are at least bases, and preferably at least 12, 15, or 17 bases in length. More preferably, the PCR primers are at least 20-30 bases in length. In some embodiments, the PCR primers may be more than 30 bases in length. It is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology, see Molecular Cloning to Genetic Engineering, White, B.A. Ed., in Methods in Molecular Biology 67: Humana Press, Totowa (1997). In each of these PCR procedures, PCR primers on either side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid sequence between the primer sites.
EXAMPLE 42 Use of Extended cDNAs as Probes Probes derived from extended cDNAs or portions thereof (or genomic DNAs obtainable therefrom) may be labeled with detectable labels familiar to those skilled in the art, including radioisotopes and nonradioactive labels, to provide a detectable probe. 
The detectable probe may be single stranded or double stranded and may be made using techniques known in the art, including in vitro transcription, nick translation, or kinase reactions. A nucleic acid sample containing a sequence capable of hybridizing to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample is double stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid sample may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic acid sample may comprise nucleic acids obtained from a variety of sources, including genomic DNA, cDNA libraries, RNA, or tissue samples.
Procedures used to detect the presence of nucleic acids capable of hybridizing to the detectable probe include well known techniques such as Southern blotting, Northern blotting, dot blotting, colony hybridization, and plaque hybridization. In some applications, the nucleic acid capable of hybridizing to the labeled probe may be cloned into vectors such as expression vectors, sequencing vectors, or in vitro transcription vectors to facilitate the characterization and expression of the hybridizing nucleic acids in the sample. For example, such techniques may be used to isolate and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the detectable probe as described in Example 30 above.
PCR primers made as described in Example 41 above may be used in forensic analyses, such as the DNA fingerprinting techniques described in Examples 43-47 below. Such analyses may utilize detectable probes or primers based on the sequences of the extended cDNAs isolated using the 5' ESTs (or genomic DNAs obtainable therefrom).
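The primer-pair criterion of Example 41 (approximately the same G/C ratio, so that melting temperatures are approximately the same) can be illustrated with a minimal sketch. The function names, the tolerance values, and the two 20-base primer sequences are hypothetical and are not taken from the specification. The melting temperature uses the common rule of thumb of 2 degrees per A/T and 4 degrees per G/C, which is only a rough approximation for short oligonucleotides.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def rule_of_thumb_tm(primer: str) -> int:
    """Approximate melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def pair_is_balanced(fwd: str, rev: str,
                     max_gc_diff: float = 0.10,
                     max_tm_diff: int = 4) -> bool:
    """True if both primers have similar G/C ratio and similar estimated Tm.

    The tolerances are illustrative assumptions, not values from the text.
    """
    gc_ok = abs(gc_fraction(fwd) - gc_fraction(rev)) <= max_gc_diff
    tm_ok = abs(rule_of_thumb_tm(fwd) - rule_of_thumb_tm(rev)) <= max_tm_diff
    return gc_ok and tm_ok


# Hypothetical 20-base primers, used only to exercise the functions.
forward = "ATGGCTAGCTAGGCTAACGT"
reverse = "TTGCCATGGATCCGATAGCA"
print(gc_fraction(forward), rule_of_thumb_tm(forward))
print(pair_is_balanced(forward, reverse))
```

A pair that fails this check would be expected to anneal at noticeably different temperatures, which is the situation the preference in Example 41 is meant to avoid.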
EXAMPLE 43 Forensic Matching by DNA Sequencing In one exemplary method, DNA samples are isolated from forensic specimens of, for example, hair, semen, blood or skin cells by conventional methods. A panel of PCR primers based on a number of the extended cDNAs (or genomic DNAs obtainable therefrom) is then utilized in accordance with Example 41 to amplify DNA of approximately 100-200 bases in length from the forensic specimen. Corresponding sequences are obtained from a test subject. Each of these identification DNAs is then sequenced using standard techniques, and a simple database comparison determines the differences, if any, between the sequences from the subject and those from the sample. Statistically significant differences between the suspect's DNA sequences and those from the sample conclusively prove a lack of identity. This lack of identity can be proven, for example, with only one sequence. Identity, on the other hand, should be demonstrated with a large number of sequences, all matching. Preferably, a minimum of 50 statistically identical sequences of 100 bases in length are used to prove identity between the suspect and the sample.
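The asymmetric decision rule of Example 43 (one differing sequence suffices to exclude, whereas identity calls for many matching sequences) may be sketched as follows. This is a simplified illustration: exact string equality stands in for the statistical comparison the example refers to, and the function name, the locus labels, and the three result strings are hypothetical.

```python
def compare_panels(sample: dict, subject: dict, min_loci: int = 50) -> str:
    """Compare sequenced loci from a forensic sample and a test subject.

    sample and subject map a locus label to its sequenced bases.
    Returns "excluded" if any shared locus differs, "identity supported"
    if at least min_loci shared loci all match, else "inconclusive".
    """
    shared = sorted(set(sample) & set(subject))
    for locus in shared:
        if sample[locus].upper() != subject[locus].upper():
            return "excluded"  # a single differing sequence is enough
    if len(shared) >= min_loci:
        return "identity supported"
    return "inconclusive"


# Hypothetical data: 50 loci of 100 bases each, identical in both panels.
sample_panel = {f"locus{i}": "ACGT" * 25 for i in range(50)}
subject_panel = dict(sample_panel)
print(compare_panels(sample_panel, subject_panel))
```

The default of 50 loci mirrors the preferred minimum stated in the example; a real comparison would also need to account for sequencing error and heterozygosity.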
EXAMPLE 44 Positive Identification by DNA Sequencing The technique outlined in the previous example may also be used on a larger scale to provide a unique fingerprint-type identification of any individual. In this technique, primers are prepared from a large number of sequences from Table 11 and the appended sequence listing. Preferably, 20 to 50 different primers are used. These primers are used to obtain a corresponding number of PCR-generated DNA segments from the individual in question in accordance with Example 41. Each of these DNA segments is sequenced, using the methods set forth in Example 43. The database of sequences generated through this procedure uniquely identifies the individual from whom the sequences were obtained. The same panel of primers may then be used at any later time to absolutely correlate tissue or other biological specimen with that individual.
EXAMPLE Southern Blot Forensic Identification The procedure of Example 44 is repeated to obtain a panel of at least 10 amplified sequences from an individual and a specimen. Preferably, the panel contains at least 50 amplified sequences. More preferably, the panel contains 100 amplified sequences. In some embodiments, the panel contains 200 amplified sequences. This PCR-generated DNA is then digested with one or a combination of restriction enzymes, preferably four-base-specific restriction enzymes. Such enzymes are commercially available and known to those of skill in the art. After digestion, the resultant gene fragments are size separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using Southern blotting techniques well known to those with skill in the art. For a review of Southern blotting, see Davis et al., Basic Methods in Molecular Biology (1986), Elsevier Press, pp. 62-65.
A panel of probes based on the sequences of the extended cDNAs (or genomic DNAs obtainable therefrom), or fragments thereof of at least 10 bases, are radioactively or colorimetrically labeled using methods known in the art, such as nick translation or end labeling, and hybridized to the Southern blot using techniques known in the art (Davis et al., supra). Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom). More preferably, the probe comprises at least 20-30 consecutive nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom). In some embodiments, the probe comprises more than nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom).
Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least about 20 or 30 are used to provide a unique pattern. The resultant bands appearing from the hybridization of a large sample of extended cDNAs (or genomic DNAs obtainable therefrom) will be a unique identifier. Since the restriction enzyme cleavage will be different for every individual, the band pattern on the Southern blot will also be unique. Increasing the number of extended cDNA probes will provide a statistically higher level of confidence in the identification since there will be an increased number of sets of bands used for identification.
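How a difference at a restriction site changes the band pattern can be shown with a minimal sketch of a digest of a linear sequence. The four-base recognition site GATC, the cut position at the start of the site, and both short sequences are illustrative assumptions; they are not data from the specification.

```python
def digest(seq: str, site: str, cut_offset: int = 0) -> list:
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut relative to the start of the
    recognition site (0 means the cut falls just before the site).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces that arise when a cut falls at an end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Two hypothetical alleles; the second lacks one GATC site.
allele_1 = "AAAGATCTTTTTGATCCC"
allele_2 = "AAAGATCTTTTTGTTCCC"
print(digest(allele_1, "GATC"))  # three fragments
print(digest(allele_2, "GATC"))  # two fragments
```

On a gel, each list of fragment lengths corresponds to a set of bands, so the loss of a single site in the second sequence yields a visibly different pattern.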
EXAMPLE 46 Dot Blot Identification Procedure Another technique for identifying individuals using the extended cDNA sequences disclosed herein utilizes a dot blot hybridization technique.
Genomic DNA is isolated from nuclei of the subject to be identified. Oligonucleotide probes of approximately 30 bp in length are synthesized that correspond to at least 10, preferably 50, sequences from the extended cDNAs or genomic DNAs obtainable therefrom. The probes are used to hybridize to the genomic DNA under conditions known to those in the art. The oligonucleotides are end labeled with 32P using polynucleotide kinase (Pharmacia). Dot blots are created by spotting the genomic DNA onto nitrocellulose or the like using a vacuum dot blot manifold (BioRad, Richmond, California). The genomic DNA is baked or UV linked to the nitrocellulose filter, which is then prehybridized and hybridized with labeled probe using techniques known in the art (Davis et al., supra). The 32P labeled DNA fragments are sequentially hybridized under successively more stringent conditions to detect minimal differences between the 30 bp sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)). A unique pattern of dots distinguishes one individual from another individual.
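The effect of successively more stringent conditions on the dot pattern can be sketched by treating stringency as the maximum number of mismatches a probe tolerates. This is a deliberate simplification: actual hybridization depends on strand complementarity, temperature, and salt, none of which is modeled here, and the function names, probe labels, and sequences are hypothetical.

```python
def mismatches(probe: str, target: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(p != t for p, t in zip(probe.upper(), target.upper()))


def dot_pattern(probes: dict, targets: dict, max_mismatches: int) -> list:
    """One boolean per probe label (sorted): True where a dot is expected.

    A lower max_mismatches stands in for a more stringent wash.
    """
    return [mismatches(probes[name], targets[name]) <= max_mismatches
            for name in sorted(probes)]


# Hypothetical 30-base probes and the matching regions of one individual.
probes = {"p1": "ACGTACGTACGTACGTACGTACGTACGTAC",
          "p2": "TTGACCATGGTTGACCATGGTTGACCATGG"}
individual = {"p1": "ACGTACGTACGTACGTACGTACGTACGTAC",   # perfect match
              "p2": "TTGACCATGGTTGACCTTGGTTGACCATGG"}   # one mismatch
print(dot_pattern(probes, individual, max_mismatches=2))  # relaxed
print(dot_pattern(probes, individual, max_mismatches=0))  # stringent
```

Under the relaxed setting both dots are expected, while under the stringent setting only the perfectly matched probe remains, which is how a single-base difference becomes part of the identifying pattern.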
Extended cDNAs or oligonucleotides containing at least 10 consecutive bases from these sequences can be used as probes in the following alternative fingerprinting technique. Preferably, the probe comprises at least 12, 15, or 17 consecutive nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom). More preferably, the probe comprises at least 20-30 consecutive nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom). In some embodiments, the probe comprises more than nucleotides from the extended cDNA (or genomic DNAs obtainable therefrom).
Preferably, a plurality of probes having sequences from different genes are used in the alternative fingerprinting technique. Example 47 below provides a representative alternative fingerprinting procedure in which the probes are derived from extended cDNAs.
EXAMPLE 47 Alternative "Fingerprint" Identification Technique Oligonucleotides are prepared from a large number, e.g. 50, 100, or 200, of extended cDNA sequences (or genomic DNAs obtainable therefrom) using commercially available oligonucleotide services such as Genset, Paris, France. Cell samples from the test subject are processed for DNA using techniques well known to those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and XbaI. Following digestion, samples are applied to wells for electrophoresis. The procedure, as known in the art, may be modified to accommodate polyacrylamide electrophoresis; however, in this example, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels.
The gels are transferred onto nitrocellulose using standard Southern blotting techniques.
ng of each of the oligonucleotides are pooled and end-labeled with The nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. Following hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. The resulting hybridization pattern will be unique for each individual.
It is additionally contemplated within this example that the number of probe sequences used can be varied for additional accuracy or clarity.
The antibodies generated in Examples 30 and 40 above may be used to identify the tissue type or cell species from which a sample is derived as described above.
EXAMPLE 48 Identification of Tissue Types or Cell Species by Means of Labeled Tissue Specific Antibodies Identification of specific tissues is accomplished by the visualization of tissue specific antigens by means of antibody preparations according to Examples 30 and 40 which are conjugated, directly or indirectly to a detectable marker. Selected labeled antibody species bind to their specific antigen binding partner in tissue sections, cell suspensions, or in extracts of soluble proteins from a tissue sample to provide a pattern for qualitative or semi-qualitative interpretation.
Antisera for these procedures must have a potency exceeding that of the native preparation, and for that reason, antibodies are concentrated to a mg/ml level by isolation of the gamma globulin fraction, for example, by ion-exchange chromatography or by ammonium sulfate fractionation. Also, to provide the most specific antisera, unwanted antibodies, for example to common proteins, must be removed from the gamma globulin fraction, for example by means of insoluble immunoabsorbents, before the antibodies are labeled with the marker. Either monoclonal or heterologous antisera is suitable for either procedure.
A. Immunohistochemical Techniques Purified, high-titer antibodies, prepared as described above, are conjugated to a detectable marker, as described, for example, by Fudenberg, Chap. 26 in: Basic & Clinical Immunology, 3rd Ed., Lange, Los Altos, California (1980) or Rose, N. et al., Chap. 12 in: Methods in Immunodiagnosis, 2d Ed., John Wiley & Sons, New York (1980).
A fluorescent marker, either fluorescein or rhodamine, is preferred, but antibodies can also be labeled with an enzyme that supports a color producing reaction with a substrate, such as horseradish peroxidase. Markers can be added to tissue-bound antibody in a second step, as described below.
Alternatively, the specific antitissue antibodies can be labeled with ferritin or other electron dense particles, and localization of the ferritin coupled antigen-antibody complexes achieved by means of an electron microscope. In yet another approach, the antibodies are radiolabeled, with, for example and detected by overlaying the antibody treated preparation with photographic emulsion.
Preparations to carry out the procedures can comprise monoclonal or polyclonal antibodies to a single protein or peptide identified as specific to a tissue type, for example, brain tissue, or antibody preparations to several antigenically distinct tissue specific antigens can be used in panels, independently or in mixtures, as required.
Tissue sections and cell suspensions are prepared for immunohistochemical examination according to common histological techniques. Multiple cryostat sections (about 4 µm, unfixed) of the unknown tissue and known control are mounted and each slide covered with different dilutions of the antibody preparation.
Sections of known and unknown tissues should also be treated with preparations to provide a positive control, a negative control, for example, pre-immune sera, and a control for non-specific staining, for example, buffer.
Treated sections are incubated in a humid chamber for 30 min at room temperature, rinsed, then washed in buffer for 30-45 min. Excess fluid is blotted away, and the marker developed.
If the tissue specific antibody was not labeled in the first incubation, it can be labeled at this time in a second antibody-antibody reaction, for example, by adding fluorescein- or enzyme-conjugated antibody against the immunoglobulin class of the antiserum-producing species, for example, fluorescein labeled antibody to mouse IgG. Such labeled sera are commercially available.
The antigen found in the tissues by the above procedure can be quantified by measuring the intensity of color or fluorescence on the tissue section, and calibrating that signal using appropriate standards.
B. Identification of Tissue Specific Soluble Proteins The visualization of tissue specific proteins and identification of unknown tissues from that procedure is carried out using the labeled antibody reagents and detection strategy as described for immunohistochemistry; however the sample is prepared according to an electrophoretic technique to distribute the proteins extracted from the tissue in an orderly array on the basis of molecular weight for detection.
A tissue sample is homogenized using a Virtis apparatus; cell suspensions are disrupted by Dounce homogenization or osmotic lysis, using detergents in either case as required to disrupt cell membranes, as is the practice in the art. Insoluble cell components such as nuclei, microsomes, and membrane fragments are removed by ultracentrifugation, and the soluble protein-containing fraction concentrated if necessary and reserved for analysis.
A sample of the soluble protein solution is resolved into individual protein species by conventional SDS polyacrylamide electrophoresis as described, for example, by Davis, L. et al., Section 19-2 in: Basic Methods in Molecular Biology (Leder, ed.), Elsevier, New York (1986), using a range of amounts of polyacrylamide in a set of gels to resolve the entire molecular weight range of proteins to be detected in the sample. A size marker is run in parallel for purposes of estimating molecular weights of the constituent proteins. Sample size for analysis is a convenient volume of from 5 to 55 µl, containing from about 1 to 100 µg protein. An aliquot of each of the resolved proteins is transferred by blotting to a nitrocellulose filter paper, a process that maintains the pattern of resolution. Multiple copies are prepared. The procedure, known as Western Blot Analysis, is well described in Davis, L. et al., (above) Section 19-3. One set of nitrocellulose blots is stained with Coomassie Blue dye to visualize the entire set of proteins for comparison with the antibody bound proteins. The remaining nitrocellulose filters are then incubated with a solution of one or more specific antisera to tissue specific proteins prepared as described in Examples 30 and 40. In this procedure, as in procedure A above, appropriate positive and negative sample and reagent controls are run.
In either procedure A or B, a detectable label can be attached to the primary tissue antigen-primary antibody complex according to various strategies and permutations thereof. In a straightforward approach, the primary specific antibody can be labeled; alternatively, the unlabeled complex can be bound by a labeled secondary anti-IgG antibody. In other approaches, either the primary or secondary antibody is conjugated to a biotin molecule, which can, in a subsequent step, bind an avidin conjugated marker. According to yet another strategy, enzyme labeled or radioactive protein A, which has the property of binding to any IgG, is bound in a final step to either the primary or secondary antibody.
The visualization of tissue specific antigen binding at levels above those seen in control tissues to one or more tissue specific antibodies, prepared from the gene sequences identified from extended cDNA sequences, can identify tissues of unknown origin, for example, forensic samples, or differentiated tumor tissue that has metastasized to foreign bodily sites.
In addition to their applications in forensics and identification, extended cDNAs (or genomic DNAs obtainable therefrom) may be mapped to their chromosomal locations. Example 49 below describes radiation hybrid (RH) mapping of human chromosomal regions using extended cDNAs. Example 50 below describes a representative procedure for mapping an extended cDNA (or a genomic DNA obtainable therefrom) to its location on a human chromosome. Example 51 below describes mapping of extended cDNAs (or genomic DNAs obtainable therefrom) on metaphase chromosomes by Fluorescence In Situ Hybridization (FISH).
EXAMPLE 49 Radiation Hybrid Mapping of Extended cDNAs to the Human Genome Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high resolution mapping of the human genome. In this approach, cell lines containing one or more human chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends on the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding subclones containing different portions of the human genome. This technique is described by Benham et al., Genomics 4:509-517 (1989) and Cox et al., Science 250:245-250 (1990). The random and independent nature of the subclones permits efficient mapping of any human genome marker. Human DNA isolated from a panel of 80-100 cell lines provides a mapping reagent for ordering extended cDNAs (or genomic DNAs obtainable therefrom). In this approach, the frequency of breakage between markers is used to measure distance, allowing construction of fine resolution maps as has been done using conventional ESTs (Schuler et al., Science 274:540-546 (1996)).
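The idea that breakage between two markers reflects their separation can be illustrated with a crude proxy: the fraction of hybrid cell lines that retain exactly one of the two markers. This sketch does not correct for retention frequency or apply the statistical models used in actual RH analyses, and the function name and the ten-line retention patterns are hypothetical.

```python
def breakage_proxy(marker_a: str, marker_b: str) -> float:
    """Fraction of hybrid lines retaining exactly one of two markers.

    Each argument is a string of '1' (marker retained) and '0' (absent),
    one character per hybrid cell line, in the same panel order.
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("both markers must be scored on the same panel")
    discordant = sum(a != b for a, b in zip(marker_a, marker_b))
    return discordant / len(marker_a)


# Hypothetical retention patterns across a panel of 10 hybrid lines.
m1 = "1100110010"
m2 = "1100110011"   # differs from m1 in one line
m3 = "0011001101"   # differs from m1 in every line
print(breakage_proxy(m1, m2))
print(breakage_proxy(m1, m3))
```

Markers that are almost always retained together, such as the first two patterns, would be placed close to one another, while a high discordance gives no evidence of proximity.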
RH mapping has been used to generate a high-resolution whole genome radiation hybrid map of human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine kinase (TK) (Foster et al., Genomics 33:185-192 (1996)), the region surrounding the Gorlin syndrome gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245 (1996)), 60 loci covering the entire short arm of chromosome 12 (Raeymaekers et al., Genomics 29:170-178 (1995)), the region of human chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584 and 13 loci on the long arm of chromosome 5 (Warrington et al., Genomics 11:701-708 (199
EXAMPLE Mapping of Extended cDNAs to Human Chromosomes Using PCR Techniques Extended cDNAs (or genomic DNAs obtainable therefrom) may be assigned to human chromosomes using PCR based methodologies. In such approaches, oligonucleotide primer pairs are designed from the extended cDNA sequence (or the sequence of a genomic DNA obtainable therefrom) to minimize the chance of amplifying through an intron. Preferably, the oligonucleotide primers are 18-23 bp in length and are designed for PCR amplification. The creation of PCR primers from known sequences is well known to those with skill in the art. For a review of PCR technology see Erlich, PCR Technology, Principles and Applications for DNA Amplification (1992), W.H. Freeman and Co., New York.
The primers are used in polymerase chain reactions (PCR) to amplify templates from total human genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template for PCR with ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 32P-labeled deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) under the following conditions: cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide sequencing gel and visualized by autoradiography.
If the length of the resulting PCR product is identical to the distance between the ends of the primer sequences in the extended cDNA from which the primers are derived, then the PCR reaction is repeated with DNA templates from two panels of human-rodent somatic cell hybrids, BIOS PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden, NJ).
PCR is used to screen a series of somatic cell hybrid cell lines containing defined sets of human chromosomes for the presence of a given extended cDNA (or genomic DNA obtainable therefrom). DNA is isolated from the somatic hybrids and used as starting templates for PCR reactions using the primer pairs from the extended cDNAs (or genomic DNAs obtainable therefrom). Only those somatic cell hybrids with chromosomes containing the human gene corresponding to the extended cDNA (or genomic DNA obtainable therefrom) will yield an amplified fragment. The extended cDNAs (or genomic DNAs obtainable therefrom) are assigned to a chromosome by analysis of the segregation pattern of PCR products from the somatic hybrid DNA templates. The single human chromosome present in all cell hybrids that give rise to an amplified fragment is the chromosome containing that extended cDNA (or genomic DNA obtainable therefrom). For a review of techniques and analysis of results from somatic cell gene mapping experiments, see Ledbetter et al., Genomics 6:475-481 (1990). Alternatively, the extended cDNAs (or genomic DNAs obtainable therefrom) may be mapped to individual chromosomes using FISH as described in Example 51 below.
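The segregation analysis described above reduces to set operations: the candidate chromosome must be present in every hybrid that yields an amplified fragment and absent from every hybrid that does not. The following is a minimal sketch under that reading; the function name, hybrid labels, chromosome contents, and PCR results are hypothetical.

```python
def assign_chromosome(hybrids: dict, pcr_positive: dict) -> set:
    """Return the human chromosomes consistent with the PCR results.

    hybrids maps a hybrid cell line to the set of human chromosomes it
    carries; pcr_positive maps the same labels to True if an amplified
    fragment was obtained from that line.
    """
    positives = [chroms for name, chroms in hybrids.items()
                 if pcr_positive[name]]
    negatives = [chroms for name, chroms in hybrids.items()
                 if not pcr_positive[name]]
    if not positives:
        return set()
    # Chromosomes shared by every PCR-positive hybrid.
    candidates = set.intersection(*positives)
    # Remove chromosomes also present in any PCR-negative hybrid.
    for chroms in negatives:
        candidates -= chroms
    return candidates


# Hypothetical mapping panel and results.
panel = {"H1": {"1", "7", "17"},
         "H2": {"7", "17", "X"},
         "H3": {"1", "X"},
         "H4": {"17", "22"}}
results = {"H1": True, "H2": True, "H3": False, "H4": True}
print(assign_chromosome(panel, results))
```

A result containing exactly one chromosome corresponds to the unambiguous assignment described in the text; an empty or multi-element result would indicate an inconsistent or insufficiently informative panel.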
EXAMPLE 51 Mapping of Extended 5' ESTs to Chromosomes Using Fluorescence in situ Hybridization Fluorescence in situ hybridization allows the extended cDNA (or genomic DNA obtainable therefrom) to be mapped to a particular location on a given chromosome. The chromosomes to be used for fluorescence in situ hybridization techniques may be obtained from a variety of sources including cell cultures, tissues, or whole blood.
In a preferred embodiment, chromosomal localization of an extended cDNA (or genomic DNA obtainable therefrom) is obtained by FISH as described by Cherif et al., Proc. Natl. Acad. Sci. U.S.A., 87:6639-6643 (1990). Metaphase chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cell donors. PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For synchronization, methotrexate (10 µM) is added for 17 h, followed by addition of bromodeoxyuridine (5-BudR, 0.1 mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15 min before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic solution of KCl mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid The cell suspension is dropped onto a glass slide and air dried. The extended cDNA (or genomic DNA obtainable therefrom) is labeled with biotin-16 dUTP by nick translation according to the manufacturer's instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and precipitated. Just prior to hybridization, the DNA pellet is dissolved in hybridization buffer (50% formamide, 2 X SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.
Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2 X SSC and dehydrated in an ethanol series. Chromosome preparations are denatured in 70% formamide, 2 X SSC for 2 min at 70°C, then dehydrated at 4°C. The slides are treated with proteinase K (10 µg/100 ml in mM Tris-HCl, 2 mM CaCl2) at 37°C for 8 min and dehydrated. The hybridization mixture containing the probe is placed on the slide, covered with a coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with additional layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal localization, fluorescent R-bands are obtained as previously described (Cherif et al., supra). The slides are observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained with propidium iodide and the fluorescent signal of the probe appears as two symmetrical yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, a particular extended cDNA (or genomic DNA obtainable therefrom) may be localized to a particular cytogenetic R-band on a given chromosome.
Once the extended cDNAs (or genomic DNAs obtainable therefrom) have been assigned to particular chromosomes using the techniques described in Examples 49-51 above, they may be utilized to construct a high resolution map of the chromosomes on which they are located or to identify the chromosomes in a sample.
EXAMPLE 52 Use of Extended cDNAs to Construct or Expand Chromosome Maps Chromosome mapping involves assigning a given unique sequence to a particular chromosome as described above. Once the unique sequence has been mapped to a given chromosome, it is ordered relative to other unique sequences located on the same chromosome. One approach to chromosome mapping utilizes a series of yeast artificial chromosomes (YACs) bearing several thousand long inserts derived from the chromosomes of the organism from which the extended cDNAs (or genomic DNAs obtainable therefrom) are obtained. This approach is described in Ramaiah Nagaraja et al., Genome Research 7:210-222 (March, 1997). Briefly, in this approach each chromosome is broken into overlapping pieces which are inserted into the YAC vector. The YAC inserts are screened using PCR or other methods to determine whether they include the extended cDNA (or genomic DNA obtainable therefrom) whose position is to be determined. Once an insert has been found which includes the extended cDNA (or genomic DNA obtainable therefrom), the insert can be analyzed by PCR or other methods to determine whether the insert also contains other sequences known to be on the chromosome or in the region from which the extended cDNA (or genomic DNA obtainable therefrom) was derived. This process can be repeated for each insert in the YAC library to determine the location of each of the extended cDNAs (or genomic DNAs obtainable therefrom) relative to one another and to other known chromosomal markers. In this way, a high resolution map of the distribution of numerous unique markers along each of the organism's chromosomes may be obtained.
As described in Example 53 below, extended cDNAs (or genomic DNAs obtainable therefrom) may also be used to identify genes associated with a particular phenotype, such as hereditary disease or drug response.
EXAMPLE 53 Identification of Genes Associated with Hereditary Diseases or Drug Response
This example illustrates an approach useful for the association of extended cDNAs (or genomic DNAs obtainable therefrom) with particular phenotypic characteristics. In this example, a particular extended cDNA (or genomic DNA obtainable therefrom) is used as a test probe to associate that extended cDNA (or genomic DNA obtainable therefrom) with a particular phenotypic characteristic.
Extended cDNAs (or genomic DNAs obtainable therefrom) are mapped to a particular location on a human chromosome using techniques such as those described in Examples 49 and 50 or other techniques known in the art. A search of Mendelian Inheritance in Man (McKusick, Mendelian Inheritance in Man, available on line through Johns Hopkins University Welch Medical Library) reveals the region of the human chromosome which contains the extended cDNA (or genomic DNA obtainable therefrom) to be a very gene rich region containing several known genes and several diseases or phenotypes for which genes have not been identified. The gene corresponding to this extended cDNA (or genomic DNA obtainable therefrom) thus becomes an immediate candidate for each of these genetic diseases.
Cells from patients with these diseases or phenotypes are isolated and expanded in culture. PCR primers from the extended cDNA (or genomic DNA obtainable therefrom) are used to screen genomic DNA, mRNA or cDNA obtained from the patients. Extended cDNAs (or genomic DNAs obtainable therefrom) that are not amplified in the patients can be positively associated with a particular disease by further analysis. Alternatively, the PCR analysis may yield fragments of different lengths when the samples are derived from an individual having the phenotype associated with the disease than when the sample is derived from a healthy individual, indicating that the gene containing the extended cDNA may be responsible for the genetic disease.
VI. Use of Extended cDNAs (or Genomic DNAs Obtainable Therefrom) to Construct Vectors
The present extended cDNAs (or genomic DNAs obtainable therefrom) may also be used to construct secretion vectors capable of directing the secretion of the proteins encoded by genes inserted in the vectors. Such secretion vectors may facilitate the purification or enrichment of the proteins encoded by genes inserted therein by reducing the number of background proteins from which the desired protein must be purified or enriched. Exemplary secretion vectors are described in Example 54 below.
EXAMPLE 54 Construction of Secretion Vectors
The secretion vectors of the present invention include a promoter capable of directing gene expression in the host cell, tissue, or organism of interest. Such promoters include the Rous Sarcoma Virus promoter, the SV40 promoter, the human cytomegalovirus promoter, and other promoters familiar to those skilled in the art.
A signal sequence from an extended cDNA (or genomic DNA obtainable therefrom), such as one of the signal sequences in SEQ ID NOs: 134-180 as defined in Table VII above, is operably linked to the promoter such that the mRNA transcribed from the promoter will direct the translation of the signal peptide.
The host cell, tissue, or organism may be any cell, tissue, or organism which recognizes the signal peptide encoded by the signal sequence in the extended cDNA (or genomic DNA obtainable therefrom). Suitable hosts include mammalian cells, tissues or organisms, avian cells, tissues, or organisms, insect cells, tissues or organisms, or yeast.
In addition, the secretion vector contains cloning sites for inserting genes encoding the proteins which are to be secreted. The cloning sites facilitate the cloning of the insert gene in frame with the signal sequence such that a fusion protein in which the signal peptide is fused to the protein encoded by the inserted gene is expressed from the mRNA transcribed from the promoter. The signal peptide directs the extracellular secretion of the fusion protein.
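The requirement that the inserted gene be cloned in frame with the signal sequence can be sketched as a simple check. This is an illustrative sketch only: the function names and the toy sequences are hypothetical, the standard genetic code is assumed, and a real vector would also carry cloning-site nucleotides between the two coding segments.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons of `dna`, reading from the first nucleotide."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def fuse_in_frame(signal_cds, insert_cds):
    """Return the fusion protein, or raise if the insert is out of frame."""
    if not signal_cds.upper().startswith("ATG"):
        raise ValueError("signal sequence must begin with a start codon")
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal sequence length shifts the insert out of frame")
    protein = translate(signal_cds + insert_cds)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon in the fusion")
    return protein


# Toy sequences (hypothetical, not taken from the sequence listing).
print(fuse_in_frame("ATGAAGCTGCTG", "GATGAATAA"))
```

A signal sequence whose length is not a multiple of three is rejected, because every codon of the inserted gene would then be read in the wrong frame.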
The secretion vector may be DNA or RNA and may integrate into the chromosome of the host, be stably maintained as an extrachromosomal replicon in the host, be an artificial chromosome, or be transiently present in the host. Many nucleic acid backbones suitable for use as secretion vectors are known to those skilled in the art, including retroviral vectors, SV40 vectors, Bovine Papilloma Virus vectors, yeast integrating plasmids, yeast episomal plasmids, yeast artificial chromosomes, human artificial chromosomes, P element vectors, baculovirus vectors, or bacterial plasmids capable of being transiently introduced into the host.
The secretion vector may also contain a polyA signal such that the polyA signal is located downstream of the gene inserted into the secretion vector.
After the gene encoding the protein for which secretion is desired is inserted into the secretion vector, the secretion vector is introduced into the host cell, tissue, or organism using calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection, viral particles or as naked DNA. The protein encoded by the inserted gene is then purified or enriched from the supernatant using conventional techniques such as ammonium sulfate precipitation, immunoprecipitation, immunochromatography, size exclusion chromatography, ion exchange chromatography, and HPLC.
Alternatively, the secreted protein may be in a sufficiently enriched or pure state in the supernatant or growth media of the host to permit it to be used for its intended purpose without further enrichment.
The signal sequences may also be inserted into vectors designed for gene therapy. In such vectors, the signal sequence is operably linked to a promoter such that mRNA transcribed from the promoter encodes the signal peptide. A cloning site is located downstream of the signal sequence such that a gene encoding a protein whose secretion is desired may readily be inserted into the vector and fused to the signal sequence. The vector is introduced into an appropriate host cell. The protein expressed from the promoter is secreted extracellularly, thereby producing a therapeutic effect.
The extended cDNAs or 5' ESTs may also be used to clone sequences located upstream of the extended cDNAs or 5' ESTs which are capable of regulating gene expression, including promoter sequences, enhancer sequences, and other upstream sequences which influence transcription or translation levels. Once identified and cloned, these upstream regulatory sequences may be used in expression vectors designed to direct the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative fashion. Example 55 describes a method for cloning sequences upstream of the extended cDNAs or 5' ESTs.
EXAMPLE 55 Use of Extended cDNAs or 5' ESTs to Clone Upstream Sequences from Genomic DNA
Sequences derived from extended cDNAs or 5' ESTs may be used to isolate the promoters of the corresponding genes using chromosome walking techniques. In one chromosome walking technique, which utilizes the GenomeWalker™ kit available from Clontech, five complete genomic DNA samples are each digested with a different restriction enzyme which has a 6 base recognition site and leaves a blunt end.
Following digestion, oligonucleotide adapters are ligated to each end of the resulting genomic DNA fragments.
For each of the five genomic DNA libraries, a first PCR reaction is performed according to the manufacturer's instructions using an outer adaptor primer provided in the kit and an outer gene specific primer. The gene specific primer should be selected to be specific for the extended cDNA or 5' EST of interest and should have a melting temperature, length, and location in the extended cDNA or 5' EST which is consistent with its use in PCR reactions. Each first PCR reaction contains 5 ng of genomic DNA, 5 µl of Tth reaction buffer, 0.2 mM of each dNTP, 0.2 µM each of outer adaptor primer and outer gene specific primer, 1.1 mM of Mg(OAc)2 and 1 µl of the Tth polymerase 50X mix in a total volume of 50 µl.
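The composition of the first PCR reaction can be turned into pipetting volumes with the usual C1·V1 = C2·V2 relation. The sketch below is illustrative only: the buffer and polymerase volumes and the final concentrations come from the text, while the stock concentrations (10 mM each dNTP, 10 µM primers, 25 mM Mg(OAc)2) and the 1 µl template volume are assumptions chosen for the example and should be replaced by the values of the reagents actually used.

```python
TOTAL_UL = 50.0  # total reaction volume stated in the text


def volume_from_stock(final_conc, stock_conc, total_ul=TOTAL_UL):
    """Volume of stock (µl) giving final_conc in total_ul (C1*V1 = C2*V2).

    final_conc and stock_conc must be expressed in the same unit.
    """
    return final_conc / stock_conc * total_ul


def first_pcr_mix():
    """Per-reaction volumes in µl for one first-round PCR reaction."""
    mix = {
        "Tth reaction buffer": 5.0,                             # from the text
        "Tth polymerase 50X mix": 1.0,                          # from the text
        "genomic DNA (5 ng)": 1.0,                              # ASSUMED volume
        "dNTP mix": volume_from_stock(0.2, 10.0),               # mM, ASSUMED stock
        "outer adaptor primer": volume_from_stock(0.2, 10.0),   # µM, ASSUMED stock
        "outer gene specific primer": volume_from_stock(0.2, 10.0),
        "Mg(OAc)2": volume_from_stock(1.1, 25.0),               # mM, ASSUMED stock
    }
    mix["water"] = TOTAL_UL - sum(mix.values())
    return mix


def scale(mix, n_reactions):
    """Scale per-reaction volumes to a master mix for n_reactions."""
    return {name: vol * n_reactions for name, vol in mix.items()}


for name, vol in scale(first_pcr_mix(), 5).items():  # five genomic libraries
    print(f"{name}: {vol:.1f} µl")
```

In practice the template and the gene specific primer differ between reactions, so only the shared components would normally be combined in a master mix.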
The reaction cycle for the first PCR reaction is as follows:
1 min 94°C
2 sec 94°C, 3 min 72°C (7 cycles)
2 sec 94°C, 3 min 67°C (32 cycles)
5 min 67°C
The product of the first PCR reaction is diluted and used as a template for a second PCR reaction according to the manufacturer's instructions using a pair of nested primers which are located internally on the amplicon resulting from the first PCR reaction. For example, 5 µl of the reaction product of the first PCR reaction mixture may be diluted 180 times. Reactions are made in a 50 µl volume having a composition identical to that of the first PCR reaction except the nested primers are used. The first nested primer is specific for the adaptor, and is provided with the GenomeWalker™ kit. The second nested primer is specific for the particular extended cDNA or 5' EST for which the promoter is to be cloned and should have a melting temperature, length, and location in the extended cDNA or 5' EST which is consistent with its use in PCR reactions. The reaction parameters of the second PCR reaction are as follows:
1 min 94°C
2 sec 94°C, 3 min 72°C (6 cycles)
2 sec 94°C, 3 min 67°C (25 cycles)
5 min 67°C
The product of the second PCR reaction is purified, cloned, and sequenced using standard techniques.
Alternatively, two or more human genomic DNA libraries can be constructed by using two or more restriction enzymes. The digested genomic DNA is cloned into vectors which can be converted into single stranded, circular, or linear DNA. A biotinylated oligonucleotide comprising at least 15 nucleotides from the extended cDNA or 5' EST sequence is hybridized to the single stranded DNA. Hybrids between the biotinylated oligonucleotide and the single stranded DNA containing the extended cDNA or EST sequence are isolated as described in Example 29 above.
Thereafter, the single stranded DNA containing the extended cDNA or EST sequence is released from the beads and converted into double stranded DNA using a primer specific for the extended cDNA or 5' EST sequence or a primer corresponding to a sequence included in the cloning vector. The resulting double stranded DNA is transformed into bacteria. DNAs containing the 5' EST or extended cDNA sequences are identified by colony PCR or colony hybridization.
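For reference, the cycling parameters given above for the first and second chromosome-walking PCR reactions can be written down as data and summarized. This is a minimal sketch; the data layout and the function names are hypothetical, and the computed times cover programmed hold steps only, not the ramping time of a particular thermal cycler.

```python
# Each stage: (number of cycles, [(seconds, temperature in °C), ...]).
FIRST_PCR = [
    (1, [(60, 94)]),
    (7, [(2, 94), (180, 72)]),
    (32, [(2, 94), (180, 67)]),
    (1, [(300, 67)]),
]
SECOND_PCR = [
    (1, [(60, 94)]),
    (6, [(2, 94), (180, 72)]),
    (25, [(2, 94), (180, 67)]),
    (1, [(300, 67)]),
]


def total_hold_seconds(program):
    """Sum of all programmed hold times, ignoring temperature ramping."""
    return sum(cycles * sum(sec for sec, _ in steps) for cycles, steps in program)


def amplification_cycles(program):
    """Number of denaturation/extension cycles (stages with two steps)."""
    return sum(cycles for cycles, steps in program if len(steps) == 2)


for name, program in (("first", FIRST_PCR), ("second", SECOND_PCR)):
    minutes = total_hold_seconds(program) / 60
    print(f"{name} PCR: {amplification_cycles(program)} cycles, "
          f"{minutes:.1f} min of hold time")
```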
Once the upstream genomic sequences have been cloned and sequenced as described above, prospective promoters and transcription start sites within the upstream sequences may be identified by comparing the sequences upstream of the extended cDNAs or 5' ESTs with databases containing known transcription start sites, transcription factor binding sites, or promoter sequences.
In addition, promoters in the upstream sequences may be identified using promoter reporter vectors as described in Example 56.
EXAMPLE 56 Identification of Promoters in Cloned Upstream Sequences
The genomic sequences upstream of the extended cDNAs or 5' ESTs are cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, β-galactosidase, or green fluorescent protein. The sequences upstream of the extended cDNAs or 5' ESTs are inserted into the cloning sites upstream of the reporter gene in both orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and compared to the level obtained from a vector which lacks an insert in the cloning site. The presence of an elevated expression level in the vector containing the insert with respect to the control vector indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned into vectors which contain an enhancer for augmenting transcription levels from weak promoter sequences. A significant level of expression above that observed with the vector lacking an insert indicates that a promoter sequence is present in the inserted upstream sequence.
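The comparison between insert-containing and insert-free reporter vectors amounts to a fold-over-control calculation. The sketch below is illustrative: the reporter readings are invented, and the threshold of 3-fold used to call a promoter is an assumption made for the example, since the text specifies only that expression must be significantly elevated.

```python
def fold_over_control(insert_signal, control_signal):
    """Reporter signal of the insert vector relative to the empty vector."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / control_signal


def classify_inserts(readings, control_signal, threshold=3.0):
    """Map each construct to (fold induction, promoter call).

    `threshold` is a hypothetical cutoff, not a value given in the text.
    """
    result = {}
    for construct, signal in readings.items():
        fold = fold_over_control(signal, control_signal)
        result[construct] = (fold, fold >= threshold)
    return result


# Hypothetical reporter readings (arbitrary units) for one upstream
# fragment cloned in both orientations, plus the insert-free control.
readings = {"forward": 1200.0, "reverse": 110.0}
for construct, (fold, is_promoter) in classify_inserts(readings, 100.0).items():
    print(construct, round(fold, 1), is_promoter)
```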
Appropriate host cells for the promoter reporter vectors may be chosen based on the results of the above described determination of expression patterns of the extended cDNAs and ESTs. For example, if the expression pattern analysis indicates that the mRNA corresponding to a particular extended cDNA or EST is expressed in fibroblasts, the promoter reporter vector may be introduced into a human fibroblast cell line.
Promoter sequences within the upstream genomic DNA may be further defined by constructing nested deletions in the upstream DNA using conventional techniques such as Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate potential transcription factor binding sites within the promoter individually or in combination. The effects of these mutations on transcription levels may be determined by inserting the mutations into the cloning sites in the promoter reporter vectors.
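The use of a nested deletion series to locate a promoter boundary can be sketched as follows. The sketch assumes 5' deletions, each described by the number of upstream base pairs retained and the reporter activity measured for that construct; the data, the function name, and the 50% activity cutoff are hypothetical.

```python
def map_5prime_boundary(deletions, full_length_activity, cutoff=0.5):
    """Locate the interval in which promoter activity is lost.

    deletions: list of (upstream_bp_retained, reporter_activity).
    Returns (shortest_active_bp, longest_inactive_bp); the 5' boundary of
    the promoter lies between these two deletion endpoints.
    """
    shortest_active = None
    for retained, activity in sorted(deletions, reverse=True):  # longest first
        if activity >= cutoff * full_length_activity:
            shortest_active = retained
        else:
            return (shortest_active, retained)
    return (shortest_active, None)  # activity never dropped below the cutoff


# Hypothetical deletion series (bp retained, activity in arbitrary units).
series = [(800, 95), (600, 90), (400, 88), (200, 12), (100, 5)]
print(map_5prime_boundary(series, full_length_activity=100))
```

With these invented values, activity is retained with 400 bp of upstream sequence and lost with 200 bp, which places the boundary between those two endpoints.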
EXAMPLE 57 Cloning and Identification of Promoters
Using the method described in Example 55 above with 5' ESTs, sequences upstream of several genes were obtained. Using the primer pair GGG AAG ATG GAG ATA GTA TTG CCT G (SEQ ID NO:29) and CTG CCA TGT ACA TGA TAG AGA GAT TC (SEQ ID NO:30), the promoter having the internal designation P13H2 (SEQ ID NO:31) was obtained.
Using the primer pair GTA CCA GGGG ACT GTG ACC ATT GC (SEQ ID NO:32) and CTG TGA CCA TTG CTC CCA AGA GAG (SEQ ID NO:33), the promoter having the internal designation P15B4 (SEQ ID NO:34) was obtained.
Using the primer pair CTG GGA TGG AAG GCA CGG TA (SEQ ID NO:35) and GAG ACC ACA CAG CTA GAC AA (SEQ ID NO:36), the promoter having the internal designation P29B6 (SEQ ID NO:37) was obtained.
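Basic properties of such primers, namely length, GC content, and an approximate melting temperature, can be computed with a few lines of code. The sketch below applies the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough rule of thumb intended for short oligonucleotides; it is used here for illustration and is not stated to be the method used to design the primers above. The function name is hypothetical.

```python
def primer_summary(primer):
    """Length, GC fraction and Wallace-rule Tm (°C) of a primer.

    Spaces used to group the sequence in triplets are ignored.
    """
    seq = primer.replace(" ", "").upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer contains non-ACGT characters")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        "wallace_tm_c": 2 * at + 4 * gc,  # rough estimate only
    }


# Primer pair reported for promoter P13H2 (SEQ ID NO:29 and SEQ ID NO:30).
for primer in ("GGG AAG ATG GAG ATA GTA TTG CCT G",
               "CTG CCA TGT ACA TGA TAG AGA GAT TC"):
    print(primer_summary(primer))
```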
Figure 7 provides a schematic description of the promoters isolated and the way they are assembled with the corresponding 5' tags. The upstream sequences were screened for the presence of motifs resembling transcription factor binding sites or known transcription start sites using the computer program MatInspector release 2.0, August 1996.
Figure 8 describes the transcription factor binding sites present in each of these promoters. The column labeled "matrices" provides the name of the MatInspector matrix used. The column labeled "position" provides the 5' position of the promoter site. Numeration of the sequence starts from the transcription start site as determined by matching the genomic sequence with the 5' EST sequence. The column labeled "orientation" indicates the DNA strand on which the site is found, with the + strand being the coding strand as determined by matching the genomic sequence with the sequence of the 5' EST. The column labeled "score" provides the MatInspector score found for this site. The column labeled "length" provides the length of the site in nucleotides. The column labeled "sequence" provides the sequence of the site found.
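The kind of output tabulated in Figure 8 (position, orientation, score, length, sequence) can be illustrated with a simplified matrix scan. The sketch below is not the MatInspector algorithm: it uses a hypothetical six-position weight matrix, a plain normalized sum as the score, and positions counted from the first nucleotide of the scanned sequence rather than from the transcription start site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical weight matrix (GC-box-like consensus GGGCGG); one dict per
# position, mapping a base to its weight. Bases not listed weigh 0.
MATRIX = [
    {"G": 1.0}, {"G": 1.0}, {"G": 0.8, "A": 0.2},
    {"C": 1.0}, {"G": 1.0}, {"G": 1.0},
]


def score(site, matrix):
    """Sum of matched weights divided by the best attainable sum."""
    best = sum(max(col.values()) for col in matrix)
    return sum(col.get(base, 0.0) for col, base in zip(matrix, site)) / best


def scan(seq, matrix, threshold=0.9):
    """Report sites on both strands whose score reaches the threshold."""
    seq = seq.upper()
    n = len(matrix)
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        for strand, site in (("+", window), ("-", revcomp(window))):
            s = score(site, matrix)
            if s >= threshold:
                hits.append({"position": i + 1, "orientation": strand,
                             "score": round(s, 3), "length": n,
                             "sequence": site})
    return hits


for hit in scan("AAGGGCGGTTCCGCCCAA", MATRIX):
    print(hit)
```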
The promoters and other regulatory sequences located upstream of the extended cDNAs or 5' ESTs may be used to design expression vectors capable of directing the expression of an inserted gene in a desired spatial, temporal, developmental, or quantitative manner. A promoter capable of directing the desired spatial, temporal, developmental, and quantitative patterns may be selected using the results of the expression analysis described in Example 26 above. For example, if a promoter which confers a high level of expression in muscle is desired, the promoter sequence upstream of an extended cDNA or 5' EST derived from an mRNA which is expressed at a high level in muscle, as determined by the method of Example 26, may be used in the expression vector.
Preferably, the desired promoter is placed near multiple restriction sites to facilitate the cloning of the desired insert downstream of the promoter, such that the promoter is able to drive expression of the inserted gene. The promoter may be inserted in conventional nucleic acid backbones designed for extrachromosomal replication, integration into the host chromosomes or transient expression. Suitable backbones for the present expression vectors include retroviral backbones, backbones from eukaryotic episomes such as SV40 or Bovine Papilloma Virus, backbones from bacterial episomes, or artificial chromosomes.
Preferably, the expression vectors also include a polyA signal downstream of the multiple restriction sites for directing the polyadenylation of mRNA transcribed from the gene inserted into the expression vector.
Following the identification of promoter sequences using the procedures of Examples 55-57, proteins which interact with the promoter may be identified as described in Example 58 below.
EXAMPLE 58 Identification of Proteins Which Interact with Promoter Sequences, Upstream Regulatory Sequences, or mRNA
Sequences within the promoter region which are likely to bind transcription factors may be identified by homology to known transcription factor binding sites or through conventional mutagenesis or deletion analyses of reporter plasmids containing the promoter sequence. For example, deletions may be made in a reporter plasmid containing the promoter sequence of interest operably linked to an assayable reporter gene. The reporter plasmids carrying various deletions within the promoter region are transfected into an appropriate host cell and the effects of the deletions on expression levels are assessed. Transcription factor binding sites within the regions in which deletions reduce expression levels may be further localized using site directed mutagenesis, linker scanning analysis, or other techniques familiar to those skilled in the art. Nucleic acids encoding proteins which interact with sequences in the promoter may be identified using one-hybrid systems such as those described in the manual accompanying the Matchmaker One-Hybrid System kit available from Clontech (Catalog No. K1603-1). Briefly, the Matchmaker One-Hybrid System is used as follows. The target sequence for which it is desired to identify binding proteins is cloned upstream of a selectable reporter gene and integrated into the yeast genome. Preferably, multiple copies of the target sequences are inserted into the reporter plasmid in tandem.
A library comprised of fusions between cDNAs to be evaluated for the ability to bind to the promoter and the activation domain of a yeast transcription factor, such as GAL4, is transformed into the yeast strain containing the integrated reporter sequence. The yeast are plated on selective media to select cells expressing the selectable marker linked to the promoter sequence. The colonies which grow on the selective media contain genes encoding proteins which bind the target sequence. The inserts in the genes encoding the fusion proteins are further characterized by sequencing. In addition, the inserts may be inserted into expression vectors or in vitro transcription vectors. Binding of the polypeptides encoded by the inserts to the promoter DNA may be confirmed by techniques familiar to those skilled in the art, such as gel shift analysis or DNase protection analysis.
VII. Use of Extended cDNAs (or Genomic DNAs Obtainable Therefrom) in Gene Therapy
The present invention also comprises the use of extended cDNAs (or genomic DNAs obtainable therefrom) in gene therapy strategies, including antisense and triple helix strategies as described in Examples 59 and 60 below. In antisense approaches, nucleic acid sequences complementary to an mRNA are hybridized to the mRNA intracellularly, thereby blocking the expression of the protein encoded by the mRNA. The antisense sequences may prevent gene expression through a variety of mechanisms. For example, the antisense sequences may inhibit the ability of ribosomes to translate the mRNA. Alternatively, the antisense sequences may block transport of the mRNA from the nucleus to the cytoplasm, thereby limiting the amount of mRNA available for translation. Another mechanism through which antisense sequences may inhibit gene expression is by interfering with mRNA splicing. In yet another strategy, the antisense nucleic acid may be incorporated in a ribozyme capable of specifically cleaving the target mRNA.
EXAMPLE 59 Preparation and Use of Antisense Oligonucleotides
The antisense nucleic acid molecules to be used in gene therapy may be either DNA or RNA sequences. They may comprise a sequence complementary to the sequence of the extended cDNA (or genomic DNA obtainable therefrom). The antisense nucleic acids should have a length and melting temperature sufficient to permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene therapy are disclosed in Green et al., Ann. Rev. Biochem. 55:569-597 (1986) and Izant and Weintraub, Cell 36:1007-1015 (1984).
In some strategies, antisense molecules are obtained from a nucleotide sequence encoding a protein by reversing the orientation of the coding region with respect to a promoter so as to transcribe the opposite strand from that which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript. Another approach involves transcription of the antisense nucleic acids in vivo by operably linking DNA containing the antisense sequence to a promoter in an expression vector.
Alternatively, oligonucleotides which are complementary to the strand normally transcribed in the cell may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the corresponding mRNA and are capable of hybridizing to the mRNA to create a duplex. In some embodiments, the antisense sequences may contain modified sugar phosphate backbones to increase stability and make them less sensitive to RNase activity. Examples of modifications suitable for use in antisense strategies are described by Rossi et al., Pharmacol. Ther. 50(2):245-254 (1991).
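The relationship between a target mRNA segment and its antisense oligonucleotide is that of a reverse complement, which can be sketched as follows. The function name and the example segment are hypothetical; the sketch returns a DNA oligonucleotide written 5' to 3'.

```python
# Complement of each mRNA (or DNA) base, expressed as a DNA base.
_DNA_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_dna(mrna_segment):
    """DNA oligonucleotide (5'->3') complementary to an mRNA segment (5'->3')."""
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(mrna_segment.upper()))


# Hypothetical target segment of an mRNA.
print(antisense_dna("AUGGCCAAGCUU"))
```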
Various types of antisense oligonucleotides complementary to the sequence of the extended cDNA (or genomic DNA obtainable therefrom) may be used. In one preferred embodiment, stable and semi-stable antisense oligonucleotides described in International Application No. PCT WO94/23026 are used. In these molecules, the 3' end or both the 3' and 5' ends are engaged in intramolecular hydrogen bonding between complementary base pairs. These molecules are better able to withstand exonuclease attacks and exhibit increased stability compared to conventional antisense oligonucleotides.
In another preferred embodiment, the antisense oligodeoxynucleotides against herpes simplex virus types 1 and 2 described in International Application No. WO 95/04141 are used.
In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides described in International Application No. WO 96/31523 are used. These double- or single-stranded oligonucleotides comprise one or more, respectively, inter- or intra-oligonucleotide covalent cross-linkages, wherein the linkage consists of an amide bond between a primary amine group of one strand and a carboxyl group of the other strand or of the same strand, respectively, the primary amine group being directly substituted in the 2' position of the strand nucleotide monosaccharide ring, and the carboxyl group being carried by an aliphatic spacer group substituted on a nucleotide or nucleotide analog of the other strand or the same strand, respectively.
The antisense oligodeoxynucleotides and oligonucleotides disclosed in International Application No. WO 92/18522, may also be used. These molecules are stable to degradation and contain at least one transcription control recognition sequence which binds to control proteins and are effective as decoys therefor. These molecules may contain "hairpin" structures, "dumbbell" structures, "modified dumbbell" structures, "cross-linked" decoy structures and "loop" structures.
In another preferred embodiment, the cyclic double-stranded oligonucleotides described in European Patent Application No. 0 572 287 A2 are used. These ligated oligonucleotide "dumbbells" contain the binding site for a transcription factor and inhibit expression of the gene under control of the transcription factor by sequestering the factor.
Use of the closed antisense oligonucleotides disclosed in International Application No. WO 92/19732 is also contemplated. Because these molecules have no free ends, they are more resistant to degradation by exonucleases than are conventional oligonucleotides. These oligonucleotides may be multifunctional, interacting with several regions which are not adjacent to the target mRNA.
The appropriate level of antisense nucleic acids required to inhibit gene expression may be determined using in vitro expression analysis. The antisense molecule may be introduced into the cells by diffusion, injection, infection or transfection using procedures known in the art. For example, the antisense nucleic acids can be introduced into the body as a bare or naked oligonucleotide, oligonucleotide encapsulated in lipid, oligonucleotide sequence encapsidated by viral protein, or as an oligonucleotide operably linked to a promoter contained in an expression vector. The expression vector may be any of a variety of expression vectors known in the art, including retroviral or viral vectors, vectors capable of extrachromosomal replication, or integrating vectors. The vectors may be DNA or RNA.
The antisense molecules are introduced onto cell samples at a number of different concentrations, preferably between 1 x 10⁻¹⁰ M and 1 x 10⁻⁴ M. Once the minimum concentration that can adequately control gene expression is identified, the optimized dose is translated into a dosage suitable for use in vivo. For example, an inhibiting concentration in culture of 1 x 10⁻⁷ M translates into a dose of approximately 0.6 mg/kg bodyweight. Levels of oligonucleotide approaching 100 mg/kg bodyweight or higher may be possible after testing the toxicity of the oligonucleotide in laboratory animals. It is additionally contemplated that cells from the vertebrate are removed, treated with the antisense oligonucleotide, and reintroduced into the vertebrate.
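The text does not state how the in vitro concentration is converted into a dose. The sketch below shows one set of assumptions under which an inhibiting concentration of 1 x 10⁻⁷ M corresponds to roughly 0.6 mg/kg: an 18-mer oligonucleotide, an average mass of about 330 g/mol per nucleotide, and a distribution volume of 1 litre per kg bodyweight. The concentration value itself and all three parameters are assumptions made for illustration, and the function name is hypothetical.

```python
def dose_mg_per_kg(molar_conc, oligo_length_nt,
                   avg_nt_mass=330.0, distribution_l_per_kg=1.0):
    """Convert a molar concentration into an approximate dose in mg/kg.

    avg_nt_mass (g/mol per nucleotide) and distribution_l_per_kg are
    ASSUMED values; neither is given in the text.
    """
    molecular_weight = oligo_length_nt * avg_nt_mass      # g/mol
    mg_per_litre = molar_conc * molecular_weight * 1000.0  # g/L -> mg/L
    return mg_per_litre * distribution_l_per_kg


# ASSUMED example: 1e-7 M of an 18-mer.
print(round(dose_mg_per_kg(1e-7, 18), 3))
```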
It is further contemplated that the antisense oligonucleotide sequence is incorporated into a ribozyme sequence to enable the antisense to specifically bind and cleave its target mRNA. For technical applications of ribozyme and antisense oligonucleotides see Rossi et al., supra.
In a preferred application of this invention, the polypeptide encoded by the gene is first identified, so that the effectiveness of antisense inhibition on translation can be monitored using techniques that include but are not limited to antibody-mediated tests such as RIAs and ELISA, functional assays, or radiolabeling.
The extended cDNAs of the present invention (or genomic DNAs obtainable therefrom) may also be used in gene therapy approaches based on intracellular triple helix formation. Triple helix oligonucleotides are used to inhibit transcription from a genome. They are particularly useful for studying alterations in cell activity as it is associated with a particular gene. The extended cDNAs (or genomic DNAs obtainable therefrom) of the present invention or, more preferably, a portion of those sequences, can be used to inhibit gene expression in individuals having diseases associated with expression of a particular gene. Similarly, a portion of the extended cDNA (or genomic DNA obtainable therefrom) can be used to study the effect of inhibiting transcription of a particular gene within a cell. Traditionally, homopurine sequences were considered the most useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at homopurine:homopyrimidine sequences. Thus, both types of sequences from the extended cDNA or from the gene corresponding to the extended cDNA are contemplated within the scope of this invention.
EXAMPLE 60 Preparation and Use of Triple Helix Probes
The sequences of the extended cDNAs (or genomic DNAs obtainable therefrom) are scanned to identify 10-mer to 20-mer homopyrimidine or homopurine stretches which could be used in triple-helix based strategies for inhibiting gene expression. Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in inhibiting gene expression is assessed by introducing varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells which normally express the target gene. The oligonucleotides may be prepared on an oligonucleotide synthesizer or they may be purchased commercially from a company specializing in custom oligonucleotide synthesis, such as GENSET, Paris, France.
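The scan for homopurine or homopyrimidine stretches can be sketched with a regular expression. This is a minimal illustration: the function name and the test sequence are hypothetical, and the function reports every uninterrupted run of at least 10 nucleotides, from which candidate 10-mers to 20-mers can then be chosen.

```python
import re


def find_stretches(seq, min_len=10):
    """Return (1-based start, kind, run) for each homopurine or
    homopyrimidine run of at least min_len nucleotides."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    hits = []
    for match in pattern.finditer(seq.upper()):
        run = match.group()
        kind = "homopurine" if run[0] in "AG" else "homopyrimidine"
        hits.append((match.start() + 1, kind, run))
    return hits


# Hypothetical test sequence assembled from labelled pieces.
sequence = "TC" + "AGGAGAAGGGAG" + "TCA" + "CTTCCTCTTCT" + "GA"
for start, kind, run in find_stretches(sequence):
    print(start, kind, len(run), run)
```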
The oligonucleotides may be introduced into the cells using a variety of methods known to those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran, electroporation, liposome-mediated transfection or native uptake.
Treated cells are monitored for altered cell function or reduced gene expression using techniques such as Northern blotting, RNase protection assays, or PCR based strategies to monitor the transcription levels of the target gene in cells which have been treated with the oligonucleotide. The cell functions to be monitored are predicted based upon the homologies of the target gene corresponding to the extended cDNA from which the oligonucleotide was derived with known gene sequences that have been associated with a particular function. The cell functions can also be predicted based on the presence of abnormal physiologies within cells derived from individuals with a particular inherited disease, particularly when the extended cDNA is associated with the disease using techniques described in Example 53.
The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells may then be introduced in vivo using the techniques described above and in Example 59 at a dosage calculated based on the in vitro results, as described in Example 59.
In some embodiments, the natural (beta) anomers of the oligonucleotide units can be replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides suitable for triple helix formation see Griffin et al. Science 245:967-971 (1989).
EXAMPLE 61 Use of Extended cDNAs to Express an Encoded Protein in a Host Organism
The extended cDNAs of the present invention may also be used to express an encoded protein in a host organism to produce a beneficial effect. In such procedures, the encoded protein may be transiently expressed in the host organism or stably expressed in the host organism. The encoded protein may have any of the activities described above. The encoded protein may be a protein which the host organism lacks or, alternatively, the encoded protein may augment the existing levels of the protein in the host organism.
A full length extended cDNA encoding the signal peptide and the mature protein, or an extended cDNA encoding only the mature protein is introduced into the host organism. The extended cDNA may be introduced into the host organism using a variety of techniques known to those of skill in the art. For example, the extended cDNA may be injected into the host organism as naked DNA such that the encoded protein is expressed in the host organism, thereby producing a beneficial effect.
Alternatively, the extended cDNA may be cloned into an expression vector downstream of a promoter which is active in the host organism. The expression vector may be any of the expression vectors designed for use in gene therapy, including viral or retroviral vectors.
The expression vector may be directly introduced into the host organism such that the encoded protein is expressed in the host organism to produce a beneficial effect. In another approach, the expression vector may be introduced into cells in vitro. Cells containing the expression vector are thereafter selected and introduced into the host organism, where they express the encoded protein to produce a beneficial effect.
EXAMPLE 62
Use Of Signal Peptides Encoded By 5' ESTs Or Sequences Obtained Therefrom To Import Proteins Into Cells
The short core hydrophobic region (h region) of signal peptides encoded by the 5' ESTs or extended cDNAs derived from the 5' ESTs of the present invention may also be used as a carrier to import a peptide or a protein of interest, so-called cargo, into tissue culture cells (Lin et al., J. Biol. Chem., 270: 14225-14258 (1995); Du et al., J. Peptide Res., 51: 235-243 (1998); Rojas et al., Nature Biotech., 16: 370-375 (1998)).
When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be translocated across the cell membrane, chemical synthesis may be used in order to add the h region to either the C-terminus or the N-terminus of the cargo peptide of interest. Alternatively, when longer peptides or proteins are to be imported into cells, nucleic acids can be genetically engineered, using techniques familiar to those skilled in the art, in order to link the extended cDNA sequence encoding the h region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into appropriate cells, using conventional techniques to produce the resulting cell permeable polypeptide. Suitable host cells are then simply incubated with the cell permeable polypeptide which is then translocated across the membrane.
This method may be applied to study diverse intracellular functions and cellular processes. For instance, it has been used to probe functionally relevant domains of intracellular proteins and to examine protein-protein interactions involved in signal transduction pathways (Lin et al., supra; Lin et al., J. Biol. Chem., 271: 5305-5308 (1996); Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); Rojas et al., Biochem. Biophys. Res. Commun., 234: 675-680 (1997)).
Such techniques may be used in cellular therapy to import proteins producing therapeutic effects.
For instance, cells isolated from a patient may be treated with imported therapeutic proteins and then reintroduced into the host organism.
Alternatively, the h region of signal peptides of the present invention could be used in combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such nucleic acids may be antisense oligonucleotides or oligonucleotides designed to form triple helixes, as described in Examples 59 and 60 respectively, in order to inhibit processing and maturation of a target cellular RNA.
EXAMPLE 63
Reassembling and Resequencing of Clones
Further study of the clones reported in SEQ ID NOs: 40 to 86 revealed a series of abnormalities.
As a result, the clones were resequenced twice, reanalyzed and the open reading frames were reassigned.
The corrected nucleotide sequences have been disclosed in SEQ ID NOs: 134 to 180 and the predicted amino acid sequences for the corresponding polypeptides have also been corrected and disclosed in SEQ ID NOs: 181 to 227. The corrected sequences have been placed in the Sequence Listing in the same order as the original sequences from which they were derived.
After this reanalysis process a few apparent abnormalities persisted. The sequences presented in SEQ ID NOs: 134, 149, 151, and 164 are unlikely to be genuine full length cDNAs. These clones are missing a stop codon and are thus more probably 3' truncated cDNA sequences. Similarly, the sequences presented in SEQ ID NOs: 145, 155, and 166 may also not be genuine full length cDNAs based on homology studies with existing protein sequences. Although each of these sequences encodes a potential start methionine, each could represent a 5' truncated cDNA.
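The stop-codon criterion above can be illustrated with a minimal Python sketch. It is not the procedure used for the clones of the invention; the function name `orf_status`, the example sequences and the 1-based `orf_start` argument are assumptions introduced only for illustration.

```python
# Hypothetical helper: decides whether an open reading frame ends in a stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_status(cdna: str, orf_start: int) -> str:
    """Walk the reading frame codon by codon from a 1-based start position.

    Returns "complete" if an in-frame stop codon is found, otherwise
    "possibly 3' truncated" (the frame runs off the end of the clone).
    """
    seq = cdna.upper()
    for i in range(orf_start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return "complete"
    return "possibly 3' truncated"

# Made-up sequences, not taken from the Sequence Listing.
print(orf_status("GGATGAAATTTTAGCC", 3))  # frame: ATG AAA TTT TAG
print(orf_status("GGATGAAATTTGC", 3))     # frame: ATG AAA TTT, then the clone ends
```

A clone flagged by such a check would still need the homology comparison described above before being treated as truncated.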
In addition, after the reassignment of open reading frames for the clones, new open reading frames were chosen in some instances. In the case of SEQ ID NOs: 135, 149, 155, 160, 166, 171, and 175, the new open reading frames were no longer predicted to contain a signal peptide.
Table VII provides the sequence identification numbers of the extended cDNAs of the present invention, the locations of the full coding sequences in SEQ ID NOs: 134-180 (i.e. the nucleotides encoding both the signal peptide and the mature protein, listed under the heading FCS Location in Table VII), the locations of the nucleotides in SEQ ID NOs: 134-180 which encode the signal peptides (listed under the heading SigPep Location in Table VII), the locations of the nucleotides in SEQ ID NOs: 134-180 which encode the mature proteins generated by cleavage of the signal peptides (listed under the heading Mature Polypeptide Location in Table VII), the locations in SEQ ID NOs: 134-180 of stop codons (listed under the heading Stop Codon Location in Table VII), the locations in SEQ ID NOs: 134-180 of polyA signals (listed under the heading PolyA Signal Location in Table VII) and the locations of polyA sites (listed under the heading PolyA Site Location in Table VII).
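The relationship between these columns can be sketched in Python. The sketch assumes a convention that is inferred from the table entries rather than stated in the text: the FCS location excludes the stop codon, the mature polypeptide starts one nucleotide after the signal peptide ends, and the stop codon, when present, starts one nucleotide after the FCS ends. The function name `derive_locations` is hypothetical.

```python
# Hypothetical helper relating the nucleotide coordinates described for Table VII.
# Assumption: coordinates are 1-based and inclusive, and the FCS excludes the stop codon.
def derive_locations(fcs, sigpep=None):
    """Derive the mature-polypeptide location and expected stop codon position.

    fcs    -- (start, end) of the full coding sequence
    sigpep -- (start, end) of the signal peptide, or None if none is predicted
    """
    fcs_start, fcs_end = fcs
    mature_start = sigpep[1] + 1 if sigpep else fcs_start
    return {
        "mature": (mature_start, fcs_end),
        "stop_codon": fcs_end + 1,                  # absent in 3' truncated clones
        "n_codons": (fcs_end - fcs_start + 1) // 3,
    }

# Values read from the SEQ ID NO: 136 row (FCS 111/401, SigPep 111/194).
print(derive_locations((111, 401), (111, 194)))
```

For a clone without a predicted signal peptide, passing `sigpep=None` makes the mature polypeptide location coincide with the FCS location.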
Table VIII lists the sequence identification numbers of the polypeptides of SEQ ID NOs: 181-227, the locations of the amino acid residues of SEQ ID NOs: 181-227 in the full length polypeptide (second column), the locations of the amino acid residues of SEQ ID NOs: 181-227 in the signal peptides (third column), and the locations of the amino acid residues of SEQ ID NOs: 181-227 in the mature polypeptide created by cleaving the signal peptide from the full length polypeptide (fourth column). In Table VIII, and in the appended sequence listing, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1 and the first amino acid of the signal peptide is designated with the appropriate negative number, in accordance with the regulations governing sequence listings.
EXAMPLE 64
Functional Analysis of Predicted Protein Sequences
Following double-sequencing, new contigs were assembled for each of the extended cDNAs of the present invention and each was compared to known sequences available at the time of filing. These sequences originate from the following databases: Genbank (release 108 and daily releases up to October 15, 1998), Genseq (release 32), PIR (release 53) and Swissprot (release 35). The predicted proteins of the present invention matching known proteins were further classified into 3 categories depending on the level of homology.
It should be noted that, in the numbering of amino acids in the protein sequences discussed in Figures 9 to 16 and Table VI, the first methionine encountered is designated as amino acid number 1. In the appended sequence listing, the first amino acid of the mature protein resulting from cleavage of the signal peptide is designated as amino acid number 1 and the first amino acid of the signal peptide is designated with the appropriate negative number, in accordance with the regulations governing sequence listings.
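The two numbering schemes can be related by a short Python sketch. The function name `listing_number` is hypothetical, and the sketch assumes that the signal peptide length is known and that no residue is numbered zero (the last signal peptide residue is -1 and the first mature residue is 1).

```python
# Hypothetical converter between the two numbering conventions described above.
def listing_number(met_position: int, signal_length: int) -> int:
    """Convert first-methionine numbering (1 = initiator Met) to sequence-listing
    numbering (1 = first residue of the mature protein, signal peptide negative)."""
    if met_position < 1 or signal_length < 0:
        raise ValueError("positions are 1-based and lengths are non-negative")
    if met_position > signal_length:
        return met_position - signal_length       # residue lies in the mature protein
    return met_position - signal_length - 1       # residue lies in the signal peptide

# Illustration with a 13-residue signal peptide.
print(listing_number(1, 13))    # first signal peptide residue
print(listing_number(13, 13))   # last signal peptide residue
print(listing_number(14, 13))   # first mature residue
```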
The first category contains proteins of the present invention exhibiting more than 90% identical amino acid residues on the whole length of the matched protein. They are clearly close homologues which most probably have the same function or a very similar function as the matched protein.
The second category contains proteins of the present invention exhibiting more remote homologies (40 to 90% over the whole protein) indicating that the protein of the present invention is likely to have functions similar to those of the matched protein.
The third category contains proteins exhibiting high homology (90 to 100%) to a domain of a known protein indicating that the matched protein and the protein of the invention may share similar features.
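The three categories can be summarized as a small decision rule. The Python sketch below is only an illustration of the thresholds quoted above; the function name `homology_category` and the boolean flag `whole_protein` (true when the match spans the whole length of the matched protein rather than a single domain) are assumptions.

```python
# Hypothetical classifier following the three homology categories described above.
from typing import Optional

def homology_category(identity_pct: float, whole_protein: bool) -> Optional[int]:
    """Return 1, 2 or 3 for the categories above, or None if no category applies."""
    if whole_protein and identity_pct > 90:
        return 1   # close homologue over the whole matched protein
    if whole_protein and 40 <= identity_pct <= 90:
        return 2   # more remote homology over the whole protein
    if not whole_protein and identity_pct >= 90:
        return 3   # high homology restricted to a domain of a known protein
    return None

print(homology_category(98.2, True))
print(homology_category(65.0, True))
print(homology_category(95.0, False))
```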
In addition, all of the corrected amino acid sequences (SEQ ID NOs: 181 to 227) were scanned for the presence of known protein signatures and motifs. This search was performed against the Prosite 34.0 database, using the Proscan software from the GCG package. Functional signatures and their locations are indicated in Table VI.
A) Proteins which are closely related to known proteins
Protein of SEQ ID NO: 214
The protein of SEQ ID NO: 214 encoded by the extended cDNA SEQ ID NO: 167 isolated from brain shows extensive homology to a human SH3 binding domain glutamic acid-rich like protein or SH3BGRL (Egeo et al., Biochem. Biophys. Res. Commun., 247:302-306 (1998)), Genbank accession number AF042081. As shown by the alignments of Figure 9, the amino acid residues are identical except for positions 63 and 101 in the 114 amino acid long matched sequence. This SH3BGRL protein is itself homologous to the middle proline-rich region of a protein containing an SH3 binding domain, the SH3BGR protein (Scartezzini et al., Hum. Genet., 99:387-392 (1997)). This proline-rich region is also highly conserved in mice. Both SH3BGR and SH3BGRL proteins are thought to be involved in the Down syndrome pathogenesis. The protein of SEQ ID NO: 214 also contains the proline-rich SH3 binding domain (bold) and a potential RGD cell attachment sequence (underlined).
SH3 domains are small, important functional modules found in several proteins from all eukaryotic organisms. They are involved in a whole range of regulation of protein-protein interaction, e.g. in regulating enzymatic activities, in recruiting specific substrates to the enzyme in signal transduction pathways and in interacting with viral proteins, and they are also thought to play a role in determining the localization of proteins to the plasma membrane or the cytoskeleton (for a review, see Cohen et al., Cell, 80:237-248 (1995)).
The Arg-Gly-Asp (RGD) attachment site promotes cell adhesion of a large number of adhesive extracellular matrix, blood and cell surface proteins to their integrin receptors, which have been shown to regulate cell migration, growth, differentiation and apoptosis. This cell adhesion activity is also maintained in short RGD containing synthetic peptides which were shown to exhibit anti-thrombolytic and anti-metastatic activities and to inhibit bone degradation in vivo (for review, see Ruoslahti, Annu. Rev. Cell Dev. Biol., 12:697-715 (1996)).
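Locating a short motif such as RGD in a protein sequence is a plain substring search. The Python sketch below is an illustration only and is not the Proscan search used for Table VI; the function name `find_motif` and the example peptide are made up.

```python
# Hypothetical motif scan reporting 1-based (start, end) positions, overlaps included.
import re

def find_motif(protein: str, motif: str = "RGD"):
    """Return the 1-based inclusive locations of every occurrence of the motif."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")  # lookahead keeps overlapping hits
    return [(m.start() + 1, m.start() + len(motif))
            for m in pattern.finditer(protein.upper())]

# Made-up peptide, not a sequence of the invention.
print(find_motif("MKRGDLLARGDS"))
```

The (start, end) pairs have the same form as the locations listed for the cell attachment site in Table VI.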
Taken together, these data suggest that the protein of SEQ ID NO: 214 may be important in regulating protein-protein interaction in signal transduction pathways, and/or may play a role of localization of proteins to the plasma membrane or cytoskeleton, and/or may play a role in cell adhesion.
Moreover, this protein or part thereof, especially peptides containing the RGD motif, may be useful in diagnosing and treating cancer, thrombosis, osteoporosis and/or in diagnosing and treating disorders associated with Down syndrome.
Proteins of SEQ ID NOs: 185 and 215
The nearly homologous proteins of SEQ ID NOs: 185 and 215 encoded by the extended cDNA SEQ ID NOs: 138 and 168, respectively, exhibit an extensive homology with a murine protein named MP1 for MEK binding partner 1 (Genbank accession number AF082526). The amino acid residues are identical to the murine protein except for positions 39, 118 and 119 of the Genbank MP1 sequence for SEQ ID NO: 215 and except for positions 33, 39, 118 and 119 of the Genbank MP1 sequence for SEQ ID NO: 185. The Genbank MP1 sequence is the 124 amino acid long matched protein region. See the amino acid sequence alignment in Figure 10. MP1 was shown to enhance enzymatic activation of the mitogen-activated protein (MAP) kinase cascade. The MAP kinase pathway is one of the important enzymatic cascades that are conserved among all eukaryotes from yeast to human. This kind of pathway is involved in vital functions such as the regulation of growth, differentiation and apoptosis. MP1 probably acts by facilitating the interaction of the two sequentially acting kinases MEK1 and ERK1 (Schaffer et al., Science, 281:1668-1671 (1998)).
Taken together, these data suggest that the proteins of SEQ ID NOs: 185 and 215 may be involved in regulating protein-protein interaction in the signal transduction pathways. Thus, these proteins may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer, neurodegenerative diseases, cardiovascular disorders, hypertension, renal injury and repair and septic shock.
Protein of SEQ ID NO: 186
The protein of SEQ ID NO: 186 encoded by the extended cDNA SEQ ID NO: 139 exhibits an extensive homology with a murine protein named claudin-2 (Genbank accession number AF072128). The amino acid residues are identical except for the conservative substitutions observed at positions 6, 22, 23, 29, 31, 90, 110, 120, 130, 171, 176, 179, 187, 192, 197, 211, 212, 214, and 217 of the 230 amino acid long matched protein claudin-2. One drastic substitution from glycine to arginine was observed at position 189. See the amino acid sequence alignment in Figure 11. The murine homologue claudin-2 is an integral membrane protein with 4 putative transmembrane domains belonging to a family of proteins thought to be involved in the formation of tight junctions between cells in epithelial or endothelial cell sheets (Furuse et al., J. Cell. Biol., 141:1539-1550 (1998)).
In addition, the protein of SEQ ID NO: 186 shows more remote homology to a family of transmembrane proteins among which are receptors for Clostridium perfringens enterotoxin (CPE) with either high or low affinity for CPE (Katahira et al., J. Biol. Chem., 272:26652-26658 (1997)). The matched region includes the 4 putative transmembrane regions.
Taken together, these data suggest that the protein of SEQ ID NO: 186 may be involved in the formation and/or regulation of tight junctions, and more generally in cell-cell adhesion. This protein may also function as a receptor for a yet unknown ligand that may show homology to CPE. This protein may thus be useful in diagnosing and/or treating disorders associated with changes in epithelium permeability such as infectious diseases caused by Clostridium parasites.
Protein of SEQ ID NO: 213
The protein of SEQ ID NO: 213 encoded by the extended cDNA SEQ ID NO: 166 and expressed in lymphocytes exhibits an extensive homology to a stretch of 121 amino acids of a human hematopoietic maturation factor named glia maturation factor gamma or GMF-γ (Genbank accession number AB001993) and also to other glia maturation factors found in human, bovine and rodent species. The amino acid residues are identical as shown below except for conservative substitutions at positions and 77 of the 142 amino acid long matched protein GMF-γ which is itself highly homologous to GMF-β (Asai et al., Biochim. Biophys. Acta, 1396:242-244 (1998)). See the amino acid sequence alignment in Figure 12. GMF-β was shown to act as a growth and differentiation factor for neurons and glial cells in human brain (Lim et al., Proc. Natl. Acad. Sci. USA 86:3901-3905 (1989); and Harman et al., Brain Res. 56:332-335 (1991)) and is also thought to regulate ERK proteins of the evolutionarily conserved mitogen-activated protein (MAP) kinase cascade which is important in the regulation of growth, differentiation and apoptosis (Zaheer and Lim, J. Biol. Chem., 272:5183-5186 (1997)).
Taken together, these data suggest that the protein of SEQ ID NO: 213 may be involved in cell growth and differentiation and/or in apoptosis and/or in intracellular signaling. Thus, this protein may be useful in diagnosing and/or treating several types of disorders including, but not limited to, cancer, neurodegenerative diseases, cardiovascular disorders, hypertension, renal injury and repair and septic shock.
Protein of SEQ ID NO: 191
The protein of SEQ ID NO: 191 encoded by the extended cDNA SEQ ID NO: 144 and expressed in lymphocytes exhibits an extensive homology to a stretch of 91 amino acids of a human secreted protein expressed in peripheral blood mononucleocytes (Genpep accession number W36955 and Genseq accession number V00433). The amino acid residues are identical except for the substitution of asparagine to isoleucine at position 94, and the conservative substitutions at positions 108, 109 and 110 of the 110 amino acid long matched protein. See the amino acid sequence alignment in Figure 13.
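Statements of the form "identical except for positions ..." can be derived from a pairwise alignment by listing the mismatched columns. The Python sketch below is a minimal illustration; the function name `substitutions` and the two short sequences are made up, and both inputs are assumed to be already aligned to the same length with "-" marking gaps.

```python
# Hypothetical helper: list substituted positions between two pre-aligned sequences.
def substitutions(matched: str, query: str):
    """Return (position, matched_residue, query_residue) for each mismatch.

    Positions are 1-based alignment columns; gap columns ("-") are skipped.
    """
    if len(matched) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(matched.upper(), query.upper()), start=1)
            if a != b and "-" not in (a, b)]

# Made-up aligned fragments showing an asparagine to isoleucine substitution.
print(substitutions("MKNLV", "MKILV"))
```

Deciding whether a listed substitution is conservative would additionally require a substitution matrix or residue grouping, which this sketch does not attempt.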
Protein of SEQ ID NO: 200
The protein of SEQ ID NO: 200 encoded by the extended cDNA SEQ ID NO: 153 exhibits extensive homologies to proteins encoding RING zinc finger proteins of the human, chicken and rodent species, as well as an EGF-like domain. Two stretches of 341 and of 13 amino acids of the human RING zinc finger protein which might bind DNA (Genbank accession number AF037204). The amino acid residues are identical except for conservative substitutions at positions 18, 29, 156 and 282 of the 381 amino acid long human RING zinc finger. See the amino acid sequence alignment in Figure 14. Such RING zinc finger proteins are thought to be involved in protein-protein interaction and are especially found in nucleic acid binding proteins. Secreted proteins may have a nucleic acid binding domain, as shown by a nematode protein thought to regulate gene expression which exhibits zinc fingers as well as a functional signal peptide (Holst and Zipfel, J. Biol. Chem., 271:16275-16733 (1996)).
Taken together, these data suggest that the protein of SEQ ID NO: 200 may play a role in protein-protein interaction or be a nucleic acid binding protein.
Protein of SEQ ID NO: 192
The protein of SEQ ID NO: 192 encoded by the extended cDNA SEQ ID NO: 145 exhibits extensive homologies to stretches of proteins encoding vacuolar proton-ATPase subunits M9.2 of either human (Genbank accession number Y15286) or bovine species (Genbank accession number Y15285). These two highly conserved proteins are extremely hydrophobic membrane proteins with two membrane-spanning helices and a potential metal-binding domain conserved in mammalian protein homologues (Ludwig et al., J. Biol. Chem., 273:10939-10947 (1998)). The amino acid residues are completely identical as shown in the alignment in Figure 15. However, the protein of SEQ ID NO: 192 is missing amino acids 1 to 92 from the Genbank sequences. The protein of SEQ ID NO: 192 contains the second putative transmembrane domain as well as the potential metal-binding site.
Taken together, these data suggest that the protein of SEQ ID NO: 192 may play a role in energy conservation, secondary active transport, acidification of intracellular compartments and/or cellular pH homeostasis.
B) Proteins which are remotely related to proteins with known functions
Proteins of SEQ ID NOs: 201 and 227
The proteins of SEQ ID NOs: 201 and 227 encoded by the extended cDNA SEQ ID NOs: 154 and 180, respectively, belong to the stomatin or band 7 family. The human stomatin is an integral membrane phosphoprotein thought to be involved in regulating cation conductance by interacting with other proteins of the junctional complex of the membrane skeleton (Gallagher and Forget, J. Biol. Chem., 270:26358-26363 (1995)). The proteins of SEQ ID NOs: 201 and 227 exhibit the PROSITE signature typical of the band 7 family. See the amino acid sequence alignment in Figure 16.
Taken together, these data suggest that the proteins of SEQ ID NOs: 201 and 227 play a role in the regulation of ion transport, hence in the control of cellular volume. These proteins may then be useful in diagnosing and/or treating stomatocytosis and/or cryohydrocytosis.
Protein of SEQ ID NO: 198
The protein of SEQ ID NO: 198 encoded by the extended cDNA SEQ ID NO: 151 shows homologies with different DNA or RNA binding proteins such as the human Staf50 transcription factor (Genbank accession number X82200), the human Ro/SS-A ribonucleoprotein autoantigen (Swissprot accession number P19474) or the murine RPT1 transcription factor (Swissprot accession number P15533). The protein of SEQ ID NO: 198 exhibits a putative signal peptide and also a PROSITE signature for a RING type zinc finger domain located from positions 15 to 59. Secreted proteins may have a nucleic acid binding domain, as shown by a nematode protein thought to regulate gene expression which exhibits zinc fingers as well as a functional signal peptide (Holst and Zipfel, J. Biol. Chem., 271:16275-16733 (1996)).
Taken together, these data suggest that the protein of SEQ ID NO: 198 may play a role in protein-protein interaction in intracellular signaling and eventually may directly or indirectly bind to DNA and/or RNA, hence regulating gene expression.
Protein of SEQ ID NO: 216
The protein of SEQ ID NO: 216 found in testis encoded by the extended cDNA SEQ ID NO: 169 shows homologies to protein domains with a 4-disulfide core signature found in either an extracellular proteinase inhibitor named chelonianin (Swissprot accession number P00993) or in rabbit and human proteins specifically expressed in the epididymis (Genbank accession numbers U26725 and R13329). The matched domain in red sea turtle chelonianin is known to inhibit subtilisin, a serine protease (Kato and Tominaga, Fed. Proc., 38:832 (1979)). All cysteines of the 4-disulfide core signature thought to be crucial for biological activity are present in the protein of SEQ ID NO: 216. The 4-disulfide core signature is present except for a conservative substitution of asparagine to glutamine.
Taken together, these data suggest that the protein of SEQ ID NO: 216 may play a role in protein-protein interaction, act as a protease inhibitor and/or may also be related to male fertility.
Protein of SEQ ID NO: 197
The protein of SEQ ID NO: 197 encoded by the extended cDNA SEQ ID NO: 150 shows extensive homology to the connexin family conserved in the rodent, chicken, human, frog and sheep species.
Connexins are a family of integral membrane proteins that oligomerize into clusters of intercellular channels called gap junctions, which join cells in virtually all metazoans. These channels permit exchange of ions between neurons and between neurons and excitable cells such as myocardiocytes (for review, see Goodenough et al, Ann. Rev. Biochem., 65:475-502 (1996)).
Taken together, these data suggest that the protein of SEQ ID NO: 197 may play a role in cell growth, differentiation and developmental signaling. Moreover, the protein of SEQ ID NO: 197 may be useful in diagnosing and/or treating cancer, neurodegenerative diseases and cardiovascular disorders.
C) Proteins homologous to a domain of a protein with known function
Protein of SEQ ID NO: 183
The protein of SEQ ID NO: 183 encoded by the extended cDNA SEQ ID NO: 136 shows homology to a rabbit soluble protein called PiUS (Genbank accession number U74297) which is a stimulator of inorganic phosphate uptake and is thought to be involved in cellular phosphate metabolism and/or binding (Norbis et al., J. Memb. Biol., 156:19-24 (1997)).
Taken together, these data suggest that the protein of SEQ ID NO: 183 may play a role in phosphate metabolism.
Protein of SEQ ID NO: 223
The protein of SEQ ID NO: 223 encoded by the extended cDNA SEQ ID NO: 176 shows homology to short stretches of a human protein called Tspan-1 (Genbank accession number AF054838) which belongs to the 4 transmembrane superfamily of molecular facilitators called tetraspanins (Maecker et al., FASEB J. 11:428-442 (1997)).
Taken together, these data suggest that the protein of SEQ ID NO: 223 may play a role in cell activation and proliferation, and/or adhesion and motility and/or differentiation and cancer.
Protein of SEQ ID NO: 193
The protein of SEQ ID NO: 193 encoded by the extended cDNA SEQ ID NO: 146 shows homology to short stretches of Drosophila, C. elegans and chloroplast proteins similar to E. coli ribosomal protein L16.
Taken together, these data suggest that the protein of SEQ ID NO: 193 may be a ribosomal protein.
As discussed above, the extended cDNAs of the present invention or portions thereof can be used for various purposes. The polynucleotides can be used to express recombinant protein for analysis, characterization or therapeutic use; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR primers for genetic fingerprinting; for selecting and making oligomers for attachment to a "gene chip" or other support, including for examination for expression patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein which binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791-803 (1993)) to identify polynucleotides encoding the other protein with which binding occurs or to identify inhibitors of the binding interaction.
The proteins or polypeptides provided by the present invention can similarly be used in assays to determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state); and, of course, to isolate correlative receptors or ligands. Where the protein binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins involved in these binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the binding interaction.
Any or all of these research utilities are capable of being developed into reagent grade or kit format for commercialization as research products.
Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include without limitation Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., (1989), and Methods in Enzymology: Guide to Molecular Cloning Techniques, Academic Press, Berger, S.L. and A.R. Kimmel eds., (1987).
Polynucleotides and proteins of the present invention can also be used as nutritional sources or supplements. Such uses include without limitation use as a protein or amino acid supplement, use as a carbon source, use as a nitrogen source and use as a source of carbohydrate. In such cases the protein or polynucleotide of the invention can be added to the feed of a particular organism or can be administered as a separate solid or liquid preparation, such as in the form of powder, pills, solutions, suspensions or capsules.
In the case of microorganisms, the protein or polynucleotide of the invention can be added to the medium in or on which the microorganism is cultured.
Although this invention has been described in terms of certain preferred embodiments, other embodiments which will be apparent to those of ordinary skill in the art in view of the disclosure herein are also within the scope of this invention. Accordingly, the scope of the invention is intended to be defined only by reference to the appended claims.
Search Ch~aracteristics Selection Characteristics 1- Step Program Strand Parameters Ide~ntity M% ]Lenpth (bn' Miscellaneous blastn both S=61 X=16 90 17 'tRNA Tfasta both .80 rRNA blastn both S=108 80 mtfNA blastn both -S=108 80 Proaryotic blasta both S=144 90 Fungal blastn both S=144 90 AIu fasta* both 70 LI blastn both S=72 70 Repeats blastn b6th S=72 70 Promoters blastn top S--4 X=16 90 Vertebrate fasta* both S=108 90 ESTs blastn both S=108 X=16 90 Proteins blastx I E=0.001
I
Table 1: Parameters used for each step of EST analysis usC "Quick Pas- Database Scanner t alignnent furthcr constrained to begin closr tin IObp to EST 5' cd 0 using BLOSM162 substitution matrix r TABLE Il Mature Stop PolyA PolyA FCS SigPep Polypeptide Codon Signal Site Id Location Location Location Location Location Location 173-565 173-211 212-565 566 1063-1068 1087-1098 41 267.455 267.371 372-455 456 817.822 S42-855 42 174-662 174.266 267.662 663 1 144.1149 1165.1176 43 460-615 460-555 S56-615 616 614-619 635.648 44 79-450 79-369 370.450 451 1217-1222 1240-1251 160-849 160-231 232-849 850 1510-1515 1506.1519 46 106-321 106-201 202-321 322 577.582 59S.610 47 359-631 359-466 467.631 632 1334-1339 1357-1370 48 191-508 191-286 287-508 509 755-760 780-791 49 346-861 346-408 409-861 862 1400-1405 1420-1433 214-381 214-339 340-381 382 1133-1138 1146-1158 51 372-509 372-437 438-509 510 812-817 838-850 52 132-884 132-215 216-884 885 1069-1074 1094-1107 53 199-429 199-288 289-429 430 464.469 4S9-500 54 293.535 293-385 386-535 536 733-738 752-765 130-507 130-189 190-507 508 546-551 572.584 56 191-1009 191-325 326-1009 1010 1348-1353 1374-1387 57 141-614 141-251 252.614 615 1354-1359 1375-1385 58 212-364 212-268 269-364 365 1465-1470 1489.1497 59 147-1223 147-248 249-1223 1224 1538-1543 1558-1570 112-984 112-237 238-984 985 976-981 1010-1022 61 239439 239-316 317-439 440 586-591 603-615 62 157-537 157-345 346-537 538 771.776 791-804 63 194-484 194-253 254-484 485 768-773 780-792 64 148-405 148-207 208-405 406 789.794 820-832 156-368 156-230 231-368 369 706-711 709-721 66 272-451 272-397 398451 452 503-508 518-531 67 381-734 381-629 630-734 735 736-741 770-783 68 140-367 140-205 206-367 368 965-970 984-996 69 183-467 183-338 339467 468 620-625 644-657 140-385 140-205 206-385 386 383-388 405-416 71 129-395 129-176 177-395 396 513-518 530-543 72 285-374 285-341 342-374 375 575-580 592-605 73 136480 136-444 445-480 481 835-840 851-864 74 200-514 200427 428-514 515 1001-1006 
1022-1033 68-346 68-133 134-346 347 472-477 490-499 76 274-600 274-399 400-600 601 943-948 966-978 77 421-573 421-465 466-573 574 553.558 575-587 78 198-365 198-278 279-365 366 364-369 387-400 79 167-652 167-229 230-652 653 1133-1138 1154-1166 107 Mature Stop PoIyA PoIYA FCS SigPep Polypeptide Codon Signal Site Id Location Location Location Location Location Location 180-557 180-383 384-557 558 722-727 743-754 31 179-598 179-298 299-598 599 680-685 697-708 82 100-228 100-171 172-228 229 211-216 230-243 83 346-552 346-408 409-552 553 792-797 817-829 84t 177-4 10 177-233 234-410 411 644.649 663-674 179-4L8 179-3119 320-41S 419 461.466 465-478 86 112-270 112-237 238-270 271 910-915 940-952 TABlLV:,, 111197 108 !AD!LE -al Motif Location 160-226 683-734 231-261 Motif Zinc finger, C2H-2 type, domain Conncxjns signatures Zinc finger, C3HC4 type, signature TADLs 1113 97 TABLE T Full Length Polypeptid c Lo0cationi 1-131 1-63 1-163 1-52 1-124 1-230 1-72 1-91 1-106 1-172 1-56 1-46 1-251 1-77 1-81 1-126 1-273 1-158 1-51 1-3 59 1-29 1 1-67 1-127 1-97 1-86 1-71 1-60 1-118 1-76 1-95 1-82 1-89 1-30 1-11t5 1-105 1-93 1-109 Signal Peptide Location 1-13 1-35 1-31 1-32 1-97 1-24 1-32 1-36 1-32 1-21 1-42 1-22 1-28 1-30 1-31 1-20 1-45 1-37 1-19 1-34 1-42 1-26 1-63 1-20 1-20 1-25 1-42 1-83 1-22 1-52 1-22 1-16 1-19 1-103 1-76 1-22 1-42 Mature Po lypeptide Location 14-131 36-63 32-163 33-52 98-124 25-230 33-72 37-9 1 33- 106 22-172 43-56 23-46 29-251 31-77 32-81 21-126 46-273 38-158 20-5 1 3 5-359 43-29 1 27-67 64-127 2 1-97 2 1-86 26-71 43-60 84-118 23-76 53-95 23-82 17-89 20-30 104-115 77-105 23-93 43-109 Full Length Polypeptidc Location 1-56 1-162 1-126 1-140 1-43 1-69 1-78 1-80 1-53 Signal Peptide Location 1-15 1-27 1-21 1-68 1-24 1.21 1-19 1-47 1-42 Mature Polypcptide Location 16-SI 28-56 22-162 69- 126 4 1-140 25-43 22-69 20-78 48-80 43-53 TABLB4;ss 111391 TABLE V No-matches Est Est x x x x x x x x x x Id No-matches Est Est >30% Vrt 79
x x 81 x 82 x 83 x 84 x x 86 x

TABLE VI
SEQ ID  LOCATION  PROTEIN SIGNATURE MOTIF
214     76-78     cell attachment site
        32-53     Leucine zipper
201     289-291   Microbodies C-terminal targeting signal
        164-192   Band 7 protein family
227     239-241   Microbodies C-terminal targeting signal
        114-142   Band 7 protein family
205     179-182   Endoplasmic reticulum targeting signal
226     78-81     Microbodies C-terminal targeting signal
181     99-101    cell attachment site
200     264-278   EGF like domain
        240-282   C3HC4 zinc finger (RING finger)
196     10-32     C2H2 zinc finger
198     15-59     C3HC4 zinc finger (RING finger)
218     21-42     Leucine zipper
197     164-180   connexins

Table VII
SEQ FCS Sig Pep ID Location Location 134 135 136 137 1 38 139 140 141 14? 143 144 145 146 147 143 149 150 151 152 153 154 155 156 157 153 159 160 161 162 1631 164 165 166 167 168 169 170 171 172 173 174 17 176 177 178 179 ISO0 131/1042 100/276 111/401 359/314 26/397? 36/2s 35r.50 169/411- 143/460 108/903 209/532 5/211 98/850 46/342 139/381 72/SI? 126/94.4 50/ 1279 83/1261 57/1199 72/944 4/279 90/4170 88/3 39 33/57S 3 3/2 45 125/343 126/632 90/3 17 126/410 85/34S 7 7/343 38/364 48/389 69/440 33/3 11 110/730 38/2 14 129/296 78/563 62/523 24/320 42/170 10S/3 14 118/351 12 8/367 149/871 13 1/169 111/194 359/454 26/3 16 36/107 351130 1691267 143P.38 108/ 170 5/142 98/ 181I 46/189 139r-31 126P260 50/160 83/ 139 57/95 72/197 90/278 83/1147 33/92 33/107 126/575 90/155 126/287 85/ 150 77/12 481356 69/359 33/98 110/2-35 129/209 78/359 62/265 42/113 108/ 170 118/171 128/268 149/457 Mature Polypeptide Location 170/1042 100/1-7 6 195/401 455/514 3 17/397 108/725 13 1/250 263/4 3?
239/460 171/908 209/532 143/2 11 182/850 190/342 232/38 1 72/512 261/944 16 1/1279 140/1261 96/1199 1981944 4/279 279/4 70 148/339 93/578 103/245 125/343 576/632 156/317 288/410 15 1/348 125/343 38/364 357/389 360/440 99/311 236(730 3 8r-14 210/296 360t563 266/523 24/320 114/170 17 1/3 14 172/351 269/367 458/371 Stop Codiot PolyA Signal PolyA Site Location Location Location '277 402 515 398 726 251 433 461 909 533 212 851 343 382 945 1262 1200 945 280 471 340 579 246 344 633 318 411 344 365 390 441 312 731 215 297 564 524 321 171 315 352 368 872 63S/643 IOSO/ losi 1164/ 1169 1302/ 1307 505/510 1132/1137 697/702 114 1/1146 1133/1133 716/721 1035/1040 377/382 579/584 1233/1238 1433/1443 425/430 704/709 6 19/624 546/551 375/380 670/675 9 13/9 18 561/566 461/466 458/463 7421747 927/932 4 37/442 764/769 1042/1047 602/607 402/407 5501555 5831588 410/415 1042/ 1053 662/675 1101/1112 536/547 1187/119S 1389/1400 526/538 1155/1167 72 1n30 1161/1174 1146/1158 742n754 1060/ 1073 402/413 598/609 5 12/522 1309/1322 1280/1290 1356/ 1354 1458/1470 970/982 443/45 724n738 637/649 703nI 4 584/596 390/403 72i1/27 932/944 587/598 349/360 477/490 4751488 760/71 947/959 455/464 7 87/799 308/320 318/331 1063/ 1075 621/632 419/430 172/ 185 574/585 602/6 13 424/427 893/912 SIEQ Full Length Polypeptide ID Location 13-4 1 35 36 137 138 139 140 141 142 143 W4 145 146 147 148 149 150 151 152 153 154 155 I 56 157 [58 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
180 -13/291 1/59 -28/69 -32120 -97/27 *24P-06 -32/40 -33/55 -32/74 -21/2-46 1/103 -46/23 -2811-23 -1315 1 -31/50 1/147 -45/223 -37/373 -19/374 -13/368 -42/249 1/92 -63/64 -20/64 -20/162 -25/46 1/7 -150/19 -22154 -54/4 1 -22/66 16n73 1/109 -103/11 -97r-7 -22n71 -42/165 1/59 -27/29 -94/68 -68/86 1/99 -24/19 -211/48 18/60 -47/3 3 -103/138 1031138 Table V11I, signal Peptilc Locution .32/4I -97/-I1 1 .32/-1 -33/-I -32/-I -461-1 -37/41 -19/-4 -13/4I -42/-I -20/-I -2 0/-I .25/-I 0-1 -22/-I -103/-I -97/-1 -42/-I -27/-I -94/-I1 -68/4I -24/4I .1)1/-I1 -18/-I -47/-1 -103/-I 103/4I Mature Polypcptide Locution 1/291 1/59 1/69 1/20 1/27 11206 1140 1/55 1/74 1/2)46 1/108 1/23 1/223 1/SI 1/50 1/147 1/228 1/373 1/374 1/368 1/249 1/92 1/64 1/64 1/162 1/46 1/7 1/19 1/54 1/41 1/66 1/7 1/109 1/11, 1 /2 7 1/71 1/165 1/59 1/29 1/68 1U86 1/99 1/19 1/48 1/60 1/33 1/138 1/138 116 SEQUENCE LISTING eted Proteins <110> Censet SA <120> Extended cDNAS for Secr <160> 227 <170> Patent.pm <210> 1 <211> 47 <212> RNA <213> <220> <221> znodified-.base <222> 1 <223> m7Gppp added to 1 <300> <400> 1 ggcauccuac ucccauecaa uuccacc <210> 2 <211> <212> RN'A <213> <22 0> <300> <400> 2 gcauccuacu cccauccaau uccaccc <210> 3 <211> <212> DNA <213> <220> <300> <400> 3 atcaagaatt cgcacgagac catta <210> 4 <211> <212> DNA <213> <22 0> <300> <400> 4 taatggtccc grgcgaattc ttgat <210> <211> <212> DNA <213> <22 0> <300> <400> ccgacaagac caacgtcaag gccgc <210> 6 <211> <212> DNA <213> <220> <300> <400> 6 tcaccagcag gcagtggctt aggag <210> 7 -c2 11:1 <212> DNA <213> <22 0> <300> <400> 7 agtgattcct qctactttgg atggc cua acuccuccca ucuccac uaa cuccucccau cuccoc 117 <210> 8 <211> <212> DNA <213> <220> <300> <400> 8 gcttggtctt <210> 9 <211> <212> DNA <213> <220> <300> <400> 9 cccagaa cgg <210> <211> <212> DNA <213> <220> <300> <400> agggaggagg <210> 11 <211> <212> DNA <213> <220> <300> <400> 11 atgggaaagg <210> 12 <211> <212> DNA <213> <22 0> <300> <400> 12 agcagcaaca <210> 13 <211> <212> DNA <213> <220> <300> <400> 13 atcaagaat 
<210> 14 <211> 67 <212> DNA <213> <220> <300> <400> 14 atcgttgaga ttttt vn <210> <211> 29 <212> DNA <213> <22 0> <300> gttctggagt ttaga gagacaagcc aattt aaacagcgtg agtcc aaaagactca tatca atcaggacag Cacag cgcacgagac catta ctcgtaccag cagagtcacg agagagacta cacggtactg gtttttttt 118 <400> 15 ccagcagagt cacgagagag actacacgg <210> 16 <211> <212> DNA <213> <220> <300> <400> 16 cacgagagag actacacggt actcg <210> 17 <211> 526 <212> DNA <213> Homo Sapiens <220> <221> misc..feature <222> cornplement(26i. .376) <223> biastn <221> misc-feature <222> complement(380. .486) <223> blastn k221> misc-.feature <222> complementi110. .145) <223> biascn <221> misc..feature <222> complement(196. .229) <223> blastn <221> sig...peptide <222> 90. .140 <223> Von Heijne matrix <300> <400> 17 aatatrarac agctacaata ttccagggcc artcactgcc gagagaaaga actgactgar acgtttgag atg aag Met Lys aca gcc atc ttg gca gtg gct gtw ggt ttc Thr Ala Ile Leu Ala Val Ala Val Gly Phe 1 gaa cga gaa aaa aga agt atc agt gac agc Glu Arg Giu Lys Arg Ser Ile Ser Asp Ser cattctcat aacagcgtca gtt ctc ctc ctg atc Val Leu Leu Leu Ile gtc tct caa gac cag Val Ser Gin Asp Gin gat gaa t.ta gct tca ggr Asp Giu Leu Ala Ser Gly wtt ttt Xaa Phe gtg t.ec cct Val Phe Pro tac cca Tyr Pro tat cca ttt cgc cca ctt cca cca att Tyr Pro Phe Arg Pro Leu Pro Pro Ile cca tt cca aga ttt cca tgg ttt Pro Phe Pro Arg Phe Pro Trp Phe 45 cct gaa tct gcc cct aca act ccc Pro Giu Ser Ala Pro Thr Thr Pro aga cgt Arg Arg ttt cct att cca ata Phe Pro Ile Pro Ile gaa aag taaacaaraa ctt cct agc Leu Pro Ser Glu Lys ggaaaagtca crataaa( caaaattcct gttaata~ gtcaatac ettagtgat <210> 18 <211> 17 <212> PRT <213> Homo Sapiens <220> <221> SIGNAL <222> 1. 
.17 <223> Von Heilne m :ct ggtcacctga aattgaaact iaa raaaaacaaa tgtaattgaa :ct tctet.aataa acatgaaagc gagccacttc cttgaaraat atagcacaca gcactcca aaaaaaaaaa aa atrix score 8.2 119 seci LLLITAILAVAVG/FP <300> <400> 18 Met Lys Lys Val Leu 1 5 Gly <210> 19 <211> 822 <212> DNA *4213> Homo Sapiens ,.220> misc..feature <222> 260.-464 <223> blastn <221> misc-f.eature <222> 118,-184 <223> blastn <221> misc-feature <222> 56,..113 <223> blastn <221> misc.feature <222> 454. .485 <223> blastn -,221> misc-feature <222> 118. .545 <223> blastn <221> misc-fear-ure <222> 65. .369 <223> blastn <221> misc-feature <222>. 61,..399 <223> blastn <221> misc-feature <222> 408. .458 <223> blastn <221.> misc..feature <222> 60. .399 <223> blastn <221> misc-feature <222> 393. .432 <223> blastn <221> sig...peptide <222> 346. .408 <223> Von Heijne mz <3 00> <400> 19 actcctttta gcatagg5 ctgatgccga gttccgtc ctcaaacggc ctagtgc~ gtttgttgaa gcagttac cgtcctgtt gagtacac aagactaaca ttttgtg~ Leu Leu Ile Thr Ala Ile Leu Ala Val Ala Val 10 ~trix ;gc c :ca iag ttcggcgcca tcgcgtcct gcgc ttccgg agaatctt ca tcctgttgat ttgtaaaaca gcggccagcg tcctggccc agaaaatcag accctttccc C Cacaaaagg gaaaacctgt ctagtcggtc tggtaagtgc aggcaaagcg gasgnagac cggcaatt aatccccg acaaaagcta atcgagtaca tgcaggcatg agcaggcg cagaa atg tgg cgg ccc Met Trp Trp Phe gta att egg aca tc Val. Ile Trp Thr Ser 120 180 240 300 357 405 453 501 cag caa ggc ccc agt ttc ctt ccc tca gcc cttc Gin Gln Gly Leu Ser Phe Leu Pro Ser Ala Leu gcc gct eec ata eec eca tac ace, act gca gea aca ctc cac cat aca Ala Ala Phe Ile Phe Ser Tyr Ile Thr Ala Val Thr Leu His His Ile gac ccg gcc tea cctceat atc age gac ace ggc Asp Pro Ala Leu Pro Tyr Ile Ser Asp Thr Gly aca gta gct cca raa Thr Val Ala Pro Xaa 120 aaa tgc tta ttt *ggg gca atg cta aat att gcg gca get tta tgt caa.
Lys Cys Leu Phe Gly Ala Met Leu Asn Ile Ala Ala Val Leu Cys Gin 40 aaa tagaaatcag gaarataatt caacttaaag aakttcattt catgaccaaa Lys Ct~ttcaraa acatgtcttt acaagcatat gtctggcaat acctccgca7 tggaaaattt getaaggtggg cttttccccc tgtgtaattg ttgaaaaa agatatga gagtgacaca <210> 20 <211> 21 <212> PRT <213> Homo Sapiens <220> <221> SIGNAL <222> 1. .21 <223> Von Heijne matrix score seq SFLPSA.LVIWTSAIAF Ctcttgcat gatttarmta gCtactatgt aaaaaaaaaa gctttoeaoa ctgttgaatt gttctgact gacaaatatg cttactgago caagttgtaw <300> 2 <400> 2 Met #rrp Trp Phe Gin Gin Gly Leu Ser 1 5 Ile Trp Thr Ser Ala Phe Leu Pro Ser Ala Leu Val 10 is <210> 21 <211> 405 <212> DNA <213> Homo Sapiens <220> <221t, misc-feature <222> complernent(103. .398) <223> blastn <221> sig...peptide <222> 185. .295 <223> Von Heiine matrix <300> <400> 21 atcacct Ctccacct tstctgggcc agtcccoarc cccagcccaa gtcagccttc agcacgcgct ttoegoaca ggcattccag gacctccgma atgatgctcc agtcccttac tggc atg gtg ctg acc acc ctc ccc ttg ccc tc~ Met Val Leu Thr Thr Leu Pro Leu Pro Se2 -30 ocagtccctc tcctgacctg cagatatcc aggcctacct aagcgcttcc tggatgaggg g;o9c aac agc cct gtg rAla Asn Ser Pro Val aa t9cc acc act 99c ccc aac agc ctg age tat got agc tot gc Asn. Met Pro Thr Thr Gly Pro Asn Ser Leu Ser Tyr Ala Ser Ser Ala -15 ceg too ccc ege ctg acc gct cca aak tcc ccc cgg ott got atg atg Leu Ser Pro Cys Leu Thr Ala Pro Xaa Ser Pro Arg Leu Ala Met Met 1 5 cot gao aac taaatatcct tatccaaatc aataaarwra raatcctccc Pro Asp Asn tccaraaggg tetctaaaaa caaaaaaaaa a <210> 22 <211> 37 <212>. PRT <213> Homo Sapiens <220> <221> SIGNAL <222> 1. .37 <223> Von Heijne matrix score 5.9 seq LSYASSALSPCLT/AP <300> <400> 22 Met Val Leu Thr Thr 1 5 Met Pro Thr Thr Gly Ser Pro Cys Leu Thr Leu Pro Leu Pro Asn Ser Pro Ser 10 Leu Ser 25 Ala Asn Ser Pro Val. Asn Tyr Ala Ser Ser Ala Leu <210> 23 <211> 496 <212> DNA <213> Homo -4220> <221> misc ,<222> 149.
<223> blast <221> misc- <222> 328..
<223> blast <221> misc- <222> camp] <223> blast <221> sig r <222> 196..
<223> Von <300> <400> 23 aaaaaacc gg attagccgtg cccggagata gcacacagac Sa~piens feat ure .331 .feature .485 feature lement(182. .496) :n )eptide .240 ~eijne matrix tcccagtttt caccct gcctaggccg tttaac ggaccaaccg tcaggi agacc atg ggg att met Gly IlE :gccg cagggctggc tggggagggc agcggtttag :gggg tgacacgagc ntgcagggcc gagtccaagg iatgc gaggaatgtt tttcttcgga ctctatcgag tctg tct aca gtg aca gcc tta aca ttt Leu Ser Thr Val. Thr Ala Leu Thr Phe -10 aga aat ggc att gcc car cct gra agt Arg Asn Gly Ile Ala His Pro Ala Ser 5 gcc ara gcc ccg gac ggc tgc Ala Xaa Ala Leu Asp Gly Cys gag aag Glu Lys gcc rca Ala Pro 1 cac aga His Arg ctc gag aaa tgt agg gaa ctc gag asc asc cac tcg Leu Glu Lys Cys Arg Glu Leu Glu Xaa Xaa His Ser 20 gga tca acc cas cac cga aga aaa aca acc aga aga aat tat Gly Ser Thr Xaa His Arg Arg Lys Thr Thr Arg Arg Asn Tyr 40 tc tca gcc tgaaatgaak ccgggatcaa atggttgctg atcaragccc Ser Ser Ala atatttaaa-t tggaaaagtc aaattgasca ttaccaaaca aagcttgttt aatatgtcc aaacaaaaaa aa <210> 24 <211> <212> PRT <213> Homo Sapiens <220> <221> SIGNAL <222> 1. <223> Von Heijne matrix score seq ILSTVTALTFAXA/LD <300> <400> 24 Met Gly Ile Leu Ser Thr Va]. Thr Ala Leu Thr Phe Ala Xaa Ala 1 5 10 <210> <211> 623 <212> DNA 122 <213> Homo Sapidrxs <220> <221> sig-peptide <222> 49. .96 <223> Von Heijne matrix <300> <400> aaagacccct gcagcccggc aggagagaag gctgagcctt ctggcgec Otg gag agg Met Glu Arg ctc Leu tgc Cys g tc Val1 ca a Gln cgc Arg a tg Me e agg Arg gag Glu 100 agg Arg C C t cta Lou acg Thr agc Ser tgc tys C 1g Leu ttc Phe tgc Cys c9C Arg a ra Xaa awt acc Thr acg Thr C9g Trp a tc Ile etc Leu gaa Glu tg t Cys tgg Trp aaa Lys tcc C eg Le u Cca Pro acg Thr tcC S er agc Ser tg9 Trp tc Se r gc c Ala acc Thr 120 aac ccg Pro -5 Ctr-g Leu Ccc Pro gtc Val1 gc t Ala 60 ccc Pro tgc Cys ggg G ly cca Pro cca c eg Lou agc Ser acc Thr gte Val 45 ccc Pro a tg Met aac Asn cC tC Leu cag Gin 125 gct Ala tgo Cys tg9 Trp
CC:
Ser aga Arg geg Val1 Arg C tg Leu 110 C :-g Leu gcg Ala cag Gin agc Ser aaa Lys ccc Pro ggc Gly Cceg Leu cag Gin Leu tcc gct ggc Sc r 1 e gc Cys ccg Pro tgg Trp aac Asn 9t9 Val acc Thr gac Asp cca Pro Ala ttc ?he C eg Leu age Se r ga c Asp atc Ile Cca Pro oct Pro ctc Leu 130 57 105 153 201 249 297 345 393 441 489 rgg Leu P'ro Xaa Ser Asn Pro Leu Cys Pro Xaa Giu Thr Gin Glu Gly 135 140 145 caacaccgtg ggtgccccca cctgtgcatt gggaccacra cctcaccctc tcggaracaa taaactctca tgcccccaaa aaaaaaaaa <210> 26 <211> 16 <212> PRT <213> Homno Sapiens <220> <221> SIGNAL <222> 1. .16 <223> Von Heijne matrix score 10.1 seq LVLTLCTLPLAVA/
SA
<300> <400> 26 Met Glu Arg Leu Val Leu T1hr Leu Cys Thr Lou Pro Leu Ala Val Ala 1 5 10 <210> 27 <211> 848 <212> DN'A <213> Homo Sapiens <220> <221> sig..pepcide <222> 32. .73 123 <223> von Heijne matrix <300> <400> 27 aacttrgcct tgtgtttcc accctgaaag a at~g ttg tgg ct~g ctc ttt ttt Met Leu Trp Leu Leu Phe Phe c tg Leu qt Ala t 11t Ty r t tc Phe gc Val1 aca Th r gcc Al a gac Asp a tg M etr tgc Cys caa Gin rat Xaa 170 ccc Pro ga t Asp act gcc att cat gcc Tht- Ala Ile His Ala aa gtg ago ctt agt Lys Val Arg Leu Ser 15 tgg gat acc aat gao Trp Asp Thr Asn Glu otcg aga aaa gt ccc Met Arg Lys Val Pro ctt tgc oat gta acc Leu Cys Asn Vol Thr ct tca aaa aat cac Pro Ser Lys Asn His ago aeg aac aag aac Arg Met Asn Lys Asn 95 act ct~g gaa ttt tt~a Thr Leu Glu Phe Leu 110 cco tct gtg ccc atc Pro Ser Val Pro Ile 125 atc at~a gtt gco at Ile Ile Vol Ala Ile 140 ada ara aog aac aaa Xaa Xaa Lys Asn Lys 160 tgt gaa aac at~g at~c cys Glu Asn Met Ile 175 gac atg aag gga ggg Asp Met Lys Gly Gly 190 ogg ct~c acc cct ct~c Arg Leu Thr Pro Leu ctc tgt coo cco Leu oa Arg tac Tyr ago Arg so agg Arg ct Leu atcc Ile ate Ile at Ile 130 ct~a Leu ceo Pro att Ile at Ile Cys Gln Pro Gly oca get Ct~g gga Thr Ala Leu Gly 20 etc ttc aaa gcg Leu Phe Lys Ala 35 gao gco oca gao Glu Ala Thr Glu gta tca ttc tgg Vol Ser Phe Trp cct gct gtt gag Pro Ala Val Glu 85 aac oat gcc ttc Asn Asn Ala Phe 100 ect tce aca. ctt Pro Ser 'rhr Leu 115 att oto ttt ggt Ile Ile Phe Gly ct~g Ott tta tca Leu Ile Leu Ser 150 tct gao gtg got Ser Glu Vol Asp 165 gao aat ggc atc Glu Asn Gly Ile 180 aat got gcc ttc Asn Asp Ala Phe geo gao at Ala Clu Asn got aaa gca Asp Lys Ala ot~g gtao gct Met Val Ala act tee cat Ile Ser His Phe Vol Vol gtg coo tco Val. 
GIn Ser tt cta oat Phe Leu Asn 105 gca cca ccc Ala Pro Pro 120 gt~g ata ttt Val Ile Phe 135 ggg atc tgg Gly Ile Trp goc *gct gao Asp Ala Glu ccc tct got Pro Ser Asp 185 atg oca gag Met Thr Glu 100 148 196 244 292 340 388 436 484 532 580 628 676 tgaagggetg ttgttctgct tcectcoraa 205 ottaoaacatt tgtttctgtg tgactgctga gcotcctgaa atoccoagog eagatcatot wttttgttte accattctte ttttgtoata aattttgoat gtgcttegaaa aaaaaaao <210> 28 <211> 14 <212> PRT <213> Homno Sapiens <220> <221> SIGNAL <222> 1. .14 <223> Von Heijne matrix score 10.7 seq LWLLFFLVTAIHA/EL <300> <400> 28 124 Thr Ala Ile His Ala Met Leu Trp Leu Leu Phe Phe Leu Val 1 5 <210> 29 <211> <212> DNA <213> <220> <300> <400> 29 gggaagatgg <210> <211> 26 212>, DNA ,213> 20 <300> <400> ctgccatgta <210> 31 agatgtct gcctg catgatagag agattc <211> <212> <213> <220> <221> <222> <22 1> <222> <22 1> <222> <223> <221> <222> <223> <221> <222> <223> <221>- <222> <223> <221> <222> <223> <221> <222> <223> 546 bNA Homno Sapiens Promote r 1. .517 transcriptior 518 Pro tein-.bind 17. Matinapector name CMYB_01 score 0.983 Istart site prediction sequience tgtcagttg Protein-.bind cornplement(18. .27) matinspector prediction name MYOD..Q6 score 0.961 sequence cccaaccgac protein-bind complement(75. matinspector prediction name S801 score 0.960 sequence aatagaattag pro t ein-.bind 94-. 104 matinspector prediction name S8.01 score 0.966 sequence aactaaattag protein-bind complement(129. .139) matinspector prediction name DELTAEFI-01 score 0.960 sequence 9cacacctcag proteinbind complement(ISS. 
.165) matinspector prediction name GATAC score 0.964 sequence agataaatcca 125 <221> protein-bfnd <222> 170..178 <223> matinspector prediction name CMYB.01 score 0.958 sequence Cttcagttg <221> proein-bind <222> 176..189 <223> matinspeetor prediction name GATAL-02 score 0.959 sequence ttgtagataggaca <221> protein-bind <222> 180..190 <223> matinspector prediction name GATAC score 0,95-3 sequence agataggacat <221> protein-bind <222> 284..299 <223>'matinspector prediction name TAL1ALPAE47-01 score 0.973 sequence cataacagatggtaag <221> protein-bind <222> 284..299 <223> matlnspector prediction name TAL1BETAE47O01 score 0.983 sequence cataacagatggtaag <221> proteinbind <222> 284..299 <223> matinspeccor prediction name TALIBETAITF2_01 score 0.978 sequence cataacagatggtaag <221) protein~bind <222> complement(287..296) <223> matinspector prediction name MYODQ6 score 0.954 sequence accatctgtt <221> Protein-bind <222> complement(302,.314) <223> matinspector prediction name GATA1-04 score 0.953 sequence tcaagataaagta <221> protein-bind <222> 393..405 <223> matinspector prediction name IK1-01 score 0.963 sequence agttgggaattcc <221> protein-bind <222> 393..404 <223> matinspector prediction name 1K2.01 score 0.985 sequence agttgggaattc <221> protein.bind <222> 396..405 <223> matinspector prediction 126 name CREIC'Ol score 0.962 sequence tgggaattcc <221> protein-bind <222> 423. .436 <223> matinspector prediction name GATAl-.02 score 0,950 sequence ccagtgatatggca <221.> procein-bind <222> complement(478. .489) <223> zatinspector prediction name SRY-..02 score 0.951 sequence taaaacaaaaca <221> protein-.bind <222> 486. .493 <223> mrainspector prediction name E2F-02 score 0'.957 sequence tttagcgc <221> protein-.bind <222> complement(514. 
.5231 <223> matjnspector prediction name MZF1..01 score 0.975 sequence tgagggga <300> <400> 31 tgagtgcagt ecr.tgatttg 9ttattgact gataggacat at~caggagaa atactttatc gaatcgagga catcagtgat ttgtttag cttcat <210> 32 <211> 23 <212> DNA <213> <220> <300> <400> 32 gtcacatgtc Cctgctaat gaggtgtgct tgatagatac aaaaatgaca tt~gagtagga gccagctcag atggcaaatg cgctgctggg agttgggtca aaccccac ataagtacca tctggaaaac gagccttcct tcagaagcag tgggactaag gcatcgcctt agtttgttaa tggaac taaa ta tgtggae t ggacaaaagc ctatagggaa gtggcaacgt ggagt tggga ggtagtgatc gggtccccc tgtcat tcaa tagtttgat tatctattc agggagatct aggcataaca ggagaaggga attccgttca agagggt taa aaacagattc atcttctatg ggtcta.ta ttcagttgta ~tttccaaa gatggtaagg agaggtcgta tgtgatttag aartgtgtgt ccatgaatct gtaccaggga ctgtgaccat tgc <210> 33 <211.> 24 <212> DNA <213> <220> <300> <400> 33 ctgtgaccat tgctcccaag agag <210> 34 <211> 861 <212> DNA <213> Homo Sapiens <220> <221> promoter <222> 1. .806 <221> transcription start site 127 <222> <221> <222> <223> 807 protein-bind complement(60..70) matinspector prediction name NFYQ6 score 0,956 sequence ggaccaacat protein-bind 70.,77 <22.> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> -223> <221> <222> <223> <221> <222> <223> <221> <222> c223> <221> <222> <223> <221> <222> <223> matinspecor prediction name MZF1-O1 score 0.962 sequence cctgggga Protein-bind 124.-.32 matinspector prediction name CMYB01 score 0.994 sequence cgaccgtcg proein.bind complemenc(126..134) matinspector prediction name VMYB_02 score 0.985 sequence tccaacggt protein-bind 135..143 matinspector prediction name STATO1 score 0.968 sequence ttcctggaa proteinbind complement(135.143) natinspector prediction name STAT-01 score 0.951 sequence ttccaggaa protein-bind complement(252..259) matinspector prediction name MZF1O3.
score 0.956 sequence ttggggga protein-bind 357..368 matinspector prediction name IK2-01 score 0.965 sequence gaatgggatttc Proteinbind 384..391 matinspector prediction name MZF,_01 score 0.986 sequence agagggga protein-bind.
complement(410..421) matinspector prediction name SRY-02 score 0.955 sequence gaaaacaaaaca <221> proteinbind <222> 592..599 128 <223> matinspector name MZF1_01 score 0.960 predict ion <221> <222> 2 2 3 <221> 222 <223> <22 1> <222> <223> -c22 1> <222> <223> sequence gaagggga procein.bind 618. .627 matinspeccor prediction name MYO...Q6 acore 0.981 sequence agcatcgcc protein-.bind 632. .642 catinspector prediction name DELTAEFI-0.1 score 0.958 sequence tcccccctcc protein..bind complement(813. .823) matinspector prediction name S801 score 0.992 sequence gaggcaattac procein_bind complemenc(824. .831) macinspector prediction name MZFI-01 score 0.986 sequence agagggga <300> <400> 34 CdCtacaggg tgattggcc cggcgaccgt ctcagagggc ggagcatgcc aayccagggc gggacc ccag ccaaatcaag agccggaact caagcagcg c tgggtctcg tgcctgagct c cggaaccca tcc gatggt tct'cttggga <210> <211> 2D <212> DNA <213> .<220> <300> <400> c cggga cgga <210> 36 <211> <212> DNA <213> <220> <300> <400> 36 gagaccacac <210> 37 <211> 555 cacgcgtggc ctggggaagg tggatcccg taggcacgag ttcccccaac ccaascagaa gttagncagg gtaacttgct ccctaccact gagaacatgg cccaaagagc gtttggacaa acacccaggc Ccccaggcc gcaatggtca cgacggccgg tcggctggc gaagcagtag ggaaggtcag cccggcttsc scacaggccc gtgagagggg cccttctgct t tcaggagag c cggcagagg atctgcccat aaa tccaaac ttacaggcca tgggcacaaa
C
gctgtctgg Cccagcacag ctgtctgcc aggagaaggs yctcggymam aktcntggct aggccctg acgggccc cg tggtctcagg ctcagctgt ttcccaccct cccacccggc ccctgagcca atataattgc agcagagggc cgaggcattt tggatctggc aggsa rggcc agggcgkc cy smaagcacaa gcc cagc cc gtcc cggc cc cccgtggggc g c9cgggcc ccctccccc cactctggcc ggggcctctg ctctccccc acgccagcaa aggteacc c agggacaggg cagcgagarg cgggmacccr cagcctgaac gccccgcccc gtccccaccc cgtctgtc tgaaggggag accagaagcc tggcttcagc gaaatcct ccccatttcc aggcacggca agctagacaa 129 <212> DNA <213> Homo Sapiens <220> <221> promoter <222> 1..500 <221> transcription start site <222> 501 <221> proteinbind <222> 191..206 <223> matinspector prediction name ARNT_01 score 0.964 sequence ggactcacgtgctgct <221> proteinbind <222> 193..204 <223> matinspector prediction name NMYC_01 score 0.965 sequence actcacgtgctg <221> protein_bind <222> i93..204 <223> matinspector prediction name USF_01 score 0.985 sequence actcacgtgctg <221> protein_bind <222> complement(193..204) <223> matinspector prediction name USF01 score 0.985 sequence cagcacgtgagt <221> protein bind <222> complement(193..204) <223> matinspector prediction name NMYC_01 score 0.956 sequence cagcacgtgagt <221> protein_bind <222> complement(193..204) <223> matinspector prediction name MYCMAX02 score 0.972 sequence cagcacgtgagt <221> proteinbind <222> 195..202 <223> matinspector prediction name USF_C score 0.997 sequence tcacgtgc <221> proteinbind <222> complement(195..202) <223> matinspector prediction name USFC score 0.991 sequence gcacgtga <221> protein_bind <222> complement(210..217) <223> matinspector prediction name MZF1_01 score 0.968 sequence catgggga <221> protein_bind <222> 397..410 130 <223> matinspector prediction name ELK1-.02 score 0.963 sequence ctc~ccggaagcct <221> protein bind <222> 400. .409 4223> <221> <222> <223> <221> <222> <223> <221> <222> <223> matinspector prediction name CETS1P54_.01 score 0.974 sequence tccggaagcc proteinbind complement(460. 
.470) minspector prediction name API..Q4 score 0.963 sequence agtgactgaac protein~bind complement(460. .470) matinspector prediccion name APIFJ_.Q2 score 0.961 sequence agtgactgaac protein-bind 547. .555 matinspector prediction name PADS...C score 1.000 sequence tgtggtctc <300> <400> 37 ctatagggca aggacagcat kawaagctca aggaactgac gagcagtcag cat tcctgtc gttgccctgc cgtgccttct ttttgcctcc tagctgtgtg <210> 38 <211> 19 <212> DNA <213> <220> <300> <400> 38 cgcktggtcg ttgtkacatc gcaccggtgc ggac tcacg t acagtgcctg tgcattagta ccatggtccc gcctgctccc tcaatttctc gtctc acggcccggg tggtCtaCtg ccatcacagg gctgctccgC ggatagagtg ac tcccaacc actgcagacc gc tcacatcc ttgtcttagc ctggtctggt caccttcccc gccggcagca ccccatgagc agagttcagc tagatgtgaa caggcactct cacactcgtg cccatcctct ctgtkgtgga ctgccgtgca cacacatccc tcagtggacc: cagtaaatcc aacttagttc ccggaagcct ttcagtcact gttcccctgg gtcggg ttga cttggccttt attactcaga tgtctatgta aagtgattgt tttctcatag ggaaatcacc gagttacaga ccagtttgtc ggccatacac ttgagtgac <210> 39 <211> 19 <212> DNA <213> <220> <300> <400> 39 atatagacaa acgcacacc <210> <211t. 1098 <212> DNA <213> Homo sapiens <220> <221> sig...peptide <222> 173. .211 131 <223> Von Heijnd matrix score 4.19999980926514 seq MLAVSLTVPLLCA/MM <221> polyA.signal <222> 1063. .1068 <221> polyA.site <222> 1087. .1098 <221> misc...feature <222> 144. .467 <223> homology id :AA057573 eat <221> misc-feature <222> 510,..640 <223> homology id :AA057573 est <221> misc..feature <222> 436,..523 <223> homology id :AA057573 es t <221> misc-feature <222> 708. .786 <223> homology id :AA057573 est <221> misc..feature -<222> 635. .682 <223> homology id :AA057573 eat <221> misc-feature <222> 625. .1084 <223> homology id :NS7409 esc <221> misc-.feature <222> 779.,.1084 <223> homology id :R71351 est <221> misc-feature <222> 144. .506 <223> homology id :K12619 est <221> misc-feature <222> 90. .467 <223> homology id :T03538 est <221> misc-.feature <222> 314. .523 <223> homology id ;T34150 eat <221> misc..feature <222> 567. 
.687 <223> homology id :T34150 es t <221> misc..feature 132 <222> 686. .730 <223> homology id :T34150 eat.
<221> misc-feature <222> 510. .553 <223> homology id :T34150 est <221> <222> <223> <221> <2 22> <223> misc-.feature 550. .579 homology id :T34150 est misc..feature 144. .523 homology id :N32314 eat <221>-misc..feature <222> 510. .553 <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> homology id :N32314 est misc-feature 352. .523 homology id :T77966 eat misc-.feature 218. .351.
homology id :T77966 est misc-feature 510. .553 homology id :T77966 est misc-.feature 550. .917 homology id :AA464128 est <300> <400> agtgaggtgg tttctgcggg tgaggctggc gcccgtacca cgacagcgcc ggccccrgcg gcccgcaagt cgtcacagac ggctaaggac ggcagctcct ttagcggcag agttttccga tgagcgaggc ggacgggctg gatgatggcc aggccccgga gtgaccttct tg atg ctg Met Leu atg atg ctg ctg gaa Met Met Leu Leu Glu.
1 gct gtt Ala Val tct ctc acc gtt Ser Leu Thr Val Ctg ctt gga gcc Leu Leu Gly Ala tct cct ata gat cca cag cct ctc agc ttc aaa gaa ccc ccg ctc ttg Ser Pro Ile Asp Pro Gln Pro Leu Ser Phe Lys Glu Pro Pro Leu Leu 15 ctt ggt gtt ctg cat cca aat acg aag ctg cga cag gca gaa agg ctg Leu Gly Val Leu His Pro Asn Thr Lys Leu Arg Gin Ala Glu Arg Leu 30 ttt gaa aat Phe Clu Asn caa ctt gtt gga Gin Leu Val Gly gag tcc ata gca Glu Ser Ile Ala att ggg gat Ile Gly Asp gtg atg ttt act ggg aca gca gat 99c cgg gtc gta aaa ctt gaa aat 133 Val Met Phe Thr'Gly Thr Ala 'Asp Gly 60 Arg Val Val Lys Leu Giu Asn o~a nr~ n.~v, 65 at a d lv Cgt cg ggc cct tgc aaa acc Gly Glu Ile Glu Thr Ile Ala Arg Phe Gly Set Gly Pro Cys Lys Thr 75 80 cga ggt gat gag ccC gtg cge ggg aga ccc ctg.ggt atc cgt ggc agg Arg Gly Asp Glu Pro Val Cys Gly Arg Pro Leu Gly Ile Arg Gly Arg 95 100 gcc caa egg gac tc Ctt t~t ggc cga cgc ace caa egg gac tat Ctg Ala Cln Trp Asp Ser Leu Cys Gly Arg Cys Ile Gin Arg Asp Tyr Leu 105 110 115 aag taaatccceg gaaacgcgaa gcgaaactgc tgcegtcctc cgoacaccc 466 514 562 615 675 735 795 855 915 975 1035 1095 1098 atcgegggge atccta cc cca gagggcacag ctatggacc ccggcggcag ggcggggccg agctctgggg ctggatccc egdoaegtc ccgattc ag acgacgggcg agccgcggtt aaacaaccat atctgtttgt ggtaccgggt catccgagag ccc egtgaa cagcaaa egg cccgccggag cccgaatgga ggccaggaea ggagaeca cg gggcacgccg accctggace gatceacag caaagacgag teegacaceg geccageege cgaagagtce cc eggattc accatccgcc aaaaggatga ecactcagga actacctgct tgaccaggga cccctgcaga acgcctcegg cagacaaca c Ctaaccccgg tetcc aangg tgggaggaag eceggtgaeg agcaaaagc C agacttegec cccga egaag ccggcccagc gccttccacg taaaaaaaaa aaa <210> <211> <212> <213> <220> <221> <222> <223> <221> <22 2> <221>- <22 2> <221> <222> <223> 41 855
DNA
Homo sapiens Sig-.Pepcide 267. .371 Von Heijne matrix score 5.90000009536743 seq-LCGLLHLWL!KVFS/LK polyA.signal 817. .822 PO2.yA_.Sie 842. .855 misc-feacure 608. .811 homology id :M85769 es <300> <400> 41 acaaccag Ccagtcag cetcccca ac cgaac a gtagcage cat aga Tyr Arg ctg ta Leu Leu
C
tC cgccaatacc tc at aceagtcaac a~ ca tgacag~gac a~ gg aacagcacaa cC gg cccagcacac t~ aat gec agg Ccc Asn Val Arg Ser cat tea egg ctc His Leu Trp Leu :agaaacaa tcaaatcat Itgatgaac :tgggaccc tcggt atg Met atacctcgga caaatcttcc tctaaagacc gtagatggcg gcaecctagg cctcggacac tcccctgcag aaaattatat aggagtataa agacatgcag tacctcacg caaagtaaaa ttg act get aat gat gea cgt ttc Leu Thr Val Asn Asp Val Arg Phe eec cat eec cca te gt Asn His Phe Pro Phe Val -20 cga cee ege gge Arg Leu Cys Gly aaa tc egg Lys Ser Trp aag cat Lys Tyr aaa gec ccc tc ccc aaa cag eta aaa aaa Lys Val Phe Ser Leu Lys Gin Leu Lys Lys 1 tta tC gaa ccc Cgc tgc tat agg age teg Leu Phe Clu Ser Cys Cys Tyr Arg Ser Leu is taaacatacc tgcacacaaa gacggtctat tat geg :ge gec etc at Tyr Val Cys Val ?he Ile ttctatttaa teegegacat tegtecceg-gatatagtcc gegaaccaca agattatca tatttttcaa taatatgaga agaaaatggg 605 tatttctcta gtttttacct agtttgcttt cataacctta tatgttgaca caataattca cagagaagaa catttaaagg gttaatattt ttattgtggc ttctatttga aatgtgtcta aaaaaaaaa <210> 42 4211> 1176 <212> DNA ;213;, Homo sapiens <220> <221> sig-peptide <222> 174..266 <22-3> Von eilne marrix 134 ccgtaaa~tg ttaaccattt tatgttcaga aacatagaga ccagcaagtg aatatatatg 665 gaataatetg ttaaagataa actaattttt 725 ttgaaacgt ttcagataat atctatttga 785 aaataaaatg ctgtttattt aaaatgaaaa 845 <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> C22 1!" <222> <223> <221> <222> <223> <221> <222> <223> <221> <222.
<223> <221> <222> <223> <221> <222 <223> score seq WSPLSTRSGGTHA/CS polyAsignal 1144..1149 polyAsite 1165..1176 misc-feature 886..1134 homology id :AA595193 est misc-feature 756..894 homology id :AA595193 est misejfeature 655..755 homology id :AA595193 est misc-feature 167..367 homology id ;W81213 est miscfeature 66..172 homology id :W81213 est misc-feature 429. .508 homology id :W81213 est misc-feature 756..894 homology id :AA150887 est misc.feature 536. .643 homo logy id :AA150887 est <221> misc-feature <222> 655..755 135 <223> homology id :AA150887 es t <221> misc-.Eeature <222> 429. .643 <223> homology id :AA493644 es t <221> miscfeature <222> 655. .755 <223> homology id ;AA493644 est <2,21> misc-.feature <222>. 429. .643 <223> homology id :AA493494 est <221> misc-~feature <222> 655. .755 <223>. homology id :AA493494 est <221> misc-feature <222> 500.. 643 <223> homology id :AA179182 es t <221> misc..feature <222> 655. .755 <223> homology id ;AA1792.82 est <221> misc-feature <222> 756. .847 <223> homology id :AA179182 est <221> misc-.feature <222> 3. .338 <223> homology id :HUM524FO5B est <221> misc-feature <222> 334. .374 <223> homology id :HUM524F056 est <221> misc..feacure <222> 886. .1134 <223> homology id :AA398156 es t <221> misc-feature <222> 756. .894 <223> homology id :AA398156 es t <3 00> <400> 42 aaaaacaata ggacggaaac gccgaggaac ccggctgagg cggcagagca tcctggccag aacaagccaa ggagccaaga cgagagggac acacggacaa acaacagaca gaagacgtac 120 tggccgctgg accccgctgc ctcccccatc tccccgccat ctgcgcccgg agg atg 176 136 Cca gcc ttc agg Pro Ala Phe Arg atg gat gtg Met Asp Val met gag ccc Glu Pro cgc gcc aaa ggc tcc Arg Ala Lys Gly Ser ttc tgg agc cct Phe Trp Ser Pro tcc acc agg tcg ggg ggc act cat gcg tgc tC Ser Thr Arg Ser GJly Gly Thr His Ala Cys Ser gct tca atg aga caa ccc tgg gca agc ccc tgg tcc Ala Ser Met Arg Gin Pro Trp Ala Ser Pro Trp Ser
I.
ggg aac at~c Gly Asn Ile agr tcr Ser Ser acd ag Thr Lys acg aga ccc tcc Thr Arg Pro Ser ct~g ava rgc 9ca aar tct crc ccc agr Leu Arg Cys Ala Asn Ser Leu Pro Ser gac aaa gcc Asp Lys Ala ggc ccc rrg tta Gly Pro Leu Leu ggc cat ccc rgc Gly His Pro Cys arc rrr tcc ccr Ile Phe Ser Pro cot tt~c ccc tgt Pro Phe Pro Cys cac agg gaa gtg His Arg Glu Val.
gaa cac ccc Glu Tyr Pro gaa gtg rca Glu Val Ser ccg gcr cct ct~g Pro Ala Pro Leu cc& gag ctg ggg Pro Clu Leu Gly gcc acc tca Ala Thr Ser agt cga gga Ser Arg Gly rcr crc tct gag cac gsa ttc ccc tgc agc Ser Leu Se: Glu His Xaa Phe Pro Cys Ser 272 320 368 416 464 512 560 608 656 712 772 832 992 952 1012 1072 1132 1176 ct~g agc Leu Ser 100 aga ttg agt gar Arg Leu Ser Asp ggg gca gan adg Gly Ala Xaa Xaa cag cca gtc gtr Gin Pro Val Val gcg ctc gkc Ala Leu Xaa cca ccc tgacagcccc atcctcaaag acrgtctraa ttactca Pro Pro agacccaagg ggaaaag statctgccr wgtgrtc cararrrrat aagaaca ttcgmtagg gtgtcrrt ttacccagtc ttcccyrt crgrarrcag gstyrgci ttaaaaattrc ccgcaaa acaraaaaar taaraaa <210> 43 <211> 648 <212> DNA <213> Homo sapiens <220> <221> sig..pepride <222> 460. .555 <223> Von Heijne m gag ang aaa ggr Glu Xaa Lys Gly gct gaa. acg ccc Ala Glu Thr Pro 130 tgg caggrttctag 'rgcrcc ccmaccagsr caaaga ccaaraaaaa 'ggcacw gtgggaagcc 'cyttat rrrgctrrgc 'caaacc tgrgttcccc *gtrrct grrtagggca.
taggar tagaagaarr rtg aaa gtg :ta catr crrrcaaggc ttgytatrr ggcyrgggrg ctg-rrggyt yttggatgst aaagcaagcc taaagagcaa tttcaatgat caccacatgt gtgasgrgag cctacccgkg rgrrrrgrrr ryr taacccr atgaggctgt tgttttcagt ggaaaaaaaa c rgg acag t9g9 gcc c cagg tgga yrttt aaaa at:r~.x score seq F <221> polyA <222> 614. <221>- polyA <222> 635..
<300> <400> 43 aarrcrggcc tcc rragagr taagacrcat ccctttrrcta tggcc ttgga caaargccag 4
SFMLLCMGGCLP/CF
-signal 619 sire 648 cagctrcttc tctccccrcca got acaagaa crgagaggaa arraaaccac ttacggtgat cccagcrcra tragcagttrg gttaaataag gtggaatgca caccaacaca gcgrttcaaca tccrgcrrcc tcttagggtc tttcccgaag c tccgacaag ctttrggatt tCcttarttC ctccatctcc rgrrrctggg tcacacagc t gataaggttt atcagaaggt cagrctttar tataggarrc gagccctgcc agccrcrcat tattgrgagc ggaaggagrg gacgcctttc 137 ctgaatcaca ggtgdattgg ggtgcr.tcct cctccccagg actcccaccc aactttata 420 acaCaaccca cttagaggag ttatctcagc aca~tatga atg tcg ggg acc acg Met Lou Gly Thr Thr ggc ctc ggg aca cag ggt cct tcc cag cag gct ctg ggc tt ttc tcc Gly Leu Gly Thr Gin Gly Pro Ser Gin Gin Ala Leu Gly Phe Phe Ser -20 ttt atg tta ctt gga atg ggc ggg tgc ctg cct gga ttc ctg cca cag Phe Met Leu Leu Gly met Gly dly Cys Leu Pro Gly Phe Leu Lou Gin cCc CCC 8aC cga tc cct act ttg CCt gca tCC 8CC ttC gc cat8 Pro Pro Asn Arg Ser Pro Thr.Leu Pro Ala Sor Thr Phe Ala His 15 tagtcaac tccccaccca taaaaaaaaa aaa, <210> 44 <211> 1251 <212> DNA <213> Homo sapiens <220> <221> sig-.peptide <222> 79. .369 <223> Von Heijne matrix <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <22 1> <222> <223> <221> <22 2> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> score 4 seq R-LPLWVSFIASSS/AN po lyk.s igna 1 1217,..1222 1240. .1251 misc-jeature 2. .423 homology id :AA056667 est misc-.feature 463. .520 homology id :AA056667 est misc-feature 418. .467 homology id :AA056667 est misc-feature 159. .636 homology id :AA044187 es t misc_Jeature 629. .684 homology id :AA044187 es t misc-feature 5. .453 homology id :AA1319S9 est mi sc..feature 446. .494 homology id :AA131958 138 est <221> misc-feature <222> 14. .343 <223> homology id :w95957 es t <221> misc-feature <222> 323. .467 <223> homology id :W95957 es t <221> misc..feature <222> 463. .494 <223> homology id :W95957 est <221> misc..feature <222> 14. .475 <223> homology -id :W95790 eat <221> misc-feature <222> 410. .876 <223> homology id :AA461134 est <221> misc..feature <222> 974. 
.1195 <223> homology id ;AA595195 est <221>. misc..feature <222> 769.,.982 <223> homology id :AA595195 est <221> misc-feature <222> 1208. .1237 <223> homology id :AA595195 est <221>. misc-feature <222> 223. .522 <223> homology id :AA041216 eat <221> misc-feature <222> 518. .636 <223> homology id ;AA041216 est <221>. misc-feature <222> 774. .1127 <223> homology id :N94607 est <221> mjsc-.feature <222> 690. .765 <223> homology id :N9460'7 est <221> misc.feature <222> 833. .1195 139 <223> homology id :AA076410 <300> <400> 44 aaagtgacag cggagagaac caggsagccc agaaacccca ggcgtggaga tgatcctgc gagagaaggg ggttccc acg gcg gat gac Met Ala Asp Asp -95 cta aag cga Leu LYS Arg ttc ttg tat aaa Phe Leu Tyr Lys gtg tCa gat aga Val Ser Asp Arg aag tta cca Lys Leu Pro gat gga gta Asp Gly Val agt Ctt gaa Ser Val Glu ctc cat gcc att Leu His Alit Ile cct gtt Pro Vdi1 add gtg gcd aat LYS Val Ala Asn adt get cca gag cat Asn Ala Pro Giu His ttg cga cct Leu Arg Pro tta tee act ttC gcc ctt gca aca Leu Ser Thr Phe Ala Leu Ala Thr gac cao Asp Gln gga ago aaa Gly Ser Lys aac acc tac Asn Thr Tyr tr.c ata gcc Phe Ile Ala Ctt tcc add Leu Ser Lys aaa agt arc arc Lys Ser Ile Ile cag grg gct caa ttt aat cgt tta cct ctg Gin Val Val Gin Phe Asn Arg Leu Pro Leu -15 agc agc agt gcc aar aca gga cta arc gtc Ser Ser Ser Ala Asn Thr Gly Leu Ile Val tgt tac tat Cys Tyr Tyr 9t9 gtg agt Val Val Ser agc cta gaa Ser Leu Glu gtg gaa att Val Glu Ile aag gag ctt gct cca ttg ttt gad gad Ctg aga cad gtt Lys Glu Leu Ala Pro Leu Phe Glu Glu Leu Arg Gin Vol ret taatetgaca gtg~ Ser atatcaatcc agcaatci cccccttttc caacttai tgcratatt ttctggt atggrrcagt crcaca ttctattcag tggatta ccaatrgtac aatatgc raaggacata tttttctl arggcrrggt aaaagca ttrgcagtata gatgaat atgaaaatga aaattatd grgttttadt gcttgtgi atgtagtar gtatgta.
aaatagtatt tttaaaa <210> 45 <211> 1524 <212> DNA <213> Homo sapiens <220> <221> Sig..peptide <222> 160. .231 grtetag tgtgcacctt atcttcatta taacaacaca ttt gta gaa :ca :ca ata att age aac gta agactocoar taaagaacta aggtcrrtct tcccacggag tcaaac tggt gg t tgcaga gatatgrrr aaatcagtac actaatcagt taatgttttt tcrrcataaa atgatagtac aaaaaaaaao aatgctttta gcoattgat tar ttagtga ttagtctggr acattgatcc ataaagccaa rat rtctttg aatcacac tgattatte tee tcaaact atttaaatac agccattttt a tccatgtgct gtaatttara gatctaggga caccagatat acttgagccg ctttttattg cat tgag tga tttCctttgt tcagagggtg ctgctttctg aattcgttat ttcatatgtg caagaaaggg gdtdgatcag taccacagoo ggatgagaga ttaagtgctg tgaataaraa ggtacataaa acarorrort Ctgctcttta taaccaatca tctgtttcca agtaaaaata 560 620 680 740 800 260 920 980 1040 1100 1160 1220 1251 <223> Von Heiine matrix score 5.6999998092651.4 seq ILGLLGL.LGTLVA/ML <221> PolYA-signal <222> 1510. .1515 <221> polyA-site <222> 1506,.1i519 <221> misc-feature <222> 1048. .1504 <223> homology id :AA552647 est <221> misc-feature <222> 597. .846 <223> homology id :AA345449 est <222.> misc-feature <222> 39. .93 <223> homology id :AA345449 est <221>~ misc..feature <222> 113- 149 <223> homology id :AA345449 eat <221> misc-feature <222> 98. .400 <223> homology -id :T86266 est <221> misc-feature <222> 1210. .1489 <223> homology id :T86158 est <221> misc-feature <222> 954.3.83 <223> homology id :AA116709 es t <300> <400> agctgcttgt ggccacccac actctgaaat gaasgattag ggagcagtcc ctgaagacgc 140 agacacttgt aaggaggaga gaagtcagcc tggcagagag aggtgttcaa ggragcaaag agcttcagcc tgaagacaag ttctactgag aggtotgcc atg gcc tct ctt ggc Met Ala Ser Leu Gly ctt ttg ggc aca Ile Pro Val. 
Ala'Trp Asn Leu His Gly 115 agc atg aaa ttt Ser Met Lys Phe ctg gtg cct Leu Val Pro 141 Ile Leu Arg 120 ;ag att gga gag 1lu Ile Gly Glu L35 .tg ea gct gga eu Ile Ala Gly ttg ggc att Leu Gly Ile tgC ttt Ccc Cys Phe Ser 160 tct tCC ctg ttc Ser Ser Leu Phe Asp Phe Tyr Ser 125 gct ctt tac Ala Leu Tyr 140 atc atc ctc Ile Ile Leu 155 tac gat gc Tyr Asp Ala tgC tea tcc cag Cys Ser 5cr Gin aat cgC tcc aac tac Aan Arg Ser Asn Tyr tae cd Tyr Gin 175 cee ccc Pro Pro gcc coo cet ctt Ala Gin Pro Leu acea gg age tct ccA egg cct ggc caa Thr Arg Ser 3cr Pro Arg Pro Gly Gin aaa gte aag Lys Val Lys gag CCte at tcc Glu Phe Asn Ser agc ctg aca ggg Ser Leu Thr Gly geg tgaagaacca ggg ValI tggacagcac cccgagg ctgctgaggg tagactg gtaacagcat gcaggtt caccttgctg ctcccct gccaggamtc agaggat ctaatcacat cccactg gstyctagct cattgct aacctamtty ccaagct yttgttatga ccaca acggcatyca gggaaca catttaaaaa aataaaa <210> 46 <211> 610 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222- 106. .201 gccagag ctggggggtg gctgggtctg tgaaaaacag gc act gaa gc c ccc act ggg tcc gtg gaa aa a acaggtgagg ttggccatcg t tgccaagga ctaagtcccc tYtgccctck gaccctctgt gatgggaagg c tccaaagaa tccagamtaa ageagga tgc aaaaa gacactacca gattgagcaa tgctcgccat aacccccaac gg~ctamctg ga tcaaagac agaagcagtg amtgattggc tttgtgcatg aggatgggag ctggatcgtg aggcagaaat gccagcctt t tgaaaccce ggactccatc CCtCCCtctg gctctystgg cctggaacct aactgaaata gacaggaagg tcagaaggtg gggggc tag t ctgttttcct attcccttaa cccaaaccca ge tgaggttg gcattgctyt ccatcccact aaacca tcc t cagcctggga 959 1019 1079 1139 1199 1259 1319 1379 1439 1499 1524 <223> Von Heijne matrix score 8.80000019073486 seq VPMLLLIVGGSFG/LR <221> poiyA-signal <222> 577. .582 <221> <222> <221> <222> <223> <221> <222> <223> <221> <2 22> <223> <221> <222> <223> polyA-s.ite 598. .610 misc-.fieature 68. .167 homology id :AA531561 eat misc-feature 166. .262 homology id :AA531561 est misc-feature 423. .520 homology id :AA531561 est misc-.feature 518. .564 homol.ogy id :AA531561 142 <221> misc. 
feature <222> 276. .313 <223> homology id :AA531561 est <221> misc-feature <222> 41. <223> homology id :AA531561 est <221> misc-.feature <222> 41. .262 <223> homology id :AA535454 esc <221> misc..feature <222> 423. .520 <223> homology id :AA535454 es t <221> misc-feature <222> 518. .564 <223> homology id :AA535454 est: <221> misc-feature <222> 276. .313 <223> homology id :AA535454 est <221> misc-feature <222> 46. .262 <223> homology id :H81225 est <221> misc-feature <222> 2. .39 <223> homology id :H81225 est <221> misc-feature <222> 455. .493 <223> homology id ;H81225 est <221> misc-feature <222> 276, .313 <223> homology id :H81225 est <221> misc..feacure <222> 423. .458 <223> homology id :H81225 esc <221> misc-feature <222> 53. .262 <223> homology id :AA044291 eat <221> misc-feature <222> 423.-520 143 <223> homology id :AA044291 <221> <222> <223> <221> <2 22 <223> <221> <222> <223> est MiSC-f.eature 518. .564 homology id :AA044291 misc-feature 276. .313 homology id ;AA044291 est misc -f~eat ur e 125. .262 homology id :W47031 est <3 00.- <400>. 46 aaagtgagtt aaggacgtac gcgctaggcc cgcttggagt tcgtcttggt gagagcgtga tctgagccga eggaagagtt stgctgagat ttgcgagtct cactc atg tte gca ccc Met Phe Ala Pro gcg gtg atg Ala Val Met ccc atg ttg Pro Met Leu gct ttt cgc aag Ala Phe Arg Lys aag act ctc ggc Lys Thr Leu Gly tat gga gtc Tyr Gly Val -15 ttg ctg att gtt gga ggt tct ttt ggt ctt cgt gag ttt Leu Leu Ile Val Gly Gly Ser Phe Gly Leu Arg Glii Phe caa atc cga tat gat gct gtg aag agt Gin Ile Arg Tyr Asp Ala Val Lys Ser aaa atg gat cct gag Lys Met Asp Pro Glu tta gag tcg gaa tat Leu Glu Ser Glu-Tyr gaa aaa aaa ccg Clu Lys Lys Pro gga agt atc tgt Gly Ser Ile Cys gag aat aaa ata Glu Asn Lys Ile tgaagggcta ctatctttcc ttggcccttc tcccttgttg ggactcaatc tccagactat ctccccagag ttgggaaaat caaagactcc aagtttgatg gggaagatcc tgaccccctc caaggaagaa tgactctgct gattcttttt tccttttttt aaaaaaaaa <210> 47 <211> 1370 <212> -DNA <213> Homo sapiens <220> <221> sig...peptide <222> 359. 
.466 <223> Von Heilne matrix aatcttgtca actggaagaa a tccaggaaa ttttaaataa aggcttggct tattcgagga gcc ttaagac aaatactatt ttaagctttg cccaggcctt taagacaac t aac tggaaaa <221> <222> <221> <222> <221> <22 2> <223> score 7.80000019073486 seq LTFLFLHLPPSTS/LF P01yA..signal 1334. .1339 po lyA.s it e 1357. .1370 misc-feature 113. .420 homology id :R79290 est 144 <222> 406. .482 <223> homology id :R79290 est <221> zisc-feature <222> 199. .420 <223:- homology id :RS1173 esc <221> misc-feature <222> 406. .514 <223> homology id :R81173 <221> misc-feature <222> 2. .269 <223> homology id :R81277 es t <221> misc-.feature <222> 406. .646 <223> homology id :R74123 es t <221> misc-feature <222> 647. .6-82 <223> homology id :R74123 est <221> misc-feature <222> 439. .646 <223> homology id :AA450228 est <22 1> misc..feature <222> 647. .739 <223> homology id :AA450228 est.
<221> misc_feature <222> 406..646 <223> homology id :R02473 est <221> misc_feature <222> 406..604 <223> homology id :T71107 est <221> misc_feature <222> 71..282 <223> homology id :C06030 est <221> misc_feature <222> 319..365 <223> homology id :C06030 est <221> misc_feature <222> 57 <223> homology id :C06030 est <221> misc_feature <222> 1173..1277 <223> homology id :N54909 est <221> misc_feature <222> 1080..1177 <223> homology id :N54909 est <221> misc_feature <222> 1273..1356 <223> homology id :N54909 est <221> misc_feature <222> 1173..1277 <223> homology id :AA196824 est <221> misc_feature <222> 1080..1177 <223> homology id :AA196824 est <221> misc_feature <222> 1273..1356 <223> homology id :AA196824 est <300> <400> 47 acaaggcaga cctttgctgg cctgctccac taacagacc t caggtcace tagtggggtg gcr.tctgaat aggaaggtac caag:gagag cctacagct ggagaacaag gaatccagaa ttcagg acaggg aaggta gtagga gaaatt rcctt cattccagag rtgaa gctgawgstg .ctta Cttgtacc *acta ctgtcccaga .agta gtagtttaaa tgaa gaccagatca agc ccg gc gtc Ser Leu Ala Val atg aac ach ttt gag cca Met Asn Thr Phe Glu Pro Ccctgtg gccaggcctt tacttggggg atctccttgg tccrtgtcag ccaggtgcat gccgaggcaa ggggattect gtagtaact.g ctactgtatt tgggtggtcc gcatgtga Ile Ala Phe Phe Leu tt ctc cat cta cca Phe Leu His Leu Pro gga caa ata aag ggc Gly Gln Ile Lys Gly att tgg acc ttC ect gcc ctt aca ttt Ile Trp Thr Phe Ser Ala Leu Thr Phe tcc acc agt eta ttt att aac tta gca Ser Thr Ser Leu Phe Ile Asn Leu Ala cct Ctt ggc Pro Leu Gly is tgc gac ttt Cys Asp Phe 1 teg att Leu Ile ttg ctt ctt tct ttc tgt gga Leu Leu Leu Ser Phe Cys Gly tat act aag Tyr Thr Lys gcc cca tcc Ala Leu Ser ttg gaa atc cc~t aac aga att gag ttt Leu Glu Ile Pro Asn Arg Ile Glu Phe att atg gat cca aaa aga aaa aca aaa Ile Met Asp Pro Lys Arg Lys Thr Lys taatgaagcc atcasgtcaa gggtcacatg aagtagagc t tttgtttttc Cttggctcac ag tagc tggg ccaataaaca tatgaaatgg taaagacgga tgtaacctcc attgcaggtg ataaattttc ttcagtaagg gtctcgctctgc tcCcggg cgtgccacca cagaagaaat atgagcttgt 
gtcactcagg ttcaagccat ticctggcta gaaatccaac tgttttttgt c tggagtgca tctcctgcct atttttgtgt tagacaaata tttgttttgt gtggt-atgat cagtctcctg ttttggtaga gacagggttt cacc~cgtcg gtcgggcg 1011 146, tCtcgggctc ctgacctctt gatccgcctg ctggcctc ccaaagt gattttaaa gtatgtt gaaaaggtca tggggaa ccttatgtc taggcca Cttctctagc ttacaat 8aatgtcaaa actaact <210> 48 <211> 791 <212> DNA <213> Homo sapiens <220> gat~ cca gca ctt gga gggattacag gtccgtgtc gaggtgattc gtgaagaata CCtttttgaa ataataaatg a egtgagcca atggttggaa atggctctgt tgagtcagtt ctgggaaaca ~tattttca ccgtgcctag gacagagcag ggaatttgag att gccagcc cctgtctgC ca tygaaaaa ccaaggatga gaaggatatg gtgaatggtt ct~ggaa t t a attcacttta aaaaaaaaa 1071 1131 1191 1251 1311 1370 <221> <222> <223> <221> <222>- <221> <222> <221> <222> <223> <221> <22 2> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig.,peptide 191. .286 Von Heijne matrix score 8.80000019073486 seq VPMLLLIVGGSFG/LR polyA.signal 755. .760 polyAsite 780. .791 misc-..eature 361. .531 homology id :W73841 es t misc-feature 210. .347 homology id :W73841 est misc-feature 548. .637 homology id :W73841 es t miscfeature 181. .210 homology id :W73841 esc misc,.feature 361. .530 homology id :HSU74317 est misc-feature 238. .347 homology id :HSU74317 est misc-feature 568 .637 homology id :HSU74317 es t misc..feature 698. .733 homology id :HSU74317 est <221> MiSC..feature 147 <222> 361. .531 <223> homology id :W47031 est <221> misc..feature <222> 210. .347 <223> homology id :W47031 Qst <221> misc-feature <222> 148. .210 <223> homology id :W47031 est <221> misc-feature <222> 548. .600 <223> homology id,:W47031 est <221:- misc-feature <222> 129. .347 <223> homology id ;AA044118 es r <221> misc-feature <222> 437. .531 <223>, homology id :AA044118 es t <221> misc..jeature <222> 361. .454 <223> homology id :AA044118 est <221> rnisc-.feature <222> 176. .347 <223> homology id :AA293342 est <221> misc-feature <222> 361. 
.531 <223> homology id :AA293342 est <221> misc..feature <222> 548. .605 <223> homology id :PA293342 est <221> misc-.feature <222> 361. .531 <223> homology id :AA531561 est <221> misc..feature <4222> 153. .252 <223> homology id :AA531561 eat <300> <400> 48 aacaagtatg ttacgatggc tcgattgctt ttgcctagcg gaaaccattc actaaggacc gagcaccaaa taaccaagga aaaggaagcg agttaaggac gtactcgtct tggtgagagc 120 148 gtgagctgct gagatttggg agtctgcgct 180 aggcccgct ggagttctga gccgatggaa gagttcactc atg ttt gca cc gcg gtg atg cgc gct ttt cgc aag aac Met Phe Ala Pro Ala Val Met Arg Ala Phe Arg Lys Asn -30-2-0 aag act ctc ggc tat gga gtc ccc atg ttg -25 -t20tgtgg g Lys Thr Leu Gly tIyr Gly Val. Pro Met Leu Leu Leu Ile Val Gly GlIy 1 tct ttt ggt ctt cgt gag ttt Ser Phe Gly Leu Arg Giu Phe cad atc Cga tat gat get Gin Ile Arg Tyr Asp Ala gCg aag ggt aaa Gly LYS 15 ato gat cct gag Met ASP Pro Clu gad aaa add ctg Glu LYS Lys Leu gag dot aaa aca tet tca gag tcg ga ta aCs acasg c e ag ttt gat Ser Leu Clu Ser Clu Tyr Glu Lys Ile Lys ASP Ser Lys Phe Asp 35 A A 229 277 325 373 421 469 518 578 638 698 758 791 tgg aag aac att: cga gga ccc agg cet Trp LYS Asn Ile Arg Cly Pro Arg Pro tgg gad TrP GlU act a 'ag Thr Lys gat cet gae ASP Pro Asp cad gga ags sat Gin Gly Arg Asn gas agc Cttaag Glu Ser Leu Lys gattctctct act ctatca a Cggacaaaa ttccatctgc aaatgtga <210> 49 <211> 1433 <212>
DNA
<213> Homo <220> tccttttttt ag tggaaagg ktaatctktc ggatgaaagc ataactgctc ttttaaataa aaatacatt aaattceagg ccatggsaaa accaaaggtc atgtccagg tgcaatgctg gcceeegct caaaas ass sea act tgacccgCt Thr Thr aactggacttccetaatatat Cttggatatg ggtaatttgg tttttatact tcccagcaa kattttacac cntcgaaa sapiens <221> <222> <223> <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> Sig-.PePtide 346. .408 Von Heijne matrix score seq SFLPSALVIWTSA/AF PolYA..signal 1400. .1405 polyA site 1420. .1433 misc-feature 268. .634 homology id ;W02860 esc misc-feature 118. .564 homology id :N27248 es c <221> misc-feature <222> 268. .697 <223>, homology id :N44490 es t <221> misc feature <222> 582. .687 <223> homology id :AA274731 est <221.> misc..feature 149 <222> 65. .369 <223> homology id :H94779 est <221> misc-feature <222> 471. .519 <223> homology id :H94779 est <221> misc..feature <222> 61. .399 <223> homology id :H09880 est <221.> misc-.feature <222> 408. .452 <223> homology id :H09880 est <221>t misc-feature <222> 484. .699 <223> homology id :H04537 est <221> misc-feature <222> 685. .772 <223> homology id :H04537 est <221> misc-.feacure <222> 454. .486 <223> homology id :H04537 est <221> misc-feature <222> 410. .439 <223> homology id :H404537 eat <221> misc-feature <222> 572. .687 <223> homology id :AA466632 est: <221> misc-feature <222> 260,..444 <223> homology id :AA459511 eat <221> misc-feature <222> 449.-567 <223> homology id :ALA459511 es t <223.> misc-feature <222> 117. .184 <223> homology id :AA459511 eat <221> misc..feature <222> 260. .464 <223> homology id :H57434 150 <221> <222> <223> <221> <222> <223> <221> <222> <223> ese misc-feature 1*18 184 homology id :H57434 est misc-jeature 56. .113 homology id :HS7434 esc misc..feature 454. 
.485 homology id :H57434 est <300> <400> 49 actcttta ctgatgccga ctcaaacggc gtttgttgaa cgeeccegtt aagac Caaca cag caa ggc Gin Gin Gly gct gct ttc Ala Ala Phe 1 gcataggggc gttccgtctc ctagtgcc gcagttacca gagtacacgt ttcegtgaag C tcggcgcca tcgcgceee gogoctccgg agaatcttca tCCegctgat ttgtaaaaca gcggccaacg tcctggtccc agaaaatcag accctttccc ttacaaaagg gaaaacc egt ctagtcggtc tggtaagtgc aggcaaagcg gasgnagac cggtctaatt aattcctctg acaaaagcta attgageaca tca~ggtatg agcaggtctg tagaa acg egg tgg tee Met Trp Trp Phe gta act tgg aca tot Val Ile Trp Thr Ser ctc age ttc ccc cct eca gcc ct Leu Ser Phe Leu Pro Ser Ala Leu ata ttt tca Ile Phe Ser tac ace ace gca gta aca ctc cac cat ata Tyr Ile Thr Ala Val Thr Leu His His Ile gac ccg gct tea ccc tat aec age gac Asp Pro Ala Leu Pro Tyr Ile Ser Asp aaa tgc eta ttt ggg gca atg cta aat Lys Cys Leu Phe Gly Ala Met Leu Asn 99t aca gta gct Gly Thr Val Ala cca gaa Pro Glu gcg gca gt Ala Ala Val eea tgc att Leu Cys Ile age cct gaa Ser Pro Giu gct acc att Ala Thr Ile gag aac gt Glu Asn Val tat gee cge tat Tyr Val Arg Tyr caa gtt cat gce Gin Val His Ala 120 180 240 300 357 405 453 501 549 597 645 693 741 789 837 891 951 1011 1071 1131.
atc atc aaa Ile Ile Lys aac aag gct ggc Asn Lys Ala Gly gea ctt gga aea Val Leu Gly Ile age ege eca gga Ser Cys Leu Gly act geg gca Ile Val Ala cag gaa aac Gln Giu Asn cet tgc tgc Phe Cys Cys ege aag egg agc Cys Lys Trp Ser gee eac ctt egg Ala Tyr Leu Trp tat ggg Tryr Gly 110 tgc agc ctc ate aea tat Leu Ile Ile Tyr 115 cca aaa tcc aac Pro Lys Ser Asn 130 tat ceg geg eggj Tyr Leu Val Trp get egt eca gac Val Cys Ser Asp ggc aaa aca age Gly. Lys Thr Ser 135 age aag tgc ace Ser Lys Cys Thr 150 ccc eec cta cca aaa Pro Phe Leu Pro Lys ccc ceg gat cag Leu Leu Asp Gin 125 gee gee gge Val Val Gly eagcatgctg actegcecat cagttttgca cagtggcaae CCa tgcgc cc ceeceecctg acatggatta eeegggactg atttagaaca gaaaceccat cacaegaeca ctactgcagc agaatggect acecacacec gegatteeca gaaaaectcc accctctatg acactgcacc ttgccctatt tggaaccccg aggacaaagg aegtcactee cctecteegg ttacgggtgg aagccaactt aacaatgaac gaacacggct 151 actttccags aagatattag atgaaaggat 1191 ctcagggant tggggaaang gttcacagaa aanccact.ta antcaaggct gacagstaac gaaagaagcc atttgcatag attattytaa cctatgccta tactttttta tytcagaaaa aa <210> 50 <211> 1158 <212> DNA <213> Homo saipiens <220> 4221> sig-.pepcide <222> 214. .339 <223> Von Heiine matrix aaaatatttc tgtaantgan ttastgastt gttgcttavt acgtgatgaa aggatatcat taaagtcaaa tcttcatcrt tgctgataat Caagaagant agacta tgaa gaanattttc caggaaacat attaaaaaca aaaaaaaaaa <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> score 6.09999990463257 seq AILLLQSQCAYWA/LP po lyA.s igna 1 1133. .1138 polYA-.site 1146. .1158 misc..feature 840. .968 homology id :H64717 est misc-feature 858. 
.968 homology id :H65208 <300> <400> aarttgagct tggggactgc Cctcatctt ggatttgaaa tgrsagtgta mtggattatt aacagtccat gtgggtgact aga caa atg etat att cz Arg Gin Met Ty Ile GI -3 Cgc caa gga cga ata, t Ar-g Gin Gly Arg Ile C 1251 1311 1371 1431 1433 120 180 234 282 330 378 431 491 551 611 671 731 791 851 911 971 1031 1091 1151 1158 agctgtgggg g ttgagagca ccttgggCct cagctctgat agatttcagt gcattgcctc ccctgggtgc gcatgcttttg cccactgaaa ctcatcctgs gaatgacttg aatgtttccc cgectgagct ggg atg tgt ttt cca gag cac aga Met Cys Phe Pro Clu His Arg ~a gat aga ctg gac tct gtc acc agg aga gca .n Asp Arg Leu Asp Ser Val Thr Arg Arg Ala 0-25 ~t gct ata cta tta. ctc caa tct cag tgt gcc rs Ala Ile Leu Leu Leu Gin Ser Gin Cys Ala -10 tat tgg gcg ctt cca, gaa ccgj cgt aca. ctt gat ggg gga cat ctt atg Tyr Trp, Ala Leu Pro Glu Pro Arg Thr Leu Asp Gly Gly His Leu Met 1 5 caa tgatggctct ctcctgctcc aagatgtgca agaggctgac cagggaacct Gin acatctgtga atgtgcttcc aagegcttcc ggctaagtac cagaa taaaa aCtttaaaca gtccaaatat accthtgaca cacattgatg aaagaaataa tgaggcagca actgcattyc aaaaaaa aatccgcctc agaggagccc atgtgcaaga ttaccacaga atakgagtta cacteccccc gttttggaca acaaagtcya ctacatytgt gcagkycagg gaamtgcctg agcc tgggwg aaaggggaga aaaggtacgc ggcaaggaaa gtgaatcttc ttttagttaa tcacaaaagc catatttatt tgttytctttt attttatagg ctcagtggct agccccaggg acagagcaag gccaggtgrt aaatgcttac ctgattatct aaagaaatga kaataaaata cctgtgaagg aaatggaata actacgcccc taccctatgt catgcctgta ttcaagactg actYtgttta caagaaggcg ttaaagaggg tgagtaaatg ntcarttaaat ttgataatta atgttttgtt aatagtamtt aataccttts taggtgtttt atocctagcat cagtgagc ta aaa taaaa aa gtggtactgc gccaaggggc ccagcctttg tatttcagrt ttgtattatt cacatataat gaaccccggc atcagttatc gggggacaga tttgggaggc tgawggcacc agagaaaaaa <210> 51 <211> 850 <212> DNA <213> H-omo sapiens <220> <221> Sig...peptide <222> 372. .437 <223> Von Heijne matrix score 6.09999990463257 seq LFLTCLFWPLAAL/NV <221> polyA-signal <222> 812. .817 <221> polYA-m.ite <222> 838. .850 <221> misc-.feature <222> 128. 
.424 <223> homology id :N78012 est <221> misc_feature <222> 61..126 <223> homology id :N78012 est <221> misc_feature <222> 483..554 <223> homology id :N78012 est <221> misc_feature <222> 417..464 <223> homology id :N78012 est <221> misc_feature <222> 460..500 <223> homology id :N78012 est <221> misc_feature <222> 577..612 <223> homology id :N78012 est <221> misc_feature <222> 612..649 <223> homology id :N78012 est <221> misc_feature <222> 546..577 <223> homology id :N78012 est <221> misc_feature <222> 29..63 <223> homology id :N78012 est <221> misc_feature <222> 128..294 <223> homology id :W37233 est <221> misc_feature <222> 370..509 <223> homology id :W37233 est <221> misc_feature <222> 505..591 <223> homology id :W37233 est <221> misc_feature <222> 293..330 <223> homology id :W37233 est <221> misc_feature <222> 22..57 <223> homology id :W37233 est <221> misc_feature <222> 95..128 <223> homology id :W37233 est <221> misc_feature <222> 128..326 <223> homology id :AA186399 est <221> misc_feature <222> 418..605 <223> homology id :AA186399 est <221> misc_feature <222> 326..423 <223> homology id :AA186399 est <221> misc_feature <222> 39..128 <223> homology id :AA186399 est <221> misc_feature <222> 206..640 <223> homology id :W52489 est <300> <400> 51
agacactocc aCCa cggcgt gtcggctttc catgttggtg actcaaccct ttggocttga atgccttcta tggtgggatc a ccacggcc e ttggtgcctt acctgttcag ctctttggac ggaagaagac g atg c88 z Met Gin I cgagtgaggc cactgtgcce ggttcatccc ttgctgcta cgcaattgaa atgctctaca Lat Cac ctc .sn His Leu gacggggtag gggttggCgc tcaggcggcg ctcattgtga tgagcgtgtt Ctggggcttc taagggtcct aaccggggag ttatcattac tctctettgg ctgattgcaa ttctggccca aaatgaaacc atctggtatc tgaagtatca gtgctcagtc tttgaggtca cgagaagaga caa. acc aga cca ctt ttc ttg act Gin Thr Arg Pro Leu Phe Leu Thr -15 tccCtg ttt tgg cca tta, got gcc tta aac gtt aac ago aca ttt gaa 154 Cys Leu Phe Trp Pro Leu Ala Ala Leu Asn Val Asn Ser Thr Phe Glu tgc ctt att cta caa tgc agc gtg Cys Leu Ile Leu Cln Cys Ser Val 15 ttt tcc ttt Phe Ser Phe gCC tt ttt gCa Ctt Ala Phe Phe Ala Leu tgg tgaattacgt gcctccataa cctgaactgt gccgactcca caaaacgart* Trp atgtactctt ctgagataga agatgctgtt cttctgagag atacgttact Ccc aatctgtgga ttgaaaatg gctcctgcct tctcacg~gg gaaccagtga agt aactgctgca agacaaacaa gactccagtg gggtggtcag taggaaaaca cgti gaagaaccat ctc~acagaa tcgcaccaaa ctatactttc aggatgaatt tc gccatctt. ggaacaaata ttcctcct tytatgtaa aaaaaaaaaa a <210> 52 <211> 1107 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222.> 132. .215 <223> Von Heijne matrix tccttgg :cagagg <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <22 3> <221> <22 2> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> score 3.59999990463257 seq PLSOSWALLPASA/Gv polyA-signal 1069. .1074 polyk..sie 1094,..1107 misc-.feature 177. .392 homology id :W80978 est misc-.feature 425. .542 homology id :W80978 est misc-feature 43. .114 homology id :W80978 est misc-.feature 387. .441 homology id :W80978 est misc..feature 113. .165 homology id :W80978 est.
misc-feature 551. .590 homology id :W80978 es t misc-feature 166. .314 homology id :AA043154 es r <221> misc-feature <222> 27. .181 155 <223> homology id :AA043154 est <221> misc-feature <222> 425. .564 <223> homology id :AA043154 <221> misc-.feature <222> 387. .441 <223> homology id :AA043154 <221> miscfecure <222> 309. .352 <223> homology id :AA043154 est <221> misc..feature <222 549.,.580 <223> homology id :A.N043154 est <221> misc-.feature <222> 601. .1071 <223> homology id :AA126732 est <221> misc-.feature <222> 576. .605 <223> homology id :AA126732 est <221> misc-feature <222> 387. .477 homology id :AA161280 est <221> misc-.feature <222> 292. .362 <223> homology id :AA161280 est <221> misc-.feature <222> 46. .113 <223> homology id :AA161280 es t <221> misc-.feature <222> 217. .277 <223> homology id :AA161280 est <221>'misc-f.eature <222> 113. .160 <223> homology id :AA161280 est <G221> misc-.feature <222> 173. .217 <223> homology id :AA161280 est 156 <300> <400> 52 aacaacttcc CgCtgcttac cgcccgtgac ggccccactg agcggtgtcc tgagccgatt acagceaggt agtggagcgc ctgggtgcag gagacagccg gagtcgctgg gggagctccg cgccgccggo c atg tgg Agg ctg ctg gct cgc gct agt gcg ccg ctc ctg Met Trp Arg Leu Leu Ala Arg Ala Ser Ala Pro Leu Lau gtg ccc ttg Val Pro Lou aag aca ccg Lys Thr Lau tca gat Ser Asp ccc cca Leu Pro tcc tgg gca Ser Trp Ala ctc cc Lau Leu Ccc gcc agt gct ggc Pro Ala Ser Ala Gly gta cca agt tcc gad gat gtt Vol Pro Sar Phe Glu Asp Val gad add ccc Glu Lys Pro gta aga ago Val Arg Arg aag cct aga ccc Lys Leu Ar-g Phe gaa egg gca cca Giu Arg Ala Pro tcc act cct Ser Ile Pro gcg cca aaa Vol Pro Lys cct tcc act Pro Ser Thr gao cct aaa Glu Pro Lys agt gac ata Ser Asp Ile gad gt Glu Ala acg gag kkk Thr Glu Xaa ggc aat ccc Gly Asn Phe ttg gco ttg Leu Ala Leu ggc tac ctg Gly Tyr Leu ggc cac ccc Gly His Ph.
atg cgc ctg Met Arg Leu aca ac Thr Ile 120 218 266 314 362 410 458 506 554 602 650 698 746 794 842 oac cgc tct Asn Arg Ser gcc cct, ttc Ala Pro Phe 100 gga ggc aaa Gly Gly Lys ccc aog aac Pro Lys Asn gcc aca tgg Ala Ile Trp ccc atc act Pro Ile Thr agc gcc ggg Ser Vol Gly cga gta cca Arg Vol Pro cgc atg ggg Arg Met Gly aag gct ggc Lys Ala Cly ggt gct at dly Ala Ile tac gtg aca Tyr Val Thr 115 cgc mww Arg Xaa gww gta gag Xaa Val Giu ggg cgt tgt Gly Arg Cys gaa gao gtg coo Glu Glu Val Gin ccc ctt gac Phe Leu Asp gcc cac aag Ala His Lys tty gca gca Phe Ala Ala aag gct Lys Ala 160 gtg agc cgc Val Ser Arg gao mgt aac Glu Xaa Asn 180 mac atg ctg Xaa Met Leu yca gag Oag Leu Giu Lys aaa got coo Lys Asp Gin gag gao ago Glu Glu Arg 175 gcc act gcc Ala TIhr Ala cog aac ccc Gin Asn Pro ttt gag cga Phe Giu Arg ggc ate cgg Gly Ile Arg ctg agc cco Leu Ser Pro 195 aag ggg Lys Gly 210 000 tam tgg Lys Xoa Trp tcy tac atg ccc Phe Tyr Met Pro tot goc ttg Tyr Asp Leu 205 mam cgt gtg Xaa Arg Vol aaggatcytg gccataamta aaaaagaaaa aoa acc cac Thr His togtgogcgt aggogacoac tgtacacagg stactgaaag cccctcgcc tacccactga ogtytttggg tagctyctao ttcgogtaga tttytgaaaa acgotgttac ttgttgattt attoaataaa atttcat, cacttcaggo oaoaaaa <210> 53 <211> 500 <212> DN'A <213> Homo sapiens <220> <221> sig-.peptide <222> 199. .288 catttytatt aggagcagca cwgtattccc 944 1004 1064 1107 157 <223> Von Heijne matrix score 5.59999990463257 seq IVSVLALIPETTT/LT <221> polyA..Signal <222> 464. .469 <221> POlYk..Site <222> 489.-500 <221> misc-.feature <222> 197. .412 <223> homology id :AA429945 es r <221> misc-.feature <222>, 61,..195 <223> homologry id :AA429945 est <221> misc..feature <222> 425. .488 <223>-homology id :AA429945 esc <221> misc-feature <222> 197. .412 <223> homology id :AA455042 es c <221> misc-feature <222> 61.-.195 <223> homology id :AA455042 est <221> misc-feature <222> 425..488 <223> homology id :AA455042 est <221> misc-feature <222> 207. .412 <223> homology id :W93646 es t <221> misc-feature <222> 58. 
.195 <223> homology id :W93646 es t <221> misc-feature <222> 425.,.488 <223> homology id :W93646 est <221> misc-feature <222> 197. .412 <223> homology id :AA516431 es t <221> misc-feature <222> 90. .195 <223> homology id :AA516431 est <221> misc-feature 158 <222> 425..488 <223> <221> <222> <223> <221> <222> <223> <221> <22?> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> homology id :AA516431 est misc-feature 52..195 homology id :W38899 esr misc-feature 197. .324 homology id :W38899 est misc-feature 443..477 homology id :W38899 est misc-feature 197. .338 homology id :W52820 est misc-feature 71..195 homology id :W52820 est misc-feature 339..401 homology id :W52820 eat miscfEeature 425..469 homology id !W52820 est mise..eature 40..195 homology id :W19506 est <300> <400> 53 agagctgtnn tcgcagaacc ctgcagacge cnsaagtagg ggagggcggt gctccgcmgm tactcaggca gCcagctgag aagagttgag gatggataac gtgcagccga aaataaaaca ggtggcggdh tgctatcgct ggaaagtget gctgctgggt tcgccccttc tgcttcagtg tgaaaggcca cgtgayag atg ctg egg ctg gat Met Leu Arg Leu Asp aca aca gta ttc atg etc ate gta tct gtg Thr Thr Val Phe Met Leu Ile Val Ser Val att atc aac tca ctg gta lie Ile'Asn Ser Leu Val -25 ttg gca ctg ata cca gaa Leu Ala Leu Ile Pro Glu 120 180 231 279 327 375 423 ace aca aca ttg aca gtt ggt gga ggg gcg ttt gca Ihr Thr Thr Leu Thr Val Gly Gly Gly Val Phe Ala ott gtg aca gca Leu Val Thr Ala gta tgc tgt Val Cys Cys aat ccc agc Asm Pro Ser ott gee gac ggg gee ctt att tao cgg aag ctt cg ttc Leu Ala Asp Gly Ala Leu Ile Tyr Arg Lys Leu Leu Phe 20 ggt oct tac cag aaa aag cct gtg cat gaa aaa aaa gaa Gly Pro Tyr Gin Lys Lys Pro Val His Glu Lys Lys Glu 159 35 40 gtt ttg taattttata ttactttta gtttgatact aagtattaaa catatttctg Val Leu tattcttcca aaaaaaaaaa a <210> 54 <211> 765 <212> DNA <213> Homo sapiens <220> <221> sig-.peptide <222> 293. 
.385 <223> Von Heiine macrix <221> <222> <221> <222> <221> <222> <2 23> score 4.40000009536743 seq TCCHLGLPHPVR.A/PR polyA.signal 733. .738 polyk..s ite 752. .765 rnisc...feature 310. .576 homo logy id :HUM426AO7B est <3 00> <400> 54 aaaccttgt tgaggggcga aaaccgtagg ttccccccca tgaaagagag gc tagggac gggaaaage kacgcgg tc agcgaaccg ge tagaagt C gggcggttg Cg5 t ttcctcaggt gtq a gaaaggcgac ggc rg gatgggaagt gac t ccgcttgcca gc~ cct gtc tca ctt Leu Val Ser Leu ~caaccgt ;gtgggga 7Ctgtcgg ttcaatg Lgcctcct gggcactgct gagggaggcg agttggaaag agattgaact tagtagagcg gaatttgaat ga tgccgqng ggacgcctgg tcagctggat ga atg agt Met Ser aat acc cac acg Asn Thr His Thr acc tgc tg cac Thr Cys Cys His cat ceg cac ceg gcc ctc His Pro His Pro Ala Leu ggc ctc cca Gly Leu Pro Ccg gtc cgc gct ccc Pro Val Ar-g Ala Pro cgc cct Arg Pro ccC cct Leu Pro cgc gta gaa ccg tgg gat cct agg tgg cag gac tca gag cta Arg Val Glu Pro Trp Asp Pro Arg Trp Gin Asp Ser Glu Leu agg tat cca cag gcc atg aat tcC ttc cta aat Arg Tyr Pro Gin Ala Met Asn Ser Phe Leu Asn 25 30 tgC agg acc tta agg caa gaa gca tcg gct gac Cys Arg Thr Leu Arg Gin Glu Ala Ser Ala Asp 45 tgaacctgat agattgctga tttatctta tttatcct atttctgaaa agaccataca gataaccaca aatatcaaga tagaatttag attaggttt ccttcctgct tcccacctcc tgggaccaac ttatggaat aaataagctg agctgcaaaa <210> 55 <211> 584 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 130. .189 <223> Von Heijne matrix score seq KFCLICLLTFIFH/HC <221> poly&..signal <222> 546. .551 gag cgg tca tcg ccg Glu Arg Ser Ser Pro aga tgt gat ctc Arg Cys Asp Leu gacttggtac aagttggg aagtcgtctt cagtattaag ttcgaataag gaaacgtctt waaaaaaaaa 160 <221> polyA-site <222> 572. 
.584 <300> <400> aagacgcgcc ggtttctgcg acgcagttag cgcagtctgc tttggtgaat acacgatttg gtgcagccgg ggtttggtac cgagcggaga ggagatgcac acggcactcg agtgtgagga aaaatagaa atg aag gta cat atg cac aca aaa ttt tgc ccc att tgt ttg Met Lys Vai His Met His Thr Lys Phe Cys Lou Ile Cys Leu ccg aca Leu Thr t act tct cat cat tgc aac cat Phe Ile Phe His His Cys Asn His cat gaa gan His Giu Clu cat ggc cct gaa gcg ctt cac aga cag His Gly Pro Glu Ala Leu His Arg Gin Cgt gga Otg aca Arg Gly Met Thr cat gac His Asp gaa ttg Ciu Leu aaa. tac Lys Tyr gag cca agc aaa Clu Pro Ser Lys tca aag caa gct gct gaa aat gaao Ser Lys Gin Aia Ala Glu Asn Glu tat act gaa aaa, ctt ttt gag Tyr I-le Ciu Lys Leu Phe Glu tat ggt gaa aat Tyr Gly Glu Asn aga. tta tcc Arg Leu Ser ttt ccc Phe Phe ggt ttg gag aaa Gly Leu Glu Lys tta aca aac ttg Leu Thr Asn Leu ctt gga, gag aga Leu Gly Glu Arg gta gtt gag act aat cat gag gat ctt ggc cac gat cat Val Val Glu Ile Asn His Glu Asp Leu Gly His Asp His cat tta agg tat ttt His Leu Arg Tyr Phe agt tca aga Ser Ser Arg gct tc Val Ser ctc aca Leu Thr 105 aaa gca ccc tca Lys Ala Phe Ser taaccaccca. gcactcccat. aatcatttaa attcagaaaa wtccacaaa-a aaaaaaa <210> 56 <211> 1387 <212> DNA <213> Homo sapiens <220> <221> sig..pepcide <222> 191. .325 <223> Von Heijne matrix score 4.59999990463257 seq VLVYLVTAERVWS/ DD <221> polyA..signai <222> 1348. .1353 <221> polyA-.site <222> 1374.. 1387 <221> misc..feature <222> 1258. .1372 <223> homology id :AA417826 est.
<221> misc_feature <222> 791..887 <223> homology id :AA417826 est <221> misc_feature <222> 94..524 <223> homology id :AA235826
est <221> misc..feacure <222> 44. .94 tcaaaactgt gaccagtgta 161 <223> homology id :AA235826 es t <221> misc-feature <222> 1258. .1372 <223> homology id :AA236941 est <223.> misc-feature <222> 935. .1279 <223> homology id :AA480326 est <221> inisc-.feature <222> 1258. .1372 -<223> homology id :AA480326 eat <221> misc-f.eature <222> 724. .1148 <223> homology id :AA234245 eat <221.> misc-.feature <222> 944. .1279 <223> homology id :AA479344 eat <221> misc-.feature <222> 1258. .1372 <223> homology id :AA479344 est <221> misc-feature <222> 1070.-1212 <223> homology id :AA133636 est <221> misc-feature- <222> 1258. .1372 <223> homology id :AA133636 es t <221> misc-.feature <222> 938. .1054 <223> homology id :AA133636 eat <221> misc-.feature <222> 94. .436 <223> homology id :AA133635 est <223.> misc..feature <222> 32. .94 <223> homology id :AA133635 est <221> misc..feature <222> 895. .1273 <223> homology id :AA479453 est 162 <221> 22> <223> <221> <222> <223> misc..feature 1258. .1371 homology id :AA253214 est misc-feature 94. .268 homology id .:AA482378 <300> <400> 56 actCCcaggc C tccccgada tgccggctgc Cgggtccacc aac aag tac Asn Lys Tyr tgggccagca caccc99cQg gctctgtcct ggaaacaggc acctcccccg cCtctggata tgaavactca agctgcttgc tgggagccag gagag ccctg aggagtagtc actcagtagc atg aac tgg agt atc ttt gag gga ctc ctg agt Met Asn Trp Ser Ile Phe Glu Gly LeU Leu Ser t tcaacgggc tgagtcctat agc tgacgcg ggg gec Gly Val tcc aca gcc ttt Ser Thr Ala Phe cgc Atc tgg ctg Arg Ile Trp Leu ctg gtc ttc Leu Val Phe acc ctc Ile Phe gat gac Asp Asp cgc gcg ccg Arg Val Leu gcg Cac Val Tyr ctg 9C9 acg gcc gag cgt gtg tgg agt Leu Val Thr Ala Glu Arg Val Trp Ser cac aag gac r.cc gac tgc aat act cgc cag ccc ggc His Lys Asp Phe Asp Cys Asn Thr Arg Gin Pro Gly tgc tcc Cys Ser is aac gtc tgc ttt gat gag ttc ttc Asn Val Cys Phe Asp Glu Phe Phe gcg tCC cat gtg Val Ser His Val cgc ctc tgg Arg Leu Trp gtg gtC atg Val Val. 
Met gee ctg Ala Leu CaC gtg His Val so cag ctt atc ctg Gin Leu Ile Leu gcc tac cgg gag Ala Tyr Arg Glu gtg aca tgc ccc tca ctg ctc Val Thr Cys Pro Ser Leu Leu cag gag aag agg Gin Clu Lys Arg cga gaa gcc cat Arg Clu Ala His gag aac agt ggg Giu Asn Ser Gly ctc tac ctg aac Leu Tyr Leu Asn ggc aag aar cgg Gly Lys Lys Arg 120 18O 229 277 325 373 421 469 517 565 613 661 709 757 805 853 901 949 ctc tgg tgg aca tat gtc tgc agc Leu Trp Trp Thr Tyr Val Cys Ser gtg ttc aag gcg Val Phe Lys Ala agc gtg Ser Val gac ate Asp Ile etc cct Leu Pro gac tgc Asp Cys 130 atg gtg Met Val gcc Ala ctc tat gtg ttc Leu Tyr Val Phe tca ttc tac ccc Ser Phe Tyr Pro cct gtg gte aag tgc Pro Val Val Lys Cys 115 tte ate tec aag ccc Phe Ile Ser Lys Pro gca gat cca. tgt Ala Asp Pro Cys aaa tat ate Lys Tyr Ile 110 aat ata gtg Asn Ile Val ace etc ttc Thr Leu Phe gag aag aac Glu Lys Asn gcc aca gct Ala Thr Ala atc tgc ate etg Ile Cys Ile Leu aac cte gtg gag Asn Leu Val Clu tac ctg gtg Tyr Leu Vai aga tgc cac Arg Cys His tgc ctg gea Cys Leu Ala gcc caa gcc Ala Gin Ala kgc aaa caa Xaa Lys Gin 195 kgc aca ggt cat Xaa Thr Gly His ccc cay gat acc Pro Xaa Asp Thr gea agg aaa Ala Arg Lys 175 acy ttt tcc Thr Phe Ser 190 ctg ggn tca Leu Gly Ser gas gac ytc ytt Xaa Asp Xaa Xaa tcg ggk gac ytc ate ttt Ser Gly Asp Xaa Ile Phe 200 205 163 gac agt cat cyt cct ytc tta cca gac cgc ccc cga gac cat gtg aag Asp Ser His Xaa Pro Xaa Leu Pro Asp Arg Pro Arg Asp His Val Lys 210 215 220 aaa 8Cc aty ttg tgaggggctg cctggamtgg tytggcaggc tgggcccgga Lys Thr Ile Leu 225 tgggg gtagg~ ccagcc cc ccc tCa taaaa~ 210> <211 <212> <213> <220>.
<221> <222> <223-> <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> aggct jgcag :acct tcty igaag ;rccaa 57 ytagcatyty gcaagagaga gccccags th cagaacggaa aaCaCacatg mtcatttgct tcatagg tgc ggat tcagac gacggcamtg acag cgaggg cgggcaccct ggtcaaaaaa aaCCtgagag gytgggag ggccagc tcc Ccaatgccca catygcgCgC aaaaaaaa tgggggagc t ccagettccca ccctytgscy gggtggagg ggcccac cg aagccacgag gccctcaamt tgcagstcgg gaggagggcc cagaacttaa 1049 1109 1169 1229 1289 1349 1387 1385
DNA
Homo sapiens sig_peptide 141..251 Von Heijne matrix score 4 seq PLSLDCGHSLCRA/CI polyA_signal
1354. .1359 polyk..site 1375. .1385 mi sc-feature 1183.-1240 homology id :AA463623 eat misc-.feature 176. .239 homology id :AA258927 est misc..feature 803. .854 homology id :AA286417 est mis c-.feature 1183. .1213 homology id :AA608077 est <300> <400> -57 aacacccacc ctggcttttc ttcacctctt caaccaggag ccgagatttc tgttgctctg aagccatcca ggggtcttta accagaagag agaggagagc ctcaggagtt aggaccagaa gaagccaggg aagcagcgca acg gct tca aaa atc ttg ctt aac gta caa gag Met Ala Ser Lys Ile Leu Leu Asn Val Gin Glu gag gtg acc tgt ccc atc tgc ctg gag ctg ttg aca gaa ccc ttg agit Glu Val. Thr Cys Pro Ile Cys Leu Glu Leu Leu Thr Giu Pro Leu Ser -20 Cta gac tgt ggc cac agc Ctc tgc cga gcc tgc atc act gtg agc aac Leu Asp Cys Gly His Ser Leu Cys Arg Ala Cys Ile Thr Val Ser Asn -5 1 aag gag gca gtg acc agc &tg gga gga aaa agc agc tgt cct gtg tgt Lys Glu Ala Val Thr Ser Met Gly Gly Lys Ser Ser Cys Pro Val Cys 15 ggt atc agt tac tca, ttt gaa cat Cta. cag gct aat cag Cat cgg gCC 164 Gly Ile Ser aac ata gtg Asn Ile Val Tyr Ser Phe Giu His Ieu Gin Ala Asn Gin His Arg Ala gag aga ctc Glu Arg Leu gag gtc aag ttg Glu Val Lys Leu cca gac aac ggg Pro Asp Asn Gly aag aga gat ctc Lys Arg Asp Leu gat cat cat gga Asp His His Gly aaa ctc cta ctc Lys Leu Leu Lou aag gag gat Lys Giu Asp gtc act tgc Vai Ile Cys ctt cgc gag egg Leu Cys Ciu Arg tcc, cag Ser Gin gag cac cgt Glu HIS Arg atg tca gga Met Ser Gly 105 cac aca g9t cct cac gga gga agt His Thr Gly Pro His Gly Gly Ser act caa gga Ile Gin Giy 100 gaa gga aga Glu Giy Arg gaa act cca ggc agt Glu Thr Pro Gly Ser coo gag gct Gin Glu Ala gga gga agc tgagaagctg gaagctgaca tcagagaago gaaaacttcc Gly Gly Ser 120 413 461 509 557 605 654 714 774 834 894 954 1014 1074 1134 1194 1254 1314 1374 1385 tggaagtatc aggtacaaac agcacccaa ataatgagga acgccggaco agttgcaga gagctcac cagatgtgga atgagtggaa ccatgaaatg aagaaactga agactgtatt ggaactgaca gctgcccggt tttgaatckc gtccccccag tcagtgttat aatatggtg cgggaagtgg acgtgtccaa tcCcgccaca tgaagtatgt tacagacctc tatttggsta aaaaaaaaaa a 
<210> 58 <211> 1497 4212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 212. .268 tgagagacaa gcagagagag ggctgaggat gtgtcggagt gagtgagac ccatgcccca gctactgggt aagatcagag tkbttgggat gaaaac cgcc tgttagaaga ctgggttoa aggatacaaa c tgcaaagoc gagctagtcc cag cggccaa tggaggc tga gatctgagta gga cgtcaca acaagtgata Cccaatattt tggaccgg tgtgcaaaty ggg ttacaga cagaatttga tggaagaaga agcagaagca caatggagc c aaaagccaaa ggatgctgcr c tgaattcag tCtgtgccaa btccsstgg gggtacactg gtcaaaatbt ataaatgtaa tcagcttaga agaaaagaag gttggtgaga gctgcaggac aatggtttcc aotccttaga tcaacctaaa tttggccttt gaaacattac tagaacatat t cocaccaaa gcotggtgcc <223> Von Heijne matrix score 8.60000038146973 seq LLWLALACSPV}{T/TL <221> poiyk..signai <222> 1465. .1470 <221> poiyA-.site <222> 1489. .1497 <221> misc-feature <222> 958. .1110 <223> homology id :W72124 <221> <222> <223> <221> <222> <223> est misc-feacure 1362. .1488 homology id :W72124 est misc..feature 1202. .1312 homology id :W72124 est <221> misc..feature 165 <222> 1115.-1190 <223> homology -id :W72124 eat: <221> misc..Eeature <222> 1312. .1370 <223>. homology id !W72124 est <221> misc-feature <222> 653. .942 <223> homology id AA009415 est <221> misc-feature <222t, 454. .605 <223> homology id :AA009415 est <221> miac-feature <222> 598. .639 <223> homology id :AA009415 es <221>- misc,.feature <222> 805. .1032 <223> homology id :AA088502 eat <221> misc..feature <222> 633. .807 <223> homology id :AA088502 esr- <221> misc-feature <222> 598. .639 <223> homology id :AA088502 eat <221> misc-t.eature <222> 564. .605 <223> homology id :AA088502 <221> misc-feature <222> 653. .807 <223> homology id :AA181148 es t <221> misc-featcure <222> 907. .1046 <223> homology id :AA181148 eat <221> misc-.feature <222> 475. .605 <223> homology id :AA181148 eat <221> misc..feature <222> 598. .639 <223> homology id :AA181148 166 <221> <222> <223> <221> <22 2> <223> <221> <222> <223> <221> <222> <223> est misc-feature 1069. .1190 homology id :AA181149 est misc-feature 1362. 
.1475 homology id :AA181149 misc...feo.ure 1202. .1312 homology id :AA181149 est misc-feature 1312. .1370 homology id :AA181149 est, <3 00> <400> 58 atccggcgcg gcagatttcd agactgggtg cgaggc tgga ctggagcgtt anssagaaga cccgggagct ccc cactgtg t.tccggccgt cagagaagga gaggcagcca acacacctac gcgtttgtgg ccgtccggcc tccctgacat gcnagtggtc acggaatggg ccggggtcaa ccgtttcagc ctggccagcc ctctggaccc c atg cgg aca. ctc-tte aac ctc Met Arg Thr Leu Phe Asn Leu ccc t99 ctt gcc ctg gcc tgc age cct, gtt. cac act ace ccg tea *aag Leu Trp Leu Ala Leu Ala Cys Ser Pro Val His Thr Thr Leu Ser Lys tca gjac gcc asa aaa ccg ccc caa aga cgc tgc Ser Asp Ala Xaa Lys Pro Pro Gin Arg Arg Cys 10 15 ttt cag ata agc egg tgc aar acc ggg gtt tgg Phe Gin lie Ser Arg Cys Lys Thr Gly Val Trp tgg aga aga gtc agt Trp Arg Arg Val Ser tgg tgacggacct Trp caaagctgag agtgtggtcc acactttgct ggggatgtac caccaaggtc tttgggagca acgtggccgt gagatgtttg agctgtcagg aagcatgcca.
gacttacgat gatttccgga
gaccgtggtc caggtggcaa ccagctgcta agccagaagc cctgcaccag gcccggctgc cgaccagctg ggcatgttca tttcagcctc atgacctacg gtcctgggtt cgagccgcg tcctcctggg gcccaacttc ctgttgtcgg ggccaggtac gggacagcca ggcctcagag tCgtCttca cccaaccctg ggcgttgggg tccctcytg caggtgggca ttgcggcctc gcaggtgtga aatacaggcc <210> 59 <211> 1570 <212> DNA <213> Homo sapiens <220> <221> Sig-peptide t tgagcatcg tgggctatgt age ecacaca aggtcacggg agggcctgca acgecttaga agaaccagca gcgtgggcec tggccctcct cgcacaagga actactctac tccaggcct ta egg ca gg atccagacac cactecctcg aagtcccegc ggagc tgggc cgcggtggac tccactccgt cagctactgc cactccatgg gatctcaeccc cctccacgac catagegcct cagtgaggat ttt~cgatggc caccacatg ggteatcccg gtttgagcag agcgcatcag ggacccggaa actacgcgac tgaadggacc agtacaagaa dggcgcgggc cagggcctgg gegteyttee ttgcaaaaaa tcggcaaagg aacagccatg gtcggctgc gtggaccaag eggetcegt, gagatagagg ttcgtggtgg ctcacccact cetgccatca, ctggcccccg cctggccca gtccaag egg c Cccaaggat, acaggccccg gagccgcagt tggagcggc accacttyta ytaagccatg aaa cccgggacag gctacgatgt agttgaagag ggtggatgcg tegaggaceg agctgagcaa aggtctggaa tggecgaggc cccccgggac tgctggatgg a cgcacccc C cgaagcaaaa gcccgtgagc ggaatggtgt gggaggcacg ccgggagc cg cgacctgcty gagtgagtga 120 180 232 280 328 374 434 494 554 614 674 734 794 854 914 974 1034 1094 1154 1214 1274 133'4 1394 1454 1497 167 <222> 147. .248 <223> Von Heijne matrix score 4.30000019073486 seq QLIFAPLNLLPVEA/DI <221> PolYA-.signal <222> 1538. .1543 <221> polyA..site <222> 1558,..1570 <221> misc-feature <222> 466. .968 <223> homology id :AA506103 es t <221> misc-feature 142. .664 <223> homology id :AA237105 es t <221> misc-feature <222> 114. .269 <223> homology id :AA317201 est <221> misc-feature <222> 2.-122 <223> homology id :AA317201 eat <221> misc-feature <222> 401. .443 <223> homology id :AA317201 est <221> misc-feature <222> 103. .385 <223> homology id ;T80259 esc <221> misc-feature <222> 21. .120 <223> homology id :TS0259 es t ,221> misc-feature <222> 109. 
.459 <223> homology id :N32697 est <221> misc_feature <222> 45..87 <223> homology id :N32697 est <221> misc_feature <222> 92..122 <223> homology id :N32697 est <221> misc_feature <222> 1220..1409 <223> homology id :AA449621 est <221> misc_feature <222> 928..1092 <223> homology id :AA449621 est <221> misc_feature <222> 1178..1222 <223> homology id :AA449621 est <221> misc_feature <222> 1220..1545 <223> homology id :N34685 est <221> misc_feature <222> 1168..1222 <223> homology id :N34685 est <221> misc_feature <222> 1220..1545 <223> homology id :N22990 est <221> misc_feature <222> 1178..1222 <223> homology id :N22990 est <221> misc_feature <222> 114..325 <223> homology id :AA330462 est <221> misc_feature <222> 18..122 <223> homology id :AA330462 est <221> misc_feature <222> 135..475 <223> homology id :HUMESTSH12 est <300> <400> 59 agtcgtccct agggtaggga gaacttcaag gccagtactc cgggctgtgg gggtcggtgc ggatattcag tcatgaaatc cttctcccgc agcgacgcgg ctggcaagac tgtttgtgtt gcgggggccg gtgattttac aacgag atg ctg ctc tcc ata ggg atg ctc atg Met Leu Leu Ser Ile Gly Met Leu Met ctg tca gcc aca caa gtc tac acc atc ttg act gtc cag ctc ttt gca Leu Ser Ala Thr Gln Val Tyr Thr Ile Leu Thr Val Gln Leu Phe Ala ttc tta aac cta ctg cct gta gaa gca Phe Leu Asn Leu Leu Pro Val Glu Ala gaa aat gca tct cag aca ttt gat gac Glu Asn Ala Ser Gln Thr Phe Asp Asp 15 aga ctt cca gct gaa ggt tta aag ggt Arg Leu Pro Ala Glu Gly Leu Lys Gly att tta gca Ile Leu Ala ccc gca ara Pro Ala Xaa aac ttt Asn Phe ttt ggr.
tat Phe Gly Tyr ttt tea ate aac tca aaa cca Phe Leu Ile Asn Ser Lys Pro 169 eat gcc egt gaa ccc ata geg cct cca cca gta aaa gac eat tca Asn Ala Cys Glu Pro Ile Val Pro Pro Pro Val Lys Asp Asn Ser tc ggc act ttc Ser Gly Thr Phe geg tta act ara Val Leu Ile Xaa ctt gat tgt aat Leu Asp Cys Asn ata aag gt Ile Lys Val gca cag aga Ala Gin Arg gga eac aag gca Gly Tyr Lys Ala cac ehat gee gat tc gat gac His Asn Val Asp Ser Asp Asp agc atg gga Ser met Gly gcc ate gte Ala Ile Val aac gac ace Asn Asp Ile ggt gaa tca Giy Glu Ser gag gta Clu Val 105 tca gct Ser Ala cca aag add act Leu Lys Lys Ile act eca tct gec cee Ile Pro Ser Val Phe age tcC ctg Ser Ser Leu gaa ttc aca Glu Phe Thr aaa ggg ggc Lys Gly Gly aec eta gee Ile Leu Val tte agt ctt Phe Ser Leu teg gaa tac tac Leu Giu Tyr Tyr cta act Leu Ile 150 ccc etc cct Pro Phe Leu aeg atc aca Met Ile Thr 170 geg 9gc aec Val Gly Ile aec ttg aea Ile Leu Ile gtc act tcc Val Ile Phe 165 aga aac aga Arg Asn Arg ttg tcc agg Leu Ser Arg cat aga gct His Arg Ala Ctt cgt Leu Arg 185 gga gat Giy Asp aaa gat caa ctt Lys Asp Gin Leu aaa ccc cct gea Lys Leu Pro Val aaa etc aag aaa Lys Phe Lys Lys gag tat gat Giu Tyr Asp 413 461 509 557 605 653 701 749 797 845 893 941 989 1037 1085 1133 1181 1223 1283 1343 1403 1463 1523 1570 gcc act tge Ala Ile Cys gag tat gaa Giu Tyr Giu gac aaa cc Asp Lys Leu atc ccc ccc tge Ile Leu Pro Cys cat gct tat cat His Ala Tyr His tege gea gac Cys Val. Asp agg caa aaa Arg Gin Lys 250 age age caa Ser Ser Gin egg cta act aaa Trp Leu Thr Lys aaa acc egt Lys Thr Cys cca geg tgc Pro Val Cys 245 gac aca gac Asp Thr Asp gee ccc ct Val Pro Ser gae eca gac Asp Ser Asp gaa gaa eae Glu Giu Asn geg ace gaa cae Val Thr Glu His cct tea ctg aga Pro LeU Leu Arg gnc ttc tgt Xaa Phe Cys cca rgt cam Pro Xaa Xaa ggg gce tea nec Gly Ala Leu Xaa ant ccc gct cac Xaa Pro Ala His cag aak cat gac.
Gin Xaa His Asp aga atc Arg Ile 305 at cag ace Ile Gin Thr ase. gag Xaa Glu 310 gaa gac gac aae gaa gat act gac Glu Asp Asp Asn Giu Asp Thr Asp age gat gca gaa gaa Ser Asp Ala Giu Giu tgaaattaat catagcaaac ggtatatact tee tag eac tgccaagaat agctyttytt <210> 60 <211> 1022 gaacaegatg accgtttgac gtaaeeegat ytacageeea ataceecact ttggaatgaa ecgeggtcca tttcagaaga tc cegctcC atcaaae eac cac taataat agtatagcca geegcagcct aatggtgaac tgaeeggeee attccceee cteaaaagac eeyegeagaa tgaaacagga ctecegacyc agaccggtgc tgtaacecaa aaacaaaaaa aaaaaaa gggattacaa aaaaegaeea ataacttce ggtataec gcatcaattc 170 <212> <213> <220> <221> <222> <223>
DNA
Homo sapiens Big-peptide 112. .237 Von Heijne matrix score 7.19999980926514 seq ILFStJSFLLVIIT/FP <221> PolYk..Signai <222> 976. .981 <221> poiyA-site <222> 1010.-1022 <300> <400> *dat~t ct CtLccccc CtCCCaagca catctgagtt agctccaaac ccatgaaaaa ttgccaagta tadagcttc gctgcctgte Cttcacacct tCdagaatga g atg gat Met Asp tct agg gtg tct tca ccc Ser Arg Val Ser Ser Pro -35 gtc aac aac aaa cgg ctt Val Asn Asn Lys Arg Leu tct ttc ctg ttg gtg atc Ser ?he Leu Leu Vai Ile Ctg aag atc att aag gag Leu Lys Ile Ile Lys Giu Cgc atc caa gct gac aaa Arg Ile Gin Ala Asp Lys 30 cca tqc ata gat gtg ttt Pro Cys Ile Asp Val Phe aac att cct cca caa gag Asn Ilie Pro Pro Gin Glu gta gat gga gtt gtc tat Val Asp Gly Val Val Tyr gct aat gtc aac gat gtc Ala Asn Val Asn Asp Val act ctg aga aat gtc tta Thr Leu Arg Asn Vai Leu 105 110 gga cga gaa gag atc gcc GlY Arg Ciu Glu Ile Ala 125 acc gaa ctg tgg ggg atc Thr Giu Leu Trp Gly Ile 140 cgg att ccc gtg cag ttg Arg Ile Pro Val Gin Leu 155 acc cgg gaa gcg aga gcc Thr Arg Giu Ala Arg Ala 170 gct tcc aaa tcc ctg aag Ala Ser Lys Ser Leu Lys 185 190 aag caa gat aaa gag oat ttc Lys Gin Asp Lys Glu Asn Phe 9ta tgt ggc tgg atc ctg ttt Vol Cys Ciy Trp Ile Leu Phe 8CC ttc ccc atc tcc ata tgg Thr ?he Pro Ile Ser Ile Trp 1 gaa cgt gct gtt gta ttc cgt Glu Arg Ala Val Vol Phe Arg oag ggg cca, ggt ttg atc ctg Lys Gly Pro Giy Leu Ile Leu aag gtt gac ctc cga aca gtt Lys Val Asp Leu Arg Thr Val ctc acc ago gac tcc gta act Leu Thr Arg Asp Ser Val Thr 65 aga otc tat agt gct gtc tca, Arg Ile Tyr Ser Ala Val Ser 80 coo gca oco ttt ctg ctg gct Gin Ala TIhr ?he Leu Leu Ala 100 aca cag acc ttg tcC cag atc Thr Gin Thr Leu Set Gin Ile 115 agc atc cag act tto ctt gat Ser Ile Gln Thr Leu LeU Asp 130 gtg gcc cga gtg gao otc aaa Val Ala Arg Vai Gu Ile Lys 145 150 aga tcc atg gca gcc gag gct Arg Ser Met Ala Ala Glu Ala 160 165 tC ctt gca gct gaa gga gao Val Leu Ala Ala Clu Gly Giu 180 gcc tcc atg gtg ctg gct gag Ala Ser Met Val Leu Ala Glu 195 ata. 
gct ctc cag ctg cgc tac ctg cag acc ttgogacgagcac agc acg gta gcc acc 885 Ile Ala Leu Gin Leu Arg Tyr Leu Gin 205 gag aag aat tct acg att gtg ttt cct Giu Lys Asn Ser Thr Ile Val Phe Pro Thr Leu Ser Thr Val Ala Thr 210 215 ctg ccc atg aat ata cta gag Leu Pro Met Asn Ile Leu GIu 230 cac aag aag ctt cca aat aaa His Lys Lys Leu Pro Asn Lys 1) C ggc att ggt Gly Ile Gly 235 gtc agc tat Val Ser Tyr gat eac Asp Asn 240 gcc tgaggtcctc ttgcggtage cagctaaaaa aaaaaaaa Ala <210> 61 <211> 615 <212> DNA <213> Homo sapiens z220> <221> sig.eptide <222> 239,.316 Von Heilne matrix 1022 <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> score 3.90000009536743 seq ITWVSLFIDCVMT/RK polyA.signal 586..591 polyA.site 603..615 misc-feature 341..574 homology id :AA453275 miscfeature 174. .332 homology id :AA45327S est misc.feature 85..171 homology id :AA453275 est misc-feature 341..574 homology id :AA149631 ese misc-feature 170. .339 homology id :AA149631 est.
misc-feature 43..123 homology id :AA149631 est misc-feature 88..339 homology id :AA588414 est misc-feature 341..574 homology id :AA588414 172 est <221> misc-.feature <222> 1. .345 <223> homology id :AA1SEB47 est <221> misc-feature <222> 342. .414 <223> homology id :AA156847 es r <221> misc-.feature <222> 341. .574 <223> homology id :AA501739 eat <221> misc-jeature <222> 110. .339 <223> homology id :AA501739 es t <221> misc-feature <222> 341. .574 <223> homology id :AA131792 est <221> misc-feature <222> 153. .259 <223> homology id :AA131792 es t <221> misc-.feature <222> 259. .339 <223;- homology id :AA131792 es t <221> misc-.feature <222> 59. .338 <223> homology id :ALA131842 est <221> misc-.feature <222> 344. .415 <223> homology id :AA131842 est <221> misc-feature <222> 400. .434 <223> homology id :AA131842 es t <221> misc-feature <222> 341. .574 <223> homology id :AA152042 eat <221> misc-feature <222> 183. .339 <223> homology id :AA152042 est <300> <400> 61 173 agtgtgcctg tcccagagtt tgcagatagt atctttgaag aagaagaagt tgaatttatc gatcctgcca acattgttca tgaCtttaac aagaaactta cagcCtattt agatcttaac ctgaaagt gctatgtgat ccctctgaac acttccattg ttatgccacc cagaaaccta ctggagttac tattaacat caaggctgga acctatttgc ctcagtccca tctgattc atg agc aca tgg tta tta ctg atc gca ttg aaa aca ttg atc ace tgg Met Ser Thr Trp Leu Leu Leu Ile Ala Leu Lys Thr Leu Ile Thr Trp -20 gtt tet tta ttt atc gac tgt gtc atg aca agg aaa ctt aca aac tgc Val Ser Leu Phe Ile Asp Cys Val Met Thr Arg Lys Lau Thr Asn Cys -5 1 aac get aga gaa act att aaa ggt att cog aaa cgt goo gcc agc aat Asn Ala Arg Glu Thr Ile Lys Gly Ile Gin Lys Arg Glu Ala Scr Asn tgt ttc gca Cys Phe Ala act egg cat ttt gaa aac aaa ttt Ile Arg His Phe Giu Asn Lys Phe gee geg Ala Val goo act tta Glu Thr Leu acc cgc tcc tgaacagtca agaaaaacat tattgaggaa aattaatatc Ile Cys Ser acagcataac cccacecc gcaagtagca aacaggg cccaaaaaaa aaaaaa <210> 62 <211> 804 <212> DNA <213> Homo sapiens <220> tt acattttgtg cagtgattat tttttaaagt cctctcccac :tt tActatctcc 
tc tcdtc aactcaacta aaaccattac <221> <222> <223> <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <c222> <223> sig..peptide 157. .345 Von Heijne matrix score seq GLVCAGLADMARP/AE polyA..signal 771. .776 polyA-site 791. .804 misc-feature 244. .789 homology id :AA576425 eat misc-feature 286, .790 homology id :AA236527 est misc-feature 287. .790 homology id :AA435919 eat misc-feature 520. .790 homology id :AA165350 est <221> misc..Eeature <222> 389. .522 <223> homology id :AA165350 eat <221> misc-.feature 174 <222> 336.-386 <223> homology id :AA165350 est <221> misc-feature <222> 326. .790 <223> homology id :AA490322 est <221> misc-.feature <222> 326. .790 <223> homology id :AA490310 est <221> misc-feature <222> 515. .780 <223> homology id :AA164559 est <221:t misc-feature <222> 325. .522 <223> homology id :AA164559 est <221> misc-feacure <222> 350. .790 <223> homology id :AA427895 est <221> misc-feature <222> 37B.,.790 <223> homology id :AA532390 est <221> misc-feature <222> 186. .382 <223> homology id :AA082259 es t <221> misc-feature <222> 61. .141 <223> homology id :AA082259 est <221> misc-feature <222> 426. .478 <223> homology id :AA082259 est <221> misc-feature <222> 29. .61 <223> homology id :AA082259 est <221 misc..feature <222> 389. .790 <223> homology id :AA157009 es t <221> misc-feature <222> 425. .790 <223> homology id :AA034912 175 <221> <222> <223> <221> <222> <223> es t misc..feature 186. .430 homol1ogy id :A.A428006 est rnisc-feature 59. 
.132 homo logy id :AA428006 .esc <300> ,:400> 62 aacagcgggc cgcggcagga gctcccaccc cga ggc ctg Arg Gly Leu atg ctg ccc Met Leu Pro agggaagcc gcggg9 aaagtgacta vctccc ggctgcchaa ggatcc Cgg gcc aec tac Arg Ala Thr Tyr aaggg crtc ccteg taCtccaggc gt tgtcagcc gagaggcgga cgcgagtcgc agggacgaga acacagccac gcggcg atg teg gcc gcc ggt gc Met Ser Ala Ala Gly Ala c99 ctc ctc gat aaa gtg gag ctg Arg Leu Leu Asp Lys Val Giu Leu gag aaa ttg Glu Lys Leu Ctg Lac aac Leu Tyr Asn cat cca gca ggt ccc His Pro Ala Gly Pro tgg ggg ttg gtg tgt Trp Gly Leu Val Cys aca gtt ttc ttc Thr Val Phe Phe gct eca act atg Ala Pro Ile met gga ttg gct gat atg gcc aga cct gca Gly Leu Ala Asp Met Ala Arg Pro Ala aaa. ctr. agc aca Lys Leu Ser Thr caa tct gct G3.n Ser Ala ctt gta act Leu Val Ile 9tt ttg atg gct aca ggg Val Leu Met Ala Thr Gly att tgg tca Ile Trp Ser aga tac tca Arg Tyr Scr act ccg aaa Ile Pro Lys tgg agt Ctg ttt gct gtt aat ttc ttc Trp Ser Leu Phe Ala Val Asn Phe Phe 9t9 ggg Val Gly gca gca gga gcc tct cag ctt ttt cgt act tgg aga tat Ala Ala Gly Ala Ser Gin Leu Phe Arg Ile Trp Arg Tyr aac caa. gaa cta aaa gct aaa. gca cac aaa Gin Giu Leu Lys Ala Lys Ala His Lys tgaacaatct agatgtggac aaaaccattg ggi agcaaagcta actgtgtgtt tagaaggcac tgt gaaaaatgca gcaaactttt aataacagtc tc gactatcagta acatttctcc accatttgtc cg~ aaaaaaa <210> 63 <211> 792 <212> DNA <213> Home sapiens <220> <221> sig-.peptide <222> 194. .253 <223> Von Heiine mnatrix~ taaaagagtt cctgatcz 'cc ttattgataa tgat tcaa ta cttatctatg cgtaaaaaaa ctagtt :aactggt tctacatg taataaaa tattatttgg agctagttct acttaaggaa catacttgct <221> <222:> <221> <222> <221> <222> score 12.3999996185303 seq ALLLGALLGTAWA/ RR polyA..signal 768- 773 polyk..site 780, .792 misc-.feature 154. .428 176 <223> homology id :R22491 est <221> misc-feature <222> 104. .160 <223> homology id :R22491 <221> misc-.feature <222> 47. .218 <223> homology id :AA136163 est <221> misc..feature <222> 265.,.403 <223> homology id :AA136163 est <221> misc-.feature <222> 3. 
<223> homology id :AA136163 est <221> misc_feature <222> 123..265 <223> homology id :N57089 est <221> misc_feature <222> 47..127 <223> homology id :N57089 est <221> misc_feature <222> 282..323 <223> homology id :N57089 est <221> misc_feature <222> 128..403 <223> homology id :AA314970 est <221> misc_feature <222> 138..403 <223> homology id :AA314807 est <221> misc_feature <222> 164..403 <223> homology id :AA271811 est <221> misc_feature <222> 163..385 <223> homology id :AA103053 est <221> misc_feature <222> 154..403 <223> homology id :AA042016 est <221> misc_feature <222> 2..250 <223> homology id :AA315322 est <221> misc_feature <222> 154..403 <223> homology id :AA470189 est <221> misc_feature <222> 217..403 <223> homology id :AA462839 est <221> misc_feature <222> 154..403 <223> homology id :AA120322 est <221> misc_feature <222> 163..403 <223> homology id :W71694 est <221> misc_feature <222> 164..385 <223> homology id :AA250603 est <221> misc_feature <222> 266..403 <223> homology id :AA036242 est <300> <400> 63 aaggcggtcg cagccctgct cagcggtctt cattacgcta gcc ctg ct Ala Leu Leu
ccgggacacc ccgtgtgtgg caggcggcga asgctctgga ccccgcagcc aggtgtagtt tcgggagcca ctggggccaa ccagcgcttg ggccacggcg gcggccctgg gagcagaggt aag atg aaa ggc tgg ggt tgg ctg gcc ctg ctt Met Lys Gly Trp Gly Trp tLeu Ala Leu Leu gaatcccgga agtgagagtc ggagcgaccc ctg ggg Leu Gly acc gcc tgg gct cgg agg agc cgg gat ctc cac tgt Thr Ala Trp Ala Arg Arg Ser Arg Asp Leu His Cys gga gca tgc agg gct ctg gtg gat gaa cta gaa tgg gaa. att gcc cag Gly Ala Cys Arg Ala Leu Val Asp Glu Leu Glu Trp Glu Ile Ala Gln is gtg gac ccc aag aag acc att cag atg gga tcc ttc cgg atc aat cca Val Asp Pro Lys Lys Thr Ile Gin Met Gly Ser Phe Arg Il.e Asn Pro 30 35 gat ggc agc cag tca gtg gtg gag gta. act gtt act gkt tcc ccc aaa Asp Gly Ser Gin Ser Val Val Glu Val Thr Val Thr Xaa Ser Pro Lys 50 120 180 229 277 325 373 421 469 524 584 644 aca aaa gta, Thr Lys Val cac tct. 99c ttt tgg atg aaa att cga, His Ser Gly Phe Trp Met Lys Ile Arg ctg ctt aaa Leu Leu Lys aaa gga cct tgg tct taatagaaaa tgaagraaaa, cagactcaga aaaaaagatt Lys Gly Pro Trp Ser tbggctctgt ctcawtttgg aagaaggctggqcaggcttat tccccaatgc aactttgctt cctggctgca aaccyttaat acytttgttt ctgctgtaga aatttgttag ccaaaacawg 178 ggagtcctga twcagcaacc Cc~ttcca caatccacca tgactggttt tt 704 acttggggta tacatgcaaa accatccgtt Cmaaaatctg aatycggagc ttaaaaat aaaaatgaaa aacchaaaaa aaaaaaaa <210> 64 <211> 832 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 148. .207 <223> Von Heijne matrix atgtamc 764 792 <22 1> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> <2 21> <222> <223>, <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> score 12.3999996185303 seq ALLLGALLGTAWA/ RR PolyA-.Signal 7B9. .794 POlYA..Site 820. .832 misc-.feature 258. .553 homology id :AA435303 est misc..feature 117. .219 homology id :AA435303 est mi sc...feature 552. .645 homology id :AA435303 est mi sc-.feature 217. .258 homology id :AA435303 est misc..feature 258. .553 homology id :AA314807 est misc-feature 92. 
.258 homology id :AA314807 est misc..feature 258. .554 homology id :AA314970 es t misc..feature 82. .258 homology id :AA314970 est misc..feature 258. .553 homology id :AA547310 es t <221> misc..jeature 179 <222> 119.-.258 <~223> homology id :AAS47310 est <221> misc-feature <222> 359, .553 <223> homology id :AA565602 oat <221> misc..feature <222> 552. .683 <223> homology id :AA565602 est <221> misc-feature <222> 684. .751 <223> homology id :AA565602 -eat <221> misc-feature <222> 742. .783 <223> homology id :AA565602 eat <221> misc-feature <222> 364.. 553 <223> homology id :AA136094 eat <221> misc-feature <222> 552. .683 <223> homology id ;AA136094 est <221> misc-feature <222> 684. .751 <223> homology id :AA136094 eat <221> misc-feature <222> 258. .461 <223> homology id :AA136163 es t <221> misc-feature <222> 2. .172 <223> homology id :AA136163 eat <221> misc-feature <222> 216. .258 <223> homology id :AA136163 est <300>.
<400> 64 aggagaatcc cggacagccc tgctccctgc agccaggtgt agtttcggga gccactgggg ccaaagtgag agtccagcgg tcttecagcg cttgggccac ggcggcggcc ctgggagcag 120 aggtggagcg accccattac gctaaag atg aaa ggc tgg ggt tgg ctg gcc ctg 174 Met Lys Cly Trp Gly Trp Leu Ala Leu -20 ctt ctg ggg gcc ctg ctg gga acc gcc tgg gct cgg agg agc cag gat 222 Leu Leu. Gly Ala Leu Leu Gly Thr Ala -5 180 Trp Ala Arg Arg Ser Gin Asp ctc cac tgt gga gca tgc agg gct ctg gtg gat gaa act aga atg gga Leu His Cys Gly Ala Cys Arg Ala Leu Val Asp Glu Thr Arg Met Gly 15 eat tgc cca ggt gga ccc cee gaa gac cat tca gat ggg atc ttt ccg Asn Cys Pro Gly Gly Pro Gin Glu Asp His Ser Asp Gly Ile ?he Pro 30 gat caa tcc aga tgg cag cca gtc agt ggt gga ggt Ucc tta tgc cg Asp Gin Ser Arg Trp Gin Pro Val. Ser Gly Gly Gly Ala Lou Cys Pro cC aga ggc Leu Arg Gly eca cct eec aga gct get gga gga Pro Pro His Arg Aia Ala Gly Gly get aeg tgaccggatg Asp Met aaggagtacg gggaaca ggccggaatg gagaate agcggcaccc tcaagbt' tgaattct.tt tceegag tccttgtgac catgccci acactggctC gatggat tacgttttac tgaaatt <210> 65 <211> 721 <212> DNA <213> Homo sapiens <220> ttg agg tgc :c aac tgatcct tee cgaaetggae cgtgtgggaa ctgacaatgt acatatcgge ccccaggnaa tgaaaaatat acceatcgea c tacaaggca Cattgtggag caaagacaea atgatgagct gggaaaatgg gaaaccaaaa agaactacgt tccgaaccga gaataegagg ctttgcagta atgaaccact tggcaatgec gtscaaaaaa acgtgtagtg ctcagatatt atgaacccat agcgaacaga ggagcagccc ttttatatat aaaaaaa <221> <222> <223> <221> <222> -q22 1> <222> <22 1> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig...peptide 156. .230 Von Heijne matrix score seq MFAASLLAMCAGA/EV polyk..signal 706. .711 PolyASitQ 709. .721 misc-.feature 351. .688 homology id :H98648 est misc-f.eature 289. .353 homology id :H98648 est misc-feature 274. .641 homology id ;AA181022 est mi sc-feature 255. .286 homology id :AA181022 est zuisc..feaeture 242. .641 homology id :AA143192 es t mi sc.feature 261. 
.646 homology 181 id :AA594850 est <221> misc-.feature <222> 165.-474 <223> homology id :AA563681 est <221> misc-f.eature <222> 1. .74 <223> homology id :AA563681 est <221> misc_ feature <222> 261. .643 <223> homology id :AA287457 es t -<221> misc-feature <222> 352. .646 <223> homology id :N22S67 es t <221> misc-feature <222> 299. .354 <223> homology id :N22567 eat <221> misc-feature <222> 265. .303 <223> homology id :N22567 est <221> misc-.feature <222> 30. .165 <223> homology id :AA186657 est <221> misc-feature <222> 270. .349 <223> homology id :AA186657 est <221> misc-feature <222> 213. .261 <223> homology id :AA186657 eat <221> misc..feature <222> 165. .214 <223> homology id :AA186657 es t <221> misc-feature <222> 346. .387 <223> homology id :AA186657 est <221> misc-feature <222> 52. .400 <223> homology id :HSClEDO81 est <221> misc-feature 182 <222> 398. .436 <223> homology id :HSCIED08l est <221.> misc-.feature <222> 171. .316 <223> homology id :AA143136 t <300> <400> aCctgggtC cggcccgctc ccctgcagtt cgcggwacag gcccacagtc cctctcccCc atg tcc acc tac ctg ad Met Ser Thr Tyr Leu Ly gcmgtccgct ccgtccgccc tagacccgt cgcccagcac tccctattag agcgcgtgca tagaggcaga kaggagtgaa tagagccC9c cgacc atg ccc gcg ggc gtg ccc Met Pro Ala Gly Val Pro a atg ttc god goo agt, ctc ctg gcc atg tgc sa Met Phe Ala Ala Ser Leu Leu Ala Met Cys gca ggg gca Ala Oly Ala -10 gad gtg gtg cac agg tac tac cga ccg Glu Val Val His Arg Tyr Tyr Arg Pro gac ctg aca ata Asp Leu Thr Ile 1 5 cct ga act cod cca aag cgC gga gad ccc aaa acg gag ctt ttg gga Pro Glu Ile Pro Pro Lys Arg Gly Glu Leu Lys Thr C1u Leu Leu Gly is 20 ctg aaa gaa aga aaa cac aa cot ca gt tcc caa cag gag gaa ccc Leu Lys Glu Arg Lys His Lys Pro Gln Val Ser Gin Gln Clu Glu Leu 35 40 aaa taactatgcc aagaatctg tgaataatat aagtcttaaa tatgcattto Lys ccaatttatc gcatcaai gaggtgctca gtggatgt tCctgaaaac cgccaaa tctagccccg aatataac atttcatata aactaagg ga <210> 66 <211> 531 <212> DNA <213> Homo sapiens <220> tt ;ca :gc iaa acttgtcctt agccgatacg 
catatcacca gaaatagaat ttatccaaaa aagcac ttag ttgaasttta aaccatttca atttgtaagt ac tatgaact tctaatgcta atcacggccc tgaat atgg t otactatatg aggttccatt aocgcaagag gatccga tac t t tggaagatg ggtgtctt aaaaaaaaaa <221> <2 22> <223> <221.> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig..peptide 272. .397 Von Heijne matrix score 4.59999990463257 seq RIPSLPGSPVCWA/WP polyk..signal 503. .508 po lyA.s ite 518. .531 misc-feature 235. .517 homology id :AA524403 eat misc-feature 52. .208 homology id ;AA524403 es t miao...feature 259. .517 homology 183 id ;N93660 es t <221> misc-feature <222> 85. .207 <223> homology id :N93600 est.
<221> misc-..eature <222> 353. .517 <223> homology id :AA594610 est <221> misc-.feature <222> 258. .363 <223> homology id :AA594610 eat <221> misc-feature <222> 105. .207 <223> homology id :AAS94610 eat -,221> misc-feature <222> 202. .517 <223> homology id :AA074748 est <221> Misc-feature <222> 116. .153 <223> homology id :AA074748 eat <221> misc feature <222> 167.-202 <223> homology id :AA074748 eat <221> misc-feature <222> 258.-517 <223> homology id :N93603 est <221> misc-.feature <222> 208. .251 <223> homology id :N93603 es t <221> misc feature <222> 163. .202 <223> homology id :N93603 est <221> misc-.feature <222> 90. .125 <223> homology id :N93603 es t <221> misc-feature <222> 125. .363 <223> homology id :HSP004938 est <221> misc-feacure 184 <222>. 353. .517 <223> homology id :HSP004938 est <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> mi sc-f.eature 28. .227 homology id ;AA074804 es t: misc-.feature 265. .310 homology id :AA074804 eat m±ac-fea ture 227. .263 homology id :AA074804 ear: Cisc-feature 352. .385 homology id :AA074804 est <300> <400> 66 aaaaggaaag aggaagttga crcgcrcacgag tcggacggag C tcaacaaga aggtysggag aaggcccaga aagagaatgg agcgcgagga crtatgcgca: cgctcgcgag ggaggcc tcc ctgttgcagt ctcggcggct tcgcatrgaca atctcggacc acccaacctg gggcaaatgg ccggagctgg cggcgtcaga gcagctccag gagcgcgccc gacagcagct g acg gac gga cac tgg Met Asp Gly His Trp aaaggtgct t acc gaccatg tgccggggat agaggcgctg rtcg gct Ser Ala gc r Ala ,agg Arg rtc tct gca ctg Phe Ser Ala Leu 8CC gtg act gca atg tca tcc tgg gct cgg cgc Thr Val. Thr Ala Met Ser Ser Trp Ala Arg Arg -30' -25 agt tcc tca agc cgt cgg atr: cct tct ctg ccg qgg agc ccc gtg Scr Ser Ser 5cr Arg Arg Ile Pro Ser Leu Pro Gly Ser Pro Val tgc tgg gcc tgg cca tgg tac ccg gac aCC aca tcg ttt cca ttg agg Cys Trp Ala Trp Pro Trp Tyr Pro Asp Thr Thr Ser Phe Pro Leu Arg 1 5 tgc aga ggg aga gtc tgaccgggcc tccgtatctc tgaccacgat ggcgCttacc Cys Arg Gly Arg Val.
tttcagactt cattaaactt atgaccaaaa aaaaaaaaaa
<210> 67
<211> 783
<212> DNA
<213> Homo sapiens
<220>
<221> sig_peptide
<222> 381..629
<223> Von Heijne matrix score 8.60000038146973 seq LELLTSCSPPASA/SQ
<221> polyA_signal
<222> 736..741
polyA-.site 770. .783 <221> misc-.fea~ <222> 207. .263 <223> homology ture 185 id :AA351230 es t <300> <400> 67 agggactt cc agaactgtgc CCgggg tagg gga tggcgga ccetteacccg tgatcgggtt agtgaagaga cag gct gga Gln Ala Gly ggcc tcgctg tgggaagga t gtttgagcc gacgaaggac ggaggagtgg ccaoaaccag ggcccccac gcgtggacgt ttgtggtggg gcgtgttggt ccgcgctctc ggtagggcga ctggggctca cctccgcacc gttgtaggac cgtgggagct gccccacgcg gcctcgtcct gccaacggec gcagcgcaga tgttggtgac cttcaaggat gtggctgtga agacagcCgg acctggccca gaggaccctg taccgagagg agttggtcco cctgctagag catgggcagg agctgcggac atg cta cct gtg cag agt Ctc act ctt gtt gcc Met Leu Pro Val Gin Ser Phe Thr Leu Val Ala gtg cag tgg cgc Val Gln Trp Arg ctc agc tca ctg caa Leu Ser Ser Leu Gin cte ctq cct Leu Leu Pro ccc gag Pro Glu tac agg Tyr Arg ttc aag gga tcc Phe Lys Gly ?he tgc ctc agc ctc Cys Leu Ser Leu agt agc tgg gat Ser Ser Trp Asp 120 180 240 300 360 413 461 509 557 605 653 701 754 cgc cca cca cca tgc ccg gct ggt Arg Pro Pro Pro Cys Pro Ala Gly ttt gta ttt Phe Val Phe aeg ggg ctt Thr Gly Leu cat gct ggc cag gct ggt cct gaa ccc His Val Gly Gin Ala Gly Leu Glu Leu tta gta Leu Val ttg ac Leu Thr aca ggc Thr dly tca tgc agt cca ccc gcc tc gcc tcc caa agt gct geg act Ser Cys Ser Pro Pro Ala Ser Ala Ser Gin Ser Ala Ala Ile gcg agc cac gcg Val Ser His Val aat tca aga aaw Asn Leu Arg Xaa ccc ggc aaa Pro Gly Lys 15 aaa aaa ccg ctt Lys Lys Leu Leu gtt gaa aag aaa Val Glu Lys Lys tcg ccg acg gra aca aaa acy Leu Leu Thr Xaa Ile Lys Thr taacaaaact accacccgaa ggaatgaaaa aaccaaaaaa aaaaaaaaa <210> 68 <211> 996 <212> DNA <213> Homo sapiens <220> <221> <222> <223> <221> <222> <221> <222> <221> <222> <223> <222> <223> <221> <222> <223> sig-.peptide 140. .205 Von Heijne matrix score 5.90000009535743 seq IILGCLALFLLLQ/RK polyAsignal.
965. .970 po lyA.s ite 984. .996 misc-feature 676. .959 homology id :AA399103 est misc-..eature 609. .679 homology id :AA399103 est misc-feature 225. .433 homology id :AA398040 est 186 <221> misc-feature <222> 433. .563 <223> homology id :AA396040 estz <300> <400> 68 acagccacg aaggagagct gcaaaagttg cagcagaaag gttgggagtc ccgacaggtt ccgcagccca cagaaaagaa gcoagggacg gcaggactgt ttcacacte tctgctcctg gaaggtgctg gacaaaaac atg gaa cta act tcc cca aca gtg act ata ac Met Giu Lau Ile Ser Pro Thr Val Ile Ile Ile ceg gg: t.gc c:t gct ctg ttc tta etc ctc cog cgg eag aat ttg cgc Leu Gly Cys Lau Ale Leu Phe Lou Leu Leu Gin Arg Lys Asn Lau Arg aga ccc Arg Pro gak tt Xaa Phe gta cgt Val Cys -Z 1 ccg tgc atc aag 99C t99 att cct tgg act gga gct gga ttt Pro Cys Ile Lys.Gly Trp Ile Pro Trp Ile Gly Val Gly Phe 15 ggg aaa gcc cct cta gaa ttt ate gag aaa gca aga atc aag Cly Lys Ala Pro Leu Gi-u Phe Ile Giu Lys Ala Arg Ile Lys 30 gg t Gly cgt ggc ava Cg9 99t ctc cag agg aga Arg Cly Xaa Arg Ciy'Leu Gin Arg Arg caa tgc ttt c Gin Cys Phe Leu ttt taaactttct ttcattgact cttaagtgca 99getagaac acggggaaca Phe tacctgcttg caacaaatc cattttgga tgcgtatgtg agcttgtgea actgcctgca tatttgatga tagtagatgt ccecat taat at~gtytaat <210> 69 <211> 657 cctcacta c :gtgcaaaa agtagagat t ttaagtaaac agagcaat cagttggtcc tgatgtactc tt.tgtcttg gtyttttaag aaatttgtat aaggatc tag ttttgcgaaa aacyyt tcgt gttttcagta tccgtatcca gtacacaata cettttcamt aaagtatctt ttttattcaa gatattaaaa tCM~ytctga gaaatgaaat atttttactt aytgggaaag aaytggagtt gacaggc tyt acggcccgaa ttaaatgtyt tttccagtca aaaaaaaaa aktcctctac acaacgcmg cmtcgaagtt ataaagtgta aacaccaaag g tat tt ttag gagamtagta gagcacttta caaatatttc tsacrrtcra cgtgcatcga aagttccaaa atccaattta tattgtacaa c tgacgttgt atcctccttg aggaacagac atggtartctg <212> <213> <220> <221> <222> <223> <221> <222> <221> <222> <22 1) <222> <223>
DNA
Homo sapiens
sig_peptide 183..338 Von Heijne matrix score 3.79999995231628 seq VMLETCGLLVSLG/QS
polyA_signal 620..625
polyA_site 644..657
misc_feature 207..263 homology id :AA357230 est
<300>
<400> 69
agggacttcc agaactgtgc ccggggtagg gg atg gcg Met Ala gcctcgctg gcgtggacgt ttgtggtggg gcgtgttggt tgggaaggat ggtagggcga ctggggctca cctccgcacc gttttgagcc cgtgggagct gccccacgcg gcctcgtcct gag acg aag gac gca gcg cag atg ttg gtg acc Glu Thr Lys Asp Ala Ala Gln Met Leu Val Thr ccgcgctctc gttgtaggac gccaacggc ttC aeg Phe Lys gat gtg gct Asp Val Ala gCc cag agg Ala Gln Arg -45 gtg acc ttt aCC cgg gag Val Thr Phe Thr Arg Glu gag tgg aga cag Glu Trp Arg Gln ctg gac ctg Leu Asp Leu acc ctg tac Thr Leu Tyr Cga gag gtg Arg Glu Val atg ctg gag ecc tgt ggg ctt Met Leu Glu Thr Cys Gly Leu ctg gt Leu Val tc cta ggg caa agc act tgg ctg cat ata aca gaa aac ceg Ser Leu Gly Gln Ser Ile Trp Leu His Ile Thr Glu Asn Gln atc aaa ctg Ile Lys Leu gct tca Ala Ser 1
ccc gga agg aaa Pro Gly Arg Lys aag cct gag geg egg ttg gct cca ggC Lys Pro Glu Val Trp Leu Ala Pro Gly tcc act Phe Thr ceg etc Leu Phe aac ecg cct gat gag Asn Ser Pro Asp Glu ggt gcc gca gcc cag Gly Ala Ala Ala Gln ttggttcagg ctctggatt gccrtcgactg gggcagcagg ctttcttttc acaacgagaa cgacgccatc aaggatgtc tggcccg tcctcc gtccecaggc cggctcceca tagggatgct gggtgctgca cccccatggc ccaacccatc ctcccaccct ggaecaaacg
<210> 70
<211> 416
<212> DNA
<213> Homo sapiens
<220>
<221> sig_peptide
<222> 140..205
<223> Von Heijne matrix score 5.90000009536743 seq IILGCLALFLLLQ/RK
<221> polyA_signal
<222> 383..388
<221> polyA_site
<222> 405..416
<221> misc_feature
<222> 225..316
<223> homology id :AA398040 est
<300>
<400> 70
aacagttacg aaggagagct gcaaaagttg ccgtagccca cagaaaagee gcaagggacg gaaggtgctg gacaaaaac atg gaa cta Met Glu Leu cagcagaaag gttgggagtc ccgacaggtt gcaggactgt ttcacacttt tctgctcg at tcc cca aca gtg act ata atc Ile Ser Pro Thr Val Ile Ile Ile ctg ggt Leu Gly aga ccc Arg Pro gag ttt Glu Phe tgc ctt gct ccg ttc tta Ctc ctt cag cgg eag aat ttg Cys Leu Ala Leu Phe Leu Leu Leu Gln Arg Lys Asn Leu ccg tgc atc Pro Cys Ile ggg aaa gcc Gly Lys Ala aag ggc cgg act cct egg act gga gtt Lys Gly Trp Ile Pro Trp Ile Gly Val gga ctc Gly Phe cct cta gee Pro Leu Glu ata gag aaa gca Ile Glu Lys Ala aga atc aag Arg Ile Lys atg acc ttc Met Thr Phe cat gga cca aca ttt aca gc Tyr Gly Pro Ile Phe Thr Val gct aeg gga aac Ala Met Gly Asn gtt act gaa gaa gga agg aat taatgtgcec ctaaaatcca aaaaaaaaaa a Val Thr Glu Glu Gly Arg Asn
<210> 71
<211> 543
<212> DNA
<213> Homo sapiens
<220>
<221> sig_peptide
<222> 129..176
<223> Von Heijne matrix score 4.80000019073486 seq SLFIYIFLTCSNT/SP
<221> polyA_signal
<222> 513..518
<221> polyA_site
<222> 530..543
<221> misc_feature
<222> 264..500
<223> homology id :AA534039 est
<221> misc_feature
<222> 205..315
<223> homology id :T82645 est
<221> misc_feature
<222> 295..382
<223> homology id :T82645 est
<221> misc_feature
<222> 375..405
<223> homology id :T82645 est
<300>
<400> 71
actgtccca: tcctcccc acaacacaca caccttcag gcagggasgn gatgagcttc cagccccaag agtggaggc: gccacatcct aacatasgta tctattgaaa aggaagcagt Met Ile Ile Ser Leu Phe Ile Tyr Ile Phe Leu Thr Cys Ser -10 aac acc tct cca tct tat caa gga act caa ctc ggt ctg ggt ctc ccc Asn Thr Ser Pro Ser Tyr Gln Gly Thr Gln Leu Gly Leu Gly Leu Pro gcc cag tgg tgg Ala Gln Trp Trp ttg aca ggt agg Leu Thr Gly Arg Cta ttt tgt ccc Leu Phe Cys Phe caa aac tgt Gln Asn Cys ctg att cag cat gat ccc tgt gag ctg gctt ctc Leu Ile Gln His Asp Pro Cys Glu Leu Val Leu atg cag tgc tgc agg Met Gln Cys Cys Arg ccC tC ccc ctc cac Pro Phe Pro Leu His aca atcc tcc tgg gac Thr Ile Ser Trp Asp taaccatact gtcttccttt ctcccctccct cccttgccct aaactatcaa gtgaaaaaaa t99g gct gag Trp Ala Glu gca ggg gct tcg Ala Gly Ala Ser 55 tat tct ccc Tyr Ser Pro cccccttgcc act:Cagcagt tatcccccca gctatgcctC ggcatatatt gtgccttatt tatgctgcaa atcataacatt aaaaaaaa
<210> 72
<211> 605
<212> DNA
<213> Homo sapiens
<220>
<221>. sig..peptide 189 <222> 285. .341 <223> Von Heiine matrix score 5.59999990463257 seq PTLCVSSSPA.LWA/AS <221> polyA..signal <222> 575. .580 <221> POIYA-.Site <222> 592. .605 <221> misc-Eeature <222> 53. .296 <223> homology id :W07033 <221> misc-.feature <222> 348. .432 <223> homology id :W07033 es t <221> misc..feacure <222> 435. .497 <223> homology id :W07033 es t <221> misc-.feature <222> 293. .337 <223> homology id :W07033 est <221> misc-feature <222> 521. .560 <223> homology id :W07033 est <221> misc-.feature <222> 489. .520 <223> homology id :W07033 est <221> misc-feature <222> 15. .337 <223> homology id :AA15lOO4 est <221> misc-feature <222> 348. .412 <223> homology id ;AA1S1004 est <221> misc-feature <222> 434. .485 <223> homology id :AA2.51004 est <221> misc-feature <222> 83. .324 <223> homology id :AA476506 est <221> misc-feature <222> 347. .560 <223> homology id :AA476506 est <221> misc-featUre 9 <222> 16. .347 <223> homology id :W56567 est <221> misc-feature <222> 350. .405 <223> homology id :W56567 est <221> misc-feacure <222> 433. .470 <223> homology id :W56567 est <221> misc-.feature <222> 15. .296 <223> homology id :AA147584 <221> misc-.feature <222> 348.-421 <223> homology id :AA147584 est <221> misc-feature <222> 293. .337 <223> homology id :AA147584 est <221> misc-feature <222> 419. .453 <223> homology id :AA147584 est <221> misc feature <222> 2. .338 <223> homology id :AA281959 est <221> misc feature <222> 350. .432 <223> homology id :AA281959 est <300> <400> 72 aacgcctwta agacagcgga actaagaaaa gaagaggcct gtggacagaa caatcatgtc tgactccctg gtggtgtgcg aggtagaccc agagctaaca gaaaagctga kgaaattccg 120 CttCcgaaaa gagacagaca atgcagccat cataatgaag gtggacaaag accggcagat 180 ggtggtgjctg gaggaagaat ttcagaacat ttccccagag gagctcaaaa tggagttgcc 240 ggagagacag cccaggttcg tggtttacag ctacaagtac gtgc atg acg atg gcc 296 Met Thr Met Ala gag tvt cct acc ctt tgt gtt tca tct tct cca gcc ctg tgg gct gca 344 Glu Cys Pro Thr Leu Cys Val. 
Ser Ser Ser Pro Ala Leu Trp Ala Ala -10 agc gaa aca aca gat gat gta tgc agg gag taaaaacagg ctggtgcaga 394 Ser Glu Thr Thr Asp Asp Val. Cys Arg Glu cagcagagct cacaaaggtg tccgaaatcc gcaccactga tgacctcact gaggcctggc 454 tccaagaaaa gttgtctttc tttcgttgat ctctgggctg gggactgaat tcctgatgtc 514 tgagtcctca aggtgactgg ggacttggaa ccotaggac ctgaacaacc aaggacettta 574 aataaatttt aaaatgcaaa aaaaaaaaaa a 0 191 <210> 73 <211> 864 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 136. .444 <223> Von Heijne matrix score 4.90000009536743 seq VYAFLGLTAPSGS/KE <221> polyA.signal <222> 835. .840 <221> polyk..site <222> 851. .864 <221> misc..feature <222> 222. .456 <223> homology id :AA136758 eat <2 21> misc-feature <222> 557. .648 <223>.homology id :AA136758 est <221> misc-feature <222> 501. .571 <223> homology id :AA136758 eat <2>misc-.feature <222>- 130. .456 <223> homology id :AA393612 en t <221> misc..feature <222> 88. .130 <223> homol1ogy id :AA393612 eat <221> misc..feature <222> 501. .538 <223> homology id :AA393612 eat <221> misc-feature <222> 130. .458 <223> homology id :R59039 est <221> misc-feature <222> 71. .130 <223> homology id :R59039 est <221> misc-feature <222> 557. .716 <223> homology id :W48624 eat <221> misc..feature <222> 365. .456 <223> homology id :W48624 192 est <221> misc-feature <222> 501. .571 <223> homology id :W48624 eat <221> misc-feature <222> 716. .751.
<223> homology id :W48624 est <221> misc..feature <222> 222. .458 <223> homology id :AA136810 eat <221> misc-feature ,<222> 501. .581 <223> homology id :AA136810 est <221> misc-f.eature <222> 587.,.668 <223> homology id :A.A136810 est <221> misc-feature <222> 130. .419 <223> homology id :T35647 es t <221> misc-.feature <222>. 59. .130 <223> homology id :T35647 eat <221> misc-feature <222> 557. .852 <223> homology id :HUMO93FO6A est <221> misc-.feature <222> 501. .571 <223> homology id :HUM093FO6A eat <221> misc-feature <222> 130. .384 <223> homology id :T35666 est <300> <400> 73 aaagttctcc gga ttccagc cgctgttccc ttccaccttc ccccaccctt ctctgccaac cgctgtttca gcccctagct cattgctgca gctgctccac agcccttttc aggacccaaa caaccgcagc caggr atg gtg atc cgt gta tat att gca tct tcc tct ggc Met Val Ile Arg Val Tyr Ile Ala Ser Ser Ser Gly -100 tct aca gcg art aag. aag aaa caa caa gat gtg ctt ggt Ser Thr Ala Ile Lys Lys Lys Gin Gin Asp Val Leu Gly -8s -80, ttc cta gaa Phe Leu Glu gcc aac aaa ata gga ttt gaa gaa aaa gat att gca gcc aat gaa gag Ala Asn Lys Ile Gly Phe Glu Glu Lys Asp Ile Ala Ala Asn Glu Giu 193 aat cgg aag tgg atg aga gaa aat Asn Arg Lys Trp Met Arg GiU Asn aca ggt aac ccc ctg cca cct cag TIhr Cly Asn Pro Leu Pro Pro Gin gta cct gaa aat agt cga cca, gcc Val Pro Glu Asn Ser Arg Pro Ala -50 att ttc aat gaa agc cag tat cgc Ile Phe Asn Glu Ser Gin Tyr Arg 315 363 411 459 ggg gac tat gat Gly Asp Tyr Asp gec tee ttt gaa gcc aga gaa aat aat gca gtq tat Ala PhQ Phe Glu Ala Arg Clu Aun Asn Ala Val TIyr gc ttc Ala Phe tta ggC ttg aca Leu Gly Lou Thr gco cca Ala Pro Ccc ggt tca aag gad gca gga agg Ser Gly Ser Lys Clu Ala Gly Arg tgc aag cad agc agC aag oca tgaacccega gcactgtgct tttaagcatc Cys Lys Gin Ser Ser Lys Pro ccgaaaaaeg agtctcci aataggstta atgtcga aatacaaaat taaaattl taaaattata ttaataa gcactacacca atgattel acctaagtgt acytgcac <210> 74 <211> 1033 <212> DNA <213> Homo sapiens <220> att ~aat tga gta tac gc t got tttataa aatagae tag acattatggt gatatyg tag aaagaaaaca ttacattaaa 
aatagcagaa ttgggttttc gattatggtg aaatagegtt aaggagaaag ttagctttgo aca tgcaaao aggagaatgg getacctgoc ccytgocat agaaaaaaaa a ttoaaaaga amtcaaaatg gatattaaca aagccaecct tamtacggca aaaa <221> <222> <223> <221>- <2 22> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig-.peptide 200. .427 Von Heijne matrix score 4.69999980926514 seq LIVYLW'ASFIAS/SS polyk-.signal 1001. .1006 Pclyk-site 1022. .1033 misc-feature 55. .406 homology id :AA056667 est misc-.feature 397. .487 homology id :AA056667 eat misc-feature 527. .584 homology id :AA056667 est misc-feature 482. .531 homology id :AA056667 est miso..feature 581. .634 homology id :AA056667 est <221> misc-.feature <222> 397. .700 194 <223>. homology id :AA044187 est <221> misc..feature <c222> 222. .406 <223> homology id :AA044187est <221> misc..feature <222> 693. .748 <223> homology id :AA044187 <221> misc..feature <222> 68. .406 <223> homology id :AA131958 es t <221> misc-.feature <222> 397. .517 <223> homology id :AA131958 eat <22 1> misc-feature <222> 510. .558 <223> homology id :AA131958 est <221> misc-feature <222> 77. .531 <223> homology id :W95957 eat <221> misc-feature <222> 527. .558 <223> homology id :W95957 es t <221> misc-feature <222> 397. .586 <223> homology id :AA041216 eat <221> misc-.feature <222> 286. .406 <223> homology id :AA041216 est <221> misc-feature <222> 582,..700 <223> homology id :AA04l2l6 eat <221> misc..feature <222> 77.-406 <223> homology id :W95790 est <221> misc-.feature <222> 397. .539 <223> homology id :W95790 est 195 <221> <222> <223> <221> <222> <223> misc..feacure 474. 
.760 homology id ;AA461134 est miSC-feature 788.-.940 homol2.ogy id :AA46ii34 es c <300> <400> 74 aagacgaggt cctgadgtgd tgcgagagaa CCaagtgttg catgaatcat gtgacggtgg cagcggagag aaccaggcag gggggcccat cacggcggat aagggctcc atg cca ttg Met Pro Leu Cttgaggagg aacctgct taaagctgc cccagaaacc ccaggcgetgg agattgacc gacctaaagc gatctegta caaaaagtta ttg tgt cag ata gag atg gag tac Leu Cys Gin Ile Ciu Met Glu Tyr tta tta aag tgg Leu Leu Lys Trp 4tg aca atg ccc cag agc atg ctt tgc Met Thr Met Leu Gin Ser Met Leu Cys gac ctg gtt tc tat Leu Val Ser Tyr ctt ttg ccc ttg Leu Leu Pro Leu cag acc aag gaa Gin Thr Lys Giu -50aa Aca Aac 120
180
232 280 328 376 424 4172 tcg gac ttt Leu Asp Phe agg tgg tcc Arg Trp Phe agc agc ac Ser Ser Ser aaa ata aaa gta tca tc gtt act aca Lys Ile Lys Val Ser Ser Vai Thr Ile Al aca cct acc Thr Pro Thr tta atc get tac ctt tgg gtg gtg agc ccc ata gcc Leu Ile Val Tyr Leu Trp Val Val Ser Phe Ile Ala gcc eat aca gga cta. etc gc Ala Asn Thr Gly Leu Ile Val cta gaa aag gaa Leu Glu Lys Glu gct cca ttg ttt gaa gae ctg aga cae gte gtg gee gte tc Ala Pro Leu Phe Giu Giu Leu Arg Gin Val Val Giu Vai Ser taatctgaca gtggttt agcaaccc agactac caacttatac taaagag tctggtgta gggtcc ctatcaacag ctcccat tggatcagaa tcaaact~ aacacgccca ggcttgc 4 tttttYttca gattacg aaaagtaata aaaccagl <210> 75 <21i> 499 <212> DNA <213> Homo sapiens <220> :ag aat ca tc gga 9gt aga tac tgtgcacctt aaaccteca gca ta age C tattcagtga gttagtctgg acattgacc ataaagccaa tatttytttg aatcaceaaa atcttcatta tccatgtgct gtaacccaca gactaggga ecacegacac ac ttgagccg cc ttcat tg cattgagcga aaaaaaaaa taacaacaca ceagaaeggg gatagaccag taccacagaa ggatgegaga e taegcgccg tgaataataa ggaaceeaaa atatcaatccc cccctttc c ttgctatatc atggttcagc ttytacccag ccaettgtac teeggacata atggcttgge 574 634 694 754 814 874 934 994 1033 <221> sig-pepeide <222> 68. .133 <223> Von Heijne matrix score 9.80000019073486 seq LWVFCLALQLVPG/sp <221> poiyA.signai <222> 472. .477 <221> poiyA.site <222>- 490. 
.499 <300> <400> aaacagcagt gcctggccaa acccagcaac ccceggccag aacccactca cccatcccac tgecacc atg aag ccc gtg ctg ccc ccc cag ccc ctg gtg gtg ccc tgc Met Lys'Pro Vai ota gca ctg cag ctg gtg Leu Ala Leu Gin Leu Val 196 Gin Phe Leu Val Val Phe Cys Leu Pro Leu cct ggg agc Ccc aag cag cgt gtt Pro Gly Ser Pro Lys Gin Arg Val ctg aag Leu Lys tat atc Tyr Ile ttg gaa cct cca ccc tgc ata tca gca cct gaa aac tgt act Leu Glu Pro Pro Pro Cys Ilie Ser Ala Pro Glu Asn Cys Thr ctg tgc aca atg cag gaa gat tgc Leu Cys Thr MetGin Glu ASP Cys gag aa Giu Lys tca ga Ser Glu gga ttt cag tgc Giy Phe Gin Cys tcc ttc tgt Ser Phe Cys ata gtc tgt tca Ile Val Cys 5cr aca ttC aa Thr Phe Gin aag cgc Lys Arg sac aga atc Asn Arg Ile asa cac Lys His aag ggC tca gaa gtc Lys Gly Ser Glu Val atc atg cct gcc aac Ile Met Pro Ala Asn tgaggcatat ttcctag aaggtattga gaagcaa ttggcaataa aggctaai <210> 76 <211:, 978 <212> DNA <213> Homo sapiens <220> atc attetgcctc tacgatgttt tttcttggtc cacctttagg gaa actggaggcc castatctaa cctgcaaatc gtttttgagt tct accaaaaaaa aaa <221> <222> <223> <221> <222> <221> ,,222> <221> <222> <223>.
<221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig...peptide 274. .399 Von Heijne matrixc score 5.19999980926514 seq LLFDLVCHEFCQS/DD PolyA-.signai 943. .948 PoiyA.slte 966. .978 misc-feacure 335. .518 homology id :AA206225 est misc-feature 225 .'-274 homology id :AA206225 est misc-feature 812. .861 homology id :A.A206225 est misc-feature 186. .224 homology id :AA206225 es t misc-feature 708. .748 homology id :AA206225 eat misc-feature 276. .314 homology id :AA206225 197 est <221> misc-Eeature <222> 146. .176 <223> homology id :AA206225 est <221> misc-feature <222> 879. .909 <223:- homology id :AkA206225 est <221> misc-feature <222> 182.-518 <223> homology id :C15003 eat <221> misc-feature <222> 708.-.748 <223> homology id :C15003 eat <221> misc-feature <222> 182. .517 <223> homology id :HUM407E118 ea t <221> misc-feature <222> 170. .202 <223> homology id :AA544037 est <221> misc-feature <222> 517. .595 <223> homology id es t <221> misc-feature <222> 596. .665 <c223> homology id es t <221> misc-feature <222> 697. .748 <223> homology id :HUMOOTW170 eat <221> mise..feature <222> 805. .861 <223> homology id :HLTMOOTW170 est <221> misc-feature <222> 212. .369 <223> homology id :HUM169E083 eat <221> misc-feacure <222> 406. .493 <223> homology id :HUM169E08B est <221> misc-feature <222> 542. .595 198 <223> homology id :HUMOOTW1l2 es C <221> misc-.feature <222> 697. .748 <222> homology id :HUMOOTWii2 <300> <400> 76 accaggaaca tgaccecgcc taa tgttaag tctgaagagc cgctctgaaa tccagc tat c ggtgaaggtg &zotgggtCag agccagtgtt a tccagaa eg tagatagca ggggag~ge g aaacggggc.
tcggcttgtg gcttgattCC tttgctteca. taetgtcaagt etcaacaaatg tggacaagc. ctccgaetccg gacgagaaac gce.cagcctc tggaccaacc ccaggaagag ccctgetatac ttgaagctgc caaacaagta tac atg cac act teta caa ctg ctt Met His Ile Leu Gin Leu Leu act. aca gtg gat gat Thr Thr Val ASP ASP gga-aaa gac ate. egg Gly Lys Asp Ile Trp att caa gca ate.
Ile Gin Ala Ile cat cge. ccz gac His Cys Pro Asp aat tta ctt Ccc gac Asn Leu Leu Phe Asp cca gcc acc ac. ctt Pro Aia Ile Ilie Leu tc cag Cys Gin ceta gcc Leu Ala gag Caa Giu Gin tc gat gat Ser Asp Asp 1 cc gte. etet Ser Vai Phe gag tat cta Giu Tyr Leu ccg gtc tgc cat Leu Val Cys His caa gaa -cag aaa Gin Giu Gin Lys ac tat gcc tca Ile Tyr Ala Ser gaa tc Giu Phe aca gtg Thr Val cag act Gin Thr tca gcg Ser Val ttg ece. gcc Leu Ser Ala ata gaa 888 gta gat ccc cct cta at Ile Giu Lys Val ASP Leu Pro Leu Ile agc ctc act cgg Ser Leu Ile Arg etea caa aat atg Leu Gin Asn Met cag cgt cag aaa Gin Cys Gin Lys gag aa Glu As aaccca ctaata tgagca gggtgg gaaaaa sstgetb <210> <211> <212> <213> <220> <221> <222> <223> <221> <222> <221> <222> <221> <222> <223> c tog n Ser aga c att.
aac a aaga aamo caag 77 gca gga gtc Ala Giy Val etaacacagag gaaaccaaaa ggactgattt gatttccact tgaaaatcc ttcaggcatt gaagtgttcc ttttaetcaaa ttcccaagt.
aaaataaact aacaaaggag tccgcaactt atcctacgcg tgaaggtca ttatttct c aaaaggaca.
acggtggcc caaaarcc cc aagttgataa gaas taaaaa cactgaaaaa.
etgtcatggtg agggagtaaa cttctttcta ggcgcttgct rscktgaatg aaaaaaaa aagtttctgt ggaaggccgt tagcccotgt kgatgacttg ggaat cact c 587
DNA
Homo sapiens sig..peptide 421. .465 Von Heijne matrix score 3.90000009536743 seq LVPLGQSFPLSEP/RC poiyA-.signal 553. .558 poiyA-site 575. .587 misc-feature 182. .322 homology id :T35951 es c 199 <221> misc_feature <222> 32- 132 <223:, homology id :T35951 est <221> misc-feature <222> 136. .193 <223> homology id :T35951 eca <221> misc-feature <222> 182. .322 <223> homology id :T35949 est <221> misc..feature <222> 32. .132 <223> homology id :T35949 est <221> misc-feature <222> 136. .193 <223> homology id :T35949 eat.
<221> misc-.feature <2222- 136. .299 <223> homology id :AkA38lll1 est <221> misc-feature <222> 32. .132 <223> homology id :?A381lll est <22 1> misc-feature <222> 136. .322 <223> homology id :AA381O0l est -<221> misc-feature <222> 85. .132 <223> homology id :M.381001 es t <221> misc-feature <222> 182. .322 <223> homology id :HSCZQtO4l eat <221> misc-.feature <222> 136. .193 <223> homology id :HSCZQEO41 est <221> misc-feature <222> 82. .132 <223> homology id :HSCZQEO41 eat <221> misc-feature <222> 316. .428 <223> homology 200 <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223 <22 1> <222> <223> <221> <222> <223> id :AA477628 est misc-feature 475. .554 homology id :AA477628 es t misc-feacure 182. .322 homology id :HSC34001.
est misc-..eature 136. .192 homology id :HSC34001l es t misc-feature 41. .119 homology id ;AA090647 est misc-feature 136. .184 homology id :AA090647 eat misc-.feature 316. .426 homology id :AA505962 eat <300> <400> 77 aactcatc gaaaccagaa tkwgtgtgtk aggggaaccc gctcgtagcg agaagatgcc agagctdt~t atg cgt ct Met Arg Leu tgt ctt caa Cys Leu Gir Ccacccctcc ctcctaggtc gaaaaatatg agacggggaa gaagtycckg ctcaggcgt gctgttcaga gctgtgactg ccgggccttc tctcctcgtc ccactccagc ctctggactg ccttccacag ggggccttgc gtc ccc ttg ggc cag Val Pro Leu Gly Gln cct gtg aaa tgg gat Pro Val Lys Trp Asp I acacttttca tcatcgtgtg aggtacagtg cggctgcact atcatccaga ggggctctct agggaagggt tca ttt ccc Ser Phe Pro gaaaaagaat ctgcatcctg atgtgtgtgc Cgcctttggc tgtttgatcg tggtggcttg cagagaagct gcccttggct gcagccagtg tccgggaggc tcagtggctg aatgtccagc ccaggacttg acatcttaag ctc tct gag cct cgg Leu Ser Glu Pro Arg aat cac tgc ctt acc tcc ctc Asn His Cys Leu Thr Ser Leu gaa gtt ttt cat aaa ct. tgg Glu Val Phe His Lys Leu Trp 130 acg gtt gtt gtg agg act gagI Thr Val Val Val Arg Thr Glu atg cta gtg taaaaaaaaa aaaa Met Leu Val <23.0> 78 <21.1> 400 <212> DN'A <213> Homo sapiens <220> <221> sig..peptide <222> 198. .278 <223> Von Heine matrix score 4.90000009536743 seq CLLSYIALGAIHA/KI 201 <221> PoiyA.sigriai <222> 364. .369 <221> polyA_site <222> 387. .400 <300> <400> 78 aaCtttgcct tccgactcca gctccct cc tctt cc tgaa gggtgtcttg cgttCtgCac attccggagg accagcttcc tggaaaccag atggggcaac ggggtggttc tagtgcagac acctcagcc tgcecatttc cagcecagaa ctctactaa aaaggaa atg aac agg gtc cct gct gat tct cca Met Asn Arg Val. Pro Ala Asp Ser Pro ccatcagaag tgtagctgca cggcgtetec aa atg Asn Met tge cta sec ege eta ctg age Cys Leu Ile Cys Leu Leu Ser -10 aaa atc tge aga aga gca tcC Lys Ile Cys Arg Arg Ala Phe 1 5 acg ggc geg aga gct egg egc Thr Gly Val Arg Ala Trp Cys ttggaatagc caaaaaaaaa aaaaa <210:> 79 <211> 1166 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 167. 
.229 <223> Von H-eiine matrix eac aea gcd ctt Tyr Ile Ala LeU ggs gcc acc cac gca Gly Ala Ile His Ala Cag gaa gag gga aga gca &at gca aag Gin Giu Glu Gly Arg Ala Asn Ala Lys 10 is ata cag cca egg gcc aaa aaagttcc Ile Gin Pro Trp Ala Lys <221> <222> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> score 5.59999990463257 seq LVLSLQFLLLSYD/LF POlyk.s ignal 1133. .1138 P0lYA-.Sice 1154. .1166 misc-.feature 22. .377 homology id :AA30691 est misc-.fea Cure 424. .540 homology id :AA306911 est misc..feature 376. .424 homology id :AA306911 <c221> misc-feature <222> 4,..458 <223> homology id :AA417777 est <221> misc-feature <222> 10. .447 <223> homology id :AA236327 est, <221> misc..feature <222> 279. .714 202 <223> homology id :ALA410332 est <221> misc-feature <222> 680.-893 <223> homology id :N32991 esc <221> misc-feature <222> 881. .1023 <223> homology id :N32991 es t <221> misc-feature <222> 1056. .1109 <223> homology id ;N32991 eat <221> misc-feature <222,> 1122,..1153 <223> homology id :N32991 est <221> misc-feature <222> 1024. .1054 <223> homology id :N32991 es e <221> misc-feature <222> 703. .893 <223> homology id :N24951 es t <221> misc-feature <222> 881. .1023 <223> homology id :N24951 es t <221> misc-feature <222> 1056. .1109 <223> homology id :N24951 eat <221> misc-feature <222> 1122.-1153 <223> homology id :N24951 eat <221> misc-feature <222> 1024. .1054 <223> homology id :N24951 eat <221> misc-feature <222> 225. .563 <223> homology id :AA455215 est <221t> misc-feature <222> 544. .631 <223> homology id :AA455215 eat <221> <222> <223> <221> <222> <223> misc..feaCt.ire 203 629. .660 homology id :AA455215 est misc-feature 680. 
.793 homology id :N66437 eat <300> <400> 79 aatgacaacc gacgttggag cccaaaggaa ccagaagcct ggaactgtgg gatgcgccct acc agt aac tac agc ct Thr Ser Asn Tyr Ser Le ttggaggcg cttgccttag Ctccctcagc ggcagggaga tgggggcccg agaaaacaga agcaagggaa acagctctca cagecaggag cggeececcg aggaag aeg ctc cag met Leu Gin ccc ctg ccg ctg tcc Phe Leu Leu Leu Ser :g gtg cc ~u Val Leu ctg cag Leu Gin tat gac ctc tcc gtc aat Tyr Asp Leu Phe Val. Asn etc cca gaa ctg ctc caa aag act cct Phe Ser Glu Leu Leu Gin Lys Thr Pro 1 atc cag Ile Gin ccc gcg ctc ttc atc atc cag Leu Val Leu Phe Ile Ile Gin att gca gtc cc Ile Ala Val Leu azc atc atc Ile Ile Ile ctc acg tc Leu met Phe aac acc tcc gtc Asn Thr Ser Val ttc cag Phe Gin aec aec Ile Ile gct ggc ctg Ala Gly Leu ccg aca gcc Leu Thr Ala acg aac cta Met Asn Leu aac ccc cta tc Asn Leu Leu Phe aag ttc aaa ggg Lys Phe Lys Gly gtg cac tc gcc Val. Tyr Phe Ala agc aec ccc cte Ser Ile Ser Leu g~c egg gec Val Trp Val 120 175 223 271 319 367 415 463 511 559 607 652 712 772 832 892 952 1012 1072 1132 1166 cgc tgg aaa aac tcc aac agc ttc ata egg aca gat gga Arg Trp Lys Asn Ser Asn Ser Phe Ile Trp Tlhr Asp Giy caa atg ceg tee Gin Met Leu Phe etc cag aga cta Phe Gin Arg Leu gca gtg ttg tcc Ala Val Leu Tyr tac ccc tat aaa Tyr Phe Tyr Lys gcc gea aga Ala Vai Arg gat ccc cac Asp Pro His cag gac tc Gin Asp Ser egg ceg cgc aag gag tcc atg caa get Trp Leu Arg Lys Giu Phe Met Gin Val tgacctcg tcacactgac tgcagggaga gceggcccea ctagcegcgt tcagcaccca aCcggcccc agcgcggcae ctcctccc egeaccatc tcttacccce gtgaagccec gatcccccag wtgaccaaag accgcagaca tccagacccc aataaaeega atcyecgtec <210> 80 <211> 754 <212> DNA <213> Homo sapiens <220> <221> sig-.pepcide <222> 180. 
.383 gga tact tee tgcatgggca agaaggaaga ccctccecg attceccceg tcctt tagcc ga tgcgaaga tatacccaag cctccegat aacagctgga tcccccctct cctc tacct accggccttt tgggacagaa gtgacagtta gcactttaaa agaagccaca ctttccaagg tgcacaatta ctgttccacc ccegccgagg ggacctcccg cgncgctcce aaaacgtt etc tac Phe Tyr 125 agg Arg e c gc Cgc cc aaggtccaga gagcgtcccc cccttccttc gttctgtggc gcccccaaag gactgaccac ataaatagag caaaaaaaaa aaaa 204 <223> Von Heijne matrix score 4.59999990463257 seq LPFSLVSMLVTQG/1 4
V
-c221> polyA.signal <222> 722. .727 <221> polyA-D.ite <222> 743. .754 <221> misc_ feature 4222> 116. .450 <223> homology id :W68799 est <221> misc-.feature <222> 593. .710 <223> homology id ;W68799 est 4221> misc..feature <222> 18. .117 <223> homology id :W68799 es t <221> misc-.feature <222> 561. .598 <223> homology id :W68799 es t <221> misc-.feature <222> 48. .511 <223> homology id :AA149518 est <221> misc-.feature <222> 593. .673 <223> homology id :AA149518 est <221> misc-.feature <222> 535. .710 <223> homology id :W80356 est <221> misc..feature <222> 256. .405 <223> homology id :W80356 es t <221> misc-feature <222> 432. .Sl1 <223> homology id :W80356 est <221> misc..feature <222> 392. .437 <223> homology -id :.W80356 est <221> misc-.feature <222> 535. .710 <223> homology id :W80631 es t <221> misc..feature <222> 289. .437 <223> homology id :W80631 est <221> misc-feacure <222>- 432. .511 <223> homology id :W80631 .est <221> misc-..eature <222> 343. .511 <223> homology id :AA142865 eat 205 <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> misc-.feature 535. .710 homology id !AA142865 est misc-feature 256, .341 homology id :AA142865 es t misc-.feature 248..511 homology id !AA405876 est misc-.feature 21. .271 homology id :AA405876 est misc..feature 121. .450 homology id :W68728 est misc-feature 592. 
.710 homology id :W68728 est <300> <400> aagacaggtg ggcctgctgg ctgggtcaag gggtactcgg get tggcaac gagtaagcag gaagc tggag gagggac tcg aggataaaca cgggCC9gcg gcctcggagg actggaagga gtgcagtcac gggggagcga cgacccagac cacacagaca gagcaagcac aaagtcatc atg gct tca gcg tct gct cgt gga Met Ala Ser Ala Ser Ala Arg Gly cca cca cca agc aag cag agc ctg Pro Pro Pro Ser Lys Gin Ser Leu caa gat aaa gat Gin Asp Lys Asp gcc cat ttt Ala His Phe tca aaa ctg Ser Lys Leu ttg ttt tgt cca Leu Phe Cys Pro cac atc His Ile cac aga gca gag His Arg Ala Giu tea aag att atg Ser Lys Ile Met gaa tgt cag gaa Glu Cys Gin Glu agt ttc tgg aag Ser Phe Trp Lys gct ctg cct ttt tct ctt gta age atg ctt Ala Leu Pro Phe Ser Leu Val Ser Met Leu gtc acc cag gga cta gtc tao caa ggt tat ttg gca gct aat tct aga Val Thr Gin Gly Leu Val Tyr Gin Gty Tyr Leu Ala Ala Asn Ser Arg 206 5 ttt gga tca Phe Gly Ser 1s cCC gga aag Leu Gly Lys ttg ccc aaa Leu Pro Lys gta tca tac Val Ser Tyr gtt gca ctt Val Ala Leu 20 ata gga gta Ile Gly V'al gct ggt ctc ttg gga ttt ggc Ala Gly Leu Leu Gly Phe Gly tgc cag agt aaa ttc cat ttt Cys Gin Ser Lys Phe His Phe ttt gaa gat cag ctc cgt ggg gct ggt ttt ggt ccw aca gca Phe Clu Asp Gin Leu Arg Gly Ala Gly Phe Gly Pro Thr Ala 50 taacaggcac tgcctcc gggagactcc cagccct acctctgaat tcgtaca acgtvaaaaa aaaaasa <210> 82.
<211> 709 <212> DNA <213> Homo sapiens <220> tta cctgtgagga atgcaaaata :ag ctcccaaat tctgcgtctg cat ttaaaatttc aaggtgtct aagcatggat taagcgagaa tgactcccga agtcccccaa ttaaaatnaa aacactcta, <221> <222> <223> <22 1> <222> <221> <222> <221> <2 22 <223> <221> <222> <223> <221> <222> <223> <221 <222> <223> <221> <222> <223> <221> <222> <223> <221> <222> <223> sig-.peptide 179. .298 Von Heine matrix score 4.30000019073486 seq ITLVSAAPGKVIC/EM polyAsignal 680. .685 polyA.s ice 697. .708 misc-.feature 137. .291 homology id :AA121372 est misc-feature 6. .91 homology id :AA121372 est rnisc-f.eature 318. .397 homology id :AA121372 est misc..feature 95. .132 homology id :A.A221372 es t misc-feature 460. .501 homology id :AA121372 eat misc-.feature 432. .465 homology id :AA121372 est misc-.feature 284. .313 homology id :AA121372 eat 207 <221> misc-featdre <222> 254. .670 <223> homology id :AA614605 es C <221> misc-..feature <222> 392. .658 <223> homology id :TS5234 <221> misc-feature <222> 271. .327 <223> homology id :T55234 es t <221> misc-.feature <222> 358. .670 <223> homology id :AA121362 est <221> misc-feature <222> 31.2. .344 <223> homology id :AA121362
est
<221> misc-feature <222> 2. .102 <223> homology id :T53974 es C <221> misc-feature <222> 150. .258 <223> homology Id :T53974 est <221> misc..jeature <222> 95. .171 <223> homology id :T53974 est <221> misc-feature <222> 322. .628 <223> homology id :HSPD02295 esc <221> misc-.feature <222> 445. .670 <223> homology id :AA454S02 est <221> misc-feature <222> 2. .102 <223> homology id :R09314 es C <221> misc..feature <222> 95. .171 <223> homology id :R09314 es C <221> misc-feature <222> 150. .222 <223> homology id ;R09314 est 208 <300> <400> 81 aaaatcgcgg ggcggagaag tccagctctt accaccgggg ctgccakctc gcccgactcc ggtgcgggct Cttcgccctt tgtgtccctc ccgaagttcg ttcttgcgca aagcccaaag cggcctcttg cgctcctagg tttcactaac ttctggactt gccggaaaac cgtccacg acc agc atg act Thr Ser Met Thr tct ctg cgg gag Ser Leu Arg Glu ata aag 9CC Ile Lys Ala atg acc Met Thr gct cgc Oat Ala Arg Asn gag aga gtt ttg 99a aag att act ctt gtc Glu Arg Val. Leu Gly Lys Ile Thr Leu Val cct get qet Ala Ala Cet ggg Pro Gly gca ata Ala Ile a gtg Ott tgt gad aCg ada gta gaa Oa Lys Val Ile Cya Glu Met Lys Val Glu Giu gag cat Glu His ace aat Thr Asn gat -aac Asp Asn ggc act etc Gly Thr Leu aca atg gct Thr Met Ala cac ggc ggt ttg His Gly Gly LeU gCC acg tta gta Ala Thr Leu Val.
ata tca Ile Ser ctg cta tgc Leu Leu Cys ga8 agg gga gca Glu Arg Gly Ala s0 120 178 226 274 322 370 418 466 514 562 608 668 709 gec agt gte gat atg aac ata aeg Val Ser Val Asp Met Asn Ile Thr tca Cct gca Ser Pro Ala aaa tta Lys Leu Lys Thr gga gag gat Cly GlU Asp ctt gca ttt Leu Ala Phe gtg att aca gca Val. Ile Thr Ala gtt ctg 889 caa, Val Leu.Lys Gln aCC tct gtg ggt ctg acc aac aag gCC aca gga 888 tta Thr Ser Val Gly Leu Thr Asn Lys Ala Thr Gly Lys Leu aca gca Ile Ala caa gga. 898 caC aca 888 cac ccg gga Gin Cly Arg His Thr Lys His Leu Gly a
A
ac tgagagaaca, sn 00 agatttgac tcaaacaatt gcagaatgac ctaaagaaac ccaacaatga atatcaagta t~ gtaatttteg aaataaacta gcaaaaccaa aaaaaaaaaa g <210> 82 <2i1> 243 <212> DNA <213> Homo sapiens <220> <221> <222> <223> <221> <22 2> <221> <222> <221> <222> <223> <221> <222> <223> <221> <222> <223> s ig.peptide 100. .171 Von Heijne matrix score 3.70000004768372 seq ILFNLLIFLCGFT/NY polyAsignal 211. .216 polyA..site 230, .243 misc..feature 2. .164 homology id :H64488 es t misc-.feature 2. .164 homology id :AA131065 est mi sc-eature 5. .164 homology id :AA224847 est <221> misc-feature <222> 10. .164 <223> homology id :AA161042 est <221> misc-feacure <222> 2. .84 <223> homology id :AA088770 est <221> misc-.feature <222> 104. .164 <223> homology id :AA088770 eat <221> misc..feature <222> 10,..164 <223-> homology id :AA100852 es t <221> misc..feature -222> 79.. .164 <223> homology id :AA146774 eat <221> misc-feature <222> 79. .164 <223> homology id :AA14660S est <221> mise-.feature <222> 109. .164 <223> homology id ;AA299239 est <221> misc-.feature <222> 158. .20'7 <223> homology id :AA037885 eat <221> misc-,feature <222> 160. .207 <223> homology id :AA480512 est <221> misc..feature <222:> 160. .207 <223> homology id :AA468030 eat <221> misc-feature <222> 160. .207 <223> homology id :AA420727 est <221> misc-.feature <222> 160,..207 <223> homology id :AA574382 es t <221> misc-feature 209 210 <222> 160.-207 <223> homology id :AA133048 es C <221> misc-feature <222> 200. .229 <223> homology id :AA469266 eat <221> misc-.feature <222> 200. .229 <223> homology id :AA5S0735 est <221> misc-f.eature <222> 200. .229 <223> homology id :AA601071 eat <221:, misc-.feature <222> 200. 
.229 <223> homology id :AA225190 est <300> <400> 82 aactcagtgg caacacccgg g~ tccagaactc actgccaaga g etc att aag acc atg atg Phe Ile Lys Thr Met Met ggc tC acc aac Cat acg Gly ?he Thr Asn Tyr Thr agctgtttt gtcctttgtg gagcctcagc agttccccct :cctgaaca ggagccacc atg cag tgc tc agc Met Gin Cys Phe Ser atc ctc tC aat ttg ctc atc ttt ctg tgt Ile Leu Phe Asn Leu Leu Ile Phe Leu Cys -10 gat Ctt gag gac tca ccc tac ttc aaa atg Asp Phe Glu Asp Ser Pro Tyr Phe Lys Met 5 taaaaaaaaa aaaaa cat aaa CCC His Lys Pro
I
gtt aca atg Val Thr Met
<210> 83 <211> 829 <212> DNA <213> Homo sapiens <220>
<221> sig_peptide <222> 346..408 <223> Von Heijne matrix score seq SFLPSALVIWTSA/AF
<221> polyA_signal <222> 792..797
<221> polyA_site <222> 817..829
<221> misc_feature <222> 260..464 <223> homology id :H57434 est
<221> misc_feature <222> 118..184 <223> homology id :H57434 est
<221> misc_feature <222> 56..113 <223> homology id :H57434 est
<221> misc_feature <222> 454..485 <223> homology id :H57434 est
<221> misc_feature <222> 118..545 <223> homology id :N27248 est
<221> misc_feature <222> 65..369 <223> homology id :H94779 est
<221> misc_feature <222> 471..519 <223> homology id :H94779 est
<221> misc_feature <222> 61..399 <223> homology id :H09880 est
<221> misc_feature <222> 408..452 <223> homology id :H09880 est
<221> misc_feature <222> 60..399 <223> homology id :H29351 est
<221> misc_feature <222> 393..432 <223> homology id :H29351 est
<221> misc_feature <222> 260..444 <223> homology id :AA459511 est
<221> misc_feature <222> 449..545 <223> homology id :AA459511 est
<221> misc_feature <222> 117..184 <223> homology id :AA459511 est
<221> misc_feature <222> 122..399 <223> homology id :T74091
<221> <222> <223> <221>
<222> -c223> <221> <222> <223> <221> <2 22> <223>; <221> <222> <223> misc-feature 393. .434 homology id :T74091 est misc..feature 61..378 homology id :HSC3CB08l est misc-.feature 118. .399 homology id :T82010 est misc-feature 268. .545 homology id :W02860 es t misc-feature 268. .545 homnology id :N44490 est <300> <400> 83 ctgatgccga c tcaaacggc gttgttgaa cgttcctgtt aagactaaca gcataggggc gttccgtctc ctagtgcttc gcagt tacca gagtacacgt ttttgtgaag t tcggcgcca tcgcgtcttt gcgc ttccgg agaatcttca tcctgttgat ttgtaaaaca gcggccagcg tcc tggtccc agaaaatcag accctttccc t tacaaaagg gaaaacctgt ctagtcggtc tggtaagtgc aggcaaagcg gasgnagatc cggtctaatt aattcctctg acaaaagcta attgagtaca tgcaggtatg agcaggcg tagaa atg tgg tgg ctt Met Trp Trp Phe gta att tgg aca tct Val Ile Trp Thr Ser cag caa ggc ctc agt ttc ctt. cct tca gce ctt Gin Gin Giy.Leu Ser Phe Leu Pro Ser Ala Leu get gct ttc ata ttt Ala Ala Phe Ile Phe tea tac Ser Tyr att act gca gta aca ctc cac cat Ile Thr Ala Val Thr Leu His His gac ccg get eta ect tat ate agt gac act ggt aca Asp Pro Ala Leu Pro Tyr Ile Ser Asp Thr Gly Thr aaa tgc tta Lys Cys Leu ggg gca atg eta aat att gcg gca Gly Ala Met Leu As Ile Ala Ala gta gct cca gaa Val Ala Pro Giu gte tea ege caa Val Leu Cys Gin aaa tagaaatcag gaagataatt caacttaaag aagtecattt catgaccaaa Lys ctcttcagaa gtctggcaat tgjgtaaggeg gtaageegaa <210> 84 <211> 674 <212 DNA <213> Homo <220> <221> sig.
<222> 177.
<223> Von acatgtettt acaagcatat ctcttgtatt gcttctaca ctgttgaatt attcegeag tggaaaattt gatttagcta gttcttgaet eggataaata ggctttccc cctgtgtaat tggeesesac gtcttacttg agccaagttg ataaaatgat watgagagtg acacavaaaa aaaaaaa sapiens pept ide.
.233 Heiine matrix 213 score 6.09999990463257 seq LALLWSLPASDLG/RS <221> PolyA signal <222> 644. .649 <221> pOlyA-site <222> 663. .674 <221> misc-feature <222> 194. .592 <223> homology id :AA496246 est <221> misc-.feature <222> 1. .100 <223> homology id :AA496226 es t <221> misc-feature <222> 99. .202 <223> homology -id :AA496246 es t <221> misc-feature <222> 187. .592 <223> homology id :AA476481 est <221> misc..feature <222> 594. .661 <223> homology id :AA476481 es t <221> misc-feature <222> 188. .592 <223> hoinology id :AA496245 eat <221> misc-feature <222> 594. .661 <223> homology id :AA496245 est <221> misc-feature <222> 194. .444 <223> homology id :AA476480 est <221> misc-.feature <222> 1- 102 <223> homology id :AA476480 es t <221> misc..feature <222> 99..187 <223> homology id :AA476480 est <221> misc..feature <222> 437. .592 <223> homology id :AA505488 est <221> misc-.feature <222> 594. .661 214 <223> homology id :AA505488 est <:221> misc-feature <222> 441.-592 <223> homology id :AA554685 es t <221>- misc-f.eature <222> 594. .661 <223> <:221> <:222> <223> <221> <:222> <:223> homology id :AA554685 est misc-.feature 414. .503 homology id ;AA215595 es t misc-feature -510. .539 homology id :AA215595 est ':300> <:400> 84 ataagtgaac cagaccaccc gggtgggtgg actagaagca agctgctgca cagagcctgg tgatggcatc cacagtgatg tttgggagta gtggccaggg tgtccacaag cttccaggtt agc ccc ggc agc gcc ttg gcc ctt ctg tgg tcc Ser Pro Gly Ser Ala Leu Ala Leu Leu Trp Ser -10 tcaaggctgg ggctggccag gCcCcggacg ctagccacgg ggggttggag cctggg atg Met ctg cca gcc tct gac Leu Pro Ala Ser Asp cac act ggc gtt ctc His Thr Gly Val Leu cCg ggc Cgg Leu Gly Arg
I
tca gtc att gct gga ctc rgg cca Ser Val Ile Ala Gly Leu Trp Pro cac ttg gaa aca His LeU Glu Thr ,cag tct ttt ctg caa ggt cag ttg acc Gin Ser Phe Leu Gin Gly Gin Leu Thr ata ttt ccc Ile ?he Pro tgt tgt aca tcg Cys Cys Thr Ser ttt tgt gtt Phe Cys Val tgt gtt gti Cys Val Va.
tgagtcgatg aca gtg ggt Thr Val Gly ggg agg gtg ggg Gly Arg Val Gly t 9gtcagaact ttagtatacg catgcgtcct gcaccttggt aactaaaccc ctctaatagc ttactgtaaa agcttgggtt tatttttgta gcaagggggc tcctctgttg gagtaatgta ttaaaaaaaa aaaa <210> ':211> 478 ':212> DNA <213> Homo sapiens <:22 0> <221> sig~peptide <222> 179.-319 <223> Von Heiine matrix :ct aca ttt ~er Thr Phe ctgagtgaca tataaaggct ggacttaatg aattgtaatt gtt gca Val Ala gggcattttg ttagttctgt gctaagaatt ataaataaac tcgaaaa taa attgattaag agggaacata atgcaaacct <221> <:222> <221> <222> score seq SALLFFARPCVFC /FK polyAsignal 461.-466 pa iyk.si te 465. .478 215 <221> misc-featuire <222> 2. .464 <223> homology id :AA310996 es t <221> misc_ feature <222> 8. .464 <223> homology id :AA312901 est <221> misc-feature <222> 2. .416 <223> homology id :AA401411 eat <221> misc-feature <222> 2. .349 <223> homology id :R64030 est <221> misc..feature <222> 56. .464 <223> homology id :AA400108 est <221> misc-feature <222> 126. .273 <223> homology id :AA010825 eat <221> misc-feature <222> 2. .147 <223> homology id :AAO1O825 est <221> misc-feature <222> 358. .435 <223> homology id :AA010825 est <22 1> misc-feature <222> 78. .464 <223> homology id :AA504732 eat <221> misc-.feature <222> 90. .441 <223> homology id :H60506 e st <221> misc-feature <222> 59. .349 <223> homology id :AA346780 est <221> misc..feature <222> 2. .331 <223> homology id :AA281167 est <221> misc-feature <222> 6. .236 <223> homology 216 id ;R35805 est <221> misc..feature <222> 232. .284 <223> homology id :R35805 est <221> misc-f.eature <222> 41. .307 <223> homology id :H13784 est <221> misc-feature <222> 2. <223> homology id :H13784 est <221> misc-f.eature <222> 64. .280 <223> homology id :AA128122 est <221> misc-.feature <222> 293. .349 <223> homology id :AA128122 est <221> misc-feature <222> 332.,.385 ,c223> homology id :AA128122 est <221> misc-.feature <222> 163. 
.420 <223> homology id :AA555127 est <300> <400> aagtccttcg cgccctcctc ttacactggg caacgtggtt ctaaaaaact tgaagaaatt atg aga ctg cct cca g< gccctcccca ccgacaccat ggaatgtatc tggctcagaa aaaaaggact tggatgccaa gctccagttc ctgcttggat ctatgatata ccaaacctgg gaagaaaccc cctagtgc Met Arg Leu CtC gag ggc Leu Glu Cly Pro Pro Ala ctc gtt tac Leu Val Tyr ctg cct Leu Pro -40 tat ctg Tyr Leu tca gga tat act Ser Gly Tyr Thr aac caa aag ctt Asn Gln Lys Leu tct act qct Ser Thr Ala ttt tcg tct Phe Ser Ser cca gcc Pro Ala tca gca ctt ctc ttc ttt gct aga Ser Ala Leu Leu Phe Phe Ala Arg tgt gtt ttt tgc ttt Cys Val Phe Cys Phe gca agc aaa atg ggg ccc caa ttt gag aac tac cca Ala Ser Lys met Gly Pro Gln Phe Glu Asn Tyr Pro aca tctt cca Thr Phe Pro ggg agg ttc Gly Arg Phe aca tac tca Thr Tyr Ser cct ctt ccc ata atc Pro Leu Pro Ile Ile ttc caa ctg Phe Gln Leu taagactgga attatggtgc tagattagta aacatgactt ttaatgaaaa aaaaacaaaa <210> 86 <211> 952 <212> DNA <213> Homo sapiens <220> <2 21> Sig-peptide 217 <222> 112. .237 <223> Von Reline matrix scare 7.19999980926514 seq ILFSLSFLLVIIT/P <221> PoiyA-signal <222> 910. .915 <221> polyA..site <222> 940. 
.952 <300> <400> 86 Aataccttct cctctcccct CtCCCaagca catctgagtt gctgcctget CV.tCdCaCt agctccaaztc ccatgaaa ttgccaagta taaaagcttc tcaagaatga g atu gat Met Asp Cc agg gcg tc tca ccc gag aag cad gat aaa gag aat ttc gtg ggt Ser Arg Val Ser Ser Pro Glu Lys Gin Asp Lys Glu Asn Phe Val Gly -35 -30 gtc aac aac aaa cgg ctt 99t gta tg ggc tgg acc ctg tt tcc ctc Val Asn Asn Lys Arg Leu Gly Val Cys Gly Trp Ile Leu Phe Ser Leu -15 tc tc ctg Ser Phe Leu ttg aag att Leu Lys Ile ttg gtg Leu Val atc att aCC ttc ccc atc tcc Ile Ile Thr Phe Pro Ile Ser tgg atg tgc Trp, Met Cys tgatcctggt cctgccatgc ataratgtgt ttgtcaaagt tgacctccga acagttacct taactactca ggtagatgga ctaakgtcaa cgatgtccat tcktagggac acaggacctt agcatccaga ctktacttga ggaaatcaaa gatgttcgga ggccacccgg gaagsgagag atccctgaag tcagcctcca cctgcagacc ttgagcacgg catgaatata ctagagggca ataaagcctg aggtcybct <210> 87 <211> 131 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -13. l <300> gcaacae tcc gttgtctatt caagcaacat gtccccagat tgatgccacc tcccgtgca ccaaggtcct tggtgstggs tagccaccga tggcggcgt gCggtagtca tccacaagag aCagaatcta ttctgctqgc cttaggctgg gaac :ggtgg gt tgcagaga tgcagctgaa tgagtytccc gaagaatt cags tatga t aaaaaaaaaa atcc~cacca tagtgctgtc tcaaaccac t acgagaagag gggatccggg tccatggcag ggagaaatga atagctytcc acgattgtgt aaccacaaga ad rgagactccg tcagcagtgg c tgagaaa tg atcgcccata tggcccgagt ccgaggs tga atgsccccaa agstgsgsta rttcctbtgcc agsr.tbscaa <400> 87 Met Leu Leu Glu Ala Val Ser Leu Thr Val Leu Leu Gly Ala Met Met Leu Ser Pro Ile Asp Pro Gin Pro Leu Ser 1 Lys Giu Pro Pro Leu Leu Gly Val His Pro Asn Thr Lys Leu Arg Gin Ala Giu Leu Phe Ciu Leu Val Gly Pro Glu Ser Ile Ala His Ile GlY Asp Val GlU Asn Cly Thr Gly Thr Ala Asp Gly Arg Val Val Lys Leu Gly Pro Cys Ile Giu.Thr Ala Arg Phe Gly Ser Lys Thr Gly Arg 100 Arg Giy Asp Giu Pro Val Cys Gly Arg Pro Leu Gly Ile Arg Ala Gin Trp Ser Leu Cys Gly Cys Ile Gin Arg T1yr Leu Lys 218 <210> 88 <211> 63 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -35..-1 <300> <400> 88 Met Leu 
Thr Val Asn His Phe Pro Phe Val Val Phe Ser Leu Lys 1 Phe Glu Ser Cys Cys, <210> 89 <211> 163 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -31..-1 <300> <400> 89 Met Ser Pro Ala Phe Ser Phe Trp Ser Pro Ser Ala Ser Met Arg
S
Ile Ser Ser Thr Arg Ser Thr Lys Asp Lys Pro Ile Phe Ser Pro Pro Glu Tyr Pro Thr Ser Glu Val Ser Ser Gly Leu Ser Arg Leu 100 Gly Val Gln Pro Val 115 Pro Pro Pro 130 <210> Asp Val Arg Phe Tyr Arg Asn Val Arg Ser Asn -30 -25 Arg Lcu Cys Gly Leu Leu His Lou Trp Leu Lys -10 Gin Leu Lys Lys Lys Ser Trp Set Lys Tyr Leu 5 Tyr Arg Ser Leu Tyr Val Cys Val Phe Ile 20 Val Glu Ser Gly Ser Pro Arg Cys Leu Leu Cys Gly His Pro Xaa Phe Ala Xaa Leu Xaa <211> 52 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-1 <300> <400> Met Leu Gly Thr Thr Gly Leu Gly Thr Gln Gly Pro Ser Gln Gln Ala -25 Leu Gly Phe Phe Ser Phe Met Leu Leu Gly Met Gly Gly Cys Leu Pro -10 Gly Phe Leu Leu Gln Pro Pro Asn Arg Ser Pro Thr Leu Pro Ala Ser 1 5 10 219 Thr Phe Ala His' <210> 91 <211> 124 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -97..-1 <300> <400> 91 Met Ala Asp Asp Leu Lys Arg Phe Leu Tyr Lys Lys Leu Pro Scr Val -90 Glu Gly Leu His Ala Ile Val Val Ser Asp Arg Asp GlyVal Pro Val -75 Ile Lys Val Ala Asn Asp Asn Ala Pro Glu His Ala Leu Arg Pro Gly -60 -55 Phe Leu Ser Thr Phe Ala Leu Ala Thr Asp Gln Gly Ser Lys Leu Gly -40 Leu Ser Lys Asn Lys Ser Ile Ile Cys Tyr Tyr Asn Thr Tyr Gln Val -25 Val Gin Phe Asn Arg Leu Pro Leu Val Val Ser Phe Ile Ala Ser Ser -10 Ser Ala Asn Thr Gly Leu Ile Val Ser Leu Glu Lys Glu Leu Ala Pro 1 5 10 Leu Phe Glu Glu Leu Arg Gln Val Val Glu Ile Ser <210> 92 <211> 230 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -24..-1 <300> <400> 92 Met Ala Ser Leu Gly Leu Gin Leu Val Gly Tyr Ile Leu Gly Leu Leu -15 Gly Leu Leu Gly Thr Leu Val Ala Met Leu Leu Pro Ser Trp Lys Thr 1 Ser Ser Tyr Val Gly Ala Ser Ile Val Thr Ala Val Gly Phe Ser Lys 15 Gly Leu Trp Met Glu Cys Ala Thr His Ser Thr Gly Ile Thr Gln Cys 30 35 Asp Ile Tyr Ser Thr Leu Leu Gly Leu Pro Ala Asp Ile Xaa Ala Ala 45 50 Gin Ala Met Met Val Thr Ser Ser Ala Ile Ser Ser Leu Ala Cys Ile 65 Ile Ser Val Val Gly Met Xaa Cys Thr Val Phe Cys Gln Glu Ser Arg 80 Ala Lys Asp Arg Val Ala Val Ala Gly Gly Val Phe Phe 
Ile Leu Gly 95 100 Gly Leu Leu Gly Phe Ile Pro Val Ala Trp Asn Leu His Gly Ile Leu 105 110 115 120 Arg Asp Phe Tyr Ser Pro Leu Val Pro Asp Ser Met Lys Phe Glu Ile 125 130 135 Gly Glu Ala Leu Tyr Leu Gly Ile Ile Ser Ser Leu Phe Ser Leu Ile 140 145 150 Ala Gly Ile Ile Leu Cys Phe Ser Cys Ser Ser Gln Arg Asn Arg Ser 155 160 165 Asn Tyr Tyr Asp Ala Tyr Gln Ala Gln Pro Leu Ala Thr Arg Ser Ser 170 175 180 Pro Arg Pro Gly Gln Pro Pro Lys Val Lys Ser Glu Phe Asn Ser Tyr 220 185 190 195 200 Ser Leu Thr Gly Tyr Val 205 <210> 93 <211> 72 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-1 <300> <400> 93 Met Phe Ala Pro Ala Val Met Arg Ala Phe Arg Lys Asn Lys Thr Leu -25 Gly Tyr Gly Val Pro Met Leu Leu Leu Ile Val Gly Gly Ser Phe Gly -10 Leu Arg Glu Phe Ser Gin Ile Arg Tyr Asp Ala Val Lys Ser Lys Met 1 5 10 Asp Pro Glu Leu Glu Lys Lys Pro Lys Glu Asn Lys Ile Ser Leu Glu 25 Ser Glu Tyr Glu Gly Ser Ile Cys <210> 94 <211> 91 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -36..-1 <300> <400> 94 Met Asn Thr Phe Glu Pro Asp Ser Leu Ala Val Ile Ala Phe Phe Leu -30 Pro Ile Trp Thr Phe Ser Ala Leu Thr Phe Leu Phe Leu His Leu Pro -15 -10 Pro Ser Thr Ser Leu Phe Ile Asn Leu Ala Arg Gly Gin Ile Lys Gly 1 5 Pro Leu Gly Leu Ile Leu Leu Leu Ser Phe Cys Gly Gly Tyr Thr Lys 20 Cys Asp Phe Ala Leu Ser Tyr Leu Glu Ile Pro Asn Arg Ile Glu Phe 35 Ser Ile Met Asp Pro Lys Arg Lys Thr Lys Cys 50 <210> 95 <211> 106 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-1 <300> <400> Met Phe Ala Pro Ala Val Met Arg Ala Phe Arg Lys Asn Lys Thr Leu -25 Gly Tyr Gly Val Pro Met Leu Leu Leu Ile Val Gly Gly Ser Phe Gly -10 Leu Arg Glu Phe Ser Gln Ile Arg Tyr Asp Ala Val Lys Gly Lys Met 1 5 10 Asp Pro Glu Leu Glu Lys Lys Leu Lys Glu Asn Lys Ile Ser Leu Glu 25 Ser Glu Tyr Glu Lys Ile Lys Asp Ser Lys Phe Asp Asp Trp Lys Asn 40 Ile Arg Gly Pro Arg Pro Trp Glu Asp Pro Asp Leu Leu GCln Gly Arg Asn Pro Glu Ser Leu Lys Thr Lys Thr Thr <210> 96 <211> 172 <212> PRT <213> Homo sapiens <220> <221> 
SIGNAL <222> -21..-1 <300> <400> 96 Met Trp Trp Phe Gin Gin Glv Leu r Ile Trp Leu His Val Ala Val Leu Leu Ser Val Leu Gin Glu Leu Trp Pro Lys 125 Thr Val Ala Asp Lys Ala Glu Leu Pro Leu Pro -15 Phe Ala Leu Ile 50 Val Cys Cys Ile Ser 130 e~ FI Leu Pro Ser Ala Leu Val Ile Leu Phe Tyr Ile Leu Cys Tyr 115 Asn Ser Tyr Ala Arg Lys Leu Cys Cys Lys Tyr Ile Met Tyr Leu Ser Lys Ser Thr Cys 150 Ile Se Leu Lys Asn Ile Trp Asp Ser 135 Thr Thr Ala Asp Thr Asn Ile Gin Val Lys Ala Val Ala Ser Cys 105 His Pro 120 Leu Leu Gly Tyr Leu Val Trp Ser Lys <210> 97 <211> 56 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42..-i <300> <400> 97 Met Cys Phe Pro Glu His Arg Arg Gin Met Tyr Ile Gln Asp Arg Leu -35 Asp Ser Val Thr Arg Arg Ala Arg Gin Gly Arg Ile Cys Ala Ile Leu -20 Leu Leu Gin Ser Gin Cys Ala Tyr Trp Ala Leu Pro Glu Pro Arg Thr -5 1 Leu Asp Gly Gly His Leu Met Gln <210> 98 <211> 46 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-l <300> <400> 98 Met Gln Asn His Leu Gin Thr Arg Pro Leu Phe Leu Thr Cys Leu Phe -15 Trp Pro Leu Ala Ala Leu Asn Val Adh Ser Thr Phe Glu Cys Leu Ile 222 1 Leu Gin Cys Ser Val Phe Ser Phe Ala Phe Phe Ala Leu Trp <210> 99 <211> 251 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -28..-l <300> <400> 99 Met Trp Arg Leu Leu Ala Arg Ala Ser Ala Pro Leu Lou Arg Val Pro Leu Ser Asp Ser TI Leu Leu Pro Val Pi
S
Lys Lou Arg Phe I Clu Pro Lys Asn Le Clu Xaa Thr Glu GI Leu His Trp Gly Hi Met Asp Pro Lys As Lys Pro Ile Thr Ar Gly Ala Ile Asp Hi 120 Val Clu Met Gly G1 135 Asp Gin Val Ala Hi 150 Gly Thr Leu Glu Ly 165 Asn Gin Asn Pro Tr 18 Gly Ile Arg Lys Va 200 Xaa Trp Gly Lys Ph 215 <210> 100 <211> 77 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> <300> <400> 100 Met Leu Arg Leu As Leu Ile Val Ser Va -1 Val Gly Gly Gly Va Asp Gly Ala Leu Ii.
Tyr Gin Lys Lys Pr rp 0 e g y
S
n 5 1 e
S
S
P
5 1 e Ala Ser 10 Glu Ser Asn Phe Met 90 Lys Tyr Arg Lys Met 170 Thr Leu Tyr I Leu Phe Arg Asp Phe Glu 75 Phe Ser Va1 Cys Leu 155 Arg Phe Ser Met Leu -5 Glu Ala Ile Ala 60 Met Ala Val1 Thr Xaa 140 Pro Lys Glu Pro Pro 220 Pro Asp Pro Arg 45 Ile Met Ile Gly Pro 125 Phe Phe Asp Arg Tyr 205 Ala Va1 Leu 30 Gly Leu Arg Trp His 110 Val Glu Ala Gin Ile.
190 Asp Va Pro Ser Ala Gly Val Lys Thr 1 Ser Ile Pro Giu Lys Pro i5 Va1 Pro Ala Leu Arg 95 Arg Lys Glu Ala Glu 175 Ala Leu Va1 Glu 50 Gly Asn Ala Gly Arg 130 Gly Val G1u Xaa Lys 210 Arg Ala Gly Arg Pro Gly 115 Xaa Phe Ser Xaa Met 195 Gly Arg Thr Lyr Ser Phe 100 Lys Xaa Leu Arg Asn 180 Leu Lys Xaa Arg Val p Ile Ile Asn Ser Lou Val Thr Thr Val Phe Met -25 -20 1 Lou Ala Leu Ile Pro Giu Thr Thr Thr Leu Thr 0 -5 1 1 Phe Ala Leu Val Thr Ala Val Cys Cys Leu Ala 10 o Tyr Arg Lys Leu Leu Phe Asn Pro Ser Gly Pro 25 30 o Val His Glu Lys Lys Glu Val Leu 40 223 <210> 101 <211> 81 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -31..-1 <300> <400> 101 Met Ser Asn Thr Hi~ Ala Leu Thr Cys Cys Arg Pro Leu Pro Arg Glu Leu Arg Tyr Pro Ser Pro Cys Arg Thr Leu <210> 102 <211> 126 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -20..-1 <300> <400> 102 Met Lys Val His Met Phe Ile Phe His His 1 Pro Giu Ala Leu His Ser Lys Phe Ser Lys Glu Lys Leu Phe Glu Gly Leu Glu Lys Leu Val Giu Ile Asn His Arg Tyr Phe Gly Ser <210> 103 <211> 273 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -45..-1 <300> <400> 103 Met Asn Trp Ser Ile Ser Thr Ala Phe Gly Val Leu Val TyrLeu Lys Asp Phe Asp Cys S Thr Val Leu Val Ser Leu Pro His Pro His Pro -25 His Leu Gly Leu Pro His Pro Val Arg Ala Pro -10 -5 1 Val Glu Pro Trp Asp Pro Arg Trp Gin Asp Ser 10 Gln Ala Met Asn Ser Phe Leu Asn Glu Arg Ser 25 Leu Arg Gln Glu Ala Ser Ala Asp Arg Cys Asp 40 Phe Giu Gly Leu Leu Ser Gly Val Asn Lys Tyr -40 -35 Arg Ile Trp Leu Ser Leu Vai Phe Ile Phe Arg -20 Val Thr Ala Giu Arg Val Trp Ser Asp Asp His -5 1 Asn Thr Arg Gin Pro Gly Cys Ser Asn Val Cys 10 Phe Asp Glu Phe Phe Pro Val Ser His 224 Val Arg Leu Trp Ala Leu Gin Leu Ile Leu Val Thr Cys Pro Ser Leu Leu Val Met His Tyr Arg Ser Gly Trp Thr Phe Leu 100 Val Val Ile Ser Thr Ala Val Ser 165 Met Xaa 180 Xaa Asp Val Gin Leu Tyr Val Cys Val Phe Cys His 120 Pro Ser 135 Ile Cys Arg Cys Gly His Xaa Ser 200 Clu Leu Ser His 105 Ala Clu Ile His His 185 3ly Lys Asn Leu 90 
Ser Asp Lys Leu Glu 170 Pro Asp Arg Pro Val Phe Pro Asn Leu 155 Cys Xaa Xaa
I
His Arg Giu Ala His Gly Xaa Pro Xaa Leu 215 Leu <210> 104 <211> 158 <212> PRT <213> Homo sapie <220> <221> SIGNAL <222> -37.-1 <300> <400i 104 Met Ala Ser Lys Ile Cys Leu Glu Ser Leu Cys Arg, Ser Met Gly Gly Phe Giu His Leu Leu Lys Giu Val Cys Asp His Hisi Lys Val Ile Cys His Thr Gly Pro Pro Gly Ser Pro 110 <210> 105 <211> 51 <212> PRT Pro Asp Arg Pro 60 Gly Phe Tyr Cys Ile 140 Asn Leu Asp Ile Arg 220 Val C Pro I Val c Pro N His E Asp i Leu t Arg Ile C 100 Glu C Lys Lys Lys Ala Pro Lys 110 Pro Asn 125 Phe Thr Leu Val Ala Ala Thr Thr 190 Phe Leu I 205 Arg Ser 95 Tyr Ile Leu Glu Arg 175 Phe ly Gly s0 Va1 Ile Val Phe Leu 160 Lys Ser Ser Gly Asp Leu Asp Met 145 Ile Ala Xaa Asp Val 50 Clu Leu Ile Pro Cys 130 Val Tyr Gin Lys Ser 210 Ala Asn Trp Ala Pro 115 Phe Ala Leu Ala Gin 195 His Asp His Val Lys Lys Thr Ile 225 <213> Homo sapiens <220> <221> SIGNAL <222> -19..-i <300> 225 <400> 105 Met Arg Thr Leu Phe Asn Leu Leu Trp Leu Ala Leu Ala Cys Ser Pro -10 Val His Thr Thr Leu Ser Lys Ser Asp Ala Xaa Lys Pro Pro Gln Arg 1 5 Arg Cys Trp Arg Arg Val Ser Phe G.n Ile Ser Arg Cys Lys Thr Gly 20 Val Trp Trp <210> 106 <211> 359 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -34..-1 <300> <400> 106 Met Leu Leu Ser Ile Gly Met Leu Met Leu Ser Ala Thr Gln Val Tyr -25 Thr Ile Leu Thr Val Gln Leu Phe Ala Phe Leu Asn Leu Leu Pro Val -10 Glu Ala Asp Ile Leu Ala Tyr Asn Phe Glu Asn Ala Ser Gln Thr Phe 1 5 Asp Asp Leu Pro Ala Xaa Phe Gly Tyr Arg Leu Pro Ala Glu Gly Leu 20 25 Lys Gly Phe Leu Ile Asn Ser Lys Pro Glu Asn Ala Cys Glu Pro Ile 35 40 Val Pro Pro Pro Val Lys Asp Asn Ser Ser Gly Thr Phe Ile Val Leu 55 Ile Xaa Xaa Leu Asp Cys Asn Phe Asp Ile Lys Val Leu Asn Ala Gin 70 Arg Ala Gly Tyr Lys Ala Ala Ile Val His Asn Val Asp Ser Asp Asp 85 Leu Ile Ser Met Gly Ser Asn Asp Ile Glu Val Leu Lys Lys Ile Asp 100 105 110 Ile Pro Ser Val Phe Ile Gly Glu Ser Ser Ala Ser Ser Leu Lys Asp 115 120 125 Glu Phe Thr Xaa Glu Lys Gly Gly His Leu Ile Leu Val Pro Glu Phe 130 135 140 Ser Leu Pro Leu Glu 
Tyr Tyr Leu Ile Pro Phe Leu Ile Xaa Val Gly 145 150 155 Ile Cys Leu Ile Leu Ile Val Ile Phe Met Ile Thr Lys Leu Ser Arg 160 165 170 Asp Arg His Arg Ala Arg Arg Asn Arg Leu Arg Lys Asp Gln Leu Lys 175 180 185 190 Lys Leu Pro Val His Lys Phe Lys Lys Gly Asp Glu Tyr Asp Val Cys 195 200 205 Ala Ile Cys Leu Asp Glu Tyr Glu Asp Gly Asp Lys Leu Arg Ile Leu 210 215 220 Pro Cys Ser His Ala Tyr His Cys Lys Cys Val Asp Pro Trp Leu Thr 225 230 235 Lys Thr Lys Lys Thr Cys Pro Val Cys Arg Gln Lys Val Val Pro Ser 240 245 250 Gin Gly Asp Ser Asp Ser Asp Thr Asp Ser Ser Gln Glu Glu Asn Glu 255 260 265 270 Val Thr Glu His Thr Pro Leu Leu Arg Pro Leu.Xaa Phe Cys Gin Cys 275 280 285 Pro Xaa Xaa Phe Gly Ala Leu Xaa Gly Xaa Pro Ala His Xaa Gln Xaa 290 295 300 His Asp Arg Ile Ile Gin Thr Xaa Glu Glu Asp Asp Asn Glu Asp Thr 305 310 315 226 Asp Ser Ser Asp Ala GCu Glu 320 325 <210> 107 <211> 291 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42..-1 <300> <400> 107 Met Asp Ser Arg Val Scr Ser Val Gly Val Asn Asn Lys Arg -20 Ser Leu Ser Phe Leu Leu Val -5 Met Cys Leu Lys Ile Ile Lys Leu G-y Arg Ile Gin Ala Asp Val Leu Pro Cys Ile Asp Val 45 Thr Cys Asn Ile Pro Pro Gin 60 Thr Gin Val Asp Gly Val Val Ala Vai Ala Asn Vai Asn Asp Gin Thr Thr Leu Arg Asn Val I 105 Leu Ala Gly Arg Giu Giu Ile 120 125 Asp Ala Thr Giu Leu Trp Gly 135 140 Asp Val Arg Ile Pro Val Gin I 155 Giu Ala Thr Arg Giu Ala Arg 170 Met Ser Ala Ser Lys Ser Leu I 185 1 Ser Pro Ile Ala Leu Gin Leu 1 200 205 Ala Thr Clu Lys Asn Ser Thr I 215 220 Leu Giu Gly Ile Gly Gly Val E 235 Asn Lye Ala <210> 108 <211> 67 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -26..-1 <300> <400> 108 Met Ser Thr Trp Leu Leu Leu 1 -20 Val Ser Leu Phe Ile Asp Cys -5 Asn Ala Arg Glu Thr Ile Lys C Pro -35 Leu Ile Clu Lys 30 Phe lu Ial Leu 11 kla Ile .eu kla .ys .90 rg le er GJu Lys Gin Asp Lys Glu Aon Phe Sly Ile Tyr 15 Ala Va1 Ile Tyr His 95 Gly His Arg Gin Lye 175 Ser Tyr Val Tyr Val Thr Glu Lys Lye Leu Arg 80 Gin Thr Ser Val Arg 160 Va1 Ala 
Leu Phe Asp 240 Cys Gly Phe Pro 1 Arg Ala Gly Pro Val Asp Thr Arg 65 Ile Tyr Ala Thr Gin Thr Ile Gin 130 Ala Arg 145 Ser Met Leu Ala Ser Met Gin Thr 210 Pro Leu 225 Asn His ,Trp Ile Leu Phe Ile Val Gly Leu Asp Ser Phe Leu 115 Thr Va1 Ala Ala Va1 195 Leu Pro Lys Ser Val Leu Arg Ser Ala Leu 100 Ser Leu Glu Ala Glu 180 Leu Ser Met Lys Ile 5 Phe Ile Thr Va1 Va1 Leu Gin Leu Ile Glu 165 Gly Ala Thr Asn Leu 245 Trp Arg Leu Val Thr Ser Ala Ile Asp Lys 150 Ala Glu Glu Va1 Ile 230 Pro :le Ala Leu Lys Thr Leu Ile Thr Trp Pal Met Thr Arg Lys Leu Thr Asn Cys 1 ly Ile Gin Lys Arg Giu Ala Ser Asn is 227 Cys Phe Ala Ile-Arg His Phe Giu Asn Lys Phe Ala Val Giu Thr Leu 30 Ile Cys Ser <210> 109 <211> 127 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -63. l <300> <400> 109 Met Ser Ala Ala Gly Ala Arg Gly Leu Arg Ala Thr Tyr His Arg Leu -55 Leu Asp Lys Val Glu Leu Met Leu Pro Giu Lys Leu. Arg Pro Leu Tyr -40 A sn His Pro Ala Gly Pro Arg TIhr Val Phe Phe Trp Ala Pro Ile Met -25 Lys Trp Gly Leu Val Cys Ala Gly Leu Ala Asp Met Ala Arg Pro Ala -10 -5 1 Giu Lys Leu Ser Thr Ala Gin Ser Ala Val Leu Met Ala Thr Gly Phe.
10 Ile Trp Ser Arg Tyr Ser Leu Val Ile Ile Pro Lys Asn Trp Ser Leu 25 Phe Ala Val Asn Phe Phe Val Gly Ala Ala Gly Ala Ser Gin Leu Phe 40 Ar Ile Trp Arg Tyr Asn Gin Giu Leu Lys Ala Lys Ala His Lys 55 <210> 110 <211> 97 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -l <300> <400> 110 Met Lys Gly Trp Gly Trp Leu Ala Leu Leu Leu Giy Ala Leu Leu Gly -15 -10 Thr Ala Trp Ala Arg Arg Ser Arg Asp Leu His Cys Giy Ala Cys Arg 1 1 5 Ala Leu Val Asp Giu Leu Giu Trp Giu Ile Ala Gin Val Asp Pro Lys 20 Lys Thr Ile Gin Met Gly Ser Phe Arg Ile Asn Pro Asp Gly Ser Gin 35 Ser Val Val Giu Val Thr Val Thr Xaa Ser Pro Lys Thr Lys Val Ala 50 55 His Ser Gly Phe Trp Met Lys Ile Arg Leu Leu Lys Lys Gly Pro Trp 70 Ser <210> ill <211> 86 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -20. 1 <300> <400> ill Met Lys Gly Trp Gly Trp Leu Ala Leu Leu Leu Gly Ala Leu Leu Gly -15 -10 Thr Ala Trp Ala Arg Arg Ser Gin Asp Leu His Cys Gly Ala Cys Arg
I
Ala Leu Val Asp Giu Asp His Ser Val Ser Cly Cly Ala Ala Gly Gly <210> 112 <211> 71 <212> PRT <213> Homno Saple <220> <221> SIGNAL <222> -1 <300> <400> 112 Met Pro Ala Gly Ser Leu Leu Ala Arg Pro Asp Leu Lys Thr Giu Leu Ser Gin Gin Glu <210> 113 <211> <212> PRT <213> Homo sapie <220> <221> SIGNAL <222> -1 <300> <400> 113 Met Asp Gly His Met Ser Ser Trp Ser Leu Pro Gly Thr Thr Ser Phe <210> 114 <211> 118 <212> PRT <213> Homo sapie <220> <221> SIGNAL <222> -83.,.1 <300> <400> 114 Met Leu Pro Val Trp Arg His Leu Phe Ser Cys Leu Pro Cys Pro Ala His Val Gly Gin G1 Asj Cl, Asj rns 228 510 u Thr Arg Met Gly Asn Cys Pro Gly Gly Pro Gin 20 p Gly Ile Phe Pro Asp Gin Ser Arg Trp Gin Pro 35 y Ala Leu Cys Pro LeU Arg Gly Pro Pro His Arg 50 55 p Met Val Pro Met Ser Thr Tyr LeU Lys Met Phe Ala Ala -20 -15 Met Cys Ala Gly Ala Glu Val Val His Arg Tyr Tyr 1 Thr Ile Pro Glu Ile Pro Pro Lys Arg Gly Giu Leu 15 Leu Gly Leu Lys Giu Arg Lys His Lys Pro Gin Val 30 Glu Leu Lys ns Trp Ser Ala Ala Phe Ser Ala Leu Thr Val Thr Ala -35 Ala Arg Arg Arg Ser Ser Ser Ser Arg Arg Ile Pro -20 Ser Pro Val Cys Trp Ala Trp Pro Trp Tyr Pro Asp -5 1 Pro Leu A rg Cys Arg Gly Arg Val ~ns Gin Ser Phe Thr *Leu Val Ala Gin Ala Gly Val Gin -75 Ser Ser Leu Gin Leu Leu Pro Pro Ciu Phe Lys Gly -60 Ser Leu Pro Ser Ser Trp Asp Tyr Arg Arg Pro Pro -45 Gly Phe Phe Val Phe Leu Val Glu Thr Gly Leu His -30 -25 Ala Gly Leu Giu Leu Leu Thr Ser Cys Ser Pro Pro 229 10 Ala Ser Ala Ser Gln Ser Ala Ala Ile Thr Gly Val Ser His Val Pro 1 5 Gly Lys Lys Lys Leu Leu Lys Val Glu Lys Lys Asn Leu Arg Xaa Leu 20 Leu Thr Xaa Ile Lys Thr <210> 115 <211> 76 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-1 <300> <400> 115 Met Glu Leu Ile Ser Pro Thr Val Ile Ile Ile Leu Gly Cys Leu Ala -15 Leu Phe Leu Leu Leu Gln Arg Lys Asn Leu Arg Arg Pro Pro Cys Ile 1 5 Lys Gly Trp Ile Pro Trp Ile Gly Val Gly Phe Xaa Phe Gly Lys Ala 20 Pro Leu Glu Phe Ile Glu Lys Ala Arg Ile Lys Val Cys Gly Arg Gly 35 Xaa Arg Gly Leu Gln Arg Arg Gin Cys Phe Leu 
Phe <210> 116 <211> <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -52..-1 <300> <400> 116 Met Ala Glu Thr Lys Asp Ala Ala Gln Met Leu Val Thr Phe Lys Asp -45 Val Ala Val Thr Phe Thr Arg Glu Glu Trp Arg Gln Leu Asp Leu Ala -30 Gln Arg Thr Leu Tyr Arg Glu Val Met Leu Glu Thr Cys Gly Leu Leu -15 -10 Val Ser Leu Gly Gln Ser Ile Trp Leu His Ile Thr Glu Asn Gln Ile 1 5 Lys Leu Ala Ser Pro Gly Arg Lys Phe Thr Asn Ser Pro Asp Glu Lys 20 Pro Glu Val Trp Leu Ala Pro Gly Leu Phe Gly Ala Ala Ala Gln 35 <210> 117 <211> 82 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-1 <300> <400> 117 Met Glu Leu Ile Ser Pro Thr Val Ile Ile Ile Leu Gly Cys Leu Ala -15 Leu Phe Leu Leu Leu Gln Arg Lys Asn Leu Arg Arg Pro Pro Cys Ile 1 5 Lys Gly Trp Ile Pro Trp Ile Gly Val Gly Phe Glu Phe Gly Lys Ala 20 Pro Leu Glu Phe Ile Glu Lys Ala Arg Ile Lys Tyr Gly Pro Ile Phe 35 Thr Val Phe Ala Met Gly Asn Arg Met Thr Phe Val Thr Glu Glu Gly 50 Arg Asn <210> 118 <211> 89 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -16..-1 <300> <400> 118 Met Ile Ile Ser Leu Phe Ile Tyr Ile Phe Leu Thr Cys Ser Asn Thr -10 Ser Pro Ser Tyr Gln Gly Thr Gln Leu Gly Leu Gly Leu Pro Ser Ala 1 5 10 Gln Trp Trp Pro Leu Thr Gly Arg Arg Met Gln Cys Cys Arg Leu Phe 25 Cys Phe Leu Leu Gln Asn Cys Leu Phe Pro Phe Pro Leu His Leu Ile 40 Gln His Asp Pro Cys Glu Leu Val Leu Thr Ile Ser Trp Asp Trp Ala 55 Glu Ala Gly Ala Ser Leu Tyr Ser Pro <210> 119 <211> <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -19..-1 <300> <400> 119 Met Thr Met Ala Glu Cys Pro Thr Leu Cys Val Ser Ser Ser Pro Ala -10 Leu Trp Ala Ala Ser Glu Thr Thr Asp Asp Val Cys Arg Glu 1 5 <210> 120 <211> 115 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -103..-1 <300> <400> 120 Met Val Ile Arg Val Tyr Ile Ala Ser Ser Ser Gly Ser Thr Ala Ile -100 -95 Lys Lys Lys Gln Gln Asp Val Leu Gly Phe Leu Glu Ala Asn Lys Ile -80 Gly Phe Glu Glu Lys Asp Ile Ala Ala Asn Glu Glu Asn Arg Lys Trp -65 Met Arg Glu Asn Val Pro Glu Asn Ser Arg Pro
Ala Thr Gly Asn Pro -50 -45 Leu Pro Pro Gin Ile Phe Asn Glu Ser Gin Tyr Arg Gly Asp Tyr Asp -30 Ala Phe Phe Clu Ala Arg Glu Asn Asn Ala Val Tyr Ala Phe Leu Gly -15 Leu Thr Ala Pro Ser Gly Ser Lys Glu Ala Gly Arg Cys Lys Gln Ser 1 Ser Lys Pro 231 <210> 121 <211> 105 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -76..-1 <300> <400> 121 Met Pro Leu Leu Cy, Gin Met Thr Met LeL Leu Leu Pro Leu Gin Ile Lys Val Ser Ser Ile Val Tyr Leu Trp 10 Thr Gly Leu Ile Val Glu Leu Arg Gln Val <210> 122 <211> 93 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-1 <300> <400> 122 Met Lys Pro Val Leu Leu Gln Leu Val Pro Leu Glu Pro Pro Pro Cys Thr Met Gin Glu Phe Cys Gly Ile Val Ile Lys His Lys Gly <210> 123 <211> 109 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42..-1 <300> <400> 123 Met His Ile Leu Gln Ile Val His Cys Pro Asp Leu Val Cys His Leu Gln Glu Gin Lys Ala Ile Tyr Ala Ser s Gin SGin -55 SGin Val Val Ser 10 Val Pro Gly Cys Asp Cys Ser lie -70 Ser Thr Thr Val Leu Glu Leu Ser 1 Ile Cys Ser Glu Glu Met Lys Ile Ser -5 Glu Val Clu Cys Ala Pro Ile Glu Leu Gin Pro 20 Gly Thr Met Tyr Leu Lou Leu Asp Leu Val Ser -50 Asn Leu Asp Phe Thr Arg Trp Phe Ala Ser Ser Ser 1 Leu Ala Pro Leu 15 Trp Pro Lys Leu Asn Glu Gin Phe -15 Pro Lys Ser Ala Glu Lys 35 Ser Glu 50 Val Ile Phe Cys Leu Ala Leu Lys Tyr Ile Cys Thr His Leu Cys Cys Ser Ser Lys Arg Asn Arg Asn Leu Leu Thr Thr Val Asp Asp Gly Ile Gin Ala -35 Asp Thr Gly Lys Asp Ile Trp Asn Leu Leu Phe -20 Glu Phe Cys Gln Ser Asp Asp Pro Ala Ile Ile -5 1 Thr Val Leu Ala Ser Val Phe Ser Val Leu Ser 15 Gin Thr Glu Gln Glu Tyr Leu Lys Ile Glu Lyz 232 30 Val Asp Leu Pro Leu Ile Asp Ser Leu Ile Arg Val Leu Gin Asn Met 45 Glu Gin Cys Gin Lys Lys Pro Glu Asn Ser Ala Gly Val 60 <210> 124 <211> 51 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -15..-1 <300> <400> 124 Met Arg Leu Val Pro Leu Gly Gin Ser Phe Pro Leu Ser Glu Pro Arg -10 -5 1 Cys Leu Gin Pro Val Lys Trp Asp His Asn His Cys Leu Thr Ser Leu 10 Thr Val Val Val Arg 
Thr Glu Cys Val Glu Val Phe His Lys Leu Trp S 20 25 Met Leu Val <210> 125 <211> 56 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -27..-1 <300> <400> 125 Met Asn Arg Val Pro Ala Asp Ser Pro Asn Met Cys Leu Ile Cys Leu -20 Leu Ser Tyr Ile Ala Leu Gly Ala Ile His Ala Lys Ile Cys Arg Arg -5 1
5
Ala Phe Gln Glu Glu Gly Arg Ala Asn Ala Lys Thr Gly Val Arg Ala 15 Trp Cys Ile Gln Pro Trp Ala Lys <210> 126 <211> 162 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -21..-1 <300> <400> 126 Met Leu Gln Thr Ser Asn Tyr Ser Leu Val Leu Ser Leu Gln Phe Leu -15 Leu Leu Ser Tyr Asp Leu Phe Val Asn Ser Phe Ser Glu Leu Leu Gln 1 5 Lys Thr Pro Val Ile Gln Leu Val Leu Phe Ile Ile Gln Asp Ile Ala 15 20 Val Leu Phe Asn Ile Ile Ile Ile Phe Leu Met Phe Phe Asn Thr Ser 35 Val Phe Gln Ala Gly Leu Val Asn Leu Leu Phe His Lys Phe Lys Gly 50 Thr Ile Ile Leu Thr Ala Val Tyr Phe Ala Leu Ser Ile Ser Leu His 65 70 Val Trp Val Met Asn Leu Arg Trp Lys Asn Ser Asn Ser Phe Ile Trp 80 85 Thr Asp Gly Leu Gln Met Leu Phe Val Phe Gln Arg Leu Ala Ala Val 100 105 Leu Tyr Cys Tyr Phe Tyr Lys Arg Thr Ala Val Arg Leu Gly Asp Pro 110 115 120 His Phe Tyr Gln Asp Ser Leu Trp Leu Arg Lys Glu Phe Met Gln Val 125 130 135 Arg Arg 140 <210> 127 <211> 126 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -68..-1 <300> <400> 127 Met Ala Ser Ala Ser Ala Arg Gly Asn Gln Asp Lys Asp Ala His Phe -60 Pro Pro Pro Ser Lys Gln Ser Leu Leu Phe Cys Pro Lys Ser Lys Leu -45 His Ile His Arg Ala Glu Ile Ser Lys Ile Met Arg Glu Cys Gln Glu -30 Glu Ser Phe Trp Lys Arg Ala Leu Pro Phe Ser Leu Val Ser Met Leu -15 -10 Val Thr Gln Gly Leu Val Tyr Gln Gly Tyr Leu Ala Ala Asn Ser Arg 1 5 Phe Gly Ser Leu Pro Lys Val Ala Leu Ala Gly Leu Leu Gly Phe Gly 20 Leu Gly Lys Val Ser Tyr Ile Gly Val Cys Gln Ser Lys Phe His Phe 35 Phe Glu Asp Gln Leu Arg Gly Ala Gly Phe Gly Pro Thr Ala 50 <210> 128 <211> 140 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -40..-1 <300> <400> 128 Met Thr Ser Met Thr Gln Ser Leu Arg Glu Val Ile Lys Ala Met Thr -35 -30 Lys Ala Arg Asn Phe Glu Arg Val Leu Gly Lys Ile Thr Leu Val Ser -15 Ala Ala Pro Gly Lys Val Ile Cys Glu Met Lys Val Glu Glu Glu His 1 Thr Asn Ala Ile Gly Thr Leu His Gly Gly Leu Thr Ala Thr Leu Val 15 Asp Asn Ile Ser Thr Met Ala Leu Leu Cys Thr Glu Arg Gly Ala Pro 30 35 Gly
Val Ser Val Asp Met Asn Ile Thr Tyr Met Ser Pro Ala Lys Leu 50 Gly Glu Asp Ile Val Ile Thr Ala His Val Leu Lys Gln Gly Lys Thr 65 Leu Ala Phe Thr Ser Val Gly Leu Thr Asn Lys Ala Thr Gly Lys Leu 80 Ile Ala Gln Gly Arg His Thr Lys His Leu Gly Asn 95 100 <210> 129 <211> 43 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -24..-1 <300> <400> 129 Met Gln Cys Phe Ser Phe Ile Lys Thr Met Met Ile Leu Phe Asn Leu -15 Leu Ile Phe Leu Cys Gly Phe Thr Asn Tyr Thr Asp Phe Glu Asp Ser 1 Pro Tyr Phe Lys Met His Lys Pro Val Thr Met <210> 130 <211> 69 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -21..-1 <300> <400> 130 Met Trp Trp Phe Gln Gln Gly Leu Ser Phe Leu Pro Ser Ala Leu Val -15 Ile Trp Thr Ser Ala Ala Phe Ile Phe Ser Tyr Ile Thr Ala Val Thr 1 5 Leu His His Ile Asp Pro Ala Leu Pro Tyr Ile Ser Asp Thr Gly Thr 20 Val Ala Pro Glu Lys Cys Leu Phe Gly Ala Met Leu Asn Ile Ala Ala
35 Val Leu Cys Gln Lys <210> 131 <211> 78 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -19..-1 <300> <400> 131 Met Ser Pro Gly Ser Ala Leu Ala Leu Leu Trp Ser Leu Pro Ala Ser -10 Asp Leu Gly Arg Ser Val Ile Ala Gly Leu Trp Pro His Thr Gly Val 1 5 Leu Ile His Leu Glu Thr Ser Gln Ser Phe Leu Gln Gly Gln Leu Thr 20 Lys Ser Ile Phe Pro Leu Cys Cys Thr Ser Leu Phe Cys Val Cys Val 35 40 Val Thr Val Gly Gly Gly Arg Val Gly Ser Thr Phe Val Ala <210> 132 <211> <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -47..-1 <300> <400> 132 Met Arg Leu Pro Pro Ala Leu Pro Ser Gly Tyr Thr Asp Ser Thr Ala -40 Leu Glu Gly Leu Val Tyr Tyr Leu Asn Gln Lys Leu Leu Phe Ser Ser -25 Pro Ala Ser Ala Leu Leu Phe Phe Ala Arg Pro Cys Val Phe Cys Phe -10 -5 1 Lys Ala Ser Lys Met Gly Pro Gln Phe Glu Asn Tyr Pro Thr Phe Pro 10 Thr Tyr Ser Pro Leu Pro Ile Ile Pro Phe Gln Leu His Gly Arg Phe 25 <210> 133 <211> 53 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42..-1 <300> <400> 133 Met Asp Ser Arg Val Ser Ser Pro Glu Lys Gln Asp Lys Glu Asn Phe -35 Val Gly Val Asn Asn Lys Arg Leu Gly Val Cys Gly Trp Ile Leu Phe -20 Ser Leu Ser Phe Leu Leu Val Ile Ile Thr Phe Pro Ile Ser Ile Trp -5 1 Met Cys Leu Lys Ile <210> 134 <211> 1053 <212> DNA <213> Homo sapiens <220> <221> sig_peptide <222> 131..169 <223> Von Heijne matrix score 4.19999980926514 seq MLAVSLTVPLLGA/MM <221> polyA_site <222> 1042. 
.1053 <300> <400> 134 gagcgagtcg gacgggctgc gacagcgccg gcccctgcgg ccgcaggtcg tcacagacga tgacggccag gccccggagg ctaaggacgg cagctccttt agcggcagag tctccgagt 120 gaccttcttg atg ctg gct gtt tct ctc acc gtt ccc ctg ctt gga gcc 169 Met Lau Ala Val Ser Leu Thr Val Pro Leu Leu Gly Ala atg atg ctg ctg gaa tct cct ata gat cca cag cct ctc agc ttc aaa 217 Met Met Lau Leu Giu Ser Pro Ile Asp Pro Gin Pro Leu Ser Phe Lys 1 5 10 gaa ccc ccg ctc ttg ctt 99t gtt ctg cat cca aat acg aag ctg cga 265 Giu Pro Pro Leu Leu Leu Gly Val Leu His Pro Asn TIhr Lys Leu Axg 25 cag gca gaa agg ctg ttt gaa aat caa ctt gtt gga ccg gag tcc ata 313 Gin Ala Glu Arg Leu Phe Glu Asn Gin Leu Val Gly Pro Giu Ser Ile 40 gca cat act ggg gat gtg atg ttt act ggg aca gca gat ggc cgg gtc 361 Ala His Ile Gly Asp Val Met Phe Thr Gly Thr Ala Asp Gly Arg Val 55 gta aaa ctt gaa aat ggt gaa ata gag acc att gcc cgg ttt ggt tcg 409 Val Lys Leu Glu Asn Gly Giu Ile Giu Thr Ile Ala Arg Phe Gly Ser 70 75 so ggc cct tgc aaa acc cga ggt gat gag cct gtg tgt ggg aga, ccc ctg 457 Gly Pro Cys Lys 'Thr Arg Gly Asp Glu Pro Val Cys Gly Arg Pro Leu 90 qgt atc cgt gca, ggg ccc aat ggg act ctc ttt gtg gcc gat gca tac 505 Gly Ile Arg Ala Gly Pro Asn Gly Thr Leu Phe Val Ala Asp Ala Tyr 100 105 110 aag gga cta ttt gaa gta aat ccc tgg aaa cgt gaa gtg aaa ctg ctg 553 236 Lys Gly Lou Phe Glu Val Asri Pro Trp 120 Lys Arg Glu Val Lys Leu Leu Ctg tcc Lou Ser 130 gat ctt Aso Leu gag aca ccc act gag 9gg aag aac atg tcc ttt gtg aat Giu Thr Pro Ile Glu Gly Lys Asn Met Ser Phe Val Asn aca gtC act cag Thr Val Thr Gin ggg agg aag Gly Arg Lys ttc acc gat ect Phe Thr Asp Ser agc aaa tgg Ser Lys 'rrp aga cga gac tac Arg Arg Asp Tyr Ctt ctg gtg atg Lou Lou Val Met gag 9gc Clu Gly 175 aca gat gac Thr Asp Asp aaa gtt tta Lys Val Leu 195 etg ctg gag Leu Lou Glu, gat act qtg cc Asp Thr Val Thr gac cag ctg Asp Gin Leu ccg aat gga Pro Asn Gly 4gg gad gta Arg Glu Val 190 cog ctg tct Gin Leu Ser gcc egg eta Ala Arg Ile cot gca Pro Ala 210 cga aga Arg Arg gee gee 
ttt tcc Glu Asp Phe Val gcg gca gaa aca Val Ala Giu Thr 601 649 697 745 793 841 889 937 985 1033 1053 gtc tao gtt Vai Tyr Val ctg atg aag Leu Met Lys ggg gct gat ctg Gly Ala Asp Leu gtg gag eac atg Val Glu Asn met ttt Cce gac Phe Pro Asp egg ccc agc Arg Pro Ser agc tet Scr Ser 255 ggg ggg tac Gly Gly Tyr tee atg ctg Ser Met Leu 275 ggc atg tcg Gly Met Ser egc cot aac Arg Pro Asn Cot ggg ttt Pro Gly Phe 270 agg atg act Arg Met Ile ttc tta tot Phe Leu Ser ccc tgg at Pro Trp Ile ttC dog gta aaaaaaaaaa a Phe Lys Val 290 <210> 135 <211> 675 <212> DNA <2i3> Homo sapiens <220> <221> P01YA..Signal <222> 638. .643 <221> POlYk..site <222> 662. .675 <300> <400> 135 aocgaacagg aacagcacaa cctgggaccc gtagcagtgg ttcagcacac tttggtatgt agaeatgeag taco tctacg caaagtaaaa tgactgtta atg atg tae gtt tcc Met Met Tyr Val Ser ata gaa atg Ile Clu Met tgt tac act Cys Tyr Ile aae tot tgg Lys Ser Trp tca ggt cca acc 8.tt toc cat ttg tto gac tat gtg gtc Ser Cly Pro Thr Ile Ser His Lou Phe Asp Tyr Val Val 15 tat ggc tta aag tee ttt tct ctt aaa cag tta aaa aaa Tyr Gly Leu Lys Scr Phe Ser Leu Lys Gin Leu Lys Lys 30 tct eag tat tta tct gaa tcc tgt tgc tat egg agt ttg Ser Lys Tyr Leu Phe Giu, Ser Cys Cys Tyr Arg 5cr Lou .45 so tat gtg tgc gto ttc act Laaacatecc tgcataeaaa gatggtttat Tyr Val Cys Val Phe Ile ttctatttee tatgtgacat ttgcttcctg gatatagtcc Vtgaaccaca agatttatca 237 ccgtaaattg ttaaccattt tatgttcaga tatttcaa taateitgaga agaaaatggg 426 cataacctta tatgttgaca caataattca cagagaagaa catttaaagg gtaatacttt ttatcgtggc tctatttga aatgtgtcta aaaaaaaaa <210> 136 <211> 1112 <212> DNA <213> Homo sapiens <220> <221> sig..peptlde <222> 111-.194 <223> Von Heine matrix score 4.80000019073486 seq GVLLEPFVHQVGG/HS aacatagaga gaataatttg ttgaaacgtt aaataaatgc ccagcaag tg t taaagataa ttcagataat tgttattta aatatatatg actaattttt atctatttga aaatgaaaaa <221> pOlYA..signai <222>1080.,.1085 <221> polYA..Site <222> 1101.-1112 <300> <400> 136 ccgagagaga ctacacggta ctgggacaca cggacaaaca ccgctggact ccgctgcctc ccccatctcc 
ccgccatctg acagacagaa gacgtactgg cgcccggagg atg agc cca gcc Pro Ala ttc agg gcc atg Phe Arg Ala Met gtg gag ccc cgc gcc aaa gg( Val Glu Pro Arg Ala Lys Gly Met Ser gtc ct~t Val Leu ctc cgc Leu Arg gag ccc ttt gtc Glu Pro Phe Val 486 546 606 666 675 116 164 212 260 308 356 401 461 521 581 641 701 761 821 881 941 1001 1061 1112 cag gtc ggg ggg cac tca tgc gcg Gln Val Gly Gly His Ser Cys Val
1
aat gag aca acc ctg tgc aag Asn Glu Thr Thr Leu Cys Lys ctg gtc cca agg Leu Val Pro Arg gaa cat cag Glu His Gin ccc cag tac Pro Gin Tyr ttc tac gag Phe Tyr Glu aaa gga caa Lys Gly Gin ctc cct gct Leu Pro Ala cgc aaa ttc Arg Lys Phe agc caa agg Ser Gin Arg ctt gtt agc tgg Leu Val Ser Trp tcc ctg ccc cat Ser Leu Pro His ttc ccc tgg tcc ttt ccc ctg tgg cca cag gga agt gtg gcc Phe Pro 'rrp Ser Phe Pro Leu Trp Pro Gin Gly Ser Val Ala tgaatacccc accccggctc tctctctgag cacgcattcc ggcagagagg cctgagagga ctgctgaaac gcccccacct aggttctaga gacttaaggg aaccagctct atctgccttg aataaaaaca tattttataa gggaagcctt tgctagggtg gcctgctta cccagccctc g~.tccccctg tattcaggct tagggcatta aaaattcccg aagaattaca taaaaattaa <210> 137 <211> 547 <212> DNA <213> Homo sapiens <220> <221> sig~peptide <222> 359. .454 ctctgcaccc cc tgcagcag aaggtgttca gacagcccca gaaaagc tgc tgttcatttt gaacaaaagg tcttgtgctg ccttactctt ctgctttaaa caaactataa taaacatttt agagctgggg tcgaggactg gccagtcgtt tcctcaaaga tttcaaggcc gttattttgt cctgggtgcc tgtggtttgt ggatgcttc t gcaagc -catg agagcaatgt caatgatgga gccacctcag agcagattga tgtaaggcgc ctgtcttaat accacatgtc gacgtgagac tacccgtgtg t ttgt ttgcc taaccc tcag aggctgttgg tttcagtctt aaaaaaaaaa aagtgtcatc gtgatgctgg tcgtcggcac tactcatggc tgtgctcccc agcaaagacc ggggcact.gt cctttatttt gcaaacctgt agtttctgtt t taggat tag a 238 <223> Von Heijn4 matrix score 4 seq FSFMLLCMGCCLP/GF <221> polyA <222> 536. 
<300> <400> 137 ctggggagcc cagctagcct ggttttsttg aaggtggaag atCtttatg C tOcccaCCa Atg ttg ggg Met Leu Gly site 547 ctgcctaaga ctcatgctac aagaagttaa ctcatccctt ttctactgag aggaagtgga tg~gctggcc ttggaattaa accaccacca gclgtgcaaaa atqccatccc catgcttgtc scgcctttcc tgaatcacag gtgcattggu actttgtgaa cacaacccac tagaggagc acc acg ggCc tc ggg aca cag ggt Thr Thr Gly Leu Gly Thr Gln Gly ataagtttcc atgcactccg acacact:::t tgccaggeaa gtgct::cctc tatctcagca cgaagtcaca acasqgatcaa ggattatcag cctggegtcc ctccccagga ccc tcc cag cag get Pro Ser Gln Gin Ala ctg ggc ttt ttc tee ttt atg tta ctt gga atg ggc ggg tgc ctg, cct Leu Gly Phe Phe Ser Phe Met Leu Leu Gly Met Gly Gly Cys Leu Pro -10 gga tte ctg eta cag cct ccc dat cga tet cct act ttg cct gca tcc Gly Phe Leu Leu Gin Pro Pro Asn Arm e, Pro Th l' 10 acc ttt gcc Thr Phe Ala cat taaagtcaat His tctccaccca taaaaaaaaa aaa <210> 138 <211> 1198 <212> DNA <213> Homo sapiens <220> <22 1> <222> <2 23> sig...peptide 26. .316 Von Heijne matrix score 4 seq RLPLVVSFIASSS/AN 120 1.80 240 300 358 406 454 502 547 52 100 148 196 244 292 340 42221> poly~signal <222> 1164. .1169 <221> Polyk..site <222> 1187. .1198 <300> <400> 138 atcctgcgaa agaagggggt tcatc atg gcg gat Met Ala Asp -95 gac cta aag ega Asp Leu Lys Arg ttc ttg Phe Leu gtg tea Val Ser tat aaa aag Tyr Lys Lys gat aga gat Asp Arg Asp cea agt gtt gaa ggg ctc: cat gce att Pro Ser Val Glu Gly Leu His Ala Ile gta cot gtt Val Pro Val aaa gtg gca sat Lys Val Ala Asn aat get eca Asn Ala Pro gag cat Glu His gct ttg cga cct Ala Leu Arg Pro tta tee act ttt gcc ett ges aca Leu Ser Thr Phe Aia Leu Ala Thr esa gga agc: aaa ctt gga ctt tcc as aat a agt atc ate Gin Gly Ser Lys Leu Gly Leu Ser Lys Asn Lys Ser Ile Ile tat sac sc Tyr Asr. Thr cag gtg gtt csa ttt aat cgt tta cct ttg Gin Val Val Gin Phe Asm Arg Leu Pro Leu gtg ag: ttc Val Ser Phe gee agc age agt gcc aat sea gga eta att Ala Ser Ser Ser Ala Asn Thr Gly Leu Ile -101 gte age Vai Ser eta gaa sag gag ctt gct cca ttg ttt gas gaa etg aga caa. 
gtt gtg 239 Leu Glu Lys Glu'Leu Ala pro Leu Phe Glu Giu Leu Arg Gin Val Val gaa gtt tct taatctgaca gtggtttcag tgtgtacctt atcttcatta Glu Val Ser taacaacaca atatcaa caagaaaggg ccccttt gatagatcag ttgctat taccacagaa atggttc ggatgagaga ttctatt togtgctg ccaattg cgaataacaa taaggac ggtacataaa atggctt acatactatt ttgcagt ctgctcteta aegaaaa taaccaatca gtgcttc tctgtttcca atgttagl agtaaaaata aaatagc <210> 139 <211>- 1400 <212> DNA <213> Homo sapiens <220>.
<221> sig~peptide <222> 36. .107 tcc ttc att ag t cag tac at& ;g t ata tga at tat att agcaatcttt cacttatac tctggtgta ctatcacagc tggattagaa ao tatgccca tttttcttca aaaagtaaca gatgadtatt 888 ttatagc gtttgtgtgt gtatgtaaac t ttaaaagta agactacaat taeagagc ta ggctttct tcccatggag tcaaactggt ggcttgcaga gactatgttt 888 tcagtac actaatcagt taatgttttt Ctccatdaa atgatagcac aaaaaaaaaa aatgctttta gcatatagat tatttagtga ttagtctggt aattgatcc ataaagccaa tattctttg aatcactaac Ctgattattc tcctcaaact atttaaatac agccattttt a tccatgtgct gtaatttata gatctaggga caccagatat act cgagccg ctttttattg cat tgagtga tttcctcct tcagagggtg ctgctttctg aattcgttat catgtg <223> Von Heijne matrix score 5.69999980926514 seq ILGLLGLLGTLVA/ML <221> polyk..signal <222> 1302.. 1307 <221> polyA-site <222> 1389. .1400 <300> <400> 139 cagcccctga agacgcttct actgagagge< caa ctt gtg ggc tac atc cta ggc c Gin Leu Val Gly Tyr Ile Leu Gly I 437 497 557 617 677 737 797 857 917 977 1037 1097 1157 1198 53 101 149 197 245 293 341 389 437 485 cegcc atg gcc tct ctt ggc ctc Met Ala Ser Leu Gly Leu :tt ctg ggg ctt ttg ggc aca ctg ,eu Leu Gly Leu Leu Gly Thr Leu gtt gcc atg Val Ala Met ctc ccc agc Leu Pro Ser tgg aaa aca agt tct tat gtc ggt gcc Trp Lys Thr Ser Ser Tyr Val Gly Ala agc Ser att gtg aca gca gtt 99c ttc tcc aag ggc ctc tgg atg gaa Ile Val Thr Ala Val Gly Phe Ser Lys Gly Leu Trp, Met Glu gcc aca cac agc aca Ala Thr His Ser Thr ggc atc acc cag tgt gac atc tat agc acc Gly Ile TIhr Gin Cys Amp Ile Tyr Ser Thr ctg ggc ctg Leu Gly Leu tcc agt gca Ser Ser Ala gct gac atc cag Ala Asp Ile Gin tcc tcc ctg gcc Ser Ser Leu Ala gcc cag gcc atg Ala Gin Ala Met gtg aca Val. Thr tgc att atc tct Cys Ile Ile Ser gtg ggc atg Val Gly Met aga tgc aca gtc ttc tgc cag Arg Cys Thr Val ?he Cys Gin gta gca ggt gga gtc ttt ttc Val. Ala Gly Gly Val Phe Phe gaa tcc cga gcc Giu Ser Arg Ala gac aga gtg gcg Asp Arg Val. Ala a tc Ile ctt. 
gga ggc ctc ctg gga ttc att Leu Gly Gly Leu Leu Gly Phe Ile 105 110 cct gtt gcc tgg aat ctt cat ggg atc cta cgg gac ttc tac tca cca Pro Val Ala Trp Asn Leu His Gly Ile Leu Arg Asp Phe Tyr Ser Pro ctg gtg cct Leu Val Pro 115 gac agc atg aaa ttt gag Asp Ser Met Lys Phe Glu 130 135 120 att gga gag gct ctt Ile Gly Glu Ala Leu 125 tac ttg Tyr Leu ggc att Gly Ile ttt tcc Phe Ser 160 c gcc Gln Ala 175 ccc aaa Pro Lys att tct tcc ctg ttc tcc ctg ata gct gg atc atc ctc tgc Ile Ser Ser Leu Phe Ser Leu Ile Ala Gly Ile Ile Leu Cys
tca tcc cag Ser Ser Gln aat cgc tcc aac Asn Arg Ser Asn tac gat gcc tac Tyr Asp Ala Tyr c~ia cct ctt gcc acd agg agc tct cca agg cct ggt cad cct Gln Pro Leu Ala Thr Arg Ser Ser Pro Arg Pro Gly Gln Pro gtc aag agt gag ttc aat Val Lys Ser Glu Phe Asn tgaagaacca ggggccagag cccgdgggcc acaggtgagg tagactgact ttggccattg gcaggttgaa ttgccaagga
ctcccctgcc ctaagtcccc agaggatccc tttgccctct cccactgact gaccctctgc cattgctggg gatgggaagg tcaagcttcc ctccasagaa ctcscagtg tccagactas gggaaeagaa sgcaggstgc aataaas aaa <210> 140 <211> 538 <212> DNA <213> Homo sapiens <220> <221> sig-.peptide <222> 35. .130 ctgggggg tg gacactacca ga ttgagcaa tgctcgccat aaccctcaac ggtttacctg gatcaaagac agaagcagcg actgattggc tttgtgcatg aggatgggag tcc tac age Ser Tyr Ser 200 gctgggtctg Ctggatcgtg aggcagsaaat gccagcctt t tgaaacccc ggactccatc CCtCCCtctg gcttttgtgg cc tggaacct daetgaaata gacaggaagg ctg aea ggi Leu Thr Gl., tgaaaaacag tcagaaggtg gggggc tag t ctgttttcct atcettaas cccaaaccca gctgaggttg gcattgctct ccateecact daaccacct cagcctggga 190 tat gtg Tyr Val 205 tggacagcac ctgetgaggg gtaacagcat cacct tgetg gecaggacte ctaatcacst gctcttagct aacctacttc cttgttatga acggtaccca cat ttaaaaa 581 629 677 725 785 845 905 965 1025 1085 1145 1205 1265 1325 1385 1400 103 151 199 247 300 <223> Von Heijne matrix score 8 seq VPMLLLIAGGSFG/LR <221> POiyA.signal <222> 505. .510 <221> polyAsite <222> 526.-538 <300> <400> 140 gcttggagtt ctgagccgat ggaggagttc aetc atg ttt gca ctc gcg gtg atg Met Phe Ala Leu Ala Val Met get ttt cgc aag Ala Phe Arg Lys aac sag act etc ggc- tat gga gtc ccc atg ttg Asn Lys Thr Leu Gly Tyr Gly Val Pro Met Leu -15 ttg Ctg att gct gga gcgt tct ttt ggt ct~t cgt gag ttt tct cad atc Leu Leu Ile Ala Gly GJly Ser Phe Gly Leu Arg Glu Phe Ser Gin Ile 1 cga-tat gat gct gcg aag agt aaa atg gat ect gag ctt gas ass aaa Arg Tyr Asp Ala Val Lys Ser Lys Met Asp Pro Giu Leu Glu Lys Lys 15 ccg ass Pro Lys gag sat asa ata Giu Asn Lys Ile tct tta gag tcg gas tat gag gga agt ate Ser Leu Giu Ser Glu Tyr Giu Gly Ser Ile 30 tgt tgaagggcta ctatctttcc ttggeccttc teccttgtcg ggactcaatc Cys teeagactat ctccccagag aatcttgtca aggcttggct ttaagctttg ttgggaaaat caaagactcc aagtttgatg actggaagaa 420 tgacctcctc caaggaagaa atccagaaag <210> 141 <211> 1167 <212> DNA <213> Homo sapiens <220> <221.- sig-pepcide <222> 169. 
.267 <223> Von Haijne matrix score 7.80000019073486 seq LTFLFLHLPPSTS/LF <221> polyk..signal <222> 1132. .1137 <221> polyA,.site <222> 1155.-1167 <300> <400> 141 gtaggaacta ctgtcccaga gctgaggcaa tgcettagta gtagtteaaa gtagtaactg gaaatttgaa gaccagatca tgggtggtct 241 tattcgagga cccaggcctt gggaa ccttaagact aagacaactt. gactctgctg aatactatta actggaaaaa aaaaaaaa gatcc ggggatttct Ctactgtatt gcatgtgaa t aca gcc tgg ctg tca ttg ctt tct tcc tcc cca Thr Ala Trp Leu Ser Leu Leu Ser Ser Ser Pro -25 -20 gcc ctt aca ttt ttg ttt ctc cat cta cca cca Ala Leu Thr Phe Leu Phe Leu His Leu Pro Pro caggtcattt ggagaacaag tagtggggtg gaattcagaa gaacagga atg agc cag Met Ser Gin ttt gga ccc ttc tct Phe Gly Pro Phe Ser Itcc acc agt cta ttt Ser Thr Ser Leu Phe 480 538 120 177 225 273 321 369 417 472 532 592 652 712 772 832 892 952 1012 1072 1132 1167 att aac ttA Il.e Asn Leu gca aga gga caa ata aag ggc cct ctt Ala Arg Gly Gin Ile Lys Gly Pro Leu ttg att, ttg Leu Ile Leu ctt ctt Leu Leu tct ttc tgt gga gga tat act aag tgc Ser Phe Cys Gly Cly Tyr Thr Lys Cys gac ttt gcc cta tcc Asp Phe Ala Leu Ser att atg gat cca aaa Ile MeZ Asp Pro Lys ttg gaa atc ccc aac aga att gag ttt.
Leu Glu Ile Pro Asn Arg Ile Glu Phe aga aaa aca aaa Arg Lys Thr Lys taatgaagcc atcagtcaag ggtcacatgc caataaacaa taaattttcc agaagaaatg tcagtaagga tgagcttgtt tctcgctctg tcactcaggc cctcccgggt tcaagccatt gtgccaccat gcctggctaa tcgggctggt ctcgggctcc ggattacaga tgtgagccac ttctgtgtca tggttggaag aggtgattca tggctctgtg gaagaatatg agtcagttat ttttgaactg ggaaacacct acaaatgctt attttcacat <210> 142 <211> 730 <212> DNA <213> Homo sapiens <220> <221> sig_peptide <222> 143..238 aaatccaact gttttttgtt tggagtgcag ct CCtgcc te tttttgtgtt tgacctcttg cgtgcctagc acagagtagg aatttgaggt tgccagccctt tgtctgcatt cgaaaaaaaa agacaaataa ttgttttgtt tggtatgatc agtctcctga tttggtagag atccgcctgc caaggatgag aaggatatgg gaatggttcc ggaatttact cactttaaaa aaaaa agtagagctt ttggctcact gtagctggga acagggtttc cttggcctcc at tt t taaag aaaaggtcat ttattgtcta tctctagctt tgtcaaaact atgaaatggt aaagacggag gtaacctccg t tgcaggtgc accacgttgg caaagtgatg tatgttccag ggggaagcag ggccacttgt acaacggacc aatttttata <223> Von Heijne matrix score 8.80000019073486 seq V.
<221> poly; <222> 697..
<221> poly; <222> 721. <300> <400> 142 nctttgcctt cttggtgaga gagccgatgg cgc aag aac Arg Lys Asr
PMLLLIVGGSFG/LR
-signal 702 -site 730 tctntccaca ggtgtccnct cccaggtcca actgcagact tngaattcgt gcgtgagctg ccgagattcg ggagtctgcg ctaggcccgc ttggagttct aagagetecac tc atg ttc gca ccc gcg gcg acg cgt gct tt Met Pho Akla Pro Ala Val Thr Arg Ala Phe aag act ccc gc tat gga Lys Thr Leu Gly Tyr Gly g r.c Val1 ccc atg ceg Pro Met Leu Ctg ctg ate Leu Leu Ile gte. gga Val Gly gce geg Ala Vil g9t tc te ggt ctt cgc gag tee cct caa atc cga tat gat Gly Ser Phe Gly Leu Arg Giu Phe Ser Gin Ile Arg Tyr Asp aag age aaa Lys Ser Lys aeg gat ccc gag Met Asp Pro Glu gaa aaa aaa Giu Lys Lys adat aacd Lys Ile Lys aat aaa ata tc tea gag tcg gaa Asn Lys Ile Ser Leu Ciu Ser Giu ceg aaa gag Leu Lys Giu gac tcc aag Asp Ser Lys gaa gat cct Glu Asp Pro tt gat gac Phe Asp Asp gac ctc cc Asp Leu Leu tgg aag aat ate Trp Lys Asn Ile gga ccc agg cct tgg Gly Pro Arg Pro Trp caa gga aga Gin Gly Arg cca gaa agc cCt Pro Giu Ser Leu act aag aca act Thr Lys Thr Thr tgacecegct gattctc cctaatatat actcta ggtaaeeega tgacaaa~ ctattccatc tgtggaet aaaaatgtga atactgc~ <210> 143 <211> 1174 <212> DNA <213> Homo sapiens <220> <221> sig...peptide <222> 108. .170 <223> Von Heiine m~ ttt tCa caa ;aa tcc tccttcceec agtggaaagg ccc ecaccaa agtaacaaeg aaaaaaaaaa ~trix score seq SFLPSALVIWTSA/AF ttttaaataa aaattccagg aggtcatgta tzggccacgc ate tacaaaa cagaaaacct :ct tca gcc Pro Ser Ala -10 aaatactatt cccatggaaa caggececca aacctaca aactggactt ctggatatg tac ttcccag cc ecgaaata 120 172 220 268 316 364 412 460 520 580 640 700 730 116 164 212 260 <221> polyA.signal <222> 1141.,.1146 <221> po3.yk.site <222> 1161. .1174 <300> <400> 143 cacgttcctg ttgagtacac gtecctgttg tgaagactaa catecegtga agttgtaaaa ttt cag caa ggc ctc age ttc ctt Phe Gin Gin Gly Leu Ser Phe Leu -is tc gct gct etc ata tc tca tac Ser Ala Ala Phe Ile Phe Ser Tyr 1 5 ggtgcaggta tgagcaggtc gttagaa aeg egg egg Met Trp Trp ctt gta act egg aca Leu Val Il~e Trp Thr ate act gca gea aca ctc cac cat Ile Thr Ala Val. 
TIhr Leu His His ata gac ccg gct eta cct tat atc age gac act ggt aca gta gct cca Ile Asp Pro Ala Leu Pro Tyr Ile Ser Asp Thr Gly Thr Val Ala Pro gaa aaa tgc tta Giu Lys Cys Leu 243 25 at act gcg gca gte 'Sn Ile Ala Ala Val ggg gca atg cta Gly Ala Met Leu ta tgc Leu Cys agt cct Ser Pro act gct acc act tat gtt cgt tat Ile Ala Thr Ile Tyr Val Arg Tyr gte cat gct Val His Ala gaa gag aac Ciu Giu Asn ata ctg agc Ile Lau Ser gtt atc atc aaa Val Ile Ile Lys aac aag gcc ggc Asn Lys Ala Gly gta ccc gga Val Leu Gly tgt ta gga Cys Leu Gly act gtg gca Ile Val Ala ccc cog aaa aca Phe Gin Lys Thr acc tet ggt aeg Thr Phe Cly Met ctt ccc gcc gcd Lau Phe Ala Ala agc gga gct Ser Gly Ala tca tea tat Ser Leu Tyr gte cag acc Val Gin Thr ccc tac caa Ser Tyr Gin aeg cag Met Gin 125 ttg gee Leu Val ccc daa atc Pro Lys Ile aec cgg tgt Ile Trp Cys aaa cao gec Lys Gin Vol atc ago ctg Ile Arg Leu gea age gca Val Ser Ala aeg ceg ace Met Leu Thr eca eca gt Ser Ser Val teg cac Leu His 160 ggc aat tt Gly Asn Phe gat eta gao cag aaa ccc cat egg Asp Leu Giu Gin Lys Lou, His Trp ccc gag gac aaa Pro Giu Asp Lys gcg ccc cac Ala Leu His ace act gca Thr Thr Ala gaa egg tc aeg Giu Trp Ser Met tc ccc eec tee Phe Ser Phe Phe etc ceg act Phe Leu Thr tac act Tyr Ile 205 cat gga His Gly 308 356 404 452 500 548 596 644 692 740 788 836 884 938 998 1058 1118 1174 120 180 232 cge gat tee Arg Asp Phe tta acc cc Lou, Thr Leu 225 cgg cca ccc Arg Lou Leu 240 cag aaa Gin Lys 210 tat gac Tyr. 
Asp at tcc tea Ile Ser Leu gao gcc aac Glu Ala Asn ace gca cct tgc cct at aac Thr Ala Pro Cys Pro Ile Asn gaa cga aca Giu Arg 'rhr ccc aga gat act aga Ser Arg Asp Ile Arg tgaaaggata aaacateece gtaatgatta cgaetcecag ggaccgggga aaggcecaca gaagetgcee accacceaat caaggctgac agtaacactg atgaoegctg aagccatetg atagatetet ctoaaggaea tcatcaagaa cccataceec eccaectcag aaaacaaagt caaaogacta <210> 144 <211> 1158 <212> DNA <213> Homo sapiens <220> attcttc tcc atooccagga gactattaoa tgaaaaaaaa <221> P0iYA-.Signai <222> 1133.-1138 <221> poiyk..site <222> 1146.-1158 <300> <400> 144 aartegagcc tggggactgc agctgtgggg ogaceecogt gcaetgcctc tcttcatctt ggatcegaaa gcegagagca gcacgeeecg cccactgaaa cgrsagcgca mtggattatt ccctgggcce gaatgacetg aacgtttccc aacagtccat gtgggtgaee cagctctg aeg gga ege gte etc cag Met Gly Cys Val ?he Gin gaaatettca aacatgaaag aacacctatg aaaaaa ccctgggtgc c ecateccegs cgcctgagct agc aca Ser Thr 244 gaa gac Glu Asp aaa tgt ata ttc Lys Cys Ile Phe 1 ata gac tgg Ile Asp Trp gcc aag gac gaa tat gtg cta tac tat Ala Lys Asp Glu Tyr Val Leu Tyr Tyr act ctg Thr Leu tac tcc Tryr Ser 35 ttg atg Leu Met tca cca gga gag Ser Pro G2.y Giu aat ctc agt gtg Asn Leu Ser Val ggg gac atc tta Gly Asp Ile Leu att ggg cgc Ile Giy Arg cag aac cgc gta Gin Asn Arg Vai tgc Oat got Cys Asn Asp gga Acc tat Gly Thr Tyr aag aag 9c9 Lys Lys Ala ctc ctg ctc Leu Lou Leu gat gtg ca gag Asp Vol Gin Ciu gac cag Asp Gin cgt gad otc Cys Giu le ctc Oda ggg Leu Lys Gly gag agc Giu Ser gag ccc Giu Pro 100 cog gcg tc Gin Val Phe aaa ggt acg Lys Gly Thr gtg gta ctg cat gtg ctt cca gag Val Val Leu His Val Leu Pro Giu, cadaotg ctt act to Gin Met Leu Thr 105 gcaaggaaac tgattat tgaatcttca aagaaat tttagttaak aataaaa cacaaaagcc ctgtgaa atatttatta aatggaa gttyctto ctatgcc ttttacaggt accccat tcagtggctc atgcctg gccccagggt tcaagac cagagcaaga ctytgct <210> 145 <211> 754 <212> DNA <213> Homo sapiens <220> <221> sig...peptide <222> 5. .142 aagogggg ccaaggggca agagctttca tgtgcaagag ctt gan tat gga.
taa, cta gtt taa tgc taa gagtaaatgc tcattaaatt tgataat tat tgttttgtcc atagcamt tg acacctttsa aggtgtttg tcctagcact agtgagctat aataaaaaaa cagccttcgg atttcagrc tgtattatta acatataatg aaccccggca tcagttatcc ggggo tagaa t tgggaggc t gawggcacc a gagaaaaaaa, gctaagtacc agaataaaaa ctttaaacac tccaaatacg ccthtgacaa acattgatgc aagaaa taag gaggcagcag ctgcattyta aaaaaa taccocagag takgagttat acttccccct ttttggacac caaagtcyat tacatytgta cog kyc aggc aamtgcctga gcc tgggwga 280 /328 376 424 472 520 572 632 692 752 812 872 932 992 1052 1112 1158 49 97 145 193 <223> Von Heijne matrix score 6.59999990463257 seq VCCYLFWLIAILA/QL <221> poiyA.signal <222> 716. .721 <221> poiyk..site <222> 742. .754 <300> <400> 145 tgtg atg agc gtg tcc tgg ggc ttc gtc ggc ttc ttg gtg cct tgg ttc Met Ser Val ?he Trp Gly Phe Val Giy Phe Leu Val Pro Trp Phe -40 atc ccc aag ggt cct aac cgg gga gtt atc ott acc atg ttg gtg acc Ile Pro Lys Gly Pro Asn Arg Gly Val Ile Ile Thr Met Leu Val Thr -25 tgt tca gtt tgc tgc tat ctc ttt tgg ctg att gca att ctg gcc caa Cys Ser Val C ys Cys Tyr Leu, Phe Trp Leu Ile Ala Ile Leu Ala Gin -10 -5 1 aac ccc cc Asn Pro Leu tt gga ccg caa. ctg aaa oat gaa acc atc tgg tat ?he Gly Pro Gin Leu Lys Asn Giu Thr Ile Trp Tyr ctg aag tat Leu Lys Tyr cat cgg ccc tgaggaagaa gacatgctct acagtgctca His Trp Pro gtctttgagg tcacgagaag agaatgccet 301 245 Ctagatgcaa aatcacctct aaaccagacc actcttcttg acttgcctgt tttggc atgccttatt Ctacaatgca gcgtgt gtgcctccat aacctgaact gtgccg gaagatgctg tCttctgag agatac tggctcctgc cttctcacgt gggaat aagactecag tggggtggtc agtagg aatcgcacca aactatactt tcagga tattttcctc ctttctatgt aaaaaa <210> 146 <211> 1073 <212> DNA <213> Homo sapien~s <220> <221> sig..peptide <222> 98,_181 <223> Von Heijne matrix score 3.59999990463257 seq PLSDSWALLPASA/GV <221> PolyA..signal <222> 1035,..1040 <221> POIYA..site <222> 1060. 
.1073 cat t tttc actc gtta cag t agag tgaa aaaa agctgcctta aacgttaaca Ctttgccttt cacaaaacga ctctctCCtt gaagtgttta cacgttCaga ttcctccttt tttgcacttt ttatgtoctc ggaatctgtg gaaactgctg gggaagagcc ctgccatctt gcacat ttga ggtgaattac t tctgagata gatttgaaga caagacaaac atctcaacag t tggaa taaa <300> <400> 146 ccgattacag ctaggtagtg gagcgccgct cgctggggga gccccgcgcc gccggacgcc gcttacctgg gtgcaggaga cagccggagt cgtgacc atg tgg agg ctg ctg gct Met Trp Arg Leu Leu Ala cgc gct agt Arg Ala Ser gcg ccg ctc ctg Ala Pro Leu Leu gt9 ccc ttg tca Val Pro Leu Ser tcc tgg gca Ser Trp Ala 361 421 481 541 601 661 721 754 115 163 211 259 307 355 403 451 499 547 595 643 CtC ctc Leu Leu ccc gcc agt gct ggc gta aag aca ctq etc cca gta cca Pro Ala Ser Ala Gly Val Lys Thr Leu Leu Pro Val Pro ttt gaa gat gtt Phe Glu Asp Val agg gca cca ctt Arg Ala Pro Leu att cct gaa aaa Ile Pro Glu Lys gtg cca aaa gta aga Val Pro Lys Val Arg aag cct aga ttt Lys Leu Arg Phe gaa ccc aaa aat Glu Pro Lys Asn att, gaa Ile Giu tta agt Leu Ser gac ata cgg Asp Ile Arg ttt gca atc Phe Ala Ile gga cct. tcc act Gly Pro Ser Thr gct acg gag ttt aca gaa ggc aat Ala Thr Glu Phe Thr Glu Cly Asn ttg gca ttg Leu Ala Leu ggt ggc tac ctg Gly Gly Tyr Leu tgg ggc cac ttt Trp Gly His Phe gaa atg Clu Met atg cgc ctg aca atc aac cgc tct Met Arg Leu Thr Ile AsII Arg Ser ccc aag aac Pro Lys Asn ttt gcc ata tgg Phe Ala Ile Trp gta cca gcc cct Val Pro Ala Pro ccc atc act Pro Ile Thr agt, gtt ggg cat cgc atg ggg gga Ser Val Gly His Arg Met Gly Gly ggt get att Gly Ala Ile gtg aca cct Val. 
Thr Pro 125 gtg aag got gge Vai Lys Ala Gly tgt gaa ttt gaa gaa gtg Cys Giu Phe Giu Giu Val 140 gtt gta gag Val Val Giu ott gac cag Leu Asp Gin 150 gac cac tao Asp His Tyr 120 ggt ggg cgt Gly Gly Arg gcc cac aag Ala His Lys caa ggt ttc Gin Gly Phe 145 246 ttg CCC ttc gca gca aag gct geg agc cgc ggg act cta gag aag atg Leu Pro Phe Ala Ala Lys Ala Val Ser Arg Gly Thr Leu Giu Lys met 155 160 165 170 cga aaa gat caa gag gaa aga gaa cgt aac aac cag aac ccc egg aca Arg Lys Asp Gin Giu Giu Arg Giu Arg Asn Asn Gin Asn Pro Trp Thr 175 180 185 tte gag cga ata gcc act gcc aac atg ctg ggc ata cgg aaa gta ceg Phe Giu Arg Ile Ala Thr Ala Asn Met Leu Gly Ile Arg Lys Val Leu agc cca tat gac tcg acc cac aau ggg aaa tac egg ggc aag ttc tac Ser Pro TLyr Asp Leu Thr His Lys Gly Lys Tyr Trp Gly Lys Phe Tyr 205 210 215 atg ccc aaa cgt geg tagtgagege aggagaeaac tgtatatagg ctactgaaag Met Pro Lys Arg Val 220 aaggattctq cattectate cccctcagcc gccaeaacea aggagcagca teegageaga aaaaagaaaa ctgtattet attaaataaa ac CC Cac tga tectgaaaa atttaaacat agtctttggg acgatgttat caceecagga tagctceeaa ttgttgaett aaaaaaaaaa aaa <210> 147 <2 1 1> 413 <212> DNA <213> Homo sapiens <220> <221> sig...peptide <222> 46- 189 <223> Von Heijne m~atrix score 4.09999990463257 seq VFMLIVSVLALDIP/ET <221> polyA..signai <222> 377. 
.382 <221> poiyA..site <222> 402.-413 <300> <400> 147 Cgagaagagt tgagggaaag tgcegctgct gggtctgcag cag CCg aaa ata 888 cat cgc ccc etc tgc ttc Gin Pro Lys Ile Lys His Arg Pro Phe Cys Phe -35 geg aag atg ceg cgg ctg gat att atc aac eca Val Lys Met Leu Arg Leu Asp Ile Ile Asn Ser -20 etc aeg cec atc gta tct gtg teg gca ctg ata Phe Met Leu Ile Val Ser Val Leu Ala Leu Ile 691 739 787 835 890 950 1010 1070 1073 57 105 153 201 249 297 342 402 413 acgcg aeg gat aac geg Met Asp Asn Val agt geg aaa ggc cac Ser Val Lys Gly His ceg gta aca aca gta Leu Val Thr Thr Val cca gaa acc aca aca Pro Glu Thr Thr Thr ttg aca gte ggt gga Leu Thr Val Gly Gly ggg gtg dly Val tee gca ctt gtg aca gca gta egc Phe.Ala Leu Val Thr Ala Val Cys ctt gcc gac ggg gcc ccc att tac egg Leu Ala Asp Gly Ala Leu Ile Tyr Arg aag ctt ctg etc aat ccc agc Lys Leu Leu Phe Asn Pro Ser gaa aaa 888 gaa gte teg Glu Lys Lys Glu Val Leu ggt cct tac cag aaa aag cct gtg Gly Pro Tyr Gin Lys Lys Pro Vai cat His taaeceeata ceactetta gtttgatace aageatcaaa catatetceg tattcttcca aaaaaaaaaa a <210> 148 <211> 609 <212> DNA <213> Homo sapiens <220> 247 <221> <222> <223> <221> <222> <221> <222> -c3 00 <400> Cgtcgg sig-peptzide 139. .231 Von Heijne matrix score 4.40000009536743 seq TCCHLGLPHPVRA/ PR poiyA-.signal 579. .584 polyA~site 598. 
.609 148 'og tt 'agat ggaaagggac gcctggtttc cccccaagcg aaccgggatg ggaagtgact tgaacttcag ctggactgad agagaggcta gaagttccgc ttgccagcag CcCccC;.a9C agagcgga ag t aat acc cac acg Met Ser Asn Thr His Thr ctt gtc tca ctt Leu Val.Ser Leu cat ccg cac cog His Pro His Pro gtc cgc gct ccc Val Arg Ala Pro ctc acc tgc tgt cac ctc ggc ctc cca cac Leu Thr Cys Cys His Leu Cly Leu Pro His cot ott cot Pro Leu Pro cgc gta gaa cog tgg gat cct Arg Val Glu Pro Trp Asp Pro agg tgg cag Arg Trp Gin 1 gac cca Asp Ser gag cta agg tat cca cag gcc atg aat tcc ttc Giu Leu Arg Tyr Pro Gin Ala Met Asn Ser Phe cta aat Leu Asn gag cgg tca tog Glu Arg Ser Ser cog tgc Pro Cys agg aCC tta agg caa gaa gca tog Arg Thr Leu Arg Gin Ciu Ala Ser gac aga tgt gat Asp Arg Cys Asp tgaacctgat agattgctga ttttatctca ttttatcctt gacttggtac aagttttggg aatatcaaga aagtogtctt oagtattaag tcccacctcc tccgaataag gaaacgcctt agctgcaaaa aaaaaaaa <210> 149 <211> 522 <212> DNA <213> Homo sapiens <220> <221> PolYA-site <222> 512. .522 <300> <400> 149 ccaactgcag nttogaattt accgagogga gaaaaataga a atg aag gta cat atg Met Lys Val His Met atttctgaaa tagaatttag tgggaccaac agaccataca gataaccaca atctaggttt ccttcctgct tttatggaat aaataagctg gaggagatgc acacggcact cgagtgtgag cac aca aaa ttt tgc ctc att tgt His Thr Lys Phe Cys Leu Ile Cys ttg ctg aca ttt att ttt cat cat tgc aac cat tgc cat gaa gaa cat Leu Leu Thr Phe Ile Phe His His Cys Asn His Cys His Giu Giu His 20 gac cat ggc cct gaa gcg ctt caC aga Asp His Gly Pro Glu Ala Leu His Arg 35 :tg gag cca agc aaa ttt tca aag caa Leu Giu Pro Ser Lys Phe Ser Lys Gin cag cat cgt gga atg aca gaa Gin His Arg Gly Met Thr Giu 40 got got gaa aat gaa aaa aaa Ala Ala Giu Asn Giu Lys Lys tat ggt gaa aat gga aga tta T1yr Gly Glu Asn Gly Arg Leu tac tat att Tyr Tyr Ile gaa aaa ctt Giu Lys Leu ttt gag cgt Phe Giu Arg 70 tcc ttt ttt ggt ttg gag aaa Ser Phe Phe Gly Leu Ciu Lys ctt tta aca aac ttg Leu Leu Thr Asn Leu ggc ott gga gag Gly Leu Gly Giu 248 aga aaa gta gtt-gag att aat cat gag Arg Lys Vai Vai ciu Ile 
Asn His Glu Asp 100 tct cat tta ggt att ttg gca gtt Caa gag Ser His Leu Gly Ile Leu Ala Val Gln Glu gat ctt ggc cac Leu Gly His Asp 105 gga aag cat ttt Gly Lys His Phe gat cat gtt His Val ca c cat aac cac cag His Asn His Gln tec cat aat cat Ser His Asn His t ca Ser gaa aat caa Glu Asn Gln 140 9t9 ace agt gta tce ace Val Thr Ser Val Ser Thr Oaaaaaeaa
<210> 150 <211> 1322 <212> DNA <213> Homo sapiens <220> <221> sig_peptide <222> 126..260 <223> Von Heijne matrix score 4.59999990463257 seq VLVYLVTAERVWS/DD <221> polyA_signal <222> 1283..1288 <221> polyA_site <222> 1309..1322 <300> <400> 150
ccgaaaacct tccccgcttc tggatatgaa attcaagctg cttgctgagt cctattgccg gctgctggga gccaggagag ccctgaggag tagccactca gtagcagccg acgcgtgggt ccacc atg aac tgg agt ate ttt gag gga etc ctg agt ggg gtc aac sag Met Asn Trp Ser Ile Phe Glu Gly Leu Leu Ser Gly Val Asn Lys tcc aca gcc Ser Thr Ala tct ggg Phe Gly cgc atc Arg Ile tgg ctg tct Trp Leu Ser ctg gtc ttc ate Leu Val Phe Ile cgc gtg ctg gtg tac ctg gtg acg gcc gag cgt gtg tgg agt get Arg Val Leu Val Tyr Leu Val Thr Ala Glu Arg Val Trp Ser Asp cac aag gac His Lys Asp ttc gao tgc sat Phe Asp Cys Asn cgc cag ccc ggc Arg Gln Pro Gly tcc aac gtc Ser Asn Val tgc ttt Cys Phe cag ct Gln Leu gat gag ttc ttc cct gtg tcc cat gtg cgc ctc tgg gcc ctg Asp Glu Phe Phe Pro Val Ser His Val Arg Leu Trp Ala Leu ate ctg gtg Ile Leu Val tgc ccc tca. ctg Cys Pro Ser Leu gtg gtc atg cac Val Val Met His tec cgg gag Tyr Arg Glu cag gag sag agg Gln Glu Lys Arg gsa gcc cat Glu Ala His ggg gag Gly Glu sac agt ggg Asn Ser Gly t9g Cgg aca Trp Trp Thr gcc tt~t cc Ala Phe Leu 100 etc tao ctg eac Leu Tyr Leu Asn Ccc ggc sag sag cgg Pro Gly Lys Lys Arg 75 gtg ttc sag gcg agc Val Phe Lys Ala Ser ggt ggg oto Gly Gly Leu gtg gac ate Val Asp Ile tat gtc tgc ago Tyr Val Cys Ser tat gtg ttc cac tca ttc tac Tyr Val Phe His Ser Phe Tyr gtg gtc aag tgc cac gca gat cca tgt Val Val Lys Cys His Ala Asp Pro Cys ccc aaa tat'atc ctc cct Pro Lys Tyr Ile Leu Pro 110 ccc ac ata gtg gao tgo Pro As Ile Val Asp Cys 125 130 ttc etc tcc aag Phe Ile Ser Lys gcc aca gct gcc Ala Thr Ala Ala *cCc tca gag aag aac Pro Ser Glu Lys Asn Ile 135 140 etc tge etc ctg Ctc aac Ile Cys Ile Leu Leu Asn att ttc acc CtC ttc atg gtg Phe Thr Leu Phe Met Val
Ctc gtg gag Leu Val Giu 145 ctc atc tec Leu Ile Tyr 160 aae gct caa Lys Ala Gin ctg gt9 agc aag age tgc cec Leu Val Ser Lys Arg Cys His 165 gee at; tgc aca ggt cat c Ala Met Cys Thr Gly His His 180 185 cad gac gec ctc cet tc ggc Gln Asp Asp Lau Leu Ser Gly tge ctg gca gca Cys Leu Ala Ala ccc eec gat ace eec tcc ccc tgc aaa Pro His Asp Thr Thr Ser Ser Cys Lye gec ccc ate Asp Leu Ile Ctg ggc tca gac Leu Gly Ser Asp cct cct ctc tte cee gee cge ccc Pro Pro Leu Leu Pro Asp Ar; Pro gee c gcg Asp His Val aag aaa acl Lye Lye Th: 225 tggggagge t atc tc; tgaggggccg Ile Leu etagcatcte tcatagg gcaegagega ggatcca gceccagece gacggca tagaatggaa acagtga aacacacecg cgggcac ctcatttgct ggttaaa <210> 151 <211> 1290 (212> DNA <213> Homno sapiens <220> <221> sig..peptide <222> 50.-160 <223> Von H-eiine mn ectggactgg tetggeaggt tgggcctgga tge aacecgagag gee gctcgggag ctg ggeeagtcc ggg ccaatgccca ctt eatcgtgtgt aaa aaase tgggggagec eeagtccca cectctgetc gggttggagg ggcccactgt a egeea tga g gtccceaec tgeagc tcgg gaggagggcg cagaac ctaa gteggggcag ceegecacec ttcatagaag teaaagtcaa etrix 746 794 842 890 938 994 1054 1114 1174 1234 1294 1322 58 106 154 202 250 298 346 394 score 4 seq PLSLDCGHSLCRA/CI <221> poIyA-site <222> 1280. .1290 <300> <400> 151 gaggegagcc tcaggagtta. ggaccagaag aagccaggga agcagtgca atg get tea Met Ala Seraaa etc ttg ctt eec gte cee gag gag Lys Ile Leu Leu Asn Val Gin Giu Glu eec tgt ccc Thr Cys Pro ate tgc ctg Ile Cys Leu gag etg ttg ace gaa ccc ttg agtceta gee tgc ggc cac agc ctc tgc Glu Leu Leu Thr Giu Pro Leu Ser Leu Asp Cys dly His 5cr Leu Cys ega gcc tgc Arg Ala Cys act gtg age sac aeg gag gca gcg ace age atg gga Thr Val Ser Asn Lye Clu Ale Val Thr Ser Met Gly gga aae egc age tgt Gly Lys Set Set Cys gtg tgt ggt Val Cys Giy ate sgt Ile Set tac tca tc gee Tyr Set Phe Giu eta eag gcc aet cag cat etg gee aac ata gtg gag aga etc aeg gag Leu Gin Ala Asn Gin His Leu Ala Asn Ile Val. 
Glu Arg Leu Lye Giu 40 gte aag ttg age eca Val Lys Leu Set Pro gee eat ggg Asp Asn Gly eag age gat etc tgt get cat Lys Arg Asp Leu Cys Asp His cat gga gag aaa ctcceta etc ttc tgt aeg gag get agg sa gtc act His Gly Clu Lys Leu Leu Leu Phe C~s Lys Giu Asp Arg Lys Val Ile 250 tgc tgg Cys Trp ctt tgt gag egg tct cag gag cac cgt Leu Cys Giu Arg Ser Gin Glu His Arg cec cac aca gtc His His Thr Val aeg gag gaa gta ttc aag gaa tgt ceg gag aaa ctc cag gca Thr Giu Giu Val Phe Lys Giu Cys Gin Giu Lys Leu Gin Ala etc aag agg ctg Leu Lys Arg Leu a gaa gag gag Lys Glu Clu Giu gct gag aag ctg Ala Giu Lys Leu gaa gct Glu Ala 125 act gag Thr Clu gac etc age Asp Ile Arg age cad egg Arg Gin Arg 145 eat gag gag Asn Giu Glu gag a act cC Giu Lys Thr Ser tat cag gte Tyr Gin Val.
cad ace gee Gin Thr Glu gat cag ctt age Asp Gin Leu Arg ate eta aat Ile Leu Asn cag age gag Gin Arg Giu aga ttg gee Arg Leu Clu gee gaa aeg aag Glu Giu Lys Lys 160 acg ttg Thr Leu get eag ttt Asp Lys Phe c 94 gpat Ala Giu Asp gtt cap cap Val Gin Gin cag ttg gtg age Gin Leu Val Arg ctc etc tea gat Leu Ile Ser Asp tgt cgg agt Cys Arg Ser cap tgg Gin Trp tce aca etg Ser Thr Met gag ate tgg Giu Ilie Trp 225 act gte ttc Thr Val Phe ctg ceg gec Leu Gin Asp egt gga ate etg Ser Gly Ile Met aae tgg agt Lys-Trp Ser 220 aaa ctg aag Lys Leu Lys ctg aaa aag Leu Lys Lys aedaetg gtt tee aeg Lys Met Val Ser Lys cat gct cea His Aia Pro apgt egg atg Ser Arg Met ca atg ttt age Gin Met Phe Arg 240 gee ctg Giu Leu 586 634 682 730 778 826 874 922 970 1018 1066 1114 1162 1210 1258 1290 ace gct gte Thr Ala Val tee tgg gtg Tyr Trp Val gte ace etg eat Val Thr Leu Asn gte eec eta eat Val Asn Leu Asri aat ctt gte ctt Asn Leu Val Leu get cap ea Asp Gin Arg cee gtg Gin Val 285 gte ttg Val Leu eta tct gtg Ile Ser Val.
gga tee cae Gly Ser Gln 305 tee aeg aaa Ser Lys Lys 320 ett tgg cet ttt Ile Trp Pro Phe tat eat tat Tyr Asn Tyr tat ttc tcc tct Tyr Phe Ser Ser aaa cat tac tgg Lys His Tyr Trp gtg gee gtg Val Asp Val ace tat tc Thr Tyr Ser act gee tgg Thr Ala Trp ggg gte tee Gly Val Tyr cat etg aeg tat His Met Lys Tyr age age tgt Arg Arg Cys eat cgt cee eat Asn Arg Gln Asn tee acc aaa tee Tyr Thr Lys Tyr eat eaa tgt aeg Asn Lys Cys Lys cet eta ttt ggc Pro Leu Phe Gly tgg gtt ate ggg Trp Val Ile Gly tat ggt gee Tyr Gly Ala eeaaaa aa a
<210> 152 <211> 1364 <212> DNA <213> Homo sapiens <220> <221> sig_peptide <222> 83..139 <223> Von Heijne matrix score 8.60000038146973 seq LLWLALACSPVHT/TL <221> polyA_site <222> 1356..1354 <300> <400> 152
gcctgggagc tgaggcagcc acegtctcag cctggccagc cctctggacc cegaggttgg accctactgt gacacaceta cc atg egg aca cte tte aac etc ccc tgg ctt Met Arg Thr Leu Phe Asn Leu Leu Trp Leu gcc ctg gcc tgc Ala Leu Ala Cys CCt gtt cac act *cc ctg tea aag Pro Val His Thr Thr Leu Ser Lys tea gat gc Ser Asp Ala aaa aaa Lys Lys aag ccg Lys Pro agt gtg Ser Val gce tea aag aeg ctg ctg gag aag agt cag ttt tea gat Ala Ser Lys Thr Leu Leu Glu Lys Ser Gln Phe Ser Asp caa gac cgg ggt ttg gtg gtg acg gae ctc aaa get gag Gln Asp Arg Gly Leu Val Val Thr Asp Leu Lys Ala Glu gtt Ctt gag Cat ege agc tac tgc Val Leu Glu His Arg Ser Tyr Cys eag gcc cgg Lys Ala Arg age cac ttt get ggg gat gta etg gge Arg His Phe Ala Gly Asp Val Leu Gly gte act cca tgg Val Thr Pro Trp Cat gc tee His Gly Tyr tea cc qte Ser Pro Val gte ace aeg gtc ttt Val Thr Lys Val Phe age aag tte Ser Lys Phe tgg ctg cag ttg Trp Leu Gln Leu age cgt, ggc cgt Arg Arg Gly Arg aca cag ate Thr Gln Ile atg ttt gag Met Phe Glu get gte agg Ala Val Arg gte aeg Val Thr 105 aag cat Lys His ggc etc eec gee Gly Leu His Asp gee caa ggg tgg atg Asp Gln Gly Trp Met 112 160 208 256 304 352 400 448 496 544 592 640 688 736 784 832 880 928 gee aeg ggc Ala Lys Gly cac eta gtg Cct His Ile Val Pro ctg ttt gag Leu Phe Glu tgg aet tee gat Trp Thr Tyr Asp egg aec gte Arg Asn Val gac agt gag gat Asp Ser Glu Asp gag ate Glu Ile 150 gag gag ctg Glu Glu Leu gat ggc ttc Asp Gly Phe 170 gcg gc cc Val Gly Leu aec gtg gte Thr Val Val gtg gca aag eac Val Ala Lys Asn cag eat ttc Gln His Phe 165 ceg aeg cge Gln Lys Arg gtg gag gte tgg Val Glu Val Trp ceg ctg eta Gln Leu Leu atc eec atg Ile His Met ace chc ttg gee gag gee ctg eec cag Thr His Leu Ala Glu Ala Leu His Gln 185 gee egg Ala Arg 200 Ctg etg Leu Leu gee etc Ala Leu 205 ggc atg Gly Met 220 gte ate ccg Val
Ile Pro ate ace ccc ggg Ile Thr Pro Gly 215 ace gac cag ctg Thr Asp Gin Lou ttc aeg ec Phe Thr His gag ttt gag cag Giu Phe Glu Gin ctg gcc Lou Ala 230 ccc gtg ccg Pro Val. Lou gat ggt Asp Gly 235 ttc age etc Phe Sor Lou atg ace tac Met Thr Tyr 240 ctg tee tgg Lou Ser Trp gac tae tet ace gcg Asp Tyr Ser Thr Ala 245 gtt, cga gee tgc gte Va. Arg Ala Cys Val cat eag ect ggc cet eat gca ccc His Gin Pro Gly Pro Asn Ala Pro 252 cag Gin c C Leu 280
CCC
Pro c9g Arg ag Lys c tg Leu acc ctg gac ccg aag tcc aag tgg cga Pro Lys ggt atg Cly met 285 gcc agg Ala Arg 300 gac agc Asp Ser ggg agg Gly Arg ctg gag Leu Glu ggc cag Gly Gin Ser 270 gac Asp tac Tyr cag Gin cac His c tg Leu 350 ggc Gly Trp gcg Ala Cag Gin tca Se r 320 g c Val1 Arg gac Arg acc Thr aca Thr 305 gag Glu ttc Phe gag Giu tac aaa atc ctc ctg Lys Ile Leu Leu 275 aag gat gcc cgt Lys Asp Ala Arg aag gac cac agg Lys Asp His Arg 310 ttc tcc gag cac Phe Phe Giu Tyr 325 cca acc ccg aag Pro Thr Leu Lys 340 ggc gcc ggg c Gly Val Gly Val 355 cac gac ctg cc Tyr Asp Leu Leu Ile Trp Giu Leu Lou Asp Tyr Phe taggtgggca ttgcggcctc cgcggtggac gtgtctttt gcaggtgtga aatacaggcc tccaccccgc ttgcaaaaaa <210> 153 <211> 1470 <212i> DN~A <213> Homo sapiens <220> <221> sig...peptide <222> 57. <223> Von Heijne matrix score 3.90000009536743 seq HLLSIGMLMLSAT/QV <221> polyAsignal <222> 1438. .1443 <221> polyA_.Site <222> 1458. .1470 <300> <400> 153 gccggcaaga ctgtttgtgt tgcgggggcc ggacttcaag aaa gcgattttac: aacgag atg 976 1024 1072 1120 1168 1216 1261 1321 1364 59 107 155 203 251 299 347 395 ctg ctc tcc ata Lou Leu Ser Ile gtc ttg act gtc Val Leu Thr Val gca gac act tta, Ala Asp Ile Leu gac ccc cct gca Asp Lou Pro Ala 99t, ttt tta at Gly Phe Leu Ile 55 cct cca cca gta Pro Pro Pro Val ggg Gly cag Gin gca Ala aga Arg aac Asn aaa Lys atg ctc atg ctg tca gcc aca caa Met Leu gca tc Ala Phe ttt gaa Phe Glu tat aga Tyr Arg 45 cca gag Pro Glu tca tct Ser Ser Ser tta Leu aa C Asn 30 ctc Leu aa t Asn ggc Gly Ala aac Asn 15 gca Ala oca Pro gcc Ala act Thr Thr cca Pro tct Ser gct Ala Cys tCc Phe gtc tac acc Val Tyr Thr cct gta gaa Pro Val Glu aca. 
ttt gat 'rhr Phe Asp ggt tta, aag Gly Leu Lys ccc ata gtg Pro Ile Val gtg tta act Val Lou Ile aga aga ctt gat tgt aat ttt gat aca Arg Leu Asp Cys Phe Asp Ile aag gt Lys Val tta aat gca Leu Asn Ala cag aga Gln Arg 100 253 gca gga Ala Gly Cac aag gca gcc sea gtt cac Tyr Lys Ala 105 Ala Ile Val His Asri att agc atg Ilie Ser Met cca tcC gee Pro Ser Val 135 ttc ace tat Phe Thr Tyr tcc aac gac ace Ser Asn Asp Ile 110 gta Val1 aat gte gat tct gat gac cc Val Asp Ser Asp Asp Leu 115 cta aag aaa att gac ate Leu Lys Lys Ile Asp Ile ttt ace ggt gas Phe Ile Gly Clu gCt age tc Ala Ser Ser aaa gat gas Lys Asp Glu gsa ttt age Glu Phe Ser gsa ass ggg GIlu Lys Gly cdc Ctc atc cta His Leu Ile Leu 150 ccc ccc Leu Pro ttg gas cac Leu Glu Tyr cta ate CCC ttc Leu Ile Pro Phe atc ea geg ggc Ile Ile Val Gly ccc sec ceg Leu Ile Leu gec ace ttc atg Val Ile ?he Met ae ass Ccc gtc Thr Lys Phe Val cag gat Gin Asp aga cat aga Arg His Arg CCC CCC ges Leu Pro Val 215 ate ege Ceg Ile Cys Leu aga aac aga Arg Asn Arg ass gat caa Lys Asp Gin ass ccc sag Lys Phe Lys ass Lys gat gag cac Asp Giu Tyr ccc aag aa Leu Lys Lys 210 gca ege gcc Val Cys Ala atc ctt ccc Ile Leu Pro gat gag tat Asp Giu Tyr gat gga gac aas Asp Gly Asp Lys 230 tgc tcc Cys Ser cat gcc tac His Ala Tyr sag tge gta Lys Cys Val tgg cca acc Trp Leu Thr c 491 539 587 635 683 731 779 827 875 923 971 1019 1067 1115 1163 1209 1269 1329 1389 1449 1470 ass ass ace Lys Lys Thr ccs geg tgc agg Pro Val Cys Arg caa Gin 270 age Se r gee gee ccc Val Val Pro Ccc caa Ser Gin 99c gat Cca GJly Asp Ser aca gas cac Thr Ciu His 295 Cct gac aca gac Ser Asp Thr Asp caa gsa gas Gin Glu Glu acc ccc tts ctg Thr Pro Leu Leu eta gcc cCC Leu Ala Ser eat gaa geg Asn Giu Val 290 agC gcc cag Ser Ala Gin seg aca gaa Met Thr Glu eca ttt Ser Phe 310 tcC tca Ser Ser ggg gcc tca ecg Gly Ala Leu Ser gac tat gag gas Asp Tyr Giu Giu 330 Ccc cgc ecs cat Ser Arg Ser His gac eat gas Asp Asn Glu gas eat ga Ciu Asn Glu eat gaa cat gat gcc geg Asn Giu His Asp Val Val 1350 Cac aac 
aca gca aae acc Tyr Asn Ile Ala Asn Thr ac g9t gas Asn Gly Ciu ace gac age agt gat Thr Asp Ser Ser Asp 340 gtc cag teg cag cce Val Gin Leu Gin Pro 355 get tgactttcag Val taccgcaaec tgacecceeg tacectacag CCtaatcaa gssesesccc catcacsaa Ccceeeggaa tgssagctt aagacgaccg gcccaecccc cCCtaaaacg accaggeacs ctCcccaaa sgaccccegc agaaace tattttccag teacegasac aggacteceg accggeacc tacctgccaa Caateagaceg gcgccgcasc Ccasgcaeca sttcagcece gccasaacsa saaaaaaaaa a <210> 154 <211> 982 <2.12> DNA <213> Homo sapiens <220> <221> sig..pepcide <222> 72. .197 254 <223> Von Hejine matrix score 7.19999980926514 seq ILFSLSFLLVIIT/FP <221> poiyA_site <222> 970. .982 <300> <400> 154 gctgjcctgtt cttcacactt agctccaaac ccatgaaaaa ttgccaagta taaaagcttc tcaagaatga g atg gat tct agg gtg tct tca cct gag aag caa gat aaa Met Asp Ser Arg Val. Ser Ser Pro Glu Lys Gin Asp Lys gag Glu a tc Ile tc Ser gta Val1 t tg Leu cga Arg t cc Ser gc t Ala c tg Leu 100 tcc Ser t ta Leu gaa Glu gcc Ala gaa Glu 180 Ctg Leu agc Ser a cg Met aat ttc qgg Asn Phe Val ctg ttt tcc Leu Phe Ser at* tgg atg Ile Trp Met ttC cgt ctg Phe Arg Leu atc ctg gtc Ile Leu Val aca gtt act Thr Val Thr gta act act Val Thr Thr gtc tca gca Val Ser Ala ctg gct caa Leu Ala Gin cag atc tta Gin Ile Leu ctt gat gat Leu Asp Asp 135 atc aaa gat Ile Lys Asp 150 gag gct gag Giu Ala Giu 165 gga gaa atg Gly Giu Met gct gag tcc Ala Giu Ser acg gta gcC Thr Val Ala 215 aat ata cta Asn Ile Leu 230 Igt Gly ec Leu tgc Cys 9ga Gly ct; Leu tgc Cys cag Gin gtg Val1 ac 'h r gC t Alia 120 gcc Ala gtt V/al cC Ala ag c Ser ~cC Pro 200 acc Thr gag Glu gtc Val1 tet Ser t tg Leu cgc Arg 25 cca Pro aac Asn gta Val gc t Ala act Thr 105 gga G ly acc Thr Cgg Arg ac Thr gc t Ala 185 a ta Ile gag Glu 9gC Gly aac aat ada cg ctt Asn t tc Phe aag Lys 10 ate Ile tgc Cys att Ile ga t Asp aa t Asn 90 c tg Leu cga Arg gaa Ciu at t Ile Arg 170 t cc S er gc t Ala aag Lys att le Asn Lys Ctg ttg Leu ,Leu -5 atc att Ile Ile caa gct Gin Ala ata gat Ile Asp cc cda Pro Pro 60 gga gtt Gly Val 
75 gtc aac Val Asn aga aat Ar; Asn gaa gag Giu Giu ctg tg Leu Tip 140 CCC gtg Pro Val 155 gaa gcg GIlu Ala aaa tcc Lys Ser Leu Gin aat tct Asn Ser 220 99t ggc Gly Gly Arg Leu -20 gcg atc Val Ile gag Arg Giu gac aaa Asp Lys 30 gtg ttt Val Phe 45 caa gag Gin Giu gtc tat Val Tyr gat gtc Asp Val gtc tta Val Leu 110 atc gcc Ile Ala 125 ggg atc Gly Ile Ca; ttg Gin Leu aga gcc Arg Ala ctg aag Leu Lys 190 Ctg cgc Leu Arg 205 acg att Thr Ile gtc agc Val Ser 9g c Gly act Ile tat Tyr 15 gcc Ala gtc Val1 atc Ile tac Tyr cat His ggg Gly cat His C9g Arg cag Gin aag Lys 175 tca Ser tac Tyr gtg Val gta Val1 acc Thr gaa Ciu aag Lys dag Lys ctc Leu aga Ar; caa Gin aca Thr agc Ser gtg Val1 aga Arg 160 gtc Vali gcc Ala c; Leu ttt Phe tgt ggc tog Cys t tc Phe 1 eg t Arg ggg Gly gt c Val acc Thr a tc Ile gca Ala cag Gin a tc Ile gcc Ala 145 tcc Ser ctt Leu tcc Ser cag Gin ccc Pro 225 Gly ccc Pro gc t Ala cca Pro gac Asp aga Arg tat Tyr aca Thr a cc Thr cag Gin 130 cga Ar; a tg Met gca Al a atg Met ac c Thr 210 cc; Leu Trp a tc Ile gt t Val 9g t Gly ctc Leu gac Asp agt Se r ttt Phe t tg Leu 115 act Thr gtg Val gca Ala gc t Ala gtg Val1 195 ttg Leu cc Pro 110 158 206 254 302 350 398 446 494 542 590 638 686 734 782 830 878 926 Tyr Asp Asn His Lys 2350 aag ctt cca aat aaa gcc tgaggtcctc ttgcggtagt cagctaaaaa aaaaaaaa Lys Leu Pro Asn'Lys Ala 245 <22.0> 155 <211> 455 <212> DNA <213> Homo sapiens <220> <221> POlYk..signal <222> 425. .430 <221> POlYA..site <222> 443. .455 <300> <400> 155 gtt atg oca ccc aga aac Met Pro Pro Arg Asn 1r 255 ota ctg gag tta ctt att aac atc aag Leu Leu Glu Leu Leu Ile Asn Ie Lys gga acc tat ttg Gly Thr Tyr Leu act gat cgc atc Thr Asp Arg Ile cag too tat ctg Gin Ser Tyr Leu gag cac atg gtt Glu His Met Val aac att gat Asn Ile Asp 99t ttc tee Gly Phe Phe tat cga Tyr Arg ctg tge cat gac aag gaa act tac aaa ctg caa cgc Leu Cys His Asp Lys Glu Thr Tyr Lys Leu Gin Arg aaa. qg Lys Gly att cag aaa Ile Gin Lys cgt gaa Arg Glu gcc agc aat tgt Ala Ser Asn Cys ttc gca.
Phe Ala ttgaaac aaa ttt gc gtg gaa act tta ate tge tcC Ph. Glu Asn Lys Phe Ala Val Giu Thr Leu Ile Cys Ser 85 agaaaaacat tattgaggaa aattaatc acagcataac cocacocc cagtgattat tttttaaagt cttctttoat gtaagtagca aacagggc tcatctcatt aattcaatta aaaccattac ccoaaaaaad aaaaaa <210> 156 <211> 738 <212> DNA <213> Homno sapiens <220> <221> sig-peptide <222> 90. .278 <223> Von Heijne matrix score seq GLVCAGLADMARP/AE <221> pOiyk..signai <222> 704.,,709 <221> pOlyA site <222> 724. .738 <300> <400> 156 gggaaaagtg actagcccc cttcgttgtc agccagggac gagaacac acccggotgc caacgatcco tcggcggcg atg tcg gcc gcc ggt Met Ser Ala Ala Gly gaa. act act Glu Thr Ile act cgg cat Ile Arg His tgaacagtca Ct acattttgtg :tt tactatcttt 48 96 144 192 240 289 349 409 455 113 161 209 257 ag ccacgctccc gcc cga ggo Ala Arg Gly cgg 9cc acc tao Arg Ala Thr Tyr cgg Ctc ccc gat aa gtg gag ctg atg Arg Leu Pro Asp Lys Val Glu Leu Met gag aaa ttg Glu Lys Leu cog ttg tac aac Pro Leu Tyr Asn coa goa ggt ccc Pro Ala Gly Pro aga aca Arg Thr got gga Ala Gly gtt tc tto Val ?he Phe got cca ate atg Ala Pro Ile Met ggg ttg gtg Gly Leu Val 256 ttg gct gat atggcc aga cct gca gaa aaa ctt agc aca gct caa tct Leu Ala Asp Met Ala Arg Pro Ala Glu Lys Leu Ser Thr Ala Gin Ser gtt ttg atg gct Val Leu Met Ala aca ggg Thr Cly ttt att tgg tca aga tac tca ctt gta Phe Ile Trp Ser Arg Tyr Ser Leu Val ate ate ccg aaa Ie Ilie Pro Lys tgg agt ceg tee Trp Ser Leu Phe gce gte aat ttc ttt Ala Val Asn Phe Phe geg ggg Val Gly gca gca vga gcc tc Ala Ala Gly Ala Ser Cta 888 gcc aaa gca LeU LYS Ala Lys Ala cag ctt tct cge att egg aga tat aac caa Gin Leu Phe Arg 110 Trp, Arg Tyr Asn Gin 50 55 cac aa taaadgag.t cctgatcacc tgaacaatct His Lys gaa Glu agatgtggac aaccattg ggacceagct actgtgtgtt tagaaggcdc tgtaactggt gcaaactttt aataacagtc tctctacatg acatttttct accatttgtc cgcaataaac <210> 157 <211> 649 <212> DNA <213> Homno sapiens <220> <221> sigpeptide <222> 88.-147 <223> Von Heijne matrix score 12.3999996185303 seq ALLLGALLCTAWA/
RR
tattatetgg agctagetct ac t taga c tact tgct ttttgataa tgattcaata CctttCtatg cgtaaaaaaa agcaaagcca gaaaaatgca gatattagta aaaaaaaa <221> PolYA-signal <222> 619. .624 <221> P0iYA-site <222> 637.. 649 <300> <400> 157 ccaaagtgag agtccagcgg tctCccagcg cttgggccac ggcggcggcc Ctgggagcag aggaggagcg accccatcac gctaaag atg aaa, ggc tgg qg tgg ccg gcc ctg Met Lys Gly Trp Gly Trp Leu Ala Leu -20 353 401 449 500 560 620 680 738 114 162 210 258 306 359 419 479 539 599 649 Lct ctg Leu Leu ctc cac Leu His ggg gcc ctg ctg gga aCC gcc tgg gct cgg agg agc cag Gly Ala Leu Leu Gly Thr Ala TrP Ala Arg Arg Ser Gin gat tgt gga gca Cys Gly Ala 10 tgc agg gcc ctg gtg gat gaa cta gaa.
CYS Arg Ala Leu Val Asp Glu Leu Glu tgg gaa Trp Glu ttc cgg Phe Arg ate gcc cag Ile Ala Gin aec aat cca Ile Asn Pro gac ccc aag aag aCC Asp Pro Lys Lys Thr cag atg gga Gin Met Gly gat ggc agc cag ASP Gly Ser Gin gtg gtg gag gta Vai Val Glu Val get act gte Val Thr Val ccc cca Pro Pro aac 888 gta gct Asn Lys Val Ala tcC ggC ttt gga Ser Gly Phe Giy tgaaattcga ctgcttaaaa aggacceegg tceaatagaa Cateeggaag aagctgcagg tacttegtet ccgccgtaga cacatccaca tgactggtt qacegagtc ggagctaaaa <210> 158 <211> 714 <212> DNA <213> Homo sapiens atgaagaaaa cttattcccc atttgttagc ttaatgtagc ataaaaaatg cagac ecaga atgcacttgc aaacagggag actgtggtat aaaaaacaaa aaaaagact ttcctggctg tcctgaecag acatgcaaac aaaaaaaaaa ggcectgtce Caaaccttaa caccctctcc acccgttcaa 257 <220> <221> sig...peptide <222> 33. .92 ,-223> Von Heijne matrix score 12.3999996185303 seq ALLLGALLGTAWA/RR <221> P0iYA..site <222> 703. .714 <300> <400> 158 agcagaggtg gagcgaicccc attacgctaa ag atg aaa gc c Ala cag Gln tgg Trp ?he gcc Ala Cgg Arg aac Asn c ta Leu 100 gcg Ala ttt Phe aca c tg Leu ga t Asp gia Giu C gg Arg cgc Arg atg Met tac Tyr caa Gin tg t Cys tcc Ser ga t ctg ggg Leu Gly cac tgt His Cys gcc cag Ala Gin aat cca Asn Pro gag gcc Giu Ala gag tat Giu Tyr cgt gta Arg Val atc cga Ile Arg acatt Ser Ile 120 gag Oct Glu Ala 135 tgt gac C tg Leu tgc Cys ccc Pro dgc Ser aca Thr cag Gin 75 cgg Arg tca Se r gaa Ciu Ott Val1 Ctg Met Lays acc gcc Thr Ala Oct Ctg Ala Leu aag acc Lys Thr 30 tca gtg Ser Vai 45 ctg Ctg Leu Leu gat cct Asp Pro gga gaa Giy Giu att agc Ile Ser 110 gag gat Giu Asp 125 gac aaa Asp Lys ata tca ggc tgg ggc tgg ctg Gly Trp Gly Trp Lou tgg gct cgg agg agc Trp Ala Arg Arg Ser 1 gtg gat gaa cta gaa Val Asp Glu Leu Giu att cag atg gga tct Ile Gin Met Gly Ser gtg gag gtg cct tat Val Giu Val Pro Tyr gag gag ata tgt gac Giu Glu Ile Cys Asp tcc acc cat cgc aag Ser Thr H*is Arg Lys tcc agt gaa ctg gac Ser Ser Glu Leu Asp ggc acc ctc aag ttt Gly Thr Lou Lys Phe 115 gaa ctc azt gaa ttc Giu Lou Ile Giu Phe 130 ctt tgc agt 
aag cga Lou Cys Ser Lys Arg 145 cat gat gag cta His Asp Giu Leu 160 cccaggaggg gaaaatggtg gaaaaaatat gaaaccaaaa Thr Asp Leu Cys Asp His Ala Leu His Ile Ser 150 155 tgaaccactg gagcagccca cactggcttg atggatcacc gcaatgcctt ttatatatta tgtttttact gaaattaact gtacaaaaaa aaaaaa <210> 159 <211> 596 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 33. .107 <223> Von Heijne matrix score seq MFAASLLAMCAGA/EV <221> pciyA..signal.
<222> 546. .551 <221> pciyA..site <222> 584. .596 258 <300> <400> 159 cacagttcct Ctcctcctag agcctgccga cc atg ccc gcg Met Pro Ala ggC Gly gtg ccc atg Val Pro Met atg tgc gca Met Cys Ala -25 tcc acc tac ctg aaa atg ttc gca gcc agt ctc ctg gcc Ser Thr Tyr Leu Lys Met Phe Ala Ala Ser Leu Leu Ala ggg gca gaa gtg geg cac agg tac tac cga ccg gac ceg aca ate cct Gly Ala Glu Val Val His Arg Tyr Tyr AL-g Pro Asp Lau Thr Ile Pro gaa ate cca cca aag cgt 99a gaa ctc aaa Glu Ile Pro Pro Lys Arg Gly Clu Leu Lys aaa gee age aaa cac aaa cct caa Lys Clu Arg Lys His Lys Pro Gin taactatgcc aegaattctg tgaataatec gcatcaaact, acttgtcctt &egcacttag gtggaitgttt agccgaeacg ttgaaatta, Egccaeagca catatcatca aaccattca aatataacgc gaaatageaa aeeegtaagt aaetaagaaa ttatttaaad ctatgaacca <210> 160 <211> 403 <212> DNA <213> Homo sapiens <220> <221> polyA..signal <222> 375.-380 <221> POlyk..site <222> 390. .403 <300- <400> 160 tgaagagaat ggctgttgca gtcggcgtca agagcgcgag gactcggcgg ctgagcgcgc gace atg cgc ace cgc atg aca gat met Arg Ile Arg met Thr Asp 9et tct Cda Val Ser Gin 40 eagceaaa tceaagcta attacggttt ctacttag gtttcactaa gag ccc teg gga. ctg 01u Leu Leu Gly Leu cag gag gee cet aa Gin Giu Giu Leu, Lys teegeetttc taetteatt actgcaageg gaggtgccca gecegatace tcttgaaaac Ctggaagatg tttagtcttg ggtgtcttt atttcatata aaaaaaaega a 53 101 149 197 245 305 365 42S 485 545 596 120 169 217 265 313 363 403 gagcagcccc agtgccgggg attcggacgg ccgacagcag ctegaggcgc tgctcaacaa gga cgg ace ceg gtc ggc tgc tte Gly Arg Thr Leu Val Gly Cys Phe CtC tgc ace gac Leu Cys Thr Asp cgt gac tgc aat gtc eec ceg ggC ecg gcg Arg Asp Cys Asn. Val Ile Leu Gly Ser Ala 25 tcg gat tcc ttc tct gcc ggg gag ccc cgt Ser Asp Ser Phe Ser Ala Gly Giu Pro Arg cag gag Gin Glu gtg ctg Val Leu ttc ctc aeg Phe Leu Lys ggc ctg gcc Gly Leu Ala aeg gee ccc gga cac cac atc get tcc ate gag gtg cag Met Val Pro Gly His His Ile Val Ser Ile Giu. Val Gin.
agg gag agt ctg acc ggg cct ccg tat ctc tgaccacgee ggcgcttacc Arg Glu Ser Leu Thr Gly Pro Pro Tyr Leu teecagacee cattaeactt atgaccaeaa aaaaaaaaaa <210> 161 <211> 727 <212> DNA <213> Homo sapiens <220> <221> sig_peptide <222> 126..575 <223> Von Heijne matrix score 8.60000038146973 seq LELLTSCSPPASA/SQ <221> polyA_signal <222> 670..675 <221> polyA_site <222> 721..727 <300> <400> 161 Ctcagaactg tgctgggaag gacccgggg agggctccga gtcgg atg gcg gag acg Met Ala Glu Thr -150 gatgaggg, cgactggggc tcacctccgc accgttgtag gJcccgtggga gcegccccac gcggccecge cctgccaacg aag gac aca gcg cag aeg ttg gtg acc ttc aag Lys Asp Thr Ala Gln Met Leu Val Thr Phe Lys gat geg Asp Val
-135 gcc 9tg acc*-ttt acc cgg gag Ala Val Thr Phe Thr Arg Glu -130 gag tgg aga cag Clu Trp Arg Gin ctg gac ceg LeU Asp Leu gcc cag agg Ala Gin Arg ttg gec cac Leu Val His ggc ctc tca Gly Leu Ser egg age gca Trp Ser Ala acc ctg tac cga Thr Leu Tyr Arg -115 ceg cta gag Cat Leu Leu Giu His gag ggc aec ggg ttc ccn aaa ClU Gly Ile dly Phe Pro Lys ggg cag Gly Gin -110 gag ctg Giu Leu -120 cca gag Pro Glu aag aga LYS Arg -100 catC His tgg ata gtg Trp Ile Val gcC acc ege Ala Thr Cys gag tee cac tcC Glu Phe His Ser tgc cca ggc Cys Pro Gly geg gnn cgc Val Xaa Arg ctc agc tca ctg Leu Ser Ser Leu Ctt ceg ccc ccc LeU Leu Pro Pro tcC aag gga tec Phe Lys Gly Phe egc ctc agc ctc Cys Leu Ser Leu agt agc tgg gat Ser Ser Trp Asp cgc cca cca Arg Pro Pro tgc ccg gcc ggt Cys Pro Ala Gly tet gCa ttt tta gta Phe Val Phe Leu Val acg ggg ctt cac cat gtt ggc cag gct ggt ctt gaa Thr Gly Leu His His Val Gly Gin Ala Gly Leu Glu CCC ttg acc eca Leu Leu Thr Ser ate aca ggc gcg Ile Thr Gly Val 120 170 218 266 314 362 410 458 506 554 602 652 712 727 113 tgt age cca Cys Ser Pro ccc gcc cc gcc tcc caa age gct Pro Ala Ser Ala Ser Gin Ser Ala agc Ser cac cge gcc cgg cag aga aaa ace gct taaggtegaa aagagaaatt His Axrg Ala Arg Gin Arg Lys Thr Ala Caagaaattg ctgacggaat aaaaacataa agcaaaaaaa aaaaa <210> 162 <211> 944 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 90. .155 <223> Von Heijne matrix score 5.90000009536743 seq IILCCLALFLLLO/RK tagaactaca acaccgaagg aaacgaaaga <221> PoiyA signal <222> 913. .918 <221> polyA site <222> 932. 
.944 <300> <400> 162 gaac~caggee ccgeagccca cagaaaagaa gcaagggacg tcegceectg gaaggtgceg gacaaaaac aeg gaa cta Met Giu Leu -20 ace ata aec ceg gge tgc ccc gct cCg ttc eta gcaggactgt Cecacacct ate tcc cca aca gtg Ile Ser Pro T1hr Val ccc ccc cag Cgg aag Ile Ile Ile Leu- Gly Cys Leu Ala Leu 260 Phe Leu Leu Leu Gin Arg Lys aat ttg cgt aga Asri Leu Arg Arg ccc ccg tgc atc aag ggc tgg act cct tgg ate gga Pro Pro Cys Ile Lys Gly Trp Ile Pro Trp Ile Gly gee gga Val Gly aga ac Arg Ile ttt gag ttt ggg aaa gcc ect ota gaa ttt ata gag aaa gca Phe G1u Phe Gly Lys Ala Pro Leu Ciu Phe Ile Ciu Lys Ala 25 aag gta tgc 99t cgt ggc aga cgg ggc ccc cag agg aga c~a Lys Val Cys Gly Arg Cly Arg Arg Gly Leu Gin Arg Arg Gin t ttC ctt ttt taaacttct ttcatcgactc tcaagcgca gggctagaac Cys Phe Leu Phe 209 257 305 357 417 477 537 597 657 717 777 837 897 944 acggggaaca tacctgc ~acaattaac eacaata gCgcaccgac atceeg agttccaaat gtgtatg tccaatttaa gttgcg tcgtacaaat tgcttgc gacgccgtta tttgatg cccccgta gtagatg gaacagaccc ttattaa ggtatttgat tgtcaa <210> 163 <211> 598 <212> DNA <213> Homo sapiens <220> <221> Sig...peptide <222> 126.,.287 <223> Von Heiine m~ cc; tcc gaa cg t aaa aca at; tCc tgt taa Cctcaactaa tgtgcaaaac geagagatta taagtaaatg atgagtaatc ge cggcccgt atgeaccca C egtcC Cgaa cttttaagct dtttgtatga aggatcagt tttgcgaaag act Cttcgta ttttcagcaa cgcatccaaa acacaa taga CCC teactac agtatcc cc c tat Ccaa tt cattaaaaaa Catttctgad aaatgaaa Ca CCC ttacttc c tgggaaaga ttggagttaa caggctctge ggcccgaaga aaatgcga tccagtcaca aaaaaaa ttcctctact caattgcagc atcgaagtta taadgtgtaa caccaaagta atttt agect gactagtaat gzcactceaag aatatcc cat score 3.90000009536743 seq LETCGLLVSLVES/ 1W ,4221> P0lYk..signal <222> 561. .566 <221> POlYA-site <222> 587. .598 <300> <400) 163 Ctcagaactg tgccgggaag gdcccggggr. 
agggcccega gtcgg atg gcg gag acg Met Ala Glu Thr gatqgaggg cgactggggc tcacctccgc accgttgtag gcccgcggga gccgccccac gcggcctcgtc Ctgccaacg aag gac gca gcg cag atg tt gtg acc ttc aag Lys Asp Ala Ala Gin Met Leu Val Thr Phe Lys -45 gat gtg gct gtg Asp Val Ala Val ttt acc cgg gag Phe Thr Arg Glu t9g aga cag cC; Trp Ar; Gin Leu gac ctg Asp Leu gcc cag agg Ala Gin Arg ctg get tca Leu Val Ser ctg tac cga gag Leu Tyr Arg Glu at; cC; gag Met Leu Giu ace tgt gg; ct Thr Cys Gly Leu 10 aca gaa aac cag Thr Glu Asn Gin gtg gaa agc act tgg cC; cat Val Glu Ser Ile Trp Leu His aaa ctg gct tca cct gga agg aaa tcc act aac c; ccc Lys Leu Ala Ser Pro Cly Arg Lys Phe Thr Asn Ser Pro aag cct gag gtg egg Lys Pro Glu Val. Trp get cca ggc Ala Pro Gly tgacgccatc aaggatgcc tggtCtccg tcctcaggct ggccctcac agggatgccg ccccatgrccc aatccatecc cecacctcgg aaaaaaaa ggc gcc gca Gly Ala Ala ttggttcagg ccttgactgg ttcttteac gat gag Asp Glu gcc cag Ala Gin .ctcctgatcg ggcagcaggc aatgagaaaa ttccccc ggccgcag aataaatgct 261 <210> <211> <212> <5213 <220> <221> <222> <223> <221> <222> <300> <400> 164 360
DNA
Homo sapiens sig-peptide 85. .150 Von Heijne matrix score 5.90000009536743 seq IILGCLA.LFLLLQ/RK POIYA&site 349. .360 164 Cdggttccgc agccacagaa aagaagcaag ggacggcagg actgcttcac acttttctgc t~ctggaagg tgctggacaa aaac atg gaa cta att tcc cca aca gtg att Met GlU Leu Ile Ser Pro Thr Val. Ile -20 ata atc ctg ggt tgc ctt gct ctg ttc tta ctc ctt cag cgg aag aat Ile Ile Leu Gly Cys Leu Ala Leu Phe Leu Leu Leu Gin Arg Lys Asn -11 ttg cgt Leu Arg gga ttC Gly Phe aga ccc ccg tgc Arg Pro Pro Cys atc aag Ile Lys 99c tgg att ct Egg att gga gtt Gly Trp Ile Pro Trp Ile Gly Val ill 159 207 255 303 gag ttt ggg Glu Phe Gly gcc Cct cta gaa ttt ata gag aaa gca Ala Pro Leu GlU Phe Ile Glu Lys Ala atc aag tat gga cca Ile Lys Tyr Gly Pro ttt aca gtc Phe Thr Val 9ct atg gga aac Ala Met Gly Asn 8CC ttt gtt Thr Phe Val gaa gaa gaa gga Glu Giu Glu Gly aat gtg ttt Ota Asn Val Phe Leu aaaaaaaaaa aa <210> 165 <211> 490 <212> DNA <213> Homno sapiens <220> <221> <222> <223> sig-..peptide 77. .124 Von Heijne matrix score 4.80000019073486 seq SLFIYIFLTCSNT/SP <221> P01YA-signal <222> 461. .466 <221,> P0lYA-site <222> 477. 
.490 <300> <400> 165 atgagcttcc agccccaaga gtggaggctg ccacatccca acatagtatc tattgaaaag gaagcagtgt gtatct atg att ata tct ctg ttc atc tat ata ttt tcg aca Met Ile Ile Ser Leu Phe Ile Tyr Ile Phe Leu Thr -10 tgt agc aac acc tct cca'tct tat caa gga act caa ctc ggt ctg ggt Cys Ser Asn Thr Ser Pro Ser Tyr Gin Gly Thr Gin Leu Gly Leu Gly ctc ccc agt gcc Leu Pro Ser Ala tgc agg cta ttt Cys Arg Leu Phe cag Egg tgg cot ttg aca ggt agg agg atg cag tgc Gin Trp Trp Pro Leu Thr Gly Arg Arg Met Gin Cys 20 tgt ttt ttg tta caa aac tgt ctt ttc Oct ttt CCC Cys Phe Leu Leu Gin Asn Cys Leu Phe Pro Phe Pro 262 Ctc cac ctg att'cag cat gat ccc tgt gag ctg gtt ctc aca atc tec Lou His Leu Ile Gin His Asp Pro Cys Glu Leu Val Lou Thr Ile Ser 50 55 tgg gac tgg gct gag gca ggg gct tcg ctc tat tct cc taaccatact Trp Asp Trp Ala Glu Ala Gly Ala Ser Lou Tyr Ser Pro 353 gccttcctet cccccttgcc cccttgccct ggcacatat gtgaaaaasa aaas <210> 166 <211> 488 <212> DNA <213> Homo sapiens <220>, <221> polyA..signal <222> 458. .463 <221> poiyAsite <222> 475. .488 <300>.
<400>'166 acttagcagt tatCccccca gctatgcctt ctccctccct gcgccttatc tatgCtgcaa atataacatt aaactatcaa Ccgcttccga aaagaigacag acaatgcagc .Catcata atg aag gtg gac aaa gac Met Lys Val Asp Lys Asp cgg cag atg Arg Gin Met gag ctc ass Giu Lou Lys agc tac sag Ser Tyr Lys gtg ctg gag gaa gaa ttt cgg aac att Val Leu Giu Giu Giu Phe Arg Asn Ile tcc cca gag Ser Pro Giu gtg gtt tac Val Val Tyr atg gag ttg ccg Met Giu Leu Pro 413 473 490 103 151 199 247 295 343 394 454 488 aga cag ccc agg Arg Gin Pro Arg tac gtg cgt gac gat ggc cga gtg Tyr Val Arg Asp Asp Gly Axg Val tac cct ttg tgt Tyr Pro Leu Cys ttc Phe ta t Tyr atc ttc tcc Ile Phe Ser gcs ggg agt Ala Gly Ser agc cct Ser Pro aaa sac Lys Asn gtg 99c tgc aag Vai Giy Cys Lys gas cas cag atg Giu Gin Gin Met agg ctg gtg cag aca gca gag Arg Leu Val Gin Thr Ala Glu gtg ttc gsa atc cgc acc act gat Val Phe Giu Ile Arg Thr Thr Asp ctc acs sag Leu Thr Lys tgg ctc caa Trp Leu Gin ctc act gag gcc Leu Thr Glu Ala gas sag ttg GiU Lys Leu 105 tct ttc ttt cgt Ser Phe Phe Arg tgatctctgg gctggggact gaattcctga tgtctgagtc ctcsaggtgs ctggggactt ttaaataaat tttaaaatgc aaaaaaaaaa <210> 167 <211> 771 <212> DNA <213:> Hono sapiens <220> <221> sig..peptide <222> 48. .356 <223> Von Heijne matrix score 4.90000009536743 seq VYAFLGLTAPSGS/KE ggaaccccta aaaa ggscctgsac aaccaagact <221> PoiyA..signal <222> 742. .747 <221> PolyA..site <222> 760. 
.771 <300> <400> 167 ccacagccct tttcaggacc caaacaaccg cagccgctgt tccgg atg gtg atc 263 cgt gte Arg Val -100 caa eaa Gin Gin gaa aaa Glu Lys tat att gca Tyr Ile Ala tce tet ggc tct Ser Ser Gly Ser gat gtg ctt ggt Asp Vai Leu Gly gat att gca gcc Asp Ile Ala Ala ttc eta gad Phe Leu Giu cat gaa gag Asn Giu Glu -60 cge ccc 9cc Arg Pro Ala m'et Val Ile ace geg att Thr Ala Ile -90 aac aaa ata Asn Lys Ile cgg cog egg Arg Lys Trp aag aag aa Lys LYS Lys gga ttt gaa Gly Phe Glu atg aga gaa Met Arg Glu ctg cca cct Leu Pro Pro gee ttC ttt Ala Phe Phe 104 152 200 248 296 344 aac gta cct Asn Val Pro eat agt Asn Ser aca ggt eec Thr Gly Asn cog ate Gin Ile gae gc Giu Ala tte eat gee age Phe Asn Glu Ser cge ggg gee Arg Gly Asp age gee eat Arg Cl-u Asn gt9 tat gee tec Val Tyr Ala Phe tea ggc ttg Leu Gly Leu ace gee Thr Ala t=e ggc tea aeg gee gee gee gtg eaa gee Ser Gly Ser Lys Giu Ala Glu Val. Gin Ala tgaaeeetga geactgt aacageat Cegcet gggeeetcec atgcaaa ttatggtgeg gagaatg etagtgeegt tacccgc ctteectct tctgcca ggagaaagag aaaaaa <210> 168 <211> 959 <212> DNA <213> Momo sapiens <220> gc t get cec gga caa tte dad ctaagcatcc tcaaaagaae tcaeaega tatteecata ctetggcaa tgeaaeega tagg t tae: aatte tat t etecaccaet ctaagtgtat eag cog cac gee Lys Gln Gin Ala gtctccceeg cctttcaaa 9tgergeoce tagaeeagtt aAatttgaac attatggtga ategtage ttcgtagee gatttacec agaeaacacc etgeagctct ecattaecee <:221> sig-.pepeide <222> 69. .359 <223> Von Heijne matrix score 4 seq RLPLVVSFIASSSI/N <221> PoiyA-signal <222> 927._932 <221> PolyA-site <222> 947. .959 <300> <400> 168 cggagegeac caggcagccc ageacccca ggcgtggaga ttgatcctge gagagaaggg ggttceee atg gcg gat gac eta eag cge ttc ttg tat aaa aeg tta ce Met Ale Asp Asp Leu Lys Arg Phe Leu Tyr Lys Lys Leu Pro 449 509 569 629 689 749 771 110 158 206 254 302 350 age gte gee Ser Val Glu cct gtt ate Pro Val Ile -65 cct ggt ttc Pro Gly Phe etc cat gee att Leu His Ala Ile gtg eca get age Vel Ser Asp Arg get 99a gte Asp Gly Val get ttg cga Ala Leu Arg.
gtg gee eat Val Ala Asn get cce gag Ale Pro Giu tte tee act Leu Ser Thr ctt gee ace Leu Ala Thr cae ggc age aaa Gin Gly Se: Lys gga ctt tee edad Gly Leu Ser Lys agt ate ate Ser Ile Ile tee tat eec ace Tyr Tyr Asn Thr gtg agt tee eta Val Ser Phe Ile gtg get cad ttt ad cgC tta ect ttg Val Val Gin Phe Asn Arg Leu Pro Leu 264 15 -v -j agc agc agt gcc aet aca gga cta att gtc agc cta gaa aag gee ctt Ser Ser Ser Ala Asn Thr Gly Leu Ile Val Ser Leu Giu Lys Giu Leu 1 5 gcC cca ttg ttt gaa gaa ctg aga caa gtt gcg gaa gett tct Ala Pro Leu Phe Giu Giu Leu Arg Gin Val Val Giu Val Ser taatctgaca gtggttt agcaatcttt agactac caactcatac taaagag ttctggtgca gggtct Ctatcacagc tCCCazg tggatcagaa tcaaact aatatgccca ggcttgc aaaagtaata aaatcag <210> 169 <211> 464 <212> DNA <213> Homo sapiens <220> <221> Sig...peptide <222> 33. .98 <223> Von Heiine mn cag C ta tt gag 9g t aga ttt tac tgtgtacctt aatactta gcatatagat tatttagCga t tag tc tggt acattgatcc a taaagccaa tattcttg aaccacta atcttcatta ta~ gtaaettata get gatctaggga tac Caccatat gga acttgagccg tta Cttteeatetg tga caCCgegtga gga aaaaaaaaa :c atg aag cct Met Lys Pro a. ctg cag ctg .a Leu Gin Leu acaa caca agaaaggg tagatcag .cacagaa Ltgagaga Lagtgccg Lataataa aca taaa a-tetcaa Ccc cccctttctc C tgctat Ctt atggttcagt tc CtatCtcag ccaattgtac taaggacata atggce eggt trix score 9.80000019073486 seq LVVFCLA.LQLVPG/SP 4221> PoiyA..signai <222> 437. .442 <221> PoiyA..site <222> 455. 
.464 <300> <400> 169 gccagaactt actcacccat CcCactgaca c S00 560 620 680 740 800 860 920 959 53 101 149 197 245 293 341 401 461 464 cag tcC ctg gtg gtg Gin Phe Leu Val Val ccc aag cag cgt gt Pro Lys Gin Arg Val Cgc cta gc Cys Leu Al geg ctg cct cc Val Leu Pro Leu gtg cct ggg egt Val Pro Gly Ser cca ccc tgc ata Pro Pro Cys Ile cag gee gat tgc Gin Giu Asp Cys aag tat atC Lys Tyr Ile 10 act cac ctg Thr His Leu tea gca cct Ser Ala Pro gag aaa gga Giu Lys Gly ttg gaa cct Leu Giu Pro tgt aca atg Cys Thr Met gee aac tge Giu Asn Cys tc cag tgc Phe Gin Cys tcc tcc ttc tgc .ggg ata gtc tgt tca Ser Ser Phe Cys Gly Ile Val Cys Ser gee aca ttt caa Ciu Thr Phe Gin aeg cgc aac aga atc aaa cac aag ggc tca Lys Arg Asn Arg Ile Lys His Lys Gly Ser aac tgaggcacat ttcctagatc attttgccc Asn gt-c atc atg cct Val Ile Met Pro tecgatgctt CCtcCtggcc cacctttagg aaggtacega gaagcaagaa actggaggcc caatatcaa cctgcaaacc gctecgagt ttggcaataa aggctaatct accaaaaaaa aaa <210> 170 <21i> 799 <212> DNA <213> Homo sapiens <220> <221> sig-peptide <222> 110. .235 <22 3: Von Heiine matrix 265 score 5.19999980926514 seq LLFDLVCHEFCQS/DD <221> P01yA signal <222> 764. .769 <221> PoiyA-site <222> 787. .799 <300> <400> 170 ccaaccccag gaagagtctg aagagcagcc agtgtttcgg agctgccaaa caagcacgtc Ctgaaaatcc agaatggctt Cttgtgccct gtatacttgl gatgtttac atg cac act Mac His Ile tca caa ctg ctt act aca gtg gat Leu Gin Leu Leu Thr Thr Val Asp tgt cct gac act Cys tgc Cys cag Gin gc Ala cct Pro cag Cin agg Arg att Ile acg Thr tc Ser ga t Asp t Pro cat His aaa Lys tca S er C ta Leu a aa Lys act Thr tta Leu g tg Val.
tct Ser ttt Phe gaa Asp gad Glu aca Thr cag Gln act Ile aaa Lys gat Asp tg e Cys gct Ala gca Ala act Ile 140 aa Thr ttc Phe gtg Val act Thr gac Asp cca Pro tta Leu gaa Glu cag Gln ttt Phe 125 ada Lys aac gga aaa gac Gly Lys Asp tgc cag tct Cys Gln Ser cta gcc tct Leu Ala Ser 15 gag cad gag Glu Gln Glu 30 agc ctc act Ser Leu Ile gag eac tcg Glu Asn Ser acc caa gat Thr Gln Asp 80 ttt ctt tct Phe Leu Ser 95 gga gta aag Gly Val Lys 110 caa aac ctt Gln Asn Leu atc cta cgt Ile Leu Arg ttc cca agt act Ile gat Asp
1
gt Val tat Tyr cgg Arg gca Ala 65 gat Asp aat Asn
gaa Glu ctt Leu gaa Glu 145 ttg gat gga Asp Gly -30 tgg aat Trp Asn -15 gat cca Asp Pro ttC cca Phe Ser cta aag Leu Lys 35 gtC tta Val Leu 50 gag tct Glu Ser ttc cac Phe His att ttt Ile Phe ggc cag Gly Gln 115 cct ttC Pro Phe 130 gtt gat Val Asp act Ile cta Leu ccc Pro g cg Val 20 ata Ile caa Gln aac Asn ttg Leu cag 100 ttg Leu tat Tyr aag Lys caa gca Gln Ala ctt ttt Leu Phe atc att Ile Ile 5 ttg tct Leu Ser gaa aaa Glu Lys aat atg Asn Met aca gag Thr Glu aaa atc Lys Ile gca tta Ala Leu agc aaa Ser Lys agc cct Ser Pro gcg ctt Ala Leu 150 act taaa att Ile gac Asp ctt Leu gcc Ala gta Val gaa Glu gad.
Glu tta Leu aca Thr cag Gln gtg Val 135 gta Val ctg Leu caa Gln atc Ile gat Asp cag Gln act Thr aag Lys aag Lys aag Lys 120 gtg Val cat His gtc Val gad Glu tat Tyr ctt Leu tgt Cys ada Lys gat Asp gag Glu 105 tgt Cys
gaa Glu 118 166 2 14 262 310 358 406 454 502 550 598 646 694 740 799 Ala Asp Asp acctga LJeu Ilu L.ys Asn Phe Pro Ser Leu Lys Val Gin attggaatta cttctgtaca agaaacaaac ttcatttttc tcactgaaaa aaaaaaaaa <210> 17i <211> 320 <212> DNA <213> Homo sapiens <220> <221> poiyA-site <222> 308. .320 <300> <400> 171 tcatcatcca. gagcagccag tgtccgggag geagaag atg ccc cac tci4 aag cct 266 Met Pro His ctg gac tgg Lou Asp Trp Ctc tct Lca gtg Leu Ser Ser Val gaa tgt cca gca Glu Cys Pro Ala Ser Lys Pro gag cta ttt ect Cee aca ggg ggc ctt gca ggg aag ggt eca gga ctt gac etc tta Pro Ser Thr Gly Gly Leu Ala Gly Lys Gly Pro Gly Lou Asp Ile Leu aga tgc Arg Cys ggc Gtc Gly Val gte ttg tee cct Val Lou Ser Pro tgg gcc Trp Ala agt cat ttc eec tet ctg agc etc Set His Phe Pro Ser Leu Ser Lau ttc aac ccg tgaaatgggo tcataatcac tgccttacet CCCtczicggt Phe Asn Lau tgtcgagg actgagtgtg tgaagtttt tcataaactt tggatgetag tgtaaaaaaa aaaaaa <210> 172 <211> 331 <212> DNA 2 13> Homo sapiens <220> <221> Sigpeptide <222> 129. .209 <223> Von Heiine matrix score 4.90000009536743 seq CLLSYIALGAIA/Kl 103 151 199 254 314 320 120 170 266 316 <221:> P0lYA-site <222> 318. .331 <300> <400> 172 atggaaacca gatggggcaa cggggtggtt esectctagc etgctcattt ccagctcaga aaaaggaa atg aac agg gtc cct get Met Asn Arg Val Pro Ala ctagtgcaga czgtagccgc agctcetctc aattctacta atggcgtttt ttctteetga gat tct cca aat atg tgt cta atc Asp Ser Pro Asn Met Cys Leu Ile tgt tta ctg agt tac ate gca ctt gga gee Cys Leu Leu Ser Tyr Ile Ala Leu Gly Ala -5 agg aga gca ttc cag gaa gag gga aga gca Arg Arg Ala Phe Gin Glu Glu Gly Arg Ala ate cat ges aaa atc tgc Ile His Ala Lys Ile Cys aat gca aag acg ggc gtg Asn Ala Lys Thr Gly Val is taaagtttcc ttggaatagc get tgg tgc ata Ala Trp Cys Ile tgg gee aaa Trp Ala Lys caaaaaaaaa aaaaa <210> 173 <211> 1075 <212> DNA <213> Homo sapiens <220> <221> Sig-.peptide <222> 78. .359 score 4.19999980926514 seq IILTAVYFALSIS/LH <221> PolyA-signal <222> 1042. .1047 <221> POlyA site <222> 1063. 
.1075 <300> <400> 173 gtggtaggga geagecagga gcggtttect gggaactgtg ggatgtgcec ttgggggcce gagaaaaeag aaggaag atg etc eag aeC agt aac tac age ctg gtg etc Met Leu Gin Thr 267 Ser Asn Tyr Ser Leu Val Leu tet ctg cag Ser Leu Gin tea gaa ctg Ser Giu Leu ate cag gat 11e Gin Asp ctg etg Ctg Leu Leu Leu tcc tat Ser Tyr gac etc ttt Asp Leu Phe etc caa aag act Leu Gin Lys Thr cc t Pro act gca gte Ile Ala Val tte ctc Phe Phe -60 etc ttc Loeu Phe -45 ttc eag Phe Gin gte ate cag ctt gtg Val Ile Gin Leu Val aae ate ate atc Ott Asn Ile Ile Ile Ile get ggc ctg gcc aac Ala Gly Lou Val Asn aat tce ttc Asn Ser Phe etc ttc ate Leu Phe Ile ttCcetc atg Phe Lau Met aae ace tte Asn Thr Phe etc eta Leu Leu aag ttc aaa Lys Phe Lys atc ate ctg Ile Ile Leu gtg tac Val Tyr ttt gc Phe Ala aaa aac Lys Asn age atc tee ctt, Ser Ile Ser Leu gte tgg gtc atg Val Trp Val Met aac age Asn Ser is aga eta Ara Leu
I.
tte ata Phe Ile gea gea Ala Ala tta cge tgg Leu Arg Trp atg ctg ttt Met Leu Phe tgg aca gat Trp Thr Asp gtg ttg tac Val Leu Tyr 99a ett caa Gly Leu Gin tgc tac ttc Cys Tryr PMe gta ttc cag Val Phe Gin egg aca gee gta Arg Thr Ala Val eta ggc gat Leu Gly Asp ttc tac cag gac tet ttg tgg etg Phe Tyr Gin Asp Ser Leu Trp Leu gag ttc atg caa gtt ega agg Giu Phe Met Gin Val Arg Arg tgacetettg tcacaetgat ggatactttt ccttcctgat aaeagc tgga tccccctct eettctaect accggccttt tgggacagaa tgatagttac aetttaaaaa aa <210> 174 <211> 632 <212> DNA <213> Homo <220> agaagccaea ettcccaagg tgeacaatta etgctceaec ettgecgagg ggaeetceeg gtgctectga aatgttttat tttgetgctt aaggtteaga gagtgtcccc cctcette gttetgtggc gcccecaaag etgateacae aaatagagaa tgcagggaga ctagctgtgt atcggtctcc eteteetete tcttaccctt gateteecag cgeagaeactt taaattgaat 158 206 254 302 350 398 446 494 542 593 653 713 773 833 893 953 1013 1073 1075 109 157 gttggcccta tcagcat tea agtgcggcat tgtaecattc gtgaagcttt tgaeeaaagg tagattttta tcttgttcca tgea tgggca agaaggaaga ceettcettg attetccetg tcctttagcc atgcgaagag taeeeaaggc aaaaaaaaaa sapiens <221> <222> <223> s ig-peptide 62. .265 Von Heijne matrix score 4.59999990463257 seq LPFSLVSMLV'rQC/LV <221.> poiyA..signai <222> 602. .607 <221> pOiyA..site <222> 621. .632 <300> <400> 174 caetgggtca aggagtaage agaggataaa caaetggaag gagageaage aeaaagtcat c atg get tea geg.tct get cgt gga aac caa gat aaa gat gee cat ttt Met Ala Ser Ala Ser Ala Arg Gly Asri Gin Asp Lys Asp Al~a His PMe -60 eca eca cca age aag eag age etg ttg ttt tgt eca aaa tea aaa ctg Pro Pro Pro Ser Lys Gin Ser Lou Lou PMe Cys Pro Lys Ser Lys Lou 268 -45 cac atc eac aga gca gag ate tea aag act atg cga gaa tge eag gaa 205 His Ile His Arg Ala Glu Ile Ser Lys Ile Met Arg Glu Cys Gin Glu -30 gaa age tee egg aag aga gee Ctg cct ttt ect ctt gta agc atg ccc 253 Clu Ser Phe Trp Lys Arg Ala Leu Pro Phe Ser Leu Val Ser Met Leu -15 -10 gec ace cag gga cta gtc cac caa ggt tat ttg gca get aae ct aga 301 Val Thr Gin Gly Leu Val. 
Tyr Cln Gly Tyr Leu Ala Ala Asn Ser Arg 1 5 tc gga tea ttg ccc aaa get gca etc get ggt ccc tc; gga ttt ggc 349 Phe Gly Ser Leu Pro Lys Val. Ala Lou Ala Gly Leu Leu Gly Phe Gly 20 ctc gga aag gta tea tac ata gga gca tgc cag agt *a*a tcceat ttt 397 Leu Gly Lys Val Ser Tyr Ile Gly Val Cys Gin Ser Lys Phe His ?he 35 ccc gaa gac cag ccc cgt ggg get ggt tc ggt cca cag cat aac agg 445 Phie Glu Asp Gin Leu Ar; Gly Ala Gly Phe Gly Pro Gin His Asn Arg 50 55 cac cgc ccc ctt ace cgc gag gas egc aa ata sag cat gga tca agt 493 His Cys Leu Leu Thr Cys Glu Clu Cys Lys Ile Lys His Gly Leu Ser 70 gag sag gga gac tc cag ccc tca gcc tcc tassecge gtcegcgace 543 Glu Lys Gly Asp Ser Gin Pro Ser Ala Ser tcgaagccc cccaaacccc tgaactCgca eaeatttaaa atttcaagtg cactttaaaa 603 taaaatctc ctaatgcaaa aaaaaaaaa 632 <210> 175 <211> 430 <212> DNA <213> Homo sapiens ,c220> <221> polyA.signal <c222> 402. .407 <221> pOiyA-site <222> 419. .430 <300> <400> 175 gtacegggaa agcgattcge gaa at; aaa gCa gas gaa gag cat ace aat gca 53 Met Lys Val Giu Glu Glu, His Thr Asn Ala 1 5 aca ggc act ccc cac ggc ggt tcg sea gce aeg cca gca gat aac ata 101 Ile dly 'hr Leu His Gly Gly Leu Thr Ala Thr Leu Val. Asp Asn Ile is 20 eea aca aeg get ccg eta tgc aeg gsa sgg gga gca ccc gga gtc agc 149 Ser Thr Met Ala Leu Leu Cys Thr Giu Arg Gly Ala Pro Gly Val Ser 35 gte gat atg aac aca acg cac atg tea ccc gcs aaa tea gga gag gat 197 Val Asp Met Asn Ile Thr Tyr met Ser Pro Ala Lys Leu Gly Glu Asp 50 acs gcg act sea gea cat gte ccg aag esa gga aaa aca etc gca ccc 245 Ile Val Ile Thr Ala His Val. 
Leu Lys Gin Gly Lys Thr Leu Ala Phe 65 ace tc gtg ggc ctg ace aac aag gee aca gga aas cca ata gea caa 293 Thr Ser Val Gly Leu Thr Asn Lys Ala Thr Gly Lys Leu Ile Ala Gin 80 85 gga aga cac sea aa eac ccg gga aac tgagsgaaca geagasegac 340 Gly Arg His Thr Lys His Leu Gly Asn ctaaagaaac ecascasegd atatcacae cagaccegac tcaaacaatc gcaatcttttg 400 ataaacta geaaseea aaaaaaaaaa 430 <210> 176 <211> 185 269 <212> DNA <213> Homno sapiens <220> <221> Sig-peptide <222> <223> 42. .113 Von Heijne matrix score 3.70000004768372 seq ILFULLIFLCGFT/NY <221> PoiyA-site <222> 172. .185 <300> <400> 176 ctttcagasoc tcactgccaa gatgcccegaa caggagccac c atg cag tgc ttc agc Met Gln Cys Phe Ser ttc att aag acc atg atg Phe Ile Lys Thr Met Met Atc ctC ttc aat ttg ctc atc ttt ctg tgt le Leu Phe Asn Leu Leu lie ?he Leu Cys -10 ggc ttc acc Gly Phe Thr tat acg gat ttt gag gac tca ccc tac ttc aaa atg Tyr Thr Asp Phe Giu Asp Ser Pro Tyr Phe Lys Met cat aaa cct gtt aca atg His Lys Pro Val Thr Met <210> 177 <211> 585 <212> DNA <213> Homo sapiens <220> <221> sig...peptide <222> 108. .170 taaaaaaaaa aaaaa score seq SFLPSALVIWTSA/AF <221> polyA signal <222> 550. .555 <221> polyA..site <222> 574. .585 <300> <400> 177 cacgttcctg ttgagtacac gttcctgttg atttacaaaa tgaagactaa cattttgtga agttgtaaaa cagaaaacct ggtgcaggta tgagcaggtc gttagaa atg tgg tgg Met Trp Trp ttt cag caa ggc ctc agt ttc ctt, cct. 
tca gcc ctt gta att tgg aca Phe Gin Gin Gly Leu Ser Phe Leu Pro Ser Ala Leu Val Ile Trp Thr -10 tct gct gct ttc ata ttt tca tac att act gca gta aca ctc cac cat Ser Ala Ala Phe Ile Phe Ser Tyr Ile Thr Ala Val Thr Leu His His ata gac ccg get tta Ile Asp Pro Ala Leu tat ate agt gac Tryr Ile Ser Asp ggC aca gta gct Gly Thr Val Ala gaa aaa tgc tta ttt ggg gca atg cta aat att gcg gca gtc Glu Lys Cys Leu Phe Gly Ala Met Leu Asn Ile Ala Ala Val tta, tgt Leu Cys caa aaa tagaaatcag gaagataatt caacttaaag aagttcattt catgaccaaa, Gin Lys ctcttcagaa acatgtcttt acaagcatat ctcttgtatt gctttctaca ctgttgaatt gtctggcaac acttctgcag tggaaaattt gatttagcta gttcttgact gataaatatg gtaaggcggg cttttccccc tgtgtaattg gctactatgt cttactgagc caagttgtaa tttgaaataa aatgatatga gagtgacaca aaaaaaaaaa a <210> 178 270 <211> <212> <213> <220> <221> <222> <223> 613
DNA
Homo sapiens sig_peptide 118..171 Von Heijne matrix score 5.90000009536743 seq ALALLWSLPASDL/GR <221> polyA_signal <222> 563..568 <221> polyA_site <222> 602..613 <300> <400> 178 ggggtgggtg gactagaagc atttgggage gagctgccgc acagagcctg gtgtccacaa atg agc ccc ggc agc gcc ttg gcc Met Ser Pro Gly Ser Ala Leu Ala agtggccagg gCttcdggt Ctt ctg tgg Leu Leu Trp ggccctggac gctagccacg tggggttgga gcctggg tcc ctg cca gcc tct Ser Leu Pro Ala Ser gac ctg ggc cgg tea gtc Asp Leu Gly Arg Ser Val
gct gga ctc tgg cca cac act ggc gtt Ala Gly Leu Trp Pro His Thr Gly Val
I.
ate cac Ile His ttg gaa aca Leu Giu Thr ttt ccc ctc Phe Pro Leu agc cag tct ttt Ser Gin Ser Phe tgt tgt aca tcg Cys Cys Thr Ser caa ggt cag ttg Gin Gly Gin Leu aag agc ata, Lys Se- Ile ttt tgt gtt Phe Cys Val gta aca gtg ggt gga ggg agg gtg ggg tct aca ttt gtt Val Thr Val. Gly GJly Gly Arg Val. Gly Ser Thr Phe Val tgagtcgatg ggtcaga tcgaaaataa gcacctt attgattaag ttactgt agaacatagc aaggggg gcaaaccttt aaaaaaa <210> 179 <211> 427 <212> DNA <213> Homo sapiens <220> <221> sig..peptide <222> 128. .268 <223> Von Heijne m, act ttagtatacg ggt aactaaaccc aaa agcttgggtt CtC ctctgttgga aaa aa catgegtcct etc taatagc tatttttgta gtaatgtaaa ctgagtgaca tataaaggct ggacttaatg ttgtaattat gggeattttg ttagctctgt gctaagaatt aaataaacat atrix score seq SALLFFARPCVFC/FK <221> PoiyA..signal <222> 410. .415 <221> polyA.site <222> 424. .427 <300> <400> 179 agcttggatt tacactgggc aacgtggttg gaatgtatct caaacctggc taaaaaaett gaagaaatta aaaaggactt ctagcgc atg aga ctg OCt cca gca ctg cct tca M1et Arg Leu Pro Pro Ala Leu Pro Ser -40 act gct ct Thr Ala Leu ggctcagaac tatgatatac ggatgccaag aagaaacccc gga tat act gat tct Gly Tyr Thr Asp Ser caa aag ctt ttg ttt Gin Lys Leu Leu Phe aga ccc tgt gtt ttt Arg Pro Cys Val Phe gag gge etc gtt tac tat ctg aac Glu Gly Leu Va. Tyr Tyr Leu Asn -25 tog tet cca, gee tca gea ott ctc tto ttt got Ser Ser Pro Ala Ser Ala Leu Lou Phe Phe Ala -i0otgc ttt aaa gca agc aaa atg ggg ccc caa ttt gag aac Cys Phe Lys Ala Ser Lys Met Gly Pro Gin Phe Giu Ann 1 5 10 ttt eca aca tac tea cct ctt Ccc ata ate cct ttc caa Phe Pro Thr Tyr Ser Pro Leu Pro Ile Ile Pro Phe Gin 25 agg ttc taagactgga attatggtgc tagattagta aacatgactt Arg Phe aaaaacaaaa <210> 180 <211> 905 <212> DNA <213> Homo sapiens <220> <221> slg-.peptide <222> 149. .457 <223> Von Heijne matrix score 4.90000009536743 seq FLLAQTTLRNVLG/TQ ta Cti Le' t t c cca aca r Pro Thr is g cat ggg u His Gly aat gaaaa taaaagcttc gagaatttcg tte cct Phe Pro Sgat gtg tAsp Val tgte aaa e Val Lys <221> poiyA.site <222> 893. 
.912 <300> <400> 180 gctgcctgtt ctccacactt agctccaaac eeatgaaaaa ttgecaagta teaagaatga gacggattct agggtgtctt cacctgagaa gcaagataaa tgggtgccaa caataaaegg cttggtgt atg tgg ccg gat cct gtt Met Trp Leu Asp Pro Val -100 ttt ect gtt ggt Phe Pro Val Gly gat cat Asp His tac ctt ccc Tyr Leu Pro cat etc cat at~ His Leu His Me~ -85 ata gat gtg ttt Ile Asp Val. Ph ctt gaa ggt ttg ate ctg gte ctg cca Leu Giu Gly Lou Ile Leu Val Leu Pro gtt gac etc Val Asp Leu acc aga gac Thr Arg Asp aca gtt act tgc aac att ect cca. caa Thr Val Thr Cys Ann Ile Pro Pro Gin gta act act cag Val Thr Thr Gin -40 gta gat gga Val Asp Gly ate tat Ile Tyr gca aca Ala Thr agt get gte tea gca gtg get aat gtc Ser Ala Val Ser Ala Val Ala Ann Val gtt -gtc Val Val aac gat Ann Asp aat gte Ann Val gag atc etc Giu Ile Leu tat tac aga T1yr Tyr Arg gte cat caa Val His Gin tta ggg aca Leu Gly Thr 120 172 220 268 316 364 412 460 508 556 604 652 700 ttt ctg Phe Leu ctg get Lou Ala cag ate Gin Ile caa acc act ctg Gin Thr Thr Leu tta, get gga ega Leu Ala Gly Arg cag ace ttg tee Gin Thr Leu Ser ate cag act tta Ile Gin Thr Leu gag ate gee Clu Ile Ala 1 cat age His Ser ctt gat gat gee Leu Asp Asp Ala gaa ctg tgg ggg ate egg gtg Giu Leu Trp Gly Ile Arg Val gec ega Ala Arg gtg gaa ate aaa Val Glu Ile Lys gat gtt Asp Val cgg att ccc Arg Ile Pro cag ttg cag aga Gin Leu Gin Arg tee atg gca gee Ser M4et Ala Ala so ctt gca get gaa, Lou Ala Ala Glu gag get gag gcc ace Glu Ala Giu Ala Thr egg gaa geg aga gee aag gte Arg Giu Ala Arg Ala Lys Val 60 tec aaa tee ctg aag tea gee Ser Lys Ser Lou Lys Ser Ala 75 gga gaa Gly Glu atg aat get Met Ann Ala tee atg gtg etg get gag tct ccc ata gct etc cag ctg ege tac ctg Ser Met Val Leu Ala cag acc ttg agc acg Gin Thr Leu Ser Thr 100 cct ctg ccc atg aat Pro Leu Pro Met Asn 115 aac cac aag aeg ctt Asn His Lys Lys Lou 130 caaaaaaaaa aaaa <210> 181 <211> 307 <212> PRT <213> Homo sapiens <220> Glu Ser Pro Ile 90 gta gcc acc gag Val Ala Thr Glu 105 ata cta gag ggc Ile Leu Glu Gly 120 cca aat aaa gcc 
Pro Asn Lys Ala 135 272 Ala Leu Gin Leu Arg Tyr Leu aag aat tct acg att gtg ttt Lys Asn Ser Thr Ile Val Phe 110 att ggt ggc gtc agc tat gat Ile Gly Gly Val Ser Tyr Asp 125 tgaggtcctc ttgcggzagt <221> SIGNAL <222> -13..-1 <300> <400> 181 Met Leu Ala Vi Leu Glu Ser P~ Leu Leu Leu C: Arg Leu Phe G: Gly Asp Val M Glu Asn Gly G Lys Thr Arg G: Ala Gly Pro A~ 100 Phe Giu Val Al Glu Thr Pro 1 Val Thr Gin Al 150 Trp Gin Arg A 165 Gly Arg Leu L 180 Leu Asp Gin L Asp Phe Val L 2 Tyr Val Ser C 230 Me. Pro Gly P 245 Trp Val Gly M 260 Asp Phe Leu S Lys Lys Lys <210> 182 <211> 59 <212> PRT al
LO
ro Ly Lu et eu ly he sn e 35 sp eu eu eu 15 ly he et er Ser Ile Va1 Asn Phe Ile Asp Gly Pro 120 Glu Gly Asp Glu Arg 200 Va1 Leu Pro Ser Clu 280 Leu Asp Leu 25 Gin Thr Glu Glu Thr 105 Trp Gly Arg Tyr Tyr 185 Phe Ala Met Asp Thr 265 Arg Pro -5 Pro Asn Gly Ala 60 Ala Cys Val Glu Met 140 Tyr Leu Va1 Gly Thr 220 Gly Arg Pro Ile Leu Leu Thr Pro 45 Asp Arg Gly Ala Val 125 Ser Phe Va1 Thr Val 205 Met Ala Pro Asn Lys 285 Leu Ser Lys 30 Glu Gly Phe Arg Asp 110 Lys Phe Thr Met Arg 190 Gln Ala Asp Ser Pro 270 Arg Met Met Leu 1 Glu Pro Pro Gin Ala Glu Ala His Ile Val Lys Leu Gly Pro Cys Gly Ile Arg Lys Gly Leu 115 Leu Ser Ser 130 Asp Leu Thr 145 Ser Ser Lys Thr Asp Asp Lys Val Leu 195 Pro Ala Glu 210 Arg Arg Val 225 Val Giu Asn Gly Gly Tyr Ser Met Leu 275 Phe Lys Val 290 273 <213> Homo sapiens <220> <300> <400> 182 Met Met Tyr Val Ser Ile Glu Met Ser Gly Pro Thr Ile Ser His Leu 1 5 10 Phe Asp Tyr Val Val Cys Tyr Ile Tyr Gly Leu Lys Ser Phe Ser Leu 25 Lys Gln Leu Lys Lys Lys Ser Trp Ser Lys Tyr Leu Phe Clu Ser Cys 40 Cys Tyr Arg Ser Leu Tyr Val Cys Val Pho Ile <210> 183 <211> 97 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -28..-1 <3007 <400> 183 Met Ser Pro Ala Phe Arg Ala Met Asp Val Glu Pro Arg Ala Lys Gly -20 Val Leu Leu Glu Pro Phe Val His Gin Val Gly Gly His Ser-Cys Val -5 1 Leu Arg Phe Asn Glu Thr Thr Leu Cys Lys Pro Leu Val Pro Arg Glu 10 15 His Gin Phe Tyr Glu Thr Leu Pro Ala Glu Met Arg Lys Phe Ser Pro 30 Gin Tyr Lys Gly Gln Ser Gln Arg Pro Leu Val Ser Trp Pro Ser Leu 45 Pro His Phe Phe Pro Trp Ser Phe Pro Leu Trp Pro Gln Gly Ser Val 60 Ala <210> 184 <211> 52 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-1 <300> <400> 184 Met Leu Gly Thr Thr Gly Leu Gly Thr Gln Gly Pro Ser Gln Gln Ala -25 Leu Gly Phe Phe Ser Phe Met Leu Leu Gly Met Gly Gly Cys Leu Pro -10 Gly Phe Leu Leu Gln Pro Pro Asn Arg Ser Pro Thr Leu Pro Ala Ser 1 5 10 Thr Phe Ala His <210> 185 <211> 124 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -97..-1 <300> <400> 185 Met Ala 
Asp Asp Leu Lys Arg Phe Leu Tyr Lys Lys Leu Pro Ser Val -90 Glu Gly Leu His Ala Ile Val Val Ser Asp Arg Asp Gly Val Pro Val Val Lys Val Ala Asn Phe Leu Ser Thr Phe Leu Ser Lys Asn Lys.
Val Gln Phe Asn Arg Ser Ala Asn Thr Gly
1
Leu Phe GCu Glu Lou <210> 186 <211> 230 <212, PRT <213> Homo sapiens <220> <221>-SIGNAL <222> -24..-1 <300> <400> 186 Mec Ala Ser Leu Gly Gly Leu Leu Gly Thr Ser Ser Tyr Val Cly Gly Leu Trp Met Glu Asp Ile Tyr Ser Thr Gin Ala Met Met Val Ile Ser Val Val Gly Ala Lys Asp Arg Val Gly Leu Leu Gly Phe 105 Arg Asp Phe Tyr Ser 125 Gly Giu Ala Leu Tyr 140 Ala Gly Ile Ile Leu 155 Asn Tyr Tyr Asp Ala 170 Pro Arg Pro Giy Gin 185 Ser Leu Thr Gly Tyr 205 <210> 187 <211> 72 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-i <300> <400> 187 -75 Asp Asn Ala Pro -60 Ala Leu Ala Thr Ser Ile Ile Cys -25 Leu Pro Leu Val -10 Leu Ile Val Sor 5 Arg Gin Val Val Glu Asp -40 Tyr Val Leu Glu Gly Leu Thr Ser Pro Ile Val Gly Trp Asp 130 Ser Ser Pro His -55 Gin Tyr Ser Clu 10 Va 1 Tyr Leu Ala Thr Ala Ser Phe I Val Asn 115 Ser Ser Se r Leu Ser 195 Ala Leu Arg Pro Gly Giy Ser Lys Leu Gly Asn Thr Tyr Gin Val Phe Ile Ala Ser Ser
-S
Lys Glu Leu Ala Pro Ser Lys Val Lys Glu Phe Asn Ser Met ?he Ala Leu Ala Val Met Arg Ala Phe Arg Lys Asn Lys Thr Leu -25 275 Gly Tyr Gly Val Pro Met Leu Leu Leu Ile Ala Gly Giy Ser Phe Gly -10 Leu Arg Glu Phe Ser Gin Ile Arg Tyr Asp Ala Val Lys Ser Lys Met 1 5 10 Asp Pro Glu Leu Giu Lys Lys Pro Lys Giu Asn Lys Ile Ser Leu Clu 25 Ser Giu Tyr Giu Gly Ser Ile Cys <210> 188 <211> 88 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -33..-l <300> <400> 188 Met Ser Gin.Thr Ala Trp Leu Ser Leu Leu Ser Ser Ser Pro Phe Gly -25 Pro Phe Ser Ala Leu Thr Phe Leu Phe Leu His Leu Pro Pro Ser Thr -10 Ser Leu Phe Ile Asn Leu Ala Arg dGy Gin Ile Lys Gly Pro Leu Gly 1 5 10 Leu Ile Leu Leu Leu Ser Phe Cys Gly Gly Tyr Thr Lys Cys Asp Phe 25 Ala Leu Ser Tyr Leu Giu Ile Pro Asn Arg Ile Giu Phe Ser Ile Met 40 Asp Pro Lys Arg Lys Thr Lys Cys so <210> 189 <211> 106 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -32..-i <300> <400> 189 Met Phe Ala Pro Ala Val Thr Arg Ala Phe Arg Lys Asn Lys Thr Leu -25 Gly Tyr Gly Val Pro Met Leu Leu Leu Ile Val Giy Gly Ser Phe Gly -10 Leu Arg Giu Phe Ser Gin Ile Arg Tyr Asp Ala Val Lys Ser Lys Met 1 5 10 Asp Pro Giu Leu Giu Lys Lys Leu Lys Glu Asn Lys Ile Ser Leu Glu 25 Ser Giu Tyr Giu Lys Ile Lys Asp Ser Lys Phe Asp Asp Trp Lys Asn 40 Ile Arg Gly Pro Arg Pro Trp Giu Asp Pro Asp Leu Leu Gin Cly Arg 55 Asn Pro Giu Ser Leu Lys Thr Lys Thr Thr <210> 190 <211> 267 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -21.-1 <300> <400> 190 Met Trp Trp Phe Gin Gin Gly Leu Ser Phe Leu Pro Ser Ala Leu Val -15 276 Ile Trp Thr Ser Ala Ala Phe Ile Phe 1 Leu His His Ile Asp Val Ala Pro Glu Lys Val Leu Cys Ile Ala Leu Ser Pro Glu Glu Val Leu Gly Ile Leu Gln Lys Thr Thr Leu Phe Gly Met Gly Ser 110 Gin Met Gin Pro Lys 125 Leu Leu Val Ile Trp 140 Ser Set Val Leu His 160 Leu His Trp Asn Pro 175 Thr Ala Ala Glu Trp 190 Thr Tyr Ile Arg Asp 205 Leu His Gly Leu Thr 220 Glu Arg Thr Arg Leu 240 <210> 191 <211> 108 <212> PRT <213> Homo sapiens <220> <300> <400> 191 Met 
Gly Cys Val Phe 1 5 Asp Trp Thr Leu Ser Tyr Tyr Tyr Ser Asn Val His Leu Met Gly Gln Asp Val Gin Glu Leu Lys Gly Glu Ser LeL Pro Glu Glu Pro 100 <210> 192 <211> 69 <212> PRT Leu Phe 35 Tyr Ile Lou Ala Met 115 Gly Val Asn Lys Ser 195 Lys Asp Arg Thr Glu Val 40 Leu Gln Phe Thr 5 Pro Tyr 20 Gly Ala Val Arg Ile Lys Gly Leu 85 His Val 100 Phe Val Lys Gin Ser Ala Phe Gly 165 Gly Tyr 180 Phe Ser Ile Ser Thr Ala Asp Ile 245 Glu Asp 10 His Ala 25 Pro Ile Cys Asn Gly Thr Lys Lys 90 Gin Met 105 Ser Tyr Ile Thr Ala Val Thr Ile Ser Asp Thr Gly Thr Asn Gin Lys Val Ala Ile 120 Trp Met Leu His Gly 200 Val Pro Ile Glu Phe Ser Cys Val Ile Val Ala Ala Val 105 Leu Ile Leu Glu Met 185 Phe Glu Ile Phe Tyr Gin Leu Glu Leu Ala His Gly Asn Leu Ser Arg Thr Gin 170 Ile Phe Ala Asn Lys Val Asn Leu Ile His Ala Ala Leu Phe Thr Tyr Leu Cys 155 Lys Thr Leu Asn Asn 235 Ile Leu Arg Leu Arg Val Gin Ser Pro Gly Leu Ser Asp Ile Ala Asp 70 Gin Val Lys Gly <213> Homo sapiens <220> <221> SIGNAL <222> -46..-1 <300> <400> 192 Met Ser Val Phe Trp Gly Phe Val Gly Phe Leu Val Pro Trp Phe Ile 277 -40 Pro Lys Gly Pro Asn Arg Gly Val Ile Ile Thr Met Leu Val Thr Cys -25 -20 Ser Val Cys Cys Tyr Leu Phe Trp Leu Ile Ala Ile Leu Ala Gin Leu -5 1 Asn Pro Leu Phe Gly Pro Gin Leu Lys Asn Clu Thr Ile Trp Tyr Leu 10 Lys Tyr His Trp Pro <210> 193 <211> 251 <212> PR? <213> Homo sapiens <220> <22i> SIGNA.L <222> -28. 1 <300> <400> 193 Met Trp Arg Leu Leu Ala Arg Ala Ser Aia Pro Leu Leu Arg Val Pro Leu Ser A.
Leu Leu P; Lys Leu Ai Giu Pro L) Giu Phe T1 Leu His Tz Met Asp Pi Lys Pro Il Giy Ala Il Vai Clu Me 13 Asp Gin Ve 150 Gly Thr Le 165 Asn Gin AE Gly Ile Ax Tyr Trp GI 22 <210> 194 <211> 99 <212> PRT Val1 Glu Val1 Glu Gly Asn Ala Gly Arg 130 Gly Val1 Glu Asn Lys 210 <213> Homo. sapiens <220> <221> SIGNAL <222> -1 <300> <400> 194 Met Asp Asn Val Gin Pro Lys Ile Lys His Arg Pro Phe Cys Phe Ser -40 Val Lys Gly His Val Lys Met Leu Arg Leu Asp Ile Ile Asn Ser Leu -25 278 Val Thr Thr Val Phe Met Leu Ile Val Ser Val Leu Ala Leu Ile Pro -10 Glu Thr Thr Thr Leu Thr Val Gly Gly Gly Val Phe Ala Leu Val Thr 1 5 10 Ala Val Cys Cys Leu Ala Asp Gly Ala Leu Ile Tyr Arg Lys Leu Leu 25 Phe Asn Pro Ser Gly Pro Tyr Gln Lys Lys Pro Val His Glu Lys Lys 40 Clu Val Leu <210> 195 <211> 81 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -31..-1 <300> <400> 195 Met Ser Asn Thr His Thr Val Leu Val Ser Leu Pro His Pro His Pro -25 Ala Leu Thr Cys Cys His Leu Gly Leu Pro His Pro Val Arg Ala Pro -10 -5 1 Arg Pro Leu Pro Arg Val Glu Pro Trp Asp Pro Arg Trp Gln Asp Ser 10 Glu Leu Arg Tyr Pro Gin Ala Met Asn Ser Phe Leu Asn Glu Arg Ser 25 Ser Pro Cys Arg Thr Leu Arg Gin Glu Ala Ser Ala Asp Arg Cys Asp 40 Leu <210> 196 <211> 150 <212> PRT <213> Homo sapiens <220> <300> <400> 196 Met Lys Val His Met His Thr Lys Phe Cys Leu Ile Cys Leu Leu Thr 1 5 10 Phe Ile Phe His His Cys Asn His Cys His Glu Glu His Asp His Gly 25 Pro Glu Ala Leu His Arg Gin His Arg Gly Met Thr Glu Leu Glu Pro 40 Ser Lys Phe Ser Lys Gln Ala Ala Glu Asn Glu Lys Lys Tyr Tyr Ile 55 Glu Lys Leu Phe Clu Arg Tyr Gly Glu Asn Gly Arg Leu Ser Phe Phe 70 75 Gly Leu Glu Lys Leu Leu Thr Asn Leu Gly Leu Gly Glu Arg Lys Val 90 Val Glu Ile Asn His Glu Asp Leu Gly His Asp His Val Ser His Leu 100 105 110 Gly Ile Leu Ala Val Gin Glu Gly Lys His Phe His Ser His Asn His 115 120 125 Cln His Ser His Asn His Leu Asn Ser Glu Asn Gin Thr Val Thr Ser 130 135 140 Val Ser Thr Lys Lys Lys 145 150 <210> 197 <211> 273 <212> PRT <213> Homo sapiens <220> <221> 
SIGNAL27 279 <400> 197 Met Asn T Ser Thr A V.il Leu Vi Lys Asp PI
S
Phe Asp G.
Lou Ile Li Tyr Arg G: Ser Gly Ai Trp Thr T) Phe Leu T) 100 Val Val. L Ile Ser L Thr Ala A] Vai Ser Lj 165 Met Cys V1 180 Asp Asp Lc Pro Pro Lc Leu <210> 198 <211> 413 <212> PRT rp Ser Ile Phe Glu Gly Leu Cys Phe Th r Gin Tyr Cys Phe His 120 Ser Cys Cys His Ser 200 Pro Arg Ile Val Thr Asn Thr 10 Pro Val 25 Cys Pro Giu Lys Lou Asn Ser Leu 90 His Ser 105 Ala Asp Giu Lys Ile Leu His Giu 170 His Pro 185 31ly Asp ksp Arg Leu Leu Lou Thr -i5 Cys Ile
I
Ser Ser Ala Asn Leu Ser 50 Glu Lys 65 Leu Cys Gly LeU Leu Ser Trp Leu Ser Lou Ala CiU Arg Val Arg Gin Pro Gly Ser His Val Arg Ser Lou Leu Val Arg His Arg Glu Pro Gly Lys Lys Val Phe Lys Ala Phe Tyr Pro Lys 110 Pro Cys Pro Asn 125 Asn Ile Phe Thr 140 Leu Asn Leu Val 155 Cys Leu Ala Ala His Asp Thr Thr 190 Leu Ile Phe Leu 205 Pro Arg Asp His 220 Gly Val Asfl Lys Tyr Val1 Trp Cys Lou Val1 Ala Arg Ser Tyr Ile Leu Ciu Arg 175 Ser Gly Val1 Giu Leu Lys Gly Asn Lys Cys Glu Phe Se r Ser Trp Me t His Gly Val1 Ile Val1 Phe Leu 160 Lys Ser Ser Lys Val
ASP
Glu Ile Ile Lys Lys His Ile Asp As n Ala His Gly Gly Asp Leu Asp Met~ 145 Ile Al a Cys
ASP
Lys 225 Thr Cys Ala Se r Val1 Arg Glu Arg <213> Homo sapiens <220> <221> SIGNAL <222> -37. i <300> <400> 198 Met Ala Ser Lys 11, Ile Cys Lou Giu Lei Ser Leu Cys Arg Al, Ser Met Gly Gly Ly is Phe Giu His Leu Gi, Leu Lys Giu Val. Ly Cys Asp His His GI Lys Val Ile Cys Tr Val Pro Val.
Pro 20 His Asp Leu Arg G lu Ser Asn Cys Ala Gly Phe 70 Gin Cys Pro Gly His Val Thr Tyr Ser Glu. Arg Asp Leu Asp Arg Gly His 280 His Thr Val Leu Thr Glu Giu Val Phe Lys Gin Ala V 1 Leu Giu A 125 Gin T1hr C 140 Ile Lou A Giu Lys L Gin Gin L
I
Ser Gin T 205 Lys Trp S 220 Lys Ltu L Met Phe A: Leu A~n Si 2' Arg Gin V~ 285 Gly Val Li 300 Val Asp V~ Thr Tyr S Gin Asn L 3' Cly Leu G: 365 <210> 199 <211> 393 <212> PRT al 10 iu ys 90 rp er ys rg 70 a.
50 In Leu Asp Arg As n Thr.
175 Gin Ser Glu Thr Glu 255 Val1 Ile Gly Ser Arg 335 Tyr Asn Leu Giu 130 Ile Gin Lys Arg Glu 210 Arg His Ala Asn Pro 290 Tyr Thr Lys Tyr Lys Giu GIn Arg Phe Glu 195 Leu Lou Al a ValI Leu 275 Ile Phe Ala Tyr Arg 355 Lys Lys Thr Giu Ala 180 Leu Leu Lys Pro Arg 260 Asn Trp Sor Trp Val1 340 Pro Giu Thr Giu Lou 165 Giu Ile Gin Lys Asp 245 Cys Leu Pro Ser Ile 325 Val1 Leu 85 Giu Glu Se r Pho 150 Gin Ala Sor Asp Pro 230 Leu Tyr Val1 ?he dly 310 Leu Arg Phe Giu Trp 135 Asp Arg Clu Asp Met 215 Lys Scr Trp Leu Gin 295 Lys Gly Arg Giy Cys Gin Giu Lys *Giu 120 Lys Gin Lou Asp Val1 200 Ser Met Arg Val1 Ser 280 Cys His Vai Cys Tyr 105 Ala Tyr Lou- Giu Giu 185 G iu Gly Val1 Met Asp 265 Glu Tyr Tyr Tyr Ala 345 Trp Cys Ala Gin Lou Ala Asp Trp Leu Lys Glu Gin Arg Ciu 170 Lcu Cys Ile Ser Lou 250 Val1 Asp As n Trp Cys 330 Asn Val1 Sor
-S
Ser Asp Giu Gly Val1 Lou His Gly Leu Lys Val1 Ser 155 Giu ValI Arg Met Lys 235 Gln Thr Gin Tyr Giu 315 Arg Arg Ile Pro Lys Arg His Asp Thr Gin Asp Lou Cys Lys 370 Tyr Giy Ala Lys Lys Lys 375 <213> Homo sapiens <220> <221> SIGNAL <222> -19. 1 <300> <400> 199 Met Arg Thr Lou Phi 1! Val His Thr Thr Lei
I
Thr Leu LeU Glu Ly Gly Lou Val Val Th, Arg Sor Tyr Cys So: s0 Val Leu Gly Tyr Va.
Lys Val Phe Gly Se Lou, Lys Arg Arg Ci Val Asp Gin Gly Tr Lou Ala -10 Ala Lys Asp Lys Giu Ser 40 Asp Arg 55 Ser His Ile 5cr Giu Val Arg Lys 110 His Ile Val Pro Al Arg Asn Val Leu A 145 Val Val Gln Val Al 160 Val Trp Asn Gln LE 175 Leu Thr His Leu Al 190 Leu Val Ile Pro Pr 21 Phe Thr His Lys CG1 225 Ser Leu Met Thr Ty 240 Ala Pro Leu Ser Tr 255 Ser Lys Trp Arg Se 270 Asp Tyr Ala Thr Se 29 Tyr Ile Gln Thr Le 305 Gin Ala Ser Glu Hi 320 His Val Val Phe Ty 335 Leu Ala Arg Glu Le 350 Gly Leu Asp Tyr Ph 37 <210> 200 <211> 381 <212> PRT <213> Homo sapiens rg sp la eu la .u r r r 0 u s r u e 0 115 Leu Set Lys Leu Glu 195 Ala Phe Asp Val Lys 275 Lys Lys Phe Pro Gly 355 281 120 Asp Trp Thr Leu Glu Asn Ser 180 Ala Ile Glu Tyr Arg 260 Ile Asp Asp Phe Thr 340 Val Phe Asp Gln 165 Gln Leu Thr Gin Set 245 Ala Leu Ala His Glu 325 Leu Gly Glu Glu 150 His Lys His Pro Leu 230 Thr Cys Leu Arg Arg 310 Tyr Lys Tyr Asp Asp 135 Ile Phe Arg Gln Gly 215 Ala Ala Val Gly Glu 295 Pro Lys Ser Glu Asp Val Ala 200 Thr Pro His Gin Leu 280 Pro Arg Lys Leu Glu Gly Gly 185 Arg Asp Val Gin Val 265 Asn Val Met Ser Gin 345 Leu Phe 170 Leu Lou Gln Leu Pro 250 Leu Phe Val Val Arg 330 Val Ser 155 Val Ile Lou Leu Asp 235 Gly Asp Tyr Gly Trp 315 Ser Arg 140 Lys Val His Ala Gly 220 Gly Pro Pro Gly Ala 300 Asp Gly Leu 125 Phe Thr Clu Met Leu 205 Met Phe Asn Lys Met 285 Arg Ser Arg Glu Val Ser Ile Trp Glu Leu Gly Gln Tyr Asp Leu Leu <220> <221> SIGNAL <222> -13..-1 <300> <400> 200 Met Leu Leu Si Thr Val Leu T Glu Ala Asp I.
Asp Asp Leu P: Lys Cly Phe L Val Pro Pro P Ile Arg Arg L, Arg Ala Gly T 100 Leu Ile Ser M Ile Pro Ser V 1 Glu Phe Thr T Ile Gly Met Leu Met Leu Ser Ala Thr Gin Val Tyr Leu Tyr Phe Ser Asp Asn Ala Asn Gly Gly Phe Glu Arg Glu Ser Ile His Glu 125 Ser Leu 282 150 155 160 Ser Leu Pro Leu Giu Tyr Tyr Leu Ile Pro Phe Leu Ile Ile Val Gly 165 170 175 Ile Cys Leu Ile Leu Ile Val Ile Phe Met Ile Thr Lys Phe Val Gin ieO 185 190 195 Asp Arg His Arg Ala Arg Arg Asn Arg Leu Arg Lys Asp Gin Leu Lys 200 205 210 Lys Leu Pro Val His Lys Phe Lys Lys Gly Asp Giu Tyr Asp Val Cys 215 220 225 Ala Ile Cys Leu Asp Giu Tyr Giu Asp Gly Asp Lys Leu Arg.Ile Leu 230 235 240 Pro Cys Ser His Ala Tyr His Cys Lys Cys Val Asp Pro Trp Leu Thr 245 250 255 Lys Thr Lys Lys Thr Cys Pro Val Cys Arg Gin Lys Val Val Pro Ser 260 265 270 275 Gin Gly Asp Ser Asp Ser Asp Thr Asp Ser Ser Gln Glu Giu Asn Glu 280 285 290 Val Thr Giu His Thr Pro Leu Leu Arg Pro Leu Ala Ser Val Ser Ala 295 300 305 Gin Ser Phe Gly Ala Leu Ser Glu Ser Arg Ser His Gin Asn Met Thr 310 315 320 Giu Ser Ser Asp Tyr Glu Glu Asp Asp Asn Glu Asp Thr Asp Ser Ser 325 330 335 Asp Ala Glu Asn Glu Ile Asn Glu His Asp Val Val Val Gin Leu Gin 340 345 350 355 Pro Asn Gly Clu Arg Asp Tyr Asn Ile Ala Asn 'hr Val 360 365 <210> 201 <211> 291 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42..-1 <300> <400> 201 Met Asp Ser Arg Val Ser Ser Pro Glu Lys Gin Asp Lys Giu Asn Phe -35 Val Gly Val Asn Asn Lys Arg Leu dly Val Cys Cly Trp Ile Leu Phe -20 Ser Leu Ser Phe Leu Leu Val ie Ile Thr Phe Pro Ile Ser Ile Trp -5 1 Met Cys Leu Lys Ile Ile Arg Glu Tyr Clu Arg Ala Val Val Phe Arg 15 Leu Gly Arg Ile Gin Ala Asp Lys Ala Lys Gly Pro Gly Leu Ile Leu 30 Val Leu Pro Cys Ile Asp Val Phe Val Lys Vai Asp Leu Arg Thr Val 45 Thr Cys Asn Ile Pro Pro Cln Giu Ile Leu Thr Arg Asp Ser Val Thr 60 65 Thr Gin Val Asp Gly Val Val Tyr Tyr Arg Ile Tyr Ser Ala Val Ser 80 Ala Val Ala Asn Vai Asn Asp Val His Gin Ala Thr Phe Leu Leu Ala 95 100 Gin Thr Thr Leu Arg Asn Val Leu Gly 
Thr Gin Thr Lou Ser Gin Ile 105 110 115 Leu Ala Gly Arg Glu Glu Ile Ala His Ser Ile Gin Thr Leu Leu Asp 120 125 130 Asp Ala Thr Giu Leu Trp Gly Ile Arg Val Aia Arg Val Giu Ile Lys 135 140 145 150 Asp Val Arg Ile Pro Val Gin Leu Gin Arg Ser Met Aia Ala Giu Ala 155 160 165 Glu Ala Thr Arg Glu Ala Arg Ala Ls Val Lou Ala Ala Giu Gly Glu 170 Met Ser Ala Ser Lys 185 Ser Pro Ile Ala Leu 200 Ala Thr Glu Lys Asn 215 Leu Glu Gly Ile Gly 235 Asn Lys Ala <210> 202 <211> 92 <212> PRT -c213> Homo sapiens <220> <300> <400> 202 Met Pro Pro Arg Asn 1 5 Thr Tyr Leu Pro Gin Asp Arg Ile Giu Asn Cys His Asp Lys Giu so Gly Ile Gin Lys Arg Giu Asn Lys Phe Ala <210> 203 <211> 127 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -3 <300> <400> 203 Met Ser Ala Ala Gly Pro Asp Lys Val Giu Asn His Pro Ala Gly Lys Trp Gly Leu Val Giu Lys Leu Ser Thr Ilie Trp Ser Arg Tyr Phe Ala Val. Asn Phe Arg Ile Trp Arg Tyr s0 <210> 204 <211> 84 <212> PR'r <213> Homo sapiens <220, <221> SIGNAL <222> -20. i <300> <400> 204 175 Ser Leu Lys Ser 190 Gin Leu Arg Tyr 205 Ser Thr Ile Val.
220 Gly Val Ser Tyr 283 180 Ala Ser Met Val Leu Ala Glu 195 Leu Gin Thr Leu Ser Thr Val 210 Phe Pro Leu Pro Met Asn Ilie 225 230 ASP Asn His Lys Lys Leu Pro 240 245 Leu, Glu Tyr Leu Asp His 40 Tyr Lys 55 Ala Ser Glu Thr Asn Ile Lys Ala Cly His Met Val Ile Thr Phe Ile Tyr Arg Leu Arg Giu Thr Ile Lys Ala Ile Arg His Phe Ser Met Lys Gly Trp Gly Trp Leu Ala Leu -15 Thr Ala Trp Ala Arg Arg Ser Gin Asp 1 5 Ala Leu Val Asp Glu Leu Glu Trp Glu 20 Lys Thr Ile Gin Met Gly Ser Phe Arg 35 Ser Val Val Glu Val Thr Val Thr Val 50 Ser Gly Phe Gly <210> 205 <211> 182 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> <300> <400> 205 Met Lys Gly Trp Gly Trp Leu Ala Leu -15 Thr Aia Trp Ala Arg Arg Ser Gin Asp 1 5 Ala Leu Val Asp Glu Leu Giu Trp Glu 20 Lys Thr Ile Gln Met Gly Ser Phe Arg 35 Ser Val Val Giu Val Pro Tyr Ala Arg 50 Leu Lou Glu Glu Ile Cys Asp Arg Met Asp Pro Ser Thr His Arg Lys Asn Tyr 85 Gly Glu Ser Ser Giu Leu Asp Leu Gln 100 Ile Ser Gly Thr Leu Lys Phe Ala Cys 110 115 Giu Asp Giu Leu Ile Clu Phe Phe Ser 125 130 Asp Lys Leu Cys Ser Lys Arg Thr Asp I 145 Ile Ser His Asp Ciu Leu 160 -210> 206 284 Leu Leu Gly Ala Leu Leu Gly -10 Leu His Cys Gly Ala Cys Arg Ile Ala Gin Val Asp Pro Lys 25 Ile Asn Pro Asp Gly Ser Cln Pro Pro Asn Lys Val Ala His 55 Leu Leu Gly Ala Lou Leu Gly -10 Leu His Cys Gly Ala Cys Arg Ile Ala Gin Val Asp Pro Lys lie Asn Pro Asp Gly Ser Gln Ser Glu Ala His Leu Thr Glu 55 LYs Giu Tyr Gly Glu Gin Ile 70 /al Arg Val Val Gly Arg Asn fly Ile Arg Ile Asp Ser Asp 105 3ly Ser Ile Val Glu Glu Tyr 120 rg Giu Ala Asp Asn Val Lys 135 140 ,eu Cys Asp His Ala Leu His LSO 155 <211> 71 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -25..-1 <300> <400> 206 Met Pro Ala Gly Val Pro Met Ser Thr Tyr Leu Lys Met Phe Ala Ala -20 -15 Ser Leu Leu Ala Met Cys Ala Gly Ala Giu Va1 Val His Arg Tyr Tyr 1 Arg Pro Asp Lou Thr Ile Pro Ciu Ile Pro Pro Lys Arg'Gly Giu Leu 15 Lys Thr Glu Leu Lou Gly Lou Lys Giu Arg Lys His Lys Pro Gin Val 30 Ser Gin Gin Giu Giu Leu Lys 285 <210> 
207 <211> 73 <212> PRT <213> Homo sapiens <220> <300> <400> 207 Met Arg Ile Arg Met 1 5 Cys Thr Asp Arg Asg Leu Lys Pro Ser Asg Leu Ala Met Val Prc Glu Ser Leu Thr Gl <210> 208 <211> 169 <212> PRT <213> Homeo sapiens <220> <221> SIGNAL <222> -150..-1 <300> <400> 208 Met Ala Glu Thr Lys -150 Val Ala Val Thr Phe -13 Gin Arg Thr Leu Tyr -115 Val His Leu Leu Glu -100 Leu Ser His Ala Thr Ser Ala Val Xaa Arg Phe Lys Gly Phe Ser Arg Pro Pro Pro Cys Gly Leu His His Val Ser Pro Pro Ala Ser His Arg Ala Arg Gin <210> 209 <211> 76 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-1 <300> <400> 209 Met Glu Leu Ile Se: Leu Phe Leu Leu Let Lys Gly Trp Ile Pr t Thr Asp Gly Arg Thr Leu Val Gly Cys Phe Leu 10 p Cys Asn Val Ile Leu Gly Ser Ala Gin Glu Phe 25 Ser Phe Ser Ala Gly Glu Pro Arg Val Leu Gly 40 Gly His His Ile Val Ser Ile Clu Val Gln Arg 55 Pro Pro Tyr Leu Asp Thr -145 Thr Arg 0 Arg Glu His Gly Cys Ala -80 His Leu -65 Cys Leu Pro Ala Gly Gin Ala Ser 1 Arg Lys Gin Met Leu Val -140 Glu Trp Arg Gln -125 Ile Gly Phe Pro -110 Glu Leu Trp Ile Phe His Ser Cys Ser Leu Gin Leu -60 Leu Pro Ser Ser -45 Phe Phe Val Phe -30 Gly Leu Glu Leu Ser Ala Ala Ile 5 Ala Phe Lys Asp -135 Asp Leu Ala -120 Pro Glu Leu -105 Lys Arg Gly Pro Gly Trp Pro Pro Glu Asp Tyr Arg Val Glu Thr Thr Ser Cys Gly Val Ser r Pro Thr Val Ile Ile Ile Leu Gly Cys Leu Ala -15 Gin Arg Lys Asn Leu Arg Arg Pro Pro Cys Ile 1 5 o Trp Ile Gly Val Gly Phe Glu Phe Gly Lys Ala 20 286 Pro Leu Glu Phe*Ile Glu Lys Ala Arg Ile Lys Val Cys Gly Arg Cly 35 Arg Arg Gly Leu Gin Arg Arg Gln Cys Phe Leu Phe <210> 210 <211> <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -54..-i <300> <400> 210 Met Ala Giu Thr Lys Asp Ala Ala Gin Met Leu Val Thr Phe Lys Asp -45 Vai Ala Val Thr Phe Thr Arg Giu Giu Trp Arg Gin Leu Asp Leu Ala -30 Gin Arg Thr Leu Tyr Arg Giu Val Met Leu Giu Thr Cys Gly Leu Leu -15 Val Ser Leu Val Giu Ser Ile Trp Leu His Ile Thr Glu Asn Gin Ile -s 1 5 Lys Leu Ala Ser Pro Gly Arg Lys Phe Thr Asn Ser Pro Asp Giu 
Lys is 20 Pro Giu Val Trp Leu Ala Pro Gly Leu Phe Gly Ala Ala Ala Gin 35 <210> 211 <211> 92 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22..-i <300> <400> 211 Met Glu Leu Ile Ser Pro Thr Val Ile Ile Ile Leu Gly Cys Leu Ala -15 Leu Phe Leu Leu Leu Gin Arg Lys Asn Leu Arg Arg Pro Pro Cys Ile 1 5 Lys Gly Trp Ile Pro Trp Ile dly Val Giy Phe Giu Phe Gly Lys Ala 20 Pro Leu Giu Phe Ile Giu Lys Ala Arg Ile Lys Tyr Gly Pro Ile Phe 35 Thr Val Phe Ala Met Gly Asn Arg Met Thr Phe Val Thr Giu Glu Glu 50 Gly Ile Asn Val Phe Leu Lys Ser Lys Lys Lys Lys 65 <210> 212 <211> 89 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -16..-l <300> <400> 212 Met Ile Ile Ser Leu Phe Ile Tyr Ile Phe Leu Thr Cys Ser Asn Thr -10 Ser Pro Ser Tyr Gin Gly Thr Gin Leu Gly Leu Gly Leu Pro Ser Ala 1 5 10 Gin Trp Trp Pro Leu Thr Gly Arg Arg Met Gin Cys Cys Arg Leu Phe 25 Cys Phe Leu Leu Gin Asn Cys Leu Phe Pro Phe Pro Leu His Leu Ile 40 287 Gin His Asp Pro Cys Glu Leu Val Leu Thr Ile Ser Trp Asp Trp Ala 55 Glu Ala Gly Ala Ser Leu Tyr Ser Pro <210> 213 <211> 109 <212> PRT <213> Homo sapiens <220> <300> <400> 213 Met Lys Val Asp Lys Asp Arg Gln Met Val Val Leu Glu Glu Glu Phe 1 5 10 Arg Asn Ile Ser Pro Glu Glu Leu Lys Met Glu Leu Pro Glu Arg Gin 25 Pro Arg Phe Val Val Tyr Ser Tyr Lys Tyr Val Arg Asp Asp Gly Arg 40 Val Ser Tyr Pro Leu Cys Phe Ile Phe Ser Ser Pro Val Gly Cys Lys 55 Pro Glu Gln Gln Met Met Tyr Ala Gly Ser Lys Asn Arg Leu Val Gin 70 75 Thr Ala Glu Leu Thr Lys Val Phe Glu Ile Arg Thr Thr Asp Asp Leu 90 Thr Glu Ala Trp Leu Gln Glu Lys Leu Ser Phe Phe Arg 100 105 <210> 214 <211> 114 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -103..-1 <300> <400> 214 Met Val Ile Arg Val Tyr Ile Ala Ser Ser Ser Gly Ser Thr Ala Ile -100 -95 Lys Lys Lys Gln Gln Asp Val Leu Gly Phe Leu Glu Ala Asn Lys Ile -80 Gly Phe Glu Glu Lys Asp Ile Ala Ala Asn Glu Glu Asn Arg Lys Trp -65 Met Arg Glu Asn Val Pro Glu Asn Ser Arg Pro Ala Thr Gly Asn Pro -50 -45 Leu Pro Pro Gin Ile Phe Asn Glu Set Gln Tyr 
Arg Gly Asp Tyr Asp -30 Ala Phe Phe Glu Ala Arg Glu Asn Asn Ala Val Tyr Ala Phe Leu Gly -15 Leu Thr Ala Pro Ser Gly Ser Lys Glu Ala Glu Val Gln Ala Lys Gin 1 Gin Ala <210> 215 <211> 124 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -97..-1 <300> <400> 215 Met Ala Asp Asp Leu Lys Arg Phe Leu Tyr Lys Lys Leu Pro Ser Val -90 Glu Gly Leu His Ala Ile Val Val Ser Asp Arg Asp Gly Val Pro Val -75 Ile Lys Val Ala Asn Phe Leu Set Thr Phe Leu Ser Lys Asn Lys Val Gin Phe Asn Arg Ser Ala Asn Thr Gly 1 Lieu Phe Glu Giu Leu <210> 216 <211> 93 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -22. l <300> <400> 216 Met Lys Pro Val Leu Leu Gin Leu Val Pro Leu Glu Pro Pro Pro Cys Thr Met Gin Glu Phe Cys Gly Ile Val Ile Lys His Lys Gly <210> 217 <211> 207 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -42. 1 <300> <400> 217 Met His Ile Leu Gin Ala Ala Ile Leu -10 Va 1 Val1 Gin Pro Ser Giu Ser Val1 Pro Thr Cys -25 Val1 Set Va 1 Phe Lys Ala Lys 35 Giu Ile
'I
L
G
288 Glu His Ala Leu Arg Pro Gly -55 ~sp Gin Gly Ser Lys Leu Gly *40 yr Tyr Asn Thr Tyr Gin Val !al Ser Phe Ile Ala Ser Set *eu Giu Lys Glu Leu Ala Pro 10 lu Val Set Leu Gin Pro Gly Thr Met~ Cys Leu Ala Lys Tyr Ile Thr His Leu Cys Ser Ser Arg Asn Arg Leu Leu Thr Thr Val Asp Asp Gly Ile Gin Ala Ile Vai His Cys Pro Asp Asp Leu Ala Val1 Giu Giu Leu Thr Cys His Gin Lys Ala Ser Pro Leu Gin Lys Arg Thr Ile Leu Thr Val Thr Phe Vai Thr Asp Pro Leu Glu Gin Lys Gin Ala Gin Leu Asn Gin Leu Val1 Ile Trp, -is Asp Asp
I
Val Phe Tyr Leu Arg Val Ala Giu Asp Phe Asn Ile Glu Gly 105 115 Gin Lys Cys Set Set Ala Phe Gin Asn Leu Leu Pro Phe Tyr Ser Pro 289 120 125 130 Val Val Glu Asp Phe Ile Lys Ile Leu Arg Glu Val Asp Lys Ala Leu 135 140 145 150 Ala Asp Asp Leu Glu Lys Asn Phe Pro Ser Leu Lys Val Gln Thr 155 160 165 <210> 218 <211> 59 <212> PRT <213> Homo sapiens <220> <300> <400> 218 Mec Pro His Ser Lys Pro Leu Asp Trp Gly Leu Ser Ser Val Ala Glu 1 5 10 Cys Pro Ala Glu Leu Phe Pro Ser Thr Gly Gly Leu Ala Gly Lys Gly 25 Pro Gly Leu Asp Ile Leu Arg Cys Val Leu Ser Pro Trp Ala Ser His 40 Phe Pro Ser Leu Ser Leu Gly Val Phe Asn Leu <210> 219 <211> 56 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -27..-1 <300> <400> 219 Met Asn Arg Val Pro Ala Asp Ser Pro Asn Met Cys Leu Ile Cys Leu -20 Leu Ser Tyr Ile Ala Leu Gly Ala Ile His Ala Lys Ile Cys Arg Arg -5 1 Ala Phe Gin Glu Glu Gly Arg Ala Asn Ala Lys Thr Gly Val Arg Ala 15 Trp Cys Ile Gin Pro Trp Ala Lys <210> 220 <211> 162 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -94..-1 <300> <400> 220 Met Leu Gin Thr SertAsn Tyr Ser Leu Val Leu Ser Leu Gin Phe Leu -85 Leu Leu Ser Tyr Asp Leu Phe Val Asn Ser Phe Ser Glu Leu Leu Gin -70 Lys Thr Pro Val Ile Gin Leu Val Leu Phe Ile Ile Gin Asp Ile Ala -55 Val Leu Phe Asn Ile Ile Ile Ile Phe Leu Met Phe Phe Asn Thr Phe -40 Val Phe Gin Ala Gly Leu Val Asn Leu Leu Phe His Lys Phe Lys Gly -25 -20 Thr Ile Ile Leu Thr Ala Val Tyr Phe Ala Leu Ser Ile Ser Leu His -5 1 Val Trp Val Met Asn Leu Arg Trp Lys Asn Ser Asn Ser Phe Ile Trp 10 Thr Asp Gly Leu Gin Met Leu Phe Val Phe Gin Arg Leu Ala Ala Val 25 30 Leu Tyr Cys Tyr Phe Tyr Lys Arg Thr Ala Val Arg Leu Gly Asp Pro 290 40 45 His Phe Tyr Gin Asp Ser Leu Trp Leu Arg Lys Giu Phe Met Gin Val 60 Arg Arg <210> 221 <211> 154 <212> PRi' <213> Homo sapiens <220> <221> SIGN4AL <222> -68. i <300> <400> 221 Met Ala Ser Ala Ser 65 Pro Pro Pro Ser Lys His Ile His Arg Ala Giu Ser Phe Trp Lys Val Thr Gln Gly Leu
I
Phe Gly Ser Leu Pro Leu Gly Lys Val Ser Phe Glu Asp Gin Leu His Cys Leu Leu Thr Glu Lys Gly Asp Ser s0 <210> 222 <211> 99 <212> PRT <213> Homo sapiens <220> <300> <400> 222 Met Lys Val Glu Glu 1 5 Gly Leu Thr Ala Thr Cys Thr Glu Arg Gly Tyr Met Ser Pro Ala so Val Leu Lys Gin Gly Asn Lys Aia Thr Gly Leu Gly Asn <210> 223 <211> 43 <212> PRT Lys Asp Pro Lys Arg. Glu Leu Val Ala Ala Leu Leu Sek Lys Pro Gin Lys His His Thr Asn Ala Val Asp Asn Ile Pro Gly Val Ser Leu Cly Clu Asp Thr Leu Ala Phe Leu Ile Ala Gin <213> Homo sapiens <220> <221> SIGNAL <222> <300> <400> 223 Met Gin Cys ?he Ser Phe Ile LYS Thr met met Ile Leu Phe Asn Leu 291 15 Leu Ile Phe Leu Cys Gly Phe Thr Asn Tyr Thr Asp Phe Glu Asp Ser 1 Pro Tyr Phe Lys Met His Lys Pro Val Thr Met <210> 224 <211> 69 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -21..-1 <300> <400> 224 Met Trp Trp Phe Gln Gin Gly Leu Ser Phe Leu Pro Ser Ala Leu Val -15 Ile Trp Thr Ser Ala Ala Phe Ile Phe Ser Tyr Ile Thr Ala Val Thr 1 5 Leu His His Ile Asp Pro Ala Leu Pro Tyr Ile Ser Asp Thr Gly Thr 20 Val Ala Pro Glu Lys Cys Leu Phe Gly Ala Met Leu Asn Ile Ala Ala 35 Val Leu Cys Gln Lys <210> 225 <211> 78 <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -18..-1 <300> <400> 225 Met Ser Pro Gly Ser Ala Leu Ala Leu Leu Trp Ser Leu Pro Alea Ser -10 Asp Leu Gly Arg Ser Val Ile Ala Gly Leu Trp Pro His Thr Gly Val 1 5 Leu Ile His Leu Glu Thr Ser Gin Ser Phe Leu Gln Gly Gln Leu Thr 20 25 Lys Ser Ile Phe Pro Leu Cys Cys Thr Ser Leu Phe Cys Val Cys Val 40 Val Thr Val Gly Gly Gly Arg Val Gly Ser Thr Phe Val Ala 55 <210> 226 <211> <212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -47..-1 <300> <400> 226 Met Arg Leu Pro Pro Ala Leu Pro Set Gly Tyr Thr Asp Ser Thr Ala -40 Leu Glu Gly Leu Val Tyr Tyr Leu Asn Gin Lys Leu Leu Phe Ser Ser -25 Pro Ala Ser Ala Leu Leu Phe Phe Ala Arg Pro Cys Val Phe Cys Phe -10 -5 1 Lys Ala Ser Lys Met Gly Pro Gin Phe Glu Asn Tyr Pro Thr Phe Pro 10 Thr Tyr Ser Pro Leu Pro Ile Ile Pro Phe Gin Leu His Gly Arg 
Phe 25 <210> 227 292 <211> 241.
<212> PRT <213> Homo sapiens <220> <221> SIGNAL <222> -103. l <300> <400> 227 Mt Trp Leu Asp Pro Val Phe Pro Leu Phe Pro Val Gly Asp His Tyr -100 -95 Lou Pro Asn Val1 Ala Thr Gly Thr Arg Thr Ala Ile Glu Gly Ala Lau As p Pro Val1 Asn Asn Glu Trp Val1 Ala Ser Gin Ser Gly 125 Lou Clu Val Asp Thr Arg Ile Tyr Ala Thr -15 Gin Thr Ile Gln Ala Arg Ser met 50 Leu Ala Ser Met Gin Thr Pro Leu 115 Asn His 130 Ile Leu Thr Val Val Thr Val Ser Lau Ala Gin Ile Leu Asp Ile Lys Glu Ala Gly Glu Ala Glu Thr Val Asn Ile Leu Pro 135 Val Lou Thr Cys Thr Gin Ala Val Gin Thr Leu Ala Asp Ala Asp Val Giu Ala Met Asn Ser Pro Ala Thr 105 Leu Glu 120 Asn Lys

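The sequence listing above follows the numeric-identifier layout of WIPO Standard ST.25: `<210>` is the SEQ ID NO, `<211>` the length, `<212>` the molecule type, `<213>` the organism, `<221>`/`<222>` a feature key and its location, and `<400>` the start of the residues. The sketch below is illustrative only and not part of the specification. It reads those fields from one cleanly formatted record (the header of SEQ ID NO: 192 as printed above) and, assuming that `<211>` counts the signal peptide plus the mature chain, derives the number of residues remaining after the signal peptide numbered -N..-1. The function names `parse_header` and `mature_length` are hypothetical. OCR-damaged headers in the listing (for example a location printed as `-28. 1`) would not match the location pattern and would need manual correction first.

```python
import re

# Numeric identifiers used in ST.25-style sequence listings.
FIELD_NAMES = {
    "210": "seq_id_no",
    "211": "length",
    "212": "type",
    "213": "organism",
    "221": "feature_key",
    "222": "feature_location",
}


def parse_header(text):
    """Collect the <NNN> value pairs that precede the <400> residue data."""
    header = text.split("<400>")[0]
    record = {}
    for code, value in re.findall(r"<(\d{3})>\s*([^<]*)", header):
        name = FIELD_NAMES.get(code)
        if name and value.strip():
            record[name] = value.strip()
    return record


def mature_length(record):
    """Residues left after removing a signal peptide located at -N..-1.

    Assumption: <211> is the total residue count (signal + mature chain).
    Records without a well-formed SIGNAL location are returned unchanged.
    """
    total = int(record["length"])
    match = re.fullmatch(r"-(\d+)\.\.-1", record.get("feature_location", ""))
    signal = int(match.group(1)) if match else 0
    return total - signal


# Header of SEQ ID NO: 192 as it appears in the listing above.
example = (
    "<210> 192 <211> 69 <212> PRT <213> Homo sapiens "
    "<220> <221> SIGNAL <222> -46..-1 <300> <400> 192 Met Ser Val"
)
rec = parse_header(example)
print(rec["seq_id_no"], rec["type"], mature_length(rec))
```

A record with no `<221>`/`<222>` entry, such as SEQ ID NO: 196, simply keeps its full `<211>` length under this sketch.
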
Claims (16)

1. A purified or isolated polypeptide comprising a sequence of SEQ ID NO: 185 or SEQ ID NO: 215.
2. A purified or isolated polypeptide comprising a sequence which is at least identical to SEQ ID NO: 185 or SEQ ID NO: 215, wherein the polypeptide has at least one biological activity of the polypeptide of claim 1.
3. The purified or isolated polypeptide of claim 2, wherein the biological activity is regulating protein-protein interaction in the MAP kinase pathway.
4. A purified or isolated nucleic acid molecule encoding a polypeptide according to any of claims 1 to 3, or a sequence complementary thereto.
5. The purified or isolated nucleic acid molecule of claim 4, wherein the nucleic acid comprises SEQ ID NO: 138 or SEQ ID NO: 168.
6. A purified or isolated nucleic acid molecule consisting of at least 30, 40, 50, or 100 consecutive bases of SEQ ID NO: 138 or SEQ ID NO: 168, or a sequence complementary thereto.
7. An expression vector comprising the nucleic acid molecule according to any of claims 4 to 6.
8. A host cell comprising a recombinant nucleic acid molecule according to any of claims 4 to 6.
9. A method of making a polypeptide according to any of claims 1 to 3 comprising the steps of: (i) obtaining a cDNA comprising a nucleic acid molecule according to any of claims 4 to 6; (ii) inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter; and (iii) introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA.
10. The method of claim 9, further comprising the step of isolating said polypeptide.
11. A purified or isolated antibody capable of specifically binding to a polypeptide according to any of claims 1 to 3.
12. Use of the polypeptide according to any of claims 1 to 3 for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
13. Use of the nucleic acid molecule according to any of claims 4 to 6 for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
14. Use of the antibody according to claim 11 for diagnosing or treating a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
15. A method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject a polypeptide according to any one of claims 1 to 3.
16. A method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject a nucleic acid molecule according to any one of claims 4 to 6.
17. A method for treating or preventing a disease in a subject selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock, the method comprising administering to the subject an antibody according to claim 11.
18. Use of a polypeptide according to any one of claims 1 to 3 for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
19. Use of a nucleic acid molecule according to any one of claims 4 to 6 for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
20. Use of an antibody according to claim 11 for the manufacture of a medicament for treating or preventing a disease selected from the group consisting of cancer, neurodegenerative diseases, cardiovascular disorders, hypertension and renal injury and repair and septic shock.
Dated this twelfth day of December 2005
Serono Genetics Institute S.A.
Patent Attorneys for the Applicant: F B RICE CO
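The fragment language in the claims above ("at least 30, 40, 50, or 100 consecutive bases ... or a sequence complementary thereto") can be pictured as a plain string-matching test. The sketch below is only an illustration of that idea and carries no weight as claim construction. The function names and the 60-base `reference` string are hypothetical stand-ins, not SEQ ID NO: 138 or 168, and "complementary" is assumed here to mean the reverse complement read 5' to 3'.

```python
# Watson-Crick pairing for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_consecutive_fragment(fragment, reference, min_length=30):
    """True if `fragment` has at least `min_length` bases and occurs as an
    uninterrupted run in `reference` or in its reverse complement."""
    fragment = fragment.upper()
    reference = reference.upper()
    if len(fragment) < min_length:
        return False
    return fragment in reference or fragment in reverse_complement(reference)


# Hypothetical 60-base reference used only to exercise the functions.
reference = "ATGGCGTACCTGAAGTTCGACCTGATCGGCAAGTACCGTATTGACCAGTTGCATGCCGTA"

sense_fragment = reference[5:40]                        # 35 consecutive bases
antisense_fragment = reverse_complement(sense_fragment) # same run, other strand
short_fragment = reference[5:25]                        # only 20 bases

print(is_consecutive_fragment(sense_fragment, reference))
print(is_consecutive_fragment(antisense_fragment, reference))
print(is_consecutive_fragment(short_fragment, reference))
```

Raising `min_length` to 40, 50, or 100 covers the other thresholds recited in the same claim.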
AU2002301051A 1997-08-01 2002-09-09 Extended cDNAs for secreted proteins Ceased AU2002301051B9 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2002301051A AU2002301051B9 (en) 1997-08-01 2002-09-09 Extended cDNAs for secreted proteins
AU2006202884A AU2006202884A1 (en) 1997-08-01 2006-07-05 Extended cDNAs for secreted proteins

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US08905135 1997-08-01
AU85547/98A AU8554798A (en) 1997-08-01 1998-07-31 5'ests for non tissue specific secreted proteins
PCT/IB1998/001862 WO1999025825A2 (en) 1997-11-13 1998-11-13 EXTENDED cDNAs FOR SECRETED PROTEINS
AU10491/99A AU753099B2 (en) 1997-08-01 1998-11-13 Extended cDNAs for secreted proteins
AU2002301051A AU2002301051B9 (en) 1997-08-01 2002-09-09 Extended cDNAs for secreted proteins

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
AU85547/98A Division AU8554798A (en) 1997-08-01 1998-07-31 5'ests for non tissue specific secreted proteins
AU10491/99A Division AU753099B2 (en) 1997-08-01 1998-11-13 Extended cDNAs for secreted proteins

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2006202884A Division AU2006202884A1 (en) 1997-08-01 2006-07-05 Extended cDNAs for secreted proteins

Publications (3)

Publication Number Publication Date
AU2002301051A1 AU2002301051A1 (en) 2003-02-27
AU2002301051B2 true AU2002301051B2 (en) 2006-04-06
AU2002301051B9 AU2002301051B9 (en) 2006-08-31

Family

ID=39338292

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2002301051A Ceased AU2002301051B9 (en) 1997-08-01 2002-09-09 Extended cDNAs for secreted proteins

Country Status (1)

Country Link
AU (1) AU2002301051B9 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NCBI Acc No AAC34591 *
NCBI Acc No AF082526 *
Schaeffer HJ, Catling AD, Eblen ST, Collier LS, Krauss A, Weber MJ. MP1: a MEK binding partner that enhances enzymatic activation of the MAP kinase cascade. Science. 1998 Sep 11;281(5383):1668-71. Abstract *

Also Published As

Publication number Publication date
AU2002301051B9 (en) 2006-08-31

Similar Documents

Publication Publication Date Title
AU758004B2 (en) Extended cDNAs for secreted proteins
AU764441B2 (en) cDNAs encoding secreted proteins
AU764571B2 (en) 5' ESTs and encoded human proteins
EP1000150B1 (en) 5' ESTs FOR SECRETED PROTEINS EXPRESSED IN BRAIN
EP1378571B1 (en) 5' ESTs for secreted proteins expressed in various tissues
US20060223142A1 (en) Extended cDNAs for secreted proteins
EP1033401A2 (en) Expressed sequence tags and encoded human proteins
EP1000149B1 (en) 5' ESTs FOR SECRETED PROTEINS IDENTIFIED FROM BRAIN TISSUES
JP2001512013A (en) 5'EST of secreted protein expressed in prostate
JP2001512011A (en) 5'EST of non-tissue specific secreted protein
JP2001512016A (en) 5'EST of secreted proteins expressed in muscle and other mesodermal tissues
US6573068B1 (en) Claudin-50 protein
EP1375514B1 (en) 5'ESTs for secreted proteins expressed in various tissues
JP2001512005A (en) 5'EST of secreted protein expressed in endoderm
AU2002301051B2 (en) Extended cDNAs for secreted proteins
AU753099B2 (en) Extended cDNAs for secreted proteins
AU2003204659C1 (en) Extended cDNAs for secreted proteins
EP1903111A2 (en) Extended cDNAs for secreted proteins
AU2006202884A1 (en) Extended cDNAs for secreted proteins

Legal Events

Date Code Title Description
DA3 Amendments made section 104

Free format text: THE NATURE OF THE AMENDMENT IS: AMEND THE NAME OF THE APPLICANT FROM GENSET TO GENSET S.A.

TC Change of applicant's name (sec. 104)

Owner name: SERONO GENETICS INSTITUTE S.A.

Free format text: FORMER NAME: GENSET S.A.

FGA Letters patent sealed or granted (standard patent)
SREP Specification republished
MK14 Patent ceased section 143(a) (annual fees not paid) or expired