A NEW DISINTEGRIN AND METALLOPROTEINASE WITH THROMBOSPONDIN TYPE I POLYNUCLEOTIDE AND ITS ENCODED POLYPEPTIDE
Field of the invention
[0001] The present invention relates to a new metalloproteinase polynucleotide and polypeptide genetic sequence (A Disintegrin And Metalloproteinase with ThromboSpondin type I repeats, ADAMTS-14), a vector comprising said sequence, a cell transfected by said vector and the various pharmaceutical and industrial uses of said products.
Background of the invention
[0002] ADAMTS (A Disintegrin And Metalloproteinase with ThromboSpondin type I repeats) is a novel family of metalloproteinases found in vertebrates and invertebrates. These enzymes are related to ADAMs as judged from sequence homology and conserved domains such as a characteristic metalloproteinase domain and a disintegrin-like module. However, they differ from ADAMs by their domain organisation and the presence of distinct features. The most specific hallmark is the presence of a central thrombospondin type I repeat (TSPI) between the disintegrin-like module and the Cys-rich domain. All ADAMTS, except ADAMTS-4, also contain TSPI-like domains in varying numbers at the C-terminus (1). Currently, 10 ADAMTS from vertebrates and a few from invertebrates (Drosophila and C. elegans) have been described.
ADAMTS-1, -4 and -5 are able to cleave proteoglycans and are probably involved in cartilage degradation during arthritis (2). ADAMTS-1 and -8 are potent anti-angiogenic molecules (3). Adamts1-/- mice display abnormal growth, defective fertility and altered organ morphology and function (4). A C. elegans Adamts, gon-1, was found to be essential for gonadal morphogenesis (5). Both the metalloproteinase domain and some TSPI-like repeats are required for the control of this process.
[0003] The primarily described activity of ADAMTS-2 is to excise the aminopropeptide of type I and type II procollagens, explaining its former trivial name aminoprocollagen I/II peptidase (6, 7). Removal of the N- and C-propeptides of type I procollagen is required to generate collagen monomers able to assemble into elongated and cylindrical collagen fibres. Human Ehlers-Danlos type VIIC (dermatosparactic-type, OMIM 225410) and animal dermatosparaxis are recessively inherited disorders that are caused by mutations preventing the synthesis of active ADAMTS-2 (8). As a consequence, pN-I collagen (type I collagen that still contains the N- but not the C-propeptide) accumulates, resulting in the polymerisation of abnormal collagen fibres that appear irregular, thin, branched and "hieroglyphic" in cross section. The main clinical feature of human patients and affected animals is a severe cutaneous fragility. A similar phenotype has recently been reported in transgenic mice with inactive alleles for Adamts2 (9). Other type I collagen-rich tissues, such as bone and tendon, do not seem to be functionally affected. Furthermore, a significant proportion of type I collagen extracted from skin biopsies of Ehlers-Danlos type VIIC (EDSVIIC) patients or from dermatosparactic calves is N-terminally processed, at a site that remained to be determined, although no active ADAMTS-2 is synthesised (8). These observations, and the fact that processing of the aminopropeptide is a complex event requiring a specific three-dimensional native conformation, suggested that an enzyme closely related to ADAMTS-2 would be responsible for this alternative aminoprocollagen peptidase activity.
State of the art
[0004] The document WO98/00555 describes a human recombinant N-proteinase, a polynucleotide sequence encoding said proteinase and a method for producing said proteinase through transfection of a host cell.
[0005] The document WO00/53774 describes various polynucleotide sequences encoding various ADAMTS polypeptide sequences and their variants, as well as vectors comprising said sequences, cells transformed by said vectors, polypeptides obtained from said cells and pharmaceutical applications of said polypeptides.
Definitions
[0006] "Polypeptide" refers to any peptide or protein comprising two or more amino acids joined to each other by peptide bonds or modified peptide bonds, i.e., peptide isosteres. "Polypeptide" refers to both short chains, commonly referred to as peptides, oligopeptides or oligomers, and to longer chains, generally referred to as proteins. Polypeptides may contain amino acids other than the 20 gene-encoded amino acids. "Polypeptides" include amino acid sequences modified either by natural processes, such as posttranslational processing, or by chemical modification techniques which are well known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. Polypeptides may be branched as a result of ubiquitination, and they may be cyclic, with or without branching. Cyclic, branched and branched cyclic polypeptides may result from posttranslational natural processes or may be made by synthetic methods. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cystine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination. See, for instance, PROTEINS - STRUCTURE AND MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York, 1993 and Wold, F., Posttranslational Protein Modifications: Perspectives and Prospects, pp. 1-12 in POSTTRANSLATIONAL COVALENT MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York, 1983; Seifter et al., "Analysis for protein modifications and nonprotein cofactors", Meth. Enzymol. (1990) 182: 626-646 and
Rattan et al., "Protein Synthesis: Posttranslational Modifications and Aging", Ann NY Acad Sci (1992) 663: 48-62.
[0007] "Polynucleotide" generally refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA. "Polynucleotides" include, without limitation, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, RNA that is a mixture of single- and double-stranded regions, and hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions. In addition, "Polynucleotide" refers to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The term "Polynucleotide" also includes DNAs or RNAs containing one or more modified bases and DNAs or RNAs with backbones modified for stability or for other reasons. "Modified" bases include, for example, tritylated bases and unusual bases such as inosine. A variety of modifications has been made to DNA and RNA; thus, "Polynucleotide" embraces chemically, enzymatically or metabolically modified forms of polynucleotides as typically found in nature, as well as the chemical forms of DNA and RNA characteristic of viruses and cells. "Polynucleotide" also embraces relatively short polynucleotides, often referred to as oligonucleotides.
[0008] "Variant", as the term is used herein, is a polynucleotide or polypeptide that differs from a reference polynucleotide or polypeptide respectively, but retains essential properties. A typical variant of a polynucleotide differs in nucleotide sequence from another, reference polynucleotide. Changes in the nucleotide sequence of the variant may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the reference sequence, as discussed below. A typical variant of a polypeptide differs in amino acid sequence from another, reference polypeptide. Generally, differences are limited so that the sequences of the reference polypeptide and the variant are closely similar overall and, in many regions, identical. A variant and reference polypeptide may differ in amino acid sequence by one or more substitutions (preferably conservative), additions and deletions in any combination. A substituted or inserted amino acid residue may or may not be one encoded by the genetic code. A variant of a polynucleotide or polypeptide may be naturally occurring, such as an allelic variant, or it may be a variant that is not known to occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides may be made by mutagenesis techniques or by direct synthesis. Variants should retain one or more of the biological activities of the reference polypeptide. For instance, they should have antigenic or immunogenic activities similar to those of the reference polypeptide. Antigenicity can be tested using standard immunoblot experiments, preferably using polyclonal sera against the reference polypeptide. Immunogenicity can be tested by measuring antibody responses (using polyclonal sera generated against the variant polypeptide) against purified reference polypeptide in a standard ELISA test. Preferably, a variant would retain all of the above biological activities.
[0009] "Identity" is a measure of the identity of nucleotide sequences or amino acid sequences. In general, the sequences are aligned so that the highest order match is obtained. "Identity" per se has an art-recognised meaning and can be calculated using published techniques. See, e.g.: COMPUTATIONAL MOLECULAR BIOLOGY, Lesk, A.M., ed., Oxford University Press, New York, 1988; BIOCOMPUTING: INFORMATICS AND GENOME PROJECTS, Smith, D.W., ed., Academic Press, New York, 1993; COMPUTER ANALYSIS OF SEQUENCE DATA, PART I, Griffin, A.M., and Griffin, H.G., eds, Humana Press, New Jersey, 1994; SEQUENCE ANALYSIS IN MOLECULAR BIOLOGY, von Heijne, G., Academic Press, 1987; and SEQUENCE ANALYSIS PRIMER, Gribskov, M. and Devereux, J., eds, M Stockton Press, New York, 1991. While there exist a number of methods to measure identity between two polynucleotide or polypeptide sequences, the term "identity" is well known to skilled artisans (Carillo, H., and Lipman, D., SIAM J Applied Math (1988) 48: 1073). Methods commonly employed to determine identity or similarity between two sequences include, but are not limited to, those disclosed in Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego, 1994, and Carillo, H., and Lipman, D., SIAM J Applied Math (1988) 48: 1073. Methods to determine identity and similarity are codified in computer programs. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., J Molec Biol (1990) 215: 403). Most preferably, the program used to determine identity levels is the GAP program, as used in the Examples hereafter.
[0010] As an illustration, by a polynucleotide having a nucleotide sequence having at least, for example, 95% "identity" to a reference nucleotide sequence, it is intended that the nucleotide sequence of the polynucleotide is identical to the reference sequence except that the polynucleotide sequence may include on average up to five point mutations per 100 nucleotides of the reference nucleotide sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
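As a purely illustrative sketch (not the GAP program referred to above), the percentage criterion of paragraph [0010] can be expressed in Python as follows. The function names `percent_identity` and `meets_threshold`, the use of "-" as a gap symbol and the choice of counting every aligned column in the denominator are assumptions made for this example; alignment programs may apply other conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned, equal-length sequences.

    Assumption: '-' marks a gap, and a column containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True when the pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these assumptions, a 100-nucleotide reference carrying five substitutions scores exactly 95% and passes the 95% threshold, whereas a sixth substitution brings it below the threshold.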
[0011] Fragments may be "free-standing" or comprised within a larger polypeptide of which they form a part or region, most preferably as a single continuous region. Representative examples of polypeptide fragments of the invention include, for example, fragments from about amino acid number 1-20, 21-40, 41-60, 61-80, 81-100, and 101 to the end of the polypeptide. In this context "about" includes the particularly recited ranges larger or smaller by several, 5, 4, 3, 2 or 1 amino acids at either extreme or at both extremes.
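By way of a hedged illustration only, the representative ranges recited above (five blocks of 20 residues followed by the remainder) can be enumerated for a polypeptide of any length. The helper names `fragment_ranges` and `fragment`, and the example length of 150 residues, are hypothetical and chosen for this sketch.

```python
def fragment_ranges(length: int, width: int = 20, windows: int = 5):
    """Return 1-based inclusive (start, end) ranges: up to `windows` blocks of
    `width` residues, followed by one range running to the end of the chain."""
    if length < 1:
        raise ValueError("length must be at least 1")
    ranges = []
    start = 1
    for _ in range(windows):
        end = start + width - 1
        if end >= length:
            break  # the remaining residues form the final range
        ranges.append((start, end))
        start = end + 1
    ranges.append((start, length))
    return ranges


def fragment(sequence: str, start: int, end: int) -> str:
    """Extract residues start..end (1-based, inclusive) from a sequence."""
    return sequence[start - 1:end]


# Example with a hypothetical 150-residue polypeptide.
print(fragment_ranges(150))
```

The "about" tolerance of the paragraph (ranges shifted by a few residues at either extreme) is not modelled here; it could be added by adjusting `start` and `end` before calling `fragment`.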
[0012] Preferred fragments include, for example, truncated polypeptides having the amino acid sequence of the polypeptides, except for deletion of a continuous series of residues that includes the amino terminus, or a continuous series of residues that includes the carboxyl terminus and/or transmembrane region, or deletion of two continuous series of residues, one including the amino terminus and one including the carboxyl terminus. Also preferred are fragments characterised by structural or functional attributes such as fragments that comprise alpha-helix and alpha-helix forming regions, beta-sheet and beta-sheet forming regions, turn and turn-forming regions, coil and coil-forming regions, hydrophilic regions, hydrophobic regions, alpha amphipathic regions, beta amphipathic regions, flexible regions, surface-forming regions, substrate binding regions, and high antigenic index regions. Other preferred fragments are biologically active fragments.
Summary of the invention
[0013] The present invention is related to an isolated and purified polynucleotide that encodes an ADAMTS polypeptide which presents more than 55%, 60%, 70%, 80%, 85%, 90%, 95%, more preferably 98-99% homology (or sequence identity) with the sequence SEQ.ID.NO 13, or its complementary strand. The overall identity between ADAMTS-14 and ADAMTS-2 polynucleotides is 54%, with regions displaying higher levels of identity such as 78% in an 860-nucleotide overlap or 95% in a 75-nucleotide overlap.
[0014] Preferably, the polynucleotide according to the invention corresponds to the human nucleotide sequence SEQ.ID.NO 13, a variant or an active portion thereof.
[0015] Another aspect of the present invention is related to a nucleotide probe made of at least 15, 20 or 25 consecutive nucleotides capable of specifically hybridizing with a unique sequence included in the polynucleotide sequence according to the invention. The nucleotide probe may be a DNA or an RNA molecule.
[0016] Preferably, said nucleotide probe is an antisense oligonucleotide having a sequence capable of specifically hybridizing to an mRNA molecule encoding the polypeptide according to the invention, so as to prevent translation of said mRNA molecule, or an antisense oligonucleotide having a sequence capable of specifically hybridizing to the cDNA molecule encoding the polypeptide according to the invention.
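For illustration only, an antisense oligonucleotide complementary to a chosen mRNA region can be derived as the reverse complement of that region. The sketch below is a minimal Python example under stated assumptions: the names `antisense_oligo` and `is_valid_probe_length` are hypothetical, the 18-nucleotide target is an invented sequence (not taken from SEQ.ID.NO 13), and the default minimum length of 15 reflects the shortest probe length recited above.

```python
# Watson-Crick partner of each target base, written as DNA.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def antisense_oligo(target: str) -> str:
    """Reverse complement of an mRNA (or sense-strand cDNA) region, 5'->3', as DNA."""
    try:
        return "".join(DNA_COMPLEMENT[base] for base in reversed(target.upper()))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]}") from None


def is_valid_probe_length(oligo: str, minimum: int = 15) -> bool:
    """Check the oligonucleotide against a minimum probe length."""
    return len(oligo) >= minimum


# Hypothetical 18-nt target region, for demonstration only.
target = "AUGGCCAAGCUUGGAUCC"
oligo = antisense_oligo(target)
print(oligo, is_valid_probe_length(oligo))
```

Specificity of hybridization (uniqueness of the target within the transcriptome, stringency of the conditions) is not assessed by this sketch and would have to be verified separately.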
[0017] Furthermore, said antisense oligonucleotide may comprise chemical analogues of nucleotides or substances which inactivate mRNA, or be included in an RNA molecule endowed with ribozyme or small interfering RNA (hereafter called siRNA) activity. Preferably, said complementary hybridization is obtained under stringent conditions (such as those described by Sambrook et al.).
[0018] Another aspect of the present invention concerns the ADAMTS polypeptide encoded by the polynucleotide according to the invention and which preferably presents an amino-acid sequence which has more than 55%, 60%, 70%, 80%, 85%, 90%, 95%, more preferably 98-99% homology (or sequence identity) with SEQ.ID.NO 13. The sequence identity between ADAMTS-14 and ADAMTS-2 polypeptides is 54% in a 1170 amino-acid overlap, with regions displaying higher levels of identity such as 97% in a 30 amino-acid overlap.
[0019] Preferably, said polypeptide is a human polypeptide having an amino-acid sequence SEQ.ID.NO 13 or is a variant or an active fragment of said sequence.
[0020] The polypeptide according to the invention differs from other ADAMTS polypeptides in that it does not comprise a TSPI-like domain at the C-terminus. Therefore, variants or active fragments of the polypeptide according to the invention preferably comprise an active portion of the specific C-terminus, the furin cleavage site, the metalloprotease domain (catalytic site and Met-turn) and/or the TSPI site of the complete polypeptide sequence according to the invention.
[0021] The variants of the polynucleotide and/or polypeptide sequence according to the invention are molecules that present the same or similar biological activity (aminoprocollagen peptidase, pN-I collagen and other functions) as the complete sequence. Fragments or variants of the polypeptide according to the invention are also molecules which present the same activity with one or more genetic modifications (such as deletion or addition of one or more amino-acids or nucleotides) in the complete sequence, or are naturally occurring allelic variants. Preferably, such modifications do not modify the above-mentioned percentage of homology or sequence identity. Said variants are also molecules which present a similar activity to the complete polypeptide according to the invention through the same biochemical pathway and acting similarly upon the same active site. The polypeptides according to the invention can be integrated as native proteins or as part of fusion proteins, or may advantageously include additional amino-acid sequences which comprise secretory or leader sequences, pro-sequences, sequences which improve elution in purification, such as multiple histidine residues, or an additional sequence for stability during recombinant production (His tag in the C-terminal sequence). Said polypeptides may also comprise marker sequences which facilitate purification of the fused polypeptides, such as sequences with a hexa-histidine peptide as provided in the pQE vector (Invitrogen) and described by Gentz et al. (Proceedings of the National Academy of Sciences of the USA, 1989, vol. 86, pages 821-824), or an HA tag or a glutathione-S-transferase. The corresponding polynucleotides may also contain non-coding 5' and 3' sequences, such as transcribed non-translated sequences, splicing and polyadenylation signals and ribosome binding sites.
[0022] Another aspect of the present invention is related to a vector comprising the polynucleotide or polypeptide according to the invention, said vector being preferably selected from the group consisting of plasmids, viruses, liposomes or cationic vesicles, able to transfect or transform a cell and to obtain the expression of said polynucleotide and the translation of the encoded polypeptide by said cell.
[0023] A further aspect of the invention is related to the cell (prokaryotic or eukaryotic cell) transfected by, transformed by or comprising said vector.
[0024] A further aspect of the present invention is related to an inhibitor directed against the polypeptide or polynucleotides according to the invention or directed against a fragment (epitope) of said polypeptide. Preferably, said inhibitor could be an antisense polynucleotide as described above, or an siRNA, or an antibody (monoclonal or polyclonal antibody, or an active hypervariable portion thereof: Fab', F(ab')2, etc.). The inhibitor could also be a specific receptor of a blood cell able to interact specifically with said polypeptide and/or its epitope(s).
[0025] The present invention is also related to the cell (hybridoma) expressing and producing said antibody or an active hypervariable portion thereof.
[0026] A further aspect of the present invention is related to an agent (including an antibody) that may modify the expression or the activity of the polypeptide according to the invention and that advantageously has the improved pharmaceutical properties mentioned hereafter.
[0027] The present invention also concerns a transgenic non-human animal (preferably a non-human mammal, such as a mouse) over-expressing (or expressing ectopically) the polynucleotide encoding the polypeptide according to the invention.
[0028] The present invention also concerns a transgenic non-human animal (preferably a non-human mammal, such as a mouse) comprising a homologous recombination knock-out of the native polypeptide according to the invention. Said knock-out could be done upon a complete polypeptide according to the invention or upon active fragments thereof. The transgenic non-human animal according to the invention could be used for identifying the phenotype modifications induced by the polynucleotide according to the invention and for identifying compounds that may modulate and possibly restore the expression of the polynucleotide according to the invention.
[0029] Another aspect of the present invention is related to a pharmaceutical composition comprising an adequate pharmaceutical carrier (or diluent) and at least one of the various elements according to the invention (especially the polynucleotide, its encoded polypeptide, their variants or active fragments, the vector, the cell transformed by the vector and/or the inhibitor described above).
[0030] Advantageously, said pharmaceutical composition may comprise the polypeptide according to the invention and another polypeptide of the same family, which may unexpectedly present a synergistic effect when they are administered to a patient. Said pharmaceutical composition may also comprise a suitable adjuvant, an antioxidant, a bacteriostatic agent, etc. The adjuvant used in a pharmaceutical composition is advantageously used for modulating (if necessary) the immune response of the patient (especially a mammal, including a human) in order to improve the characteristics of the pharmaceutical composition according to the invention and/or to reduce its possible side effects.
[0031] The suitable pharmaceutical carrier or diluent is selected by the person skilled in the art according to the type of administration to the mammal (oral administration, intravenous administration, intradermal administration, intramuscular administration, peritoneal administration, etc.).
[0032] The pharmaceutical composition can be presented in the form of a single-dose or multi-dose container and may be stored in freeze-dried condition, which requires only the addition of a sterile liquid carrier. Said pharmaceutical carrier could be in solid, liquid or gaseous form, and the suitable dose of administration and the ratio between the pharmaceutical carrier and the active compound vary according to the number of administration doses, the mass of the mammal to be treated and the possible side effects of the compound according to the invention upon said mammal. The pharmaceutical composition according to the invention could be a therapeutic or prophylactic composition such as a vaccine.
[0033] Advantageously, the pharmaceutical composition according to the invention could be used for inhibiting (neuro)-inflammation and/or neuro-degeneration in a mammal patient. The pharmaceutical composition according to the invention could also be used for treating or preventing a condition associated with cell proliferation, cell migration, inflammation and (cattle and sheep) dermatosparaxis, Ehlers-Danlos syndrome type VII C in humans and other diseases and syndromes induced by the formation of irregular collagen fibres, especially severe skin (and other collagen-rich tissue such as bones, tendons and blood vessels) fragility, arthritis (osteoarthritis and rheumatoid arthritis), inflammatory bowel disease, Crohn's disease, emphysema, acute respiratory distress syndrome, asthma, chronic obstructive pulmonary disease, Alzheimer's disease, organ transplant toxicity and rejection, cachexia, allergy, cancer (such as solid tumor cancer including colon, breast, lung, prostate, brain and hematopoietic malignancies including leukemia and lymphoma), normal and pathological angiogenesis, tissue ulcerations, restenosis, periodontal disease, epidermolysis bullosa, osteoporosis, loosening of artificial joint implants, atherosclerosis (including atherosclerotic plaque rupture), aortic aneurysm (including abdominal aortic and brain aortic aneurysm), congestive heart failure, myocardial infarction, stroke, cerebral ischemia, head trauma, spinal cord injury, neurodegenerative diseases (acute and chronic), autoimmune disorders, Huntington's disease, Parkinson's disease, migraine, depression, peripheral neuropathy, pain, cerebral amyloid angiopathy, nootropic or cognition enhancement, amyotrophic lateral sclerosis, multiple sclerosis, ocular angiogenesis, corneal injury, macular degeneration, abnormal wound healing, burns, infertility, diabetic shock or fibrosis.
[0034] Another aspect of the present invention is related to the use of the pharmaceutical composition according to the invention for the manufacture of a medicament in the treatment and/or the prevention of the above mentioned various diseases and disorders.
[0035] The polypeptide or cell according to the invention could also be used for the industrial production of collagen monomers from procollagen polypeptides.
[0036] A further aspect of the present invention concerns a method for obtaining (producing) the new polypeptide, its variant or active portion according to the invention, said method comprising the steps of transfecting or transforming a cell with the polynucleotide sequence or vector according to the invention, culturing said host cell in appropriate culture media and isolating and recovering the polypeptide from the cell culture media.
[0037] A final aspect of the present invention concerns a method for screening a compound that may modulate the polypeptide expression in a cell, said method comprising the steps of contacting a candidate compound with a cell or a cell extract expressing the polypeptide according to the invention, evaluating the effects of the candidate compound on the expression of the polynucleotide mRNA or polypeptide according to the invention, and therefrom identifying and recovering the compound that modulates said polypeptide expression in a cell. The modulation of the expression of said polypeptide could be an increase or a decrease of the expression or activity of said polypeptide. A modification of the expression or activity of said polypeptide could be measured by genomic or proteomic methods (possibly upon biochips) well known to the person skilled in the art, such as RT-PCR performed upon the mRNA expression of the polynucleotide, or gel electrophoresis measuring the expression or activity of said polypeptide.
[0038] A last aspect of the present invention is related to the compound identified and/or recovered by said method, which could be incorporated into the pharmaceutical composition according to the invention.
[0039] The present invention will be described in the following preferred examples, which are presented as non-limiting illustrations of the present invention.
Short description of the figures
[0040] Figure 1 represents accumulation of pNα1 and pNα2 (type I collagen in most tissues).
[0041] Figure 2 represents the structure of collagen fibrils.
[0042] Figure 3 represents the mechanism of generation of alternative exons 1A, 1B and 1C.
[0043] Figure 4 represents the human and mouse polypeptide amino acid sequences and structures according to the invention.
[0044] Figure 5 represents the northern analysis of polynucleotide transcripts according to the invention.
[0045] Figure 6 represents the distribution pattern of the polypeptides according to the invention in most tissues.
[0046] Figure 7 represents the regulation of the expression of the polypeptides according to the invention.
Detailed description of the invention
Experimental procedures
Analysis of procollagen processing
[0047] For evaluating the level of aminoprocollagen I (pNPI) processing, various organs and tissues collected from Adamts2-/- mice (9) were ground at liquid nitrogen temperature and extracted with 0.1 M acetic acid for 18 hours at 4°C. After centrifugation, the supernatants were neutralized and ammonium sulfate was added (40% saturation). After centrifugation (8000 g, 30 min), the pellets were washed in an ammonium sulfate solution (20% saturation) and finally dissolved in 0.1 M acetic acid. After precipitation with 33% ethanol (final concentration), the pellets containing collagen were denatured in Laemmli sample buffer. Similar amounts of protein were separated by electrophoresis on a 7.5% SDS-PAGE gel and stained with Coomassie blue.
[0048] The cleavage site of aminoprocollagen I in the absence of ADAMTS-2 activity was determined on dermatosparactic calf tendon. Collagen was purified from 1 M NaCl extracts by sequential steps of precipitation and solubilization. The collagen preparation was then treated or not with pyroglutamate aminopeptidase before electrophoresis on a pre-run 7.5% acrylamide/piperazine diacrylamide gel in 50 mM Tris/borate buffer (pH 8.3) containing 0.1% SDS and 0.1 mM thioglycolic acid. After transfer onto a PVDF membrane (in 200 mM Tris/borate, pH 9.5) and Coomassie blue staining, the α1I and α2I bands were collected and submitted to 6 cycles of Edman degradation amino acid sequencing.
PCR amplification and sequencing of ADAMTS14 cDNA
[0049] Three large overlapping cDNA fragments covering the ADAMTS14 cDNA sequence that corresponds to exons 2 to 22 of ADAMTS2 were PCR-amplified from fibroblast cDNA (37 cycles consisting of 94°C for 30 sec, 66°C for 30 sec, 72°C for 90 sec) using Taq DNA polymerase (Takara) and three ADAMTS14 primer pairs (5'-CTATGGTGTGACAGTGCCCTGCA-3' and 5'-GACGCTGCCCAGGCTGGTCTCA-3'; 5'-GGCATGTGTCACCCCCTGAGGA-3' and 5'-TCCTTGTCACAGCCGACAGGCACA-3'; 5'-GACGTGGTGTTCATGAACCAGGT-3' and 5'-GCCAGTGGGATGGCAGGGCACA-3'). PCR products were then gel purified and sequenced using the manufacturer's recommended protocols (Thermo-sequenase radiolabeled terminator cycle sequencing kit, Amersham Life Sciences). In order to amplify mouse Adamts14 cDNA, various human primer pairs were used. PCR conditions were: 2 cycles consisting of 94°C for 20 sec, 50°C for 20 sec and 72°C for 1 min, followed by 35 cycles consisting of 94°C for 20 sec, 66°C for 20 sec and 72°C for 1 min. PCR products amplified by the 5'-CTATGGTGTGACAGTGCCCTGCA-3' and 5'-GACGCTGCCCAGGCTGGTCTCA-3' or 5'-AGCCTGGCCTACAAGTACGTCAT-3' and 5'-CTCTTCTTGTGGTCACACAGGTGT-3' pairs were sequenced. The determination of partial mouse sequences allowed the design of a mouse-specific primer pair that was used to amplify and sequence the central part of the murine cDNA. For tissue distribution analysis, total RNA was purified from various normal mouse tissues. Duplicate samples from 3 dilutions of RNA from each tissue (10, 2 and 0.4 ng) were used for semi-quantitative RT-PCR amplification.
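As a hedged aside, basic properties of the three human ADAMTS14 primer pairs listed above (length, GC fraction and a rough melting temperature) can be tabulated with a short Python sketch. The Wallace rule used here, Tm = 2(A+T) + 4(G+C), is a coarse rule of thumb intended for short oligonucleotides; it is an assumption of this example and is not stated to be the method by which the annealing temperature of 66°C was chosen. The function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer (0.0 to 1.0)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer pairs (sense, antisense) as recited in paragraph [0049], written 5'->3'.
PRIMER_PAIRS = [
    ("CTATGGTGTGACAGTGCCCTGCA", "GACGCTGCCCAGGCTGGTCTCA"),
    ("GGCATGTGTCACCCCCTGAGGA", "TCCTTGTCACAGCCGACAGGCACA"),
    ("GACGTGGTGTTCATGAACCAGGT", "GCCAGTGGGATGGCAGGGCACA"),
]

for sense, antisense in PRIMER_PAIRS:
    for name, primer in (("sense", sense), ("antisense", antisense)):
        print(f"{name:9s} {primer}  len={len(primer)}  "
              f"GC={gc_fraction(primer):.2f}  Tm~{wallace_tm(primer)}")
```

For primers of this length (22 to 24 nucleotides), nearest-neighbour thermodynamic models generally give more reliable estimates than the Wallace rule; the values printed here are indicative only.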
Determination of the 5'-end of ADAMTS14 cDNA
[0050] The 5'-end of ADAMTS14 mRNA was amplified using the FirstChoice™ RLM-RACE Kit (Ambion) following the manufacturer's recommended protocols. Briefly, mRNA from cultured fibroblasts was dephosphorylated and then treated with Tobacco Acid Pyrophosphatase to remove the cap structure from full-length mRNA, leaving a 5' monophosphate. An RNA adapter was then ligated to the decapped phosphorylated mRNA. After reverse transcription (BcaBest™ RNA PCR kit, Takara), the ADAMTS14 cDNA 5' end was amplified using the sense "Outer Adapter primer" from the kit and the ADAMTS14 5'-CCAGACACCACGTGGGAGAGGAA-3' antisense primer (30 cycles; 94°C for 30 sec, 64°C for 30 sec, 72°C for 1 min). One microliter of the outer amplification product was then re-amplified using the nested sense "Inner Adapter primer" and the 5'-CGTCCCCGAAAGTCTGTGCTGCA-3' antisense primer (25 cycles; 94°C for 30 sec, 64°C for 30 sec, 72°C for 1 min). The resulting PCR products were then sequenced as described above.
Northern analysis
[0051] PCR-amplified products generated from the first or the second half of the ADAMTS14 cDNA (corresponding to amino acids 153-471 and 824-1078, respectively) were cloned using the pCR4-TOPO cloning kit (Invitrogen) according to the manufacturer's protocols. Antisense labeled riboprobes were synthesized from 500 ng of linearized plasmid (SpeI restriction site) using T7 RNA polymerase (Strip-EZ™ RNA kit, Ambion) and 32P-UTP (ICN).
[0052] Messenger RNA purified from human skin fibroblasts in culture (PolyATract mRNA Isolation System III, Promega) was separated by electrophoresis on a 0.9% agarose/formaldehyde gel, then transferred and fixed to a nylon membrane (Hybond N, Amersham) by UV irradiation. The filters were then sequentially prehybridized for 1 hour (at 65°C in 0.2 M NaH2PO4 (pH 7.2), 1 mM EDTA, 1% BSA, 7% SDS, 20% formamide), hybridized with labeled probes for 18 hours (same conditions as for prehybridization) and washed 3 times (at 65°C in 40 mM NaH2PO4 (pH 7.2), 1 mM EDTA, 1% SDS) before autoradiographic exposure.
Semi-quantitative RT-PCR assay
[0053] The determination of mRNA levels by RT-PCR amplification requires an internal standard that allows the efficiency of each step of the procedure to be monitored in each sample. For the construction of this standard, we designed and generated synthetic RNAs that have two main characteristics. First, they can be RT-PCR amplified using the same primer pairs as those used for RT-PCR amplification of the cellular mRNAs. Secondly, their amplification products are larger or smaller than those obtained from the cellular mRNAs, enabling their discrimination by electrophoresis when co-amplified in the same tube. Semi-quantification was obtained by calculating, for each sample, the ratio between the level of the product generated from the endogenous mRNA and that generated from a defined copy number of the standard synthetic RNA. RT-PCR reactions were performed in an automated thermal cycler (GeneAmp PCR System 2400 or 9600, Perkin Elmer, Norwalk, CT) using the GeneAmp Thermostable rTth Reverse Transcriptase RNA PCR kit (Perkin Elmer), specific pairs of primers (5 pmoles each), 10 ng of total RNA and a known copy number of internal standard RNA, in a 25 μl reaction mixture. The RT step (70°C for 15 sec) was followed by denaturation of RNA/DNA duplexes (95°C for 2 min) and by PCR amplification (adequate number of cycles consisting of 94°C for 15 sec, 66°C for 20 sec and 72°C for 10 sec). RT-PCR products were resolved on a 10% polyacrylamide gel and analyzed (Fluor-S MultiImager, BioRad) after staining (Gelstar, FMC BioProducts).
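The ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: band signals are taken to be proportional to the amount of product, and the function names `relative_level` and `estimated_copies` are hypothetical. Converting the ratio into a copy number is an extrapolation added here for illustration, not a step stated in the text.

```python
# Illustrative sketch of the semi-quantification ratio; names are hypothetical.

def relative_level(endogenous_signal, standard_signal):
    """Ratio of the endogenous-mRNA product to the internal-standard product.

    Both signals are band intensities measured in the same lane, so that
    tube-to-tube differences in RT and PCR efficiency cancel out.
    """
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return endogenous_signal / standard_signal


def estimated_copies(endogenous_signal, standard_signal, standard_copies):
    """Assumption (not stated in the text): scale the ratio by the known
    copy number of standard RNA added to the reaction."""
    return relative_level(endogenous_signal, standard_signal) * standard_copies


if __name__ == "__main__":
    # Hypothetical band intensities, for illustration only.
    print(relative_level(50.0, 100.0))
    print(estimated_copies(50.0, 100.0, 1000))
```

Because the standard and the endogenous template share the same primer pair and tube, the ratio is less sensitive to sample-to-sample variation than the raw endogenous signal.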
Cell culture
[0054] Human dermal fibroblasts, at passages 4 to 12, were plated at a density of 8 x 10³ cells/cm² and cultured for 1 or 2 days in Dulbecco's Modified Eagle Medium (DMEM). The medium was supplemented with 10% dialyzed and decomplemented fetal calf serum, with or without TPA (10 ng/ml), IL-1β (100 U/ml), TNFα (5 ng/ml), EGF (20 ng/ml) or TGFβ (5 ng/ml). Total RNA was purified using the High Pure RNA Isolation kit (Roche Molecular Biochemicals).
Electron microscopy
[0055] Fragments of skin and tendon from wild type and Adamts2-/- mice were fixed for 60 min at room temperature in 2.5% glutaraldehyde in 0.1 M Sörensen's buffer (pH 7.4), postfixed for 30 min in 0.1% osmium tetroxide in Sörensen's buffer, dehydrated in a series of ethanol concentrations and embedded in epoxy resin (Epon 812, Fluka). Ultrathin sections were stained with uranyl acetate and lead citrate before being examined using a Jeol electron microscope CX100 II at 60 kV.
Results
Procollagen processing in dermatosparaxis
[0056] The level of type I aminoprocollagen (pNI) processing in the skin of wild type (WT) or Adamts2-/- mice was investigated by SDS-PAGE (fig 1A).
[0057] A. The pattern of type I collagen polypeptides purified from skin and tendon of "wild type" (WT) and Adamts2-/- (TS2-/-) mice was determined after SDS-PAGE and Coomassie blue staining. In WT, only α1 and α2 mature chains are visible. In TS2-/-, absence of Adamts-2 activity results in the accumulation of pNα1 and pNα2 chains. Higher proportions of pNI α chains were observed in skin.
[0058] B. Type I collagen was extracted from various tissues of WT or TS2-/- mice. After SDS-PAGE and staining, the pNα1, pNα2, α1 and α2 type I collagen bands were quantified. For each tissue, the proportion of pNα chains is expressed as a percentage of total type I collagen.
[0059] Only α1 and α2 chains were observed in tissues of WT mice, illustrating the complete processing of the aminopropeptide of type I collagen. In Adamts2-/- mice, pNα1 and pNα2 chains were detected, as expected from an animal lacking Adamts-2 activity. However, mature α1 and α2 chains were also present, suggesting the existence of an alternative aminoprocollagen peptidase activity. Similar observations were made in bovine dermatosparaxis and human Ehlers-Danlos syndrome type VIIC. The level of processing varied from tissue to tissue (fig 1B) and was not related to the collagen content of the various organs. Skin contains mostly unprocessed pNI collagen (60 to 70%) while, in tendon, it represents only 20 to 25%. When most of the pN-I collagen is not processed, as in the skin of Adamts2-/- mice, collagen fibres are deeply altered (fig 2, compare panels A and B). In contrast, in tendons, where a high proportion of collagen is correctly processed, fibres look almost normal in terms of diameter, shape and supramolecular organization (fig 2, panels C and D). Their mechanical resistance is also normal. Collagen fibrils from skin and tendons of wild type (WT) and Adamts2-/- (TS2-/-) mice were observed by electron microscopy. In TS2-/-, skin collagen fibrils are strongly disorganized while they display a normal shape in tendon. Bars represent 0.3 μm.
[0060] Top panels: Collagen fibrils (longitudinal and cross section) in skin from a WT mouse (A) or a TS2-/- mouse (B). Bottom panels: Collagen fibrils, in longitudinal (C) or cross (D) section, in tendon from a TS2-/- mouse.
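The quantity reported for panel B of figure 1 (proportion of pNα chains as a percentage of total type I collagen) can be sketched as follows. This is only an illustration: the function name `pn_percentage` and the band intensities in the usage example are hypothetical, and the sketch assumes that staining intensity is proportional to the amount of each chain.

```python
# Illustrative sketch; function name and input values are hypothetical.

def pn_percentage(pn_alpha1, pn_alpha2, alpha1, alpha2):
    """Percentage of unprocessed (pN) chains among all type I collagen chains.

    Arguments are the quantified band intensities of the pN-alpha1,
    pN-alpha2, alpha1 and alpha2 bands from one SDS-PAGE lane.
    """
    total = pn_alpha1 + pn_alpha2 + alpha1 + alpha2
    if total <= 0:
        raise ValueError("total type I collagen signal must be positive")
    return 100.0 * (pn_alpha1 + pn_alpha2) / total


if __name__ == "__main__":
    # Hypothetical intensities, chosen only to show the calculation.
    print(pn_percentage(30, 15, 37, 18))
```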
[0061] The specificity of the processing of pNI collagen in the absence of Adamts-2 activity was assessed by amino acid sequencing of the N-terminus of processed α1 and α2 chains isolated from dermatosparactic calf tendon. A first attempt failed to provide sequence information, suggesting that their amino-terminus was blocked, possibly by modification of the glutamine residue which is the first expected amino acid of α1 and α2 when the processing occurs at the site normally cleaved by Adamts-2. After digestion with pyroglutamate aminopeptidase, an LSYGYD sequence was obtained for the α1 chain and an FDAKGG sequence for the α2 chain. In bovine type I collagen, these two sequences immediately follow the Q residue in position 1 of correctly processed α1 and α2. These features demonstrate that, even in the absence of Adamts-2 activity in vivo, procollagen processing can occur at the cleavage site for Adamts-2.
Identification of ADAMTS14 cDNA and gene
[0062] Based upon the hypothesis that the specific aminoprocollagen peptidase activity observed in animals lacking Adamts-2 results from the expression and the activity of a closely related enzyme, GenBank nucleic acid databases were searched using the human ADAMTS2 cDNA sequence (Accession number AJ003125). Besides homologies with ADAMTS2 and ADAMTS3 cDNA and genes (on chromosomes 5 and 4, respectively), homology was also found between portions of exons 6, 7, 8 and 10 of ADAMTS2 and sequences from chromosome 10 (Accession numbers: AC069538, AC016043, AC007484, AC018979, AL358817). A second homology search using less stringent parameters and the sequence of individual ADAMTS2 exons revealed that, with the exception of exons 1, 5 and 21, each ADAMTS2 exon had partial sequence homology with sequences in chromosome 10. This suggested the existence of a gene coding for a new ADAMTS closely related to ADAMTS2.
[0063] This new ADAMTS gene was actually expressed in human skin fibroblasts in culture and in placenta, although at a lower level. The region of the mRNA corresponding to exons 2 to 22 of ADAMTS2 was then RT-PCR amplified and sequenced. This showed that sequences corresponding to exons 5 and 21 of ADAMTS2 were present in the new ADAMTS but displayed very low homology, explaining why they were not detected by scanning of the databases. An alternative splicing mechanism leads to the removal of the last 9 bases of exon 6 (table 1). The ADAMTS14 name was assigned to this new cDNA, in agreement with the Human Gene Nomenclature Committee (GenBank accession number AF366351). Comparison of the ADAMTS14 cDNA sequence with the draft sequence of the human genome revealed that the gene is located on chromosome 10 (q21.3). The exon/intron structure has been determined (table 1). It appeared that the sequence reporting this part of chromosome 10 (accession number NT_024089.2) has to be corrected since the part of the ADAMTS14 gene encompassing the promoter region to intron 2 is in the opposite orientation compared to the exon 3 to exon 22 sequence.
[0064] For the determination of sequences located upstream of exon 2, a 5' RACE method was used. Two different sequences were identified. They represent 2 alternative exons 1 that were named exon 1A and 1B (fig 3, table 1). Comparison with genomic sequences showed that the beginning of exon 1B (5' TATTT) starts at a tctgTATTT potential Cap signal located 17 base pairs (bp) downstream of a potential TATA-box (tgtatat). This suggests that the TATTT sequence represents the actual start of transcription of this transcript. For exon 1A, the 5'-end sequence that was determined (TTGC) does not start after a typical Cap site, perhaps suggesting that the actual 5' end of the transcript had not been cloned. A tcagc Cap signal 27 bp upstream and adequately preceded (82 bp) by a typical GC-box could be the actual start of transcription. To confirm the presence, at the 5' end of the mRNA, of the alternative exons 1A and 1B, an RT-PCR assay was performed using a common antisense primer specific for exon 2 and one sense primer specific for exon 1A or exon 1B. Products of the expected size (204 and 241 bp, respectively) and sequence were obtained. However, another product (approximately 700 bp) was also RT-PCR amplified using the exon 1B specific primer. In the absence of the RT step, this product was not detected, demonstrating that it did not result from the amplification of genomic DNA. Sequencing revealed that this product was generated from a large exon, named 1C, consisting of exon 1B, intron 1B and exon 2 fused together (fig 3, table 1). In the following, the mRNA beginning with exon 1A, 1B or 1C will be named transcript A, B or C, respectively. RT-PCR evaluation of the relative level of the three transcripts in cultured fibroblasts revealed that transcript A is more abundant than transcript C, which is much more abundant than B. Moreover, while transcript A is expressed in placenta, skin and fibroblasts, transcripts B and C are found at a significant level only in fibroblasts.
Primary structure of ADAMTS-14
[0065] Open rectangles represent exons and lines represent promoter or intronic sequences. Dotted lines illustrate splicing mechanisms. For transcript A, exon 1A is joined to exon 2 by splicing of intron 1A. A potential regulatory GC-box is located upstream of the 5'-end of the transcript as determined by 5' RACE. The ATG triplet located in exon 1A in a suitable Kozak consensus sequence is indicated. Transcripts B and C start 17 bp downstream of a potential TATA-box. Intron 1B is spliced during the maturation of transcript B while it is conserved in transcript C. The first potential ATG start codon for these two transcripts is located in the sequence corresponding to exon 2 of transcript B.
[0066] For transcript A, the most 5' in-frame ATG found in exon 1A (fig 3) is located in a conserved Kozak consensus sequence. Its use as a Start codon would lead to the presence of a moderately hydrophobic N-terminal domain (amino acid 52 to 76) that could serve as a signal peptide (fig 4A).
[0067] The human sequence (hTS14) is reported on the middle lines. The beginning and end ([ ]) of the partial mouse sequence (mTS14) and human ADAMTS-2 (hTS2) are reported on the upper or the lower line respectively, but only at positions where their sequence differs from hTS14. The peptide sequence translated from transcript A only is in italic (amino acids 1 to 67). The MQG (369-371) sequence is alternatively present due to an alternative splicing mechanism at the end of exon 6. In the mouse these three amino acids are always absent. The furin cleavage site (RKRR), the catalytic site and Met-turn, the PLAC domain and the 4 thrombospondin type I repeats (TSPI) are underlined. The starts of the Disintegrin-like, the Cys-rich and Spacer domains are indicated (→).
[0068] B. The catalytic domain and the first TSPI module of the ADAMTS family members were compared. Length of the different members and their degree of similarity with ADAMTS-14 are reported between parentheses. Amino acid residues identical to those in ADAMTS-14 are shaded. Residues that are hallmarks of ADAMTS-2, -3 and -14 are indicated (*).
[0069] Low homology was found between part of this sequence and the N-terminus of ADAMTS-2 (amino acids 42 to 59). For transcripts B and C, the first ATG triplets that are found are not in a perfectly conserved Kozak consensus sequence and are followed by Stop codons. The first ATG in a suitable Kozak motif for translation (A at position -3 and G at position +4, with respect to the A (+1) of the ATG Start codon) is found in exon 2 (fig 3). The predicted protein from these transcripts starts at Met 68 of the sequence reported for transcript A (fig 4A) and does not contain an obvious signal peptide. Besides this difference, the predicted proteins translated from the three transcripts are identical. They have a higher homology with ADAMTS-2 and -3 than with the other ADAMTS (fig 4B) and they display a similar domain organization consisting of a pro-domain separated from the metalloproteinase domain by a furin cleavage site (RKRR), a disintegrin-like domain, a central thrombospondin type I repeat (TSPI), a cysteine-rich domain, a spacer domain, three additional TSPI and a C-terminal tail without significant homology with ADAMTS-2 or -3, except for a highly conserved PLAC domain (fig 4A). Analysis of the sequence of the catalytic site and the first TSPI, two of the most conserved domains in the ADAMTS family, confirmed that ADAMTS-14 is closely related to ADAMTS-2 and -3. Specific hallmarks of these three enzymes are a T and an E at positions 3 and 10 of the catalytic site and F (+9), K/R (+21) and R (+23) of the first TSPI (fig 4B).
[0070] A partial mouse sequence was also determined. It showed a high similarity with human ADAMTS-14, mainly in regions supposed to be critical for enzyme function such as the furin cleavage site, the metalloproteinase domain and the beginning of TSPI (fig 4A).
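The Kozak criterion used above (A at position -3 and G at position +4, counting the A of the ATG as +1) can be illustrated with a short Python sketch. The function name `kozak_atgs` and the toy sequences are hypothetical; the sketch applies only the two positions named in the text and makes no claim about how the authors actually screened the sequence.

```python
# Illustrative sketch; function name and example sequences are hypothetical.

def kozak_atgs(seq):
    """Return 0-based indices of ATG triplets with A at -3 and G at +4.

    Positions are counted relative to the A of the ATG (+1); there is no
    position 0, so -3 is three bases upstream of the A and +4 is the base
    immediately following the G of the ATG.
    """
    seq = seq.upper()
    hits = []
    start = seq.find("ATG")
    while start != -1:
        has_context = start >= 3 and start + 3 < len(seq)
        if has_context and seq[start - 3] == "A" and seq[start + 3] == "G":
            hits.append(start)
        start = seq.find("ATG", start + 1)
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from the ADAMTS14 cDNA.
    print(kozak_atgs("CCACCATGGCC"))
    print(kozak_atgs("TTTTTATGCCC"))
```

An ATG too close to either end of the supplied sequence is skipped because its context cannot be evaluated.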
Northern analysis
[0071] As seen for other ADAMTS genes, ADAMTS14 mRNA was expressed at a low level. Therefore, Northern analysis was performed on mRNA from fibroblasts in culture, the richest source of ADAMTS14 mRNA, using antisense riboprobes to increase the sensitivity of the assay. Two different probes were used, specific for either the 5' end or the 3' end of the mRNA. The 5' end probe revealed two transcripts of about 4.5 and 5.7 kb (fig. 5), which is similar to the size of other ADAMTS transcripts. The 3' specific probe also recognized these two products and smaller transcripts (about 2.8, 2.0 and 1.1 kb).
[0072] The size of the RNA markers is shown on the left. mRNA from skin fibroblasts in culture was separated on an agarose/formaldehyde gel and transferred onto a nylon membrane. Blots were revealed using labeled cRNA probes specific for either the 5'-end (lane 1) or the 3'-end (lane 2) of the ADAMTS14 mRNA.
Tissue distribution and regulation of expression
[0073] Tissue-specific expression was evaluated by RT-PCR. Adamts14 is expressed in all examined tissues, with highest levels in type I collagen-rich tissues and in fibroblasts in culture (fig 6). Duplicate samples containing 10, 2 or 0.4 ng of total RNA purified from various tissues were RT-PCR amplified using primers specific for Adamts14 and 28S RNA. After electrophoresis on 10% polyacrylamide gel and staining, the amplified cDNA products were quantified. Values obtained with 28S primers were equivalent in all tissues (not shown). Values measured for products amplified from Adamts14 mRNA are reported as arbitrary units of absorbance per ng of total RNA in the samples. Br: brain; He: heart; St: stomach; Li: liver; In: intestine; Mu: muscle; Ey: eye; Bo: bone; Sk: skin; Lu: lung; Sp: spleen; Ki: kidney; Te: tendon; Fb: fibroblasts in culture.
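The normalisation "arbitrary units of absorbance per ng of total RNA" can be sketched as follows. How the duplicate samples and the three dilutions were combined is not stated in the text; averaging the per-ng values, as done here, is an assumption, and the function name `units_per_ng` and the example values are hypothetical.

```python
# Illustrative sketch; the averaging scheme is an assumption, not the
# documented procedure. Function name and values are hypothetical.
from statistics import mean


def units_per_ng(measurements):
    """Average signal per ng of total RNA.

    `measurements` maps the amount of input RNA (ng) to the list of
    signals measured for the duplicate samples at that dilution.
    """
    per_ng = [signal / ng
              for ng, duplicates in measurements.items()
              for signal in duplicates]
    if not per_ng:
        raise ValueError("no measurements supplied")
    return mean(per_ng)


if __name__ == "__main__":
    # Hypothetical signals for one tissue at the three dilutions used.
    example = {10: [100.0, 104.0], 2: [19.0, 21.0], 0.4: [4.1, 3.9]}
    print(units_per_ng(example))
```

If the assay responds linearly to input, the per-ng values from the three dilutions should agree; a marked drift across dilutions would suggest that the amplification has left the linear range.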
[0074] Lower levels were observed in liver, stomach, brain and eye. This tissue distribution and the relative amount of Adamts14 mRNA are quite similar to those determined for Adamts2, the only exception being the eye, which has a low Adamts14 expression while it contains a high level of Adamts2. Scanning EST databases using human and mouse cDNA sequences revealed that ADAMTS14 is also expressed in ovary (accession numbers: BF906533, BF906528, BF906335), kidney tumor (accession number BF823025) and mammary tumor (accession number BF123774) and that it is upregulated in larynx carcinoma (accession number AJ403134).
[0075] Regulation of expression of ADAMTS14 mRNA was investigated in cultures treated with factors known to regulate the expression of various genes in fibroblasts (fig 7). Fibroblasts in culture were left untreated or were treated for 1 day (black bar) or 2 days (grey bar) with TPA (5 ng/ml), IL-1β (IL-1, 100 U/ml), TNFα (TNF, 10 ng/ml), EGF (20 ng/ml) or TGFβ (TGF, 5 ng/ml). MMP1 (A) and ADAMTS14 (B) mRNA levels were assayed by RT-PCR. Results are expressed as the ratio of mRNA levels in treated/untreated control cultures.
[0076] As a control for the efficiency of cell treatment, MMP1 expression was also measured in the various conditions. Results are expressed as the ratio of the values determined for treated versus untreated cells. Modulation of MMP1 expression measured in the different conditions was in good agreement with previous reports, confirming the efficiency of cell treatment. In contrast, none of the five treatments significantly modified the overall ADAMTS14 expression (fig 7) or that of the individual transcripts A, B and C. Similar results were obtained for ADAMTS2 and 3.
[0077] An intriguing observation in Ehlers-Danlos type VIIC (EDSVIIC) and animal dermatosparaxis is the presence of processed type I collagen in the absence of functional ADAMTS-2 (8). This observation was confirmed in this study by using Adamts2-/- mice. The level of processed collagen varied from tissue to tissue and could not be correlated with type I collagen content. For example, 80% of type I collagen is processed in tendon while it represents only 30 to 40% in the skin. This difference is biologically significant since, in Adamts2-/- mice, tendon has a normal mechanical resistance and contains normal collagen fibres while the skin is highly fragile and contains the abnormal collagen polymers seen in dermatosparactic calves and EDSVIIC patients (8). Studies investigating the processing of pN-I collagen in ADAMTS2-deficient humans and animals had so far been performed by analysis of SDS-PAGE patterns. Hence, it remained to be determined whether the observed processing occurred at the cleavage site specific for ADAMTS-2 or at a nearby site. For example, MMP13 can cleave the aminotelopeptide of type I collagen, generating collagen fragments of about the same size as products released by ADAMTS-2 digestion. Sequencing of processed α1 and α2 chains extracted from dermatosparactic calf tendon demonstrated that pN-I cleavage occurred at the bonds that are cleaved by ADAMTS-2 (Pro (142)/Glu (143) for α1 and Ala (79)/Glu (80) for α2). Processing of pN-I collagen requires a complex three-dimensional structure of the substrate in which the 3 propeptides (two α1 and one α2) are folded back across the major triple helix of the monomer. This specific requirement and the fact that the processing occurs at the ADAMTS-2 cleavage site for both α1 and α2 chains in dermatosparactic animals strongly suggested that a closely related enzyme could display a true aminoprocollagen peptidase activity. Because of its high homology with ADAMTS-2, ADAMTS-3 was first considered as the enzyme that could partially compensate for the ADAMTS-2 deficiency. Preliminary investigations on tissue distribution and relative level of ADAMTS-3 were performed to verify this hypothesis (not shown). ADAMTS-3 is expressed only at a low level in many organs. In addition, some discrepancies were observed between the proportion of processed type I collagen in the tissues of Adamts2-/- mice and the relative level of Adamts3. These data suggested that an enzyme other than Adamts-2 or -3 displays aminoprocollagen peptidase activity.
ADAMTS14 cDNA cloning
[0078] By scanning databases, homology was found between ADAMTS2 cDNA and sequences located on chromosome 10 (q21.3) that could represent exons of a new ADAMTS gene. After confirmation, by RT-PCR, that this potential novel gene can be expressed as mRNA, the complete sequence of human ADAMTS14 cDNA (name assigned in agreement with the Human Gene Nomenclature Committee) and also part of the Adamts14 mouse ortholog were determined by sequencing overlapping RT-PCR fragments and by applying 5' RACE technology. Different transcripts are expressed from the gene. An alternative splicing mechanism occurring at the end of exon 6 leads to an in-frame deletion of 9 bases (Table 1). Only the form lacking these 9 bases is found in mouse, while in ADAMTS2 the corresponding 9 bases are always present, suggesting that this alternative splicing does not have a major biological significance. The presence of three different exons 1, named 1A, 1B and 1C (table 1, fig 3), was also determined. Exon 1A on the one hand and exon 1B or 1C on the other hand result from the alternative use of two different transcription initiation signals (fig 3). The difference between transcripts B and C (fig 3) is the presence in transcript C of a sequence (intron 1B) that is spliced out during the maturation of transcript B. As a result, exon 1C is a large exon (1064 bp) consisting of sequences that correspond on transcript B to exon 1B, intron 1B and exon 2. Another example of such a large first exon is found in the ADAMTS1 gene, where exon 1 is 1136 bp long. The existence of 2 transcription initiation sites, in two different promoter contexts, suggests that transcripts resulting from these two sites can be differently regulated. Preliminary data support this hypothesis, indicating that transcript A is expressed at a relatively high level in skin, placenta and human fibroblasts in culture while transcripts B and C are found at a significant level in fibroblasts only. Northern analysis showed a 4.5 kb major product. This size is similar to the size of other ADAMTS mRNA and is consistent with the cDNA sequence that was determined. The larger product of about 5.7 kb probably results from the alternative use of a more 3' polyadenylation signal. A puzzling observation is the presence of only one band in the 4.5 kb region while, from RT-PCR data, we would expect to see two: one from transcript C and a smaller one from transcript A, transcript B being expressed at a level too low to be seen on a Northern blot. Two hypotheses could
explain these apparently contradictory observations.
Since quantification of transcripts A and C does not use the same primer pair, we cannot exclude that differences in the efficiency of the two assays led to biased results, one transcript actually being much more abundant relative to the other than expected from RT-PCR quantification. According to this hypothesis, only this transcript would be visible by Northern analysis, the two others being indistinguishable from the background. Another explanation is that exon 1A is longer than the 135 bp reported in table 1. Evaluation of its size is based upon the presence in genomic DNA of a suitable Cap signal situated 27 bp upstream of the ADAMTS14 cDNA sequence determined by 5' RACE. If, instead, exon 1A extends a few hundred bases upstream, the sizes of transcripts A and C would be too close to be discriminated on a Northern blot.
Adamts14 regulation of expression
[0079] The highest levels of Adamts14 were found in collagen-rich tissues, supporting its role as an aminoprocollagen peptidase. However, significant levels were also detected in all tissues investigated, such as brain, spleen and liver, indicating that Adamts-14 may have other functions. Similar findings have been reported for ADAMTS-2 (9). Like ADAMTS2 and ADAMTS3, ADAMTS14 is not transcriptionally regulated in fibroblasts by various soluble factors. In contrast, ADAMTS1 is an inflammation-associated gene that can be induced by IL-1 (11). Up-regulation of the expression of ADAMTS12 by TGFβ (1) and of aggrecanases (ADAMTS4 and 5) by IL-1, IL-6 and TNFα has also been reported (12).
ADAMTS-14 primary structure
[0080] Determination of the ADAMTS-14 primary structure requires the determination of the translation start site. For transcript A, the most 5' in-frame ATG in a suitable Kozak consensus sequence is found in exon 1A (Fig. 3). This ATG codon would result in the synthesis of a polypeptide displaying a moderately hydrophobic sequence at its N-terminus, which could serve as a signal peptide (fig 4A). Homology existing between this sequence (amino acids 42 to 61) and the sequence derived from exon 1 of ADAMTS2 or from mouse Adamts14 suggests the actual use of this ATG as a Start of translation. For transcripts B and C, the first suitable ATG corresponds to Met 68 of the protein synthesized from transcript A (fig 3 and 4A). As a result, ADAMTS-14 polypeptides from transcripts B and C do not contain an obvious signal peptide. It remains to be determined whether these ADAMTS-14 variants are secreted, like all the ADAMTS described so far, or sequestered in the cytoplasm.
[0081] Besides differences at the N-terminus, ADAMTS-14 variants are identical and display very high homology with ADAMTS-2 and -3, in terms of length of polypeptide chain, primary structure and domain organization. The highest similarity was observed around the catalytic site and the lowest, as expected, in the pro-domain and the C-terminal tail (fig 4). These results indicate that ADAMTS-2, -3 and -14 are three members of a structurally and functionally distinct subfamily of ADAMTS proteases. It has recently been demonstrated that ADAMTS-3, like ADAMTS-2, can process pN-II collagen (Fernandez et al, submitted for publication). Because of the high homology between these three enzymes, it is conceivable that ADAMTS-14 may also display aminoprocollagen peptidase activity and may be the enzyme responsible for pN-I collagen maturation in tissues of dermatosparactic calves, Adamts2-/- mice and EDSVIIC patients. Preliminary data are consistent with this hypothesis. For example, the highest ADAMTS14 mRNA levels were detected in collagen-rich tissues. In addition, the mRNA levels of ADAMTS14 and ADAMTS2 are similar, suggesting that the amount of ADAMTS-14 enzyme is high enough to allow maturation of significant amounts of pN-I collagen. Finally, the eye of Adamts2-/- mice, where Adamts14 is barely expressed, contains very low levels of mature type I collagen.
[0082] In summary, the gene structure and the primary structure of mouse and human ADAMTS-14 have been determined. The tissue distribution of this novel ADAMTS and its homology with ADAMTS-2 and -3 suggest its role as an aminoprocollagen peptidase and its involvement in procollagen processing in the absence of ADAMTS-2 activity. Various transcripts have been identified. They result from the use of two different promoters and transcription start sites and lead to the synthesis of ADAMTS-14 isoforms that differ in their amino-terminus. These observations suggest complex mechanisms of regulation of gene expression and enzyme function.
References
1. Cal, S. et al. (2001) J. Biol. Chem. 276, 17932-17940.
2. Kuno, K. et al. (2000) FEBS Lett. 478, 241-245.
3. Vazquez, F. et al. (1999) J. Biol. Chem. 274, 23349-23357.
4. Shindo, T. et al. (2000) J. Clin. Invest. 105, 1345-1351.
5. Blelloch, R., and Kimble, J. (1999) Nature 399, 586-590.
6. Lapiere, C. M., Lenaers, A., Kohn, L. (1971) Proc. Natl. Acad. Sci. U. S. A. 68, 3054-3058.
7. Colige, A. et al. (1995) J. Biol. Chem. 270, 16724-16730.
8. Colige, A. et al. (1999) Am. J. Hum. Genet. 63, 308-317.
9. Li, S.-W. et al. (2001) Biochem. J. 355, 271-278.
10. Colige, A. et al. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 2374-2379.
11. Sasaki, M. et al. (2001) Mol. Brain Res. 89, 158-163.
12. Flannery, C. R. et al. (2000) Matrix Biol. 19, 549-553.
Table 1: ADAMTS14 gene structure.
Exon partial sequences are in capital letters. The nucleotide consensus sequences of the 5' and 3' splice junctions of the introns are shown in boldface letters.
a: Alternative exons 1A and 1B are separated from exon 2 by introns 1A and 1B, respectively (see fig. 3). Alternative exon 1C, made of exon 1B, intron 1B and exon 2, is separated from exon 3 by intron 2.
b: The A of the ATG start codon in exon 1A is considered as +1.
c: This alternative exon does not contain the ATG start codon used for the numbering (see b).
d: An alternative splicing mechanism using a CTCAG/gtatgcaag donor site causes a 9 bp deletion (see fig 4A).