WO1999002688A1 - Coffee storage proteins - Google Patents

Publication number
WO1999002688A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
sequence seq
protein
seq
sequence
Prior art date
Application number
PCT/EP1998/004038
Other languages
French (fr)
Inventor
Pierre Marraccini
John Rogers
Original Assignee
Societe Des Produits Nestle S.A.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Societe Des Produits Nestle S.A.
Priority to CA002295320A (patent CA2295320A1)
Priority to AU87309/98A (patent AU756433B2)
Priority to US09/462,720 (patent US6617433B1)
Priority to EP98938679A (patent EP1007683A1)
Priority to BRPI9811690-8A (patent BR9811690B1)
Priority to JP2000502184A (patent JP2001509386A)
Publication of WO1999002688A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61Q SPECIFIC USE OF COSMETICS OR SIMILAR TOILETRY PREPARATIONS
    • A61Q19/00 Preparations for care of the skin
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K8/00 Cosmetics or similar toiletry preparations
    • A61K8/18 Cosmetics or similar toiletry preparations characterised by the composition
    • A61K8/30 Cosmetics or similar toiletry preparations characterised by the composition containing organic compounds
    • A61K8/60 Sugars; Derivatives thereof
    • A61K8/606 Nucleosides; Nucleotides; Nucleic acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8222 Developmentally regulated expression systems, tissue, organ specific, temporal or spatial regulation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8222 Developmentally regulated expression systems, tissue, organ specific, temporal or spatial regulation
    • C12N15/823 Reproductive tissue-specific promoters
    • C12N15/8234 Seed-specific, e.g. embryo, endosperm
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8242 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits
    • C12N15/8243 Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine
    • C12N15/8251 Amino acid content, e.g. synthetic storage proteins, altering amino acid biosynthesis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides

Definitions

  • the subject of the present invention is proteins derived from the coffee bean, and DNAs encoding and regulating the expression of at least one of these proteins .
  • EP 0,295,959 demonstrates, in particular, the expression, in a host plant, of the DNA derived from Bertholletia excelsa H.B.K. (brazil nut) encoding at least one subunit of the storage protein called 2S.
  • WO 9119801 demonstrates the existence of two storage proteins derived from Theobroma cacao, their precursor and the genes encoding these proteins.
  • the aim of the present invention is to respond to these needs .
  • the present invention relates to any DNA derived from the coffee bean, encoding at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
  • the present invention relates to any storage protein derived from the coffee bean, having at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
  • Another subject of the present invention relates to all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, capable of regulating the transcription of the storage proteins according to the invention, as well as the use of all or part of this DNA to direct the expression of genes of interest in plants, in particular in the coffee tree .
  • the present invention also relates to the use of all or part of the DNA delimited by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 or of its complementary strand, of at least 10 bp, as primer to carry out a PCR or as probe to detect in vitro or to inactivate in vivo a coffee bean gene encoding a storage protein.
  • the invention relates to any recombinant plant cell capable of expressing a recombinant storage protein according to the invention.
  • the present invention relates to any food, cosmetic or pharmaceutical product comprising all or part of the DNA or of the recombinant proteins according to the invention.
  • the present invention therefore opens the possibility of using all or part of the DNA according to the invention so as to modify the original production of the storage proteins in the coffee bean. It is therefore possible in particular to envisage overexpressing or underexpressing all or part of the DNA according to the invention in the coffee bean.
  • homologous nucleic sequence is understood to mean any nucleic sequence differing from the nucleic sequences according to the invention only in the substitution, deletion and/or insertion of a small number of base pairs.
  • two nucleic sequences which, because of the degeneracy of the genetic code, encode the same protein will be considered in particular as being homologous.
  • homologous nucleic sequence is also understood to mean a sequence which hybridizes under stringent conditions, that is to say any nucleic sequence capable of hybridizing to the nucleic sequences according to the present invention by the Southern blot method, under conditions which avoid nonspecific hybridizations or hybridizations which are not very stable (Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, USA, 1989, chapters 9.31 to 9.51).
  • homologous amino acid sequence is understood to mean any amino acid sequence differing from the amino acid sequences according to the present invention only in the substitution, insertion and/or deletion of at least one amino acid. A sequence which exhibits more than 50% homology with the amino acid sequence according to the invention will also be considered as homologous. In the latter case, the homology is determined by the ratio between the number of amino acids of a homologous sequence and that of an amino acid sequence according to the invention.
  • sequences SEQ ID NO: refer to the sequences presented in the sequence listing hereinafter.
  • the synthetic oligonucleotides SEQ ID NO: 5 to SEQ ID NO: 18, which are mentioned in the description and presented in the sequence listing hereinafter, are provided by Genset SA, 1 passage Delaunay, 75011 Paris, France.
  • Storage proteins are present only in the coffee bean and are highly expressed in the endosperm. In the ripe coffee bean, they represent nearly 50% of the total proteins and play a major role in the maturation of the coffee bean. These proteins influence in particular the structure and the density of the coffee bean as well as its amino acid content. They also play a major role in the storage of amino acids for the germination of the bean.
  • said DNA encodes at least one protein derived from the coffee bean, chosen from the group comprising the storage protein αβ, having the amino acid sequence SEQ ID NO: 2, the cleavage protein α, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 1 to 304, the cleavage protein β, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 305 to 492, or any nucleic sequences homologous to these sequences.
  • the invention relates to the DNA delimited by nucleotides 33 to 1508 in the nucleic sequence SEQ ID NO: 1 encoding the storage protein αβ, or any nucleic sequence homologous to this sequence.
  • the invention relates to the DNA comprising at least in the nucleic sequence SEQ ID NO: 1 nucleotides 33 to 944 encoding the cleavage protein α and/or nucleotides 945 to 1508 encoding the cleavage protein β.
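
The nucleotide and amino acid coordinates quoted above can be cross-checked with simple codon arithmetic. The following Python sketch is illustrative only: it assumes 1-based, inclusive coordinates on SEQ ID NO: 1 and a reading frame opening at nucleotide 33, and the function names are hypothetical, not part of the application.

```python
# Coding ranges quoted in the text (1-based, inclusive, on SEQ ID NO: 1).
ORF_START = 33
PRECURSOR = (33, 1508)        # whole storage protein precursor
FIRST_CLEAVAGE = (33, 944)    # N-terminal cleavage protein
SECOND_CLEAVAGE = (945, 1508) # C-terminal cleavage protein


def codon_count(start: int, end: int) -> int:
    """Number of complete codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


def residue_range(start: int, end: int, orf_start: int = ORF_START) -> tuple:
    """Map an in-frame nucleotide range to (first, last) amino acid positions."""
    first = (start - orf_start) // 3 + 1
    last = (end - orf_start + 1) // 3
    return first, last


for name, (start, end) in [("precursor", PRECURSOR),
                           ("first cleavage protein", FIRST_CLEAVAGE),
                           ("second cleavage protein", SECOND_CLEAVAGE)]:
    print(name, codon_count(start, end), residue_range(start, end))
```

Under these assumptions the three ranges contain 492, 304 and 188 codons, which agrees with amino acids 1 to 492, 1 to 304 and 305 to 492 of SEQ ID NO: 2.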
  • the present invention also relates to the use of all or part of the DNA delimited by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 or of its complementary strand, of at least 10 bp, as primer to carry out a PCR or as probe to detect in vitro or to modify the expression in vivo of at least one coffee bean gene encoding at least one storage protein.
  • the DNA according to the present invention may be advantageously used to express at least one recombinant storage protein, derived from the coffee bean, in a host plant or microorganism.
  • for expression in a host plant or microorganism, it is possible to clone all or part of the nucleic sequence SEQ ID NO: 1 delimited by nucleotides 33 to 1508 into an expression vector downstream of a promoter, or of a promoter and a signal sequence, and upstream of a terminator, while preserving the reading frame. The said vector may then be introduced into a plant, a yeast or a bacterium, for example. Specific examples of application are presented hereinafter.
  • nucleotides 33 to 1508 of nucleic sequence SEQ ID NO : 1 may be advantageously used in the coffee bean in a form which is modified by mutagenesis so as to modify the original production of storage proteins in the coffee bean and thus to modify the organoleptic quality of the coffee bean.
  • the invention also relates to the storage protein αβ, having the amino acid sequence SEQ ID NO: 2, the cleavage protein α having the sequence delimited by amino acids 1 to 304 of the amino acid sequence SEQ ID NO: 2 and the cleavage protein β having the sequence delimited by amino acids 305 to 492 of the amino acid sequence SEQ ID NO: 2, or any amino acid sequence which is homologous thereto.
  • it has been demonstrated that the storage proteins derived from the coffee bean are synthesized as a large precursor, the storage protein αβ, which is cleaved into two proteins, the cleavage protein α and the cleavage protein β.
  • the cleavage proteins α and β can recombine in a polymerized form through at least one disulphide bridge. Indeed, it has been possible to isolate in the endosperm of the coffee bean polymerized forms of the cleavage proteins α and/or β and/or of their homologous sequences. To this effect, the present invention also relates to the polymerized form of the recombinant storage proteins αβ, α and/or β, as well as their homologous sequences.
  • Another subject of the present invention relates to all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, capable of regulating the expression of the storage protein having the amino acid sequence SEQ ID NO: 2.
  • the invention also relates to the use of all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, to allow the expression, in the coffee bean or in a heterologous plant, of the storage protein αβ encoded by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 or of a protein encoded by a gene of interest.
  • the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3 may be advantageously used by fusing it, completely or partially, with a gene of interest, while preserving the reading frame, and then by cloning the whole into an expression vector which is introduced into coffee, so as to allow the expression of the protein encoded by this gene in the coffee bean.
  • the invention also covers all the food, cosmetic or pharmaceutical products comprising all or part of the DNA, or of the recombinant proteins according to the invention.
  • Persons skilled in the art are indeed capable, by means of oligonucleotide probes or of appropriate antibodies, of detecting their presence in very low quantities.
  • the storage proteins derived from the coffee bean, the DNA derived from the coffee bean encoding at least one of these proteins, as well as the DNA capable of regulating their transcription, according to the present invention, are characterized in greater detail with the aid of biochemical and molecular analyses hereinafter.
  • the total proteins are extracted from ripe fruits of Coffea arabica of the Caturra variety.
  • the maternal tissues are separated from the coffee beans which are rapidly ground in liquid nitrogen, and which are then reduced to a powder according to the method of Damerval et al. (Electrophoresis 7, 52-54, 1986).
  • the coffee proteins are then extracted from 10 mg of this powder which is solubilized in 100 µl of solution containing 3% w/v of CHAPS, 8.5 M urea, 0.15% w/v of DTT and 3% v/v of ampholyte support pH 3-10.
  • the mixture is then centrifuged at 13,000 g for 5 min and the supernatant which contains the total proteins of the coffee beans is recovered.
  • a one-dimensional electrophoresis is performed on this supernatant on the basis of a pH gradient, using, for example, the Multiphore system (Pharmacia Biotech AB, Björkgatan 30, 75182 Uppsala, Sweden). To do this, 50 µl are deposited per electrophoresis gel.
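
As a rough aid for preparing the extraction solution described above, the stated concentrations can be converted into masses for the 100 µl volume. This is a minimal sketch, not part of the application: the helper names are hypothetical, the urea molar mass (60.06 g/mol) is a standard value, and the "powder equivalent per gel" assumes that the 50 µl loaded are representative of the whole 100 µl extract.

```python
UREA_G_PER_MOL = 60.06  # molar mass of urea (standard value, not from the text)


def mg_from_percent_wv(percent_wv: float, volume_ul: float) -> float:
    """Mass in mg for a % w/v concentration (1% w/v = 0.01 mg per µl)."""
    return percent_wv * 0.01 * volume_ul


def mg_from_molar(molar: float, g_per_mol: float, volume_ul: float) -> float:
    """Mass in mg for a molar concentration (1 mol/l = g_per_mol * 1e-3 mg per µl)."""
    return molar * g_per_mol * 1e-3 * volume_ul


VOLUME_UL = 100.0  # extraction volume given in the text

chaps_mg = mg_from_percent_wv(3.0, VOLUME_UL)            # 3% w/v CHAPS
dtt_mg = mg_from_percent_wv(0.15, VOLUME_UL)             # 0.15% w/v DTT
urea_mg = mg_from_molar(8.5, UREA_G_PER_MOL, VOLUME_UL)  # 8.5 M urea

# 10 mg of powder in 100 µl, 50 µl loaded per gel (assumes a homogeneous extract).
powder_per_gel_mg = 10.0 * 50.0 / VOLUME_UL

print(chaps_mg, dtt_mg, round(urea_mg, 2), powder_per_gel_mg)
```

On these assumptions, each gel receives the proteins extracted from about 5 mg of bean powder.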
  • a second SDS-PAGE electrophoresis is then performed on the gels derived from the first electrophoresis, using, for example, Bio-Rad equipment (Bio-Rad Laboratories, 2000 Alfred Nobel Drive, Hercules, California 94547 USA) under standard conditions, according to the Laemmli method (Nature, 227, 680-685, 1970).
  • the gels derived from the one-dimensional electrophoresis are equilibrated with 5 ml/gel of Tris buffer containing 6 M urea, 30% v/v of glycerol, 2% w/v of SDS, 2% w/v of DTT and 2.5% w/v of iodoacetamide. They are then placed on the gels of the second SDS-PAGE electrophoresis, and the migration of the proteins is carried out in Bio-Rad equipment at 40 mA and at a temperature of 12°C for 9 h, for example.
  • the gels thus produced are silver stained by the method of Bjellqvist et al. (Electrophoresis, 14, 1357-1365, 1993).
  • the images are then analysed with the aid of a scanner (Scanner XRS 12CX, X-Ray Scanner Corporation, 4030 Spencer Street, Torrance, California 90503 USA) and, for example, with the aid of the Bio Image programme (Bio Image, 777 East Eisenhower Parkway, Suite 950, Ann Arbor, Michigan 48108, USA).
  • the proteins separated by two-dimensional electrophoresis are transferred onto PVDF membranes in a CAPS buffer, with the aid of a Bio-Rad Transblot Cell (Bio-Rad, USA) maintained at 420 mA and at a temperature of 4°C for 1 h 30 min, and then they are stained with coomassie blue, according to the instructions of Applied Biosystems (Applied Biosystems Inc., 850 Lincoln Centre Drive, Foster City, California 94404 USA) .
  • the membranes are dried at room temperature, before storing them at -18°C in plastic pouches .
  • Microsequencing of the N-terminal sequences of the protein blots is carried out with the aid of a sequencer of the Beckman LF 3000 type and of the Beckman Gold HPLC system (Beckman Instruments Inc., 250 Harbor Boulevard Box 3100, Fullerton, California 92634 USA).
  • the protein blots are cut out of the membrane and subjected to trypsin digestion at pH 8.3 in 50 µl of digestion buffer containing 10% v/v of trypsin, 100 mM of Tris-HCl, 1% v/v of Triton RTX and 10% v/v of acetonitrile.
  • the peptides are then separated by HPLC on a C18 column (Merck KGaA, Frankfurter Strasse 250, 64923 Darmstadt, DE), using a water/acetonitrile gradient containing 0.05% of TFA. The peptide fractions are concentrated, rediluted in 30% of acetonitrile and 0.01% of TFA, and sequenced as described above.
  • the two-dimensional electrophoretic profile, under denaturant conditions of the endosperm of ripe C. arabica beans shows 4 groups of proteins which are represented in particular, these proteins having an apparent molecular weight of the order of 70, 56, 32 and 23 kDa.
  • 2 proteins of the group of proteins at 23 kDa as well as 2 proteins of the group of proteins at 70 kDa have an N-terminal sequence which is identical to the N-terminal sequence of the cleavage protein β.
  • N-terminal sequences of the proteins of 23, 32, 56 and 70 kDa and the internal sequences of the protein at 32 kDa have a high homology with the sequences of storage proteins of certain plant species, such as for example glycinins of Glycine max, 12S proteins of Arabidopsis thaliana, cruciferin of Brassica napus, glutelins of Oryza sativa and 11S protein of Cucurbita maxima.
  • the group of proteins of 56 kDa represents a large precursor, the mature storage protein αβ, comprising two domains, the α domain and the β domain.
  • the two-dimensional electrophoretic profile also demonstrates the existence of the cleavage protein α, present in several isoforms at 32 kDa and that of the cleavage protein β, present in several isoforms at 23 kDa.
  • the cleavage proteins α and β may exist in various isoforms.
  • the group of proteins of 70 kDa represents the trimeric form of the cleavage protein β.
  • the quantity of storage proteins contained in the coffee bean is calculated, in per cent, relative to the total integrated intensity of the two-dimensional electrophoretic profile. To do this, the integrated intensity of the protein blots representing the storage protein αβ, the cleavage protein α, the cleavage protein β, the trimeric form of the cleavage protein β and the fragment of the cleavage protein ⁇ is measured.
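
The percentage calculation described above amounts to dividing the summed intensity of the storage protein blots by the total integrated intensity of the gel. The sketch below illustrates that ratio only; the function name, the group labels and the intensity values are hypothetical placeholders in arbitrary units, not measurements from the application.

```python
def storage_protein_percent(blot_intensities: dict, total_intensity: float) -> float:
    """Summed intensity of the storage protein blots as a percentage of the whole gel."""
    if total_intensity <= 0:
        raise ValueError("total integrated intensity must be positive")
    return 100.0 * sum(blot_intensities.values()) / total_intensity


# Placeholder values only (arbitrary units), one entry per group of blots.
example_blots = {
    "precursor": 120.0,
    "first cleavage protein": 150.0,
    "second cleavage protein": 130.0,
    "trimeric form": 40.0,
    "fragment": 10.0,
}
example_total = 1000.0

print(storage_protein_percent(example_blots, example_total))
```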
  • the expression of the storage proteins of the coffee bean in tissues of the coffee bean other than the endosperm is also checked by two-dimensional electrophoresis. It is thus possible to demonstrate the fact that the storage proteins are only synthesized in a large quantity in the endosperm and in a much lower proportion in the embryo of the coffee bean.
  • the total RNAs are extracted from coffee beans harvested from 4 to 40 weeks after flowering.
  • the maternal tissues are separated from the coffee beans which are rapidly ground in liquid nitrogen before being reduced to a powder.
  • This powder is then resuspended in 8 ml of buffer at pH 8 containing 100 mM Tris-HCl, 0.1% w/v of SDS and 0.5% v/v of β-mercaptoethanol. It is homogenized with one volume of phenol saturated with 100 mM Tris-HCl, pH 8, and then centrifuged at 12,000 g for 10 min at 4°C, so as to extract the aqueous phase which is centrifuged (i) once with an equivalent volume of phenol, (ii) twice with an equivalent volume of phenol:chloroform (1:1) and (iii) twice with an equivalent volume of chloroform.
  • the total nucleic acids are then precipitated for 1 h at -20°C by adding to the aqueous phase 1/10 of the volume of 3 M sodium acetate, pH 5.2 and 2.5 volumes of ethanol .
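
The precipitation step above scales with the volume of aqueous phase recovered. The following sketch computes the additions for a given volume; it assumes that both the 1/10 volume of 3 M sodium acetate and the 2.5 volumes of ethanol are relative to the aqueous phase (the text does not say whether the ethanol is relative to the combined volume), and the function name and the 1 ml input are illustrative only.

```python
ACETATE_STOCK_MOLAR = 3.0  # 3 M sodium acetate, pH 5.2, as stated in the text


def precipitation_volumes(aqueous_ml: float) -> dict:
    """Volumes to add to an aqueous phase for ethanol precipitation.

    Assumption: both additions are expressed relative to the aqueous phase volume.
    """
    acetate_ml = aqueous_ml / 10.0
    ethanol_ml = 2.5 * aqueous_ml
    total_ml = aqueous_ml + acetate_ml + ethanol_ml
    return {
        "sodium_acetate_ml": acetate_ml,
        "ethanol_ml": ethanol_ml,
        "total_ml": total_ml,
        "acetate_final_molar": ACETATE_STOCK_MOLAR * acetate_ml / total_ml,
        "ethanol_final_fraction": ethanol_ml / total_ml,
    }


volumes = precipitation_volumes(1.0)  # 1 ml is an arbitrary example volume
print(volumes)
```

With this reading, the final mixture contains roughly 0.08 M sodium acetate and about 69% v/v ethanol, whatever the starting volume.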
  • After centrifugation, the pellet of total RNAs is taken up in 1 ml of H2O and digested for 1 h at 37°C with DNAse RQ1 (Promega Corporation, 2800 Woods Hollow Road, Madison, Wisconsin 53711 USA), so as to eliminate any trace of DNA. The total RNAs are then deproteinized by treatment with phenol and with chloroform, before being precipitated in the presence of sodium acetate as described above.
  • the total RNAs are then taken up in 500 µl of H2O and quantified by spectrophotometric assay at 260 nm. Their quality is analysed by agarose gel electrophoresis in the presence of formaldehyde and by in vitro translation.
  • mRNA: polyA+ messenger RNAs
  • the labelled proteins are then separated by two-dimensional electrophoresis as described above. After fixing in an acetic acid/ethanol mixture (40/10) , the gels are incubated in the presence of Amplify (Amersham, UK) , they are dried under vacuum and they are exposed at -80°C against an autoradiographic film.
  • the synthesis of the cDNA necessary for the construction of libraries is carried out according to the recommendations provided in the "Riboclone cDNA synthesis system M-MLV (H-)" kit (Promega, USA), using the mRNA extracted from coffee beans harvested 16 and 30 weeks after flowering. The efficiency of this reaction is monitored by the addition of [alpha-32P]dCTP during the synthesis of the two DNA strands. After migration on an alkaline agarose gel (Sambrook et al., Molecular Cloning - A Laboratory Manual, 1989), the size of the newly synthesized cDNA is estimated to vary from 0.2 to more than 4.3 kb. The quantifications, with the aid of the DNA Dipstick kit (InVitrogen BV, De Schelp 12, 9351 NV Leek, Netherlands), show that about 100 ng of cDNA are synthesized from 1 µg of mRNA.
  • the newly synthesized cDNA(s) are then treated according to the recommendations provided in the RiboClone EcoRI
  • This ligation mixture is used to transform the E. coli strain XL1-Blue MRF' (Stratagene, USA).
  • the bacteria containing recombinant vectors are selected on dishes with LB (Luria-Bertani) medium containing
  • Petri dishes so as to obtain about 300 clones per dish.
  • the sequence from amino acids 325 to 330 of the sequence SEQ ID NO: 2 is chosen in the amino acid sequence of the cleavage protein β because it makes it possible to design an oligonucleotide probe which is only slightly degenerate, the probe OLIGO 1, having the nucleic sequence SEQ ID NO: 4, which is labelled at its 5' end by the addition of the digoxigenin radical (Genosys Biotechnologies Inc., 162A Science Park, Milton Road, Cambridge CB4 4BR, UK).
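
The choice of a six-residue stretch giving a weakly degenerate probe can be illustrated by counting, for a peptide, how many distinct nucleotide sequences could encode it under the standard genetic code. This is a hedged sketch: the function name is hypothetical and the peptides used are invented examples, not the residues 325 to 330 of SEQ ID NO: 2, which are not reproduced in this section.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (one-letter code)."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


# Invented six-residue examples, for illustration only.
low_degeneracy = probe_degeneracy("MWDEKQ")   # rich in Met, Trp and two-codon residues
high_degeneracy = probe_degeneracy("LSRLSR")  # rich in six-codon residues

print(low_degeneracy, high_degeneracy)
```

A stretch rich in methionine, tryptophan and two-codon residues keeps the oligonucleotide mixture small, whereas leucine, serine and arginine inflate it quickly, which is the usual reason for selecting one region of a protein over another when designing such a probe.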
  • the filters are prehybridized at 65°C for 4 h in the hybridization solution defined in the DIG oligonucleotide 3'-end labelling kit protocol (Boehringer Mannheim, DE) and the hybridization is carried out at 37°C for 10 h in the presence of the probe OLIGO 1 (10 pmol/ml final).
  • the filters are washed in the presence of tetramethylammonium chloride according to the protocol defined by Wood et al. (Proc. Natl. Acad. Sci. USA, 82, 1585-1588, 1985) and then they are subjected to immunological detection in the presence of CSPD (Tropix, 47 Wiggins Avenue, Bedford, Massachusetts 01730 USA) according to the recommendations provided by Boehringer Mannheim (DIG luminescent detection kit).
  • a positive clone harbouring the recombinant vector, called "pCSP1" in the remainder of the description, is selected from the screening of the cDNA library prepared from beans harvested 16 weeks after flowering.
  • This vector contains a cDNA, cloned into the EcoRI site of the vector pBluescript II SK (+), which is sequenced according to the "T7 sequencing kit" protocol (Pharmacia, Sweden) in the presence of [alpha-35S]dATP.
  • This cDNA comprises the last 819 nucleotides of the sequence SEQ ID NO: 1 and, consequently, is incapable of encoding the storage protein αβ.
  • a new nucleic probe, called SO1 in the remainder of the description, is synthesized.
  • a PCR is carried out (US Patent 4,683,195 and US Patent 4,683,202) using the synthetic oligonucleotide OLIGO 2, having the nucleic sequence SEQ ID NO: 5, and the synthetic oligonucleotide OLIGO 3, having the nucleic sequence SEQ ID NO: 6.
  • the PCR reaction is carried out in the presence of 0.1 ng of vector pCSP1, in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 µM of each oligonucleotide (OLIGO 2 and OLIGO 3) and 3 units of Taq DNA polymerase (Stratagene, USA).
  • the reaction mixture is covered with 50 µl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 42°C-30 s, 72°C-2 min) followed by a final extension at 72°C for 7 min.
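
The reaction conditions above translate directly into absolute amounts per 50 µl reaction and into a minimum run time. The sketch below is a simple illustration with hypothetical helper names; it uses only the concentrations and cycle steps stated in the text and ignores temperature ramping, which depends on the instrument.

```python
def amount_pmol(concentration_um: float, volume_ul: float) -> float:
    """Amount in pmol: a concentration in µM multiplied by a volume in µl."""
    return concentration_um * volume_ul


def program_seconds(cycles: int, step_seconds: tuple, final_extension_s: int) -> int:
    """Total hold time of a cycling programme, excluding ramp times."""
    return cycles * sum(step_seconds) + final_extension_s


REACTION_UL = 50.0

primer_pmol = amount_pmol(0.25, REACTION_UL)           # 0.25 µM of each oligonucleotide
dntp_nmol = amount_pmol(200.0, REACTION_UL) / 1000.0   # 0.2 mM of each dNTP
mgcl2_nmol = amount_pmol(1500.0, REACTION_UL) / 1000.0 # 1.5 mM MgCl2

# 30 cycles of 30 s, 30 s and 2 min, then a 7 min final extension.
hold_s = program_seconds(30, (30, 30, 120), 7 * 60)

print(primer_pmol, dntp_nmol, mgcl2_nmol, hold_s / 60)
```

Each reaction therefore contains 12.5 pmol of each oligonucleotide and 10 nmol of each dNTP, and the programme holds for 97 min before ramp times are added.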
  • the fragment obtained after amplification is purified on a Microcon 100 cartridge (Amicon INC, 72 Cherry Hill Drive, Beverly, Massachusetts 01915 USA) and 50 ng of this fragment are labelled by random primer extension with 50 µCi of [alpha-32P]dCTP according to the Megaprime kit (Amersham, UK).
  • the Nylon filters used during the screening with the probe OLIGO 1 are dehybridized by two washes of 15 min at 37°C, in the presence of 0.2 N NaOH and 0.1% SDS (w/v), and then prehybridized for 4 h at 65°C in a solution containing 6x SSC, 1x Denhardt (0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% BSA fraction IV) and 50 µg/ml of denatured salmon sperm DNA.
  • a positive clone harbouring the recombinant vector, called pCSP2 in the remainder of the description, is thus selected from the screening of the cDNA library prepared from beans harvested 30 weeks after flowering.
  • This vector contains the sequence SEQ ID NO: 1 of 1706 bp, corresponding to the cDNA encoding the entire storage protein αβ, having as amino acid sequence the sequence SEQ ID NO: 2 and a theoretical molecular weight of 54999 Da.
  • a search of the SwissProt databank with the sequence SEQ ID NO: 2 confirms that this coffee protein belongs to the family of type 11S plant storage proteins.
  • the cleavage site of the precursor is located between amino acids 304 and 305 of the amino acid sequence SEQ ID NO: 2, as has been observed for all the other type 11S plant proteins (Borroto and Dure, Plant Mol. Biol. 8, 113-131, 1987). This is also confirmed by the N-terminal sequencing of the cleavage protein β described above. Consequently, the cleavage protein α corresponds to the first 304 amino acids of the amino acid sequence SEQ ID NO: 2, whereas the cleavage protein β corresponds to the last 188 amino acids of this sequence.
  • the theoretical molecular weights of α and β are respectively 34125 Da and 20892 Da and are in agreement with those described above under "Identification of the storage proteins of the coffee bean".
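
The three theoretical molecular weights quoted above are mutually consistent, which can be checked with one line of arithmetic: cleaving a peptide bond adds one water molecule, so the two cleavage products together weigh one water more than the precursor. The sketch below is illustrative; the function name is hypothetical and the mass of water (18.015 Da) is a standard value not given in the text.

```python
WATER_DA = 18.015  # average mass of H2O (standard value, not from the text)


def precursor_mass(fragment_masses: list) -> float:
    """Mass of an uncleaved chain from its hydrolysis products.

    Each cleaved peptide bond consumed one water, so (n - 1) waters are removed.
    """
    return sum(fragment_masses) - WATER_DA * (len(fragment_masses) - 1)


# Theoretical masses of the two cleavage proteins, as stated in the text (Da).
fragments = [34125.0, 20892.0]
predicted = precursor_mass(fragments)

print(round(predicted))  # compare with the 54999 Da stated for the precursor
```

The result rounds to the 54999 Da given for the precursor, and the two fragment masses are of the same order as the apparent 32 and 23 kDa groups seen on the two-dimensional gels.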
  • N-terminal sequences of the cleavage proteins α and β analysed above are found in the amino acid sequence SEQ ID NO: 2 with the exception of a few amino acids. These differences are probably explained by the existence of several isoforms of these proteins which may differ from each other by a few amino acids.
  • RNAs of these coffee beans are denatured for 15 min at 65°C in 1x MOPS buffer (20 mM MOPS, 5 mM sodium acetate, 1 mM EDTA, pH 7) in the presence of formamide (50%) and formaldehyde (0.66 M final).
  • RNAs are stained with ethidium bromide (BET) according to Sambrook et al., 1989, which makes it possible to standardize the quantities deposited on a gel from the intensities of fluorescence of the 16S and 23S ribosomal RNAs.
  • RNAs are then transferred and fixed on a positively charged Nylon membrane according to the recommendations provided by Boehringer Mannheim
  • the mRNAs encoding the storage protein αβ are completely absent from the beans harvested up to 9 weeks after flowering. They begin to be very weakly detected in the beans harvested at 12 weeks after flowering and are very abundant in the beans harvested between 16 and 30 weeks after flowering, again becoming very weakly represented in the ripe coffee beans (35 weeks after flowering). In all cases, the probe SO1 hybridizes with only one class of mRNA whose estimated size at around 1.8 kb is close to that of the nucleic sequence SEQ ID NO: 1.
  • the kinetics of accumulation of the mRNAs is similar to that observed for most of the genes for storage proteins (Shirsat, 1991) .
  • from tissue examinations made during the maturation of the coffee beans, it is observed that the increase in the quantity of mRNA between 12 and 16 weeks after flowering occurs at the same time as the absorption of the perisperm by the endosperm.
  • a perfect superposition of the kinetics of accumulation of mRNAs with that of the storage proteins is observed.
  • the persistence of the storage proteins in the absence of their corresponding messenger RNAs is explained by a high stability of these proteins in vivo .
  • the promoter of the gene encoding the storage protein αβ of Coffea arabica is isolated by several inverse PCRs according to the method of Ochman et al. (Genetics 120, 621-623, 1988).
  • the DNA is taken up in about 500 µl of ligation buffer containing 30 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 10 mM DTT and 0.5 mM rATP, so as to obtain a final DNA concentration of about 1 to 2 ng/µl.
  • the ligation is carried out for 12 h at 14°C in the presence of T4 DNA ligase at 0.02 Weiss u/µl and then the self-ligated genomic DNA is precipitated as described above and it is taken up in 20 µl of H2O before quantifying it with the DNA Dipstick kit (InVitrogen, Netherlands).
  • This first reaction is carried out using the synthetic oligonucleotide SO10, having the nucleic sequence SEQ ID NO: 7, and the oligonucleotide SO11, having the nucleic sequence SEQ ID NO: 8.
  • This inverse PCR reaction is carried out in the presence of 50 ng of ligated genomic DNA in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml of gelatin, 0.2 mM of each dNTP, 0.25 µM of each oligonucleotide (SO10 and SO11) and 3 units of Taq DNA polymerase (Stratagene, USA).
  • The reaction mixture is covered with 50 µl of mineral oil and incubated for 30 cycles (94°C-30 s, 56°C-30 s, 72°C-3 min) followed by a final extension cycle at 72°C for 7 min.
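As a rough planning aid for cycling programmes such as the one above, the hold times can be summed. This is only a minimal sketch: the function name is illustrative, and ramp times between temperatures (which depend on the thermal cycler) are assumed to be zero, so the result is a lower bound on the run time.

```python
from typing import Sequence


def program_minutes(cycles: int, step_seconds: Sequence[int],
                    final_extension_seconds: int) -> float:
    """Total hold time of a PCR programme in minutes (ramp times ignored)."""
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    total_seconds = cycles * sum(step_seconds) + final_extension_seconds
    return total_seconds / 60


# Inverse PCR No. 1 as described in the text:
# 30 cycles of 94°C-30 s, 56°C-30 s, 72°C-3 min, then 7 min at 72°C.
print(program_minutes(30, (30, 30, 180), 7 * 60))  # 127.0
```

The same helper can be reused for the other programmes in the description by changing the cycle count and the step durations.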
  • The amplified DNA fragments are then analysed by molecular hybridization (Southern, J. Mol. Biol. 98, 503-517, 1975): they are separated by electrophoresis on a 1% agarose gel stained with ethidium bromide and then transferred in the presence of 0.4 N NaOH for 12 h onto a positively charged Nylon membrane (Boehringer Mannheim, DE). After the transfer, the membrane is baked for 15 min at
  • The membrane is then hybridized at 37°C for 10 h in the presence of the synthetic oligonucleotide SO12 (10 pmol/ml), having the nucleic sequence SEQ ID NO: 9 and labelled at its 5' end with a digoxigenin radical.
  • The filters are washed in the presence of tetramethylammonium chloride according to the protocol defined by Wood et al., 1985, and then subjected to immunological detection in the presence of CSPD (Tropix, USA), according to the recommendations provided in the DIG luminescent detection kit (Boehringer Mannheim, DE).
  • 3 µl of the DNA thus purified are treated in the presence of native Pfu DNA polymerase (Stratagene, USA) in order to convert its cohesive ends to blunt ends.
  • This reaction is carried out in a final volume of 10 µl containing 10 mM KCl, 6 mM (NH4)2SO4, 20 mM Tris-HCl, pH 8.0, 0.1% Triton X-100, 2 mM MgCl2, 1 mM of each dNTP and 10 µg/ml BSA. The reaction mixture is covered with 50 µl of mineral oil and incubated for 30 min at 72°C, and then 1 µl of this reaction mixture is used directly in the ligation reaction with the vector pCR-Script SK(+).
  • This ligation mixture (10 µl) is used to transform the E. coli strain XL1-Blue MRF' (Stratagene, USA).
  • The bacteria containing the recombinant vectors are selected on dishes with LB medium containing 20 µg/ml of ampicillin, 80 µg/ml of methicillin and in the presence of IPTG and X-Gal (Sambrook et al., 1989).
  • This partial sequence of the CSP1 gene shows the presence of two introns of identical size (111 bp), located respectively between nucleotides 2811-2921 for the first and nucleotides 3239-3349 for the second in the nucleic sequence SEQ ID NO: 3. These two introns are smaller than those observed, for example, in Arabidopsis thaliana, but they are located at the same positions as those observed in this plant (Pang et al., Plant Mol. Biol. 11, 805-820).
  • nucleic sequences located upstream of the Hindi site position 1763 of the nucleic sequence SEQ ID NO : 3
  • Another inverse PCR reaction is carried out using, this time, the synthetic oligonucleotides SO16 and SO17, deduced from the sequence previously cloned into the plasmid pCSPP1 and having respectively the nucleic sequences SEQ ID NO: 10 and SEQ ID NO: 11.
  • This inverse PCR reaction is carried out under conditions identical to those described for the inverse PCR reaction No. 1, with the exception of the following parameters: the annealing of the oligonucleotides was carried out at 57°C and 35 polymerization cycles were performed.
  • The PCR products are analysed by molecular hybridization after having been separated on an electrophoresis gel and transferred onto a Nylon membrane. This membrane is then prehybridized for 4 h at 65°C in a solution containing 6x SSC, 1x Denhardt's solution (0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% BSA fraction IV) and 50 µg/ml of denatured salmon sperm DNA, and then hybridized for 10 h at 65°C in the same solution with the probe SO1016.
  • This probe is synthesized by PCR using the synthetic oligonucleotides SO10 and SO16 described above, in the presence of 0.1 ng of vector pCSPP1, in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 µM of each oligonucleotide (SO10 and SO16) and 3 units of Taq DNA polymerase (Stratagene, USA).
  • The reaction mixture is covered with 50 µl of mineral oil and incubated for 30 cycles (94°C-30 s, 46°C-30 s, 72°C-2 min) followed by a final extension cycle at 72°C for 7 min.
  • The fragment obtained after amplification (698 bp) is purified on a Microcon 100 cartridge (Amicon, USA) and 50 ng of this fragment are labelled by random primer extension with 50 µCi of [α-32P]dCTP according to the Megaprime kit (Amersham, UK) protocol.
  • The membrane is washed three times for 30 min at 65°C in the presence, successively, of 2x SSC-0.1% SDS, 1x SSC-0.1% SDS and 0.1x SSC-0.1% SDS, and it is analysed by autoradiography so as to detect a DNA fragment of about 1 kb which binds the probe SO1016.
  • This DNA, derived from the inverse PCR reaction on the genomic DNA initially digested with the restriction enzyme NdeI, is then treated with Pfu DNA polymerase and ligated into the vector pCR-Script (SK+) as described above.
  • This ligation is then used to transform the E. coli strain XL1-Blue MRF' and the transformants are selected and analysed by molecular hybridization with the probe SO1016 according to the conditions described above.
  • This screening makes it possible to isolate a positive clone harbouring the vector pCSPP2. As expected, this vector results from the cloning into the SrfI site of the vector pCR-Script
  • The probe used is deduced from the sequence of coffee nuclear DNA cloned into the vector pCSPP2 and is synthesized by PCR using the oligonucleotide SO17 described above and the oligonucleotide SO20 having the nucleic sequence SEQ ID NO: 12.
  • This reaction is carried out in the presence of 0.1 ng of vector pCSPP2, under conditions identical to those used for the synthesis of the probe SO1016, with the exception of the annealing temperature of the oligonucleotides, which is 50°C.
  • the fragment obtained after amplification (262 bp) is labelled as described above and it is used as probe to test the inverse PCR reactions No. 2.
  • the Nylon membrane used during the screening of the products of the inverse PCR reaction No. 2 with the probe SO1016 is dehybridized by two washes for 15 min at 37°C in the presence of 0.2 N NaOH-0.1% SDS (w/v), then it is prehybridized and it is hybridized as described above with the probe SO1720.
  • The transformants are then selected and screened by molecular hybridization with the probe SO1720. It is thus possible to isolate a positive clone harbouring the vector pCSPP3, which results from the cloning into the SrfI site of the vector pCR-Script (SK+) of the DNA fragment previously identified by hybridization.
  • the latter which corresponds to the DNA segment between nucleotides 1 and 1886 of the nucleic sequence SEQ ID NO: 3, bordered at each end by a raJ restriction site, is sequenced. It therefore contains 1513 base pairs in addition upstream of the genomic DNA fragment cloned into the vector pCSPP2.
  • The inverse PCR experiments form chimeric linear molecules by combining noncontiguous DNA fragments in the genome with each other (Ochman et al., 1988). Moreover, measurements of mutation frequency show that the Pfu DNA polymerase is approximately twelve times more accurate than Taq DNA polymerase, which reduces the probability of point mutations during PCR amplifications (Lundberg et al., Gene 108, 1-6, 1991).
  • A PCR reaction is carried out on the native genomic DNA of C. arabica, Caturra variety, in the presence of Pfu DNA polymerase.
  • This reaction is carried out in the presence of 10 ng of genomic DNA, in a final volume of 50 µl containing 10 mM KCl, 6 mM (NH4)2SO4, 20 mM Tris-HCl, pH 8.0, 0.1% Triton X-100, 2 mM MgCl2, 10 µg/ml BSA, 0.2 mM of each dNTP, 0.25 µM of each of the oligonucleotides SO10 and SO20 described above and 3 units of Pfu DNA polymerase.
  • The oligonucleotide SO10 is located on the antisense strand of the nucleic sequence SEQ ID NO: 3, between nucleotides 2512 and 2534, whereas the oligonucleotide SO20 is located on the sense strand of the nucleic sequence SEQ ID NO: 3, between nucleotides 1565 and 1584.
  • The reaction mixture is then covered with 50 µl of mineral oil and incubated for 45 cycles (94°C-30 s, 50°C-30 s, 72°C-3 min) followed by a final extension cycle at 72°C for 7 min.
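The size of the product expected from native genomic DNA follows from the primer coordinates quoted above. The sketch below is an illustration only, assuming that the coordinates are 1-based and inclusive on SEQ ID NO: 3 and that neither primer carries a 5' extension; the helper name is hypothetical.

```python
def amplicon_length(sense_primer: tuple, antisense_primer: tuple) -> int:
    """Size in bp of the product spanned by a sense and an antisense primer.

    Each primer is given as (first, last) nucleotide positions, 1-based and
    inclusive, on the same reference sequence.
    """
    sense_start, sense_end = sense_primer
    antisense_start, antisense_end = antisense_primer
    if antisense_start <= sense_end:
        raise ValueError("antisense primer must lie downstream of the sense primer")
    return antisense_end - sense_start + 1


SO20 = (1565, 1584)  # sense strand of SEQ ID NO: 3, per the text
SO10 = (2512, 2534)  # antisense strand of SEQ ID NO: 3, per the text

print(amplicon_length(SO20, SO10))  # 970
```

Under these assumptions the amplified fragment would span nucleotides 1565 to 2534, that is about 970 bp.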
  • sequences located upstream of the site of initiation of translation are analysed in order to test their capacity for regulating the expression of the reporter gene uidA in the beans of transformed plants.
  • This vector contains the reporter gene uidA, which encodes β-glucuronidase (GUS), and the bacterial gene nptII, which encodes neomycin phosphotransferase.
  • GUS ⁇ -glucuronidase
  • nptll bacterial gene nptll
  • the latter confers resistance to kanamycin in the transformed plants.
  • The vector pBI101 is digested with the restriction enzyme BamHI and dephosphorylated by treatment with calf alkaline phosphatase (Promega, USA) according to the protocol defined by the supplier.
  • DNA fragments of different sizes, obtained by PCR in the presence of the vector pCSPP4, Pfu DNA polymerase and two synthetic oligonucleotides each containing at its 5' end the nucleic sequence SEQ ID NO: 13, are cloned into the vector pBI101.
  • This sequence comprises a BamHI restriction site which allows the cloning of the PCR products into the vector pBI101 linearized with the same enzyme.
  • Two oligonucleotides are used: on the one hand, a synthetic oligonucleotide capable of binding to the promoter and, on the other hand, the oligonucleotide BAGUS, which has the nucleic sequence SEQ ID NO: 14.
  • The use of the latter allows, after digestion of the PCR products with the restriction enzyme BamHI, a translational fusion to be obtained between the first 5 amino acids of the storage protein αβ of the coffee bean and the N-terminal end of β-glucuronidase.
  • The PCR reaction is carried out with 5 ng of plasmid pCSPP4, in a volume of 50 µl containing 10 mM KCl, 6 mM
  • The reaction mixture is covered with 50 µl of mineral oil and incubated for 30 cycles (94°C-30 s, 55°C-30 s, 72°C-2 min) followed by a final extension cycle at 72°C for 7 min.
  • The PCR fragment of about 950 bp is purified on a Microcon 100 cartridge (Amicon, USA), digested for 12 h at 37°C with BamHI (Promega, USA) and ligated into the linearized vector pBI101 in the presence of T4 DNA ligase (Promega, USA), according to the recommendations provided by the supplier.
  • The E. coli strain XL1-Blue MRF' is transformed with the entire ligation mixture.
  • the plasmids are independently extracted from several transformants and they are sequenced so as to determine the orientation of the PCR fragment in the binary vector. This analysis thus makes it possible to select the plasmid pCSPP5.
  • The construction of this vector is carried out as described for the vector pCSPP5, except that the oligonucleotide UP210 is replaced with the oligonucleotide UP211, which has the nucleic sequence SEQ ID NO: 16.
  • The cloning of the PCR product (about 700 bp), correctly oriented in the vector pBI101, gives the vector pCSPP6.
  • The construction of this vector is carried out as described for the vector pCSPP5, except that the oligonucleotide UP210 is replaced with the oligonucleotide UP212, which has the nucleic sequence SEQ ID NO: 17.
  • The cloning of the PCR product (450 bp), correctly oriented in the vector pBI101, gives the vector pCSPP7.
  • The construction of this vector is carried out as described for the vector pCSPP5, except that the oligonucleotide UP210 is replaced with the oligonucleotide UP213, which has the nucleic sequence SEQ ID NO: 18.
  • The cloning of the PCR product (250 bp), correctly oriented in the vector pBI101, gives the vector pCSPP8.
  • The vectors described above (pCSPP5-8), as well as the plasmids pBI101 and pBI121 (Clontech), are independently introduced into the disarmed Agrobacterium tumefaciens strain C58pMP910 (Koncz and Schell, Mol. Gen. Genet. 204, 383-396, 1986) according to the direct transformation method described by An et al. (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A3, 1-19, 1993).
  • The recombinant Agrobacterium tumefaciens clones are selected on LB medium supplemented with kanamycin (50 µg/ml) and rifampicin (50 µg/ml).
  • the gene uidA is silent because it lacks a promoter.
  • this same gene is expressed in plants transformed with the vector pBI121 because it is under the control of the constitutive CaMV 35S promoter (Jefferson et al., EMBO J. 6, 3901-3907, 1987).
  • These two plasmids were used respectively as negative and positive controls for the expression of the reporter gene uidA .
  • The transformation of Nicotiana tabacum var. XHFD8 is carried out with the vectors described above (pCSPP5-8, pBI101 and pBI121), according to the protocol described by Horsch et al. (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A5, 1-9, 1993). To do this, foliar discs of plantlets germinated in vitro are incubated for about 2 min with a transformed stationary phase culture of Agrobacterium tumefaciens diluted in a 0.9% NaCl solution so as to obtain an OD measurement at 600 nm of between 0.2 and 0.3.
  • MS-stem medium (MS salts 4.3 g/l, sucrose 30 g/l, agar 8 g/l, myoinositol 100 mg/l, thiamine 10 mg/l, nicotinic acid 1 mg/l, pyridoxine 1 mg/l, naphthaleneacetic acid (NAA) 0.1 mg/l, benzyladenine (BA) 1 mg/l) (Murashige and Skoog, Physiol. Plant. 15, 473-497, 1962).
  • The discs are transferred onto MS medium supplemented with kanamycin (100 µg/ml) and with cefotaxime (400 µg/ml) in order to multiply the transformed cells so as to obtain calli. These discs are then subcultured every week on fresh "MS-stem" medium.
  • The buds which germinate are cut from the calli and subcultured on standard MS medium, that is to say an MS medium free of phytohormones, supplemented with kanamycin (100 µg/ml) and with cefotaxime (200 µg/ml).
  • the plantlets are transplanted into earthenware pots in a substrate composed of peat and compost and then grown in a greenhouse at a temperature of 25°C and with a photoperiod of 16 h.
  • 30 plantlets (R0 generation) are selected. All these plantlets proved to be morphologically normal and fertile. They were selfed and gave seeds (R1 generation).
  • X Analysis of the genomic DNA of tobacco plants transformed with Agrobacterium tumefaciens
  • The genomic DNA of transgenic tobacco plants is extracted from the leaves according to the protocol described by Rogers and Bendich (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A6, 1-11, 1993) and then analysed by PCR and by molecular hybridization, according to the Southern-Blot technique.
  • The PCR reactions are carried out with 10 ng of DNA in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 3 units of Taq DNA polymerase, 0.25 µM of the oligonucleotide BI104, having the sequence SEQ ID NO: 19 described in the sequence listing hereinafter, and 0.25 µM of the oligonucleotide BI105, having the sequence SEQ ID NO: 20 described in the sequence listing hereinafter.
  • The oligonucleotide BI104 is located 27 bp downstream of the BamHI site of the plasmid pBI101 and the oligonucleotide BI105 is located 73 bp upstream of the BamHI site of the plasmid pBI101.
  • The PCR reactions are carried out over 30 cycles (94°C-30 s, 54°C-30 s, 72°C-2 min) followed by a cycle of 7 min at 72°C (final extension).
  • The DNA fragments amplified from transgenic tobacco plants transformed with the plasmids pBI101 (negative control), pBI121 (positive control), pCSPP5, pCSPP6, pCSPP7 and pCSPP8 have sizes of about 280 bp, 1030 bp, 1230 bp, 980 bp, 730 bp and 430 bp respectively. In all cases, it is concluded that the fragment initially cloned upstream of the reporter gene uidA is intact.
  • The probe uidA is synthesized by PCR using the synthetic oligonucleotide GMP1, having the sequence SEQ ID NO: 21 described in the sequence listing hereinafter, and the synthetic oligonucleotide GMP2, having the sequence SEQ ID NO: 22 described in the sequence listing hereinafter, in the presence of 0.1 ng of vector pBI101, in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 µM of each oligonucleotide and 3 units of Taq DNA polymerase (Stratagene, USA).
  • The reaction mixture is covered with 50 µl of mineral oil and incubated for 30 cycles (94°C-30 s, 46°C-30 s, 72°C-2 min) followed by a cycle at 72°C for 7 min.
  • The probe nptII is synthesized by PCR using the synthetic oligonucleotide NPTII-1, having the sequence SEQ ID NO: 23 described in the sequence listing hereinafter, and the synthetic oligonucleotide NPTII-2, having the sequence SEQ ID NO: 24 described in the sequence listing hereinafter, in the presence of 0.1 ng of vector pBI101, in a final volume of 50 µl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 µM of each oligonucleotide and 3 units of Taq DNA polymerase (Stratagene, USA).
  • The hybridization profiles obtained for each probe are then compared so as to select the tobacco plants transformed with Agrobacterium tumefaciens which have integrated into their genome a single non-rearranged copy of the T-DNA.
  • The selection of these plants is also confirmed by the results of the analysis of the segregation of the kanamycin-resistance character, after germination in vitro on standard MS medium of the R1 seeds of these plants. Indeed, in this case, a 3/4-1/4 segregation of the kanamycin-resistance character is observed, which is compatible with the integration of the T-DNA at a single locus of the nuclear DNA.
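Whether an observed segregation is compatible with the 3/4-1/4 ratio is commonly assessed with a chi-square goodness-of-fit test. The description does not say which test, if any, was used, so the sketch below is an assumption, and the seedling counts are hypothetical values chosen for illustration.

```python
# Critical value of the chi-square distribution for 1 degree of freedom
# at the 5% significance level.
CHI2_CRITICAL_5PCT_DF1 = 3.841


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic for a 3:1 (resistant:sensitive) expectation."""
    total = resistant + sensitive
    if total <= 0:
        raise ValueError("at least one seedling must be scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


# Hypothetical R1 seedling counts on kanamycin medium (not from the patent).
chi2 = chi_square_3_to_1(resistant=148, sensitive=52)
print(f"chi-square = {chi2:.3f}")
print("compatible with 3:1" if chi2 < CHI2_CRITICAL_5PCT_DF1 else "deviates from 3:1")
```

A statistic below the critical value means the counts do not contradict a single-locus insertion; two unlinked insertions would instead be expected to segregate 15:1.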
  • The measurements of the GUS activity are therefore carried out on the leaves and the seeds according to the method described by Jefferson et al. (1987), using MUG (methyl umbelliferyl glucuronide) as substrate and by measuring, by fluorimetry, the appearance of MU (methylumbelliferone).
  • The cellular debris is removed by centrifugation for 15 min at 4°C and the soluble proteins in the supernatant are quantified by the Bradford method (Anal. Biochem. 72, 248-254, 1976) according to the protocol defined by Bio-Rad (USA), using BSA as standard.
  • The measurements of GUS activity are carried out in microtitre plates incubated at 37°C, using 1 µg of soluble proteins in 150 µl of reaction buffer (extraction buffer with 1 mM MUG).
  • The measurements of fluorescence, expressed in pmol MU/min/mg of proteins, are carried out at an excitation wavelength of 365 nm and an emission wavelength of 455 nm (Fluoroskan II, Labsystem).
  • The maximum expression of the uidA gene is obtained with the vectors pCSPP5 and pCSPP6, reaching on average 465 nmol of MU/min/mg of protein. From this observation, it can be concluded that the DNA fragment between nucleotides 1572 (5' end of the sequence SEQ ID NO: 15) and 1815 (5' end of the sequence SEQ ID NO: 16) of the sequence SEQ ID NO: 3 contains no sequence which is critical for the functioning of the coffee promoter. The most substantial deletions made in the promoter (corresponding to the vectors pCSPP7 and pCSPP8) reduce the level of expression of the uidA reporter gene, and the reduction is greater the more substantial the deletion.
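The specific activities discussed above are rates of MU formation normalised to the amount of protein in the assay. The sketch below shows one way such a value could be derived from a fluorimetric time course; it assumes that fluorescence readings have already been converted to pmol of MU with a standard curve, and the data points and function names are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def gus_specific_activity(times_min, mu_pmol, protein_ug):
    """GUS activity in pmol MU/min/mg protein from an MU time course."""
    rate_pmol_per_min = least_squares_slope(times_min, mu_pmol)
    protein_mg = protein_ug / 1000.0
    return rate_pmol_per_min / protein_mg


# Hypothetical time course for one well containing 1 µg of soluble protein.
times_min = [0, 10, 20, 30]
mu_pmol = [0.0, 5.0, 10.0, 15.0]

activity = gus_specific_activity(times_min, mu_pmol, protein_ug=1.0)
print(f"{activity:.0f} pmol MU/min/mg protein")  # 500
```

Fitting a slope over several time points, rather than using a single end point, makes it easier to spot wells in which the reaction is no longer linear.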
  • A PCR amplification of the DNA sequence between nucleotides 108 and 1517 of the sequence SEQ ID NO: 1 is carried out with the aid of the oligonucleotide TAG1, having the sequence SEQ ID NO: 25, and the oligonucleotide TAG2, having the sequence SEQ ID NO: 26.
  • This reaction is carried out in the presence of 50 ng of vector pCSP2, in a final volume of 100 µl containing 1.5 units of Pwo DNA polymerase (Boehringer Mannheim), 10 µl of 10X Pwo DNA polymerase buffer (Boehringer Mannheim), 0.1 mM of each dNTP and 2 nM of each oligonucleotide, TAG1 and TAG2.
  • the reaction mixture is incubated for 30 cycles (94°C-30 s, 40°C-60 s, 72°C- 2 min) followed by a final extension cycle at 72°C for 7 min.
  • The use of the expression vector pQE31 makes it possible to introduce 6 histidines (6 His tag) in frame with the N-terminal end of the coffee 11S storage protein, which then facilitates the purification of this recombinant protein by passing it over a column of Ni-NTA resin containing Ni2+ ions (Hochuli et al., J. Chromatography 411, 177-184, 1987).
  • The ligation mixture is used to transform competent cells of the strain M15[pREP4] of Escherichia coli according to the recommendations provided by Qiagen (USA) and the recombinant bacteria are selected on dishes with LB medium containing 25 µg/ml of kanamycin and 100 µg/ml of ampicillin.
  • the induction is carried out by adding IPTG to a final concentration of 1 mM into the culture medium and culture samples are collected every 30 minutes.
  • The bacteria are lysed and the soluble proteins are extracted from Escherichia coli under denaturing conditions. These proteins are then separated on a column of Ni-NTA resin following the protocol defined by QIAGEN (QIAexpress system). The protein fractions successively eluted are then analysed by SDS-PAGE electrophoresis. It is thus shown that the only protein capable of binding to the Ni-NTA column corresponds to the coffee 11S recombinant protein. This protein is expressed in Escherichia coli with an approximate molecular weight of 55 kDa, which is in agreement with that observed in coffee beans for the storage protein in its precursor form, taking into consideration the protein sequence modifications which were made during the construction of the expression vector.
  • GTT GAC CTC AAA ATA ATA CAS AAA TTG AAC CGT CCS AAA GAT CAA AGG 821 Val Asp Leu Lys Ile Ile Gln Lys Leu Lys Gly Pro Lys Asp Gln Arg 250 255 260
  • AGC AAT GCC ATT TTT GCA CCA CAC TGG AAT ATC AAT GCA CAT AAT GCC 1157 Ser Asn Ala Ile Phe Ala Pro His Trp Asn Ile Asn Ala His Asn Ala 360 365 370 375
  • AGC TCT TTC CAA ATT TCC AGC GAG GAA GCT GAG GAA TTG AAG TAT GGA 1445 Ser Ser Phe Gln Ile Ser Ser Glu Glu Ala Glu Glu Leu Lys Tyr Gly 460 465 470
  • CTACCAACTA TATGTGTGAA TCTAATTCCA AAATAAAATG GTCAATGGAT GTAAAGACAT 1608
  • MOLECULE TYPE DNA (genomic)
  • AGAACCATCC TTCAGGTTCC CATCAGAGGC TGGTTTAACT GAATTCTGGG ATTCTAATAA 2700
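The sequence-listing excerpts above pair each codon with a three-letter residue, and such pairings can be checked by translating the codons with the standard genetic code. The sketch below is an illustration rather than part of the patent: the function name is hypothetical, and only the two excerpts whose codons are free of ambiguous characters are translated.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY*"
THREE_LETTER = dict(zip(
    ONE_LETTER,
    "Ala Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr "
    "Val Trp Tyr ***".split(),
))


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into three-letter residues."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return " ".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


# Codons ending at nucleotide 1157 of SEQ ID NO: 1 (residues 360 to 375).
print(translate("AGC AAT GCC ATT TTT GCA CCA CAC TGG AAT ATC AAT GCA CAT AAT GCC"))
# Codons ending at nucleotide 1445 of SEQ ID NO: 1.
print(translate("AGC TCT TTC CAA ATT TCC AGC GAG GAA GCT GAG GAA TTG AAG TAT GGA"))
```

A codon containing a character outside A, C, G and T raises a `KeyError`, which is a convenient way to flag transcription damage in a listing.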

Abstract

The subject of the present invention is proteins derived from the coffee bean, and DNAs encoding and regulating the expression of at least one of these proteins.

Description

COFFEE STORAGE PROTEINS
The subject of the present invention is proteins derived from the coffee bean, and DNAs encoding and regulating the expression of at least one of these proteins.
STATE OF THE ART:
It is known that numerous plants are capable of producing, in their embryos, in their tubers and in particular in their seeds, storage proteins during their growth. These storage proteins play an important role, in particular, in the storage of amino acids for germination of the seed. They are also important for the structure and the amino acid content of the seed.
Some of these proteins have been isolated and, in some cases, have been expressed in host plants.
Thus, EP 0,295,959 demonstrates, in particular, the expression, in a host plant, of the DNA derived from Bertholletia excelsa H.B.K. (brazil nut) encoding at least one subunit of the storage protein called 2S.
Furthermore, WO 9119801 demonstrates the existence of two storage proteins derived from Theobroma cacao, their precursor and the genes encoding these proteins.
However, up until now, no storage protein derived from the coffee bean and no sequence capable of regulating the transcription of these proteins is known. Yet it would be very useful to have available sequences of such proteins, in particular in order to modify the original production of the storage proteins in the coffee bean. Furthermore, it would also be very useful to have available a sequence capable of regulating the transcription of such proteins, so as to allow, in particular, the expression, in the coffee bean, of a protein encoded by a gene of interest.
The aim of the present invention is to respond to these needs.
SUMMARY OF THE INVENTION:
To this effect, the present invention relates to any DNA derived from the coffee bean, encoding at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
The present invention relates to any storage protein derived from the coffee bean, having at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
Another subject of the present invention relates to all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, capable of regulating the transcription of the storage proteins according to the invention, as well as the use of all or part of this DNA to direct the expression of genes of interest in plants, in particular in the coffee tree.
The present invention also relates to the use of all or part of the DNA delimited by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 or of its complementary strand, of at least 10 bp, to carry out a PCR or as probe to detect in vitro or to inactivate in vivo a coffee bean gene encoding a storage protein.
Furthermore, the invention relates to any recombinant plant cell capable of expressing a recombinant storage protein according to the invention. Finally, the present invention relates to any food, cosmetic or pharmaceutical product comprising all or part of the DNA or of the recombinant proteins according to the invention.
The present invention therefore opens the possibility of using all or part of the DNA according to the invention so as to modify the original production of the storage proteins in the coffee bean. It is therefore possible in particular to envisage overexpressing or underexpressing all or part of the DNA according to the invention in the coffee bean.
DETAILED DESCRIPTION OF THE INVENTION:
For the purposes of the present invention, "homologous nucleic sequence" is understood to mean any nucleic sequence differing from the nucleic sequences according to the invention only in the substitution, deletion and/or insertion of a small number of base pairs. In this context, two nucleic sequences which, because of the degeneracy of the genetic code, encode the same protein will in particular be considered homologous. A sequence which exhibits more than 70% homology with the nucleic sequence according to the invention will also be considered homologous. In the latter case, the homology is determined by the ratio between the number of base pairs of a homologous sequence and that of a nucleic sequence according to the invention.
Furthermore, for the purposes of the present invention, homologous nucleic sequence is also understood to mean a sequence which hybridizes under stringent conditions, that is to say any nucleic sequence capable of hybridizing to the nucleic sequences according to the present invention by the Southern-Blot method, so as to avoid non-specific hybridizations or hybridizations which are not very stable (Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, USA, 1989, chapter 9.31 to 9.51).
Finally, for the purposes of the present invention, "homologous amino acid sequence" is understood to mean any amino acid sequence differing from the amino acid sequences according to the present invention only in the substitution, insertion and/or deletion of at least one amino acid. A sequence which exhibits more than 50% homology with the amino acid sequence according to the invention will also be considered homologous. In the latter case, the homology is determined by the ratio between the number of amino acids of a homologous sequence and that of an amino acid sequence according to the invention.
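The 70% and 50% thresholds above can be illustrated with a small calculation. The text defines homology only as a ratio, so the sketch below rests on an assumption: it treats homology as the percentage of identical positions between two pre-aligned sequences of equal length. The function names and the example sequences are hypothetical.

```python
NUCLEIC_THRESHOLD = 70.0     # percent, per the definition for nucleic sequences
AMINO_ACID_THRESHOLD = 50.0  # percent, per the definition for amino acid sequences


def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Gap characters ("-") never count as matches.
    """
    if not reference:
        raise ValueError("reference sequence is empty")
    if len(candidate) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(candidate.upper(), reference.upper())
    )
    return 100.0 * matches / len(reference)


def is_homologous(candidate: str, reference: str, threshold: float) -> bool:
    """True when the identity strictly exceeds the given threshold."""
    return percent_identity(candidate, reference) > threshold


# Hypothetical 10-nucleotide example (not taken from the sequence listing).
reference = "ATGGCTCATT"
candidate = "ATGGCACATA"
print(percent_identity(candidate, reference))                  # 80.0
print(is_homologous(candidate, reference, NUCLEIC_THRESHOLD))  # True
```

In practice the alignment step itself would be performed with a dedicated alignment tool before any such percentage is computed.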
In the remainder of the description, the sequences SEQ ID NO: refer to the sequences presented in the sequence listing hereinafter. The synthetic oligonucleotides SEQ ID NO: 5 to SEQ ID NO: 18, which are mentioned in the description and presented in the sequence listing hereinafter, are provided by Genset SA, 1 passage Delaunay, 75011 Paris, France.
Storage proteins are present only in the coffee bean and are highly expressed in the endosperm. In the ripe coffee bean, they represent nearly 50% of the total proteins and play a major role in the maturation of the coffee bean. These proteins influence in particular the structure and the density of the coffee bean as well as its amino acid content. They also play a major role in the storage of amino acids for the germination of the bean.
It is possible to isolate the DNA encoding, as well as the DNA regulating the expression of the storage proteins of the coffee bean by carrying out a conventional inverse PCR starting with nucleic primers derived from the nucleic sequences SEQ ID NO: 1 and SEQ ID NO: 3. Persons skilled in the art are indeed capable of choosing the primers which are most suitable for carrying out this PCR, for example.
To this effect, a DNA encoding at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2 has been isolated from the coffee bean.
Preferably, said DNA encodes at least one protein derived from the coffee bean, chosen from the group comprising the storage protein αβ, having the amino acid sequence SEQ ID NO: 2, the cleavage protein α, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 1 to 304, the cleavage protein β, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 305 to 492, or any nucleic sequences homologous to these sequences.
The invention thus relates to the DNA delimited by nucleotides 33 to 1508 in the nucleic sequence SEQ ID NO: 1 encoding the storage protein αβ, or any nucleic sequence homologous to this sequence. In particular, the invention relates to the DNA comprising at least, in the nucleic sequence SEQ ID NO: 1, nucleotides 33 to 944 encoding the cleavage protein α and/or nucleotides 945 to 1508 encoding the cleavage protein β.
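The nucleotide coordinates given above can be checked for internal consistency. The short sketch below uses only the positions stated in this description and computes, for each segment, its length and the number of complete codons it contains; the function and variable names are illustrative assumptions.

```python
def segment_length(first: int, last: int) -> int:
    """Length of a segment given 1-based, inclusive coordinates."""
    return last - first + 1


# Coordinates in SEQ ID NO: 1 as stated in the description.
segments = {
    "alpha-beta": (33, 1508),
    "alpha": (33, 944),
    "beta": (945, 1508),
}

codon_counts = {}
for name, (first, last) in segments.items():
    nt = segment_length(first, last)
    codons, remainder = divmod(nt, 3)
    codon_counts[name] = codons
    print(f"{name}: {nt} nt = {codons} codons (remainder {remainder})")
```

Under these coordinates the three segments contain 492, 304 and 188 complete codons, which agrees with the amino acid ranges 1 to 492, 1 to 304 and 305 to 492 of SEQ ID NO: 2.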
The present invention also relates to the use of all or part of the DNA delimited by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1, or of its complementary strand, of at least 10 bp, as primer to carry out a PCR or as probe to detect in vitro or to modify in vivo the expression of at least one coffee bean gene encoding at least one storage protein.
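To make the notion of a primer or probe taken from either strand concrete, the following sketch enumerates every window of a chosen length (10 bases or more) in a template and reports it together with its reverse complement. The template shown is a hypothetical string invented for the example, not an excerpt of SEQ ID NO: 1, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_oligos(template: str, length: int = 10):
    """Yield (1-based start, sense oligo, antisense oligo) for each window."""
    if length < 10:
        raise ValueError("the description requires at least 10 bp")
    for start in range(len(template) - length + 1):
        window = template[start:start + length]
        yield start + 1, window, reverse_complement(window)


# Hypothetical 18-nt template, for illustration only.
template = "ATGGCTCACTCTCAGATG"
for start, sense, antisense in candidate_oligos(template):
    print(start, sense, antisense)
```

In practice, selection among such candidates would also weigh melting temperature, specificity and secondary structure, none of which is modelled here.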
The DNA according to the present invention may be advantageously used to express at least one recombinant storage protein, derived from the coffee bean, in a host plant or microorganism. To this effect, it is possible to clone all or part of the nucleic sequence SEQ ID NO: 1 delimited by nucleotides 33 to 1508 into an expression vector downstream of a promoter, or of a promoter and a signal sequence, and upstream of a terminator, while preserving the reading frame, then the said vector may be introduced into a plant, a yeast or bacterium, for example. Specific examples of application are presented hereinafter.
Furthermore, all or part of the DNA delimited by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 may be advantageously used in the coffee bean in a form which is modified by mutagenesis so as to modify the original production of storage proteins in the coffee bean and thus to modify the organoleptic quality of the coffee bean.
The invention also relates to the storage protein αβ, having the amino acid sequence SEQ ID NO: 2, the cleavage protein α having the sequence delimited by amino acids 1 to 304 of the amino acid sequence SEQ ID NO: 2 and the cleavage protein β having the sequence delimited by amino acids 305 to 492 of the amino acid sequence SEQ ID NO: 2, or any amino acid sequence which is homologous thereto.
It has been demonstrated that the storage proteins derived from the coffee bean are synthesized as a large precursor, the storage protein αβ, which is cleaved into two proteins, the cleavage protein α and the cleavage protein β. The cleavage proteins α and β can recombine in a polymerized form through at least one disulphide bridge. Indeed, it has been possible to isolate in the endosperm of the coffee bean polymerized forms of the cleavage proteins α and/or β and/or of their homologous sequences. To this effect, the present invention also relates to the polymerized form of the recombinant storage proteins αβ, α and/or β, as well as their homologous sequences.
Another subject of the present invention relates to all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, capable of regulating the expression of the storage protein having the amino acid sequence SEQ ID NO: 2.
The invention also relates to the use of all or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, to allow the expression, in the coffee bean or in a heterologous plant, of the storage protein αβ encoded by nucleotides 33 to 1508 of the nucleic sequence SEQ ID NO: 1 or of a protein encoded by a gene of interest.
The DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3 may be advantageously used by fusing it, completely or partially, with a gene of interest, while preserving the reading frame, and then by cloning the whole into an expression vector which is introduced into coffee, so as to allow the expression of the protein encoded by this gene in the coffee bean.
The invention also covers all the food, cosmetic or pharmaceutical products comprising all or part of the DNA, or of the recombinant proteins according to the invention. Persons skilled in the art are indeed capable, by means of oligonucleotide probes or of appropriate antibodies, of detecting their presence in very low quantities.
The storage proteins derived from the coffee bean, the DNA derived from the coffee bean encoding at least one of these proteins, as well as the DNA capable of regulating their transcription, according to the present invention, are characterized in greater detail with the aid of biochemical and molecular analyses hereinafter.
I. Identification of the storage proteins of the coffee bean
The total proteins are extracted from ripe fruits of Coffea arabica of the Caturra variety.
To do this, the maternal tissues are separated from the coffee beans, which are rapidly ground in liquid nitrogen and then reduced to a powder according to the method of Damerval et al. (Electrophoresis 7, 52-54, 1986). The coffee proteins are then extracted from 10 mg of this powder, which is solubilized in 100 μl of a solution containing 3% w/v of CHAPS, 8.5 M urea, 0.15% w/v of DTT and 3% v/v of ampholyte support pH 3-10.
The mixture is then centrifuged at 13,000 g for 5 min and the supernatant which contains the total proteins of the coffee beans is recovered.
A one-dimensional electrophoresis based on a pH gradient is performed on this supernatant using, for example, the Multiphore system (Pharmacia Biotech AB, Björkgatan 30, 75182 Uppsala, Sweden). To do this, 50 μl are deposited per electrophoresis gel.
To separate the total proteins according to their molecular weights, a second, SDS-PAGE electrophoresis is then performed on the gels derived from the first electrophoresis, using, for example, Bio-Rad equipment (Bio-Rad Laboratories, 2000 Alfred Nobel Drive, Hercules, California 94547 USA) under standard conditions, according to the Laemmli method (Nature, 277, 680-688, 1970). To do this, the gels derived from the one-dimensional electrophoresis are equilibrated with 5 ml/gel of Tris buffer containing 6 M urea, 30% v/v of glycerol, 2% w/v of SDS, 2% w/v of DTT and 2.5% w/v of iodoacetamide. They are then placed on the gels of the second, SDS-PAGE electrophoresis, and the migration of the proteins is carried out in the Bio-Rad equipment at 40 mA and at a temperature of 12°C for 9 h, for example.
The gels thus produced are silver stained by the method of Bjellqvist et al. (Electrophoresis, 14, 1357-1365, 1993).
The images are then analysed with the aid of a scanner (Scanner XRS 12CX, X-Ray Scanner Corporation, 4030 Spencer Street, Torrance, California 90503 USA) and, for example, with the aid of the Bio Image programme (Bio Image, 777 East Eisenhower Parkway, Suite 950, Ann Arbor, Michigan 48108, USA).
The proteins separated by two-dimensional electrophoresis are transferred onto PVDF membranes in a CAPS buffer, with the aid of a Bio-Rad Transblot Cell (Bio-Rad, USA) maintained at 420 mA and at a temperature of 4°C for 1 h 30 min, and then they are stained with Coomassie blue, according to the instructions of Applied Biosystems (Applied Biosystems Inc., 850 Lincoln Centre Drive, Foster City, California 94404 USA).
After the transfer, the membranes are dried at room temperature, before storing them at -18°C in plastic pouches.
Microsequencing of the N-terminal sequences of the protein blots is carried out with the aid of a sequencer of the Beckman LF 3000 type and of the Beckman Gold HPLC system (Beckman Instruments Inc., 250 Harbor Boulevard Box 3100, Fullerton, California 92634 USA). For that, the protein blots are cut out of the membrane and subjected to trypsin digestion at pH 8.3 in 50 μl of digestion buffer containing 10% v/v of trypsin, 100 mM of Tris-HCl, 1% v/v of Triton RTX and 10% v/v of acetonitrile.
The peptides are then separated by HPLC in a C18 column (Merck KGaA, Frankfurter Strasse 250, 64923 Darmstadt, DE), using a water/acetonitrile gradient containing 0.05% of TFA. The peptide fractions are concentrated, rediluted in 30% of acetonitrile and 0.01% of TFA, and sequenced as described above.
The two-dimensional electrophoretic profile, under denaturing conditions, of the endosperm of ripe C. arabica beans shows 4 groups of proteins which are particularly well represented, these proteins having apparent molecular weights of the order of 70, 56, 32 and 23 kDa.
It can be observed that 2 proteins of the group of proteins at 23 kDa as well as 2 proteins of the group of proteins at 70 kDa have an N-terminal sequence which is identical to the N-terminal sequence of the cleavage protein β.
Furthermore, it was demonstrated that 3 proteins of the group of proteins at 32 kDa and 1 protein of the group of proteins at 56 kDa have an N-terminal sequence identical to the N-terminal sequence of the cleavage protein α.
It was also possible to establish 7 internal sequences, of 5 to 15 amino acids, from one of the proteins of 32 kDa.
Moreover, with the aid of the SwissProt databank (Genetics Computer Group Inc., University Research Park, 575 Science Drive, Madison, Wisconsin 53711 USA) and using the FASTA programme (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85, 2444-2448, 1988), it was possible to demonstrate that the N-terminal sequences of the proteins of 23, 32, 56 and 70 kDa and the internal sequences of the protein at 32 kDa have a high homology with the sequences of storage proteins of certain plant species, such as for example glycinins of Glycine max, 12s proteins of Arabidopsis thaliana, cruciferin of Brassica napus, glutelins of Oryza sativa and 11s protein of Cucurbita maxima.
In the light of these results, it has been possible to make the following hypotheses on the structure of the storage proteins derived from the coffee bean.
The group of proteins of 56 kDa represents a large precursor, the mature storage protein αβ, comprising two domains, the α domain and the β domain. The two-dimensional electrophoretic profile also demonstrates the existence of the cleavage protein α, present in several isoforms at 32 kDa, and that of the cleavage protein β, present in several isoforms at 23 kDa. Thus, like the storage protein αβ, the cleavage proteins α and β may exist in various isoforms.
Finally, the group of proteins of 70 kDa represents the trimeric form of the cleavage protein β.
Furthermore, the existence of a fragment of the cleavage protein α of 13 kDa has been demonstrated on the two-dimensional electrophoretic profile.
II. Estimation of the quantity of storage proteins contained in the coffee bean and specificity of expression of the storage proteins derived from the coffee bean
The quantity of storage proteins contained in the coffee bean is calculated, in per cent, relative to the total integrated intensity of the two-dimensional electrophoretic profile. To do this, the integrated intensity of the protein blots, representing the storage protein αβ, the cleavage protein α, the cleavage protein β, the trimeric form of the cleavage protein β and the fragment of the cleavage protein α is measured.
It is accepted that the total integrated intensity of the two-dimensional electrophoretic profile is equivalent to 100%. A value of 50% of storage proteins contained in the coffee bean is thus obtained.
Moreover, the expression of the storage proteins of the coffee bean in tissues of the coffee bean other than the endosperm is also checked by two-dimensional electrophoresis. It is thus possible to demonstrate that the storage proteins are synthesized in a large quantity only in the endosperm and in a much lower proportion in the embryo of the coffee bean.
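The calculation described above amounts to dividing the summed integrated intensity of the storage protein spots by the integrated intensity of the whole profile. The sketch below illustrates this arithmetic only; the intensity values are hypothetical placeholders in arbitrary units, chosen for the example and not measured data from this description, and the function name is an assumption.

```python
def percent_of_total(spot_intensities: dict, total_intensity: float) -> float:
    """Share (in %) of the whole profile accounted for by the listed spots."""
    if total_intensity <= 0:
        raise ValueError("total intensity must be positive")
    return 100.0 * sum(spot_intensities.values()) / total_intensity


# Hypothetical integrated intensities (arbitrary units), NOT measured values.
storage_spots = {
    "alpha-beta precursor (56 kDa)": 40.0,
    "cleavage protein alpha (32 kDa)": 250.0,
    "cleavage protein beta (23 kDa)": 170.0,
    "trimeric beta (70 kDa)": 25.0,
    "alpha fragment (13 kDa)": 15.0,
}
total_profile = 1000.0  # whole two-dimensional profile, taken as 100%

share = percent_of_total(storage_spots, total_profile)
print(f"storage proteins: {share:.0f}% of the total integrated intensity")
```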
III. Isolation and in vitro translation of the polyA+ messenger RNAs from the total RNAs of the coffee bean
The total RNAs are extracted from coffee beans harvested from 4 to 40 weeks after flowering.
To do this, the maternal tissues are separated from the coffee beans which are rapidly ground in liquid nitrogen before being reduced to a powder.
This powder is then resuspended in 8 ml of buffer at pH 8 containing 100 mM Tris-HCl, 0.1% w/v of SDS and 0.5% v/v of β-mercaptoethanol, homogenized with one volume of phenol saturated with 100 mM Tris-HCl, pH 8, and then centrifuged at 12,000 g for 10 min at 4°C, so as to extract the aqueous phase, which is centrifuged (i) once with an equivalent volume of phenol, (ii) twice with an equivalent volume of phenol:chloroform (1:1) and (iii) twice with an equivalent volume of chloroform.
The total nucleic acids are then precipitated for 1 h at -20°C by adding to the aqueous phase 1/10 of the volume of 3 M sodium acetate, pH 5.2 and 2.5 volumes of ethanol.
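The volumes called for in this precipitation step follow directly from the volume of the aqueous phase. The sketch below is a minimal calculation under two stated assumptions: the "2.5 volumes" of ethanol are reckoned on the aqueous phase before the salt is added (some protocols reckon them afterwards), and the 8 ml aqueous volume is an example value. The function name is hypothetical.

```python
def ethanol_precipitation(aqueous_ml: float,
                          salt_stock_molar: float = 3.0,
                          salt_fraction: float = 0.1,
                          ethanol_volumes: float = 2.5) -> dict:
    """Volumes to add, and salt molarity reached before ethanol is added."""
    salt_ml = aqueous_ml * salt_fraction
    # Assumption: ethanol volume is based on the original aqueous phase.
    ethanol_ml = aqueous_ml * ethanol_volumes
    salt_molar = salt_stock_molar * salt_ml / (aqueous_ml + salt_ml)
    return {
        "sodium_acetate_ml": salt_ml,
        "ethanol_ml": ethanol_ml,
        "sodium_acetate_molar": salt_molar,
    }


plan = ethanol_precipitation(aqueous_ml=8.0)  # example volume
for key, value in plan.items():
    print(f"{key}: {value:.2f}")
```

With one tenth of a volume of a 3 M stock, the sodium acetate concentration before ethanol addition is 3 × 0.1 / 1.1, or about 0.27 M.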
The whole is then centrifuged at 12,000 g for 30 min at 4°C and the pellet is taken up in 10 ml of H2O, before precipitating the nucleic acids again in the presence of LiCl (2 M final) and ethanol (2.5 volumes).
After centrifugation, the pellet of total RNAs is taken up in 1 ml of H2O and it is digested for 1 h at 37°C with DNAse RQ1 (Promega Corporation, 2800 Woods Hollow Road, Madison, Wisconsin 53711 USA), so as to eliminate any trace of DNA, and the total RNAs are then deproteinized by treatment with phenol and with chloroform, before precipitating them in the presence of sodium acetate as described above.
The total RNAs are then taken up in 500 μl of H2O and they are quantified by spectrophotometric assay at 260 nm. Their quality is analysed by agarose gel electrophoresis in the presence of formaldehyde and by in vitro translation.
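A spectrophotometric assay at 260 nm is usually converted to a mass of RNA with a fixed factor. The sketch below assumes the commonly used approximation that an absorbance of 1.0 at 260 nm in a 1 cm cuvette corresponds to about 40 μg/ml of RNA; the description does not state which factor was applied, and the absorbance reading and dilution used in the example are hypothetical.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # assumed conventional factor, 1 cm path


def rna_yield_ug(a260: float, dilution_factor: float, volume_ul: float) -> float:
    """Total RNA (μg) in `volume_ul` of stock, from the A260 of a dilution."""
    concentration_ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return concentration_ug_per_ml * volume_ul / 1000.0


# Hypothetical reading: A260 = 0.25 on a 1:100 dilution of the 500 μl stock.
total_ug = rna_yield_ug(a260=0.25, dilution_factor=100, volume_ul=500)
print(f"estimated total RNA: {total_ug:.0f} μg")
```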
To do this, the polyA+ messenger RNAs (mRNA) are purified from 500 μg of total RNAs using the Oligotex-dT purification system (Qiagen Inc., 9600 De Soto Avenue, Chatsworth, California 91311 USA), and the quality of the mRNAs is then evaluated by their capacity to direct the synthesis of proteins in vitro. For that, translation experiments are carried out with 1 μg of mRNA in the presence of a rabbit reticulocyte lysate (Promega, USA), and the proteins thus synthesized are labelled by incorporation of 35S-methionine (Amersham International plc., Amersham Place, Little Chalfont, Buckinghamshire HP7 9NA, UK). The labelled proteins are then separated by two-dimensional electrophoresis as described above. After fixing in an acetic acid/ethanol mixture (40/10), the gels are incubated in the presence of Amplify (Amersham, UK), dried under vacuum and exposed at -80°C against an autoradiographic film.
On the one hand, the results of the in vitro translations with the mRNAs extracted from beans harvested 4 to 40 weeks after flowering demonstrate the presence of numerous proteins with molecular weights of between 1 and 100 kDa.
On the other hand, the results of the in vitro translations with the mRNAs extracted from beans harvested between 16 and 30 weeks after flowering demonstrate the presence, in a large quantity, of proteins which correspond to the αβ form of the storage proteins. However, no translation product corresponding in size to the cleavage proteins α and β is observed. This result confirms the hypothesis made above, according to which these two cleavage proteins are indeed derived from the in vivo cleavage of the large αβ precursor.
To isolate the cDNA for these storage proteins, two libraries were made in the manner described below.
IV. Construction and screening of cDNA libraries
The synthesis of cDNA, necessary for the construction of the libraries, is carried out according to the recommendations provided in the "Riboclone cDNA synthesis system M-MLV (H-)" kit (Promega, USA), using the mRNA extracted from coffee beans harvested 16 and 30 weeks after flowering. The efficiency of this reaction is monitored by the addition of [alpha-32P] dCTP during the synthesis of the two DNA strands. After migration on an alkaline agarose gel (Sambrook et al., Molecular Cloning - A Laboratory Manual, 1989), the size of the newly synthesized cDNA is estimated to vary from 0.2 to more than 4.3 kb. The quantifications, with the aid of the DNA Dipstick kit (InVitrogen BV, De Schelp 12, 9351 NV Leek, Netherlands), show that about 100 ng of cDNA are synthesized from 1 μg of mRNA.
The newly synthesized cDNAs are then treated according to the recommendations provided in the RiboClone EcoRI Adaptor Ligation System kit (Promega, USA) and ligated into the plasmid pBluescript II SK (+) (Stratagene, 11011 North Torrey Pines Road, La Jolla, California 92037, USA) previously digested with the restriction enzyme EcoRI and dephosphorylated by treatment with calf intestinal alkaline phosphatase.
The whole of this ligation mixture is used to transform the E. coli strain XL1-Blue MRF' (Stratagene, USA). The bacteria containing recombinant vectors are selected on dishes with LB (Luria-Bertani) medium containing 12.5 μg/ml of tetracycline, 20 μg/ml of ampicillin and 80 μg/ml of methicillin, in the presence of IPTG and X-Gal (Sambrook et al., 1989). They are then cultured on Petri dishes so as to obtain about 300 clones per dish.
These clones are transferred onto Nylon filter and they are then treated according to the recommendations provided by Boehringer Mannheim (Boehringer Mannheim GmbH, Biochemica, Postfach 310120, Mannheim 31, DE) .
Moreover, the sequence from amino acids 325 to 330 of the sequence SEQ ID NO: 2 is chosen in the amino acid sequence of the cleavage protein β because it makes it possible to design an oligonucleotide probe which is only slightly degenerate, the probe OLIGO 1, having the nucleic sequence SEQ ID NO: 4, which is labelled at its 5' end by the addition of the digoxigenin radical (Genosys Biotechnologies Inc., 162A Science Park, Milton Road, Cambridge CB4 4BR, UK).
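The degeneracy of a probe back-translated from a peptide is the product of the number of codons available for each residue, which is why a stretch rich in methionine, tryptophan and two-codon residues is preferred. The sketch below, based on the standard genetic code, scans a protein for the six-residue window with the lowest degeneracy. The example protein string is hypothetical and is not taken from SEQ ID NO: 2; the function names are assumptions.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for the peptide."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide)


def least_degenerate_window(protein: str, length: int = 6):
    """Return (1-based start, peptide, degeneracy) of the best window."""
    if len(protein) < length:
        raise ValueError("protein shorter than the requested window")
    best = min(
        range(len(protein) - length + 1),
        key=lambda i: probe_degeneracy(protein[i:i + length]),
    )
    window = protein[best:best + length]
    return best + 1, window, probe_degeneracy(window)


# Hypothetical protein fragment, for illustration only.
print(least_degenerate_window("LSRMWKDEFLSR"))
```

A six-residue window corresponds to a probe of up to 18 nucleotides; the actual length and composition of OLIGO 1 are those of SEQ ID NO: 4 in the sequence listing.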
The filters are prehybridized at 65°C for 4 h in the hybridization solution defined in the DIG oligonucleotide 3'-end labelling kit protocol (Boehringer Mannheim, DE) and the hybridization is carried out at 37°C for 10 h in the presence of the probe OLIGO 1 (10 pmol/ml final).
After the hybridization, the filters are washed in the presence of tetramethylammonium chloride according to the protocol defined by Wood et al. (Proc. Natl. Acad. Sci. USA, 82, 1585-1588, 1985) and then they are subjected to immunological detection in the presence of CSPD (Tropix, 47 Wiggins Avenue, Bedford, Massachusetts 01730 USA) according to the recommendations provided by Boehringer Mannheim (DIG luminescent detection kit).
A positive clone harbouring the recombinant vector, called "pCSP1" in the remainder of the description, is selected from the screening of the cDNA library prepared from beans harvested 16 weeks after flowering. This vector contains a cDNA, cloned into the EcoRI site of the vector pBluescript II SK (+), which is sequenced according to the "T7 sequencing kit" protocol (Pharmacia, Sweden) in the presence of [alpha-35S] dATP. This cDNA comprises the last 819 nucleotides of the sequence SEQ ID NO: 1 and, consequently, is incapable of encoding the entire storage protein αβ.
To isolate the cDNA encoding the entire storage protein αβ, a new nucleic probe, called SOI in the remainder of the description, is synthesized. To do this, a PCR is carried out (US Patent 4,683,195 and US Patent 4,683,202) using the synthetic oligonucleotide OLIGO 2, having the nucleic sequence SEQ ID NO: 5, and the synthetic oligonucleotide OLIGO 3, having the nucleic sequence SEQ ID NO: 6. The PCR reaction is carried out in the presence of 0.1 ng of vector pCSP1, in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 μM of each oligonucleotide (OLIGO 2 and OLIGO 3) and 3 units of Taq DNA polymerase (Stratagene, USA). The reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 42°C-30 s, 72°C-2 min) followed by a final extension at 72°C for 7 min. The fragment obtained after amplification is purified on a Microcon 100 cartridge (Amicon Inc., 72 Cherry Hill Drive, Beverly, Massachusetts 01915 USA) and 50 ng of this fragment are labelled by random primer extension with 50 μCi of [alpha-32P] dCTP according to the Megaprime kit (Amersham, UK).
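The cycling conditions and primer concentrations given above translate into simple totals. The following sketch adds up the programmed hold times of the PCR (ramp times between temperatures are ignored, which is an assumption) and converts the primer concentration into an amount per reaction, using the identity 1 μM = 1 pmol/μl. The function names are illustrative.

```python
def program_seconds(cycle_steps, cycles: int, final_extension_s: int) -> int:
    """Total hold time: repeated cycle steps plus one final extension."""
    per_cycle = sum(seconds for _temperature_c, seconds in cycle_steps)
    return cycles * per_cycle + final_extension_s


def pmol_in_reaction(concentration_um: float, volume_ul: float) -> float:
    """Amount of oligonucleotide in the reaction (1 μM equals 1 pmol/μl)."""
    return concentration_um * volume_ul


# (temperature in °C, hold time in s), as stated for the probe PCR.
probe_pcr_steps = [(94, 30), (42, 30), (72, 120)]
total_s = program_seconds(probe_pcr_steps, cycles=30, final_extension_s=7 * 60)

print(f"hold time: {total_s} s = {total_s / 60:.0f} min")
print(f"each primer: {pmol_in_reaction(0.25, 50)} pmol per 50 μl reaction")
```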
Furthermore, the Nylon filters used during the screening with the probe OLIGO 1 are dehybridized by two washes of 15 min at 37°C, in the presence of 0.2 N NaOH-0.1% SDS (w/v), and then prehybridized for 4 h at 65°C in a solution containing 6xSSC, 1xDenhardt (0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% BSA fraction IV) and 50 μg/ml of denatured salmon sperm DNA. They are then hybridized for 10 h at 65°C in the same solution with the whole of the labelled probe SOI and then they are washed for 30 min at 65°C three times in the presence, successively, of 2xSSC-0.1% SDS, 1xSSC-0.1% SDS and 0.1xSSC-0.1% SDS.
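The three successive washes lower the salt concentration step by step, which raises the stringency at constant temperature. The sketch below works out, for each wash, the volumes of stock solutions needed and the resulting NaCl and citrate concentrations, taking 1xSSC as 0.15 M NaCl and 0.015 M sodium citrate. The 20xSSC and 10% SDS stocks and the 100 ml final volume are assumptions chosen for the example; the description does not specify them.

```python
NACL_MM_AT_1X = 150.0     # 1xSSC contains 0.15 M NaCl
CITRATE_MM_AT_1X = 15.0   # 1xSSC contains 0.015 M sodium citrate


def wash_recipe(ssc_x: float, sds_percent: float, final_ml: float,
                ssc_stock_x: float = 20.0,
                sds_stock_percent: float = 10.0) -> dict:
    """Stock volumes and salt concentrations for one wash solution."""
    ssc_ml = final_ml * ssc_x / ssc_stock_x
    sds_ml = final_ml * sds_percent / sds_stock_percent
    return {
        "ssc_stock_ml": ssc_ml,
        "sds_stock_ml": sds_ml,
        "water_ml": final_ml - ssc_ml - sds_ml,
        "nacl_mM": NACL_MM_AT_1X * ssc_x,
        "citrate_mM": CITRATE_MM_AT_1X * ssc_x,
    }


# Wash series from the description: (SSC strength, % SDS).
washes = [(2.0, 0.1), (1.0, 0.1), (0.1, 0.1)]
for ssc_x, sds in washes:
    recipe = wash_recipe(ssc_x, sds, final_ml=100.0)
    print(f"{ssc_x}xSSC-{sds}% SDS:",
          {key: round(value, 2) for key, value in recipe.items()})
```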
A positive clone harbouring the recombinant vector, called pCSP2 in the remainder of the description, is thus selected from the screening of the cDNA library prepared from beans harvested 30 weeks after flowering. This vector contains the sequence SEQ ID NO: 1 of 1706 bp, corresponding to the cDNA encoding the entire storage protein αβ, having as amino acid sequence the sequence SEQ ID NO: 2 and a theoretical molecular weight of 54999 Da. A search of the SwissProt databank with the sequence SEQ ID NO: 2 confirms that this coffee protein belongs to the family of type 11s plant storage proteins.
The cleavage site of the precursor is located between amino acids 304 and 305 of the amino acid sequence SEQ ID NO: 2, as has been observed for all the other type 11s plant proteins (Borroto and Dure, Plant Mol. Biol. 8, 113-131, 1987). This is also confirmed by the N-terminal sequencing of the cleavage protein β described above. Consequently, the cleavage protein α corresponds to the first 304 amino acids of the amino acid sequence SEQ ID NO: 2, whereas the cleavage protein β corresponds to the last 188 amino acids of this sequence. The theoretical molecular weights of α and β are respectively 34125 Da and 20892 Da and are in agreement with those described above under "Identification of the storage proteins of the coffee bean".
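The theoretical molecular weights quoted for the precursor and for its two cleavage products can be cross-checked against each other. Hydrolysing one peptide bond adds one molecule of water, so the masses of the two fragments should sum to the mass of the precursor plus about 18 Da. The sketch below performs this check with the values stated in this description; the function name and the average mass used for water are assumptions of the example.

```python
WATER_DA = 18.015  # average mass of H2O, added once per hydrolysed bond


def precursor_mass_from_fragments(fragment_masses) -> float:
    """Mass of the uncleaved chain reconstructed from its fragments."""
    bonds = len(fragment_masses) - 1
    return sum(fragment_masses) - WATER_DA * bonds


alpha_da, beta_da = 34125, 20892      # theoretical masses from the text
alpha_len, beta_len = 304, 188        # residues 1-304 and 305-492

reconstructed = precursor_mass_from_fragments([alpha_da, beta_da])
print(f"reconstructed precursor: {reconstructed:.0f} Da")
print(f"total residues: {alpha_len + beta_len}")
```

The reconstructed value rounds to 54999 Da over 492 residues, matching the theoretical molecular weight given for the storage protein αβ.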
The N-terminal sequences of the cleavage proteins α and β analysed above are found in the amino acid sequence SEQ ID NO: 2 with the exception of a few amino acids. These differences are probably explained by the existence of several isoforms of these proteins which may differ from each other by a few amino acids (Shirsat, Developmental Regulation of Plant Gene Expression, Grierson Ed., Blackie, Chapman and Hall NY, 153-181, 1991).
V. Expression of the gene encoding the storage protein αβ during the development of the Coffea arabica bean
The expression of the gene encoding the storage protein αβ in coffee beans harvested at various stages of development (at 9, 12, 16, 30 and 35 weeks after flowering) is monitored.
To do this, 10 μg of total RNAs of these coffee beans are denatured for 15 min at 65°C in 1xMOPS buffer (20 mM MOPS, 5 mM sodium acetate, 1 mM EDTA, pH 7) in the presence of formamide (50%) and formaldehyde (0.66 M final).
They are then separated by electrophoresis, for 6 h at 2.5 V/cm, in the presence of 1xMOPS buffer, on a 1.2% agarose gel containing 2.2 M formaldehyde as final concentration.
After migration, the RNAs are stained with ethidium bromide (BET) according to Sambrook et al., 1989, which makes it possible to standardize the quantities deposited on a gel from the intensities of fluorescence of the 16S and 23S ribosomal RNAs.
The total RNAs are then transferred and fixed on a positively charged Nylon membrane according to the recommendations provided by Boehringer Mannheim (Boehringer Mannheim, DE). The prehybridization and hybridization are carried out according to the conditions described above in chapter IV.
The mRNAs encoding the storage protein αβ are completely absent from the beans harvested up to 9 weeks after flowering. They begin to be very weakly detected in the beans harvested at 12 weeks after flowering and are very abundant in the beans harvested between 16 and 30 weeks after flowering, again becoming very weakly represented in the ripe coffee beans (35 weeks after flowering). In all cases, the probe SOI hybridizes with only one class of mRNA whose estimated size at around 1.8 kb is close to that of the nucleic sequence SEQ ID NO: 1.
The kinetics of accumulation of the mRNAs is similar to that observed for most of the genes for storage proteins (Shirsat, 1991). According to the tissue examinations made during the maturation of the coffee beans, it is observed that the increase in the quantity of mRNA between 12 and 16 weeks after flowering occurs at the same time as the absorption of the perisperm by the endosperm. In comparison with the analyses carried out above by two-dimensional electrophoresis, on the accumulation of proteins during the maturation of the bean, a perfect superposition of the kinetics of accumulation of mRNAs with that of the storage proteins is observed. At the mature stage, the persistence of the storage proteins in the absence of their corresponding messenger RNAs is explained by a high stability of these proteins in vivo. According to these observations, and as has been shown in other plant species (Shirsat, 1991), it appears that the expression of the gene encoding the storage protein αβ is essentially controlled by a promoter, a sequence capable of regulating the transcription of the gene, which is specifically expressed in the endosperm of the coffee beans.
VI. Isolation of the promoter of the gene encoding the storage protein αβ of Coffea arabica
The promoter of the gene encoding the storage protein αβ of Coffea arabica is isolated by several inverse PCRs according to the method of Ochman et al. (Genetics 120, 621-623, 1988).
To do this, the nuclear DNA of coffee is isolated from young leaves of C. arabica, Caturra variety, according to the protocol described by Rogers and Bendich (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A6, 1-11, 1993).
0.5 to 1 μg of this DNA is digested with several restriction enzymes, such as for example DraI, Hindi and NdeI, and then treated with phenol:chloroform (1:1) and it is precipitated for 12 h at -20°C in the presence of sodium acetate 0.3 M final and of ethanol (2.5 volumes).
After centrifugation at 10,000 g for 15 min at 4°C, the DNA is taken up in about 500 μl of ligation buffer containing 30 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 10 mM DTT and 0.5 mM rATP, so as to obtain a final DNA concentration of about 1 to 2 ng/μl. The ligation is carried out for 12 h at 14°C in the presence of T4 DNA ligase at 0.02 Weiss u/μl and then the self-ligated genomic DNA is precipitated as described above and it is taken up in 20 μl of H2O before quantifying it with the DNA Dipstick kit (InVitrogen, Netherlands).
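The dilution used for the self-ligation can be verified from the quantities stated above; a low DNA concentration is generally chosen in such reactions to favour circularization of each fragment over joining of separate fragments. The sketch below, with illustrative function names, computes the DNA concentration range and the total amount of ligase in the 500 μl reaction.

```python
def ng_per_ul(mass_ug: float, volume_ul: float) -> float:
    """Concentration in ng/μl for a mass in μg dissolved in `volume_ul`."""
    return mass_ug * 1000.0 / volume_ul


def total_units(units_per_ul: float, volume_ul: float) -> float:
    """Total enzyme units present in the reaction volume."""
    return units_per_ul * volume_ul


ligation_volume_ul = 500.0
low = ng_per_ul(0.5, ligation_volume_ul)
high = ng_per_ul(1.0, ligation_volume_ul)
ligase_units = total_units(0.02, ligation_volume_ul)

print(f"DNA concentration: {low:.0f} to {high:.0f} ng/μl")
print(f"T4 DNA ligase: {ligase_units:.0f} Weiss units in total")
```

The result, 1 to 2 ng/μl, matches the concentration range stated in the description.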
a) Inverse PCR reaction No. 1
This first reaction is carried out using the synthetic oligonucleotide SO10, having the nucleic sequence SEQ ID NO: 7, and the oligonucleotide SO11, having the nucleic sequence SEQ ID NO: 8.
This inverse PCR reaction is carried out in the presence of 50 ng of ligated genomic DNA in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml of gelatin, 0.2 mM of each dNTP, 0.25 μM of each oligonucleotide (SO10 and SO11) and 3 units of Taq DNA polymerase (Stratagene, USA). Next the reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 56°C-30 s, 72°C-3 min) followed by a final extension cycle at 72°C for 7 min.
The amplified DNA fragments are then analysed by molecular hybridization (Southern, J. Mol. Biol. 98, 503-517, 1975): they are separated by electrophoresis on a 1% agarose gel stained with ethidium bromide and then transferred in the presence of 0.4 N NaOH for 12 h onto a positively charged Nylon membrane (Boehringer Mannheim, DE). After the transfer, the membrane is baked for 15 min at 120°C and then it is prehybridized at 65°C for 4 h in the hybridization solution defined in the "DIG oligonucleotide 3'-end labelling kit" protocol (Boehringer Mannheim, DE).
The membrane is then hybridized at 37°C for 10 h in the presence of the synthetic oligonucleotide SO12 (10 pmol/ml), having the nucleic sequence SEQ ID NO: 9 and labelled at its 5' end with a digoxigenin radical.
After hybridization, the filters are washed in the presence of tetramethylammonium chloride according to the protocol defined by Wood et al . , 1985, and then they are subjected to immunological detection in the presence of CSPD (Tropix, USA) , according to the recommendations provided in the DIG luminescent detection kit (Boehringer Mannheim, DE) .
After autoradiography, the presence of a DNA fragment of about 1.7 kb, derived from the inverse PCR reaction on the genomic DNA initially digested with the restriction enzyme Hindi, which binds the probe SO12, is detected.
This DNA is then cloned into the vector pCR-Script (SK+) (Stratagene, USA). To do this, 10 μl of the inverse PCR reaction are mixed with 100 μl of sterile water and then the mixture is centrifuged for 10 min at 3000 g in a Microcon 100 cartridge (Amicon, USA).
3 μl of the DNA thus purified are treated in the presence of native Pfu DNA polymerase (Stratagene, USA) in order to convert its cohesive ends to blunt ends. This reaction is carried out in a final volume of 10 μl containing 10 mM KCl, 6 mM (NH4)2SO4, 20 mM Tris-HCl, pH 8.0, 0.1% Triton X-100, 2 mM MgCl2, 1 mM of each dNTP and 10 μg/ml BSA. The reaction mixture is covered with 50 μl of mineral oil and incubated for 30 min at 72°C, and then 1 μl of this reaction mixture is used directly in the ligation reaction with the vector pCR-Script SK(+).
The whole of this ligation mixture (10 μl) is used to transform the E. coli strain XL1-Blue MRF' (Stratagene, USA). The bacteria containing the recombinant vectors are selected on dishes with LB medium containing 20 μg/ml of ampicillin and 80 μg/ml of methicillin, in the presence of IPTG and X-Gal (Sambrook et al., 1989).
At the end of the transformation, about 100 clones are obtained which are transferred onto Nylon filter and are analysed by molecular colony hybridization (Grunstein and Hogness, Proc. Natl. Acad. Sci. USA 72, 3961-3965, 1975) with the probe SO12 according to the conditions described above. This screening makes it possible to isolate a positive clone harbouring the recombinant vector pCSPP1. This vector contains the genomic DNA fragment detected by autoradiography, which is cloned into the SrfI site of the vector pCR-Script (SK+). This DNA is sequenced, according to the protocol defined by Pharmacia (T7 sequencing kit), in the presence of [alpha-35S] dATP. It comprises the last 1717 base pairs of the nucleic sequence SEQ ID NO: 3, bordered at each end by a Hindi restriction site. It contains 750 base pairs upstream of the codon for initiation of translation of the gene encoding the storage protein αβ and the first 968 base pairs of this nuclear gene. Given the fact that this gene belongs to a multigene family, it will be called hereinafter CSP1.
This partial sequence of the CSP1 gene shows the presence of two introns of identical size (111 bp), located respectively between nucleotides 2811-2921 for the first and nucleotides 3239-3349 for the second, in the nucleic sequence SEQ ID NO: 3. These two introns are smaller than those observed, for example, in Arabidopsis thaliana, but they are on the other hand located at the same positions as those observed in this plant (Pang et al., Plant Mol. Biol. 11, 805-820).
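The stated intron size can be recomputed from the coordinates given for SEQ ID NO: 3. The sketch below treats the positions as 1-based and inclusive, which is an assumption about the numbering convention, and also reports the length of the segment lying between the two introns; the variable names are illustrative.

```python
# Intron coordinates in SEQ ID NO: 3 as stated in the description
# (assumed 1-based, inclusive).
introns = {
    "intron 1": (2811, 2921),
    "intron 2": (3239, 3349),
}

intron_lengths = {
    name: last - first + 1 for name, (first, last) in introns.items()
}

# Segment between the end of intron 1 and the start of intron 2.
intervening_length = introns["intron 2"][0] - introns["intron 1"][1] - 1

for name, length in intron_lengths.items():
    print(f"{name}: {length} bp")
print(f"segment between the introns: {intervening_length} bp")
```

Both introns come out at 111 bp under this convention, in agreement with the size quoted above.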
b) Inverse PCR reaction No. 2: first screening
To obtain the nucleic sequences located upstream of the HincII site (position 1763 of the nucleic sequence SEQ ID NO: 3), another inverse PCR reaction is carried out using, this time, the synthetic oligonucleotides SO16 and SO17, deduced from the sequence previously cloned into the plasmid pCSPP1 and having respectively the nucleic sequences SEQ ID NO: 10 and SEQ ID NO: 11.
This inverse PCR reaction is carried out under conditions identical to those described for the inverse PCR reaction No. 1, with the exception of the following parameters: the annealing of the oligonucleotides is carried out at 57°C and 35 polymerization cycles are performed.
As defined above, the DNA fragments amplified by this PCR are separated on an electrophoresis gel, transferred onto a Nylon membrane and analysed by molecular hybridization. This membrane is prehybridized for 4 h at 65°C in a solution containing 6xSSC, 1x Denhardt (0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% BSA fraction IV) and 50 μg/ml of denatured salmon sperm DNA, and then it is hybridized for 10 h at 65°C in the same solution with the probe SO1016.
This probe is synthesized by PCR using the synthetic oligonucleotides SO10 and SO16 described above, in the presence of 0.1 ng of vector pCSPP1, in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 μM of each oligonucleotide (SO10 and SO16) and 3 units of Taq DNA polymerase (Stratagene, USA). The reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 46°C-30 s, 72°C-2 min) followed by a final extension cycle at 72°C for 7 min. The fragment obtained after amplification (698 bp) is purified on a Microcon 100 cartridge (Amicon, USA) and 50 ng of this fragment are labelled by random primer extension with 50 μCi of [alpha-32P]dCTP according to the Megaprime kit (Amersham, UK) protocol.
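The cycling programme used for the probe synthesis can be summarised numerically. The sketch below, written under the assumption that ramp times between temperatures are ignored, adds up the hold times of the programme stated above (30 cycles of 94°C-30 s, 46°C-30 s, 72°C-2 min, then 7 min at 72°C). The function name `program_seconds` is hypothetical.

```python
# Each step is (temperature in degrees C, hold time in seconds), as stated in the text.
PROBE_STEPS = [(94, 30), (46, 30), (72, 120)]


def program_seconds(cycles, steps, final_extension_s=0):
    """Total hold time of a PCR programme, ignoring ramp times (assumption)."""
    per_cycle = sum(seconds for _temperature, seconds in steps)
    return cycles * per_cycle + final_extension_s


# 30 cycles followed by a 7 min final extension.
total_seconds = program_seconds(30, PROBE_STEPS, final_extension_s=7 * 60)
total_minutes = total_seconds / 60
```

On these figures the hold times sum to 97 min; the actual run time on a given thermocycler would be longer because of heating and cooling between steps.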
After hybridization, the membrane is washed three times for 30 min at 65°C in the presence, successively, of 2xSSC-0.1% SDS, 1xSSC-0.1% SDS and 0.1xSSC-0.1% SDS and it is analysed by autoradiography so as to detect a DNA fragment of about 1 kb which binds the probe SO1016.
This DNA, derived from the inverse PCR reaction on the genomic DNA initially digested with the restriction enzyme NdeI, is then treated with Pfu DNA polymerase and ligated into the vector pCR-Script (SK+) as described above. This ligation is then used to transform the E. coli strain XL1-Blue MRF' and the transformants are selected and analysed by molecular hybridization with the probe SO1016 according to the conditions described above. This screening makes it possible to isolate a positive clone harbouring the vector pCSPP2. As expected, this vector results from the cloning into the SrfI site of the vector pCR-Script (SK+) of the DNA fragment previously identified by hybridization. The latter is sequenced. It corresponds to the DNA segment between nucleotides 1514 and 2523 of the nucleic sequence SEQ ID NO: 3, bordered at each end by an NdeI restriction site, and consequently contains 250 bp in addition upstream of the genomic DNA fragment cloned into the vector pCSPP1.
c) Inverse PCR reaction No. 2: second screening
To clone nucleotides 1 to 1513 of the nucleic sequence SEQ ID NO: 3, another molecular hybridization is performed on the DNA fragments derived from the inverse PCR reaction No. 2.
To do this, the probe used, called SO1720, is deduced from the sequence of coffee nuclear DNA cloned into the vector pCSPP2 and it is synthesized by PCR using the oligonucleotide SO17 described above and the oligonucleotide SO20 having the nucleic sequence SEQ ID NO: 12. This reaction is carried out in the presence of 0.1 ng of vector pCSPP2, under conditions identical to those used for the synthesis of the probe SO1016, with the exception of the annealing temperature of the oligonucleotides, which is 50°C. The fragment obtained after amplification (262 bp) is labelled as described above and it is used as probe to test the inverse PCR reactions No. 2.
The Nylon membrane used during the screening of the products of the inverse PCR reaction No. 2 with the probe SO1016 is dehybridized by two washes for 15 min at 37°C in the presence of 0.2 N NaOH-0.1% SDS (w/v), then it is prehybridized and it is hybridized as described above with the probe SO1720.
At the end of this hybridization, a DNA fragment of about 1.9 kb, derived from the inverse PCR reaction No. 2 on the genomic DNA initially digested with the restriction enzyme DraI, is detected. As described above, this DNA is then treated with Pfu DNA polymerase, it is ligated into the vector pCR-Script (SK+) and the entire ligation is used to transform the E. coli strain XL1-Blue MRF'.
The transformants are then selected and they are screened by molecular hybridization with the probe SO1720. It is thus possible to isolate a positive clone harbouring the vector pCSPP3 which results from the cloning into the SrfI site of the vector pCR-Script (SK+) of the DNA fragment previously identified by hybridization. The latter, which corresponds to the DNA segment between nucleotides 1 and 1886 of the nucleic sequence SEQ ID NO: 3, bordered at each end by a DraI restriction site, is sequenced. It therefore contains 1513 base pairs in addition upstream of the genomic DNA fragment cloned into the vector pCSPP2.
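The way the successive clones extend the known sequence upstream can be checked with simple interval arithmetic. The following Python sketch uses only the coordinates given in the text (pCSPP3: 1-1886, pCSPP2: 1514-2523, start of the pCSPP1 insert at position 1763) and assumes 1-based inclusive positions. The helper names are illustrative.

```python
# Cloned segments on SEQ ID NO: 3 (assumed 1-based, inclusive), as given in the text.
SEGMENTS = {
    "pCSPP3": (1, 1886),
    "pCSPP2": (1514, 2523),
}
PCSPP1_START = 1763  # position given in the text for the start of the pCSPP1 insert


def length(segment):
    """Length in bp of a (start, end) inclusive interval."""
    start, end = segment
    return end - start + 1


def new_upstream(this_start, previous_start):
    """Number of bp gained upstream relative to the previously cloned fragment."""
    return previous_start - this_start


def overlap(a, b):
    """Number of bp shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


pcspp2_len = length(SEGMENTS["pCSPP2"])                        # fragment of about 1 kb
pcspp3_len = length(SEGMENTS["pCSPP3"])                        # fragment of about 1.9 kb
gain_pcspp2 = new_upstream(SEGMENTS["pCSPP2"][0], PCSPP1_START)
gain_pcspp3 = new_upstream(SEGMENTS["pCSPP3"][0], SEGMENTS["pCSPP2"][0])
shared_2_3 = overlap(SEGMENTS["pCSPP3"], SEGMENTS["pCSPP2"])
```

With these inputs the fragment lengths are 1010 bp and 1886 bp, matching the "about 1 kb" and "about 1.9 kb" bands, and the upstream gains are 249 bp and 1513 bp. The 249 bp figure agrees with the stated 250 bp to within one base, which may simply reflect whether the boundary position is counted.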
d) Cloning of the genomic DNA fragments
The inverse PCR experiments form chimeric linear molecules by joining DNA fragments which are not contiguous in the genome (Ochman et al., 1988). Moreover, measurements of mutation frequency show that Pfu DNA polymerase is approximately twelve times more accurate than Taq DNA polymerase, which reduces the probability of point mutations during PCR amplifications (Lundberg et al., Gene 108, 1-6, 1991).
For these reasons, a PCR reaction is carried out on the native genomic DNA of C. arabica, Caturra variety, in the presence of Pfu DNA polymerase. This reaction is carried out in the presence of 10 ng of genomic DNA, in a final volume of 50 μl containing 10 mM KCl, 6 mM (NH4)2SO4, 20 mM Tris-HCl, pH 8.0, 0.1% Triton X-100, 2 mM MgCl2, 10 μg/ml BSA, 0.2 mM of each dNTP, 0.25 μM of each of the oligonucleotides SO10 and SO20 described above and 3 units of Pfu DNA polymerase. The oligonucleotide SO10 is located on the antisense strand of the nucleic sequence SEQ ID NO: 3, between nucleotides 2512 and 2534, whereas the oligonucleotide SO20 is located on the sense strand of the nucleic sequence SEQ ID NO: 3, between nucleotides 1565 and 1584. The reaction mixture is then covered with 50 μl of mineral oil and it is incubated for 45 cycles (94°C-30 s, 50°C-30 s, 72°C-3 min) followed by a final extension cycle at 72°C for 7 min.
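The expected size of the product of this genomic PCR follows from the primer coordinates quoted above. The sketch below is a minimal illustration, assuming 1-based inclusive positions on SEQ ID NO: 3 and assuming that the primers carry no 5' extensions that would add to the product; the function names are hypothetical.

```python
# Primer binding positions on SEQ ID NO: 3, as stated in the text.
SO20 = (1565, 1584)  # sense strand (forward primer)
SO10 = (2512, 2534)  # antisense strand (reverse primer)


def primer_length(primer):
    """Length in nt of a primer given as an inclusive (start, end) pair."""
    return primer[1] - primer[0] + 1


def amplicon_length(forward, reverse):
    """Expected product size: from the 5' end of the forward primer
    to the 5' end of the reverse primer, inclusive."""
    if reverse[1] < forward[0]:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse[1] - forward[0] + 1


so20_nt = primer_length(SO20)
so10_nt = primer_length(SO10)
expected_bp = amplicon_length(SO20, SO10)
```

Under these assumptions the primers are 20 and 23 nucleotides long and the expected product is 970 bp.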
Following this PCR, a single fragment is obtained which is cloned into the vector pCR-Script (SK+) to give the vector pCSPP4. By sequencing, it is shown that this genomic DNA fragment corresponds to the sequence between oligonucleotides SO10 and SO20. The DNA amplified during this PCR reaction is then used for the construction of the vectors, as described below.
VII. Construction of the genetic transformation vectors necessary for the functional analysis of the promoter of the gene encoding the storage protein αβ of Coffea arabica
The sequences located upstream of the site of initiation of translation, positioned at nucleotide 2510 of the nucleic sequence SEQ ID NO: 3, are analysed in order to test their capacity for regulating the expression of the reporter gene uidA in the seeds of transformed plants.
To do this, several constructs are made in the binary transformation vector pBI101 (Clontech Laboratories Inc., 1020 East Meadow Circle, Palo Alto, California 94303-4230 USA). This vector contains the reporter gene uidA, which encodes β-glucuronidase (GUS), and the bacterial gene nptII, which encodes neomycin phosphotransferase. The latter confers resistance to kanamycin in the transformed plants. These two genes are bordered by the right and left ends of the T-DNA of the plasmid pTiT37 of Agrobacterium tumefaciens (Bevan, Nucl. Acids Res. 12, 8711-8721, 1984) which define the DNA region capable of being transferred into the genome of plants infected with this bacterium.
The vector pBI101 is digested with the restriction enzyme BamHI and it is dephosphorylated by treatment with calf alkaline phosphatase (Promega, USA) according to the protocol defined by the supplier.
Next, DNA fragments of different sizes are cloned into the vector pBI101. They are obtained by PCR in the presence of the vector pCSPP4, of Pfu DNA polymerase and of two synthetic oligonucleotides, each containing at its 5' end the nucleic sequence SEQ ID NO: 13. This sequence comprises a BamHI restriction site which allows the cloning of the PCR products into the vector pBI101 linearized with the same enzyme.
Each PCR uses, on the one hand, a synthetic oligonucleotide which is capable of binding to the promoter and, on the other hand, the oligonucleotide BAGUS, which has the nucleic sequence SEQ ID NO: 14. The use of the latter allows, after digestion of the PCR products with the restriction enzyme BamHI, a translational fusion to be obtained between the first 5 amino acids of the storage protein αβ of the coffee bean and the N-terminal end of β-glucuronidase.
a) Construction of pCSPP5
The PCR reaction is carried out with 5 ng of plasmid pCSPP4, in a volume of 50 μl containing 10 mM KCl, 6 mM (NH4)2SO4, 20 mM Tris-HCl, pH 8, 0.1% Triton X-100, 2 mM MgCl2, 0.2 mM of each dNTP, 10 μg/ml BSA, 0.25 μM of each of the oligonucleotides UP210, having the nucleic sequence SEQ ID NO: 15, and BAGUS, having the nucleic sequence SEQ ID NO: 14, and 3 units of Pfu DNA polymerase. The reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 55°C-30 s, 72°C-2 min) followed by a final extension cycle at 72°C for 7 min.
The PCR fragment of about 950 bp is purified on a Microcon 100 cartridge (Amicon, USA), digested for 12 h at 37°C with BamHI (Promega, USA) and ligated into the linearized vector pBI101, in the presence of T4 DNA ligase (Promega, USA), according to the recommendations provided by the supplier. Next, the E. coli strain XL1-Blue MRF' is transformed with the entire ligation mixture. The plasmids are independently extracted from several transformants and they are sequenced so as to determine the orientation of the PCR fragment in the binary vector. This analysis thus makes it possible to select the plasmid pCSPP5.
b) Construction of pCSPP6
The construction of this vector is carried out as described for the vector pCSPP5 except that the oligonucleotide UP210 is replaced with the oligonucleotide UP211, which has the nucleic sequence SEQ ID NO: 16. The cloning of the PCR product (about 700 bp), correctly oriented in the vector pBI101, gives the vector pCSPP6.
c) Construction of pCSPP7
The construction of this vector is carried out as described for the vector pCSPP5 except that the oligonucleotide UP210 is replaced with the oligonucleotide UP212, which has the nucleic sequence SEQ ID NO: 17. The cloning of the PCR product (450 bp), correctly oriented in the vector pBI101, gives the vector pCSPP7.
d) Construction of pCSPP8
The construction of this vector is carried out as described for the vector pCSPP5 except that the oligonucleotide UP210 is replaced with the oligonucleotide UP213, which has the nucleic sequence SEQ ID NO: 18. The cloning of the PCR product (250 bp), correctly oriented in the vector pBI101, gives the vector pCSPP8.
VIII. Transformation of Agrobacterium tumefaciens
The vectors described above (pCSPP5-8), as well as the plasmids pBI101 and pBI121 (Clontech), are independently introduced into the disarmed Agrobacterium tumefaciens strain C58pMP910 (Koncz and Schell, Mol. Gen. Genet. 204, 383-396, 1986) according to the direct transformation method described by An et al. (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A3, 1-19, 1993). For each transformation, the recombinant Agrobacterium tumefaciens clones are selected on LB medium supplemented with kanamycin (50 μg/ml) and rifampicin (50 μg/ml).
To check the structure of the plasmids introduced into Agrobacterium tumefaciens, they are extracted by the rapid minipreparation technique described by An et al. (1993) and they are analysed by restriction mapping after reverse transformation in the E. coli strain XL1-Blue MRF'.
In the plasmid pBI101, the gene uidA is silent because it lacks a promoter. In contrast, this same gene is expressed in plants transformed with the vector pBI121 because it is under the control of the constitutive CaMV 35S promoter (Jefferson et al., EMBO J. 6, 3901-3907, 1987). These two plasmids were used respectively as negative and positive controls for the expression of the reporter gene uidA.
IX . Transformation and regeneration of Nicotiana tabacum
The transformation of Nicotiana tabacum var. XHFD8 is carried out with the vectors described above (pCSPP5-8, pBI101 and pBI121), according to the protocol described by Horsch et al. (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A5, 1-9, 1993). To do this, foliar discs of plantlets which are germinated in vitro are incubated for about 2 min with a transformed stationary phase culture of Agrobacterium tumefaciens diluted in a 0.9% NaCl solution so as to obtain an OD measurement at 600 nm of between 0.2 and 0.3. They are then dried on 3 MM paper (Whatman) and incubated, without selection pressure, in a culture chamber on MS-stem medium (MS salts 4.3 g/l, sucrose 30 g/l, agar 8 g/l, myoinositol 100 mg/l, thiamine 10 mg/l, nicotinic acid 1 mg/l, pyridoxine 1 mg/l, naphthaleneacetic acid (NAA) 0.1 mg/l, benzyladenine (BA) 1 mg/l) (Murashige and Skoog, Physiol. Plant. 15, 473-497, 1962).
After 3 days, the discs are transferred onto MS medium supplemented with kanamycin (100 μg/ml) and with cefotaxime (400 μg/ml) in order to multiply the transformed cells so as to obtain calli. These discs are then subcultured every week on fresh "MS-stem" medium.
After 21 to 28 days, the buds which germinate are cut from the calli and they are subcultured on standard MS medium, that is to say an MS medium free of phytohormones, supplemented with kanamycin (100 μg/ml) and with cefotaxime (200 μg/ml). After rooting on a Petri dish, the plantlets are transplanted into earthenware pots in a substrate composed of peat and compost and then grown in a greenhouse at a temperature of 25°C and with a photoperiod of 16 h. For each transformation experiment, 30 plantlets (R0 generation) are selected. All these plantlets proved to be morphologically normal and fertile. They were selfed and gave seeds (R1 generation).
X. Analysis of the genomic DNA of tobacco plants transformed with Agrobacterium tumefaciens
The genomic DNA of transgenic tobacco plants is extracted from the leaves according to the protocol described by Rogers and Bendich (Plant Mol. Biol. Manual, Gelvin, Schilperoort and Verma Eds, Kluwer Academic Publishers Dordrecht, Netherlands, A6, 1-11, 1993) and then it is analysed by PCR and by molecular hybridization, according to the Southern blot technique.
The PCR reactions are carried out with 10 ng of DNA in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 3 units of Taq DNA polymerase, 0.25 μM of the oligonucleotide BI104, having the sequence SEQ ID NO: 19 described in the sequence listing hereinafter, and 0.25 μM of the oligonucleotide BI105, having the sequence SEQ ID NO: 20 described in the sequence listing hereinafter. The oligonucleotide BI104 is located 27 bp downstream of the BamHI site of the plasmid pBI101 and the oligonucleotide BI105 is located 73 bp upstream of the BamHI site of the plasmid pBI101. The PCR reactions are carried out over 30 cycles (94°C-30 s, 54°C-30 s, 72°C-2 min) followed by a cycle of 7 min at 72°C (final extension). The DNA fragments amplified from transgenic tobacco plants transformed with the plasmids pBI101 (negative control), pBI121 (positive control), pCSPP5, pCSPP6, pCSPP7 and pCSPP8 have sizes of about 280 bp, 1030 bp, 1230 bp, 980 bp, 730 bp and 430 bp respectively. In all cases, it is concluded that the fragment initially cloned upstream of the reporter gene uidA is intact.
10 μg of the DNA from tobacco plants transformed with Agrobacterium tumefaciens are digested with BamHI. Next, the restriction fragments obtained are separated by electrophoresis on agarose gel (1%) and the DNA is transferred onto a Nylon filter, before hybridizing it independently with a probe uidA and a probe nptII.
The probe uidA is synthesized by PCR using the synthetic oligonucleotide GMP1, having the sequence SEQ ID NO: 21 described in the sequence listing hereinafter, and the synthetic oligonucleotide GMP2, having the sequence SEQ ID NO: 22 described in the sequence listing hereinafter, in the presence of 0.1 ng of vector pBI101, in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 μM of each oligonucleotide and 3 units of Taq DNA polymerase (Stratagene, USA). The reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 46°C-30 s, 72°C-2 min) followed by a cycle at 72°C for 7 min. The probe nptII is synthesized by PCR using the synthetic oligonucleotide NPTII-1, having the sequence SEQ ID NO: 23 described in the sequence listing hereinafter, and the synthetic oligonucleotide NPTII-2, having the sequence SEQ ID NO: 24 described in the sequence listing hereinafter, in the presence of 0.1 ng of vector pBI101, in a final volume of 50 μl containing 50 mM KCl, 10 mM Tris-HCl, pH 8.8, 1.5 mM MgCl2, 0.1 mg/ml gelatin, 0.2 mM of each dNTP, 0.25 μM of each oligonucleotide and 3 units of Taq DNA polymerase (Stratagene, USA). The reaction mixture is covered with 50 μl of mineral oil and it is incubated for 30 cycles (94°C-30 s, 46°C-30 s, 72°C-2 min) followed by a cycle at 72°C for 7 min.
These two probes are purified and then they are labelled as described above in test VI.
The hybridization profiles obtained for each probe are then compared so as to select the tobacco plants transformed with Agrobacterium tumefaciens which have integrated into their genome a single non-rearranged copy of the T-DNA. The selection of these plants is also confirmed by the results of the analysis of the segregation of the kanamycin-resistance character, after germination in vitro on standard MS medium of the R1 seeds of these plants. Indeed, in this case, a 3/4-1/4 segregation of the kanamycin-resistance character is observed, which is compatible with the integration of the T-DNA at a single locus of the nuclear DNA.
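One common way to judge whether observed seedling counts are compatible with a 3/4-1/4 segregation is a chi-square goodness-of-fit test. The text does not say which test, if any, was applied, so the following Python sketch is only an illustration under that assumption; the seedling counts used in the usage note are hypothetical and are not data from this document.

```python
# Critical chi-square value for 1 degree of freedom at alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def consistent_with_single_locus(resistant, sensitive):
    """True if the 3:1 hypothesis is not rejected at the 5% level."""
    return chi_square_3_to_1(resistant, sensitive) < CHI2_CRITICAL_DF1_P05
```

For example, with a hypothetical count of 152 kanamycin-resistant and 48 sensitive seedlings, `consistent_with_single_locus(152, 48)` returns `True`, whereas a count of 190 to 10 would be rejected as a 3:1 ratio.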
XI. Study of the functioning characteristics of the coffee promoter and of its derivatives in transgenic tobacco plants
This study is carried out on R0 generation plants and on R1 generation mature seeds.
The measurements of the GUS activity are carried out on the leaves and the seeds according to the method described by Jefferson et al. (1987), using MUG (methylumbelliferyl glucuronide) as substrate and by measuring, by fluorimetry, the appearance of MU (methylumbelliferone). To do this, the foliar explants (10 mg) and the seeds (about 40) are ground in the presence of sterile sand in 300 μl of extraction buffer (50 mM Na2HPO4, pH 7.0, 10 mM EDTA, 10 mM β-mercaptoethanol). The cellular debris is removed by centrifugation for 15 min at 4°C and the soluble proteins in the supernatant are quantified by the Bradford method (Anal. Biochem. 72, 248-254, 1976) according to the protocol defined by Bio-Rad (USA) and using BSA as standard. The measurements of GUS activity are carried out in microtitre plates incubated at 37°C, using 1 μg of soluble proteins in 150 μl of reaction buffer (extraction buffer with 1 mM MUG). The measurements of fluorescence, expressed in pmol MU/min/mg of proteins, are carried out at an excitation wavelength of 365 nm and an emission wavelength of 455 nm (Fluoroskan II, Labsystems). The results of the measurements of GUS activity which are presented in Figure 1 hereinafter show that no enzymatic activity is observed in the leaves and the seeds of the plants containing the T-DNA of the plasmid pBI101. For the other transformation experiments, the differences in GUS activities which are observed between the transgenic plants transformed with the same genetic construct can be explained by a positional effect which results from the random integration of the T-DNA into the genome (Jones et al., EMBO J. 4, 2411-2418, 1985). The plants containing the construct pBI121 have a glucuronidase activity of between 1500 and 20,000 pmol of MU/min/mg of proteins. For these same plants, no significant differences are observed between the measurements of GUS specific activities carried out using the seeds and the leaves.
These observations confirm the constitutive character of the CaMV 35S promoter in plants (Odell et al., Nature 313, 810-812, 1985).
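The unit used for the GUS measurements (pmol MU/min/mg of proteins) can be illustrated with a short calculation. The sketch below assumes that fluorescence readings have already been converted to pmol of MU with a standard curve, fits a least-squares line through a time course and normalises the slope to the amount of protein in the well. The time course shown in the usage note is hypothetical, and the function names are not taken from the cited method.

```python
def slope(times_min, mu_pmol):
    """Least-squares slope (pmol MU per min) of product against time."""
    n = len(times_min)
    if n < 2 or n != len(mu_pmol):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_y = sum(mu_pmol) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, mu_pmol))
    denominator = sum((t - mean_t) ** 2 for t in times_min)
    return numerator / denominator


def gus_specific_activity(times_min, mu_pmol, protein_ug):
    """Specific activity in pmol MU/min/mg of protein."""
    # Multiply by 1000 to convert from per microgram to per milligram.
    return slope(times_min, mu_pmol) * 1000 / protein_ug
```

For a hypothetical well containing 1 μg of protein that produces 0, 5, 10 and 15 pmol of MU at 0, 10, 20 and 30 min, `gus_specific_activity([0, 10, 20, 30], [0, 5, 10, 15], 1)` gives 500 pmol MU/min/mg.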
Analysis of the results shows that the GUS activities measured in the leaves of the plants independently containing the T-DNAs of the plasmid constructs pCSPP5, pCSPP6, pCSPP7 and pCSPP8 are zero. On the other hand, the GUS activities measured in the seeds of these same plants are respectively 60, 60, 30 and 12 times higher than the average GUS activity measured in the seeds of the plants transformed with the plasmid pBI121. It is also observed that the maximum expression of the uidA gene is obtained with the vectors pCSPP5 and pCSPP6, reaching on average 465 nmol of MU/min/mg of protein. From this observation, it can be concluded that the DNA fragment between nucleotides 1572 (5' end of the sequence SEQ ID NO: 15) and 1815 (5' end of the sequence SEQ ID NO: 16) of the sequence SEQ ID NO: 3 contains no sequence which is critical for the functioning of the coffee promoter. The more substantial deletions made in the promoter (corresponding to the vectors pCSPP7 and pCSPP8) result in a reduction in the level of expression of the uidA reporter gene, and this reduction is greater the more substantial the deletion. On the other hand, these deletions do not lead in any case to a loss of the specificity of expression of the promoter since, in all the transgenic plants analysed, the uidA reporter gene continues to be specifically expressed in the seeds. The measurements of the GUS activity show that the coffee DNA sequence between nucleotides 1572 and 2524 of the sequence SEQ ID NO: 3 described in the sequence listing hereinafter effectively contains a promoter which behaves like a very strong promoter in the tobacco seeds compared with the 35S promoter of CaMV.
It is also observed that this same DNA sequence, as well as the deletions derived therefrom, contain the information which is necessary and sufficient to direct the expression of the uidA reporter gene in the seeds of the transgenic tobacco plants at a level which is in all cases greater than that conferred by the reference promoter CaMV 35S.
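The fold values and absolute activities quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 60-fold figure and the 465 nmol average for pCSPP5 and pCSPP6 refer to the same pBI121 seed average; the activities derived for pCSPP7 and pCSPP8 are therefore inferences from the stated ratios, not values reported in the text.

```python
MAX_ACTIVITY_NMOL = 465          # average for pCSPP5 and pCSPP6 (nmol MU/min/mg)
FOLD_OVER_PBI121 = {"pCSPP5": 60, "pCSPP6": 60, "pCSPP7": 30, "pCSPP8": 12}
PBI121_RANGE_PMOL = (1500, 20000)  # range reported for pBI121 plants (pmol MU/min/mg)

# Implied pBI121 seed average in pmol MU/min/mg (1 nmol = 1000 pmol).
implied_pbi121_pmol = MAX_ACTIVITY_NMOL * 1000 / FOLD_OVER_PBI121["pCSPP5"]

# Activities implied for every construct by its fold value (inference only).
implied_activity_pmol = {
    name: fold * implied_pbi121_pmol for name, fold in FOLD_OVER_PBI121.items()
}

# Does the implied average fall inside the reported pBI121 range?
within_reported_range = (
    PBI121_RANGE_PMOL[0] <= implied_pbi121_pmol <= PBI121_RANGE_PMOL[1]
)
```

On these assumptions the implied pBI121 average is 7750 pmol MU/min/mg, which lies inside the reported range of 1500 to 20,000 pmol MU/min/mg.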
XII. Expression of the coffee 11S storage protein in Escherichia coli
To overexpress and purify the coffee 11S protein in Escherichia coli, a PCR amplification of the DNA sequence between nucleotides 108 and 1517 of the sequence SEQ ID NO: 1 is carried out with the aid of the oligonucleotide TAG1, having the sequence SEQ ID NO: 25, and the oligonucleotide TAG2, having the sequence SEQ ID NO: 26. These two sequences are described in the sequence listing hereinafter. These two oligonucleotides make it possible to introduce the unique EcoRI and PstI sites into the coffee sequence amplified by PCR. They also make it possible to amplify the coffee DNA sequence encoding the coffee storage protein but lacking its cellular addressing sequence, called "signal peptide", which is between amino acids 1 and 26 of the sequence SEQ ID NO: 2. This strategy was followed so as to limit the toxic effects due to an overexpression in Escherichia coli of proteins which contain very hydrophobic sequences.
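The coordinates used for this amplification can be related to the coding sequence annotated in the sequence listing (CDS 33..1508 of SEQ ID NO: 1). The following Python sketch, assuming 1-based inclusive positions, computes the number of codons in the CDS, the position of the first codon after the 26-residue signal peptide and the length of the amplified region. Variable names are illustrative.

```python
CDS = (33, 1508)             # coding sequence annotated for SEQ ID NO: 1
SIGNAL_PEPTIDE_RESIDUES = 26  # amino acids 1 to 26 of SEQ ID NO: 2
AMPLIFIED = (108, 1517)      # region amplified with TAG1 and TAG2

# Number of codons in the annotated CDS.
cds_length = CDS[1] - CDS[0] + 1
codons, remainder = divmod(cds_length, 3)

# First nucleotide of the codon for residue 27, the first residue
# after the signal peptide.
mature_start = CDS[0] + 3 * SIGNAL_PEPTIDE_RESIDUES

# Length of the amplified region.
amplified_length = AMPLIFIED[1] - AMPLIFIED[0] + 1
```

These figures give 492 codons, matching the 492 amino acids of SEQ ID NO: 2, a first post-signal codon starting at nucleotide 111, and an amplified region of 1410 bp, consistent with the fragment of about 1400 bp that is purified after digestion.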
This reaction is carried out in the presence of 50 ng of vector pCSP2, in a final volume of 100 μl containing 1.5 units of Pwo DNA polymerase (Boehringer Mannheim), 10 μl of 10X Pwo DNA polymerase buffer (Boehringer Mannheim), 0.1 mM of each dNTP and 2 nM of each oligonucleotide, TAG1 and TAG2. The reaction mixture is incubated for 30 cycles (94°C-30 s, 40°C-60 s, 72°C-2 min) followed by a final extension cycle at 72°C for 7 min. 30 μl of the PCR mixture are then digested with the restriction enzymes EcoRI and PstI according to the recommendations provided by Promega (USA). The coffee DNA fragment (1400 bp) amplified by PCR is separated by electrophoresis on a 0.8% agarose gel and it is purified according to the recommendations provided in the QIAquick Gel Extraction kit (Qiagen Inc, 9600 De Soto Avenue, Chatsworth, CA 91311, USA). It is then ligated into the expression vector pQE31 (Qiagen, USA) previously digested with the enzymes EcoRI and PstI and dephosphorylated by a calf intestinal alkaline phosphatase treatment. The use of the expression vector pQE31 makes it possible to introduce 6 histidines (6 His tag) in phase with the N-terminal end of the coffee 11S storage protein, which then facilitates the purification of this recombinant protein by passing it over a column of Ni-NTA resin containing Ni2+ ions (Hochuli et al., J. Chromatography, 411, 177-184, 1987).
The ligation mixture is used to transform competent cells of the strain M15[pREP4] of Escherichia coli according to the recommendations provided by Qiagen (USA) and the recombinant bacteria are selected on dishes with LB medium containing 25 μg/ml of kanamycin and 100 μg/ml of ampicillin. To test the expression of the coffee 11S storage protein in Escherichia coli, the bacteria are then cultured in 50 ml of liquid LB medium supplemented with the antibiotics as indicated above until an OD at 600 nm = 1 is obtained. The induction is carried out by adding IPTG to a final concentration of 1 mM into the culture medium and culture samples are collected every 30 minutes. The bacteria are lysed and the soluble proteins are extracted from Escherichia coli under denaturing conditions. These proteins are then separated on a column of Ni-NTA resin following the protocol defined by QIAGEN (QIAexpress system). The protein fractions successively eluted are then analysed by SDS-PAGE electrophoresis. It is thus shown that the only protein capable of binding to the Ni-NTA column corresponds to the coffee 11S recombinant protein. This protein is expressed in Escherichia coli with an approximate molecular weight of 55 kDa, which is in agreement with that observed in coffee beans for the storage protein in its precursor form, taking into consideration the protein sequence modifications which were carried out during the construction of the expression vector.
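A rough order-of-magnitude check of the observed molecular weight can be made from the residue count alone. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not a value from this document; the number of residues contributed by the vector tag (`TAG_RESIDUES`) is a hypothetical placeholder, since the exact tag and linker sequence is not given in the text.

```python
AVERAGE_RESIDUE_DA = 110   # rule-of-thumb average residue mass (approximation)
PRECURSOR_RESIDUES = 492   # length of SEQ ID NO: 2
SIGNAL_PEPTIDE_RESIDUES = 26
TAG_RESIDUES = 12          # hypothetical: His tag plus linker, not stated in the text


def estimate_kda(residues, residue_da=AVERAGE_RESIDUE_DA):
    """Approximate molecular weight in kDa from a residue count."""
    return residues * residue_da / 1000


# Coffee-derived part of the recombinant protein (signal peptide removed).
coffee_part_kda = estimate_kda(PRECURSOR_RESIDUES - SIGNAL_PEPTIDE_RESIDUES)

# Same protein with the hypothetical tag added.
tagged_kda = estimate_kda(PRECURSOR_RESIDUES - SIGNAL_PEPTIDE_RESIDUES + TAG_RESIDUES)
```

This crude estimate gives a little over 50 kDa, the same order as the approximately 55 kDa band described above; apparent sizes on SDS-PAGE commonly deviate from such estimates by several kDa.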
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: SOCIETE DES PRODUITS NESTLE
(B) STREET: AVENUE NESTLE 55
(C) CITY: VEVEY
(D) STATE OR PROVINCE: VAUD
(E) COUNTRY: SWITZERLAND
(F) POSTAL CODE: 1800
(G) TELEPHONE: 021/924 34 20
(H) TELEFAX: 021/924 28 80
(ii) TITLE OF THE INVENTION: COFFEE PROTEINS
(iii) NUMBER OF SEQUENCES: 26
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1706 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 33..1508
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1:
AAACACACTA CACTCTCCTC TGTTGTCAGA GA ATG GCT CAC TCT CAT ATG ATT 53
Met Ala His Ser His Met Ile
1 5
TCT CTT TCC TTG TAC GTT CTT TTG TTC CTC GGC TGT TTG GCT CAA CTA 101
Ser Leu Ser Leu Tyr Val Leu Leu Phe Leu Gly Cys Leu Ala Gln Leu
10 15 20
GGG AGA CCA CAG CCA AGG CTC AGG GGT AAA ACT CAG TGC GAT ATT CAG 149
Gly Arg Pro Gln Pro Arg Leu Arg Gly Lys Thr Gln Cys Asp Ile Gln
25 30 35
AAG CTT AAT GCA CAA GAA CCA TCC TTC AGG TTC CCA TCA GAG GCT GGT 197
Lys Leu Asn Ala Gln Glu Pro Ser Phe Arg Phe Pro Ser Glu Ala Gly
40 45 50 55
TTA ACT GAA TTC TGG GAT TCT AAT AAT CCA GAA TTT GGG TGC GCT GGT 245
Leu Thr Glu Phe Trp Asp Ser Asn Asn Pro Glu Phe Gly Cys Ala Gly
60 65 70
GTG GAA TTT GAG CGT AAC ACT GTC CAA CCT AAG GGC CTT CGT TTG CCT 293
Val Glu Phe Glu Arg Asn Thr Val Gln Pro Lys Gly Leu Arg Leu Pro
75 80 85
CAT TAC TCT AAC GTG CCT AAA TTC GTC TAC GTT GTC GAA GGT ACC GGT 341
His Tyr Ser Asn Val Pro Lys Phe Val Tyr Val Val Glu Gly Thr Gly
90 95 100
CTT CAA CCC ACT CTC ATC CCT CβT TCT CCT CAA ACA TTT CAA TCC CAG
Val Gln Gly Thr Val Ile Pro Gly Cys Ala Glu Thr Phe Glu Ser Gln
105 110 115
CCT SAA TCA TTT TGC GCT CCT CAC CAA CAG CCC CCC AAA CSS CAA SAA 437 Cly Clu Sar P « Trp Gly Gly Gin Glu Gin Pro Cly Lys Gly Gin Glu 120 125 130 135
GGC CAA SAC CAA GCT TCC AAA GGT CCT CAC SAA GGC CCA ACC CAA ACS 485 Gly Gin Slu n sly Sar Lys sly sly sin S u sly Arg A ? C n Arg 140 145 ISO TTT CCA CAC CSC CAT CAC AAC CTC ACA ASS TTC CAA AAA CCA CAT GTC 533 Phe Pro Aap Arg Hi- Gin Lys Leu xg Arg Phe Gin Lys Gly Aap Val 155 160 165
CTT ATA TTG CTT CCT CCT TTC ACT CAC TCG ACA TAT AAT GAT GGA GAT 5βl Lau lie Leu Leu Pro Gly Phe Thr Gin Trp Thr Tyr A»n Aap Gly Aap 170 175 180
GTT CCA CTT GTC ACT GTC GCA CTT CTT GAT GTT GCC AAT CAC GCT AAT 629 Val Pro Leu Val Thr Val Ala Leu Leu Aap Val Ala Aan Glu Ala Aan 185 190 195
CAC CTT CAT TTG CAC TCC AGO AAA TTT TTC CTA GCC CGA AAC CCS CAA 677
Gln Leu Asp Leu Gln Ser Arg Lys Phe Phe Leu Ala Gly Asn Pro Gln
200 205 210 215
CAG GGT CGT GGA AAG GAA GGC CAT CAA GCC CAG CAS CAG CAG CAT ACA 725
Gin Gly Gly Gly Lys Glu Gly His G n Gly Cln Gin Gin S n ___.• Arg 220 225 230 AAC ATC TTC TCA CGA TTT GAT GAC CAA CTT TTG CCT GAT GCT TTC AAT 773 Aan He Phe Ser Gly Phe Aap Asp Cln Lau Lau Ala Aap Ala Phe Asn 235 240 245
GTT GAC CTC AAA ATA ATA CAS AAA TTG AAC CGT CCS AAA GAT CAA AGG 821 Val Aap Leu Lys He He Gin Lys Leu Lys Gly Pro Lys Aap G n Arg 250 255 260
CCT AGC ACA CTC CGA CCT GAA AAA CTT CAA CTC TTC CTC CCT GAA TAT 869 Gly Ser Thr Val Arg Ala Slu Lys Leu Gin Leu Phe Leu Pro Glu Tyr 265 270 275
AGT GAG CAA GTG CAA CAA CCC CAA CAA CAG CAC GAC CAC CAA CAA CAT 917 Ser Glu Sin Val Cln Gin Pro Cln G n Cln Cln Glu Gin S n Cln Eli 280 285 290 295
GCT CTT GCA AGA SCA TCS ACA TCC AAC CGA CTT SAS SAA ACT TTC TSC Gly Val Gly Arg Gly Trp Arg Ser Aan Gly Leu Glu Glu Thr __eu Cys 300 305 310 ACC CTG AAC CTT AGT CAA AAC ATT CCC CTC CCC CAA CAC GCT CAT CTA 1013
Thr Val Lys Leu Ser Glu Aan Ha Cjly Lau Pro Gin Clu Ala Aap Val 315 320 325
TTC AAT CCT CCT CCT CGC CGC ATT ACC ACT CTT AAT AGC CAA AAG ATT 1061 Pha Aan Pro Arg Ala Gly Arg Ha Thr Thr Val Aan Sar Cln Lys Ha
330 335 340
CCT ATC CTC AGC AGC CTC CAA CTT. AGT GCA GAA AGA GGA TTC CTC TAC 1109 Pro II* Leu Ser Sar Lau Gin Lau Sar Ala Glu Ax? Gly Phe Lau Tyr 345 350 355
AGC AAT GCC ATT TTT GCA CCA CAC TGG AAT ATC AAT GCA CAT AAT GCC 1157
Ser Asn Ala Ile Phe Ala Pro His Trp Asn Ile Asn Ala His Asn Ala
360 365 370 375
CTG TAT GTG ATT AGA GGA AAT GCA AGA ATT CAG GTG GTG GAT CAC AAA 1205
Leu Tyr Val Ile Arg Gly Asn Ala Arg Ile Gln Val Val Asp His Lys
380 385 390
GGA AAC AAA GTT TTT GAC GAT GAA GTA AAA CAG GGT CAG CTA ATA ATT 1253
Gly Asn Lys Val Phe Asp Asp Glu Val Lys Gln Gly Gln Leu Ile Ile
395 400 405
GTG CCA CAA TAC TTT GCT GTG ATC AAG AAA GCT GGA AAC CAA GGA TTT 1301
Val Pro Gln Tyr Phe Ala Val Ile Lys Lys Ala Gly Asn Gln Gly Phe
410 415 420
GAG TAC GTT GCA TTC AAG ACG AAC GAC AAT GCC ATG ATT AAC CCA CTT 1349
Glu Tyr Val Ala Phe Lys Thr Asn Asp Asn Ala Met Ile Asn Pro Leu
425 430 435
GTT GGA AGA CTT TCG GCA TTT CGA GCA ATT CCT GAG GAA GTT TTG AGG 1397
Val Gly Arg Leu Ser Ala Phe Arg Ala Ile Pro Glu Glu Val Leu Arg
440 445 450 455
AGC TCT TTC CAA ATT TCC AGC GAG GAA GCT GAG GAA TTG AAG TAT GGA 1445
Ser Ser Phe Gln Ile Ser Ser Glu Glu Ala Glu Glu Leu Lys Tyr Gly
460 465 470
AGA CAG GAG CGT TTG CTT TTG AGT GAG CAG TCT CAG CAG GGG AAA AAG 1493
Arg Gln Glu Arg Leu Leu Leu Ser Glu Gln Ser Gln Gln Gly Lys Lys
475 480 485
AGA AGT TGC TTG AGC TAATTATGTA AAAATAATCG TATATTAGTC CATGCATAGT 1548
Arg Ser Cys Leu Ser
490
CTACCAACTA TATGTGTGAA TCTAATTCCA AAATAAAATG GTCAATGGAT GTAAAGACAT 1608
GGCAATCCAA GCCTTACTAC TGGCGTTGAT TGCGAGAAGT TTGATGTTTG GTGACCATGA 1668
GTCAATAATA AACTATGATA ATTAATGTAA AATTTTCC 1706
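The codon and amino acid lines of SEQ ID NO: 1 can be cross-checked against each other by translating a codon line with the standard genetic code. The sketch below does this for the codon line ending at position 1157 of the listing. The helper names (`CODON_TABLE`, `translate`) are illustrative and not part of the patent, and the amino acid line used for comparison is the listing as read after correcting character-recognition damage.

```python
# Standard genetic code in the usual compact form: bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(codon_line: str) -> list[str]:
    """Translate a space-separated line of codons into three-letter residues."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codon_line.split()]


# Codon line covering nucleotides 1110 to 1157 of SEQ ID NO: 1.
codon_line = "AGC AAT GCC ATT TTT GCA CCA CAC TGG AAT ATC AAT GCA CAT AAT GCC"
listed = "Ser Asn Ala Ile Phe Ala Pro His Trp Asn Ile Asn Ala His Asn Ala"

matches = translate(codon_line) == listed.split()
print(" ".join(translate(codon_line)))
print("agrees with listing:", matches)
```

The same check applied line by line would flag any codon whose printed residue does not agree with its translation, which is one way to locate character-recognition errors in a scanned listing.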
(2) INFORMATION FOR SEQ ID NO: 2: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 492 amino acids
(B) TYPE: amino acid (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2:
Met Ala His Ser His Met Ile Ser Leu Ser Leu Tyr Val Leu Leu Phe
1               5                   10                  15
Leu Gly Cys Leu Ala Gln Leu Gly Arg Pro Gln Pro Arg Leu Arg Gly
            20                  25                  30
Lys Thr Gln Cys Asp Ile Gln Lys Leu Asn Ala Gln Glu Pro Ser Phe
        35                  40                  45
Arg Phe Pro Ser Glu Ala Gly Leu Thr Glu Phe Trp Asp Ser Asn Asn
    50                  55                  60
Pro Glu Phe Gly Cys Ala Gly Val Glu Phe Glu Arg Asn Thr Val Gln
65                  70                  75                  80
Pro Lys Gly Leu Arg Leu Pro His Tyr Ser Asn Val Pro Lys Phe Val
                85                  90                  95
Tyr Val Val Glu Gly Thr Gly Val Gln Gly Thr Val Ile Pro Gly Cys
            100                 105                 110
Ala Glu Thr Phe Glu Ser Gln Gly Glu Ser Phe Trp Gly Gly Gln Glu
        115                 120                 125
Gln Pro Gly Lys Gly Gln Glu Gly Gln Glu Gln Gly Ser Lys Gly Gly
    130                 135                 140
Gln Glu Gly Arg Arg Gln Arg Phe Pro Asp Arg His Gln Lys Leu Arg
145                 150                 155                 160
Arg Phe Gln Lys Gly Asp Val Leu Ile Leu Leu Pro Gly Phe Thr Gln
                165                 170                 175
Trp Thr Tyr Asn Asp Gly Asp Val Pro Leu Val Thr Val Ala Leu Leu
            180                 185                 190
Asp Val Ala Asn Glu Ala Asn Gln Leu Asp Leu Gln Ser Arg Lys Phe
        195                 200                 205
Phe Leu Ala Gly Asn Pro Gln Gln Gly Gly Gly Lys Glu Gly His Gln
    210                 215                 220
Gly Gln Gln Gln Gln His Arg Asn Ile Phe Ser Gly Phe Asp Asp Gln
225                 230                 235                 240
Leu Leu Ala Asp Ala Phe Asn Val Asp Leu Lys Ile Ile Gln Lys Leu
                245                 250                 255
Lys Gly Pro Lys Asp Gln Arg Gly Ser Thr Val Arg Ala Glu Lys Leu
            260                 265                 270
Gln Leu Phe Leu Pro Glu Tyr Ser Glu Gln Val Gln Gln Pro Gln Gln
        275                 280                 285
Gln Gln Glu Gln Gln Gln His Gly Val Gly Arg Gly Trp Arg Ser Asn
    290                 295                 300
Gly Leu Glu Glu Thr Leu Cys Thr Val Lys Leu Ser Glu Asn Ile Gly
305                 310                 315                 320
Leu Pro Gln Glu Ala Asp Val Phe Asn Pro Arg Ala Gly Arg Ile Thr
                325                 330                 335
Thr Val Asn Ser Gln Lys Ile Pro Ile Leu Ser Ser Leu Gln Leu Ser
            340                 345                 350
Ala Glu Arg Gly Phe Leu Tyr Ser Asn Ala Ile Phe Ala Pro His Trp
        355                 360                 365
Asn Ile Asn Ala His Asn Ala Leu Tyr Val Ile Arg Gly Asn Ala Arg
    370                 375                 380
Ile Gln Val Val Asp His Lys Gly Asn Lys Val Phe Asp Asp Glu Val
385                 390                 395                 400
Lys Gln Gly Gln Leu Ile Ile Val Pro Gln Tyr Phe Ala Val Ile Lys
                405                 410                 415
Lys Ala Gly Asn Gln Gly Phe Glu Tyr Val Ala Phe Lys Thr Asn Asp
            420                 425                 430
Asn Ala Met Ile Asn Pro Leu Val Gly Arg Leu Ser Ala Phe Arg Ala
        435                 440                 445
Ile Pro Glu Glu Val Leu Arg Ser Ser Phe Gln Ile Ser Ser Glu Glu
    450                 455                 460
Ala Glu Glu Leu Lys Tyr Gly Arg Gln Glu Arg Leu Leu Leu Ser Glu
465                 470                 475                 480
Gln Ser Gln Gln Gly Lys Lys Arg Ser Cys Leu Ser
                485                 490
(2) INFORMATION FOR SEQ ID NO : 3: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3477 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (ix) FEATURE:
(A) NAME/KEY: promoter
(B) LOCATION: 1..2509
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TTTAAAAGTT TGGTAGAAAA TTTGAGATAT TTGACGTCTA CGAGGCCAGA TATCAATTTG 60
TGTTGGTGT GATTAACAAG TTTATGAAAA ATCCACGTCA GTCACATTTG TAG5CGGTTA 120
AGAAGATTTT GAAATATATT GAAAGTACTC ACAGTGTTGG CATTTTTTAT TCAGAAAATT 180
ATCCAGTTGA ATGGTTTGGC TACTGATAGT TATTGAGCAG GTGATACAAT AGAGAAGAAG 240
ACTACTTCAA GTTATβCATT TTTTATTGGT. .CTGACGTAT TTTCCTCGAG TTCAAAGAAA 300
CAACAGGTGA TTGCATTGTT TACAGCAGAA GCAGAGTATA TTGCAGCGGC TAAXAGTGCT 360
AATCAAGTTT TGTGGTTACG TTGCATGTTT GGTATTCTAC AATACAAGCA GGTTGATCTT 420
ACGAΛAATTT ATTGTGATAG TΛAGTCAGCT ATTGAATTGT CCAAGAAXTT AGTACTTCAT 480
GGATGTAACA AGCλXATTGG CATCAAA AT CACTTCA AC CTSAGT GCT TCGAGAGAGA 540
GAGAGASGT -AAATTGATT ATTGCAGAAT TAAATAGTAA GTSSCTGACA TTTTCΛCCAA 600
GACATTGAAG ATAGAGATTT TTGTCAAGTT GAAGAATAXG TTASGCATGT CCAAGTTAGA 660
GGAGATTCGT TTAAXGGAGC CAAXATAGAA ACACAAACCA AGCCTTTATT AT TG T AT 720
GCTGTCATGT GGGATTGGTA GTAGTATTGT TGGTTGGTAG GGTββTCA A TGGGATTGAA 780
TTTCCTA GA CTΛSTASλST TAGTAΛXAGA AGTTAGCCG CAAGGGGTTT TGATGTGTAG 840
CTG TGCGTC CGTCTT TT AGCCTTAAGA AGAAGTAGTC ACCXCCGTTG TCT CTGCAT 900
GGTGTAGCAG AGCCTTGTTA TGATTAATAG AAAATTTTCC TTTGCCTCAA TATCGTTTTT 960
TTTTTTTATT GTTTCTGTGG GTTTTGTGTA TTTATCAATT TGGGTCCACC ACTTTTTCCA 1020
ACCATGATCT TAAGCATATC ACCTCACTTC CACTCTATTT CTTTACCATG ATTTTAACTA 1080
CAATAATTTC CTAAAAAACC AAAAAAGAAT CATATCTATA AATTTTGAGA AAAGCATATA 1140
TACTGCTAAC ATGATTCTAC GTATAATAGT GGATTATTAA AAATTATTAA TTACATATTT 1200
TTGACATAAC CATCGGTGTA CCAAAACCAC TAATGATTAC AACACTAAAC ACCCAAAGTT 1260
GAGTAATTGA AACTGAAATT ACACATAGAC AAAAACTCAA CTAAACAATG TTAGAATGGA 1320
ATAGATTAGA GAACCATTGA ATGATCTAAC TCTGGAACTG GGGTTAAGAC AGTCTTCCCA 1380
AGCAACTTTT TTTGTCCATG ATTTGGCTAT CATATCACTA TCTTGAAATT TGTTCAGACA 1440
CACTGTGGGA GGCTGGAATC AATAGCTTGG ACTTGGATCA TTTAXASAAC CTSATGATCA 1500
TTATTGCTCA ACATATGAAT TTGATACAAA TGTCAGTGGA ATCAACTTCG TACTTTTTTT 1560
TTTTCCTCTT TTCTTTTGGA GTACAAGCCT ACCTACAAGG GGAAGGATAG AGGAAATGCA 1620
TAGAGGGAGA TTTAACCTCT ACCCAAGCGG CAGATACAAT GGGTCACGAT ACAGCTGGTT 1680
TATTGATGTA TTACAGCGGA AAACGATGTA GATGAGCAAC CTTTTCAAAG AACATAACTC 1740
AAAATCATAG ATGTAAAGCA GTCAACTGAG TCTGTGGCAA TTGTTAGACC TAAAACTCTA 1800
TTCCATGTCA TTATTAGGTT TCTTGCTCTA TCTTTTAGTT TGATCCAACA TGGATTGGCT 1860
GTCTTTTGTT TGCTAATAAA GATTTTAAAT CATGGAATTT CCCTGTAGAA TGCCTTTAAT 1920
TACATGCCAC TAGACTAGAA ACGGTAATTG TTTAACAGAT ATT ATTCCA GGCATTGAAA 1980
TTATGAACTG CAACAGTCAT TTGCCTAGAA GTGTAAACCA ATTGTCTTCA ATAAAGGTGA 2040
ATAAAAATCG ATGAAGATAG ATAGGTGCTA_ GAAACTTAAA AGCAGAAGAT GAXAGGTGTG 2100
ATGTAATACG CAGCAGTAGT GATCATCTTT CCATATCACA TCTTGAAAGA TCCCAAGATG 2160
AATGTGTGTT TGATTTGGGG TTTGATTCAT CAAAAGCCAT CGTAGCAGAT AATGCACCTT 2220
ACCATGCCAT TGCTAAAfiTA CλAAλATTTC ATGCAAATAC AAACACAAAA GATTGAACAA 2280
TACATGTCAG AAACTCTATG CCACCAAGGC TTACACATCA TCTTTGGTGT AAAGAAGTGT 2340
TCATCTTCAT CAGCCATGCA CAAGACTGAS TAGCCAAGTG TAAAATGAAA ATTTTGACGT 2400
GTCGATTCCT CATCTTCCAT TACATGTTAT AAAAGGAGCC ATTTCCAAGC TCTAATCGCC 2460
GCATCCCCTC ACCACAA AA CACACTjACAC TCTCCTCTGT TGTCAGAGA TGGCTCACTC 2520
TCATATGATT TCTCTTTCCT TGTACGTTCT TTTGTTCCTC GGCTGTTTGG CTCAACTAGG 2580
GAGACCACAG CCAAGGCTCA GGGGTAAAAC TCAGTGCGAT ATTCAGAAGC TTAATGCACA 2640
AGAACCATCC TTCAGGTTCC CATCAGAGGC TGGTTTAACT GAATTCTGGG ATTCTAATAA 2700
TCCAGAATTT GGGTGCGCTG GTGTGGAATT TGAGCGTAAC ACTGTCCAAC CTAAGGGCCT 2760
TCGTTTGCCT CATTACTCTA ACGTGCCTAA ATTCGTCTAC GTTGTCGAAG GCAGTTTCAT 2820
TTCCCATCCT TTCCATTATT TCTGGAGTTT TTTTTCTATT TTCTTCTTAA TCATCGTATT 2880
ATTCATTTTC TTCATGATTT AATCATTTTG GCATAATGCA GGTACCGGTG TTCAAGGCAC 2940
TGTGATCCCT GGTTGTGCTG AAACATTTGA ATCCCAGGGT GAATCATTTT GGGGTGGTCA 3000
GGAACAGCCG GGCAAAGGGC AAGAAGGCCA AGAGCAAGGT TCCAAAGGTG GTCAGGAAGG 3060
GCGAAGGCAA AGGTTTCCAG ACCGCCATCA GAAGCTCAGA AGGTTCCAAA AAGGAGATGT 3120
CCTTATATTG CTTCCTGGTT TCACTCAGTG GACATATAAT GATGGAGATG TTCCACTTGT 3180
CACTGTCACA CTTCTTGATG TTGCCAATGA CGTGAATCAG CTTGATTTGC AGTCCAGGGT 3240
AAGAAAACTT TCAATCCAAA CTTGCCAAGT ATTAATCAAA AAATAATCTC TTTCTGGGCA 3300
TATTTTATTG CGGTACCATC TTAATAAAAA AAAAATTTTA TACTTTCAGA AATTTTTCCT 3360
AGCCGGAAAC CCGCAACAGG GTGGTGGAAA GGAAGGCCAT CAAGGCCAGC AGCAGCAGCA 3420
TAGAAACATC TTCTCAGGAT TTGATGACCA CTTTTGGCTG ATGCTTTCAA TGTTGAC 3477
(2) INFORMATION FOR SEQ ID NO: 4: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "nucleotide" (xi) SEQUENCE DESCRIPTION: SEQ ID NO : 4:
GCNGAYGTNT TYAAYCC 17
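SEQ ID NO: 4 is written with IUPAC ambiguity codes (N for any base, Y for C or T), so it stands for a pool of oligonucleotides rather than a single one. The sketch below counts and enumerates that pool. The function names `degeneracy` and `expand` are illustrative only and are not taken from the patent.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligo."""
    return prod(len(IUPAC[base]) for base in oligo)


def expand(oligo: str):
    """Yield every concrete sequence represented by a degenerate oligo."""
    for combo in product(*(IUPAC[base] for base in oligo)):
        yield "".join(combo)


SEQ4 = "GCNGAYGTNTTYAAYCC"  # SEQ ID NO: 4 with the spacing removed

print(len(SEQ4), "nt,", degeneracy(SEQ4), "variants")
print(next(expand(SEQ4)))  # first member of the pool
```

Read as codons (GCN GAY GTN TTY AAY CC), the oligonucleotide covers every way of encoding Ala Asp Val Phe Asn followed by the first two bases of a Pro codon, a run of residues that appears in SEQ ID NO: 2 around positions 325 to 330.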
(2) INFORMATION FOR SEQ ID NO: 5: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AAACATTGGC CTCCCCC 17
(2) INFORMATION FOR SEQ ID NO : 6: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CCAAACATCA AACTTCTCG 19
(2) INFORMATION FOR SEQ ID NO: 7: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7:
GAGAAATCAT ATGAGAGTGA GCC 23
(2) INFORMATION FOR SEQ ID NO : 8: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TTCTTTTGTT CCTCGGCTGT TTG 23
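Whether an oligonucleotide anneals to the listed strand or to its complement can be checked by searching a template for the oligonucleotide and for its reverse complement. The sketch below does this for SEQ ID NO: 7 and SEQ ID NO: 8 against a short fragment of SEQ ID NO: 3 (positions 2511 to 2580 as printed in the listing). The names `reverse_complement` and `locate` are illustrative and not part of the patent; offsets are 0-based within the fragment.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate(primer: str, template: str):
    """Return (strand, offset) of the primer site in the template, or None."""
    forward = template.find(primer)
    if forward != -1:
        return ("+", forward)
    reverse = template.find(reverse_complement(primer))
    if reverse != -1:
        return ("-", reverse)
    return None


# Fragment of SEQ ID NO: 3, positions 2511-2580, spacing removed.
TEMPLATE = (
    "TGGCTCACTC"
    "TCATATGATTTCTCTTTCCTTGTACGTTCTTTTGTTCCTCGGCTGTTTGGCTCAACTAGG"
)
SEQ7 = "GAGAAATCATATGAGAGTGAGCC"  # SEQ ID NO: 7
SEQ8 = "TTCTTTTGTTCCTCGGCTGTTTG"  # SEQ ID NO: 8

print("SEQ ID NO: 7 ->", locate(SEQ7, TEMPLATE))
print("SEQ ID NO: 8 ->", locate(SEQ8, TEMPLATE))
```

In this fragment SEQ ID NO: 7 is found on the complementary strand and SEQ ID NO: 8 on the listed strand, with the SEQ ID NO: 7 site lying upstream, so the two oligonucleotides point away from each other.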
(2) INFORMATION FOR SEQ ID NO: 9: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GTGAGCCATT CTCTGAC 17
(2) INFORMATION FOR SEQ ID NO: 10: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
AGTTTGATCC AACATGGATT GGC 23
(2) INFORMATION FOR SEQ ID NO : 11: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
GCAAGAAACC TAATAATGAC ATGG 24
(2) INFORMATION FOR SEQ ID NO: 12: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CCTCTTTTCT TTTGGAGTAC 20
(2) INFORMATION FOR SEQ ID NO: 13: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CGCGGATCCG CG 12
(2) INFORMATION FOR SEQ ID NO: 14: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 33 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CGCGGATCCG CGATGAGAGT GAGCCATTCT CTG 33
(2) INFORMATION FOR SEQ ID NO: 15: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CGCGGATCCG CGCCTCTTTT CTTTTGGAGT ACAAG 35
(2) INFORMATION FOR SEQ ID NO : 16: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO : 16:
CGCGGATCCG CGTAGGTTTC TTGCTCTATC TTTTAG 36
(2) INFORMATION FOR SEQ ID NO: 17: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CGCGGATCCG CGGTGCTAGA AACTTAAAAG CAGAAG 36
(2) INFORMATION FOR SEQ ID NO : 18: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CGCGGATCCG CGACAAAAGA TTGAACAATA CATGTC 36
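SEQ ID NOs: 14 to 18 all begin with the 12-base sequence of SEQ ID NO: 13, which contains the BamHI recognition site GGATCC and is its own reverse complement. The sketch below separates that leading segment from the remainder of each oligonucleotide. The names `LINKER`, `PRIMERS` and `split_linker` are illustrative and not taken from the patent.

```python
LINKER = "CGCGGATCCGCG"  # SEQ ID NO: 13
BAMHI_SITE = "GGATCC"    # BamHI recognition sequence

# SEQ ID NOs: 14-18 with the spacing of the listing removed.
PRIMERS = {
    14: "CGCGGATCCGCGATGAGAGTGAGCCATTCTCTG",
    15: "CGCGGATCCGCGCCTCTTTTCTTTTGGAGTACAAG",
    16: "CGCGGATCCGCGTAGGTTTCTTGCTCTATCTTTTAG",
    17: "CGCGGATCCGCGGTGCTAGAAACTTAAAAGCAGAAG",
    18: "CGCGGATCCGCGACAAAAGATTGAACAATACATGTC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def split_linker(primer: str, linker: str = LINKER) -> str:
    """Return the part of the primer that follows the leading linker."""
    if not primer.startswith(linker):
        raise ValueError("primer does not start with the linker")
    return primer[len(linker):]


print("linker palindromic:", is_palindromic(LINKER))
print("linker carries BamHI site:", BAMHI_SITE in LINKER)
for number, primer in PRIMERS.items():
    tail = split_linker(primer)
    print(f"SEQ ID NO: {number}: {len(primer)} nt, tail {tail}")
```

The total lengths computed here (33, 35, 36, 36 and 36 nt) can be compared with the LENGTH fields of the corresponding entries as a quick consistency check on the listing.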
(2) INFORMATION FOR SEQ ID NO: 19: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TTTGATTTCA CGGGTTGGGG TTTCTACAGG 30
(2) INFORMATION FOR SEQ ID NO : 20: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
GGCTCGTATG TTGTGTGGAA TTGTGAGCGG 30
(2) INFORMATION FOR SEQ ID NO : 21: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO : 21:
ATGTTACGTC CTGTAGAA 18
(2) INFORMATION FOR SEQ ID NO: 22: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCAAAGTCCC GCTAGTGC 18
(2) INFORMATION FOR SEQ ID NO : 23: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CTGGATCGTT TCGCATG 17
(2) INFORMATION FOR SEQ ID NO : 24: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCAGAGTCCC GCTCAG 16
(2) INFORMATION FOR SEQ ID NO : 25: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ACTAGGGGAT CCACAGCCAA GGCTCAGGGG 30
(2) INFORMATION FOR SEQ ID NO : 26: (i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "OLIGONUCLEOTIDE" (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GTACTCTGCA GACATAATTA GCTCAAGCAA CTTCCC 36

Claims

1. DNA derived from the coffee bean, encoding at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
2. DNA according to Claim 1, encoding at least one protein chosen from the group comprising the storage protein αβ, having the amino acid sequence SEQ ID NO: 2, the cleavage protein α, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 1 to 304, and the cleavage protein β, delimited in the amino acid sequence SEQ ID NO: 2 by amino acids 305 to 492.
3. DNA according to Claim 1, whose sequence is delimited by nucleotides 33 to 1508, 33 to 944 and/or 945 to 1508 of the nucleic sequence SEQ ID NO: 1, or any nucleic sequences homologous to these sequences.
4. Recombinant storage proteins derived from the coffee bean, having at least 20 consecutive amino acids of the amino acid sequence SEQ ID NO: 2.
5. Storage proteins according to Claim 4, having the amino acid sequence SEQ ID NO: 2 (storage protein αβ), the sequence delimited by amino acids 1 to 304 of the sequence SEQ ID NO: 2 (storage protein α), the sequence delimited by amino acids 305 to 492 of the sequence SEQ ID NO: 2 (storage protein β), or any amino acid sequences homologous to these sequences.
6. Proteins according to Claim 5, characterized in that they are polymerized, independently or with each other.
7. All or part of the DNA delimited by nucleotides 1 to 2509 of the nucleic sequence SEQ ID NO: 3, capable of regulating the transcription of the storage proteins according to Claim 5.
8. Use of all or part of the DNA according to Claim 7, to direct the expression of a gene of interest in a plant.
9. Use of all or part of the DNA delimited by nucleotides 33 to 1508 of the sequence SEQ ID NO: 1 or of its complementary strand, of at least 10 bp, as primer to carry out a PCR or as probe to detect in vitro or to modify in vivo at least one coffee bean gene encoding at least one storage protein.
10. Recombinant plant cells capable of expressing a recombinant storage protein according to Claim 5.
11. Plants or seeds consisting of plant cells according to Claim 10.
12. Food, cosmetic or pharmaceutical composition comprising a DNA according to one of Claims 1 to 3, or a recombinant protein according to one of Claims 4 to 6.
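The nucleotide ranges recited in Claim 3 and the amino acid ranges recited in Claims 2 and 5 can be checked against each other with simple arithmetic, since each codon spans three nucleotides. The following is a minimal sketch of that check. The names `NUCLEOTIDE_RANGES`, `RESIDUE_RANGES` and `codon_count` are illustrative and do not come from the patent.

```python
# Inclusive nucleotide ranges of SEQ ID NO: 1 recited in Claim 3.
NUCLEOTIDE_RANGES = {
    "alpha-beta": (33, 1508),
    "alpha": (33, 944),
    "beta": (945, 1508),
}

# Inclusive residue ranges of SEQ ID NO: 2 recited in Claims 2 and 5.
RESIDUE_RANGES = {
    "alpha-beta": (1, 492),
    "alpha": (1, 304),
    "beta": (305, 492),
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    codons, remainder = divmod(end - start + 1, 3)
    if remainder:
        raise ValueError("range is not a whole number of codons")
    return codons


for name, (nt_start, nt_end) in NUCLEOTIDE_RANGES.items():
    aa_start, aa_end = RESIDUE_RANGES[name]
    residues = aa_end - aa_start + 1
    codons = codon_count(nt_start, nt_end)
    print(f"{name}: {codons} codons, {residues} residues, consistent={codons == residues}")
```

Under these ranges the α and β segments account for 304 and 188 codons, which together make up the 492 codons of the full αβ coding region.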
PCT/EP1998/004038 1997-07-12 1998-06-25 Coffee storage proteins WO1999002688A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
CA002295320A CA2295320A1 (en) 1997-07-12 1998-06-25 Coffee storage proteins
AU87309/98A AU756433B2 (en) 1997-07-12 1998-06-25 Coffee storage proteins
US09/462,720 US6617433B1 (en) 1997-07-12 1998-06-25 Coffee storage proteins
EP98938679A EP1007683A1 (en) 1997-07-12 1998-06-25 Coffee storage proteins
BRPI9811690-8A BR9811690B1 (en) 1997-07-12 1998-06-25 use of all or part of DNA which encodes and regulates the expression of reserve proteins.
JP2000502184A JP2001509386A (en) 1997-07-12 1998-06-25 Coffee storage protein

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP97202183.6 1997-07-12
EP97202183 1997-07-12

Publications (1)

Publication Number Publication Date
WO1999002688A1 true WO1999002688A1 (en) 1999-01-21

Family

ID=8228554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1998/004038 WO1999002688A1 (en) 1997-07-12 1998-06-25 Coffee storage proteins

Country Status (7)

Country Link
US (1) US6617433B1 (en)
EP (1) EP1007683A1 (en)
JP (1) JP2001509386A (en)
AU (1) AU756433B2 (en)
BR (1) BR9811690B1 (en)
CA (1) CA2295320A1 (en)
WO (1) WO1999002688A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10048840A1 (en) * 2000-10-02 2002-04-11 Biotechnolog Forschung Gmbh Use of lipopeptides or lipoproteins to treat lung infections and tumors
US8685735B2 (en) 2008-06-23 2014-04-01 Nestec S.A. Genes for modulating coffee maturation and methods for their use

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0295959A2 (en) * 1987-06-19 1988-12-21 Plant Cell Research Institute, Inc. Sulphur-rich protein from bertholletia excelsa H.B.K.
WO1991019801A1 (en) * 1990-06-11 1991-12-26 Mars Uk Limited RECOMBINANT 47 AND 31kD COCOA PROTEINS AND PRECURSOR
WO1992017580A1 (en) * 1991-04-08 1992-10-15 Rhone-Poulenc Agrochimie Chimeric plant genes based on upstream regulatory elements of helianthinin

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ACUNA R. ET AL.: "11S storage globulins from coffee, AC U64443", EMBL DATABASE, 25 January 1997 (1997-01-25), HEIDELBERG, XP002049318 *
ACUNA R. ET AL.: "AC P93079", EMBL DATABASE, 1 May 1997 (1997-05-01), HEIDELBERG, XP002084347 *
ROGERS W J ET AL: "An 11S -type storage protein from Coffea arabica L. endosperm: Biochemical characterization, promoter function and expression during grain maturation.", ASSOCIATION SCIENTIFIQUE INTERNATIONALE DU CAFE. 17TH INTERNATIONAL SCIENTIFIC COLLOQUIUM ON COFFEE, NAIROBI, KENYA, JULY 20-25, 1997. 828P. ASSOCIATION SCIENTIFIQUE INTERNATIONALE DU CAFE (ASIC): PARIS, FRANCE. 0 (0). 1997. 161-168. ISBN: 2-900212-1, XP002084348 *
YUFFA A M ET AL: "COMPARATIVE STUDY OF PROTEIN ELECTROPHORETIC PATTERNS DURING EMBRYOGENESIS IN COFFEA ARABICA CV CATIMOR", PLANT CELL REPORTS, vol. 13, no. 3/04, 1994, pages 197 - 202, XP000614722 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6903247B2 (en) 2000-02-08 2005-06-07 Cornell Research Foundation, Inc. Constitutive α-Tubulin promoter from coffee plants and uses thereof
US6441273B1 (en) 2000-02-25 2002-08-27 Cornell Research Foundation, Inc. Constitutive and inducible promoters from coffee plants
US7153953B2 (en) 2001-05-11 2006-12-26 Nestec S.A. Leaf specific gene promoter of coffee
US20220402959A1 (en) * 2014-04-25 2022-12-22 Translate Bio, Inc. Methods for purification of messenger rna
US11884692B2 (en) 2014-04-25 2024-01-30 Translate Bio, Inc. Methods for purification of messenger RNA

Also Published As

Publication number Publication date
CA2295320A1 (en) 1999-01-21
JP2001509386A (en) 2001-07-24
BR9811690A (en) 2000-09-26
BR9811690B1 (en) 2009-05-05
AU8730998A (en) 1999-02-08
US6617433B1 (en) 2003-09-09
EP1007683A1 (en) 2000-06-14
AU756433B2 (en) 2003-01-16

Similar Documents

Publication Publication Date Title
EP0686197B1 (en) Geminivirus-based gene expression system
CA2285687C (en) An oleosin 5' regulatory region for the modification of plant seed lipid composition
US20030097689A1 (en) Seed-preferred promoters from end genes
US20050198702A1 (en) Seed-preferred promoter from maize
EP1104469A1 (en) Seed-preferred promoters
CA2285707A1 (en) A sunflower albumin 5' regulatory region for the modification of plant seed lipid composition
US20020148007A1 (en) Seed-preferred promoter from barley
US20030106089A1 (en) Cotton fiber transcriptional factors
JP2003525030A (en) Flax seed-specific promoter
US7153953B2 (en) Leaf specific gene promoter of coffee
US5650303A (en) Geminivirus-based gene expression system
US6617433B1 (en) Coffee storage proteins
AU782957B2 (en) Plant seed endosperm-specific promoter
WO1998018948A1 (en) Flax promoters for manipulating gene expression
US6784342B1 (en) Regulation of embryonic transcription in plants
US5981236A (en) Geminivirus-based gene expression system
US20010047091A1 (en) Cryptic regulatory elements obtained from plants
US6930182B1 (en) Composition and methods of using the Mirabilis mosaic caulimovirus sub-genomic transcript (Sgt) promoter for plant genetic engineering
MXPA00000743A (en) Coffee storage proteins
JP4359963B2 (en) Plant promoter
Kim et al. Analysis of the Glycinin Gy2 Promoter Activity in Soybean Protoplasts and Transgenic Tobacco Plants
AU1468401A (en) Seed-preferred promoter from barley

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU BG BR CA CN CZ HU ID IL JP KR MX NO NZ PL RO RU SG TR US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1998938679

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2295320

Country of ref document: CA

Ref country code: CA

Ref document number: 2295320

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 87309/98

Country of ref document: AU

NENP Non-entry into the national phase

Ref country code: KR

WWE Wipo information: entry into national phase

Ref document number: PA/a/2000/000743

Country of ref document: MX

WWE Wipo information: entry into national phase

Ref document number: 09462720

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1998938679

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 87309/98

Country of ref document: AU