WO2002099105A2 - Methods for modifying the cpg content of polynucleotides - Google Patents


Publication number
WO2002099105A2
Authority
WO
WIPO (PCT)
Prior art keywords
polynucleotide
codon
written
codons
sequence
Application number
PCT/EP2002/006043
Other languages
French (fr)
Other versions
WO2002099105A3 (en)
Inventor
André CHOULIKA
Arnaud Perrin
Jean Charles Epinat
Alexandre Zanghellini
Original Assignee
Cellectis
Application filed by Cellectis
Priority to AU2002317771A
Publication of WO2002099105A2
Publication of WO2002099105A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/67 General methods for enhancing the expression
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Definitions

  • the present invention relates to a process for (re)writing a polynucleotide sequence containing a coding sequence, whereby the content of CpG dinucleotides is adjusted to a predetermined value.
  • These polynucleotides are useful to increase, stabilize, silence and/or reduce gene expression, in particular for protein production, for generating transgenic animals and transgenic plants, and for gene therapy.
  • the present invention also relates to processes for producing or stably expressing these (re)written polynucleotides in in vitro and in vivo expression systems.
  • DNA methylation in eukaryotes involves the addition of a methyl group to the carbon 5 position of the cytosine ring. It is the most common eukaryotic DNA modification and a widespread epigenetic phenomenon. Eukaryotic genomes are not methylated uniformly but contain methylated regions interspersed with unmethylated domains. In eukaryotes, numerous studies have shown that methylation of 5'CpG3' dinucleotides (mCpG) has a repressive effect on gene expression in vertebrates and flowering plants (Hsieh, Mol. Cell. Biol., 14:5467-94, 1994; Kudo, Mol. Cell. Biol., 18:5492-99, 1998; Goto and Monk, Microbiol. Mol. Biol.
  • mCpG: methylated 5'CpG3' dinucleotides
  • CpG methylation is primarily associated with transcriptional repression.
  • Tissue-specific genes are variably methylated, often in a tissue-specific pattern, and usually the methylation level is inversely correlated with the transcriptional status of the genes.
  • the methylation of 5'CpG3' dinucleotides within genes creates potential targets for protein complexes that bind to methylated DNA sequences and to histone deacetylases (MBD-
  • DNA hypermethylation may contribute to tumorigenicity by silencing the expression of genes required to maintain a normal cell phenotype. Methylation has been demonstrated as a mechanism for inactivating several tumor-suppressor genes. Similarly, cancer metastasis and invasion are closely associated with the phenomenon of cell-to-cell adhesiveness. The gene encoding an invasion-suppressor protein (E-cadherin) was silenced by hypermethylation of its promoter region in human carcinomas and in human breast cancer cells.
  • the methylation of CpG dinucleotides also contributes to C-to-T mutations, as demonstrated for example for the p53 gene. So far, genetic engineering has always been performed so that natural gene regulation is maintained, and therefore the CpG content is preserved. Indeed, the silencing effect of CpG methylation is not a problem when a gene is to be expressed in its natural host.
  • the foreign DNA can be an exogenous gene or an endogenous gene which is not expressed in a differentiated cell.
  • the inactivation of foreign gene expression by methylation in specific cell types has important economic, therapeutic and pharmacological implications.
  • the expression of an introduced gene needs to be stable in transgenic animals or plants and in gene therapy. Therefore, methylation of introduced genes, which leads to their silencing, interferes with the therapeutic effect and restricts the use of transgenic animals and plants.
  • a solution to control the stability of gene expression is the adjustment of the CpG dinucleotides content.
  • This adjustment of the CpG dinucleotide content, i.e. removal of CpG dinucleotides in a eukaryotic host
  • the gene has to be rewritten and the DNA synthesized. The rewritten genes, with a decreased CpG dinucleotide content, could then escape silencing by hypermethylation.
  • the adjustment of the CpG dinucleotide content, i.e. an increase of CpG dinucleotides in a eukaryotic host
  • the present invention aims to remove the barrier to expression that exists between organisms of different genera and species. This is achieved by modifying the codon composition of the coding sequence so that it matches the codon usage of a particular host.
  • the present invention also provides optimization of the sequence to match the codon usage of the host organism, in order to achieve high expression.
  • the instant application aims to facilitate manipulation of the (re)written gene by allowing restriction enzyme sites to be removed or inserted.
  • the invention concerns a method of (re)writing a polynucleotide containing a coding sequence, typically a sequence coding for a polypeptide.
  • said (re)written polynucleotide has a predetermined content of CpG dinucleotides.
  • the content of CpG dinucleotides is minimized.
  • said (re)written polynucleotide encodes said polypeptide with the most frequent codons used in a host or in a group of hosts.
  • said (re)written polynucleotide contains codons so that the codon composition of the encoding sequence meets the codon usage table of a selected host or group of hosts.
  • said (re)written polynucleotide does not comprise any undesired restriction site.
  • some restriction sites are introduced in said (re)written polynucleotide.
  • the invention concerns a method of writing a polynucleotide containing a coding sequence for a polypeptide (or any other expression product) comprising the following steps: a) providing at least one database containing the amino acids of said polypeptide and corresponding codons; b) reading at least one amino acid from said polypeptide sequence; c) selecting from said database one codon which encodes said amino acid; d) repeating steps b) and c) for all amino acids of said polypeptide, the polynucleotide being written with the selected codons; whereby the written polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
  • the invention concerns a method of rewriting a first polynucleotide into a second polynucleotide, each containing a sequence coding for the same expression product (e.g., polypeptide, RNA), comprising the steps of: a) providing at least one database containing groups of codons encoding the same amino acid; b) reading at least one codon from said first polynucleotide; c) selecting from said database one codon which belongs to the same group as the read codon, which can be identical to or different from said read codon; d) repeating steps b) and c) for all codons of said coding sequence of said polynucleotide, the polynucleotide being written with the selected codons; whereby the rewritten polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
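A minimal Python sketch of steps a) to d), assuming the standard genetic code as the codon database and a CpG-minimizing target. A CpG can also form across the junction of two consecutive codons, where a codon ending in C is followed by one starting with G. To handle this, the sketch does not follow the tree search of Figures 22-23; instead it uses dynamic programming over the previously chosen codon. The optional `usage` dictionary of codon frequencies is a hypothetical input that only breaks ties between equally CpG-poor choices. All function names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Step a): database grouping synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def cpg_count(seq):
    """Number of 5'-CG-3' dinucleotides (a CG cannot overlap another CG)."""
    return seq.upper().count("CG")


def translate(dna):
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def write_min_cpg(protein, usage=None):
    """Write a coding sequence for `protein` with the fewest CpGs.

    Ties on CpG count are broken by the highest summed codon frequency
    from the hypothetical `usage` table, then alphabetically.
    """
    usage = usage or {}
    # best[codon] = ((cpg_total, -usage_total), sequence ending with codon)
    best = {c: ((cpg_count(c), -usage.get(c, 0.0)), c)
            for c in SYNONYMS[protein[0]]}
    for aa in protein[1:]:                      # steps b) to d)
        new_best = {}
        for codon in SYNONYMS[aa]:              # step c): candidate codons
            options = []
            for prev, ((cpg, neg_w), seq) in best.items():
                junction = 1 if prev[-1] == "C" and codon[0] == "G" else 0
                cost = (cpg + cpg_count(codon) + junction,
                        neg_w - usage.get(codon, 0.0))
                options.append((cost, seq + codon))
            new_best[codon] = min(options)
        best = new_best
    return min(best.values())[1]


def rewrite_min_cpg(dna, usage=None):
    """Rewrite a first polynucleotide into a second encoding the same protein."""
    return write_min_cpg(translate(dna), usage)
```

For example, `rewrite_min_cpg("ATGCGTGCCGAA")` turns a 12-nt sequence with two CpGs into a CpG-free sequence that still encodes MRAE. A CpG-free solution always exists under the standard code, because every amino acid has a CpG-free codon that does not end in C. Storing whole partial sequences keeps the sketch short. A longer-lived version would keep back-pointers instead.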
  • the mean CpG content is distinct from that of the first polynucleotide
  • the method of (re)writing a polynucleotide containing a coding sequence for a polypeptide can be done by a manual process or by a computerized process.
  • the (re)writing method is done by a computerized process.
  • the present invention also relates to methods of producing or synthesizing improved polynucleotides encoding a desired expression product (e.g., polypeptide or RNA), the method comprising: providing an improved polynucleotide sequence encoding said expression product using a method as described above, and synthesizing a polynucleotide comprising said sequence. Synthesis of the polynucleotide can be performed by a variety of techniques, including recombinant DNA technologies, artificial synthesis, mutagenesis, enzymatic techniques, cloning, ligating, etc., or a combination thereof. In a further step, the polynucleotide may be cloned into a vector, particularly an expression vector.
  • the present invention also relates to isolated polynucleotides derived from a native gene having an increased or reduced content of CpG dinucleotides as compared to the native gene.
  • the polynucleotide has a decreased content of CpG dinucleotides. Preferably, the polynucleotide has less than 0.5% or 0.1% CpG dinucleotides, more preferably less than 0.05%, and even more preferably less than 0.01%. Still more preferably, the polynucleotide according to the present invention contains 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, the polynucleotide is CpG free.
  • polynucleotide does not contain any CpG dinucleotide.
  • Said polynucleotide has at least 500, 700, or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb.
  • the polynucleotide encodes a native polypeptide.
  • the present invention relates to an isolated polynucleotide having an increased content of CpG dinucleotides.
  • the polynucleotide has a content of CpG dinucleotides higher than 1%, preferably higher than 5%, more preferably higher than 10%.
  • Said polynucleotide has at least 500, 700, or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb.
  • the invention encompasses the (re)written polynucleotides obtained by a method according to the present invention.
  • the invention concerns polynucleotides containing a sequence coding for a polypeptide having 1 or 0 CpG dinucleotide.
  • the invention concerns polynucleotides containing a sequence coding for a polypeptide having no CpG dinucleotide.
  • the invention also encompasses expression vectors, cells, and living organisms genetically modified as to comprise and/or express any of the polynucleotides object of the invention.
  • an isolated polynucleotide according to the invention for compensating a genetic defect is contemplated in the present invention.
  • the use of an isolated polynucleotide according to the invention for introducing a trait in a transgenic plant is also contemplated.
  • Figure 1 presents a codon usage table for prokaryotes, plants and eukaryotes.
  • Figure 2 presents an optimized codon usage table for prokaryotes and eukaryotes (see column PRO&EU means). The frequency of each codon encoding an amino acid is calculated, so that the frequencies of the codons encoding one amino acid sum to 1. The column PRO&EU corrected gives the codon frequencies modified so that codons comprising a CpG dinucleotide are not used and therefore have a frequency of zero. The frequency of each codon comprising a CpG dinucleotide is redistributed among the other codons encoding the same amino acid.
  • Figure 3 presents an example of a database comprising the amino acids, the codons encoding them and a frequency for each codon.
  • the frequency for each codon corresponds to the column PRO&EU corrected of Figure 2.
  • these frequencies can also be called "coefficients".
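The "PRO&EU corrected" transformation can be sketched for one amino acid at a time. The text only says that the frequency of CpG-containing codons is "distributed" among the synonymous codons. This sketch assumes a proportional redistribution, which renormalizes the CpG-free codons so that they still sum to 1. The proline frequencies below are made-up illustrative values, not those of Figure 2.

```python
def cpg_corrected(freqs):
    """Zero out CpG-containing codons of one amino acid and renormalize.

    `freqs` maps each synonymous codon to its frequency (summing to 1).
    Assumption: the removed frequency is shared proportionally.
    """
    kept = {c: f for c, f in freqs.items() if "CG" not in c}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no CpG-free codon with non-zero frequency")
    return {c: kept[c] / total if c in kept else 0.0 for c in freqs}


# Illustrative (not Figure 2) frequencies for proline.
proline = {"CCT": 0.3, "CCC": 0.3, "CCA": 0.2, "CCG": 0.2}
corrected = cpg_corrected(proline)
```

Here CCG drops to 0, and its 0.2 is shared proportionally. CCT and CCC each rise to about 0.375, and CCA rises to 0.25. An equal split among the remaining codons would be an equally plausible reading of the text.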
  • Figure 4 presents a relation between one restriction site and a group of amino acid sequences (2 or 3 amino acids).
  • the polylinker sequence of pUC19 is translated into amino acid sequences in the three reading frames. From these amino acid sequences, the possible successions of amino acids, written as regular expressions, are deduced. They indicate a potential insertion point for a restriction enzyme site.
  • Figure 5 presents the regular expressions to search for in the amino acid sequence in order to identify places suitable for introducing a restriction site from the group of pUC19 polylinker restriction sites.
  • the brackets mean that any amino acid between the brackets could be chosen.
  • [RG]-I-[RPLHQ] designates the following sequences: RIR, GIR, RIP,
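The Figure 5 search can be sketched with Python's standard `re` module. The sketch assumes the dash-separated notation of Figure 5, in which each element stands for exactly one residue. Removing the dashes then yields a valid regular expression. A zero-width lookahead ensures that overlapping candidate sites are all reported. The protein fragment in the example is hypothetical.

```python
import re


def find_insertion_points(protein, pattern):
    """Return 0-based start positions where `pattern` matches `protein`.

    `pattern` uses the Figure 5 notation, e.g. "[RG]-I-[RPLHQ]".
    """
    regex = re.compile("(?=" + pattern.replace("-", "") + ")")
    return [m.start() for m in regex.finditer(protein)]


# Hypothetical protein fragment: RIR starts at index 2, GIP at index 5.
print(find_insertion_points("MKRIRGIPAE", "[RG]-I-[RPLHQ]"))  # [2, 5]
```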
  • Figure 6 shows the manual selection of codons to write a polynucleotide sequence encoding the I-CreI protein. This polynucleotide matches the codon usage of a host organism, which can be prokaryotic or eukaryotic (Figure 3), and is free of CpG dinucleotides and of any restriction site comprised in the pUC19 polylinker.
  • the first two lines present the amino acid sequence of I-CreI. After each amino acid are the possible codons encoding it.
  • the lines "preferential codon", "CpG minus" and "restriction minus" show the three steps of the polynucleotide rewriting.
  • a BamHI restriction site is at position 60 of the "preferential codon" sequence. This site has been removed in the "CpG minus" sequence.
  • Figure 7 depicts the restriction enzyme sites of the rewritten I-CreI polynucleotide sequence.
  • Figure 8 presents three tables: a reference table with the codon frequencies to be met, a table with the amino acid content of the I-CreI protein, and a table with the theoretical codon content the rewritten polynucleotide needs in order to meet the codon usage (Figure 3) while being free of CpG dinucleotides.
  • Figure 9 presents a table with the theoretical codon content of the rewritten polynucleotide for I-CreI, a table with the real codon content of the rewritten polynucleotide encoding I-CreI, and a table giving the difference between the theoretical and the real codon content of the rewritten polynucleotide encoding the I-CreI protein.
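The theoretical/real/difference tables can be reproduced arithmetically. The theoretical count of a codon is the number of residues of its amino acid multiplied by the codon's target frequency. The difference compares the real count with that value. A Python sketch follows. It assumes the standard genetic code, uses a hypothetical `usage` dictionary in place of the Figure 3 coefficients, and assumes the sign convention real minus theoretical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_content_report(dna, usage):
    """Theoretical, real and difference codon counts for a coding sequence.

    theoretical[c] = (number of residues of c's amino acid) * usage[c]
    difference[c]  = real[c] - theoretical[c]   (sign convention assumed)
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    real = Counter(codons)
    aa_counts = Counter(CODE[c] for c in codons)
    # Only amino acids actually present in the sequence get a row.
    theoretical = {c: aa_counts[aa] * usage.get(c, 0.0)
                   for c, aa in CODE.items() if aa_counts[aa]}
    difference = {c: real[c] - t for c, t in theoretical.items()}
    return theoretical, real, difference


theo, real, diff = codon_content_report("AAAAAGAAA", {"AAA": 0.5, "AAG": 0.5})
print(theo["AAA"], real["AAA"], diff["AAA"])  # 1.5 2 0.5
```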
  • Figure 10 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the HO protein. This polynucleotide matches the codon usage of the host organism (prokaryote and eukaryote) (Figure 3). It is free of CpG dinucleotides and of any restriction site comprised in the pUC19 polylinker, except for the two introduced KpnI and PstI restriction sites.
  • the first two columns present the amino acid sequence of HO. To the left of each amino acid are the possible codons encoding it.
  • the "rewritten sequence" indicates the selected codon.
  • KpnI and PstI restriction sites have been introduced at positions 667 and 1233 of the "rewritten" sequence.
  • Figure 11 depicts the restriction enzyme sites of the rewritten HO polynucleotide sequence. Among the restriction sites, the KpnI and PstI sites are indicated.
  • Figure 12 presents three tables: a reference table with the codon frequencies to be met, a table with the amino acid content of the HO protein, and a table with the theoretical codon content the rewritten polynucleotide needs in order to meet the codon usage (Figure 3) while being free of CpG dinucleotides.
  • Figure 13 presents a table with the theoretical codon content of the rewritten polynucleotide for HO, a table with the real codon content of the rewritten polynucleotide encoding the HO protein, and a table giving the difference between the theoretical and the real codon content of the rewritten polynucleotide encoding the HO protein.
  • Figure 14 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the F-TevI protein. The rewritten polynucleotide matches the codon usage of the host organism (prokaryote and eukaryote) (Figure 3) and is free of CpG dinucleotides and of any restriction site comprised in the pUC19 polylinker.
  • the first two lines present the amino acid sequence of F-TevI. After each amino acid are the possible codons encoding it.
  • the lines "preferential codon", "CpG minus" and "restriction minus" show the three steps of the polynucleotide rewriting.
  • Figure 15 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the I-DmoI protein. The rewritten polynucleotide matches the codon usage of the host organism (prokaryote and eukaryote) (Figure 3) and is free of CpG dinucleotides and of any restriction site comprised in the pUC19 polylinker.
  • the first two lines present the amino acid sequence of I-DmoI. After each amino acid are the possible codons encoding it.
  • the lines "more frequent codon", "preferential codon", "CpG minus" and "restriction minus" show the four steps of the polynucleotide rewriting.
  • Figures 16-20 each present a table with the theoretical codon content of the rewritten polynucleotide for the encoded polypeptide, a table with the real codon content of the rewritten polynucleotide encoding said polypeptide, and a table giving the difference between the theoretical and the real codon content of the rewritten polynucleotide encoding said polypeptide.
  • the encoded polypeptides are F-TevI in Figure 16, I-DmoI in Figure 17, I-SceI in Figure 18, I-TevIII in Figure 19, and PI-SceI in Figure 20.
  • Figure 21 presents a table with the theoretical codon content of the rewritten polynucleotide for PI-MtuI, a table with the real codon content of the rewritten polynucleotide encoding the PI-MtuI protein, and a table giving the difference between the theoretical and the real codon content of the rewritten polynucleotide encoding the PI-MtuI protein.
  • Figure 22 presents the tree and path search used by the algorithm of the computerized method for rewriting a polynucleotide containing a coding sequence for a polypeptide.
  • Figure 23 presents the flow chart of the branching algorithm of the computerized method for rewriting a polynucleotide containing a coding sequence for a polypeptide.
  • Figure 24 presents an optimized codon usage table for higher eukaryotes, corrected to be CpG minus.
  • the frequency of each codon encoding an amino acid is calculated.
  • the sum of the frequencies of the codons encoding one amino acid is 1.
  • the column CpG corrected gives the codon frequencies modified so that codons comprising a CpG dinucleotide are not used and therefore have a frequency of zero.
  • the frequency of each codon comprising a CpG dinucleotide is redistributed among the other codons encoding the same amino acid.
  • the present invention concerns the (re)writing, synthesis and/or expression of polynucleotides containing a sequence coding for an expression product (e.g., a polypeptide or RNA), so that said polynucleotide has a predetermined content of CpG dinucleotides and/or an improved codon usage and/or selected restriction sites.
  • a polynucleotide having a content of X% of CpG dinucleotides refers to a polynucleotide which has X CpG dinucleotides per 100 nucleotides.
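Applied literally, this definition divides the number of CpG dinucleotides by the sequence length and multiplies by 100. Below is a minimal Python sketch (the function name is illustrative). `str.count` is safe here because one CG dinucleotide can never overlap another.

```python
def cpg_percent(seq):
    """CpG content in %, i.e. CpG dinucleotides per 100 nucleotides."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count("CG") / len(seq)


print(cpg_percent("ACGTACGTAA"))  # 20.0
```

Under this definition, a CpG-free polynucleotide has a content of 0%. A 1 kb sequence with a single remaining CpG has a content of 0.1%.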
  • CpG free polynucleotide refers to a polynucleotide comprising no CpG dinucleotide.
  • a polynucleotide is said to "derive” from a native gene or a fragment thereof when such polynucleotide comprises at least one portion, substantially similar in its sequence, to the native gene or to a fragment thereof.
  • the polynucleotide is also similar in its function to the native gene from which it derives.
  • "expression" or "expressing", as generally understood and used herein, refers to the process by which a gene produces a polypeptide. It involves transcription of the gene into mRNA and translation of this mRNA into polypeptide(s).
  • a "host" refers to a cell, tissue, organ or organism capable of providing cellular components that allow the expression of an exogenous nucleic acid (typically a nucleic acid embedded into a vector or a viral genome). This term is intended to also include hosts which have been modified in order to accomplish these functions. Bacteria, fungi, animals (cells, tissues or organisms) and plants (cells, tissues or organisms) are examples of hosts. "Non-human hosts" comprise vertebrates such as rodents, non-human primates, sheep, dogs, cows, amphibians, reptiles, etc.
  • Isolated means altered “by the hand of man” from its natural state, i.e., if it occurs in nature, it has been changed, purified or removed from its original environment, or both.
  • a polynucleotide naturally present in a living organism is not “isolated”.
  • the same polynucleotide separated from the coexisting materials of its natural state, obtained by cloning, amplification and/or chemical synthesis, is "isolated" as the term is employed herein.
  • a polynucleotide that is introduced into an organism by transformation, genetic manipulation or by any other recombinant method is “isolated” even if it is still present in said organism.
  • the terms "modified", "modifying" or "modification", as applied to polynucleotides or genes, refer to polynucleotides that differ in their nucleotide sequence from another reference polynucleotide or gene. Changes in the nucleotide sequence of the modified polynucleotide may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide/gene. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusion proteins and truncations in the polypeptide encoded by the reference sequence. According to preferred embodiments of the invention, the modifications are conservative, such that these changes do not alter the amino acid sequence of the encoded polypeptide.
  • Modified polynucleotides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to the skilled artisans.
  • the polynucleotides of the invention can also contain chemical modifications or additional chemical moieties not present in the native gene. These modifications may improve the solubility, absorption, biological half-life, and the like, of the polynucleotides.
  • the moieties may alternatively decrease the toxicity of the polynucleotides, eliminate or attenuate any undesirable side-effects and the like.
  • a person skilled in the art knows how to obtain polynucleotides derived from a native gene.
  • "native" refers to the fact that an object can be found in nature.
  • a gene that is present in an organism that can be isolated from its natural non-isolated state is said to be a "native gene”.
  • a native polypeptide refers to a polypeptide whose amino acid sequence has 100% identity with its natural, non-isolated state.
  • Polynucleotide refers to any DNA or RNA sequence or molecule having one nucleotide or more, including nucleotide sequences encoding a complete gene. The term is intended to encompass all nucleic acids whether occurring naturally or non-naturally in a particular cell, tissue or organism. This includes DNA and fragments thereof, RNA and fragments thereof, cDNAs and fragments thereof, expressed sequence tags, artificial sequences including randomized artificial sequences.
  • Vector refers to a self-replicating or integrating RNA or DNA molecule which can be used to transfer an RNA or DNA segment from one organism to another.
  • Vectors are particularly useful for manipulating genetic constructs and different vectors may have properties particularly appropriate to express protein(s) in a recipient during cloning procedures and may comprise different selectable markers.
  • Bacterial plasmids are commonly used vectors.
  • “Expression Vector” refers to a vector or vehicle similar to a cloning vector but which is capable of expressing a gene (or a fragment thereof) which has been cloned therein. Typically, expression of the gene occurs when the vector has been introduced into the host.
  • the cloned gene is usually placed under the control of certain control sequences or regulatory elements such as promoter sequences. Expression control sequences vary depending on whether the vector is designed to express the operably linked gene in a prokaryotic or eukaryotic host and may additionally contain transcriptional elements such as enhancer elements, termination sequences, tissue-specificity elements and/or translational and termination sites.
  • Codon usage table refers to a database giving the codons, the amino acid encoded by each codon, and the frequency at which each codon is found for a given amino acid. "Two consecutive codons" refers to two codons that are immediately adjacent in a coding sequence.
  • by meeting the codon usage, it is intended that the codon frequencies in the (re)written polynucleotide are optimized to be as close as possible to the codon frequencies in the codon usage table for a considered host or group of hosts.
  • the optimization is not the same for all groups of codons encoding an amino acid, because it depends on the number of codons encoding that amino acid. For example, as shown in Figure 2, serine can be encoded by 6 different codons, whereas tyrosine can be encoded by only two. Therefore, the more codons encode an amino acid, the better the optimization can be.
  • the invention concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide comprising the steps of: a) providing at least one database containing the amino acids of said polypeptide and corresponding codons; b) reading at least one amino acid from said polypeptide sequence; c) selecting from said database one codon which encodes said amino acid; d) repeating steps b) and c) for all amino acids of said polypeptide, the polynucleotide being written with the selected codons; whereby the written polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
  • the invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, comprising the steps of: a) providing at least one database containing groups of codons encoding the same amino acid; b) reading at least one codon from said first polynucleotide; c) selecting from said database one codon which belongs to the same group as the read codon, which can be identical to or different from said read codon; d) repeating steps b) and c) for all codons of said coding sequence of said polynucleotide, the polynucleotide being written with the selected codons; whereby the rewritten polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
  • the method is computerized.
  • all the steps of the method are computerized.
  • only some steps of the method are computerized.
  • the (re)written polynucleotide sequence is longer than 500 bp, preferably longer than 1 kb, and more preferably longer than 2 kb.
  • the (re)written polynucleotide is not limited to the sequence encoding the polypeptide. It can comprise additional sequences upstream and downstream of the coding sequence.
  • additional sequences can be insulators (Kaffer et al., Genes Dev. 2000, 14, 1908-19; EP 859,059; WO96/04390, the disclosures of which are incorporated herein by reference).
  • the (re)written polynucleotide can be surrounded by restriction enzyme sites.
  • the invention also encompasses a (re)written polynucleotide which comprises non-coding additional sequence(s) introduced into the coding sequence. These additional sequences could have a predetermined content of CpG dinucleotides.
  • the content of CpG dinucleotides in the (re)written polynucleotide is minimized. Preferably, this content is less than 1%, more preferably less than 0.5%, and even more preferably less than 0.1%, 0.05% or 0.01%. Still more preferably, the polynucleotide according to the present invention contains 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, the polynucleotide is CpG free.
  • the method of (re)writing the polynucleotide selects codons so that codons, or pairs of consecutive codons, without CpG dinucleotides are maintained. In another embodiment of the invention wherein the CpG content is minimized, the method of (re)writing the polynucleotide selects codons so that codons, or pairs of consecutive codons, without CpG dinucleotides are partially or totally exchanged for other codons without CpG dinucleotides. In one embodiment of the invention, the content of CpG dinucleotides in the (re)written sequence is maximized.
  • the content of CpG dinucleotide is more than 1%, preferably more than 5%, more preferably more than 10%. More preferably, by maximized is intended that the CpG dinucleotide content of the (re)written polynucleotide is higher than the native polynucleotide encoding the same polypeptide.
  • the CpG dinucleotide content of the (re)written polynucleotide has to be higher than that of the provided polynucleotide.
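A minimal Python sketch of how the CpG content thresholds above could be measured. The source does not define the denominator of the percentage; here it is assumed to be the number of dinucleotide positions (sequence length minus one), and the function names are hypothetical.

```python
def cpg_count(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in seq."""
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    return seq.upper().count("CG")


def cpg_percent(seq: str) -> float:
    """CpG content as a percentage of dinucleotide positions (assumed definition)."""
    positions = len(seq) - 1
    return 100.0 * cpg_count(seq) / positions if positions > 0 else 0.0


def is_cpg_enriched(rewritten: str, native: str) -> bool:
    """True if the rewritten sequence has a higher CpG content than the native one."""
    return cpg_percent(rewritten) > cpg_percent(native)
```

For example, `cpg_percent("ATGCGA")` is 20.0: one CpG over five dinucleotide positions.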
  • the method of (re)writing the polynucleotide selects the codons so that the codons, or the two consecutive codons, with a CpG dinucleotide are maintained. In another embodiment of the invention wherein the CpG content is maximized, the method of (re)writing the polynucleotide selects the codons so that the codons, or the two consecutive codons, with a CpG dinucleotide are partially or totally exchanged for other codons with a CpG dinucleotide.
  • the invention concerns the method of (re)writing a polynucleotide in which the step a) of providing at least one database containing the amino-acids of said polypeptide and corresponding codons or groups of codons encoding the same amino-acid further comprises providing the codon usage table corresponding to one host or to a group of hosts and whereby the (re)written polynucleotide sequence meets the codon usage of said host or said group of hosts.
  • the invention concerns the method of (re)writing a polynucleotide in which step c) of selecting one codon from said database is performed one codon at a time.
  • the selection steps are performed one codon at a time, and the selected codon is the one that is closest to the codon usage that is determined with the so-far written polynucleotide.
  • the invention concerns the method of (re)writing a polynucleotide in which step c) of selecting is performed on a batch of k codons.
  • the selection steps are performed on a batch of k codons, and the selected batch of k codons is the one that is closest to the codon usage that is determined with the so-far written polynucleotide.
  • k is at least 2 and no more than the number of amino acids comprised in the encoded polypeptide.
  • k is between 2 and 1000, preferably between 5 and 500, more preferably between 10 and 100.
  • k is at least 2, 5, 10, 25, 50, 75, 100, 200.
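One possible reading of "closest to the codon usage determined with the so-far written polynucleotide" is to pick, among the admissible synonymous codons, the one that minimizes the squared difference between observed and target codon fractions for that amino acid. The sketch below assumes that metric and a target table giving each codon's fraction within its synonymous group; neither is specified by the source, the function name is hypothetical, and the numbers in the example are illustrative only.

```python
from collections import Counter


def closest_codon(group, allowed, counts, target):
    """Choose, from `allowed`, the codon whose addition brings the usage of
    this amino acid's codons closest to `target` (squared error).

    group:   all synonymous codons of the amino acid
    allowed: subset of `group` still admissible (e.g. CpG-free here)
    counts:  Counter of codons already written
    target:  dict codon -> target fraction within its synonymous group
    """
    total = sum(counts[c] for c in group) + 1  # codons of this group after adding one

    def error(choice):
        return sum(((counts[c] + int(c == choice)) / total - target.get(c, 0.0)) ** 2
                   for c in group)

    return min(allowed, key=error)
```

Writing five lysines against an illustrative 40/60 AAA/AAG target gives AAG, AAA, AAG, AAA, AAG, ending exactly at 2 AAA and 3 AAG. A batch of k codons can be scored the same way by summing the error over every group it touches, at a cost that grows exponentially with k.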
  • the invention also concerns a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide according to the invention, further comprising the steps of removing undesired restriction sites.
  • the invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, further comprising the steps of:
  • by "sequence difference with a restriction enzyme site" is intended that the sequence presents at least one nucleotide difference.
  • by "sequence difference with a restriction enzyme site" is intended that the sequence cannot be recognized by the restriction enzyme.
  • the selection of the codon allows control of the CpG dinucleotide content of the (re)written polynucleotide in order to reach a predetermined CpG dinucleotide content. Two criteria therefore have to be considered: does the selected codon comprise a CpG dinucleotide? Does the selected codon, considered together with the immediately consecutive codon(s), introduce a CpG dinucleotide? When a minimized CpG dinucleotide content is desired, a codon which, alone or combined with the immediately consecutive codon(s), comprises a CpG dinucleotide is not selected.
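The two criteria can be expressed as a single check on the last base of the previous codon plus the candidate codon. This is a sketch with hypothetical function names; returning None when no synonymous codon qualifies is an assumption, since the source does not say what happens in that case.

```python
def creates_cpg(prev_codon: str, codon: str) -> bool:
    """True if `codon` contains a CpG, or if the junction prev_codon|codon forms one."""
    # prev_codon[-1:] is "" for the first codon, so only the codon itself is checked.
    return "CG" in prev_codon[-1:] + codon


def select_codon(prev_codon, synonymous, minimize=True):
    """Return the first synonymous codon that avoids (minimize=True) or
    creates (minimize=False) a CpG with the preceding codon, else None."""
    for codon in synonymous:
        if creates_cpg(prev_codon, codon) != minimize:
            return codon
    return None
```

`select_codon("GCC", ["GGA", "GGC"])` returns None: after a codon ending in C, no glycine codon can be placed without creating a CpG, so the C-ending codon itself has to be avoided when the next codon begins with G.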
  • the invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid
  • - step b) comprises reading the j-th codon of said part of the first polynucleotide;
  • - step c) comprises the sub-steps of:
  • said part of the first polynucleotide is determined as being the part of the first polynucleotide having the highest CpG dinucleotide concentration.
  • as restriction sites generally have 4, 6 or 8 nucleotides, at least 2 or 3 immediately consecutive codons have to be considered in order to avoid introducing undesired restriction sites during the (re)writing of the polynucleotide.
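Because a site can straddle codon boundaries, a candidate codon can be screened by appending it to the last (longest site length - 1) nucleotides already written. The sketch below assumes sites are given as plain nucleotide strings and, as an added precaution, also screens the reverse complement, which matters for non-palindromic sites such as BspM I (ACCTGC); the function names are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def creates_site(written: str, codon: str, sites) -> bool:
    """True if appending `codon` to `written` creates an undesired site.

    Only the last (L - 1) written nucleotides can combine with the new codon
    into a site of length L, so only that window is scanned.
    """
    patterns = set(sites) | {revcomp(s) for s in sites}
    longest = max((len(p) for p in patterns), default=1)
    window = written[max(0, len(written) - (longest - 1)):] + codon
    return any(p in window for p in patterns)
```

For 8-nucleotide sites the window holds the 7 preceding nucleotides, that is the (j-1)-th and (j-2)-th codons plus the last base of the (j-3)-th codon.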
  • the invention concerns an embodiment of the above-mentioned rewriting method wherein:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises reading the j-th codon of said part of the first polynucleotide;
  • step c) comprises the sub-steps of:
  • the invention concerns a particular embodiment of the above-mentioned rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step c1) further comprises reading the (j-2)-th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
  • the selected codon, considered together with said (j-2)-th and (j-1)-th codons of said part of the second polynucleotide, does not contain any enzyme restriction site listed in the second database;
  • the codon selection is done so that the already (re)written polynucleotide is the closest to the codon usage table of the host or group of hosts. Therefore, the invention concerns an embodiment of the above-mentioned rewriting method wherein:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts;
  • - step b) comprises reading the j-th codon of said part of the first polynucleotide;
  • - step c) comprises the sub-steps of:
  • the selected codon, considered with the first to (j-1)-th codons of said part of the second polynucleotide, is closer to the codon usage than any other codon of the same group; - c3) placing the selected codon at the j-th codon location in said part of the second polynucleotide;
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises reading the j-th codon of said part of the first polynucleotide;
  • step c) comprises the sub-steps of:
  • the selected codon, considered with the first to (j-1)-th codons of said part of the second polynucleotide, is closer to the codon usage than any other codon of the same group;
  • the information contained in the databases can be merged into a single database.
  • the invention concerns a particular embodiment of the above-mentioned rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step c1) further comprises reading the (j-2)-th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide; - in sub-step c2), the selected codon, considered together with said (j-2)-th and (j-1)-th codons of said part of the second polynucleotide, does not contain any enzyme restriction site listed in the second database;
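Putting these sub-steps together, a minimal sketch of the codon-by-codon rewriting of a first polynucleotide might look as follows. It assumes the standard genetic code, a usage table of within-group fractions, a squared-error notion of "closest to codon usage", and it simply stops with an error when no synonymous codon is admissible (the source does not describe a fallback). All names are hypothetical.

```python
import itertools
from collections import Counter

# Standard genetic code, bases in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in itertools.product(enumerate(BASES), repeat=3)}
GROUPS = {}
for codon, aa in CODE.items():
    GROUPS.setdefault(aa, []).append(codon)


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rewrite(first, usage, sites):
    """Rewrite `first` into a synonymous, CpG-free second polynucleotide.

    usage: dict codon -> target fraction within its synonymous group
    sites: undesired restriction sites (both strands are screened)
    """
    first = first.upper()
    patterns = set(sites) | {revcomp(s) for s in sites}
    overlap = max((len(p) for p in patterns), default=1) - 1
    second, counts = "", Counter()
    for j in range(0, len(first) - len(first) % 3, 3):
        group = GROUPS[CODE[first[j:j + 3]]]           # c1) codons of the same group
        tail = second[max(0, len(second) - overlap):]  # previously written bases
        allowed = [c for c in group
                   if "CG" not in second[-1:] + c      # no CpG, incl. the junction
                   and not any(p in tail + c for p in patterns)]
        if not allowed:
            raise ValueError(f"no admissible codon at codon {j // 3 + 1}")
        total = sum(counts[c] for c in group) + 1
        best = min(allowed, key=lambda ch: sum(        # closest to codon usage
            ((counts[c] + int(c == ch)) / total - usage.get(c, 0.0)) ** 2
            for c in group))
        second += best                                 # c2) place the codon
        counts[best] += 1
    return second
```

With a usage table favouring GGA and TCC, `rewrite("GGATCC", {"GGA": 1.0, "TCC": 1.0}, ["GGATCC"])` keeps GGA but is forced off TCC, returning "GGATCT".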
  • the invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as above-described, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
  • - step a) comprises providing at least one database containing the amino-acids of said polypeptide and corresponding codons;
  • - step b) comprises reading the j-th amino-acid of said part of the polypeptide;
  • step c) comprises the sub-steps of:
  • step a) comprises providing at least one database containing the amino-acids of said polypeptide and corresponding codons and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises reading the j-th amino-acid of said part of the polypeptide;
  • step c) comprises the sub-steps of:
  • the invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step c1) further comprises reading the (j-2)-th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
  • step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons, and the codon usage table corresponding to a host or a group of hosts;
  • step b) comprises reading the j-th amino-acid of said part of the polypeptide;
  • step c) comprises the sub-steps of:
  • - step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons, and the codon usage table corresponding to a host or a group of hosts and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • - step b) comprises reading the j-th amino-acid of said part of the polypeptide;
  • - step c) comprises the sub-steps of: - c1) reading the (j-1)-th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
  • the information contained in the databases can be merged into a single database.
  • the invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step c1) further comprises reading the (j-2)-th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
  • the invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid
  • step b) comprises the sub-steps of:
  • - step c) comprises the sub-steps of: - c1) selecting from said database one codon which belongs to the same group as the i-th codon of the first polynucleotide, and which can be identical to or different from the i-th codon, the selected codon considered together with the (i-1)-th codon of said part of the second polynucleotide containing no CpG dinucleotide; - c2) placing the selected codon at the i-th codon location in said part of the second polynucleotide;
  • repeating sub-steps c1) and c2) until the k codons have been selected;
  • the invention also concerns one embodiment of this rewriting method, wherein: - step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises the sub-steps of:
  • step c) comprises the sub-steps of:
  • repeating sub-steps c1) and c2) until the k codons have been selected;
  • the invention concerns a particular embodiment of the rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step b1) further comprises reading the (i-2)-th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
  • the selected codon, considered together with said (i-2)-th and (i-1)-th codons of said part of the second polynucleotide, does not contain any undesired enzyme restriction site listed in the second database, and k is at least 3.
  • the invention further concerns the above-mentioned rewriting method wherein:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts;
  • - step b) comprises the sub-steps of: - b1) reading the (i-1)-th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
  • - step c) comprises the sub-steps of: - c1) selecting from said database one codon which belongs to the same group as the i-th codon of the first polynucleotide, and which can be identical to or different from the i-th codon,
  • the invention further concerns the above-mentioned rewriting method wherein:
  • step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises the sub-steps of:
  • step c) comprises the sub-steps of:
  • the selected codon, considered with the first to (i-1)-th codons of said part of the second polynucleotide, is closer to the codon usage than any other codon of the same group;
  • the invention concerns a particular embodiment of the rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step b1) further comprises reading the (i-2)-th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
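For the batch embodiments, one way to choose a batch of k codons at once is to enumerate every combination of synonymous codons, drop those that create a CpG (inside the batch or at the junction with the last codon already written), and keep the combination closest to the usage table. This exhaustive search is an illustrative assumption, practical only for small k; the squared-error score and the function name are likewise hypothetical.

```python
import itertools
from collections import Counter


def best_batch(groups, prev, counts, usage):
    """Choose k codons at once, one from each synonymous group.

    groups: list of k lists of synonymous codons (one per position)
    prev:   last codon already written ('' at the start)
    counts: Counter of codons already written
    usage:  dict codon -> target fraction within its synonymous group
    Returns the best CpG-free batch as a list, or None if none exists.
    """
    best, best_err = None, float("inf")
    for batch in itertools.product(*groups):
        if "CG" in prev[-1:] + "".join(batch):  # CpG inside or between codons
            continue
        new = counts + Counter(batch)
        err = 0.0
        for group in {tuple(g) for g in groups}:  # each amino acid scored once
            total = sum(new[c] for c in group)
            err += sum((new[c] / total - usage.get(c, 0.0)) ** 2 for c in group)
        if err < best_err:
            best, best_err = list(batch), err
    return best
```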
  • the invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as above-described, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
  • - step a) comprises providing at least one database containing the amino-acids of said polypeptide and corresponding codons;
  • - step b) comprises the sub-steps of:
  • - step c) comprises the sub-steps of: - c1) selecting from said database one codon which codes for said i-th amino-acid, the selected codon considered together with the (i-1)-th codon of said part of the written polynucleotide containing no CpG dinucleotide;
  • the invention also concerns one embodiment of this writing method, wherein:
  • - step a) comprises providing at least one database containing the amino-acids of said polypeptide and corresponding codons and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • step b) comprises the sub-steps of:
  • - step c) comprises the sub-steps of: - c1) selecting from said database one codon which codes for said i-th amino-acid, the selected codon considered together with the (i-1)-th codon of said part of the written polynucleotide containing no CpG dinucleotide and no undesired enzyme restriction site listed in the second database;
  • repeating sub-steps c1) and c2) until the k codons have been selected;
  • the invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • - sub-step b1) further comprises reading the (i-2)-th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
  • the invention also concerns one embodiment of this writing method, wherein:
  • step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons, and the codon usage table corresponding to a host or a group of hosts;
  • step b) comprises the sub-steps of:
  • - step c) comprises the sub-steps of: - c1) selecting from said database one codon which codes for said i-th amino-acid, the selected codon considered together with the (i-1)-th codon of said part of the written polynucleotide containing no CpG dinucleotide, and the selected codon considered with the first to (i-1)-th codons of said part of the written polynucleotide being closer to the codon usage than any other codon of the same group;
  • repeating sub-steps c1) and c2) until the k codons have been selected;
  • the invention also concerns one embodiment of this writing method, wherein:
  • - step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons and the codon usage table corresponding to a host or a group of hosts, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
  • - step b) comprises the sub-steps of:
  • step c) comprises the sub-steps of: - c1) selecting from said database one codon which codes for said i-th amino-acid,
  • the selected codon, considered with the first to (i-1)-th codons of said part of the written polynucleotide, is closer to the codon usage than any other codon of the same group;
  • the invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
  • the invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, further comprising the steps of:
  • nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding 2 or 3 amino-acid sequences encoded by said nucleotide sequences in the three reading frames;
  • the rewritten polynucleotide sequence contains at least one further desired restriction enzyme site, when compared to the first polynucleotide sequence.
  • the rewritten polynucleotide contains only one further desired restriction enzyme site, when compared to the first polynucleotide sequence.
  • the invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as described above, further comprising the steps of: - (i) providing a further database containing nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding 2 or 3 amino-acid sequences encoded by said nucleotide sequences in the three reading frames;
  • the written polynucleotide sequence contains at least one desired restriction enzyme site.
  • said desired restriction sites are introduced in the (re)written polynucleotide sequence at a predetermined distance from each other.
  • the predetermined distance is between 100 and 1000 bp, preferably 300 to 800 bp, more preferably 600 to 800 bp.
  • said desired restriction site(s) is introduced whereby a restriction site is introduced between each functional unit of the (re)written polynucleotide.
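Given candidate nucleotide positions where a desired site could be introduced without changing the protein (for example, the matches of the amino-acid regular expressions discussed later in this description), a simple greedy pass can keep sites roughly a predetermined distance apart. The greedy strategy, the choice to place the first site one spacing from the 5' end, and the function name are all assumptions for illustration.

```python
def space_sites(candidates, spacing_bp=600):
    """Greedily keep candidate site positions (in bp) at least spacing_bp apart,
    starting spacing_bp from the 5' end of the coding sequence."""
    chosen, next_mark = [], spacing_bp
    for pos in sorted(candidates):
        if pos >= next_mark:
            chosen.append(pos)
            next_mark = pos + spacing_bp
    return chosen
```

For instance, with candidates at 50, 120, 610, 640, 1250 and 1300 bp and a 600 bp spacing, the sites at 610 and 1250 bp are kept.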
  • the present invention relates to a method for (re)writing a CpG free polynucleotide containing a coding sequence for a polypeptide, comprising the following steps: a) providing an amino acid sequence or a polynucleotide sequence; b) removing the CpG dinucleotides by replacing them with a codon or codon combination which does not comprise a CpG; c) writing a nucleotide sequence encoding said amino acid sequence by selecting the preferential codon of the codon usage table corresponding to the host or group of hosts; and/or d) removing the undesired restriction sites by replacing them with a codon or codon combination which comprises neither a CpG nor an undesired restriction site; and/or e) optionally adding desired restriction site(s).
  • the steps b) and/or c) and/or d) can be done consecutively or simultaneously.
  • the global codon frequency has to be estimated in order to check conformity with the chosen codon usage table. More preferably, said global frequency is controlled at each of steps b), c), d) and e) of the method.
  • the amino acid sequence is provided (step a)
  • the number of each amino acid is determined.
  • the number of each codon to be used is determined.
  • an appropriate codon usage table can be the one depicted in Figure 3. These numbers are used during the writing step b) and the following steps c), d) and e) for the rewriting. Examples of such (re)writing method are disclosed in Figures 6-20.
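The counting steps above (number of each amino acid, then number of each codon to use) can be sketched as follows. The sketch assumes the usage table gives, for each codon, its fraction within its synonymous group (fractions summing to 1 per amino acid) and uses largest-remainder rounding so that the codon counts add up exactly to the amino-acid count; both are assumptions, and the numbers in the example are made up.

```python
from collections import Counter


def codon_quota(protein, groups, usage):
    """Number of times each codon should be used so that the global codon
    frequencies follow `usage` (largest-remainder rounding per amino acid).

    groups: dict amino acid -> synonymous codons
    usage:  dict codon -> fraction within its group (fractions sum to 1)
    """
    quota = {}
    for aa, n in Counter(protein).items():
        codons = groups[aa]
        exact = {c: n * usage.get(c, 0.0) for c in codons}
        base = {c: int(exact[c]) for c in codons}
        left = n - sum(base.values())
        # hand the remaining codons to the largest fractional parts
        for c in sorted(codons, key=lambda x: exact[x] - base[x], reverse=True)[:left]:
            base[c] += 1
        quota.update(base)
    return quota
```

With five lysines and an illustrative 42/58 AAA/AAG split, the quota is 2 AAA and 3 AAG.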
  • the invention relates to an alternative method in which the amino acid sequence is modified so that the nucleotide sequence contains said predetermined CpG dinucleotide content and the substitution of one or more amino acids is conservative.
  • by conservative is intended that a first amino acid can be substituted by another amino acid from a group comprising the first, the groups being the following: Group I: Gly, Ala, Val, Ile, Leu, Met, Phe, Trp;
  • Group II: Ser, Thr, Cys; Group III: Asp, Glu, Asn, Gln; Group IV: Arg, Lys, Met; Group V: His, Phe, Tyr, Trp.
  • the amino acids Gly, Cys and Pro are not changed. Therefore, the invention concerns a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide further comprising the steps of :
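The substitution groups above can be encoded directly as sets of one-letter codes. Note that Met, Phe and Trp each belong to two groups. This sketch reads "Gly, Cys and Pro are not changed" as forbidding substitutions of those residues only; that interpretation and the names are assumptions.

```python
# Substitution groups as listed above (one-letter codes; Ile is I, Gln is Q).
SUBSTITUTION_GROUPS = [
    set("GAVILMFW"),  # Group I
    set("STC"),       # Group II
    set("DENQ"),      # Group III
    set("RKM"),       # Group IV
    set("HFYW"),      # Group V
]
UNCHANGED = set("GCP")  # Gly, Cys and Pro are never substituted


def is_conservative(old: str, new: str) -> bool:
    """True if replacing amino acid `old` by `new` is allowed as conservative."""
    if old == new:
        return True
    if old in UNCHANGED:
        return False
    return any(old in group and new in group for group in SUBSTITUTION_GROUPS)
```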
  • the (re)writing method according to the present invention can also comprise a search of cryptic splicing sites.
  • cryptic splicing sites are very rarely present in the (re)written sequence.
  • a checking step can be introduced in the (re)writing method in order to delete them.
  • Another embodiment of the invention includes a (re)written polynucleotide sequence which substantially meets the codon usage of a host or a group of hosts.
  • the nucleotide sequence encoding a polypeptide is (re)written so that the codons are selected in order to encode the amino acid sequence and to avoid CpG dinucleotides. Indeed, as the genetic code is degenerate, several codons may encode the same amino acid. Therefore, the codons comprising the dinucleotide CpG are never used, namely GCG, CGA, CGC, CGG, CGT, CCG, TCG, and ACG.
  • a codon ending in a C nucleotide (namely GCC, CGC, AAC, GAC, TGC, GGC, CAC, ATC, TTC, CCC, CTC, AGC, TCC, ACC, TAC, GTC) is not used if the next codon begins with a G nucleotide (namely GCA, GCC, GCG, GCT, GAC, GAT, GAA, GAG, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT).
  • the codons comprising the dinucleotide CpG are preferably used, namely GCG, CGA, CGC, CGG, CGT, CCG, TCG, and ACG.
  • a codon ending in a C nucleotide is preferably used if the next codon begins with a G nucleotide (namely GCA, GCC, GCG, GCT, GAC, GAT, GAA, GAG, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT).
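The codon lists used in the rules above can be derived mechanically from the 64 codons, which is a convenient way to check them. This is an illustrative sketch; the constant names are hypothetical.

```python
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

# Codons that contain a CpG by themselves.
CPG_CODONS = [c for c in CODONS if "CG" in c]
# Codons that form a CpG at the junction when a G-starting codon follows.
ENDS_IN_C = [c for c in CODONS if c.endswith("C")]
STARTS_WITH_G = [c for c in CODONS if c.startswith("G")]

# Ordered codon pairs whose junction (last base | first base) is C|G.
JUNCTION_PAIRS = [(a, b) for a in ENDS_IN_C for b in STARTS_WITH_G]
```

There are 8 CpG-containing codons, 16 codons ending in C and 16 beginning with G, hence 256 ordered codon pairs that form a CpG across their junction.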
  • codon frequency differs between organisms.
  • codon usage tables are available, more particularly for prokaryotic organisms, for plants, and for lower and higher eukaryotes. For example, the difference is highly relevant for heterologous expression in plants.
  • the sequence could be optimized for one organism. A high specificity could lead to strong expression.
  • the codon with the highest frequency is chosen.
  • the global frequency of each codon is in agreement with the codon usage table of the host organism.
  • the sequence could be optimized for several hosts or for a group of hosts.
  • the sequence is optimized for prokaryotes, for eukaryotes and/or for plants.
  • Figure 1 presents a codon usage table for prokaryotes, plants and eukaryotes. A specific codon usage table can therefore be generated by averaging the frequency of each codon over several codon usage tables.
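Averaging several codon usage tables codon by codon, as described above, is a one-line computation. The optional weights (so that preferred organisms count more) are an assumption added for illustration, as is the function name; codons absent from a table are counted as 0 in that table.

```python
def mean_usage(tables, weights=None):
    """Average several codon usage tables (codon -> frequency), codon by codon.

    weights: optional per-table weights (assumption; equal weights by default).
    """
    weights = weights or [1.0] * len(tables)
    total = sum(weights)
    codons = set().union(*tables)
    return {c: sum(w * t.get(c, 0.0) for w, t in zip(weights, tables)) / total
            for c in sorted(codons)}
```

If each input table gives within-group fractions summing to 1 per amino acid, the averaged table keeps that property.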
  • the method according to the invention uses the codon usage table of Figure 2 in an optimization for prokaryotes and eukaryotes.
  • the codon usage tables of the preferred organisms are preferably used.
  • including rarely used organisms in the generation of the optimized table can lead to a codon usage table that is incompatible with frequently used organisms.
  • the codon usage table introduced in the (re)writing method is preferably checked to be compatible with the frequently used or planned organisms.
  • Figure 24 presents an optimized codon usage table for higher eukaryotic hosts following the above-mentioned suggestions.
  • the codon usage table of Figure 24 is preferably used in the (re)writing method according to the present invention.
  • the codons are chosen so that the final, global proportion of each codon is similar to the codon usage table of the host cell. Indeed, respecting these frequencies can increase expression. For example, if a stretch of several alanines is found in the protein sequence and the same codon is used for each of them, translation can be hindered. However, if the frequency of a codon is less than 10%, preferably less than 5%, this codon is not used.
  • the invention also contemplates a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide so that the (re)written polynucleotide meets the codon usage table of a host or a group of hosts.
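Excluding rare codons before writing can be done by pruning the usage table and renormalizing what is left. The sketch assumes the frequencies are within-group fractions and keeps the most frequent codon if an amino acid would otherwise lose all of them; both choices, the function name and the example values are assumptions.

```python
def prune_rare(usage, groups, threshold=0.10):
    """Drop codons whose within-group fraction is below `threshold`,
    then renormalize the remaining codons of each amino acid to sum to 1.

    usage:  dict codon -> fraction within its synonymous group
    groups: dict amino acid -> list of synonymous codons
    """
    pruned = {}
    for codons in groups.values():
        kept = [c for c in codons if usage.get(c, 0.0) >= threshold]
        if not kept:  # never leave an amino acid without any codon
            kept = [max(codons, key=lambda c: usage.get(c, 0.0))]
        total = sum(usage.get(c, 0.0) for c in kept) or 1.0
        for c in kept:
            pruned[c] = usage.get(c, 0.0) / total
    return pruned
```

Passing `threshold=0.05` gives the stricter cut-off mentioned above.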
  • said polypeptide is not naturally expressed in the host or in one host of the group of hosts.
  • the (re)writing of such polynucleotide can allow a better expression of said polypeptide in said host or group of hosts.
  • the user may provide a list of nucleic acid sequences as a basis for computing a particular codon usage. For instance, suppose we want to rewrite a peptide sequence originally coded by an intronic ORF in one organism, say S. cerevisiae, into an intronic ORF in another organism, say E. coli. Suppose also that the user has a set of nucleic acid sequences which are intronic ORFs from E. coli. According to the method described hereafter, the user may obtain a specific codon usage drawn from said set of intronic ORF sequences, and use the previously described embodiment of the invention along with this custom-made codon usage to rewrite the sequence.
  • a set of nucleic acid sequences is read from database files, preferably one file, more preferably a Fasta-formatted file.
  • Count codon frequencies for each sequence of said set of sequences. For this, a specific genetic code must be specified to the program, as a file or an internal data structure.
  • iv) Normalize the frequencies into usage percentage, such that the sum of the usage percentage over all the degenerated codons coding for a particular amino-acid makes 100 percent.
  • Store the resulting codon usage table in an internal data structure or in an external file, and use it as the reference codon usage used by the rewriting process, as described in previous points.
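A minimal sketch of the steps above: read a FASTA-formatted set of sequences, count in-frame codons with a given genetic code (the standard code is built here as the default), and normalize to percentages per amino acid. The FASTA parser is deliberately simple, codons containing ambiguous bases are skipped, and the function names are hypothetical.

```python
import itertools
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
                 for (i, a), (j, b), (k, c) in itertools.product(enumerate(BASES), repeat=3)}


def read_fasta(lines):
    """Return the sequences of a FASTA file given as an iterable of lines."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def codon_usage(seqs, code=STANDARD_CODE):
    """Codon usage (%) normalized so each amino acid's codons sum to 100."""
    counts = Counter()
    for seq in seqs:
        for i in range(0, len(seq) - 2, 3):  # read in frame, ignore trailing bases
            codon = seq[i:i + 3]
            if codon in code:                # skip codons with ambiguous bases
                counts[codon] += 1
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[code[codon]] += n
    return {codon: 100.0 * n / per_aa[code[codon]] for codon, n in counts.items()}
```

Typical use would be `with open(path) as fh: table = codon_usage(read_fasta(fh))`, after which the table can replace the reference codon usage in the rewriting step.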
  • the invention covers also (re)writing of a polynucleotide having a predetermined content of CpG dinucleotides and containing a coding sequence for a polypeptide in which the undesired restriction enzyme sites have been removed and/or at least one desired restriction site has been introduced.
  • (re)writing the polynucleotide comprises the additional steps of removing the restriction sites and, optionally, of specifically introducing at least one desired restriction site.
  • the removal of the restriction sites allows an easy manipulation of the (re)written polynucleotide, more particularly in vitro.
  • Some restriction sites can be intentionally introduced in order to facilitate manipulation of the (re)written polynucleotide, for example for cloning, subcloning, sequencing or mutagenesis.
  • by removal of the undesired restriction sites is intended the removal of the frequently used restriction sites and at least of the restriction sites of the polylinker comprised in the vector used.
  • the (re)writing could comprise no restriction site introduction.
  • new restriction sites are introduced with a regular spacing without modifying the protein sequence.
  • a restriction site can be introduced at a predetermined distance from the previous one, preferably every 100, 200, 300, 400, 500, 600, 700, 800 or 1000 bp. More preferably, a restriction site is introduced every 600 or 800 bp.
  • some restriction sites can be introduced between each functional unit.
  • by functional unit is intended a nucleotide sequence encoding a protein domain, a regulatory sequence, a promoter, etc.
  • some restriction sites could be added between the nucleotide sequence encoding some protein fragments, motifs or domains.
  • these restriction sites can allow the replacement of a nucleotide sequence encoding a protein fragment by a nucleotide sequence encoding another protein fragment.
  • Another utility is the production of protein hybrids.
  • the introduced restriction sites are chosen from the group consisting of the restriction sites comprised in the polylinker of the vector that will be used.
  • the introduced restriction sites are chosen from the group consisting of the restriction sites comprised in the polylinker of the pUC19 vector, namely EcoR I, Sac I, Kpn I, Sma I, BamH I, Xba I, Sal I, BspM I, Pst I, Sph I, Hind III.
  • the restriction sites are introduced so as to respect the order of the restriction sites in the polylinker.
  • the order of the restriction sites is that of the pUC19 polylinker (5' EcoR I - Sac I - Kpn I - Sma I - BamH I - Xba I - Sal I - BspM I - Pst I - Sph I - Hind III 3').
  • the method used for the introduction of restriction sites is the following.
  • the restriction site is translated into amino acids in the three reading frames.
  • a relation between one restriction site and a group of amino acid sequences (2 or 3 amino acids) is established. See Figure 4.
  • This method of identifying the most appropriate place to introduce a restriction site is based on a regular expression search.
  • Figure 5 presents the regular expressions to search for in the amino acid sequence in order to identify the most appropriate place to introduce a restriction site, for the group of pUC19 polylinker sites.
  • the amino acid sequence of the polypeptide to be encoded is examined in order to identify the places showing one sequence of the group of amino acid sequences for one site. At this place, the sequence encoding the polypeptide can be modified in order to introduce the restriction site.
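The translation of a site in the three reading frames into an amino-acid regular expression can be sketched as below, assuming the standard genetic code. Bases outside the site are free (written N), so each codon slot becomes a character class of every amino acid with a matching codon; stop codons are excluded, and a frame whose slot only fits stop codons is dropped. This is not a reproduction of Figure 5, and the function names are hypothetical.

```python
import itertools
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in itertools.product(enumerate(BASES), repeat=3)}


def slot_regex(pattern):
    """Regex for the amino acids whose codons match a 3-letter pattern (N = any base)."""
    aas = sorted({aa for codon, aa in CODE.items()
                  if aa != "*" and all(p in ("N", b) for p, b in zip(pattern, codon))})
    if not aas:
        return None  # only stop codons fit: the site cannot sit in this frame
    return aas[0] if len(aas) == 1 else "[" + "".join(aas) + "]"


def site_regexes(site):
    """(offset, amino-acid regex) pairs for a restriction site in the three frames."""
    result = []
    for offset in range(3):
        padded = "N" * offset + site
        padded += "N" * (-len(padded) % 3)  # complete the last codon
        slots = [slot_regex(padded[i:i + 3]) for i in range(0, len(padded), 3)]
        if None not in slots:
            result.append((offset, "".join(slots)))
    return result
```

For BamH I (GGATCC) this yields `GS`, `[GRW]I[HLPQR]` and `[AEGKLMPQRSTVW]DP`. A match of the regex for frame `offset` starting at amino acid `m.start()` places the site at nucleotide `3 * m.start() + offset` of the coding sequence. Some codons admitted by these classes contain a CpG (CGG for Arg, for example), so the codons actually chosen at that place must still pass the CpG check.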
  • the database comprises at least the amino acids and the codons encoding said amino acids.
  • a coefficient or frequency is assigned to each amino acid - codon couple (for example, see Figure 3). The higher the coefficient, the more frequently the codon is introduced. If an amino acid - codon couple is undesired, the coefficient is near zero, preferably zero. An amino acid - codon couple could be undesired because of the presence of a CpG dinucleotide and/or a very low frequency in the host or the group of hosts.
  • the coefficient can be used to introduce only the most frequent codon in a host cell or a group of hosts.
  • the coefficient can be used to meet the codon usage table of a host cell or a group of host cells.
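Both uses of the amino acid - codon coefficient can be sketched in one function: take the highest coefficient, or draw codons with probability proportional to their coefficients so that, on average, the usage table is approximated. Random weighted drawing is an assumed technique (the source does not say how the table is met this way), and the coefficients in the example are made up.

```python
import random


def pick_codon(candidates, coeff, most_frequent=True, rng=random):
    """Choose a codon using amino acid - codon coefficients.

    Codons with a zero (or missing) coefficient, e.g. CpG-containing or very
    rare codons, are never chosen. With most_frequent=True the highest
    coefficient wins; otherwise a codon is drawn with probability
    proportional to its coefficient.
    """
    usable = [c for c in candidates if coeff.get(c, 0.0) > 0]
    if not usable:
        return None
    if most_frequent:
        return max(usable, key=lambda c: coeff[c])
    return rng.choices(usable, weights=[coeff[c] for c in usable])[0]
```

Passing a seeded `random.Random` instance as `rng` makes the weighted mode reproducible.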
  • the invention also encompasses a database with information on the undesired restriction enzyme sites.
  • the invention further encompasses a database with information on the desired restriction enzyme sites. More particularly, this database comprises the regular expressions for the considered restriction enzyme sites (for example, see Figure 5). This information can be contained in a database or in a combination of databases.
  • the coefficients of the database can be used to calculate a score, more particularly in a computerized process.
  • the database(s) make it possible to select the codons during the (re)writing process and to check whether the (re)written polynucleotide meets the requirements.
  • the requirements can be a predetermined CpG dinucleotide content, and/or a codon usage table corresponding to a host or group of hosts, and/or the absence of undesired restriction sites.
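A requirement check of this kind might look as follows. This is a sketch under two assumptions not stated in the text: the CpG requirement is expressed as a maximum count, and undesired sites are searched on both strands so that non-palindromic recognition sequences are also caught. Codon usage conformity is left out for brevity, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides; CG is its own reverse complement, so one strand suffices."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def meets_requirements(seq: str, max_cpg: int, undesired_sites: list[str]) -> bool:
    """True if `seq` has at most `max_cpg` CpGs and no undesired site on either strand."""
    if cpg_count(seq) > max_cpg:
        return False
    rc = reverse_complement(seq)
    return not any(site in seq or site in rc for site in undesired_sites)
```

For example, `GAGACC` on the given strand is rejected when `GGTCTC` is listed as undesired, because it is that site's reverse complement.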
  • the process for (re)writing a polynucleotide sequence is computerized.
  • the software is intended to provide a toolkit (that is, a set of software components available for later use in an encapsulated program) as well as a standalone executable for rewriting genes.
  • These software components perform the main task of (re)writing a polynucleotide sequence with a predetermined CpG content from a polypeptide or polynucleotide sequence and, optionally, the two following tasks:
  • (re)writing a polynucleotide sequence in which restriction enzyme sites are deleted and/or introduced
  • the algorithm initially consists in reading the first amino acid of the polypeptide and selecting a CpG-free codon coding for said first amino acid.
  • the selected codon is written as the first codon of the polynucleotide to write.
  • the algorithm then reads the second amino acid of the desired polypeptide (adjacent to the first one) and selects a second codon coding for said second amino acid such that the already selected first codon, considered together with the second codon, is CpG free.
  • in other words, the second codon does not contain a CpG and no CpG straddles the first and second codons.
  • the algorithm may check each possible codon coding for the second amino acid in turn, until it finds one that fulfils the above selection condition.
  • the second selected codon is written to the polynucleotide adjacent to the first selected codon.
  • the algorithm repeats this step successively for the third to the Nth amino acid, each time selecting a codon that forms no CpG with the previously selected codon.
  • the main routine of the algorithm thus comprises successively reading the Ith amino acid of the desired polypeptide and selecting an Ith codon coding for said Ith amino acid such that the already selected (I-1)th codon of the polynucleotide, considered together with the Ith codon, is CpG free, I being incremented one by one from 2 to N.
  • the written polynucleotide will encode the polypeptide
  • This algorithm is particularly suited to being computerized. It can be implemented with the help of a database listing, for each amino acid, the possible corresponding codons.
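The main routine can be sketched in a few lines of Python, assuming the standard genetic code as the codon database. `write_cpg_free` and `compatible` are illustrative names; a codon is accepted only if it has no internal CG and the last base of the previous codon followed by it does not form a CG.

```python
from typing import Optional

# Codon database: amino acid -> codons of the standard genetic code ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def compatible(prev: str, codon: str) -> bool:
    """True if `codon` has no internal CpG and forms none with the previous codon."""
    return "CG" not in prev[-1:] + codon


def write_cpg_free(protein: str) -> Optional[str]:
    """Greedy main routine: for each amino acid keep the first compatible codon."""
    codons: list[str] = []
    for aa in protein:
        prev = codons[-1] if codons else ""
        choice = next((c for c in SYNONYMS[aa] if compatible(prev, c)), None)
        if choice is None:
            return None                # dead end at this amino acid
        codons.append(choice)
    return "".join(codons)
```

With the codon order of this table, `write_cpg_free("MRS")` returns `ATGAGATCT`: the four CGN arginine codons are skipped in favour of AGA.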
  • An improvement consists in the following: if every possible codon corresponding to the Ith amino acid creates a CpG straddling the (I-1)th selected codon, the algorithm branches back to the (I-1)th amino acid to select another corresponding codon. This other codon is selected as before, so as to be CpG free with respect to the (I-2)th selected codon, and is then written into the polynucleotide in place of the previously selected codon at this location. The algorithm then continues again with the Ith amino acid.
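The branching-back improvement turns the greedy routine into a depth-first search. The sketch below keeps, for every position, the index of the next codon to try, so it can revisit the (I-1)th amino acid without recursion (long genes would otherwise hit Python's recursion limit). The optional `order` argument is a hypothetical stand-in for a host-dependent codon preference order.

```python
from typing import Optional

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def compatible(prev: str, codon: str) -> bool:
    return "CG" not in prev[-1:] + codon


def write_with_backtracking(protein: str,
                            order: Optional[dict[str, list[str]]] = None) -> Optional[str]:
    """Depth-first codon choice; on a dead end at amino acid I, revisit amino acid I-1."""
    order = order or {}
    codons: list[str] = []
    next_try = [0] * len(protein)      # next candidate index to try at each position
    i = 0
    while 0 <= i < len(protein):
        candidates = order.get(protein[i], SYNONYMS[protein[i]])
        prev = codons[i - 1] if i > 0 else ""
        while next_try[i] < len(candidates) and not compatible(prev, candidates[next_try[i]]):
            next_try[i] += 1
        if next_try[i] < len(candidates):
            codons[i:] = [candidates[next_try[i]]]   # write, or overwrite, codon I
            next_try[i] += 1
            i += 1
        else:
            next_try[i] = 0            # exhausted: reset and branch back to I-1
            i -= 1
    return "".join(codons) if i == len(protein) else None
```

If phenylalanine prefers TTC, a plain greedy pass on `FG` fails because every glycine codon starts with G; the backtracking version returns `TTTGGT` instead.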
  • the algorithm may be extended to obtain a polynucleotide that is both CpG free and free of undesired restriction enzyme sites. To this end, the first codon is also chosen so as to be free of undesired restriction sites and, in the above main routine, the codon for the Ith amino acid is selected so that it is also free of undesired restriction sites when considered together with the (I-1)th selected codon. This can be done in the same manner as the CpG check described previously, and can also be implemented with the help of a database containing the nucleotide sequences of the restriction enzyme sites to consider.
  • when a restriction site to avoid is an undesired sequence of more than six nucleotides (longer than two codons), the main routine is adapted so that the Ith amino acid is considered together with several previously selected codons ((I-1)th, (I-2)th, ...), so that the presence of such restriction sites can be checked over these codons plus the one being selected.
  • for example, if the restriction site to avoid has 8 nucleotides, the main routine considers the two previously selected codons when selecting the following one.
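One way to generalize the look-back, sketched below, is to re-scan only the tail of the growing sequence after each codon is appended: any new occurrence of a pattern of length m must end within the last three bases, so it lies within the last m + 2 bases. Treating CG as just another forbidden pattern lets one check cover both constraints. The function names are hypothetical, and for non-palindromic sites the caller is assumed to add the reverse complement to the list.

```python
from typing import Optional

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def tail_is_clean(seq: str, forbidden: list[str]) -> bool:
    """Scan only the bases a newly appended codon can affect.

    A new occurrence of a pattern of length m must end in the last three bases,
    so it starts within the last m + 2 bases; earlier text was checked already.
    """
    return not any(p in seq[-(len(p) + 2):] for p in forbidden)


def write_avoiding(protein: str, forbidden: list[str],
                   order: Optional[dict[str, list[str]]] = None) -> Optional[str]:
    """Greedy writer rejecting any codon that creates a forbidden pattern."""
    order = order or {}
    seq = ""
    for aa in protein:
        for codon in order.get(aa, SYNONYMS[aa]):
            if tail_is_clean(seq + codon, forbidden):
                seq += codon
                break
        else:
            return None                # dead end: combine with backtracking in practice
    return seq
```

With `RIL` and a preference for CTG, the out-of-frame EcoRI site in AGA ATT CTG is detected across three codons and TTG is written instead.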
  • writing a restriction-site-free polynucleotide is possible with this computerized algorithm independently of whether it is also CpG free: the selection of each codon may simply be done without checking for CpG. As described here, restriction sites are handled at the same time as CpG. However, it is also possible to first write a CpG-free polynucleotide regardless of restriction sites and then rewrite it to make it restriction site free, or vice versa.
  • the computerized algorithm is extended so that the written polynucleotide tends to respect the codon usage of a host organism.
  • a tree exploration is used. An example of such a tree (constructed for a sequence of amino acids) is shown in Figure 22.
  • the algorithm begins with the first amino acid of the sequence: it builds one node per codon coding for this amino acid (the circles in Figure 22). It then computes a score for each node, based on the frequencies of occurrence of codons in the portion of the sequence rewritten so far (the scoring algorithm is detailed below; the scores are indicated on the branches of the tree in Figure 22). Once the nodes have been scored, the best-scored node is chosen and the same operation is applied recursively. If at any time a CG or a restriction enzyme site is found, the algorithm stops at the currently investigated node and traces back to the previous node.
  • each node corresponds to a new codon (taken from the list of possible codons) added to the sequence under construction. The algorithm computes the percentage of each possible codon in the newly constructed sequence and compares it to the percentage in the target codon usage by computing the square of the difference between these percentages. It then takes the maximum of these squared differences over the possible codons. This gives one number for each candidate sequence, which is the score associated with the node. The node selected is the one with the lowest score, which is considered the best score.
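The node score can be sketched as below. The text does not say which total the percentages are taken over; this sketch assumes the common convention that a codon's share is counted among the synonymous codons already used for the same amino acid, and that `target` gives the corresponding shares. `node_score` is a hypothetical name.

```python
from collections import Counter


def node_score(last_seq: list[str], candidate: str, aa: str,
               synonyms: dict[str, list[str]], target: dict[str, float]) -> float:
    """Score of the node that appends `candidate` for amino acid `aa` (lower is better).

    For each codon synonymous with `aa`: share among the codons used so far for `aa`
    (candidate included) minus its target share, squared; the node score is the maximum.
    """
    syn = synonyms[aa]
    counts = Counter(c for c in last_seq + [candidate] if c in syn)
    total = sum(counts.values())       # >= 1 because `candidate` is a synonym
    return max((counts[c] / total - target[c]) ** 2 for c in syn)
```

With one lysine already written as AAA and a 40/60 AAA/AAG target, appending AAG scores about 0.01 versus 0.36 for AAA, so the AAG node is selected.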
  • « seq » corresponds to the amino-acids sequence of the polypeptide
  • « AA » is an abbreviation for amino-acid
  • « last-seq » are the codons that were selected for the previous amino-acids of the polypeptide.
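Putting the pieces together, the tree exploration can be sketched as a best-first depth-first search in which `seq` and `last_seq` play the roles of « seq » and « last-seq ». Everything else is an assumption: the scoring convention (shares among synonymous codons), the uniform target used for codons missing from `target`, and the tail-only check for CG and site patterns. Recursion depth equals the protein length, so long genes would need an iterative version or a raised recursion limit.

```python
from collections import Counter
from typing import Optional

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def score(last_seq: list[str], codon: str, aa: str, target: dict[str, float]) -> float:
    """Max squared gap between synonym shares and target shares (uniform if unspecified)."""
    syn = SYNONYMS[aa]
    counts = Counter(c for c in last_seq + [codon] if c in syn)
    total = sum(counts.values())
    return max((counts[c] / total - target.get(c, 1 / len(syn))) ** 2 for c in syn)


def clean(dna: str, forbidden: list[str]) -> bool:
    """No forbidden pattern (e.g. "CG", a site) in the part the last codon can affect."""
    return not any(p in dna[-(len(p) + 2):] for p in forbidden)


def explore(seq: str, last_seq: list[str], target: dict[str, float],
            forbidden: list[str]) -> Optional[list[str]]:
    """Best-first depth-first search over codon nodes; None means trace back."""
    if len(last_seq) == len(seq):
        return last_seq
    aa = seq[len(last_seq)]
    written = "".join(last_seq)
    nodes = [c for c in SYNONYMS[aa] if clean(written + c, forbidden)]
    for codon in sorted(nodes, key=lambda c: score(last_seq, c, aa, target)):
        result = explore(seq, last_seq + [codon], target, forbidden)
        if result is not None:
            return result
    return None                        # every child failed: back to the previous node
```

`explore("FG", [], {"TTC": 1.0, "TTT": 0.0}, ["CG"])` first tries the better-scored TTC, finds that every glycine node creates a CG, traces back and returns `["TTT", "GGT"]`.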
  • codon usage optimisation may be used independently of the CpG-free condition and/or of the restriction-site-free condition. To perform codon usage optimisation alone, it suffices to skip the CpG and/or restriction site checks.
  • the computerized algorithms previously described may easily be adapted for rewriting a first polynucleotide having a coding sequence into a second one coding for the same polypeptide, but CpG free and/or restriction site free. To this end, instead of successively reading the amino acids of the polypeptide, the algorithm successively reads the codons of the first polynucleotide, determines the corresponding amino acid (e.g. by using the database) and then continues as previously described.
  • Heuristic improvements to the branching algorithm for codon usage optimisation
  • the algorithm selects the path of K codons having the best score and writes them into the polynucleotide.
  • the algorithm enumerates all the possible codon combinations for these K amino acids and selects the combination of K codons closest to the codon usage when considered together with all the previously selected codons of the polynucleotide. The algorithm then repeats the operation for the following K amino acids.
  • This method computes a local score spanning K codons rather than a single codon as in the algorithm illustrated in Figure 23.
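The K-codon heuristic can be sketched by enumerating every synonymous combination for the next K amino acids with `itertools.product` and keeping the CpG-free one whose addition leaves the whole sequence closest to the target usage. The distance reuses the max-of-squared-differences score described for the tree exploration; the names and the uniform default target are assumptions. The search costs up to 6^K evaluations per block, and because earlier blocks are never revisited, a block can still hit a dead end, which this sketch reports as an error.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
AA_OF: dict[str, str] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            AA_OF[a + b + c] = AMINO[16 * i + 4 * j + k]
            SYNONYMS.setdefault(AA_OF[a + b + c], []).append(a + b + c)


def usage_distance(codons: list[str], target: dict[str, float]) -> float:
    """Largest squared gap between a codon's share among its synonyms and its target share."""
    worst = 0.0
    for aa in {AA_OF[c] for c in codons}:
        syn = SYNONYMS[aa]
        counts = Counter(c for c in codons if c in syn)
        total = sum(counts.values())
        for c in syn:
            worst = max(worst, (counts[c] / total - target.get(c, 1 / len(syn))) ** 2)
    return worst


def write_k_lookahead(protein: str, k: int, target: dict[str, float]) -> str:
    """Choose codons K at a time, scoring each block against everything written so far."""
    codons: list[str] = []
    for start in range(0, len(protein), k):
        block = protein[start:start + k]
        best_combo, best_score = None, float("inf")
        for combo in product(*(SYNONYMS[aa] for aa in block)):
            if "CG" in "".join(codons[-1:] + list(combo)):
                continue               # internal, straddling or junction CpG
            score = usage_distance(codons + list(combo), target)
            if score < best_score:
                best_combo, best_score = combo, score
        if best_combo is None:
            raise ValueError(f"no CpG-free codons for residues {start}..{start + len(block) - 1}")
        codons.extend(best_combo)
    return "".join(codons)
```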
  • Second heuristic: segmentation of the search
  • the algorithm begins with the first amino acid and then scans the sequence sequentially.
  • the algorithm does not take into account the regions with strong constraints (i.e. regions with many CGs, regions containing restriction sites, or both).
  • A heuristic could be to begin the process with the regions having a high density of CG and/or restriction sites, so that these regions get the maximum flexibility in codon distribution (in other words, more choices are available at the beginning than at the end).
  • the algorithm may comprise a preliminary step of looking for regions of the first polynucleotide having a higher concentration of CpG and/or restriction sites than the average for the whole polynucleotide, rewriting these high-concentration regions first and the other regions afterwards.
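The preliminary step might be sketched as a scan over fixed, codon-aligned windows, comparing each window's motif density with the average over the whole polynucleotide and returning the denser windows, densest first (the order in which they would be rewritten). The window length is a hypothetical parameter, and motifs that straddle a window boundary count toward the overall average only.

```python
def count_overlapping(seq: str, motif: str) -> int:
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def dense_regions(seq: str, window: int, motifs: list[str]) -> list[tuple[int, int]]:
    """(start, end) of windows whose motif density beats the sequence average, densest first."""
    if not seq:
        return []
    average = sum(count_overlapping(seq, m) for m in motifs) / len(seq)
    scored = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        density = sum(count_overlapping(chunk, m) for m in motifs) / len(chunk)
        if density > average:
            scored.append((density, start, start + len(chunk)))
    scored.sort(key=lambda r: -r[0])   # stable: ties keep left-to-right order
    return [(lo, hi) for _, lo, hi in scored]
```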
  • the algorithm may provide a polynucleotide which is CpG free as well as restriction site free, while being optimized as regards the codon usage.
  • the present invention also covers the use of a genetic algorithm when a better global optimization is wanted.
  • This kind of algorithm also tends to find local optima of an optimization problem, so it has the same limitations as the branching algorithm. However, since it is based on a completely different approach, it is likely to give a different type of solution. It is therefore preferable to use both algorithms and to keep the best solution.
  • the algorithm of the invention may also be used to write a polynucleotide having a given CpG content instead of being CpG free.
  • a first method consists in first writing a CpG-free polynucleotide starting from the polypeptide or from a first polynucleotide, e.g. with the help of the previously mentioned algorithm. The CpG-free polynucleotide is then rewritten so as to add CpGs in the desired quantity. To this end, the algorithm sequentially screens the CpG-free polynucleotide to find codons for which at least one equivalent codon (i.e. coding for the same amino acid) contains a CpG dinucleotide. When such a codon is found, the algorithm replaces it with the equivalent CpG-containing codon. The algorithm repeats the operation as many times as necessary to introduce the desired number of CpGs.
  • the algorithm may also restart the screening to look for two adjacent codons that can be replaced with two equivalent codons forming a CpG that straddles them.
  • a second method consists in first screening the polynucleotide to rewrite in order to determine the number of CpGs it contains. If it contains more CpGs than desired, the CpG-free-polynucleotide algorithm may be applied to the sequence in order to remove the excess CpGs. Conversely, if it contains fewer CpGs than desired, the algorithm screens for CpG-free codons that have at least one equivalent CpG-containing codon and replaces a number of such codons to obtain the desired number of CpGs in the polynucleotide. If this is not possible, the algorithm may also screen the polynucleotide for pairs of adjacent codons that can be replaced by two equivalent codons forming a straddling CpG.
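Both methods reduce to synonymous swaps evaluated against a CpG count. The sketch below visits each codon of an in-frame coding sequence once and keeps a swap only when it moves the count toward the wanted value without overshooting; because the count is taken over the whole sequence, swaps that create or remove a CpG straddling a neighbouring codon are handled too. Replacing two adjacent codons at once, as described above, is not implemented, and `adjust_cpg` is a hypothetical name.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
AA_OF: dict[str, str] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            AA_OF[a + b + c] = AMINO[16 * i + 4 * j + k]
            SYNONYMS.setdefault(AA_OF[a + b + c], []).append(a + b + c)


def cpg_count(dna: str) -> int:
    return sum(1 for i in range(len(dna) - 1) if dna[i:i + 2] == "CG")


def adjust_cpg(dna: str, wanted: int) -> str:
    """One pass of single-codon synonymous swaps moving the CpG count toward `wanted`."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    current = cpg_count(dna)
    for i in range(len(codons)):
        if current == wanted:
            break
        original = codons[i]
        for alt in SYNONYMS[AA_OF[original]]:
            codons[i] = alt
            n = cpg_count("".join(codons))
            if n != current and min(current, wanted) <= n <= max(current, wanted):
                current = n            # keep the swap: one step closer, no overshoot
                break
        else:
            codons[i] = original       # no helpful synonym: keep the original codon
    return "".join(codons)
```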
  • the algorithm provides for the insertion of restriction sites in the polynucleotide. To this end, a database contains, for each desired restriction site, the amino acid combinations that can be encoded by adjacent codons comprising said site. To introduce a given restriction site in the polynucleotide, the algorithm screens the polynucleotide for adjacent codons encoding one of the amino acid combinations associated with that site in the database. Then, if possible, the algorithm replaces these adjacent codons with codons that encode the same amino acid combination but contain the restriction site.
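Instead of a precomputed database of amino acid combinations, the sketch below tries every synonymous recoding of each run of adjacent codons long enough to hold the site in any frame, and keeps the first one that contains the site without raising the CpG count. This brute-force substitute and the name `introduce_site` are assumptions made for brevity; a real implementation would also re-check undesired sites and codon usage.

```python
from itertools import product
from typing import Optional

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS: dict[str, list[str]] = {}
AA_OF: dict[str, str] = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            AA_OF[a + b + c] = AMINO[16 * i + 4 * j + k]
            SYNONYMS.setdefault(AA_OF[a + b + c], []).append(a + b + c)


def cpg_count(dna: str) -> int:
    return sum(1 for i in range(len(dna) - 1) if dna[i:i + 2] == "CG")


def introduce_site(dna: str, site: str) -> Optional[str]:
    """Synonymously recode the first run of adjacent codons that can carry `site`
    without raising the CpG count; return None if no such run exists."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    span = (len(site) + 4) // 3        # codons a site can overlap in any frame
    base_cpg = cpg_count(dna)
    for start in range(len(codons) - span + 1):
        block = codons[start:start + span]
        for combo in product(*(SYNONYMS[AA_OF[c]] for c in block)):
            if site in "".join(combo):
                candidate = "".join(codons[:start] + list(combo) + codons[start + span:])
                if cpg_count(candidate) <= base_cpg:
                    return candidate
    return None
```

In `ATGGAGTTTAAA` (Met-Glu-Phe-Lys), recoding Glu-Phe as GAA TTC introduces an EcoRI site and gives `ATGGAATTCAAA`.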
  • weights are assigned to each constraint, preferably in proportion to the priority desired by the user. For instance, during the rewriting process, a CG may cost 10 and the addition of a restriction site from said restriction enzyme database may cost 30, which means that the user prefers adding a CG to adding a restriction site.
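The weighting scheme amounts to a linear penalty. As a sketch, using the example weights of 10 per CG and 30 per restriction site from the database (the function name is hypothetical), a rewriting step that must accept one violation would choose the cheaper one:

```python
def count_overlapping(seq: str, motif: str) -> int:
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def constraint_cost(dna: str, sites: list[str],
                    cg_cost: float = 10.0, site_cost: float = 30.0) -> float:
    """Weighted penalty: each CG costs `cg_cost`, each occurrence of a listed site `site_cost`."""
    cg = count_overlapping(dna, "CG")
    hits = sum(count_overlapping(dna, s) for s in sites)
    return cg_cost * cg + site_cost * hits
```

Here `AAACGA` (cost 10) is preferred to `GAATTC` (cost 30) when `GAATTC` is in the site list, matching the stated preference for adding a CG rather than a restriction site.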
  • the first method is the constraint-solving programming method (hereafter referred to as
  • This embodiment of the invention consists of five steps. i) Define the size of the sequence window, that is, the number of consecutive amino acids that will be assigned to each node for the optimization process. This size is computed as the integer part of the ratio of the sequence length (in amino acids) to the number of available computation nodes.
  • by "sequence window", or "window", is meant hereafter a portion of the amino acid sequence to rewrite that comprises said number of consecutive amino acids.
  • by "sequence" is meant hereafter the whole amino acid sequence to be rewritten.
  • Assign each window to one node. Each node performs on its sequence window the same rewriting as described in the previous, sequential embodiment of the invention (points x to y).
  • the nodes are computationally distinct entities.
  • the final step consists in reassembling the rewritten windows while avoiding the creation of CGs, restriction sites and/or other constraint violations at the junctions.
  • the algorithm changes the codons flanking each join one after another, using the codons allowed by codon degeneracy, without taking codon usage into account. The idea is that, since codon usage is a global property of the whole sequence, changing a few codons at the join positions will not significantly change the codon usage of the whole sequence. v) If the constraints at the join positions still cannot be resolved, the algorithm steps back to step
  • Polypeptide or polynucleotide encoding the polypeptide
  • The sequence of the encoded polypeptide is not modified by the (re)writing process.
  • the method according to the invention concerns the (re)writing of a polynucleotide encoding a polypeptide from a polypeptide sequence.
  • said polypeptide is a native polypeptide.
  • said polypeptide is a mutated polypeptide derived from a native polypeptide.
  • said polypeptide is a chimeric polypeptide.
  • said polypeptide is an artificial polypeptide.
  • the method according to the invention concerns the (re)writing of a second polynucleotide encoding a polypeptide from a first polynucleotide containing the encoding sequence for the same polypeptide.
  • said first polynucleotide encoding a polypeptide is a native polynucleotide.
  • said polynucleotide is a mutated polynucleotide derived from a native polynucleotide.
  • said polynucleotide is a chimeric polynucleotide.
  • said polynucleotide is an artificial polynucleotide.
  • the (re)written polynucleotide can be prokaryotic, viral, or eukaryotic (notably from plant).
  • the polynucleotide to be (re)written can be any kind of gene. It can be an exogenous gene for the host cell. It can also be an endogenous gene. It can be a nuclear gene or an organelle gene. For example, the (re)written gene can be a reporter gene.
  • a (re)written polynucleotide is disclosed for F-TevI, F-TevII, HO, I-CeuI, I-ChuI, I-CreI, I-DmoI, I-SceI, I-TevI, I-TevII, I-TevIII, PI-MleI, PI-PfuI, PI-PfuII, PI-SceI, PI-TliI, PI-TliII, I-DirI and PI-MtuI.
  • Meganucleases are very rare-cutting enzymes encoded, in the large majority of cases, by intron ORFs (intron meganucleases), "classical" genes or intervening sequences (inteins). These enzymes have striking structural and functional properties that distinguish them from "classical", well-known restriction enzymes (generally from bacterial type II restriction-modification systems). They recognize non-palindromic sequences spanning 12-40 bp of DNA, whereas "classical" restriction enzymes recognize much shorter stretches of DNA, in the 3-8 bp range (up to 12 bp for rare cutters).
  • meganucleases can be used for in vivo genome engineering. Indeed, they recognize long DNA sequences and can thus locate and cut a unique, specific site in an entire genome. For example, they can cut a gene specifically at a unique given location.
  • Some recombination methods based on double-strand break repair, used to introduce modifications into the cellular genome, rely on meganucleases. These methods are described in US 5,474,896, US 5,792,632, US 5,866,361, US 5,948,678, US 5,962,327, US 5,830,729, WO 00/46385 and WO 00/46386; these patents and patent applications are hereby incorporated by reference in their entirety. Meganuclease-based recombination systems allow outstanding increases in levels of homologous recombination.
  • the meganuclease has to be expressed in host cells that do not naturally express meganucleases. Indeed, a number of meganuclease genes are encoded by organelle DNA, such as that of mitochondria or chloroplasts. Generally, expressing meganucleases in a prokaryotic or eukaryotic host cell requires modification of their ORF (open reading frame).
  • the present invention is concerned with isolated polynucleotides derived from a native gene having an increased or reduced content of CpG dinucleotides as compared to the native gene.
  • the isolated polynucleotides thereby demonstrate a modified level of expression once introduced into a cell as compared to the native gene's level of expression.
  • the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having 1 or 0 CpG dinucleotide.
  • the invention concerns the polynucleotide containing a coding sequence for a polypeptide having no CpG dinucleotide.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having 0.05 % of CpG dinucleotide, preferably 0.01 %.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having less than 0.5 % of CpG dinucleotide, preferably less than 0.1 % of CpG dinucleotide, more preferably less than 0.05 % of CpG dinucleotide, and meeting the codon usage table of a host or a group of hosts.
  • said (re)written polynucleotide has no undesired restriction site.
  • at least one desired restriction site has been introduced in said (re)written polynucleotide.
  • said (re)written polynucleotide has 1 or 0 CpG dinucleotide.
  • said (re)written polynucleotide has no CpG dinucleotide.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having more than 1%, preferably more than 5%, more preferably more than 10% of CpG dinucleotide.
  • said (re)written polynucleotide further meets the codon usage table of a host or a group of hosts.
  • said (re)written polynucleotide has no undesired restriction site.
  • at least one desired restriction site has been introduced in said (re)written polynucleotide.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • the invention also encompasses a (re)written polynucleotide containing a coding sequence for a polypeptide meeting the codon usage table of a host or a group of hosts.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • said (re)written polynucleotide has no undesired restriction site.
  • at least one desired restriction site has been introduced in said (re)written polynucleotide.
  • the invention further encompasses a (re)written polynucleotide containing a coding sequence for a polypeptide having no undesired restriction site.
  • said (re)written polynucleotide consists of a coding sequence for a polypeptide.
  • the invention contemplates a (re)written polynucleotide having at least 500, 700, or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb.
  • the (re)written polynucleotides according to the invention are not native; hence, they cannot be found in nature.
  • the invention encompasses any polynucleotide (re)written by a method according to the present invention.
  • the invention also concerns an isolated polynucleotide comprising said (re)written polynucleotide according to the present invention.
  • the invention more particularly relates to any one of the (re)written sequences SEQ ID NO: 1
  • the invention concerns any polynucleotide comprising or consisting of a fragment of at least 20, 30, 50, 100, 200 consecutive nucleotides from any one of the (re)written sequences SEQ ID N° 1, 3, 5, 7, 9, 11, 13, 15 and 17.
  • the (re)written polynucleotides can be synthesized by any method known to the person skilled in the art.
  • such articles and patents describe means of synthesizing genes (Engels et al., Adv Biochem Eng Biotechnol. 1988;37:73-127; Beattie et al., Biotechnol Appl Biochem. 1988 Dec;10(6):510-21; Casimiro et al., Structure. 1997 Nov 15;5(11):1407-12; Scheller et al., Nat Biotechnol. 2001 Jun;19(6):573-577; Massaer et al., Int Arch Allergy Immunol.
  • the invention concerns a method of producing a polynucleotide containing a coding sequence for a polypeptide, comprising the steps of: a) (re)writing said polynucleotide by any (re)writing method according to the present invention; and b) synthesizing said polynucleotide.
  • the invention also encompasses expression vectors, cells, and living organisms genetically modified as to comprise and/or express any of the polynucleotides object of the invention or a complementary sequence thereto.
  • the invention further encompasses a cell or a living organism containing a vector comprising a (re)written polynucleotide according to the invention. More particularly, the living organism is a transgenic animal or plant. Preferably, said transgenic animal is murine, more preferably a mouse.
  • said transgenic plant is selected from sweet pepper, cucumber, sunflower, leek, sugar beet, tomato, carrot, Brassica napus, chicory, corn, wheat, barley, cotton, soybean, triticale, oat, tobacco, rye and rice.
  • the cell comprising a (re)written polynucleotide according to the invention is an embryonic stem cell or fertilized egg.
  • the cell comprising a (re)written polynucleotide according to the invention is a protoplast. More preferably, said embryonic stem cell or fertilized egg is murine, preferably from a mouse. In another embodiment, the cell can be a differentiated cell.
  • the host cell can be of the same species as the organism from which the polypeptide to express originates, or of a different species.
  • the host cell can be different from the cell naturally expressing the polypeptide.
  • the host cell is a differentiated cell.
  • the host cell is a differentiated cell which does not naturally express the encoded polypeptide.
  • Host organisms or host can refer to an organism, more preferably a group of organisms such as higher or lower eukaryotes, prokaryotes or plants; still more preferably, said organisms refer to a combination of eukaryotes, prokaryotes and plants.
  • the present invention relates to expression vectors, cells and living organisms genetically modified to comprise and/or express any of the isolated polynucleotides comprising or consisting of a (re)written polynucleotide according to the invention.
  • "Genetically modified" cells and living organisms would preferably integrate and express a foreign DNA inserted therein.
  • Well known methods for reliably inserting a foreign DNA into cells and/or living organisms include : bacterial transformation, transgenesis, stem cells transformation, viral transfection, and artificial chromosome insertion.
  • the foreign DNA may be integrated into the genome of the host or be present in a non-integrated form (episomal, plasmid or viral). It may also be inserted into an artificial chromosome or into an independent genome, such as the genome of a bacterium parasitizing a eukaryotic cell.
  • This method is characterized in that it comprises the step of providing an isolated polynucleotide for which expression is desired by (re)writing said polynucleotide containing a coding sequence according to a method of the present invention and expressing said polynucleotide in said host.
  • said host is eukaryotic.
  • the method generally also comprises the step of introducing said isolated polynucleotide into the host using a method preferably selected from the group comprising transgenesis, viral transfection, bacterial transformation, artificial chromosome insertion or homologous recombination, as disclosed for example by Capecchi (Trends Genet., 1989, 5:70-76) or by Brulet et al. in European Patent No. 419621, these documents being incorporated herein by reference.
  • said polynucleotide has a predetermined CpG content. More preferably, the CpG dinucleotide content is 1 or 0. Still more preferably, the CpG dinucleotide content is 0.
  • the (re)written polynucleotide is thereby capable of showing an increased and/or stabilized level of expression when introduced into a cell of said host as compared to the level of expression of the native polynucleotide encoding the same polypeptide in the same host cell.
  • the invention concerns a method to stably express a polynucleotide in a eukaryotic host, comprising the steps of: a) (re)writing a polynucleotide in accordance with any of the (re)writing methods of the present invention; b) expressing said polynucleotide in said host.
  • the invention concerns a method to stably express a polynucleotide in a eukaryotic host, comprising the steps of: a) (re)writing a polynucleotide in accordance with any of the (re)writing methods of the present invention; b) inserting the (re)written polynucleotide of step a) into the host cell; and c) inducing the expression of said (re)written polynucleotide of step b).
  • said (re)written polynucleotide has a minimized content of CpG dinucleotides.
  • the CpG dinucleotide content is preferably less than 1%, more preferably less than 0.5%, and even more preferably less than 0.1%, 0.05% or 0.01% of CpG dinucleotides. Still more preferably, the CpG dinucleotide content is 1 or 0 CpG dinucleotide.
  • said (re)written polynucleotide is CpG free. The minimized CpG dinucleotide content of the (re)written polynucleotide makes it possible to avoid epigenetic silencing due to de novo methylation of CpG dinucleotides.
  • the polynucleotide encoding a polypeptide and having an increased content of CpG dinucleotides can be used for transient expression. Indeed, a high CpG dinucleotide content increases de novo methylation, which stimulates silencing of the polynucleotide. The expression of the polynucleotide is therefore brief.
  • the (re)written polynucleotide having a maximized content of CpG dinucleotides could be used to reduce or silence the expression of said (re)written polynucleotide.
  • the invention concerns a method of reducing or silencing the expression of a polynucleotide in a host cell, comprising the steps of : a) (re)writing an isolated polynucleotide in accordance with any of the (re)writing method of the present invention; b) inserting into the host cell the (re)written polynucleotide; c) reducing or silencing the expression of said (re)written polynucleotide or of a cis-gene proximal or distal to said (re)written polynucleotide.
  • said (re)written polynucleotide has a maximized content of CpG dinucleotide.
  • by maximized is intended that the content of CpG dinucleotide is more than 1%, preferably more than 5%, more preferably more than 10%.
  • the invention concerns the use of the (re)written polynucleotide according to the present invention for obtaining transgenic animals or plants, and/or in gene therapy.
  • the gene therapy can be done for compensating a genetic defect.
  • the methylation of CpG dinucleotides contributes to C-to-T mutations.
  • the removal of the CpG dinucleotides from a gene could avoid such mutations.
  • the p53 gene can be rewritten and thereby protected against C-to-T mutations.
  • a tumor suppressor gene and/or an invasion-suppressor gene can be rewritten to remove its CpG dinucleotides.
  • the rewritten genes could avoid silencing by hypermethylation.
  • another embodiment of the present invention is the use of the (re)written polynucleotide according to the present invention in gene therapy intended for treating or preventing cancer formation.
  • the (re)written gene is a tumor suppressor gene or an invasion-suppressor gene.
  • the invention encompasses the use of the (re)written polynucleotide according to the present invention for the production of a protein or polypeptide of interest in prokaryotes or eukaryotes.
  • the (re)written polynucleotide allows heterologous expression of a protein or polypeptide in any organism.
  • for example, a human protein can be expressed from an exogenous gene in a plant such as tobacco.
  • the invention covers also the use of (re)written polynucleotide for the prevention of an immune response against exogenous DNA used in genetic or cellular therapy.
  • said (re)written polynucleotide has a minimized content of CpG dinucleotides.
  • the CpG dinucleotide content is preferably less than 1%, more preferably less than 0.5%, and even more preferably less than 0.1%, 0.05% or 0.01% of CpG dinucleotides. Still more preferably, the CpG dinucleotide content is 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, said (re)written polynucleotide is CpG free.
  • the invention is also concerned with the use of the (re)written polynucleotide having a minimized content of CpG dinucleotides for the prevention of autoimmunity against endogenous methyl-CpG motifs, DNA used in genetic or cellular therapy, or any similar host sequences.
  • (re)written polynucleotides of the invention with no or a reduced number of CpG dinucleotides, fragments thereof or vectors containing them could be used to minimize a T-cell response against the T-cells or tissues treated with them.
  • the invention thus proposes a new concept of DNA vaccination based on lowering or deleting the CpG dinucleotides of a whole polynucleotide that still encodes an immunoactive antigen.
  • Another aspect of the present invention is the use of the (re)written polynucleotide with a maximized content of CpG dinucleotides in the induction of a protective immune response in vivo or in vitro.
  • the administration of such a (re)written polynucleotide may facilitate and broaden the use of DNA vaccine methods in vivo.
  • a better T-cell response could also be envisaged by an in vitro stimulation of lymphocytes of a patient against a non-natural polynucleotide of interest according to the invention, as compared to the T-cell response against a natural native polynucleotide.
  • Example 1 provides manually (re)written polynucleotides encoding several meganucleases.
  • “more frequent codon” refers to a sequence using the most frequent codon for each amino acid.
  • “preferential codon” refers to the (re)written sequence matching the codon usage table.
  • “CpG minus” refers to the (re)written sequence that contains no CpG dinucleotide while still matching the codon usage table.
  • “restriction minus” refers to the (re)written sequence which does not contain any undesired restriction site and contains the desired restriction sites.
  • F-TevI SEQ ID N° 1
  • HO SEQ ID N° 3
  • I-CreI SEQ ID N° 5
  • I-DmoI SEQ ID N° 7
  • I-SceI SEQ ID N° 9
  • I-TevIII SEQ ID N° 11
  • PI-SceI SEQ ID N° 13
  • EXAMPLE 2 Example 2 provides three polynucleotides, (re)written by a computerized process, encoding the PI-MtuI and I-BmoI meganucleases (SEQ ID N° 15 and 17, respectively).
  • the computerized process is generally at least 100-fold faster. Furthermore, the computerized process matches the codon usage table more closely.
  • EXAMPLE 3 The (re)written polynucleotides encoding the meganucleases were synthesized as follows. For each (re)written polynucleotide, 80-nucleotide oligonucleotides were designed to cover the whole (re)written polynucleotide on both strands, overlapping one another by 50%.
  • A first PCR was performed with 8 to 12 oligonucleotides (4 to 6 per strand, 5 pmol of each oligonucleotide).
  • The PCR was performed with 1 unit of high-fidelity Taq polymerase in a 50 µl reaction volume with the following cycles: 1x 94 °C for 5 min; 25x (94 °C for 30 s, 72 °C for 2 min); 1x 72 °C for 2 min.
  • This first PCR yielded fragments of 300 to 400 bp.
  • The first PCR products were loaded on an agarose gel and the expected band was excised. The DNA contained in this band was purified on a silica column (NucleoSpin® Extract).
  • Two first-PCR fragments overlapping by at least 50 nucleotides were used for a second PCR, together with two primers corresponding to the ends of the fragments. For this PCR, 1/5 of the purified product of the first PCR was used with 20 pmol of primers.
  • The PCR was performed with 1 unit of high-fidelity Taq polymerase in a 50 µl reaction volume with the following cycles: 1x 94 °C for 5 min; 25x (94 °C for 30 s, 61 °C for 1 min, 72 °C for 1 min); 1x 72 °C for 5 min.
  • EXAMPLE 4 The following table indicates whether the written polynucleotide sequences are expressed in the host cells. Three types of host cells were assayed: bacteria, yeast and mammalian cells.
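The following Python sketch illustrates the oligonucleotide layout of Example 3, which uses 80-mers covering both strands and overlapping by 50%. It is one possible reading of that layout, not the documented design. It assumes that top-strand oligos start every 80 nt and that bottom-strand oligos are shifted by 40 nt. The names `reverse_complement` and `tile_oligos` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tile_oligos(seq: str, length: int = 80):
    """Split `seq` into top- and bottom-strand oligos overlapping by 50%.

    Assumed (hypothetical) layout: top-strand oligos start every `length` nt;
    bottom-strand oligos, written 5'->3' and therefore reverse-complemented,
    are shifted by half an oligo. Oligos at the sequence ends may be shorter.
    """
    half = length // 2
    top = [seq[i:i + length] for i in range(0, len(seq), length)]
    bottom = [reverse_complement(seq[i:i + length]) for i in range(half, len(seq), length)]
    return top, bottom


top, bottom = tile_oligos("ATGC" * 50)  # a 200-nt toy sequence
print(len(top), len(bottom))  # 3 2
```

With this layout, each bottom-strand oligo pairs over 40 nt with each of its two top-strand neighbours. These paired regions are where the overlapping oligos can prime one another during the first PCR.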


Abstract

The present invention is concerned with a process for (re)writing a polynucleotide sequence containing a coding sequence for a polypeptide, whereby the content of CpG dinucleotides is adjusted to a predetermined value. These polynucleotides are useful to increase, stabilize, silence and/or reduce gene expression, in particular in protein production, to generate transgenic animals and transgenic plants, or for gene therapy. Preferably, the present invention also relates to a process for stably expressing these (re)written polynucleotides in in vitro and in vivo expression systems.

Description

PROCESS OF WRITING OR REWRITING A POLYNUCLEOTIDE SEQUENCE HAVING A PREDETERMINED CONTENT OF CpG DINUCLEOTIDES
Background of the invention
a) Field of the Invention
The present invention relates to a process for (re)writing a polynucleotide sequence containing a coding sequence, whereby the content of CpG dinucleotides is adjusted to a predetermined value. These polynucleotides are useful to increase, stabilize, silence and/or reduce gene expression, in particular for use in protein production, to generate transgenic animals and transgenic plants, or for gene therapy. Preferably, the present invention also relates to a process for producing or stably expressing these (re)written polynucleotides in in vitro and in vivo expression systems.
b) Brief description of the prior art
Epigenetic control of nucleic acid sequences and regions is known to be involved in the regulation of gene expression.
DNA methylation in eukaryotes involves the addition of a methyl group to the carbon 5 position of the cytosine ring. It is the most common eukaryotic DNA modification and a widespread epigenetic phenomenon. Eukaryotic genomes are not methylated uniformly but contain methylated regions interspersed with unmethylated domains. In eukaryotes, numerous studies have shown that the methylation of 5'CpG3' dinucleotides (mCpG) has a repressive effect on gene expression in vertebrates and flowering plants (Hsieh, Mol Cell. Biol, 14:5467-94, 1994; Kudo, Mol. Cell. Biol, 18:5492-99, 1998; Goto and Monk, Microbiol. Mol. Biol. Rev., 62:382-378, 1998; Jones et al. 1998; Colas 1998; Singal and Ginger, Blood 1999 Jun 15;93(12):4059-70; Henry et al., C.R. Acad. Sci. Paris, 1999, 322: 1061-1070). CpG methylation is primarily associated with transcriptional repression. Tissue-specific genes are variably methylated, often in a tissue-specific pattern, and their methylation level is usually inversely correlated with their transcriptional status. The methylation of 5'CpG3' dinucleotides within genes creates potential targets for protein complexes that bind to methylated DNA sequences and to histone deacetylases (MBD-HDAC). This can lead to transcriptional repression following modification(s) of the chromatin.
Moreover, DNA hypermethylation may contribute to tumorigenicity by silencing genes required to maintain a normal cell phenotype. Methylation has been demonstrated as a mechanism for inactivating several tumor-suppressor genes. Similarly, cancer metastasis and invasion are closely associated with cell-to-cell adhesiveness. The invasion-suppressor gene E-cadherin was silenced by hypermethylation of its promoter region in human carcinomas and in human breast cancer cells.
Furthermore, the methylation of CpG dinucleotides also contributes to C-to-T mutations, as demonstrated for example for the p53 gene. So far, genetic engineering has always been performed so as to maintain natural gene regulation, and hence to preserve the CpG content. Indeed, the silencing effect of CpG methylation is not a problem when a gene is to be expressed in its natural host.
However, in both cultured cells transfected with foreign DNA and transgenic organisms, the newly integrated foreign DNA frequently becomes de novo methylated. It has been proposed that de novo methylation constitutes a cellular defense mechanism to silence integrated foreign DNA or genes. The foreign DNA can be an exogenous gene or an endogenous gene which is not expressed in a differentiated cell. The inactivation of foreign gene expression by methylation in specific cell types has important economic, therapeutic and pharmacological implications.
Indeed, the expression of the introduced gene needs to be stable for transgenic animals or plants and gene therapy. Therefore, the methylation of introduced genes that leads to the silencing of such genes interferes with the therapeutic effect and restrains the use of transgenic animals and plants.
One way to control the stability of gene expression is to adjust the CpG dinucleotide content. Removing CpG dinucleotides for a eukaryotic host makes it possible to bypass the regulation system linked to CpG methylation. This gives strong and stable expression regardless of the cell type and of whether the gene is endogenous or exogenous. To adjust the CpG dinucleotide content, the gene has to be rewritten and the DNA synthesized. Rewritten genes with a decreased CpG dinucleotide content could then avoid silencing by hypermethylation. Alternatively, increasing the CpG dinucleotide content for a eukaryotic host makes it possible to silence expression and to obtain transient expression through de novo methylation.
With the present invention, it is possible to synthesize an artificial gene or a polynucleotide derived from the native gene having a modified content of CpG dinucleotides and thereby modify accordingly the levels of expression of the artificial gene as compared to the unmodified native gene.
Furthermore, the present invention aims to remove the inhibitory expression barrier that exists between organisms of different genera and species. This is achieved by modifying the codon content of the coding sequence so that it matches the codon usage of a particular host. The present invention thus also provides an optimization of the sequence that meets the codon usage of the host organism, in order to achieve high expression. Furthermore, the instant application aims to facilitate the manipulation of the (re)written gene by making it possible to remove or insert restriction enzyme sites.
The present invention also fulfils other needs which will be apparent to those skilled in the art upon reading the following specification.
Summary of the invention
The invention concerns a method of (re)writing a polynucleotide containing a coding sequence, typically a sequence coding for a polypeptide.
Preferably, said (re)written polynucleotide has a predetermined content of CpG dinucleotides. In a first embodiment of the method according to the present invention, the content of CpG dinucleotides is minimized. Preferably, the content of CpG dinucleotides in the (re)written polynucleotide sequence is zero. In a second embodiment, the content of CpG dinucleotides is maximized.
In one alternative, said (re)written polynucleotide encodes said polypeptide with the most frequent codons used in a host or in a group of hosts. In another alternative, said (re)written polynucleotide contains codons chosen so that the codon composition of the encoding sequence meets the codon usage table of a selected host or group of hosts.
Optionally, said (re)written polynucleotide does not comprise any undesired restriction site. Optionally, some restriction sites are introduced into said (re)written polynucleotide. The invention concerns a method of writing a polynucleotide containing a coding sequence for a polypeptide (or any other expression product) comprising the following steps: a) providing at least one database containing the amino acids of said polypeptide and the corresponding codons; b) reading at least one amino acid from said polypeptide sequence; c) selecting from said database one codon which encodes said amino acid; d) repeating steps b) and c) for all amino acids of said polypeptide, the polynucleotide being written with the selected codons; whereby the written polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value. The invention also concerns a method of rewriting a first polynucleotide into a second polynucleotide, each containing a sequence coding for the same expression product (e.g., polypeptide, RNA), comprising the steps of: a) providing at least one database containing groups of codons encoding the same amino acid; b) reading at least one codon from said first polynucleotide; c) selecting from said database one codon which belongs to the same group as the read codon and which can be identical to or different from said read codon; d) repeating steps b) and c) for all codons of said coding sequence of said polynucleotide, the polynucleotide being written with the selected codons; whereby the rewritten polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value. In the rewritten polynucleotide, the mean CpG content differs from that of the first polynucleotide.
The method of (re)writing a polynucleotide containing a coding sequence for a polypeptide can be carried out manually or by a computerized process. Preferably, the (re)writing method is carried out by a computerized process.
The present invention also relates to methods of producing or synthesizing improved polynucleotides encoding a desired expression product (e.g., polypeptide or RNA). The method comprises providing an improved polynucleotide sequence encoding said expression product using a method as described above, and synthesizing a polynucleotide comprising said sequence. Synthesis of the polynucleotide can be performed by a variety of techniques, including recombinant DNA technologies, artificial synthesis, mutagenesis, enzymatic techniques, cloning, ligating, etc., or a combination thereof. In a further step, the polynucleotide may be cloned into a vector, particularly an expression vector.
The present invention also relates to isolated polynucleotides derived from a native gene having an increased or reduced content of CpG dinucleotides as compared to the native gene. In a preferred embodiment, the polynucleotide has a decreased content of CpG dinucleotides. More preferably, the polynucleotide has less than 0.5 %, or 0.1%, preferably less than 0.05%, more preferably less than 0.01% of CpG dinucleotides. Still more preferably, the polynucleotide according to the present invention contains 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, the polynucleotide is CpG free. By CpG free is intended that the polynucleotide does not contain any CpG dinucleotide. Said polynucleotide has at least 500, 700, or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb. Preferably, the polynucleotide encodes a native polypeptide.
Alternatively, the present invention relates to an isolated polynucleotide having an increased content of CpG dinucleotides. Preferably, the polynucleotide has a content of CpG dinucleotides higher than 1%, preferably higher than 5%, more preferably higher than 10%. Said polynucleotide has at least 500, 700, or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb. The invention encompasses the (re) written polynucleotides obtained by a method according to the present invention. Furthermore, the invention concerns polynucleotides containing a sequence coding for a polypeptide having 1 or 0 CpG dinucleotide. Preferably, the invention concerns polynucleotides containing a sequence coding for a polypeptide having no CpG dinucleotide.
The invention also encompasses expression vectors, cells, and living organisms genetically modified as to comprise and/or express any of the polynucleotides object of the invention.
It is also an object of this invention to provide a method to express in a host an isolated polynucleotide comprising any of the polynucleotides object of the invention.
The use of an isolated polynucleotide according to the invention for compensating a genetic defect is contemplated in the present invention. The use of an isolated polynucleotide according to the invention for introducing a trait in a transgenic plant is also contemplated.
The invention and its numerous advantages will be better understood upon reading the following non-restrictive specification and the accompanying drawings.
Brief description of the drawings
Figure 1 presents a codon usage table for prokaryotes, plants and eukaryotes.
Figure 2 presents an optimized codon usage table for prokaryotes and eukaryotes (see column "PRO&EU means"). For each codon encoding an amino acid, the frequency is calculated, and the codon frequencies for one amino acid sum to 1. The column "PRO&EU corrected" gives the codon frequencies modified so that codons comprising a CpG dinucleotide are not used and therefore have a frequency of zero. The frequency of each codon comprising a CpG dinucleotide is distributed among the other codons encoding the same amino acid.
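The CpG correction described for Figure 2 can be sketched in Python for a single amino acid. The text does not say how the frequency of a CpG-containing codon is shared among the remaining codons. This sketch assumes a proportional redistribution, that is, a renormalization of the remaining frequencies. The alanine frequencies in the usage line are hypothetical values, not those of Figure 2.

```python
def cpg_corrected(freqs: dict[str, float]) -> dict[str, float]:
    """Set CG-containing codons to 0 and renormalize the others to sum to 1.

    `freqs` maps the synonymous codons of one amino acid to their frequencies.
    Proportional redistribution is an assumption made for this sketch.
    """
    kept = {codon: f for codon, f in freqs.items() if "CG" not in codon}
    total = sum(kept.values())
    return {codon: kept[codon] / total if codon in kept else 0.0 for codon in freqs}


# Hypothetical alanine frequencies (not the values of Figure 2).
ala = {"GCT": 0.4, "GCC": 0.3, "GCA": 0.2, "GCG": 0.1}
print({c: round(f, 3) for c, f in cpg_corrected(ala).items()})
# {'GCT': 0.444, 'GCC': 0.333, 'GCA': 0.222, 'GCG': 0.0}
```

An equal split of the removed frequency among the remaining codons would be another reading of the text. It would change only the line that computes the returned values.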
Figure 3 presents an example of a database comprising the amino acids, the codons encoding them and a frequency for each codon. The frequency for each codon corresponds to the column "PRO&EU corrected" of Figure 2. In this application, these frequencies can also be called "coefficients".
Figure 4 presents the relation between one restriction site and a group of amino acid sequences (2 or 3 amino acids). The polylinker sequence of pUC19 is translated into an amino acid sequence in the three different reading frames. The possible successions of amino acids, called regular expressions, are deduced from these amino acid sequences. They indicate a potential place for introducing a restriction enzyme site.
Figure 5 presents the regular expressions to search for in the amino acid sequence in order to identify places appropriate for introducing a restriction site, for the group of pUC19 polylinker restriction sites. Brackets mean that any amino acid between the brackets can be chosen. For example, [RG]-I-[RPLHQ] designates the following sequences: RIR, GIR, RIP, GIP, etc.
Figure 6 shows the manual selection of codons to write a polynucleotide sequence encoding the I-CreI protein. This polynucleotide meets the codon usage of a host organism, which can be prokaryotic or eukaryotic (Figure 3), and contains neither CpG dinucleotides nor any restriction site of the pUC19 polylinker. The first two lines present the amino acid sequence of I-CreI. After each amino acid are listed the possible codons encoding it. The lines "preferential codon", "CpG minus" and "restriction minus" show the three steps of the polynucleotide rewriting. A BamHI restriction site is at position 60 of the "preferential codon" sequence. This site has been removed in the "CpG minus" sequence.
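The regular expressions of Figure 5 can in principle be derived from a restriction site and the genetic code. The following Python sketch makes several assumptions. It uses the standard genetic code, expands the unknown flanking bases of each reading frame to all four nucleotides, and discards stop codons. As a result, its patterns may not coincide exactly with those listed in Figure 5. The helper names `site_to_pattern` and `candidate_positions` are hypothetical.

```python
import itertools
import re

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def site_to_pattern(site: str, offset: int):
    """Amino-acid regex for `site` starting at codon position `offset` (0, 1 or 2).

    Unknown flanking bases are expanded to all four nucleotides and stop codons
    are excluded. Returns None if some codon position could only be a stop.
    """
    padded = "N" * offset + site + "N" * (-(offset + len(site)) % 3)
    parts = []
    for i in range(0, len(padded), 3):
        options = [BASES if b == "N" else b for b in padded[i:i + 3]]
        aas = sorted({CODE["".join(p)] for p in itertools.product(*options)} - {"*"})
        if not aas:
            return None
        parts.append(aas[0] if len(aas) == 1 else "[" + "".join(aas) + "]")
    return "".join(parts)


def candidate_positions(protein: str, site: str):
    """(residue index, offset) pairs where `site` could be encoded by synonymous codons."""
    hits = []
    for offset in range(3):
        pattern = site_to_pattern(site, offset)
        if pattern:
            # A lookahead lets overlapping matches be reported as well.
            hits += [(m.start(), offset) for m in re.finditer(f"(?={pattern})", protein)]
    return sorted(hits)


print(site_to_pattern("GGATCC", 1))               # [GRW]I[HLPQR]
print(candidate_positions("MKRIPAGS", "GGATCC"))  # [(2, 1), (6, 0)]
```

A hit only indicates that suitable codons exist at that place. The actual codons must still be checked against the codon usage and CpG constraints before the site is written in.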
Figure 7 depicts the restriction enzyme sites of the rewritten I-CreI polynucleotide sequence.
Figure 8 presents three tables:
- a reference table with the codon frequencies to meet;
- a table with the amino acid content of the I-CreI protein;
- a table with the theoretical codon content of the rewritten polynucleotide needed to meet the codon usage (Figure 3) while being free of CpG dinucleotides.

Figure 9 presents three tables for the rewritten polynucleotide encoding the I-CreI protein:
- its theoretical codon content;
- its real codon content;
- the difference between the theoretical and the real codon content.
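The comparison described for Figures 8 and 9, theoretical versus real codon content, amounts to simple arithmetic. The theoretical count of a codon is the number of occurrences of its amino acid multiplied by the codon's target frequency. The Python sketch below assumes that `usage` maps every amino acid present in the sequence to its codon frequencies. The function name and the toy table in the usage example are hypothetical.

```python
from collections import Counter


def codon_content_report(cds: str, usage: dict[str, dict[str, float]]):
    """Compare theoretical and real codon counts of a coding sequence.

    `usage` maps each amino acid to {codon: target frequency}. The theoretical
    count of a codon is (occurrences of its amino acid) x (target frequency).
    Returns {codon: (theoretical, real, real - theoretical)}.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    real = Counter(codons)
    aa_of = {c: aa for aa, group in usage.items() for c in group}
    aa_counts = Counter(aa_of[c] for c in codons)
    report = {}
    for aa, group in usage.items():
        for codon, freq in group.items():
            theo = aa_counts[aa] * freq
            report[codon] = (theo, real[codon], real[codon] - theo)
    return report


usage = {"K": {"AAA": 0.5, "AAG": 0.5}, "F": {"TTT": 0.75, "TTC": 0.25}}  # hypothetical
report = codon_content_report("AAAAAAAAGTTTTTC", usage)
print(report["AAA"])  # (1.5, 2, 0.5)
```

Summing the absolute differences over all codons gives one possible single score of how closely a sequence meets the table. The text does not state which score, if any, the figures use.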
Figure 10 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the HO protein. This polynucleotide meets the codon usage of the host organism (prokaryote and eukaryote) (Figure 3). It contains no CpG dinucleotide and none of the pUC19 polylinker restriction sites except the two introduced KpnI and PstI sites. The first two columns present the amino acid sequence of HO. To the left of each amino acid are the possible codons encoding it. The "rewritten sequence" indicates the selected codon. KpnI and PstI restriction sites have been introduced at positions 667 and 1233 of the "rewritten" sequence.
Figure 11 depicts the restriction enzyme sites of the rewritten HO polynucleotide sequence. Among the restriction sites, the KpnI and PstI sites are indicated.
Figure 12 presents three tables:
- a reference table with the codon frequencies to meet;
- a table with the amino acid content of the HO protein;
- a table with the theoretical codon content of the rewritten polynucleotide needed to meet the codon usage (Figure 3) while being free of CpG dinucleotides.
Figure 13 presents three tables for the rewritten polynucleotide encoding the HO protein:
- its theoretical codon content;
- its real codon content;
- the difference between the theoretical and the real codon content.

Figure 14 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the F-TevI protein. The rewritten polynucleotide meets the codon usage of the host organism (prokaryote and eukaryote) (Figure 3). It contains no CpG dinucleotide and no restriction site of the pUC19 polylinker. The first two lines present the amino acid sequence of F-TevI. After each amino acid are listed the possible codons encoding it. The lines "preferential codon", "CpG minus" and "restriction minus" show the three steps of the polynucleotide rewriting.
Figure 15 shows the manual selection of codons to rewrite a polynucleotide sequence encoding the I-DmoI protein. The rewritten polynucleotide meets the codon usage of the host organism (prokaryote and eukaryote) (Figure 3). It contains no CpG dinucleotide and no restriction site of the pUC19 polylinker. The first two lines present the amino acid sequence of I-DmoI. After each amino acid are listed the possible codons encoding it. The lines "more frequent codon", "preferential codon", "CpG minus" and "restriction minus" show the four steps of the polynucleotide rewriting.

Figures 16-20 each present three tables for the rewritten polynucleotide encoding a given polypeptide: its theoretical codon content, its real codon content, and the difference between the two. The encoded polypeptides are:
- F-TevI in Figure 16;
- I-DmoI in Figure 17;
- I-SceI in Figure 18;
- I-TevIII in Figure 19;
- PI-SceI in Figure 20.
Figure 21 presents three tables for the rewritten polynucleotide encoding the PI-MtuI protein:
- its theoretical codon content;
- its real codon content;
- the difference between the theoretical and the real codon content.

Figure 22 presents the tree and path search used by the algorithm of the computerized method for rewriting a polynucleotide containing a coding sequence for a polypeptide.
Figure 23 presents the flow chart of the branching algorithm of the computerized method for rewriting a polynucleotide containing a coding sequence for a polypeptide.
Figure 24 presents an optimized codon usage table for higher eukaryotes, corrected to be CpG minus.
For each codon encoding an amino acid, the frequency is calculated, and the codon frequencies for one amino acid sum to 1. The column "CpG corrected" gives the codon frequencies modified so that codons comprising a CpG dinucleotide are not used and therefore have a frequency of zero. The frequency of each codon comprising a CpG dinucleotide is distributed among the other codons encoding the same amino acid.
Brief description of the sequence listing
Detailed description of the invention
The present invention concerns the (re)writing, synthesis and/or expression of polynucleotides containing a sequence coding for an expression product (e.g., a polypeptide or RNA), so that said polynucleotide has a predetermined content of CpG dinucleotides and/or an improved codon usage and/or selected restriction sites.
In order to provide an even clearer and more consistent understanding of the specification and the claims, including the scope given herein to such terms, the following definitions are provided:
A) Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. All patents and scientific literature cited in this application evidence the level of knowledge in this field and are hereby incorporated by reference. For purposes of clarification, the following terms are defined below.
"A polynucleotide having a content of X % of CpG dinucleotides" refers to a polynucleotide which presents X CpG dinucleotides per 100 nucleotides.
The term "CpG free polynucleotide" refers to a polynucleotide comprising no CpG dinucleotide. A polynucleotide is said to "derive" from a native gene or a fragment thereof when such polynucleotide comprises at least one portion, substantially similar in its sequence, to the native gene or to a fragment thereof. Preferably, the polynucleotide is also similar in its function to the native gene from which it derives. The terms "expression" or "expressing", as is generally understood and used herein refer to the process by which a gene produces a polypeptide. It involves transcription of the gene into mRNA, and the translation of such mRNA into polypeptide(s).
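The percentage definition of CpG content and the notion of a CpG free polynucleotide can be checked with a few lines of Python. The sketch below is illustrative and its function names are hypothetical. "CG" cannot overlap itself, so `str.count` gives the exact number of CpG dinucleotides. Since CG is its own reverse complement, the count is the same on both strands.

```python
def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    # "CG" cannot overlap itself, so non-overlapping str.count is exact.
    return seq.upper().count("CG")


def cpg_content(seq: str) -> float:
    """CpG content in %, i.e. CpG dinucleotides per 100 nucleotides."""
    return 100.0 * cpg_count(seq) / len(seq) if seq else 0.0


print(cpg_content("ATCGATCGAT"))  # 20.0
```

On this scale, a single CpG dinucleotide in a 1 kb polynucleotide corresponds to a content of 0.1%, and a CpG free polynucleotide is one for which `cpg_count` returns 0.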
A "host" refers to a cell, tissue, organ or organism capable of providing cellular components for allowing the expression of an exogenous nucleic acid (typically a nucleic acid embedded into a vector or a viral genome). This term is intended to also include hosts which have been modified in order to accomplish these functions. Bacteria, fungi, animal (cells, tissues or organisms) and plant (cells, tissues, or organisms) are examples of a host. "Non-human hosts" comprise vertebrates such as rodents, non-human primates, sheep, dog, cow, amphibians, reptiles, etc. "Isolated" means altered "by the hand of man" from its natural state, i.e., if it occurs in nature, it has been changed, purified or removed from its original environment, or both. For example, a polynucleotide naturally present in a living organism is not "isolated". The same polynucleotide separated from the coexisting materials of its natural state, obtained by cloning, amplification and/or chemical synthesis is "isolated" as the term is employed herein. Moreover, a polynucleotide that is introduced into an organism by transformation, genetic manipulation or by any other recombinant method is "isolated" even if it is still present in said organism.
As used herein, the terms "modified", "modifying" or "modification", as applied to polynucleotides or genes, refer to polynucleotides that differ in their nucleotide sequence from another reference polynucleotide or gene. Changes in the nucleotide sequence of the modified polynucleotide may or may not alter the amino acid sequence of a polypeptide encoded by the reference polynucleotide or gene. Nucleotide changes may result in amino acid substitutions, additions, deletions, fusion proteins and truncations in the polypeptide encoded by the reference sequence. According to preferred embodiments of the invention, the modifications are conservative, so that they do not alter the amino acid sequence of the encoded polypeptide. Modified polynucleotides may be made by mutagenesis techniques, by direct synthesis, and by other recombinant methods known to skilled artisans. The polynucleotides of the invention can also contain chemical modifications or additional chemical moieties not present in the native gene. These modifications may improve the polynucleotides' solubility, absorption, biological half-life, and the like. The moieties may alternatively decrease the toxicity of the polynucleotides, eliminate or attenuate any undesirable side effects, and the like. A person skilled in the art knows how to obtain polynucleotides derived from a native gene. As used herein, "native", as applied to an object, means that the object can be found in nature. For example, a gene that is present in an organism and that can be isolated from its natural non-isolated state is said to be a "native gene". A native polypeptide refers to a polypeptide whose amino acid sequence has 100% identity with that of the polypeptide in its natural, non-isolated state.
"Polynucleotide" refers to any DNA or RNA sequence or molecule having one nucleotide or more, including nucleotide sequences encoding a complete gene. The term is intended to encompass all nucleic acids whether occurring naturally or non-naturally in a particular cell, tissue or organism. This includes DNA and fragments thereof, RNA and fragments thereof, cDNAs and fragments thereof, expressed sequence tags, artificial sequences including randomized artificial sequences.
"Vector" refers to a self-replicating or integrating RNA or DNA molecule which can be used to transfer an RNA or DNA segment from one organism to another. Vectors are particularly useful for manipulating genetic constructs and different vectors may have properties particularly appropriate to express protein(s) in a recipient during cloning procedures and may comprise different selectable markers. Bacterial plasmids are commonly used vectors.
"Expression Vector" refers to a vector or vehicle similar to a cloning vector but which is capable of expressing a gene (or a fragment thereof) which has been cloned therein. Typically, expression of the gene occurs when the vector has been introduced into the host. The cloned gene is usually placed under the control of certain control sequences or regulatory elements such as promoter sequences. Expression control sequences vary depending on whether the vector is designed to express the operably linked gene in a prokaryotic or eukaryotic host and may additionally contain transcriptional elements such as enhancer elements, termination sequences, tissue-specificity elements and/or translational and termination sites. "Codon usage table" refers to a database giving the codons, the amino acid encoded by each codon, and the frequency at which these codons are found for a defined type of amino acid. "Two consecutive codons" refers to two codons immediately consecutive in a coding sequence.
"Meeting the codon usage" means that the codon frequencies in the (re)written polynucleotide are optimized to be as close as possible to the codon frequencies in the codon usage table of a considered host or group of hosts. The optimization is not the same for all groups of codons encoding an amino acid, because it depends on the number of codons encoding that amino acid. For example, as shown in Figure 2, serine can be encoded by 6 different codons, whereas tyrosine can be encoded by only two. Therefore, the more codons encode an amino acid, the better the optimization can be.
B) General overview of the invention
Method of (re)writing a polynucleotide containing a sequence coding for a polypeptide.
The invention concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, comprising the steps of:
a) providing at least one database containing the amino acids of said polypeptide and the corresponding codons;
b) reading at least one amino acid from said polypeptide sequence;
c) selecting from said database one codon which encodes said amino acid;
d) repeating steps b) and c) for all amino acids of said polypeptide, the polynucleotide being written with the selected codons;
whereby the written polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
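Steps a) to d) can be illustrated with a greedy Python sketch that targets a CpG-free sequence. This is not the tree and path search of Figures 22 and 23. It is a simpler, assumed strategy:
- codons containing CG are dropped from the database;
- codons that would create a CG across a codon junction are rejected;
- among the remaining synonymous codons, the one lagging furthest behind its target frequency is chosen.

By default, the table weights synonymous codons equally instead of using the host tables of Figures 2 and 3. All function names are hypothetical.

```python
import itertools
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def cpg_free_table(freqs=None):
    """Step a): CpG-free sense codons per amino acid, with target frequencies.

    `freqs` is an optional {codon: frequency} map (e.g. a host codon usage table).
    Frequencies are renormalized per amino acid after CG-containing codons are
    dropped. Without `freqs`, synonymous CpG-free codons are weighted equally.
    """
    table = defaultdict(dict)
    for codon, aa in GENETIC_CODE.items():
        if aa != "*" and "CG" not in codon:
            table[aa][codon] = 1.0 if freqs is None else freqs.get(codon, 0.0)
    for codons in table.values():
        total = sum(codons.values())
        for codon in codons:
            codons[codon] = codons[codon] / total if total else 1.0 / len(codons)
    return dict(table)


def write_cpg_free(protein, table):
    """Steps b) to d): choose one codon per residue so that the result has no CpG."""
    codon_counts, aa_counts = Counter(), Counter()
    out = []
    for i, aa in enumerate(protein):
        prev = out[-1] if out else ""
        nxt = protein[i + 1] if i + 1 < len(protein) else None
        # If every codon of the next residue starts with G, this one must not end with C.
        next_all_g = nxt is not None and all(c.startswith("G") for c in table[nxt])
        candidates = [
            c for c in table[aa]
            if not (prev.endswith("C") and c.startswith("G"))
            and not (next_all_g and c.endswith("C"))
        ]
        aa_counts[aa] += 1
        # Pick the codon whose usage lags furthest behind its target frequency.
        best = max(candidates, key=lambda c: table[aa][c] * aa_counts[aa] - codon_counts[c])
        codon_counts[best] += 1
        out.append(best)
    return "".join(out)


print(write_cpg_free("MSAGK", cpg_free_table()))  # ATGTCTGCTGGTAAA
```

With the standard genetic code, alanine, aspartate, glutamate, glycine and valine can only be encoded by codons starting with G. The look-ahead on the next residue therefore forbids a C-ending codon in front of them. This is always possible, because every amino acid has a CpG-free codon that does not end with C. Every other amino acid also has one that neither starts with G nor ends with C, so the candidate list is never empty. The input must use one-letter amino-acid codes without stop symbols.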
The invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, comprising the steps of:
a) providing at least one database containing groups of codons encoding the same amino acid;
b) reading at least one codon from said first polynucleotide;
c) selecting from said database one codon which belongs to the same group as the read codon and which can be identical to or different from said read codon;
d) repeating steps b) and c) for all codons of said coding sequence of said polynucleotide, the polynucleotide being written with the selected codons;
whereby the rewritten polynucleotide has a content of CpG dinucleotides adjusted to a predetermined value.
In a preferred method of (re)writing a polynucleotide containing a coding sequence for a polypeptide according to the invention, the method is computerized. Optionally, all the steps of the method are computerized. Optionally, only some steps of the method are computerized.
In a preferred embodiment of the (re)writing method, the (re)written polynucleotide sequence is longer than 500 bp, preferably longer than 1 kb, and more preferably longer than 2 kb.
The (re)written polynucleotide is not limited to the sequence encoding the polypeptide. It can comprise additional sequences upstream and downstream of the coding sequence. For example, such additional sequences can be insulators (Kaffer et al., Genes Dev. 2000, 14, 1908-19; EP 859,059; WO96/04390, the disclosures of which are incorporated herein by reference). In another example, the (re)written polynucleotide can be flanked by restriction enzyme sites. Furthermore, the invention also encompasses a (re)written polynucleotide comprising additional non-coding sequence(s) introduced into the coding sequence. These additional sequences can have a predetermined CpG dinucleotide content.
In one embodiment of the invention, the CpG dinucleotide content of the (re)written polynucleotide is minimized; preferably, such content is less than 1%, more preferably less than 0.5%, and most preferably less than 0.1%. More preferably, the polynucleotide has less than 0.1%, preferably less than 0.05%, more preferably less than 0.01% of CpG dinucleotides. Still more preferably, the polynucleotide according to the present invention contains one or zero CpG dinucleotides. In a more preferred embodiment of the invention, the polynucleotide is CpG-free. In one embodiment wherein the CpG content is minimized, the method of (re)writing the polynucleotide selects the codons so that codons, or pairs of consecutive codons, without a CpG dinucleotide are maintained. In another embodiment wherein the CpG content is minimized, the method selects the codons so that codons, or pairs of consecutive codons, without a CpG dinucleotide are partially or totally exchanged for other codons without a CpG dinucleotide.
In one embodiment of the invention, the CpG dinucleotide content of the (re)written sequence is maximized. Preferably, "maximized" means that the CpG dinucleotide content is more than 1%, preferably more than 5%, more preferably more than 10%. More preferably, "maximized" means that the CpG dinucleotide content of the (re)written polynucleotide is higher than that of the native polynucleotide encoding the same polypeptide. In the case of the rewriting method, which starts by providing a first polynucleotide encoding the polypeptide, the CpG dinucleotide content of the rewritten polynucleotide has to be higher than that of the provided polynucleotide.
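The percentage thresholds above presuppose a way of measuring CpG content. The Python sketch below assumes that the content is the number of CpG dinucleotides divided by the number of dinucleotide positions (sequence length minus one); the text does not state the denominator, so this definition is an assumption, and the function names are hypothetical.

```python
def cpg_count(seq):
    """Count 5'-CG-3' dinucleotides. CG is its own reverse complement, so one strand suffices."""
    seq = seq.upper()
    return sum(1 for p in range(len(seq) - 1) if seq[p:p + 2] == "CG")


def cpg_percent(seq):
    """CpG dinucleotides as a percentage of all dinucleotide positions (assumed definition)."""
    positions = len(seq) - 1
    return 100.0 * cpg_count(seq) / positions if positions > 0 else 0.0


# A 1,001-nt sequence carrying a single CpG has 1,000 dinucleotide positions.
example = "A" * 500 + "CG" + "T" * 499
print(len(example), cpg_count(example), cpg_percent(example))  # 1001 1 0.1
```

Because CpG dinucleotides can span codon junctions, the content has to be measured on the assembled sequence rather than summed codon by codon.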
In one embodiment of the invention wherein the CpG content is maximized, the method of (re)writing the polynucleotide selects the codons so that codons, or pairs of consecutive codons, with a CpG dinucleotide are maintained. In another embodiment of the invention wherein the CpG content is maximized, the method selects the codons so that codons, or pairs of consecutive codons, with a CpG dinucleotide are partially or totally exchanged for other codons with a CpG dinucleotide.
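For the maximizing embodiment, a greedy left-to-right rule is sketched below in Python: each codon is replaced by the synonym that creates the most CpG dinucleotides, counting those inside the codon and one formed with the last base of the preceding codon, and the original codon is kept on ties. The greedy rule and all names are assumptions of this sketch; because it does not look ahead, it can miss a CpG that a different choice would form at the next junction.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Synonymous codon groups ("*" collects the three stop codons).
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def new_cpg(prev_codon, codon):
    """CpGs created by appending codon: inside it, or across the junction with prev_codon."""
    window = prev_codon[-1:] + codon
    return sum(1 for p in range(len(window) - 1) if window[p:p + 2] == "CG")


def rewrite_max_cpg(seq):
    """Greedy, left to right: pick the synonym adding the most CpGs; keep the original on ties."""
    seq = seq.upper()
    out = []
    for p in range(0, len(seq) - 2, 3):
        codon = seq[p:p + 3]
        prev = out[-1] if out else ""
        best = max(
            SYNONYMS[CODON_TO_AA[codon]],
            key=lambda c: (new_cpg(prev, c), c == codon),
        )
        out.append(best)
    return "".join(out)


print(rewrite_max_cpg("ATGTCTGCTAAA"))  # ATGTCGGCGAAA
```

As an illustration of the greedy limitation, `rewrite_max_cpg("TTTGGT")` returns `"TTTGGT"`, although `"TTCGGT"` encodes the same Phe-Gly dipeptide and contains one CpG.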
The invention concerns the method of (re)writing a polynucleotide in which the step a) of providing at least one database containing the amino-acids of said polypeptide and corresponding codons or groups of codons encoding the same amino-acid further comprises providing the codon usage table corresponding to one host or to a group of hosts and whereby the (re)written polynucleotide sequence meets the codon usage of said host or said group of hosts.
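To make "meeting the codon usage" concrete, the following Python sketch computes, for an in-frame coding sequence, the frequency of each codon within its synonymous group, and a distance to a target codon usage table. The sum-of-absolute-differences distance and the target values are assumptions for illustration; the text does not say how closeness to the table is measured. `CODON_TO_AA`, `SYNONYMS` and the function names are hypothetical helpers introduced for this sketch.

```python
from collections import Counter

# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Synonymous codon groups ("*" collects the three stop codons).
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split an in-frame coding sequence into codons; trailing bases are ignored."""
    seq = seq.upper()
    return [seq[p:p + 3] for p in range(0, len(seq) - 2, 3)]


def relative_usage(seq):
    """Frequency of each observed codon within its synonymous group."""
    codons = split_codons(seq)
    counts = Counter(codons)
    per_aa = Counter(CODON_TO_AA[c] for c in codons)
    return {c: n / per_aa[CODON_TO_AA[c]] for c, n in counts.items()}


def usage_distance(seq, target):
    """Sum of |observed - target| over every codon of the amino acids present in seq."""
    observed = relative_usage(seq)
    present = {CODON_TO_AA[c] for c in observed}
    return sum(
        abs(observed.get(c, 0.0) - target.get(c, 0.0))
        for aa in sorted(present)
        for c in SYNONYMS[aa]
    )


print(len(SYNONYMS["S"]), len(SYNONYMS["Y"]))  # 6 2
```

The serine/tyrosine contrast of Figure 2 shows up directly here: a six-codon group leaves more room to match target frequencies than a two-codon group.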
The invention concerns the method of (re)writing a polynucleotide in which step c) of selecting one codon from said database is performed one codon at a time. In a preferred embodiment, the selection steps are performed one codon at a time, and the selected codon is the one that brings the codon usage of the so-far written polynucleotide closest to the codon usage table.
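A minimal Python sketch of the one-codon-at-a-time selection follows. "Closest to the codon usage" is modelled, as an assumption, by the sum of absolute differences between the running within-group codon frequencies and the target frequencies. Only the group of the current amino acid needs re-scoring, because adding a codon leaves the other groups' frequencies unchanged. The table in the example and the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def group_distance(counts, aa, table):
    """L1 distance between observed and target frequencies within one synonymous group."""
    total = sum(counts[c] for c in SYNONYMS[aa])
    return sum(abs(counts[c] / total - table.get(c, 0.0)) for c in SYNONYMS[aa])


def greedy_usage_rewrite(seq, table):
    """Choose each codon so the so-far written sequence is closest to the usage table."""
    seq = seq.upper()
    counts = Counter()
    out = []
    for p in range(0, len(seq) - 2, 3):
        codon = seq[p:p + 3]
        aa = CODON_TO_AA[codon]
        if not any(c in table for c in SYNONYMS[aa]):
            best = codon  # no usage data for this amino acid: keep the codon
        else:
            def score(cand):
                trial = counts.copy()
                trial[cand] += 1
                return group_distance(trial, aa, table)
            best = min(SYNONYMS[aa], key=score)  # ties go to the first codon in table order
        counts[best] += 1
        out.append(best)
    return "".join(out)


# Hypothetical table: both tyrosine codons used equally often in the target host.
print(greedy_usage_rewrite("TATTATTATTAT", {"TAT": 0.5, "TAC": 0.5}))  # TATTACTATTAC
```

With a 50:50 target for the two tyrosine codons, the greedy rule alternates them, so the written sequence matches the table after every second tyrosine.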
Alternatively, the invention concerns the method of (re)writing a polynucleotide in which step c) of selecting is performed on a batch of k codons. In one embodiment, the selection steps are performed on a batch of k codons, and the selected batch of k codons is the one that brings the codon usage of the so-far written polynucleotide closest to the codon usage table. Preferably, k is at least 2 and no more than the number of amino acids comprised in the encoded polypeptide.
More preferably, k is between 2 and 1000, preferably between 5 and 500, more preferably between 10 and 100. Optionally, k is at least 2, 5, 10, 25, 50, 75, 100 or 200.
The invention also concerns a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide according to the invention, further comprising steps for removing undesired restriction sites.
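Returning to the selection on a batch of k codons described above, the batch variant can be sketched in Python by trying every combination of synonyms for the k codons and keeping the one whose running codon counts are closest to the table, again with an assumed sum-of-absolute-differences measure and a hypothetical table. The search visits the product of the group sizes (up to 6^k combinations), so this exhaustive form is only practical for small k; the larger values of k mentioned in the text would call for a heuristic search, which is not described here.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def usage_distance(counts, table):
    """Sum over the amino acids present of the L1 distance to the target frequencies."""
    dist = 0.0
    for aa in sorted({CODON_TO_AA[c] for c in counts if counts[c]}):
        total = sum(counts[c] for c in SYNONYMS[aa])
        dist += sum(abs(counts[c] / total - table.get(c, 0.0)) for c in SYNONYMS[aa])
    return dist


def batch_usage_rewrite(seq, table, k=2):
    """Choose k codons at a time, trying every synonymous combination in the batch."""
    seq = seq.upper()
    codons = [seq[p:p + 3] for p in range(0, len(seq) - 2, 3)]
    counts = Counter()
    out = []
    for start in range(0, len(codons), k):
        options = [SYNONYMS[CODON_TO_AA[c]] for c in codons[start:start + k]]

        def score(combo):
            trial = counts.copy()
            trial.update(combo)
            return usage_distance(trial, table)

        best = min(product(*options), key=score)  # first best combination wins ties
        counts.update(best)
        out.extend(best)
    return "".join(out)


print(batch_usage_rewrite("TATTATTATTAT", {"TAT": 0.5, "TAC": 0.5}, k=2))  # TATTACTATTAC
```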
The invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, further comprising the steps of:
- (i) providing a further database containing nucleotide sequences corresponding to undesired restriction enzyme sites;
- (ii) locating, in the polynucleotide, a sequence of preferably 4, 6 or 8 nucleotides which corresponds to an undesired restriction enzyme site from the database;
- (iii) retrieving at least one codon of the first polynucleotide which is at least partly comprised in the located sequence of preferably 4, 6 or 8 nucleotides;
- (iv) selecting one codon from the first database which belongs to the same group as the retrieved codon, such that replacing the retrieved codon with the selected codon in the polynucleotide makes the located sequence differ from said undesired restriction enzyme site;
- (v) repeating steps (ii) to (iv); whereby the rewritten polynucleotide contains no undesired restriction enzyme sites.
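Steps (i) to (v) can be sketched in Python as below. The leftmost-first order and the acceptance rule (a substitution is kept only if it lowers the total number of listed sites, which guarantees that the loop ends) are assumptions of this sketch rather than details given in the text; sites are expected in upper case, and the function names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_sites(seq, sites):
    """Total occurrences of the listed sites, overlapping matches included."""
    return sum(
        1
        for s in sites
        for p in range(len(seq) - len(s) + 1)
        if seq[p:p + len(s)] == s
    )


def first_site(seq, sites):
    """(position, site) of the leftmost listed site present in seq."""
    return min((seq.find(s), s) for s in sites if s in seq)


def remove_sites(seq, sites):
    """Swap synonymous codons until no listed site remains (raises if stuck)."""
    seq = seq.upper()
    codons = [seq[p:p + 3] for p in range(0, len(seq) - 2, 3)]
    while True:
        current = "".join(codons)
        n = count_sites(current, sites)
        if n == 0:
            return current
        pos, site = first_site(current, sites)
        # Codons at least partly inside the located site.
        for idx in range(pos // 3, (pos + len(site) - 1) // 3 + 1):
            original = codons[idx]
            for alt in SYNONYMS[CODON_TO_AA[original]]:
                codons[idx] = alt
                if count_sites("".join(codons), sites) < n:
                    break  # accepted: fewer listed sites overall
            else:
                codons[idx] = original  # no synonym helped; restore and try the next codon
                continue
            break
        else:
            raise ValueError(f"cannot remove {site} at position {pos} with synonymous codons")


# EcoRI (GAATTC) formed by the Glu-Phe codon pair GAA-TTC.
print(remove_sites("ATGGAATTCTAA", ["GAATTC"]))  # ATGGAGTTCTAA
```

Here the Glu codon is switched from GAA to GAG, which removes the EcoRI site without changing the encoded Met-Glu-Phe-stop sequence.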
By "sequence difference with a restriction enzyme site" it is meant that the sequence presents at least one nucleotide difference from the site. Preferably, it means that the sequence cannot be recognized by the restriction enzyme.
According to the (re)writing method of the present invention, the codon selection makes it possible to control the CpG dinucleotide content of the (re)written polynucleotide so that it reaches a predetermined value. Two criteria therefore have to be considered: does the selected codon itself comprise a CpG dinucleotide, and does the selected codon, considered together with the immediately adjacent codon(s), create a CpG dinucleotide across the codon junction? When a minimized CpG dinucleotide content is desired, a codon which comprises a CpG dinucleotide, alone or combined with the immediately adjacent codon(s), is not selected.
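The two criteria above can be written as two small predicates. This Python sketch is illustrative only: the function names are hypothetical, and codons are assumed to be 3-letter DNA strings, with an empty string standing for a missing neighbour.

```python
def has_internal_cpg(codon):
    """Criterion 1: the codon itself contains CG (at positions 1-2 or 2-3)."""
    return "CG" in codon.upper()


def junction_cpg(left, right):
    """Criterion 2: a C ending one codon followed by a G starting the next."""
    return left[-1:].upper() == "C" and right[:1].upper() == "G"


def acceptable_min_cpg(codon, prev_codon="", next_codon=""):
    """True if the codon passes both criteria against its known neighbours."""
    return not (
        has_internal_cpg(codon)
        or junction_cpg(prev_codon, codon)
        or junction_cpg(codon, next_codon)
    )


print(acceptable_min_cpg("TCG"))                    # False (internal CpG)
print(acceptable_min_cpg("GCT", prev_codon="TAC"))  # False (C|G junction)
print(acceptable_min_cpg("TCC", "TAT", "AAA"))      # True
```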
The invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid;
- step b) comprises reading the jth codon of said part of the first polynucleotide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- c2) selecting from said database one codon which belongs to the same group as the jth codon of said part of the first polynucleotide, and which can be identical to or different from the jth codon, the selected codon considered together with said (j-1)th codon of said part of the second polynucleotide containing no CpG dinucleotide;
- c3) placing the selected codon at the jth codon location in said part of the second polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
Optionally, said part of the first polynucleotide is determined as being the part of the first polynucleotide having the highest CpG dinucleotide concentration.
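One simple way to find the part having the highest CpG dinucleotide concentration is a sliding window of n codons, sketched below in Python. Measuring the window in whole codons, and not counting a CpG that straddles a window edge, are assumptions of this sketch, and the function name is hypothetical. A running count would make it linear in the sequence length, at the cost of slightly more code.

```python
def densest_cpg_window(seq, n_codons):
    """Return (start codon index, CpG count) of the n-codon window richest in CpG."""
    seq = seq.upper()
    total_codons = len(seq) // 3
    if not 1 <= n_codons <= total_codons:
        raise ValueError("window must contain between 1 and len(seq) // 3 codons")
    width = 3 * n_codons
    best_start, best_count = 0, -1
    for start in range(total_codons - n_codons + 1):
        window = seq[3 * start:3 * start + width]
        count = sum(1 for p in range(width - 1) if window[p:p + 2] == "CG")
        if count > best_count:  # strict '>' keeps the first window on ties
            best_start, best_count = start, count
    return best_start, best_count


print(densest_cpg_window("AAAAAAAAACGCGCGAAA", 2))  # (3, 3)
```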
As the restriction sites generally have 4, 6 or 8 nucleotides, at least 2 or 3 immediately consecutive codons have to be considered in order to avoid the introduction of undesired restriction sites during the (re)writing of the polynucleotide.
Therefore, the invention concerns an embodiment of the above-mentioned rewriting method wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises reading the jth codon of said part of the first polynucleotide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- c2) selecting from said database one codon which belongs to the same group as the jth codon of said part of the first polynucleotide, and which can be identical to or different from the jth codon, the selected codon considered together with said (j-1)th codon of said part of the second polynucleotide containing no CpG dinucleotide and no undesired enzyme restriction site listed in the second database;
- c3) placing the selected codon at the jth codon location in said part of the second polynucleotide;
- step d) comprises repeating step b) and sub-steps c1) to c3) by increasing j each time by one from j=2 to j=n.
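The stepwise rewriting above (reading the (j-1)th codon, then choosing a synonym of the jth codon with no CpG and no listed site at the junction) can be sketched in Python as follows. Two departures from the literal steps are labelled in the code. First, instead of a fixed one- or two-codon context, the sketch looks back as many bases as the longest listed site minus one, which catches every site that ends inside the new codon. Second, it adds a one-codon lookahead, because a codon ending in C followed by glycine, alanine, valine, aspartate or glutamate (whose codons all start with G) would otherwise leave no CpG-free choice. The first codon is checked on its own. Function names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def acceptable(written, cand, sites):
    """No CpG in cand or across the junction, and no listed site ending inside cand."""
    if "CG" in written[-1:] + cand:
        return False
    longest = max((len(s) for s in sites), default=1)
    # Looking back longest-1 bases covers every site that ends inside cand.
    window = (written[-(longest - 1):] if longest > 1 else "") + cand
    return not any(s in window for s in sites)


def rewrite_avoiding(seq, sites=()):
    """Codon-by-codon rewrite with no CpG and no listed site; original codon preferred."""
    seq = seq.upper()
    codons = [seq[p:p + 3] for p in range(0, len(seq) - 2, 3)]
    out = ""
    for j, codon in enumerate(codons):
        # Original codon first, then its synonyms in table order.
        cands = sorted(SYNONYMS[CODON_TO_AA[codon]], key=lambda c: c != codon)
        nxt = SYNONYMS[CODON_TO_AA[codons[j + 1]]] if j + 1 < len(codons) else []

        def viable(c):
            if not acceptable(out, c, sites):
                return False
            # One-codon lookahead (an addition to the described steps): leave the
            # next amino acid at least one acceptable codon.
            return not nxt or any(acceptable(out + c, n, sites) for n in nxt)

        choice = next((c for c in cands if viable(c)), None)
        if choice is None:
            raise ValueError(f"no acceptable codon at codon {j + 1}")
        out += choice
    return out


print(rewrite_avoiding("TTCGGA"))  # TTTGGA
print(rewrite_avoiding("ATGGAATTCTAA", ["GAATTC"]))  # ATGGAATTTTAA
```

In the second example, TTC is rejected because it would complete GAATTC with the preceding GAA, so the synonymous TTT is used instead.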
The invention concerns a particular embodiment of the above-mentioned rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon considered together with said (j-2)th and (j-1)th codons of said part of the second polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
Preferably, at each step of the polynucleotide (re)writing, the codon is selected so that the already (re)written polynucleotide is as close as possible to the codon usage table of the host or group of hosts. Therefore, the invention concerns an embodiment of the above-mentioned rewriting method wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts;
- step b) comprises reading the jth codon of said part of the first polynucleotide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- c2) selecting from said database one codon which belongs to the same group as the jth codon of said part of the first polynucleotide, and which can be identical to or different from the jth codon, such that:
- the selected codon considered together with said (j-1)th codon of said part of the second polynucleotide contains no CpG dinucleotide, and
- the selected codon considered with the first to (j-1)th codons of said part of the second polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c3) placing the selected codon at the jth codon location in said part of the second polynucleotide;
- step d) comprises repeating step b) and sub-steps c1) to c3) by increasing j each time by one from j=2 to j=n.
The invention also concerns an embodiment of the above-mentioned rewriting method wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises reading the jth codon of said part of the first polynucleotide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- c2) selecting from said database one codon which belongs to the same group as the jth codon of said part of the first polynucleotide, and which can be identical to or different from the jth codon, such that:
- the selected codon considered together with said (j-1)th codon of said part of the second polynucleotide contains no CpG dinucleotide and no undesired enzyme restriction site listed in the second database, and
- the selected codon considered with the first to (j-1)th codons of said part of the second polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c3) placing the selected codon at the jth codon location in said part of the second polynucleotide;
- step d) comprises repeating step b) and sub-steps c1) to c3) by increasing j each time by one from j=2 to j=n.
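Combining the codon usage criterion with the CpG and restriction-site constraints, as in the embodiment above, gives the Python sketch below: among the synonyms that pass the constraints, it keeps the one that brings the running codon counts closest to the table. The per-group sum of absolute differences, the look-back window, the one-codon lookahead, the original-codon tie-break and the example table are assumptions of this sketch, not details stated in the text; function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def acceptable(written, cand, sites):
    """No CpG in cand or across the junction, and no listed site ending inside cand."""
    if "CG" in written[-1:] + cand:
        return False
    longest = max((len(s) for s in sites), default=1)
    window = (written[-(longest - 1):] if longest > 1 else "") + cand
    return not any(s in window for s in sites)


def group_distance(counts, aa, table):
    """L1 distance between observed and target frequencies within one synonymous group."""
    total = sum(counts[c] for c in SYNONYMS[aa])
    return sum(abs(counts[c] / total - table.get(c, 0.0)) for c in SYNONYMS[aa])


def rewrite_usage_cpg_sites(seq, table, sites=()):
    """Among acceptable synonyms, pick the one keeping codon usage closest to the table."""
    seq = seq.upper()
    codons = [seq[p:p + 3] for p in range(0, len(seq) - 2, 3)]
    counts, out = Counter(), ""
    for j, codon in enumerate(codons):
        aa = CODON_TO_AA[codon]
        nxt = SYNONYMS[CODON_TO_AA[codons[j + 1]]] if j + 1 < len(codons) else []
        allowed = [
            c for c in SYNONYMS[aa]
            if acceptable(out, c, sites)
            and (not nxt or any(acceptable(out + c, n, sites) for n in nxt))
        ]
        if not allowed:
            raise ValueError(f"no acceptable codon for {aa} at codon {j + 1}")

        def score(c):
            trial = counts.copy()
            trial[c] += 1
            # Ties (e.g. no usage data for this amino acid) keep the original codon.
            return (group_distance(trial, aa, table), c != codon)

        best = min(allowed, key=score)
        counts[best] += 1
        out += best
    return out


print(rewrite_usage_cpg_sites("TTCGGA", {"TTT": 0.5, "TTC": 0.5}))  # TTTGGA
```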
Optionally, the information contained in the databases can be merged into a single database.
The invention concerns a particular embodiment of the above-mentioned rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon considered together with said (j-2)th and (j-1)th codons of said part of the second polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
The invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as described above, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide and the corresponding codons;
- step b) comprises reading the jth amino-acid of said part of the polypeptide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- c2) selecting from said database one codon which codes for said jth amino-acid, the selected codon considered together with said (j-1)th codon of said part of the written polynucleotide containing no CpG dinucleotide;
- c3) placing the selected codon at the jth codon location in said part of the written polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
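The writing method starts from the amino-acid sequence rather than from a first polynucleotide. A minimal Python sketch follows, assuming one-letter amino-acid codes with `*` for stop and simply taking the first acceptable codon in standard-table order (a real run would more likely rank codons by host codon usage). As in the steps above, each codon is checked together with the preceding one; a one-codon lookahead is added as an assumption so that a codon ending in C is not chosen before an amino acid whose codons all begin with G. Function names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def no_cpg(written, codon):
    """True if codon has no internal CpG and forms none with the last written base."""
    return "CG" not in written[-1:] + codon


def back_translate(protein):
    """Write a CpG-free coding sequence for protein, first acceptable codon in table order."""
    protein = protein.upper()
    out = ""
    for j, aa in enumerate(protein):
        nxt = SYNONYMS[protein[j + 1]] if j + 1 < len(protein) else []
        choice = next(
            (c for c in SYNONYMS[aa]
             if no_cpg(out, c) and (not nxt or any(no_cpg(out + c, n) for n in nxt))),
            None,
        )
        if choice is None:
            raise ValueError(f"no CpG-free codon for {aa} at position {j + 1}")
        out += choice
    return out


print(back_translate("MFG"))  # ATGTTTGGT
print(back_translate("SR"))   # TCTAGA
```

In the second example the four CGN arginine codons are skipped, so AGA is written after the serine codon.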
The invention also concerns an embodiment of the above-mentioned writing method wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide and corresponding codons and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises reading the jth amino-acid of said part of the polypeptide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- c2) selecting from said database one codon which codes for said jth amino-acid, the selected codon considered together with said (j-1)th codon of said part of the written polynucleotide containing no CpG dinucleotide and no undesired enzyme restriction site listed in the second database;
- c3) placing the selected codon at the jth codon location in said part of the written polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
The invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon considered together with said (j-2)th and (j-1)th codons of said part of the written polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
The invention concerns an embodiment of the above-mentioned writing method wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons, and the codon usage table corresponding to a host or a group of hosts;
- step b) comprises reading the jth amino-acid of said part of the polypeptide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- c2) selecting from said database one codon which codes for said jth amino-acid, such that:
- the selected codon considered together with said (j-1)th codon of said part of the written polynucleotide contains no CpG dinucleotide, and
- the selected codon considered with the first to (j-1)th codons of said part of the written polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c3) placing the selected codon at the jth codon location in said part of the written polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
The invention concerns an embodiment of the above-mentioned writing method wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons and the codon usage table corresponding to a host or a group of hosts, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises reading the jth amino-acid of said part of the polypeptide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- c2) selecting from said database one codon which codes for said jth amino-acid, such that:
- the selected codon considered together with said (j-1)th codon of said part of the written polynucleotide contains no CpG dinucleotide and no undesired enzyme restriction site listed in the second database, and
- the selected codon considered with the first to (j-1)th codons of said part of the written polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c3) placing the selected codon at the jth codon location in said part of the written polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
Optionally, the information contained in the databases can be merged into a single database.
The invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon considered together with said (j-2)th and (j-1)th codons of said part of the written polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
The invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- b2) reading the ith to (i+k)th codon(s) of the first polynucleotide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which belongs to the same group as the ith codon of the first polynucleotide, and which can be identical to or different from the ith codon, the selected codon considered together with the (i-1)th codon of said part of the second polynucleotide containing no CpG dinucleotide;
- c2) placing the selected codon at the ith codon location in said part of the second polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention also concerns one embodiment of this rewriting method, wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- b2) reading the ith to (i+k)th codon(s) of the first polynucleotide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which belongs to the same group as the ith codon of the first polynucleotide, and which can be identical to or different from the ith codon, the selected codon considered together with the (i-1)th codon of said part of the second polynucleotide containing no CpG dinucleotide and no undesired enzyme restriction site listed in the second database;
- c2) placing the selected codon at the ith codon location in said part of the second polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention concerns a particular embodiment of the rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon considered together with said (i-2)th and (i-1)th codons of said part of the second polynucleotide does not contain any undesired enzyme restriction site listed in the second database, and k is at least 3.
The invention further concerns the above-mentioned rewriting method wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- b2) reading the ith to (i+k)th codon(s) of the first polynucleotide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which belongs to the same group as the ith codon of the first polynucleotide, and which can be identical to or different from the ith codon, such that:
- the selected codon considered together with the (i-1)th codon of said part of the second polynucleotide contains no CpG dinucleotide, and
- the selected codon considered with the first to (i-1)th codons of said part of the second polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c2) placing the selected codon at the ith codon location in said part of the second polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention further concerns the above-mentioned rewriting method wherein:
- step a) comprises providing at least one database containing groups of codons encoding the same amino-acid and the codon usage table corresponding to a host or a group of hosts, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- b2) reading the ith to (i+k)th codon(s) of the first polynucleotide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which belongs to the same group as the ith codon of the first polynucleotide, and which can be identical to or different from the ith codon, such that:
- the selected codon considered together with the (i-1)th codon of said part of the second polynucleotide contains no CpG dinucleotide and no undesired enzyme restriction site listed in the second database, and
- the selected codon considered with the first to (i-1)th codons of said part of the second polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c2) placing the selected codon at the ith codon location in said part of the second polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention concerns a particular embodiment of the rewriting method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon considered together with said (i-2)th and (i-1)th codons of said part of the second polynucleotide does not contain any undesired enzyme restriction site listed in the second database, and k is at least 3.
The invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as described above, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide and the corresponding codons;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th amino-acid of said part of the polypeptide;
- b2) reading the ith to (i+k)th amino-acid(s) of said part of the polypeptide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which codes for said ith amino-acid, the selected codon considered together with the (i-1)th codon of said part of the written polynucleotide containing no CpG dinucleotide;
- c2) placing the selected codon at the ith codon location in said part of the written polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention also concerns one embodiment of this writing method, wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide and the corresponding codons, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th amino-acid of said part of the polypeptide;
- b2) reading the ith to (i+k)th amino-acid(s) of said part of the polypeptide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which codes for said ith amino-acid, the selected codon considered together with the (i-1)th codon of said part of the written polynucleotide containing no CpG dinucleotide and no undesired enzyme restriction site listed in the second database;
- c2) placing the selected codon at the ith codon location in said part of the written polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon considered together with said (i-2)th and (i-1)th codons of said part of the written polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), i is increased each time by one from i=3 to i=n.
The invention also concerns one embodiment of this writing method, wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons, and the codon usage table corresponding to a host or a group of hosts;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th amino-acid of said part of the polypeptide;
- b2) reading the ith to (i+k)th amino-acid(s) of said part of the polypeptide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which codes for said ith amino-acid, such that:
- the selected codon considered together with the (i-1)th codon of said part of the written polynucleotide contains no CpG dinucleotide, and
- the selected codon considered with the first to (i-1)th codons of said part of the written polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c2) placing the selected codon at the ith codon location in said part of the written polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention also concerns one embodiment of this writing method, wherein:
- step a) comprises providing at least one database containing the amino-acids of said polypeptide, the corresponding codons and the codon usage table corresponding to a host or a group of hosts, and a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th amino-acid of said part of the polypeptide;
- b2) reading the ith to (i+k)th amino-acid(s) of said part of the polypeptide, k being greater than or equal to 1 and lower than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which codes for said ith amino-acid, such that:
- the selected codon considered together with the (i-1)th codon of said part of the written polynucleotide contains no CpG dinucleotide and no undesired enzyme restriction site listed in the second database, and
- the selected codon considered with the first to (i-1)th codons of said part of the written polynucleotide is, among the codons of the same group, the closest to the codon usage;
- c2) placing the selected codon at the ith codon location in said part of the written polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
The invention concerns a particular embodiment of the above-mentioned writing method wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon considered together with said (i-2)th and (i-1)th codons of said part of the written polynucleotide does not contain any enzyme restriction site listed in the second database;
- in step d), i is increased each time by one from i=3 to i=n.
The invention concerns a method, preferably a computerized process, for rewriting a first polynucleotide into a second polynucleotide, each containing a coding sequence for the same polypeptide, as described above, further comprising the steps of:
- (i) providing a further database containing nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding 2 or 3 amino-acids sequences encoded by said nucleotide sequences in the three reading frames;
- (ii) providing the 3 sequences of amino-acids which correspond to the 3 different reading frames of the first polynucleotide sequence, and reading them;
- (iii) locating a sequence of 2 or 3 amino-acids which corresponds to a restriction enzyme site, by retrieving said sequence from said database;
- (iv) selecting at least one codon from said first database to match the nucleotide sequence of the restriction enzyme site;
- (v) repeating steps (ii) to (iv); whereby the rewritten polynucleotide sequence contains at least one further desired restriction enzyme site, when compared to the first polynucleotide sequence. In a preferred embodiment, the rewritten polynucleotide contains only one further desired restriction enzyme site, when compared to the first polynucleotide sequence.
The invention also concerns a method, preferably a computerized process, of writing a polynucleotide containing a coding sequence for a polypeptide, as described above, further comprising the steps of:
- (i) providing a further database containing nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding 2 or 3 amino-acid sequences encoded by said nucleotide sequences in the three reading frames;
- (ii) locating a sequence of 2 or 3 amino-acids which corresponds to a restriction enzyme site in said polypeptide, by retrieving said sequence from said database;
- (iii) selecting at least one codon from said first database to match the nucleotide sequence of the restriction enzyme site;
- (iv) repeating steps (ii) and (iii); whereby the written polynucleotide sequence contains at least one desired restriction enzyme site.
In one embodiment of the (re)writing method, said desired restriction sites are introduced in the (re)written polynucleotide sequence at a predetermined distance from each other. Preferably, this predetermined distance is between 100 and 1000 bp, preferably 300 to 800 bp, more preferably 600 to 800 bp. In one embodiment of the (re)writing method, said desired restriction site(s) are introduced such that a restriction site lies between each pair of functional units of the (re)written polynucleotide.
The present invention relates to a method for (re)writing a CpG-free polynucleotide containing a coding sequence for a polypeptide, comprising the following steps: a) providing an amino acid sequence or a polynucleotide sequence; b) removing the CpG dinucleotides by replacing them with a codon or codon combination which does not comprise a CpG; c) writing a nucleotide sequence encoding said amino acid sequence by selecting the preferential codon of the codon usage table corresponding to the host or group of hosts; and/or d) removing the undesired restriction sites by replacing them with a codon or codon combination which comprises neither a CpG nor an undesired restriction site; and/or e) optionally adding desired restriction site(s).
The steps b) and/or c) and/or d) can be performed consecutively or simultaneously. Preferably, the global codon frequency is estimated in order to check its agreement with the chosen codon usage table. More preferably, said global frequency is checked at each of steps b), c), d) and e) of the method. In a preferred embodiment, once the amino acid sequence is provided (step a), the number of each amino acid is determined. Then, with an appropriate codon usage table, the number of each codon to be used is determined. For example, an appropriate codon usage table can be the one depicted in Figure 3. These numbers are used during the writing step b) and the following steps c), d) and e) for the rewriting. Examples of such a (re)writing method are disclosed in Figures 6-20.
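The step of turning amino-acid counts into a number of each codon can be sketched in Python as follows. This is an illustration, not the method of the patent: it assumes the usage table is given as per-amino-acid fractions (the values below are made up, not those of Figure 3), and `target_codon_counts` is a hypothetical helper. Rounding uses the largest-remainder rule so that, for each amino acid, the codon counts add up exactly to its number of occurrences.

```python
from collections import Counter
from math import floor


def target_codon_counts(protein, usage):
    """Convert per-amino-acid codon fractions into integer codon counts.

    usage: {amino_acid: {codon: fraction}}, fractions summing to 1 per amino acid.
    Leftover units after flooring go to the codons with the largest remainders.
    """
    targets = {}
    for aa, n in Counter(protein).items():
        raw = {codon: n * f for codon, f in usage[aa].items()}
        counts = {codon: floor(x) for codon, x in raw.items()}
        shortfall = n - sum(counts.values())
        by_remainder = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
        for codon in by_remainder[:shortfall]:
            counts[codon] += 1
        targets.update(counts)
    return targets


# Hypothetical usage fractions, for illustration only.
USAGE = {
    "A": {"GCT": 0.5, "GCA": 0.3, "GCC": 0.2},
    "K": {"AAA": 0.6, "AAG": 0.4},
}
print(target_codon_counts("AAAAK", USAGE))
```

For "AAAAK", the four alanines are split 2/1/1 between GCT, GCA and GCC, and the single lysine gets AAA.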
Several checking steps can be added at the end of the method in order to verify that the nucleotide sequence comprises neither a CpG dinucleotide nor an undesired restriction site, and that the global codon frequency is in accordance with the chosen codon usage table.
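A final check of this kind might look like the Python sketch below. The function names and the example list of undesired sites are assumptions made for illustration. The search covers one strand only, which is sufficient for palindromic sites such as EcoRI, BamHI and HindIII. The codon frequency check is omitted here.

```python
def cpg_positions(seq):
    """0-based positions of every C immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def site_hits(seq, sites):
    """(enzyme, position) for every occurrence, overlapping ones included."""
    seq = seq.upper()
    hits = []
    for name, site in sites.items():
        pos = seq.find(site)
        while pos != -1:
            hits.append((name, pos))
            pos = seq.find(site, pos + 1)
    return hits


def final_check(seq, sites):
    """Report CpGs and undesired sites; 'ok' also requires a whole number of codons."""
    cpg, hits = cpg_positions(seq), site_hits(seq, sites)
    return {"cpg": cpg, "sites": hits, "ok": not cpg and not hits and len(seq) % 3 == 0}


# Example undesired sites (an assumption; a real run would use the vector's polylinker).
UNDESIRED = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
report = final_check("ATGGAATTCAAACGT", UNDESIRED)
print(report)
```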
Moreover, in the case where the predetermined CpG dinucleotide content cannot be reached without any amino acid change in the encoded sequence, the invention relates to an alternative method in which the amino acid sequence is modified so that the nucleotide sequence contains said predetermined CpG dinucleotide content, the substitution of one or more amino acids being conservative. By conservative is meant that a first amino acid can be substituted by another one from a group comprising the first, the groups being the following:
Group I: Gly, Ala, Val, Ile, Leu, Met, Phe, Trp
Group II: Ser, Thr, Cys
Group III: Asp, Glu, Asn, Gln
Group IV: Arg, Lys, Met
Group V: His, Phe, Tyr, Trp
Preferably, the amino acids Gly, Cys and Pro are not changed. Therefore, the invention concerns a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide further comprising the steps of :
- (i) providing at least one database containing the conservative groups of amino- acids and their corresponding codons;
- (ii) selecting from said further database one codon which encodes an amino-acid from the same conservative group as the amino-acid encoded by the read codon; whereby the (re)written polynucleotide encodes a conservatively substituted polypeptide.
Optionally, the (re)writing method according to the present invention can also comprise a search for cryptic splicing sites. Cryptic splicing sites are very rare in the (re)written sequence; however, a checking step can be introduced in the (re)writing method in order to delete them.
Codon Usage
Another embodiment of the invention includes the (re)writing of polynucleotide sequences which substantially meet the codon usage of a host or a group of hosts. In one embodiment of the (re)writing method according to the invention, the nucleotide sequence encoding a polypeptide is (re)written so that the codons are selected to encode the amino acid sequence while avoiding CpG dinucleotides. Indeed, as the genetic code is degenerate, several codons may encode the same amino acid. Therefore, the codons comprising the dinucleotide CpG are never used, namely GCG, CGA, CGC, CGG, CGT, CCG, TCG and ACG. Moreover, a codon ending with a C nucleotide (namely GCC, CGC, AAC, GAC, TGC, GGC, CAC, ATC, TTC, CCC, AGC, TCC, ACC, TAC, GTC, CTC) is not used if the next codon begins with a G nucleotide (namely GCA, GCC, GCG, GCT, GAC, GAT, GAA, GAG, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT).
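The two codon lists above can be regenerated by enumerating all 64 codons, as in the short Python sketch below (the variable and function names are our own). The enumeration gives 8 codons containing CpG, 16 codons ending in C (CTC included) and 16 codons starting with G.

```python
from itertools import product

# All 64 codons over the alphabet A, C, G, T.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

# Codons containing a CpG themselves: never used in a CpG-free rewrite.
CPG_CODONS = sorted(c for c in CODONS if "CG" in c)

# A codon ending in C followed by a codon starting with G creates a straddled CpG.
ENDS_IN_C = sorted(c for c in CODONS if c.endswith("C"))
STARTS_WITH_G = sorted(c for c in CODONS if c.startswith("G"))


def junction_ok(left, right):
    """True when no CpG straddles the boundary between two adjacent codons."""
    return not (left.endswith("C") and right.startswith("G"))


print(CPG_CODONS)
print(len(ENDS_IN_C), len(STARTS_WITH_G))
```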
Conversely, in an embodiment of the (re)writing method according to the invention in which the CpG dinucleotide content is maximized, the codons comprising the dinucleotide CpG are preferably used, namely GCG, CGA, CGC, CGG, CGT, CCG, TCG and ACG. Moreover, a codon ending with a C nucleotide (namely GCC, CGC, AAC, GAC, TGC, GGC, CAC, ATC, TTC, CCC, AGC, TCC, ACC, TAC, GTC, CTC) is preferably used if the next codon begins with a G nucleotide (namely GCA, GCC, GCG, GCT, GAC, GAT, GAA, GAG, GGA, GGC, GGG, GGT, GTA, GTC, GTG, GTT). As the genetic code is not universal, the meaning of each codon needs to be checked. For example, some codons have a different meaning in the nucleus and in organelles such as mitochondria or chloroplasts. Therefore, (re)writing the gene is necessary for stable nuclear expression of a mitochondrial gene outside the mitochondria. Codon frequencies differ from one organism to another, and several codon usage tables are available, in particular for prokaryotes, for plants, and for lower and higher eukaryotes. For example, this difference is highly relevant for heterologous expression in plants.
In a first embodiment of the invention, the sequence could be optimized for one organism. A high specificity could lead to a strong expression. In a first alternative, the codon with the highest frequency is chosen. In a second one, the global frequency of each codon is in agreement with the codon usage table of the host organism.
In a preferred embodiment of the invention, the sequence could be optimized for several hosts or for a group of hosts. Preferably, the sequence is optimized for prokaryotes, for eukaryotes and/or for plants. For example, Figure 1 presents a codon usage table for prokaryotes, plants and eukaryotes. A specific codon usage table is then generated by taking, for each codon, the mean of its frequencies in several codon usage tables. In a preferred embodiment, the method according to the invention uses the codon usage table of Figure 2 for an optimization for prokaryotes and eukaryotes. The codon usage tables of the preferred organisms are preferably used to generate the optimized codon usage table. Indeed, including rarely used organisms in the generation of the optimized table can lead to a codon usage table that is incompatible with frequently used organisms. To avoid such bias, the codon usage table introduced in the (re)writing method is preferably checked for compatibility with the frequently used or planned organisms. Figure 24 presents an optimized codon usage table for higher eukaryotic hosts built according to the above-mentioned suggestions. The codon usage table of Figure 24 is preferably used in the (re)writing method according to the present invention.
In a more preferred alternative of the present method, the codons are chosen so that the final, global proportion of each codon is similar to that in the codon usage table of the host cell. Indeed, respecting these frequencies can increase expression. For example, if a stretch of several alanines is found in the protein sequence and the same codon is used for each of them, translation can be hindered. However, if a codon's frequency is less than 10 %, preferably less than 5 %, this codon is not used.
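The averaging of several host tables described above, followed by the exclusion of rarely used codons, could be sketched in Python as follows. Two points are our assumptions: each table is taken as per-amino-acid fractions, and the 10 % cut-off is applied to the averaged fraction of a codon among its synonyms, after which the remaining fractions are rescaled to sum to 1. The host tables are invented numbers.

```python
def combine_usage(tables, min_fraction=0.10):
    """Average several per-host tables, then drop rarely used codons.

    tables: list of {amino_acid: {codon: fraction}} dicts.
    """
    combined = {}
    for aa in set().union(*tables):
        codons = set().union(*(t.get(aa, {}) for t in tables))
        mean = {c: sum(t.get(aa, {}).get(c, 0.0) for t in tables) / len(tables)
                for c in codons}
        # Keep codons at or above the cut-off; fall back to all if none survive.
        kept = {c: f for c, f in mean.items() if f >= min_fraction} or mean
        total = sum(kept.values())
        combined[aa] = {c: f / total for c, f in kept.items()}
    return combined


# Two hypothetical host tables (made-up numbers for illustration).
host_1 = {"K": {"AAA": 0.6, "AAG": 0.4}, "L": {"CTG": 0.90, "TTA": 0.10}}
host_2 = {"K": {"AAA": 0.2, "AAG": 0.8}, "L": {"CTG": 0.96, "TTA": 0.04}}
table = combine_usage([host_1, host_2])
print(table)
```

In this example TTA averages 7 % and is dropped, so CTG becomes the only leucine codon.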
The invention also contemplates a method of (re)writing a polynucleotide containing a coding sequence for a polypeptide so that the (re)written polynucleotide meets the codon usage table of a host or a group of hosts. In a preferred embodiment of the (re)writing method, said polypeptide is not naturally expressed in the host or in one host of the group of hosts. As described above, the (re)writing of such a polynucleotide can allow better expression of said polypeptide in said host or group of hosts.
Hereafter is described an alternative method for designing a codon usage.
Rewriting using a codon usage drawn from a set of sequences
1) Principles and aim
An interactive method of specifying codon usage has been devised. The user may provide a list of nucleic acid sequences as a basis for computing a particular codon usage. For instance, suppose we want to rewrite a peptide sequence originally encoded by an intronic ORF in one organism, say S. cerevisiae, into an intronic ORF in another organism, say E. coli. Suppose also that the user has a set of nucleic acid sequences that are intronic ORFs from E. coli. With the method described hereafter, the user may derive a specific codon usage from this set of intronic ORF sequences, and use the previously described embodiment of the invention together with this custom-made codon usage to rewrite the sequence.
2) Description of the method
i) In a first step, the user decides, by setting program flags (more preferably options), which codon usage will be used. If the interactive mode is chosen, the process goes to step ii).
ii) In a next step, a set of nucleic acid sequences is read from database files, preferably one file, more preferably a FASTA-formatted file.
iii) Codon frequencies are counted for each sequence of said set. For this, a specific genetic code must be supplied to the program, as a file or as an internal data structure.
iv) The frequencies are normalized into usage percentages, such that the usage percentages of all the degenerate codons coding for a particular amino-acid sum to 100 percent.
v) The resulting codon usage table is stored in an internal data structure or in an external file, and used as the reference codon usage for the rewriting process described above.
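Steps ii) to iv) can be sketched in Python as follows. This is our own minimal illustration. It uses the standard genetic code, built from the usual TCAG ordering. An organelle or alternative code would replace the `CODE` table, and that table is the "specific genetic code" required in step iii). `read_fasta` is a deliberately minimal parser rather than a full FASTA implementation, and the two ORFs are toy sequences.

```python
from collections import Counter, defaultdict

# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def read_fasta(text):
    """Minimal FASTA parser: returns {header: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def usage_from_sequences(seqs, code=CODE):
    """Count in-frame codons (step iii), then normalise per amino acid to 100 % (step iv)."""
    counts = Counter()
    for seq in seqs:
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in code:  # skips codons with ambiguous bases such as N
                counts[codon] += 1
    by_aa = defaultdict(dict)
    for codon, aa in code.items():
        by_aa[aa][codon] = counts[codon]
    table = {}
    for aa, per_codon in by_aa.items():
        total = sum(per_codon.values())
        if total:  # amino acids never observed are left out of the table
            table[aa] = {c: 100.0 * n / total for c, n in per_codon.items()}
    return table


fasta = ">orf1\nAAAAAG\n>orf2\nAAATTT\n"
table = usage_from_sequences(read_fasta(fasta).values())
print(table["K"], table["F"])
```

Step v) amounts to keeping `table` in memory, or serialising it to a file, and using it as the target codon usage.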
Restriction sites
The invention covers also (re)writing of a polynucleotide having a predetermined content of CpG dinucleotides and containing a coding sequence for a polypeptide in which the undesired restriction enzyme sites have been removed and/or at least one desired restriction site has been introduced.
In order to facilitate the manipulation of the (re)written polynucleotide, the method for (re)writing a polynucleotide comprises the additional steps of removing undesired restriction sites and, optionally, of specifically introducing at least one desired restriction site. The removal of restriction sites allows easy manipulation of the (re)written polynucleotide, particularly in vitro. Some restriction sites can be intentionally introduced in order to facilitate manipulation of the (re)written polynucleotide, for example for cloning, subcloning, sequencing or mutagenesis.
The undesired restriction sites to be removed are the frequently used restriction sites and at least the restriction sites of the polylinker of the vector used. In a first alternative, the (re)writing may involve no introduction of restriction sites.
In a second embodiment of the invention, new restriction sites are introduced at regular intervals without modifying the protein sequence. A restriction site can be introduced at a determined distance from the previous one, preferably every 100, 200, 300, 400, 500, 600, 700, 800 or 1000 bp; more preferably, every 600 or 800 bp. In an additional alternative, restriction sites can be introduced between functional units. A functional unit can be, for example, a nucleotide sequence encoding a protein domain, a regulatory sequence or a promoter. Preferably, restriction sites could be added between the nucleotide sequences encoding protein fragments, motifs or domains. For example, these restriction sites allow a nucleotide sequence encoding one protein fragment to be replaced by a nucleotide sequence encoding another protein fragment. Another use is the production of protein hybrids.
Preferably, the introduced restriction sites are chosen from the group consisting of the restriction sites in the polylinker of the vector that will be used. In one embodiment, the introduced restriction sites are chosen from the group consisting of the restriction sites in the polylinker of the pUC19 vector, namely EcoR I, Sac I, Kpn I, Sma I, BamH I, Xba I, Sal I, BspM I, Pst I, Sph I, Hind III.
Preferably, the restriction sites are introduced so as to respect the order of the restriction sites in the polylinker. In one embodiment, the order of the restriction sites is that of the pUC19 polylinker (5' EcoR I - Sac I - Kpn I - Sma I - BamH I - Xba I - Sal I - BspM I - Pst I - Sph I - Hind III 3').
Several alternative methods for removing or introducing a restriction site are described in Libertini & Di Donato (1992, Protein Eng, 5, 821-825), Shankarappa et al. (Biotechniques, 12, 882-884) and Tamura et al. (Biotechniques, 10, 782-784), the disclosures of which are incorporated herein by reference. These methods can be used in the method according to the present invention.
Preferably, restriction sites are introduced as follows. The restriction site is translated into amino acids in the three reading frames, which establishes a relation between one restriction site and a group of amino acid sequences (of 2 or 3 amino acids); see Figure 4. The most appropriate place to introduce a restriction site is then identified by searching the polypeptide sequence with a regular expression. For example, Figure 5 presents the regular expressions to search for in the amino acid sequence in order to identify the most appropriate places to introduce the restriction sites of the pUC19 polylinker.
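The translation of a site in the three reading frames, and the resulting regular expressions, can be sketched in Python as follows. This is our own construction; we do not know the exact expression format of Figure 5. Each partial codon at the edges of the site is padded with `N`, and the padded triplet becomes a character class of every amino acid (or `*` for stop) it can encode. A match in the polypeptide then marks a place where the site can be created by synonymous codon changes alone. Function names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def aa_class(triplet):
    """Amino acids encodable by a triplet whose 'N' positions are free."""
    options = [BASES if b == "N" else b for b in triplet]
    return {CODE["".join(p)] for p in product(*options)}


def site_patterns(site):
    """One regular expression per reading frame (0, 1, 2) for a restriction site."""
    patterns = []
    for frame in range(3):
        padded = "N" * frame + site
        padded += "N" * (-len(padded) % 3)
        parts = []
        for i in range(0, len(padded), 3):
            aas = "".join(sorted(aa_class(padded[i:i + 3])))
            parts.append(re.escape(aas) if len(aas) == 1 else "[" + aas + "]")
        patterns.append("".join(parts))
    return patterns


def candidate_positions(protein, site):
    """Nucleotide offsets in the coding sequence where the site could be introduced."""
    hits = set()
    for frame, pattern in enumerate(site_patterns(site)):
        # The lookahead lets overlapping matches be reported.
        for m in re.finditer("(?=" + pattern + ")", protein):
            hits.add(3 * m.start() + frame)
    return sorted(hits)


print(site_patterns("GAATTC"))  # EcoRI in frames 0, 1, 2
print(candidate_positions("MEFK", "GAATTC"))
```

In frame 0, EcoRI (GAATTC) reads simply as "EF". So in "MEFK" the site can be placed at nucleotide 3 by choosing GAA for Glu and TTC for Phe.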
With the knowledge of the relation between one restriction site and a group of amino acid sequences, the amino acid sequence of the polypeptide to be encoded is examined in order to identify the places showing one sequence of the group of amino acid sequences for one site. At this place, the sequence encoding the polypeptide can be modified in order to introduce the restriction site.
Database
The database comprises at least the amino acids and some of the codons encoding them. In a preferred embodiment, a coefficient or frequency is assigned to each amino acid-codon pair (see, for example, Figure 3). The higher the coefficient, the more frequently the codon is introduced. If an amino acid-codon pair is undesired, its coefficient is near zero, preferably zero. An amino acid-codon pair may be undesired because of the presence of a CpG dinucleotide and/or a very low frequency in the host or the group of hosts. Optionally, the coefficients can be set so that only the most frequent codon in a host cell or group of hosts is introduced. Optionally, the coefficients can be set so that the codon usage table of a host cell or group of host cells is met. The invention also encompasses a database with information on the undesired restriction enzyme sites. The invention further encompasses a database with information on the desired restriction enzyme sites; more particularly, this database comprises the regular expressions for the restriction enzyme sites considered (see, for example, Figure 5). This information can be contained in one database or in a combination of databases.
The coefficients of the database can be used to calculate a score, more particularly in a computerized process. Indeed, the database(s) allow the codons to be selected during the (re)writing process and allow checking of whether the (re)written polynucleotide meets the requirements. Such requirements could be a predetermined content of CpG dinucleotides, and/or a codon usage table corresponding to a host or a group of hosts, and/or the absence of undesired restriction sites.
Computerized process
The process for (re)writing a polynucleotide sequence is computerized. The software is intended to provide a toolkit (that is, a set of software components available for later use in an encapsulating program), as well as a standalone executable for rewriting genes. These software components perform the main task of (re)writing a polynucleotide sequence with a predetermined CpG content from a polypeptide or polynucleotide sequence, and optionally the two following tasks:
• (re)writing a polynucleotide sequence in which restriction enzyme sites are deleted and/or introduced
• (re)writing a polynucleotide sequence which is optimal with respect to codon usage.
The man skilled in the art will understand that these two latter tasks can be performed independently of, or in combination with, the task of (re)writing with a predetermined CpG content.
Other dependent tasks can also be performed by this software:
• reading a file describing the codon usage and mapping it into memory
• testing a sequence to determine whether it is CpG-free and/or free of restriction sites
• displaying the gap between the percentage of each codon in the sequence and the percentage of this codon in the targeted codon usage.
Algorithms description
We will now describe an embodiment of a computerized algorithm for writing a polynucleotide which is free of CpG dinucleotides, said polynucleotide encoding a desired polypeptide.
Considering a desired polypeptide with N amino-acids, the algorithm first reads the first amino-acid of the polypeptide and selects a CpG-free codon coding for it. The selected codon is written as the first codon of the polynucleotide. The algorithm then reads the second amino-acid of the desired polypeptide, adjacent to the first, and selects a second codon coding for this second amino-acid such that the already selected first codon, considered together with the second codon, is CpG-free. In other words, the second codon does not contain a CpG and no CpG straddles the first and second codons. To this end, the algorithm may successively check each codon coding for the second amino-acid until it finds one fulfilling this selection condition. The second selected codon is written into the polynucleotide adjacent to the first selected codon.
The algorithm repeats this step successively for the third to the Nth amino-acid, each time selecting a corresponding codon that forms no CpG with the previously selected codon.
In other words, the main routine of the algorithm successively reads the Ith amino-acid of the desired polypeptide and selects an Ith codon coding for said Ith amino-acid such that the already selected (I-1)th codon of the polynucleotide, considered together with the Ith codon, is CpG-free, I varying one by one from 2 to N. As a result, the written polynucleotide encodes the polypeptide. This algorithm is particularly suited to computerization. It can be implemented with the help of a database giving, for each amino-acid, the possible corresponding codons.
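A minimal Python sketch of this greedy main routine is given below. It assumes the standard genetic code, which serves as the "database giving for each amino-acid the possible codons". `write_cpg_free` and its optional `preference` ranking are hypothetical names. The CpG test looks at the last base already written plus the candidate codon, so a single check covers both a CpG inside the codon and a CpG straddling the junction.

```python
# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Synonymous codons for each amino acid, in TCAG order.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def write_cpg_free(protein, preference=None):
    """Greedy routine: for each amino acid take the first codon (optionally ranked by
    `preference`, higher first) that creates no CpG after what is already written."""
    written = ""
    for aa in protein:
        candidates = SYNONYMS[aa]
        if preference:
            candidates = sorted(candidates, key=lambda c: -preference.get(c, 0.0))
        for codon in candidates:
            if "CG" not in written[-1:] + codon:
                written += codon
                break
        else:
            raise ValueError(f"dead end at {aa!r}: every codon creates a CpG after {written[-3:]!r}")
    return written


print(write_cpg_free("MRAG"))  # Arg skips the four CGN codons and uses AGA
```

Used on its own, the greedy routine can reach a dead end. For example, if the codon preference puts GCC first for Ala, then a following Gly has no codon left, since every Gly codon starts with G. The improvement described in the next paragraph addresses exactly this case.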
An improvement consists in the following: if all possible codons for the Ith amino-acid lead to a CpG straddling the (I-1)th selected codon, the algorithm branches back to the (I-1)th amino-acid to select another corresponding codon. This other codon is selected as before, so as to be CpG-free in consideration of the (I-2)th selected codon, and is then written into the polynucleotide in place of the previously selected codon at this location. The algorithm then continues again with the Ith amino-acid. Owing to the change of the (I-1)th codon, it may now be possible to find an Ith codon that no longer forms a straddled CpG with the (I-1)th codon. If not, the (I-1)th codon may be changed again, and so on until a CpG-free solution is found. If none of the possible codons at position I-1 allows a CpG-free solution, it is possible to branch back similarly to the (I-2)th amino-acid, and so on.
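The branching-back improvement amounts to a depth-first search with backtracking. The Python sketch below is one possible implementation under the same assumptions as the greedy routine (standard genetic code, hypothetical names). It keeps an explicit per-position counter instead of using recursion, so long genes do not hit Python's recursion limit. Codons containing CpG are filtered out up front, so only the junction needs checking during the search.

```python
# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def write_cpg_free_backtracking(protein, preference=None):
    """When no codon fits position I, go back to I-1 (and further if needed)."""
    choices = []
    for aa in protein:
        cands = [c for c in SYNONYMS[aa] if "CG" not in c]
        if preference:
            cands.sort(key=lambda c: -preference.get(c, 0.0))
        choices.append(cands)

    chosen, next_try, i = [], [0] * len(protein), 0
    while i < len(protein):
        prev = chosen[-1][-1] if chosen else ""
        opts = choices[i]
        # Skip candidates whose first base would form C|G with the previous codon.
        while next_try[i] < len(opts) and prev + opts[next_try[i]][0] == "CG":
            next_try[i] += 1
        if next_try[i] < len(opts):
            chosen.append(opts[next_try[i]])
            next_try[i] += 1
            i += 1
        else:  # dead end: reset this position and revisit the previous one
            if i == 0:
                raise ValueError("no CpG-free encoding exists")
            next_try[i] = 0
            chosen.pop()
            i -= 1
    return "".join(chosen)


print(write_cpg_free_backtracking("AG", preference={"GCC": 1.0}))
```

In the "AG" example, GCC is tried first for Ala. Every Gly codon then fails, so the search returns to Ala, takes GCT instead, and finishes with GCTGGT.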
Further, the algorithm may be completed to obtain a polynucleotide which is free of both CpG and undesired restriction enzyme sites. To this end, the first codon is also chosen free of undesired restriction sites. And, in the above main routine, the codon for the Ith amino-acid is selected so that it is also free of undesired restriction sites when considered with the (I-1)th selected codon. This can be done in the same manner as the CpG check described previously. It can also be implemented with the help of a database containing the nucleotide sequences of the restriction enzyme sites to consider. If a restriction site to avoid corresponds to an undesired sequence of more than six nucleotides, then the main routine is adapted so that the Ith amino-acid is considered together with several previously selected codons ((I-1)th, (I-2)th, ...), in order to check for such restriction sites over these codons plus the one being selected. Thus, if the restriction site to avoid has 8 nucleotides, the main routine considers the two previously selected codons when selecting the next one.
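Restriction-site avoidance can be added to the backtracking search by widening the window checked around each new codon. In the Python sketch below, we check the last (longest site length - 1) nucleotides already written together with the candidate codon. Any new occurrence of a site must end inside that codon, so this window is the smallest that catches every new occurrence. For an 8-bp site it spans up to three preceding codons. EcoRI and PacI are real enzymes, but their use as the "undesired" list here is only an example, and `write_avoiding` is a hypothetical name.

```python
# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def write_avoiding(protein, undesired, preference=None):
    """Backtracking writer rejecting codons that create a CpG or an undesired site."""
    longest = max((len(s) for s in undesired), default=2)
    back = max(longest - 1, 1)   # preceding nucleotides that can pair with a new codon
    keep = (back + 2) // 3       # preceding codons needed to supply them
    pref = preference or {}

    def fits(tail, codon):
        if "CG" in tail[-1:] + codon:
            return False
        window = tail[-back:] + codon
        return not any(site in window for site in undesired)

    choices = [sorted(SYNONYMS[aa], key=lambda c: -pref.get(c, 0.0)) for aa in protein]
    chosen, next_try, i = [], [0] * len(protein), 0
    while i < len(protein):
        tail = "".join(chosen[-keep:])
        opts = choices[i]
        while next_try[i] < len(opts) and not fits(tail, opts[next_try[i]]):
            next_try[i] += 1
        if next_try[i] < len(opts):
            chosen.append(opts[next_try[i]])
            next_try[i] += 1
            i += 1
        else:  # dead end: reset this position and revisit the previous one
            if i == 0:
                raise ValueError("no acceptable encoding exists")
            next_try[i] = 0
            chosen.pop()
            i -= 1
    return "".join(chosen)


# EcoRI (6 bp) and PacI (8 bp) as example undesired sites.
UNDESIRED = ["GAATTC", "TTAATTAA"]
print(write_avoiding("LIK", UNDESIRED))
print(write_avoiding("EF", UNDESIRED, preference={"TTC": 1.0}))
```

For "LIK", the default first choices TTA ATT AAA would spell the 8-bp PacI site across three codons. Both lysine codons fail, so the search goes back and replaces ATT with ATC.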
The man skilled in the art will understand that this computerized algorithm can write a restriction-site-free polynucleotide independently of whether it is also CpG-free. Indeed, each codon may be selected without checking for the presence of CpG. As described here, restriction sites are treated at the same time as CpG. However, it is also possible first to write a CpG-free polynucleotide regardless of restriction sites and then to rewrite this polynucleotide to make it restriction-site-free, or vice versa.
In an improvement, the computerized algorithm is completed so that the written polynucleotide tends to respect the codon usage of a host organism. To approach the targeted codon usage, a tree exploration is used. An example of such a tree (constructed for a sequence of amino acids) is shown in Figure 22.
The algorithm begins with the first amino-acid of the sequence: it builds one node per codon coding for this particular amino-acid (the circles in Figure 22). It then computes a score for each node, based on the frequencies of occurrence of codons in the portion of the sequence that has been rewritten (the scoring algorithm is explained in detail below; the scores are indicated on the branches of the tree in Figure 22). Once the nodes have been scored, the best-scoring node is chosen and the same operation is applied recursively. If at any time a CG or a restriction enzyme site is found, the algorithm stops at the currently investigated node and traces back to the previous node. It then tries the best-scoring node apart from the one that led to the dead end. In Figure 22, the path that is investigated first (because of its best scores) ends on node 6. Since it leads to a CG, it is not taken. The algorithm then goes back to node 5 and tries the other branch (the broken-line path), which also leads to a CG. The algorithm then goes back to node 1 (since all paths beginning with node 5 have been investigated unsuccessfully) and, following the best-scoring path, finds a solution (filled arrows).
This algorithm is described in the flow chart of Figure 23. At each node of the tree, the score is computed as follows. Each node corresponds to a new codon (from the list of possible codons) that is added to the sequence being constructed. The algorithm computes the percentage of each of the possible codons in the newly constructed sequence and compares it with the percentage given in the targeted codon usage by computing the square of the difference between these percentages. It then takes the maximum of these squared differences over the possible codons. This gives one number for each possible newly created sequence, which is the score associated with the node. The node selected is the one with the lowest score, which is considered the best score.
In the flow chart of figure 23, « seq » corresponds to the amino-acids sequence of the polypeptide, « AA » is an abbreviation for amino-acid and « last-seq » are the codons that were selected for the previous amino-acids of the polypeptide.
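A Python sketch of the tree search of Figures 22 and 23 follows. It rests on our reading of the score: the "possible codons" are the synonyms of the amino acid being added, and their percentages are taken among the occurrences of that amino acid in the sequence built so far. Targets are per-amino-acid percentages, and all names are hypothetical. Children of a node are tried in increasing score order. A child that would create a CpG is never expanded, and when every child fails the search returns to the parent, as in the node 6 → 5 → 1 walk-through above.

```python
from collections import Counter

# Standard genetic code built from the TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def node_score(counts, aa, codon, target):
    """Largest squared gap (percentage points) between observed and targeted share
    of any synonymous codon of `aa` once `codon` is added. Lower is better."""
    syn = SYNONYMS[aa]
    trial = Counter({c: counts[c] for c in syn})
    trial[codon] += 1
    total = sum(trial.values())
    goal = target.get(aa, {})
    return max((100.0 * trial[c] / total - goal.get(c, 0.0)) ** 2 for c in syn)


def write_by_usage(protein, target):
    """Best-first depth-first search with backtracking on CpG dead ends."""
    counts, path, options, i = Counter(), [], [], 0
    while i < len(protein):
        if len(options) == i:  # first visit at this depth: rank the children
            aa = protein[i]
            prev = path[-1][-1] if path else ""
            kids = [c for c in SYNONYMS[aa] if "CG" not in prev + c]
            kids.sort(key=lambda c: node_score(counts, aa, c, target))
            options.append(kids)
        if options[i]:
            codon = options[i].pop(0)
            path.append(codon)
            counts[codon] += 1
            i += 1
        else:  # all children failed: discard this level and revisit the parent
            options.pop()
            if not path:
                raise ValueError("no CpG-free encoding exists")
            counts[path.pop()] -= 1
            i -= 1
    return "".join(path)


print(write_by_usage("KKKK", {"K": {"AAA": 50.0, "AAG": 50.0}}))
```

With a 50/50 target for lysine, the search alternates AAA and AAG. With a target that puts all alanine on GCC, the best-scoring GCC is abandoned when a following glycine leaves no CpG-free codon.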
The man skilled in the art understands that codon usage optimisation may be used independently of the CpG-free condition and/or of the restriction-site-free condition. To perform codon usage optimisation only, it suffices to skip the CpG and/or restriction site checks.
The computerized algorithms described above may easily be adapted to rewrite a first polynucleotide having a coding sequence into a second one that codes for the same polypeptide but is CpG-free and/or restriction-site-free. To this end, instead of successively reading the amino-acids of the polypeptide, the algorithm successively reads the codons of the first polynucleotide, determines the corresponding amino-acid using the database, and then continues as previously described.
Heuristic improvements to the branching algorithm for codon usage optimisation
The algorithm as described returns a solution by locally following the path closest to the codon usage. To further improve the present invention, heuristic methods may be used.
First heuristic: sub-tree evaluation
The idea is to modify the above algorithm as follows: the score is computed as above, but all paths up to a depth of K amino-acids (K being fixed according to the precision wanted and the time allowed) are explored, and their scores with respect to the codon usage are computed. At the end of this step, the algorithm selects the path of K codons having the best score and writes these codons into the polynucleotide. In other words, the algorithm determines all the possible codon combinations for these K amino-acids and selects the combination of K codons that is nearest to the codon usage when considered together with all the previously selected codons of the polynucleotide. The algorithm then repeats the operation for the next K amino-acids.
This method computes a local score that spans K codons rather than just one codon as in the algorithm illustrated in Figure 23.
Second heuristic : segmentation of the search
In the method developed above, the algorithm begins with the first amino-acid and then scans the sequence sequentially. When the algorithm is used to rewrite a first polynucleotide into a second polynucleotide, it does not take into account the regions where there are strong constraints (i.e. regions with many CGs, regions containing restriction sites, or both). One heuristic is to begin the process with the regions having a high ratio of CG and/or restriction sites, so that maximum flexibility in codon distribution is available where it is most needed; in other words, more choices are available at the beginning than at the end. Thus, the algorithm may comprise a preliminary step of looking for regions of the first polynucleotide having a higher concentration of CpG and/or restriction sites than the average over the whole polynucleotide, beginning the rewriting with these high-concentration regions and afterwards rewriting the other regions.
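The preliminary step of locating high-constraint regions could look like the Python sketch below. We assume fixed, codon-aligned windows (30 nt by default, an arbitrary choice), count one mark per CpG and per undesired-site occurrence, and return the windows that are denser than the sequence average first, densest first, followed by the remaining windows in their original order. Names are hypothetical.

```python
def constraint_marks(seq, sites):
    """One mark per CpG and per undesired-site occurrence, at its start position."""
    seq = seq.upper()
    marks = [0] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "CG":
            marks[i] += 1
    for site in sites:
        pos = seq.find(site)
        while pos != -1:
            marks[pos] += 1
            pos = seq.find(site, pos + 1)
    return marks


def rewrite_order(seq, sites, window=30):
    """Codon-aligned windows, those denser than the whole sequence first."""
    if window % 3:
        raise ValueError("window must be a multiple of 3 to stay in frame")
    marks = constraint_marks(seq, sites)
    average = sum(marks) / len(seq)
    windows = [(s, min(s + window, len(seq))) for s in range(0, len(seq), window)]
    density = {w: sum(marks[w[0]:w[1]]) / (w[1] - w[0]) for w in windows}
    hot = sorted((w for w in windows if density[w] > average), key=lambda w: -density[w])
    return hot + [w for w in windows if w not in hot]


seq = "AAA" * 10 + "CGC" * 10 + "AAA" * 10
print(rewrite_order(seq, ["GAATTC"]))
```

Here the middle window, which holds all ten CpGs, would be rewritten first.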
Thus, the algorithm may provide a polynucleotide which is CpG-free as well as restriction-site-free, while being optimized as regards codon usage.
Other optimization techniques
The present invention also covers the use of a genetic algorithm in case a better global optimization is wanted. This kind of algorithm can also get trapped in local optima of an optimization problem, so it has the same limitations as our branching algorithm. However, since it is based on a completely different approach, it is likely to give another type of solution. It is therefore preferable to use both algorithms and to keep the best solution.
The algorithm of the invention may also be used to write a polynucleotide having a given CpG content instead of being CpG-free.
A first method consists in first writing a CpG-free polynucleotide from the polypeptide or from a first polynucleotide, e.g. with the help of the previously mentioned algorithm. The CpG-free polynucleotide is then rewritten so as to add CpG in the desired quantity. To this end, the algorithm sequentially screens said CpG-free polynucleotide to find codons for which there exists at least one equivalent codon (i.e. coding for the same amino-acid) that contains a CpG dinucleotide. When such a codon is found, the algorithm replaces it by the equivalent CpG-containing codon. The algorithm repeats the operation as many times as necessary to introduce the desired number of CpG.
When the algorithm has screened the whole CpG-free polynucleotide without reaching the desired number of CpG, it may screen it again, this time looking for pairs of adjacent codons that may be replaced by two equivalent codons with a CpG straddling them.
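A minimal Python sketch of this first method, assuming the standard genetic code and ignoring codon usage (which the method described in the text would also take into account). `add_cpg`, `cpg` and `translate` are hypothetical helpers. Pass 1 swaps single codons for synonymous CpG-containing codons; pass 2 looks for adjacent pairs that can be rewritten as `..C|G..`. A swap is accepted only if it raises the total CpG count without exceeding the target, because a new codon can also create a CpG with its neighbour.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def cpg(codons):
    """Number of CpG dinucleotides, including those straddling two codons."""
    return "".join(codons).count("CG")


def add_cpg(seq, target):
    """Add CpG to a CpG-free coding sequence until `target` CpG are present."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Pass 1: replace single codons by synonymous codons carrying a CpG.
    for i in range(len(codons)):
        if cpg(codons) >= target:
            return "".join(codons)
        for alt in SYNONYMS[CODON_TO_AA[codons[i]]]:
            trial = codons[:i] + [alt] + codons[i + 1:]
            if "CG" in alt and cpg(codons) < cpg(trial) <= target:
                codons = trial
                break
    # Pass 2: replace adjacent pairs so that a CpG straddles them (..C|G..).
    for i in range(len(codons) - 1):
        if cpg(codons) >= target:
            break
        left = SYNONYMS[CODON_TO_AA[codons[i]]]
        right = SYNONYMS[CODON_TO_AA[codons[i + 1]]]
        for a, b in product(left, right):
            trial = codons[:i] + [a, b] + codons[i + 2:]
            if a.endswith("C") and b.startswith("G") and cpg(codons) < cpg(trial) <= target:
                codons = trial
                break
    return "".join(codons)


print(add_cpg("ATGGCTAAA", 1))  # ATGGCGAAA (Ala: GCT -> GCG)
print(add_cpg("TTTGAT", 1))     # TTCGAT (no single-codon option; straddled CpG)
```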
A second method consists in first screening the polynucleotide to be rewritten in order to determine how many CpG it contains. If it contains more CpG than desired, the CpG-free-polynucleotide algorithm may be applied to the sequence in order to remove the excess CpG. Conversely, if it contains fewer CpG than desired, the algorithm screens for CpG-free codons that have at least one equivalent, CpG-containing codon, and replaces enough of these codons to obtain the desired number of CpG in the polynucleotide. If this is not possible, the algorithm may also screen the polynucleotide for pairs of adjacent codons that may be replaced by two equivalent codons with a CpG straddling them.
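The second method can be sketched in the same setting. Rather than calling a separate CpG-free writer for the removal case, this hypothetical `adjust_cpg` uses one local search for both directions: it repeatedly accepts the first synonymous swap (single codon, then adjacent pair) that brings the CpG count strictly closer to the target, and stops when the target is reached or no swap helps. This is a simplification that assumes the standard genetic code and ignores codon usage and restriction sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def adjust_cpg(seq, target):
    """Move the CpG count of a coding sequence toward `target` by synonymous swaps."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    def count(cs):
        return "".join(cs).count("CG")

    def moves(cs):
        # Single-codon swaps first, then swaps of two adjacent codons
        # (the latter can create or destroy a CpG straddling the pair).
        for i, c in enumerate(cs):
            for alt in SYNONYMS[CODON_TO_AA[c]]:
                yield cs[:i] + [alt] + cs[i + 1:]
        for i in range(len(cs) - 1):
            pairs = product(SYNONYMS[CODON_TO_AA[cs[i]]], SYNONYMS[CODON_TO_AA[cs[i + 1]]])
            for a, b in pairs:
                yield cs[:i] + [a, b] + cs[i + 2:]

    current = count(codons)
    while current != target:
        # Accept the first move that brings the count strictly closer to the target.
        better = next((t for t in moves(codons)
                       if abs(count(t) - target) < abs(current - target)), None)
        if better is None:
            break  # target not reachable with these moves
        codons, current = better, count(better)
    return "".join(codons)


print(adjust_cpg("ACGTCG", 0))  # excess CpG removed: ACTTCT
print(adjust_cpg("TTTGAT", 1))  # missing CpG added: TTCGAT
```

Each accepted move strictly reduces the distance to the target, so the loop always terminates.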
Of course, these methods can be adapted so as to distribute the CpG along the polynucleotide according to a given distribution law; they may, e.g., be placed at regular intervals in the polynucleotide.
In another embodiment, the algorithm provides for the insertion of restriction sites in the polynucleotide. To this end, a database lists, for each desired restriction site, the amino-acid combinations that may be encoded by adjacent codons comprising said restriction site. To introduce a given restriction site in the polynucleotide, the algorithm screens the polynucleotide for adjacent codons encoding one of the amino-acid combinations associated with that restriction site in the database. Then, if possible, the algorithm replaces these adjacent codons by codons that encode the same amino-acid combination but contain the restriction site.
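The restriction-site database can be generated rather than typed in: place the site at each of the three possible offsets within a run of codons, enumerate the free bases, and record which amino-acid string each codon combination encodes. The sketch assumes the standard genetic code; `build_site_db` and `insert_site` are hypothetical names, and `insert_site` simply takes the first match and the first codon option, without checking codon usage or whether new CpG or undesired sites are created. EcoRI (GAATTC) is used as an example site.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def build_site_db(site):
    """Map amino-acid strings to the codon tuples that spell `site` in some frame."""
    db = {}
    for offset in range(3):  # the site starts at base 0, 1 or 2 of the first codon
        n_codons = -(-(offset + len(site)) // 3)  # ceiling division
        free = 3 * n_codons - len(site)
        for fill in product(BASES, repeat=free):
            dna = "".join(fill[:offset]) + site + "".join(fill[offset:])
            codons = tuple(dna[i:i + 3] for i in range(0, len(dna), 3))
            peptide = "".join(CODON_TO_AA[c] for c in codons)
            db.setdefault(peptide, []).append(codons)
    return db


def insert_site(seq, site):
    """Introduce `site` at the first position where the encoded peptide allows it."""
    db = build_site_db(site)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    protein = translate(seq)
    for start in range(len(protein)):
        for peptide, options in db.items():
            if protein.startswith(peptide, start):
                new = codons[:start] + list(options[0]) + codons[start + len(peptide):]
                return "".join(new)
    return None  # the site cannot be introduced without changing the polypeptide


print(insert_site("ATGGAGTTT", "GAATTC"))  # Glu-Phe rewritten as GAA TTC: ATGGAATTC
```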
It will be understood that the different algorithms may be used independently of each other, but also in combination. For instance, it is possible to remove certain restriction sites with the first described algorithm and then introduce other restriction sites with the latter.
1) Other ways to implement gene rewriting in a computer program
1.1) Principles and aim
Other optimization techniques can be considered to solve the gene rewriting problem, especially techniques that are no longer sequential (that is, unlike the method described above, they do not progress along the sequence from one point to the next few points).
In a first step, a weight is assigned to each constraint, preferably in proportion to the priority wanted by the user. For instance, during the rewriting process a CG may cost 10 and the addition of a restriction site from the restriction enzyme database may cost 30, which means that the user prefers adding a CG rather than a restriction site. Once these weights have been assigned to each constraint, the problem becomes a combinatorial optimization problem.
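A sketch of such a weighted objective, using the example weights from the text (10 per CG, 30 per restriction site) and counting overlapping site occurrences. The function names are hypothetical, and a complete objective would probably add further terms (for instance a codon-usage deviation), which are not shown here.

```python
def count_overlapping(seq, pattern):
    """Count occurrences of `pattern` in `seq`, overlapping ones included."""
    return sum(1 for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i))


def rewriting_cost(seq, undesired_sites, cg_weight=10, site_weight=30):
    """Weighted penalty of a candidate polynucleotide (lower is better)."""
    cg = seq.count("CG")
    sites = sum(count_overlapping(seq, s) for s in undesired_sites)
    return cg_weight * cg + site_weight * sites


# One EcoRI site (GAATTC) plus one CpG: 30 + 10 = 40.
print(rewriting_cost("GAATTCCG", ["GAATTC"]))  # 40
```

With these weights, one restriction site is as costly as three CpG, which encodes the user's stated preference.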
The first method is the constraint solving programming method (hereafter referred to as CSP). A second method available from state-of-the-art combinatorial optimization techniques is the genetic algorithm. Although this latter technique can solve the problem, it is not very well suited, since it provides no guarantee on the quality of the solution found.
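As an illustration of the genetic-algorithm alternative (which, as noted, offers no guarantee on solution quality), here is a small Python sketch. It uses the weighted penalty described above (CG = 10, restriction site = 30) as the fitness to minimise; individuals are lists of synonymous codons, with one-point crossover, per-codon mutation restricted to synonyms, and elitism (the better half survives). The standard genetic code is assumed, and population size, generation count, mutation rate, seed and all names are arbitrary assumptions, not values from the text.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def cost(seq, sites=(), cg_weight=10, site_weight=30):
    hits = sum(seq.startswith(s, i) for s in sites for i in range(len(seq)))
    return cg_weight * seq.count("CG") + site_weight * hits


def genetic_rewrite(protein, sites=(), pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    """Evolve synonymous codon choices for `protein` to minimise `cost`."""
    rng = random.Random(seed)
    choices = [SYNONYMS[aa] for aa in protein]

    def fitness(individual):
        return cost("".join(individual), sites)

    population = [[rng.choice(c) for c in choices] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: max(2, pop_size // 2)]  # elitism: the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(protein)) if len(protein) > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: redraw a codon among the synonyms of the same amino acid.
            child = [rng.choice(c) if rng.random() < mutation_rate else codon
                     for codon, c in zip(child, choices)]
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return "".join(best), fitness(best)


seq, penalty = genetic_rewrite("MARSTPEVAARG", sites=("GAATTC",))
print(seq, penalty)
```

Because only synonymous codons are ever drawn, every individual encodes the same polypeptide; what the search does not guarantee is that the returned penalty is the minimum achievable.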
2) Scalability through native parallelization
2.1) Principles
The previously described processes (branch-and-bound exploration and the other optimization techniques described in point 1.1) may be time-consuming, especially when rewriting very long peptide sequences (say, several dozen kb). To obtain better performance, i.e. to compute near-optimal solutions in a reasonable time, a parallelization scheme has been devised.
2.2) Description of the method
This embodiment of the invention consists of five steps.
i) Define the size of the sequence window, that is, the number of consecutive amino acids assigned to each node for the optimization process. This size is computed as the integer part of the sequence length (in amino acids) divided by the number of available computation nodes. Hereafter, a sequence window, or window, means a portion of the amino-acid sequence to be rewritten comprising that number of consecutive amino acids, and the sequence means the whole amino-acid sequence to be rewritten.
ii) Assign each window to one node. Each node performs on its sequence window the same rewriting as described in the previous, sequential embodiment of the invention (points x to y). The nodes are computationally distinct entities. They may run on the same SMP machine (implemented as kernel or POSIX threads) or, preferably, on physically different machines, as in clusters of PCs or workstations.
iii) Once all the computation nodes have finished rewriting their windows, they synchronize and gather all the rewritten windows.
iv) Assemble the rewritten windows so as to avoid adding CG, restriction sites and/or other constraint violations at the joins. When joining the windows, if a CG, restriction site or other constraint is not satisfied, the algorithm changes the codons flanking the join one after another, using the codons allowed by codon degeneracy, without taking codon usage into account. The idea is that, since codon usage is a global property of the whole sequence, changing a few codons at join positions will hardly change the codon usage of the whole sequence.
v) If the constraints at join positions still cannot be satisfied, the algorithm goes back to step (i) with a different window layout, such that the join positions differ in the final sequence. In this case, the windows will not necessarily have the same size.
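The five steps can be sketched in Python for the CpG constraint only, under several assumptions: the last window absorbs the remainder of the integer division, a greedy CpG-free writer stands in for the full per-window rewriting (with a one-codon look-ahead so that a codon ending in C is never followed by a G-only amino acid inside a window), and step v) is only signalled, not implemented. All names are hypothetical. Threads are used for brevity; in CPython they do not give CPU parallelism, so a real speed-up would need processes or separate machines, as the text prefers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)
# Amino acids all of whose codons start with G (V, A, D, E, G).
G_ONLY = {aa for aa, cs in SYNONYMS.items() if all(c[0] == "G" for c in cs)}


def translate(seq):
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def window_bounds(n_aa, n_nodes):
    """Step i): window size = integer part of (sequence length / number of nodes)."""
    n_nodes = max(1, min(n_nodes, n_aa))
    size = n_aa // n_nodes
    bounds = [(k * size, (k + 1) * size) for k in range(n_nodes)]
    bounds[-1] = (bounds[-1][0], n_aa)  # assumption: the last window takes the remainder
    return bounds


def rewrite_window(peptide):
    """Step ii) stand-in: greedy CpG-free codon choice inside one window."""
    codons = []
    for j, aa in enumerate(peptide):
        prev = codons[-1] if codons else ""
        nxt = peptide[j + 1] if j + 1 < len(peptide) else None
        for cand in SYNONYMS[aa]:
            # Skip a CpG inside cand or across the previous codon, and a C ending before a G-only amino acid.
            if "CG" in prev + cand or (nxt in G_ONLY and cand.endswith("C")):
                continue
            codons.append(cand)
            break
        else:
            raise ValueError(f"no CpG-free codon for {aa} at position {j}")
    return codons


def repair_join(codons, b, protein):
    """Step iv): change the codons flanking join b one after another, ignoring codon usage."""
    for i in (b - 1, b):
        for alt in SYNONYMS[protein[i]]:
            trial = codons[:i] + [alt] + codons[i + 1:]
            if "CG" not in "".join(trial[max(0, i - 1):i + 2]):
                codons[i] = alt
                return True
    return False


def parallel_rewrite(protein, n_nodes=4):
    bounds = window_bounds(len(protein), n_nodes)
    # Steps ii)-iii): one worker per window; map() returns the windows in order.
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        parts = list(pool.map(lambda ab: rewrite_window(protein[ab[0]:ab[1]]), bounds))
    codons = [c for part in parts for c in part]
    for start, _ in bounds[1:]:
        if "CG" in codons[start - 1] + codons[start] and not repair_join(codons, start, protein):
            raise RuntimeError("unresolved join: step v) would retry with another window layout")
    return "".join(codons)


print(parallel_rewrite("MSAVDEGRPTLK" * 5, n_nodes=4))
```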
Polypeptide or polynucleotide encoding the polypeptide
The sequence of the encoded polypeptide is not modified by the (re)writing process.
However, if modifying the polypeptide is absolutely necessary, some conservative mutations can be introduced in the encoded polypeptide.
The method according to the invention concerns the (re)writing of a polynucleotide encoding a polypeptide from a polypeptide sequence. Optionally, said polypeptide is a native polypeptide. Optionally, said polypeptide is a mutated polypeptide derived from a native polypeptide. Optionally, said polypeptide is a chimeric polypeptide. Optionally said polypeptide is an artificial polypeptide. The method according to the invention concerns the (re)writing of a second polynucleotide encoding a polypeptide from a first polynucleotide containing the encoding sequence for the same polypeptide. Optionally, said first polynucleotide encoding a polypeptide is a native polynucleotide. Optionally, said polynucleotide is a mutated polynucleotide derived from a native polynucleotide. Optionally, said polynucleotide is a chimeric polynucleotide. Optionally said polynucleotide is an artificial polynucleotide.
The (re)written polynucleotide can be prokaryotic, viral or eukaryotic (notably from plants). The polynucleotide to be (re)written can be any kind of gene: an exogenous or an endogenous gene for the host cell, a nuclear gene or an organelle gene. For example, the (re)written gene can be a reporter gene.
The examples disclose the (re)writing of meganuclease genes. Namely, a (re)written polynucleotide is disclosed for F-TevI, F-TevII, HO, I-CeuI, I-ChuI, I-CreI, I-DmoI, I-SceI, I-TevI, I-TevII, I-TevIII, PI-MleI, PI-PfuI, PI-PfuII, PI-SceI, PI-TliI, PI-TliII, I-DirI and PI-MtuI. Meganucleases are very rare-cutting enzymes encoded, in the large majority of cases, by intron ORFs (intron meganucleases), "classical" genes or intervening sequences (inteins). These enzymes have striking structural and functional properties that distinguish them from the "classical", well-known restriction enzymes (generally from bacterial type II restriction-modification systems, RM II). They recognize non-palindromic sequences spanning 12-40 bp of DNA, whereas "classical" restriction enzymes recognize much shorter stretches of DNA, in the 3-8 bp range (up to 12 bp for rare cutters).
These meganucleases can be used for in vivo genome engineering. Because they recognize long DNA sequences, they can locate and cut a unique, specific site in an entire genome; for example, they can cut a gene specifically at a unique given location.
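A back-of-the-envelope estimate shows why a long recognition site can be unique in a genome while a short one cannot. Assuming a random genome of G bp with equal, independent base frequencies (which real genomes do not have), a fixed n bp site is expected about 2G/4^n times counting both strands. The genome size of 3 × 10^9 bp is an illustrative, roughly human-sized value, and the function name is hypothetical.

```python
def expected_occurrences(site_len, genome_len):
    """Expected chance occurrences of one fixed site in a random genome.

    Assumes equal base frequencies and counts both strands: about 2 * G / 4**n.
    """
    return 2 * genome_len / 4 ** site_len


for n in (6, 8, 12, 18):
    print(n, f"{expected_occurrences(n, 3e9):.3g}")
```

Under these assumptions a 6 bp site is expected about 1.5 million times, whereas an 18 bp site is expected fewer than 0.1 times.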
Some methods that introduce modifications into the cellular genome by recombination based on double-strand break repair rely on meganucleases. These methods are described in US 5,474,896, US 5,792,632, US 5,866,361, US 5,948,678, US 5,962,327, US 5,830,729, WO 00/46385 and WO 00/46386; these patents and patent applications are hereby incorporated in their entirety by reference. The meganuclease recombination system allows outstanding increases in the level of homologous recombination.
Therefore, the meganuclease has to be expressed in host cells that do not naturally express meganucleases. Indeed, a number of meganuclease genes are encoded by the DNA of organelles such as mitochondria or chloroplasts. Generally, expressing meganucleases in a prokaryotic or eukaryotic host cell requires modifying their ORF (open reading frame).
(Re)written polynucleotides
The present invention is concerned with isolated polynucleotides derived from a native gene having an increased or reduced content of CpG dinucleotides as compared to the native gene. The isolated polynucleotides thereby demonstrate a modified level of expression once introduced into a cell as compared to the native gene's level of expression.
Furthermore, the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having 1 or 0 CpG dinucleotide. Preferably, the invention concerns the polynucleotide containing a coding sequence for a polypeptide having no CpG dinucleotide. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide.
Additionally, the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having 0.05 % of CpG dinucleotide, preferably 0.01 %. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide. Alternatively, the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having less than 0.5 % of CpG dinucleotide, preferably less than 0.1 %, more preferably less than 0.05 %, and meeting the codon usage table of a host or a group of hosts. Optionally, said (re)written polynucleotide has no undesired restriction site. Optionally, at least one desired restriction site has been introduced in said (re)written polynucleotide. Optionally, said (re)written polynucleotide has 1 or 0 CpG dinucleotide. Optionally, said (re)written polynucleotide has no CpG dinucleotide. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide. Otherwise, the invention concerns a (re)written polynucleotide containing a coding sequence for a polypeptide having more than 1 %, preferably more than 5 %, more preferably more than 10 % of CpG dinucleotide. Optionally, said (re)written polynucleotide further meets the codon usage table of a host or a group of hosts. Optionally, said (re)written polynucleotide has no undesired restriction site. Optionally, at least one desired restriction site has been introduced in said (re)written polynucleotide. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide.
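The CpG percentages above are not defined in this passage. Assuming they mean the number of CpG dinucleotides divided by the number of dinucleotide positions (length minus one), a short calculation shows what they correspond to in absolute counts; the definition and the function name are assumptions.

```python
def cpg_percent(seq):
    """CpG dinucleotides as a percentage of all dinucleotide positions (assumed definition)."""
    if len(seq) < 2:
        return 0.0
    return 100 * seq.count("CG") / (len(seq) - 1)


# A single CpG in a 2 kb coding sequence is about 0.05 %.
print(round(cpg_percent("A" * 1000 + "CG" + "A" * 998), 4))  # 0.05
```

Under this definition, a 2 kb polynucleotide with a single CpG sits at the 0.05 % level, and one with no CpG at 0 %.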
The invention also encompasses a (re)written polynucleotide containing a coding sequence for a polypeptide meeting the codon usage table of a host or a group of hosts. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide. Optionally, said (re)written polynucleotide has no undesired restriction site. Optionally, at least one desired restriction site has been introduced in said (re)written polynucleotide.
The invention further encompasses a (re)written polynucleotide containing a coding sequence for a polypeptide having no undesired restriction site. In one embodiment of the invention, said (re)written polynucleotide consists of a coding sequence for a polypeptide. The invention contemplates a (re)written polynucleotide having at least 500, 700 or 900 bp, more preferably at least 1, 1.5, 2, 2.5 or 3 kb.
The (re)written polynucleotides according to the invention are not native; hence, they cannot be found in nature.
The invention encompasses the polynucleotide (re)written by a method according to the present invention.
The invention also concerns an isolated polynucleotide comprising said (re)written polynucleotide according to the present invention.
The invention more particularly relates to any one of the (re)written sequences SEQ ID N° 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18, and to any isolated polynucleotide comprising any one of said (re)written sequences, a complementary sequence thereto or a fragment thereof. The invention concerns any polynucleotide comprising or consisting of a fragment of at least 20, 30, 50, 100 or 200 consecutive nucleotides from any one of the (re)written sequences SEQ ID N° 1, 3, 5, 7, 9, 11, 13, 15 and 17.
The (re)written polynucleotides can be synthesized by any method known to those skilled in the art. For example, the following articles and patents describe methods of gene synthesis (Engels et al, Adv Biochem Eng Biotechnol 1988;37:73-127; Beattie et al, Biotechnol Appl Biochem. 1988 Dec;10(6):510-21; Casimiro et al, Structure. 1997 Nov 15;5(11):1407-12; Scheller et al, Nat Biotechnol. 2001 Jun;19(6):573-577; Massaer et al, Int Arch Allergy Immunol. 2001 May;125(1):32-43; Traub et al, Appl Microbiol Biotechnol. 2001 Mar;55(2):198-204; Chalmers et al, Biotechniques. 2001 Feb;30(2):249-52; Withers-Martinez et al, Protein Eng. 1999 Dec;12(12):1113-20; Alexeyev et al, Biochim Biophys Acta. 1999 Jul 15;1419(2):299-306; Au et al, Biochem Biophys Res Commun. 1998 Jul 9;248(1):200-3; Henry et al, C R Acad Sci III. 1999 Dec;322(12):1061-70; US 6,110,668; US 5,158,877; US 5,093,251; WO 99/14318; EP 0406937; these articles and patents are hereby incorporated in their entirety by reference). The invention concerns a method of producing a polynucleotide containing a coding sequence for a polypeptide, comprising the steps of : a) (re)writing said polynucleotide by any (re)writing method according to the present invention; and b) synthesizing said polynucleotide.
Vector, Host Cell, and Animals; Use of (Re)written polynucleotide
The invention also encompasses expression vectors, cells and living organisms genetically modified so as to comprise and/or express any of the polynucleotides object of the invention or a complementary sequence thereto. The invention further encompasses a cell or a living organism containing a vector comprising a (re)written polynucleotide according to the invention. More particularly, the living organism is a transgenic animal or plant. Preferably, said transgenic animal is murine, more preferably a mouse. Preferably, said transgenic plant is selected from sweet pepper, cucumber, sunflower, leek, sugar beet, tomato, carrot, Brassica napus, chicory, corn, wheat, barley, cotton, soybean, triticale, oat, tobacco, rye and rice. Preferably, the cell comprising a (re)written polynucleotide according to the invention is an embryonic stem cell or a fertilized egg. Additionally, the cell comprising a (re)written polynucleotide according to the invention may be a protoplast. More preferably, said embryonic stem cell or fertilized egg is murine, preferably from a mouse. In another embodiment, the cell can be a differentiated cell.
The host cell can be of the same species as the polypeptide to be expressed or of a different species. The host cell can be different from the cell naturally expressing the polypeptide. In one embodiment, the host cell is a differentiated cell. Preferably, the host cell is a differentiated cell that does not naturally express the encoded polypeptide.
The host organism, or host, can refer to an organism, more preferably to a group of organisms such as higher or lower eukaryotes, prokaryotes or plants; still more preferably, it refers to a combination of eukaryotes, prokaryotes and plants.
In further aspects, the present invention relates to expression vectors, cells and living organisms genetically modified to comprise and/or express any of the isolated polynucleotides comprising or consisting of a (re)written polynucleotide according to the invention. "Genetically modified" cells and living organisms would preferably integrate and express a foreign DNA inserted therein. Well-known methods for reliably inserting a foreign DNA into cells and/or living organisms include bacterial transformation, transgenesis, stem cell transformation, viral transfection and artificial chromosome insertion. Once inserted, the foreign DNA may be integrated into the genome of the host or be present in a non-integrated form (episomal, plasmidic or viral). It may also be inserted into an artificial chromosome or into an independent genome, such as the genome of a bacterium parasitizing a eukaryotic cell.
It is also an object of this invention to provide a method to express in a host an isolated polynucleotide comprising a (re)written polynucleotide according to the present invention. This method is characterized in that it comprises the steps of providing an isolated polynucleotide for which expression is desired by (re)writing said polynucleotide containing a coding sequence according to a method of the present invention, and expressing said polynucleotide in said host. In a preferred embodiment of the expression method, said host is eukaryotic. The method generally also comprises the step of introducing said isolated polynucleotide into the host using a method preferably selected from the group comprising transgenesis, viral transfection, bacterial transformation, artificial chromosome insertion or homologous recombination, as disclosed for example by Capecchi (Trends Genet., 1989, 5:70-76) or by Brulet et al in European Patent No. 419621, those documents being incorporated herein by reference. Preferably, said polynucleotide has a predetermined CpG content. More preferably, the CpG dinucleotide content is 1 or 0. Still more preferably, the CpG dinucleotide content is 0. The (re)written polynucleotide is thereby capable of showing an increased and/or stabilized level of expression when introduced into a cell of said host, as compared to the level of expression of the native polynucleotide encoding the same polypeptide in the same host cell.
The invention concerns a method to stably express a polynucleotide in a eukaryotic host, comprising the steps of : a) (re)writing a polynucleotide in accordance with any of the (re)writing methods of the present invention; b) expressing said polynucleotide in said host. Additionally, the invention concerns a method to stably express a polynucleotide in a eukaryotic host, comprising the steps of : a) (re)writing a polynucleotide in accordance with any of the (re)writing methods of the present invention; b) inserting the (re)written polynucleotide of step a) into the host cell; and c) inducing the expression of said (re)written polynucleotide of step b). In a preferred embodiment of the method to stably express a polynucleotide in a eukaryotic host, said (re)written polynucleotide has a minimized CpG dinucleotide content. Preferably, the CpG dinucleotide content is less than 1 %, more preferably less than 0.5 %, and most preferably less than 0.1 %. More preferably, the CpG dinucleotide content is less than 0.1 %, preferably less than 0.05 %, more preferably less than 0.01 % of CpG dinucleotides. Still more preferably, the CpG dinucleotide content is 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, said (re)written polynucleotide is CpG-free. The minimized CpG dinucleotide content of the (re)written polynucleotide avoids epigenetic silencing due to de novo methylation of the CpG dinucleotides.
A polynucleotide encoding a polypeptide and having an increased content of CpG dinucleotides can be used for transient expression. Indeed, a high CpG dinucleotide content increases de novo methylation, which stimulates silencing of the polynucleotide, so that its expression is brief. A (re)written polynucleotide having a maximized CpG dinucleotide content could thus be used to reduce or silence its own expression.
Therefore, the invention concerns a method of reducing or silencing the expression of a polynucleotide in a host cell, comprising the steps of : a) (re)writing an isolated polynucleotide in accordance with any of the (re)writing method of the present invention; b) inserting into the host cell the (re)written polynucleotide; c) reducing or silencing the expression of said (re)written polynucleotide or of a cis-gene proximal or distal to said (re)written polynucleotide.
In a preferred embodiment of the method of reducing or silencing expression, said (re)written polynucleotide has a maximized CpG dinucleotide content. Preferably, "maximized" means that the CpG dinucleotide content is more than 1 %, preferably more than 5 %, more preferably more than 10 %.
The invention concerns the use of the (re)written polynucleotide according to the present invention for obtaining transgenic animals or plants and/or in gene therapy. In one embodiment, the gene therapy compensates a genetic defect. As methylation of CpG dinucleotides contributes to C->T mutations, removing the CpG dinucleotides from a gene could prevent such mutations. For example, the p53 gene can be rewritten and thereby protected against C->T mutations.
Additionally, a tumor suppressor gene and/or an invasion-suppressor gene can be rewritten so as to remove its CpG dinucleotides. The rewritten genes could thus escape silencing by hypermethylation.
Therefore, another embodiment of the present invention is the use of the (re)written polynucleotide according to the present invention in gene therapy intended for treating or preventing cancer formation. Preferably, the (re)written gene is a tumor suppressor gene or an invasion-suppressor gene.
The invention encompasses the use of the (re)written polynucleotide according to the present invention for the production of a protein or polypeptide of interest in prokaryotes or eukaryotes. Indeed, (re)writing the polynucleotide allows heterologous expression of a protein or polypeptide in all organisms. For example, a human protein can be expressed as an exogenous gene in a plant such as tobacco.
The invention covers also the use of (re)written polynucleotide for the prevention of an immune response against exogenous DNA used in genetic or cellular therapy. Preferably, said (re)written polynucleotide has a minimized content of CpG dinucleotides. Preferably, the CpG dinucleotide content is less than 1%, more preferably less than 0.5%, and most preferably less than 0.1%. More preferably, the CpG dinucleotide content is less than 0.1%, preferably less than 0.05%, more preferably less than 0.01% of CpG dinucleotides. Still more preferably, the CpG dinucleotide content is 1 or 0 CpG dinucleotide. In a more preferred embodiment of the invention, said (re)written polynucleotide is CpG free.
The invention is also concerned with the use of the (re)written polynucleotide having a minimized CpG dinucleotide content for the prevention of autoimmunity against endogenous methyl-CpG motifs, DNA used in genetic or cellular therapy, or any similar host sequences. Indeed, (re)written polynucleotides of the invention with no or a reduced number of CpG dinucleotides, fragments thereof or vectors containing them could be used to minimize a T-cell response against the T-cells or tissues treated with them. The invention thus proposes a new concept of DNA vaccination based on lowering or deleting the CpG dinucleotides of a whole polynucleotide that still encodes an immunoactive antigen.
Another aspect of the present invention is the use of the (re)written polynucleotide with a maximized CpG dinucleotide content for inducing a protective immune response in vivo or in vitro. Administering such a (re)written polynucleotide may help and increase the use of DNA vaccine methods in vivo. A better T-cell response could also be envisaged through in vitro stimulation of a patient's lymphocytes against a non-natural polynucleotide of interest according to the invention, as compared with the T-cell response against the natural, native polynucleotide.
The following examples are intended to further illustrate certain preferred embodiments of the invention and are not intended to limit the scope of the invention.
EXAMPLES
EXAMPLE 1
Example 1 provides manually (re)written polynucleotides encoding several meganucleases. In the figures illustrating the (re)writing of the polynucleotides, "more frequent codon" refers to a sequence using the most frequent codon for each amino acid, "preferential codon" refers to the (re)written sequence meeting the codon usage table, "CpG minus" refers to the (re)written sequence that contains no CpG dinucleotide while still meeting the codon usage table, and "restriction minus" refers to the (re)written sequence that contains no undesired restriction site and contains the desired restriction sites. Detailed examples are presented for:
1. I-CreI in Figures 6 to 9;
2. HO in Figures 10 to 13;
3. F-TevI in Figures 14 and 16;
4. I-DmoI in Figures 15 and 17.
The rewritten genes encoding meganucleases are listed as SEQ ID N° 1, 3, 5, 7, 9, 11 and 13, namely F-TevI (SEQ ID N° 1), HO (SEQ ID N° 3), I-CreI (SEQ ID N° 5), I-DmoI (SEQ ID N° 7), I-SceI (SEQ ID N° 9), I-TevIII (SEQ ID N° 11) and PI-SceI (SEQ ID N° 13).
EXAMPLE 2
Example 2 provides three (re)written polynucleotides encoding the PI-MtuI and I-BmoI meganucleases, obtained by a computerized process, respectively SEQ ID N° 15 and 17. The computerized process is generally at least 100-fold faster than manual (re)writing. Furthermore, the computerized process meets the codon usage table more closely.
A detailed example for PI-MtuI is presented in Figure 21.
EXAMPLE 3
The (re)written polynucleotides encoding the meganucleases were synthesized as follows. For each (re)written polynucleotide, 80-mer oligonucleotides were designed so as to cover the whole (re)written polynucleotide on both strands, each overlapping its neighbours by 50 %.
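One way to lay out such oligonucleotides is sketched below: 80-mers tiled on each strand, with the bottom-strand tiling shifted by 40 nt, so that each oligo overlaps the two opposite-strand oligos next to it by 40 nt (50 %). The shift, the clipping of the end oligos to shorter lengths and the helper names are assumptions; the text does not say how the ends were handled.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_oligos(gene, length=80):
    """Tile both strands with `length`-nt oligos, the two tilings offset by half a length."""
    half = length // 2
    top = [gene[s:s + length] for s in range(0, len(gene), length)]
    # Bottom-strand oligos start half a length earlier; the first and last ones are clipped.
    bottom = [reverse_complement(gene[max(0, s):s + length])
              for s in range(-half, len(gene), length)]
    return top, bottom


gene = ("ATGAAACCCGGGTTT" * 22)[:320]
top, bottom = design_oligos(gene)
print([len(o) for o in top], [len(o) for o in bottom])
```

For a 320 nt fragment this gives 4 top-strand and 5 bottom-strand oligos, in line with the 8 to 12 oligonucleotides per first PCR described next.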
A first PCR was performed with 8 to 12 oligonucleotides (4 to 6 for each strand, 5 pmol of each oligonucleotide), using 1 unit of high-fidelity Taq polymerase in a 50 μl reaction volume with the following cycles: 1x 94 °C for 5 min; 25x (94 °C for 30 sec, 72 °C for 2 min); and 1x 72 °C for 2 min. This first PCR yielded 300 to 400 bp fragments. The first PCR products were loaded on an agarose gel, and the expected band was cut out. The product contained in this band was purified on a silica column (NucleoSpin® Extract).
Two first-PCR fragments overlapping by at least 50 nucleotides were used for a second PCR, with two primers corresponding to the ends of the fragments: 1/5 of the purified product of the first PCR was used with 20 pmol of primers. The PCR was performed with 1 unit of high-fidelity Taq polymerase in a 50 μl reaction volume with the following cycles: 1x 94 °C for 5 min; 25x (94 °C for 30 sec, 61 °C for 1 min, 72 °C for 1 min); and 1x 72 °C for 5 min.
Additional PCRs were performed until the (re)written polynucleotide was complete.
EXAMPLE 4
The following table indicates whether the (re)written polynucleotide sequences are expressed in the host cells. Three types of host cells were assayed: bacteria, yeast and mammalian cells.
[Table: expression of the (re)written polynucleotides in bacteria, yeast and mammalian cells (image not reproduced)]
While several embodiments of the invention have been described, it will be understood that the present invention is capable of further modification, and this application is intended to cover any variations, uses or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains, and as may be applied to the essential features hereinbefore set forth and falling within the scope of the invention or the limits of the appended claims.

Claims

What is claimed :
1. A method for writing a polynucleotide containing a coding sequence for a polypeptide, comprising the steps of : a) providing at least one database containing the amino-acids of said polypeptide and the corresponding codons; b) reading at least one amino-acid from said polypeptide sequence; c) selecting from said database one codon which encodes said amino-acid(s); d) repeating steps b) and c) for all amino-acids of said polypeptide, the polynucleotide being written with the selected codons; whereby the written polynucleotide has a CpG dinucleotide content adjusted to a predetermined value.
2. Method according to claim 1, wherein the content of CpG dinucleotide in the written polynucleotide is minimized.
3. Method according to claim 2, wherein the content of CpG dinucleotide in the written polynucleotide is less than 1%.
4. Method according to claim 3, wherein the content of CpG dinucleotide in the written polynucleotide is 1 or 0.
5. Method according to any of claims 1 to 4, wherein:
- step a) further comprises providing the codon usage table corresponding to one host or to a group of hosts; whereby the written polynucleotide sequence meets the codon usage of said host or of said group of hosts.
6. Method according to any of claims 1 to 5, wherein the selection steps are performed one codon at a time.
7. Method according to claim 6, wherein the selection steps are performed one codon at a time, and the selected codon is the one that is closest to the codon usage determined with the so-far written polynucleotide.
8. Method according to any of claims 1 to 7, wherein the selection steps are performed on a batch of k codons.
9. Method according to claim 8, wherein the selection steps are performed on a batch of k codons, and the selected batch of k codons is the one that is closest to the codon usage determined with the so-far written polynucleotide.
10. Method according to any of claims 1 to 9, further comprising the steps of :
- (i) providing a further database containing nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding 2- or 3-amino-acid sequences encoded by said nucleotide sequences in the three reading frames;
- (ii) locating a sequence of 2 or 3 amino-acids which corresponds to a restriction enzyme site in said polypeptide, by retrieving said sequence from said database;
- (iii) selecting at least one codon from said first database to match with the nucleotide sequence of the restriction enzyme site ;
- (iv) repeating steps (ii) to (iii); whereby the written polynucleotide sequence contains at least one desired restriction enzyme site.
11. Method according to claim 10, whereby said desired restriction sites are introduced in the written polynucleotide sequence at a predetermined distance from each other.
12. Method according to claim 10, whereby a desired restriction site is introduced between each functional unit of the gene.
13. Process according to any of claims 1 to 7 and 10 to 12, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
- step b) comprises reading the jth amino-acid of said part of the polypeptide;
- step c) comprises the sub- steps of: - cl) reading the 0"O* codon in the part of the written polynucleotide correponding to said part of the polypeptide;
- c2) selecting from said database one codon which codes for said jth amino-acid, the selected codon considered together with said 0-1)* codon of said part of the written polynucleotide containing no CpG dinucleotide; - c3) placing the selected codon at the jth codon location in said part of the written polynucleotide; - step d) comprises repeating step b) and c) by increasing j each time by one from j=2 to j=n.
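The left-to-right pass of claim 13 can be sketched as follows. The claim only looks back at the (j-1)th codon. A purely backward choice can reach a dead end, however: every codon for Ala, Val, Asp, Glu and Gly begins with G, so if the preceding codon ends in C, no CpG-free option remains. The sketch therefore adds a one-residue lookahead. This is an assumption made for the example and is not part of the claim. Ties are broken by table order, and the function name is hypothetical.

```python
from collections import defaultdict

# Standard genetic code, generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)  # amino acid -> its synonymous codons, in TCAG order
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def write_cpg_free(protein: str) -> str:
    """Back-translate `protein` codon by codon without creating any CpG."""
    written: list[str] = []
    for j, aa in enumerate(protein):
        prev = written[-1] if written else ""
        # c2): the candidate read together with the (j-1)th codon must hold no "CG".
        candidates = [c for c in SYNONYMS[aa] if "CG" not in prev + c]
        # Assumed lookahead (not in the claim): the next amino acid must still
        # have a codon that forms no CpG with this one.
        if j + 1 < len(protein):
            following = SYNONYMS[protein[j + 1]]
            candidates = [c for c in candidates if any("CG" not in c + n for n in following)]
        if not candidates:
            raise ValueError(f"no CpG-free codon for {aa!r} at position {j + 1}")
        written.append(candidates[0])  # c3): place it at the jth codon location
    return "".join(written)


print(write_cpg_free("MRGAVDEPST*"))  # ATGAGAGGTGCTGTTGATGAACCTTCTACTTAA
```

Because every amino acid (and the stop signal) has a CpG-free codon that neither starts with G nor ends in C, except those whose codons all start with G, the one-residue lookahead is enough to prevent the pass from ever raising.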
14. Process according to claim 13, wherein:
- step a) further comprises providing the codon usage table corresponding to a host or a group of hosts;
- sub-step c2) further requires that the selected codon, considered together with the first to (j-1)th codons of said part of the written polynucleotide, be the closest to the codon usage with respect to any one of the codons of the same group.
15. Process according to claim 13 or 14, wherein:
- step a) further comprises providing a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- sub-step c2) further requires that the selected codon, considered together with said (j-1)th codon of said part of the written polynucleotide, contain no undesired enzyme restriction site listed in the second database.
16. Process according to claim 15, wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides, and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon, considered together with said (j-2)th and (j-1)th codons of said part of the written polynucleotide, does not contain an enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
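Below is a sketch of the acceptance test in claims 15 and 16. It assumes that sites are plain uppercase strings and uses the 8-nt PacI site (TTAATTAA) purely as an example. Claim 16 reads the (j-2)th and (j-1)th codons. The helper below instead keeps the last len(site) - 1 written bases, which is seven bases for an 8-nt site. This also catches a site that begins on the last base of the (j-3)th codon. That wider tail is a choice made for this example, and the function names are hypothetical.

```python
def introduces_site(written: str, candidate: str, sites: list[str]) -> bool:
    """True if appending `candidate` to `written` completes an undesired site.

    Only the last len(site) - 1 written bases are kept, so any match found
    must overlap the candidate codon (older sites are never re-reported).
    """
    for site in sites:
        tail = written[max(0, len(written) - (len(site) - 1)):]
        if site in tail + candidate:
            return True
    return False


def acceptable(written: str, candidate: str, sites: list[str]) -> bool:
    """The candidate passes if it forms no CpG with the previous base and no undesired site."""
    return "CG" not in written[-1:] + candidate and not introduces_site(written, candidate, sites)


PACI = "TTAATTAA"  # 8-nt example site
written = "ATGTTAATT"  # ATG TTA ATT (Met-Leu-Ile)
print([c for c in ("AAA", "AAG") if acceptable(written, c, [PACI])])  # [] : both Lys codons complete PacI
print(acceptable(written, "GAA", [PACI]))  # True
```

In this example both lysine codons complete PacI after Met-Leu-Ile. To resolve this, an upstream codon has to change: with TTG instead of TTA for Leu, AAA becomes acceptable. The sketch only reports the conflict.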
17. Process according to any one of claims 1 to 5 and 8 to 12, wherein, for at least a part of the polypeptide corresponding to n successive amino-acids:
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th amino-acid of said part of the polypeptide;
- b2) reading the ith to (i+k)th amino-acid(s) of said part of the polypeptide, k being greater than or equal to 1 and less than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which codes for said ith amino-acid, such that the selected codon, considered together with the (i-1)th codon of said part of the written polynucleotide, contains no CpG dinucleotide;
- c2) placing the selected codon at the ith codon location in said part of the written polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
18. Process according to claim 17, wherein:
- step a) further comprises providing the codon usage table corresponding to a host or a group of hosts; and
- in sub-step c2), the selected codon, considered together with the first to (i-1)th codons of said part of the written polynucleotide, is the closest to the codon usage with respect to any one of the codons of the same group.
19. Process according to claim 17 or 18, wherein:
- step a) further comprises providing a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- sub-step c2) further requires that the selected codon, considered together with the (i-1)th codon of said part of the written polynucleotide, contain no undesired enzyme restriction site listed in the second database.
20. Process according to claim 19, wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides, and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the written polynucleotide corresponding to said part of the polypeptide;
- in sub-step c2), the selected codon, considered together with said (i-2)th and (i-1)th codons of said part of the written polynucleotide, does not contain an enzyme restriction site listed in the second database;
- in step d), i is increased each time by one from i=3 to i=n.
21. A method for rewriting a first polynucleotide into a second polynucleotide, each containing a sequence coding for a polypeptide, comprising the steps of: a) providing at least one database containing groups of codons encoding the same amino-acid; b) reading at least one codon from said first polynucleotide; c) selecting from said database one codon which belongs to the same group as the read codon(s), and which can be identical to or different from said read codon; d) repeating steps b) and c) for all codons of said coding sequence of said polynucleotide, the polynucleotide being written with the selected codons; whereby the rewritten polynucleotide has a CpG dinucleotide content adjusted to a predetermined value.
22. Method according to claim 21, wherein the content of CpG dinucleotide in the rewritten polynucleotide is minimized.
23. Method according to claim 22, wherein the content of CpG dinucleotide in the rewritten polynucleotide is less than 1%.
24. Method according to claim 23, wherein the rewritten polynucleotide contains one CpG dinucleotide or none.
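Claims 23 and 24 use two different measures: a percentage and an absolute count. The claims do not define the denominator of the percentage. The sketch below assumes CpG dinucleotides per dinucleotide position, i.e. length minus one. A different convention, such as per total bases, would shift the values slightly. The function names are hypothetical.

```python
def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides. str.count is exact here because
    one "CG" cannot overlap another."""
    return seq.upper().count("CG")


def cpg_percent(seq: str) -> float:
    """CpGs per dinucleotide position, in percent (assumed definition)."""
    return 100.0 * cpg_count(seq) / max(1, len(seq) - 1)


seq = "ATGAGACCTAAA"
# Checks in the spirit of claim 23 (< 1 %) and claim 24 (one CpG or none).
print(cpg_count(seq), cpg_percent(seq) < 1.0, cpg_count(seq) <= 1)  # 0 True True
```

Under this definition, a 2 kb sequence (claim 45) with a single CpG scores about 0.05 %, so it satisfies claims 23 and 24 alike.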
25. Method according to any one of claims 21 to 24, whereby the codons or the two consecutive codons without CpG dinucleotide are maintained.
26. Method according to any one of claims 21 to 24, whereby the codons or the two consecutive codons without CpG dinucleotide are partially or totally exchanged into further codons without CpG dinucleotide.
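Claims 25 and 26 distinguish between keeping CpG-free codons and exchanging them. The sketch below keeps every original codon that is compatible with its already-rewritten neighbour and exchanges only the others for the first suitable synonym. It assumes a one-residue lookahead, which is not stated in the claims: a codon ending in C is exchanged when every codon of the next amino acid starts with G. The function name is hypothetical, and the input is assumed to be an uppercase, in-frame coding sequence.

```python
from collections import defaultdict

# Standard genetic code, generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def rewrite_keep_cpg_free(first: str) -> tuple[str, int]:
    """Rewrite `first` without CpG, keeping each original codon that passes.

    Returns the rewritten sequence and the number of exchanged codons.
    """
    codons = [first[i:i + 3] for i in range(0, len(first), 3)]
    protein = [CODON_TO_AA[c] for c in codons]
    out: list[str] = []
    changed = 0
    for j, (codon, aa) in enumerate(zip(codons, protein)):
        prev = out[-1] if out else ""
        following = SYNONYMS[protein[j + 1]] if j + 1 < len(protein) else [""]

        def fits(c: str) -> bool:
            # No CpG with the previous rewritten codon, and the next amino acid
            # keeps at least one CpG-free option (assumed lookahead).
            return "CG" not in prev + c and any("CG" not in c + n for n in following)

        if fits(codon):
            out.append(codon)  # claim 25: CpG-free codon maintained
        else:
            out.append(next(c for c in SYNONYMS[aa] if fits(c)))  # claim 26: exchanged
            changed += 1
    return "".join(out), changed


print(rewrite_keep_cpg_free("ATGCGTCCGAAA"))  # ('ATGAGACCTAAA', 2)
```

Keeping unchanged codons wherever possible leaves most of the original sequence untouched. This may matter when the first polynucleotide has already been validated.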
27. Method according to any one of claims 21 to 26, wherein:
- step a) further comprises providing the codon usage table corresponding to one host or to a group of hosts; whereby the rewritten polynucleotide sequence meets the codon usage of said host or of said group of hosts.
28. Method according to any one of claims 21 to 27, wherein the selection steps are performed one codon at a time.
29. Method according to claim 28, wherein the selection steps are performed one codon at a time, and the selected codon is the one that is closest to the codon usage that is determined with the so-far written polynucleotide.
30. Method according to any one of claims 21 to 27, wherein the selection steps are performed on a batch of k codons.
31. Method according to claim 30, wherein the selection steps are performed on a batch of k codons, and the selected batch of k codons is the one that is closest to the codon usage determined with the so-far written polynucleotide.
32. Method according to any one of claims 21 to 31, further comprising the steps of:
- (i) providing a further database containing nucleotide sequences corresponding to desired restriction enzyme sites, as well as the corresponding sequences of 2 or 3 amino-acids encoded by said nucleotide sequences in the three reading frames;
- (ii) providing the 3 sequences of amino-acids which correspond to the 3 different reading frames of the first polynucleotide sequence, and reading them;
- (iii) locating a sequence of 2 or 3 amino-acids which corresponds to a desired restriction enzyme site, by retrieving said sequence from said database;
- (iv) selecting at least one codon from said first database to match the nucleotide sequence of the desired restriction enzyme site;
- (v) repeating steps (ii) to (iv); whereby the rewritten polynucleotide sequence contains at least one further desired restriction enzyme site, when compared to the first polynucleotide sequence.
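Step (ii) of claim 32 requires translations of the first polynucleotide in all three reading frames. The minimal sketch below assumes forward-strand frames only and an uppercase DNA string, and simply drops any incomplete trailing codons. `find_motif` is a hypothetical helper that shows how a 2 or 3 amino-acid sequence from step (iii) might be looked up in each frame.

```python
# Standard genetic code, generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def three_frames(seq: str) -> list[str]:
    """Translations starting at offsets 0, 1 and 2; incomplete trailing codons are dropped."""
    return [
        "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(f, len(seq) - 2, 3))
        for f in range(3)
    ]


def find_motif(seq: str, motif: str) -> list[tuple[int, int]]:
    """(frame, amino-acid position) pairs where `motif` occurs in a frame translation."""
    hits = []
    for frame, peptide in enumerate(three_frames(seq)):
        start = peptide.find(motif)
        while start != -1:
            hits.append((frame, start))
            start = peptide.find(motif, start + 1)
    return hits


print(three_frames("ATGGAATTCTAA"))  # ['MEF*', 'WNS', 'GIL']
print(find_motif("ATGGAATTCTAA", "EF"))  # [(0, 1)]
```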
33. Method according to claim 32, whereby said desired restriction sites are introduced in the (re)written polynucleotide sequence at a predetermined length from each other.
34. Method according to claim 32, whereby a desired restriction site is introduced between each functional unit of the gene.
35. Method according to any one of claims 21 to 29 and 32 to 34, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
- step b) comprises reading the jth codon of said part of the first polynucleotide;
- step c) comprises the sub-steps of:
- c1) reading the (j-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- c2) selecting from said database one codon which belongs to the same group as the jth codon of said part of the first polynucleotide, and which can be identical to or different from the jth codon, such that the selected codon, considered together with said (j-1)th codon of said part of the second polynucleotide, contains no CpG dinucleotide;
- c3) placing the selected codon at the jth codon location in said part of the second polynucleotide;
- step d) comprises repeating steps b) and c) by increasing j each time by one from j=2 to j=n.
36. Method according to claim 35, wherein,
- step a) further comprises providing the codon usage table corresponding to a host or a group of hosts;
- sub-step c2) further requires that the selected codon, considered together with the first to (j-1)th codons of said part of the second polynucleotide, be the closest to the codon usage with respect to any one of the codons of the same group.
37. Method according to claim 35 or 36, wherein,
- step a) further comprises providing a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites ;
- sub-step c2) further requires that the selected codon, considered together with said (j-1)th codon of said part of the second polynucleotide, contain no undesired enzyme restriction site listed in the second database.
38. Method according to claim 37, wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides, and:
- sub-step c1) further comprises reading the (j-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon, considered together with said (j-2)th and (j-1)th codons of said part of the second polynucleotide, does not contain an enzyme restriction site listed in the second database;
- in step d), j is increased each time by one from j=3 to j=n.
39. Method according to any one of claims 21 to 27 and 30 to 34, wherein, for at least a part of the first polynucleotide corresponding to n successive codons of the first polynucleotide:
- step b) comprises the sub-steps of:
- b1) reading the (i-1)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- b2) reading the ith to (i+k)th codon(s) of the first polynucleotide, k being greater than or equal to 1 and less than or equal to n;
- step c) comprises the sub-steps of:
- c1) selecting from said database one codon which belongs to the same group as the ith codon of the first polynucleotide, and which can be identical to or different from the ith codon, such that the selected codon, considered together with the (i-1)th codon of said part of the second polynucleotide, contains no CpG dinucleotide;
- c2) placing the selected codon at the ith codon location in said part of the second polynucleotide;
- c3) repeating sub-steps c1) and c2) until the k codons have been selected;
- step d) comprises repeating steps b) and c) by increasing i by (k+1) from i=2 to i=n.
40. Method according to claim 39, wherein,
- step a) further comprises providing the codon usage table corresponding to a host or a group of hosts;
- sub-step c1) further requires that the selected codon, considered together with the first to (i-1)th codons of said part of the second polynucleotide, be the closest to the codon usage with respect to any one of the codons of the same group.
41. Method according to claim 39 or 40, wherein,
- step a) further comprises providing a second database containing nucleotide sequences corresponding to undesired enzyme restriction sites;
- sub-step c1) further requires that the selected codon, considered together with the (i-1)th codon of said part of the second polynucleotide, contain no undesired enzyme restriction site listed in the second database.
42. Method according to claim 41, wherein said second database contains sequences corresponding to undesired enzyme restriction sites of 8 nucleotides and:
- sub-step b1) further comprises reading the (i-2)th codon in the part of the second polynucleotide corresponding to said part of the first polynucleotide;
- in sub-step c2), the selected codon, considered together with said (i-2)th and (i-1)th codons of said part of the second polynucleotide, does not contain an undesired enzyme restriction site listed in the second database, and k is at least 3.
43. Method according to claim 1 or 21, whereby the CpG dinucleotide content of the (re)written sequence is maximized.
44. Method according to claim 43, whereby the CpG dinucleotide content of the (re)written polynucleotide is higher than 1%.
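Claim 43 reverses the objective of claims 22 to 24. Choosing the CpG-richest codon greedily is not optimal. For Arg-Ala, a greedy choice that breaks ties by table order picks CGT and then GCG, giving two CpGs, whereas CGC GCG gives three. The sketch below therefore swaps in dynamic programming over the last written base. This approach is exact because the CpG contribution of the next codon depends only on that base. It is offered as an illustration and is not the method described in the application. The function name is hypothetical.

```python
from collections import defaultdict

# Standard genetic code, generated from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def cpg_extreme(protein: str, maximize: bool = True) -> str:
    """Back-translate `protein` with the most (or, if maximize=False, the fewest) CpGs.

    `best` maps the last written base to (score, sequence). The CpG contribution
    of the next codon depends only on that base, so one entry per base is enough.
    """
    sign = 1 if maximize else -1
    best: dict[str, tuple[int, str]] = {"": (0, "")}
    for aa in protein:
        step: dict[str, tuple[int, str]] = {}
        for last, (score, seq) in best.items():
            for codon in SYNONYMS[aa]:
                # Counts a junction CpG (last + codon[0]) plus any CpG inside the codon.
                cand = (score + sign * (last + codon).count("CG"), seq + codon)
                end = codon[-1]
                if end not in step or cand[0] > step[end][0]:
                    step[end] = cand
        best = step
    return max(best.values(), key=lambda t: t[0])[1]


print(cpg_extreme("RA"))  # CGCGCG
```

The same routine with `maximize=False` yields a CpG-free back-translation. The maximized output can then be checked against the 1% threshold of claim 44.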
45. Method according to any one of claims 1 to 44, whereby the (re)written polynucleotide sequence is longer than 500 bp, preferably longer than 1 kb, and more preferably longer than 2 kb.
46. Method according to any one of claims 1 to 45, wherein the method is a computerized process.
47. An isolated polynucleotide comprising or consisting of a (re)written polynucleotide obtained by a method according to any one of claims 1 to 46.
48. Polynucleotide according to claim 47, wherein said polynucleotide is longer than 500 bp, preferably longer than 1 kb, and more preferably longer than 2 kb, and contains one CpG dinucleotide or none.
49. Polynucleotide according to claim 48, wherein said (re)written polynucleotide meets the codon usage table of a host or a group of hosts.
50. Polynucleotide according to any one of claims 47 to 49, wherein said (re)written polynucleotide does not comprise any undesired restriction site.
51. Polynucleotide according to any one of claims 47 to 50, wherein said (re)written polynucleotide contains at least one desired restriction site.
52. A process for stably expressing a polynucleotide in a eukaryotic host, comprising the steps of: a) (re)writing a polynucleotide sequence in accordance with any one of claims 1 to 46; b) expressing said polynucleotide in said host.
53. Process according to claim 52, further comprising the step of introducing said polynucleotide into the host.
54. An expression vector, wherein the vector comprises at least one (re)written polynucleotide sequence in accordance with any one of claims 1 to 46, or a complementary sequence thereto; or an isolated polynucleotide in accordance with any one of claims 47-51 or a complementary sequence thereto.
55. A non-human transgenic animal or a transgenic plant, wherein the transgenic animal or plant comprises at least one (re)written polynucleotide sequence in accordance with any one of claims 1 to 46, or a complementary sequence thereto; or an isolated polynucleotide in accordance with any one of claims 47 to 51, or a complementary sequence thereto; or a vector according to claim 54.
56. The use of the (re)written polynucleotide in accordance with any one of claims 1 to 46, or a complementary sequence thereto; or an isolated polynucleotide in accordance with any one of claims 47 to 51 or a complementary sequence thereto, for obtaining non-human transgenic animals, transgenic plants and in gene therapy.
57. The use of the (re)written polynucleotide sequence in accordance with any one of claims 1 to 46, or a complementary sequence thereto; or an isolated polynucleotide in accordance with any one of claims 47 to 51 or a complementary sequence thereto, for the production of a protein or polypeptide of interest in prokaryotes or eukaryotes.
58. A method of producing or synthesizing an improved polynucleotide encoding a desired expression product, the method comprising: - providing an improved polynucleotide sequence encoding said expression product using a method as described in any one of claims 1 to 46, or a complementary sequence thereto, and synthesizing a polynucleotide comprising said sequence.
59. The method of claim 58, wherein synthesis of the polynucleotide is performed by recombinant DNA technologies, artificial synthesis, mutagenesis, enzymatic techniques, cloning, and/or ligating, or a combination thereof.
60. The method of claim 58 or 59, further comprising the cloning of the polynucleotide into a vector.
PCT/EP2002/006043 2001-06-05 2002-06-03 Methods for modifying the cpg content of polynucleotides WO2002099105A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002317771A AU2002317771A1 (en) 2001-06-05 2002-06-03 Methods for modifying the cpg content of polynucleotides

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US29557401P 2001-06-05 2001-06-05
US60/295,574 2001-06-05

Publications (2)

Publication Number Publication Date
WO2002099105A2 2002-12-12
WO2002099105A3 2003-08-07

Family

ID=23138282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2002/006043 WO2002099105A2 (en) 2001-06-05 2002-06-03 Methods for modifying the cpg content of polynucleotides

Country Status (2)

Country Link
AU (1) AU2002317771A1 (en)
WO (1) WO2002099105A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998052581A1 (en) * 1997-05-20 1998-11-26 Ottawa Civic Hospital Loeb Research Institute Vectors and methods for immunization or therapeutic protocols
WO2000014262A2 (en) * 1998-09-09 2000-03-16 Genzyme Corporation Methylation of plasmid vectors
WO2001040478A2 (en) * 1999-12-06 2001-06-07 Institut Pasteur Isolated polynucleotides having a reduced or an increased content of epigenetic control motifs and uses thereof
WO2002072846A2 (en) * 2001-03-09 2002-09-19 Cayla Synthetic genes and bacterial plasmids devoid of cpg

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ISABELLE HENRY ET AL: "LagoZ et LagZ, deux gènes appauvris en dinucléotides CpG dérivés du gène LacZ pour l'étude des contrôles épigénétiques" [LagoZ and LagZ, two CpG-depleted genes derived from the LacZ gene for the study of epigenetic controls], LIFE SCIENCES, PERGAMON PRESS, OXFORD, GB, vol. 322, 1999, pages 1061-1070, XP002185406, ISSN: 0024-3205 *
SHIMSHEK D R ET AL: "Codon-improved Cre recombinase (iCre) expression in the mouse", GENESIS: THE JOURNAL OF GENETICS AND DEVELOPMENT, vol. 32, no. 1, January 2002 (2002-01), pages 19-26, XP009006802, ISSN: 1526-954X *
SKOPEK T R ET AL: "SYNTHESIS OF A LACI GENE ANALOGUE WITH REDUCED CPG CONTENT", MUTATION RESEARCH, AMSTERDAM, NL, vol. 349, no. 2, 1996, pages 163-172, XP001041417, ISSN: 0027-5107 *
TAN Y ET AL: "THE INHIBITORY ROLE OF CPG IMMUNOSTIMULATORY MOTIFS IN CATIONIC LIPID VECTOR-MEDIATED TRANSGENE EXPRESSION IN VIVO", HUMAN GENE THERAPY, vol. 10, 1 September 1999 (1999-09-01), pages 2153-2161, XP000951517, ISSN: 1043-0342 *
YEW N S ET AL: "HIGH AND SUSTAINED TRANSGENE EXPRESSION IN VIVO FROM PLASMID VECTORS CONTAINING A HYBRID UBIQUITIN PROMOTER", MOLECULAR THERAPY, ACADEMIC PRESS, SAN DIEGO, CA, US, vol. 4, no. 1, July 2001 (2001-07), pages 75-82, XP001079292, ISSN: 1525-0016 *
YEW N S ET AL: "REDUCED INFLAMMATORY RESPONSE TO PLASMID DNA VECTORS BY ELIMINATION AND INHIBITION OF IMMUNOSTIMULATORY CPG MOTIFS", MOLECULAR THERAPY, ACADEMIC PRESS, SAN DIEGO, CA, US, vol. 1, no. 3, March 2000 (2000-03), pages 255-262, XP001078874, ISSN: 1525-0016 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8530214B2 (en) 2003-01-28 2013-09-10 Cellectis S.A. Use of meganucleases for inducing homologous recombination ex vivo and in toto in vertebrate somatic tissues and application thereof
US7842489B2 (en) 2003-01-28 2010-11-30 Cellectis Use of meganucleases for inducing homologous recombination ex vivo and in toto in vertebrate somatic tissues and application thereof
US8697395B2 (en) 2003-01-28 2014-04-15 Cellectis S.A. Use of meganucleases for inducing homologous recombination ex vivo and in toto in vertebrate somatic tissues and application thereof
US8624000B2 (en) 2003-01-28 2014-01-07 Cellectis S.A. Use of meganucleases for inducing homologous recombination ex vivo and in toto in vertebrate somatic tissues and application thereof
US8211685B2 (en) * 2004-04-30 2012-07-03 Cellectis I-DmoI derivatives with enhanced activity at 37° C and use thereof
US10273486B2 (en) 2004-08-03 2019-04-30 Geneart Ag Method for modulating gene expression by modifying the CpG content
US8859275B2 (en) 2004-08-03 2014-10-14 Geneart Ag Method for modulating gene expression by modifying the CpG content
US10041053B2 (en) * 2007-10-31 2018-08-07 Precision Biosciences, Inc. Rationally-designed single-chain meganucleases with non-palindromic recognition sequences
WO2009095793A1 (en) * 2008-01-31 2009-08-06 Cellectis New i-crei derived single-chain meganuclease and uses thereof
US8927247B2 (en) 2008-01-31 2015-01-06 Cellectis, S.A. I-CreI derived single-chain meganuclease and uses thereof
US9404099B2 (en) 2009-11-27 2016-08-02 Basf Plant Science Company Gmbh Optimized endonucleases and uses thereof
CN102725412B (en) * 2009-11-27 2017-09-22 巴斯夫植物科学有限公司 Endonuclease of optimization and application thereof
WO2011064736A1 (en) * 2009-11-27 2011-06-03 Basf Plant Science Company Gmbh Optimized endonucleases and uses thereof
CN102725412A (en) * 2009-11-27 2012-10-10 巴斯夫植物科学有限公司 Optimized endonucleases and uses thereof
EP3149176A4 (en) * 2014-05-30 2017-11-08 The Trustees of Columbia University in the City of New York Methods for altering polypeptide expression
US11344608B2 (en) 2014-11-12 2022-05-31 Ucl Business Ltd Factor IX gene therapy
WO2016086988A1 (en) * 2014-12-03 2016-06-09 Wageningen Universiteit Optimisation of coding sequence for functional protein expression
US10842885B2 (en) 2018-08-20 2020-11-24 Ucl Business Ltd Factor IX encoding nucleotides
US11517631B2 (en) 2018-08-20 2022-12-06 Ucl Business Ltd Factor IX encoding nucleotides
WO2024067780A1 (en) * 2022-09-30 2024-04-04 南京金斯瑞生物科技有限公司 Codon optimization for reducing immunogenicity of exogenous nucleic acids

Also Published As

Publication number Publication date
AU2002317771A1 (en) 2002-12-16
WO2002099105A3 (en) 2003-08-07

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP