EP3052624A1 - Systematic optimization of coding sequence for functional protein expression - Google Patents

Systematic optimization of coding sequence for functional protein expression

Info

Publication number
EP3052624A1
Authority
EP
European Patent Office
Prior art keywords
codon
expression
coding sequence
protein
host cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP13773223.6A
Other languages
German (de)
French (fr)
Inventor
John Van Der Oost
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wageningen Universiteit
Original Assignee
Wageningen Universiteit
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wageningen Universiteit
Publication of EP3052624A1
Legal status: Withdrawn

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 General methods for enhancing the expression

Definitions

  • the present invention relates to a systematic approach to codon selection aimed at optimizing a coding sequence for functional expression of a heterologous protein in a host cell.
  • This approach recognizes that there is a certain codon landscape for optimal translation efficiency, and as such for maximal production of a protein.
  • This codon landscape consists of both optimal and non-optimal codons at certain positions in the nucleotide sequence, the pattern of which is hard to accurately predict with the current insights in the complex translation process.
  • the presented unbiased approach for optimization of the coding sequence relies on the provision of an expression library of synonymous coding sequences for the protein of concern. Expression of each such sequence in a given host cell is compared under certain conditions, desirably by high-throughput screening relying on screening for an optical signal.
  • Some features are general and can be easily manipulated by, for example, the choice of a suitable expression vector, e.g. adjusting promoter strength to enhance the transcription rate (DNA to mRNA), or by relatively straightforward genetic engineering, e.g. reducing any potential folding of the mRNA transcript around the translation start site in order to enhance the translation initiation process.
  • some features are difficult to predict as they are specific for each gene or protein, for each host, as well as for the expression conditions.
  • One such variable is 'codon bias'.
  • Codon bias relates to the fact that some ('optimal') codons are well recognised by the corresponding tRNA and result in relatively fast translation; on the other hand, translation of other ('non-optimal') codons is much less efficient.
  • codon bias differs both within and between genomes and variation in codon optimisation amongst individual genes results in differential speed and accuracy in their translation (Rocha, 2004 Genome Res. 14: 2279-2286; Hershberg and Petrov, 2008 Annu. Rev. Genet 42: 287-299; Sharp et al., 2010 Philos. Trans. R Soc. Lond. B Biol. Sci. 365: 1203-1212). Research in the early 1980s revealed that several bacterial and yeast species are subject to translational selection, whereby highly-expressed genes
  • Organisms clearly differ both qualitatively and quantitatively with respect to their set of tRNAs, in the extent of codon usage bias across their genomes and in the forces determining it (Botzman and Margalit, 2011 Genome Biol. 12: R109; Sharp et al., 2005 Nucleic Acids Res 33: 1141-1153).
  • heterologous expression may require adaptation of the codon bias to adjust it to match with the particular protein synthesis machinery of the production cell.
  • researchers have made attempts to understand the occurrence of 'non-optimal' codons.
  • the importance of 'non-optimal' (or 'slow') codons in loops for the successful co-translational folding of protein domains was demonstrated in Escherichia coli and Bacillus subtilis by Zhang et al. in 2009 (Nat. Struct. Mol. Biol. 16: 274-280).
  • the authors disclosed an algorithm to predict putative 'slow' translating regions in these species.
  • the method disclosed involves mapping the folding status of translation intermediates to determine whether local discontinuity in translation at certain regions in the mRNA sequence is needed to efficiently co-ordinate the rate of elongation of the peptide chain and its co-translational folding.
  • the algorithm is applied to the protein of interest to identify regions where 'non-optimal' codons may be required in order to allow slower translation and proper folding of the peptide.
  • the concentrations of iso-accepting tRNAs for a set of synonymous codons are variable, and the codon-reading programme can dramatically alter in response to both internal (gene expression level, GC content) and external factors (amino acid starvation, environmental stress, population size) (Elf et al., 2003 Science 300: 1718-1722; Subramaniam et al., 2013 PNAS 110: 2419-2424; Behura and Severson, 2013 Biol. Rev. Camb. Philos. Soc 88: 49-61; Botzman and Margalit, 2011). Variability in the codon reading programme may be particularly
  • the following factors have all been shown to be influential in determining the extent of codon bias: tRNA repertoire, codon position, GC content, expression level, gene length, amino acid conservation, transcriptional selection, RNA stability, protein hydrophobicity, recombination rates, environmental stress, population size, optimal growth temperature and organismal lifestyle (Chen et al., 2004 PNAS 101: 3480-3485; Hershberg and Petrov, 2009 PLoS Genetics 5: e1000556; Botzman and Margalit, 2011; Behura and Severson, 2013; Akashi, 1997 Gene 205: 269-278; Powell and Moriyama, 1997 PNAS 94: 7784-7790; Moriyama and Powell, 1998 Nucleic Acids Res. 26: 3188-3193; Powell et al., 2003 J. Mol. Evol. 57 Suppl. 1: S214-225).
  • preferential codon usage is likely to be different between different genes, organisms, environmental situations and gene expression contexts.
  • this invention provides a systematic approach to generating high-levels of functional protein expression for any gene of interest, in a host cell of interest and under specific conditions of interest.
  • Adopting a non-prejudicial, function-oriented approach to generating high-levels of protein expression by systematically altering individual codons alone, or in combination, without affecting the polypeptide sequence enables attainment of a codon complement tailored to the production of optimal levels of functional protein in the cell of interest which takes account of the conditions for protein production, including the actual (rather than theoretical) tRNA status.
  • the present invention provides a method of codon selection for provision of a coding sequence for functional expression of a heterologous protein in a host cell, which, where the wild-type coding sequence is expressible in said host cell in an expression construct, comprises:
  • each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein each non-wild-type variant coding sequence differs from the wild-type coding sequence at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions, the selected codon position(s) being a position or positions for which the host cell can provide more than one cognate tRNA and the variant coding sequences providing for each selected codon position both (a) the pre-determined optimal codon for translational efficiency and (b) one or more non-optimal synonymous codons;
  • the variant synonymous coding sequences provided will all differ from the wild-type coding sequence by just one codon and cover a number of selected codon positions. These may be all positions for which multiple cognate tRNAs are available in the host cell, although not necessarily so.
  • the expression library will include all possible synonymous codons.
  • step (i) of an expression library of synonymous coding sequences encoding said protein consisting of:
  • the variant coding sequences from the starting sequence will all exhibit just a single codon change compared to that sequence and cover a number of selected codon positions. All codon positions for which multiple cognate tRNAs are available may be included in the analysis, although again in some instances it may be chosen to cover less than the full complement of these. Again, for each selected codon position all possible synonymous codons may be provided.
  • a coding sequence derived by a method of the invention as above may be used as the starting sequence in substitution for said first coding sequence in a further cycle of codon selection.
  • This may, for example, comprise firstly provision of a set of variant sequences as defined above with the proviso that there is non-variance of one or more codons previously selected for substitution in the starting sequence.
  • each variant coding sequence of the expression library may incorporate codon
  • a cluster of sequential codon positions may be, for example, two adjacent codon positions or more than two adjacent codon positions, e.g. 3, 4, 5 or higher. Provision of an expression library for a further cycle of codon selection will be followed by repeat of steps (ii) and (iii) above. Further such cycles may be carried out.
  • the invention provides as its simplest mode for codon selection a method as follows resulting in provision of a coding sequence for functional expression of a heterologous protein in host cell. This method comprises:
  • each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein said variant coding sequences consist of:
  • these steps may be followed by one or more further rounds of codon selection as discussed.
  • the same method may be simply repeated but with one or more codons previously selected for substitution in the starting sequence being fixed. This may reveal optimal codon pairs and/or clusters.
  • determination of optimal functional protein expression may take account of need for co-expression of one or more further heterologous proteins in the same host cell at a desired level. Where only expression of a single heterologous protein is of concern, then codon selection will simply be on the basis of association with the highest functional expression of the desired protein.
  • a coding sequence may be provided for the protein of interest wherein each position for which synonymous codons can be used is either the pre-determined optimal codon or in preference a synonymous non-optimal codon which has been found to correspond with improved functional protein expression.
  • the provision of the coding sequence may comprise actual synthesis of a sequence comprising the coding sequence. This may be for example by de novo synthesis or possibly by modification of a pre-existing sequence, e.g. by site-directed mutagenesis. Subsequent expression of the coding sequence in the chosen host cell may be carried out.
  • the sequence will preferably be provided in an expression construct, e.g.
  • Figure 1 shows a hypothetical example of the systematic manipulation of individual codons and high-throughput screening of a library of synonymous variants to generate a sequence for improved functional protein expression in a specific production host.
  • a suitable model system for this approach is expression of green fluorescent protein (GFP) as a model protein in E. coli as a model host since functional expression of the model protein can be simply directly detected by fluorescence measurement.
  • GFP green fluorescent protein
  • Other models include proteins with established 3D structure that allow for easy functional screening and for which functional production has been established in the chosen host cell.
  • the essential starting point for a method of codon optimization according to the invention is a coding sequence which expresses the protein of interest and can be expressed to some degree as a heterologous protein in the chosen host cell.
  • the wild type sequence may be utilized as the starting sequence.
  • an expression library may be provided of synonymous coding sequences including the wild-type sequence and wherein each variant non-wild-type coding sequence differs from the wild-type coding sequence at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions.
  • the variant coding sequences will provide for each selected codon position (a position for which there is more than one cognate tRNA) both (a) the pre-determined optimal codon for translation efficiency and (b) one or more non-optimal synonymous codons. All possible codons will generally be provided for each selected position.
  • a different coding sequence may be chosen as the starting point for codon optimization.
  • This may conveniently be a synthetic sequence of pre-determined optimal codons, e.g. a commercially available coding DNA of pre-determined optimal codons or such a sequence with one or more non-optimal substitutions based on prior
  • an alternative starting expression library for codon optimization may be one in which the variant coding sequences consist of:
  • codons will commonly be provided for each selected position.
  • more than one round of codon optimization may be carried out using a method of codon selection according to the invention. All codon positions for which more than one cognate tRNA is available may be included in a round of analysis.
  • a common starting point will be an expression library in which compared to the starting sequence (either the wild-type sequence or first coding sequence as defined above) each variant sequence has just a single codon substitution and preferably the full length of the coding sequence is covered. It will be appreciated that where there is a starting methionine (Met) codon this will remain ATG in all variants since there are no synonymous codons (see Fig. 1). However, in some instances a different start codon may be present in the starting sequence or substituted.
  • Met methionine
  • prior information may mean that more directed codon substitutions may be made covering less than the full length sequence.
  • native protein coding sequences in that species commonly have a 'starting ramp' of about 10 codons long which directs low translation efficiency and is thought to be beneficial for overall functional expression of the corresponding proteins (Pechman and Frydman, ibid).
  • it may be chosen to apply the approach of the invention to codon optimization at a section of a protein coding sequence downstream of a 5' starting section of a plurality of codons, e.g. at least 10 codons, at least 20, 30, 40 or 50 codons, which may be kept in all variants tested as the wild-type sequence.
  • Each variant coding sequence may be conveniently generated by site-directed
  • pre-determined optimal codon may be equated with any of the following:
  • the approach of the invention can be applied to any type of host cell which can be cultured, and which is genetically accessible (i.e. able to serve as a host for functional production of the protein of interest (POI)). It may be applied to all host cells commonly employed for recombinant heterologous protein expression including bacterial cells (e.g. E. coli, Bacillus subtilis), yeast cells (e.g. S. cerevisiae, Pichia pastoris), fungal cells, insect cells and mammalian cell lines (e.g. CHO cells) and tumour cell lines (e.g. HeLa cells).
  • the POI may be a native protein of a host cell in which the native coding sequence for that protein has been knocked out. In these circumstances, the POI will be considered as a heterologous protein to the mutated host cell.
  • the expression constructs of the library may be located in plasmids (expression vectors) which are used to transform the host cell.
  • Methods of transformation may include, but are not limited to, heat shock, electroporation, particle bombardment, chemical induction, microinjection and viral transformation.
  • the expression levels of the protein for each synonymous coding variant are determined under the same expression conditions.
  • functional expression e.g. as with GFP or by enzymatic action of the protein of interest (POI) to generate a detectable optical signal.
  • POI protein of interest
  • the POI will be detectable by a high-throughput screening method, for example, relying on detection of an optical signal.
  • a tag may be, for example, a fluorescence reporter molecule translationally-fused to the C-terminal end of the POI, e.g. GFP, Yellow Fluorescent Protein (YFP), Red Fluorescent Protein (RFP) or Cyan Fluorescent Protein (CFP). It may be an enzyme which can be used to generate an optical signal.
  • Tags used for detection of expression may also be antigen peptide tags.
  • a tag may be provided for affinity purification, e.g. a His tag.
  • the codon selected from amongst synonymous codons for any selected position will be the codon associated with the highest or optimal observed functional expression of the POI, or where more than one codon provides substantially equal such expression, one such codon corresponding with that level of expression. Where there is choice of codons indicated for a selected position based on the expression data, preference may be given to the codon in the starting sequence, i.e. the wild type codon if the starting sequence is the wild-type sequence. This will minimise the number of codon changes to convert the starting sequence in a nucleic acid to the selected synonymous coding sequence for improved functional protein expression.
  • POIs may preferably be recovered from the cell culture medium as secreted proteins, although they may also be recovered from host cell lysates.
  • a method of the invention for codon optimization may be repeated with one or more further proteins, all the selected proteins for heterologous expression sharing one or more structural motifs.
  • the analysis carried out may further include determination of whether any signature pattern of optimal and non-optimal codons is associated with any shared structural motif.
  • the invention also extends to synthesizing a protein coding sequence including a signature pattern so determined and expressing the sequence, preferably in the same host cell and under the same expression conditions as employed for codon optimization.
  • a method of providing a DNA comprising a coding sequence for expressing a heterologous protein in a host cell which includes incorporating in said sequence a pattern of optimal and non-optimal codons at a site associated with provision of a structural motif, wherein said pattern enables increased expression efficiency of said protein in said host cell compared with the synonymous coding sequence containing solely optimal codons, wherein optimal codons are those codons pre-calculated to provide the highest codon translation efficiency in the host cell or the sole possible codon.
  • the DNA may be provided for example by site-directed mutagenesis or de novo synthesis and once obtained, the coding sequence for the protein of interest may desirably be expressed in said host cell.
  • GFP provides a suitable model protein for exemplification of codon optimization according to the invention starting with a wild-type coding sequence, since the protein is well characterized and its functional expression can be simply assayed by high throughput fluorescent screening in E. coli as a model host. Screening can be done either
  • Figure 1 illustrates generation of a synthetic library of synonymous coding sequences wherein the variants each have an initiating Met codon and each variant of the wild-type sequence has one codon change. For each codon position for which synonymous codons are available all possible codons are provided in the library.
  • The fluorescence associated with functional expression of each variant under the same conditions is determined. For each codon position for which multiple codons are present in the library, a codon is selected which gives the highest expression. Where one such codon is the wild-type codon, this is maintained in the final selected sequence.
  • the wild type sequence is supplemented by 3 variant synonymous coding sequences to provide all 4 codons for the amino acid Gly.
  • the variant providing the non-wild type codon GGC at codon position 2 is observed to give the highest fluorescence and is selected for codon position 2 in the new coding sequence.
  • For codon position 3, just one variant coding sequence is supplied to supplement the wild-type coding sequence so as to provide in the expression library both possible codons for Asp at that position.
  • the variant gives no significant difference in the level of fluorescence and thus the wild-type codon is maintained in preference in the new coding sequence.
  • the same approach to codon selection is repeated for other codon positions.
  • the selected complete new sequence may be obtained for expression by site-directed mutagenesis of the wild-type sequence.
  • the new selected sequence may be used as the starting sequence for one or more further rounds of codon optimization in accordance with the invention aimed at revealing optimal codon pairs and/or clusters.
  • the above method of the example may be repeated but with one initially selected codon at a variant codon position fixed.
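The codon selection rule described in the list above (take the codon with the highest observed functional expression, but keep the starting codon when it performs substantially equally) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_codons`, the 5% tolerance used to decide "substantially equal", and all fluorescence values are hypothetical assumptions chosen only to mirror the Gly/Asp example.

```python
def select_codons(start_codons, signals, rel_tol=0.05):
    """Pick one codon per screened position from expression measurements.

    start_codons: codons of the starting (e.g. wild-type) sequence, in order.
    signals: {codon_position (1-based): {codon: measured signal}}; signals are
             assumed positive and measured under the same conditions.
    rel_tol: assumed threshold for "substantially equal" expression; the
             source does not specify a numeric value.
    """
    selected = list(start_codons)
    for position, by_codon in signals.items():
        start = start_codons[position - 1]
        best_codon = max(by_codon, key=by_codon.get)
        best_signal = by_codon[best_codon]
        # Prefer the starting codon when it is within tolerance of the best,
        # which minimises the number of changes to the starting sequence.
        if by_codon.get(start, float("-inf")) >= best_signal * (1 - rel_tol):
            selected[position - 1] = start
        else:
            selected[position - 1] = best_codon
    return selected


# Hypothetical fluorescence readings (arbitrary units) for Met-Gly-Asp-stop.
wild_type_codons = ["ATG", "GGT", "GAT", "TAA"]
fluorescence = {
    2: {"GGT": 100.0, "GGC": 180.0, "GGA": 90.0, "GGG": 95.0},
    3: {"GAT": 100.0, "GAC": 102.0},
}
new_codons = select_codons(wild_type_codons, fluorescence)
print("".join(new_codons))
```

With these made-up numbers, position 2 switches to GGC while position 3 keeps the wild-type GAT, because the GAC variant is not meaningfully better. Fixing a selected codon and rerunning the screen, as the further rounds describe, would amount to calling the same function on a new set of measurements.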

Abstract

The present invention relates to a systematic approach to codon selection for provision of a coding sequence for functional expression of a heterologous protein in a host cell. This approach recognizes that non-optimal codons for translation efficiency may be desirable at certain sites and relies on the provision of an expression library of synonymous coding sequences for the protein of concern. Expression of each such sequence is compared under the same conditions in the host cell, desirably by high-throughput screening.
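As an illustration of what an expression library of synonymous coding sequences could look like in its simplest form, the sketch below enumerates every variant that differs from a starting sequence by exactly one synonymous codon. It assumes the standard genetic code, leaves stop codons unchanged, and uses a hypothetical four-codon toy sequence; the helper names are illustrative and not taken from the patent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(cds):
    """Split a coding sequence into a list of triplets."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def single_codon_variants(cds):
    """Return (codon_position, new_codon, variant_sequence) tuples.

    Each variant differs from the input at exactly one codon and encodes the
    same protein. Codon positions are 1-based.
    """
    codons = split_codons(cds.upper())
    variants = []
    for index, original in enumerate(codons):
        amino_acid = CODON_TABLE[original]
        if amino_acid == "*":
            continue  # assumption: stop codons are not varied
        for candidate, encoded in CODON_TABLE.items():
            if encoded == amino_acid and candidate != original:
                variant = codons[:index] + [candidate] + codons[index + 1:]
                variants.append((index + 1, candidate, "".join(variant)))
    return variants


wild_type = "ATGGGTGATTAA"  # hypothetical toy sequence: Met-Gly-Asp-stop
library = single_codon_variants(wild_type)
for position, codon, sequence in library:
    print(position, codon, sequence)
```

For this toy input the Met codon yields no variants, the Gly position yields three and the Asp position yields one, so the library holds the starting sequence plus four variants. Clusters or combinations of positions would need a different enumeration and are not covered by this sketch.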

Description

Systematic optimization of coding sequence for functional protein expression
Field of the invention
The present invention relates to a systematic approach to codon selection aimed at optimizing a coding sequence for functional expression of a heterologous protein in a host cell. This approach recognizes that there is a certain codon landscape for optimal translation efficiency, and as such for maximal production of a protein. This codon landscape consists of both optimal and non-optimal codons at certain positions in the nucleotide sequence, the pattern of which is hard to accurately predict with the current insights in the complex translation process. The presented unbiased approach for optimization of the coding sequence relies on the provision of an expression library of synonymous coding sequences for the protein of concern. Expression of each such sequence in a given host cell is compared under certain conditions, desirably by high-throughput screening relying on screening for an optical signal.
Background to the invention
Production of proteins, either homologous (expression of a gene by cultivation in the original host cell) or heterologous (expression of a gene found in one cell type in a cultured host cell of another type), is a frequent activity in academic as well as industrial laboratories. However, the success rate of functional protein production varies
substantially from case to case because there are many variables which potentially influence gene expression (Welch et al., 2009 PLoS ONE 4: e7002). Some features are general and can be easily manipulated by, for example, the choice of a suitable expression vector, e.g. adjusting promoter strength to enhance the transcription rate (DNA to mRNA), or by relatively straightforward genetic engineering, e.g. reducing any potential folding of the mRNA transcript around the translation start site in order to enhance the translation initiation process. On the other hand, some features are difficult to predict as they are specific for each gene or protein, for each host, as well as for the expression conditions. One such variable is 'codon bias'.
A consequence of degeneracy in the genetic code, with 61 different codons (triplets of nucleotides) encoding 20 amino acid residues, is that all the amino acids, except for methionine and tryptophan, are encoded by more than one codon. During the translation process, these codons are recognised by complementary transfer RNA (tRNA) molecules that deliver specific amino acids to the protein synthesis machinery (the ribosome).
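The degeneracy figures quoted above can be checked directly against the standard genetic code. The short sketch below builds the standard codon table and groups sense codons by amino acid; the variable names are illustrative only, and the standard code is assumed (some organisms use variant codes).

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group the sense (non-stop) codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid != "*":
        SYNONYMS[amino_acid].append(codon)

sense_codon_count = sum(len(codons) for codons in SYNONYMS.values())
single_codon_amino_acids = sorted(aa for aa, codons in SYNONYMS.items() if len(codons) == 1)

print(sense_codon_count, len(SYNONYMS), single_codon_amino_acids)
```

Only M (methionine) and W (tryptophan) map to a single codon, which is why those positions offer no synonymous alternatives in a variant library.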
Different codons encoding the same amino acid (synonymous codons) typically do not occur with equal frequency across genomes and their non-uniform occurrence in coding DNA is known as 'codon usage bias' or 'codon bias' (Grantham et al., 1980 Nucleic Acids Research 8: R49-R62). Codon bias relates to the fact that some ('optimal') codons are well recognised by the corresponding tRNA and result in relatively fast translation; on the other hand, translation of other ('non-optimal') codons is much less efficient.
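One simple way to make non-uniform codon occurrence concrete is to count, for each amino acid, what fraction of its occurrences uses each synonymous codon. The sketch below does this for a single coding sequence; it is an illustrative measure under the standard genetic code, not the specific statistic used in the cited literature, and the input sequence is a hypothetical toy example.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codon_fractions(cds):
    """Return {amino_acid: {codon: fraction of that amino acid's codons}}."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if CODON_TABLE[c] != "*")
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    fractions = defaultdict(dict)
    for codon, n in counts.items():
        amino_acid = CODON_TABLE[codon]
        fractions[amino_acid][codon] = n / totals[amino_acid]
    return dict(fractions)


toy_cds = "ATGGGTGGCGGCGATGACTAA"  # hypothetical: Met-Gly-Gly-Gly-Asp-Asp-stop
usage = synonymous_codon_fractions(toy_cds)
for amino_acid, by_codon in sorted(usage.items()):
    print(amino_acid, by_codon)
```

In practice such fractions would be computed over many genes of a genome rather than one short sequence, and a skew away from equal fractions is what the text calls codon bias.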
The extent of codon bias differs both within and between genomes and variation in codon optimisation amongst individual genes results in differential speed and accuracy in their translation (Rocha, 2004 Genome Res. 14: 2279-2286; Hershberg and Petrov, 2008 Annu. Rev. Genet 42: 287-299; Sharp et al., 2010 Philos. Trans. R Soc. Lond. B Biol. Sci. 365: 1203-1212). Research in the early 1980s revealed that several bacterial and yeast species are subject to translational selection, whereby highly-expressed genes
preferentially contain codons which are translated more efficiently by their ribosomes (Gouy and Gautier, 1982 Nucleic Acids Research 10: 7055-7074; Bennetzen and Hall, 1982 J. Biol. Chem. 257: 3026-3031), a phenomenon attributed to the differential abundance of tRNAs, with positive selection for codons with greater numbers of associated charged tRNAs (Bennetzen and Hall, 1982; Post and Nomura, 1980 J. Biol. Chem. 255: 4660-4666).
The abundance of available charged tRNAs, which can differ up to ten-fold in their cellular concentrations (Ikemura, 1981 J. Mol. Biol. 151:389-409; Ikemura, 1985 Mol. Biol. Evol. 2: 13-34; Dong et al., 1996 J. Mol. Biol. 260: 649-663), depends on a dynamic equilibrium of tRNA supply and demand. Several factors contribute to this process: (i) the set of tRNA genes, (ii) redundancy in tRNA genes, (iii) the expression control of tRNAs, (iv) the transcriptome, (v) the codon frequencies of mRNAs, (vi) mRNAs being translated and (vii) the density of ribosomes on mRNA transcripts. The extent of codon bias therefore reflects a balance between mutation and selection for translational optimisation (Bulmer, 1988 J. Evol. Biol. 1: 15-26; Sharp et al., 1993 Biochem. Soc. Trans. 21: 835-841; Kliman and Hey, 2003 Genet. Res. 81: 89-90; Akashi, 1997 Gene 205: 269-278). In many organisms such as Escherichia coli and Saccharomyces cerevisiae, translational selection has been demonstrated, with strongly positive correlations existing between codon bias of genes and corresponding protein levels (Lithwick and Margalit, 2003 Genome Res. 13: 2665-2673; Ghaemmaghami et al., 2003 Nature 425: 737-741). Organisms clearly differ both qualitatively and quantitatively with respect to their set of tRNAs, in the extent of codon usage bias across their genomes and in the forces determining it (Botzman and Margalit, 2011 Genome Biol. 12: R109; Sharp et al., 2005 Nucleic Acids Res 33: 1141-1153). This means that heterologous expression may require adaptation of the codon bias to adjust it to match with the particular protein synthesis machinery of the production cell.
In recent years, researchers have made attempts to understand the occurrence of 'non-optimal' codons. The importance of 'non-optimal' (or 'slow') codons in loops for the successful co-translational folding of protein domains was demonstrated in Escherichia coli and Bacillus subtilis by Zhang et al. in 2009 (Nat. Struct. Mol. Biol. 16: 274-280). The authors disclosed an algorithm to predict putative 'slow' translating regions in these species. The method disclosed involves mapping the folding status of translation intermediates to determine whether local discontinuity in translation at certain regions in the mRNA sequence is needed to efficiently co-ordinate the rate of elongation of the peptide chain and its co-translational folding. The algorithm is applied to the protein of interest to identify regions where 'non-optimal' codons may be required in order to allow slower translation and proper folding of the peptide.
The method which is disclosed in the study depends on prior knowledge of the three dimensional structure of the protein of interest as well as the concentrations and relative abundances of the full set of tRNAs. It is therefore suitable and sufficiently accurate only for those organisms in which this has been determined. Unfortunately tRNA abundances are only known for a limited number of organisms such as Escherichia coli (Dong et al., 1996) and Bacillus subtilis (Kanaya et al., 1999 Gene 238: 143-155) and so this method is not applicable in a wide range of model and host systems which are either currently used, or represent desirable targets for optimised functional protein production. Even in those species in which the tRNA repertoire has been determined, the concentrations of iso-accepting tRNAs for a set of synonymous codons are variable, and the codon-reading programme can dramatically alter in response to both internal (gene expression level, GC content) and external factors (amino acid starvation, environmental stress, population size) (Elf et al., 2003 Science 300: 1718-1722; Subramaniam et al., 2013 PNAS 110: 2419-2424; Behura and Severson, 2013 Biol. Rev. Camb. Philos. Soc 88: 49-61; Botzman and Margalit, 2011). Variability in the codon reading programme may be particularly
problematic in host systems which are already expressing (several) other heterologous proteins, because over-use of a single codon may deplete its cognate tRNA. Under these conditions the codon reading programme may diverge significantly from the predicted 'optimal' set of codons.
The algorithm disclosed by Zhang et al. (2009) also does not incorporate steric effects and interactions of the charged tRNA with the acceptor (A)-site (Curran and Yarus, 1989 J. Mol. Biol. 209: 65-77). Codon context sequences (sequences flanking the codon of interest) are also not considered and may have a significant effect on mRNA translation rates because these sequences reside in the A and P sites of the ribosome during their translation. The A site binds to an incoming aminoacylated tRNA and the P site to a peptidyl-tRNA. Accordingly, during translation, the A and P sites of an active ribosome are occupied by the specific tRNAs with the complementary anticodon sequences; some combinations of tRNAs may result in efficient translation elongation, whereas others may have an opposite effect. Optimisation of these sequences in combination with the codon of interest may have a profound effect on levels of protein expression (Moura et al., 2007 PLoS ONE 2: e847; Behura and Severson, 2012 PLoS ONE 7: e43111). The algorithm disclosed by Zhang et al. (2009) uses the occurrence of regions of complex secondary structure as a predictor of 'sub-optimal' codon position. However, as genome-wide patterns of codon bias from a range of organisms have become available (Behura and Severson, 2013), deficiencies in this approach are emerging. Large variations in codon bias exist between organisms (Botzman and Margalit, 2011) and depend on a multiplicity of factors, unrelated to secondary structure, that differ amongst species or between cellular contexts.
These factors include tRNA repertoire, codon position, GC content, expression level, gene length, amino acid conservation, transcriptional selection, RNA stability, protein hydrophobicity, recombination rates, environmental stress, population size, optimal growth temperature and organismal lifestyle, all of which have been shown to be influential in determining the extent of codon bias (Chen et al., 2004 PNAS 101: 3480-3485; Hershberg and Petrov, 2009 PLoS Genetics 5: e1000556; Botzman and Margalit, 2011; Behura and Severson, 2013; Akashi, 1997 Gene 205: 269-278; Powell and Moriyama, 1997 PNAS 94: 7784-7790; Moriyama and Powell, 1998 Nucleic Acids Res. 26: 3188-3193; Powell et al., 2003 J. Mol. Evol. 57 Suppl. 1: S214-225).
Consequently, preferential codon usage is likely to be different between different genes, organisms, environmental situations and gene expression contexts. Predictive approaches will therefore struggle to capture the necessary variables to maximise functional protein production in multiple host systems and contexts. In an attempt to understand the evolutionary significance of codon bias, Pechmann and Frydman (February 2013, Nat. Struct. Mol. Biol. 20: 237-244) took a comparative evolutionary approach, comparing codon usage from nucleotide sequence alignments of orthologous genes in 10 closely-related yeast species. They provided evidence of a correlation between the position of 'non-optimal' codons and the occurrence of secondary structural elements in a subset of laboratory yeast strains. The authors speculated that the statistical association of codon usage and protein structural complexity might be explained by the 'tuning' of translation rates to allow proper co-translational folding, with 'non-optimal' codons causing a reduction in the speed of the translation process, to allow for co-translational folding of the nascent polypeptide as it emerges from the ribosome. However, there is no disclosure of the experimental manipulation of codons to establish whether this is a causal relationship.
Although nucleotide conservation often provides a strong indication of functional importance, the correlation disclosed by Pechmann and Frydman is representative only of putative function in laboratory yeast strains. The ability to generalise about other systems remains limited because selective pressures often differ substantially between environments, species, cells and gene expression contexts, as do tRNA iso-acceptor availabilities for each codon. It is therefore unlikely that a generalised, theoretical set of rules for codon optimisation will be applicable in all systems. Furthermore, the approach outlined by Pechmann and Frydman (2012) is predicated on the supposition that the evolutionarily conserved codon composition leaves no room for improvement in functional protein expression. However, the model host system used is 'optimised' in response to a range of competing evolutionary pressures relating to a specific molecular, cellular and environmental context in which the production efficiency of a particular protein may be compromised in relation to other, more influential factors. Evolutionary pressures acting on laboratory yeast strains are unlikely to be representative of those affecting many other organisms. Greater translational efficiencies may therefore be possible in other genes, systems, cellular contexts or environments.
Molecular biology-related companies are now in existence which will synthesise genes on request; moreover, such companies may offer to adjust the codon bias of the gene to allow its efficient expression in a production organism of choice. However, true codon optimisation is significantly more complex than merely substituting 'non-optimal' codons for those corresponding to the theoretically most abundant tRNA for each of the 20 amino acids. This strategy often results in incorrectly folded proteins as a consequence of excessively rapid translation rates. Dissonance or incompatibility in codon bias among organisms or cellular contexts can contribute to low expression yields or insolubility of foreign proteins in heterologous host systems (Angov et al., 2008 PLoS ONE 3: e2189). This presents serious challenges in research or in industries where high levels of protein expression are required. Despite the recent recognition (Zhang et al., 2009; Pechmann and Frydman, 2012) that the optimal translation rate of a gene is not uniform and therefore the location of optimal and non-optimal ('fast' and 'slow') codons along a gene sequence is an important determinant of translation efficiency, a major problem routinely encountered in the functionality of synthetic genes (with a designed codon bias) is that the optimal positioning of the 'fast' and 'slow' codons cannot easily be predicted because for each protein with a different fold, the pattern (or codon landscape) will be different. Currently available methods to address this require prior knowledge of tRNA abundances and/or 3D protein structures and are focussed either on increasing or slowing translation rates to allow efficient nascent peptide folding rather than functional protein production per se. A systematic method is therefore required to analyse and empirically reveal the optimal codon landscape and increase the amount of functional protein expressed for any gene of interest in a particular production host.
Summary of the invention
Instead of a focus on achieving quicker translation rates by generating a set of generalised 'optimal' codons, or relying on limited predictive approaches to identify regions of structural complexity, this invention provides a systematic approach to generating high levels of functional protein expression for any gene of interest, in a host cell of interest and under specific conditions of interest. Adopting a non-prejudicial, function-oriented approach to generating high levels of protein expression by systematically altering individual codons alone, or in combination, without affecting the polypeptide sequence, enables attainment of a codon complement tailored to the production of optimal levels of functional protein in the cell of interest which takes account of the conditions for protein production, including the actual (rather than theoretical) tRNA status.
More particularly, the present invention provides a method of codon selection for provision of a coding sequence for functional expression of a heterologous protein in a host cell, which, where the wild-type coding sequence is expressible in said host cell in an expression construct, comprises:
(i) providing a library of variant synonymous coding sequences encoding said protein including the wild-type coding sequence, each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein each non-wild-type variant coding sequence differs from the wild-type coding sequence at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions, the selected codon position(s) being a position or positions for which the host cell can provide more than one cognate tRNA and the variant coding sequences providing for each selected codon position both (a) the pre-determined optimal codon for translational efficiency and (b) one or more non-optimal synonymous codons;
(ii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions; and (iii) in respect of the selected codon position(s) for which synonymous codons are provided in said library determining the codon(s) which give the highest or optimal functional expression.
Commonly, at least for initial application of the method to a chosen wild-type coding sequence, the variant synonymous coding sequences provided will all differ from the wild-type coding sequence by just one codon and cover a number of selected codon positions. These may be all positions for which multiple cognate tRNAs are available in the host cell, although not necessarily so. Preferably, for each selected codon position the expression library will include all possible synonymous codons.
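The single-codon library described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a prescribed implementation: the function names `synonymous_codons` and `single_codon_library`, the `skip_first` parameter and the nine-nucleotide example sequence are hypothetical, and the standard genetic code is used as a stand-in for "positions for which multiple cognate tRNAs are available", which in practice depends on the host cell.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (assumed here to apply to the chosen host).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(codon):
    """Return every codon (including `codon` itself) encoding the same amino acid."""
    aa = CODON_TO_AA[codon]
    return [c for c, a in CODON_TO_AA.items() if a == aa]


def single_codon_library(cds, skip_first=0):
    """Map (1-based codon position, codon) -> variant with one synonymous change.

    `skip_first` keeps a 5' section of that many codons unchanged in all variants.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    variants = {}
    for index, original in enumerate(codons):
        if index < skip_first:
            continue
        for alternative in synonymous_codons(original):
            if alternative != original:
                changed = codons[:index] + [alternative] + codons[index + 1:]
                variants[(index + 1, alternative)] = "".join(changed)
    return variants


start = "ATGGGTGAT"  # hypothetical Met-Gly-Asp fragment, not a real gene
library = single_codon_library(start)
for (position, codon), sequence in sorted(library.items()):
    print(position, codon, sequence)
```

The expression library would then consist of the starting sequence plus these variants; the Met position contributes no variant because ATG has no synonymous codon.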
Alternatively, the same general approach may be adopted but with provision in step (i) of an expression library of synonymous coding sequences encoding said protein consisting of:
(a) a first coding sequence wherein each codon position is occupied by the sole possible codon or the pre-determined optimal codon for translational efficiency, with the proviso that at least one codon substitution may be made such that functional expression is achieved in the host cell, and
(b) a set of variant coding sequences wherein each sequence has compared with said first coding sequence substitution of a synonymous non-optimal codon recognised by a cognate tRNA of said host cell at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions.
Provision of this library will again be followed by the steps of:
(ii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions; and
(iii) in respect of the selected codon position(s) for which synonymous codons are provided in said library determining the codon(s) which give the highest or optimal functional expression.
Again commonly, at least for initial application of the method, the variant coding sequences from the starting sequence (said first coding sequence as defined above) will all exhibit just a single codon change compared to that sequence and cover a number of selected codon positions. All codon positions for which multiple cognate tRNAs are available may be included in the analysis, although again in some instances it may be chosen to cover less than the full complement of these. Again, for each selected codon position all possible synonymous codons may be provided.
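The "first coding sequence" of this alternative, in which each position carries the sole possible codon or a pre-determined optimal codon, can be sketched as a simple recoding step. In this sketch the function `recode_with_optimal_codons` and the two-entry codon table are hypothetical placeholders; a real table would be derived from host-specific data, and amino acids missing from the table simply keep their original codon.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (an assumption for the host).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def recode_with_optimal_codons(cds, optimal_codon):
    """Replace each codon by the listed optimal codon for its amino acid, if any."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    recoded = []
    for codon in codons:
        aa = CODON_TO_AA[codon]
        replacement = optimal_codon.get(aa, codon)
        # Guard against a table entry that would alter the polypeptide sequence.
        if CODON_TO_AA[replacement] != aa:
            raise ValueError(f"{replacement} does not encode {aa}")
        recoded.append(replacement)
    return "".join(recoded)


hypothetical_table = {"G": "GGC", "D": "GAC"}  # illustrative values only
first_sequence = recode_with_optimal_codons("ATGGGTGAT", hypothetical_table)
print(first_sequence)
```

Variant sequences for part (b) of the library would then be generated from `first_sequence` by substituting non-optimal synonymous codons at the selected positions.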
A coding sequence derived by a method of the invention as above may be used as the starting sequence in substitution for said first coding sequence in a further cycle of codon selection. This may, for example, comprise firstly provision of a set of variant sequences as defined above with the proviso that there is non-variance of one or more codons previously selected for substitution in the starting sequence. Alternatively, for example, each variant coding sequence of the expression library may incorporate codon substitutions at a cluster of sequential codon positions. In the context of the present invention, a cluster of sequential codon positions may be, for example, two adjacent codon positions or more than two adjacent codon positions, e.g. 3, 4, 5 or higher. Provision of an expression library for a further cycle of codon selection will be followed by repeat of steps (ii) and (iii) above. Further such cycles may be carried out.
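For a further cycle covering a cluster of sequential codon positions, the variants can be enumerated as all synonymous combinations over a window of adjacent codons. The following is a sketch only: `cluster_variants`, its parameters and the example sequence are hypothetical names chosen for illustration, and the standard genetic code is again assumed.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (an assumption for the host).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def cluster_variants(cds, first_position, length=2):
    """Return all synonymous variants over `length` adjacent codons.

    `first_position` is the 1-based codon position where the cluster begins.
    The starting sequence itself is excluded from the result.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    start = first_position - 1
    window = codons[start:start + length]
    # For each codon in the window, list every codon encoding the same amino acid.
    choices = [
        [c for c, aa in CODON_TO_AA.items() if aa == CODON_TO_AA[w]]
        for w in window
    ]
    variants = []
    for combination in product(*choices):
        if list(combination) != window:
            changed = codons[:start] + list(combination) + codons[start + length:]
            variants.append("".join(changed))
    return variants


pair_variants = cluster_variants("ATGGGTGAT", first_position=2, length=2)
print(len(pair_variants), pair_variants)
```

Because the number of combinations grows multiplicatively with cluster length, larger clusters may call for restricting the codons offered at each position.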
From another perspective, the invention provides as its simplest mode for codon selection a method as follows resulting in provision of a coding sequence for functional expression of a heterologous protein in a host cell. This method comprises:
(i) providing a starting coding sequence for the desired protein which is expressible in the chosen host cell (this may be, for example, a wild-type coding sequence or a starting sequence wherein each codon position is occupied by the sole possible codon or the pre-determined optimal codon for translational efficiency);
(ii) providing a library of synonymous coding sequences encoding said protein, each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein said variant coding sequences consist of:
(a) said starting coding sequence and
(b) a set of variant coding sequences which each differ from said starting coding sequence at a single codon position, whereby for all codon positions for which the host cell provides more than one cognate tRNA (or at least all such positions minus a 5' terminal section, e.g. a section of no more than about 10 codons), all possible codons are provided in the library;
(iii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions and
(iv) in respect of each variant codon position determining the codon which is associated with the highest or optimal functional protein expression to thereby select a full length coding sequence.
Optionally these steps may be followed by one or more further rounds of codon selection as discussed. Thus for example the same method may be simply repeated but with one or more codons previously selected for substitution in the starting sequence being fixed. This may reveal optimal codon pairs and/or clusters.
In the context of a method of the invention, determination of optimal functional protein expression may take account of need for co-expression of one or more further heterologous proteins in the same host cell at a desired level. Where only expression of a single heterologous protein is of concern, then codon selection will simply be on the basis of association with the highest functional expression of the desired protein.
Once codons have been selected by a method of the invention as above, a coding sequence may be provided for the protein of interest wherein each position for which synonymous codons can be used is either the pre-determined optimal codon or in preference a synonymous non-optimal codon which has been found to correspond with improved functional protein expression. The provision of the coding sequence may comprise actual synthesis of a sequence comprising the coding sequence. This may be for example by de novo synthesis or possibly by modification of a pre-existing sequence, e.g. by site-directed mutagenesis. Subsequent expression of the coding sequence in the chosen host cell may be carried out. For this, the sequence will preferably be provided in an expression construct, e.g. an expression vector, such that expression can be carried out in the same host cell and preferably under the same conditions as employed for codon optimization. This approach to codon optimization for a gene of interest has the advantages that it requires no knowledge of secondary or tertiary structure and can take account of variation of tRNA availability with different conditions. It has been demonstrated for example in E. coli that during stress of protein over-production, the set of tRNAs is substantially different from the pre-determined "optimal set" based on abundant proteins including ribosomal proteins; in other words pre-determined optimal codons may be sub-optimal under production conditions and the systematic approach to codon selection of the present invention importantly enables account to be taken of this.
Brief description of the figure
Embodiments of the invention are further described hereinafter with reference to the accompanying figure.
Figure 1 shows a hypothetical example of the systematic manipulation of individual codons and high-throughput screening of a library of synonymous variants to generate a sequence for improved functional protein expression in a specific production host. A suitable model system for this approach is expression of green fluorescent protein (GFP) as a model protein in E. coli as a model host since functional expression of the model protein can be simply directly detected by fluorescence measurement. Other models include proteins with established 3D structure that allow for easy functional screening and for which functional production has been established in the chosen host cell.
Detailed description
It will be appreciated that the essential starting point for a method of codon optimization according to the invention is a coding sequence which expresses the protein of interest and can be expressed to some degree as a heterologous protein in the chosen host cell. In some instances the wild type sequence may be utilized as the starting sequence. In this case, as indicated above, an expression library may be provided of synonymous coding sequences including the wild-type sequence and wherein each variant non-wild-type coding sequence differs from the wild-type coding sequence at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions. The variant coding sequences will provide for each selected codon position (a position for which there is more than one cognate tRNA) both (a) the pre-determined optimal codon for translation efficiency and (b) one or more non-optimal synonymous codons. All possible codons will generally be provided for each selected position.
Where the wild-type coding sequence cannot be expressed in the host cell, or if in any case desired, a different coding sequence may be chosen as the starting point for codon optimization. This may conveniently be a synthetic sequence of pre-determined optimal codons, e.g. a commercially available coding DNA of pre-determined optimal codons or such a sequence with one or more non-optimal substitutions based on prior considerations aimed at ensuring a detectable expression level. It is well recognised that in some instances a heterologous sequence of optimal codons will not express in a selected host cell, e.g. inclusion bodies of improperly folded protein may arise. Thus, as indicated above, an alternative starting expression library for codon optimization may be one in which the variant coding sequences consist of:
(a) a first coding sequence wherein each codon position is occupied by the sole possible codon or pre-determined optimal codon for translational efficiency, with the proviso that at least one codon substitution may be made such that functional expression is achieved in the host cell, and
(b) a set of variant coding sequences wherein each sequence has compared with said first coding sequence substitution of a synonymous non-optimal codon recognized by a cognate tRNA of the host cell at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions.
Again all possible codons will commonly be provided for each selected position. As discussed above, it will be appreciated that more than one round of codon optimization may be carried out using a method of codon selection according to the invention. All codon positions for which more than one cognate tRNA is available may be included in a round of analysis. As indicated above, a common starting point will be an expression library in which compared to the starting sequence (either the wild-type sequence or first coding sequence as defined above) each variant sequence has just a single codon substitution and preferably the full length of the coding sequence is covered. It will be appreciated that where there is a starting methionine (Met) codon this will remain ATG in all variants since there are no synonymous codons (see Fig. 1). However in some instances a different start codon may be present in the starting sequence or substituted.
In some instances prior information may mean that more directed codon substitutions may be made covering less than the full length sequence. For example, previous work in S. cerevisiae has suggested that native protein coding sequences in that species commonly have a 'starting ramp' of about 10 codons long which directs low translation efficiency and is thought to be beneficial for overall functional expression of the corresponding proteins (Pechmann and Frydman, ibid). Hence, in some instances, it may be chosen to apply the approach of the invention to codon optimization at a section of a protein coding sequence downstream of a 5' starting section of a plurality of codons, e.g. at least 10 codons, at least 20, 30, 40 or 50 codons, which may be kept in all variants tested as the wild-type sequence.
Each variant coding sequence may be conveniently generated by site-directed mutagenesis using conventional methods.
In the context of the claimed invention, the term "pre-determined optimal codon" may be equated with any of the following:
(a) the codon corresponding to the most abundant cognate tRNA in the genome (or in selected highly expressed genes, e.g. genes encoding the set of ribosomal proteins) of the selected host cell for overall usage; this information is readily available from tables for many cell types.
(b) the optimal codon based on calculated tRNA adaptation index (tAI) as defined by Dos Reis et al. (2004) Nucleic Acids Res. 32, 5036-5044. The tAI does not take account of the cellular tRNA dynamics driven by trade-off between tRNA supply and demand. Hence, it may be considered preferable to rely on optimal codon assignment in accordance with (c) below.
(c) the optimal codon determined with reference to a normalized translational efficiency (nTE) scale as discussed in Pechmann and Frydman, ibid. This scale reflects the competition for the cellular pool of tRNAs by normalizing the cellular tRNA abundances and selective constraints on codon-tRNA interactions (taken account of in defining the tAI) by the codon usage. How often a codon is translated in a cell depends on the codon frequencies in the mRNAs, the abundance of the mRNAs that are attached to the ribosomes and the densities of ribosomes on the specific mRNAs. However Pechmann and Frydman verified in S. cerevisiae that mRNA abundances alone can serve as a sufficient and readily available proxy for the calculation of codon usage.
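As a rough illustration of how a table of "pre-determined optimal codons" might be assembled, the sketch below counts codon usage in a set of reference coding sequences (for example, highly expressed genes of the host) and takes the most frequent codon per amino acid. This is a usage-based proxy assumed here for simplicity; it is not the tAI or nTE calculation of definitions (b) and (c), and the function name `preferred_codons` and the reference sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code with codons enumerated in TCAG order (an assumption for the host).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def preferred_codons(reference_sequences):
    """Return {amino acid: most frequently used codon} for the reference set.

    Only amino acids observed in the references are reported; ties go to the
    codon encountered first.
    """
    counts = Counter()
    for cds in reference_sequences:
        usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    best = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA.get(codon)
        if aa is None or aa == "*":
            continue  # skip stop codons and non-ACGT triplets
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best


# Hypothetical reference sequence, for illustration only.
table = preferred_codons(["ATGGGCGGCGGTGACGACGAT"])
print(table)
```

A table built this way reflects codon usage under the conditions represented by the reference genes, which, as noted in the description, may differ from tRNA availability under production conditions.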
Of course where the starting sequence for the approach of the invention to codon selection is the wild-type sequence then there is no absolute need to consider any optimal codon for translational efficiency. Simply all synonymous codons may be provided in the expression library for all selected variant positions.
The approach of the invention can be applied to any type of host cell which can be cultured, and which is genetically accessible (i.e. able to serve as a host for functional production of the protein of interest (POI)). It may be applied to all host cells commonly employed for recombinant heterologous protein expression including bacterial cells (e.g. E. coli, Bacillus subtilis), yeast cells (e.g. S. cerevisiae, Pichia pastoris), fungal cells, insect cells and mammalian cell lines (e.g. CHO cells) and tumour cell lines (e.g. HeLa cells). In the context of the invention, it will be understood that the POI may be a native protein of a host cell in which the native coding sequence for that protein has been knocked out. In these circumstances, the POI will be considered as a heterologous protein to the mutated host cell.
Transformation of the host cell with a heterologous gene sequence
The expression constructs of the library may be located in plasmids (expression vectors) which are used to transform the host cell. Methods of transformation may include, but are not limited to, heat shock, electroporation, particle bombardment, chemical induction, microinjection and viral transformation.
Heterologous protein expression analysis
Subsequently, the expression levels of the protein for each synonymous coding variant are determined under the same expression conditions. In some instances, it may be possible to directly determine functional expression, e.g. as with GFP or by enzymatic action of the protein of interest (POI) to generate a detectable optical signal. However in some instances it may be chosen to determine physical expression, e.g. by antibody probing, and rely on a separate test to verify that physical expression is accompanied by the required function.
In preferred embodiments of the invention, the POI will be detectable by a high-throughput screening method, for example, relying on detection of an optical signal. For this purpose, it may be necessary for the POI to incorporate a tag, or be labelled with a removable tag, which permits detection of expression. Such a tag may be, for example, a fluorescence reporter molecule translationally-fused to the C-terminal end of the POI, e.g. GFP, Yellow Fluorescent Protein (YFP), Red Fluorescent Protein (RFP) or Cyan Fluorescent Protein (CFP). It may be an enzyme which can be used to generate an optical signal. Tags used for detection of expression may also be antigen peptide tags. A tag may be provided for affinity purification, e.g. a His tag. Where the POI is a protein to be used as a therapeutic, any tag employed for detection of expression will be cleavable from the POI.
The codon selected from amongst synonymous codons for any selected position will be the codon associated with the highest or optimal observed functional expression of the POI, or where more than one codon provides substantially equal such expression, one such codon corresponding with that level of expression. Where there is choice of codons indicated for a selected position based on the expression data, preference may be given to the codon in the starting sequence, i.e. the wild type codon if the starting sequence is the wild-type sequence. This will minimise the number of codon changes to convert the starting sequence in a nucleic acid to the selected synonymous coding sequence for improved functional protein expression.
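The selection rule described above, including the preference for the starting codon where expression is substantially equal, can be sketched as follows. The function `select_codons`, the `rel_tol` threshold used to decide what counts as "substantially equal" and all expression values are assumptions made for illustration; the description does not specify a numerical threshold.

```python
def select_codons(start_cds, start_expression, variant_expression, rel_tol=0.05):
    """Assemble a sequence from the best codon observed at each tested position.

    `variant_expression` maps (1-based codon position, codon) -> measured
    functional expression of the single-codon variant. The starting codon is
    kept when its expression is within `rel_tol` of the best value observed.
    """
    codons = [start_cds[i:i + 3] for i in range(0, len(start_cds), 3)]
    selected = list(codons)
    tested_positions = sorted({position for position, _ in variant_expression})
    for position in tested_positions:
        start_codon = codons[position - 1]
        # Candidates at this position: the starting codon plus every tested variant.
        candidates = {start_codon: start_expression}
        for (p, codon), value in variant_expression.items():
            if p == position:
                candidates[codon] = value
        best_value = max(candidates.values())
        if candidates[start_codon] >= best_value * (1 - rel_tol):
            continue  # substantially equal: prefer the starting codon
        selected[position - 1] = max(candidates, key=candidates.get)
    return "".join(selected)


# Hypothetical fluorescence readings (arbitrary units) for a Met-Gly-Asp fragment.
measurements = {
    (2, "GGC"): 180.0,
    (2, "GGA"): 90.0,
    (2, "GGG"): 110.0,
    (3, "GAC"): 102.0,
}
new_sequence = select_codons("ATGGGTGAT", 100.0, measurements)
print(new_sequence)
```

Combining the individually best codons in this way treats positions as independent; further rounds with selected codons fixed, as described elsewhere in this specification, would be needed to detect effects of codon pairs or clusters.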
The methods of the present invention will be useful in the production of many different proteins in the industrial, agricultural, chemical and pharmaceutical fields, particularly for example antibodies, hormones and other protein therapeutics. POIs may preferably be recovered from the cell culture medium as secreted proteins, although they may also be recovered from host cell lysates.
Detection of signature patterns associated with structural motifs
A method of the invention for codon optimization may be repeated with one or more further proteins, all the selected proteins for heterologous expression sharing one or more structural motifs. In this case, the analysis carried out may further include determination of whether any signature pattern of optimal and non-optimal codons is associated with any shared structural motif. The invention also extends to synthesizing a protein coding sequence including a signature pattern so determined and expressing the sequence, preferably in the same host cell and under the same expression conditions as employed for codon optimization.
In a further aspect of the invention, there is provided a method of providing a DNA comprising a coding sequence for expressing a heterologous protein in a host cell which includes incorporating in said sequence a pattern of optimal and non-optimal codons at a site associated with provision of a structural motif, wherein said pattern enables increased expression efficiency of said protein in said host cell compared with the synonymous coding sequence containing solely optimal codons, wherein optimal codons are those codons pre-calculated to provide the highest codon translation efficiency in the host cell or the sole possible codon. The DNA may be provided for example by site-directed mutagenesis or de novo synthesis and once obtained, the coding sequence for the protein of interest may desirably be expressed in said host cell.
The following example illustrates the invention with reference to Figure 1 and utilization of GFP as a model protein in E. coli.
EXAMPLE
Codon optimization using a wild-type starting sequence
GFP provides a suitable model protein for exemplification of codon optimization according to the invention starting with a wild-type coding sequence, since the protein is well characterized and its functional expression can be simply assayed by high throughput fluorescent screening in E. coli as a model host. Screening can be done either systematically in a microtiter plate format, or just by pooling all clones, plating them on several agar plates, and screening for the brightest colonies (which then have to be sequenced).
Figure 1 illustrates generation of a synthetic library of synonymous coding sequences wherein the variants each have an initiating Met codon and each variant of the wild-type sequence has one codon change. For each codon position for which synonymous codons are available all possible codons are provided in the library.
The fluorescence associated with functional expression of each variant under the same conditions is determined. For each codon position for which multiple codons are present in the library, a codon is selected which gives the highest expression. Where one such codon is the wild-type codon, this is maintained in the final selected sequence.
Thus, referring to Figure 1, for codon position 2, the wild type sequence is supplemented by 3 variant synonymous coding sequences to provide all 4 codons for the amino acid Gly. Out of these variants, the variant providing the non-wild type codon GGC at codon position 2 is observed to give the highest fluorescence and is selected for codon position 2 in the new coding sequence.
For codon position 3, just one variant coding sequence is supplied to supplement the wild-type coding sequence so as to provide in the expression library both possible codons for Asp at that position. The variant gives no significant difference in the level of fluorescence and thus the wild-type codon is maintained in preference in the new coding sequence.
The same approach to codon selection is repeated for other codon positions. The selected complete new sequence may be obtained for expression by site-directed mutagenesis of the wild-type sequence.
The new selected sequence may be used as the starting sequence for one or more further rounds of codon optimization in accordance with the invention aimed at revealing optimal codon pairs and/or clusters. For example, the above method of the example may be repeated but with one initially selected codon at a variant codon position fixed.

Claims

1. A method of codon selection for provision of a coding sequence for functional expression of a heterologous protein in a host cell, which comprises:
(A) if the wild-type coding sequence is expressible in said host cell in an expression construct,
(i) providing a library of variant synonymous coding sequences encoding said protein and including the wild-type sequence, each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein each non-wild-type variant coding sequence differs from the wild-type coding sequence at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions, the selected codon position(s) being a position or positions for which the host cell can provide more than one cognate tRNA and the variant coding sequences providing for each selected codon position both (a) the predetermined optimal codon for translational efficiency and (b) one or more non-optimal synonymous codons;
(ii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions; and
(iii) in respect of the selected codon position(s) for which synonymous codons are provided in said library determining the codon(s) which give the highest or optimal functional expression;
or,
(B) (i) providing a library of variant synonymous coding sequences encoding said protein, each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein said variant coding sequences consist of:
(a) a first coding sequence wherein each codon position is occupied by the sole possible codon or the pre-determined optimal codon for translational efficiency, with the proviso that at least one codon substitution may be made such that functional expression is achieved in said host cell, and
(b) a set of variant coding sequences wherein each sequence has compared with said first coding sequence substitution of a synonymous non-optimal codon recognised by a cognate tRNA of said host cell at one codon position, or at a cluster of sequential codon positions, or at a combination of codon sites selected from individual codon positions and clusters of sequential codon positions,
(ii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions; and
(iii) in respect of the selected codon position(s) for which synonymous codons are provided in said library determining the codon(s) which give the highest or optimal functional expression.
2. A method as claimed in claim 1 wherein the variant coding sequences provided in said library compared to the wild-type coding sequence or said first coding sequence as the starting sequence all differ from the starting sequence by just one codon and cover a number of selected codon positions, optionally all codon positions for which multiple cognate tRNAs are available in the host cell.
3. A method as claimed in claim 2 wherein for each selected codon position all possible synonymous codons are provided in the library.
4. A method of codon selection for provision of a coding sequence for functional expression of a heterologous protein in a host cell, which comprises:
(i) providing a starting coding sequence for the desired protein which is expressible in the chosen host cell;
(ii) providing a library of synonymous coding sequences encoding said protein, each variant coding sequence being inserted in an expression construct whereby each said sequence can be separately expressed in said host cell under the same conditions, and wherein said variant coding sequences consist of:
(a) said starting coding sequence and
(b) a set of variant coding sequences which each differ from said starting coding sequence at a single codon position, whereby for all codon positions for which the host cell provides more than one cognate tRNA (or at least all such positions minus a 5' terminal section) all possible codons are provided in the library;
(iii) comparing expression of said protein by each expression construct of said library in said host cell under the same expression conditions and
(iv) in respect of each variant codon position determining the codon which is associated with the highest or optimal functional protein expression to thereby select a full length coding sequence.
5. A method as claimed in any one of claims 1 to 4 wherein the starting coding sequence is a wild-type coding sequence or contains only sole or pre-determined optimal codons for translational efficiency.
6. A method as claimed in any one of claims 1 to 5 which further comprises one or more further cycles of codon selection in which a selected codon sequence is used as the first coding sequence for codon selection in accordance with claim 1.
7. A method as claimed in any one of claims 1 to 6 wherein said protein incorporates a tag or is labelled with a removable tag for detection of expression or affinity purification.
8. A method as claimed in any one of claims 1 to 7 wherein detection of protein expression comprises detection of an optical signal, for example a fluorescent signal.
9. A method as claimed in any one of claims 1 to 8 wherein detection of protein expression is by high-throughput screening.
10. A method as claimed in any one of claims 1 to 9 which further comprises providing a coding sequence for said protein wherein each position for which synonymous codons can be used is either the pre-determined optimal codon or in preference a synonymous non-optimal codon which has been found to correspond with higher functional protein expression.
11. A method as claimed in claim 10 which further comprises synthesizing said sequence, optionally by site-directed mutagenesis.
12. A method as claimed in claim 11 which further comprises expressing said sequence in said host cell.
13. A method as claimed in any one of claims 1 to 9 which further comprises repeating the method for one or more further proteins wherein the selected proteins for expression share one or more structural motifs and wherein the analysis carried out in said step (iii) or step (iv) is followed by determination of whether any signature pattern of optimal and non-optimal codons is associated with any shared structural motif of said proteins.
14. A method as claimed in claim 13 which further comprises providing a protein coding sequence including a determined signature pattern of optimal and non-optimal codons associated with a structural motif.
15. A method as claimed in claim 14 which comprises synthesizing said protein coding sequence.
16. A method as claimed in claim 15 which further comprises expressing said protein in said host cell.
17. A method of providing a DNA comprising a coding sequence for expressing a heterologous protein in a host cell which includes incorporating in said sequence a pattern of optimal and non-optimal codons at a site associated with provision of a structural motif, wherein said pattern enables increased expression efficiency of said protein in said host cell compared with the synonymous coding sequence containing solely optimal codons, wherein optimal codons are those codons pre-calculated to provide the highest codon translation efficiency in the host cell or the sole possible codon.
18. A method as claimed in claim 17 which further comprises expressing said coding sequence in said host cell.
EP13773223.6A 2013-10-02 2013-10-02 Systematic optimization of coding sequence for functional protein expression Withdrawn EP3052624A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2013/070531 WO2015048989A1 (en) 2013-10-02 2013-10-02 Systematic optimization of coding sequence for functional protein expression

Publications (1)

Publication Number Publication Date
EP3052624A1 (en) 2016-08-10

Family

ID=49303970

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13773223.6A Withdrawn EP3052624A1 (en) 2013-10-02 2013-10-02 Systematic optimization of coding sequence for functional protein expression

Country Status (2)

Country Link
EP (1) EP3052624A1 (en)
WO (1) WO2015048989A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023026292A1 (en) * 2021-08-25 2023-03-02 Ramot At Tel-Aviv University Ltd. Optimized expression in target organisms

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10724040B2 (en) 2015-07-15 2020-07-28 The Penn State Research Foundation mRNA sequences to control co-translational folding of proteins
GB201600512D0 (en) 2016-01-12 2016-02-24 Univ York Recombinant protein production
EP3363900A1 (en) * 2017-02-21 2018-08-22 ETH Zurich Evolution-guided multiplexed dna assembly of dna parts, pathways and genomes
CN113195719A (en) * 2018-09-26 2021-07-30 卡斯西部储备大学 Methods and compositions for increasing protein expression and/or treating haploinsufficiency
FR3099179A1 (en) * 2019-07-22 2021-01-29 Universite De Rennes 1 METHOD FOR DETERMINING THE EFFECT OF A MUTATION ON THE EXPRESSION OF A GENE OF INTEREST

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2015048989A1 *


Also Published As

Publication number Publication date
WO2015048989A1 (en) 2015-04-09

Similar Documents

Publication Publication Date Title
WO2015048989A1 (en) Systematic optimization of coding sequence for functional protein expression
Portela et al. Synthetic core promoters as universal parts for fine-tuning expression in different yeast species
Liska et al. Expanding the organismal scope of proteomics: cross‐species protein identification by mass spectrometry and its implications
Yang et al. eRF1 mediates codon usage effects on mRNA translation efficiency through premature termination at rare codons
Twyman Principles of proteomics
Yin et al. P gas, a low-pH-induced promoter, as a tool for dynamic control of gene expression for metabolic engineering of Aspergillus niger
Xiong et al. Condition-specific promoter activities in Saccharomyces cerevisiae
Zrimec et al. Controlling gene expression with deep generative design of regulatory DNA
Decoene et al. Toward predictable 5′ UTRs in Saccharomyces cerevisiae: development of a yUTR Calculator
Oliver From gene to screen with yeast
Rehbein et al. “CodonWizard”–An intuitive software tool with graphical user interface for customizable codon optimization in protein expression efforts
CN113234702A (en) Lt1Cas13d protein and gene editing system
Yilmaz et al. Towards next-generation cell factories by rational genome-scale engineering
JP2009509533A5 (en)
Duan et al. Deciphering the rules of ribosome binding site differentiation in context dependence
Yumerefendi et al. Library-based methods for identification of soluble expression constructs
Picard et al. Transcriptomic, proteomic, and functional consequences of codon usage bias in human cells during heterologous gene expression
CN108866057B (en) Escherichia coli pressure response type promoter and preparation method thereof
Vaishnav et al. A comprehensive fitness landscape model reveals the evolutionary history and future evolvability of eukaryotic cis-regulatory DNA sequences
US20230295612A1 (en) Method for screening for bioactive natural products
Wang et al. PSCL: predicting protein subcellular localization based on optimal functional domains
Ozoline et al. Predicting antisense RNAs in the genomes of Escherichia coli and Salmonella typhimurium using promoter-search algorithm PlatProm
CN104088019A (en) Construction method of peptide aptamer library based on dimolecular fluorescence complementation technology
CN111718929B (en) Protein translation using circular RNA and uses thereof
Boob et al. CRISPR-COPIES: an in silico platform for discovery of neutral integration sites for CRISPR/Cas-facilitated gene integration

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20160429

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20161122