CN114360645A

CN114360645A - Codon optimization method of protein expression system and protein expression system

Info

Publication number: CN114360645A
Application number: CN202111673482.3A
Authority: CN
Inventors: 郭敏; 熊亮; 周伟峰; 徐丽琼; 徐秀珍; 唐磊; 曹平生; 于雪
Original assignee: Kangma Healthcode Shanghai Biotech Co Ltd
Current assignee: Kangma Healthcode Shanghai Biotech Co Ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-04-15
Also published as: CN116417065A

Abstract

The codon optimization method is based on the ribosomal proteins of the species from which the cell extract of the protein expression system is derived. The codons of the DNA sequences encoding these ribosomal proteins are counted to obtain, for each amino acid residue in the ribosomal protein sequences, the relative frequency of each codon among its synonymous codons; the codon with the highest relative frequency is selected and used as the codon for the same amino acid residue in the amino acid sequence of a target protein. Using few computing resources, the method can rapidly produce a DNA sequence that contains no specified sites and has higher protein expression efficiency than the DNA sequence before optimization.

Description

Codon optimization method of protein expression system and protein expression system

Technical Field

The invention belongs to the technical field of biosynthesis, and particularly relates to a codon optimization method of a protein expression system and the protein expression system.

Background

Codon optimization is an operation of changing the DNA coding sequence of the target protein to be expressed so as to improve the expression amount and/or expression activity of the target protein in an expression system.

The main factors to be considered in codon optimization are: the physicochemical properties of the DNA and of the mRNA transcribed from it, the codon preference of the protein expression system, the two-dimensional and three-dimensional structure of the target protein, and the like. The main parameters considered by the codon optimization methods currently in common use for protein expression systems include: codon bias of genes in the host cell, codon-pair bias of the host cell, tRNA copy number of the host cell, GC content, mRNA secondary structure, and the like.

In theory, because synonymous codons exist, a very large number of DNA sequences can express the same target protein, and that number grows geometrically with the length of the protein's amino acid sequence. For example, suppose the amino acid sequence of the target protein to be expressed is a₁a₂...a_n, and the number of synonymous codons corresponding to the m-th amino acid residue (m is a natural number, 1 ≤ m ≤ n) is x_m. Then the number of distinct DNA coding sequences corresponding to the protein's amino acid sequence is:

$$N = \prod_{m=1}^{n} x_m$$

For example, take the following sequence of 40 amino acid residues:

DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV

Consulting the list of the 20 natural amino acids and the codon table giving the codons for each amino acid,

[Table: list of the 20 natural amino acids]

[Table: codon table]

the number of synonymous codons for each amino acid residue is obtained:

[Table: number of synonymous codons for each amino acid residue]

Thus, if the protein expression system is a eukaryotic cell, the number of possible corresponding DNA sequences is:

2*4*2*2*6*2*2*2*2*2*2*4*2*2*2*2*6*4*2*2*4*2*2*4*4*2*2*2*4*2*3*3*4*6*1*4*4*4*4*4 = 273,593,677,362,757,632.
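The arithmetic above can be checked with a few lines of Python. This is only a sanity check on the multiplication: the list `factors` is copied, in order, from the product written in the text, and the variable names are illustrative.

```python
from math import prod

# Per-residue synonymous-codon counts, copied in order from the product in the text.
factors = [2, 4, 2, 2, 6, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 6, 4, 2, 2,
           4, 2, 2, 4, 4, 2, 2, 2, 4, 2, 3, 3, 4, 6, 1, 4, 4, 4, 4, 4]

peptide = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"
assert len(factors) == len(peptide) == 40  # one factor per residue

# N is the product of the x_m over all residues.
n_sequences = prod(factors)
print(f"{n_sequences:,}")  # 273,593,677,362,757,632
```

The product equals 2^50 × 3^5, which makes clear why enumerating every synonymous sequence is impractical even for a 40-residue peptide.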

For proteins with more amino acid residues, this value grows exponentially. Solving codon optimization by brute-force enumeration therefore consumes excessive computing resources and time, because the number of synonymous DNA sequences is so large. Such methods generally optimize the DNA sequence encoding the target protein directly, and the final choice of DNA sequence for expression still requires large numbers of expression experiments with different DNA sequences in order to identify one with higher expression efficiency and stability, so the workload is enormous. In addition, this approach does not take into account other adverse factors that affect protein expression efficiency.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing codon optimization methods by providing a method whose reference object is a DNA coding sequence other than that of the target protein, and whose algorithm differs from brute-force enumeration, so as to improve both the optimization efficiency and the expression efficiency of the optimized target protein.

Ribosomes are specialized organelles composed of ribosomal RNA and ribosomal proteins, which play a critical role in the translation of mRNA into protein. Research shows that the expression amount of ribosomal protein is important for the normal function of organism. In view of the importance of ribosomal proteins in organisms, DNA encoding ribosomal proteins is subject to great selection pressure, promoting its evolution toward high stability and high expression efficiency. It is concluded that optimizing the DNA sequence encoding the target protein based on the codon bias regularity of the DNA sequence encoding the ribosomal protein of the target organism will greatly increase the expression level and/or the expression activity of the target protein in and out of the target organism.

In the actual protein expression process, whether based on a cell-based expression system or a cell-free expression system, it is necessary to prepare a DNA encoding a target protein. In order to avoid the degradation of the target DNA fragment by the restriction enzyme, the restriction enzyme cutting site of the corresponding restriction enzyme needs to be avoided on the premise of keeping the synonymity of the codon and relatively high expression efficiency. Furthermore, given that a particular sequence may have a particular negative effect on protein expression, it may sometimes be desirable to remove specific sites based on that particular sequence in addition to the cleavage site.

Removing restriction enzyme sites from a DNA sequence during codon optimization can be regarded as a constrained optimization problem. The mandatory constraints are that codons may only be replaced by synonymous codons and that the sites to be avoided must be absent; the optimization goal is that the synonymous codons chosen for the sequence are those used with relatively high frequency in the DNA encoding ribosomal proteins.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a codon optimization method for a protein expression system, wherein codons in a DNA sequence encoding ribosomal proteins are counted based on the ribosomal proteins of a cell extract-derived species in the protein expression system, the relative frequency of each codon in synonymous codons corresponding to each amino acid residue in an amino acid sequence of the ribosomal proteins is obtained, the codon having the highest relative frequency is selected, and the codon is used as a codon of the same amino acid residue in the amino acid sequence of a target protein.

Further, the relative frequency is obtained by normalizing statistical data. The statistical data include the number of times each codon is used among the synonymous codons of each amino acid residue, and the relative frequency of a codon is the ratio of the number of times that codon is used to the sum of the numbers of times all of its synonymous codons are used.

Further, codons with a relative frequency of not more than 0.05 were deleted.
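The normalization and cutoff just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `relative_frequencies`, the `cutoff` parameter, and the use of the standard genetic code are assumptions, and the input is taken to be a list of in-frame coding sequences of ribosomal proteins.

```python
from collections import Counter, defaultdict

# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def relative_frequencies(coding_sequences, cutoff=0.05):
    """Rank synonymous codons per amino acid by relative frequency.

    Returns {amino_acid: [(codon, frequency), ...]} sorted best first,
    omitting codons whose relative frequency is not more than `cutoff`.
    """
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:          # skip codons with ambiguous bases
                counts[codon] += 1

    # Total usage of all synonymous codons of each amino acid.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    ranked = defaultdict(list)
    for codon, n in counts.items():
        amino_acid = CODON_TABLE[codon]
        freq = n / totals[amino_acid]         # normalize within the synonymous set
        if freq > cutoff:                     # delete codons at or below the cutoff
            ranked[amino_acid].append((codon, freq))
    for amino_acid in ranked:
        ranked[amino_acid].sort(key=lambda cf: (-cf[1], cf[0]))
    return dict(ranked)

# Toy input: 90 x TTG, 9 x TTA, 1 x CTA, all encoding leucine.
freqs = relative_frequencies(["TTG" * 90 + "TTA" * 9 + "CTA"])
print(freqs["L"])  # TTG (0.9) then TTA (0.09); CTA (0.01) is dropped
```

The first codon in each ranked list is the one that would be used for that amino acid in the target protein; the remaining codons are the fallbacks available when a site has to be avoided.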

Further, it is also identified whether the DNA sequence encoding the target protein contains a specific site that limits the expression of the target protein, and if so, the nucleotide sequence of the specific site is optimized.

Further, the specific site is a restriction enzyme cutting site of a restriction enzyme.

Further, the DNA sequence encoding the target protein is optimized as follows: a sequence R0 to be optimized, based on the target protein, is input, and if R0 is a DNA sequence it is translated into an amino acid sequence; for each amino acid residue in the amino acid sequence, the codon chosen from its synonymous codons is the one with the highest relative frequency among the synonymous codons of the same amino acid residue in the amino acid sequence of the ribosomal protein, and the chosen codons together form the optimized DNA sequence R1.
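The step from R0 to R1 can be sketched as below. The `BEST_CODON` table is read off the optimized sequences given in Examples 2 to 6 for Kluyveromyces lactis (the frequency table of Example 1 itself is not reproduced here); the function names and the explicit `is_dna` flag are assumptions made for illustration.

```python
# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Top-ranked codon per residue, as it appears in the optimized sequences of
# Examples 2 to 6 ("*" is the stop codon used in Example 4).
BEST_CODON = {
    "A": "GCT", "C": "TGT", "D": "GAC", "E": "GAA", "F": "TTC", "G": "GGT",
    "H": "CAC", "I": "ATC", "K": "AAG", "L": "TTG", "M": "ATG", "N": "AAC",
    "P": "CCA", "Q": "CAA", "R": "AGA", "S": "TCT", "T": "ACC", "V": "GTT",
    "W": "TGG", "Y": "TAC", "*": "TAA",
}

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def build_r1(r0, best_codon=BEST_CODON, is_dna=True):
    """Return R1: every residue of R0 encoded by its top-ranked codon."""
    protein = translate(r0.upper()) if is_dna else r0.upper()
    return "".join(best_codon[residue] for residue in protein)

print(build_r1("AACCTTTGGGAAACCCTC"))     # AACTTGTGGGAAACCTTG (Example 2, step 3)
print(build_r1("ACCCTAGGACTTTACTACCGA"))  # ACCTTGGGTTTGTACTACAGA (Example 3, step 3)
```

Because each residue is handled independently, this step is linear in the length of the protein; only the later site-avoidance step needs any search.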

Further, the DNA sequence encoding the target protein is optimized in a segmented manner.

Further, the length of each segment is m bases, where 6 ≤ m ≤ 300 and m is an integer multiple of 3.

Further, a set A of specific sites to be avoided is also input, the optimized DNA sequence R1 is divided into n segments, whether the specific sites subordinate to the set A exist in each segment is identified, and if the specific sites exist, the specific sites are optimized; the optimized sequences are combined to form an optimized DNA sequence R2.
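A minimal sketch of the segmented site-avoidance step is given below. Several points are assumptions rather than statements of the disclosed method: the toy `RANKED` table contains only codons that appear in Examples 2 and 3 (best codon first), candidate combinations are ordered by the sum of their per-codon ranks (a rule chosen because it reproduces the orderings listed in Example 3; the exact ordering rule is not spelled out in the text), and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Toy ranking, best codon first, limited to codons seen in Examples 2 and 3.
RANKED = {"N": ["AAC"], "W": ["TGG"], "E": ["GAA", "GAG"], "T": ["ACC"],
          "L": ["TTG", "TTA", "CTA"], "G": ["GGT"], "Y": ["TAC", "TAT"],
          "R": ["AGA"]}

def leftmost_site(seq, sites):
    """Return (start, site) for the leftmost avoided site in seq, or None."""
    hits = [(seq.find(s), s) for s in sites if s in seq]
    return min(hits) if hits else None

def ordered_synonyms(window, ranked):
    """Synonymous rewrites of a codon window, lowest rank sum first (assumed rule)."""
    codons = [window[i:i + 3] for i in range(0, len(window), 3)]
    choices = [list(enumerate(ranked[CODON_TABLE[c]])) for c in codons]
    combos = sorted(product(*choices), key=lambda combo: sum(r for r, _ in combo))
    return ["".join(codon for _, codon in combo) for combo in combos]

def is_clean(seq, lo, hi, candidate, sites):
    """True if no avoided site overlaps seq[lo:hi] once it is replaced."""
    for s in sites:
        k = len(s) - 1  # flanking bases that a site overlapping the window can reach
        if s in seq[max(0, lo - k):lo] + candidate + seq[hi:hi + k]:
            return False
    return True

def remove_sites(seq, sites, ranked):
    """Rewrite the codons around each avoided site until none remains."""
    while True:
        hit = leftmost_site(seq, sites)
        if hit is None:
            return seq
        start, site = hit
        lo = start - start % 3                  # start of the first codon touched
        hi = -(-(start + len(site)) // 3) * 3   # end of the last codon touched
        for candidate in ordered_synonyms(seq[lo:hi], ranked):
            if is_clean(seq, lo, hi, candidate, sites):
                seq = seq[:lo] + candidate + seq[hi:]
                break
        else:
            raise ValueError(f"cannot avoid {site} within codon window {seq[lo:hi]}")

def optimize_segmented(r1, sites, ranked, m=9):
    """Fix each m-base segment of R1, splice, then fix sites created at junctions."""
    if m % 3 or not 6 <= m <= 300:
        raise ValueError("m must be a multiple of 3 with 6 <= m <= 300")
    segments = [r1[i:i + m] for i in range(0, len(r1), m)]
    joined = "".join(remove_sites(seg, sites, ranked) for seg in segments)
    return remove_sites(joined, sites, ranked)  # R2

print(optimize_segmented("ACCTTGGGTTTGTACTACAGA", ["GTAC", "GGGTTT"], RANKED))
# ACCTTAGGTTTGTATTACAGA, the output of Example 3
```

Only the codons that overlap a site are enumerated, so the search space per fix is the product of a handful of synonymous-codon counts rather than of all of them. With the inputs of Example 2 (sites "CCC" and "GGG") the same function raises `ValueError`, matching the failed optimization reported there.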

In a second aspect, the present invention provides a protein expression system, comprising a cell extract and a DNA sequence encoding a target protein, wherein the DNA sequence encoding the target protein is optimized by the codon optimization method of the protein expression system according to any one of the first aspect.

Further, the source species of the cell extract is one of Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Pichia pastoris and Kluyveromyces.

Further, the Kluyveromyces is one of Kluyveromyces lactis, Kluyveromyces marxianus, Kluyveromyces dobzhanskii, Kluyveromyces aestuarii, Kluyveromyces nonfermentans, Kluyveromyces wickerhamii, Kluyveromyces thermotolerans, Kluyveromyces fragilis, Kluyveromyces hubeiensis, Kluyveromyces polysporus, Kluyveromyces siamensis, and Kluyveromyces yarrowii.

Compared with the prior art, the invention has the beneficial effects that:

1. The optimization method is derived from the codon preference of the DNA sequences encoding the ribosomal proteins of the species from which the cell extract used to express the target protein is obtained. This codon preference is quantified statistically and then transferred to the DNA sequence encoding the target protein, so that advantages similar to those of ribosomal proteins, namely high stability and high expression efficiency, can be obtained to a certain extent. The ribosome to which the ribosomal proteins belong is the "protein factory" through which the target protein is expressed, so the optimization starts from the biological mechanism of target protein expression, and using the optimized DNA sequence can effectively improve the expression efficiency and stability of the target protein.

2. Compared with global brute-force enumeration, segmented optimization greatly reduces the number of synonymous sequences that must be considered when optimizing each segment, greatly reduces the computing resources and time required, and improves operating efficiency.

3. Optimizing specific sites can destroy the restriction enzyme cutting sites of restriction enzymes, and the optimization can be more targeted, so that the calculation amount can be further reduced.

Drawings

FIG. 1 is a flow chart of codon preference selection in one embodiment of the method of the invention.

FIG. 2 is a flow chart of one embodiment of the method of the present invention.

FIG. 3 is a flow chart of segment optimization in one embodiment of the method of the present invention.

FIG. 4 is a comparison graph of expression levels before and after optimization in one embodiment of the expression system of the present invention.

FIG. 5 is a comparison graph of expression levels before and after optimization in another example of the expression system of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

Example 1:

Kluyveromyces lactis is a yeast commonly used in bioengineering and can be used for large-scale protein production. The codons used by its ribosomal protein coding DNA sequences are counted and normalized, codons with a relative frequency of less than 0.05 are removed, and the resulting codon relative frequencies are shown in the following table:

Example 2:

Referring to FIGS. 1-3, the input DNA sequence is AACCTTTGGGAAACCCTC, the sites to be avoided are the complementary strand pair "CCC" and "GGG", the target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ranked by the relative frequency of synonymous codons. The codon optimization procedure is as follows:

1. Retrieve the synonymous codon ordering obtained in Example 1.

2. Translate the input DNA sequence into the amino acid sequence "NLWETL" according to the codon table.

3. From the amino acid sequence and the codon ordering, generate the initial DNA sequence: AACTTGTGGGAAACCTTG.

4. Cut the initial DNA sequence into two segments, "AACTTGTGG" and "GAAACCTTG".

5. The first segment contains no site to be avoided, so it is used directly as the solution for that segment.

6. The second segment contains no site to be avoided, so it is used directly as the solution for that segment.

7. Splice the segments in order. The spliced result is "AACTTGTGGGAAACCTTG", which contains the site "GGG".

8. Perform a synonymous search on the codon combination involved, R4: "TGGGAA". The possible combinations are "TGGGAA" and "TGGGAG".

9. Neither codon combination avoids the "GGG" site, so the optimization fails.
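Two points in this example can be illustrated with a short Python sketch: "CCC" and "GGG" are reverse complements of each other, so listing both covers the same site on either strand, and every synonymous rewrite of "TGGGAA" still contains "GGG". The helper names are hypothetical, and the codon facts used (tryptophan is encoded only by TGG; glutamic acid by GAA or GAG) are those of the standard genetic code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(site):
    """Sequence of the opposite strand, read 5' to 3'."""
    return site.translate(COMPLEMENT)[::-1]

def both_strands(sites):
    """Extend a list of sites with their reverse complements."""
    return {s for site in sites for s in (site, reverse_complement(site))}

sites = both_strands(["CCC"])

# Trp has the single codon TGG; Glu is encoded by GAA or GAG.
candidates = ["TGG" + glu for glu in ("GAA", "GAG")]
blocked = all(any(site in cand for site in sites) for cand in candidates)

print(sorted(sites), candidates, blocked)
# ['CCC', 'GGG'] ['TGGGAA', 'TGGGAG'] True
```

Since both glutamic acid codons begin with G and tryptophan has no alternative codon, the junction "TGG" + "G.." cannot be rewritten synonymously without "GGG", which is why the optimization fails for this input.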

Example 3:

Referring to FIGS. 1-3, the input DNA sequence is ACCCTAGGACTTTACTACCGA, the sites to be avoided are "GTAC" and "GGGTTT", the target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ranked by the relative frequency of synonymous codons. The codon optimization procedure is as follows:

1. Retrieve the synonymous codon ordering obtained in Example 1.

2. Translate the input sequence into the amino acid sequence "TLGLYYR" according to the codon table.

3. From the amino acid sequence and the codon ordering, generate the initial DNA sequence: ACCTTGGGTTTGTACTACAGA.

4. Cut the initial DNA sequence into three segments: ACCTTGGGT, TTGTACTAC and AGA.

6. The second segment contains "GTAC"; the codon region R4 involved is "TTGTAC".

7. The enumeration algorithm generates the ordered synonymous codon combinations corresponding to TTGTAC: { "TTGTAC", "TTGTAT", "TTATAC", "TTATAT", "CTATAC", "CTATAT" }.

8. Check the sequence "TTGTAC": it does not satisfy the requirement of not containing "GTAC".

9. Check the sequence "TTGTAT": it satisfies the requirement.

10. The second segment is modified to "TTGTATTAC".

11. The last segment "AGA" contains no site to be avoided and is therefore spliced directly.

12. Splice the segments to give ACCTTGGGTTTGTATTACAGA. Verification shows that this sequence contains the site to be avoided, GGGTTT, and that the codon combination involved is "TTGGGTTTG".

13. The enumeration algorithm generates the synonymous codon combinations corresponding to TTGGGTTTG, the first few of which are: (1) TTGGGTTTA; (2) TTAGGTTTG.

14. Checking these one by one shows that TTAGGTTTG avoids the specified site.

15. Modify the final sequence to ACCTTAGGTTTGTATTACAGA.

16. The sequence of step 15 contains no site to be avoided, so the optimized sequence "ACCTTAGGTTTGTATTACAGA" is output.
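The result of this example can be checked against the two mandatory constraints, synonymity and absence of the sites to be avoided, with a small validator. This is an illustrative check assuming the standard genetic code; `translate` and `is_valid_result` are hypothetical helper names.

```python
# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def is_valid_result(original, optimized, sites):
    """True if both sequences encode the same protein and no site remains."""
    same_protein = translate(original) == translate(optimized)
    site_free = not any(site in optimized for site in sites)
    return same_protein and site_free

sites = ["GTAC", "GGGTTT"]
original = "ACCCTAGGACTTTACTACCGA"

print(is_valid_result(original, "ACCTTAGGTTTGTATTACAGA", sites))  # True  (step 16)
print(is_valid_result(original, "ACCTTGGGTTTGTATTACAGA", sites))  # False (step 12, contains GGGTTT)
```

Running such a check on the spliced sequence, and not only on the individual segments, is what exposes sites created at segment junctions, as in step 12.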

Example 4:

Referring to FIGS. 1-3, the input DNA sequence is TTCGGGACATGA, and there are no sites to be avoided. The target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ranked by the frequency of synonymous codons. The codon optimization procedure is as follows:

1. Retrieve the synonymous codon ordering obtained in Example 1.

2. Translate the input sequence into the amino acid sequence "FAT" according to the codon table.

3. From the amino acid sequence and the codon ordering, generate the initial DNA sequence TTCGCTACCTAA and output it directly.

Example 5:

Referring to FIGS. 1-3, the input DNA sequence is ACCCTAGGACTTTACTACCGA, and the site to be avoided is GGGTTTA. The target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ranked by the frequency of synonymous codons. The codon optimization procedure is as follows:

1. Retrieve the synonymous codon ordering obtained in Example 1.

5. None of the three segments contains a site to be avoided, so they are spliced directly in order to give: ACCTTGGGTTTGTACTACAGA.

6. The sequence from step 5 is checked and no site to be avoided is found, so ACCTTGGGTTTGTACTACAGA is output directly.

Example 6:

The change in the expression level of target protein E before and after sequence optimization with this scheme is compared. Protein E is a fluorescent protein that fluoresces on its own, and its fluorescence intensity can be read with a fluorescence instrument.

The nucleotide sequence of protein E before codon optimization, named EO, is as follows:

ATGattacagaaacatcatcaccgttcagatctatattctcccacagtgggaaaCACCACCATCACCACCACCATCACGGGAGCGGCGAGAACTTaTATTTCCAGGGATCCCGGAATGAATTCGGATCTCAATTCGAGCTCCGTCGACAAGCTgGCGGCCGCGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGCCACCAACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGGACGACGGCACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTTCAACAGCCACAACGTCTATATCACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG

The codon optimization steps of this sequence are as follows:

1. The amino acid sequence corresponding to the DNA is as follows:

MITETSSPFRSIFSHSGKHHHHHHHHGSGENLYFQGSRNEFGSQFELRRQAGGRVSKGEELFTGVVPILVELDGDVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFNSHNVYITADKQKNGIKANFKIRHNVEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSKLSKDPNEKRDHMVLLEFVTAAGITLGMDELYK

2. According to the results of Example 1, the corresponding synonymous codon is selected for each amino acid residue of the protein in turn; the resulting nucleotide sequence, named EXL, is as follows:

ATGATCACCGAAACCTCTTCTCCATTCAGATCTATCTTCTCTCACTCTGGTAAGCACCACCACCACCACCACCACCACGGTTCTGGTGAAAACTTGTACTTCCAAGGTTCTAGAAACGAATTCGGTTCTCAATTCGAATTGAGAAGACAAGCTGGTGGTAGAGTTTCTAAGGGTGAAGAATTGTTCACCGGTGTTGTTCCAATCTTGGTTGAATTGGACGGTGACGTTAACGGTCACAAGTTCTCTGTTAGAGGTGAAGGTGAAGGTGACGCTACCAACGGTAAGTTGACCTTGAAGTTCATCTGTACCACCGGTAAGTTGCCAGTTCCATGGCCAACCTTGGTTACCACCTTGACCTACGGTGTTCAATGTTTCTCTAGATACCCAGACCACATGAAGCAACACGACTTCTTCAAGTCTGCTATGCCAGAAGGTTACGTTCAAGAAAGAACCATCTCTTTCAAGGACGACGGTACCTACAAGACCAGAGCTGAAGTTAAGTTCGAAGGTGACACCTTGGTTAACAGAATCGAATTGAAGGGTATCGACTTCAAGGAAGACGGTAACATCTTGGGTCACAAGTTGGAATACAACTTCAACTCTCACAACGTTTACATCACCGCTGACAAGCAAAAGAACGGTATCAAGGCTAACTTCAAGATCAGACACAACGTTGAAGACGGTTCTGTTCAATTGGCTGACCACTACCAACAAAACACCCCAATCGGTGACGGTCCAGTTTTGTTGCCAGACAACCACTACTTGTCTACCCAATCTAAGTTGTCTAAGGACCCAAACGAAAAGAGAGACCACATGGTTTTGTTGGAATTCGTTACCGCTGCTGGTATCACCTTGGGTATGGACGAATTGTACAAG

3. Because there are no sites to be avoided, the sequence EXL above is the result of codon optimization.

After expression using the protein expression system, the fluorescence values of the products were measured in RFU, as shown in FIG. 4. The fluorescence value reflects the expression level of the EGFP protein. As can be seen from FIG. 4, the fluorescence of protein E after codon optimization (EXL) is significantly higher than that of protein E before codon optimization (EO). Because the fluorescence value is positively correlated with the amount of protein expressed, the expression level of protein E is significantly improved by codon optimization.

Example 7:

The expression level of protein L is judged from the electrophoretogram obtained after purification of protein L, and the expression levels before and after codon optimization are compared.

The sequence of protein L before optimization, denoted LO, is as follows:

ATGAACGTTATTGCTATTTTGAACCACATGGGCGTTTACTTCAAGGAAGAACCAATTAGAGAATTGCACAGAGCTTTGGAAAGATTGAACTTCCAAATTGTTTACCCAAACGACAGAGACGACTTGTTGAAGTTGATTGAAAACAACGCTAGATTGTGCGGCGTTATTTTCGACTGGGACAAGTACAACTTGGAATTGTGCGAAGAAATTTCTAAGATGAACGAAAACTTGCCATTGTACGCTTTCGCTAACACTTACTCTACTTTGGACGTTTCTTTGAACGACTTGAGATTGCAAATTTCTTTCTTCGAATACGCTTTGGGCGCTGCTGAAGACATTGCTAACAAGATTAAGCAAACTACTGACGAATACATTAACACTATTTTGCCACCATTGACTAAGGCTTTGTTCAAGTACGTTAGAGAAGGCAAGTACACTTTCTGCACTCCAGGCCACATGGGCGGCACTGCTTTCCAAAAGTCTCCAGTTGGCTCTTTGTTCTACGACTTCTTCGGCCCAAACACTATGAAGTCTGACATTTCTATTTCTGTTTCTGAATTGGGCTCTTTGTTGGACCACTCTGGCCCACACAAGGAAGCTGAACAATACATTGCTAGAGTTTTCAACGCTGACAGATCTTACATGGTTACTAACGGCACTTCTACTGCTAACAAGATTGTTGGCATGTACTCTGCTCCAGCTGGCTCTACTATTTTGATTGACAGAAACTGCCACAAGTCTTTGACTCACTTGATGATGATGTCTGACGTTACTCCAATTTACTTCAGACCAACTAGAAACGCTTACGGCATTTTGGGCGGCATTCCACAATCTGAATTCCAACACGCTACTATTGCTAAGAGAGTTAAGGAAACTCCAAACGCTACTTGGCCAGTTCACGCTGTTATTACTAACTCTACTTACGACGGCTTGTTGTACAACACTGACTTCATTAAGAAGACTTTGGACGTTAAGTCTATTCACTTCGACTCTGCTTGGGTTCCATACACTAACTTCTCTCCAATTTACGAAGGCAAGTGCGGCATGTCTGGCGGCAGAGTTGAAGGCAAGGTTATTTACGAAACTCAATCTACTCACAAGTTGTTGGCTGCTTTCTCTCAAGCTTCTATGATTCACGTTAAGGGCGACGTTAACGAAGAAACTTTCAACGAAGCTTACATGATGCACACTACTACTTCTCCACACTACGGCATTGTTGCTTCTACTGAAACTGCTGCTGCTATGATGAAGGGCAACGCTGGCAAGAGATTGATTAACGGCTCTATTGAAAGAGCTATTAAGTTCAGAAAGGAAATTAAGAGATTGAGAACTGAATCTGACGGCTGGTTCTTCGACGTTTGGCAACCAGACCACATTGACACTACTGAATGCTGGCCATTGAGATCTGACTCTACTTGGCACGGCTTCAAGAACATTGACAACGAACACATGTACTTGGACCCAATTAAGGTTACTTTGTTGACTCCAGGCATGGAAAAGGACGGCACTATGTCTGACTTCGGCATTCCAGCTTCTATTGTTGCTAAGTACTTGGACGAACACGGCATTGTTGTTGAAAAGACTGGCCCATACAACTTGTTGTTCTTGTTCTCTATTGGCATTGACAAGACTAAGGCTTTGTCTTTGTTGAGAGCTTTGACTGACTTCAAGAGAGCTTTCGACTTGAACTTGAGAGTTAAGAACATGTTGCCATCTTTGTACAGAGAAGACCCAGAATTCTACGAAAACATGAGAATTCAAGAATTGGCTCAAAACATTCACAAGTTGATTGTTCACCACAACTTGCCAGACTTGATGTACAGAGCTTTCGAAGTTTTGCCAACTATGGTTATGACTCCATACGCTGCTTTCCAAAAGGAATTGCACGGCATGACTGAAGAAGTTTACTTGGACGAAATGGTTGGCAGAATTAACGCTAACATGATTTTGCCATACCCACCAGGCGTTCCATTGGTTATGCCAGGCGAAATGATTACTGA
AGAATCTAGACCAGTTTTGGAATTCTTGCAAATGTTGTGCGAAATTGGCGCTCACTACCCAGGCTTCGAAACTGACATTCACGGCGCTTACAGACAAGCTGACGGCAGATACACTGTTAAGGTTTTGAAGGAAGAATCTAAGAAG

The codon optimization steps of this sequence are as follows:

1. The amino acid sequence corresponding to the DNA sequence is as follows:

MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIFDWDKYNLELCEEISKMNENLPLYAFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMGGTAFQKSPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYSAPAGSTILIDRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKKTLDVKSIHFDSAWVPYTNFSPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVNEETFNEAYMMHTTTSPHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDTTECWPLRSDSTWHGFKNIDNEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRAFDLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIVHHNLPDLMYRAFEVLPTMVMTPYAAFQKELHGMTEEVYLDEMVGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEESKK

2. According to the results of Example 1, the corresponding synonymous codon is selected for each amino acid residue of the protein in turn; the resulting nucleotide sequence, denoted LXL, is as follows:

ATGAACGTTATCGCTATCTTGAACCACATGGGTGTTTACTTCAAGGAAGAACCAATCAGAGAATTGCACAGAGCTTTGGAAAGATTGAACTTCCAAATCGTTTACCCAAACGACAGAGACGACTTGTTGAAGTTGATCGAAAACAACGCTAGATTGTGTGGTGTTATCTTCGACTGGGACAAGTACAACTTGGAATTGTGTGAAGAAATCTCTAAGATGAACGAAAACTTGCCATTGTACGCTTTCGCTAACACCTACTCTACCTTGGACGTTTCTTTGAACGACTTGAGATTGCAAATCTCTTTCTTCGAATACGCTTTGGGTGCTGCTGAAGACATCGCTAACAAGATCAAGCAAACCACCGACGAATACATCAACACCATCTTGCCACCATTGACCAAGGCTTTGTTCAAGTACGTTAGAGAAGGTAAGTACACCTTCTGTACCCCAGGTCACATGGGTGGTACCGCTTTCCAAAAGTCTCCAGTTGGTTCTTTGTTCTACGACTTCTTCGGTCCAAACACCATGAAGTCTGACATCTCTATCTCTGTTTCTGAATTGGGTTCTTTGTTGGACCACTCTGGTCCACACAAGGAAGCTGAACAATACATCGCTAGAGTTTTCAACGCTGACAGATCTTACATGGTTACCAACGGTACCTCTACCGCTAACAAGATCGTTGGTATGTACTCTGCTCCAGCTGGTTCTACCATCTTGATCGACAGAAACTGTCACAAGTCTTTGACCCACTTGATGATGATGTCTGACGTTACCCCAATCTACTTCAGACCAACCAGAAACGCTTACGGTATCTTGGGTGGTATCCCACAATCTGAATTCCAACACGCTACCATCGCTAAGAGAGTTAAGGAAACCCCAAACGCTACCTGGCCAGTTCACGCTGTTATCACCAACTCTACCTACGACGGTTTGTTGTACAACACCGACTTCATCAAGAAGACCTTGGACGTTAAGTCTATCCACTTCGACTCTGCTTGGGTTCCATACACCAACTTCTCTCCAATCTACGAAGGTAAGTGTGGTATGTCTGGTGGTAGAGTTGAAGGTAAGGTTATCTACGAAACCCAATCTACCCACAAGTTGTTGGCTGCTTTCTCTCAAGCTTCTATGATCCACGTTAAGGGTGACGTTAACGAAGAAACCTTCAACGAAGCTTACATGATGCACACCACCACCTCTCCACACTACGGTATCGTTGCTTCTACCGAAACCGCTGCTGCTATGATGAAGGGTAACGCTGGTAAGAGATTGATCAACGGTTCTATCGAAAGAGCTATCAAGTTCAGAAAGGAAATCAAGAGATTGAGAACCGAATCTGACGGTTGGTTCTTCGACGTTTGGCAACCAGACCACATCGACACCACCGAATGTTGGCCATTGAGATCTGACTCTACCTGGCACGGTTTCAAGAACATCGACAACGAACACATGTACTTGGACCCAATCAAGGTTACCTTGTTGACCCCAGGTATGGAAAAGGACGGTACCATGTCTGACTTCGGTATCCCAGCTTCTATCGTTGCTAAGTACTTGGACGAACACGGTATCGTTGTTGAAAAGACCGGTCCATACAACTTGTTGTTCTTGTTCTCTATCGGTATCGACAAGACCAAGGCTTTGTCTTTGTTGAGAGCTTTGACCGACTTCAAGAGAGCTTTCGACTTGAACTTGAGAGTTAAGAACATGTTGCCATCTTTGTACAGAGAAGACCCAGAATTCTACGAAAACATGAGAATCCAAGAATTGGCTCAAAACATCCACAAGTTGATCGTTCACCACAACTTGCCAGACTTGATGTACAGAGCTTTCGAAGTTTTGCCAACCATGGTTATGACCCCATACGCTGCTTTCCAAAAGGAATTGCACGGTATGACCGAAGAAGTTTACTTGGACGAAATGGTTGGTAGAATCAACGCTAACATGATCTTGCCATACCCACCAGGTGTTCCATTGGTTATGCCAGGTGAAATGATCACCGA
AGAATCTAGACCAGTTTTGGAATTCTTGCAAATGTTGTGTGAAATCGGTGCTCACTACCCAGGTTTCGAAACCGACATCCACGGTGCTTACAGACAAGCTGACGGTAGATACACCGTTAAGGTTTTGAAGGAAGAATCTAAGAAG

3. Because there are no sites to be avoided, the sequence LXL above is the result of codon optimization.

After expression in the in vitro protein expression system, affinity purification with nickel magnetic beads is carried out. Because protein L carries a His-tag at its C terminus, it binds specifically to nickel. The proteins eluted after purification were analyzed by SDS-PAGE and the expression levels of protein L were compared, as shown in FIG. 5; protein L has a size of 82.8 kDa and is indicated by the arrow in FIG. 5. The result shows that after codon optimization the expression level is clearly improved and the electrophoresis band of the target protein is clearly stronger.

Reference is made to the applicant's relevant prior patent documents for protein expression systems mentioned in the present invention, as if each document were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.

Claims

1. A codon optimization method for a protein expression system, characterized in that, based on the ribosomal proteins of the species from which the cell extract in the protein expression system is derived, the codons of the DNA sequence encoding the ribosomal proteins are counted to obtain the relative frequency of each codon among the synonymous codons corresponding to each amino acid residue in the amino acid sequence of the ribosomal proteins, the codon with the highest relative frequency is selected, and that codon is used as the codon of the same amino acid residue in the amino acid sequence of a target protein.

2. The codon optimization method for protein expression system according to claim 1, wherein the relative frequency is obtained by normalizing statistical data, the statistical data including the number of times each codon is used among the synonymous codons of each amino acid residue, and the relative frequency of a codon being the ratio of the number of times that codon is used to the sum of the numbers of times all of its synonymous codons are used.

3. The codon optimization method for protein expression system according to claim 2, wherein codons having a relative frequency of not more than 0.05 are deleted.

4. The codon optimization method for protein expression system according to claim 1, wherein it is further identified whether the DNA sequence encoding the target protein contains a specific site that limits the expression of the target protein, and if so, the nucleotide sequence of the specific site is optimized.

5. The codon optimization method for protein expression system according to claim 4, wherein the specific site is a cleavage site of a restriction enzyme.

6. The codon optimization method for protein expression system according to claim 1, wherein the DNA sequence encoding the target protein is optimized as follows: a sequence R0 to be optimized, based on the target protein, is input, and if R0 is a DNA sequence it is translated into an amino acid sequence; for each amino acid residue in the amino acid sequence, the codon chosen from its synonymous codons is the one with the highest relative frequency among the synonymous codons of the same amino acid residue in the amino acid sequence of the ribosomal protein, and the chosen codons together form the optimized DNA sequence R1.

7. The codon optimization method for protein expression system according to claim 6, wherein the DNA sequence encoding the target protein is optimized in a segmented manner.

8. The codon optimization method for protein expression system according to claim 7, wherein the length of each segment is m bases, where 6 ≤ m ≤ 300 and m is an integer multiple of 3.

9. The codon optimization method for protein expression system according to claim 8, wherein a set A of specific sites to be avoided is further inputted, the optimized DNA sequence R1 is divided into n segments, whether a specific site belonging to the set A exists in each segment is identified, and if so, the specific site is optimized; the optimized sequences are combined to form an optimized DNA sequence R2.

10. A protein expression system comprising a cell extract and a DNA sequence encoding a target protein, wherein the DNA sequence encoding the target protein is optimized by the codon optimization method for the protein expression system according to any one of claims 1 to 9.

11. The protein expression system of claim 10, wherein the cell extract is derived from one of Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Pichia pastoris, and Kluyveromyces.

12. The protein expression system of claim 11, wherein the Kluyveromyces is one of Kluyveromyces lactis, Kluyveromyces marxianus, Kluyveromyces dobzhanskii, Kluyveromyces aestuarii, Kluyveromyces nonfermentans, Kluyveromyces wickerhamii, Kluyveromyces thermotolerans, Kluyveromyces fragilis, Kluyveromyces hubeiensis, Kluyveromyces polysporus, Kluyveromyces siamensis, and Kluyveromyces yarrowii.