CN114360645A - Codon optimization method of protein expression system and protein expression system - Google Patents


Publication number
CN114360645A
Authority
CN
China
Prior art keywords
codon
expression system
amino acid
kluyveromyces
protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111673482.3A
Other languages
Chinese (zh)
Inventor
郭敏
熊亮
周伟峰
徐丽琼
徐秀珍
唐磊
曹平生
于雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangma Healthcode Shanghai Biotech Co Ltd
Original Assignee
Kangma Healthcode Shanghai Biotech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangma Healthcode Shanghai Biotech Co Ltd
Priority to CN202111673482.3A
Publication of CN114360645A
Priority to CN202211060298.6A (CN116417065A)
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • Theoretical Computer Science (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

The codon optimization method is based on the ribosomal proteins of the species from which the cell extract in the protein expression system is derived. The codons of the DNA sequences encoding those ribosomal proteins are counted to obtain, for each amino acid residue in the amino acid sequences of the ribosomal proteins, the relative frequency of each codon among its synonymous codons. The codon with the highest relative frequency is then selected and used as the codon of the same amino acid residue in the amino acid sequence of a target protein. Using few computing resources, the method can rapidly obtain a DNA sequence that contains no specified sites to be avoided and has higher protein expression efficiency than the DNA sequence before optimization.

Description

Codon optimization method of protein expression system and protein expression system
Technical Field
The invention belongs to the technical field of biosynthesis, and particularly relates to a codon optimization method of a protein expression system and the protein expression system.
Background
Codon optimization is an operation of changing the DNA coding sequence of the target protein to be expressed so as to improve the expression amount and/or expression activity of the target protein in an expression system.
The main factors to be considered in codon optimization are the physicochemical properties of the DNA and of the mRNA transcribed from it, the codon preference of the protein expression system, and the two-dimensional and three-dimensional structure of the target protein. The main parameters considered by codon optimization methods in common use for protein expression systems include the codon bias of genes in the host cell, the codon-pair (duplex codon) bias of the host cell, the tRNA copy number of the host cell, GC content, and mRNA secondary structure.
Theoretically, the number of DNA sequences that can be used to express the same target protein is very large due to the existence of synonymous codons, and this number increases exponentially as the length of the protein amino acid sequence increases. For example, assume that the amino acid sequence of the target protein to be expressed is $a_1 a_2 \ldots a_n$ and that the number of synonymous codons corresponding to the $m$-th amino acid residue ($m$ is a natural number, $1 \le m \le n$) is $x_m$. Then the number of distinct DNA coding sequences corresponding to the protein amino acid sequence is:
$$\prod_{m=1}^{n} x_m = x_1 x_2 \cdots x_n$$
For example, consider the following sequence of 40 amino acid residues:
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV
According to the following list of the 20 natural amino acids and the codon table giving the codons of each amino acid,
List of 20 natural amino acids
[Table image: Figures BDA0003453679520000012 and BDA0003453679520000021]
Codon table
[Table image: Figure BDA0003453679520000022]
the number of synonymous codons for each amino acid residue is shown in the following table:
[Table image: Figures BDA0003453679520000023 and BDA0003453679520000031]
Thus, if the protein expression system is a eukaryotic cell, the number of possible corresponding DNA sequences is:
2*4*2*2*6*2*2*2*2*2*2*4*2*2*2*2*6*4*2*2*4*2*2*4*4*2*2*2*4*2*3*3*4*6*1*4*4*4*4*4=273,593,677,362,757,632.
For proteins comprising more amino acid residues, this value increases exponentially. Therefore, if codon optimization is solved by brute-force enumeration, the sheer number of synonymous DNA sequences makes the computation consume too many resources and too much time. Such a method also generally optimizes the DNA sequence encoding the target protein directly, and deciding which DNA sequence to use for protein expression still requires a large number of expression experiments with different DNA sequences in order to select one with higher expression efficiency and stability, so the workload is huge. In addition, the method does not take into account other adverse factors that affect the efficiency of protein expression.
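The size of this search space can be checked with a few lines of code. The sketch below is only an illustration: the list `counts` copies, factor by factor, the per-residue numbers written out in the product above, and the function name `count_coding_sequences` is a hypothetical helper, not something defined in the patent.

```python
from math import prod

# Per-residue synonymous-codon counts x_1 ... x_40, copied in order from the
# product written out in the text for the 40-residue example peptide.
counts = [2, 4, 2, 2, 6, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 6, 4, 2, 2,
          4, 2, 2, 4, 4, 2, 2, 2, 4, 2, 3, 3, 4, 6, 1, 4, 4, 4, 4, 4]

def count_coding_sequences(x):
    """Number of distinct DNA coding sequences: the product of x_m over all residues."""
    return prod(x)

total = count_coding_sequences(counts)
print(f"{total:,}")  # 273,593,677,362,757,632
```

Even for this 40-residue peptide the count is far beyond what can be enumerated and tested, which is the motivation for avoiding global enumeration.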
Disclosure of Invention
The invention aims to overcome the shortcomings of existing codon optimization methods by basing the optimization on a DNA coding sequence other than that of the target protein itself, and by using an optimization method that differs from brute-force enumeration, so as to improve both the optimization efficiency and the expression efficiency of the optimized target protein.
Ribosomes are specialized organelles composed of ribosomal RNA and ribosomal proteins, and they play a critical role in the translation of mRNA into protein. Research shows that the expression level of ribosomal proteins is important for the normal function of an organism. Given the importance of ribosomal proteins, the DNA encoding them is subject to strong selection pressure, which drives its evolution toward high stability and high expression efficiency. It is therefore inferred that optimizing the DNA sequence encoding the target protein according to the codon bias of the DNA sequences encoding the ribosomal proteins of the target organism will greatly increase the expression level and/or expression activity of the target protein both inside and outside the target organism.
In actual protein expression, whether in a cell-based expression system or in a cell-free expression system, it is necessary to prepare a DNA encoding the target protein. To prevent the target DNA fragment from being degraded by a restriction enzyme, the restriction sites of that enzyme need to be avoided while keeping the codons synonymous and the expression efficiency relatively high. Furthermore, because a particular sequence may have a particular negative effect on protein expression, it may sometimes be desirable to remove specific sites defined by that sequence in addition to the restriction sites.
The problem of removing restriction sites from a DNA sequence during codon optimization can be regarded as a constrained optimization problem. The mandatory constraints are that codons remain synonymous and that the sites to be avoided are absent; the optimization goal is that the synonymous codons chosen for the sequence are those used with relatively high frequency in the ribosomal proteins.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a codon optimization method for a protein expression system. Based on the ribosomal proteins of the species from which the cell extract in the protein expression system is derived, the codons in the DNA sequences encoding the ribosomal proteins are counted to obtain the relative frequency of each codon among the synonymous codons corresponding to each amino acid residue in the amino acid sequences of the ribosomal proteins; the codon having the highest relative frequency is selected and used as the codon of the same amino acid residue in the amino acid sequence of a target protein.
Further, the relative frequency is obtained by normalizing statistical data. The statistical data include the number of times each codon is used among the synonymous codons of each amino acid residue, and the relative frequency of a codon is the ratio of the number of times that codon is used to the sum of the numbers of times all of its synonymous codons are used.
Further, codons with a relative frequency of not more than 0.05 are removed.
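The counting, normalization and cutoff described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the standard genetic code is assumed for grouping synonymous codons, the function name `relative_frequencies` is hypothetical, the input sequences are assumed to be clean in-frame coding sequences containing only A, C, G and T, and the two toy sequences are made up for demonstration rather than taken from any ribosomal protein gene.

```python
from collections import Counter, defaultdict

# Standard genetic code (assumption), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def relative_frequencies(cds_list, cutoff=0.05):
    """Per amino acid, list (codon, relative frequency) sorted from most to least used.

    Codons whose relative frequency is not more than `cutoff` are removed.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            counts[cds[i:i + 3]] += 1
    by_aa = defaultdict(dict)
    for codon, n in counts.items():
        by_aa[CODON_TABLE[codon]][codon] = n
    result = {}
    for aa, usage in by_aa.items():
        total = sum(usage.values())  # sum over all synonymous codons of this residue
        kept = {c: n / total for c, n in usage.items() if n / total > cutoff}
        result[aa] = sorted(kept.items(), key=lambda cf: cf[1], reverse=True)
    return result

# Toy input (made up): Leu is encoded 3x by TTG and 1x by TTA, Ala 2x by GCT and 1x by GCC.
toy = relative_frequencies(["TTGTTGTTGTTA", "GCTGCTGCC"])
print(toy["L"])  # [('TTG', 0.75), ('TTA', 0.25)]
```

The first entry of each list is the codon that would be chosen for that residue, and the remaining entries give the fallback order used when a site has to be avoided.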
Further, it is also identified whether a specific site that restricts the expression of the target protein is present in the DNA sequence encoding the target protein, and if present, the nucleotide sequence of the specific site is optimized.
Further, the specific site is a restriction site of a restriction enzyme.
Further, the optimization procedure for the DNA sequence encoding the target protein is as follows: a sequence R0 of the target protein to be optimized is input, and if R0 is a DNA sequence it is translated into an amino acid sequence; for each amino acid residue in the amino acid sequence, the synonymous codon that has the highest relative frequency for the same amino acid residue in the amino acid sequences of the ribosomal proteins is selected, and the selected codons make up the optimized DNA sequence R1.
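The R0 to R1 step can be sketched as below. Several things here are assumptions for illustration: the standard genetic code is used for translation, the names `PREFERRED`, `translate` and `optimize` are hypothetical, and the preferred codon of each residue is read off the optimized sequences printed in Examples 2 to 7 rather than from the frequency table of Example 1, which is only available as an image. Whether R0 is DNA or protein is passed explicitly instead of being guessed.

```python
# Standard genetic code (assumption), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Highest-ranked codon per residue, inferred from the optimized sequences in
# the examples (Kluyveromyces lactis); "*" is the stop codon.
PREFERRED = {
    "A": "GCT", "C": "TGT", "D": "GAC", "E": "GAA", "F": "TTC",
    "G": "GGT", "H": "CAC", "I": "ATC", "K": "AAG", "L": "TTG",
    "M": "ATG", "N": "AAC", "P": "CCA", "Q": "CAA", "R": "AGA",
    "S": "TCT", "T": "ACC", "V": "GTT", "W": "TGG", "Y": "TAC",
    "*": "TAA",
}

def translate(dna):
    """Translate an in-frame DNA sequence codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def optimize(r0, is_dna=True):
    """Return R1: every residue encoded by its highest-frequency codon."""
    protein = translate(r0) if is_dna else r0.upper()
    return "".join(PREFERRED[aa] for aa in protein)

print(optimize("ACCCTAGGACTTTACTACCGA"))  # ACCTTGGGTTTGTACTACAGA
```

The printed result matches the initial DNA sequence of step 3 in Example 3. Because each residue is handled independently, this step needs one table lookup per residue instead of a search over all synonymous sequences.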
Further, the DNA sequence encoding the target protein is optimized segment by segment.
Further, the length of each segment is m bases, where 6 ≤ m ≤ 300 and m is an integral multiple of 3.
Further, a set A of specific sites to be avoided is also input. The optimized DNA sequence R1 is divided into n segments, each segment is checked for specific sites belonging to set A, and any such site that is found is optimized; the optimized segments are then combined to form an optimized DNA sequence R2.
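The segmentation and site check can be sketched as follows. The function names `split_segments`, `find_sites` and `codon_window` are hypothetical, and the sketch assumes that the final segment may be shorter than m when the sequence length is not a multiple of m. The data are those of Example 2: each segment is clean on its own, but splicing creates a site across the segment boundary, so the spliced sequence has to be checked again.

```python
def split_segments(r1, m=9):
    """Cut R1 into segments of m bases (6 <= m <= 300, m a multiple of 3)."""
    if m % 3 or not 6 <= m <= 300:
        raise ValueError("m must be a multiple of 3 between 6 and 300")
    return [r1[i:i + m] for i in range(0, len(r1), m)]

def find_sites(seq, sites):
    """Return sorted (start, site) pairs for every occurrence of a site in seq."""
    hits = []
    for site in sites:
        start = seq.find(site)
        while start != -1:
            hits.append((start, site))
            start = seq.find(site, start + 1)
    return sorted(hits)

def codon_window(seq, start, site):
    """Smallest run of whole codons that covers the site found at `start`."""
    lo = start - start % 3
    hi = start + len(site)
    hi += -hi % 3  # round up to the next codon boundary
    return lo, seq[lo:hi]

A = ["CCC", "GGG"]  # set A of sites to be avoided (Example 2)
segments = split_segments("AACTTGTGGGAAACCTTG", m=9)
spliced = "".join(segments)
hits = find_sites(spliced, A)
print(segments)                         # ['AACTTGTGG', 'GAAACCTTG']
print(hits)                             # [(7, 'GGG')]
print(codon_window(spliced, *hits[0]))  # (6, 'TGGGAA')
```

Restricting the later synonymous search to the codon window around a hit is what keeps the number of combinations to enumerate small.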
In a second aspect, the present invention provides a protein expression system comprising a cell extract and a DNA sequence encoding a target protein, wherein the DNA sequence encoding the target protein is optimized by the codon optimization method of the protein expression system according to any one of the first aspect.
Further, the source species of the cell extract is one of Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Pichia pastoris and Kluyveromyces.
Further, the Kluyveromyces is one of Kluyveromyces lactis, Kluyveromyces marxianus, Kluyveromyces polybuvinus, Kluyveromyces hainanensis, Kluyveromyces nonfermentans, Kluyveromyces williamsii, Kluyveromyces thermotolerans, Kluyveromyces fragilis, Kluyveromyces hunanensis, Kluyveromyces polyspora, Kluyveromyces siamensis, and Kluyveromyces aureoides.
Compared with the prior art, the invention has the following beneficial effects:
1. The optimization method derives from the codon preference of the DNA sequences encoding the ribosomal proteins of the species from which the cell extract used to express the target protein is derived. That codon preference is quantified statistically and then transferred to the DNA sequence encoding the target protein, so that the optimized sequence gains, to a certain extent, high stability and high expression efficiency similar to those of the ribosomal proteins. The ribosome to which the ribosomal proteins belong is the "protein factory" through which the target protein is expressed; the optimization therefore starts from the biological mechanism of target protein expression, and using the optimized DNA sequence can effectively improve the expression efficiency and stability of the target protein.
2. Compared with global brute-force enumeration, segmented optimization greatly reduces the number of synonymous sequences that must be considered when optimizing each segment, greatly reduces the resources and time required for computation, and improves operating efficiency.
3. Optimizing only the specific sites disrupts the restriction sites of restriction enzymes and makes the optimization more targeted, which further reduces the amount of computation.
Drawings
FIG. 1 is a flow chart of codon preference selection in one embodiment of the method of the invention.
FIG. 2 is a flow chart of one embodiment of the method of the present invention.
FIG. 3 is a flow chart of segment optimization in one embodiment of the method of the present invention.
FIG. 4 is a comparison graph of expression levels before and after optimization in one embodiment of the expression system of the present invention.
FIG. 5 is a comparison graph of expression levels before and after optimization in another example of the expression system of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Example 1:
Kluyveromyces lactis is a yeast commonly used in bioengineering and can be used for large-scale production of proteins. The codons used by its ribosomal-protein-coding DNA sequences were counted and normalized, codons with a relative frequency of less than 0.05 were removed, and the resulting codon relative frequencies are shown in the following table:
[Table image: Figures BDA0003453679520000051, BDA0003453679520000061 and BDA0003453679520000071; codon relative frequencies]
Example 2:
Referring to FIGS. 1-3, the input DNA sequence is AACCTTTGGGAAACCCTC, the sites to be avoided are "CCC" and "GGG" (the two complementary strands of the same site), the target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ordered by the relative frequency of synonymous codons. The codon optimization procedure is as follows:
1 retrieve the synonymous codon ordering obtained in Example 1 above;
2 translate the input DNA sequence into the amino acid sequence "NLWETL" according to the codon table;
3 according to the amino acid sequence and the codon ordering, generate the initial DNA sequence: AACTTGTGGGAAACCTTG;
4 cut the initial DNA sequence into two segments, "AACTTGTGG" and "GAAACCTTG";
5 the first segment contains no site to be avoided and is therefore used directly as the solution for that segment;
6 the second segment contains no site to be avoided and is therefore used directly as the solution for that segment;
7 splice the segments in order; the spliced result is "AACTTGTGGGAAACCTTG", which contains the site GGG;
8 perform a synonymous search on the codon combination involved, R4: "TGGGAA"; the possible combinations are "TGGGAA" and "TGGGAG";
9 neither of the two codon combinations avoids the "GGG" site, and therefore the optimization fails.
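The failure in steps 8 and 9 can be reproduced with a small exhaustive search over the codon window. The function name `first_clean_combination` is hypothetical; the codon choices are the ones implied by step 8 (TGG is the only codon for tryptophan, and glutamic acid has GAA and GAG).

```python
from itertools import product

def first_clean_combination(codon_choices, sites):
    """Return the first synonymous combination free of all sites, or None."""
    for combo in product(*codon_choices):
        candidate = "".join(combo)
        if not any(site in candidate for site in sites):
            return candidate
    return None

# Window R4 = "TGGGAA": Trp has a single codon, Glu has two.
choices = [["TGG"], ["GAA", "GAG"]]
result = first_clean_combination(choices, ["CCC", "GGG"])
print(result)  # None
```

Returning `None` corresponds to step 9: the site "GGG" is forced by the amino acid sequence itself (Trp followed by Glu), so no synonymous substitution can remove it.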
Example 3:
Referring to FIGS. 1-3, the input DNA sequence is ACCCTAGGACTTTACTACCGA, the sites to be avoided are "GTAC" and "GGGTTT", the target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ordered by the relative frequency of synonymous codons. The codon optimization procedure is as follows:
1 retrieve the synonymous codon ordering obtained in Example 1 above;
2 translate the input sequence into the amino acid sequence "TLGLYYR" according to the codon table;
3 according to the amino acid sequence and the codon ordering, generate the initial DNA sequence: ACCTTGGGTTTGTACTACAGA;
4 cut the initial DNA sequence into three segments: ACCTTGGGT, TTGTACTAC and AGA;
5 the first segment contains no site to be avoided and is therefore used directly as the solution for that segment;
6 the second segment contains "GTAC"; the codon region R4 involved is "TTGTAC";
7 use the enumeration algorithm to generate the ordered combinations of synonymous codons corresponding to TTGTAC: { "TTGTAC", "TTGTAT", "TTATAC", "TTATAT", "CTATAC", "CTATAT" };
8 check the sequence "TTGTAC": it does not satisfy the requirement of not containing "GTAC";
9 check the sequence "TTGTAT": it satisfies the requirement;
10 the second segment is modified to "TTGTATTAC";
11 the last segment "AGA" contains no site to be avoided and is therefore spliced directly;
12 splice the segments to obtain ACCTTGGGTTTGTATTACAGA; verification finds that this sequence contains the site GGGTTT to be avoided, and the codon combination involved is "TTGGGTTTG";
13 the enumeration algorithm generates the synonymous codon combinations corresponding to TTGGGTTTG, the first few of which are: (1) TTGGGTTTA; (2) TTAGGTTTG;
14 checking them one by one finds that TTAGGTTTG avoids the specified site;
15 modify the final sequence to ACCTTAGGTTTGTATTACAGA;
16 the sequence of step 15 contains no site to be avoided, so the optimized sequence "ACCTTAGGTTTGTATTACAGA" is output.
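Steps 6 to 12 and the final check of step 16 can be traced in code. The codon orders for leucine and tyrosine below are taken from the enumeration listed in step 7; treating that enumeration as a plain Cartesian product in rank order is an assumption that happens to reproduce the listed order. The second repair (steps 13 to 15) is not re-derived here because the full ranked codon table of Example 1 is only available as an image, so the final sequence is simply checked.

```python
from itertools import product

SITES = ["GTAC", "GGGTTT"]        # sites to be avoided in Example 3
L_CODONS = ["TTG", "TTA", "CTA"]  # Leu codons in the order implied by step 7
Y_CODONS = ["TAC", "TAT"]         # Tyr codons in the order implied by step 7

def has_site(seq):
    """True if seq contains any site to be avoided."""
    return any(site in seq for site in SITES)

segments = ["ACCTTGGGT", "TTGTACTAC", "AGA"]

# Step 7: ordered synonymous combinations for the window "TTGTAC".
candidates = ["".join(c) for c in product(L_CODONS, Y_CODONS)]
# Steps 8-9: take the first candidate that no longer contains "GTAC".
repaired = next(c for c in candidates if "GTAC" not in c)
# Step 10: put the repaired window back into the second segment.
segments[1] = repaired + segments[1][6:]
# Step 12: splice and re-check, because sites can span segment boundaries.
spliced = "".join(segments)
junction_hit = "GGGTTT" in spliced

final = "ACCTTAGGTTTGTATTACAGA"   # sequence reported in step 15
print(candidates)
print(repaired, spliced, junction_hit, has_site(final))
```

The run shows why the check after splicing is needed: the second segment is clean after its local repair, yet the spliced sequence still contains GGGTTT across the boundary between the first two segments.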
Example 4:
Referring to FIGS. 1-3, the input DNA sequence is TTCGGGACATGA, and there are no sites to be avoided. The target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ordered by the relative frequency of synonymous codons. The codon optimization procedure is as follows:
1 retrieve the synonymous codon ordering obtained in Example 1 above;
2 translate the input sequence into the amino acid sequence "FAT" according to the codon table;
3 according to the amino acid sequence and the codon ordering, generate the initial DNA sequence TTCGCTACCTAA and output it directly.
Example 5:
Referring to FIGS. 1-3, the input DNA sequence is ACCCTAGGACTTTACTACCGA, and the site to be avoided is GGGTTTA. The target protein expression system is derived from Kluyveromyces lactis, the DNA sequence is divided into segments of 9 characters, and codons are ordered by the relative frequency of synonymous codons. The codon optimization procedure is as follows:
1 retrieve the synonymous codon ordering obtained in Example 1 above;
2 translate the input sequence into the amino acid sequence "TLGLYYR" according to the codon table;
3 according to the amino acid sequence and the codon ordering, generate the initial DNA sequence: ACCTTGGGTTTGTACTACAGA;
4 cut the initial DNA sequence into three segments: ACCTTGGGT, TTGTACTAC and AGA;
5 none of the three segments contains a site to be avoided, so they are spliced directly in order to obtain ACCTTGGGTTTGTACTACAGA;
6 the sequence from step 5 is checked and no site to be avoided is found, so ACCTTGGGTTTGTACTACAGA is output directly.
Example 6:
This example compares the expression level of target protein E before and after sequence optimization by the present scheme. Protein E is a fluorescent protein that fluoresces on its own, and its fluorescence intensity can be read with a fluorescence reader.
The nucleotide sequence of protein E before codon optimization is as follows; this sequence is named EO:
ATGattacagaaacatcatcaccgttcagatctatattctcccacagtgggaaaCACCACCATCACCACCACCATCACGGGAGCGGCGAGAACTTaTATTTCCAGGGATCCCGGAATGAATTCGGATCTCAATTCGAGCTCCGTCGACAAGCTgGCGGCCGCGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGCCACCAACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGGACGACGGCACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTTCAACAGCCACAACGTCTATATCACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAG
The codon optimization steps for this sequence are as follows:
1. The amino acid sequence corresponding to the DNA is as follows:
MITETSSPFRSIFSHSGKHHHHHHHHGSGENLYFQGSRNEFGSQFELRRQAGGRVSKGEELFTGVVPILVELDGDVNGHKFSVRGEGEGDATNGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTISFKDDGTYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNFNSHNVYITADKQKNGIKANFKIRHNVEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSKLSKDPNEKRDHMVLLEFVTAAGITLGMDELYK
2. According to the results of Example 1, the corresponding synonymous codon is selected for each amino acid residue of the protein in turn. The resulting nucleotide sequence is as follows, and the optimized sequence is named EXL:
ATGATCACCGAAACCTCTTCTCCATTCAGATCTATCTTCTCTCACTCTGGTAAGCACCACCACCACCACCACCACCACGGTTCTGGTGAAAACTTGTACTTCCAAGGTTCTAGAAACGAATTCGGTTCTCAATTCGAATTGAGAAGACAAGCTGGTGGTAGAGTTTCTAAGGGTGAAGAATTGTTCACCGGTGTTGTTCCAATCTTGGTTGAATTGGACGGTGACGTTAACGGTCACAAGTTCTCTGTTAGAGGTGAAGGTGAAGGTGACGCTACCAACGGTAAGTTGACCTTGAAGTTCATCTGTACCACCGGTAAGTTGCCAGTTCCATGGCCAACCTTGGTTACCACCTTGACCTACGGTGTTCAATGTTTCTCTAGATACCCAGACCACATGAAGCAACACGACTTCTTCAAGTCTGCTATGCCAGAAGGTTACGTTCAAGAAAGAACCATCTCTTTCAAGGACGACGGTACCTACAAGACCAGAGCTGAAGTTAAGTTCGAAGGTGACACCTTGGTTAACAGAATCGAATTGAAGGGTATCGACTTCAAGGAAGACGGTAACATCTTGGGTCACAAGTTGGAATACAACTTCAACTCTCACAACGTTTACATCACCGCTGACAAGCAAAAGAACGGTATCAAGGCTAACTTCAAGATCAGACACAACGTTGAAGACGGTTCTGTTCAATTGGCTGACCACTACCAACAAAACACCCCAATCGGTGACGGTCCAGTTTTGTTGCCAGACAACCACTACTTGTCTACCCAATCTAAGTTGTCTAAGGACCCAAACGAAAAGAGAGACCACATGGTTTTGTTGGAATTCGTTACCGCTGCTGGTATCACCTTGGGTATGGACGAATTGTACAAG
3. Because there are no sites to be avoided, the above sequence named EXL is the result of codon optimization.
After expression using the protein expression system, the fluorescence values of the products were measured in RFU, as shown in FIG. 4. The fluorescence value represents the expression level of the EGFP protein. As can be seen from FIG. 4, the fluorescence of protein E after codon optimization (EXL) is significantly higher than that of protein E before codon optimization (EO). Because the fluorescence value is positively correlated with the amount of protein expressed, the expression level of protein E is significantly improved after codon optimization.
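A quick sanity check on any optimized sequence is that it still encodes the same protein. The sketch below assumes the standard genetic code and uses only the first 30 bases of EO and EXL as printed above (the full sequences are omitted for brevity); the names `translate`, `eo_prefix` and `exl_prefix` are hypothetical.

```python
# Standard genetic code (assumption), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna):
    """Translate an in-frame DNA sequence; lower-case input is accepted."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

eo_prefix = "ATGattacagaaacatcatcaccgttcaga"   # first 10 codons of EO
exl_prefix = "ATGATCACCGAAACCTCTTCTCCATTCAGA"  # first 10 codons of EXL

same_protein = translate(eo_prefix) == translate(exl_prefix)
changed = sum(eo_prefix.upper()[i:i + 3] != exl_prefix[i:i + 3]
              for i in range(0, len(exl_prefix), 3))
print(translate(exl_prefix), same_protein, changed)  # MITETSSPFR True 6
```

Six of the first ten codons are replaced while the encoded residues, MITETSSPFR, stay the same, which is the synonymity constraint the method has to preserve.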
Example 7:
The expression level of protein L was judged from the electrophoretogram obtained after purification of protein L, and the expression levels of protein L before and after codon optimization were compared.
The sequence of protein L before optimization is as follows and is denoted LO:
ATGAACGTTATTGCTATTTTGAACCACATGGGCGTTTACTTCAAGGAAGAACCAATTAGAGAATTGCACAGAGCTTTGGAAAGATTGAACTTCCAAATTGTTTACCCAAACGACAGAGACGACTTGTTGAAGTTGATTGAAAACAACGCTAGATTGTGCGGCGTTATTTTCGACTGGGACAAGTACAACTTGGAATTGTGCGAAGAAATTTCTAAGATGAACGAAAACTTGCCATTGTACGCTTTCGCTAACACTTACTCTACTTTGGACGTTTCTTTGAACGACTTGAGATTGCAAATTTCTTTCTTCGAATACGCTTTGGGCGCTGCTGAAGACATTGCTAACAAGATTAAGCAAACTACTGACGAATACATTAACACTATTTTGCCACCATTGACTAAGGCTTTGTTCAAGTACGTTAGAGAAGGCAAGTACACTTTCTGCACTCCAGGCCACATGGGCGGCACTGCTTTCCAAAAGTCTCCAGTTGGCTCTTTGTTCTACGACTTCTTCGGCCCAAACACTATGAAGTCTGACATTTCTATTTCTGTTTCTGAATTGGGCTCTTTGTTGGACCACTCTGGCCCACACAAGGAAGCTGAACAATACATTGCTAGAGTTTTCAACGCTGACAGATCTTACATGGTTACTAACGGCACTTCTACTGCTAACAAGATTGTTGGCATGTACTCTGCTCCAGCTGGCTCTACTATTTTGATTGACAGAAACTGCCACAAGTCTTTGACTCACTTGATGATGATGTCTGACGTTACTCCAATTTACTTCAGACCAACTAGAAACGCTTACGGCATTTTGGGCGGCATTCCACAATCTGAATTCCAACACGCTACTATTGCTAAGAGAGTTAAGGAAACTCCAAACGCTACTTGGCCAGTTCACGCTGTTATTACTAACTCTACTTACGACGGCTTGTTGTACAACACTGACTTCATTAAGAAGACTTTGGACGTTAAGTCTATTCACTTCGACTCTGCTTGGGTTCCATACACTAACTTCTCTCCAATTTACGAAGGCAAGTGCGGCATGTCTGGCGGCAGAGTTGAAGGCAAGGTTATTTACGAAACTCAATCTACTCACAAGTTGTTGGCTGCTTTCTCTCAAGCTTCTATGATTCACGTTAAGGGCGACGTTAACGAAGAAACTTTCAACGAAGCTTACATGATGCACACTACTACTTCTCCACACTACGGCATTGTTGCTTCTACTGAAACTGCTGCTGCTATGATGAAGGGCAACGCTGGCAAGAGATTGATTAACGGCTCTATTGAAAGAGCTATTAAGTTCAGAAAGGAAATTAAGAGATTGAGAACTGAATCTGACGGCTGGTTCTTCGACGTTTGGCAACCAGACCACATTGACACTACTGAATGCTGGCCATTGAGATCTGACTCTACTTGGCACGGCTTCAAGAACATTGACAACGAACACATGTACTTGGACCCAATTAAGGTTACTTTGTTGACTCCAGGCATGGAAAAGGACGGCACTATGTCTGACTTCGGCATTCCAGCTTCTATTGTTGCTAAGTACTTGGACGAACACGGCATTGTTGTTGAAAAGACTGGCCCATACAACTTGTTGTTCTTGTTCTCTATTGGCATTGACAAGACTAAGGCTTTGTCTTTGTTGAGAGCTTTGACTGACTTCAAGAGAGCTTTCGACTTGAACTTGAGAGTTAAGAACATGTTGCCATCTTTGTACAGAGAAGACCCAGAATTCTACGAAAACATGAGAATTCAAGAATTGGCTCAAAACATTCACAAGTTGATTGTTCACCACAACTTGCCAGACTTGATGTACAGAGCTTTCGAAGTTTTGCCAACTATGGTTATGACTCCATACGCTGCTTTCCAAAAGGAATTGCACGGCATGACTGAAGAAGTTTACTTGGACGAAATGGTTGGCAGAATTAACGCTAACATGATTTTGCCATACCCACCAGGCGTTCCATTGGTTATGCCAGGCGAAATGATTACTGA
AGAATCTAGACCAGTTTTGGAATTCTTGCAAATGTTGTGCGAAATTGGCGCTCACTACCCAGGCTTCGAAACTGACATTCACGGCGCTTACAGACAAGCTGACGGCAGATACACTGTTAAGGTTTTGAAGGAAGAATCTAAGAAG
The codon optimization steps for this sequence are as follows:
1. The amino acid sequence corresponding to the DNA sequence is as follows:
MNVIAILNHMGVYFKEEPIRELHRALERLNFQIVYPNDRDDLLKLIENNARLCGVIFDWDKYNLELCEEISKMNENLPLYAFANTYSTLDVSLNDLRLQISFFEYALGAAEDIANKIKQTTDEYINTILPPLTKALFKYVREGKYTFCTPGHMGGTAFQKSPVGSLFYDFFGPNTMKSDISISVSELGSLLDHSGPHKEAEQYIARVFNADRSYMVTNGTSTANKIVGMYSAPAGSTILIDRNCHKSLTHLMMMSDVTPIYFRPTRNAYGILGGIPQSEFQHATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKKTLDVKSIHFDSAWVPYTNFSPIYEGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDVNEETFNEAYMMHTTTSPHYGIVASTETAAAMMKGNAGKRLINGSIERAIKFRKEIKRLRTESDGWFFDVWQPDHIDTTECWPLRSDSTWHGFKNIDNEHMYLDPIKVTLLTPGMEKDGTMSDFGIPASIVAKYLDEHGIVVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRAFDLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIVHHNLPDLMYRAFEVLPTMVMTPYAAFQKELHGMTEEVYLDEMVGRINANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEESKK
2. According to the results of Example 1, the corresponding synonymous codon is selected for each amino acid residue of the protein in turn. The resulting nucleotide sequence, denoted LXL, is as follows:
ATGAACGTTATCGCTATCTTGAACCACATGGGTGTTTACTTCAAGGAAGAACCAATCAGAGAATTGCACAGAGCTTTGGAAAGATTGAACTTCCAAATCGTTTACCCAAACGACAGAGACGACTTGTTGAAGTTGATCGAAAACAACGCTAGATTGTGTGGTGTTATCTTCGACTGGGACAAGTACAACTTGGAATTGTGTGAAGAAATCTCTAAGATGAACGAAAACTTGCCATTGTACGCTTTCGCTAACACCTACTCTACCTTGGACGTTTCTTTGAACGACTTGAGATTGCAAATCTCTTTCTTCGAATACGCTTTGGGTGCTGCTGAAGACATCGCTAACAAGATCAAGCAAACCACCGACGAATACATCAACACCATCTTGCCACCATTGACCAAGGCTTTGTTCAAGTACGTTAGAGAAGGTAAGTACACCTTCTGTACCCCAGGTCACATGGGTGGTACCGCTTTCCAAAAGTCTCCAGTTGGTTCTTTGTTCTACGACTTCTTCGGTCCAAACACCATGAAGTCTGACATCTCTATCTCTGTTTCTGAATTGGGTTCTTTGTTGGACCACTCTGGTCCACACAAGGAAGCTGAACAATACATCGCTAGAGTTTTCAACGCTGACAGATCTTACATGGTTACCAACGGTACCTCTACCGCTAACAAGATCGTTGGTATGTACTCTGCTCCAGCTGGTTCTACCATCTTGATCGACAGAAACTGTCACAAGTCTTTGACCCACTTGATGATGATGTCTGACGTTACCCCAATCTACTTCAGACCAACCAGAAACGCTTACGGTATCTTGGGTGGTATCCCACAATCTGAATTCCAACACGCTACCATCGCTAAGAGAGTTAAGGAAACCCCAAACGCTACCTGGCCAGTTCACGCTGTTATCACCAACTCTACCTACGACGGTTTGTTGTACAACACCGACTTCATCAAGAAGACCTTGGACGTTAAGTCTATCCACTTCGACTCTGCTTGGGTTCCATACACCAACTTCTCTCCAATCTACGAAGGTAAGTGTGGTATGTCTGGTGGTAGAGTTGAAGGTAAGGTTATCTACGAAACCCAATCTACCCACAAGTTGTTGGCTGCTTTCTCTCAAGCTTCTATGATCCACGTTAAGGGTGACGTTAACGAAGAAACCTTCAACGAAGCTTACATGATGCACACCACCACCTCTCCACACTACGGTATCGTTGCTTCTACCGAAACCGCTGCTGCTATGATGAAGGGTAACGCTGGTAAGAGATTGATCAACGGTTCTATCGAAAGAGCTATCAAGTTCAGAAAGGAAATCAAGAGATTGAGAACCGAATCTGACGGTTGGTTCTTCGACGTTTGGCAACCAGACCACATCGACACCACCGAATGTTGGCCATTGAGATCTGACTCTACCTGGCACGGTTTCAAGAACATCGACAACGAACACATGTACTTGGACCCAATCAAGGTTACCTTGTTGACCCCAGGTATGGAAAAGGACGGTACCATGTCTGACTTCGGTATCCCAGCTTCTATCGTTGCTAAGTACTTGGACGAACACGGTATCGTTGTTGAAAAGACCGGTCCATACAACTTGTTGTTCTTGTTCTCTATCGGTATCGACAAGACCAAGGCTTTGTCTTTGTTGAGAGCTTTGACCGACTTCAAGAGAGCTTTCGACTTGAACTTGAGAGTTAAGAACATGTTGCCATCTTTGTACAGAGAAGACCCAGAATTCTACGAAAACATGAGAATCCAAGAATTGGCTCAAAACATCCACAAGTTGATCGTTCACCACAACTTGCCAGACTTGATGTACAGAGCTTTCGAAGTTTTGCCAACCATGGTTATGACCCCATACGCTGCTTTCCAAAAGGAATTGCACGGTATGACCGAAGAAGTTTACTTGGACGAAATGGTTGGTAGAATCAACGCTAACATGATCTTGCCATACCCACCAGGTGTTCCATTGGTTATGCCAGGTGAAATGATCACCGA
AGAATCTAGACCAGTTTTGGAATTCTTGCAAATGTTGTGTGAAATCGGTGCTCACTACCCAGGTTTCGAAACCGACATCCACGGTGCTTACAGACAAGCTGACGGTAGATACACCGTTAAGGTTTTGAAGGAAGAATCTAAGAAG
3. Because there are no sites to be avoided, the above sequence named LXL is the result of codon optimization.
After expression in the in vitro protein expression system, affinity purification with nickel magnetic beads was carried out. Because protein L carries a His-tag at its C terminus, it binds specifically to nickel. The proteins eluted after purification were analyzed by SDS-PAGE and the expression levels of protein L were compared, as shown in FIG. 5; protein L has a size of 82.8 kDa and is marked by the arrow in FIG. 5. The result shows that after codon optimization the expression level is significantly improved and the electrophoresis band of the target protein is markedly stronger.
Reference is made to the applicant's relevant prior patent documents for protein expression systems mentioned in the present invention, as if each document were individually incorporated by reference. Furthermore, it should be understood that various changes and modifications of the present invention can be made by those skilled in the art after reading the above teachings of the present invention, and these equivalents also fall within the scope of the present invention as defined by the appended claims.

Claims (12)

1. A codon optimization method for a protein expression system, characterized in that, based on the ribosomal proteins of the species from which the cell extract in the protein expression system is derived, the codons of the DNA sequences encoding the ribosomal proteins are counted to obtain the relative frequency of each codon among the synonymous codons corresponding to each amino acid residue in the amino acid sequences of the ribosomal proteins, the codon with the highest relative frequency is selected, and that codon is used as the codon for the same amino acid residue in the amino acid sequence of a target protein.
2. The codon optimization method according to claim 1, wherein the relative frequency is obtained by normalizing statistical data comprising the number of times each synonymous codon is used for each amino acid residue, the relative frequency of each codon being the ratio of its number of uses to the sum of the numbers of uses of all synonymous codons for that amino acid residue.
3. The codon optimization method for protein expression system according to claim 2, wherein codons having a relative frequency of not more than 0.05 are deleted.
4. The codon optimization method for protein expression system according to claim 1, wherein the DNA sequence encoding the target protein is further identified as to whether or not a specific site that restricts the expression of the target protein is present, and if present, the nucleotide sequence of the specific site is optimized.
5. The codon optimization method for protein expression system according to claim 4, wherein the specific site is a cleavage site of a restriction enzyme.
6. The codon optimization method for protein expression system according to claim 1, wherein the DNA sequence encoding the target protein is optimized as follows: a sequence R0 of the target protein to be optimized is input and, if R0 is a DNA sequence, it is translated into an amino acid sequence; for each amino acid residue in the amino acid sequence, the codon selected from among its synonymous codons is the one identical to the codon with the highest relative frequency among the synonymous codons of the same amino acid residue in the amino acid sequence of the ribosomal protein; and the selected codons compose an optimized DNA sequence R1.
7. The codon optimization method for protein expression system according to claim 6, wherein the DNA sequence encoding the target protein is optimized segment by segment.
8. The codon optimization method for protein expression system according to claim 7, wherein the length of said segment is m bases, 6 ≤ m ≤ 300, and m is an integral multiple of 3.
9. The codon optimization method for protein expression system according to claim 8, wherein a set A of specific sites to be avoided is further inputted, the optimized DNA sequence R1 is divided into n segments, whether a specific site belonging to the set A exists in each segment is identified, and if so, the specific site is optimized; the optimized sequences are combined to form an optimized DNA sequence R2.
10. A protein expression system comprising a cell extract and a DNA sequence encoding a target protein, wherein the DNA sequence encoding the target protein is optimized by the codon optimization method for the protein expression system according to any one of claims 1 to 9.
11. The protein expression system of claim 10, wherein the cell extract is derived from one of Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Pichia pastoris, and Kluyveromyces.
12. The protein expression system of claim 11, wherein the Kluyveromyces is one of Kluyveromyces lactis, Kluyveromyces marxianus, Kluyveromyces polybuuyveri, Kluyveromyces hainanensis, Kluyveromyces nonfermentans, Kluyveromyces wickerhamii, Kluyveromyces thermotolerans, Kluyveromyces fragilis, Kluyveromyces hubeiensis, Kluyveromyces polysporus, Kluyveromyces siamensis, and Kluyveromyces salospori.
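
As an illustration only, the frequency-based codon selection recited in claims 1 to 3 and 6 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the toy reference sequence (which is not a real ribosomal protein gene), and the use of the standard genetic code are all assumptions introduced here.

```python
from collections import Counter

# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def relative_frequencies(coding_seqs):
    """Count codons in the reference coding sequences and normalize each count
    by the total count of all synonymous codons for the same amino acid."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def best_codons(freqs, min_freq=0.05):
    """For each amino acid, keep the codon with the highest relative frequency,
    ignoring codons whose relative frequency is not more than min_freq."""
    best = {}
    for codon, f in freqs.items():
        if f <= min_freq:
            continue
        aa = CODON_TABLE[codon]
        if aa not in best or f > freqs[best[aa]]:
            best[aa] = codon
    return best


def optimize(protein, best):
    """Back-translate an amino acid sequence using the preferred codons.
    Raises KeyError if an amino acid never occurs in the reference set."""
    return "".join(best[aa] for aa in protein)


# Toy reference sequence (hypothetical): ATG AAG AAG AAA GGT GGT GGT GGA TAA
reference = ["ATGAAGAAGAAAGGTGGTGGTGGATAA"]
freqs = relative_frequencies(reference)
best = best_codons(freqs)
print(optimize("MGK", best))  # ATGGGTAAG
```

In the toy data, AAG accounts for 2 of 3 lysine codons and GGT for 3 of 4 glycine codons, so these are the codons chosen for K and G. A real reference set would need to cover all twenty amino acids.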
CN202111673482.3A 2021-12-31 2021-12-31 Codon optimization method of protein expression system and protein expression system Pending CN114360645A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111673482.3A CN114360645A (en) 2021-12-31 2021-12-31 Codon optimization method of protein expression system and protein expression system
CN202211060298.6A CN116417065A (en) 2021-12-31 2022-08-31 Codon optimization method of protein expression system and protein expression system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673482.3A CN114360645A (en) 2021-12-31 2021-12-31 Codon optimization method of protein expression system and protein expression system

Publications (1)

Publication Number Publication Date
CN114360645A true CN114360645A (en) 2022-04-15

Family

ID=81105391

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111673482.3A Pending CN114360645A (en) 2021-12-31 2021-12-31 Codon optimization method of protein expression system and protein expression system
CN202211060298.6A Pending CN116417065A (en) 2021-12-31 2022-08-31 Codon optimization method of protein expression system and protein expression system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211060298.6A Pending CN116417065A (en) 2021-12-31 2022-08-31 Codon optimization method of protein expression system and protein expression system

Country Status (1)

Country Link
CN (2) CN114360645A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115440300A (en) * 2022-11-07 2022-12-06 深圳市瑞吉生物科技有限公司 Codon sequence optimization method and device, computer equipment and storage medium
CN117095752A (en) * 2023-08-21 2023-11-21 基诺创物(武汉市)科技有限公司 DNA protein coding region streaming data storage method capable of keeping codon preference
WO2024109911A1 (en) * 2022-11-24 2024-05-30 南京金斯瑞生物科技有限公司 Codon optimization

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115440300A (en) * 2022-11-07 2022-12-06 深圳市瑞吉生物科技有限公司 Codon sequence optimization method and device, computer equipment and storage medium
CN115440300B (en) * 2022-11-07 2023-01-20 深圳市瑞吉生物科技有限公司 Codon sequence optimization method and device, computer equipment and storage medium
WO2024109911A1 (en) * 2022-11-24 2024-05-30 南京金斯瑞生物科技有限公司 Codon optimization
CN117095752A (en) * 2023-08-21 2023-11-21 基诺创物(武汉市)科技有限公司 DNA protein coding region streaming data storage method capable of keeping codon preference
CN117095752B (en) * 2023-08-21 2024-03-19 基诺创物(武汉市)科技有限公司 DNA protein coding region streaming data storage method capable of keeping codon preference

Also Published As

Publication number Publication date
CN116417065A (en) 2023-07-11

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220415