CN117660445A - Adjacent codon optimization method and device, electronic equipment and storage medium

Info

Publication number
CN117660445A
Authority
CN
China
Prior art keywords
sequence
mrna
codon
iteration
codons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311605984.1A
Other languages
Chinese (zh)
Inventor
刘阳
郜杰
方晓敏
张肖男
何径舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311605984.1A
Publication of CN117660445A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/22 Vectors comprising a coding region that has been codon optimised for expression in a respective host

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Plant Pathology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The disclosure provides a method and a device for optimizing adjacent codons, an electronic device, and a storage medium. It relates to the technical field of bioinformatics, and in particular to the technical field of biological computing. The specific implementation scheme is as follows: obtaining an initial mRNA sequence of the mRNA to be optimized; and performing J rounds of iterative optimization on the codon K-mers in the initial mRNA sequence to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1. By obtaining the initial mRNA sequence to be optimized and iteratively optimizing it to obtain the mRNA target sequence, the scheme can optimize the usage frequency of codons, ensure the stability of the mRNA, improve the translation efficiency of the mRNA in a designated host cell, and improve the protein expression yield.

Description

Adjacent codon optimization method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of bioinformatics, in particular to the technical field of biological computing, and more particularly to a method and a device for optimizing adjacent codons, an electronic device, and a storage medium.
Background
To improve the translation efficiency of messenger ribonucleic acid (mRNA), the prior art optimizes codons based on the minimum folding free energy of the mRNA and the codon adaptation index. However, this approach cannot guarantee the stability of the mRNA, which makes the optimization result more uncertain.
Disclosure of Invention
The disclosure provides an optimization method and device of adjacent codons, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for optimizing adjacent codons, including: obtaining an initial mRNA sequence of the mRNA to be optimized; and performing iterative optimization on the codon K-mer in the mRNA initial sequence for J times to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1, and the codon K-mer is adjacent K codons in the mRNA sequence.
According to another aspect of the present disclosure, there is provided an optimizing apparatus for adjacent codons, including: the acquisition module is used for acquiring an initial mRNA sequence of the mRNA to be optimized; and the optimization module is used for performing J times of iterative optimization on the codon K-mer in the mRNA initial sequence to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1, and the codon K-mer is adjacent K codons in the mRNA sequence.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for optimizing adjacent codons described in the embodiment of the above aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to perform the method of optimizing adjacent codons according to the embodiment of the above aspect.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instruction which, when executed by a processor, implements the method for optimizing adjacent codons according to the embodiment of the above aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart of a method for optimizing adjacent codons provided in an embodiment of the disclosure;
FIG. 2 is a flow chart of another method for optimizing adjacent codons provided in an embodiment of the disclosure;
FIG. 3 is a schematic flow chart of determining a first threshold and a second threshold in an optimization method of adjacent codons according to an embodiment of the disclosure;
FIG. 4 is a schematic flow chart of determining local scores and global scores in an optimization method of adjacent codons provided in an embodiment of the disclosure;
FIG. 5 is a schematic flow chart of mutation operation in an optimization method of adjacent codons provided in an embodiment of the disclosure;
FIG. 6 is a schematic diagram of a mutation operation on an original codon provided in an embodiment of the present disclosure;
FIG. 7 is a flow chart of another method for optimizing adjacent codons provided in an embodiment of the disclosure;
FIG. 8 is a schematic flow chart of iterative optimization of codon K-mers provided by embodiments of the present disclosure;
FIG. 9 is a schematic structural diagram of an optimizing device for adjacent codons according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device for implementing a method of optimizing adjacent codons in an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a method, an apparatus and an electronic device for optimizing adjacent codons according to an embodiment of the present disclosure with reference to the accompanying drawings.
Biological computing is a field that references the principles and mechanisms of biological systems to solve computational problems. It applies some characteristics and processes of biology to computing systems to improve computing efficiency and performance. The goal of biological computing is to obtain inspiration from a biological system and convert it into new computing methods and techniques to solve complex problems. The method has wide application in the fields of optimization, pattern recognition, data analysis, simulation and the like, and is continuously developed and expanded.
The disclosed embodiments may be used in mRNA prophylactic vaccines, mRNA therapeutic cancer vaccines, mRNA protein replacement therapies, deoxyribonucleic acid (DNA) gene therapies using viral or non-viral vectors, the modification of genetically engineered organisms, and the like.
mRNA prophylactic vaccine: the present disclosure is useful for designing mRNA vaccines that express various pathogen antigens, enhancing the in vivo expression levels of the antigens, and enhancing vaccine protection efficacy.
mRNA therapeutic cancer vaccine: the present disclosure is useful for designing mRNA vaccines that express a patient's cancer neoantigen, increasing the expression level of the neoantigen, activating and enhancing the immune response of the human body, and promoting the clearance of cancer cells.
mRNA protein replacement therapy: the present disclosure is useful for designing mRNA drugs that express various therapeutic proteins, including various proteins (e.g., insulin and enzymes required for various metabolism) that are deficient in vivo due to genetic defects and aging, etc., as well as protein drugs (e.g., antibodies), etc. Such scenarios often require greater protein yields, and the present disclosure will help to improve translation efficiency and protein yield of mRNA in such therapies.
DNA gene therapy of viral or non-viral vectors: the present disclosure is useful in the design of gene coding regions in DNA gene therapy, to improve the translational efficiency of mRNA transcribed therefrom, and ultimately to improve the translational yield of a protein of interest.
Modification of genetically engineered organisms: the present disclosure is useful in the engineering of genetically engineered organisms for the production of various proteins, including various animals, plants and microorganisms. By optimizing the codon composition of the transgene fragment, the improvement of the expression of the target protein is realized.
FIG. 1 is a schematic flow chart of a method for optimizing adjacent codons according to an embodiment of the disclosure.
As shown in FIG. 1, the method for optimizing adjacent codons may include:
S101, acquiring an initial mRNA sequence of the mRNA to be optimized.
It should be noted that, in the embodiments of the present disclosure, the execution body of the optimization method of adjacent codons may be a hardware device with data information processing capability and/or software necessary for driving the hardware device to operate. Alternatively, the execution body may include a server, a computer, a user terminal, and other intelligent devices. Optionally, the user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, etc. Alternatively, the server includes, but is not limited to, a web server, an application server, a server of a distributed system, a server incorporating a blockchain, etc.
It will be appreciated that messenger ribonucleic acid (mRNA) carries genetic information within a biological cell and is translated to synthesize protein. Codons in the mRNA can be optimized to increase the translation efficiency of the mRNA within a given host cell, and thus the expression level of the target protein.
In some implementations, the target protein to be synthesized can be determined as required, and an mRNA sequence encoding it can be derived by methods such as reverse translation. The minimum folding free energy (Minimum Free Energy, MFE) and the codon adaptation index (Codon Adaptation Index, CAI) of that sequence are then optimized to obtain the initial mRNA sequence. Optionally, MFE and CAI may be optimized using LinearDesign, geneDesigner, or the like.
S102, performing iterative optimization on a codon K-mer in an mRNA initial sequence for J times to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1, and the codon K-mer is adjacent K codons in the mRNA sequence.
In some implementations, a codon K-mer refers to K adjacent codons in the mRNA sequence. Starting from the initial mRNA sequence, the codon K-mers can be optimized over J iterations. In each iteration, the mRNA sequences are amplified and mutated, and the mutated mRNA sequences are checked against the iteration end condition. When the condition is satisfied, the iteration ends and one or more mRNA target sequences are obtained.
Optionally, if the iteration end condition is not satisfied, the next iteration is performed on the mutated mRNA sequences obtained at the end of the previous iteration. Amplification and mutation are repeated in this way until the mutated mRNA sequences satisfy the iteration end condition, thereby obtaining the mRNA target sequence.
Alternatively, the iteration end condition may be that the number of iterations reaches J. For example, if J is set to 100, the iteration ends after the 100th iteration, and the mRNA sequence obtained by the 100th iteration is the mRNA target sequence.
According to the optimization method of adjacent codons, the mRNA target sequence can be obtained by acquiring the initial sequence of mRNA to be optimized and performing repeated iterative optimization on the codons K-mer in the sequence, the use frequency of the adjacent codons can be optimized, the stability of the mRNA is ensured, the translation efficiency of the mRNA in a designated host cell is improved, and the protein expression yield is improved.
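As an illustration of the codon K-mer defined above (K adjacent codons in the mRNA sequence), the following minimal Python sketch splits a coding sequence into codons and enumerates every window of K adjacent codons. The function names `split_codons` and `codon_kmers` are hypothetical and do not appear in the disclosure, and the sketch assumes the sequence length is a multiple of 3.

```python
def split_codons(mrna: str) -> list[str]:
    """Split an mRNA coding sequence into consecutive 3-nucleotide codons."""
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


def codon_kmers(mrna: str, k: int) -> list[tuple[str, ...]]:
    """Return every run of k adjacent codons, sliding one codon at a time."""
    codons = split_codons(mrna)
    return [tuple(codons[i:i + k]) for i in range(len(codons) - k + 1)]
```

For a sequence of N codons this yields N - K + 1 codon K-mers, and an empty list when the sequence is shorter than K codons.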
FIG. 2 is a flow chart of a method for optimizing adjacent codons according to an embodiment of the present disclosure.
As shown in FIG. 2, the method for optimizing adjacent codons may include:
S201, obtaining an initial mRNA sequence of the mRNA to be optimized.
The relevant content of step S201 may be referred to the above embodiments, and will not be described herein.
S202, iterating from the initial mRNA sequence: for the j-th iteration, acquiring the first sequence set at the end of the j-th iteration, and amplifying the first mRNA sequences in the first sequence set to obtain the second sequence set at the beginning of the (j+1)-th iteration, wherein the second sequence set comprises the first mRNA sequences and the second mRNA sequences, j is a natural number, and 1 ≤ j ≤ J.
In some implementations, the total number of sequences in the iterative process may be preset such that the number of amplified mRNA sequences is the total number of sequences. By obtaining the first sequence set at the end of the j-th iteration and amplifying the first sequence of the mRNA, the second sequence set for the j+1-th iteration can be obtained.
Illustratively, suppose the 3rd iteration is to be performed. Amplification is based on the first mRNA sequences in the first sequence set at the end of the 2nd iteration. If there are 30 first mRNA sequences and the total number of sequences is 100, then 70 second mRNA sequences are produced by amplification, so that the second sequence set contains 100 mRNA sequences: 30 first mRNA sequences and 70 second mRNA sequences.
Alternatively, the first mRNA sequences may be amplified based on their global scores. The first mRNA sequences may be ranked by global score from high to low and replicated in that order to produce the second mRNA sequences.
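The amplification step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `amplify` is hypothetical, and it assumes that "replicated in turn" means copying the ranked sequences round-robin, highest global score first, until the preset total is reached.

```python
from itertools import cycle, islice


def amplify(first_seqs: list[str], global_scores: dict[str, float], total: int) -> list[str]:
    """Build the second sequence set: the first sequences plus score-ordered copies."""
    # Rank the surviving (first) sequences from highest to lowest global score.
    ranked = sorted(first_seqs, key=lambda s: global_scores[s], reverse=True)
    # Number of copies (second sequences) needed to reach the preset total.
    n_copies = max(0, total - len(ranked))
    # Copy in rank order, wrapping around when more copies than sequences are needed.
    copies = list(islice(cycle(ranked), n_copies))
    return ranked + copies
```

With 30 first sequences and a total of 100, the result holds the 30 first sequences followed by 70 copies, matching the numerical example in the text.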
S203, carrying out mutation operation on the codon K-mer in the mRNA second sequence in the second sequence set to obtain a third sequence set, wherein the third sequence set comprises the mRNA third sequence, and determining the first sequence set at the end of the j+1st iteration from the third sequence set.
In some implementations, a mutation candidate region can be determined in the second sequence of the mRNA and the codons within the mutation candidate region are subjected to a mutation operation to yield the third sequence of the mRNA. Through carrying out mutation on each mRNA second sequence in the second sequence set, a third sequence set can be obtained, so that mRNA is easier to identify and translate in the translation process, and the translation efficiency is improved.
Alternatively, the mutation candidate region may be determined based on a local score of the codon K-mer of the mRNA sequence. For each mRNA second sequence, a local score for each codon K-mer in the mRNA second sequence is obtained, and a mutation candidate region is determined from the mRNA second sequence based on the local score.
Alternatively, the local score of each codon K-mer may be compared with a local score threshold, and a codon K-mer whose local score is less than the threshold is determined to be a mutation candidate region. The codon K-mer with the smallest local score can be selected as a mutation candidate region; the codon K-mer with the largest local score can also be selected. Because the codon K-mers in a mutation candidate region chosen for its low local score contribute least to the overall score, mutating the codons in that region can improve the translation efficiency of the mRNA.
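A minimal sketch of selecting mutation candidate regions from local scores is given below. The function name `candidate_regions` is hypothetical. It covers two of the options described above: every codon K-mer whose local score is below a threshold, or, when no threshold is given, the single codon K-mer with the smallest local score.

```python
from typing import Optional


def candidate_regions(local_scores: list[float], threshold: Optional[float] = None) -> list[int]:
    """Return the start indices of the codon K-mers chosen as mutation candidates."""
    if not local_scores:
        return []
    if threshold is not None:
        # All K-mers whose local score falls below the local score threshold.
        return [i for i, score in enumerate(local_scores) if score < threshold]
    # Otherwise pick the single lowest-scoring K-mer.
    lowest = min(range(len(local_scores)), key=lambda i: local_scores[i])
    return [lowest]
```

Each returned index i refers to the codon K-mer that starts at the i-th codon of the sequence.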
Further, mutation operation is performed on the original codon in the mutation candidate region to obtain a third sequence of mRNA. The mRNA third sequence can be obtained by substitution of synonymous codons for the original codons.
In some implementations, the third mRNA sequences can be filtered to obtain mRNA sequences with high stability and high translation efficiency. A set of constraint thresholds for the output sequence of the mRNA is obtained, wherein the set of constraint thresholds includes a first threshold for the codon adaptation index (Codon Adaptation Index, CAI) and a second threshold for the minimum folding free energy (Minimum Free Energy, MFE). The third sequences in the third sequence set are then filtered based on the first threshold and the second threshold to obtain the first sequence set at the end of the (j+1)-th iteration.
Alternatively, the CAI and MFE of each third sequence in the third sequence set are determined, and candidate third sequences whose CAI is greater than or equal to the first threshold and whose MFE is less than or equal to the second threshold are selected from the third sequence set. The first sequence set at the end of the (j+1)-th iteration is obtained from these candidate third sequences. Screening the third sequence set against the first threshold and the second threshold in this way yields mRNA sequences with high stability and high translation efficiency.
Alternatively, the first set of sequences at the end of the j+1st iteration may be determined from the candidate third sequences based on the global score of the mRNA sequences. And determining global scores of the candidate third sequences, and sorting the candidate third sequences according to the global scores. And determining a target third sequence from the candidate third sequences according to the sorting result, and obtaining a first sequence set at the end of the j+1st iteration.
Alternatively, the candidate third sequences may be ranked by global score from large to small, and the top-ranked candidate third sequences may be selected as the target third sequences, giving the first sequence set at the end of the (j+1)-th iteration. Determining the target third sequences based on the global score in this way selects structurally stable mRNA sequences from the candidate third sequences.
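The filtering and ranking described above can be sketched as follows. The `Candidate` record, the function name `select_first_set`, and the `keep` parameter (how many top-ranked sequences survive) are assumptions made for illustration; the CAI, MFE, and global score values are taken as already computed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    seq: str
    cai: float           # codon adaptation index of the third sequence
    mfe: float           # minimum folding free energy (more negative = more stable)
    global_score: float  # global score of the third sequence


def select_first_set(third_set: list[Candidate], t_cai: float, t_mfe: float, keep: int) -> list[Candidate]:
    """Filter by the CAI/MFE thresholds, then keep the top `keep` by global score."""
    # Candidate third sequences: CAI >= first threshold and MFE <= second threshold.
    passing = [c for c in third_set if c.cai >= t_cai and c.mfe <= t_mfe]
    # Rank candidates by global score from large to small.
    ranked = sorted(passing, key=lambda c: c.global_score, reverse=True)
    return ranked[:keep]
```

Filtering before ranking means a sequence with a high global score is still discarded if it falls outside the CAI or MFE constraint.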
S204, in response to the (j+1)-th iteration satisfying the iteration end condition, obtaining an mRNA target sequence based on the first sequence set at the end of the (j+1)-th iteration.
S205, in response to the (j+1)-th iteration not satisfying the iteration end condition, continuing with the (j+2)-th iteration based on the first sequence set at the end of the (j+1)-th iteration.
In some implementations, the target sequence is a sequence in the set of sequences iterated when the iteration end condition is satisfied. Wherein the iteration end condition includes: the first global score of the first sequence set is greater than or equal to a set global score threshold, or the number of iterations is equal to J.
In some implementations, after the first sequence set at the end of the (j+1)-th iteration is obtained, a first global score may be calculated for the mRNA sequences in the first sequence set and compared with the set global score threshold. If the first global score is greater than or equal to the set global score threshold, the iteration end condition is satisfied, and the mRNA sequences in the first sequence set at the end of the (j+1)-th iteration are the mRNA target sequences.
Optionally, if the first global score is less than the set global score threshold, it may be determined whether j+1 is equal to J. If j+1 is equal to J, the iteration end condition is satisfied, and the mRNA sequences in the first sequence set at the end of the (j+1)-th iteration are the mRNA target sequences. If the first global score is less than the set global score threshold and j+1 is less than J, the iteration end condition is not satisfied, and the (j+2)-th iteration continues from the first sequence set at the end of the (j+1)-th iteration, and so on until the iteration end condition is satisfied and the mRNA target sequence is obtained. This ensures the translation efficiency and stability of the mRNA target sequence and improves the protein expression yield.
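The control flow of steps S202 to S205 can be sketched as a loop with the two end conditions named above: the global score reaching a set threshold, or the iteration count reaching J. In this minimal sketch the names `iterate`, `step`, and `set_score` are hypothetical. `step` stands in for one round of amplification, mutation, and filtering, and `set_score` stands in for the first global score of a sequence set; how a single score is derived from a set (for example, the maximum over its sequences) is an assumption left to the caller.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def iterate(
    initial_set: list[S],
    step: Callable[[list[S]], list[S]],
    set_score: Callable[[list[S]], float],
    score_threshold: float,
    max_iterations: int,
) -> tuple[list[S], int]:
    """Run up to max_iterations (J) rounds; stop early once the score threshold is met."""
    if max_iterations < 1:
        raise ValueError("J must be a natural number >= 1")
    current = list(initial_set)
    iterations_run = 0
    for j in range(1, max_iterations + 1):
        current = step(current)  # amplify, mutate, and filter (caller-supplied)
        iterations_run = j
        if set_score(current) >= score_threshold:
            break                # end condition: global score threshold reached
    # Otherwise the loop ends because the iteration count reached J.
    return current, iterations_run
```

The returned set plays the role of the first sequence set at the end of the final iteration, from which the mRNA target sequences are taken.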
According to the optimization method of adjacent codons, the mRNA initial sequence to be optimized is obtained, repeated iterative optimization is carried out from the mRNA initial sequence, the mRNA sequence is amplified and mutated, whether the mutated sequence meets the iteration ending condition or not is judged, iteration is ended when the mutated sequence meets the iteration ending condition, the mRNA target sequence is obtained, the use frequency of the codons can be optimized, the stability of the mRNA is guaranteed, the translation efficiency of the mRNA in a specified host cell is improved, and the protein expression yield is improved.
On the basis of the above embodiments, the process of determining the first threshold value and the second threshold value according to the embodiments of the present disclosure may be explained, as shown in fig. 3, including:
S301, determining the initial CAI and the initial MFE of the initial mRNA sequence.
It will be appreciated that CAI is an indicator for assessing the codon usage preference of a given gene. The index measures how well the codon usage of a particular gene matches the preference of the host organism, and is typically used to analyze the expression efficiency of a foreign gene in the host organism. MFE refers to the free energy value of the most stable secondary structure that an RNA molecule can adopt under specific conditions.
In some implementations, the initial CAI may be calculated based on the codon frequency table of the host in which the mRNA is to be expressed. The codon frequency table depends on the species of the host, and all hosts of the same species share the same table; for example, a single codon frequency table applies to all humans.
Alternatively, the initial MFE may be calculated based on the initial sequence of the mRNA using calculation tools for MFE.
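To make the CAI calculation concrete, the sketch below uses the commonly used definition of CAI: the geometric mean, over the codons of a sequence, of each codon's relative adaptiveness (its frequency divided by the frequency of the most used synonymous codon). The disclosure does not spell out its exact calculation, so this definition is an assumption here. The frequency values are hypothetical toy numbers for two amino acids only; a real table would cover all sense codons of the host species. MFE is not sketched because it requires an external RNA folding tool.

```python
import math

# Hypothetical toy codon usage frequencies (not real host data).
CODON_FREQ = {
    "GCU": 18.0, "GCC": 28.0, "GCA": 16.0, "GCG": 7.0,  # alanine
    "AAA": 24.0, "AAG": 32.0,                           # lysine
}
CODON_TO_AA = {
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}


def relative_adaptiveness(freq: dict[str, float], codon_to_aa: dict[str, str]) -> dict[str, float]:
    """w(codon) = frequency / frequency of the most used synonymous codon."""
    best: dict[str, float] = {}
    for codon, f in freq.items():
        aa = codon_to_aa[codon]
        best[aa] = max(best.get(aa, 0.0), f)
    return {codon: f / best[codon_to_aa[codon]] for codon, f in freq.items()}


def cai(mrna: str) -> float:
    """Geometric mean of the relative adaptiveness of every codon in the sequence."""
    w = relative_adaptiveness(CODON_FREQ, CODON_TO_AA)
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))
```

A sequence built only from the most frequent codon of each amino acid has a CAI of 1, and any rarer synonymous codon lowers it.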
S302, determining the CAI allowable loss value and the MFE allowable loss value of the output sequence of the mRNA.
In some implementations, the CAI and MFE allowable loss values may be determined based on the allowable loss levels of CAI and MFE for the output sequence of the mRNA. Wherein, the value range of the CAI allowed loss value and the MFE allowed loss value is a real number interval [0,1].
For example, taking both the CAI allowable loss value and the MFE allowable loss value to be 0.05 indicates that the absolute values of the CAI and MFE of the output sequence of the mRNA are not less than 95% of the corresponding initial CAI and initial MFE.
S303, determining a first threshold based on the initial CAI and the CAI allowable loss value, and determining a second threshold based on the initial MFE and the MFE allowable loss value.
Optionally, the formula for calculating the first threshold and the second threshold is as follows:
$$T_{CAI} = CAI_{init} \times (1 - L_{CAI}) \tag{1}$$
$$T_{MFE} = MFE_{init} \times (1 - L_{MFE}) \tag{2}$$
where $T_{CAI}$ denotes the first threshold, $T_{MFE}$ denotes the second threshold, $CAI_{init}$ and $MFE_{init}$ denote the initial CAI and the initial MFE, respectively, and $L_{CAI}$ and $L_{MFE}$ denote the CAI allowable loss value and the MFE allowable loss value, respectively, each taking a value in the real interval [0, 1].
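Formulas (1) and (2) can be checked with a short worked example. In this minimal sketch the function names and the input values are hypothetical. Because MFE is negative, a loss value of 0.05 moves the MFE threshold from -100 to -95, and the filter "MFE less than or equal to the second threshold" then keeps sequences at least 95% as stable as the initial sequence.

```python
def constraint_thresholds(cai_init: float, mfe_init: float, l_cai: float, l_mfe: float) -> tuple[float, float]:
    """Apply formulas (1) and (2): T = initial value * (1 - allowable loss)."""
    for loss in (l_cai, l_mfe):
        if not 0.0 <= loss <= 1.0:
            raise ValueError("allowable loss values must lie in [0, 1]")
    t_cai = cai_init * (1.0 - l_cai)
    t_mfe = mfe_init * (1.0 - l_mfe)
    return t_cai, t_mfe


def passes_constraints(cai: float, mfe: float, t_cai: float, t_mfe: float) -> bool:
    """A sequence passes when CAI >= first threshold and MFE <= second threshold."""
    return cai >= t_cai and mfe <= t_mfe
```

For an initial CAI of 0.80 and an initial MFE of -100.0 with both loss values at 0.05, the thresholds come out at about 0.76 and -95.0.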
According to the optimization method of adjacent codons, the mRNA target sequence can be obtained by acquiring the mRNA initial sequence to be optimized and performing repeated iterative optimization on the sequence, the use frequency of the codons can be optimized, the stability of the mRNA is ensured, the translation efficiency of the mRNA in a designated host cell is improved, and the protein expression yield is improved. And (3) screening the third sequence set by calculating the first threshold value of CAI and the second threshold value of MFE, so that mRNA sequences with high stability and high translation efficiency can be obtained.
On the basis of the above embodiments, the embodiments of the present disclosure may explain a process of determining a local score and a global score, as shown in fig. 4, including:
S401, for any codon K-mer, obtaining the frequencies of occurrence, in the genome in which the mRNA sequence is located, of its codons, the corresponding amino acids, the K adjacent codons, and the K adjacent amino acids, wherein K is a natural number greater than or equal to 1.
S402, determining the local score of the codon K-mer based on the occurrence frequency.
In some implementations, optionally, the formula for calculating the local score is as follows:
where CKS denotes the local score, $C_i$ denotes a codon, $A_i$ denotes the amino acid corresponding to $C_i$, $F(C_i)$ denotes the frequency of occurrence of the codon in the genome, $F(A_i)$ denotes the frequency of occurrence of the amino acid in the genome, $F(C_1 C_2 \ldots C_K)$ denotes the frequency of occurrence of the K adjacent codons in the genome, and $F(A_1 A_2 \ldots A_K)$ denotes the frequency of occurrence of the K adjacent amino acids in the genome.
In some implementations, when the frequency of occurrence of any adjacent K amino acid pairs in the genome where the mRNA sequence is located is less than a set threshold, it may result in a local score that is greatly affected by accidental factors, reducing its reference value.
Optionally, when the frequency of occurrence $F(A_1 A_2 \ldots A_{K_{sub}+1})$ of the $K_{sub}+1$ adjacent amino acids in the genome is less than a set threshold $T_F$, and the frequency of occurrence $F(A_1 A_2 \ldots A_{K_{sub}})$ of the $K_{sub}$ adjacent amino acids in the genome is greater than or equal to the set threshold $T_F$, the local score can be calculated from the frequencies of occurrence of the first $K_{sub}$ ($K_{sub} < K$) amino acids and their corresponding codons, as follows:
where CKS denotes the local score, $C_i$ denotes a codon, $A_i$ denotes the amino acid corresponding to $C_i$, $F(C_i)$ denotes the frequency of occurrence of the codon in the genome, $F(A_i)$ denotes the frequency of occurrence of the amino acid in the genome, $F(C_1 C_2 \ldots C_{K_{sub}})$ denotes the frequency of occurrence of the $K_{sub}$ adjacent codons in the genome, and $F(A_1 A_2 \ldots A_{K_{sub}})$ denotes the frequency of occurrence of the $K_{sub}$ adjacent amino acids in the genome.
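The local score formula itself is not reproduced in the text above, so the sketch below rests on an assumed form. By analogy with the widely used codon pair score, it takes CKS to be the natural logarithm of the observed frequency of the codon K-mer divided by the frequency expected from the individual codon and amino acid frequencies: CKS = ln( F(C_1...C_K) × ∏F(A_i) / ( ∏F(C_i) × F(A_1...A_K) ) ). The function name `local_score`, the table layout, and the toy counts in the usage note are hypothetical. The back-off to the first K_sub codons follows the threshold rule described above.

```python
import math


def local_score(codons, aas, f_codon, f_aa, f_ckmer, f_akmer, t_f=1.0):
    """Assumed CKS: ln(observed codon K-mer frequency / expected frequency).

    codons, aas : the K codons of the K-mer and their amino acids
    f_codon, f_aa : single-codon and single-amino-acid frequencies
    f_ckmer, f_akmer : frequencies of adjacent codon / amino acid tuples
    t_f : minimum amino acid tuple frequency (threshold T_F)
    """
    k = len(codons)
    # Back off to the longest prefix whose amino acid tuple is frequent enough.
    while k > 1 and f_akmer.get(tuple(aas[:k]), 0.0) < t_f:
        k -= 1
    c, a = tuple(codons[:k]), tuple(aas[:k])
    observed = f_ckmer[c] if k > 1 else f_codon[c[0]]
    aa_tuple_freq = f_akmer[a] if k > 1 else f_aa[a[0]]
    # Expected frequency if each codon were chosen independently for its amino acid.
    expected = aa_tuple_freq * math.prod(f_codon[x] / f_aa[y] for x, y in zip(c, a))
    return math.log(observed / expected)
```

With toy counts F(GCC) = 40, F(AAG) = 30, F(A) = F(K) = 50, F(GCC AAG) = 12, and F(A K) = 20, the expected frequency is 20 × 0.8 × 0.6 = 9.6, so the score is ln(12 / 9.6) = ln 1.25, a positive value indicating that the codon pair occurs more often than expected. The sketch assumes the codon K-mer occurs at least once in the genome, since a zero observed frequency has no logarithm.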
S403, determining a global score of the mRNA sequence based on the local score and the number of codons in the mRNA sequence.
Alternatively, the global score may be determined based on the ratio of the local score to the number of codons, and the formula for calculating the global score is as follows:
wherein CKB represents global score, CKS represents local score, and N represents the number of codons in the mRNA sequence.
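A minimal sketch of the global score is given below, assuming that the local scores of all adjacent codon K-mers in the sequence are summed and the sum is divided by the number of codons N. The summation over all K-mers, the function name global_score and the default of 0.0 for a K-mer missing from the table are assumptions made for illustration.

```python
def global_score(codons, local_scores, k=2):
    """Sum of the local scores of all adjacent K-mers, divided by N codons."""
    n = len(codons)
    if n < k:
        return 0.0  # no complete K-mer in the sequence
    total = sum(
        local_scores.get(tuple(codons[i:i + k]), 0.0)  # assumed default: 0.0
        for i in range(n - k + 1)
    )
    return total / n
```

Here local_scores plays the role of a precomputed lookup table keyed by codon tuples.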
According to the optimization method of adjacent codons, the mRNA target sequence can be obtained by acquiring the mRNA initial sequence to be optimized and performing repeated iterative optimization on the sequence, the use frequency of the codons can be optimized, the stability of the mRNA is ensured, the translation efficiency of the mRNA in a designated host cell is improved, and the protein expression yield is improved. Based on the local score and the global score, the joint influence or coupling effect of adjacent codons on the translation efficiency can be embodied. Through screening mutation candidate regions and carrying out mutation operation on mRNA sequences, mRNA can be more easily identified and translated in the translation process, and the translation efficiency is improved.
On the basis of the above embodiments, the embodiment of the present disclosure may explain a process of mutation operation, as shown in fig. 5, including:
S501, obtaining one or more synonymous candidate codons of the original codon.
In some implementations, synonymous codons refer to different codons in a genetic code table encoding the same amino acid. One or more synonymous candidate codons for the original codon may be determined by querying a genetic code table.
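For illustration, the genetic code table query might be sketched as follows, using the standard genetic code written with RNA bases. The names CODON_TABLE and synonymous_candidates are hypothetical, and the alphabetical ordering of the returned candidates is an arbitrary choice of the sketch rather than a mutation order defined by the present disclosure.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def synonymous_candidates(codon):
    """Return the other codons that encode the same amino acid as `codon`."""
    amino = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == amino and c != codon)
```

Codons such as AUG (methionine) and UGG (tryptophan) have no synonymous candidates, so the returned list is empty for them.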
S502, performing codon substitution on the original codon of the mutation candidate region based on the candidate codon, to obtain an mRNA third sequence.
In some implementations, the order of mutation of the candidate codons can be determined, and the original codons are replaced with the candidate codons in the order of mutation, and a first global score after mRNA sequence replacement is determined. By comparing the first global score with the pre-substitution second global score to determine whether to end the mutation operation, the mutated mRNA sequence can be more stable and the translation efficiency can be increased.
Optionally, in response to the first global score after the current substitution being greater than the second global score before the mRNA sequence substitution, the mutation ends and the mRNA third sequence is obtained. In response to the first global score after the current substitution being less than or equal to the second global score before the mRNA sequence substitution, the original codon is replaced with the next candidate codon and the subsequent steps are performed.
In some implementations, the mutation candidate region has a plurality of original codons, and the original codons can be synonymously replaced by determining the position numbers of the plurality of original codons and according to the position numbers of the original codons.
Optionally, if the original codon at position i is replaced with the candidate codon and the first global score after each replacement is less than or equal to the second global score, then synonymous codon replacement is performed for the original codon at position i+1, where i is a natural number greater than or equal to 1.
Optionally, if the original codon at the position i+1 is replaced by the synonymous codon, and the first global score is greater than the second global score after the replacement, stopping the synonymous codon replacement operation on the original codon at the subsequent position to obtain a mutant sequence corresponding to the mRNA sequence, that is, a third mRNA sequence, otherwise, continuing to replace the synonymous codon on the original codon at the position i+2.
FIG. 6 shows the mutation operation performed on the original codons in the mutation candidate region. K original codons exist in the mutation candidate region, wherein the original codon at position i is ACA, the original codon at position i+1 is AAC, the original codon at position i+2 is AAA, and the original codon at position i+K-1 is GGA.
For the original codon at position i, the synonymous candidate codons are ACC, ACG and ACU. ACC is first used to replace the original codon ACA; if the first global score after the replacement is greater than the second global score before the replacement, the mutation ends and the mRNA third sequence is obtained. If the first global score after the replacement is less than or equal to the second global score, ACG is used for the replacement and the comparison is repeated; if the first global score is still less than or equal to the second global score, ACU is used. If the first global score is still less than or equal to the second global score, synonymous codon replacement is performed on the original codon at position i+1. If none of the replacements at position i+1 yields a first global score greater than the second global score, synonymous codon replacement continues with the original codon at position i+2.
If the first global score remains less than or equal to the second global score after the replacements of the first K-1 original codons, synonymous codon replacement is performed on the original codon GGA at position i+K-1. Its synonymous candidate codons are GGU, GGG and GGC. GGU is first used to replace the original codon GGA; if the first global score after the replacement is greater than the second global score before the replacement, the mutation ends and the mRNA third sequence is obtained. If the first global score after the replacement is less than or equal to the second global score, GGG is used for the replacement and the comparison is repeated; if the first global score is still less than or equal to the second global score, GGC is used. If the first global score is still less than or equal to the second global score, the mutation is not accepted and the mutation operation ends, yielding the mRNA third sequence.
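The position-by-position substitution walked through above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the function name mutate_region and the callables synonyms (returning the candidate codons of a codon in mutation order) and global_score (returning the global score of a codon list) are hypothetical and are supplied by the caller.

```python
def mutate_region(codons, start, k, synonyms, global_score):
    """Try synonymous substitutions in a K-codon region; accept the first gain.

    Returns a new codon list. If no substitution raises the global score,
    the mutation is not accepted and an unchanged copy is returned.
    """
    baseline = global_score(codons)  # second global score (before substitution)
    for pos in range(start, min(start + k, len(codons))):
        for candidate in synonyms(codons[pos]):
            trial = codons[:pos] + [candidate] + codons[pos + 1:]
            # First global score (after substitution) must strictly improve.
            if global_score(trial) > baseline:
                return trial
    return list(codons)
```

With the region ACA, AAC, AAA and the candidate order ACC, ACG, ACU for position i, the sketch stops at the first candidate that improves the score and never examines positions i+1 onward.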
According to the optimization method of adjacent codons, the mRNA target sequence can be obtained by acquiring the mRNA initial sequence to be optimized and performing repeated iterative optimization on the sequence, the use frequency of the codons can be optimized, the stability of the mRNA is ensured, the translation efficiency of the mRNA in a designated host cell is improved, and the protein expression yield is improved. The mutation candidate region codons are replaced by synonymous codons, and whether the synonymous codons are replaced or not is judged based on global scores, so that the mutated mRNA sequence is more stable and the translation efficiency is improved.
FIG. 7 is a flow chart of a method for optimizing adjacent codons according to an embodiment of the present disclosure.
As shown in FIG. 7, the method for optimizing adjacent codons may include:
S701, obtaining an initial mRNA sequence of the mRNA to be optimized.
S702, starting iteration from the initial mRNA sequence; for the j-th iteration, acquiring a first sequence set at the end of the j-th iteration, and amplifying the mRNA first sequence in the first sequence set to obtain a second sequence set at the beginning of the (j+1)-th iteration, wherein the second sequence set comprises the first sequence and the mRNA second sequence, j is a natural number, and j is greater than or equal to 1 and less than or equal to J.
S703, obtaining local scores of each codon K-mer in the mRNA second sequence for each mRNA second sequence.
S704, determining mutation candidate regions from the mRNA second sequence based on the local scores.
S705, for the original codon in the mutation candidate region, replacing the original codon with the candidate codon according to the mutation order, and determining a first global score after the mRNA sequence replacement.
S706, in response to the first global score after the current substitution being greater than the second global score before the mRNA sequence substitution, ending the mutation to obtain an mRNA third sequence, wherein the second global score is the global score of the mRNA second sequence.
S707, obtaining a first threshold value and a second threshold value of the output sequence of the mRNA.
S708, filtering the third sequence in the third sequence set based on the first threshold and the second threshold to obtain a first sequence set at the end of the j+1st iteration.
S709, responding to the j+1th iteration meeting the iteration ending condition, and obtaining an mRNA target sequence based on the first sequence set at the j+1th iteration ending.
S710, in response to the j+1th iteration not meeting the iteration end condition, continuing the j+2th iteration based on the first sequence set at the end of the j+1th iteration.
According to the optimization method of adjacent codons, the mRNA initial sequence to be optimized is obtained, repeated iterative optimization is carried out from the mRNA initial sequence, the mRNA sequence is amplified and mutated, whether the mutated sequence meets the iteration ending condition or not is judged, iteration is ended when the mutated sequence meets the iteration ending condition, the mRNA target sequence is obtained, the use frequency of the codons can be optimized, the stability of the mRNA is guaranteed, the translation efficiency of the mRNA in a specified host cell is improved, and the protein expression yield is improved.
FIG. 8 shows a flow chart of the iterative optimization of the codon K-mer. An initial mRNA sequence is obtained, and a first threshold and a second threshold are calculated. Iteration starts from the initial mRNA sequence. For the j-th iteration, the mRNA first sequences in the first sequence set obtained at the end of the j-th iteration are amplified to obtain a second sequence set, and a mutation operation is performed on the mRNA second sequences in the second sequence set: mutation candidate regions are determined based on the local scores of the codon K-mers, and synonymous codon substitution is performed on the K original codons in each mutation candidate region. Whether to accept a mutation is then determined based on the first global score after the substitution: in response to the first global score after the current substitution being greater than the second global score before the substitution, the mutation ends and a third sequence set is obtained, wherein the second global score is the global score of the mRNA second sequence. The mRNA third sequences in the third sequence set are filtered based on the first threshold and the second threshold to determine candidate mRNA third sequences. The global scores of the candidate mRNA third sequences are then determined, and the candidates are sorted by global score to obtain the first sequence set at the end of the current iteration. Finally, it is judged whether the iteration end condition is met; if so, iteration stops and the mRNA target sequence is output; otherwise, the (j+2)-th iteration is carried out, and so on until the iteration end condition is met.
By way of example, the embodiments of the present disclosure are explained using the optimization of double codons in an mRNA vaccine. Since this scenario optimizes the double-codon frequency, K=2. The frequencies of occurrence of adjacent 2 amino acid pairs and adjacent 2 codon pairs in the genome of the host (here, human) are counted, and a local score for each codon pair is calculated according to formula (3):
The local score of each codon pair may be stored in a hash table so that it can be looked up in subsequent calculations.
Step 1: an initial CAI and an initial MFE of an initial sequence of mRNA are obtained, wherein the initial CAI can be calculated based on a codon frequency table of the host (human). For the CAI and MFE allowable loss values, values close to 1 should be taken, here, for example, 0.95, indicating that after the double codon frequency optimization, the absolute values of the CAI and MFE of the sequence are not lower than 95% of their corresponding initial values. Further, according to the formulas (1) and (2), a first threshold value and a second threshold value are calculated.
Step 2: the total number n of amplified samples is set to 100; that is, the existing samples are copied in turn, from high to low according to the global score of the mRNA sequence, until the total number of samples reaches 100. For the first iteration, the number of samples is 1, so the sample needs to be copied 99 times; for subsequent iterations, the number of samples is the number of samples screened in the previous iteration, which is less than n, and the number of copies is 100 minus the current number of samples.
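The amplification described in this step might be sketched as follows. The function name amplify and its return convention (existing samples and new copies returned separately, so that only the copies are later mutated) are assumptions of the sketch; cycling through the ranked samples is one reading of "copied in turn, from high to low".

```python
from itertools import cycle, islice

def amplify(samples, scores, n=100):
    """Copy existing samples, best global score first, until n samples exist.

    Returns (ranked, copies): the existing samples sorted from high to low
    score, and the newly created copies.
    """
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    ranked = [samples[i] for i in order]
    needed = max(0, n - len(ranked))
    copies = list(islice(cycle(ranked), needed))
    return ranked, copies
```

For the first iteration a single sample yields 99 copies; for later iterations the number of copies is n minus the number of samples kept from the previous iteration.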
For each newly amplified sequence, a mutation operation is performed. First, the local score of each double codon in the sequence is obtained by querying the local score hash table described above. Then, one of the s lowest-scoring double codons is randomly selected as the mutation candidate region, where the parameter s can be set according to the length of the sequence, for example to 20% of the total number of codons in the sequence. Next, mutation of the 2 codons of the mutation candidate region is attempted in turn. In each attempt, the original codon at the current position is replaced with one of its synonymous candidate codons; if the replacement increases the global score of the sequence, the mutation is accepted; otherwise, the other candidate codons, and then the original codons at the other positions in the mutation candidate region, are tried, until a mutation is accepted or the mutation candidate region has been searched exhaustively. A third sequence set is thereby obtained.
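The selection of the mutation candidate region for K=2 can be illustrated with the following sketch. The function name pick_candidate_region, the default score of 0.0 for a codon pair absent from the table, and the injectable random source rng are hypothetical choices made for illustration.

```python
import random

def pick_candidate_region(codons, local_scores, fraction=0.2, rng=random):
    """Pick the start index of one of the s lowest-scoring double codons.

    s is taken as `fraction` of the number of codons (at least 1).
    Returns None when the sequence holds fewer than two codons.
    """
    scored = [
        (local_scores.get((codons[i], codons[i + 1]), 0.0), i)  # assumed default
        for i in range(len(codons) - 1)
    ]
    if not scored:
        return None
    s = max(1, int(fraction * len(codons)))
    lowest = sorted(scored)[:s]      # the s lowest local scores
    return rng.choice(lowest)[1]     # random choice among them
```

Choosing at random among the s lowest-scoring pairs, rather than always taking the single lowest, lets identical copies of one sequence explore different regions.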
Step 3: the CAI and the MFE of each mRNA third sequence are calculated; an mRNA third sequence is filtered out if its CAI is smaller than the first threshold or if its MFE is greater than the second threshold, and the remaining sequences are the candidate mRNA third sequences.
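A minimal sketch of the thresholds and of this filtering step follows. It assumes that each threshold is the product of the initial value and its allowable loss value, which is consistent with the statement that the absolute values of the CAI and the MFE must not fall below 95% of their initial values (the MFE being negative); this product form, and the names constraint_thresholds and filter_third_sequences, are assumptions. The CAI and MFE values are taken as given, since their calculation lies outside the sketch.

```python
def constraint_thresholds(initial_cai, initial_mfe, cai_loss=0.95, mfe_loss=0.95):
    """Assumed form: threshold = initial value x allowable loss value."""
    return cai_loss * initial_cai, mfe_loss * initial_mfe

def filter_third_sequences(records, cai_min, mfe_max):
    """Keep sequences with CAI >= first threshold and MFE <= second threshold.

    records: iterable of (sequence, cai, mfe) tuples computed elsewhere.
    """
    return [seq for seq, cai, mfe in records if cai >= cai_min and mfe <= mfe_max]
```

With an initial CAI of 0.8 and an initial MFE of -200, the assumed thresholds are 0.76 and -190, so a sequence is kept only if its CAI is at least 0.76 and its MFE is at most -190.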
Step 4: the global score of each candidate mRNA third sequence is calculated, the candidate mRNA third sequences are sorted by global score from large to small, and the top m samples are selected to obtain the first sequence set at the end of the current iteration. Here m should be significantly smaller than n (n=100), for example 20.
Step 5: it is judged whether the iteration end condition is met. The iteration end condition may be that the global score reaches a set threshold, or that the number of iterations reaches a set value J. Here, taking a limit on the number of iterations as an example, J=100. If the iteration end condition is met, the first sequence set is output as the mRNA target sequence and the iteration ends; otherwise, step 2 is executed again and the next iteration begins.
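Steps 2 to 5 can be combined into the following loop, given as a sketch under stated assumptions rather than as a definitive implementation. The callables mutate, passes_constraints and global_score are hypothetical and supplied by the caller; the sketch further assumes that the unmutated samples are carried forward alongside the mutated copies, and that iteration stops early if no sequence passes the constraints.

```python
from itertools import cycle, islice

def optimize(initial, mutate, passes_constraints, global_score,
             n=100, m=20, max_iter=100, score_target=None):
    """Iterate amplify -> mutate -> filter -> select top m, up to max_iter times."""
    first_set = [initial]
    for _ in range(max_iter):
        # Step 2: amplify to n samples, best first, and mutate the new copies.
        ranked = sorted(first_set, key=global_score, reverse=True)
        copies = list(islice(cycle(ranked), max(0, n - len(ranked))))
        third_set = ranked + [mutate(seq) for seq in copies]
        # Step 3: drop sequences violating the CAI / MFE constraints.
        kept = [seq for seq in third_set if passes_constraints(seq)]
        if not kept:
            break  # assumption: keep the previous set if nothing survives
        # Step 4: keep the m best sequences by global score.
        first_set = sorted(kept, key=global_score, reverse=True)[:m]
        # Step 5: optional early stop once the score target is reached.
        if score_target is not None and global_score(first_set[0]) >= score_target:
            break
    return first_set
```

Passing the scoring, mutation and constraint logic in as callables keeps the loop independent of how the CAI, the MFE and the local scores are actually computed.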
Alternatively, the mRNA target sequence may be modified according to the use requirements of the mRNA vaccine. For example, the wrong base on the mRNA target sequence can be corrected, and the sequence can be ensured to meet the requirement of mRNA vaccine.
In correspondence with the adjacent codon optimization method provided in the above embodiments, an embodiment of the present disclosure further provides an adjacent codon optimization device, and since the adjacent codon optimization device provided in the embodiment of the present disclosure corresponds to the adjacent codon optimization method provided in the above embodiments, implementation of the adjacent codon optimization method described above is also applicable to the adjacent codon optimization device provided in the embodiment of the present disclosure, and will not be described in detail in the following embodiments.
Fig. 9 is a schematic structural diagram of an optimizing device for adjacent codons according to an embodiment of the disclosure.
As shown in fig. 9, the optimizing device 900 of adjacent codons in the embodiment of the disclosure includes an obtaining module 901 and an optimizing module 902.
The acquisition module 901 is configured to acquire an initial mRNA sequence of the mRNA to be optimized.
The optimizing module 902 is configured to perform J iterative optimizations on a codon K-mer in the mRNA initial sequence to obtain one or more mRNA target sequences, where J is a natural number greater than or equal to 1, and the codon K-mer is K adjacent codons in the mRNA sequence.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: start iteration from the initial mRNA sequence, and for the j-th iteration, acquire a first sequence set at the end of the j-th iteration and amplify the mRNA first sequence in the first sequence set to obtain a second sequence set at the beginning of the (j+1)-th iteration, wherein the second sequence set comprises the first sequence and the mRNA second sequence, j is a natural number, and j is greater than or equal to 1 and less than or equal to J; perform a mutation operation on a codon K-mer in an mRNA second sequence in the second sequence set to obtain a third sequence set, wherein the third sequence set comprises an mRNA third sequence, and determine a first sequence set at the end of the (j+1)-th iteration from the third sequence set; and in response to the (j+1)-th iteration meeting the iteration end condition, obtain the mRNA target sequence based on the first sequence set at the end of the (j+1)-th iteration, or in response to the (j+1)-th iteration not meeting the iteration end condition, continue with the (j+2)-th iteration based on the first sequence set at the end of the (j+1)-th iteration.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: obtaining a local score for each codon K-mer in each of the mRNA second sequences; determining a mutation candidate region from the mRNA second sequence based on the local score; and carrying out mutation operation on the original codon in the mutation candidate region to obtain the mRNA third sequence.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: obtaining one or more synonymous candidate codons for the original codon; and carrying out codon substitution on the original codon of the mutation candidate region based on the candidate codon to obtain the mRNA third sequence.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: determine a mutation order of the candidate codons, replace the original codons with the candidate codons according to the mutation order, and determine a first global score after the mRNA sequence replacement; in response to the first global score after the current substitution being greater than the second global score before the mRNA sequence substitution, end the mutation to obtain the mRNA third sequence; and in response to the first global score after the current substitution being less than or equal to the second global score before the mRNA sequence substitution, replace the original codon with the next candidate codon and perform the subsequent steps.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: determining the position serial numbers of the plurality of original codons, and carrying out synonymous codon substitution on the original codons according to the position serial numbers of the original codons; if the original codon at the position i is replaced by the candidate codon and the first global score after each replacement is smaller than or equal to the second global score, carrying out synonymous codon replacement on the original codon at the position i+1, wherein i is a natural number larger than or equal to 1; and if the original codon at the position i+1 is replaced by the synonymous codon, and the first global score is larger than the second global score after the replacement, stopping the synonymous codon replacement operation on the original codon at the subsequent position to obtain a mutation sequence corresponding to the mRNA sequence, otherwise, continuing to replace the synonymous codon on the original codon at the position i+2.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: obtaining a set of constraint thresholds for an output sequence of the mRNA, wherein the set of constraint thresholds comprises a first threshold for a codon adaptation index CAI and a second threshold for a minimum folding free energy MFE; and filtering the third sequence in the third sequence set based on the first threshold and the second threshold to obtain a first sequence set at the end of the j+1th iteration.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: determining CAI and MFE for each third sequence in the set of third sequences; determining, from the third set of sequences, candidate third sequences for which the CAI is greater than or equal to the first threshold and the MFE is less than or equal to the second threshold; and obtaining a first sequence set at the end of the j+1th iteration based on the candidate third sequence.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: determining an initial CAI and an initial MFE for an initial sequence of the mRNA; determining CAI and MFE allowable loss values for the output sequence of the mRNA; the first threshold is determined based on the initial CAI and the CAI allowable loss value, and the second threshold is determined based on the initial MFE and the MFE allowable loss value.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: determine global scores of the candidate third sequences, and sort the candidate third sequences according to the global scores; and determine target third sequences from the candidate third sequences according to the sorting result, to obtain the first sequence set at the end of the (j+1)-th iteration.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: for any codon K-mer, obtaining the occurrence frequency of the codon, the amino acid, the adjacent K codon pairs and the adjacent K amino acid pairs in the genome where the mRNA sequence is located, wherein K is a natural number greater than or equal to 1; based on the frequency of occurrence, a local score for the codon K-mer is determined.
In one embodiment of the present disclosure, the optimizing module 902 is further configured to: a global score for the mRNA sequence is determined based on the local score and the number of codons in the mRNA sequence.
In one embodiment of the present disclosure, the iteration end condition used by the optimizing module 902 includes: the first global score of the first sequence set being greater than or equal to a set global score threshold; or, the number of iterations being equal to J.
According to the optimizing device for adjacent codons, provided by the embodiment of the disclosure, the mRNA initial sequence to be optimized is obtained, repeated iterative optimization is performed from the mRNA initial sequence, the mRNA sequence is amplified and mutated, whether the mutated sequence meets the iteration ending condition or not is judged, iteration is ended when the mutated sequence meets the iteration ending condition, the mRNA target sequence is obtained, the use frequency of the codons can be optimized, the stability of the mRNA is ensured, the translation efficiency of the mRNA in a specified host cell is improved, and the protein expression yield is improved.
In the technical solution of the present disclosure, the acquisition, storage and application of the user personal information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to computer programs/instructions stored in a Read Only Memory (ROM) 1002 or loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, the optimization method of adjacent codons. For example, in some embodiments, the optimization method of adjacent codons may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program/instructions may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program/instructions are loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the optimization method of adjacent codons described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the optimization method of adjacent codons by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs/instructions that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs/instructions running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. A method of optimizing adjacent codons, wherein the method comprises:
obtaining an initial mRNA sequence of the mRNA to be optimized;
and performing J rounds of iterative optimization on the codon K-mers in the initial mRNA sequence to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1, and a codon K-mer is K adjacent codons in the mRNA sequence.
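By way of non-limiting illustration, the codon K-mers recited in claim 1 may be enumerated as windows of K adjacent codons. The following is a minimal sketch under stated assumptions: the function name `codon_kmers` and the use of overlapping windows that advance one codon at a time are choices made here for illustration and are not specified by the claim.

```python
def codon_kmers(mrna: str, k: int) -> list[tuple[str, ...]]:
    """Return every window of k adjacent codons in an in-frame mRNA sequence.

    Assumption of this sketch: windows overlap and advance by one codon.
    """
    if k < 1:
        raise ValueError("k must be a natural number >= 1")
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    # Slide a window of k codons across the codon list.
    return [tuple(codons[i:i + k]) for i in range(len(codons) - k + 1)]
```

For example, with K = 2 the sequence AUGGCUGAA yields the two K-mers (AUG, GCU) and (GCU, GAA).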
2. The method of claim 1, wherein said performing J rounds of iterative optimization on the codon K-mers in the initial mRNA sequence to obtain one or more mRNA target sequences comprises:
starting the iteration from the initial mRNA sequence and, for the j-th iteration, acquiring a first sequence set at the end of the j-th iteration, and amplifying the mRNA first sequence in the first sequence set to obtain a second sequence set at the beginning of the (j+1)-th iteration, wherein the second sequence set comprises the mRNA first sequence and an mRNA second sequence, j is a natural number, and j is greater than or equal to 1 and less than or equal to J;
performing a mutation operation on a codon K-mer in an mRNA second sequence in the second sequence set to obtain a third sequence set, wherein the third sequence set comprises an mRNA third sequence, and determining, from the third sequence set, the first sequence set at the end of the (j+1)-th iteration;
and in response to the (j+1)-th iteration satisfying the iteration end condition, obtaining the mRNA target sequence based on the first sequence set at the end of the (j+1)-th iteration, or, in response to the (j+1)-th iteration not satisfying the iteration end condition, proceeding to the (j+2)-th iteration based on the first sequence set at the end of the (j+1)-th iteration.
3. The method of claim 2, wherein said performing a mutation operation on a codon K-mer in an mRNA second sequence in the second sequence set comprises:
obtaining a local score for each codon K-mer in each mRNA second sequence;
determining a mutation candidate region from the mRNA second sequence based on the local score;
and performing a mutation operation on an original codon in the mutation candidate region to obtain the mRNA third sequence.
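As a hedged illustration of claim 3, one possible way to determine mutation candidate regions is to pick the codon K-mers with the lowest local scores. The claim only requires that the regions be determined based on the local scores; the lowest-first selection rule, the function name `mutation_candidates`, and the parameter `max_regions` are assumptions of this sketch.

```python
def mutation_candidates(local_scores: list[float], max_regions: int) -> list[int]:
    """Return the start indices of the lowest-scoring codon K-mers.

    Assumption of this sketch: a low local score marks a K-mer worth mutating.
    """
    # Rank K-mer indices from lowest to highest local score.
    ranked = sorted(range(len(local_scores)), key=lambda i: local_scores[i])
    # Keep the worst max_regions K-mers and report them in sequence order.
    return sorted(ranked[:max_regions])
```

With local scores of 0.4, -1.2, 0.1, and -0.3 and at most two regions, the sketch selects the K-mers at indices 1 and 3.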
4. The method of claim 3, wherein said performing a mutation operation on an original codon in the mutation candidate region to obtain the mRNA third sequence comprises:
obtaining one or more synonymous candidate codons for the original codon;
and performing codon substitution on the original codon in the mutation candidate region based on the candidate codons to obtain the mRNA third sequence.
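For illustration only, the synonymous candidate codons of claim 4 can be looked up from the standard genetic code. The sketch below builds the 64-entry table in the RNA alphabet from the conventional compact encoding (bases ordered U, C, A, G; `*` marks a stop codon). The names `CODON_TABLE` and `synonymous_codons` are illustrative, and the use of the standard code rather than an organism-specific table is an assumption.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, first base varying slowest and third base fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): amino
    for bases, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_codons(codon: str) -> list[str]:
    """Return all other codons that encode the same amino acid as `codon`."""
    amino = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items() if a == amino and c != codon]
```

Methionine (AUG) and tryptophan (UGG) have no synonymous codons, so a mutation candidate region made up only of those codons offers nothing to substitute.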
5. The method of claim 4, wherein said performing codon substitution on the original codon in the mutation candidate region based on the candidate codons to obtain the mRNA third sequence comprises:
determining a mutation order of the candidate codons, replacing the original codon with the candidate codons in turn according to the mutation order, and determining a first global score of the mRNA sequence after each substitution;
in response to the first global score after the current substitution being greater than a second global score of the mRNA sequence before the substitution, ending the mutation to obtain the mRNA third sequence;
and in response to the first global score after the current substitution being less than or equal to the second global score of the mRNA sequence before the substitution, replacing the original codon with the next candidate codon and performing the subsequent steps.
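The substitution procedure of claim 5 amounts to a first-improvement search over the candidate codons. The following minimal sketch assumes a caller-supplied scoring function `global_score` and treats the order of `candidates` as the mutation order; both names are hypothetical. Returning the sequence unchanged when no candidate improves the score is also an assumption, since the claim does not address that case.

```python
from typing import Callable, Sequence

def first_improving_substitution(
    codons: Sequence[str],
    position: int,
    candidates: Sequence[str],
    global_score: Callable[[Sequence[str]], float],
) -> list[str]:
    """Try candidate codons in order and stop at the first that raises the score."""
    baseline = global_score(codons)  # second global score (before substitution)
    for candidate in candidates:  # candidates are visited in mutation order
        trial = list(codons)
        trial[position] = candidate
        # First global score (after substitution) must strictly exceed the baseline.
        if global_score(trial) > baseline:
            return trial
    # Assumption: no improving candidate means the sequence is left as it was.
    return list(codons)
```

Because the search stops at the first improvement, the result depends on the mutation order; a different order may accept a different synonymous codon.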
6. The method of claim 4, wherein the mutation candidate region comprises a plurality of original codons, the method further comprising:
determining position indices of the plurality of original codons, and performing synonymous codon substitution on the original codons in order of their position indices;
if, after the original codon at position i has been replaced by each candidate codon, the first global score after every substitution is less than or equal to the second global score, performing synonymous codon substitution on the original codon at position i+1, wherein i is a natural number greater than or equal to 1;
and if, after the original codon at position i+1 is replaced by a synonymous codon, the first global score is greater than the second global score, stopping synonymous codon substitution on the original codons at subsequent positions to obtain a mutated sequence corresponding to the mRNA sequence; otherwise, continuing with synonymous codon substitution on the original codon at position i+2.
7. The method of any one of claims 2-5, wherein said determining, from the third sequence set, the first sequence set at the end of the (j+1)-th iteration comprises:
obtaining a set of constraint thresholds for an output sequence of the mRNA, wherein the set of constraint thresholds comprises a first threshold for a codon adaptation index (CAI) and a second threshold for a minimum folding free energy (MFE);
and filtering the third sequences in the third sequence set based on the first threshold and the second threshold to obtain the first sequence set at the end of the (j+1)-th iteration.
8. The method of claim 7, wherein said filtering the third sequences in the third sequence set based on the first threshold and the second threshold to obtain the first sequence set at the end of the (j+1)-th iteration comprises:
determining the CAI and the MFE of each third sequence in the third sequence set;
determining, from the third sequence set, candidate third sequences for which the CAI is greater than or equal to the first threshold and the MFE is less than or equal to the second threshold;
and obtaining the first sequence set at the end of the (j+1)-th iteration based on the candidate third sequences.
9. The method of claim 7, wherein said obtaining a set of constraint thresholds for the output sequence of the mRNA comprises:
determining an initial CAI and an initial MFE of the initial mRNA sequence;
determining a CAI allowable loss value and an MFE allowable loss value for the output sequence of the mRNA;
and determining the first threshold based on the initial CAI and the CAI allowable loss value, and determining the second threshold based on the initial MFE and the MFE allowable loss value.
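The threshold derivation of claim 9 and the filtering of claim 8 can be sketched together as follows. The claims state only that each threshold is determined from the initial value and the allowable loss value; subtracting the CAI loss and adding the MFE loss, as done here, is an assumption. The names `Candidate`, `constraint_thresholds`, and `filter_candidates` are hypothetical, and the CAI and MFE values are taken as precomputed inputs because their calculation is outside the scope of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    sequence: str
    cai: float  # codon adaptation index, higher is better
    mfe: float  # minimum folding free energy, lower means a more stable fold

def constraint_thresholds(
    initial_cai: float, initial_mfe: float, cai_loss: float, mfe_loss: float
) -> tuple[float, float]:
    """Assumed form: allow CAI to drop by cai_loss and MFE to rise by mfe_loss."""
    return initial_cai - cai_loss, initial_mfe + mfe_loss

def filter_candidates(
    candidates: list[Candidate], cai_min: float, mfe_max: float
) -> list[Candidate]:
    """Keep sequences with CAI >= first threshold and MFE <= second threshold."""
    return [c for c in candidates if c.cai >= cai_min and c.mfe <= mfe_max]
```

Under this assumed form, an initial CAI of 0.80 with an allowable loss of 0.05 gives a first threshold of about 0.75, and an initial MFE of -120.0 with an allowable loss of 10.0 gives a second threshold of -110.0.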
10. The method of claim 7, wherein said obtaining the first sequence set at the end of the (j+1)-th iteration based on the candidate third sequences comprises:
determining global scores of the candidate third sequences, and sorting the candidate third sequences according to the global scores;
and determining a target third sequence from the candidate third sequences according to the sorting result, to obtain the first sequence set at the end of the (j+1)-th iteration.
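A minimal sketch of the sorting step in claim 10 follows. Keeping the `keep` highest-scoring sequences is an assumption; the claim says only that the target third sequences are determined according to the sorting result. The function name `select_top` and its parameters are hypothetical.

```python
from typing import Callable

def select_top(
    candidates: list[str], global_score: Callable[[str], float], keep: int
) -> list[str]:
    """Sort candidates by global score, best first, and keep the top `keep`."""
    ranked = sorted(candidates, key=global_score, reverse=True)
    return ranked[:keep]
```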
11. The method of claim 3, wherein the determination of the local score comprises:
for any codon K-mer, obtaining the occurrence frequencies of the codons, the amino acids, the adjacent K codon pairs, and the adjacent K amino acid pairs in the genome to which the mRNA sequence belongs, wherein K is a natural number greater than or equal to 1;
and determining a local score for the codon K-mer based on the occurrence frequencies.
12. The method of claim 11, wherein the determination of the global score comprises:
determining a global score for the mRNA sequence based on the local scores and the number of codons in the mRNA sequence.
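For K = 2, a widely used statistic built from the quantities listed in claim 11 is the codon pair score: the natural logarithm of the observed codon-pair count divided by the count expected from the individual codon and amino acid counts. Whether the local score of the present disclosure takes exactly this form is an assumption of the sketch below, as is the choice to average the local scores over the number of codon pairs (the number of codons minus one) for the global score. The function names are hypothetical.

```python
import math

def local_score(
    pair_count: int,
    codon_counts: tuple[int, int],
    amino_counts: tuple[int, int],
    amino_pair_count: int,
) -> float:
    """Codon-pair-score style local score for K = 2; all counts must be positive."""
    first_codon, second_codon = codon_counts
    first_amino, second_amino = amino_counts
    # Pair count expected if codon choice were independent of the neighbouring codon.
    expected = first_codon * second_codon / (first_amino * second_amino) * amino_pair_count
    return math.log(pair_count / expected)

def global_score(local_scores: list[float], n_codons: int) -> float:
    """Assumed form: mean local score over the n_codons - 1 adjacent codon pairs."""
    if n_codons < 2:
        raise ValueError("at least two codons are needed to form a codon pair")
    return sum(local_scores) / (n_codons - 1)
```

A positive local score under this form means the codon pair occurs more often in the genome than expected; a negative score means it is under-represented.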
13. The method of claim 2, wherein the target sequence is a sequence in a set of sequences iteratively derived when an iteration end condition is satisfied, the iteration end condition comprising:
the first global score of the first sequence set is greater than or equal to a set global score threshold; or,
the number of iterations is equal to J.
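Taken together, claims 2 and 13 describe a loop of amplification, mutation, filtering, and a termination check. The sketch below is one possible arrangement under stated assumptions: `mutate`, `accept`, and `score` are hypothetical caller-supplied functions, amplification is modelled as making `copies` duplicates of each sequence, the best member's score stands in for the global score of the first sequence set, and the previous set is retained when no mutated sequence passes the filter.

```python
from typing import Callable

def optimize(
    initial: str,
    mutate: Callable[[str], str],
    accept: Callable[[str], bool],
    score: Callable[[str], float],
    max_iters: int,
    score_goal: float,
    copies: int = 3,
    keep: int = 2,
) -> list[str]:
    """Iterate amplify -> mutate -> filter until the score goal or J iterations."""
    first_set = [initial]
    for _ in range(max_iters):  # at most J iterations
        # Amplify: copy every sequence of the first set into the second set.
        second_set = [seq for seq in first_set for _copy in range(copies)]
        # Mutate each copy to build the third set.
        third_set = [mutate(seq) for seq in second_set]
        # Filter by the constraints and drop duplicates, preserving order.
        kept = list(dict.fromkeys(seq for seq in third_set if accept(seq)))
        if kept:
            first_set = sorted(kept, key=score, reverse=True)[:keep]
        # End condition: best global score reaches the set threshold.
        if score(first_set[0]) >= score_goal:
            break
    return first_set
```

In practice `mutate` would be randomized so that the copies diverge; with a deterministic `mutate`, as in a toy test, the duplicates collapse back to a single sequence at the filtering step.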
14. An apparatus for optimizing adjacent codons, wherein the apparatus comprises:
an acquisition module configured to acquire an initial mRNA sequence of the mRNA to be optimized;
and an optimization module configured to perform J rounds of iterative optimization on the codon K-mers in the initial mRNA sequence to obtain one or more mRNA target sequences, wherein J is a natural number greater than or equal to 1, and a codon K-mer is K adjacent codons in the mRNA sequence.
15. The apparatus of claim 14, wherein the optimization module is further configured to:
start the iteration from the initial mRNA sequence and, for the j-th iteration, acquire a first sequence set at the end of the j-th iteration, and amplify the mRNA first sequence in the first sequence set to obtain a second sequence set at the beginning of the (j+1)-th iteration, wherein the second sequence set comprises the mRNA first sequence and an mRNA second sequence, j is a natural number, and j is greater than or equal to 1 and less than or equal to J;
perform a mutation operation on a codon K-mer in an mRNA second sequence in the second sequence set to obtain a third sequence set, wherein the third sequence set comprises an mRNA third sequence, and determine, from the third sequence set, the first sequence set at the end of the (j+1)-th iteration;
and in response to the (j+1)-th iteration satisfying the iteration end condition, obtain the mRNA target sequence based on the first sequence set at the end of the (j+1)-th iteration, or, in response to the (j+1)-th iteration not satisfying the iteration end condition, proceed to the (j+2)-th iteration based on the first sequence set at the end of the (j+1)-th iteration.
16. The apparatus of claim 15, wherein the optimization module is further configured to:
obtain a local score for each codon K-mer in each mRNA second sequence;
determine a mutation candidate region from the mRNA second sequence based on the local score;
and perform a mutation operation on an original codon in the mutation candidate region to obtain the mRNA third sequence.
17. The apparatus of claim 16, wherein the optimization module is further configured to:
obtain one or more synonymous candidate codons for the original codon;
and perform codon substitution on the original codon in the mutation candidate region based on the candidate codons to obtain the mRNA third sequence.
18. The apparatus of claim 17, wherein the optimization module is further configured to:
determine a mutation order of the candidate codons, replace the original codon with the candidate codons in turn according to the mutation order, and determine a first global score of the mRNA sequence after each substitution;
in response to the first global score after the current substitution being greater than a second global score of the mRNA sequence before the substitution, end the mutation to obtain the mRNA third sequence;
and in response to the first global score after the current substitution being less than or equal to the second global score of the mRNA sequence before the substitution, replace the original codon with the next candidate codon and perform the subsequent steps.
19. The apparatus of claim 17, wherein the optimization module is further configured to:
determine position indices of the plurality of original codons, and perform synonymous codon substitution on the original codons in order of their position indices;
if, after the original codon at position i has been replaced by each candidate codon, the first global score after every substitution is less than or equal to the second global score, perform synonymous codon substitution on the original codon at position i+1, wherein i is a natural number greater than or equal to 1;
and if, after the original codon at position i+1 is replaced by a synonymous codon, the first global score is greater than the second global score, stop synonymous codon substitution on the original codons at subsequent positions to obtain a mutated sequence corresponding to the mRNA sequence; otherwise, continue with synonymous codon substitution on the original codon at position i+2.
20. The apparatus of any one of claims 15-18, wherein the optimization module is further configured to:
obtain a set of constraint thresholds for an output sequence of the mRNA, wherein the set of constraint thresholds comprises a first threshold for a codon adaptation index (CAI) and a second threshold for a minimum folding free energy (MFE);
and filter the third sequences in the third sequence set based on the first threshold and the second threshold to obtain the first sequence set at the end of the (j+1)-th iteration.
21. The apparatus of claim 20, wherein the optimization module is further configured to:
determine the CAI and the MFE of each third sequence in the third sequence set;
determine, from the third sequence set, candidate third sequences for which the CAI is greater than or equal to the first threshold and the MFE is less than or equal to the second threshold;
and obtain the first sequence set at the end of the (j+1)-th iteration based on the candidate third sequences.
22. The apparatus of claim 20, wherein the optimization module is further configured to:
determine an initial CAI and an initial MFE of the initial mRNA sequence;
determine a CAI allowable loss value and an MFE allowable loss value for the output sequence of the mRNA;
and determine the first threshold based on the initial CAI and the CAI allowable loss value, and determine the second threshold based on the initial MFE and the MFE allowable loss value.
23. The apparatus of claim 20, wherein the optimization module is further configured to:
determine global scores of the candidate third sequences, and sort the candidate third sequences according to the global scores;
and determine a target third sequence from the candidate third sequences according to the sorting result, to obtain the first sequence set at the end of the (j+1)-th iteration.
24. The apparatus of claim 16, wherein the optimization module is further configured to:
for any codon K-mer, obtain the occurrence frequencies of the codons, the amino acids, the adjacent K codon pairs, and the adjacent K amino acid pairs in the genome to which the mRNA sequence belongs, wherein K is a natural number greater than or equal to 1;
and determine a local score for the codon K-mer based on the occurrence frequencies.
25. The apparatus of claim 24, wherein the optimization module is further configured to:
determine a global score for the mRNA sequence based on the local scores and the number of codons in the mRNA sequence.
26. The apparatus of claim 15, wherein the iteration end condition comprises:
the first global score of the first sequence set is greater than or equal to a set global score threshold; or,
the number of iterations is equal to J.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-13.
29. A computer program product comprising computer program/instructions which, when executed by a processor, implement the method steps of any one of claims 1 to 13.
CN202311605984.1A 2023-11-28 2023-11-28 Adjacent codon optimization method and device, electronic equipment and storage medium Pending CN117660445A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311605984.1A CN117660445A (en) 2023-11-28 2023-11-28 Adjacent codon optimization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117660445A (en) 2024-03-08

Family

ID=90070645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311605984.1A Pending CN117660445A (en) 2023-11-28 2023-11-28 Adjacent codon optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117660445A (en)

Similar Documents

Publication Publication Date Title
US20230366046A1 (en) Systems and methods for analyzing viral nucleic acids
US20200232029A1 (en) Systems and methods for mitochondrial analysis
TWI802728B (en) Codon optimization method, system and electronic device comprising the same, nucleic acid molecule thereof, and protein expression method using the same
Zhu et al. DNA sequence compression using adaptive particle swarm optimization-based memetic algorithm
US20170199959A1 (en) Genetic analysis systems and methods
CN113035269B (en) Genome metabolism model construction, optimization and visualization method based on high-throughput sequencing technology
US20170098030A1 (en) System and method for generating detection of hidden relatedness between proteins via a protein connectivity network
Zheng et al. Detecting distant-homology protein structures by aligning deep neural-network based contact maps
KR20160073406A (en) Systems and methods for using paired-end data in directed acyclic structure
Choo et al. Recent applications of hidden Markov models in computational biology
CN112580324A (en) Text error correction method and device, electronic equipment and storage medium
Tran et al. Detection of generic differential RNA processing events from RNA-seq data
CA3056303A1 (en) Systems and methods for determining effects of genetic variation on splice site selection
CN113284559B (en) Method, system and equipment for querying promoter of species genome
Sahraeian et al. PicXAA-R: efficient structural alignment of multiple RNA sequences using a greedy approach
CN113574603A (en) Rapid detection of gene fusion
CN117660445A (en) Adjacent codon optimization method and device, electronic equipment and storage medium
Faure et al. GraphUnzip: unzipping assembly graphs with long reads and Hi-C
He et al. Inference of RNA structural contacts by direct coupling analysis
CN114429786A (en) Omics data processing method and device, electronic device and storage medium
Backofen et al. Comparative RNA Genomics
CN109360600B (en) Protein structure prediction method based on residue characteristic distance
CN114429787B (en) Omics data processing method and device, electronic device and storage medium
Chowdhury et al. An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm
CN114792573B (en) Drug combination effect prediction method, model training method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination