CN117275582A - Construction of amino acid sequence generation model and method for obtaining protein variant - Google Patents

Publication number
CN117275582A
Authority
CN
China
Prior art keywords
amino acid
model
acid sequence
construction
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310832292.4A
Other languages
Chinese (zh)
Inventor
王晨阳
胡亦朗
夏晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhuyao Technology Co ltd
Original Assignee
Shanghai Zhuyao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhuyao Technology Co ltd
Priority to CN202310832292.4A
Publication of CN117275582A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0006 Oxidoreductases (1.) acting on CH-OH groups as donors (1.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y101/00 Oxidoreductases acting on the CH-OH group of donors (1.1)
    • C12Y101/01 Oxidoreductases acting on the CH-OH group of donors (1.1) with NAD+ or NADP+ as acceptor (1.1.1)
    • C12Y101/01037 Malate dehydrogenase (1.1.1.37)
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20 Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Medicinal Chemistry (AREA)
  • Microbiology (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Epidemiology (AREA)

Abstract

The invention provides a construction of an amino acid sequence generation model and a method for obtaining protein variants, in order to generate high-quality protein sequences that have a reasonable structure and the corresponding actual function. Specifically, the construction comprises the following steps: constructing a data set for generation, in which all actually existing amino acid sequences corresponding to the target protein are collected from a public protein database, clustered, and divided into a training data set and an evaluation data set; constructing a network model structure, in which a generation network and a discrimination network are constructed to form a preliminary TPGAN model; model training and evaluation, in which the training data set is input into the preliminary model, the generation network and the discrimination network are simultaneously optimized and iteratively trained using a back propagation algorithm, and the preliminary model is adjusted using the evaluation data set to avoid overfitting, giving an adjusted model; and obtaining the generation model, in which the adjusted model is verified to obtain a generation model that can be used to generate amino acid sequences of the target protein.

Description

Construction of amino acid sequence generation model and method for obtaining protein variant
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a construction method of an amino acid sequence generation model and a protein variant obtaining method.
Background
Proteins are important basic substances in living bodies, and the diversity of amino acid sequences is important for the survival and reproduction of living bodies.
The TPGAN model (Transformer-based protein generative adversarial network) is a large language model that can effectively generate brand-new protein sequences (amino acid sequences) and has wide application value.
However, conventional experiment-based protein sequence prediction methods have certain limitations, such as high cost and long time consumption, which has also promoted research into and exploration of the TPGAN model based on deep learning techniques.
Disclosure of Invention
The invention provides a construction of an amino acid sequence generation model and a method for obtaining protein variants. It aims to describe in detail a specific implementation of the TPGAN model and its application to protein sequence generation, so as to obtain high-quality protein sequences that have a reasonable structure and the corresponding actual function, and thereby to achieve better technical popularization and application.
For this purpose, the present invention provides the following technical solutions.
The present invention provides a construction of an amino acid sequence generation model for generating an amino acid sequence of a target protein, comprising: constructing a data set for generation, in which all actually existing amino acid sequences corresponding to the target protein are collected from a public protein database and preprocessed, the sequences are clustered based on their percent identity, a certain number of clusters are randomly selected from all clusters containing 5 or fewer sequences and pooled as the evaluation data set, the number of randomly selected clusters accounting for 20% or less of the total number of clusters obtained by clustering, and the sequences of the remaining clusters are pooled as the training data set; constructing a network model structure, in which a generation network and a discrimination network are constructed to form a preliminary TPGAN model; model training and evaluation, in which the training data set is input into the preliminary model, the generation network and the discrimination network are simultaneously optimized and iteratively trained using a back propagation algorithm, and the preliminary model is adjusted using the evaluation data set to avoid overfitting, giving an adjusted model; and obtaining the generation model, in which the adjusted model is verified to obtain a generation model that can be used to generate amino acid sequences of the target protein.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the preprocessing comprises de-duplication and de-noising, and discarding sequences longer than 500 amino acids.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the construction of the generation network comprises construction of an autoencoder and construction of a generator. The autoencoder is constructed as follows: an encoder and a decoder are built from Transformer modules, each using a four-layer network, with a multi-head attention mechanism applied in between. The generator is a neural network built from three fully connected layers; it takes Gaussian-distributed noise as input, uses a KL divergence loss, transforms the noise through several hidden layers into a vector conforming to a normal distribution, and passes the vector to the decoder, which decodes it to generate a new amino acid sequence.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the discrimination network discriminates whether an amino acid sequence generated by the generation network is reasonable; preferably, the discrimination network is a 3-layer MLP model.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the discrimination network receives real amino acid sequences and amino acid sequences generated by the generation network and, using binary cross entropy as the loss function, learns the differences between real and generated amino acid sequences through several hidden layers, so as to judge whether a received amino acid sequence is a real amino acid sequence.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein a plurality of loss functions are optimized simultaneously during training and the corresponding hyperparameters are adjusted; preferably, the hyperparameters are set to a learning rate of 1e-4, a dropout rate of 0.1, and a batch size of 8.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the adjusted model is verified as follows: the amino acid sequences generated by the model after each round of training and adjustment are aligned against a protein database using BLAST software, and the generation model is obtained when the alignment result has improved 3-fold relative to the alignment result for the sequences generated after the initial training.
The construction of the amino acid sequence generation model provided by the invention also has the following characteristics: wherein the protein of interest is a variant of malate dehydrogenase.
The invention also provides a method for obtaining a protein variant, characterized by comprising the following steps: randomly generating a plurality of amino acid sequences using the generation model; calculating a similarity score between each generated amino acid sequence and the sequences of a protein database using BLAST software, and selecting the amino acid sequences with the top 100 BLAST scores; predicting the three-dimensional structure of each selected amino acid sequence using AlphaFold2 to obtain a pLDDT score, and retaining the amino acid sequences with pLDDT > 90; comparing the three-dimensional structure corresponding to each retained amino acid sequence with the wild-type crystal structure to obtain the structural RMSD, and at the same time analyzing the conserved sites of the wild type; and selecting the amino acid sequences in which all conserved sites are retained and RMSD < 2.0 to obtain the corresponding protein variants, and performing functional comparison tests against the wild type to select the desired protein variants.
The obtaining method provided by the invention also has the following characteristics: wherein, when the protein variant is a malate dehydrogenase variant, preferably a malate dehydrogenase variant whose enzyme activity is at least 1-fold that of the wild type is a desired malate dehydrogenase variant.
Drawings
FIG. 1 is a flow chart showing the construction of the amino acid sequence generation model described in Example 1;
FIG. 2 is a graph of the fitted equation involved in Example 2.
Detailed Description
The following detailed description of the invention is provided in connection with the accompanying drawings. With respect to the specific methods or materials used in the embodiments, those skilled in the art may perform conventional alternatives based on the technical idea of the present invention and are not limited to the specific descriptions of the embodiments of the present invention.
The methods used in the examples are conventional methods unless otherwise specified; the materials, reagents and the like used, unless otherwise specified, are all commercially available.
Malate dehydrogenase (abbreviated as MDH, EC 1.1.1.37) is an enzyme protein that is widely present in organisms, including plants, animals, and microorganisms. The function of this enzyme is to catalyze the redox reaction between malic acid and NAD+, converting malic acid to oxaloacetic acid while reducing NAD+ to NADH.
Malate dehydrogenase plays an important physiological role in organisms and is involved in the regulation of many metabolic pathways, such as tricarboxylic acid cycle, photosynthesis, respiration, etc. In plants, malate dehydrogenase is also involved in regulating plant response to environmental adaptation, such as regulating root growth and development, adaptation to acidic soil, and the like.
Therefore, research on malate dehydrogenase has important significance for deeply knowing metabolic pathways and regulation mechanisms thereof in organisms and improving agricultural production efficiency.
Variants herein refer to proteins whose amino acid sequence differs from that of the wild type but which retain the essential properties of the wild type.
The enzyme activity herein refers to the unit of measurement of enzyme activity: 1 unit of enzyme activity is the amount of enzyme capable of converting 1 μmol of substrate in 1 minute under specific conditions (25 °C, with all other conditions optimal), or the amount of enzyme capable of converting 1 μmol of the relevant group in the substrate.
Kcat is the catalytic constant of the enzyme, also called the turnover number, i.e., the number of substrate molecules that 1 enzyme molecule converts into product per unit time. Kcat can be used to measure the catalytic efficiency of an enzyme: the greater the Kcat value, the higher the catalytic efficiency of the enzyme.
The Michaelis constant Km is defined as the substrate concentration at which the enzyme operates at half its maximum catalytic rate; it therefore describes the affinity of an enzyme for a particular substrate. Knowledge of Km values is crucial for a quantitative understanding of enzymatic and regulatory interactions between enzymes and metabolites, because Km relates the intracellular concentration of a metabolite to the rate at which the enzyme converts it. Km reflects the affinity of the enzyme for the substrate: the smaller the Km value, the greater the affinity of the enzyme for the substrate; conversely, the larger the Km value, the smaller the affinity.
Kcat/Km combines Kcat and Km; it can be used not only to measure the catalytic efficiency of an enzyme but also to indicate how close the enzyme is to catalytic perfection.
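To make the relationship between these constants concrete, the following is a minimal sketch using the standard Michaelis-Menten rate law, v = Kcat·[E]·[S] / (Km + [S]). The rate law itself is textbook enzyme kinetics and is not stated in this document; the function names and the numeric values are illustrative assumptions only.

```python
def michaelis_menten_rate(substrate_conc, kcat, km, enzyme_conc):
    """Reaction rate under the standard Michaelis-Menten rate law.

    v = Vmax * [S] / (Km + [S]), with Vmax = Kcat * [E].
    Units are the caller's responsibility (illustrative sketch only).
    """
    if km <= 0:
        raise ValueError("Km must be positive")
    if substrate_conc < 0:
        raise ValueError("substrate concentration cannot be negative")
    vmax = kcat * enzyme_conc
    return vmax * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Kcat/Km, the combined measure of catalytic efficiency."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


# Illustrative numbers only: when [S] equals Km, the rate is half of Vmax.
vmax = 100.0 * 0.5
half_rate = michaelis_menten_rate(substrate_conc=2.0, kcat=100.0, km=2.0, enzyme_conc=0.5)
print(half_rate, vmax / 2)
```

With [S] set equal to Km, the computed rate is exactly half of Vmax, which matches the definition of Km given above.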
Example 1
The present embodiment provides a construction of an amino acid sequence generation model for generating an amino acid sequence of a target protein, comprising the steps of: the method comprises the steps of constructing a data set for generation, constructing a network model structure, training a model, evaluating the model and obtaining a generation model.
The target protein refers to the protein with the desired function that is ultimately to be obtained, for example, a variant of malate dehydrogenase.
The construction process is explained in detail as follows (see FIG. 1):
The construction of the data set for generation is specifically as follows: all actually existing amino acid sequences corresponding to the target protein are collected from public protein databases and preprocessed; the sequences are clustered based on their percent identity; a certain number of clusters are randomly selected from all clusters containing 5 or fewer sequences and pooled as the evaluation set, the number of randomly selected clusters accounting for 20% or less of the total number of clusters obtained by clustering; and the sequences of the remaining clusters are pooled as the training data set. For example, if clustering yields 50 clusters, of which 30 contain 5 or fewer sequences, then 10 of those 30 clusters are randomly selected and pooled as the evaluation set. "All actually existing amino acid sequences corresponding to the target protein" means amino acid sequences that already exist in reality, namely the wild-type protein corresponding to the final target protein and all other variants relative to the wild type.
Optionally, when the number of clusters containing 5 or fewer sequences is less than or equal to 20% of the total number of clusters obtained by clustering, all clusters containing 5 or fewer sequences are pooled as the evaluation set. For example, if clustering yields 50 clusters, of which 5 contain 5 or fewer sequences, then all 5 of those clusters are pooled as the evaluation set.
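The cluster-based split described above can be sketched as follows. This is a minimal illustration, assuming the clustering has already been done and each cluster is given as a list of sequences; the function name `split_clusters`, its parameters, and the fixed random seed are hypothetical and not taken from this document.

```python
import random


def split_clusters(clusters, max_small_size=5, max_eval_percent=20, seed=0):
    """Split clustered sequences into (training, evaluation) lists.

    clusters: list of clusters, each a list of amino acid sequences.
    Evaluation clusters are drawn only from clusters with at most
    `max_small_size` sequences, and make up at most `max_eval_percent`
    percent of all clusters. All other clusters go to training.
    """
    small_ids = [i for i, c in enumerate(clusters) if len(c) <= max_small_size]
    # Integer arithmetic avoids floating-point rounding of the 20% cap.
    max_eval = len(clusters) * max_eval_percent // 100
    if len(small_ids) <= max_eval:
        eval_ids = set(small_ids)  # optional case: take every small cluster
    else:
        eval_ids = set(random.Random(seed).sample(small_ids, max_eval))
    training = [s for i, c in enumerate(clusters) if i not in eval_ids for s in c]
    evaluation = [s for i, c in enumerate(clusters) if i in eval_ids for s in c]
    return training, evaluation


# Mirrors the worked example: 50 clusters, 30 of them with <= 5 sequences.
example = [[f"small{i}_{j}" for j in range(3)] for i in range(30)]
example += [[f"large{i}_{j}" for j in range(6)] for i in range(20)]
train_set, eval_set = split_clusters(example)
print(len(train_set), len(eval_set))
```

Holding out whole clusters, rather than individual sequences, keeps near-identical sequences from appearing on both sides of the split.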
Preferably, the protein of interest for which the construction process is directed is a variant of malate dehydrogenase.
The construction of the network model structure is specifically as follows: a generation network and a discrimination network are constructed to form a preliminary TPGAN model.
Model training and evaluation are specifically as follows: the training data set is input into the preliminary model, the generation network and the discrimination network are simultaneously optimized and iteratively trained by back propagation according to the loss functions, and the preliminary model is adjusted using the evaluation data set to avoid overfitting, giving an adjusted model.
Obtaining the generation model: the adjusted model is verified to obtain a generation model that can be used to generate amino acid sequences of the target protein.
The TPGAN model adopts generative adversarial network technology. By learning the arrangement and distribution patterns of amino acids in protein sequences, the model extracts the features of protein sequences and generates brand-new protein sequences based on these learned patterns.
Compared with an ordinary generative adversarial network, a Transformer-based pre-trained large protein language model is added. A large model pre-trained on massive numbers of protein sequences can extract the regular features of protein language more effectively, and the attention mechanism in the Transformer allows weights to be learned automatically from the data, thereby providing the model with more informative weights.
In one example, the preprocessing described above includes de-duplicating and de-noising the collected, actually existing amino acid sequences, and discarding sequences longer than 500 amino acids.
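A minimal sketch of such a preprocessing step is shown below. The document does not define what de-noising involves; here it is assumed, purely for illustration, to mean dropping sequences that contain characters other than the 20 standard amino acid letters. The function name `preprocess` is hypothetical.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def preprocess(sequences, max_length=500):
    """De-duplicate, de-noise, and length-filter amino acid sequences.

    Assumption (not from the source): "noise" means any sequence with
    characters outside the 20 standard residues. Order is preserved.
    """
    seen = set()
    kept = []
    for raw in sequences:
        seq = "".join(raw.split()).upper()  # strip whitespace, normalize case
        if not seq or len(seq) > max_length:
            continue  # discard empty entries and sequences over 500 residues
        if not set(seq) <= STANDARD_RESIDUES:
            continue  # assumed de-noising rule
        if seq in seen:
            continue  # de-duplication
        seen.add(seq)
        kept.append(seq)
    return kept


print(preprocess(["MKVAVLG", "mkvavlg", "MKXAVLG", "A" * 501]))
```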
In an example, the building of the generation network includes a self-encoder building and a generator building.
The autoencoder is constructed as follows: the encoder and the decoder are built from Transformer modules, each using a four-layer network, with a multi-head attention mechanism applied in between. The input to the autoencoder is an amino acid sequence and the output is a vector.
The generator is a neural network built from three fully connected layers. It takes Gaussian-distributed noise as input and, using a KL divergence loss, transforms the noise through several hidden layers into a vector conforming to a normal distribution. The vector is passed to the decoder, which decodes it into a probability for each site, and these probabilities are finally converted into an amino acid sequence.
In one example, the discrimination network discriminates whether an amino acid sequence generated by the generation network is reasonable; preferably, the discrimination network is a neural network model consisting of 3 fully connected layers. Specifically, the discrimination network receives real amino acid sequences from the training data set and amino acid sequences generated by the generation network, uses binary cross entropy as the loss function, and learns the differences between real and generated amino acid sequences through several hidden layers in order to judge whether a received amino acid sequence is real. For each received sequence, an output of 1 indicates that the sequence is judged to be real, and an output of 0 indicates that it is not.
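The binary cross entropy loss used by the discrimination network can be written out in plain Python as below. This is an illustrative sketch of the standard formula, not the training code of this document; the function name and the clipping constant `eps` are assumptions.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross entropy over a batch.

    predictions: discriminator outputs in (0, 1), the estimated
                 probability that each sequence is real.
    labels:      1 for a real sequence, 0 for a generated one.
    """
    if len(predictions) != len(labels) or not predictions:
        raise ValueError("predictions and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predictions)


# A confident, correct discriminator has a lower loss than a confused one.
print(binary_cross_entropy([0.9, 0.1], [1, 0]))
print(binary_cross_entropy([0.5, 0.5], [1, 0]))
```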
In an example, when the generation network and the discrimination network are simultaneously optimized and iteratively trained using the back propagation algorithm, a plurality of loss functions are optimized simultaneously and the corresponding hyperparameters are adjusted; preferably, the hyperparameters are set to a learning rate of 1e-4, a dropout rate of 0.1, and a batch size of 8.
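For reference, the preferred hyperparameter values could be collected in a small configuration object such as the one below. The class name `TrainingConfig` and the helper `num_batches` are hypothetical; only the three values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Preferred hyperparameters stated in the text."""
    learning_rate: float = 1e-4
    dropout_rate: float = 0.1
    batch_size: int = 8


def num_batches(num_sequences, config):
    """Number of batches per epoch, counting a final partial batch."""
    return -(-num_sequences // config.batch_size)  # ceiling division


config = TrainingConfig()
print(config, num_batches(20, config))
```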
In one example, the adjusted model is verified as follows: the amino acid sequences generated by the model after each round of training and adjustment are aligned against a protein database using BLAST software to obtain similarity scores, and the generation model is obtained when the scores have improved 3-fold relative to the alignment scores for the sequences generated after the initial training.
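The stopping criterion can be sketched as follows. Two assumptions are made here that the text does not settle: the scores of one round are summarized by their mean, and "improved 3-fold" is read as the current mean being at least 3 times the mean after the initial training. The function name is hypothetical, and the scores in the example are made up for illustration.

```python
from statistics import mean


def first_accepted_round(scores_per_round, factor=3.0):
    """Return the index of the first round meeting the 3-fold criterion.

    scores_per_round[0] holds the BLAST similarity scores of sequences
    generated after the initial training; later entries hold the scores
    after each further round of training and adjustment.
    Returns None if no round meets the criterion.
    """
    baseline = mean(scores_per_round[0])
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    for round_index, scores in enumerate(scores_per_round[1:], start=1):
        if mean(scores) >= factor * baseline:
            return round_index
    return None


# Illustrative scores only: baseline mean 15, so the target mean is 45.
print(first_accepted_round([[10, 20], [20, 30], [40, 50], [60, 70]]))
```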
The embodiment also provides a method for obtaining a protein variant, which comprises the following steps:
randomly generating a plurality of amino acid sequences using the generation model obtained by the training described above;
calculating a similarity score between each generated amino acid sequence and the sequences of the protein database using BLAST software, specifically by aligning each generated sequence against the collected real amino acid sequences, and selecting the amino acid sequences with the top 100 BLAST scores;
predicting the three-dimensional structure of each selected amino acid sequence using AlphaFold2 to obtain a pLDDT score, and retaining the amino acid sequences with pLDDT > 90;
comparing the three-dimensional structure corresponding to each retained amino acid sequence with the wild-type crystal structure to obtain the structural RMSD, and at the same time analyzing the conserved sites of the wild type;
selecting the amino acid sequences in which all conserved sites are retained and RMSD < 2.0 to obtain the corresponding protein variants, and performing functional comparison tests against the wild type to select the desired protein variants.
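The screening steps above can be sketched as a simple filter over precomputed values. This sketch assumes that the BLAST score, pLDDT, and RMSD of each candidate have already been obtained from the respective tools, and that conserved sites are given as 0-based positions valid in each candidate's own numbering (no alignment is performed). The `Candidate` class, the function name, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence: str
    blast_score: float  # similarity score from BLAST
    plddt: float        # confidence score from structure prediction
    rmsd: float         # structural RMSD against the wild-type structure


def select_candidates(candidates, conserved_sites, top_n=100,
                      min_plddt=90.0, max_rmsd=2.0):
    """Apply the screening thresholds from the text in order.

    conserved_sites: dict mapping 0-based position -> required residue.
    """
    # Step 1: keep the top-N candidates by BLAST score.
    top = sorted(candidates, key=lambda c: c.blast_score, reverse=True)[:top_n]
    selected = []
    for cand in top:
        if cand.plddt <= min_plddt:   # Step 2: require pLDDT > 90
            continue
        if cand.rmsd >= max_rmsd:     # Step 3: require RMSD < 2.0
            continue
        sites_kept = all(
            pos < len(cand.sequence) and cand.sequence[pos] == residue
            for pos, residue in conserved_sites.items()
        )
        if sites_kept:                # Step 4: all conserved sites retained
            selected.append(cand)
    return selected


# Made-up values for illustration only.
pool = [
    Candidate("a", "MKVA", 300, 95, 1.0),
    Candidate("b", "MKVA", 290, 85, 1.0),  # fails pLDDT
    Candidate("c", "MKVA", 280, 95, 2.5),  # fails RMSD
    Candidate("d", "MAVA", 270, 95, 1.0),  # conserved site lost
    Candidate("e", "MKVA", 10, 99, 0.5),   # outside top_n
]
print([c.name for c in select_candidates(pool, {1: "K"}, top_n=4)])
```

In practice the expensive steps (structure prediction and structural comparison) would be run only on sequences that survive the earlier filters, as the text describes; the functional comparison against the wild type remains an experimental step outside this sketch.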
In one example, the protein variant to which the obtaining method is directed is a malate dehydrogenase variant. Preferably, a malate dehydrogenase variant is a desired malate dehydrogenase variant when its enzyme activity is at least 1-fold that of the wild type (the wild-type malate dehydrogenase has the amino acid sequence shown in SEQ ID NO: 13). In one example, the malate dehydrogenase variant obtained by the method has an amino acid sequence as shown in any one of SEQ ID NO: 1-12, or has an amino acid sequence with at least 85%, 90%, 95% or more identity to any one of SEQ ID NO: 1-12.
SEQ ID NOS 1-13 are shown in detail as follows:
SEQ ID NO:1:
MKVAVLGAAGGIGQALALLLKTQLPAGSELSLYDIAPVTPGVALDLSHIPTNVEVKGFSGEDATPALEGADVVLISAGVARKPGMDRSDLFNINAGIVRNLVEKIAKTFPSAIIGIITNPVNTTVAIAAEVLKKAGKYDKNKLFGVTTLDIIRSETFVAELKGKDPVEVDVPVIGGHSGVTILPLLSQVPGVSFTNQEVAALTKRIQNAGTEVVEAKAGGGSATLSMGQAAVRFGLSLVRGLQGENGVVECALVEGDGKHARFCAQPLLLGKNGVEERKSYGDLSAFEQQALEGMLATLKTDITLGEEFVKK;
SEQ ID NO:2:
MKVAVLGAAGGIGQALALLLKTQLPSGSELTLYDIAPVTPGVAVDLSHIPTAVKITGFSGEDAAPALEGADIVVISAGVRRKPGMDRSDLAPVNYGIVENLTKQIAKVTPDAIVGIITNPVNATVAVAEAVLEKAGVYDPRKLFGVTTLDIIRSNTFVAELKGKQPGEVEVPVIGGHSGRTIIPLLSQVEGVTFTPEEVKALTRRIQNAGTEVVEAKAGGGSATLSMGQAAARFVLDLVAAKEGAENIVRDALVKNDGSYAHFFTRPCLLGTDGIKEVLSIGELSEFEKARLEASRPYLSAEIAKGFAYVNT;
SEQ ID NO:3:
MKVAVLGAAGGIGQALALLLKTQLPSGSTLTLYDIAPVTPGVAVDLSHIPTAVKIEGFTGEDAAPALEGADIVVISAGVRRKPGMDRSDLKPVNFGIVENLTKQIAEVTPDAIILIITNPVNTTVAIAAEVLKKAGVYDPKRLFGVTTLDIIRSNTFVAELKGKQPGEVEVPVIGGHSGKTIIPLLSKVEGLTFTDEEVEELTKRIQNAGTEVVEAKAGGGSATLSMGQAAARTVLAVARARAGAENVVLDVLVEGDGSYARFFTRPCLLGTDGVKEILSIGELSDFEKKRLEESIPYMKEEIDAGYDYVNN;
SEQ ID NO:4:
MKVAVLGAAGGIGQALALLLKTQLPAGSELSLYDIAPVTPGVAVDLSHIPTAVKVKGFSGEDHTPALEGADVVLISAGVARKPGMDRSDLFNVNAGIVKNLVEQIAKTFPKAIIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDVIRSETFVAELKPKDPVEVDVPVIGGHSGVTILPLLSQVPGVSFTNQEVAALTKRIQNAGTEVVEAKAGGGSATLSMGQAAVRFGLSLVRGLQGENGVVECALVEGDGKHARFCAQPLLLGKNGVEERKSYGKLSAFEQQALEGMLATLKKDITLGEEFVKKGSPAATAAERILVVVITDRN;
SEQ ID NO:5:
MKKKVTVVGAGNVGATAAQEIAEKESRDVVLDDGMEGLPQGKALDVLQAGPLIGQSARISGTNDSSGTAGSDVVVITAGIPRKPGMSRDDLIGTNADIVKSVTENVVKLSPKAYIIVVSNPLDAMGYTAFSATGFPIERVIGMAGALDSARFRAFIAMELNVSAGNIQAVVLGGHGDTMVPLKRRTTVAGIPITSLMSAEGIEVIVMRTRMGGAEIVILLKTGSAYAAPSASEATMVDSIVKDQKRILPCALYLEGEYGASGICVGVPVKLGANGVEEIVDIKLQEEEKLLISISAKAVREMNKVLSVL;
SEQ ID NO:6:
MKVAVLGAAGGIGQALALLLKTQLPAGSELSLYDIAPVTPGVALDLSHIPTNVEVKGFSGEDATPALEGADVVLISAGVARKPGMDRSDLFNINAGIVRNLVEQIAKTFPKAIIGIITNPVNTTVAIAAEVLKKAGKYDKNKLFGVTTLDIIRSETFVAELKGKDPVEVDVPVIGGHSGVTILPLLSQVPGVSFTNQEVAALTKRIQNAGTEVVEAKAGGGSATLSMGQAAVRFGLSLVRGLQGENGVVECALVEGDGKHARFCAQPLLLGKNGVEERKSYGDLSAFEQQALEGMLATLKKDITTGE;
SEQ ID NO:7:
MKVAVLGAAGGIGQALALLLKTQLPAGSELSLYDIAPVTPGVAADLSHIPTNVFVKGFSGEDATPALEGADVVLISAGVARKPGMDRSDLFNVNAGIVKNLVEQIAKTFPKAIIGIITNPVNTTVAIAAEVLKKAGKYDKNKLFGVTTLDVIRSETFVAELKPKDPVEVDVPVIGGHSGVTILPLLSQVPGVSFTNQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAVRFGLSLVRALQGENGVVECALVEGDGKHARFCAQPLLLGKNGVEERKSYGDLSAFEQQALDGMLATLKKDITTME;
SEQ ID NO:8:
MKVAVLGAAGGIGQALALLLKTQLPSGSELTLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDASPALEGADVVVISAGVRRKPGMDRSDLAPVNFGIVENLTRQIAKVTPNAIVGIITNPVNSTVAVAAEVLKKEGVYDPKRLFGVTTLDIIRSNTFVAELKGKQPGEVEVPVIGGHSGETIIPLLSQVKGLTFSDEEIRDLTARIQNAGTEVVEAKAGGGSATLSMGQAAARFVLDVVAALEGEKNIIRDALVENDGSYARFFTAPCLLGTDGIEKVLSIGTLSAFEKAQLAASRPIMNAEIDKGFDYVNK;
SEQ ID NO:9:
MKVTVVGAGAVGATCAENIANKQIASEVVLLDIKEGFAEGKALDIMQTASLNGFDTKITGVTNDYSKTAGSDVVVITSGIPRKPGMTREELIGINAGIVKSVTENLLKLSPDRIIIVVSNPMDTMTYLAFKATGLPKNRIIGMGGALDSVRFRYFLSLALNVSASDLQAMVIGGHGDTTMIPLIRLATLNSIPVSKMLAGEELDEVAQDTMVGGATLTKLIGTSAWYAPGAAVATLVDSIVKDQKKIFPCSVYLEGEYGQKDICIGVPVILGANGVEKIVDIDLQDAEKAKLSKSADAVREMNKVLSV;
SEQ ID NO:10:
MVLKKILVGGAGNVGHTAANRAADERIGVVVLFDIVAGVPQGKELDIAESGPNEGFDRKTKGTNDYAGIAGSDVVIITAGIPRKPGMSRDDLLEINAKIVKSVVEGILKYSPDAIVIVVSNPLDVMVWVAQKFSGFPKNRVLGMAGVLDSSRFKYFEAEYLEVSMEDVLAFVLGGHGDTMVPLVRYDTVAGIPVTELLDSPEIAAIVERTRGGGAEIVTLLKTGSAYYAPSAAVAELVEAILPDTKKILPVAAHLAGEYGVSDMFVGVPVKLGSHGVEGIIEGKLTEAEDAAFQSSAESVDEGLAVLAAL;
SEQ ID NO:11:
MKVAVLGAAGGIGQALALLLKTRLPAGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFAGEDPTPALEGADVVLISAGVARKPGMDRSDLFNINAGIVKNLVEQNAKIFPKAIIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFIVTTLDVIRSETFVAELKGLDPAEVDVPVIGGHSGVTILPLLSQVPGVSFTNQEVAALTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVRALQGENGVVECALVEGDGKHARFGAQPLLLGKNGVEAVKSYGKLSAFEQQALEGMLATLKADIVLGEEFVKK;
SEQ ID NO:12:
MKVAVLGAAGGIGQALALLLKTQLPSGSELKLYDIAPVTPGVAVDLSHIPTAVRIEGFTGEDATPALEGADVVVISAGVRRKPGMDRSDLIPVNFGIVENLIKQIAETTPDAVILIITNPVNSTVAVAAEVLEKAGVYDPKRLFGVTTLDIIRSNTFVAELKGKQPGEVEVRVIGGHSGETIIPLLSQVEGVTFTEEEKKELTDRIQNAGTEVVEAKAGGGSATLSMGQAAARTVLAVVRALRGEKDVVLDLLVKGDGSYSEFFTAPCLLGKDGVEEILSIGELDEYEKELLESSLPYLNRLIAIGKDYVNN;
SEQ ID NO:13:
MKVAVLGAAGGIGQALALLLKTQLPSGSELSLYDIAPVTPGVAVDLSHIPTAVKIKGFSGEDATPALEGADVVLISAGVRRKPGMDRSDLFNVNAGIVKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIIRSNTFVAELKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSATLSMGQAAARFGLSLVRALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNALEGMLDTLKKDIALGQEFVNK。
Example 2
This example uses the malate dehydrogenase (MDH) variants obtained in Example 1 to demonstrate the effectiveness of the method of Example 1.
Plasmids containing the variants were constructed and synthesized by Beijing Qingke Biotechnology Co., Ltd.; wild-type malate dehydrogenase served as the comparative example.
1. Recombinant Escherichia coli culture and crude enzyme preparation
1. Transformation of the plasmid into E. coli BL21(DE3): on a clean bench, add 2 μL of plasmid (50 mg/L) to 100 μL of BL21(DE3) competent cell suspension. Mix by flicking the tube or by gentle pipetting, and place on ice for 30 min; heat-shock in a 42 °C water bath for 90 s, then quickly transfer to ice and cool for 5 min without shaking. On the clean bench, add 0.9 mL of LB liquid medium to the cell suspension, mix, and incubate with shaking at 37 °C and 150-225 rpm for 45 min so that the cells recover to a normal growth state. Centrifuge at 4000 rpm for 1 min, remove supernatant on the clean bench until 200 μL of culture remains, and resuspend by pipetting;
2. Plating: spread the 200 μL of culture on an LB plate containing kanamycin using a disposable spreader. Seal the plate with sealing film and leave it until the liquid is fully absorbed. Invert the plate and incubate at 37 °C for 12-24 h until transformants appear;
3. Primary seed culture: pick one single colony into 10 mL of LB liquid medium containing kanamycin at a final concentration of 50 μg/mL. Culture conditions: 37 °C, 200 rpm, 12-18 h;
4. Sequencing: send 2 mL of the primary seed culture for gene sequencing to check whether the target gene sequence of the MDH variant on the plasmid is correct;
5. Glycerol stock: on a clean bench, open a sterile cryovial, add 0.5 mL of primary seed culture with a pipette, then add 0.5 mL of sterilized 50% glycerol, mix with the pipette, and cap. Store at -80 °C;
6. Primary seed culture from the glycerol stock: once sequencing has confirmed the sequence, inoculate 10 μL of the MDH variant glycerol stock into 10 mL of LB liquid medium containing kanamycin at a final concentration of 50 μg/mL. Culture conditions: 37 °C, 200 rpm, 12-18 h;
7. Secondary seed culture: inoculate 10 mL of the MDH variant primary seed culture into 200 mL of LB liquid medium containing kanamycin at a final concentration of 50 μg/mL. Culture conditions: 37 °C, 180 rpm; sample during culture and measure OD600 until it reaches 0.6-0.8;
8. Induction: when the secondary seed culture is ready, add aqueous IPTG to it on a clean bench to a final IPTG concentration of 0.5 mM. Culture conditions: 25 °C, 180 rpm, 16-24 h;
9. Harvesting cells by low-temperature centrifugation: transfer the induced culture to 50 mL centrifuge tubes (40 mL per tube), centrifuge at 4 °C and 8000 rpm for 5 min, discard the supernatant, and keep the cell pellet;
10. Washing: add 5 mL of Tris-HCl buffer (50 mM, pH 6.8-7.2) to the pellet in each tube, resuspend by pipetting, pool the tubes into one, vortex, centrifuge at 8000 rpm for 5 min, discard the supernatant, and keep the pellet;
11. Sonication: add 15 mL of Tris-HCl buffer (50 mM, pH 6.8-7.2) to the washed pellet in a 50 mL centrifuge tube, resuspend by pipetting, and vortex. Set the sonicator to 230 W with cycles of 3 s on and 7 s off, for a total working time of 40 min. Keep the tube in an ice-water bath while the cells are being disrupted;
12. Removing debris by centrifugation: centrifuge the sonicated suspension at 4 °C and 10000 rpm for 40 min. Discard the pellet and keep the supernatant, which is the crude enzyme solution of the MDH variant.
2. Enzyme activity assay:
1. Preheat the UV spectrophotometer for 30 min in advance, set the wavelength to 340 nm, and zero the background absorbance with distilled water;
2. Keep distilled water at 37 °C in a water bath for later use;
Add, in order, 760 μL of distilled water, 10 μL of 0.8 mM aqueous NADH, 10 μL of 1.6 mM aqueous malic acid, and 20 μL of the crude enzyme solution to be tested to a 1 mL quartz cuvette, mix thoroughly by pipetting, and immediately record at 340 nm the initial absorbance A1 and the absorbance A2 after 1 min of reaction, keeping the reaction solution at 37 °C throughout;
3. Enzyme activity calculation: one unit (U) is the amount of enzyme that converts 1 μmol of substrate in 1 min under optimal conditions. Here, enzyme activity of the crude enzyme solution (U/mL) = ΔA × V(reaction solution) / (ε × l × t × V(enzyme solution)). The terms of the formula are annotated in Table 1.
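The activity formula above can be sketched in Python. Table 1 is not reproduced in this text, so the meaning of each term is an assumption here: ΔA is taken as |A1 − A2|, ε as the commonly used molar absorption coefficient of NADH at 340 nm (6.22 mM^-1 cm^-1), l as a 1 cm light path, and the volumes as the 0.8 mL reaction mixture and 0.02 mL of crude enzyme from the assay step. The function name and the absorbance readings in the usage line are hypothetical.

```python
# Assumed value: molar absorption coefficient of NADH at 340 nm, in mM^-1 cm^-1.
# Table 1 of the source is not available, so check this against the original.
NADH_EPSILON_MM = 6.22


def crude_enzyme_activity(a1, a2, v_reaction_ml=0.8, v_enzyme_ml=0.02,
                          epsilon_mm=NADH_EPSILON_MM, path_cm=1.0, t_min=1.0):
    """Return enzyme activity in U/mL of crude enzyme solution.

    delta_a / (epsilon * l) is the NADH concentration change in mM,
    i.e. umol/mL; multiplying by the reaction volume gives umol, and
    dividing by time and enzyme volume gives umol/min/mL = U/mL.
    """
    delta_a = abs(a1 - a2)
    return delta_a * v_reaction_ml / (epsilon_mm * path_cm * t_min * v_enzyme_ml)


# Hypothetical absorbance readings, for illustration only.
activity = crude_enzyme_activity(a1=0.900, a2=0.745)
print(f"{activity:.3f} U/mL")
```

With these defaults, a reading pair whose difference equals 0.622 corresponds to 4 U/mL, which is a convenient sanity check on the units.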
Both wild-type MDH and the MDH variants were assayed and their enzyme activities calculated as described above; the results are shown in Table 2:
Conclusion: as Table 2 shows, the enzyme activities of all 12 MDH variants are significantly higher than that of wild-type MDH;
3. Measurement of the kcat and KM values of wild-type MDH and the 12 MDH variants:
The method for measuring the kcat and KM values of MDH is described using wild-type MDH as an example:
The wild-type MDH crude enzyme solution was purified, and the concentration of the purified enzyme was measured by the Bradford method.
The reaction was designed so that wild-type MDH catalyzes the conversion of the substrates L-malate and NAD+ to oxaloacetate and NADH, and the concentration of NADH produced was measured by high-performance liquid chromatography (HPLC).
Reaction system: the total reaction volume was 10 mL, of which 1 mL was purified wild-type MDH enzyme solution. Nine L-malic acid concentrations [S] were set: 5 μM, 10 μM, 20 μM, 40 μM, 80 μM, 160 μM, 320 μM, 640 μM, and 1280 μM. L-malic acid was prepared as a 10 mM stock solution immediately before use, and the volume added was calculated from the desired concentration. NAD+ was set at 2 mM; it was likewise prepared as a 10 mM stock solution immediately before use, and the volume added was calculated from the desired concentration. The reaction was made up to 10 mL with ultrapure water.
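The loading volumes mentioned above follow from C_stock × V_added = C_final × V_total. A minimal sketch of that arithmetic, using only the concentrations and volumes stated in the reaction system; the function and dictionary key names are illustrative, not from the source.

```python
TOTAL_ML = 10.0        # total reaction volume
ENZYME_ML = 1.0        # purified wild-type MDH solution
STOCK_MM = 10.0        # both L-malic acid and NAD+ stocks are 10 mM
NAD_TARGET_MM = 2.0    # final NAD+ concentration
MALATE_TARGETS_UM = [5, 10, 20, 40, 80, 160, 320, 640, 1280]


def loading_volume_ml(target_mm, stock_mm=STOCK_MM, total_ml=TOTAL_ML):
    """Volume of stock needed so that the final concentration is target_mm."""
    return target_mm * total_ml / stock_mm


def reaction_recipe(malate_um):
    """Return the volume (mL) of each component for one L-malic acid level."""
    malate_ml = loading_volume_ml(malate_um / 1000.0)  # convert uM to mM
    nad_ml = loading_volume_ml(NAD_TARGET_MM)
    water_ml = TOTAL_ML - ENZYME_ML - nad_ml - malate_ml
    return {"malate_ml": malate_ml, "nad_ml": nad_ml,
            "enzyme_ml": ENZYME_ML, "water_ml": water_ml}


for conc in MALATE_TARGETS_UM:
    r = reaction_recipe(conc)
    print(f"[S] = {conc:>4} uM: malate {r['malate_ml'] * 1000:7.1f} uL, "
          f"NAD+ {r['nad_ml']:.1f} mL, water {r['water_ml']:.3f} mL")
```

At the lowest level the stock volume is only 5 μL, so in practice an intermediate dilution of the 10 mM stock may be easier to pipette accurately; the source does not say whether one was used.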
Reaction sampling and detection: the reaction was run at 37 °C for 1 min, after which 1 mL of reaction solution was taken and heat-inactivated in an 80 °C water bath for 1 min. The inactivated sample was diluted to a suitable concentration, the NADH absorption peak at 340 nm was detected by HPLC, and the actual NADH concentration in the reaction solution was calculated from a standard curve prepared with an NADH standard. The reaction rate v was calculated from the concentration of NADH formed (NADH is formed in equimolar amounts with oxaloacetate).
As shown in FIG. 2, the Lineweaver-Burk equation for wild-type MDH under the experimental conditions was fitted linearly by the double-reciprocal plot method: with the reciprocal of the initial L-malic acid concentration, 1/[S], as the abscissa and the reciprocal of the reaction rate measured at each concentration, 1/v, as the ordinate, a scatter plot was made in Excel and the corresponding linear equation y = kx + b was calculated, where k is KM/Vmax and b is 1/Vmax in the Lineweaver-Burk equation. The values of the variables are shown in Table 3.
From the fitted equation, Vmax = 1/105.3 = 9.50×10^-3 (mol/min) and KM = 0.0126 × Vmax = 1.20×10^-4 (mol/L).
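The double-reciprocal fit can also be reproduced outside Excel. The sketch below is an illustration under stated assumptions: the measured rates of Table 3 are not available in this text, so the rates are synthetic, noise-free values generated from the Michaelis-Menten equation with the Vmax and KM reported above, and the ordinary least-squares routine stands in for Excel's trendline. Function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x


def lineweaver_burk(substrate, rates):
    """Fit 1/v against 1/[S]; slope k = KM/Vmax, intercept b = 1/Vmax."""
    k, b = linear_fit([1.0 / s for s in substrate], [1.0 / v for v in rates])
    vmax = 1.0 / b
    km = k * vmax
    return vmax, km


# The nine L-malic acid concentrations from the reaction system, in mol/L.
S = [c * 1e-6 for c in (5, 10, 20, 40, 80, 160, 320, 640, 1280)]

# Synthetic rates (NOT measured data), generated from the reported parameters.
TRUE_VMAX, TRUE_KM = 9.50e-3, 1.20e-4
rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in S]

vmax, km = lineweaver_burk(S, rates)
print(f"Vmax = {vmax:.3e}, KM = {km:.3e}")
```

Because the double-reciprocal transform weights low-concentration points heavily, real measurements with noise will not recover the parameters this cleanly; a direct nonlinear fit is a common alternative, though the source uses the linear plot.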
By definition, kcat = Vmax / (molar amount of enzyme in the reaction). Vmax = 9.50×10^-3 mol/min = 1.58×10^-4 mol/s. The concentration of the purified wild-type MDH stock solution measured by the Bradford method was 1.01 mg/mL, 1 mL was used in the reaction, and the molecular weight of wild-type MDH is 34458.63 Da, so kcat = Vmax / (1.01 mg / 34458.63 g·mol^-1) = 5387 s^-1.
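The kcat arithmetic can be checked with a few lines of Python. This is a sketch using only the numbers stated above (intercept 105.3, 1.01 mg/mL × 1 mL of enzyme, 34458.63 Da); the function name is illustrative. Computed from these rounded inputs, the result comes out at roughly 5.4×10^3 s^-1, close to the 5387 s^-1 reported; the small gap presumably reflects rounding of the fitted values in the source.

```python
def kcat_per_second(vmax_mol_per_min, enzyme_mg, mol_weight_g_per_mol):
    """kcat = Vmax (mol/s) divided by the molar amount of enzyme (mol)."""
    enzyme_mol = enzyme_mg * 1e-3 / mol_weight_g_per_mol  # mg -> g -> mol
    vmax_mol_per_s = vmax_mol_per_min / 60.0
    return vmax_mol_per_s / enzyme_mol


vmax = 1.0 / 105.3          # from the fitted intercept b = 1/Vmax
enzyme_mg = 1.01 * 1.0      # 1.01 mg/mL stock, 1 mL used in the reaction
kcat = kcat_per_second(vmax, enzyme_mg, 34458.63)
print(f"kcat = {kcat:.0f} s^-1")
```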
The kcat and KM values of the other MDH variants were measured in the same way as for wild-type MDH. The results are shown in Table 4:
Example 2 shows that the malate dehydrogenase variants obtained with the model and method constructed in Example 1 have improved catalytic efficiency toward malate and NAD+; in particular, the enzyme activity of the variants is 1-5 times that of the wild type. That is, the model and method constructed in Example 1 can generate high-quality protein sequences with reasonable structure and corresponding practical function, and therefore have considerable application potential.

Claims (10)

1. A construction of an amino acid sequence generation model for generating an amino acid sequence of a protein of interest, comprising:
constructing a data set for generation: collecting from a public protein database all actually existing amino acid sequences corresponding to the protein of interest, preprocessing them, clustering them based on their percent identity, and randomly selecting, from among all clusters containing 5 or fewer sequences, a number of clusters to serve as an evaluation set, wherein the randomly selected clusters account for 20% or less of the total number of clusters obtained by clustering, and the remaining sequences are merged into a training data set;
constructing a network model structure: constructing a generating network and a discrimination network to form a preliminary TPGAN model;
model training and evaluation: inputting the training data set into the preliminary model, simultaneously optimizing and iteratively training the generating network and the discrimination network using a back-propagation algorithm, and adjusting the preliminary model using the evaluation data set to avoid overfitting, so as to obtain an adjusted model;
obtaining a generation model: verifying the adjusted model to obtain a generation model that can be used to generate amino acid sequences of the protein of interest.
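The evaluation/training split described in claim 1 can be sketched as follows. This is a minimal illustration, assuming clustering has already been done by an external tool and that each cluster is available as a list of sequences; the function name, the fixed seed, and the toy clusters are hypothetical.

```python
import random


def split_clusters(clusters, max_cluster_size=5, eval_fraction=0.2, seed=0):
    """Pick whole small clusters for evaluation; merge the rest for training.

    Only clusters with at most max_cluster_size sequences are eligible, and
    at most eval_fraction of all clusters are selected.
    """
    rng = random.Random(seed)
    eligible = [i for i, c in enumerate(clusters) if len(c) <= max_cluster_size]
    n_eval = min(len(eligible), int(eval_fraction * len(clusters)))
    eval_ids = set(rng.sample(eligible, n_eval))
    evaluation = [s for i in sorted(eval_ids) for s in clusters[i]]
    training = [s for i, c in enumerate(clusters) if i not in eval_ids for s in c]
    return training, evaluation


# Toy clusters (not real MDH sequences), for illustration only.
toy_clusters = [
    ["AAA", "AAC"],
    ["GGG"],
    ["KKK", "KKR", "KKH", "KKE", "KKD", "KKQ"],  # 6 members: never evaluated
    ["LLL"],
    ["MMM", "MMV"],
]
train, evaluation = split_clusters(toy_clusters)
print(len(train), "training sequences;", len(evaluation), "evaluation sequences")
```

Holding out whole clusters rather than individual sequences keeps near-identical sequences from appearing on both sides of the split, which is the usual motivation for identity-based clustering before evaluation.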
2. The construction of claim 1, wherein:
wherein the preprocessing comprises de-duplication and de-noising, and discarding sequences longer than 500 amino acids.
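A minimal sketch of the preprocessing in claim 2. The claim does not define "de-noising"; treating it as the removal of sequences containing characters outside the 20 standard amino acids is an assumption made here, as are the function name and the toy inputs.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def preprocess(sequences, max_length=500):
    """De-duplicate, drop over-long sequences, and drop 'noisy' sequences."""
    seen = set()
    kept = []
    for raw in sequences:
        seq = "".join(raw.split()).upper()  # strip whitespace, normalize case
        if not seq or seq in seen:
            continue                        # empty or duplicate
        if len(seq) > max_length:
            continue                        # longer than 500 residues
        if not set(seq) <= STANDARD_AA:
            continue                        # assumed meaning of de-noising
        seen.add(seq)
        kept.append(seq)
    return kept


# Toy inputs: a duplicate, a non-standard residue, and an over-long sequence.
print(preprocess(["MKV", "mkv ", "MKX", "A" * 501, "MKVA"]))
```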
3. Construction according to claim 1 or 2, characterized in that:
wherein the generating network construction comprises a self-encoder (autoencoder) construction and a generator construction,
the self-encoder is constructed as follows: an encoder and a decoder are built from Transformer modules, each with four network layers, with a multi-head attention mechanism applied in between;
the generator is a neural network built from three fully connected layers; it takes as input noise following a Gaussian distribution, transforms it through several hidden layers, using a KL-divergence loss, into a vector following a normal distribution, and passes the vector to the decoder, which decodes it into the probability of each amino acid at each position; these probabilities are finally converted into a new amino acid sequence.
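The last step of claim 3, turning per-position amino acid probabilities into a sequence, can be illustrated without the network itself. The claim does not say how the conversion is done; the sketch below shows two common options, taking the most probable residue or sampling from the distribution, and both are assumptions. The alphabet order and the toy probability rows are also hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue order of the decoder output


def decode_argmax(probs):
    """Pick the most probable amino acid at each position."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in probs
    )


def decode_sample(probs, rng):
    """Sample one amino acid per position according to its probability."""
    return "".join(rng.choices(ALPHABET, weights=row, k=1)[0] for row in probs)


def peaked_row(residue, peak=0.9):
    """Toy probability row with most of the mass on one residue."""
    rest = (1.0 - peak) / (len(ALPHABET) - 1)
    return [peak if aa == residue else rest for aa in ALPHABET]


toy_probs = [peaked_row(aa) for aa in "MKV"]
print(decode_argmax(toy_probs))
print(decode_sample(toy_probs, random.Random(0)))
```

Argmax decoding is deterministic for a given latent vector, whereas sampling yields more diverse sequences from the same probabilities; which of the two the source uses is not stated.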
4. A construction according to claim 3, wherein:
wherein the discrimination network judges whether an amino acid sequence generated by the generating network is reasonable; preferably, the discrimination network is a 3-layer MLP model.
5. The construction of claim 4, wherein:
the discrimination network receives the training data set and the amino acid sequences generated by the generating network and, using binary cross-entropy as the loss function and computing through several hidden layers, learns the difference between the real amino acid sequences in the training data set and the generated amino acid sequences, so as to judge whether a received amino acid sequence is a real one.
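The binary cross-entropy loss named in claim 5 can be written out directly. This is a generic sketch of the standard formula, not the source's implementation; the label convention (1 for real, 0 for generated), the clipping constant, and the toy discriminator outputs are assumptions.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples.

    predictions: discriminator outputs in (0, 1), read as P(sequence is real)
    labels: 1 for real training sequences, 0 for generated sequences
    """
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)


labels = [1, 1, 0, 0]                      # two real, two generated (toy)
confident = [0.95, 0.90, 0.05, 0.10]       # hypothetical good discriminator
undecided = [0.50, 0.50, 0.50, 0.50]       # hypothetical untrained one
print(binary_cross_entropy(confident, labels))
print(binary_cross_entropy(undecided, labels))
```

An undecided discriminator that outputs 0.5 everywhere has a loss of ln 2, which is a useful reference point when monitoring adversarial training.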
6. The construction according to claim 5, wherein:
wherein several loss functions are optimized simultaneously during training, and the corresponding hyperparameters are adjusted,
preferably, the hyperparameters are set to a learning rate of 1e-4, a dropout rate of 0.1, and a batch size of 8.
7. A construction according to claim 3, wherein:
the adjusted model is verified as follows: after each round of training, the amino acid sequences generated by the adjusted model are aligned against the protein database using BLAST software, and the generation model is obtained when the alignment result has improved 3-fold relative to the alignment result obtained at initial training.
8. The construction of claim 1, wherein:
wherein the protein of interest is a variant of malate dehydrogenase.
9. A method for obtaining a protein variant, comprising:
randomly generating a number of amino acid sequences using the generation model of any one of claims 1-8;
calculating, using BLAST software, a similarity score between each generated amino acid sequence and the sequences of the protein database, and selecting the amino acid sequences with the top 100 BLAST scores;
predicting the three-dimensional structure of each selected amino acid sequence using AlphaFold2 to obtain a pLDDT score, and retaining the amino acid sequences with pLDDT > 90;
comparing the three-dimensional structure corresponding to each retained amino acid sequence with the wild-type crystal structure to obtain the structural RMSD, and analyzing the conserved sites of the wild type;
selecting the amino acid sequences in which all conserved sites are retained and RMSD < 2.0 to obtain the corresponding protein variants, and testing their function against the wild type to select the desired protein variants.
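The screening cascade of claim 9 can be sketched as a chain of filters. The sketch assumes the BLAST score, pLDDT, RMSD, and conserved-site check have already been computed by the external tools and stored per candidate; the `Candidate` record, its field names, and the toy values are hypothetical, and only the thresholds (top 100, pLDDT > 90, RMSD < 2.0) come from the claim.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    blast_score: float          # similarity score from BLAST
    plddt: float                # confidence score from AlphaFold2
    rmsd: float                 # structural RMSD vs. wild-type crystal structure
    conserved_sites_kept: bool  # True if all wild-type conserved sites remain


def screen(candidates, top_n=100, min_plddt=90.0, max_rmsd=2.0):
    """Apply the three filters in the order given in the claim."""
    top = sorted(candidates, key=lambda c: c.blast_score, reverse=True)[:top_n]
    confident = [c for c in top if c.plddt > min_plddt]
    return [c for c in confident if c.conserved_sites_kept and c.rmsd < max_rmsd]


# Toy candidates with made-up values, for illustration only.
toy = [
    Candidate("v1", 520.0, 93.1, 1.2, True),
    Candidate("v2", 510.0, 88.0, 0.9, True),   # fails pLDDT
    Candidate("v3", 505.0, 95.4, 2.6, True),   # fails RMSD
    Candidate("v4", 500.0, 94.0, 1.1, False),  # lost a conserved site
    Candidate("v5", 100.0, 96.0, 0.8, True),   # outside the top_n used below
]
print([c.name for c in screen(toy, top_n=4)])
```

The survivors of this computational screen are then the sequences synthesized and compared functionally with the wild type, as in Example 2.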
10. The obtaining method according to claim 9, characterized in that:
wherein, when the protein variant is a malate dehydrogenase variant, preferably, a malate dehydrogenase variant whose enzyme activity is at least 1-fold that of the wild type is the desired malate dehydrogenase variant.
CN202310832292.4A 2023-07-07 2023-07-07 Construction of amino acid sequence generation model and method for obtaining protein variant Pending CN117275582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310832292.4A CN117275582A (en) 2023-07-07 2023-07-07 Construction of amino acid sequence generation model and method for obtaining protein variant

Publications (1)

Publication Number Publication Date
CN117275582A true CN117275582A (en) 2023-12-22

Family

ID=89201572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310832292.4A Pending CN117275582A (en) 2023-07-07 2023-07-07 Construction of amino acid sequence generation model and method for obtaining protein variant

Country Status (1)

Country Link
CN (1) CN117275582A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733444A (en) * 2020-12-30 2021-04-30 浙江大学 Multistep long time sequence prediction method based on CycleGAN neural network
CN114303201A (en) * 2019-05-19 2022-04-08 贾斯特-埃沃泰克生物制品有限公司 Generation of protein sequences using machine learning techniques
CN115620831A (en) * 2022-10-09 2023-01-17 深圳瑞德林生物技术有限公司 Method for generating sequence mutation fitness through loop iteration optimization and related device
CN116230074A (en) * 2022-12-14 2023-06-06 粤港澳大湾区数字经济研究院(福田) Protein structure prediction method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination