WO2024108567A1 - Automated iterative optimization method for protein multi-site combinatorial mutation and application thereof - Google Patents

Automated iterative optimization method for protein multi-site combinatorial mutation and application thereof

Info

Publication number
WO2024108567A1
WO2024108567A1 PCT/CN2022/134414 CN2022134414W WO2024108567A1 WO 2024108567 A1 WO2024108567 A1 WO 2024108567A1 CN 2022134414 W CN2022134414 W CN 2022134414W WO 2024108567 A1 WO2024108567 A1 WO 2024108567A1
Authority
WO
WIPO (PCT)
Prior art keywords
plasmid
site
amino acid
mutation
protein
Prior art date
Application number
PCT/CN2022/134414
Other languages
English (en)
French (fr)
Inventor
司同
付立豪
张建志
陈永灿
郭二鹏
谢文豪
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/134414
Publication of WO2024108567A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66: General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P: FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00: Preparation of compounds containing saccharide radicals
    • C12P19/26: Preparation of nitrogen-containing carbohydrates
    • C12P19/28: N-glycosides
    • C12P19/30: Nucleotides
    • C12P19/34: Polynucleotides, e.g. nucleic acids, oligoribonucleotides
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00: Libraries per se, e.g. arrays, mixtures
    • C40B40/04: Libraries containing only organic compounds
    • C40B40/06: Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • the present invention relates to an automated iterative optimization method for protein multi-site combined mutation and related applications.
  • Protein engineering is one of the important research directions in the field of synthetic biology.
  • human beings still have limited understanding of basic biological problems such as protein folding and the natural evolution mechanism of enzymes. Therefore, it is still a difficult problem to design protein functions from scratch (de novo design) based on rational design methods.
  • Directed evolution can effectively optimize the function of proteins without relying on structural and mechanistic information by simulating the principles of natural evolution in the laboratory.
  • directed evolution is highly dependent on high-throughput screening methods, which also limits its ability to transform proteins that lack high-throughput screening methods.
  • One object of the present invention is to provide an improved method for engineering protein multi-site combined mutations.
  • Another object of the present invention is to provide related applications of an improved protein multi-site combined mutation engineering method.
  • the present invention provides a method for engineering protein multi-site combined mutation, the method comprising:
  • according to N preset mutation sites in the amino acid sequence of the protein to be modified, the amino acid sequence is divided into M segments, each containing at least one preset mutation site, and a group of plasmid elements is constructed for each segment, thereby forming a plasmid element library containing M groups of plasmid elements; wherein N is an integer greater than or equal to 2, M is an integer greater than or equal to 2, and N ≥ M; each group of plasmid elements includes a plasmid element containing a gene encoding the unmutated amino acid sequence and a plasmid element containing a gene encoding an amino acid sequence with a site-directed mutation;
  • according to a preset multi-site combination mutation sequence, the corresponding plasmid element is selected from each group of plasmid elements, the selected elements are assembled into a multi-site combination mutation plasmid, and the protein mutant is then expressed for detection.
  • the engineering modification method of protein multi-site combined mutation provided by the present invention can quickly construct multi-site combined mutations and then perform function-sequence relationship testing.
  • each sequence contains a preset mutation site.
  • each group of plasmid elements includes: a plasmid element containing a coding gene of an unmutated amino acid sequence, and 1-19 plasmid elements containing a coding gene of a site-directed mutated amino acid sequence.
  • the 1-19 elements correspond to the amino acids, among the 20 common amino acids, other than the unmutated one.
  • each group of plasmid elements includes 20 plasmid elements, corresponding to the 20 amino acids at the preset mutation site.
  • the 20 amino acids, namely glycine, alanine, valine, leucine, isoleucine, methionine, proline, tryptophan, serine, tyrosine, cysteine, phenylalanine, asparagine, glutamine, threonine, aspartic acid, glutamic acid, lysine, arginine and histidine, are the main building blocks of proteins in living organisms.
  • the construction of plasmid elements and the construction of multi-site combination mutation plasmids are independently completed using automated functional islands.
  • the automated functional island may be an existing automated functional island in the art, or it may be commercially available instruments and equipment assembled according to the process requirements of the present invention.
  • it may be a functional island for automated protein engineering described in PCT/CN2021/133816.
  • all the contents recorded in PCT/CN2021/133816 are cited herein.
  • the use of automated functional islands to complete the construction of plasmid elements and the construction of multi-site combination mutation plasmids can quickly provide standard and reliable data, accelerate protein iterative optimization performance, and improve protein engineering capabilities.
  • the process of constructing a plasmid element library comprises:
  • a plasmid element containing the corresponding coding gene is constructed, and a total of M plasmid elements containing the coding gene of the unmutated amino acid sequence are constructed;
  • mutant bases were introduced into the amplification primers for PCR amplification, and the original sequence plasmid element was site-directedly mutated into other amino acid coding sequences, thereby constructing plasmid elements containing genes encoding site-directed mutated amino acid sequences.
  • the process of constructing a plasmid element containing a coding gene of an unmutated amino acid sequence is to construct a plasmid element using a Golden Gate assembly strategy.
  • it may include: preparing the PCR reaction mix on an automated pipetting workstation, dispensing it into a PCR plate, and transferring the plate with a robotic gripper to an automated PCR instrument to run the amplification program. After the program finishes, the plate is moved to an automated nucleic acid extractor to run a DNA purification program; the product is sent to the automated pipetting workstation for quantification and normalization, and the Golden Gate assembly mix is prepared and dispensed into a PCR plate.
  • the 96-well plate containing the assembly mix is sent to the automated PCR instrument to run the assembly program. After the program finishes, the plate is sent to the automated pipetting workstation for transformation of host cells (e.g., Escherichia coli DH5α competent cells); after overnight culture in an incubator, clones are picked, transferred to liquid culture medium, and sent for sequencing.
  • the process of constructing a plasmid element containing a gene encoding a site-directed mutated amino acid sequence includes: using the constructed plasmid element containing a gene encoding the unmutated amino acid sequence as a template, introducing mutated bases in the amplification primers for PCR amplification, amplifying each plasmid template with 2 pairs of primers, and then assembling the fragments into a new plasmid by Gibson assembly.
  • the automated PCR, automated nucleic acid purification, automated assembly, automated transformation and sequencing steps are the same as for the construction of the plasmid element containing a gene encoding the unmutated amino acid sequence.
  • the preset multi-site combination mutation sequence is a multi-site combination mutation sequence recommended by a machine learning algorithm.
  • the machine learning algorithm can be any feasible machine learning algorithm in the art, which performs model prediction (including simulation prediction of its spatial structure, function, etc.) on protein mutants in advance to provide a recommended multi-site combination mutation sequence.
  • the machine learning algorithm can be the machine learning algorithm recorded in CN115249514A, or an algorithm of a protein engineering method guided by other machine learning. In the present invention, all the contents recorded in CN115249514A are cited here.
  • the process of constructing a multi-site combinatorial mutation plasmid includes: arranging the M groups of plasmid elements in the plasmid element library in sequence into a well plate, and the automated pipetting workstation selects the plasmid elements at the corresponding positions into a new well plate according to the preset multi-site combinatorial mutation sequence, and adds other assembly components in the assembly system, and runs the assembly program using an automated PCR instrument.
  • the multi-site combination mutation plasmid is further transformed into a host cell (e.g., Escherichia coli BL21 (DE3) competent cells), cultured, and clones are picked and transferred to the culture medium to continue culturing and expressing protein mutants for detection.
  • the detection includes: detecting the sequence-function relationship of protein mutants.
  • metabolites are detected by high-throughput MALDI mass spectrometry, and the metabolites include protein mutants, so as to obtain sequence-function relationships.
  • the detection results can be further used as new inputs to the machine learning algorithm to perform continuous improvements in the next round of protein mutant model prediction and sequence design, thereby achieving iterative optimization of protein engineering.
  • the engineering method of protein multi-site combined mutation of the present invention is used for automated iterative optimization of protein multi-site combined mutation. That is, on the other hand, the present invention also provides the application of the method in automated iterative optimization of protein multi-site combined mutation.
  • the protein to be modified applicable to the present invention may include 3-20, more preferably 4-15, and even more preferably 4-10 preset mutation sites.
  • the protein to be modified is rhamnosyl acyltransferase (RhlA).
  • the preset mutation sites of rhamnosyl acyltransferase include one or more of Arg74, Ala101, Leu148, and Ser173.
  • the assembly of the combinatorial mutant plasmids is completed in rounds.
  • each round completes the construction of 48-768, preferably 96-384, combinatorial mutant plasmids.
  • 384 combined mutations are predicted in each round, and better combined mutations can be screened after 4-5 rounds of iterations.
  • a round of construction and testing of 384 combined mutant strains usually only takes 3-5 days, and the machine learning algorithm can screen the best combined mutations within 4-5 rounds.
  • the method of the present invention can complete a protein engineering transformation goal within half a month to one month.
  • the present invention provides an automated iterative optimization method for protein combinatorial mutation and its application, in which the gene is split into segments by PCR according to the sites to be mutated, each segment is cloned into a backbone plasmid, site-directed mutagenesis primers are designed to mutate each target amino acid site into the other 19 amino acids, and a plasmid element library is constructed.
  • according to the mutation sequences generated by the machine learning algorithm, suitable elements are selected from the plasmid element library and assembled into the backbone plasmid, which is transformed into Escherichia coli or another chassis for gene expression; metabolites are detected by high-throughput MALDI mass spectrometry to obtain sequence-function relationships.
  • test results can be further used as new inputs to the algorithm to continuously improve model prediction and sequence design.
  • the automated combined mutation construction test can quickly provide standard and reliable data for the machine learning algorithm, accelerate the algorithm iterative optimization performance, and improve the protein engineering capability.
  • the present invention can quickly construct the mutation sequence recommended by the algorithm, thereby performing iterative optimization in a shorter time and better guiding protein engineering.
  • FIG. 1 is a schematic diagram of the technical route of protein multi-site mutation engineering modification of the present invention.
  • FIG. 2 is a schematic diagram of the automated process for constructing the lv0 plasmid element of the present invention.
  • FIG. 3 is a schematic diagram of the process flow of the lv0 plasmid element library construction scheme of the present invention.
  • FIG. 4 is a schematic diagram of the automated process of constructing combined mutations of the present invention.
  • FIG. 5 is a schematic diagram of the automated process of combined mutant detection according to the present invention.
  • each raw reagent material is commercially available, and the experimental method without specifying specific conditions is a conventional method and conventional conditions well known in the art, or according to the conditions recommended by the instrument manufacturer.
  • Embodiment 1:
  • The technical route of the protein multi-site mutation engineering modification of the present invention is shown in FIG. 1.
  • This embodiment takes rhamnosyl acyltransferase (RhlA) as an example to study the effect of combined mutations at four sites (Arg74, Ala101, Leu148, Ser173) on substrate selectivity.
  • the rhla gene was synthesized into four segments, each of which contained bases 1-264bp, 265-387bp, 388-483bp and 484-888bp of the rhla gene nucleotide sequence, which encoded amino acids 1-88aa, 89-129aa, 130-161aa and 162-296aa of the RhlA amino acid sequence, respectively.
  • the lv0 plasmid elements were constructed using the Golden Gate assembly strategy (the automated process for constructing the lv0 plasmid elements is shown in FIG. 2); the primers (Table 1) introduce recognition sites for the endonucleases BsmBI and BsaI together with their overhang interfaces.
  • the specific process includes:
  • the PCR reaction was set up on the automated pipetting workstation: 25 μL PCR mix, 1 μL template plasmid (containing the rhla gene), 2 μL upstream primer, 2 μL downstream primer, and 20 μL deionized water were added to a 96-well PCR plate, which the robotic gripper transferred to the automated PCR instrument to run the amplification program;
  • after the program was completed, the plate was moved to the automated nucleic acid extractor to run the DNA purification program.
  • the product was sent to the automated pipetting workstation for quantification and normalization, and the Golden Gate assembly reaction was set up.
  • the total reaction contained 2 μL NEB Golden Gate Enzyme Mix (BsmBI-v2) (NEB product number: E1602), 2 μL T4 DNA Ligase Buffer (10X), and 75 ng each of the lv0-ccdb backbone plasmid and the purified DNA fragment, made up to 20 μL with deionized water;
  • the 96-well plate containing the assembly system was sent to the automated PCR instrument to run the assembly program.
  • the program was set according to the product manual. After the program was completed, the plate was sent to the automated pipetting workstation for transformation of E. coli DH5α competent cells. After overnight culture in a 37 °C incubator, two clones were picked, transferred to liquid culture medium, and sent for sequencing.
  • Four lv0 plasmid elements were obtained and named lv0-s1, lv0-s2, lv0-s3 and lv0-s4.
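  • Table 1. Primers for constructing the lv0 plasmid elements (primer name, sequence 5' to 3')
  • Rhla-GG-S1-F CGTCTCATCGGGGTCTCAttggatgcggcgcgaaagt (SEQ ID NO:1)
  • Rhla-GG-S1-R CGTCTCAGGTCGGTCTCAgccaggaggatttccacct (SEQ ID NO:2)
  • Rhla-GG-S2-F CGTCTCATCGGGGTCTCAtggcgctgatcgagcgctt (SEQ ID NO:3)
  • Rhla-GG-S2-R CGTCTCAGGTCGGTCTCAggggcgaatgccatcacca (SEQ ID NO:4)
  • Rhla-GG-S3-F CGTCTCATCGGGGTCTCAcccctggactgaaccaggc (SEQ ID NO:5)
  • Rhla-GG-S3-R CGTCTCAGGTCGGTCTCAggtctcgttgagcagatgg (SEQ ID NO:6)
  • Rhla-GG-S4-F CGTCTCATCGGGGTCTCAgaccgtcggcaaatacctg (SEQ ID NO:7)
  • Rhla-GG-S4-R CGTCTCAGGTCGGTCTCAtcaggcgtagccgatggcc (SEQ ID NO:8)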
  • Primers were designed using the lv0 plasmid element constructed above as a template (Table 2), and mutant bases were introduced into the primers to mutate specific sites of the lv0 plasmid element into other 19 common amino acid sequences to form a plasmid element library (see Figure 3 for the construction scheme of the lv0 plasmid element library).
  • the specific process includes:
  • Each plasmid template was PCR amplified using 2 pairs of primers and then assembled into a new plasmid using Gibson assembly.
  • the automated PCR, automated nucleic acid purification, automated assembly, automated transformation and sequencing steps were the same as for the construction of the lv0 plasmid elements.
  • the automated assembly reaction was: 10 μL Gibson Assembly Master Mix (2X) (NEB catalog number: E5510), 2.5 μL PCR fragment 1, 2.5 μL PCR fragment 2, and 5 μL deionized water; the assembly reaction conditions were set according to the product manual.
  • the lv0-s1, lv0-s2, lv0-s3 and lv0-s4 plasmid elements were each mutated into 19 plasmids, giving a plasmid element library of 80 lv0 plasmid elements in total (lv0-074 (20), lv0-101 (20), lv0-148 (20), lv0-173 (20)).
  • Rhla-SDM-A101L-1-F aggtcaatcacctggtAtccCTGtcctggg
  • Rhla-SDM-A101M-1-F aggtcaatcacctggtAtccATGtcctggg
  • Rhla-SDM-A101N-1-F aggtcaatcacctggtAtccAATtcctggg
  • Rhla-SDM-A101P-1-F aggtcaatcacctggtAtccCCGtcctggg
  • Rhla-SDM-A101Q-1-F aggtcaatcacctggtAtccCAGtcctggg
  • Rhla-SDM-A101R-1-F aggtcaatcacctggtAtccCGGtcctggg
  • Rhla-SDM-S173I-1-F tgccgcagcgcctgaaagccATTaaccatc (SEQ ID NO:73)
  • Rhla-SDM-S173K-1-F tgccgcagcgcctgaaagccAAGaaccatc (SEQ ID NO:74)
  • Rhla-SDM-S173L-1-F tgccgcagcgcctgaaagccCTGaaccatc (SEQ ID NO:75)
  • Rhla-SDM-S173M-1-F tgccgcagcgcctgaaagccATGaaccatc (SEQ ID NO:76)
  • Rhla-SDM-S173N-1-F tgccgcagcgcctgaaagccAATaaccatc (SEQ ID NO:77)
  • Rhla-SDM-S173P-1-F tgccgca
  • the four required plasmid elements were selected from the lv0-074 (20), lv0-101 (20), lv0-148 (20), and lv0-173 (20) element libraries, and constructed into the lv1-rhlb-ccdb backbone plasmid using the Golden Gate assembly method (see Figure 4 for the automated process of combined mutation construction).
  • the specific process includes:
  • plasmid components were arranged in order in a 96-well plate.
  • the automated pipetting workstation sucked the plasmid components at the corresponding position into a new 96-well PCR plate according to the sequence list, and other assembly components were added at the same time.
  • the assembly reaction was: 2 μL NEB Golden Gate Enzyme Mix (BsaI-v2) (NEB catalog number: E1601), 2 μL T4 DNA Ligase Buffer (10X), and 75 ng each of the lv1-rhlb-ccdb backbone plasmid and the 4 selected lv0 plasmid elements, made up to 20 μL with deionized water.
  • the robotic gripper transferred the plate to the automated PCR instrument to run the assembly program, which was set according to the product manual. After the program was completed, the plate was sent to the automated pipetting workstation for transformation of Escherichia coli BL21 (DE3) competent cells. After overnight culture in a 37 °C incubator, two clones were picked, transferred to liquid culture medium, and sent for sequencing. Assembling 96 combined mutation plasmids usually takes only 1.5 hours, and 384-768 combined mutation plasmids can be constructed in one day.
  • the algorithm used in this embodiment comes from the machine learning-guided biological sequence engineering method and device disclosed in CN115249514A. Each round predicts 384 combined mutations, and the best combined mutation can be screened after 4-5 rounds of iterations (see Figure 5 for the automated process of combined mutant detection). Algorithms with similar functions can be configured in the present invention, not limited to CN115249514A.
  • the specific process includes:
  • the combined mutant strains constructed by the above method are cultured and fermented at high throughput in 96-well deep-well plates for 24 hours using an automated pipetting workstation and an automated culture instrument. Ethyl acetate is then added by the automated pipetting workstation, and the upper organic phase is spotted onto a MALDI mass spectrometry target plate for measurement. The peak height at the mass-to-charge ratio (m/z) of the specific metabolite is used to quantify the performance of each mutant strain; the quantified sequence-function data are fed back into the machine learning algorithm, and the next round of combined mutation optimization is carried out. Usually, one round of constructing and testing 384 combined mutant strains takes only 3-5 days, the machine learning algorithm can identify the optimal combined mutation within 4-5 rounds, and the method of the present invention can complete a protein engineering goal in half a month to one month.
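The peak-height quantification described in the last step can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the ±0.5 m/z tolerance window, the target m/z of 500.0 and the toy spectra are all hypothetical and do not come from the patent, which only states that the peak height at the metabolite's m/z is used.

```python
def peak_height(mz_values, intensities, target_mz, tolerance=0.5):
    """Return the highest intensity within +/- tolerance of target_mz.

    Returns 0.0 when no data point falls inside the window.
    The tolerance value is an assumption, not taken from the patent.
    """
    window = [inten for mz, inten in zip(mz_values, intensities)
              if abs(mz - target_mz) <= tolerance]
    return max(window, default=0.0)


def score_variants(spectra, target_mz, tolerance=0.5):
    """Map each variant name to its peak height at target_mz.

    spectra: dict of variant name -> (list of m/z, list of intensities).
    """
    return {name: peak_height(mz, inten, target_mz, tolerance)
            for name, (mz, inten) in spectra.items()}


# Toy spectra for two hypothetical variants (residues at sites 74/101/148/173).
SPECTRA = {
    "RALS": ([300.0, 500.1, 500.3, 700.0], [5.0, 40.0, 55.0, 9.0]),
    "AALS": ([300.0, 650.0], [3.0, 8.0]),
}
SCORES = score_variants(SPECTRA, target_mz=500.0)
```

The resulting dictionary of variant scores is the kind of sequence-function table that would be passed back to the recommending algorithm for the next round.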

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Medicinal Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

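An automated iterative optimization method for protein multi-site combinatorial mutation and its application. First, an engineering method for protein multi-site combinatorial mutation is provided, comprising: according to N preset mutation sites in the amino acid sequence of the protein to be modified, dividing the amino acid sequence into M segments, each containing at least one preset mutation site, and constructing a group of plasmid elements for each segment, thereby forming a plasmid element library containing M groups of plasmid elements, where each group includes plasmid elements containing genes encoding the unmutated or mutated amino acid sequence; and, according to a preset multi-site combinatorial mutation sequence, selecting the corresponding plasmid element from each group and assembling the selected elements into a multi-site combinatorial mutation plasmid, which is then used to express the protein mutant for testing. The technique allows multi-site combinatorial mutations to be constructed rapidly and their function-sequence relationships to be tested.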

Description

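Automated iterative optimization method for protein multi-site combinatorial mutation and application thereof
Technical Field
The present invention relates to an automated iterative optimization method for protein multi-site combinatorial mutation and related applications.
Background Art
Protein engineering is one of the important research directions in synthetic biology. However, human understanding of fundamental biological problems such as protein folding and the natural evolution mechanisms of enzymes remains limited, so designing protein function from scratch (de novo design) by rational design is still difficult. Directed evolution simulates the principles of natural evolution in the laboratory and can effectively optimize protein function without relying on structural or mechanistic information. However, directed evolution depends heavily on high-throughput screening methods, which limits its ability to engineer proteins for which no such method exists. In recent years, artificial-intelligence-assisted protein engineering has developed into an efficient new strategy for protein molecular design, showing unique advantages in protein structure prediction, function prediction, solubility prediction and the guidance of smart library design, and has become another wave of technological development after rational design and directed evolution.
To learn sequence-function relationships from labeled data, machine learning algorithms have been used to build supervised models that predict a variety of properties, including thermostability, fluorescence, ligand binding affinity and catalytic performance. Despite proof-of-concept successes, the performance of current machine learning algorithms is affected by data scarcity and bias. For example, the fitness of evolutionarily sampled sequences usually lies in a moderate range, so label diversity is limited in the low-fitness and high-fitness ranges. In addition, because functional assays are expensive and laborious, labeled data usually cover only a small fraction of the whole sequence space. For a protein with N mutation sites, the theoretical design space is 20^N (for example, 20^4 = 160000), which is difficult to construct in full at current experimental capacity; current machine-learning-guided protein engineering methods therefore try to keep the experimental batch size and the number of iterations as low as possible.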
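The size of this design space, and the small fraction that a few experimental batches can cover, can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the batch figures used in the example (384 variants per round, 5 rounds) are taken from the preferred embodiments described later in this document.

```python
def design_space_size(n_sites: int, alphabet_size: int = 20) -> int:
    """Number of distinct variants when each of n_sites positions
    may hold any of alphabet_size amino acids."""
    return alphabet_size ** n_sites


def fraction_covered(n_sites: int, rounds: int, per_round: int) -> float:
    """Fraction of the full design space sampled by a campaign,
    assuming no variant is built twice."""
    return (rounds * per_round) / design_space_size(n_sites)


FOUR_SITE_SPACE = design_space_size(4)        # 20^4
COVERAGE = fraction_covered(4, rounds=5, per_round=384)
```

For four sites, five rounds of 384 variants sample 1920 of 160000 combinations, about 1.2% of the space, which is why the choice of which variants to build matters.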
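Summary of the Invention
One object of the present invention is to provide an improved engineering method for protein multi-site combinatorial mutation.
Another object of the present invention is to provide related applications of the improved engineering method for protein multi-site combinatorial mutation.
In one aspect, the present invention provides an engineering method for protein multi-site combinatorial mutation, the method comprising:
according to N preset mutation sites in the amino acid sequence of the protein to be modified, dividing the amino acid sequence into M segments, each containing at least one preset mutation site, and constructing a group of plasmid elements for each segment, thereby forming a plasmid element library containing M groups of plasmid elements; wherein N is an integer greater than or equal to 2, M is an integer greater than or equal to 2, and N ≥ M; each group of plasmid elements includes a plasmid element containing a gene encoding the unmutated amino acid sequence and a plasmid element containing a gene encoding an amino acid sequence with a site-directed mutation;
according to a preset multi-site combinatorial mutation sequence, selecting the corresponding plasmid element from each group of plasmid elements, assembling the selected elements into a multi-site combinatorial mutation plasmid, and then expressing the protein mutant for testing.
The engineering method for protein multi-site combinatorial mutation provided by the present invention can rapidly construct multi-site combinatorial mutations and then test function-sequence relationships.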
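The segmentation constraints (every segment holds at least one preset site, and N ≥ M) can be checked programmatically. The sketch below uses the segment boundaries and the four sites given in Embodiment 1 of this document; the helper names and the data layout are hypothetical and are not part of the patent.

```python
# Segment boundaries in base pairs (1-based, inclusive) and preset
# mutation sites, as listed in Embodiment 1 (RhlA).
SEGMENTS_BP = [(1, 264), (265, 387), (388, 483), (484, 888)]
SITES = [74, 101, 148, 173]  # Arg74, Ala101, Leu148, Ser173


def bp_to_aa(segment):
    """Convert a codon-aligned nucleotide range to an amino acid range."""
    start_bp, end_bp = segment
    if (start_bp - 1) % 3 != 0 or end_bp % 3 != 0:
        raise ValueError(f"segment {segment} is not codon-aligned")
    return ((start_bp - 1) // 3 + 1, end_bp // 3)


def assign_sites(segments_aa, sites):
    """Group each mutation site under the segment that contains it."""
    groups = {seg: [] for seg in segments_aa}
    for site in sites:
        for start, end in segments_aa:
            if start <= site <= end:
                groups[(start, end)].append(site)
                break
        else:
            raise ValueError(f"site {site} is outside every segment")
    return groups


SEGMENTS_AA = [bp_to_aa(seg) for seg in SEGMENTS_BP]
GROUPS = assign_sites(SEGMENTS_AA, SITES)

# Constraints stated in the method: N >= M >= 2, at least one site per segment.
N, M = len(SITES), len(SEGMENTS_AA)
VALID = N >= M >= 2 and all(len(s) >= 1 for s in GROUPS.values())
```

With these inputs each segment carries exactly one site, which matches the specific embodiment in which every segment contains a single preset mutation site.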
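According to a specific embodiment of the present invention, in the method of the present invention, each of the M amino acid sequence segments contains one preset mutation site.
According to a specific embodiment of the present invention, in the method of the present invention, each group of plasmid elements includes a plasmid element containing a gene encoding the unmutated amino acid sequence and 1-19 plasmid elements containing genes encoding site-directed mutated amino acid sequences. The 1-19 elements correspond to the amino acids, among the 20 common amino acids, other than the unmutated one. According to a preferred embodiment of the present invention, each group of plasmid elements includes 20 plasmid elements, corresponding to the 20 amino acids at the preset mutation site. The 20 amino acids, namely glycine, alanine, valine, leucine, isoleucine, methionine, proline, tryptophan, serine, tyrosine, cysteine, phenylalanine, asparagine, glutamine, threonine, aspartic acid, glutamic acid, lysine, arginine and histidine, are the main building blocks of proteins in living organisms.
According to a specific embodiment of the present invention, in the method of the present invention, the construction of the plasmid elements and the construction of the multi-site combinatorial mutation plasmids are each independently carried out on an automated functional island.
According to a specific embodiment of the present invention, the automated functional island may be an existing automated functional island in the art, or may be assembled from commercially available instruments according to the process requirements of the present invention. For example, it may be the functional island for automated protein engineering described in PCT/CN2021/133816, the entire contents of which are incorporated herein by reference. Using automated functional islands to construct the plasmid elements and the multi-site combinatorial mutation plasmids can rapidly provide standardized, reliable data, accelerate iterative protein optimization, and improve protein engineering capability.
According to a specific embodiment of the present invention, in the method of the present invention, the process of constructing the plasmid element library comprises:
for each segment of the original amino acid sequence, constructing a plasmid element containing the corresponding coding gene, giving a total of M plasmid elements containing genes encoding the unmutated amino acid sequence;
using each constructed plasmid element containing a gene encoding the unmutated amino acid sequence as a template, introducing mutated bases in the amplification primers for PCR amplification, and mutating the original-sequence plasmid element site-specifically into sequences encoding other amino acids, thereby constructing plasmid elements containing genes encoding site-directed mutated amino acid sequences.
According to a specific embodiment of the present invention, in the method of the present invention, the plasmid elements containing genes encoding the unmutated amino acid sequence are constructed using the Golden Gate assembly strategy. Specifically, this may include: preparing the PCR reaction mix on an automated pipetting workstation, dispensing it into a PCR plate, and transferring the plate with a robotic gripper to an automated PCR instrument to run the amplification program. After the program finishes, the plate is moved to an automated nucleic acid extractor to run a DNA purification program; the product is sent to the automated pipetting workstation for quantification and normalization, and the Golden Gate assembly mix is prepared and dispensed into a PCR plate. The 96-well plate containing the assembly mix is sent to the automated PCR instrument to run the assembly program. Further, after the program finishes, the plate is sent to the automated pipetting workstation for transformation of host cells (for example, Escherichia coli DH5α competent cells); after overnight culture in an incubator, clones are picked, transferred to liquid culture medium, and sent for sequencing.
According to a specific embodiment of the present invention, in the method of the present invention, the process of constructing a plasmid element containing a gene encoding a site-directed mutated amino acid sequence includes: using the constructed plasmid element containing a gene encoding the unmutated amino acid sequence as a template, introducing mutated bases in the amplification primers for PCR amplification, amplifying each plasmid template with 2 pairs of primers, and then assembling the fragments into a new plasmid by Gibson assembly. The automated PCR, automated nucleic acid purification, automated assembly, automated transformation and sequencing steps are the same as for the construction of the plasmid element containing a gene encoding the unmutated amino acid sequence.
According to a specific embodiment of the present invention, in the method of the present invention, the preset multi-site combinatorial mutation sequence is one recommended by a machine learning algorithm. The machine learning algorithm may be any feasible machine learning algorithm in the art that performs model prediction on protein mutants in advance (including simulated prediction of their spatial structure, function and the like) and provides recommended multi-site combinatorial mutation sequences. For example, it may be the machine learning algorithm described in CN115249514A, or the algorithm of another machine-learning-guided protein engineering method. The entire contents of CN115249514A are incorporated herein by reference.
According to a specific embodiment of the present invention, in the method of the present invention, the process of constructing a multi-site combinatorial mutation plasmid includes: arranging the M groups of plasmid elements in the plasmid element library in order in a well plate; the automated pipetting workstation then transfers the plasmid elements at the corresponding positions into a new well plate according to the preset multi-site combinatorial mutation sequence, the other components of the assembly reaction are added, and the assembly program is run on an automated PCR instrument.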
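The selection step performed by the pipetting workstation amounts to translating each recommended variant into a list of source-well to destination-well transfers. The sketch below assumes a layout that the patent does not specify: the 4 x 20 elements of the RhlA example occupy wells 1-80 of a 96-well source plate in row-major order, group by group, with residues in alphabetical one-letter order. All names and the layout are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # alphabetical one-letter codes
ROWS = "ABCDEFGH"                      # 96-well plate: 8 rows x 12 columns
N_GROUPS = 4                           # one group per mutation site


def well_name(index: int) -> str:
    """Row-major well name for a 0-based index (0 -> A1, 95 -> H12)."""
    if not 0 <= index < 96:
        raise ValueError("index outside a 96-well plate")
    row, col = divmod(index, 12)
    return f"{ROWS[row]}{col + 1}"


def source_well(group: int, residue: str) -> str:
    """Well holding the element of `group` that encodes `residue`."""
    return well_name(group * 20 + AMINO_ACIDS.index(residue))


def build_picklist(variants):
    """Return (source_well, destination_well) transfers for each variant.

    Each variant is a string with one residue per site, e.g. 'RALS'
    for the wild-type residues at sites 74, 101, 148 and 173.
    """
    picklist = []
    for dest_index, variant in enumerate(variants):
        if len(variant) != N_GROUPS:
            raise ValueError(f"variant {variant!r} must have {N_GROUPS} residues")
        for group, residue in enumerate(variant):
            picklist.append((source_well(group, residue), well_name(dest_index)))
    return picklist


PICKLIST = build_picklist(["RALS", "AALS"])
```

Because every variant draws exactly one element per group, a batch of 96 variants always produces 4 x 96 transfers regardless of which mutations are requested.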
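According to a specific embodiment of the present invention, in the method of the present invention, the multi-site combinatorial mutation plasmid is further transformed into host cells (for example, Escherichia coli BL21 (DE3) competent cells) and cultured; clones are picked and transferred to culture medium for further culture to express the protein mutants, which are then tested.
According to a specific embodiment of the present invention, in the method of the present invention, the testing includes determining the sequence-function relationship of the protein mutants.
According to a specific embodiment of the present invention, in the method of the present invention, metabolites are detected by high-throughput MALDI mass spectrometry, the metabolites including protein mutants, so as to obtain sequence-function relationships. Preferably, the test results can further be used as new input to the machine learning algorithm, so that model prediction and sequence design for the next round of protein mutants are continuously improved, achieving iterative optimization of protein engineering.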
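The iterative loop (recommend a batch, build and test it, feed the results back) can be outlined as below. This is a structural sketch only: the recommender here is a random sampler standing in for the machine learning algorithm, whose internals are described in CN115249514A and not in this document, and `measure` stands in for the whole build-and-test pipeline. All names and default values other than the 384-variant batch and 4-site example are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose(measured, batch_size, n_sites, rng):
    """Placeholder recommender: random variants not yet measured.

    A real campaign would rank candidates with a model trained on
    `measured`; that model is outside the scope of this sketch.
    """
    batch = set()
    while len(batch) < batch_size:
        variant = "".join(rng.choice(AMINO_ACIDS) for _ in range(n_sites))
        if variant not in measured:
            batch.add(variant)
    return sorted(batch)


def run_campaign(measure, rounds=4, batch_size=384, n_sites=4, seed=0):
    """Run design-build-test rounds and return the best variant found."""
    rng = random.Random(seed)
    measured = {}
    for _ in range(rounds):
        batch = propose(measured, batch_size, n_sites, rng)
        for variant in batch:
            measured[variant] = measure(variant)  # build + assay
        # A real recommender would be retrained on `measured` here.
    best = max(measured, key=measured.get)
    return best, measured


# Toy assay: score is the number of alanines (purely for demonstration).
BEST, MEASURED = run_campaign(lambda v: v.count("A"))
```

Swapping `propose` for a trained model is the only change needed to turn this outline into a guided campaign; the bookkeeping of measured variants stays the same.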
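According to a specific embodiment of the present invention, the engineering method for protein multi-site combinatorial mutation of the present invention is used for automated iterative optimization of protein multi-site combinatorial mutation. That is, in another aspect, the present invention also provides the use of the method in automated iterative optimization of protein multi-site combinatorial mutation.
In some specific embodiments of the present invention, the protein to be modified may include 3-20, more preferably 4-15, and even more preferably 4-10 preset mutation sites.
In some specific embodiments of the present invention, the protein to be modified is rhamnosyl acyltransferase (RhlA). In some more preferred embodiments, the preset mutation sites of rhamnosyl acyltransferase include one or more of Arg74, Ala101, Leu148 and Ser173.
In some specific embodiments of the present invention, the assembly of the combinatorial mutation plasmids is completed in rounds. Preferably, 48-768, more preferably 96-384, combinatorial mutation plasmids are constructed in each round.
In some more preferred embodiments of the present invention, 384 combinatorial mutations are predicted in each round, and better combinatorial mutations can be identified after 4-5 rounds of iteration.
With the method of the present invention, one round of constructing and testing 384 combinatorial mutant strains usually takes only 3-5 days, the machine learning algorithm can identify better combinatorial mutations within 4-5 rounds, and a protein engineering goal can be completed within half a month to one month.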
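The stated campaign duration follows from simple arithmetic on the per-round figures. The sketch below just multiplies the ranges given above (4-5 rounds, 3-5 days per round, 384 variants per round); the function names are hypothetical.

```python
def campaign_days(rounds=(4, 5), days_per_round=(3, 5)):
    """Shortest and longest campaign duration in days."""
    return rounds[0] * days_per_round[0], rounds[1] * days_per_round[1]


def variants_tested(rounds=(4, 5), per_round=384):
    """Fewest and most variants built and tested over a campaign."""
    return rounds[0] * per_round, rounds[1] * per_round


DAYS = campaign_days()
VARIANTS = variants_tested()
```

This gives 12 to 25 days and 1536 to 1920 tested variants, consistent with the stated half a month to one month.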
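In summary, the present invention provides an automated iterative optimization method for protein combinatorial mutation and its application, in which the gene is split into segments by PCR according to the sites to be mutated, each segment is cloned into a backbone plasmid, site-directed mutagenesis primers are designed to mutate each target amino acid site into the other 19 amino acids, and a plasmid element library is constructed. According to the mutation sequences generated by the machine learning algorithm, suitable elements are selected from the plasmid element library and assembled into the backbone plasmid, which is transformed into Escherichia coli or another chassis for gene expression; metabolites are detected by high-throughput MALDI mass spectrometry to obtain sequence-function relationships. The test results can further be used as new input to the algorithm to continuously improve model prediction and sequence design. In the method of the present invention, automated construction and testing of combinatorial mutations can rapidly provide standardized, reliable data to the machine learning algorithm, accelerate the iterative optimization of the algorithm, and improve protein engineering capability. The present invention can rapidly construct the mutation sequences recommended by the algorithm, so that iterative optimization is carried out in a shorter time and protein engineering is better guided.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the technical route of protein multi-site mutation engineering according to the present invention.
FIG. 2 is a schematic diagram of the automated process for constructing the lv0 plasmid elements of the present invention.
FIG. 3 is a schematic flow diagram of the construction scheme of the lv0 plasmid element library of the present invention.
FIG. 4 is a schematic diagram of the automated process for constructing combinatorial mutations according to the present invention.
FIG. 5 is a schematic diagram of the automated process for testing combinatorial mutant strains according to the present invention.
Detailed Description of the Embodiments
For a clearer understanding of the technical features, objects and beneficial effects of the present invention, the technical solutions of the present invention are described in detail below; it should be understood that this description does not limit the scope of the present invention. In the embodiments, all starting reagents and materials are commercially available, and experimental methods for which no specific conditions are given are conventional methods under conventional conditions well known in the art, or follow the conditions recommended by the instrument manufacturer.
Unless specifically defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the relevant art.
Embodiment 1:
The technical route of protein multi-site mutation engineering according to the present invention is shown in FIG. 1. This embodiment takes rhamnosyl acyltransferase (RhlA) as an example and studies the effect of combinatorial mutations at four sites (Arg74, Ala101, Leu148, Ser173) on substrate selectivity.
(1) Construction of the lv0 plasmid elements
The rhla gene was synthesized as 4 segments containing bases 1-264 bp, 265-387 bp, 388-483 bp and 484-888 bp of the rhla nucleotide sequence, which encode amino acids 1-88, 89-129, 130-161 and 162-296 of the RhlA amino acid sequence, respectively. The lv0 plasmid elements were constructed using the Golden Gate assembly strategy (the automated process for constructing the lv0 plasmid elements is shown in FIG. 2); the primers (Table 1) introduce recognition sites for the endonucleases BsmBI and BsaI together with their overhang interfaces. The specific process includes:
The PCR reaction was set up on the automated pipetting workstation: 25 μL PCR mix, 1 μL template plasmid (containing the rhla gene), 2 μL upstream primer, 2 μL downstream primer and 20 μL deionized water were added to a 96-well PCR plate, which the robotic gripper transferred to the automated PCR instrument to run the amplification program;
After the program was completed, the plate was moved to the automated nucleic acid extractor to run the DNA purification program; the product was sent to the automated pipetting workstation for quantification and normalization, and the Golden Gate assembly reaction was set up. The total reaction contained 2 μL NEB Golden Gate Enzyme Mix (BsmBI-v2) (NEB product number: E1602), 2 μL T4 DNA Ligase Buffer (10X), and 75 ng each of the lv0-ccdb backbone plasmid and the purified DNA fragment, made up to 20 μL with deionized water;
The 96-well plate containing the assembly reaction was sent to the automated PCR instrument to run the assembly program, which was set according to the product manual. After the program was completed, the plate was sent to the automated pipetting workstation for transformation of Escherichia coli DH5α competent cells. After overnight culture in a 37 °C incubator, 2 clones were picked, transferred to liquid culture medium, and sent for sequencing.
Four lv0 plasmid elements were obtained and named lv0-s1, lv0-s2, lv0-s3 and lv0-s4.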
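Because the Golden Gate reaction specifies DNA by mass (75 ng each) rather than by volume, the pipetting workstation has to convert measured concentrations into volumes and top up with water. The sketch below shows that calculation; the function name and the stock concentrations used in the example (50 and 25 ng/μL) are hypothetical, while the fixed volumes and the 75 ng target come from the protocol above.

```python
# PCR reaction from the protocol above, volumes in microlitres.
PCR_MIX_UL = {"pcr_mix": 25, "template": 1, "fwd_primer": 2,
              "rev_primer": 2, "water": 20}


def golden_gate_setup(backbone_ng_per_ul, insert_ng_per_ul,
                      mass_ng=75.0, enzyme_ul=2.0, buffer_ul=2.0,
                      total_ul=20.0):
    """Volumes for one Golden Gate reaction with one insert.

    DNA volumes are derived from the quantified stock concentrations;
    water makes the reaction up to total_ul.
    """
    backbone_ul = mass_ng / backbone_ng_per_ul
    insert_ul = mass_ng / insert_ng_per_ul
    water_ul = total_ul - enzyme_ul - buffer_ul - backbone_ul - insert_ul
    if water_ul < 0:
        raise ValueError("DNA stocks are too dilute for the reaction volume")
    return {"enzyme_ul": enzyme_ul, "buffer_ul": buffer_ul,
            "backbone_ul": backbone_ul, "insert_ul": insert_ul,
            "water_ul": water_ul}


# Example with assumed stock concentrations of 50 and 25 ng/uL.
REACTION = golden_gate_setup(backbone_ng_per_ul=50.0, insert_ng_per_ul=25.0)
```

The guard on negative water volume reflects why the protocol normalizes the purified fragments first: stocks below roughly 10 ng/μL would not fit into a 20 μL reaction.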
Table 1. Primers for constructing the lv0 plasmid elements
Primer name Primer sequence (5' to 3')
Rhla-GG-S1-F CGTCTCATCGGGGTCTCAttggatgcggcgcgaaagt(SEQ ID NO:1)
Rhla-GG-S1-R CGTCTCAGGTCGGTCTCAgccaggaggatttccacct(SEQ ID NO:2)
Rhla-GG-S2-F CGTCTCATCGGGGTCTCAtggcgctgatcgagcgctt(SEQ ID NO:3)
Rhla-GG-S2-R CGTCTCAGGTCGGTCTCAggggcgaatgccatcacca(SEQ ID NO:4)
Rhla-GG-S3-F CGTCTCATCGGGGTCTCAcccctggactgaaccaggc(SEQ ID NO:5)
Rhla-GG-S3-R CGTCTCAGGTCGGTCTCAggtctcgttgagcagatgg(SEQ ID NO:6)
Rhla-GG-S4-F CGTCTCATCGGGGTCTCAgaccgtcggcaaatacctg(SEQ ID NO:7)
Rhla-GG-S4-R CGTCTCAGGTCGGTCTCAtcaggcgtagccgatggcc(SEQ ID NO:8)
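The primers in Table 1 share an 18-nt uppercase adapter containing a BsmBI site (CGTCTC) followed by a BsaI site (GGTCTC). Both are Type IIS enzymes that cut 1 nt downstream of the recognition site on the top strand and 5 nt downstream on the bottom strand, leaving a 4-nt overhang. The sketch below extracts those overhangs from the Table 1 sequences and checks that the BsaI overhangs of adjacent segments are complementary, which is what allows the four lv0 elements to be joined in order. The function names are hypothetical, and the interpretation of the adapter layout is an inference from the sequences rather than a statement in the patent.

```python
BSMBI = "CGTCTC"
BSAI = "GGTCTC"

PRIMERS = {
    "Rhla-GG-S1-F": "CGTCTCATCGGGGTCTCAttggatgcggcgcgaaagt",
    "Rhla-GG-S1-R": "CGTCTCAGGTCGGTCTCAgccaggaggatttccacct",
    "Rhla-GG-S2-F": "CGTCTCATCGGGGTCTCAtggcgctgatcgagcgctt",
    "Rhla-GG-S2-R": "CGTCTCAGGTCGGTCTCAggggcgaatgccatcacca",
    "Rhla-GG-S3-F": "CGTCTCATCGGGGTCTCAcccctggactgaaccaggc",
    "Rhla-GG-S3-R": "CGTCTCAGGTCGGTCTCAggtctcgttgagcagatgg",
    "Rhla-GG-S4-F": "CGTCTCATCGGGGTCTCAgaccgtcggcaaatacctg",
    "Rhla-GG-S4-R": "CGTCTCAGGTCGGTCTCAtcaggcgtagccgatggcc",
}


def overhang(primer: str, site: str) -> str:
    """4-nt overhang after the first occurrence of `site` (1-nt spacer)."""
    seq = primer.upper()
    pos = seq.find(site)
    if pos < 0:
        raise ValueError(f"{site} not found in primer")
    start = pos + len(site) + 1
    return seq[start:start + 4]


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junctions_compatible(primers, n_segments=4) -> bool:
    """True if each segment's reverse BsaI overhang pairs with the
    forward BsaI overhang of the next segment."""
    for i in range(1, n_segments):
        left = overhang(primers[f"Rhla-GG-S{i}-R"], BSAI)
        right = overhang(primers[f"Rhla-GG-S{i + 1}-F"], BSAI)
        if revcomp(left) != right:
            return False
    return True


BSAI_OVERHANGS = {name: overhang(seq, BSAI) for name, seq in PRIMERS.items()}
COMPATIBLE = junctions_compatible(PRIMERS)
```

Note that Rhla-GG-S3-R contains a second GGTCTC within its annealing region; the helper deliberately uses the first occurrence, which lies in the adapter.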
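(2) Construction of the lv0 plasmid element library
Primers were designed using the lv0 plasmid elements constructed above as templates (Table 2); mutated bases were introduced in the primers to mutate the target site of each lv0 plasmid element into the codons of the other 19 common amino acids, forming the plasmid element library (the construction scheme of the lv0 plasmid element library is shown in FIG. 3). The specific process includes:
Each plasmid template was PCR-amplified with 2 pairs of primers, and the fragments were then assembled into a new plasmid by Gibson assembly. The automated PCR, automated nucleic acid purification, automated assembly, automated transformation and sequencing steps were the same as for the construction of the lv0 plasmid elements. The automated assembly reaction was: 10 μL Gibson Assembly Master Mix (2X) (NEB product number: E5510), 2.5 μL PCR fragment 1, 2.5 μL PCR fragment 2 and 5 μL deionized water; the assembly reaction conditions were set according to the product manual. The lv0-s1, lv0-s2, lv0-s3 and lv0-s4 plasmid elements were each mutated into 19 plasmids, giving a plasmid element library of 80 lv0 plasmid elements in total (lv0-074 (20), lv0-101 (20), lv0-148 (20), lv0-173 (20)).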
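The forward mutagenic primers in Table 2 follow a regular pattern: a 20-nt upstream flank, one uppercase codon for the new residue, and a 7-nt downstream flank. The sketch below regenerates primer names and sequences from that pattern. The codon choices and the R74 and A101 flanks are read off the Table 2 entries in this document; the function name and the dictionary layout are hypothetical, and only the "-1-F" forward primers are covered.

```python
# One codon per amino acid, as used in the Table 2 primers.
CODONS = {
    "A": "GCG", "C": "TGT", "D": "GAT", "E": "GAG", "F": "TTT",
    "G": "GGG", "H": "CAT", "I": "ATT", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCG", "Q": "CAG", "R": "CGG",
    "S": "TCG", "T": "ACG", "V": "GTG", "W": "TGG", "Y": "TAT",
}

# (wild-type residue, upstream flank, downstream flank) per site.
SITE_FLANKS = {
    74: ("R", "cgcgtcagcacaacccgcag", "gggttga"),
    101: ("A", "aggtcaatcacctggtAtcc", "tcctggg"),
}


def sdm_forward_primers(position: int) -> dict:
    """Forward mutagenic primers for all 19 substitutions at one site."""
    wild_type, upstream, downstream = SITE_FLANKS[position]
    primers = {}
    for residue, codon in CODONS.items():
        if residue == wild_type:
            continue  # the unmutated element already exists as lv0-sN
        name = f"Rhla-SDM-{wild_type}{position}{residue}-1-F"
        primers[name] = upstream + codon + downstream
    return primers


R74_PRIMERS = sdm_forward_primers(74)
A101_PRIMERS = sdm_forward_primers(101)

# Library size: 4 sites x (1 unmutated + 19 mutated) elements.
LIBRARY_SIZE = 4 * (1 + 19)
```

Generating primers from a flank table like this keeps the 19 substitutions per site consistent and makes it easy to add further sites by supplying their flanks.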
Table 2. Primers for constructing the lv0 plasmid element library
Primer name Primer sequence (5' to 3')
Rhla-SDM-R74A-1-F cgcgtcagcacaacccgcagGCGgggttga(SEQ ID NO:9)
Rhla-SDM-R74C-1-F cgcgtcagcacaacccgcagTGTgggttga(SEQ ID NO:10)
Rhla-SDM-R74D-1-F cgcgtcagcacaacccgcagGATgggttga(SEQ ID NO:11)
Rhla-SDM-R74E-1-F cgcgtcagcacaacccgcagGAGgggttga(SEQ ID NO:12)
Rhla-SDM-R74F-1-F cgcgtcagcacaacccgcagTTTgggttga(SEQ ID NO:13)
Rhla-SDM-R74G-1-F cgcgtcagcacaacccgcagGGGgggttga(SEQ ID NO:14)
Rhla-SDM-R74H-1-F cgcgtcagcacaacccgcagCATgggttga(SEQ ID NO:15)
Rhla-SDM-R74I-1-F cgcgtcagcacaacccgcagATTgggttga(SEQ ID NO:16)
Rhla-SDM-R74K-1-F cgcgtcagcacaacccgcagAAGgggttga(SEQ ID NO:17)
Rhla-SDM-R74L-1-F cgcgtcagcacaacccgcagCTGgggttga(SEQ ID NO:18)
Rhla-SDM-R74M-1-F cgcgtcagcacaacccgcagATGgggttga(SEQ ID NO:19)
Rhla-SDM-R74N-1-F cgcgtcagcacaacccgcagAATgggttga(SEQ ID NO:20)
Rhla-SDM-R74P-1-F cgcgtcagcacaacccgcagCCGgggttga(SEQ ID NO:21)
Rhla-SDM-R74Q-1-F cgcgtcagcacaacccgcagCAGgggttga(SEQ ID NO:22)
Rhla-SDM-R74S-1-F cgcgtcagcacaacccgcagTCGgggttga(SEQ ID NO:23)
Rhla-SDM-R74T-1-F cgcgtcagcacaacccgcagACGgggttga(SEQ ID NO:24)
Rhla-SDM-R74V-1-F cgcgtcagcacaacccgcagGTGgggttga(SEQ ID NO:25)
Rhla-SDM-R74W-1-F cgcgtcagcacaacccgcagTGGgggttga(SEQ ID NO:26)
Rhla-SDM-R74Y-1-F cgcgtcagcacaacccgcagTATgggttga(SEQ ID NO:27)
Rhla-SDM-A101C-1-F aggtcaatcacctggtAtccTGTtcctggg(SEQ ID NO:28)
Rhla-SDM-A101D-1-F aggtcaatcacctggtAtccGATtcctggg(SEQ ID NO:29)
Rhla-SDM-A101E-1-F aggtcaatcacctggtAtccGAGtcctggg(SEQ ID NO:30)
Rhla-SDM-A101F-1-F aggtcaatcacctggtAtccTTTtcctggg(SEQ ID NO:31)
Rhla-SDM-A101G-1-F aggtcaatcacctggtAtccGGGtcctggg(SEQ ID NO:32)
Rhla-SDM-A101H-1-F aggtcaatcacctggtAtccCATtcctggg(SEQ ID NO:33)
Rhla-SDM-A101I-1-F aggtcaatcacctggtAtccATTtcctggg(SEQ ID NO:34)
Rhla-SDM-A101K-1-F aggtcaatcacctggtAtccAAGtcctggg(SEQ ID NO:35)
Rhla-SDM-A101L-1-F aggtcaatcacctggtAtccCTGtcctggg(SEQ ID NO:36)
Rhla-SDM-A101M-1-F aggtcaatcacctggtAtccATGtcctggg(SEQ ID NO:37)
Rhla-SDM-A101N-1-F aggtcaatcacctggtAtccAATtcctggg(SEQ ID NO:38)
Rhla-SDM-A101P-1-F aggtcaatcacctggtAtccCCGtcctggg(SEQ ID NO:39)
Rhla-SDM-A101Q-1-F aggtcaatcacctggtAtccCAGtcctggg(SEQ ID NO:40)
Rhla-SDM-A101R-1-F aggtcaatcacctggtAtccCGGtcctggg(SEQ ID NO:41)
Rhla-SDM-A101S-1-F aggtcaatcacctggtAtccTCGtcctggg(SEQ ID NO:42)
Rhla-SDM-A101T-1-F aggtcaatcacctggtAtccACGtcctggg(SEQ ID NO:43)
Rhla-SDM-A101V-1-F aggtcaatcacctggtAtccGTGtcctggg(SEQ ID NO:44)
Rhla-SDM-A101W-1-F aggtcaatcacctggtAtccTGGtcctggg(SEQ ID NO:45)
Rhla-SDM-A101Y-1-F aggtcaatcacctggtAtccTATtcctggg(SEQ ID NO:46)
Rhla-SDM-L148A-1-F gggcgcaggcgctgatcgagGCTgacgaca(SEQ ID NO:47)
Rhla-SDM-L148C-1-F gggcgcaggcgctgatcgagTGTgacgaca(SEQ ID NO:48)
Rhla-SDM-L148D-1-F gggcgcaggcgctgatcgagGATgacgaca(SEQ ID NO:49)
Rhla-SDM-L148E-1-F gggcgcaggcgctgatcgagGAGgacgaca(SEQ ID NO:50)
Rhla-SDM-L148F-1-F gggcgcaggcgctgatcgagTTTgacgaca(SEQ ID NO:51)
Rhla-SDM-L148G-1-F gggcgcaggcgctgatcgagGGGgacgaca(SEQ ID NO:52)
Rhla-SDM-L148H-1-F gggcgcaggcgctgatcgagCATgacgaca(SEQ ID NO:53)
Rhla-SDM-L148I-1-F gggcgcaggcgctgatcgagATTgacgaca(SEQ ID NO:54)
Rhla-SDM-L148K-1-F gggcgcaggcgctgatcgagAAGgacgaca(SEQ ID NO:55)
Rhla-SDM-L148M-1-F gggcgcaggcgctgatcgagATGgacgaca(SEQ ID NO:56)
Rhla-SDM-L148N-1-F gggcgcaggcgctgatcgagAATgacgaca(SEQ ID NO:57)
Rhla-SDM-L148P-1-F gggcgcaggcgctgatcgagCCGgacgaca(SEQ ID NO:58)
Rhla-SDM-L148Q-1-F gggcgcaggcgctgatcgagCAGgacgaca(SEQ ID NO:59)
Rhla-SDM-L148R-1-F gggcgcaggcgctgatcgagAGGgacgaca(SEQ ID NO:60)
Rhla-SDM-L148S-1-F gggcgcaggcgctgatcgagAGTgacgaca(SEQ ID NO:61)
Rhla-SDM-L148T-1-F gggcgcaggcgctgatcgagACGgacgaca(SEQ ID NO:62)
Rhla-SDM-L148V-1-F gggcgcaggcgctgatcgagGTGgacgaca(SEQ ID NO:63)
Rhla-SDM-L148W-1-F gggcgcaggcgctgatcgagTGGgacgaca(SEQ ID NO:64)
Rhla-SDM-L148Y-1-F gggcgcaggcgctgatcgagTATgacgaca(SEQ ID NO:65)
Rhla-SDM-S173A-1-F tgccgcagcgcctgaaagccGCGaaccatc(SEQ ID NO:66)
Rhla-SDM-S173C-1-F tgccgcagcgcctgaaagccTGTaaccatc(SEQ ID NO:67)
Rhla-SDM-S173D-1-F tgccgcagcgcctgaaagccGATaaccatc(SEQ ID NO:68)
Rhla-SDM-S173E-1-F tgccgcagcgcctgaaagccGAGaaccatc(SEQ ID NO:69)
Rhla-SDM-S173F-1-F tgccgcagcgcctgaaagccTTTaaccatc(SEQ ID NO:70)
Rhla-SDM-S173G-1-F tgccgcagcgcctgaaagccGGGaaccatc(SEQ ID NO:71)
Rhla-SDM-S173H-1-F tgccgcagcgcctgaaagccCATaaccatc(SEQ ID NO:72)
Rhla-SDM-S173I-1-F tgccgcagcgcctgaaagccATTaaccatc(SEQ ID NO:73)
Rhla-SDM-S173K-1-F tgccgcagcgcctgaaagccAAGaaccatc(SEQ ID NO:74)
Rhla-SDM-S173L-1-F tgccgcagcgcctgaaagccCTGaaccatc(SEQ ID NO:75)
Rhla-SDM-S173M-1-F tgccgcagcgcctgaaagccATGaaccatc(SEQ ID NO:76)
Rhla-SDM-S173N-1-F tgccgcagcgcctgaaagccAATaaccatc(SEQ ID NO:77)
Rhla-SDM-S173P-1-F tgccgcagcgcctgaaagccCCGaaccatc(SEQ ID NO:78)
Rhla-SDM-S173Q-1-F tgccgcagcgcctgaaagccCAGaaccatc(SEQ ID NO:79)
Rhla-SDM-S173R-1-F tgccgcagcgcctgaaagccCGGaaccatc(SEQ ID NO:80)
Rhla-SDM-S173T-1-F tgccgcagcgcctgaaagccACGaaccatc(SEQ ID NO:81)
Rhla-SDM-S173V-1-F tgccgcagcgcctgaaagccGTGaaccatc(SEQ ID NO:82)
Rhla-SDM-S173W-1-F tgccgcagcgcctgaaagccTGGaaccatc(SEQ ID NO:83)
Rhla-SDM-S173Y-1-F tgccgcagcgcctgaaagccTATaaccatc(SEQ ID NO:84)
Rhla-SDM-1-R cgagattttcaggagctaaggaagc(SEQ ID NO:85)
Rhla-SDM-2-F gcttccttagctcctgaaaatctcg(SEQ ID NO:86)
Rhla-SDM-R74-2-R ctgcgggttgtgctgacgcg(SEQ ID NO:87)
Rhla-SDM-A101-2-R ggagaccaggtgattgacct(SEQ ID NO:88)
Rhla-SDM-L148-2-R ctcgatcagcgcctgcgccc(SEQ ID NO:89)
Rhla-SDM-S173-2-R ggctttcaggcgctgcggca(SEQ ID NO:90)
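A primer table of this size is easy to mistype, so it can be worth checking that the upper-case codon in each forward primer encodes the amino acid named in the primer. The sketch below is an assumed helper, not something described in the application: it parses the substitution from the primer name, finds the upper-case triplet, and translates it with the standard genetic code. Only a few rows of Table 2 are included as examples.

```python
import re

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# A few rows copied from Table 2.
EXAMPLES = {
    "Rhla-SDM-R74A-1-F": "cgcgtcagcacaacccgcagGCGgggttga",
    "Rhla-SDM-A101C-1-F": "aggtcaatcacctggtAtccTGTtcctggg",
    "Rhla-SDM-L148S-1-F": "gggcgcaggcgctgatcgagAGTgacgaca",
    "Rhla-SDM-S173W-1-F": "tgccgcagcgcctgaaagccTGGaaccatc",
}


def target_residue(name):
    """Parse e.g. 'Rhla-SDM-R74A-1-F' and return the substituted residue ('A')."""
    match = re.search(r"SDM-([A-Z])(\d+)([A-Z])-", name)
    if match is None:
        raise ValueError(f"cannot parse primer name: {name}")
    return match.group(3)


def mutant_codon(sequence):
    """Return the first run of exactly three upper-case bases (the mutated codon)."""
    match = re.search(r"(?<![ACGT])[ACGT]{3}(?![ACGT])", sequence)
    if match is None:
        raise ValueError("no upper-case codon found")
    return match.group(0)


def primer_is_consistent(name, sequence):
    """True if the upper-case codon encodes the residue named in the primer."""
    return CODON_TABLE[mutant_codon(sequence)] == target_residue(name)
```

The single upper-case `A` in the A101 primers is skipped because the pattern requires a run of exactly three upper-case bases.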
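(3) Construction of combinatorial mutants
According to the four-site mutant sequences recommended by the machine learning algorithm, the four required plasmid elements are selected from the lv0-074(20), lv0-101(20), lv0-148(20) and lv0-173(20) element libraries, respectively, and built into the lv1-rhlb-ccdb backbone plasmid by Golden Gate assembly (for the automated workflow of combinatorial mutant construction, see Figure 4). The specific procedure is as follows:
First, the 80 element plasmids are arrayed in order in a 96-well plate. Following the sequence list, the automated liquid-handling workstation aspirates the plasmid elements from the corresponding positions into a new 96-well PCR plate and adds the other assembly components. The assembly reaction contains 2 μL NEB Golden Gate Enzyme Mix (BsaI-v2) (NEB catalog no. E1601), 2 μL T4 DNA Ligase Buffer (10X), 75 ng each of the lv1-rhlb-ccdb backbone plasmid and the four lv0 plasmid elements, and deionized water to 20 μL. A robotic gripper transfers the plate to the automated thermal cycler to run the assembly program, which is set according to the product manual. After the program finishes, the plate is sent to the automated liquid-handling workstation for transformation into competent E. coli BL21(DE3) cells. After overnight culture in a 37 °C incubator, two clones are picked, transferred to liquid medium and submitted for sequencing verification. Assembly of 96 combinatorial mutant plasmids typically takes only 1.5 hours, and 384-768 combinatorial mutant plasmids can be constructed in one day.
(4) Machine-learning-guided iterative optimization of protein engineering
The algorithm used in this example is taken from the machine-learning-guided biological sequence engineering method and apparatus disclosed in CN115249514A. Each round predicts 384 combinatorial mutants, and the best combinatorial mutant can be identified after 4-5 rounds of iteration (for the automated workflow of combinatorial mutant strain testing, see Figure 5). Any algorithm with a similar function can be configured in the present invention; it is not limited to CN115249514A. The specific procedure is as follows:
The combinatorial mutant strains constructed by the above method are cultured and fermented for 24 hours in high-throughput 96-well deep-well plates using the automated liquid-handling workstation and an automated incubator. Afterwards, the automated liquid-handling workstation adds ethyl acetate and spots the upper organic phase onto a MALDI mass spectrometry target plate for measurement. The performance of each mutant strain is quantified from the peak height at the mass-to-charge ratio (m/z) of a specific metabolite in the mass spectrum, and the quantified sequence-function data are fed back into the machine learning algorithm for the next round of combinatorial mutant optimization. One round of building and testing 384 combinatorial mutant strains typically takes only 3-5 days, and the machine learning algorithm can identify the optimal combinatorial mutant within 4-5 rounds, so the method of the present invention can complete a protein engineering objective in half a month to one month.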
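The step in which the liquid handler picks four elements per recommended sequence amounts to generating a pick list. The sketch below shows one way this could look; the plate layout is a hypothetical assumption (the text only says the 80 plasmids are arrayed "in order"): the four site libraries are placed one after another, each in alphabetical one-letter amino acid order, filling the plate row by row. The names `well_for` and `picklist` are illustrative.

```python
# Hypothetical layout: libraries for sites 74, 101, 148, 173 placed consecutively,
# 20 variants each in one-letter alphabetical order, filling a 96-well plate row by row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SITES = (74, 101, 148, 173)
ROWS = "ABCDEFGH"
COLUMNS = 12


def well_for(site, residue):
    """Source well holding the lv0 element with the given residue at the given site."""
    index = SITES.index(site) * len(AMINO_ACIDS) + AMINO_ACIDS.index(residue)
    row, col = divmod(index, COLUMNS)
    return f"{ROWS[row]}{col + 1}"


def picklist(variant):
    """Source wells for a four-letter variant, e.g. 'RALS' for the wild-type residues."""
    if len(variant) != len(SITES):
        raise ValueError("variant must give one residue per site")
    return [well_for(site, residue) for site, residue in zip(SITES, variant)]
```

For example, `picklist("RALS")` returns the four wells holding the unmutated elements under this assumed layout; a real deck would also need the transfer volume for 75 ng, computed from each plasmid's measured concentration.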
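The figures quoted above can be put side by side with the size of the design space. The following is a back-of-the-envelope sketch under two assumptions not stated in the text: rounds run back to back, and no variant is tested twice. With four sites and 20 residues per site there are 20^4 possible combinations, of which the campaign samples only a small fraction.

```python
SITES = 4
RESIDUES_PER_SITE = 20
VARIANTS_PER_ROUND = 384


def campaign(rounds, days_per_round):
    """Summarize a campaign, assuming back-to-back rounds and no repeated variants."""
    design_space = RESIDUES_PER_SITE ** SITES
    tested = rounds * VARIANTS_PER_ROUND
    return {
        "design_space": design_space,
        "tested": tested,
        "fraction": tested / design_space,
        "days": rounds * days_per_round,
    }


shortest = campaign(rounds=4, days_per_round=3)
longest = campaign(rounds=5, days_per_round=5)
```

Under these assumptions the campaign lasts 12 to 25 days, in line with the stated half a month to one month, while testing roughly 1% of the 160,000 possible four-site combinations.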

Claims (10)

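  1. A method for engineering a protein by multi-site combinatorial mutation, the method comprising:
    according to N preset mutation sites in the amino acid sequence of the protein to be engineered, dividing the amino acid sequence into M segments, each segment containing at least one preset mutation site, and constructing one group of plasmid elements for each amino acid sequence segment, thereby forming a plasmid element library comprising M groups of plasmid elements; wherein N is an integer greater than or equal to 2, M is an integer greater than or equal to 2, and N≥M; each group of plasmid elements comprises: a plasmid element containing a gene encoding the unmutated amino acid sequence, and plasmid elements containing genes encoding site-directed mutant amino acid sequences;
    according to a preset multi-site combinatorial mutant sequence, selecting the corresponding plasmid element from each group of plasmid elements, assembling them into a multi-site combinatorial mutant plasmid, and then expressing the protein mutant for testing.
  2. The method according to claim 1, wherein, among the M amino acid sequence segments, each segment contains one preset mutation site;
    preferably, each group of plasmid elements comprises: a plasmid element containing a gene encoding the unmutated amino acid sequence, and 1-19 plasmid elements containing genes encoding site-directed mutant amino acid sequences;
    preferably, each group of plasmid elements comprises 20 plasmid elements, corresponding to the 20 amino acids at the preset mutation site.
  3. The method according to claim 1, wherein the preset multi-site combinatorial mutant sequence is a multi-site combinatorial mutant sequence recommended by a machine learning algorithm.
  4. The method according to claim 1, wherein the construction of the plasmid elements and the construction of the multi-site combinatorial mutant plasmids are each independently carried out using automated functional islands.
  5. The method according to any one of claims 1-4, wherein the process of constructing the plasmid element library comprises:
    for each original amino acid sequence segment, constructing a plasmid element containing the corresponding encoding gene, giving a total of M plasmid elements containing genes encoding unmutated amino acid sequences;
    using each constructed plasmid element containing a gene encoding an unmutated amino acid sequence as a template, introducing mutated bases in the amplification primers and performing PCR amplification, so that the original-sequence plasmid element is site-specifically mutated to the coding sequences of other amino acids, thereby constructing the plasmid elements containing genes encoding site-directed mutant amino acid sequences.
  6. The method according to any one of claims 1-5, wherein the process of constructing the multi-site combinatorial mutant plasmids comprises:
    arraying the M groups of plasmid elements of the plasmid element library in order in a well plate; an automated liquid-handling workstation selecting, according to the preset multi-site combinatorial mutant sequence, the plasmid elements at the corresponding positions and transferring them to a new well plate, and adding the other components of the assembly reaction; and running the assembly program on an automated thermal cycler.
  7. The method according to any one of claims 1-6, wherein the multi-site combinatorial mutant plasmid is further transformed into a host cell and cultured, and clones are picked and transferred to culture medium for further culture to express the protein mutant, which is then tested.
  8. The method according to claim 7, wherein the testing comprises: determining the sequence-function relationship of the protein mutants.
  9. The method according to claim 7, wherein metabolites are detected by high-throughput MALDI mass spectrometry, the metabolites comprising the protein mutants, thereby obtaining the sequence-function relationship;
    preferably, the test results are used as new input to the machine learning algorithm for continued improvement of protein mutant model prediction and sequence design in the next round, thereby achieving iterative optimization of the protein engineering.
  10. Use of the method according to any one of claims 1-9 in automated iterative optimization of multi-site combinatorial mutation of proteins;
    preferably, the protein to be engineered comprises 3-20, further preferably 4-15, still further preferably 4-10 preset mutation sites;
    preferably, the protein to be engineered is rhamnolipid acyltransferase (RhlA); more preferably, the preset mutation sites of the rhamnolipid acyltransferase include one or more of Arg74, Ala101, Leu148 and Ser173;
    preferably, the construction of 48-768, more preferably 96-384, combinatorial mutant plasmids is completed in each iteration round.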
PCT/CN2022/134414 2022-11-25 2022-11-25 Automated iterative optimization method for multi-site combinatorial mutation of proteins, and use thereof WO2024108567A1 (zh)

Publications (1)

Publication Number Publication Date
WO2024108567A1 true WO2024108567A1 (zh) 2024-05-30

Family

ID=91195013

Country Status (1)

Country Link
WO (1) WO2024108567A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
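CN108424907A (zh) * 2018-05-09 2018-08-21 Peking University High-throughput method for precise multi-site base mutation of DNA
WO2021121391A1 (zh) * 2019-12-19 2021-06-24 Nanjing GenScript Biotech Co., Ltd. Method for constructing a gene mutation library
CN114901820A (zh) * 2019-12-30 2022-08-12 Nanjing GenScript Biotech Co., Ltd. Method for constructing a gene mutation library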

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22966255

Country of ref document: EP

Kind code of ref document: A1