US20230235317A1 - Directed evolution method based on primary and secondary replicon of gemini virus - Google Patents
Directed evolution method based on primary and secondary replicon of gemini virus
- Publication number
- US20230235317A1 (application US17/923,264)
- Authority
- US
- United States
- Prior art keywords
- genetic element
- replicon
- geminivirus
- rep
- repa
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1058—Directional evolution of libraries, e.g. evolution of libraries is achieved by mutagenesis and screening or selection of mixed population of organisms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1082—Preparation or screening gene libraries by chromosomal integration of polynucleotide sequences, HR-, site-specific-recombination, transposons, viral vectors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8201—Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
- C12N15/8202—Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by biological means, e.g. cell mediated or natural vector
- C12N15/8203—Virus mediated transformation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8216—Methods for controlling, regulating or enhancing expression of transgenes in plant cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2750/00—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
- C12N2750/00011—Details
- C12N2750/12011—Geminiviridae
- C12N2750/12041—Use of virus, viral particle or viral elements as a vector
- C12N2750/12043—Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2800/00—Nucleic acids vectors
- C12N2800/80—Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
Definitions
- The present invention belongs to the field of genetic engineering. Specifically, it relates to a directed evolution method based on geminivirus. More specifically, it relates to a directed evolution method for in vivo screening of a genetic element in a plant cell using primary and secondary replicons of geminivirus.
- Directed evolution can modify a protein even when the structure and mechanism of action of the target protein are unknown. It is therefore currently an effective method for obtaining proteins with new functions in molecular biology research.
- The plant cell has special anatomical structures. Compared with Escherichia coli, the plant, as a eukaryote, has a cell nucleus, an endoplasmic reticulum, and other structures in its cells; compared with the animal cell, the plant cell has chloroplasts, a cell wall, and other structures. It is difficult for some proteins related to these structures to evolve in Escherichia coli or mammalian systems. Third, the plant cell has a unique cellular regulatory network.
- A complex regulatory network is formed among the various genetic elements in a cell, and this network differs greatly between prokaryotes and eukaryotes, and between plant cells and animal cells. Since no element is likely to be independent of this network, products obtained by a directed evolution system outside a plant cell may not work effectively in a plant system. Conversely, elements of the plant cell regulatory network may not be amenable to directed evolution in other biological systems. For these reasons, some elements that work efficiently in mammalian or Escherichia coli systems, such as eAPOBEC3A reported by J. Keith Joung et al., do not show activity in plant cells. Therefore, a directed evolution system based on plant cells needs to be developed in this field.
- Geminiviruses are the largest family of single-stranded DNA viruses in plants. The viral particle has a twinned (geminate) structure, and the genome is monopartite or bipartite, with a DNA size of 2.5-3.0 kb per molecule and a total genome size of 2.5-5.2 kb.
- Viruses of this family first form a double-stranded DNA replication intermediate in the cell nucleus; the viral genome is then amplified by rolling-circle replication under the action of the virus-encoded replication initiation protein Rep/RepA and an endogenous DNA polymerase of the cell (as shown in FIG. 1 ).
- Rep/RepA is related to initiation and termination of the virus rolling-circle replication, inhibition of plant immunity, and regulation of virus gene expression.
- MP and CP are related to the movement and packaging of virions, respectively, but not to replication.
- the genome of the members of the Mastrevirus genus also contains two intergenic regions: a large intergenic region (LIR) and a small intergenic region (SIR).
- the former is a bidirectional promoter and contains a stable stem-loop structure, which may be recognized by Rep/RepA and is a replication starting point of the rolling-circle replication; and the latter is a bi-directional terminator and is related to the formation of a double-stranded DNA intermediate in the process of the replication.
- Because LIR and SIR are the only two cis-acting elements required for virus replication, and Rep/RepA is the only trans-acting factor required, researchers have developed a deconstructed virus replicon (as shown in FIG. 3 ): it contains only LIR and SIR, the remaining portion may be any sequence, and Rep/RepA, expressed either in situ or ectopically, drives the rolling-circle replication of the replicon.
- the present invention provides a method for directed evolution of a genetic element to obtain a mutant of the genetic element with a desired function, and the method includes:
- the genetic element is selected from a protein coding sequence; a functional RNA coding sequence, such as tRNA or siRNA coding sequences; and an expression regulatory sequence such as a promoter sequence, an enhancer sequence, or a terminator sequence.
- the genetic element is derived from a plant, or is expected to be applied in a plant.
- the library of the mutants of the genetic element is obtained by respectively inserting the plurality of the mutants of the genetic element into the vectors containing the geminivirus replicon.
- the plurality of the mutants of the genetic element are generated by random mutagenesis of the genetic element.
- the library is generated by performing random mutagenesis on the genetic element that is already inserted into the vector containing the geminivirus replicon.
- the vector containing the geminivirus replicon is a circular DNA, such as a plasmid or a minicircle DNA.
- the vector containing the geminivirus replicon contains at least one LIR, for example, the LIR contains a nucleotide sequence shown in SEQ ID NO: 1.
- the vector containing the geminivirus replicon further contains at least one SIR, for example, the SIR contains a nucleotide sequence shown in SEQ ID NO: 2.
- the vector containing the geminivirus replicon contains one LIR.
- the vector containing the geminivirus replicon contains two LIRs.
- the mutant of the inserted genetic element is operably linked to an expression regulatory sequence.
- the vector containing the geminivirus replicon further contains an expression cassette of the geminivirus Rep and/or RepA protein.
- the vector containing the geminivirus replicon does not contain the expression cassette of the geminivirus Rep and/or RepA protein.
- the method further includes introducing another vector for expressing the geminivirus Rep and/or RepA protein into the plant cell.
- the population of the plant cells is co-transformed with the another vector for expressing the geminivirus Rep and/or RepA protein and the library.
- the plant cell already contains the vector for expressing the geminivirus Rep and/or RepA protein, and/or the genome of the plant cell is already integrated with the expression cassette of the geminivirus Rep and/or RepA protein.
- the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 3, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 3, preferably comprises an amino acid sequence shown in SEQ ID NO: 4.
- the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 5, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 5, preferably comprises an amino acid sequence shown in SEQ ID NO: 6.
- the number of vector molecules containing the mutants in the library is 10^3 to 10^5 times the number of the cells in the population of the plant cells.
- the detecting and selecting the genetic element mutant enriched in the population of the plant cells may be performed by high-throughput sequencing.
- the method further includes a step iv) of identifying the function of the enriched genetic element mutant.
- the plant is a monocotyledon or a dicotyledon, for example, it is selected from corn, wheat, rice, barley, sorghum, kidney bean, beet, tomato, cassava, cucumber, arabidopsis and tobacco.
- the expression or activity of the geminivirus Rep and/or RepA protein in the plant cell is coupled with the desired function of the genetic element mutant, thereby achieving the directed evolution of the genetic element.
- the genetic element with the desired function activates Rep/RepA expression, thereby driving the rolling-circle replication to achieve self-enrichment; and the genetic element without the desired function cannot activate the Rep/RepA expression to achieve enrichment, thereby the directed evolution of the genetic element is achieved.
- the genetic element is a promoter.
- the method further includes placing a promoter library to be evolved upstream of Rep/RepA in the replicon.
- the genetic element is a cauliflower mosaic virus (CaMV) 35S promoter TATA-box.
- the genetic element is a sequence encoding a transcription activator.
- the method further includes inserting a recognition sequence of the transcription activator upstream of Rep/RepA and inserting a minimal transcription initiation element between the recognition sequence and Rep/RepA; and placing a transcription activator library to be evolved in the replicon.
- the genetic element is a DNA binding domain.
- the method further includes inserting a target binding sequence of the DNA binding domain upstream of Rep/RepA, and inserting a minimal transcription initiation element between the target binding sequence and Rep/RepA; and placing a fusion protein of the DNA binding domain to be evolved and a transcription activator without sequence specificity in the replicon.
- the genetic element is a sequence encoding a recombinase.
- the method further includes dividing Rep/RepA into two portions, and placing the two portions at the two ends of the recombinase recognition sequence; and placing a sequence encoding the recombinase to be evolved in the replicon.
- the method further includes adding a 5′ intron and a 3′ intron between Rep/RepA and the recombinase recognition sequence.
- the genetic element is a prime editing guide RNA (pegRNA).
- the method further includes inserting a target site at the N terminal of Rep/RepA, causing frame-shifting of the open reading frame of Rep/RepA; and inserting an expression cassette of the pegRNA into the geminivirus replicon, and inserting a fluorescence reporter system at its two ends.
- the desired function of the genetic element is coupled with the expression of a nuclease.
- the nuclease is a sequence specific nuclease.
- the genetic element with the desired function activates the expression of the nuclease or guides the nuclease to cut its recognition site, thereby driving the rolling-circle replication to achieve the self-enrichment; and the genetic element without the desired function cannot allow the nuclease to cut its recognition site and the enrichment cannot be achieved, thereby the directed evolution of the genetic element is achieved.
- the genetic element is a DNA binding domain.
- the method further includes fusing a DNA binding domain library to be evolved with a non-sequence specific nuclease, and placing the same in the replicon together with its recognition sequence.
- the genetic element is a sequence encoding a non-sequence specific nuclease.
- the genetic element is a sequence encoding a transcription activator.
- the method further includes inserting a recognition sequence of the transcription activator upstream of the nuclease, and inserting a minimal transcription initiation element between the recognition sequence and the nuclease; and placing a transcription activator library to be evolved in the replicon together with the recognition sequence of the nuclease.
- the genetic element is a sequence encoding a recombinase.
- the method further includes placing a recombinase library to be evolved and the recognition sequence of the nuclease in the replicon; and dividing the nuclease into two portions, and placing the two portions at the two ends of the recombinase recognition sequence.
- the method further includes adding a 5′ intron and a 3′ intron between the nuclease and the recombinase recognition sequence.
- the genetic element is a protospacer adjacent motif (PAM) of a Cas protein.
- the method further includes placing PAM to be evolved and a target sequence of the Cas protein together in the replicon.
- the genetic element is a sgRNA.
- the method further includes placing the sgRNA to be evolved and the target sequence of the Cas protein together in the replicon.
- the present invention further provides a kit for implementing the method of the present invention.
- FIG. 1 shows a rolling-circle replication model of a geminivirus.
- FIG. 2 shows a genome structure of a Mastrevirus genus virus.
- FIG. 3 shows a deconstructed virus replicon strategy for the geminivirus.
- FIG. 4 shows a basic principle of a plant in vivo directed evolution system based on a primary replicon: different alleles (mutants) of a gene of interest (GOI) are placed in a geminivirus replicon, to form a library to be screened; and a desired function of GOI is coupled with expression of Rep/RepA, namely a functional allele with the desired function may cause the expression of Rep/RepA in a plant cell, while a non-functional allele without the desired function may not cause the expression of Rep/RepA in the plant cell.
- FIG. 5 shows a method of using the primary replicon directed evolution system to achieve promoter directed evolution.
- FIG. 6 shows a method of using the primary replicon directed evolution system to achieve transcription activator directed evolution.
- FIG. 7 shows a method of using the primary replicon directed evolution system to achieve DNA binding domain directed evolution.
- FIG. 8 shows a method of using the primary replicon directed evolution system to achieve recombinase directed evolution.
- FIG. 9 shows the principle of secondary replicon formation.
- FIG. 10 shows a basic principle of a plant in vivo directed evolution system based on secondary replicon.
- FIG. 11 shows a method of using the secondary replicon directed evolution system to achieve DNA binding domain directed evolution.
- FIG. 12 shows a method of using the secondary replicon directed evolution system to achieve non-sequence specific nuclease directed evolution.
- FIG. 13 shows a method of using the secondary replicon directed evolution system to achieve transcription activator directed evolution.
- FIG. 14 shows a method of using the secondary replicon directed evolution system to achieve recombinase directed evolution.
- FIG. 15 shows construction of the library screened in Example 1.
- FIG. 16 shows a screening result of Rep Y20 when no replication enhancer is added in Example 1.
- FIG. 17 shows a screening result of Rep Y20 when the replication enhancer is added in Example 1.
- FIG. 18 shows construction of the library screened in Example 2.
- FIG. 19 shows a screening result in Example 2.
- FIG. 20 shows an experimental principle and vector construction of Example 3.
- FIG. 21 shows a screening result in Example 3.
- FIG. 22 shows construction of the library screened in Example 4.
- FIG. 23 shows a screening principle of the PAM library in Example 4.
- FIG. 24 shows screening results of 3 bases at a 3′ end of a PAM library sequence in Example 4.
- FIG. 25 shows a sequence identification diagram of 6 bases in the PAM library sequence in Example 4.
- FIG. 26 shows an experimental principle and vector construction of Example 5.
- FIG. 27 shows a screening result in Example 5.
- FIG. 28 shows a schematic diagram of a directed evolution principle of a base editor, wherein the plant cell is co-transformed with a base editor mutant expression library, an inactivated Rep/RepA expression vector, and a sgRNA expression construct targeting the inactivated Rep/RepA coding sequence. When a base editor mutant has the desired base editing activity, it may correct the inactivated Rep/RepA to an active Rep/RepA, so that the mutant is enriched.
- FIG. 29 shows a schematic diagram of a directed evolution principle of a recombinase, wherein the plant cell is co-transformed with a recombinase mutant expression library, and an expression vector in which the Rep/RepA gene is in reverse orientation relative to its promoter. When a recombinase mutant has the desired activity, it may invert the reversed Rep/RepA gene (inversion), so that the gene is driven and expressed by the promoter to achieve enrichment of the recombinase mutant.
- the present invention provides a method for directed evolution of a genetic element to obtain a mutant of the genetic element with a desired function, and the method includes:
- replication level of the geminivirus replicon in the plant cell is configured to be associated with the desired function of the genetic element mutant.
- the term “genetic element” refers to a nucleotide sequence/nucleic acid molecule that may achieve a specific function in a cell, preferably in a plant cell.
- Genetic elements include, but are not limited to, a protein coding sequence, a functional RNA (such as tRNA and siRNA) coding sequence, and an expression regulatory sequence such as a promoter sequence, an enhancer sequence, or a terminator sequence.
- the genetic element is derived from a plant, or is expected to be applied in a plant.
- the term “library” is used with its known meaning in the field of cell biology and molecular biology, and it refers to a collection of different nucleic acid fragments/nucleic acid molecules.
- a specific type of the library is a library containing random mutants generated by random mutagenesis.
- Another example is a designed (or synthesized) library, and it contains different specially engineered nucleic acid fragments/nucleic acid molecules.
- the library of mutants of the genetic element is obtained by respectively inserting a plurality of the mutants of the genetic element into a vector containing the geminivirus replicon. In some embodiments, the plurality of the mutants of the genetic element is generated by the random mutagenesis.
- the library may be generated by performing random mutagenesis on the genetic element that has been inserted into the vector containing the geminivirus replicon.
- “Random mutagenesis” is used with its meaning known in the field of cell biology and molecular biology; it refers to a method in which DNA mutations are introduced randomly to generate mutant genes and proteins. Many of these mutant genes may then be compiled into a library.
- Non-limiting examples of random mutagenesis methods are error-prone polymerase chain reaction (PCR), ultraviolet (UV) radiation and chemical mutagenesis.
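- As an in silico illustration only, the following minimal sketch mimics random point mutagenesis by substituting each base of a parent sequence independently with a fixed probability. The parent sequence, the mutation rate, the library size and the function names `mutate` and `make_library` are hypothetical and are not taken from the present disclosure; the sketch does not model any particular wet-lab mutagenesis protocol.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick one of the three other bases so every hit is a real change.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def make_library(parent, size, rate, seed=0):
    """Return `size` independently mutated copies of `parent`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [mutate(parent, rate, rng) for _ in range(size)]

# Hypothetical parent sequence and parameters, for illustration only.
parent = "TATATAAGGAAGTTCATTTCATTTGGAGAGG"
library = make_library(parent, size=1000, rate=0.05)
print(len(set(library)), "distinct variants out of", len(library))
```

- Each member of such a list corresponds to one mutant that would be inserted into a vector containing the geminivirus replicon.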
- “Geminivirus” is a DNA virus that infects plants, and it is a virus having 1 or 2 single-stranded circular DNA molecules.
- Examples of the geminivirus include, but are not limited to: maize streak virus (MSV), wheat dwarf virus (WDV), bean yellow dwarf virus (BeYDV) and other viruses belonging to the Mastrevirus genus, beet curly top virus (BCTV) and other viruses belonging to the beet curly top virus genus, tomato pseudo-curly top virus (TPCTV) and other viruses belonging to the tomato pseudo-curly top virus genus, as well as bean golden mosaic virus (BGMV), African cassava mosaic virus (ACMV), squash leaf curl virus (SLCV), tomato golden mosaic virus (TGMV), tomato yellow leaf curl virus (TYLCV) and the like.
- the geminivirus is WDV.
- the plant of the present invention may be a monocotyledon or a dicotyledon, as long as the geminivirus replicon may be replicated in its cell.
- the suitable plants include, but are not limited to, corn, wheat, rice, barley, sorghum, kidney bean, beet, tomato, cassava, cucumber, arabidopsis, tobacco and the like.
- the plant cell is an isolated plant cell. In some preferred embodiments, the plant cell is a protoplast cell.
- the plant cell is a cell in a plant tissue or a plant organ or a plant body, namely the cell is not isolated from the plant tissue or the plant organ or the plant body.
- the plant cell may be a cell in a leaf.
- the “replication level” of the geminivirus replicon may be determined by detecting the copy number of the geminivirus replicon.
- Methods for detecting the copy number of the geminivirus replicon are known in the art, including but not limited to PCR (such as fluorescence quantitative PCR) method or sequencing (such as deep sequencing) method.
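- Where deep sequencing is used, one simple way to compare replicon abundance is to compute, for each mutant, the fold change of its relative read frequency after screening versus before screening. The following is a minimal sketch under stated assumptions: the read counts are invented for illustration, and the function name `enrichment` and the pseudocount are hypothetical choices rather than part of the disclosed method.

```python
def enrichment(before, after, pseudocount=1.0):
    """Fold change of each variant's relative read frequency (after / before).

    `before` and `after` map variant names to read counts. A pseudocount is
    added to every variant so that unseen variants do not divide by zero.
    """
    variants = set(before) | set(after)
    n_before = sum(before.values()) + pseudocount * len(variants)
    n_after = sum(after.values()) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        f_before = (before.get(v, 0) + pseudocount) / n_before
        f_after = (after.get(v, 0) + pseudocount) / n_after
        scores[v] = f_after / f_before
    return scores

# Invented read counts, for illustration only.
reads_before = {"mutA": 100, "mutB": 100, "mutC": 100}
reads_after = {"mutA": 900, "mutB": 50, "mutC": 50}
scores = enrichment(reads_before, reads_after)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

- A score above 1 indicates that the variant's share of reads increased, which is consistent with a higher copy number of the corresponding replicon.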
- the vector containing the geminivirus replicon is a circular DNA, such as a double-stranded or single-stranded circular DNA. In some embodiments, the vector containing the geminivirus replicon is a plasmid. In some embodiments, the vector containing the geminivirus replicon is a minicircle DNA.
- the vector containing the geminivirus replicon contains at least one LIR.
- the vector containing the geminivirus replicon further contains at least one SIR, such as one SIR.
- the vector containing the geminivirus replicon contains one LIR. In this case, the entire vector is replicated as the geminivirus replicon replicates.
- the vector containing the geminivirus replicon contains two LIRs. In some embodiments, one SIR is contained between the two LIRs. In this case, the sequence from the first LIR to the second LIR (containing SIR) is replicated as the geminivirus replicon replicates. Preferably, the genetic element mutant is located between the two LIRs.
- the LIR comprises a nucleotide sequence shown in SEQ ID NO:1. In some embodiments, the SIR comprises a nucleotide sequence shown in SEQ ID NO: 2.
- the inserted mutant of the genetic element is operably linked to an expression regulatory sequence.
- the “expression regulatory sequence” and “expression regulatory element” may be interchangeably used, and refer to nucleotide sequences that are located at upstream (5′ non-coding sequence), middle or downstream (3′ non-coding sequence) of the coding sequence, and affect the transcription, RNA processing or stability or translation of the relevant coding sequence.
- the plant expression regulatory element refers to a nucleotide sequence that may control the transcription, RNA processing or stability or translation of the interested nucleotide sequence in a plant.
- the expression regulatory sequence may include, but is not limited to, a promoter, a translation leader sequence, an intron, and a polyadenylation recognition sequence.
- the “promoter” refers to a nucleic acid fragment that may control the transcription of another nucleic acid fragment.
- the promoter is a promoter that may control gene transcription in the plant cell, regardless of whether it is derived from the plant cell.
- the promoter may be a constitutive promoter or a tissue specific promoter or a developmental regulatory promoter or an inducible promoter.
- the vector containing the geminivirus replicon also contains an expression cassette of a geminivirus Rep and/or RepA protein.
- the expression cassette of the geminivirus Rep and/or RepA protein usually contains a nucleotide sequence encoding the geminivirus Rep and/or RepA protein and an expression regulatory element operably linked thereto.
- the vector containing the geminivirus replicon does not contain the expression cassette of the geminivirus Rep and/or RepA protein. Therefore, the geminivirus Rep and/or RepA protein needs to be provided in a trans manner.
- the method further includes introducing another vector for expressing the geminivirus Rep and/or RepA protein into the plant cell.
- the vector for expressing the geminivirus Rep and/or RepA protein usually contains the expression cassette of the geminivirus Rep and/or RepA protein.
- the population of the plant cells is co-transformed with the another vector for expressing the geminivirus Rep and/or RepA protein and the library.
- the plant cell already contains the vector for expressing the geminivirus Rep and/or RepA protein, and/or the genome of the plant cell is already integrated with the expression cassette of the geminivirus Rep and/or RepA protein.
- the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 3, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 3, for example SEQ ID NO: 4.
- the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 5, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 5, for example SEQ ID NO: 6.
- the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 4.
- the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 6.
- “the replication level of the geminivirus replicon in the plant cell is configured to be associated with the desired function of the genetic element mutant” means that the replication level of the geminivirus replicon of the genetic element mutant having the desired function in the plant cell is higher than, preferably, significantly higher than the replication level of the geminivirus replicon of the genetic element mutant without the desired function in the plant cell.
- the genetic element mutant having the desired function causes the replication of the geminivirus replicon in the plant cell, while the genetic element mutant without the desired function does not cause the replication of the geminivirus replicon in the plant cell; or preferably, the genetic element mutant having the desired function causes high level replication of the geminivirus replicon in the plant cell, while the genetic element mutant without the desired function causes low level replication or non-replication of the geminivirus replicon in the plant cell.
- Because the geminivirus replicon containing the genetic element mutant having the desired function is amplified or significantly amplified, due to the replication or the high level replication, compared with other geminivirus replicons that do not contain a genetic element mutant having the desired function, the enrichment of the genetic element mutant having the desired function may be achieved.
- the Rep and/or RepA protein is a replication initiation protein of the geminivirus, and its activity or expression level is usually positively correlated with the replication level (such as the copy number) of the geminivirus within a certain range. Therefore, in some embodiments, “the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell may be configured to be associated with the desired function of the genetic element mutant”. For example, the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell containing the genetic element mutant having the desired function may be higher than, preferably, significantly higher than the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell containing the genetic element mutant without the desired function.
- the activity of the geminivirus Rep and/or RepA protein is the activity of mediating (initiating) the replication of the geminivirus replicon, and it may be determined, for example, by detecting the replication level of the geminivirus replicon.
- the genetic element mutant having the desired function may cause the expression of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes no expression of the geminivirus Rep and/or RepA protein in the plant cell; or the genetic element mutant having the desired function may cause the high level expression of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes low level expression or no expression of the geminivirus Rep and/or RepA protein in the plant cell.
- the genetic element mutant having the desired function may cause the geminivirus Rep and/or RepA protein to be active in the plant cell, while the genetic element mutant without the desired function causes the geminivirus Rep and/or RepA protein to be inactive in the plant cell; or the genetic element mutant having the desired function may cause the high activity of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes low activity or no activity of the geminivirus Rep and/or RepA protein in the plant cell.
- the expression or activity or high level expression or high activity of the geminivirus Rep and/or RepA protein in the plant cell may cause the amplification or significant amplification of the geminivirus replicon, thereby the enrichment of the genetic element mutant having the desired function is achieved.
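- The enrichment logic described above can be illustrated with a deliberately simple toy model, in which the replicon of each mutant is multiplied by a fixed factor per replication round. The per-round factors, the number of rounds, the starting copy numbers and the function name `simulate_enrichment` are all hypothetical assumptions; real replication kinetics in a plant cell are not claimed to follow this model.

```python
def simulate_enrichment(initial_copies, fold_per_round, rounds):
    """Toy model: multiply each variant's copy number by its per-round factor,
    then return each variant's share of the total replicon pool."""
    copies = dict(initial_copies)
    for _ in range(rounds):
        for variant in copies:
            copies[variant] *= fold_per_round[variant]
    total = sum(copies.values())
    return {variant: n / total for variant, n in copies.items()}

# Hypothetical numbers: one functional mutant among 99 non-functional ones.
initial = {"functional": 1, "nonfunctional": 99}
# Assumption: only the functional mutant triggers Rep/RepA-driven replication.
fold = {"functional": 10, "nonfunctional": 1}

for r in (0, 1, 2, 3):
    share = simulate_enrichment(initial, fold, r)["functional"]
    print(r, round(share, 3))
```

- In this toy setting, a mutant that starts as a small minority of the pool becomes the majority after a few rounds, which is the qualitative behavior the coupling to Rep/RepA is intended to produce.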
- “Low level” or “low activity” mentioned herein is relative to the “high level” or “high activity”, and does not necessarily mean that it is lower than the normal level or normal activity.
- the replication level of the geminivirus in the plant cell or the activity or expression level of the Rep and/or RepA protein of the geminivirus in the plant cell may be configured as directly or indirectly associated with the desired function of the genetic element mutant. Those skilled in the art may achieve such association according to the type of a genetic element and the desired specific function of the mutant.
- “Expression level” is used with its meaning known in the field of cell biology and molecular biology; it refers to the transcription level of a DNA fragment and/or the translation level of its derived mRNA.
- the genetic element is an expression regulatory element (such as a promoter or an enhancer).
- the coding sequence of the geminivirus Rep and/or RepA protein may be directly placed under the control of the expression regulatory element mutant (such as a promoter mutant and an enhancer mutant). If the mutant can enhance gene expression, it may cause increased expression of the Rep and/or RepA protein, and the increased expression of the Rep and/or RepA protein may cause increased replication of the geminivirus replicon and the corresponding expression regulatory element mutant (such as the promoter mutant) in turn.
- Thus, an expression regulatory element mutant that enhances gene expression, namely an evolved expression regulatory element, can be obtained.
- the genetic element is a protein coding sequence
- the desired function of the protein encoded by it may be associated with the activity or expression level of the geminivirus Rep and/or RepA protein.
- the plant cell may be co-transformed with a base editor mutant library constructed in a vector containing the geminivirus replicon, an expression vector containing the coding sequence of an inactivated Rep/RepA protein specifically designed for the desired function of the base editor, and a sgRNA expression construct targeting the coding sequence of the inactivated Rep/RepA.
- When the base editor mutant in the plant cell has the desired base editing activity, it may correct the specifically designed inactivated Rep/RepA to an active Rep/RepA, thereby inducing the replication of the geminivirus replicon, so that the mutant may be enriched.
- the plant cell may be co-transformed with a library of the recombinase mutants constructed in a vector containing the geminivirus replicon, and an expression vector in which the Rep/RepA coding sequence is in reverse orientation relative to its promoter. When the recombinase mutant in the plant cell has the desired activity, it may invert the reversed Rep/RepA gene (inversion), so that the Rep/RepA gene is driven and expressed by the promoter, and the replication of the geminivirus replicon is induced to achieve enrichment of the recombinase mutant.
- the number of vector molecules containing the mutants in the library is 10^3 to 10^5 times the number of the cells in the population of the plant cells. This ratio may reduce the probability that multiple different vector molecules are transformed into the same cell and reduce the screening background, while the transformation efficiency is guaranteed.
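- The trade-off between transformation efficiency and screening background can be sketched with a simple probabilistic model. The following assumes, purely for illustration, that the number of vector molecules taken up by a cell follows a Poisson distribution whose mean is the vector-to-cell ratio multiplied by a per-molecule uptake probability. The Poisson assumption, the value of `uptake_prob` and the function names are hypothetical and are not stated in the present disclosure.

```python
import math

def transformed_fraction(lam):
    """P(a cell takes up at least one vector) for a Poisson mean `lam`."""
    return 1.0 - math.exp(-lam)

def multi_vector_fraction(lam):
    """Among transformed cells, the fraction that took up two or more vectors."""
    p_at_least_one = 1.0 - math.exp(-lam)
    p_at_least_two = p_at_least_one - lam * math.exp(-lam)
    return p_at_least_two / p_at_least_one

vectors_per_cell = 1e4  # within the 10^3 to 10^5 range stated above
uptake_prob = 1e-5      # hypothetical per-molecule uptake probability
lam = vectors_per_cell * uptake_prob  # mean vectors taken up per cell

print("transformed cells:", transformed_fraction(lam))
print("of those, with 2+ vectors:", multi_vector_fraction(lam))
```

- Under this model, lowering the mean uptake reduces the share of transformed cells that carry more than one vector, at the cost of fewer transformed cells overall.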
- “Primary replicon” refers to a replicon formed in a plant geminivirus system when Rep/RepA recognizes the LIRs arranged in tandem on the vector and circularizes the sequence.
- the primary replicon is amplified by rolling-circle replication.
- the directed evolution of the genetic element is accomplished as follows: the expression or activity of the geminivirus Rep and/or RepA protein in the plant cell is coupled with the desired function of the genetic element mutant.
- the directed evolution of the genetic element is accomplished as follows: the genetic element having the desired function activates Rep/RepA expression, thereby driving the rolling-circle replication to achieve self-enrichment; and the genetic element without the function cannot activate the Rep/RepA expression, so the enrichment cannot be achieved.
- the genetic element is a promoter.
- the method i) further includes placing a promoter library to be evolved upstream of Rep/RepA in the replicon.
- the promoter having the function may drive the expression of downstream Rep/RepA, drive its own rolling-circle replication, increase the copy number, and achieve the self enrichment; and the promoter without the function may not drive the expression of the downstream Rep/RepA, and may not achieve the enrichment, thereby the directed evolution of the promoter is achieved.
- the genetic element is a CaMV 35S promoter TATA-box.
- “Transcription activator” refers to a DNA binding protein that may activate gene expression.
- the transcription activator binds to an upstream promoter element to regulate the transcription process.
- the genetic element is a sequence encoding the transcription activator.
- the method i) further includes inserting a recognition sequence of the transcription activator upstream of Rep/RepA, and inserting a minimal promoter between the recognition sequence and Rep/RepA; and placing a transcription activator library to be evolved in the replicon.
- the genetic element is a DNA binding domain.
- the method i) further includes inserting a target binding sequence of the DNA binding domain upstream of Rep/RepA, and inserting a minimal transcription initiation element between the target binding sequence and Rep/RepA; and placing a fusion protein formed by the DNA binding domain to be evolved and the transcription activator without sequence specificity in the replicon.
- the transcription activator having the desired function may bind to its recognition sequence, and activate the expression of the downstream Rep/RepA, thereby the rolling-circle replication is driven to achieve the self-enrichment; and the transcription activator without the function cannot activate the downstream Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the transcription activator is achieved.
- the term “recombinase” refers to an enzyme involved in the process of gene directed recombination. It is responsible for identifying and cutting a specific recombination site, and linking two molecules involved in recombination.
- the genetic element is a sequence encoding the recombinase.
- the method i) further includes dividing Rep/RepA into two portions, and placing to two ends of a recombinase recognition sequence; and placing a sequence encoding the recombinase to be evolved in the replicon.
- the method i) further includes adding a 5′ intron and a 3′ intron between Rep/RepA and the recombinase recognition sequence.
- the recombinase with the desired function may recognize its specific recognition site, mediate the DNA recombination, and normally express Rep/RepA, thereby the rolling-circle replication is driven to achieve self enrichment; and the recombinase without the desired function cannot mediate the DNA recombination, cannot express Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the recombinase is achieved.
- the genetic element is a prime editing guide pegRNA.
- the method i) further includes inserting a target site to N terminal of Rep/RepA to allow frame-shifting of the open reading frame of Rep/RepA; and inserting an expression cassette of pegRNA into the geminivirus replicon, and inserting fluorescence reporter systems into its two ends. If the virus replicon generates the rolling-circle replication under the action of the pegRNA, a fluorescence signal is reported; and if the virus replicon does not generate the rolling-circle replication, there is no fluorescence signal.
- active pegRNA is significantly enriched, because the low concentration may guarantee that, for most cells, only one vector enters the cell, and the screening requirements are satisfied.
- nuclease refers to a class of enzymes that catalyze the hydrolysis of phosphodiester bonds, using a nucleic acid as a substrate.
- the desired function of the genetic element is coupled with nuclease expression.
- the nuclease is a sequence specific nuclease.
- secondary replicon refers to a replicon formed as follows: the primary replicon of the geminivirus generates a double-stranded DNA break (DSB) under the action of the sequence specific nuclease, and the break may be linked with a right border (RB) of the plasmid under the guidance of VirD2, thereby a replicon is formed under the action of Rep/RepA.
- DSB double-stranded DNA break
- RB right border
- the directed evolution of the genetic element is accomplished as follows: the genetic element with the desired function activates the expression of the nuclease or guides the nuclease to cut its recognition site, thereby the secondary replicon is formed, and the rolling-circle replication is driven to achieve the self enrichment; and the genetic element without the function may not allow the nuclease to cut its recognition site, thereby the secondary replicon cannot be formed, and the enrichment cannot be achieved.
- the genetic element is a DNA binding domain.
- the method i) further includes fusing a DNA binding domain library to be evolved with a non-sequence specific nuclease, and placing in the replicon together with the recognition sequence thereof.
- the DNA binding domain with the desired function may guide the nuclease to cut the target sequence, and generate a secondary replicon under the action of virD2; and the DNA binding domain without the desired function cannot guide the nuclease to cut the target sequence, thereby the secondary replicon cannot be generated.
- the directed evolution of the DNA binding domain is achieved.
- the genetic element is a sequence encoding a non-sequence specific nuclease.
- the genetic element is a sequence encoding a transcription activator.
- the method i) further includes inserting a recognition sequence of the transcription activator to upstream of the nuclease, and inserting a minimal transcription initiation element between the recognition sequence and the nuclease; and placing a transcription activator library to be evolved in the replicon together with the recognition sequence of the nuclease.
- the transcription activator with the desired function may activate the expression of the nuclease, which in turn cuts its recognition sequence, and the secondary replicon is formed under the action of virD2; and the transcription activator without the desired function cannot activate the expression of the nuclease, and thereby the secondary replicon cannot be generated.
- the directed evolution of the transcription activator may be achieved.
- the genetic element is a sequence encoding a recombinase.
- the method i) further includes placing a recombinase library to be evolved and a recognition sequence of a nuclease in the replicon; and dividing the nuclease into two portions which are placed at two ends of a recognition sequence of the recombinase.
- the method i) further includes adding a 5′ intron and a 3′ intron between the nuclease and the recombinase recognition sequence.
- the recombinase with the desired function may mediate the DNA recombination to express the nuclease, which in turn cuts its recognition site to generate a secondary replicon; and the recombinase without the desired function cannot mediate the DNA recombination, the nuclease cannot be expressed normally, and the secondary replicon cannot be generated.
- the directed evolution of the recombinase may be achieved.
- the genetic element is PAM of a Cas protein.
- the method further includes placing PAM to be evolved and a target sequence of the Cas protein in the replicon together.
- PAM that may be recognized by the Cas may generate DSB in the target region to form a secondary replicon, and information of PAM may be preserved in the secondary replicon; and PAM that cannot be recognized by Cas cannot generate a DSB, and the secondary replicon cannot be formed.
- the directed evolution of PAM may be achieved.
- the genetic element is a sgRNA.
- the method further includes placing sgRNA to be evolved and the target sequence of a Cas protein in the replicon together.
- sgRNA with the desired activity can guide Cas to cut a target site located at its downstream so as to form a secondary replicon; and sgRNA without the activity cannot generate a DSB, so the secondary replicon cannot be generated.
- the directed evolution of sgRNA may be achieved.
- the detecting and selecting of the genetic element mutants enriched in the population of the plant cells may be performed by high-throughput sequencing.
- the total DNA of the population of the plant cells may be extracted, and high-throughput sequencing may be performed for the genetic element.
- the method further includes a step iv) of identifying the function of the enriched genetic element mutant.
- the present invention provides a genetic element mutant or a coding product thereof obtained by the method of the present invention, and the use of the obtained genetic element mutant or the coding product thereof in plants, especially in plant genetic engineering.
- the present invention provides a kit for implementing the method of the present invention.
- the kit may include, for example, a vector containing the geminivirus replicon, and/or a vector for expressing the geminivirus Rep and/or RepA protein.
- the kit may further include a specification for implementing the method of the present invention.
- WDV and BeYDV replication subsystems are developed.
- the two viruses belong to the Mastrevirus genus, the genome structures are very similar, and they may achieve the high-efficient genome amplification in monocotyledons and dicotyledons respectively.
- LIRs that exist in tandem on a vector may be recognized by Rep/RepA and cyclized into a primary replicon (PR), and then rolling-circle replication is performed, so the copy number may be increased by about 3 orders of magnitude.
- Rep/RepA is the only protein required for this process.
- a screening library of GOI (it may be generated by error prone PCR or saturation mutation) may be cloned into the geminivirus replicon, and supplemented by other elements, so that a desired function of GOI is coupled with the expression of Rep/RepA, and thus a plant in vivo directed evolution system based on the primary replicon of the geminivirus is constructed (as shown in FIG. 4 ).
- a target gene allele with the desired function may directly or indirectly drive the expression of Rep/RepA, thereby it is enriched by itself; and the allele without the desired function may not start the expression of Rep/RepA, and it may not be enriched by itself.
- by deep sequencing, it may be inferred which allele has the function, thereby the purpose of evolution is achieved.
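The comparison of allele frequencies before and after selection can be sketched as follows. This is a minimal illustration, not the inventors' analysis pipeline: the function name `log2_enrichment`, the pseudocount and the allele counts are all hypothetical.

```python
import math
from collections import Counter


def log2_enrichment(initial, selected, pseudocount=0.5):
    """Return {allele: log2(frequency after selection / frequency in initial library)}.

    A pseudocount (an assumption of this sketch) avoids division by zero
    for alleles missing from one of the two samples.
    """
    alleles = set(initial) | set(selected)
    n_init = sum(initial.values()) + pseudocount * len(alleles)
    n_sel = sum(selected.values()) + pseudocount * len(alleles)
    scores = {}
    for allele in alleles:
        f_init = (initial.get(allele, 0) + pseudocount) / n_init
        f_sel = (selected.get(allele, 0) + pseudocount) / n_sel
        scores[allele] = math.log2(f_sel / f_init)
    return scores


# Made-up read counts for three hypothetical alleles of a GOI.
initial_library = Counter({"allele_A": 100, "allele_B": 100, "allele_C": 100})
after_selection = Counter({"allele_A": 900, "allele_B": 50, "allele_C": 50})

scores = log2_enrichment(initial_library, after_selection)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Alleles with a positive score are over-represented after selection and are candidates for having the desired function; a negative score indicates depletion.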
- the inventor expects that the directed evolution of the genetic element such as a promoter, a transcription activator, a DNA binding protein, and a recombinase may be achieved.
- the promoter library to be evolved may be placed to upstream of Rep/RepA.
- the promoter with function may drive the expression of downstream Rep/RepA, drive the own rolling-circle replication, increase the copy number, and achieve the enrichment; and the promoter without function cannot drive the expression of the downstream Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the promoter is achieved (as shown in FIG. 5 ).
- the recognition sequence of the transcription activator may be inserted to upstream of Rep/RepA, and supplemented by a minimal transcription initiation element; and a transcription activator library to be evolved is inserted into the replicon.
- the transcription activator with function may bind to its recognition sequence, and activate the expression of the downstream Rep/RepA, thereby the rolling-circle replication is driven to achieve the self enrichment; and the transcription activator without function may not activate the downstream Rep/RepA, and may not achieve the enrichment, thereby the directed evolution of the transcription activator is achieved (as shown in FIG. 6 ).
- the DNA binding domain to be evolved and a transcription activator without sequence specificity may be combined to form a fusion protein, and placed in the replicon together; and the target binding sequence of the DNA binding domain is inserted to upstream of Rep/RepA, and supplemented by a mini-promoter.
- the DNA binding domain having function may bind to its target sequence, and bring the transcription activator to the mini-promoter, the expression of the downstream Rep/RepA is driven, and the rolling-circle replication is driven to achieve the self enrichment; and the DNA binding domain without function cannot bind to the target sequence, and the downstream Rep/RepA cannot be activated, so that the enrichment cannot be achieved, thereby the directed evolution of the DNA binding domain is achieved (as shown in FIG. 7 ).
- the recombinase to be evolved may be placed in the replicon; and Rep/RepA is divided into two portions and placed to two ends of the recombinase recognition sequence.
- a 5′ intron and a 3′ intron may be added between Rep/RepA and the recombinase recognition sequence, so that the recombinase recognition sequence may be cut off after transcription, and Rep/RepA is translated normally.
- the recombinase with function may recognize its specific recognition site, mediate the DNA recombination, and normally express Rep/RepA, thereby the rolling-circle replication is driven to achieve self enrichment; and the recombinase without function cannot mediate the DNA recombination, cannot express Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the recombinase is achieved (as shown in FIG. 8 ).
- a series of Vir proteins encoded by the Agrobacterium tumefaciens may recognize a RB sequence on a Ti plasmid, and generate a single-strand DNA break nick at a specific position on it. Then, the VirD2 protein may covalently bind to the 5′ DNA end of the nick, release a T-DNA sequence which is transferred to a plant cell nucleus.
- VirD2 may recognize a DSB spontaneously generated on the plant genome, and link the T-DNA sequence to it by non homologous end joining (NHEJ) and other modes under the action of a series of host factors, so as to insert the T-DNA sequence into the plant genome.
- NHEJ non homologous end joining
- the directed evolution relying on the secondary replicon also has the following advantages: first, in order to guarantee that most cells receive only one vector molecule in the evolution process, the concentration of the target gene library must be kept very low, but this also means that the initial expression of GOI is very low, and it may not be enough to meet the screening requirements.
- GOI may firstly generate a first round of the rolling-circle replication under the action of Rep to form the primary replicon, and in this process, the copy number of GOI may be increased by three orders of magnitude in a short time period, and the expression is greatly increased, so that it would be enough to meet the next screening step.
- the secondary replicon allows for enriching twice: firstly, the secondary replicon is generated under the action of the sequence specific nuclease; and secondly, under the action of Rep/RepA, the secondary replicon generates the second round of the rolling-circle replication, and the copy number is greatly increased. All these make the directed evolution system relying on the secondary replicon have extraordinary advantages.
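The requirement that most cells receive only one vector molecule can be illustrated with a simple probability sketch. It assumes, purely for illustration, that the number of vector molecules entering a cell follows a Poisson distribution; this model and the function name are not taken from the source.

```python
import math


def single_vector_fraction(mean_vectors_per_cell):
    """Fraction of transformed cells (at least one vector) that carry exactly one vector.

    Assumes the number of vectors per cell is Poisson distributed,
    which is an assumption of this sketch.
    """
    lam = mean_vectors_per_cell
    if lam <= 0:
        raise ValueError("mean_vectors_per_cell must be positive")
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = -math.expm1(-lam)  # 1 - exp(-lam), computed accurately
    return p_exactly_one / p_at_least_one


low = single_vector_fraction(0.01)   # very dilute library
high = single_vector_fraction(3.0)   # concentrated library
```

Under this assumption, a dilute library gives almost exclusively single-vector cells, while a concentrated library gives mostly cells carrying several vectors.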
- the secondary replicon may be used to screen or evolve sequence specific nucleases in the plants with high throughput (as shown in FIG. 10 ), and may also be used to research a cutting mode of the sequence specific nuclease (such as PAM of a Cas nuclease) and guide RNA (such as sgRNA of Cas9 or crRNA of Cas12a).
- the genetic element that may be coupled with nuclease expression may be evolved with high throughput.
- a DNA binding domain may be evolved.
- a DNA binding domain library to be evolved is fused with a non-sequence specific nuclease and placed in the replicon together with target sequence thereof.
- the DNA binding domain with function may guide the nuclease to cut the target sequence and generate the secondary replicon under the action of virD2; and the DNA binding domain without function cannot guide the nuclease to cut the target sequence, thereby the secondary replicon cannot be generated.
- the directed evolution of the DNA binding domain is achieved (as shown in FIG. 11 ).
- the directed evolution of the non-sequence specific nuclease may also be achieved (as shown in FIG. 12 ).
- the secondary replicon may also be used to achieve the directed evolution of a transcription activator.
- the recognition sequence of the transcription activator may be placed upstream of the nuclease, and supplemented by a minimal transcription initiation element (mini-promoter); and a transcription activator library to be evolved is placed in the replicon together with the recognition sequence of the nuclease.
- mini-promoter minimal transcription initiation element
- the directed evolution of the transcription activator may be achieved (as shown in FIG. 13 ).
- the directed evolution of a recombinase may also be performed by using the directed evolution system relying on the secondary replicon.
- a recombinase library to be evolved and the recognition sequence of the nuclease are placed in the replicon together, and the nuclease is divided into two portions and placed on two ends of a recognition sequence of the recombinase.
- a 5′ intron and a 3′ intron may be added between the nuclease and the recombinase recognition sequence, so that the recombinase recognition sequence may be cut off after transcription and the nuclease is translated normally.
- the recombinase with function may mediate the DNA recombination to express the nuclease which will cut its recognition site to generate the secondary replicon; and the recombinase without function cannot mediate the DNA recombination, the nuclease cannot be expressed normally, and the secondary replicon cannot be generated.
- the directed evolution of the recombinase may be achieved (as shown in FIG. 14 ).
- Wheat seeds were planted in a culture room and cultured at a temperature of 25 ± 2 °C, an illuminance of 1000 Lx and 14-16 h/d of light for about 1-2 weeks.
- a layer of filter paper was placed in a petri dish and soaked with water, and tobacco seeds were sprinkled on the filter paper and cultured under light at 22 °C for about 5 days. Sprouted seedlings were transplanted into a culture bowl and cultured for 4 weeks under the conditions of 22 ± 2 °C, 1000 Lx illuminance and 14-16 h/d of light.
- Agrobacterium tumefaciens carrying the target plasmid was inoculated into a Luria-Bertani (LB) medium containing kanamycin and rifampicin, and cultured overnight with shaking at 28° C.
- LB Luria-Bertani
- 0.3 ml of turbid Agrobacterium culture was re-inoculated into 6 ml of fresh medium and cultured with shaking at 28° C. for 4-6 hours. When the culture reached an optical density (OD) of 0.6-1.0, it was centrifuged, and the bacterial cells were collected and resuspended in tobacco infection solution. The OD was adjusted to the target concentration (which should not exceed 1.6), and the suspension was incubated in the dark at room temperature for 30 min to 3 h. Flat and healthy tobacco leaves were selected, and the incubated bacterial suspension was injected with a syringe. Samples were taken for analysis after 48 h-96 h.
- OD optical density
- Example 2 primers 35Sp-200F and WDV-Rep-5R were used for performing the first round of PCR amplification on DNA. Barcode primers ngs35Sp-300F and ngsWDV-Rep-100R were used for performing the second round of the amplification on the first round PCR product. In Example 2, primers 35Sp-200F and WDV-Rep-100R were used for performing the first round of PCR amplification on DNA. Barcode primers ngs35Sp-250F and ngsWDV-Rep-50R were used for performing the second round of the amplification on the first round PCR product.
- The second round PCR products were recovered, mixed in equal proportions across all treatments, and sent to the company for library preparation and deep sequencing.
- Example 1 Evolution of 20th Amino Acid of Rep/RepA Achieved by Using Plant Directed Evolution System Based on Geminivirus
- the 20th amino acid of wild-type WDV Rep/RepA is a tyrosine, and this amino acid is highly conserved in this genus.
- Previous research shows that the Rep/RepA Y20C mutant does not have replication initiation activity.
- the inventors firstly attempted to evolve the 20th amino acid of Rep/RepA.
- the codon of the 20th amino acid of Rep/RepA was converted from TAT to NNN by a PCR method, and then a library with diversity of 64 (4³) was obtained by cloning the PCR fragment onto the geminivirus vector (as shown in FIG. 15). Then, the library was transformed into wheat protoplasts at different concentrations (10 μg to 0.1 ng, divided into 10 concentration gradients). After 48 h, protoplast DNA was extracted, and the site was deep sequenced. Sequencing results of each concentration were compared with the results of the initial library.
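The NNN codon library described above can be enumerated in a few lines. The sketch below uses the standard genetic code (general knowledge, not taken from the source) to show which residue each of the 64 codons encodes at position 20; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Every member of the NNN library at the codon of the 20th amino acid.
library = sorted(CODON_TABLE)

# Number of library members encoding each residue (or a stop).
residue_counts = Counter(CODON_TABLE.values())

wild_type = CODON_TABLE["TAT"]  # tyrosine (Y), the wild-type residue
y20c = CODON_TABLE["TGT"]       # cysteine (C), one codon giving the Y20C mutant
```

Because several codons encode the same residue, the 64 DNA variants correspond to only 20 amino acids plus stop codons, which is worth keeping in mind when sequencing results are interpreted at the protein level.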
- the probability that a functional allele and a non-functional allele are co-transformed into the same cell is increased, which may cause the amplification of the non-functional allele. This may result in very high background noise.
- this replication enhancer may not independently initiate the rolling-circle replication, and on the other hand, while the Rep/RepA expression quantity is relatively low, the existence of this replication enhancer may greatly increase the copy number of the replicon.
- previous research shows that Rep/RepA of the geminivirus is a multifunctional protein: in addition to initiating the rolling-circle replication, it is also an inhibitor of post transcriptional gene silencing (PTGS) and a transcription activator of viral coding genes, and it may interact with endogenous proteins, allowing a mature cell to regain the ability of high-level DNA replication, and the like. Based on this, the inventor attempted to find a Rep/RepA mutant that, on the one hand, can no longer initiate the rolling-circle replication and, on the other hand, retains its other functions, so that it can be used as the replication enhancer.
- PTGS post transcriptional gene silencing
- the inventor constructed several Rep/RepA mutants, including Y106H, K229E, Y20C, H59R, E198A and H91R, and then the replication initiation activity of these mutants was detected by fluorescent quantitative PCR. Results show that, except for H91R, none of the mutants has any replication initiation activity. Then, in order to screen for a suitable replication enhancer, the added amount of the Rep/RepA plasmid was reduced from the original 10 μg to 100 ng; at this concentration, the copy number of the replicon can only reach a low level. In this case, the addition of a replication enhancer should greatly increase the copy number. It was found that RepA, Rep/RepA K229E and Y20C may be used as the replication enhancer. In subsequent experiments, Rep/RepA Y20C is used as the replication enhancer.
- the amount of the library added is also a key factor affecting the evolution system.
- when the dilution ratio reaches 5-10, namely the added amount is about 1 × 10⁻¹³ mol to 1 × 10⁻¹⁵ mol, which is 10³ to 10⁵ times the protoplast amount, the screening effect is the best.
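The relation between the molar amount of library and the number of vector molecules per protoplast can be checked with a short calculation. The protoplast count of 5 × 10⁵ used below is a hypothetical value chosen for illustration; the source does not state how many protoplasts were used.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules(moles):
    """Convert an amount in moles to a number of molecules."""
    return moles * AVOGADRO


def vectors_per_protoplast(moles, n_protoplasts):
    """Average number of library molecules added per protoplast."""
    return molecules(moles) / n_protoplasts


ASSUMED_PROTOPLASTS = 5e5  # hypothetical value, not given in the source

upper = vectors_per_protoplast(1e-13, ASSUMED_PROTOPLASTS)
lower = vectors_per_protoplast(1e-15, ASSUMED_PROTOPLASTS)
```

With this assumed protoplast count, 1 × 10⁻¹³ mol and 1 × 10⁻¹⁵ mol correspond to roughly 10⁵ and 10³ library molecules per protoplast, which matches the range quoted above.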
- Example 2 Evolution of CaMV 35S Promoter TATA-Box Achieved by Using Plant Directed Evolution System Based on Geminivirus
- CaMV 35S promoter is a constitutive promoter commonly used in plants. There is a TATA-box motif at about 30 bp upstream of its transcription initiation site, and the sequence of which is 5′-ctatataag-3′. Most eukaryotic pol type-II promoters have a TATA-box element, and it is critical to the transcriptional activity of the promoter. In this Example, it was intended to evolve the CaMV 35S promoter TATA-box, and simultaneously study the effect of the element sequence on the activity of the CaMV 35S promoter.
- the activity of the CaMV 35S promoter was coupled with the expression of Rep/RepA.
- the CaMV 35S promoter was used to drive expression of WDV Rep/RepA, and the two were placed in the replicon together.
- the second step was to construct a screening library.
- TATA-box of the CaMV 35S promoter was converted from CTATATAAG to CNNNNNNNG by PCR method, and then the PCR fragments were cloned onto the geminivirus vector to obtain a library with theoretical diversity of 16384 (4⁷) (as shown in FIG. 18).
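The theoretical diversity of the CNNNNNNNG library can be confirmed by expanding the degenerate sequence. This is a small sketch; the function name `expand_degenerate` is illustrative, and only the symbol N is handled.

```python
from itertools import product


def expand_degenerate(pattern):
    """Expand a sequence containing N into all concrete A/C/G/T sequences."""
    choices = ["ACGT" if base == "N" else base for base in pattern]
    return ["".join(variant) for variant in product(*choices)]


tata_library = expand_degenerate("CNNNNNNNG")
diversity = len(tata_library)                  # 4 ** 7 variants
has_wild_type = "CTATATAAG" in tata_library    # wild-type TATA-box is a member
```

The same function can be used to build a reference list against which sequencing reads are counted.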
- the library was transformed into wheat protoplasts at the different concentrations. After 48 h, protoplast DNA was extracted, and the site was sequenced. Sequencing results of each concentration were compared with the results of the initial library.
- Prime editing system is a gene editing system that may perform any base replacement and small fragment deletion or insertion, and it includes two portions, namely a nCas9 (H840A) protein fused with an M-MLV reverse transcriptase and a prime editing guide RNA (pegRNA).
- the prime editing system may work more efficiently in yeast and animal cells, but it is very inefficient in the plants and has very strong site specificity.
- pegRNA includes 4 portions: a spacer portion responsible for recognizing a target site; a scaffold portion responsible for linking with nCas9; a primer binding site (PBS), which acts as a primer and is responsible for complementing with the 5′ end sequence of the nCas9 nick; and a reverse transcription template (RT) portion, which acts as the reverse transcription template and is responsible for repairing the 3′ end of the nick into a given sequence.
- PBS primer binding site
- RT reverse transcription template
- the function of pegRNA needs to be coupled with the expression of Rep/RepA.
- the target site is inserted at N end of Rep/RepA, resulting in frame-shifting of the open reading frame of Rep/RepA. While the pegRNA is absent or ineffective, Rep/RepA cannot be expressed. However, if the pegRNA is active, it may introduce short insertion or deletion at the target site, so that Rep/RepA may be correctly expressed.
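The frame-shift logic of this reporter can be expressed as a simple length check. The sketch below only tracks the reading frame; it ignores the possibility that an edit introduces a stop codon, and the function name and example lengths are hypothetical.

```python
def reading_frame_restored(frameshift_len, edit_len):
    """Return True if an edit brings the Rep/RepA open reading frame back in frame.

    frameshift_len: net number of extra bases introduced by the inserted
                    target site (relative to an in-frame construct).
    edit_len:       net length change made by the pegRNA
                    (positive for an insertion, negative for a deletion).
    """
    return (frameshift_len + edit_len) % 3 == 0


# Hypothetical construct in which the target site puts the ORF out of frame by +1.
no_edit = reading_frame_restored(1, 0)         # pegRNA absent or inactive
one_bp_deletion = reading_frame_restored(1, -1)
two_bp_insertion = reading_frame_restored(1, 2)
one_bp_insertion = reading_frame_restored(1, 1)
```

Only edits whose net length change cancels the frame shift allow Rep/RepA to be expressed, which is what couples pegRNA activity to replication.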
- an expression cassette of pegRNA is inserted into the geminivirus replicon, and a fluorescence reporter system is inserted at its two ends (as shown in FIG. 20 ). If the virus replicon generates the rolling-circle replication under the action of pegRNA, a fluorescence signal is reported; and if the virus replicon does not generate the rolling-circle replication, there is no fluorescence signal.
- two vectors were constructed, wherein one vector contains known pegRNA with activity, and the other contains a mutation in PBS of pegRNA which results in loss of activity.
- the two vectors were mixed in an equal proportion to form a library and transformed into tobacco leaves at different concentrations. After 6 days, DNA was extracted and analyzed. The inventors found that when the tobacco leaves were transformed with a high concentration of the library, the active pegRNA was not enriched, whereas when the tobacco leaves were transformed with a low concentration of the library, the active pegRNA was significantly enriched (as shown in FIG. 21). This meets the expectations and proves that the system works.
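The dependence of enrichment on library concentration can be illustrated with a toy model. All of its ingredients are assumptions made for this sketch rather than statements from the source: vectors enter cells according to a Poisson distribution, any cell containing at least one active vector amplifies every replicon in that cell by the same factor, and the amplification factor is set to 1000 (the "three orders of magnitude" mentioned for rolling-circle replication).

```python
import math


def expected_active_fraction(mean_vectors_per_cell, p_active=0.5, amplification=1000.0):
    """Expected fraction of active vectors in the pooled DNA after replication.

    Toy model: active and inactive vectors per cell are independent Poisson
    variables; inactive vectors are amplified only when they share a cell
    with at least one active vector.
    """
    lam = mean_vectors_per_cell
    p_no_active = math.exp(-lam * p_active)  # cell has no active vector
    active = amplification * lam * p_active
    inactive_mean = lam * (1.0 - p_active)
    inactive = inactive_mean * (amplification * (1.0 - p_no_active) + p_no_active)
    return active / (active + inactive)


low_concentration = expected_active_fraction(0.01)
high_concentration = expected_active_fraction(10.0)
```

In this model a dilute 1:1 library ends up almost entirely active, while a concentrated one stays close to the starting 50%, which is the qualitative pattern reported for the two pegRNA vectors.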
- Cas proteins with nuclease activity including Cas9, Cas12a, Cas12b and the like.
- PAM protospacer adjacent motif
- a Cas12a protein, called FbCas12a (Flavobacterium branchiophilum)
- If a certain PAM can be recognized by FbCas12a, the latter may generate a DSB in the target site region and then form a secondary replicon, and PAM information may remain in the secondary replicon; and if a certain PAM cannot be recognized by FbCas12a, a DSB cannot be generated and the PAM cannot be retained in the secondary replicon. Therefore, as long as the secondary replicon is specifically detected, it can be known which PAM can be recognized by FbCas12a (as shown in FIG. 23).
- Two target sites, OsEPSPSC3 and c5, were selected for testing. After testing, it was apparent that the PAM that can be recognized by FbCas12a is ‘TTT’ (as shown in FIG. 24). At other positions, no apparent base preference was found (as shown in FIG. 25).
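Base preference at each PAM position can be summarized as a per-position frequency table built from the PAM sequences recovered from secondary replicons. The sketch below uses a handful of made-up sequences purely to show the calculation; they are not experimental results, and the function name is illustrative.

```python
from collections import Counter


def position_frequencies(pam_sequences):
    """Return one {base: frequency} dict per PAM position."""
    if not pam_sequences:
        raise ValueError("no sequences given")
    length = len(pam_sequences[0])
    if any(len(seq) != length for seq in pam_sequences):
        raise ValueError("all PAM sequences must have the same length")
    total = len(pam_sequences)
    table = []
    for position in range(length):
        counts = Counter(seq[position] for seq in pam_sequences)
        table.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return table


# Made-up PAM sequences recovered from secondary replicons (illustration only).
recovered = ["TTTA", "TTTC", "TTTG", "TTTT"]
freqs = position_frequencies(recovered)
```

A position where one base has a frequency near 1 indicates a strong requirement, while a position with roughly equal frequencies shows no preference.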
- A Cas protein needs an RNA to guide its nucleic acid cutting, called sgRNA (Cas9) or crRNA (Cas12a, Cas12b). The sequence and structure of this RNA greatly affect the activity of the Cas protein. Therefore, it is hoped to establish a system that can screen sgRNA quickly and in high throughput.
- vectors were designed as shown in FIG. 26. If a certain sgRNA is active, it can guide Cas9 to cut the target site located at its downstream, thereby a secondary replicon is formed; and if a certain sgRNA is inactive, it cannot generate a DSB, and the secondary replicon cannot be generated. Based on this principle, two vectors were constructed, wherein one contains an sgRNA with activity, and the other contains an sgRNA without activity. After 5 days of screening, it was found that the sgRNA with activity was significantly enriched in the secondary replicon, and the enrichment was related to the library concentration (as shown in FIG. 27). This proves that the system works.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Organic Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Cell Biology (AREA)
- Virology (AREA)
- Ecology (AREA)
- Medicinal Chemistry (AREA)
- Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
The present invention belongs to the field of genetic engineering. Specifically, the present invention relates to a directed evolution method based on geminivirus. More specifically, the present invention relates to a directed evolution method for in vivo screening of a genetic element in a plant cell by using primary and secondary replicons of geminivirus.
Description
- The present invention belongs to the field of genetic engineering. Specifically, the present invention relates to a directed evolution method based on geminivirus. More specifically, the present invention relates to a directed evolution method for in vivo screening of a genetic element in a plant cell by using primary and secondary replicons of geminivirus.
- Over their long history, organisms constantly produce variations, some of which are conducive to the survival of the organisms, while others are harmful. Under the pressure of natural selection, the variations that are conducive to survival are retained and enriched, while those that are unfavorable are eliminated; this process is called evolution.
- According to this principle, in the field of modern molecular biology, researchers simulate this process in the laboratory, artificially create a large number of mutations, give targeted selection pressure according to the required functions and purposes, and screen out genotypes with desired properties. This simulated evolution at the molecular level is called directed evolution. The directed evolution may modify a protein in the case of unknown structural information and mechanism of action of the target protein. Therefore, the directed evolution is an effective method to obtain new functional proteins in molecular biology research at present.
- Up to now, many directed evolution systems have been reported. The main systems are the multiplex automated genome engineering (MAGE) system developed by George M. Church et al. in 2009, the phage-assisted continuous evolution (PACE) system developed by David R. Liu et al. in 2011, the EvolvR system developed by John E. Dueber et al. in 2018, and the like. These systems work efficiently and have been used to evolve many genetic tools, such as xCas9. However, there is at present no highly efficient, high-throughput directed evolution system based on a higher plant.
- The following factors make it urgent to establish a directed evolution system working in a plant cell. First, different working temperatures and pH are required in plants. At present, most tools used in the plants come from Escherichia coli or mammals; the living temperature of Escherichia coli and mammals is generally 37° C., and the optimal working temperature of proteins coming from the two is also generally 37° C., but the optimal growth temperature of most of the plants is 20-25° C. Similarly, the pH of an animal cell is generally 7.2-7.4, the pH of Escherichia coli is 7.0-7.5, and the pH of a plant cytoplasmic matrix is 5.6-5.9. As a result, the proteins coming from Escherichia coli and mammals, or products obtained from directed evolution systems depending on Escherichia coli and mammals, may not work as efficiently in the plants due to thermodynamic and chemical factors. Second, the plant cell has special anatomical structures. Compared with Escherichia coli, the plant, as a eukaryote, has a cell nucleus, an endoplasmic reticulum, and other structures in its cell; and compared with the animal cell, the plant cell has a chloroplast, a cell wall and other structures, and it is difficult for some proteins related to these structures to evolve in Escherichia coli or mammal systems. Third, the plant cell has a unique cellular regulatory network. In the long evolutionary process, a complex regulatory network has formed between various genetic elements in the cell, and there are great differences between prokaryotes and eukaryotes, and between plant cells and animal cells. Since no element is likely to be independent of this network, products obtained by a directed evolution system outside a plant cell system may not work effectively in a plant system. Conversely, the elements in the plant cell regulatory network may not achieve the directed evolution in other biological systems.
For the above several reasons, some elements that may work efficiently in the mammal or Escherichia coli system, such as eAPOBEC3A reported by J. Keith Joung et al., do not show activity in the plant cells. Therefore, a directed evolution system based on a plant cell needs to be developed in this field.
- However, although researchers have attempted directed evolution in plants in several studies, these systems are difficult to apply and popularize. The reason is that some inherent factors in plants hinder related research. First, many existing directed evolution systems rely on high-frequency homologous recombination in vivo, but the efficiency of homologous recombination in plants is less than 0.1%. Second, cell lines can be prepared efficiently from bacteria, yeast and animal cells, but at present cell lines can be prepared from only a few plant species, and their repeatability is relatively low. Third, plants lack a high-efficiency transformation system, so high-throughput research in plants requires a large amount of work. These factors greatly hinder the development of directed evolution systems in plants, and a substantial change is unlikely in a short time.
- The geminiviruses are the largest group of single-stranded DNA viruses in plants. The viral particle has a doublet (twinned) structure, and the genome is monopartite or bipartite, with a DNA size of 2.5-3.0 kb per molecule and a total genome size of 2.5-5.2 kb. After invading the plant cell, viruses of this family first form a double-stranded DNA replication intermediate in the cell nucleus, and the viral genome is then amplified by rolling-circle replication under the action of the virus-encoded replication initiation protein Rep/RepA and the endogenous DNA polymerase of the cell (as shown in FIG. 1).
- To date, more than 500 species have been identified in Geminiviridae, and the family is divided into 9 genera according to genome structure. Only the Mastrevirus genus can infect monocotyledons; its structure is the simplest, and it is also the most thoroughly studied. Members of the Mastrevirus genus are all monopartite viruses with a genome size of 2500 bp-2800 bp, encoding a total of 4 proteins, namely a movement protein (MP), a capsid protein (CP), and the replication initiation proteins Rep and RepA (as shown in FIG. 2). Rep and RepA are encoded by a common piece of DNA, and the two transcripts are obtained by alternative splicing. Rep/RepA is involved in initiation and termination of viral rolling-circle replication, inhibition of plant immunity, and regulation of viral gene expression. MP and CP are involved in movement and packaging of virions, respectively, but not in replication. In addition, the genome of members of the Mastrevirus genus contains two intergenic regions: a large intergenic region (LIR) and a small intergenic region (SIR). The former is a bidirectional promoter and contains a stable stem-loop structure, which is recognized by Rep/RepA and serves as the origin of rolling-circle replication; the latter is a bidirectional terminator and is involved in formation of the double-stranded DNA intermediate during replication. Since LIR and SIR are the only two cis-acting elements required for virus replication, and Rep/RepA is the only trans-acting factor required, researchers have developed a deconstructed virus replicon (as shown in FIG. 3). The replicon needs to contain only LIR and SIR, and the remaining portion may be any sequence; Rep/RepA may be expressed in situ or ectopically to drive rolling-circle replication of the replicon.
- The present invention provides a method for directed evolution of a genetic element to obtain a mutant of the genetic element with a desired function, and the method includes:
- i) providing a library of the mutants of the genetic element, which contains a plurality of mutants of the genetic element respectively inserted into vectors containing a geminivirus replicon, and wherein the mutant is inserted into the geminivirus replicon so that the mutant is amplified while the geminivirus replicon is replicated,
- ii) transforming a population of plant cells with the library, and
- iii) culturing the population of the plant cells, detecting and selecting the genetic element mutant enriched in the population of the plant cells, wherein the replication level of the geminivirus replicon in the plant cells is configured to be associated with the desired function of the genetic element mutant.
- In some embodiments, the genetic element is selected from a protein coding sequence; a functional RNA coding sequence, such as tRNA or siRNA coding sequences; and an expression regulatory sequence such as a promoter sequence, an enhancer sequence, or a terminator sequence.
- In some embodiments, the genetic element is derived from a plant, or is expected to be applied in a plant.
- In some embodiments, the library of the mutants of the genetic element is obtained by respectively inserting the plurality of the mutants of the genetic element into the vectors containing the geminivirus replicon.
- In some embodiments, the plurality of the mutants of the genetic element are generated by random mutagenesis of the genetic element.
- In some embodiments, the library is generated by performing random mutagenesis on the genetic element that is already inserted into the vector containing the geminivirus replicon.
- In some embodiments, the vector containing the geminivirus replicon is a circular DNA, such as a plasmid or a minicircle DNA.
- In some embodiments, the vector containing the geminivirus replicon contains at least one LIR, for example, the LIR contains a nucleotide sequence shown in SEQ ID NO: 1.
- In some embodiments, the vector containing the geminivirus replicon further contains at least one SIR, for example, the SIR contains a nucleotide sequence shown in SEQ ID NO: 2.
- In some embodiments, the vector containing the geminivirus replicon contains one LIR.
- In some embodiments, the vector containing the geminivirus replicon contains two LIRs.
- In some embodiments, in the vector containing the geminivirus replicon, the mutant of the inserted genetic element is operably linked to an expression regulatory sequence.
- In some embodiments, the vector containing the geminivirus replicon further contains an expression cassette of the geminivirus Rep and/or RepA protein.
- In some embodiments, the vector containing the geminivirus replicon does not contain the expression cassette of the geminivirus Rep and/or RepA protein.
- In some embodiments, the method further includes introducing another vector for expressing the geminivirus Rep and/or RepA protein into the plant cell.
- In some embodiments, the population of the plant cells is co-transformed with the library and the other vector for expressing the geminivirus Rep and/or RepA protein.
- In some embodiments, the plant cell already contains the vector for expressing the geminivirus Rep and/or RepA protein, and/or the genome of the plant cell is already integrated with the expression cassette of the geminivirus Rep and/or RepA protein.
- In some embodiments, the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 3, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 3, preferably comprises an amino acid sequence shown in SEQ ID NO: 4.
- In some embodiments, the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 5, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 5, preferably comprises an amino acid sequence shown in SEQ ID NO: 6.
- In some embodiments, in the transformation of step ii), the number of vector molecules containing the mutants in the library is 10³ to 10⁵ times the number of cells in the population of the plant cells.
- In some embodiments, in step iii), the detecting and selecting of the genetic element mutant enriched in the population of the plant cells may be performed by high-throughput sequencing.
- In some embodiments, the method further includes a step iv) of identifying the function of the enriched genetic element mutant.
- In some embodiments, the plant is a monocotyledon or a dicotyledon, for example, it is selected from corn, wheat, rice, barley, sorghum, kidney bean, beet, tomato, cassava, cucumber, arabidopsis and tobacco.
- In some embodiments, the expression or activity of the geminivirus Rep and/or RepA protein in the plant cell is coupled with the desired function of the genetic element mutant, thereby achieving the directed evolution of the genetic element.
- In some embodiments, the genetic element with the desired function activates Rep/RepA expression, thereby driving the rolling-circle replication to achieve self-enrichment; and the genetic element without the desired function cannot activate the Rep/RepA expression to achieve enrichment, thereby the directed evolution of the genetic element is achieved.
- In some embodiments, the genetic element is a promoter.
- In some embodiments, the method further includes placing a promoter library to be evolved upstream of Rep/RepA in the replicon.
- In some embodiments, the genetic element is a cauliflower mosaic virus (CaMV) 35S promoter TATA-box.
- In some embodiments, the genetic element is a sequence encoding a transcription activator.
- In some embodiments, the method further includes inserting a recognition sequence of the transcription activator upstream of Rep/RepA and inserting a minimal transcription initiation element between the recognition sequence and Rep/RepA; and placing a transcription activator library to be evolved in the replicon.
- In some embodiments, the genetic element is a DNA binding domain.
- In some embodiments, the method further includes inserting a target binding sequence of the DNA binding domain upstream of Rep/RepA, and inserting a minimal transcription initiation element between the recognition sequence and Rep/RepA; and placing a fusion protein of the DNA binding domain to be evolved and a transcription activator without sequence specificity in the replicon.
- In some embodiments, the genetic element is a sequence encoding a recombinase.
- In some embodiments, the method further includes dividing Rep/RepA into two portions and placing them at the two ends of the recombinase recognition sequence; and placing a sequence encoding the recombinase to be evolved in the replicon.
- In some embodiments, the method further includes adding a 5′ intron and a 3′ intron between Rep/RepA and the recombinase recognition sequence.
- In some embodiments, the genetic element is a prime editing guide RNA (pegRNA).
- In some embodiments, the method further includes inserting a target site at the N-terminus of Rep/RepA so as to frame-shift the open reading frame of Rep/RepA; and inserting an expression cassette of the pegRNA into the geminivirus replicon, and inserting a fluorescence reporter system into its two ends.
- In some embodiments, the desired function of the genetic element is coupled with the expression of a nuclease.
- In some embodiments, the nuclease is a sequence specific nuclease.
- In some embodiments, the genetic element with the desired function activates the expression of the nuclease or guides the nuclease to cut its recognition site, thereby driving the rolling-circle replication to achieve the self-enrichment; and the genetic element without the desired function cannot allow the nuclease to cut its recognition site and the enrichment cannot be achieved, thereby the directed evolution of the genetic element is achieved.
- In some embodiments, the genetic element is a DNA binding domain.
- In some embodiments, the method further includes fusing a DNA binding domain library to be evolved with a non-sequence specific nuclease, and placing the same in the replicon together with its recognition sequence.
- In some embodiments, the genetic element is a sequence encoding a non-sequence specific nuclease.
- In some embodiments, the genetic element is a sequence encoding a transcription activator.
- In some embodiments, the method further includes inserting a recognition sequence of the transcription activator upstream of the nuclease, and inserting a minimal transcription initiation element between the recognition sequence and the nuclease; and placing a transcription activator library to be evolved in the replicon together with the recognition sequence of the nuclease.
- In some embodiments, the genetic element is a sequence encoding a recombinase.
- In some embodiments, the method further includes placing a recombinase library to be evolved and the recognition sequence of the nuclease in the replicon; and dividing the nuclease into two portions and placing them at the two ends of the recombinase recognition sequence.
- In some embodiments, the method further includes adding a 5′ intron and a 3′ intron between the nuclease and the recombinase recognition sequence.
- In some embodiments, the genetic element is a protospacer adjacent motif (PAM) of a Cas protein.
- In some embodiments, the method further includes placing PAM to be evolved and a target sequence of the Cas protein together in the replicon.
- In some embodiments, the genetic element is a sgRNA.
- In some embodiments, the method further includes placing the sgRNA to be evolved and the target sequence of the Cas protein together in the replicon.
- The present invention further provides a kit for implementing the method of the present invention.
- FIG. 1 shows a rolling-circle replication model of a geminivirus.
- FIG. 2 shows a genome structure of a Mastrevirus genus virus.
- FIG. 3 shows a deconstructed virus replicon strategy for the geminivirus.
- FIG. 4 shows a basic principle of a plant in vivo directed evolution system based on a primary replicon: different alleles (mutants) of a gene of interest (GOI) are placed in a geminivirus replicon, to form a library to be screened; and a desired function of GOI is coupled with expression of Rep/RepA, namely a functional allele with the desired function may cause the expression of Rep/RepA in a plant cell, while a non-functional allele without the desired function may not cause the expression of Rep/RepA in the plant cell.
- FIG. 5 shows a method of using the primary replicon directed evolution system to achieve promoter directed evolution.
- FIG. 6 shows a method of using the primary replicon directed evolution system to achieve transcription activator directed evolution.
- FIG. 7 shows a method of using the primary replicon directed evolution system to achieve DNA binding domain directed evolution.
- FIG. 8 shows a method of using the primary replicon directed evolution system to achieve recombinase directed evolution.
- FIG. 9 shows the principle of secondary replicon formation.
- FIG. 10 shows a basic principle of a plant in vivo directed evolution system based on a secondary replicon.
- FIG. 11 shows a method of using the secondary replicon directed evolution system to achieve DNA binding domain directed evolution.
- FIG. 12 shows a method of using the secondary replicon directed evolution system to achieve non-sequence specific nuclease directed evolution.
- FIG. 13 shows a method of using the secondary replicon directed evolution system to achieve transcription activator directed evolution.
- FIG. 14 shows a method of using the secondary replicon directed evolution system to achieve recombinase directed evolution.
- FIG. 15 shows construction of the library screened in Example 1.
- FIG. 16 shows a screening result of Rep Y20 when a replication enhancer is not added in Example 1.
- FIG. 17 shows a screening result of Rep Y20 when the replication enhancer is added in Example 1.
- FIG. 18 shows construction of the library screened in Example 2.
- FIG. 19 shows a screening result in Example 2.
- FIG. 20 shows an experimental principle and vector construction of Example 3.
- FIG. 21 shows a screening result in Example 3.
- FIG. 22 shows construction of the library screened in Example 4.
- FIG. 23 shows a screening principle of the PAM library in Example 4.
- FIG. 24 shows screening results of 3 bases at a 3′ end of a PAM library sequence in Example 4.
- FIG. 25 shows a sequence identification diagram of 6 bases in the PAM library sequence in Example 4.
- FIG. 26 shows an experimental principle and vector construction of Example 5.
- FIG. 27 shows a screening result in Example 5.
- FIG. 28 shows a schematic diagram of a directed evolution principle of a base editor, wherein the plant cell is co-transformed with a base editor mutant expression library, an inactivated Rep/RepA expression vector, and a sgRNA expression construct targeting the inactivated Rep/RepA coding sequence. When a base editor mutant has the desired base editing activity, it may correct the inactivated Rep/RepA to active Rep/RepA, so that the mutant is enriched.
- FIG. 29 shows a schematic diagram of a directed evolution principle of a recombinase, wherein the plant cell is co-transformed with a recombinase mutant expression library, and an expression vector in which the Rep/RepA gene and the promoter are in reverse orientation. When a recombinase mutant has the desired activity, it may invert the reversed Rep/RepA gene (inversion), so that the gene may be driven and expressed by the promoter to achieve enrichment of the recombinase mutant.
- In one aspect, the present invention provides a method for directed evolution of a genetic element to obtain a mutant of the genetic element with a desired function, and the method includes:
- i) providing a library of the mutant of the genetic element, which contains a plurality of mutants of the genetic element respectively inserted into a vector containing a geminivirus replicon, and wherein the mutant is inserted into the geminivirus replicon so that the mutant is amplified while the geminivirus replicon is replicated,
- ii) transforming a population of plant cells with the library, and
- iii) culturing the population of the plant cells, detecting and selecting the genetic element mutant enriched in the population of the plant cells,
- wherein the replication level of the geminivirus replicon in the plant cell is configured to be associated with the desired function of the genetic element mutant.
- As used herein, the term “genetic element” refers to a nucleotide sequence/nucleic acid molecule that may achieve a specific function in a cell, preferably in a plant cell. Examples of the genetic element include, but are not limited to, a protein coding sequence, a functional RNA (such as tRNA and siRNA) coding sequence, and an expression regulatory sequence such as a promoter sequence, an enhancer sequence, or a terminator sequence. In some preferred embodiments, the genetic element is derived from a plant, or is expected to be applied in a plant.
- In the context of this description, the term “library” is used with its known meaning in the field of cell biology and molecular biology, and it refers to a collection of different nucleic acid fragments/nucleic acid molecules. A specific type of the library is a library containing random mutants generated by random mutagenesis. Another example is a designed (or synthesized) library, and it contains different specially engineered nucleic acid fragments/nucleic acid molecules.
- In some embodiments, the library of mutants of the genetic element is obtained by respectively inserting a plurality of the mutants of the genetic element into a vector containing the geminivirus replicon. In some embodiments, the plurality of the mutants of the genetic element is generated by the random mutagenesis.
- In some embodiments, the library may be generated by performing random mutagenesis on the genetic element that has been inserted into the vector containing the geminivirus replicon.
- In the context of this description, the term “random mutagenesis” is used with its meaning known in the field of cell biology and molecular biology; and it refers to a method in which a DNA mutation is introduced randomly to generate mutant genes and proteins. Then, many of these mutant genes may be compiled into the library. Non-limiting examples of the random mutagenesis method are error prone polymerase chain reaction (PCR), ultraviolet (UV) radiation and chemical mutagen.
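- As an illustration only, random mutagenesis can be mimicked in silico by substituting each base of a parent sequence with a fixed probability. The following is a minimal sketch, not the method of the invention: the parent sequence, the 5% per-base substitution rate, the library size, and the function names `mutate` and `make_library` are all hypothetical choices made for the example.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq where each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Choose among the three other bases so every hit is a real substitution.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(seq, size, rate, seed=0):
    """Build a list of `size` independently mutated copies of seq (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [mutate(seq, rate, rng) for _ in range(size)]


# Hypothetical 21-nt parent sequence; assumed 5% per-base substitution rate.
parent = "ATGGCTAGCAAGGGCGAGGAG"
library = make_library(parent, size=1000, rate=0.05)
print(len(library), "variants,", len(set(library)), "distinct")
```

Substitutions only are modeled here; insertions and deletions, which some mutagenesis methods also produce, are left out to keep the sketch short.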
- “Geminivirus” is a DNA virus that infects plants, and it is a virus having 1 or 2 single-stranded circular DNA molecules. Examples of the geminivirus include, but are not limited to: maize streak virus (MSV), wheat dwarf virus (WDV), bean yellow dwarf virus (BeYDV) and other viruses belonging to the Mastrevirus genus, beet curly top virus (BCTV) and other viruses belonging to the beet curly top virus genus, tomato pseudo-curly top virus (TPCTV) and other viruses belonging to the tomato pseudo-curly top virus genus, as well as bean golden mosaic virus (BGMV), African cassava mosaic virus (ACMV), squash leaf curl virus (SLCV), tomato golden mosaic virus (TGMV), tomato yellow leaf curl virus (TYLCV) and the like. In some preferred embodiments, the geminivirus is WDV.
- The plant of the present invention may be a monocotyledon or a dicotyledon, as long as the geminivirus replicon may be replicated in its cell. The suitable plants include, but are not limited to, corn, wheat, rice, barley, sorghum, kidney bean, beet, tomato, cassava, cucumber, arabidopsis, tobacco and the like.
- In some embodiments, the plant cell is an isolated plant cell. In some preferred embodiments, the plant cell is a protoplast cell.
- In some embodiments, the plant cell is a cell in a plant tissue or a plant organ or a plant body, namely the cell is not isolated from the plant tissue or the plant organ or the plant body. For example, the plant cell may be a cell in a leaf.
- The “replication level” of the geminivirus replicon may be determined by detecting the copy number of the geminivirus replicon. Methods for detecting the copy number of the geminivirus replicon are known in the art, including but not limited to PCR (such as fluorescence quantitative PCR) method or sequencing (such as deep sequencing) method.
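- For the fluorescence quantitative PCR route, one common way to express a relative copy number is the 2^-ΔΔCt calculation. The sketch below assumes roughly 100% amplification efficiency, a single-copy host gene as the reference amplicon, and a control sample without Rep/RepA; the function name and all Ct values are hypothetical and are not taken from the examples of this description.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control):
    """Fold change of the replicon amplicon in a sample versus a control (2^-ddCt)."""
    # Normalize the replicon Ct to the reference gene Ct within each sample.
    d_sample = ct_target_sample - ct_ref_sample
    d_control = ct_target_control - ct_ref_control
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** -(d_sample - d_control)


# Hypothetical Ct values: the replicon amplifies 6 cycles earlier in the sample.
fold = relative_copy_number(ct_target_sample=18, ct_ref_sample=22,
                            ct_target_control=24, ct_ref_control=22)
print(fold)  # 64.0
```

Deep sequencing gives a complementary readout: instead of a copy number per cell, it gives the share of reads belonging to each mutant.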
- In some embodiments, the vector containing the geminivirus replicon is a circular DNA, such as a double-stranded or single-stranded circular DNA. In some embodiments, the vector containing the geminivirus replicon is a plasmid. In some embodiments, the vector containing the geminivirus replicon is a minicircle DNA.
- In some embodiments, the vector containing the geminivirus replicon contains at least one LIR.
- In some embodiments, the vector containing the geminivirus replicon further contains at least one SIR, such as one SIR.
- In some embodiments, the vector containing the geminivirus replicon contains one LIR. In this case, the entire vector is replicated as the geminivirus replicon replicates.
- In some preferred embodiments, the vector containing the geminivirus replicon contains two LIRs. In some embodiments, one SIR is contained between the two LIRs. In this case, the sequence from the first LIR to the second LIR (containing SIR) is replicated as the geminivirus replicon replicates. Preferably, the genetic element mutant is located between the two LIRs.
- In some embodiments, the LIR comprises a nucleotide sequence shown in SEQ ID NO:1. In some embodiments, the SIR comprises a nucleotide sequence shown in SEQ ID NO: 2.
- In some embodiments, in the vector containing the geminivirus replicon, the inserted mutant of the genetic element is operably linked to an expression regulatory sequence.
- The “expression regulatory sequence” and “expression regulatory element” are used interchangeably, and refer to nucleotide sequences that are located upstream (5′ non-coding sequence), within, or downstream (3′ non-coding sequence) of a coding sequence, and that affect the transcription, RNA processing or stability, or translation of the relevant coding sequence. A plant expression regulatory element refers to a nucleotide sequence that can control the transcription, RNA processing or stability, or translation of a nucleotide sequence of interest in a plant. The expression regulatory sequence may include, but is not limited to, a promoter, a translation leader sequence, an intron, and a polyadenylation recognition sequence. The “promoter” refers to a nucleic acid fragment that can control the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter that can control gene transcription in the plant cell, regardless of whether it is derived from the plant cell. The promoter may be a constitutive promoter, a tissue-specific promoter, a developmentally regulated promoter, or an inducible promoter.
- In some embodiments, the vector containing the geminivirus replicon also contains an expression cassette of a geminivirus Rep and/or RepA protein.
- The expression cassette of the geminivirus Rep and/or RepA protein usually contains a nucleotide sequence encoding the geminivirus Rep and/or RepA protein and an expression regulatory element operably linked thereto.
- In some embodiments, the vector containing the geminivirus replicon does not contain the expression cassette of the geminivirus Rep and/or RepA protein. Therefore, the geminivirus Rep and/or RepA protein needs to be provided in a trans manner.
- In some embodiments, the method further includes introducing another vector for expressing the geminivirus Rep and/or RepA protein into the plant cell. The vector for expressing the geminivirus Rep and/or RepA protein usually contains the expression cassette of the geminivirus Rep and/or RepA protein. In some embodiments, the population of the plant cells is co-transformed with the library and the other vector for expressing the geminivirus Rep and/or RepA protein.
- In some embodiments, the plant cell already contains the vector for expressing the geminivirus Rep and/or RepA protein, and/or the genome of the plant cell is already integrated with the expression cassette of the geminivirus Rep and/or RepA protein.
- In some embodiments, the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 3, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 3, for example SEQ ID NO: 4. In some embodiments, the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 5, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 5, for example SEQ ID NO: 6. In some preferred embodiments, the geminivirus Rep protein comprises an amino acid sequence shown in SEQ ID NO: 4. In some preferred embodiments, the geminivirus RepA protein comprises an amino acid sequence shown in SEQ ID NO: 6.
- “The replication level of the geminivirus replicon in the plant cell is configured to be associated with the desired function of the genetic element mutant” means that the replication level in the plant cell of the geminivirus replicon carrying a genetic element mutant having the desired function is higher than, preferably significantly higher than, the replication level in the plant cell of the geminivirus replicon carrying a genetic element mutant without the desired function. For example, it is possible to configure the system so that the genetic element mutant having the desired function causes the replication of the geminivirus replicon in the plant cell, while the genetic element mutant without the desired function does not; or preferably, so that the genetic element mutant having the desired function causes high level replication of the geminivirus replicon in the plant cell, while the genetic element mutant without the desired function causes low level replication or no replication. When the geminivirus replicon containing the genetic element mutant having the desired function is amplified or significantly amplified through the replication or the high level replication, relative to other geminivirus replicons that lack a genetic element mutant having the desired function, the enrichment of the genetic element mutant having the desired function may be achieved.
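- The enrichment logic can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each mutant's replicon is amplified by a fixed fold that depends only on whether the mutant has the desired function, and that sequencing read share is proportional to copy number; the variant names, input counts and fold values are hypothetical.

```python
import math


def enrich(input_counts, fold_replication):
    """Multiply each variant's copies by its replication fold; return output frequencies."""
    copies = {v: input_counts[v] * fold_replication[v] for v in input_counts}
    total = sum(copies.values())
    return {v: c / total for v, c in copies.items()}


def log2_enrichment(pre_freq, post_freq):
    """log2 of the output/input frequency ratio; positive values mean enrichment."""
    return {v: math.log2(post_freq[v] / pre_freq[v]) for v in pre_freq}


# Hypothetical library: one functional mutant among 1000 input molecules.
input_counts = {"functional": 1, "nonfunctional_a": 500, "nonfunctional_b": 499}
# Assumed replication folds: strong for the functional mutant, none for the others.
fold = {"functional": 1000, "nonfunctional_a": 1, "nonfunctional_b": 1}

n_input = sum(input_counts.values())
pre = {v: c / n_input for v, c in input_counts.items()}
post = enrich(input_counts, fold)
scores = log2_enrichment(pre, post)
for variant in input_counts:
    print(variant, round(post[variant], 4), round(scores[variant], 2))
```

Under these assumed numbers, the functional mutant rises from 0.1% of the input to roughly half of the output, which is the kind of shift that sequencing of the transformed cell population is meant to detect.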
- The Rep and/or RepA protein is a replication initiation protein of the geminivirus, and its activity or expression level is usually positively correlated with the replication level (such as the copy number) of the geminivirus within a certain range. Therefore, in some embodiments, the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell may be configured to be associated with the desired function of the genetic element mutant. For example, the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell containing the genetic element mutant having the desired function may be higher than, preferably significantly higher than, the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell containing the genetic element mutant without the desired function. In some embodiments, the activity of the geminivirus Rep and/or RepA protein is the activity of mediating (initiating) the replication of the geminivirus replicon, and it may be determined, for example, by detecting the replication level of the geminivirus replicon. For example, the genetic element mutant having the desired function may cause the expression of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes no expression of the geminivirus Rep and/or RepA protein in the plant cell; or the genetic element mutant having the desired function may cause the high level expression of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes low level expression or no expression of the geminivirus Rep and/or RepA protein in the plant cell. 
Alternatively, the genetic element mutant having the desired function may cause the geminivirus Rep and/or RepA protein to be active in the plant cell, while the genetic element mutant without the desired function causes the geminivirus Rep and/or RepA protein to be inactive in the plant cell; or the genetic element mutant having the desired function may cause the high activity of the geminivirus Rep and/or RepA protein in the plant cell, while the genetic element mutant without the desired function causes low activity or no activity of the geminivirus Rep and/or RepA protein in the plant cell. The expression or activity, or the high level expression or high activity, of the geminivirus Rep and/or RepA protein in the plant cell may cause the amplification or significant amplification of the geminivirus replicon, thereby achieving the enrichment of the genetic element mutant having the desired function.
- The “low level” or “low activity” mentioned herein is relative to the “high level” or “high activity”, which does not necessarily mean that it is lower than the normal level or normal activity.
- The replication level of the geminivirus in the plant cell or the activity or expression level of the Rep and/or RepA protein of the geminivirus in the plant cell may be configured as directly or indirectly associated with the desired function of the genetic element mutant. Those skilled in the art may achieve such association according to the type of a genetic element and the desired specific function of the mutant.
- In the context of this description, the term “expression level” is used with its meaning known in the field of cell biology and molecular biology; and it refers to the transcription level and/or translation level of a DNA fragment and its derived mRNA respectively.
- For example, when the genetic element is an expression regulatory element (such as a promoter or an enhancer), the coding sequence of the geminivirus Rep and/or RepA protein may be placed directly under the control of the expression regulatory element mutant (such as a promoter mutant or an enhancer mutant). If the mutant can enhance gene expression, it may cause increased expression of the Rep and/or RepA protein, which in turn may cause increased replication of the geminivirus replicon and of the corresponding expression regulatory element mutant (such as the promoter mutant). By detecting the significantly enriched mutant sequences, the expression regulatory element mutant that enhances gene expression, namely an evolved expression regulatory element, can be obtained.
- When the genetic element is a protein coding sequence, the desired function of the protein encoded by it may be associated with the activity or expression level of the geminivirus Rep and/or RepA protein.
- For example, if directed evolution is to be performed on a base editor, the plant cell may be co-transformed with a base editor mutant library constructed in a vector containing the geminivirus replicon, an expression vector containing the coding sequence of an inactivated Rep/RepA protein specifically designed for the desired function of the base editor, and a sgRNA expression construct targeting the coding sequence of the inactivated Rep/RepA. When the base editor mutant in the plant cell has the desired base editing activity, it may correct the specifically designed inactivated Rep/RepA to an active Rep/RepA, thereby inducing the replication of the geminivirus replicon, so that the mutant is enriched.
- Alternatively, if directed evolution is to be performed on a recombinase, the plant cell may be co-transformed with a library of recombinase mutants constructed in a vector containing the geminivirus replicon, and an expression vector in which the Rep/RepA coding sequence is arranged in reverse orientation relative to its promoter. When the recombinase mutant in the plant cell has the desired activity, it may invert the reversed Rep/RepA gene, so that the Rep/RepA gene is driven and expressed by the promoter, and the replication of the geminivirus replicon is induced to achieve enrichment of the recombinase mutant.
- In some embodiments, in the transformation of step ii), the number of vector molecules containing the mutants in the library is 10³ to 10⁵ times the number of cells in the population of the plant cells. This ratio may reduce the probability that multiple different vector molecules are transformed into the same cell, and thus reduce the screening background, while the transformation efficiency is still guaranteed.
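The dosage argument above can be illustrated with a minimal sketch. It assumes, and this is an assumption not stated in this description, that the number of distinct library vectors effectively taken up by a cell follows a Poisson distribution with a hypothetical mean `mean_uptake`; that mean is not derived from the vector-to-cell ratio given above. Under this assumption the function returns, among cells that took up at least one vector, the fraction that took up exactly one.

```python
import math


def single_vector_fraction(mean_uptake: float) -> float:
    """Fraction of transformed cells carrying exactly one vector.

    Assumes Poisson-distributed uptake (an illustrative assumption);
    `mean_uptake` is a hypothetical mean number of vectors per cell.
    """
    if mean_uptake <= 0:
        raise ValueError("mean_uptake must be positive")
    p_exactly_one = mean_uptake * math.exp(-mean_uptake)
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x
    p_at_least_one = -math.expm1(-mean_uptake)
    return p_exactly_one / p_at_least_one


if __name__ == "__main__":
    for lam in (0.1, 0.5, 2.0):
        print(f"mean uptake {lam}: {single_vector_fraction(lam):.3f}")
```

Under this model, lowering the effective uptake raises the share of single-vector cells, at the cost of fewer transformed cells overall.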
- In the context of this description, the term “primary replicon” refers to a replicon formed when tandem LIRs on the vector are recognized and cyclized by Rep/RepA in a plant geminivirus system. The primary replicon is amplified by rolling-circle replication.
- In some embodiments, the directed evolution of the genetic element is accomplished as follows: the expression or activity of the geminivirus Rep and/or RepA protein in the plant cell is coupled with the desired function of the genetic element mutant. In some embodiments, the directed evolution of the genetic element is accomplished as follows: a genetic element having the desired function activates Rep/RepA expression, thereby driving rolling-circle replication and achieving self enrichment; a genetic element without the function cannot activate Rep/RepA expression, so enrichment cannot be achieved.
- In some embodiments, the genetic element is a promoter. In some embodiments, the method i) further includes placing a promoter library to be evolved upstream of Rep/RepA in the replicon. The promoter having the function may drive the expression of downstream Rep/RepA, drive its own rolling-circle replication, increase the copy number, and achieve the self enrichment; and the promoter without the function may not drive the expression of the downstream Rep/RepA, and may not achieve the enrichment, thereby the directed evolution of the promoter is achieved.
- In some embodiments, the genetic element is a CaMV 35S promoter TATA-box.
- In the context of this description, the term “transcription activator” refers to a DNA binding protein that may activate gene expression. The transcription activator binds to an upstream promoter element to regulate the transcription process.
- In some embodiments, the genetic element is a sequence encoding the transcription activator. In some embodiments, the method i) further includes inserting a recognition sequence of the transcription activator into the upstream of Rep/RepA, and inserting a minimal promoter between the recognition sequence and Rep/RepA; and placing a transcription activator library to be evolved in the replicon.
- In some embodiments, the genetic element is a DNA binding domain. In some embodiments, the method i) further includes inserting a target binding sequence of the DNA binding domain to the upstream of Rep/RepA, and inserting a minimal transcription initiation element between the target binding sequence and Rep/RepA; and placing a fusion protein formed by the DNA binding domain to be evolved and the transcription activator without sequence specificity in the replicon together. The transcription activator having the desired function may bind to its recognition sequence, and activate the expression of the downstream Rep/RepA, thereby the rolling-circle replication is driven to achieve the self-enrichment; and the transcription activator without the function cannot activate the downstream Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the transcription activator is achieved.
- In the context of this description, the term “recombinase” refers to an enzyme involved in the process of gene directed recombination. It is responsible for identifying and cutting a specific recombination site, and linking two molecules involved in recombination. In some embodiments, the genetic element is a sequence encoding the recombinase. In some embodiments, the method i) further includes dividing Rep/RepA into two portions, and placing to two ends of a recombinase recognition sequence; and placing a sequence encoding the recombinase to be evolved in the replicon. In some embodiments, the method i) further includes adding a 5′ intron and a 3′ intron between Rep/RepA and the recombinase recognition sequence. The recombinase with the desired function may recognize its specific recognition site, mediate the DNA recombination, and normally express Rep/RepA, thereby the rolling-circle replication is driven to achieve self enrichment; and the recombinase without the desired function cannot mediate the DNA recombination, cannot express Rep/RepA, and cannot achieve the enrichment, thereby the directed evolution of the recombinase is achieved.
- In some embodiments, the genetic element is a prime editing guide RNA (pegRNA). In some embodiments, the method i) further includes inserting a target site at the N-terminus of Rep/RepA so as to frame-shift the open reading frame of Rep/RepA; and inserting an expression cassette of the pegRNA into the geminivirus replicon, and inserting fluorescence reporter systems at its two ends. If the virus replicon undergoes rolling-circle replication under the action of the pegRNA, a fluorescence signal is reported; and if the virus replicon does not undergo rolling-circle replication, there is no fluorescence signal. In some embodiments, when a tobacco leaf is transformed with a low-concentration library, active pegRNA is significantly enriched, because the low concentration guarantees that, for most cells, only one vector enters the cell, so that the screening requirements are satisfied.
- In the context of this description, the term “nuclease” refers to a type of enzyme that catalyzes the hydrolysis of phosphodiester bonds, using a nucleic acid as a substrate. In some embodiments, the desired function of the genetic element is coupled with nuclease expression. In some embodiments, the nuclease is a sequence specific nuclease.
- In the context of this description, the term “secondary replicon” refers to a replicon formed as follows: the primary replicon of the geminivirus undergoes a double-stranded DNA break (DSB) under the action of the sequence specific nuclease, and the break may be linked with a right border (RB) of the plasmid under the guidance of VirD2, whereby a replicon is formed under the action of Rep/RepA. In some embodiments, the directed evolution of the genetic element is accomplished as follows: a genetic element with the desired function activates the expression of the nuclease or guides the nuclease to cut its recognition site, whereby the secondary replicon is formed and rolling-circle replication is driven to achieve self enrichment; a genetic element without the function does not allow the nuclease to cut its recognition site, so the secondary replicon cannot be formed and enrichment cannot be achieved.
- In some embodiments, the genetic element is a DNA binding domain. In some embodiments, the method i) further includes fusing a DNA binding domain library to be evolved with a non-sequence specific nuclease, and placing in the replicon together with the recognition sequence thereof. The DNA binding domain with the desired function may guide the nuclease to cut the target sequence, and generate a secondary replicon under the action of virD2; and the DNA binding domain without the desired function cannot guide the nuclease to cut the target sequence, thereby the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the DNA binding domain is achieved.
- In some embodiments, the genetic element is a sequence encoding a non-sequence specific nuclease.
- In some embodiments, the genetic element is a sequence encoding a transcription activator. In some embodiments, the method i) further includes inserting a recognition sequence of the transcription activator to upstream of the nuclease, and inserting a minimal transcription initiation element between the recognition sequence and the nuclease; and placing a transcription activator library to be evolved in the replicon together with the recognition sequence of the nuclease. The transcription activator with the desired function may activate the expression of the nuclease, which in turn cuts its recognition sequence, and the secondary replicon is formed under the action of virD2; and the transcription activator without the desired function cannot activate the expression of the nuclease, and thereby the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the transcription activator may be achieved.
- In some embodiments, the genetic element is a sequence encoding a recombinase. In some embodiments, the method i) further includes placing a recombinase library to be evolved and a recognition sequence of a nuclease in the replicon; and dividing the nuclease into two portions which are placed at two ends of a recognition sequence of the recombinase. In some embodiments, the method i) further includes adding a 5′ intron and a 3′ intron between the nuclease and the recombinase recognition sequence. The recombinase with the desired function may mediate the DNA recombination to express the nuclease, which in turn cuts its recognition site to generate a secondary replicon; and the recombinase without the desired function cannot mediate the DNA recombination, the nuclease cannot be expressed normally, and the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the recombinase may be achieved.
- In some embodiments, the genetic element is PAM of a Cas protein. In some embodiments, the method further includes placing PAM to be evolved and a target sequence of the Cas protein in the replicon together. PAM that may be recognized by the Cas may generate DSB in the target region to form a secondary replicon, and information of PAM may be preserved in the secondary replicon; and PAM that cannot be recognized by Cas cannot generate a DSB, and the secondary replicon cannot be formed. By detecting the secondary replicon, the directed evolution of PAM may be achieved.
- In some embodiments, the genetic element is a sgRNA. In some embodiments, the method further includes placing sgRNA to be evolved and the target sequence of a Cas protein in the replicon together. sgRNA with the desired activity can guide Cas to cut a target site located at its downstream so as to form a secondary replicon; and sgRNA without the activity cannot generate a DSB, so the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of sgRNA may be achieved.
- In some embodiments, in the step iii), the detecting and selecting of the genetic element mutants enriched in the population of the plant cells may be performed by high-throughput sequencing. For example, the total DNA of the population of the plant cells may be extracted, and high-throughput sequencing may be performed for the genetic element.
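As a minimal sketch of how enriched mutants might be called from such sequencing data, the example below compares each variant's read frequency after selection with its frequency in the initial library and reports a log2 fold change. The function names, the pseudocount, the threshold, and the read counts are all hypothetical choices made for illustration; they are not taken from this description.

```python
import math
from typing import Dict, List


def log2_enrichment(before: Dict[str, int],
                    after: Dict[str, int],
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """log2(frequency after selection / frequency in the initial library).

    A pseudocount (hypothetical choice) avoids division by zero for
    variants that are missing from one of the two samples.
    """
    variants = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(variants)
    total_after = sum(after.values()) + pseudocount * len(variants)
    scores = {}
    for v in variants:
        f_before = (before.get(v, 0) + pseudocount) / total_before
        f_after = (after.get(v, 0) + pseudocount) / total_after
        scores[v] = math.log2(f_after / f_before)
    return scores


def enriched_variants(scores: Dict[str, float],
                      threshold: float = 1.0) -> List[str]:
    """Variants whose log2 enrichment meets the (hypothetical) threshold."""
    return sorted(v for v, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    # Invented read counts, for illustration only.
    initial = {"TAT": 100, "TGT": 100, "CCT": 100}
    selected = {"TAT": 700, "TGT": 50, "CCT": 250}
    print(enriched_variants(log2_enrichment(initial, selected)))
```

Normalizing to the initial library matters because variants are rarely represented equally in the starting pool.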
- In some embodiments, the method further includes a step iv) of identifying the function of the enriched genetic element mutant.
- In one aspect, the present invention provides a genetic element mutant or a coding product thereof obtained by the method of the present invention, and the use of the obtained genetic element mutant or the coding product thereof in plants, especially in plant genetic engineering.
- In one aspect, the present invention provides a kit for implementing the method of the present invention. The kit may include, for example, a vector containing the geminivirus replicon, and/or a vector for expressing the geminivirus Rep and/or RepA protein. The kit may further include a specification for implementing the method of the present invention.
- Further understanding of the present invention may be obtained by referring to some specific embodiments given herein, and these embodiments are only used to describe the present invention, but are not intended to limit the scope of the present invention. Apparently, various modifications and changes may be made to the present invention without departing from the essence of the present invention. Therefore, these modifications and changes are also within a scope of protection claimed in the present application.
- WDV and BeYDV replication subsystems are developed. The two viruses belong to the genus Mastrevirus, have very similar genome structures, and may achieve highly efficient genome amplification in monocotyledons and dicotyledons, respectively.
- 1. Directed Evolution System Based on Primary Replicon
- At present, there are many directed evolution systems with various methods, but the core idea is much the same, namely GOI with a function is enriched, and GOI without the function is filtered out. However, compared with bacteria and yeast, the speed of plant cell division (namely replication of genomic DNA) is very slow, and it is difficult to meet the needs. Therefore, it is expected to use virus replication to replace the plant cell division so as to achieve the enrichment of GOI.
- In a plant geminivirus system, LIRs arranged in tandem on a vector may be recognized by Rep/RepA and cyclized into a primary replicon (PR), which then undergoes rolling-circle replication, so that the copy number may be increased by about 3 orders of magnitude. Here, Rep/RepA is the only protein required for this process. Based on this principle, a screening library of GOI (which may be generated by error-prone PCR or saturation mutagenesis) may be cloned into the geminivirus replicon and supplemented by other elements, so that a desired function of GOI is coupled with the expression of Rep/RepA, and thus a plant in vivo directed evolution system based on the primary replicon of the geminivirus is constructed (as shown in FIG. 4). In this system, a target gene allele with the desired function may directly or indirectly drive the expression of Rep/RepA and is thereby enriched; an allele without the desired function cannot start the expression of Rep/RepA and is not enriched. Then, by deep sequencing, it may be inferred which alleles have the function, and the purpose of evolution is thereby achieved.
- By using the primary replicon, the inventor expects that the directed evolution of genetic elements such as a promoter, a transcription activator, a DNA binding protein, and a recombinase may be achieved. In order to achieve the directed evolution of a promoter, the promoter library to be evolved may be placed upstream of Rep/RepA. A functional promoter may drive the expression of the downstream Rep/RepA, drive its own rolling-circle replication, increase the copy number, and achieve enrichment; a non-functional promoter cannot drive the expression of the downstream Rep/RepA and cannot achieve enrichment, and the directed evolution of the promoter is thereby achieved (as shown in FIG. 5). In the directed evolution of a transcription activator, the recognition sequence of the transcription activator may be inserted upstream of Rep/RepA and supplemented by a minimal transcription initiation element; and a transcription activator library to be evolved is inserted into the replicon. A functional transcription activator may bind to its recognition sequence and activate the expression of the downstream Rep/RepA, so that rolling-circle replication is driven to achieve self enrichment; a non-functional transcription activator cannot activate the downstream Rep/RepA and cannot achieve enrichment, and the directed evolution of the transcription activator is thereby achieved (as shown in FIG. 6). In the directed evolution of a DNA binding domain, the DNA binding domain to be evolved and a transcription activator without sequence specificity may be combined to form a fusion protein and placed in the replicon together; and the target binding sequence of the DNA binding domain is inserted upstream of Rep/RepA and supplemented by a mini-promoter. In this way, a functional DNA binding domain may bind to its target sequence and bring the transcription activator to the mini-promoter, so that the expression of the downstream Rep/RepA is driven and rolling-circle replication is driven to achieve self enrichment; a non-functional DNA binding domain cannot bind to the target sequence, the downstream Rep/RepA cannot be activated, and enrichment cannot be achieved, and the directed evolution of the DNA binding domain is thereby achieved (as shown in FIG. 7). In order to achieve the directed evolution of a recombinase, the recombinase to be evolved may be placed in the replicon; and Rep/RepA is divided into two portions, which are placed at the two ends of the recombinase recognition sequence. In order to guarantee that Rep/RepA may function normally after recombination, a 5′ intron and a 3′ intron may be added between Rep/RepA and the recombinase recognition sequence, so that the recombinase recognition sequence may be spliced out after transcription and Rep/RepA is translated normally. In this way, a functional recombinase may recognize its specific recognition site, mediate the DNA recombination, and allow normal expression of Rep/RepA, so that rolling-circle replication is driven to achieve self enrichment; a non-functional recombinase cannot mediate the DNA recombination, cannot express Rep/RepA, and cannot achieve enrichment, and the directed evolution of the recombinase is thereby achieved (as shown in FIG. 8).
- 2. Directed Evolution System Based on Secondary Replicon
- In a plant genetic transformation system mediated by Agrobacterium tumefaciens, a series of Vir proteins encoded by Agrobacterium tumefaciens may recognize the RB sequence on a Ti plasmid and generate a single-strand DNA nick at a specific position in it. The VirD2 protein may then covalently bind to the 5′ DNA end of the nick and release a T-DNA sequence, which is transferred to the plant cell nucleus. In the cell nucleus, VirD2 may recognize a DSB spontaneously generated in the plant genome and, under the action of a series of host factors, link the T-DNA sequence to it by non-homologous end joining (NHEJ) and other modes, so as to insert the T-DNA sequence into the plant genome.
- Based on this principle, the inventor found from experiments that if a DSB is artificially generated in the geminivirus replicon by using a sequence specific nuclease, the RB region may be linked with the break under the guidance of VirD2, and a secondary replicon (SR) may then be formed under the action of Rep/RepA (as shown in FIG. 9). The inventors subsequently found that this linkage may be divided into two modes, cis-linkage and trans-linkage, and the former is dominant. In general, whether a secondary replicon is generated depends entirely on whether the sequence specific nuclease generates a DSB at a specific position.
- In addition, compared with other directed evolution methods, directed evolution relying on the secondary replicon also has the following advantages. First, in order to guarantee that most cells contain only one vector molecule during the evolution process, the concentration of the target gene library must be kept very low, but this also means that the initial expression of GOI is very low and may not be enough to meet the screening requirements. In directed evolution depending on the secondary replicon, GOI may first undergo a first round of rolling-circle replication under the action of Rep to form the primary replicon; in this process, the copy number of GOI may be increased by three orders of magnitude in a short time period, and the expression is greatly increased, so that it is enough for the next screening step. Second, in the process of directed evolution, the secondary replicon allows for enrichment twice: firstly, the secondary replicon is generated under the action of the sequence specific nuclease; and secondly, under the action of Rep/RepA, the secondary replicon undergoes the second round of rolling-circle replication, and the copy number is greatly increased. All these give the directed evolution system relying on the secondary replicon extraordinary advantages.
- According to this principle, the secondary replicon may be used to screen or evolve sequence specific nucleases in plants with high throughput (as shown in FIG. 10), and may also be used to study the cutting mode of a sequence specific nuclease (such as the PAM of a Cas nuclease) and its guide RNA (such as the sgRNA of Cas9 or the crRNA of Cas12a). In addition, a genetic element that may be coupled with nuclease expression may be evolved with high throughput.
- For example, a DNA binding domain may be evolved. A DNA binding domain library to be evolved is fused with a non-sequence specific nuclease and placed in the replicon together with the target sequence thereof. In this way, a functional DNA binding domain may guide the nuclease to cut the target sequence and generate the secondary replicon under the action of VirD2; a non-functional DNA binding domain cannot guide the nuclease to cut the target sequence, so the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the DNA binding domain is achieved (as shown in FIG. 11). Similarly, the directed evolution of the non-sequence specific nuclease may also be achieved (as shown in FIG. 12). In addition, the secondary replicon may also be used to achieve the directed evolution of a transcription activator. The recognition sequence of the transcription activator may be placed upstream of the nuclease and supplemented by a minimal transcription initiation element (mini-promoter); and a transcription activator library to be evolved is placed in the replicon together with the recognition sequence of the nuclease. In this way, a functional transcription activator may activate the expression of the nuclease, which then cuts its recognition sequence, and the secondary replicon is formed under the action of VirD2; a non-functional transcription activator cannot activate the expression of the nuclease, so the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the transcription activator may be achieved (as shown in FIG. 13). The directed evolution of a recombinase may also be performed by using the directed evolution system relying on the secondary replicon. A recombinase library to be evolved and the recognition sequence of the nuclease are placed in the replicon together, and the nuclease is divided into two portions, which are placed at the two ends of a recognition sequence of the recombinase. In order to guarantee that the nuclease can function normally after recombination, a 5′ intron and a 3′ intron may be added between the nuclease and the recombinase recognition sequence, so that the recombinase recognition sequence may be spliced out after transcription and the nuclease is translated normally. In this way, a functional recombinase may mediate the DNA recombination to express the nuclease, which then cuts its recognition site to generate the secondary replicon; a non-functional recombinase cannot mediate the DNA recombination, the nuclease cannot be expressed normally, and the secondary replicon cannot be generated. By detecting the secondary replicon, the directed evolution of the recombinase may be achieved (as shown in FIG. 14).
- 1. Cultivation of Wheat Seedlings:
- Wheat seeds were planted in a culture room and cultured at a temperature of 25±2° C., an illuminance of 1000 Lx, and 14-16 h/d of light for about 1-2 weeks.
- 2. Protoplast Separation:
- 1) A young wheat leaf was taken, and its middle portion was cut into strips 0.5-1 mm wide, placed in 0.6 M mannitol solution and treated in the dark for 10 minutes, then filtered through a filter screen, placed in 50 ml of enzymolysis solution, and digested at 20-25° C. in the dark for 5 hours (static enzymolysis for 0.5 h, followed by slow shaking at 10 rpm for 4.5 h).
- 2) 10 ml of W5 solution with a pH of 5.7 was added to dilute the enzymatic hydrolysate, which was then filtered through a 75 μm nylon filter membrane into a 50 ml round-bottom centrifuge tube.
- 3) The filtrate was centrifuged at 100 g at 23° C. for 3 min, and the supernatant was discarded.
- 4) The pellet was gently resuspended in 10 ml of the W5 solution and placed on ice for 30 min so that the protoplasts gradually settled, and the supernatant was discarded.
- 5) An appropriate amount of MMG (mannitol, MgCl2, MES) solution was added for resuspension, the protoplast concentration was adjusted to 2×10⁵/ml to 1×10⁶/ml by microscopic examination (×100), and the protoplasts were kept on ice until transformation.
- 3. Wheat Protoplast Transformation
- 1) 20 μg of plasmid was added to a 2 ml centrifuge tube; 200 μl of the protoplast suspension was added with a pipette tip whose end had been cut off and gently mixed; the mixture was left to stand for 3-5 min; 250 μl of polyethylene glycol (PEG) solution was then added and mixed by gentle flicking; and transformation was carried out in the dark for 30 min.
- 2) 900 μl of the W5 solution was added and mixed by inversion at room temperature, the tube was centrifuged at 80 g for 3 min, and the supernatant was discarded.
- 3) 1 ml of the W5 solution was added and mixed by inversion, and the suspension was gently transferred to a 6-well plate to which 1 ml of the W5 solution per well had been added in advance, and cultured at 23° C. for 24-48 h.
- 4. Detection of Amplicon Copy Number by Fluorescent Quantitative PCR
- After 24-48 h of culture, wheat protoplast DNA was extracted, and residual plasmid DNA was digested with DpnI. PCR system: 10 μL of BIO-RAD iTaq Universal SYBR Green Mix, 8.4 μL of diluted DNA template, 0.8 μL of F primer, and 0.8 μL of R primer. qPCR procedure: 95° C. for 30 s, followed by 95° C. for 10 s and 60° C. for 15 s, for 38 cycles. The WDV replicon was amplified with primers WDVLIR-qF/R, and the genomic DNA was amplified with primers TaPDS-qF/R. The Ct values of the qPCR results were converted into absolute concentrations by a standard curve, and the ratio of the two was then calculated to obtain the copy number of the WDV amplicon.
- 5. Cultivation of Tobacco Plants
- A layer of filter paper was placed in a petri dish, and soaked with water, and tobacco seeds were sprinkled on the filter paper, cultured under light at 22° C. for about 5 days. Sprouted seedlings were transplanted into a culture bowl, and cultured for 4 weeks under the conditions of 22±2° C., 1000 Lx illuminance and 14-16 h/d of light.
- 6. Agrobacterium-Mediated Tobacco Transient Transformation
- Agrobacterium tumefaciens carrying the target plasmid was inoculated into Luria-Bertani (LB) medium containing kanamycin and rifampicin and cultured with shaking overnight at 28° C. On the next day, 0.3 ml of the turbid Agrobacterium culture was re-inoculated into 6 ml of fresh medium and cultured with shaking at 28° C. for 4-6 hours. When the culture reached an optical density (OD) of 0.6-1.0, it was centrifuged, and the bacterial cells were collected and resuspended in tobacco infection solution; the OD was adjusted to the target concentration (which should not exceed 1.6), and the suspension was incubated in the dark at room temperature for 30 min to 3 h. Flat, healthy tobacco leaves were selected and infiltrated with the incubated bacterial suspension using a syringe. Samples were taken for analysis after 48-96 h.
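The OD adjustment step can be sketched as a simple proportional calculation. This is an illustration only: it assumes that OD scales linearly with cell density and that all cells are recovered in the pellet, and the function name and default values are hypothetical. The 1.6 upper limit on the target OD is the one stated above.

```python
def resuspension_volume_ml(culture_volume_ml: float,
                           measured_od: float,
                           target_od: float,
                           max_target_od: float = 1.6) -> float:
    """Volume of infection solution needed to resuspend the pellet.

    Assumes OD is proportional to cell density and that no cells are
    lost during centrifugation (both are simplifying assumptions).
    """
    if culture_volume_ml <= 0 or measured_od <= 0 or target_od <= 0:
        raise ValueError("volumes and OD values must be positive")
    if target_od > max_target_od:
        raise ValueError(f"target OD should not exceed {max_target_od}")
    return culture_volume_ml * measured_od / target_od


if __name__ == "__main__":
    # Hypothetical numbers: 6 ml of culture at OD 0.8, target OD 0.4.
    print(resuspension_volume_ml(6.0, 0.8, 0.4))
```

In practice the OD of the resuspended cells would still be measured and adjusted, since pellet recovery is never complete.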
- 7. Deep Sequencing Method
- 1) The protoplast DNA was extracted, and residual plasmid DNA was then digested with DpnI.
- 2) In Example 1, primers 35Sp-200F and WDV-Rep-5R were used to perform the first round of PCR amplification on the DNA, and barcode primers ngs35Sp-300F and ngsWDV-Rep-100R were used to perform the second round of amplification on the first-round PCR product. In Example 2, primers 35Sp-200F and WDV-Rep-100R were used to perform the first round of PCR amplification on the DNA, and barcode primers ngs35Sp-250F and ngsWDV-Rep-50R were used to perform the second round of amplification on the first-round PCR product.
- 3) The second-round PCR products were recovered, mixed in equal proportions across all treatments, and sent to a company for library construction and deep sequencing.
- Components of each solution used in the above methods are as follows:
- 50 ml of Enzymatic Hydrolysate

Reagent | Added amount | Final concentration
Cellulase R10 | 0.75 g | 1.5%
Macerozyme R10 | 0.375 g | 0.75%
Mannitol | 5.4651 g | 0.6 M
2-(N-morpholino)ethanesulfonic acid (MES) | 0.1066 g | 10 mM

The volume is 50 ml, and the pH is adjusted to 5.7 with KOH; the solution is incubated in a 55° C. water bath for 10 minutes and cooled naturally to room temperature, and then the following are added:

Reagent | Added amount | Final concentration
CaCl2 | 0.0735 g | 10 mM
Bovine serum albumin (BSA) | 0.05 g | 0.1%

The solution is filtered through a 0.45 μm filter membrane.

- 500 ml of W5

Reagent | Added amount | Final concentration
NaCl | 4.5 g | 154 mM
CaCl2 | 9.189 g | 125 mM
KCl | 0.1864 g | 5 mM
MES | 0.2132 g | 2 mM

The volume is 500 ml, and the pH is adjusted to 5.7 with NaOH.

- 10 ml of MMG Solution

Reagent | Added amount | Final concentration
Mannitol (0.8 M) | 5 ml | 0.4 M
MgCl2 (1 M) | 0.15 ml | 15 mM
MES (200 mM) | 0.2 ml | 4 mM
DDW | To 10 ml |

- 4 ml of PEG Solution

Reagent | Added amount | Final concentration
PEG4000 | 1.6 g | 40%
Mannitol (0.8 M) | 1 ml | 0.2 M
CaCl2 (1 M) | 0.4 ml | 0.1 M
DDW | To 4 ml |

- 5. Tobacco Infection Solution

Reagent | Final concentration
MgCl2 | 10 mM
MES | 10 mM
Acetosyringone | 150 μM

- The 20th amino acid of wild-type WDV Rep/RepA is a tyrosine, and this amino acid is highly conserved in this genus. Previous research shows that the Y20C mutant of Rep/RepA does not have replication initiation activity. In order to verify whether the directed evolution system of the present invention can work, the inventors firstly attempted to evolve the 20th amino acid of Rep/RepA.
- First, the codon for the 20th amino acid of Rep/RepA was converted from TAT to NNN by PCR, and a library with a diversity of 64 (4³) was obtained by cloning the PCR fragment into the geminivirus vector (as shown in FIG. 15). The library was then transformed into wheat protoplasts at different concentration gradients (10 μg to 0.1 ng, divided into 10 concentration gradients). After 48 h, protoplast DNA was extracted and the site was deep sequenced. Sequencing results for each concentration were compared with those of the initial library.
- It was found from the results that 12 of the 64 alleles contained in the library were enriched, encoding proline, glutamine, arginine, leucine, tyrosine, lysine and alanine respectively (as shown in FIG. 16). That is, 35% (7 of 20) of the encoded amino acids screened positive, which did not meet the initial expectations. It was speculated that the reason is that the expression level of Rep/RepA is positively correlated with the copy number produced by rolling-circle replication. As the amount of library used in protoplast transformation is gradually reduced, the proportion of cells transformed with only one molecule increases, but the amount of Rep/RepA expressed from a single molecule is relatively low and is not enough to drive highly efficient rolling-circle replication of the replicon. However, if the amount of library used is relatively high, the probability that a functional allele and a non-functional allele are co-transformed into the same cell increases, which may cause amplification of the non-functional allele. This may result in very high background noise.
- Based on this result, the inventors hoped to find a replication enhancer. On the one hand, this replication enhancer should not independently initiate rolling-circle replication; on the other hand, when the Rep/RepA expression level is relatively low, the presence of this replication enhancer should greatly increase the copy number of the replicon. Previous research shows that Rep/RepA of the geminivirus is a multifunctional protein: in addition to initiating rolling-circle replication, it is also an inhibitor of post-transcriptional gene silencing (PTGS) and a transcription activator of viral coding genes, and it may interact with endogenous proteins so that mature cells regain the ability to replicate DNA at a high level, among other functions. Based on this assumption, the inventors attempted to find a mutant of Rep/RepA that, on the one hand, can no longer initiate rolling-circle replication and, on the other hand, retains its other functions, so that it can be used as the replication enhancer.
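The dosage trade-off described above (too little library leaves cells with a single, weakly expressing molecule, while too much library puts functional and non-functional alleles in the same cell) can be illustrated with a simple model. The sketch below assumes, purely for illustration, that the number of library molecules taken up by each protoplast follows a Poisson distribution with mean `mean_uptake`. The Poisson model, the function names and the example values are assumptions of this sketch and are not taken from the experiments.

```python
import math


def uptake_fractions(mean_uptake: float) -> dict:
    """Fractions of cells holding 0, exactly 1, or >= 2 library molecules.

    Assumes Poisson-distributed uptake (an illustrative assumption).
    """
    p_none = math.exp(-mean_uptake)
    p_single = mean_uptake * p_none
    return {
        "none": p_none,
        "single": p_single,
        "multiple": 1.0 - p_none - p_single,
    }


def single_share_of_transformed(mean_uptake: float) -> float:
    """Among cells that received at least one molecule, the share with exactly one."""
    fractions = uptake_fractions(mean_uptake)
    return fractions["single"] / (1.0 - fractions["none"])


if __name__ == "__main__":
    # Hypothetical mean uptake values spanning a dilution series.
    for mean in (0.1, 1.0, 10.0):
        share = single_share_of_transformed(mean)
        print(f"mean uptake {mean:>4}: single-molecule share = {share:.3f}")
```

Under this assumed model, lowering the library dose raises the share of single-molecule cells among transformed cells, and raising it increases co-transformation, which matches the qualitative reasoning given for the high background.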
- First, the inventors constructed several Rep/RepA mutants, including Y106H, K229E, Y20C, H59R, E198A and H91R, and the replication initiation activity of these mutants was then measured by fluorescent quantitative PCR. The results show that, except for H91R, none of the mutants has any replication initiation activity. Then, to screen for a suitable replication enhancer, the amount of Rep/RepA plasmid added was reduced from the original 10 μg to 100 ng; at this concentration, the copy number of the replicon reaches only a low level. In this case, the addition of a replication enhancer should greatly increase the copy number. It was found that RepA, Rep/RepA K229E and Y20C can be used as replication enhancers. In subsequent experiments, Rep/RepA Y20C was used as the replication enhancer.
- The evolution experiment on the 20th codon of Rep/RepA was performed again. After the replication enhancer was added, 4 alleles were enriched, encoding phenylalanine and tyrosine (as shown in FIG. 17). In addition to the expected tyrosine, phenylalanine was also enriched. It is speculated that the reason is that the motif containing Rep/RepA Y20 needs to interact with DNA, and phenylalanine, like tyrosine, has an aromatic side chain that can form π-π stacking with a DNA base. This work shows that the system of the present invention works.
- In this experiment, it was found that the amount of library added is also a key factor affecting the evolution system. Across the series of concentration gradients, the screening effect was best when the dilution ratio reached 5-10, namely when the added amount was about 1×10⁻¹³ mol to 1×10⁻¹⁵ mol, which is 10³ to 10⁵ times the protoplast amount.
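The quantities quoted in this example can be checked with a few lines of arithmetic. The sketch below converts the stated library amounts to molecule counts, derives the protoplast number they imply, and computes the diversity of a degenerate site. The function names are hypothetical, and pairing 1×10⁻¹³ mol with the 10⁵-fold excess and 1×10⁻¹⁵ mol with the 10³-fold excess is an assumption of this sketch, since the text does not state the pairing explicitly.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_from_moles(moles: float) -> float:
    """Number of vector molecules in a given molar amount."""
    return moles * AVOGADRO


def implied_cell_count(moles: float, fold_excess: float) -> float:
    """Protoplast number implied by a library amount and its fold excess over cells."""
    return molecules_from_moles(moles) / fold_excess


def degenerate_diversity(pattern: str) -> int:
    """Theoretical diversity of a site in which each 'N' is any of the four bases."""
    return 4 ** pattern.upper().count("N")


if __name__ == "__main__":
    print("NNN diversity:", degenerate_diversity("NNN"))
    for moles, fold in ((1e-13, 1e5), (1e-15, 1e3)):
        cells = implied_cell_count(moles, fold)
        print(f"{moles:.0e} mol at {fold:.0e}-fold excess implies {cells:.2e} protoplasts")
```

Under that assumed pairing, both ends of the range imply roughly 6×10⁵ protoplasts per transformation, so the two stated figures are mutually consistent.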
- The CaMV 35S promoter is a constitutive promoter commonly used in plants. It has a TATA-box motif about 30 bp upstream of its transcription initiation site, with the sequence 5′-ctatataag-3′. Most eukaryotic RNA polymerase II promoters have a TATA-box element, which is critical to the transcriptional activity of the promoter. In this Example, the aim was to evolve the CaMV 35S promoter TATA-box and simultaneously study the effect of the element sequence on the activity of the CaMV 35S promoter.
- Firstly, the activity of the CaMV 35S promoter was coupled with the expression of Rep/RepA. The CaMV 35S promoter was used to drive expression of WDV Rep/RepA, and the two were placed in the replicon together.
- The second step was to construct a screening library. The TATA-box of the CaMV 35S promoter was converted from CTATATAAG to CNNNNNNNG by PCR, and the PCR fragments were then cloned into the geminivirus vector to obtain a library with a theoretical diversity of 16384 (4⁷) (as shown in FIG. 18). The library was then transformed into wheat protoplasts at different concentrations. After 48 h, protoplast DNA was extracted and the site was sequenced. Sequencing results for each concentration were compared with those of the initial library.
- Sanger sequencing and amplicon sequencing showed that, in the library with the sequence CNNNNNNNG, the sequence corresponding to the original TATA-box was enriched after screening, so that almost all the sequences became CTATATAAG (as shown in FIG. 19). This was consistent with the expected results.
- The prime editing system is a gene editing system that can perform any base replacement and small fragment deletions or insertions. It includes two portions, namely an nCas9 (H840A) protein fused with an M-MLV reverse transcriptase, and a prime editing guide RNA (pegRNA). The prime editing system works relatively efficiently in yeast and animal cells, but it is very inefficient in plants and has very strong site specificity. The pegRNA includes 4 portions: a spacer responsible for recognizing the target site; a scaffold responsible for binding nCas9; a primer binding site (PBS), which serves as a primer and is responsible for complementing the 5′ end sequence of the nCas9 nick; and a reverse transcription template (RT), responsible for repairing the 3′ end of the nick into a given sequence. For a given target site, the spacer and scaffold regions are fixed, but the PBS and RT regions are highly variable, and the length and sequence of these two regions have a great impact on the efficiency of prime editing. Therefore, it is hoped to establish a high-throughput plant system capable of screening pegRNAs.
- First, the function of the pegRNA needs to be coupled with the expression of Rep/RepA. The target site is inserted at the N-terminus of Rep/RepA, resulting in a frameshift of the Rep/RepA open reading frame. When the pegRNA is absent or ineffective, Rep/RepA cannot be expressed. However, if the pegRNA is active, it can introduce a short insertion or deletion at the target site, so that Rep/RepA is correctly expressed. Based on this principle, an expression cassette of the pegRNA is inserted into the geminivirus replicon, and a fluorescence reporter system is inserted at its two ends (as shown in FIG. 20). If the virus replicon undergoes rolling-circle replication under the action of the pegRNA, a fluorescence signal is reported; if it does not, there is no fluorescence signal.
- To verify whether the system works, two vectors were constructed: one contains a known active pegRNA, and the other contains a mutation in the PBS of the pegRNA that results in loss of activity. The two vectors were mixed in equal proportion to form a library and transformed into tobacco leaves at different concentrations. After 6 days, DNA was extracted and analyzed. The inventors found that when the tobacco leaves were transformed with a high concentration of the library, the active pegRNA was not enriched, whereas when the tobacco leaves were transformed with a low concentration of the library, the active pegRNA was significantly enriched (as shown in FIG. 21). This meets the expectations and proves that the system works.
- In the field of genome editing, researchers have already found many Cas proteins with nuclease activity (including Cas9, Cas12a, Cas12b and the like). One characteristic of these Cas proteins is that a specific sequence, called the protospacer adjacent motif (PAM), is required upstream or downstream of the target cutting site. Different Cas proteins have different PAM sequences.
- In previous research, a Cas12a protein called FbCas12a was found in Flavobacterium branchiophilum by bioinformatics analysis. To apply it in biotechnology, its PAM sequence must first be determined. For this reason, a library containing 4096 PAMs was constructed (as shown in FIG. 22).
- If a certain PAM can be recognized by FbCas12a, the latter generates a double-strand break (DSB) in the target site region, a secondary replicon then forms, and the PAM information is retained in the secondary replicon; if a certain PAM cannot be recognized by FbCas12a, no DSB is generated and the PAM is not retained in a secondary replicon. Therefore, by specifically detecting the secondary replicon, it can be determined which PAMs are recognized by FbCas12a (as shown in FIG. 23).
- Two target sites, OsEPSPSC3 and c5, were selected for testing. The tests showed that the PAM recognized by FbCas12a is ‘TTT’ (as shown in FIG. 24). At other positions, no apparent base preference was found (as shown in FIG. 25).
- A Cas protein needs an RNA to guide its nucleic acid cutting, called sgRNA (Cas9) or crRNA (Cas12a, Cas12b). The sequence and structure of this RNA greatly affect the activity of the Cas protein. Therefore, it is hoped to establish a system that can screen sgRNAs quickly and in high throughput.
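The PAM screen described above amounts to tabulating which PAM variants are recovered from the secondary replicon and looking for a base preference at each position. The following is a minimal sketch of that tabulation. It assumes a six-nucleotide randomized PAM (4⁶ = 4096, which matches the stated library size, although the actual layout of the randomized region is not given here), and it uses a made-up recovered set in which the last three positions are fixed to TTT. Function names, the position of the TTT motif and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def enumerate_pams(length: int) -> list:
    """All possible PAM sequences of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def position_frequencies(pams: list) -> list:
    """Per-position base frequencies across a list of equal-length PAMs."""
    table = []
    for i in range(len(pams[0])):
        counts = Counter(p[i] for p in pams)
        total = sum(counts.values())
        table.append({b: counts.get(b, 0) / total for b in BASES})
    return table


if __name__ == "__main__":
    library = enumerate_pams(6)  # assumed 6-nt randomized region
    # Toy stand-in for PAMs detected in the secondary replicon (not real data).
    recovered = [p for p in library if p.endswith("TTT")]
    for pos, freqs in enumerate(position_frequencies(recovered), start=1):
        print(pos, {b: round(f, 2) for b, f in freqs.items()})
```

In real data the recovered set would come from sequencing reads of the secondary replicon, and the frequencies would usually be normalized against the initial library before drawing conclusions.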
- Similar to Example 4, vectors were designed as shown in FIG. 26. If a certain sgRNA is active, it can guide Cas9 to cut the target site located downstream of it, so that a secondary replicon is formed; if a certain sgRNA is inactive, no DSB is generated and no secondary replicon forms. Based on this principle, two vectors were constructed: one contains an active sgRNA, and the other contains an inactive sgRNA. After 5 days of screening, it was found that the active sgRNA was significantly enriched in the secondary replicon, and the enrichment was related to the library concentration (as shown in FIG. 27). This proves that the system works.
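In each of the examples above, enrichment is judged by comparing sequencing results after screening with those of the initial library. One common way to express such a comparison is a log2 ratio of variant frequencies, sketched below. The scoring formula, the pseudocount, the function names and the read counts are illustrative assumptions and are not the analysis or data reported here.

```python
import math


def frequencies(counts: dict) -> dict:
    """Convert read counts to fractions of the total."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def log2_enrichment(initial: dict, selected: dict, pseudocount: float = 1.0) -> dict:
    """log2(frequency after screening / frequency in the initial library).

    A pseudocount is added to every variant so that variants missing from
    one sample do not cause division by zero.
    """
    variants = set(initial) | set(selected)
    init = frequencies({v: initial.get(v, 0) + pseudocount for v in variants})
    sel = frequencies({v: selected.get(v, 0) + pseudocount for v in variants})
    return {v: math.log2(sel[v] / init[v]) for v in variants}


if __name__ == "__main__":
    # Hypothetical read counts for a two-member library mixed 1:1.
    initial_reads = {"active_sgRNA": 500, "inactive_sgRNA": 500}
    selected_reads = {"active_sgRNA": 900, "inactive_sgRNA": 100}
    for variant, score in sorted(log2_enrichment(initial_reads, selected_reads).items()):
        print(f"{variant}: {score:+.2f}")
```

A positive score marks a variant that is more frequent after screening than in the initial library, and a negative score marks one that was depleted.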
Sequence listing
>SEQ ID NO 1 WDV-LIR
GGTAGTGAACAGAAGTCCGGCAGGTCCTTAGCGAAAAAACGGGGTGTGC
CAGAAAACTCTATCCTCTACCCTGCGTGGAGGTGTGAATTCTGCACACT
GCAAATGCAATGTGTCCAATGCTTTATATAGGGCAGGTTTTGGCGGGAG
AACAGGGCCCTAGTGTTCCCACGGTAGCGTAGCGAATCGTGTGGGCCCT
GTTCGGTGTGCGGTCGGGGGGCCTCCACGCGGGTTATAATATTACCCCG
CGTGGTGGCCCCCGACGCGCACTCGGCTTTTCGTGAGTGCGCGGAGGCT
TTTGGACCACATCTTTTCTGATCACTTTCGTGGAAGATGTTGATTTATC
ACACTTTTGACGGGGAAATCTGTGCCATGCCTTAGCTTATAAGGAAGTG
CGTGGTAGCCCATCTCG
>SEQ ID NO 2 WDV-SIR
TAAAATAATATTTTATTTATCTCATGTCATTCGATTACAGAGGCTCGGC
TACGAGCAAAGACAAACCAAATATAACAAACAACAACCCTTACACAATG
ACATCGGAAAACGAAATACAACACCCTGAGATATTACATTTATAGAAAC
TGTACGCCGTCCGCGCTAGGACAG
>SEQ ID NO 3 WDV-Rep
MASSSAPRFRVYSKYLFLTYPQCTLEPQYALDSLRTLLNKYEPLYIAAV
RELHEDGSPHLHVTVQNKLRASITNPNALNLRMDTSPFSIFHPNIQAAK
DCNQVRDYITKEVDSDVNTAEWGTFVAVSTPGRKDRDADMKQIIESSSS
REEFLSMVCNRFPFEWSIRLKDFEYTARHLFPDPVATYTPEFPTESLIC
HETIESWKNEHLYSESPGRHKSIYICGPTRTGKTSWARSLGTHNYYNSL
VDFTTYDVNAKYNIIDDIPFKFTPNWKCFVGAQRDFTVNPKYGKRKVIR
GGIPCIILVNPDEDWLKDMTPEQSDYMYSNTVVHYMYEGETFINYSFAS
GEDVTASQ*
>SEQ ID NO 4 WDV-Rep Y20C
MASSSAPRFRVYSKYLFLTCPQCTLEPQYALDSLRTLLNKYEPLYIAAV
RELHEDGSPHLHVLVQNKLRASITNPNALNLRMDTSPFSIFHPNIQAAK
DCNQVRDYITKEVDSDVNTAEWGTFVAVSTPGRKDRDADMKQIIESSSS
REEFLSMVCNRFPFEWSIRLKDFEYTARHLFPDPVATYTPEFPTESLIC
HETIESWKNEHLYSESPGRHKSIYICGPTRTGKTSWARSLGTHNYYNSL
VDFTTYDVNAKYNIIDDIPFKFTPNWKCFVGAQRDFTVNPKYGKRKVIR
GGIPCIILVNPDEDWLKDMTPEQSDYMYSNTVVHYMYEGETFINYSFAS
GEDVTASQ*
>SEQ ID NO 5 WDV-RepA
MASSSAPRFRVYSKYLFLTYPQCTLEPQYALDSLRTLLNKYEPLYIAAV
RELHEDGSPHLHVLVQNKLRASITNPNALNLRMDTSPFSIFHPNIQAAK
DCNQVRDYITKEVDSDVNTAEWGTFVAVSTPGRKDRDADMKQIIESSSS
REEFLSMVCNRFPFEWSIRLKDFEYTARHLFPDPVATYTPEFPTESLIC
HETIESWKNEHLYSVSLESYILCTSTPADQAQSDLEWMDDYSRSHRGGI
SPSTSAGQPEQERLPGQGL*
>SEQ ID NO 6 WDV-RepA Y20C
MASSSAPRFRVYSKYLFLTCPQCTLEPQYALDSLRTLLNKYEPLYIAAV
RELHEDGSPHLHVLVQNKLRASITNPNALNLRMDTSPFSIFHPNIQAAK
DCNQVRDYITKEVDSDVNTAEWGTFVAVSTPGRKDRDADMKQIIESSSS
REEFLSMVCNRFPFEWSIRLKDFEYTARHLFPDPVATYTPEFPTESLIC
HETIESWKNEHLYSVSLESYILCTSTPADQAQSDLEWMDDYSRSHRGGI
SPSTSAGQPEQERLPGQGL*
Claims (54)
1. A method for directed evolution of a genetic element to obtain a mutant of the genetic element with a desired function, wherein the method comprises:
i) providing a library of the mutants of the genetic element, wherein it contains a plurality of the mutants of the genetic element respectively inserted into a vector containing a geminivirus replicon, and wherein the mutant is inserted into the geminivirus replicon so that while the geminivirus replicon is replicated, the mutant is amplified,
ii) transforming a population of plant cells with the library, and
iii) culturing the population of plant cells, detecting and selecting the genetic element mutant enriched in the population of plant cells,
wherein the replication level of the geminivirus replicon in the plant cells is configured to be associated with the desired function of the genetic element mutant.
2. The method according to claim 1, wherein the genetic element is selected from a protein coding sequence; a functional RNA coding sequence, such as tRNA and siRNA coding sequences; and an expression regulatory sequence such as a promoter sequence, an enhancer sequence, and a terminator sequence.
3. The method according to claim 1 or 2, wherein the genetic element is derived from a plant, or is expected to be applied in a plant.
4. The method according to any one of claims 1-3, wherein the library of the mutants of the genetic element is obtained by respectively inserting the plurality of the mutants of the genetic element into a vector containing the geminivirus replicon.
5. The method according to claim 4, wherein the plurality of the mutants of the genetic element is generated by random mutagenesis of the genetic element.
6. The method according to any one of claims 1-3, wherein the library is generated by performing random mutagenesis on the genetic element that has been inserted into the vector containing the geminivirus replicon.
7. The method according to any one of claims 1-6, wherein the geminivirus is a wheat dwarf virus (WDV).
8. The method according to any one of claims 1-6, wherein the geminivirus is a bean yellow dwarf virus (BeYDV).
9. The method according to any one of claims 1-8, wherein the vector containing the geminivirus replicon is a circular DNA, such as a plasmid or a minicircle DNA.
10. The method according to any one of claims 1-9, wherein the vector containing the geminivirus replicon contains at least one large intergenic region (LIR), for example, the LIR comprises the nucleotide sequence shown in SEQ ID NO: 1.
11. The method according to any one of claims 1-10, wherein the vector containing the geminivirus replicon contains at least one small intergenic region (SIR), for example, the SIR comprises the nucleotide sequence shown in SEQ ID NO: 2.
12. The method according to any one of claims 1-11, wherein the vector containing the geminivirus replicon contains one LIR.
13. The method according to any one of claims 1-11, wherein the vector containing the geminivirus replicon contains two LIRs.
14. The method according to any one of claims 1-13, wherein in the vector containing the geminivirus replicon, the inserted mutant of the genetic element is operably linked with an expression regulatory sequence.
15. The method according to any one of claims 1-14, wherein the vector containing the geminivirus replicon further contains an expression cassette of a geminivirus Rep and/or RepA protein.
16. The method according to any one of claims 1-14, wherein the vector containing the geminivirus replicon does not contain an expression cassette of a geminivirus Rep and/or RepA protein.
17. The method according to claim 16, wherein the method further comprises introducing another vector for expressing the geminivirus Rep and/or RepA protein into the plant cells, for example, the population of plant cells are co-transformed with the another vector for expressing a geminivirus Rep and/or RepA protein together with the library.
18. The method according to claim 16, wherein the plant cell already contains a vector for expressing a geminivirus Rep and/or RepA protein, and/or the genome of the plant cell is already integrated with an expression cassette of a geminivirus Rep and/or RepA protein.
19. The method according to any one of claims 1-18, wherein the geminivirus Rep protein comprises the amino acid sequence shown in SEQ ID NO: 3, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 3, preferably comprises the amino acid sequence shown in SEQ ID NO: 4.
20. The method according to any one of claims 1-18, wherein the geminivirus RepA protein comprises the amino acid sequence shown in SEQ ID NO: 5, or comprises an amino acid sequence with amino acid substitution K229E or Y20C relative to SEQ ID NO: 5, preferably comprises the amino acid sequence shown in SEQ ID NO: 6.
21. The method according to any one of claims 1-20, wherein the activity or expression level of the geminivirus Rep and/or RepA protein in the plant cell is configured to be associated with the desired function of the genetic element mutant.
22. The method according to any one of claims 1-21, wherein in the transformation of the step ii), the number of vector molecules containing the mutants in the library is 10³ to 10⁵ times the number of the cells in the population of plant cells.
23. The method according to any one of claims 1-22, wherein the expression or activity of the geminivirus Rep and/or RepA protein in the plant cell is coupled with the desired function of the genetic element mutant, thereby the directed evolution of the genetic element is achieved.
24. The method according to any one of claims 1-23, wherein the genetic element mutant with the desired function activates Rep/RepA expression, drives the rolling-circle replication and thus achieves self-enrichment; while the genetic element mutant without the desired function cannot activate the Rep/RepA expression such that enrichment cannot be achieved.
25. The method according to claim 1, wherein the genetic element is a promoter.
26. The method according to claim 25, wherein the method further comprises placing a promoter library to be evolved to upstream of Rep/RepA in the replicon.
27. The method according to claim 25 or 26, wherein the genetic element is a cauliflower mosaic virus (CaMV) 35S promoter TATA-box.
28. The method according to claim 1, wherein the genetic element is a sequence encoding a transcription activator.
29. The method according to claim 28, wherein the method further comprises inserting a recognition sequence of the transcription activator to upstream of Rep/RepA, and inserting a minimal promoter between the recognition sequence and the Rep/RepA; and placing a library of the transcription activator to be evolved in the replicon.
30. The method according to claim 1, wherein the genetic element is a DNA binding domain.
31. The method according to claim 30, wherein the method further comprises inserting a target binding sequence of the DNA binding domain to upstream of Rep/RepA, and inserting the minimal transcription initiation element between the target binding sequence and Rep/RepA; and placing a fusion protein formed by the DNA binding domain to be evolved and a transcription activator without sequence specificity in the replicon.
32. The method according to claim 1, wherein the genetic element is a sequence encoding a recombinase.
33. The method according to claim 32, wherein the method further comprises dividing Rep/RepA into two portions which are placed at two ends of a recombinase recognition sequence; and placing the sequence encoding the recombinase to be evolved in the replicon.
34. The method according to claim 32 or 33, wherein the method further comprises adding a 5′ intron and a 3′ intron between Rep/RepA and the recombinase recognition sequences.
35. The method according to claim 1, wherein the genetic element is a prime editing guide RNA (pegRNA).
36. The method according to claim 35, wherein the method further comprises inserting a target site at the N-terminus of Rep/RepA which results in frame-shifting of the open reading frame of Rep/RepA; and inserting an expression cassette of pegRNA into the geminivirus replicon, and inserting a fluorescence reporter system to two ends thereof.
37. The method according to claim 1, wherein the desired function of the genetic element is coupled with expression of a nuclease.
38. The method according to claim 37, wherein the nuclease is a specific nuclease.
39. The method according to claim 37 or 38, wherein the genetic element with the desired function activates the expression of the nuclease or guides the nuclease to cut its recognition site to allow the rolling-circle replication to achieve the self-enrichment; and the genetic element without the desired function cannot allow the nuclease to cut its recognition site to achieve the enrichment, thereby the directed evolution of the genetic element is achieved.
40. The method according to any one of claims 37-39, wherein the genetic element is a DNA binding domain.
41. The method according to claim 40, wherein the method further comprises fusing the DNA binding domain to be evolved to a non-sequence specific nuclease, and placing it in the replicon together with its recognition sequence.
42. The method according to any one of claims 37-39, wherein the genetic element is a sequence encoding a non-sequence specific nuclease.
43. The method according to any one of claims 37-39, wherein the genetic element is a sequence encoding a transcription activator.
44. The method according to claim 43, wherein the method further comprises inserting a recognition sequence of the transcription activator to upstream of the nuclease, and inserting a minimal promoter between the recognition sequence and the nuclease; and placing a library of the transcription activator to be evolved in the replicon together with the recognition sequence of the nuclease.
45. The method according to any one of claims 37-39, wherein the genetic element is a sequence encoding a recombinase.
46. The method according to claim 45, wherein the method further comprises placing a library of the recombinase to be evolved and the recognition sequence of a nuclease in the replicon; and dividing the nuclease into two portions which are placed at two ends of the recombinase recognition sequence.
47. The method according to claim 45 or 46, wherein the method further comprises adding a 5′ intron and a 3′ intron between the nuclease and the recombinase recognition sequence.
48. The method according to any one of claims 37-39, wherein the genetic element is a protospacer adjacent motif (PAM) of a Cas protein.
49. The method according to claim 48, wherein the method further comprises placing sgRNA to be evolved and the target sequence of the Cas protein in the replicon together.
50. The method according to any one of claims 37-39, wherein the genetic element is sgRNA.
51. The method according to claim 50, wherein the method further comprises placing sgRNA to be evolved and the target sequence of the Cas protein in the replicon together.
52. The method according to any one of claims 1-51, wherein in step iii), the detecting and selecting of the genetic element mutant enriched in the population of plant cells is performed by high-throughput sequencing.
53. The method according to any one of claims 1-52, wherein it further comprises a step iv) of identifying the function of the enriched genetic element mutant.
54. The method according to any one of claims 1-53, wherein the plant is a monocotyledon or a dicotyledon, for example, it is selected from corn, wheat, rice, barley, sorghum, kidney bean, beet, tomato, cassava, cucumber, arabidopsis and tobacco.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010376936 | 2020-05-07 | ||
CN202010376936.X | 2020-05-07 | ||
PCT/CN2021/083949 WO2021223545A1 (en) | 2020-05-07 | 2021-03-30 | Directed evolution method based on primary and secondary replicon of gemini virus |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230235317A1 true US20230235317A1 (en) | 2023-07-27 |
Family
ID=78468628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/923,264 Pending US20230235317A1 (en) | 2020-05-07 | 2021-11-11 | Directed evolution method based on primary and secondary replicon of gemini virus |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230235317A1 (en) |
EP (1) | EP4148129A4 (en) |
CN (1) | CN115516089A (en) |
WO (1) | WO2021223545A1 (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110804618B (en) * | 2019-11-25 | 2022-11-11 | 东北电力大学 | Directed evolution construction of ROMT mutant for efficiently converting resveratrol to produce pinqi |
-
2021
- 2021-03-30 WO PCT/CN2021/083949 patent/WO2021223545A1/en unknown
- 2021-03-30 CN CN202180033728.2A patent/CN115516089A/en active Pending
- 2021-03-30 EP EP21799929.1A patent/EP4148129A4/en active Pending
- 2021-11-11 US US17/923,264 patent/US20230235317A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
EP4148129A1 (en) | 2023-03-15 |
EP4148129A4 (en) | 2024-05-29 |
CN115516089A (en) | 2022-12-23 |
WO2021223545A1 (en) | 2021-11-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: APPLICATION UNDERGOING PREEXAM PROCESSING |
|
AS | Assignment |
Owner name: SUZHOU QI BIODESIGN BIOTECHNOLOGY COMPANY LIMITED, CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GAO, CAIXIA;ZHU, HAOCHENG;REEL/FRAME:061888/0088 Effective date: 20221112 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |