CN116987715B

CN116987715B - Artificial gene driving system

Info

Publication number: CN116987715B
Application number: CN202311247476.0A
Authority: CN
Inventors: 刘洋; 焦丙可; 钱文峰
Original assignee: Institute of Genetics and Developmental Biology of CAS
Current assignee: Institute of Genetics and Developmental Biology of CAS
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2024-01-30
Anticipated expiration: 2043-09-25
Also published as: CN116987715A

Abstract

The invention belongs to the field of biotechnology. In particular, the present invention relates to an artificial gene drive system. More specifically, the present invention relates to an artificial gene drive system for plants based on a toxin-antidote mechanism, in which a gene editing system that disables a protein essential for pollen tube development functions as the toxin, and a recoded coding sequence of that protein, which encodes the wild-type protein and cannot be targeted by the gene editing system, functions as the antidote. The artificial gene drive system of the present invention can be used to spread traits beneficial to humans into wild populations.

Description

Artificial gene driving system

Technical Field

The invention belongs to the field of biotechnology. In particular, the present invention relates to an artificial gene drive system. More particularly, the present invention relates to an artificial gene drive system for plants based on a toxin-antidote mechanism.

Background

Genetic manipulation of wild populations is of great significance for meeting diverse challenges such as controlling disease vectors (e.g., mosquitoes), protecting biodiversity, and mitigating agricultural pest outbreaks. However, under classical Mendelian inheritance and Darwinian selection, traits beneficial to humans are often difficult to spread widely in these populations, because they tend to be selectively neutral or even detrimental to the organism itself. Nevertheless, selfish genetic elements are widespread in nature and can be transmitted to offspring at frequencies exceeding those predicted by Mendel's laws (> 50%, taking diploid heterozygotes as an example), thereby gaining an advantage for themselves. Inspired by these natural processes, artificially designed gene drive systems have been developed that aim to spread genetic alterations through populations regardless of the possible fitness cost to the individual organism. This technology therefore has the potential to spread traits beneficial to humans into wild populations, providing an attractive tool for addressing the aforementioned global challenges.

Various synthetic gene drive systems have been implemented in yeast ^1-2, mosquitoes ^3-7, Drosophila ^8-10 and mice ^11 for population modification or population suppression. These gene drive systems are mostly based on homing endonucleases (so-called homing-based drives) and function through a "copy-paste" mechanism, using CRISPR-Cas9-mediated DNA double-strand breaks (DSBs) and subsequent homology-directed repair (HDR) to convert heterozygotes into homozygotes, so that the gene drive element is inherited by more than 50% of the offspring. If a cargo ^3 is linked within the gene drive element, population modification can be achieved as the drive spreads; if the gene drive element itself is located within an essential fertility-related gene ^4,5, population suppression can be achieved through continued cutting of the homologous gene. However, if HDR fails, the DSB is repaired by non-homologous end joining (NHEJ), which introduces indels and creates a resistance allele that can no longer be targeted for cleavage by the gRNA. NHEJ repair is particularly common in plants and therefore poses a significant challenge to the sustained spread of gene drives in plant populations. Mechanisms that do not rely on the HDR repair pathway are therefore vital in the pursuit of more efficient synthetic gene drive systems.

The toxin-antidote (TA) mechanism, exemplified by the naturally occurring t haplotype in mice, offers a very promising hint. The toxin is usually expressed before meiosis and is therefore present in all four gametes formed, interfering with normal gametogenesis, whereas the antidote is activated at a stage after meiosis and can mitigate or neutralize the damage caused by the toxin, giving its carriers an evolutionary advantage. Although these natural toxin-antidote systems cannot be transplanted directly into other species, the advent of CRISPR-Cas9 provides a more versatile way to mimic the natural toxin-antidote strategy. In an artificially designed TA system, an essential gene is cleaved by CRISPR/Cas9 and repaired through the NHEJ pathway, and the resulting loss of function (LOF) serves as the toxin, while a recoded sequence of the essential gene that cannot be targeted by the gRNA serves as the antidote that rescues the effect of the toxin.

The Toxin-Antidote Recessive Embryo (TARE) drive system ^10, also known as Cleave and Rescue or ClvR ^9, has been developed. It targets a gene essential for zygotic development: an individual dies when both alleles of that gene are inactivated (LOF) and no gene drive element is present, whereas individuals that inherit the gene drive element survive. In Drosophila melanogaster, this system achieved a gene drive transmission rate of 88-95% in female heterozygotes ^10. Although very effective, the efficiency of this system depends on maternal carryover of Cas9 activity from the egg cell to the zygote (where it can cleave only the paternally contributed WT allele), or requires that the target gene be located on a sex chromosome, which somewhat hampers its application across a wide range of species.

In addition, another design principle that has not yet been realized is the Toxin-Antidote Dominant Sperm (TADS) drive ^12. It aims to interfere with genes essential for spermatogenesis and is in theory more efficient, bypassing the above-mentioned limitations of TARE because only one copy of the target gene needs to be disrupted, rather than both alleles as in the TARE system. However, its technical implementation is hampered by the fact that identifying essential genes that affect only spermatogenesis remains a problem.

Disclosure of Invention

In this work, the inventors developed a gene drive system named CAIN (CRISPR-Assisted Inheritance utilizing NPG1). This gene drive system, based on a toxin-antidote mechanism, is a carefully designed artificial gene drive system applicable to plants. It exploits the relatively long male gametophyte development process in plants and targets and cuts NPG1 (No Pollen Germination 1), an essential gene for pollen grain germination. Experiments show that in Arabidopsis thaliana the transmission rate was raised from the Mendelian 50% to 88-99% over two consecutive generations, achieving biased inheritance at a markedly super-Mendelian ratio. The success of CAIN points to potential applications in a variety of plant species and offers solutions to important challenges: slowing the spread of invasive species by biasing the inheritance of sterility genes, and managing weed populations by spreading genes that confer sensitivity to certain herbicides, which could open a new era of ecological management and sustainable agriculture.
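As a rough illustration of why disabling NPG1 in the male germline biases inheritance, the following Python sketch enumerates the pollen genotypes of a CAIN/+ male parent crossed to a wild-type female parent. It rests on assumptions that are not stated in the text above: the CAIN insertion segregates independently of the endogenous NPG1 locus, cutting occurs before meiosis, pollen that lacks both a functional NPG1 allele and the CAIN rescue never germinates, and all other pollen contributes equally. The function name and its parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_cain_fraction(alleles_cut: int) -> Fraction:
    """Expected fraction of CAIN-carrying F1 seeds from a CAIN/+ male parent.

    alleles_cut is the number of endogenous NPG1 alleles (0, 1 or 2) that are
    inactivated in the male germline before meiosis (illustrative assumption).
    """
    if alleles_cut not in (0, 1, 2):
        raise ValueError("alleles_cut must be 0, 1 or 2")

    # The two endogenous NPG1 alleles of the diploid germline; True = functional.
    npg1_alleles = [False] * alleles_cut + [True] * (2 - alleles_cut)

    germinating = Fraction(0)
    germinating_with_cain = Fraction(0)
    # Each pollen grain receives CAIN or not, and one of the two NPG1 alleles.
    # Assumed unlinked loci, so each of the four combinations has weight 1/4.
    for carries_cain, npg1_functional in product([True, False], npg1_alleles):
        weight = Fraction(1, 4)
        # CAIN carries the recoded NPG1, so it rescues germination by itself.
        if carries_cain or npg1_functional:
            germinating += weight
            if carries_cain:
                germinating_with_cain += weight
    return germinating_with_cain / germinating


for n in (2, 1, 0):
    print(n, "allele(s) cut ->", expected_cain_fraction(n))
```

Under these assumptions the sketch gives 100%, about 66.7% and 50% for two, one and no cut alleles, respectively; real transmission rates also depend on cleavage efficiency and penetrance.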

The present invention includes, but is not limited to, the following embodiments:

Embodiment 1. An artificial gene driving system for plants comprising:

a first nucleic acid comprising a coding sequence for a component of a gene editing system that can target a gene, such as a coding sequence, for a pollen tube development essential protein in the plant and cause the pollen tube development essential protein to lose function, the coding sequence for the component of the gene editing system being operably linked to a promoter that mediates specific expression during pollen formation;

a second nucleic acid comprising a recoded coding sequence for the pollen tube development essential protein, the recoded sequence encoding a wild-type pollen tube development essential protein, not being targeted by the gene editing system, and being operably linked to a native promoter of the pollen tube development essential gene; and

a third nucleic acid comprising a coding sequence for a cargo, e.g., the cargo to be transmitted in a population of the plant.

Embodiment 2. The artificial gene driving system of embodiment 1 wherein the first nucleic acid, the second nucleic acid and the third nucleic acid are located on the same expression construct.

Embodiment 3. The artificial gene driving system of embodiment 1 or 2, wherein the pollen tube development essential protein is No Pollen Germination 1 (NPG1).

Embodiment 4. The artificial gene driving system of embodiment 3, wherein NPG1 comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 1.

Embodiment 5. The artificial gene driving system of embodiment 3, wherein the endogenous NPG1 of the plant comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 2.

Embodiment 6. The artificial gene driving system of embodiment 3, wherein the recoded NPG1 comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 3, and the recoded NPG1 cannot be targeted by the gene editing system and thus will not be rendered non-functional by the expression of the gene editing system.

Embodiment 7. The artificial gene driving system of embodiment 3, wherein NPG1 comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 4.

Embodiment 8. The artificial gene driving system of any one of embodiments 1-7, wherein the promoter mediating specific expression during pollen formation is the promoter of the DMC1 (Disruption of Meiotic Control 1) gene.

Embodiment 9. The artificial gene driving system of embodiment 8, wherein the DMC1 promoter comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 5.

Embodiment 10. The artificial gene drive system of any one of embodiments 1-7, wherein the promoter that mediates specific expression during pollen formation is the promoter of the TPD1 (Tapetum Determinant 1) gene.

Embodiment 11. The artificial gene driving system of embodiment 10, wherein the TPD1 promoter comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, or even 100% sequence identity to SEQ ID NO: 6.

Embodiment 12. The artificial gene drive system of any one of embodiments 1-11, wherein the gene editing system is selected from CRISPR-, ZFN- or TALEN-based gene editing systems, preferably the gene editing system is a CRISPR-based gene editing system.

Embodiment 13. The artificial gene drive system of embodiment 12, wherein the CRISPR gene editing system comprises a CRISPR nuclease and at least one guide RNA, preferably the CRISPR nuclease is a Cas9 nuclease.

Embodiment 14. The artificial gene drive system of embodiment 13, wherein the coding sequence of the CRISPR nuclease is operably linked to the promoter that mediates specific expression during pollen formation, preferably to the TPD1 promoter.

Embodiment 15. The artificial gene drive system of embodiment 13, wherein the gene editing system comprises a Cas9 nuclease and at least one gRNA targeting endogenous NPG1.

Embodiment 16. The artificial gene driving system of embodiment 15, wherein the at least one gRNA targeting endogenous NPG1 targets a nucleotide sequence selected from any one of SEQ ID NOs: 7 to 10.

Embodiment 17. The artificial gene driving system of any one of embodiments 1-16, wherein the expression of the cargo is detrimental or beneficial to the plant when the plant is exposed to a particular compound or condition, e.g., the cargo is a herbicide sensitivity gene, a gene that disrupts herbicide resistance, a gene that enhances environmental adaptation, or a gene that enhances disease resistance.

Embodiment 18. A method of producing a modified plant for engineering a plant population by gene drive, the method comprising introducing the artificial gene drive system of any one of embodiments 1-17 into at least one plant, thereby obtaining at least one modified plant having the first nucleic acid, the second nucleic acid and the third nucleic acid integrated into its genome.

Embodiment 19. The method of embodiment 18, wherein the first nucleic acid, second nucleic acid and third nucleic acid integrated into the genome of the modified plant are closely linked, e.g., located at the same locus.

Embodiment 20. Use of a modified plant for genetically engineering a population of plants, wherein the modified plant is obtained by the method of embodiment 18 or 19 or the modified plant has introduced into it an artificial gene drive system for plants according to any of embodiments 1 to 17, whereby its genome has been integrated with said first, second and third nucleic acids.

Embodiment 21. A modified plant for genetically engineering a population of plants, wherein the modified plant is obtained by the method of embodiment 18 or 19 or the modified plant has introduced into it an artificial gene driven system for plants according to any of embodiments 1 to 17, whereby its genome integrates said first, second and third nucleic acids.

Embodiment 22. A method of modifying a plant population by gene driving, the method comprising placing at least one modified plant of embodiment 21 into the population of plants and allowing the at least one modified plant to cross with other plants in the plant population.

Embodiment 23. The method of embodiment 22, wherein the method allows the progeny of the at least one modified plant that crosses with other plants in the plant population to cross with other plants and/or progeny in the population.

Embodiment 24. The method of embodiment 22 or 23, resulting in an increased proportion of plants carrying the cargo in the population of modified plants as compared to the population of unmodified plants.

Drawings

FIG. 1, CAIN gene drive and its theoretical inheritance in Arabidopsis. a, Elements of the CAIN gene drive. The portion between the left and right borders is shown, together with the corresponding stages of Arabidopsis male gametophyte development. TPD1: Tapetum Determinant 1. DMC1: Disruption of Meiotic Control 1. NPG1: No Pollen Germination 1. b, Predicted proportion of CAIN-carrying individuals among the F1 offspring of a cross between a wild-type female parent and a CAIN-carrying male parent, assuming that two, one or no NPG1 alleles, respectively, are cut in male germline cells, resulting in loss of function. Salmon, sky-blue and gray boxes represent the female parent, the male parent and the offspring, respectively. Dashed boxes represent gametophytes. Red crosses represent pollen grains that fail to germinate.

FIG. 2, Transmission rate of the CAIN gene drive from the T1 to the F1 generation in test crosses. a, Transformation to obtain T1 plants and the subsequent crossing steps. Transgenic T1 plants were obtained by Agrobacterium-mediated transformation with either the control vector (FAST only) or one of the gene drive vectors (DMC-CAIN and TPD-CAIN). T1 plants with a single-site insertion were used as the male parent and wild-type Col-0 as the female parent to obtain the F1 generation. b, Transmission efficiency of the CAIN gene drive, measured as the proportion of FAST+ F1 seeds among all F1 seeds. Each red dot represents the transmission efficiency in a single silique.

FIG. 3, Genotypes of FAST+ F1 plants at the NPG1 locus in the TPD-CAIN experiment. a, Schematic of the genotyping of F1 somatic tissues (rosette leaves and inflorescences). b, Summary of genotyping results for 16 FAST+ F1 plants at the four gRNA target sites. The symbols "+" and "-" preceding the bases represent insertion and deletion events, respectively. The numbers following the symbols give the number of nucleotides for indels longer than 2. The symbol "A>C" represents a base substitution from adenine (A) to cytosine (C). c, Genotyping results at the gRNA11 target site based on Illumina sequencing.

FIG. 4, Transmission rate of TPD-CAIN from the F1 to the F2 generation in backcrosses. Average transmission efficiency of TPD-CAIN in the offspring when FAST+ F1 plants were used as the male parent (a) or the female parent (b).

FIG. 5, Transmission rate of DMC-CAIN from the F1 to the F2 generation. a, Genotyping of inflorescences of F1 plants. b, Summary of genotypes of 12 FAST+ F1 plants at the four gRNA target sites. c, Average transmission efficiency of DMC-CAIN in F2 seeds produced with FAST+ F1 plants as the male parent.

FIG. 6, Genotypes of FAST- F1 and FAST+/- F2 plants at the NPG1 locus. a, Genotyping of leaves of FAST- F1 plants produced with a T1 plant carrying TPD-CAIN as the male parent. b, Summary of genotypes of 11 FAST- F1 plants at the four gRNA target sites. The corresponding mechanism (incomplete cleavage efficiency or incomplete penetrance) is marked below each F1 plant. c, Genotyping of FAST+ and FAST- F2 plants at the four gRNA target sites by Sanger sequencing. The F2 plants were produced by reciprocal crosses between F1 plants carrying TPD-CAIN (TPD-CAIN/+) and wild-type (+/+) plants.

FIG. 7, Simulations of the spread dynamics of modification-type and suppression-type CAIN drives. a, Computational simulations showing the effect of different settings of the male germ cell cleavage efficiency (empirical value: 98.4%; artificial settings: 50.0% and 100.0%) and the penetrance (empirical value: 96.0%; artificial settings: 50.0% and 100.0%) on CAIN spread dynamics. b, Computational simulations of the spread dynamics of homing, TARE and CAIN drives with an initial release proportion of 1%. The cleavage efficiency of both the homing and TARE drives was set to the maximum. For CAIN, the penetrance was set to the empirical value of 96.0%, and different male (empirical value 98.4% or artificial setting 50.0%) and female (empirical value 94.1%, or artificial settings 0.0% and 50.0%) germ cell cleavage efficiencies were used. c, Spread dynamics of the suppression-type CAIN drive. The panel shows the number and frequency of CAIN carriers and wild-type individuals in each generation.

FIG. 8, The four gRNAs involved in the CAIN gene drive. a, The CAIN vector contains four gRNAs in tandem. The NPG1 sequence was altered using synonymous codons, without changing the encoded amino acids, to serve as the recoded NPG1; the mutated nucleotides are marked with red boxes. b, Positions of the four gRNA target sequences on the genomic sequence. The primers used for genotyping are labeled. A genomic region covering the four target sites was first amplified with the primer pair NPG-gDNA-F1 and NPG-gDNA-R1_2, and the PCR products were then Sanger sequenced. The primers used for sequencing were NPG-gDNA-F1, NPG-gDNA-F2, NPG-gDNA-R1_1 and NPG-gDNA-R2.

FIG. 9, CAIN gene drive vector map. The control vectors FAST only (a), DMC-CAIN (b) and TPD-CAIN (c) are shown, and the total sequence length and main features are marked.

FIG. 10, Summary of the experimental procedure. First step: the control vector (FAST only) or the DMC-CAIN or TPD-CAIN gene drive vector was transformed into the Arabidopsis Col-0 background. Successfully transformed seeds were picked directly by the FAST phenotype (red fluorescence). T1 plants with single-site insertions were selected by TAIL-PCR and whole-genome sequencing. Second step: T1 plants were crossed as the male parent with Col-0 as the female parent. The percentage of FAST+ seeds among the F1 seeds was taken as the transmission rate of the CAIN gene drive (drive%). Third step: F1 seeds were sown to obtain F1 plants. Each F1 plant was genotyped as described in the methods section. Fourth step: F1 plants of known genotype were crossed with Col-0 plants as either the male or the female parent. The drive transmission rate in the F2 seeds was also counted. Fifth step: F2 seeds were sown to obtain F2 plants. The F2 plants were genotyped in the same way.

FIG. 11, Types of mutation detected in FAST+ F1 plants. The figure illustrates the insertions, deletions and single-nucleotide polymorphisms generated at (a) the gRNA2 and (b) the gRNA11 target sites, and their positions. * indicates alternative possible alignments, as the underlined bases can be placed on either side of the deletion.

FIG. 12, Reversible TPD-CAIN gene drive. In the design of the new-version gene drive TPD-CAINⁿ⁺¹, gRNAsⁿ⁺¹ targeting new sites in NPG1 serve as the new toxin, while recodedⁿ⁺¹, which is resistant to (cannot be targeted and cleaved by) both gRNAsⁿ and gRNAsⁿ⁺¹, serves as the antidote. When Cas9 is active and cleaves in germ cells, the new toxin destroys both the genomic NPG1 and recodedⁿ, which can therefore be rescued only by recodedⁿ⁺¹. In this way, if the new-version TPD-CAINⁿ⁺¹ and the old-version TPD-CAINⁿ are at homologous positions, the new version will eliminate and replace the old version. The new cargoⁿ⁺¹ linked to it spreads along with it.

FIG. 13, Effects of male germ cell cleavage efficiency, incomplete penetrance and female germ cell cleavage efficiency on the TPD-CAIN system. a, Estimation of male germ cell cleavage efficiency and penetrance. In the F1 offspring, 94.3% (526/558) of the seeds were FAST+ (TPD-CAIN/+) with the NPG1^- genotype at the gRNA11 target site. In addition, 2.6% (5.7% × 5/11) were of genotype +/+; NPG1^+/- and 3.1% (5.7% × 6/11) were +/+; NPG1^+/+ plants. In the F2 generation, 94.8% (3868/4080) were FAST+ (TPD-CAIN/+) with the NPG1^- genotype at the gRNA11 target site, and the remaining 5.2% were +/+; NPG1^+/-. Based on the statistics from the F1 and F2 generations, the average cleavage failure rate was estimated to be 1.6%, i.e., a male germ cell cleavage efficiency of 98.4%. The average penetrance was estimated to be 96.0%. b, Estimation of female germ cell cleavage efficiency (r). Since no further cleavage of the target site occurs in FAST- F2 plants, the cleavage efficiency r was calculated from the NPG1 genotype at the gRNA11 target site. Since only one of 34 FAST- F2 plants was of genotype +/+; NPG1^+/+, r was estimated to be 94.1%.
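The percentages quoted for FIG. 13 can be re-derived from the reported counts. The short Python check below recomputes them. The formula used for r is an assumed reading, chosen because it reproduces the reported 94.1%: if a TPD-CAIN/+ female already carries one cut NPG1 allele and her remaining wild-type allele escapes cleavage with probability 1 - r, then a FAST- egg receives an intact allele with probability (1 - r)/2, which is equated here with the observed 1/34. The helper name `pct` is hypothetical, and the derivation of the 1.6% failure rate and 96.0% penetrance is not reconstructed.

```python
def pct(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Fraction of FAST+ seeds reported for the F1 and F2 generations.
f1_fast_positive = pct(526, 558)
f2_fast_positive = pct(3868, 4080)

# The remaining F1 seeds are split 5:6 between the two FAST- genotype classes
# observed among 11 genotyped FAST- F1 plants.
f1_fast_negative = round(100 - f1_fast_positive, 1)
f1_npg1_het = round(f1_fast_negative * 5 / 11, 1)
f1_npg1_wt = round(f1_fast_negative * 6 / 11, 1)

# Assumed model for the female germline: (1 - r) / 2 = 1 / 34.
r_female = round(100 * (1 - 2 * (1 / 34)), 1)

print(f1_fast_positive, f2_fast_positive)
print(f1_fast_negative, f1_npg1_het, f1_npg1_wt)
print(r_female)
```

The recomputed values agree with the figures in the caption, which suggests that the garbled "Times" in the translated text stands for multiplication.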

FIG. 14, Potential applications of the TPD-CAIN gene drive. TPD-CAIN has two potential directions of application. a, Improving plant adaptation: by spreading genes that promote the adaptation of specific endangered species to their environment, TPD-CAIN can achieve rapid genetic rescue and make the target species better suited to its habitat. b, Weed management: by spreading genes that confer herbicide sensitivity to target weeds, combined with subsequent local application of herbicide, TPD-CAIN enables efficient regional weed management.
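To give a feel for the kind of spread dynamics described for FIG. 7, the following is a minimal deterministic sketch in Python. It is not the simulation used for the figure. It assumes an infinite, randomly mating hermaphroditic population with non-overlapping generations and no fitness cost, Mendelian transmission through eggs, and a single hypothetical parameter `male_transmission` giving the probability that pollen from a heterozygote carries the drive. It ignores the accumulation of cut NPG1 alleles, resistance alleles, penetrance and pollen competition between plants.

```python
def simulate_carrier_frequency(release_fraction: float = 0.01,
                               male_transmission: float = 0.95,
                               generations: int = 30) -> list[float]:
    """Track the frequency of drive carriers over successive generations.

    release_fraction: initial share of heterozygous drive carriers.
    male_transmission: chance that a heterozygote's pollen carries the drive
    (0.5 corresponds to ordinary Mendelian inheritance).
    """
    # Genotype frequencies: drive homozygotes, heterozygotes, wild type.
    homo, het, wild = 0.0, release_fraction, 1.0 - release_fraction
    history = [homo + het]
    for _ in range(generations):
        egg_drive = homo + 0.5 * het                   # Mendelian through eggs
        pollen_drive = homo + male_transmission * het  # biased through pollen
        homo = egg_drive * pollen_drive
        het = egg_drive * (1 - pollen_drive) + (1 - egg_drive) * pollen_drive
        wild = (1 - egg_drive) * (1 - pollen_drive)
        history.append(homo + het)
    return history


biased = simulate_carrier_frequency(male_transmission=0.95)
mendelian = simulate_carrier_frequency(male_transmission=0.5)
print(f"carriers after 30 generations, biased: {biased[-1]:.3f}")
print(f"carriers after 30 generations, Mendelian: {mendelian[-1]:.3f}")
```

In this simplified model a 1% release stays near 1% under Mendelian transmission, whereas a pollen transmission rate well above 50% carries the drive through most of the population within a few dozen generations.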

Detailed Description

1. Definitions

In the present invention, unless otherwise indicated, scientific and technical terms used herein have the meanings commonly understood by one of ordinary skill in the art. Also, the terms related to protein and nucleic acid chemistry, molecular biology, microbiology and laboratory procedures used herein are terms and conventional procedures that are widely used in the corresponding field. For example, standard recombinant DNA and molecular cloning techniques for use in the present invention are well known to those skilled in the art and are more fully described in the following document: Sambrook, J., Fritsch, E.F., and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). Meanwhile, in order to better understand the present invention, definitions and explanations of related terms are provided below.

As used herein, the term "and/or" encompasses all combinations of items connected by the term, and each combination should be regarded as having been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C" and "A and B and C".

The term "comprising" is used herein to describe a sequence of a protein or nucleic acid, which may consist of the sequence, or may have additional amino acids or nucleotides at one or both ends of the protein or nucleic acid, but still have the activity described herein. Furthermore, it will be clear to those skilled in the art that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in some practical situations (e.g., when expressed in a particular expression system) without substantially affecting the function of the polypeptide. Thus, in describing a particular polypeptide amino acid sequence in the specification and claims, although it may not comprise a methionine encoded at the N-terminus by the initiation codon, a sequence comprising such methionine is also contemplated at this time, and accordingly, the encoding nucleotide sequence may also comprise the initiation codon; and vice versa.

"exogenous" with respect to a sequence means a sequence from a foreign species, or if from the same species, a sequence that has undergone significant alteration in composition and/or locus from its native form by deliberate human intervention.

"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" or "nucleic acid" are used interchangeably and are a single-or double-stranded RNA or DNA polymer, optionally containing synthetic, unnatural or altered nucleotide bases. Nucleotides are referred to by their single letter designations as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents purine (A or G), "Y" represents pyrimidine (C or T), "K" represents G or T, "H" represents A or C or T, "D" represents A, T or G, "I" represents inosine, and "N" represents any nucleotide.

Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence with a more or most frequently used codon in the gene of the host cell (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons while maintaining the native amino acid sequence).
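The same synonymous-substitution principle underlies the recoded NPG1 sequence described in this document: codons are exchanged for synonymous ones so that the nucleotide sequence changes (and a gRNA no longer matches) while the encoded protein does not. The Python sketch below illustrates the idea on a short made-up coding sequence. The example sequence, the function names and the "pick the next synonymous codon" rule are illustrative assumptions only; they are not the NPG1 sequence or the recoding scheme used by the inventors, and no codon-usage optimization is attempted.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str) -> str:
    """Swap every codon for another synonymous codon where one exists."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = sorted(c for c, aa in CODON_TABLE.items()
                          if aa == CODON_TABLE[codon])
        # Take the next synonym in sorted order; single-codon amino acids
        # (Met, Trp) wrap around to themselves and stay unchanged.
        recoded.append(synonyms[(synonyms.index(codon) + 1) % len(synonyms)])
    return "".join(recoded)


example_cds = "ATGGCTAAGTGGTTG"  # hypothetical sequence, not from NPG1
recoded_cds = recode(example_cds)
mismatches = sum(a != b for a, b in zip(example_cds, recoded_cds))
print(recoded_cds, translate(recoded_cds), mismatches)
```

A guide RNA designed against the original sequence would meet several mismatches in the recoded one, while `translate` returns the same peptide for both.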

"polypeptide", "peptide", and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The term applies to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms including, but not limited to, glycosylation, lipid attachment, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.

Sequence "identity" has an art-recognized meaning, and the percent sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity may be measured along the full length of a polynucleotide or polypeptide or along a region of the molecule. (See, e.g., Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991.) Although there are many methods of measuring identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled artisan (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
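As a concrete illustration of a percent-identity calculation, the following Python sketch computes identity over two sequences that have already been aligned to equal length. It is only one of several conventions: here the denominator is the number of alignment columns in which at least one sequence has a residue, and gaps count as mismatches. Alignment itself is not performed, the function name is hypothetical, and the example strings are invented rather than taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are skipped; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * identical / compared


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(percent_identity("ACG-T", "ACGAT"))
```

Whether a given pair of sequences meets a threshold such as "at least 90%" can depend on the alignment method and on how gaps are counted, so the convention should be stated alongside any reported value.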

In peptides or proteins, suitable conservative amino acid substitutions are known to those skilled in the art, and can generally be made without altering the biological activity of the resulting molecule. In general, one skilled in the art recognizes that single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter biological activity (see, e.g., Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).

As used herein, an "expression construct" refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to transcription of the nucleotide sequence (e.g., transcription into mRNA or functional RNA) and/or translation of RNA into a precursor or mature protein.

The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, may be an RNA (e.g., mRNA) that is capable of translation, such as RNA produced by in vitro transcription.

The "expression construct" of the invention may comprise regulatory sequences of different origin and nucleotide sequences of interest, or regulatory sequences and nucleotide sequences of interest of the same origin but arranged in a manner different from that normally found in nature.

"regulatory sequence" and "regulatory element" are used interchangeably and refer to a nucleotide sequence that is located upstream (5 'non-coding sequence), intermediate or downstream (3' non-coding sequence) of a coding sequence and affects transcription, RNA processing or stability, or translation of the relevant coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.

"promoter" refers to a nucleic acid fragment capable of controlling transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive or tissue specific or developmentally regulated or inducible promoter.

As used herein, the term "operably linked" refers to a regulatory element (e.g., without limitation, a promoter sequence, a transcription termination sequence, etc.) linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcription regulatory element. Techniques for operably linking a regulatory element region to a nucleic acid molecule are known in the art.

"introducing" a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism refers to transforming a cell of the organism with the nucleic acid or protein such that the nucleic acid or protein is capable of functioning in the cell. "transformation" as used herein includes both stable transformation and transient transformation. "Stable transformation" refers to the introduction of an exogenous nucleotide sequence into the genome, resulting in stable inheritance of an exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generation thereof. "transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell to perform a function without stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequence is not integrated into the genome.

As used herein, the term "plant" includes whole plants and any progeny, cells, tissues, or parts of plants. The term "plant part" includes any part of a plant, including, for example, but not limited to: seeds (including mature seeds, immature embryos without seed coats, and immature seeds); plant cuttings; plant cells; plant cell cultures; and plant organs (e.g., pollen, embryos, flowers, fruits, shoots, leaves, roots, stems, and related explants). The plant tissue or plant organ may be a seed, a callus, or any other population of plant cells organized into structural or functional units. Plant cells or tissue cultures may be capable of regenerating plants having the physiological and morphological characteristics of the plant from which the cells or tissue are derived, and of regenerating plants having substantially the same genotype as that plant. In contrast, some plant cells are not capable of regenerating to produce plants. The regenerable cells in the plant cells or tissue culture may be embryos, protoplasts, meristematic cells, callus tissue, pollen, leaves, anthers, roots, root tips, filaments, flowers, kernels, ears, cobs, husks, or stems.

Plant "progeny" includes any subsequent generation of a plant.

2. Artificial gene driving

In one aspect, the present invention provides an artificial gene drive system for plants comprising:

a first nucleic acid comprising a coding sequence for a gene editing system component that can target a gene encoding a pollen tube development essential protein in the plant and cause the pollen tube development essential protein to lose function, the coding sequence of the gene editing system component being operably linked to a promoter that mediates specific expression during pollen formation;

a second nucleic acid comprising a recoded coding sequence for the pollen tube development essential protein, the recoded sequence encoding the functional pollen tube development essential protein and not being targeted by the gene editing system and being operably linked to a native promoter of a gene encoding the pollen tube development essential protein; and

a third nucleic acid comprising a coding sequence for a cargo to be transmitted in a population of the plant.

In some embodiments, the first nucleic acid, the second nucleic acid, and the third nucleic acid are located on the same expression construct.

In some embodiments, the gene encoding a pollen tube development essential protein is an endogenous gene of the plant. In some embodiments, the gene encoding a protein essential for pollen tube development is an exogenous gene that has been introduced into the plant.

In some embodiments, the pollen tube development essential protein is No Pollen Germination 1 (NPG1). NPG1 is associated with the development of male gametophytes but does not affect female gametophyte development, and is required for the later stages of pollen germination. NPG1 is well conserved among different plants.

In some embodiments, NPG1 comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID NO. 1.

In some embodiments, the coding sequence for endogenous NPG1 in a plant comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID No. 2.

In some embodiments, the coding sequence of the recoded NPG1 comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID NO. 3. The coding sequence of the recoded NPG1 cannot be targeted by the gene editing system and thus is not rendered non-functional by the expression of the gene editing system.

In general, a promoter of a gene refers to a sequence on the genome that is about 100bp to about 5kb, e.g., about 500bp to about 3kb, e.g., about 2kb, in length upstream of the translation start site or transcription start site of the coding sequence of the gene.

In some embodiments, the natural promoter of NPG1 comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID NO. 4.

In some embodiments, the promoter that mediates specific expression during pollen formation is the promoter of the DMC1 (Disruption of Meiotic Control 1) gene. In some embodiments, the DMC1 promoter comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID No. 5. The DMC1 promoter is capable of driving expression of the nucleotide sequence to which it is operably linked in pollen mother cells.

In some preferred embodiments, the promoter that mediates specific expression during pollen formation is the promoter of the TPD1 (Tapetum Determinant 1) gene. In some embodiments, the TPD1 promoter comprises a nucleotide sequence having at least 70%, at least 75%, at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5%, at least 99.9%, even 100% sequence identity to SEQ ID No. 6. The TPD1 promoter is capable of driving continuous expression of the nucleotide sequence to which it is operably linked while the progenitor cells of the pollen mother cell, i.e., the sporogenous cells, develop into pollen mother cells.

The gene editing system useful in the present invention may be various gene editing systems known in the art as long as they can perform targeted genome editing in plants. The gene editing system may be a CRISPR, ZFN or TALEN based gene editing system. Preferably, the gene editing system is a CRISPR-based gene editing system.

The CRISPR gene editing system may comprise a CRISPR nuclease and at least one guide RNA. The CRISPR nuclease and the guide RNA can form a complex that targets and/or cleaves a genomic target sequence based on the complementarity of the guide RNA to the genomic target sequence.

The "CRISPR nuclease" can be derived from a Cas9 nuclease, including a Cas9 nuclease or a functional variant thereof. The Cas9 nuclease may be a Cas9 nuclease from a different species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to RNA-guided nucleases comprising a Cas9 protein or fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated systems) genome editing system that, under the direction of a guide RNA, can target and cleave a DNA target sequence to form a DNA double-strand break (DSB).

The "CRISPR nuclease" may also be derived from a Cpf1 nuclease, including a Cpf1 nuclease or a functional variant thereof. The Cpf1 nuclease may be a Cpf1 nuclease from a different species, e.g., a Cpf1 nuclease from Francisella novicida U112, Acidaminococcus sp. BV3L6, or Lachnospiraceae bacterium ND2006.

Useful "CRISPR nucleases" can also be derived from Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Cas10, Csx11, Csx10, Csf1, Csn2, Cas4, C2c1, C2c3, or C2 nucleases, including for example these nucleases or functional variants thereof.

In some embodiments, the coding sequence of the CRISPR nuclease is operably linked to the promoter that mediates specific expression during pollen formation, preferably to the TPD1 promoter.

As used herein, "guide RNA" and "gRNA" are used interchangeably to refer to an RNA molecule that is capable of forming a complex with a CRISPR effector protein and of targeting the complex to a target sequence owing to its identity to the target sequence. The guide RNA targets the target sequence by base pairing with the complementary strand of the target sequence. For example, the gRNAs employed by Cas9 nucleases or functional variants thereof are typically composed of crRNA and tracrRNA molecules that are partially complementary and form a complex, wherein the crRNA comprises a guide sequence (also known as a seed sequence) that has sufficient identity to the target sequence to hybridize to the complementary strand of the target sequence and direct the CRISPR complex (Cas9 + crRNA + tracrRNA) to specifically bind to the target sequence. However, it is known in the art that single guide RNAs (sgRNAs) can be designed that combine the features of crRNA and tracrRNA. By contrast, the gRNAs employed by Cpf1 nucleases or functional variants thereof typically consist of only a mature crRNA molecule, which may also be referred to as an sgRNA. It is within the ability of those skilled in the art to design a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited. In some embodiments, expression (e.g., transcription) of the guide RNA may be driven by a constitutive promoter. In some embodiments, expression (e.g., transcription) of the guide RNA may be driven by a U6 or U3 promoter.

The gene editing system may target any region of the endogenous gene encoding a pollen tube development essential protein, as long as it is capable of causing loss of function of the endogenous pollen tube development essential protein. For example, the gene editing system may target the endogenous coding sequence of the pollen tube development essential protein, resulting in mutation or incomplete translation of the protein. Alternatively, the gene editing system may target an endogenous regulatory sequence of the pollen tube development essential protein, resulting in the protein not being expressed.

Methods of recoding the coding sequence of the pollen tube development essential protein such that it expresses a functional protein but is no longer targeted by the gene editing system are well known in the art. For example, the nucleotide sequence may be altered by codon degeneracy to remove the target sequence of the gene editing system without altering the encoded protein sequence. However, if the gene editing system targets an endogenous regulatory sequence of the pollen tube development essential protein, the coding sequence of the pollen tube development essential protein contained in the second nucleic acid may also be identical to the wild type coding sequence, which may also be referred to herein as recoded, because it is also not targeted by the gene editing system.
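The codon-degeneracy approach described above can be illustrated with a minimal Python sketch. It is not the procedure used to derive SEQ ID NO. 3; the function names (`translate`, `recode`, `mismatches`) and the 18-nt input are hypothetical, and the sketch simply swaps every codon for a synonymous codon of the standard genetic code so that the nucleotide sequence changes while the encoded protein does not.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds):
    """Replace each codon with a different synonymous codon when one exists.

    Codons without a synonym (ATG, TGG) are left unchanged.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        out.append(synonyms[0] if synonyms else codon)
    return "".join(out)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 18-nt stretch of a coding sequence overlapping a gRNA target.
original = "GCTGAAGGTCTTAAACGT"
recoded = recode(original)
print(recoded, translate(recoded), mismatches(original, recoded))
```

In practice only the codons overlapping the gRNA target sites (and their PAMs) would need to be changed; recoding every codon here merely keeps the example short.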

In some embodiments, the gene editing system comprises a Cas9 nuclease and at least one gRNA targeting endogenous NPG1. In some embodiments, the at least one gRNA targeting endogenous NPG1 targets a nucleotide sequence selected from any of SEQ ID NOs: 7-10.

The "cargo to be transmitted in a population of the plant" as described herein may be any sequence that is desired to be spread in a population of the plant, such as a wild population. For example, expression of the cargo may be detrimental to the plant when the plant is exposed to a particular compound or condition. For example, the cargo may be a herbicide-sensitivity gene, or a gene capable of disrupting existing herbicide resistance. Combined with subsequent artificial spraying of a given herbicide or specific compound, effective weed management can then be achieved locally and within a controllable range. The cargo may also be a gene that affects megaspore cell or embryo development, whereby control over population size may be achieved. The cargo may also be a gene capable of improving environmental adaptability, disease resistance, and the like, so that the adaptability of endangered plants to the natural environment is improved.

In another aspect, the invention provides a method of producing a modified plant for engineering a population of plants by gene drive, the method comprising introducing the artificial gene drive system for plants of the invention into at least one plant, thereby obtaining at least one modified plant having a genome into which the first, second and third nucleic acids are integrated.

In some embodiments, the first nucleic acid, second nucleic acid, and third nucleic acid integrated into the genome of the modified plant are closely linked, e.g., located at the same locus. In some embodiments, the first nucleic acid, second nucleic acid, and/or third nucleic acid are present in a single copy in the genome of the plant.

In the methods of the invention, the artificial gene drive system may be introduced into plants by various methods well known to those skilled in the art. Methods useful for introducing the artificial gene drive system of the invention into plants include, but are not limited to: gene gun method, PEG-mediated protoplast transformation, agrobacterium-mediated transformation, plant virus-mediated transformation, pollen tube channel method, and ovary injection method.

In another aspect, the invention provides a modified plant for engineering a plant population by gene drive, prepared by the method of the invention.

In another aspect, the invention provides a modified plant for engineering a plant population by gene drive, into which the artificial gene drive system for plants of the invention has been introduced, whereby the first, second and third nucleic acids have been integrated into its genome.

In another aspect, the invention provides a method of engineering a population of plants by gene driving, the method comprising placing at least one modified plant of the invention into the population of plants and allowing the at least one modified plant to cross with other plants in the population of plants. In some embodiments, the methods allow the progeny of the at least one modified plant that crosses with other plants in the plant population to cross with other plants and/or progeny in the population. In some embodiments, the method results in an increased proportion of plants carrying the cargo in the population of modified plants as compared to the population of non-modified plants. For example, the population of modified plants comprises at least 1% to 100%, such as at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or even 100% of the plants carrying the cargo.

In another aspect, the invention provides a method of engineering a plant population by gene driving, the method comprising:

i) Introducing the artificial gene driving system for plants of the present invention into at least one plant, thereby obtaining at least one modified plant having the genome integrated with the first, second and third nucleic acids;

ii) placing at least one modified plant obtained in step i) in a population of said plants and allowing said at least one modified plant to cross with other plants in said population of plants.

In some embodiments, the first nucleic acid, second nucleic acid, and third nucleic acid integrated into the genome of the modified plant are closely linked, e.g., located at the same locus. In some embodiments, the first nucleic acid, second nucleic acid, and/or third nucleic acid are present in a single copy in the genome of the plant. In some embodiments, the method allows the progeny of the at least one modified plant that crosses with other plants in the plant population to cross with other plants in the population. In some embodiments, the methods allow the progeny of the at least one modified plant that crosses with other plants in the plant population to cross with other plants and/or progeny in the population. In some embodiments, the method results in an increased proportion of plants carrying the cargo in the population of modified plants as compared to the population of non-modified plants. For example, the population of modified plants comprises at least 1% to 100%, such as at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or even 100% of the plants carrying the cargo.

The plants of the various aspects of the invention may be monocotyledonous or dicotyledonous plants, preferably plants that are predominantly inbred. For example, the plant may be Arabidopsis thaliana, maize, canola, tobacco, grassy weeds, and the like.

Examples

Experimental materials and methods

Plant material and growth conditions

All Arabidopsis lines used in this study were of the Columbia-0 (Col-0) ecotype. Seeds were surface sterilized in 10% sodium hypochlorite for 10 min, washed 3 times with sterile water, and then sown on germination medium (2.2 g/l Murashige and Skoog medium, 10 g/l sucrose, and 7.6 g/l plant agar, pH 5.7). After 2 days of treatment at 4 ℃, the agar plates were transferred to a growth chamber at 22 ℃ with a 16 hours light/8 hours dark photoperiod. Seven-day-old seedlings were transferred to soil and grown in a greenhouse under the same temperature and light conditions for subsequent experiments.

Plasmid construction

All restriction enzymes and enzymes for Gibson ligation were from NEB. High fidelity polymerase (Phanta Max Super-Fidelity DNA Polymerase, P505) and gel extraction kit (FastPure Gel DNA Extraction Mini Kit, DC 301) for Polymerase Chain Reaction (PCR) are both from Vazyme. All plasmids used in this study were cloned using standard molecular biology techniques and purified using StarPrep Fast Plasmid Mini Kit (Genstar, D201).

To construct the FAST-only vector, the FAST marker sequence ¹⁷ (pOLE1:OLE1-TagRFP-Nos terminator) was inserted into the XF675 binary vector by Gibson ligation ²⁶. Specifically, the promoter and part of the coding sequence of the OLE1 (AT4G25140) gene were first amplified from genomic DNA (gDNA), and the TagRFP and Nos terminator sequences were synthesized according to the reference ¹⁷. The HindIII-digested XF675 plasmid and the two fragments were then assembled by Gibson ligation.

To construct the DMC-CAIN and TPD-CAIN vectors, all components were cloned into the XF675 binary vector in five consecutive steps. Specifically, the sgRNA cassette (pU6-SmR-gRNA scaffold-U6 terminator) was amplified from template pHEE401E (Addgene #71287) and inserted by Gibson ligation into XF675 that had been double digested with EcoRI and HindIII. Then, 4 gRNAs were introduced by BsaI digestion-ligation ²⁷. The plasmid was digested again with HindIII and the FAST marker sequence was cloned in via Gibson ligation, while the HindIII recognition site was filled in. The NPG1 promoter (about 2 kb) was amplified from gDNA, the recoded NPG1 sequence was obtained by mutagenic PCR, and both were inserted into the HindIII-digested plasmid by Gibson ligation. Finally, the plasmid from the previous step was digested with KpnI and the following fragments were joined by Gibson ligation: the DMC1 (AT3G22880) or TPD1 (AT4G24972) promoter amplified from gDNA, a plant codon-optimized SpCas9 sequence with two nuclear localization signals (NLS) amplified from template pHEE401E, and a Nos terminator amplified from template XF675.

Evaluation of editing efficiency of potential target sites in protoplasts

Candidate gRNAs for the NPG1 (AT2G43040) coding sequence were predicted with CRISPR-P 2.0 ²⁴ (http://crispr.hzau.edu.cn/CRISPR2/) and CRISPR-GE ²⁸ (http://skl.scau.edu.cn/home/), respectively, and 12 of them were screened for cleavage efficiency in Arabidopsis protoplasts.
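As a rough illustration of what a candidate SpCas9 target looks like, the sketch below enumerates every 20-nt protospacer followed by an NGG PAM on both strands of a sequence. It does not reproduce the scoring performed by the prediction tools named above; the function names and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(seq):
    """List (strand, start, protospacer, pam) for each 20-nt site + NGG PAM.

    For the "-" strand, `start` is the offset within the reverse complement.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping sites are all reported.
        for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


# Hypothetical toy sequence, not taken from NPG1.
toy = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for hit in find_candidates(toy):
    print(hit)
```

Real design tools additionally rank candidates by predicted on-target activity and off-target risk, which this sketch deliberately omits.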

Each 20-nt gRNA sequence was inserted into the pAtU6-sgRNA vector (Addgene plasmid #119775) by BsaI cleavage and Gibson ligation. Arabidopsis protoplasts were prepared according to the references ²⁹,³⁰. After co-transformation with pAtU6-sgRNA and p2X35S-Cas9, protoplasts were incubated at room temperature for 48 hours and then harvested. In parallel, one tube of protoplasts was transformed with p2X35S-GFP alone and examined about 16 hours after transformation to estimate the transformation efficiency. The transformation efficiencies of the two biological replicates were 41% and 45%, respectively.

Genomic DNA from each protoplast sample was extracted using a DNA extraction kit (Plant Genomic DNA Kit, DP305, TIANGEN). A 180-220 bp genomic region surrounding each target site was PCR amplified and purified. PCR products from the two biological replicates at the same target site were distinguished by a 6-nt barcode ('ATGCAG') introduced through the primers. All 24 purified PCR product samples were quantified with a Nanodrop 2000, mixed in equal amounts, and sent to Novogene for library construction and Illumina PE150 sequencing.

The gRNA editing efficiency was calculated as the number of edited reads divided by the total number of reads mapped to the specific site. Reads carrying a mismatch within 10 bp upstream of the PAM (NGG) were counted as edited reads. Since the main editing types produced by Cas9 are insertions and deletions, single-base substitutions (SNPs) were excluded when calling edits.
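The calculation can be sketched as follows. The read representation is an assumption made for illustration only: each read is a list of `(kind, offset)` events, where `kind` is `"ins"`, `"del"` or `"sub"` and `offset` is the distance in bp upstream of the PAM; the function names are hypothetical and this is not the pipeline actually used.

```python
def is_edited(events, window=10):
    """A read counts as edited if it has an indel within `window` bp upstream of the PAM.

    Single-base substitutions ("sub") are ignored, as in the text.
    """
    return any(
        kind in ("ins", "del") and 1 <= offset <= window
        for kind, offset in events
    )


def editing_efficiency(reads):
    """Edited reads divided by all reads mapped to the target site."""
    if not reads:
        raise ValueError("no reads mapped to this site")
    return sum(is_edited(r) for r in reads) / len(reads)


# Hypothetical reads: unedited, deletion 3 bp from the PAM,
# substitution only, insertion outside the 10-bp window.
example_reads = [[], [("del", 3)], [("sub", 4)], [("ins", 15)]]
print(editing_efficiency(example_reads))
```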

Single locus insertion driven heterozygote generation and validation

Wild-type Col-0 Arabidopsis thaliana was transformed by floral dipping ³¹ using Agrobacterium strain GV3101 carrying the DMC-CAIN, TPD-CAIN, or FAST-only vector. Successful primary transformants (T1) were selected directly from the harvested dry seeds according to the presence or absence of red fluorescence under a hand-held fluorescence detector (LUYOR 3415 RG).

Agrobacterium-mediated transformation sometimes introduces exogenous DNA sequences at multiple sites in the plant genome. Therefore, 48-50 plants were randomly selected from the T1 plants obtained with DMC-CAIN and TPD-CAIN, and the number of T-DNA insertion sites in each plant was examined by thermal asymmetric interlaced polymerase chain reaction (TAIL-PCR) ³². Single-site-insertion T1 plants were further confirmed by whole-genome sequencing (Novogene, PE150), with data analyzed using the software TDNAscan ³³ and a semi-automatic pipeline developed by the inventors, yielding 3 DMC-CAIN T1 plants, 5 TPD-CAIN T1 plants, and 2 FAST-only T1 plants.

Cross pollination and percent drive evaluation

To examine the spread of the gene drive, unopened flowers of wild-type female parents (several Col-0 plants) were emasculated and pollinated with pollen from DMC-CAIN and TPD-CAIN T1 plants. To simulate natural crossing, pollen from multiple flowers of the male parent was applied to each stigma. Red fluorescence in dry F1 seeds was identified with a hand-held fluorescence detector, and the percentage of red-fluorescent F1 seeds was taken as the transmission rate of the gene drive in the F1 generation. To test whether the TPD-CAIN gene drive was transmitted through the male parent, FAST+ F1 plants were crossed with wild-type Col-0 plants as either the female or the male parent to obtain the F2 generation. F2 seeds were scored in the same way as F1 seeds.

Statistical analysis

To test the significance of CAIN transmission, the exact binomial test (the binom.test() function in R) was used to compare the ratio of FAST+ to FAST- seeds against the expected 1:1 ratio. Because the FAST+ seed ratio may differ among fruits, heterogeneity was quantified by a replicated G-test using DescTools (https://cran.r-project.org/package=DescTools) in R. Since multiple comparisons may inflate type I errors (false positives), the p.adjust() function in R was used to calculate the false discovery rate (FDR).
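For readers without R at hand, the two simplest steps (the exact binomial test against 1:1 and the FDR adjustment) can be sketched in plain Python. This is an illustrative re-implementation, not the analysis code; it assumes the two-sided convention used by R's binom.test() and the Benjamini-Hochberg method for the FDR, and the seed counts are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space (requires 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided test: sum the probabilities of all outcomes that are
    no more likely than the observed one."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, i in enumerate(order):
        rank = m - step  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (FAST+ seeds, total seeds) for three crosses.
crosses = [(10, 10), (30, 40), (21, 40)]
raw = [binom_test_two_sided(k, n) for k, n in crosses]
print(bh_adjust(raw))
```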

Genotyping

The genotype of the target gene NPG1 was identified by amplifying a genomic fragment of about 2 kb covering the four target sites, followed by Sanger sequencing (FIG. 8b). The reverse amplification primer was located within an intron region to avoid interference from the recoded NPG1 sequence in the drive element. Genomic DNA (gDNA) extracted from rosette leaves and from inflorescences was used as the template for polymerase chain reaction (PCR). The PCR products were directly subjected to Sanger sequencing. Consistent Sanger sequencing results were generally obtained from leaf and inflorescence samples of the same plant. If multiple peaks in the sequencing result of a leaf sample (possibly indicating a chimeric or heterogeneous sample) made it impossible to determine the genotype, an inflorescence sample from the same plant was used to determine the genotype. FAST- F1 and F2 plants were genotyped using leaf samples only.

For three F1 plants, the genotype at the gRNA11 target site was determined for leaf and inflorescence samples using Illumina sequencing. PCR products from different tissues each carried a unique barcode sequence introduced during PCR, and were mixed in equal amounts and sequenced using Illumina PE150. Clean reads, separated by barcode sequence, were aligned back to the genomic sequence surrounding the gRNA11 target site using BWA. The read depth covering the gRNA11 region was between 107,207 and 150,422. Reads showing a mismatch in the 23-nt target site region were considered edited, and the editing efficiency of each sample was determined as the ratio of mismatched reads to the total reads aligned to the target site. Single-base substitutions with a frequency higher than 0.5% and all indels were considered edits.

Population dynamic simulation

CAIN was modeled computationally using an individual-based stochastic model built on the Wright-Fisher model, which assumes that the population size is constant and that generations do not overlap. For the modification-type CAIN drive, two unlinked loci, CAIN and NPG1, were considered; the initial population had 9,900 wild-type individuals and 100 heterozygotes carrying CAIN (TPD-CAIN/+; NPG1 +/+). A density-regulation strategy was adopted with reference to a previous study ²⁵. Briefly, the scaling factor (S) was calculated with the formula S = 10 / (9 × N/K + 1), where N represents the population size of the current generation and K represents the environmental carrying capacity (i.e., 10,000). The population size of each generation (X) was calculated according to the binomial distribution X ~ B(50, 0.02 × S). For each generation, pairs of individuals were randomly selected to produce offspring, and sex was assigned at random. This process was repeated X times, and the offspring generated replaced the parental data.
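The density-regulation step can be sketched in a few lines of Python. This is only an illustration of the two formulas as stated in the text, S = 10 / (9 × N/K + 1) and X ~ B(50, 0.02 × S); it is not the authors' simulation script, the function names are hypothetical, and how X enters the rest of the model is left to the description above.

```python
import random


def scaling_factor(n, k):
    """Density scaling factor S = 10 / (9 * N / K + 1).

    S equals 1 when the population sits at carrying capacity (N == K)
    and approaches 10 as the population becomes very small.
    """
    return 10 / (9 * n / k + 1)


def draw_binomial(trials, p, rng):
    """Draw from B(trials, p) by summing Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(trials))


def draw_x(n, k, rng):
    """Sample X ~ B(50, 0.02 * S) for the current population size N."""
    return draw_binomial(50, 0.02 * scaling_factor(n, k), rng)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
for n in (1_000, 10_000, 20_000):
    print(n, scaling_factor(n, 10_000), draw_x(n, 10_000, rng))
```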

For each generation, CAIN-carrying male parents could produce CRISPR-mediated cleavage at the NPG1 sites, with the cleavage efficiency set to either an empirical value (98.4%, FIG. 13a) or a fixed value (50% or 100%). Cleavage results in loss of NPG1 gene function. The phenotypic penetrance of pollen failure due to this loss of function in male gametes was set either to an empirical value (96.0%, FIG. 13a) or manually (100%). Cleavage of NPG1 in the female germ cells of TPD-CAIN/+ plants was also considered, with the cleavage efficiency set to an empirical value (94.1%, FIG. 13b) or set manually (0%, 50%, or 100%).
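How the cleavage efficiency and penetrance parameters shape male transmission can be explored with a small Monte Carlo sketch. It rests on simplifying assumptions that go beyond the text: the male parent is a CAIN/+ heterozygote, CAIN and NPG1 assort independently, each pollen grain's NPG1 allele is disrupted with probability equal to the cleavage efficiency, and pollen carrying CAIN is always rescued. The function name is hypothetical and this is not the authors' model code.

```python
import random


def simulate_pollen_transmission(cleavage, penetrance, n_pollen, rng):
    """Fraction of germinating pollen that carries CAIN.

    cleavage:   probability that the NPG1 allele in a pollen grain is disrupted
    penetrance: probability that a disrupted, unrescued pollen grain fails
    """
    cain_germinated = 0
    total_germinated = 0
    for _ in range(n_pollen):
        has_cain = rng.random() < 0.5          # Mendelian segregation of CAIN
        disrupted = rng.random() < cleavage    # NPG1 allele cut in the germline
        fails = (not has_cain) and disrupted and (rng.random() < penetrance)
        if not fails:
            total_germinated += 1
            cain_germinated += has_cain
    if total_germinated == 0:
        raise ValueError("no pollen germinated")
    return cain_germinated / total_germinated


rng = random.Random(0)  # fixed seed for reproducibility
# Empirical settings quoted in the text: 98.4% cleavage, 96.0% penetrance.
print(simulate_pollen_transmission(0.984, 0.960, 20_000, rng))
```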

For the homing-type drive, the drive allele in the heterozygous state converts the wild-type allele into the drive allele with 100% efficiency. For the TARE drive, the target gene is a haplosufficient gene essential for embryo development, and Cas9 cleaves the wild-type target gene in germ cells with 100% probability. After fertilization, Cas9 carried by the egg cell further cleaves the paternally derived wild-type target gene, also with a cleavage efficiency of 100%. Embryos with two disrupted alleles of the target gene that have not inherited the TARE drive element will not continue to develop.

For the suppression-type CAIN drive, CAIN is located inside a haplosufficient male fertility gene, which is thereby disrupted. All settings were the same as for the modification-type CAIN drive, except that the initial population size and K were both set to 100,000. Male CAIN homozygotes fail to produce viable pollen.

The simulation was implemented using a custom Python script available from https://github.com/QianLabWebsite/GeneDrive.

Example 1. Design of CAIN, a CRISPR-based poison-antidote gene drive system targeting pollen germination

CAIN consists of three closely linked parts: a poison, an antidote, and a cargo. The poison is a gRNA-Cas9 complex that can introduce inactivating mutations in an essential gene associated with pollen germination by triggering DSBs and subsequent repair by NHEJ. The antidote is a recoded version of this essential gene, expressed from its native promoter (FIG. 1a). In theory, the poison may disrupt both alleles of the essential gene prior to meiosis, affecting germination of all four pollen grains, and this effect is neutralized only in pollen grains in which the antidote is present. Thus, when CAIN-carrying plants are crossed with wild-type plants, only the two pollen grains carrying CAIN are able to germinate successfully, allowing CAIN to reach 100% transmission (FIG. 1b). Even if only one of the two alleles of the essential gene is disrupted by the poison, the transmission of CAIN will be two-thirds (FIG. 1b). In either case, the transmission rate of CAIN will exceed the 50% expected under Mendelian inheritance.
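The two transmission figures quoted above (100% and two-thirds) can be checked by enumerating the pollen genotypes of a CAIN/+ parent. The sketch below assumes that CAIN and the essential gene assort independently, that the antidote fully rescues pollen carrying CAIN, and that pollen with a disrupted allele and no antidote never germinates; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def cain_transmission(alleles_functional):
    """Share of germinating pollen that carries CAIN.

    alleles_functional: two booleans, one per allele of the essential gene
    in the parent (True = still functional, False = disrupted by the poison).
    """
    germinating = []
    # Each pollen grain inherits CAIN or not, and one of the two alleles.
    for has_cain, allele_ok in product((True, False), alleles_functional):
        if has_cain or allele_ok:  # rescued by the antidote, or never poisoned
            germinating.append(has_cain)
    return Fraction(sum(germinating), len(germinating))


print(cain_transmission((False, False)))  # both alleles disrupted
print(cain_transmission((True, False)))   # one allele disrupted
print(cain_transmission((True, True)))    # no cleavage: Mendelian baseline
```

With one allele disrupted, three of the four pollen classes germinate and two of those carry CAIN, which gives the two-thirds figure.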

This process can repeat over successive generations, so that CAIN spreads throughout the population through continued crossing. Although the gene drive system is intended for outcrossing populations, Arabidopsis thaliana was chosen as the experimental subject because it is a predominantly self-pollinating model plant. Artificial crosses were performed throughout the experiments, and strict experimental procedures and management further ensured the ecological safety of the gene drive system ahead of any formal release.

The implementation of CAIN requires the selection of a gene critical to pollen germination. To identify a suitable target gene, No Pollen Germination 1 (NPG1) was selected from a previously compiled list ¹³ of genes that are related to male gametophyte development but do not affect female gametophyte development; NPG1 is required for the late stage of pollen germination ¹⁴. Selecting it as the target gene allows a relatively long period for Cas9 cleavage, as Cas9 cleavage needs to be completed before NPG1 functions (FIG. 1a).

Twelve gRNAs were screened in the NPG1 CDS, and their cleavage efficiency was first tested in Arabidopsis protoplasts. Finally, four gRNAs (gRNA2, gRNA6, gRNA11 and gRNA23) with different DNA cleavage efficiencies were selected and built into the drive elements (FIG. 1a).
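Candidate target sites of the kind screened here can be enumerated by scanning a coding sequence for protospacers next to a PAM. The sketch below assumes the common SpCas9 convention of a 20-nt protospacer followed by an NGG PAM, which the text does not state explicitly. The function names are hypothetical, and the input is only the first 60 nt of SEQ ID NO. 2; a real screen would also rank sites by predicted efficiency and off-target risk.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_candidate_sites(cds, protospacer_len=20):
    """List (strand, start, protospacer, pam) for every NGG-adjacent site.

    For the "-" strand, start is counted on the reverse-complemented sequence.
    """
    cds = cds.upper()
    sites = []
    for strand, seq in (("+", cds), ("-", reverse_complement(cds))):
        # Stop early enough that a full protospacer plus 3-nt PAM fits.
        for i in range(len(seq) - protospacer_len - 2):
            pam = seq[i + protospacer_len:i + protospacer_len + 3]
            if pam[1:] == "GG":  # N-G-G: the first PAM base is unconstrained
                sites.append((strand, i, seq[i:i + protospacer_len], pam))
    return sites


# First 60 nt of the NPG1 coding sequence (SEQ ID NO. 2).
NPG1_CDS_START = "ATGCTCGGGAATCAATCCGCGGATTTTAGTGAGAAGGGGGAAGATGAGATCGTCAGACAG"

for site in find_candidate_sites(NPG1_CDS_START):
    print(site)
```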

To mitigate the negative effects that Cas9 expression in somatic cells may have on plant development and fitness, we selected promoters that are active mainly in germ cells to drive Cas9 expression (FIG. 1a). We constructed two gene drive systems using promoters with different spatiotemporal expression patterns (FIG. 1a). One is the DMC1 (Disruption of Meiotic Control 1) promoter, which can express Cas9 in pollen mother cells within the anthers; the other is the TPD1 (Tapetum Determinant 1) promoter, which is expressed continuously while the sporogenous cells, the progenitors of the pollen mother cells, develop into pollen mother cells ¹⁶. The latter provides a longer cleavage time window (FIG. 1a) and may therefore give higher overall Cas9 cleavage activity before NPG1 begins to act.

The antidote is a recoded version of the NPG1 gene sequence, driven by its native promoter to complement gene function. To ensure that the antidote is not cleaved by Cas9, we mutated the gRNA target sites using synonymous codons, so that the amino acid sequence is unchanged (FIG. 8a). In addition, the introns of the gene were removed in this version, which simplifies the construct and prevents it from interfering with later genotyping of the genomic NPG1 (FIG. 8b). For the cargo, a red fluorescent protein marker expressed in dry seeds, named FAST ¹⁷, was selected so that the spread of the drive can be observed.
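The synonymous recoding can be checked directly on the listed sequences. The sketch below translates the first 39 nt of the native coding sequence (SEQ ID NO. 2) and of the recoded coding sequence (SEQ ID NO. 3) with the standard genetic code and counts the nucleotide differences. The helper names are hypothetical, and only this short prefix is examined; the same functions could be applied to the full sequences.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def count_mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


WT_PREFIX = "ATGCTCGGGAATCAATCCGCGGATTTTAGTGAGAAGGGG"       # SEQ ID NO. 2, nt 1-39
RECODED_PREFIX = "ATGCTCGGGAATCAATCGGCAGACTTCTCAGAGAAGGGG"  # SEQ ID NO. 3, nt 1-39

print(translate(WT_PREFIX))                         # MLGNQSADFSEKG
print(translate(RECODED_PREFIX))                    # MLGNQSADFSEKG
print(count_mismatches(WT_PREFIX, RECODED_PREFIX))  # 7
```

Both prefixes encode the first 13 residues of SEQ ID NO. 1 even though they differ at seven nucleotide positions, which is the property that lets the antidote escape a gRNA designed against the native sequence.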

In summary, two CAIN vectors, TPD-CAIN and DMC-CAIN (FIG. 1a and FIG. 9), were constructed, each named after the promoter driving Cas9 expression. In addition, a vector containing only the FAST marker was constructed as a negative control (FIG. 9). Each vector was introduced into wild-type Arabidopsis Col-0 by Agrobacterium-mediated floral dip and inserted at a random genomic position (FIG. 10). T1 transgenic plants (i.e., first-generation transgenic plants) with a single-site insertion were selected for subsequent analysis; in the absence of gene drive activity, the drive element would be transmitted to offspring at a rate of 50% according to Mendelian inheritance.

Example 2 Significantly increased transmission rate of TPD-CAIN to F1 offspring

To assess whether the constructed CAIN gene drive is preferentially transmitted through the male parent, T1 Arabidopsis plants carrying CAIN were used as male parents and crossed with wild-type Col-0 as female parent (FIG. 2a). Since the Col-0 female parent always transmits a wild-type (WT) allele to the F1 generation, the proportion of offspring receiving CAIN from the male parent can be determined from the dominant phenotype conferred by FAST (i.e., red fluorescence in F1 seeds).

For DMC-CAIN, the results showed that only one of the three crosses (i.e., with D31 plants as male parent) had significantly increased CAIN transmission (exact binomial test, FIG. 2b). In contrast, for TPD-CAIN, all four crosses showed transmission rates of 89.6% to 96.9%, deviating strongly from Mendelian inheritance (FIG. 2b). The negative control (i.e., FAST only) was transmitted at about 50% (FIG. 2b), consistent with Mendelian inheritance.

Since the observations in each silique can be regarded as an independent trial, a replicated goodness-of-fit test (G-test, FIG. 2b) was further used to assess the deviation of the results from 50% and to account for differences among siliques. In the cross with D31 plants as male parent, which was the only DMC-CAIN cross showing a transmission rate above 50%, significant heterogeneity was observed among siliques, probably because Cas9 cleavage was less efficient and stochastic variation therefore produced inconsistency among siliques. For TPD-CAIN, by contrast, no significant heterogeneity in transmission rate was observed among siliques, and all four crosses showed a transmission efficiency approaching 100% (FIG. 2b), indicating strong and consistent performance of TPD-CAIN.
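A replicated G-test splits the total deviation from the expected ratio into a pooled component (overall departure from 50%) and a heterogeneity component (disagreement among siliques). The sketch below shows that standard decomposition under a 1:1 expectation. The per-silique counts are invented purely for illustration and are not data from this application; p-values would come from chi-square distributions with the listed degrees of freedom and are not computed here.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); classes with O = 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def replicated_g_test(counts, p_expected=0.5):
    """Decompose G for replicated counts.

    counts: list of (fast_positive, fast_negative) seed counts, one per silique.
    Returns the pooled, total and heterogeneity G with their degrees of freedom.
    """
    total_g = 0.0
    for pos, neg in counts:
        n = pos + neg
        total_g += g_statistic((pos, neg), (n * p_expected, n * (1 - p_expected)))
    pooled_pos = sum(pos for pos, _ in counts)
    pooled_neg = sum(neg for _, neg in counts)
    pooled_n = pooled_pos + pooled_neg
    pooled_g = g_statistic(
        (pooled_pos, pooled_neg),
        (pooled_n * p_expected, pooled_n * (1 - p_expected)),
    )
    k = len(counts)
    return {
        "pooled_G": pooled_g,                   # df = 1
        "total_G": total_g,                     # df = k
        "heterogeneity_G": total_g - pooled_g,  # df = k - 1
        "df": {"pooled": 1, "total": k, "heterogeneity": k - 1},
    }


# Hypothetical seed counts for four siliques of one cross (not real data).
example_counts = [(45, 3), (50, 2), (41, 5), (47, 4)]
print(replicated_g_test(example_counts))
```

A large pooled G with a small heterogeneity G corresponds to the TPD-CAIN pattern described above (strong, consistent distortion), whereas a large heterogeneity G corresponds to the silique-to-silique inconsistency reported for the D31 cross.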

Example 3 Parent-dependent segregation distortion in TPD-CAIN

To evaluate whether TPD-CAIN achieves segregation distortion by disrupting NPG1 as designed (FIG. 1), NPG1 was genotyped in F1 individuals. Rosette leaves and inflorescences were sampled, genomic DNA was extracted, and the target sites were amplified and Sanger sequenced for genotyping (FIG. 3a). The results showed that all FAST+ F1 plants (i.e., those carrying the drive element, TPD-CAIN/+; n=16) carried a disrupted NPG1 allele (NPG1^-) (FIG. 3b, FIG. 11); indels of various types were observed in 88% and 100% of F1 offspring at the gRNA2 and gRNA11 target sites, respectively. In theory, indels whose length is a multiple of three could produce CRISPR-resistant alleles without disturbing the downstream reading frame, but such alleles were rare (i.e., 0%) at both gRNA sites, suggesting that the CAIN design has a low rate of forming functional resistance alleles, especially when multiple gRNAs are combined. The presence of NPG1^- in almost all F1 plants suggests that CRISPR-based knockout of NPG1 is highly efficient, which in theory impairs pollen germination and thereby causes segregation distortion.

To investigate the mechanism of segregation distortion further, reciprocal crosses were performed between FAST+ F1 plants and Col-0 to determine whether the segregation distortion of TPD-CAIN depends on the direction of the cross. When FAST+ F1 plants were used as male parent (n=13), the transmission rate of TPD-CAIN was significantly higher than the 50% expected under Mendelian inheritance (FIG. 4a). In contrast, when FAST+ F1 plants were used as female parent (n=8), the transmission rate of TPD-CAIN did not deviate significantly from 50% (FIG. 4b). These results indicate that the segregation distortion of TPD-CAIN is caused by the male gametophyte defect resulting from NPG1 knockout.

The transmission rate of TPD-CAIN is higher than that of DMC-CAIN (FIG. 2), presumably because the TPD1 promoter yields higher Cas9 activity and therefore a higher rate of NPG1 knockout. To test this hypothesis, NPG1 was also genotyped in FAST+ F1 plants generated with DMC-CAIN (n=12). The results showed that only two F1 plants carried two KO alleles (NPG1^-/-), one carried a single KO allele (NPG1^+/-), and the remaining nine (NPG1^+/+) carried no KO allele (FIG. 5a-b). This indicates that, relative to TPD-CAIN (FIG. 3b), DMC-CAIN cleaves the NPG1 locus with low efficiency.

Although DMC-CAIN cleaved less efficiently, it was further evaluated whether DMC-CAIN could produce segregation distortion in the next generation. For this purpose, the two DMC-CAIN/+; NPG1^-/- F1 plants were used to pollinate wild-type female parents, and the proportion of red fluorescent F2 seeds was counted (FIG. 5b and 5c); DMC-CAIN transmission reached 95.9% and 99.5%, respectively. In contrast, among the F2 offspring produced with the DMC-CAIN/+; NPG1^+/- F1 plant as male parent, the drive frequency was 63.7% (FIG. 5b and 5c). For the other nine DMC-CAIN/+; NPG1^+/+ F1 plants, the drive frequency in the F2 offspring was close to 50% (FIG. 5b and 5c). On average, the transmission rate of DMC-CAIN from F1 to F2 was 57.5% (3367/5857). This limited transmission shows that cleavage efficiency is critical to the efficacy of the CAIN system.
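The exact binomial test applied to the transmission data can be illustrated with the pooled count reported above (3367 FAST+ seeds out of 5857). The sketch below computes a one-sided exact tail probability under the Mendelian null of 50%; whether the original analysis was one-sided or two-sided is not stated, so the choice here is an assumption, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials):
    """Exact one-sided P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    # Under p = 0.5 every outcome has probability 1 / 2**trials, so the tail
    # is a count of favourable outcomes divided by the total number of outcomes.
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials


fast_positive, total_seeds = 3367, 5857
print(f"transmission = {fast_positive / total_seeds:.1%}")  # transmission = 57.5%
print(f"one-sided exact p = {binomial_upper_tail(fast_positive, total_seeds):.3g}")
```

Integer arithmetic keeps the tail exact until the final division, which avoids the underflow that multiplying many small floating-point probabilities would cause at this sample size.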

Example 4 Insufficient DNA cleavage and incomplete penetrance keep TPD-CAIN transmission below 100%

When T1 plants carrying the gene drive element were used as male parent, the transmission rate of TPD-CAIN was between 89.6% and 96.9% (FIG. 2b), i.e., a fraction of the offspring (3.1%-10.4%) still did not inherit TPD-CAIN (i.e., were FAST-). Similarly, in the F2 generation produced by F1 plants carrying TPD-CAIN, some offspring (1.0%-12.2%) did not inherit TPD-CAIN (FIG. 4a). The mechanism by which these FAST- offspring arise was therefore explored further.

To determine whether these FAST- F1 plants arose because NPG1 was not successfully cleaved, NPG1 was genotyped in 11 FAST- F1 plants (FIG. 6a). Notably, 6 of these plants were homozygous for the WT allele at all four target sites (FIG. 6b). This shows that the male gametes provided by the male parent contributed WT alleles, so these FAST- F1 plants indeed arose by escaping Cas9 cleavage (i.e., insufficient DNA cleavage).

On the other hand, 5 FAST- F1 plants had a WT/KO genotype at the gRNA2 and gRNA11 target sites (FIG. 6b). Since the Col-0 female parent can only provide a WT allele, the KO allele must have been transmitted from the male parent through pollen. This means that a small number of pollen grains carrying the NPG1 KO allele but lacking the drive element were still able to germinate, indicating that the non-germination phenotype of the NPG1 KO allele is not 100% penetrant, i.e., there is incomplete penetrance.

To understand how DNA cleavage failure and incomplete penetrance may affect the spread of the CAIN drive at the population level, computational simulations based on the Wright-Fisher model were performed. This individual-based stochastic model assumes a finite, randomly mating population that reproduces in discrete, non-overlapping generations. Starting from 9900 wild-type individuals and 100 TPD-CAIN/+ individuals, the population size was held constant, mating pairs were chosen at random, and DNA cleavage and segregation distortion were restricted to the male parent. Based on the estimated male germline cleavage efficiency and penetrance (98.4% and 96.0%, respectively; FIG. 13a), the simulations showed that TPD-CAIN needs approximately 17 generations to spread from 1% to 99% of the population, only one generation more than under optimal conditions (i.e., 100% DNA cleavage efficiency and full penetrance; FIG. 7a). The results show that the spread of TPD-CAIN is relatively robust to these effects.
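The structure of such a simulation can be sketched as follows. This is a deliberately reduced model, not the simulation used to produce FIG. 7: it tracks only the number of drive copies per individual, treats every plant as a potential mother and father, assumes the father's total pollen output is not limiting, and ignores inheritance of NPG1 knockout alleles by non-carrier offspring. In a heterozygous father, non-drive pollen is assumed to stay viable with probability 1 - cleavage x penetrance. All function and parameter names are hypothetical.

```python
import random


def _gamete(genotype, het_transmission, rng):
    """Return 1 if the gamete carries the drive, else 0."""
    if genotype == 0:
        return 0
    if genotype == 2:
        return 1
    return 1 if rng.random() < het_transmission else 0


def simulate_cain(pop_size=10_000, initial_carriers=100, generations=20,
                  cleavage=0.984, penetrance=0.960, seed=0):
    """Return the drive-carrier frequency for each generation (index 0 = start)."""
    rng = random.Random(seed)
    # Genotype = number of drive copies; the released carriers are heterozygous.
    population = [1] * initial_carriers + [0] * (pop_size - initial_carriers)
    # Share of non-drive pollen from a heterozygous father that still germinates.
    nondrive_viability = 1.0 - cleavage * penetrance
    # Drive share among that father's germinating pollen.
    male_het_transmission = 1.0 / (1.0 + nondrive_viability)
    carrier_freq = [initial_carriers / pop_size]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            mother = rng.choice(population)
            father = rng.choice(population)
            egg = _gamete(mother, 0.5, rng)  # Mendelian through the female side
            pollen = _gamete(father, male_het_transmission, rng)
            offspring.append(egg + pollen)
        population = offspring
        carrier_freq.append(sum(g > 0 for g in population) / pop_size)
    return carrier_freq


if __name__ == "__main__":
    for generation, freq in enumerate(simulate_cain()):
        print(generation, f"{freq:.3f}")
```

Setting `cleavage=0.0` reduces the model to neutral Mendelian drift, which is a convenient sanity check; the generation counts produced by this reduced sketch should not be expected to match those reported for FIG. 7a.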

Example 5 NPG1 allelic conversion is widespread in germ cells of plants carrying TPD-CAIN

It was observed that, in somatic tissues of FAST+ F1 plants produced by pollinating wild-type plants (NPG1^+/+) with TPD-CAIN; NPG1^- pollen, the NPG1^+ genotype was not detected by Sanger sequencing (FIG. 3b; n=16). This suggests that the maternal NPG1^+ allele may have undergone CRISPR-mediated DSBs. To test this hypothesis further, leaf and inflorescence samples from three FAST+ F1 plants derived from male parent T18 were genotyped in detail at the gRNA11 target site of NPG1 using Illumina sequencing. In addition to the predominant genotype (52.9% on average), "NPG1^-8", which was probably inherited from T18 (FIG. 3b), several other genotypes were observed, whereas uncleaved NPG1^+ alleles were relatively rare (0.8% on average; FIG. 3c). These results show that the maternal NPG1^+ allele did undergo CRISPR-mediated DSBs, presumably through Cas9 activity after zygote formation, with the DSBs repaired mainly by end joining.

The Cas9 activity responsible for these DSBs may originate from one of two sources: Cas9 expressed from the embryo genome after fertilization (i.e., in the zygote), or paternal carryover of Cas9 protein. The latter appears unlikely because sperm cells contain little protein and RNA. To distinguish these two hypotheses, the NPG1 genotypes of F2 plants previously generated by backcrossing F1 plants were examined by Sanger sequencing (FIG. 6c). Whether TPD-CAIN was inherited from the male or the female parent, the NPG1^+ genotype did not appear in any FAST+ F2 offspring. In contrast, the NPG1^+ genotype was detected in all FAST- F2 offspring, even when the father carried TPD-CAIN (FIG. 6c). These observations are fully consistent with Cas9 expression after zygote formation and contradict the paternal carryover explanation.

Analysis of the NPG1 genotypes transmitted from FAST+ F1 plants to F2 plants (FIG. 6c) showed that one particular NPG1 genotype often predominated (i.e., at a frequency well above 50%), even though these F1 parents were heterozygous NPG1^+/- when the zygote first formed. For example, at the gRNA11 target site, the FAST+ F1 male parent T18-2-1 (NPG1^-8) transmitted three genotypes to its F2 plants, but NPG1^-8 appeared in 13/16 F2 plants, whereas the other two genotypes, NPG1^-C and NPG1^-30, were each found in only one plant. These findings suggest that, unlike in somatic tissues, DSBs in germ cells are repaired by HDR using the paternal NPG1^- allele as template, producing allelic conversion.

Example 6 Cleavage of NPG1 in female germ cells can promote CAIN propagation

Cleavage of NPG1 in female germ cells, followed by repair, may enhance CAIN propagation by supplementing NPG1 knockout in male germ cells. To estimate the NPG1 cleavage efficiency in female germ cells accurately, the genotypes of FAST- F2 plants were analyzed: because no further NPG1 cleavage occurs after formation of the F2 zygote, these plants preserve the original genotype information inherited from their parents. Of the 34 FAST- F2 offspring produced from FAST+ F1 plants (i.e., TPD-CAIN/+) and wild-type pollen, 33 showed the heterozygous NPG1^+/- genotype at the gRNA11 target site, giving an estimated NPG1 cleavage efficiency in female germ cells of 94.1% (FIG. 13b).
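The reported 94.1% is not simply 33/34 (97.1%). One reading that reproduces the figure, offered here as an assumption rather than as the stated calculation, is that the F1 mother started out heterozygous, so half of her eggs carry a knockout allele regardless of cleavage and only the half derived from the WT allele reports on cleavage. The function name is hypothetical.

```python
def female_cleavage_efficiency(ko_offspring, total_offspring):
    """Estimate cleavage of the WT allele in the germline of a +/- mother.

    Assumes observed KO fraction = 0.5 + 0.5 * efficiency, because half of
    the eggs inherit the pre-existing knockout allele without any new cleavage.
    """
    observed_ko_fraction = ko_offspring / total_offspring
    return 2 * observed_ko_fraction - 1


print(f"{female_cleavage_efficiency(33, 34):.1%}")  # 94.1%
```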

To evaluate the extent to which NPG1 cleavage in female germ cells can promote CAIN propagation, this parameter was incorporated into the simulations, with the initial introduction frequency of CAIN carriers set to 1% (FIG. 7b). The results reveal a context-dependent effect: when the male germline cleavage efficiency is 50%, an additional 50% female germline cleavage accelerates CAIN spread by about 3 generations. However, at the male germline cleavage efficiency of 98.4% observed in TPD-CAIN plants, the observed female germline cleavage of 94.1% accelerates spread by only about one generation (FIG. 7b). The simulations also show that CAIN spreads only a few generations more slowly than a homing drive. It is nevertheless much faster than the frequency-dependent TARE drive, which requires a sufficiently high initial release ratio to spread rapidly (FIG. 7b).

CAIN/TADS also spreads significantly faster than TARE with a high initial release ratio ¹⁸, which shows its potential as a means of population suppression when integrated into a haplosufficient male fertility gene ²⁵. To gauge the suppression effect, the population dynamics of CAIN were simulated at an initial release ratio of 1% (FIG. 7c). These simulations show a rapid increase in the number and frequency of CAIN carriers (mainly heterozygotes in the early stages and mainly homozygotes in the late stages). Meanwhile, the total population size gradually decreased until the population went extinct in generation 26 (FIG. 7c), possibly because of the growing number of male-sterile CAIN homozygotes. Overall, the computer simulations indicate that the rapid spread of CAIN gives it the potential to achieve population suppression.

In this study, the present inventors developed CAIN, a CRISPR-based poison-antidote gene drive system that achieves super-Mendelian inheritance by targeting a gene essential for male gametophyte function in Arabidopsis and that generates very few resistance alleles compared with homing-based drives. The work provides key insights for the design and application of synthetic TA gene drive systems in plants, and also offers an innovative approach to urgent ecological and agricultural challenges.

Homing-based drives, which have been successfully applied in mosquitoes ^3-7, often generate resistance alleles in species with low HDR repair rates, impeding their spread. Synthetic TA gene drive systems that mimic natural ones can overcome this problem. For example, the synthetic Medea ^20,21 system (maternal effect dominant embryonic arrest), inspired by the natural Medea gene drive in the flour beetle, exploits a delicate balance between a maternal miRNA (as the poison) and an antidote expressed in the zygote, and has been implemented in Drosophila. However, its wider application is limited because it relies on detailed knowledge of Drosophila embryo development. The design of TARE ¹⁰ (also known as ClvR ⁹) avoids this by targeting genes essential during individual development, although it depends on carryover of Cas9 activity from the egg into the zygote. The designs of Medea and TARE/ClvR are therefore biased toward the female germline and may impair fertility ²². In contrast, the present inventors' design acts on male germ cells. Given that pollen grains far outnumber ovules, this can minimize the fitness cost.

The design of the present inventors focuses on a gene essential for the normal function of the male gametophyte, and its distinctive strategy exploits the male gametophyte formation process common to plants: after meiosis, two rounds of mitosis follow, eventually forming mature pollen grains ²³. The pollen grain then germinates and its pollen tube elongates through the stigma to deliver the two sperm cells, achieving double fertilization and thereby transmission to the offspring. The design and efficacy of the inventors' gene drive system depend on selecting a suitable target gene (here, NPG1) and on selecting a strong promoter that gives high Cas9 activity in the germline. This plant-tailored approach may also be applicable in animals if critical essential genes for spermatogenesis can be identified.

Using the FAST marker ¹⁷ as cargo, the inventors established that CAIN can achieve efficient segregation distortion in plants. The cargo can also be replaced to address various ecological problems, depending on the specific situation and objective (FIG. 14). For example, CAIN systems could be used to control invasive plants by selecting specific genes, such as genes that affect megasporocyte or embryo development, to achieve population control. Alternatively, CAIN could be used to spread beneficial traits, such as drought or disease resistance genes, to improve the survival of endangered species in the field. Likewise, specific herbicide susceptibility genes could be introduced to manage weeds more effectively. If widely applied, this strategy might herald a new era of ecological management and sustainable agriculture.

Because gene drive technology is the subject of intense debate and regulatory scrutiny, the inventors took specific measures to ensure ecological safety when designing CAIN. Choosing Arabidopsis, a self-pollinating model plant, guards against accidental spread of the system, because a gene drive needs outcrossing to spread effectively in a population. In addition, the CAIN design allows a degree of specificity: by screening gRNAs, it can be directed against particular genotypes or ecotypes. Like homing-based drives, CAIN is a zero-threshold drive, i.e., in theory the release of even a single individual carrying the drive element can lead to gradual spread throughout the population. Its propagation speed, however, can be tuned by using weaker gRNAs or a weaker promoter for Cas9 expression, since the potency of the drive is closely tied to cleavage efficiency. This additional flexibility adds a layer of control over the spread of the gene drive, enhancing its safe use in different ecological settings.

Possible off-target effects of the gRNAs are a concern. Although off-target cleavage is unlikely to hinder CAIN transmission, it may introduce unintended genomic mutations and thereby increase genetic load. Although not emphasized in the results section, the inventors performed a small-scale assay: 16 potential off-target sites for the 4 gRNAs were predicted with CRISPR-P 2.0 ²⁴, and the sequences at these sites were determined by Sanger sequencing in 16 F1 plants generated from four T1 lines. None of these potential off-target sites was edited in the 16 plants tested, supporting the conclusion that the four gRNAs selected for CAIN specifically target NPG1.

The ability to update gene drive elements already present in a population is a critical safety consideration. The CAIN designed by the present inventors can be functionally replaced. Three preconditions must be met for this process (FIG. 12).

First, the new CAIN drive, CAINⁿ⁺¹, must be integrated at the same genomic location as CAINⁿ. Although the probability of achieving targeted integration by homology-directed repair (HDR) in plants is relatively low, screening a sufficient number of transformants can compensate for this limitation. Sharing the same locus forces CAINⁿ⁺¹ and CAINⁿ into direct competition, ensuring that only one drive element remains.

Second, CAINⁿ⁺¹ must use different gRNA targets to disrupt the essential gene and thus act as a new poison. This change makes both the recoded NPG1 in CAINⁿ and the genomic NPG1 targets for cleavage.

Finally, a new recoded NPG1 that is not targeted by the gRNAs of either CAINⁿ or CAINⁿ⁺¹ serves as the new antidote.

With these three preconditions met, the original CAINⁿ can be replaced by CAINⁿ⁺¹, thereby removing or modifying the current cargo. The method does not require a new target gene and keeps the overall size of the gene drive element unchanged, thereby improving compatibility and integration efficiency.

In summary, the present inventors devised CAIN, a CRISPR-based TA gene drive system specifically tailored for plants. By targeting the extended male gametophyte stage in the plant life cycle, the inventors successfully demonstrated the efficacy of CAIN in Arabidopsis, setting a benchmark for its use in other species. Given that the key component, the NPG1 gene, is conserved in sequence across many plants, extending CAIN to other species is promising. Looking ahead, the gene drive system needs further refinement, including research on its reversibility, its adaptability, and the mechanisms by which it can be controlled. CAIN and similar gene drive systems are expected to substantially reshape ecological management, agriculture and species conservation.

References:

1.DiCarlo, J.E., Chavez, A., Dietz, S.L., Esvelt, K.M. & Church, G.M. Safeguarding CRISPR-Cas9 gene drives in yeast. Nature biotechnology 33, 1250-1255 (2015).

2.Xu, H. et al. Chromosome drives via CRISPR-Cas9 in yeast. Nature communications 11, 4344 (2020).

3.Gantz, V.M. et al. Highly efficient Cas9-mediated gene drive for population modification of the malaria vector mosquito Anopheles stephensi. Proceedings of the National Academy of Sciences 112, E6736-E6743 (2015).

4.Hammond, A. et al. A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nature biotechnology 34, 78-83 (2016).

5.Kyrou, K. et al. A CRISPR–Cas9 gene drive targeting doublesex causes complete population suppression in caged Anopheles gambiae mosquitoes. Nature biotechnology 36, 1062-1066 (2018).

6.Li, M. et al. Development of a confinable gene drive system in the human disease vector Aedes aegypti. Elife 9, e51701 (2020).

7.Simoni, A. et al. A male-biased sex-distorter gene drive for the human malaria vector Anopheles gambiae. Nature biotechnology 38, 1054-1060 (2020).

8.Gantz, V.M. & Bier, E. The mutagenic chain reaction: a method for converting heterozygous to homozygous mutations. Science 348, 442-444 (2015).

9.Oberhofer, G., Ivy, T. & Hay, B.A. Cleave and Rescue, a novel selfish genetic element and general strategy for gene drive. Proceedings of the National Academy of Sciences 116, 6250-6259 (2019).

10.Champer, J. et al. A toxin-antidote CRISPR gene drive system for regional population modification. Nature communications 11, 1082 (2020).

11.Grunwald, H.A. et al. Super-Mendelian inheritance mediated by CRISPR–Cas9 in the female mouse germline. Nature 566, 105-109 (2019).

12.Champer, J., Kim, I.K., Champer, S.E., Clark, A.G. & Messer, P.W. Performance analysis of novel toxin-antidote CRISPR gene drive systems. BMC biology 18, 1-17 (2020).

13.Muralla, R., Lloyd, J. & Meinke, D. Molecular foundations of reproductive lethality in Arabidopsis thaliana. PloS one 6, e28398 (2011).

14.Golovkin, M. & Reddy, A.S. A calmodulin-binding protein from Arabidopsis has an essential role in pollen germination. Proceedings of the National Academy of Sciences 100, 10558-10563 (2003).

15.Klimyuk, V.I. & Jones, J.D. AtDMC1, the Arabidopsis homologue of the yeast DMC1 gene: characterization, transposon‐induced allelic variation and meiosis‐associated expression. The Plant Journal 11, 1-14 (1997).

16.Yang, S.-L. et al. Tapetum determinant1 is required for cell specialization in the Arabidopsis anther. The Plant Cell 15, 2792-2804 (2003).

17.Shimada, T.L., Shimada, T. & Hara‐Nishimura, I. A rapid and non‐destructive screenable marker, FAST, for identifying transformed seeds of Arabidopsis thaliana. The Plant Journal 61, 519-528 (2010).

18.Zou, J. et al. Comparative proteomic analysis of Arabidopsis mature pollen and germinated pollen. Journal of Integrative Plant Biology 51, 438-455 (2009).

19.Leljak-Levanić, D., Juranić, M. & Sprunck, S. De novo zygotic transcription in wheat (Triticum aestivum L.) includes genes encoding small putative secreted peptides and a protein involved in proteasomal degradation. Plant reproduction 26, 267-285 (2013).

20.Chen, C.-H. et al. A synthetic maternal-effect selfish genetic element drives population replacement in Drosophila. Science 316, 597-600 (2007).

21.Buchman, A., Marshall, J.M., Ostrovski, D., Yang, T. & Akbari, O.S. Synthetically engineered Medea gene drive system in the worldwide crop pest Drosophila suzukii. Proceedings of the National Academy of Sciences 115, 4725-4730 (2018).

22.Zanders, S.E. & Unckless, R.L. Fertility costs of meiotic drivers. Current Biology 29, R512-R520 (2019).

23.Schmidt, A., Schmid, M.W. & Grossniklaus, U. Plant germline formation: common concepts and developmental flexibility in sexual and asexual reproduction. Development 142, 229-241 (2015).

24.Liu, H. et al. CRISPR-P 2.0: an improved CRISPR-Cas9 tool for genome editing in plants. Molecular plant 10, 530-532 (2017).

25.Kim, Y.-J., Zhang, D. & Jung, K.-H. Molecular basis of pollen germination in cereals. Trends in Plant Science 24, 1126-1136 (2019).

26.Gibson, D.G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature methods 6, 343-345 (2009).

27.Xing, H.-L. et al. A CRISPR/Cas9 toolkit for multiplex genome editing in plants. BMC plant biology 14, 1-12 (2014).

28.Xie, X. et al. CRISPR-GE: a convenient software toolkit for CRISPR-based genome editing. Molecular plant 10, 1246-1249 (2017).

29.Yoo, S.-D., Cho, Y.-H. & Sheen, J. Arabidopsis mesophyll protoplasts: a versatile cell system for transient gene expression analysis. Nature protocols 2, 1565-1572 (2007).

30.Wu, F.-H. et al. Tape-Arabidopsis Sandwich-a simpler Arabidopsis protoplast isolation method. Plant methods 5, 1-10 (2009).

31.Clough, S.J. & Bent, A.F. Floral dip: a simplified method for Agrobacterium‐mediated transformation of Arabidopsis thaliana. The Plant Journal 16, 735-743 (1998).

32.Liu, Y.G., Mitsukawa, N., Oosumi, T. & Whittier, R.F. Efficient isolation and mapping of Arabidopsis thaliana T‐DNA insert junctions by thermal asymmetric interlaced PCR. The Plant Journal 8, 457-463 (1995).

33.Sun, L. et al. TDNAscan: a software to identify complete and truncated T-DNA insertions. Frontiers in Genetics 10, 685 (2019).

Partial sequences referred to in this application:

SEQ ID NO. 1 Arabidopsis NPG1 amino acid sequence

MLGNQSADFSEKGEDEIVRQLCANGICMKTTEVEAKLDEGNIQEAESSLREGLSLNFEEARALLGRLEYQRGNLEGALRVFEGIDLQAAIQRLQVSVPLEKPATKKNRPREPQQSVSQHAANLVLEAIYLKAKSLQKLGRITEAAHECKSVLDSVEKIFQQGIPDAQVDNKLQETVSHAVELLPALWKESGDYQEAISAYRRALLSQWNLDNDCCARIQKDFAVFLLHSGVEASPPSLGSQIEGSYIPRNNIEEAILLLMILLKKFNLGKAKWDPSVFEHLTFALSLCSQTAVLAKQLEEVMPGVFSRIERWNTLALSYSAAGQNSAAVNLLRKSLHKHEQPDDLVALLLAAKLCSEEPSLAAEGTGYAQRAINNAQGMDEHLKGVGLRMLGLCLGKQAKVPTSDFERSRLQSESLKALDGAIAFEHNNPDLIFELGVQYAEQRNLKAASRYAKEFIDATGGSVLKGWRFLALVLSAQQRFSEAEVVTDAALDETAKWDQGPLLRLKAKLKISQSNPTEAVETYRYLLALVQAQRKSFGPLRTLSQMEEDKVNEFEVWHGLAYLYSSLSHWNDVEVCLKKAGELKQYSASMLHTEGRMWEGRKEFKPALAAFLDGLLLDGSSVPCKVAVGALLSERGKDHQPTLPVARSLLSDALRIDPTNRKAWYYLGMVHKSDGRIADATDCFQAASMLEESDPIESFSTIL

SEQ ID NO. 2 Arabidopsis NPG1 coding sequence

ATGCTCGGGAATCAATCCGCGGATTTTAGTGAGAAGGGGGAAGATGAGATCGTCAGACAGCTTTGTGCTAATGGGATTTGCATGAAAACAACTGAAGTTGAAGCAAAGCTTGATGAAGGAAATATTCAAGAAGCTGAATCTTCTTTGAGAGAAGGATTATCTCTCAATTTCGAGGAAGCAAGAGCACTTCTTGGAAGATTGGAATACCAAAGAGGGAATTTAGAAGGCGCACTTCGTGTCTTTGAAGGTATCGACCTTCAAGCAGCTATCCAGCGGTTACAGGTTTCCGTGCCTCTTGAGAAACCGGCTACTAAGAAAAACCGTCCCCGTGAACCGCAGCAATCAGTTTCTCAGCATGCTGCTAACTTGGTCCTTGAAGCTATCTACTTGAAAGCCAAATCCCTTCAAAAGCTTGGGAGAATAACTGAGGCTGCTCATGAATGCAAGAGTGTTCTTGATTCTGTTGAGAAGATATTTCAGCAAGGGATACCAGATGCTCAAGTGGATAACAAACTTCAAGAAACCGTTAGCCACGCCGTTGAACTACTTCCTGCGCTATGGAAAGAATCTGGTGATTATCAAGAAGCCATATCTGCTTATAGACGCGCGCTTTTAAGCCAATGGAATCTTGATAATGATTGTTGTGCAAGGATTCAAAAAGATTTTGCAGTCTTTCTTTTACATTCTGGAGTCGAAGCGAGTCCACCGAGTTTAGGTTCTCAGATAGAGGGATCGTACATACCTAGAAACAACATAGAAGAAGCCATTCTTCTTCTAATGATTCTTTTAAAGAAGTTTAACCTCGGGAAAGCGAAATGGGATCCGTCTGTGTTTGAGCACCTTACCTTTGCGTTATCTTTATGTAGTCAGACCGCGGTTCTCGCCAAGCAGCTTGAAGAAGTAATGCCTGGTGTGTTTAGCCGTATTGAGCGTTGGAACACTTTGGCTCTTTCTTATAGTGCAGCAGGTCAAAACAGTGCTGCAGTTAACCTTCTTAGAAAGTCTCTGCATAAACACGAACAACCCGATGATCTTGTGGCGCTTTTGTTAGCTGCTAAGCTTTGCAGTGAAGAGCCTTCTTTAGCTGCTGAAGGTACGGGTTATGCGCAGAGAGCGATAAACAATGCTCAAGGTATGGATGAGCATTTGAAAGGCGTTGGTTTGAGGATGTTAGGACTTTGTTTAGGGAAACAAGCGAAGGTTCCGACATCGGATTTTGAAAGATCTCGGCTGCAATCAGAATCATTGAAAGCATTAGATGGAGCTATAGCTTTTGAGCACAATAATCCTGATTTGATCTTTGAGTTAGGTGTTCAATACGCTGAGCAACGGAACTTAAAAGCTGCTTCCCGTTACGCCAAAGAGTTCATCGATGCAACGGGAGGGTCAGTGTTAAAAGGATGGAGATTTCTCGCGCTTGTTTTGTCAGCTCAACAACGGTTTTCAGAAGCAGAAGTTGTGACTGATGCTGCTTTAGATGAAACTGCAAAGTGGGATCAGGGACCTCTCTTGAGACTCAAAGCAAAGCTGAAAATCTCTCAGTCAAATCCAACAGAAGCCGTTGAGACTTATCGTTACCTTCTTGCATTGGTTCAAGCGCAAAGGAAATCTTTCGGACCTCTCAGAACTCTTTCTCAGATGGAGGAAGACAAAGTGAATGAGTTTGAAGTGTGGCATGGCTTGGCTTATCTTTACTCAAGCCTTTCGCATTGGAACGACGTAGAAGTCTGTCTGAAAAAAGCCGGAGAGCTGAAACAATACTCTGCTTCAATGTTGCATACAGAAGGTCGAATGTGGGAAGGACGAAAGGAGTTCAAACCCGCGCTAGCAGCTTTCTTGGACGGTTTATTACTAGACGGATCATCGGTTCCTTGCAAAGTAGCGGTTGGAGCGTTATTGTCCGAAAGAGGGAAAGATCATCAGCCAACTCTCCCCGTGGCTAGAAGTTTGCTCTCTGATGCATTGAGGATCGATCCAACAAACCGAAAAGCTTGGTATTA
CTTAGGAATGGTTCATAAATCTGATGGACGTATAGCTGATGCTACTGATTGCTTCCAAGCTGCTTCTATGCTTGAAGAGTCTGATCCTATTGAAAGCTTCTCAACCATTCTTTAA

SEQ ID NO. 3 Recoded Arabidopsis NPG1 coding sequence

ATGCTCGGGAATCAATCGGCAGACTTCTCAGAGAAGGGGGAAGATGAGATCGTCAGACAGCTTTGTGCTAATGGGATTTGCATGAAAACAACTGAAGTTGAAGCAAAGCTTGATGAAGGAAATATTCAAGAAGCTGAATCTTCTTTGAGAGAAGGATTATCTCTCAATTTCGAGGAAGCAAGAGCACTTCTTGGAAGATTGGAATACCAAAGAGGGAATTTAGAAGGCGCACTTCGTGTCTTTGAAGGTATCGACCTTCAGGCGGCCATACAACGATTACAGGTTTCCGTGCCTCTTGAGAAACCGGCTACTAAGAAAAACCGTCCCCGTGAACCGCAGCAATCAGTTTCTCAGCATGCTGCTAACTTGGTCCTTGAAGCTATCTACTTGAAAGCCAAATCCCTTCAAAAGCTTGGGAGAATAACTGAGGCTGCTCATGAATGCAAGAGTGTTCTTGATTCTGTTGAGAAGATATTTCAGCAAGGGATACCAGATGCTCAAGTGGATAACAAACTTCAAGAAACCGTTAGCCACGCCGTTGAACTACTTCCTGCCTTGTGGAAGGAGAGCGGTGATTATCAAGAAGCCATATCTGCTTATAGACGCGCGCTTTTAAGCCAATGGAATCTTGATAATGATTGTTGTGCAAGGATTCAAAAAGATTTTGCAGTCTTTCTTTTACATTCTGGAGTCGAAGCGAGTCCACCGAGTTTAGGTTCTCAGATAGAGGGATCGTACATACCTAGAAACAACATAGAAGAAGCCATTCTTCTTCTAATGATTCTTTTAAAGAAGTTTAATTTAGGCAAGGCTAAGTGGGATCCGTCTGTGTTTGAGCACCTTACCTTTGCGTTATCTTTATGTAGTCAGACCGCGGTTCTCGCCAAGCAGCTTGAAGAAGTAATGCCTGGTGTGTTTAGCCGTATTGAGCGTTGGAACACTTTGGCTCTTTCTTATAGTGCAGCAGGTCAAAACAGTGCTGCAGTTAACCTTCTTAGAAAGTCTCTGCATAAACACGAACAACCCGATGATCTTGTGGCGCTTTTGTTAGCTGCTAAGCTTTGCAGTGAAGAGCCTTCTTTAGCTGCTGAAGGTACGGGTTATGCGCAGAGAGCGATAAACAATGCTCAAGGTATGGATGAGCATTTGAAAGGCGTTGGTTTGAGGATGTTAGGACTTTGTTTAGGGAAACAAGCGAAGGTTCCGACATCGGATTTTGAAAGATCTCGGCTGCAATCAGAATCATTGAAAGCATTAGATGGAGCTATAGCTTTTGAGCACAATAATCCTGATTTGATCTTTGAGTTAGGTGTTCAATACGCTGAGCAACGGAACTTAAAAGCTGCTTCCCGTTACGCCAAAGAGTTCATCGATGCAACGGGAGGGTCAGTGTTAAAAGGATGGAGATTTCTCGCGCTTGTTTTGTCAGCTCAACAACGGTTTTCAGAAGCAGAAGTTGTGACTGATGCTGCTTTAGATGAAACTGCAAAGTGGGATCAGGGACCTCTCTTGAGACTCAAAGCAAAGCTGAAAATCTCTCAGTCAAATCCAACAGAAGCCGTTGAGACTTATCGTTACCTTCTTGCATTGGTTCAAGCGCAAAGGAAATCTTTCGGACCTCTCAGAACTCTTTCTCAGATGGAGGAAGACAAAGTGAATGAGTTTGAAGTGTGGCATGGCTTGGCTTATCTTTACTCAAGCCTTTCGCATTGGAACGACGTAGAAGTCTGTCTGAAAAAAGCCGGAGAGCTGAAACAATACTCTGCTTCAATGTTGCATACAGAAGGTCGAATGTGGGAAGGACGAAAGGAGTTCAAACCCGCGCTAGCAGCTTTCTTGGACGGTTTATTACTAGACGGATCATCGGTTCCTTGCAAAGTAGCGGTTGGAGCGTTATTGTCCGAAAGAGGGAAAGATCATCAGCCAACTCTCCCCGTGGCTAGAAGTTTGCTCTCTGATGCATTGAGGATCGATCCAACAAACCGAAAAGCTTGGTATTA
CTTAGGAATGGTTCATAAATCTGATGGACGTATAGCTGATGCTACTGATTGCTTCCAAGCTGCTTCTATGCTTGAAGAGTCTGATCCTATTGAAAGCTTCTCAACCATTCTTTAA

SEQ ID NO. 4 Arabidopsis NPG1 native promoter sequence

tatgagtcgagtgtctgacttgtatgagttagggcctagtatgaataaataaacattattaatattaagatagttgttttcgataattgtttgataaggatgccactaaactcatacctcttagcttatacgaattgacttaattagacattaatacattatatctatatattatctagatttataattgctaagccaataggtcaaggtcttgtctaataaatgcatgcacaactaattcagtcaataatgtacctgtataatactacaaataattcaagctaattgatctatattgaagaacaaataagtaatctatttcggatttagtcttatcatgtgtctaaataaacacataactcttaagtcttaatgatttatttttgatagatatcaattataattatacaattacaaatgatttgatgattgactatacgtaagaactaactttgataattttgaattgggacaaatcattgaaggccttacgtttaagctttagatgtttcccaacgccaaaggagaatgaaaaggacagaccatcagtgatttgagtactcaatcaacatatttattatgtactttgagttaattaattttctattaataacaaaaatcaagcttgcacatttcaatgtgataagtatatgaataataatccaagctaatttttaagaaaagaggaatattgaaagcttgcaaattattcgaatgctagaggtccttaccttgcatgcaccttttgtaacaattacctatgggtgtggggaaatctagctagctacatattttcaattatttttccctattaaattgagattattgttataaaagaaaatgcccaaacttaattttcggggtttaaaattttgtttaaaaataaataaaatataagaaaagaaagaaaagtataatttgggttaaggggtttgaatatgattgatttgaatcgtcgtcgaaatgtatacgtcacctaacgcttttgttgctatactagtatcattaagtggaaattttaaagtcattaaaactcttctcatttttgtatttctaaaagagtcttaaggggtttgaatatgatttaaattatcttacaagtgtaaatgccatctaacgcttttgttgttatactagtattatttagtaataagatgctaaagtcactcaaactccagaatcaataatactccaagctatacatattagaattttaaaatagtatgaacactttcgataataaaaataccaaacttatttgggacactaaataagtttgggccgaaaatatttaaaagcccaatttaaactaaaattcatttaggctcttctcttctactaccttcttctatcgagccacaccgaatgaaattagtgaaactgctattggcttgtgaattgtgtgtgatggcgttaaagcctcttttagttcgtaaccgatgaaatgacagtaatagccttgagaaacactgaaaattacagaaagagagtttgaactttgaagacaaaacaggtgtttctatttctctccccgttcacgttctgcaacatcggaagcacgtacggctcctaagactccgttttgcttctttttttttaaaacacattcttattataaatataaaaaaaacaaagagagatcaaaaacaaaaagtttcctctctttttctaattttttaaagtttctttcatcttcttcagatccgaattgtcgccgcgaaattcgtcagtgcagcttcttcttcttcgcgtactttattcgatcggctgtctgaagaacatgaagccgatgatcgtaggtacgttagattcatttttccgaaattggctttttgatttttctgatcgaaacgatgcgaggttcaatttcatcattgttttgaaatctatgacttacaaaagtaatggcgttgacagatttgttcttaataaggacccacatttttgctgaattttggaacaaacattgttcttctttgatttcaaaacaagaattagaa
aattcatttatcatgtaatctatttagctgatgtgacgatgaacagatcaaaggaatgtagtctcgaattgtttaaggttataatgattcctctaagtgaaaaaaaaaaaaaaagcagaaaaaaaagttagaaagatgacaaagttgaagatttctttttgtctttgaagcttctatttttttggtgggtcctttttaagacaatgatttcaattcttggattttgtctgaagaaaaatgttgctgttcttctctttacaatgtttttgattgtgagcttgcgttgacttaaatcatgtcatatattttggtttctcacggttttatttattgtgccaagtgatgcagttgctgctagttacggtggattgatgtttggatggacgcagaaattttgatgtgggtttagtctaaaaggtgaagaaca

SEQ ID NO. 5 DMC1 promoter sequence

cagggaatgttccaatataagacactttaaacgtaagtttagacaatatagacactttccaagttagaggcacttttccttctttttgaaggaaaacttgacttttatacctcttaactaaacaatcgaaaacaataactaaatatatatcttaaccaaacaattaaaaaaataaaagaatttagatacgtagttattaatatagaccattagattgaaaaataaaaattaagatctatggctgagattaaagacaataaatggattaattttttgatgttaaaatctgattagaaaaaggtatttctcttcgtctctagaactaaatctctctctctaaaaaaacaatcgtttctccctttctccttcctgaagatcgttttttcataaatccatagtagtttaaaaacgaagcagagagatgttgaaaatcgtttctcatgaaattaatcgattattctctgtgaagttctttaatccacacaactttcctcatgaacatgataatagtagtaaatggaggtttttcctatggttactctagacgaaggaggatctccttgtgttggacaggtttgtgatttctttccatggattaaaaaaatttgattgtttgtttatgatgaacgattctttggctacggaagagtgtcatggagttctggcgaattctttggctatgtttggtgatttcgtttttaatcaagttgggaatcaataggaaacaactaagcatacaacatagattagaagagatatcaagatggatctaatttaagtaagatttggcgactaattctagatgattagggttatttgtgatttattacaaggcatttgtgttctcattgatttggcgagtaattctgtatgactagggttatttgtgttttcttaaaaagaatttgtgttcttgttgaaatcttgttcattggaattatttgtgtttggtaaatcttcattggtggctaaggatgtgtttgtagctcttacggcgtttgttattggtgatgtccattatggatggcaaattatggatggcacattatggatgatgaatcatggatgacatattatggatgacgcatcatggattgtatattatggattgatatggtgagatttgtaaatcttttggtcttacatgttaagagtaaaagatgaagaattggagaagcatgtctaacatcctaaaaacaagctatatgcggttgatttgctacaaataattttttggtatccataataacaaatccatttaaatatatccattcagaaacctttctactgatccgtatccattctatataccatgtcaataataataggagattcgattaaccgtgttttgtaaagaaaccaaagttccatgtccataaggttttgaaggtggaggtctctgcaaactgaaaaaaaaatcaacaaacaattttttggtgtccataataacgaatccatttaaatatatccattcggaaacctttttactgatctatatccattttatataacatgtccatgataacaggagattcgattaactgaaatctcgatgctacgtagatgaaacgagtttgacacatgagagagagcaaaaatcaaatcaaaccgccattgttgaagaagaagaagtttcttctcattttttacaaagatgaagagagagagaggtgaagagagagagagagatgaagagagagagagagagaaagagagagatgaagagagagagagaaagagagaaaacgtgggttaagataatattttagttaagagggtattttagtaaaaaaacataaaaaagtgcctaatcttttgaaagtgcctaaacacagaaatagttttaaaaaagtgtttaagagtgtaatattctctttttttcacctagattccttctattgaccgtcgatagacggatgataactatgacgtggcattatcgcagccatcaaacaaagtcatgtataacaaagaagagcacacaaacgaaaacaaattca
gttgcggaacccaaattcaaatcaacggaattagaatcacgctttcaattccgtaacccgccattaaaaaccttgaaccctcgaagcaaatcgagcaaagattttcaaatttcgaatttcaaaattctatctctctcactcttccaagcttagagactcttagagcgagaaa

SEQ ID NO. 6 TPD1 promoter sequence

acatagagcttgcatatatttggaggttagattacaagacgagattccatgtgtaacctaattgattaataaggcatctctatttatttgtgactcgacctgatctgatccgggtgggatacaacatgttgtagattagtgttattgataggaaatttgtaacatctctaaatgtttttgacctatgattgttttttttccttacaaacttataccattcccatagccttagcatctgccattgcagtaacattaacgatttccatgtgaaaaacaaccaattttagcaataatttgggttgactttgtcgagatcttggctcaattatatatattcataccattactatataagaactgatgtcttgtttatttgatgtcagacgcctgagaggtttcaaagtttttaaaaaaaaaaatttttaaagagaagcgtgtgtggctttaaataaggtcaactaggaaatgggaatcattcaacaagaagaaaaatgacaaaatgaaatatgaatgaagaggaggagggggtcgagaaaggttgagagaagcagaccaaagctctgcaaaactctgttttattaatgacacattgtgctctgtctgtcaaaagcaatgccttctttctagtgcatttattgcccattcccaaacaaaatatacaaataagtgtaaggatgcatgatatagtttaaaaaactatttgaaatgctcacattctttttgaacttctcttttaaatttgcaaaaaaaaattatatttttttgttccaaaaactgcaagcaaatgttgatacgaacgagccaacttgtcattttatgaccttgttttatctctgccagtcaaataactctttccgttttcgcttttttggcttacttcttactctgttggtttgccttttgtttggccttactttcgtttataggaatcgaatttcaatgttttatctttcctgtcgaaattaaattggtctttctaataaatctcatttttttctttttcaaagatttgtttatttagtgaacaaattcttaagagagtttttttccccaagcaattgaaaatgaatcatgtaatgttgatttttttggtgcaagtttatatagtttgctagaaatttggccttcatacgatatttgaacattttgatataagatttctatcagaagacagaagctacacgattgattcagccagagaaaacaaaagttgaaccgaacgattaaacccacacacaaaaaaaaacaaatagaataagaatgaaggagaaggaatataaaaatgggtacaagaaaaaacatcatcgtcgcaatcataaatgcaattgaaggcgcgtggaaaagagactcgtgtgcttctgatactcccacgtgaggatgtgacaatttaatattacgaattcaataattacccaatctttcttaatctgttaatttatctaagccaatcattcattcctttcacacccgccacggtgtcaatccaaattttttagaatcaccaaattacacctttacccttatattgtttttaatttgtttccgaattttaccaactgtttcaataaaacgtaccaacccatttctggttggatacaagcgggattcattcctatataacatttttaacggtatcattcaatcataccggtccaatttatttctatcatgctatctatataacattttcttacaaaatgtctttctctataccttttcacattcgaaacttcaaaagttaatgtgtcaatttaattacgcataactcgaaaaatgcattttaaaacaattaaaattaaatttatcttaatttgacgttataaaaaaatattgaatatatttccgagaaataataataagagaaaggactataaatacgtctctagtgtgtaatgtgtaacacagacgagagtcctcaaatccattttctctctctatctctctttatcccttcgtcttcttcctcggcgacaccacttgcaggcgctaactcgacgaag
aaggaaaaggtgagagaaactctctgaaaactgtacggatttaaacgtatatatgtgtgtatgtatacgaatctgatggtttagttttctggatttttctccattctctgttgattctactttttttgtttgtttgtttgctttgtttctctgtgtttcacgctgcactacgctccagctttctctttgtttttcagaaccagattgcttttttccatgaaactcgatcgagatttcttactttttcctatttttagtcgctttatgatacgattcatctgtcgcctgattcgcttcatctccttggtttgattttagattttcaatttcttctgtttttggttacgtttgtgttcgctgtgatgaagttttccctgaaacttgttaaaagcgataatgcatttcgccgtcgttttcttcgattttaggtttaagcttctctctctctctttcactgtacattgcgcaacagatttttgattttggtcaaagtttttcaaaatttctgcagtagattccttatatttcaaatcagagaagcgagtgatgttaggagccgcttaaatctggattttcctctgttttatactgttcattgatatgatggatgcaagacaagtcgtgtgataagatactcataaagtttttttcgttctctttcctctggtttttacagattttccggtgttagtcacatcgacgcagaaggaacagagaagaagacgagagtcagcttcattatcaactttagttcttcgacgtctacgcac

SEQ ID NO. 7 gRNA2 target sequence

CCCCTTCTCACTAAAATCCGCGG

SEQ ID NO. 8 gRNA6 target sequence

GATCCCATTTCGCTTTCCCGAGG

SEQ ID NO. 9 gRNA11 target sequence

ACCTTCAAGCAGCTATCCAGCGG

SEQ ID NO. 10 gRNA23 target sequence

CCAGATTCTTTCCATAGCGCAGG
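
Each of the four gRNA target sequences above is 23 nucleotides long and ends in GG, which is consistent with a 20-nt protospacer followed by an NGG protospacer-adjacent motif (PAM). The minimal Python sketch below checks this. It assumes that the Cas9 recited in the claims recognizes an NGG PAM (as Streptococcus pyogenes Cas9 does); that assumption and the helper name `split_target` are illustrative and are not stated in the sequence listing.

```python
# gRNA target sequences copied from SEQ ID NOs. 7-10 of the sequence listing.
TARGETS = {
    "gRNA2": "CCCCTTCTCACTAAAATCCGCGG",
    "gRNA6": "GATCCCATTTCGCTTTCCCGAGG",
    "gRNA11": "ACCTTCAAGCAGCTATCCAGCGG",
    "gRNA23": "CCAGATTCTTTCCATAGCGCAGG",
}


def split_target(seq):
    """Split a 23-nt target into (protospacer, PAM).

    Hypothetical helper. Assumes a 20-nt protospacer followed by a
    3-nt NGG PAM; this layout is an assumption, not a statement from
    the patent text.
    """
    seq = seq.upper()
    if len(seq) != 23:
        raise ValueError(f"expected 23 nt, got {len(seq)}")
    protospacer, pam = seq[:20], seq[20:]
    if pam[1:] != "GG":
        raise ValueError(f"PAM {pam} does not match NGG")
    return protospacer, pam


# Parse every listed target; a ValueError here would flag a malformed entry.
PARSED = {name: split_target(seq) for name, seq in TARGETS.items()}

for name, (protospacer, pam) in PARSED.items():
    print(name, protospacer, pam)
```

Under these assumptions, only the first 20 nucleotides of each listed sequence would be encoded in the guide RNA itself; the trailing three nucleotides are part of the genomic site only.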

Claims

1. An artificial gene driving system for a plant, the artificial gene driving system comprising:

a first nucleic acid comprising a coding sequence for a component of a gene editing system that can target and cause the loss of function of a pollen tube development essential protein in the plant, the coding sequence for the component of the gene editing system being operably linked to a promoter that mediates specific expression during pollen formation;

a second nucleic acid comprising a recoded coding sequence for the pollen tube development essential protein, the recoded coding sequence encoding the wild-type pollen tube development essential protein, not being targetable by the gene editing system, and being operably linked to a native promoter of the gene encoding the pollen tube development essential protein; and

a third nucleic acid comprising a coding sequence for a cargo to be transmitted in a population of said plant,

wherein the plant is Arabidopsis thaliana and the pollen tube development essential protein is Arabidopsis thaliana No Pollen Germination 1.

2. The artificial gene driving system of claim 1, wherein the first nucleic acid, second nucleic acid, and third nucleic acid are located on the same expression construct.

3. The artificial gene driving system of claim 2, wherein the No Pollen Germination 1 consists of the amino acid sequence set forth in SEQ ID NO. 1.

4. The artificial gene driving system of claim 2, wherein the coding sequence of endogenous No Pollen Germination 1 in the plant consists of the nucleotide sequence set forth in SEQ ID No. 2.

5. The artificial gene driving system of claim 2, wherein the recoded coding sequence of No Pollen Germination 1 consists of the nucleotide sequence set forth in SEQ ID NO. 3, and the recoded coding sequence of No Pollen Germination 1 cannot be targeted by the gene editing system and is therefore not disabled when the gene editing system is expressed.

6. The artificial gene driving system of claim 2, wherein the native promoter of No Pollen Germination 1 consists of the nucleotide sequence set forth in SEQ ID No. 4.

7. The artificial gene driving system of claim 2, wherein the promoter that mediates specific expression during pollen formation is the promoter of the Disruption of Meiotic Control 1 gene.

8. The artificial gene driving system of claim 7, wherein the promoter of the Disruption of Meiotic Control 1 gene consists of the nucleotide sequence shown in SEQ ID NO. 5.

9. The artificial gene driving system of claim 2, wherein the promoter that mediates specific expression during pollen formation is the promoter of the Tapetum Determinant 1 gene.

10. The artificial gene driving system of claim 9, wherein the promoter of the Tapetum Determinant 1 gene consists of the nucleotide sequence shown in SEQ ID NO. 6.

11. The artificial gene driving system of claim 1, wherein the gene editing system is selected from CRISPR-, ZFN-, or TALEN-based gene editing systems.

12. The artificial gene driving system of claim 11, wherein the CRISPR gene editing system comprises a CRISPR nuclease and at least one guide RNA.

13. The artificial gene driving system of claim 12, wherein the coding sequence of the CRISPR nuclease is operably linked to the promoter that mediates specific expression during pollen formation.

14. The artificial gene driving system of claim 12, wherein the gene editing system comprises a Cas9 nuclease and at least one gRNA targeting endogenous No Pollen Germination 1.

15. The artificial gene driving system of claim 14, wherein the at least one gRNA targeting endogenous No Pollen Germination 1 targets a nucleotide sequence selected from any one of SEQ ID NOs. 7-10.

16. The artificial gene driving system of any one of claims 1-15, wherein the cargo is a herbicide sensitive gene, a gene that disrupts herbicide resistance, a gene that improves environmental adaptation, or a gene that improves disease resistance.

17. A method of producing a modified plant for genetically engineering a plant population, the method comprising introducing the artificial gene driving system of any one of claims 1-16 into at least one plant, thereby obtaining at least one modified plant whose genome has the first nucleic acid, second nucleic acid, and third nucleic acid integrated therein, wherein the plant is Arabidopsis thaliana.

18. The method of claim 17, wherein the first, second, and third nucleic acids integrated into the genome of the modified plant are closely linked.

19. Use of a modified plant for genetically engineering a plant population, wherein the modified plant is obtained by the method of claim 17 or 18, or the artificial gene driving system of any one of claims 1 to 16 has been introduced into the modified plant, whereby the genome of the modified plant has the first, second, and third nucleic acids integrated therein, and wherein the plant is Arabidopsis thaliana.

20. A method of genetically modifying a plant population, the method comprising placing at least one modified plant obtained by the method of claim 17 or 18 into the plant population and allowing the at least one modified plant to cross with other plants in the plant population, wherein the plant is Arabidopsis thaliana.

21. The method of claim 20, wherein the method further comprises allowing offspring from crosses between the at least one modified plant and other plants in the plant population to cross with other plants and/or offspring in the plant population.

22. The method of claim 20 or 21, which results in an increased proportion of plants carrying the cargo in a plant population modified by the method as compared to an unmodified plant population.
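
Claims 1 and 5 require that the recoded coding sequence encode the wild-type protein while not being targetable by the gene editing system. The minimal Python sketch below illustrates how such a property could be checked for one site. `RECODED_FRAGMENT` is a 30-nt in-frame excerpt of the coding sequence listed immediately before SEQ ID NO. 4, taken here to be the recoded sequence; its alignment to the gRNA11 target (SEQ ID NO. 9) and the reading-frame offset are inferred from sequence similarity and are assumptions, as are all function and variable names.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains_target(seq, target):
    """True if the target occurs exactly on either strand of seq."""
    seq = seq.upper()
    return target.upper() in seq or reverse_complement(target) in seq


# gRNA11 target copied from SEQ ID NO. 9.
TARGET_GRNA11 = "ACCTTCAAGCAGCTATCCAGCGG"
# In-frame excerpt of the listed coding sequence (assumed alignment).
RECODED_FRAGMENT = "GACCTTCAGGCGGCCATACAACGATTACAG"

# Assumed alignment: the target corresponds to fragment positions 1..23.
ALIGNED = RECODED_FRAGMENT[1:24]
MISMATCHES = sum(a != b for a, b in zip(ALIGNED, TARGET_GRNA11))

# Compare the encoded peptide over the seven codons shared by both.
SAME_PEPTIDE = translate(RECODED_FRAGMENT[3:24]) == translate(TARGET_GRNA11[2:])

print("target present in fragment:", contains_target(RECODED_FRAGMENT, TARGET_GRNA11))
print("mismatches to target:", MISMATCHES)
print("same peptide encoded:", SAME_PEPTIDE)
```

An exact-match search of this kind is only a first check. Whether a given number of mismatches is enough to prevent cleavage depends on the nuclease and on the mismatch positions, which this sketch does not model.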