CN118207185A

CN118207185A - Base editing system and base editing method

Info

Publication number: CN118207185A
Application number: CN202211617230.3A
Authority: CN
Inventors: 季泉江; 马佳诚; 陈未中
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2022-12-15
Filing date: 2022-12-15
Publication date: 2024-06-18
Also published as: WO2024125221A1

Abstract

The invention relates to the technical field of biology, and in particular to a base editing system and a base editing method. The base editing system comprises: 1) a C2C9 nuclease and/or a nucleic acid encoding the C2C9 nuclease; 2) a guide RNA and/or a nucleic acid encoding the guide RNA; and 3) a deaminase and/or a nucleic acid encoding the deaminase. The base editing method comprises contacting a target gene with the base editing system to achieve single-base editing of the target gene. The invention achieves two different types of single-base mutation at target sites and greatly reduces the coding size of the base editing system: the protein-coding portion of the optimal base editing system is 2793 bp, far smaller than the maximum exogenous gene size that a single AAV can carry.

Description

Base editing system and base editing method

Technical Field

The invention relates to the technical field of biology, in particular to a base editing system and a base editing method.

Background

CRISPR/Cas (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein) is a gene editing system developed in recent years. Because they function with a single effector protein, the CRISPR/Cas9 and CRISPR/Cas12a systems are now the dominant gene editing tools. A Cas nuclease binds specifically to its corresponding guide RNA to form a complex in which the Cas protein recognizes a protospacer adjacent motif (PAM), the guide RNA recognizes a particular genomic site by complementary base pairing, and the nuclease then creates a double-strand break at the target site. The broken genomic DNA is repaired by DNA repair mechanisms that are endogenous to the organism or supplied externally, such as homologous recombination and non-homologous end joining, and the specific genomic site is edited during this repair process.

In cells, double-strand breaks in the genome are frequently repaired by the non-homologous end joining (NHEJ) pathway, which creates insertions or deletions (indels) at the break, causing frameshift mutations or premature termination of the gene of interest and thereby disrupting it. Because the NHEJ pathway is highly efficient, CRISPR/Cas9 and CRISPR/Cas12a can readily knock out a gene of interest. Correcting a target gene, however, requires the homology-directed repair pathway, which not only needs an additional repair template but also has very low editing efficiency.

To make gene correction easier, single-base editing of a gene of interest has been achieved by fusing a catalytically inactive Cas protein to a deaminase. Fusion to an adenine deaminase produces A-to-G mutations on the non-target strand, followed by the corresponding T-to-C change on the target (complementary) strand; fusion to a cytosine deaminase produces C-to-T mutations on the non-target strand, followed by the corresponding G-to-A change on the target strand. The net results are A·T-to-G·C and C·G-to-T·A base-pair substitutions, respectively. A Cas protein-based base editing system fused to a cytosine deaminase is generally called a cytosine base editor (CBE), and one fused to an adenine deaminase is called an adenine base editor (ABE). However, the CRISPR effector nucleases Cas9 and Cas12a are large proteins of more than 1000 amino acids, and the Cas gene alone exceeds 3 kb; fusing a deaminase makes the construct larger still, which greatly limits in vivo delivery and therapeutic application of such base editing tools. In vivo therapy typically relies on adeno-associated virus (AAV) vectors, which can accommodate a maximum of 4.7 kb of exogenous sequence, so Cas9- and Cas12a-based base editing systems are difficult to package. In addition, because the PAM sequences recognized by existing CRISPR/Cas systems are limited, the genomic sites that can be selected are restricted, which hinders further applications in medicine, biology and other fields.
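The strand bookkeeping described above can be made concrete with a small sketch. The Python below is illustrative only (all names are hypothetical): it applies an ABE-type A-to-G or CBE-type C-to-T change at one position of the non-target strand and returns the resulting target strand, showing that the complementary T becomes C, or the complementary G becomes A.

```python
# Illustrative sketch: a deaminase-driven change on the non-target strand
# becomes a full base-pair substitution once the complementary (target)
# strand is repaired or replicated. Names are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Net conversion on the non-target strand for each editor type
# (deaminated A -> inosine, read as G; deaminated C -> uracil, read as T).
EDITOR_CONVERSION = {"ABE": ("A", "G"), "CBE": ("C", "T")}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def edit_duplex(non_target: str, position: int, editor: str) -> tuple[str, str]:
    """Edit one base of the non-target strand and return both strands.

    `position` is a 0-based index into the non-target strand (5'->3').
    The target strand is returned 5'->3' as the reverse complement.
    """
    source, product = EDITOR_CONVERSION[editor]
    if non_target[position] != source:
        raise ValueError(f"{editor} needs {source} at position {position}")
    edited = non_target[:position] + product + non_target[position + 1:]
    return edited, reverse_complement(edited)


# A at index 1 becomes G; on the target strand the paired T becomes C.
print(edit_duplex("GATC", 1, "ABE"))  # ('GGTC', 'GACC')
```

Modeling the target strand as the reverse complement of the edited non-target strand assumes the edit has been fully resolved into a base-pair change.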

Recently, a new class of Cas proteins, C2C9, has been discovered. These proteins are encoded by very small genes (1.2 kb-2.4 kb), and even when a deaminase is fused the total gene size does not exceed 3 kb, which can overcome the packaging limit of AAV vectors. Moreover, the PAM sequences recognized by these proteins are AAN and GAN, which differ from those of the commonly used SpCas9 (NGG) and SaCas9 (NNGRRT) and from the T-rich PAMs typical of the Cas12 family, so base editing systems derived from these proteins can expand the range of selectable target sites.

Disclosure of Invention

In view of the above drawbacks of the prior art, an object of the present invention is to provide a base editing system and a base editing method that solve these problems. The invention addresses the difficulty of delivering large base editing systems and the restriction on selectable sites imposed by the single type of PAM recognized by existing systems. The base editing system and method provided by the invention enable efficient and accurate editing of target genes or target genomes in vitro or in vivo (including in cells).

To achieve the above and other related objects, the present invention provides a base editing system comprising:

1) A C2C9 nuclease and/or a nucleic acid encoding the C2C9 nuclease;

2) A guide RNA and/or a nucleic acid encoding the guide RNA;

3) A deaminase and/or a nucleic acid encoding the deaminase.

The invention also provides a recombinant expression vector comprising any two or three of the following: (i) a nucleotide sequence encoding a gRNA, (ii) a nucleotide sequence encoding a C2C9 nuclease, (iii) a nucleotide sequence encoding the deaminase.

The invention also provides an adeno-associated virus (AAV) in which the recombinant expression vector is packaged.

The invention also provides a base editing method, comprising contacting a target gene with the base editing system to achieve single-base editing of the target gene.

The present invention also provides a cell obtained by editing with the base editing system or the base editing method.

As described above, the base editing system and base editing method of the present invention have the following advantageous effects:

1. By fusing a deaminase to a catalytically inactivated C2C9 nuclease mutant, the invention achieves base editing at target sites. Using two different types of deaminase (a cytosine deaminase and/or an adenine deaminase), it achieves single-base A-to-G or C-to-T mutations on the non-target strand of a target site, together with the corresponding T-to-C or G-to-A mutations on the target strand.

2. The invention uses effector proteins of the novel CRISPR/C2C9 system, which greatly reduces the coding size of the base editing system. In the embodiments, the protein-coding portion of the optimal base editing system is only 2793 bp, far smaller than the maximum exogenous gene size that a single AAV can carry; this is not achievable with existing Cas9- or Cas12-based base editing systems.
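The size advantage can be checked with simple arithmetic. The sketch below uses the two figures stated in this document, the 2793 bp protein-coding portion of the optimal construct and the 4.7 kb AAV limit, and reports the headroom left for other elements (for example promoters, the guide RNA cassette and polyadenylation signals). Treating 4.7 kb as exactly 4700 bp, and treating the 2793 bp as a single open reading frame, are assumptions made for illustration.

```python
# Rough AAV payload budget using the figures stated in the text.
AAV_CAPACITY_BP = 4700            # "maximum of 4.7kb", assumed to be 4700 bp
D3_PROTEIN_CODING_BP = 2793       # protein-coding part of the optimal construct


def remaining_capacity(cargo_bp: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Base pairs left for regulatory elements and the gRNA cassette."""
    return capacity_bp - cargo_bp


def codon_count(orf_bp: int) -> int:
    """Number of codons in an open reading frame of the given length."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


print(f"Headroom: {remaining_capacity(D3_PROTEIN_CODING_BP)} bp")  # Headroom: 1907 bp
print(f"Codons: {codon_count(D3_PROTEIN_CODING_BP)}")              # Codons: 931
```

Whether the 931 codons include a stop codon is not stated in the source.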

3. The effector proteins of the novel CRISPR/C2C9 system recognize AAN and GAN PAMs, which differ from the PAMs of the commonly used SpCas9 (NGG) and SaCas9 (NNGRRT) and from the T-rich PAMs commonly recognized by the Cas12 family, thereby helping to relieve the restriction on editable sites in existing base editing systems.

4. The base editing system of the invention has a wide base editing window: in the embodiments, the optimal base editing system edits from the 3rd to the 14th base downstream of the PAM, which further increases the choice of editable sites.
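To show how an editing window narrows the candidate bases, the Python sketch below lists positions in a protospacer that lie within positions 3 to 14 downstream of the PAM and carry an editable base. It assumes 1-based numbering, with position 1 being the first nucleotide 3' of the PAM on the non-target strand. The function name, that numbering convention and the example protospacer are hypothetical.

```python
def editable_positions(protospacer: str, base: str = "A",
                       window: tuple[int, int] = (3, 14)) -> list[int]:
    """Return 1-based positions of `base` inside the editing window.

    Position 1 is taken to be the first nucleotide 3' of the PAM on the
    non-target strand (an assumed convention). Use base="A" for an ABE
    and base="C" for a CBE.
    """
    start, end = window
    return [i for i, b in enumerate(protospacer.upper(), start=1)
            if start <= i <= end and b == base]


# Hypothetical 20-nt protospacer
spacer = "GACATAGCTAAGTCAGTACG"
print(editable_positions(spacer))       # [4, 6, 10, 11]
print(editable_positions(spacer, "C"))  # [3, 8, 14]
```

The A at position 2 and the As at positions 15 and 18 are excluded because they fall outside the window.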

Drawings

FIG. 1 shows the results of an amino acid sequence alignment of multiple Cas12 nucleases and TnpB nucleases, including four C2C9 nucleases.

FIG. 2 shows the results of in vitro cleavage experiments with three candidate catalytically inactivating AcC2C9 mutants (D240A, E332A and D429A) and the wild-type AcC2C9 protein (WT). All three mutations abolished or markedly reduced AcC2C9 activity.

FIG. 3 is a schematic diagram of five AcC2C9-based ABE base editing system constructs.

FIG. 4 shows the base editing results of the five AcC2C9-based ABE base editing systems at four genomic sites in human cells. All five systems achieved effective base editing (efficiency of approximately 4%-13%), with the D3 version generally performing best.

FIG. 5 shows the gene size of the D3 version ABE base editing construct and its base editing results at additional genomic sites in human cells. The D3 version has a gene size of only 2793 bp and edits effectively at multiple sites.

FIG. 6 shows the analysis of the editing window (editable base positions) of the D3 version ABE base editing system at seven genomic loci (Guide1, Guide2, Guide3, Guide4, Guide6, Guide7 and Guide8). The editing window spans the 3rd to the 14th base of the target sequence downstream of the PAM.

FIG. 7 is a schematic diagram of two AcC2C9-based CBE base editing system constructs and their base editing results at the VEGFA site. Both CBE systems achieved effective base editing at the site of interest (efficiency of approximately 4%-13%).

Detailed Description

The terms "C2C9", "C2C9 nuclease", "C2C9 polypeptide", "C2C9 protein" are used interchangeably.

The terms "guide RNA", "gRNA", "single gRNA" and "chimeric gRNA" are used interchangeably.

The term "homology", "identity" or "similarity" refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology can be determined by comparing corresponding positions in aligned sequences: when the same position in the compared sequences is occupied by the same base or amino acid, the molecules are homologous at that position. The degree of homology between sequences is a function of the number of matching or homologous positions they share. "Unrelated" or "non-homologous" sequences should have less than 20% homology to one of the disclosed sequences.

A polynucleotide or polynucleotide region (or polypeptide region) having a certain percentage of sequence homology (e.g., 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98% or 99%) with another polynucleotide or polynucleotide region (or polypeptide region) means that, when the two sequences are aligned, that percentage of bases (or amino acids) is identical between them. The alignment and percent homology or sequence identity can be determined using software programs and methods known in the art, for example as described in Ausubel et al., eds. (2007) Current Protocols in Molecular Biology. Preferably, default parameters are used for sequence alignment. One suitable alignment program is BLAST with default parameters. In particular, when BLASTN and BLASTP are used for alignment with the following default parameters: Genetic code = standard; filter = none; strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50 sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL + DDBJ + PDB + GenBank CDS translations + SwissProtein + SPupdate + PIR, biologically equivalent polynucleotides are those that have the percent homology defined above and encode polypeptides having the same or similar biological activity.
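As a simplified illustration of the percent-identity definition above, the sketch below computes identity over two sequences that have already been aligned, with "-" marking gaps. It is not a substitute for BLAST, which also performs and scores the alignment. Tools differ in whether gap columns count toward the denominator; counting them here, while ignoring columns where both sequences have a gap, is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences share a residue.

    Inputs must be pre-aligned and of equal length; '-' denotes a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-ACGTA", "ACGTTACGTA"))  # 90.0
```

The same function applies unchanged to aligned amino acid sequences.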

In the present invention, the terms "polynucleotide" and "oligonucleotide" are used interchangeably and refer to polymeric forms of nucleotides of any length, whether deoxyribonucleotides or ribonucleotides or analogs thereof. Examples of polynucleotides include, but are not limited to, the following: genes or gene fragments (including probes, primers, ESTs or SAGE tags), exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, dsRNA, siRNA, miRNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes and primers. Polynucleotides also include modified nucleotides, such as methylated nucleotides and nucleotide analogs. If a modification is present on the polynucleotide, the modification may be imparted before or after assembly of the polynucleotide. Unless otherwise indicated or required, any embodiment of a disclosed polynucleotide of the present invention includes both its double stranded form and either of two complementary single stranded forms known or predicted to be capable of constituting the double stranded form.

The term "encoding", when applied to a polynucleotide, means that the polynucleotide "encodes" a polypeptide, i.e., in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the polypeptide of interest and/or fragments thereof, or to produce an mRNA capable of encoding the polypeptide of interest and/or fragments thereof. The antisense strand is the sequence complementary to such a polynucleotide, from which the coding sequence can be deduced.

The term "genomic DNA" refers to the DNA of the genome of an organism, including the genomic DNA of a bacterium, archaeon, fungus, protist, virus, plant or animal.

The term "manipulating" DNA includes binding DNA, nicking one strand of DNA or cleaving both strands, as well as modifying or editing DNA or polypeptides that bind to DNA. Manipulation of DNA can silence, activate or modulate the expression of an RNA or polypeptide encoded by the DNA (by preventing or reducing transcription, or by preventing or reducing translation), or can prevent or enhance binding of a polypeptide to DNA. Cleavage can be performed by a variety of methods, such as enzymatic or chemical hydrolysis of phosphodiester bonds, and can be single-stranded or double-stranded; DNA cleavage can produce blunt ends or staggered ends.

The term "hybridizable", "complementary" or "substantially complementary" means that a nucleic acid (e.g., RNA) comprises a nucleotide sequence that enables it to bind non-covalently to another nucleic acid in a sequence-specific, antiparallel manner under appropriate in vitro and/or in vivo temperature and solution ionic strength conditions, i.e., to form Watson-Crick base pairs and/or G/U base pairs ("anneal" or "hybridize").

It is understood in the art that the sequence of a polynucleotide need not be 100% complementary to the sequence of a target nucleic acid to which it is specifically hybridizable. Polynucleotides may hybridize over one or more segments. The polynucleotide can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within a target nucleic acid sequence to which it is targeted. The percent complementarity between specific nucleic acid sequence segments within a nucleic acid can be routinely determined using known BLAST programs and PowerBLAST programs in the art.
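The percent-complementarity idea can be sketched for two equal-length strands. The Python below assumes both the guide RNA and the target DNA strand are written 5'->3', so pairing is antiparallel (the first guide base pairs with the last target base). Optionally, it also counts the G/U pairs mentioned in the definition of "hybridizable". This ungapped comparison is an illustrative simplification and does not replace BLAST-based analysis; the function name is hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}  # G/U pairs, with U written as T


def percent_complementarity(guide: str, target_strand: str,
                            allow_wobble: bool = False) -> float:
    """Percent of positions at which `guide` base-pairs with `target_strand`.

    Both sequences are given 5'->3' and must have the same length; pairing
    is antiparallel. U is normalized to T before comparison.
    """
    g = guide.upper().replace("U", "T")
    t = target_strand.upper().replace("U", "T")
    if len(g) != len(t):
        raise ValueError("sequences must have equal length")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    paired = sum(1 for a, b in zip(g, reversed(t)) if (a, b) in allowed)
    return 100.0 * paired / len(g)


print(percent_complementarity("GCGU", "ACGT"))                     # 75.0
print(percent_complementarity("GCGU", "ACGT", allow_wobble=True))  # 100.0
```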

The terms "peptide," "polypeptide," and "protein" are used interchangeably herein and refer to polymeric forms of amino acids of any length, which may include encoded and non-encoded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.

The term "binding domain" refers to a protein domain capable of binding non-covalently to another molecule. A binding domain may bind, for example, a DNA molecule (DNA-binding protein), an RNA molecule (RNA-binding protein) and/or a protein molecule (protein-binding protein). In the case of a protein-binding domain, it may bind to itself (forming homodimers, homotrimers, etc.) and/or to one or more molecules of one or more different proteins.

The term "DNA sequence encoding" a particular RNA is a DNA nucleic acid sequence transcribed into RNA. The DNA polynucleotide may encode an RNA (mRNA) that is translated into a protein, or the DNA polynucleotide may encode an RNA (e.g., tRNA, rRNA, or gRNA; also referred to as "non-coding" RNA or "ncRNA") that is not translated into a protein. A "protein coding sequence" or a sequence encoding a particular protein or polypeptide is a nucleic acid sequence that is transcribed into mRNA (in the case of DNA) and translated (in the case of mRNA) into a polypeptide in vivo or in vitro under the control of appropriate regulatory sequences.

The term "vector" or "expression vector" is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, i.e., an "insert," may be attached in order to effect replication of the attached segment in a cell.

The term "expression cassette" refers to a DNA coding sequence operably linked to a promoter. "Operably linked" means that the components are in a relationship that allows them to function in their intended manner. The terms "recombinant expression vector" and "DNA construct" are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are typically generated to express and/or amplify the insert or to construct other recombinant nucleotide sequences.

When exogenous DNA, such as a recombinant expression vector, has been introduced into a cell, the cell has been "genetically modified" or "transformed" or "transfected" with the DNA. The presence of foreign DNA results in permanent or transient genetic changes. The transforming DNA may or may not be integrated into the genome of the cell.

The term "target DNA" refers to a DNA polynucleotide comprising a "target site" or "target sequence". The terms "target site", "target sequence", "target protospacer DNA" and "protospacer-like sequence" are used interchangeably herein to refer to a nucleic acid sequence in the target DNA to which the DNA-targeting segment of a gRNA will bind, provided that conditions sufficient for binding exist. The gRNA comprises a sequence that binds, hybridizes or is complementary to the target sequence within the target DNA, thereby directing the bound polypeptide to a specific location (the target sequence) within the target DNA. "Cleavage" refers to the breaking of the covalent backbone of a DNA molecule.

The terms "nuclease" and "endonuclease" are used interchangeably to refer to an enzyme that catalyzes endonucleolytic cleavage of polynucleotides. The "cleavage domain", "active domain" or "nuclease domain" of a nuclease refers to a polypeptide sequence or domain within the nuclease that has catalytic activity for DNA cleavage. The cleavage domain may be contained in a single polypeptide chain, or the cleavage activity may result from the association of two or more polypeptides.

The term "targeting polypeptide" or "RNA-binding site directed polypeptide" refers to a polypeptide that binds RNA and is targeted to a particular DNA sequence.

The term "guide sequence" or "DNA-targeting segment" (or "DNA-targeting sequence") refers to a nucleotide sequence complementary to a specific sequence within the target DNA (i.e., to one strand of the target DNA), referred to herein as a "protospacer-like" sequence.

The term "recombination" refers to the process of exchanging genetic information between two polynucleotides. "Homology Directed Repair (HDR)" as used herein means a specialized form of DNA repair that occurs, for example, during repair of double strand breaks in cells. This process requires nucleotide sequence homology, uses "donor" molecules to provide templates for repair of "target" molecules (i.e., molecules that undergo double strand breaks), and results in transfer of genetic information from the donor to the target. If the donor polynucleotide is different from the target molecule and part or all of the sequence of the donor polynucleotide is incorporated into the target DNA, homology directed repair may result in alterations (e.g., insertions, deletions, mutations) in the sequence of the target molecule.

The term "non-homologous end joining (NHEJ)" refers to repairing double strand breaks in DNA by directly joining the broken ends to each other without the need for a homologous template. NHEJ often results in a deletion of the nucleotide sequence near the site of double strand break.

The term "treating" includes preventing the occurrence of a disease or symptom, inhibiting a disease or symptom, and alleviating a disease.

The terms "individual," "subject," "host," and "patient" are used interchangeably herein and refer to any mammalian subject, particularly a human, for whom diagnosis, treatment, or therapy is desired.

The present invention provides a base editing system comprising:

1) A C2C9 nuclease and/or a nucleic acid encoding the C2C9 nuclease;

2) A guide RNA and/or a nucleic acid encoding the guide RNA;

3) A deaminase and/or a nucleic acid encoding the deaminase.

The C2C9 nuclease has at least one of the following activities: regulation of transcription within a target gene (e.g., target DNA), cleavage activity (endoribonuclease and/or endonuclease activity), and gene editing activity. Types of gene editing include, but are not limited to, gene cleavage, gene deletion, gene insertion, point mutation, transcriptional repression, transcriptional activation and base editing. The C2C9 nuclease may be derived from any biological species.

In certain embodiments of the invention, the C2C9 nuclease is:

(I) A wild-type C2C9 nuclease or fragment thereof, having RNA-guided nucleic acid binding activity;

(II) a variant having at least 30% sequence homology with the amino acid sequence of (I) and having RNA-guided nucleic acid binding activity;

(III) according to (I) or (II), further comprising a nuclear localization signal fragment;

(IV) according to (I) or (II) or (III), further comprising:

(a) one or more modifications or mutations that significantly reduce or abolish endonuclease activity compared with the sequence before the modification or mutation; and/or

(b) a polypeptide or domain having another functional activity;

(c) another polypeptide or domain that inactivates or significantly reduces a functional activity of the C2C9 nuclease;

(V) according to (I), (II) or (III), the C2C9 is a nuclease with impaired endonuclease activity, which may retain partial endonuclease activity or have no endonuclease activity.

Where the C2C9 nuclease has impaired or partial endonuclease activity, that activity may be reduced by at least 1%, 2%, 3%, 5%, 10%, 20%, 30%, 40%, 50% or more relative to wild-type endonuclease activity.

In certain embodiments of the invention, the amino acid sequence of the wild-type C2C9 nuclease is as set forth in any one of SEQ ID NOs. 1-115. In a preferred embodiment, the C2C9 nuclease is the AcC2C9 nuclease derived from Actinomadura craniellae, whose amino acid sequence is shown in SEQ ID NO. 1.

A C2C9 nuclease from a different species, or a different C2C9 nuclease from the same species, may require a different PAM sequence in the target DNA; its PAM requirements may therefore differ from the PAM sequences described above for the particular C2C9 nuclease selected.

In the present invention, the C2C9 nuclease variants may also be formed by modification, mutation, DNA shuffling, or the like, such that the C2C9 nuclease variants have improved desired characteristics, such as function, activity, kinetics, half-life, or the like. The modification may be, for example, a deletion, insertion or substitution of an amino acid, and may be, for example, the replacement of the "cleavage domain" of a C2C9 nuclease with a homologous or heterologous cleavage domain from a different nuclease (e.g., the HNH domain of a CRISPR-associated nuclease); the DNA targeting of C2C9 nucleases can be altered, for example, by any modification method known in the art for DNA binding and/or DNA modification proteins, such as methylation, demethylation, acetylation, and the like. By DNA shuffling is meant the exchange of sequence fragments between DNA sequences of C2C9 nucleases of different origins to produce chimeric DNA sequences encoding synthetic proteins with RNA-guided endonuclease activity. The modification, mutation, DNA shuffling, etc. may be used singly or in combination.

The C2C9 of the invention may be a wild type, truncation, mutant and/or fusion. Wild-type, truncated, mutant and fusion forms of the C2C9 nuclease all retain the ability to interact with the guide RNA and bind to the target gene. A truncation of the C2C9 nuclease is a version in which several amino acids of wild-type C2C9 are deleted; the deleted amino acids may be contiguous and/or non-contiguous. A mutant of the C2C9 nuclease is obtained by mutating several amino acids of wild-type C2C9; the mutation sites may be contiguous and/or non-contiguous. A fusion of the C2C9 nuclease is formed by fusing one or more polypeptides to wild-type C2C9 to confer functions such as nicking or localization; any number of polypeptides may be attached, at the C-terminus and/or N-terminus of the protein and/or at any suitable internal position.

In one embodiment, the C2C9 nuclease mutant carries any one or more of the AcC2C9 inactivating mutations D240A, E332A and D429A.
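Mutation names such as D240A follow the usual convention: wild-type residue, 1-based position, substituted residue. The hypothetical helper below applies such substitutions to a protein sequence and first checks the stated wild-type residue, which guards against numbering errors. The short test sequence is invented for illustration; the actual AcC2C9 sequence (SEQ ID NO. 1) is not reproduced here.

```python
import re

# Wild-type residue, 1-based position, substituted residue (e.g. "D240A")
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply point substitutions such as 'D240A' to a protein sequence.

    Raises ValueError if a name cannot be parsed, the position is out of
    range, or the stated wild-type residue does not match the sequence.
    """
    residues = list(protein)
    for name in mutations:
        match = MUTATION_PATTERN.match(name)
        if match is None:
            raise ValueError(f"cannot parse mutation {name!r}")
        wild_type, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{name}: position out of range")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"{name}: found {residues[pos - 1]} at {pos}")
        residues[pos - 1] = mutant
    return "".join(residues)


# Invented toy sequence with D at position 3 and E at position 5
print(apply_substitutions("MKDLEV", ["D3A", "E5A"]))  # MKALAV
```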

In one embodiment, the human codon-optimized nucleotide sequence encoding the AcC2C9 nuclease is set forth in SEQ ID NO. 136.

In some embodiments, the C2C9 nuclease variant has no cleavage activity. In some embodiments, the C2C9 nuclease variant has single-stranded cleavage activity. In some embodiments, the C2C9 nuclease variant has double-strand cleavage activity.

The C2C9 nuclease is no more than 800 amino acids in length.

The base editing system recognizes a PAM sequence in the target sequence and/or targets a nucleic acid fragment 12-40 bp in length (preferably 20 bp) located downstream of the PAM sequence.

The PAM sequence is AAN or GAN, where N is a degenerate base representing any of A, T, C or G.
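To illustrate how the AAN/GAN PAM and the preferred 20-nt targeting length translate into candidate target sites, the sketch below scans one strand of a DNA sequence for PAMs matching [AG]AN and returns the 20 nt directly 3' of each PAM as a candidate protospacer. Scanning only the given strand, fixing the length at 20 nt and placing the protospacer immediately adjacent to the PAM are simplifying assumptions; a practical tool would also scan the reverse complement. The function name is hypothetical.

```python
import re


def find_targets(dna: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Find candidate sites with an AAN or GAN PAM 5' of the protospacer.

    Returns (pam_start, pam, protospacer) tuples with 0-based positions on
    the given strand. Sites whose protospacer would run off the end of the
    sequence are skipped.
    """
    seq = dna.upper()
    hits = []
    # Zero-width lookahead so that overlapping PAMs (e.g. in "AAAT") are all found.
    for m in re.finditer(r"(?=([AG]A[ACGT]))", seq):
        spacer_start = m.start() + 3
        protospacer = seq[spacer_start:spacer_start + spacer_len]
        if len(protospacer) == spacer_len:
            hits.append((m.start(), m.group(1), protospacer))
    return hits


print(find_targets("CCGAT" + "C" * 20))  # [(2, 'GAT', 'CCCCCCCCCCCCCCCCCCCC')]
```

The `spacer_len` parameter can be set anywhere in the 12-40 nt range described above.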

In some embodiments, the gRNA comprises:

A first segment of nucleotide sequence complementary to a target sequence in a target gene (also referred to as a "gene targeting sequence" or "gene targeting fragment"); and

A second fragment (also referred to as a "protein binding sequence" or "protein binding fragment") that interacts with a C2C9 nuclease.

In some embodiments, the gRNA comprises an array of repeat spacer regions, wherein the spacer regions comprise a nucleic acid sequence complementary to a target sequence in a gene.

In some embodiments, the gRNA comprises:

i. a gene targeting segment (e.g., a DNA-targeting segment) capable of hybridizing to a target sequence;

ii. a tracr mate sequence; and

iii. a tracr RNA sequence.

In some embodiments, the gRNA further comprises:

motifs required for interaction with proteins and/or for recruitment; and

sequences required for structural stability.

The recruitment motif may be located at the 3' or 5' end of the guide RNA, or may be inserted within the guide RNA (e.g., within a hairpin loop). The recruitment motif may be any RNA motif recognized by an affinity polypeptide, i.e., an RNA motif that can be bound by that affinity polypeptide. RNA recruitment motifs and their corresponding affinity polypeptides include, but are not limited to, the telomerase Ku-binding motif (e.g., the Ku-binding hairpin) with the Ku affinity polypeptide (e.g., the Ku heterodimer); the telomerase Sm7-binding motif with the Sm7 affinity polypeptide; and the MS2 phage operator stem-loop with the MS2 coat protein (MCP). The preferred affinity polypeptide is MCP, and the corresponding preferred RNA motif is MS2. Their sequences are as follows:

MCP:MTSASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGI(SEQ ID NO.182)

MS2:CTAGATAGCTCTAGCTGTAGAAAACATGAGGATCACCCATGT(SEQ ID NO.183)

The structurally stable sequence is located at the 5' or 3' end of the guide RNA or within the guide RNA (e.g., within a hairpin loop). It may be any suitable motif, for example:

Dengue4:5'AGTCAGGCCACTTGTGCCACGGTTTGAGCAAACCGTGCTGCCTGTAGCTCCGCCAATAATGGGAGGCGT3'(SEQ ID NO.184)

ZIKV:5'TGTCAGGCCTGCTAGTCAGCCACAGTTTGGGGAAAGCTGTGCAGCCTGTAACCCCCCCAGGAGAAGCTGGGAAACCAAGCT3'(SEQ ID NO.185)

MVE:5'TAGTCAGGCCAGCCGGTTAGGCTGCCACCGAAGGTTGGTAGACGGTGCTGCCTGCGACCAACCCCAGGAGGACTGGGT3'(SEQ ID NO.186)

YF:5'TGTCAGCCCAGAACCCCACACGAGTTTTGCCACTGCTAAGCTGTGAGGCAGTGCAGGCTGGGACAGCCGACCTCCAGGTTGCGAAAAACCTGGT3'(SEQ ID NO.187)

The guide RNA is a strand formed by the DNA-targeting segment (i) linked in sequence to the tracr mate sequence (ii) and tracr RNA sequence (iii); alternatively, the guide RNA comprises two strands, one of which is formed by ligation of the gene targeting segment (i) with the tracr mate sequence (ii) and the other strand is the tracr RNA sequence (iii).

Wherein, the gene targeting segment (i) is preferably an RNA sequence corresponding to a nucleic acid fragment with the length of 20bp after the PAM sequence.

Wherein the tracr mate sequence (ii) hybridizes to the tracrRNA sequence (iii) and forms a stem-loop structure. The stem-loop structure forms a protein binding structure that interacts with a C2C9 nuclease. In some embodiments, the protein binding structure of the gRNA comprises a 4-stem loop structure, and the tracr mate sequence (ii) is typically paired with the tracr RNA sequence (iii) by base complementarity. The activity of the C2C9 nuclease can be increased or nonspecific recognition reduced by engineering the base sequence of the gRNA.

Wherein the tracr mate sequence (ii) and the tracr RNA sequence (iii) are capable of being joined together to form a single guide RNA backbone sequence.

The crRNA is formed by joining the gene targeting segment (i), which hybridizes to the target sequence, to the tracr mate sequence (ii). When this crRNA and the tracr RNA sequence (iii) are present together as two separate RNA molecules, they can mediate C2C9 endonuclease activity. Alternatively, a complete guide RNA expression construct for the target sequence, obtained by joining the guide RNA backbone sequence to the DNA-targeting segment (i) that hybridizes to the target sequence, can also mediate C2C9 endonuclease activity.
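The two guide formats described above can be represented in a short sketch. The Python below joins the targeting segment (i), the tracr mate sequence (ii) and the tracr RNA sequence (iii) into one strand for the single-guide format, or returns the crRNA (i + ii) and the tracr RNA (iii) as two strands for the dual-guide format, converting DNA-alphabet templates to RNA (T to U). The parts are concatenated in the order listed in the text. The 5'->3' orientation of the parts, direct joining without a linker, and the placeholder sequences are assumptions; they are not the actual sequence of SEQ ID NO. 134.

```python
from dataclasses import dataclass


def to_rna(seq: str) -> str:
    """Convert a DNA-alphabet sequence to RNA (T -> U)."""
    return seq.upper().replace("T", "U")


@dataclass
class GuideParts:
    targeting_segment: str  # (i) hybridizes to the target sequence
    tracr_mate: str         # (ii) pairs with the tracr RNA
    tracr_rna: str          # (iii)


def single_guide(parts: GuideParts) -> str:
    """Single-strand format: (i), (ii) and (iii) joined in the listed order."""
    return to_rna(parts.targeting_segment + parts.tracr_mate + parts.tracr_rna)


def dual_guide(parts: GuideParts) -> tuple[str, str]:
    """Two-strand format: crRNA = (i) + (ii), plus the separate tracr RNA (iii)."""
    crrna = to_rna(parts.targeting_segment + parts.tracr_mate)
    return crrna, to_rna(parts.tracr_rna)


# Placeholder parts, for illustration only
parts = GuideParts("ACGT", "GGAA", "TTCC")
print(single_guide(parts))  # ACGUGGAAUUCC
print(dual_guide(parts))    # ('ACGUGGAA', 'UUCC')
```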

The gene targeting segment of the gRNA comprises a nucleotide sequence complementary to a sequence in the target gene that interacts with the target gene in a sequence-specific manner by hybridization (i.e., base pairing). The gene targeting sequence of the gRNA can be modified, for example, by genetic engineering, so that the gRNA hybridizes to any desired sequence within the target gene. The gRNA directs the bound polypeptide, e.g., C2C9 nuclease, to a specific nucleotide sequence within the target gene via the gene targeting sequences described above.

In some embodiments, the target gene is a DNA sequence. In some embodiments, the target gene is an RNA sequence.

In some embodiments, the sequence of the gRNA is set forth in SEQ ID No. 134.

In some embodiments, the targeting sequence of the target gene may have a length of 12-40 nucleotides, for example, may be 13-20, 18-25, 22-32, 26-37, 30-38, 32-40 nucleotides in length. The percent complementarity between the targeting segment (i) of the guide RNA and the target sequence of the target gene can be at least 50% (e.g., at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99%, or 100%).
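Percent complementarity can be computed as the fraction of positions at which the targeting segment forms Watson-Crick pairs with the strand it hybridizes to. The sketch below assumes an ungapped, equal-length comparison and counts G·U wobble as a mismatch; both are simplifying assumptions, not definitions from the text. As input it uses the 20-nt core of the Guide1 oligonucleotides from Example 5, written as RNA.

```python
# (guide RNA base, target-strand DNA base) Watson-Crick pairs; G-U wobble is not counted
WC_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand: str) -> float:
    """Ungapped % complementarity between a targeting segment and the strand it pairs with.

    Both sequences are written 5'->3'. Because the duplex is antiparallel,
    guide position i faces target position n-1-i.
    """
    g = guide_rna.upper().replace("T", "U")
    t = target_strand.upper().replace("U", "T")
    if len(g) != len(t):
        raise ValueError("sequences must have the same length")
    n = len(g)
    matches = sum((g[i], t[n - 1 - i]) in WC_PAIRS for i in range(n))
    return 100.0 * matches / n


guide = "AGUGACAGUAUCCUCUGUAU"          # Guide1 core sequence (Example 5) as RNA
perfect = "ATACAGAGGATACTGTCACT"        # the strand it pairs with (reverse complement)
one_mismatch = "GTACAGAGGATACTGTCACT"   # first base changed A -> G
print(percent_complementarity(guide, perfect), percent_complementarity(guide, one_mismatch))
# -> 100.0 95.0
```

A single mismatch in a 20-nt segment therefore corresponds to 95% complementarity, within the "at least 95%" tier listed above.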

In some embodiments, the gRNA further comprises a transcription terminator.

The C2C9 nuclease and its guide RNA are used to bind target DNA for base editing.

In certain embodiments of the invention, the deaminase is a cytosine deaminase and/or an adenine deaminase.

In some embodiments, the deaminase is linked to the C2C9 nuclease via a linker peptide, or is fused directly to it. In other embodiments, the deaminase is recruited to the vicinity of the C2C9 nuclease by protein-protein, RNA-protein, or chemical interactions, thereby associating with the C2C9 nuclease. In a preferred embodiment, the deaminase is fused to the C2C9 nuclease via a linker peptide.

The deaminase may be attached at one or more suitable positions of the C2C9 nuclease. In one embodiment, the deaminase is fused to either the C-terminus or the N-terminus of the C2C9 nuclease alone, or to both its C-terminus and its N-terminus.

The adenine deaminase is a DNA or RNA adenine deaminase. In one embodiment, the adenine deaminase is TadA (tRNA-specific adenosine deaminase), TadA* (an evolved tRNA-specific adenosine deaminase), or a fusion of the two. The adenine deaminase is selected from wild-type TadA, mutant TadA, or a complex of both. In a preferred embodiment, the nucleotide sequence of the adenine deaminase is as shown in any one of SEQ ID NO.139-SEQ ID NO.141, or has at least 50% (e.g., at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100%) similarity to SEQ ID NO.139-SEQ ID NO.141.

The cytosine deaminase is a DNA or RNA cytosine deaminase. In one embodiment, the cytosine deaminase is selected from an apolipoprotein B mRNA editing catalytic polypeptide-like (APOBEC) deaminase, human activation-induced deaminase (hAID), FERNY deaminase and/or CDA1 deaminase, or a variant thereof. The APOBEC is, for example, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H or APOBEC4. In a more preferred embodiment, the nucleotide sequence of the cytosine deaminase is as shown in any one of SEQ ID NO.154-SEQ ID NO.155, or has at least 50% (e.g., at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98%, at least 99% or 100%) similarity to SEQ ID NO.154-SEQ ID NO.155.

One or more deaminases may be fused. The fusion site may be the C- and/or N-terminus of the protein and/or any suitable internal site.

The C2C9 nuclease can also be linked to a polypeptide of interest. The polypeptide of interest comprises at least one polypeptide having nickase activity, recombinase activity, transposase activity, methylase activity, glycosylase (DNA glycosylase) activity, glycosylase inhibitor activity (e.g., uracil-DNA glycosylase inhibitor (UGI)), demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, histone modification activity, nuclease activity, single-stranded RNA cleavage activity, double-stranded RNA cleavage activity, restriction endonuclease activity (e.g., FokI), nucleic acid binding activity, methyltransferase activity, DNA repair activity, DNA damage activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer formation activity, integrase activity, polymerase activity, ligase activity, helicase activity, and/or photolyase activity.

In a preferred embodiment, the polypeptide of interest comprises a glycosylase inhibitor, and two or more glycosylase inhibitors may be used. The glycosylase inhibitor is optionally recruited to the vicinity of the C2C9 nuclease and/or deaminase by protein-protein, RNA-protein, or chemical interactions, thereby associating with the C2C9 nuclease and/or deaminase. Fusion via linker peptides is the preferred recruitment method.

In a more preferred embodiment, the glycosylase inhibitor is a uracil-DNA glycosylase inhibitor (UGI). In one embodiment, the nucleotide sequence of UGI is SEQ ID No.153.

The UGI fusion number is one or more. The fusion site may be the C-terminus and/or N-terminus of a C2C9 nuclease and/or deaminase, and/or any suitable site within a C2C9 nuclease and/or deaminase.

The system also includes the requisite nuclear localization signal (NLS) peptide; the preferred NLS is that of SV40, whose amino acid sequence is PKKKRKV (SEQ ID NO. 188).

The number of nuclear localization signal peptides is one or more. The fusion site may be the C-terminus and/or N-terminus of a C2C9 nuclease and/or deaminase, and/or any suitable site within a C2C9 nuclease and/or deaminase.

Protein elements such as the C2C9 nuclease, deaminase, nuclear localization signal peptide and glycosylase inhibitor can be connected directly or through a linker of 4-100 amino acids. The protein linker is 4 or more amino acids in length, for example 4 to 32 amino acids; in one embodiment, it is 4, 10 or 32 amino acids long.
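The assembly rule above (elements joined directly or through a 4-100 aa linker) can be expressed as a small helper that concatenates parts N- to C-terminus, validates linker lengths, and reports the resulting open-reading-frame size. This is a sketch under stated assumptions: only the SV40 NLS sequence comes from the text; the deaminase and nuclease entries are length placeholders, and the "SGGS" linker is a hypothetical 4-aa example, not a sequence disclosed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    name: str
    aa: str  # one-letter amino-acid sequence


def assemble_fusion(parts: List[Part], linkers: List[str]) -> str:
    """Join parts N- to C-terminus; linkers[i] sits between parts[i] and parts[i+1].

    An empty linker means direct fusion; a non-empty linker must be 4-100 aa,
    the range given in the text.
    """
    if len(linkers) != len(parts) - 1:
        raise ValueError("need one linker entry (possibly empty) per junction")
    for linker in linkers:
        if linker and not 4 <= len(linker) <= 100:
            raise ValueError(f"linker length {len(linker)} is outside 4-100 aa")
    protein = parts[0].aa
    for linker, part in zip(linkers, parts[1:]):
        protein += linker + part.aa
    return protein


def coding_length_bp(protein: str, with_stop: bool = True) -> int:
    """Open-reading-frame length: 3 nt per residue plus an optional stop codon."""
    return 3 * len(protein) + (3 if with_stop else 0)


# SV40 NLS from the text; deaminase and nuclease are length placeholders only.
nls = Part("SV40 NLS", "PKKKRKV")
deaminase = Part("deaminase (placeholder)", "A" * 160)
nuclease = Part("C2C9 (placeholder)", "G" * 400)
fusion = assemble_fusion([nls, deaminase, nuclease], ["", "SGGS"])
print(len(fusion), coding_length_bp(fusion))  # -> 571 1716
```

Keeping linkers as explicit entries, with an empty string for direct fusion, makes both connection modes described in the text representable in one list.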

In certain embodiments of the invention, the polynucleotide encoding a C2C9 nuclease, the polynucleotide encoding a deaminase, and the guide RNA are comprised in one or more expression vectors.

In certain embodiments of the invention, the polynucleotide encoding the C2C9 nuclease and the guide RNA are contained in one or more expression vectors, in which embodiment the C2C9 nuclease is fused to the deaminase fragment via a linker peptide.

In certain embodiments of the invention, the polynucleotide encoding the C2C9 nuclease and the guide RNA are contained in one or more expression vectors, in which embodiment the deaminase is recruited to the vicinity of the C2C9 nuclease by protein-protein interactions or protein-nucleic acid interactions.

In certain embodiments of the invention, polynucleotides encoding the C2C9 nuclease and a guide nucleic acid carrying a deaminase-recruiting segment are contained in one or more expression vectors.

The recombinant expression vector is selected from viral vectors (e.g., vectors based on vaccinia virus, poliovirus, adenovirus, adeno-associated virus, SV40, herpes simplex virus or human immunodeficiency virus, and retroviral vectors such as those derived from murine leukemia virus, spleen necrosis virus, Rous sarcoma virus, Harvey sarcoma virus, avian leukemia virus, lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus and mammary tumor virus), and the like.

The invention also provides an adeno-associated virus (AAV) obtained by packaging the recombinant expression vector into viral particles.

The invention also provides a composition comprising one or more of the C2C9 nuclease or a polynucleotide encoding it, the gRNA or a polynucleotide encoding it, and the base editing system, and optionally an acceptable carrier, vehicle, or the like. Such acceptable carriers and vehicles include, for example, sterile water or physiological saline, stabilizers, excipients, antioxidants (ascorbic acid, etc.), buffers (phosphoric acid, citric acid, other organic acids, etc.), preservatives, surfactants (PEG, Tween, etc.), chelating agents (EDTA, etc.), binders, and the like. The composition may further include other low-molecular-weight polypeptides; proteins such as serum albumin, gelatin and immunoglobulins; amino acids such as glycine, glutamine, asparagine, arginine and lysine; saccharides or carbohydrates such as polysaccharides and monosaccharides; and sugar alcohols such as mannitol and sorbitol. When an aqueous solution for injection is prepared, for example with physiological saline or an isotonic solution containing glucose or other adjuvants (such as D-sorbitol, D-mannose, D-mannitol or sodium chloride), a suitable solubilizing agent such as an alcohol (ethanol, etc.), a polyol (propylene glycol, PEG, etc.) or a nonionic surfactant (Tween 80, HCO-50) may be used in combination. In some embodiments, the composition comprises a gRNA and a buffer for stabilizing nucleic acids.

The invention also provides a kit comprising a system or composition as described above. The kit may further comprise one or more other components, for example a dilution buffer, a wash buffer, control reagents, and the like. In some embodiments, the kit comprises (a) a C2C9 nuclease as described above or a nucleic acid encoding it; (b) a gRNA or a nucleic acid encoding the gRNA, wherein the gRNA is capable of directing the C2C9 nuclease or a variant thereof to a target polynucleotide sequence; and (c) a deaminase and/or a nucleic acid encoding the deaminase. In certain embodiments, the kit further comprises a donor template comprising a heterologous polynucleotide sequence that is capable of being inserted into the target polynucleotide sequence.

The present invention provides for the use of the systems, compositions and kits described above in any of the following in vivo, ex vivo, or cell-free systems, including but not limited to:

Cutting the target gene;

Manipulating expression of a target gene;

Genetically modifying the target gene;

Genetically modifying a target gene-related polypeptide;

Intentional and controlled damage at any desired location of the target gene;

Intentional and controlled repair at any desired location of the target gene.

In some embodiments, the target gene is a target DNA. In some embodiments, the target gene is a target RNA.

In the present invention, the systems, compositions, kits, etc. are applicable to any organism, including but not limited to bacteria, archaea, fungi, protists, plants and animals. Accordingly, suitable target cells include, but are not limited to, bacterial cells, archaeal cells, fungal cells, protozoan cells, plant cells, or animal cells (e.g., rodent cells, human cells, non-human primate cells). Suitable target cells may be any type of cell, including stem cells, somatic cells, and the like.

The base editing method is performed in cells or in vivo, ex vivo or cell-free systems.

The base editing method comprises the following steps:

i) Introducing the C2C9 nuclease or a polynucleotide encoding it, the guide RNA or a polynucleotide encoding it, and a deaminase and/or a polynucleotide encoding the deaminase into a cell. The introduction method may be plasmid transfection, viral or viroid delivery, liposome delivery, microinjection, or any other feasible delivery method;

ii) The AcC2C9 nuclease and the deaminase mediate the generation of one or more mutations in the target gene, or the targeting, editing, modification or manipulation of the target gene.

The target gene is a nucleic acid that is present in an organism and/or that is present in nature and/or that is not present in nature. The organism is an animal, plant, fungus, archaeon or bacterium.

In the present invention, the bacteria or prokaryotes may be Escherichia coli, Klebsiella pneumoniae, Bacteroides ovatus, Campylobacter jejuni, Staphylococcus saprophyticus, Enterococcus faecalis, Bacteroides thetaiotaomicron, Bacteroides vulgatus, Bacteroides simplex, Lactobacillus casei, Bacteroides fragilis, Acinetobacter reuteri, Fusobacterium nucleatum, Bacteroides johnsonii, Arabidopsis thaliana, Lactobacillus rhamnosus, Bacteroides mosaic, Paramygdalina faecalis, Fusobacterium mortiferum, Bifidobacterium breve, etc.

In the present invention, eukaryotic cells include, but are not limited to, mammalian cells, fungal cells, and the like. The fungi include yeasts and Aspergillus species, for example Saccharomyces cerevisiae, Hansenula polymorpha, Pichia pastoris, Kluyveromyces fragilis, Kluyveromyces lactis, Schizosaccharomyces pombe, Candida albicans, Candida duveticus, Candida glabrata, Candida quaternium, Candida lactis, Candida krusei, Candida vini, Candida merrillii, Candida oleaginous, Candida parapsilosis, Candida tropicalis, Candida utilis, Aspergillus fumigatus, Aspergillus flavus, Aspergillus niger, Aspergillus clavatus, Aspergillus glaucus, Aspergillus nidulans, Aspergillus oryzae, Aspergillus terreus, Aspergillus ustus, Aspergillus versicolor, etc.

In some embodiments, the target gene may be in vitro naked DNA that is not bound to a DNA-related protein.

In the methods of the invention, the C2C9 nuclease or polynucleotide encoding the same, the gRNA or polynucleotide encoding the same, the deaminase and/or nucleic acid encoding the deaminase or recombinant expression vectors, systems, compositions and/or donor polynucleotides thereof are administered directly to an individual when applied in vivo.

In the method of the present invention, a C2C9 nuclease, or a nucleic acid comprising a nucleotide sequence encoding the C2C9 nuclease polypeptide, may be introduced into a cell by a known method. Likewise, the gRNA or a nucleic acid comprising a nucleotide sequence encoding the gRNA, and a deaminase and/or a nucleic acid encoding the deaminase, may be introduced into a cell by well-known methods, including DEAE-dextran-mediated transfection, liposome-mediated transfection, viral or phage infection, lipofection, transfection, conjugation, protoplast fusion, polyethylenimine-mediated transfection, electroporation, calcium phosphate precipitation, gene gun, microinjection, nanoparticle-mediated nucleic acid delivery, and the like. Plasmids can be delivered, for example, by electroporation, calcium chloride transfection, microinjection or lipofection. For viral vector delivery, the cells are contacted with viral particles comprising a nucleic acid encoding a gRNA and/or a C2C9 nuclease and/or a chimeric C2C9 nuclease and/or a donor polynucleotide.

The invention also provides a cell, namely a host cell genetically modified with the C2C9 nuclease or a polynucleotide encoding it, the gRNA or a polynucleotide encoding it, the deaminase and/or a nucleic acid encoding it, or a recombinant expression vector, system or composition comprising them.

In the present invention, effective dosages of the gRNA and/or C2C9 nuclease and/or deaminase and/or recombinant expression vector and/or donor polynucleotide are conventional to those of skill in the art and can be determined according to the route of administration and the nature of the condition being treated.

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present invention.

Before the embodiments of the invention are explained in further detail, it is to be understood that the invention is not limited in its scope to the particular embodiments described below; it is also to be understood that the terminology used in the examples of the invention is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the invention; in the description and claims of the invention, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.

Where numerical ranges are provided in the examples, it is understood that, unless otherwise stated herein, both endpoints of each numerical range and any value between them may be used. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices and materials used in the embodiments, any methods, devices and materials of the prior art similar or equivalent to those described in the embodiments may be used to practice the present invention, according to the knowledge of one skilled in the art and the description of the present invention.

Example 1 Determination of the catalytic active sites of AcC2C9

The catalytic active sites of AcC2C9 were determined by amino acid sequence alignment, performed with NCBI COBALT (Constraint-based Multiple Alignment Tool). The amino acid sequences of common Cas12-family nucleases and of TnpB were used as input: AcC2C9 (SEQ ID NO.1), CgC2C9 (SEQ ID NO.3), RdC2C9 (SEQ ID NO.18), MiC2C9 (SEQ ID NO.10), FnCas12a (SEQ ID NO.116), LbCas12a (SEQ ID NO.117), AsCas12a (SEQ ID NO.118), Un1Cas12f1 (SEQ ID NO.119), SpCas12f1 (SEQ ID NO.120), AsCas12f1 (SEQ ID NO.121) and TnpB (SEQ ID NO.122). As shown in Fig. 1, the alignment indicates that aspartic acid 240 (D240), glutamic acid 332 (E332) and aspartic acid 429 (D429) of AcC2C9 correspond to the three catalytic active sites conserved across the Cas12 nuclease family and TnpB nucleases.
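Reading residue numbers such as D240 off a multiple alignment requires mapping between alignment columns and ungapped residue positions in each row. The helpers below sketch that mapping for any gapped alignment (for example one exported from COBALT); the two short rows in the example are toy sequences, not the real Cas12 or TnpB sequences, and the functions are illustrative rather than part of the original analysis.

```python
def column_of_residue(aligned: str, residue_number: int) -> int:
    """0-based alignment column holding the given 1-based residue of a gapped row."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds sequence length")


def residue_at_column(aligned: str, col: int):
    """(residue, 1-based residue number) of a gapped row at `col`, or None for a gap."""
    if aligned[col] == "-":
        return None
    return aligned[col], sum(ch != "-" for ch in aligned[: col + 1])


# Toy two-row alignment (not real Cas12/TnpB sequences).
query = "MK-DLE"
other = "MKRD-E"
col = column_of_residue(query, 3)   # the D that is residue 3 of `query`
print(col, residue_at_column(other, col))  # -> 3 ('D', 4)
```

The example shows why a conserved catalytic residue can carry different numbers in different homologs: the same column is residue 3 in one row and residue 4 in the other.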

Example 2 Construction of the heterologous expression plasmids pET28a-sumo-AcC2C9 and pET28a-sumo-dAcC2C9-D240A/pET28a-sumo-dAcC2C9-E332A/pET28a-sumo-dAcC2C9-D429A for the AcC2C9 protein and its mutation-inactivated variants

1. Construction of pET28a-sumo-AcC2C9 plasmid

The following sequences were synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

The AcC2C9 coding gene expression cassette (nucleotide sequence shown as SEQ ID NO.123) and the pET28a-sumo plasmid backbone (nucleotide sequence shown as SEQ ID NO.124) were assembled into the pET28a-sumo-AcC2C9 plasmid by Gibson assembly. The assembly product was transformed into E. coli DH5α cells and selected on LB agar plates containing 50 μg/mL kanamycin overnight at 37 °C; single colonies were picked and sequenced, yielding the pET28a-sumo-AcC2C9 plasmid (SEQ ID NO.125).

2. Construction of the pET28a-sumo-dAcC2C9-D240A/pET28a-sumo-dAcC2C9-E332A/pET28a-sumo-dAcC2C9-D429A plasmids

Mutations were introduced by PCR amplification using the pET28a-sumo-AcC2C9 plasmid above as the template. First, the following six primers were synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

D240A primerF:

5’-CGAGGCGCTGTTGGGGTGGcgCTTGGCGTTAAGGTC-3’(SEQ ID NO.126)

D240A primerR:

5’-GGACCTTAACGCCAAGcgCCACCCCAACAGCGCCT-3’(SEQ ID NO.127)

E332A primerF:

5’-TCGCGGTCGTCGCGCTTGccGATCTGAACGTGGCCG-3’(SEQ ID NO.128)

E332A primerR:

5’-CCGGCCACGTTCAGATCggCAAGCGCGACGACCGC-3’(SEQ ID NO.129)

D429A primerF:

5’-TGATGGACCGGGccATGAACGCGGCCCGGAACATC-3’(SEQ ID NO.130)

D429A primerR:

5’-GGCCGCGTTCATggCCCGGTCCATCACCAGCCCGC-3’(SEQ ID NO.131)
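Each primerF/primerR pair above is largely reverse-complementary around the lowercase (mutated) bases, as is usual for whole-plasmid mutagenesis primers. The helper below measures that overlap with a plain longest-common-substring search between primerF and the reverse complement of primerR. It is an illustrative consistency check, not a step of the original protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic programme over matching suffixes."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def primer_overlap(fwd: str, rev: str) -> str:
    """Region over which primerF and primerR are mutually complementary (case ignored)."""
    return longest_common_substring(fwd.upper(), revcomp(rev))


PRIMERS = {  # (primerF, primerR) as listed above, SEQ ID NO.126-131
    "D240A": ("CGAGGCGCTGTTGGGGTGGcgCTTGGCGTTAAGGTC", "GGACCTTAACGCCAAGcgCCACCCCAACAGCGCCT"),
    "E332A": ("TCGCGGTCGTCGCGCTTGccGATCTGAACGTGGCCG", "CCGGCCACGTTCAGATCggCAAGCGCGACGACCGC"),
    "D429A": ("TGATGGACCGGGccATGAACGCGGCCCGGAACATC", "GGCCGCGTTCATggCCCGGTCCATCACCAGCCCGC"),
}

for name, (f, r) in PRIMERS.items():
    print(name, len(primer_overlap(f, r)))
```

For the three pairs above, the check reports overlaps of 34, 34 and 26 nt, each spanning the mutated bases.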

The mutation-containing plasmid fragments dAcC2C9-D240A/dAcC2C9-E332A/dAcC2C9-D429A were amplified separately with Phanta Max Master Mix (Vazyme) in the following reaction: 25 μL 2× Phanta Max Master Mix, 1.5 μL PrimerF (10 μM), 1.5 μL PrimerR (10 μM), 0.5 μL template DNA (100 ng/μL), 1.5 μL DMSO and 20 μL ddH₂O. For each mutant, PrimerF and PrimerR are the corresponding pair: D240A primerF/D240A primerR for dAcC2C9-D240A, E332A primerF/E332A primerR for dAcC2C9-E332A, and D429A primerF/D429A primerR for dAcC2C9-D429A. The PCR program was: 98 °C for 2 min; 30 cycles of 98 °C for 20 s, 55 °C for 20 s and 72 °C for 3 min; and a final step at 72 °C for 5 min.

The template DNA was then digested with QuickCut™ DpnI (TaKaRa, Cat. #1609): the PCR product was incubated with 1 μL QuickCut™ DpnI at 37 °C for 30 min. The PCR products were then purified separately with the SanPrep Column PCR Product Purification Kit (Sangon Biotech (Shanghai) Co., Ltd.) according to the kit manual.
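The final concentrations in the 50 μL PCR can be checked with C1·V1 = C2·V2. In the sketch below the volumes and stock concentrations are taken from the text, except that the DMSO stock is assumed to be neat (100% v/v), which the text does not state.

```python
# (component, volume in uL, stock concentration, unit); volumes and stocks are from
# the text, except that the DMSO stock is assumed to be neat (100% v/v).
REACTION = [
    ("2x Phanta Max Master Mix", 25.0, 2.0, "x"),
    ("PrimerF", 1.5, 10.0, "uM"),
    ("PrimerR", 1.5, 10.0, "uM"),
    ("template DNA", 0.5, 100.0, "ng/uL"),
    ("DMSO", 1.5, 100.0, "% v/v"),
    ("ddH2O", 20.0, None, None),
]


def final_concentrations(reaction):
    """Apply C1*V1 = C2*V2 to every component with a known stock concentration."""
    total = sum(volume for _, volume, _, _ in reaction)
    finals = {
        name: (stock * volume / total, unit)
        for name, volume, stock, unit in reaction
        if stock is not None
    }
    return total, finals


total_ul, finals = final_concentrations(REACTION)
print(f"total volume: {total_ul} uL")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

This gives a 50 μL reaction with 1× master mix, 0.3 μM of each primer, 50 ng of template in total (1 ng/μL) and, under the neat-DMSO assumption, 3% DMSO.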

The three DNA fragments dAcC2C9-D240A/dAcC2C9-E332A/dAcC2C9-D429A were each assembled into a plasmid by Gibson assembly. The Gibson assembly products were each transformed into E. coli DH5α competent cells, selected on LB agar plates containing 50 μg/mL kanamycin, and cultured overnight at 37 °C. Single colonies were then picked and expanded, plasmids were extracted, and the pET28a-sumo-dAcC2C9-D240A/pET28a-sumo-dAcC2C9-E332A/pET28a-sumo-dAcC2C9-D429A plasmids were confirmed by sequencing.

Example 3 Complete or partial loss of in vitro double-stranded DNA cleavage activity in three mutation-inactivated AcC2C9 variants

1. Preparation of nucleases

Preparation of the AcC2C9 nuclease. The pET28a-sumo-AcC2C9 plasmid constructed in Example 2.1 was transformed into the E. coli expression strain BL21(DE3). The next day, the transformants were transferred to 1 L of LB medium and cultured with shaking at 37 °C. When the OD600 reached 0.6, 0.25 mL of 1 M IPTG was added (final concentration 0.25 mM), and the temperature was lowered to 16 °C for overnight culture. The cells were then harvested, lysed by sonication, and purified on a HisTrap Ni-NTA column (GE Healthcare), followed by further purification on a HiLoad 16/600 Superdex 200 pg size-exclusion column (GE Healthcare). The purified protein was concentrated with ultrafiltration tubes and stored in 500 mM NaCl, 10 mM Tris-HCl (pH 7.5), 1 mM DTT.

The three mutation-inactivated AcC2C9 nucleases (dAcC2C9-D240A/dAcC2C9-E332A/dAcC2C9-D429A) were prepared in the same way as the AcC2C9 nuclease, except that the pET28a-sumo-dAcC2C9-D240A/pET28a-sumo-dAcC2C9-E332A/pET28a-sumo-dAcC2C9-D429A plasmids constructed in Example 2.2 were transformed instead.

2. Cleavage substrate preparation.

The cleavage substrates were four fluorescently labeled synthetic 58-bp DNA duplexes. Two complementary oligonucleotides were synthesized by Sangon Biotech (Shanghai) Co., Ltd., with the following sequences:

5’GGCTGTGAGAAAGCGGTTCAGGTGAAAGTGAAAACACTGCCCGACGCCCAGTTCGAAG 3’(SEQ ID NO.132)

5’CTTCGAACTGGGCGTCGGGCAGTGTTTTCACTTTCACCTGAACCGCTTTCTCACAGCC 3’(SEQ ID NO.133).

Each of the two oligonucleotides was labeled with a FAM fluorophore at either its 5' end or its 3' end.
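The two oligonucleotides can be checked to be exact reverse complements of 58 nt, so that annealing gives a blunt 58-bp duplex. Reading the text as one FAM label per duplex, on the 5' or 3' end of either strand, gives the four substrates; this interpretation is an assumption, and the sketch only enumerates it.

```python
SEQ_ID_132 = "GGCTGTGAGAAAGCGGTTCAGGTGAAAGTGAAAACACTGCCCGACGCCCAGTTCGAAG"
SEQ_ID_133 = "CTTCGAACTGGGCGTCGGGCAGTGTTTTCACTTTCACCTGAACCGCTTTCTCACAGCC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def labeled_substrates(strands: dict) -> list:
    """One substrate per (strand carrying FAM, labeled end) combination."""
    return [(name, end) for name in strands for end in ("5'", "3'")]


strands = {"SEQ ID NO.132": SEQ_ID_132, "SEQ ID NO.133": SEQ_ID_133}
print(len(SEQ_ID_132), len(SEQ_ID_133), revcomp(SEQ_ID_132) == SEQ_ID_133)  # -> 58 58 True
for strand, end in labeled_substrates(strands):
    print(f"FAM on {end} end of {strand}")
```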

3. Preparation of the complete guide RNA targeting the cleavage substrate.

The complete sequence of the guide RNA is shown below:

5'AAACGUCGCCUGCGAUAGGCGGGAGACGCUAAACGCCCGUGGAGCAUCCAUAAGACCAACCACCUCUCGGGGCGGUAGGCACGACGCAUCGAAGCGGGAAGGCUCCGGCGCUCGGCCUGAGUCACCUCAGCAGAGUGAUCUGCUGACGCUCCCAACCUUGAAUAACGAAACGGCAACGCCUCCAUAGCGGUGCAGGUCAAUAAGGGUCGGCCCCACGCGUGUAGGGAGCGAUCGCGGUUCAGGUGAAAGUGAAA 3'(SEQ ID NO.134)

The DNA template for guide RNA transcription was synthesized by Jin Weizhi Biotechnology Inc.; its sequence is as follows:

5'TAATACGACTCACTATAGGGAAACGTCGCCTGCGATAGGCGGGAGACGCTAAACGCCCGTGGAGCATCCATAAGACCAACCACCTCTCGGGGCGGTAGGCACGACGCATCGAAGCGGGAAGGCTCCGGCGCTCGGCCTGAGTCACCTCAGCAGAGTGATCTGCTGACGCTCCCAACCTTGAATAACGAAACGGCAACGCCTCCATAGCGGTGCAGGTCAATAAGGGTCGGCCCCACGCGTGTAGGGAGCGATCGCGGTTCAGGTGAAAGTGAAA 3'(SEQ ID NO.135)

The guide RNA was transcribed from the above template with the HiScribe T7 High Yield RNA Synthesis Kit (NEB) according to the kit instruction manual. The complete guide RNA was purified by phenol-chloroform extraction and ethanol precipitation.
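The relationship between the template (SEQ ID NO.135) and the guide RNA (SEQ ID NO.134) can be sketched as run-off transcription from the T7 promoter. Assuming standard T7 behavior (transcription starts at the G immediately after TAATACGACTCACTATA and runs to the end of the linear template), the transcript would begin with GGG followed by the SEQ ID NO.134 sequence. The check below uses only the 5' portions of the two sequences and is not part of the original protocol.

```python
T7_PROMOTER = "TAATACGACTCACTATA"  # -17..-1 of the T7 promoter; +1 is the next base


def t7_runoff_transcript(template_top: str) -> str:
    """RNA from run-off transcription of a linear template (top strand given 5'->3')."""
    seq = template_top.upper()
    idx = seq.find(T7_PROMOTER)
    if idx < 0:
        raise ValueError("T7 promoter not found")
    return seq[idx + len(T7_PROMOTER):].replace("T", "U")


# 5' portions of SEQ ID NO.135 (template) and SEQ ID NO.134 (guide RNA)
template_head = "TAATACGACTCACTATAGGGAAACGTCGCCTGCGATAGG"
guide_head = "AAACGUCGCCUGCGAUAGG"
rna = t7_runoff_transcript(template_head)
print(rna, rna[3:].startswith(guide_head))  # -> GGGAAACGUCGCCUGCGAUAGG True
```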

4. In vitro cleavage

In vitro cleavage experiments were performed in 150 mM NaCl, 10 mM MgCl₂, 10 mM Tris-HCl (pH 7.5), 1 mM DTT. Each 10 μL reaction contained 20 nM cleavage substrate, 2 μM AcC2C9 and 2 μM complete guide RNA, and was incubated at 37 °C for 30 min. The reaction was then stopped by adding 10 μL of 2× formamide loading buffer (available from Shanghai Co., Ltd.). The reaction products were separated by 20% TBE-urea PAGE and imaged under 488 nm blue-light excitation.
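Setting up the 10 μL cleavage reaction is a C1·V1 = C2·V2 calculation for each stock, with the remainder filled by reaction buffer and water. The final concentrations below are from the text; the stock concentrations (1 μM substrate, 20 μM nuclease, 20 μM guide RNA) are hypothetical values chosen only for illustration.

```python
def pipetting_volumes(total_ul: float, targets: dict) -> dict:
    """targets: name -> (final conc, stock conc), same unit within each entry.

    Returns uL of each stock (C1*V1 = C2*V2) plus the remainder to fill with
    reaction buffer and water.
    """
    volumes = {name: total_ul * final / stock for name, (final, stock) in targets.items()}
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks are too dilute for this reaction volume")
    volumes["buffer + water"] = remainder
    return volumes


# Final concentrations are from the text; the stock concentrations are hypothetical.
reaction = {
    "FAM-labeled substrate": (20.0, 1_000.0),    # nM: 20 nM final from a 1 uM stock
    "AcC2C9": (2_000.0, 20_000.0),              # nM: 2 uM final from a 20 uM stock
    "complete guide RNA": (2_000.0, 20_000.0),  # nM: 2 uM final from a 20 uM stock
}
for name, ul in pipetting_volumes(10.0, reaction).items():
    print(f"{name}: {ul:.2f} uL")
```

Nuclease and guide RNA are equimolar, and each is in 100-fold molar excess over the substrate (2 μM versus 20 nM).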

FIG. 2 shows the cleavage of the four 58-bp fluorescently labeled DNA substrates by the AcC2C9 nuclease and the three mutation-inactivated AcC2C9 nucleases (dAcC2C9-D240A/dAcC2C9-E332A/dAcC2C9-D429A). The results show that all three mutations either abolished or markedly attenuated AcC2C9 activity: dAcC2C9-D240A retained weak cleavage activity, whereas dAcC2C9-E332A and dAcC2C9-D429A completely lost double-stranded DNA cleavage activity.

Example 4 Construction of the AcC2C9 nuclease system-based mammalian cell ABE editor expression plasmids pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A/pAcC2C9HS-ABE4-E332A/pAcC2C9HS-ABE5-D429A and CBE editor expression plasmids pAcC2C9HS-CBE1-D240A/pAcC2C9HS-CBE2-D240A

1. Construction of the pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A plasmids

1.1 Construction of the pAcC2C9HS-ABE1/pAcC2C9HS-ABE2/pAcC2C9HS-ABE3 plasmids

The following sequences were used: the human codon-optimized AcC2C9 coding gene expression cassette (nucleotide sequence shown as SEQ ID NO.136), the expression cassette of the corresponding AcC2C9 guide RNA (SEQ ID NO.137), the human transient expression plasmid backbone (SEQ ID NO.138), the ABE1 fragment (SEQ ID NO.139), the ABE2 fragment (SEQ ID NO.140) and the ABE3 fragment (SEQ ID NO.141).

The three fragments (the human codon-optimized AcC2C9 coding gene expression cassette, the corresponding AcC2C9 guide RNA expression cassette and the human transient expression plasmid backbone) were assembled by Gibson assembly with each of the three ABE versions (the ABE1, ABE2 and ABE3 fragments) into the pAcC2C9HS-ABE1/pAcC2C9HS-ABE2/pAcC2C9HS-ABE3 plasmids, respectively. The assembly products were transformed into E. coli DH5α cells, selected on LB agar plates containing 50 μg/mL carbenicillin, and cultured overnight at 37 °C; single colonies were picked and sequenced, yielding the pAcC2C9HS-ABE1/pAcC2C9HS-ABE2/pAcC2C9HS-ABE3 plasmids.

1.2 Construction of the pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A plasmids

Mutations were introduced by PCR amplification using the pAcC2C9HS-ABE1/pAcC2C9HS-ABE2/pAcC2C9HS-ABE3 plasmids above as templates. The following two primers were first synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

HsD240A primerF:

5’-TCACCCCCAGGGCCACTCCCACGGCGCCCCTCTC-3’(SEQ ID NO.142)

HsD240A primerR:

5’-GAGGGGCGCCGTGGGAGTGGCCCTGGGGGTGAAGGTGCTGGC-3’(SEQ ID NO.143)

The mutagenesis procedure was the same as in Example 2.2, except that the templates were the pAcC2C9HS-ABE1/pAcC2C9HS-ABE2/pAcC2C9HS-ABE3 plasmids, the amplification primers were HsD240A primerF and HsD240A primerR, and the plasmids carry carbenicillin resistance. Sequencing confirmed the resulting pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A plasmids, whose nucleotide sequences are shown as SEQ ID NO.144, SEQ ID NO.145 and SEQ ID NO.146, respectively.

2. Construction of the pAcC2C9HS-ABE4-E332A/pAcC2C9HS-ABE5-D429A plasmids

Mutations were introduced by PCR amplification using the pAcC2C9HS-ABE3 plasmid above as the template. The following four primers were first synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

HsE332A primerF:

5’-GACGTTCAGATCCGCCAGGGCCACCACGGCGAACC-3’(SEQ ID NO.147)

HsE332A primerR:

5’-TGGTGGCCCTGGCGGATCTGAACGTCGCCGGCAT-3’(SEQ ID NO.148)

HsD429A primerF:

5’-CGGCGTTCATAGCTCTATCCATGACCAGGCCGC-3’(SEQ ID NO.149)

HsD429A primerR:

5’-CATGGATAGAGCTATGAACGCCGCTAGGAACAT-3’(SEQ ID NO.150)

The mutagenesis procedure was the same as in Example 2.2, except that the template was the pAcC2C9HS-ABE3 plasmid, the amplification primers were HsE332A primerF/HsE332A primerR or HsD429A primerF/HsD429A primerR, and the plasmids carry carbenicillin resistance. Sequencing confirmed the resulting pAcC2C9HS-ABE4-E332A/pAcC2C9HS-ABE5-D429A plasmids, whose nucleotide sequences are shown as SEQ ID NO.151 and SEQ ID NO.152, respectively.

3. Construction of the pAcC2C9HS-CBE1-D240A/pAcC2C9HS-CBE2-D240A plasmids

The following sequences were synthesized by Sangon Biotech (Shanghai) Co., Ltd.: a UGI fragment (nucleotide sequence shown as SEQ ID NO.153), a CBE1 fragment (SEQ ID NO.154) and a CBE2 fragment (SEQ ID NO.155).

The three fragments from Example 4.1.1 above (the human codon-optimized AcC2C9 coding gene expression cassette, the corresponding AcC2C9 guide RNA expression cassette and the human transient expression plasmid backbone) were assembled with the CBE1 fragment and the UGI fragment into the pAcC2C9HS-CBE1 plasmid by Gibson assembly, and likewise with the CBE2 fragment and the UGI fragment into the pAcC2C9HS-CBE2 plasmid. The two assembly products were transformed into E. coli DH5α cells, selected on LB agar plates containing 50 μg/mL carbenicillin, and cultured overnight at 37 °C; single colonies were picked and sequenced, yielding the pAcC2C9HS-CBE1 and pAcC2C9HS-CBE2 plasmids.

Mutations were then introduced as in Example 2.2, except that the templates were the pAcC2C9HS-CBE1 and pAcC2C9HS-CBE2 plasmids, the amplification primers were HsD240A primerF and HsD240A primerR from Example 4.1.2, and the plasmids carry carbenicillin resistance. Sequencing confirmed the resulting pAcC2C9HS-CBE1-D240A/pAcC2C9HS-CBE2-D240A plasmids, whose nucleotide sequences are shown as SEQ ID NO.156 and SEQ ID NO.157.

Example 5 Base editing in mammalian cells with the AcC2C9-based base editor system

In this example, human embryonic kidney cells HEK293T were used as the cells for the experiment.

1. Construction of the pAcC2C9HS-ABE1-D240A-G1-4/pAcC2C9HS-ABE2-D240A-G1-4/pAcC2C9HS-ABE3-D240A-G1-4/pAcC2C9HS-ABE4-E332A-G1-4/pAcC2C9HS-ABE5-D429A-G1-4 series of plasmids

In this example, four genes in the genome of HEK293T cells (VEGFA, AAVS1, PDCD1 and HEXA) were selected as targets, and one 20-bp targeting sequence was selected from each. The following sequences were synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

Guide1_F:5'-ATCGAGTGACAGTATCCTCTGTAT-3'(SEQ ID NO.158)

Guide1_R:5'-AAAAATACAGAGGATACTGTCACT-3'(SEQ ID NO.159)

Guide2_F:5'-ATCGTGTAAGGAAGCTGCAGCACC-3'(SEQ ID NO.160)

Guide2_R:5'-AAAAGGTGCTGCAGCTTCCTTACA-3'(SEQ ID NO.161)

Guide3_F:5'-ATCGACAGTGGGGACTAGAGCTCA-3'(SEQ ID NO.162)

Guide3_R:5'-AAAATGAGCTCTAGTCCCCACTGT-3'(SEQ ID NO.163)

Guide4_F:5'-ATCGGACAAGGTTGAACAAACAGT-3'(SEQ ID NO.164)

Guide4_R:5'-AAAAACTGTTTGTTCAACCTTGTC-3'(SEQ ID NO.165)
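Each Guide_F/Guide_R pair above is a 20-nt spacer flanked by 4-nt overhangs: Guide_F is ATCG followed by the spacer, and Guide_R is AAAA followed by the reverse complement of the spacer. The Python sketch below reproduces that pattern so further guides could be designed the same way. The function names are hypothetical, and the assumption that these overhangs match the BsaI-generated ends of the gRNA cassette is inferred from the use of BsaI in Table 2, not stated in the text.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def guide_oligos(spacer: str, fwd_overhang: str = "ATCG",
                 rev_overhang: str = "AAAA") -> tuple[str, str]:
    """Build a Guide_F / Guide_R oligo pair for a 20-nt spacer (hypothetical helper).

    The default overhangs are read off the Guide1-Guide12 oligos listed above.
    """
    spacer = spacer.upper()
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G, T")
    return fwd_overhang + spacer, rev_overhang + reverse_complement(spacer)


f, r = guide_oligos("AGTGACAGTATCCTCTGTAT")  # spacer carried by Guide1
print(f)  # ATCGAGTGACAGTATCCTCTGTAT
print(r)  # AAAAATACAGAGGATACTGTCACT
```

Applied to the Guide1 spacer, the helper returns exactly the Guide1_F and Guide1_R sequences listed above.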

Phosphorylation and annealing of the targeting sequence DNA: a separate reaction was prepared for each guide as shown in Table 1.

TABLE 1

Component	Volume
10× T4 DNA ligase buffer (NEB)	1 μl
Guide-F	2 μl
Guide-R	2 μl
T4 PNK (NEB)	0.5 μl
Sterile water	4.5 μl

Phosphorylation was carried out at 37 °C for 30 min. Then 50 mM NaCl was added to the 10 μl reaction, and the mixture was slowly cooled to anneal the oligonucleotides, giving the annealed targeting sequence DNA.
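When several guides are phosphorylated in parallel, the shared components of Table 1 can be premixed. The sketch below scales them for n reactions with a fractional overage. Keeping the Guide-F/Guide-R oligos out of the mix and the 10% default overage are assumptions of this example, not part of the described protocol.

```python
# Per-reaction volumes (μl) transcribed from Table 1.
TABLE1_UL = {
    "10x T4 DNA ligase buffer (NEB)": 1.0,
    "Guide-F": 2.0,
    "Guide-R": 2.0,
    "T4 PNK (NEB)": 0.5,
    "Sterile water": 4.5,
}


def master_mix(n_reactions: int, overage: float = 0.1,
               per_guide: tuple[str, ...] = ("Guide-F", "Guide-R")) -> dict[str, float]:
    """Scale the shared Table 1 components for n reactions plus overage.

    Guide oligos differ between reactions, so (by assumption) they are left
    out of the mix and pipetted into each reaction individually.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in TABLE1_UL.items() if name not in per_guide}


print(master_mix(4))
```

Each reaction then receives 6 μl of the mix plus 2 μl each of its own Guide-F and Guide-R, restoring the 10 μl volume of Table 1.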

Next, each of the 4 targeting sequence DNAs was inserted by Golden Gate assembly into the pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A/pAcC2C9HS-ABE4-E332A/pAcC2C9HS-ABE5-D429 plasmids constructed in Examples 4.1 and 4.2, yielding the pAcC2C9HS-ABE1-D240A-G1-4/pAcC2C9HS-ABE2-D240A-G1-4/pAcC2C9HS-ABE3-D240A-G1-4/pAcC2C9HS-ABE4-E332A-G1-4/pAcC2C9HS-ABE5-D429-G1-4 series of plasmids. The Golden Gate assembly reaction is shown in Table 2 below.

Ligation program: 25 cycles of 37 °C for 2 min and 16 °C for 3 min, followed by a final 10 min at 80 °C.
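The ligation program above can be written down as data so that its bench time is easy to check. In this Python sketch the `Step` type and `program_minutes` helper are hypothetical; the calculation sums hold times only and ignores thermocycler ramp times, so a real run will take somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float   # hold temperature in °C
    minutes: float  # hold time in minutes


def program_minutes(cycled: list[Step], cycles: int, final: list[Step]) -> float:
    """Total hold time of a cycled program (ramp times are ignored)."""
    return cycles * sum(s.minutes for s in cycled) + sum(s.minutes for s in final)


# Ligation program from the text: 25 x (37 °C 2 min, 16 °C 3 min), then 80 °C 10 min.
GOLDEN_GATE = ([Step(37, 2), Step(16, 3)], 25, [Step(80, 10)])
print(program_minutes(*GOLDEN_GATE))  # 135
```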

The ligation products were transformed into E. coli DH5α cells, and single clones were picked for sequencing, yielding the pAcC2C9HS-ABE1-D240A-G1-4/pAcC2C9HS-ABE2-D240A-G1-4/pAcC2C9HS-ABE3-D240A-G1-4/pAcC2C9HS-ABE4-E332A-G1-4/pAcC2C9HS-ABE5-D429-G1-4 series of plasmids (transient expression plasmids), each carrying one of the 4 targeting sequences.

TABLE 2

Component	Amount
10× T4 DNA ligase buffer (NEB)	1 μl
pAcC2C9HS-ABE series plasmid	50 ng
Targeting sequence DNA	0.5 μl
T4 DNA ligase (NEB)	0.5 μl
BsaI (NEB)	0.5 μl
Sterile water	to 10 μl

2. Construction of the pAcC2C9HS-ABE3-D240A-G5-12 series of plasmids

In this example, the same four genes in the HEK293T genome (VEGFA, AAVS1, PDCD1 and HEXA) were targeted, and two 20-bp targeting sequences were selected in each gene. The following oligonucleotides were synthesized by Sangon Biotech (Shanghai) Co., Ltd.:

Guide5_F:5'-ATCGAATCATTTCCCCAAGAGGAA-3'(SEQ ID NO.166)

Guide5_R:5'-AAAATTCCTCTTGGGGAAATGATT-3'(SEQ ID NO.167)

Guide6_F:5'-ATCGCAGAAAGAGTGACAGTATCC-3'(SEQ ID NO.168)

Guide6_R:5'-AAAAGGATACTGTCACTCTTTCTG-3'(SEQ ID NO.169)

Guide7_F:5'-ATCGGGAGACATCCGTCGGAGAAG-3'(SEQ ID NO.170)

Guide7_R:5'-AAAACTTCTCCGACGGATGTCTCC-3'(SEQ ID NO.171)

Guide8_F:5'-ATCGGAATCTGCCTAACAGGAGGT-3'(SEQ ID NO.172)

Guide8_R:5'-AAAAACCTCCTGTTAGGCAGATTC-3'(SEQ ID NO.173)

Guide9_F:5'-ATCGACAGTGGGGACTAGAGCTCA-3'(SEQ ID NO.174)

Guide9_R:5'-AAAATGAGCTCTAGTCCCCACTGT-3'(SEQ ID NO.175)

Guide10_F:5'-ATCGATGCTTCAGAGACGAGATGG-3'(SEQ ID NO.176)

Guide10_R:5'-AAAACCATCTCGTCTCTGAAGCAT-3'(SEQ ID NO.177)

Guide11_F:5'-ATCGTTTAACTACTTACTGTTTGT-3'(SEQ ID NO.178)

Guide11_R:5'-AAAAACAAACAGTAAGTAGTTAAA-3'(SEQ ID NO.179)

Guide12_F:5'-ATCGGACAAGGTTGAACAAACAGT-3'(SEQ ID NO.180)

Guide12_R:5'-AAAAACTGTTTGTTCAACCTTGTC-3'(SEQ ID NO.181)

Plasmids were constructed as described in Example 5.1, except that only the pAcC2C9HS-ABE3-D240A plasmid was used, into which each of the 8 newly selected guide sequences (Guide5 to Guide12) was inserted. The resulting plasmids are the pAcC2C9HS-ABE3-D240A-G5-12 series.

3. Construction of the pAcC2C9HS-CBE1-D240A-G1/pAcC2C9HS-CBE2-D240A-G1 plasmids

Plasmids were constructed as described in Example 5.1, except that the ABE series plasmids were replaced with the pAcC2C9HS-CBE series plasmids constructed in Example 4.3, and only the Guide1 targeting sequence of Example 5.1 was inserted. The resulting plasmids are pAcC2C9HS-CBE1-D240A-G1 and pAcC2C9HS-CBE2-D240A-G1.

4.1. Base editing of human cell genes mediated by the pAcC2C9HS-ABE1-D240A-G1-4/pAcC2C9HS-ABE2-D240A-G1-4/pAcC2C9HS-ABE3-D240A-G1-4/pAcC2C9HS-ABE4-E332A-G1-4/pAcC2C9HS-ABE5-D429-G1-4 series of plasmids

HEK293T cells were revived and cultured in DMEM containing 10% (v/v) FBS. Once the cells reached about 90% confluence, they were seeded into 24-well plates at about 1.0 × 10⁵ cells per well. After 16-18 h, 1000 ng of each pAcC2C9HS-ABE1-D240A-G1-4/pAcC2C9HS-ABE2-D240A-G1-4/pAcC2C9HS-ABE3-D240A-G1-4/pAcC2C9HS-ABE4-E332A-G1-4/pAcC2C9HS-ABE5-D429-G1-4 plasmid, each targeting a different gene, was transfected into the cells using 2 μL of Lipofectamine 3000 (Invitrogen) per well. After 24 h, fresh medium containing puromycin at a final concentration of 2 μg/mL was added for selection. After a further 48 h of culture, the adherent cells were harvested by digestion and genomic DNA was extracted. The genomic loci edited by the different ABE systems were amplified using primers carrying distinct 8-nt barcodes, and the resulting amplicon sequencing libraries were sequenced on an Illumina NovaSeq (PE150) at the HaploX Genomics Center (China, Jiang). Sequencing data were analyzed using CRISPResso2.
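The amplicon data above were analyzed with CRISPResso2. Purely to illustrate the two underlying ideas, the sketch below splits reads by an 8-nt barcode and computes the per-position A-to-G conversion fraction against a reference amplicon. It assumes the barcode sits at the 5' end of each read and that reads are already aligned to the reference without indels; the barcode and sample names are hypothetical. It is not a substitute for CRISPResso2, which performs real read alignment and quality handling.

```python
from collections import defaultdict

BARCODE_LEN = 8  # 8-nt barcodes per the text; 5' placement in the read is assumed


def demultiplex(reads, barcodes):
    """Group reads by an exact-match 5' barcode and strip the barcode.

    `barcodes` maps barcode -> sample name. Unmatched reads go under None.
    """
    groups = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:BARCODE_LEN])
        groups[sample].append(read[BARCODE_LEN:])
    return dict(groups)


def conversion_frequency(reads, reference, ref_base="A", alt_base="G"):
    """Fraction of reads showing ref_base -> alt_base at each 0-based position.

    Simplification: reads are assumed pre-aligned with no indels; reads whose
    length differs from the reference are skipped.
    """
    usable = [r for r in reads if len(r) == len(reference)]
    freqs = {}
    for i, base in enumerate(reference):
        if base != ref_base or not usable:
            continue
        hits = sum(1 for r in usable if r[i] == alt_base)
        freqs[i] = hits / len(usable)
    return freqs


barcodes = {"ACGTACGT": "D3_VEGFA"}  # hypothetical barcode and sample name
reads = ["ACGTACGT" + "TTAGC", "ACGTACGT" + "TTGGC", "GGGGGGGG" + "TTAGC"]
groups = demultiplex(reads, barcodes)
print(conversion_frequency(groups["D3_VEGFA"], "TTAGC"))  # {2: 0.5}
```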

4.2 For ease of reference, as shown in FIG. 3, the five AcC2C9-based ABE editing systems mediated by the pAcC2C9HS-ABE1-D240A/pAcC2C9HS-ABE2-D240A/pAcC2C9HS-ABE3-D240A/pAcC2C9HS-ABE4-E332A/pAcC2C9HS-ABE5-D429 plasmids are abbreviated as Design1 (D1), Design2 (D2), Design3 (D3), Design4 (D4) and Design5 (D5), respectively. The efficiency of the five ABE systems in mammalian cells is shown in FIG. 4. All five ABE versions achieved effective A·T to G·C base editing at the target sites, with editing efficiencies of 4%-13%; D3 showed significantly higher editing efficiency than the other four systems at all four gene loci.

5. Base editing of human cell genes mediated by the pAcC2C9HS-ABE3-D240A-G1-12 series of plasmids

Gene editing experiments and analysis were performed as described in Example 5.4.1 above, except that the transfected plasmids were the pAcC2C9HS-ABE3-D240A-G1-12 series. As shown in FIG. 5, the D3 version achieved effective editing (1%-13%) at additional genomic sites. Analysis of the editing window at seven genomic loci (Guide1, Guide2, Guide3, Guide4, Guide6, Guide7 and Guide8; FIG. 6) showed that the editing window of the AcC2C9-based base editing system D3 extends from base 3 to base 14 of the target sequence downstream of the PAM.
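The reported D3 window (positions 3 to 14 of the target sequence downstream of the PAM) can be used to flag which bases of a spacer fall inside it. The sketch below assumes position 1 is the spacer base adjacent to the PAM and that the spacer is written as in the Guide_F oligos; both conventions are assumptions, and lying inside the window does not guarantee that a base is actually edited.

```python
def bases_in_window(protospacer: str, base: str = "A",
                    window: tuple[int, int] = (3, 14)) -> list[int]:
    """Return 1-based positions of `base` that fall inside the editing window.

    Position 1 is assumed to be the first target base after the PAM; the
    default window (3-14) is the one reported for D3. Use base="C" for CBEs.
    """
    start, end = window
    seq = protospacer.upper()
    return [i for i in range(start, min(end, len(seq)) + 1) if seq[i - 1] == base]


print(bases_in_window("AGTGACAGTATCCTCTGTAT"))  # Guide1 spacer: [5, 7, 10]
```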

6. Base editing of human cell genes mediated by the pAcC2C9HS-CBE1-D240A-G1/pAcC2C9HS-CBE2-D240A-G1 plasmids

Gene editing experiments and analysis were performed as described in Example 5.4.1 above, except that the transfected plasmids were pAcC2C9HS-CBE1-D240A-G1 and pAcC2C9HS-CBE2-D240A-G1. As shown in FIG. 7, the CBE base editing systems based on the pAcC2C9HS-CBE1-D240A and pAcC2C9HS-CBE2-D240A plasmids achieved effective C·G to T·A base editing at the target site, with editing efficiencies of about 4% and 13%.

In summary, the base editing system of the present invention is based on an RNA-guided, inactivated or partially inactivated C2C9 nuclease that enriches a deaminase at the target site. The two types of base editing systems of the invention carry out two different types of single-base mutation, A·T to G·C and C·G to T·A, respectively. The gene editing system or method of the present invention can achieve A·T to G·C or C·G to T·A base substitutions at target sites, including in mammalian cells. The novel CRISPR/C2C9-based base editing system has a small gene size and can be packaged in a single adeno-associated virus (AAV) vector, overcoming the technical limitation that SpCas9-based base editing systems cannot be packaged in a single AAV. The novel CRISPR/C2C9-based base editing system recognizes AAN and GAN PAMs, which helps relieve the restriction on editable sites faced by base editing systems based on other Cas proteins.
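The AAN/GAN PAM stated above can be used to enumerate candidate target sites. The sketch below scans one strand for an AAN or GAN PAM followed by a 20-nt protospacer (the preferred length given in claim 4). Placing the PAM immediately 5' of the protospacer is an assumption consistent with the "target sequence downstream of the PAM" wording, and the other strand must be scanned separately, for example by passing its reverse complement.

```python
import re


def find_sites(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """List (0-based PAM start, PAM, protospacer) for AAN/GAN PAMs on one strand.

    A lookahead regex is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(r"(?=([AG]A[ACGT])([ACGT]{%d}))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq.upper())]


print(find_sites("GAC" + "T" * 20))  # one site: PAM "GAC" at position 0
```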

The above examples are provided to illustrate the disclosed embodiments of the invention and are not to be construed as limiting the invention. Further, various modifications of the methods set forth herein, as well as variations of the methods of the invention, will be apparent to those skilled in the art without departing from the scope and spirit of the invention. While the invention has been specifically described in connection with various specific preferred embodiments thereof, it should be understood that the invention should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in the art are intended to be within the scope of the present invention.

Claims

1. A base editing system, comprising:

1) A C2C9 nuclease and/or a nucleic acid encoding the C2C9 nuclease;

2) A guide RNA and/or a nucleic acid encoding the guide RNA;

3) Deaminase and/or nucleic acid encoding the same.

2. The base editing system according to claim 1, wherein the C2C9 nuclease is:

(IV) according to (I) or (II) or (III), further comprising:

(B) A polypeptide or domain having other functional activity;

(V) according to (I) or (II) or (III), said C2C9 is a nuclease with impaired endonuclease activity, or a nuclease with partial endonuclease activity, or a nuclease without endonuclease activity.

3. The base editing system according to claim 2, wherein the amino acid sequence of the wild-type C2C9 nuclease is shown in any one of SEQ ID NO.1 to SEQ ID NO.115, and/or the C2C9 nuclease mutant is any one or more of the AcC2C9 D240A, E332A or D429A mutants.

4. The base editing system according to claim 1, wherein the base editing system recognizes a PAM sequence on a target sequence; and/or the base editing system targets a nucleic acid fragment 12-40 bp in length located after the PAM sequence, preferably 20 bp in length.

5. The base editing system according to claim 4, wherein the PAM sequence is AAN or GAN, where N is a degenerate base representing any of A, T, C or G.

6. The base editing system of claim 1, wherein the guide RNA comprises:

i. a gene targeting segment capable of hybridizing to a target sequence,

ii. a tracr mate sequence,

iii. a tracr RNA sequence;

preferably, the guide RNA further comprises:

iv. motifs that interact with and/or recruit proteins;

v. a structurally stable sequence;

Wherein the tracr mate sequence (ii) hybridizes to the tracr RNA sequence (iii) and forms a stem-loop structure;

The guide RNA is a single strand formed by sequentially linking the gene targeting segment (i), the tracr mate sequence (ii) and the tracr RNA sequence (iii); alternatively, the guide RNA comprises two strands, one formed by linking the gene targeting segment (i) with the tracr mate sequence (ii), and the other being the tracr RNA sequence (iii).

7. The base editing system according to claim 1, wherein the guide RNA has the sequence set forth in SEQ ID NO.134.

8. The base editing system according to claim 1, wherein the deaminase is cytosine deaminase and/or adenine deaminase; preferably, the deaminase is linked to a C2C9 nuclease by a linker peptide or the deaminase is linked to a C2C9 nuclease by any of protein-to-protein interactions, RNA-to-protein interactions, or chemical interactions.

9. The base editing system according to claim 8, wherein the adenine deaminase is a DNA or RNA adenine deaminase; preferably, the adenine deaminase is TadA or TadA* or a fusion of both; more preferably, the adenine deaminase nucleotide sequence is as shown in any one of SEQ ID NO.139-SEQ ID NO.141 or has at least 50% similarity to SEQ ID NO.139-SEQ ID NO.141.

10. The base editing system according to claim 8, wherein the cytosine deaminase is a DNA or RNA cytosine deaminase; preferably, the cytosine deaminase is selected from APOBEC, AID, FERNY deaminase and/or CDA1 deaminase or a variant thereof; more preferably, the cytosine deaminase has a nucleotide sequence as set forth in any one of SEQ ID NO.154-SEQ ID NO.155 or has at least 50% similarity to SEQ ID NO.154-SEQ ID NO.155.

11. The base editing system according to claim 1, wherein said C2C9 nuclease is further linked to a polypeptide having glycosylase inhibitor activity, preferably said glycosylase inhibitor is UGI.

12. A recombinant expression vector comprising any two or three of: (i) a nucleotide sequence encoding a gRNA, (ii) a nucleotide sequence encoding a C2C9 nuclease, (iii) a nucleotide sequence encoding the deaminase.

13. An adeno-associated virus, wherein the adeno-associated virus is packaged with the recombinant expression vector of claim 12.

14. A base editing method, characterized in that a target gene is contacted with the base editing system according to any one of claims 1 to 11 to effect editing of a single base on the target gene.

15. The base editing method according to claim 14, comprising the steps of:

i) Introducing the C2C9 nuclease or a polynucleotide encoding the same, the guide RNA or a polynucleotide encoding the same, and a deaminase or a polynucleotide encoding the same into a cell;

ii) generating one or more mutations in the target gene, or targeting, editing, modifying or manipulating the target gene, mediated by the C2C9 nuclease and the deaminase.

16. The base editing method according to claim 14, wherein the method is performed on the target gene and/or its related polypeptide in vivo, in an ex vivo cell or in a cell-free environment, preferably the ex vivo cell comprises a bacterial cell, an archaeal cell, a fungal cell, a protozoan cell, a viral cell, a plant cell or an animal cell.

17. A cell obtained by editing with the base editing system according to any one of claims 1 to 11 or the base editing method according to claim 14 or 15.