CN117701542A - Cytosine deaminase SfSddA, base editor comprising same and application

Info

Publication number: CN117701542A
Application number: CN202311633324.4A
Authority: CN
Inventors: 欧阳红生; 袁泓明; 逄大欣; 邓嘉成; 李雪原
Original assignee: Chongqing Jitang Biotechnology Research Institute Co ltd
Current assignee: Chongqing Jitang Biotechnology Research Institute Co ltd
Priority date: 2023-12-01
Filing date: 2023-12-01
Publication date: 2024-03-15

Abstract

The invention provides a cytosine deaminase SfSddA, a base editor containing it, and applications thereof; the amino acid sequence is shown in the attached sequence listing. A previously unreported SddA was found near the branch of high-activity proteins and was taken as a candidate protein for further structural alignment and examination. Based on the structural alignment results, the SddA from Saccharopolyspora flava was selected as one of the final candidate cytosine deaminases for activity evaluation. The editing efficiency of SfSddA was measured at 4 nuclear target sites in HEK293T cells, and efficient C-to-T editing was observed at the target positions. A CBE containing SfSddA therefore offers advantages such as high activity and small size, and compares favorably with other CBEs.

Description

Cytosine deaminase SfSddA, base editor comprising same and application

Technical Field

The invention belongs to the field of gene editing, and in particular relates to the cytosine deaminase SfSddA, a base editor containing it, and applications thereof, which can efficiently achieve C-to-T conversion in eukaryotic genes.

Background

In recent years, CRISPR/Cas systems have become the most widely used gene editing tools because they are convenient and efficient. Guided by an sgRNA, the Cas protein cleaves the target site to create a double-strand break; the subsequent DNA repair process then yields either a gene knockout through non-homologous end joining or, when a DNA template is supplied, gene repair through homologous recombination. However, homologous recombination-mediated gene repair is generally inefficient and introduces a large number of insertions and deletions.

Simple, efficient and accurate single-base editing technology has become a more effective tool for correcting disease-causing point mutations in humans, and a popular choice for gene therapy research. The single-base editing systems developed so far mainly include the cytosine base editor (CBE), the adenine base editor (ABE), the guanine base editor (GBE) and the prime editor (PE).

The core components of a CBE are nCas9 or dCas9 and a cytosine deaminase: the Cas9 protein and the cytosine deaminase form a fusion protein, to which a uracil glycosylase inhibitor (UGI) is also fused. The working principle is as follows. When the fusion protein is targeted to genomic DNA under the guidance of an sgRNA, the cytosine deaminase binds the ssDNA of the R-loop formed by the Cas9 protein, the sgRNA and the genomic DNA, and deaminates cytosines (C) within a certain range of that ssDNA to uracil (U). U is then converted to thymine (T) through DNA replication or repair, so that a C/G base pair is directly replaced by a T/A base pair. The CBEs developed by the David Liu laboratory have been upgraded through four generations. The fourth-generation cytosine base editor, rAPOBEC1-XTEN-nCas9-2UGI, contains the rat cytosine deaminase rAPOBEC1, the mutated Cas9 nickase nCas9 (D10A) and two uracil glycosylase inhibitors, and has brought continuous improvements in editing efficiency and precision. However, the existing cytosine deaminases that serve as the core component of CBEs still suffer from sequence-context restrictions, low efficiency, pronounced off-target effects, and a size too large for in vivo delivery by adeno-associated virus (AAV) vectors.
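The deamination step described above can be illustrated with a minimal Python sketch. The example protospacer, the function name `simulate_cbe_edit`, and the activity window of positions 4 to 8 (counted from the PAM-distal end) are illustrative assumptions and are not taken from this disclosure; a real editor converts only a fraction of the cytosines in its window.

```python
def simulate_cbe_edit(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Return the protospacer with every C inside the window replaced by T.

    Positions are 1-based and counted from the 5' (PAM-distal) end.
    The default window is an assumption made for illustration only.
    """
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        # C -> U by deamination, then U -> T after replication or repair
        if base == "C" and start <= position <= end:
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


example = "GACCTCAGGTACCATGCAGT"  # hypothetical 20-nt protospacer
print(simulate_cbe_edit(example))
```

In this idealized model the C at position 3 is left unchanged because it lies outside the assumed window, while the cytosines at positions 4 and 6 are converted.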

Disclosure of Invention

The object of the present invention is to provide a novel cytosine deaminase for optimizing CBEs. The novel cytosine deaminase is derived from Saccharopolyspora flava and is named SfSddA; it was discovered through bioinformatic analysis of existing single-stranded DNA deaminase (SddA) data, and its amino acid sequence is shown in SEQ ID NO. 1.

It is a further object of the present invention to provide a polynucleotide sequence expressing the above cytosine deaminase, the nucleotide sequence of which is shown in SEQ ID NO. 2.

It is still another object of the present invention to provide a fusion protein whose amino acid sequence is shown in SEQ ID NO. 3, comprising a Cas enzyme domain, a cytosine deaminase domain and a UGI domain, wherein the cytosine deaminase domain is SfSddA.

It is still another object of the present invention to provide a polynucleotide sequence for expressing the above fusion protein, the nucleotide sequence of which is shown in SEQ ID NO. 4.

It is a further object of the present invention to provide a recombinant expression vector comprising the polynucleotide sequence of the above fusion protein.

It is a further object of the present invention to provide a genetically engineered host cell comprising the above recombinant expression vector, or having the polynucleotide sequence of the above fusion protein integrated in its genome, together with sgRNAs and AAV.

It is still another object of the present invention to provide a cytosine single-base editor, wherein the cytosine single-base editor is obtained by integrating a polynucleotide sequence encoding the above fusion protein into an expression vector.

It is a further object of the present invention to provide the use of the above cytosine base editor in the preparation of a reagent for mediating gene editing, for performing gene editing, reducing off-target effects, increasing the efficiency of targeted editing or increasing the fidelity of targeted editing.

It is still another object of the present invention to provide a gene editing method in which gene editing is mediated by said cytosine base editor: a polynucleotide sequence encoding said cytosine base editor and an sgRNA are co-injected into a recipient, thereby performing gene editing, said recipient comprising somatic cells or germ cells, including embryos or fertilized eggs.

A further object of the present invention is to provide a reagent or kit for gene editing comprising the SfSddA or the cytosine base editor as defined above.

Other aspects of the invention will be apparent to those skilled in the art in view of the disclosure herein.

In the present invention, all reported data on DddA and SddA were examined and the high-activity proteins in the data set were manually classified. A seed data set for the HMMER search was generated by multiple sequence alignment; the protein sequences retrieved from the database were simplified and then aligned with the processed protein set by multiple sequence alignment to construct an evolutionary tree. A previously unreported SddA was thereby found near the branch of the high-activity proteins and was taken as a candidate protein for further structural alignment and examination. Based on the structural alignment results, the SddA from Saccharopolyspora flava was selected as one of the final candidate cytosine deaminases for activity evaluation.

The invention measures the editing efficiency of SfSddA at 4 nuclear target sites in HEK293T cells; efficient C-to-T editing is observed at the target positions.
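As a rough illustration of the HMMER step in this workflow, the following Python sketch only assembles the two command lines involved (`hmmbuild` to turn a seed alignment into a profile, `hmmsearch` to query a protein database with it) and prints them without running anything. The file names, the `sdda` prefix and the helper name `hmmer_commands` are hypothetical; the disclosure does not state the options, databases or alignment tools that were actually used.

```python
import shlex


def hmmer_commands(seed_alignment: str, database_fasta: str,
                   prefix: str = "sdda") -> list[list[str]]:
    """Build (but do not run) HMMER commands for a profile search.

    All file names are placeholders chosen for this sketch.
    """
    profile = f"{prefix}.hmm"
    hits = f"{prefix}_hits.tbl"
    return [
        # hmmbuild <hmmfile_out> <msafile>: profile HMM from the seed alignment
        ["hmmbuild", profile, seed_alignment],
        # hmmsearch --tblout <file> <hmmfile> <seqdb>: search the database
        ["hmmsearch", "--tblout", hits, profile, database_fasta],
    ]


for command in hmmer_commands("seed_alignment.sto", "protein_db.fasta"):
    print(shlex.join(command))
```

The hits table would then feed the simplification, realignment and tree-building steps described in the text, which are not sketched here.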

From the above, the CBE containing SfSddA has the advantages of high activity, small size and the like, and has obvious advantages compared with other CBEs in the prior art.

Drawings

FIG. 1 is a diagram showing the structural prediction of cytosine deaminase SfSddA of the invention;

FIG. 2 shows the sequencing chromatogram of C-to-T mutation efficiency for sgRNA-1. The PAM sequence recognized by sgRNA-1 is GGG, and the C-to-T mutation sites are indicated by black arrows. The estimated C-to-T mutation efficiency is 67% at C₄ and 61% at C₆.

FIG. 3 shows the sequencing chromatogram of C-to-T mutation efficiency for sgRNA-2. The PAM sequence recognized by sgRNA-2 is GGG, and the C-to-T mutation sites are indicated by black arrows. The estimated C-to-T mutation efficiency is 14% at C₅ and 12% at C₆.

FIG. 4 shows the sequencing chromatogram of C-to-T mutation efficiency for sgRNA-3. The PAM sequence recognized by sgRNA-3 is TGG, and the C-to-T mutation site is indicated by the black arrow. The estimated C-to-T mutation efficiency is 12% at C₆.

FIG. 5 shows the sequencing chromatogram of C-to-T mutation efficiency for sgRNA-4. The PAM sequence recognized by sgRNA-4 is TGG, and the C-to-T mutation sites are indicated by black arrows. The estimated C-to-T mutation efficiency is 12% at C₅ and 10% at C₆.

Detailed Description

Embodiments of the present invention are described in detail below. The following examples are illustrative only and are not to be construed as limiting the invention. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in this field or the product specifications. Reagents or apparatus for which no manufacturer is indicated are conventional, commercially available products.

The embodiment of the invention prepares novel cytosine deaminase, a polynucleotide sequence for expressing the cytosine deaminase, a fusion protein with an amino acid sequence shown as SEQ ID NO.3, a polynucleotide sequence for expressing the fusion protein and a recombinant expression vector.

Example 1. Construction of the cytosine base editor SfCBE.

The amino acid sequence of SfSddA is shown in SEQ ID NO. 1, and its predicted structure is shown in FIG. 1. SfSddA was synthesized and cloned into the AncBE4max vector from which APOBEC had been excised, yielding the cytosine base editor SfCBE.

Example 2. Construction of sgRNA vectors.

Based on the human gene sequences in the NCBI gene database (human genome version GRCh38), regions to be targeted by sgRNAs for gene editing were selected according to the PAM sequence (NGG), and 4 sgRNA sequences targeting human genes were designed and synthesized, as follows:

sgRNA1-F sequence: 5'-ccggGAACACAAAGCATAGACTGC-3';

sgRNA1-R sequence: 5'-aaacGCAGTCTATGCTTTGTGTTC-3';

sgRNA2-F sequence: 5'-ccggGAGTCCGAGCAGAAGAAGAA-3';

sgRNA2-R sequence: 5'-aaacTTCTTCTTCTGCTCGGACTC-3';

sgRNA3-F sequence: 5'-ccggGGAATCCCTTCTGCAGCACC-3';

sgRNA3-R sequence: 5'-aaacGGTGCTGCAGAAGGGATTCC-3';

sgRNA4-F sequence: 5'-ccggGGCCACACTAGCGTTGCTGC-3';

sgRNA4-R sequence: 5'-aaacGCAGCAACGCTAGTGTGGCC-3'.
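The relationship between each forward and reverse oligonucleotide listed above can be sketched in Python: the forward oligo is the 20-nt spacer with a `ccgg` 5' overhang, and the reverse oligo is the reverse complement of the spacer with an `aaac` 5' overhang. The overhangs are read off the listed sequences; the function names `reverse_complement` and `sgrna_oligos` are hypothetical and only illustrate the pattern.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sgrna_oligos(spacer: str, fwd_overhang: str = "ccgg",
                 rev_overhang: str = "aaac") -> tuple[str, str]:
    """Return (forward, reverse) cloning oligos for one spacer, written 5' to 3'."""
    forward = fwd_overhang + spacer.upper()
    reverse = rev_overhang + reverse_complement(spacer)
    return forward, reverse


# Spacer sequences taken from the forward oligos listed above
spacers = {
    "sgRNA1": "GAACACAAAGCATAGACTGC",
    "sgRNA2": "GAGTCCGAGCAGAAGAAGAA",
    "sgRNA3": "GGAATCCCTTCTGCAGCACC",
    "sgRNA4": "GGCCACACTAGCGTTGCTGC",
}

for name, spacer in spacers.items():
    forward, reverse = sgrna_oligos(spacer)
    print(f"{name}-F 5'-{forward}-3'")
    print(f"{name}-R 5'-{reverse}-3'")
```

Annealing each pair leaves the two 4-nt overhangs single-stranded, which is what allows ligation into the digested, linearized vector.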

The four pairs of single-stranded sgRNA oligonucleotides were each annealed to form four double-stranded sgRNA oligonucleotides targeting different human gene loci. Each was then ligated into the digested, linearized 74707 vector to obtain four sgRNA expression vectors targeting human nuclear genes.

Example 3. Transfection of 293T cells with the CBE.

After the constructed sgRNA expression vectors were verified by sequencing, the target plasmids were extracted and purified by ethanol/sodium acetate precipitation. SfCBE and each sgRNA expression vector were introduced into 293T cells by liposome transfection; the culture medium was replaced after 12 hours of culture, and the genome of each group of cells was extracted 72 hours after transfection. PCR was performed with specific primers, and the PCR products were recovered from agarose gels after electrophoresis and sequenced. Editing at the specific gene loci in 293T cells was evaluated by analysis of the sequencing chromatograms; the editing efficiency chromatograms are shown in FIGS. 2 to 5.

As shown in FIG. 2, the PAM sequence recognized by sgRNA-1 is GGG, the C-to-T mutation sites are indicated by black arrows, and the estimated C-to-T mutation efficiency is 67% at C₄ and 61% at C₆. As shown in FIG. 3, the PAM sequence recognized by sgRNA-2 is GGG, the C-to-T mutation sites are indicated by black arrows, and the estimated C-to-T mutation efficiency is 14% at C₅ and 12% at C₆. As shown in FIG. 4, the PAM sequence recognized by sgRNA-3 is TGG, the C-to-T mutation site is indicated by the black arrow, and the estimated C-to-T mutation efficiency is 12% at C₆. As shown in FIG. 5, the PAM sequence recognized by sgRNA-4 is TGG, the C-to-T mutation sites are indicated by black arrows, and the estimated C-to-T mutation efficiency is 12% at C₅ and 10% at C₆.
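The disclosure does not state how the percentages were derived from the sequencing chromatograms. One common approximation, assumed here purely for illustration, is the T peak height divided by the sum of the C and T peak heights at the target position. The function name `c_to_t_efficiency` and the peak values below are hypothetical and are not data from the figures.

```python
def c_to_t_efficiency(c_peak: float, t_peak: float) -> float:
    """Estimate C-to-T editing (%) at one position from Sanger peak heights.

    Assumes only the C and T traces contribute signal at this position.
    """
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * t_peak / total


# Hypothetical peak heights (arbitrary fluorescence units)
print(f"{c_to_t_efficiency(c_peak=300.0, t_peak=100.0):.0f}%")
```

Dedicated chromatogram-analysis tools apply noise correction and statistical testing on top of this ratio, so values from such a simple estimate should be treated as approximate.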

As can be seen from the analysis of the sequencing chromatograms in FIGS. 2 to 5, SfSddA is able to deaminate cytosines in 293T cells, and the CBE assembled from it can bring about C-to-T conversion at the targeted DNA sequence under the guidance of sgRNAs. The targeted editing efficiency of the cytosine base editor is remarkably improved.

The invention provides genetically engineered host cells comprising the recombinant expression vector described above, or having the polynucleotide sequence of the fusion protein integrated in the genome, together with sgRNAs and AAV.

The invention prepares a cytosine single base editor, which is obtained by integrating a polynucleotide sequence for encoding the fusion protein into an expression vector.

The invention uses the cytosine base editor for gene editing, to reduce off-target effects, improve targeted editing efficiency or improve the fidelity of targeted editing.

The invention mediates gene editing with the cytosine base editor: a polynucleotide sequence encoding the cytosine base editor and an sgRNA are co-injected into a recipient, thereby carrying out gene editing, wherein the recipient comprises somatic cells or germ cells, including embryos or fertilized eggs.

The present invention provides a reagent or kit for gene editing comprising the above SfSddA or the above cytosine base editor.

In the present invention, the vector may contain a promoter sequence, for example a CAG promoter; a promoter capable of driving expression in mammals that is generally regarded as equivalent to the CAG promoter, such as the EF1α promoter, may also be used. Furthermore, a mammalian tissue-specific promoter such as the ICAM2 promoter may be used. The CAG promoter is one of the gene expression promoters used for expressing foreign genes. A "promoter" usually lies upstream of the DNA sequence carrying the genetic information of the gene to be expressed, within several hundred bases of the transcription start site. In eukaryotes, proteins known as transcriptional regulators bind to the promoter region and thereby participate in the binding of RNA polymerase.

In the present invention, "sgRNA" refers to single guide RNA.

In the present invention, the cytosine base editor is also linked to a nuclear localization sequence; preferably, a nuclear localization sequence is linked at the N-terminus and/or the C-terminus. A linking sequence, such as a tag sequence, may also be present between the cytosine base editor and the nuclear localization sequence. The cytosine deaminase is likewise linked to the nuclear localization sequence through a linking sequence, which may be any linking sequence that does not affect the function of either, such as a tag sequence or a flexible linker known in the art. Suitable tags may be used in the present invention. For example, the tag may be FLAG, HA, HA1, c-Myc, poly-His, poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE or Ty1.

The cytosine deaminase of the invention may be a recombinant protein, a natural protein, a synthetic protein, preferably a recombinant protein. The proteins of the invention may be naturally purified products, or chemically synthesized products, or produced from prokaryotic or eukaryotic hosts (e.g., bacterial, yeast, higher plant, insect, and mammalian cells) using recombinant techniques.

The invention may also include fragments, derivatives and analogues of SfSddA. The terms "fragment," "derivative" and "analog" as used herein refer to a protein that retains substantially the same biological function or activity of SfSddA of the invention. The protein fragments, derivatives or analogues of the invention may be proteins having one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) substituted, and such substituted amino acid residues may or may not be encoded by the genetic code, or a protein having a substituent group in one or more amino acid residues, or a protein in which an additional amino acid sequence is fused to the protein sequence (such as a leader or secretory sequence or a sequence used to purify the protein or a proprotein sequence, or a fusion protein). Such fragments, derivatives and analogs are within the purview of one skilled in the art in view of the definitions herein.

In the present invention, SfSddA may also include (but is not limited to): deletion, insertion and/or substitution of several (usually 1-20, more preferably 1-10, still more preferably 1-8, 1-5, 1-3, or 1-2) amino acids, and addition or deletion of one or several (usually 20 or fewer, preferably 10 or fewer, more preferably 5 or fewer) amino acids at the C-terminus and/or N-terminus. For example, in the art, substitution with amino acids of similar or closely related properties does not generally alter the function of the protein. As another example, the addition of one or more amino acids at the C-terminus and/or N-terminus typically does not alter the function of the protein. The term also includes active fragments and active derivatives of the engineered enzymes.

In the present invention, SfSddA also includes (but is not limited to): a derivatized protein having 80% or more, preferably 85% or more, more preferably 90% or more, even more preferably 95% or more, such as 98% or more or 99% or more, sequence identity to the SfSddA amino acid sequence, and which retains its protein activity.
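To make the identity thresholds above concrete, the following minimal Python sketch computes percent identity between two sequences that have already been aligned to equal length, with `-` marking gaps. Counting identical residues over all alignment columns (excluding columns that are gaps in both sequences) is one convention among several and is an assumption of this sketch; the function name `percent_identity` and the variant sequence are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


reference = "MSALEELLAS"  # first 10 residues of the SfSddA sequence
variant = "MSALEDLLAS"    # hypothetical variant with one substitution
print(percent_identity(reference, variant))
```

Under this convention the hypothetical variant shares 9 of 10 positions with the reference fragment; in practice the alignment itself would come from a dedicated alignment tool and would span the full-length sequences.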

The present invention provides polynucleotide sequences encoding SfSddA or a conservative variant protein thereof of the invention.

The polynucleotides of the invention may be in the form of DNA or RNA. DNA forms include cDNA, genomic DNA, or synthetic DNA. The DNA may be single-stranded or double-stranded. The DNA may be a coding strand or a non-coding strand.

The polynucleotide for encoding the mature protein of SfSddA comprises: a coding sequence encoding only the mature protein; coding sequences for mature proteins and various additional coding sequences; the coding sequence (and optionally additional coding sequences) of the mature protein, and non-coding sequences. The "polynucleotide encoding a protein" may include a polynucleotide encoding the protein, or may include additional coding and/or non-coding sequences.

The full-length SfSddA nucleotide sequence or fragment thereof of the present invention can be generally obtained by PCR amplification, recombinant methods or synthetic methods. For the PCR amplification method, primers can be designed according to the nucleotide sequences disclosed in the present invention, particularly the open reading frame sequences, and amplified to obtain the relevant sequences using a commercially available cDNA library or a cDNA library prepared according to a conventional method known to those skilled in the art as a template. When the sequence is longer, it is often necessary to perform two or more PCR amplifications, and then splice the amplified fragments together in the correct order. Once the relevant sequences are obtained, recombinant methods can be used to obtain the relevant sequences in large quantities. This is usually done by cloning it into a vector, transferring it into a cell, and isolating the relevant sequence from the propagated host cell by conventional methods.

Furthermore, the relevant sequences, in particular short fragments, can also be obtained by artificial synthesis. In general, a very long sequence is obtained by first synthesizing a number of small fragments and then ligating them. At present, the DNA sequences encoding the proteins of the invention (or fragments or derivatives thereof) can be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into a variety of existing DNA molecules (for example, vectors) and cells known in the art.

The invention also relates to vectors comprising the polynucleotides of the invention, as well as host cells genetically engineered with the vectors or engineered enzyme coding sequences of the invention, and methods for producing the proteins of the invention by recombinant techniques.

The polynucleotide sequences of the present invention may be used to express or produce recombinant SfSddA by conventional recombinant DNA techniques. Generally, the steps are as follows: (1) transforming or transducing a suitable host cell with a polynucleotide encoding the SfSddA of the invention, or with a recombinant expression vector comprising the polynucleotide; (2) culturing the host cell in a suitable medium; (3) isolating and purifying the protein from the culture medium or the cells.

The present invention provides cytosine base editors comprising said engineered enzymes or polynucleotide sequences thereof. Other components of the cytosine base editor are known to those skilled in the art.

In the present invention, the SfSddA polynucleotide sequence or the cytosine base editor polynucleotide sequence may be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, phages, yeast plasmids, plant cell viruses, mammalian cell viruses or other vectors well known in the art. In general, any plasmid or vector can be used as long as it replicates and is stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translational control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing the SfSddA polynucleotide sequence or the cytosine base editor polynucleotide sequence and appropriate transcriptional/translational control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombinant techniques, and the like. The DNA sequence may be operably linked to an appropriate promoter in an expression vector to direct mRNA synthesis. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator. The expression vector preferably comprises one or more selectable marker genes to provide a phenotypic trait for selection of transformed host cells.

Vectors comprising the appropriate DNA sequences and appropriate promoter or control sequences described above may be used to transform appropriate host cells and then recipient cells.

The invention also provides a method for carrying out gene editing, which comprises the step of mediating gene editing with the cytosine base editor. Apart from the use of the cytosine base editor described herein, other aspects of the gene editing reagents may be designed using techniques known in the art; for example, sgRNAs may be designed using techniques known in the art.

In the present invention, the object to be subjected to gene editing is not particularly limited, and may be a somatic cell or a germ cell, and may be an animal cell or a human cell.

The invention expands the application of CBE in laboratory and clinic.

Finally, it should be noted that the present invention is capable of other various embodiments and that various changes and modifications can be made herein by one of ordinary skill in the art without departing from the spirit and scope of the invention as defined by the appended claims.

SfSddA amino acid sequence:

MSALEELLASLREIADQLESARADMSAGMDSWAERAATLEWWLTGTNDPDALDVLAQQPVTTDALRTSWEAVDLTVAVIEDYIGRLEAVESPSSSAASGNSTSGDDQAAQPSMTAPDGSRYPRAVGWAVDVMPRRVREGQGDRTVGYADGAVGQPFTSGHDQTWTPLILERARAVGLPARFAGLLGSHVEMKVATAMIQRGRRHSELVINHVPCGSQAGQRPGCHQALEKYLPEGHTLTVHGTTQGGEPYSHTYHGRAQR。

SfSddA nucleotide sequence:

ATGTCCGCCCTCGAGGAACTGCTGGCTAGCCTGAGAGAGATCGCCGATCAGCTGGAATCCGCCAGAGCTGACATGAGCGCCGGAATGGACAGCTGGGCTGAAAGAGCAGCTACACTGGAATGGTGGCTGACAGGCACAAACGACCCCGATGCCCTGGACGTGCTGGCCCAGCAGCCAGTGACCACCGACGCCCTGCGGACATCTTGGGAGGCCGTGGACCTGACCGTGGCCGTCATCGAGGACTACATCGGCAGACTGGAAGCCGTTGAGAGCCCATCTAGCAGCGCCGCTTCTGGAAATAGCACAAGCGGCGACGACCAGGCTGCCCAGCCCAGCATGACCGCCCCTGATGGATCTAGATATCCTAGAGCCGTGGGCTGGGCCGTGGATGTGATGCCTAGGCGGGTGCGGGAAGGCCAGGGCGATAGAACCGTGGGCTACGCCGACGGAGCCGTCGGCCAGCCTTTCACCAGCGGCCACGACCAAACATGGACACCTCTGATCCTGGAGAGAGCCCGCGCCGTGGGACTGCCTGCCAGATTCGCCGGCCTGCTGGGCTCCCACGTGGAAATGAAGGTGGCCACCGCCATGATCCAGCGGGGCCGGCGGCACAGCGAGCTGGTGATTAACCACGTGCCCTGCGGCAGCCAGGCCGGCCAAAGACCTGGTTGTCACCAGGCCCTGGAGAAGTACCTGCCCGAGGGCCATACCCTGACCGTGCACGGCACCACCCAGGGCGGCGAGCCTTACAGCCACACCTACCACGGCAGAGCTCAGAGA。
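A simple consistency check between the nucleotide and amino acid sequences can be sketched in Python using the standard genetic code. Only short fragments copied from the start and end of the SfSddA nucleotide sequence are translated here; the function name `translate` is hypothetical, and the sketch assumes an in-frame, upper- or lower-case DNA string containing only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


first_codons = "ATGTCCGCCCTCGAGGAACTGCTG"  # first 8 codons of the sequence above
last_codons = "CACGGCAGAGCTCAGAGA"         # last 6 codons of the sequence above
print(translate(first_codons), translate(last_codons))
```

The translated fragments can be compared with the beginning and end of the SfSddA amino acid sequence; the same function can be applied to the full-length sequences for a complete check.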

Amino acid sequence of the SfSddA fusion protein:

PKKKRKVGSMSALEELLASLREIADQLESARADMSAGMDSWAERAATLEWWLTGTNDPDALDVLAQQPVTTDALRTSWEAVDLTVAVIEDYIGRLEAVESPSSSAASGNSTSGDDQAAQPSMTAPDGSRYPRAVGWAVDVMPRRVREGQGDRTVGYADGAVGQPFTSGHDQTWTPLILERARAVGLPARFAGLLGSHVEMKVATAMIQRGRRHSELVINHVPCGSQAGQRPGCHQALEKYLPEGHTLTVHGTTQGGEPYSHTYHGRAQRSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIRPKRNSDKLIARKKDWDPKKYGGFVSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASARFLQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPRAFKYFDTTIDRKVYRSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSKRTADGSEFEPKKKRKV。

Nucleotide sequence of the SfSddA fusion protein:

ccaaagaagaagcggaaagtcggatccATGTCCGCCCTCGAGGAACTGCTGGCTAGCCTGAGAGAGATCGCCGATCAGCTGGAATCCGCCAGAGCTGACATGAGCGCCGGAATGGACAGCTGGGCTGAAAGAGCAGCTACACTGGAATGGTGGCTGACAGGCACAAACGACCCCGATGCCCTGGACGTGCTGGCCCAGCAGCCAGTGACCACCGACGCCCTGCGGACATCTTGGGAGGCCGTGGACCTGACCGTGGCCGTCATCGAGGACTACATCGGCAGACTGGAAGCCGTTGAGAGCCCATCTAGCAGCGCCGCTTCTGGAAATAGCACAAGCGGCGACGACCAGGCTGCCCAGCCCAGCATGACCGCCCCTGATGGATCTAGATATCCTAGAGCCGTGGGCTGGGCCGTGGATGTGATGCCTAGGCGGGTGCGGGAAGGCCAGGGCGATAGAACCGTGGGCTACGCCGACGGAGCCGTCGGCCAGCCTTTCACCAGCGGCCACGACCAAACATGGACACCTCTGATCCTGGAGAGAGCCCGCGCCGTGGGACTGCCTGCCAGATTCGCCGGCCTGCTGGGCTCCCACGTGGAAATGAAGGTGGCCACCGCCATGATCCAGCGGGGCCGGCGGCACAGCGAGCTGGTGATTAACCACGTGCCCTGCGGCAGCCAGGCCGGCCAAAGACCTGGTTGTCACCAGGCCCTGGAGAAGTACCTGCCCGAGGGCCATACCCTGACCGTGCACGGCACCACCCAGGGCGGCGAGCCTTACAGCCACACCTACCACGGCAGAGCTCAGAGAagcggcagcgagactcccgggacctcagagtccgccacacccgaaagtgacaagaagtacagcatcggcctggccatcggcaccaactctgtgggctgggccgtgatcaccgacgagtacaaggtgcccagcaagaaattcaaggtgctgggcaacaccgaccggcacagcatcaagaagaacctgatcggagccctgctgttcgacagcggcgaaacagccgaggccacccggctgaagagaaccgccagaagaagatacaccagacggaagaaccggatctgctatctgcaagagatcttcagcaacgagatggccaaggtggacgacagcttcttccacagactggaagagtccttcctggtggaagaggataagaagcacgagcggcaccccatcttcggcaacatcgtggacgaggtggcctaccacgagaagtaccccaccatctaccacctgagaaagaaactggtggacagcaccgacaaggccgacctgcggctgatctatctggccctggcccacatgatcaagttccggggccacttcctgatcgagggcgacctgaaccccgacaacagcgacgtggacaagctgttcatccagctggtgcagacctacaaccagctgttcgaggaaaaccccatcaacgccagcggcgtggacgccaaggccatcctgtctgccagactgagcaagagcagacggctggaaaatctgatcgcccagctgcccggcgagaagaagaatggcctgttcggaaacctgattgccctgagcctgggcctgacccccaacttcaagagcaacttcgacctggccgaggatgccaaactgcagctgagcaaggacacctacgacgacgacctggacaacctgctggcccagatcggcgaccagtacgccgacctgtttctggccgccaagaacctgtccgacgccatcctgctgagcgacatcctgagagtgaacaccgagatcaccaaggcccccctgagcgcctctatgatcaagagatacgacgagcaccaccaggacctgaccctgctgaaagctctcgtgcggcagcagctgcctgagaagtacaaagagattttcttcgaccagagcaagaacggctacgccggctacattgacggcggagccagccaggaagagttctacaagttcatcaagcccatcctggaaaagat
ggacggcaccgaggaactgctcgtgaagctgaacagagaggacctgctgcggaagcagcggaccttcgacaacggcagcatcccccaccagatccacctgggagagctgcacgccattctgcggcggcaggaagatttttacccattcctgaaggacaaccgggaaaagatcgagaagatcctgaccttccgcatcccctactacgtgggccctctggccaggggaaacagcagattcgcctggatgaccagaaagagcgaggaaaccatcaccccctggaacttcgaggaagtggtggacaagggcgcttccgcccagagcttcatcgagcggatgaccaacttcgataagaacctgcccaacgagaaggtgctgcccaagcacagcctgctgtacgagtacttcaccgtgtataacgagctgaccaaagtgaaatacgtgaccgagggaatgagaaagcccgccttcctgagcggcgagcagaaaaaggccatcgtggacctgctgttcaagaccaaccggaaagtgaccgtgaagcagctgaaagaggactacttcaagaaaatcgagtgcttcgactccgtggaaatctccggcgtggaagatcggttcaacgcctccctgggcacataccacgatctgctgaaaattatcaaggacaaggacttcctggacaatgaggaaaacgaggacattctggaagatatcgtgctgaccctgacactgtttgaggacagagagatgatcgaggaacggctgaaaacctatgcccacctgttcgacgacaaagtgatgaagcagctgaagcggcggagatacaccggctggggcaggctgagccggaagctgatcaacggcatccgggacaagcagtccggcaagacaatcctggatttcctgaagtccgacggcttcgccaacagaaacttcatgcagctgatccacgacgacagcctgacctttaaagaggacatccagaaagcccaggtgtccggccagggcgatagcctgcacgagcacattgccaatctggccggcagccccgccattaagaagggcatcctgcagacagtgaaggtggtggacgagctcgtgaaagtgatgggccggcacaagcccgagaacatcgtgatcgaaatggccagagagaaccagaccacccagaagggacagaagaacagccgcgagagaatgaagcggatcgaagagggcatcaaagagctgggcagccagatcctgaaagaacaccccgtggaaaacacccagctgcagaacgagaagctgtacctgtactacctgcagaatgggcgggatatgtacgtggaccaggaactggacatcaaccggctgtccgactacgatgtggaccatatcgtgcctcagagctttctgaaggacgactccatcgacaacaaggtgctgaccagaagcgacaagaaccggggcaagagcgacaacgtgccctccgaagaggtcgtgaagaagatgaagaactactggcggcagctgctgaacgccaagctgattacccagagaaagttcgacaatctgaccaaggccgagagaggcggcctgagcgaactggataaggccggcttcatcaagagacagctggtggaaacccggcagatcacaaagcacgtggcacagatcctggactcccggatgaacactaagtacgacgagaatgacaagctgatccgggaagtgaaagtgatcaccctgaagtccaagctggtgtccgatttccggaaggatttccagttttacaaagtgcgcgagatcaacaactaccaccacgcccacgacgcctacctaaacgccgtcgtgggaaccgccctgatcaaaaagtaccctaagctggaaagcgagttcgtgtacggcgactacaaggtgtacgacgtgcggaagatgatcgccaagagcgagcaggaaatcggcaaggctaccgccaagtacttcttctacagcaacatcatgaactttttcaagaccgaga
ttaccctggccaacggcgagatccggaagcggcctctgatcgagacaaacggcgaaaccggggagatcgtgtgggataagggccgggattttgccaccgtgcggaaagtgctgagcatgccccaagtgaatatcgtgaaaaagaccgaggtgcagacaggcggcttcagcaaagagtctatcCggcccaagaggaacagcgataagctgatcgccagaaagaaggactgggaccctaagaagtacggcggcttcgtgagccccaccgtggcctattctgtgctggtggtggccaaagtggaaaagggcaagtccaagaaactgaagagtgtgaaagagctgctggggatcaccatcatggaaagaagcagcttcgagaagaatcccatcgactttctggaagccaagggctacaaagaagtgaaaaaggacctgatcatcaagctgcctaagtactccctgttcgagctggaaaacggccggaagagaatgctggcctctgccagattcctgcagaagggaaacgaactggccctgccctccaaatatgtgaacttcctgtacctggccagccactatgagaagctgaagggctcccccgaggataatgagcagaaacagctgtttgtggaacagcacaagcactacctggacgagatcatcgagcagatcagcgagttctccaagagagtgatcctggccgacgctaatctggacaaagtgctgtccgcctacaacaagcaccgggataagcccatcagagagcaggccgagaatatcatccacctgtttaccctgaccaatctgggagcccctcgggccttcaagtactttgacaccaccatcgaccggaaggtgtaccggagcaccaaagaggtgctggacgccaccctgatccaccagagcatcaccggcctgtacgagacacggatcgacctgtctcagctgggaggtgacagcggcgggagcggcgggagcggggggagcactaatctgagcgacatcattgagaaggagactgggaaacagctggtcattcaggagtccatcctgatgctgcctgaggaggtggaggaagtgatcggcaacaagccagagtctgacatcctggtgcacaccgcctacgacgagtccacagatgagaatgtgatgctgctgacctctgacgcccccgagtataagccttgggccctggtcatccaggattctaacggcgagaataagatcaagatgctgagcggaggatccggaggatctggaggcagcaccaacctgtctgacatcatcgagaaggagacaggcaagcagctggtcatccaggagagcatcctgatgctgcccgaagaagtcgaagaagtgatcggaaacaagcctgagagcgatatcctggtccataccgcctacgacgagagtaccgacgaaaatgtgatgctgctgacatccgacgccccagagtataagccctgggctctggtcatccaggattccaacggagagaacaaaatcaaaatgctgtctggcggctcaaaaagaaccgccgacggcagcgaattcgagcccaagaagaagaggaaagtc。

Claims

1. A cytosine deaminase, characterized in that: its amino acid sequence is shown as SEQ ID NO. 1.

2. A polynucleotide sequence which expresses the cytosine deaminase of claim 1 and has the nucleotide sequence set forth in SEQ ID No. 2.

3. A fusion protein, characterized in that: comprising a Cas enzyme domain; a cytosine deaminase domain and a UGI domain; the cytosine deaminase domain is the cytosine deaminase of claim 1.

4. A polynucleotide sequence, characterized in that: it expresses the fusion protein of claim 3.

5. A recombinant expression vector, characterized in that: comprising the polynucleotide sequence of claim 4.

6. A host cell, characterized in that: it comprises the recombinant expression vector of claim 5, or has the polynucleotide sequence of claim 4 integrated in its genome, and further comprises an sgRNA.

7. A cytosine single base editor obtained by integrating the polynucleotide sequence of claim 4 into an expression vector.

8. Use of the cytosine base editor of claim 7 in the preparation of a reagent for mediating gene editing, the use including performing gene editing, reducing off-target effects, increasing the efficiency of targeted editing, or increasing the fidelity of targeted editing.

9. A method of gene editing, comprising mediating gene editing with the cytosine base editor of claim 7 by co-injecting into a host cell the polynucleotide sequence of claim 4, which encodes said cytosine base editor, together with an sgRNA.

10. A reagent or kit for gene editing comprising the cytosine deaminase of claim 1 or the cytosine base editor of claim 7.