WO2023193536A1 - An adenosine deaminase, base editor and uses thereof

An adenosine deaminase, base editor and uses thereof

Info

Publication number
WO2023193536A1
WO2023193536A1 PCT/CN2023/078133 CN2023078133W WO2023193536A1 WO 2023193536 A1 WO2023193536 A1 WO 2023193536A1 CN 2023078133 W CN2023078133 W CN 2023078133W WO 2023193536 A1 WO2023193536 A1 WO 2023193536A1
Authority
WO
WIPO (PCT)
Prior art keywords
amino acid
seq
adenosine deaminase
base editor
acid sequence
Prior art date
Application number
PCT/CN2023/078133
Other languages
English (en)
French (fr)
Inventor
张红玲
赖崇平
Original Assignee
尧唐(上海)生物科技有限公司
Priority date
Filing date
Publication date
Application filed by 尧唐(上海)生物科技有限公司
Publication of WO2023193536A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/78 - Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P - SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P3/00 - Drugs for disorders of the metabolism
    • A61P3/06 - Antihyperlipidemics
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/16 - Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 - Ribonucleases RNAses, DNAses
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y - ENZYMES
    • C12Y305/00 - Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 - Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04004 - Adenosine deaminase (3.5.4.4)
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K - PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 - Medicinal preparations containing peptides
    • C - CHEMISTRY; METALLURGY
    • C07 - ORGANIC CHEMISTRY
    • C07K - PEPTIDES
    • C07K2319/00 - Fusion polypeptide
    • C07K2319/01 - Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 - Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal

Definitions

  • the invention belongs to the field of biotechnology, and more specifically, the invention relates to an adenosine deaminase, a base editor fusion protein, a base editor system and their uses.
  • CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
  • Cas9-mediated gene editing technology has become the most powerful tool to achieve this goal.
  • Traditional CRISPR/Cas9 technology induces homologous recombination (HR) and non-homologous end joining (NHEJ) in cells by generating DNA double-strand breaks (DSBs) at the target site.
  • HR: homologous recombination
  • NHEJ: non-homologous end joining
  • DSB: DNA double-strand break
  • DNA repair triggered by DSBs rarely yields efficient and stable single-base mutations.
  • Single nucleotide variations cause about 2/3 of human genetic diseases and are also the genetic basis for variation in important traits of many animals and plants. It is therefore particularly important to develop a technology that can accurately and efficiently achieve single-base substitutions.
  • The base editors developed in David R. Liu's laboratory were created for this purpose.
  • David R. Liu's laboratory has developed three different base editors, namely the Cytosine Base Editor (CBE), the Adenine Base Editor (ABE) and the Prime Editor (PE); these editors do not rely on the generation of DSBs or the involvement of donor DNA when working.
  • Adenine base editing technology based on adenosine deaminase mainly uses a fusion protein composed of the nickase Cas9n (D10A) or dCas9 combined with an adenosine deaminase. Under the guidance of an sgRNA, the target adenine (A) located in the base editing activity window of the DNA is deaminated to form hypoxanthine (I), which is replaced by G after DNA repair and replication, ultimately producing a directed A-to-G substitution.
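  • The A-to-G outcome described above can be illustrated with a minimal Python sketch. It assumes a hypothetical activity window covering protospacer positions 4 to 8 (1-based); the window bounds, the function name `simulate_abe` and the example sequence are illustrative assumptions, not values taken from this disclosure. The sketch models only the final sequence outcome (A read as G), not the inosine intermediate or the repair process.

```python
def simulate_abe(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Return the protospacer with every A inside the window replaced by G.

    `window` is a hypothetical, 1-based, inclusive activity window; real
    editors have empirically determined windows that differ between variants.
    """
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "A" and start <= pos <= end:
            edited.append("G")  # A -> I, which is read as G after repair/replication
        else:
            edited.append(base)  # bases outside the window are left unchanged
    return "".join(edited)


if __name__ == "__main__":
    # Illustrative sequence only; positions 5 and 6 are the A bases in the window.
    print(simulate_abe("GACTAAGCTTACGGATCCAA"))
```

  • In this sketch, A bases outside the assumed window (for example, position 2) are not converted, which mirrors the idea that only targets inside the activity window are edited.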
  • Base editors can also treat some diseases by editing gene targets, such as hypercholesterolemia, transthyretin amyloidosis, and beta-hemoglobinopathies. Optimizing existing base editors therefore offers better prospects for clinical application.
  • The invention discloses a deaminase, a base editor containing the same, and uses thereof.
  • Site-directed mutations are introduced into the base sequence of the deaminase to obtain an adenine base editor with improved editing efficiency, in particular improved editing efficiency in eukaryotic cells.
  • We also conducted a large number of experiments on the structure of base editor fusion proteins, found a suitable position in the nuclease domain for embedding the adenosine deaminase, and obtained a base editor fusion protein structure with improved editing efficiency and thus better application prospects.
  • proteins can undergo amino acid changes, such as substitutions, deletions, or additions, and the resulting protein can maintain its function or activity.
  • substitution refers to the replacement of a certain amino acid residue at a certain position in the amino acid sequence by another amino acid residue; wherein the “substitution” can be a conservative amino acid substitution.
  • "Conservative modification" refers to the substitution of amino acids in a protein with other amino acids that have similar characteristics (such as charge, side chain size, hydrophobicity/hydrophilicity, backbone conformation and rigidity, etc.), so that changes can be made frequently without altering the biological activity of the protein.
  • amino acids with similar characteristics are as follows:
  • Polynucleotide and “nucleic acid” refer to a polymeric form of nucleotides (eg, RNA or DNA) of any length.
  • the term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or polymers containing purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural or derivatized nucleotide bases.
  • DNA sequence encoding a specific RNA refers to the DNA nucleotide sequence that is transcribed into RNA.
  • regulatory element refers to transcriptional and translational regulatory sequences, such as promoters, enhancers, terminators, etc.
  • a "promoter" (also known as a promoter sequence) is a DNA regulatory region capable of binding RNA polymerase and initiating the transcription of downstream (3' direction) coding or non-coding sequences.
  • promoters including inducible promoters, can be used to drive expression of the various vectors of the invention.
  • Codon optimization refers to modifying a nucleic acid sequence so that it is better expressed in host cells. Generally, at least one codon (e.g., 1 or more, such as 10, 20 or more) in the original nucleic acid sequence is replaced with a codon that is more frequently or most frequently used in the genes of the host cell, while the encoded native amino acid sequence is maintained.
  • "Naturally occurring" (also called unmodified or wild-type (wt)) refers to a nucleic acid, polypeptide, cell or organism that occurs in nature.
  • wt: wild-type
  • As used in this disclosure, the terms "polypeptide," "peptide," and "protein" are used interchangeably and refer to a polymer of amino acids of any length.
  • the polymer can be linear or branched, it can contain modified amino acids, and it can be interrupted by non-amino acids.
  • the term also includes amino acid polymers that have been modified (eg, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component).
  • Fusion protein refers to a hybrid polypeptide comprising protein domains from at least two different proteins.
  • One protein can be located in the amino-terminal (N-terminal) portion or the carboxy-terminal (C-terminal) portion of the fusion protein, thereby forming an amino-terminal fusion protein or a carboxy-terminal fusion protein, respectively.
  • CRISPR-Cas system also known as “CRISPR system” or “CRISPR/Cas system”
  • CRISPR system CRISPR/Cas system
  • components of a CRISPR system may comprise nucleic acids (eg, vectors) encoding one or more components of the system, components in protein form, or a combination thereof.
  • "Cas protein" refers to a CRISPR-associated protein (Cas) (also known as "CRISPR effector", "effector", "Cas enzyme" or "CRISPR enzyme"), a protein that performs enzymatic activity and/or binds to a target site on a nucleic acid specified by the guide RNA.
  • In some embodiments, the Cas protein has endonuclease activity, nickase activity, exonuclease activity, transposase activity and/or excision activity; in other embodiments, the Cas protein may be nuclease-inactivated or partially inactivated.
  • "Guide RNA" (also known as gRNA or crRNA) refers to any RNA molecule that facilitates targeting of a Cas protein to a target nucleic acid (such as DNA and/or RNA).
  • Guide RNAs include RNA-based molecules, known to those skilled in the art, that are capable of forming a complex with a Cas protein (e.g., via a direct repeat (DR) sequence) and that have sufficient affinity for the target nucleic acid sequence.
  • DR direct repeat
  • crRNA includes repeats and spacers.
  • CRISPR is transcribed to form a long chain of pre-CRISPR RNA (pre-crRNA).
  • Pre-crRNA is processed to obtain a repeat region.
  • crRNA is obtained by Cas protein acting on pre-crRNA.
  • crRNA is obtained by Cas protein and tracrRNA (trans-activating crRNA) acting on pre-crRNA.
  • crRNA can be used alone as a guide RNA (gRNA) to guide the Cas protein to a target sequence located near the PAM sequence, or crRNA and tracrRNA can be combined into a single guide RNA (sgRNA) that guides the Cas protein to a target sequence located near the PAM sequence.
  • gRNA guide RNA
  • sgRNA single guide RNA
  • the "guide sequence of crRNA” refers to the sequence in crRNA that hybridizes to the target sequence of the target nucleic acid, which corresponds to the spacer sequence (spacer) of crRNA.
  • "Target sequence" refers to the nucleotide sequence in the target nucleic acid that is complementary or at least partially complementary to the crRNA. After the Cas protein, the crRNA and the target sequence form a ternary complex, the Cas protein exerts its specific cleavage activity on the target strand and/or the non-target strand of the target nucleic acid.
  • "Target sequence" is used interchangeably with "target nucleic acid," "target polynucleotide," and "target nucleic acid sequence."
  • target strand refers to the nucleotide strand in the target nucleic acid that hybridizes to crRNA; the term “non-target strand” refers to the nucleotide strand in the target nucleic acid that is not hybridized to crRNA.
  • the term "Cas9" or "Cas9 domain" refers to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising the active, inactive, or partially active DNA cleavage domain of Cas9, and/or the gRNA-binding domain of Cas9).
  • Cas9 nuclease is sometimes also called CRISPR-associated nuclease 9.
  • CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids).
  • CRISPR clusters contain numerous short and conserved repeats and spacers.
  • CRISPR clusters are transcribed and processed into pre-crRNA.
  • In type II CRISPR/Cas9 systems, the correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (RNase III), and the Cas9 protein.
  • tracrRNA serves as a guide for ribonuclease 3-assisted processing of pre-crRNA.
  • Cas9/crRNA/tracrRNA endonucleolyzes linear or circular dsDNA targets that are complementary to the spacer sequence.
  • Target strands that are not complementary to crRNA are cleaved by endonucleolysis.
  • DNA binding and cutting typically requires proteins and two types of RNA.
  • single guide RNA sgRNA
  • Cas9 recognizes a short motif in the CRISPR repeat (PAM or protospacer adjacent motif) to help distinguish self from non-self.
  • Cas9 nuclease sequence and structure are well known to those skilled in the art (see, e.g., "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti et al., Proc. Natl. Acad. Sci. USA 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E. et al., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M. et al., Science 337:816-821 (2012)).
  • Cas9 orthologs have been described in a variety of species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus.
  • Other suitable Cas9 nucleases and sequences will be apparent to those skilled in the art based on this disclosure, and such Cas9 nucleases and sequences include the Cas9 sequences for the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
  • Nuclease-inactivated Cas9 proteins are interchangeably referred to as “dCas9” proteins (nuclease-"dead” Cas9 or nuclease-inactive Cas9) or catalytically inactivated Cas9.
  • Useful methods for generating Cas9 proteins (or fragments thereof) with inactive DNA cleavage domains are known (see, e.g., Jinek et al., Science. 337:816-821 (2012); Qi et al. "Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression" (2013) Cell. 28; 152(5):1173-83).
  • the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC subdomain.
  • the HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9.
  • mutations D10A and H840A completely inactivate the nuclease activity of Streptococcus pyogenes Cas9 (Jinek et al., Science. 337:816-821 (2012); Qi et al., Cell. 28; 152(5):1173-83 (2013)).
  • nuclease-inactive Cas9 domains include, but are not limited to, D10A/H840A, D10A/D839A/H840A, and D10A/D839A/H840A/N863A mutant domains (see, e.g., Prashant et al., Nature Biotechnology. 2013;31(9):833-838).
  • Cas9 nickase can cut one strand of double-stranded DNA.
  • Cas9 nickases can be generated by introducing inactivating mutations into the HNH subdomain or the RuvC subdomain. For example, an inactivating mutation (D10A) can be introduced in the RuvC domain of Streptococcus pyogenes Cas9 while the HNH domain remains active, i.e., the residue at position 840 remains histidine.
  • Such Cas9 variants are capable of generating single-stranded DNA breaks (nicks) at specific locations based on target sequences determined by the gRNA.
  • One skilled in the art can identify the catalytic residues in the RuvC and HNH domains of any known Cas9 protein and introduce inactivating mutations to generate the corresponding dCas9 or nCas9.
  • the term "deaminase” or “deaminase domain” refers to a protein or enzyme that catalyzes a deamination reaction.
  • the deaminase is an adenosine deaminase, which catalyzes the hydrolytic deamination of adenine to hypoxanthine.
  • the deaminase is an adenosine deaminase, which catalyzes the hydrolytic deamination of adenosine or adenine (A) to inosine (I).
  • the deaminase or deaminase domain is an adenosine deaminase that catalyzes the hydrolytic deamination of adenosine or deoxyadenosine to inosine or deoxyinosine, respectively.
  • adenosine deaminase catalyzes the hydrolytic deamination of adenosine from deoxyribonucleic acid (DNA).
  • "nucleic acid programmable nucleotide-binding domain" and "nucleic acid programmable DNA-binding protein (napDNAbp)" refer to a protein that associates with a nucleic acid (e.g., DNA or RNA), such as a guide polynucleotide (e.g., a gRNA), which directs the napDNAbp to a specific nucleic acid sequence, for example, by hybridizing to the target nucleic acid sequence.
  • the Cas9 protein can bind to a guide RNA that directs the Cas9 protein to a specific DNA sequence that is complementary to the guide RNA.
  • napDNAbp is a Cas9 domain, e.g., nuclease active Cas9, Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9).
  • nucleic acid programmable DNA binding proteins include, but are not limited to, Cas9 (eg, dCas9 and nCas9), CasX, CasY, Cpf1, C2c1, C2c2, C2c3, and Argonaute proteins (AGO).
  • nucleic acid programmable DNA binding proteins also include nucleic acid programmable proteins that bind RNA.
  • napDNAbp can bind to a nucleic acid that directs napDNAbp to RNA.
  • Other nucleic acid programmable DNA binding proteins are also within the scope of this disclosure, although they may not be specifically described in this disclosure.
  • the nCas9 domain includes nCas9 or a fragment thereof, where the nCas9 fragment has a certain degree of homology with nCas9 (e.g., at least about 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, 99.5%, or 99.9% homology) and retains its basic functions.
  • Base Editor refers to a reagent that binds to a polynucleotide and has nucleobase modification activity.
  • a base editor comprises a nucleobase-modifying polypeptide (e.g., a deaminase) and a nucleic acid programmable nucleotide binding domain (e.g., a nucleic acid programmable DNA binding protein) that binds to a guide polynucleotide (e.g., a guide RNA).
  • the agent is a biomolecular complex comprising a protein domain having base editing activity, i.e., capable of modifying bases (e.g., A, T, C, G or U).
  • the polynucleotide programmable DNA binding domain is fused or linked to a deaminase domain.
  • the agent is a fusion protein comprising a domain having base editing activity.
  • a domain with base editing activity is capable of deaminating bases within a nucleic acid molecule.
  • the base editor is capable of deaminating one or more bases within a DNA molecule.
  • the base editor is an adenine base editor (ABE).
  • base editing activity refers to the chemical alteration of a base within a polynucleotide.
  • the first base is converted to the second base.
  • the base editing activity is adenosine or adenine deaminase activity, such as converting a target A·T base pair to G·C.
  • base editing activity is assessed by editing efficiency.
  • Base editing efficiency can be measured by any suitable means, for example, by Sanger sequencing or next-generation sequencing.
  • base editing efficiency is measured by the percentage of total sequencing reads that have a nucleobase conversion effected by the base editor, e.g., the percentage of total sequencing reads in which the target A·T base pair is converted to a G·C base pair.
  • base editing efficiency is measured by the percentage of total cells that have nucleobase conversion affected by the base editor.
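  • The read-based efficiency measure described above can be sketched in Python as follows. The function name `editing_efficiency`, the 0-based `target_index` parameter and the toy reads are illustrative assumptions; real pipelines first align reads to a reference and filter by quality, which this sketch does not attempt.

```python
def editing_efficiency(reads: list[str], target_index: int, edited_base: str = "G") -> float:
    """Percentage of reads carrying `edited_base` at `target_index` (0-based).

    Reads too short to cover the target position are excluded from the total.
    """
    covered = [r for r in reads if len(r) > target_index]
    if not covered:
        raise ValueError("no read covers the target position")
    edited = sum(1 for r in covered if r[target_index].upper() == edited_base)
    return 100.0 * edited / len(covered)


if __name__ == "__main__":
    # Toy, pre-aligned reads; the target A is at index 0 in the unedited sequence.
    toy_reads = ["ACGT", "GCGT", "GCGT", "ACGT"]
    print(editing_efficiency(toy_reads, target_index=0))
```

  • The same ratio applies to the cell-based measure if each entry represents one cell's genotype call rather than one sequencing read.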
  • the term "base editor system” refers to a system for editing nucleobases of a target nucleotide sequence.
  • the base editor system comprises (1) a nucleic acid programmable nucleotide binding domain (e.g., Cas9); (2) a deaminase domain (e.g., adenosine deaminase) for deaminating the nucleobase; and (3) one or more guide polynucleotides (e.g., guide RNA).
  • Guide polynucleotide refers to a polynucleotide that can specifically target a target sequence and can form a complex with a nucleic acid programmable nucleotide binding domain protein (eg, Cas9).
  • the guide polynucleotide is a guide RNA (gRNA).
  • gRNA can exist as a complex of two or more RNAs or as a single RNA molecule.
  • a gRNA that exists in the form of a single RNA molecule can be called a single guide RNA (sgRNA).
  • The term "gRNA" is used to refer interchangeably to a guide RNA that exists as a single molecule or as a complex of two or more molecules.
  • gRNAs that exist as a single RNA species include two domains: (1) a domain that has homology to the target nucleic acid (e.g., guides the binding of the Cas9 complex to the target nucleic acid); and (2) a domain that binds the Cas9 protein.
  • domain (2) corresponds to a sequence known as tracrRNA and includes a stem-loop structure.
  • domain (2) is the same or homologous to the tracrRNA provided in Jinek et al., Science 337:816-821 (2012).
  • Other examples of gRNAs can be found in U.S.
  • a gRNA includes two or more of domains (1) and (2) and may be referred to as an "extended gRNA.”
  • the extended gRNA will bind two or more Cas9 proteins and bind the target nucleic acid in two or more different regions.
  • the gRNA includes a nucleotide sequence complementary to the target site, which mediates the binding of the nuclease/RNA complex to the target site, providing sequence specificity to the nuclease:RNA complex.
  • Vector refers to a polynucleotide composition used to transfer, deliver or introduce nucleic acid into a host cell.
  • Suitable vectors include plasmid vectors, phage vectors, viral vectors (eg, retroviral vectors, adeno-associated virus vectors, herpes simplex virus vectors, AAV vectors, lentiviral vectors, baculovirus vectors), and the like.
  • "Delivery system" refers to a system that includes a delivery vehicle, the delivery vehicle including one or more of liposomes, nanoparticles, exosomes, microvesicles, viral vectors, gene guns, or electroporation devices, etc.
  • "Functional fragment" refers to a protein or polypeptide sequence that contains fewer amino acids than the original sequence of the protein or polypeptide but retains a certain proportion (such as 10%, 20%, 30%, 40%, 50%, 60-99%, or 100%) of the functional activity of the original reference sequence (e.g., the protein or polypeptide can be modified by substitution, insertion, deletion, and/or addition of one or more amino acids while retaining a certain proportion of enzyme activity).
  • "Identity" refers to the degree of sequence matching between two polypeptides or two nucleic acids. When a position in both sequences being compared is occupied by the same base or amino acid monomer subunit (for example, if a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position.
  • Percent identity between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 out of 10 positions of two sequences match, the two sequences are 60% identical. For example, the DNA sequences CTGACT and CAGGTT share 50% identity (matching at 3 of 6 positions). Typically, comparisons are made when two sequences are aligned to yield maximum identity.
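  • The identity arithmetic above (matching positions divided by positions compared, multiplied by 100) can be checked with a minimal Python sketch. It assumes the two sequences are already aligned and of equal length; the function name `percent_identity` is illustrative, and no alignment step is performed.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions occupied by the same base or amino acid in both sequences.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # The DNA example from the text: positions 1, 3 and 6 match.
    print(percent_identity("CTGACT", "CAGGTT"))
```

  • For the DNA example in the text, positions 1, 3 and 6 match, giving 3 of 6 positions.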
  • "Host cell" includes in vitro, ex vivo or in vivo cells or cell lines or their progeny, including but not limited to CHO, BHK, 293, 293T cell lines, etc.; said cells or cell lines or their progeny comprise the Cas13 protein, fusion protein, CRISPR-Cas system, polynucleotide, vector or delivery system of the present invention.
  • "Operably linked" means that the nucleotide sequence of interest is linked to the regulatory sequence in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in the target cell when the vector is introduced into the target cell).
  • coding sequence or "protein coding sequence” as used interchangeably herein refer to a polynucleotide fragment that encodes a protein. This region or sequence has a start codon near the 5' end and a stop codon near the 3' end.
  • the coding sequence may also be called an open reading frame.
  • nuclear localization sequence and “nuclear localization signal (NLS)” refer to the amino acid sequence that promotes the import of proteins into the nucleus.
  • Nuclear localization sequences are known in the art and are described, for example, in Plank et al., International PCT application PCT/EP2000/011690, filed on November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
  • the NLS is an optimized NLS, such as described by Koblan et al., Nature Biotech. 2018 doi:10.1038/nbt.4172.
  • "linker" may refer to a covalent linker (e.g., a covalent bond), a non-covalent linker, a chemical group, or a molecule that connects two molecules or moieties (e.g., two components of a protein complex or a ribonucleoprotein complex), for example, two domains of a fusion protein, such as a polynucleotide programmable DNA binding domain (e.g., dCas9) and a deaminase domain (e.g., adenosine deaminase).
  • Linkers can connect different components of a base editor system or different parts of components.
  • a linker can connect a guide polynucleotide binding domain of a polynucleotide programmable nucleotide binding domain and a catalytic domain of a deaminase.
  • a linker can be located between or on either side of two groups, molecules, or other moieties and be attached to each via a covalent bond or non-covalent interaction, thereby connecting the two.
  • the linker can be a polynucleotide.
  • the linker can be a DNA linker.
  • the linker can be an RNA linker.
  • the linker can be an amino acid or multiple amino acids (e.g., a peptide or protein). In some embodiments, the linker can be about 5 to 100 amino acids in length, such as about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90 or 90 to 100 amino acids in length. In some embodiments, the linker can be about 100 to 150, 150 to 200, 200 to 250, 250 to 300, 300 to 350, 350 to 400, 400 to 450, or 450 to 500 amino acids in length. Longer or shorter linkers may also be considered.
  • cleavage may refer to the breaking of phosphodiester bonds in a nucleotide chain.
  • the type of break can be a single strand break or a double strand break.
  • the terms "complementary" or "hybridizing" are used in reference to "polynucleotides" and "oligonucleotides" (interchangeable terms that refer to a nucleotide sequence).
  • For example, the sequence 5'-CAGT-3' is complementary to the sequence 3'-GTCA-5'.
  • Complementarity can be "partial" or "total."
  • Partial complementarity means that one or more nucleic acid bases are mismatched according to the base pairing rules.
  • "Total" or "complete" complementarity between nucleic acids means that every nucleic acid base is matched with another base under the base pairing rules.
  • the degree of complementarity between nucleic acid strands has an important impact on the efficiency and strength of hybridization between nucleic acid strands. This is particularly important in amplification reactions and detection methods that depend on binding between nucleic acids.
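  • Partial versus total complementarity can be illustrated with a minimal Python sketch based on Watson-Crick pairing of DNA. Both input strands are assumed to be written 5' to 3' and to have equal length; the function names `reverse_complement` and `count_mismatches` are illustrative, and only the four standard DNA bases are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(strand_a: str, strand_b: str) -> int:
    """Count mismatched positions when two 5'->3' strands pair antiparallel.

    A result of 0 corresponds to total complementarity; any positive result
    corresponds to partial complementarity.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    partner = reverse_complement(strand_b)
    return sum(a != b for a, b in zip(strand_a.upper(), partner))


if __name__ == "__main__":
    # CAGT pairs with GTCA read 3'->5', i.e. ACTG when written 5'->3'.
    print(reverse_complement("CAGT"))
    print(count_mismatches("CAGT", "ACTG"))
```

  • Writing the partner strand 5' to 3' makes the antiparallel orientation explicit: CAGT pairs with GTCA only when GTCA is read 3' to 5'.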
  • hybridization refers to any process that uses a nucleic acid strand to combine with a complementary strand through base pairing to form a hybridization complex to pair complementary nucleic acids.
  • "nucleic acid sequence" and "nucleotide sequence" refer to an oligonucleotide or polynucleotide, or a fragment or portion thereof, and to DNA or RNA of genomic or synthetic origin, which may be single- or double-stranded and may represent the sense or antisense strand.
  • "sequence identity" and "percent identity" refer to the percentage of nucleotides or amino acids that are the same (i.e., identical) between two or more polynucleotides or polypeptides.
  • Sequence identity between two or more polynucleotides or polypeptides can be determined by aligning their nucleotide or amino acid sequences, scoring the number of positions in the aligned polynucleotides or polypeptides that contain identical nucleotides or amino acid residues, and comparing it to the number of positions that contain different nucleotides or amino acid residues.
  • Polynucleotides may differ at one position, for example, by containing different nucleotides (ie, substitutions or mutations) or missing nucleotides (ie, nucleotide insertions or nucleotide deletions in one or both polynucleotides).
  • Polypeptides may differ at one position, for example, by containing a different amino acid (ie, a substitution or mutation) or a missing amino acid (ie, an amino acid insertion or amino acid deletion in one or both polypeptides).
  • Sequence identity can be calculated by dividing the number of positions containing identical nucleotides or amino acid residues by the total number of nucleotides or amino acid residues in the polynucleotide or polypeptide. For example, percent identity can be calculated by dividing the number of positions containing the same nucleotide or amino acid residue by the total number of nucleotides or amino acid residues in the polynucleotide or polypeptide and multiplying by 100.
  • two or more sequences or subsequences have at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% "sequence identity" or "percent identity".
  • sequences are substantially identical throughout the entire length of either or both compared biopolymers (eg, polynucleotides).
  • vector refers to a means of introducing a nucleic acid sequence into a cell thereby producing a transformed cell.
  • Vectors include plasmids, transposons, phages, viruses, liposomes, and episomes.
  • An "expression vector” is a nucleic acid sequence containing a nucleotide sequence to be expressed in a recipient cell. Expression vectors may include additional nucleic acid sequences to facilitate and/or facilitate expression of introduced sequences, such as initiation, termination, enhancers, promoters, and secretion sequences.
  • mammals include, but are not limited to, domesticated animals (e.g., cows, sheep, cats, dogs, and horses), primates (e.g., humans and non-human primates such as monkeys), rabbits, and rodents (e.g., mice and rats). In particular, the individual is a human.
  • the methods disclosed herein may be performed in vitro, ex vivo, or in vivo, or the products may be present in in vitro, ex vivo, or in vivo forms.
  • in vitro refers to experiments using materials, biological substances, cells and/or tissues in laboratory conditions or culture media; whereas the term “in vivo” refers to experiments and procedures using intact multicellular organisms.
  • methods performed in vivo can be performed on non-human animals.
  • Ex vivo refers to an event that exists or occurs outside an organism, such as outside a human or animal body, such as an event that may exist or occur on tissue (eg, a whole organ) or cells taken from an organism.
  • the term "pharmaceutically acceptable carrier" refers to a pharmaceutically acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., a lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent-encapsulating material, involved in carrying or transporting a compound from one site of the body (e.g., the delivery site) to another site (e.g., an organ, tissue or body part).
  • a pharmaceutically acceptable carrier is "acceptable” in the sense of being compatible with the other ingredients of the formulation and not deleterious to the tissues of the subject (eg, physiologically compatible, sterile, physiological pH, etc.).
  • materials that can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethylcellulose, methylcellulose, ethylcellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerol, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl o
  • wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives and antioxidants may also be present in the formulations.
  • Terms such as "excipient," "pharmaceutically acceptable carrier," and the like are used interchangeably herein.
  • an effective amount refers to an amount of biologically active agent sufficient to elicit the desired biological response.
  • an effective amount of a base editor may refer to an amount of the base editor sufficient to induce mutation of a target site that is specifically bound and edited by the base editor.
  • the effective amount of a reagent, such as a base editor fusion protein, a deaminase, or a polynucleotide, can vary depending on various factors, for example on the desired biological response, on the specific allele, genome, or target site to be edited, on the cells or tissues targeted, and on the reagent used.
  • treatment means a clinical intervention, as described herein, intended to reverse or alleviate a disease or condition or one or more symptoms thereof, delay the onset of a disease or condition or one or more symptoms thereof, or inhibit the progression of a disease or condition or one or more of its symptoms.
  • treatment may be administered after one or more symptoms have developed and/or the disease has been diagnosed.
  • treatment may be administered in the absence of symptoms, for example, to prevent or delay the onset of symptoms or to inhibit the onset or progression of a disease.
  • the treatment may be administered to a susceptible individual prior to the onset of symptoms (eg, in view of a history of symptoms and/or in view of genetic or other susceptibility factors). Treatment can also be continued after symptoms have subsided, for example to prevent or delay their recurrence.
  • the content of the present invention includes the following contents:
  • an adenosine deaminase comprising one or more of the following sequences:
  • the present invention also relates to a variant of adenosine deaminase, which has changes at any one or more of the following amino acid positions relative to the amino acid sequence shown in SEQ ID NO: 1: 33, 35, 36, 46, 47, 48, 49, 104, 105, 107, 148, 149, 150, 151, 152, 153, 154, 155.
  • the deaminase variant includes a change in at least one amino acid position selected from the following relative to the amino acid sequence shown in SEQ ID NO: 1: V33I, D35G, D35R, D36N, D36G, A46C, I47Y, I47R, I47V, T48G, T48R, L49T, L49H, L49K, V104A, V104M, S105C, S105G, S107R, S107K, S107A, Q148P, Q148C, Q148A, Q148G, Q149M, Q149G, Q149L, P150R, P150L, P150C, R151K, E152R, E152T, E152G, E152W, V153P, V153F, V153I, V153T, F154H, F154K, F154L, N155T, N155R, N155H. The changes at specific positions can be made singly or in combinations of more than one. Specifically, the adenosine deaminase variant includes
  • the invention provides an adenosine deaminase variant that has any number of changes in amino acid positions relative to the amino acid sequence shown in SEQ ID NO: 170.
  • the substitution is a substitution that occurs at one or more of the following positions in the amino acid sequence shown in SEQ ID NO: 170:
  • the substitution is a substitution that occurs at one or more of the following positions of the amino acid sequence shown in SEQ ID NO: 170:
  • the substitution is a substitution that occurs at a combination of the following positions of the amino acid sequence shown in SEQ ID NO: 170:
  • N132K+V133I+L134F+N135H (adenosine deaminase 004V12);
  • V82T+L83Q (adenosine deaminase 004V33);
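Substitution notation such as V82T+L83Q reads as "original residue, 1-based position, new residue", with `+` joining the members of a combination. A minimal sketch of parsing that notation and applying it to a sequence is shown below. The helper names are hypothetical, and the four-residue sequence in the usage example is a toy string, not SEQ ID NO: 170.

```python
import re

# One substitution: original residue, 1-based position, replacement residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitutions(spec: str) -> list[tuple[str, int, str]]:
    """Split a combination such as 'V82T+L83Q' into (old, position, new)."""
    parsed = []
    for token in spec.split("+"):
        match = SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_substitutions(sequence: str, spec: str) -> str:
    """Apply each substitution, checking the original residue first."""
    residues = list(sequence)
    for old, position, new in parse_substitutions(spec):
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(f"expected {old} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


print(parse_substitutions("V82T+L83Q"))
# Toy sequence for illustration only.
print(apply_substitutions("MVLK", "V2T+L3Q"))
```

Checking the original residue before replacing it catches numbering mismatches between a variant name and the reference sequence it is defined against.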
  • The retained activity can be the full deamination activity of the adenosine deaminase having the original sequence, or part of that deamination activity, such as 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% of the deamination activity.
  • an adenosine deaminase having a modified sequence such as an adenosine deaminase having an amino acid substituted sequence, may also have a higher deamination activity than an adenosine deaminase of the original sequence.
  • the adenosine deaminase provided by the invention can act on any polynucleotide, including DNA, RNA and DNA-RNA hybrids.
  • an adenosine deaminase can deaminate the target adenine (A) of a polynucleotide comprising DNA.
  • an adenosine deaminase can deaminate the target adenine (A) of an RNA-containing polynucleotide.
  • the substitutions are conservative substitutions.
  • “moderate stringency conditions”, “medium-high stringency conditions”, “high stringency conditions” or “very high stringency conditions” describe nucleic acid hybridization and washing conditions. Instructions for conducting hybridization reactions are found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, which is incorporated herein by reference. Aqueous and non-aqueous methods are described in this document, and either can be used.
  • specific hybridization conditions are as follows: (1) low stringency hybridization conditions are in 6× sodium chloride/sodium citrate (SSC) at about 45°C, followed by two washes in 0.2× SSC, 0.1% SDS at at least 50°C (for low stringency conditions, the wash temperature can be increased to 55°C); (2) medium stringency hybridization conditions are in 6× SSC at about 45°C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 60°C; (3) high stringency hybridization conditions are in 6× SSC at about 45°C, followed by one or more washes in 0.2× SSC, 0.1% SDS at 65°C; and preferably (4) very high stringency hybridization conditions are 0.5 M sodium phosphate, 7% SDS at 65°C, followed by one or more washes in 0.2× SSC, 1% SDS at 65°C.
  • amino acid sequence of adenosine deaminase 005V1 is as follows:
  • amino acid sequence of adenosine deaminase 004V1 is as follows:
  • nucleotide sequence of adenosine deaminase 005V1 is as follows:
  • a second aspect of the present invention provides a base editor fusion protein, which includes the adenosine deaminase described in the first aspect of the present invention, and a nucleic acid programmable nucleotide binding domain.
  • a nucleic acid programmable nucleotide binding domain, when combined with a bound guide polynucleotide (e.g., gRNA), can specifically bind to a target polynucleotide sequence (i.e., through complementary base pairing between the bases of the bound guide nucleic acid and the bases of the target polynucleotide), thereby positioning the base editor at the target nucleic acid sequence that needs to be edited.
  • the target polynucleotide sequence comprises single-stranded DNA or double-stranded DNA.
  • the target polynucleotide sequence comprises RNA.
  • the target polynucleotide sequence comprises a DNA-RNA hybrid. It should be understood that nucleic acid programmable nucleotide binding domains may also include nucleic acid programmable proteins that bind RNA.
  • the nucleic acid programmable nucleotide binding domain in the base editor is Cas protein or AGO protein.
  • the Cas protein or AGO protein includes naturally occurring Cas protein or AGO protein, as well as homologs thereof or modified or engineered versions thereof.
  • the Cas protein or AGO protein may have an amino acid sequence that is at least 60%, at least 65%, at least 70%, at least 75%, or at least 80% identical to the amino acid sequence of a naturally occurring Cas protein or AGO protein.
  • the Cas protein also includes nickase or nuclease-inactive forms of the protein.
  • non-limiting examples of Cas proteins that can serve as nucleic acid programmable nucleotide binding domains include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (also known as Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12f (C2c10/Cas14), Cas12g, Cas12h, Cas12i, Cas12j, Cas12k/C2c5, Cas12l, Cas12m, Cas12n, Cas13a (C2c2), Cas13b, Cas13c, Cas13
  • the Cas protein is selected from the Cas9 family, Cas12 family and Cas13 family, such as but not limited to Cas9, Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (C2c10/Cas14), Cas12g, Cas12h, Cas12i, Cas12j, Cas12k (C2c5), Cas12l, Cas12m, Cas12n, Cas13a (C2c2), Cas13b, Cas13c, Cas13d, and homologs or modified or engineered versions thereof.
  • Cas proteins include nuclease-inactive forms of the above-mentioned Cas proteins, such as dCas9, dCas12a, dCas12b, dCas12c, dCas12d, dCas12e, dCas12f, dCas12g, dCas12h, dCas12i, dCas12j, dCas12k, dCas12l, dCas12m, dCas12n, dCas13a, dCas13b, dCas13c and dCas13d.
  • Cas proteins also include nickase forms of the above proteins, such as nCas9, but are not limited to nCas9.
  • the nucleic acid programmable nucleotide binding domain is Cas9.
  • Cas9 is Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9), or Cas9 from Streptococcus thermophilus 1 (St1Cas9).
  • Cas9 is Cas9 from Streptococcus pyogenes (SpCas9).
  • Cas9 can be nuclease active Cas9, Cas9 nickase (nCas9) or nuclease inactive Cas9 (dCas9).
  • the nucleic acid programmable nucleotide binding domain is Cas9 nickase (nCas9).
  • the nucleic acid programmable nucleotide binding domain comprises an amino acid sequence that is at least 80%, at least 85%, or at least 90% identical to the Cas9 nickase (nCas9) provided herein.
  • the adenosine deaminase and the nucleic acid programmable nucleotide binding domain are directly fused/connected to form a fusion protein, or fused/connected through a linker to form a fusion protein.
  • the order in which the adenosine deaminase and the nucleic acid programmable nucleotide binding domain are fused/connected is not particularly limited.
  • the adenosine deaminase can be located at the N-terminus of the base editor, or the nucleic acid programmable nucleotide binding domain can be located at the N-terminus of the base editor.
  • an exemplary base editor fusion protein has the following structure:
  • an exemplary base editor fusion protein has the following structure:
  • the domains of the nucleobase editor are fused through a linker comprising the following amino acid sequence:
  • the base editor also includes at least one nuclear localization signal sequence (NLS sequence), and the NLS sequence can be selected from the amino acid sequences in the following table:
  • the nuclear localization signal sequence can be located at the N-terminus, C-terminus, both ends of the base editor, or between adenosine deaminase and the nucleic acid programmable nucleotide binding domain. In some embodiments of the invention, the nuclear localization signal sequence can be fused to the base editor directly or through a linker.
  • An exemplary structure of a base editor containing a nuclear localization signal sequence is as follows:
  • the base editor has the following structure:
  • the base editor comprises one or more of the following sequences:
  • the amino acid sequence retains the polynucleotide-binding and base editing activity of the amino acid sequence shown in SEQ ID NO: 190 or SEQ ID NO: 3, and the stringent conditions are moderate stringency conditions, medium-high stringency conditions, high stringency conditions or very high stringency conditions.
  • the base editor sequences composed of adenosine deaminase 004V1 or 005V1 and nCas9 are as follows:
  • sequence in bold represents the sequence derived from nCas9; italics represent the linker sequence; the double underlined sequence represents the nuclear localization sequence; the single underlined sequence is the deaminase 004V1 sequence; the asterisk at the C terminus represents the stop codon position.
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the adenosine deaminase 005V1 sequence
  • the asterisk represents the position of the stop codon.
  • nucleotide sequence of 005V1-nCas9 is as follows:
  • the present invention also provides mutants obtained by amino acid substitution based on adenosine deaminase 004V1: adenosine deaminase 004V2, adenosine deaminase 004V3, adenosine deaminase 004V4, adenosine deaminase 004V7, adenosine deaminase 004V8, adenosine deaminase 004V10, adenosine deaminase 004V12, adenosine deaminase 004V13-004V41, the details are as follows:
  • amino acid sequence of 004V2-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the adenosine deaminase 004V2 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V3-nCas9 is as follows:
  • sequence in bold represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V3 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V4-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V4 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V7-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V7 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V8-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V8 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V10-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V10 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V12-nCas9 is as follows:
  • the bold sequence represents the sequence derived from nCas9
  • the italics represent the linker sequence
  • the double underlined sequence represents the nuclear localization sequence
  • the single underlined sequence is the deaminase 004V12 sequence
  • the asterisk represents the position of the stop codon.
  • amino acid sequence of 004V13-nCas9 is as follows:
  • sequence in bold represents the sequence derived from nCas9; italics represent the linker sequence; the double underlined sequence represents the nuclear localization sequence; the single underlined sequence is the adenosine deaminase 004V13 sequence; the asterisk at the C terminus represents the stop codon position.
  • the present invention also provides mutants obtained by introducing one or more amino acid substitutions into adenosine deaminase 005V1, including deaminases 005V1-10-3, 005V1-11-5, 005V1-10-1, 005V1-11-2, 005V1-15-1, 005V1-11-3, 005V1-10-4, 005V1-11-4, 005V1-15-2, 005V1-2-7, 005V1-1-1, 005V1-9-1, 005V1-9-2, 005V1-9-3, 005V1-9-4, 005V1-9-5, and 005V1-10-2.
  • By searching for suitable insertion sites for the adenosine deaminase within the nCas9 protein, we also obtained a variety of chimeric base editor fusion proteins with improved editing efficiency.
  • the adenosine deaminases used include the three adenosine deaminase sequences 005V1, 005V1-10-1, and 005V1-10-3.
  • in these base editor fusion proteins, the nucleic acid programmable nucleotide binding domain is the nCas9 domain, and the adenosine deaminase is inserted between amino acid positions 583-584, 768-769, 770-771, 776-777, 793-794, 905-906, 919-920, 1048-1063, 1049-1062, 1249-1250, 1263-1264, or 1276-1277 of SEQ ID NO: 61.
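Inserting a deaminase between two numbered residues of a host protein can be sketched as a string operation. The sketch below is illustrative only: the function name `insert_domain` is hypothetical, the sequences are toy strings rather than SEQ ID NO: 61, and linkers are omitted. For a non-adjacent pair such as 1048-1063, it assumes that the two named residues are kept and the residues between them are removed; this is one possible reading and is not confirmed by the text.

```python
def insert_domain(host: str, insert: str, left: int, right: int) -> str:
    """Place `insert` between 1-based host residues `left` and `right`.

    Both flanking residues are kept. Assumption: when the pair is not
    adjacent, the host residues strictly between them are dropped.
    """
    if not 1 <= left < right <= len(host):
        raise ValueError("positions must satisfy 1 <= left < right <= len(host)")
    return host[:left] + insert + host[right - 1:]


# Toy host and insert, for illustration only.
host = "ABCDEFGH"
print(insert_domain(host, "xx", 3, 4))  # adjacent pair: pure insertion
print(insert_domain(host, "xx", 2, 6))  # non-adjacent pair: C, D, E removed
```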
  • retaining the polynucleotide-binding and base editing activity of the amino acid sequence shown in SEQ ID NO: 190 or SEQ ID NO: 3 may mean fully retaining the polynucleotide-binding and base editing activity of a base editor having that amino acid sequence, or partially retaining that activity.
  • a base editor having a modified sequence may also have higher polynucleotide-binding and base editing activity than the base editor having the amino acid sequence set forth in SEQ ID NO: 190 or SEQ ID NO: 3.
  • the base editor fusion proteins of the present disclosure may contain one or more additional features.
  • the fusion protein can include an inhibitor, a cytoplasmic localization sequence, an export sequence, such as a nuclear export sequence or other localization sequence, and a tag that can be used to solubilize, purify, or detect the fusion.
  • Suitable tags include, but are not limited to, biotin carboxylase carrier protein (BCCP) tag, myc tag, calmodulin tag, FLAG tag, hemagglutinin (HA) tag, polyhistidine tag (also known as histidine tag or His-tag), maltose binding protein (MBP) tag, nus-tag, glutathione-S-transferase (GST) tag, green fluorescent protein (GFP) tag, thioredoxin tag, S-tag, Softags (e.g., Softag 1, Softag 3), Strep-tag, biotin ligase tag, FlAsH tag, V5 tag and SBP tag. Other suitable sequences will be apparent to those skilled in the art.
  • the fusion protein contains one or more His tags.
  • a third aspect of the present invention provides a polynucleotide encoding the adenosine deaminase described in the first aspect of the present invention or encoding the base editor fusion protein described in the second aspect of the present invention.
  • a fourth aspect of the present invention provides a vector comprising the polynucleotide described in the third aspect of the present invention.
  • the vector is a mammalian expression vector.
  • the expression vector is selected from one or more of adeno-associated virus, retroviral vector, adenoviral vector, lentiviral vector, Sendai virus vector and herpes virus vector.
  • the vector includes a promoter.
  • the fifth aspect of the present invention provides a cell comprising the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, and the polynucleotide described in the third aspect of the present invention.
  • the cell is a prokaryotic cell or a eukaryotic cell; further, the cell is a bacterial cell, a plant cell, an insect cell, a human cell or a mammalian cell.
  • a sixth aspect of the present invention provides a base editor system.
  • a base editor system includes an adenosine deaminase of the first aspect of the invention, a nucleic acid programmable nucleotide binding domain, and a guide polynucleotide.
  • a base editor system includes the base editor fusion protein of the second aspect of the invention, and a guide polynucleotide.
  • the guide polynucleotide is a guide RNA (gRNA), which is a short synthetic RNA that consists of a backbone sequence required for Cas protein binding and a user-defined spacer sequence of approximately 20 nucleotides that defines the genomic target to be modified.
  • the genomic target specificity of a Cas protein is determined in part by how specific the gRNA targeting sequence is for the genomic target compared to the rest of the genome.
  • the guide polynucleotide is an sgRNA, which consists of a backbone sequence required for Cas protein binding and a user-defined spacer sequence of approximately 20 nucleotides.
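The two-part sgRNA layout described above (a user-defined spacer followed by a Cas-binding backbone) can be sketched as a simple assembly step. This is a hedged illustration: the function name `build_sgrna`, the accepted spacer length range of 17 to 24 nucleotides, and the `[backbone]` placeholder are assumptions made for the example; the actual backbone sequence is the one given in this application and is not reproduced here.

```python
def build_sgrna(spacer: str, backbone: str,
                min_len: int = 17, max_len: int = 24) -> str:
    """Join a spacer (5' end) to a Cas-binding backbone (3' end).

    The spacer may be given as DNA or RNA; it is written out as RNA.
    The length bounds are illustrative assumptions around "about 20 nt".
    """
    spacer = spacer.upper().replace("T", "U")
    if not set(spacer) <= set("ACGU"):
        raise ValueError("spacer contains non-nucleotide characters")
    if not min_len <= len(spacer) <= max_len:
        raise ValueError(f"spacer length {len(spacer)} is outside {min_len}-{max_len}")
    return spacer + backbone


# Hypothetical 20-nt spacer and a placeholder for the backbone sequence.
print(build_sgrna("ACGTACGTACGTACGTACGT", "[backbone]"))
```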
  • the domain that binds the Cas9 protein (SpCas9) that is, the backbone sequence of the sgRNA is:
  • the seventh aspect of the present invention provides a pharmaceutical composition, which includes the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, and the third aspect of the present invention.
  • a pharmaceutical composition which includes the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, and the third aspect of the present invention.
  • the pharmaceutically acceptable carrier may be a delivery carrier, such as a lipid, a cationic lipid, or other polymers with drug delivery functions.
  • the eighth aspect of the present invention provides a kit, specifically a disease treatment kit, which includes one or more of the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, the polynucleotide according to the third aspect of the present invention, the expression vector according to the fourth aspect of the present invention, the cell according to the fifth aspect of the present invention, the base editor system according to the sixth aspect of the present invention, and the pharmaceutical composition according to the seventh aspect of the present invention.
  • the ninth aspect of the present invention provides a delivery system, which includes a delivery medium and one or more of the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the vector described in the fourth aspect of the present invention, the cell described in the fifth aspect of the present invention, and the base editor system described in the sixth aspect of the present invention.
  • the delivery medium can be nanoparticles, liposomes, exosomes, microvesicles, gene guns, cell-penetrating peptides, and the like.
  • a tenth aspect of the present invention provides the use of the adenosine deaminase described in the first aspect of the present invention in preparing a base editor or a base editor system.
  • the eleventh aspect of the present invention provides use of the adenosine deaminase described in the first aspect of the present invention, the base editor fusion protein described in the second aspect of the present invention, the polynucleotide described in the third aspect of the present invention, the expression vector according to the fourth aspect of the present invention, the cell according to the fifth aspect of the present invention, the base editor system according to the sixth aspect of the present invention, the pharmaceutical composition according to the seventh aspect of the present invention, or the delivery system according to the ninth aspect of the present invention in the preparation of a medicament for the treatment of diseases associated with or caused by point mutations.
  • the drug corrects the point mutation.
  • the point mutation is G to A and/or C to T.
  • diseases associated with or caused by point mutations include hypercholesterolemia, transthyretin amyloidosis, and beta-hemoglobinopathies; in other embodiments, non-limiting examples of the diseases include Meier-Gorlin syndrome; Seckel syndrome; Joubert syndrome; Leber congenital amaurosis; Charcot-Marie-Tooth disease, type 2; Usher syndrome, type 2C; spinocerebellar ataxia; long QT syndrome 2; Sjögren-Larsson syndrome; hereditary fructosuria; neuroblastoma; Kallmann syndrome 1; metachromatic leukodystrophy; Rett syndrome; amyotrophic lateral sclerosis type 10; and Li-Fraumeni syndrome.
  • a twelfth aspect of the present invention provides a method for base editing of nucleic acids, which includes the step of contacting the nucleic acid with the base editor system described in the sixth aspect of the present invention.
  • the nucleic acid is DNA. Further, the nucleic acid is double-stranded DNA.
  • the nucleic acid comprises a target sequence associated with a disease.
  • the target sequence contains a point mutation associated with a disease.
  • the target sequence comprises a G to A or C to T point mutation associated with a disease or disorder, and wherein deamination of the mutated A base results in a sequence not associated with the disease or disorder.
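The correction logic can be illustrated with a small sketch. Deamination of adenine yields inosine, which is read as guanine, so an adenine base edit behaves as an A-to-G change on the edited strand; a C-to-T mutation on one strand corresponds to a G-to-A mutation on the complementary strand and can therefore be corrected by editing that strand. The function names and the five-nucleotide sequences below are hypothetical, and the sketch ignores guide design, PAM requirements, and the editing window.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_a_to_g(seq: str, position: int) -> str:
    """Model adenine deamination at a 1-based position as A -> G."""
    if not 1 <= position <= len(seq):
        raise ValueError("position is outside the sequence")
    if seq[position - 1] != "A":
        raise ValueError(f"no adenine at position {position}")
    return seq[:position - 1] + "G" + seq[position:]


def correct_c_to_t(seq: str, position: int) -> str:
    """Correct a C-to-T mutation by editing the A on the opposite strand."""
    opposite = reverse_complement(seq)
    opposite_position = len(seq) - position + 1
    return reverse_complement(edit_a_to_g(opposite, opposite_position))


# Toy sequences: wild type ACGGT mutated to ACAGT (G-to-A at position 3).
print(edit_a_to_g("ACAGT", 3))
# Toy sequences: wild type ACCGT mutated to ACTGT (C-to-T at position 3).
print(correct_c_to_t("ACTGT", 3))
```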
  • the target sequence encodes a protein, and wherein the point mutation is in a codon and results in a change in the amino acid encoded by the mutated codon compared to a wild-type codon.
  • the target sequence is located at a splice site, and wherein the point mutation results in a splicing change in the mRNA transcript compared to a wild-type transcript.
  • the target sequence is located at the promoter of a gene, and wherein the point mutation results in increased gene expression.
  • the target sequence is located at the promoter of a gene, and wherein the point mutation results in reduced gene expression.
  • the nucleic acid is located within the genome of an organism.
  • the organism is a prokaryotic or eukaryotic organism or a vertebrate or mammal.
  • deamination of the mutated A base results in a change in the amino acid encoded by the mutated codon, or results in a codon encoding a wild-type amino acid, or results in a change in the mRNA transcript, or results in a wild-type mRNA transcript, or results in an increase or a decrease in gene expression.
  • the contacting is performed in vitro.
  • the contacting is performed in a subject.
  • the subject has been diagnosed with a disease or disorder.
  • the disease or disorder is associated with point mutations in the proprotein convertase subtilisin/kexin type 9 (PCSK9) gene.
  • the disease includes hypercholesterolemia, transthyretin amyloidosis, beta-hemoglobinopathies.
  • non-limiting examples of the disease include Meier-Gorlin syndrome; Seckel syndrome; Joubert syndrome; Leber congenital amaurosis; Charcot-Marie-Tooth disease, type 2; Usher syndrome, type 2C; spinocerebellar ataxia; long QT syndrome; Sjögren-Larsson syndrome; hereditary fructosuria; neuroblastoma; Kallmann syndrome; metachromatic leukodystrophy; Rett syndrome; amyotrophic lateral sclerosis type 10; and Li-Fraumeni syndrome.
  • a thirteenth aspect of the invention provides methods for treating a disease associated with or caused by a point mutation.
  • the methods provided comprise administering to a subject suffering from such a disease an effective amount of the base editor fusion protein of the second aspect of the present invention, the base editor system of the sixth aspect of the present invention, the pharmaceutical composition of the seventh aspect of the present invention, the kit according to the eighth aspect of the present invention, or the delivery system according to the ninth aspect of the present invention, which introduces a corrective point mutation or an inactivating mutation into a disease-related gene.
  • ABE8e is based on ABE7.10 (Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR.
  • amino acid sequence of ABE8e is as follows:
  • sequence in bold indicates the sequence derived from nCas9; italics indicate the linker sequence; the double underlined sequence indicates the nuclear localization sequence; the single underlined sequence is the ecTadA* deaminase sequence; the * at the end of the C terminus indicates the stop codon position.
  • ABE8e nucleotide sequence is as follows:
  • the napDNAbp used in ABE8e and the base editor containing the adenosine deaminase provided by the present invention is nCas9, and its amino acid sequence is as follows:
  • nCas9 nucleotide sequence is as follows:
  • the present invention improves the editing efficiency of the corresponding base editor, expands the application prospects, and has potential utilization value.
  • The editing efficiency of the base editor of the present invention at the PCSK9 target site and at other sites is significantly higher than that of the wild type, which gives it great potential in treating diseases related to or caused by point mutations (such as hypercholesterolemia, transthyretin amyloidosis, and beta-hemoglobinopathies).
  • adenosine deaminase and nuclease were used to form a base editor, and a suitable chimeric position for the adenosine deaminase of the present invention within the nCas9 protein was found, further improving the base editing efficiency.
  • Figure 1 is the 005V1-nCas9 plasmid map.
  • Figure 2 shows the editing efficiency of 005V1-nCas9 and various mutant base editors at the PCSK9 site.
  • the editing target site is the PCSK9 gene, and the cells used are 293T cells.
  • Figure 3 is a comparison of the editing efficiency of 005V1-nCas9 and various mutant base editors at the PCSK9 site. To clearly show the comparison of the editing efficiency of the mutant base editor with that of 005V1-nCas9, the editing efficiency value of 005V1-nCas9 was set to 1, and the editing efficiencies of other mutant base editors were calculated in equal proportions.
  • Figure 4 compares the editing efficiency of 005V1-nCas9 with that of mutant base editors, including 005V1-10-3-nCas9, 005V1-10-1-nCas9, 005V1-11-2-nCas9, 005V1-15-1-nCas9, 005V1-11-3-nCas9, 005V1-11-5-nCas9, 005V1-10-4-nCas9, 005V1-11-4-nCas9, 005V1-15-2-nCas9, 005V1-3-3-nCas9, 005V1-15-5-nCas9, 005V1-5-4-nCas9, 005V1-5-8-nCas9, 005V1-10-5-nCas9, 005V1-5-2-nCas9, 005V1-3-6-nCas
  • Figure 5 is a comparison of the editing efficiency of 005V1-nCas9 and some mutant base editors at different sites.
  • the different depths of the colors in the figure represent different editing efficiencies.
  • Names refer to the corresponding base editor.
  • Figure 6A and Figure 6B are respectively the PHK09 plasmid map (Figure 6A) and the 004V1-nCas9 structure diagram (Figure 6B).
  • Figure 7 shows the A·T to G·C editing efficiency of 004V1-nCas9 at site 1, in panels A to C.
  • Figure 7-A compares the editing efficiency of 004V1-nCas9 and ABE8e at site 1. The edited positions are the adenine deoxynucleotides at +3, +5, +7, and +8 from the 5' end of the sgRNA. Error bars represent the mean ± SEM; each group of samples has 3 biological replicates.
  • Figure 7-B shows the sequencing results at site 1 after transfection with base editor 004V1-nCas9.
  • Figure 7-C shows the sequencing results at site 1 after transfection with base editor ABE8e.
  • Figure 8 shows the A·T to G·C editing efficiency of 004V1-nCas9 at site 17, in panels A to C.
  • Panel A of Figure 8 compares the editing efficiency of 004V1-nCas9 and ABE8e at site 17. The edited positions are the adenine deoxynucleotides at +3, +4, +5, and +7 from the 5' end of the sgRNA. Error bars represent the mean ± SEM; each group of samples has 3 biological replicates. Panel B of Figure 8 shows the sequencing results at site 17 after transfection with base editor 004V1-nCas9; panel C of Figure 8 shows the sequencing results at site 17 after transfection with base editor ABE8e.
  • Figure 9 shows the A·T to G·C editing efficiency of 004V1-nCas9 at site 18, in panels A to C.
  • Figure 9-A compares the editing efficiency of p004V1-nCas9 and ABE8e at site 18. The edited positions are the adenine deoxynucleotides at +3, +5, +7, and +9 from the 5' end of the sgRNA. Error bars represent the mean ± SEM; each group of samples has 3 biological replicates.
  • Figure 9-B shows the sequencing results at site 18 after transfection with base editor 004V1-nCas9.
  • Figure 9-C shows the sequencing results at site 18 after transfection with base editor ABE8e.
  • Figure 10 shows the A·T to G·C editing efficiency of 004V1-nCas9 at the PCSK9 site, in panels A to C.
  • Figure 10-A compares the editing efficiency of 004V1-nCas9 and ABE8e at the PCSK9 site. The edited position is the adenine deoxynucleotide at +6 from the 5' end of the sgRNA. Error bars represent the mean ± SEM; each group of samples has 3 biological replicates.
  • Figure 10-B shows the sequencing results of the PCSK9 site after transfection with base editor 004V1-nCas9.
  • Figure 10-C shows the sequencing results of the PCSK9 site after transfection with base editor ABE8e.
  • Figure 11 shows the editing efficiency of various mutants of adenosine deaminase 004V1-nCas9 and ABE8e at different sites.
  • Figure 12 shows the editing efficiency of base editors composed of other mutants of adenosine deaminase 004V1 at different sites.
  • Figure 13 shows the base editing efficiency at site 1 (site1) of some chimeric base editors and of base editors in which the deaminase is connected to the N-terminus or C-terminus of nCas9.
  • Figure 14 shows the base editing efficiency at the PCSK9 site of some chimeric base editors and of base editors in which the deaminase is connected to the N-terminus or C-terminus of nCas9.
  • Figure 15 shows the base editing efficiency at the FANCF site of some chimeric base editors and of base editors in which the deaminase is connected to the N-terminus or C-terminus of nCas9.
  • Example 1 Obtaining base editors composed of adenosine deaminase 005V1 and various mutants
  • The applicant used bioinformatics tools to predict key amino acid sites that may affect the biological function of adenosine deaminase 005V1, mutated those sites, and obtained base editors with significantly higher editing efficiency than the base editor composed of adenosine deaminase 005V1.
  • the specific site amino acid mutation patterns of 005V1 deaminase are shown in Tables 1a and 1b, and the mutation patterns of each mutant are shown in Table 2.
  • Base editors of different deaminase variants were generated through PCR-based site-directed mutagenesis.
  • the specific method was to amplify the DNA sequence encoding the 005V1-nCas9 base editor centered on 4-6 amino acids near the mutation site.
  • the sequence to be mutated is introduced on the primer, and different mutant base editors are obtained by homologous recombination or enzyme digestion ligation of the amplified fragments.
  • the PCR primers involved in the mutant base editor are shown in Tables 3-1, 3-2, and 3-3.
  • the specific method is as follows:
  • the applicant designed two mutation primers near the mutation site, and inserted the sequence to be mutated into the mutation primers.
  • the mutation primers used are shown in Tables 2 and 3.
  • 2× Phanta Flash Master Mix (Vazyme, P520) was used to amplify the full-length DNA sequence of the plasmid (according to the manufacturer's instructions). After amplification, DpnI (NEB, R1076L) was used to digest the remaining original template, and then an enzyme (NEB, R3733L) and T4 DNA ligase (NEB, M0202L) were used to digest the fragment ends and ligate the target fragments. After transformation, clones were sent to Boshang Biotechnology Co., Ltd. for sequencing. The sequencing results showed that the different 005V1-nCas9 mutants with correct sequences had been obtained.
  • sgRNA expression vector sgRNA plasmid
  • PCSK9-sgRNA was designed for the PCSK9 target, and site1-sgRNA, site8-sgRNA, site16-sgRNA, and site18-sgRNA were designed for the site1 (site 1), site8 (site 8), site16 (site 16), and site18 (site 18) targets (see Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR. Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature. 2017 Nov 23;551(7681):464-471. doi:10.1038/nature24644. Epub 2017 Oct 25. Erratum in: Nature. 2018 May 2; PMID: 29160308; PMCID: PMC5726555). The specific sgRNAs are shown in Table 4.
  • Oligonucleotides for each sgRNA were synthesized based on the target sequence.
  • The sgRNA sequences used are shown in SEQ ID NO: 141-145. A CACC sequence is added to the 5' end of the upstream sequence of each sgRNA, and an AAAC sequence is added to the 5' end of the downstream sequence. The upstream sequence format of each sgRNA used for synthesis is therefore 5'-CACCXXXXXXXXXXXXXXXXXXXX (20nt)-3', and the downstream sequence format is 5'-AAACXXXXXXXXXXXXXXXXXXXX (20nt)-3'.
  • the upstream and downstream sequences were annealed through the preset program (95°C, 5min; 95°C-85°C at -2°C/s; 85°C-25°C at -0.1°C/s; maintained at 4°C).
  • the annealed product was connected to the PHK09 vector linearized by BsmBI (NEB: R0739L) (the plasmid map is shown in Figure 6A, owned by the laboratory, which already contains the backbone sequence of sgRNA).
  • the sequence of the PHK09 vector is as follows:
  • the system used in sgRNA plasmid construction is as follows:
  • The linearization system for the PHK09 vector is as follows: 3 μg of PHK09 vector; 6 μL of buffer (NEB: R0539L); 2 μL of BsmBI; add ddH2O to 60 μL, and digest overnight at 37°C.
  • The ligation system for the sgRNA annealing product and the linearized vector is as follows: 1 μL of T4 ligase buffer (NEB: M0202L), 20 ng of linearized vector, 5 μL of annealed oligo fragment (10 μM), 0.5 μL of T4 ligase (NEB: M0202L), and ddH2O to make up to 10 μL; ligate overnight at 16°C.
  • The ligated vector was transformed into Escherichia coli DH5α competent cells (Videobio, DL1001).
  • The specific process is as follows: take the DH5α competent cells out of -80°C and quickly place them on ice. After about 5 minutes, once the cell pellet has thawed, add the ligation product, mix gently by flicking the bottom of the centrifuge tube, and leave on ice for 25 minutes. Heat shock in a 42°C water bath for 45 seconds, then quickly return to ice and let stand for 2 minutes. Add 700 μl of sterile antibiotic-free LB medium to the centrifuge tube, mix well, and recover at 37°C and 200 rpm for 60 minutes. Centrifuge at 5000 rpm for one minute to collect the bacteria.
  • HEK293T cells (purchased from ATCC) were seeded in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin-Streptomycin (v/v) (Gibco, 15140122), and cultured in a 37°C cell culture incubator with 5% CO2.
  • the cells used for transfection were seeded in a 24-well cell culture plate the day before and cultured. The cells were observed the next day and transfected when the cells grew to a cell density of about 80%.
  • The amount of plasmid used for transfection in each well of the 24-well plate is 0.4 μg of the 005V1-nCas9 plasmid or of each 005V1 mutant-nCas9 plasmid, and 0.4 μg of sgRNA plasmid.
  • The plasmids were mixed and diluted with 25 μl of reduced-serum medium (Yuanbai Biotechnology, L530KJ), then 2 μl of P3000 reagent was added and mixed by pipetting to give reagent A, which was left to stand for 5 minutes.
  • The system used for target site sequence amplification is as follows: 2× Taq Master Mix (Vazyme, P112-03) 25 μL; Primer-F (10 pmol/μL) 1 μL; Primer-R (10 pmol/μL) 1 μL; template 1 μL; ddH2O to make up to 50 μL.
  • Gene editing effect testing:
  • the base editor structure including adenosine deaminase and nCas9 provided by the invention is as follows:
  • This example counts the editing efficiency of each base editor on the PCSK9 target (the efficiency of mutating adenine A to guanine G).
  • The editing efficiencies of the base editors were measured and compared. The editing efficiencies of 005V1-nCas9 and of the base editors composed of each deaminase mutant are shown in Tables 6-9:
  • Figure 2 shows the A·T to G·C editing efficiency of 005V1-nCas9 and 005V1 mutant-nCas9 at the PCSK9 site.
  • The edited position is the adenine deoxynucleotide at +6 from the 5' end of the sgRNA. Error bars represent the mean ± SEM; each group of samples had 3 to 5 biological replicates.
  • the editing efficiency of 005V1-10-3-nCas9 can reach 46.67%.
  • the editing efficiency of 005V1-11-2-nCas9 and 005V1-10-1-nCas9 also reached 35.67% and 35.33%.
  • the editing efficiency of the base editors composed of most of the other mutants in the figure is also higher than that of 005V1-nCas9.
  • For 005V1-5-4-nCas9, 005V1-3-3-nCas9, 005V1-5-2-nCas9, 005V1-2-7-nCas9, 005V1-3-6-nCas9, 005V1-2-8-nCas9, 005V1-5-3-nCas9, and 005V1-3-4-nCas9, the editing windows at site1 (+5, +7), site8 (+2, +3, +4, +6), and site16 (+3, +4, +6) are significantly narrowed. Although these base editors narrow the editing window at site1, the adenine deoxynucleotide at position +5 from the 5' end of the sgRNA within the window still maintains extremely high editing activity, which enables efficient and precise editing and has great potential application value.
  • adenosine deaminase 004V1 is used and fused with nCas9 to form a new adenine base editor 004V1-nCas9 with a narrower editing window.
  • the construction strategy of adenine base editor 004V1-nCas9 is to replace the adenosine deaminase in ABE8e with 004V1, thereby obtaining a new adenine base editor 004V1-nCas9.
  • the base editor structure including adenosine deaminase and nCas9 provided by the invention is as follows:
  • the sgRNA used is shown in Table 10, and the identification primer sequences used are shown in Table 11.
  • The specific construction of the sgRNA and base editor is as follows.
  • sgRNA expression vector sgRNA plasmid
  • the adenine base editor expression vector 004V1-nCas9 was prepared.
  • the nucleotide sequence of adenosine deaminase 004V1 is shown in SEQ ID NO:156.
  • the nucleotide sequence of the above-mentioned deaminase 004V1 has been codon optimized according to the preference of human codon usage.
  • Sangon Bioengineering (Shanghai) Co., Ltd. was entrusted to synthesize the 540 bp deaminase 004V1 gene, and the synthesized gene replaced nucleotides 63-560 of the ABE8e sequence.
  • the 004V1-nCas9 plasmid map is shown in Figure 6B. (Correspondingly, the amino acid sequence of 004V1-nCas9 is shown in SEQ ID NO: 190.)
  • nucleotide sequence of 004V1-nCas9 is as follows:
  • HEK293T cells (purchased from ATCC) were seeded in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin-Streptomycin (v/v) (Gibco, 15140122), and cultured in a 37°C cell culture incubator with 5% CO2.
  • the cells used for transfection were seeded in a 24-well cell culture plate the day before and cultured. The cells were observed the next day and transfected when the cells grew to a cell density of about 80%.
  • The amounts of plasmid used for transfection in each well of the 24-well plate were 0.4 μg of p004V1-nCas9 plasmid and 0.4 μg of sgRNA plasmid.
  • The plasmids were mixed and diluted with 25 μl of reduced-serum medium (Yuanbai Biotechnology, L530KJ), then 2 μl of P3000 reagent was added and mixed by pipetting to give reagent A, which was left to stand for 5 minutes.
  • The system used for target site sequence amplification is as follows: 2× Taq Master Mix (Vazyme, P112-03) 25 μL; Primer-F (10 pmol/μL) 1 μL; Primer-R (10 pmol/μL) 1 μL; template 1 μL; ddH2O to make up to 50 μL.
  • The 004V1-nCas9 plasmid was co-transfected with sgRNA plasmids for different sites into HEK293T cells (purchased from ATCC). Compared with the ABE8e plasmid (Addgene, Plasmid #138489), the editing efficiency of 004V1-nCas9 at site 1 and site 17 was similar to that of ABE8e (Figure 7, Figure 8), while the editing efficiency of 004V1-nCas9 at site 18 was significantly better than that of ABE8e (Figure 9).
  • the editing windows of 004V1-nCas9 at site 1, site 17, and site 18 are all smaller than ABE8e (see Figures 7 to 9).
  • Example 3 the adenine base editor 004V1-nCas9 obtained in Example 3 is applied to disease treatment.
  • PCSK9: proprotein convertase subtilisin/kexin type 9.
  • LDLR: low-density lipoprotein receptor.
  • Excess PCSK9 accelerates the degradation of LDLR after binding to it on the surface of hepatocytes, which reduces the ability of hepatocytes to clear low-density lipoprotein cholesterol (LDL-C).
  • sgRNA expression vector sgRNA plasmid
  • the sgRNA plasmid targeting PCSK9 was constructed as described in Example 1.
  • the sgRNA sequence used is shown in SEQ ID NO:141.
  • PCSK9-sgRNA: cccgcaccttggcgcagcgg (SEQ ID NO: 141).
  • The editing efficiency detection method in this example is the same as that described in Example 1, and the identification primer sequences used are shown in SEQ ID NO: 146-147.
  • PCSK9 forward primer: gctagccttgcgttccg (SEQ ID NO: 146);
  • PCSK9 reverse primer: gtccccaagatcgtgccaa (SEQ ID NO: 147).
  • the adenine base editor expression vector p004V1-nCas9 and the PCSK9-targeting sgRNA plasmid were co-transfected into HEK293T cells.
  • the base editor 004V1-nCas9 can target hypercholesterolemia caused by high expression of PCSK9.
  • This example uses the mutant adenosine deaminases 004V1, 004V2, 004V3, 004V4, 004V7, 004V8, 004V10, 004V12, and 004V13. Adenine base editor expression vectors and sgRNA expression vectors were constructed by the same method as in Example 3 or Example 4, followed by cell culture, transfection, and testing of editing efficiency.
  • amino acid sequence information of the base editor composed of the above mutants is as follows:
  • the base editing efficiency results are shown in Figure 11.
  • As shown in Figure 11, compared to ABE8e, 004V2-nCas9, 004V3-nCas9, 004V4-nCas9, 004V7-nCas9, 004V8-nCas9, 004V10-nCas9, 004V12-nCas9, and 004V13-nCas9 have clearly narrowed editing windows at site 1 (+5, +7). The editing window of 004V12-nCas9 is the narrowest: it has editing activity only at position +5, with a base editing efficiency of 46%, which enables efficient and precise editing.
  • The nucleotide sequences of the base editors composed of each adenosine deaminase are as follows:
  • Some other amino acid positions of adenosine deaminase 004V1 were also mutated, resulting in adenosine deaminase 004V14, adenosine deaminase 004V15, adenosine deaminase 004V16, adenosine deaminase 004V17, adenosine deaminase 004V18, adenosine deaminase 004V19, adenosine deaminase 004V20, adenosine deaminase 004V21, adenosine deaminase 004V22, adenosine deaminase 004V23, adenosine deaminase 004V24, a
  • The adenine base editor expression vectors composed of adenosine deaminases 004V14-004V41 and the PCSK9-sgRNA expression vector were constructed by the same method as in Examples 2-4, and cell culture, transfection, and gene editing effect testing were carried out by the same methods as in the above examples, as follows:
  • the base editor composed of deaminase 004V14-004V32 and nCas9 has higher editing efficiency at the PCSK9 site than 004V1-nCas9.
  • the editing efficiency of the base editor composed of adenosine deaminase 004V14-004V29 and nCas9 is significantly improved compared to 004V1-nCas9.
  • The editing efficiency of 004V17-nCas9 to 004V29-nCas9 is improved by more than 20% compared to 004V1-nCas9.
  • the editing efficiency of 004V14-nCas9, 004V15-nCas9, and 004V16-nCas9 is more than 2 times that of 004V1-nCas9.
  • The adenosine deaminases used include adenosine deaminase 005V1, 005V1-10-1, and 005V1-10-3.
  • See Example 1 for the sequences of the above three adenosine deaminases.
  • the experimental steps are as follows:
  • the amplified PCR products were recovered according to the instructions of the kit (Tiangen, universal DNA purification and recovery kit, DP214).
  • The purified PCR product was transformed into Escherichia coli DH5α competent cells (Videobio, DL1001).
  • The specific process is as follows: take the DH5α competent cells out of -80°C and quickly place them on ice. After about 5 minutes, once the cell pellet has thawed, add the ligation product, mix gently by flicking the bottom of the centrifuge tube, and leave on ice for 25 minutes. Heat shock in a 42°C water bath for 45 seconds, then quickly return to ice and let stand for 2 minutes. Add 700 μl of sterile antibiotic-free LB medium to the centrifuge tube, mix well, and recover at 37°C and 200 rpm for 60 minutes. Centrifuge at 5000 rpm for one minute to collect the bacteria.
  • The primer sequences of nCas9 are shown in Table 18.
  • the primer sequences corresponding to nCas9 plasmids corresponding to the different chimeric base editors described in Table 18 are designed as follows:
  • the primer sequences in Table 19 are used to amplify the nCas9 plasmid obtained in step (1).
  • the amplification system and PCR procedures are the same as Tables 15 and 16.
  • the amplified PCR products are recovered according to the kit instructions (Tiangen, Universal DNA purification and recovery kit, DP214).
  • the primers used for nCas9 plasmid amplification corresponding to different insertion positions are shown in Table 19.
  • the amino acid sequence of 005V1-1249-ABE is as follows:
  • italics represent NLS, bold represents nCas9 fragment, underline represents linker, double underline represents deaminase, and the asterisk at the C end represents the stop codon position; the deaminase sequence is deaminase 005V1, and the deaminase 005V1 is embedded in nCas9 at position 1249-1250.
  • The homologous recombination product was transformed into Escherichia coli DH5α competent cells (Videobio, DL1001).
  • The specific process is as follows: take the DH5α competent cells out of -80°C and quickly place them on ice. After about 5 minutes, once the cell pellet has thawed, add the ligation product, mix gently by flicking the bottom of the centrifuge tube, and let stand on ice for 25 minutes. Heat shock in a 42°C water bath for 45 seconds, then quickly return to ice and let stand for 2 minutes. Add 700 μl of sterile antibiotic-free LB medium to the centrifuge tube, mix well, and recover at 37°C and 200 rpm for 60 minutes.
  • HEK293T cells (purchased from ATCC) were seeded in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin-Streptomycin (v/v) (Gibco, 15140122), and cultured in a 37°C cell culture incubator with 5% CO2.
  • The cells used for transfection were seeded in a 24-well cell culture plate the day before and cultured. The cells were observed the next day and transfected when they had grown to a density of about 80%.
  • The amount of plasmid transfected into each well of the 24-well plate is 0.4 μg of chimeric recombinant plasmid and 0.4 μg of sgRNA plasmid.
  • the sgRNA used is shown in the table below:
  • PCSK9-sgRNA and site1-sgRNA sequences used in this example are the same as the corresponding sequences in other examples, and FANCF-sgRNA is a sgRNA targeting the FANCF gene.
  • Genomic DNA extraction kit (TIANGEN, DP304-03) was used to extract the genome of HEK293T cells. Design primers according to experimental requirements. The used identification primer sequences are shown in Table 21.
  • PCR amplification of sequences near the sgRNA target site was performed using identification primers, and the amplified PCR products were identified by Sanger sequencing (BioShang Biotechnology (Shanghai) Co., Ltd.) for editing efficiency.
  • the system used for target site sequence amplification is as follows:
  • The relevant results of this example are shown in Figures 13-15.
  • The base editing efficiency of each chimeric base editor and of the base editors with the deaminase connected to the N-terminus or C-terminus of nCas9 is shown in Figure 13. At the site1 site, the base editors 005V1-C-ABE, 005V1-10-1-C-ABE, and 005V1-10-3-C-ABE have no editing activity.
  • The editing efficiency of 005V1-1047-1064-ABE, 005V1-1048-1063-ABE, and 005V1-1249-ABE at the A5 position of the site1 site is significantly higher than that of 005V1-N-ABE; in particular, the editing efficiency of 005V1-1249-ABE is nearly 2 times that of 005V1-N-ABE.
  • The base editing efficiency of 005V1-776-ABE, 005V1-793-ABE, 005V1-905-ABE, and 005V1-919-ABE at specific positions of site1 is higher than that of 005V1-N-ABE. This can be used to achieve editing at specific gene positions.
  • The base editing efficiency of each chimeric base editor and of the base editors with the deaminase connected to the N-terminus or C-terminus of nCas9 is shown in Figure 14. At the PCSK9 site, the editing efficiency of 005V1-1249-ABE is higher than that of 005V1-N-ABE, the editing efficiency of 005V1-10-1-1249-ABE is higher than that of 005V1-10-1-N-ABE, and the editing efficiency of 005V1-10-3-1249-ABE is higher than that of 005V1-10-3-N-ABE.
  • 005V1-10-3-1249-ABE shows similar characteristics to 005V1-10-1-1249-ABE at this site: it shows editing activity at the A10 and A12 positions, where 005V1-10-3-N-ABE has none, while the editing activity of 005V1-10-3-1249-ABE at other positions is not much different from that of 005V1-10-3-N-ABE.
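
The sgRNA oligo format described in Example 1 (a CACC prefix on the upstream oligo and an AAAC prefix on the downstream oligo, each followed by 20 nt) can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's tooling: the function names are hypothetical, and the sketch assumes that the downstream oligo carries the reverse complement of the spacer so that the two oligos can anneal, which the text implies but does not state. The spacer used is the PCSK9-sgRNA sequence given as SEQ ID NO: 141.

```python
# Hypothetical helper names; a sketch of the CACC/AAAC oligo format only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_sgrna_oligos(spacer: str) -> tuple[str, str]:
    """Build the upstream and downstream oligos for a 20-nt spacer."""
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt DNA spacer")
    upstream = "CACC" + spacer
    # Assumption: the downstream oligo is the reverse complement of the
    # spacer, so that the annealed duplex has CACC/AAAC 5' overhangs.
    downstream = "AAAC" + reverse_complement(spacer)
    return upstream, downstream


up, down = design_sgrna_oligos("cccgcaccttggcgcagcgg")  # SEQ ID NO: 141
print("upstream  5'-" + up + "-3'")
print("downstream 5'-" + down + "-3'")
```

Both oligos come out 24 nt long; the 20-nt portions are complementary to each other, leaving the 4-nt prefixes as single-stranded overhangs for ligation into the BsmBI-linearized vector.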

Abstract

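The present invention belongs to the field of biotechnology. It provides a deaminase and an adenine base editor, as well as mutated deaminases and the corresponding adenine base editors. Compared with the parent deaminase, the mutated deaminases carry mutations at multiple amino acids, which improves base editing efficiency and gives them good application prospects.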

Description

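Adenosine deaminase, base editor and use thereof
Technical Field
The present invention belongs to the field of biotechnology. More specifically, the present invention relates to an adenosine deaminase, a base editor fusion protein, a base editor system, and uses thereof.
Background Art
Modifying the genome precisely and efficiently is an important goal of life science research, and gene editing technology mediated by CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 has become the most powerful tool for achieving it. Conventional CRISPR/Cas9 technology generates DNA double-strand breaks (DSBs) at the target site, which trigger the homologous recombination (HR) and non-homologous end joining (NHEJ) repair pathways in the cell and thereby enable site-specific knockout, replacement, insertion, and other modifications of genomic DNA. However, DSB-triggered DNA repair rarely achieves efficient and stable single-base mutation. Single-nucleotide variants cause approximately two-thirds of human genetic diseases and are also the genetic basis for variation in many important traits of animals and plants, so it is particularly important to develop a technology that achieves single-base substitution precisely and efficiently. The base editors developed by the laboratory of David R. Liu were created for this purpose. The David R. Liu laboratory has developed three different base editors: the cytosine base editor (CBE), the adenine base editor (ABE), and the prime editor. These base editors do not rely on the generation of DSBs and do not require donor DNA.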
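Adenine base editing technology based on adenosine deaminase mainly uses a fusion protein composed of the nickase Cas9n (D10A) or dCas9 combined with an adenosine deaminase. Guided by the sgRNA, the fusion protein deaminates the target adenine (A) located within the base editing activity window to form hypoxanthine (I), which is then gradually replaced by G through DNA repair and replication, ultimately producing a directed A-to-G substitution.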
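The notion of an editing activity window can be made concrete with a small Python sketch. It is a deliberately simplified model, not the applicant's analysis: every adenine inside the window is converted to G, positions are counted from the 5' end of the protospacer (the sequence matching the sgRNA spacer), and the default window of positions 4 to 8 is an assumption chosen for illustration. The function name is hypothetical. The example sequence is the PCSK9-sgRNA spacer given elsewhere in the document (SEQ ID NO: 141), whose adenines sit at positions +6 and +16.

```python
def simulate_abe_edit(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Convert every A inside the 1-based, inclusive window to G.

    Simplified model: real editing is partial and site-dependent.
    """
    first, last = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "A" and first <= pos <= last:
            edited.append("G")  # A -> I, read as G after repair/replication
        else:
            edited.append(base)
    return "".join(edited)


pcsk9_spacer = "cccgcaccttggcgcagcgg"  # SEQ ID NO: 141
print(simulate_abe_edit(pcsk9_spacer))           # narrow window (assumed 4-8)
print(simulate_abe_edit(pcsk9_spacer, (1, 20)))  # window spanning all 20 nt
```

With the narrow window only the adenine at +6 changes, while the full-length window also converts the adenine at +16, which illustrates why a wide window can produce unintended edits.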
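Base editors can also treat certain diseases by editing gene targets, for example hypercholesterolemia, transthyretin amyloidosis, and β-hemoglobinopathies. Optimizing existing base editors therefore also offers better prospects for clinical application.
However, current base editors show site-dependent differences in editing efficiency, and their relatively wide editing windows lead to unnecessary edits, so it is necessary to modify existing base editors.
Summary of the Invention
The present invention discloses deaminases, base editors comprising them, and uses thereof. Through experimentation and exploration, site-directed mutagenesis was performed on the base deaminase sequence to obtain adenine base editors with improved editing efficiency in eukaryotic cells. Extensive experiments were also performed on the structure of the base editor fusion protein, identifying positions within the nuclease domain that are suitable for embedding the adenosine deaminase and yielding base editor fusion protein structures with improved editing efficiency and thus better application prospects.
Definitions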
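Those skilled in the art are well aware that a protein may undergo amino acid changes, such as substitutions, deletions, or additions, and that the resulting protein can retain its function or activity.
The term "substitution" means that an amino acid residue at a given position in an amino acid sequence is replaced by another amino acid residue; the "substitution" may be a conservative amino acid substitution.
"Conservative modification", "conservative substitution", or "conservative replacement" refers to replacing an amino acid in a protein with another amino acid having similar characteristics (such as charge, side-chain size, hydrophobicity/hydrophilicity, backbone conformation, and rigidity), such that changes can frequently be made without altering the biological activity of the protein.
Those skilled in the art know that, in general, a single amino acid substitution in a non-essential region of a polypeptide does not substantially alter biological activity (see Watson et al. (1987), Molecular Biology of the Gene, The Benjamin/Cummings Pub. Co., p. 224 (4th edition)). In addition, substitutions of structurally or functionally similar amino acids are unlikely to disrupt biological activity. Exemplary conservative substitutions are set out under "Exemplary conservative amino acid substitutions".
Exemplary conservative amino acid substitutions

As a further example, amino acids with similar characteristics are shown below:
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art to which the present invention belongs. As used herein, the following terms have the meanings ascribed to them below unless otherwise specified.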
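"Polynucleotide" and "nucleic acid" refer to a polymeric form of nucleotides (for example, RNA or DNA) of any length. The terms include, but are not limited to, single-stranded, double-stranded, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, and polymers comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
"A DNA sequence encoding a particular RNA" refers to a DNA nucleotide sequence that is transcribed into RNA.
"Regulatory element", "DNA regulatory sequence", or "control element" refers to transcriptional and translational regulatory sequences, such as promoters, enhancers, and terminators.
A "promoter" (also called a promoter sequence) is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence. In general, various promoters, including inducible promoters, can be used to drive expression of the various vectors of the present invention.
"Codon optimization" refers to modifying a nucleic acid sequence so that it is better expressed in a host cell, generally by replacing at least one codon of the original nucleic acid sequence (for example, one or more, such as 10, 20, or more) with a codon that is used more frequently or most frequently in the genes of that host cell, while the expressed product retains the native amino acid sequence.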
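The definition of codon optimization above can be illustrated with a short Python sketch that swaps each codon for a single "preferred" synonymous codon and then checks that the encoded amino acid sequence is unchanged. The preferred-codon table below is an illustrative assumption (one commonly cited high-frequency human codon per amino acid), not the table used by the applicant or by any gene synthesis vendor, and the function names are hypothetical. Practical optimizers also weigh GC content, repeats, and restriction sites, which this sketch ignores.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative assumption: one "preferred" codon per amino acid.
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG", "*": "TGA",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def codon_optimize(dna: str) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED[aa] for aa in translate(dna))


original = "ATGTTATCGGGTAAATAA"  # toy open reading frame, not from the patent
optimized = codon_optimize(original)
print(original, "->", optimized)
print(translate(original) == translate(optimized))
```

The final comparison is the property the definition insists on: the nucleotide sequence changes, but the translated protein does not.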
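"Naturally occurring" (also called unmodified or wild-type (wt)) refers to a nucleic acid, polypeptide, cell, or organism that exists in nature. For example, a polypeptide or polynucleotide sequence that is present in an organism and can be isolated from a source in nature is naturally occurring.
As used in the present disclosure, the terms "polypeptide", "peptide", and "protein" are used interchangeably herein and refer to polymers of amino acids of any length. The polymer may be linear or branched, may comprise modified amino acids, and may be interrupted by non-amino acids. The terms also include amino acid polymers that have been modified (for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component). "Fusion protein" refers to a hybrid polypeptide comprising protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion or the carboxy-terminal (C-terminal) portion of the fusion protein, forming an amino-terminal fusion protein or a carboxy-terminal fusion protein, respectively.
A "CRISPR-Cas system" (also called a "CRISPR system" or "CRISPR/Cas system") generally comprises transcripts or other elements involved in the expression of CRISPR-associated ("Cas") genes, or transcripts or other elements capable of directing the activity of said Cas genes. In some embodiments, the components of a CRISPR system may comprise nucleic acids (for example, vectors) encoding one or more components of the system, components in protein form, or a combination thereof.
"Cas protein" refers to a CRISPR-associated protein (also called a "CRISPR effector", "effector", "Cas enzyme", or "CRISPR enzyme"), that is, a protein that carries out enzymatic activity and/or binds a target site on a nucleic acid specified by an RNA guide. In some embodiments, the Cas protein has endonuclease activity, nickase activity, exonuclease activity, transposase activity, and/or excision activity; in other embodiments, the Cas protein may be nuclease-inactive or partially inactive.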
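"Guide RNA" (also called gRNA or crRNA) refers to any RNA molecule that facilitates targeting of a Cas protein to a target nucleic acid (such as DNA and/or RNA). "Guide RNA" includes one or more guide RNAs and equivalents thereof known to those skilled in the art, including but not limited to RNA-based molecules capable of forming a complex with a Cas protein (for example, a direct repeat (DR) sequence), and comprises a sequence (for example, a spacer) that has sufficient complementarity to the target nucleic acid sequence to hybridize with it and direct sequence-specific binding of the complex to the target nucleic acid sequence.
As used in the present disclosure, the term "crRNA" comprises a repeat sequence and a spacer sequence. The CRISPR locus is transcribed into a long pre-CRISPR RNA (pre-crRNA), which is processed into a short crRNA comprising one repeat sequence and one spacer sequence. In some CRISPR/Cas systems, the crRNA is produced by the action of the Cas protein on the pre-crRNA. In other CRISPR/Cas systems, the crRNA is produced by the joint action of the Cas protein and a tracrRNA (trans-activating crRNA) on the pre-crRNA.
In different CRISPR/Cas systems, the crRNA can act alone as the guide RNA (gRNA) to direct the Cas protein to a target sequence located near a PAM sequence, or the crRNA and tracrRNA can be combined into a single guide RNA (sgRNA) that directs the Cas protein to a target sequence located near a PAM sequence.
As used in the present disclosure, the "guide sequence of the crRNA" refers to the sequence in the crRNA that hybridizes with the target sequence of the target nucleic acid; it is formed by the spacer of the crRNA.
As used in the present disclosure, the term "target sequence" refers to a nucleotide sequence in the target nucleic acid that is complementary or at least partially complementary to the crRNA. After the Cas protein, crRNA, and target sequence form a ternary complex, the Cas protein exerts specific cleavage activity on the target strand and/or the non-target strand of the target nucleic acid. In the present disclosure, "target sequence" is used interchangeably with "target nucleic acid", "target polynucleotide", and "target nucleic acid sequence".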
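As used in the present disclosure, the term "target strand" refers to the nucleotide strand of the target nucleic acid that hybridizes with the crRNA; the term "non-target strand" refers to the nucleotide strand of the target nucleic acid that does not hybridize with the crRNA. The term "Cas9" or "Cas9 domain" refers to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (for example, a protein comprising an active, inactive, or partially active DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A Cas9 nuclease is sometimes also called CRISPR-associated nuclease 9. As noted above, CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain numerous short, conserved repeat regions and spacers. CRISPR clusters are transcribed and processed into pre-crRNA. In type II CRISPR/Cas9 systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (RNase III), and the Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves linear or circular dsDNA targets complementary to the spacer. The target strand not complementary to the crRNA is cleaved endonucleolytically. In nature, DNA binding and cleavage typically require the protein and both RNAs. However, a single guide RNA (sgRNA) can be engineered to incorporate aspects of both the crRNA and the tracrRNA into a single RNA species. See, for example, Jinek M. et al., Science 337:816-821 (2012), the entire contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM, or protospacer adjacent motif) to help distinguish self from non-self. Cas9 nuclease sequences and structures are well known to those skilled in the art (see, for example, "Complete genome sequence of an M1 strain of Streptococcus pyogenes." Ferretti et al., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663 (2001); "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III." Deltcheva E. et al., Nature 471:602-607 (2011); and "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity." Jinek M. et al., Science 337:816-821 (2012)). Cas9 orthologs have been described in various species, including but not limited to Streptococcus pyogenes and Streptococcus thermophilus. Based on the present disclosure, other suitable Cas9 nucleases and sequences will be apparent to those skilled in the art, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, "The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems" (2013) RNA Biology 10:5, 726-737, the entire contents of which are incorporated herein by reference.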
核酸酶灭活的Cas9蛋白可互换地称为“dCas9”蛋白(核酸酶-“死的”Cas9或无核酸酶活性的Cas9)或催化灭活的Cas9。用于产生具有无活性DNA切割结构域的Cas9蛋白(或其片段)是已知的(参见,如Jinek等,Science.337:816-821(2012);Qi等人“Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression”(2013)Cell.28;152(5):1173-83)。例如,已知Cas9的DNA切割结构域包括两个子结构域,HNH核酸酶子结构域和RuvC子结构域。HNH子结构域切割与gRNA互补的链,而RuvC子结构域切割非互补链。这些子结构域内的突变可以使Cas9的核酸酶活性沉默。例如,突变D10A和H840A使化脓性链球菌Cas9的核酸酶活性完全灭活(Jinek等人,Science.337:816-821(2012);Qi等人,Cell.28:152(5):1173-83(2013))。基于本领域的知识,另外的合适的无核酸酶活性的Cas9结构域对于本领域技术人员而言将是显而易见的。此类另外的示例性合适的无核酸酶活性的Cas9结构域包括但不限于D10A/H840A、D10A/D839A/H840A和D10A/D839A/H840A/N863A突变体结构域(参见例如Prashant等,Nature Biotechnology.2013;31(9):833-838)。
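A Cas9 nickase can cleave one strand of double-stranded DNA. A Cas9 nickase can be generated by introducing an inactivating mutation into the HNH subdomain or the RuvC subdomain. For example, an inactivating mutation (D10A) can be introduced into the RuvC domain of Streptococcus pyogenes Cas9 while the HNH domain remains active, i.e., the residue at position 840 remains histidine. Such Cas9 variants can generate a single-strand DNA break (nick) at a specific position based on the target sequence defined by the gRNA. A person skilled in the art can identify the catalytic residues in the RuvC and HNH domains of any known Cas9 protein and introduce inactivating mutations to generate the corresponding dCas9 or nCas9.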
同样地,对于其他Cas蛋白,本领域技术人员可以通过同样的方式获得相应的无核酸酶活性的Cas蛋白和切割双链DNA的一条链的切口酶。
如本公开所使用的,术语“脱氨酶”或“脱氨酶结构域”是指催化脱氨反应的蛋白质或酶。在一些实施方案中,脱氨酶是腺苷脱氨酶,其催化腺嘌呤水解脱氨为次黄嘌呤。在一些实施方案中,脱氨酶是腺苷脱氨酶,其催化腺苷或腺嘌呤(A)水解脱氨为肌苷(I)。在一些实施方案中,脱氨酶或脱氨酶结构域是分别催化腺苷或脱氧腺苷水解脱氨为肌苷或脱氧肌苷的腺苷脱氨酶。在一些实施方案中,腺苷脱氨酶催化脱氧核糖核酸(DNA)中腺苷的水解脱氨。
如本公开所使用的,术语“核酸可编程核苷酸结合结构域”、“核酸可编程DNA结合蛋白(Nucleic acid programmable DNA-binding protein,napDNAbp)”是指与核酸(例如,DNA或RNA),例如引导多核苷酸(例如,gRNA)结合的蛋白质,所述核酸例如通过与靶核酸序列杂交将napDNAbp引导至特定核酸序列。例如,Cas9蛋白可以与引导RNA结合,该引导RNA将Cas9蛋白引导至与引导RNA互补的特定DNA序列。在一些实施方案中,napDNAbp是Cas9结构域,例如,核酸酶活性Cas9,Cas9切口酶(nCas9)或无核酸酶活性Cas9(dCas9)。核酸可编程DNA结合蛋白的实例包括但不限于Cas9(例如,dCas9和nCas9)、CasX、CasY、Cpf1、C2c1、C2c2、C2c3和Argonaute蛋白(AGO)。然而,应当理解,核酸可编程DNA结合蛋白也包括结合RNA的核酸可编程蛋白。例如,napDNAbp可以与将napDNAbp引导至RNA的核酸结合。其他核酸可编程DNA结合蛋白也在本公开内容的范围内,尽管它们可能未在本公开内容中具体描述。
nCas9结构域包含nCas9或其片段,nCas9片段与nCas9存在一定同源性(例如,至少约70%同源性,或至少约80%同源性,至少约90%同源性,至少约95%同源性,至少约96%同源性,至少约97%同源性,至少约98%同源性,至少约99%同源性,至少约99.5%同源性,或至少约99.9%同源性),并保留其基本功能。
如本公开所使用的,“碱基编辑器(Base Editor,BE)”或“核碱基编辑器”是指结合多核苷酸并具有核碱基修饰活性的试剂。在各种实施方案中,碱基编辑器包含核碱基修饰多肽(例如,脱氨酶)和与引导多核苷酸(例如,引导RNA)结合的核酸可编程核苷酸结合结构域(例如,核酸可编程DNA结合蛋白)。在各种实施方案中,所述试剂是包含具有碱基编辑活性的蛋白质结构域的生物分子复合物,即能够修饰核酸分子(例如,DNA、RNA)内的碱基(例如,A、T、C、G或U)。在一些实施方案中,所述多核苷酸可编程DNA结合结构域与脱氨酶结构域融合或连接。在一个实施方案中,所述试剂是包含具有碱基编辑活性的结构域的融合蛋白。在一些实施方案中,具有碱基编辑活性的结构域能够使核酸分子内的碱基脱氨基。在一些实施方案中,所述碱基编辑器能够使DNA分子内的一个或多个碱基脱氨基。在一些实施方案中,所述碱基编辑器是腺嘌呤碱基编辑器(ABE)。
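As used in the present disclosure, "base editing activity" refers to activity that chemically alters a base within a polynucleotide. In one embodiment, a first base is converted to a second base. In one embodiment, the base editing activity is adenosine or adenine deaminase activity, e.g., converting a target A·T to G·C.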
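In some embodiments, base editing activity is assessed by editing efficiency. Base editing efficiency can be measured by any suitable means, for example by Sanger sequencing or next-generation sequencing. In some embodiments, base editing efficiency is measured as the percentage of total sequencing reads that carry the nucleobase conversion effected by the base editor, e.g., the percentage of total sequencing reads in which the target A·T base pair has been converted to a G·C base pair. In some embodiments, when base editing is performed in a cell population, base editing efficiency is measured as the percentage of total cells that carry the nucleobase conversion effected by the base editor.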
如本公开所使用的,术语“碱基编辑器系统”是指用于编辑目标核苷酸序列的核碱基的系统。在各种实施方案中,所述碱基编辑器系统包含(1)核酸可编程核苷酸结合结构域(例如Cas9);(2)用于使所述核碱基脱氨基的脱氨酶结构域(例如腺苷脱氨酶);(3)一种或多种引导多核苷酸(例如引导RNA)。
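"Guide polynucleotide", "guide RNA", or "gRNA" refers to a polynucleotide that can specifically target a target sequence and can form a complex with a nucleic acid programmable nucleotide binding domain protein (e.g., Cas9). In one embodiment, the guide polynucleotide is a guide RNA (gRNA). A gRNA can exist as a complex of two or more RNAs or as a single RNA molecule. A gRNA that exists as a single RNA molecule may be referred to as a single guide RNA (sgRNA), but "gRNA" is used interchangeably to refer to guide RNAs that exist either as a single molecule or as a complex of two or more molecules. Typically, a gRNA that exists as a single RNA species comprises two domains: (1) a domain that shares homology with the target nucleic acid (e.g., directing binding of the Cas9 complex to the target nucleic acid); and (2) a domain that binds the Cas9 protein. In some embodiments, domain (2) corresponds to the sequence known as the tracrRNA and comprises a stem-loop structure. For example, in some embodiments, domain (2) is identical or homologous to the tracrRNA provided in Jinek et al., Science 337:816-821 (2012). Other examples of gRNAs are disclosed in U.S. Provisional Patent Application U.S.S.N. 61/874,682, filed September 6, 2013, entitled "Switchable Cas9 Nucleases and Uses Thereof", and in U.S. Provisional Patent Application U.S.S.N. 61/874,746, filed September 6, 2013, entitled "Delivery System For Functional Nucleases". In some embodiments, a gRNA comprises two or more of domains (1) and (2) and may be referred to as an "extended gRNA". An extended gRNA will bind two or more Cas9 proteins and bind the target nucleic acid at two or more distinct regions. The gRNA comprises a nucleotide sequence complementary to the target site, which mediates binding of the nuclease/RNA complex to the target site and provides the sequence specificity of the nuclease:RNA complex.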
根据本发明,所用氨基酸三字母代码和单字母代码如J.biol.chem,243,p3558(1968)中所述。“载体”是指用于将核酸转移、递送或引入宿主细胞内的多核苷酸组合物。适合的载体包括质粒载体、噬菌体载体、病毒载体(例如,逆转录病毒载体、腺相关病毒载体、单纯疱疹病毒载体、AAV载体、慢病毒载体、杆状病毒载体)等。
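A "delivery system" comprises a delivery vehicle, the delivery vehicle including one or more of liposomes, nanoparticles, exosomes, microvesicles, viral vectors, gene guns, electroporation devices, and the like.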
“功能片段”是指蛋白或多肽序列包括相比蛋白或多肽的原始序列更少的氨基酸,但剩余的氨基酸序列相对于原始参考序列的功能活性,仍存在一定比例(如10%、20%、30%、40%、50%或60-99%、100%)的功能活性(例如,可通过取代、插入、缺失和/或添加一个或多个氨基酸来修饰蛋白或多肽,同时保留一定比例的酶活性)。
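"Identity" refers to the sequence matching between two polypeptides or between two nucleic acids. When a position in both of the two compared sequences is occupied by the same base or amino acid monomer subunit (e.g., a position in each of two DNA molecules is occupied by adenine, or a position in each of two polypeptides is occupied by lysine), the molecules are identical at that position. The "percent identity" between two sequences is a function of the number of matching positions shared by the two sequences divided by the number of positions compared, multiplied by 100. For example, if 6 of 10 positions in two sequences match, the two sequences have 60% identity. For example, the DNA sequences CTGACT and CAGGTT share 50% identity (3 of the 6 total positions match). Generally, the comparison is made when the two sequences are aligned to give maximum identity.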
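The percent-identity arithmetic defined above can be sketched as follows. This is a minimal illustration only: it assumes the two sequences have already been aligned to equal length, and the function name `percent_identity` is a hypothetical helper, not something defined in this disclosure.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Matching positions divided by compared positions, times 100.
    Alignment itself is assumed to have been done beforehand.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# The worked example from the text: 3 of 6 positions match.
print(percent_identity("CTGACT", "CAGGTT"))  # 50.0
```

In practice an alignment tool would be run first; the sketch covers only the final counting step.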
“宿主细胞”包括体外的、离体的或体内的细胞或细胞系或它们的子代,包括但不限于CHO、BHK、293、293T细胞系等,所述细胞或细胞系或它们的子代包含:本发明所述的Cas13蛋白、融合蛋白、CRISPR-Cas系统、多核苷酸、载体或递送系统。
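"Operably linked" means that a nucleotide sequence of interest is linked to a regulatory sequence in a manner that allows expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system, or in a target cell when a vector is introduced into the target cell).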
如本文可互换使用的术语“编码序列”或“蛋白质编码序列”是指编码蛋白质的多核苷酸片段。该区域或序列在靠近5'端的地方有一个起始密码子,在靠近3'端的地方有一个终止密码子。编码序列也可称为开放阅读框。
术语“核定位序列”、“核定位信号(Nuclear Localization Signal,NLS)”是指促进蛋白质输入细胞核的氨基酸序列。核定位序列是本领域已知的并且描述于例如Plank等人的国际PCT申请,PCT/EP2000/011690,2000年11月23日提交,2001年5月31日作为WO/2001/038547公布,其内容以引用方式并入本文以用于它们对示例性核定位序列的公开。在一些实施方案中,所述NLS是优化的NLS,例如由Koblan等人,Nature Biotech.2018doi:10.1038/nbt.4172所描述。
如本文所用,术语“接头”可指共价接头(例如,共价键)、非共价接头、化学基团或连接两个分子或部分(例如,蛋白质复合物或核糖核复合体的两个组分)的分子,例如,融合蛋白的两个结构域,如多核苷酸可编程DNA结合结构域(例如,dCas9)和脱氨酶结构域(例如,腺苷脱氨酶)。接头可以连接碱基编辑器系统的不同组分或组分的不同部分。例如,在一些实施方案中,接头可以连接多核苷酸可编程核苷酸结合结构域的引导多核苷酸结合结构域和脱氨酶的催化结构域。接头可以位于两个基团、分子或其他部分之间或两侧,并通过共价键或非共价相互作用连接到每一个,从而连接这两者。在一些实施方案中,所述接头可以是多核苷酸。在一些实施方案中,所述接头可以是DNA接头。在一些实施方案中,所述接头可以是RNA接头。
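In some embodiments, the linker can be a single amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker can be about 5 to 100 amino acids in length, e.g., about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20 to 30, 30 to 40, 40 to 50, 50 to 60, 60 to 70, 70 to 80, 80 to 90, or 90 to 100 amino acids in length. In some embodiments, the linker can be about 100 to 150, 150 to 200, 200 to 250, 250 to 300, 300 to 350, 350 to 400, 400 to 450, or 450 to 500 amino acids in length. Longer or shorter linkers are also contemplated.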
如本公开所使用的,术语“切割”可以是指使核苷酸链中磷酸二酯键断裂。对于断裂的类型可以是单链断裂或双链断裂。
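As used in the present disclosure, the terms "complementary" or "hybridized" are used in reference to "polynucleotides" and "oligonucleotides" (interchangeable terms that refer to a sequence of nucleotides) related by the base-pairing rules. For example, the sequence "CAGT" is complementary to the sequence "GTCA". Complementarity can be "partial" or "total". "Partial" complementarity means that one or more nucleic acid bases are mismatched according to the base-pairing rules; "total" or "complete" complementarity between nucleic acids means that every nucleic acid base is matched with another base under the base-pairing rules. The degree of complementarity between nucleic acid strands has a significant effect on the efficiency and strength of hybridization between the strands. This is of particular importance in amplification reactions and in detection methods that depend on binding between nucleic acids.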
如本文所用,术语“杂交”是指使用核酸链通过碱基配对与互补链结合以形成杂交复合物的任何过程来配对互补核酸。
如本公开所使用的,术语“核酸序列”和“核苷酸序列”是指寡核苷酸或多核苷酸及其片段或部分,并且是指可以是单链或双链的基因组或合成来源的DNA或RNA,和代表有义或反义链。如本公开所使用的,术语“序列同一性”和“同一性百分比”指两个或更多个多核苷酸或多肽之间相同(即同一)的核苷酸或氨基酸的百分比。两个或更多个多核苷酸或多肽之间的序列同一性可通过以下方法测定:将多核苷酸或多肽的核苷酸或氨基酸序列对准且经对准的多核苷酸或多肽中含有相同核苷酸或氨基酸残基的位置数目进行评分,将其与经对准的多核苷酸或多肽中含有不同核苷酸或氨基酸残基的位置数目进行比较。多核苷酸可例如通过含有不同核苷酸(即取代或突变)或缺失核苷酸(即一个或两个多核苷酸中的核苷酸插入或核苷酸缺失)而在一个位置处不同。多肽可例如通过含有不同氨基酸(即取代或突变)或缺失氨基酸(即一个或两个多肽中的氨基酸插入或氨基酸缺失)而在一个位置处不同。序列同一性可通过用含有相同核苷酸或氨基酸残基的位置数目除以多核苷酸或多肽中氨基酸残基的总数来计算。举例而言,可通过用含有相同核苷酸或氨基酸残基的位置数目除以多核苷酸或多肽中核苷酸或氨基酸残基的总数且乘以100来计算同一性百分比。
示例性的,当使用序列比较算法或通过目视检查测量以最大的对应性进行比较和比对时,两个或多个序列或子序列具有至少80%、81%、82%、83%、84%、85%、86%、87%、88%、89%、90%、91%、92%、93%、94%、95%、96%、97%、98%或99%核苷酸的“序列同一性”或“同一性百分比”。在某些实施方案中,所述序列在任一或两个相比较的生物聚合物(例如,多核苷酸)的整个长度上基本相同。
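The term "vector" refers to a means of introducing a nucleic acid sequence into a cell so as to produce a transformed cell. Vectors include plasmids, transposons, bacteriophages, viruses, liposomes, and episomes. An "expression vector" is a nucleic acid sequence comprising a nucleotide sequence to be expressed in a recipient cell. An expression vector can include additional nucleic acid sequences that promote and/or enhance expression of the introduced sequence, such as start, stop, enhancer, promoter, and secretion sequences.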
如本公开所使用的,术语“个体”和“受试者”可互换地使用,是指哺乳动物。哺乳动物包括但不限于驯化动物(例如,奶牛、绵羊、猫、犬和马)、灵长类(例如,人和非人灵长类如猴)、兔和啮齿类(例如,小鼠和大鼠)。特别地,个体是人。
本文公开的方法可以在体外、离体、或体内进行,或者产品可以以体外、离体、或体内形式存在。术语“体外”是指在实验室条件或培养液中使用材料、生物物质、细胞和/或组织的实验;而术语“体内”是指使用完整多细胞有机体的实验和工序。在一些实施方案中,体内进行的方法可以在非人动物上进行。“离体”是指存在于有机体外或发生在有机体外,例如在人或动物体外的事件,例如可以在取自有机体的组织(例如整个器官)或细胞上存在或发生的事件。
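As used in the present disclosure, the term "pharmaceutically acceptable carrier" refers to a pharmaceutically acceptable material, composition, or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium, or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting a compound from one site of the body (e.g., the delivery site) to another site (e.g., an organ, tissue, or part of the body). A pharmaceutically acceptable carrier is "acceptable" in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissues of the subject (e.g., physiologically compatible, sterile, at physiological pH, etc.). Some examples of materials that can serve as pharmaceutically acceptable carriers include: (1) sugars, such as lactose, glucose, and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose, and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricants, such as magnesium stearate, sodium lauryl sulfate, and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerol, sorbitol, mannitol, and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethanol; (20) pH-buffered solutions; (21) polyesters, polycarbonates, and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, high density lipoprotein (HDL), and low density lipoprotein (LDL); (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. Terms such as "excipient" and "pharmaceutically acceptable carrier" are used interchangeably herein.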
如本文所用,术语“有效量”是指足以引起期望的生物学反应的生物活性剂的量。例如,在一些实施方案中,碱基编辑器的有效量可以指足以诱导由碱基编辑器突变的特异性结合的靶位点的突变的碱基编辑器的量。如本领域技术人员将理解的,试剂,例如碱基编辑器融合蛋白、脱氨酶、多核苷酸等的有效量可以随各种因素而变化,例如随期望的生物学反应,例如随待编辑的特定等位基因、基因组或靶位点,随靶定的细胞或组织和使用的试剂而变化。
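The terms "treatment" and "treating" refer to a clinical intervention intended, as described herein, to reverse or alleviate a disease or disorder or one or more symptoms thereof, to delay the onset of a disease or disorder or one or more symptoms thereof, or to inhibit the progression of a disease or disorder or one or more symptoms thereof. In some embodiments, treatment can be administered after one or more symptoms have developed and/or after the disease has been diagnosed. In other embodiments, treatment can be administered in the absence of symptoms, e.g., to prevent or delay the onset of symptoms or to inhibit the onset or progression of the disease. For example, treatment can be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment can also be continued after symptoms have resolved, e.g., to prevent or delay their recurrence.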
具体的,本发明内容包括如下内容:
<腺苷脱氨酶>
在本发明的第一方面的一些实施方案中,提供了一种腺苷脱氨酶,其包含以下序列中的一种或多种:
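an amino acid sequence having at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98%, or 99% identity to the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170, and retaining the deamination activity of the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170;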
在SEQ ID NO:1或SEQ ID NO:170所示的氨基酸序列中添加、取代、缺失或插入1个或多个氨基酸残基的氨基酸序列,并且其保留如SEQ ID NO:1或SEQ ID NO:170所示的氨基酸序列的脱氨活性;或者,
由核苷酸序列编码的氨基酸序列,所述核苷酸序列与编码如SEQ ID NO:1或SEQ ID NO:170所示的氨基酸序列的多核苷酸序列在严格条件下杂交,并且所述氨基酸序列保留如SEQ ID NO:1或SEQ ID NO:170所示的氨基酸序列的脱氨活性,所述严格条件是中等严格条件,中-高严格条件,高严格条件或非常高严格条件。
本发明还涉及腺苷脱氨酶的变体,所述腺苷脱氨酶变体相对于SEQ ID NO:1所示的氨基酸序列存在以下任意多个氨基酸位置的改变:33、35、36、46、47、48、49、104、105、107、148、149、150、151、152、153、154、155。
进一步的,所述脱氨酶变体包含相对于SEQ ID NO:1所示的氨基酸序列,选自以下至少一个氨基酸位置的改变:V33I、D35G、D35R、D36N、D36G、A46C、I47Y、I47R、I47V、T48G、T48R、L49T、L49H、L49K、V104A、V104M、S105C、S105G、S107R、S107K、S107A、Q148P、Q148C、Q148A、Q148G、Q149M、Q149G、Q149L、P150R、P150L、P150C、R151K、E152R、E152T、E152G、E152W、V153P、V153F、V153I、V153T、F154H、F154K、F154L、N155T、N155R、N155H。而具体的位置的改变是可以进行一个或多个进行组合的。具体的,所述腺苷脱氨酶变体相对于SEQ ID NO:1所示的氨基酸序列,包含任一个以下的组的改变:
(1)Q148G+Q149M+P150R;(2)S107A;(3)E152R+V153P+F154H+N155T;(4)S107K;(5)Q148P+Q149G+P150L;(6)A46C+I47Y+T48G+L49H;(7)E152T+V153F+N155T;(8)S107R;(9)V104M;(10)E152G+V153I+F154K+N155R;(11)S105C;(12)Q148C+Q149L+P150R+R151K; (13)Q148A+Q149G+P150C;(14)E152W+V153T+F154L+N155H;(15)S105G;(16)I47R+T48G+L49T;(17)D35G+D36N;(18)V33I+D35R+D36G;(19)V104A;(20)I47V+T48R+L49K。
本发明提供了一种腺苷脱氨酶变体,所述腺苷脱氨酶变体相对于SEQ ID NO:170所示的氨基酸序列存在任意多个氨基酸位置的改变,在一些实施方案中,所述的取代为在如SEQ ID NO:170所示的氨基酸序列的以下位点中的一个或多个发生的取代:
S15、D16、H17、E18、F19、N20、D21、E22、Y23、W24、M25、R26、H27、A28、L29、T30、K33、R34、A35、R36、V41、V43、L47、L49、N51、N59、A61、I62、L64、A69、E72、G80、L81、V82、L83、Q84、N85、Y86、I89、D90、A91、T92、V95、F97、A106、R111、I112、S113、R114、L115、F117、V119、R120、N121、S122、K123、R124、N132、V133、L134、N135、P137、G138、M139、N140、H141、R142、E144、D160、V168、F169、N170。
进一步地,在一些实施方案中,所述的取代为在如SEQ ID NO:170所示的氨基酸序列的以下位点中的一个或多个发生的取代:
S15T、S15G、D16E、D16N、H17C、H17K、E18D、E18K、F19C、F19Y、F19S、N20Q、N20L、N20S、D21Q、E22D、Y23F、W24F、M25L、M25V、R26K、R26T、H27R、A28C、L29I、T30E、K33R、K33S、R34K、A35S、R36Q、L49F、N51G、L64M、L64Q、L64P、G80A、L81N、V82A、V82T、L83I、L83Q、Q84N、N85S、Y86W、I89E、I89L、D90G、A91C、A91T、T92D、A106C、I112L、S113K、R114K、V119L、S122N、S122P、R124H、R124T、N132K、V133I、L134F、N135H、N135S、P137F、G138A、M139L、N140K、H141A、R142L、R142S、E144H、D160E、V168A、F169C、F169V、N170D。
在一些具体的实施方案中,所述的取代为在如SEQ ID NO:170所示的氨基酸序列的以下位点的组合发生的取代:
(1)S15T+D16E+H17K+F19Y+N20Q(腺苷脱氨酶004V2);
(2)E22D+Y23F+W24F+R26K+H27R+L29I(腺苷脱氨酶004V3);
(3)K33R+R34K+A35S(腺苷脱氨酶004V4);
(4)G80A+L81N+V82A+L83I+Q84N+N85S+Y86W(腺苷脱氨酶004V7);
(5)I89L+D90G+A91T+T92D(腺苷脱氨酶004V8);
(6)I112L+S113K+R114K(腺苷脱氨酶004V10);
(7)N132K+V133I+L134F+N135H(腺苷脱氨酶004V12);
(8)P137F+G138A+M139L(腺苷脱氨酶004V13);
(9)E18K+F19S+N20L(腺苷脱氨酶004V14);
(10)M25L(腺苷脱氨酶004V15);
(11)S15G+D16N+H17C(腺苷脱氨酶004V16);
(12)L64Q+S122P+V168A(腺苷脱氨酶004V17);
(13)R36Q+L64Q+S122P+F169V(腺苷脱氨酶004V18);
(14)R36Q+V119L+V168A(腺苷脱氨酶004V19);
(15)R36Q+V119L+N170D(腺苷脱氨酶004V20);
(16)L64Q+V119L+V168A(腺苷脱氨酶004V21);
(17)F169C(腺苷脱氨酶004V22);
(18)L64Q+V119L+V168A(腺苷脱氨酶004V23);
(19)L64P+R124T+F169V(腺苷脱氨酶004V24);
(20)I89E+A91C(腺苷脱氨酶004V25);
(21)L64Q+V119L+F169V(腺苷脱氨酶004V26);
(22)L64M(腺苷脱氨酶004V27);
(23)L64Q+S122P+N170D(腺苷脱氨酶004V28);
(24)E18D+F19C+N20S(腺苷脱氨酶004V29);
(25)L64P+V119L+N170D(腺苷脱氨酶004V30);
(26)S122N+R124H+N135S+D160E(腺苷脱氨酶004V31);
(27)R26T+H27R+A28C(腺苷脱氨酶004V32);
(28)V82T+L83Q(腺苷脱氨酶004V33);
(29)M25V(腺苷脱氨酶004V34);
(30)D21Q+T30E+K33S(腺苷脱氨酶004V35);
(31)N140K+H141A+R142S(腺苷脱氨酶004V36);
(32)L49F+N51G(腺苷脱氨酶004V37);
(33)A106C(腺苷脱氨酶004V38);
(34)R142L+E144H(腺苷脱氨酶004V39);
(35)E22R+Y23F+W24P+M25H(腺苷脱氨酶004V40);和/或
(36)L64Q+R124T+V168A(腺苷脱氨酶004V41)。
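The substitution notation used in the combinations above (e.g., "S15T+D16E+H17K+F19Y+N20Q") can be read mechanically: wild-type residue, 1-based position, replacement residue, with "+" joining the individual substitutions. The sketch below shows one way to apply such a string to a parent sequence. The function name `apply_substitutions` is hypothetical, and the 20-residue parent sequence is a made-up placeholder chosen only so that positions 15 to 20 carry the residues named in the text; it is not SEQ ID NO:170.

```python
import re

# One substitution token: wild-type letter, 1-based position, new letter.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(parent: str, variant: str) -> str:
    """Apply a '+'-joined list of substitutions (e.g. 'S15T+D16E') to a parent sequence.

    Raises ValueError when a token is malformed, a position is out of range,
    or the parent residue does not match the stated wild-type residue.
    """
    residues = list(parent)
    for token in variant.split("+"):
        match = _SUBSTITUTION.match(token.strip())
        if match is None:
            raise ValueError(f"malformed substitution: {token!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = replacement
    return "".join(residues)


# Hypothetical placeholder parent (NOT SEQ ID NO:170): S15, D16, H17, E18, F19, N20.
toy_parent = "M" + "A" * 13 + "SDHEFN"
toy_variant = apply_substitutions(toy_parent, "S15T+D16E+H17K+F19Y+N20Q")
print(toy_variant[14:])  # residues 15-20 after substitution
```

Checking the stated wild-type residue before substituting guards against off-by-one numbering errors when a combination is applied to the wrong parent sequence.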
在本发明中,“保留脱氨活性”的表述,例如“保留如SEQ ID NO:1所示的氨基酸序列的脱氨活性”、“保留如SEQ ID NO:170所示的氨基酸序列的脱氨活性”,可以是完全保留原始序列的腺苷脱氨酶的脱氨活性,也可以是部分保留原始序列的腺苷脱氨酶的脱氨活性,例如保留30%、40%、50%、60%、70%、80%、90%、95%、99%的脱氨活性。在另一些实施方案中,具有修改后的序列的腺苷脱氨酶,例如具有经氨基酸取代的序列的腺苷脱氨酶也可能具有高于原始序列的腺苷脱氨酶的脱氨活性。
本发明提供的腺苷脱氨酶可作用于任何多核苷酸,包括DNA、RNA和DNA-RNA杂交体。在某些实施方案中,腺苷脱氨酶可以使包含DNA的多核苷酸的靶标腺嘌呤(A)脱氨基。在某些实施方案中,腺苷脱氨酶可以使包含RNA的多核苷酸的靶标腺嘌呤(A)脱氨基。在一些实施方案中,所述的取代为保守取代。
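In the present invention, "medium stringency conditions", "medium-high stringency conditions", "high stringency conditions", and "very high stringency conditions" describe conditions for nucleic acid hybridization and washing. Guidance for performing hybridization reactions can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6, which is incorporated herein by reference. Aqueous and non-aqueous methods are described in that reference, and either can be used. For example, specific hybridization conditions are as follows: (1) low stringency hybridization conditions: 6× sodium chloride/sodium citrate (SSC) at about 45℃, followed by two washes in 0.2×SSC, 0.1% SDS at a temperature of at least 50℃ (for low stringency conditions, the wash temperature can be raised to 55℃); (2) medium stringency hybridization conditions: 6×SSC at about 45℃, followed by one or more washes in 0.2×SSC, 0.1% SDS at 60℃; (3) high stringency hybridization conditions: 6×SSC at about 45℃, followed by one or more washes in 0.2×SSC, 0.1% SDS at 65℃; and, preferably, (4) very high stringency hybridization conditions: 0.5 M sodium phosphate, 7% SDS at 65℃, followed by one or more washes in 0.2×SSC, 1% SDS at 65℃.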
腺苷脱氨酶005V1的氨基酸序列如下:
腺苷脱氨酶004V1的氨基酸序列如下;
腺苷脱氨酶005V1的核苷酸序列如下:
腺苷脱氨酶004V1的核苷酸序列:
<碱基编辑器融合蛋白>
本发明的第二方面提供了一种碱基编辑器融合蛋白,其包含本发明第一方面所述腺苷脱氨酶,以及核酸可编程核苷酸结合结构域。
在本发明中,核酸可编程核苷酸结合结构域当与结合的引导多核苷酸(例如,gRNA)结合时,可以特异性结合靶多核苷酸序列(即,通过结合的引导核酸的碱基和靶多核苷酸的碱基之间的互补碱基配对序列),从而将碱基编辑器定位到需要编辑的目标核酸序列。在一些实施方案中,靶多核苷酸序列包含单链DNA或双链DNA。在一些实施方案中,所述靶多核苷酸序列包含RNA。在一些实施方案中,靶标多核苷酸序列包含DNA-RNA杂交体。应当理解,核酸可编程核苷酸结合结构域还可以包括结合RNA的核酸可编程蛋白。
在本发明的一些实施方案中,碱基编辑器中的核酸可编程核苷酸结合结构域是Cas蛋白或者AGO蛋白。所述Cas蛋白或AGO蛋白包括天然存在的Cas蛋白或AGO蛋白,以及其同源物或其修饰或工程化版本。例如,在一些实施方案中,Cas蛋白或AGO蛋白可以是包括与天然存在的Cas蛋白或AGO蛋白的氨基酸序列具有至少60%,至少65%,至少70%,至少75%,至少80%,至少85%,至少90%,至少95%,至少96%,至少97%,至少98%,至少99%或至少99.5%同一性的氨基酸序列的蛋白。在另一些实施方案中,Cas蛋白还包括其切口酶或无核酸酶活性形式的蛋白。
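In some embodiments, non-limiting examples of Cas proteins that can serve as the nucleic acid programmable nucleotide binding domain include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas5d, Cas5t, Cas5h, Cas5a, Cas6, Cas7, Cas8, Cas8a, Cas8b, Cas8c, Cas9 (also known as Csn1 or Csx12), Cas10, Cas10d, Cas12a/Cpf1, Cas12b/C2c1, Cas12c/C2c3, Cas12d/CasY, Cas12e/CasX, Cas12f (C2c10/Cas14), Cas12g, Cas12h, Cas12i, Cas12j, Cas12k/C2c5, Cas12l, Cas12m, Cas12n, Cas13a (C2c2), Cas13b, Cas13c, Cas13d, Csy1, Csy2, Csy3, Csy4, Css1, Css2, Cse5e, Csc2, Csa5, Csn1, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr2, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, Csx1, Csx1S, Csx11, Csf1, Csf2, CsO, Csf4, Csd1, Csd2, Cst1, Cst2, Csh1, Csh2, Csa1, Csa2, Csa3, Csa4, Csa5, type II Cas effector proteins, type V Cas effector proteins, type VI Cas effector proteins, CARF, DinG, homologs thereof, or modified or engineered versions thereof. Other nucleic acid programmable nucleotide binding domains are also within the scope of the present disclosure, although they may not be specifically listed in the present disclosure.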
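In some specific embodiments of the present invention, the Cas protein is selected from the Cas9 family, the Cas12 family, and the Cas13 family, for example but not limited to Cas9, Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), Cas12d (CasY), Cas12e (CasX), Cas12f (C2c10/Cas14), Cas12g, Cas12h, Cas12i, Cas12j, Cas12k (C2c5), Cas12l, Cas12m, Cas12n, Cas13a (C2c2), Cas13b, Cas13c, Cas13d, homologs thereof, or modified or engineered versions thereof. In some specific embodiments of the present invention, the Cas protein includes nuclease-inactive forms of the above Cas proteins, e.g., dCas9, dCas12a, dCas12b, dCas12c, dCas12d, dCas12e, dCas12f, dCas12g, dCas12h, dCas12i, dCas12j, dCas12k, dCas12l, dCas12m, dCas12n, dCas13a, dCas13b, dCas13c, and dCas13d. In some specific embodiments of the present invention, the Cas protein also includes nickase forms of the above proteins, e.g., but not limited to, nCas9.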
在本发明的一些优选的实施方案中,核酸可编程核苷酸结合结构域为Cas9。在本发明的一些具体的实施方案中,Cas9是来自化脓性链球菌的Cas9(SpCas9)、金黄色葡萄球菌的Cas9(SaCas9)、嗜热链球菌1的Cas9(St1Cas9)。在本发明的一些优选的实施方案中,Cas9是来自化脓性链球菌的Cas9(SpCas9)。
在本发明的一些更优选的实施方案中Cas9可以是核酸酶活性Cas9、Cas9切口酶(nCas9)或无核酸酶活性Cas9(dCas9)。
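In some further preferred embodiments of the present invention, the nucleic acid programmable nucleotide binding domain is a Cas9 nickase (nCas9). In other further preferred embodiments of the present invention, the nucleic acid programmable nucleotide binding domain comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identity to the amino acid sequence of the Cas9 nickase (nCas9) provided herein. In some embodiments of the present invention, the adenosine deaminase is fused/linked to the nucleic acid programmable nucleotide binding domain either directly or through a linker to form the fusion protein. The order in which the adenosine deaminase and the nucleic acid programmable nucleotide binding domain are fused/linked is not particularly limited; for example, the adenosine deaminase can be at the N-terminus of the base editor, or the nucleic acid programmable nucleotide binding domain can be at the N-terminus of the base editor.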
在腺苷脱氨酶与核酸可编程核苷酸结合结构域直接融合的实施方案中,示例性的碱基编辑器融合蛋白具有如下的结构:
NH2-[腺苷脱氨酶]-[napDNAbp]-COOH;
NH2-[napDNAbp]-[腺苷脱氨酶]-COOH;或
NH2-[napDNAbp的N末端片段]-[腺苷脱氨酶]-[napDNAbp的C末端片段]-COOH。
在腺苷脱氨酶与核酸可编程核苷酸结合结构域通过接头融合的实施方案中,示例性的碱基编辑器融合蛋白具有如下的结构:
NH2-[腺苷脱氨酶]-[任选的接头]-[napDNAbp]-COOH;
NH2-[napDNAbp]-[任选的接头]-[腺苷脱氨酶]-COOH;或
NH2-[napDNAbp的N末端片段]-[任选的接头]-[腺苷脱氨酶]-[任选的接头]-[napDNAbp的C末端片段]-COOH。
在一些实施方案中,核碱基编辑器的结构域通过包含以下氨基酸序列的接头融合:
在本发明的一些实施方案中,碱基编辑器中还包括至少一条核定位信号序列(NLS序列),NLS序列可选自下表中的氨基酸序列:
在本发明的一些实施方案中,核定位信号序列可以位于碱基编辑器的N端、C端、两端,或者位于腺苷脱氨酶与核酸可编程核苷酸结合结构域之间。在本发明的一些实施方案中,核定位信号序列可以直接融合于碱基编辑器中,或者通过接头融合在碱基编辑器中。
包含核定位信号序列的碱基编辑器的示例性结构如下:
NH2-[NLS]-[腺苷脱氨酶]-[napDNAbp]-COOH;
NH2-[腺苷脱氨酶]-[NLS]-[napDNAbp]-COOH;
NH2-[腺苷脱氨酶]-[napDNAbp]-[NLS]-COOH;
NH2-[NLS]-[napDNAbp]-[腺苷脱氨酶]-COOH;
NH2-[napDNAbp]-[NLS]-[腺苷脱氨酶]-COOH;
NH2-[napDNAbp]-[腺苷脱氨酶]-[NLS]-COOH;
NH2-[NLS]-[腺苷脱氨酶]-[任选的接头]-[napDNAbp]-COOH;
NH2-[腺苷脱氨酶]-[任选的接头]-[NLS]-[任选的接头]-[napDNAbp]-COOH;
NH2-[腺苷脱氨酶]-[任选的接头]-[napDNAbp]-[NLS]-COOH;
NH2-[NLS]-[napDNAbp]-[任选的接头]-[腺苷脱氨酶]-COOH;
NH2-[napDNAbp]-[任选的接头]-[NLS]-[任选的接头]-[腺苷脱氨酶]-COOH;
NH2-[napDNAbp]-[任选的接头]-[腺苷脱氨酶]-[NLS]-COOH;
NH2-[NLS]-[腺苷脱氨酶]-[任选的接头]-[napDNAbp]-[NLS]-COOH。
在本发明的一些优选的实施方案中,碱基编辑器具有如下的结构:
NH2-[NLS]-[腺苷脱氨酶]-[任选的接头]-[napDNAbp]-[NLS]-COOH;或
NH2-[NLS]-[napDNAbp的N末端片段]-[任选的接头]-[腺苷脱氨酶]-[任选的接头]-[napDNAbp的C末端片段]-[NLS]-COOH。
在本发明的一些进一步优选的实施方案中,所述碱基编辑器包含以下序列中的一种或多种:
(i)如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列;
(ii)与SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列具有至少80%、82%、85%、87%、90%、92%、95%、96%、97%、98%或99%同一性的氨基酸序列,并且其保留如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的结合多核苷酸和碱基编辑活性;
(iii)在SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列中添加、取代、缺失或插入1个或多个氨基酸残基的氨基酸序列,并且其保留如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的结合多核苷酸和碱基编辑活性;或者,
(iv)由核苷酸序列编码的氨基酸序列,所述核苷酸序列与编码如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的多核苷酸序列在严格条件下杂交,并且所述氨基酸序列保留如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的结合多核苷酸和碱基编辑活性,所述严格条件是中等严格条件,中-高严格条件,高严格条件或非常高严格条件。
腺苷脱氨酶004V1及005V1与nCas9所构成的碱基编辑器序列如下:
004V1-nCas9的氨基酸序列:
其中,粗体序列表示源自nCas9的序列;斜体表示接头序列;双下划线序列表示核定位序列;单下划线序列是脱氨酶004V1序列;C端的星号代表终止密码子位置。
005V1-nCas9氨基酸序列:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是腺苷脱氨酶005V1序列,星号代表终止密码子的位置。
005V1-nCas9的核苷酸序列如下所示:

进一步的,本发明还提供了基于腺苷脱氨酶004V1进行氨基酸取代后获得的突变体:腺苷脱氨酶004V2、腺苷脱氨酶004V3、腺苷脱氨酶004V4、腺苷脱氨酶004V7、腺苷脱氨酶004V8、腺苷脱氨酶004V10、腺苷脱氨酶004V12、腺苷脱氨酶004V13-004V41,具体如下:


上述腺苷脱氨酶各自所构成的碱基编辑器示例性的序列如下:
004V2-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是腺苷脱氨酶004V2序列,星号代表终止密码子的位置。
004V3-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V3序列,星号代表终止密码子的位置。
004V4-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V4序列,星号代表终止密码子的位置。
004V7-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V7序列,星号代表终止密码子的位置。
004V8-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V8序列,星号代表终止密码子的位置。
004V10-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V10序列,星号代表终止密码子的位置。
004V12-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列,斜体表示接头序列,双下划线序列表示核定位序列;单下划线序列是脱氨酶004V12序列,星号代表终止密码子的位置。
004V13-nCas9的氨基酸序列如下所示:
其中,粗体序列表示源自nCas9的序列;斜体表示接头序列;双下划线序列表示核定位序列;单下划线序列是腺苷脱氨酶004V13序列;C端的星号代表终止密码子位置。
进一步的,本发明还提供了基于腺苷脱氨酶005V1进行了一个或多个位点的氨基酸取代后获得的突变体,包括脱氨酶005V1-10-3、005V1-11-5、005V1-10-1、005V1-11-2、005V1-15-1、005V1-11-3、005V1-10-4、005V1-11-4、005V1-15-2、005V1-2-7、005V1-1-1、005V1-1-2、005V1-1-3、005V1-1-4、005V1-1-5、005V1-1-6、005V1-1-7、005V1-1-8、005V1-2-1、005V1-2-2、005V1-2-3、005V1-2-5、005V1-2-6、005V1-2-8、005V1-3-1、005V1-3-2、005V1-3-3、005V1-3-4、005V1-3-5、005V1-3-7、005V1-3-8、005V1-4-1、005V1-4-2、005V1-4-3、005V1-15-5、005V1-5-4、005V1-5-8、005V1-10-5、005V1-5-2、005V1-3-6、005V1-4-5、005V1-4-7、005V1-4-8、005V1-5-1、005V1-5-3、005V1-5-5、005V1-5-6、005V1-5-7、005V1-6-1、005V1-6-2、005V1-6-5、005V1-6-6、005V1-6-8、005V1-7-1、005V1-7-2、005V1-7-3、005V1-8-1、005V1-8-2、005V1-8-3、005V1-8-4、005V1-8-5、005V1-9-1、005V1-9-2、005V1-9-3、005V1-9-4、005V1-9-5、005V1-10-2在内的多种突变体。
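By searching within the nCas9 protein for suitable insertion sites for the adenosine deaminase, a variety of chimeric base editor fusion proteins with improved editing efficiency were also obtained. The adenosine deaminases used include adenosine deaminases 005V1, 005V1-10-1, and 005V1-10-3. In the base editor fusion proteins built from these three adenosine deaminases, the nucleic acid programmable nucleotide binding domain is an nCas9 domain, and the adenosine deaminase is inserted between amino acid positions 583-584, 768-769, 770-771, 776-777, 793-794, 905-906, 919-920, 1048-1063, 1049-1062, 1249-1250, 1263-1264, or 1276-1277 of SEQ ID NO:61.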
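The idea of embedding a deaminase between two adjacent residues of a scaffold protein can be sketched as plain string slicing. The snippet below is an illustration under stated assumptions only: `insert_domain` is a hypothetical helper, both sequences are made-up placeholders (not SEQ ID NO:61 or any deaminase of this disclosure), optional linkers are passed in by the caller, and only adjacent position pairs such as 583-584 are modeled; the non-adjacent pairs listed above (1048-1063, 1049-1062) are not covered by this sketch.

```python
def insert_domain(scaffold: str, insert: str, left_position: int,
                  n_linker: str = "", c_linker: str = "") -> str:
    """Insert `insert` between residues left_position and left_position + 1 (1-based).

    The result follows the layout
    [scaffold N-terminal fragment]-[linker]-[insert]-[linker]-[scaffold C-terminal fragment].
    """
    if not 1 <= left_position < len(scaffold):
        raise ValueError("left_position must fall strictly inside the scaffold")
    n_fragment = scaffold[:left_position]   # residues 1..left_position
    c_fragment = scaffold[left_position:]   # residues left_position+1..end
    return n_fragment + n_linker + insert + c_linker + c_fragment


# Hypothetical placeholder sequences, for illustration only.
toy_scaffold = "MKRNYILGLD"
toy_deaminase = "WW"
chimera = insert_domain(toy_scaffold, toy_deaminase, left_position=3)
print(chimera)  # MKRWWNYILGLD
```

The total length of the chimera is the scaffold length plus the insert and linker lengths, which gives a quick sanity check after construct design.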
在本发明中,“保留如SEQ ID NO:190所示的氨基酸序列的结合多核苷酸和碱基编辑活性”、“保留如SEQ ID NO:3所示的氨基酸序列的结合多核苷酸和碱基编辑活性”可以是完全保留如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的碱基编辑器的结合多核苷酸和碱基编辑活性,也可以是部分保留其活性。在另一些实施方案中,具有修改序列的碱基编辑器也可以具有高于如SEQ ID NO:190或SEQ ID NO:3所示的氨基酸序列的碱基编辑器的结合多核苷酸和碱基编辑活性。
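It should be understood that the base editor fusion proteins of the present disclosure can comprise one or more additional features. For example, in some embodiments, the fusion protein can comprise inhibitors, cytoplasmic localization sequences, export sequences such as nuclear export sequences, or other localization sequences, as well as sequence tags useful for solubilization, purification, or detection of the fusion protein. Suitable tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc tags, calmodulin tags, FLAG tags, hemagglutinin (HA) tags, polyhistidine tags (also referred to as histidine tags or His-tags), maltose binding protein (MBP) tags, nus tags, glutathione-S-transferase (GST) tags, green fluorescent protein (GFP) tags, thioredoxin tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP tags. Other suitable sequences will be apparent to those skilled in the art. In some embodiments, the fusion protein comprises one or more His tags.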
<多核苷酸>
本发明的第三方面提供了一种多核苷酸,其编码本发明第一方面所述的腺苷脱氨酶或者编码本发明第二方面所述的碱基编辑器融合蛋白。
<表达载体>
本发明的第四方面提供了一种载体,其包含本发明第三方面所述的多核苷酸。在本发明的一些实施方案中,所述载体是哺乳动物表达载体。在一些实施方案中,所述表达载体选自腺相关病毒、逆转录病毒载体、腺病毒载体、慢病毒载体、仙台病毒载体和疱疹病毒载体中的一种或多种。在一些实施方案中,所述载体包含启动子。
<细胞>
本发明的第五方面提供了一种细胞,其包含本发明第一方面所述的腺苷脱氨酶、本发明第二方面所述的碱基编辑器融合蛋白、本发明第三方面所述的多核苷酸和本发明第四方面所述的载体中的一种或多种。在本发明的一些实施方案中,所述细胞是原核细胞、真核细胞,进一步地,可以是细菌细胞、植物细胞、昆虫细胞、人类细胞或哺乳动物细胞。
<碱基编辑器系统>
本发明的第六方面提供了一种碱基编辑器系统。在一些实施方案中,碱基编辑器系统包含本发明第一方面所述的腺苷脱氨酶、核酸可编程核苷酸结合结构域、以及引导多核苷酸。在另一些实施方案中,碱基编辑器系统包含本发明第二方面所述的碱基编辑器融合蛋白,以及引导多核苷酸。
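In some embodiments of the present invention, the guide polynucleotide is a guide RNA (gRNA), a short synthetic RNA consisting of a scaffold sequence required for Cas protein binding and a user-defined spacer of about 20 nucleotides that defines the genomic target to be modified. A skilled person can therefore change the genomic target of the Cas protein by changing the spacer; specificity depends in part on how specific the gRNA targeting sequence is for the genomic target compared with the rest of the genome.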
在本发明的更具体的一些实施方案中,引导多核苷酸为sgRNA,其由Cas蛋白结合所需的骨架序列和用户定义的约20个核苷酸间隔序列(spacer)组成。
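Different scaffold sequences can be selected for Cas proteins of different origins or types. In some more specific embodiments of the present invention, the domain that binds the Cas9 protein (SpCas9), i.e., the scaffold sequence of the sgRNA, is: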
<药物组合物、试剂盒、递送系统、用途及方法>
本发明第七方面提供了一种药物组合物,其包含本发明第一方面所述的腺苷脱氨酶、本发明第二方面所述的碱基编辑器融合蛋白、本发明第三方面所述的多核苷酸、本发明第四方面所述的表达载体、本发明第五方面所述的细胞和本发明第六方面所述的碱基编辑器系统中的一种或多种,以及药学上可接受的载体。
在一些实施方案中,所述的药学上可接受的载体可以是递送载体,例如脂质、阳离子脂质或其他具有药物递送功能的聚合物。
本发明第八方面提供了一种试剂盒,具体为一种疾病治疗试剂盒,其包含本发明第一方面所述的腺苷脱氨酶、本发明第二方面所述的碱基编辑器融合蛋白、本发明第三方面所述的多核苷酸、本发明第四方面所述的表达载体、本发明第五方面所述的细胞、本发明第六方面所述的碱基编辑器系统和本发明第七方面所述的药物组合物中的一种或多种。
本发明的第九方面提供了一种递送系统,其包含本发明第一方面所述的腺苷脱氨酶、本发明第二方面所述的碱基编辑器融合蛋白、本发明第三方面所述的多核苷酸、本发明第四方面所述的载体、本发明第五方面所述的细胞和本发明第六方面所述的碱基编辑器系统中的一种或多种,以及递送介质。
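In some embodiments, the delivery medium can be a nanoparticle, a liposome, an exosome, a microvesicle, a gene gun, a cell-penetrating peptide, or the like.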
本发明第十方面提供了本发明第一方面所述的腺苷脱氨酶在制备碱基编辑器或碱基编辑器系统中的用途。
本发明第十一方面提供了本发明第一方面所述的腺苷脱氨酶、本发明第二方面所述的碱基编辑器融合蛋白、本发明第三方面所述的多核苷酸、本发明第四方面所述的表达载体、本发明第五方面所述的细胞、本发明第六方面所述的碱基编辑器系统或本发明第七方面所述的药物组合物、本发明第九方面所述的递送系统在制备用于治疗与点突变相关或由点突变引起的疾病的药物中的用途。
在一些实施方案中,所述药物可以将所述点突变进行校正。在一些实施方案中,所述点突变为G至A和/或C至T。
In some embodiments, the diseases associated with or caused by point mutations include hypercholesterolemia, transthyretin amyloidosis, and β-hemoglobinopathies; in other embodiments, non-limiting examples of the disease include Meier-Gorlin syndrome; Seckel syndrome; Joubert syndrome; Leber congenital amaurosis; Charcot-Marie-Tooth disease, type 2; Usher syndrome, type 2C; spinocerebellar ataxia; long QT syndrome 2; Sjögren-Larsson syndrome; hereditary fructosuria; neuroblastoma; Kallmann syndrome 1; Kallmann syndrome; metachromatic leukodystrophy; Rett syndrome; amyotrophic lateral sclerosis type 10; and Li-Fraumeni syndrome.
本发明第十二方面提供了核酸的碱基编辑方法,其包括将核酸与本发明第六方面所述的碱基编辑器系统相接触的步骤。
在一些实施方案中,核酸是DNA。进一步地,核酸是双链DNA。
在一些实施方案中,所述核酸包含与疾病相关的靶序列。
在一些实施方案中,所述靶序列包含与疾病相关的点突变。
在一些具体的实施方案中,所述靶序列包含G至A或C至T的与疾病或病症相关的点突变,并且其中突变A碱基的脱氨化导致与疾病或病症不相关的序列。
在一些实施方案中,所述靶序列编码蛋白质,并且其中所述点突变在密码子中,并且导致与野生型密码子相比由该突变密码子编码的氨基酸变化。
在一些实施方案中,所述靶序列位于剪接位点处,并且其中所述点突变导致与野生型转录物相比mRNA转录物的剪接变化。
在一些实施方案中,所述靶序列位于基因的启动子处,并且其中所述点突变导致基因表达增加。
在一些实施方案中,所述靶序列位于基因的启动子处,并且其中所述点突变导致基因表达减少。
在一些实施方案中,核酸位于生物体的基因组内。
在一些实施方案中,所述生物体为原核生物或真核生物或脊椎动物或哺乳动物。
在一些实施方案中,所述突变A碱基的脱氨化导致该突变密码子所编码的氨基酸变化,或导致编码野生型氨基酸的密码子,或导致mRNA转录物变化,或导致野生型mRNA转录物,或导致基因表达增加,或导致基因表达减少。
在一些实施方案中,所述接触在体外实施。
在一些实施方案中,所述接触在受试者体内实施。
在一些实施方案中,所述受试者已被诊断患有疾病或病症。
在一些实施方案中,所述疾病或病症与前蛋白转化酶枯草溶菌素9(PCSK9)基因的点突变相关。
In some embodiments, the disease includes hypercholesterolemia, transthyretin amyloidosis, and β-hemoglobinopathies. In other embodiments, non-limiting examples of the disease include Meier-Gorlin syndrome; Seckel syndrome; Joubert syndrome; Leber congenital amaurosis; Charcot-Marie-Tooth disease, type 2; Usher syndrome, type 2C; spinocerebellar ataxia; long QT syndrome; Sjögren-Larsson syndrome; hereditary fructosuria; neuroblastoma; Kallmann syndrome; metachromatic leukodystrophy; Rett syndrome; amyotrophic lateral sclerosis type 10; and Li-Fraumeni syndrome.
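A thirteenth aspect of the present invention provides a method for treating a subject having a disease associated with or caused by a point mutation. In some embodiments, the method comprises administering to a subject having such a disease an effective amount of the base editor fusion protein of the second aspect, the base editor system of the sixth aspect, the pharmaceutical composition of the seventh aspect, the kit of the eighth aspect, or the delivery system of the ninth aspect of the present invention, which corrects the point mutation or introduces an inactivating mutation into the disease-associated gene.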
<对照>
In the following examples, ABE8e, a highly efficient base editor evolved by the group of David R. Liu (Richter MF, Zhao KT, Eton E, Lapinaite A, Newby GA, Thuronyi BW, Wilson C, Koblan LW, Zeng J, Bauer DE, Doudna JA, Liu DR. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat Biotechnol. 2020 Jul;38(7):883-891. doi:10.1038/s41587-020-0453-z. Epub 2020 Mar 16. PMID: 32433547; PMCID: PMC7357821), is used as a control and compared with base editors comprising the adenosine deaminases provided by the present invention. ABE8e was obtained by optimizing the deaminase component of ABE7.10 (Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR. Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature. 2017 Nov 23;551(7681):464-471. doi:10.1038/nature24644. Epub 2017 Oct 25. PMID: 29160308; PMCID: PMC5726555). According to the experimental results reported by that group, the activity of ABE8e, expressed as the apparent first-order deamination rate constant (kapp), is 590-fold higher than that of ABE7.10.
具体地,在以下实施例中,ABE8e的氨基酸序列如下:
其中,粗体序列表示源自nCas9的序列;斜体表示接头序列;双下划线序列表示核定位序列;单下划线序列是ecTadA*脱氨酶序列;C端末尾*表示终止密码子位置。
相应地,ABE8e核苷酸序列如下:

在以下实施例中,ABE8e以及包含本发明提供的腺苷脱氨酶的碱基编辑器中所使用的napDNAbp均为nCas9,其氨基酸序列如下:

nCas9核苷酸序列如下:

发明的有益效果
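By site-directed mutagenesis of the parent adenosine deaminases, the present invention improves the editing efficiency of the corresponding base editors, broadening their application prospects and giving them potential utility. The base editors of the present invention show clearly higher editing efficiency than the wild type at the PCSK9 target site and at other sites, and have considerable potential in the treatment of diseases associated with or caused by point mutations (e.g., hypercholesterolemia, transthyretin amyloidosis, and β-hemoglobinopathies). Through extensive exploratory research on base editors formed from some of the adenosine deaminases and a nuclease, suitable positions within the nCas9 protein for embedding the adenosine deaminases of the present invention were identified, further improving base editing efficiency.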
对本文所述的本发明中任何一个实施方案,包括仅在实施例或权利要求中描述或仅在以下一个方面/部分中描述的实施方案,应该作如下理解:除非明确否认或组合形式不当,所述实施方案可以与本发明的任何其他一个或多个实施方案进行组合。
附图说明
图1是005V1-nCas9质粒图谱。
图2是005V1-nCas9及各个突变体碱基编辑器在PCSK9位点的编辑效率,所述编辑靶向位点为PCSK9基因,所用细胞为293T细胞。
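Fig. 3 compares the editing efficiency of 005V1-nCas9 and each mutant base editor at the PCSK9 site. To show the comparison between the mutant base editors and 005V1-nCas9 more clearly, the editing efficiency of 005V1-nCas9 is set to 1 and the editing efficiencies of the other mutant base editors are scaled proportionally.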
图4是存在编辑效率的突变体碱基编辑器以及005V1-nCas9的编辑效率对比,包括005V1-10-3-nCas9、005V1-10-1-nCas9、005V1-11-2-nCas9、005V1-15-1-nCas9、005V1-11-3-nCas9、005V1-11-5-nCas9、005V1-10-4-nCas9、005V1-11-4-nCas9、005V1-15-2-nCas9、005V1-3-3-nCas9、005V1-15-5-nCas9、005V1-5-4-nCas9、005V1-5-8-nCas9、005V1-10-5-nCas9、005V1-5-2-nCas9、005V1-3-6-nCas9、005V1-2-7-nCas9、005V1-nCas9等碱基编辑器的编辑效率对比,为便于展示,图中仅标注出腺苷脱氨酶名称指代对应的碱基编辑器。
图5是005V1-nCas9及部分突变体碱基编辑器在不同位点的编辑效率对比,图中颜色的不同深度代表了不同的编辑效率,为便于展示,图中仅标注出腺苷脱氨酶名称指代对应的碱基编辑器。
图6A和图6B分别为PHK09质粒图谱(图6A)以及004V1-nCas9结构图(图6B)。
图7为004V1-nCas9在位点1处A·T到G·C的编辑效率,其包括A~C,图7-A为004V1-nCas9和ABE8e在位点1的编辑效率的比较,编辑位置为sgRNA 5’端起+3,+5,+7,+8的腺嘌呤脱氧核苷酸,误差线表示平均值±SEM,每组样品3个生物学重复;图7-B为碱基编辑器004V1-nCas9转染后在位点1的测序结果;图7-C为碱基编辑器ABE8e转染后在位点1的测序结果。
图8为004V1-nCas9在位点17处A·T到G·C的编辑效率,其包括A~C,图8中的A为004V1-nCas9和ABE8e在位点17的编辑效率的比较,编辑位置为sgRNA 5’端起+3,+4,+5,+7的腺嘌呤脱氧核苷酸,误差线表示平均值±SEM,每组样品3个生物学重复;图8中的B为碱基编辑器004V1-nCas9转染后在位点17的测序结果;图8中的C为碱基编辑器ABE8e转染后在位点17的测序结果。
图9为004V1-nCas9在位点18处A·T到G·C的编辑效率,其包括A~C,图9-A为p004V1-nCas9和ABE8e在位点18的编辑效率的比较,编辑位置为sgRNA 5’端起+3,+5,+7,+9的腺嘌呤脱氧核苷酸,误差线表示平均值±SEM,每组样品3个生物学重复;图9-B为碱基编辑器004V1-nCas9转染后在位点18的测序结果;图9-C为碱基编辑器ABE8e转染后在位点18的测序结果。
图10为004V1-nCas9在PCSK9位点处A·T到G·C的编辑效率,其包括A~C,图10-A为004V1-nCas9和ABE8e在PCSK9位点的编辑效率的比较,编辑位置为sgRNA 5’端起+6的腺嘌呤脱氧核苷酸,误差线表示平均值±SEM,每组样品3个生物学重复;图10-B为碱基编辑器004V1-nCas9转染后在PCSK9位点的测序结果;图10-C为碱基编辑器ABE8e转染后在PCSK9位点的测序结果。
图11为腺苷脱氨酶004V1各个突变体-nCas9以及ABE8e在不同位点的编辑效率。
图12为腺苷脱氨酶004V1另外一些突变体构成的碱基编辑器在不同位点的编辑效率。
图13示出了部分嵌合碱基编辑器及脱氨酶连接到nCas9的N端或C端的碱基编辑器在位点1(site1)的碱基编辑效率。
图14示出了部分嵌合碱基编辑器及脱氨酶连接到nCas9的N端或C端的碱基编辑器在PCSK9位点的碱基编辑效率。
图15示出了部分嵌合碱基编辑器及脱氨酶连接到nCas9的N端或C端的碱基编辑器在FANCF位点的碱基编辑效率。
具体实施方式
Unless otherwise specified, the experiments and methods described in the examples were carried out essentially according to conventional methods well known in the art and described in various references. For example, the conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA used in the present invention can be found in Sambrook, Fritsch, and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel et al., eds. (1987)); the METHODS IN ENZYMOLOGY series (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames, and G.R. Taylor, eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).
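Where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the present invention by way of illustration and are not intended to limit the scope of protection claimed. All publications and other references mentioned herein are incorporated herein by reference in their entirety.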
实施例1 腺苷脱氨酶005V1及各个突变体构成的碱基编辑器的获得
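Using bioinformatic function prediction, the applicant identified key amino acid positions likely to affect biological function and mutated them, obtaining multiple adenosine deaminase mutants whose base editors show significantly higher editing efficiency than the base editor built from adenosine deaminase 005V1. The specific amino acid mutations at each site of the 005V1 deaminase are shown in Tables 1a and 1b, and the mutations carried by each mutant are shown in Table 2.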
表1a 005V1脱氨酶具体氨基酸突变方式

表1b 005V1脱氨酶具体氨基酸突变方式


表2腺苷脱氨酶005V1各个突变体的突变形式

通过基于PCR的定点诱变产生不同脱氨酶变体的碱基编辑器,具体的方法是以突变位点附近4-6个氨基酸为中心对编码005V1-nCas9碱基编辑器的DNA序列进行扩增,同时在引物上引入需要突变的序列,通过对扩增后片段进行同源重组或者酶切连接的方式得到不同的突变体碱基编辑器。突变体碱基编辑器所涉及的PCR引物如表3-1、3-2、3-3所示。
表3-1各个突变体碱基编辑器所用PCR引物


表3-2各个突变体碱基编辑器所用PCR引物



表3-3各个突变体碱基编辑器所用PCR引物


申请人通过PCR的定点诱变脱氨酶005V1-nCas9得到相应的突变体碱基编辑器,具体方法如下:
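As described above, the applicant designed two mutagenic primers near each mutation site and incorporated the desired mutation into the primers; the mutations and the mutagenic primers used are shown in Table 2 and Tables 3-1 to 3-3, respectively. The full-length plasmid DNA sequence was amplified with 2×Phanta Flash Master Mix (Vazyme, P520) according to the manufacturer's instructions. After amplification, residual original template was digested with DpnI (NEB, R1076L), and the target fragment was then end-digested and ligated using a restriction enzyme (NEB, R3733L) and T4 DNA ligase (NEB, M0202L). After transformation, clones were sent to 铂尚生物科技有限公司 for sequencing; the sequencing results showed that the different mutants of 005V1-nCas9 with the correct sequences had been obtained.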
实施例2 005V1-nCas9以及各个脱氨酶突变体构建的碱基编辑器的编辑活性验证
(1)sgRNA表达载体(sgRNA质粒)构建过程如下:
针对PCSK9靶点设计PCSK9-sgRNA,根据site1(位点1)、site8(位点8)、site16(位点16)、site18(位点18)靶点(可参见Gaudelli NM,Komor AC,Rees HA,Packer MS,Badran AH,Bryson DI,Liu DR.Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage.Nature.2017 Nov 23;551(7681):464-471.doi:10.1038/nature24644.Epub 2017 Oct 25.Erratum in:Nature.2018 May 2;:PMID:29160308;PMCID:PMC5726555.)设计site1-sgRNA、site8-sgRNA、site16-sgRNA、site18-sgRNA,具体的sgRNA如表4所示。
表4
根据目标序列设计sgRNA并合成寡核苷酸(oligos),所用到的sgRNA序列如SEQ ID NO:141~145所示。在每个sgRNA的上游序列5’端加CACC序列,下游序列的5’端加AAAC序列,因此用于合成的每个sgRNA上游序列形式为:5’-CACCXXXXXXXXXXXXXXXXXXXX(20nt)-3’,下游序列形式为:5’-AAACXXXXXXXXXXXXXXXXXXXX(20nt)-3’。经合成后,上、下游序列通过预设程序(95℃,5min;95℃-85℃以-2℃/s;85℃-25℃以-0.1℃/s;保持在4℃)进行退火,将退火产物连接到经过BsmBI(NEB:R0739L)线性化的PHK09载体(质粒图谱如图6A所示,实验室自有,其已包含sgRNA的骨架序列)上,PHK09载体的序列如下所示:





其中,sgRNA质粒构建中所使用的体系如下:
PHK09载体的线性化体系如下:PHK09载体3μg;缓冲液(NEB:R0539L)6μL;BsmBI 2μL;ddH2O补齐到60μL,37℃酶切过夜。
sgRNA退火产物与线性化载体连接体系如下:T4连接酶缓冲液(NEB:M0202L)1μL,线性化载体20ng,经退火的oligo片段(10μM)5μL,T4连接酶(NEB:M0202L)0.5μL,ddH2O补齐到10μL,16℃连接过夜。
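The ligated vector was transformed into E. coli DH5α competent cells (唯地生物, DL1001) as follows. DH5α competent cells were taken from -80℃ and immediately placed on ice. After 5 minutes, once the cells had thawed, the ligation product was added, mixed gently by flicking the bottom of the tube, and left on ice for 25 minutes. The cells were heat-shocked in a 42℃ water bath for 45 seconds, then immediately returned to ice for 2 minutes. 700 μl of sterile, antibiotic-free LB medium was added to the tube, mixed, and the cells were recovered at 37℃, 200 rpm for 60 minutes. The cells were collected by centrifugation at 5000 rpm for 1 minute, about 100 μl of supernatant was retained to gently resuspend the pellet, and the suspension was spread on LB plates containing Amp. The plates were inverted and incubated overnight in a 37℃ incubator. Single colonies were picked; after confirmation by sequencing, positive clones were cultured with shaking, plasmids were extracted (TIANGEN, DP120-01), the concentration was measured, and the plasmids were stored at -20℃ until use.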
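The oligonucleotide design rule used in step (1), CACC added to the 5' end of the upstream oligo and AAAC added to the 5' end of the downstream oligo, can be sketched as below. One assumption is made explicit here: the 20-nt portion of the downstream oligo is taken to be the reverse complement of the spacer, as is usual for annealed oligo cloning; the text only writes it as 20 unspecified nucleotides. The function names are hypothetical, and the example spacer is the PCSK9-sgRNA sequence given in this document (SEQ ID NO:141).

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def design_sgrna_oligos(spacer: str,
                        top_overhang: str = "CACC",
                        bottom_overhang: str = "AAAC") -> tuple[str, str]:
    """Return (upstream, downstream) oligos, both written 5'->3'.

    Assumption: the downstream oligo carries the reverse complement of the
    spacer so that the two oligos anneal into a duplex with 4-nt overhangs.
    """
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G, T")
    return top_overhang + spacer, bottom_overhang + reverse_complement(spacer)


# PCSK9-sgRNA spacer as listed in this document (SEQ ID NO:141).
upstream, downstream = design_sgrna_oligos("Cccgcaccttggcgcagcgg")
print("upstream  5'-" + upstream + "-3'")
print("downstream 5'-" + downstream + "-3'")
```

After annealing, the two 4-nt 5' overhangs are what allow directional ligation into the BsmBI-linearized vector.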
(2)细胞培养与转染
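HEK293T cells (purchased from ATCC) were seeded in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin Streptomycin (v/v) (Gibco, 15140122), and cultured in a 37℃ cell incubator containing 5% CO2. Cells to be transfected were seeded in 24-well cell culture plates the day before; the next day the cells were inspected and transfected when they had grown to about 80% confluence. The amount of plasmid transfected per well of the 24-well plate was 0.4 μg of the 005V1-nCas9 plasmid or of each 005V1 mutant-nCas9 plasmid, and 0.4 μg of sgRNA plasmid. The plasmids were mixed and diluted in 25 μl of reduced-serum medium (源培生物, L530KJ), 2 μl of P3000 reagent was added, and the mixture was pipetted to mix (reagent A) and left to stand for 5 minutes. In parallel, 2 μl of Lipofectamine 3000 transfection reagent (Thermo, 11668019) was diluted in 25 μl of reduced-serum medium and mixed (reagent B), then left to stand for 5 minutes. Reagents A and B were combined, pipetted to mix, and left to stand for 20 minutes. The mixture was then added dropwise to the cells in the 24-well plate, and the plate was returned to the 37℃ incubator. Six hours after transfection, the medium was replaced with DMEM containing 10% FBS. Cells were collected 48 hours after transfection for measurement of editing efficiency.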
(3)将所收集的细胞进行基因组抽提(TIANGEN,DP304-03),根据实验需求设计引物,所用到的鉴定引物序列如表5所示。
表5各个靶点鉴定引物设计

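Using the genomic DNA as template, the sequence around the target site was amplified by PCR, and the PCR products were used for high-throughput deep sequencing (金唯智生物科技有限公司) or Sanger sequencing (铂尚生物技术(上海)有限公司) to determine editing efficiency. The reaction system used to amplify the target-site sequence was as follows: 2×Taq Master Mix (Vazyme, P112-03), 25 μL; Primer-F (10 pmol/μL), 1 μL; Primer-R (10 pmol/μL), 1 μL; template, 1 μL; ddH2O to 50 μL. Measurement of gene editing: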
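The "ddH2O to 50 μL" instruction in the PCR system above is a simple make-up-to-volume calculation. The sketch below works it through for the listed components; the helper name `water_to_final_volume` and the dictionary keys are illustrative labels only.

```python
def water_to_final_volume(components_ul: dict[str, float], final_volume_ul: float) -> float:
    """Volume of water (in uL) needed to bring the listed components to the final volume."""
    used = sum(components_ul.values())
    if used > final_volume_ul:
        raise ValueError("components already exceed the final volume")
    return final_volume_ul - used


# Component volumes as listed for the target-site PCR.
pcr_mix_ul = {
    "2x Taq Master Mix": 25.0,
    "Primer-F (10 pmol/uL)": 1.0,
    "Primer-R (10 pmol/uL)": 1.0,
    "template": 1.0,
}

water_ul = water_to_final_volume(pcr_mix_ul, final_volume_ul=50.0)
print(water_ul)  # 22.0

# Final primer concentration: 10 pmol/uL x 1 uL diluted into 50 uL.
primer_final_pmol_per_ul = 10.0 * 1.0 / 50.0
print(primer_final_pmol_per_ul)  # 0.2
```

The 2× master mix occupies exactly half of the final volume, which is consistent with a 1× working concentration.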
基因编辑效率计算方法参见Kluesner MG,Nedveck DA,Lahr WS,Garbe JR,Abrahante JE,Webber BR,Moriarity BS.EditR:A Method to Quantify Base Editing from Sanger Sequencing.CRISPR J.2018Jun;1(3):239-250.doi:10.1089/crispr.2018.0014.PMID:31021262;PMCID:PMC6694769。
在本实施例中,包含本发明提供的腺苷脱氨酶和nCas9的碱基编辑器结构如下:
NH2-[NLS]-[腺苷脱氨酶]-接头-[nCas9]-[NLS]-COOH。但仅仅用于示例,并不用于限定碱基编辑器的结构。该结构示例可参见SEQ ID NO:3和SEQ ID NO:190所示。
本实施例统计了各个碱基编辑器对PCSK9靶点的编辑效率(由腺嘌呤A突变为鸟嘌呤G的效率),对碱基编辑器编辑效率进行了测定和对比,005V1-nCas9以及各个脱氨酶突变体构成的碱基编辑器的编辑效率如表6-9所示:
表6

表7

表8
表9
005V1-nCas9以及各个突变体构成的编辑器的编辑效率及对比如图2-5所示。
如图2为005V1-nCas9和005V1各个突变体-nCas9在PCSK9位点处A·T到G·C的编辑效率,编辑位置为sgRNA 5’端起+6的腺嘌呤脱氧核苷酸,误差线表示平均值±SEM,每组样品3~5个生物学重复。
如图3所示,005V1-10-3-nCas9、005V1-11-2-nCas9、005V1-10-1-nCas9、005V1-10-4-nCas9、005V1-11-3-nCas9、005V1-15-1-nCas9、005V1-11-5-nCas9、005V1-11-4-nCas9、005V1-15-2-nCas9、005V1-3-3-nCas9、005V1-5-4-nCas9、005V1-15-5-nCas9、005V1-5-8-nCas9、005V1-10-5-nCas9、005V1-5-2-nCas9、005V1-3-6-nCas9、005V1-2-7-nCas9 在PCSK9位点的编辑效率明显提高,平均编辑效率均在20%以上,相比于原始005V1-nCas9编辑效率提高了1.07~2.41倍。
如图4所示,在PCSK9位点编辑效率明显提高的突变体中,其中005V1-10-3-nCas9编辑效率能够达到46.67%。005V1-11-2-nCas9和005V1-10-1-nCas9的编辑效率也达到35.67%以及35.33%。图中的大部分其他突变体所构成的碱基编辑器的编辑效率也高于005V1-nCas9的编辑效率。
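The normalization used for the comparison in Fig. 3, where the editing efficiency of 005V1-nCas9 is set to 1 and the other editors are scaled proportionally, can be sketched as follows. The three mutant efficiencies are the values quoted in the text; the parent value is a placeholder chosen for illustration only, since the measured efficiency of 005V1-nCas9 is not stated in this passage. The function name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(efficiencies: dict[str, float], reference: str) -> dict[str, float]:
    """Scale every editing efficiency so that the reference editor equals 1."""
    reference_value = efficiencies[reference]
    if reference_value <= 0:
        raise ValueError("reference efficiency must be positive")
    return {name: value / reference_value for name, value in efficiencies.items()}


# Editing efficiencies in percent at the PCSK9 site.
efficiency_percent = {
    "005V1-nCas9": 20.0,          # HYPOTHETICAL placeholder, not a measured value
    "005V1-10-3-nCas9": 46.67,    # from the text
    "005V1-11-2-nCas9": 35.67,    # from the text
    "005V1-10-1-nCas9": 35.33,    # from the text
}

for name, fold in relative_efficiency(efficiency_percent, "005V1-nCas9").items():
    print(f"{name}: {fold:.2f}")
```

Because the parent value here is a placeholder, the printed ratios will not reproduce the fold changes reported for Fig. 3; substituting the measured parent efficiency gives the actual figures.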
如图5所示,005V1-5-4-nCas9、005V1-3-3-nCas9、005V1-5-2-nCas9、005V1-2-7-nCas9、005V1-3-6-nCas9、005V1-2-8-nCas9、005V1-5-3-nCas9、005V1-3-4-nCas9在site1位点的编辑窗口(+5位置,+7位置)、site8位点的编辑窗口(+2位置,+3位置,+4位置,+6位置)和site16的编辑窗口(+3位置,+4位置,+6位置)明显变窄,而上述碱基编辑器在site1位点编辑窗口变窄的同时,对该窗口内sgRNA 5’端起+5位置的腺嘌呤脱氧核苷酸依旧维持着极高编辑活性,可以实现高效精准编辑,具有较大的潜在应用价值。
实施例3
本实施例采用腺苷脱氨酶004V1,将其与nCas9融合表达构成一种新的编辑窗口更窄的腺嘌呤碱基编辑器004V1-nCas9。
在本实施例中,腺嘌呤碱基编辑器004V1-nCas9构建策略为:将ABE8e中的腺苷脱氨酶替换为004V1,从而得到一种新的腺嘌呤碱基编辑器004V1-nCas9。
在以下实施例中,包含本发明提供的腺苷脱氨酶和nCas9的碱基编辑器结构如下:
NH2-[NLS]-[腺苷脱氨酶]-接头-[napDNAbp]-[NLS]-COOH。但仅仅用于示例,并不用于限定碱基编辑器的结构。
在以下实施例中,所使用到的sgRNA参见表10,所用到的鉴定引物序列如表11所示。
表10
表11
sgRNA及碱基编辑器的具体构建如下。
1、sgRNA表达载体(sgRNA质粒)构建方法与实施例2相同。
2、腺嘌呤碱基编辑器004V1-nCas9表达载体(004V1-nCas9质粒)的构建
本实施例制备了腺嘌呤碱基编辑器表达载体004V1-nCas9。腺苷脱氨酶004V1的核苷酸序列如SEQ ID NO:156所示。
004V1的核苷酸序列:
上述脱氨酶004V1的核苷酸序列已根据人密码子使用的偏好性进行密码子优化,委托生工生物工程(上海)股份有限公司完成540bp的脱氨酶004V1基因人工合成工作并将合成的基因替换ABE8e序列的第63-560位核苷酸。004V1-nCas9质粒图谱如图6B所示。(对应的,其中,004V1-nCas9的氨基酸序列如SEQ ID NO:190所示。)
004V1-nCas9的核苷酸序列如下所示:


3、细胞培养与转染
将HEK293T细胞(购自ATCC)接种于添加了10%FBS(v/v)的DMEM培养基中(Gibco,11965092),其中含1%Penicillin Streptomycin(v/v)(Gibco,15140122),在含有5%CO2的37℃细胞培养箱中进行培养。用于转染的细胞,前一天接种于24孔细胞培养板中进行培养,第二天观察细胞,当细胞生长到细胞密度为80%左右时进行转染。24孔板每孔转染的质粒用量分别是p004V1-nCas9质粒0.4μg,sgRNA质粒0.4μg。将质粒混合后用25μl的减血清培养基(源培生物,L530KJ)培养基稀释,再加入2μl p3000试剂,吹打混匀作为试剂A,静置5分钟。同时,将2μl的Lipofectamine 3000转染试剂(Thermo,11668019)用25μl的减血清培养基稀释并混匀,作为试剂B,静置5分钟。将上述试剂A与试剂B混合并吹打均匀,静置20分钟。静置结束后将混合试剂逐滴加入待转染的24孔板细胞中,放回37℃培养箱培养。转染6小时后将培养基换为含有10%FBS的DMEM培养基。转染48小时后收集细胞进行编辑效率的检测。
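4. Measurement of the editing efficiency of adenine base editor 004V1-nCas9 at endogenous gene sites
Genomic DNA was extracted (TIANGEN, DP304-03) from the cells described in "3. Cell culture and transfection". Primers were designed as required by the experiment; the identification primer sequences used are SEQ ID NO:148-149, 166-167 and 154-155 in Table 11. With the genomic DNA as template, the sequence around each target site was amplified by PCR, and the PCR products were analysed by high-throughput deep sequencing (金唯智生物科技有限公司) or Sanger sequencing (铂尚生物技术(上海)有限公司) to determine editing efficiency. The reaction system for amplifying the target site sequence was: 2×Taq Master Mix (Vazyme, P112-03) 25 μL; Primer-F (10 pmol/μL) 1 μL; Primer-R (10 pmol/μL) 1 μL; template 1 μL; ddH2O to 50 μL.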
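The reaction system above leaves the water volume implicit ("ddH2O to 50 μL"). A short Python sketch makes the arithmetic explicit and also derives the final primer concentration; the variable names are introduced here for illustration.

```python
def pcr_mix(total_ul=50.0):
    """Return component volumes (uL) for one reaction, filling up with water."""
    fixed = {
        "2x Taq Master Mix": 25.0,
        "Primer-F (10 pmol/uL)": 1.0,
        "Primer-R (10 pmol/uL)": 1.0,
        "template": 1.0,
    }
    water = total_ul - sum(fixed.values())
    if water < 0:
        raise ValueError("fixed components exceed the total volume")
    return {**fixed, "ddH2O": water}


mix = pcr_mix()
# 10 pmol/uL stock x 1 uL in 50 uL total; 1 pmol/uL equals 1 uM.
primer_final_um = 10.0 * 1.0 / 50.0

for component, volume in mix.items():
    print(f"{component}: {volume} uL")
print(f"final primer concentration: {primer_final_um} uM each")
```

The water volume comes to 22 μL, the 2× master mix ends up at 1×, and each primer is at 0.2 μM in the final reaction.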
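The gene editing experiment was carried out as follows:
The 004V1-nCas9 plasmid was co-transfected into HEK293T cells (purchased from ATCC) with sgRNA plasmids for different sites and compared with the ABE8e plasmid (Addgene, plasmid #138489). At site 1 and site 17 the editing efficiency of 004V1-nCas9 was similar to that of ABE8e (FIG. 7, FIG. 8), whereas at site 18 the editing efficiency of 004V1-nCas9 was markedly better than that of ABE8e (FIG. 9).
As for the editing window, at site 1, site 17 and site 18 the editing window of 004V1-nCas9 was in every case narrower than that of ABE8e (see FIGS. 7 to 9).
For the calculation of gene editing efficiency, see Kluesner MG, Nedveck DA, Lahr WS, Garbe JR, Abrahante JE, Webber BR, Moriarity BS. EditR: A Method to Quantify Base Editing from Sanger Sequencing. CRISPR J. 2018 Jun;1(3):239-250. doi:10.1089/crispr.2018.0014. PMID:31021262; PMCID:PMC6694769. The results for this example are shown in FIGS. 7-9.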
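For the deep-sequencing route mentioned in this example, an A-to-G editing percentage at one position can be computed from base counts as sketched below. This is not the EditR method cited above, which works on Sanger traces; it is a minimal illustration under the assumption that reads have already been aligned and tallied, and the counts are invented.

```python
def a_to_g_percent(base_counts):
    """Percentage of reads carrying G at a position whose reference base is A.

    base_counts maps base letters to read counts at that position.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return 100.0 * base_counts.get("G", 0) / total


# Hypothetical read counts at one target adenine.
counts = {"A": 650, "C": 0, "G": 350, "T": 0}
print(a_to_g_percent(counts))
```

Computing the same quantity for every adenine in the protospacer gives the per-position profile from which an editing window can be read off.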
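Example 4
In this example, the adenine base editor 004V1-nCas9 obtained in Example 3 was applied to disease treatment.
Proprotein convertase subtilisin/kexin type 9 (PCSK9) is the ninth member of the kexin-like proprotein convertase subtilisin family and consists of 692 amino acid residues. PCSK9 is a negative regulator of the low-density lipoprotein receptor (LDLR): excess PCSK9 binds LDLR on the hepatocyte surface and accelerates its degradation, which reduces hepatocyte uptake of low-density lipoprotein cholesterol (LDL-C), raises LDL-C levels in the peripheral circulation, and ultimately raises blood cholesterol.
1. Construction of the sgRNA expression vector (sgRNA plasmid)
The sgRNA plasmid targeting PCSK9 in this example was constructed as described in Example 1. The sgRNA sequence used is shown in SEQ ID NO:141.
PCSK9-sgRNA: Cccgcaccttggcgcagcgg (SEQ ID NO:141).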
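Because editing positions in this document are counted from the 5' end of the sgRNA, it can help to list where adenines sit in a spacer. The sketch below does this for the PCSK9-sgRNA sequence exactly as written above; the source does not state which adenine or which strand is the intended target at this site, so the output is only a positional index, not a claim about the edited base.

```python
def adenine_positions(protospacer):
    """1-based positions of A in the spacer, counted from the 5' end."""
    return [i for i, base in enumerate(protospacer.upper(), start=1)
            if base == "A"]


pcsk9_spacer = "Cccgcaccttggcgcagcgg"  # SEQ ID NO:141 as given in the text

print(len(pcsk9_spacer))
print(adenine_positions(pcsk9_spacer))
```

The same helper can be pointed at any other spacer to see which adenines fall inside a given editing window.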
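2. Cell culture and transfection
HEK293T cells in this example were cultured and transfected as in Example 3.
3. Measurement of the editing efficiency of the optimized base editing tool at the PCSK9 site
Editing efficiency in this example was measured as described in Example 1; the identification primer sequences used are shown in SEQ ID NO:146-147.
PCSK9 forward primer: Gctagccttgcgttccg (SEQ ID NO:146);
PCSK9 reverse primer: Gtccccaagatcgtgccaa (SEQ ID NO:147).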
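Basic properties of the two identification primers above can be checked with a few lines of Python. The sketch reports length and GC content and gives the reverse complement of the reverse primer; it is a convenience illustration and not part of the method described in the source.

```python
def primer_stats(seq):
    """Length, G+C count and GC percentage of a primer."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    return {"length": len(s), "gc_count": gc,
            "gc_percent": round(100.0 * gc / len(s), 1)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


forward = "Gctagccttgcgttccg"      # SEQ ID NO:146
reverse = "Gtccccaagatcgtgccaa"    # SEQ ID NO:147

print(primer_stats(forward))
print(primer_stats(reverse))
print(reverse_complement(reverse))
```

The reverse complement is the form in which the reverse primer's binding site appears on the forward strand of the amplicon.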
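In this example, the adenine base editor expression vector p004V1-nCas9 was co-transfected into HEK293T cells with the sgRNA plasmid targeting PCSK9. As shown in FIG. 10, 004V1-nCas9 edited more efficiently than the ABE8e adenine base editor. This indicates that base editor 004V1-nCas9 can be used for targeted treatment of hypercholesterolemia caused by high PCSK9 expression.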
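Example 5
This example used the following mutants of adenosine deaminase 004V1: adenosine deaminases 004V2, 004V3, 004V4, 004V7, 004V8, 004V10, 004V12 and 004V13. Adenine base editor expression vectors and sgRNA expression vectors were constructed by the same methods as in Example 3 or Example 4, cells were cultured and transfected, and editing efficiency was measured.
The amino acid sequences of the base editors built from these mutants are as follows:
004V2-nCas9 (SEQ ID NO:191), 004V3-nCas9 (SEQ ID NO:192), 004V4-nCas9 (SEQ ID NO:193), 004V7-nCas9 (SEQ ID NO:194), 004V8-nCas9 (SEQ ID NO:195), 004V10-nCas9 (SEQ ID NO:196), 004V12-nCas9 (SEQ ID NO:197), 004V13-nCas9 (SEQ ID NO:198).
The base editing efficiency results are shown in FIG. 11. Compared with ABE8e, 004V2-nCas9, 004V3-nCas9, 004V4-nCas9, 004V7-nCas9, 004V8-nCas9, 004V10-nCas9, 004V12-nCas9 and 004V13-nCas9 had markedly narrower editing windows at site 1 (+5, +7). 004V12-nCas9 had the narrowest window, showing editing activity only at position +5 with a base editing efficiency of 46%, and so can achieve efficient and precise editing. At site 18, the different mutants of 004V1-nCas9 all edited markedly more efficiently than ABE8e. At the PCSK9 site, 004V1-nCas9 and 004V3-nCas9 also edited more efficiently than ABE8e.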
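The nucleotide sequences of the base editors built from each adenosine deaminase are as follows:
Nucleotide sequence of 004V2-nCas9:

Nucleotide sequence of 004V3-nCas9:


Nucleotide sequence of 004V4-nCas9:


Nucleotide sequence of 004V7-nCas9:

Nucleotide sequence of 004V8-nCas9:


Nucleotide sequence of 004V10-nCas9:


Nucleotide sequence of 004V12-nCas9:

Nucleotide sequence of 004V13-nCas9: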


Example 6
Further amino acid positions of adenosine deaminase 004V1 were also mutated, giving adenosine deaminases 004V14, 004V15, 004V16, 004V17, 004V18, 004V19, 004V20, 004V21, 004V22, 004V23, 004V24, 004V25, 004V26, 004V27, 004V28, 004V29, 004V30, 004V31, 004V32, 004V33, 004V34, 004V35, 004V36, 004V37, 004V38, 004V39, 004V40 and 004V41. The amino acid mutations are listed in Table 12, and the mutations in each deaminase mutant are listed in Table 13.
Table 12. Amino acid mutations involved in the deaminase mutants


Table 13. Mutations in each deaminase mutant

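Adenine base editor expression vectors built from adenosine deaminases 004V14-004V41 and the PCSK9-sgRNA expression vector were constructed by the same methods as in Examples 2-4, and cells were cultured and transfected as in the examples above. The gene editing experiment was carried out as follows:
For the calculation of gene editing efficiency, see Kluesner MG, Nedveck DA, Lahr WS, Garbe JR, Abrahante JE, Webber BR, Moriarity BS. EditR: A Method to Quantify Base Editing from Sanger Sequencing. CRISPR J. 2018 Jun;1(3):239-250. doi:10.1089/crispr.2018.0014. PMID:31021262; PMCID:PMC6694769.
The editing efficiencies of the base editors formed by each of the above adenosine deaminases (mutants of 004V1) with nCas9 are compared with that of 004V1-nCas9 below: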

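The results for this example are shown in FIG. 12. The base editors formed by deaminases 004V14-004V32 with nCas9 all edited the PCSK9 site more efficiently than 004V1-nCas9. The base editors formed by adenosine deaminases 004V14-004V29 with nCas9 all showed a large increase in editing efficiency over 004V1-nCas9, and 004V17-nCas9 to 004V29-nCas9 improved on 004V1-nCas9 by more than 20%. In particular, the editing efficiencies of 004V14-nCas9, 004V15-nCas9 and 004V16-nCas9 were more than twice that of 004V1-nCas9.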
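The comparisons in this paragraph (an improvement of more than 20%, or more than twice the baseline) can be expressed as a fold change against 004V1-nCas9. The sketch below assumes that "more than 20%" means a relative increase rather than percentage points, which the source does not specify, and it uses invented efficiencies rather than values from FIG. 12.

```python
def compare_to_baseline(baseline_pct, variant_pct):
    """Fold change and relative increase of a variant over the baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    fold = variant_pct / baseline_pct
    return {
        "fold": round(fold, 2),
        "percent_increase": round((fold - 1.0) * 100.0, 1),
        "over_20_percent": fold > 1.2,
        "at_least_2x": fold >= 2.0,
    }


# Hypothetical editing efficiencies (%), for illustration only.
baseline = 15.0
print(compare_to_baseline(baseline, 33.0))
print(compare_to_baseline(baseline, 18.6))
```

With these made-up numbers the first variant would fall in the "more than twice" group and the second in the "more than 20%" group.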
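Example 7
To further improve the editing efficiency of the base editor fusion protein, the applicant explored the structure of the fusion protein, searching within the nCas9 protein for suitable insertion sites for the adenosine deaminase. The adenosine deaminases used were 005V1, 005V1-10-1 and 005V1-10-3; see Example 1 for the sequences of these three adenosine deaminases. The experimental steps were as follows:
1. Construction of the nCas9 plasmid
(1) Primers were designed (synthesized by 上海铂尚生物科技有限公司):
Table 14
ABE8e (Addgene, #138489) was PCR-amplified with a high-fidelity polymerase kit from Vazyme (诺唯赞; Vazyme, P501-d2). The reaction system is shown in Table 15:
Table 15
The PCR program is shown in the table below:
Table 16
The amplified PCR product was recovered according to the kit instructions (TIANGEN, universal DNA purification and recovery kit, DP214).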
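The purified PCR product was transformed into E. coli DH5α competent cells (唯地生物, DL1001) as follows. DH5α competent cells were taken from -80 °C and placed on ice immediately. After 5 minutes, once the cell pellet had thawed, the ligation product was added and mixed gently by flicking the bottom of the tube, and the tube was left on ice for 25 minutes. The cells were heat-shocked in a 42 °C water bath for 45 seconds, returned to ice immediately and left for 2 minutes. 700 μl of sterile antibiotic-free LB medium was added and, after mixing, the cells were recovered at 37 °C and 200 rpm for 60 minutes. The cells were collected by centrifugation at 5000 rpm for one minute; about 100 μl of supernatant was retained to gently resuspend the pellet, which was spread on LB medium containing Amp. The plate was inverted and incubated overnight at 37 °C. Single colonies were picked and, after confirmation by sequencing, positive clones were grown in shaking culture. The nCas9 plasmid was extracted with an endotoxin-free plasmid maxiprep kit (TIANGEN: DP120-01), its concentration was measured, and it was stored at -20 °C until use.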
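(2) Obtaining the DNA sequences encoding deaminases 005V1, 005V1-10-1 and 005V1-10-3
The 005V1-nCas9, 005V1-10-1-nCas9 and 005V1-10-3-nCas9 plasmids were PCR-amplified with primers; the reaction system and PCR program were the same as in Tables 15 and 16. The amplified PCR products were recovered according to the kit instructions (TIANGEN, universal DNA purification and recovery kit, DP214).
The PCR primers used are shown in the table below:
Table 17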

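(3) Design of the different chimeric base editors and of the primer sequences for the corresponding nCas9 plasmids
The insertion positions of deaminases 005V1, 005V1-10-1 and 005V1-10-3 within nCas9 were studied. Several chimeric base editors were designed, and nCas9 primer sequences were designed for each deaminase insertion position, as shown in Table 18.
Table 18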

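The primer sequences for the nCas9 plasmids corresponding to the different chimeric base editors in Table 18 were designed as follows:
Table 19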

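The nCas9 plasmid obtained in step (1) was amplified separately with each of the primer sequences in Table 19; the reaction system and PCR program were the same as in Tables 15 and 16. The amplified PCR products were recovered according to the kit instructions (TIANGEN, universal DNA purification and recovery kit, DP214). The primers used to amplify the nCas9 plasmid for each insertion position are listed in Table 19. For example, the amino acid sequence of 005V1-1249-ABE is as follows:
In this sequence, italics indicate the NLS, bold indicates the nCas9 fragments, single underlining indicates the linker, double underlining indicates the deaminase, and the asterisk at the C-terminus marks the position of the stop codon. The deaminase sequence is that of deaminase 005V1, which is inserted into nCas9 between positions 1249 and 1250.
(4) The PCR products of adenosine deaminases 005V1, 10-1 and 10-3 obtained in step (2) were joined by homologous recombination with the linearized PCR products for the different insertion positions from step (3), using the Gibson Assembly Master Mix kit (NEB, E2611S), to give the different chimeric recombinant plasmids. The reaction system is shown in the table below:
Table 19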
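The homologous recombination product was transformed into E. coli DH5α competent cells (唯地生物, DL1001) as follows. DH5α competent cells were taken from -80 °C and placed on ice immediately. After 5 minutes, once the cell pellet had thawed, the ligation product was added and mixed gently by flicking the bottom of the tube, and the tube was left on ice for 25 minutes. The cells were heat-shocked in a 42 °C water bath for 45 seconds, returned to ice immediately and left for 2 minutes. 700 μl of sterile antibiotic-free LB medium was added and, after mixing, the cells were recovered at 37 °C and 200 rpm for 60 minutes. The cells were collected by centrifugation at 5000 rpm for one minute; about 100 μl of supernatant was retained to gently resuspend the pellet, which was spread on LB medium containing Amp. The plate was inverted and incubated overnight at 37 °C. Single colonies were picked and, after confirmation by sequencing, positive clones were grown in shaking culture. The chimeric recombinant plasmid was extracted with an endotoxin-free plasmid maxiprep kit (TIANGEN: DP120-01), its concentration was measured, and it was stored at -20 °C until use.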
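(5) Cell culture and transfection
HEK293T cells (purchased from ATCC) were seeded in DMEM medium (Gibco, 11965092) supplemented with 10% FBS (v/v) and 1% Penicillin Streptomycin (v/v) (Gibco, 15140122) and cultured in a 37 °C incubator with 5% CO2. Cells for transfection were seeded in 24-well plates the day before, inspected the next day, and transfected at about 80% confluence. Each well of the 24-well plate received 0.4 μg of chimeric recombinant plasmid and 0.4 μg of sgRNA plasmid. The sgRNAs used are shown in the table below:
Table 20
The PCSK9-sgRNA and site1-sgRNA sequences used in this example are the same as the corresponding sequences in the other examples; FANCF-sgRNA is an sgRNA targeting the FANCF gene.
The chimeric recombinant plasmid and sgRNA plasmid were mixed, diluted in 25 μl of reduced-serum medium (源培生物, L530KJ), supplemented with 2 μl of P3000 reagent and mixed by pipetting to give reagent A, which was left to stand for 5 minutes. In parallel, 2 μl of Lipofectamine 3000 transfection reagent (Thermo, 11668019) was diluted in 25 μl of reduced-serum medium and mixed to give reagent B, which was left to stand for 5 minutes. Reagents A and B were combined, mixed by pipetting and left to stand for 20 minutes. The mixture was then added dropwise to the cells in the 24-well plate, which was returned to the 37 °C incubator. Six hours after transfection the medium was replaced with DMEM containing 10% FBS. Cells were collected 48 hours after transfection for measurement of editing efficiency.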
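(6) Measurement of editing efficiency
Genomic DNA was extracted from the HEK293T cells with a genomic DNA extraction kit (TIANGEN, DP304-03). Primers were designed as required by the experiment; the identification primer sequences used are shown in Table 21.
Table 21
With the genomic DNA as template, the sequence around each sgRNA target site was amplified by PCR with the identification primers, and the PCR products were analysed by Sanger sequencing (铂尚生物技术(上海)有限公司) to determine editing efficiency.
The reaction system for amplifying the target site sequence was:
2×Taq Master Mix (Vazyme, P112-03) 25 μL; Primer-F (10 pmol/μL) 1 μL; Primer-R (10 pmol/μL) 1 μL; template 1 μL; ddH2O to 50 μL.
For the calculation of gene editing efficiency, see Kluesner MG, Nedveck DA, Lahr WS, Garbe JR, Abrahante JE, Webber BR, Moriarity BS. EditR: A Method to Quantify Base Editing from Sanger Sequencing. CRISPR J. 2018 Jun;1(3):239-250. doi:10.1089/crispr.2018.0014. PMID:31021262; PMCID:PMC6694769.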
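The results for this example are shown in FIGS. 13-15. FIG. 13 shows the base editing efficiency at site1 of each chimeric base editor and of the base editors in which the deaminase is attached to the N-terminus or C-terminus of nCas9. At site1, base editors 005V1-C-ABE, 005V1-10-1-C-ABE and 005V1-10-3-C-ABE showed no editing activity. Among the other base editors, 005V1-1047-1064-ABE, 005V1-1048-1063-ABE and 005V1-1249-ABE edited position A5 of site1 significantly more efficiently than 005V1-N-ABE; in particular, the editing efficiency of 005V1-1249-ABE was nearly twice that of 005V1-N-ABE. 005V1-776-ABE, 005V1-793-ABE, 005V1-905-ABE and 005V1-919-ABE edited particular positions of site1 more efficiently than 005V1-N-ABE, which can be exploited to edit specific gene positions for disease treatment. 005V1-10-1-1249-ABE edited every position of site1 more efficiently than 005V1-10-1-N-ABE, and 005V1-10-3-1249-ABE edited every position of site1 more efficiently than 005V1-10-3-N-ABE. For 005V1 and its mutants 005V1-10-1 and 005V1-10-3, choosing the nCas9 insertion position between residues 1249 and 1250 therefore gives better editing efficiency.
FIG. 14 shows the base editing efficiency at the PCSK9 site of each chimeric base editor and of the base editors in which the deaminase is attached to the N-terminus or C-terminus of nCas9. At the PCSK9 site, 005V1-1249-ABE edited more efficiently than 005V1-N-ABE, 005V1-10-1-1249-ABE edited more efficiently than 005V1-10-1-N-ABE, and 005V1-10-3-1249-ABE edited more efficiently than 005V1-10-3-N-ABE.
FIG. 15 shows the base editing efficiency at the FANCF site of each chimeric base editor and of the base editors in which the deaminase is attached to the N-terminus or C-terminus of nCas9. At the FANCF site, 005V1-1249-ABE edited particular positions more efficiently than 005V1-N-ABE, and its editing behaviour at some positions differed from that of 005V1-N-ABE. 005V1-10-1-1249-ABE showed editing at positions A10 and A12 of this site, where 005V1-10-1-N-ABE showed no editing activity; at the other positions its editing activity differed little from that of 005V1-10-1-N-ABE. 005V1-10-3-1249-ABE behaved similarly to 005V1-10-1-1249-ABE at this site: it showed editing at positions A10 and A12, where 005V1-10-3-N-ABE showed no editing activity, and at the other positions its editing activity differed little from that of 005V1-10-3-N-ABE.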
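The chimeric designs in Example 7 place the deaminase inside nCas9, for instance between residues 1249 and 1250. The Python sketch below shows the corresponding sequence operation on toy strings. The host and payload are placeholders rather than the application's sequences, and reading a name such as 1048-1063 as "keep residues up to 1048 and from 1063 onward, replacing what lies strictly between" is an assumption; the source does not spell this out.

```python
def insert_between(host, left_residue, payload):
    """Insert payload between 1-based residues left_residue and left_residue + 1."""
    if not (1 <= left_residue < len(host)):
        raise ValueError("insertion point must be inside the host")
    return host[:left_residue] + payload + host[left_residue:]


def replace_between(host, left_residue, right_residue, payload):
    """Keep residues <= left_residue and >= right_residue; replace those between."""
    if not (1 <= left_residue < right_residue <= len(host)):
        raise ValueError("invalid residue range")
    return host[:left_residue] + payload + host[right_residue - 1:]


# Toy host of 1300 residues and a payload of linker + deaminase + linker
# (all hypothetical placeholders).
host = "X" * 1300
payload = "GGS" + "D" * 10 + "GGS"

chimera = insert_between(host, 1249, payload)
print(len(chimera))                       # 1300 + 16
print(chimera[1249:1249 + len(payload)])  # the inserted payload
```

Flanking the deaminase with linkers mirrors the annotated 005V1-1249-ABE sequence, in which linker segments are marked alongside the nCas9 fragments.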

Claims (26)

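  1. An adenosine deaminase comprising one or more of the following sequences:
    (i) the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170;
    (ii) an amino acid sequence having at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170 and retaining the deaminase activity of the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170;
    (iii) an amino acid sequence in which one or more amino acid residues are added, substituted, deleted or inserted in the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170 and which retains the deaminase activity of the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170; or
    (iv) an amino acid sequence encoded by a nucleotide sequence that hybridizes under stringent conditions to a polynucleotide sequence encoding the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170, the amino acid sequence retaining the deaminase activity of the amino acid sequence shown in SEQ ID NO:1 or SEQ ID NO:170, wherein the stringent conditions are medium stringency, medium-high stringency, high stringency or very high stringency conditions.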
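  2. The adenosine deaminase according to claim 1, wherein, relative to the amino acid sequence shown in SEQ ID NO:1, the adenosine deaminase has alterations at any plurality of the following amino acid positions: 33, 35, 36, 46, 47, 48, 49, 104, 105, 107, 148, 149, 150, 151, 152, 153, 154, 155.
  3. The adenosine deaminase according to claim 1, wherein the substitution is a substitution at one or more of the following positions of the amino acid sequence shown in SEQ ID NO:170:
    S15, D16, H17, E18, F19, N20, D21, E22, Y23, W24, M25, R26, H27, A28, L29, T30, K33, R34, A35, R36, V41, V43, L47, L49, N51, N59, A61, I62, L64, A69, E72, G80, L81, V82, L83, Q84, N85, Y86, I89, D90, A91, T92, V95, F97, A106, R111, I112, S113, R114, L115, F117, V119, R120, N121, S122, K123, R124, N132, V133, L134, N135, P137, G138, M139, N140, H141, R142, E144, D160, V168, F169, N170.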
  4. The adenosine deaminase according to claim 2, wherein the deaminase comprises alterations selected from the following amino acid positions relative to the amino acid sequence shown in SEQ ID NO:1:
    V33I, D35G, D35R, D36G, D36N, A46C, I47Y, I47R, I47V, T48G, T48R, L49T, L49K, L49H, V104A, V104M, S105C, S105G, S107R, S107K, S107A, Q148P, Q148C, Q148A, Q148G, Q149M, Q149G, Q149L, P150R, P150L, P150C, R151K, E152R, E152T, E152G, E152W, V153P, V153F, V153I, V153T, F154H, F154K, F154L, N155T, N155R, N155H.
  5. The adenosine deaminase according to claim 4, wherein the adenosine deaminase comprises, relative to the amino acid sequence shown in SEQ ID NO:1, the alterations of any one group selected from the following:
    (1) Q148G+Q149M+P150R; (2) S107A; (3) E152R+V153P+F154H+N155T; (4) S107K; (5) Q148P+Q149G+P150L; (6) A46C+I47Y+T48G+L49H; (7) E152T+V153F+N155T; (8) S107R; (9) V104M; (10) E152G+V153I+F154K+N155R; (11) S105C; (12) Q148C+Q149L+P150R+R151K; (13) Q148A+Q149G+P150C; (14) E152W+V153T+F154L+N155H; (15) S105G; (16) I47R+T48G+L49T; (17) D35G+D36N; (18) V33I+D35R+D36G; (19) V104A; (20) I47V+T48R+L49K.
  6. The adenosine deaminase according to claim 3, wherein the substitution is a substitution at one or more of the following positions of the amino acid sequence shown in SEQ ID NO:170: S15T, S15G, D16E, D16N, H17C, H17K, E18D, E18K, F19C, F19Y, F19S, N20Q, N20L, N20S, D21Q, E22D, Y23F, W24F, M25L, M25V, R26K, R26T, H27R, A28C, L29I, T30E, K33R, K33S, R34K, A35S, R36Q, L49F, N51G, L64M, L64Q, L64P, G80A, L81N, V82A, V82T, L83I, L83Q, Q84N, N85S, Y86W, I89E, I89L, D90G, A91C, A91T, T92D, A106C, I112L, S113K, R114K, V119L, S122N, S122P, R124H, R124T, N132K, V133I, L134F, N135H, N135S, P137F, G138A, M139L, N140K, H141A, R142L, R142S, E144H, D160E, V168A, F169C, F169V, N170D;
    preferably, the substitution is a substitution at one of the following combinations of positions of the amino acid sequence shown in SEQ ID NO:170:
    (1) S15T+D16E+H17K+F19Y+N20Q; (2) E22D+Y23F+W24F+R26K+H27R+L29I; (3) K33R+R34K+A35S; (4) G80A+L81N+V82A+L83I+Q84N+N85S+Y86W; (5) I89L+D90G+A91T+T92D; (6) I112L+S113K+R114K; (7) N132K+V133I+L134F+N135H; (8) P137F+G138A+M139L; (9) E18K+F19S+N20L; (10) M25L; (11) S15G+D16N+H17C; (12) L64Q+S122P+V168A; (13) R36Q+L64Q+S122P+F169V; (14) R36Q+V119L+V168A; (15) R36Q+V119L+N170D; (16) L64Q+V119L+V168A; (17) F169C; (18) L64Q+V119L+V168A; (19) L64P+R124T+F169V; (20) I89E+A91C; (21) L64Q+V119L+F169V; (22) L64M; (23) L64Q+S122P+N170D; (24) E18D+F19C+N20S; (25) L64P+V119L+N170D; (26) S122N+R124H+N135S+D160E; (27) R26T+H27R+A28C; (28) V82T+L83Q; (29) M25V; (30) D21Q+T30E+K33S; (31) N140K+H141A+R142S; (32) L49F+N51G; (33) A106C; (34) R142L+E144H.
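  7. A base editor fusion protein comprising the adenosine deaminase according to any one of claims 1 to 6 and a nucleic acid programmable nucleotide binding domain.
  8. The base editor fusion protein according to claim 7, wherein the nucleic acid programmable nucleotide binding domain is a Cas protein or an AGO protein.
  9. The base editor fusion protein according to claim 7 or 8, wherein the base editor fusion protein further comprises at least one nuclear localization signal sequence;
    optionally, the base editor fusion protein further comprises a linker;
    optionally, the linker comprises one or more of the sequences shown in SEQ ID NO:171-180;
    optionally, the nuclear localization signal sequence comprises one or more of the sequences shown in SEQ ID NO:181-189.
  10. The fusion protein according to claim 8, wherein the Cas protein includes the Cas9, Cas12 or Cas13 family.
  11. The fusion protein according to claim 10, wherein the Cas protein includes SpCas9, SaCas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas12g, Cas12h, Cas12i, Cas12j or Cas13.
  12. The fusion protein according to claim 10 or 11, wherein the Cas protein is a nuclease-inactive (dead) nuclease or a nickase.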
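  13. The base editor fusion protein according to any one of claims 7 to 12, wherein the base editor fusion protein comprises one or more of the following sequences:
    (i) the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3;
    (ii) an amino acid sequence having at least 80%, 82%, 85%, 87%, 90%, 92%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3 and retaining the polynucleotide binding and base editing activities of the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3;
    (iii) an amino acid sequence in which one or more amino acid residues are added, substituted, deleted or inserted in the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3 and which retains the polynucleotide binding and base editing activities of the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3; or
    (iv) an amino acid sequence encoded by a nucleotide sequence that hybridizes under stringent conditions to a polynucleotide sequence encoding the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3, the amino acid sequence retaining the polynucleotide binding and base editing activities of the amino acid sequence shown in SEQ ID NO:190 or SEQ ID NO:3, wherein the stringent conditions are medium stringency, medium-high stringency, high stringency or very high stringency conditions.
  14. The base editor fusion protein according to claim 13, wherein the base editor fusion protein comprises any one of the sequences shown in SEQ ID NO:191-198.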
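  15. A base editor fusion protein comprising the adenosine deaminase according to any one of claims 1 to 6 and the nucleic acid programmable nucleotide binding domain according to any one of claims 7 to 9, wherein the fusion protein has the structure NH2-[N-terminal fragment of the nucleic acid programmable nucleotide binding domain]-[deaminase]-[C-terminal fragment of the nucleic acid programmable nucleotide binding domain]-COOH.
  16. The base editor fusion protein according to claim 15, wherein the nucleic acid programmable nucleotide binding domain is an nCas9 domain, and the adenosine deaminase is inserted between amino acid positions 583-584, 768-769, 770-771, 776-777, 793-794, 905-906, 919-920, 1048-1063, 1049-1062, 1249-1250, 1263-1264 or 1276-1277 of SEQ ID NO:61.
  17. A polynucleotide encoding the adenosine deaminase according to any one of claims 1 to 6 or encoding the base editor fusion protein according to any one of claims 7 to 16, including but not limited to the nucleotide sequences shown in SEQ ID NO:2, 4 and 156-165.
  18. A vector comprising the polynucleotide according to claim 17.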
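  19. A cell comprising the adenosine deaminase according to any one of claims 1 to 6, the base editor fusion protein according to any one of claims 7 to 16, the polynucleotide according to claim 17 or the vector according to claim 18.
  20. A base editor system comprising the adenosine deaminase according to any one of claims 1 to 6, a nucleic acid programmable nucleotide binding domain, and a guide polynucleotide;
    or comprising the base editor fusion protein according to any one of claims 7 to 16 and a guide polynucleotide.
  21. A pharmaceutical composition comprising the adenosine deaminase according to any one of claims 1 to 6, the base editor fusion protein according to any one of claims 7 to 16, the polynucleotide according to claim 17, the vector according to claim 18, the cell according to claim 19 or the base editor system according to claim 20, and a pharmaceutically acceptable carrier.
  22. A kit comprising the adenosine deaminase according to any one of claims 1 to 6, the base editor fusion protein according to any one of claims 7 to 16, the polynucleotide according to claim 17, the vector according to claim 18, the cell according to claim 19 or the base editor system according to claim 20.
  23. A delivery system comprising the adenosine deaminase according to any one of claims 1 to 6, the base editor fusion protein according to any one of claims 7 to 16, the polynucleotide according to claim 17, the vector according to claim 18, the cell according to claim 19 or the base editor system according to claim 20, and a delivery vehicle.
  24. A method of base editing a nucleic acid, comprising the step of contacting the nucleic acid whose base is to be edited with the base editor system according to claim 20.
  25. Use of the adenosine deaminase according to any one of claims 1 to 6, the base editor fusion protein according to any one of claims 7 to 16, the polynucleotide according to claim 17, the vector according to claim 18, the cell according to claim 19, the base editor system according to claim 20, the pharmaceutical composition according to claim 21 or the delivery system according to claim 23 in treating a disease associated with or caused by a point mutation, or in the manufacture of a medicament for treating a disease associated with or caused by a point mutation.
  26. The use according to claim 25, mainly including use in treating diseases associated with the PCSK9 target, and/or use in the manufacture of a medicament for treating hypercholesterolemia, transthyretin amyloidosis or β-hemoglobinopathy.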
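The claims describe variants in two ways: by substitution notation such as S107A or Q148G+Q149M+P150R, and by percent identity to a reference sequence. The Python sketch below applies a substitution string to a sequence and then computes identity over equal-length sequences. The 10-residue reference is a made-up placeholder, not SEQ ID NO:1 or SEQ ID NO:170, and the simple position-by-position identity (no alignment, no gaps) is a simplifying assumption rather than the definition used in the application.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq, combo):
    """Apply substitutions written like 'S2A+E5K' (1-based positions) to seq."""
    residues = list(seq)
    for token in combo.split("+"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not (1 <= position <= len(residues)):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


def percent_identity(a, b):
    """Position-wise identity of two equal-length sequences, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


reference = "MSEVEFSHEY"            # hypothetical 10-residue toy sequence
variant = apply_mutations(reference, "S2A+E5K")
print(variant)
print(percent_identity(reference, variant))
```

Checking the wild-type residue before substituting guards against numbering a mutation against the wrong reference sequence.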
PCT/CN2023/078133 2022-04-07 2023-02-24 Adenosine deaminase, base editor and use thereof WO2023193536A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210363757.1 2022-04-07
CN202210363757.1A CN114634923B (zh) 2022-04-07 2022-04-07 Adenosine deaminase, base editor fusion protein, base editor system and use

Publications (1)

Publication Number Publication Date
WO2023193536A1 true WO2023193536A1 (zh) 2023-10-12

Family

ID=81952358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078133 WO2023193536A1 (zh) 2022-04-07 2023-02-24 Adenosine deaminase, base editor and use thereof

Country Status (2)

Country Link
CN (2) CN117925585A (zh)
WO (1) WO2023193536A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117925585A (zh) * 2022-04-07 2024-04-26 尧唐(上海)生物科技有限公司 Adenosine deaminase, base editor fusion protein, base editor system and use
CN117965505A (zh) * 2023-06-28 2024-05-03 微光基因(苏州)有限公司 Engineered adenosine deaminase and base editor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210252118A1 (en) * 2019-02-13 2021-08-19 Beam Therapeutics Inc. Compositions and methods for treating hemoglobinopathies
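CN114026227A (zh) * 2019-02-13 2022-02-08 比姆医疗股份有限公司 Modified immune cells having adenosine deaminase base editors for modifying a nucleobase in a target sequence
CN114072496A (zh) * 2019-02-13 2022-02-18 比姆医疗股份有限公司 Adenosine deaminase base editors and methods of using same to modify a nucleobase in a target sequence
CN114072180A (zh) * 2019-02-13 2022-02-18 比姆医疗股份有限公司 Compositions and methods for treating alpha-1 antitrypsin deficiency
CN114634923A (zh) * 2022-04-07 2022-06-17 尧唐(上海)生物科技有限公司 Adenosine deaminase, base editor fusion protein, base editor system and use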

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11946040B2 (en) * 2019-02-04 2024-04-02 The General Hospital Corporation Adenine DNA base editor variants with reduced off-target RNA editing
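CN110804628B (zh) * 2019-02-28 2023-05-12 中国科学院脑科学与智能技术卓越创新中心 Highly specific single-base gene editing tool without off-target effects
CN111778233B (zh) * 2019-04-04 2023-11-21 辉大(上海)生物科技有限公司 Novel single-base editing technique and use thereof
CN113699135B (zh) * 2021-08-10 2022-05-24 国家卫生健康委科学技术研究所 Adenine base editor fusion protein without PAM restriction and use thereof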


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DATABASE PROTEIN ANONYMOUS : "tRNA adenosine(34) deaminase TadA [Hafnia paralvei]", XP093097125, retrieved from NCBI *

Also Published As

Publication number Publication date
CN114634923B (zh) 2024-02-23
CN114634923A (zh) 2022-06-17
CN117925585A (zh) 2024-04-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23784100

Country of ref document: EP

Kind code of ref document: A1