WO2020199200A1 - New-type single-base editing technique and use thereof - Google Patents

New-type single-base editing technique and use thereof

Publication number
WO2020199200A1
Authority
WO
WIPO (PCT)
Prior art keywords
tada
amino acid
enzyme
vector
editing
Application number
PCT/CN2019/081532
Other languages
French (fr)
Chinese (zh)
Inventor
杨辉
周昌阳
Original Assignee
中国科学院脑科学与智能技术卓越创新中心
Application filed by 中国科学院脑科学与智能技术卓越创新中心
Priority to PCT/CN2019/081532
Publication of WO2020199200A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/01: Preparation of mutants without inserting foreign genetic material therein; Screening processes therefor
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62: DNA sequences coding for fusion proteins
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression

Definitions

  • the present invention relates to the field of biotechnology, in particular, to a novel single-base editing technology and its application.
  • DNA base editing methods developed in recent years can directly produce precise point mutations in genomic DNA without double-strand breaks (DSB).
  • Two types of base editors have been reported: cytosine base editors (CBE; C to T and G to A) and adenine base editors (ABE; A to G and T to C).
  • its application still has a key problem, namely off-target effect.
  • RNA targeting activity mediated by DNA base editing has not been studied before.
  • the cytosine base editor BE3 and the adenine base editor ABE7.10 produced tens of thousands of off-target RNA single nucleotide variants (SNVs), while cells without base editing showed only a few hundred SNVs.
  • the ABE7.10 developed by David Liu's laboratory at Harvard University can edit the third to eighth bases of the sgRNA target sequence. If other editable bases lie next to the target base, they are also edited non-specifically.
  • the purpose of the present invention is to provide a single-base editing technique with high accuracy, significantly reducing RNA off-target effects, and maintaining effective DNA targeting activity.
  • a mutant protein (mutein) of adenine deaminase TadA is provided; the mutein is a non-natural protein in which one or more amino acids of adenine deaminase TadA, selected from the following group, are mutated:
  • the 147th and 148th positions correspond to the 147th and 148th positions of the sequence shown in SEQ ID NO:1.
  • the adenine deaminase TadA is derived from a species selected from the group consisting of Escherichia coli (E. coli), Aquifex aeolicus (A. aeolicus), Bacillus subtilis (B. subtilis), and yeast CDD1.
  • the mutant protein has the activity of catalyzing the hydrolysis and deamination of adenine to form hypoxanthine.
  • the adenine deaminase TadA includes TadA* enzyme and wild-type TadA enzyme.
  • the adenine deaminase TadA is TadA* enzyme.
  • amino acid sequence of the wild-type TadA enzyme is shown in SEQ ID NO:1.
  • amino acid sequence of the TadA* enzyme is shown in SEQ ID NO: 2.
  • the phenylalanine (F) at position 147 is mutated to an amino acid residue other than phenylalanine.
  • the phenylalanine at position 147 is mutated to: alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).
  • preferably, the phenylalanine at position 147 is mutated to: leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).
  • the phenylalanine (F) at position 148 is mutated to an amino acid residue other than phenylalanine.
  • the phenylalanine at position 148 is mutated to: alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).
  • preferably, the phenylalanine at position 148 is mutated to: leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).
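The position-specific substitutions described above can be illustrated with a small helper. This is a minimal sketch rather than anything taken from the patent: the function name `apply_substitution` and the toy sequence are hypothetical, and positions are assumed to be 1-based, following the SEQ ID NO: 1 numbering.

```python
def apply_substitution(seq: str, position: int, expected: str, new: str) -> str:
    """Return seq with the residue at 1-based `position` changed from `expected` to `new`."""
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {seq[index]}"
        )
    return seq[:index] + new + seq[index + 1:]


# Toy 150-residue placeholder with F at positions 147 and 148.
# It is NOT the real TadA sequence (SEQ ID NO: 1 is not reproduced here).
toy = "G" * 146 + "FF" + "GG"
f148a = apply_substitution(toy, 148, "F", "A")
```

Checking the expected wild-type residue before substituting guards against off-by-one numbering mistakes, which are easy to make when the numbering of a variant is shifted relative to the reference.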
  • the remaining amino acid sequence of the mutant protein is the same or substantially the same as the sequence shown in SEQ ID NO.:1.
  • "substantially the same" means that at most 50 (preferably 1-20, more preferably 1-10, more preferably 1-5) amino acids differ, where the differences include amino acid substitutions, deletions or additions, and the mutant protein still has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.
  • the amino acid sequence of the mutant protein is shown in SEQ ID NO: 3.
  • the amino acid sequence of the mutant protein is shown in SEQ ID NO: 4.
  • the amino acid sequence of the mutant protein has at least 80% homology with the sequence shown in SEQ ID NO: 3 or SEQ ID NO: 4, preferably at least 85% or 90%, more preferably at least 95%, and most preferably at least 98%, and the homology is ⁇ 166/167 or 99.4%.
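The 166/167 figure quoted above is simple arithmetic: 166 matching residues out of 167 is about 99.4%. The sketch below reproduces it under stated assumptions: `percent_identity` is a hypothetical helper that compares two equal-length sequences position by position without gaps, whereas real homology calculations use a sequence alignment; the sequences are toy placeholders.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy 167-residue reference and a variant that differs at one position.
reference = "G" * 165 + "FF"
variant = reference[:166] + "A"
identity = percent_identity(reference, variant)
print(round(identity, 1))  # 166/167 as a percentage
```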
  • a gene editing enzyme is provided, and the structure of the gene editing enzyme is shown in formula I:
  • Z1 is the amino acid sequence of adenine deaminase TadA
  • Z2 is the amino acid sequence of TadA* enzyme
  • Z1 and/or Z2 is the amino acid sequence of the mutant protein according to the first aspect of the present invention.
  • Z3 is the coding sequence of Cas9 nuclease
  • L1, L2 and L3 are each independently an optional connecting peptide sequence
  • Z4 is absent or is a nuclear localization signal (NLS) element;
  • each "-" is independently a peptide bond.
  • the Z1 has the amino acid sequence of wild-type TadA enzyme.
  • the Z1 has the amino acid sequence of the wild-type TadA enzyme with F147A and/or F148A mutation.
  • the Z1 is a wild-type TadA enzyme with F147A and/or F148A mutations.
  • amino acid sequence of Z1 is shown in SEQ ID NO: 3.
  • the Z2 has the amino acid sequence of TadA* enzyme.
  • the Z2 has the amino acid sequence of the TadA* enzyme with F147A and/or F148A mutation.
  • the Z2 is a TadA* enzyme with F147A and/or F148A mutations.
  • amino acid sequence of Z2 is shown in SEQ ID NO: 4.
  • amino acid sequence of L1 is shown in SEQ ID NO: 5.
  • amino acid sequence of L1 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO: 5.
  • amino acid sequence of L2 is shown in SEQ ID NO: 6.
  • amino acid sequence of L2 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO: 6.
  • amino acid sequence of L3 is shown in SEQ ID NO:7.
  • amino acid sequence of L3 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:7.
  • the source of the Cas9 nuclease is selected from the group consisting of Streptococcus pyogenes, Staphylococcus aureus, mutants of Streptococcus pyogenes, and mutants of Staphylococcus aureus.
  • the Cas9 nuclease in Z3 can be replaced with Cpf1 nuclease, whose source is selected from the following group: Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
  • amino acid sequence of Z3 is shown in SEQ ID NO: 8.
  • amino acid sequence of Z3 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO: 8.
  • amino acid sequence of Z4 is shown in SEQ ID NO: 9.
  • amino acid sequence of Z4 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:9.
  • "substantially the same" means that at most 50 (preferably 1-20, more preferably 1-10, more preferably 1-5, most preferably 1-3) amino acids differ, where the differences include amino acid substitutions, deletions or additions.
  • alternatively, "substantially the same" means that the sequence identity between the amino acid sequence and the corresponding amino acid sequence is at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%.
  • amino acid sequence of the gene editing enzyme is shown in SEQ ID NO: 10.
  • a polynucleotide which encodes the gene editing enzyme as described in the second aspect of the present invention.
  • polynucleotide is selected from the following group:
  • the flanks of the ORF of the gene editing enzyme described in the second aspect of the present invention additionally contain auxiliary elements selected from the group consisting of a signal peptide, a secretory peptide, a tag sequence (such as 6His), or a combination thereof.
  • the signal peptide is a nuclear localization sequence.
  • the polynucleotide is selected from the following group: DNA sequence, RNA sequence, or a combination thereof.
  • a vector which contains the polynucleotide according to the third aspect of the present invention.
  • the vectors include expression vectors, shuttle vectors, and integration vectors.
  • a host cell contains the vector according to the fourth aspect of the present invention, or its genome integrates the polynucleotide according to the third aspect of the present invention.
  • the host is a prokaryotic cell or a eukaryotic cell.
  • the prokaryotic cell includes: Escherichia coli.
  • the eukaryotic cell is selected from the group consisting of yeast cells, plant cells, mammalian cells, human cells (such as HEK293T cells), or a combination thereof.
  • a method for single-base site-directed editing of genes including the steps:
  • the first vector contains a first nucleic acid construct
  • the first nucleic acid construct has the structure of formula II, from 5' to 3':
  • P1 is the first promoter sequence
  • X1 is a nucleotide sequence encoding the gene editing enzyme of the second aspect of the present invention.
  • L4 is absent or is a linker sequence
  • X2 is a polyA sequence
  • each "-" is independently a bond or a nucleotide linking sequence.
  • the first promoter is selected from the group consisting of CMV promoter, CAG promoter, PGK promoter, EF1α promoter, EFS promoter, or a combination thereof.
  • the first promoter sequence is a CMV promoter.
  • the length of the connecting sequence is 30-120 nt, preferably 48-96 nt, and preferably a multiple of 3.
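The length constraints on the connecting sequence (30-120 nt, preferably 48-96 nt, preferably a multiple of 3) can be checked mechanically. The following is a minimal sketch; `check_linker_length` and the keys of the returned dictionary are hypothetical names introduced only for illustration.

```python
def check_linker_length(length_nt: int) -> dict:
    """Report which of the stated length criteria a linker of `length_nt` nucleotides meets."""
    return {
        "in_allowed_range": 30 <= length_nt <= 120,    # 30-120 nt
        "in_preferred_range": 48 <= length_nt <= 96,   # preferably 48-96 nt
        "multiple_of_three": length_nt % 3 == 0,       # preferably a multiple of 3
    }


result = check_linker_length(96)
```

A length that is a multiple of 3 corresponds to a whole number of codons, so a linker of that length does not shift the reading frame of a downstream coding sequence.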
  • the first vector and the second vector may be the same or different.
  • the first vector and the second vector may be the same vector.
  • the first vector and/or the second vector further contain an expression cassette for expressing a selection marker.
  • the screening marker is selected from the group consisting of green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
  • the method is non-diagnostic and non-therapeutic.
  • the cells are from the following species: humans, non-human mammals, poultry, plants, or microorganisms.
  • the non-human mammal includes rodents (such as mice, rats, rabbits), cows, pigs, sheep, horses, dogs, cats, and non-human primates (such as monkeys).
  • the cell is selected from the group consisting of somatic cells, stem cells, germ cells, non-dividing cells or a combination thereof.
  • the cells are selected from the group consisting of kidney cells, epithelial cells, endothelial cells, nerve cells or a combination thereof.
  • when the method is used for gene editing, the editing window is the 4th to 7th bases of the 20-base sequence targeted by the sgRNA; the 5th base has the highest editing efficiency, and efficiency drops significantly on both sides.
  • the editing window of the non-mutated ABE7.10 editing system is wider than this method.
  • the editing window of the non-mutated system runs from the 3rd base to the 9th base; the 5th base has the highest editing efficiency, and efficiency decreases gradually on both sides.
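The effect of a narrower editing window can be illustrated by listing which adenines of a target fall inside each window. This is a sketch under stated assumptions: the window bounds are taken to be 1-based positions within the 20-base sgRNA target, the function `adenines_in_window` is hypothetical, and the example protospacer is invented for illustration.

```python
def adenines_in_window(protospacer: str, start: int, end: int) -> list:
    """Return 1-based positions of adenines between `start` and `end` (inclusive)."""
    protospacer = protospacer.upper()
    if len(protospacer) != 20:
        raise ValueError("expected a 20-base sgRNA target sequence")
    return [pos for pos in range(start, end + 1) if protospacer[pos - 1] == "A"]


# Hypothetical 20-base target; not a site from the patent.
target = "GCAAATACAGCTTGGCATCC"
narrow = adenines_in_window(target, 4, 7)  # window described for the mutated editor
wide = adenines_in_window(target, 3, 9)    # window described for non-mutated ABE7.10
```

For this invented target the narrow window contains fewer adenines than the wide one, which is the sense in which a narrower window reduces editing of neighboring, unintended bases.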
  • kits comprising:
  • the kit further includes:
  • (a2) A second container, and a second vector in the second container, the second vector containing an expression cassette for expressing sgRNA.
  • the first vector and/or the second vector further contain an expression cassette for expressing a selection marker.
  • first container and the second container may be the same container or different containers.
  • the kit also contains instructions that describe a method for infecting cells with the first vector and the second vector to perform single-base site-directed editing of genes in the cells.
  • Figure 1 shows the off-target RNA SNV results of each single-base editing system.
  • APOBEC1 is the cytosine deaminase of BE3.
  • TadA-TadA* wild-type TadA enzyme-evolved TadA heterodimer
  • TadA* modified TadA
  • E DNA targeting efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 2.
  • Each group: n = 3 replicates.
  • Figure 2 shows the characterization of off-target RNA SNV.
  • C Distribution of mutation types in each group. The number indicates the percentage of a certain mutation among all mutations.
  • Non-synonymous mutations induced by ABE7.10 that are located on oncogenes and tumor suppressors and have the highest editing rates. Gene names are shown in blue, amino acid mutations are shown in red, and single nucleotide conversions are shown in green. The GFP group served as a control for all comparisons. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t test.
  • Figure 3 shows the results of single-cell RNA SNV analysis of cells transfected with the base editor.
  • A SNV image analyzed by single-cell RNA sequencing method.
  • B The expression pattern of ABE, BE3 or GFP in a single cell from single-cell RNA-seq data.
  • F Distribution of mutation types in each cell. The number indicates the percentage of a certain mutation among all mutations.
  • G, H The ratio of SNV shared between any two samples in the same group. The ratio in each cell is calculated by dividing the number of overlapping SNVs between the two samples by the samples in the row.
  • Figure 4 shows the result of rational design of deaminase to eliminate off-target RNA SNV.
  • A Schematic diagram of BE3 and ABE7.10 variants. All deaminase mutations were performed under the background of BE3/ABE7.10. The point mutation is indicated by the red line.
  • G The representative editing site shows that ABE7.10 (F148A) has a narrower editing window. All values are expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, unpaired t test.
  • Figure 5 shows a schematic diagram of the plasmid.
  • Figure 6 shows a representative distribution of off-target RNA SNV on chromosomes.
  • A APOBEC1, BE3-site 3, BE3-RNF2; B: TadA-TadA*, ABE7.10-site 1 and ABE7.10-site 2
  • Figure 7 shows the distribution of mutation types for each repeat in all groups. The number indicates the percentage of a certain type of mutation among all mutations.
  • A Distribution of mutation types for each repeat in the GFP group.
  • B Distribution of mutation types for each repeat of APOBEC1 and BE3 groups with or without sgRNA.
  • Figure 8 shows that in all BE3 and ABE7.10 transfection groups, the number of genes containing overlapping off-target RNA SNVs was significantly higher than for randomly simulated genes. P values were calculated by two-sided Student's t-test.
  • Figure 9 shows the similarity between adjacent off-target RNA SNV sequence and target sequence
  • Figure 10 shows the rate of editing non-synonymous mutations induced by BE3 located on oncogenes and tumor suppressor genes. Single nucleotide conversions are shown in green, amino acid mutations are shown in red, and gene names are shown in blue.
  • Figure 11 shows the ratio of non-synonymous mutations induced by editing ABE7.10 located on oncogenes and tumor suppressor genes. Single nucleotide conversions are shown in green, amino acid mutations are shown in red, and gene names are shown in blue.
  • Figure 12 shows that only off-target RNA SNV was detected in RNA, not DNA.
  • the Sanger sequencing chromatogram showed that only U to C mutations were observed in the RNA of the two highest ranked oncogenes, TOPRS and CSDE1.
  • Figure 13 shows the expression level of the transfection vector in a single cell.
  • the expression levels of GFP, APOBEC1 and TadA-TadA* were quantified in all single cells sequenced.
  • the threshold is indicated by the blue dashed line.
  • the log2 (FPKM+1) thresholds of GFP, BE3 and ABE7.10 are 0.3, 1 and 0.3, respectively. Include cells with expression levels above the threshold for further analysis.
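The expression filter described above can be written out as a short check. This is a sketch, not the authors' pipeline: the dictionary `THRESHOLDS` and the function `passes_threshold` are hypothetical names, the threshold values are the ones quoted in the text, and "above the threshold" is read as a strict inequality.

```python
import math

# log2(FPKM + 1) thresholds quoted in the text for each transfected construct.
THRESHOLDS = {"GFP": 0.3, "BE3": 1.0, "ABE7.10": 0.3}


def passes_threshold(group: str, fpkm: float) -> bool:
    """True if log2(FPKM + 1) of the transfected construct is above its group threshold."""
    return math.log2(fpkm + 1) > THRESHOLDS[group]


# Hypothetical single cells: (group, FPKM of the transfected construct).
cells = [("GFP", 0.2), ("GFP", 0.3), ("BE3", 1.0), ("BE3", 1.5)]
kept = [cell for cell in cells if passes_threshold(*cell)]
```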
  • Figure 14 shows the mutation type distribution of all single cells.
  • Figure 15 shows the distribution of off-target RNA SNV from all single cells on human chromosomes, and its expression level is higher than the threshold.
  • Figure 16 shows the editing rate of BE3-induced non-synonymous mutations on oncogenes and tumor suppressor genes in single cells. Single nucleotide conversions are shown in green, amino acid mutations are shown in red, and gene names are shown in blue.
  • Figure 17 shows the editing rate of non-synonymous mutations induced by ABE7.10 on oncogenes and tumor suppressor genes located in single cells. Single nucleotide conversions are shown in green, amino acid mutations are shown in red, and gene names are shown in blue.
  • Figure 18 shows a representative distribution of off-target RNA SNV on human chromosomes of engineered BE3 and ABE7.10 variants.
  • Figure 20 shows the distribution of mutation types for each sample of the engineered variants of BE3 and ABE7.10.
  • Figure 21 shows the ratio of shared RNA SNV between any two samples in the engineered variants of BE3 and ABE7.10. Calculate the ratio in each cell by dividing the number of overlapping RNA SNV between the two samples by the number of RNA SNV in the row.
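The ratio described for each cell of the matrix (overlapping SNVs divided by the number of SNVs in the row sample) can be sketched as follows. The function `shared_ratio_matrix`, the sample names, and the SNV identifiers are hypothetical; the sketch only mirrors the stated division and is not the authors' analysis code.

```python
def shared_ratio_matrix(snvs: dict) -> dict:
    """For each (row, column) pair, return |row ∩ column| / |row|."""
    matrix = {}
    for row, row_set in snvs.items():
        matrix[row] = {}
        for col, col_set in snvs.items():
            shared = len(row_set & col_set)
            matrix[row][col] = shared / len(row_set) if row_set else 0.0
    return matrix


# Hypothetical SNV calls keyed by sample, written as "chromosome:position:change".
samples = {
    "rep1": {"chr1:100:A>G", "chr2:200:A>G", "chr3:300:A>G", "chr4:5:A>G"},
    "rep2": {"chr1:100:A>G", "chr5:50:A>G"},
}
matrix = shared_ratio_matrix(samples)
```

Because the denominator is the row sample, the matrix is not symmetric: the same overlap yields different ratios depending on which sample is in the row.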
  • Figure 23 shows the homology of TadA enzymes in multiple species.
  • the inventors unexpectedly discovered for the first time that the TadA fragment and the TadA* fragment in the adenine deaminase (TadA-TadA*) associated with the adenine base editor ABE
  • the gene editing window can be significantly narrowed while maintaining effective DNA targeting activity, which can significantly increase the accuracy of gene editing; moreover, experiments have shown that in the gene editing system carrying this mutation (i.e., TadA F148A-TadA* F148A), RNA off-target effects are greatly reduced.
  • the present invention has been completed on this basis.
  • base mutation refers to a substitution, insertion and/or deletion of a base at a certain position in a nucleotide sequence.
  • base substitution refers to the mutation of a base at a certain position in the nucleotide sequence to another different base, such as the mutation of A to G.
  • selection marker gene refers to a gene used to screen transgenic cells or transgenic animals in the transgenic process.
  • the selection marker gene that can be used in this application is not particularly limited and includes the various selection marker genes commonly used in the transgenic field; representative examples include (but are not limited to): luciferin or luciferase (such as firefly luciferase, Renilla luciferase), green fluorescent protein, yellow fluorescent protein, red fluorescent protein, or a combination thereof.
  • Cas protein refers to a nuclease.
  • a preferred Cas protein is the Cas9 protein.
  • Typical Cas9 proteins include (but are not limited to): Cas9 derived from Staphylococcus aureus.
  • the Cas9 protein can also be replaced by Cpf1 nuclease, whose source is selected from the following group: Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
  • TadA is a prokaryotic RNA editing enzyme.
  • TadA enzyme has adenine deaminase activity and can deaminate adenine (in adenosine, A) to hypoxanthine (in inosine, I). Recombinant TadA protein forms a homodimer, which produces inosine by deaminating the adenosine residue at the wobble position of tRNA Arg-2.
  • TadA has high homology among multiple species.
  • E. coli tadA shows sequence similarity to the yeast tRNA deaminase subunit Tad2p.
  • the terms "TadA7.10" and "TadA*" are used interchangeably and refer to a mutant based on the amino acid sequence of the wild-type TadA enzyme of the present invention.
  • the mutant amino acid residues include W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F and K157N.
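The mutation labels above follow the usual notation of wild-type residue, position, and new residue. A minimal sketch of parsing that notation is given below; the regular expression and the names `MUTATION`, `parse_mutation` and `TADA_STAR_MUTATIONS` are introduced here for illustration, while the list itself is copied from the text.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter new residue (e.g. "W23R").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple:
    """Split a label such as 'W23R' into ('W', 23, 'R')."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


# Mutations listed in the text for TadA* (TadA7.10).
TADA_STAR_MUTATIONS = [
    "W23R", "H36L", "P48A", "R51L", "L84F", "A106V", "D108N",
    "H123Y", "S146C", "D147Y", "R152P", "E155V", "I156F", "K157N",
]
parsed = [parse_mutation(label) for label in TADA_STAR_MUTATIONS]
```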
  • the terms "ABE7.10" and "TadA-TadA*" are used interchangeably and refer to a protein containing the amino acid sequences of the wild-type TadA enzyme and the TadA* enzyme, neither of which carries the mutation of the present invention.
  • the wild-type TadA enzyme has the amino acid sequence shown in SEQ ID NO: 1
  • the TadA* enzyme has the amino acid sequence shown in SEQ ID NO: 2.
  • As used herein, the terms "gene editing enzyme", "gene editing enzyme of the present invention", "TadA F148A-TadA* F148A of the present invention", and "ABE7.10 F148A" are used interchangeably and refer to the gene editing enzyme described in the second aspect of the present invention.
  • Z1 is the amino acid sequence of adenine deaminase TadA
  • Z2 is the amino acid sequence of TadA* enzyme
  • Z1 and/or Z2 is the amino acid sequence of the mutant protein according to the first aspect of the present invention.
  • Z3 is the coding sequence of Cas9 nuclease
  • L1, L2 and L3 are each independently an optional connecting peptide sequence
  • Z4 is absent or is a nuclear localization signal (NLS) element;
  • each "-" is independently a peptide bond.
  • the amino acid sequence of Z1 is an amino acid sequence with F148A mutation at position 148 based on the amino acid sequence shown in SEQ ID NO:1.
  • the amino acid sequence of Z2 is based on the amino acid sequence shown in SEQ ID NO: 2, an amino acid sequence in which the F148A mutation occurs at position 148.
  • amino acid sequence of Z3 is shown in SEQ ID NO: 8.
  • said L1, L2 and L3 each independently have an amino acid sequence selected from the group consisting of GGS, (GGS)2, (GGS)3, (GGS)4, (GGS)5, (GGS)6, (GGS)7, or a combination thereof.
  • the amino acid sequence of L1 is SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 5); the amino acid sequence of L2 is SGGSSGGSSGSETPGTSESATPESSGGSSGGSGS (SEQ ID NO: 6); the amino acid sequence of L3 is SGGS (SEQ ID NO: 7).
  • the Z4 is a nuclear localization signal element (NLS), and the amino acid sequence is PKKKRKV (SEQ ID NO: 9).
  • a typical amino acid sequence of the gene editing enzyme of the present invention is shown in SEQ ID NO: 10.
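The components defined above can be joined into a single fusion sequence, as sketched below. Several things here are assumptions: formula I is not reproduced in this text, so the order Z1-L1-Z2-L2-Z3-L3-Z4 is inferred from the component names; the Z1, Z2 and Z3 strings are short hypothetical placeholders rather than the real TadA, TadA* and Cas9 sequences; only the linker and NLS sequences are copied from the text.

```python
# Linker and NLS sequences as quoted in the text.
L1 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGS"    # SEQ ID NO: 5
L2 = "SGGSSGGSSGSETPGTSESATPESSGGSSGGSGS"  # SEQ ID NO: 6
L3 = "SGGS"                                # SEQ ID NO: 7
Z4 = "PKKKRKV"                             # NLS, SEQ ID NO: 9

# Hypothetical placeholders; the real sequences are SEQ ID NO: 3, 4 and 8.
Z1 = "MTADA"   # stands in for mutated wild-type TadA
Z2 = "TADAS"   # stands in for mutated TadA*
Z3 = "CASNINE" # stands in for the Cas9 nuclease


def assemble_fusion(parts: list) -> str:
    """Join protein segments N-terminus to C-terminus; each junction is a peptide bond."""
    return "".join(parts)


# Assumed component order; formula I itself is not shown in the text.
fusion = assemble_fusion([Z1, L1, Z2, L2, Z3, L3, Z4])
```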
  • the present invention also includes polypeptides or proteins that have 50% or more homology (preferably 60% or more, 70% or more, 80% or more, more preferably 90% or more, more preferably 95% or more, most preferably 98% or more, such as 99%) with the sequence shown in SEQ ID NO: 10 and have the same or similar functions.
  • the "same or similar function” mainly refers to "the activity of catalyzing the hydrolysis and deamination of adenine to form hypoxanthine".
  • amino acid numbering in the gene editing enzyme of the present invention is based on SEQ ID NO.: 10.
  • the amino acid numbering of an editing enzyme may be shifted relative to the numbering of SEQ ID NO: 10, for example by 1-5 positions toward the N-terminus or the C-terminus. Using sequence alignment techniques conventional in the art, a person skilled in the art would recognize such a shift as being within a reasonable range.
  • a mutant with at least 80% homology (such as 90%, 95%, or 98%) and the same or similar gene editing enzyme catalytic activity should not be excluded from the scope of the gene editing enzyme of the present invention merely because its amino acid numbering is shifted.
  • the gene editing enzyme of the present invention is a synthetic protein or a recombinant protein, that is, it can be a chemically synthesized product, or produced from a prokaryotic or eukaryotic host (for example, bacteria, yeast, and plants) using recombinant technology. Depending on the host used in the recombinant production protocol, the gene editing enzyme of the present invention may be glycosylated or non-glycosylated. The gene editing enzyme of the present invention may also include or not include the initial methionine residue.
  • the present invention also includes fragments, derivatives and analogs of the gene editing enzyme.
  • fragment refers to a protein that substantially maintains the same biological function or activity of the gene editing enzyme.
  • the gene editing enzyme fragment, derivative or analog of the present invention may be (i) a gene editing enzyme in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted amino acid residues may or may not be encoded by the genetic code; (ii) a gene editing enzyme with a substituent group on one or more amino acid residues; (iii) a gene editing enzyme formed by fusing the mature gene editing enzyme with another compound (for example, a compound that extends its half-life, such as polyethylene glycol); or (iv) a gene editing enzyme formed by fusing an additional amino acid sequence to the gene editing enzyme sequence (such as a leader sequence, a secretory sequence, a sequence used to purify the gene editing enzyme, a proprotein sequence, or a fusion protein with an antigenic IgG fragment).
  • these fragments, derivatives and analogs are within the scope well known to those skilled in the art.
  • Conservative substitutions (initial residue: representative substitutions; preferred substitution):
      Trp (W): Tyr, Phe; Tyr
      Tyr (Y): Trp, Phe, Thr, Ser; Phe
      Val (V): Ile, Leu, Met, Phe, Ala; Leu
  • the gene editing enzyme of the present invention can also be modified.
  • Modification (usually without changing the primary structure) forms include: in vivo or in vitro chemically derived forms of gene editing enzymes such as acetylation or carboxylation.
  • Modifications also include glycosylation, such as those gene editing enzymes produced by glycosylation modification during the synthesis and processing of gene editing enzymes or in further processing steps. This modification can be accomplished by exposing the gene editing enzyme to an enzyme that performs glycosylation (such as a mammalian glycosylase or deglycosylase).
  • Modified forms also include sequences with phosphorylated amino acid residues (such as phosphotyrosine, phosphoserine, phosphothreonine). It also includes gene editing enzymes that have been modified to improve their resistance to proteolysis or optimize their solubility.
  • polynucleotide encoding a gene editing enzyme may include a polynucleotide encoding the gene editing enzyme of the present invention, or a polynucleotide that also includes additional coding and/or non-coding sequences.
  • the present invention also relates to variants of the above-mentioned polynucleotides, which encode fragments, analogs and derivatives of polypeptides or gene editing enzymes having the same amino acid sequence as the present invention.
  • These nucleotide variants include substitution variants, deletion variants and insertion variants.
  • an allelic variant is an alternative form of a polynucleotide, which may be a substitution, deletion or insertion of one or more nucleotides, but does not substantially change the function of the gene editing enzyme it encodes.
  • the present invention also relates to polynucleotides that hybridize with the above-mentioned sequences and have at least 50%, preferably at least 70%, and more preferably at least 80% identity between the two sequences.
  • the present invention particularly relates to polynucleotides that can hybridize with the polynucleotide of the present invention under stringent conditions.
  • stringent conditions refer to: (1) hybridization and elution at lower ionic strength and higher temperature, such as 0.2× SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of denaturants, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C, etc.; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, more preferably at least 95%.
  • the gene editing enzyme and polynucleotide of the present invention are preferably provided in an isolated form, and more preferably, are purified to homogeneity.
  • the full-length sequence of the polynucleotide of the present invention can usually be obtained by PCR amplification method, recombinant method or artificial synthesis method.
  • primers can be designed according to the relevant nucleotide sequence disclosed in the present invention, especially the open reading frame sequence, and a commercially available cDNA library, or a cDNA library prepared by a conventional method known to those skilled in the art, can be used as a template to amplify the relevant sequences. When the sequence is long, it is often necessary to perform two or more PCR amplifications and then splice the amplified fragments together in the correct order.
  • the recombination method can be used to obtain the relevant sequence in large quantities. This usually involves cloning it into a vector, then transferring it into a cell, and then isolating the relevant sequence from the proliferated host cell by conventional methods.
  • artificial synthesis can also be used to obtain the relevant sequences, especially when the fragment is short. Usually, multiple small fragments are synthesized first and then ligated to obtain a long fragment.
  • the DNA sequence encoding the protein (or fragment or derivative thereof) of the present invention can be obtained completely through chemical synthesis.
  • the DNA sequence can then be introduced into various existing DNA molecules (such as vectors) and cells known in the art.
  • mutations can also be introduced into the protein sequence of the present invention through chemical synthesis.
  • PCR amplification of DNA/RNA is preferably used to obtain the polynucleotide of the present invention. When it is difficult to obtain full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferred.
  • the primers used for PCR can be selected according to the sequence information disclosed herein and synthesized by conventional methods.
  • the amplified DNA/RNA fragments can be separated and purified by conventional methods such as gel electrophoresis.
  • the first vector contains a first nucleic acid construct
  • the first nucleic acid construct has the structure of formula II in the 5' to 3' direction: P1-X1-L4-X2 (II)
  • P1 is the first promoter sequence
  • X1 is a nucleotide sequence encoding the gene editing enzyme of the second aspect of the present invention.
  • L4 is absent or a linker sequence
  • X2 is a polyA sequence
  • each "-" is independently a bond or a nucleotide linking sequence.
  • the first promoter is selected from the group consisting of CMV promoter, CAG promoter, PGK promoter, EF1α promoter, EFS promoter, or a combination thereof.
  • the first promoter sequence is a CMV promoter.
  • the length of the linker sequence is 30-120 nt, preferably 48-96 nt, and is preferably a multiple of 3.
  • the first vector and the second vector may be the same or different.
  • the first vector and the second vector may be the same vector.
  • the first vector and/or the second vector further contain an expression cassette for expressing a selection marker.
  • the selection marker is selected from the following group: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
  • the method is non-diagnostic and non-therapeutic.
  • the cells are from the following species: humans, non-human mammals, poultry, plants, or microorganisms.
  • the non-human mammals include rodents (such as mice, rats, rabbits), cows, pigs, sheep, horses, dogs, cats, and non-human primates (such as monkeys).
  • the cell is selected from the group consisting of somatic cells, stem cells, germ cells, non-dividing cells or a combination thereof.
  • the cells are selected from the group consisting of kidney cells, epithelial cells, endothelial cells, nerve cells or a combination thereof.
  • when the method is used for gene editing, the editing window spans the 4th to 7th bases of the 20-base sequence targeted by the sgRNA; the 5th base is edited most efficiently, and efficiency falls off sharply on either side. The editing window of the unmutated ABE7.10 editing system is wider by comparison, spanning positions 3 to 9, with the highest efficiency at the 5th base and a gradual decline on either side.
  • the editing window of the ABE single-base editing system is narrowed, which greatly improves the accuracy of single-base editing.
  • the editing window spans the 4th to 7th bases of the 20-base sequence targeted by the sgRNA; the 5th base is edited most efficiently, and efficiency falls off sharply on either side.
  • the editing window of the unmutated ABE7.10 editing system is wider than that of this method: it spans positions 3 to 9, with the highest efficiency at the 5th base and a gradual decline on either side.
  • ABE7.10 F148A retains nearly all of the editing activity of ABE7.10 at the target editing site.
  • the plasmid was constructed using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs) according to standard protocols.
  • 293T cells were seeded in 10 cm culture dishes and cultured in Dulbecco's modified Eagle medium (DMEM, Thermo Fisher Scientific) supplemented with 10% FBS (Thermo Fisher Scientific) and penicillin/streptomycin at 37°C under 5% CO2.
  • the cells were transfected with 30 μg of plasmid using Lipofectamine 3000 (Thermo Fisher Scientific). Three days after transfection, the cells were digested with 0.05% trypsin (Thermo Fisher Scientific) and prepared for FACS.
  • GFP-positive cells were sorted and stored in DMEM or Trizol (Ambion) for analysis of DNA base editing or for RNA-seq.
  • cells were lysed using a one-step mouse genotyping kit (Vazyme), followed by deep sequencing with Hi-TOM or quantification of Sanger sequencing with EditR 1.0.8.
  • for RNA-seq, ~500,000 cells were collected, and RNA was extracted according to standard protocols and converted into cDNA, which was used for high-throughput RNA-seq.
  • FastQC v0.11.3
  • Trimmomatic v0.36
  • qualified reads were mapped to the reference genome (Ensembl GRCh38) with STAR v2.5.2b in 2-pass mode, using the parameters adopted by the ENCODE project. Picard tools (v2.3.0) were then used to sort the mapped BAM files and mark duplicates.
  • the refined BAM files were processed with the SplitNCigarReads, IndelRealigner, BaseRecalibrator, and HaplotypeCaller tools from GATK (v3.5) to split reads spanning splice junctions, perform local realignment, recalibrate base quality scores, and call variants.
  • clusters of at least 5 SNVs within a 35-base window were filtered, and variants were retained using a base quality score > 25, a mapping quality score > 20, a Fisher Strand value (FS > 30.0), a Qual By Depth value (QD, threshold 2.0), and a sequencing depth > 20.
  • for single-cell RNA-seq data, the raw reads were first trimmed and aligned to the GRCh38 human transcriptome (STAR v2.5.2b). After deduplication, GATK (v3.5) was used to identify RNA SNVs in individual cells. SNVs detected in single cells that met the DP (20.0), FS (30.0), and QD (2.0) thresholds were retained for downstream analysis.
  • Example 1 Off-target RNA SNV detection for various single-base editing systems
  • a CBE, BE3 (APOBEC1-nCas9-UGI), or an ABE, ABE7.10 (TadA-TadA*-nCas9), was transfected together with GFP, with or without single guide RNA (sgRNA), into cultured 293T cells. After 72 hours of incubation, GFP-expressing cells were collected by FACS and analyzed by RNA-seq. The results for each group were compared with wild-type (WT, untransfected) samples, and RNA SNVs were called in each transfection group (Figure 1A).
  • the 9 groups of transfected cells expressed GFP, APOBEC1, BE3, BE3 with "site 3" sgRNA, BE3 with "RNF2" sgRNA, TadA-TadA*, ABE7.10, ABE7.10 with "site 1" sgRNA, or ABE7.10 with "site 2" sgRNA (Figure 5).
  • RNA-seq (two or three replicates per group) was performed on these samples at an average depth of 125x. RNA SNVs were called from the RNA-seq data of each replicate, and those identified in any WT cells were filtered out.
  • RNA SNVs were found in GFP-transfected cells. Surprisingly, markedly more RNA SNVs were found in cells expressing APOBEC1, BE3 without sgRNA, or BE3 with site 3 or RNF2 sgRNA (5-40 times the number in cells expressing only GFP). Similarly, a large number of RNA SNVs (5-10 times) was found in cells expressing TadA-TadA*, ABE7.10 without sgRNA, or ABE7.10 with site 1 or site 2 sgRNA.
  • transfection of APOBEC1 or TadA-TadA* induced more RNA SNVs than the other transfection groups, suggesting that the increase in SNVs in CBE- or ABE-treated cells may be caused by overexpression of APOBEC1 or TadA.
  • off-target RNA SNV was characterized for each single-base editing system.
  • the RNA SNVs identified in BE3-treated cells were predominantly G-to-A or C-to-U mutations, a proportion significantly higher than in GFP-transfected cells (Figures 2A and 2C, Figure 7).
  • this mutation bias matches that of APOBEC1 itself, indicating that these mutations are not spontaneous but are induced by BE3 or APOBEC1.
  • the GFP group also showed some bias toward A-to-G and U-to-C mutations (Figure 2C), which may reflect an innate mutation preference.
  • the off-target RNA SNVs induced by CBE and ABE are sgRNA-independent and are caused by the overexpression of APOBEC1 and TadA-TadA*, respectively.
  • Example 3 Single-cell RNA SNV analysis of cells transfected with single-base editing system
  • single-cell RNA-seq was performed on four groups of cells (WT, GFP, BE3-site 3, and ABE7.10-site 1) to avoid losing random off-target signals through population averaging.
  • compared with BE3 (APOBEC1)-transfected cells, BE3 (hA3A)-transfected 293T cells showed significantly fewer off-target RNA SNVs while maintaining high on-target DNA editing efficiency (Figures 4B and 4C, Figure 18).
  • the point mutation W90A was introduced into the predicted RNA-binding domain of APOBEC1; although BE3 (W90A) eliminated the RNA off-target effect, its on-target DNA editing activity was essentially abolished (Figures 4B and 4C, Figure 18).
  • the engineered ABE7.10 F148A of the present invention has broader application prospects.
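
The WT-subtraction step described in Example 1 (call RNA SNVs per replicate, then discard any that also appear in WT cells) and the substitution-type tally discussed afterwards can be sketched in a few lines. This is only an illustrative sketch, not the pipeline used in the patent: the tuple layout `(chrom, pos, ref, alt)`, the function names, and the toy variants are all assumptions. Calls are written at the cDNA level, so an RNA C-to-U change appears as `C>T`.

```python
from collections import Counter

def off_target_snvs(sample_snvs, wt_replicates):
    """Return the SNVs of a sample that are absent from every WT replicate."""
    wt_union = set().union(*wt_replicates)  # SNVs seen in any WT replicate
    return set(sample_snvs) - wt_union

def substitution_spectrum(snvs):
    """Count SNVs by substitution type, e.g. 'C>T' or 'A>G'."""
    return Counter(f"{ref}>{alt}" for (_chrom, _pos, ref, alt) in snvs)

# Hypothetical toy data: two WT replicates and one BE3-transfected sample.
wt_reps = [
    {("chr1", 100, "A", "G")},
    {("chr2", 200, "C", "T")},
]
be3_sample = {
    ("chr1", 100, "A", "G"),  # also present in WT, so it is discarded
    ("chr3", 300, "C", "T"),
    ("chr4", 400, "G", "A"),
    ("chr5", 500, "C", "T"),
}

kept = off_target_snvs(be3_sample, wt_reps)
spectrum = substitution_spectrum(kept)
print(len(kept), dict(spectrum))
```

Using the union of all WT replicates mirrors the wording "identified in any WT cells"; a stricter or looser background definition would be a different design choice.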

Abstract

The present invention provides a novel single-base editing technique and its use. Specifically, the invention provides a gene editing enzyme whose structure is shown in formula I: Z1-L1-Z2-L2-Z3-L3-Z4 (I), wherein Z1 is the amino acid sequence of the adenine deaminase TadA; Z2 is the amino acid sequence of the TadA* enzyme; Z1 and/or Z2 has a mutation corresponding to the F residue at position 147 and/or 148 of the sequence shown in SEQ ID NO:1; Z3 is the coding sequence of the Cas9 nuclease; L1, L2 and L3 are each independently an optional linker peptide sequence; Z4 is absent or a nuclear localization signal (NLS) element; and each "-" is independently a peptide bond. The invention further provides a method for site-directed single-base gene editing. The method edits DNA with high accuracy and significantly reduces RNA off-target effects.

Description

A novel single-base editing technique and its application
Technical field
The present invention relates to the field of biotechnology and, in particular, to a novel single-base editing technique and its application.
Background
Since 2013, a new generation of gene editing technologies represented by CRISPR/Cas9 has entered experiments across the field of biology and is changing traditional methods of gene manipulation.
DNA base editing methods developed in recent years can directly produce precise point mutations in genomic DNA without creating double-strand breaks (DSBs). Two classes of base editors have been reported: cytosine base editors (CBEs; C to T and G to A) and adenine base editors (ABEs; A to G and T to C). However, a key problem remains for their application, namely off-target effects.
Previous studies focused mainly on evaluating off-target mutations in genomic DNA. Recent results indicate that CBEs, but not ABEs, induce a large number of off-target single-nucleotide mutations during gene editing, underscoring the need to develop higher-fidelity single-base editing tools. In addition to their DNA-targeting activity, commonly used single-base editing systems may also mutate RNA. For example, the cytosine deaminase APOBEC1 used in CBEs was found to target both DNA and RNA, and the adenine deaminase TadA used in ABEs was found to induce site-specific inosine formation on RNA. However, the RNA-targeting activity of DNA base editors had not previously been studied. Studies have shown that both the cytosine base editor BE3 and the adenine base editor ABE7.10 produce tens of thousands of off-target RNA single-nucleotide variants (SNVs), whereas cells without base editors show only a few hundred SNVs.
At present, the precision of existing DNA base editing methods is limited because the editing window is too large. ABE7.10, developed by David Liu's laboratory at Harvard University, can edit the third to eighth bases of the sgRNA target sequence; if other editable bases lie next to the intended target base, they are edited non-specifically as well.
Therefore, there is an urgent need in the art for a single-base editing technique that offers high precision, significantly reduces RNA off-target effects, and maintains effective DNA-targeting activity.
Summary of the invention
The purpose of the present invention is to provide a single-base editing technique that offers high precision, significantly reduces RNA off-target effects, and maintains effective DNA-targeting activity.
In a first aspect of the present invention, a mutant protein of the adenine deaminase TadA is provided. The mutant protein is a non-natural protein and carries a mutation at one or more amino acids of the adenine deaminase TadA selected from the following group:
phenylalanine (F) at position 147 and phenylalanine (F) at position 148;
wherein positions 147 and 148 correspond to positions 147 and 148 of the sequence shown in SEQ ID NO:1.
In another preferred embodiment, the adenine deaminase TadA is derived from a species selected from the group consisting of Escherichia coli (E. coli), the hyperthermophile A. aeolicus, Bacillus subtilis (B. subtilis), and yeast CDD1.
In another preferred embodiment, the mutant protein has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.
In another preferred embodiment, the adenine deaminase TadA includes the TadA* enzyme and the wild-type TadA enzyme.
In another preferred embodiment, the adenine deaminase TadA is the TadA* enzyme.
In another preferred embodiment, the amino acid sequence of the wild-type TadA enzyme is shown in SEQ ID NO:1.
In another preferred embodiment, the amino acid sequence of the TadA* enzyme is shown in SEQ ID NO:2.
In another preferred embodiment, the phenylalanine (F) at position 147 is mutated to an amino acid residue other than phenylalanine.
In another preferred embodiment, the phenylalanine at position 147 is mutated to alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).
In another preferred embodiment, the phenylalanine at position 147 is mutated to leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).
In another preferred embodiment, the phenylalanine (F) at position 148 is mutated to an amino acid residue other than phenylalanine.
In another preferred embodiment, the phenylalanine at position 148 is mutated to alanine (A), glycine (G), arginine (R), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).
In another preferred embodiment, the phenylalanine at position 148 is mutated to leucine (L), valine (V), isoleucine (I), alanine (A), or tyrosine (Y).
In another preferred embodiment, apart from the mutations (such as those at amino acids 147 and 148), the remaining amino acid sequence of the mutant protein is the same or substantially the same as the sequence shown in SEQ ID NO:1.
In another preferred embodiment, "substantially the same" means that at most 50 (preferably 1-20, more preferably 1-10, still more preferably 1-5) amino acids differ, where the differences include amino acid substitutions, deletions, or additions, and the mutant protein still has the activity of catalyzing the hydrolytic deamination of adenine to form hypoxanthine.
In another preferred embodiment, when the adenine deaminase TadA is the wild-type TadA enzyme, the amino acid sequence of the mutant protein is shown in SEQ ID NO:3.
In another preferred embodiment, when the adenine deaminase TadA is the TadA* enzyme, the amino acid sequence of the mutant protein is shown in SEQ ID NO:4.
In another preferred embodiment, the amino acid sequence of the mutant protein has at least 80%, preferably at least 85% or 90%, more preferably at least 95%, and most preferably at least 98% homology with the sequence shown in SEQ ID NO:3 or SEQ ID NO:4, and the homology is ≤166/167 (99.4%).
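As an illustration of the position-based mutations described above, the following sketch applies a substitution such as F148A to a protein sequence and computes percent identity, which shows where the 166/167 (99.4%) figure comes from for a single substitution in a 167-residue sequence. The sequences of SEQ ID NO:1 to 4 are not reproduced in this section, so `toy_wt` is a made-up 167-residue placeholder with F at positions 147 and 148; the function names are likewise hypothetical.

```python
def apply_point_mutation(seq, mutation):
    """Apply a substitution written like 'F148A' (1-based position) to a protein sequence."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]

def percent_identity(a, b):
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Placeholder sequence (NOT SEQ ID NO:1): 167 residues, F at positions 147 and 148.
toy_wt = "M" + "G" * 145 + "FF" + "S" * 19
toy_f148a = apply_point_mutation(toy_wt, "F148A")

identity = percent_identity(toy_wt, toy_f148a)  # 166 matches out of 167 residues
print(round(identity, 1))
```

Checking the reference residue before substituting guards against off-by-one numbering errors, which matter here because positions are defined relative to SEQ ID NO:1.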
In a second aspect of the present invention, a gene editing enzyme is provided, the structure of which is shown in formula I:
Z1-L1-Z2-L2-Z3-L3-Z4     (I)
wherein
Z1 is the amino acid sequence of the adenine deaminase TadA;
Z2 is the amino acid sequence of the TadA* enzyme;
and Z1 and/or Z2 is the amino acid sequence of the mutant protein according to the first aspect of the present invention;
Z3 is the coding sequence of the Cas9 nuclease;
L1, L2 and L3 are each independently an optional linker peptide sequence;
Z4 is absent or a nuclear localization signal (NLS) element;
and each "-" is independently a peptide bond.
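Formula I can be read as an ordered concatenation in which Z1, Z2 and Z3 are required while the linkers L1 to L3 and the Z4 element may be absent. The short sketch below expresses that reading; the function name and the three-letter strings are placeholders chosen for illustration and are not the sequences of SEQ ID NO:3 to 10.

```python
def assemble_formula_i(z1, z2, z3, l1="", l2="", l3="", z4=""):
    """Concatenate Z1-L1-Z2-L2-Z3-L3-Z4; linkers and Z4 default to absent."""
    for name, part in (("Z1", z1), ("Z2", z2), ("Z3", z3)):
        if not part:
            raise ValueError(f"{name} is required")
    return "".join([z1, l1, z2, l2, z3, l3, z4])

# Placeholder domains and elements, not real sequences.
fusion = assemble_formula_i("AAA", "BBB", "CCC", l1="GS", z4="KRK")
print(fusion)
```

Treating an absent element as an empty string keeps each "-" in the formula a plain peptide bond between whatever neighbours remain.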
In another preferred embodiment, Z1 has the amino acid sequence of the wild-type TadA enzyme.
In another preferred embodiment, Z1 has the amino acid sequence of a wild-type TadA enzyme carrying the F147A and/or F148A mutation.
In another preferred embodiment, Z1 is a wild-type TadA enzyme carrying the F147A and/or F148A mutation.
In another preferred embodiment, the amino acid sequence of Z1 is shown in SEQ ID NO:3.
In another preferred embodiment, Z2 has the amino acid sequence of the TadA* enzyme.
In another preferred embodiment, Z2 has the amino acid sequence of a TadA* enzyme carrying the F147A and/or F148A mutation.
In another preferred embodiment, Z2 is a TadA* enzyme carrying the F147A and/or F148A mutation.
In another preferred embodiment, the amino acid sequence of Z2 is shown in SEQ ID NO:4.
In another preferred embodiment, the amino acid sequence of L1 is shown in SEQ ID NO:5.
In another preferred embodiment, the amino acid sequence of L1 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:5.
In another preferred embodiment, the amino acid sequence of L2 is shown in SEQ ID NO:6.
In another preferred embodiment, the amino acid sequence of L2 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:6.
In another preferred embodiment, the amino acid sequence of L3 is shown in SEQ ID NO:7.
In another preferred embodiment, the amino acid sequence of L3 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:7.
In another preferred embodiment, in Z3, the source of the Cas9 nuclease is selected from the group consisting of Streptococcus pyogenes, Staphylococcus aureus, a Streptococcus pyogenes mutant, or a Staphylococcus aureus mutant.
In another preferred embodiment, in Z3, the Cas9 nuclease can be replaced with a Cpf1 nuclease, the source of which is selected from the group consisting of Acidaminococcus, Lachnospiraceae, an Acidaminococcus mutant, or a Lachnospiraceae mutant.
In another preferred embodiment, the amino acid sequence of Z3 is shown in SEQ ID NO:8.
In another preferred embodiment, the amino acid sequence of Z3 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:8.
In another preferred embodiment, the amino acid sequence of Z4 is shown in SEQ ID NO:9.
In another preferred embodiment, the amino acid sequence of Z4 is the same or substantially the same as the amino acid sequence shown in SEQ ID NO:9.
In another preferred embodiment, "substantially the same" means that at most 50 (preferably 1-20, more preferably 1-10, still more preferably 1-5, most preferably 1-3) amino acids differ, where the differences include amino acid substitutions, deletions, or additions.
In another preferred embodiment, "substantially the same" means that the sequence identity between the amino acid sequence and the corresponding reference amino acid sequence is at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99%.
In another preferred embodiment, the amino acid sequence of the gene editing enzyme is shown in SEQ ID NO:10.
In a third aspect of the present invention, a polynucleotide is provided that encodes the gene editing enzyme according to the second aspect of the present invention.
In another preferred embodiment, the polynucleotide is selected from the following group:
(a) a polynucleotide encoding the amino acid sequence shown in SEQ ID NO:10;
(b) a polynucleotide whose nucleotide sequence has ≥95% (preferably ≥98%) sequence identity with the polynucleotide sequence of (a);
(c) a polynucleotide complementary to the polynucleotide of either (a) or (b).
In another preferred embodiment, the ORF of the gene editing enzyme according to the second aspect of the present invention is additionally flanked by auxiliary elements selected from the group consisting of a signal peptide, a secretory peptide, a tag sequence (such as 6His), or a combination thereof.
In another preferred embodiment, the signal peptide is a nuclear localization sequence.
In another preferred embodiment, the polynucleotide is selected from the group consisting of a DNA sequence, an RNA sequence, or a combination thereof.
In a fourth aspect of the present invention, a vector is provided that contains the polynucleotide according to the third aspect of the present invention.
In another preferred embodiment, the vector includes an expression vector, a shuttle vector, or an integration vector.
In a fifth aspect of the present invention, a host cell is provided that contains the vector according to the fourth aspect of the present invention or has the polynucleotide according to the third aspect of the present invention integrated into its genome.
In another preferred embodiment, the host cell is a prokaryotic cell or a eukaryotic cell.
In another preferred embodiment, the prokaryotic cell includes Escherichia coli.
In another preferred embodiment, the eukaryotic cell is selected from the group consisting of yeast cells, plant cells, mammalian cells, human cells (such as HEK293T cells), or a combination thereof.
In a sixth aspect of the present invention, a method for site-directed single-base gene editing is provided, comprising the steps of:
(i) providing a cell, a first vector, and a second vector, wherein the first vector contains an expression cassette for the gene editing enzyme according to the second aspect of the present invention, and the second vector contains an expression cassette for expressing an sgRNA;
(ii) infecting the cell with the first vector and the second vector, thereby performing site-directed single-base editing in the cell.
In another preferred embodiment, the first vector contains a first nucleic acid construct, and the first nucleic acid construct has the structure of formula II in the 5' to 3' direction:
P1-X1-L4-X2   (II)
wherein P1 is the first promoter sequence;
X1 is a nucleotide sequence encoding the gene editing enzyme according to the second aspect of the present invention;
L4 is absent or a linker sequence;
X2 is a polyA sequence;
and each "-" is independently a bond or a nucleotide linker sequence.
In another preferred embodiment, the first promoter is selected from the group consisting of a CMV promoter, a CAG promoter, a PGK promoter, an EF1α promoter, an EFS promoter, or a combination thereof.
In another preferred embodiment, the first promoter sequence is a CMV promoter.
In another preferred embodiment, the length of the linker sequence is 30-120 nt, preferably 48-96 nt, and is preferably a multiple of 3.
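The three length conditions stated for the linker sequence (30-120 nt allowed, 48-96 nt preferred, preferably a multiple of 3) can be checked mechanically. The helper below is a hypothetical sketch written for this text; only the numeric bounds come from the paragraph above.

```python
def check_linker_length(length_nt):
    """Report which of the stated linker-length conditions a length (in nt) meets."""
    return {
        "allowed": 30 <= length_nt <= 120,   # 30-120 nt
        "preferred": 48 <= length_nt <= 96,  # preferably 48-96 nt
        "multiple_of_3": length_nt % 3 == 0, # whole number of codons
    }

for n in (24, 48, 100):
    print(n, check_linker_length(n))
```

A length that is a multiple of 3 corresponds to a whole number of codons, so a linker of that length does not shift the reading frame of whatever follows it.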
In another preferred embodiment, the first vector and the second vector may be the same or different.
In another preferred embodiment, the first vector and the second vector may be the same vector.
In another preferred embodiment, the first vector and/or the second vector further contains an expression cassette for expressing a selection marker.
In another preferred embodiment, the selection marker is selected from the group consisting of green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
In another preferred embodiment, the method is non-diagnostic and non-therapeutic.
In another preferred embodiment, the cell is from one of the following species: humans, non-human mammals, poultry, plants, or microorganisms.
In another preferred embodiment, the non-human mammals include rodents (such as mice, rats, and rabbits), cows, pigs, sheep, horses, dogs, cats, and non-human primates (such as monkeys).
In another preferred embodiment, the cell is selected from the group consisting of somatic cells, stem cells, germ cells, non-dividing cells, or a combination thereof.
In another preferred embodiment, the cell is selected from the group consisting of kidney cells, epithelial cells, endothelial cells, nerve cells, or a combination thereof.
In another preferred embodiment, when the method is used for gene editing, the editing window spans the 4th to 7th bases of the 20-base sequence targeted by the sgRNA; the 5th base is edited most efficiently, and efficiency falls off sharply on either side. The editing window of the unmutated ABE7.10 editing system is wider by comparison, spanning positions 3 to 9, with the highest efficiency at the 5th base and a gradual decline on either side.
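To make the difference between the two windows concrete, the sketch below lists the adenines of a 20-base target sequence that fall inside a given window. It assumes positions are counted from 1 at the 5' end of the 20-base sgRNA target, which the text does not state explicitly; the protospacer `toy_target` and the names used are invented for illustration.

```python
NARROW_WINDOW = (4, 7)  # window stated for the method of the invention
WIDE_WINDOW = (3, 9)    # window stated for unmutated ABE7.10

def adenines_in_window(target, window):
    """Return 1-based positions of A bases inside the window of a 20-base target."""
    if len(target) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [pos for pos in range(start, end + 1) if target[pos - 1].upper() == "A"]

toy_target = "GCATAAGCACGTTGCATGCA"  # hypothetical 20-base sequence

narrow_hits = adenines_in_window(toy_target, NARROW_WINDOW)
wide_hits = adenines_in_window(toy_target, WIDE_WINDOW)
print(narrow_hits, wide_hits)
```

For this toy target, the narrower window exposes fewer bystander adenines, which is the sense in which a smaller window improves editing precision.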
In a seventh aspect of the present invention, a kit is provided, the kit comprising:
(a1) a first container, and a first vector located in the first container, the first vector containing an expression cassette for the gene editing enzyme according to the second aspect of the present invention.
In another preferred embodiment, the kit further comprises:
(a2) a second container, and a second vector located in the second container, the second vector containing an expression cassette for expressing sgRNA.
In another preferred embodiment, the first vector and/or the second vector further contains an expression cassette for expressing a selection marker.
In another preferred embodiment, the first container and the second container may be the same container or different containers.
In another preferred embodiment, the kit further contains instructions, which describe a method of infecting a cell with the first vector and the second vector so as to perform site-directed single-base gene editing in the cell.
It should be understood that, within the scope of the present invention, the technical features of the present invention described above and the technical features specifically described below (for example in the examples) can be combined with one another to form new or preferred technical solutions. For reasons of space, they are not enumerated here.
Description of the drawings
Figure 1 shows the off-target RNA SNV results for each single-base editing system.
A: Experimental design.
B: DNA on-target efficiency of WT (n = 3 replicates), GFP (n = 3 replicates), APOBEC1 (n = 3 replicates), BE3 (n = 3 replicates) and BE3-site 3 (n = 2 replicates). Note that APOBEC1 is the cytosine deaminase of BE3.
C: DNA on-target efficiency of WT, GFP, APOBEC1, BE3 and BE3-RNF2. n = 3 replicates per group.
D: DNA on-target efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 1. n = 3 replicates per group. Note that TadA-TadA* (a heterodimer of wild-type TadA and evolved TadA) is the adenine deaminase of ABE7.10, and the evolved TadA is denoted TadA*.
E: DNA on-target efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 2. n = 3 replicates per group.
F, G: Comparison of off-target RNA SNVs in the BE3 and ABE7.10 groups.
H: Representative distribution of off-target RNA SNVs on human chromosomes for GFP, BE3 and ABE7.10. Chromosomes are shown in different colors. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p<0.05, **p<0.01, ***p<0.001, unpaired t-test.
Figure 2 shows the characterization of off-target RNA SNVs.
A: Proportion of G>A and C>U mutations for GFP (n = 6 replicates), APOBEC1 (n = 3 replicates), BE3 (n = 3 replicates), BE3-site 3 (n = 2 replicates) and BE3-RNF2 (n = 3 replicates).
B: Proportion of A>G and U>C mutations for GFP (n = 6 replicates), TadA-TadA* (n = 3 replicates), ABE7.10 (n = 3 replicates), ABE7.10-site 1 (n = 3 replicates) and ABE7.10-site 2 (n = 3 replicates).
C: Distribution of mutation types in each group. Each number indicates the percentage of one mutation type among all mutations.
D: Ratio of shared RNA SNVs between any two samples in the BE3 and ABE7.10 groups. The value in each cell of the matrix is the number of RNA SNVs shared by the two samples divided by the number of RNA SNVs in the sample of that row.
E: Oncogenes and tumor suppressor genes carrying the ABE7.10-induced non-synonymous mutations with the highest editing rates. Gene names are shown in blue, amino acid mutations in red, and single-nucleotide conversions in green. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p<0.05, **p<0.01, ***p<0.001, unpaired t-test.
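The shared-SNV ratio described for panel D can be sketched in Python as follows. This is only an illustration of the stated calculation (overlap between two samples divided by the SNV count of the row sample); the function name `shared_snv_ratio`, the tuple representation of an SNV, and the two toy replicates are assumptions rather than anything taken from the patent.

```python
def shared_snv_ratio(snvs_by_sample):
    """Build a row-normalized matrix of shared SNVs between every pair of samples."""
    names = list(snvs_by_sample)
    matrix = {}
    for row in names:
        row_set = snvs_by_sample[row]
        matrix[row] = {}
        for col in names:
            overlap = len(row_set & snvs_by_sample[col])
            # Divide by the SNV count of the row sample; undefined if the row is empty.
            matrix[row][col] = overlap / len(row_set) if row_set else float("nan")
    return matrix


# Toy data: each SNV is a hypothetical (chromosome, position, change) tuple.
samples = {
    "rep1": {("chr1", 100, "A>G"), ("chr2", 200, "A>G"),
             ("chr3", 300, "A>G"), ("chr4", 400, "A>G")},
    "rep2": {("chr1", 100, "A>G"), ("chr5", 500, "A>G")},
}
ratios = shared_snv_ratio(samples)
print(ratios["rep1"]["rep2"], ratios["rep2"]["rep1"])
```

Because each row is normalized by its own SNV count, the matrix is generally not symmetric: the same single shared SNV gives a different ratio for each of the two samples.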
Figure 3 shows the single-cell RNA SNV analysis of cells transfected with base editors.
A: Schematic of SNV analysis by single-cell RNA sequencing.
B: Expression patterns of ABE, BE3 or GFP in individual cells, from single-cell RNA-seq data.
C: Number of off-target RNA SNVs detected in single cells treated with GFP (n = 15 cells), BE3-site 3 (n = 4 cells) and ABE7.10-site 1 (n = 9 cells).
D: Proportion of G>A and C>U mutations.
E: Proportion of A>G and U>C mutations for GFP (n = 15 cells), BE3-site 3 (n = 4 cells) and ABE7.10-site 1 (n = 9 cells).
F: Distribution of mutation types in each cell. Each number indicates the percentage of one mutation type among all mutations.
G, H: Ratio of shared SNVs between any two samples in the same group. The value in each cell of the matrix is the number of SNVs shared by the two samples divided by the number of SNVs in the sample of that row.
I: Editing rates of SNVs located in cancer-related genes that occurred in at least 3 ABE7.10-edited single cells. The GFP group served as the control for all comparisons. All values are expressed as mean ± SEM. *p<0.05, **p<0.01, ***p<0.001, unpaired t-test.
Figure 4 shows the elimination of off-target RNA SNVs by rational design of the deaminase.
A: Schematic of the BE3 and ABE7.10 variants. All deaminase mutations were made in the BE3/ABE7.10 background. Point mutations are indicated by red lines.
B: On-target efficiency of GFP (n = 3 replicates), BE3-site 3 (n = 2 replicates), BE3(hA3A)-site 3 (n = 3 replicates) and BE3(W90A)-site 3 (n = 3 replicates).
C: Comparison of off-target RNA SNVs in the BE3-site 3 treatment groups.
D: On-target efficiency of the GFP, ABE7.10-site 1, ABE7.10(D53G)-site 1 and ABE7.10(F148A)-site 1 groups. n = 3 replicates per group.
E: Comparison of off-target RNA SNVs in the ABE7.10 treatment groups.
F: Comparison of the editing efficiency of ABE7.10 and ABE7.10(F148A) at four different sites. n = 3 replicates per group.
G: Representative editing sites showing that ABE7.10(F148A) narrows the editing window. All values are expressed as mean ± SEM. *p<0.05, **p<0.01, ***p<0.001, unpaired t-test.
Figure 5 shows a schematic diagram of the plasmids.
Figure 6 shows a representative distribution of off-target RNA SNVs on chromosomes.
A: APOBEC1, BE3-site 3, BE3-RNF2; B: TadA-TadA*, ABE7.10-site 1 and ABE7.10-site 2.
Figure 7 shows the distribution of mutation types for each replicate in all groups. Each number indicates the percentage of one mutation type among all mutations.
A: Distribution of mutation types for each replicate in the GFP group.
B: Distribution of mutation types for each replicate in the APOBEC1 and BE3 groups, with or without sgRNA.
C: Distribution of mutation types for each replicate in the TadA-TadA* and ABE7.10 groups, with or without sgRNA.
Figure 8 shows that, in all BE3- and ABE7.10-transfected groups, the number of genes containing overlapping off-target RNA SNVs was significantly higher than that obtained for randomly simulated genes. P values were calculated with a two-sided Student's t-test.
Figure 9 shows the similarity between the sequences adjacent to off-target RNA SNVs and the target sequence.
Figure 10 shows the editing rates of BE3-induced non-synonymous mutations located in oncogenes and tumor suppressor genes. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
Figure 11 shows the editing rates of ABE7.10-induced non-synonymous mutations located in oncogenes and tumor suppressor genes. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
Figure 12 shows that the off-target SNVs were detected only in RNA, not in DNA. Sanger sequencing chromatograms show that U-to-C mutations were observed only in the RNA of the two top-ranked oncogenes, TOPRS and CSDE1.
Figure 13 shows the expression levels of the transfected vectors in single cells. The expression levels of GFP, APOBEC1 and TadA-TadA* were quantified in all sequenced single cells. Thresholds are indicated by blue dashed lines. The log2(FPKM+1) thresholds for GFP, BE3 and ABE7.10 were 0.3, 1 and 0.3, respectively. Cells with expression levels above the threshold were included in further analysis.
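The expression filter described for Figure 13 can be sketched as follows. The threshold values are those stated above; everything else (the function name `passes_threshold`, the cell identifiers and the FPKM values) is hypothetical, and "above the threshold" is read here as a strict inequality, which is an assumption.

```python
import math

# log2(FPKM + 1) thresholds stated for Figure 13, keyed by transfection group.
THRESHOLDS = {"GFP": 0.3, "BE3": 1.0, "ABE7.10": 0.3}


def passes_threshold(group, fpkm):
    """Return True if the transgene expression of a cell exceeds its group threshold."""
    return math.log2(fpkm + 1) > THRESHOLDS[group]


# Hypothetical cells: (cell id, group, FPKM of the transfected transgene).
cells = [
    ("SC1", "BE3", 0.5),       # log2(1.5) is about 0.58, below the BE3 threshold of 1
    ("SC2", "BE3", 3.0),       # log2(4.0) = 2.0, above the BE3 threshold
    ("SC3", "ABE7.10", 0.5),   # about 0.58, above the ABE7.10 threshold of 0.3
]
kept = [name for name, group, fpkm in cells if passes_threshold(group, fpkm)]
print(kept)
```

The same FPKM value can pass in one group and fail in another because the BE3 threshold is higher than the GFP and ABE7.10 thresholds.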
Figure 14 shows the distribution of mutation types in all single cells.
A: Distribution of mutation types in GFP-transfected single cells (n = 16 cells).
B: Distribution of mutation types in BE3-site 3-transfected single cells (n = 31 cells). Cells expressing APOBEC1 above the threshold are boxed in red.
C: Distribution of mutation types in ABE7.10-site 1-transfected single cells (n = 28 cells). Cells expressing TadA-TadA* above the threshold are boxed in red. Each number indicates the percentage of one mutation type among all mutations. SC stands for single cell.
Figure 15 shows the distribution of off-target RNA SNVs on human chromosomes for all single cells with expression levels above the threshold.
A: Distribution of off-target RNA SNVs on human chromosomes in GFP-transfected single cells (n = 15).
B: Distribution of off-target RNA SNVs on human chromosomes in BE3-site 3-transfected single cells (n = 4).
C: Distribution of off-target RNA SNVs on human chromosomes in ABE7.10-site 1-transfected single cells (n = 9).
Figure 16 shows the editing rates of BE3-induced non-synonymous mutations located in oncogenes and tumor suppressor genes in single cells. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
Figure 17 shows the editing rates of ABE7.10-induced non-synonymous mutations located in oncogenes and tumor suppressor genes in single cells. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
Figure 18 shows a representative distribution of off-target RNA SNVs on human chromosomes for the engineered BE3 and ABE7.10 variants.
Figure 19 shows the average distribution of mutation types for the engineered BE3 and ABE7.10 variants, n = 3 per group.
Figure 20 shows the distribution of mutation types in each sample of the engineered BE3 and ABE7.10 variants.
Figure 21 shows the ratio of shared RNA SNVs between any two samples of the engineered BE3 and ABE7.10 variants. The value in each cell of the matrix is the number of RNA SNVs shared by the two samples divided by the number of RNA SNVs in the sample of that row.
Figure 22 shows a comparison of the editing window width of ABE7.10 (n = 3) and ABE7.10(F148A) (n = 3).
Figure 23 shows the homology of TadA enzymes across multiple species.
Detailed description
After extensive and in-depth research and a large amount of screening, the inventors unexpectedly found for the first time that when the amino acid residue F at position 148 of both the TadA fragment and the TadA* fragment of the adenine deaminase (TadA-TadA*) of the adenine base editor ABE is mutated to A (i.e. TadA(F148A)-TadA*(F148A)), the gene editing window is significantly narrowed while effective DNA on-target activity is maintained, so that the precision of gene editing is significantly improved. Experiments further showed that RNA off-target effects are greatly reduced in a gene editing system carrying this mutation (i.e. TadA(F148A)-TadA*(F148A)). The present invention was completed on this basis.
Terms
As used herein, the term "base mutation" refers to a substitution, insertion and/or deletion of a base at a given position in a nucleotide sequence.
As used herein, the term "base substitution" refers to the mutation of a base at a given position in a nucleotide sequence to a different base, for example the mutation of A to G.
As used herein, "selection marker gene" refers to a gene used to select transgenic cells or transgenic animals during transgenesis. The selection marker genes that can be used in this application are not particularly limited and include the various selection marker genes commonly used in the transgenic field. Representative examples include (but are not limited to): luciferin protein, luciferase (such as firefly luciferase or Renilla luciferase), green fluorescent protein, yellow fluorescent protein, red fluorescent protein, or a combination thereof.
As used herein, the term "Cas protein" refers to a nuclease. A preferred Cas protein is the Cas9 protein. Typical Cas9 proteins include (but are not limited to) Cas9 derived from Staphylococcus aureus. In the present invention, the Cas9 protein may also be replaced by a Cpf1 nuclease, the source of which is selected from the group consisting of: Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
Adenine deaminase TadA
TadA is a prokaryotic RNA-editing enzyme.
The TadA enzyme has adenine deaminase activity and can deaminate adenosine (A) to inosine (I). Recombinant TadA protein forms a homodimer and produces inosine by deaminating the adenosine residue at the wobble position of tRNA Arg-2.
As shown in Figure 23, TadA is highly homologous across multiple species. For example, E. coli tadA shows sequence similarity to the yeast tRNA deaminase subunit Tad2p.
Across multiple species, the amino acid residue is highly conserved, in particular at the position corresponding to position 148 of the sequence shown in SEQ ID NO: 1 of the present invention.
As used herein, the terms "TadA7.10" and "TadA*" are used interchangeably and refer to a mutant based on the amino acid sequence of the wild-type TadA enzyme described in the present invention, in which the mutated amino acid residues include W23R, H36L, P48A, R51L, L84F, A106V, D108N, H123Y, S146C, D147Y, R152P, E155V, I156F and K157N.
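The point-mutation notation used above (for example W23R or F148A: original residue, 1-based position, new residue) can be applied to a sequence with a short helper like the one below. This is a sketch under stated assumptions: the function name `apply_mutation` is hypothetical, and the 10-residue peptide is a toy sequence, not the TadA sequence of SEQ ID NO: 1, which is only available as an image in this text.

```python
import re


def apply_mutation(sequence, mutation):
    """Apply one point mutation written as <original><1-based position><new>, e.g. 'F148A'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    # Refuse to mutate if the sequence does not carry the expected original residue.
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


# Toy 10-residue peptide used only to exercise the helper.
toy = "MSEVEFSHEY"
mutated = apply_mutation(toy, "F6A")
print(mutated)
```

Checking the original residue before substituting catches numbering shifts between homologous sequences, which matters because the position numbering in this document is tied to specific SEQ ID NOs.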
Correspondingly, the terms "ABE7.10" and "TadA-TadA*" are used interchangeably and refer to a protein whose amino acid sequence comprises the amino acid sequences of the wild-type TadA enzyme and the TadA* enzyme, neither of which carries the mutation described in the present invention.
In one embodiment of the present invention, the wild-type TadA enzyme has the amino acid sequence shown in SEQ ID NO: 1, and the TadA* enzyme has the amino acid sequence shown in SEQ ID NO: 2.
Figure PCTCN2019081532-appb-000001
Gene editing enzyme of the present invention and its encoding nucleic acid
As used herein, the terms "gene editing enzyme", "gene editing enzyme of the present invention", "TadA(F148A)-TadA*(F148A) of the present invention" and "ABE7.10(F148A)" are used interchangeably and refer to the gene editing enzyme according to the second aspect of the present invention, which has the structure of formula I:
Z1-L1-Z2-L2-Z3-L3-Z4   (I)
wherein:
Z1 is the amino acid sequence of the adenine deaminase TadA;
Z2 is the amino acid sequence of the TadA* enzyme;
and Z1 and/or Z2 is the amino acid sequence of the mutant protein according to the first aspect of the present invention;
Z3 is the amino acid sequence of the Cas9 nuclease;
L1, L2 and L3 are each independently an optional linker peptide sequence;
Z4 is absent or is a nuclear localization signal (NLS) element;
and each "-" is independently a peptide bond.
In a preferred embodiment, the amino acid sequence of Z1 is the amino acid sequence shown in SEQ ID NO: 1 carrying the F148A mutation at position 148.
In a preferred embodiment, the amino acid sequence of Z2 is the amino acid sequence shown in SEQ ID NO: 2 carrying the F148A mutation at position 148.
In a preferred embodiment, the amino acid sequence of Z3 is as shown in SEQ ID NO: 8.
Figure PCTCN2019081532-appb-000002
In one embodiment of the present invention, L1, L2 and L3 each independently have an amino acid sequence selected from the group consisting of: GGS, (GGS)2, (GGS)3, (GGS)4, (GGS)5, (GGS)6, (GGS)7, or a combination thereof.
In a preferred embodiment, the amino acid sequence of L1 is SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 5); the amino acid sequence of L2 is SGGSSGGSSGSETPGTSESATPESSGGSSGGSGS (SEQ ID NO: 6); and the amino acid sequence of L3 is SGGS (SEQ ID NO: 7).
In a preferred embodiment, Z4 is a nuclear localization signal (NLS) element with the amino acid sequence PKKKRKV (SEQ ID NO: 9).
In a preferred embodiment of the present invention, a typical amino acid sequence of the gene editing enzyme of the present invention is shown in SEQ ID NO: 10.
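The formula I layout (Z1-L1-Z2-L2-Z3-L3-Z4) amounts to a simple concatenation of amino acid sequences, which the following Python sketch illustrates. The linker and NLS strings are the preferred sequences quoted above; the function name `assemble_editor` and the three-letter placeholder domains are hypothetical, since the actual Z1, Z2 and Z3 sequences (SEQ ID NO: 1, 2 and 8) appear only as images in this text. The result is not claimed to reproduce SEQ ID NO: 10.

```python
# Preferred linker sequences (SEQ ID NO: 5-7) and NLS (SEQ ID NO: 9) as quoted in the text.
LINKERS = {
    "L1": "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
    "L2": "SGGSSGGSSGSETPGTSESATPESSGGSSGGSGS",
    "L3": "SGGS",
}
NLS = "PKKKRKV"


def assemble_editor(z1, z2, z3, linkers=LINKERS, z4=NLS):
    """Concatenate the elements in formula I order: Z1-L1-Z2-L2-Z3-L3-Z4.

    Pass z4="" to model the case where Z4 is absent.
    """
    return z1 + linkers["L1"] + z2 + linkers["L2"] + z3 + linkers["L3"] + z4


# Placeholder domains standing in for TadA(F148A), TadA*(F148A) and Cas9.
toy = assemble_editor("AAA", "CCC", "DDD")
toy_without_nls = assemble_editor("AAA", "CCC", "DDD", z4="")
print(len(toy), len(toy_without_nls))
```

Keeping the linkers in a dictionary makes it easy to swap in one of the (GGS)n alternatives listed above without touching the assembly order.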
Figure PCTCN2019081532-appb-000003
Figure PCTCN2019081532-appb-000004
The present invention also includes polypeptides or proteins that have the same or a similar function and have 50% or more (preferably 60% or more, 70% or more, 80% or more, more preferably 90% or more, more preferably 95% or more, most preferably 98% or more, such as 99%) homology with the sequence shown in SEQ ID NO: 10 of the present invention.
The "same or similar function" mainly refers to the activity of catalyzing the hydrolytic deamination of adenine to hypoxanthine.
It should be understood that amino acid numbering in the gene editing enzyme of the present invention is based on SEQ ID NO: 10. When a particular gene editing enzyme has 80% or more homology with the sequence shown in SEQ ID NO: 10, its amino acid numbering may be shifted relative to that of SEQ ID NO: 10, for example by 1-5 positions toward the N-terminus or the C-terminus. Using sequence alignment techniques conventional in the art, a person skilled in the art will generally understand that such a shift is within a reasonable range, and a mutant that has 80% (e.g. 90%, 95%, 98%) homology and the same or similar gene editing enzyme catalytic activity should not be excluded from the scope of the gene editing enzyme of the present invention merely because of a shift in amino acid numbering.
The gene editing enzyme of the present invention is a synthetic or recombinant protein; that is, it may be a product of chemical synthesis or may be produced from a prokaryotic or eukaryotic host (for example, bacteria, yeast or plants) using recombinant techniques. Depending on the host used in the recombinant production protocol, the gene editing enzyme of the present invention may be glycosylated or non-glycosylated. The gene editing enzyme of the present invention may or may not include an initial methionine residue.
The present invention also includes fragments, derivatives and analogs of the gene editing enzyme. As used herein, the terms "fragment", "derivative" and "analog" refer to a protein that substantially retains the same biological function or activity as the gene editing enzyme.
A fragment, derivative or analog of the gene editing enzyme of the present invention may be (i) a gene editing enzyme in which one or more conservative or non-conservative amino acid residues (preferably conservative amino acid residues) are substituted, where the substituted amino acid residues may or may not be encoded by the genetic code; or (ii) a gene editing enzyme having a substituent group in one or more amino acid residues; or (iii) a gene editing enzyme formed by fusing the mature gene editing enzyme with another compound (such as a compound that extends the half-life of the gene editing enzyme, for example polyethylene glycol); or (iv) a gene editing enzyme formed by fusing an additional amino acid sequence to the gene editing enzyme sequence (such as a leader sequence, a secretory sequence, a sequence used to purify the gene editing enzyme, a proprotein sequence, or a fusion protein formed with an antigen IgG fragment). In light of the teachings herein, these fragments, derivatives and analogs are within the knowledge of those skilled in the art. In the present invention, conservative substitutions are preferably made according to Table I.
Table I
Initial residue    Representative substitution    Preferred substitution
Ala (A)            Val; Leu; Ile                  Val
Arg (R)            Lys; Gln; Asn                  Lys
Asn (N)            Gln; His; Lys; Arg             Gln
Asp (D)            Glu                            Glu
Cys (C)            Ser                            Ser
Gln (Q)            Asn                            Asn
Glu (E)            Asp                            Asp
Gly (G)            Pro; Ala                       Ala
His (H)            Asn; Gln; Lys; Arg             Arg
Ile (I)            Leu; Val; Met; Ala; Phe        Leu
Leu (L)            Ile; Val; Met; Ala; Phe        Ile
Lys (K)            Arg; Gln; Asn                  Arg
Met (M)            Leu; Phe; Ile                  Leu
Phe (F)            Leu; Val; Ile; Ala; Tyr        Leu
Pro (P)            Ala                            Ala
Ser (S)            Thr                            Thr
Thr (T)            Ser                            Ser
Trp (W)            Tyr; Phe                       Tyr
Tyr (Y)            Trp; Phe; Thr; Ser             Phe
Val (V)            Ile; Leu; Met; Phe; Ala        Leu
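Table I can be encoded as a lookup table for checking whether a proposed substitution is among the listed ones. The sketch below transcribes the table using one-letter amino acid codes; the names `TABLE_I`, `is_listed_substitution` and `preferred_substitution` are hypothetical helpers introduced for illustration only.

```python
# Table I as: initial residue -> (representative substitutions, preferred substitution).
TABLE_I = {
    "A": ("VLI", "V"),   "R": ("KQN", "K"),   "N": ("QHKR", "Q"),  "D": ("E", "E"),
    "C": ("S", "S"),     "Q": ("N", "N"),     "E": ("D", "D"),     "G": ("PA", "A"),
    "H": ("NQKR", "R"),  "I": ("LVMAF", "L"), "L": ("IVMAF", "I"), "K": ("RQN", "R"),
    "M": ("LFI", "L"),   "F": ("LVIAY", "L"), "P": ("A", "A"),     "S": ("T", "T"),
    "T": ("S", "S"),     "W": ("YF", "Y"),    "Y": ("WFTS", "F"),  "V": ("ILMFA", "L"),
}


def is_listed_substitution(original, new):
    """Return True if 'new' is a representative substitution for 'original' in Table I."""
    representative, _preferred = TABLE_I[original]
    return new in representative


def preferred_substitution(original):
    """Return the preferred substitution for 'original' according to Table I."""
    return TABLE_I[original][1]


print(is_listed_substitution("F", "A"), preferred_substitution("F"))
```

Note that Table I describes substitutions expected to preserve function in variants of the enzyme; it is separate from the F148A mutation itself, which is the engineered change that defines the invention.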
In addition, the gene editing enzyme of the present invention may be modified. Modified forms (which usually do not change the primary structure) include chemically derivatized forms of the gene editing enzyme produced in vivo or in vitro, such as acetylated or carboxylated forms. Modifications also include glycosylation, such as gene editing enzymes produced by glycosylation during synthesis and processing of the gene editing enzyme or in further processing steps. Such modification can be achieved by exposing the gene editing enzyme to an enzyme that performs glycosylation (such as a mammalian glycosylase or deglycosylase). Modified forms also include sequences with phosphorylated amino acid residues (such as phosphotyrosine, phosphoserine or phosphothreonine). Also included are gene editing enzymes that have been modified to improve their resistance to proteolysis or to optimize their solubility.
The term "polynucleotide encoding a gene editing enzyme" may refer to a polynucleotide that includes a sequence encoding the gene editing enzyme of the present invention, or to a polynucleotide that additionally includes further coding and/or non-coding sequences.
The present invention also relates to variants of the above polynucleotides that encode polypeptides having the same amino acid sequence as that of the present invention, or fragments, analogs and derivatives of the gene editing enzyme. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide; it may contain a substitution, deletion or insertion of one or more nucleotides, but does not substantially alter the function of the gene editing enzyme it encodes.
The present invention also relates to polynucleotides that hybridize with the above sequences and have at least 50%, preferably at least 70%, and more preferably at least 80% identity with them. The present invention particularly relates to polynucleotides that hybridize with the polynucleotide of the present invention under stringent conditions. In the present invention, "stringent conditions" means: (1) hybridization and washing at low ionic strength and high temperature, such as 0.2×SSC, 0.1% SDS, 60°C; or (2) hybridization in the presence of a denaturing agent, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42°C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, preferably at least 95%.
The gene editing enzyme and polynucleotide of the present invention are preferably provided in isolated form and, more preferably, are purified to homogeneity.
The full-length polynucleotide sequence of the present invention can usually be obtained by PCR amplification, by recombinant methods or by artificial synthesis. For PCR amplification, primers can be designed according to the relevant nucleotide sequences disclosed in the present invention, especially the open reading frame sequence, and the relevant sequence can be amplified using a commercially available cDNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as the template. When the sequence is long, two or more rounds of PCR amplification are often needed, after which the amplified fragments are spliced together in the correct order.
Once the relevant sequence has been obtained, it can be produced in large quantities by recombinant methods. This usually involves cloning it into a vector, transferring the vector into cells, and then isolating the relevant sequence from the propagated host cells by conventional methods.
In addition, the relevant sequence can be produced by artificial synthesis, especially when the fragment is short. Usually, a long sequence is obtained by first synthesizing multiple small fragments and then ligating them.
At present, a DNA sequence encoding the protein of the present invention (or a fragment or derivative thereof) can be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into the various existing DNA molecules (such as vectors) and cells known in the art. Mutations can also be introduced into the protein sequence of the present invention by chemical synthesis.
Amplification of DNA/RNA by PCR is the preferred way to obtain the polynucleotide of the present invention. In particular, when it is difficult to obtain a full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. The primers used for PCR can be selected appropriately according to the sequence information of the present invention disclosed herein and can be synthesized by conventional methods. The amplified DNA/RNA fragments can be separated and purified by conventional methods such as gel electrophoresis.
Method of the invention
The present invention also provides a method for site-directed single-base editing of a gene, comprising the steps of:
(i) providing a cell, a first vector, and a second vector, wherein the first vector contains an expression cassette for the gene editing enzyme according to the second aspect of the present invention, and the second vector contains an expression cassette for expressing an sgRNA;
(ii) infecting the cell with the first vector and the second vector, so that site-directed single-base editing takes place in the cell.
In another preferred embodiment, the first vector contains a first nucleic acid construct, and the first nucleic acid construct has the 5'-to-3' structure of formula II:
P1-X1-L4-X2  (II)
wherein
P1 is a first promoter sequence;
X1 is a nucleotide sequence encoding the gene editing enzyme according to the second aspect of the present invention;
L4 is absent or a linker sequence;
X2 is a polyA sequence;
and each "-" is independently a bond or a nucleotide linker sequence.
The first promoter is selected from the group consisting of the CMV promoter, the CAG promoter, the PGK promoter, the EF1α promoter, the EFS promoter, and combinations thereof. In a preferred embodiment, the first promoter sequence is the CMV promoter.
In one embodiment of the present invention, the linker sequence is 30-120 nt long, preferably 48-96 nt, and its length is preferably a multiple of 3.
In the method, the first vector and the second vector may be the same or different. In a preferred embodiment, the first vector and the second vector are the same vector.
Preferably, the first vector and/or the second vector further contains an expression cassette for a selection marker. The selection marker is selected from the group consisting of green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, and combinations thereof.
In one embodiment of the present invention, the method is non-diagnostic and non-therapeutic.
In the method of the present invention, the cell is from one of the following species: human, non-human mammal, poultry, plant, or microorganism. The non-human mammals include rodents (such as mice, rats, rabbits), cattle, pigs, sheep, horses, dogs, cats, and non-human primates (such as monkeys).
In one embodiment of the present invention, the cell is selected from the group consisting of somatic cells, stem cells, germ cells, non-dividing cells, and combinations thereof. Preferably, the cell is selected from the group consisting of kidney cells, epithelial cells, endothelial cells, nerve cells, and combinations thereof.
In the present invention, when gene editing is performed with the method, the editing window spans the 4th to the 7th base of the 20-base sequence targeted by the sgRNA. Editing efficiency is highest at the 5th base and drops sharply on either side. By contrast, the unmutated ABE7.10 editing system has a wider editing window, from the 3rd to the 9th base; its editing efficiency is also highest at the 5th base but declines gradually on either side.
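As an illustration only, the two editing windows stated above can be expressed as a small Python sketch that lists which adenines of a 20-base target sequence fall inside each window. The function name and the example target sequence are hypothetical and are not taken from the disclosure; the sketch says nothing about actual editing efficiency at any position.

```python
# Window boundaries (1-based, inclusive) as stated in the text.
NARROW_WINDOW = (4, 7)  # the mutated editor described in the method
WIDE_WINDOW = (3, 9)    # unmutated ABE7.10

def adenines_in_window(protospacer, window):
    """Return the 1-based positions of 'A' bases that lie inside the window.

    `protospacer` is the 20-base sequence targeted by the sgRNA.
    This helper is hypothetical and only illustrates the window arithmetic.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if base == "A" and start <= pos <= end
    ]

# Made-up target sequence with adenines at positions 3-10 and 17.
example_target = "GCAAAAAAAACGTCGTACGT"
narrow_hits = adenines_in_window(example_target, NARROW_WINDOW)
wide_hits = adenines_in_window(example_target, WIDE_WINDOW)
```

For this made-up sequence the narrow window covers four candidate adenines and the wide window covers seven, which is the sense in which a narrower window reduces bystander edits.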
The main advantages of the present invention include:
1) The editing window of the single-base editing system ABE is narrowed, which greatly improves the precision of single-base editing. When gene editing is performed with the method of the present invention, the editing window spans the 4th to the 7th base of the 20-base sequence targeted by the sgRNA; editing efficiency is highest at the 5th base and drops sharply on either side. By contrast, the unmutated ABE7.10 editing system has a wider editing window, from the 3rd to the 9th base; its editing efficiency is also highest at the 5th base but declines gradually on either side.
2) The point mutations that the single-base editing system ABE generates at the RNA level are almost eliminated, which greatly improves the specificity of the ABE system.
3) ABE7.10 F148A retains almost all of the editing activity of ABE7.10 and shows comparable activity at the intended editing sites.
The present invention is further described below with reference to specific examples. It should be understood that these examples only illustrate the present invention and do not limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer. Unless otherwise specified, percentages and parts are by weight.
Unless otherwise specified, the materials and reagents used in the examples are commercially available products.
Methods and materials
Transient transfection and sequencing
Plasmids were constructed with NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs) according to the standard protocol. 293T cells were seeded in 10 cm dishes and cultured in Dulbecco's modified Eagle medium (DMEM, Thermo Fisher Scientific) supplemented with 10% FBS (Thermo Fisher Scientific) and penicillin/streptomycin at 37°C and 5% CO2. Cells were transfected with 30 μg of plasmid using Lipofectamine 3000 (Thermo Fisher Scientific). Three days after transfection, the cells were digested with 0.05% trypsin (Thermo Fisher Scientific) and prepared for FACS. GFP-positive cells were sorted and stored in DMEM or Trizol (Ambion) for determination of DNA base editing or for RNA-seq. To determine the efficiency of DNA base editing, cells were lysed with the One Step Mouse Genotyping Kit (Vazyme), followed by deep sequencing with Hi-TOM or by Sanger sequencing quantified with EditR 1.0.8. For RNA-seq, about 500,000 cells were collected, RNA was extracted according to the standard protocol and converted to cDNA, and the cDNA was used for high-throughput RNA-seq.
RNA editing analysis by RNA sequencing
High-throughput mRNA sequencing (RNA-seq) was performed on an Illumina HiSeq at an average coverage of 125x. FastQC (v0.11.3) and Trimmomatic (v0.36) were used for quality control. Qualified reads were mapped to the reference genome (Ensembl GRCh38) with STAR (v2.5.2b) in 2-pass mode, using the parameters implemented by the ENCODE project. Picard tools (v2.3.0) were then used to sort the mapped BAM files and mark duplicates. The refined BAM files were processed with SplitNCigarReads, IndelRealigner, BaseRecalibrator, and HaplotypeCaller from GATK (v3.5) to split reads that span splice junctions, perform local realignment, recalibrate base qualities, and call variants, respectively. To identify high-confidence variants, clusters of at least 5 SNVs within a 35-base window were filtered out, and variants were retained with base quality score > 25, mapping quality score > 20, and sequencing depth > 20, with the Fisher Strand (FS > 30.0) and Qual By Depth (QD < 2.0) thresholds applied as filters.
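The cluster-filtering step above (removing clusters of at least 5 SNVs within a 35-base window) can be sketched in Python as follows. This is only an illustrative approximation and is not the GATK implementation; the function name is hypothetical, and the interpretation that two SNVs share a window when their positions differ by fewer than 35 bases is an assumption.

```python
def clustered_indices(positions, min_cluster=5, window=35):
    """Return the indices of SNVs that belong to a dense cluster.

    `positions` must be sorted ascending and come from one chromosome.
    Assumption (not from the source): SNVs share a window when their
    positions differ by fewer than `window` bases.
    """
    flagged = set()
    right = 0
    for left in range(len(positions)):
        if right < left:
            right = left
        # Extend the window as far right as it can reach from `left`.
        while (right + 1 < len(positions)
               and positions[right + 1] - positions[left] < window):
            right += 1
        if right - left + 1 >= min_cluster:
            flagged.update(range(left, right + 1))
    return flagged

# Made-up positions: five SNVs packed into 31 bases, then two isolated ones.
example_positions = [100, 105, 110, 120, 130, 500, 900]
to_drop = clustered_indices(example_positions)
kept_positions = [p for i, p in enumerate(example_positions) if i not in to_drop]
```

Dense clusters of calls in RNA-seq data often reflect alignment artifacts rather than genuine editing, which is the usual motivation for this kind of filter.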
Any reliable variant found in wild-type 293T cells was regarded as an SNP and was filtered out of the GFP-transfected and base-editor-transfected groups before off-target analysis. The editing rate was calculated as the number of mutated reads divided by the sequencing depth at each site. To analyze the predicted effect of each off-target variant, variants were annotated with the Variant Effect Predictor (VEP, v94) and the GRCh38 database.
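The two calculations in this paragraph, background removal and the editing rate, are simple enough to state as a minimal Python sketch. The function names and the tuple layout (chromosome, position, reference base, alternate base) are hypothetical choices made for illustration.

```python
def remove_background(sample_sites, wild_type_sites):
    """Drop every variant that was also called in wild-type cells (treated as SNPs)."""
    return set(sample_sites) - set(wild_type_sites)

def editing_rate(mutated_reads, depth):
    """Editing rate = mutated reads / sequencing depth at the site."""
    if depth <= 0:
        raise ValueError("sequencing depth must be positive")
    return mutated_reads / depth

# Made-up calls: (chromosome, position, reference base, alternate base).
treated = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
wild_type = {("chr2", 500, "C", "T")}
off_target_candidates = remove_background(treated, wild_type)
example_rate = editing_rate(45, 100)
```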
Library construction for single-cell full-length RNA-seq
Single human 293T cells were picked manually after FACS, lysed, and subjected to cDNA synthesis with the Smart-seq2 protocol. The single-cell cDNA was then amplified and fragmented as previously described (2,3). Sequencing libraries were constructed (New England Biolabs), quality-checked, and sequenced on the Illumina HiSeq X-Ten platform (Novogene) with 150-bp paired-end reads.
Processing of single-cell RNA-seq data
The raw reads of the single-cell RNA-seq data were first trimmed and aligned to the GRCh38 human transcriptome (STAR v2.5.2b). After deduplication, RNA SNVs were identified in individual cells with GATK (v3.5). SNVs detected in single cells with DP ≥ 20.0, FS ≤ 30.0, and QD ≥ 2.0 were retained for downstream analysis.
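The retention thresholds stated above (DP ≥ 20.0, FS ≤ 30.0, QD ≥ 2.0) can be written as a short Python predicate. This is a sketch under the assumption that each call has already been parsed into a dictionary with the hypothetical keys "DP", "FS", and "QD"; it does not read VCF files.

```python
def passes_single_cell_filter(call):
    """Keep a call only if it meets all three thresholds stated in the text.

    `call` is a dict with hypothetical keys: DP (depth), FS (Fisher Strand),
    QD (Qual By Depth).
    """
    return call["DP"] >= 20.0 and call["FS"] <= 30.0 and call["QD"] >= 2.0

# Made-up calls for illustration.
example_calls = [
    {"id": "snv1", "DP": 35.0, "FS": 2.1, "QD": 12.0},   # passes
    {"id": "snv2", "DP": 12.0, "FS": 2.1, "QD": 12.0},   # too shallow
    {"id": "snv3", "DP": 50.0, "FS": 41.0, "QD": 12.0},  # strand bias
    {"id": "snv4", "DP": 50.0, "FS": 1.0, "QD": 1.5},    # low QD
]
retained_ids = [c["id"] for c in example_calls if passes_single_cell_filter(c)]
```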
Statistical analysis
All values are shown as mean ± SEM. The unpaired Student's t-test (two-tailed) was used for comparisons, and p < 0.05 was considered statistically significant.
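For readers who want to reproduce the summary statistics, the following Python sketch computes the mean, the SEM, and the unpaired Student's t statistic using only the standard library. It assumes the sample standard deviation (n-1 denominator) for the SEM and equal variances for the t statistic; neither choice is stated in the source. Converting the t statistic to a two-tailed p-value requires the t distribution and is left out here.

```python
import math
import statistics

def mean_and_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def unpaired_t_statistic(a, b):
    """Student's t statistic for two independent samples (pooled variance).

    Assumes equal variances in the two groups.
    """
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Made-up replicate values for illustration.
example_mean, example_sem = mean_and_sem([2, 4, 6])
example_t = unpaired_t_statistic([2, 4, 6], [1, 2, 3])
```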
Example 1: Detection of off-target RNA SNVs for various single-base editing systems
In this example, to evaluate the off-target effects of gene editing at the RNA level, the CBE BE3 (APOBEC1-nCas9-UGI) or the ABE ABE7.10 (TadA-TadA*-nCas9), together with GFP and with or without a single guide RNA (sgRNA), was transfected into cultured 293T cells. After 72 hours of incubation, GFP-expressing cells were collected by FACS and analyzed by RNA-seq. RNA SNVs were called in each transfected group by comparison with wild-type (WT, untransfected) samples (Figure 1A).
The 9 groups of transfected cells expressed GFP, APOBEC1, BE3, BE3 with the "site 3" sgRNA, BE3 with the "RNF2" sgRNA, TadA-TadA*, ABE7.10, ABE7.10 with the "site 1" sgRNA, and ABE7.10 with the "site 2" sgRNA (Figure 5).
First, targeted deep sequencing confirmed the high on-target efficiency of DNA editing by BE3 and ABE7.10 in these 293T cells; the results are shown in Figures 1B to 1E.
Next, RNA-seq was performed on these samples at an average depth of 125x (two or three replicates per group). RNA SNVs were called from the RNA-seq data of each replicate separately, and those also identified in any WT cells were filtered out.
The results are shown in Figures 1F to 1H and Figure 6. 742 ± 113 (SEM, n=6) RNA SNVs were found in GFP-transfected cells. Surprisingly, substantially more RNA SNVs (5- to 40-fold the number in cells expressing GFP alone) were found in cells expressing APOBEC1, BE3 without sgRNA, or BE3 with the site 3 or RNF2 sgRNA. Similarly, large numbers of RNA SNVs (5- to 10-fold) were found in cells expressing TadA-TadA*, ABE7.10 without sgRNA, or ABE7.10 with the site 1 or site 2 sgRNA.
Interestingly, this example found that transfection of APOBEC1 or TadA-TadA* induced more RNA SNVs than the other transfection groups, which suggests that the increase in SNVs in CBE- or ABE-treated cells is caused by overexpression of the deaminase APOBEC1 or TadA.
Example 2: Characterization of off-target RNA SNVs
In this example, the off-target RNA SNVs of each single-base editing system were characterized.
The results are shown in Figure 2 and Figures 7-12.
Notably, almost 100% of the RNA SNVs identified in BE3-treated cells were G-to-A or C-to-U mutations, a significantly higher proportion than in GFP-transfected cells (Figures 2A and 2C and Figure 7). This mutation bias is the same as that of APOBEC1 itself, which indicates that these mutations are not spontaneous but are induced by BE3 or APOBEC1.
Correspondingly, 95% of the mutations induced by ABE7.10 were A-to-G or U-to-C, consistent with the mode of action of ABE7.10 (Figures 2B and 2C and Figure 7).
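The mutation-type proportions reported above can be computed with a short classification routine. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes SNVs are supplied as (reference base, alternate base) pairs in which a DNA-style "T" is read as the RNA base "U".

```python
# Substitution signatures named in the text.
CBE_SIGNATURE = {("C", "U"), ("G", "A")}  # cytosine base editor (BE3)
ABE_SIGNATURE = {("A", "G"), ("U", "C")}  # adenine base editor (ABE7.10)

def to_rna(base):
    """Normalize a base to upper-case RNA notation (T is read as U)."""
    base = base.upper()
    return "U" if base == "T" else base

def signature_fraction(snvs, signature):
    """Fraction of (ref, alt) SNVs whose substitution type is in `signature`."""
    if not snvs:
        return 0.0
    hits = sum((to_rna(ref), to_rna(alt)) in signature for ref, alt in snvs)
    return hits / len(snvs)

# Made-up SNV list for illustration.
example_snvs = [("A", "G"), ("T", "C"), ("C", "T"), ("A", "G")]
abe_fraction = signature_fraction(example_snvs, ABE_SIGNATURE)
cbe_fraction = signature_fraction(example_snvs, CBE_SIGNATURE)
```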
The results also show that the GFP group exhibited some bias toward A-to-G and U-to-C mutations (Figure 2C), which may reflect an innate mutational preference.
Between any two samples of the BE3- or ABE7.10-transfected groups, an overlap of 27.7 ± 3.6% or 51.0 ± 3.3%, respectively, was observed, and the overlapping SNVs were significantly enriched in highly expressed genes (Figure 2D and Figure 8). However, none of the off-target sites overlapped with predicted off-target mutations, and no similarity was observed between the off-target sequences and the target sequences (Figure 2D and Figure 9).
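The source does not state how the pairwise overlap percentage was defined. The Python sketch below shows one plausible definition, shared sites divided by the union of sites in the two samples, averaged over all sample pairs; this definition and the function names are assumptions made for illustration and may differ from the calculation actually used.

```python
from itertools import combinations

def overlap_percent(sites_a, sites_b):
    """Percentage of shared sites relative to the union of both samples.

    Assumption: the denominator is the union; the source does not specify it.
    """
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

def mean_pairwise_overlap(samples):
    """Average `overlap_percent` over every pair of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(overlap_percent(x, y) for x, y in pairs) / len(pairs)

# Made-up site identifiers for illustration.
pair_overlap = overlap_percent({1, 2, 3, 4}, {3, 4, 5, 6})
group_overlap = mean_pairwise_overlap([{1, 2}, {1, 2}, {3}])
```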
Therefore, the off-target RNA SNVs induced by CBE and ABE are sgRNA-independent and are caused by overexpression of APOBEC1 and TadA-TadA*, respectively.
Interestingly, in this example ABE7.10 was observed to induce 56 and 12 non-synonymous RNA SNVs in oncogenes and tumor suppressor genes, respectively. Many of these showed editing rates above 40% and were verified by Sanger sequencing, which raises concerns about the oncogenic risk of DNA base editing (Figure 2E and Figures 10 to 12).
Example 3: Single-cell RNA SNV analysis of cells transfected with single-base editing systems
In this example, single-cell RNA-seq was performed on four groups of cells (WT, GFP, BE3-site 3, and ABE7.10-site 1) to avoid losing random off-target signals through population averaging.
The results are shown in Figure 3 and Figures 13-17.
On average, 10,932 RefSeq genes were detected per single cell from about 6.07 million sequencing reads (Figure 3B). Cells with high expression of the indicated deaminase were selected for further analysis (Figure 13). Severe RNA off-target effects and similar mutation patterns were observed in the cells expressing base editors (Figures 3C to 3F and Figures 14 and 15).
Interestingly, the percentage of off-target sites shared between any two BE3- or ABE7.10-edited single cells (4.5 ± 1.0%) was far lower than that between cell populations (40.8 ± 3.7%), which indicates that the off-target SNVs induced by BE3 or ABE7.10 are essentially random and sgRNA-independent (Figures 3G and 3H). Notably, the editing rates of non-synonymous mutations detected in some oncogenes and tumor suppressor genes in single cells were higher than those observed in cell populations (Figure 3I and Figures 16 and 17).
Example 4: Elimination of off-target RNA editing by rational design of the deaminase
In this example, to explore experimental approaches that might eliminate the RNA off-target activity of base editors, the inventors studied the potential effect of destabilizing RNA binding by APOBEC1 and TadA.
Specifically, the inventors tested whether replacing APOBEC1 with hA3A eliminates the RNA off-target activity of BE3 (Figure 4A).
The results are shown in Figure 4 and Figures 18-22.
Indeed, compared with BE3(APOBEC1)-transfected cells, BE3(hA3A)-transfected 293T cells showed significantly fewer off-target RNA SNVs while maintaining high on-target DNA editing efficiency (Figures 4B and 4C, Figure 18).
In another approach, the point mutation W90A was introduced into the predicted RNA-binding domain of APOBEC1. Although BE3(W90A) eliminated the RNA off-target effect, its on-target DNA editing activity was essentially abolished (Figures 4B and 4C, Figure 18).
In this example, to engineer ABE, the inventors introduced D53G or F148A into both TadA and TadA* of ABE7.10 (Figure 4A).
Interestingly, both ABE7.10 D53G and ABE7.10 F148A maintained high on-target DNA editing efficiency, and ABE7.10 F148A showed no RNA off-target effect at all (Figures 4D and 4E, Figure 18). In addition, the number of SNVs remaining in ABE7.10 F148A-transfected cells was close to that in cells transfected with GFP alone (Figures 19 to 21). This example further confirmed that the on-target DNA editing activity of ABE7.10 F148A was similar to that of ABE7.10 at four additional sites (Figure 4F).
Particularly noteworthy, the editing window of ABE7.10 F148A was significantly narrowed in this example (Figure 4G and Figure 22), which indicates improved precision of DNA base editing.
Therefore, the engineered ABE7.10 F148A of the present invention has broad application prospects.
All documents mentioned in the present invention are incorporated by reference in this application, as if each document were individually incorporated by reference. It should further be understood that, after reading the above teachings of the present invention, those skilled in the art can make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.

Claims (10)

  1. A mutant protein of the adenine deaminase TadA, characterized in that the mutant protein is a non-natural protein and carries a mutation at one or more amino acids of the adenine deaminase TadA selected from the group consisting of:
    phenylalanine (F) at position 147 and phenylalanine (F) at position 148;
    wherein positions 147 and 148 correspond to positions 147 and 148 of the sequence shown in SEQ ID NO: 1.
  2. The mutant protein of claim 1, characterized in that the mutant protein has the activity of catalyzing the hydrolytic deamination of adenine to hypoxanthine.
  3. The mutant protein of claim 1, characterized in that the adenine deaminase TadA includes the TadA* enzyme and the wild-type TadA enzyme.
  4. A gene editing enzyme, characterized in that the gene editing enzyme has the structure of formula I:
    Z1-L1-Z2-L2-Z3-L3-Z4    (I)
    wherein
    Z1 is the amino acid sequence of the adenine deaminase TadA;
    Z2 is the amino acid sequence of the TadA* enzyme;
    and Z1 and/or Z2 is the amino acid sequence of the mutant protein of claim 1;
    Z3 is the coding sequence of the Cas9 nuclease;
    L1, L2, and L3 are each independently an optional linker peptide sequence;
    Z4 is absent or a nuclear localization signal (NLS) element;
    and each "-" is independently a peptide bond.
  5. The gene editing enzyme of claim 4, characterized in that the amino acid sequence of the gene editing enzyme is shown in SEQ ID NO: 10.
  6. A polynucleotide, characterized in that the polynucleotide encodes the gene editing enzyme of claim 4.
  7. A vector, characterized in that the vector contains the polynucleotide of claim 6.
  8. A host cell, characterized in that the host cell contains the vector of claim 7, or has the polynucleotide of claim 6 integrated into its genome.
  9. A method for site-directed single-base editing of a gene, characterized in that it comprises the steps of:
    (i) providing a cell, a first vector, and a second vector, wherein the first vector contains an expression cassette for the gene editing enzyme of claim 2, and the second vector contains an expression cassette for expressing an sgRNA;
    (ii) infecting the cell with the first vector and the second vector, so that site-directed single-base editing takes place in the cell.
  10. A kit, characterized in that the kit comprises:
    (a1) a first container, and a first vector located in the first container, the first vector containing an expression cassette for the gene editing enzyme of claim 2.
PCT/CN2019/081532 2019-04-04 2019-04-04 New-type single-base editing technique and use thereof WO2020199200A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/081532 WO2020199200A1 (en) 2019-04-04 2019-04-04 New-type single-base editing technique and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/081532 WO2020199200A1 (en) 2019-04-04 2019-04-04 New-type single-base editing technique and use thereof

Publications (1)

Publication Number Publication Date
WO2020199200A1 true WO2020199200A1 (en) 2020-10-08

Family

ID=72664764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081532 WO2020199200A1 (en) 2019-04-04 2019-04-04 New-type single-base editing technique and use thereof

Country Status (1)

Country Link
WO (1) WO2020199200A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109295186A (en) * 2018-09-30 2019-02-01 中山大学 A kind of method based on genome sequencing detection adenine single base editing system undershooting-effect and its application in gene editing
CN109385425A (en) * 2018-11-13 2019-02-26 中山大学 A kind of high specific ABE base editing system and its application in β hemoglobinopathy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIN, WEI ET AL.: "Precise A•T to G•C Base Editing in the Zebrafish Genome", BMC BIOLOGY, vol. 16, 20 November 2018 (2018-11-20), XP55741388, DOI: 20191213155051A *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19923551

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19923551

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.03.2022)
