CN112048497B - Novel single-base editing technology and application thereof - Google Patents

Publication number
CN112048497B
Authority
CN
China
Legal status
Active
Application number
CN201910493592.8A
Other languages
Chinese (zh)
Other versions
CN112048497A (en)
Inventor
Yang Hui (杨辉)
Zhou Changyang (周昌阳)
Current Assignee
Huida Shanghai Biotechnology Co ltd
Original Assignee
Huida Shanghai Biotechnology Co ltd
Application filed by Huida Shanghai Biotechnology Co ltd
Priority to CN201910493592.8A
Priority to PCT/CN2019/111770 (WO2020244122A1)
Publication of CN112048497A
Application granted
Publication of CN112048497B

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/10 Cells modified by introduction of foreign genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Plant Pathology (AREA)
  • Cell Biology (AREA)
  • Mycology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention provides a novel single-base editing technology and application thereof. Specifically, the invention provides a gene editing enzyme whose structure is shown in formula I: Z1-L1-Z2-L2-Z3-Z4 (I), wherein Z1 is the amino acid sequence of cytosine deaminase APOBEC3A, and Z1 carries a mutation of the arginine (R) at the position corresponding to residue 128 of the sequence shown in SEQ ID NO. 1 to alanine (A); Z2 is the amino acid sequence of spCas9(n); Z3 is the amino acid sequence of the uracil DNA glycosylase inhibitor (UGI); L1 and L2 are each independently an optional linker peptide sequence; Z4 is absent or a nuclear localization signal (NLS) element; and each "-" is independently a peptide bond. The invention also provides a method for single-base site-directed editing of a gene. The method edits DNA with high accuracy and can markedly reduce RNA off-target effects.

Description

Novel single-base editing technology and application thereof
Technical Field
The invention relates to the field of biotechnology, in particular to a novel single-base editing technology and application thereof.
Background
Since 2013, a new generation of gene editing technology represented by CRISPR/Cas9 has entered experiments across the field of biology and has been changing conventional means of gene manipulation.
The DNA base editing methods developed in recent years can directly generate accurate point mutations in genomic DNA without generating double-strand breaks (DSBs). Two types of base editors have been reported: cytosine base editors (CBE, C to T and G to A) and adenine base editors (ABE, A to G and T to C). However, their use also presents a key problem, namely off-target effects.
Previous studies have focused mainly on assessing off-target mutations in genomic DNA. Recent research results indicate that CBEs, but not ABEs, induce a large number of off-target single nucleotide mutations during gene editing, underscoring the need to develop higher-fidelity single base editing tools. In addition to their DNA targeting activity, commonly used single base editing systems may also mutate RNA. For example, the cytosine deaminase APOBEC1 associated with CBE was found to target both DNA and RNA, and the adenine deaminase TadA associated with ABE was found to also induce site-specific inosine formation on RNA. However, the RNA targeting activity mediated by DNA base editors had not been studied previously. Studies have shown that both the cytosine base editor BE3 and the adenine base editor ABE7.10 produce tens of thousands of off-target RNA single nucleotide variations (SNVs), whereas cells without base editing exhibit only a few hundred SNVs.
Currently, existing DNA base editing methods edit DNA with limited accuracy, i.e., the gene editing window is too large. For example, ABE7.10, developed by David Liu's laboratory at Harvard University, can edit the third to eighth bases of the sgRNA targeting sequence, so any editable bases in that range other than the base of interest will also be edited non-specifically.
Therefore, there is a strong need in the art to develop a single base editing technique that has high accuracy, significantly reduces RNA off-target effects, and maintains effective DNA targeting activity.
Disclosure of Invention
The invention aims to provide a single-base editing technology which has high accuracy, remarkably reduces the RNA off-target effect and can maintain effective DNA targeting activity.
In a first aspect of the present invention there is provided a mutein of the cytosine deaminase APOBEC3A, said mutein being a non-natural protein and being mutated at one or more amino acids of the cytosine deaminase APOBEC3A selected from the group consisting of:
arginine (R) at position 128 and tyrosine (Y) at position 130,
wherein positions 128 and 130 are the positions corresponding to positions 128 and 130 of the sequence shown in SEQ ID NO. 1.
In another preferred embodiment, the cytosine deaminase APOBEC3A is derived from the species: human (Homo sapiens).
In another preferred embodiment, the mutein has the activity of catalyzing the hydrolytic deamination of cytosine to uracil.
In another preferred embodiment, the amino acid sequence of the wild-type APOBEC3A enzyme is shown in SEQ ID NO. 1.
In another preferred embodiment, the arginine (R) at position 128 is mutated to an amino acid residue other than arginine.
In another preferred embodiment, the arginine at position 128 is mutated to: alanine (A), glycine (G), phenylalanine (F), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), tyrosine (Y), or valine (V).
In another preferred embodiment, the arginine at position 128 is mutated to: leucine (L), valine (V), isoleucine (I) or alanine (A).
In another preferred embodiment, the arginine (R) at position 128 is mutated to an alanine (A) residue.
In another preferred embodiment, the tyrosine (Y) at position 130 is mutated to an amino acid residue other than tyrosine.
In another preferred embodiment, the tyrosine at position 130 is mutated to: alanine (A), glycine (G), phenylalanine (F), aspartic acid (D), cysteine (C), glutamine (Q), glutamic acid (E), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), serine (S), proline (P), threonine (T), tryptophan (W), arginine (R), or valine (V).
In another preferred embodiment, the tyrosine at position 130 is mutated to: leucine (L), valine (V), isoleucine (I), alanine (A), or phenylalanine (F).
In another preferred embodiment, the tyrosine (Y) at position 130 is mutated to a phenylalanine (F) residue.
In another preferred embodiment, the mutant protein has the same or substantially the same amino acid sequence as set forth in SEQ ID NO. 1, except for the mutation (e.g., at amino acids 128 and/or 130).
In another preferred embodiment, the substantial identity is up to 50 (preferably 1-20, more preferably 1-10, more preferably 1-5) amino acid differences, wherein the differences include amino acid substitutions, deletions or additions and the muteins still have activity in catalyzing the hydrolytic deamination of cytosine to uracil.
In another preferred embodiment, the mutant protein is cytosine deaminase APOBEC3A with an R128A mutation, and the amino acid sequence of the mutant protein is shown in SEQ ID NO. 2.
In another preferred embodiment, the mutant protein is cytosine deaminase APOBEC3A with a mutation of Y130F, and the amino acid sequence of the mutant protein is shown in SEQ ID NO. 3.
In another preferred embodiment, the mutant protein is cytosine deaminase APOBEC3A with R128A and Y130F mutation, and the amino acid sequence of the mutant protein is shown in SEQ ID NO. 4.
In another preferred embodiment, the amino acid sequence of the mutein has homology with the amino acid sequence shown in SEQ ID NO: 2, 3 or 4 of preferably at least 85% or 90%, more preferably at least 95%, most preferably at least 98%, for example 202/203 (99.5%).
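The R128A and Y130F substitutions described in this aspect can be illustrated with a minimal Python sketch. The helper name `apply_substitution` and the toy sequence are assumptions made for illustration only; the toy sequence is not SEQ ID NO. 1 and merely places an R at position 128 and a Y at position 130 so that the 1-based numbering can be checked.

```python
def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a point substitution written like 'R128A' (1-based position)."""
    wild_type = mutation[0]          # residue expected in the input sequence
    position = int(mutation[1:-1])   # 1-based residue number
    replacement = mutation[-1]       # residue to put in its place
    index = position - 1
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + replacement + seq[index + 1:]


# Hypothetical toy sequence (NOT SEQ ID NO. 1): glycine filler with
# R at position 128 and Y at position 130.
toy_sequence = "G" * 127 + "R" + "G" + "Y" + "G" * 10

single_mutant = apply_substitution(toy_sequence, "R128A")
double_mutant = apply_substitution(single_mutant, "Y130F")
print(double_mutant[125:132])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which matter here because the positions are defined relative to SEQ ID NO. 1.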
In a second aspect of the present invention, there is provided a gene-editing enzyme having the structure shown in formula I:
Z1-L1-Z2-L2-Z3-Z4 (I)
wherein,
Z1 is the amino acid sequence of the cytosine deaminase APOBEC3A mutein as described in the first aspect of the invention;
Z2 is the amino acid sequence of nuclease Cas9;
Z3 is the amino acid sequence of uracil DNA glycosylase inhibitor (UGI);
L1, L2 and L3 are each independently an optional linker peptide sequence;
Z4 is absent or a nuclear localization signal (NLS) element;
and each "-" is independently a peptide bond.
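As a rough illustration of formula I, the sketch below concatenates the elements in the order Z1-L1-Z2-L2-Z3-Z4, treating the linkers and Z4 as optional. The function name `assemble_editor` and all the strings passed to it are hypothetical labels, not the sequences of SEQ ID NO. 5 to 11.

```python
from typing import Optional


def assemble_editor(z1: str, z2: str, z3: str,
                    l1: str = "", l2: str = "",
                    z4: Optional[str] = None) -> str:
    """Join the elements of formula I, N-terminus to C-terminus.

    Z1, Z2 and Z3 are required; L1, L2 (linkers) and Z4 (NLS) may be absent.
    """
    for name, part in (("Z1", z1), ("Z2", z2), ("Z3", z3)):
        if not part:
            raise ValueError(f"{name} is required")
    # Each "-" in formula I is a peptide bond, i.e. plain concatenation here.
    return "".join([z1, l1, z2, l2, z3, z4 or ""])


# Placeholder labels standing in for real domain sequences.
with_nls = assemble_editor("DEAM", "CAS", "UGI", l1="GS", l2="SG", z4="NLS")
without_optional = assemble_editor("DEAM", "CAS", "UGI")
print(with_nls)
print(without_optional)
```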
In another preferred embodiment, the amino acid sequence of L1 is shown in SEQ ID NO. 5.
In another preferred embodiment, the amino acid sequence of L1 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 5.
In another preferred embodiment, the amino acid sequence of L2 is shown in SEQ ID NO. 6.
In another preferred embodiment, the amino acid sequence of L2 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 6.
In another preferred embodiment, the amino acid sequence of L3 is shown in SEQ ID NO. 7.
In another preferred embodiment, the amino acid sequence of L3 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 7.
In another preferred embodiment, in Z2, the source of nuclease Cas9 is selected from the group consisting of: Streptococcus pyogenes, Staphylococcus aureus, Streptococcus pyogenes mutants, or Staphylococcus aureus mutants.
In another preferred embodiment, in Z2, the nuclease Cas9 may be replaced with a Cpf1 nuclease, the source of the Cpf1 nuclease being selected from the group consisting of: Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
In another preferred embodiment, the amino acid sequence of Z2 is shown in SEQ ID NO. 8.
In another preferred embodiment, the amino acid sequence of Z2 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 8.
In another preferred embodiment, the amino acid sequence of Z3 is shown in SEQ ID NO. 11.
In another preferred embodiment, the amino acid sequence of Z3 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 11.
In another preferred embodiment, the amino acid sequence of Z4 is shown in SEQ ID NO. 9.
In another preferred embodiment, the amino acid sequence of Z4 is identical or substantially identical to the amino acid sequence shown in SEQ ID NO. 9.
In another preferred embodiment, said substantial identity is at most 50 (preferably 1-20, more preferably 1-10, more preferably 1-5, most preferably 1-3) amino acid differences, wherein said differences comprise amino acid substitutions, deletions or additions.
In another preferred embodiment, the substantial identity is at least 70%, at least 75%, at least 80%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity of the amino acid sequence to the corresponding amino acid sequence.
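The two "substantial identity" criteria above (a count of amino acid differences, or a percent identity) can be sketched as follows. This is a simplified illustration that assumes the two sequences are already aligned to equal length, with "-" marking gaps, and that identity is computed over the full alignment length; real comparisons would use an alignment tool, and other denominators are in use. The function names and test sequences are hypothetical.

```python
def count_differences(a: str, b: str) -> int:
    """Number of aligned positions that differ (substitution, deletion or addition)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions carrying the same residue in both sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Hypothetical 203-residue sequences differing at a single position.
reference = "A" * 203
variant = "A" * 202 + "G"

print(count_differences(reference, variant))
print(round(percent_identity(reference, variant), 1))
```

With one difference over 203 positions the identity is 202/203, which rounds to 99.5%.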
In another preferred embodiment, the amino acid sequence of the gene editing enzyme is shown in SEQ ID NO. 10.
In a third aspect of the invention there is provided a polynucleotide encoding a gene editing enzyme according to the second aspect of the invention.
In another preferred embodiment, the polynucleotide is selected from the group consisting of:
(a) A polynucleotide encoding an amino acid sequence shown in SEQ ID NO. 10;
(b) A polynucleotide having a nucleotide sequence which has a sequence identity of 95% (preferably 98%) or more to the polynucleotide sequence of (a);
(c) A polynucleotide complementary to the polynucleotide of any one of (a) and (b).
In another preferred embodiment, the ORF of the gene-editing enzyme additionally comprises auxiliary elements selected from the group consisting of: a signal peptide, a secretory peptide, a tag sequence (e.g., 6×His), or a combination thereof.
In another preferred embodiment, the signal peptide is a nuclear localization sequence.
In another preferred embodiment, the polynucleotide is selected from the group consisting of: a DNA sequence, an RNA sequence, or a combination thereof.
In a fourth aspect of the invention there is provided a vector comprising a polynucleotide according to the third aspect of the invention.
In another preferred embodiment, the vector comprises an expression vector, a shuttle vector, or an integration vector.
In a fifth aspect of the invention there is provided a host cell comprising a vector according to the fourth aspect of the invention, or having incorporated into its genome a polynucleotide according to the third aspect of the invention.
In another preferred embodiment, the host is a prokaryotic cell or a eukaryotic cell.
In another preferred embodiment, the prokaryotic cell comprises: Escherichia coli.
In another preferred embodiment, the eukaryotic cell is selected from the group consisting of: yeast cells, plant cells, mammalian cells, human cells (e.g., HEK293T cells), or combinations thereof.
In a sixth aspect of the present invention, there is provided a method for single base site-directed editing of a gene, comprising the steps of:
(i) Providing a cell and a first vector and a second vector, wherein the first vector comprises an expression cassette for a gene editing enzyme according to the second aspect of the invention and the second vector comprises an expression cassette for expression of sgRNA;
(ii) Infecting said cells with said first and second vectors, thereby performing single base site-directed editing within said cells.
In another preferred embodiment, the first vector comprises a first nucleic acid construct having, from 5' to 3', the structure of formula II:
P1-X1-L4-X2 (II)
wherein P1 is a first promoter sequence;
X1 is a nucleotide sequence encoding a gene editing enzyme according to the second aspect of the invention;
L4 is absent or a linker sequence;
X2 is a polyA sequence;
and each "-" is independently a bond or a nucleotide linking sequence.
In another preferred embodiment, the first promoter is selected from the group consisting of: CMV promoter, CAG promoter, PGK promoter, EF1α promoter, EFS promoter, or a combination thereof.
In another preferred embodiment, the first promoter sequence is a CMV promoter.
In another preferred embodiment, the length of the linker sequence is 30-120nt, preferably 48-96nt, and preferably a multiple of 3.
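The linker length constraint stated above (30-120 nt, preferably a multiple of 3) can be checked with a small helper. This is only a sketch: the function name `linker_length_ok` and the repeated `GGT` test sequence are hypothetical, and the default bounds are the broad range from the text rather than the narrower preferred range of 48-96 nt.

```python
def linker_length_ok(linker_nt: str, min_nt: int = 30, max_nt: int = 120) -> bool:
    """True if the linker length lies in [min_nt, max_nt] and is a multiple of 3."""
    length = len(linker_nt)
    return min_nt <= length <= max_nt and length % 3 == 0


# Hypothetical linkers built from a repeated codon.
print(linker_length_ok("GGT" * 16))        # 48 nt
print(linker_length_ok("GGT" * 16 + "G"))  # 49 nt, not a multiple of 3
print(linker_length_ok("GGT" * 9))         # 27 nt, below the range
```

Passing `min_nt=48, max_nt=96` applies the preferred range instead.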
In another preferred embodiment, the first vector and the second vector may be the same or different.
In another preferred embodiment, the first vector and the second vector may be the same vector.
In another preferred embodiment, the first vector and/or the second vector further comprises an expression cassette for expressing a selectable marker.
In another preferred embodiment, the selectable marker is selected from the group consisting of: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
In another preferred embodiment, the method is non-diagnostic and non-therapeutic.
In another preferred embodiment, the cells are from the following species: human, non-human mammal, poultry, plant, or microorganism.
In another preferred embodiment, the non-human mammal comprises a rodent (e.g., mouse, rat, rabbit), cow, pig, sheep, horse, dog, cat, non-human primate (e.g., monkey).
In another preferred embodiment, the cells are selected from the group consisting of: somatic cells, stem cells, germ cells, non-dividing cells, or a combination thereof.
In another preferred embodiment, the cells are selected from the group consisting of: kidney cells, epithelial cells, endothelial cells, neural cells, or combinations thereof.
In another preferred embodiment, when the method is used for gene editing, the editing window spans the 4th to 7th bases of the 20-base sequence targeted by the sgRNA; editing efficiency is highest at the 5th base and falls off markedly on both sides. The editing window of the non-mutated BE3-hA3A editing system is wider than that of the method: it spans the 3rd to 9th bases, with editing efficiency again highest at the 5th base and decreasing gradually on both sides.
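To make the two editing windows concrete, the following sketch lists the cytosines of a 20-base target sequence that fall inside a given window. It assumes positions are numbered 1 to 20 from the 5' end of the targeted sequence, which the text does not state explicitly; the function name and the example protospacer are hypothetical.

```python
from typing import List, Tuple


def editable_cytosines(protospacer: str, window: Tuple[int, int] = (4, 7)) -> List[int]:
    """Return the 1-based positions of C inside the editing window."""
    seq = protospacer.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [pos for pos in range(start, end + 1) if seq[pos - 1] == "C"]


# Hypothetical target sequence with C at positions 3, 4, 6, 10, 11 and 17.
target = "GACCTCAGTCCATGGACTGA"

narrow = editable_cytosines(target, window=(4, 7))  # window of the mutant editor
wide = editable_cytosines(target, window=(3, 9))    # window of non-mutated BE3-hA3A
print(narrow, wide)
```

In this toy target the C at position 3 is a bystander only for the wider window, which is the kind of non-specific edit the narrower window is meant to avoid.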
In a seventh aspect of the invention, there is provided a kit comprising:
(a1) A first container, and a first vector in the first container, the first vector comprising an expression cassette for a gene-editing enzyme according to the second aspect of the invention.
In another preferred embodiment, the kit further comprises:
(a2) A second container, and a second vector in the second container, the second vector comprising an expression cassette for expressing sgRNA.
In another preferred embodiment, the first vector and/or the second vector further comprises an expression cassette for expressing a selectable marker.
In another preferred embodiment, the first container and the second container may be the same container or different containers.
In another preferred embodiment, the kit further comprises instructions describing: a method of infecting a cell with the first vector and the second vector, thereby performing single-base site-directed editing of a gene in the cell.
It is understood that within the scope of the present invention, the above-described technical features of the present invention and technical features specifically described below (e.g., in the examples) may be combined with each other to constitute new or preferred technical solutions. For reasons of space, they are not described one by one here.
Drawings
FIG. 1 shows the SNV results of off-target RNAs for each single base editing system.
A: experimental design scheme.
B: DNA targeting efficiency of WT (n=3 replicates), GFP (n=3 replicates), APOBEC1 (n=3 replicates), BE3 (n=3 replicates) and BE3-site 3 (n=2 replicates). Note that APOBEC1 is the cytosine deaminase of BE3.
C: DNA targeting efficiency of WT, GFP, APOBEC1, BE3 and BE3-RNF2. Each group n=3 replicates.
D: DNA targeting efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 1. Each group n=3 replicates. Note that TadA-TadA* (a heterodimer of the wild-type TadA enzyme and evolved TadA) is the adenine deaminase of ABE7.10; the evolved TadA is denoted TadA*.
E: DNA targeting efficiency of WT, GFP, TadA-TadA*, ABE7.10 and ABE7.10-site 2. Each group n=3 replicates.
F, G: Comparison of off-target RNA SNVs for the BE3 and ABE7.10 groups.
H: Representative distribution of off-target RNA SNVs on human chromosomes for GFP, BE3 and ABE7.10. Chromosomes are represented in different colors. The GFP group served as a control for all comparisons. All values are expressed as mean ± SEM. * p < 0.05, ** p < 0.01, *** p < 0.001, unpaired t-test.
FIG. 2 shows characterization of off-target RNA SNV.
A: Proportion of G > A and C > U mutations for GFP (n=6 replicates), APOBEC1 (n=3 replicates), BE3-site 3 (n=2 replicates) and BE3-RNF2 (n=3 replicates).
B: Proportion of A > G and U > C mutations for GFP (n=6 replicates), TadA-TadA* (n=3 replicates), ABE7.10-site 1 (n=3 replicates) and ABE7.10-site 2 (n=3 replicates).
C: distribution of mutation types for each group. The numbers represent the percentage of a mutation among all mutations.
D: ratio of shared RNA SNV between any two samples in the BE3 and ABE7.10 groups. The ratio in each cell was calculated by dividing the number of overlapping RNA SNVs between the two samples by the number of RNA SNVs in the row.
E: ABE7.10-induced non-synonymous mutations located in oncogenes and tumor suppressors, showing those with the highest editing rates. Gene names are indicated in blue, amino acid mutations in red, and single nucleotide conversions in green. The GFP group served as a control for all comparisons. All values are expressed as mean ± SEM. * p < 0.05, ** p < 0.01, *** p < 0.001, unpaired t-test.
FIG. 3 shows the results of single cell RNA SNV analysis of cells transfected with a base editor.
A: SNV profile analyzed by single cell RNA sequencing method.
B: expression patterns of ABE, BE3 or GFP in single cells from single cell RNA-seq data.
C: Number of off-target RNA SNVs detected in GFP- (n=15 cells), BE3-site 3- (n=4 cells) and ABE7.10-site 1- (n=9 cells) treated single cells.
D: Proportion of G > A and C > U mutations.
E: Proportion of A > G and U > C mutations for GFP (n=15 cells), BE3-site 3 (n=4 cells) and ABE7.10-site 1 (n=9 cells).
F: Distribution of mutation types in each cell. The numbers represent the percentage of a mutation among all mutations.
G, H: Ratio of shared SNVs between any two samples in the same group. The ratio in each cell is calculated by dividing the number of overlapping SNVs between the two samples by the number of SNVs in the sample in the row.
I: Editing rate of SNVs located on cancer-related genes that occurred in at least 3 ABE7.10-edited single cells. The GFP group served as a control for all comparisons. All values are expressed as mean ± SEM. * p < 0.05, ** p < 0.01, *** p < 0.001, unpaired t-test.
FIG. 4 shows the results of elimination of off-target RNA SNV by a rational design of deaminase.
A: schematic representation of BE3 and ABE7.10 variants. All deaminase mutations were performed in the BE3/ABE7.10 background. Point mutations are indicated by red lines.
B: Targeting efficiency of GFP (n=3 replicates), BE3-site 3 (n=2 replicates), BE3 (hA3A)-site 3 (n=3 replicates) and BE3 (W90A)-site 3 (n=3 replicates).
C: Comparison of off-target RNA SNVs in the BE3-site 3 treatment groups.
D: Targeting efficiency of the GFP, ABE7.10-site 1, ABE7.10 (D53G)-site 1 and ABE7.10 (F148A)-site 1 groups. Each group n=3 replicates.
E: Comparison of off-target RNA SNVs in the ABE7.10 treatment groups.
F: Comparison of the editing efficiencies of ABE7.10 and ABE7.10 (F148A) at four different sites. Each group n=3 replicates.
G: Representative editing sites showing that ABE7.10 (F148A) narrows the editing window. All values are expressed as mean ± SEM. * p < 0.05, ** p < 0.01, *** p < 0.001, unpaired t-test.
FIG. 5 shows a schematic representation of the plasmid.
Fig. 6 shows a representative distribution of off-target RNA SNV on chromosomes.
A: APOBEC1, BE3-site 3 and BE3-RNF2. B: TadA-TadA*, ABE7.10-site 1 and ABE7.10-site 2.
Figure 7 shows the distribution of mutation types for each repeat of all groups. The numbers represent the percentage of a mutation of a certain type among all mutations.
A: distribution of mutation types for each repeat of GFP group.
B: Distribution of mutation types for each repeat of the APOBEC1 and BE3 groups with or without sgRNAs.
C: Distribution of mutation types for each repeat of the TadA-TadA* and ABE7.10 groups with or without sgRNAs.
Figure 8 shows that genes containing overlapping off-target RNA SNV were significantly higher than the random mock genes in all BE3 and ABE7.10 transfected groups. P values were calculated by a two-sided Student's t-test.
FIG. 9 shows the similarity between adjacent off-target RNA SNV sequences and target sequences
FIG. 10 shows the rate of editing BE 3-induced non-synonymous mutations located on oncogenes and tumor suppressor genes. Single nucleotide conversion is indicated in green, amino acid mutation in red, and gene name in blue.
FIG. 11 shows the rate of editing ABE7.10 induced non-synonymous mutations located on oncogenes and tumor suppressor genes. Single nucleotide conversion is indicated in green, amino acid mutation in red, and gene name in blue.
FIG. 12 shows that off-target RNA SNVs were detected only in RNA, not in DNA. Sanger sequencing chromatograms showed that the U-to-C mutations in the two highest-ranked oncogenes, TOPRs and CSDE1, were observed only in RNA.
FIG. 13 shows the expression levels of the transfection vector in single cells. The expression levels of GFP, APOBEC1 and TadA-TadA* were quantified in all sequenced single cells. The threshold is indicated by the blue dashed line. The log2 (FPKM+1) thresholds for GFP, BE3 and ABE7.10 were 0.3, 1 and 0.3, respectively. Cells with expression levels above the threshold are included for further analysis.
Fig. 14 shows the mutation type distribution of all single cells.
A: distribution of mutation types in GFP-transfected single cells (n=16 cells).
B: Distribution of mutation types in single cells transfected with BE3-site 3 (n=31 cells). Cells expressing APOBEC1 at levels above the threshold are enclosed in the red squares.
C: Distribution of mutation types in single cells transfected with ABE7.10-site 1 (n=28 cells). Cells with TadA-TadA* expression levels above the threshold are enclosed in the red squares. The numbers indicate the percentage of a mutation among all mutations. SC denotes single cell.
Fig. 15 shows the distribution of off-target RNA SNV on human chromosomes for all individual cells with expression levels above a threshold.
A: distribution of off-target RNA SNV on human chromosome for GFP transfected single cells (n=15).
B: distribution of off-target RNA SNV on human chromosome for single cells transfected with BE3 site 3 (n=4).
C: distribution of off-target RNA SNV on human chromosome for single cells transfected with ABE7.10 site 1 (n=9).
FIG. 16 shows the editing rates of BE3-induced non-synonymous mutations located in oncogenes and tumor suppressor genes in single cells. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
FIG. 17 shows the editing rates of ABE7.10-induced non-synonymous mutations located in oncogenes and tumor suppressor genes in single cells. Single-nucleotide conversions are shown in green, amino acid mutations in red, and gene names in blue.
Figure 18 shows representative distributions of off-target RNA SNV on human chromosomes of engineered BE3 and ABE7.10 variants.
Fig. 19 shows the average distribution of mutation types for the engineered variants of BE3 and ABE7.10, n=3 for each group.
FIG. 20 shows the distribution of mutation types for each sample of engineered variants of BE3 and ABE 7.10.
Figure 21 shows the ratio of shared RNA SNV between any two samples in the engineered variants of BE3 and ABE 7.10. The ratio in each cell was calculated by dividing the number of overlapping RNA SNVs between the two samples by the number of RNA SNVs in the row.
Fig. 22 shows a comparison of the width of the editing window between ABE7.10 (n=3) and ABE7.10 F148A (n=3).
FIG. 23 shows homology of TadA enzymes in various species.
Detailed Description
After extensive and intensive research and broad screening, the inventors unexpectedly found for the first time that, when amino acid residue R at position 128 of the APOBEC3A fragment of the cytosine deaminase (APOBEC3A) associated with the cytosine base editor BE3-hA3A is mutated to A (i.e., APOBEC3A R128A), or amino acid residue Y at position 130 is mutated to F (i.e., APOBEC3A Y130F), the gene editing window is significantly narrowed while effective DNA targeting activity is maintained, so that the accuracy of gene editing is significantly improved. Experiments further showed that when the mutant (i.e., APOBEC3A R128A or APOBEC3A Y130F) is used in a gene editing system, off-target effects on RNA are greatly reduced. The present application has been completed on the basis of this finding.
Terminology
As used herein, the term "base mutation" refers to the occurrence of a substitution (substitution), insertion (insertion) and/or deletion (deletion) of a base at a position in a nucleotide sequence.
As used herein, the term "base substitution" refers to a mutation of a base at a position of a nucleotide sequence to a different base, such as a mutation of C to T.
As used herein, "selectable marker gene" refers to a gene used in a transgenic process to screen transgenic cells or transgenic animals. The selectable marker genes useful in the present application are not particularly limited and include the various selectable marker genes commonly used in the transgenic art. Representative examples include (but are not limited to): a luciferin protein, a luciferase (e.g., firefly luciferase, Renilla luciferase), a green fluorescent protein, a yellow fluorescent protein, a red fluorescent protein, or a combination thereof.
As used herein, the term "Cas protein" refers to a nuclease. One preferred Cas protein is the Cas9 protein. Typical Cas9 proteins include (but are not limited to) Cas9 from Staphylococcus aureus. In the present application, the Cas9 protein may also be replaced by a Cpf1 nuclease, the source of which is selected from the group consisting of: Acidaminococcus, Lachnospiraceae, Acidaminococcus mutants, and Lachnospiraceae mutants.
Cytosine deaminase APOBEC3A
APOBEC3A is a cytosine deaminase of human origin.
The APOBEC3A enzyme has cytosine deaminase activity and is capable of deaminating cytosine (C) to uracil (U).
Some of its amino acid residues are highly conserved across many species, in particular the residue at position 128, numbered according to the sequence shown in SEQ ID NO. 1 of the present invention.
Accordingly, the term "APOBEC3A" refers to a protein comprising the amino acid sequence of an APOBEC3A enzyme whose amino acid sequence has not been mutated according to the invention.
In one embodiment of the invention, the wild-type APOBEC3A enzyme has the amino acid sequence shown in SEQ ID NO. 1.
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSE(SEQ ID NO:1)
In a preferred embodiment, the APOBEC3A (R128A) enzyme has the amino acid sequence shown in SEQ ID NO. 2.
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAAAIYYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSE(SEQ ID NO:2)
In a preferred embodiment, the APOBEC3A (Y130F) enzyme has the amino acid sequence shown in SEQ ID NO. 3.
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIFYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSE(SEQ ID NO:3)
In a preferred embodiment, the APOBEC3A (R128A and Y130F) enzyme has the amino acid sequence shown in SEQ ID NO. 4.
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAAAIFYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSE(SEQ ID NO:4)
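The relationship between SEQ ID NO: 1 and the mutant sequences SEQ ID NO: 2 to 4 can be checked with a short script. The following is a minimal sketch, assuming 1-based residue numbering on SEQ ID NO: 1; the helper name `apply_substitution` is hypothetical and is not part of the disclosure.

```python
# Wild-type APOBEC3A, copied from SEQ ID NO: 1 in the text above.
APOBEC3A_WT = (
    "MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRH"
    "AELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYYYDPLYKEAL"
    "QMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSE"
)


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a substitution written as '<ref><1-based position><alt>', e.g. 'R128A'.

    Hypothetical helper: it checks that the reference residue is really
    present at the stated position before replacing it.
    """
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


r128a = apply_substitution(APOBEC3A_WT, "R128A")   # corresponds to SEQ ID NO: 2
y130f = apply_substitution(APOBEC3A_WT, "Y130F")   # corresponds to SEQ ID NO: 3
double = apply_substitution(r128a, "Y130F")        # corresponds to SEQ ID NO: 4

# Print residues 121-132 of each variant so the changed positions are visible.
for name, seq in [("WT", APOBEC3A_WT), ("R128A", r128a),
                  ("Y130F", y130f), ("R128A+Y130F", double)]:
    print(f"{name:12s} {seq[120:132]}")
```

The printed segment covers residues 121 to 132, so positions 128 and 130 can be compared directly against the sequences listed above.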
The gene editing enzyme and its coding nucleic acid
As used herein, the terms "gene-editing enzyme", "gene-editing enzyme of the invention" and "APOBEC3A R128A of the invention" are used interchangeably and refer to the gene-editing enzyme of the second aspect of the invention, which has the structure of formula I:
Z1-L1-Z2-L2-Z3-L3-Z4 (I)
wherein Z1 is the amino acid sequence of a mutein according to the first aspect of the invention;
Z2 is the amino acid sequence of the nuclease Cas9;
Z3 is the amino acid sequence of uracil DNA glycosylase inhibitor (UGI);
L1, L2 and L3 are each independently an optional linker peptide sequence;
Z4 is absent or is a nuclear localization signal (NLS) element;
and each "-" is independently a peptide bond.
In a preferred embodiment, the amino acid sequence of Z2 is shown in SEQ ID NO. 8.
DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD(SEQ ID NO:8)
In a preferred embodiment, the amino acid sequence of Z3 is shown in SEQ ID NO. 11.
TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKML(SEQ ID NO:11)
In one embodiment of the invention, each of L1, L2 and L3 independently has an amino acid sequence selected from the group consisting of: GGS, (GGS)2, (GGS)3, (GGS)4, (GGS)5, (GGS)6, (GGS)7, or a combination thereof.
In a preferred embodiment, the amino acid sequence of L1 is TPGTSESATPES (SEQ ID NO: 5); the amino acid sequence of the L2 is SGGS (SEQ ID NO: 6); the amino acid sequence of L3 is SGGS (SEQ ID NO: 7).
In a preferred embodiment, the Z4 is a nuclear localization signaling element (NLS) and the amino acid sequence is PKKKRKV (SEQ ID NO: 9).
In a preferred embodiment of the present invention, a typical amino acid sequence of the gene editing enzyme of the present invention is shown in SEQ ID NO. 10.
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAAAIYYYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV*(SEQ ID NO:10)
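The modular layout of SEQ ID NO: 10 (deaminase, linker, Cas9, linker, UGI, linker, NLS) can be illustrated by concatenating the listed elements from the N-terminus to the C-terminus. The following is a minimal sketch: the linker, UGI and NLS sequences are copied from SEQ ID NO: 5, 6, 7, 9 and 11, whereas `Z1_STUB` and `Z2_STUB` are hypothetical truncated stand-ins (only terminal residues of SEQ ID NO: 2 and SEQ ID NO: 8) used to keep the example short, and the function names are assumptions.

```python
L1 = "TPGTSESATPES"   # SEQ ID NO: 5
L2 = "SGGS"           # SEQ ID NO: 6
L3 = "SGGS"           # SEQ ID NO: 7
NLS = "PKKKRKV"       # SEQ ID NO: 9
UGI = ("TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQ"
       "DSNGENKIKML")  # SEQ ID NO: 11

# Hypothetical truncated stand-ins, NOT the full sequences:
Z1_STUB = "NQGNSGSE"                       # last residues of SEQ ID NO: 2
Z2_STUB = "DKKYSIGLAIGT" + "RIDLSQLGGD"    # first and last residues of SEQ ID NO: 8


def ggs_linker(n: int) -> str:
    """Return a (GGS)n linker, e.g. n=3 gives GGSGGSGGS."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "GGS" * n


def assemble_fusion(parts) -> str:
    """Join protein segments N- to C-terminus; absent (empty/None) segments are skipped."""
    return "".join(p for p in parts if p)


fusion = assemble_fusion([Z1_STUB, L1, Z2_STUB, L2, UGI, L3, NLS])
print(fusion)
```

Passing an empty string or `None` for an optional element (for example the NLS) simply omits it from the assembled sequence.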
The invention also includes polypeptides or proteins that share homology with the sequence shown in SEQ ID NO: 10 of the present invention (preferably 60% or more, 70% or more, 80% or more, more preferably 90% or more, more preferably 95% or more, most preferably 98% or more, such as 99%) and that have the same or similar function.
The term "same or similar functions" mainly means: "Activity to catalyze the hydrolytic deamination of adenine to hypoxanthine".
It will be appreciated that the amino acid numbering in a gene-editing enzyme of the invention is based on SEQ ID NO. 10. When a particular gene-editing enzyme has 80% or more homology to the sequence shown in SEQ ID NO. 10, its amino acid numbering may be offset relative to the numbering of SEQ ID NO. 10, for example by 1-5 positions toward the N- or C-terminus. Those skilled in the art will generally understand such an offset to be within reasonable limits, and mutants having 80% (e.g., 90%, 95%, 98%) homology and the same or similar catalytic activity should not be excluded from the scope of the gene-editing enzyme of the invention merely because of such an offset in amino acid numbering.
The gene editing enzymes of the invention are synthetic or recombinant proteins, i.e., can be the products of chemical synthesis, or can be produced from a prokaryotic or eukaryotic host (e.g., bacteria, yeast, plants) using recombinant techniques. Depending on the host used in the recombinant production protocol, the gene editing enzymes of the invention may be glycosylated or may be non-glycosylated. The gene editing enzymes of the invention may or may not also include an initial methionine residue.
The invention also includes fragments, derivatives and analogues of the gene editing enzyme. As used herein, the terms "fragment," "derivative" and "analog" refer to a protein that retains substantially the same biological function or activity of the gene-editing enzyme.
The gene-editing enzyme fragment, derivative or analogue of the invention may be (i) a gene-editing enzyme in which one or more conserved or non-conserved amino acid residues (preferably conserved amino acid residues) are substituted, where the substituted residues may or may not be encoded by the genetic code; or (ii) a gene-editing enzyme having a substituent group on one or more amino acid residues; or (iii) a gene-editing enzyme formed by fusing the mature gene-editing enzyme with another compound, such as a compound that extends its half-life, for example polyethylene glycol; or (iv) a gene-editing enzyme formed by fusing an additional amino acid sequence to the gene-editing enzyme sequence, such as a leader sequence, a secretory sequence, a sequence used to purify the gene-editing enzyme, a proprotein sequence, or a fusion protein formed with an antigen IgG fragment. Such fragments, derivatives and analogues are within the purview of those skilled in the art in light of the teachings herein. In the present invention, conservatively substituted amino acids are preferably generated by amino acid substitution according to Table I.
TABLE I
Initial residue    Representative substitutions    Preferred substitution
Ala (A)            Val; Leu; Ile                   Val
Arg (R)            Lys; Gln; Asn                   Lys
Asn (N)            Gln; His; Lys; Arg              Gln
Asp (D)            Glu                             Glu
Cys (C)            Ser                             Ser
Gln (Q)            Asn                             Asn
Glu (E)            Asp                             Asp
Gly (G)            Pro; Ala                        Ala
His (H)            Asn; Gln; Lys; Arg              Arg
Ile (I)            Leu; Val; Met; Ala; Phe         Leu
Leu (L)            Ile; Val; Met; Ala; Phe         Ile
Lys (K)            Arg; Gln; Asn                   Arg
Met (M)            Leu; Phe; Ile                   Leu
Phe (F)            Leu; Val; Ile; Ala; Tyr         Leu
Pro (P)            Ala                             Ala
Ser (S)            Thr                             Thr
Thr (T)            Ser                             Ser
Trp (W)            Tyr; Phe                        Tyr
Tyr (Y)            Trp; Phe; Thr; Ser              Phe
Val (V)            Ile; Leu; Met; Phe; Ala         Leu
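Table I can be used as a simple lookup to decide whether a given substitution counts as conservative. The following is a minimal sketch in which the table is transcribed into one-letter codes; the dictionary name `CONSERVATIVE` and the function `classify_substitution` are hypothetical names chosen for illustration.

```python
# Table I in one-letter code: residue -> (representative substitutions, preferred substitution)
CONSERVATIVE = {
    "A": ("VLI", "V"),   "R": ("KQN", "K"),   "N": ("QHKR", "Q"),  "D": ("E", "E"),
    "C": ("S", "S"),     "Q": ("N", "N"),     "E": ("D", "D"),     "G": ("PA", "A"),
    "H": ("NQKR", "R"),  "I": ("LVMAF", "L"), "L": ("IVMAF", "I"), "K": ("RQN", "R"),
    "M": ("LFI", "L"),   "F": ("LVIAY", "L"), "P": ("A", "A"),     "S": ("T", "T"),
    "T": ("S", "S"),     "W": ("YF", "Y"),    "Y": ("WFTS", "F"),  "V": ("ILMFA", "L"),
}


def classify_substitution(mutation: str) -> str:
    """Classify a substitution such as 'Y130F' against Table I.

    Returns 'preferred', 'representative' or 'not listed'.
    """
    ref, alt = mutation[0], mutation[-1]
    representative, preferred = CONSERVATIVE[ref]
    if alt == preferred:
        return "preferred"
    if alt in representative:
        return "representative"
    return "not listed"


for m in ("Y130F", "R128A", "I10V"):   # 'I10V' is an arbitrary illustrative example
    print(m, "->", classify_substitution(m))
```

Under this lookup, Tyr to Phe is the preferred substitution for Tyr, whereas Arg to Ala does not appear among the substitutions listed for Arg in Table I.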
In addition, the gene-editing enzyme of the present invention may be modified. Modified (typically without altering the primary structure) forms include: chemical derivative forms of gene editing enzymes in vivo or in vitro such as acetylation or carboxylation. Modifications also include glycosylation, such as those resulting from glycosylation modifications during synthesis and processing of the gene-editing enzyme or during further processing steps. Such modification may be accomplished by exposing the gene editing enzyme to an enzyme that performs glycosylation (e.g., mammalian glycosylase or deglycosylase). Modified forms also include sequences having phosphorylated amino acid residues (e.g., phosphotyrosine, phosphoserine, phosphothreonine). Also included are gene editing enzymes modified to improve their proteolytic resistance or to optimize their solubility properties.
The term "polynucleotide encoding a gene-editing enzyme" may include polynucleotides encoding the gene-editing enzymes of the invention, as well as polynucleotides further comprising additional coding and/or non-coding sequences.
The invention also relates to variants of the above polynucleotides which encode polypeptides having the same amino acid sequence as the invention, or fragments, analogues and derivatives of the gene editing enzymes. Such nucleotide variants include substitution variants, deletion variants and insertion variants. As known in the art, an allelic variant is an alternative form of a polynucleotide, which may involve a substitution, deletion, or insertion of one or more nucleotides, without substantially altering the function of the gene-editing enzyme it encodes.
The invention also relates to polynucleotides which hybridize to the sequences described above and which have at least 50%, preferably at least 70%, more preferably at least 80% identity between the two sequences. The invention relates in particular to polynucleotides which hybridize under stringent conditions to the polynucleotides of the invention. In the present invention, "stringent conditions" means: (1) hybridization and washing at low ionic strength and high temperature, e.g., 0.2× SSC, 0.1% SDS, 60 °C; or (2) hybridization in the presence of a denaturing agent, e.g., 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42 °C; or (3) hybridization occurring only when the identity between the two sequences is at least 90%, more preferably at least 95%.
The gene editing enzymes and polynucleotides of the invention are preferably provided in isolated form, and more preferably purified to homogeneity.
The full-length polynucleotide sequence of the present invention can be obtained by PCR amplification, recombinant methods or artificial synthesis. For the PCR amplification method, primers can be designed according to the nucleotide sequences disclosed in the present invention, particularly the open reading frame sequences, and amplified to obtain the relevant sequences using a commercially available cDNA library or a cDNA library prepared according to a conventional method known to those skilled in the art as a template. When the sequence is longer, it is often necessary to perform two or more PCR amplifications, and then splice the amplified fragments together in the correct order.
Once the relevant sequences are obtained, recombinant methods can be used to obtain the relevant sequences in large quantities. This is usually done by cloning it into a vector, transferring it into a cell, and isolating the relevant sequence from the propagated host cell by conventional methods.
Furthermore, the sequences concerned, in particular fragments of short length, can also be synthesized by artificial synthesis. In general, fragments of very long sequences are obtained by first synthesizing a plurality of small fragments and then ligating them.
At present, it is already possible to obtain the DNA sequences encoding the proteins of the invention (or fragments or derivatives thereof) entirely by chemical synthesis. The DNA sequence can then be introduced into a variety of existing DNA molecules (or vectors, for example) and cells known in the art. In addition, mutations can be introduced into the protein sequences of the invention by chemical synthesis.
Methods of amplifying DNA/RNA using PCR techniques are preferred for obtaining the polynucleotides of the invention. In particular, when it is difficult to obtain full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. Primers for PCR can be appropriately selected according to the sequence information of the invention disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.
The method of the invention
The invention also provides a method for single base site-directed editing of genes, which comprises the following steps:
(i) Providing a cell and a first vector and a second vector, wherein the first vector comprises an expression cassette for a gene editing enzyme according to the second aspect of the invention and the second vector comprises an expression cassette for expression of sgRNA;
(ii) Infecting said cells with said first and second vectors, thereby performing single base site-directed editing within said cells.
In another preferred embodiment, the first vector comprises a first nucleic acid construct having the structure of formula II from 5' to 3':
P1-X1-L4-X2 (II)
wherein,
P1 is a first promoter sequence;
X1 is a nucleotide sequence encoding a gene editing enzyme according to the second aspect of the invention;
L4 is absent or is a linker sequence;
X2 is a polyA sequence;
and each "-" is independently a bond or a nucleotide linking sequence.
The first promoter is selected from the group consisting of: CMV promoter, CAG promoter, PGK promoter, EF1α promoter, EFS promoter, or a combination thereof. In a preferred embodiment, the first promoter sequence is a CMV promoter.
In one embodiment of the invention, the length of the linker sequence is 30-120nt, preferably 48-96nt, and preferably a multiple of 3.
In the method, the first vector and the second vector may be the same or different. In a preferred embodiment, the first vector and the second vector are the same vector.
Preferably, the first vector and/or the second vector further comprises an expression cassette for expressing a selectable marker. The selectable marker is selected from the group consisting of: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
In one embodiment of the invention, the method is non-diagnostic and non-therapeutic.
In the method of the invention, the cells are from the following species: human, non-human mammal, poultry, plant, or microorganism. Wherein the non-human mammal comprises rodent (such as mouse, rat, rabbit), cow, pig, sheep, horse, dog, cat, and non-human primate (such as monkey).
In one embodiment of the invention, the cell is selected from the group consisting of: somatic cells, stem cells, germ cells, non-dividing cells, or a combination thereof. Preferably, the cells are selected from the group consisting of: kidney cells, epithelial cells, endothelial cells, neural cells, or combinations thereof.
In the invention, when the method is used for gene editing, the editing window spans bases 4 to 7 of the 20-base sequence targeted by the sgRNA; the editing efficiency is highest at base 5 and decreases markedly toward both sides. The editing window of the non-mutated ABE7.10 editing system is wider than that of the method: it spans bases 3 to 9, with the highest editing efficiency at base 5 and a gradual decrease toward both sides.
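The practical effect of a narrower editing window can be illustrated by listing which target bases of a protospacer fall inside each window (positions 4 to 7 versus positions 3 to 9 of the 20-base target sequence). The following is a minimal sketch under stated assumptions: positions are counted from 1 at the 5' end of the 20-base target sequence, the protospacer shown is a made-up example rather than any site used in the examples, and the function name `bases_in_window` is hypothetical.

```python
def bases_in_window(protospacer: str, target_base: str, window: tuple) -> list:
    """Return 1-based positions of target_base that lie inside the editing window.

    Assumption: position 1 is the 5' end of the 20-base sgRNA target sequence.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [i for i, base in enumerate(protospacer.upper(), start=1)
            if base == target_base and start <= i <= end]


# Hypothetical 20-base target sequence used only for illustration.
example = "ACCTCGCACATGGTCAGTCA"

narrow = bases_in_window(example, "C", (4, 7))   # window described for the method
wide = bases_in_window(example, "C", (3, 9))     # window described for the non-mutated system

print("C positions in window 4-7:", narrow)
print("C positions in window 3-9:", wide)
```

For this made-up sequence the wider window exposes two additional cytosines to editing, which is the kind of bystander editing a narrower window is meant to avoid.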
Gene editing enzyme based on adenine base editor ABE
The present inventors have also developed a novel gene editing enzyme based on the adenine base editor ABE (e.g., ABE7.10 F148A). Specifically, the present inventors mutated amino acid residue F at position 148 to A in both the TadA fragment and the TadA* fragment of the adenine deaminase (TadA-TadA*) associated with the adenine base editor ABE (i.e., TadA F148A-TadA* F148A). The results indicate that the gene editing window is significantly narrowed while effective DNA targeting activity is maintained, so that the accuracy of gene editing can be significantly improved. Experiments have also shown that when the mutant (i.e., TadA F148A-TadA* F148A) is used in a gene editing system, off-target effects on RNA are greatly reduced.
This novel gene editing enzyme is described in detail in the inventors' prior Chinese patent application CN2019102729593 (filed April 4, 2019). The entire contents of that Chinese application are incorporated herein by reference.
The main advantages of the invention include:
1) The editing window of the single-base editing system BE3-hA3A is narrowed, and the accuracy of single-base editing is greatly improved. When the method is used for gene editing, the editing window spans bases 4 to 7 of the 20-base sequence targeted by the sgRNA; the editing efficiency is highest at base 5 and decreases markedly toward both sides. The editing window of the non-mutated BE3-hA3A editing system is wider than that of the method: it spans bases 3 to 9, with the highest editing efficiency at base 5 and a gradual decrease toward both sides.
2) Almost eliminates the point mutation generated by the single base editing system BE3-hA3A on the RNA level, and greatly improves the specificity of the single base editing system ABE.
3) BE3-hA3A R128A and BE3-hA3A Y130F almost fully retain the editing activity of BE3-hA3A and maintain consistent activity at the target editing sites.
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit its scope. Experimental procedures for which specific conditions are not given in the examples below are generally carried out under conventional conditions, such as those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or as recommended by the manufacturer. Percentages and parts are by weight unless otherwise indicated.
Unless otherwise indicated, the materials and reagents used in the examples were all commercially available products.
The gene editing enzyme ABE7.10 F148A (i.e., "TadA F148A-TadA* F148A") can be found in CN2019102729593.
Methods and materials
Transient transfection and sequencing
Plasmids were constructed according to standard protocols using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). 293T cells were seeded in 10 cm dishes and cultured in Dulbecco's modified Eagle medium (DMEM, Thermo Fisher Scientific) supplemented with 10% FBS (Thermo Fisher Scientific) and penicillin/streptomycin at 37 °C with 5% CO2. Cells were transfected with 30 μg of plasmid using Lipofectamine 3000 (Thermo Fisher Scientific). Three days after transfection, cells were digested with 0.05% trypsin (Thermo Fisher Scientific) and prepared for FACS. GFP-positive cells were sorted and stored in DMEM or Trizol (Ambion) for determination of DNA base editing or for RNA-seq. To determine the efficiency of DNA base editing, cells were lysed using a one-step mouse genotyping kit (Vazyme), followed by deep sequencing using Hi-TOM or by Sanger sequencing with quantification using EditR 1.0.8. For RNA-seq, 500,000 cells were collected, and RNA was extracted according to standard protocols and converted to cDNA, which was used for high-throughput RNA-seq.
RNA editing analysis by RNA sequencing
High-throughput mRNA sequencing (RNA-seq) was performed on an Illumina HiSeq at an average coverage of 125×. FastQC (v0.11.3) and Trimmomatic (v0.36) were used for quality control. Qualified reads were mapped to the reference genome (Ensembl GRCh38) using STAR (v2.5.2b) in 2-pass mode, with the parameters implemented by the ENCODE project. The mapped BAM files were then sorted and duplicates were marked using Picard tools (v2.3.0). The refined BAM files were subjected to splitting of reads spanning splice junctions, local realignment, base recalibration and variant calling using the SplitNCigarReads, IndelRealigner, BaseRecalibrator and HaplotypeCaller tools of GATK (v3.5), respectively. To identify high-confidence variants, clusters of at least 5 SNVs within a 35-base window were filtered out, and variants with a base-quality score >25, mapping-quality score >20 and sequencing depth >20 were retained, applying the Fisher Strand (FS > 30.0) and Quality by Depth (QD < 2.0) filters.
Any high-confidence variant found in wild-type 293T cells was considered a SNP and was filtered out of the GFP- and base editor-transfected groups for off-target analysis. The editing rate was calculated as the number of mutated reads divided by the sequencing depth at each site. To analyze the predicted effect of each off-target variant, variants were annotated using the Variant Effect Predictor (VEP, v94) with the GRCh38 database.
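The two steps described above, removing variants that are also present in wild-type cells and computing the editing rate as mutated reads divided by sequencing depth, can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Variant` record, the function names and all example values are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one called RNA variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int   # reads supporting the mutated base
    depth: int       # total sequencing depth at the site


def site_key(v: Variant) -> tuple:
    """Identify a variant by location and base change, ignoring read counts."""
    return (v.chrom, v.pos, v.ref, v.alt)


def remove_wt_snps(sample: list, wild_type: list) -> list:
    """Drop variants that are also found in wild-type cells (treated as SNPs)."""
    wt_sites = {site_key(v) for v in wild_type}
    return [v for v in sample if site_key(v) not in wt_sites]


def edit_rate(v: Variant) -> float:
    """Editing rate = number of mutated reads / sequencing depth at the site."""
    return v.alt_reads / v.depth


# Made-up example data.
wt = [Variant("chr1", 1000, "A", "G", 30, 60)]
transfected = [
    Variant("chr1", 1000, "A", "G", 28, 55),   # also in WT, so treated as a SNP
    Variant("chr2", 2500, "C", "T", 12, 48),   # candidate off-target RNA SNV
]

off_targets = remove_wt_snps(transfected, wt)
for v in off_targets:
    print(v.chrom, v.pos, f"{v.ref}>{v.alt}", f"editing rate = {edit_rate(v):.2f}")
```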
Library construction of single cell full-length RNA-seq
Individual human 293T cells were manually picked after FACS, lysed and cDNA synthesized using the Smart-seq2 protocol. Single cell cDNAs (2, 3) are then amplified and fragmented as described previously. A sequencing library was constructed (New England Biolabs), quality checked and sequenced on an Illumina HiSeq X-Ten platform (Novogene) using paired-end 150-bp reads.
Processing single cell RNA-seq data
The original reads of single cell RNA-seq data were first trimmed and aligned with the GRCh38 human transcriptome (STAR v2.5.2b). After de-duplication, RNA SNV from individual cells was identified using GATK software (v3.5). Those SNVs detected in single cells with DP ≥ 20.0, FS ≥ 30.0 and QD ≥ 2.0 were retained for downstream analysis.
Statistical analysis
All values are shown as mean ± SEM. The unpaired, two-tailed Student's t-test was used for comparisons, with p < 0.05 considered statistically significant.
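For reference, the mean, the SEM and the unpaired Student's t statistic can be computed as sketched below. This is a minimal illustration using only the Python standard library and made-up numbers; it assumes equal variances (pooled-variance Student's t-test) and returns the t statistic and degrees of freedom only, since converting them to a p value requires a t-distribution table or a statistics package. The function names are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Made-up SNV counts for two groups of replicates.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]

m, sem = mean_sem(group_a)
t, df = student_t(group_a, group_b)
print(f"group A: {m:.2f} ± {sem:.2f} (SEM, n={len(group_a)})")
print(f"t = {t:.3f}, df = {df}")
```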
Example 1: off-target RNA SNV detection for various single base editing systems
In this example, to assess off-target effects of gene editing at the RNA level, a CBE, BE3 (APOBEC1-nCas9-UGI), or an ABE, ABE7.10 (TadA-nCas9), was transfected together with GFP into cultured 293T cells, with or without a single guide RNA (sgRNA). After 72 hours of incubation, GFP-expressing cells were collected by FACS and then analyzed by RNA-seq. RNA SNVs were called in each transfected group by comparison with wild-type (WT, untransfected) samples (fig. 1A).
The 9 groups of transfected cells comprised cells expressing GFP, APOBEC1, BE3 without sgRNA, BE3 with the "site 3" sgRNA, BE3 with the "RNF2" sgRNA, TadA-TadA, ABE7.10 without sgRNA, ABE7.10 with the "site 1" sgRNA, and ABE7.10 with the "site 2" sgRNA (fig. 5).
First, high targeting efficiency of DNA editing of BE3 and ABE7.10 in these 293T cells was verified using targeted depth sequencing, and the results are shown in figures 1B to 1E.
Next, RNA-seq (two or three replicates per group) was performed on these samples at an average depth of 125 x. RNA SNV was called from the RNA-seq data separately in each repeat, filtering out those identified in any WT cells.
The results are shown in FIGS. 1F to 1H and FIG. 6. 742 ± 113 (SEM, n=6) RNA SNVs were found in GFP-transfected cells. Surprisingly, a much greater number of RNA SNVs was found in cells expressing APOBEC1, BE3 without sgRNA, or BE3 with the site 3 or RNF2 sgRNA (5- to 40-fold more than in cells expressing GFP alone). Similarly, large numbers of RNA SNVs (5- to 10-fold more) were also found in cells expressing TadA-TadA, ABE7.10 without sgRNA, or ABE7.10 with the site 1 or site 2 sgRNA.
Interestingly, in this example, transfection of APOBEC1 or TadA-TadA alone was found to induce higher numbers of RNA SNVs than the other transfected groups, suggesting that the increase in SNVs in CBE- or ABE-treated cells may be caused by overexpression of the deaminase APOBEC1 or TadA.
Example 2: characterization of off-target RNA SNV
In this example, off-target RNA SNV was characterized for each single base editing system.
The results are shown in fig. 2 and fig. 7-12.
Notably, almost 100% of the RNA SNVs identified in BE3-treated cells were G-to-A or C-to-U mutations, a significantly higher proportion than in GFP-transfected cells (FIGS. 2A and 2C and FIG. 7). This mutation bias is the same as that of APOBEC1 itself, indicating that these mutations are not spontaneous but are induced by BE3 or APOBEC1.
Correspondingly, 95% of ABE7.10-induced mutations were A to G or U to C, consistent with the activity of ABE7.10 (figures 2B and 2C and figure 7).
The results also show that the GFP group displayed some bias toward A-to-G and U-to-C mutations (figure 2C), possibly due to an intrinsic mutational preference.
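The mutation-type percentages discussed above can be derived by tallying the base change of every called SNV. The following is a minimal sketch with made-up data; it assumes that SNVs are reported in the DNA alphabet (so a C-to-U change in RNA appears as C>T), and the function names are hypothetical.

```python
from collections import Counter


def mutation_type_distribution(snvs):
    """Return {'REF>ALT': percentage of all SNVs} for a list of (ref, alt) pairs."""
    counts = Counter(f"{ref}>{alt}" for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}


def fraction_of_types(snvs, expected_types):
    """Fraction of SNVs whose base change belongs to expected_types."""
    if not snvs:
        return 0.0
    hits = sum(1 for ref, alt in snvs if f"{ref}>{alt}" in expected_types)
    return hits / len(snvs)


# Made-up SNV calls (ref, alt); C>T stands for C-to-U at the RNA level.
calls = [("C", "T"), ("G", "A"), ("C", "T"), ("A", "G")]

dist = mutation_type_distribution(calls)
cbe_like = fraction_of_types(calls, {"C>T", "G>A"})   # signature expected for a cytosine deaminase

print(dist)
print(f"C>T or G>A fraction: {cbe_like:.2f}")
```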
Between any two samples of the BE3- or ABE7.10-transfected groups, an overlap of 27.7 ± 3.6% or 51.0 ± 3.3% of SNVs was observed, respectively, and these overlapping SNVs were significantly enriched in highly expressed genes (FIGS. 2D and 8). However, none of the off-target sites overlapped with predicted off-target mutations, and no similarity was observed between off-target and target sequences (fig. 2D and fig. 9).
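A pairwise overlap of this kind can be computed as described for Figure 21, by dividing the number of RNA SNVs shared between two samples by the number of RNA SNVs in the row sample. The following is a minimal sketch with made-up data; identifying each SNV by a (chromosome, position, change) tuple and the function name `overlap_matrix` are assumptions made for illustration.

```python
def overlap_matrix(samples):
    """Return {(row, col): shared SNVs / SNVs in row} for every ordered pair of samples.

    `samples` maps a sample name to a set of SNV identifiers.
    """
    matrix = {}
    for row, row_snvs in samples.items():
        for col, col_snvs in samples.items():
            shared = len(row_snvs & col_snvs)
            matrix[(row, col)] = shared / len(row_snvs) if row_snvs else 0.0
    return matrix


# Made-up SNV sets for two replicates.
samples = {
    "rep1": {("chr1", 100, "C>T"), ("chr1", 250, "C>T"),
             ("chr2", 300, "G>A"), ("chr3", 42, "C>T")},
    "rep2": {("chr2", 300, "G>A"), ("chr3", 42, "C>T"), ("chr5", 7, "G>A")},
}

ratios = overlap_matrix(samples)
for (row, col), value in sorted(ratios.items()):
    print(f"{row} vs {col}: {value:.2f}")
```

Because the denominator is the row sample, the matrix is not symmetric: here rep1 shares half of its SNVs with rep2, while rep2 shares two thirds of its SNVs with rep1.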
Thus, the off-target RNA SNVs induced by CBE and ABE are sgRNA-independent and are caused by overexpression of APOBEC1 and TadA-TadA, respectively.
Interestingly, in this example, ABE7.10 was observed to induce 56 and 12 non-synonymous RNA SNVs in oncogenes and tumor suppressor genes, respectively. Many of these showed editing rates above 40% and were verified by Sanger sequencing, raising concern about the oncogenic risk of DNA base editing (FIG. 2E and FIGS. 10-12).
Example 3: Single-cell RNA SNV analysis of cells transfected with single-base editing systems
In this example, single-cell RNA-seq was performed on four groups of cells (WT, GFP, BE3-site 3 and ABE7.10-site 1) to avoid the loss of random off-target signals caused by population averaging.
The results are shown in FIG. 3 and FIGS. 13-17.
On average, 10,932 RefSeq genes were detected in each single cell from about 6.07 million sequencing reads (FIG. 3B). Cells with high expression levels of the designated deaminase were selected for further analysis (FIG. 13). Severe RNA off-target effects and similar mutation patterns were likewise observed in the cells expressing base editors (FIGS. 3C-3F and FIGS. 14 and 15).
Interestingly, the percentage of off-target sites shared among BE3- or ABE7.10-edited single cells (4.5 ± 1.0%) was much lower than that for the cell populations (40.8 ± 3.7%), indicating that BE3- or ABE7.10-induced off-target SNVs are essentially random and sgRNA-independent (FIGS. 3G and 3H). Notably, the editing rates of non-synonymous mutations detected in some oncogenes and tumor suppressor genes were higher in single cells than those observed in cell populations (FIGS. 3I, 16 and 17).
Example 4: Elimination of RNA off-target activity by rational design of the deaminase
In this example, to further explore experimental approaches that may eliminate the RNA off-target activity of base editors, the inventors studied the potential effect of disrupting RNA binding by APOBEC1 and TadA.
Specifically, it was tested whether replacing APOBEC1 with hA3A could eliminate the RNA off-target activity of BE3 (see FIG. 4A).
The results are shown in FIG. 4 and FIGS. 18-22.
Indeed, 293T cells transfected with BE3 (hA3A) showed significantly fewer off-target RNA SNVs than cells transfected with BE3 (APOBEC1), while maintaining high on-target DNA editing efficiency (FIGS. 4B and 4C, FIG. 18).
In another approach, the point mutation W90A was introduced into the predicted RNA-binding domain of APOBEC1. Although BE3 (W90A) eliminated the RNA off-target effect, its on-target DNA editing activity was essentially abolished (FIGS. 4B and 4C, FIG. 18). The point mutations Y130F and R128A were introduced into the predicted RNA-binding domain of APOBEC3A; BE3-hA3A (Y130F) and BE3-hA3A (R128A) eliminated the RNA off-target effect, while their on-target DNA editing activity remained essentially unchanged (FIGS. 4B and 4C, FIG. 18).
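The relationship between the wild-type APOBEC3A sequence (SEQ ID NO: 1) and the R128A and Y130F variants in the sequence listing can be illustrated with a short sketch. The residues below are positions 113 to 144 copied from SEQ ID NO: 1; the helper `apply_substitution` and its `first_position` parameter are hypothetical names introduced only for this illustration.

```python
from typing import List

# Three-letter codes for the one-letter residues used in this example only.
ONE_TO_THREE = {"A": "Ala", "F": "Phe", "R": "Arg", "Y": "Tyr"}


def apply_substitution(residues: List[str], mutation: str,
                       first_position: int = 1) -> List[str]:
    """Apply a substitution such as 'R128A' to a list of three-letter residues.

    first_position is the sequence position of residues[0].
    """
    old = ONE_TO_THREE[mutation[0]]
    new = ONE_TO_THREE[mutation[-1]]
    index = int(mutation[1:-1]) - first_position
    if not 0 <= index < len(residues):
        raise ValueError(f"{mutation}: position outside the supplied segment")
    if residues[index] != old:
        raise ValueError(f"{mutation}: found {residues[index]}, expected {old}")
    mutated = list(residues)  # leave the input untouched
    mutated[index] = new
    return mutated


# Residues 113-144 of SEQ ID NO: 1, as printed in the sequence listing.
segment = (
    "Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg "
    "Ile Tyr Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg"
).split()

r128a = apply_substitution(segment, "R128A", first_position=113)
y130f = apply_substitution(segment, "Y130F", first_position=113)
double = apply_substitution(r128a, "Y130F", first_position=113)

# Positions 128-130 of each variant.
print(segment[15:18], r128a[15:18], y130f[15:18], double[15:18])
```

The three variant segments correspond to positions 128 to 130 of SEQ ID NO: 2 (R128A), SEQ ID NO: 3 (Y130F) and SEQ ID NO: 4 (R128A and Y130F), respectively. The check on the original residue guards against off-by-one numbering errors.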
In this example, to modify ABE, the inventors introduced D53G or F148A into both TadA domains of ABE7.10 (FIG. 4A).
Interestingly, ABE7.10 (D53G) and ABE7.10 (F148A) were found to maintain high on-target DNA editing efficiency, and ABE7.10 (F148A) showed no RNA off-target effect at all (FIGS. 4D and 4E and FIG. 18). In addition, the number of SNVs remaining in cells transfected with ABE7.10 (F148A) was similar to that in cells transfected with GFP alone (FIGS. 19 to 21). It was further confirmed in this example that ABE7.10 (F148A) performed similarly to ABE7.10 at four additional sites (see FIG. 4F).
It is particularly notable that, in this example, the editing window of the mutant BE3-hA3A (R128A or Y130F) was significantly narrowed (FIGS. 4B and 4C), indicating improved precision of DNA base editing.
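The practical effect of a narrower editing window can be illustrated by listing which cytosines of a target fall inside it. The window bounds used below (positions 4 to 7 for the mutants, positions 3 to 9 for non-mutated BE3-hA3A) are those recited in claim 39. The protospacer sequence is hypothetical, and counting positions from the 5' end of the 20-base target is an assumption, since the text does not state the numbering direction.

```python
from typing import List, Tuple


def cytosines_in_window(protospacer: str, window: Tuple[int, int]) -> List[int]:
    """Return 1-based positions of C that fall inside the inclusive window."""
    seq = protospacer.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20-base target sequence")
    start, end = window
    return [pos for pos in range(start, end + 1) if seq[pos - 1] == "C"]


# Hypothetical 20-base target, for illustration only.
protospacer = "GACCTCAGCATCGGATCAGT"

narrow = cytosines_in_window(protospacer, (4, 7))  # mutant BE3-hA3A window
wide = cytosines_in_window(protospacer, (3, 9))    # non-mutated BE3-hA3A window
print(narrow, wide)
```

For this hypothetical target, the wider window exposes two additional cytosines to possible bystander editing, which is the sense in which a narrower window improves precision.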
Thus, the engineered BE3-hA3A (R128A) and BE3-hA3A (Y130F) of the present application have broad application prospects.
All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Sequence listing
<110> Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
<120> Novel single-base editing technology and application thereof
<130> P2019-0927
<160> 11
<170> SIPOSequenceListing 1.0
<210> 1
<211> 203
<212> PRT
<213> Homo sapiens (Homo sapiens)
<400> 1
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu
195 200
<210> 2
<211> 203
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 2
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Ala
115 120 125
Ile Tyr Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu
195 200
<210> 3
<211> 203
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 3
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Phe Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu
195 200
<210> 4
<211> 203
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 4
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Ala
115 120 125
Ile Phe Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu
195 200
<210> 5
<211> 12
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 5
Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
1 5 10
<210> 6
<211> 4
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 6
Ser Gly Gly Ser
1
<210> 7
<211> 4
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 7
Ser Gly Gly Ser
1
<210> 8
<211> 1367
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 8
Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser
1010 1015 1020
Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
1025 1030 1035 1040
Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1045 1050 1055
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val
1060 1065 1070
Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met
1075 1080 1085
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe
1090 1095 1100
Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala
1105 1110 1115 1120
Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1125 1130 1135
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys
1140 1145 1150
Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met
1155 1160 1165
Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys
1170 1175 1180
Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr
1185 1190 1195 1200
Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala
1205 1210 1215
Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr
1250 1255 1260
Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
1265 1270 1275 1280
Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1285 1290 1295
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe
1300 1305 1310
Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
1315 1320 1325
Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala
1330 1335 1340
Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp
1345 1350 1355 1360
Leu Ser Gln Leu Gly Gly Asp
1365
<210> 9
<211> 7
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 9
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 10
<211> 1680
<212> PRT
<213> Artificial sequence (Artificial sequence)
<400> 10
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Ala
115 120 125
Ile Tyr Tyr Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala
210 215 220
Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys
225 230 235 240
Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser
245 250 255
Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr
260 265 270
Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg
275 280 285
Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met
290 295 300
Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu
305 310 315 320
Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile
325 330 335
Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu
340 345 350
Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile
355 360 365
Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile
370 375 380
Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile
385 390 395 400
Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn
405 410 415
Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys
420 425 430
Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys
435 440 445
Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro
450 455 460
Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu
465 470 475 480
Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile
485 490 495
Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp
500 505 510
Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys
515 520 525
Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln
530 535 540
Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys
545 550 555 560
Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr
565 570 575
Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro
580 585 590
Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn
595 600 605
Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile
610 615 620
Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln
625 630 635 640
Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys
645 650 655
Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly
660 665 670
Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr
675 680 685
Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser
690 695 700
Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys
705 710 715 720
Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn
725 730 735
Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala
740 745 750
Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys
755 760 765
Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys
770 775 780
Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg
785 790 795 800
Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys
805 810 815
Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp
820 825 830
Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu
835 840 845
Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln
850 855 860
Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu
865 870 875 880
Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe
885 890 895
Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His
900 905 910
Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser
915 920 925
Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser
930 935 940
Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu
945 950 955 960
Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu
965 970 975
Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg
980 985 990
Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln
995 1000 1005
Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys
1010 1015 1020
Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln
1025 1030 1035 1040
Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val
1045 1050 1055
Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr
1060 1065 1070
Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu
1075 1080 1085
Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys
1090 1095 1100
Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly
1105 1110 1115 1120
Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val
1125 1130 1135
Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg
1140 1145 1150
Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys
1155 1160 1165
Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe
1170 1175 1180
Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp
1185 1190 1195 1200
Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val
1220 1225 1230
Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala
1235 1240 1245
Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile
1250 1255 1260
Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn
1265 1270 1275 1280
Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr
1285 1290 1295
Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1300 1305 1310
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1315 1320 1325
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys
1330 1335 1340
Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val
1345 1350 1355 1360
Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1365 1370 1375
Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro
1380 1385 1390
Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu
1395 1400 1405
Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg
1410 1415 1420
Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu
1425 1430 1435 1440
Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe
1460 1465 1470
Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser
1475 1480 1485
Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val
1490 1495 1500
Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
1505 1510 1515 1520
Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala
1525 1530 1535
Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1540 1545 1550
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1555 1560 1565
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly
1570 1575 1580
Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln
1585 1590 1595 1600
Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
1605 1610 1615
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
1620 1625 1630
Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro
1635 1640 1645
Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn
1650 1655 1660
Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1665 1670 1675 1680
<210> 11
<211> 83
<212> PRT
<213> Homo sapiens (Homo sapiens)
<400> 11
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1 5 10 15
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
20 25 30
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
35 40 45
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
50 55 60
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
65 70 75 80
Lys Met Leu
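The component lengths given in the <211> fields of the sequence listing can be cross-checked against the 1680-residue fusion protein of SEQ ID NO: 10. In the sketch below, the assignment of SEQ ID NOs to Z1, L1, Z2, L2, Z3, L3 and Z4 follows the claims; the order of the components, in particular the placement of L3 between Z3 and Z4, is an assumption inferred from reading SEQ ID NO: 10.

```python
from typing import List, Tuple

# (label, length in residues) from the <211> fields of the sequence listing.
# Order is an assumption inferred from SEQ ID NO: 10.
COMPONENTS: List[Tuple[str, int]] = [
    ("Z1 APOBEC3A R128A (SEQ ID NO: 2)", 203),
    ("L1 linker (SEQ ID NO: 5)", 12),
    ("Z2 Cas9 (SEQ ID NO: 8)", 1367),
    ("L2 linker (SEQ ID NO: 6)", 4),
    ("Z3 UGI (SEQ ID NO: 11)", 83),
    ("L3 linker (SEQ ID NO: 7)", 4),
    ("Z4 NLS (SEQ ID NO: 9)", 7),
]


def layout(components: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (label, first, last) with 1-based inclusive coordinates."""
    spans = []
    start = 1
    for label, length in components:
        end = start + length - 1
        spans.append((label, start, end))
        start = end + 1
    return spans


spans = layout(COMPONENTS)
total_length = spans[-1][2]

for label, first, last in spans:
    print(f"{label:<34} {first:>5}-{last}")
print("total:", total_length)
```

The total equals the length stated for SEQ ID NO: 10, and the computed boundaries (for example, Cas9 at residues 216 to 1582) can be compared directly with the numbered rows of that sequence.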

Claims (41)

1. A gene editing enzyme, characterized in that the structure of the gene editing enzyme is as shown in formula I:
Z1-L1-Z2-L2-Z3-L3-Z4 (I)
wherein,
Z1 is the amino acid sequence of a mutant protein of cytosine deaminase APOBEC3A, wherein the mutant protein is a non-natural protein in which the arginine (R) at position 128 of cytosine deaminase APOBEC3A is mutated to alanine (A), and wherein position 128 corresponds to position 128 of the sequence shown in SEQ ID NO. 1;
Z2 is the amino acid sequence of nuclease Cas9;
Z3 is the amino acid sequence of a uracil DNA glycosylase inhibitor;
L1, L2 and L3 are each independently an optional linker peptide sequence;
Z4 is absent or is a nuclear localization signal element;
and each "-" is independently a peptide bond.
2. The gene editing enzyme of claim 1, wherein the cytosine deaminase APOBEC3A is derived from human (Homo sapiens).
3. The gene editing enzyme of claim 1, wherein the mutant protein has an activity of catalyzing the hydrolytic deamination of cytosine to uracil.
4. The gene editing enzyme according to claim 1, wherein the mutant protein is cytosine deaminase APOBEC3A with an R128A mutation, and the amino acid sequence of the mutant protein is shown in SEQ ID No. 2.
5. The gene editing enzyme according to claim 1, wherein the mutant protein is cytosine deaminase APOBEC3A with R128A and Y130F mutations, and the amino acid sequence of the mutant protein is shown in SEQ ID No. 4.
6. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L1 is shown in SEQ ID No. 5.
7. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L1 is identical to the amino acid sequence shown in SEQ ID No. 5.
8. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L2 is shown in SEQ ID No. 6.
9. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L2 is identical to the amino acid sequence shown in SEQ ID No. 6.
10. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L3 is shown in SEQ ID No. 7.
11. The gene editing enzyme according to claim 1, wherein the amino acid sequence of L3 is identical to the amino acid sequence shown in SEQ ID No. 7.
12. The gene editing enzyme of claim 1, wherein in Z2, the source of nuclease Cas9 is selected from the group consisting of: Streptococcus pyogenes or Staphylococcus aureus.
13. The gene editing enzyme of claim 1, wherein in Z2, the nuclease Cas9 is replaced with a Cpf1 nuclease, the source of the Cpf1 nuclease being selected from the group consisting of: Acidaminococcus or Lachnospiraceae.
14. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z2 is shown in SEQ ID No. 8.
15. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z2 is identical to the amino acid sequence shown in SEQ ID No. 8.
16. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z3 is shown in SEQ ID NO. 11.
17. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z3 is identical to the amino acid sequence shown in SEQ ID No. 11.
18. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z4 is shown in SEQ ID No. 9.
19. The gene editing enzyme according to claim 1, wherein the amino acid sequence of Z4 is identical to the amino acid sequence shown in SEQ ID No. 9.
20. The gene editing enzyme according to claim 1, wherein the amino acid sequence of the gene editing enzyme is shown in SEQ ID NO. 10.
21. A polynucleotide encoding a gene editing enzyme according to any one of claims 1 to 20.
22. The polynucleotide of claim 21, wherein said polynucleotide is selected from the group consisting of:
(a) A polynucleotide encoding an amino acid sequence shown in SEQ ID NO. 10;
(b) A polynucleotide complementary to the polynucleotide of (a).
23. The polynucleotide of claim 21, wherein the ORF encoding the gene editing enzyme additionally comprises an auxiliary element selected from the group consisting of: a signal peptide, a secretory peptide, a tag sequence, or a combination thereof.
24. The polynucleotide of claim 23, wherein said signal peptide is a nuclear localization sequence.
25. The polynucleotide of claim 21, wherein said polynucleotide is selected from the group consisting of: a DNA sequence, an RNA sequence, or a combination thereof.
26. A vector comprising the polynucleotide of any one of claims 22-25.
27. The vector of claim 26, wherein the vector is an expression vector, a shuttle vector, or an integration vector.
28. A host cell comprising the vector of claim 26 or 27, or having incorporated into its genome the polynucleotide of any one of claims 22-25, wherein the host cell is not a human germ cell or stem cell or plant cell.
29. The host cell of claim 28, wherein the host is a prokaryotic cell or a eukaryotic cell.
30. A non-diagnostic and non-therapeutic method for single base site-directed editing of a gene comprising the steps of:
(i) Providing a cell and a first vector and a second vector, wherein the first vector comprises the gene-editing enzyme expression cassette of any one of claims 1-20 and the second vector comprises an sgRNA expression cassette;
(ii) Infecting said cells with said first and second vectors, thereby performing single base site-directed editing within said cells,
wherein the cells are not human germ cells or stem cells or plant cells.
31. The method of claim 30, wherein the first vector comprises a first nucleotide construct having a 5 'to 3' structure of formula II:
P1-X1-L4-X2 (II)
wherein P1 is a first promoter sequence;
X1 is a nucleotide sequence encoding the gene editing enzyme of any one of claims 1-20;
L4 is absent or is a linker sequence;
X2 is a polyA sequence;
and each "-" is independently a bond or a nucleotide linker sequence.
32. The method of claim 31, wherein said first promoter is selected from the group consisting of: CMV promoter, CAG promoter, PGK promoter, EF1α promoter, EFS promoter, or a combination thereof.
33. The method of claim 31, wherein the linker sequence has a length of 30-120 nt.
34. The method of claim 30, wherein the first vector and the second vector are the same or different.
35. The method of claim 30, wherein the first vector and the second vector are the same vector.
36. The method of claim 30, wherein the first vector and/or the second vector further comprises an expression cassette for expressing a selectable marker.
37. The method of claim 36, wherein the selectable marker is selected from the group consisting of: green fluorescent protein, yellow fluorescent protein, red fluorescent protein, blue fluorescent protein, or a combination thereof.
38. The method of claim 30, wherein the cells are from the following species: human, non-human mammal, poultry, or microorganism.
39. The method of claim 30, wherein, when gene editing is performed by the method, the editing window spans positions 4 to 7 of the 20-base sequence targeted by the sgRNA, with the highest editing efficiency at position 5 and markedly reduced efficiency toward both sides; whereas the editing window of the non-mutated BE3-hA3A editing system is wider than that of the method, spanning positions 3 to 9, with the highest editing efficiency at position 5 and gradually decreasing efficiency toward both sides.
40. A kit, comprising:
(a1) A first container, and a first vector in the first container, the first vector comprising the gene-editing enzyme expression cassette of any one of claims 1-20.
41. The kit of claim 40, further comprising:
(a2) A second container, and a second vector in the second container, the second vector comprising an expression cassette for expressing sgRNA.
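The editing windows recited in claim 39 can be illustrated with a minimal sketch. It lists which cytosines of a 20-base sgRNA target fall inside the narrower window (bases 4 to 7) and inside the wider window of the non-mutated BE3-hA3A system (bases 3 to 9). The function name `cytosines_in_window` and the target sequence are hypothetical, and the sketch assumes that positions are counted from the 5' end of the target, which the claim does not state.

```python
def cytosines_in_window(protospacer, first, last):
    """Return the 1-based positions of C bases within [first, last].

    Assumption: position 1 is the 5' end of the 20-base sgRNA target.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-base target sequence")
    seq = protospacer.upper()
    return [pos for pos in range(first, last + 1) if seq[pos - 1] == "C"]


# Hypothetical 20-base target, used only for illustration.
target = "GACCTCAGCTCCAGGATCCA"

# Window recited for the claimed method (bases 4-7).
narrow = cytosines_in_window(target, 4, 7)

# Wider window recited for the non-mutated BE3-hA3A system (bases 3-9).
wide = cytosines_in_window(target, 3, 9)

print("C positions in window 4-7:", narrow)
print("C positions in window 3-9:", wide)
```

For this example sequence, the narrower window contains fewer candidate cytosines than the wider one, which is the practical consequence of the window difference described in the claim. The sketch says nothing about actual editing efficiency at each position.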
CN201910493592.8A 2019-06-06 2019-06-06 Novel single-base editing technology and application thereof Active CN112048497B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910493592.8A CN112048497B (en) 2019-06-06 2019-06-06 Novel single-base editing technology and application thereof
PCT/CN2019/111770 WO2020244122A1 (en) 2019-06-06 2019-10-17 New-type single-base editing technique and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910493592.8A CN112048497B (en) 2019-06-06 2019-06-06 Novel single-base editing technology and application thereof

Publications (2)

Publication Number Publication Date
CN112048497A CN112048497A (en) 2020-12-08
CN112048497B true CN112048497B (en) 2023-11-03

Family

ID=73608634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910493592.8A Active CN112048497B (en) 2019-06-06 2019-06-06 Novel single-base editing technology and application thereof

Country Status (2)

Country Link
CN (1) CN112048497B (en)
WO (1) WO2020244122A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113073094B (en) * 2021-03-29 2023-03-28 中山大学 Single base mutation system based on cytidine deaminase LjCDA1L1_4a and mutants thereof
CN115261363B (en) * 2021-04-29 2024-01-30 中国科学院分子植物科学卓越创新中心 Method for measuring RNA deaminase activity of APOBEC3A and RNA high-activity APOBEC3A variant
CN115678872A (en) * 2021-06-04 2023-02-03 中国科学院脑科学与智能技术卓越创新中心 Novel Cas13 protein and screening method and application thereof
CN117587064A (en) * 2021-10-29 2024-02-23 中国种子集团有限公司 Method for improving amylose content of rice by mutating OsWaxy gene by single base gene editing technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108513575A (en) * 2015-10-23 2018-09-07 哈佛大学的校长及成员们 Nucleobase editing machine and application thereof
CN108822217A (en) * 2018-02-23 2018-11-16 上海科技大学 A kind of gene base editing machine
WO2018218188A2 (en) * 2017-05-25 2018-11-29 The General Hospital Corporation Base editors with improved precision and specificity
WO2019042284A1 (en) * 2017-09-01 2019-03-07 Shanghaitech University Fusion proteins for improved precision in base editing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019005886A1 (en) * 2017-06-26 2019-01-03 The Broad Institute, Inc. Crispr/cas-cytidine deaminase based compositions, systems, and methods for targeted nucleic acid editing

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108513575A (en) * 2015-10-23 2018-09-07 哈佛大学的校长及成员们 Nucleobase editing machine and application thereof
WO2018218188A2 (en) * 2017-05-25 2018-11-29 The General Hospital Corporation Base editors with improved precision and specificity
WO2018218166A1 (en) * 2017-05-25 2018-11-29 The General Hospital Corporation Using split deaminases to limit unwanted off-target base editor deamination
WO2019042284A1 (en) * 2017-09-01 2019-03-07 Shanghaitech University Fusion proteins for improved precision in base editing
CN108822217A (en) * 2018-02-23 2018-11-16 上海科技大学 A kind of gene base editing machine
CN109021111A (en) * 2018-02-23 2018-12-18 上海科技大学 A kind of gene base editing machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LOGUE, E.C. et al. A DNA Sequence Recognition Loop on APOBEC3A Controls Substrate Specificity. PLOS ONE. 2014, Vol. 9 (No. 5), e97062. *

Also Published As

Publication number Publication date
WO2020244122A1 (en) 2020-12-10
CN112048497A (en) 2020-12-08

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20210623

Address after: Room 1002, Unit 1, Building 7, 160 Basheng Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Applicant after: Huida (Shanghai) Biotechnology Co.,Ltd.

Address before: 200031 No. 320, Yueyang Road, Shanghai, Xuhui District

Applicant before: Center for excellence and innovation of brain science and intelligent technology, Chinese Academy of Sciences

SE01 Entry into force of request for substantive examination
GR01 Patent grant