CN115109798A - Improved CG base editing system - Google Patents


Publication number
CN115109798A
CN115109798A CN202210225327.3A CN202210225327A CN115109798A CN 115109798 A CN115109798 A CN 115109798A CN 202210225327 A CN202210225327 A CN 202210225327A CN 115109798 A CN115109798 A CN 115109798A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210225327.3A
Other languages
Chinese (zh)
Inventor
高彩霞
王升星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qihe Biotechnology Co ltd
Original Assignee
Shanghai Blue Cross Medical Science Research Institute
Application filed by Shanghai Blue Cross Medical Science Research Institute

Classifications

    • C12N15/8218: Antisense, co-suppression, viral induced gene silencing [VIGS], post-transcriptional induced gene silencing [PTGS]
    • A01H5/00: Angiosperms, i.e. flowering plants, characterised by their plant parts; Angiosperms characterised otherwise than by their botanic taxonomy
    • A01H6/46: Gramineae or Poaceae, e.g. ryegrass, rice, wheat or maize
    • C07K14/47: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K19/00: Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102: Mutagenizing nucleic acids
    • C12N15/113: Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/52: Genes encoding for enzymes or proenzymes
    • C12N15/62: DNA sequences coding for fusion proteins
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N5/10: Cells modified by introduction of foreign genetic material
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/22: Ribonucleases RNAses, DNAses
    • C12N9/2497: Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N-glycosyl compounds (3.2.2)
    • C12N9/78: Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C12N9/93: Ligases (6)
    • C12Y302/02027: Uracil-DNA glycosylase (3.2.2.27)
    • C12Y305/04001: Cytosine deaminase (3.5.4.1)
    • C12Y603/02019: Ubiquitin-protein ligase (6.3.2.19), i.e. ubiquitin-conjugating enzyme
    • C07K2319/95: Fusion polypeptide containing a motif/fusion for degradation (ubiquitin fusions, PEST sequence)
    • C12N2310/20: Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Botany (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Physiology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Developmental Biology & Embryology (AREA)
  • Environmental Sciences (AREA)
  • Virology (AREA)
  • Toxicology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Natural Medicines & Medicinal Plants (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

The present invention relates to the field of genetic engineering. In particular, the present invention relates to an improved CG base editing system that enables efficient, accurate in vivo C to G base editing.

Description

Improved CG base editing system
Technical Field
The present invention relates to the field of genetic engineering. In particular, the present invention relates to an improved CG base editing system that enables efficient, accurate in vivo C to G base editing.
Background
In recent years, as genome editing technology has developed, a large number of base editors have been developed, improved and applied. These include the cytosine base editor (CBE), composed of Cas9 nickase (nCas9(D10A)) fused to a cytosine deaminase and a uracil glycosylase inhibitor (UGI), and the adenine base editor (ABE), composed of nCas9 fused to an adenosine deaminase. These editors mediate precise C·G to T·A and A·T to G·C base substitutions, respectively, at targeted sites in animal and plant genomes (Komor et al., 2016; Gaudelli et al., 2017; Zong et al., 2018; Li et al., 2018). In 2019, Anzalone et al. constructed the prime editing (PE) system by fusing nCas9(H840A) with M-MLV reverse transcriptase, and successfully achieved substitution between any types of bases in the genome under the guidance of a pegRNA. However, the editing efficiency of this system is limited by multiple factors, such as the target sequence, the length of the primer binding site (PBS) and the melting temperature (Tm) (Lin et al., 2020). In addition, when CBE and the ABE8e system (Richter et al., 2020) are combined with the Cas9 variants SpG and SpRY (Walton et al., 2020), efficient C·G to T·A and A·T to G·C base editing can be achieved at any target site, with an efficiency far higher than that of the PE system, but these two systems cannot achieve substitutions between other types of bases. A new base editing system is therefore still needed to achieve the other types of base substitution. Zhao et al. (2020) and Kurt et al. (2020) fused nCas9(D10A) with APOBEC1 cytosine deaminase and uracil-DNA glycosylase (UDG) to construct CG base editors and successfully achieved C-to-G transversion in mammalian cells, but these editors also introduced large amounts of byproducts, such as C-to-T and C-to-A substitutions and indels.
Furthermore, because the DNA damage repair pathways of animals and plants differ, a CG base editing system is more prone to generating indel mutations in plants, and no corresponding CG base editing system has so far been established in plants. It is therefore necessary to establish a CG base editing system in plants to expand the range of single-base editing, and to further optimize it so that it mediates C-to-G base transversion in animal and plant genomes more efficiently and accurately.
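By way of illustration only, the distinction drawn above between the C-to-G transversion and the C-to-T and C-to-A byproducts can be sketched in a few lines of Python. The function name `substitution_class` is hypothetical and does not appear in the application; the sketch relies only on the standard convention that a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any other substitution is a transversion.

```python
# Illustrative sketch only; the names here are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for alt in "GTA":
    print(f"C-to-{alt}: {substitution_class('C', alt)}")
```

Under this convention the desired C-to-G edit and the C-to-A byproduct are both transversions, whereas the C-to-T byproduct is a transition.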
Brief Description of Drawings
FIG. 1 shows the DNA damage arising from cytosine deamination and its potential repair pathways.
FIG. 2 shows vector construction maps of A3A-PBE and PCGBE-1 (Plant C to G Base Editing-1).
FIG. 3 shows the types and efficiencies of mutations mediated by A3A-PBE and PCGBE-1 in rice protoplasts.
FIG. 4 shows the types and efficiencies of mutations mediated by A3A-PBE and PCGBE-1 at the OsNRT1.1B, OsPDS and OsGRF1 targets in rice callus.
FIG. 5 shows the types and efficiencies of mutations mediated by A3A-PBE and PCGBE-1 at the OsAAT and OsSWEET14 targets in rice callus.
FIG. 6 shows vector construction maps of the optimized PCGBE-2 to PCGBE-6.
FIG. 7 shows the types and efficiencies of mutations mediated by PCGBE-1 to PCGBE-6 at the OsNRT1.1B target in rice callus.
FIG. 8 shows the types and efficiencies of mutations mediated by PCGBE-1 to PCGBE-6 at the OsPDS target in rice callus.
FIG. 9 shows the types and efficiencies of mutations mediated by PCGBE-1 to PCGBE-6 at the OsGRF1 target in rice callus.
FIG. 10 shows the differences in C-to-G editing efficiency and editing purity among PCGBE-1 to PCGBE-6.
FIG. 11 shows the construction of additional PCGBE binary vectors.
FIG. 12 shows the types and efficiencies of mutations mediated in regenerated rice plants by the two PCGBE systems shown in FIG. 11b and FIG. 11c.
Detailed Description
I. Definitions
In the present invention, unless otherwise specified, scientific and technical terms used herein have the meanings that are commonly understood by those skilled in the art. Also, the terms relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology, and the laboratory procedures used herein, are terms and conventional procedures widely used in the relevant art. For example, standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in the following reference: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter referred to as "Sambrook"). Meanwhile, in order to better understand the present invention, the definitions and explanations of related terms are provided below.
As used herein, the term "and/or" encompasses all combinations of items linked by the term, as if each combination had been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
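For illustration only, the combinations covered by "and/or" are the non-empty subsets of the linked items, which can be enumerated as in the following Python sketch. The helper name `and_or` is hypothetical and is introduced purely for this example.

```python
from itertools import combinations


def and_or(items):
    """List every non-empty combination of the linked items, smallest first."""
    return [
        " and ".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(and_or(["A", "B"]))
print(and_or(["A", "B", "C"]))
```

For n linked items the sketch yields 2^n - 1 combinations, that is, 3 for "A and/or B" and 7 for "A, B and/or C", in agreement with the combinations listed above.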
"genome" as used herein encompasses not only chromosomal DNA present in the nucleus of a cell, but organelle DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
As used herein, "organism" includes any organism suitable for genome editing, preferably a eukaryote. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like.
By "genetically modified organism" or "genetically modified cell" is meant an organism or cell that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, an exogenous polynucleotide can be stably integrated into the genome of an organism or cell and be inherited by successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. The modified gene or expression regulatory sequence is one which comprises single or multiple deoxynucleotide substitutions, deletions and additions in the genome of the organism or cell.
"Exogenous" with respect to a sequence means a sequence from a foreign species, or if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and refer to single- or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter designations as follows: "A" represents adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents purine (A or G), "Y" represents pyrimidine (C or T), "K" represents G or T, "H" represents A or C or T, "I" represents inosine, and "N" represents any nucleotide.
"polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
Sequence "identity" has an art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along regions of the molecule (see, e.g., Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Although there are many methods for measuring identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
When the term "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of the sequence or may have additional amino acids or nucleotides at one or both ends, while still possessing the activity described herein. Furthermore, it is clear to the skilled person that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in certain practical cases (e.g., during expression in a particular expression system) without substantially affecting the function of the polypeptide. Thus, where a particular polypeptide amino acid sequence described in the specification and claims of this application does not contain a methionine encoded by the start codon at its N-terminus, the sequence containing the methionine is also encompassed herein, and accordingly, the encoding nucleotide sequence may also contain the start codon; and vice versa.
Suitable conservative amino acid substitutions in peptides or proteins are known to those skilled in the art and can generally be made without altering the biological activity of the resulting molecule. In general, one of skill in the art recognizes that single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter biological activity (see, e.g., Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).
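For illustration, one simple way of expressing identity as a percentage is sketched below. It assumes that the two sequences have already been aligned to the same length with "-" marking gaps, and it counts identical residues over all alignment columns. This is only one of several conventions in use, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned positions to compare")
    # A gap opposite a residue counts as a mismatch under this convention.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT", "ACGA"))
print(percent_identity("AC-T", "ACGT"))
```

Other conventions divide by the length of the shorter sequence or exclude gapped columns altogether, so a reported identity value is only meaningful together with the alignment method and the denominator used.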
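The treatment of the N-terminal methionine described above can be sketched, purely as an illustration, as a comparison that regards two polypeptide sequences as equivalent when they are identical or differ only by a single leading "M". The function name and the example sequences are hypothetical and are not taken from the sequence listing.

```python
def same_apart_from_initiator_met(seq_a: str, seq_b: str) -> bool:
    """True if the sequences are identical or differ only by one leading 'M'."""
    a, b = seq_a.upper(), seq_b.upper()
    return a == b or a == "M" + b or b == "M" + a


# Hypothetical sequences, used only to exercise the comparison.
print(same_apart_from_initiator_met("MKVLS", "KVLS"))
print(same_apart_from_initiator_met("KVLS", "KVLA"))
```

The comparison deliberately adds or removes at most one methionine, so an internal methionine that happens to be the first residue of a mature sequence is not stripped by mistake.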
As used herein, the term "CRISPR effector protein" generally refers to nucleases found in naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, catalytically active fragments thereof, and the like. The term encompasses any effector protein capable of gene targeting (e.g., gene editing, targeted gene regulation, etc.) within a cell based on a CRISPR system. The CRISPR effector proteins described herein may for example be selected from Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Csx11, Csx10, Csf1, Cas9, Csn2, Cas4, Cpf1, C2c1, C2c3 or C2c2 proteins, or functional variants of these nucleases. Examples of "CRISPR effector proteins" include Cas9 nuclease or variants thereof. The Cas9 nuclease may be a Cas9 nuclease from different species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to an RNA-guided nuclease that includes a Cas9 protein or fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated systems) genome editing system, which is capable of targeting and cleaving a DNA target sequence under the direction of a guide RNA to form a DNA double-strand break (DSB). Examples of "CRISPR effector proteins" may also include Cpf1 nuclease or variants thereof, such as high-specificity variants. The Cpf1 nuclease may be a Cpf1 nuclease from different species, such as Cpf1 nuclease from Francisella novicida U112, Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006.
As used herein, "expression construct" refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "expression" refers to the production of a functional product. For example, expression of a nucleotide sequence can refer to transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or translation of the RNA into a precursor or mature protein.
The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, may be an RNA (e.g., mRNA) capable of translation.
An "expression construct" of the invention may comprise regulatory sequences and nucleotide sequences of interest of different origin, or regulatory sequences and nucleotide sequences of interest of the same origin but arranged in a manner different from that normally found in nature.
"regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence), intermediate, or downstream (3' non-coding sequence) of a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
"promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
"constitutive promoter" refers to a promoter that will generally cause a gene to be expressed in most cell types under most circumstances. "tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in a tissue or organ, but may also be expressed in a particular cell or cell type. "developmentally regulated promoter" refers to a promoter whose activity is determined by a developmental event. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signal, etc.).
Examples of promoters include, but are not limited to, polymerase (pol) I, pol II, or pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as the metallothionein promoter may be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, and the Sp6 phage promoter. When used in plants, the promoter may be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter, or the rice actin promoter.
As used herein, the term "operably linked" refers to a regulatory element (such as, but not limited to, a promoter sequence, a transcription termination sequence, and the like) linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
"introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment, RNA, etc.) or a protein into an organism refers to transforming cells of the organism with the nucleic acid or protein such that the nucleic acid or protein is capable of functioning in the cells. "transformation" as used herein includes both stable transformation and transient transformation.
"Stable transformation" refers to the introduction of an exogenous nucleotide sequence into a genome, resulting in the stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generation thereof.
"transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell that performs a function without stable inheritance of a foreign gene. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
II. C to G base editing system
The present invention provides a C to G base editing system for editing a target sequence in the genome of a cell, comprising:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises i) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG), or ii) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a Rad18 protein; and
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of the cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA.
In some embodiments, the C to G base editing system further comprises:
i) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein;
ii) a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding the third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);
iii) a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein; and/or
iv) a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein the fifth polypeptide comprises uracil-DNA glycosylase (UDG).
In some embodiments, the C to G base editing system comprises:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein.
In some embodiments, the C to G base editing system comprises:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease inactivated CRISPR effector protein and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding a third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE).
In some embodiments, the C to G base editing system comprises:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA;
a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein; and
a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding the third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE).
In some embodiments, the C to G base editing system comprises:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein.
In some embodiments, the C to G base editing system comprises:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease inactivated CRISPR effector protein, and a Rad18 protein;
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein said fifth polypeptide comprises uracil-DNA glycosylase (UDG).
In some embodiments, the expression construct comprising a nucleotide sequence encoding a first polypeptide, the expression construct comprising a nucleotide sequence encoding a second polypeptide, the expression construct comprising a nucleotide sequence encoding a third polypeptide, the expression construct comprising a nucleotide sequence encoding a fourth polypeptide, the expression construct comprising a nucleotide sequence encoding a fifth polypeptide, and/or the expression construct comprising a nucleotide sequence encoding a guide RNA may be different expression constructs, or any two, any three, or all of them may be the same expression construct. In some embodiments, the first polypeptide is isolated, the second polypeptide is isolated, the third polypeptide is isolated, the fourth polypeptide is isolated, the fifth polypeptide is isolated and/or the guide RNA is isolated.
In some embodiments, the gene editing system comprises at least an expression construct comprising a nucleotide sequence encoding the first polypeptide, a nucleotide sequence encoding a self-cleaving peptide, and a nucleotide sequence encoding the second, third, fourth, or fifth polypeptide linked in-frame.
As used herein, "self-cleaving peptide" means a peptide that can achieve self-cleavage within a cell. For example, the self-cleaving peptide may include a protease recognition site so as to be recognized and specifically cleaved by a protease within the cell.
Alternatively, the self-cleaving peptide may be a 2A polypeptide. 2A polypeptides are a class of short peptides from viruses, the self-cleavage of which occurs during translation. When two different polypeptides of interest are expressed in-frame using a 2A polypeptide to link them, the two polypeptides of interest are produced in a ratio of approximately 1:1. Commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus, and F2A from foot-and-mouth disease virus. Among them, P2A is preferable because it has the highest cleavage efficiency. A variety of functional variants of these 2A polypeptides are also known in the art and may be used in the present invention.
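The in-frame arrangement described above can be sketched as follows. This is a minimal illustration, assuming that each part is supplied as a coding sequence made of whole codons; the function names and the toy sequences (including the stand-in for the 2A coding sequence) are hypothetical placeholders, not sequences from this application.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def strip_stop(cds):
    """Remove a terminal stop codon, if present."""
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def join_in_frame(first_cds, peptide_2a, second_cds):
    """Join first CDS + 2A + second CDS into a single open reading frame."""
    parts = [
        strip_stop(first_cds.upper()),   # upstream stop would end translation early
        strip_stop(peptide_2a.upper()),
        second_cds.upper(),              # keeps its own stop codon
    ]
    for part in parts:
        if len(part) % 3 != 0:
            raise ValueError("each part must be a whole number of codons")
    orf = "".join(parts)
    # Only the last codon may be a stop codon.
    if any(c in STOP_CODONS for c in codons(orf)[:-1]):
        raise ValueError("premature stop codon in the fused reading frame")
    return orf


# Toy placeholder sequences (hypothetical, for illustration only).
FIRST = "ATGGCTGCTTAA"
TWO_A = "GGATCCGGA"
SECOND = "ATGAAAGGTTGA"
fused = join_in_frame(FIRST, TWO_A, SECOND)
```

The check for premature stop codons reflects why the stop codon of the upstream coding sequence has to be removed: otherwise translation would terminate before the 2A peptide and the downstream polypeptide are reached.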
As used herein, the term "cytosine deaminase" refers to a deaminase that accepts single-stranded DNA as a substrate and is capable of catalyzing the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. Examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, and human APOBEC3A deaminase. In some embodiments, the cytosine deaminase is a human APOBEC3A deaminase, e.g., the amino acid sequence of which is set forth in SEQ ID NO: 1. In some embodiments, the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 1, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 1, while substantially retaining the function of the protein set forth in SEQ ID NO: 1.
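The sequence identity thresholds above can be illustrated with a small calculation. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and uses one common convention, matches divided by alignment columns; other conventions exist, and the function names and toy peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy peptides: one mismatch in ten aligned positions.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
```

With these toy inputs, nine of ten columns match, so the pair would satisfy the 80%, 85% and 90% thresholds but not the 95% threshold.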
As used herein, "nuclease-inactivated CRISPR effector protein" refers to a CRISPR effector protein that lacks double-stranded nucleic acid cleavage activity yet retains gRNA-directed DNA targeting ability. CRISPR effector proteins that lack double-stranded nucleic acid cleavage activity also encompass nickases, which form nicks in double-stranded nucleic acid molecules but do not completely cleave double-stranded nucleic acids. In some preferred embodiments of the invention, the nuclease-inactivated CRISPR effector protein of the invention has nickase activity.
In some embodiments, the nuclease-inactivated CRISPR effector protein is nuclease-inactivated Cas9. The DNA cleavage domain of Cas9 nuclease is known to comprise two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, forming a "nuclease-inactivated Cas9". The nuclease-inactivated Cas9 still retains the gRNA-directed DNA binding ability. Thus, in principle, when fused to another protein, nuclease-inactivated Cas9 can target the other protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The nuclease-inactivated Cas9 according to the invention may be derived from Cas9 of different species, e.g., from Streptococcus pyogenes (S. pyogenes) Cas9 (SpCas9), or from Staphylococcus aureus (S. aureus) Cas9 (SaCas9). Simultaneously mutating the HNH nuclease subdomain and the RuvC subdomain of Cas9 (e.g., with mutations D10A and H840A) inactivates the Cas9 nuclease, yielding nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains by mutation gives Cas9 nickase activity, i.e., yields a Cas9 nickase (nCas9), e.g., nCas9 with only the mutation D10A. Thus, in some embodiments of the invention, the nuclease-inactivated Cas9 of the invention comprises amino acid substitutions D10A and/or H840A relative to wild-type Cas9. In some embodiments of the invention, the nuclease-inactivated Cas9 may further comprise an additional mutation. For example, nuclease-inactivated SpCas9 may also comprise EQR, VQR or VRER mutations, and SaCas9 may also comprise KKH mutations (Kim et al., Nat. Biotechnol. 35, 371-376).
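The relationship between the two substitutions and the resulting Cas9 variant can be summarized in a short sketch. It assumes, following the usual description of SpCas9, that D10A lies in the RuvC subdomain and H840A in the HNH subdomain; the function name and the returned labels are illustrative only.

```python
# Assumed mapping for SpCas9: D10A inactivates RuvC, H840A inactivates HNH.
RUVC_INACTIVATING = {"D10A"}
HNH_INACTIVATING = {"H840A"}


def classify_cas9(substitutions):
    """Classify a Cas9 variant from its list of amino acid substitutions."""
    subs = {s.upper() for s in substitutions}
    ruvc_dead = bool(subs & RUVC_INACTIVATING)
    hnh_dead = bool(subs & HNH_INACTIVATING)
    if ruvc_dead and hnh_dead:
        return "dCas9"
    if ruvc_dead:
        # HNH is still active, so the strand complementary to the gRNA is cut.
        return "nCas9 (nicks the gRNA-complementary strand)"
    if hnh_dead:
        # RuvC is still active, so the non-complementary strand is cut.
        return "nCas9 (nicks the non-complementary strand)"
    return "nuclease-active Cas9"


variant = classify_cas9(["D10A"])
```

Additional mutations, such as the PAM-altering variants mentioned above, do not affect this classification and are simply ignored by the function.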
In some embodiments of the invention, the nuclease-inactivated CRISPR effector protein comprises the amino acid sequence set forth in SEQ ID No. 3.
As used herein, Uracil-DNA Glycosylase (UDG) or Uracil-N-Glycosylase (UNG) refers to an enzyme that recognizes a U base and cleaves the N-glycosidic bond of that base to form an apurinic or apyrimidinic site. The UDG may be of different origin, for example from e. In some embodiments, the UDG has the amino acid sequence shown in SEQ ID NO: 5. In some embodiments, the UDG comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to SEQ ID NO: 5, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 5, while substantially retaining the function of the protein set forth in SEQ ID NO: 5.
In some embodiments of the invention, the cytosine deaminase in the first polypeptide is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein.
In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the UDG in the first polypeptide are directly linked. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein, and/or the Rad18 protein in the first polypeptide are directly linked. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the UDG in the first polypeptide are linked by a linker. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein, and/or the Rad18 protein in the first polypeptide are linked by a linker. The linker may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, without secondary or higher structure. For example, the linker can be a flexible linker, such as the XTEN linker shown in SEQ ID NO: 2.
In the present invention, the Proliferating Cell Nuclear Antigen (PCNA) may be PCNA derived from various species. In some embodiments, the PCNA is derived from a species that requires base editing. In some embodiments, the ubiquitin protein binding site of PCNA is mutated to prevent it from being ubiquitinated by endogenous systems within the cell. It is within the ability of the person skilled in the art to identify and/or mutate the ubiquitin protein binding site of PCNA to prevent its natural ubiquitination. In some embodiments, the PCNA is rice PCNA. The wild type rice PCNA contains, for example, the amino acid sequence shown in SEQ ID NO: 20. In some embodiments, the PCNA amino acid sequence in which the ubiquitin protein binding site is mutated comprises the amino acid substitution K164R with respect to wild-type PCNA, the amino acid position being referenced to SEQ ID NO: 20. In some embodiments, the PCNA in which the ubiquitin protein binding site is mutated comprises the amino acid sequence set forth in SEQ ID NO. 9.
In some embodiments, the second polypeptide comprises one ubiquitin protein fused to the PCNA with the ubiquitin protein binding site mutated (monoubiquitination). The ubiquitin proteins can be ubiquitin proteins from various species. In some embodiments, the ubiquitin protein is derived from a species that requires base editing. In some embodiments, the ubiquitin protein is a rice ubiquitin protein. In some embodiments, the one ubiquitin protein is a truncated ubiquitin protein. In some embodiments, the truncated ubiquitin protein comprises only the N-terminal functional domain of the ubiquitin protein. In some embodiments, the truncated ubiquitin protein comprises the amino acid sequence set forth in SEQ ID NO. 10. In some embodiments, the ubiquitin protein is fused to the C-terminus of the PCNA in which the ubiquitin protein binding site is mutated.
In some embodiments, the second polypeptide further comprises MCP (MS2 coat protein), e.g., the MCP is fused to the N-terminus of the PCNA in which the ubiquitin protein binding site is mutated. An exemplary MCP comprises the amino acid sequence shown in SEQ ID NO 7. In some embodiments, the MCP comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% sequence identity to SEQ ID No. 7, or an amino acid sequence thereof having one or more conservative amino acid substitutions relative to SEQ ID No. 7, but substantially retaining the function of the protein set forth in SEQ ID No. 7. Accordingly, in some embodiments, the guide RNA comprises the MS2 sequence.
In some embodiments of the invention, the mutated PCNA, the ubiquitin protein, and optionally the MCP in the second polypeptide are linked directly to each other. In some embodiments of the invention, the mutated PCNA, the ubiquitin protein and optionally the MCP in the second polypeptide are linked by a linker. The linker may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, without secondary or higher structure.
"AP endonuclease," "AP lyase," and "apurinic/apyrimidinic lyase" are used interchangeably herein to refer to enzymes that are capable of recognizing an apurinic or apyrimidinic site on a nucleic acid and cleaving the nucleic acid. The AP endonuclease can be from various species. In some embodiments, the AP endonuclease is derived from a species that requires base editing. In some embodiments, the AP endonuclease is a rice AP endonuclease. In some embodiments, the AP endonuclease is rice APE01g. Wild-type rice APE01g contains the amino acid sequence shown in SEQ ID NO: 18. In some embodiments, the AP endonuclease is rice APE12g. Wild-type rice APE12g contains the amino acid sequence shown in SEQ ID NO: 19.
In some embodiments, the AP endonuclease is mutated to inactivate it, e.g., such that it retains substrate binding activity but loses catalytic activity.
In some embodiments, the mutant AP lyase is derived from rice APE01g and comprises the amino acid substitution D297A, relative to wild-type APE01g, the amino acid position being referenced to SEQ ID NO: 18. In some embodiments, the mutant AP lyase comprises the amino acid sequence set forth in SEQ ID NO. 15.
In some embodiments, the mutant AP lyase is derived from rice APE12g and comprises the amino acid substitution D327A with respect to wild-type APE12g, the amino acid position being referenced to SEQ ID NO 19. In some embodiments, the mutated AP lyase comprises the amino acid sequence set forth in SEQ ID NO 16.
In some embodiments, the mutant AP lyase is derived from rice APE12g and comprises the amino acid substitutions D238A and N240V with respect to wild-type APE12g, the amino acid positions being referenced to SEQ ID NO 19. In some embodiments, the mutant AP lyase comprises the amino acid sequence set forth in SEQ ID NO 11.
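Substitutions written in the form "D238A" (wild-type residue, 1-based position, new residue) can be applied to a reference sequence as sketched below. The helper name and the five-residue toy sequence are hypothetical; in practice the positions would be read against the corresponding wild-type sequence, for example SEQ ID NO: 18 or SEQ ID NO: 19.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return the sequence with each substitution (e.g. 'D238A') applied."""
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        # Guard against numbering a substitution on the wrong reference.
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Toy sequence (hypothetical), numbered from 1: M1 K2 D3 L4 N5.
mutant = apply_substitutions("MKDLN", ["D3A", "N5V"])
```

Checking the wild-type residue before substituting is a simple safeguard against applying a position to the wrong reference sequence.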
In the present invention, the Rad18 protein may be Rad18 protein from various species. In some embodiments, the Rad18 protein is a human Rad18 protein. In some embodiments, the Rad18 protein comprises the amino acid sequence set forth in SEQ ID NO 17. In some embodiments, the Rad18 protein comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% sequence identity to SEQ ID No. 17, or an amino acid sequence thereof having one or more conservative amino acid substitutions relative to SEQ ID No. 17, but substantially retains the function of the Rad18 protein set forth in SEQ ID No. 17.
As used herein, "gRNA" and "guide RNA" are used interchangeably to refer to an RNA molecule capable of forming a complex with a CRISPR nuclease and, owing to some complementarity with a target sequence, of targeting the complex to the target sequence. For example, in Cas9-based gene editing systems, gRNAs typically consist of crRNA and tracrRNA molecules that are partially complementary and form a complex, where the crRNA comprises a sequence that is sufficiently complementary to a target sequence to hybridize to the target sequence and direct the CRISPR complex (Cas9 + crRNA + tracrRNA) to specifically bind to the target sequence. However, it is known in the art to design single guide RNAs (sgRNAs) that combine the characteristics of crRNA and tracrRNA. In Cpf1-based genome editing systems, by contrast, gRNAs typically consist only of mature crRNA molecules (also referred to as sgRNAs), where the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize to the complementary sequence of the target sequence and direct specific binding of the complex (Cpf1 + crRNA) to that target sequence. It is within the ability of one skilled in the art to design suitable gRNAs based on the CRISPR nuclease used and the target sequence to be edited. As used herein, a "target sequence" is a sequence that is complementary to or identical to (depending on the CRISPR nuclease) a guide sequence of about 20 nucleotides contained in a guide RNA. Guide RNAs target a target sequence by base pairing with the target sequence or its complementary strand.
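As a rough illustration of choosing a guide sequence of about 20 nucleotides, the sketch below scans one strand of a DNA sequence for 20-nt stretches followed by an NGG motif and lists the cytosines in each stretch. The NGG requirement is the protospacer adjacent motif generally associated with SpCas9 and is an assumption here, as are the function name, the forward-strand-only scan and the toy sequence.

```python
import re


def find_targets(sequence, guide_len=20):
    """List (start, protospacer, C positions) for sites followed by NGG.

    Only the given strand is scanned; start is 0-based, and C positions
    are 1-based within the protospacer.
    """
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    hits = []
    for match in pattern.finditer(sequence):
        protospacer = match.group(1)
        c_positions = [i + 1 for i, base in enumerate(protospacer) if base == "C"]
        hits.append((match.start(), protospacer, c_positions))
    return hits


# Toy sequence (hypothetical): 2 nt, a 20-nt stretch, the motif AGG, 2 nt.
toy = "TT" + "ACGT" * 5 + "AGG" + "TT"
targets = find_targets(toy)
```

A fuller design step would also scan the reverse complement and restrict candidate cytosines to the editing window of the deaminase used, neither of which is modeled here.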
In some embodiments of the invention, the editing results in one or more nucleotide substitutions C to G in the target sequence.
In some embodiments of the invention, the polypeptide of the invention further comprises a Nuclear Localization Sequence (NLS). In general, the one or more NLSs in the polypeptide should be of sufficient strength to drive the polypeptide to accumulate in the nucleus of the cell in an amount that allows it to perform its gene editing function. In general, the strength of nuclear localization activity is determined by the number and location of the NLSs in the polypeptide, the specific NLS or NLSs used, or a combination of these factors. In some embodiments of the invention, the NLS of the polypeptide of the invention may be located at the N-terminus and/or C-terminus or in the middle. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. When there is more than one NLS, each can be chosen independently of the others. In some embodiments, the NLS comprises the amino acid sequence set forth in SEQ ID NO: 4 or 8.
In addition, the polypeptides of the invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, etc., depending on the DNA location to be edited.
In some embodiments, the first polypeptide comprises the amino acid sequence set forth in SEQ ID NO 12. In some embodiments, a fusion protein comprising a first polypeptide and a second polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO 13. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 14. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 21. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 22. In some embodiments, a fusion protein comprising a first polypeptide and a fourth polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO 23.
In order to obtain efficient expression in a cell, in some embodiments of the invention, the nucleotide sequence encoding the polypeptide is codon optimized for the organism from which the cell to be subjected to gene editing is derived.
Codon optimization refers to a method of modifying a nucleic acid sequence to enhance expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently or most frequently in the genes of the host cell while maintaining the native amino acid sequence. Genes can be tailored for optimal gene expression in a given organism based on codon optimization. Tables of codon usage are readily available, for example in the Codon Usage Database available at www.kazusa.or.jp/codon/, and these tables may be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000," Nucl. Acids Res., 28:292 (2000).
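The codon replacement described above can be sketched in a few lines. The example builds the standard genetic code and swaps each codon for its most frequent synonym according to a usage table; the usage values shown are made-up toy numbers rather than data from any codon usage database, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds, usage):
    """Replace codons with more frequently used synonymous codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        best = max(synonyms, key=lambda c: usage.get(c, 0.0))
        # Keep the original codon unless a synonym is strictly more frequent.
        if usage.get(best, 0.0) > usage.get(codon, 0.0):
            codon = best
        optimized.append(codon)
    return "".join(optimized)


# Toy usage frequencies (hypothetical values, for illustration only).
TOY_USAGE = {"GCC": 0.4, "GCT": 0.2, "AAG": 0.6, "AAA": 0.4}
native = "ATGGCTAAATAA"
optimized = optimize_codons(native, TOY_USAGE)
```

Because only synonymous codons are exchanged, translating the native and optimized sequences gives the same amino acid sequence, which is the constraint stated in the definition above.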
The organism from which the cells that can be subjected to gene editing by the system of the invention are derived is preferably a eukaryote, including, but not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows, cats; poultry such as chicken, duck, goose; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the organism is a plant, more preferably rice.
As used herein, "editing system" refers to a combination of components required for base editing of a genome within a cell or organism. Wherein the individual components of the system, such as the one or more polypeptides or expression constructs encoding the same, the one or more guide RNAs or expression constructs encoding the same, may each be present independently, or may be present in any combination as a composition.
Method for modifying target sequence in cell genome
In another aspect, the invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing into the cell the base editing system of the invention.
In some embodiments, the modification results in one or more nucleotide substitutions C to G in the target sequence. In some embodiments, the modification does not include an insertion and/or deletion mutation.
In another aspect, the invention also provides a method of producing a genetically modified cell comprising introducing into said cell a gene editing system of the invention.
In another aspect, the invention also provides a genetically modified organism comprising the genetically modified cell produced by the method of the invention or progeny cells thereof.
In the present invention, the target sequence to be modified may be located anywhere in the genome, for example, within a functional gene such as a protein-encoding gene, or may be located, for example, in a gene expression regulatory region such as a promoter region or an enhancer region, thereby effecting a modification of the function of the gene or a modification of gene expression. Modifications in the cellular target sequence may be detected by T7EI, PCR/RE or sequencing methods.
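Where sequencing is used for detection, the comparison between a reference sequence and a sequenced read can be sketched as below. This is a simplified illustration that assumes the read covers exactly the same region as the reference and treats any length difference as an insertion or deletion; the function name and toy sequences are hypothetical.

```python
def call_substitutions(reference, read):
    """Compare a read with the reference and report C-to-G substitutions.

    Positions are 1-based. Reads whose length differs from the reference
    are flagged as carrying an insertion or deletion and not analyzed further.
    """
    reference, read = reference.upper(), read.upper()
    if len(reference) != len(read):
        return {"indel": True, "c_to_g": [], "other": []}
    c_to_g, other = [], []
    for position, (ref_base, read_base) in enumerate(zip(reference, read), start=1):
        if ref_base == read_base:
            continue
        if (ref_base, read_base) == ("C", "G"):
            c_to_g.append(position)
        else:
            other.append((position, ref_base, read_base))
    return {"indel": False, "c_to_g": c_to_g, "other": other}


# Toy sequences (hypothetical): the C at position 2 is read as G.
result = call_substitutions("ACCTGA", "AGCTGA")
```

An edit made on the opposite strand would appear as a G-to-C change on the strand shown, so a fuller analysis would account for strand orientation and use a proper alignment rather than a position-by-position comparison.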
In the method of the present invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art.
Methods that can be used to introduce the base editing system of the invention into a cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia, adenovirus, adeno-associated virus, lentivirus and other viruses), particle gun methods, PEG-mediated transformation of protoplasts, Agrobacterium tumefaciens-mediated transformation.
The cells that can be base-edited by the method of the present invention may be derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cow, cat; poultry such as chicken, duck, goose; plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the cell is a plant cell, such as a rice cell.
In some embodiments, the cell is a proliferating and/or differentiating cell. In some embodiments, the cell is a meristematic cell. In some embodiments, the cell is a callus cell.
In some embodiments, the methods of the invention are performed in vitro. For example, the cell is an isolated cell, or a cell in an isolated tissue or organ.
In other embodiments, the methods of the invention may also be performed in vivo. For example, the cell is a cell within an organism into which the system of the invention can be introduced in vivo by, for example, viral or Agrobacterium tumefaciens mediated methods.
Application in plants
The base editing system and the method of modifying a target sequence in the genome of a cell of the present invention are particularly suitable for genetically modifying plants. Preferably, the plant is a crop plant, including but not limited to wheat, rice, maize, soybean, sunflower, sorghum, canola, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, and potato. More preferably, the plant is rice.
In another aspect, the invention provides a method of producing a genetically modified plant comprising introducing into at least one of said plants a base editing system of the invention, thereby resulting in the substitution of one or more C to G of a target sequence in the genome of said at least one plant.
In some embodiments, the method further comprises screening the at least one plant for plants having the desired one or more C to G substitutions.
In the method of the present invention, the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the present invention into a plant include, but are not limited to: particle gun method, PEG mediated protoplast transformation, Agrobacterium tumefaciens mediated transformation, plant virus mediated transformation, pollen tube channel method and ovary injection method. Preferably, the base editing system is introduced into the plant by transient transformation.
In the method of the present invention, modification of the target sequence can be achieved simply by introducing or producing the polypeptide and guide RNA in a plant cell, and the modification can be stably inherited without stably transforming a plant with an exogenous polynucleotide encoding a component of the base editing system. This avoids potential off-target effects of stably present (continually produced) base editing compositions and also avoids integration of exogenous nucleotide sequences in the plant genome, thereby providing greater biosafety.
In some preferred embodiments, the introduction is performed in the absence of selective pressure, thereby avoiding integration of the exogenous nucleotide sequence in the plant genome.
In some embodiments, the introducing comprises transforming an isolated plant cell or tissue with the base editing system of the invention and then regenerating the transformed plant cell or tissue into a whole plant. Preferably, the regeneration is carried out in the absence of selective pressure, i.e., without using any selection agent for the selection gene carried on the expression vector during the tissue culture process. The regeneration efficiency of the plant can be improved without using a selection agent, and a modified plant without an exogenous nucleotide sequence can be obtained.
In other embodiments, the base editing system of the invention can be transformed into a specific site on an intact plant, such as a leaf, stem tip, pollen tube, young ear, or hypocotyl. This is particularly suitable for the transformation of plants which are difficult to regenerate by tissue culture.
In some embodiments, the system is introduced into a proliferating and/or differentiating plant tissue or cell. In some embodiments, the system is introduced into a meristem. In some embodiments, the system is introduced into callus.
In some embodiments of the invention, the in vitro expressed protein and/or in vitro transcribed RNA molecule (e.g., the expression construct is an in vitro transcribed RNA molecule) is directly transformed into the plant. The protein and/or RNA molecules are capable of effecting base editing in plant cells and are subsequently degraded by the cells, which avoids integration of foreign nucleotide sequences in the plant genome.
Thus, in some embodiments, genetic modification and breeding of plants using the methods of the invention can result in plants that have no exogenous polynucleotide integrated in their genome, i.e., non-transgenic (transgene-free) modified plants.
In some embodiments of the invention, the modified target sequence is associated with a plant trait, such as an agronomic trait, whereby the substitution of one or more C to G results in the plant having an altered (preferably improved) trait, such as an agronomic trait, relative to a wild-type plant.
In some embodiments, the method further comprises the step of screening for plants having a desired C to G substitution or substitutions and/or a desired trait, such as an agronomic trait.
In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant. Preferably, the genetically modified plant or progeny thereof has a desired C to G substitution or substitutions and/or a desired trait such as an agronomic trait.
In another aspect, the present invention also provides a genetically modified plant or progeny or parts thereof, wherein the plant is obtained by the method of the invention as described above. In some embodiments, the genetically modified plant or progeny or part thereof is non-transgenic. Preferably, the genetically modified plant or progeny thereof has a desired genetic modification and/or a desired trait, such as an agronomic trait.
In another aspect, the present invention also provides a method of plant breeding comprising crossing a genetically modified first plant comprising one or more C to G substitutions in a target nucleic acid region obtained by the method of the invention described above with a second plant not comprising the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant. Preferably, the genetically modified first plant has a desired trait, such as an agronomic trait.
Fifth, kit
The invention also includes a kit for use in the method of the invention, the kit comprising the base editing system of the invention, and instructions for use. The kit generally includes a label indicating the intended use and/or method of use of the kit contents. The term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
Examples
Materials and methods
1. Vector construction
To construct the PCGBE system, Escherichia coli UDG (accession number: AMB53293.1), rice endogenous PCNA (LOC_Os02g56130), Ubiquitin (LOC_Os03g13170), rice endogenous APE01g (LOC_Os01g58690) and APE12g (LOC_Os12g18200), and the human hRad18 protein (NP_064550.3) were selected. A K164R point mutation was introduced into the PCNA sequence (PCNA(K164R)), Ubiquitin was truncated to its N-terminal functional domain (Ub), inactivating point mutations were introduced into the APE12g sequence (dAPE, D238A/N240V), and point mutations were introduced into APE01g and APE12g to obtain mAPE01g (D297A) and mAPE12g (D327A). All gene fragments were codon-optimized for rice and synthesized by Nanjing GenScript Biotech Co.
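The substitution notation used above (for example K164R or D238A/N240V) can be read as "wild-type residue, 1-based position, new residue". The following is a minimal sketch of how such a notation could be applied to a protein sequence and checked against the wild-type residue. The function names and the toy sequence are hypothetical and are not part of the original disclosure.

```python
import re

def apply_substitution(seq: str, mutation: str) -> str:
    """Apply one substitution such as 'K164R' (1-based position) to a protein sequence."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild_type, pos, new = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wild_type:
        # Guards against applying a mutation to the wrong numbering scheme.
        raise ValueError(f"expected {wild_type} at {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]

def apply_substitutions(seq: str, mutations: str) -> str:
    """Apply several substitutions written as 'D238A/N240V'."""
    for mutation in mutations.split("/"):
        seq = apply_substitution(seq, mutation)
    return seq

# Toy sequence (hypothetical), used only to illustrate the notation.
toy = "MKDAN"
print(apply_substitutions(toy, "D3A/N5V"))  # MKAAV
```

Checking the wild-type residue before substituting is a deliberate choice here: position numbering depends on whether the initiator methionine and any tags are counted, so a mismatch is better reported than silently ignored.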
APOBEC3A was fused to the N-terminus of Cas9 with an XTEN linker and UDG was fused to the C-terminus of Cas9, thereby constructing PCGBE-1 vector (fig. 2) or ADU vector (fig. 11 a).
nCas9(D10A) in the PCGBE-1 vector was replaced with dCas9(D10A, H840A) to obtain PCGBE-2. Next, the mAPE01g, mAPE12g and hRad18 proteins were each attached to the C-terminus of the PCGBE-1 construct via a self-cleaving 2A peptide (T2A), giving the PCGBE-3, PCGBE-4 and PCGBE-5 vectors, respectively. The PCGBE-6 vector was obtained by exchanging the positions of UDG and the hRad18 protein in the PCGBE-5 vector. Finally, each fragment carrying the fusion gene was assembled together with the sgRNA expression cassette into the pHUE411 backbone by Gibson assembly to construct the binary vectors pH-PCGBE-2 to pH-PCGBE-6 (FIG. 6) for Agrobacterium-mediated genetic transformation of rice.
In addition, the MCP-PCNA-Ub fusion protein and the dAPE protein were each fused to the C-terminus of the APOBEC3A-nCas9-UDG construct via a self-cleaving 2A peptide (T2A), giving the CGBE vectors shown in FIG. 11b and FIG. 11c, respectively. Finally, each fragment carrying the fusion gene was assembled together with the sgRNA expression cassette into the pHUE411 backbone by Gibson assembly to construct binary vectors for Agrobacterium-mediated genetic transformation of rice.
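The relationship between the PCGBE-1 to PCGBE-6 constructs described above can be summarized as ordered lists of components. The sketch below is only an illustrative data structure: nuclear localization signals and short spacer residues are omitted, and the helper names are hypothetical.

```python
# Ordered N-terminus -> C-terminus component lists (NLS and spacers omitted).
T2A = "T2A"
PCGBE_1 = ["APOBEC3A", "XTEN", "nCas9(D10A)", "UDG"]

def with_coexpressed(base, partner):
    """Append a T2A-separated partner protein to the C-terminus of a construct."""
    return base + [T2A, partner]

def swap(parts, a, b):
    """Return a copy of the component list with components a and b exchanged."""
    swapped = list(parts)
    i, j = swapped.index(a), swapped.index(b)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

constructs = {
    "PCGBE-1": PCGBE_1,
    "PCGBE-2": ["APOBEC3A", "XTEN", "dCas9(D10A,H840A)", "UDG"],
    "PCGBE-3": with_coexpressed(PCGBE_1, "mAPE01g(D297A)"),
    "PCGBE-4": with_coexpressed(PCGBE_1, "mAPE12g(D327A)"),
    "PCGBE-5": with_coexpressed(PCGBE_1, "hRad18"),
}
# PCGBE-6: positions of UDG and hRad18 exchanged relative to PCGBE-5.
constructs["PCGBE-6"] = swap(constructs["PCGBE-5"], "UDG", "hRad18")

for name, parts in constructs.items():
    print(name, "-".join(parts))
```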
Nine endogenous target sites were selected from nine rice genes (OsAAT, OsALS, OsCDC48, OsDEP1, OsGRF1, OsIPA1, OsNRT1.1B, OsPDS and OsSWEET14) for testing the systems; all target site sequences are shown in Table 1.
TABLE 1 sgRNA targeting sites and primers
[Table 1 is reproduced as an image in the original publication (Figure BDA0003538988620000151).]
PAM sequences are shown in bold.
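For readers who want to locate candidate target sites of the kind listed in Table 1, the sketch below scans a DNA string for a 20-nt protospacer followed by a PAM and reports the cytosine positions within the protospacer. It assumes the canonical SpCas9 NGG PAM and searches the given strand only; both the function name and the toy sequence are hypothetical and do not come from Table 1.

```python
import re

def find_spcas9_sites(dna: str, protospacer_len: int = 20):
    """Find protospacer + NGG PAM sites on the given strand (assumed SpCas9 PAM)."""
    dna = dna.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    sites = []
    for m in pattern.finditer(dna):
        protospacer, pam = m.group(1), m.group(2)
        sites.append({
            "start": m.start(),            # 0-based start of the protospacer
            "protospacer": protospacer,
            "pam": pam,
            # 1-based positions of C within the protospacer, counted from the PAM-distal end
            "c_positions": [i + 1 for i, b in enumerate(protospacer) if b == "C"],
        })
    return sites

# Toy sequence (hypothetical).
for site in find_spcas9_sites("ACGTACGTACGTACGTACGTTGGAA"):
    print(site)
```

A real design workflow would also scan the reverse complement and check for off-target matches in the genome, which this sketch does not attempt.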
2. Protoplast isolation and transformation
The rice material used for protoplast isolation and transformation in the invention is the cultivar Zhonghua 11.
2.1 Culture of etiolated rice seedlings
Zhonghua 11 rice seeds were rinsed with 75% ethanol for 1 minute, treated with 4% sodium hypochlorite for 30 minutes, and washed with sterile water more than 5 times. The seeds were then cultured on M6 medium for 3-4 weeks at 26 °C in the dark.
2.2 Isolation of rice protoplasts
(1) Cut off the stem tissue of the etiolated seedlings and slice the middle part into 0.5-1 mm strips with a blade. Place the strips in 0.6 M mannitol solution in the dark for 10 min, filter through a strainer, and transfer them into 50 mL of enzymolysis solution (filtered through a 0.45 μm filter membrane). Apply vacuum (about 15 kPa) for 30 min, then place the solution on a shaker (10 rpm) for enzymolysis at room temperature for 5 h;
(2) add 30-50 mL of W5 to dilute the enzymolysis product, and filter the enzymolysis solution through a 75 μm nylon filter membrane into a round-bottom centrifuge tube (50 mL);
(3) centrifuge at 250 g (rcf) for 3 min at 23 °C with acceleration and deceleration both set to 3, and discard the supernatant;
(4) gently resuspend the cells in 20 mL of W5 and repeat step (3);
(5) add an appropriate amount of MMG to resuspend the cells for transformation.
2.3 Transformation of rice protoplasts
(1) Add 10 μg of each required transformation vector to a 2 mL centrifuge tube and mix well. Using a pipette tip with the end cut off, add 200 μL of protoplasts and flick gently to mix. Add 220 μL of PEG4000 solution, flick gently to mix, and induce transformation at room temperature in the dark for 20-30 min;
(2) add 880 μL of W5, mix by gentle inversion, centrifuge at 250 g (rcf) for 3 min with acceleration and deceleration both set to 3, and discard the supernatant;
(3) add 1 mL of WI solution, mix by gentle inversion, and incubate at 23 °C in the dark for 48 h.
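The centrifugation steps above are specified in relative centrifugal force (250 g, rcf). On a centrifuge that is set in rpm, the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) can be used. The sketch below applies it; the 10 cm rotor radius is a hypothetical example value, not a parameter given in the protocol.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm

def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be >= 0 and radius_cm must be > 0")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))

def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

# Hypothetical rotor radius of 10 cm; use the value for the actual rotor.
rpm = rcf_to_rpm(250, radius_cm=10.0)
print(f"250 g at r = 10 cm corresponds to about {rpm:.0f} rpm")
```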
3. Agrobacterium-mediated genetic transformation of rice
The constructed binary vectors were introduced into Agrobacterium strain AGL1 by electroporation. Calli of Zhonghua 11 rice were then transformed by Agrobacterium-mediated infection, with hygromycin used as the selection marker for screening transgene-positive plants.
4. DNA extraction and amplicon sequencing analysis of protoplasts and transgenic plants
4.1 DNA extraction from protoplasts and transgenic plants
Protoplasts were collected in 2 mL centrifuge tubes and their DNA (about 30 μL) was extracted by the CTAB method. Each transgenic clone was sampled separately and its genomic DNA was extracted by the CTAB method; the concentration (50 ng/μL) was measured with a NanoDrop ultra-micro spectrophotometer, and the DNA was stored at -20 °C.
4.2 Amplicon sequencing analysis
(1) A first round of PCR amplification was performed on the DNA template using genome-specific primers; the first-round primers are listed in Table 2. The 20 μL amplification system contained 4 μL of 5× FastPfu buffer, 1.6 μL of dNTPs (2.5 mM), 0.4 μL of forward primer (10 μM), 0.4 μL of reverse primer (10 μM), 0.4 μL of FastPfu polymerase (2.5 U/μL), and 2 μL of DNA template (~60 ng). Amplification conditions: pre-denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 50-64 °C for 30 s, and extension at 72 °C for 30 s; final extension at 72 °C for 5 min; hold at 12 °C;
(2) the amplification product was diluted 10-fold, and 1 μL was used as the template for the second round of PCR amplification; the amplification primers were barcode-containing sequencing primers, detailed in Table 2. The 50 μL amplification system contained 10 μL of 5× FastPfu buffer, 4 μL of dNTPs (2.5 mM), 1 μL of forward primer (10 μM), 1 μL of reverse primer (10 μM), 1 μL of FastPfu polymerase (2.5 U/μL), and 1 μL of DNA template. The amplification conditions were as above, with 38 amplification cycles.
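The two reaction setups above can be checked with the dilution formula C_final = C_stock × V_added / V_total. The sketch below computes the final concentrations for both rounds. It assumes that the volume not accounted for by the listed components is made up with water, which the protocol does not state explicitly; the function names are hypothetical.

```python
def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

def summarize_mix(total_ul, buffer_ul, dntp_ul, primer_ul, polymerase_ul, template_ul):
    """Final concentrations for a FastPfu reaction with the stock values given in the text."""
    listed = buffer_ul + dntp_ul + 2 * primer_ul + polymerase_ul + template_ul
    return {
        "buffer_x": final_concentration(5.0, buffer_ul, total_ul),      # 5x stock
        "dntp_mM": final_concentration(2.5, dntp_ul, total_ul),         # 2.5 mM stock
        "primer_uM": final_concentration(10.0, primer_ul, total_ul),    # 10 uM stock, each primer
        "polymerase_U": 2.5 * polymerase_ul,                            # 2.5 U/uL stock
        "water_ul": round(total_ul - listed, 3),                        # assumed make-up volume
    }

round1 = summarize_mix(total_ul=20, buffer_ul=4, dntp_ul=1.6, primer_ul=0.4,
                       polymerase_ul=0.4, template_ul=2)
round2 = summarize_mix(total_ul=50, buffer_ul=10, dntp_ul=4, primer_ul=1,
                       polymerase_ul=1, template_ul=1)
print(round1)
print(round2)
```

Both rounds come out at 1× buffer, 0.2 mM dNTPs and 0.2 μM of each primer, so the second-round mix is a scaled-up version of the first with proportionally less template.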
(3) The PCR products were separated by 2% agarose gel electrophoresis, the target fragments were recovered with the AxyPrep DNA Gel Extraction Kit, and the recovered products were quantified with a NanoDrop ultra-micro spectrophotometer; 100 ng of each recovered product was pooled, and the pool was sent to Beijing Novogene Co., Ltd. for amplicon library construction and sequencing analysis.
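Pooling 100 ng of each recovered product requires a volume of 100 ng divided by the measured concentration. A minimal sketch of that arithmetic follows; the sample names and concentrations are hypothetical example values.

```python
def pooling_volumes(conc_ng_per_ul: dict, mass_ng: float = 100.0) -> dict:
    """Volume (uL) of each product needed to contribute `mass_ng` to the pool."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = mass_ng / conc
    return volumes

# Hypothetical NanoDrop readings in ng/uL.
readings = {"OsPDS_PCGBE-1": 50.0, "OsPDS_PCGBE-5": 25.0}
for sample, vol in pooling_volumes(readings).items():
    print(f"{sample}: {vol:.1f} uL")
```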
(4) After sequencing was completed, the raw data were split according to the sequencing primers, and the sgRNA sequences together with their flanking sequences were used as reference sequences to systematically compare the types of editing products and the editing efficiencies of the different systems at the different gene target sites.
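The analysis in step (4) can be sketched as demultiplexing reads by barcode and then classifying each read at the target cytosine. The code below is a simplified illustration, not the pipeline used by the inventors: reads are assumed to be already trimmed to the reference window, any length difference is treated as an indel instead of being aligned, and "purity" is taken to mean C-to-G reads divided by all edited reads, which is an assumption since the text does not define it. All names and toy reads are hypothetical.

```python
from collections import Counter

def demultiplex(raw_reads, barcodes):
    """Assign reads to samples by their leading barcode and strip the barcode."""
    by_sample = {sample: [] for sample in barcodes}
    for read in raw_reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return by_sample

def classify_read(read, reference, target_index):
    """Label a read as 'unedited', 'indel', or 'C-to-X' at the target position."""
    if len(read) != len(reference):
        return "indel"  # crude stand-in for alignment-based indel calling
    base = read[target_index]
    return "unedited" if base == reference[target_index] else f"C-to-{base}"

def summarize(reads, reference, target_index):
    """Count outcomes and compute C-to-G efficiency and (assumed) purity."""
    if reference[target_index] != "C":
        raise ValueError("target position in the reference must be C")
    counts = Counter(classify_read(r, reference, target_index) for r in reads)
    total = sum(counts.values())
    edited = total - counts["unedited"]
    c_to_g = counts["C-to-G"]
    return {
        "counts": dict(counts),
        "efficiency": c_to_g / total if total else 0.0,
        "purity": c_to_g / edited if edited else 0.0,
    }

# Toy data: reference window AACGT with the target C at index 2.
reads = ["AACGT"] * 5 + ["AAGGT"] * 3 + ["AATGT"] + ["AAGT"]
print(summarize(reads, "AACGT", target_index=2))
```

In this toy set, 3 of 10 reads carry C-to-G and 5 reads are edited in some way, so the sketch reports an efficiency of 0.3 and a purity of 0.6.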
Example 1. Construction of a precise C-to-G base editing system
The cytosine base editor (CBE) was established by the David Liu laboratory as early as 2016. This system uses nCas9(D10A) to guide a cytosine deaminase to the non-complementary strand of a DNA target site, where it deaminates cytosine (C) within a specific window to uracil (U). During mismatch repair (MMR) or DNA replication, U is read as thymine (T), ultimately producing a precise C-to-T single-base substitution. However, during in vivo base excision repair (BER), U is recognized and excised by UDG to form an AP site, and the AP site is nicked by AP endonuclease (APE), which gives rise to numerous indel byproducts; the CBE system was also found to generate low-frequency C-to-G byproducts. Fusing UGI to the CBE system therefore greatly reduces the generation of these byproducts (FIG. 1).
As early as 2019, the inventors replaced the UGI in the high-efficiency A3A-PBE system with UDG to construct the initial PCGBE system (PCGBE-1) (FIG. 2) and tested it in rice protoplasts. Only a very low frequency of C-to-G base substitution was detected, and the edits were still mainly C-to-T (FIG. 3). PCGBE-1 was then tested in rice callus, where it was found to mediate about 30% C-to-G transversion (FIGS. 4 and 5). This strongly suggests that CGBE-mediated C-to-G transversion depends on DNA replication. In addition to mediating C-to-G transversion, the PCGBE-1 system also produced a high proportion of indel byproducts (approximately 30% or more) (FIGS. 4 and 5), somewhat similar to the results published by Kurt et al. in 2020. In the figures, the numerical values represent C-to-G editing efficiency; the numbers in parentheses below them indicate C-to-G purity.
The invention examined the DNA damage and potential repair mechanisms involved in the process from cytosine deamination to C-to-G formation (FIG. 1) and identified one key repair pathway, namely translesion DNA synthesis (TLS) repair (Zhuang et al., 2008; Qin et al., 2013; Martin and Wood, 2019). TLS has been shown to insert one nucleotide into the newly synthesized strand opposite a DNA damage site (such as an AP site); this nucleotide is most likely dCTP and mainly serves to initiate the TLS repair signal, which makes C-to-G base editing possible. A key protein factor that determines TLS repair is proliferating cell nuclear antigen (PCNA), which recognizes DNA damage during DNA replication and mediates diverse DNA repair processes. When PCNA is monoubiquitinated, it promotes recruitment of the relevant DNA polymerases (pol η and pol ζ) during TLS repair, thereby promoting lesion bypass repair; when PCNA is polyubiquitinated, it participates in the damage avoidance pathway that uses the intact sister chromatid as a template (Zhuang et al., 2008; Qin et al., 2013; Martin and Wood, 2019). Rad6-Rad18 plays an important role in the response to DNA damage and promotes monoubiquitination of PCNA, thereby directing DNA damage repair toward the TLS pathway. However, by comparing the relevant protein information of animals and plants, the inventors did not identify a Rad18 homolog in the rice genome, and presume that the absence of a Rad18 homolog may be one of the reasons for the low C-to-G efficiency and purity in plants. Co-expression of the human Rad18 protein would therefore be likely to drive the mutation toward TLS repair, resulting in precise C-to-G base transversion in the target sequence.
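The reasoning above, that insertion of a C opposite an AP site leads to a C-to-G outcome while replication over U leads to C-to-T, can be illustrated with a deliberately simplified model. The sketch below tracks only which base is placed opposite the lesion and what the edited strand becomes once it is copied from that base. The pathway names are hypothetical labels, and real repair outcomes are mixtures of several pathways.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def outcome_after_deamination(pathway: str) -> str:
    """Final base at a position that started as C, under a simplified model.

    'replication_over_U': U is left in place and read like T, so A is inserted opposite it.
    'tls_over_ap_site':   U is excised to an AP site and TLS inserts C opposite it
                          (the nucleotide the text describes as most likely).
    """
    if pathway == "replication_over_U":
        inserted_opposite = "A"
    elif pathway == "tls_over_ap_site":
        inserted_opposite = "C"
    else:
        raise ValueError(f"unknown pathway: {pathway!r}")
    # The edited strand is subsequently restored using the newly inserted base as template.
    return COMPLEMENT[inserted_opposite]

for pathway in ("replication_over_U", "tls_over_ap_site"):
    print(f"{pathway}: C-to-{outcome_after_deamination(pathway)}")
```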
In addition, without exogenous AP endonuclease, only a portion of the AP sites is cleaved, and the uncleaved AP sites enter TLS bypass repair during DNA replication. When an AP site is generated, expression of endogenous APE is also induced within a short time to carry out base excision repair (BER). Therefore, the cell's own APE protein was mutated so that it loses catalytic activity and retains only binding activity; co-expressing this mutant APE protein (mAPE) in the cell competitively inhibits endogenous APE and protects the AP site. This strategy can likewise drive the mutation toward TLS repair and thus toward C-to-G base transversion.
Based on the above results and the TLS repair mechanism, the inventors modified the established PCGBE-1 system by i) replacing nCas9 with dCas9; ii) co-expressing the rice-derived mutant APE01g/APE12g protein; or iii) co-expressing the human Rad18 protein, forming the PCGBE-2 to PCGBE-6 systems (FIG. 6). These systems were tested on the OsNRT1.1B, OsPDS and OsGRF1 targets in rice callus, and PCGBE-4 and PCGBE-5 were found to be significantly improved in both C-to-G editing efficiency and editing purity compared with PCGBE-1 (FIGS. 7-10).
In addition, the lysine at position 164 of PCNA was changed to arginine by point mutation (K164R), so that its ubiquitination site was disrupted; the N-terminal domain of Ubiquitin was then fused to it to construct a monoubiquitinated proliferating cell nuclear antigen fusion protein (PCNA-Ub), which can stably recruit TLS-related polymerases to induce TLS repair. Co-expressing the cell's own PCNA-Ub fusion protein is therefore likely to drive the mutation toward TLS repair and achieve precise C-to-G base transversion in the target sequence.
Based on the above results and the TLS repair mechanism, the inventors used the cell's own PCNA and Ubiquitin genes as templates for point mutation, truncation, splicing and synthesis, and then attached the resulting fusion protein to the C-terminus of the PCGBE-1 system to form the PCGBE system shown in FIG. 11b. In addition, the inventors attached inactivated rice APE (D238A/N240V) to the C-terminus of the PCGBE-1 system via T2A to form the CGBE system shown in FIG. 11c. The base editing systems shown in FIG. 11 were introduced into rice calli, and the results of mutation detection are shown in FIG. 12.
The results show that the improved CG base editing system not only maintains a high proportion of C-to-G editing but also greatly reduces the generation of indels, C-to-A and other byproducts, indicating that the improved PCGBE system can achieve efficient and precise C-to-G base transversion.
The sequences are as follows:
SEQ ID NO:1 APOBEC3A
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN
SEQ ID NO:2 XTEN linker
SGSETPGTSESATPES
SEQ ID NO:3 nCas9(D10A)
DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
SEQ ID NO:4 nucleoplasmin NLS
KRPAATKKAGQAKKKK
SEQ ID NO:5 E. coli UDG
ANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESE
SEQ ID NO:6 T2A linker
EGRGSLLTCGDVEENPGP
SEQ ID NO:7 MCP
ASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY
SEQ ID NO:8 SV40 NLS
PKKKRKV
SEQ ID NO:9 OsPCNA(K164R)
MLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTREGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKS
SEQ ID NO:10 truncated Ubiquitin(Ub)
MQIFVKTLTGKTITLEVESSDTIDNVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLADYNIQKESTLHLVLRLR
SEQ ID NO:11 OsAPE(D238A/N240V),dAPE
MKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGALVVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN
SEQ ID NO:12 PCGBE-1 system exemplary polypeptide
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKV
SEQ ID NO:13 PCGBE system exemplary polypeptide (FIG. 11b)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESLKDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKTRDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVSAEGRGSLLTCGDVEENPGPASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGGSPKKKRKVSGSETPGTSESA
TPESPRMLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTREGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKSSGGSMQIFVKTLTGKTITLEVESSDTIDNVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLADYNIQKESTLHLVLRLRSGGSPKKKRKV
SEQ ID NO:14 PCGBE system exemplary polypeptide (FIG. 11c)
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESLKDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKTRDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVSAEGRGSLLTCGDVEENPGPMKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPF
KDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGALVVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSNPKKKRKV
SEQ ID NO:15 mAPE01g(D297A)
SAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSVSRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLAYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKL
SEQ ID NO:16 mAPE12g(D327A)
KRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIAYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN
SEQ ID NO:17 hRad18
DSLAESRWPPGLAVMKTIDDLLRCGICFEYFNIAMIIPQCSHNYCSLCIRKFLSYKTQCPTCCVTVTEPDLKNNRILDELVKSLNFARNHLLQFALESPAKSPASSSSKNLAVKVYTPVASRQSLKQGSRLMDNFLIREMSGSTSELLIKENKSKFSPQKEASPAAKTKETRSVEEIAPDPSEAKRPEPPSTSTLKQVTKVDCPVCGVNIPESHINKHLDSCLSREEKKESLRSSVHKRKPLPKTVYNLLSDRDLKKKLKEHGLSIQGNKQQLIKRHQEFVHMYNAQCDALHPKSAAEIVREIENIEKTRMRLEASKLNESVMVFTKDQTEKEIDEIHSKYRKKHKSEFQLLVDQARKGYKKIAGMSQKTVTITKEDESTEKLSSVCMGQEDNMTSVTNHFSQSKLDSPEELEPDREEDSSSCIDIQEVLSSSESDSCNSSSSDIIRDLLEEEEAWEASHKNDLQDTEISPRQNRRTRAAESAEIEPRNKRNRN
SEQ ID NO:18 APE01g
MSAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSVSRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLDYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKL
SEQ ID NO:19 APE12g
MKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN
SEQ ID NO:20 OsPCNA(wt)
MLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTKEGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKS
SEQ ID NO:21 PCGBE-3 system exemplary fusion protein
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPSAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSV
SRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLAYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKLSGGSPKKKRKV
SEQ ID NO:22 PCGBE-4 system exemplary fusion protein
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFK
DYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIAYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSNSGGSPKKKRKV
SEQ ID NO: 23, exemplary fusion protein of the PCGBE-5 system
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPDSLAESRWPPGLAVMKTIDDLLRCGICFEYFNIAMIIPQCSHNYCSLCIRKFLSYKTQCPTCCVTVTEPDLKNNRILDELVKSLNFARNHLLQFALESPAKSPASSSSKNLAVKVYTPVASRQSLKQGSRLMDNFLIR
EMSGSTSELLIKENKSKFSPQKEASPAAKTKETRSVEEIAPDPSEAKRPEPPSTSTLKQVTKVDCPVCGVNIPESHINKHLDSCLSREEKKESLRSSVHKRKPLPKTVYNLLSDRDLKKKLKEHGLSIQGNKQQLIKRHQEFVHMYNAQCDALHPKSAAEIVREIENIEKTRMRLEASKLNESVMVFTKDQTEKEIDEIHSKYRKKHKSEFQLLVDQARKGYKKIAGMSQKTVTITKEDESTEKLSSVCMGQEDNMTSVTNHFSQSKLDSPEELEPDREEDSSSCIDIQEVLSSSESDSCNSSSSDIIRDLLEEEEAWEASHKNDLQDTEISPRQNRRTRAAESAEIEPRNKRNRNSGGSPKKKRKV
Sequence listing
<110> Shanghai Blue Cross Medical Science Research Institute
<120> Improved CG base editing system
<130> P2022TC1988
<160> 23
<170> PatentIn version 3.5
<210> 1
<211> 199
<212> PRT
<213> Artificial Sequence
<220>
<223> APOBEC3A
<400> 1
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn
195
<210> 2
<211> 16
<212> PRT
<213> Artificial Sequence
<220>
<223> XTEN linker
<400> 2
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
1 5 10 15
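The records in this listing give residues in three-letter code with a row of position numbers beneath each residue row, whereas the fusion proteins earlier in the document are written in one-letter code. The following is a minimal sketch, not part of the original filing, of how a record can be converted for comparison. The helper name `to_one_letter` is hypothetical; the mapping is the standard IUPAC amino acid code, and the input is SEQ ID NO: 2 copied from the record above.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(record_lines):
    """Convert residue rows to a one-letter string.

    Rows consisting only of integers are position-number rows
    and are skipped (hypothetical helper, for illustration only).
    """
    residues = []
    for line in record_lines:
        tokens = line.split()
        if tokens and all(tok.isdigit() for tok in tokens):
            continue  # position-number row such as "1 5 10 15"
        residues.extend(THREE_TO_ONE[tok] for tok in tokens)
    return "".join(residues)


# SEQ ID NO: 2 (XTEN linker), as printed in the listing.
xten = to_one_letter([
    "Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser",
    "1 5 10 15",
])
print(xten, len(xten))  # SGSETPGTSESATPES 16
```

The resulting length can be checked against the `<211>` field of the record (16 for SEQ ID NO: 2).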
<210> 3
<211> 1367
<212> PRT
<213> Artificial Sequence
<220>
<223> nCas9 (D10A)
<400> 3
Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys
1010 1015 1020
Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr
1025 1030 1035
Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn
1040 1045 1050
Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr
1055 1060 1065
Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg
1070 1075 1080
Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu
1085 1090 1095
Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
1100 1105 1110
Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys
1115 1120 1125
Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu
1130 1135 1140
Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser
1145 1150 1155
Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
1160 1165 1170
Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu
1175 1180 1185
Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe
1190 1195 1200
Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu
1205 1210 1215
Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn
1220 1225 1230
Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His
1250 1255 1260
Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg
1265 1270 1275
Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr
1280 1285 1290
Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile
1295 1300 1305
Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe
1310 1315 1320
Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
1325 1330 1335
Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly
1340 1345 1350
Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210> 4
<211> 16
<212> PRT
<213> Artificial Sequence
<220>
<223> nucleoplasmin NLS
<400> 4
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1 5 10 15
<210> 5
<211> 228
<212> PRT
<213> Artificial Sequence
<220>
<223> E. coli UDG
<400> 5
Ala Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1 5 10 15
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln Ser
20 25 30
Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala Phe Arg
35 40 45
Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly Gln Asp Pro
50 55 60
Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe Ser Val Arg Pro
65 70 75 80
Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met Tyr Lys Glu Leu Glu
85 90 95
Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn His Gly Tyr Leu Glu Ser
100 105 110
Trp Ala Arg Gln Gly Val Leu Leu Leu Asn Thr Val Leu Thr Val Arg
115 120 125
Ala Gly Gln Ala His Ser His Ala Ser Leu Gly Trp Glu Thr Phe Thr
130 135 140
Asp Lys Val Ile Ser Leu Ile Asn Gln His Arg Glu Gly Val Val Phe
145 150 155 160
Leu Leu Trp Gly Ser His Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys
165 170 175
Gln Arg His His Val Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala
180 185 190
His Arg Gly Phe Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp
195 200 205
Leu Glu Gln Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro
210 215 220
Ala Glu Ser Glu
225
<210> 6
<211> 18
<212> PRT
<213> Artificial Sequence
<220>
<223> T2A linker
<400> 6
Glu Gly Arg Gly Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro
1 5 10 15
Gly Pro
<210> 7
<211> 116
<212> PRT
<213> Artificial Sequence
<220>
<223> MCP
<400> 7
Ala Ser Asn Phe Thr Gln Phe Val Leu Val Asp Asn Gly Gly Thr Gly
1 5 10 15
Asp Val Thr Val Ala Pro Ser Asn Phe Ala Asn Gly Ile Ala Glu Trp
20 25 30
Ile Ser Ser Asn Ser Arg Ser Gln Ala Tyr Lys Val Thr Cys Ser Val
35 40 45
Arg Gln Ser Ser Ala Gln Asn Arg Lys Tyr Thr Ile Lys Val Glu Val
50 55 60
Pro Lys Gly Ala Trp Arg Ser Tyr Leu Asn Met Glu Leu Thr Ile Pro
65 70 75 80
Ile Phe Ala Thr Asn Ser Asp Cys Glu Leu Ile Val Lys Ala Met Gln
85 90 95
Gly Leu Leu Lys Asp Gly Asn Pro Ile Pro Ser Ala Ile Ala Ala Asn
100 105 110
Ser Gly Ile Tyr
115
<210> 8
<211> 7
<212> PRT
<213> Artificial Sequence
<220>
<223> SV40 NLS
<400> 8
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 9
<211> 263
<212> PRT
<213> Artificial Sequence
<220>
<223> OsPCNA (K164R)
<400> 9
Met Leu Glu Leu Arg Leu Val Gln Gly Ser Leu Leu Lys Lys Val Leu
1 5 10 15
Glu Ala Ile Arg Glu Leu Val Thr Asp Ala Asn Phe Asp Cys Ser Gly
20 25 30
Thr Gly Phe Ser Leu Gln Ala Met Asp Ser Ser His Val Ala Leu Val
35 40 45
Ala Leu Leu Leu Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg
50 55 60
Asn Leu Ser Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg
65 70 75 80
Cys Ala Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser
85 90 95
Asp Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala
100 105 110
Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly Ile
115 120 125
Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser Glu Phe
130 135 140
Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr Val Ile Ile
145 150 155 160
Ser Val Thr Arg Glu Gly Val Lys Phe Ser Thr Ala Gly Asp Ile Gly
165 170 175
Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr Val Asp Lys Pro Glu
180 185 190
Asp Ala Thr Ile Ile Glu Met Gln Glu Pro Val Ser Leu Thr Phe Ala
195 200 205
Leu Arg Tyr Met Asn Ser Phe Thr Lys Ala Ser Pro Leu Ser Glu Gln
210 215 220
Val Thr Ile Ser Leu Ser Ser Glu Leu Pro Val Val Val Glu Tyr Lys
225 230 235 240
Ile Ala Glu Met Gly Tyr Ile Arg Phe Tyr Leu Ala Pro Lys Ile Glu
245 250 255
Glu Asp Glu Glu Met Lys Ser
260
<210> 10
<211> 74
<212> PRT
<213> Artificial Sequence
<220>
<223> truncated Ubiquitin (Ub)
<400> 10
Met Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr Leu Glu
1 5 10 15
Val Glu Ser Ser Asp Thr Ile Asp Asn Val Lys Ala Lys Ile Gln Asp
20 25 30
Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu Ile Phe Ala Gly Lys
35 40 45
Gln Leu Glu Asp Gly Arg Thr Leu Ala Asp Tyr Asn Ile Gln Lys Glu
50 55 60
Ser Thr Leu His Leu Val Leu Arg Leu Arg
65 70
<210> 11
<211> 379
<212> PRT
<213> Artificial Sequence
<220>
<223> OsAPE (D238A/N240V), dAPE
<400> 11
Met Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys
1 5 10 15
Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser
20 25 30
Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly
35 40 45
Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr
50 55 60
Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala
65 70 75 80
Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln
85 90 95
Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro
100 105 110
Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val
115 120 125
Val Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp
130 135 140
Trp Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys
145 150 155 160
Lys Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser
165 170 175
Ser Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser
180 185 190
Phe Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu
195 200 205
Glu Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu
210 215 220
Phe Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Ala Leu Val
225 230 235 240
Val Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser
245 250 255
Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln
260 265 270
Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser
275 280 285
Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp
290 295 300
Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg
305 310 315 320
Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val Ser Glu Lys Leu Lys
325 330 335
Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu
340 345 350
Gly Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu
355 360 365
Glu Val Glu Ala Pro Lys Pro Lys Ser Ser Asn
370 375
<210> 12
<211> 1842
<212> PRT
<213> Artificial Sequence
<220>
<223> exemplary polypeptide of the PCGBE-1 system
<400> 12
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val
1835 1840
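As a consistency check on SEQ ID NO: 12, the component lengths can be summed and compared with its `<211>` value of 1842. The sketch below is illustrative only. The segment order and the two short joining stretches (Arg-Pro after the XTEN linker, and Gly-Thr-Asp-Ser-Gly-Gly-Ser after the nucleoplasmin NLS) are read off the residues of SEQ ID NO: 12 above; calling them "linker" is an assumption of this sketch, not a designation made in the listing. The function name `layout` is hypothetical.

```python
# Component lengths taken from the <211> fields of SEQ ID NOs 1-5 and 8.
# The two "linker" entries are short joining stretches observed in
# SEQ ID NO: 12; their label is an assumption made for this sketch.
SEGMENTS = [
    ("APOBEC3A (SEQ ID NO: 1)", 199),
    ("XTEN linker (SEQ ID NO: 2)", 16),
    ("linker Arg-Pro (assumed label)", 2),
    ("nCas9 (D10A) (SEQ ID NO: 3)", 1367),
    ("nucleoplasmin NLS (SEQ ID NO: 4)", 16),
    ("linker Gly-Thr-Asp-Ser-Gly-Gly-Ser (assumed label)", 7),
    ("E. coli UDG (SEQ ID NO: 5)", 228),
    ("SV40 NLS (SEQ ID NO: 8)", 7),
]


def layout(segments):
    """Return (name, start, end) with 1-based inclusive positions."""
    rows = []
    start = 1
    for name, length in segments:
        end = start + length - 1
        rows.append((name, start, end))
        start = end + 1
    return rows


rows = layout(SEGMENTS)
for name, start, end in rows:
    print(f"{start:>5}-{end:<5} {name}")

total = rows[-1][2]
print("total length:", total)  # total length: 1842
```

Under these assumptions nCas9 occupies positions 218 to 1584 and UDG positions 1608 to 1835, which agrees with the position numbers printed under the corresponding residues of SEQ ID NO: 12.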
<210> 13
<211> 2358
<212> PRT
<213> Artificial Sequence
<220>
<223> exemplary polypeptide of the PCGBE system (FIG. 11b)
<400> 13
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val Ser Ala Glu Gly Arg Gly
1835 1840 1845
Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Ala
1850 1855 1860
Ser Asn Phe Thr Gln Phe Val Leu Val Asp Asn Gly Gly Thr Gly
1865 1870 1875
Asp Val Thr Val Ala Pro Ser Asn Phe Ala Asn Gly Ile Ala Glu
1880 1885 1890
Trp Ile Ser Ser Asn Ser Arg Ser Gln Ala Tyr Lys Val Thr Cys
1895 1900 1905
Ser Val Arg Gln Ser Ser Ala Gln Asn Arg Lys Tyr Thr Ile Lys
1910 1915 1920
Val Glu Val Pro Lys Gly Ala Trp Arg Ser Tyr Leu Asn Met Glu
1925 1930 1935
Leu Thr Ile Pro Ile Phe Ala Thr Asn Ser Asp Cys Glu Leu Ile
1940 1945 1950
Val Lys Ala Met Gln Gly Leu Leu Lys Asp Gly Asn Pro Ile Pro
1955 1960 1965
Ser Ala Ile Ala Ala Asn Ser Gly Ile Tyr Gly Gly Ser Pro Lys
1970 1975 1980
Lys Lys Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
1985 1990 1995
Ser Ala Thr Pro Glu Ser Pro Arg Met Leu Glu Leu Arg Leu Val
2000 2005 2010
Gln Gly Ser Leu Leu Lys Lys Val Leu Glu Ala Ile Arg Glu Leu
2015 2020 2025
Val Thr Asp Ala Asn Phe Asp Cys Ser Gly Thr Gly Phe Ser Leu
2030 2035 2040
Gln Ala Met Asp Ser Ser His Val Ala Leu Val Ala Leu Leu Leu
2045 2050 2055
Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg Asn Leu Ser
2060 2065 2070
Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg Cys Ala
2075 2080 2085
Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser Asp
2090 2095 2100
Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala
2105 2110 2115
Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly
2120 2125 2130
Ile Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser
2135 2140 2145
Glu Phe Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr
2150 2155 2160
Val Ile Ile Ser Val Thr Arg Glu Gly Val Lys Phe Ser Thr Ala
2165 2170 2175
Gly Asp Ile Gly Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr
2180 2185 2190
Val Asp Lys Pro Glu Asp Ala Thr Ile Ile Glu Met Gln Glu Pro
2195 2200 2205
Val Ser Leu Thr Phe Ala Leu Arg Tyr Met Asn Ser Phe Thr Lys
2210 2215 2220
Ala Ser Pro Leu Ser Glu Gln Val Thr Ile Ser Leu Ser Ser Glu
2225 2230 2235
Leu Pro Val Val Val Glu Tyr Lys Ile Ala Glu Met Gly Tyr Ile
2240 2245 2250
Arg Phe Tyr Leu Ala Pro Lys Ile Glu Glu Asp Glu Glu Met Lys
2255 2260 2265
Ser Ser Gly Gly Ser Met Gln Ile Phe Val Lys Thr Leu Thr Gly
2270 2275 2280
Lys Thr Ile Thr Leu Glu Val Glu Ser Ser Asp Thr Ile Asp Asn
2285 2290 2295
Val Lys Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln
2300 2305 2310
Gln Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr
2315 2320 2325
Leu Ala Asp Tyr Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val
2330 2335 2340
Leu Arg Leu Arg Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
2345 2350 2355
<210> 14
<211> 2248
<212> PRT
<213> Artificial Sequence
<220>
<223> schematic polypeptide of PCGBE System (FIG. 11c)
<400> 14
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val Ser Ala Glu Gly Arg Gly
1835 1840 1845
Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Met
1850 1855 1860
Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys
1865 1870 1875
Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp
1880 1885 1890
Ser Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly
1895 1900 1905
Glu Gly Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg
1910 1915 1920
Phe Val Thr Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser
1925 1930 1935
Asp Trp Pro Ala Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp
1940 1945 1950
Val Ile Cys Val Gln Glu Val Arg Met Pro Ala Ala Gly Ser Lys
1955 1960 1965
Gly Ala Pro Lys Asn Pro Gly Gln Leu Lys Asp Asp Thr Ser Ser
1970 1975 1980
Ser Arg Asp Glu Lys Gln Val Val Leu Arg Ala Leu Ser Ser Pro
1985 1990 1995
Pro Phe Lys Asp Tyr Arg Val Trp Trp Ser Leu Ser Asp Ser Lys
2000 2005 2010
Tyr Ala Gly Thr Ala Met Ile Ile Lys Lys Lys Phe Glu Pro Lys
2015 2020 2025
Lys Val Ser Phe Asn Leu Asp Arg Thr Ser Ser Lys His Glu Pro
2030 2035 2040
Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe Leu Leu Leu
2045 2050 2055
Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu Asn Ser
2060 2065 2070
Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe Val
2075 2080 2085
Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Ala Leu Val Val
2090 2095 2100
Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser
2105 2110 2115
Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly
2120 2125 2130
Gln Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile
2135 2140 2145
Leu Ser Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys
2150 2155 2160
Glu Lys Asp Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile
2165 2170 2175
Gly Lys Tyr Arg Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val
2180 2185 2190
Ser Glu Lys Leu Lys Asp Gln Ile Val Ser Cys Asp Ile His Gly
2195 2200 2205
Arg Gly Ile Glu Leu Glu Gly Phe Tyr Gly Ser Asp His Cys Pro
2210 2215 2220
Val Ser Leu Glu Leu Ser Glu Glu Val Glu Ala Pro Lys Pro Lys
2225 2230 2235
Ser Ser Asn Pro Lys Lys Lys Arg Lys Val
2240 2245
<210> 15
<211> 331
<212> PRT
<213> Artificial Sequence
<220>
<223> mAPE01g (D297A)
<400> 15
Ser Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val Ala
1 5 10 15
Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser Gln
20 25 30
His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu Val His
35 40 45
Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys Thr Met Arg
50 55 60
Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys Ile Leu Ser Trp
65 70 75 80
Asn Ile Asn Gly Leu His Asp Val Val Thr Thr Lys Gly Phe Ser Ala
85 90 95
Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp Val Leu Cys Leu Gln Glu
100 105 110
Thr His Leu Glu Glu Lys Asp Val Glu Lys Phe Lys Asn Leu Ile Ala
115 120 125
Asp Tyr Asp Ser Tyr Trp Ser Cys Ser Val Ser Arg Leu Gly Tyr Ser
130 135 140
Gly Thr Ala Val Ile Ser Arg Val Lys Pro Ile Ser Val Gln Tyr Gly
145 150 155 160
Ile Gly Ile Arg Glu His Asp His Glu Gly Arg Val Ile Thr Leu Glu
165 170 175
Phe Asp Gly Phe Tyr Leu Val Asn Ala Tyr Val Pro Asn Ser Gly Arg
180 185 190
Phe Leu Arg Arg Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys Phe
195 200 205
Ser Asn Tyr Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val Ala
210 215 220
Gly Asp Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro Pro
225 230 235 240
Ala Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser Phe
245 250 255
Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys Gln
260 265 270
His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln Arg Ile
275 280 285
Thr Asn Lys Gly Trp Arg Leu Ala Tyr Phe Leu Ala Ser Glu Ser Ile
290 295 300
Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp Val Ser Phe Ser
305 310 315 320
Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu
325 330
<210> 16
<211> 378
<212> PRT
<213> Artificial Sequence
<220>
<223> mAPE12g (D327A)
<400> 16
Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys Lys
1 5 10 15
Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser Leu
20 25 30
Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly Asp
35 40 45
Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr Trp
50 55 60
Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala Phe
65 70 75 80
Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln Glu
85 90 95
Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro Gly
100 105 110
Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val Val
115 120 125
Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp Trp
130 135 140
Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys Lys
145 150 155 160
Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser Ser
165 170 175
Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe
180 185 190
Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu
195 200 205
Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe
210 215 220
Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn Val
225 230 235 240
Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser Ala
245 250 255
Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln Pro
260 265 270
Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser Gln
275 280 285
Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp Met
290 295 300
Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg Gly
305 310 315 320
Lys Arg Met Arg Ile Ala Tyr Phe Leu Val Ser Glu Lys Leu Lys Asp
325 330 335
Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu Gly
340 345 350
Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu Glu
355 360 365
Val Glu Ala Pro Lys Pro Lys Ser Ser Asn
370 375
<210> 17
<211> 494
<212> PRT
<213> Artificial Sequence
<220>
<223> hRad18
<400> 17
Asp Ser Leu Ala Glu Ser Arg Trp Pro Pro Gly Leu Ala Val Met Lys
1 5 10 15
Thr Ile Asp Asp Leu Leu Arg Cys Gly Ile Cys Phe Glu Tyr Phe Asn
20 25 30
Ile Ala Met Ile Ile Pro Gln Cys Ser His Asn Tyr Cys Ser Leu Cys
35 40 45
Ile Arg Lys Phe Leu Ser Tyr Lys Thr Gln Cys Pro Thr Cys Cys Val
50 55 60
Thr Val Thr Glu Pro Asp Leu Lys Asn Asn Arg Ile Leu Asp Glu Leu
65 70 75 80
Val Lys Ser Leu Asn Phe Ala Arg Asn His Leu Leu Gln Phe Ala Leu
85 90 95
Glu Ser Pro Ala Lys Ser Pro Ala Ser Ser Ser Ser Lys Asn Leu Ala
100 105 110
Val Lys Val Tyr Thr Pro Val Ala Ser Arg Gln Ser Leu Lys Gln Gly
115 120 125
Ser Arg Leu Met Asp Asn Phe Leu Ile Arg Glu Met Ser Gly Ser Thr
130 135 140
Ser Glu Leu Leu Ile Lys Glu Asn Lys Ser Lys Phe Ser Pro Gln Lys
145 150 155 160
Glu Ala Ser Pro Ala Ala Lys Thr Lys Glu Thr Arg Ser Val Glu Glu
165 170 175
Ile Ala Pro Asp Pro Ser Glu Ala Lys Arg Pro Glu Pro Pro Ser Thr
180 185 190
Ser Thr Leu Lys Gln Val Thr Lys Val Asp Cys Pro Val Cys Gly Val
195 200 205
Asn Ile Pro Glu Ser His Ile Asn Lys His Leu Asp Ser Cys Leu Ser
210 215 220
Arg Glu Glu Lys Lys Glu Ser Leu Arg Ser Ser Val His Lys Arg Lys
225 230 235 240
Pro Leu Pro Lys Thr Val Tyr Asn Leu Leu Ser Asp Arg Asp Leu Lys
245 250 255
Lys Lys Leu Lys Glu His Gly Leu Ser Ile Gln Gly Asn Lys Gln Gln
260 265 270
Leu Ile Lys Arg His Gln Glu Phe Val His Met Tyr Asn Ala Gln Cys
275 280 285
Asp Ala Leu His Pro Lys Ser Ala Ala Glu Ile Val Arg Glu Ile Glu
290 295 300
Asn Ile Glu Lys Thr Arg Met Arg Leu Glu Ala Ser Lys Leu Asn Glu
305 310 315 320
Ser Val Met Val Phe Thr Lys Asp Gln Thr Glu Lys Glu Ile Asp Glu
325 330 335
Ile His Ser Lys Tyr Arg Lys Lys His Lys Ser Glu Phe Gln Leu Leu
340 345 350
Val Asp Gln Ala Arg Lys Gly Tyr Lys Lys Ile Ala Gly Met Ser Gln
355 360 365
Lys Thr Val Thr Ile Thr Lys Glu Asp Glu Ser Thr Glu Lys Leu Ser
370 375 380
Ser Val Cys Met Gly Gln Glu Asp Asn Met Thr Ser Val Thr Asn His
385 390 395 400
Phe Ser Gln Ser Lys Leu Asp Ser Pro Glu Glu Leu Glu Pro Asp Arg
405 410 415
Glu Glu Asp Ser Ser Ser Cys Ile Asp Ile Gln Glu Val Leu Ser Ser
420 425 430
Ser Glu Ser Asp Ser Cys Asn Ser Ser Ser Ser Asp Ile Ile Arg Asp
435 440 445
Leu Leu Glu Glu Glu Glu Ala Trp Glu Ala Ser His Lys Asn Asp Leu
450 455 460
Gln Asp Thr Glu Ile Ser Pro Arg Gln Asn Arg Arg Thr Arg Ala Ala
465 470 475 480
Glu Ser Ala Glu Ile Glu Pro Arg Asn Lys Arg Asn Arg Asn
485 490
<210> 18
<211> 332
<212> PRT
<213> Artificial Sequence
<220>
<223> APE01g
<400> 18
Met Ser Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val
1 5 10 15
Ala Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser
20 25 30
Gln His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu Val
35 40 45
His Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys Thr Met
50 55 60
Arg Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys Ile Leu Ser
65 70 75 80
Trp Asn Ile Asn Gly Leu His Asp Val Val Thr Thr Lys Gly Phe Ser
85 90 95
Ala Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp Val Leu Cys Leu Gln
100 105 110
Glu Thr His Leu Glu Glu Lys Asp Val Glu Lys Phe Lys Asn Leu Ile
115 120 125
Ala Asp Tyr Asp Ser Tyr Trp Ser Cys Ser Val Ser Arg Leu Gly Tyr
130 135 140
Ser Gly Thr Ala Val Ile Ser Arg Val Lys Pro Ile Ser Val Gln Tyr
145 150 155 160
Gly Ile Gly Ile Arg Glu His Asp His Glu Gly Arg Val Ile Thr Leu
165 170 175
Glu Phe Asp Gly Phe Tyr Leu Val Asn Ala Tyr Val Pro Asn Ser Gly
180 185 190
Arg Phe Leu Arg Arg Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys
195 200 205
Phe Ser Asn Tyr Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val
210 215 220
Ala Gly Asp Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro
225 230 235 240
Pro Ala Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser
245 250 255
Phe Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys
260 265 270
Gln His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln Arg
275 280 285
Ile Thr Asn Lys Gly Trp Arg Leu Asp Tyr Phe Leu Ala Ser Glu Ser
290 295 300
Ile Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp Val Ser Phe
305 310 315 320
Ser Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu
325 330
<210> 19
<211> 379
<212> PRT
<213> Artificial Sequence
<220>
<223> APE12g
<400> 19
Met Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys
1 5 10 15
Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser
20 25 30
Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly
35 40 45
Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr
50 55 60
Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala
65 70 75 80
Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln
85 90 95
Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro
100 105 110
Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val
115 120 125
Val Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp
130 135 140
Trp Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys
145 150 155 160
Lys Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser
165 170 175
Ser Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser
180 185 190
Phe Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu
195 200 205
Glu Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu
210 215 220
Phe Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn
225 230 235 240
Val Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser
245 250 255
Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln
260 265 270
Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser
275 280 285
Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp
290 295 300
Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg
305 310 315 320
Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val Ser Glu Lys Leu Lys
325 330 335
Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu
340 345 350
Gly Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu
355 360 365
Glu Val Glu Ala Pro Lys Pro Lys Ser Ser Asn
370 375
<210> 20
<211> 263
<212> PRT
<213> Artificial Sequence
<220>
<223> OsPCNA (wt)
<400> 20
Met Leu Glu Leu Arg Leu Val Gln Gly Ser Leu Leu Lys Lys Val Leu
1 5 10 15
Glu Ala Ile Arg Glu Leu Val Thr Asp Ala Asn Phe Asp Cys Ser Gly
20 25 30
Thr Gly Phe Ser Leu Gln Ala Met Asp Ser Ser His Val Ala Leu Val
35 40 45
Ala Leu Leu Leu Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg
50 55 60
Asn Leu Ser Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg
65 70 75 80
Cys Ala Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser
85 90 95
Asp Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala
100 105 110
Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly Ile
115 120 125
Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser Glu Phe
130 135 140
Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr Val Ile Ile
145 150 155 160
Ser Val Thr Lys Glu Gly Val Lys Phe Ser Thr Ala Gly Asp Ile Gly
165 170 175
Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr Val Asp Lys Pro Glu
180 185 190
Asp Ala Thr Ile Ile Glu Met Gln Glu Pro Val Ser Leu Thr Phe Ala
195 200 205
Leu Arg Tyr Met Asn Ser Phe Thr Lys Ala Ser Pro Leu Ser Glu Gln
210 215 220
Val Thr Ile Ser Leu Ser Ser Glu Leu Pro Val Val Val Glu Tyr Lys
225 230 235 240
Ile Ala Glu Met Gly Tyr Ile Arg Phe Tyr Leu Ala Pro Lys Ile Glu
245 250 255
Glu Asp Glu Glu Met Lys Ser
260
<210> 21
<211> 2204
<212> PRT
<213> Artificial Sequence
<220>
<223> exemplary fusion protein of PCGBE-3 System
<400> 21
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly
1835 1840 1845
Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Ser
1850 1855 1860
Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val Ala
1865 1870 1875
Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser
1880 1885 1890
Gln His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu
1895 1900 1905
Val His Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys
1910 1915 1920
Thr Met Arg Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys
1925 1930 1935
Ile Leu Ser Trp Asn Ile Asn Gly Leu His Asp Val Val Thr Thr
1940 1945 1950
Lys Gly Phe Ser Ala Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp
1955 1960 1965
Val Leu Cys Leu Gln Glu Thr His Leu Glu Glu Lys Asp Val Glu
1970 1975 1980
Lys Phe Lys Asn Leu Ile Ala Asp Tyr Asp Ser Tyr Trp Ser Cys
1985 1990 1995
Ser Val Ser Arg Leu Gly Tyr Ser Gly Thr Ala Val Ile Ser Arg
2000 2005 2010
Val Lys Pro Ile Ser Val Gln Tyr Gly Ile Gly Ile Arg Glu His
2015 2020 2025
Asp His Glu Gly Arg Val Ile Thr Leu Glu Phe Asp Gly Phe Tyr
2030 2035 2040
Leu Val Asn Ala Tyr Val Pro Asn Ser Gly Arg Phe Leu Arg Arg
2045 2050 2055
Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys Phe Ser Asn Tyr
2060 2065 2070
Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val Ala Gly Asp
2075 2080 2085
Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro Pro Ala
2090 2095 2100
Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser Phe
2105 2110 2115
Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys
2120 2125 2130
Gln His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln
2135 2140 2145
Arg Ile Thr Asn Lys Gly Trp Arg Leu Ala Tyr Phe Leu Ala Ser
2150 2155 2160
Glu Ser Ile Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp
2165 2170 2175
Val Ser Phe Ser Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu
2180 2185 2190
Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
2195 2200
<210> 22
<211> 2251
<212> PRT
<213> Artificial Sequence
<220>
<223> schematic fusion protein of PCGBE-4 System
<400> 22
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly
1835 1840 1845
Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Lys
1850 1855 1860
Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys Lys
1865 1870 1875
Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser
1880 1885 1890
Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu
1895 1900 1905
Gly Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe
1910 1915 1920
Val Thr Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp
1925 1930 1935
Trp Pro Ala Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val
1940 1945 1950
Ile Cys Val Gln Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly
1955 1960 1965
Ala Pro Lys Asn Pro Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser
1970 1975 1980
Arg Asp Glu Lys Gln Val Val Leu Arg Ala Leu Ser Ser Pro Pro
1985 1990 1995
Phe Lys Asp Tyr Arg Val Trp Trp Ser Leu Ser Asp Ser Lys Tyr
2000 2005 2010
Ala Gly Thr Ala Met Ile Ile Lys Lys Lys Phe Glu Pro Lys Lys
2015 2020 2025
Val Ser Phe Asn Leu Asp Arg Thr Ser Ser Lys His Glu Pro Asp
2030 2035 2040
Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe Leu Leu Leu Asn
2045 2050 2055
Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu Asn Ser Phe
2060 2065 2070
Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe Val Gln
2075 2080 2085
Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn Val Ser
2090 2095 2100
His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser Ala
2105 2110 2115
Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln
2120 2125 2130
Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu
2135 2140 2145
Ser Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu
2150 2155 2160
Lys Asp Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly
2165 2170 2175
Lys Tyr Arg Gly Lys Arg Met Arg Ile Ala Tyr Phe Leu Val Ser
2180 2185 2190
Glu Lys Leu Lys Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg
2195 2200 2205
Gly Ile Glu Leu Glu Gly Phe Tyr Gly Ser Asp His Cys Pro Val
2210 2215 2220
Ser Leu Glu Leu Ser Glu Glu Val Glu Ala Pro Lys Pro Lys Ser
2225 2230 2235
Ser Asn Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
2240 2245 2250
<210> 23
<211> 2367
<212> PRT
<213> Artificial Sequence
<220>
<223> schematic fusion protein of PCGBE-5 System
<400> 23
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His
1 5 10 15
Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30
Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met
35 40 45
Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys
50 55 60
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro
65 70 75 80
Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile
85 90 95
Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110
Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
115 120 125
Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140
Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His
145 150 155 160
Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp
165 170 175
Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190
Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln
1010 1015 1020
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
1025 1030 1035
Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp
1040 1045 1050
Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile
1055 1060 1065
Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser
1070 1075 1080
Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr
1085 1090 1095
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
1100 1105 1110
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
1115 1120 1125
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile
1130 1135 1140
Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1145 1150 1155
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
1160 1165 1170
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1175 1180 1185
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala
1190 1195 1200
Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro
1205 1210 1215
Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp
1220 1225 1230
Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys
1250 1255 1260
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
1265 1270 1275
Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly
1280 1285 1290
Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val
1295 1300 1305
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
1310 1315 1320
Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
1325 1330 1335
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1340 1345 1350
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly
1355 1360 1365
Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr
1370 1375 1380
Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1385 1390 1395
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys
1400 1405 1410
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1415 1420 1425
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
1430 1435 1440
Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr
1445 1450 1455
Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu
1460 1465 1470
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
1490 1495 1500
Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile
1505 1510 1515
Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn
1520 1525 1530
Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp
1535 1540 1545
Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu
1550 1555 1560
Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu
1565 1570 1575
Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala
1580 1585 1590
Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala
1595 1600 1605
Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln
1610 1615 1620
Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln
1625 1630 1635
Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala
1640 1645 1650
Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly
1655 1660 1665
Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe
1670 1675 1680
Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met
1685 1690 1695
Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn
1700 1705 1710
His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu
1715 1720 1725
Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala
1730 1735 1740
Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile
1745 1750 1755
Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His
1760 1765 1770
Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val
1775 1780 1785
Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe
1790 1795 1800
Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln
1805 1810 1815
Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu
1820 1825 1830
Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly
1835 1840 1845
Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Asp
1850 1855 1860
Ser Leu Ala Glu Ser Arg Trp Pro Pro Gly Leu Ala Val Met Lys
1865 1870 1875
Thr Ile Asp Asp Leu Leu Arg Cys Gly Ile Cys Phe Glu Tyr Phe
1880 1885 1890
Asn Ile Ala Met Ile Ile Pro Gln Cys Ser His Asn Tyr Cys Ser
1895 1900 1905
Leu Cys Ile Arg Lys Phe Leu Ser Tyr Lys Thr Gln Cys Pro Thr
1910 1915 1920
Cys Cys Val Thr Val Thr Glu Pro Asp Leu Lys Asn Asn Arg Ile
1925 1930 1935
Leu Asp Glu Leu Val Lys Ser Leu Asn Phe Ala Arg Asn His Leu
1940 1945 1950
Leu Gln Phe Ala Leu Glu Ser Pro Ala Lys Ser Pro Ala Ser Ser
1955 1960 1965
Ser Ser Lys Asn Leu Ala Val Lys Val Tyr Thr Pro Val Ala Ser
1970 1975 1980
Arg Gln Ser Leu Lys Gln Gly Ser Arg Leu Met Asp Asn Phe Leu
1985 1990 1995
Ile Arg Glu Met Ser Gly Ser Thr Ser Glu Leu Leu Ile Lys Glu
2000 2005 2010
Asn Lys Ser Lys Phe Ser Pro Gln Lys Glu Ala Ser Pro Ala Ala
2015 2020 2025
Lys Thr Lys Glu Thr Arg Ser Val Glu Glu Ile Ala Pro Asp Pro
2030 2035 2040
Ser Glu Ala Lys Arg Pro Glu Pro Pro Ser Thr Ser Thr Leu Lys
2045 2050 2055
Gln Val Thr Lys Val Asp Cys Pro Val Cys Gly Val Asn Ile Pro
2060 2065 2070
Glu Ser His Ile Asn Lys His Leu Asp Ser Cys Leu Ser Arg Glu
2075 2080 2085
Glu Lys Lys Glu Ser Leu Arg Ser Ser Val His Lys Arg Lys Pro
2090 2095 2100
Leu Pro Lys Thr Val Tyr Asn Leu Leu Ser Asp Arg Asp Leu Lys
2105 2110 2115
Lys Lys Leu Lys Glu His Gly Leu Ser Ile Gln Gly Asn Lys Gln
2120 2125 2130
Gln Leu Ile Lys Arg His Gln Glu Phe Val His Met Tyr Asn Ala
2135 2140 2145
Gln Cys Asp Ala Leu His Pro Lys Ser Ala Ala Glu Ile Val Arg
2150 2155 2160
Glu Ile Glu Asn Ile Glu Lys Thr Arg Met Arg Leu Glu Ala Ser
2165 2170 2175
Lys Leu Asn Glu Ser Val Met Val Phe Thr Lys Asp Gln Thr Glu
2180 2185 2190
Lys Glu Ile Asp Glu Ile His Ser Lys Tyr Arg Lys Lys His Lys
2195 2200 2205
Ser Glu Phe Gln Leu Leu Val Asp Gln Ala Arg Lys Gly Tyr Lys
2210 2215 2220
Lys Ile Ala Gly Met Ser Gln Lys Thr Val Thr Ile Thr Lys Glu
2225 2230 2235
Asp Glu Ser Thr Glu Lys Leu Ser Ser Val Cys Met Gly Gln Glu
2240 2245 2250
Asp Asn Met Thr Ser Val Thr Asn His Phe Ser Gln Ser Lys Leu
2255 2260 2265
Asp Ser Pro Glu Glu Leu Glu Pro Asp Arg Glu Glu Asp Ser Ser
2270 2275 2280
Ser Cys Ile Asp Ile Gln Glu Val Leu Ser Ser Ser Glu Ser Asp
2285 2290 2295
Ser Cys Asn Ser Ser Ser Ser Asp Ile Ile Arg Asp Leu Leu Glu
2300 2305 2310
Glu Glu Glu Ala Trp Glu Ala Ser His Lys Asn Asp Leu Gln Asp
2315 2320 2325
Thr Glu Ile Ser Pro Arg Gln Asn Arg Arg Thr Arg Ala Ala Glu
2330 2335 2340
Ser Ala Glu Ile Glu Pro Arg Asn Lys Arg Asn Arg Asn Ser Gly
2345 2350 2355
Gly Ser Pro Lys Lys Lys Arg Lys Val
2360 2365
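The listing entries above give each residue as a three-letter code, with rows of position numbers in between and the declared length in the <211> field. As a reading aid only, the following is a minimal sketch of how one entry could be collapsed into a one-letter string and checked against its declared length. The function name `parse_listing_entry` and the toy excerpt are assumptions introduced for illustration and are not part of the application; the toy <211> value of 9 refers only to the excerpt, not to SEQ ID NO: 23.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 canonical residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def parse_listing_entry(text):
    """Hypothetical helper: return (declared_length, one_letter_sequence).

    declared_length is the integer from the <211> field, or None if absent.
    """
    declared = None
    residues = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag = re.match(r"<(\d+)>\s*(.*)", line)
        if tag:
            # Numeric identifier fields such as <210>, <211>, <223>, <400>.
            if tag.group(1) == "211":
                declared = int(tag.group(2))
            continue
        tokens = line.split()
        if all(t.isdigit() for t in tokens):
            continue  # row of position numbers, not residues
        # Raises KeyError on any token that is not a canonical residue code.
        residues.extend(THREE_TO_ONE[t] for t in tokens)
    return declared, "".join(residues)

# Toy excerpt: the final residue row of the entry above, with a toy <211>.
toy_entry = """
<211> 9
<400> 23
Gly Ser Pro Lys Lys Lys Arg Lys Val
2360 2365
"""

declared_length, sequence = parse_listing_entry(toy_entry)
print(sequence, len(sequence) == declared_length)
```

Applied to a complete entry, a mismatch between the string length and the <211> value would suggest that residues were lost or duplicated when the listing was extracted.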

Claims (19)

1. A C to G base editing system for editing a target sequence in the genome of a cell, comprising:
a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises i) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG), or ii) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a Rad18 protein; and
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of the cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA.
2. The C to G base editing system of claim 1, further comprising:
i) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein the second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site, and a ubiquitin protein;
ii) a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding the third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);
iii) a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein; and/or
iv) a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein the fifth polypeptide comprises uracil-DNA glycosylase (UDG).
3. The C to G base editing system of claim 2, wherein the C to G base editing system comprises any one selected from the following i) to v):
i) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein the second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site, and a ubiquitin protein;
ii) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding a third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);
iii) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA;
a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site, and a ubiquitin protein; and
a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding a third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);
iv) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein;
v) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a Rad18 protein;
a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and
a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein said fifth polypeptide comprises uracil-DNA glycosylase (UDG).
4. The C to G base editing system of any one of claims 1 to 3, wherein the cytosine deaminase is selected from the group consisting of APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, and human APOBEC3A deaminase;
preferably, the cytosine deaminase is a human APOBEC3A deaminase, e.g., one having the amino acid sequence shown in SEQ ID NO 1.
5. The C to G base editing system of any of claims 1-4 wherein the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9, preferably a Cas9 nickase (nCas9), e.g., the nuclease-inactivated CRISPR effector protein comprises the amino acid sequence shown in SEQ ID NO 3.
6. The C to G base editing system of any one of claims 1 to 5 wherein the UDG has an amino acid sequence set forth in SEQ ID NO 5.
7. The C to G base editing system of any one of claims 2 to 6, wherein the PCNA in which the ubiquitin protein binding site is mutated comprises the amino acid sequence shown in SEQ ID NO 9.
8. The C to G base editing system of any one of claims 2-7 wherein the second polypeptide comprises one ubiquitin protein fused to PCNA with the ubiquitin protein binding site mutated.
9. The C to G base editing system of any one of claims 2-8, wherein the ubiquitin protein is a truncated ubiquitin protein, e.g., the truncated ubiquitin protein comprises the amino acid sequence set forth in SEQ ID NO 10.
10. The C to G base editing system of any one of claims 2 to 9, wherein the second polypeptide further comprises MCP (MS2 coat protein), e.g., the MCP is fused to the N-terminus of the PCNA in which the ubiquitin protein binding site is mutated.
11. The C to G base editing system of claim 10, wherein the MCP comprises an amino acid sequence shown in SEQ ID NO. 7.
12. The C to G base editing system of any one of claims 2-11, wherein the mutated APE:
i) is derived from rice APE01g and comprises the amino acid substitution D297A relative to wild-type APE01g, the amino acid position being referenced to SEQ ID NO 18;
ii) is derived from rice APE12g and comprises the amino acid substitution D327A relative to wild-type APE12g, said amino acid position being referenced to SEQ ID NO 19; or
iii) is derived from rice APE12g and comprises the amino acid substitutions D238A and N240V relative to wild-type APE12g, the amino acid positions being referenced to SEQ ID NO 19.
13. The C to G base editing system of claim 12, wherein the mutated APE comprises the amino acid sequence shown in SEQ ID NO 11, 15 or 16.
14. The C to G base editing system of any one of claims 2-13, wherein the Rad18 protein is a human Rad18 protein, e.g., the Rad18 protein comprises the amino acid sequence set forth in SEQ ID NO 17.
15. The C to G base editing system of any one of claims 1 to 14, wherein the expression construct comprises a nucleotide sequence encoding the amino acid sequence set forth in one of SEQ ID NOs 12-14 and 21-23.
16. A method of producing a genetically modified cell, comprising introducing into a cell the gene editing system of any one of claims 1-15.
17. The method of claim 16, wherein the genetic modification is a substitution of one or more C to G in the target sequence.
18. The method of claim 16 or 17, wherein the cell is derived from, for example, a mammal such as a human, mouse, rat, monkey, dog, pig, sheep, cow, or cat; poultry such as chicken, duck, or goose; or a plant, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, or Arabidopsis.
19. A kit comprising the gene editing system of any one of claims 1-15, and instructions for use.
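As an illustration of the substitution notation used in claim 12 (e.g., D297A, or D238A together with N240V), the following is a minimal Python sketch, not part of the patent. It assumes the usual reading of the notation: wild-type residue, 1-based position in the reference sequence, replacement residue. The helper name `apply_substitutions` and the toy sequence are hypothetical; the actual reference sequences (SEQ ID NO 18 and 19) are not reproduced here.

```python
import re

# Hypothetical helper for illustration only. Parses substitutions written as
# <wild-type residue><1-based position><new residue>, e.g. "D297A".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if a substitution is malformed, out of range, or if the
    residue at the stated position does not match the stated wild-type residue.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognised substitution: {sub!r}")
        wild_type = match.group(1)
        position = int(match.group(2))
        mutant = match.group(3)
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        # Check against the original sequence so that the reference position
        # really carries the residue the notation claims.
        if sequence[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, "
                f"found {sequence[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


# Toy sequence (an assumption for the example, not a sequence from the patent).
toy_reference = "MKDLNQ"
print(apply_substitutions(toy_reference, ["D3A", "N5V"]))  # MKALVQ
```

Checking the wild-type residue before substituting is a deliberate choice: it catches the case where a position is numbered against a different reference sequence than the one supplied.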
CN202210225327.3A 2021-03-09 2022-03-09 Improved CG base editing system Pending CN115109798A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110258351 2021-03-09
CN2021102583512 2021-03-09

Publications (1)

Publication Number Publication Date
CN115109798A true CN115109798A (en) 2022-09-27

Family

ID=83227422

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202280007672.8A Pending CN117043345A (en) 2021-03-09 2022-03-09 Improved CG base editing system
CN202210225327.3A Pending CN115109798A (en) 2021-03-09 2022-03-09 Improved CG base editing system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202280007672.8A Pending CN117043345A (en) 2021-03-09 2022-03-09 Improved CG base editing system

Country Status (2)

Country Link
CN (2) CN117043345A (en)
WO (1) WO2022188816A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018086623A1 (en) * 2016-11-14 2018-05-17 Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences A method for base editing in plants
WO2018165629A1 (en) * 2017-03-10 2018-09-13 President And Fellows Of Harvard College Cytosine to guanine base editor
CN112805385B (en) * 2018-07-24 2023-05-30 苏州齐禾生科生物科技有限公司 Base editor based on human APOBEC3A deaminase and application thereof
EP3966335A4 (en) * 2019-05-07 2023-06-28 Suzhou Qi Biodesign biotechnology Company Limited Improved gene editing system
WO2021042047A1 (en) * 2019-08-30 2021-03-04 The General Hospital Corporation C-to-g transversion dna base editors

Also Published As

Publication number Publication date
CN117043345A (en) 2023-11-10
WO2022188816A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
US11702643B2 (en) System and method for genome editing
CN110526993B (en) Nucleic acid construct for gene editing
CN107027313A (en) For the polynary RNA genome editors guided and the method and composition of other RNA technologies
JP7138712B2 (en) Systems and methods for genome editing
WO2023169454A1 (en) Adenine deaminase and use thereof in base editing
CN110527695B (en) Nucleic acid construct for gene site-directed mutagenesis
WO2020224611A1 (en) Improved gene editing system
WO2021032155A1 (en) Base editing system and use method therefor
CN112048493B (en) Method for enhancing Cas9 and derivative protein-mediated gene manipulation system thereof and application
WO2019149239A1 (en) Improved method for genome editing
CN115667528A (en) Multiplex genome editing method and system
CN112280771A (en) Bifunctional genome editing system and uses thereof
CN112805385B (en) Base editor based on human APOBEC3A deaminase and application thereof
JP7361109B2 (en) Systems and methods for C2c1 nuclease-based genome editing
EP4130257A1 (en) Improved cytosine base editing system
KR20240049838A (en) Improved guided editing system
CN115109798A (en) Improved CG base editing system
WO2024051850A1 (en) Dna polymerase-based genome editing system and method
CN116286742B (en) CasD protein, CRISPR/CasD gene editing system and application thereof in plant gene editing
WO2023227050A1 (en) Method for site-specific insertion of exogenous sequence in genome
CN115052980A (en) Gene editing system derived from Flavobacterium
CN115197958A (en) Method for improving efficiency of genetic transformation and gene editing of plants

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20220921

Address after: Unit E598, 5th Floor, Lecheng Plaza, Phase II, Biomedical Industrial Park, No. 218, Sangtian Street, Suzhou Industrial Park, Suzhou Area, China (Jiangsu) Pilot Free Trade Zone, Suzhou City, Jiangsu Province, 215127

Applicant after: Suzhou Qihe Biotechnology Co.,Ltd.

Address before: Room D340, F3, building 2, No. 2250, Pudong South Road, Pudong New Area, Shanghai 200120

Applicant before: Shanghai Blue Cross Medical Science Research Institute

SE01 Entry into force of request for substantive examination