CN115109798A

CN115109798A - Improved CG base editing system

Info

Publication number: CN115109798A
Application number: CN202210225327.3A
Authority: CN
Inventors: 高彩霞; 王升星
Original assignee: Shanghai Blue Cross Medical Science Research Institute
Current assignee: Suzhou Qihe Biotechnology Co ltd
Priority date: 2021-03-09
Filing date: 2022-03-09
Publication date: 2022-09-27
Also published as: CN117043345A; WO2022188816A1

Abstract

The present invention relates to the field of genetic engineering. In particular, the present invention relates to an improved CG base editing system that enables efficient, accurate in vivo C to G base editing.

Description

Improved CG base editing system

Technical Field

Background

In recent years, with the development of genome editing technology, a large number of base editors have been developed, improved and applied. These include the cytosine base editor (CBE), composed of a Cas9 nickase (nCas9(D10A)) fused to a cytosine deaminase and a uracil glycosylase inhibitor (UGI), and the adenine base editor (ABE), composed of nCas9 fused to an adenosine deaminase. These editors mediate precise C·G-to-T·A and A·T-to-G·C substitutions, respectively, at targeted sites in animal and plant genomes (Komor et al., 2016; Gaudelli et al., 2017; Zong et al., 2018; Li et al., 2018). In 2019, Anzalone et al. constructed the prime editing (PE) system by fusing nCas9(H840A) to a reverse transcriptase (MMLV) and, under the guidance of a pegRNA, achieved substitutions between any types of bases in the genome; however, the editing efficiency of this system is limited by multiple factors, such as the target sequence, the PBS length and the melting temperature (Tm) (Lin et al., 2020). In addition, when CBE and ABE8e (Richter et al., 2020) are combined with the Cas9 variants SpG and SpRY (Walton et al., 2020), efficient C·G-to-T·A and A·T-to-G·C base editing can be achieved at any target site, with efficiencies far higher than those of the PE system; however, these two systems cannot mediate other types of base substitution. New base editing systems therefore still need to be developed to achieve the remaining types of base substitution. Zhao et al. (2020) and Kurt et al. (2020) fused nCas9(D10A) to an APOBEC1 cytosine deaminase and a uracil-DNA glycosylase (UDG) to construct CG base editors and achieved C-to-G transversions in mammalian cells, but these editors also introduced large amounts of byproducts such as C-to-T and C-to-A substitutions and indels.

Furthermore, because the DNA damage repair pathways of animals and plants differ, CG base editing in plants is more prone to generating indel mutations, and a corresponding CG base editing system has therefore not yet been established in plants. There is thus a need to establish a CG base editing system in plants to expand the range of single-base editing, and to further optimize the CG base editing system so that it mediates C-to-G transversions in animal and plant genomes more efficiently and accurately.
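The trade-off between the desired C-to-G product and the byproducts named above (C-to-T, C-to-A, indels) can be summarized as an editing efficiency and an editing purity. As a minimal sketch, assuming each sequencing read at a target cytosine has already been classified into one outcome label, the Python below computes both. The outcome labels, the function name and the exact definitions (efficiency as C-to-G reads over all reads, purity as C-to-G reads over all edited reads) are illustrative assumptions, not definitions taken from this document, and the read counts are invented.

```python
from collections import Counter


def editing_metrics(read_outcomes: list[str]) -> dict[str, float]:
    """Summarize classified reads at one target cytosine.

    Each read is labelled "WT", "C>G", "C>T", "C>A" or "indel" (assumed labels).
    Assumed definitions, for illustration only:
      efficiency = C>G reads / all reads
      purity     = C>G reads / edited reads (every label other than "WT")
    """
    counts = Counter(read_outcomes)
    total = sum(counts.values())
    edited = total - counts["WT"]  # Counter returns 0 for missing labels
    c_to_g = counts["C>G"]
    return {
        "efficiency": c_to_g / total if total else 0.0,
        "purity": c_to_g / edited if edited else 0.0,
        "indel_rate": counts["indel"] / total if total else 0.0,
    }


# Hypothetical read counts, not data from this document.
reads = ["WT"] * 60 + ["C>G"] * 24 + ["C>T"] * 8 + ["C>A"] * 4 + ["indel"] * 4
print(editing_metrics(reads))
# {'efficiency': 0.24, 'purity': 0.6, 'indel_rate': 0.04}
```

Reporting purity alongside efficiency makes the byproduct problem visible: an editor can show substantial overall activity while most of its edits are not C-to-G.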

Brief Description of Drawings

FIG. 1 shows the DNA damage arising from cytosine deamination and its potential repair pathways.

FIG. 2 shows the vector construction maps of A3A-PBE and PCGBE-1 (Plant C to G Base Editing-1).

FIG. 3 shows the types and efficiencies of mutations mediated by A3A-PBE and PCGBE-1 in rice protoplasts.

FIG. 4 shows the mutation types and efficiencies mediated by A3A-PBE and PCGBE-1 at the OsNRT1.1B, OsPDS and OsGRF1 targets in rice callus.

FIG. 5 shows the mutation types and efficiencies mediated by A3A-PBE and PCGBE-1 at the OsAAT and OsSWEET14 targets in rice callus.

FIG. 6 shows the vector construction maps of the optimized PCGBE-2 to PCGBE-6.

FIG. 7 shows the mutation types and efficiencies mediated by PCGBE-1 to PCGBE-6 at the OsNRT1.1B target in rice callus.

FIG. 8 shows the mutation types and efficiencies mediated by PCGBE-1 to PCGBE-6 at the OsPDS target in rice callus.

FIG. 9 shows the mutation types and efficiencies mediated by PCGBE-1 to PCGBE-6 at the OsGRF1 target in rice callus.

FIG. 10 shows a comparison of the C-to-G editing efficiency and editing purity of PCGBE-1 to PCGBE-6.

FIG. 11 shows additional PCGBE binary vector constructs.

FIG. 12 shows the types and efficiencies of mutations mediated in regenerated rice plants by the two PCGBE systems shown in FIG. 11b and FIG. 11c.

Detailed Description

I. Definitions

In the present invention, unless otherwise specified, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Likewise, the terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology used herein are terms and routine procedures widely used in the relevant art. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are described more fully in the following reference: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook"). To aid understanding of the present invention, definitions and explanations of related terms are provided below.

As used herein, the term "and/or" encompasses all combinations of the items linked by the term, as if each combination had been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
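The combinatorial reading of "and/or" can be made concrete: for n linked items it covers every non-empty subset, 2^n - 1 combinations in all. A small Python sketch (the function name is hypothetical) that reproduces the "A, B and/or C" expansion given above:

```python
from itertools import combinations


def and_or_combinations(items: list[str]) -> list[str]:
    """Expand 'X, Y and/or Z' into every non-empty combination of the items."""
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):  # preserves the listed order
            result.append(" and ".join(combo))
    return result


print(and_or_combinations(["A", "B", "C"]))
# ['A', 'B', 'C', 'A and B', 'A and C', 'B and C', 'A and B and C']
```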

"genome" as used herein encompasses not only chromosomal DNA present in the nucleus of a cell, but organelle DNA present in subcellular components of the cell (e.g., mitochondria, plastids).

As used herein, "organism" includes any organism suitable for genome editing, preferably a eukaryote. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis and the like.

By "genetically modified organism" or "genetically modified cell" is meant an organism or cell that comprises within its genome an exogenous polynucleotide or modified gene or expression control sequence. For example, an exogenous polynucleotide can be stably integrated into the genome of an organism or cell and be inherited by successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. The modified gene or expression regulatory sequence is one which comprises single or multiple deoxynucleotide substitutions, deletions and additions in the genome of the organism or cell.

"exogenous" with respect to a sequence means a sequence from a foreign species, or if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.

"polynucleotide", "nucleic acid sequence", "nucleotide sequence" or "nucleic acid fragment" are used interchangeably and are single-or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single letter designation as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents purine (A or G), "Y" represents pyrimidine (C or T), "K" represents G or T, "H" represents A or C or T, "I" represents inosine, and "N" represents any nucleotide.

"polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.

Sequence "identity" has an art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule (see, e.g., Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M. and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., Stockton Press, New York, 1991). Although there are many methods for measuring identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48:1073 (1988)).
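Because several conventions exist, a percent-identity figure (such as the 80% to 99% thresholds recited for the protein sequences in this description) depends on how the alignment is scored. As a minimal sketch, assuming the two sequences have already been aligned by some pairwise aligner and that columns in which both sequences carry a gap are ignored, identity can be computed as identical positions divided by alignment columns. This is one convention among several, not the method prescribed by this document.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already aligned.

    Assumed convention: identical, non-gap positions divided by the number of
    alignment columns, ignoring columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment contains no scorable columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # 71.4
```

The function does not align anything itself; changing the gap convention (for example, dividing by the length of the shorter ungapped sequence instead) changes the reported value for the same pair.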

When the term "comprising" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence or may have additional amino acids or nucleotides at one or both ends, while still possessing the activity described herein. Furthermore, it is clear to the skilled person that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in certain practical cases (e.g., during expression in a particular expression system) without substantially affecting the function of the polypeptide. Thus, when a particular polypeptide amino acid sequence is described in the specification and claims of this application, even if it does not contain the N-terminal methionine encoded by the start codon, the sequence containing that methionine is also encompassed herein, and accordingly the encoding nucleotide sequence may also contain the start codon; and vice versa.

Suitable conservative amino acid substitutions in peptides or proteins are known to those skilled in the art and can generally be made without altering the biological activity of the resulting molecule. In general, those skilled in the art recognize that single amino acid substitutions in non-essential regions of a polypeptide do not substantially alter biological activity (see, e.g., Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).

As used herein, the term "CRISPR effector protein" generally refers to nucleases found in naturally occurring CRISPR systems, as well as modified forms, variants and catalytically active fragments thereof. The term encompasses any effector protein capable of gene targeting (e.g., gene editing, targeted gene regulation) within a cell based on a CRISPR system. The CRISPR effector proteins described herein may, for example, be selected from Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Csx11, Csx10, Csf1, Cas9, Csn2, Cas4, Cpf1, C2c1, C2c3 or C2c2 proteins, or functional variants of these nucleases. Examples of "CRISPR effector proteins" include Cas9 nuclease or variants thereof. The Cas9 nuclease may be derived from different species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to an RNA-guided nuclease comprising a Cas9 protein or a fragment thereof (e.g., a protein comprising the active DNA cleavage domain and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated proteins) genome editing system, which can target and cleave a DNA target sequence under the direction of a guide RNA to form a DNA double-strand break (DSB). Examples of "CRISPR effector proteins" may also include Cpf1 nuclease or variants thereof, such as high-specificity variants. The Cpf1 nuclease may be derived from different species, such as the Cpf1 nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.

As used herein, "expression construct" refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "expression" refers to the production of a functional product. For example, expression of a nucleotide sequence can refer to transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or translation of the RNA into a precursor or mature protein.

The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, may be an RNA (e.g., mRNA) capable of translation.

An "expression construct" of the invention may comprise regulatory sequences and nucleotide sequences of interest of different origin, or regulatory sequences and nucleotide sequences of interest of the same origin but arranged in a manner different from that normally found in nature.

"regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence), intermediate, or downstream (3' non-coding sequence) of a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.

"promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.

"constitutive promoter" refers to a promoter that will generally cause a gene to be expressed in most cell types under most circumstances. "tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in a tissue or organ, but may also be expressed in a particular cell or cell type. "developmentally regulated promoter" refers to a promoter whose activity is determined by a developmental event. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signal, etc.).

Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters. Examples of pol I promoters include the chicken RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the cytomegalovirus immediate early (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the simian virus 40 (SV40) immediate early promoter. Examples of pol III promoters include the U6 and H1 promoters. Inducible promoters such as the metallothionein promoter may be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter and the Sp6 phage promoter. When used in plants, the promoter may be the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the maize U3 promoter or the rice actin promoter.

As used herein, the term "operably linked" refers to a regulatory element (such as, but not limited to, a promoter sequence, a transcription termination sequence, and the like) linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.

"introducing" a nucleic acid molecule (e.g., a plasmid, linear nucleic acid fragment, RNA, etc.) or a protein into an organism refers to transforming cells of the organism with the nucleic acid or protein such that the nucleic acid or protein is capable of functioning in the cells. "transformation" as used herein includes both stable transformation and transient transformation.

"Stable transformation" refers to the introduction of an exogenous nucleotide sequence into a genome, resulting in the stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generation thereof.

"transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell that performs a function without stable inheritance of a foreign gene. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.

II. C to G base editing system

The present invention provides a C to G base editing system for editing a target sequence in the genome of a cell, comprising:

a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises i) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG), or ii) a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a Rad18 protein; and

a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of the cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA.

In some embodiments, the C to G base editing system further comprises:

i) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein the second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein;

ii) a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding the third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);

iii) a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein; and/or

iv) a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein the fifth polypeptide comprises uracil-DNA glycosylase (UDG).

In some embodiments, the C to G base editing system comprises:

a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);

a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA; and

a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein the second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein.

In some embodiments, the C to G base editing system comprises:

a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease inactivated CRISPR effector protein and a uracil-DNA glycosylase (UDG);

a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding a third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE).

In some embodiments, the C to G base editing system comprises:

a guide RNA capable of targeting the first polypeptide to a target sequence in the genome of a cell and/or an expression construct comprising a nucleotide sequence encoding a guide RNA;

a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide, wherein the second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein; and

a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding the third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE).

In some embodiments, the C to G base editing system comprises:

a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein.

In some embodiments, the C to G base editing system comprises:

a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease inactivated CRISPR effector protein, and a Rad18 protein;

a fifth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fifth polypeptide, wherein said fifth polypeptide comprises uracil-DNA glycosylase (UDG).

In some embodiments, the expression construct comprising a nucleotide sequence encoding a first polypeptide, the expression construct comprising a nucleotide sequence encoding a second polypeptide, the expression construct comprising a nucleotide sequence encoding a third polypeptide, the expression construct comprising a nucleotide sequence encoding a fourth polypeptide, the expression construct comprising a nucleotide sequence encoding a fifth polypeptide, and/or the expression construct comprising a nucleotide sequence encoding a guide RNA may be different expression constructs, or any two, any three, or all of them may be the same expression construct. In some embodiments, the first polypeptide is isolated, the second polypeptide is isolated, the third polypeptide is isolated, the fourth polypeptide is isolated, the fifth polypeptide is isolated and/or the guide RNA is isolated.

In some embodiments, the gene editing system comprises at least an expression construct comprising a nucleotide sequence encoding the first polypeptide, a nucleotide sequence encoding a self-cleaving peptide, and a nucleotide sequence encoding the second, third, fourth, or fifth polypeptide linked in-frame.

As used herein, "self-cleaving peptide" means a peptide that can achieve self-cleavage within a cell. For example, the self-cleaving peptide may include a protease recognition site so as to be recognized and specifically cleaved by a protease within the cell.

Alternatively, the self-cleaving peptide may be a 2A polypeptide. 2A polypeptides are a class of short virus-derived peptides whose self-cleavage occurs during translation. When two polypeptides of interest are expressed in-frame and linked by a 2A polypeptide, the two polypeptides are produced at a ratio of approximately 1:1. Commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus and F2A from foot-and-mouth disease virus. Among them, P2A is preferred because it has the highest cleavage efficiency. A variety of functional variants of these 2A polypeptides are also known in the art and may be used in the present invention.
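The approximately 1:1 ratio follows from both proteins being translated from a single open reading frame. The toy Python model below joins two placeholder protein sequences through P2A (ATNFSLLKQAGDVEENPGP, the porcine teschovirus-1 2A peptide) and then splits the product at the NPG|P junction, where ribosomal skipping leaves the glycine on the upstream protein and the proline at the start of the downstream one. It is a string model only: the domain sequences and function names are hypothetical, and real cleavage efficiency is not modeled.

```python
# P2A peptide from porcine teschovirus-1; skipping occurs between its final G and P.
P2A = "ATNFSLLKQAGDVEENPGP"


def link_with_p2a(upstream: str, downstream: str) -> str:
    """Concatenate two protein sequences in-frame through a P2A peptide."""
    return upstream + P2A + downstream


def skip_at_2a(polyprotein: str, motif: str = "NPGP") -> list[str]:
    """Split a translated polyprotein at each NPG|P junction.

    The glycine stays on the upstream product and the proline starts the next
    one. Naturally occurring NPGP motifs inside a domain would also be split,
    so this is only a toy model.
    """
    products, start = [], 0
    idx = polyprotein.find(motif)
    while idx != -1:
        cut = idx + 3  # position just after N-P-G
        products.append(polyprotein[start:cut])
        start = cut
        idx = polyprotein.find(motif, cut)
    products.append(polyprotein[start:])
    return products


# Placeholder sequences standing in for, e.g., a first and a second polypeptide.
print(skip_at_2a(link_with_p2a("MKLV", "MSTQ")))
# ['MKLVATNFSLLKQAGDVEENPG', 'PMSTQ']
```

The sketch also shows a practical consequence of 2A linkage: the upstream protein carries a C-terminal 2A remnant and the downstream protein an extra N-terminal proline.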

As used herein, the term "cytosine deaminase" refers to a deaminase that accepts single-stranded DNA as a substrate and catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. Examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1 and human APOBEC3A deaminase. In some embodiments, the cytosine deaminase is a human APOBEC3A deaminase, e.g., one whose amino acid sequence is set forth in SEQ ID NO: 1. In some embodiments, the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 1, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 1, while substantially retaining the function of the protein set forth in SEQ ID NO: 1.

As used herein, a "nuclease-inactivated CRISPR effector protein" refers to a CRISPR effector protein that lacks double-stranded nucleic acid cleavage activity but retains gRNA-directed DNA targeting ability. CRISPR effector proteins lacking double-stranded nucleic acid cleavage activity also encompass nickases, which form a nick in a double-stranded nucleic acid molecule without completely cleaving it. In some preferred embodiments of the invention, the nuclease-inactivated CRISPR effector protein of the invention has nickase activity.

In some embodiments, the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9. The DNA cleavage domain of Cas9 nuclease is known to comprise two subdomains: the HNH nuclease subdomain and the RuvC subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, while the RuvC subdomain cleaves the non-complementary strand. Mutations in these subdomains can inactivate the nuclease activity of Cas9, yielding a "nuclease-inactivated Cas9", which still retains gRNA-directed DNA binding ability. Thus, in principle, when fused to another protein, nuclease-inactivated Cas9 can target that protein to almost any DNA sequence adjacent to a suitable PAM simply by co-expression with a suitable guide RNA.
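To make the targeting requirement concrete: for SpCas9-derived editors the guide RNA defines a protospacer (typically 20 nt) that must lie immediately 5' of an NGG PAM, and a cytosine deaminase fused to it typically acts only on cytosines within a limited window of the protospacer. The sketch below scans only the top strand of a DNA sequence for such sites. The function name is hypothetical, and the default window (positions 4 to 8, counted from the PAM-distal end) is a placeholder assumption, not a value reported for the editors in this document.

```python
def candidate_c_sites(dna: str, spacer_len: int = 20,
                      window: tuple[int, int] = (4, 8)) -> list[tuple[int, str, list[int]]]:
    """Find top-strand protospacers followed by an NGG PAM and list their window Cs.

    Protospacer positions are numbered 1..spacer_len from the PAM-distal end.
    The default window (4, 8) is a placeholder assumption.
    Returns (0-based protospacer start, protospacer, C positions in window).
    """
    dna = dna.upper()
    hits = []
    for pam_start in range(spacer_len, len(dna) - 2):
        if dna[pam_start + 1:pam_start + 3] != "GG":  # the N of NGG can be any base
            continue
        proto = dna[pam_start - spacer_len:pam_start]
        cs = [i + 1 for i, base in enumerate(proto)
              if base == "C" and window[0] <= i + 1 <= window[1]]
        if cs:
            hits.append((pam_start - spacer_len, proto, cs))
    return hits


site = "AAACCC" + "A" * 14 + "TGG"  # hypothetical 20-nt protospacer plus TGG PAM
print(candidate_c_sites(site))  # one hit starting at index 0, window Cs at 4, 5, 6
```

A full scan would also examine the reverse complement, since the PAM can lie on either strand; that is omitted here for brevity.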

The nuclease-inactivated Cas9 according to the invention may be derived from Cas9 of different species, e.g., Streptococcus pyogenes Cas9 (SpCas9) or Staphylococcus aureus Cas9 (SaCas9). Mutating both the HNH nuclease subdomain and the RuvC subdomain of Cas9 (e.g., introducing the mutations D10A and H840A) inactivates the nuclease, yielding nuclease-dead Cas9 (dCas9). Inactivating only one of the subdomains confers nickase activity, yielding a Cas9 nickase (nCas9), e.g., nCas9 carrying only the D10A mutation. Thus, in some embodiments of the invention, the nuclease-inactivated Cas9 of the invention comprises the amino acid substitutions D10A and/or H840A relative to wild-type Cas9. In some embodiments of the invention, the nuclease-inactivated Cas9 may further comprise additional mutations. For example, nuclease-inactivated SpCas9 may also comprise the EQR, VQR or VRER mutations, and SaCas9 may also comprise the KKH mutations (Kim et al., Nat. Biotechnol. 35, 371-376).
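Substitutions such as D10A and H840A are written as the wild-type residue, its position in a reference sequence, and the new residue; the same notation is used in this description for K164R in PCNA and D297A/D327A in the rice AP endonucleases. A minimal Python sketch (the helper name and the toy sequence are hypothetical) that applies such substitutions and rejects any whose wild-type residue does not match the reference, a common symptom of numbering against the wrong sequence:

```python
import re


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply point substitutions such as "D10A" (1-based positions).

    Raises ValueError when the stated wild-type residue does not match the
    sequence, which catches numbering against the wrong reference.
    """
    residues = list(protein)
    for mut in mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            raise ValueError(f"unrecognized substitution: {mut}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues) or residues[pos - 1] != wt:
            raise ValueError(f"{mut}: residue {pos} of the reference is not {wt}")
        residues[pos - 1] = new
    return "".join(residues)


toy = "MDKKYSIGLD"  # hypothetical 10-residue toy sequence
print(apply_substitutions(toy, ["D10A"]))  # MDKKYSIGLA
```

Applying H840A to this toy sequence raises ValueError because position 840 does not exist; the check is what makes the helper useful when switching between reference sequences.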

In some embodiments of the invention, the nuclease-inactivated CRISPR effector protein comprises the amino acid sequence set forth in SEQ ID No. 3.

As used herein, uracil-DNA glycosylase (UDG), also called uracil-N-glycosylase (UNG), refers to an enzyme that recognizes a uracil (U) base in DNA and cleaves its N-glycosidic bond to form an apurinic/apyrimidinic (AP) site. The UDG may be of different origins, for example from e. In some embodiments, the UDG has the amino acid sequence shown in SEQ ID NO: 5. In some embodiments, the UDG comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 5, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 5, while substantially retaining the function of the protein set forth in SEQ ID NO: 5.

In some embodiments of the invention, the cytosine deaminase in the first polypeptide is fused to the N-terminus of the nuclease-inactivated CRISPR effector protein.

In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the UDG in the first polypeptide are directly linked. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the Rad18 protein in the first polypeptide are directly linked. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the UDG in the first polypeptide are linked by a linker. In some embodiments of the invention, the cytosine deaminase, the nuclease-inactivated CRISPR effector protein and/or the Rad18 protein in the first polypeptide are linked by a linker. The linker may be a non-functional amino acid sequence without secondary or higher-order structure, 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length. For example, the linker may be a flexible linker, such as the XTEN linker shown in SEQ ID NO: 2.

In the present invention, the Proliferating Cell Nuclear Antigen (PCNA) may be PCNA derived from various species. In some embodiments, the PCNA is derived from a species that requires base editing. In some embodiments, the ubiquitin protein binding site of PCNA is mutated to prevent it from being ubiquitinated by endogenous systems within the cell. It is within the ability of the person skilled in the art to identify and/or mutate the ubiquitin protein binding site of PCNA to prevent its natural ubiquitination. In some embodiments, the PCNA is rice PCNA. The wild type rice PCNA contains, for example, the amino acid sequence shown in SEQ ID NO: 20. In some embodiments, the PCNA amino acid sequence in which the ubiquitin protein binding site is mutated comprises the amino acid substitution K164R with respect to wild-type PCNA, the amino acid position being referenced to SEQ ID NO: 20. In some embodiments, the PCNA in which the ubiquitin protein binding site is mutated comprises the amino acid sequence set forth in SEQ ID NO. 9.

In some embodiments, the second polypeptide comprises one ubiquitin protein fused to the PCNA with the ubiquitin protein binding site mutated (monoubiquitination). The ubiquitin proteins can be ubiquitin proteins from various species. In some embodiments, the ubiquitin protein is derived from a species that requires base editing. In some embodiments, the ubiquitin protein is a rice ubiquitin protein. In some embodiments, the one ubiquitin protein is a truncated ubiquitin protein. In some embodiments, the truncated ubiquitin protein comprises only the N-terminal functional domain of the ubiquitin protein. In some embodiments, the truncated ubiquitin protein comprises the amino acid sequence set forth in SEQ ID NO. 10. In some embodiments, the ubiquitin protein is fused to the C-terminus of the PCNA in which the ubiquitin protein binding site is mutated.

In some embodiments, the second polypeptide further comprises MCP (MS2 coat protein); e.g., the MCP is fused to the N-terminus of the PCNA in which the ubiquitin protein binding site is mutated. An exemplary MCP comprises the amino acid sequence shown in SEQ ID NO: 7. In some embodiments, the MCP comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to SEQ ID NO: 7, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 7, while substantially retaining the function of the protein set forth in SEQ ID NO: 7. Accordingly, in some embodiments, the guide RNA comprises an MS2 sequence bound by the MCP, so that the second polypeptide can be recruited to the target site.

In some embodiments of the invention, the mutated PCNA, the ubiquitin protein and, optionally, the MCP in the second polypeptide are linked directly to each other. In other embodiments of the invention, the mutated PCNA, the ubiquitin protein and, optionally, the MCP in the second polypeptide are linked by a linker. The linker may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length that has no secondary or higher-order structure.

"AP endonuclease," "AP lyase," and "apurinic/apyrimidinic lyase" are used interchangeably herein to refer to enzymes capable of recognizing an apurinic or apyrimidinic (AP) site on a nucleic acid and cleaving the nucleic acid at that site. The AP endonuclease can be derived from various species. In some embodiments, the AP endonuclease is derived from a species that requires base editing. In some embodiments, the AP endonuclease is a rice AP endonuclease. In some embodiments, the AP endonuclease is rice APE01g. Wild-type rice APE01g comprises the amino acid sequence shown in SEQ ID NO: 18. In some embodiments, the AP endonuclease is rice APE12g. Wild-type rice APE12g comprises the amino acid sequence shown in SEQ ID NO: 19.

In some embodiments, the AP endonuclease is mutated to inactivate it, e.g., such that it retains substrate binding activity but loses catalytic activity.

In some embodiments, the mutant AP lyase is derived from rice APE01g and comprises the amino acid substitution D297A, relative to wild-type APE01g, the amino acid position being referenced to SEQ ID NO: 18. In some embodiments, the mutant AP lyase comprises the amino acid sequence set forth in SEQ ID NO. 15.

In some embodiments, the mutant AP lyase is derived from rice APE12g and comprises the amino acid substitution D327A with respect to wild-type APE12g, the amino acid position being referenced to SEQ ID NO 19. In some embodiments, the mutated AP lyase comprises the amino acid sequence set forth in SEQ ID NO 16.

In some embodiments, the mutant AP lyase is derived from rice APE12g and comprises the amino acid substitutions D238A and N240V with respect to wild-type APE12g, the amino acid positions being referenced to SEQ ID NO 19. In some embodiments, the mutant AP lyase comprises the amino acid sequence set forth in SEQ ID NO 11.

In the present invention, the Rad18 protein may be Rad18 protein from various species. In some embodiments, the Rad18 protein is a human Rad18 protein. In some embodiments, the Rad18 protein comprises the amino acid sequence set forth in SEQ ID NO 17. In some embodiments, the Rad18 protein comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% sequence identity to SEQ ID No. 17, or an amino acid sequence thereof having one or more conservative amino acid substitutions relative to SEQ ID No. 17, but substantially retains the function of the Rad18 protein set forth in SEQ ID No. 17.

As used herein, "gRNA" and "guide RNA" are used interchangeably to refer to an RNA molecule that can form a complex with a CRISPR nuclease and, owing to some complementarity with a target sequence, target the complex to that target sequence. For example, in Cas9-based gene editing systems, the gRNA typically consists of crRNA and tracrRNA molecules that are partially complementary and form a complex, where the crRNA comprises a sequence sufficiently complementary to the target sequence to hybridize to it and direct the CRISPR complex (Cas9 + crRNA + tracrRNA) to bind specifically to the target sequence. It is also known in the art to design single guide RNAs (sgRNAs) that combine the features of the crRNA and the tracrRNA. In Cpf1-based genome editing systems, by contrast, the gRNA typically consists only of a mature crRNA molecule (also referred to as an sgRNA), where the crRNA comprises a sequence sufficiently identical to the target sequence to hybridize to the complement of the target sequence and direct the complex (Cpf1 + crRNA) to bind specifically to the target sequence. It is within the ability of one skilled in the art to design suitable gRNAs based on the CRISPR nuclease used and the target sequence to be edited. As used herein, a "target sequence" is a sequence that is complementary or identical (depending on the CRISPR nuclease) to the guide sequence of about 20 nucleotides contained in a guide RNA. Guide RNAs target a target sequence by base pairing with the target sequence or its complementary strand.
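To make the notion of a target sequence concrete, the following Python sketch scans a DNA sequence for candidate protospacers. It assumes an SpCas9-type nuclease that recognizes a 5'-NGG-3' PAM immediately 3' of a 20-nt protospacer; other nucleases (e.g., Cpf1, SpG, SpRY) use different PAMs or orientations. The function names are hypothetical and are not part of the invention.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for every NGG PAM site.

    `start` is the 0-based position of the protospacer on the strand that
    was scanned ('+' = input sequence, '-' = its reverse complement).
    Assumes an SpCas9-style 5'-[protospacer][NGG]-3' layout.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(strand_seq):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


site = "ACGTACGTACGTACGTACGT"
print(find_spcas9_targets("AAAAA" + site + "TGG" + "AAAAA"))
# [('+', 5, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

The lookahead matters because candidate sites frequently overlap; a plain `finditer` without it would skip any site starting inside a previous match.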

In some embodiments of the invention, the editing results in one or more nucleotide substitutions C to G in the target sequence.

In some embodiments of the invention, the polypeptide of the invention further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the polypeptide should be of sufficient strength to drive accumulation of the polypeptide in the cell nucleus in an amount sufficient to perform its gene editing function. The strength of nuclear localization activity is generally determined by the number and location of the NLSs in the polypeptide, the specific NLS or NLSs used, or a combination of these factors. In some embodiments of the invention, the NLS of the polypeptide may be located at the N-terminus, the C-terminus and/or within the polypeptide. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. When there is more than one NLS, each can be selected independently of the others. In some embodiments, the NLS comprises the amino acid sequence set forth in SEQ ID NO: 4 or 8.

In addition, the polypeptides of the invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, etc., depending on the DNA location to be edited.

In some embodiments, the first polypeptide comprises the amino acid sequence set forth in SEQ ID NO 12. In some embodiments, a fusion protein comprising a first polypeptide and a second polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO 13. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 14. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 21. In some embodiments, a fusion protein comprising a first polypeptide and a third polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO: 22. In some embodiments, a fusion protein comprising a first polypeptide and a fourth polypeptide fused by a self-cleaving peptide (T2A) comprises the amino acid sequence set forth in SEQ ID NO 23.

In order to obtain efficient expression in a cell, in some embodiments of the invention, the nucleotide sequence encoding the polypeptide is codon optimized for the organism from which the cell to be subjected to gene editing is derived.

Codon optimization refers to modifying a nucleic acid sequence to enhance its expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Genes can thus be tailored for optimal expression in a given organism. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura Y. et al., "Codon usage tabulated from international DNA sequence databases: status for the year 2000," Nucl. Acids Res., 28:292 (2000).
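A minimal sketch of the simplest strategy described above, replacing every codon with the most frequently used synonymous codon of the host, is shown below. The usage values in the example are invented for illustration; real values would come from a codon usage table such as the database cited above, and practical optimization tools usually also weigh GC content, repeats and other sequence features.

```python
from itertools import product

# Standard genetic code, built from the TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    """Translate a coding sequence (length must be a multiple of 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimize_codons(cds: str, usage: dict[str, float]) -> str:
    """Replace each codon by the synonymous codon with the highest usage.

    `usage` maps codon -> host usage frequency (e.g. per thousand codons);
    codons missing from the table are treated as frequency 0. Ties keep
    the first codon in TCAG order. The encoded protein is unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        out.append(max(SYNONYMS[aa], key=lambda c: usage.get(c, 0.0)))
    return "".join(out)


# Hypothetical host usage values, for illustration only.
usage = {"GCC": 30.0, "GCT": 10.0, "AAG": 25.0, "AAA": 5.0}
print(optimize_codons("GCTAAATAA", usage))  # GCCAAGTAA
```

Because the replacement codon is always drawn from the synonym list of the original amino acid, `translate()` of the input and of the output give the same protein.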

The organism from which the cells that can be subjected to gene editing by the system of the invention are derived is preferably a eukaryote, including, but not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows, cats; poultry such as chicken, duck, goose; plants include monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the organism is a plant, more preferably rice.

As used herein, "editing system" refers to the combination of components required for base editing of a genome within a cell or organism. The individual components of the system, such as the one or more polypeptides or expression constructs encoding them, and the one or more guide RNAs or expression constructs encoding them, may each be present independently or may be present together in any combination as a composition.

Method for modifying target sequence in cell genome

In another aspect, the invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing into the cell the base editing system of the invention.

In some embodiments, the modification results in one or more C-to-G nucleotide substitutions in the target sequence. In some embodiments, the modification does not include insertion and/or deletion mutations.

In another aspect, the invention also provides a method of producing a genetically modified cell comprising introducing into said cell a gene editing system of the invention.

In another aspect, the invention also provides a genetically modified organism comprising the genetically modified cell produced by the method of the invention or progeny cells thereof.

In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, thereby modifying the function or the expression of the gene. Modifications in the cellular target sequence may be detected by the T7 endonuclease I (T7EI) assay, the PCR/restriction enzyme (PCR/RE) assay, or sequencing.

In the method of the present invention, the base editing system can be introduced into cells by various methods well known to those skilled in the art.

Methods that can be used to introduce the base editing system of the invention into a cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia, adenovirus, adeno-associated virus, lentivirus and other viruses), particle gun methods, PEG-mediated transformation of protoplasts, Agrobacterium tumefaciens-mediated transformation.

The cells that can be base-edited by the method of the present invention may be derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cow, cat; poultry such as chicken, duck, goose; plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the cell is a plant cell, such as a rice cell.

In some embodiments, the cell is a proliferating and/or differentiating cell. In some embodiments, the cell is a meristematic cell. In some embodiments, the cell is a callus cell.

In some embodiments, the methods of the invention are performed in vitro. For example, the cell is an isolated cell, or a cell in an isolated tissue or organ.

In other embodiments, the methods of the invention may also be performed in vivo. For example, the cell is a cell within an organism into which the system of the invention can be introduced in vivo by, for example, viral or Agrobacterium tumefaciens mediated methods.

Application in plants

The base editing system and the method of modifying a target sequence in the genome of a cell of the present invention are particularly suitable for genetically modifying plants. Preferably, the plant is a crop plant, including but not limited to wheat, rice, maize, soybean, sunflower, sorghum, canola, alfalfa, cotton, barley, millet, sugarcane, tomato, tobacco, cassava, and potato. More preferably, the plant is rice.

In another aspect, the invention provides a method of producing a genetically modified plant, comprising introducing the base editing system of the invention into at least one plant, thereby resulting in one or more C-to-G substitutions in a target sequence in the genome of the at least one plant.

In some embodiments, the method further comprises screening the at least one plant for plants having the desired one or more C to G substitutions.

In the method of the present invention, the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce the base editing system of the present invention into a plant include, but are not limited to: particle gun method, PEG mediated protoplast transformation, Agrobacterium tumefaciens mediated transformation, plant virus mediated transformation, pollen tube channel method and ovary injection method. Preferably, the base editing system is introduced into the plant by transient transformation.

In the method of the present invention, modification of the target sequence can be achieved simply by introducing or producing the polypeptide and guide RNA in a plant cell, and the modification can be stably inherited without stably transforming a plant with an exogenous polynucleotide encoding a component of the base editing system. This avoids potential off-target effects of stably present (continually produced) base editing compositions and also avoids integration of exogenous nucleotide sequences in the plant genome, thereby providing greater biosafety.

In some preferred embodiments, the introduction is performed in the absence of selective pressure, thereby avoiding integration of the exogenous nucleotide sequence in the plant genome.

In some embodiments, the introducing comprises transforming an isolated plant cell or tissue with the base editing system of the invention and then regenerating the transformed plant cell or tissue into a whole plant. Preferably, the regeneration is carried out in the absence of selective pressure, i.e., without using any selection agent for the selection gene carried on the expression vector during the tissue culture process. The regeneration efficiency of the plant can be improved without using a selection agent, and a modified plant without an exogenous nucleotide sequence can be obtained.

In other embodiments, the base editing system of the invention can be transformed into a specific site on an intact plant, such as a leaf, stem tip, pollen tube, young ear, or hypocotyl. This is particularly suitable for the transformation of plants which are difficult to regenerate by tissue culture.

In some embodiments, the system is introduced into a proliferating and/or differentiating plant tissue or cell. In some embodiments, the system is introduced into a meristem. In some embodiments, the system is introduced into callus.

In some embodiments of the invention, in vitro expressed protein and/or in vitro transcribed RNA molecules (e.g., the expression construct is an in vitro transcribed RNA molecule) are transformed directly into the plant. The protein and/or RNA molecules effect base editing in the plant cells and are subsequently degraded by the cells, thereby avoiding integration of foreign nucleotide sequences into the plant genome.

Thus, in some embodiments, genetic modification and breeding of plants using the methods of the invention can result in plants that have no exogenous polynucleotide integrated in their genome, i.e., non-transgenic (transgene-free) modified plants.

In some embodiments of the invention, the modified target sequence is associated with a plant trait, such as an agronomic trait, whereby the one or more C-to-G substitutions result in the plant having an altered (preferably improved) trait, such as an agronomic trait, relative to a wild-type plant.

In some embodiments, the method further comprises the step of screening for plants having a desired C to G substitution or substitutions and/or a desired trait, such as an agronomic trait.

In some embodiments of the invention, the method further comprises obtaining progeny of the genetically modified plant. Preferably, the genetically modified plant or progeny thereof has a desired C to G substitution or substitutions and/or a desired trait such as an agronomic trait.

In another aspect, the present invention also provides a genetically modified plant or progeny or parts thereof, wherein the plant is obtained by the method of the invention as described above. In some embodiments, the genetically modified plant or progeny or part thereof is non-transgenic. Preferably, the genetically modified plant or progeny thereof has a desired genetic modification and/or a desired trait, such as an agronomic trait.

In another aspect, the present invention also provides a method of plant breeding comprising crossing a genetically modified first plant comprising one or more C to G substitutions in a target nucleic acid region obtained by the method of the invention described above with a second plant not comprising the one or more nucleotide substitutions, thereby introducing the one or more nucleotide substitutions into the second plant. Preferably, the genetically modified first plant has a desired trait, such as an agronomic trait.

Fifth, kit

The invention also includes a kit for use in the method of the invention, the kit comprising the base editing system of the invention, and instructions for use. The kit generally includes a label indicating the intended use and/or method of use of the kit contents. The term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.

Examples

Materials and methods

1. Vector construction

To construct the PCGBE systems, Escherichia coli UDG (accession number AMB53293.1), the rice endogenous proteins PCNA (LOC_Os02g56130), Ubiquitin (LOC_Os03g13170), APE01g (LOC_Os01g58690) and APE12g (LOC_Os12g18200), and the human Rad18 protein (hRad18, NP_064550.3) were selected. The K164R point mutation was introduced into the PCNA sequence (PCNA(K164R)), the N-terminal functional domain of Ubiquitin (Ub) was truncated out, the APE12g sequence was inactivated by point mutation (dAPE, D238A/N240V), and APE01g and APE12g were point-mutated to obtain mAPE01g (D297A) and mAPE12g (D327A). All gene fragments were codon-optimized for rice and synthesized by Nanjing Kingsrei Biotech Co.
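The mutations above use the usual "wild-type residue, position, new residue" notation (for example, K164R: lysine 164 to arginine). The hypothetical helper below applies such substitutions to a protein sequence and stops if the stated wild-type residue is not present at that position, which catches numbering errors against the reference sequence (e.g., SEQ ID NO: 20 for PCNA). It is an illustrative sketch; the sequence in the usage line is a toy, not a rice protein.

```python
import re

_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply substitutions such as 'K164R' to a protein sequence.

    Positions are 1-based and refer to the reference sequence passed in.
    The wild-type residue is checked first, so a mutation written against
    the wrong reference (or the wrong numbering) raises ValueError.
    """
    residues = list(protein)
    for mut in mutations:
        m = _MUTATION.fullmatch(mut)
        if m is None:
            raise ValueError(f"cannot parse mutation {mut!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != wt:
            raise ValueError(f"{mut}: residue {pos} is {residues[pos - 1]}, not {wt}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy 6-residue sequence, not a real rice protein.
print(apply_substitutions("MDKLNG", ["D2A", "N5V"]))  # MAKLVG
```

With the real reference sequences, a call such as `apply_substitutions(pcna_wt, ["K164R"])` would be expected to succeed only if residue 164 of the supplied sequence is a lysine.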

APOBEC3A was fused to the N-terminus of nCas9(D10A) via an XTEN linker and UDG was fused to its C-terminus, thereby constructing the PCGBE-1 vector (FIG. 2), also referred to as the ADU vector (FIG. 11a).
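The domain order of PCGBE-1 (APOBEC3A, XTEN linker, nCas9(D10A), UDG) can be pictured with a small helper that concatenates parts from N- to C-terminus and records where each part sits. This is a hypothetical sketch: NLS positions and any linker between nCas9 and UDG are not specified in this paragraph and are omitted, and apart from the XTEN linker (SEQ ID NO: 2) the sequences are short placeholders standing in for SEQ ID NOs: 1, 3 and 5.

```python
def assemble_fusion(parts: list[tuple[str, str]]):
    """Concatenate (name, sequence) parts in N- to C-terminal order.

    Returns the fused sequence and a list of (name, start, end) giving the
    1-based, inclusive coordinates of each part within the fusion.
    """
    pieces, layout, pos = [], [], 1
    for name, seq in parts:
        layout.append((name, pos, pos + len(seq) - 1))
        pieces.append(seq)
        pos += len(seq)
    return "".join(pieces), layout


fusion, layout = assemble_fusion([
    ("APOBEC3A", "MEAS"),            # placeholder for SEQ ID NO: 1
    ("XTEN", "SGSETPGTSESATPES"),    # SEQ ID NO: 2
    ("nCas9(D10A)", "DKKY"),         # placeholder for SEQ ID NO: 3
    ("UDG", "ANEL"),                 # placeholder for SEQ ID NO: 5
])
print(layout)
# [('APOBEC3A', 1, 4), ('XTEN', 5, 20), ('nCas9(D10A)', 21, 24), ('UDG', 25, 28)]
```

Substituting the full sequences listed at the end of this description for the placeholders would give the corresponding domain boundaries of the real fusion (without NLSs).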

nCas9(D10A) in the PCGBE-1 vector was replaced with dCas9(D10A, H840A) to obtain PCGBE-2. Next, mAPE01g, mAPE12g and hRad18 were each fused to the C-terminus of the PCGBE-1 construct via the self-cleaving 2A peptide (T2A), yielding the PCGBE-3, PCGBE-4 and PCGBE-5 vectors, respectively. The PCGBE-6 vector was obtained by exchanging the positions of UDG and hRad18 in PCGBE-5. Finally, the fusion-gene fragments and the sgRNA expression cassette were assembled into the pHUE411 backbone by Gibson assembly to construct the binary vectors pH-PCGBE-2 to pH-PCGBE-6 (FIG. 6) for Agrobacterium-mediated rice genetic transformation.
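Several constructs here place an additional protein downstream of the editor via T2A. The sketch below predicts the two products of 2A-mediated ribosome skipping, which is generally described as occurring between the glycine and the final proline of the 2A peptide: the upstream protein keeps the 2A residues except that proline, and the downstream protein begins with it. The 2A sequence used (EGRGSLLTCGDVEENPGP) is a commonly used T2A variant and is an assumption; SEQ ID NO: 6 is not reproduced in this section and may differ (for example, by an N-terminal GSG). Skipping is also not fully efficient in practice, so some uncleaved fusion is normally expected.

```python
import re

# Commonly used T2A peptide (assumed; SEQ ID NO: 6 may differ).
T2A = "EGRGSLLTCGDVEENPGP"


def t2a_products(polyprotein: str, peptide: str = T2A) -> list[str]:
    """Split a polyprotein at each 2A peptide, between its last G and P.

    The upstream product keeps the 2A residues minus the final proline;
    the downstream product starts with that proline.
    """
    products, start = [], 0
    for m in re.finditer(re.escape(peptide), polyprotein):
        cut = m.end() - 1  # position of the final P of the 2A peptide
        products.append(polyprotein[start:cut])
        start = cut
    products.append(polyprotein[start:])
    return products


# Toy upstream/downstream proteins flanking one T2A.
print(t2a_products("MKVL" + T2A + "MSTR"))
# ['MKVLEGRGSLLTCGDVEENPG', 'PMSTR']
```

One practical consequence of this layout is that the upstream protein (here, the editor) carries a C-terminal 2A remnant, while the co-expressed protein gains an N-terminal proline.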

In addition, the MCP-PCNA-Ub fusion protein and the dAPE protein were each fused to the C-terminus of the APOBEC3A-nCas9-UDG construct via the self-cleaving 2A peptide (T2A), yielding the CGBE vectors shown in FIG. 11b and FIG. 11c, respectively. Finally, the fusion-gene fragments and the sgRNA expression cassette were assembled into the pHUE411 backbone by Gibson assembly to construct binary vectors for Agrobacterium-mediated rice genetic transformation.

Nine endogenous target sites in nine rice genes (OsAAT, OsALS, OsCDC48, OsDEP1, OsGRF1, OsIPA1, OsNRT1.1B, OsPDS and OsSWEET14) were selected to test the systems; all target site sequences are shown in Table 1.

TABLE 1 sgRNA targeting sites and primers

Bold shows PAM sequences

2. Protoplast isolation and transformation

The rice material used for protoplast isolation and transformation according to the invention is Zhonghua 11.

2.1 Culture of etiolated rice seedlings

Zhonghua 11 rice seeds were rinsed with 75% ethanol for 1 min, treated with 4% sodium hypochlorite for 30 min, and washed with sterile water more than five times. The seeds were then cultured on M6 medium at 26 °C in the dark for 3-4 weeks.

2.2 Isolation of rice protoplasts

(1) The stem tissue of the etiolated seedlings was excised and its middle portion was cut with a blade into 0.5-1 mm strips. The strips were placed in 0.6 M mannitol solution in the dark for 10 min, collected on a filter screen, and transferred into 50 mL of enzyme solution (filtered through a 0.45 μm membrane). After vacuum infiltration (about 15 kPa) for 30 min, the mixture was digested on a shaker (10 rpm) at room temperature for 5 h; (2) the digest was diluted with 30-50 mL of W5 solution and filtered through a 75 μm nylon mesh into a 50 mL round-bottom centrifuge tube; (3) the protoplasts were centrifuged at 250 × g (rcf) and 23 °C for 3 min with acceleration and deceleration set to 3, and the supernatant was discarded; (4) the cells were gently resuspended in 20 mL of W5 and step (3) was repeated; (5) the protoplasts were resuspended in an appropriate volume of MMG solution and kept for transformation.
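The centrifugation steps specify 250 × g rather than a rotor speed. When a centrifuge is set in rpm, the standard relation RCF = 1.118 × 10^-5 × r × rpm² (r = rotor radius in cm) converts between the two, as in the sketch below. The 10 cm radius is a hypothetical value; the radius of the actual rotor must be used, and the acceleration/deceleration setting of 3 is instrument-specific and not modelled.

```python
import math

# Standard relation between relative centrifugal force and rotor speed:
# RCF = 1.118e-5 * r_cm * rpm**2, with r_cm the rotor radius in centimetres.
RCF_CONSTANT = 1.118e-5


def rpm_from_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that gives `rcf_g` x g at radius `radius_cm`."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at `rpm` and radius `radius_cm`."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


# Hypothetical 10 cm rotor radius; substitute the radius of the real rotor.
print(round(rpm_from_rcf(250, 10)))  # about 1495 rpm
```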

2.3 Transformation of rice protoplasts

(1) 10 μg of each required vector was added to a 2 mL centrifuge tube and mixed; 200 μL of protoplasts was added with a wide-bore (cut-off) pipette tip and mixed by gentle flicking; 220 μL of PEG4000 solution was then added and mixed by flicking, and transformation was induced at room temperature in the dark for 20-30 min; (2) 880 μL of W5 was added and mixed by gentle inversion, the protoplasts were centrifuged at 250 × g (rcf) for 3 min with acceleration and deceleration set to 3, and the supernatant was removed; (3) 1 mL of WI solution was added and mixed by gentle inversion, and the protoplasts were incubated at 23 °C in the dark for 48 h.

3. Agrobacterium infection mediated rice genetic transformation

The target binary vectors were introduced into Agrobacterium strain AGL1 by electroporation. Calli of the rice cultivar Zhonghua 11 were then transformed by Agrobacterium-mediated infection, and hygromycin was used as the selection agent to screen for transgenic positive plants.

4. DNA extraction and amplicon sequencing analysis of protoplasts and transgenic plants

4.1 DNA extraction from protoplasts and transgenic plants

Protoplasts were collected in 2 mL centrifuge tubes and their DNA (about 30 μL) was extracted by the CTAB method. Each transgenic clone was sampled separately and its genomic DNA was extracted by the CTAB method; the DNA concentration (50 ng/μL) was measured with a NanoDrop ultramicro spectrophotometer, and the DNA was stored at -20 °C.

4.2 Amplicon sequencing analysis

(1) A first round of PCR was performed on the DNA template with genomic primers; the first-round primers are listed in Table 2. A 20 μL reaction contained 4 μL of 5× FastPfu buffer, 1.6 μL of dNTPs (2.5 mM), 0.4 μL of forward primer (10 μM), 0.4 μL of reverse primer (10 μM), 0.4 μL of FastPfu polymerase (2.5 U/μL), and 2 μL of DNA template (~60 ng). Cycling conditions: initial denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 50-64 °C for 30 s and extension at 72 °C for 30 s; final extension at 72 °C for 5 min; hold at 12 °C;

(2) The first-round product was diluted 10-fold and 1 μL was used as the template for a second round of PCR with barcoded sequencing primers (see Table 2). A 50 μL reaction contained 10 μL of 5× FastPfu buffer, 4 μL of dNTPs (2.5 mM), 1 μL of forward primer (10 μM), 1 μL of reverse primer (10 μM), 1 μL of FastPfu polymerase (2.5 U/μL), and 1 μL of DNA template. Cycling conditions were as above, except that 38 cycles were used.

(3) The PCR products were separated by 2% agarose gel electrophoresis, the target fragments were gel-purified with the AxyPrep™ DNA Gel Extraction Kit, and the recovered products were quantified with a NanoDrop ultramicro spectrophotometer; 100 ng of each recovered product was pooled and sent to Beijing Nuo He Zhiyuan Science and Technology Co., Ltd. for amplicon library construction and sequencing.

(4) After sequencing was completed, the raw data were demultiplexed according to the barcoded sequencing primers. Using each sgRNA sequence and its flanking sequence as the reference, the types of editing products and the editing efficiencies of the different systems at the different target sites were systematically compared.
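Steps (1) and (2) list the reaction components but not the balancing solvent. The following sketch, which assumes that the remainder of each reaction is made up with nuclease-free water (not stated in the source), computes that water volume and the final concentration of each component from its stock concentration. The function name and dictionary layout are illustrative only.

```python
from typing import Dict, Optional, Tuple


def plan_reaction(total_ul: float,
                  components: Dict[str, Tuple[float, Optional[float]]]):
    """Return final concentrations and the water volume needed.

    `components` maps name -> (volume in uL, stock concentration or None).
    Final concentration = stock * volume / total volume, in the stock's
    units (e.g. 5x buffer -> 1x). The remainder is assumed to be water.
    """
    used = sum(vol for vol, _ in components.values())
    water = total_ul - used
    if water < 0:
        raise ValueError(f"components ({used} uL) exceed total volume ({total_ul} uL)")
    finals = {name: stock * vol / total_ul
              for name, (vol, stock) in components.items() if stock is not None}
    return finals, water


round1 = {
    "FastPfu buffer (x)": (4.0, 5.0),
    "dNTPs (mM)": (1.6, 2.5),
    "forward primer (uM)": (0.4, 10.0),
    "reverse primer (uM)": (0.4, 10.0),
    "FastPfu polymerase (U/uL)": (0.4, 2.5),
    "DNA template": (2.0, None),  # ~60 ng; stock not modelled
}
finals, water = plan_reaction(20.0, round1)
print({k: round(v, 3) for k, v in finals.items()}, round(water, 2))
```

For the 20 μL first-round recipe this gives 1× buffer, 0.2 mM dNTPs, 0.2 μM of each primer, 0.05 U/μL polymerase and 11.2 μL of water; the 50 μL second-round recipe gives the same final concentrations with 32 μL of water.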

Example 1 construction of an accurate CG base editing System

The cytosine base editor (CBE) was established by David Liu's laboratory in 2016. This system uses nCas9(D10A) to direct a cytosine deaminase to the non-complementary strand of a DNA target site, where it deaminates cytosine (C) within a defined window to uracil (U); U is read as thymine (T) during mismatch repair (MMR) or DNA replication, ultimately producing a precise C-to-T single-base substitution. However, during endogenous base excision repair (BER), U is recognized and excised by UDG to form an AP site, which is then nicked by AP endonuclease (APE), producing many indel byproducts; the CBE system was also found to generate low-frequency C-to-G byproducts. Fusing UGI to the CBE system greatly reduced the formation of these various byproducts (FIG. 1).

As early as 2019, the inventors replaced UGI in the highly efficient A3A-PBE system with UDG to construct the initial PCGBE system (PCGBE-1) (FIG. 2). When tested in rice protoplasts, it produced only very low frequencies of C-to-G base substitutions, and the edits were mainly C-to-T (FIG. 3). In subsequent tests in rice callus, however, PCGBE-1 was found to mediate about 30% C-to-G transversion (FIGS. 4 and 5). This strongly suggests that CGBE-mediated C-to-G transversion depends on DNA replication. Besides C-to-G transversion, the PCGBE-1 system also produced a high proportion of indel byproducts (about 30% or more) (FIGS. 4 and 5), similar to the results reported by Kurt et al. (2020). In the figures, the values indicate C-to-G editing efficiency, and the numbers in parentheses below them indicate C-to-G purity.
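The efficiency and purity values can be illustrated with a simplified read-counting sketch. The definitions used here are assumptions following one common convention: efficiency is the fraction of all reads carrying C-to-G at the target C, and purity is the fraction of edited reads (C-to-G, C-to-T, C-to-A or indel) that are C-to-G. The classifier is deliberately crude: it treats any change in read length as an indel and otherwise assumes gap-free, pre-trimmed reads, whereas real analyses align reads to the reference amplicon (for example with a dedicated tool such as CRISPResso2).

```python
from collections import Counter

# Base observed at the target C position -> outcome label.
_OUTCOME = {"C": "WT", "G": "C>G", "T": "C>T", "A": "C>A"}


def classify_read(reference: str, read: str, target_idx: int) -> str:
    """Label one read at the target C (0-based index in `reference`).

    Crude simplification: a read whose length differs from the reference
    is called an indel; otherwise the base at `target_idx` decides.
    """
    if len(read) != len(reference):
        return "indel"
    return _OUTCOME.get(read[target_idx], "other")


def c_to_g_metrics(reference: str, reads: list[str], target_idx: int):
    """Return (counts, C-to-G efficiency, C-to-G purity) for one target C.

    efficiency = C>G reads / all reads
    purity     = C>G reads / edited reads (C>G + C>T + C>A + indel)
    """
    if reference[target_idx] != "C":
        raise ValueError("target position is not a C in the reference")
    counts = Counter(classify_read(reference, r, target_idx) for r in reads)
    total = sum(counts.values())
    edited = total - counts["WT"] - counts["other"]
    efficiency = counts["C>G"] / total if total else 0.0
    purity = counts["C>G"] / edited if edited else 0.0
    return counts, efficiency, purity


reads = ["AACGT", "AAGGT", "AAGGT", "AATGT", "AAGT"]  # toy reads
counts, efficiency, purity = c_to_g_metrics("AACGT", reads, 2)
print(efficiency, purity)  # 0.4 0.5
```

In this toy example, two of five reads are C-to-G (efficiency 0.4) and two of the four edited reads are C-to-G (purity 0.5), showing how a system can have moderate efficiency but low purity when byproducts are frequent.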

The inventors studied the DNA damage and potential repair mechanisms involved in the process from cytosine deamination to C-to-G conversion (FIG. 1) and identified one key repair pathway, translesion DNA synthesis (TLS) (Zhuang et al, 2008; Qin et al, 2013; Martin and Wood, 2019). TLS has been shown to insert a single nucleotide into the newly synthesized strand opposite a DNA lesion (such as an AP site); this nucleotide is most likely dCTP, and its insertion serves mainly to initiate TLS repair signaling, which provides a possible route to C-to-G base editing. A key protein factor that determines TLS repair is proliferating cell nuclear antigen (PCNA), which recognizes DNA damage during DNA replication and mediates diverse DNA repair processes. When PCNA is monoubiquitinated, it promotes recruitment of the relevant TLS DNA polymerases (pol η and pol ζ), thereby promoting lesion bypass repair; when PCNA is polyubiquitinated, it participates in the damage-avoidance pathway that uses the intact sister chromatid as a template (Zhuang et al, 2008; Qin et al, 2013; Martin and Wood, 2019). Rad6-Rad18 plays an important role in the response to DNA damage and promotes monoubiquitination of PCNA, thereby steering DNA damage repair toward the TLS pathway. However, comparative analysis of related animal and plant proteins did not identify a Rad18 homolog in the rice genome, and the inventors speculated that this absence may be one reason for the low C-to-G efficiency and purity in plants. Co-expression of the human Rad18 protein might therefore drive repair toward TLS, resulting in precise C-to-G base transversions in the target sequence.

In addition, in the absence of exogenous AP endonuclease, only some AP sites are cleaved, and uncleaved AP sites enter TLS lesion bypass repair during DNA replication. When AP sites are generated, expression of endogenous APE is also rapidly induced to complete base excision repair (BER). Therefore, mutating the APE protein so that it loses catalytic activity but retains binding activity, and co-expressing this mutant APE (mAPE) in the cell, can competitively inhibit endogenous APE and protect the AP site; this strategy may also drive repair toward TLS and hence toward C-to-G base transversion.

Based on the above results and the TLS repair mechanism, the inventors modified the PCGBE-1 system by i) replacing nCas9 with dCas9; ii) co-expressing rice-derived mutant APE01g/APE12g proteins; or iii) co-expressing the human Rad18 protein, yielding the PCGBE-2 to PCGBE-6 systems (FIG. 6). Testing these systems in rice callus at the OsNRT1.1B, OsPDS and OsGRF1 targets showed that PCGBE-4 and PCGBE-5 markedly improved both C-to-G editing efficiency and editing purity compared with PCGBE-1 (FIGS. 7-10).

In addition, lysine 164 of PCNA was converted to arginine by point mutation (K164R), disrupting its ubiquitin binding site; at the same time, the N-terminal domain of ubiquitin was fused to it to construct a monoubiquitinated proliferating cell nuclear antigen fusion protein (PCNA.Ub), which can stably recruit TLS-related polymerases and induce TLS repair. Co-expressing the PCNA.Ub fusion protein in cells may therefore drive repair toward TLS, achieving precise C-to-G base transversion in the target sequence.

Based on the above results and the TLS repair mechanism, the inventors used the rice endogenous PCNA and Ubiquitin genes as templates for point mutation, truncation, splicing and synthesis, and fused the resulting MCP-PCNA-Ub fusion protein via T2A to the C-terminus of the PCGBE-1 system to form the PCGBE system shown in FIG. 11b. In addition, the inventors fused inactivated rice APE12g (dAPE, D238A/N240V) via T2A to the C-terminus of the PCGBE-1 system to form the CGBE system shown in FIG. 11c. The base editing systems shown in FIG. 11 were introduced into rice calli, and the mutation detection results are shown in FIG. 12.

The results show that this CG base editing system not only maintains a high proportion of C-to-G editing but also greatly reduces indels, C-to-A conversions and other byproducts, demonstrating that the improved PCGBE system can achieve efficient and accurate C-to-G base transversion.

The sequences are as follows:

SEQ ID NO:1 APOBEC3A

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN

SEQ ID NO:2 XTEN linker

SGSETPGTSESATPES

SEQ ID NO:3 nCas9(D10A)

DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD

SEQ ID NO:4 nucleoplasmin NLS

KRPAATKKAGQAKKKK

SEQ ID NO:5 E. coli UDG

ANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESE

SEQ ID NO:6 T2A linker

EGRGSLLTCGDVEENPGP

SEQ ID NO:7 MCP

ASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY

SEQ ID NO:8 SV40 NLS

PKKKRKV

SEQ ID NO:9 OsPCNA(K164R)

MLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTREGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKS

SEQ ID NO:10 truncated Ubiquitin(Ub)

MQIFVKTLTGKTITLEVESSDTIDNVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLADYNIQKESTLHLVLRLR

SEQ ID NO:11 OsAPE(D238A/N240V),dAPE

MKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGALVVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN

SEQ ID NO:12 Exemplary polypeptide of the PCGBE-1 system

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKV
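SEQ ID NO:12 can be read as a concatenation of the listed components plus two short junction peptides (RP after the XTEN linker and GTDSGGS after the nucleoplasmin NLS), which are read directly from the sequence and not listed separately. The Python sketch below turns that reading into residue coordinates; the names `PCGBE1_PARTS` and `domain_map` are illustrative only. The component lengths sum to 1842, the length declared for SEQ ID NO:12 in the formal sequence listing. This is a bookkeeping aid, not an authoritative definition of domain boundaries.

```python
# Components of SEQ ID NO:12 in N- to C-terminal order. Lengths come from the
# <211> fields of the sequence listing; the two junction peptides are read
# directly from SEQ ID NO:12.
PCGBE1_PARTS = [
    ("APOBEC3A (SEQ ID NO:1)", 199),
    ("XTEN linker (SEQ ID NO:2)", 16),
    ("junction RP", 2),
    ("nCas9(D10A) (SEQ ID NO:3)", 1367),
    ("nucleoplasmin NLS (SEQ ID NO:4)", 16),
    ("junction GTDSGGS", 7),
    ("E. coli UDG (SEQ ID NO:5)", 228),
    ("SV40 NLS (SEQ ID NO:8)", 7),
]


def domain_map(parts: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """Return (name, first residue, last residue) for each part, 1-based and inclusive."""
    coords = []
    first = 1
    for name, length in parts:
        coords.append((name, first, first + length - 1))
        first += length
    return coords


if __name__ == "__main__":
    for name, first, last in domain_map(PCGBE1_PARTS):
        print(f"{first:>5}-{last:<5} {name}")
    total = sum(length for _, length in PCGBE1_PARTS)
    print("total length:", total)  # 1842, matching <211> of SEQ ID NO:12
```

With this map, nCas9(D10A) occupies residues 218 to 1584 and the E. coli UDG occupies residues 1608 to 1835 of SEQ ID NO:12.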

SEQ ID NO:13 Schematic polypeptide of the PCGBE system (FIG. 11b)

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESLKDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKTRDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVSAEGRGSLLTCGDVEENPGPASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGGSPKKKRKVSGSETPGTSESA
TPESPRMLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTREGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKSSGGSMQIFVKTLTGKTITLEVESSDTIDNVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLADYNIQKESTLHLVLRLRSGGSPKKKRKV

SEQ ID NO:14 Schematic polypeptide of the PCGBE system (FIG. 11c)

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESLKDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKTRDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVSAEGRGSLLTCGDVEENPGPMKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPF
KDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGALVVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSNPKKKRKV

SEQ ID NO:15 mAPE01g(D297A)

SAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSVSRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLAYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKL

SEQ ID NO:16 mAPE12g(D327A)

KRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIAYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN

SEQ ID NO:17 hRad18

DSLAESRWPPGLAVMKTIDDLLRCGICFEYFNIAMIIPQCSHNYCSLCIRKFLSYKTQCPTCCVTVTEPDLKNNRILDELVKSLNFARNHLLQFALESPAKSPASSSSKNLAVKVYTPVASRQSLKQGSRLMDNFLIREMSGSTSELLIKENKSKFSPQKEASPAAKTKETRSVEEIAPDPSEAKRPEPPSTSTLKQVTKVDCPVCGVNIPESHINKHLDSCLSREEKKESLRSSVHKRKPLPKTVYNLLSDRDLKKKLKEHGLSIQGNKQQLIKRHQEFVHMYNAQCDALHPKSAAEIVREIENIEKTRMRLEASKLNESVMVFTKDQTEKEIDEIHSKYRKKHKSEFQLLVDQARKGYKKIAGMSQKTVTITKEDESTEKLSSVCMGQEDNMTSVTNHFSQSKLDSPEELEPDREEDSSSCIDIQEVLSSSESDSCNSSSSDIIRDLLEEEEAWEASHKNDLQDTEISPRQNRRTRAAESAEIEPRNKRNRN

SEQ ID NO:18 APE01g

MSAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSVSRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLDYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKL

SEQ ID NO:19 APE12g

MKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFKDYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIDYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSN

SEQ ID NO:20 OsPCNA(wt)

MLELRLVQGSLLKKVLEAIRELVTDANFDCSGTGFSLQAMDSSHVALVALLLRSEGFEHYRCDRNLSMGMNLNNMAKMLRCAGNDDIITIKADDGSDTVTFMFESPNQDKIADFEMKLMDIDSEHLGIPDSEYQAIVRMPSSEFSRICKDLSSIGDTVIISVTKEGVKFSTAGDIGTANIVCRQNKTVDKPEDATIIEMQEPVSLTFALRYMNSFTKASPLSEQVTISLSSELPVVVEYKIAEMGYIRFYLAPKIEEDEEMKS

SEQ ID NO:21 Exemplary fusion protein of the PCGBE-3 system

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPSAIRASSHRLQTRTVALTRTKMSSMAGLGASQHGYPPRSHEPWTKLVHRERLPEWFAYNPKTMRPPPLSHDTKCMKILSWNINGLHDVVTTKGFSARDLAQRENFDVLCLQETHLEEKDVEKFKNLIADYDSYWSCSV
SRLGYSGTAVISRVKPISVQYGIGIREHDHEGRVITLEFDGFYLVNAYVPNSGRFLRRLNYRVNNWDPCFSNYVKILEKSKPVIVAGDLNCARQSIDIHNPPAKTKSAGFTIEERESFETNFSSKGLVDTFRKQHPNAVGYTFWGENQRITNKGWRLAYFLASESITDKVHDSYILPDVSFSDHSPIGLVLKLSGGSPKKKRKV

SEQ ID NO:22 Schematic fusion protein of the PCGBE-4 system

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPKRFFQPVPKDGSPAKKRPAAAAAASASDSDSLGGDAPAAAACAVGEGDSPPAPREEEPRRFVTWNANSLLLRMKSDWPAFCQFVSRVDPDVICVQEVRMPAAGSKGAPKNPGQLKDDTSSSRDEKQVVLRALSSPPFK
DYRVWWSLSDSKYAGTAMIIKKKFEPKKVSFNLDRTSSKHEPDGRVIIAEFESFLLLNTYAPNNGWKEEENSFQRRRKWDKRMLEFVQQVDKPLIWCGDLNVSHEEIDVSHPDFFSSAKLNGYIPPNKEDCGQPGFTLSERRRFGNILSQGKLVDAYRYLHKEKDMDCGFSWSGHPIGKYRGKRMRIAYFLVSEKLKDQIVSCDIHGRGIELEGFYGSDHCPVSLELSEEVEAPKPKSSNSGGSPKKKRKV

SEQ ID NO:23 Schematic fusion protein of the PCGBE-5 system

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESRPDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDKRPAATKKAGQAKKKKGTDSGGSANELTWHDVLAEEKQQPYFLNTLQTVASERQSGVTIYPPQKDVFNAFRFTELGDVKVVILGQDPYHGPGQAHGLAFSVRPGIAIPPSLLNMYKELENTIPGFTRPNHGYLESWARQGVLLLNTVLTVRAGQAHSHASLGWETFTDKVISLINQHREGVVFLLWGSHAQKKGAIIDKQRHHVLKAPHPSPLSAHRGFFGCNHFVLANQWLEQRGETPIDWMPVLPAESEPKKKRKVLKEGRGSLLTCGDVEENPGPDSLAESRWPPGLAVMKTIDDLLRCGICFEYFNIAMIIPQCSHNYCSLCIRKFLSYKTQCPTCCVTVTEPDLKNNRILDELVKSLNFARNHLLQFALESPAKSPASSSSKNLAVKVYTPVASRQSLKQGSRLMDNFLIR
EMSGSTSELLIKENKSKFSPQKEASPAAKTKETRSVEEIAPDPSEAKRPEPPSTSTLKQVTKVDCPVCGVNIPESHINKHLDSCLSREEKKESLRSSVHKRKPLPKTVYNLLSDRDLKKKLKEHGLSIQGNKQQLIKRHQEFVHMYNAQCDALHPKSAAEIVREIENIEKTRMRLEASKLNESVMVFTKDQTEKEIDEIHSKYRKKHKSEFQLLVDQARKGYKKIAGMSQKTVTITKEDESTEKLSSVCMGQEDNMTSVTNHFSQSKLDSPEELEPDREEDSSSCIDIQEVLSSSESDSCNSSSSDIIRDLLEEEEAWEASHKNDLQDTEISPRQNRRTRAAESAEIEPRNKRNRNSGGSPKKKRKV

Sequence listing

<110> Shanghai Blue Cross Medical Science Research Institute

<120> Improved CG base editing system

<130> P2022TC1988

<160> 23

<170> PatentIn version 3.5

<210> 1

<211> 199

<212> PRT

<213> Artificial Sequence

<220>

<223> APOBEC3A

<400> 1

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn

195

<210> 2

<211> 16

<212> PRT

<213> Artificial Sequence

<220>

<223> XTEN linker

<400> 2

Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser

1 5 10 15

<210> 3

<211> 1367

<212> PRT

<213> Artificial Sequence

<220>

<223> nCas9 (D10A)

<400> 3

Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly

1 5 10 15

Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys

20 25 30

Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly

35 40 45

Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys

50 55 60

Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr

65 70 75 80

Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe

85 90 95

Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His

100 105 110

Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His

115 120 125

Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser

130 135 140

Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met

145 150 155 160

Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp

165 170 175

Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn

180 185 190

Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys

195 200 205

Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu

210 215 220

Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu

225 230 235 240

Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp

245 250 255

Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp

260 265 270

Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu

275 280 285

Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile

290 295 300

Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met

305 310 315 320

Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala

325 330 335

Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp

340 345 350

Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln

355 360 365

Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly

370 375 380

Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys

385 390 395 400

Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly

405 410 415

Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu

420 425 430

Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro

435 440 445

Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met

450 455 460

Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val

465 470 475 480

Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn

485 490 495

Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu

500 505 510

Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr

515 520 525

Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys

530 535 540

Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val

545 550 555 560

Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser

565 570 575

Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr

580 585 590

Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn

595 600 605

Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu

610 615 620

Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His

625 630 635 640

Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr

645 650 655

Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys

660 665 670

Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala

675 680 685

Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys

690 695 700

Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His

705 710 715 720

Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile

725 730 735

Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg

740 745 750

His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr

755 760 765

Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu

770 775 780

Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val

785 790 795 800

Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln

805 810 815

Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu

820 825 830

Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp

835 840 845

Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly

850 855 860

Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn

865 870 875 880

Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

885 890 895

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys

900 905 910

Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys

915 920 925

His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu

930 935 940

Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys

945 950 955 960

Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu

965 970 975

Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val

980 985 990

Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val

995 1000 1005

Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys

1010 1015 1020

Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr

1025 1030 1035

Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn

1040 1045 1050

Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr

1055 1060 1065

Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg

1070 1075 1080

Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu

1085 1090 1095

Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg

1100 1105 1110

Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys

1115 1120 1125

Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu

1130 1135 1140

Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser

1145 1150 1155

Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe

1160 1165 1170

Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu

1175 1180 1185

Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe

1190 1195 1200

Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu

1205 1210 1215

Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn

1220 1225 1230

Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro

1235 1240 1245

Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His

1250 1255 1260

Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg

1265 1270 1275

Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr

1280 1285 1290

Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile

1295 1300 1305

Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe

1310 1315 1320

Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr

1325 1330 1335

Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly

1340 1345 1350

Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp

1355 1360 1365

<210> 4

<211> 16

<212> PRT

<213> Artificial Sequence

<220>

<223> nucleoplasmin NLS

<400> 4

Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys

1 5 10 15

<210> 5

<211> 228

<212> PRT

<213> Artificial Sequence

<220>

<223> E. coli UDG

<400> 5

Ala Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1 5 10 15

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln Ser

20 25 30

Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala Phe Arg

35 40 45

Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly Gln Asp Pro

50 55 60

Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe Ser Val Arg Pro

65 70 75 80

Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met Tyr Lys Glu Leu Glu

85 90 95

Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn His Gly Tyr Leu Glu Ser

100 105 110

Trp Ala Arg Gln Gly Val Leu Leu Leu Asn Thr Val Leu Thr Val Arg

115 120 125

Ala Gly Gln Ala His Ser His Ala Ser Leu Gly Trp Glu Thr Phe Thr

130 135 140

Asp Lys Val Ile Ser Leu Ile Asn Gln His Arg Glu Gly Val Val Phe

145 150 155 160

Leu Leu Trp Gly Ser His Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys

165 170 175

Gln Arg His His Val Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala

180 185 190

His Arg Gly Phe Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp

195 200 205

Leu Glu Gln Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro

210 215 220

Ala Glu Ser Glu

225

<210> 6

<211> 18

<212> PRT

<213> Artificial Sequence

<220>

<223> T2A linker

<400> 6

Glu Gly Arg Gly Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro

1 5 10 15

Gly Pro

<210> 7

<211> 116

<212> PRT

<213> Artificial Sequence

<220>

<223> MCP

<400> 7

Ala Ser Asn Phe Thr Gln Phe Val Leu Val Asp Asn Gly Gly Thr Gly

1 5 10 15

Asp Val Thr Val Ala Pro Ser Asn Phe Ala Asn Gly Ile Ala Glu Trp

20 25 30

Ile Ser Ser Asn Ser Arg Ser Gln Ala Tyr Lys Val Thr Cys Ser Val

35 40 45

Arg Gln Ser Ser Ala Gln Asn Arg Lys Tyr Thr Ile Lys Val Glu Val

50 55 60

Pro Lys Gly Ala Trp Arg Ser Tyr Leu Asn Met Glu Leu Thr Ile Pro

65 70 75 80

Ile Phe Ala Thr Asn Ser Asp Cys Glu Leu Ile Val Lys Ala Met Gln

85 90 95

Gly Leu Leu Lys Asp Gly Asn Pro Ile Pro Ser Ala Ile Ala Ala Asn

100 105 110

Ser Gly Ile Tyr

115

<210> 8

<211> 7

<212> PRT

<213> Artificial Sequence

<220>

<223> SV40 NLS

<400> 8

Pro Lys Lys Lys Arg Lys Val

1 5

<210> 9

<211> 263

<212> PRT

<213> Artificial Sequence

<220>

<223> OsPCNA (K164R)

<400> 9

Met Leu Glu Leu Arg Leu Val Gln Gly Ser Leu Leu Lys Lys Val Leu

1 5 10 15

Glu Ala Ile Arg Glu Leu Val Thr Asp Ala Asn Phe Asp Cys Ser Gly

20 25 30

Thr Gly Phe Ser Leu Gln Ala Met Asp Ser Ser His Val Ala Leu Val

35 40 45

Ala Leu Leu Leu Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg

50 55 60

Asn Leu Ser Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg

65 70 75 80

Cys Ala Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser

85 90 95

Asp Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala

100 105 110

Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly Ile

115 120 125

Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser Glu Phe

130 135 140

Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr Val Ile Ile

145 150 155 160

Ser Val Thr Arg Glu Gly Val Lys Phe Ser Thr Ala Gly Asp Ile Gly

165 170 175

Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr Val Asp Lys Pro Glu

180 185 190

Asp Ala Thr Ile Ile Glu Met Gln Glu Pro Val Ser Leu Thr Phe Ala

195 200 205

Leu Arg Tyr Met Asn Ser Phe Thr Lys Ala Ser Pro Leu Ser Glu Gln

210 215 220

Val Thr Ile Ser Leu Ser Ser Glu Leu Pro Val Val Val Glu Tyr Lys

225 230 235 240

Ile Ala Glu Met Gly Tyr Ile Arg Phe Tyr Leu Ala Pro Lys Ile Glu

245 250 255

Glu Asp Glu Glu Met Lys Ser

260

<210> 10

<211> 74

<212> PRT

<213> Artificial Sequence

<220>

<223> truncated Ubiquitin (Ub)

<400> 10

Met Gln Ile Phe Val Lys Thr Leu Thr Gly Lys Thr Ile Thr Leu Glu

1 5 10 15

Val Glu Ser Ser Asp Thr Ile Asp Asn Val Lys Ala Lys Ile Gln Asp

20 25 30

Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu Ile Phe Ala Gly Lys

35 40 45

Gln Leu Glu Asp Gly Arg Thr Leu Ala Asp Tyr Asn Ile Gln Lys Glu

50 55 60

Ser Thr Leu His Leu Val Leu Arg Leu Arg

65 70

<210> 11

<211> 379

<212> PRT

<213> Artificial Sequence

<220>

<223> OsAPE (D238A/N240V), dAPE

<400> 11

Met Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys

1 5 10 15

Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser

20 25 30

Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly

35 40 45

Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr

50 55 60

Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala

65 70 75 80

Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln

85 90 95

Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro

100 105 110

Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val

115 120 125

Val Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp

130 135 140

Trp Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys

145 150 155 160

Lys Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser

165 170 175

Ser Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser

180 185 190

Phe Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu

195 200 205

Glu Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu

210 215 220

Phe Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Ala Leu Val

225 230 235 240

Val Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser

245 250 255

Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln

260 265 270

Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser

275 280 285

Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp

290 295 300

Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg

305 310 315 320

Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val Ser Glu Lys Leu Lys

325 330 335

Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu

340 345 350

Gly Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu

355 360 365

Glu Val Glu Ala Pro Lys Pro Lys Ser Ser Asn

370 375

<210> 12

<211> 1842

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary polypeptide of the PCGBE-1 system

<400> 12

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val

1835 1840

<210> 13

<211> 2358

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary polypeptide of the PCGBE system (FIG. 11b)

<400> 13

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val Ser Ala Glu Gly Arg Gly

1835 1840 1845

Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Ala

1850 1855 1860

Ser Asn Phe Thr Gln Phe Val Leu Val Asp Asn Gly Gly Thr Gly

1865 1870 1875

Asp Val Thr Val Ala Pro Ser Asn Phe Ala Asn Gly Ile Ala Glu

1880 1885 1890

Trp Ile Ser Ser Asn Ser Arg Ser Gln Ala Tyr Lys Val Thr Cys

1895 1900 1905

Ser Val Arg Gln Ser Ser Ala Gln Asn Arg Lys Tyr Thr Ile Lys

1910 1915 1920

Val Glu Val Pro Lys Gly Ala Trp Arg Ser Tyr Leu Asn Met Glu

1925 1930 1935

Leu Thr Ile Pro Ile Phe Ala Thr Asn Ser Asp Cys Glu Leu Ile

1940 1945 1950

Val Lys Ala Met Gln Gly Leu Leu Lys Asp Gly Asn Pro Ile Pro

1955 1960 1965

Ser Ala Ile Ala Ala Asn Ser Gly Ile Tyr Gly Gly Ser Pro Lys

1970 1975 1980

Lys Lys Arg Lys Val Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu

1985 1990 1995

Ser Ala Thr Pro Glu Ser Pro Arg Met Leu Glu Leu Arg Leu Val

2000 2005 2010

Gln Gly Ser Leu Leu Lys Lys Val Leu Glu Ala Ile Arg Glu Leu

2015 2020 2025

Val Thr Asp Ala Asn Phe Asp Cys Ser Gly Thr Gly Phe Ser Leu

2030 2035 2040

Gln Ala Met Asp Ser Ser His Val Ala Leu Val Ala Leu Leu Leu

2045 2050 2055

Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg Asn Leu Ser

2060 2065 2070

Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg Cys Ala

2075 2080 2085

Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser Asp

2090 2095 2100

Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala

2105 2110 2115

Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly

2120 2125 2130

Ile Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser

2135 2140 2145

Glu Phe Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr

2150 2155 2160

Val Ile Ile Ser Val Thr Arg Glu Gly Val Lys Phe Ser Thr Ala

2165 2170 2175

Gly Asp Ile Gly Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr

2180 2185 2190

Val Asp Lys Pro Glu Asp Ala Thr Ile Ile Glu Met Gln Glu Pro

2195 2200 2205

Val Ser Leu Thr Phe Ala Leu Arg Tyr Met Asn Ser Phe Thr Lys

2210 2215 2220

Ala Ser Pro Leu Ser Glu Gln Val Thr Ile Ser Leu Ser Ser Glu

2225 2230 2235

Leu Pro Val Val Val Glu Tyr Lys Ile Ala Glu Met Gly Tyr Ile

2240 2245 2250

Arg Phe Tyr Leu Ala Pro Lys Ile Glu Glu Asp Glu Glu Met Lys

2255 2260 2265

Ser Ser Gly Gly Ser Met Gln Ile Phe Val Lys Thr Leu Thr Gly

2270 2275 2280

Lys Thr Ile Thr Leu Glu Val Glu Ser Ser Asp Thr Ile Asp Asn

2285 2290 2295

Val Lys Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln

2300 2305 2310

Gln Arg Leu Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr

2315 2320 2325

Leu Ala Asp Tyr Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val

2330 2335 2340

Leu Arg Leu Arg Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val

2345 2350 2355

<210> 14

<211> 2248

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary polypeptide of the PCGBE system (FIG. 11c)

<400> 14

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Leu Lys Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Thr Arg Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val Ser Ala Glu Gly Arg Gly

1835 1840 1845

Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Met

1850 1855 1860

Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys

1865 1870 1875

Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp

1880 1885 1890

Ser Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly

1895 1900 1905

Glu Gly Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg

1910 1915 1920

Phe Val Thr Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser

1925 1930 1935

Asp Trp Pro Ala Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp

1940 1945 1950

Val Ile Cys Val Gln Glu Val Arg Met Pro Ala Ala Gly Ser Lys

1955 1960 1965

Gly Ala Pro Lys Asn Pro Gly Gln Leu Lys Asp Asp Thr Ser Ser

1970 1975 1980

Ser Arg Asp Glu Lys Gln Val Val Leu Arg Ala Leu Ser Ser Pro

1985 1990 1995

Pro Phe Lys Asp Tyr Arg Val Trp Trp Ser Leu Ser Asp Ser Lys

2000 2005 2010

Tyr Ala Gly Thr Ala Met Ile Ile Lys Lys Lys Phe Glu Pro Lys

2015 2020 2025

Lys Val Ser Phe Asn Leu Asp Arg Thr Ser Ser Lys His Glu Pro

2030 2035 2040

Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe Leu Leu Leu

2045 2050 2055

Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu Asn Ser

2060 2065 2070

Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe Val

2075 2080 2085

Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Ala Leu Val Val

2090 2095 2100

Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser

2105 2110 2115

Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly

2120 2125 2130

Gln Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile

2135 2140 2145

Leu Ser Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys

2150 2155 2160

Glu Lys Asp Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile

2165 2170 2175

Gly Lys Tyr Arg Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val

2180 2185 2190

Ser Glu Lys Leu Lys Asp Gln Ile Val Ser Cys Asp Ile His Gly

2195 2200 2205

Arg Gly Ile Glu Leu Glu Gly Phe Tyr Gly Ser Asp His Cys Pro

2210 2215 2220

Val Ser Leu Glu Leu Ser Glu Glu Val Glu Ala Pro Lys Pro Lys

2225 2230 2235

Ser Ser Asn Pro Lys Lys Lys Arg Lys Val

2240 2245

<210> 15

<211> 331

<212> PRT

<213> Artificial Sequence

<220>

<223> mAPE01g (D297A)

<400> 15

Ser Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val Ala

1 5 10 15

Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser Gln

20 25 30

His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu Val His

35 40 45

Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys Thr Met Arg

50 55 60

Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys Ile Leu Ser Trp

65 70 75 80

Asn Ile Asn Gly Leu His Asp Val Val Thr Thr Lys Gly Phe Ser Ala

85 90 95

Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp Val Leu Cys Leu Gln Glu

100 105 110

Thr His Leu Glu Glu Lys Asp Val Glu Lys Phe Lys Asn Leu Ile Ala

115 120 125

Asp Tyr Asp Ser Tyr Trp Ser Cys Ser Val Ser Arg Leu Gly Tyr Ser

130 135 140

Gly Thr Ala Val Ile Ser Arg Val Lys Pro Ile Ser Val Gln Tyr Gly

145 150 155 160

Ile Gly Ile Arg Glu His Asp His Glu Gly Arg Val Ile Thr Leu Glu

165 170 175

Phe Asp Gly Phe Tyr Leu Val Asn Ala Tyr Val Pro Asn Ser Gly Arg

180 185 190

Phe Leu Arg Arg Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys Phe

195 200 205

Ser Asn Tyr Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val Ala

210 215 220

Gly Asp Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro Pro

225 230 235 240

Ala Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser Phe

245 250 255

Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys Gln

260 265 270

His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln Arg Ile

275 280 285

Thr Asn Lys Gly Trp Arg Leu Ala Tyr Phe Leu Ala Ser Glu Ser Ile

290 295 300

Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp Val Ser Phe Ser

305 310 315 320

Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu

325 330

<210> 16

<211> 378

<212> PRT

<213> Artificial Sequence

<220>

<223> mAPE12g (D327A)

<400> 16

Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys Lys

1 5 10 15

Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser Leu

20 25 30

Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly Asp

35 40 45

Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr Trp

50 55 60

Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala Phe

65 70 75 80

Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln Glu

85 90 95

Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro Gly

100 105 110

Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val Val

115 120 125

Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp Trp

130 135 140

Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys Lys

145 150 155 160

Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser Ser

165 170 175

Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe

180 185 190

Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu

195 200 205

Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe

210 215 220

Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn Val

225 230 235 240

Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser Ala

245 250 255

Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln Pro

260 265 270

Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser Gln

275 280 285

Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp Met

290 295 300

Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg Gly

305 310 315 320

Lys Arg Met Arg Ile Ala Tyr Phe Leu Val Ser Glu Lys Leu Lys Asp

325 330 335

Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu Gly

340 345 350

Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu Glu

355 360 365

Val Glu Ala Pro Lys Pro Lys Ser Ser Asn

370 375

<210> 17

<211> 494

<212> PRT

<213> Artificial Sequence

<220>

<223> hRad18

<400> 17

Asp Ser Leu Ala Glu Ser Arg Trp Pro Pro Gly Leu Ala Val Met Lys

1 5 10 15

Thr Ile Asp Asp Leu Leu Arg Cys Gly Ile Cys Phe Glu Tyr Phe Asn

20 25 30

Ile Ala Met Ile Ile Pro Gln Cys Ser His Asn Tyr Cys Ser Leu Cys

35 40 45

Ile Arg Lys Phe Leu Ser Tyr Lys Thr Gln Cys Pro Thr Cys Cys Val

50 55 60

Thr Val Thr Glu Pro Asp Leu Lys Asn Asn Arg Ile Leu Asp Glu Leu

65 70 75 80

Val Lys Ser Leu Asn Phe Ala Arg Asn His Leu Leu Gln Phe Ala Leu

85 90 95

Glu Ser Pro Ala Lys Ser Pro Ala Ser Ser Ser Ser Lys Asn Leu Ala

100 105 110

Val Lys Val Tyr Thr Pro Val Ala Ser Arg Gln Ser Leu Lys Gln Gly

115 120 125

Ser Arg Leu Met Asp Asn Phe Leu Ile Arg Glu Met Ser Gly Ser Thr

130 135 140

Ser Glu Leu Leu Ile Lys Glu Asn Lys Ser Lys Phe Ser Pro Gln Lys

145 150 155 160

Glu Ala Ser Pro Ala Ala Lys Thr Lys Glu Thr Arg Ser Val Glu Glu

165 170 175

Ile Ala Pro Asp Pro Ser Glu Ala Lys Arg Pro Glu Pro Pro Ser Thr

180 185 190

Ser Thr Leu Lys Gln Val Thr Lys Val Asp Cys Pro Val Cys Gly Val

195 200 205

Asn Ile Pro Glu Ser His Ile Asn Lys His Leu Asp Ser Cys Leu Ser

210 215 220

Arg Glu Glu Lys Lys Glu Ser Leu Arg Ser Ser Val His Lys Arg Lys

225 230 235 240

Pro Leu Pro Lys Thr Val Tyr Asn Leu Leu Ser Asp Arg Asp Leu Lys

245 250 255

Lys Lys Leu Lys Glu His Gly Leu Ser Ile Gln Gly Asn Lys Gln Gln

260 265 270

Leu Ile Lys Arg His Gln Glu Phe Val His Met Tyr Asn Ala Gln Cys

275 280 285

Asp Ala Leu His Pro Lys Ser Ala Ala Glu Ile Val Arg Glu Ile Glu

290 295 300

Asn Ile Glu Lys Thr Arg Met Arg Leu Glu Ala Ser Lys Leu Asn Glu

305 310 315 320

Ser Val Met Val Phe Thr Lys Asp Gln Thr Glu Lys Glu Ile Asp Glu

325 330 335

Ile His Ser Lys Tyr Arg Lys Lys His Lys Ser Glu Phe Gln Leu Leu

340 345 350

Val Asp Gln Ala Arg Lys Gly Tyr Lys Lys Ile Ala Gly Met Ser Gln

355 360 365

Lys Thr Val Thr Ile Thr Lys Glu Asp Glu Ser Thr Glu Lys Leu Ser

370 375 380

Ser Val Cys Met Gly Gln Glu Asp Asn Met Thr Ser Val Thr Asn His

385 390 395 400

Phe Ser Gln Ser Lys Leu Asp Ser Pro Glu Glu Leu Glu Pro Asp Arg

405 410 415

Glu Glu Asp Ser Ser Ser Cys Ile Asp Ile Gln Glu Val Leu Ser Ser

420 425 430

Ser Glu Ser Asp Ser Cys Asn Ser Ser Ser Ser Asp Ile Ile Arg Asp

435 440 445

Leu Leu Glu Glu Glu Glu Ala Trp Glu Ala Ser His Lys Asn Asp Leu

450 455 460

Gln Asp Thr Glu Ile Ser Pro Arg Gln Asn Arg Arg Thr Arg Ala Ala

465 470 475 480

Glu Ser Ala Glu Ile Glu Pro Arg Asn Lys Arg Asn Arg Asn

485 490

<210> 18

<211> 332

<212> PRT

<213> Artificial Sequence

<220>

<223> APE01g

<400> 18

Met Ser Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val

1 5 10 15

Ala Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser

20 25 30

Gln His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu Val

35 40 45

His Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys Thr Met

50 55 60

Arg Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys Ile Leu Ser

65 70 75 80

Trp Asn Ile Asn Gly Leu His Asp Val Val Thr Thr Lys Gly Phe Ser

85 90 95

Ala Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp Val Leu Cys Leu Gln

100 105 110

Glu Thr His Leu Glu Glu Lys Asp Val Glu Lys Phe Lys Asn Leu Ile

115 120 125

Ala Asp Tyr Asp Ser Tyr Trp Ser Cys Ser Val Ser Arg Leu Gly Tyr

130 135 140

Ser Gly Thr Ala Val Ile Ser Arg Val Lys Pro Ile Ser Val Gln Tyr

145 150 155 160

Gly Ile Gly Ile Arg Glu His Asp His Glu Gly Arg Val Ile Thr Leu

165 170 175

Glu Phe Asp Gly Phe Tyr Leu Val Asn Ala Tyr Val Pro Asn Ser Gly

180 185 190

Arg Phe Leu Arg Arg Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys

195 200 205

Phe Ser Asn Tyr Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val

210 215 220

Ala Gly Asp Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro

225 230 235 240

Pro Ala Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser

245 250 255

Phe Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys

260 265 270

Gln His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln Arg

275 280 285

Ile Thr Asn Lys Gly Trp Arg Leu Asp Tyr Phe Leu Ala Ser Glu Ser

290 295 300

Ile Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp Val Ser Phe

305 310 315 320

Ser Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu

325 330

<210> 19

<211> 379

<212> PRT

<213> Artificial Sequence

<220>

<223> APE12g

<400> 19

Met Lys Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys

1 5 10 15

Lys Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser

20 25 30

Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu Gly

35 40 45

Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe Val Thr

50 55 60

Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp Trp Pro Ala

65 70 75 80

Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val Ile Cys Val Gln

85 90 95

Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly Ala Pro Lys Asn Pro

100 105 110

Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser Arg Asp Glu Lys Gln Val

115 120 125

Val Leu Arg Ala Leu Ser Ser Pro Pro Phe Lys Asp Tyr Arg Val Trp

130 135 140

Trp Ser Leu Ser Asp Ser Lys Tyr Ala Gly Thr Ala Met Ile Ile Lys

145 150 155 160

Lys Lys Phe Glu Pro Lys Lys Val Ser Phe Asn Leu Asp Arg Thr Ser

165 170 175

Ser Lys His Glu Pro Asp Gly Arg Val Ile Ile Ala Glu Phe Glu Ser

180 185 190

Phe Leu Leu Leu Asn Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu

195 200 205

Glu Asn Ser Phe Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu

210 215 220

Phe Val Gln Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn

225 230 235 240

Val Ser His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser

245 250 255

Ala Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln

260 265 270

Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu Ser

275 280 285

Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu Lys Asp

290 295 300

Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly Lys Tyr Arg

305 310 315 320

Gly Lys Arg Met Arg Ile Asp Tyr Phe Leu Val Ser Glu Lys Leu Lys

325 330 335

Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg Gly Ile Glu Leu Glu

340 345 350

Gly Phe Tyr Gly Ser Asp His Cys Pro Val Ser Leu Glu Leu Ser Glu

355 360 365

Glu Val Glu Ala Pro Lys Pro Lys Ser Ser Asn

370 375

<210> 20

<211> 263

<212> PRT

<213> Artificial Sequence

<220>

<223> OsPCNA (wt)

<400> 20

Met Leu Glu Leu Arg Leu Val Gln Gly Ser Leu Leu Lys Lys Val Leu

1 5 10 15

Glu Ala Ile Arg Glu Leu Val Thr Asp Ala Asn Phe Asp Cys Ser Gly

20 25 30

Thr Gly Phe Ser Leu Gln Ala Met Asp Ser Ser His Val Ala Leu Val

35 40 45

Ala Leu Leu Leu Arg Ser Glu Gly Phe Glu His Tyr Arg Cys Asp Arg

50 55 60

Asn Leu Ser Met Gly Met Asn Leu Asn Asn Met Ala Lys Met Leu Arg

65 70 75 80

Cys Ala Gly Asn Asp Asp Ile Ile Thr Ile Lys Ala Asp Asp Gly Ser

85 90 95

Asp Thr Val Thr Phe Met Phe Glu Ser Pro Asn Gln Asp Lys Ile Ala

100 105 110

Asp Phe Glu Met Lys Leu Met Asp Ile Asp Ser Glu His Leu Gly Ile

115 120 125

Pro Asp Ser Glu Tyr Gln Ala Ile Val Arg Met Pro Ser Ser Glu Phe

130 135 140

Ser Arg Ile Cys Lys Asp Leu Ser Ser Ile Gly Asp Thr Val Ile Ile

145 150 155 160

Ser Val Thr Lys Glu Gly Val Lys Phe Ser Thr Ala Gly Asp Ile Gly

165 170 175

Thr Ala Asn Ile Val Cys Arg Gln Asn Lys Thr Val Asp Lys Pro Glu

180 185 190

Asp Ala Thr Ile Ile Glu Met Gln Glu Pro Val Ser Leu Thr Phe Ala

195 200 205

Leu Arg Tyr Met Asn Ser Phe Thr Lys Ala Ser Pro Leu Ser Glu Gln

210 215 220

Val Thr Ile Ser Leu Ser Ser Glu Leu Pro Val Val Val Glu Tyr Lys

225 230 235 240

Ile Ala Glu Met Gly Tyr Ile Arg Phe Tyr Leu Ala Pro Lys Ile Glu

245 250 255

Glu Asp Glu Glu Met Lys Ser

260

<210> 21

<211> 2204

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary fusion protein of PCGBE-3 System

<400> 21

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly

1835 1840 1845

Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Ser

1850 1855 1860

Ala Ile Arg Ala Ser Ser His Arg Leu Gln Thr Arg Thr Val Ala

1865 1870 1875

Leu Thr Arg Thr Lys Met Ser Ser Met Ala Gly Leu Gly Ala Ser

1880 1885 1890

Gln His Gly Tyr Pro Pro Arg Ser His Glu Pro Trp Thr Lys Leu

1895 1900 1905

Val His Arg Glu Arg Leu Pro Glu Trp Phe Ala Tyr Asn Pro Lys

1910 1915 1920

Thr Met Arg Pro Pro Pro Leu Ser His Asp Thr Lys Cys Met Lys

1925 1930 1935

Ile Leu Ser Trp Asn Ile Asn Gly Leu His Asp Val Val Thr Thr

1940 1945 1950

Lys Gly Phe Ser Ala Arg Asp Leu Ala Gln Arg Glu Asn Phe Asp

1955 1960 1965

Val Leu Cys Leu Gln Glu Thr His Leu Glu Glu Lys Asp Val Glu

1970 1975 1980

Lys Phe Lys Asn Leu Ile Ala Asp Tyr Asp Ser Tyr Trp Ser Cys

1985 1990 1995

Ser Val Ser Arg Leu Gly Tyr Ser Gly Thr Ala Val Ile Ser Arg

2000 2005 2010

Val Lys Pro Ile Ser Val Gln Tyr Gly Ile Gly Ile Arg Glu His

2015 2020 2025

Asp His Glu Gly Arg Val Ile Thr Leu Glu Phe Asp Gly Phe Tyr

2030 2035 2040

Leu Val Asn Ala Tyr Val Pro Asn Ser Gly Arg Phe Leu Arg Arg

2045 2050 2055

Leu Asn Tyr Arg Val Asn Asn Trp Asp Pro Cys Phe Ser Asn Tyr

2060 2065 2070

Val Lys Ile Leu Glu Lys Ser Lys Pro Val Ile Val Ala Gly Asp

2075 2080 2085

Leu Asn Cys Ala Arg Gln Ser Ile Asp Ile His Asn Pro Pro Ala

2090 2095 2100

Lys Thr Lys Ser Ala Gly Phe Thr Ile Glu Glu Arg Glu Ser Phe

2105 2110 2115

Glu Thr Asn Phe Ser Ser Lys Gly Leu Val Asp Thr Phe Arg Lys

2120 2125 2130

Gln His Pro Asn Ala Val Gly Tyr Thr Phe Trp Gly Glu Asn Gln

2135 2140 2145

Arg Ile Thr Asn Lys Gly Trp Arg Leu Ala Tyr Phe Leu Ala Ser

2150 2155 2160

Glu Ser Ile Thr Asp Lys Val His Asp Ser Tyr Ile Leu Pro Asp

2165 2170 2175

Val Ser Phe Ser Asp His Ser Pro Ile Gly Leu Val Leu Lys Leu

2180 2185 2190

Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val

2195 2200

<210> 22

<211> 2251

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary fusion protein of PCGBE-4 System

<400> 22

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly

1835 1840 1845

Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Lys

1850 1855 1860

Arg Phe Phe Gln Pro Val Pro Lys Asp Gly Ser Pro Ala Lys Lys

1865 1870 1875

Arg Pro Ala Ala Ala Ala Ala Ala Ser Ala Ser Asp Ser Asp Ser

1880 1885 1890

Leu Gly Gly Asp Ala Pro Ala Ala Ala Ala Cys Ala Val Gly Glu

1895 1900 1905

Gly Asp Ser Pro Pro Ala Pro Arg Glu Glu Glu Pro Arg Arg Phe

1910 1915 1920

Val Thr Trp Asn Ala Asn Ser Leu Leu Leu Arg Met Lys Ser Asp

1925 1930 1935

Trp Pro Ala Phe Cys Gln Phe Val Ser Arg Val Asp Pro Asp Val

1940 1945 1950

Ile Cys Val Gln Glu Val Arg Met Pro Ala Ala Gly Ser Lys Gly

1955 1960 1965

Ala Pro Lys Asn Pro Gly Gln Leu Lys Asp Asp Thr Ser Ser Ser

1970 1975 1980

Arg Asp Glu Lys Gln Val Val Leu Arg Ala Leu Ser Ser Pro Pro

1985 1990 1995

Phe Lys Asp Tyr Arg Val Trp Trp Ser Leu Ser Asp Ser Lys Tyr

2000 2005 2010

Ala Gly Thr Ala Met Ile Ile Lys Lys Lys Phe Glu Pro Lys Lys

2015 2020 2025

Val Ser Phe Asn Leu Asp Arg Thr Ser Ser Lys His Glu Pro Asp

2030 2035 2040

Gly Arg Val Ile Ile Ala Glu Phe Glu Ser Phe Leu Leu Leu Asn

2045 2050 2055

Thr Tyr Ala Pro Asn Asn Gly Trp Lys Glu Glu Glu Asn Ser Phe

2060 2065 2070

Gln Arg Arg Arg Lys Trp Asp Lys Arg Met Leu Glu Phe Val Gln

2075 2080 2085

Gln Val Asp Lys Pro Leu Ile Trp Cys Gly Asp Leu Asn Val Ser

2090 2095 2100

His Glu Glu Ile Asp Val Ser His Pro Asp Phe Phe Ser Ser Ala

2105 2110 2115

Lys Leu Asn Gly Tyr Ile Pro Pro Asn Lys Glu Asp Cys Gly Gln

2120 2125 2130

Pro Gly Phe Thr Leu Ser Glu Arg Arg Arg Phe Gly Asn Ile Leu

2135 2140 2145

Ser Gln Gly Lys Leu Val Asp Ala Tyr Arg Tyr Leu His Lys Glu

2150 2155 2160

Lys Asp Met Asp Cys Gly Phe Ser Trp Ser Gly His Pro Ile Gly

2165 2170 2175

Lys Tyr Arg Gly Lys Arg Met Arg Ile Ala Tyr Phe Leu Val Ser

2180 2185 2190

Glu Lys Leu Lys Asp Gln Ile Val Ser Cys Asp Ile His Gly Arg

2195 2200 2205

Gly Ile Glu Leu Glu Gly Phe Tyr Gly Ser Asp His Cys Pro Val

2210 2215 2220

Ser Leu Glu Leu Ser Glu Glu Val Glu Ala Pro Lys Pro Lys Ser

2225 2230 2235

Ser Asn Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys Val

2240 2245 2250

<210> 23

<211> 2367

<212> PRT

<213> Artificial Sequence

<220>

<223> exemplary fusion protein of the PCGBE-5 system

<400> 23

Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro His

1 5 10 15

Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr

20 25 30

Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn Gly Thr Ser Val Lys Met

35 40 45

Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys

50 55 60

Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro

65 70 75 80

Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp Phe Ile

85 90 95

Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala

100 105 110

Phe Leu Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg

115 120 125

Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg

130 135 140

Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His

145 150 155 160

Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp

165 170 175

Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala

180 185 190

Ile Leu Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser

195 200 205

Glu Ser Ala Thr Pro Glu Ser Arg Pro Asp Lys Lys Tyr Ser Ile Gly

210 215 220

Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu

225 230 235 240

Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg

245 250 255

His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly

260 265 270

Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr

275 280 285

Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn

290 295 300

Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser

305 310 315 320

Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly

325 330 335

Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr

340 345 350

His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg

355 360 365

Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe

370 375 380

Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu

385 390 395 400

Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro

405 410 415

Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu

420 425 430

Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu

435 440 445

Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu

450 455 460

Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu

465 470 475 480

Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala

485 490 495

Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu

500 505 510

Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile

515 520 525

Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His

530 535 540

His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro

545 550 555 560

Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala

565 570 575

Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile

580 585 590

Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys

595 600 605

Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly

610 615 620

Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg

625 630 635 640

Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile

645 650 655

Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala

660 665 670

Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr

675 680 685

Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala

690 695 700

Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn

705 710 715 720

Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val

725 730 735

Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys

740 745 750

Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu

755 760 765

Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr

770 775 780

Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu

785 790 795 800

Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile

805 810 815

Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu

820 825 830

Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile

835 840 845

Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met

850 855 860

Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg

865 870 875 880

Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu

885 890 895

Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu

900 905 910

Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln

915 920 925

Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala

930 935 940

Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val

945 950 955 960

Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val

965 970 975

Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn

980 985 990

Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly

995 1000 1005

Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln

1010 1015 1020

Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met

1025 1030 1035

Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp

1040 1045 1050

Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile

1055 1060 1065

Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser

1070 1075 1080

Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr

1085 1090 1095

Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe

1100 1105 1110

Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp

1115 1120 1125

Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile

1130 1135 1140

Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys

1145 1150 1155

Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr

1160 1165 1170

Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe

1175 1180 1185

Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala

1190 1195 1200

Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro

1205 1210 1215

Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp

1220 1225 1230

Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala

1235 1240 1245

Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys

1250 1255 1260

Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu

1265 1270 1275

Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly

1280 1285 1290

Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val

1295 1300 1305

Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys

1310 1315 1320

Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg

1325 1330 1335

Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro

1340 1345 1350

Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly

1355 1360 1365

Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr

1370 1375 1380

Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu

1385 1390 1395

Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys

1400 1405 1410

Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg

1415 1420 1425

Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala

1430 1435 1440

Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr

1445 1450 1455

Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu

1460 1465 1470

Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln

1475 1480 1485

Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu

1490 1495 1500

Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile

1505 1510 1515

Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn

1520 1525 1530

Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp

1535 1540 1545

Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu

1550 1555 1560

Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu

1565 1570 1575

Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala Thr Lys Lys Ala

1580 1585 1590

Gly Gln Ala Lys Lys Lys Lys Gly Thr Asp Ser Gly Gly Ser Ala

1595 1600 1605

Asn Glu Leu Thr Trp His Asp Val Leu Ala Glu Glu Lys Gln Gln

1610 1615 1620

Pro Tyr Phe Leu Asn Thr Leu Gln Thr Val Ala Ser Glu Arg Gln

1625 1630 1635

Ser Gly Val Thr Ile Tyr Pro Pro Gln Lys Asp Val Phe Asn Ala

1640 1645 1650

Phe Arg Phe Thr Glu Leu Gly Asp Val Lys Val Val Ile Leu Gly

1655 1660 1665

Gln Asp Pro Tyr His Gly Pro Gly Gln Ala His Gly Leu Ala Phe

1670 1675 1680

Ser Val Arg Pro Gly Ile Ala Ile Pro Pro Ser Leu Leu Asn Met

1685 1690 1695

Tyr Lys Glu Leu Glu Asn Thr Ile Pro Gly Phe Thr Arg Pro Asn

1700 1705 1710

His Gly Tyr Leu Glu Ser Trp Ala Arg Gln Gly Val Leu Leu Leu

1715 1720 1725

Asn Thr Val Leu Thr Val Arg Ala Gly Gln Ala His Ser His Ala

1730 1735 1740

Ser Leu Gly Trp Glu Thr Phe Thr Asp Lys Val Ile Ser Leu Ile

1745 1750 1755

Asn Gln His Arg Glu Gly Val Val Phe Leu Leu Trp Gly Ser His

1760 1765 1770

Ala Gln Lys Lys Gly Ala Ile Ile Asp Lys Gln Arg His His Val

1775 1780 1785

Leu Lys Ala Pro His Pro Ser Pro Leu Ser Ala His Arg Gly Phe

1790 1795 1800

Phe Gly Cys Asn His Phe Val Leu Ala Asn Gln Trp Leu Glu Gln

1805 1810 1815

Arg Gly Glu Thr Pro Ile Asp Trp Met Pro Val Leu Pro Ala Glu

1820 1825 1830

Ser Glu Pro Lys Lys Lys Arg Lys Val Leu Lys Glu Gly Arg Gly

1835 1840 1845

Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly Pro Asp

1850 1855 1860

Ser Leu Ala Glu Ser Arg Trp Pro Pro Gly Leu Ala Val Met Lys

1865 1870 1875

Thr Ile Asp Asp Leu Leu Arg Cys Gly Ile Cys Phe Glu Tyr Phe

1880 1885 1890

Asn Ile Ala Met Ile Ile Pro Gln Cys Ser His Asn Tyr Cys Ser

1895 1900 1905

Leu Cys Ile Arg Lys Phe Leu Ser Tyr Lys Thr Gln Cys Pro Thr

1910 1915 1920

Cys Cys Val Thr Val Thr Glu Pro Asp Leu Lys Asn Asn Arg Ile

1925 1930 1935

Leu Asp Glu Leu Val Lys Ser Leu Asn Phe Ala Arg Asn His Leu

1940 1945 1950

Leu Gln Phe Ala Leu Glu Ser Pro Ala Lys Ser Pro Ala Ser Ser

1955 1960 1965

Ser Ser Lys Asn Leu Ala Val Lys Val Tyr Thr Pro Val Ala Ser

1970 1975 1980

Arg Gln Ser Leu Lys Gln Gly Ser Arg Leu Met Asp Asn Phe Leu

1985 1990 1995

Ile Arg Glu Met Ser Gly Ser Thr Ser Glu Leu Leu Ile Lys Glu

2000 2005 2010

Asn Lys Ser Lys Phe Ser Pro Gln Lys Glu Ala Ser Pro Ala Ala

2015 2020 2025

Lys Thr Lys Glu Thr Arg Ser Val Glu Glu Ile Ala Pro Asp Pro

2030 2035 2040

Ser Glu Ala Lys Arg Pro Glu Pro Pro Ser Thr Ser Thr Leu Lys

2045 2050 2055

Gln Val Thr Lys Val Asp Cys Pro Val Cys Gly Val Asn Ile Pro

2060 2065 2070

Glu Ser His Ile Asn Lys His Leu Asp Ser Cys Leu Ser Arg Glu

2075 2080 2085

Glu Lys Lys Glu Ser Leu Arg Ser Ser Val His Lys Arg Lys Pro

2090 2095 2100

Leu Pro Lys Thr Val Tyr Asn Leu Leu Ser Asp Arg Asp Leu Lys

2105 2110 2115

Lys Lys Leu Lys Glu His Gly Leu Ser Ile Gln Gly Asn Lys Gln

2120 2125 2130

Gln Leu Ile Lys Arg His Gln Glu Phe Val His Met Tyr Asn Ala

2135 2140 2145

Gln Cys Asp Ala Leu His Pro Lys Ser Ala Ala Glu Ile Val Arg

2150 2155 2160

Glu Ile Glu Asn Ile Glu Lys Thr Arg Met Arg Leu Glu Ala Ser

2165 2170 2175

Lys Leu Asn Glu Ser Val Met Val Phe Thr Lys Asp Gln Thr Glu

2180 2185 2190

Lys Glu Ile Asp Glu Ile His Ser Lys Tyr Arg Lys Lys His Lys

2195 2200 2205

Ser Glu Phe Gln Leu Leu Val Asp Gln Ala Arg Lys Gly Tyr Lys

2210 2215 2220

Lys Ile Ala Gly Met Ser Gln Lys Thr Val Thr Ile Thr Lys Glu

2225 2230 2235

Asp Glu Ser Thr Glu Lys Leu Ser Ser Val Cys Met Gly Gln Glu

2240 2245 2250

Asp Asn Met Thr Ser Val Thr Asn His Phe Ser Gln Ser Lys Leu

2255 2260 2265

Asp Ser Pro Glu Glu Leu Glu Pro Asp Arg Glu Glu Asp Ser Ser

2270 2275 2280

Ser Cys Ile Asp Ile Gln Glu Val Leu Ser Ser Ser Glu Ser Asp

2285 2290 2295

Ser Cys Asn Ser Ser Ser Ser Asp Ile Ile Arg Asp Leu Leu Glu

2300 2305 2310

Glu Glu Glu Ala Trp Glu Ala Ser His Lys Asn Asp Leu Gln Asp

2315 2320 2325

Thr Glu Ile Ser Pro Arg Gln Asn Arg Arg Thr Arg Ala Ala Glu

2330 2335 2340

Ser Ala Glu Ile Glu Pro Arg Asn Lys Arg Asn Arg Asn Ser Gly

2345 2350 2355

Gly Ser Pro Lys Lys Lys Arg Lys Val

2360 2365
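The listings above follow a fixed layout:

- each residue is given as a three-letter code;
- position markers appear under every fifth residue;
- the total length is declared in field <211>.

The sketch below converts such a block to one-letter code, so the result can be checked against the declared length or searched with ordinary string tools. It is a minimal Python sketch and assumes that only the 20 standard residue codes occur; codes such as Xaa are rejected. The block passed in should contain only residue lines and position-number lines, not the <210> to <400> header fields. The function names `listing_to_one_letter` and `check_length` are illustrative and do not belong to any sequence-listing tool.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(block: str) -> str:
    """Convert a three-letter residue block to one-letter code.

    Blank lines and lines made only of position numbers are skipped;
    any other token must be one of the 20 standard residue codes."""
    residues = []
    for line in block.splitlines():
        tokens = line.split()
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue  # blank line or position-number line
        for tok in tokens:
            if tok not in THREE_TO_ONE:
                raise ValueError(f"unrecognized residue code: {tok!r}")
            residues.append(THREE_TO_ONE[tok])
    return "".join(residues)


def check_length(seq: str, declared: int) -> bool:
    """Compare the parsed length with the length declared in <211>."""
    return len(seq) == declared
```

For example, applied to the final line of SEQ ID NO: 23, `listing_to_one_letter("Gly Ser Pro Lys Lys Lys Arg Lys Val\n\n2360 2365")` returns `"GSPKKKRKV"`.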

Claims

1. A C to G base editing system for editing a target sequence in the genome of a cell, comprising:

2. The C to G base editing system of claim 1, further comprising:

i) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding a second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein;

3. The C to G base editing system of claim 2, wherein the C to G base editing system comprises any one of the following i) to v):

i) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein and a uracil-DNA glycosylase (UDG);

a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding a second polypeptide, wherein said second polypeptide comprises a proliferating cell nuclear antigen (PCNA) with a mutated ubiquitin protein binding site and a ubiquitin protein;

ii) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);

a third polypeptide and/or an expression construct comprising a nucleotide sequence encoding a third polypeptide, wherein the third polypeptide comprises a mutated AP endonuclease (APE);

iii) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);

iv) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding a first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a uracil-DNA glycosylase (UDG);

a fourth polypeptide and/or an expression construct comprising a nucleotide sequence encoding a fourth polypeptide, wherein said fourth polypeptide comprises a Rad18 protein;

v) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide, wherein the first polypeptide comprises a cytosine deaminase, a nuclease-inactivated CRISPR effector protein, and a Rad18 protein;

4. The C to G base editing system of any one of claims 1 to 3, wherein the cytosine deaminase is selected from the group consisting of APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1 and human APOBEC3A deaminase;

preferably, the cytosine deaminase is a human APOBEC3A deaminase, e.g., having the amino acid sequence shown in SEQ ID NO: 1.

5. The C to G base editing system of any one of claims 1 to 4, wherein the nuclease-inactivated CRISPR effector protein is a nuclease-inactivated Cas9, preferably a Cas9 nickase (nCas9), e.g., the nuclease-inactivated CRISPR effector protein comprises the amino acid sequence shown in SEQ ID NO: 3.

6. The C to G base editing system of any one of claims 1 to 5, wherein the UDG has the amino acid sequence set forth in SEQ ID NO: 5.

7. The C to G base editing system of any one of claims 2 to 6, wherein the PCNA in which the ubiquitin protein binding site is mutated comprises the amino acid sequence shown in SEQ ID NO: 9.

8. The C to G base editing system of any one of claims 2 to 7, wherein the second polypeptide comprises one ubiquitin protein fused to the PCNA in which the ubiquitin protein binding site is mutated.

9. The C to G base editing system of any one of claims 2 to 8, wherein the ubiquitin protein is a truncated ubiquitin protein, e.g., the truncated ubiquitin protein comprises the amino acid sequence set forth in SEQ ID NO: 10.

10. The C to G base editing system of any one of claims 2 to 9, wherein the second polypeptide further comprises an MS2 coat protein (MCP); e.g., the MCP is fused to the N-terminus of the PCNA in which the ubiquitin protein binding site is mutated.

11. The C to G base editing system of claim 10, wherein the MCP comprises the amino acid sequence shown in SEQ ID NO: 7.

12. The C to G base editing system of any one of claims 2 to 11, wherein the mutated APE:

i) is derived from rice APE01g and comprises the amino acid substitution D297A relative to wild-type APE01g, the amino acid position being numbered with reference to SEQ ID NO: 18;

ii) is derived from rice APE12g and comprises the amino acid substitution D327A relative to wild-type APE12g, the amino acid position being numbered with reference to SEQ ID NO: 19; or

iii) is derived from rice APE12g and comprises the amino acid substitutions D238A and N240V relative to wild-type APE12g, the amino acid positions being numbered with reference to SEQ ID NO: 19.

13. The C to G base editing system of claim 12, wherein the mutated APE comprises the amino acid sequence shown in SEQ ID NO: 11, 15 or 16.

14. The C to G base editing system of any one of claims 2 to 13, wherein the Rad18 protein is a human Rad18 protein, e.g., the Rad18 protein comprises the amino acid sequence set forth in SEQ ID NO: 17.

15. The C to G base editing system of any one of claims 1 to 14, wherein the expression construct comprises a nucleotide sequence encoding the amino acid sequence set forth in any one of SEQ ID NOs: 12 to 14 and 21 to 23.

16. A method of producing a genetically modified cell, comprising introducing into a cell the C to G base editing system of any one of claims 1 to 15.

17. The method of claim 16, wherein the genetic modification is one or more C-to-G substitutions in the target sequence.

18. The method of claim 16 or 17, wherein the cell is derived from, for example, a mammal, such as a human, mouse, rat, monkey, dog, pig, sheep, cow or cat; poultry, such as chicken, duck or goose; or a plant, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut or Arabidopsis.

19. A kit comprising the C to G base editing system of any one of claims 1 to 15 and instructions for use.
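Claim 12 defines each APE variant by substitutions in the usual wild-type residue, position, replacement residue notation. For example, D297A means the Asp at position 297 is replaced by Ala, with positions counted on the reference sequence named by the cited SEQ ID NO.

The sketch below applies such substitutions to a one-letter sequence. It first checks that the stated wild-type residue is actually present, which catches numbering against the wrong reference. It is a minimal Python sketch; the function name `apply_substitutions` is hypothetical. The reference sequences SEQ ID NOs: 18 and 19 are not reproduced in this section, so the example uses a toy sequence.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. D297A).
SUB_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(reference: str, subs: list[str]) -> str:
    """Return a copy of `reference` carrying every substitution in `subs`.

    Positions are 1-based and interpreted against `reference`, which must
    therefore be the sequence that the numbering refers to."""
    residues = list(reference)
    for sub in subs:
        match = SUB_PATTERN.match(sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        wild_type, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{sub}: position outside the reference sequence")
        if residues[pos - 1] != wild_type:
            # The reference does not carry the stated wild-type residue here.
            raise ValueError(
                f"{sub}: reference has {residues[pos - 1]} at {pos}, not {wild_type}"
            )
        residues[pos - 1] = new
    return "".join(residues)
```

With the toy sequence "MKDLNV", `apply_substitutions("MKDLNV", ["D3A", "N5V"])` returns `"MKALVV"`. This mirrors the form of the two-site variant in claim 12 iii), though not its actual positions.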