CN112175927A - Base editing tool and application thereof - Google Patents

Publication number
CN112175927A
Authority
CN
China
Prior art keywords
leu
lys
glu
fragment
asp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910591202.0A
Other languages
Chinese (zh)
Other versions
CN112175927B (en
Inventor
刘馨怡
李广磊
黄行许
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University
Priority to CN201910591202.0A
Publication of CN112175927A
Application granted
Publication of CN112175927B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04002 Adenine deaminase (3.5.4.2)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/10 Plasmid DNA
    • C12N2800/106 Plasmid DNA for vertebrates
    • C12N2800/107 Plasmid DNA for vertebrates for mammalian
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2810/00 Vectors comprising a targeting moiety
    • C12N2810/10 Vectors comprising a non-peptidic targeting moiety

Abstract

The invention relates to the technical field of biology, and in particular to a base editing tool and its application. The invention provides a fusion protein comprising an ecTadA-ecTadA dimer fragment and an xCas9n fragment, wherein the ecTadA-ecTadA dimer fragment comprises two ecTadA fragments. By fusing a nuclear localization element and a DNA double-strand-break binding protein to an xCas9-based base editor, the invention provides a base editing tool that efficiently recognizes sites whose PAM sequence is NG, GAA or GAT and can therefore target a broader portion of the genome accurately. In addition, the fusion protein has efficient and accurate single-base editing capability, yields editing products of high purity, and has good prospects for industrial application.

Description

Base editing tool and application thereof
Technical Field
The invention relates to the technical field of biology, in particular to a base editing tool and application thereof.
Background
The advent and development of CRISPR/Cas technology have revolutionized the field of gene editing. The CRISPR/Cas9 system has been applied successfully to gene editing studies in many species, including DNA knock-out and knock-in, molecular labeling, and regulation of gene transcription. Guided by a single guide RNA (sgRNA), the Cas9 protein reaches the designated region and uses its nuclease activity to cleave both DNA strands, producing a DNA double-strand break (DSB). DSBs activate the cell's own DNA repair mechanisms, including non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ is the main repair pathway; it produces insertions and deletions (indels) that cause frameshift mutations and thereby gene knock-out. HDR can achieve knock-in when a homologous template is present, but precise editing by this route is difficult in practice because HDR is inefficient, time-consuming and laborious.
To address these problems, David Liu et al. at Harvard University developed base editing tools (base editors) that achieve single-base substitution without causing DSBs by fusing a Cas9 nickase (Cas9n) to a nucleotide deaminase. The existing cytosine base editors (CBE) and adenine base editors (ABE) achieve C-to-T and A-to-G base substitution, respectively. When a CRISPR/Cas system targets the genome, it must recognize a protospacer adjacent motif (PAM), so the PAM requirement is one of the barriers limiting the application of existing base editing systems. In earlier work, Cas9 from Streptococcus pyogenes (SpCas9) was modified so that it recognizes NG, GAA and GAT as PAM sequences; the modified Cas9 was named xCas9. Compared with SpCas9-based base editing tools (BE3 and ABE), xCas9-based base editors (including xBE3 and xABE) can recognize more target sites in the genome, but the editing efficiency of xBE3 and xABE is relatively low. Improving the editing efficiency of xBE3 and xABE by fusing a nuclear localization element and a DNA double-strand-break binding protein is therefore of practical importance.
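The PAM constraint described above can be illustrated with a minimal Python sketch that scans one strand of a DNA sequence for protospacers followed immediately by an NG, GAA or GAT PAM. The function name `find_xcas9_targets`, the 20-nt protospacer length and the single-strand scan are assumptions made for illustration only; they are not taken from this specification.

```python
def find_xcas9_targets(seq, spacer_len=20):
    """Return (start, protospacer, pam) for every protospacer on the given
    strand that is followed directly by an NG, GAA or GAT PAM.

    Assumptions (illustrative, not from the specification): a 20-nt
    protospacer, and only the supplied strand is scanned.
    """
    seq = seq.upper()
    hits = []
    # i is the index of the first PAM base, directly 3' of the protospacer.
    for i in range(spacer_len, len(seq) - 1):
        pam = None
        if seq[i + 1] == "G":                    # NG (N = any base)
            pam = seq[i:i + 2]
        elif seq[i:i + 3] in ("GAA", "GAT"):     # GAA or GAT
            pam = seq[i:i + 3]
        if pam is not None:
            hits.append((i - spacer_len, seq[i - spacer_len:i], pam))
    return hits


if __name__ == "__main__":
    toy = "C" * 20 + "GAT"   # toy sequence, not a real genomic site
    for start, spacer, pam in find_xcas9_targets(toy):
        print(start, spacer, pam)
```

Scanning the reverse complement in the same way would cover the opposite strand; that step is omitted here to keep the sketch short.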
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, it is an object of the present invention to provide a base editing tool and its use for solving the problems of the prior art.
To achieve the above and other related objects, one aspect of the present invention provides a fusion protein comprising an ecTadA-ecTadA dimer fragment and an xCas9n fragment, wherein the ecTadA-ecTadA dimer fragment comprises two ecTadA fragments.
In some embodiments of the invention, the amino acid sequence of one ecTadA fragment comprises:
a) the amino acid sequence shown in SEQ ID NO.32; or
b) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.32, and having the function of the amino acid sequence defined in a), preferably capable of forming a dimer with an ecTadA fragment, the dimer having adenine deaminase activity.
In some embodiments of the invention, the amino acid sequence of the other ecTadA fragment comprises:
c) the amino acid sequence shown in SEQ ID NO.33; or
d) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.33 and having the function of the amino acid sequence defined in c), preferably capable of forming a dimer with an ecTadA fragment, the dimer having adenine deaminase activity.
In some embodiments of the invention, the amino acid sequence of said xCas9n fragment comprises:
e) the amino acid sequence shown in SEQ ID NO.34; or
f) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.34 and having the function of the amino acid sequence defined in e), preferably capable of recognizing NG, GAA or GAT as PAM.
In some embodiments of the invention, the fusion protein comprises, in order from N-terminus to C-terminus, an ecTadA-ecTadA dimer fragment and an xCas9n fragment.
In some embodiments of the invention, the ecTadA-ecTadA dimer fragment comprises the two ecTadA fragments in order from N-terminus to C-terminus.
In some embodiments of the invention, the fusion protein further comprises a nuclear localization signal fragment; preferably, the nuclear localization signal fragment is located at the N-terminus and/or the C-terminus of the ecTadA-ecTadA dimer fragment and the xCas9n fragment, and preferably the amino acid sequence of the nuclear localization signal fragment is as shown in SEQ ID NO.35 or SEQ ID NO.36.
In some embodiments of the invention, the fusion protein further comprises a flexible linker peptide fragment, preferably, the amino acid sequence of the flexible linker peptide fragment is as shown in SEQ ID No.37 or SEQ ID No. 38.
In some embodiments of the invention, the amino acid sequence of the fusion protein is set forth in SEQ ID No. 39.
In another aspect, the invention provides an isolated polynucleotide encoding the fusion protein.
In another aspect, the invention provides a construct comprising the isolated polynucleotide.
In another aspect, the invention provides an expression system comprising said construct or a genome into which said exogenous polynucleotide has been integrated.
In some embodiments of the invention, the host cell of the expression system is selected from eukaryotic cells or prokaryotic cells, preferably from mouse cells, hamster cells, non-human primate cells and human cells, more preferably from mouse neuroblastoma cells, Chinese hamster ovary cells, African green monkey kidney cells, human osteosarcoma cells, human embryonic kidney cells and human cervical cancer cells, and still more preferably from N2a cells, CHO cells, COS-7 cells, U2OS cells, HEK293FT cells and HeLa cells.
In another aspect, the invention provides the use of said fusion protein, said isolated polynucleotide, said construct or said expression system in gene editing.
In some embodiments of the invention, the use is in particular in gene editing in eukaryotes.
In another aspect, the invention provides a base editing system, which includes the fusion protein, and the base editing system further includes sgRNA.
In another aspect, the present invention provides a gene editing method comprising: performing gene editing using the fusion protein or the base editing system.
Drawings
FIG. 1 shows a schematic diagram of the BPNLS-xBE vector and its editing efficiency at some of the tested sites: a, schematic diagram of the optimized BPNLS-xBE vector; b, Sanger sequencing chromatograms of endogenous gene loci edited by the optimized BPNLS-xABE and by xABE before optimization; c, editing efficiency of BPNLS-xABE and xABE at endogenous gene loci, analyzed by Sanger sequencing; d, normalized efficiency of BPNLS-xABE and xABE editing at endogenous gene loci.
FIG. 2 shows the efficiency with which BPNLS-xABE models pathogenic sites in Example 1 of the present invention: a, Sanger sequencing chromatograms of some representative sites at which BPNLS-xABE models pathogenic mutations; b, efficiency of pathogenic-mutation modeling by BPNLS-xABE and xABE, analyzed and quantified by Sanger sequencing; c, efficiency with which BPNLS-xABE models mutations at two pathogenic sites simultaneously.
FIG. 3 shows the Wilson disease-causing mutant cell lines of Example 2 of the present invention: a, position and sequence information of the Wilson disease-causing mutation site Atp7b T1033A within the gene, and the sgRNA sequences used to model and repair this mutation; b, position and sequence information of the Wilson disease-causing mutation site Atp7b T1220M within the gene, and the sgRNA sequences used to model and repair this mutation; c, Sanger sequencing chromatogram of the cell line containing the Wilson disease-causing mutation site Atp7b T1033A, successfully constructed using BPNLS-xABE; d, Sanger sequencing chromatogram of the cell line containing the Wilson disease-causing mutation site Atp7b T1220M, successfully constructed using BPNLS-Gam-xBE3.
FIG. 4 shows the repair of Atp7b T1033A and Atp7b T1220M using BPNLS-Gam-xBE3 and BPNLS-xABE, respectively, in Example 3 of the present invention: a, Sanger sequencing chromatogram of repair of the mutation site by BPNLS-Gam-xBE3 in the cell line containing the Wilson disease-causing mutation site Atp7b T1033A; b, quantification of the Wilson disease-causing mutation site Atp7b T1033A in mutant and repaired cells; c, Sanger sequencing chromatogram of repair of the mutation site by BPNLS-xABE in the cell line containing the Wilson disease-causing mutation site Atp7b T1220M; d, quantification of the Wilson disease-causing mutation site Atp7b T1220M in mutant and repaired cells.
Detailed Description
The present inventors have made extensive exploratory studies and have provided a fusion protein which is a novel adenine base editing tool, and which can recognize NG, GAA and GAT sites and perform precise base editing, widening the targeting range of base editing, and have completed the present invention.
In a first aspect, the invention provides a fusion protein comprising an ecTadA-ecTadA dimer fragment and an xCas9n fragment, said ecTadA-ecTadA dimer fragment comprising two ecTadA fragments. The fusion protein can use NG, GAA and GAT as PAM sequences and, in cooperation with an sgRNA directed to the target region, achieves efficient A-to-G base editing at positions 4-7 from the 5' end of the sgRNA target region, with high editing efficiency and high editing accuracy at the target site.
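The editing window stated above (positions 4-7 counted from the 5' end of the sgRNA target region) can be sketched in Python as follows. The function names, the toy protospacer and the idealized assumption that every adenine in the window is converted are illustrative only; actual editing outcomes depend on the site and are not predicted by this sketch.

```python
def editable_adenines(protospacer, window=(4, 7)):
    """Return the 1-based positions of adenines inside the editing window,
    counted from the 5' end of the protospacer."""
    start, end = window
    return [pos for pos in range(start, end + 1)
            if protospacer[pos - 1].upper() == "A"]


def apply_a_to_g(protospacer, window=(4, 7)):
    """Return the protospacer with every adenine in the window written as G.

    Idealized assumption: all adenines in the window are edited.
    """
    bases = list(protospacer.upper())
    for pos in editable_adenines(protospacer, window):
        bases[pos - 1] = "G"
    return "".join(bases)


if __name__ == "__main__":
    toy = "GCTAGATCCGTTAGCTAGCA"   # hypothetical 20-nt protospacer
    print(editable_adenines(toy))
    print(apply_a_to_g(toy))
```

Adenines outside the window, such as the one at position 13 of the toy sequence, are left unchanged by design.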
In the fusion protein provided by the present invention, the amino acid sequence of one ecTadA fragment may comprise: a) the amino acid sequence shown in SEQ ID NO.32; or b) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.32 and having the function of the amino acid sequence defined in a). Specifically, the amino acid sequence in b) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.32 by substituting, deleting or adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids, or by adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids at the N-terminus and/or the C-terminus, and that retains the function of the polypeptide fragment shown in SEQ ID NO.32, for example the ability to form a dimer with an ecTadA fragment, the dimer having adenine deaminase activity, and more specifically the ability to deaminate adenine (A) to produce hypoxanthine (I). The amino acid sequence in b) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID NO.32. This ecTadA fragment is derived from Escherichia coli.
In the fusion protein provided by the present invention, the amino acid sequence of the other ecTadA fragment may comprise: c) the amino acid sequence shown in SEQ ID NO.33; or d) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.33 and having the function of the amino acid sequence defined in c). Specifically, the amino acid sequence in d) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.33 by substituting, deleting or adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids, or by adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids at the N-terminus and/or the C-terminus, and that retains the function of the polypeptide fragment shown in SEQ ID NO.33, for example the ability to form a dimer with an ecTadA fragment, the dimer having adenine deaminase activity, and more specifically the ability to deaminate adenine (A) to produce hypoxanthine (I). The amino acid sequence in d) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID NO.33. This ecTadA fragment is derived from Escherichia coli and obtained by artificial directed evolution.
In the fusion protein provided by the present invention, the amino acid sequence of the xCas9n fragment may comprise: e) the amino acid sequence shown in SEQ ID NO.34; or f) an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.34 and having the function of the amino acid sequence defined in e). Specifically, the amino acid sequence in f) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.34 by substituting, deleting or adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids, or by adding one or more (specifically, 1 to 50, 1 to 30, 1 to 20, 1 to 10, 1 to 5, 1 to 3, 1, 2, or 3) amino acids at the N-terminus and/or the C-terminus, and that retains the function of the polypeptide fragment shown in SEQ ID NO.34, for example the ability to recognize NG, GAA or GAT as PAM, and more specifically the ability to recognize NG, GAA or GAT as PAM and to cooperate with an sgRNA for a specific target site and with the ecTadA-ecTadA dimer fragment to achieve A-to-G base editing at positions 4-7 from the 5' end of the sgRNA target region. The amino acid sequence in f) may have 80%, 85%, 90%, 93%, 95%, 97%, or 99% or more similarity to SEQ ID NO.34. Targeted recognition by the CRISPR/Cas9 system usually requires a protospacer adjacent motif (PAM) next to the target site. Cas9 from Streptococcus pyogenes (SpCas9), one of the Cas9 enzymes most frequently used for genome editing, recognizes only the NGG sequence as PAM, which limits the range of the genome that can be targeted, whereas the xCas9n fragment of the present invention, which is derived from Streptococcus pyogenes, can recognize NG, GAA or GAT sequences as PAM.
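The 80% threshold recited for the variant sequences can be illustrated with a minimal Python sketch. It computes percent identity over two sequences that have already been aligned (gaps written as "-"), which is a simplification: the specification speaks of sequence similarity and does not state how it is calculated, so the scoring rule, the function names and the toy sequences below are all assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both
    sequences. Inputs must be pre-aligned and of equal length; '-' marks
    a gap and never counts as a match."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy peptides, not the sequences of SEQ ID NO.32-34.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(meets_threshold("MKT-LL", "MKTALL"))
```

A real comparison would first align the sequences with a dedicated tool and would also need to confirm that the variant retains the stated function, which no sequence score can establish on its own.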
In the fusion protein provided by the invention, the substitution, deletion or addition may be a conservative amino acid substitution. A "conservative amino acid substitution" refers specifically to the replacement of an amino acid residue by another amino acid residue having a similar side chain. Families of amino acid residues with similar side chains are known to those skilled in the art and include, but are not limited to, basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). More specifically, conservative amino acid substitutions may include, but are not limited to, those listed in the following tables. The numbers in Table 1 (an amino acid similarity matrix) indicate the degree of similarity between two amino acids; a substitution whose number is greater than or equal to 0 is considered a conservative amino acid substitution. Table 2 is an exemplary scheme of conservative amino acid substitutions.
TABLE 1
    C   G   P   S   A   T   D   E   N   Q   H   K   R   V   M   I   L   F   Y   W
W  -8  -7  -6  -2  -6  -5  -7  -7  -4  -5  -3  -3   2  -6  -4  -5  -2   0   0  17
Y   0  -5  -5  -3  -3  -3  -4  -4  -2  -4   0  -4  -5  -2  -2  -1  -1   7  10
F  -4  -5  -5  -3  -4  -3  -6  -5  -4  -5  -2  -5  -4  -1   0   1   2   9
L  -6  -4  -3  -3  -2  -2  -4  -3  -3  -2  -2  -3  -3   2   4   2   6
I  -2  -3  -2  -1  -1   0  -2  -2  -2  -2  -2  -2  -2   4   2   5
M  -5  -3  -2  -2  -1  -1  -3  -2   0  -1  -2   0   0   2   6
V  -2  -1  -1  -1   0   0  -2  -2  -2  -2  -2  -2  -2   4
R  -4  -3   0   0  -2  -1  -1  -1   0   1   2   3   6
K  -5  -2  -1   0  -1   0   0   0   1   1   0   5
H  -3  -2   0  -1  -1  -1   1   1   2   3   6
Q  -5  -1   0  -1   0  -1   2   2   1   4
N  -4   0  -1   1   0   0   2   1   2
E  -5   0  -1   0   0   0   3   4
D  -5   1  -1   0   0   0   4
T  -2   0   0   1   1   3
A  -2   1   1   1   2
S   0   1   1   1
P  -3  -1   6
G  -3   5
C  12
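The rule stated for Table 1 (a substitution is considered conservative when its number is greater than or equal to 0) can be expressed as a short Python sketch. The values are copied from the lower-triangular rows of Table 1; the names `similarity` and `is_conservative` are introduced here for illustration only.

```python
COLUMNS = "CGPSATDENQHKRVMILFYW"   # column order of Table 1

TABLE_1 = """
W -8 -7 -6 -2 -6 -5 -7 -7 -4 -5 -3 -3 2 -6 -4 -5 -2 0 0 17
Y 0 -5 -5 -3 -3 -3 -4 -4 -2 -4 0 -4 -5 -2 -2 -1 -1 7 10
F -4 -5 -5 -3 -4 -3 -6 -5 -4 -5 -2 -5 -4 -1 0 1 2 9
L -6 -4 -3 -3 -2 -2 -4 -3 -3 -2 -2 -3 -3 2 4 2 6
I -2 -3 -2 -1 -1 0 -2 -2 -2 -2 -2 -2 -2 4 2 5
M -5 -3 -2 -2 -1 -1 -3 -2 0 -1 -2 0 0 2 6
V -2 -1 -1 -1 0 0 -2 -2 -2 -2 -2 -2 -2 4
R -4 -3 0 0 -2 -1 -1 -1 0 1 2 3 6
K -5 -2 -1 0 -1 0 0 0 1 1 0 5
H -3 -2 0 -1 -1 -1 1 1 2 3 6
Q -5 -1 0 -1 0 -1 2 2 1 4
N -4 0 -1 1 0 0 2 1 2
E -5 0 -1 0 0 0 3 4
D -5 1 -1 0 0 0 4
T -2 0 0 1 1 3
A -2 1 1 1 2
S 0 1 1 1
P -3 -1 6
G -3 5
C 12
"""

# Parse each row into {row residue: [scores for COLUMNS[0..row index]]}.
ROWS = {}
for line in TABLE_1.strip().splitlines():
    residue, *values = line.split()
    ROWS[residue] = [int(v) for v in values]
    # Each row of the triangle must end on its own diagonal entry.
    assert len(ROWS[residue]) == COLUMNS.index(residue) + 1


def similarity(a, b):
    """Look up the Table 1 score for residues a and b (order-independent)."""
    ia, ib = COLUMNS.index(a), COLUMNS.index(b)
    # The table is lower-triangular: read the row of the later residue.
    return ROWS[a][ib] if ia >= ib else ROWS[b][ia]


def is_conservative(a, b):
    """Apply the rule stated in the text: score >= 0 means conservative."""
    return similarity(a, b) >= 0


if __name__ == "__main__":
    print(similarity("D", "E"), is_conservative("D", "E"))
    print(similarity("W", "C"), is_conservative("W", "C"))
```

Storing only the triangle keeps the data identical to the printed table, and the lookup handles either argument order.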
TABLE 2
[Table 2, the exemplary scheme of conservative amino acid substitutions, appears only as an image in the original publication.]
The fusion protein provided by the invention may further comprise a nuclear localization signal (NLS) fragment, which generally interacts with a nuclear import carrier so that the protein is transported into the nucleus. The nuclear localization signal fragment may be located at the N-terminus of the ecTadA-ecTadA dimer fragment, at the C-terminus of the xCas9n fragment, or between the ecTadA-ecTadA dimer fragment and the xCas9n fragment. A person skilled in the art can select the position and specific sequence of a suitable nuclear localization signal fragment. For example, a first nuclear localization signal fragment may be provided at the N-terminus of the ecTadA-ecTadA dimer fragment, and its amino acid sequence may include the amino acid sequence shown in SEQ ID NO.35; as another example, a second nuclear localization signal fragment may be provided at the C-terminus of the xCas9n fragment, and its amino acid sequence may include the amino acid sequence shown in SEQ ID NO.36.
The fusion protein provided by the invention may further comprise a flexible linker peptide fragment, which may be located at the N-terminus of the ecTadA-ecTadA dimer fragment, between the two ecTadA fragments, between the ecTadA-ecTadA dimer fragment and the xCas9n fragment, or at the C-terminus of the xCas9n fragment. A person skilled in the art can select suitable flexible linker peptide fragments for joining the polypeptide fragments. For example, a first flexible linker peptide fragment may be provided between the two ecTadA fragments, and its amino acid sequence may include the amino acid sequence shown in SEQ ID NO.37; as another example, a second flexible linker peptide fragment may be provided between the ecTadA-ecTadA dimer fragment and the xCas9n fragment, and its amino acid sequence may include the amino acid sequence shown in SEQ ID NO.37; as a further example, a third flexible linker peptide fragment may be provided at the C-terminus of the xCas9n fragment, and its amino acid sequence may include the amino acid sequence shown in SEQ ID NO.38.
In the fusion protein provided by the invention, the fusion protein may include, in order from the N-terminus to the C-terminus, an ecTadA-ecTadA dimer fragment and an xCas9n fragment, and the ecTadA-ecTadA dimer fragment may include, in order from the N-terminus to the C-terminus, one ecTadA fragment followed by the other ecTadA fragment, preferably one ecTadA fragment, the first flexible linker peptide fragment and the other ecTadA fragment. In a specific embodiment of the present invention, the fusion protein may include, in order from the N-terminus to the C-terminus, a first nuclear localization signal fragment, an ecTadA-ecTadA dimer fragment, a second flexible linker peptide fragment, an xCas9n fragment, a third flexible linker peptide fragment and a second nuclear localization signal fragment, and the amino acid sequence of the fusion protein is shown in SEQ ID NO.39.
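The N-terminus to C-terminus arrangement described above can be written down as a small Python data structure. This is only a sketch: the class and function names are hypothetical, the placement of the SEQ ID NO.32 fragment before the SEQ ID NO.33 fragment is an assumption that the text does not state explicitly, and plain concatenation of the parts is not claimed to reproduce SEQ ID NO.39 exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str     # descriptive label used in the specification
    seq_id: str   # sequence identifier recited for that segment


# Order from N-terminus to C-terminus in the specific embodiment.
# ASSUMPTION: SEQ ID NO.32 precedes SEQ ID NO.33 within the dimer.
FUSION_LAYOUT = [
    Segment("first nuclear localization signal", "SEQ ID NO.35"),
    Segment("ecTadA fragment", "SEQ ID NO.32"),
    Segment("first flexible linker", "SEQ ID NO.37"),
    Segment("ecTadA fragment (directed evolution)", "SEQ ID NO.33"),
    Segment("second flexible linker", "SEQ ID NO.37"),
    Segment("xCas9n fragment", "SEQ ID NO.34"),
    Segment("third flexible linker", "SEQ ID NO.38"),
    Segment("second nuclear localization signal", "SEQ ID NO.36"),
]


def assemble(layout, sequences):
    """Concatenate segment sequences in layout order.

    `sequences` maps a SEQ ID label to an amino acid string supplied by
    the caller; no real sequences are embedded in this sketch.
    """
    return "".join(sequences[segment.seq_id] for segment in layout)


if __name__ == "__main__":
    # Placeholder strings stand in for the real sequences.
    toy = {"SEQ ID NO.35": "N1", "SEQ ID NO.32": "T1", "SEQ ID NO.37": "L",
           "SEQ ID NO.33": "T2", "SEQ ID NO.34": "X", "SEQ ID NO.38": "K",
           "SEQ ID NO.36": "N2"}
    print(assemble(FUSION_LAYOUT, toy))
```

Keeping the layout as data makes it easy to compare alternative placements of the NLS and linker fragments that the text also permits.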
In a second aspect, the present invention provides an isolated polynucleotide encoding a fusion protein provided by the first aspect of the present invention.
In a third aspect, the invention provides a construct comprising an isolated polynucleotide provided in the second aspect of the invention. The construct can be generally constructed by inserting the isolated polynucleotide into a suitable expression vector, which can be selected by those skilled in the art, for example, including but not limited to, a pCMV expression vector, a pSV2 expression vector, a pGL3 expression vector, a pST1347 vector, a px330 vector, a pAAV vector, and the like.
In a fourth aspect, the invention provides an expression system comprising the construct provided by the third aspect of the invention, or a genome into which the exogenous isolated polynucleotide provided by the second aspect of the invention has been integrated. The expression system can be a host cell that expresses the fusion protein described above, which cooperates with the sgRNA so that the fusion protein is targeted to the target region and base editing of the target region is achieved. In another embodiment of the present invention, the host cell may be a eukaryotic cell and/or a prokaryotic cell, more specifically a mouse cell, hamster cell, non-human primate cell, human cell, etc., more specifically a mouse neuroblastoma cell, Chinese hamster ovary cell, African green monkey kidney cell, human osteosarcoma cell, human embryonic kidney cell, human cervical cancer cell, etc., and more specifically an N2a cell, CHO cell, COS-7 cell, U2OS cell, HEK293FT cell, HeLa cell, etc.
In a fifth aspect, the invention provides the use of the fusion protein provided by the first aspect of the invention, the isolated polynucleotide provided by the second aspect of the invention, the construct provided by the third aspect of the invention, or the expression system provided by the fourth aspect of the invention in gene editing, preferably in gene editing of a eukaryote, in particular a metazoan, including but not limited to humans, non-human primates, hamsters, mice, and the like. The uses may specifically include, but are not limited to, A-to-G base editing (more specifically, A-to-G base editing at positions 4-7 from the 5' end of the sgRNA target region), editing of splice acceptor/donor sites to modulate RNA splicing, construction of models (e.g., disease models, cell models, animal models, etc.), treatment of human diseases using the present tool, and the like. In a specific embodiment of the invention, the use may specifically be targeted A-to-G base editing of a gene, and the specific editing site may be ATP7b M1169V (GenBank: NC_000013.11), ATP7b I1230V, ATP7b W1353R, ATP7b Y713C, STK11 L67P (GenBank: NC_000019.10), ARHGEF18 T270A (GenBank: NC_000019.10), BCS1L V327A (GenBank: NC_000002.12), BCS1L M48V, AIFM1 D237G (GenBank: NC_000023.11), ALPL C201R (GenBank: NC_000001.11), and the like. In a specific embodiment of the invention, the use is specifically the construction or repair of a cell line, more specifically a cell line containing a Wilson disease-causing mutation or a Peutz-Jeghers syndrome (PJS)-causing mutation, and the specific editing site may be Atp7b T1033A, Atp7b T1220M, STK11 D194N, STK11 R297K, and the like. In another embodiment of the present invention, the object being edited may be an embryo, a cell, or the like.
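Whether a given amino acid change is reachable by a single A-to-G edit can be checked against the standard genetic code, as in the following Python sketch. It considers an A-to-G change on the coding strand and a T-to-C change on the coding strand (which corresponds to A-to-G on the opposite strand). The function name and the strand labels are introduced here for illustration, and the sketch ignores PAM availability and the editing window.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def abe_codon_changes(ref_aa, alt_aa):
    """List (ref_codon, edited_codon, strand) for every single-base A-to-G
    edit that turns a codon for ref_aa into a codon for alt_aa.

    'coding' means A>G on the coding strand; 'opposite' means T>C on the
    coding strand, i.e. A>G on the complementary strand.
    """
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for i, base in enumerate(codon):
            for old, new, strand in (("A", "G", "coding"),
                                     ("T", "C", "opposite")):
                if base == old:
                    edited = codon[:i] + new + codon[i + 1:]
                    if CODON_TABLE[edited] == alt_aa:
                        hits.append((codon, edited, strand))
    return hits


if __name__ == "__main__":
    for change in (("M", "V"), ("W", "R"), ("Y", "C"), ("T", "M")):
        print(change, abe_codon_changes(*change))
```

Under these assumptions, changes such as M to V or W to R are reachable by one A-to-G edit, while T to M is not, which is consistent with the drawings describing the Atp7b T1220M cell line as constructed with BPNLS-Gam-xBE3 rather than with BPNLS-xABE.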
In a sixth aspect, the invention provides a base editing system comprising the fusion protein provided by the first aspect of the invention and further comprising an sgRNA. A person skilled in the art can select an sgRNA that targets a specific site according to the region of the gene to be edited. For example, the sequence of the sgRNA can be at least partially complementary to the target region, so that the sgRNA cooperates with the fusion protein, localizes it to the target region, and enables A-to-G base editing at positions 4-7 from the 5' end of the sgRNA target region, specifically by adenine deamination, i.e., the conversion of adenine (A) to hypoxanthine (I). The base editing system provided by the invention greatly broadens the targetable range of the genome: it can use NG, GAA or GAT sequences as PAM, achieves A-to-G base editing at positions 4-7 from the 5' end of the sgRNA target region, and has high mutation precision and low proximal off-target editing. In a specific embodiment of the invention, the application is specifically the construction or repair of a cell line, more specifically a cell line containing a Wilson disease-causing mutation or a Peutz-Jeghers syndrome (PJS)-causing mutation, and the specific editing site may be Atp7b T1033A (GenBank: NC_000013.11), Atp7b T1220M (GenBank: NC_000013.11), STK11 D194N (GenBank: NC_000019.10), STK11 R297K (GenBank: NC_000019.10), and the like. In another embodiment of the present invention, the object being edited may be an embryo, a cell, or the like.
In a seventh aspect, the invention provides a base editing method comprising: performing gene editing with the fusion protein provided by the first aspect of the present invention or the base editing system provided by the sixth aspect of the present invention. For example, the gene editing method may include: culturing the expression system provided by the fourth aspect of the present invention under appropriate conditions to express the fusion protein, which can base-edit the target region in the presence of a cooperating sgRNA that targets the target region. Methods for providing the sgRNA should be known to those skilled in the art; for example, an expression system capable of expressing the sgRNA may be cultured under appropriate conditions, and that expression system may be a host cell containing an expression vector carrying a polynucleotide encoding the sgRNA, or a host cell having the polynucleotide encoding the sgRNA integrated into its chromosome. In a specific embodiment of the present invention, the sgRNA and the fusion protein can be expressed in the same host cell, which is the target cell. In another embodiment of the invention, the gene editing is in vitro gene editing.
The present invention provides a base editing tool obtained by fusing a nuclear localization element and a DNA double-strand break binding protein to an xCas9-based base editor. The tool efficiently recognizes sites with NG, GAA, and GAT PAM sequences and therefore targets the genome accurately and over a broader range. In addition, the fusion protein has efficient and accurate single-base editing capacity, yields editing products of high purity, and has good prospects for industrial application.
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments, and is not intended to limit the scope of the present invention; in the description and claims of the present application, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
When numerical ranges are given in the examples, it is understood that both endpoints of each range and any value between them can be selected, unless otherwise indicated. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices, and materials used in the examples, any methods, devices, and materials similar or equivalent to those described in the examples may be used in the practice of the invention, in keeping with the knowledge of one skilled in the art and with the description of the invention.
Unless otherwise indicated, the experimental methods, detection methods, and preparation methods disclosed herein all employ techniques conventional in the art of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related arts. These techniques are well described in the literature; see in particular Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, Chromatin (P.M. Wassarman and A.P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, Chromatin Protocols (P.B. Becker, ed.), Humana Press, Totowa, 1999, etc.
The BPNLS-Gam-xBE3 and BPNLS-xABE vectors used in the examples were constructed as follows: xBE3 and xABE were optimized by introducing the BPNLS and Gam elements into the xBE3 vector and the BPNLS element into the xABE vector, respectively. The xBE3 and xABE plasmids were obtained from Addgene (#108380, #108382). The sequences of the constructed BPNLS-Gam-xBE3 and BPNLS-xABE vectors are shown in SEQ ID NO. 1 and SEQ ID NO. 2, respectively.
Example 1
In this example, the editing efficiency of BPNLS-xABE was verified in HEK293T cells using selected sgRNA sites of the endogenous gene, and the results are shown in fig. 1 and 2.
1.1 sgRNA plasmid construction
Ten endogenous gene loci were selected (including ATP7b M1169V, ATP7b I1230V, ATP7b W1353R, ATP7b Y713C, STK11 L67P, ARHGEF18 T270A, BCS1L V327A, BCS1L M48V, AIFM1 D237G, ALPL C201R, and the like); sgRNAs were designed according to the target sequences and oligos were synthesized. The sgRNA sequences are shown in SEQ ID NO. 3-22. An accg sequence was added to the 5' end of the upstream oligo of each sgRNA and an aaac sequence to the 5' end of the downstream oligo, so that the upstream oligo used for synthesis was 5'-accgXXXXXXXXXXXXXXXXXXXX (20 nt)-3' and the downstream oligo was 5'-aaacXXXXXXXXXXXXXXXXXXXX (20 nt)-3'. After synthesis, the upstream and downstream oligos were annealed using a preset program (95 ℃ for 5 min; 95 ℃ to 85 ℃ at -2 ℃/s; 85 ℃ to 25 ℃ at -0.1 ℃/s; hold at 4 ℃), and the annealed product was ligated into the pGL3-U6-sgRNA vector (Addgene #51133) linearized with BsaI (NEB: R0539L). The sequence of the pGL3-U6-sgRNA vector is shown in SEQ ID NO. 31. The linearization reaction for the pGL3-U6-sgRNA vector was: pGL3-U6-sgRNA, 3 μg; buffer (NEB: R0539L), 6 μL; BsaI, 2 μL; ddH2O to a final volume of 60 μL; digestion overnight at 37 ℃. The annealed sgRNA product was ligated to the linearized vector as follows: 1 μL of T4 ligase buffer (NEB: M0202L), 20 ng of linearized vector, 5 μL of annealed oligo fragment (10 μM), 0.5 μL of T4 ligase (NEB: M0202L), and ddH2O to 10 μL, with ligation overnight at 16 ℃. The ligation product was transformed into Escherichia coli; colonies were picked, identified, and confirmed by sequencing; positive clones were grown in shaking culture; and plasmids were extracted (Axygen: AP-MN-P-250G), quantified, and stored for later use.
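The oligo design rule in 1.1 can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the downstream oligo carries the reverse complement of the 20-nt target sequence, which is inferred from the oligo pairs listed in Examples 2 and 3 rather than stated explicitly in 1.1.

```python
# Minimal sketch; function names are hypothetical. The accg/aaac prefixes are
# taken from 1.1; the reverse-complement relationship is inferred from the
# oligo pairs listed in Examples 2 and 3.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence (uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_sgrna_oligos(target: str) -> tuple:
    """Return (upstream, downstream) oligos for a 20-nt sgRNA target sequence."""
    t = target.upper()
    if len(t) != 20 or set(t) - set("ACGT"):
        raise ValueError("expected a 20-nt sequence containing only A, C, G, T")
    upstream = "accg" + t                        # 5'-accg + target-3'
    downstream = "aaac" + reverse_complement(t)  # 5'-aaac + reverse complement-3'
    return upstream, downstream


if __name__ == "__main__":
    up, down = design_sgrna_oligos("ATTACCCATGGCGTCCCCAG")
    print("5'-" + up + "-3'")
    print("5'-" + down + "-3'")
```

For the Atp7b-27 target sequence this reproduces the pair given in Example 2 (SEQ ID NO. 23 and 24), which is a convenient check before ordering oligos.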
1.2 cell culture and transfection
HEK293T cells (purchased from ATCC) were seeded in DMEM high-glucose medium (HyClone, SH30022.01B) supplemented with 10% FBS (v/v) and 1% penicillin-streptomycin (v/v) (Gibco) and cultured in a 37 ℃ cell incubator with 5% CO2. Cells for transfection were seeded in 12-well cell culture plates on the first day and examined on the next day; when the cell density reached about 75%, the medium was replaced with fresh antibiotic-free DMEM containing 10% FBS for 2 hours so that the cells recovered to an optimal state. For transfection, each well of the 12-well plate received 1 μg of BPNLS-Gam-xBE3 or BPNLS-xABE plasmid and 0.6 μg of sgRNA plasmid. The plasmids were mixed, diluted in 50 μL of Opti-MEM medium (Gibco, 11058021) to prepare reagent A, and left to stand for 5 minutes. Meanwhile, 2 μL of Lipofectamine 2000 transfection reagent (Thermo, 11668019) was diluted in 50 μL of Opti-MEM medium, mixed well to prepare reagent B, and left to stand for 5 minutes. Reagents A and B were then combined, mixed by gentle pipetting, and left to stand for 20 minutes. The mixture was then added dropwise to the cells in the 12-well plate, and the plate was returned to the 37 ℃ incubator. Six hours after transfection, the medium was replaced with DMEM containing 10% FBS. Fluorescence was observed under a microscope 48 hours after transfection; cells were harvested 72 hours after transfection, and GFP-positive cells were sorted with a flow cytometer. For each sample, 5000-10000 GFP-positive cells were collected as required, centrifuged, and lysed with cell lysis buffer for genotyping. The main components of the cell lysis buffer were 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris pH 8.0, 0.5% Nonidet P-40, 0.5% Tween 20, and 100 μg/ml proteinase K.
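When several replicate wells receive the same plasmid pair, the per-well amounts given in 1.2 can be scaled into a single mix. The Python sketch below is only an arithmetic aid; the function name and the optional pipetting overage are assumptions and are not part of the protocol.

```python
# Minimal sketch; names and the overage parameter are hypothetical.
# Per-well amounts are those stated in 1.2 for a 12-well plate.
PER_WELL = {
    "editor plasmid (ug)": 1.0,      # BPNLS-Gam-xBE3 or BPNLS-xABE
    "sgRNA plasmid (ug)": 0.6,
    "Opti-MEM for DNA (uL)": 50.0,   # reagent A diluent
    "Lipofectamine 2000 (uL)": 2.0,
    "Opti-MEM for lipid (uL)": 50.0, # reagent B diluent
}


def transfection_mix(n_wells: int, overage: float = 0.0) -> dict:
    """Scale the per-well amounts to n_wells, with an optional fractional overage."""
    if n_wells < 1 or overage < 0:
        raise ValueError("n_wells must be >= 1 and overage must be >= 0")
    factor = n_wells * (1.0 + overage)
    return {name: round(amount * factor, 2) for name, amount in PER_WELL.items()}


if __name__ == "__main__":
    for name, amount in transfection_mix(3).items():
        print(name, amount)
```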
1.3 detection of editing efficiency of optimized base editing tool at endogenous Gene locus
Primers were designed as required, and the sequence around the target site was amplified by PCR using the lysate of GFP-positive cells from 1.2 as template; the amplification product was checked for the target band by gel electrophoresis and then purified. The purified PCR product was used for high-throughput deep sequencing or Sanger sequencing to determine editing efficiency. The reaction for amplifying the target site sequence was: 2× buffer (Vazyme, P505), 25 μL; dNTP, 1 μL; forward primer F (10 pmol/μL), 1 μL; reverse primer R (10 pmol/μL), 1 μL; template, 1 μL; DNA polymerase (Vazyme, P505), 0.5 μL; ddH2O to 50 μL. After the target band was confirmed by gel electrophoresis, the amplified product was purified as follows: three volumes of PCR-A reagent (Axygen: AP-PCR-250G) were added to the PCR product, mixed, and loaded onto a purification column, which was centrifuged at 12000 rpm for 1 minute, and the flow-through was discarded; 700 μL of Buffer W2 was added, the column was centrifuged for 1 minute, and the flow-through was discarded; the Buffer W2 wash was repeated once with another 700 μL; the purification column was placed back in the collection tube and centrifuged empty at 12000 rpm for 1 minute; the collection tube was replaced with a new 1.5 mL tube, and after standing for 2 minutes, 20 μL of water was added to the column for elution. The relevant results are shown in FIGS. 1 and 2.
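The volume of ddH2O needed to bring the amplification reaction in 1.3 to 50 μL follows from the listed component volumes. A minimal Python sketch of that arithmetic (the function and dictionary names are hypothetical):

```python
# Minimal sketch; names are hypothetical. Component volumes (uL) are those
# listed in 1.3 for one 50 uL amplification reaction.
PCR_COMPONENTS_UL = {
    "2x buffer": 25.0,
    "dNTP": 1.0,
    "primer F (10 pmol/uL)": 1.0,
    "primer R (10 pmol/uL)": 1.0,
    "template": 1.0,
    "DNA polymerase": 0.5,
}
FINAL_VOLUME_UL = 50.0


def water_to_final_volume(components: dict, final_volume: float) -> float:
    """Return the ddH2O volume that brings the listed components to final_volume."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return final_volume - used


if __name__ == "__main__":
    print(water_to_final_volume(PCR_COMPONENTS_UL, FINAL_VOLUME_UL), "uL ddH2O")
```

With the volumes listed, the components total 29.5 μL, so 20.5 μL of ddH2O is added per reaction.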
Example 2
In this example, the optimized base editing system of the present invention was used to model the Wilson disease-associated pathogenic mutations Atp7b T1033A and Atp7b T1220M in a human cell line and to construct mutant cell lines each containing one of the two mutations. This was done using BPNLS-Gam-xBE3, BPNLS-xABE, and the corresponding sgRNAs; the results are shown in FIG. 3.
2.1 sgRNA plasmid construction
Near the mutation sites, the mutation-introducing sgRNAs Atp7b-27 and Atp7b-16 were designed. The upstream sequence of Atp7b-27 is: 5'-accgATTACCCATGGCGTCCCCAG-3' (SEQ ID NO. 23), and the downstream sequence is: 5'-aaacCTGGGGACGCCATGGGTAAT-3' (SEQ ID NO. 24). The upstream sequence of Atp7b-16 is: 5'-accgGATCACGGGGGACAACCGGA-3' (SEQ ID NO. 25), and the downstream sequence is: 5'-aaacTCCGGTTGTCCCCCGTGATC-3' (SEQ ID NO. 26). After the oligos were synthesized, the sgRNA plasmids were constructed according to the method of 1.1; after sequencing confirmed the correct sequence, the plasmids were extracted and stored for later use.
2.2 culture and transfection of cells
HEK293T cells were cultured and transfected as described in 1.2. The plasmids used to construct the Atp7b T1033A mutant cell line were BPNLS-xABE and sgRNA Atp7b-27, at 1 μg and 0.5 μg, respectively. The plasmids used to construct the Atp7b T1220M mutant cell line were BPNLS-Gam-xBE3 and sgRNA Atp7b-16, at 1 μg and 0.5 μg, respectively.
2.3 construction of cell lines containing Wilson disease-causing mutations
Seventy-two hours after transfection, the cells were trypsinized, resuspended, and sorted by flow cytometry. Single cells that were strongly GFP-positive were sorted into a 96-well plate and cultured in a 37 ℃ incubator for about two weeks, after which monoclonal cells were picked under a microscope. Monoclonal cells in good condition were digested and resuspended, and all selected clones were numbered; one portion of each was used for genotyping and the other portion was passaged and expanded. The cells for genotyping were centrifuged, the supernatant was discarded, and the pellet at the bottom of the centrifuge tube was lysed with cell lysis buffer as described in 1.2. Primers were designed based on the target sequence, and PCR amplification was performed using the cell lysate as template, as described in 1.3. The PCR products were checked for the target band by gel electrophoresis and then subjected to Sanger sequencing. Based on the Sanger sequencing chromatograms, monoclonal cells with a homozygous genotype, in which the pathogenic site Atp7b T1033A or Atp7b T1220M was completely mutated, were selected for passage, expansion, and subsequent experiments. The relevant results are shown in FIG. 3.
Example 3
In this example, the homozygous mutant cell lines obtained above, which contain Wilson disease-causing mutations, were treated with the BPNLS-Gam-xBE3 and BPNLS-xABE of the present invention to repair Atp7b T1033A and Atp7b T1220M. The mutation sites were repaired using the BPNLS-Gam-xBE3 and BPNLS-xABE plasmids in combination with the corresponding repair sgRNAs; the results are shown in FIG. 4.
3.1 sgRNA plasmid construction
Repair sgRNAs were designed near the mutation sites, and oligos were synthesized. The repair sgRNAs corresponding to the Atp7b T1033A and Atp7b T1220M pathogenic mutation sites are Atp7b-27-mut-correction sgRNA and Atp7b-16-mut-correction sgRNA, respectively. The upstream sequence of Atp7b-27-mut-correction is: 5'-accgATGGGTAATGGTGCCAGTCT-3' (SEQ ID NO. 27), and the downstream sequence is: 5'-aaacAGACTGGCACCATTACCCAT-3' (SEQ ID NO. 28). The upstream sequence of Atp7b-16-mut-correction is: 5'-accgGTCCCCCGTGATCAGAACCA-3' (SEQ ID NO. 29), and the downstream sequence is: 5'-aaacTGGTTCTGATCACGGGGGAC-3' (SEQ ID NO. 30). After the oligos were synthesized, the sgRNA plasmids were constructed according to the method of 1.1; after sequencing confirmed the correct sequence, the plasmids were extracted and stored for later use.
3.2 cell culture and transfection
The cell lines constructed by the method of 2.3, which contain the Wilson disease-causing mutation sites Atp7b T1033A and Atp7b T1220M, were cultured and expanded in DMEM high-glucose medium containing 10% FBS. To repair the pathogenic mutations, the cells were transfected as described in 1.2. Repairing the Atp7b T1033A pathogenic site required transfection with plasmids BPNLS-Gam-xBE3 and Atp7b-27-mut-correction, at 1 μg and 0.6 μg, respectively; repairing the Atp7b T1220M pathogenic site required transfection with 1 μg of BPNLS-xABE and 0.6 μg of Atp7b-16-mut-correction.
3.3 detection of repair efficiency
Cells were collected 72 hours after transfection, and 5000-10000 strongly GFP-positive cells were sorted by flow cytometry. The sorted cells were lysed with cell lysis buffer, and primers were designed to amplify the target fragment by PCR for genotyping. The main components of the cell lysis buffer were 50 mM KCl, 1.5 mM MgCl2, 10 mM Tris pH 8.0, 0.5% Nonidet P-40, 0.5% Tween 20, and 100 μg/ml proteinase K. After the target band was confirmed by gel electrophoresis, the PCR product was subjected to Sanger sequencing; whether the mutation site had been repaired was determined from the Sanger sequencing chromatogram, and the repair efficiency was determined by quantifying the chromatogram peaks. The relevant results are shown in FIG. 4.
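The description does not specify how the Sanger chromatogram is quantified. One simple possibility, shown here purely as an assumption, is to take the signal of the repaired base at the target position as a fraction of the total signal of all four bases at that position. The function name and the example peak values below are hypothetical and are not measured data.

```python
# Minimal sketch under a stated assumption: efficiency is estimated as the
# peak signal of the expected base divided by the summed signal of all bases
# at the same position. The patent does not specify the quantification method.
def base_fraction(peaks: dict, expected_base: str) -> float:
    """Return the fraction of total peak signal contributed by expected_base."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("peak signals must sum to a positive value")
    return peaks.get(expected_base, 0.0) / total


if __name__ == "__main__":
    # Hypothetical peak heights at the target position (arbitrary units).
    example_peaks = {"A": 300.0, "C": 0.0, "G": 700.0, "T": 0.0}
    print(f"estimated fraction of G: {base_fraction(example_peaks, 'G'):.2f}")
```

Dedicated chromatogram analysis software additionally accounts for background noise, which this ratio ignores.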
In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.
Sequence listing
<110> ShanghaiTech University
<120> a base editing tool and use thereof
<160> 39
<170> SIPOSequenceListing 1.0
<210> 1
<211> 9201
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
cctgtcaagg tatccccacg tcactctgtt tatttacatc gcaaggctgt accaccacgc 60
tgacccccgc aatcgacaag gcctgcggga tttgatctct tcaggtgtga ctatccaaat 120
tatgactgag caggagtcag gatactgctg gagaaacttt gtgaattata gcccgagtaa 180
tgaagcccac tggcctaggt atccccatct gtgggtacga ctgtacgttc ttgaactgta 240
ctgcatcata ctgggcctgc ctccttgtct caacattctg agaaggaagc agccacagct 300
gacattcttt accatcgctc ttcagtcttg tcattaccag cgactgcccc cacacattct 360
ctgggccacc gggttgaaat ctggtggttc ttctggtggt tctagcggca gcgagactcc 420
cgggacctca gagtccgcca cacccgaaag ttctggtggt tcttctggtg gttctgataa 480
aaagtattct attggtttag ccatcggcac taattccgtg ggctgggccg tgatcaccga 540
cgagtacaag gtgcccagca agaaattcaa ggtgctgggc aacaccgacc ggcacagcat 600
caagaagaac ctgatcggag ccctgctgtt cgacagcggc gaaacagccg aggccacccg 660
gctgaagaga accgccagaa gaagatacac cagacggaag aaccggatct gctatctgca 720
agagatcttc agcaacgaga tggccaaggt ggacgacagc ttcttccaca gactggaaga 780
gtccttcctg gtggaagagg ataagaagca cgagcggcac cccatcttcg gcaacatcgt 840
ggacgaggtg gcctaccacg agaagtaccc caccatctac cacctgagaa agaaactggt 900
ggacagcacc gacaaggccg acctgcggct gatctatctg gccctggccc acatgatcaa 960
gttccggggc cacttcctga tcgagggcga cctgaacccc gacaacagcg acgtggacaa 1020
gctgttcatc cagctggtgc agacctacaa ccagctgttc gaggaaaacc ccatcaacgc 1080
cagcggcgtg gacgccaagg ccatcctgtc tgccagactg agcaagagca gacggctgga 1140
aaatctgatc gcccagctgc ccggcgagaa gaagaatggc ctgttcggaa acctgattgc 1200
cctgagcctg ggcctgaccc ccaacttcaa gagcaacttc gacctggccg aggataccaa 1260
actgcagctg agcaaggaca cctacgacga cgacctggac aacctgctgg cccagatcgg 1320
cgaccagtac gccgacctgt ttctggccgc caagaacctg tccgacgcca tcctgctgag 1380
cgacatcctg agagtgaaca ccgagatcac caaggccccc ctgagcgcct ctatgatcaa 1440
gctgtacgac gagcaccacc aggacctgac cctgctgaaa gctctcgtgc ggcagcagct 1500
gcctgagaag tacaaagaga ttttcttcga ccagagcaag aacggctacg ccggctacat 1560
tgacggcgga gccagccagg aagagttcta caagttcatc aagcccatcc tggaaaagat 1620
ggacggcacc gaggaactgc tcgtgaagct gaacagagag gacctgctgc ggaagcagcg 1680
gaccttcgac aacggcatca tcccccacca gatccacctg ggagagctgc acgccattct 1740
gcggcggcag gaagattttt acccattcct gaaggacaac cgggaaaaga tcgagaagat 1800
cctgaccttc cgcatcccct actacgtggg ccctctggcc aggggaaaca gcagattcgc 1860
ctggatgacc agaaagagcg aggaaaccat caccccctgg aacttcgaga aggtggtgga 1920
caagggcgct tccgcccaga gcttcatcga gcggatgacc aacttcgata agaacctgcc 1980
caacgagaag gtgctgccca agcacagcct gctgtacgag tacttcaccg tgtataacga 2040
gctgaccaaa gtgaaatacg tgaccgaggg aatgagaaag cccgccttcc tgagcggcga 2100
ccagaaaaag gccatcgtgg acctgctgtt caagaccaac cggaaagtga ccgtgaagca 2160
gctgaaagag gactacttca agaaaatcga gtgcttcgac tccgtggaaa tctccggcgt 2220
ggaagatcgg ttcaacgcct ccctgggcac ataccacgat ctgctgaaaa ttatcaagga 2280
caaggacttc ctggacaatg aggaaaacga ggacattctg gaagatatcg tgctgaccct 2340
gacactgttt gaggacagag agatgatcga ggaacggctg aaaacctatg cccacctgtt 2400
cgacgacaaa gtgatgaagc agctgaagcg gcggagatac accggctggg gcaggctgag 2460
ccggaagctg atcaacggca tccgggacaa gcagtccggc aagacaatcc tggatttcct 2520
gaagtccgac ggcttcgcca acagaaactt catccagctg atccacgacg acagcctgac 2580
ctttaaagag gacatccaga aagcccaggt gtccggccag ggcgatagcc tgcacgagca 2640
cattgccaat ctggccggca gccccgccat taagaagggc atcctgcaga cagtgaaggt 2700
ggtggacgag ctcgtgaaag tgatgggccg gcacaagccc gagaacatcg tgatcgaaat 2760
ggccagagag aaccagacca cccagaaggg acagaagaac agccgcgaga gaatgaagcg 2820
gatcgaagag ggcatcaaag agctgggcag ccagatcctg aaagaacacc ccgtggaaaa 2880
cacccagctg cagaacgaga agctgtacct gtactacctg cagaatgggc gggatatgta 2940
cgtggaccag gaactggaca tcaaccggct gtccgactac gatgtggacc atatcgtgcc 3000
tcagagcttt ctgaaggacg actccatcga caacaaggtg ctgaccagaa gcgacaagaa 3060
ccggggcaag agcgacaacg tgccctccga agaggtcgtg aagaagatga agaactactg 3120
gcggcagctg ctgaacgcca agctgattac ccagagaaag ttcgacaatc tgaccaaggc 3180
cgagagaggc ggcctgagcg aactggataa ggccggcttc atcaagagac agctggtgga 3240
aacccggcag atcacaaagc acgtggcaca gatcctggac tcccggatga acactaagta 3300
cgacgagaat gacaagctga tccgggaagt gaaagtgatc accctgaagt ccaagctggt 3360
gtccgatttc cggaaggatt tccagtttta caaagtgcgc gagatcaaca actaccacca 3420
cgcccacgac gcctacctga acgccgtcgt gggaaccgcc ctgatcaaaa agtaccctaa 3480
gctggaaagc gagttcgtgt acggcgacta caaggtgtac gacgtgcgga agatgatcgc 3540
caagagcgag caggaaatcg gcaaggctac cgccaagtac ttcttctaca gcaacatcat 3600
gaactttttc aagaccgaga ttaccctggc caacggcgag atccggaagc ggcctctgat 3660
cgagacaaac ggcgaaaccg gggagatcgt gtgggataag ggccgggatt ttgccaccgt 3720
gcggaaagtg ctgagcatgc cccaagtgaa tatcgtgaaa aagaccgagg tgcagacagg 3780
cggcttcagc aaagagtcta tcctgcccaa gaggaacagc gataagctga tcgccagaaa 3840
gaaggactgg gaccctaaga agtacggcgg cttcgacagc cccaccgtgg cctattctgt 3900
gctggtggtg gccaaagtgg aaaagggcaa gtccaagaaa ctgaagagtg tgaaagagct 3960
gctggggatc accatcatgg aaagaagcag cttcgagaag aatcccatcg actttctgga 4020
agccaagggc tacaaagaag tgaaaaagga cctgatcatc aagctgccta agtactccct 4080
gttcgagctg gaaaacggcc ggaagagaat gctggcctct gccggcgtgc tgcagaaggg 4140
aaacgaactg gccctgccct ccaaatatgt gaacttcctg tacctggcca gccactatga 4200
gaagctgaag ggctcccccg aggataatga gcagaaacag ctgtttgtgg aacagcacaa 4260
gcactacctg gacgagatca tcgagcagat cagcgagttc tccaagagag tgatcctggc 4320
cgacgctaat ctggacaaag tgctgtccgc ctacaacaag caccgggata agcccatcag 4380
agagcaggcc gagaatatca tccacctgtt taccctgacc aatctgggag cccctgccgc 4440
cttcaagtac tttgacacca ccatcgaccg gaagaggtac accagcacca aagaggtgct 4500
ggacgccacc ctgatccacc agagcatcac cggcctgtac gagacacgga tcgacctgtc 4560
tcagctggga ggcgactctg gtggttctac taatctgtca gatattattg aaaaggagac 4620
cggtaagcaa ctggttatcc aggaatccat cctcatgctc ccagaggagg tggaagaagt 4680
cattgggaac aagccggaaa gcgatatact cgtgcacacc gcctacgacg agagcaccga 4740
cgagaatgtc atgcttctga ctagcgacgc ccctgaatac aagccttggg ctctggtcat 4800
acaggatagc aacggtgaga acaagattaa gatgctctct ggtggttctc ccaagaagaa 4860
gaggaaagtc taaccggtca tcatcaccat caccattgag tttaaacccg ctgatcagcc 4920
tcgactgtgc cttctagttg ccagccatct gttgtttgcc cctcccccgt gccttccttg 4980
accctggaag gtgccactcc cactgtcctt tcctaataaa atgaggaaat tgcatcgcat 5040
tgtctgagta ggtgtcattc tattctgggg ggtggggtgg ggcaggacag caagggggag 5100
gattgggaag acaatagcag gcatgctggg gatgcggtgg gctctatggc ttctgaggcg 5160
gaaagaacca gctggggctc gataccgtcg acctctagct agagcttggc gtaatcatgg 5220
tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa ttccacacaa catacgagcc 5280
ggaagcataa agtgtaaagc ctagggtgcc taatgagtga gctaactcac attaattgcg 5340
ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt gccagctgca ttaatgaatc 5400
ggccaacgcg cggggagagg cggtttgcgt attgggcgct cttccgcttc ctcgctcact 5460
gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc aaaggcggta 5520
atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc aaaaggccag 5580
caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag gctccgcccc 5640
cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc gacaggacta 5700
taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt tccgaccctg 5760
ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct ttctcatagc 5820
tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg ctgtgtgcac 5880
gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct tgagtccaac 5940
ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat tagcagagcg 6000
aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg ctacactaga 6060
agaacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa aagagttggt 6120
agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt ttgcaagcag 6180
cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc tacggggtct 6240
gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt atcaaaaagg 6300
atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta aagtatatat 6360
gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat ctcagcgatc 6420
tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac tacgatacgg 6480
gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg ctcaccggct 6540
ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag tggtcctgca 6600
actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt aagtagttcg 6660
ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt gtcacgctcg 6720
tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt tacatgatcc 6780
cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt cagaagtaag 6840
ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct tactgtcatg 6900
ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt ctgagaatag 6960
tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac cgcgccacat 7020
agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa actctcaagg 7080
atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa ctgatcttca 7140
gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca aaatgccgca 7200
aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct ttttcaatat 7260
tattgaagca tttatcaggg ttattgtctc atgagcggat acatatttga atgtatttag 7320
aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc tgacgtcgac 7380
ggatcgggag atcgatctcc cgatccccta gggtcgactc tcagtacaat ctgctctgat 7440
gccgcatagt taagccagta tctgctccct gcttgtgtgt tggaggtcgc tgagtagtgc 7500
gcgagcaaaa tttaagctac aacaaggcaa ggcttgaccg acaattgcat gaagaatctg 7560
cttagggtta ggcgttttgc gctgcttcgc gatgtacggg ccagatatac gcgttgacat 7620
tgattattga ctagttatta atagtaatca attacggggt cattagttca tagcccatat 7680
atggagttcc gcgttacata acttacggta aatggcccgc ctggctgacc gcccaacgac 7740
ccccgcccat tgacgtcaat aatgacgtat gttcccatag taacgccaat agggactttc 7800
cattgacgtc aatgggtgga gtatttacgg taaactgccc acttggcagt acatcaagtg 7860
tatcatatgc caagtacgcc ccctattgac gtcaatgacg gtaaatggcc cgcctggcat 7920
tatgcccagt acatgacctt atgggacttt cctacttggc agtacatcta cgtattagtc 7980
atcgctatta ccatggtgat gcggttttgg cagtacatca atgggcgtgg atagcggttt 8040
gactcacggg gatttccaag tctccacccc attgacgtca atgggagttt gttttggcac 8100
caaaatcaac gggactttcc aaaatgtcgt aacaactccg ccccattgac gcaaatgggc 8160
ggtaggcgtg tacggtggga ggtctatata agcagagctg gtttagtgaa ccgtcagatc 8220
cgctagagat ccgcggccgc taatacgact cactataggg agagccgcca ccatgaagcg 8280
caccgccgat ggttccgagt tcgaaagccc caaaaaaaag cgcaaggtcg ctaaaccagc 8340
aaaacgtatc aagagtgccg cagcggctta tgtgccacaa aaccgcgatg cggtgattac 8400
cgatattaaa cgcatcgggg atttacagcg cgaagcatca cgtctggaaa cggaaatgaa 8460
tgatgccatc gcggaaatta cggagaaatt tgcggcccgg attgcaccga ttaaaaccga 8520
tattgaaacc ctttcaaaag gcgttcaggg atggtgtgaa gcgaaccgcg acgaactgac 8580
gaacggcggc aaagtgaaga cggcgaatct tgtcaccggt gatgtatcgt ggcgggtccg 8640
tccaccatca gtaagtattc gtggtatgga tgcagtgatg gaaacgctgg agcgtcttgg 8700
cctgcaacgc tttattcgca cgaagcagga aatcaacaag gaagcgattt tactggaacc 8760
gaaagcggtc gcaggcgttg ccggaattac agttaaatca ggcattgagg atttttctat 8820
tattccattt gaacaggaag ccggtattag cggcagcgag actcccggga cctcagagtc 8880
cgctacaccc gaaagtagct cagagactgg cccagtggct gtggacccca cattgagacg 8940
gcggatcgag ccccatgagt ttgaggtatt cttcgatccg agagagctcc gcaaggagac 9000
ctgcctgctt tacgaaatta attggggggg ccggcactcc atttggcgac atacatcaca 9060
gaacactaac aagcacgtcg aagtcaactt catcgagaag ttcacgacag aaagatattt 9120
ctgtccgaac acaaggtgca gcattacctg gtttctcagc tggagcccat gcggcgaatg 9180
tagtagggcc atcactgaat t 9201
<210> 2
<211> 8792
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
tctggtggtt ctcccaagaa gaagaggaaa gtctaaccgg tcatcatcac catcaccatt 60
gagtttaaac ccgctgatca gcctcgactg tgccttctag ttgccagcca tctgttgttt 120
gcccctcccc cgtgccttcc ttgaccctgg aaggtgccac tcccactgtc ctttcctaat 180
aaaatgagga aattgcatcg cattgtctga gtaggtgtca ttctattctg gggggtgggg 240
tggggcagga cagcaagggg gaggattggg aagacaatag caggcatgct ggggatgcgg 300
tgggctctat ggcttctgag gcggaaagaa ccagctgggg ctcgataccg tcgacctcta 360
gctagagctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt tatccgctca 420
caattccaca caacatacga gccggaagca taaagtgtaa agcctagggt gcctaatgag 480
tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg ggaaacctgt 540
cgtgccagct gcattaatga atcggccaac gcgcggggag aggcggtttg cgtattgggc 600
gctcttccgc ttcctcgctc actgactcgc tgcgctcggt cgttcggctg cggcgagcgg 660
tatcagctca ctcaaaggcg gtaatacggt tatccacaga atcaggggat aacgcaggaa 720
agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc gcgttgctgg 780
cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc tcaagtcaga 840
ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga agctccctcg 900
tgcgctctcc tgttccgacc ctgccgctta ccggatacct gtccgccttt ctcccttcgg 960
gaagcgtggc gctttctcat agctcacgct gtaggtatct cagttcggtg taggtcgttc 1020
gctccaagct gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc gccttatccg 1080
gtaactatcg tcttgagtcc aacccggtaa gacacgactt atcgccactg gcagcagcca 1140
ctggtaacag gattagcaga gcgaggtatg taggcggtgc tacagagttc ttgaagtggt 1200
ggcctaacta cggctacact agaagaacag tatttggtat ctgcgctctg ctgaagccag 1260
ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa acaaaccacc gctggtagcg 1320
gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct caagaagatc 1380
ctttgatctt ttctacgggg tctgacgctc agtggaacga aaactcacgt taagggattt 1440
tggtcatgag attatcaaaa aggatcttca cctagatcct tttaaattaa aaatgaagtt 1500
ttaaatcaat ctaaagtata tatgagtaaa cttggtctga cagttaccaa tgcttaatca 1560
gtgaggcacc tatctcagcg atctgtctat ttcgttcatc catagttgcc tgactccccg 1620
tcgtgtagat aactacgata cgggagggct taccatctgg ccccagtgct gcaatgatac 1680
cgcgagaccc acgctcaccg gctccagatt tatcagcaat aaaccagcca gccggaaggg 1740
ccgagcgcag aagtggtcct gcaactttat ccgcctccat ccagtctatt aattgttgcc 1800
gggaagctag agtaagtagt tcgccagtta atagtttgcg caacgttgtt gccattgcta 1860
caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc ggttcccaac 1920
gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa agcggttagc tccttcggtc 1980
ctccgatcgt tgtcagaagt aagttggccg cagtgttatc actcatggtt atggcagcac 2040
tgcataattc tcttactgtc atgccatccg taagatgctt ttctgtgact ggtgagtact 2100
caaccaagtc attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc ccggcgtcaa 2160
tacgggataa taccgcgcca catagcagaa ctttaaaagt gctcatcatt ggaaaacgtt 2220
cttcggggcg aaaactctca aggatcttac cgctgttgag atccagttcg atgtaaccca 2280
ctcgtgcacc caactgatct tcagcatctt ttactttcac cagcgtttct gggtgagcaa 2340
aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa tgttgaatac 2400
tcatactctt cctttttcaa tattattgaa gcatttatca gggttattgt ctcatgagcg 2460
gatacatatt tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc acatttcccc 2520
gaaaagtgcc acctgacgtc gacggatcgg gagatcgatc tcccgatccc ctagggtcga 2580
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatctgctc cctgcttgtg 2640
tgttggaggt cgctgagtag tgcgcgagca aaatttaagc tacaacaagg caaggcttga 2700
ccgacaattg catgaagaat ctgcttaggg ttaggcgttt tgcgctgctt cgcgatgtac 2760
gggccagata tacgcgttga cattgattat tgactagtta ttaatagtaa tcaattacgg 2820
ggtcattagt tcatagccca tatatggagt tccgcgttac ataacttacg gtaaatggcc 2880
cgcctggctg accgcccaac gacccccgcc cattgacgtc aataatgacg tatgttccca 2940
tagtaacgcc aatagggact ttccattgac gtcaatgggt ggagtattta cggtaaactg 3000
cccacttggc agtacatcaa gtgtatcata tgccaagtac gccccctatt gacgtcaatg 3060
acggtaaatg gcccgcctgg cattatgccc agtacatgac cttatgggac tttcctactt 3120
ggcagtacat ctacgtatta gtcatcgcta ttaccatggt gatgcggttt tggcagtaca 3180
tcaatgggcg tggatagcgg tttgactcac ggggatttcc aagtctccac cccattgacg 3240
tcaatgggag tttgttttgg caccaaaatc aacgggactt tccaaaatgt cgtaacaact 3300
ccgccccatt gacgcaaatg ggcggtaggc gtgtacggtg ggaggtctat ataagcagag 3360
ctggtttagt gaaccgtcag atccgctaga gatccgcggc cgctaatacg actcactata 3420
gggccaccat gaagcgcacc gccgatggtt ccgagttcga aagccccaaa aaaaagcgca 3480
aggtcccgag agccgccacc atgtccgaag tcgagttttc ccatgagtac tggatgagac 3540
acgcattgac tctcgcaaag agggcttggg atgaacgcga ggtgcccgtg ggggcagtac 3600
tcgtgcataa caatcgcgta atcggcgaag gttggaatag gccgatcgga cgccacgacc 3660
ccactgcaca tgcggaaatc atggcccttc gacagggagg gcttgtgatg cagaattatc 3720
gacttatcga tgcgacgctg tacgtcacgc ttgaaccttg cgtaatgtgc gcgggagcta 3780
tgattcactc ccgcattgga cgagttgtat tcggtgcccg cgacgccaag acgggtgccg 3840
caggttcact gatggacgtg ctgcatcacc caggcatgaa ccaccgggta gaaatcacag 3900
aaggcatatt ggcggacgaa tgtgcggcgc tgttgtccga cttttttcgc atgcggaggc 3960
aggagatcaa ggcccagaaa aaagcacaat cctctactga cagcggcggc agcagcggcg 4020
gcagcagcgg cagcgagact cccgggacct cagagtccgc cacacccgaa agtagcggcg 4080
gcagcagcgg cggcagctcc gaagtcgagt tttcccatga gtactggatg agacacgcat 4140
tgactctcgc aaagagggct cgggatgaac gcgaggtgcc cgtgggggca gtactcgtgc 4200
ttaacaatcg cgtaatcggc gaaggttgga atagggcgat cggactccac gaccccactg 4260
cacatgcgga aatcatggcc cttcgacagg gagggcttgt gatgcagaat tatcgactta 4320
tcgatgcgac gctgtacgtc acgtttgaac cttgcgtaat gtgcgcggga gctatgattc 4380
actcccgcat tggacgagtt gtattcggtg tccgcaacgc caagacgggt gccgcaggtt 4440
cactgatgga cgtgctgcat tacccaggca tgaaccaccg ggtagaaatc acagaaggca 4500
tattggcgga cgaatgtgcg gcgctgttgt gctacttttt tcgcatgccg aggcaggtgt 4560
tcaatgccca gaaaaaagca caatcctcta ctgacagcgg cggcagcagc ggcggcagca 4620
gcggcagcga gactcccggg acctcagagt ccgccacacc cgaaagtagc ggcggcagca 4680
gcggcggcag cgataaaaag tattctattg gtttagccat cggcactaat tccgtgggct 4740
gggccgtgat caccgacgag tacaaggtgc ccagcaagaa attcaaggtg ctgggcaaca 4800
ccgaccggca cagcatcaag aagaacctga tcggagccct gctgttcgac agcggcgaaa 4860
cagccgaggc cacccggctg aagagaaccg ccagaagaag atacaccaga cggaagaacc 4920
ggatctgcta tctgcaagag atcttcagca acgagatggc caaggtggac gacagcttct 4980
tccacagact ggaagagtcc ttcctggtgg aagaggataa gaagcacgag cggcacccca 5040
tcttcggcaa catcgtggac gaggtggcct accacgagaa gtaccccacc atctaccacc 5100
tgagaaagaa actggtggac agcaccgaca aggccgacct gcggctgatc tatctggccc 5160
tggcccacat gatcaagttc cggggccact tcctgatcga gggcgacctg aaccccgaca 5220
acagcgacgt ggacaagctg ttcatccagc tggtgcagac ctacaaccag ctgttcgagg 5280
aaaaccccat caacgccagc ggcgtggacg ccaaggccat cctgtctgcc agactgagca 5340
agagcagacg gctggaaaat ctgatcgccc agctgcccgg cgagaagaag aatggcctgt 5400
tcggaaacct gattgccctg agcctgggcc tgacccccaa cttcaagagc aacttcgacc 5460
tggccgagga taccaaactg cagctgagca aggacaccta cgacgacgac ctggacaacc 5520
tgctggccca gatcggcgac cagtacgccg acctgtttct ggccgccaag aacctgtccg 5580
acgccatcct gctgagcgac atcctgagag tgaacaccga gatcaccaag gcccccctga 5640
gcgcctctat gatcaagctg tacgacgagc accaccagga cctgaccctg ctgaaagctc 5700
tcgtgcggca gcagctgcct gagaagtaca aagagatttt cttcgaccag agcaagaacg 5760
gctacgccgg ctacattgac ggcggagcca gccaggaaga gttctacaag ttcatcaagc 5820
ccatcctgga aaagatggac ggcaccgagg aactgctcgt gaagctgaac agagaggacc 5880
tgctgcggaa gcagcggacc ttcgacaacg gcatcatccc ccaccagatc cacctgggag 5940
agctgcacgc cattctgcgg cggcaggaag atttttaccc attcctgaag gacaaccggg 6000
aaaagatcga gaagatcctg accttccgca tcccctacta cgtgggccct ctggccaggg 6060
gaaacagcag attcgcctgg atgaccagaa agagcgagga aaccatcacc ccctggaact 6120
tcgagaaggt ggtggacaag ggcgcttccg cccagagctt catcgagcgg atgaccaact 6180
tcgataagaa cctgcccaac gagaaggtgc tgcccaagca cagcctgctg tacgagtact 6240
tcaccgtgta taacgagctg accaaagtga aatacgtgac cgagggaatg agaaagcccg 6300
ccttcctgag cggcgaccag aaaaaggcca tcgtggacct gctgttcaag accaaccgga 6360
aagtgaccgt gaagcagctg aaagaggact acttcaagaa aatcgagtgc ttcgactccg 6420
tggaaatctc cggcgtggaa gatcggttca acgcctccct gggcacatac cacgatctgc 6480
tgaaaattat caaggacaag gacttcctgg acaatgagga aaacgaggac attctggaag 6540
atatcgtgct gaccctgaca ctgtttgagg acagagagat gatcgaggaa cggctgaaaa 6600
cctatgccca cctgttcgac gacaaagtga tgaagcagct gaagcggcgg agatacaccg 6660
gctggggcag gctgagccgg aagctgatca acggcatccg ggacaagcag tccggcaaga 6720
caatcctgga tttcctgaag tccgacggct tcgccaacag aaacttcatc cagctgatcc 6780
acgacgacag cctgaccttt aaagaggaca tccagaaagc ccaggtgtcc ggccagggcg 6840
atagcctgca cgagcacatt gccaatctgg ccggcagccc cgccattaag aagggcatcc 6900
tgcagacagt gaaggtggtg gacgagctcg tgaaagtgat gggccggcac aagcccgaga 6960
acatcgtgat cgaaatggcc agagagaacc agaccaccca gaagggacag aagaacagcc 7020
gcgagagaat gaagcggatc gaagagggca tcaaagagct gggcagccag atcctgaaag 7080
aacaccccgt ggaaaacacc cagctgcaga acgagaagct gtacctgtac tacctgcaga 7140
atgggcggga tatgtacgtg gaccaggaac tggacatcaa ccggctgtcc gactacgatg 7200
tggaccatat cgtgcctcag agctttctga aggacgactc catcgacaac aaggtgctga 7260
ccagaagcga caagaaccgg ggcaagagcg acaacgtgcc ctccgaagag gtcgtgaaga 7320
agatgaagaa ctactggcgg cagctgctga acgccaagct gattacccag agaaagttcg 7380
acaatctgac caaggccgag agaggcggcc tgagcgaact ggataaggcc ggcttcatca 7440
agagacagct ggtggaaacc cggcagatca caaagcacgt ggcacagatc ctggactccc 7500
ggatgaacac taagtacgac gagaatgaca agctgatccg ggaagtgaaa gtgatcaccc 7560
tgaagtccaa gctggtgtcc gatttccgga aggatttcca gttttacaaa gtgcgcgaga 7620
tcaacaacta ccaccacgcc cacgacgcct acctgaacgc cgtcgtggga accgccctga 7680
tcaaaaagta ccctaagctg gaaagcgagt tcgtgtacgg cgactacaag gtgtacgacg 7740
tgcggaagat gatcgccaag agcgagcagg aaatcggcaa ggctaccgcc aagtacttct 7800
tctacagcaa catcatgaac tttttcaaga ccgagattac cctggccaac ggcgagatcc 7860
ggaagcggcc tctgatcgag acaaacggcg aaaccgggga gatcgtgtgg gataagggcc 7920
gggattttgc caccgtgcgg aaagtgctga gcatgcccca agtgaatatc gtgaaaaaga 7980
ccgaggtgca gacaggcggc ttcagcaaag agtctatcct gcccaagagg aacagcgata 8040
agctgatcgc cagaaagaag gactgggacc ctaagaagta cggcggcttc gacagcccca 8100
ccgtggccta ttctgtgctg gtggtggcca aagtggaaaa gggcaagtcc aagaaactga 8160
agagtgtgaa agagctgctg gggatcacca tcatggaaag aagcagcttc gagaagaatc 8220
ccatcgactt tctggaagcc aagggctaca aagaagtgaa aaaggacctg atcatcaagc 8280
tgcctaagta ctccctgttc gagctggaaa acggccggaa gagaatgctg gcctctgccg 8340
gcgtgctgca gaagggaaac gaactggccc tgccctccaa atatgtgaac ttcctgtacc 8400
tggccagcca ctatgagaag ctgaagggct cccccgagga taatgagcag aaacagctgt 8460
ttgtggaaca gcacaagcac tacctggacg agatcatcga gcagatcagc gagttctcca 8520
agagagtgat cctggccgac gctaatctgg acaaagtgct gtccgcctac aacaagcacc 8580
gggataagcc catcagagag caggccgaga atatcatcca cctgtttacc ctgaccaatc 8640
tgggagcccc tgccgccttc aagtactttg acaccaccat cgaccggaag aggtacacca 8700
gcaccaaaga ggtgctggac gccaccctga tccaccagag catcaccggc ctgtacgaga 8760
cacggatcga cctgtctcag ctgggaggcg ac 8792
<210> 3
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
accgggcact gcggctggag gtgg 24
<210> 4
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
aaacccacct ccagccgcag tgcc 24
<210> 5
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
accggcggtc tcaagcacta ccta 24
<210> 6
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
aaactaggta gtgcttgaga ccgc 24
<210> 7
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
accggagctg cacatactag cccc 24
<210> 8
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
aaacggggct agtatgtgca gctc 24
<210> 9
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
accggcagac ggcagtcact aggg 24
<210> 10
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
aaacccctag tgactgccgt ctgc 24
<210> 11
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
accggtcgca ggacagcttt tcct 24
<210> 12
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
aaacaggaaa agctgtcctg cgac 24
<210> 13
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
accggggaag ctgggtgaat ggag 24
<210> 14
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
aaacctccat tcacccagct tccc 24
<210> 15
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
accggcggag actctggtgc tgtg 24
<210> 16
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
aaaccacagc accagagtct ccgc 24
<210> 17
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
accgggtcag aaataggggg tcca 24
<210> 18
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
aaactggacc ccctatttct gacc 24
<210> 19
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
accggatcca ggtgctgcag aagg 24
<210> 20
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
aaacccttct gcagcacctg gatc 24
<210> 21
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 21
accggtcact ccaggattcc aata 24
<210> 22
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
aaactattgg aatcctggag tgac 24
<210> 23
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
accgattacc catggcgtcc ccag 24
<210> 24
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 24
aaacctgggg acgccatggg taat 24
<210> 25
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 25
accggatcac gggggacaac cgga 24
<210> 26
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 26
aaactccggt tgtcccccgt gatc 24
<210> 27
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 27
accgatgggt aatggtgcca gtct 24
<210> 28
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 28
aaacagactg gcaccattac ccat 24
<210> 29
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 29
accggtcccc cgtgatcaga acca 24
<210> 30
<211> 24
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 30
aaactggttc tgatcacggg ggac 24
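The 24-nt records SEQ ID NO. 3 to 30 come in consecutive pairs: each odd-numbered oligo starts with accg, each even-numbered oligo starts with aaac, and the remaining 20 nt of one is the reverse complement of the other. The listing does not state what these 4-nt prefixes are for, so the sketch below only checks the pairing, using three pairs copied from the records above. The function names and the `PAIRS` table are illustrative and not part of the patent.

```python
# Oligo pairs copied from SEQ ID NO. 3-8 of the listing (spaces removed).
PAIRS = {
    (3, 4): ("accgggcactgcggctggaggtgg", "aaacccacctccagccgcagtgcc"),
    (5, 6): ("accggcggtctcaagcactaccta", "aaactaggtagtgcttgagaccgc"),
    (7, 8): ("accggagctgcacatactagcccc", "aaacggggctagtatgtgcagctc"),
}

# Translation table for lowercase DNA bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_annealing_pair(top: str, bottom: str,
                      top_prefix: str = "accg",
                      bottom_prefix: str = "aaac") -> bool:
    """Check that the two oligos carry the expected 4-nt prefixes and that
    the remaining bases are reverse complements of each other."""
    if not (top.startswith(top_prefix) and bottom.startswith(bottom_prefix)):
        return False
    return reverse_complement(top[len(top_prefix):]) == bottom[len(bottom_prefix):]


if __name__ == "__main__":
    for (id_top, id_bottom), (top, bottom) in PAIRS.items():
        ok = is_annealing_pair(top, bottom)
        print(f"SEQ ID NO. {id_top}/{id_bottom}: complementary = {ok}")
```

The same check can be run on the remaining pairs (SEQ ID NO. 9 to 30) by adding them to `PAIRS`.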
<210> 31
<211> 4927
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 31
ggtaccgatt agtgaacgga tctcgacggt atcgatcacg agactagcct cgagcggccg 60
cccccttcac cgagggccta tttcccatga ttccttcata tttgcatata cgatacaagg 120
ctgttagaga gataattgga attaatttga ctgtaaacac aaagatatta gtacaaaata 180
cgtgacgtag aaagtaataa tttcttgggt agtttgcagt tttaaaatta tgttttaaaa 240
tggactatca tatgcttacc gtaacttgaa agtatttcga tttcttggct ttatatatct 300
tgtggaaagg acgaaacacc gtgagaccga gagagggtct cagttttaga gctagaaata 360
gcaagttaaa ataaggctag tccgttatca acttgaaaaa gtggcaccga gtcggtgctt 420
tttttaaaga attctcgacc tcgagacaaa tggcagtatt catccacaat tttaaaagaa 480
aaggggggat tggggggtac agtgcagggg aaagaatagt agacataata gcaacagaca 540
tacaaactaa agaattacaa aaacaaatta caaaaattca aaattttcgg gtttattaca 600
gggacagcag agatccactt tggccgcggc tcgagggggt tggggttgcg ccttttccaa 660
ggcagccctg ggtttgcgca gggacgcggc tgctctgggc gtggttccgg gaaacgcagc 720
ggcgccgacc ctgggactcg cacattcttc acgtccgttc gcagcgtcac ccggatcttc 780
gccgctaccc ttgtgggccc cccggcgacg cttcctgctc cgcccctaag tcgggaaggt 840
tccttgcggt tcgcggcgtg ccggacgtga caaacggaag ccgcacgtct cactagtacc 900
ctcgcagacg gacagcgcca gggagcaatg gcagcgcgcc gaccgcgatg ggctgtggcc 960
aatagcggct gctcagcagg gcgcgccgag agcagcggcc gggaaggggc ggtgcgggag 1020
gcggggtgtg gggcggtagt gtgggccctg ttcctgcccg cgcggtgttc cgcattctgc 1080
aagcctccgg agcgcacgtc ggcagtcggc tccctcgttg accgaatcac cgacctctct 1140
ccccaggggg atccatggtg agcaagggcg aggagctgtt caccggggtg gtgcccatcc 1200
tggtcgagct ggacggcgac gtaaacggcc acaagttcag cgtgtccggc gagggcgagg 1260
gcgatgccac ctacggcaag ctgaccctga agttcatctg caccaccggc aagctgcccg 1320
tgccctggcc caccctcgtg accaccctga cctacggcgt gcagtgcttc agccgctacc 1380
ccgaccacat gaagcagcac gacttcttca agtccgccat gcccgaaggc tacgtccagg 1440
agcgcaccat cttcttcaag gacgacggca actacaagac ccgcgccgag gtgaagttcg 1500
agggcgacac cctggtgaac cgcatcgagc tgaagggcat cgacttcaag gaggacggca 1560
acatcctggg gcacaagctg gagtacaact acaacagcca caacgtctat atcatggccg 1620
acaagcagaa gaacggcatc aaggtgaact tcaagatccg ccacaacatc gaggacggca 1680
gcgtgcagct cgccgaccac taccagcaga acacccccat cggcgacggc cccgtgctgc 1740
tgcccgacaa ccactacctg agcacccagt ccgccctgag caaagacccc aacgagaagc 1800
gcgatcacat ggtcctgctg gagttcgtga ccgccgccgg gatcactctc ggcatggacg 1860
agctgtacaa gtaaagcggc cgcgactcta gatcataatc agccatacca catttgtaga 1920
ggttttactt gctttaaaaa acctcccaca cctccccctg aacctgaaac ataaaatgaa 1980
tgcaattgtt gttgttaact tgtttattgc agcttataat ggttacaaat aaagcaatag 2040
catcacaaat ttcacaaata aagcattttt ttcactgcat tctagttgtg gtttgtccaa 2100
actcatcaat gtatcttagt cgaccgatgc ccttgagagc cttcaaccca gtcagctcct 2160
tccggtgggc gcggggcatg actatcgtcg ccgcacttat gactgtcttc tttatcatgc 2220
aactcgtagg acaggtgccg gcagcgctct tccgcttcct cgctcactga ctcgctgcgc 2280
tcggtcgttc ggctgcggcg agcggtatca gctcactcaa aggcggtaat acggttatcc 2340
acagaatcag gggataacgc aggaaagaac atgtgagcaa aaggccagca aaaggccagg 2400
aaccgtaaaa aggccgcgtt gctggcgttt ttccataggc tccgcccccc tgacgagcat 2460
cacaaaaatc gacgctcaag tcagaggtgg cgaaacccga caggactata aagataccag 2520
gcgtttcccc ctggaagctc cctcgtgcgc tctcctgttc cgaccctgcc gcttaccgga 2580
tacctgtccg cctttctccc ttcgggaagc gtggcgcttt ctcaatgctc acgctgtagg 2640
tatctcagtt cggtgtaggt cgttcgctcc aagctgggct gtgtgcacga accccccgtt 2700
cagcccgacc gctgcgcctt atccggtaac tatcgtcttg agtccaaccc ggtaagacac 2760
gacttatcgc cactggcagc agccactggt aacaggatta gcagagcgag gtatgtaggc 2820
ggtgctacag agttcttgaa gtggtggcct aactacggct acactagaag gacagtattt 2880
ggtatctgcg ctctgctgaa gccagttacc ttcggaaaaa gagttggtag ctcttgatcc 2940
ggcaaacaaa ccaccgctgg tagcggtggt ttttttgttt gcaagcagca gattacgcgc 3000
agaaaaaaag gatctcaaga agatcctttg atcttttcta cggggtctga cgctcagtgg 3060
aacgaaaact cacgttaagg gattttggtc atgagattat caaaaaggat cttcacctag 3120
atccttttaa attaaaaatg aagttttaaa tcaatctaaa gtatatatga gtaaacttgg 3180
tctgacagtt accaatgctt aatcagtgag gcacctatct cagcgatctg tctatttcgt 3240
tcatccatag ttgcctgact ccccgtcgtg tagataacta cgatacggga gggcttacca 3300
tctggcccca gtgctgcaat gataccgcgg gacccacgct caccggctcc agatttatca 3360
gcaataaacc agccagccgg aagggccgag cgcagaagtg gtcctgcaac tttatccgcc 3420
tccatccagt ctattaattg ttgccgggaa gctagagtaa gtagttcgcc agttaatagt 3480
ttgcgcaacg ttgttgccat tgctacaggc atcgtggtgt cacgctcgtc gtttggtatg 3540
gcttcattca gctccggttc ccaacgatca aggcgagtta catgatcccc catgttgtgc 3600
aaaaaagcgg ttagctcctt cggtcctccg atcgttgtca gaagtaagtt ggccgcagtg 3660
ttatcactca tggttatggc agcactgcat aattctctta ctgtcatgcc atccgtaaga 3720
tgcttttctg tgactggtga gtactcaacc aagtcattct gagaatagtg tatgcggcga 3780
ccgagttgct cttgcccggc gtcaatacgg gataataccg cgccacatag cagaacttta 3840
aaagtgctca tcattggaaa acgttcttcg gggcgaaaac tctcaaggat cttaccgctg 3900
ttgagatcca gttcgatgta acccactcgt gcacccaact gatcttcagc atcttttact 3960
ttcaccagcg tttctgggtg agcaaaaaca ggaaggcaaa atgccgcaaa aaagggaata 4020
agggcgacac ggaaatgttg aatactcata ctcttccttt ttcaatatta ttgaagcatt 4080
tatcagggtt attgtctcat gagcggatac atatttgaat gtatttagaa aaataaacaa 4140
ataggggttc cgcgcacatt tccccgaaaa gtgccacctg acgcgccctg tagcggcgca 4200
ttaagcgcgg cgggtgtggt ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta 4260
gcgcccgctc ctttcgcttt cttcccttcc tttctcgcca cgttcgccgg ctttccccgt 4320
caagctctaa atcgggggct ccctttaggg ttccgattta gtgctttacg gcacctcgac 4380
cccaaaaaac ttgattaggg tgatggttca cgtagtgggc catcgccctg atagacggtt 4440
tttcgccctt tgacgttgga gtccacgttc tttaatagtg gactcttgtt ccaaactgga 4500
acaacactca accctatctc ggtctattct tttgatttat aagggatttt gccgatttcg 4560
gcctattggt taaaaaatga gctgatttaa caaaaattta acgcgaattt taacaaaata 4620
ttaacgttta caatttccca ttcgccattc aggctgcgca actgttggga agggcgatcg 4680
gtgcgggcct cttcgctatt acgccagccc aagctaccat gataagtaag taatattaag 4740
gtacgggagg tacttggagc ggccgcaata aaatatcttt attttcatta catctgtgtg 4800
ttggtttttt gtgtgaatcg atagtactaa catacgctct ccatcaaaac aaaacgaaac 4860
aaaacaaact agcaaaatag gctgtcccca gtgcaagtgc aggtgccaga acatttctct 4920
atcgata 4927
<210> 32
<211> 167
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 32
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro
35 40 45
Ile Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp
165
<210> 33
<211> 167
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 33
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
1 5 10 15
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
20 25 30
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
35 40 45
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
50 55 60
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
65 70 75 80
Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
85 90 95
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly
100 105 110
Ala Ala Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His
115 120 125
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
130 135 140
Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys
145 150 155 160
Lys Ala Gln Ser Ser Thr Asp
165
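SEQ ID NO. 32 and SEQ ID NO. 33 are both 167 residues long and differ at only a few positions. A minimal sketch for listing those positions is given below. It assumes the two records are compared position by position without gaps, which is reasonable here because the lengths match. The residues are copied from the two records above, and the helper name `substitutions` is illustrative.

```python
# Residues copied from SEQ ID NO. 32 (three-letter codes, 16 per line).
SEQ_32 = """
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg Glu Val Pro Val Gly Ala
Val Leu Val His Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Pro
Ile Gly Arg His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
Tyr Val Thr Leu Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
Ser Arg Ile Gly Arg Val Val Phe Gly Ala Arg Asp Ala Lys Thr Gly
Ala Ala Gly Ser Leu Met Asp Val Leu His His Pro Gly Met Asn His
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
Leu Ser Asp Phe Phe Arg Met Arg Arg Gln Glu Ile Lys Ala Gln Lys
Lys Ala Gln Ser Ser Thr Asp
""".split()

# Residues copied from SEQ ID NO. 33.
SEQ_33 = """
Met Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu
Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala
Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala
Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg
Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu
Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His
Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly
Ala Ala Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His
Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu
Leu Cys Tyr Phe Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys
Lys Ala Gln Ser Ser Thr Asp
""".split()


def substitutions(ref, alt):
    """Return (position, ref_residue, alt_residue) for every mismatch.
    Positions are 1-based, matching the numbering in the listing."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(ref, alt), start=1) if a != b]


subs = substitutions(SEQ_32, SEQ_33)
identity = 100.0 * (len(SEQ_32) - len(subs)) / len(SEQ_32)

if __name__ == "__main__":
    for pos, a, b in subs:
        print(f"{a}{pos}{b}")
    print(f"{len(subs)} substitutions, {identity:.1f}% identical positions")
```

For the two records as listed, this reports 14 substitutions, the first being Trp23Arg and the last Lys157Asn.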
<210> 34
<211> 1367
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 34
Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
1 5 10 15
Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys
20 25 30
Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly
35 40 45
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys
50 55 60
Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
65 70 75 80
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser Phe
85 90 95
Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
100 105 110
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His
115 120 125
Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser
130 135 140
Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
145 150 155 160
Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp
165 170 175
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn
180 185 190
Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys
195 200 205
Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu
210 215 220
Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu
225 230 235 240
Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp
245 250 255
Leu Ala Glu Asp Thr Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp
260 265 270
Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu
275 280 285
Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile
290 295 300
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met
305 310 315 320
Ile Lys Leu Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala
325 330 335
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp
340 345 350
Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
355 360 365
Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly
370 375 380
Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys
385 390 395 400
Gln Arg Thr Phe Asp Asn Gly Ile Ile Pro His Gln Ile His Leu Gly
405 410 415
Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu
420 425 430
Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro
435 440 445
Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
450 455 460
Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Lys Val
465 470 475 480
Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
485 490 495
Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu
500 505 510
Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
515 520 525
Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Asp Gln Lys
530 535 540
Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val
545 550 555 560
Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser
565 570 575
Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
580 585 590
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn
595 600 605
Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu
610 615 620
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His
625 630 635 640
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr
645 650 655
Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
660 665 670
Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala
675 680 685
Asn Arg Asn Phe Ile Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys
690 695 700
Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His
705 710 715 720
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile
725 730 735
Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
740 745 750
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr
755 760 765
Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu
770 775 780
Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
785 790 795 800
Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln
805 810 815
Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
820 825 830
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp
835 840 845
Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg Gly
850 855 860
Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn
865 870 875 880
Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
885 890 895
Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
900 905 910
Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
915 920 925
His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu
930 935 940
Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys
945 950 955 960
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu
965 970 975
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val Val
980 985 990
Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
995 1000 1005
Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser
1010 1015 1020
Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
1025 1030 1035 1040
Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
1045 1050 1055
Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val
1060 1065 1070
Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met
1075 1080 1085
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe
1090 1095 1100
Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala
1105 1110 1115 1120
Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro
1125 1130 1135
Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys
1140 1145 1150
Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met
1155 1160 1165
Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys
1170 1175 1180
Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr
1185 1190 1195 1200
Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala
1205 1210 1215
Gly Val Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro
1235 1240 1245
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr
1250 1255 1260
Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
1265 1270 1275 1280
Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His
1285 1290 1295
Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe
1300 1305 1310
Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
1315 1320 1325
Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala
1330 1335 1340
Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp
1345 1350 1355 1360
Leu Ser Gln Leu Gly Gly Asp
1365
<210> 35
<211> 18
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 35
Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys Arg
1 5 10 15
Lys Val
<210> 36
<211> 7
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 36
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 37
<211> 32
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 37
Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr
1 5 10 15
Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser
20 25 30
<210> 38
<211> 4
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 38
Ser Gly Gly Ser
1
<210> 39
<211> 1799
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 39
Met Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys
1 5 10 15
Arg Lys Val Pro Arg Ala Ala Thr Met Ser Glu Val Glu Phe Ser His
20 25 30
Glu Tyr Trp Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Trp Asp
35 40 45
Glu Arg Glu Val Pro Val Gly Ala Val Leu Val His Asn Asn Arg Val
50 55 60
Ile Gly Glu Gly Trp Asn Arg Pro Ile Gly Arg His Asp Pro Thr Ala
65 70 75 80
His Ala Glu Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn
85 90 95
Tyr Arg Leu Ile Asp Ala Thr Leu Tyr Val Thr Leu Glu Pro Cys Val
100 105 110
Met Cys Ala Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe
115 120 125
Gly Ala Arg Asp Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val
130 135 140
Leu His His Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile
145 150 155 160
Leu Ala Asp Glu Cys Ala Ala Leu Leu Ser Asp Phe Phe Arg Met Arg
165 170 175
Arg Gln Glu Ile Lys Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser
180 185 190
Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser
195 200 205
Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Ser
210 215 220
Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr Leu
225 230 235 240
Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala Val Leu
245 250 255
Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala Ile Gly
260 265 270
Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln Gly
275 280 285
Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr Val
290 295 300
Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser Arg
305 310 315 320
Ile Gly Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly Ala Ala
325 330 335
Gly Ser Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His Arg Val
340 345 350
Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu Cys
355 360 365
Tyr Phe Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys Lys Ala
370 375 380
Gln Ser Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser
385 390 395 400
Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly
405 410 415
Ser Ser Gly Gly Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly
420 425 430
Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro
435 440 445
Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys
450 455 460
Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu
465 470 475 480
Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys
485 490 495
Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys
500 505 510
Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu
515 520 525
Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp
530 535 540
Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys
545 550 555 560
Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu
565 570 575
Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly
580 585 590
Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu
595 600 605
Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser
610 615 620
Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg
625 630 635 640
Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly
645 650 655
Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe
660 665 670
Lys Ser Asn Phe Asp Leu Ala Glu Asp Thr Lys Leu Gln Leu Ser Lys
675 680 685
Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp
690 695 700
Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile
705 710 715 720
Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro
725 730 735
Leu Ser Ala Ser Met Ile Lys Leu Tyr Asp Glu His His Gln Asp Leu
740 745 750
Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys
755 760 765
Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp
770 775 780
Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu
785 790 795 800
Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu
805 810 815
Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ile Ile Pro His
820 825 830
Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp
835 840 845
Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu
850 855 860
Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser
865 870 875 880
Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp
885 890 895
Asn Phe Glu Lys Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile
900 905 910
Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu
915 920 925
Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu
930 935 940
Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu
945 950 955 960
Ser Gly Asp Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn
965 970 975
Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile
980 985 990
Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn
995 1000 1005
Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys
1010 1015 1020
Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val
1025 1030 1035 1040
Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu
1045 1050 1055
Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys
1060 1065 1070
Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn
1075 1080 1085
Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys
1090 1095 1100
Ser Asp Gly Phe Ala Asn Arg Asn Phe Ile Gln Leu Ile His Asp Asp
1105 1110 1115 1120
Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln
1125 1130 1135
Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala
1140 1145 1150
Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val
1155 1160 1165
Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala
1170 1175 1180
Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg
1185 1190 1195 1200
Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu
1205 1210 1215
Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr
1220 1225 1230
Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu
1235 1240 1245
Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln
1250 1255 1260
Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser
1265 1270 1275 1280
Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val
1285 1290 1295
Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile
1300 1305 1310
Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu
1315 1320 1325
Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr
1330 1335 1340
Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn
1345 1350 1355 1360
Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile
1365 1370 1375
Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe
1380 1385 1390
Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr
1395 1400 1405
Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu
1410 1415 1420
Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys
1425 1430 1435 1440
Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr
1445 1450 1455
Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu
1460 1465 1470
Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu
1475 1480 1485
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg
1490 1495 1500
Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val
1505 1510 1515 1520
Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser
1525 1530 1535
Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly
1540 1545 1550
Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys
1555 1560 1565
Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu
1570 1575 1580
Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp
1585 1590 1595 1600
Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile
1605 1610 1615
Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg
1620 1625 1630
Met Leu Ala Ser Ala Gly Val Leu Gln Lys Gly Asn Glu Leu Ala Leu
1635 1640 1645
Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys
1650 1655 1660
Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu
1665 1670 1675 1680
Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe
1685 1690 1695
Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser
1700 1705 1710
Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
1715 1720 1725
Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe
1730 1735 1740
Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys
1745 1750 1755 1760
Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr
1765 1770 1775
Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly Gly Ser
1780 1785 1790
Pro Lys Lys Lys Arg Lys Val
1795
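
For readers who want to compare the listing above with one-letter protein sequences, the three-letter residue codes can be collapsed programmatically. The following is a minimal illustrative Python sketch. The function name `listing_to_one_letter` is hypothetical, and the sketch assumes that purely numeric tokens in the listing are position markers that can be skipped.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(text: str) -> str:
    """Convert a three-letter sequence listing to a one-letter string.

    Assumption: tokens made only of digits are position markers
    (e.g. "1795") and carry no residue information.
    """
    residues = []
    for token in text.split():
        if token.isdigit():
            continue  # skip position markers
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue code: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


if __name__ == "__main__":
    # Final line of the listing above, followed by its position marker.
    tail = "Pro Lys Lys Lys Arg Lys Val\n1795"
    print(listing_to_one_letter(tail))  # prints PKKKRKV
```

The final seven residues of the listing read PKKKRKV in one-letter code, which is the motif of the widely used SV40 nuclear localization signal.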

Claims (14)

1. A fusion protein comprising an ecTadA-ecTadA dimer fragment and an xCas9n fragment, said ecTadA-ecTadA dimer fragment comprising an ecTad fragment and an ecTadA fragment.
2. The fusion protein of claim 1, wherein the amino acid sequence of the ecTadA fragment comprises:
a) the amino acid sequence shown in SEQ ID NO. 32; or
b) an amino acid sequence having 80% or more sequence similarity to SEQ ID NO. 32 and having the function of the amino acid sequence defined in a), preferably being capable of forming a dimer with an ecTadA fragment, the dimer having adenine deaminase activity.
3. The fusion protein of claim 1, wherein the amino acid sequence of the ecTadA fragment comprises:
c) the amino acid sequence shown in SEQ ID NO. 33; or
d) an amino acid sequence having 80% or more sequence similarity to SEQ ID NO. 33 and having the function of the amino acid sequence defined in c), preferably being capable of forming a dimer with an ecTadA fragment, the dimer having adenine deaminase activity.
4. The fusion protein of claim 1, wherein the amino acid sequence of the xCas9n fragment comprises:
e) the amino acid sequence shown in SEQ ID NO. 34; or
f) an amino acid sequence having 80% or more sequence similarity to SEQ ID NO. 34 and having the function of the amino acid sequence defined in e), preferably being capable of recognizing NG, GAA or GAT as a PAM.
5. The fusion protein of claim 1, comprising, in order from the N-terminus to the C-terminus, the ecTadA-ecTadA dimer fragment and the xCas9n fragment;
and/or wherein the ecTadA-ecTadA dimer fragment comprises, in order from the N-terminus to the C-terminus, an ecTad fragment and an ecTadA fragment.
6. The fusion protein of claim 1, further comprising a nuclear localization signal fragment, preferably wherein the nuclear localization signal fragment is located N-terminal and/or C-terminal to the ecTadA-ecTadA dimer fragment and the xCas9n fragment, and preferably wherein the amino acid sequence of the nuclear localization signal fragment is as shown in SEQ ID NO. 35 or SEQ ID NO. 36;
and/or wherein the fusion protein further comprises a flexible linker peptide fragment, the amino acid sequence of which is preferably as shown in SEQ ID NO. 37 or SEQ ID NO. 38.
7. The fusion protein of claim 1, wherein the amino acid sequence of the fusion protein is as shown in SEQ ID NO. 39.
8. An isolated polynucleotide encoding the fusion protein of any one of claims 1 to 7.
9. A construct comprising the isolated polynucleotide of claim 8.
10. An expression system comprising the construct of claim 9, or having the exogenous polynucleotide of claim 8 integrated into its genome.
11. The expression system of claim 10, wherein the host cell of the expression system is selected from eukaryotic cells or prokaryotic cells; preferably from mouse cells, hamster cells, non-human primate cells, or human cells; more preferably from mouse neuroblastoma cells, Chinese hamster ovary cells, African green monkey kidney cells, human osteosarcoma cells, human embryonic kidney cells, or human cervical cancer cells; and still more preferably from N2a cells, CHO cells, COS-7 cells, U2OS cells, HEK293FT cells, or HeLa cells.
12. Use of a fusion protein according to any one of claims 1 to 7, an isolated polynucleotide according to claim 8, a construct according to claim 9 or an expression system according to any one of claims 10 to 11 for gene editing, preferably eukaryotic gene editing.
13. A base editing system comprising the fusion protein of any one of claims 1 to 7, the base editing system further comprising a sgRNA.
14. A method of gene editing, comprising performing gene editing with the fusion protein of any one of claims 1 to 7 or the base editing system of claim 13.
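
Claim 4 names NG, GAA and GAT as PAM sequences that the xCas9n fragment may recognize. As a rough illustration of what that PAM range means for target selection, the following Python sketch lists candidate protospacers on one strand of a DNA sequence. It is a sketch under stated assumptions: the names `PAM_PATTERNS` and `find_pam_sites` are hypothetical, and the 20-nucleotide protospacer located immediately 5' of the PAM follows the usual SpCas9 convention rather than anything stated in the claims.

```python
import re

# PAM motifs named in claim 4; "N" stands for any base.
PAM_PATTERNS = {"NG": "[ACGT]G", "GAA": "GAA", "GAT": "GAT"}

# Assumption: 20-nt protospacer (common SpCas9 convention, not from the claims).
PROTOSPACER_LEN = 20


def find_pam_sites(dna: str, spacer_len: int = PROTOSPACER_LEN):
    """List (start, protospacer, pam_name, pam_seq) on the given strand only.

    A site is reported when a PAM motif directly follows a window of
    spacer_len bases. The reverse strand is not scanned.
    """
    dna = dna.upper()
    sites = []
    for start in range(len(dna) - spacer_len):
        pam_start = start + spacer_len
        for name, pattern in PAM_PATTERNS.items():
            candidate = dna[pam_start:pam_start + len(name)]
            # Skip truncated windows at the end of the sequence.
            if len(candidate) == len(name) and re.fullmatch(pattern, candidate):
                sites.append((start, dna[start:pam_start], name, candidate))
    return sites


if __name__ == "__main__":
    example = "C" * 20 + "GAT"  # made-up sequence for illustration
    for site in find_pam_sites(example):
        print(site)
```

To cover both strands, the same function could be applied to the reverse complement of the input. Whether a listed site is actually edited depends on factors outside this sketch, such as the position of the target adenine within the protospacer.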
CN201910591202.0A 2019-07-02 2019-07-02 Base editing tool and application thereof Active CN112175927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910591202.0A CN112175927B (en) 2019-07-02 2019-07-02 Base editing tool and application thereof

Publications (2)

Publication Number Publication Date
CN112175927A (en) 2021-01-05
CN112175927B (en) 2023-04-18

Family

ID=73915849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910591202.0A Active CN112175927B (en) 2019-07-02 2019-07-02 Base editing tool and application thereof

Country Status (1)

Country Link
CN (1) CN112175927B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101595228A (en) * 2005-07-21 2009-12-02 Abbott Laboratories Multiple gene expression including sORF constructs and methods with polyproteins, pro-proteins, and proteolysis
WO2018071868A1 (en) * 2016-10-14 2018-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
CN108822217A (en) * 2018-02-23 2018-11-16 ShanghaiTech University Gene base editor
CN109804066A (en) * 2016-08-09 2019-05-24 President and Fellows of Harvard College Programmable Cas9-recombinase fusion proteins and uses thereof

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116497067A (en) * 2019-02-13 2023-07-28 Beam Therapeutics Inc. Compositions and methods for treating hemoglobinopathies
CN115161305A (en) * 2021-04-02 2022-10-11 ShanghaiTech University Fusion protein comprising double-base editor and preparation method and application thereof
CN115161305B (en) * 2021-04-02 2023-05-12 ShanghaiTech University Fusion protein comprising double-base editor and preparation method and application thereof
CN113694218A (en) * 2021-08-30 2021-11-26 Kunming University of Science and Technology Gene repair treatment vector for ATP7B gene P992L mutation

Also Published As

Publication number Publication date
CN112175927B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant