CN115093482A - High-precision adenine base editor and application thereof - Google Patents

Publication number
CN115093482A
Authority
CN
China
Prior art keywords
leu
lys
glu
seq
asp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210538473.1A
Other languages
Chinese (zh)
Inventor
欧阳红生
袁泓明
王子茹
逄大欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Jitang Biotechnology Research Institute Co ltd
Original Assignee
Chongqing Jitang Biotechnology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Jitang Biotechnology Research Institute Co ltd
Priority to CN202210538473.1A
Publication of CN115093482A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/93 Ligases (6)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04002 Adenine deaminase (3.5.4.2)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y603/00 Ligases forming carbon-nitrogen bonds (6.3)
    • C12Y603/02 Acid-amino-acid ligases (peptide synthases) (6.3.2)
    • C12Y603/02019 Ubiquitin-protein ligase (6.3.2.19), i.e. ubiquitin-conjugating enzyme
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/10 Plasmid DNA
    • C12N2800/106 Plasmid DNA for vertebrates
    • C12N2800/107 Plasmid DNA for vertebrates for mammalian

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)

Abstract

The invention provides a high-precision ABEs base editor and its application. A fusion protein comprises, from the N-terminus to the C-terminus, a first region and a second region; the first region comprises ABEs, which comprise adenine deaminase or an enzymatically active component thereof and nCas9; the second region comprises the e18 protein. The fusion protein fuses the e18 protein to an ABEs base editing system; the fused e18 accelerates degradation of the editor's nCas9-TadA and shortens the half-life of the ABEs base editor, thereby improving its accuracy.

Description

High-precision adenine base editor and application thereof
Technical Field
The invention belongs to the technical field of biology, and particularly relates to a high-precision adenine base editor and application thereof.
Background
A single-base editor is a gene editing system formed by fusion expression of a Cas9 protein with a deaminase (mainly adenine deaminase or cytosine deaminase) and other proteins. The system can convert one base pair into another accurately and irreversibly without introducing DNA double-strand breaks or exogenous repair templates. Single-base editors currently include mainly adenine base editors (ABEs), cytosine base editors (CBEs), guanine base editors (GCBEs), and DdCBEs and TALEDs, which enable precise editing of mitochondrial genomes.
An adenine base editor (ABE) is a fusion protein consisting of nCas9 (D10A), an artificially modified adenine deaminase and other components. Guided by an sgRNA, the protein specifically recognizes and binds the target sequence and deaminates adenine (A) to inosine (I); I is then read as G during DNA repair and replication, so that an A-to-G conversion is achieved. Because the base editing system is highly efficient and simple to operate, several versions of ABEs (mainly ABE7.10, ABEmax, ABE8e and the like) are now widely used and have greatly advanced the life sciences (for example gene therapy, human disease models, biomedicine and the study of disease mechanisms). However, as research has progressed, researchers have found that ABEs, CBEs and the like suffer from problems such as an overly wide editing window and genome-wide off-target effects (including Cas9-dependent and Cas9-independent off-target effects).
Therefore, in terms of application, developing a base editor with higher accuracy is important for the life sciences. Studies have shown that the DNA base editing ribonucleoprotein complex (RNP) formed by an sgRNA and an ABEs fusion protein is rapidly degraded by proteases in cells, which markedly improves the purity of the editing product and reduces off-target effects. In practice, however, RNPs are expensive to synthesize and difficult to store, which greatly limits their application. The ubiquitin-proteasome system (UPS), the major pathway of protein degradation, is the most important mechanism for controlling protein levels. Ubiquitination involves three main steps: activation, conjugation and ligation. Ubiquitin is first activated by an E1 ubiquitin-activating enzyme; an E2 ubiquitin-conjugating enzyme then receives the ubiquitin transferred from E1; finally, an E3 ligase attaches ubiquitin selectively to a lysine, serine, threonine or cysteine residue of the target protein. The E3 ligase can bind directly to the substrate and determines the specificity of the ubiquitin system. Rad18 is a RING-type E3 ubiquitin ligase that plays a critical role in DNA damage repair, and the e18 protein is a Rad18 variant from which the SAP domain has been removed.
Disclosure of Invention
The invention aims to construct a high-precision ABEs base editor in which the e18 protein is fused to an ABEs base editing system. The fused e18 accelerates degradation of the editor's nCas9-TadA and shortens the half-life of the ABEs base editor, which improves its precision. Compared with a DNA base editing RNP complex, the editor is cheaper, simpler to operate and easier to store, and no RNP needs to be synthesized in vitro.
The purpose of the invention is realized by the following technical scheme:
A fusion protein comprising, from N-terminus to C-terminus, a first region and a second region, the first region comprising ABEs comprising adenine deaminase or an enzymatically active component thereof and nCas9; the second region comprises the e18 protein; the fusion protein optionally comprises one or more linker amino acid sequences located in the first region and between the first region and the second region of the fusion protein.
As a preferred technical scheme of the invention: the ABEs are one of the ABE7.10, ABEmax and ABE8e base editors.
As a preferred technical scheme of the invention: the fusion protein also includes a nuclear localization signal fragment.
As a preferred technical scheme of the invention: when the ABEs are ABEmax, the amino acid sequence of the fusion protein is shown in SEQ ID NO.40; when the ABEs are ABE8e, the amino acid sequence of the fusion protein is shown in SEQ ID NO.41.
It is also an object of the present invention to provide a polynucleotide encoding the above fusion protein. The polynucleotide sequence comprises a first region encoding the ABEs and a second region encoding the e18 protein, and optionally one or more linker amino acid sequences located in the first region and between the first and second regions of the fusion protein.
As a preferred technical scheme of the invention: when the ABEs are ABEmax, the polynucleotide sequence is constructed as follows: the gene sequence of ABEmax before modification is shown in SEQ ID NO.2; the upstream primer F1 for effectively amplifying e18 gene fragment 1 of the target gene is shown in SEQ ID NO.3, and the downstream primer R1 is shown in SEQ ID NO.4; the upstream primer F2 for effectively amplifying e18 gene fragment 2 of the target gene is shown in SEQ ID NO.5, and the downstream primer R2 is shown in SEQ ID NO.6; the upstream primer F3 for effectively amplifying the e18 fragment carrying the restriction sites is shown in SEQ ID NO.7, the downstream primer R3 is shown in SEQ ID NO.8, and the e18 gene sequence carrying the two restriction sites is shown in SEQ ID NO.9; the sequence obtained by ligating the restriction-digested SEQ ID NO.9 fragment to the restriction-digested SEQ ID NO.2 fragment is the polynucleotide sequence shown in SEQ ID NO.1.
As a preferred technical scheme of the invention: when the ABEs are ABE8e, the polynucleotide sequence shown in SEQ ID NO.10 is constructed as follows: the gene sequence of ABE8e before modification is shown in SEQ ID NO.11; the e18 gene sequence carrying the two restriction sites is shown in SEQ ID NO.9; the sequence obtained by ligating the SEQ ID NO.9 fragment to the restriction-digested SEQ ID NO.11 fragment is the polynucleotide sequence shown in SEQ ID NO.10.
It is also an object of the present invention to provide a construct comprising the polynucleotide. The construct may be constructed by inserting the polynucleotide into an appropriate expression vector. The expression vector may be, but is not limited to, a pCMV expression vector, a pSV2 expression vector, and the like.
It is also an object of the present invention to provide an expression system comprising said construct or having integrated into its genome the exogenous polynucleotide as described above. The expression system can be a host cell that can express the fusion protein as described above, which can be coordinated with the sgRNA so that the fusion protein can be targeted to the target region, enabling base editing of the target region.
It is also an object of the invention to provide a use, in particular a use of said fusion protein, said polynucleotide, said construct or said expression system in gene editing for converting the base A to G.
It is a further object of the invention to provide a base editing system comprising the fusion protein and an sgRNA, the fusion protein and the sgRNA cooperating to localize the fusion protein to a target region.
It is still another object of the present invention to provide a method for gene editing comprising gene editing by converting the base A into G using the fusion protein or the base editing system.
The beneficial effects are as follows:
The high-accuracy adenine base editors fuse an exogenous protein, e18, to existing adenine base editors; the gene sequences of the plasmids are shown in SEQ ID NO.1 and SEQ ID NO.10, namely ABEmax-e18 and ABE8e-e18. Both have markedly more precise editing windows, spanning positions 5-7 and positions 1-9 respectively. Positive clonal cells edited at PCSK9 showed significantly increased LDL uptake, indicating the potential of the editor for gene therapy. The invention offers a further direction for improving precise base editors and adds to the set of available editing tools.
By constructing ABEs protein expression plasmids, the invention can shorten the half-life of the ABEs protein, accelerate its degradation in cells, increase the accuracy of ABEs and reduce their off-target effects.
Drawings
FIG. 1a is a schematic representation of the ABEmax-e18 plasmid vector of the present invention;
FIG. 1b is a schematic representation of the ABE8e-e18 plasmid vector of the present invention;
FIG. 2 is a sequencing statistical plot of the sgRNA base editing efficiency of ABEmax and ABEmax-e18 at multiple endogenous sites according to the present invention;
FIG. 3 is a sequencing diagram of a positive PCSK9 editing clone of the present invention;
FIG. 4 is a comparison of LDL uptake by wild-type cells of the invention and positive PCSK9 editing clone cells;
FIG. 5 is a sequencing statistical plot of the sgRNA base editing efficiency of ABE8e and ABE8e-e18 at multiple endogenous sites according to the present invention.
Detailed Description
The present invention provides in a first aspect a fusion protein comprising, from N-terminus to C-terminus, a first region and a second region, the first region comprising ABEs comprising adenine deaminase or an enzymatically active component thereof and nCas9; the second region comprises the e18 protein; the fusion protein optionally comprises one or more linker amino acid sequences located in the first region and between the first region and the second region of the fusion protein. The fusion protein performs base editing at a target site under the guidance of an sgRNA. The ABEs fragment is an existing ABEs base editor, and the e18 sequence expressed in fusion with it accelerates degradation of the editor's nCas9-TadA, so that the half-life of the ABEs base editor is shortened and its accuracy is improved.
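The relationship between a shorter half-life and reduced exposure of the genome to the editor can be illustrated numerically. The following is a minimal sketch that assumes simple first-order decay of the editor protein, which the specification does not state; the half-lives of 24 h and 8 h and the function names are hypothetical and chosen only for illustration.

```python
import math


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of editor protein left after t_hours, assuming first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


def total_exposure(half_life_hours):
    """Area under the decay curve from t = 0 to infinity (half_life / ln 2).

    Used here as a rough proxy for how long the genome is exposed to the editor.
    """
    return half_life_hours / math.log(2)


# Hypothetical half-lives: an unfused editor and a faster-degrading fusion.
for label, half_life in (("unfused editor (assumed 24 h)", 24.0),
                         ("e18 fusion (assumed 8 h)", 8.0)):
    print(label,
          "fraction left at 24 h:", round(fraction_remaining(24.0, half_life), 3),
          "relative exposure:", round(total_exposure(half_life), 1))
```

Under this model the total exposure scales linearly with the half-life, so a threefold shorter half-life gives a threefold smaller exposure; real degradation kinetics may differ.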
In the fusion protein provided by the invention, the ABEs are one of ABE7.10, ABEmax and ABE8e base editors.
In the fusion protein provided by the invention, the amino acid sequence of the fusion protein is shown as SEQ ID NO.40 or SEQ ID NO. 41; or an amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.40 or to SEQ ID NO.41 and having the function of the amino acid sequence defined by SEQ ID NO.40 or SEQ ID NO. 41. Specifically, the amino acid sequence having more than 80% sequence similarity with SEQ ID NO.40 or SEQ ID NO.41 specifically refers to: the polypeptide fragment is obtained by substituting, deleting or adding one or more (specifically, 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids to the amino acid sequence shown in SEQ ID NO.40 or SEQ ID NO.41, or adding one or more (specifically, 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids to the N-terminus and/or C-terminus, and has the function of the polypeptide fragment shown in SEQ ID NO.40 or SEQ ID NO. 41. The sequence similarity, which generally refers to the percentage of amino acid residues in the sequences that are identical in comparison, can be calculated using computational software known in the art for the similarity of two or more sequences of interest, e.g., software from NCBI.
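As an illustration of sequence similarity expressed as the percentage of identical residues, the sketch below computes percent identity for two sequences that have already been aligned (gaps written as "-"). It is not the NCBI software mentioned above; the function name, the choice of denominator (all aligned columns that are not gap-gap) and the short example sequences are assumptions made for illustration, and different tools define the denominator differently.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned and of equal length; '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical short peptides, not taken from SEQ ID NO.40 or SEQ ID NO.41.
print(percent_identity("MKLEDSEQ", "MKLEESEQ"))   # one substitution in 8 columns
print(percent_identity("MK-LE", "MKALE"))         # one insertion in 5 columns
```

A variant would meet the 80% threshold described above when this value, computed against SEQ ID NO.40 or SEQ ID NO.41, is 80 or higher.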
In the fusion protein provided by the invention, the substitution, deletion or addition can be conservative amino acid substitution. The "conservative amino acid substitution" may specifically refer to the case where an amino acid residue is substituted with another amino acid residue having a similar side chain. Families of amino acid residues with similar side chains should be known to those skilled in the art.
In the fusion protein provided by the invention, the fusion protein can also comprise a nuclear localization signal segment, and the nuclear localization signal segment can generally interact with a nuclear carrier, so that the protein can be transported into a nucleus.
In a second aspect, the present invention provides a polynucleotide encoding the fusion protein described above.
The polynucleotide sequence provided by the invention is shown in SEQ ID NO.1 or SEQ ID NO. 10.
In the polynucleotide provided by the invention, the polynucleotide sequence shown in SEQ ID NO.1 is constructed as follows: the gene sequence of ABEmax before modification is shown in SEQ ID NO.2; the upstream primer F1 for effectively amplifying e18 gene fragment 1 of the target gene is shown in SEQ ID NO.3, and the downstream primer R1 is shown in SEQ ID NO.4; the upstream primer F2 for effectively amplifying e18 gene fragment 2 of the target gene is shown in SEQ ID NO.5, and the downstream primer R2 is shown in SEQ ID NO.6; the upstream primer F3 for effectively amplifying the e18 fragment carrying the restriction sites is shown in SEQ ID NO.7, the downstream primer R3 is shown in SEQ ID NO.8, and the e18 fragment carrying the two restriction sites is shown in SEQ ID NO.9; the sequence obtained by ligating the restriction-digested SEQ ID NO.9 fragment to the restriction-digested SEQ ID NO.2 fragment is the polynucleotide sequence shown in SEQ ID NO.1.
In the polynucleotide provided by the invention, the polynucleotide sequence shown in SEQ ID NO.10 is constructed as follows: the gene sequence of ABE8e before modification is shown in SEQ ID NO.11; the sequence obtained by ligating the SEQ ID NO.9 fragment to the restriction-digested SEQ ID NO.11 fragment is shown in SEQ ID NO.10.
In a third aspect, the invention provides a construct comprising the polynucleotide. The construct may be constructed by inserting the polynucleotide into an appropriate expression vector. One skilled in the art can select an appropriate expression vector, for example, the expression vector can be, but is not limited to, a pCMV expression vector, a pSV2 expression vector, and the like.
In a fourth aspect, the invention provides an expression system comprising the construct or having the exogenous polynucleotide integrated into its genome. The expression system can be a host cell that expresses the fusion protein described above, which cooperates with the sgRNA so that the fusion protein is targeted to the target region, enabling base editing of the target region. In another embodiment of the present invention, the host cell may be a eukaryotic cell and/or a prokaryotic cell, more specifically a mouse cell, a human cell, etc., and still more specifically a mouse neuroblastoma cell, a human embryonic kidney cell, a human cervical cancer cell, a human colon cancer cell, a human osteosarcoma cell, etc.
A fifth aspect of the invention provides a use of said fusion protein, said polynucleotide, said construct or said expression system in gene editing. The edited organism is preferably a eukaryote, in particular a metazoan, including but not limited to humans, mice, etc. The use specifically includes, but is not limited to, base editing from A to G, which can be applied to edit a splice acceptor/donor site to regulate RNA splicing, and can also be used to construct a model (e.g., a disease model, a cell model, an animal model, etc.) or to treat human diseases. In one embodiment of the present invention, the object being edited may be an embryo, a cell, or the like.
The sixth aspect of the invention provides a base editing system, which includes the fusion protein and an sgRNA. One skilled in the art can select an appropriate sgRNA targeting a specific site according to the region of the gene to be edited. For example, the sequence of the sgRNA is generally at least partially complementary to the target region, so that it can cooperate with the fusion protein to localize the fusion protein to the target region, resulting in base editing in the target region, namely the conversion of base A to G.
The seventh aspect of the present invention provides a method for gene editing, comprising performing gene editing by converting the base A into G using the fusion protein or the base editing system. For example, the gene editing method may include: culturing the expression system provided by the fourth aspect of the present invention under appropriate conditions to express the fusion protein, wherein the fusion protein performs base editing on the target region in the presence of an sgRNA that targets the target region and cooperates with the fusion protein. Methods for providing conditions under which the sgRNA is present should be known to those skilled in the art; for example, an expression system capable of expressing the sgRNA may be cultured under appropriate conditions, and this expression system may be a host cell containing an expression vector that carries a polynucleotide encoding the sgRNA, or a host cell in which a polynucleotide encoding the sgRNA is chromosomally integrated. In a specific embodiment of the invention, the sgRNA and the fusion protein can be expressed in the same host cell, which can be a target cell. In another embodiment of the invention, the gene editing is in vitro gene editing.
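How an sgRNA spacer picks out a target region can be sketched as a simple string search. The example below assumes that the nCas9 is derived from SpCas9 and therefore requires an NGG PAM immediately 3' of the protospacer; the specification does not state the PAM, so this is an assumption, as are the function names and the toy flanking sequence. The spacer itself is the SgRNA-8 sequence listed in Example 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(dna, spacer):
    """Find exact spacer matches followed by an NGG PAM on either strand.

    Returns (strand, start, pam) tuples; start is 0-based on the strand
    that was searched ('-' means the reverse complement of `dna`).
    """
    spacer = spacer.upper()
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        start = seq.find(spacer)
        while start != -1:
            pam = seq[start + len(spacer):start + len(spacer) + 3]
            if len(pam) == 3 and pam[1:] == "GG":  # assumed SpCas9 NGG PAM
                hits.append((strand, start, pam))
            start = seq.find(spacer, start + 1)
    return hits


spacer = "GCACCTACCTCGGGAGCTGA"      # SgRNA-8 from Example 2
toy_dna = "TT" + spacer + "TGG" + "AA"  # hypothetical flanks and PAM
print(find_target_sites(toy_dna, spacer))
```

The search uses exact matching only; tolerance for mismatches, which matters for off-target prediction, is outside the scope of this sketch.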
The embodiments of the present invention are described below with specific examples, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and its several details are capable of modifications and variations in various obvious respects, all without departing from the spirit of the invention.
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments, and is not intended to limit the scope of the present invention; in the description and claims of the present application, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
When numerical ranges are given in the examples, it is understood that both endpoints of each range and any value between them can be selected unless otherwise indicated herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices, and materials used in the examples, any methods, devices, and materials similar or equivalent to those described in the examples can be used in the practice of the invention, in keeping with the knowledge of one skilled in the art and the description of the invention.
Unless otherwise indicated, the experimental methods, detection methods, and preparation methods disclosed herein all employ techniques conventional in the art of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related arts.
Example 1
Construction of ABEmax-e18 and ABE8e-e18 plasmids
Primers were designed and synthesized against the e18 sequence, and an e18 fragment carrying restriction sites was obtained by in vitro PCR (polymerase chain reaction) amplification. The fragment was digested and ligated separately to ABEmax and ABE8e fragments digested in the same way, giving the ABEmax-e18 and ABE8e-e18 plasmids. The main elements of the vector are, in order: adenine deaminase, nCas9, e18 and the nuclear localization signal. The editor plasmid acts together with the sgRNA plasmid, from which characteristics such as editing activity are obtained. After the constructed plasmids were verified by sequencing, the target plasmids were extracted, ethanol-precipitated and purified to give expression vectors at a certain concentration; the plasmid maps are shown in FIG. 1.
The invention constructs the ABEs expression plasmid which can shorten the half-life period of ABEs protein, accelerate the degradation of the ABEs protein in cells, increase the accuracy of the ABEs and reduce the off-target effect of the ABEs.
The fusion protein sequentially comprises deaminase, nCas9 and an e18 fragment.
Example 2
Design of sgRNA sequence and plasmid construction
sgRNA sequences usable in human cells were designed and synthesized. The single-stranded DNA oligonucleotides for each sgRNA were annealed to form the oligonucleotide duplexes for the different sites, which were then ligated into the sgRNA backbone plasmid vector. The sequences of these sgRNAs are:
sgRNA-1 sequence: 5'-GAATACTAAGCATAGACTCC-3'
sgRNA-2 sequence: 5'-GTAAACAAAGCATAGACTGA-3'
sgRNA-3 sequence: 5'-GAACACAAAGCATAGACTGC-3'
sgRNA-4 sequence: 5'-GATGAGATAATGATGAGTCA-3'
sgRNA-5 sequence: 5'-GACAAACCAGAAGCCGCTCC-3'
sgRNA-6 sequence: 5'-GGGAATAAATCATAGAATCC-3'
sgRNA-7 sequence: 5'-GGAACACAAAGCATAGACTG-3'
sgRNA-8 sequence: 5'-GCACCTACCTCGGGAGCTGA-3'
sgRNA-9 sequence: 5'-GGAATCCCTTCTGCAGCACC-3'
sgRNA-10 sequence: 5'-TCAGAAAGTGGTGGCTGGTG-3'
sgRNA-11 sequence: 5'-GGCCCAGACTGAGCACGTGA-3'
sgRNA-12 sequence: 5'-ATATTTGCATTGAGATAGTG-3'
sgRNA-13 sequence: 5'-GTCATCTTAGTCATTACCTG-3'
sgRNA-14 sequence: 5'-GAAGATAGAGAATAGACTGC-3'
After sequencing verification of the constructed sgRNA expression vector, extracting a target plasmid, performing ethanol precipitation, and purifying the sgRNA expression vector with a certain concentration.
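Before ordering oligonucleotides it can be useful to sanity-check the spacers listed above. The sketch below checks that each of the fourteen sequences is 20 nt long and contains only A, C, G and T, and lists the positions of adenines, which are the candidate substrates of an adenine base editor. Positions are numbered from 1 at the 5' end of the spacer; this numbering convention and the function names are assumptions of the sketch, not statements from the specification.

```python
SPACERS = {
    "sgRNA-1": "GAATACTAAGCATAGACTCC",
    "sgRNA-2": "GTAAACAAAGCATAGACTGA",
    "sgRNA-3": "GAACACAAAGCATAGACTGC",
    "sgRNA-4": "GATGAGATAATGATGAGTCA",
    "sgRNA-5": "GACAAACCAGAAGCCGCTCC",
    "sgRNA-6": "GGGAATAAATCATAGAATCC",
    "sgRNA-7": "GGAACACAAAGCATAGACTG",
    "sgRNA-8": "GCACCTACCTCGGGAGCTGA",
    "sgRNA-9": "GGAATCCCTTCTGCAGCACC",
    "sgRNA-10": "TCAGAAAGTGGTGGCTGGTG",
    "sgRNA-11": "GGCCCAGACTGAGCACGTGA",
    "sgRNA-12": "ATATTTGCATTGAGATAGTG",
    "sgRNA-13": "GTCATCTTAGTCATTACCTG",
    "sgRNA-14": "GAAGATAGAGAATAGACTGC",
}


def is_valid_spacer(seq, length=20):
    """True if seq has the expected length and contains only A, C, G, T."""
    return len(seq) == length and set(seq.upper()) <= set("ACGT")


def adenine_positions(seq):
    """1-based positions of adenines, counted from the 5' end of the spacer."""
    return [pos for pos, base in enumerate(seq.upper(), start=1) if base == "A"]


for name, seq in SPACERS.items():
    print(name, "valid" if is_valid_spacer(seq) else "INVALID", adenine_positions(seq))
```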
Example 3
Co-transfection of ABEs plasmid and sgRNA expression vector
3-1 Co-transfection of ABEmax and ABEmax-e18 plasmids with sgRNA expression vectors
HEK293T cells were plated and, at about 80% confluence, transfected by lipofection with the ABEmax or ABEmax-e18 plasmid together with an sgRNA expression vector. 72 hours after transfection, the genomic DNA of each group of cells was extracted and amplified by PCR with primers specific to the target sites in order to detect mutation efficiency. The PCR products were sequenced, and the editing efficiency at each sgRNA site and the editing window of each editor were evaluated from the sequencing chromatograms. As shown in FIG. 2, sgRNA1 (corresponding to SEQ ID NO.12 and SEQ ID NO.13), sgRNA2 (SEQ ID NO.14 and SEQ ID NO.15), sgRNA3 (SEQ ID NO.16 and SEQ ID NO.17), sgRNA4 (SEQ ID NO.18 and SEQ ID NO.19), sgRNA5 (SEQ ID NO.20 and SEQ ID NO.21), sgRNA6 (SEQ ID NO.22 and SEQ ID NO.23) and sgRNA7 (SEQ ID NO.24 and SEQ ID NO.25) can each effectively guide the Cas9 protein to edit the target site in cells, and the characteristics of the editing window were obtained: the editable window of ABEmax spans positions 3-8, whereas the editable window of ABEmax-e18 spans positions 5-7.
3-2 Co-transfection of ABE8e and ABE8e-e18 plasmids with sgRNA expression vectors
Similarly, HEK293T cells were plated and, at about 80% confluence, transfected by lipofection with the ABE8e or ABE8e-e18 plasmid together with an sgRNA expression vector. 72 hours after transfection, the genomic DNA of each group of cells was extracted and amplified by PCR with primers specific to the target sites in order to detect mutation efficiency. The PCR products were sequenced, and the editing efficiency at each sgRNA site and the editing window of each editor were evaluated from the sequencing chromatograms. As shown in FIG. 5, sgRNA1 (corresponding to SEQ ID NO.12 and SEQ ID NO.13), sgRNA2 (SEQ ID NO.14 and SEQ ID NO.15), sgRNA3 (SEQ ID NO.16 and SEQ ID NO.17), sgRNA4 (SEQ ID NO.18 and SEQ ID NO.19), sgRNA6 (SEQ ID NO.22 and SEQ ID NO.23), sgRNA9 (SEQ ID NO.28 and SEQ ID NO.29), sgRNA10 (SEQ ID NO.30 and SEQ ID NO.31), sgRNA11 (SEQ ID NO.32 and SEQ ID NO.33), sgRNA12 (SEQ ID NO.34 and SEQ ID NO.35), sgRNA13 (SEQ ID NO.36 and SEQ ID NO.37) and sgRNA14 (SEQ ID NO.38 and SEQ ID NO.39) can each effectively guide the Cas9 protein to edit the target site in cells, and the characteristics of the editing window were obtained: the editable window of ABE8e spans positions 1-14, whereas the editable window of ABE8e-e18 spans positions 1-9. For the range of sites and the range of efficiencies within the editable window, see FIG. 5.
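The practical effect of a narrower editing window can be illustrated by counting how many adenines of a given spacer fall inside each reported window. The sketch below uses the four windows stated in this example and the SgRNA-1 spacer from Example 2. It assumes positions are numbered from 1 at the 5' end of the spacer, and the "idealized product" converts every in-window A to G, which real editing does not do with 100% efficiency; the function names are illustrative.

```python
# Editing windows reported in Example 3 (inclusive, 1-based positions).
WINDOWS = {
    "ABEmax": (3, 8),
    "ABEmax-e18": (5, 7),
    "ABE8e": (1, 14),
    "ABE8e-e18": (1, 9),
}


def editable_adenines(spacer, window):
    """1-based positions of adenines that lie inside the editing window."""
    start, end = window
    return [pos for pos, base in enumerate(spacer.upper(), start=1)
            if base == "A" and start <= pos <= end]


def idealized_product(spacer, window):
    """Spacer with every in-window A replaced by G (idealized, full conversion)."""
    start, end = window
    return "".join(
        "G" if base == "A" and start <= pos <= end else base
        for pos, base in enumerate(spacer.upper(), start=1)
    )


spacer = "GAATACTAAGCATAGACTCC"  # SgRNA-1 from Example 2
for editor, window in WINDOWS.items():
    print(editor, window, editable_adenines(spacer, window),
          idealized_product(spacer, window))
```

Fewer adenines inside the window means fewer possible bystander edits next to the intended one, which is one way to read the accuracy gain attributed to the e18 fusions.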
3-3 Co-transfection of ABEmax-e18 plasmid with PCSK9-sgRNA expression vector
HepG2 cells were thawed and cultured; when nearly confluent, they were washed 2-3 times with PBS, the supernatant was removed, and electrotransfection buffer was added. The ABEmax-e18 plasmid and the sgRNA expression plasmid were added to the cells and buffer in proportion, mixed gently by pipetting, and drawn gently into the electroporation pipette fitted with its dedicated tip. The pipette was inserted into the pipette station holding an electrode cup filled with electrotransfection buffer, the program was set on the instrument, and the start button was pressed. After the pulse, the mixture was left to stand for 2 minutes and was then transferred from the electroporation pipette into a cell culture dish. Finally, the cell culture dish was placed in a carbon dioxide incubator at 37 ℃. After 12 hours of culture, the medium was changed. The results in FIG. 3 show that the sgRNA8 sequence corresponding to SEQ ID NO.26 and SEQ ID NO.27 obtained above can effectively guide the Cas9 protein to edit the target site in HepG2 cells.
Example 4. Preparation of PCSK9-edited positive clonal HepG2 cells
72 h after electrotransfection, the HepG2 cells were digested with trypsin and plated in 100 mm cell culture dishes by the limiting dilution method, and the culture medium was changed every 2-3 days. After 8-10 days, once cell clones had grown out, the clones were marked under a fluorescence microscope, and the marked clones were picked into a 24-well cell culture plate for further culture. After 2-3 days, when the cells in the 24-well plate had reached a certain confluency, half of the cells were lysed with NP40 lysis buffer, and the PCSK9 site-directed base editing event was then verified by PCR and sequencing, as shown in FIG. 4.
Example 5. In vitro LDL uptake assay of PCSK9 site-directed base-edited positive HepG2 cells
The HepG2 cells obtained above were cultured in DMEM supplemented with 5% FBS, 1% penicillin-streptomycin, 1 mM sodium pyruvate, 1% glutamine and 1% non-essential amino acids. The cells were seeded at a certain density in 6-well cell culture plates. When the cell density reached 70%, DiI-LDL was diluted 1:100 and incubated with the cells for 3 hours. The supernatant was then discarded and the cells were washed three times with PBS, fixed in the plates for 2 h with 4% paraformaldehyde solution, and washed three times with PBS for 5 minutes each. The cells were permeabilized for 10 minutes with 0.5% Triton X-100 (in PBS) and washed three times with PBS for 5 minutes each. DAPI at 0.5 µg/ml (in PBS) was added for 10 minutes of staining, followed by three washes with PBS. The cells were then observed under a fluorescence microscope. The fluorescence intensity results show that the site-specifically edited cells took up more LDL than the unedited control cells, see FIG. 5.
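The 1:100 dilution of the labeled LDL can be worked out with a small helper such as the one below. It is only a sketch: it assumes that "1:100" means one part stock in 100 parts final volume, and the 2000 µl per-well volume is a hypothetical value, since the document gives neither the convention nor the working volume.

```python
from typing import Tuple


def dilution_volumes(final_volume_ul: float, dilution_factor: float) -> Tuple[float, float]:
    """Return (stock_ul, diluent_ul) for a 1:dilution_factor dilution by final volume."""
    if final_volume_ul <= 0 or dilution_factor < 1:
        raise ValueError("final volume must be positive and dilution factor at least 1")
    stock_ul = final_volume_ul / dilution_factor
    return stock_ul, final_volume_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical working volume of 2000 µl per well of a 6-well plate.
    stock, medium = dilution_volumes(2000.0, 100.0)
    print(f"stock: {stock} µl, medium: {medium} µl")
```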
In conclusion, the invention makes use of the action of an exogenous protein: the e18 gene sequence shown in SEQ ID NO.9 (the rad18 gene with the SAP domain removed) is fused to ABEmax and ABE8e, the commonly used adenine base editors with the highest editing activity, thereby obtaining editors with a more precise editing window and lower editing activity and providing a new direction for the further improvement of such editors.
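The short primer sequences in the sequence listing (SEQ ID NO.3 to SEQ ID NO.8) can be checked against one another with a few lines of Python. The sketch below only tests string relationships: whether SEQ ID NO.7 and SEQ ID NO.8 end with SEQ ID NO.3 and SEQ ID NO.6, whether SEQ ID NO.5 begins with the reverse complement of SEQ ID NO.4, and whether SEQ ID NO.7 and SEQ ID NO.8 begin with the EcoRI (GAATTC) and AgeI (ACCGGT) recognition sequences. Reading these relationships as a tailed-primer, overlap-extension design for cloning the e18 fragment is an inference from the sequences alone, not something this section states; the function and variable names are illustrative.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _clean(listing_text: str) -> str:
    """Strip the spacing used in the sequence listing."""
    return listing_text.replace(" ", "")


# Sequences copied from the sequence listing, block spacing preserved.
SEQ3 = _clean("atggactccc tggccgagtc t")
SEQ4 = _clean("cggcttcctt ttgtgaacag")
SEQ5 = _clean("ctgttcacaa aaggaagccg cacatgtaca atgcccaatg cg")
SEQ6 = _clean("attcctatta cgcttgtttc ttgg")
SEQ7 = _clean("gaattcagca cagggagcat gggaatggac tccctggccg agtct")
SEQ8 = _clean(
    "accggttgga ctttcctctt cttcttgggc tcgaactcgc tgccgtcggc ggttcttttt"
    "tcattcctat tacgcttgtt tcttgg"
)


def primer_relationships() -> Dict[str, bool]:
    """Report simple containment relationships between the listed primers."""
    return {
        "SEQ7 ends with SEQ3": SEQ7.endswith(SEQ3),
        "SEQ8 ends with SEQ6": SEQ8.endswith(SEQ6),
        "SEQ5 starts with revcomp(SEQ4)": SEQ5.startswith(reverse_complement(SEQ4)),
        "SEQ7 starts with EcoRI site": SEQ7.startswith("gaattc"),
        "SEQ8 starts with AgeI site": SEQ8.startswith("accggt"),
    }


if __name__ == "__main__":
    for name, ok in primer_relationships().items():
        print(f"{name}: {ok}")
```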
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present invention shall be covered by the claims of the present invention.
Sequence listing
<110> Chongqing Jitang Biotechnology Research Institute Co., Ltd.
<120> High-precision adenine base editor and application thereof
<160> 41
<170> SIPOSequenceListing 1.0
<210> 1
<211> 10224
<212> DNA
<213> Artificial sequence
<400> 1
atatgccaag tacgccccct attgacgtca atgacggtaa atggcccgcc tggcattatg 60
cccagtacat gaccttatgg gactttccta cttggcagta catctacgta ttagtcatcg 120
ctattaccat ggtgatgcgg ttttggcagt acatcaatgg gcgtggatag cggtttgact 180
cacggggatt tccaagtctc caccccattg acgtcaatgg gagtttgttt tggcaccaaa 240
atcaacggga ctttccaaaa tgtcgtaaca actccgcccc attgacgcaa atgggcggta 300
ggcgtgtacg gtgggaggtc tatataagca gagctggttt agtgaaccgt cagatccgct 360
agagatccgc ggccgctaat acgactcact atagggagag ccgccaccat gaaacggaca 420
gccgacggaa gcgagttcga gtcaccaaag aagaagcgga aagtctctga agtcgagttt 480
agccacgagt attggatgag gcacgcactg accctggcaa agcgagcatg ggatgaaaga 540
gaagtccccg tgggcgccgt gctggtgcac aacaatagag tgatcggaga gggatggaac 600
aggccaatcg gccgccacga ccctaccgca cacgcagaga tcatggcact gaggcaggga 660
ggcctggtca tgcagaatta ccgcctgatc gatgccaccc tgtatgtgac actggagcca 720
tgcgtgatgt gcgcaggagc aatgatccac agcaggatcg gaagagtggt gttcggagca 780
cgggacgcca agaccggcgc agcaggctcc ctgatggatg tgctgcacca ccccggcatg 840
aaccaccggg tggagatcac agagggaatc ctggcagacg agtgcgccgc cctgctgagc 900
gatttcttta gaatgcggag acaggagatc aaggcccaga agaaggcaca gagctccacc 960
gactctggag gatctagcgg aggatcctct ggaagcgaga caccaggcac aagcgagtcc 1020
gccacaccag agagctccgg cggctcctcc ggaggatcct ctgaggtgga gttttcccac 1080
gagtactgga tgagacatgc cctgaccctg gccaagaggg cacgcgatga gagggaggtg 1140
cctgtgggag ccgtgctggt gctgaacaat agagtgatcg gcgagggctg gaacagagcc 1200
atcggcctgc acgacccaac agcccatgcc gaaattatgg ccctgagaca gggcggcctg 1260
gtcatgcaga actacagact gattgacgcc accctgtacg tgacattcga gccttgcgtg 1320
atgtgcgccg gcgccatgat ccactctagg atcggccgcg tggtgtttgg cgtgaggaac 1380
gcaaaaaccg gcgccgcagg ctccctgatg gacgtgctgc actaccccgg catgaatcac 1440
cgcgtcgaaa ttaccgaggg aatcctggca gatgaatgtg ccgccctgct gtgctatttc 1500
tttcggatgc ctagacaggt gttcaatgct cagaagaagg cccagagctc caccgactcc 1560
ggaggatcta gcggaggctc ctctggctct gagacacctg gcacaagcga gagcgcaaca 1620
cctgaaagca gcgggggcag cagcgggggg tcagacaaga agtacagcat cggcctggcc 1680
atcggcacca actctgtggg ctgggccgtg atcaccgacg agtacaaggt gcccagcaag 1740
aaattcaagg tgctgggcaa caccgaccgg cacagcatca agaagaacct gatcggagcc 1800
ctgctgttcg acagcggcga aacagccgag gccacccggc tgaagagaac cgccagaaga 1860
agatacacca gacggaagaa ccggatctgc tatctgcaag agatcttcag caacgagatg 1920
gccaaggtgg acgacagctt cttccacaga ctggaagagt ccttcctggt ggaagaggat 1980
aagaagcacg agcggcaccc catcttcggc aacatcgtgg acgaggtggc ctaccacgag 2040
aagtacccca ccatctacca cctgagaaag aaactggtgg acagcaccga caaggccgac 2100
ctgcggctga tctatctggc cctggcccac atgatcaagt tccggggcca cttcctgatc 2160
gagggcgacc tgaaccccga caacagcgac gtggacaagc tgttcatcca gctggtgcag 2220
acctacaacc agctgttcga ggaaaacccc atcaacgcca gcggcgtgga cgccaaggcc 2280
atcctgtctg ccagactgag caagagcaga cggctggaaa atctgatcgc ccagctgccc 2340
ggcgagaaga agaatggcct gttcggaaac ctgattgccc tgagcctggg cctgaccccc 2400
aacttcaaga gcaacttcga cctggccgag gatgccaaac tgcagctgag caaggacacc 2460
tacgacgacg acctggacaa cctgctggcc cagatcggcg accagtacgc cgacctgttt 2520
ctggccgcca agaacctgtc cgacgccatc ctgctgagcg acatcctgag agtgaacacc 2580
gagatcacca aggcccccct gagcgcctct atgatcaaga gatacgacga gcaccaccag 2640
gacctgaccc tgctgaaagc tctcgtgcgg cagcagctgc ctgagaagta caaagagatt 2700
ttcttcgacc agagcaagaa cggctacgcc ggctacattg acggcggagc cagccaggaa 2760
gagttctaca agttcatcaa gcccatcctg gaaaagatgg acggcaccga ggaactgctc 2820
gtgaagctga acagagagga cctgctgcgg aagcagcgga ccttcgacaa cggcagcatc 2880
ccccaccaga tccacctggg agagctgcac gccattctgc ggcggcagga agatttttac 2940
ccattcctga aggacaaccg ggaaaagatc gagaagatcc tgaccttccg catcccctac 3000
tacgtgggcc ctctggccag gggaaacagc agattcgcct ggatgaccag aaagagcgag 3060
gaaaccatca ccccctggaa cttcgaggaa gtggtggaca agggcgcttc cgcccagagc 3120
ttcatcgagc ggatgaccaa cttcgataag aacctgccca acgagaaggt gctgcccaag 3180
cacagcctgc tgtacgagta cttcaccgtg tataacgagc tgaccaaagt gaaatacgtg 3240
accgagggaa tgagaaagcc cgccttcctg agcggcgagc agaaaaaggc catcgtggac 3300
ctgctgttca agaccaaccg gaaagtgacc gtgaagcagc tgaaagagga ctacttcaag 3360
aaaatcgagt gcttcgactc cgtggaaatc tccggcgtgg aagatcggtt caacgcctcc 3420
ctgggcacat accacgatct gctgaaaatt atcaaggaca aggacttcct ggacaatgag 3480
gaaaacgagg acattctgga agatatcgtg ctgaccctga cactgtttga ggacagagag 3540
atgatcgagg aacggctgaa aacctatgcc cacctgttcg acgacaaagt gatgaagcag 3600
ctgaagcggc ggagatacac cggctggggc aggctgagcc ggaagctgat caacggcatc 3660
cgggacaagc agtccggcaa gacaatcctg gatttcctga agtccgacgg cttcgccaac 3720
agaaacttca tgcagctgat ccacgacgac agcctgacct ttaaagagga catccagaaa 3780
gcccaggtgt ccggccaggg cgatagcctg cacgagcaca ttgccaatct ggccggcagc 3840
cccgccatta agaagggcat cctgcagaca gtgaaggtgg tggacgagct cgtgaaagtg 3900
atgggccggc acaagcccga gaacatcgtg atcgaaatgg ccagagagaa ccagaccacc 3960
cagaagggac agaagaacag ccgcgagaga atgaagcgga tcgaagaggg catcaaagag 4020
ctgggcagcc agatcctgaa agaacacccc gtggaaaaca cccagctgca gaacgagaag 4080
ctgtacctgt actacctgca gaatgggcgg gatatgtacg tggaccagga actggacatc 4140
aaccggctgt ccgactacga tgtggaccat atcgtgcctc agagctttct gaaggacgac 4200
tccatcgaca acaaggtgct gaccagaagc gacaagaacc ggggcaagag cgacaacgtg 4260
ccctccgaag aggtcgtgaa gaagatgaag aactactggc ggcagctgct gaacgccaag 4320
ctgattaccc agagaaagtt cgacaatctg accaaggccg agagaggcgg cctgagcgaa 4380
ctggataagg ccggcttcat caagagacag ctggtggaaa cccggcagat cacaaagcac 4440
gtggcacaga tcctggactc ccggatgaac actaagtacg acgagaatga caagctgatc 4500
cgggaagtga aagtgatcac cctgaagtcc aagctggtgt ccgatttccg gaaggatttc 4560
cagttttaca aagtgcgcga gatcaacaac taccaccacg cccacgacgc ctacctgaac 4620
gccgtcgtgg gaaccgccct gatcaaaaag taccctaagc tggaaagcga gttcgtgtac 4680
ggcgactaca aggtgtacga cgtgcggaag atgatcgcca agagcgagca ggaaatcggc 4740
aaggctaccg ccaagtactt cttctacagc aacatcatga actttttcaa gaccgagatt 4800
accctggcca acggcgagat ccggaagcgg cctctgatcg agacaaacgg cgaaaccggg 4860
gagatcgtgt gggataaggg ccgggatttt gccaccgtgc ggaaagtgct gagcatgccc 4920
caagtgaata tcgtgaaaaa gaccgaggtg cagacaggcg gcttcagcaa agagtctatc 4980
cggcccaaga ggaacagcga taagctgatc gccagaaaga aggactggga ccctaagaag 5040
tacggcggct tcgtgagccc caccgtggcc tattctgtgc tggtggtggc caaagtggaa 5100
aagggcaagt ccaagaaact gaagagtgtg aaagagctgc tggggatcac catcatggaa 5160
agaagcagct tcgagaagaa tcccatcgac tttctggaag ccaagggcta caaagaagtg 5220
aaaaaggacc tgatcatcaa gctgcctaag tactccctgt tcgagctgga aaacggccgg 5280
aagagaatgc tggcctctgc cagattcctg cagaagggaa acgaactggc cctgccctcc 5340
aaatatgtga acttcctgta cctggccagc cactatgaga agctgaaggg ctcccccgag 5400
gataatgagc agaaacagct gtttgtggaa cagcacaagc actacctgga cgagatcatc 5460
gagcagatca gcgagttctc caagagagtg atcctggccg acgctaatct ggacaaagtg 5520
ctgtccgcct acaacaagca ccgggataag cccatcagag agcaggccga gaatatcatc 5580
cacctgttta ccctgaccaa tctgggagcc cctcgggcct tcaagtactt tgacaccacc 5640
atcgaccgga aggtgtaccg gagcaccaaa gaggtgctgg acgccaccct gatccaccag 5700
agcatcaccg gcctgtacga gacacggatc gacctgtctc agctgggagg tgactctggc 5760
ggctcaaaaa gaaccgccga cggcagcgaa ttcagcacag ggagcatggg aatggactcc 5820
ctggccgagt ctcggtggcc tccgggcctg gcagtcatga agacaataga tgatttgctg 5880
cggtgtggaa tttgcttcga gtatttcaac attgcaatga taatacctca gtgttcacat 5940
aactactgct ctctctgtat aagaaaattt ctgtcctata aaactcagtg tccaacttgc 6000
tgtgtgactg tcacagagcc ggatctgaaa aataaccgca tattagatga actggtaaaa 6060
agcttgaatt ttgcacggaa tcatctgctg cagtttgctt tagagtcacc agccaaatct 6120
cctgcttctt cctcttcaaa gaatcttgct gtcaaagtat atactcctgt agcctccaga 6180
cagtctttaa agcaggggag caggttaatg gataatttct tgatcagaga aatgagtggt 6240
tctacatcag agttgttgat aaaagaaaat aaaagcaaat tcagccctca aaaagaggcg 6300
agccctgctg caaagaccaa agagacacgt tctgtagaag agatcgctcc agatccctca 6360
gaggctaagc gtcctgagcc accctcgaca tccactttga aacaagttac taaagtggat 6420
tgtcctgttt gcggggttaa cattccagaa agtcacatta ataagcattt agacagctgt 6480
ttatcacgcg aagagaagaa ggaaagcctc agaagttctg ttcacaaaag gaagccgcac 6540
atgtacaatg cccaatgcga tgctttgcat cctaaatcag ctgctgaaat agttcgagaa 6600
atcgaaaata tagagaagac taggatgcgt cttgaagcta gtaaactcaa tgaaagtgta 6660
atggttttta caaaggacca aacagaaaag gaaatagatg aaatccacag taaatatcgt 6720
aaaaaacata agagtgaatt tcagcttctg gtggatcagg ctagaaaagg atacaagaaa 6780
attgctggaa tgtcacaaaa aacagtaaca ataacaaaag aagatgaatc tacagaaaag 6840
ctatcttctg tatgcatggg acaggaagat aatatgacct cagtaacaaa ccacttttct 6900
caatcaaagc tggactcccc agaggaattg gaacctgaca gagaagagga ttcttctagc 6960
tgtattgata ttcaagaagt tctttcttca tcagaatcag attcatgcaa tagttccagt 7020
tcagacatca taagagatct tttagaagaa gaggaagcct gggaagcatc acataaaaac 7080
gatcttcaag acacagaaat aagtccaaga cagaatcgcc gcacaagagc cgctgaaagt 7140
gctgagattg aaccaagaaa caagcgtaat aggaatgaaa aaagaaccgc cgacggcagc 7200
gagttcgagc ccaagaagaa gaggaaagtc caaccggtca tcatcaccat caccattgag 7260
tttaaacccg ctgatcagcc tcgactgtgc cttctagttg ccagccatct gttgtttgcc 7320
cctcccccgt gccttccttg accctggaag gtgccactcc cactgtcctt tcctaataaa 7380
atgaggaaat tgcatcgcat tgtctgagta ggtgtcattc tattctgggg ggtggggtgg 7440
ggcaggacag caagggggag gattgggaag acaatagcag gcatgctggg gatgcggtgg 7500
gctctatggc ttctgaggcg gaaagaacca gctggggctc gataccgtcg acctctagct 7560
agagcttggc gtaatcatgg tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa 7620
ttccacacaa catacgagcc ggaagcataa agtgtaaagc ctagggtgcc taatgagtga 7680
gctaactcac attaattgcg ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt 7740
gccagctgca ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt attgggcgct 7800
cttccgcttc ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat 7860
cagctcactc aaaggcggta atacggttat ccacagaatc aggggataac gcaggaaaga 7920
acatgtgagc aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt 7980
ttttccatag gctccgcccc cctgacgagc atcacaaaaa tcgacgctca agtcagaggt 8040
ggcgaaaccc gacaggacta taaagatacc aggcgtttcc ccctggaagc tccctcgtgc 8100
gctctcctgt tccgaccctg ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa 8160
gcgtggcgct ttctcatagc tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct 8220
ccaagctggg ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc ttatccggta 8280
actatcgtct tgagtccaac ccggtaagac acgacttatc gccactggca gcagccactg 8340
gtaacaggat tagcagagcg aggtatgtag gcggtgctac agagttcttg aagtggtggc 8400
ctaactacgg ctacactaga agaacagtat ttggtatctg cgctctgctg aagccagtta 8460
ccttcggaaa aagagttggt agctcttgat ccggcaaaca aaccaccgct ggtagcggtg 8520
gtttttttgt ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa gaagatcctt 8580
tgatcttttc tacggggtct gacactcagt ggaacgaaaa ctcacgttaa gggattttgg 8640
tcatgagatt atcaaaaagg atcttcacct agatcctttt aaattaaaaa tgaagtttta 8700
aatcaatcta aagtatatat gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg 8760
aggcacctat ctcagcgatc tgtctatttc gttcatccat agttgcctga ctccccgtcg 8820
tgtagataac tacgatacgg gagggcttac catctggccc cagtgctgca atgataccgc 8880
gagacccacg ctcaccggct ccagatttat cagcaataaa ccagccagcc ggaagggccg 8940
agcgcagaag tggtcctgca actttatccg cctccatcca gtctattaat tgttgccggg 9000
aagctagagt aagtagttcg ccagttaata gtttgcgcaa cgttgttgcc attgctacag 9060
gcatcgtggt gtcacgctcg tcgtttggta tggcttcatt cagctccggt tcccaacgat 9120
caaggcgagt tacatgatcc cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc 9180
cgatcgttgt cagaagtaag ttggccgcag tgttatcact catggttatg gcagcactgc 9240
ataattctct tactgtcatg ccatccgtaa gatgcttttc tgtgactggt gagtactcaa 9300
ccaagtcatt ctgagaatag tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac 9360
gggataatac cgcgccacat agcagaactt taaaagtgct catcattgga aaacgttctt 9420
cggggcgaaa actctcaagg atcttaccgc tgttgagatc cagttcgatg taacccactc 9480
gtgcacccaa ctgatcttca gcatctttta ctttcaccag cgtttctggg tgagcaaaaa 9540
caggaaggca aaatgccgca aaaaagggaa taagggcgac acggaaatgt tgaatactca 9600
tactcttcct ttttcaatat tattgaagca tttatcaggg ttattgtctc atgagcggat 9660
acatatttga atgtatttag aaaaataaac aaataggggt tccgcgcaca tttccccgaa 9720
aagtgccacc tgacgtcgac ggatcgggag atcgatctcc cgatccccta gggtcgactc 9780
tcagtacaat ctgctctgat gccgcatagt taagccagta tctgctccct gcttgtgtgt 9840
tggaggtcgc tgagtagtgc gcgagcaaaa tttaagctac aacaaggcaa ggcttgaccg 9900
acaattgcat gaagaatctg cttagggtta ggcgttttgc gctgcttcgc gatgtacggg 9960
ccagatatac gcgttgacat tgattattga ctagttatta atagtaatca attacggggt 10020
cattagttca tagcccatat atggagttcc gcgttacata acttacggta aatggcccgc 10080
ctggctgacc gcccaacgac ccccgcccat tgacgtcaat aatgacgtat gttcccatag 10140
taacgccaat agggactttc cattgacgtc aatgggtgga gtatttacgg taaactgccc 10200
acttggcagt acatcaagtg tatc 10224
<210> 2
<211> 8811
<212> DNA
<213> Artificial sequence
<400> 2
atatgccaag tacgccccct attgacgtca atgacggtaa atggcccgcc tggcattatg 60
cccagtacat gaccttatgg gactttccta cttggcagta catctacgta ttagtcatcg 120
ctattaccat ggtgatgcgg ttttggcagt acatcaatgg gcgtggatag cggtttgact 180
cacggggatt tccaagtctc caccccattg acgtcaatgg gagtttgttt tggcaccaaa 240
atcaacggga ctttccaaaa tgtcgtaaca actccgcccc attgacgcaa atgggcggta 300
ggcgtgtacg gtgggaggtc tatataagca gagctggttt agtgaaccgt cagatccgct 360
agagatccgc ggccgctaat acgactcact atagggagag ccgccaccat gaaacggaca 420
gccgacggaa gcgagttcga gtcaccaaag aagaagcgga aagtctctga agtcgagttt 480
agccacgagt attggatgag gcacgcactg accctggcaa agcgagcatg ggatgaaaga 540
gaagtccccg tgggcgccgt gctggtgcac aacaatagag tgatcggaga gggatggaac 600
aggccaatcg gccgccacga ccctaccgca cacgcagaga tcatggcact gaggcaggga 660
ggcctggtca tgcagaatta ccgcctgatc gatgccaccc tgtatgtgac actggagcca 720
tgcgtgatgt gcgcaggagc aatgatccac agcaggatcg gaagagtggt gttcggagca 780
cgggacgcca agaccggcgc agcaggctcc ctgatggatg tgctgcacca ccccggcatg 840
aaccaccggg tggagatcac agagggaatc ctggcagacg agtgcgccgc cctgctgagc 900
gatttcttta gaatgcggag acaggagatc aaggcccaga agaaggcaca gagctccacc 960
gactctggag gatctagcgg aggatcctct ggaagcgaga caccaggcac aagcgagtcc 1020
gccacaccag agagctccgg cggctcctcc ggaggatcct ctgaggtgga gttttcccac 1080
gagtactgga tgagacatgc cctgaccctg gccaagaggg cacgcgatga gagggaggtg 1140
cctgtgggag ccgtgctggt gctgaacaat agagtgatcg gcgagggctg gaacagagcc 1200
atcggcctgc acgacccaac agcccatgcc gaaattatgg ccctgagaca gggcggcctg 1260
gtcatgcaga actacagact gattgacgcc accctgtacg tgacattcga gccttgcgtg 1320
atgtgcgccg gcgccatgat ccactctagg atcggccgcg tggtgtttgg cgtgaggaac 1380
gcaaaaaccg gcgccgcagg ctccctgatg gacgtgctgc actaccccgg catgaatcac 1440
cgcgtcgaaa ttaccgaggg aatcctggca gatgaatgtg ccgccctgct gtgctatttc 1500
tttcggatgc ctagacaggt gttcaatgct cagaagaagg cccagagctc caccgactcc 1560
ggaggatcta gcggaggctc ctctggctct gagacacctg gcacaagcga gagcgcaaca 1620
cctgaaagca gcgggggcag cagcgggggg tcagacaaga agtacagcat cggcctggcc 1680
atcggcacca actctgtggg ctgggccgtg atcaccgacg agtacaaggt gcccagcaag 1740
aaattcaagg tgctgggcaa caccgaccgg cacagcatca agaagaacct gatcggagcc 1800
ctgctgttcg acagcggcga aacagccgag gccacccggc tgaagagaac cgccagaaga 1860
agatacacca gacggaagaa ccggatctgc tatctgcaag agatcttcag caacgagatg 1920
gccaaggtgg acgacagctt cttccacaga ctggaagagt ccttcctggt ggaagaggat 1980
aagaagcacg agcggcaccc catcttcggc aacatcgtgg acgaggtggc ctaccacgag 2040
aagtacccca ccatctacca cctgagaaag aaactggtgg acagcaccga caaggccgac 2100
ctgcggctga tctatctggc cctggcccac atgatcaagt tccggggcca cttcctgatc 2160
gagggcgacc tgaaccccga caacagcgac gtggacaagc tgttcatcca gctggtgcag 2220
acctacaacc agctgttcga ggaaaacccc atcaacgcca gcggcgtgga cgccaaggcc 2280
atcctgtctg ccagactgag caagagcaga cggctggaaa atctgatcgc ccagctgccc 2340
ggcgagaaga agaatggcct gttcggaaac ctgattgccc tgagcctggg cctgaccccc 2400
aacttcaaga gcaacttcga cctggccgag gatgccaaac tgcagctgag caaggacacc 2460
tacgacgacg acctggacaa cctgctggcc cagatcggcg accagtacgc cgacctgttt 2520
ctggccgcca agaacctgtc cgacgccatc ctgctgagcg acatcctgag agtgaacacc 2580
gagatcacca aggcccccct gagcgcctct atgatcaaga gatacgacga gcaccaccag 2640
gacctgaccc tgctgaaagc tctcgtgcgg cagcagctgc ctgagaagta caaagagatt 2700
ttcttcgacc agagcaagaa cggctacgcc ggctacattg acggcggagc cagccaggaa 2760
gagttctaca agttcatcaa gcccatcctg gaaaagatgg acggcaccga ggaactgctc 2820
gtgaagctga acagagagga cctgctgcgg aagcagcgga ccttcgacaa cggcagcatc 2880
ccccaccaga tccacctggg agagctgcac gccattctgc ggcggcagga agatttttac 2940
ccattcctga aggacaaccg ggaaaagatc gagaagatcc tgaccttccg catcccctac 3000
tacgtgggcc ctctggccag gggaaacagc agattcgcct ggatgaccag aaagagcgag 3060
gaaaccatca ccccctggaa cttcgaggaa gtggtggaca agggcgcttc cgcccagagc 3120
ttcatcgagc ggatgaccaa cttcgataag aacctgccca acgagaaggt gctgcccaag 3180
cacagcctgc tgtacgagta cttcaccgtg tataacgagc tgaccaaagt gaaatacgtg 3240
accgagggaa tgagaaagcc cgccttcctg agcggcgagc agaaaaaggc catcgtggac 3300
ctgctgttca agaccaaccg gaaagtgacc gtgaagcagc tgaaagagga ctacttcaag 3360
aaaatcgagt gcttcgactc cgtggaaatc tccggcgtgg aagatcggtt caacgcctcc 3420
ctgggcacat accacgatct gctgaaaatt atcaaggaca aggacttcct ggacaatgag 3480
gaaaacgagg acattctgga agatatcgtg ctgaccctga cactgtttga ggacagagag 3540
atgatcgagg aacggctgaa aacctatgcc cacctgttcg acgacaaagt gatgaagcag 3600
ctgaagcggc ggagatacac cggctggggc aggctgagcc ggaagctgat caacggcatc 3660
cgggacaagc agtccggcaa gacaatcctg gatttcctga agtccgacgg cttcgccaac 3720
agaaacttca tgcagctgat ccacgacgac agcctgacct ttaaagagga catccagaaa 3780
gcccaggtgt ccggccaggg cgatagcctg cacgagcaca ttgccaatct ggccggcagc 3840
cccgccatta agaagggcat cctgcagaca gtgaaggtgg tggacgagct cgtgaaagtg 3900
atgggccggc acaagcccga gaacatcgtg atcgaaatgg ccagagagaa ccagaccacc 3960
cagaagggac agaagaacag ccgcgagaga atgaagcgga tcgaagaggg catcaaagag 4020
ctgggcagcc agatcctgaa agaacacccc gtggaaaaca cccagctgca gaacgagaag 4080
ctgtacctgt actacctgca gaatgggcgg gatatgtacg tggaccagga actggacatc 4140
aaccggctgt ccgactacga tgtggaccat atcgtgcctc agagctttct gaaggacgac 4200
tccatcgaca acaaggtgct gaccagaagc gacaagaacc ggggcaagag cgacaacgtg 4260
ccctccgaag aggtcgtgaa gaagatgaag aactactggc ggcagctgct gaacgccaag 4320
ctgattaccc agagaaagtt cgacaatctg accaaggccg agagaggcgg cctgagcgaa 4380
ctggataagg ccggcttcat caagagacag ctggtggaaa cccggcagat cacaaagcac 4440
gtggcacaga tcctggactc ccggatgaac actaagtacg acgagaatga caagctgatc 4500
cgggaagtga aagtgatcac cctgaagtcc aagctggtgt ccgatttccg gaaggatttc 4560
cagttttaca aagtgcgcga gatcaacaac taccaccacg cccacgacgc ctacctgaac 4620
gccgtcgtgg gaaccgccct gatcaaaaag taccctaagc tggaaagcga gttcgtgtac 4680
ggcgactaca aggtgtacga cgtgcggaag atgatcgcca agagcgagca ggaaatcggc 4740
aaggctaccg ccaagtactt cttctacagc aacatcatga actttttcaa gaccgagatt 4800
accctggcca acggcgagat ccggaagcgg cctctgatcg agacaaacgg cgaaaccggg 4860
gagatcgtgt gggataaggg ccgggatttt gccaccgtgc ggaaagtgct gagcatgccc 4920
caagtgaata tcgtgaaaaa gaccgaggtg cagacaggcg gcttcagcaa agagtctatc 4980
cggcccaaga ggaacagcga taagctgatc gccagaaaga aggactggga ccctaagaag 5040
tacggcggct tcgtgagccc caccgtggcc tattctgtgc tggtggtggc caaagtggaa 5100
aagggcaagt ccaagaaact gaagagtgtg aaagagctgc tggggatcac catcatggaa 5160
agaagcagct tcgagaagaa tcccatcgac tttctggaag ccaagggcta caaagaagtg 5220
aaaaaggacc tgatcatcaa gctgcctaag tactccctgt tcgagctgga aaacggccgg 5280
aagagaatgc tggcctctgc cagattcctg cagaagggaa acgaactggc cctgccctcc 5340
aaatatgtga acttcctgta cctggccagc cactatgaga agctgaaggg ctcccccgag 5400
gataatgagc agaaacagct gtttgtggaa cagcacaagc actacctgga cgagatcatc 5460
gagcagatca gcgagttctc caagagagtg atcctggccg acgctaatct ggacaaagtg 5520
ctgtccgcct acaacaagca ccgggataag cccatcagag agcaggccga gaatatcatc 5580
cacctgttta ccctgaccaa tctgggagcc cctcgggcct tcaagtactt tgacaccacc 5640
atcgaccgga aggtgtaccg gagcaccaaa gaggtgctgg acgccaccct gatccaccag 5700
agcatcaccg gcctgtacga gacacggatc gacctgtctc agctgggagg tgactctggc 5760
ggctcaaaaa gaaccgccga cggcagcgaa ttcgagccca agaagaagag gaaagtctaa 5820
ccggtcatca tcaccatcac cattgagttt aaacccgctg atcagcctcg actgtgcctt 5880
ctagttgcca gccatctgtt gtttgcccct cccccgtgcc ttccttgacc ctggaaggtg 5940
ccactcccac tgtcctttcc taataaaatg aggaaattgc atcgcattgt ctgagtaggt 6000
gtcattctat tctggggggt ggggtggggc aggacagcaa gggggaggat tgggaagaca 6060
atagcaggca tgctggggat gcggtgggct ctatggcttc tgaggcggaa agaaccagct 6120
ggggctcgat accgtcgacc tctagctaga gcttggcgta atcatggtca tagctgtttc 6180
ctgtgtgaaa ttgttatccg ctcacaattc cacacaacat acgagccgga agcataaagt 6240
gtaaagccta gggtgcctaa tgagtgagct aactcacatt aattgcgttg cgctcactgc 6300
ccgctttcca gtcgggaaac ctgtcgtgcc agctgcatta atgaatcggc caacgcgcgg 6360
ggagaggcgg tttgcgtatt gggcgctctt ccgcttcctc gctcactgac tcgctgcgct 6420
cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 6480
cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 6540
accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 6600
acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 6660
cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 6720
acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 6780
atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 6840
agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 6900
acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 6960
gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga acagtatttg 7020
gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 7080
gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 7140
gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac actcagtgga 7200
acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc ttcacctaga 7260
tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag taaacttggt 7320
ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt 7380
catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat 7440
ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctcca gatttatcag 7500
caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct 7560
ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt 7620
tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg 7680
cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca 7740
aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt 7800
tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat 7860
gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac 7920
cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc agaactttaa 7980
aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt 8040
tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt 8100
tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa 8160
gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt 8220
atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa 8280
taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtcgacgga tcgggagatc 8340
gatctcccga tcccctaggg tcgactctca gtacaatctg ctctgatgcc gcatagttaa 8400
gccagtatct gctccctgct tgtgtgttgg aggtcgctga gtagtgcgcg agcaaaattt 8460
aagctacaac aaggcaaggc ttgaccgaca attgcatgaa gaatctgctt agggttaggc 8520
gttttgcgct gcttcgcgat gtacgggcca gatatacgcg ttgacattga ttattgacta 8580
gttattaata gtaatcaatt acggggtcat tagttcatag cccatatatg gagttccgcg 8640
ttacataact tacggtaaat ggcccgcctg gctgaccgcc caacgacccc cgcccattga 8700
cgtcaataat gacgtatgtt cccatagtaa cgccaatagg gactttccat tgacgtcaat 8760
gggtggagta tttacggtaa actgcccact tggcagtaca tcaagtgtat c 8811
<210> 3
<211> 21
<212> DNA
<213> Artificial sequence
<400> 3
atggactccc tggccgagtc t 21
<210> 4
<211> 20
<212> DNA
<213> Artificial sequence
<400> 4
cggcttcctt ttgtgaacag 20
<210> 5
<211> 42
<212> DNA
<213> Artificial sequence
<400> 5
ctgttcacaa aaggaagccg cacatgtaca atgcccaatg cg 42
<210> 6
<211> 24
<212> DNA
<213> Artificial sequence
<400> 6
attcctatta cgcttgtttc ttgg 24
<210> 7
<211> 45
<212> DNA
<213> Artificial sequence
<400> 7
gaattcagca cagggagcat gggaatggac tccctggccg agtct 45
<210> 8
<211> 86
<212> DNA
<213> Artificial sequence
<400> 8
accggttgga ctttcctctt cttcttgggc tcgaactcgc tgccgtcggc ggttcttttt 60
tcattcctat tacgcttgtt tcttgg 86
<210> 9
<211> 1451
<212> DNA
<213> Artificial sequence
<400> 9
gaattcagca cagggagcat gggaatggac tccctggccg agtctcggtg gcctccgggc 60
ctggcagtca tgaagacaat agatgatttg ctgcggtgtg gaatttgctt cgagtatttc 120
aacattgcaa tgataatacc tcagtgttca cataactact gctctctctg tataagaaaa 180
tttctgtcct ataaaactca gtgtccaact tgctgtgtga ctgtcacaga gccggatctg 240
aaaaataacc gcatattaga tgaactggta aaaagcttga attttgcacg gaatcatctg 300
ctgcagtttg ctttagagtc accagccaaa tctcctgctt cttcctcttc aaagaatctt 360
gctgtcaaag tatatactcc tgtagcctcc agacagtctt taaagcaggg gagcaggtta 420
atggataatt tcttgatcag agaaatgagt ggttctacat cagagttgtt gataaaagaa 480
aataaaagca aattcagccc tcaaaaagag gcgagccctg ctgcaaagac caaagagaca 540
cgttctgtag aagagatcgc tccagatccc tcagaggcta agcgtcctga gccaccctcg 600
acatccactt tgaaacaagt tactaaagtg gattgtcctg tttgcggggt taacattcca 660
gaaagtcaca ttaataagca tttagacagc tgtttatcac gcgaagagaa gaaggaaagc 720
ctcagaagtt ctgttcacaa aaggaagccg cacatgtaca atgcccaatg cgatgctttg 780
catcctaaat cagctgctga aatagttcga gaaatcgaaa atatagagaa gactaggatg 840
cgtcttgaag ctagtaaact caatgaaagt gtaatggttt ttacaaagga ccaaacagaa 900
aaggaaatag atgaaatcca cagtaaatat cgtaaaaaac ataagagtga atttcagctt 960
ctggtggatc aggctagaaa aggatacaag aaaattgctg gaatgtcaca aaaaacagta 1020
acaataacaa aagaagatga atctacagaa aagctatctt ctgtatgcat gggacaggaa 1080
gataatatga cctcagtaac aaaccacttt tctcaatcaa agctggactc cccagaggaa 1140
ttggaacctg acagagaaga ggattcttct agctgtattg atattcaaga agttctttct 1200
tcatcagaat cagattcatg caatagttcc agttcagaca tcataagaga tcttttagaa 1260
gaagaggaag cctgggaagc atcacataaa aacgatcttc aagacacaga aataagtcca 1320
agacagaatc gccgcacaag agccgctgaa agtgctgaga ttgaaccaag aaacaagcgt 1380
aataggaatg aaaaaagaac cgccgacggc agcgagttcg agcccaagaa gaagaggaaa 1440
gtccaaccgg t 1451
<210> 10
<211> 9630
<212> DNA
<213> Artificial sequence
<400> 10
atatgccaag tacgccccct attgacgtca atgacggtaa atggcccgcc tggcattatg 60
cccagtacat gaccttatgg gactttccta cttggcagta catctacgta ttagtcatcg 120
ctattaccat ggtgatgcgg ttttggcagt acatcaatgg gcgtggatag cggtttgact 180
cacggggatt tccaagtctc caccccattg acgtcaatgg gagtttgttt tggcaccaaa 240
atcaacggga ctttccaaaa tgtcgtaaca actccgcccc attgacgcaa atgggcggta 300
ggcgtgtacg gtgggaggtc tatataagca gagctggttt agtgaaccgt cagatccgct 360
agagatccgc ggccgctaat acgactcact atagggagag ccgccaccat gaaacggaca 420
gccgacggaa gcgagttcga gtcaccaaag aagaagcgga aagtctctga ggtggagttt 480
tcccacgagt actggatgag acatgccctg accctggcca agagggcacg ggatgagagg 540
gaggtgcctg tgggagccgt gctggtgctg aacaatagag tgatcggcga gggctggaac 600
agagccatcg gcctgcacga cccaacagcc catgccgaaa ttatggccct gagacagggc 660
ggcctggtca tgcagaacta cagactgatt gacgccaccc tgtacgtgac attcgagcct 720
tgcgtgatgt gcgccggcgc catgatccac tctaggatcg gccgcgtggt gtttggcgtg 780
aggaactcaa aaagaggcgc cgcaggctcc ctgatgaacg tgctgaacta ccccggcatg 840
aatcaccgcg tcgaaattac cgagggaatc ctggcagatg aatgtgccgc cctgctgtgc 900
gatttctatc ggatgcctag acaggtgttc aatgctcaga agaaggccca gagctccatc 960
aactccggag gatctagcgg aggctcctct ggctctgaga cacctggcac aagcgagagc 1020
gcaacacctg aaagcagcgg gggcagcagc ggggggtcag acaagaagta cagcatcggc 1080
ctggccatcg gcaccaactc tgtgggctgg gccgtgatca ccgacgagta caaggtgccc 1140
agcaagaaat tcaaggtgct gggcaacacc gaccggcaca gcatcaagaa gaacctgatc 1200
ggagccctgc tgttcgacag cggcgaaaca gccgaggcca cccggctgaa gagaaccgcc 1260
agaagaagat acaccagacg gaagaaccgg atctgctatc tgcaagagat cttcagcaac 1320
gagatggcca aggtggacga cagcttcttc cacagactgg aagagtcctt cctggtggaa 1380
gaggataaga agcacgagcg gcaccccatc ttcggcaaca tcgtggacga ggtggcctac 1440
cacgagaagt accccaccat ctaccacctg agaaagaaac tggtggacag caccgacaag 1500
gccgacctgc ggctgatcta tctggccctg gcccacatga tcaagttccg gggccacttc 1560
ctgatcgagg gcgacctgaa ccccgacaac agcgacgtgg acaagctgtt catccagctg 1620
gtgcagacct acaaccagct gttcgaggaa aaccccatca acgccagcgg cgtggacgcc 1680
aaggccatcc tgtctgccag actgagcaag agcagacggc tggaaaatct gatcgcccag 1740
ctgcccggcg agaagaagaa tggcctgttc ggaaacctga ttgccctgag cctgggcctg 1800
acccccaact tcaagagcaa cttcgacctg gccgaggatg ccaaactgca gctgagcaag 1860
gacacctacg acgacgacct ggacaacctg ctggcccaga tcggcgacca gtacgccgac 1920
ctgtttctgg ccgccaagaa cctgtccgac gccatcctgc tgagcgacat cctgagagtg 1980
aacaccgaga tcaccaaggc ccccctgagc gcctctatga tcaagagata cgacgagcac 2040
caccaggacc tgaccctgct gaaagctctc gtgcggcagc agctgcctga gaagtacaaa 2100
gagattttct tcgaccagag caagaacggc tacgccggct acattgacgg cggagccagc 2160
caggaagagt tctacaagtt catcaagccc atcctggaaa agatggacgg caccgaggaa 2220
ctgctcgtga agctgaacag agaggacctg ctgcggaagc agcggacctt cgacaacggc 2280
agcatccccc accagatcca cctgggagag ctgcacgcca ttctgcggcg gcaggaagat 2340
ttttacccat tcctgaagga caaccgggaa aagatcgaga agatcctgac cttccgcatc 2400
ccctactacg tgggccctct ggccagggga aacagcagat tcgcctggat gaccagaaag 2460
agcgaggaaa ccatcacccc ctggaacttc gaggaagtgg tggacaaggg cgcttccgcc 2520
cagagcttca tcgagcggat gaccaacttc gataagaacc tgcccaacga gaaggtgctg 2580
cccaagcaca gcctgctgta cgagtacttc accgtgtata acgagctgac caaagtgaaa 2640
tacgtgaccg agggaatgag aaagcccgcc ttcctgagcg gcgagcagaa aaaggccatc 2700
gtggacctgc tgttcaagac caaccggaaa gtgaccgtga agcagctgaa agaggactac 2760
ttcaagaaaa tcgagtgctt cgactccgtg gaaatctccg gcgtggaaga tcggttcaac 2820
gcctccctgg gcacatacca cgatctgctg aaaattatca aggacaagga cttcctggac 2880
aatgaggaaa acgaggacat tctggaagat atcgtgctga ccctgacact gtttgaggac 2940
agagagatga tcgaggaacg gctgaaaacc tatgcccacc tgttcgacga caaagtgatg 3000
aagcagctga agcggcggag atacaccggc tggggcaggc tgagccggaa gctgatcaac 3060
ggcatccggg acaagcagtc cggcaagaca atcctggatt tcctgaagtc cgacggcttc 3120
gccaacagaa acttcatgca gctgatccac gacgacagcc tgacctttaa agaggacatc 3180
cagaaagccc aggtgtccgg ccagggcgat agcctgcacg agcacattgc caatctggcc 3240
ggcagccccg ccattaagaa gggcatcctg cagacagtga aggtggtgga cgagctcgtg 3300
aaagtgatgg gccggcacaa gcccgagaac atcgtgatcg aaatggccag agagaaccag 3360
accacccaga agggacagaa gaacagccgc gagagaatga agcggatcga agagggcatc 3420
aaagagctgg gcagccagat cctgaaagaa caccccgtgg aaaacaccca gctgcagaac 3480
gagaagctgt acctgtacta cctgcagaat gggcgggata tgtacgtgga ccaggaactg 3540
gacatcaacc ggctgtccga ctacgatgtg gaccatatcg tgcctcagag ctttctgaag 3600
gacgactcca tcgacaacaa ggtgctgacc agaagcgaca agaaccgggg caagagcgac 3660
aacgtgccct ccgaagaggt cgtgaagaag atgaagaact actggcggca gctgctgaac 3720
gccaagctga ttacccagag aaagttcgac aatctgacca aggccgagag aggcggcctg 3780
agcgaactgg ataaggccgg cttcatcaag agacagctgg tggaaacccg gcagatcaca 3840
aagcacgtgg cacagatcct ggactcccgg atgaacacta agtacgacga gaatgacaag 3900
ctgatccggg aagtgaaagt gatcaccctg aagtccaagc tggtgtccga tttccggaag 3960
gatttccagt tttacaaagt gcgcgagatc aacaactacc accacgccca cgacgcctac 4020
ctgaacgccg tcgtgggaac cgccctgatc aaaaagtacc ctaagctgga aagcgagttc 4080
gtgtacggcg actacaaggt gtacgacgtg cggaagatga tcgccaagag cgagcaggaa 4140
atcggcaagg ctaccgccaa gtacttcttc tacagcaaca tcatgaactt tttcaagacc 4200
gagattaccc tggccaacgg cgagatccgg aagcggcctc tgatcgagac aaacggcgaa 4260
accggggaga tcgtgtggga taagggccgg gattttgcca ccgtgcggaa agtgctgagc 4320
atgccccaag tgaatatcgt gaaaaagacc gaggtgcaga caggcggctt cagcaaagag 4380
tctatcctgc ccaagaggaa cagcgataag ctgatcgcca gaaagaagga ctgggaccct 4440
aagaagtacg gcggcttcga cagccccacc gtggcctatt ctgtgctggt ggtggccaaa 4500
gtggaaaagg gcaagtccaa gaaactgaag agtgtgaaag agctgctggg gatcaccatc 4560
atggaaagaa gcagcttcga gaagaatccc atcgactttc tggaagccaa gggctacaaa 4620
gaagtgaaaa aggacctgat catcaagctg cctaagtact ccctgttcga gctggaaaac 4680
ggccggaaga gaatgctggc ctctgccggc gaactgcaga agggaaacga actggccctg 4740
ccctccaaat atgtgaactt cctgtacctg gccagccact atgagaagct gaagggctcc 4800
cccgaggata atgagcagaa acagctgttt gtggaacagc acaagcacta cctggacgag 4860
atcatcgagc agatcagcga gttctccaag agagtgatcc tggccgacgc taatctggac 4920
aaagtgctgt ccgcctacaa caagcaccgg gataagccca tcagagagca ggccgagaat 4980
atcatccacc tgtttaccct gaccaatctg ggagcccctg ccgccttcaa gtactttgac 5040
accaccatcg accggaagag gtacaccagc accaaagagg tgctggacgc caccctgatc 5100
caccagagca tcaccggcct gtacgagaca cggatcgacc tgtctcagct gggaggtgac 5160
tctggcggct caaaaagaac cgccgacggc agcgaattca gcacagggag catgggaatg 5220
gactccctgg ccgagtctcg gtggcctccg ggcctggcag tcatgaagac aatagatgat 5280
ttgctgcggt gtggaatttg cttcgagtat ttcaacattg caatgataat acctcagtgt 5340
tcacataact actgctctct ctgtataaga aaatttctgt cctataaaac tcagtgtcca 5400
acttgctgtg tgactgtcac agagccggat ctgaaaaata accgcatatt agatgaactg 5460
gtaaaaagct tgaattttgc acggaatcat ctgctgcagt ttgctttaga gtcaccagcc 5520
aaatctcctg cttcttcctc ttcaaagaat cttgctgtca aagtatatac tcctgtagcc 5580
tccagacagt ctttaaagca ggggagcagg ttaatggata atttcttgat cagagaaatg 5640
agtggttcta catcagagtt gttgataaaa gaaaataaaa gcaaattcag ccctcaaaaa 5700
gaggcgagcc ctgctgcaaa gaccaaagag acacgttctg tagaagagat cgctccagat 5760
ccctcagagg ctaagcgtcc tgagccaccc tcgacatcca ctttgaaaca agttactaaa 5820
gtggattgtc ctgtttgcgg ggttaacatt ccagaaagtc acattaataa gcatttagac 5880
agctgtttat cacgcgaaga gaagaaggaa agcctcagaa gttctgttca caaaaggaag 5940
ccgcacatgt acaatgccca atgcgatgct ttgcatccta aatcagctgc tgaaatagtt 6000
cgagaaatcg aaaatataga gaagactagg atgcgtcttg aagctagtaa actcaatgaa 6060
agtgtaatgg tttttacaaa ggaccaaaca gaaaaggaaa tagatgaaat ccacagtaaa 6120
tatcgtaaaa aacataagag tgaatttcag cttctggtgg atcaggctag aaaaggatac 6180
aagaaaattg ctggaatgtc acaaaaaaca gtaacaataa caaaagaaga tgaatctaca 6240
gaaaagctat cttctgtatg catgggacag gaagataata tgacctcagt aacaaaccac 6300
ttttctcaat caaagctgga ctccccagag gaattggaac ctgacagaga agaggattct 6360
tctagctgta ttgatattca agaagttctt tcttcatcag aatcagattc atgcaatagt 6420
tccagttcag acatcataag agatctttta gaagaagagg aagcctggga agcatcacat 6480
aaaaacgatc ttcaagacac agaaataagt ccaagacaga atcgccgcac aagagccgct 6540
gaaagtgctg agattgaacc aagaaacaag cgtaatagga atgaaaaaag aaccgccgac 6600
ggcagcgagt tcgagcccaa gaagaagagg aaagtccaac cggtcatcat caccatcacc 6660
attgagttta aacccgctga tcagcctcga ctgtgccttc tagttgccag ccatctgttg 6720
tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact gtcctttcct 6780
aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt ctggggggtg 6840
gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat gctggggatg 6900
cggtgggctc tatggcttct gaggcggaaa gaaccagctg gggctcgata ccgtcgacct 6960
ctagctagag cttggcgtaa tcatggtcat agctgtttcc tgtgtgaaat tgttatccgc 7020
tcacaattcc acacaacata cgagccggaa gcataaagtg taaagcctag ggtgcctaat 7080
gagtgagcta actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc 7140
tgtcgtgcca gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg 7200
ggcgctcttc cgcttcctcg ctcactgact cgctgcgctc ggtcgttcgg ctgcggcgag 7260
cggtatcagc tcactcaaag gcggtaatac ggttatccac agaatcaggg gataacgcag 7320
gaaagaacat gtgagcaaaa ggccagcaaa aggccaggaa ccgtaaaaag gccgcgttgc 7380
tggcgttttt ccataggctc cgcccccctg acgagcatca caaaaatcga cgctcaagtc 7440
agaggtggcg aaacccgaca ggactataaa gataccaggc gtttccccct ggaagctccc 7500
tcgtgcgctc tcctgttccg accctgccgc ttaccggata cctgtccgcc tttctccctt 7560
cgggaagcgt ggcgctttct catagctcac gctgtaggta tctcagttcg gtgtaggtcg 7620
ttcgctccaa gctgggctgt gtgcacgaac cccccgttca gcccgaccgc tgcgccttat 7680
ccggtaacta tcgtcttgag tccaacccgg taagacacga cttatcgcca ctggcagcag 7740
ccactggtaa caggattagc agagcgaggt atgtaggcgg tgctacagag ttcttgaagt 7800
ggtggcctaa ctacggctac actagaagaa cagtatttgg tatctgcgct ctgctgaagc 7860
cagttacctt cggaaaaaga gttggtagct cttgatccgg caaacaaacc accgctggta 7920
gcggtggttt ttttgtttgc aagcagcaga ttacgcgcag aaaaaaagga tctcaagaag 7980
atcctttgat cttttctacg gggtctgaca ctcagtggaa cgaaaactca cgttaaggga 8040
ttttggtcat gagattatca aaaaggatct tcacctagat ccttttaaat taaaaatgaa 8100
gttttaaatc aatctaaagt atatatgagt aaacttggtc tgacagttac caatgcttaa 8160
tcagtgaggc acctatctca gcgatctgtc tatttcgttc atccatagtt gcctgactcc 8220
ccgtcgtgta gataactacg atacgggagg gcttaccatc tggccccagt gctgcaatga 8280
taccgcgaga cccacgctca ccggctccag atttatcagc aataaaccag ccagccggaa 8340
gggccgagcg cagaagtggt cctgcaactt tatccgcctc catccagtct attaattgtt 8400
gccgggaagc tagagtaagt agttcgccag ttaatagttt gcgcaacgtt gttgccattg 8460
ctacaggcat cgtggtgtca cgctcgtcgt ttggtatggc ttcattcagc tccggttccc 8520
aacgatcaag gcgagttaca tgatccccca tgttgtgcaa aaaagcggtt agctccttcg 8580
gtcctccgat cgttgtcaga agtaagttgg ccgcagtgtt atcactcatg gttatggcag 8640
cactgcataa ttctcttact gtcatgccat ccgtaagatg cttttctgtg actggtgagt 8700
actcaaccaa gtcattctga gaatagtgta tgcggcgacc gagttgctct tgcccggcgt 8760
caatacggga taataccgcg ccacatagca gaactttaaa agtgctcatc attggaaaac 8820
gttcttcggg gcgaaaactc tcaaggatct taccgctgtt gagatccagt tcgatgtaac 8880
ccactcgtgc acccaactga tcttcagcat cttttacttt caccagcgtt tctgggtgag 8940
caaaaacagg aaggcaaaat gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa 9000
tactcatact cttccttttt caatattatt gaagcattta tcagggttat tgtctcatga 9060
gcggatacat atttgaatgt atttagaaaa ataaacaaat aggggttccg cgcacatttc 9120
cccgaaaagt gccacctgac gtcgacggat cgggagatcg atctcccgat cccctagggt 9180
cgactctcag tacaatctgc tctgatgccg catagttaag ccagtatctg ctccctgctt 9240
gtgtgttgga ggtcgctgag tagtgcgcga gcaaaattta agctacaaca aggcaaggct 9300
tgaccgacaa ttgcatgaag aatctgctta gggttaggcg ttttgcgctg cttcgcgatg 9360
tacgggccag atatacgcgt tgacattgat tattgactag ttattaatag taatcaatta 9420
cggggtcatt agttcatagc ccatatattg agttccgcgt tacataactt acggtaaatg 9480
gcccgcctgg ctgaccgccc aacgaccccc gcccattgac gtcaataatg acgtatgttc 9540
ccatagtaac gccaataggg actttccatt gacgtcaatg ggtggagtat ttacggtaaa 9600
ctgcccactt ggcagtacat caagtgtatc 9630
<210> 11
<211> 8217
<212> DNA
<213> Artificial sequence
<400> 11
atatgccaag tacgccccct attgacgtca atgacggtaa atggcccgcc tggcattatg 60
cccagtacat gaccttatgg gactttccta cttggcagta catctacgta ttagtcatcg 120
ctattaccat ggtgatgcgg ttttggcagt acatcaatgg gcgtggatag cggtttgact 180
cacggggatt tccaagtctc caccccattg acgtcaatgg gagtttgttt tggcaccaaa 240
atcaacggga ctttccaaaa tgtcgtaaca actccgcccc attgacgcaa atgggcggta 300
ggcgtgtacg gtgggaggtc tatataagca gagctggttt agtgaaccgt cagatccgct 360
agagatccgc ggccgctaat acgactcact atagggagag ccgccaccat gaaacggaca 420
gccgacggaa gcgagttcga gtcaccaaag aagaagcgga aagtctctga ggtggagttt 480
tcccacgagt actggatgag acatgccctg accctggcca agagggcacg ggatgagagg 540
gaggtgcctg tgggagccgt gctggtgctg aacaatagag tgatcggcga gggctggaac 600
agagccatcg gcctgcacga cccaacagcc catgccgaaa ttatggccct gagacagggc 660
ggcctggtca tgcagaacta cagactgatt gacgccaccc tgtacgtgac attcgagcct 720
tgcgtgatgt gcgccggcgc catgatccac tctaggatcg gccgcgtggt gtttggcgtg 780
aggaactcaa aaagaggcgc cgcaggctcc ctgatgaacg tgctgaacta ccccggcatg 840
aatcaccgcg tcgaaattac cgagggaatc ctggcagatg aatgtgccgc cctgctgtgc 900
gatttctatc ggatgcctag acaggtgttc aatgctcaga agaaggccca gagctccatc 960
aactccggag gatctagcgg aggctcctct ggctctgaga cacctggcac aagcgagagc 1020
gcaacacctg aaagcagcgg gggcagcagc ggggggtcag acaagaagta cagcatcggc 1080
ctggccatcg gcaccaactc tgtgggctgg gccgtgatca ccgacgagta caaggtgccc 1140
agcaagaaat tcaaggtgct gggcaacacc gaccggcaca gcatcaagaa gaacctgatc 1200
ggagccctgc tgttcgacag cggcgaaaca gccgaggcca cccggctgaa gagaaccgcc 1260
agaagaagat acaccagacg gaagaaccgg atctgctatc tgcaagagat cttcagcaac 1320
gagatggcca aggtggacga cagcttcttc cacagactgg aagagtcctt cctggtggaa 1380
gaggataaga agcacgagcg gcaccccatc ttcggcaaca tcgtggacga ggtggcctac 1440
cacgagaagt accccaccat ctaccacctg agaaagaaac tggtggacag caccgacaag 1500
gccgacctgc ggctgatcta tctggccctg gcccacatga tcaagttccg gggccacttc 1560
ctgatcgagg gcgacctgaa ccccgacaac agcgacgtgg acaagctgtt catccagctg 1620
gtgcagacct acaaccagct gttcgaggaa aaccccatca acgccagcgg cgtggacgcc 1680
aaggccatcc tgtctgccag actgagcaag agcagacggc tggaaaatct gatcgcccag 1740
ctgcccggcg agaagaagaa tggcctgttc ggaaacctga ttgccctgag cctgggcctg 1800
acccccaact tcaagagcaa cttcgacctg gccgaggatg ccaaactgca gctgagcaag 1860
gacacctacg acgacgacct ggacaacctg ctggcccaga tcggcgacca gtacgccgac 1920
ctgtttctgg ccgccaagaa cctgtccgac gccatcctgc tgagcgacat cctgagagtg 1980
aacaccgaga tcaccaaggc ccccctgagc gcctctatga tcaagagata cgacgagcac 2040
caccaggacc tgaccctgct gaaagctctc gtgcggcagc agctgcctga gaagtacaaa 2100
gagattttct tcgaccagag caagaacggc tacgccggct acattgacgg cggagccagc 2160
caggaagagt tctacaagtt catcaagccc atcctggaaa agatggacgg caccgaggaa 2220
ctgctcgtga agctgaacag agaggacctg ctgcggaagc agcggacctt cgacaacggc 2280
agcatccccc accagatcca cctgggagag ctgcacgcca ttctgcggcg gcaggaagat 2340
ttttacccat tcctgaagga caaccgggaa aagatcgaga agatcctgac cttccgcatc 2400
ccctactacg tgggccctct ggccagggga aacagcagat tcgcctggat gaccagaaag 2460
agcgaggaaa ccatcacccc ctggaacttc gaggaagtgg tggacaaggg cgcttccgcc 2520
cagagcttca tcgagcggat gaccaacttc gataagaacc tgcccaacga gaaggtgctg 2580
cccaagcaca gcctgctgta cgagtacttc accgtgtata acgagctgac caaagtgaaa 2640
tacgtgaccg agggaatgag aaagcccgcc ttcctgagcg gcgagcagaa aaaggccatc 2700
gtggacctgc tgttcaagac caaccggaaa gtgaccgtga agcagctgaa agaggactac 2760
ttcaagaaaa tcgagtgctt cgactccgtg gaaatctccg gcgtggaaga tcggttcaac 2820
gcctccctgg gcacatacca cgatctgctg aaaattatca aggacaagga cttcctggac 2880
aatgaggaaa acgaggacat tctggaagat atcgtgctga ccctgacact gtttgaggac 2940
agagagatga tcgaggaacg gctgaaaacc tatgcccacc tgttcgacga caaagtgatg 3000
aagcagctga agcggcggag atacaccggc tggggcaggc tgagccggaa gctgatcaac 3060
ggcatccggg acaagcagtc cggcaagaca atcctggatt tcctgaagtc cgacggcttc 3120
gccaacagaa acttcatgca gctgatccac gacgacagcc tgacctttaa agaggacatc 3180
cagaaagccc aggtgtccgg ccagggcgat agcctgcacg agcacattgc caatctggcc 3240
ggcagccccg ccattaagaa gggcatcctg cagacagtga aggtggtgga cgagctcgtg 3300
aaagtgatgg gccggcacaa gcccgagaac atcgtgatcg aaatggccag agagaaccag 3360
accacccaga agggacagaa gaacagccgc gagagaatga agcggatcga agagggcatc 3420
aaagagctgg gcagccagat cctgaaagaa caccccgtgg aaaacaccca gctgcagaac 3480
gagaagctgt acctgtacta cctgcagaat gggcgggata tgtacgtgga ccaggaactg 3540
gacatcaacc ggctgtccga ctacgatgtg gaccatatcg tgcctcagag ctttctgaag 3600
gacgactcca tcgacaacaa ggtgctgacc agaagcgaca agaaccgggg caagagcgac 3660
aacgtgccct ccgaagaggt cgtgaagaag atgaagaact actggcggca gctgctgaac 3720
gccaagctga ttacccagag aaagttcgac aatctgacca aggccgagag aggcggcctg 3780
agcgaactgg ataaggccgg cttcatcaag agacagctgg tggaaacccg gcagatcaca 3840
aagcacgtgg cacagatcct ggactcccgg atgaacacta agtacgacga gaatgacaag 3900
ctgatccggg aagtgaaagt gatcaccctg aagtccaagc tggtgtccga tttccggaag 3960
gatttccagt tttacaaagt gcgcgagatc aacaactacc accacgccca cgacgcctac 4020
ctgaacgccg tcgtgggaac cgccctgatc aaaaagtacc ctaagctgga aagcgagttc 4080
gtgtacggcg actacaaggt gtacgacgtg cggaagatga tcgccaagag cgagcaggaa 4140
atcggcaagg ctaccgccaa gtacttcttc tacagcaaca tcatgaactt tttcaagacc 4200
gagattaccc tggccaacgg cgagatccgg aagcggcctc tgatcgagac aaacggcgaa 4260
accggggaga tcgtgtggga taagggccgg gattttgcca ccgtgcggaa agtgctgagc 4320
atgccccaag tgaatatcgt gaaaaagacc gaggtgcaga caggcggctt cagcaaagag 4380
tctatcctgc ccaagaggaa cagcgataag ctgatcgcca gaaagaagga ctgggaccct 4440
aagaagtacg gcggcttcga cagccccacc gtggcctatt ctgtgctggt ggtggccaaa 4500
gtggaaaagg gcaagtccaa gaaactgaag agtgtgaaag agctgctggg gatcaccatc 4560
atggaaagaa gcagcttcga gaagaatccc atcgactttc tggaagccaa gggctacaaa 4620
gaagtgaaaa aggacctgat catcaagctg cctaagtact ccctgttcga gctggaaaac 4680
ggccggaaga gaatgctggc ctctgccggc gaactgcaga agggaaacga actggccctg 4740
ccctccaaat atgtgaactt cctgtacctg gccagccact atgagaagct gaagggctcc 4800
cccgaggata atgagcagaa acagctgttt gtggaacagc acaagcacta cctggacgag 4860
atcatcgagc agatcagcga gttctccaag agagtgatcc tggccgacgc taatctggac 4920
aaagtgctgt ccgcctacaa caagcaccgg gataagccca tcagagagca ggccgagaat 4980
atcatccacc tgtttaccct gaccaatctg ggagcccctg ccgccttcaa gtactttgac 5040
accaccatcg accggaagag gtacaccagc accaaagagg tgctggacgc caccctgatc 5100
caccagagca tcaccggcct gtacgagaca cggatcgacc tgtctcagct gggaggtgac 5160
tctggcggct caaaaagaac cgccgacggc agcgaattcg agcccaagaa gaagaggaaa 5220
gtctaaccgg tcatcatcac catcaccatt gagtttaaac ccgctgatca gcctcgactg 5280
tgccttctag ttgccagcca tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg 5340
aaggtgccac tcccactgtc ctttcctaat aaaatgagga aattgcatcg cattgtctga 5400
gtaggtgtca ttctattctg gggggtgggg tggggcagga cagcaagggg gaggattggg 5460
aagacaatag caggcatgct ggggatgcgg tgggctctat ggcttctgag gcggaaagaa 5520
ccagctgggg ctcgataccg tcgacctcta gctagagctt ggcgtaatca tggtcatagc 5580
tgtttcctgt gtgaaattgt tatccgctca caattccaca caacatacga gccggaagca 5640
taaagtgtaa agcctagggt gcctaatgag tgagctaact cacattaatt gcgttgcgct 5700
cactgcccgc tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac 5760
gcgcggggag aggcggtttg cgtattgggc gctcttccgc ttcctcgctc actgactcgc 5820
tgcgctcggt cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt 5880
tatccacaga atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg 5940
ccaggaaccg taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg 6000
agcatcacaa aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat 6060
accaggcgtt tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta 6120
ccggatacct gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct 6180
gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc 6240
ccgttcagcc cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa 6300
gacacgactt atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg 6360
taggcggtgc tacagagttc ttgaagtggt ggcctaacta cggctacact agaagaacag 6420
tatttggtat ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt 6480
gatccggcaa acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta 6540
cgcgcagaaa aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacactc 6600
agtggaacga aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca 6660
cctagatcct tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa 6720
cttggtctga cagttaccaa tgcttaatca gtgaggcacc tatctcagcg atctgtctat 6780
ttcgttcatc catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct 6840
taccatctgg ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt 6900
tatcagcaat aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat 6960
ccgcctccat ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta 7020
atagtttgcg caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg 7080
gtatggcttc attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt 7140
tgtgcaaaaa agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg 7200
cagtgttatc actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg 7260
taagatgctt ttctgtgact ggtgagtact caaccaagtc attctgagaa tagtgtatgc 7320
ggcgaccgag ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa 7380
ctttaaaagt gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac 7440
cgctgttgag atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt 7500
ttactttcac cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg 7560
gaataagggc gacacggaaa tgttgaatac tcatactctt cctttttcaa tattattgaa 7620
gcatttatca gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata 7680
aacaaatagg ggttccgcgc acatttcccc gaaaagtgcc acctgacgtc gacggatcgg 7740
gagatcgatc tcccgatccc ctagggtcga ctctcagtac aatctgctct gatgccgcat 7800
agttaagcca gtatctgctc cctgcttgtg tgttggaggt cgctgagtag tgcgcgagca 7860
aaatttaagc tacaacaagg caaggcttga ccgacaattg catgaagaat ctgcttaggg 7920
ttaggcgttt tgcgctgctt cgcgatgtac gggccagata tacgcgttga cattgattat 7980
tgactagtta ttaatagtaa tcaattacgg ggtcattagt tcatagccca tatattgagt 8040
tccgcgttac ataacttacg gtaaatggcc cgcctggctg accgcccaac gacccccgcc 8100
cattgacgtc aataatgacg tatgttccca tagtaacgcc aatagggact ttccattgac 8160
gtcaatgggt ggagtattta cggtaaactg cccacttggc agtacatcaa gtgtatc 8217
<210> 12
<211> 20
<212> DNA
<213> Artificial sequence
<400> 12
gaatactaag catagactcc 20
<210> 13
<211> 20
<212> DNA
<213> Artificial sequence
<400> 13
ggagtctatg cttagtattc 20
<210> 14
<211> 20
<212> DNA
<213> Artificial sequence
<400> 14
gtaaacaaag catagactga 20
<210> 15
<211> 20
<212> DNA
<213> Artificial sequence
<400> 15
tcagtctatg ctttgtttac 20
<210> 16
<211> 20
<212> DNA
<213> Artificial sequence
<400> 16
gaacacaaag catagactgc 20
<210> 17
<211> 20
<212> DNA
<213> Artificial sequence
<400> 17
gcagtctatg ctttgtgttc 20
<210> 18
<211> 20
<212> DNA
<213> Artificial sequence
<400> 18
gatgagataa tgatgagtca 20
<210> 19
<211> 20
<212> DNA
<213> Artificial sequence
<400> 19
tgactcatca ttatctcatc 20
<210> 20
<211> 20
<212> DNA
<213> Artificial sequence
<400> 20
gacaaaccag aagccgctcc 20
<210> 21
<211> 20
<212> DNA
<213> Artificial sequence
<400> 21
ggagcggctt ctggtttgtc 20
<210> 22
<211> 20
<212> DNA
<213> Artificial sequence
<400> 22
gggaataaat catagaatcc 20
<210> 23
<211> 20
<212> DNA
<213> Artificial sequence
<400> 23
ggattctatg atttattccc 20
<210> 24
<211> 20
<212> DNA
<213> Artificial sequence
<400> 24
ggaacacaaa gcatagactg 20
<210> 25
<211> 20
<212> DNA
<213> Artificial sequence
<400> 25
cagtctatgc tttgtgttcc 20
<210> 26
<211> 20
<212> DNA
<213> Artificial sequence
<400> 26
gcacctacct cgggagctga 20
<210> 27
<211> 20
<212> DNA
<213> Artificial sequence
<400> 27
tcagctcccg aggtaggtgc 20
<210> 28
<211> 20
<212> DNA
<213> Artificial sequence
<400> 28
ggaatccctt ctgcagcacc 20
<210> 29
<211> 20
<212> DNA
<213> Artificial sequence
<400> 29
ggtgctgcag aagggattcc 20
<210> 30
<211> 20
<212> DNA
<213> Artificial sequence
<400> 30
tcagaaagtg gtggctggtg 20
<210> 31
<211> 20
<212> DNA
<213> Artificial sequence
<400> 31
caccagccac cactttctga 20
<210> 32
<211> 20
<212> DNA
<213> Artificial sequence
<400> 32
ggcccagact gagcacgtga 20
<210> 33
<211> 20
<212> DNA
<213> Artificial sequence
<400> 33
tcacgtgctc agtctgggcc 20
<210> 34
<211> 20
<212> DNA
<213> Artificial sequence
<400> 34
atatttgcat tgagatagtg 20
<210> 35
<211> 20
<212> DNA
<213> Artificial sequence
<400> 35
cactatctca atgcaaatat 20
<210> 36
<211> 20
<212> DNA
<213> Artificial sequence
<400> 36
gtcatcttag tcattacctg 20
<210> 37
<211> 20
<212> DNA
<213> Artificial sequence
<400> 37
caggtaatga ctaagatgac 20
<210> 38
<211> 20
<212> DNA
<213> Artificial sequence
<400> 38
gaagatagag aatagactgc 20
<210> 39
<211> 20
<212> DNA
<213> Artificial sequence
<400> 39
gcagtctatt ctctatcttc 20
<210> 40
<211> 1791
<212> PRT
<213> Artificial sequence
<400> 40
Pro Lys Lys Lys Arg Lys Val Ser Glu Val Glu Phe Ser His Glu Tyr
1 5 10 15
Trp Met Arg His Ala Leu Thr Leu Ala Lys Arg Ala Trp Asp Glu Arg
20 25 30
Glu Val Pro Val Gly Ala Val Leu Val His Asn Asn Arg Val Ile Gly
35 40 45
Glu Gly Trp Asn Arg Pro Ile Gly Arg His Asp Pro Thr Ala His Ala
50 55 60
Glu Ile Met Ala Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg
65 70 75 80
Leu Ile Asp Ala Thr Leu Tyr Val Thr Leu Glu Pro Cys Val Met Cys
85 90 95
Ala Gly Ala Met Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Ala
100 105 110
Arg Asp Ala Lys Thr Gly Ala Ala Gly Ser Leu Met Asp Val Leu His
115 120 125
His Pro Gly Met Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala
130 135 140
Asp Glu Cys Ala Ala Leu Leu Ser Asp Phe Phe Arg Met Arg Arg Gln
145 150 155 160
Glu Ile Lys Ala Gln Lys Lys Ala Gln Ser Ser Thr Asp Ser Gly Gly
165 170 175
Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser
180 185 190
Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Ser Glu Val
195 200 205
Glu Phe Ser His Glu Tyr Trp Met Arg His Ala Leu Thr Leu Ala Lys
210 215 220
Arg Ala Arg Asp Glu Arg Glu Val Pro Val Gly Ala Val Leu Val Leu
225 230 235 240
Asn Asn Arg Val Ile Gly Glu Gly Trp Asn Arg Ala Ile Gly Leu His
245 250 255
Asp Pro Thr Ala His Ala Glu Ile Met Ala Leu Arg Gln Gly Gly Leu
260 265 270
Val Met Gln Asn Tyr Arg Leu Ile Asp Ala Thr Leu Tyr Val Thr Phe
275 280 285
Glu Pro Cys Val Met Cys Ala Gly Ala Met Ile His Ser Arg Ile Gly
290 295 300
Arg Val Val Phe Gly Val Arg Asn Ala Lys Thr Gly Ala Ala Gly Ser
305 310 315 320
Leu Met Asp Val Leu His Tyr Pro Gly Met Asn His Arg Val Glu Ile
325 330 335
Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala Ala Leu Leu Cys Tyr Phe
340 345 350
Phe Arg Met Pro Arg Gln Val Phe Asn Ala Gln Lys Lys Ala Gln Ser
355 360 365
Ser Thr Asp Ser Gly Gly Ser Ser Gly Gly Ser Ser Gly Ser Glu Thr
370 375 380
Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Ser Gly Gly Ser Ser
385 390 395 400
Gly Gly Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn
405 410 415
Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys
420 425 430
Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn
435 440 445
Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr
450 455 460
Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg
465 470 475 480
Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp
485 490 495
Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp
500 505 510
Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val
515 520 525
Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu
530 535 540
Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu
545 550 555 560
Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu
565 570 575
Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln
580 585 590
Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val
595 600 605
Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu
610 615 620
Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe
625 630 635 640
Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser
645 650 655
Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr
660 665 670
Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr
675 680 685
Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu
690 695 700
Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser
705 710 715 720
Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu
725 730 735
Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile
740 745 750
Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly
755 760 765
Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys
770 775 780
Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu
785 790 795 800
Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile
805 810 815
His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr
820 825 830
Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe
835 840 845
Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe
850 855 860
Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe
865 870 875 880
Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg
885 890 895
Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys
900 905 910
His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys
915 920 925
Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly
930 935 940
Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys
945 950 955 960
Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys
965 970 975
Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser
980 985 990
Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe
995 1000 1005
Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr
1010 1015 1020
Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr
1025 1030 1035 1040
Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg
1045 1050 1055
Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile
1060 1065 1070
Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp
1075 1080 1085
Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu
1090 1095 1100
Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp
1105 1110 1115 1120
Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys
1125 1130 1135
Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val
1140 1145 1150
Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu
1155 1160 1165
Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys
1170 1175 1180
Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu
1185 1190 1195 1200
His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr
1205 1210 1215
Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile
1220 1225 1230
Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe
1235 1240 1245
Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys
1250 1255 1260
Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys
1265 1270 1275 1280
Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln
1285 1290 1295
Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu
1300 1305 1310
Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln
1315 1320 1325
Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
1330 1335 1340
Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu
1345 1350 1355 1360
Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys
1365 1370 1375
Val Arg Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn
1380 1385 1390
Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser
1395 1400 1405
Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile
1410 1415 1420
Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
1425 1430 1435 1440
Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn
1445 1450 1455
Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly
1460 1465 1470
Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val
1475 1480 1485
Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr
1490 1495 1500
Gly Gly Phe Ser Lys Glu Ser Ile Arg Pro Lys Arg Asn Ser Asp Lys
1505 1510 1515 1520
Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe
1525 1530 1535
Val Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val Glu
1540 1545 1550
Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile
1555 1560 1565
Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu
1570 1575 1580
Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu
1585 1590 1595 1600
Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu
1605 1610 1615
Ala Ser Ala Arg Phe Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser
1620 1625 1630
Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys
1635 1640 1645
Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His
1650 1655 1660
Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys
1665 1670 1675 1680
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr
1685 1690 1695
Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile
1700 1705 1710
His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Arg Ala Phe Lys Tyr
1715 1720 1725
Phe Asp Thr Thr Ile Asp Arg Lys Val Tyr Arg Ser Thr Lys Glu Val
1730 1735 1740
Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr
1745 1750 1755 1760
Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Ser Gly Gly Ser Lys Arg
1765 1770 1775
Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg Lys Val
1780 1785 1790
<210> 41
<211> 1605
<212> PRT
<213> Artificial sequence
<400> 41
Met Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Ser Pro Lys Lys Lys
1 5 10 15
Arg Lys Val Ser Glu Val Glu Phe Ser His Glu Tyr Trp Met Arg His
20 25 30
Ala Leu Thr Leu Ala Lys Arg Ala Arg Asp Glu Arg Glu Val Pro Val
35 40 45
Gly Ala Val Leu Val Leu Asn Asn Arg Val Ile Gly Glu Gly Trp Asn
50 55 60
Arg Ala Ile Gly Leu His Asp Pro Thr Ala His Ala Glu Ile Met Ala
65 70 75 80
Leu Arg Gln Gly Gly Leu Val Met Gln Asn Tyr Arg Leu Ile Asp Ala
85 90 95
Thr Leu Tyr Val Thr Phe Glu Pro Cys Val Met Cys Ala Gly Ala Met
100 105 110
Ile His Ser Arg Ile Gly Arg Val Val Phe Gly Val Arg Asn Ser Lys
115 120 125
Arg Gly Ala Ala Gly Ser Leu Met Asn Val Leu Asn Tyr Pro Gly Met
130 135 140
Asn His Arg Val Glu Ile Thr Glu Gly Ile Leu Ala Asp Glu Cys Ala
145 150 155 160
Ala Leu Leu Cys Asp Phe Tyr Arg Met Pro Arg Gln Val Phe Asn Ala
165 170 175
Gln Lys Lys Ala Gln Ser Ser Ile Asn Ser Gly Gly Ser Ser Gly Gly
180 185 190
Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu
195 200 205
Ser Ser Gly Gly Ser Ser Gly Gly Ser Asp Lys Lys Tyr Ser Ile Gly
210 215 220
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
225 230 235 240
Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg
245 250 255
His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly
260 265 270
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
275 280 285
Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn
290 295 300
Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
305 310 315 320
Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly
325 330 335
Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr
340 345 350
His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg
355 360 365
Leu Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
370 375 380
Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu
385 390 395 400
Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro
405 410 415
Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu
420 425 430
Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu
435 440 445
Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu
450 455 460
Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu
465 470 475 480
Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
485 490 495
Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu
500 505 510
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile
515 520 525
Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
530 535 540
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro
545 550 555 560
Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
565 570 575
Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile
580 585 590
Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys
595 600 605
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly
610 615 620
Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg
625 630 635 640
Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile
645 650 655
Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala
660 665 670
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr
675 680 685
Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala
690 695 700
Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn
705 710 715 720
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val
725 730 735
Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
740 745 750
Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
755 760 765
Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr
770 775 780
Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
785 790 795 800
Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile
805 810 815
Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu
820 825 830
Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile
835 840 845
Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met
850 855 860
Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
865 870 875 880
Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu
885 890 895
Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu
900 905 910
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
915 920 925
Val Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
930 935 940
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
945 950 955 960
Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val
965 970 975
Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn
980 985 990
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly
995 1000 1005
Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn
1010 1015 1020
Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val
1025 1030 1035 1040
Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His
1045 1050 1055
Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val
1060 1065 1070
Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser
1075 1080 1085
Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn
1090 1095 1100
Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu
1105 1110 1115 1120
Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln
1125 1130 1135
Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp
1140 1145 1150
Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu
1155 1160 1165
Val Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys
1170 1175 1180
Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala
1185 1190 1195 1200
His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys Lys
1205 1210 1215
Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr
1220 1225 1230
Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala
1235 1240 1245
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr
1250 1255 1260
Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu
1265 1270 1275 1280
Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe
1285 1290 1295
Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys
1300 1305 1310
Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro
1315 1320 1325
Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro
1330 1335 1340
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu
1345 1350 1355 1360
Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val
1365 1370 1375
Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys
1380 1385 1390
Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys
1395 1400 1405
Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn
1410 1415 1420
Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn
1425 1430 1435 1440
Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser
1445 1450 1455
His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln
1460 1465 1470
Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
1475 1480 1485
Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu Asp
1490 1495 1500
Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu
1505 1510 1515 1520
Gln Ala Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala
1525 1530 1535
Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr
1540 1545 1550
Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile
1555 1560 1565
Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1570 1575 1580
Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys
1585 1590 1595 1600
Lys Lys Arg Lys Val
1605
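
As a reading aid only, the three-letter residue listing above can be collapsed into the one-letter code commonly used by sequence tools. The following is a minimal sketch, not part of the patent: the function name `listing_to_one_letter` is hypothetical, and it assumes that every purely numeric token in the listing is a position marker rather than a residue.

```python
# Standard three-letter to one-letter amino acid codes (20 canonical residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def listing_to_one_letter(listing: str) -> str:
    """Convert a three-letter sequence listing to a one-letter string.

    Numeric tokens are treated as position markers and skipped
    (an assumption about the listing layout). Unknown residue
    names raise KeyError rather than being silently dropped.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # position marker such as "1600", not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# The last two residue lines of the listing above, with their position markers.
tail = """
Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys
1585 1590 1595 1600
Lys Lys Arg Lys Val
1605
"""

tail_one_letter = listing_to_one_letter(tail)
print(tail_one_letter)  # SGGSKRTADGSEFEPKKKRKV
```

Applied to the full listing, the length of the returned string can be compared with the final position marker (1605) as a simple check that no residue was lost in extraction.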

Claims (13)

1. A fusion protein, characterized in that: the fusion protein comprises, from the N-terminus to the C-terminus, a first region and a second region, wherein the first region comprises an adenine base editor (ABE) comprising an adenine deaminase, or an enzymatically active component thereof, and nCas9; the second region comprises the e18 protein; and the fusion protein optionally comprises one or more linker amino acid sequences located within the first region and between the first region and the second region of the fusion protein.
2. The fusion protein of claim 1, wherein: the ABE is one of the ABE7.10, ABEmax and ABE8e base editors.
3. The fusion protein of claim 1, wherein: the fusion protein further comprises a nuclear localization signal fragment.
4. The fusion protein of claim 1, wherein: when the ABE is ABEmax, the amino acid sequence of the fusion protein is shown in SEQ ID NO.40; and when the ABE is ABE8e, the amino acid sequence of the fusion protein is shown in SEQ ID NO.41.
5. An isolated polynucleotide, characterized in that: the polynucleotide encodes the fusion protein of any one of claims 1 to 4.
6. The isolated polynucleotide of claim 5, wherein: the polynucleotide sequence comprises a first region encoding the ABE and a second region encoding the e18 protein, and optionally sequences encoding one or more linker amino acid sequences located within the first region and between the first and second regions of the fusion protein.
7. The isolated polynucleotide of claim 5, wherein, when the ABE is ABEmax, the polynucleotide sequence is constructed as follows:
the gene sequence of ABEmax before modification is shown in SEQ ID NO.2; the upstream primer F1 for amplifying e18 gene fragment 1 is shown in SEQ ID NO.3, and the downstream primer R1 is shown in SEQ ID NO.4; the upstream primer F2 for amplifying e18 gene fragment 2 is shown in SEQ ID NO.5, and the downstream primer R2 is shown in SEQ ID NO.6; the upstream primer F3 for amplifying the e18 gene fragment carrying the restriction sites is shown in SEQ ID NO.7, and the downstream primer R3 is shown in SEQ ID NO.8; the e18 gene sequence with the two restriction sites is shown in SEQ ID NO.9; and the sequence obtained by ligating the restriction-digested fragment of SEQ ID NO.9 to the restriction-digested fragment of SEQ ID NO.2 is the polynucleotide sequence shown in SEQ ID NO.1.
8. The isolated polynucleotide of claim 1, wherein, when the ABE is ABE8e, the polynucleotide sequence shown in SEQ ID NO.10 is constructed as follows: the gene sequence of ABE8e before modification is shown in SEQ ID NO.11; the e18 gene sequence with the two restriction sites is shown in SEQ ID NO.9; and the sequence obtained by ligating the fragment of SEQ ID NO.9 to the restriction-digested fragment of SEQ ID NO.11 is the polynucleotide sequence shown in SEQ ID NO.10.
9. A construct, characterized by: the construct comprises a polynucleotide according to any one of claims 5 to 8.
10. An expression system, characterized by: the expression system comprises the construct of claim 9 or a polynucleotide of any one of claims 5 to 8 integrated into the genome.
11. Use of the fusion protein according to any one of claims 1 to 4, the polynucleotide according to any one of claims 5 to 8, the construct according to claim 9, or the expression system according to claim 10 in gene editing, wherein the gene editing is the conversion of the base A to G.
12. A base editing system, characterized by: comprising the fusion protein of any one of claims 1 to 4 and sgRNAs.
13. A gene editing method, characterized by: performing gene editing with the fusion protein according to any one of claims 1 to 4 or the base editing system according to claim 12, wherein the gene editing is the conversion of the base A to G.
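
The A-to-G conversion recited in claims 11 and 13 can be illustrated with a deliberately idealized sketch. It is not the claimed method: the function name `edit_a_to_g`, the example protospacer, and the editing window (positions 4 to 8) are all hypothetical values chosen for illustration, and the model assumes every A inside the window is converted while every base outside it is untouched.

```python
def edit_a_to_g(protospacer: str, window_start: int, window_end: int) -> str:
    """Return the protospacer with every A inside the window replaced by G.

    Positions are 1-based and inclusive. This is an idealized model:
    real editing efficiency varies by position and sequence context.
    """
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base == "A" and window_start <= pos <= window_end:
            edited.append("G")  # A inside the assumed editing window
        else:
            edited.append(base)  # all other bases are left unchanged
    return "".join(edited)


# Hypothetical 20-nt protospacer; the window 4-8 is an assumption, not from the claims.
target = "GACCATGACTTACGGATCCA"
print(edit_a_to_g(target, 4, 8))  # GACCGTGGCTTACGGATCCA
```

In this toy example the A bases at positions 5 and 8 are converted, while the A at position 2 and those beyond position 8 are not, which is the kind of window-restricted outcome a high-precision editor aims for.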
CN202210538473.1A 2022-05-17 2022-05-17 High-precision adenine base editor and application thereof Pending CN115093482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210538473.1A CN115093482A (en) 2022-05-17 2022-05-17 High-precision adenine base editor and application thereof

Publications (1)

Publication Number Publication Date
CN115093482A true CN115093482A (en) 2022-09-23

Family

ID=83288420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210538473.1A Pending CN115093482A (en) 2022-05-17 2022-05-17 High-precision adenine base editor and application thereof

Country Status (1)

Country Link
CN (1) CN115093482A (en)

Similar Documents

Publication Title
KR102181258B1 (en) Virus-like particle composition
CN110029096B (en) Adenine base editing tool and application thereof
CN113227368B (en) Engineered enzymes
KR102140596B1 (en) Novel Promoter from Organic Acid Resistant Yeast and Method for Expressing Target Gene Using The Same
CN111549062A (en) Whole genome knockout vector library of silkworm based on CRISPR/Cas9 system and construction method
KR20160016856A (en) Malaria vaccine
CN112011574B (en) Lentiviral vector, construction method and application thereof
CN115698297A (en) Preparation method of multi-module biosynthetic enzyme gene combined library
CN101657097A (en) Treatment of diseases characterized by inflammation
CN113584062B (en) Fusion imaging gene, lentivirus expression plasmid, lentivirus and cell thereof, and preparation method and application thereof
CN113584033B (en) CRISPR/Cpf1 gene editing system, construction method thereof and application thereof in gibberella
CN113652451B (en) Lentiviral vector, construction method and application thereof
CN111534543A (en) Eukaryotic CRISPR/Cas9 knockout system, basic vector, vector and cell line
CN111549060A (en) Eukaryotic organism CRISPR/Cas9 whole genome editing cell library and construction method
CN106086054A (en) Method for traceless knockout of Helicobacter pylori genes
CN113186140B (en) Genetically engineered bacteria for preventing and/or treating hangover and liver disease
CN113637672B (en) Base editing tool and construction method thereof
KR102335519B1 (en) Vaccine composition for preventing or reducing clinical symptom of severe acute respiratory syndrome coronavirus 2
CN115093482A (en) High-precision adenine base editor and application thereof
CN111534541A (en) Eukaryotic organism CRISPR-Cas9 double gRNA vector and construction method thereof
CN106399373B (en) Cas9 expression vector
CN114058607B (en) Fusion protein for editing C to U base, and preparation method and application thereof
CN111041039B (en) Thermophilic anaerobic ethanol bacillus genome editing vector and application thereof
CN112639104B (en) Novel promoter derived from organic acid-tolerant yeast and method for expressing target gene using the same
CN112209883B (en) Fluorescein dye specifically combined with RNA and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination