CN113025597A - Improved genome editing system - Google Patents

Publication number
CN113025597A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911351725.4A
Other languages
Chinese (zh)
Inventor
邱金龙
张倩伟
尹康权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microbiology of CAS
Original Assignee
Institute of Microbiology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microbiology of CAS
Priority to CN201911351725.4A
Publication of CN113025597A

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216 Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • C12N15/8218 Antisense, co-suppression, viral induced gene silencing [VIGS], post-transcriptional induced gene silencing [PTGS]
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide

Abstract

The present invention relates to the field of genome editing. In particular, the present invention relates to an improved genome editing system and applications thereof. More specifically, the invention provides a genome editing fusion polypeptide comprising a CRISPR nuclease and a 5'→ 3' exonuclease. The invention also provides polynucleotides or expression constructs encoding the polypeptides, and genome editing systems comprising the polypeptides, polynucleotides, and/or constructs. The invention also provides methods for editing the genome of a cell using the genome editing system.

Description

Improved genome editing system
Technical Field
The present invention relates to the field of genome editing. In particular, the present invention relates to an improved genome editing system and applications thereof. More specifically, the invention provides a genome editing fusion polypeptide comprising a CRISPR nuclease and a 5'→ 3' exonuclease. The invention also provides polynucleotides or expression constructs encoding the polypeptides, and genome editing systems comprising the polypeptides, polynucleotides, and/or constructs. The invention also provides methods for editing the genome of a cell using the genome editing system.
Background
CRISPR/Cas systems, which provide bacteria and archaea with immunity against viral infection (Wiedenheft et al, 2012), have technically simplified genome editing and are revolutionizing biology and genetic engineering. The CRISPR/Cas9 system is the most widely used for genome editing in a variety of organisms, including plants (Hsu et al, 2014; Yin et al, 2017). The CRISPR/Cas9 system consists of two parts, the Cas9 nuclease and a single guide RNA (sgRNA). Cas9 binds to the scaffold of the sgRNA, and target specificity is determined by the approximately 20 nucleotide (nt) spacer sequence at the 5' end of the sgRNA (Jinek et al, 2012). Cas9 typically cleaves target DNA approximately 3 bp upstream of a protospacer adjacent motif (PAM) sequence. The mutations induced by CRISPR/Cas9 in plants are mainly deletions of less than 10 bp (usually 1-3 bp) and insertions of one base pair (bp), especially A/T (Paul et al, 2016; Bortesi et al, 2016). Likewise, most mutations induced by Cas9 in mammalian cells are small insertions/deletions (indels) (Kim et al, 2015; Kosicki et al, 2018). Although large genomic deletions of up to 250 bp have been detected after Cas9 editing (Heckl et al, 2014; Liang et al, 2015), their frequency is very low. Thus, CRISPR/Cas9 has been widely used to edit genomic coding regions, because a small indel in a coding gene typically results in a frameshift mutation and hence loss of function. However, editing regulatory and non-coding genomic sequences remains a challenge for Cas9, since the small indels induced by one sgRNA are unlikely to result in loss-of-function mutations in such sequences.
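The targeting geometry described above (a roughly 20 nt protospacer, a PAM, and a cut about 3 bp upstream of the PAM) can be illustrated with a minimal Python sketch. It assumes an SpCas9-style NGG PAM and scans only the given strand; the function names and the demo sequence are hypothetical and are not part of the application.

```python
import re


def find_cas9_sites(seq, spacer_len=20):
    """List candidate Cas9 sites (assumed NGG PAM) on the given strand.

    Returns tuples (protospacer, pam, cut_index), where cut_index is the
    0-based index of the first base 3' of the predicted blunt cut,
    i.e. 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = seq[pam_start - spacer_len:pam_start]
        sites.append((protospacer, m.group(1), pam_start - 3))
    return sites


def causes_frameshift(indel_len):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0


if __name__ == "__main__":
    demo = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # made-up sequence
    for protospacer, pam, cut in find_cas9_sites(demo):
        print(protospacer, pam, demo[:cut] + "|" + demo[cut:])
```

The `causes_frameshift` helper reflects why small indels are usually sufficient in coding regions: a 1 bp or 2 bp indel shifts the reading frame, whereas the same indel in a promoter or other non-coding sequence has no frame to disrupt.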
Two guide RNAs flanking the fragment to be deleted, i.e., paired guide RNAs (pgRNAs), have been used to generate larger non-coding DNA deletions (Han et al, 2014; Yin et al, 2015; Zhu et al, 2016) and regulatory element deletions (Diao et al, 2017). However, the need for two sgRNAs limits this approach. First, the availability of PAM sequences restricts where this strategy can be applied. Second, pgRNAs still tend to produce a single editing event, especially when the two sgRNA target sites are distant from each other (Zhu et al, 2016). In addition, introducing two sgRNAs is more laborious and may increase the frequency of off-target editing.
Cas12a (formerly Cpf1) has also been used as a genome editing tool (Zetsche et al, 2015; Koonin et al, 2017). Like Cas9, Cas12a is a class 2 RNA-guided Cas enzyme. However, Cas12a uses guide RNAs shorter than the sgRNAs of Cas9 (Li et al, 2017; Dang et al, 2015), and Cas12a recognizes a T-rich PAM, in contrast to the G-rich PAM of Cas9 (Zetsche et al, 2015; Jinek et al, 2012). In addition, unlike Cas9, Cas12a produces double strand breaks (DSBs) with staggered ends carrying 4-5 nt overhangs at a PAM-distal position (Zetsche et al, 2015). Thus, the mutations induced by Cas12a in plants are mainly deletions of up to 44 bp (usually 6-13 bp), with insertions being rare (Tang et al, 2017).
There is a need in the art to provide further methods for generating larger genomic deletions with only one guide RNA at a particular target site without the need for paired guide RNAs.
Brief description of the invention
In one aspect, the invention provides an isolated fusion polypeptide comprising a CRISPR nuclease and a 5'→ 3' exonuclease.
In another aspect, the present invention also provides a genome editing system comprising at least one of the following i) to v):
i) the fusion polypeptide and guide RNA of the present invention;
ii) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of the invention, and a guide RNA;
iii) the fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of the invention and a nucleotide sequence encoding a guide RNA.
In another aspect, the invention also provides a method of genetically modifying a cell, comprising introducing into the cell, preferably a plant cell, a genome editing system of the invention.
Brief Description of Drawings
FIG. 1: Fusion of T5 exonuclease with Cas9 alters the indel profile of genome editing. a) Schematic of the Cas9 and T5exo-Cas9 constructs. b) Ratio of deletions and insertions induced by Cas9 or T5exo-Cas9 at the OsMKK5 target site in rice protoplasts. Mutations induced by Cas9 and T5exo-Cas9 were first enriched by PCR amplification of protoplast genomic DNA pre-digested with HindIII, and the cloned PCR products were then subjected to Sanger sequencing. c) Cas9 and T5exo-Cas9. d) Representative deletions induced by T5exo-Cas9 at the OsMKK5 locus. Black line, deleted genomic region; rectangle, protospacer sequence; triangle, Cas9 cleavage site; double-sided arrows, forward and reverse primers for PCR amplification.
FIG. 2: Fusing the T5 exonuclease with Cas9 increases the frequency and size of genome deletions at the guide RNA target locus. a) Indel patterns induced by Cas9 and T5exo-Cas9 at the OsMPK16, OsCDC48, OsALS and OsXa13 target sites in rice protoplasts. All experiments were repeated three times with similar results. b) Size distribution of the deletions generated by Cas9 and T5exo-Cas9 at the four target sites in rice protoplasts. "D" represents the length of the deletion. All experiments were repeated three times with similar results. c) Genome editing efficiency of Cas9 and T5exo-Cas9 at the four target sites in rice protoplasts. Untreated protoplast samples were used as controls. Data are mean ± s.e.m. (n = 3). P values were calculated by two-way ANOVA. P < 0.01, P < 0.001.
FIG. 3: The T5exo-Cas9 fusion facilitates genome deletions in transgenic rice plants. a) Summary of the genotyping results of T0 transgenic rice lines obtained by transformation with the sgRNA OsXa13-T1 together with Cas9 or T5exo-Cas9, respectively. b) Indel patterns generated by Cas9 and T5exo-Cas9 in the OsXa13 promoter in transgenic rice lines. "D" represents the length of the deletion. c) Disease resistance of the indicated rice mutants to Xanthomonas oryzae (Xao) strain PXO99. Leaves were inoculated and lesion length was measured 12 days after inoculation. Data were analyzed by one-way ANOVA (mean ± s.d.). Significant differences between the mean values were determined by Fisher's protected LSD test (P ≤ 0.05), and significantly different groups are indicated by different lowercase letters. d) Sequences of the insertion or deletion mutations indicated in c). The upper panel shows the structure of the OsXa13 gene. The lower panel shows the sequence of the OsXa13 target site. The UPTPthXo1 sequence is shown in grey, the sgRNA target sequence is underlined, the PAM is boxed, dashed lines indicate deleted nucleotides, and triangles indicate inserted nucleotides. WT, wild type.
FIG. 4: Fusing the T5 exonuclease with Cas12a increases the frequency and size of genome deletions at the guide RNA target locus. a) Schematic of the Cas12a and T5exo-Cas12a constructs. b) Indel patterns induced by Cas12a and T5exo-Cas12a at the OsBADH2, ospsps and OsPDS target sites in rice protoplasts. All experiments were repeated three times with similar results. c) Size distribution of the deletions generated by Cas12a and T5exo-Cas12a at the three target sites in rice protoplasts. "D" represents the length of the deletion. All experiments were repeated three times with similar results. d) Genome editing efficiency of Cas12a and T5exo-Cas12a at the three target sites in rice protoplasts. Untreated protoplast samples were used as controls. Data are mean ± s.e.m. (n = 3). P values were calculated by two-way ANOVA. P < 0.01.
FIG. 5: The T5exo-Cas12a fusion produces larger genome deletions in transgenic rice plants. a) Summary of the genotyping results of T0 transgenic rice lines obtained by transformation with the guide RNA OsPDS-T1 together with Cas12a or T5exo-Cas12a, respectively. b) Indel patterns generated by Cas12a and T5exo-Cas12a at the OsPDS gene in transgenic rice lines. "D" represents the length of the deletion.
Detailed Description
I. Definitions
In the present invention, unless otherwise specified, scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. Also, the terms relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology, and the laboratory procedures used herein, are terms and conventional procedures used extensively in the relevant art. For example, standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in the following reference: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (abbreviated as "Sambrook"). In addition, for a better understanding of the present invention, definitions and explanations of related terms are provided below.
As used herein, the term "and/or" encompasses all combinations of items linked by the term, as if each combination had been individually listed herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
When the term "comprising" is used herein to describe a protein or nucleic acid sequence, the protein or nucleic acid may consist of the sequence or may have additional amino acids or nucleotides at one or both ends, provided that it still possesses the activity described herein. Furthermore, it is clear to the skilled person that the methionine encoded by the start codon at the N-terminus of a polypeptide may be retained in certain practical cases (e.g., during expression in a particular expression system) without substantially affecting the function of the polypeptide. Thus, when a particular polypeptide amino acid sequence is described in the specification and claims of this application, although it may not contain the methionine encoded by the start codon at the N-terminus, the sequence containing that methionine is also encompassed herein, and accordingly its coding nucleotide sequence may also contain the start codon; and vice versa.
As used herein, the term "CRISPR nuclease" generally refers to nucleases found in naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, catalytically active fragments thereof, and the like. The term encompasses any effector protein capable of gene targeting (e.g., gene editing, gene targeting regulation, etc.) within a cell based on a CRISPR system.
Examples of "CRISPR nucleases" include Cas9 nuclease or variants thereof. The Cas9 nuclease may be a Cas9 nuclease from any of various species, such as SpCas9 from Streptococcus pyogenes (S. pyogenes) or SaCas9 from Staphylococcus aureus (S. aureus). "Cas9 nuclease" and "Cas9" are used interchangeably herein to refer to an RNA-guided nuclease that includes a Cas9 protein or fragment thereof (e.g., a protein comprising the active DNA cleavage domain of Cas9 and/or the gRNA binding domain of Cas9). Cas9 is a component of the CRISPR/Cas (clustered regularly interspaced short palindromic repeats and associated systems) genome editing system and is capable of targeting and cleaving a DNA target sequence under the direction of a guide RNA to form a DNA double strand break (DSB).
Examples of "CRISPR nucleases" can also include Cas12a nuclease or variants thereof, such as high specificity variants. The Cas12a nuclease may be a Cas12a nuclease from any of various species, such as the Cas12a nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
As used herein, the term "5'→ 3' exonuclease" refers to an exonuclease that degrades DNA from the 5' end, i.e., in the 5' to 3' direction. The 5'→ 3' exonuclease of interest can remove nucleotides from the 5' end of a dsDNA strand at a blunt end and, in some embodiments, at a 3' and/or 5' overhang end.
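To visualize what 5'→ 3' degradation means at a blunt double-strand break, the following toy Python sketch removes a fixed number of nucleotides from the 5'-terminated strand on each side of the break. It is a simplified illustration under stated assumptions (a blunt cut, a fixed resection length, no repair step), not a model of T5 exonuclease kinetics; the function name `resect_break` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def resect_break(top, cut, n):
    """Model 5'->3' resection of n nt on both sides of a blunt break.

    top is the top strand written 5'->3'; the bottom strand is its
    complement written 3'->5' so that the two strings align base by base.
    At the break, the 5' end on the left fragment belongs to the bottom
    strand and the 5' end on the right fragment belongs to the top strand.
    Removed bases are shown as '-'.
    """
    bottom = top.translate(COMPLEMENT)
    left_top, right_top = top[:cut], top[cut:]
    left_bottom, right_bottom = bottom[:cut], bottom[cut:]
    n_left = min(n, len(left_bottom))
    n_right = min(n, len(right_top))
    left_bottom = left_bottom[:len(left_bottom) - n_left] + "-" * n_left
    right_top = "-" * n_right + right_top[n_right:]
    return (left_top, left_bottom), (right_top, right_bottom)


if __name__ == "__main__":
    (lt, lb), (rt, rb) = resect_break("AACCGGTT", cut=4, n=2)
    print(lt, rt)  # top strand, left and right of the break
    print(lb, rb)  # bottom strand, left and right of the break
```

In this sketch both fragments end up with single-stranded 3' overhangs, which is the expected geometry after 5'→ 3' resection of a blunt end.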
As used herein, "gRNA" and "guide RNA" are used interchangeably to refer to an RNA molecule capable of forming a complex with a CRISPR nuclease and targeting the complex to a target sequence due to some complementarity to the target sequence. For example, in Cas9-based gene editing systems, gRNAs are typically composed of crRNA and tracrRNA molecules that are partially complementary and form a complex, where the crRNA comprises a sequence that is sufficiently identical to a target sequence and directs the CRISPR complex (Cas9 + crRNA + tracrRNA) to bind specifically to that target sequence. However, it is known in the art to design single guide RNAs (sgRNAs) that combine the characteristics of both crRNA and tracrRNA. In Cas12a-based genome editing systems, by contrast, gRNAs typically consist only of mature crRNA molecules, where the crRNA comprises a sequence that is sufficiently identical to the target sequence and directs the specific binding of the complex (Cas12a + crRNA) to that target sequence. It is within the ability of the person skilled in the art to design suitable gRNA sequences based on the CRISPR nuclease used and the target sequence to be edited.
As used herein, "genome" encompasses not only chromosomal DNA present in the nucleus, but also organelle DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
As used herein, "cell" includes cells of any organism suitable for genome editing. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, Arabidopsis, and the like.
By "genetically modified organism" or "genetically modified cell" is meant an organism or cell that comprises within its genome an exogenous polynucleotide or modified gene or expression control sequence. For example, an exogenous polynucleotide can be stably integrated into the genome of an organism or cell and be inherited by successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. The modified gene or expression regulatory sequence is one which comprises single or multiple deoxynucleotide substitutions, deletions and additions in the genome of the organism or cell.
"exogenous" with respect to a sequence means a sequence from a foreign species, or if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.
"polynucleotide", "nucleic acid sequence", "nucleotide sequence" or "nucleic acid fragment" are used interchangeably and are single-or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single letter designation as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), "C" represents cytidine or deoxycytidine, "G" represents guanosine or deoxyguanosine, "U" represents uridine, "T" represents deoxythymidine, "R" represents purine (A or G), "Y" represents pyrimidine (C or T), "K" represents G or T, "H" represents A or C or T, "I" represents inosine, and "N" represents any nucleotide.
"polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation.
As used herein, "expression construct" refers to a vector, such as a recombinant vector, suitable for expression of a nucleotide sequence of interest in an organism. "expression" refers to the production of a functional product. For example, expression of a nucleotide sequence can refer to transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or translation of the RNA into a precursor or mature protein.
The "expression construct" of the invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector, or, in some embodiments, may be an RNA (e.g., mRNA) capable of translation.
An "expression construct" of the invention may comprise regulatory sequences and nucleotide sequences of interest of different origin, or regulatory sequences and nucleotide sequences of interest of the same origin but arranged in a manner different from that normally found in nature.
"regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence), intermediate, or downstream (3' non-coding sequence) of a coding sequence and that affects the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leader sequences, introns, and polyadenylation recognition sequences.
"promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the invention, the promoter is a promoter capable of controlling transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or a tissue-specific promoter or a developmentally regulated promoter or an inducible promoter.
"constitutive promoter" refers to a promoter that will generally cause a gene to be expressed in most cell types under most circumstances. "tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that is expressed primarily, but not necessarily exclusively, in a tissue or organ, but may also be expressed in a particular cell or cell type. "developmentally regulated promoter" refers to a promoter whose activity is determined by a developmental event. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environmental, hormonal, chemical signal, etc.).
As used herein, the term "operably linked" refers to a regulatory element (such as, but not limited to, a promoter sequence, a transcription termination sequence, and the like) linked to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
"introducing" a nucleic acid molecule (e.g., a plasmid, a linear nucleic acid fragment, RNA, etc.) or a protein into an organism refers to transforming cells of the organism with the nucleic acid or protein so that the nucleic acid or protein can function in the cells. "transformation" as used herein includes both stable transformation and transient transformation. "Stable transformation" refers to the introduction of an exogenous nucleotide sequence into a genome, resulting in the stable inheritance of the exogenous gene. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any successive generation thereof. "transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell that performs a function without stable inheritance of a foreign gene. In transient transformation, the foreign nucleic acid sequence is not integrated into the genome.
II. Genome editing fusion polypeptide
The present invention provides an isolated fusion polypeptide, wherein the fusion polypeptide comprises a CRISPR nuclease and a 5'→ 3' exonuclease.
The CRISPR nuclease described herein can be any CRISPR nuclease capable of effecting genome editing. In some embodiments, the CRISPR nuclease is Cas9 or an active fragment thereof, such as Cas9 from Streptococcus pyogenes (SpCas9), Cas9 from Staphylococcus aureus (SaCas9), Cas9 from Francisella novicida (FnCas9), Cas9 from Campylobacter jejuni (CjCas9), and Cas9 from Neisseria cinerea (NcCas9). In some embodiments, the CRISPR nuclease is Cas12a or an active fragment thereof, such as Cas12a from Francisella novicida U112 (FnCas12a), Cas12a from Acidaminococcus sp. BV3L6, and Cas12a from Lachnospiraceae bacterium ND2006 (LbCas12a). In one embodiment, the amino acid sequence of the CRISPR nuclease is selected from SEQ ID NO: 8 or 15. In one embodiment, the nucleotide sequence encoding the CRISPR nuclease is selected from SEQ ID NO: 9 or 16.
The 5'→ 3' exonuclease of the present invention may be any exonuclease that degrades DNA from the 5' end, i.e., in the 5' to 3' direction. In some embodiments, the exonuclease can digest double-stranded DNA (dsDNA). In some embodiments, the exonuclease can digest single-stranded DNA (ssDNA). In some embodiments, the exonuclease can digest both dsDNA and ssDNA. In some embodiments, the exonuclease is not a 3'→ 5' exonuclease. In some embodiments, the 5'→ 3' exonuclease is a T5 exonuclease, for example the product of phage T5 gene D15. In some embodiments, the T5 exonuclease comprises an amino acid sequence having at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence identity to SEQ ID NO: 3. In some embodiments, the T5 exonuclease is encoded by a nucleotide sequence having at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence identity to SEQ ID NO: 4. In some preferred embodiments, the T5 exonuclease comprises SEQ ID NO: 3. In a preferred embodiment, the T5 exonuclease is encoded by SEQ ID NO: 4.
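The identity thresholds above can be illustrated with a minimal Python sketch that scores two sequences that have already been aligned. The application does not specify an alignment algorithm or how gaps are counted, so the definition used here (identical non-gap columns divided by alignment length) is an assumption, and the function names and example peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pairwise alignment ('-' marks a gap).

    Assumed definition: identical non-gap columns / alignment length * 100.
    Other tools may use a different denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if the pair reaches the given identity threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up five-residue peptides, one substitution apart.
    print(percent_identity("MKLLE", "MKLIE"))
    print(meets_threshold("MKLLE", "MKLIE", threshold=90.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only shows how a percentage such as "at least 80%" could be checked once an alignment is available.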
In the polypeptides of the invention, the 5'→ 3' exonuclease and the CRISPR nuclease may be fused directly or indirectly. In some embodiments, the 5'→ 3' exonuclease is directly fused to the CRISPR nuclease. In some embodiments, the 5'→ 3' exonuclease and the CRISPR nuclease may be indirectly fused, e.g., linked by a linker. The linker may be a non-functional amino acid sequence of 1-50 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 20-25, 25-50) or more amino acids in length, without secondary or higher structure. For example, the linker may be a flexible linker. In some embodiments, the amino acid sequence of the linker is selected from SEQ ID NO: 5 or 14.
In the polypeptide of the present invention, the 5'→ 3' exonuclease is located at the N-terminus and/or C-terminus of the CRISPR nuclease. In some embodiments, the 5'→ 3' exonuclease is located at the N-terminus of the CRISPR nuclease. In some embodiments, the 5'→ 3' exonuclease is located at the C-terminus of the CRISPR nuclease.
In some embodiments, the isolated fusion polypeptide further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the polypeptide should be of sufficient strength to drive accumulation of the polypeptide in the nucleus of the cell in an amount sufficient to perform its genome editing function. In general, the strength of nuclear localization activity is determined by the number and location of the NLSs in the polypeptide, the specific NLS or NLSs used, or a combination of these factors.
In some embodiments of the invention, the NLS of the polypeptide of the invention may be located at the N-terminus and/or the C-terminus. In some embodiments of the invention, the NLS of the polypeptide of the invention may be located between the 5'→ 3' exonuclease and the CRISPR nuclease. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the polypeptide comprises a combination of these, such as comprising one or more NLS at the N-terminus and one or more NLS at the C-terminus. When there is more than one NLS, each can be chosen to be independent of the other NLS.
In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the surface of the protein, but other types of NLS are also known. In some embodiments, the amino acid sequence of the NLS of the invention is selected from SEQ ID NO: 6, 7, 12 or 13.
In addition, depending on the desired DNA position to be edited, the isolated fusion polypeptides of the invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, and the like.
In some embodiments, the isolated fusion polypeptide comprises, from N-terminus to C-terminus, at least: the 5'→ 3' exonuclease, NLS, the CRISPR nuclease, and another NLS. In some embodiments, the isolated fusion polypeptide comprises, from N-terminus to C-terminus, at least: NLS, the 5'→ 3' exonuclease, the CRISPR nuclease, and another NLS.
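The two N-terminus to C-terminus arrangements just described can be written down as ordered lists, which makes the domain order easy to inspect or check. This Python sketch is purely illustrative; the constant and function names are hypothetical, and linkers and actual sequences are omitted.

```python
EXO = "5'->3' exonuclease"
NUC = "CRISPR nuclease"
NLS = "NLS"

# The two arrangements described in the text, listed from N- to C-terminus.
LAYOUT_1 = [EXO, NLS, NUC, NLS]
LAYOUT_2 = [NLS, EXO, NUC, NLS]


def is_fusion(layout):
    """A layout qualifies only if it contains both enzymatic components."""
    return EXO in layout and NUC in layout


def exonuclease_side(layout):
    """Report on which side of the CRISPR nuclease the exonuclease sits."""
    return "N-terminal" if layout.index(EXO) < layout.index(NUC) else "C-terminal"


def render(layout):
    """Render the layout as a simple N-to-C diagram."""
    return "N-" + "-".join(f"[{part}]" for part in layout) + "-C"


if __name__ == "__main__":
    for layout in (LAYOUT_1, LAYOUT_2):
        print(render(layout), exonuclease_side(layout))
```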
In some preferred embodiments, the isolated fusion polypeptide of the invention comprises SEQ ID NO: 1 or 10.
The invention also provides isolated polynucleotides encoding the fusion polypeptides of the invention. In some embodiments, the polynucleotide comprises SEQ ID NO: 2 or 11 or a degenerate variant thereof.
To obtain efficient expression, in some embodiments, the polynucleotide is codon optimized for the organism being edited, e.g., a plant.
Codon optimization refers to a method of modifying a nucleic acid sequence so as to enhance expression in a host cell of interest by replacing at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with a codon that is used more frequently or most frequently in the genes of the host cell, while maintaining the native amino acid sequence. Different species exhibit specific preferences for certain codons for particular amino acids. Codon bias (the difference in codon usage between organisms) is often correlated with the translation efficiency of messenger RNA (mRNA), which is believed to depend on the nature of the codons being translated and the availability of specific transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons most frequently used for peptide synthesis. Thus, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, e.g., from the Codon Usage Database available at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al, "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28: 292 (2000).
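The principle can be sketched in a few lines of Python: each codon is replaced by a preferred synonymous codon, and the encoded amino acid sequence is checked to be unchanged. The preference table below is a made-up placeholder covering only five amino acids; it is not real codon usage data for rice or any other organism, and the function names are hypothetical.

```python
# Partial codon table, limited to the amino acids used in this example.
CODON_TO_AA = {
    "AAA": "K", "AAG": "K",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "GAA": "E", "GAG": "E",
    "GAT": "D", "GAC": "D",
    "ATT": "I", "ATC": "I", "ATA": "I",
}

# Hypothetical preferred codons, for illustration only (not measured usage).
PREFERRED = {"K": "AAG", "L": "CTC", "E": "GAG", "D": "GAC", "I": "ATC"}


def translate(cds):
    """Translate a coding sequence; codons missing from the table become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TO_AA.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


def optimize(cds):
    """Swap each known codon for the preferred synonymous codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        # Codons outside the partial table are left unchanged.
        out.append(PREFERRED[aa] if aa in PREFERRED else codon)
    return "".join(out)


if __name__ == "__main__":
    native = "AAATTAGAAGATATA"  # made-up sequence encoding K-L-E-D-I
    optimized = optimize(native)
    print(native, "->", optimized)
    assert translate(native) == translate(optimized)
```

A real workflow would take codon frequencies for the target organism from a usage table and would usually also consider factors this sketch ignores, such as GC content or unwanted restriction sites.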
In some embodiments, the isolated fusion polypeptide, the coding sequence for the 5'→ 3' exonuclease and/or the coding sequence for the CRISPR nuclease of the invention are codon optimized for the organism being edited. In some embodiments, the isolated fusion polypeptide of the invention, the coding sequence for the 5'→ 3' exonuclease and/or the coding sequence for the CRISPR nuclease are codon optimized for rice (Oryza sativa).
Third, improved genome editing system
The present inventors have surprisingly found that fusion of T5 exonuclease with a CRISPR nuclease such as Cas9 or Cas12a can produce larger deletions at a particular target site with only one gRNA and greatly improve editing efficiency. More unexpectedly, no cytotoxicity was observed in cells transformed with the fusion polypeptide of T5 exonuclease and CRISPR nuclease, which makes the fusion polypeptide particularly suitable for genome editing of cells.
Thus, in a further aspect the invention also provides the use of an isolated fusion polypeptide of the invention for genome editing of a cell.
In another aspect, the present invention provides a genome editing system comprising at least one of the following i) to v):
i) the isolated fusion polypeptides and guide RNAs of the invention;
ii) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of the invention, and a guide RNA;
iii) an isolated fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a fusion polypeptide of the invention, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of the invention and a nucleotide sequence encoding a guide RNA.
As used herein, "genome editing system" refers to a combination of components required for genome editing of a genome within a cell. Wherein the individual components of the system, e.g., fusion polypeptides, guide RNAs, etc., may be present independently of each other or may be present in any combination as a composition.
In some embodiments, the guide RNA is an sgRNA. In some embodiments, the guide RNA is an sgRNA that is not paired, i.e., a single sgRNA is used rather than a pair of sgRNAs. Methods for constructing suitable sgRNAs from a given target sequence are known in the art. For example, see: Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947 (2014); Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688 (2013); Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68 (2014).
The design of target sequences that can be recognized and targeted by a complex of a CRISPR nuclease and a guide RNA is within the skill of one of ordinary skill in the art. Generally, for Cas9, the target sequence is a sequence complementary to a guide sequence of about 20 nucleotides contained in the guide RNA, and its 3' end is immediately adjacent to a protospacer adjacent motif (PAM), which is, for example, 5'-NGG. For Cas12a, by contrast, the target sequence generally needs a PAM, which may be, for example, 5'-TTTN, at its 5' end.
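As an illustration of the PAM rules above, the following Python sketch scans one strand of a DNA sequence for candidate target sites. It is a simplified example under stated assumptions: only the given strand is searched, the 23-nucleotide Cas12a target length is an assumed parameter not specified herein, and the function names and demonstration sequences are hypothetical.

```python
import re
from typing import List, Tuple


def find_cas9_targets(seq: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, target, PAM) for targets followed at the 3' end by 5'-NGG."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def find_cas12a_targets(seq: str, target_len: int = 23) -> List[Tuple[int, str, str]]:
    """Return (PAM start, target, PAM) for targets preceded at the 5' end by 5'-TTTN."""
    seq = seq.upper()
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % target_len)
    return [(m.start(), m.group(2), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical demonstration sequences (not taken from Table 1).
cas9_demo = "GATTACAGATTACAGATTAC" + "TGG"
cas12a_demo = "TTTG" + "GATTACAGATTACAGATTACAGA"

cas9_hits = find_cas9_targets(cas9_demo)
cas12a_hits = find_cas12a_targets(cas12a_demo)
print(cas9_hits)
print(cas12a_hits)
```

A practical design tool would also scan the reverse complement strand and filter candidates for specificity; those steps are omitted here for brevity.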
In some embodiments, the CRISPR system of the invention comprises at least one of ii) to v) above. In some embodiments, the nucleotide sequence encoding the fusion polypeptide of the invention and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression control sequence, preferably a plant expression control sequence, such as a promoter.
Examples of promoters that may be used in the present invention include, but are not limited to: the cauliflower mosaic virus 35S promoter (Odell et al. (1985) Nature 313: 810). Promoters useful in the present invention also include the commonly used tissue-specific promoters reviewed in Moore et al. (2006) Plant J. 45(4): 651-683.
In an exemplary embodiment, the construct of the invention comprises the maize Ubi-1 promoter.
Fourth, method for modifying a target sequence in a cell genome
In another aspect, the invention provides a method of modifying a target sequence in the genome of a cell, comprising introducing into the cell a genome editing system of the invention.
In some embodiments, the modification results in the deletion of one or more nucleotides, preferably a plurality of consecutive nucleotides, in the target sequence. In some embodiments, the deletion comprises 1-500 or even more contiguous nucleotides.
In some embodiments, the deletion is within the target sequence. In some embodiments, the modification does not include insertion and/or substitution mutations.
In another aspect, the invention also provides a method of producing a genetically modified cell comprising introducing into said cell a genome editing system of the invention.
In the present invention, the target sequence to be modified may be located anywhere in the genome, for example, within a functional gene such as a protein-encoding gene, or may be located, for example, in a gene expression regulatory region such as a promoter region or an enhancer region, thereby effecting a modification of the function of the gene or a modification of gene expression. Modifications in the cellular target sequence may be detected by T7EI, PCR/RE or sequencing methods. The genome editing system of the present invention is particularly suitable for modifying regulatory sequences such as promoters or non-coding sequences and the like.
In the method of the present invention, the genome editing system may be introduced into a cell by various methods well known to those skilled in the art.
Methods that can be used to introduce the genome editing system of the present invention into a cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (such as baculovirus, vaccinia, adenovirus and other viruses), biolistics, PEG-mediated transformation of protoplasts, Agrobacterium tumefaciens-mediated transformation.
The cells whose genome can be edited by the method of the present invention may be derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cow, cat; poultry such as chicken, duck, goose; plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the cell is a plant cell, such as a rice cell.
In some embodiments, the methods of the invention are performed in vitro. For example, the cell is an isolated cell. In other embodiments, the methods of the invention may also be performed in vivo. For example, the cell is a cell within an organism into which the system of the invention can be introduced in vivo, for example, by virus-mediated methods. In some embodiments, the cell is a germ cell. In some embodiments, the cell is a somatic cell.
In another aspect, the invention also provides a genetically modified organism comprising a genetically modified cell produced by the method of the invention.
Such organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cows, cats; poultry such as chicken, duck, goose; plants, including monocots and dicots, such as rice, maize, wheat, sorghum, barley, soybean, peanut, arabidopsis, and the like. Preferably, the organism is a plant, preferably rice.
Fifth, kit
In yet another aspect, a kit for use in the methods of the invention, comprising the genome editing system of the invention, and instructions for use, is also within the scope of the invention. The kit generally includes a label indicating the intended use and/or method of use of the kit contents. The term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
Examples
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying specific embodiments and drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Method
Plasmid construction
The T5 exonuclease coding sequence has been codon optimized for rice (Oryza sativa) and commercially synthesized (GenScript, Nanjing, China). The T5 coding sequence was fused in-frame to the 5' end of Cas9 or Cas12a by Gibson assembly to generate p163-T5exo-Cas9 or p163-T5exo-Cas12a, respectively. To construct a pH-T5exo-Cas9-sgRNA binary vector, the T5exo-Cas9 expression cassette was cloned into pHUE411 backbone (Xing et al, 2014). To construct the pCambia-T5exo-Cas12a-crRNA binary vector, T5exo-Cas12a and the crRNA expression cassette were cloned into the pCambia2300 backbone.
Transfection of protoplasts
Seedlings of yellow Japonica rice (Japonica rice) grown on medium were used for protoplast preparation. The isolation and transformation of protoplasts were performed as described previously (Shan et al, 2013; Wang et al, 2014). Plasmid DNA (10 μg of each construct) was delivered into protoplasts by PEG-mediated transfection, and the transfected protoplasts were then incubated at 28 °C. 48 hours after transfection, protoplasts were harvested to extract genomic DNA for restriction enzyme digestion or amplicon deep sequencing.
Agrobacterium-mediated transformation of rice
The binary vector was introduced into Agrobacterium tumefaciens strain AGL1 by electroporation. Agrobacterium-mediated transformation of the rice variety Nipponbare and regeneration of rice plants were performed as previously reported (Hiei et al, 1994). Rice calli edited by Cas9/T5exo-Cas9 were selected on hygromycin-containing medium (50 μg/ml), and calli edited by Cas12a/T5exo-Cas12a were selected on G418-containing medium (60 μg/ml).
Plant genomic DNA extraction and next generation sequencing of targeted amplicons
Genomic DNA was extracted from protoplasts and seedlings using the CTAB method (Murray et al, 1980) and then used as a template for PCR amplification. In the first round of PCR, the target region was amplified using site-specific primers. In the second round of PCR, forward and reverse barcodes were added to the ends of the PCR products for library construction. Equal amounts of PCR products were pooled, and the samples were sequenced commercially by paired-end sequencing on the Illumina NextSeq 500 platform (GENEWIZ, Suzhou, China). The sequencing reads were examined for indels at the sgRNA target sites. Amplicon sequencing was repeated three times for each target site using genomic DNA extracted from three independent protoplast samples.
Off-target detection
The inventors examined the potential off-target effect of Cas or T5exo-Cas in OsXa13 and OsPDS rice mutants, respectively. Potential off-target sites were predicted in the Nipponbare genome by the online tool Cas-OFFinder (Bae et al, 2014). The four potential off-target sites in the OsXa13 rice mutant have 3-4 nucleotide mismatches. Five potential off-target sites with 4-5 nucleotide mismatches in the OsPDS mutants were selected. Locus specific primers flanking these off-target sites were designed. Amplicons of potential off-target sites (approximately 700 to 1000bp) were sequenced by Sanger sequencing.
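The selection of candidate off-target sites by mismatch count can be sketched as follows. This is not the Cas-OFFinder implementation; it is a minimal Python illustration of counting nucleotide mismatches between an intended target and candidate sites of equal length. The sequences and function names are hypothetical.

```python
from typing import List


def count_mismatches(target: str, candidate: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(target.upper(), candidate.upper()))


def filter_sites(target: str, candidates: List[str], min_mm: int, max_mm: int) -> List[str]:
    """Keep candidate sites whose mismatch count lies in [min_mm, max_mm]."""
    return [c for c in candidates if min_mm <= count_mismatches(target, c) <= max_mm]


# Hypothetical sequences for illustration only (not the sites in Table 2).
on_target = "GATTACAGATTACAGATTAC"
candidates = [
    "CTATACAGATTACAGATTAC",  # 3 mismatches
    "GATTACAGATTACAGATTAG",  # 1 mismatch
    "CTAATCAGATTACAGATTAC",  # 5 mismatches
]

selected = filter_sites(on_target, candidates, min_mm=3, max_mm=4)
print(selected)
```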
Pathogen inoculation and virulence determination
The Xoo strain PXO99 was inoculated onto the two most recently fully expanded leaves of rice seedlings at the six-leaf stage by the leaf-cutting method, as described previously (Yang et al, 2013). Disease symptoms were scored by measuring lesion length.
Example 1: fusion of T5 exonuclease with Cas9 alters the indel character of genome editing
To generate larger genome deletions in higher plants using the CRISPR/Cas system, the inventors selected T5, a well-studied exonuclease that degrades DNA in the 5'→3' direction (Kaliman et al, 1986), and fused it in the same reading frame to the N-terminus of Cas9 (fig. 1a). To test whether the T5exo-Cas9 fusion protein can alter the indel signature, the inventors transfected the T5exo-Cas9 and Cas9 plasmids, respectively, into rice protoplasts together with the sgRNA OsMKK5-T1 plasmid. Genomic DNA extracted from protoplasts 48 h post-transfection was first digested with Hind III to reduce unedited DNA and was then used as a template for targeted PCR amplification. To identify indels, the purified PCR products were cloned and sequenced by Sanger sequencing. The resulting indel signature shows that, relative to Cas9, the percentage of deletions among the indels generated by the T5exo-Cas9 fusion is greatly increased, while the proportion of insertions is reduced (fig. 1b). The inventors further found that Cas9-induced deletions were predominantly 1-2 bp in size (fig. 1d), consistent with previous reports (Zhang et al, 2014). In contrast, the deletions induced by the T5exo-Cas9 fusion were variable and larger, up to 446 bp (fig. 1c). These results indicate that fusion of T5 exonuclease with Cas9 promotes deletions during genome editing.
Example 2: T5exo-Cas9 fusions induce a higher frequency and larger size of genomic deletions
To more thoroughly examine the indels generated by the T5exo-Cas9 fusion, the inventors designed four sgRNAs targeting different genomic loci in rice (OsMPK16-T1, OsCDC48-T1, OsALS-T1, OsXa13-T1) (Table 1). These sgRNAs were transformed into rice protoplasts with the T5exo-Cas9 fusion or Cas9, respectively. Indels generated at the four target sites were analyzed by targeted deep sequencing. Consistently, the T5exo-Cas9 fusion induced significantly more deletions than Cas9. For OsMPK16-T1, the deletion rate increased from 20.1 to 86.5 (4.3-fold); for OsCDC48-T1, from 71.8 to 95.6 (1.3-fold); for OsALS-T1, from 76.4 to 97.4 (1.3-fold); and for OsXa13-T1, from 22.8 to 90.6 (4.0-fold) (FIG. 2a). These results demonstrate that T5exo-Cas9 induces a higher deletion frequency than Cas9 during genome editing.
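The fold increases reported above follow directly from the ratio of the two deletion rates at each site. The short Python sketch below reproduces that arithmetic from the values given in this example; the variable and function names are illustrative.

```python
# Deletion rates reported in Example 2: (Cas9, T5exo-Cas9) for each target site.
deletion_rates = {
    "OsMPK16-T1": (20.1, 86.5),
    "OsCDC48-T1": (71.8, 95.6),
    "OsALS-T1": (76.4, 97.4),
    "OsXa13-T1": (22.8, 90.6),
}


def fold_change(cas9_rate: float, fusion_rate: float) -> float:
    """Ratio of the fusion deletion rate to the Cas9 deletion rate, to one decimal."""
    return round(fusion_rate / cas9_rate, 1)


folds = {site: fold_change(*rates) for site, rates in deletion_rates.items()}
for site, fold in folds.items():
    print(f"{site}: {fold}-fold")
```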
Table 1: summary of sgRNA target sites and their corresponding oligonucleotides for vector construction.
Figure BDA0002332882140000121
Figure BDA0002332882140000131
PAM motif in each target sequence is shown in bold
Next, the inventors analyzed the size of all deletions made by T5exo-Cas9 fusion and Cas9 at the four target sites. For Cas9, most deletions were less than 10bp (96.5% -100%), with only a very small fraction (0-3.5%) greater than 10 bp. The deletion patterns of OsXa13-T1 and OsALS-T1 are mainly around 1-3bp (FIG. 2 b). The T5exo-Cas9 fusion produced larger deletions, with deletions greater than 10bp ranging from about 16.1% to 35.8%, with an average deletion size of 33-44bp (FIG. 2 b). Interestingly, the genome editing efficiency of Cas9 appeared to be enhanced by the T5 fusion (fig. 2 c).
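The deletion size statistics described above (the fraction of deletions greater than 10 bp and the average deletion size) can be computed from a list of per-read deletion lengths. The following Python sketch uses a small hypothetical list of deletion sizes chosen only for illustration; it is not the sequencing data underlying FIG. 2b.

```python
from typing import List, Tuple


def summarize_deletions(sizes_bp: List[int], threshold_bp: int = 10) -> Tuple[float, float]:
    """Return (percent of deletions larger than threshold, mean deletion size in bp)."""
    if not sizes_bp:
        raise ValueError("at least one deletion size is required")
    n_large = sum(1 for size in sizes_bp if size > threshold_bp)
    percent_large = 100.0 * n_large / len(sizes_bp)
    mean_size = sum(sizes_bp) / len(sizes_bp)
    return percent_large, mean_size


# Hypothetical deletion sizes in bp (illustrative values, not measured data).
example_sizes = [1, 2, 3, 12, 40, 2, 1, 55]
percent_large, mean_size = summarize_deletions(example_sizes)
print(f"{percent_large:.1f}% of deletions > 10 bp; mean size {mean_size:.1f} bp")
```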
Example 3: T5exo-Cas9 fusions produce larger genomic deletions in transgenic rice plants
To demonstrate that the T5exo-Cas9 fusion can be used to generate rice mutant plants, the inventors performed Agrobacterium-mediated transformation of rice with a binary vector expressing OsXa13-T1, which targets the UPT_PthXo1 box (upregulated by the transcription activator-like effector PthXo1) of the OsXa13 gene promoter, together with T5exo-Cas9 or Cas9, respectively. For Cas9, the inventors obtained 42 T0 transformants, 36 of which were edited, an editing efficiency of 85.7%. Of the edited lines, 12 were single allele homozygous mutants and 24 were double allele mutants (FIG. 3a). 82% of the indel patterns were 1 bp insertions (FIG. 3b). For T5exo-Cas9, 46 T0 transformants were obtained, 42 of which were edited, an editing efficiency of 91.3%. Of the edited lines, 3 lines were single allele homozygous mutants and 35 lines were double allele mutants (FIG. 3a). The frequency of deletions was 72%, and 35% of the deletion mutants had deletions greater than 3 bp at the target site (FIG. 3b). Overall, in transgenic rice plants, the T5exo-Cas9 fusion induced a higher frequency of genome deletions and larger genome deletions than Cas9, consistent with the results observed in rice protoplasts (fig. 2). In addition, fusion of T5 with Cas9 appears to enhance genome editing efficiency in transgenic rice plants, similar to what was seen in protoplasts.
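The editing efficiencies stated above are the percentage of edited lines among all T0 transformants. A minimal Python sketch of this calculation, using the counts reported in this example (the function name is illustrative), is:

```python
def editing_efficiency(edited: int, total: int) -> float:
    """Percentage of edited lines among all T0 transformants, to one decimal."""
    if total <= 0:
        raise ValueError("total number of transformants must be positive")
    return round(100.0 * edited / total, 1)


# Counts reported in Example 3.
cas9_efficiency = editing_efficiency(edited=36, total=42)
fusion_efficiency = editing_efficiency(edited=42, total=46)
print(f"Cas9: {cas9_efficiency}%  T5exo-Cas9: {fusion_efficiency}%")
```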
The UPT_PthXo1 box (25 bp) in the OsXa13 gene promoter is the only Xoo-responsive cis-acting element (Yuan et al, 2011). Naturally occurring deletions in the UPT_PthXo1 box of OsXa13 result in recessive resistance to Xanthomonas oryzae pv. oryzae (Xoo), including the PXO99 strain (Chu et al, 2006). Thus, the inventors examined the PXO99 resistance phenotype of various homozygous deletion mutants generated by T5exo-Cas9 and Cas9 using the leaf-cutting method. The inventors found that the average lesion length on wild-type leaves was about 13 cm. Lesions were approximately 7-8 cm long on mutants with a 1 bp insertion or a deletion of no more than 2 bp on the allele. The -4/-12 bp biallelic mutant showed the strongest resistance, with a lesion length of only about 3 cm (FIGS. 3c and 3d). This result suggests that the T5exo-Cas9 fusion may facilitate loss-of-function mutation of cis-regulatory elements.
The inventors further examined the effect of the T5 fusion on Cas9 off-target activity by measuring the frequency of indels at putative off-target sites. For OsXa13-T1, four potential off-target sites with three to four mismatches were identified using the online tool Cas-OFFinder (Bae et al, 2014) (Table 2). The inventors amplified DNA fragments covering these potential off-target sites from mutants generated by T5exo-Cas9 and Cas9. Sequencing of the targeted amplicons detected no mutations at these potential off-target sites in the mutants generated by either Cas9 or T5exo-Cas9, indicating that fusion of the T5 exonuclease did not alter the off-target activity of Cas9.
TABLE 2 potential off-target sites in Rice
Figure BDA0002332882140000141
PAM sequences in each target sequence are shown in bold. Positions mismatched with the intended target are underlined.
Example 4: fusion of T5 exonuclease with Cas12a increases deletion frequency and enlarges deletion size
To test whether the T5 fusion is suitable for other Cas nucleases, the inventors also fused T5 exonuclease in frame to the N-terminus of Cas12a using an XTEN linker (Schellenberger et al, 2009). The fusion gene was driven by the maize Ubiquitin-1 promoter (Ubi-1) (FIG. 4a). Three sgRNAs (OsBADH2-T1, OsEPSPs-T1 and OsPDS-T1) targeting different genomic sites of rice were designed (Table 1). Each sgRNA was transformed into rice protoplasts with T5exo-Cas12a or Cas12a, and the editing of each gene was assessed by targeted amplicon deep sequencing. Similar to what was observed with T5exo-Cas9, T5exo-Cas12a also induced a higher deletion frequency relative to Cas12a, with a dramatic decrease in insertion rate: from 6.2 to 0.2 (31.0-fold) for OsBADH2-T1; from 11.2 to 1.1 (10.2-fold) for OsEPSPs-T1; and from 7.7 to 1.4 (5.5-fold) for OsPDS-T1 (FIG. 4b). The inventors then analyzed the size of all deletions made by the T5exo-Cas12a fusion and Cas12a at the three target sites. For Cas12a, the deletions were mostly less than 15 bp at all three target sites and were concentrated around 6-10 bp. As expected, the T5exo-Cas12a fusion induced larger deletions at each site, and the proportion of deletions greater than 15 bp induced by the T5exo-Cas12a fusion at these target sites was on average 8.6-fold higher than for Cas12a (fig. 4c). Taken together, these results support that fusion of T5 exonuclease with Cas12a increases the frequency and size of genome deletions at the guide RNA target locus. In addition, the genome editing efficiency of the T5exo-Cas12a fusion was higher (1.34-1.47 fold) than that of Cas12a at all three target sites (fig. 4d).
Example 5: T5exo-Cas12a fusions produce larger genomic deletions in transgenic rice plants
The present inventors also performed Agrobacterium-mediated transformation of rice with binary vectors expressing T5exo-Cas12a or Cas12a and guide RNAs targeting the OsPDS gene (Table 1). For Cas12a, 128 T0 transformants were obtained, 21 of which were edited, an editing efficiency of 16.4%. For the OsPDS site, all edits were deletions, most of which were 1-15 bp, and only 11.5% of which were larger than 15 bp (FIG. 5a). For T5exo-Cas12a, the inventors obtained 150 T0 transformants with a mutation frequency of 28.7%, which is about 1.8 times that of Cas12a. Of the deletions generated by T5exo-Cas12a, 46.8% were greater than 15 bp (fig. 5a). Of these, 11.3% of the deletions were greater than 30 bp (FIG. 5b). This result supports that the T5 fusion enhances the genome deletion and genome editing efficiency of Cas12a in transgenic rice plants.
The inventors also examined the off-target effect of T5exo-Cas12a with the sgRNA targeting the OsPDS gene. Five potential off-target sites with 5 to 6 mismatches were identified using the online tool Cas-OFFinder (Table 2). DNA fragments covering the potential off-target sites were amplified from mutants generated with T5exo-Cas12a and Cas12a. Sequencing of the targeted amplicons detected no mutations at these potential off-target sites in the mutants generated by either T5exo-Cas12a or Cas12a, indicating that fusion of the T5 exonuclease did not alter the off-target activity of Cas12a.
Provided herein is a novel approach by fusing Cas9 or Cas12a with a T5 exonuclease that can generate larger genomic deletions with one guide RNA at a given target. As shown in experiments in rice protoplasts and seedlings, both the frequency and size of deletions caused by T5exo-Cas fusions at the target genomic site were increased. In addition, the genome editing efficiency of Cas9 and Cas12a was improved by fusion of T5 exonuclease. The T5exo-Cas fusion extends the CRISPR toolbox and facilitates the knock-out of regulatory and non-coding DNA. More broadly, the results of the present invention suggest a general strategy for creating larger deletions for other Cas nucleases.
Without being bound by any theory, it is speculated that T5 exonuclease degrades the 5' ends of DSBs generated by Cas9 or Cas12a, leading to an increased deletion frequency and increased deletion size when the DSBs are repaired by NHEJ. The different deletion sizes at one genomic site may be due to different durations of binding of the T5exo-Cas9 or T5exo-Cas12a fusion protein to DNA, which determines the activity of T5 exonuclease at the DNA end. Interestingly, the T5exo-Cas12a fusion produced larger deletions than T5exo-Cas9. This may be because Cas12a produces sticky ends whereas Cas9 produces blunt ends. T5 exonuclease is reported to bind more strongly to DNA duplexes with 5'-overhangs than to DNA duplexes with blunt ends (Garforth et al, 1997).
The larger genomic deletions produced by the T5exo-Cas fusions will greatly facilitate functional analysis of regulatory and non-coding sequences (such as lncRNAs, miRNAs, and cis-elements), as small indels in these regions are highly likely not to produce a loss-of-function phenotype. In this study, the present inventors also observed that small indels (+1/-2) generated in the UPT_PthXo1 box of the OsXa13 promoter failed to knock out its function, whereas larger deletions (-4/-12) induced by the T5exo-Cas9 fusion disrupted the function of the UPT_PthXo1 box (fig. 3e). Recently, paired sgRNAs were used to generate rice mutants with a 149 bp deletion covering the UPT_PthXo1 box, which show strong Xoo resistance without affecting fertility (Li et al, 2019). However, the 149 bp deletion is much larger than the UPT_PthXo1 box (25 bp), which may affect other regulatory sequences in this region. In contrast, most of the deletions generated by the T5exo-Cas fusions were within the UPT_PthXo1 box, which suggests that the T5exo-Cas fusion provides a more precise strategy to knock out short regulatory and non-coding sequences. Furthermore, it is not easy to design two sgRNAs targeting such short sequences, so the present invention provides a useful new tool that requires only one sgRNA.
One concern with the T5 fusion was the potential toxicity of T5 exonuclease when expressed in foreign cells, which was originally reported in bacteria (Kaliman et al, 1986). However, no visible phenotype or growth defect was observed in transgenic rice expressing T5exo-Cas9 or T5exo-Cas12a, indicating that the T5 fusion did not affect plant growth and development.
In summary, the inventors developed a new, efficient strategy, based on fusing T5 exonuclease to Cas9 or Cas12a, that can generate larger deletions with one guide RNA.
Sequence listing
>SEQ ID NO:1 T5exo-Cas9
Figure BDA0002332882140000161
>SEQ ID NO:2 T5exo-Cas9
T5exo-Linker-NLS1-Cas9-NLS2
the NLS,Cas9,T5 exonuclease and Linker are highlighted in gray,purple,blue and orange respectively.
Figure BDA0002332882140000171
Figure BDA0002332882140000181
>SEQ ID NO:3 T5 exonuclease
Figure BDA0002332882140000182
Figure BDA0002332882140000191
>SEQ ID NO:4 T5 exonuclease
Figure BDA0002332882140000192
>SEQ ID NO:5 Linker
Figure BDA0002332882140000193
>SEQ ID NO:6 NLS1
Figure BDA0002332882140000194
>SEQ ID NO:7 NLS2
Figure BDA0002332882140000195
>SEQ ID NO:8 Cas9
Figure BDA0002332882140000196
Figure BDA0002332882140000201
>SEQ ID NO:9 Cas9
Figure BDA0002332882140000202
Figure BDA0002332882140000211
>SEQ ID NO:10 T5exo-Cas12a
Figure BDA0002332882140000212
Figure BDA0002332882140000221
>SEQ ID NO:11 T5exo-Cas12a
NLS3-T5exo-XTEN linker-Cas12a-NLS4
the NLS,Cas12a,T5 exonuclease and XTEN linker are highlighted in gray,green,blue and yellow respectively.
Figure BDA0002332882140000222
Figure BDA0002332882140000231
>SEQ ID NO:12 NLS3
Figure BDA0002332882140000232
>SEQ ID NO:13 NLS3
Figure BDA0002332882140000233
>SEQ ID NO:14 XTEN linker
Figure BDA0002332882140000241
>SEQ ID NO:15 Cas12a
Figure BDA0002332882140000242
>SEQ ID NO:16 Cas12a
Figure BDA0002332882140000243
Figure BDA0002332882140000251
Sequence listing
<110> Institute of Microbiology, Chinese Academy of Sciences
<120> improved genome editing system
<130> I2019TC3889CB
<160> 16
<170> PatentIn version 3.5
<210> 1
<211> 1709
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 1
Met Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met
1 5 10 15
Ala Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe
20 25 30
Arg Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val
35 40 45
Ser Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile
50 55 60
Val Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro
65 70 75 80
Glu Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu
85 90 95
Glu Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe
100 105 110
Glu Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu
115 120 125
Ala Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr
130 135 140
Asp His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu
145 150 155 160
Thr Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His
165 170 175
Leu Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe
180 185 190
Ile Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly
195 200 205
Val Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe
210 215 220
Gly Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln
225 230 235 240
Lys Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn
245 250 255
Leu Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala
260 265 270
Val Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile
275 280 285
Ala Glu Gln Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly
290 295 300
Gly Ser Gly Ser Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His
305 310 315 320
Gly Val Pro Ala Ala Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile
325 330 335
Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val
340 345 350
Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile
355 360 365
Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala
370 375 380
Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg
385 390 395 400
Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala
405 410 415
Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val
420 425 430
Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val
435 440 445
Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg
450 455 460
Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr
465 470 475 480
Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu
485 490 495
Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln
500 505 510
Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala
515 520 525
Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser
530 535 540
Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn
545 550 555 560
Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn
565 570 575
Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser
580 585 590
Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly
595 600 605
Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala
610 615 620
Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala
625 630 635 640
Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp
645 650 655
Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr
660 665 670
Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile
675 680 685
Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile
690 695 700
Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg
705 710 715 720
Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
725 730 735
His Gln Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu
740 745 750
Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile
755 760 765
Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn
770 775 780
Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro
785 790 795 800
Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe
805 810 815
Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val
820 825 830
Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu
835 840 845
Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe
850 855 860
Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr
865 870 875 880
Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys
885 890 895
Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe
900 905 910
Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp
915 920 925
Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile
930 935 940
Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg
945 950 955 960
Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu
965 970 975
Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile
980 985 990
Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu
995 1000 1005
Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His
1010 1015 1020
Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val
1025 1030 1035
Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala
1040 1045 1050
Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
1055 1060 1065
Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn
1070 1075 1080
Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly
1085 1090 1095
Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile
1100 1105 1110
Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val Glu Asn
1115 1120 1125
Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn
1130 1135 1140
Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu
1145 1150 1155
Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
1160 1165 1170
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn
1175 1180 1185
Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys
1190 1195 1200
Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr
1205 1210 1215
Gln Arg Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu
1220 1225 1230
Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu
1235 1240 1245
Thr Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg
1250 1255 1260
Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val
1265 1270 1275
Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys
1280 1285 1290
Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His
1295 1300 1305
Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile
1310 1315 1320
Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp Tyr
1325 1330 1335
Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu
1340 1345 1350
Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile Met
1355 1360 1365
Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg
1370 1375 1380
Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile Val
1385 1390 1395
Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
1400 1405 1410
Met Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly
1415 1420 1425
Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys
1430 1435 1440
Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly
1445 1450 1455
Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys
1460 1465 1470
Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu
1475 1480 1485
Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn Pro
1490 1495 1500
Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp
1505 1510 1515
Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu Asn
1520 1525 1530
Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly
1535 1540 1545
Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu
1550 1555 1560
Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu
1565 1570 1575
Gln Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp Glu
1580 1585 1590
Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala
1595 1600 1605
Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg
1610 1615 1620
Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe
1625 1630 1635
Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp
1640 1645 1650
Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val Leu
1655 1660 1665
Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr
1670 1675 1680
Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp Lys Arg Pro Ala Ala
1685 1690 1695
Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1700 1705
<210> 2
<211> 5130
<212> DNA
<213> ARTIFICIAL SEQUENCE
<220>
<223> T5exo-Linker-NLS1-Cas9-NLS2
<400> 2
atgtcaaagt cttggggcaa gttcatcgag gaggaggagg ccgagatggc gtcaaggcgc 60
aacctcatga ttgtcgacgg caccaatctg ggcttccggt tcaagcacaa caattctaag 120
aagcctttcg cctccagcta cgtgtccaca atccagagcc tcgccaagtc ctacagcgcg 180
cgcaccacaa ttgtgctggg cgacaagggc aagtcagtct tccggctgga gcatctgccg 240
gagtacaagg gcaacaggga tgagaagtac gcacagagga ccgaggagga gaaggcactc 300
gatgagcagt tcttcgagta cctcaaggac gccttcgagc tgtgcaagac cacattccca 360
accttcacaa tcaggggagt ggaggcagac gatatggcag cgtacatcgt caagctcatt 420
ggccacctgt acgatcatgt gtggctcatt tccacagacg gcgattggga caccctcctg 480
acagacaagg tctcacggtt ctctttcacc acacggaggg agtaccacct gagggatatg 540
tacgagcacc ataacgtgga cgatgtcgag cagttcatca gcctcaaggc cattatgggc 600
gatctgggcg acaatatcag gggagtcgag ggaattggag caaagagggg ctacaacatc 660
attcgggagt tcggcaatgt gctcgatatc attgaccagc tcccgctgcc aggcaagcag 720
aagtacatcc agaacctcaa tgcgtccgag gagctcctgt tccgcaatct catcctggtg 780
gatctgccga cctactgcgt cgacgcaatt gcagcagtgg gacaggatgt cctcgacaag 840
ttcacaaagg atatcctgga gattgcggag cagggtggag gcggaagtgg aggtggcggg 900
tcagggggtg gcggatctgg atccatggcc cctaagaaga agagaaaggt cggtattcac 960
ggcgttcctg cggcgatgga caagaagtat agtattggtc tggacattgg gacgaattcc 1020
gttggctggg ccgtgatcac cgatgagtac aaggtccctt ccaagaagtt taaggttctg 1080
gggaacaccg atcggcacag catcaagaag aatctcattg gagccctcct gttcgactca 1140
ggcgagaccg ccgaagcaac aaggctcaag agaaccgcaa ggagacggta tacaagaagg 1200
aagaatagga tctgctacct gcaggagatt ttcagcaacg aaatggcgaa ggtggacgat 1260
tcgttctttc atagattgga ggagagtttc ctcgtcgagg aagataagaa gcacgagagg 1320
catcctatct ttggcaacat tgtcgacgag gttgcctatc acgaaaagta ccccacaatc 1380
tatcatctgc ggaagaagct tgtggactcg actgataagg cggaccttag attgatctac 1440
ctcgctctgg cacacatgat taagttcagg ggccattttc tgatcgaggg ggatcttaac 1500
ccggacaata gcgatgtgga caagttgttc atccagctcg tccaaaccta caatcagctc 1560
tttgaggaaa acccaattaa tgcttcaggc gtcgacgcca aggcgatcct gtctgcacgc 1620
ctttcaaagt ctcgccggct tgagaacttg atcgctcaac tcccgggcga aaagaagaac 1680
ggcttgttcg ggaatctcat tgcactttcg ttggggctca caccaaactt caagagtaat 1740
tttgatctcg ctgaggacgc aaagctgcag ctttccaagg acacttatga cgatgacctg 1800
gataaccttt tggcccaaat cggcgatcag tacgcggact tgttcctcgc cgcgaagaat 1860
ttgtcggacg cgatcctcct gagtgatatt ctccgcgtga acaccgagat tacaaaggcc 1920
ccgctctcgg cgagtatgat caagcgctat gacgagcacc atcaggatct gacccttttg 1980
aaggctttgg tccggcagca actcccagag aagtacaagg aaatcttctt tgatcaatcc 2040
aagaacggct acgctggtta tattgacggc ggggcatcgc aggaggaatt ctacaagttt 2100
atcaagccaa ttctggagaa gatggatggc acagaggaac tcctggtgaa gctcaatagg 2160
gaggaccttt tgcggaagca aagaactttc gataacggca gcatccctca ccagattcat 2220
ctcggggagc tgcacgccat cctgagaagg caggaagact tctacccctt tcttaaggat 2280
aaccgggaga agatcgaaaa gattctgacg ttcagaattc cgtactatgt cggaccactc 2340
gcccggggta attccagatt tgcgtggatg accagaaaga gcgaggaaac catcacacct 2400
tggaacttcg aggaagtggt cgataagggc gcttccgcac agagcttcat tgagcgcatg 2460
acaaattttg acaagaacct gcctaatgag aaggtccttc ccaagcattc cctcctgtac 2520
gagtatttca ctgtttataa cgaactcacg aaggtgaagt atgtgaccga gggaatgcgc 2580
aagcccgcct tcctgagcgg cgagcaaaag aaggcgatcg tggacctttt gtttaagacc 2640
aatcggaagg tcacagttaa gcagctcaag gaggactact tcaagaagat tgaatgcttc 2700
gattccgttg agatcagcgg cgtggaagac aggtttaacg cgtcactggg gacttaccac 2760
gatctcctga agatcattaa ggataaggac ttcttggaca acgaggaaaa tgaggatatc 2820
ctcgaagaca ttgtcctgac tcttacgttg tttgaggata gggaaatgat cgaggaacgc 2880
ttgaagacgt atgcccatct cttcgatgac aaggttatga agcagctcaa gagaagaaga 2940
tacaccggat ggggaaggct gtcccgcaag cttatcaatg gcattagaga caagcaatca 3000
gggaagacaa tccttgactt tttgaagtct gatggcttcg cgaacaggaa ttttatgcag 3060
ctgattcacg atgactcact tactttcaag gaggatatcc agaaggctca agtgtcggga 3120
caaggtgaca gtctgcacga gcatatcgcc aaccttgcgg gatctcctgc aatcaagaag 3180
ggtattctgc agacagtcaa ggttgtggat gagcttgtga aggtcatggg acggcataag 3240
cccgagaaca tcgttattga gatggccaga gaaaatcaga ccacacaaaa gggtcagaag 3300
aactcgaggg agcgcatgaa gcgcatcgag gaaggcatta aggagctggg gagtcagatc 3360
cttaaggagc acccggtgga aaacacgcag ttgcaaaatg agaagctcta tctgtactat 3420
ctgcaaaatg gcagggatat gtatgtggac caggagttgg atattaaccg cctctcggat 3480
tacgacgtcg atcatatcgt tcctcagtcc ttccttaagg atgacagcat tgacaataag 3540
gttctcacca ggtccgacaa gaaccgcggg aagtccgata atgtgcccag cgaggaagtc 3600
gttaagaaga tgaagaacta ctggaggcaa cttttgaatg ccaagttgat cacacagagg 3660
aagtttgata acctcactaa ggccgagcgc ggaggtctca gcgaactgga caaggcgggc 3720
ttcattaagc ggcaactggt tgagactaga cagatcacga agcacgtggc gcagattctc 3780
gattcacgca tgaacacgaa gtacgatgag aatgacaagc tgatccggga agtgaaggtc 3840
atcaccttga agtcaaagct cgtttctgac ttcaggaagg atttccaatt ttataaggtg 3900
cgcgagatca acaattatca ccatgctcat gacgcatacc tcaacgctgt ggtcggaaca 3960
gcattgatta agaagtaccc gaagctcgag tccgaattcg tgtacggtga ctataaggtt 4020
tacgatgtgc gcaagatgat cgccaagtca gagcaggaaa ttggcaaggc cactgcgaag 4080
tatttctttt actctaacat tatgaatttc tttaagactg agatcacgct ggctaatggc 4140
gaaatccgga agagaccact tattgagacc aacggcgaga caggggaaat cgtgtgggac 4200
aaggggaggg atttcgccac agtccgcaag gttctctcta tgcctcaagt gaatattgtc 4260
aagaagactg aagtccagac gggcgggttc tcaaaggaat ctattctgcc caagcggaac 4320
tcggataagc ttatcgccag aaagaaggac tgggacccga agaagtatgg aggtttcgac 4380
tcaccaacgg tggcttactc tgtcctggtt gtggcaaagg tggagaaggg aaagtcaaag 4440
aagctcaagt ctgtcaagga gctcctgggt atcaccatta tggagaggtc cagcttcgaa 4500
aagaatccga tcgattttct cgaggcgaag ggatataagg aagtgaagaa ggacctgatc 4560
attaagcttc caaagtacag tcttttcgag ttggaaaacg gcaggaagcg catgttggct 4620
tccgcaggag agctccagaa gggtaacgag cttgctttgc cgtccaagta tgtgaacttc 4680
ctctatctgg catcccacta cgagaagctc aagggcagcc cagaggataa cgaacagaag 4740
caactgtttg tggagcaaca caagcattat cttgacgaga tcattgaaca gatttcggag 4800
ttcagtaagc gcgtcatcct cgccgacgcg aatttggata aggttctctc agcctacaac 4860
aagcaccggg acaagcctat cagagagcag gcggaaaata tcattcatct cttcaccctg 4920
acaaaccttg gggctcccgc tgcattcaag tattttgaca ctacgattga tcggaagaga 4980
tacacttcta cgaaggaggt gctggatgca acccttatcc accaatcgat tactggcctc 5040
tacgagacgc ggatcgactt gagtcagctc gggggggata agagaccagc ggcaaccaag 5100
aaggcaggac aagcgaagaa gaagaagtag 5130
<210> 3
<211> 290
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 3
Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met Ala
1 5 10 15
Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe Arg
20 25 30
Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val Ser
35 40 45
Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile Val
50 55 60
Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro Glu
65 70 75 80
Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu Glu
85 90 95
Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe Glu
100 105 110
Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu Ala
115 120 125
Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr Asp
130 135 140
His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu Thr
145 150 155 160
Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His Leu
165 170 175
Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe Ile
180 185 190
Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly Val
195 200 205
Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe Gly
210 215 220
Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln Lys
225 230 235 240
Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn Leu
245 250 255
Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala Val
260 265 270
Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile Ala
275 280 285
Glu Gln
290
<210> 4
<211> 870
<212> DNA
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 4
tcaaagtctt ggggcaagtt catcgaggag gaggaggccg agatggcgtc aaggcgcaac 60
ctcatgattg tcgacggcac caatctgggc ttccggttca agcacaacaa ttctaagaag 120
cctttcgcct ccagctacgt gtccacaatc cagagcctcg ccaagtccta cagcgcgcgc 180
accacaattg tgctgggcga caagggcaag tcagtcttcc ggctggagca tctgccggag 240
tacaagggca acagggatga gaagtacgca cagaggaccg aggaggagaa ggcactcgat 300
gagcagttct tcgagtacct caaggacgcc ttcgagctgt gcaagaccac attcccaacc 360
ttcacaatca ggggagtgga ggcagacgat atggcagcgt acatcgtcaa gctcattggc 420
cacctgtacg atcatgtgtg gctcatttcc acagacggcg attgggacac cctcctgaca 480
gacaaggtct cacggttctc tttcaccaca cggagggagt accacctgag ggatatgtac 540
gagcaccata acgtggacga tgtcgagcag ttcatcagcc tcaaggccat tatgggcgat 600
ctgggcgaca atatcagggg agtcgaggga attggagcaa agaggggcta caacatcatt 660
cgggagttcg gcaatgtgct cgatatcatt gaccagctcc cgctgccagg caagcagaag 720
tacatccaga acctcaatgc gtccgaggag ctcctgttcc gcaatctcat cctggtggat 780
ctgccgacct actgcgtcga cgcaattgca gcagtgggac aggatgtcct cgacaagttc 840
acaaaggata tcctggagat tgcggagcag 870
<210> 5
<211> 15
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 5
Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser
1 5 10 15
<210> 6
<211> 17
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 6
Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala
1 5 10 15
Ala
<210> 7
<211> 16
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 7
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1 5 10 15
<210> 8
<211> 1368
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 8
Met Asp Lys Lys Tyr Ser Ile Gly Leu Asp Ile Gly Thr Asn Ser Val
1 5 10 15
Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe
20 25 30
Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile
35 40 45
Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu
50 55 60
Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys
65 70 75 80
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp Ser
85 90 95
Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys
100 105 110
His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr
115 120 125
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val Asp
130 135 140
Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
145 150 155 160
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn Pro
165 170 175
Asp Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
180 185 190
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala
195 200 205
Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu Asn
210 215 220
Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
225 230 235 240
Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe
245 250 255
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
260 265 270
Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp
275 280 285
Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp
290 295 300
Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser
305 310 315 320
Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys
325 330 335
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe
340 345 350
Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser
355 360 365
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp
370 375 380
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu Arg
385 390 395 400
Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu
405 410 415
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe
420 425 430
Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile
435 440 445
Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
450 455 460
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu
465 470 475 480
Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr
485 490 495
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser
500 505 510
Leu Leu Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val Lys
515 520 525
Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln
530 535 540
Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
545 550 555 560
Val Lys Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp
565 570 575
Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly
580 585 590
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
595 600 605
Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr Leu Thr
610 615 620
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala
625 630 635 640
His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg Tyr
645 650 655
Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp
660 665 670
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe
675 680 685
Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr Phe
690 695 700
Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu
705 710 715 720
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
725 730 735
Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val Met Gly
740 745 750
Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln
755 760 765
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg Ile
770 775 780
Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
785 790 795 800
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu
805 810 815
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
820 825 830
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys
835 840 845
Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn Arg
850 855 860
Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
865 870 875 880
Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys
885 890 895
Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp
900 905 910
Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr
915 920 925
Lys His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp
930 935 940
Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser
945 950 955 960
Lys Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
965 970 975
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala Val
980 985 990
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
995 1000 1005
Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala
1010 1015 1020
Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
1025 1030 1035
Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala
1040 1045 1050
Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn Gly Glu
1055 1060 1065
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1070 1075 1080
Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile Val Lys Lys Thr
1085 1090 1095
Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys
1100 1105 1110
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro
1115 1120 1125
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
1130 1135 1140
Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys
1145 1150 1155
Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser
1160 1165 1170
Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys
1175 1180 1185
Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu
1190 1195 1200
Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly
1205 1210 1215
Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys Tyr Val
1220 1225 1230
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1235 1240 1245
Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln His Lys
1250 1255 1260
His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys
1265 1270 1275
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu Ser Ala
1280 1285 1290
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
1295 1300 1305
Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala
1310 1315 1320
Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
1325 1330 1335
Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr
1340 1345 1350
Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
1355 1360 1365
<210> 9
<211> 4104
<212> DNA
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 9
atggacaaga agtatagtat tggtctggac attgggacga attccgttgg ctgggccgtg 60
atcaccgatg agtacaaggt cccttccaag aagtttaagg ttctggggaa caccgatcgg 120
cacagcatca agaagaatct cattggagcc ctcctgttcg actcaggcga gaccgccgaa 180
gcaacaaggc tcaagagaac cgcaaggaga cggtatacaa gaaggaagaa taggatctgc 240
tacctgcagg agattttcag caacgaaatg gcgaaggtgg acgattcgtt ctttcataga 300
ttggaggaga gtttcctcgt cgaggaagat aagaagcacg agaggcatcc tatctttggc 360
aacattgtcg acgaggttgc ctatcacgaa aagtacccca caatctatca tctgcggaag 420
aagcttgtgg actcgactga taaggcggac cttagattga tctacctcgc tctggcacac 480
atgattaagt tcaggggcca ttttctgatc gagggggatc ttaacccgga caatagcgat 540
gtggacaagt tgttcatcca gctcgtccaa acctacaatc agctctttga ggaaaaccca 600
attaatgctt caggcgtcga cgccaaggcg atcctgtctg cacgcctttc aaagtctcgc 660
cggcttgaga acttgatcgc tcaactcccg ggcgaaaaga agaacggctt gttcgggaat 720
ctcattgcac tttcgttggg gctcacacca aacttcaaga gtaattttga tctcgctgag 780
gacgcaaagc tgcagctttc caaggacact tatgacgatg acctggataa ccttttggcc 840
caaatcggcg atcagtacgc ggacttgttc ctcgccgcga agaatttgtc ggacgcgatc 900
ctcctgagtg atattctccg cgtgaacacc gagattacaa aggccccgct ctcggcgagt 960
atgatcaagc gctatgacga gcaccatcag gatctgaccc ttttgaaggc tttggtccgg 1020
cagcaactcc cagagaagta caaggaaatc ttctttgatc aatccaagaa cggctacgct 1080
ggttatattg acggcggggc atcgcaggag gaattctaca agtttatcaa gccaattctg 1140
gagaagatgg atggcacaga ggaactcctg gtgaagctca atagggagga ccttttgcgg 1200
aagcaaagaa ctttcgataa cggcagcatc cctcaccaga ttcatctcgg ggagctgcac 1260
gccatcctga gaaggcagga agacttctac ccctttctta aggataaccg ggagaagatc 1320
gaaaagattc tgacgttcag aattccgtac tatgtcggac cactcgcccg gggtaattcc 1380
agatttgcgt ggatgaccag aaagagcgag gaaaccatca caccttggaa cttcgaggaa 1440
gtggtcgata agggcgcttc cgcacagagc ttcattgagc gcatgacaaa ttttgacaag 1500
aacctgccta atgagaaggt ccttcccaag cattccctcc tgtacgagta tttcactgtt 1560
tataacgaac tcacgaaggt gaagtatgtg accgagggaa tgcgcaagcc cgccttcctg 1620
agcggcgagc aaaagaaggc gatcgtggac cttttgttta agaccaatcg gaaggtcaca 1680
gttaagcagc tcaaggagga ctacttcaag aagattgaat gcttcgattc cgttgagatc 1740
agcggcgtgg aagacaggtt taacgcgtca ctggggactt accacgatct cctgaagatc 1800
attaaggata aggacttctt ggacaacgag gaaaatgagg atatcctcga agacattgtc 1860
ctgactctta cgttgtttga ggatagggaa atgatcgagg aacgcttgaa gacgtatgcc 1920
catctcttcg atgacaaggt tatgaagcag ctcaagagaa gaagatacac cggatgggga 1980
aggctgtccc gcaagcttat caatggcatt agagacaagc aatcagggaa gacaatcctt 2040
gactttttga agtctgatgg cttcgcgaac aggaatttta tgcagctgat tcacgatgac 2100
tcacttactt tcaaggagga tatccagaag gctcaagtgt cgggacaagg tgacagtctg 2160
cacgagcata tcgccaacct tgcgggatct cctgcaatca agaagggtat tctgcagaca 2220
gtcaaggttg tggatgagct tgtgaaggtc atgggacggc ataagcccga gaacatcgtt 2280
attgagatgg ccagagaaaa tcagaccaca caaaagggtc agaagaactc gagggagcgc 2340
atgaagcgca tcgaggaagg cattaaggag ctggggagtc agatccttaa ggagcacccg 2400
gtggaaaaca cgcagttgca aaatgagaag ctctatctgt actatctgca aaatggcagg 2460
gatatgtatg tggaccagga gttggatatt aaccgcctct cggattacga cgtcgatcat 2520
atcgttcctc agtccttcct taaggatgac agcattgaca ataaggttct caccaggtcc 2580
gacaagaacc gcgggaagtc cgataatgtg cccagcgagg aagtcgttaa gaagatgaag 2640
aactactgga ggcaactttt gaatgccaag ttgatcacac agaggaagtt tgataacctc 2700
actaaggccg agcgcggagg tctcagcgaa ctggacaagg cgggcttcat taagcggcaa 2760
ctggttgaga ctagacagat cacgaagcac gtggcgcaga ttctcgattc acgcatgaac 2820
acgaagtacg atgagaatga caagctgatc cgggaagtga aggtcatcac cttgaagtca 2880
aagctcgttt ctgacttcag gaaggatttc caattttata aggtgcgcga gatcaacaat 2940
tatcaccatg ctcatgacgc atacctcaac gctgtggtcg gaacagcatt gattaagaag 3000
tacccgaagc tcgagtccga attcgtgtac ggtgactata aggtttacga tgtgcgcaag 3060
atgatcgcca agtcagagca ggaaattggc aaggccactg cgaagtattt cttttactct 3120
aacattatga atttctttaa gactgagatc acgctggcta atggcgaaat ccggaagaga 3180
ccacttattg agaccaacgg cgagacaggg gaaatcgtgt gggacaaggg gagggatttc 3240
gccacagtcc gcaaggttct ctctatgcct caagtgaata ttgtcaagaa gactgaagtc 3300
cagacgggcg ggttctcaaa ggaatctatt ctgcccaagc ggaactcgga taagcttatc 3360
gccagaaaga aggactggga cccgaagaag tatggaggtt tcgactcacc aacggtggct 3420
tactctgtcc tggttgtggc aaaggtggag aagggaaagt caaagaagct caagtctgtc 3480
aaggagctcc tgggtatcac cattatggag aggtccagct tcgaaaagaa tccgatcgat 3540
tttctcgagg cgaagggata taaggaagtg aagaaggacc tgatcattaa gcttccaaag 3600
tacagtcttt tcgagttgga aaacggcagg aagcgcatgt tggcttccgc aggagagctc 3660
cagaagggta acgagcttgc tttgccgtcc aagtatgtga acttcctcta tctggcatcc 3720
cactacgaga agctcaaggg cagcccagag gataacgaac agaagcaact gtttgtggag 3780
caacacaagc attatcttga cgagatcatt gaacagattt cggagttcag taagcgcgtc 3840
atcctcgccg acgcgaattt ggataaggtt ctctcagcct acaacaagca ccgggacaag 3900
cctatcagag agcaggcgga aaatatcatt catctcttca ccctgacaaa ccttggggct 3960
cccgctgcat tcaagtattt tgacactacg attgatcgga agagatacac ttctacgaag 4020
gaggtgctgg atgcaaccct tatccaccaa tcgattactg gcctctacga gacgcggatc 4080
gacttgagtc agctcggggg ggat 4104
<210> 10
<211> 1566
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 10
Met Ala Pro Lys Lys Lys Arg Lys Val Gly Ile His Gly Val Pro Ala
1 5 10 15
Ala Ser Lys Ser Trp Gly Lys Phe Ile Glu Glu Glu Glu Ala Glu Met
20 25 30
Ala Ser Arg Arg Asn Leu Met Ile Val Asp Gly Thr Asn Leu Gly Phe
35 40 45
Arg Phe Lys His Asn Asn Ser Lys Lys Pro Phe Ala Ser Ser Tyr Val
50 55 60
Ser Thr Ile Gln Ser Leu Ala Lys Ser Tyr Ser Ala Arg Thr Thr Ile
65 70 75 80
Val Leu Gly Asp Lys Gly Lys Ser Val Phe Arg Leu Glu His Leu Pro
85 90 95
Glu Tyr Lys Gly Asn Arg Asp Glu Lys Tyr Ala Gln Arg Thr Glu Glu
100 105 110
Glu Lys Ala Leu Asp Glu Gln Phe Phe Glu Tyr Leu Lys Asp Ala Phe
115 120 125
Glu Leu Cys Lys Thr Thr Phe Pro Thr Phe Thr Ile Arg Gly Val Glu
130 135 140
Ala Asp Asp Met Ala Ala Tyr Ile Val Lys Leu Ile Gly His Leu Tyr
145 150 155 160
Asp His Val Trp Leu Ile Ser Thr Asp Gly Asp Trp Asp Thr Leu Leu
165 170 175
Thr Asp Lys Val Ser Arg Phe Ser Phe Thr Thr Arg Arg Glu Tyr His
180 185 190
Leu Arg Asp Met Tyr Glu His His Asn Val Asp Asp Val Glu Gln Phe
195 200 205
Ile Ser Leu Lys Ala Ile Met Gly Asp Leu Gly Asp Asn Ile Arg Gly
210 215 220
Val Glu Gly Ile Gly Ala Lys Arg Gly Tyr Asn Ile Ile Arg Glu Phe
225 230 235 240
Gly Asn Val Leu Asp Ile Ile Asp Gln Leu Pro Leu Pro Gly Lys Gln
245 250 255
Lys Tyr Ile Gln Asn Leu Asn Ala Ser Glu Glu Leu Leu Phe Arg Asn
260 265 270
Leu Ile Leu Val Asp Leu Pro Thr Tyr Cys Val Asp Ala Ile Ala Ala
275 280 285
Val Gly Gln Asp Val Leu Asp Lys Phe Thr Lys Asp Ile Leu Glu Ile
290 295 300
Ala Glu Gln Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr
305 310 315 320
Pro Glu Ser Ser Lys Leu Glu Lys Phe Thr Asn Cys Tyr Ser Leu Ser
325 330 335
Lys Thr Leu Arg Phe Lys Ala Ile Pro Val Gly Lys Thr Gln Glu Asn
340 345 350
Ile Asp Asn Lys Arg Leu Leu Val Glu Asp Glu Lys Arg Ala Glu Asp
355 360 365
Tyr Lys Gly Val Lys Lys Leu Leu Asp Arg Tyr Tyr Leu Ser Phe Ile
370 375 380
Asn Asp Val Leu His Ser Ile Lys Leu Lys Asn Leu Asn Asn Tyr Ile
385 390 395 400
Ser Leu Phe Arg Lys Lys Thr Arg Thr Glu Lys Glu Asn Lys Glu Leu
405 410 415
Glu Asn Leu Glu Ile Asn Leu Arg Lys Glu Ile Ala Lys Ala Phe Lys
420 425 430
Gly Asn Glu Gly Tyr Lys Ser Leu Phe Lys Lys Asp Ile Ile Glu Thr
435 440 445
Ile Leu Pro Glu Phe Leu Asp Asp Lys Asp Glu Ile Ala Leu Val Asn
450 455 460
Ser Phe Asn Gly Phe Thr Thr Ala Phe Thr Gly Phe Phe Asp Asn Arg
465 470 475 480
Glu Asn Met Phe Ser Glu Glu Ala Lys Ser Thr Ser Ile Ala Phe Arg
485 490 495
Cys Ile Asn Glu Asn Leu Thr Arg Tyr Ile Ser Asn Met Asp Ile Phe
500 505 510
Glu Lys Val Asp Ala Ile Phe Asp Lys His Glu Val Gln Glu Ile Lys
515 520 525
Glu Lys Ile Leu Asn Ser Asp Tyr Asp Val Glu Asp Phe Phe Glu Gly
530 535 540
Glu Phe Phe Asn Phe Val Leu Thr Gln Glu Gly Ile Asp Val Tyr Asn
545 550 555 560
Ala Ile Ile Gly Gly Phe Val Thr Glu Ser Gly Glu Lys Ile Lys Gly
565 570 575
Leu Asn Glu Tyr Ile Asn Leu Tyr Asn Gln Lys Thr Lys Gln Lys Leu
580 585 590
Pro Lys Phe Lys Pro Leu Tyr Lys Gln Val Leu Ser Asp Arg Glu Ser
595 600 605
Leu Ser Phe Tyr Gly Glu Gly Tyr Thr Ser Asp Glu Glu Val Leu Glu
610 615 620
Val Phe Arg Asn Thr Leu Asn Lys Asn Ser Glu Ile Phe Ser Ser Ile
625 630 635 640
Lys Lys Leu Glu Lys Leu Phe Lys Asn Phe Asp Glu Tyr Ser Ser Ala
645 650 655
Gly Ile Phe Val Lys Asn Gly Pro Ala Ile Ser Thr Ile Ser Lys Asp
660 665 670
Ile Phe Gly Glu Trp Asn Val Ile Arg Asp Lys Trp Asn Ala Glu Tyr
675 680 685
Asp Asp Ile His Leu Lys Lys Lys Ala Val Val Thr Glu Lys Tyr Glu
690 695 700
Asp Asp Arg Arg Lys Ser Phe Lys Lys Ile Gly Ser Phe Ser Leu Glu
705 710 715 720
Gln Leu Gln Glu Tyr Ala Asp Ala Asp Leu Ser Val Val Glu Lys Leu
725 730 735
Lys Glu Ile Ile Ile Gln Lys Val Asp Glu Ile Tyr Lys Val Tyr Gly
740 745 750
Ser Ser Glu Lys Leu Phe Asp Ala Asp Phe Val Leu Glu Lys Ser Leu
755 760 765
Lys Lys Asn Asp Ala Val Val Ala Ile Met Lys Asp Leu Leu Asp Ser
770 775 780
Val Lys Ser Phe Glu Asn Tyr Ile Lys Ala Phe Phe Gly Glu Gly Lys
785 790 795 800
Glu Thr Asn Arg Asp Glu Ser Phe Tyr Gly Asp Phe Val Leu Ala Tyr
805 810 815
Asp Ile Leu Leu Lys Val Asp His Ile Tyr Asp Ala Ile Arg Asn Tyr
820 825 830
Val Thr Gln Lys Pro Tyr Ser Lys Asp Lys Phe Lys Leu Tyr Phe Gln
835 840 845
Asn Pro Gln Phe Met Gly Gly Trp Asp Lys Asp Lys Glu Thr Asp Tyr
850 855 860
Arg Ala Thr Ile Leu Arg Tyr Gly Ser Lys Tyr Tyr Leu Ala Ile Met
865 870 875 880
Asp Lys Lys Tyr Ala Lys Cys Leu Gln Lys Ile Asp Lys Asp Asp Val
885 890 895
Asn Gly Asn Tyr Glu Lys Ile Asn Tyr Lys Leu Leu Pro Gly Pro Asn
900 905 910
Lys Met Leu Pro Lys Val Phe Phe Ser Lys Lys Trp Met Ala Tyr Tyr
915 920 925
Asn Pro Ser Glu Asp Ile Gln Lys Ile Tyr Lys Asn Gly Thr Phe Lys
930 935 940
Lys Gly Asp Met Phe Asn Leu Asn Asp Cys His Lys Leu Ile Asp Phe
945 950 955 960
Phe Lys Asp Ser Ile Ser Arg Tyr Pro Lys Trp Ser Asn Ala Tyr Asp
965 970 975
Phe Asn Phe Ser Glu Thr Glu Lys Tyr Lys Asp Ile Ala Gly Phe Tyr
980 985 990
Arg Glu Val Glu Glu Gln Gly Tyr Lys Val Ser Phe Glu Ser Ala Ser
995 1000 1005
Lys Lys Glu Val Asp Lys Leu Val Glu Glu Gly Lys Leu Tyr Met
1010 1015 1020
Phe Gln Ile Tyr Asn Lys Asp Phe Ser Asp Lys Ser His Gly Thr
1025 1030 1035
Pro Asn Leu His Thr Met Tyr Phe Lys Leu Leu Phe Asp Glu Asn
1040 1045 1050
Asn His Gly Gln Ile Arg Leu Ser Gly Gly Ala Glu Leu Phe Met
1055 1060 1065
Arg Arg Ala Ser Leu Lys Lys Glu Glu Leu Val Val His Pro Ala
1070 1075 1080
Asn Ser Pro Ile Ala Asn Lys Asn Pro Asp Asn Pro Lys Lys Thr
1085 1090 1095
Thr Thr Leu Ser Tyr Asp Val Tyr Lys Asp Lys Arg Phe Ser Glu
1100 1105 1110
Asp Gln Tyr Glu Leu His Ile Pro Ile Ala Ile Asn Lys Cys Pro
1115 1120 1125
Lys Asn Ile Phe Lys Ile Asn Thr Glu Val Arg Val Leu Leu Lys
1130 1135 1140
His Asp Asp Asn Pro Tyr Val Ile Gly Ile Asp Arg Gly Glu Arg
1145 1150 1155
Asn Leu Leu Tyr Ile Val Val Val Asp Gly Lys Gly Asn Ile Val
1160 1165 1170
Glu Gln Tyr Ser Leu Asn Glu Ile Ile Asn Asn Phe Asn Gly Ile
1175 1180 1185
Arg Ile Lys Thr Asp Tyr His Ser Leu Leu Asp Lys Lys Glu Lys
1190 1195 1200
Glu Arg Phe Glu Ala Arg Gln Asn Trp Thr Ser Ile Glu Asn Ile
1205 1210 1215
Lys Glu Leu Lys Ala Gly Tyr Ile Ser Gln Val Val His Lys Ile
1220 1225 1230
Cys Glu Leu Val Glu Lys Tyr Asp Ala Val Ile Ala Leu Glu Asp
1235 1240 1245
Leu Asn Ser Gly Phe Lys Asn Ser Arg Val Lys Val Glu Lys Gln
1250 1255 1260
Val Tyr Gln Lys Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr
1265 1270 1275
Met Val Asp Lys Lys Ser Asn Pro Cys Ala Thr Gly Gly Ala Leu
1280 1285 1290
Lys Gly Tyr Gln Ile Thr Asn Lys Phe Glu Ser Phe Lys Ser Met
1295 1300 1305
Ser Thr Gln Asn Gly Phe Ile Phe Tyr Ile Pro Ala Trp Leu Thr
1310 1315 1320
Ser Lys Ile Asp Pro Ser Thr Gly Phe Val Asn Leu Leu Lys Thr
1325 1330 1335
Lys Tyr Thr Ser Ile Ala Asp Ser Lys Lys Phe Ile Ser Ser Phe
1340 1345 1350
Asp Arg Ile Met Tyr Val Pro Glu Glu Asp Leu Phe Glu Phe Ala
1355 1360 1365
Leu Asp Tyr Lys Asn Phe Ser Arg Thr Asp Ala Asp Tyr Ile Lys
1370 1375 1380
Lys Trp Lys Leu Tyr Ser Tyr Gly Asn Arg Ile Arg Ile Phe Arg
1385 1390 1395
Asn Pro Lys Lys Asn Asn Val Phe Asp Trp Glu Glu Val Cys Leu
1400 1405 1410
Thr Ser Ala Tyr Lys Glu Leu Phe Asn Lys Tyr Gly Ile Asn Tyr
1415 1420 1425
Gln Gln Gly Asp Ile Arg Ala Leu Leu Cys Glu Gln Ser Asp Lys
1430 1435 1440
Ala Phe Tyr Ser Ser Phe Met Ala Leu Met Ser Leu Met Leu Gln
1445 1450 1455
Met Arg Asn Ser Ile Thr Gly Arg Thr Asp Val Asp Phe Leu Ile
1460 1465 1470
Ser Pro Val Lys Asn Ser Asp Gly Ile Phe Tyr Asp Ser Arg Asn
1475 1480 1485
Tyr Glu Ala Gln Glu Asn Ala Ile Leu Pro Lys Asn Ala Asp Ala
1490 1495 1500
Asn Gly Ala Tyr Asn Ile Ala Arg Lys Val Leu Trp Ala Ile Gly
1505 1510 1515
Gln Phe Lys Lys Ala Glu Asp Glu Lys Leu Asp Lys Val Lys Ile
1520 1525 1530
Ala Ile Ser Asn Lys Glu Trp Leu Glu Tyr Ala Gln Thr Ser Val
1535 1540 1545
Lys His Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys
1550 1555 1560
Lys Lys Lys
1565
<210> 11
<211> 4701
<212> DNA
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 11
atggctccta agaagaagcg gaaggttggt attcacgggg tgcctgcggc ttcaaagtct 60
tggggcaagt tcatcgagga ggaggaggcc gagatggcgt caaggcgcaa cctcatgatt 120
gtcgacggca ccaatctggg cttccggttc aagcacaaca attctaagaa gcctttcgcc 180
tccagctacg tgtccacaat ccagagcctc gccaagtcct acagcgcgcg caccacaatt 240
gtgctgggcg acaagggcaa gtcagtcttc cggctggagc atctgccgga gtacaagggc 300
aacagggatg agaagtacgc acagaggacc gaggaggaga aggcactcga tgagcagttc 360
ttcgagtacc tcaaggacgc cttcgagctg tgcaagacca cattcccaac cttcacaatc 420
aggggagtgg aggcagacga tatggcagcg tacatcgtca agctcattgg ccacctgtac 480
gatcatgtgt ggctcatttc cacagacggc gattgggaca ccctcctgac agacaaggtc 540
tcacggttct ctttcaccac acggagggag taccacctga gggatatgta cgagcaccat 600
aacgtggacg atgtcgagca gttcatcagc ctcaaggcca ttatgggcga tctgggcgac 660
aatatcaggg gagtcgaggg aattggagca aagaggggct acaacatcat tcgggagttc 720
ggcaatgtgc tcgatatcat tgaccagctc ccgctgccag gcaagcagaa gtacatccag 780
aacctcaatg cgtccgagga gctcctgttc cgcaatctca tcctggtgga tctgccgacc 840
tactgcgtcg acgcaattgc agcagtggga caggatgtcc tcgacaagtt cacaaaggat 900
atcctggaga ttgcggagca gtccggcagc gagacgccag gcacctccga gagcgctacg 960
cctgaatcgt caaagctcga gaaattcacc aactgttatt cgttgagcaa aacactgcgg 1020
tttaaagcga ttccagtcgg caagactcaa gagaatatag acaataagcg gctgttggtg 1080
gaagatgaaa agcgcgcgga agactacaaa ggggtgaaga agttgttgga cagatactac 1140
ctctctttta tcaatgatgt cttgcactca atcaaattga agaatctgaa caactacatc 1200
tccctcttca gaaagaaaac aaggacagaa aaggagaata aggaacttga aaatttggag 1260
atcaatctga ggaaagagat cgcgaaagcc tttaaaggca acgaaggata caaaagtctg 1320
ttcaagaagg atataattga gacaattttg ccagagttcc tcgatgacaa ggacgagatt 1380
gcgctggtca attcgttcaa cggattcaca acagcattca caggcttctt tgataatcgg 1440
gaaaatatgt tctctgagga ggcaaagtcc acttctattg cgttcaggtg tatcaatgag 1500
aatctcacta ggtacatttc caacatggat atctttgaga aggttgacgc aatttttgac 1560
aagcacgaag ttcaggagat taaggagaag atcctcaatt ccgattatga cgttgaggac 1620
ttcttcgaag gtgagttttt taatttcgtg ctcactcaag agggtatcga cgtgtataat 1680
gcgatcatcg gtgggttcgt gactgagtcc ggtgaaaaga ttaagggatt gaacgagtat 1740
atcaaccttt acaaccaaaa gacgaaacag aagctgccaa agttcaagcc tctttacaaa 1800
caggttcttt cagaccgcga gtcactctcg ttctatgggg agggctacac ttcggatgag 1860
gaagtcctgg aggtgttcag gaatactctc aataagaatt cggagatttt ctcttctata 1920
aaaaaactgg aaaagttgtt taagaatttt gacgaatact ctagcgccgg catatttgtg 1980
aaaaacggcc cggccatatc aacgataagt aaagatatct tcggcgaatg gaacgtgatc 2040
agagacaaat ggaacgcgga gtatgacgat attcacctga agaagaaggc tgtcgtaacg 2100
gagaagtacg aggatgatcg caggaaaagc ttcaaaaaga tcggaagttt cagcctggaa 2160
cagttgcagg agtatgctga cgccgatctt agcgtcgtcg agaagttgaa ggagataatc 2220
atccaaaagg tcgacgagat atataaagtc tatggatcaa gtgaaaaact gttcgacgcc 2280
gacttcgttt tggagaagtc cctgaagaag aacgacgctg ttgttgccat tatgaaggat 2340
ctgctcgaca gcgtgaagag tttcgagaac tatattaagg cttttttcgg ggaggggaag 2400
gagactaaca gagatgagtc cttctacgga gacttcgtcc tcgcgtacga tatactcctt 2460
aaggtagacc acatctacga cgcaatcaga aattacgtga cacaaaagcc gtacagcaag 2520
gacaagttca aactctactt ccagaacccc cagttcatgg gcggctggga caaggacaag 2580
gaaacggatt acagggctac gatcctgagg tatggttcaa aatactactt ggcgattatg 2640
gacaagaagt acgccaagtg tctccagaag attgacaaag acgatgtcaa tggcaattat 2700
gagaagatca actacaagct gcttccgggt ccgaacaaga tgctcccaaa ggttttcttc 2760
agcaagaaat ggatggccta ctataaccca agcgaggaca tccagaagat ttataagaac 2820
ggtacgttca agaagggcga catgttcaat cttaacgact gtcacaagct gatcgacttc 2880
ttcaaagact caattagccg gtacccaaag tggtctaacg cctatgactt caacttttcg 2940
gaaaccgaga agtacaagga tatagccgga ttttatagag aggtggaaga gcagggctac 3000
aaggtgtcat tcgagtccgc cagcaagaag gaagtggaca agctcgtgga agagggtaag 3060
ctctacatgt tccagattta taataaagac tttagcgata agagccacgg gacacctaat 3120
ctccacacaa tgtatttcaa gctgctcttc gacgagaata accacggcca aatcaggttg 3180
tcaggagggg ctgaactctt catgcggcgc gctagcctta agaaggagga gcttgtagtc 3240
caccctgcga atagtccaat tgcgaataag aacccggaca atcctaaaaa gactacaaca 3300
ttgagctacg acgtgtacaa ggataagagg ttttccgagg atcagtacga gctccacatc 3360
ccgattgcga tcaacaagtg cccaaagaat attttcaaga taaacacaga ggtgcgtgta 3420
ctcctgaagc atgacgacaa tccttacgtc attgggattg atcggggcga gaggaacctc 3480
ctctatattg tggtggtgga cgggaagggg aacatagtcg aacagtactc ccttaacgaa 3540
ataattaaca atttcaacgg catccgtatc aagaccgact accattcgtt gctggacaag 3600
aaggagaagg agagatttga ggcgcggcaa aattggacaa gtatcgagaa catcaaggaa 3660
ctcaaagcag gttatatctc tcaagttgtg cataagatat gcgagctggt tgagaagtat 3720
gacgcagtga tcgctcttga ggacctcaac tcgggcttta agaattctag agttaaagtg 3780
gagaagcagg tctatcaaaa gttcgagaag atgcttatag ataagctcaa ctacatggtc 3840
gataagaaat cgaacccatg tgccaccggc ggcgcactca aaggttacca aataacaaac 3900
aaattcgagt ccttcaaatc gatgagtact cagaatgggt tcatatttta tataccggcg 3960
tggcttacgt ctaagatcga cccgtcaact ggttttgtca acctgttgaa gacgaaatac 4020
acgtccattg ccgattcgaa aaagttcata tctagttttg atcgtattat gtacgtccca 4080
gaggaagatc ttttcgagtt tgctctcgac tacaaaaact tttcgcggac cgatgcggat 4140
tacattaaaa aatggaaact ctattcgtac ggcaacagaa tcaggatttt tcgcaaccct 4200
aagaagaata acgtctttga ttgggaggaa gtttgcttga ctagcgcgta caaggagctc 4260
tttaataagt atggcattaa ctaccaacag ggtgatatca gagcactgct ttgcgaacaa 4320
tctgacaagg ctttctactc atccttcatg gctttgatga gcctgatgct ccagatgaga 4380
aattcaatta caggcagaac cgacgtggat ttcttgatct ccccggttaa aaattctgat 4440
ggcatctttt acgatagcag gaactatgaa gcgcaagaga atgcgattct gccaaaaaat 4500
gcagacgcca acggtgccta taacatcgcc aggaaagtcc tgtgggcgat cggccagttc 4560
aaaaaggccg aagacgaaaa attggacaag gtcaaaatcg ctatcagcaa caaagagtgg 4620
ctggagtatg ctcagacatc cgtaaagcat aagcgtcctg ctgccaccaa aaaggccgga 4680
caggctaaga aaaagaagtg a 4701
<210> 12
<211> 7
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 12
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 13
<211> 16
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 13
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys
1 5 10 15
<210> 14
<211> 16
<212> PRT
<213> Artificial Sequence
<220>
<223> ARTIFICIAL SEQUENCE
<400> 14
Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
1 5 10 15
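
The short peptides of SEQ ID NOs: 12 to 14 can be located inside the coding sequence of SEQ ID NO: 11 by translating a few fragments of it. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code, the function name `translate` and the constant names are hypothetical, and the three fragments are copied from the SEQ ID NO: 11 listing (nucleotides 1-60, 901-1020 and 4621-4701, each starting on a codon boundary).

```python
# Standard genetic code in the conventional T, C, A, G codon order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; layout spaces are ignored, '*' marks a stop."""
    dna = "".join(dna.split()).lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragments copied from the SEQ ID NO: 11 listing.
SEQ11_NT_1_60 = "atggctccta agaagaagcg gaaggttggt attcacgggg tgcctgcggc ttcaaagtct"
SEQ11_NT_901_1020 = (
    "atcctggaga ttgcggagca gtccggcagc gagacgccag gcacctccga gagcgctacg "
    "cctgaatcgt caaagctcga gaaattcacc aactgttatt cgttgagcaa aacactgcgg"
)
SEQ11_NT_4621_4701 = (
    "ctggagtatg ctcagacatc cgtaaagcat aagcgtcctg ctgccaccaa aaaggccgga "
    "caggctaaga aaaagaagtg a"
)

# One-letter forms of SEQ ID NOs: 12, 13 and 14.
SEQ12 = "PKKKRKV"
SEQ13 = "KRPAATKKAGQAKKKK"
SEQ14 = "SGSETPGTSESATPES"

n_term = translate(SEQ11_NT_1_60)
middle = translate(SEQ11_NT_901_1020)
c_term = translate(SEQ11_NT_4621_4701)

print("SEQ ID NO: 12 found near the N-terminus:", SEQ12 in n_term)
print("SEQ ID NO: 14 found in the middle fragment:", SEQ14 in middle)
print("SEQ ID NO: 13 directly precedes the stop codon:", c_term.endswith(SEQ13 + "*"))
```

In the middle fragment, the residues that follow SEQ ID NO: 14 are the first residues of SEQ ID NO: 15, so the sketch also shows where the CRISPR nuclease portion of the fusion begins.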
<210> 15
<211> 1227
<212> PRT
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 15
Ser Lys Leu Glu Lys Phe Thr Asn Cys Tyr Ser Leu Ser Lys Thr Leu
1 5 10 15
Arg Phe Lys Ala Ile Pro Val Gly Lys Thr Gln Glu Asn Ile Asp Asn
20 25 30
Lys Arg Leu Leu Val Glu Asp Glu Lys Arg Ala Glu Asp Tyr Lys Gly
35 40 45
Val Lys Lys Leu Leu Asp Arg Tyr Tyr Leu Ser Phe Ile Asn Asp Val
50 55 60
Leu His Ser Ile Lys Leu Lys Asn Leu Asn Asn Tyr Ile Ser Leu Phe
65 70 75 80
Arg Lys Lys Thr Arg Thr Glu Lys Glu Asn Lys Glu Leu Glu Asn Leu
85 90 95
Glu Ile Asn Leu Arg Lys Glu Ile Ala Lys Ala Phe Lys Gly Asn Glu
100 105 110
Gly Tyr Lys Ser Leu Phe Lys Lys Asp Ile Ile Glu Thr Ile Leu Pro
115 120 125
Glu Phe Leu Asp Asp Lys Asp Glu Ile Ala Leu Val Asn Ser Phe Asn
130 135 140
Gly Phe Thr Thr Ala Phe Thr Gly Phe Phe Asp Asn Arg Glu Asn Met
145 150 155 160
Phe Ser Glu Glu Ala Lys Ser Thr Ser Ile Ala Phe Arg Cys Ile Asn
165 170 175
Glu Asn Leu Thr Arg Tyr Ile Ser Asn Met Asp Ile Phe Glu Lys Val
180 185 190
Asp Ala Ile Phe Asp Lys His Glu Val Gln Glu Ile Lys Glu Lys Ile
195 200 205
Leu Asn Ser Asp Tyr Asp Val Glu Asp Phe Phe Glu Gly Glu Phe Phe
210 215 220
Asn Phe Val Leu Thr Gln Glu Gly Ile Asp Val Tyr Asn Ala Ile Ile
225 230 235 240
Gly Gly Phe Val Thr Glu Ser Gly Glu Lys Ile Lys Gly Leu Asn Glu
245 250 255
Tyr Ile Asn Leu Tyr Asn Gln Lys Thr Lys Gln Lys Leu Pro Lys Phe
260 265 270
Lys Pro Leu Tyr Lys Gln Val Leu Ser Asp Arg Glu Ser Leu Ser Phe
275 280 285
Tyr Gly Glu Gly Tyr Thr Ser Asp Glu Glu Val Leu Glu Val Phe Arg
290 295 300
Asn Thr Leu Asn Lys Asn Ser Glu Ile Phe Ser Ser Ile Lys Lys Leu
305 310 315 320
Glu Lys Leu Phe Lys Asn Phe Asp Glu Tyr Ser Ser Ala Gly Ile Phe
325 330 335
Val Lys Asn Gly Pro Ala Ile Ser Thr Ile Ser Lys Asp Ile Phe Gly
340 345 350
Glu Trp Asn Val Ile Arg Asp Lys Trp Asn Ala Glu Tyr Asp Asp Ile
355 360 365
His Leu Lys Lys Lys Ala Val Val Thr Glu Lys Tyr Glu Asp Asp Arg
370 375 380
Arg Lys Ser Phe Lys Lys Ile Gly Ser Phe Ser Leu Glu Gln Leu Gln
385 390 395 400
Glu Tyr Ala Asp Ala Asp Leu Ser Val Val Glu Lys Leu Lys Glu Ile
405 410 415
Ile Ile Gln Lys Val Asp Glu Ile Tyr Lys Val Tyr Gly Ser Ser Glu
420 425 430
Lys Leu Phe Asp Ala Asp Phe Val Leu Glu Lys Ser Leu Lys Lys Asn
435 440 445
Asp Ala Val Val Ala Ile Met Lys Asp Leu Leu Asp Ser Val Lys Ser
450 455 460
Phe Glu Asn Tyr Ile Lys Ala Phe Phe Gly Glu Gly Lys Glu Thr Asn
465 470 475 480
Arg Asp Glu Ser Phe Tyr Gly Asp Phe Val Leu Ala Tyr Asp Ile Leu
485 490 495
Leu Lys Val Asp His Ile Tyr Asp Ala Ile Arg Asn Tyr Val Thr Gln
500 505 510
Lys Pro Tyr Ser Lys Asp Lys Phe Lys Leu Tyr Phe Gln Asn Pro Gln
515 520 525
Phe Met Gly Gly Trp Asp Lys Asp Lys Glu Thr Asp Tyr Arg Ala Thr
530 535 540
Ile Leu Arg Tyr Gly Ser Lys Tyr Tyr Leu Ala Ile Met Asp Lys Lys
545 550 555 560
Tyr Ala Lys Cys Leu Gln Lys Ile Asp Lys Asp Asp Val Asn Gly Asn
565 570 575
Tyr Glu Lys Ile Asn Tyr Lys Leu Leu Pro Gly Pro Asn Lys Met Leu
580 585 590
Pro Lys Val Phe Phe Ser Lys Lys Trp Met Ala Tyr Tyr Asn Pro Ser
595 600 605
Glu Asp Ile Gln Lys Ile Tyr Lys Asn Gly Thr Phe Lys Lys Gly Asp
610 615 620
Met Phe Asn Leu Asn Asp Cys His Lys Leu Ile Asp Phe Phe Lys Asp
625 630 635 640
Ser Ile Ser Arg Tyr Pro Lys Trp Ser Asn Ala Tyr Asp Phe Asn Phe
645 650 655
Ser Glu Thr Glu Lys Tyr Lys Asp Ile Ala Gly Phe Tyr Arg Glu Val
660 665 670
Glu Glu Gln Gly Tyr Lys Val Ser Phe Glu Ser Ala Ser Lys Lys Glu
675 680 685
Val Asp Lys Leu Val Glu Glu Gly Lys Leu Tyr Met Phe Gln Ile Tyr
690 695 700
Asn Lys Asp Phe Ser Asp Lys Ser His Gly Thr Pro Asn Leu His Thr
705 710 715 720
Met Tyr Phe Lys Leu Leu Phe Asp Glu Asn Asn His Gly Gln Ile Arg
725 730 735
Leu Ser Gly Gly Ala Glu Leu Phe Met Arg Arg Ala Ser Leu Lys Lys
740 745 750
Glu Glu Leu Val Val His Pro Ala Asn Ser Pro Ile Ala Asn Lys Asn
755 760 765
Pro Asp Asn Pro Lys Lys Thr Thr Thr Leu Ser Tyr Asp Val Tyr Lys
770 775 780
Asp Lys Arg Phe Ser Glu Asp Gln Tyr Glu Leu His Ile Pro Ile Ala
785 790 795 800
Ile Asn Lys Cys Pro Lys Asn Ile Phe Lys Ile Asn Thr Glu Val Arg
805 810 815
Val Leu Leu Lys His Asp Asp Asn Pro Tyr Val Ile Gly Ile Asp Arg
820 825 830
Gly Glu Arg Asn Leu Leu Tyr Ile Val Val Val Asp Gly Lys Gly Asn
835 840 845
Ile Val Glu Gln Tyr Ser Leu Asn Glu Ile Ile Asn Asn Phe Asn Gly
850 855 860
Ile Arg Ile Lys Thr Asp Tyr His Ser Leu Leu Asp Lys Lys Glu Lys
865 870 875 880
Glu Arg Phe Glu Ala Arg Gln Asn Trp Thr Ser Ile Glu Asn Ile Lys
885 890 895
Glu Leu Lys Ala Gly Tyr Ile Ser Gln Val Val His Lys Ile Cys Glu
900 905 910
Leu Val Glu Lys Tyr Asp Ala Val Ile Ala Leu Glu Asp Leu Asn Ser
915 920 925
Gly Phe Lys Asn Ser Arg Val Lys Val Glu Lys Gln Val Tyr Gln Lys
930 935 940
Phe Glu Lys Met Leu Ile Asp Lys Leu Asn Tyr Met Val Asp Lys Lys
945 950 955 960
Ser Asn Pro Cys Ala Thr Gly Gly Ala Leu Lys Gly Tyr Gln Ile Thr
965 970 975
Asn Lys Phe Glu Ser Phe Lys Ser Met Ser Thr Gln Asn Gly Phe Ile
980 985 990
Phe Tyr Ile Pro Ala Trp Leu Thr Ser Lys Ile Asp Pro Ser Thr Gly
995 1000 1005
Phe Val Asn Leu Leu Lys Thr Lys Tyr Thr Ser Ile Ala Asp Ser
1010 1015 1020
Lys Lys Phe Ile Ser Ser Phe Asp Arg Ile Met Tyr Val Pro Glu
1025 1030 1035
Glu Asp Leu Phe Glu Phe Ala Leu Asp Tyr Lys Asn Phe Ser Arg
1040 1045 1050
Thr Asp Ala Asp Tyr Ile Lys Lys Trp Lys Leu Tyr Ser Tyr Gly
1055 1060 1065
Asn Arg Ile Arg Ile Phe Arg Asn Pro Lys Lys Asn Asn Val Phe
1070 1075 1080
Asp Trp Glu Glu Val Cys Leu Thr Ser Ala Tyr Lys Glu Leu Phe
1085 1090 1095
Asn Lys Tyr Gly Ile Asn Tyr Gln Gln Gly Asp Ile Arg Ala Leu
1100 1105 1110
Leu Cys Glu Gln Ser Asp Lys Ala Phe Tyr Ser Ser Phe Met Ala
1115 1120 1125
Leu Met Ser Leu Met Leu Gln Met Arg Asn Ser Ile Thr Gly Arg
1130 1135 1140
Thr Asp Val Asp Phe Leu Ile Ser Pro Val Lys Asn Ser Asp Gly
1145 1150 1155
Ile Phe Tyr Asp Ser Arg Asn Tyr Glu Ala Gln Glu Asn Ala Ile
1160 1165 1170
Leu Pro Lys Asn Ala Asp Ala Asn Gly Ala Tyr Asn Ile Ala Arg
1175 1180 1185
Lys Val Leu Trp Ala Ile Gly Gln Phe Lys Lys Ala Glu Asp Glu
1190 1195 1200
Lys Leu Asp Lys Val Lys Ile Ala Ile Ser Asn Lys Glu Trp Leu
1205 1210 1215
Glu Tyr Ala Gln Thr Ser Val Lys His
1220 1225
<210> 16
<211> 3681
<212> DNA
<213> ARTIFICIAL SEQUENCE
<220>
<223> ARTIFICIAL SEQUENCE
<400> 16
tcaaagctcg agaaattcac caactgttat tcgttgagca aaacactgcg gtttaaagcg 60
attccagtcg gcaagactca agagaatata gacaataagc ggctgttggt ggaagatgaa 120
aagcgcgcgg aagactacaa aggggtgaag aagttgttgg acagatacta cctctctttt 180
atcaatgatg tcttgcactc aatcaaattg aagaatctga acaactacat ctccctcttc 240
agaaagaaaa caaggacaga aaaggagaat aaggaacttg aaaatttgga gatcaatctg 300
aggaaagaga tcgcgaaagc ctttaaaggc aacgaaggat acaaaagtct gttcaagaag 360
gatataattg agacaatttt gccagagttc ctcgatgaca aggacgagat tgcgctggtc 420
aattcgttca acggattcac aacagcattc acaggcttct ttgataatcg ggaaaatatg 480
ttctctgagg aggcaaagtc cacttctatt gcgttcaggt gtatcaatga gaatctcact 540
aggtacattt ccaacatgga tatctttgag aaggttgacg caatttttga caagcacgaa 600
gttcaggaga ttaaggagaa gatcctcaat tccgattatg acgttgagga cttcttcgaa 660
ggtgagtttt ttaatttcgt gctcactcaa gagggtatcg acgtgtataa tgcgatcatc 720
ggtgggttcg tgactgagtc cggtgaaaag attaagggat tgaacgagta tatcaacctt 780
tacaaccaaa agacgaaaca gaagctgcca aagttcaagc ctctttacaa acaggttctt 840
tcagaccgcg agtcactctc gttctatggg gagggctaca cttcggatga ggaagtcctg 900
gaggtgttca ggaatactct caataagaat tcggagattt tctcttctat aaaaaaactg 960
gaaaagttgt ttaagaattt tgacgaatac tctagcgccg gcatatttgt gaaaaacggc 1020
ccggccatat caacgataag taaagatatc ttcggcgaat ggaacgtgat cagagacaaa 1080
tggaacgcgg agtatgacga tattcacctg aagaagaagg ctgtcgtaac ggagaagtac 1140
gaggatgatc gcaggaaaag cttcaaaaag atcggaagtt tcagcctgga acagttgcag 1200
gagtatgctg acgccgatct tagcgtcgtc gagaagttga aggagataat catccaaaag 1260
gtcgacgaga tatataaagt ctatggatca agtgaaaaac tgttcgacgc cgacttcgtt 1320
ttggagaagt ccctgaagaa gaacgacgct gttgttgcca ttatgaagga tctgctcgac 1380
agcgtgaaga gtttcgagaa ctatattaag gcttttttcg gggaggggaa ggagactaac 1440
agagatgagt ccttctacgg agacttcgtc ctcgcgtacg atatactcct taaggtagac 1500
cacatctacg acgcaatcag aaattacgtg acacaaaagc cgtacagcaa ggacaagttc 1560
aaactctact tccagaaccc ccagttcatg ggcggctggg acaaggacaa ggaaacggat 1620
tacagggcta cgatcctgag gtatggttca aaatactact tggcgattat ggacaagaag 1680
tacgccaagt gtctccagaa gattgacaaa gacgatgtca atggcaatta tgagaagatc 1740
aactacaagc tgcttccggg tccgaacaag atgctcccaa aggttttctt cagcaagaaa 1800
tggatggcct actataaccc aagcgaggac atccagaaga tttataagaa cggtacgttc 1860
aagaagggcg acatgttcaa tcttaacgac tgtcacaagc tgatcgactt cttcaaagac 1920
tcaattagcc ggtacccaaa gtggtctaac gcctatgact tcaacttttc ggaaaccgag 1980
aagtacaagg atatagccgg attttataga gaggtggaag agcagggcta caaggtgtca 2040
ttcgagtccg ccagcaagaa ggaagtggac aagctcgtgg aagagggtaa gctctacatg 2100
ttccagattt ataataaaga ctttagcgat aagagccacg ggacacctaa tctccacaca 2160
atgtatttca agctgctctt cgacgagaat aaccacggcc aaatcaggtt gtcaggaggg 2220
gctgaactct tcatgcggcg cgctagcctt aagaaggagg agcttgtagt ccaccctgcg 2280
aatagtccaa ttgcgaataa gaacccggac aatcctaaaa agactacaac attgagctac 2340
gacgtgtaca aggataagag gttttccgag gatcagtacg agctccacat cccgattgcg 2400
atcaacaagt gcccaaagaa tattttcaag ataaacacag aggtgcgtgt actcctgaag 2460
catgacgaca atccttacgt cattgggatt gatcggggcg agaggaacct cctctatatt 2520
gtggtggtgg acgggaaggg gaacatagtc gaacagtact cccttaacga aataattaac 2580
aatttcaacg gcatccgtat caagaccgac taccattcgt tgctggacaa gaaggagaag 2640
gagagatttg aggcgcggca aaattggaca agtatcgaga acatcaagga actcaaagca 2700
ggttatatct ctcaagttgt gcataagata tgcgagctgg ttgagaagta tgacgcagtg 2760
atcgctcttg aggacctcaa ctcgggcttt aagaattcta gagttaaagt ggagaagcag 2820
gtctatcaaa agttcgagaa gatgcttata gataagctca actacatggt cgataagaaa 2880
tcgaacccat gtgccaccgg cggcgcactc aaaggttacc aaataacaaa caaattcgag 2940
tccttcaaat cgatgagtac tcagaatggg ttcatatttt atataccggc gtggcttacg 3000
tctaagatcg acccgtcaac tggttttgtc aacctgttga agacgaaata cacgtccatt 3060
gccgattcga aaaagttcat atctagtttt gatcgtatta tgtacgtccc agaggaagat 3120
cttttcgagt ttgctctcga ctacaaaaac ttttcgcgga ccgatgcgga ttacattaaa 3180
aaatggaaac tctattcgta cggcaacaga atcaggattt ttcgcaaccc taagaagaat 3240
aacgtctttg attgggagga agtttgcttg actagcgcgt acaaggagct ctttaataag 3300
tatggcatta actaccaaca gggtgatatc agagcactgc tttgcgaaca atctgacaag 3360
gctttctact catccttcat ggctttgatg agcctgatgc tccagatgag aaattcaatt 3420
acaggcagaa ccgacgtgga tttcttgatc tccccggtta aaaattctga tggcatcttt 3480
tacgatagca ggaactatga agcgcaagag aatgcgattc tgccaaaaaa tgcagacgcc 3540
aacggtgcct ataacatcgc caggaaagtc ctgtgggcga tcggccagtt caaaaaggcc 3600
gaagacgaaa aattggacaa ggtcaaaatc gctatcagca acaaagagtg gctggagtat 3660
gctcagacat ccgtaaagca t 3681
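
The lengths declared in the listing can be cross-checked against each other, and the position of SEQ ID NO: 16 inside SEQ ID NO: 11 can be recovered by plain string matching. The sketch below is illustrative only: the helper names `clean` and `locate` are hypothetical, the fragments are copied from the listing together with the 1-based position of their first base, and the figure of 1566 residues for SEQ ID NO: 10 is read off its position markers rather than from a `<211>` field shown in this section.

```python
def clean(fragment: str) -> str:
    """Drop the spaces that group bases in tens in the listing."""
    return "".join(fragment.split())


def locate(fragment_start: int, fragment: str, query: str) -> int:
    """Return the 1-based position of `query` in the full sequence."""
    offset = fragment.find(query)
    if offset < 0:
        raise ValueError("query not found in fragment")
    return fragment_start + offset


# SEQ ID NO: 11 fragments starting at nucleotides 961 and 4621.
SEQ11_FROM_961 = clean(
    "cctgaatcgt caaagctcga gaaattcacc aactgttatt cgttgagcaa aacactgcgg"
)
SEQ11_FROM_4621 = clean(
    "ctggagtatg ctcagacatc cgtaaagcat aagcgtcctg ctgccaccaa aaaggccgga "
    "caggctaaga aaaagaagtg a"
)

# First 30 and last 21 nucleotides of SEQ ID NO: 16.
SEQ16_FIRST_30 = clean("tcaaagctcg agaaattcac caactgttat")
SEQ16_LAST_21 = clean("gctcagacat ccgtaaagca t")

# Lengths taken from the listing (see the lead-in for SEQ ID NO: 10).
LEN_SEQ10_AA, LEN_SEQ11_NT = 1566, 4701
LEN_SEQ15_AA, LEN_SEQ16_NT = 1227, 3681

first = locate(961, SEQ11_FROM_961, SEQ16_FIRST_30)
last = locate(4621, SEQ11_FROM_4621, SEQ16_LAST_21) + len(SEQ16_LAST_21) - 1

print("SEQ ID NO: 16 spans SEQ ID NO: 11 positions", first, "to", last)
print("span length equals SEQ ID NO: 16 length:", last - first + 1 == LEN_SEQ16_NT)
print("SEQ ID NO: 16 = 3 x SEQ ID NO: 15:", LEN_SEQ16_NT == 3 * LEN_SEQ15_AA)
print("SEQ ID NO: 11 = 3 x SEQ ID NO: 10 + stop:", LEN_SEQ11_NT == 3 * LEN_SEQ10_AA + 3)
```

Under these assumptions, SEQ ID NO: 16 carries neither a start nor a stop codon of its own, while SEQ ID NO: 11 ends with a single stop codon.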

Claims (10)

1. An isolated fusion polypeptide, wherein the fusion polypeptide comprises a CRISPR nuclease and a 5'→ 3' exonuclease.
2. The polypeptide of claim 1, wherein the 5'→ 3' exonuclease digests double-stranded DNA (dsDNA) and/or single-stranded DNA (ssDNA).
3. The polypeptide of claim 1 or 2, wherein the 5'→ 3' exonuclease is a T5 exonuclease.
4. The polypeptide of claim 3, wherein the T5 exonuclease:
i) comprises an amino acid sequence having at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence identity to SEQ ID NO: 3; or
ii) is encoded by a nucleotide sequence having at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence identity to SEQ ID NO: 4.
5. The polypeptide of any one of claims 1-4, wherein the CRISPR nuclease:
i) is Cas9 or Cas12a;
ii) has the amino acid sequence of SEQ ID NO: 8 or 15; or
iii) is encoded by the nucleotide sequence of SEQ ID NO: 9 or 16.
6. The polypeptide of any one of claims 1 to 5, wherein the 5'→ 3' exonuclease is located at the N-terminus and/or C-terminus of the CRISPR nuclease.
7. The polypeptide of any one of claims 1-6, comprising the amino acid sequence of SEQ ID NO: 1 or 10.
8. An isolated polynucleotide encoding the polypeptide of any one of claims 1-7, e.g., comprising the nucleotide sequence of SEQ ID NO: 2 or 11.
9. A genome editing system comprising at least one of the following i) to v):
i) the fusion polypeptide of any one of claims 1-7 and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of any of claims 1-7, and a guide RNA;
iii) the fusion polypeptide of any one of claims 1-7, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of any of claims 1-7, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding the fusion polypeptide of any of claims 1-7 and a nucleotide sequence encoding a guide RNA,
for example, wherein the guide RNA is a sgRNA.
10. A method of genetically modifying a cell, comprising introducing the genome editing system of claim 9 into a cell, e.g., a plant cell.
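
Claim 4 is framed in terms of sequence identity thresholds (at least 80%, 90%, 95%, 99%, or 100%). This section does not state how identity is computed, so the following is only a simplified sketch under stated assumptions: the two sequences are already aligned to equal length with `-` as the gap character, identity is counted as identical non-gap columns divided by the alignment length (one of several conventions in use), and the names `percent_identity` and `thresholds_met` are hypothetical. The example compares the SEQ ID NO: 14 peptide with an invented single-substitution variant.

```python
THRESHOLDS = (80, 90, 95, 99, 100)  # percentages listed in claim 4


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def thresholds_met(identity: float) -> list:
    """Return the claim 4 thresholds that a given identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


reference = "SGSETPGTSESATPES"  # SEQ ID NO: 14 in one-letter code
variant = "SGSETPGTSESATPEA"    # hypothetical variant, last residue changed

identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

With 15 of 16 positions identical, the hypothetical variant would satisfy the 80% and 90% thresholds but not the 95% threshold under this convention. A real comparison would first align the sequences with an alignment tool, and the result can depend on the alignment parameters and on how gaps are counted.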
CN201911351725.4A 2019-12-24 2019-12-24 Improved genome editing system Pending CN113025597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911351725.4A CN113025597A (en) 2019-12-24 2019-12-24 Improved genome editing system

Publications (1)

Publication Number Publication Date
CN113025597A true CN113025597A (en) 2021-06-25

Family

ID=76452485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911351725.4A Pending CN113025597A (en) 2019-12-24 2019-12-24 Improved genome editing system

Country Status (1)

Country Link
CN (1) CN113025597A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114317492A (en) * 2021-12-06 2022-04-12 Peking University Modified artificial nuclease system and application thereof
CN116262927A (en) * 2021-12-13 2023-06-16 Institute of Microbiology, Chinese Academy of Sciences Method for regulating gene expression based on CRISPR/Cas system and application thereof
WO2023165613A1 (en) * 2022-03-03 2023-09-07 Tsinghua University Use of 5'→3' exonuclease in gene editing system, and gene editing system and gene editing method
CN116262927B (en) * 2021-12-13 2024-04-26 Institute of Microbiology, Chinese Academy of Sciences Method for regulating gene expression based on CRISPR/Cas system and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FELICITY ALLEN et al.: "Predicting the mutations generated by repair of Cas9-induced double-strand breaks", Nature Biotechnology, vol. 37, no. 1, 31 January 2019 (2019-01-31), pages 64 - 82 *

Similar Documents

Publication Publication Date Title
AU2020223370B2 (en) Enzymes with RuvC domains
CN107027313B (en) Methods and compositions for multiplex RNA-guided genome editing and other RNA techniques
US11702643B2 (en) System and method for genome editing
US20180273961A1 (en) A CRISPR/Cas9 SYSTEM FOR HIGH EFFICIENT SITE-DIRECTED ALTERING OF PLANT GENOMES
JP2020508046A (en) Genome editing system and method
US10913941B2 (en) Enzymes with RuvC domains
CN111742051A (en) Extended single guide RNA and uses thereof
CN109689875B (en) Genome editing system and method
CN110157726A (en) The method of Plant Genome fixed point replacement
WO2020224611A1 (en) Improved gene editing system
CA3177828A1 (en) Enzymes with ruvc domains
WO2021178934A1 (en) Class ii, type v crispr systems
CN117187220A (en) Adenine deaminase and its use in base editing
CN113025597A (en) Improved genome editing system
CN112805385B (en) Base editor based on human APOBEC3A deaminase and application thereof
CA3228222A1 (en) Class ii, type v crispr systems
JP7361109B2 (en) Systems and methods for C2c1 nuclease-based genome editing
WO2021004456A1 (en) Improved genome editing system and use thereof
US20220298494A1 (en) Enzymes with ruvc domains
CN107446031B (en) Plant glutelin transport and storage related protein OsVHA-E1, and coding gene and application thereof
US20220220460A1 (en) Enzymes with ruvc domains
CN112980839B (en) Method for creating new high-amylose rice germplasm and application thereof
KR20220150363A (en) Improved Cytosine Base Editing System
WO2021226369A1 (en) Enzymes with ruvc domains
CN107075526A (en) Plant with engineering endogenous gene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination