US20230340437A1 - Modified nucleases - Google Patents

Modified nucleases

Publication number
US20230340437A1
Authority
US
United States
Prior art keywords
polypeptide
terminus
nuclease
nls
composition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/141,363
Inventor
Roland Baumgartner
Tanya Warnecke
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Celyntra Therapeutics Sa
Original Assignee
Artisan Development Labs Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Artisan Development Labs Inc
Priority to US18/141,363
Assigned to ARTISAN DEVELOPMENT LABS, INC.: assignment of assignors interest (see document for details). Assignors: BAUMGARTNER, ROLAND; WARNECKE, TANYA
Assigned to FIRST-CITIZENS BANK & TRUST COMPANY, AS AGENT: security interest (see document for details). Assignor: ARTISAN DEVELOPMENT LABS, INC.
Publication of US20230340437A1
Assigned to CELYNTRA THERAPEUTICS SA: assignment of assignors interest (see document for details). Assignor: ARTISAN (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC
Assigned to ARTISAN (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC: assignment of assignors interest (see document for details). Assignor: ARTISAN DEVELOPMENT LABS, INC.

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/09 Fusion polypeptide containing a localisation/targetting motif containing a nuclear localisation signal
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • Nucleic acid-guided nucleases have become important tools for research and genome engineering. The applicability of these tools can be limited by the sequence specificity requirements, expression, or delivery issues.
  • FIG. 1 shows a diagram of MAD7 comprising one or more nuclear localization signals (NLS).
  • FIG. 1 discloses “His6” as SEQ ID NO: 423.
  • FIG. 2 shows editing frequency at the DNMT1 locus in, and post-transfection cell viability of, T-cell leukemic cells following treatment comprising one or more guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 3 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SE electroporation buffer.
  • FIG. 4 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SF electroporation buffer.
  • FIG. 5 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SG electroporation buffer.
  • FIG. 6 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs.
  • FIG. 7 shows editing frequency by type at eight loci in T-cell leukemic cells using multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 8 shows a comparison of editing efficiency between T-cell leukemic cells treated with MAD7 comprising one or more guide nucleic acids targeting the DNMT1 locus as compared to a control guide nucleic acid binned by editing frequency.
  • FIG. 9 shows editing frequency by PAM motif in T-cell leukemic cells using multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 10 A shows sequence logo plots for multiple guide nucleic acids binned by editing frequency in T-cell leukemic cells when complexed with MAD7 comprising one or more NLS.
  • FIG. 10 B shows nucleotide and dinucleotide frequency for multiple guide nucleic acids binned by editing frequency in T-cell leukemic cells when complexed with MAD7 comprising one or more NLS.
  • FIG. 11 shows trinucleotide AAA or UUU frequency binned by editing frequency in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 12 shows editing frequency for both INDELs and frameshift mutations at eight loci in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 13 shows the correlation between INDEL frequency in the gNA validation experiment versus INDEL formation in the gNA screen experiment.
  • FIG. 14 shows the proportion of frameshift to INDELs at eight loci in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 15 shows INDEL frequency for gNAs comprising representative spacer sequences complexed with MAD7 comprising one or more NLS in T-cell leukemic cells at predicted off-target sites.
  • FIG. 15 discloses SEQ ID NOS 424-427, 427-429 and 429-454, respectively, in order of appearance.
  • FIG. 16 shows INDEL frequency for gNAs comprising representative spacer sequences complexed with MAD7 comprising one or more NLS in T-cell leukemic cells at predicted off-target sites.
  • FIG. 16 discloses SEQ ID NOS 455-484, 453-454 and 485-487, respectively, in order of appearance.
  • FIG. 17 shows INDEL frequency at the AAVS1 locus in T-cell leukemic cells following treatment with a gNA:MAD7 complex.
  • FIG. 18 shows GFP insertion efficiency at the AAVS1 locus and cell viability following treatment for multiple primer constructs.
  • FIG. 19 shows GFP insertion efficiency at the AAVS1 locus with increasing concentrations of donor template (e.g., HDRT) and variable homology arm length.
  • FIG. 20 shows CAR insertion efficiency at the AAVS1 locus and cell viability with increasing concentrations of donor template and variable homology arm length.
  • FIG. 21 shows CAR insertion efficiency (A) at the AAVS1 locus and cell viability (B) in primary T-cells.
  • CRISPR is an abbreviation of Clustered Regularly Interspaced Short Palindromic Repeats. In a palindromic repeat, the sequence of nucleotides is the same in both directions. Each of these palindromic repetitions is followed by short segments of spacer DNA. Small clusters of Cas (CRISPR-associated system) genes are located next to CRISPR sequences.
  • the CRISPR/Cas system is a prokaryotic immune system that can confer resistance to foreign genetic elements such as those present within plasmids and phages providing the prokaryote a form of acquired immunity. RNA harboring a spacer sequence assists Cas (CRISPR-associated) proteins to recognize and cut exogenous DNA.
  • CRISPR sequences are found in approximately 50% of bacterial genomes and nearly 90% of sequenced archaea. Evolution has selected for efficient and robust metabolic and regulatory networks that prevent unnecessary metabolite biosynthesis and optimally distribute resources to maximize overall cellular fitness.
  • the complexity of these networks with limited approaches to understand their structure and function and the ability to re-program cellular networks to modify these systems for a diverse range of applications has complicated advances in this space.
  • Certain approaches to re-program cellular networks are directed to modifying single genes of complex pathways but as a consequence of modifying single genes, unwanted modifications to the genes or other genes can result, getting in the way of identifying changes necessary to achieve a sought-after endpoint as well as complicating the endpoint sought by the modification.
  • CRISPR-Cas driven genome editing and engineering has dramatically impacted biology and biotechnology in general.
  • CRISPR-Cas editing systems require a polynucleotide-guided nuclease, a guide nucleic acid (gNA), e.g., a guide RNA (gRNA), that directs the nuclease to cut a specific region of the genome, and, optionally, a donor DNA cassette (also referred to herein as a donor template or editing sequence) that can be used to repair the cut dsDNA and thereby incorporate programmable edits at the site of interest.
  • “modulating” and “manipulating” of genome editing can mean an increase, a decrease, upregulation, downregulation, induction, a change in editing activity, a change in binding, a change in cleavage, or the like, of one or more targeted genes or gene clusters of certain embodiments disclosed herein.
  • primers used herein for preparation per conventional techniques can include sequencing primers and amplification primers.
  • plasmids and oligomers used in conventional techniques can include synthesized oligomers and oligomer cassettes.
  • nucleic acid-guided nuclease systems and methods of use are provided.
  • a nuclease system can include transcripts and other elements involved in the expression of an engineered nuclease disclosed herein, which can include sequences encoding a novel engineered nucleic acid-guided nuclease protein and a guide sequence (gRNA) or a novel gRNA as disclosed herein.
  • nucleic acid-guided nuclease systems can include at least one CRISPR-associated nucleic acid-guided nuclease construct, the disclosure of which is provided herein.
  • nucleic acid-guided nuclease systems can include at least one known guide sequence (gRNA) or at least one novel gRNA, such as a single gRNA or a dual gRNA.
  • an engineered nucleic acid-guided nuclease of the instant invention can be used in systems for editing a gene of interest in humans or other species.
  • novel engineered nucleic acid-guided nuclease constructs disclosed herein can be created for targeting of a targeted gene and/or increased efficiency and/or accuracy of targeted gene editing in a subject.
  • Cas12a nuclease recognizes T-rich protospacer adjacent motif (PAM) sequences (e.g., 5′-TTTN-3′ (AsCas12a, LbCas12a) and 5′-TTN-3′ (FnCas12a)), whereas the comparable sequence for SpCas9 is NGG.
  • the PAM sequence of Cas12a is located at the 5′ end of the target DNA sequence, where it is at the 3′ end for Cas9.
  • Cas12a is capable of cleaving DNA distal to its PAM around the +18/+23 position of the protospacer. This cleavage creates a staggered DNA overhang (e.g.
  • Cas9 cleaves close to its PAM after the 3′ position of the protospacer at both strands and creates blunt ends.
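The PAM orientation described above (PAM at the 5′ end of the target for Cas12a, at the 3′ end for Cas9) can be illustrated with a minimal Python sketch. The function names, the spacer lengths (21 nt for Cas12a, 20 nt for Cas9), and the example sequence are illustrative assumptions, not values taken from this disclosure. The sketch scans only the strand it is given and does not model cleavage positions.

```python
import re


def find_cas12a_sites(seq, spacer_len=21):
    """Find candidate Cas12a sites on the given strand.

    A site is a 5'-TTTN-3' PAM followed immediately (3' of the PAM) by the
    protospacer. spacer_len=21 is an illustrative assumption.
    Returns a list of (pam_start, pam, protospacer) tuples.
    """
    seq = seq.upper()
    sites = []
    # The lookahead lets overlapping PAM candidates be reported.
    for m in re.finditer(r"(?=(TTT[ACGT]))", seq):
        proto_start = m.start() + 4
        proto = seq[proto_start:proto_start + spacer_len]
        if len(proto) == spacer_len:  # skip PAMs too close to the 3' end
            sites.append((m.start(), m.group(1), proto))
    return sites


def find_cas9_sites(seq, spacer_len=20):
    """Find candidate SpCas9 sites on the given strand.

    Here the protospacer comes first and the NGG PAM sits at its 3' end.
    spacer_len=20 is an illustrative assumption.
    Returns a list of (pam_start, pam, protospacer) tuples.
    """
    seq = seq.upper()
    sites = []
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start >= spacer_len:  # need a full protospacer 5' of the PAM
            sites.append((pam_start, m.group(1), seq[pam_start - spacer_len:pam_start]))
    return sites


if __name__ == "__main__":
    example = "GC" + "TTTA" + "GACCATGCAGTCAGCTAGCAT" + "CC"  # hypothetical sequence
    for pam_start, pam, proto in find_cas12a_sites(example):
        print(pam_start, pam, proto)
```

A fuller search would also scan the reverse complement; that step is omitted here to keep the orientation contrast visible.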
  • creating altered recognition of nucleases can provide an improvement over Cas9 or Cas12a to improve accuracy.
  • Cas12a is guided by a single crRNA and does not require a tracrRNA, resulting in a shorter gRNA sequence than the sgRNA used by Cas9.
  • the modified Cas12a nucleases provided herein can also function with a dual gRNA.
  • Cas12a displays additional ribonuclease activity that functions in crRNA processing.
  • Cas12a is used as an editing tool for different species (e.g. S. cerevisiae ), allowing the use of an alternative PAM sequence compared with the one recognized by CRISPR/Cas9.
  • Novel nucleases disclosed herein can further recognize the same or alternative PAM sequences. These novel nucleases can provide an alternative system for multiplex genome editing as compared with known multiplex approaches and can be used as an improved system in mammalian gene editing.
  • Cas12a-like nucleases and engineered gRNAs disclosed herein are contemplated for use in bacteria, and other prokaryotes.
  • engineered designer nucleases are contemplated for use in eukaryotes such as yeast and mammals, e.g., humans, as well as in birds and fish, or cells derived from same.
  • off-targeting rates for nuclease constructs disclosed herein can be reduced compared to a control, e.g., a native sequence, for improved editing. Off-targeting rates can be readily tested.
  • nuclease constructs disclosed herein can share conserved encoded motifs of known nucleases. In other embodiments, nuclease constructs disclosed herein do not share conserved encoded peptide motifs with known nucleases.
  • the CRISPR nuclease comprises a Type V nuclease. In certain embodiments, provided herein are compositions, methods, and/or kits wherein the CRISPR nuclease comprises a Type V-A, V-B, V-C, V-D, or V-E CRISPR nuclease.
  • compositions, methods, and/or kits wherein the CRISPR nuclease comprises a Type V-A nuclease.
  • Naturally occurring type V-A CRISPR nucleases comprise a RuvC-like nuclease domain but lack an HNH endonuclease domain, and recognize a 5′ T-rich PAM located immediately upstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate.
  • These CRISPR nucleases cleave a double-stranded DNA to generate a staggered double-stranded break rather than a blunt end.
  • the cleavage site is distant from the PAM site (e.g., separated by at least 10, 11, 12, 13, 14, or 15 nucleotides downstream from the PAM on the non-target strand and/or separated by at least 15, 16, 17, 18, or 19 nucleotides upstream from the sequence complementary to PAM on the target strand).
  • a type V-A CRISPR nuclease comprises Cpf1.
  • Cpf1 proteins are known in the art and are described, e.g., in U.S. Pat. Nos. 9,790,490 and 10,113,179.
  • Cpf1 orthologs can be found in various bacterial and archaeal genomes.
  • the Cpf1 protein is derived from Francisella novicida U112 (Fn), Acidaminococcus sp. BV3L6 (As), Lachnospiraceae bacterium ND2006 (Lb), Lachnospiraceae bacterium MA2020 (Lb2), Candidatus Methanoplasma termitum (CMt), Moraxella bovoculi 237 (Mb), Porphyromonas crevioricanis (Pc), Prevotella disiens (Pd), Francisella tularensis 1, Francisella tularensis subsp.
  • a type V-A CRISPR nuclease comprises AsCpf1 or a variant thereof.
  • a type V-A CRISPR nuclease comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
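The percent-identity thresholds recited above can be made concrete with a minimal Python sketch. It assumes the two sequences have already been aligned (equal length, with "-" marking gaps) and defines identity as identical residues divided by alignment columns. That definition and the function names are assumptions for illustration; the disclosure does not specify an alignment tool or a denominator, and other conventions give different values.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed definition: identical residue pairs / alignment columns,
    ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # c-myc NLS (SEQ ID NO:7) versus the myc-related NLS (SEQ ID NO:21)
    print(round(percent_identity("PAAKRVKLD", "PAAKKKKLD"), 1))
```

Under this definition the two nine-residue NLS sequences used in the example differ at two positions, so they would satisfy a 70% threshold but not an 80% one.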
  • a type V-A CRISPR nuclease comprises FnCpf1 or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Prevotella bryantii Cpf1 (PbCpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Proteocatella sphenisci Cpf1 (PsCpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Anaerovibrio sp. RM50 Cpf1 (As2Cpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Moraxella caprae Cpf1 (McCpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Lachnospiraceae bacterium COE1 Cpf1 (Lb3Cpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease comprises Eubacterium coprostanoligenes Cpf1 (EcCpf1) or a variant thereof.
  • a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918.
  • a type V-A CRISPR nuclease is not Cpf1. In certain embodiments, a type V-A CRISPR nuclease is not AsCpf1.
  • a type V-A CRISPR nuclease comprises a Type V-A nuclease described in U.S. Pat. No. 9,982,279.
  • a Type VA CRISPR nuclease polypeptide used in compositions and methods herein can be represented by a polypeptide that includes a sequence that has at least 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, or 100% sequence identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% sequence identity with SEQ ID NO: 1, wherein the Type VA CRISPR nuclease polypeptide further comprises at least one, two, three, four, five or six nuclear localization sequences (NLS), each of which can be at or near the amino end or carboxy end of the CRISPR nuclease polypeptide; and/or one or more purification tags; in addition, a cleavage sequence can be provided to remove portions of a protopeptide.
  • the term “at or near” an N-terminus or a C-terminus includes where the nearest amino acid of the NLS to the N- or C-terminus is within 300 amino acids, in some cases within 200 amino acids, from the N- or C-terminus of the polypeptide (e.g., a core polypeptide such as one of the CRISPR nucleases described herein, to which the NLS or NLSs is attached).
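The "at or near" definition above can be expressed as a simple distance check. The following Python sketch is one possible reading, assuming that distance is counted as the number of residues between a terminus of the full polypeptide and the nearest residue of the NLS; the function name, the returned fields, and the placeholder core sequence are hypothetical.

```python
def nls_positions(polypeptide, nls, max_distance=300):
    """Locate each occurrence of `nls` and report its distance to each terminus.

    Assumption: distance is the count of residues lying between the terminus
    and the nearest NLS residue, so an NLS starting at the first residue has
    an N-terminal distance of 0.
    """
    hits = []
    start = polypeptide.find(nls)
    while start != -1:
        end = start + len(nls)                 # exclusive end index
        dist_n = start                         # residues before the NLS
        dist_c = len(polypeptide) - end        # residues after the NLS
        hits.append({
            "start": start,
            "dist_n": dist_n,
            "dist_c": dist_c,
            "near_n": dist_n <= max_distance,
            "near_c": dist_c <= max_distance,
        })
        start = polypeptide.find(nls, start + 1)
    return hits


if __name__ == "__main__":
    sv40_nls = "PKKKRKV"          # SEQ ID NO:5 as recited in this document
    core = "A" * 1000             # placeholder core polypeptide, not a real nuclease
    construct = "M" + sv40_nls + core
    print(nls_positions(construct, sv40_nls))
```

Passing max_distance=200 applies the narrower "in some cases within 200 amino acids" reading.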
  • a Type V CRISPR nuclease polypeptide e.g., Type Va CRISPR polypeptide
  • a CRISPR nuclease polypeptide comprising one or more NLSs and, in some cases, a purification tag and/or a cleavage site, comprises a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112.
  • a Type V, e.g., VA CRISPR nuclease polypeptide comprises at least 1-30, 1-20, 1-15, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 2-30, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 3-30, 3-20, 3-15, 3-10, 3-9, 3-8, 3-7, 3-6, or 3-5, preferably 1-10, more preferably 2-10, even more preferably 3-10 NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide, in preferred embodiments at or near the N-terminus.
  • At least two, or at least three, of the NLSs have different mechanisms, that is, different mechanisms by which they localize an attached polypeptide to a nucleus.
  • Such mechanisms are well-known in the art; see, e.g., Lu et al. Cell Commun Signal (2021) 19:60 https://doi.org/10.1186/s12964-021-00741-y.
  • Suitable NLS, purification tag, and cleavage site sequences can be as described elsewhere herein, e.g., in sections labeled Nuclear Localization Signals, Purification Tags, and Cleavage Sites.
  • Nucleotide sequences coding for SEQ ID NO: 1 can include sequences with less than 99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, or 40% sequence identity with SEQ ID NO: 22, in preferred embodiments less than 75% sequence identity.
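A coding sequence can diverge substantially at the nucleotide level while encoding the same polypeptide, because most amino acids are specified by more than one codon. The Python sketch below illustrates this with two hypothetical coding sequences for the seven-residue SV40 NLS (PKKKRKV); they are toy examples and are unrelated to SEQ ID NO: 22 or SEQ ID NOs: 23-42. The standard genetic code and ungapped, position-by-position comparison are assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) to protein."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a, b):
    """Percent of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical synonymous encodings of PKKKRKV.
CODING_A = "".join(["CCA", "AAA", "AAA", "AAA", "CGT", "AAA", "GTT"])
CODING_B = "".join(["CCG", "AAG", "AAG", "AAG", "AGG", "AAG", "GTG"])

if __name__ == "__main__":
    print(translate(CODING_A), translate(CODING_B))
    print(round(nucleotide_identity(CODING_A, CODING_B), 1))
```

Both toy sequences translate to the same peptide, yet their nucleotide identity falls below the 75% figure named above, which is the kind of divergence a recoded polynucleotide can show.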
  • a nucleotide sequence coding for SEQ ID NO: 1 can also include nucleic acid sequences coding for one or more NLS at the N-terminus and/or C-terminus, as described herein, and/or a tag such as a purification tag at the N-terminus, as described herein.
  • compositions comprising a first polynucleotide coding for a polypeptide comprising a nucleic acid-guided nuclease comprising a CRISPR Type V nuclease polypeptide, wherein the polynucleotide has less than 75% sequence identity to SEQ ID NO: 22, such as wherein the nuclease polypeptide comprises at least 1, 2, 3, 4, or 5 NLSs, wherein each of the NLSs is at or near the N-terminus or the C-terminus of the nuclease polypeptide.
  • NLSs can be any of those described herein.
  • the first polynucleotide can comprise a sequence coding for a purification tag, such as a purification tag described herein, and/or cleavage site, such as a cleavage site described herein.
  • the first polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112, such as SEQ ID NO: 109, or SEQ ID NO: 110, or SEQ ID NO: 111, or SEQ ID NO: 112.
  • the first polynucleotide comprises a sequence at least 50, 60, 70, 80, 90, 95, 97, or 99% identical, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 113.
  • the composition further comprises a second polynucleotide coding for a gNA or portion thereof, wherein the gNA, e.g., gRNA, comprises a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA is compatible with the Type V CRISPR nuclease.
  • the first and second polynucleotides are the same.
  • the composition can further comprise a third polynucleotide comprising a donor template.
  • a vector comprising one of the polynucleotide compositions of this paragraph.
  • a cell comprising one of the polynucleotide compositions of this paragraph, e.g., a human cell, such as an immune cell, for example a T cell, or a stem cell, such as an iPSC.
  • a method comprising inserting any one of the polynucleotide compositions of this paragraph into a cell. In certain embodiments inserting the composition comprises electroporation.
  • Exemplary nucleotide sequences coding for SEQ ID NO: 1 can include, e.g., SEQ ID NOs: 23-42:
  • Nucleic acid-guided nucleases can encompass a native sequence, an engineered sequence, or engineered nucleotide sequences of synthesized variants.
  • Non-limiting examples of types of engineering that can be done to obtain a non-naturally occurring nuclease system are as follows.
  • Engineering can include codon optimization to facilitate expression or improve expression in a host cell, such as a heterologous host cell.
  • Engineering can reduce the size or molecular weight of the nuclease in order to facilitate expression or delivery.
  • Engineering can alter PAM selection in order to change PAM specificity or to broaden the range of recognized PAMs.
  • Engineering can alter, increase, or decrease stability, processivity, specificity, or efficiency of a targetable nuclease system.
  • Engineering can alter, increase, or decrease protein stability.
  • a non-naturally occurring nucleic acid sequence can be an engineered sequence or engineered nucleotide sequences of synthesized variants. Such non-naturally occurring nucleic acid sequences can be amplified, cloned, assembled, synthesized, generated from synthesized oligonucleotides or dNTPs, or otherwise obtained using methods known by those skilled in the art.
  • examples of non-naturally occurring nucleic acid-guided nucleases disclosed herein can include those nucleic acid-guided nucleases with engineered polypeptide sequences (e.g., SEQ ID NOs: 2-4) and those nucleotide sequences of synthesized variants (e.g., SEQ ID NOs: 43-63).
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO:2.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to amino acid sequence of SEQ ID NO:2.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 3.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to amino acid sequence of SEQ ID NO: 3.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 4.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to amino acid sequences of SEQ ID NO: 4.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to any one of SEQ ID NOs: 109-112.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to amino acid sequence of any one of SEQ ID NOs: 109-112.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 110.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 110.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 111.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 111.
  • a nucleic acid-guided nuclease e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 112.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 112.
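The identity thresholds recited above can be illustrated with a minimal sketch. The text does not state which alignment algorithm or denominator is used, so this example assumes two sequences that have already been aligned (equal length, `-` for gaps) and counts identity over all aligned columns; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumptions (not taken from the source text): both strings come from
    the same alignment, have equal length, and use '-' for gaps. Columns
    in which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so any threshold comparison should state the convention used.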
  • each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies.
  • the engineered nuclease comprises 4 NLSs.
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO:5); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO:7) or RQRRNELKRSP (SEQ ID NO:8); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO:9); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO:10) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO:11) and PPKKARED (SEQ ID NO:12) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO:13) of human p53; the sequence SALI AP (SEQ ID NO:14) of mouse c-abl IV; the sequences DRLRR (SEQ ID
  • a nuclease provided herein comprises at least one myc-related NLS comprising the sequence PAAKKKKLD (SEQ ID NO:21); in certain embodiments the myc-related NLS is at the N-terminus of the nuclease.
  • a nuclease provided herein comprises at least one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6); in certain embodiments the nucleoplasmin NLS is at the C-terminus of the nuclease.
  • a nuclease provided herein comprises at least one, or at least two, SV40 NLS sequences comprising the sequence PKKKRKV (SEQ ID NO:5); in certain embodiments the SV40 NLSs are at the C-terminus of the nuclease. In certain embodiments, a nuclease provided herein comprises 1 NLS at the N-terminus and 3 NLSs at the C-terminus, for example 1 myc-related NLS at the N-terminus and one nucleoplasmin NLS and two SV40 NLSs at the C-terminus.
  • a nuclease provided herein comprises 1 myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21) and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus.
  • the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell.
  • strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors.
  • Detection of accumulation in the nucleus may be performed by any suitable technique.
  • a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI).
  • Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g., nucleic acid-guided nuclease activity assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one myc-related NLS comprising the sequence PAAKKKKLD (SEQ ID NO:21); in certain embodiments the myc-related NLS is at the N-terminus of the nuclease.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6); in certain embodiments the nucleoplasmin NLS is at the C-terminus of the nuclease.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one, or at least two, SV40 NLS sequences comprising the sequence PKKKRKV (SEQ ID NO: 5); in certain embodiments the SV40 NLSs are at the C-terminus of the nuclease.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and one NLS at the N-terminus and three NLSs at the C-terminus, for example 1 myc-related NLS at the N-terminus and one nucleoplasmin NLS and two SV40 NLSs at the C-terminus.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4, and one myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21) and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, and one, two, or three NLS at the N-terminus and one, two, or three NLS at the C-terminus.
  • a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, and one myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21) and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus.
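The 1 + 3 NLS arrangement described above (one myc-related NLS at the N-terminus; one nucleoplasmin NLS and two SV40 NLSs at the C-terminus) can be sketched as simple string assembly. The three NLS sequences are the ones recited in the text; the core sequence is a hypothetical placeholder, the relative order of the C-terminal NLSs is an assumption, and any linkers a real construct might need are omitted.

```python
MYC_RELATED_NLS = "PAAKKKKLD"            # SEQ ID NO:21 as recited in the text
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"   # SEQ ID NO:6 as recited in the text
SV40_NLS = "PKKKRKV"                     # SEQ ID NO:5 as recited in the text


def add_nls(core: str, n_terminal: list, c_terminal: list) -> str:
    """Concatenate N-terminal NLSs, the core polypeptide, and C-terminal NLSs.

    Illustrative only: linkers and tags are not modeled.
    """
    return "".join(n_terminal) + core + "".join(c_terminal)


def count_nls(polypeptide: str, nls: str) -> int:
    """Count non-overlapping occurrences of an NLS in a polypeptide."""
    return polypeptide.count(nls)


# Hypothetical placeholder standing in for the nuclease sequence.
placeholder_core = "M" + "GSA" * 5

construct = add_nls(
    placeholder_core,
    n_terminal=[MYC_RELATED_NLS],
    c_terminal=[NUCLEOPLASMIN_NLS, SV40_NLS, SV40_NLS],
)
```

Because each NLS is chosen independently, the same helper covers other layouts, such as a single NLS present in more than one copy.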
  • a nucleic acid-guided nuclease provided herein can comprise a tag, e.g., a purification tag, e.g. at the N-terminus.
  • tags include a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421) or Gly-8×His tag (SEQ ID NO: 422), short epitope tags such as FLAG, hemagglutinin (HA), c-myc, T7, and Glu-Glu; maltose binding protein (mbp); N-terminal glutathione S-transferase (GST); calmodulin binding peptide (CBP).
  • a nucleic acid-guided nuclease provided herein can comprise a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421), e.g., at the N-terminus.
  • Gly-6×His tags (SEQ ID NO: 421) are applied for several reasons, including: 1) a 6×His tag (SEQ ID NO: 423) can be used in protein purification to allow binding to chromatographic columns, and 2) the N-terminal glycine allows further site-specific chemical modifications that permit advanced protein engineering.
  • the Gly-6×His tag (SEQ ID NO: 421) is designed for easy removal, if desired, by digestion with Tobacco Etch Virus (TEV) protease.
  • the Gly-6×His tag (SEQ ID NO: 421) was positioned on the N-terminus.
  • Gly-6×His tags (SEQ ID NO: 421) are further described in Martos-Maldonado et al., Nat Commun. (2016) 17;9(1):3307, the disclosure of which is incorporated herein.
  • nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421).
  • nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4, a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421), and/or a TEV cleavage site at the N-terminus.
  • nucleic acid-guided nuclease having a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421) and a TEV cleavage site at the N-terminus, such as a polypeptide having at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 2.
  • nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421), and/or a TEV cleavage site at the N-terminus. Additionally or alternatively, the nuclease may comprise one or more NLS as described herein.
  • an engineered nuclease polypeptide disclosed herein can include one or more cleavage sites, which can be at or near the N-terminus or the C-terminus. Any suitable cleavage site can be used; if a plurality of cleavage sites is used, they may be the same or different.
  • a cleavage site comprises a Tobacco Etch Virus protease cleavage sequence, herein referred to as a “TEV sequence” (SEQ ID NO: 108).
  • TEV sequence can be at or near the amino terminus.
  • the cleavage sequence, e.g., TEV sequence, is located so that cleavage at the cleavage sequence leaves other additional amino acid sequences, in particular any NLS added to the original nuclease polypeptide, intact.
  • a TEV cleavage site can have the amino acid sequence ENLYFQS (SEQ ID NO: 108).
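The placement rule above (cleavage should remove the tag while leaving any added NLS intact) can be checked with a short sketch. It assumes that TEV protease cuts between the Q and S of ENLYFQS, which is the commonly reported behavior but is not stated in the text, and it uses a hypothetical tag-site-NLS-core layout with `GHHHHHH` as an assumed rendering of a Gly-6×His tag.

```python
TEV_SITE = "ENLYFQS"  # SEQ ID NO: 108 as recited in the text


def tev_cleave(polypeptide: str):
    """Split a polypeptide at its first TEV site.

    Assumption: cleavage occurs between Q and S of ENLYFQS, so the S
    remains on the retained (C-terminal) fragment. Returns a tuple
    (released, retained), or None if no site is present.
    """
    start = polypeptide.find(TEV_SITE)
    if start == -1:
        return None
    cut = start + len(TEV_SITE) - 1  # index of the S residue
    return polypeptide[:cut], polypeptide[cut:]


def nls_intact_after_cleavage(polypeptide: str, nls_sequences: list) -> bool:
    """True if every NLS copy survives on the retained fragment."""
    result = tev_cleave(polypeptide)
    if result is None:
        return True  # nothing is cleaved, so nothing is lost
    _, retained = result
    return all(retained.count(n) == polypeptide.count(n) for n in nls_sequences)


# Hypothetical layout: tag, TEV site, N-terminal NLS, placeholder core, C-terminal NLS.
tagged = "GHHHHHH" + TEV_SITE + "PAAKKKKLD" + "MGSAGSA" + "PKKKRKV"
released, retained = tev_cleave(tagged)
```

A layout that places an NLS upstream of the TEV site would fail the check, since that NLS would be released together with the tag.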
  • nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 2. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% to a polypeptide represented by SEQ ID NO: 2. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 3.
  • nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% to a polypeptide represented by SEQ ID NO: 3.
  • nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 4.
  • nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% to a polypeptide represented by SEQ ID NO: 4.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 23-105.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 23-42. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 43-65.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 43-53. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 54-58.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 59-63. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to SEQ ID NO: 43.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 64-84. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to SEQ ID NO: 64.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 64-74. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 75-79.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 80-84. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 85-105.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to SEQ ID NO: 85. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 85-95.
  • nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 96-100. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 101-105.
  • a nucleic acid sequence encoding a nucleic acid-guided nuclease can be operably linked to a promoter.
  • Such nucleic acid sequences can be linear or circular.
  • the nucleic acid sequences can be encompassed on a larger linear or circular nucleic acid sequences that comprises additional elements such as an origin of replication, selectable or screenable marker, terminator, other components of a targetable nuclease system, such as a guide nucleic acid, and/or an editing or recorder cassette as disclosed herein.
  • nucleic acid sequences can include sequences that code for at least one glycine, at least one poly-histidine tag, such as a 6×histidine tag (SEQ ID NO: 423), and/or at least one, two, three, four, or five nuclear localization signal tags, some or all of which can be on the amino side of the polypeptide, the carboxy side of the polypeptide, or a combination thereof. Larger nucleic acid sequences can be recombinant expression vectors, as are described in more detail later.
  • compositions and methods disclosed herein include a guide nucleic acid (gNA), e.g., a gRNA.
  • a guide polynucleotide also referred to as a guide nucleic acid (gNA) can complex with a compatible nucleic acid-guided nuclease, such as those disclosed herein, and can hybridize with a target nucleic acid sequence, thereby directing the nuclease to the target nucleic acid sequence.
  • a subject nucleic acid-guided nuclease capable of complexing with a guide polynucleotide can be referred to as a nucleic acid-guided nuclease that is compatible with the guide polynucleotide.
  • a guide polynucleotide capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide polynucleotide or a guide nucleic acid that is compatible with the nucleic acid-guided nuclease.
  • a polynucleotide (gRNA) disclosed herein can be split into fragments, e.g., two separate polynucleotides, in some cases encompassing a synthetic tracrRNA and crRNA.
  • Such gNAs, e.g., gRNAs can be referred to as dual or split gNA, e.g., gRNA.
  • a guide polynucleotide can be DNA.
  • a guide polynucleotide can be RNA.
  • a guide polynucleotide can include both DNA and RNA.
  • a guide polynucleotide can include modified or non-naturally occurring nucleotides.
  • the RNA guide polynucleotide can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • a guide polynucleotide can comprise a guide sequence, also referred to herein as a spacer sequence.
  • a guide (spacer) sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence, also referred to herein as a target nucleic acid sequence, to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence.
  • the degree of complementarity between a guide sequence and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences.
  • a guide sequence can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
  • a guide sequence can be less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length.
  • the guide sequence is 10-30 nucleotides long.
  • the guide sequence can be 15-20 nucleotides in length.
  • the guide sequence can be 15 nucleotides in length.
  • the guide sequence can be 16 nucleotides in length.
  • the guide sequence can be 17 nucleotides in length.
  • the guide sequence can be 18 nucleotides in length.
  • the guide sequence can be 19 nucleotides in length.
  • the guide sequence can be 20 nucleotides in length.
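The degree of complementarity between a guide (spacer) sequence and its target can be illustrated with a minimal sketch. The text leaves the alignment algorithm open, so this example assumes an ungapped, equal-length, antiparallel comparison that counts only Watson-Crick pairs between an RNA guide and the DNA strand it hybridizes to; the function names are hypothetical.

```python
# RNA base that pairs with each DNA base of the strand the guide hybridizes to.
_PAIRING_RNA_BASE = {"A": "U", "C": "G", "G": "C", "T": "A"}


def perfect_guide(target_dna: str) -> str:
    """Return the RNA guide (5'->3') fully complementary to target_dna (5'->3')."""
    return "".join(_PAIRING_RNA_BASE[b] for b in reversed(target_dna.upper()))


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions forming Watson-Crick pairs with the target.

    Assumptions: both sequences are written 5'->3', have equal length,
    and are compared antiparallel without gaps. G:U wobble pairs are
    not counted as complementary.
    """
    guide = guide_rna.upper().replace("T", "U")
    target = target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal in length")
    paired = sum(
        1 for g, t in zip(guide, reversed(target))
        if _PAIRING_RNA_BASE.get(t) == g
    )
    return 100.0 * paired / len(guide)


def length_in_range(guide_rna: str, minimum: int = 15, maximum: int = 20) -> bool:
    """Check the 15-20 nucleotide guide length recited in certain embodiments."""
    return minimum <= len(guide_rna) <= maximum
```

For a 20-nucleotide guide, a single mismatch gives 95% under this convention, which is still within the ranges listed in the text.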
  • a guide polynucleotide can include a scaffold sequence.
  • a “scaffold sequence” can include any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex includes, but is not limited to, a nucleic acid-guided nuclease and a guide polynucleotide that can include a scaffold sequence and a guide sequence.
  • Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure.
  • the one or two sequence regions are included or encoded on the same polynucleotide. In some cases, the one or two sequence regions are included or encoded on separate polynucleotides. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions. In some embodiments, the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, at least one of the two sequence regions can be about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • a scaffold sequence of a subject guide polynucleotide can comprise a secondary structure.
  • a secondary structure can comprise a pseudoknot region.
  • binding kinetics of a guide polynucleotide to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence.
  • binding kinetics of a guide polynucleotide to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.
  • the invention provides a nuclease that binds to a guide polynucleotide that can include a conserved scaffold sequence.
  • the nucleic acid-guided nucleases for use in the present disclosure can bind to a conserved pseudoknot region.
  • the engineered polynucleotide can be split into fragments encompassing a synthetic tracrRNA and crRNA.
  • guide nucleic acid or “guide polynucleotide” can refer to one or more polynucleotides and can include 1) a guide (spacer) sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein.
  • a guide nucleic acid can be provided as one or more nucleic acids.
  • the guide sequence and the scaffold sequence are provided as a single polynucleotide.
  • guide nucleic acid may include at least one amplicon targeting fragment.
  • a guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence.
  • a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus.
  • native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features.
  • Common features can include sequence outside a pseudoknot region.
  • Common features can include a pseudoknot region.
  • Common features can include a primary sequence or secondary structure.
  • a guide nucleic acid can be engineered to target a desired target sequence by altering the guide (spacer) sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence.
  • a guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid.
  • Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • Engineered guide nucleic acids can be formed using a Synthetic Tracr RNA (STAR) system.
  • STAR when combined with a Cas12a protein, can form at least one ribonucleoprotein (RNP) complex that targets a specific genomic locus.
  • STAR takes advantage of the natural properties of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system, which functions much like an immune system against invading viruses and plasmid DNA.
  • Short DNA sequences (spacers) from invading viruses are incorporated at CRISPR loci within the bacterial genome and serve as “memory” of previous infections. Reinfection triggers complementary mature CRISPR RNA (crRNA) to find a matching viral sequence.
  • tracrRNA trans-activating crRNA
  • Cas CRISPR-associated nuclease to cleave double-strand breaks in “foreign” DNA sequences.
  • the prokaryotic CRISPR “immune system” has been engineered to function as an RNA-guided, mammalian genome editing tool that is simple, easy and quick to implement.
  • STAR, which includes synthetic crRNA and tracrRNA, when combined with Cas12a protein can form ribonucleoprotein (RNP) complexes that target a specific genomic locus.
  • Engineered guide nucleic acids formed with the Synthetic Tracr RNA (STAR) system can result in a split gRNA.
  • Split gRNAs, i.e., dual guide RNAs, are described more fully in WO 2021067788A1.
  • ribonucleoprotein (RNP) complexes that include at least one nuclease disclosed herein.
  • a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:2.
  • a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:2.
  • a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:3.
  • a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:3. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:4. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:4.
  • a RNP complex including a nuclease disclosed herein can further include at least one STAR gRNA (dual guide RNA). In certain embodiments, a RNP complex including a nuclease disclosed herein can further include at least one non-STAR gRNA (e.g., single guide RNA). In certain embodiments, a RNP complex including a nuclease disclosed herein can further include at least one polynucleotide. In certain embodiments, a polynucleotide included in a RNP complex disclosed herein can be greater than about 50 nucleotides in length.
  • a polynucleotide included in a RNP complex disclosed herein can be about 50, to about 150, to about 500, to about 1000 nucleotides, or greater than 1000 nucleotides in length.
  • more than one nuclease can be added to an RNP complex to affect the overall editing efficiency.
  • more than one gRNA can be added to the RNP complex to allow for multiplexed editing of more than one site in a single transfection for improved efficiency.
  • more than one DNA template can be added to the RNP to allow for multiplexed editing at one or more sites based on a specific desired repair outcome.
  • a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein further comprises a guide nucleic acid (gNA), e.g., gRNA, comprising a spacer sequence that targets a target nucleotide sequence (also referred to herein as a target nucleic acid sequence) within a polynucleotide (also referred to herein as a target polynucleotide, as will be clear from context), or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA is compatible with the Type V, e.g., Type VA, CRISPR nuclease.
  • a polynucleotide within which a target nucleotide sequence (target nucleic acid sequence) is located includes a polynucleotide that includes the target nucleotide sequence (target nucleic acid sequence).
  • a polynucleotide can be any suitable polynucleotide, such as a genome of a cell or part of a genome of a cell.
  • the target nucleotide sequence is within 50 nucleotides of a protospacer adjacent motif (PAM) sequence specific for the Type V CRISPR nuclease, such as a PAM comprising a sequence of YTTN, wherein Y is T or C and N is A, T, G, or C, or a sequence of YTTV or TTTV, wherein V is A, G, or C.
  • the PAM comprises a sequence of YTTV or TTTV, wherein V is A, G, or C.
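The PAM definitions above map directly onto regular expressions, using the degenerate codes as defined in the text (Y is T or C; V is A, G, or C; N is any base). The following minimal sketch locates candidate PAMs on one strand of a DNA sequence; the function name and the single-strand scan are simplifying assumptions, and how the "within 50 nucleotides" distance is measured is left open here.

```python
import re

# Degenerate PAM codes expanded as defined in the text.
PAM_PATTERNS = {
    "YTTN": "[CT]TT[ACGT]",
    "YTTV": "[CT]TT[ACG]",
    "TTTV": "TTT[ACG]",
}


def find_pams(dna: str, pam: str = "YTTV") -> list:
    """Return (start_index, matched_sequence) for every PAM occurrence.

    A lookahead is used so that overlapping occurrences are all reported.
    Only the given strand is scanned; scanning the reverse complement is
    left to the caller.
    """
    pattern = re.compile("(?=(" + PAM_PATTERNS[pam] + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]
```

Every TTTV match is also a YTTV match, and every YTTV match is also a YTTN match, so the three patterns form progressively broader search sets.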
  • the gNA is a gRNA, such as a dual (split) gRNA. The gNA, e.g., gRNA, can comprise one or more chemical modifications, such as a 2′-O-alkyl, a 2′-O-methyl, a phosphorothioate, a phosphonoacetate, a thiophosphonoacetate, a 2′-O-methyl-3′-phosphorothioate, a 2′-O-methyl-3′-phosphonoacetate, a 2′-O-methyl-3′-thiophosphonoacetate, a 2′-deoxy-3′-phosphonoacetate, a 2′-deoxy-3′-thiophosphonoacetate, a suitable alternative, or a combination thereof.
  • a ratio of guanine:uracil in the gRNA is at least 51:49, 52:48, 53:47, 54:46, 55:45, 56:44, 57:43, 58:42, 59:41, or 60:40, preferably at least 53:47, more preferably at least 54:46, even more preferably at least 55:45. See Example 12 and FIG. 10.
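A guanine:uracil ratio of this kind can be computed from residue counts, as in the sketch below. The text points to Example 12 for how the ratio was determined, so the counting convention here (G and U counts normalized so the two shares sum to 100) is an assumption, and the function names are hypothetical.

```python
def guanine_uracil_shares(grna: str) -> tuple:
    """Return (G share, U share) as percentages that sum to 100.

    Assumption: the ratio is taken over the counts of G and U residues
    only; other residues are ignored.
    """
    seq = grna.upper()
    g = seq.count("G")
    u = seq.count("U")
    if g + u == 0:
        raise ValueError("sequence contains neither G nor U")
    g_share = 100.0 * g / (g + u)
    return g_share, 100.0 - g_share


def meets_gu_ratio(grna: str, min_g_share: float = 55.0) -> bool:
    """True if the G share reaches the threshold (default 55, i.e. 55:45)."""
    return guanine_uracil_shares(grna)[0] >= min_g_share
```

For example, a sequence with three G residues and two U residues gives shares of 60 and 40, which corresponds to the 60:40 entry in the list.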
  • a molar ratio of gNA, e.g., gRNA to Type V CRISPR nuclease is at least 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, or 3:1 and/or not more than 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, 3:1, or 4:1, preferably 1.1:1 to 2.5:1, more preferably 1.2:1 to 2:1, even more preferably 1.2:1 to 1.7:1. See, e.g., Example 13.
  • a molar amount of gNA is at least 10, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190 or 200 pmol and/or not more than 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190, 200, 250, or 300 pmol, preferably 25-200 pmol, more preferably 50-100 pmol, even more preferably 65 to 85 pmol. See Example 13.
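The molar ratio and molar amount ranges above lend themselves to a small worked calculation. This sketch derives the nuclease amount for a chosen gNA amount and gNA:nuclease ratio, and checks values against the "preferably" ranges recited in the text (1.1:1 to 2.5:1 and 25-200 pmol); the function names are hypothetical.

```python
def nuclease_pmol_for(gna_pmol: float, gna_to_nuclease_ratio: float) -> float:
    """Nuclease amount (pmol) that gives the requested gNA:nuclease molar ratio."""
    if gna_to_nuclease_ratio <= 0:
        raise ValueError("ratio must be positive")
    return gna_pmol / gna_to_nuclease_ratio


def check_preferred_ranges(gna_pmol: float, nuclease_pmol: float) -> dict:
    """Compare a formulation with the 'preferably' ranges recited in the text."""
    if nuclease_pmol <= 0:
        raise ValueError("nuclease amount must be positive")
    ratio = gna_pmol / nuclease_pmol
    return {
        "ratio": ratio,
        "ratio_in_range": 1.1 <= ratio <= 2.5,       # preferably 1.1:1 to 2.5:1
        "amount_in_range": 25 <= gna_pmol <= 200,    # preferably 25-200 pmol gNA
    }
```

For instance, 75 pmol of gNA at a 1.5:1 ratio corresponds to 50 pmol of nuclease, which falls inside both preferred ranges.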
  • a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein further includes a donor template, also referred to as an editing template herein.
  • a donor template can comprise homology arms, that is, nucleotide sequences that are complementary with polynucleotide sequences on either side of a cleavage site at which the donor template will be inserted.
  • the donor template can be present in any suitable amount, e.g., in certain embodiments, at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, or 5 μg μL⁻¹ and/or not more than 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, 5, 7, or 10 μg μL⁻¹, preferably 0.3 to 2 μg μL⁻¹, more preferably 0.5 to 1.5 μg μL⁻¹, even more preferably 0.8 to 1.2 μg μL⁻¹.
  • a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein further includes an anionic polymer.
  • Any suitable anionic polymer may be used.
  • Exemplary anionic polymers include 1,2,3-heptanetriol, 2-Amino-2-(hydroxymethyl)-1,3-propanediol (Tris), 3-(1-pyridino)-1-propane sulfonate (NDSB 201), 3[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 6-aminocaproic acid, adenosine diphosphate (ADP), adenosine triphosphate (ATP), alpha-cyclodextrin, amidosulfobetaine-14 (ASB-14), ammonium acetate, ammonium nitrate, ammonium sulfate, arginine, arginine ethy
  • an anionic polymer comprises polyglutamic acid.
  • the anionic polymer, e.g., PGA, is present at a concentration of at least 20, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, or 500 μg μL⁻¹ and/or not more than 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, 500, 700, or 1000 μg μL⁻¹, preferably 20 to 200 μg μL⁻¹, more preferably 50 to 150 μg μL⁻¹, even more preferably 80 to 120 μg μL⁻¹ (PGA).
  • a cell containing one or more of the compositions described herein e.g. a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and, in certain embodiments a purification tag and/or cleavage site.
  • a suitable cell may be used.
  • the cell is a human cell, such as an immune cell, e.g., T cell, or a stem cell, e.g., induced pluripotent stem cell (iPSC).
  • compositions described herein e.g., a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and, in certain embodiments a purification tag and/or cleavage site, into a cell.
  • a composition comprising a Type V e.g., Type VA
  • CRISPR nuclease polypeptide comprising one or more NLSs and, in certain embodiments a purification tag and/or cleavage site
  • electroporation is used. Electroporation conditions can be optimized, see, e.g., Examples.
  • a composition or compositions as described herein comprising contacting the target polynucleotide with a composition or compositions as described herein, e.g, a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and a suitable gNA, e.g., gRNA, and allowing the composition to modify the target polynucleotide, in some cases a genomic region, such as a genome or part of a genome within a cell, e.g. human cell such as an immune cell, e.g., T cell, or a stem cell, e.g., iPSC.
  • the composition or compositions comprises a donor template, such as a donor template comprising a polynucleotide coding for a polypeptide to be expressed by the cell, in certain embodiments the polypeptide comprises a chimeric antigen receptor (CAR) or portion thereof; see, e.g., Examples.
  • the cell is a human cell, e.g., immune cell such as a T cell, or stem cell, such as an iPSC.
  • a targetable nuclease system can include a nucleic acid-guided nuclease and a compatible guide nucleic acid (also referred to interchangeably herein as “guide polynucleotide” and “gRNA”).
  • a targetable nuclease system can include a nucleic acid-guided nuclease or a polynucleotide sequence encoding the nucleic acid-guided nuclease.
  • a targetable nuclease system can include a guide nucleic acid or a polynucleotide sequence encoding the guide nucleic acid.
  • a targetable nuclease system as disclosed herein can be characterized by elements that promote the formation of a targetable nuclease complex at the site of a target sequence, wherein the targetable nuclease complex includes a nucleic acid-guided nuclease and a guide nucleic acid.
  • a guide nucleic acid together with a nucleic acid-guided nuclease forms a targetable nuclease complex which is capable of binding to a target sequence within a target polynucleotide, as determined by the guide sequence of the guide nucleic acid.
  • a targetable nuclease complex binds to a target sequence as determined by the guide nucleic acid, and the nuclease has to recognize a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 2 and a compatible guide nucleic acid.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to the amino acid sequence of SEQ ID NO: 2 and a compatible guide nucleic acid.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 3 and a compatible guide nucleic acid.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 3 and a compatible guide nucleic acid.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 4 and a compatible guide nucleic acid.
  • a targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and a compatible guide nucleic acid.
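  • The percent-identity thresholds above (at least 50%, about 60%, and so on) can be illustrated with a minimal sketch. It assumes the two amino acid sequences have already been aligned to equal length with `-` as the gap character, and it uses one common convention (identical residues divided by aligned columns). The text above does not specify an alignment method or denominator, so both are assumptions, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity = identical residues / aligned columns, where a
    column containing a gap in only one sequence counts as a mismatch and
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold_percent: float) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent
```

  • For example, `meets_identity_threshold("MKTA", "MKSA", 50)` is true because three of four aligned positions are identical. Other identity definitions (for example, dividing by the shorter sequence length) would give different values for gapped alignments.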
  • the guide nucleic acid can include a scaffold sequence compatible with the nucleic acid-guided nuclease selected.
  • the guide sequence can be engineered to be complementary to any desired target sequence.
  • the guide sequence selected can be engineered to hybridize to any desired target sequence.
  • the guide sequence is a dual guide RNA.
  • a target sequence of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro.
  • the target sequence can be a polynucleotide residing in the nucleus of the eukaryotic cell.
  • a target sequence can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). It is contemplated herein that the target sequence should be associated with a PAM; that is, a short sequence recognized by a targetable nuclease complex.
  • a PAM site is a nucleotide sequence in proximity to a target sequence. In most cases, a nucleic acid-guided nuclease can only cleave a target sequence if an appropriate PAM is present. PAMs are nucleic acid-guided nuclease-specific and can be different between two different nucleic acid-guided nucleases. A PAM can be 5′ or 3′ of a target sequence. A PAM can be upstream or downstream of a target sequence. A PAM can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides in length. Often, a PAM is between 2-6 nucleotides in length.
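  • The relationship between a PAM and its target sequence can be sketched as a simple scan of one DNA strand. This is an illustration only: the PAM patterns, the target length, and the function names are hypothetical choices, not values taken from this document, and only a small subset of IUPAC degenerate codes is supported. The `pam_side` parameter reflects the statement above that a PAM can be 5′ or 3′ of a target sequence.

```python
# Subset of IUPAC nucleotide codes (assumption: enough for this example).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "Y": "CT", "R": "AG",
}


def matches_pam(site: str, pam: str) -> bool:
    """True if every base of `site` is allowed by the PAM pattern."""
    return len(site) == len(pam) and all(
        base in IUPAC[code] for base, code in zip(site, pam)
    )


def find_targets(seq: str, pam: str, target_len: int,
                 pam_side: str = "5prime"):
    """Return (target_start, target) pairs on the given strand.

    pam_side="5prime": the PAM immediately precedes the target.
    pam_side="3prime": the PAM immediately follows the target.
    """
    seq = seq.upper()
    k = len(pam)
    hits = []
    for i in range(len(seq) - k - target_len + 1):
        if pam_side == "5prime":
            pam_site = seq[i:i + k]
            start = i + k
        elif pam_side == "3prime":
            pam_site = seq[i + target_len:i + target_len + k]
            start = i
        else:
            raise ValueError("pam_side must be '5prime' or '3prime'")
        if matches_pam(pam_site, pam):
            hits.append((start, seq[start:start + target_len]))
    return hits
```

  • A fuller search would also scan the reverse complement strand and apply nuclease-specific rules, which this sketch leaves out.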
  • Polynucleotide sequences encoding a component of a targetable nuclease system can include one or more vectors.
  • the term “vector” as used herein can refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
  • Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
  • one type of vector, a plasmid, is a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques.
  • another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses).
  • viruses e.g., non-episomal mammalian vectors
  • non-episomal mammalian vectors can be integrated into the genome of a host cell upon introduction into the host cell.
  • Recombinant expression vectors can include a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which can mean that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed.
  • codon optimization can refer to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon or more of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence.
  • Various species exhibit certain bias for codons of a certain amino acid.
  • genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000).
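  • Codon optimization as described above (replacing codons with ones used more frequently in the host while keeping the amino acid sequence) can be sketched as follows. The preferred-codon table here is a hypothetical toy table covering only a few amino acids; a real table would be derived from codon usage frequencies for the chosen host. All names are illustrative assumptions.

```python
# Hypothetical "most frequent codon" choices for a few amino acids.
PREFERRED_CODON = {"M": "ATG", "K": "AAG", "L": "CTG", "S": "AGC", "*": "TGA"}

# Partial codon table: only the codons needed for this example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "CTA": "L", "CTG": "L",
    "TCT": "S", "AGC": "S",
    "TAA": "*", "TGA": "*",
}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str) -> str:
    """Swap each codon for the preferred codon of the same amino acid."""
    optimized = "".join(PREFERRED_CODON[aa] for aa in translate(cds.upper()))
    # The encoded amino acid sequence must be unchanged.
    assert translate(optimized) == translate(cds.upper())
    return optimized
```

  • With this toy table, `codon_optimize("ATGAAATTATCTTAA")` yields a sequence with different codons that still translates to the same residues. Practical tools usually weigh additional factors beyond codon frequency, which this sketch ignores.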
  • a nucleic acid-guided nuclease and one or more guide nucleic acids can be delivered either as DNA or RNA. Delivery of a nucleic acid-guided nuclease and guide nucleic acid both as RNA (unmodified or containing base or backbone modifications) molecules can be used to reduce the amount of time that the nucleic acid-guided nuclease persists in the cell. This may reduce the level of off-target cleavage activity in the target cell.
  • Guide nucleic acid in the form of RNA or encoded on a DNA expression cassette can be introduced into a host cell that includes a nucleic acid-guided nuclease encoded on a vector or chromosome.
  • the guide nucleic acid may be provided in the cassette as one or more polynucleotides, which may be contiguous or non-contiguous in the cassette. In specific embodiments, the guide nucleic acid is provided in the cassette as a single contiguous polynucleotide.
  • a variety of delivery systems can be used to introduce a nucleic acid-guided nuclease (DNA or RNA) and guide nucleic acid (DNA or RNA) into a host cell.
  • systems of use can include, but are not limited to, yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires (Shalek et al., Nano Letters, 2012), and exosomes.
  • Molecular Trojan horse liposomes may be used to deliver an engineered nuclease and guide nucleic acid across the blood brain barrier.
  • an editing template also referred to herein as a donor template
  • An editing template may be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide.
  • an editing template is on the same polynucleotide as a guide nucleic acid.
  • an editing template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-guided nuclease as a part of a complex as disclosed herein.
  • An editing template polynucleotide can be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length.
  • the editing template polynucleotide is complementary to a portion of a polynucleotide that includes the target sequence.
  • an editing template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g. about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides).
  • the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
  • methods are provided for delivering one or more polynucleotides, such as one or more vectors or linear polynucleotides as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell.
  • the invention further provides cells produced by such methods, and organisms that include or are produced from such cells.
  • an engineered nuclease in combination with (and optionally complexed with) a guide nucleic acid is delivered to a cell.
  • Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
  • Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
  • Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures.
  • a host cell is transiently or non-transiently transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein.
  • a cell is transfected as it naturally occurs in a subject.
  • a cell that is transfected is taken from a subject.
  • the cell is derived from cells taken from a subject, such as a cell line.
  • one or more vectors described herein are used to produce a non-human transgenic cell, organism, animal, or plant.
  • the transgenic animal is a mammal, such as a mouse, rat, or rabbit.
  • Methods for producing transgenic cells, organisms, plants, and animals are known in the art, and generally begin with a method of cell transformation or transfection, such as described herein.
  • in the context of formation of an engineered nuclease complex, a target sequence can refer to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of an engineered nuclease complex.
  • a target sequence can include any polynucleotide, such as DNA, RNA, or a DNA-RNA hybrid.
  • a target sequence can be located in the nucleus or cytoplasm of a cell.
  • a target sequence can be located in vitro or in a cell-free environment.
  • formation of an engineered nuclease complex that includes a guide nucleic acid hybridized to a target sequence and complexed with one or more novel engineered nucleases as disclosed herein results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more base pairs from) the targeted sequence.
  • Cleavage can occur within a target sequence, 5′ of the target sequence, upstream of a target sequence, 3′ of the target sequence, or downstream of a target sequence.
  • one or more vectors driving expression of one or more components of a targetable nuclease system are introduced into a host cell or in vitro so as to direct formation of a targetable nuclease complex at one or more target sites.
  • a nucleic acid-guided nuclease and a guide nucleic acid can each be operably linked to separate regulatory elements on separate vectors.
  • two or more of the elements expressed from the same or different regulatory elements can be combined in a single vector, with one or more additional vectors providing any components of the targetable nuclease system not included in the first vector.
  • Targetable nuclease system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element.
  • the coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction.
  • a single promoter drives expression of a transcript encoding a nucleic acid-guided nuclease and one or more guide nucleic acids.
  • a nucleic acid-guided nuclease and one or more guide nucleic acids are operably linked to and expressed from the same promoter.
  • one or more guide nucleic acids or polynucleotides encoding the one or more guide nucleic acids are introduced into a cell or in vitro environment that already includes a nucleic acid-guided nuclease or polynucleotide sequence encoding the nucleic acid-guided nuclease.
  • a single expression construct may be used to target nuclease activity to multiple different, corresponding target sequences within a cell or in vitro.
  • a single vector can include about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In other embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors can be provided, and optionally, delivered to a cell in vivo or in vitro.
  • the nucleic acid-guided nuclease has DNA cleavage activity or RNA cleavage activity. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • the method includes allowing a targetable nuclease complex to bind to the target sequence to effect cleavage of the target sequence, thereby modifying the target sequence, wherein the targetable nuclease complex includes a nucleic acid-guided nuclease complexed with a guide nucleic acid wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within a target polynucleotide.
  • the invention provides a method of modifying expression of a target polynucleotide in vitro or in a prokaryotic or eukaryotic cell.
  • the method includes allowing a targetable nuclease complex to bind to a target sequence within the target polynucleotide such that the binding can lead to increased or decreased expression of the target polynucleotide; wherein the targetable nuclease complex includes a nucleic acid-guided nuclease complexed with a guide nucleic acid, and wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within the target polynucleotide.
  • kits containing any one or more of the elements disclosed in the above methods and compositions. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. In some embodiments, the kit includes instructions in one or more languages, for example in more than one language.
  • a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein.
  • Reagents may be provided in any suitable container.
  • a kit may provide one or more reaction or storage buffers.
  • Reagents can be provided in a form that is usable in an assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form).
  • a buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof.
  • the buffer is alkaline.
  • the buffer has a pH from about 7 to about 10.
  • the kit includes one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element.
  • the kit includes an editing template.
  • An editing template polynucleotide can include a sequence to be integrated (e.g., a mutated gene).
  • a sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The sequence to be integrated may be a mutated or variant form of an endogenous wild-type sequence. Alternatively, the sequence to be integrated may be a wild-type version of an endogenous mutated sequence. Additionally or alternatively, the sequence to be integrated may be a variant or mutated form of an endogenous mutated or variant sequence.
  • an upstream or downstream sequence can include from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or about 2500 bp.
  • an exemplary upstream or downstream sequence has about 15 bp to about 2000 bp, about 30 bp to about 1000 bp, about 50 bp to about 750 bp, about 600 bp to about 1000 bp, or about 700 bp to about 1000 bp.
  • the editing template polynucleotide can further include a marker.
  • some markers can facilitate screening for targeted integrations. Examples of suitable markers can include, but are not limited to, restriction sites, fluorescent proteins, or selectable markers.
  • an exogenous polynucleotide template can be constructed using recombinant techniques.
  • in an exemplary method for modifying a target polynucleotide by integrating an editing template polynucleotide, a double-stranded break is introduced into the genome sequence by an engineered nuclease complex, and the break can be repaired via homologous recombination using an editing template such that the template is integrated into the target polynucleotide.
  • the presence of a double-stranded break can increase the efficiency of integration of the editing template.
  • Some methods include increasing or decreasing expression of a target polynucleotide by using a targetable nuclease complex that binds to the target polynucleotide.
  • Detection of the gene expression level can be conducted in real time in an amplification assay.
  • the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules can be proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art.
  • DNA-binding dyes suitable for this application include, but are not limited to, SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and others known by one of skill in the art.
  • the amount of agent:polypeptide complexes formed during the binding reaction can be quantified by standard quantitative assays. As illustrated above, the formation of agent:polypeptide complex can be measured directly by the amount of label remaining at the site of binding.
  • the protein associated with a signaling biochemical pathway is tested for its ability to compete with a labeled analog for binding sites on the specific agent. In this competitive assay, the amount of label captured is inversely proportional to the amount of protein sequences associated with a signaling biochemical pathway present in a test sample.
  • an altered expression of a gene associated with a signaling biochemical pathway can also be determined by examining a change in activity of the gene product relative to a control cell.
  • the assay for an agent-induced change in the activity of a protein associated with a signaling biochemical pathway will depend on the biological activity and/or the signal transduction pathway that is under investigation.
  • a change in its ability to phosphorylate the downstream substrate(s) can be determined by a variety of assays known in the art. Representative assays include but are not limited to immunoblotting and immunoprecipitation with antibodies such as anti-phosphotyrosine antibodies that recognize phosphorylated proteins.
  • kinase activity can be detected by high throughput chemiluminescent assays.
  • pH sensitive molecules such as fluorescent pH dyes can be used as the reporter molecules.
  • the protein associated with a signaling biochemical pathway is an ion channel
  • fluctuations in membrane potential and/or intracellular ion concentration can be monitored.
  • Representative instruments include FLIPR™ (Molecular Devices, Inc.) and VIPR (Aurora Biosciences). These instruments are capable of detecting reactions in over 1000 sample wells of a microplate simultaneously, and providing real-time measurement and functional data within a second or even a millisecond.
  • Some embodiments disclosed herein relate to use of an engineered nucleic acid guided nuclease system disclosed herein; for example, in order to target and knock out genes, amplify genes and/or repair certain mutations associated with DNA repeat instability and a medical disorder.
  • This nuclease system may be used to harness and to correct these defects of genomic instability.
  • engineered nucleic acid guided nuclease systems disclosed herein can be used for correcting defects in the genes associated with Lafora disease.
  • Lafora disease is an autosomal recessive condition which is characterized by progressive myoclonus epilepsy which may start as epileptic seizures in adolescence. This condition causes seizures, muscle spasms, difficulty walking, dementia, and eventually death.
  • the engineered/novel nucleic acid guided nuclease system can be used to correct genetic eye disorders that arise from several genetic mutations.
  • Human Jurkat T-cell leukemia cells (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH (ACC 282)) were propagated in RPMI 1640 medium (ThermoFisher Scientific) with 10% heat-inactivated fetal bovine serum (FBS) (ThermoFisher Scientific) supplemented with 1% penicillin-streptomycin antibiotic mix (ThermoFisher Scientific).
  • Cells were cultured at 37° C. in 5% CO2 incubators and maintained at a density of 0.5 to 1.5 × 10⁶ cells mL⁻¹. 24 hours before transfection, cells were passaged at 0.1 × 10⁶ cells mL⁻¹.
  • Cell culture media supernatant was periodically tested for mycoplasma contamination using the MycoAlert PLUS mycoplasma detection kit (Lonza).
  • Ribonucleoprotein complexes were generated by incubating respective guide nucleic acids (gNAs) with MAD7 in the molar ratio of 3:2 gNA:MAD7 for 15 minutes at room temperature immediately before transfection.
  • MAD7 100 pmol
  • nuclease-free water unless otherwise stated.
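  • The 3:2 gNA:MAD7 molar ratio stated above can be turned into a small calculation. This is a sketch only; the function name is hypothetical, and the 100 pmol MAD7 input is taken from the amount mentioned above.

```python
def gna_pmol_for_nuclease(nuclease_pmol: float,
                          gna_parts: float = 3,
                          nuclease_parts: float = 2) -> float:
    """gNA amount (pmol) needed for a gNA:nuclease molar ratio.

    Defaults correspond to the 3:2 gNA:MAD7 ratio described in the text.
    """
    if nuclease_parts <= 0:
        raise ValueError("nuclease_parts must be positive")
    return nuclease_pmol * gna_parts / nuclease_parts


# 100 pmol of MAD7 at a 3:2 ratio corresponds to 150 pmol of gNA.
gna_needed = gna_pmol_for_nuclease(100)
```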
  • For T-cell experiments, 1.6 μL of an aqueous solution of 15-50 kDa poly-L-glutamic acid (PGA, 100 μg Alamanda Polymers) was added to gNAs, followed by the addition of MAD7 and nuclease-free water.
  • Donor templates comprising site-specific homology arms, respective promoter, and respective gene (GFP or Hu19 scFv-CD8α-CD28-CD3ζ CAR) were amplified from corresponding pTwist Ampicillin high-copy plasmids (Twist Bioscience) using homology arm-specific PCR primers. Donor templates were amplified in a two-step PCR program: initial denaturation at 98° C. for 30 seconds, then 40 cycles of denaturation at 98° C. for 10 seconds and extension at 72° C. for 30 seconds per kb of amplicon, with a final hold at 72° C. for 10 minutes.
  • Each 50 μL PCR reaction contained 10 ng amplification template (plasmid DNA), 0.5 μM homology arm-specific forward and reverse primers, nuclease-free water (IDT), 3% DMSO, and 1× Phusion High-Fidelity PCR Master Mix with HF Buffer (ThermoFisher Scientific).
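  • The two-step PCR program above scales the extension step with amplicon length (30 seconds per kb). A minimal sketch of that arithmetic follows; the 2000 bp amplicon length is a hypothetical input, the function names are illustrative, and instrument ramp times are ignored.

```python
def extension_seconds(amplicon_bp: float, seconds_per_kb: float = 30) -> float:
    """Extension time at 72 C for a given amplicon length."""
    return seconds_per_kb * amplicon_bp / 1000


def program_seconds(amplicon_bp: float,
                    cycles: int = 40,
                    initial_denaturation_s: float = 30,
                    cycle_denaturation_s: float = 10,
                    final_hold_s: float = 600) -> float:
    """Total programmed time: initial denaturation, cycles, final hold.

    Defaults follow the program in the text (98 C for 30 s, then 40 cycles
    of 98 C for 10 s plus extension, then 72 C for 10 min).
    """
    per_cycle = cycle_denaturation_s + extension_seconds(amplicon_bp)
    return initial_denaturation_s + cycles * per_cycle + final_hold_s


# Hypothetical 2 kb donor template: 60 s extension per cycle,
# 30 + 40 * (10 + 60) + 600 = 3430 s of programmed time.
total = program_seconds(2000)
```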
  • PCR products were purified using NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel) with two 20 μL elutions. Purified HDR templates were collected and quantified on NanoDrop One Microvolume UV-Vis Spectrophotometer (ThermoFisher Scientific).
  • HDR templates were concentrated using Amicon Ultra 0.5 mL 30K Centrifugal Filters: 100 μg DNA per unit was transferred, filled with nuclease-free water to 500 μL, and centrifuged at 10,000 g for 10 minutes to reduce volume to 50 μL. DNA was washed twice with nuclease-free water and recovered into a fresh tube by inversion and centrifugation at 10,000 g for 15 seconds. HDR templates were collected, diluted, and concentrations quantified using Qubit dsDNA HS Assay Kit (ThermoFisher Scientific). HDR templates of 0.5 to 1 μg μL⁻¹ were used for cellular studies.
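  • The concentration and dilution steps above can be checked with a short calculation. This sketch assumes complete DNA recovery from the filter unit, which the text does not state (measured concentrations were used in practice), and the function names are hypothetical.

```python
def concentration_ug_per_ul(mass_ug: float, volume_ul: float) -> float:
    """Nominal concentration, assuming all of the DNA is recovered."""
    return mass_ug / volume_ul


def diluent_volume_ul(stock_ug_per_ul: float, stock_volume_ul: float,
                      target_ug_per_ul: float) -> float:
    """Water to add so the stock reaches the target (C1 * V1 = C2 * V2)."""
    if target_ug_per_ul <= 0 or target_ug_per_ul > stock_ug_per_ul:
        raise ValueError("target must be positive and not exceed the stock")
    final_volume_ul = stock_ug_per_ul * stock_volume_ul / target_ug_per_ul
    return final_volume_ul - stock_volume_ul


# 100 ug reduced to 50 uL gives a nominal 2 ug/uL.
stock = concentration_ug_per_ul(100, 50)
# Diluting that stock to 1 ug/uL (upper end of the stated working range).
water_to_add = diluent_volume_ul(stock, 50, 1.0)
```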
  • Lonza 4D Nucleofector with Shuttle unit (V4SC-2960 Nucleocuvette Strips) was used for transfection, following the manufacturer's instructions.
  • cells were harvested by centrifugation (200 g, RT, 5 minutes) and re-suspended in 20 μL at 10 × 10⁶ cells mL⁻¹ in the SF Cell Line Nucleofector X Kit buffer (Lonza), unless stated otherwise.
  • the cell suspension was mixed with the RNPs, immediately transferred to the nucleocuvette, and transfected.
  • the cells were immediately re-suspended in the pre-warmed cultivation medium and plated onto 96-well, flat-bottom, non-cell culture treated plates (Falcon), and cultured at 37° C. in 5% CO2 incubators and maintained at a density of 0.5 to 1.0 × 10⁶ cells mL⁻¹. After 48 hours, the cells were harvested for the viability assay and genomic DNA, as described below. For the Homology-Directed Repair Template insertion, the HDR template was added to the cells and the suspension transferred to the RNPs immediately before transfection. The transfection parameters, cell recovery step, and proliferation conditions were as described in Example 1. The cells were harvested 48 hours post-transfection for the viability assessment, after 7 days for CAR insertion efficiency, or after 7 days, 14 days, and 21 days for GFP insertion efficiency.
  • the cells were harvested by centrifugation (300 g, RT, 5 minutes) and re-suspended in 20 μL at 50 × 10⁶ cells mL⁻¹ in the supplemented P3 Primary Cell Nucleofector Kit buffer (Lonza). The cells were mixed with HDR templates and the suspension transferred to the RNPs immediately before transfection (Nucleofection program EH-115). After transfection, 80 μL of pre-warmed cultivation medium without IL-2 was added to the electroporation cuvettes. When using M3814 (Selleckchem), 80 μL of pre-warmed cultivation medium containing 2 μM M3814 final concentration without IL-2 was added to the electroporation cuvettes.
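  • The resuspension volumes and densities given for the two transfection protocols imply a cell number per reaction, which a short calculation makes explicit. The function name is hypothetical; the inputs are the 20 μL volume and the 10 × 10⁶ and 50 × 10⁶ cells mL⁻¹ densities stated in the text.

```python
def cells_in_volume(density_cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells contained in volume_ul at the given density."""
    return density_cells_per_ml * volume_ul / 1000  # 1 mL = 1000 uL


# 20 uL at 10 x 10^6 cells/mL (SF Cell Line buffer protocol).
cells_sf_buffer = cells_in_volume(10e6, 20)
# 20 uL at 50 x 10^6 cells/mL (P3 Primary Cell buffer protocol).
cells_p3_buffer = cells_in_volume(50e6, 20)
```

  • These work out to 200,000 and 1,000,000 cells per 20 μL reaction, respectively.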
  • T-cells were transferred onto 96-well, flat-bottom, non-cell culture treated plates (Falcon) containing pre-warmed cultivation medium pretreated with 2 μM M3814 final concentration and 12.5 ng mL⁻¹ IL-2.
  • the cells were seeded at a density of 0.25 × 10⁶ cells mL⁻¹, or 1.3 × 10⁶ cells mL⁻¹ in the experiment with M3814, and kept at 37° C. in 5% CO2 incubators.
  • the viability assay was carried out 24 hours post-transfection after which the cells were reseeded in the fresh cultivation medium containing IL-2. Insertion efficiency of CAR was measured after 7 days, and 11 days or 13 days post-transfection.
  • Flow cytometric assessments were carried out on a CytoFLEX S instrument (Beckman Coulter) using a 96-well plate format. Measurements of cell viability, PDCD1 expression, GFP expression, and CAR expression were performed on 10,000 or 20,000 single cell events in Jurkat or primary T-cells, respectively.
  • For the cell viability and GFP knock-in measurements, approximately 250,000 cells per sample were transferred onto 96-well V-bottom cell culture plates and assessed following a series of consecutive washing and staining steps.
  • the first step included centrifuging the cells at 300 g for 5 minutes at room temperature, discarding the supernatant, and washing cells in 150 μL Dulbecco's PBS/2% FBS (STEMCELL Technologies) or Cell Staining Buffer (Biolegend), respectively, followed by the second centrifugation and removal of supernatant.
  • the final step included viability staining of cells using 150 μL Dulbecco's PBS/2% FBS with 7-amino-actinomycin D (7-AAD, 1:1,000; ThermoFisher) or 50 μL Cell Staining Buffer with Zombie Violet Dye (1:200; Biolegend), respectively.
  • the measurements of cell viability and GFP expression were collected simultaneously for 7-AAD (excitation: yellow-green laser; emission: 561 nm), Zombie Violet (excitation: violet laser; emission 405 nm), and GFP (excitation: blue laser; emission 488 nm) as needed.
  • For detection of PDCD1 knock-out efficiency, approx. 250,000 Jurkat cells per sample were transferred onto 96-well V-bottom cell culture plates and assessed following a series of consecutive washing and staining steps. The first step included centrifuging the cells at 300 g for 5 minutes at 4° C. and discarding the supernatant. Afterwards, the cells were stained using 100 µL Cell Staining Buffer (Biolegend) with APC/Cyanine7 anti-human CD279 (PD-1) antibody (1:100; Biolegend) and incubated for 30 minutes at 4° C. in the dark. The cells were then centrifuged at 300 g for 5 minutes at 4° C. and the supernatant discarded.
  • the next step included two repeats of centrifugation at 300 g for 5 minutes at 4° C., supernatant removal, and cell washing in 150 µL ice-cold Cell Staining Buffer (Biolegend).
  • the cells were re-suspended in 100 µL Cell Staining Buffer for the flow cytometry measurements (excitation: red laser; emission: 633 nm).
  • Extracted genomic DNA was quantified using the NanoDrop (ThermoFisher Scientific). Amplicons were constructed in two PCR steps: in the first PCR, regions of interest (150-400 bp) were amplified from 10 to 30 ng of genomic DNA with primers containing Illumina forward and reverse adapters on both ends comprising suitable loci-specific complementary sequences, using Phusion High-Fidelity PCR Master Mix (ThermoFisher Scientific). Amplification products were purified with Agencourt AMPure XP beads (Ramcon), using the sample to beads ratio of 1:1.8.
  • the DNA was eluted from the beads with nuclease-free water and the size of the purified amplicons analyzed on a 2% agarose E-gel using the E-gel electrophoresis system (ThermoFisher Scientific).
  • unique pairs of Illumina-compatible indexes Nextera XT Index Kit v2 were added to the amplicons using the KAPA HiFi HotStart Ready Mix (Roche).
  • the amplified products were purified with Agencourt AMPure XP beads (Ramcon), using the sample to bead ratio of 1:1.8.
  • the DNA was eluted from the beads with 10 mM Tris-HCl pH 8.5, 0.1% Tween 20.
  • Example 11 CRISPR-MAD7 Platform for Human Genome Editing Using the Jurkat T-Cell Leukemia Line
  • MAD7 nucleases comprising a His6 tag (SEQ ID NO: 423) and either one (MAD7-1NLS) or four (MAD7-4NLS) nuclear localization signals (NLS) were used ( FIG. 1 ).
  • RNPs were generated as described in Example 3.
  • Editing frequency of the MAD7 nuclease complexed with one or more guide nucleic acids comprising a spacer sequence of SEQ ID NOs: 86-384 as shown in Table 1 was determined by nucleofection of RNPs in Jurkat T-cells using the Lonza recommended nucleofection program SE-CL-120 (Example 5), followed by genomic DNA extraction (Example 8), amplification of the edited locus and targeted next-generation sequencing (Example 9) for identification of the edits, and finally by computational analysis (Example 10) of modification frequency using the CRISPResso2 algorithm.
  • the editing frequency of MAD7 comprising either one or four NLS complexed with the respective gNA was compared.
  • editing frequency was enhanced in Jurkat cells when treated with RNPs comprising MAD7-4NLS, which indicates that optimization of the NLS can improve editing efficiency.
  • a slight decrease in cell viability was seen at higher concentrations of RNP for those comprising four NLS as compared to one NLS ( FIG.
  • NLS nuclear localization signal
  • MAD7-RNP amounts pmol; constant ratio of 1:1.5 MAD7:gNA
  • FIGS. 3-5 show the editing frequency (bars; x-axis) of each of the electroporation conditions (buffers SE, SF, and SG respectively) as compared to a control (y-axis, control at the top).
  • the majority of buffer-program transfection combinations resulted in suboptimal viability (dots; x-axis) and editing frequency; however, the analysis revealed several conditions that supported substantial rates of both cell viability and editing.
  • the Jurkat T-cell leukemia cell line was used as a model system to screen gNAs demonstrating high editing efficiency.
  • the screen included 298 unique gNAs comprising one or more spacer sequences of SEQ ID NOs: 86-384 of Table 1 targeting the immune checkpoint receptors PDCD1, TIM3, LAG3, TIGIT, and CTLA4, the checkpoint phosphatases PTPN6 (SHP-1) and PTPN11 (SHP-2), and the TCR signaling subunit CD247 (CD3ζ).
  • RNPs were generated as described in Example 3, nucleofected (Example 5), genomic DNA was extracted (Example 8), the edited loci amplified and sequenced (Example 9), and the sequencing data computationally analyzed (Example 10) using the CRISPResso2 algorithm.
  • CRISPResso2 software reports the frequency of modifications (insertions, deletions, and substitutions) within a quantification window flanking the position of MAD7-induced cleavage in the amplicon sequence.
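The idea of counting modifications only inside a window around the cleavage position can be illustrated with a short sketch. This is not CRISPResso2's implementation: the `Modification` record, the function names, the coordinates, and the window half-width are all hypothetical, and reads are assumed to be already aligned to the amplicon and reduced to lists of modification events.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Modification:
    kind: str   # "insertion", "deletion", or "substitution"
    start: int  # 0-based start on the reference amplicon
    end: int    # exclusive end; equals start for an insertion


def overlaps_window(mod: Modification, cut_site: int, half_width: int) -> bool:
    """True if the modification touches the window [cut_site - w, cut_site + w]."""
    window_start = cut_site - half_width
    window_end = cut_site + half_width
    if mod.kind == "insertion":
        # An insertion has zero length on the reference; test its position.
        return window_start <= mod.start <= window_end
    return mod.start < window_end and mod.end > window_start


def modification_frequency(reads: List[List[Modification]],
                           cut_site: int, half_width: int) -> float:
    """Percentage of reads with at least one modification inside the window."""
    if not reads:
        return 0.0
    modified = sum(
        1 for mods in reads
        if any(overlaps_window(m, cut_site, half_width) for m in mods)
    )
    return 100.0 * modified / len(reads)


# Hypothetical reads: one unmodified, one deletion at the cut site,
# one substitution far from the cut site, one insertion at the cut site.
example_reads = [
    [],
    [Modification("deletion", 48, 52)],
    [Modification("substitution", 5, 6)],
    [Modification("insertion", 50, 50)],
]
example_frequency = modification_frequency(example_reads, cut_site=50, half_width=10)
```

In this toy input two of four reads carry a modification inside the window, so the sketch reports 50%. The distant substitution is ignored, which is the purpose of restricting quantification to a window.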
  • the types of modifications detected in 230 amplicons that were sequenced in both gNA-treated and MOCK samples (no MAD7) were compared. Relatively high modification frequencies (median 1%) in MOCK reactions were observed as a result of a high frequency of substitutions ( FIG.
  • Dark grey boxplots represent mean INDEL frequency using gNAs.
  • Light grey boxplots represent mean INDEL frequency using crIDTneg (IDT).
  • MAD7 can target a wide range of PAM
  • gNAs adjacent to all YTTN PAM variants were screened and editing specificity of MAD7 in Jurkat cells was analyzed.
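Locating YTTN motifs in a sequence can be sketched with a regular expression, where Y is T or C and N is any of A, T, G, or C. This is a minimal illustration: the function name and the example sequence are hypothetical, only the given strand is scanned, and no protospacer is extracted because the spacer length and orientation are not restated here.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs are all reported.
PAM_YTTN = re.compile(r"(?=([CT]TT[ACGT]))")


def find_pams(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based position, motif) for every YTTN match on the given strand."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in PAM_YTTN.finditer(sequence)]


# Hypothetical sequence containing one TTTN and one CTTN motif.
example_pams = find_pams("GGTTTACCTTGAA")
```

For the example sequence the sketch reports `TTTA` at position 2 and `CTTG` at position 7. A full guide-design pass would also scan the reverse complement.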
  • a grey zone on the plot represents moderately-active gNAs (10-50% INDELs), the zone above highly-active gNAs (>50% INDELs), and the zone below active gNAs (1-10% INDELs).
  • FIG. 10 shows (A) sequence logos comparing DNA-complementary gNA sequences of highly-active (>50% INDELs), moderately-active (10-50% INDELs), active (1-10% INDELs), and inactive (<1% INDELs) gNAs, which show no strong biases for ribonucleotides at specific positions; however, guanine appeared overrepresented and uracil underrepresented on highly-active and moderately-active gNAs; (B) nucleotide frequency on inactive (<1% INDELs; dark grey box), active (1-10% INDELs; medium grey box), moderately-active (10-50% INDELs; light grey box), and highly-active (>50% INDELs; white box) gNAs, with significant enrichment of guanine and depletion of uracil on highly-active gNAs compared to
  • the INDEL frequency was significantly correlated to the measurements from the initial screen, highlighting the reproducibility of the INDEL assay ( FIG. 13 ). Specifically, FIG.
  • FIG. 14 shows fraction of frameshift to INDEL frequency (dark grey bars) in T-cell leukemic cell line as a function of 38 high-efficiency gNAs. Average fraction of INDELs leading to frameshifts (dashed line) is approx. 66%. Alternating grey and white zones on the plot represent groups of three to five high-efficiency gNAs per locus.
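The activity tiers and the frameshift fraction used above can be expressed as a short sketch. The function names and the example indel sizes are hypothetical. The tier boundaries follow the stated ranges (<1%, 1-10%, 10-50%, >50%), with the assumption that a value of exactly 10% counts as moderately-active, since the text leaves that boundary open. An indel is treated as a frameshift when its net length change is not a multiple of three.

```python
from typing import Iterable


def activity_tier(indel_pct: float) -> str:
    """Map an INDEL frequency in percent to one of the four activity tiers."""
    if indel_pct < 1:
        return "inactive"
    if indel_pct < 10:
        return "active"
    if indel_pct <= 50:
        return "moderately-active"  # assumption: exactly 10% falls in this tier
    return "highly-active"


def is_frameshift(net_length_change: int) -> bool:
    """Insertions are positive, deletions negative; multiples of 3 keep the frame."""
    return net_length_change % 3 != 0


def frameshift_fraction(indel_sizes: Iterable[int]) -> float:
    """Fraction of indels whose net length change shifts the reading frame."""
    sizes = list(indel_sizes)
    if not sizes:
        return 0.0
    return sum(is_frameshift(s) for s in sizes) / len(sizes)


# Hypothetical indel sizes: -1 and +2 shift the frame, -3 and +6 do not.
example_fraction = frameshift_fraction([-1, -3, 2, 6])
```

As a point of reference, if indel lengths were spread evenly over the three possible remainders modulo 3, two thirds of them would shift the frame.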
  • Another consideration for selecting gNAs is the potential for off-target cleavage events.
  • the list of validated gNAs was analyzed using the CasOFFinder software to predict potential off-target editing sites in the genome with up to four mismatches between the gNA and the target DNA sequence.
  • the predicted off-target sites were matched with the human gene database, and those sites that targeted exons and introns within the genes were extracted. Afterwards, the degree of editing activity at these sites was examined by targeted next-generation sequencing, more specifically, at 25 predicted off-target sites for the top-two PDCD1 gNAs, i.e., crPDCD1_1 and crPDCD1_2.
  • INDEL frequency was analyzed at the putative off-target editing sites with ≤4 mismatches between the gNA and target DNA sequence, and with ≤3 mismatches on the remaining gNAs.
  • PAM sequences and spacer sequences with mismatches marked in red are displayed next to their respective measured INDEL frequencies. No significant INDEL frequency at any of the off-target sites was detected (Pairwise T-test, P ⁇ 0.05).
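The mismatch criterion used to shortlist putative off-target sites can be sketched as a simple position-by-position comparison. This is not the CasOFFinder algorithm: it does not search a genome or check PAMs. The function names and sequences are hypothetical, and the spacer and candidate sites are assumed to be equal-length strings in the DNA alphabet.

```python
from typing import List, Tuple


def count_mismatches(spacer: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    return sum(a != b for a, b in zip(spacer.upper(), site.upper()))


def candidate_off_targets(spacer: str, sites: List[str],
                          max_mismatches: int = 4) -> List[Tuple[str, int]]:
    """Sites that differ from the spacer by 1..max_mismatches positions."""
    hits = []
    for site in sites:
        n = count_mismatches(spacer, site)
        if 0 < n <= max_mismatches:  # n == 0 is the on-target sequence itself
            hits.append((site, n))
    return hits


# Hypothetical 10-nt sequences, shortened for readability.
example_hits = candidate_off_targets(
    "ACGTACGTAC",
    ["ACGTACGTAC", "ACGTACGTAA", "TCGAACGTTG", "TTTTTTTTTT"],
)
```

With the default threshold of four mismatches, the second and third candidate sites are retained (1 and 4 mismatches), while the exact match and the site with 8 mismatches are excluded.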
  • Insertion of exogenous transgenes is an important aspect of mammalian cell engineering.
  • Gene insertion with CRISPR-Cas is achieved by homology-directed repair of CRISPR-induced DNA breaks using HDR-donor templates to copy exogenous genetic sequences into targeted DNA loci.
  • HDR templates composed of linear double-stranded DNA provide the most robust and efficient method of transgene insertion using CRISPR-Cas genome editing systems.
  • the Jurkat T-cell leukemia cell line was used to evaluate the transgene insertion and expression efficiency using CRISPR-MAD7 RNP complexes.
  • a highly active gNA targeting the AAVS1 (spacer sequence in Table 1) safe-harbor locus ( FIG. 17 ) was used in combination with eight different HDR-repair templates flanked with symmetric homology arms (HA) of 500 base pairs (bp) in the amount of 0.5 µg µL⁻¹.
  • the HDR inserts comprised eight promoters (Table 2) differing in both size and promoter strength to drive GFP expression ( FIG. 18 ). When the transient GFP expression diminished at day 14 post-transfection, comparable insertion efficiencies were observed with stable GFP expressions of up to 30% using four (JET, PGK, EF1a, and CAG) out of eight promoters ( FIG. 18 ), suggesting that the insert size did not affect the integration efficiency at AAVS1 in the human T-cell leukemia cell line. Specifically, FIG.
  • HDR templates consisting of eight different promoters and flanked with symmetric homology arms of 500 base pairs in the amount of 0.5 µg µL⁻¹ were used. Size of promoters in base pairs: CMV, 1400; SCP, 970; CMVe-SCP, 1270; CMVmax, 1830; JET, 1100; CAG, 2600; PGK, 1410; EF-1α, 2090. Dark grey bars and circles represent mean insertion frequency and cell viability using crAAVS1. Light grey bars represent mean insertion frequency and cell viability using crIDTneg (IDT).
  • Top panels display GFP insertion efficiencies using donor template flanked with short homology arms (100 bp HA), and bottom panels donor template flanked with long homology arms (500 bp HA).
  • Left panels display GFP insertion efficiencies using donor template containing EF-1α promoter (long, ~2000 bp), and right panels donor template containing JET promoter (short, ~1000 bp).
  • Amount of donor template, represented by the gradient above the bars, increases from 0.125, 0.25, 0.5 to 1 µg µL⁻¹.
  • Dark grey bars represent mean insertion frequency using crAAVS1.
  • Light grey bars represent mean insertion frequency using crIDTneg (IDT).
  • Individual panels display CAR insertion efficiencies using donor template structure as described in FIG. 19 . Amount of donor template, MAD7-RNP, and PGA was 1 µg µL⁻¹, 100:150 pmol MAD7:gNA, and 100 µg µL⁻¹, in that order.
  • Nucleofection program P3-EH-115 for transfection of primary T-cells was used.
  • D represents number of biological replicas, and n number of technical replicas per D.
  • Dark grey bars represent mean insertion frequency using crAAVS1.
  • Light grey bars represent mean insertion frequency using crIDTneg (IDT).
  • compositions are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are compositions of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
  • an element or component is said to be included in and/or selected from a list of recited elements or components, it should be understood that the element or component can be any one of the recited elements or components, or the element or component can be selected from a group consisting of two or more of the recited elements or components.
  • a cell includes a plurality of cells, including mixtures thereof. Where the plural form is used for compounds, salts, or the like, this is taken to mean also a single compound, salt, or the like.
  • embodiment 1 provided herein is a composition comprising a nucleic acid-guided nuclease comprising a Type V CRISPR nuclease polypeptide comprising at least one nuclear localization signal (NLS) at or near the N-terminus or the C-terminus of the polypeptide.
  • embodiment 2 provided herein is the composition of embodiment 1 wherein the nuclease is a Type Va nuclease.
  • embodiment 3 provided herein is the composition of embodiment 1 or embodiment 2 wherein the Type V CRISPR nuclease polypeptide has at least 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, or 100% sequence identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% sequence identity with SEQ ID NO: 1.
  • embodiment 4 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises two NLSs, one or both of which are at or near the N-terminus or the C-terminus of the polypeptide.
  • the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises three NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
  • the Type V CRISPR nuclease polypeptide comprises four NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
  • the Type V CRISPR nuclease polypeptide comprises at least five NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
  • any one of embodiments 4 through 7 wherein at least two of the NLSs are at or near the N-terminus of the polypeptide.
  • embodiment 9 provided herein is the composition of any one of embodiments 5 through 7 wherein at least three of the NLSs are at or near the N-terminus of the polypeptide.
  • embodiment 10 provided herein is the composition of any one of embodiments 6 through 7 wherein at least four of the NLSs are at or near the N-terminus of the polypeptide.
  • embodiment 11 provided herein is the composition of embodiment 7 wherein the 5 NLSs are at or near the N-terminus of the polypeptide.
  • composition of embodiment 11 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112.
  • compositions of any one of embodiments 4 through 11 wherein at least two of the NLSs have different nuclear localization mechanisms.
  • embodiment 15 provided herein is the composition of any one of embodiments 5 through 7 or 9 through 11 wherein at least three of the NLSs have different nuclear localization mechanisms.
  • embodiment 16 provided herein is the composition of any previous embodiment wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen, an NLS from nucleoplasmin, e.g. a nucleoplasmin bipartite NLS, a c-myc NLS; a hRNPA1 M9 NLS; an IBB domain of importin-alpha NLS; a myoma T protein NLS; a sequence from human p53 NLS; a sequence of mouse c-abl IV NLS; a sequence of influenza virus NS1 NLS; a sequence of Hepatitis virus delta antigen NLS; a sequence of mouse Mx1 protein NLS; a sequence of human poly(ADP-ribose) polymerase NLS; a sequence of steroid hormone receptors (human) glucocorticoid NLS; and/or a sequence of EGL-13 NLS.
  • composition of embodiment 16 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen.
  • composition of embodiment 16 wherein two or more of the NLSs comprise an NLS of the SV40 virus large T-antigen.
  • embodiment 19 provided herein is the composition of embodiment 17 or embodiment 18 wherein the NLS or NLSs comprises the sequence of SEQ ID NO: 5.
  • embodiment 20 provided herein is the composition of any one of embodiments 16 through 19 wherein one or more of the NLSs comprises an NLS from nucleoplasmin.
  • nucleoplasmin NLS comprises the sequence of SEQ ID NO: 6.
  • embodiment 23 provided herein is the composition of embodiment 22 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 21.
  • embodiment 24 provided herein is the composition of embodiment 23 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 21.
  • embodiment 25 provided herein is the composition of any one of embodiments 16 through 24 wherein one or more of the NLSs comprises a sequence of EGL-13 NLS.
  • embodiment 26 provided herein is the composition of embodiment 25 wherein the EGL-13 NLS comprises the sequence of SEQ ID NO: 107.
  • embodiment 27 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide further comprises a purification tag.
  • embodiment 28 provided herein is the composition of embodiment 27 wherein the purification tag is at or near the N-terminus of the nuclease polypeptide.
  • composition of embodiment 30 wherein the purification tag comprises a gly-6×His tag (SEQ ID NO: 421).
  • the purification tag comprises a gly-8×His tag (SEQ ID NO: 422).
  • embodiment 33 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises a cleavage site.
  • embodiment 34 provided herein is the composition of embodiment 33 wherein the cleavage site is at or near the N-terminus of the nuclease polypeptide.
  • composition of embodiment 37 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 111 or 112.
  • composition of embodiment 37 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 112.
  • gNA guide nucleic acid
  • gRNA guide nucleic acid
  • the target nucleotide is within 50 nucleotides of a protospacer adjacent motif (PAM) sequence specific for the Type V CRISPR nuclease.
  • PAM protospacer adjacent motif
  • composition of embodiment 41 wherein the PAM comprises a sequence of YTTN, wherein Y is T or C and N is A, T, G, or C.
  • embodiment 43 provided herein is the composition of embodiment 42 wherein the PAM comprises a sequence of YTTV or TTTV, wherein V is A, G, or C.
  • embodiment 44 provided herein is the composition of embodiment 40 wherein the gNA is a gRNA.
  • embodiment 45 provided herein is the composition of embodiment 44 wherein the gRNA is a dual gRNA.
  • embodiment 46 provided herein is the composition of embodiment 44 or embodiment 45 wherein the composition comprises the gRNA and the gRNA comprises one or more chemical modifications.
  • composition of embodiment 46 wherein the chemical modification comprises a 2′-O-alkyl, a 2′-O-methyl, a phosphorothioate, a phosphonoacetate, a thiophosphonoacetate, a 2′-O-methyl-3′-phosphorothioate, a 2′-O-methyl-3′-phosphonoacetate, a 2′-O-methyl-3′-thiophosphonoacetate, a 2′-deoxy-3′-phosphonoacetate, a 2′-deoxy-3′-thiophosphonoacetate, a suitable alternative, or a combination thereof.
  • any one of embodiments 40 through 48 wherein the molar ratio of gNA, e.g., gRNA to Type V CRISPR nuclease is at least 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, or 3:1 and/or not more than 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, 3:1, or 4:1, preferably 1.1:1 to 2.5:1, more preferably 1.2:1 to 2:1, even more preferably 1.2:1 to 1.7:1.
  • the molar amount of gNA is at least 10, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190 or 200 pmol and/or not more than 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190, 200, 250, or 300 pmol, preferably 25-200 pmol, more preferably 50-100 pmol, even more preferably 65 to 85 pmol.
  • embodiment 51 provided herein is the composition of any one of embodiments 40 through 50 further comprising a donor template.
  • embodiment 52 provided herein is the composition of embodiment 51 wherein the donor template comprises homology arms.
  • embodiment 53 provided herein is the composition of embodiment 51 or embodiment 52 wherein the donor template is present in an amount of at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, or 5 µg µL⁻¹ and/or not more than 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, 5, 7, or 10 µg µL⁻¹, preferably 0.3 to 2 µg µL⁻¹, more preferably 0.5 to 1.5 µg µL⁻¹, even more preferably 0.8 to 1.2 µg µL⁻¹.
  • any one of embodiments 40 through 53 further comprising an anionic polymer.
  • the anionic polymer comprises polyglutamic acid (PGA).
  • the anionic polymer is present at a concentration of at least 20, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, or 500 µg µL⁻¹ and/or not more than 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, 500, 700, or 1000 µg µL⁻¹, preferably 20 to 200 µg µL⁻¹, more preferably 50 to 150 µg µL⁻¹, even more preferably 80 to 120 µg µL⁻¹.
  • embodiment 64 provided herein is a method comprising inserting a composition of any one of embodiments 1 through 56 into a cell.
  • embodiment 65 provided herein is the method of embodiment 64 wherein inserting the composition into the cell comprises electroporation.
  • embodiment 66 provided herein is a method for modifying a target polynucleotide comprising (i) contacting the composition of any one of embodiments 40 through 56 and (ii) allowing the nuclease and the guide nucleic acid to modify a targeted genomic region.
  • embodiment 67 provided herein is the method of embodiment 66 wherein the composition is a composition of any one of embodiments 51 through 56.
  • embodiment 68 provided herein is the method of embodiment 66 or embodiment 67 wherein the target polynucleotide is a genome or a portion of a genome within a cell.
  • embodiment 69 provided herein is the method of embodiment 68 wherein the cell is a human cell.
  • embodiment 70 provided herein is the method of embodiment 69 wherein the cell is an immune cell or a stem cell.
  • embodiment 71 provided herein is the method of embodiment 70 wherein the cell is an immune cell.
  • embodiment 72 provided herein is the method of embodiment 71 wherein the cell is a T cell.
  • embodiment 73 provided herein is the method of embodiment 70 wherein the cell is a stem cell.
  • embodiment 74 provided herein is the method of embodiment 73 wherein the stem cell is an iPSC.
  • embodiment 75 provided herein is the method of any one of embodiments 67 through 74 wherein the donor template comprises a mutation in a PAM within 50 nucleotides of the target nucleotide sequence in the target polynucleotide.
  • embodiment 76 is the method of any one of embodiments 68 through 74 wherein the composition is a composition of embodiment 67 and the donor template comprises a polynucleotide coding for a polypeptide to be expressed by the cell.
  • the polypeptide to be expressed by the cell comprises a chimeric antigen receptor (CAR) or a portion thereof.
  • CAR chimeric antigen receptor
  • embodiment 78 provided herein is the method of embodiment 77 wherein the cell is a human T cell or a human iPSC.
  • embodiment 79 provided herein is the method of embodiment 77 wherein the cell is a human T cell.
  • embodiment 80 provided herein is the method of embodiment 77 wherein the cell is a human iPSC.
  • composition 81 comprising a first polynucleotide coding for a polypeptide comprising a nucleic acid-guided nuclease comprising a CRISPR Type V nuclease polypeptide, wherein the polynucleotide has less than 75% sequence identity to SEQ ID NO: 22.
  • the nuclease polypeptide comprises at least 1, 2, 3, 4, or 5 NLSs, wherein each of the NLSs is at or near the N-terminus or the C-terminus of the nuclease polypeptide.
  • one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen, an NLS from nucleoplasmin, e.g. a nucleoplasmin bipartite NLS, a c-myc NLS; a hRNPA1 M9 NLS; an IBB domain of importin-alpha NLS; a myoma T protein NLS; a sequence from human p53 NLS; a sequence of mouse c-abl IV NLS; a sequence of influenza virus NS1 NLS; a sequence of Hepatitis virus delta antigen NLS; a sequence of mouse Mx1 protein NLS; a sequence of human poly(ADP-ribose) polymerase NLS; a sequence of steroid hormone receptors (human) glucocorticoid NLS; and/or a sequence of EGL-13 NLS.
  • embodiment 84 provided herein is the composition of embodiment 83 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen.
  • embodiment 85 provided herein is the composition of embodiment 84 wherein the NLS or NLSs comprises the sequence of SEQ ID NO: 5.
  • embodiment 86 provided herein is the composition of any one of embodiments 83 through 85 wherein one or more of the NLSs comprises an NLS from nucleoplasmin.
  • embodiment 87 provided herein is the composition of embodiment 86 wherein the nucleoplasmin NLS comprises the sequence of SEQ ID NO: 6.
  • embodiment 88 provided herein is the composition of any one of embodiments 83 through 87 wherein one or more of the NLSs comprises a c-myc NLS.
  • composition of embodiment 88 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 21.
  • composition of embodiment 88 wherein the c-myc NLS comprises the sequence SEQ ID NO: 21.
  • embodiment 91 provided herein is the composition of any one of embodiments 83 through 90 wherein one or more of the NLSs comprises a sequence of EGL-13 NLS.
  • embodiment 92 provided herein is the composition of embodiment 91 wherein the EGL-13 NLS comprises the sequence of SEQ ID NO: 107.
  • embodiment 94 provided herein is the composition of any one of embodiments 81 through 93 wherein the first polynucleotide comprises a polynucleotide coding for a purification tag.
  • embodiment 95 provided herein is the composition of embodiment 94 wherein the purification tag is at or near the N-terminus of the nuclease polypeptide.
  • embodiment 96 provided herein is the composition of embodiment 94 or 95 wherein the purification tag comprises a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421) or Gly-8×His tag (SEQ ID NO: 422); short epitope tags, e.g., FLAG, hemagglutinin (HA), c-myc, T7, Glu-Glu; maltose binding protein (mbp); N-terminal glutathione S-transferase (GST); or calmodulin binding peptide (CBP).
  • embodiment 98 provided herein is the composition of embodiment 97 wherein the purification tag comprises a gly-6×His tag (SEQ ID NO: 421).
  • embodiment 99 provided herein is the composition of embodiment 97 wherein the purification tag comprises a gly-8×His tag (SEQ ID NO: 422).
  • embodiment 100 provided herein is the composition of any one of embodiments 81 through 99 wherein the Type V CRISPR nuclease polypeptide comprises a cleavage site.
  • embodiment 101 provided herein is the composition of embodiment 100 wherein the cleavage site is at or near the N-terminus of the nuclease polypeptide.
  • embodiment 102 provided herein is the composition of embodiment 100 or 101 wherein the cleavage site comprises a Tobacco Etch Virus (TEV) cleavage site.
  • TEV Tobacco Etch Virus
  • embodiment 103 provided herein is the composition of embodiment 102 wherein the cleavage site comprises the sequence of SEQ ID NO: 108.
  • embodiment 104 provided herein is the composition of embodiment 103 comprising 5 NLSs at or near the N-terminus of the polypeptide, a purification tag, and the cleavage site, wherein the cleavage site is after the purification tag.
  • embodiment 105 provided herein is the composition of any one of embodiments 81 through 104 wherein the polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112.
  • embodiment 106 provided herein is the composition of any one of embodiments 81 through 105 wherein the polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 112.
  • embodiment 108 provided herein is the composition of any one of embodiments 81 through 107 further comprising a second polynucleotide coding for a gNA or portion thereof, wherein the gNA, e.g., gRNA, comprises a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA is compatible with the Type V CRISPR nuclease.
  • embodiment 109 provided herein is the composition of embodiment 108 wherein the first and second polynucleotides are the same.
  • embodiment 110 provided herein is the composition of any one of embodiments 81 through 109 further comprising a third polynucleotide that comprises a donor template.
  • embodiment 111 is a vector comprising the polynucleotide or polynucleotides of any one of embodiments 81 through 110.
  • embodiment 112 provided herein is a cell comprising a composition of any one of embodiments 81 through 110.
  • embodiment 113 provided herein is the composition of embodiment 112 wherein the cell is a human cell.
  • embodiment 114 provided herein is the composition of embodiment 113 wherein the cell is an immune cell or a stem cell.
  • embodiment 115 provided herein is the composition of embodiment 113 wherein the cell is an immune cell.
  • embodiment 116 provided herein is the composition of embodiment 115 wherein the cell is a T cell.
  • embodiment 117 provided herein is the composition of embodiment 113 wherein the cell is a stem cell.
  • embodiment 118 provided herein is the composition of embodiment 117 wherein the cell is an iPSC.
  • embodiment 119 provided herein is a method comprising inserting the composition of any one of embodiments 81 through 111 into a cell.
  • embodiment 120 provided herein is the method of embodiment 119 wherein inserting the composition into the cell comprises electroporation.
  • embodiment 121 provided herein is a method comprising (i) inserting a composition of any one of embodiments 81 through 107 into a cell and (ii) inserting a gNA, e.g. a gRNA, compatible with the Type V CRISPR nuclease coded for by the composition, into the cell.
  • steps (i) and (ii) comprise electroporation.

Abstract

Provided herein are methods and compositions utilizing modified nucleases and/or other components, such as guide nucleic acids and donor templates, for use in a CRISPR system.

Description

    CROSS-REFERENCE
  • This application is a continuation of PCT/US2022/028208, filed May 6, 2022, which claims priority to U.S. Provisional Application No. 63/185,315, filed May 6, 2021, and to U.S. Provisional Application No. 63/315,483, filed Mar. 1, 2022, both of which are incorporated herein by reference.
  • SEQUENCE LISTING
  • The instant application contains a Sequence Listing which has been submitted electronically in XML file format and is hereby incorporated by reference in its entirety. Said XML copy, created on Jun. 16, 2023, is named ARTN-008_CON-T1_SL.xml and is 985,959 bytes in size.
  • BACKGROUND
  • Nucleic acid-guided nucleases have become important tools for research and genome engineering. The applicability of these tools can be limited by the sequence specificity requirements, expression, or delivery issues.
  • INCORPORATION BY REFERENCE
  • All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
  • FIG. 1 shows a diagram of MAD7 comprising one or more nuclear localization signals (NLS). FIG. 1 discloses “His6” as SEQ ID NO: 423.
  • FIG. 2 shows editing frequency at the DNMT1 locus in, and post-transfection cell viability of, T-cell leukemic cells following treatment comprising one or more guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 3 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SE electroporation buffer.
  • FIG. 4 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SF electroporation buffer.
  • FIG. 5 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs in combination with the SG electroporation buffer.
  • FIG. 6 shows editing frequency at the DNMT1 locus in T-cell leukemic cells using multiple electroporation programs.
  • FIG. 7 shows editing frequency by type at eight loci in T-cell leukemic cells using multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 8 shows a comparison of editing efficiency between T-cell leukemic cells treated with MAD7 comprising one or more guide nucleic acids targeting the DNMT1 locus as compared to a control guide nucleic acid binned by editing frequency.
  • FIG. 9 shows editing frequency by PAM motif in T-cell leukemic cells using multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 10A shows sequence logo plots for multiple guide nucleic acids binned by editing frequency in T-cell leukemic cells when complexed with MAD7 comprising one or more NLS.
  • FIG. 10B shows nucleotide and dinucleotide frequency for multiple guide nucleic acids binned by editing frequency in T-cell leukemic cells when complexed with MAD7 comprising one or more NLS.
  • FIG. 11 shows trinucleotide AAA or UUU frequency binned by editing frequency in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 12 shows editing frequency for both INDELs and frameshift mutations at eight loci in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 13 shows the correlation between INDEL frequency in the gNA validation experiment versus INDEL formation in the gNA screen experiment.
  • FIG. 14 shows the proportion of frameshift to INDELs at eight loci in T-cell leukemic cells following treatment with multiple guide nucleic acids complexed with MAD7 comprising one or more NLS.
  • FIG. 15 shows INDEL frequency for gNAs comprising representative spacer sequences complexed with MAD7 comprising one or more NLS in T-cell leukemic cells at predicted off-target sites. FIG. 15 discloses SEQ ID NOS 424-427, 427-429 and 429-454, respectively, in order of appearance.
  • FIG. 16 shows INDEL frequency for gNAs comprising representative spacer sequences complexed with MAD7 comprising one or more NLS in T-cell leukemic cells at predicted off-target sites. FIG. 16 discloses SEQ ID NOS 455-484, 453-454 and 485-487, respectively, in order of appearance.
  • FIG. 17 shows INDEL frequency at the AAVS1 locus in T-cell leukemic cells following treatment with a gNA:MAD7 complex.
  • FIG. 18 shows GFP insertion efficiency at the AAVS1 locus and cell viability following treatment for multiple primer constructs.
  • FIG. 19 shows GFP insertion efficiency at the AAVS1 locus with increasing concentrations of donor template (e.g., HDRT) and variable homology arm length.
  • FIG. 20 shows CAR insertion efficiency at the AAVS1 locus and cell viability with increasing concentrations of donor template and variable homology arm length.
  • FIG. 21 shows CAR insertion efficiency (A) at the AAVS1 locus and cell viability (B) in primary T-cells.
  • DETAILED DESCRIPTION
  • CRISPR is an abbreviation of Clustered Regularly Interspaced Short Palindromic Repeats. In a palindromic repeat, the sequence of nucleotides is the same in both directions. Each of these palindromic repetitions is followed by short segments of spacer DNA. Small clusters of Cas (CRISPR-associated system) genes are located next to CRISPR sequences. The CRISPR/Cas system is a prokaryotic immune system that can confer resistance to foreign genetic elements, such as those present within plasmids and phages, providing the prokaryote a form of acquired immunity. RNA harboring a spacer sequence assists Cas (CRISPR-associated) proteins to recognize and cut exogenous DNA. CRISPR sequences are found in approximately 50% of bacterial genomes and nearly 90% of sequenced archaea. Evolution has selected for efficient and robust metabolic and regulatory networks that prevent unnecessary metabolite biosynthesis and optimally distribute resources to maximize overall cellular fitness. The complexity of these networks, together with the limited approaches available to understand their structure and function and to re-program them for a diverse range of applications, has complicated advances in this space. Certain approaches to re-program cellular networks are directed to modifying single genes of complex pathways, but modifying single genes can result in unwanted modifications to those genes or to other genes, which obscures the changes necessary to achieve a sought-after endpoint and complicates the endpoint sought by the modification.
  • CRISPR-Cas driven genome editing and engineering has dramatically impacted biology and biotechnology in general. CRISPR-Cas editing systems require a polynucleotide-guided nuclease, a guide nucleic acid (gNA), e.g., a guide RNA (gRNA), that directs the nuclease to cut a specific region of the genome, and, optionally, a donor DNA cassette (also referred to herein as a donor template or editing sequence) that can be used to repair the cut dsDNA and thereby incorporate programmable edits at the site of interest. The earliest demonstrations and applications of CRISPR-Cas editing used Cas9 nucleases and associated gRNA. These systems have been used for gene editing in a broad range of species encompassing bacteria to higher order mammalian systems such as animals and, in certain cases, humans. It is well established, however, that important editing parameters such as protospacer adjacent motif (PAM) specificity, editing efficiency, and off-target rates, among others, are species, loci, and nuclease dependent. There is increasing interest in identifying and rapidly characterizing novel nuclease systems that can be exploited to broaden and improve overall editing capabilities.
  • One version of the CRISPR/Cas system, CRISPR/Cas9, has been modified to provide useful tools for editing targeted genomes. By delivering the Cas9 nuclease complexed with a synthetic guide RNA (gRNA) into a cell, the cell's genome can be cut/edited at a predetermined location, allowing existing genes to be removed and/or new ones added. These systems are useful but have some important limitations regarding efficiency and accuracy of targeted editing, imprecise editing complications, as well as impediments when used for commercially relevant situations such as gene replacement. Therefore, a need exists for improved nucleic acid guided nuclease systems for directed and accurate editing with improved efficiency.
  • As used herein, the terms “modulating” and “manipulating” of genome editing can mean an increase, a decrease, upregulation, downregulation, induction, a change in editing activity, a change in binding, a change in cleavage, or the like, of one or more targeted genes or gene clusters of certain embodiments disclosed herein.
  • In certain embodiments of the present disclosure, there can be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature and understood by those of skill in the art.
  • In other embodiments, primers used herein for preparation per conventional techniques can include sequencing primers and amplification primers. In some embodiments, plasmids and oligomers used in conventional techniques can include synthesized oligomers and oligomer cassettes.
  • In some embodiments disclosed herein, nucleic acid-guided nuclease systems and methods of use are provided. A nuclease system can include transcripts and other elements involved in the expression of an engineered nuclease disclosed herein, which can include sequences encoding a novel engineered nucleic acid-guided nuclease protein and a guide sequence (gRNA) or a novel gRNA as disclosed herein. In some embodiments, nucleic acid-guided nuclease systems can include at least one CRISPR-associated nucleic acid guided nuclease construct, the disclosure of which is provided herein. In other embodiments, nucleic acid-guided nuclease systems can include at least one known guide sequence (gRNA) or at least one novel gRNA, such as a single gRNA or a dual gRNA. In some embodiments, an engineered nucleic acid-guided nuclease of the instant invention can be used in systems for editing a gene of interest in humans or other species.
  • Bacterial and archaeal targetable nuclease systems have emerged as powerful tools for precision genome editing. However, naturally occurring nucleases have some limitations including expression and delivery challenges due to the nucleic acid sequence and protein size. In certain embodiments, novel engineered nucleic acid-guided nuclease constructs disclosed herein can be created for targeting of a targeted gene and/or increased efficiency and/or accuracy of targeted gene editing in a subject.
  • In accordance with these embodiments, it is known that Cas12a is a single RNA-guided CRISPR/Cas endonuclease capable of genome editing having differing features when compared to Cas9. In certain embodiments, a Cas12a-based system allows fast and reliable introduction of donor DNA into a genome. In addition, Cas12a broadens genome editing. CRISPR/Cas12a genome editing has been evaluated in human cells as well as other organisms including plants. Several features of the CRISPR/Cas12a system are different when compared to CRISPR/Cas9.
  • It is known that Cas12a nuclease recognizes T-rich protospacer adjacent motif (PAM) sequences (e.g., 5′-TTTN-3′ for AsCas12a and LbCas12a, and 5′-TTN-3′ for FnCas12a), whereas the comparable sequence for SpCas9 is NGG. The PAM sequence of Cas12a is located at the 5′ end of the target DNA sequence, whereas it is at the 3′ end for Cas9. In addition, Cas12a is capable of cleaving DNA distal to its PAM around the +18/+23 position of the protospacer. This cleavage creates a staggered DNA overhang (e.g., sticky ends), whereas Cas9 cleaves close to its PAM after the 3′ position of the protospacer at both strands and creates blunt ends. In certain methods, creating altered recognition of nucleases can provide an improvement over Cas9 or Cas12a to improve accuracy. Further, Cas12a is guided by a single crRNA and does not require a tracrRNA, resulting in a shorter gRNA sequence than the sgRNA used by Cas9. Surprisingly, it has been found that the modified Cas12a nucleases provided herein can also function with a dual gRNA.
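  • The difference in PAM placement described above can be illustrated with a short sketch. The following Python example is illustrative only and is not part of the disclosed methods: the function names, the toy sequence, and the protospacer lengths (21 nucleotides for the 5′-PAM search, 20 nucleotides for the 3′-PAM search) are assumptions chosen for the example, and only the strand supplied is scanned.

```python
import re

# Minimal IUPAC expansion for the PAM symbols mentioned in the text.
IUPAC = {"N": "[ACGT]", "V": "[ACG]", "Y": "[CT]"}


def _pam_regex(pam):
    # A lookahead is used so that overlapping PAM occurrences are all reported.
    body = "".join(IUPAC.get(base, base) for base in pam.upper())
    return re.compile("(?=(" + body + "))")


def find_5prime_pam_sites(seq, pam="TTTN", spacer_len=21):
    """Cas12a-style search: the PAM sits immediately 5' of the protospacer.

    Returns (pam_start, protospacer) tuples using 0-based coordinates.
    spacer_len is an assumed value for illustration.
    """
    seq = seq.upper()
    sites = []
    for m in _pam_regex(pam).finditer(seq):
        start = m.start() + len(pam)
        protospacer = seq[start:start + spacer_len]
        if len(protospacer) == spacer_len:  # skip sites truncated by the sequence end
            sites.append((m.start(), protospacer))
    return sites


def find_3prime_pam_sites(seq, pam="NGG", spacer_len=20):
    """Cas9-style search: the PAM sits immediately 3' of the protospacer."""
    seq = seq.upper()
    sites = []
    for m in _pam_regex(pam).finditer(seq):
        start = m.start() - spacer_len
        if start >= 0:  # skip sites truncated by the sequence start
            sites.append((m.start(), seq[start:m.start()]))
    return sites


# Hypothetical toy sequence containing one TTTN PAM and one NGG PAM.
toy = "CCTTTAGACCTGAAGCTTCGATCGGATCC"
print(find_5prime_pam_sites(toy))
print(find_3prime_pam_sites(toy))
```

  • In this sketch the same toy sequence yields a different protospacer for each search, because the 5′-PAM search reads the nucleotides downstream of the PAM while the 3′-PAM search reads the nucleotides upstream of it.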
  • It is also known that Cas12a displays additional ribonuclease activity that functions in crRNA processing. Cas12a is used as an editing tool for different species (e.g. S. cerevisiae), allowing the use of an alternative PAM sequence compared with the one recognized by CRISPR/Cas9. Novel nucleases disclosed herein can further recognize the same or alternative PAM sequences. These novel nucleases can provide an alternative system for multiplex genome editing as compared with known multiplex approaches and can be used as an improved system in mammalian gene editing.
  • Well-known Cas12a protein-RNA complexes recognize a T-rich PAM and cleavage leads to a staggered DNA double-stranded break. Cas12a-type nuclease interacts with the pseudoknot structure formed by the 5′-handle of crRNA. A guide RNA segment, composed of a seed region and the 3′ terminus, possesses complementary binding sequences with the target DNA sequences. Cas12a type nucleases characterized to date have been demonstrated to work with a single gRNA and to process gRNA arrays. While Cas12a-type and Cas9 nuclease systems have proven highly impactful, neither system has been demonstrated to function as predictably as is desired to enable the full range of applications envisioned for gene-editing technologies.
  • In the current state, a range of efforts have attempted to engineer improved CRISPR editing systems having increased efficiency and accuracy, which have included engineering of the PAM specificity, stability, and sequence of the gRNA and/or the nuclease. For example, chemical modifications of CRISPR/Cas9 gRNA expected to increase gRNA stability were found to lead to 3.8-fold higher indel frequencies in human cells. In addition, other studies used structure-guided mutagenesis of Cas12a and screening to identify variants with an increased range of recognized PAM sequences. These engineered AsCas12a variants recognized TYCV and TATV PAMs in addition to the established TTTV sequence, with enhanced activities in vitro and in tested human cells.
  • In certain embodiments, Cas12a-like nucleases and engineered gRNAs disclosed herein are contemplated for use in bacteria and other prokaryotes. In certain embodiments, engineered designer nucleases are contemplated for use in eukaryotes such as yeast and mammals, e.g., humans, as well as in birds and fish, or cells derived from the same.
  • In some embodiments, off-targeting rates for nuclease constructs disclosed herein can be reduced compared to a control, e.g., a native sequence, for improved editing. Off-targeting rates can be readily tested.
  • In some embodiments, nuclease constructs disclosed herein can share conserved encoded motifs of known nucleases. In other embodiments, nuclease constructs disclosed herein do not share conserved encoded peptide motifs with known nucleases. In preferred embodiments, provided herein are compositions, methods, and/or kits wherein the CRISPR nuclease comprises a Type V nuclease. In certain embodiments, provided herein are compositions, methods, and/or kits wherein the CRISPR nuclease comprises a Type V-A, V-B, V-C, V-D, or V-E CRISPR nuclease. In certain embodiments, provided herein are compositions, methods, and/or kits wherein the CRISPR nuclease comprises a Type V-A nuclease. Naturally occurring type V-A CRISPR nucleases comprise a RuvC-like nuclease domain but lack an HNH endonuclease domain, and recognize a 5′ T-rich PAM located immediately upstream from the target nucleotide sequence, the orientation determined using the non-target strand (i.e., the strand not hybridized with the spacer sequence) as the coordinate. These CRISPR nucleases cleave a double-stranded DNA to generate a staggered double-stranded break rather than a blunt end. The cleavage site is distant from the PAM site (e.g., separated by at least 10, 11, 12, 13, 14, or 15 nucleotides downstream from the PAM on the non-target strand and/or separated by at least 15, 16, 17, 18, or 19 nucleotides upstream from the sequence complementary to PAM on the target strand).
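  • As a worked illustration of the staggered break described above, the following Python sketch computes cut coordinates from the start of a protospacer. It is a simplified model only: the default offsets of 18 and 23 nucleotides are taken from the "+18/+23 position" mentioned earlier in this description, the function name is hypothetical, and actual cleavage positions can vary with the nuclease and target.

```python
def staggered_cut(protospacer_start, nontarget_offset=18, target_offset=23):
    """Return (nontarget_cut, target_cut, overhang_len) for a staggered break.

    Coordinates are 0-based indices on the non-target strand. A cut value c
    means the strand is broken between index c - 1 and index c.
    protospacer_start is the index of the first nucleotide after the PAM.
    The default offsets are assumptions used for illustration.
    """
    if nontarget_offset < 0 or target_offset < 0:
        raise ValueError("offsets must be non-negative")
    nontarget_cut = protospacer_start + nontarget_offset
    target_cut = protospacer_start + target_offset
    overhang_len = abs(target_cut - nontarget_cut)
    return nontarget_cut, target_cut, overhang_len


# Hypothetical protospacer beginning at index 6 (e.g., after a 4-nt PAM at 2..5).
nontarget_cut, target_cut, overhang_len = staggered_cut(6)
print(nontarget_cut, target_cut, overhang_len)
```

  • Setting both offsets to the same value in this sketch models a blunt cut, for which the overhang length is zero.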
  • In certain embodiments, a type V-A CRISPR nuclease comprises Cpf1. Cpf1 proteins are known in the art and are described, e.g., in U.S. Pat. Nos. 9,790,490 and 10,113,179. Cpf1 orthologs can be found in various bacterial and archaeal genomes. For example, in certain embodiments, the Cpf1 protein is derived from Francisella novicida U112 (Fn), Acidaminococcus sp. BV3L6 (As), Lachnospiraceae bacterium ND2006 (Lb), Lachnospiraceae bacterium MA2020 (Lb2), Candidatus Methanoplasma termitum (CMt), Moraxella bovoculi 237 (Mb), Porphyromonas crevioricanis (Pc), Prevotella disiens (Pd), Francisella tularensis 1, Francisella tularensis subsp. novicida, Prevotella albensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Eubacterium eligens, Leptospira inadai, Porphyromonas macacae, Prevotella bryantii, Proteocatella sphenisci, Anaerovibrio sp. RM50, Moraxella caprae, Lachnospiraceae bacterium COE1, or Eubacterium coprostanoligenes.
  • In certain embodiments, a type V-A CRISPR nuclease comprises AsCpf1 or a variant thereof. In certain embodiments, a type V-A CRISPR nuclease comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A CRISPR nuclease comprises the amino acid sequence set forth in SEQ ID NO: 3 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises LbCpf1 or a variant thereof. In certain embodiments, a type V-A CRISPR nuclease comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 4 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises FnCpf1 or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 5 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Prevotella bryantii Cpf1 (PbCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 6 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Proteocatella sphenisci Cpf1 (PsCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 7 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Anaerovibrio sp. RM50 Cpf1 (As2Cpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 8 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Moraxella caprae Cpf1 (McCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 9 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Lachnospiraceae bacterium COE1 Cpf1 (Lb3Cpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 10 of International (PCT) Application Publication No. WO 2021/158918.
  • In certain embodiments, a type V-A CRISPR nuclease comprises Eubacterium coprostanoligenes Cpf1 (EcCpf1) or a variant thereof. In certain embodiments, a type V-A Cas protein comprises an amino acid sequence at least 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918. In certain embodiments, a type V-A Cas protein comprises the amino acid sequence set forth in SEQ ID NO: 11 of International (PCT) Application Publication No. WO 2021/158918.
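  • The percent-identity thresholds recited in the preceding paragraphs can be illustrated with a minimal sketch. The following Python example assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and counts identical residues over the aligned columns. This is one simple convention chosen for illustration; it is not asserted to be the definition of sequence identity used in this disclosure, and the function names and the single substitution in the example are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a column in which
    only one sequence carries a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, threshold):
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First ten residues of SEQ ID NO: 1 versus a hypothetical variant with one
# substitution at the fifth position.
reference = "MNNGTNNFQN"
variant = "MNNGSNNFQN"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant, 95))
```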
  • In certain embodiments, a type V-A CRISPR nuclease is not Cpf1. In certain embodiments, a type V-A CRISPR nuclease is not AsCpf1.
  • In certain embodiments, a type V-A CRISPR nuclease comprises a Type V-A nuclease described in U.S. Pat. No. 9,982,279.
  • In certain embodiments, a Type V-A CRISPR nuclease polypeptide used in compositions and methods herein can be represented by a polypeptide that includes a sequence that has at least 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, or 100% sequence identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% sequence identity with SEQ ID NO: 1, wherein the Type V-A CRISPR nuclease polypeptide further comprises at least one, two, three, four, five or six nuclear localization sequences (NLS), each of which can be at or near the amino end or carboxy end of the CRISPR nuclease polypeptide; and/or one or more purification tags; in addition, a cleavage sequence can be provided to remove portions of a protopeptide. As used herein, the term “at or near” an N-terminus or a C-terminus includes where the nearest amino acid of the NLS to the N- or C-terminus is within 300 amino acids, in some cases within 200 amino acids, from the N- or C-terminus of the polypeptide (e.g., a core polypeptide such as one of the CRISPR nucleases described herein, to which the NLS or NLSs is attached). In certain embodiments, a Type V CRISPR nuclease polypeptide, e.g., a Type V-A CRISPR polypeptide, comprises two, three, four, or five NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide; in preferred embodiments, the NLSs are at or near the N-terminus. In certain embodiments, a CRISPR nuclease polypeptide, including one or more NLSs and, in some cases, a purification tag and/or a cleavage site, comprises a sequence at least 60, 70, 80, 85, 90, 95, 98, 99, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112. 
In certain embodiments, a Type V, e.g., V-A, CRISPR nuclease polypeptide comprises at least 1-30, 1-20, 1-15, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 2-30, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 3-30, 3-20, 3-15, 3-10, 3-9, 3-8, 3-7, 3-6, or 3-5, preferably 1-10, more preferably 2-10, even more preferably 3-10 NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide, in preferred embodiments at or near the N-terminus. In certain embodiments, at least two, or at least three, of the NLSs have different mechanisms, that is, different mechanisms by which they localize an attached polypeptide to a nucleus. Such mechanisms are well-known in the art; see, e.g., Lu et al. Cell Commun Signal (2021) 19:60 https://doi.org/10.1186/s12964-021-00741-y. Suitable NLS, purification tag, and cleavage site sequences can be as described elsewhere herein, e.g., in sections labeled Nuclear Localization Signals, Purification Tags, and Cleavage Sites.
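  • The "at or near" definition above can be expressed as a simple positional check. The following Python sketch reflects one possible reading of that definition, in which distance is counted as the number of residues between the nearest NLS residue and the terminal residue of the polypeptide being considered. The function names, the 1-based inclusive coordinates, and the example positions and lengths are assumptions made for illustration.

```python
def nls_distances(nls_start, nls_end, polypeptide_len):
    """Return (distance_to_N_terminus, distance_to_C_terminus) in residues.

    nls_start and nls_end are 1-based, inclusive positions of the NLS within
    the polypeptide. A distance of 0 means the NLS includes the terminal residue.
    """
    if not (1 <= nls_start <= nls_end <= polypeptide_len):
        raise ValueError("NLS coordinates must lie within the polypeptide")
    return nls_start - 1, polypeptide_len - nls_end


def is_at_or_near_terminus(nls_start, nls_end, polypeptide_len, window=300):
    """Classify an NLS as near the N-terminus, the C-terminus, both, or neither.

    window defaults to 300 residues; 200 can be passed for the narrower case
    mentioned in the text.
    """
    to_n, to_c = nls_distances(nls_start, nls_end, polypeptide_len)
    return {"N": to_n <= window, "C": to_c <= window}


# Hypothetical 1300-residue fusion polypeptide.
print(is_at_or_near_terminus(2, 8, 1300))       # NLS close to the N-terminus
print(is_at_or_near_terminus(600, 606, 1300))   # NLS in the middle of the chain
print(is_at_or_near_terminus(1050, 1056, 1300, window=200))
```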
  • SEQ ID NO: 1
    MNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGEN
    RQILKDIMDDYYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIK
    EQTEYRKAIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEEK
    TQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEIFFSNA
    LVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGIS
    FYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKF
    ESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYES
    VSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEI
    NELVSNYKLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELK
    ASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELEEIYDEIYPVISL
    YNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNNAIILMRDNLY
    YLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSK
    TGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEW
    KNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLY
    LFQIYNKDFSKKSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRK
    SSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKNIPENIYQELYKYF
    NDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFK
    ANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKS
    FNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISK
    MVIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDIS
    ITENGGLLKGYQLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNI
    FKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQNTVMSKSS
    WSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLR
    QDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLNENNIFYD
    SAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISN
    KDWFDFIQNKRYL
  • Nucleotide sequences coding for SEQ ID NO: 1 can include sequences with less than 99, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, or 40% sequence identity with SEQ ID NO: 22, in preferred embodiments less than 75% sequence identity. In certain embodiments, a nucleotide sequence coding for SEQ ID NO: 1 can also include nucleic acid sequences coding for one or more NLS at the N-terminus and/or C-terminus, as described herein, and/or a tag such as a purification tag at the N-terminus, as described herein. In certain embodiments, provided herein are compositions comprising a first polynucleotide coding for a polypeptide comprising a nucleic acid-guided nuclease comprising a CRISPR Type V nuclease polypeptide, wherein the polynucleotide has less than 75% sequence identity to SEQ ID NO: 22, such as wherein the nuclease polypeptide comprises at least 1, 2, 3, 4, or 5 NLSs, wherein each of the NLSs is at or near the N-terminus or the C-terminus of the nuclease polypeptide. NLSs can be any of those described herein. The first polynucleotide can comprise a sequence coding for a purification tag, such as a purification tag described herein, and/or cleavage site, such as a cleavage site described herein. In certain embodiments, the first polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112, such as SEQ ID NO: 109, or SEQ ID NO: 110, or SEQ ID NO: 111, or SEQ ID NO: 112. In certain embodiments, the first polynucleotide comprises a sequence at least 50, 60, 70, 80, 90, 95, 97, or 99% identical, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 113. 
In certain embodiments, the composition further comprises a second polynucleotide coding for a gNA or portion thereof, wherein the gNA, e.g., gRNA, comprises a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA, is compatible with the Type V CRISPR nuclease. In certain embodiments, the first and second polynucleotides are the same. The composition can further comprise a third polynucleotide comprising a donor template. In certain embodiments, provided is a vector comprising one of the polynucleotide compositions of this paragraph. In certain embodiments, provided is a cell comprising one of the polynucleotide compositions of this paragraph, e.g., a human cell, such as an immune cell, for example a T cell, or a stem cell, such as an iPSC. In certain embodiments, provided is a method comprising inserting any one of the polynucleotide compositions of this paragraph into a cell. In certain embodiments, inserting the composition comprises electroporation.
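  • The idea that different nucleotide sequences can code for the same polypeptide while sharing limited nucleotide identity can be illustrated with a short sketch. The following Python example translates the first eleven codons of SEQ ID NO: 22 and a hypothetical synonymously recoded version of the same stretch using the standard genetic code, then compares the two at the nucleotide level position by position. The recoded sequence, the function names, and the ungapped position-by-position comparison are assumptions made for illustration; the recoded sequence is not a sequence of this disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a, b):
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


# First 33 nucleotides of SEQ ID NO: 22.
original = "ATGAACAACGGCACAAATAATTTTCAGAACTTC"
# Hypothetical recoding that changes only synonymous third-codon positions.
recoded = "ATGAATAATGGAACCAACAACTTCCAAAATTTT"

print(translate(original))
print(translate(recoded))
print(round(nucleotide_identity(original, recoded), 1))
```

  • In this toy case both sequences translate to the first eleven residues of SEQ ID NO: 1, while ten of the thirty-three nucleotide positions differ.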
  • SEQ ID NO: 22:
    ATGAACAACGGCACAAATAATTTTCAGAACTTCATCGGGATCTCAAGTTTGCAGAAA
    ACGCTGCGCAATGCTCTGATCCCCACGGAAACCACGCAACAGTTCATCGTCAAGAA
    CGGAATAATTAAAGAAGATGAGTTACGTGGCGAGAACCGCCAGATTCTGAAAGATA
    TCATGGATGACTACTACCGCGGATTCATCTCTGAGACTCTGAGTTCTATTGATGACA
    TAGATTGGACTAGCCTGTTCGAAAAAATGGAAATTCAGCTGAAAAATGGTGATAAT
    AAAGATACCTTAATTAAGGAACAGACAGAGTATCGGAAAGCAATCCATAAAAAATT
    TGCGAACGACGATCGGTTTAAGAACATGTTTAGCGCCAAACTGATTAGTGACATATT
    ACCTGAATTTGTCATCCACAACAATAATTATTCGGCATCAGAGAAAGAGGAAAAAA
    CCCAGGTGATAAAATTGTTTTCGCGCTTTGCGACTAGCTTTAAAGATTACTTCAAGA
    ACCGTGCAAATTGCTTTTCAGCGGACGATATTTCATCAAGCAGCTGCCATCGCATCG
    TCAACGACAATGCAGAGATATTCTTTTCAAATGCGCTGGTCTACCGCCGGATCGTAA
    AATCGCTGAGCAATGACGATATCAACAAAATTTCGGGCGATATGAAAGATTCATTA
    AAAGAAATGAGTCTGGAAGAAATATATTCTTACGAGAAGTATGGGGAATTTATTAC
    CCAGGAAGGCATTAGCTTCTATAATGATATCTGTGGGAAAGTGAATTCTTTTATGAA
    CCTGTATTGTCAGAAAAATAAAGAAAACAAAAATTTATACAAACTTCAGAAACTTC
    ACAAACAGATTCTATGCATTGCGGACACTAGCTATGAGGTCCCGTATAAATTTGAAA
    GTGACGAGGAAGTGTACCAATCAGTTAACGGCTTCCTTGATAACATTAGCAGCAAA
    CATATAGTCGAAAGATTACGCAAAATCGGCGATAACTATAACGGCTACAACCTGGA
    TAAAATTTATATCGTGTCCAAATTTTACGAGAGCGTTAGCCAAAAAACCTACCGCGA
    CTGGGAAACAATTAATACCGCCCTCGAAATTCATTACAATAATATCTTGCCGGGTAA
    CGGTAAAAGTAAAGCCGACAAAGTAAAAAAAGCGGTTAAGAATGATTTACAGAAAT
    CCATCACCGAAATAAATGAACTAGTGTCAAACTATAAGCTGTGCAGTGACGACAAC
    ATCAAAGCGGAGACTTATATACATGAGATTAGCCATATCTTGAATAACTTTGAAGCA
    CAGGAATTGAAATACAATCCGGAAATTCACCTAGTTGAATCCGAGCTCAAAGCGAG
    TGAGCTTAAAAACGTGCTGGACGTGATCATGAATGCGTTTCATTGGTGTTCGGTTTTT
    ATGACTGAGGAACTTGTTGATAAAGACAACAATTTTTATGCGGAACTGGAGGAGAT
    TTACGATGAAATTTATCCAGTAATTAGTCTGTACAACCTGGTTCGTAACTACGTTACC
    CAGAAACCGTACAGCACGAAAAAGATTAAATTGAACTTTGGAATACCGACGTTAGC
    AGACGGTTGGTCAAAGTCCAAAGAGTATTCTAATAACGCTATCATACTGATGCGCGA
    CAATCTGTATTATCTGGGCATCTTTAATGCGAAGAATAAACCGGACAAGAAGATTAT
    CGAGGGTAATACGTCAGAAAATAAGGGTGACTACAAAAAGATGATTTATAATTTGC
    TCCCGGGTCCCAACAAAATGATCCCGAAAGTTTTCTTGAGCAGCAAGACGGGGGTG
    GAAACGTATAAACCGAGCGCCTATATCCTAGAGGGGTATAAACAGAATAAACATAT
    CAAGTCTTCAAAAGACTTTGATATCACTTTCTGTCATGATCTGATCGACTACTTCAAA
    AACTGTATTGCAATTCATCCCGAGTGGAAAAACTTCGGTTTTGATTTTAGCGACACC
    AGTACTTATGAAGACATTTCCGGGTTTTATCGTGAGGTAGAGTTACAAGGTTACAAG
    ATTGATTGGACATACATTAGCGAAAAAGACATTGATCTGCTGCAGGAAAAAGGTCA
    ACTGTATCTGTTCCAGATATATAACAAAGATTTTTCGAAAAAATCAACCGGGAATGA
    CAACCTTCACACCATGTACCTGAAAAATCTTTTCTCAGAAGAAAATCTTAAGGATAT
    CGTCCTGAAACTTAACGGCGAAGCGGAAATCTTCTTCAGGAAGAGCAGCATAAAGA
    ACCCAATCATTCATAAAAAAGGCTCGATTTTAGTCAACCGTACCTACGAAGCAGAA
    GAAAAAGACCAGTTTGGCAACATTCAAATTGTGCGTAAAAATATTCCGGAAAACAT
    TTATCAGGAGCTGTACAAATACTTCAACGATAAAAGCGACAAAGAGCTGTCTGATG
    AAGCAGCCAAACTGAAGAATGTAGTGGGACACCACGAGGCAGCGACGAATATAGTC
    AAGGACTATCGCTACACGTATGATAAATACTTCCTTCATATGCCTATTACGATCAAT
    TTCAAAGCCAATAAAACGGGTTTTATTAATGATAGGATCTTACAGTATATCGCTAAA
    GAAAAAGACTTACATGTGATCGGCATTGATCGGGGCGAGCGTAACCTGATCTACGT
    GTCCGTGATTGATACTTGTGGTAATATAGTTGAACAGAAAAGCTTTAACATTGTAAA
    CGGCTACGACTATCAGATAAAACTGAAACAACAGGAGGGCGCTAGACAGATTGCGC
    GGAAAGAATGGAAAGAAATTGGTAAAATTAAAGAGATCAAAGAGGGCTACCTGAG
    CTTAGTAATCCACGAGATCTCTAAAATGGTAATCAAATACAATGCAATTATAGCGAT
    GGAGGATTTGTCTTATGGTTTTAAAAAAGGGCGCTTTAAGGTCGAACGGCAAGTTTA
    CCAGAAATTTGAAACCATGCTCATCAATAAACTCAACTATCTGGTATTTAAAGATAT
    TTCGATTACCGAGAATGGCGGTCTCCTGAAAGGTTATCAGCTGACATACATTCCTGA
    TAAACTTAAAAACGTGGGTCATCAGTGCGGCTGCATTTTTTATGTGCCTGCTGCATA
    CACGAGCAAAATTGATCCGACCACCGGCTTTGTGAATATCTTTAAATTTAAAGACCT
    GACAGTGGACGCAAAACGTGAATTCATTAAAAAATTTGACTCAATTCGTTATGACAG
    TGAAAAAAATCTGTTCTGCTTTACATTTGACTACAATAACTTTATTACGCAAAACAC
    GGTCATGAGCAAATCATCGTGGAGTGTGTATACATACGGCGTGCGCATCAAACGTC
    GCTTTGTGAACGGCCGCTTCTCAAACGAAAGTGATACCATTGACATAACCAAAGATA
    TGGAGAAAACGTTGGAAATGACGGACATTAACTGGCGCGATGGCCACGATCTTCGT
    CAAGACATTATAGATTATGAAATTGTTCAGCACATATTCGAAATTTTCCGTTTAACA
    GTGCAAATGCGTAACTCCTTGTCTGAACTGGAGGACCGTGATTACGATCGTCTCATT
    TCACCTGTACTGAACGAAAATAACATTTTTTATGACAGCGCGAAAGCGGGGGATGC
    ACTTCCTAAGGATGCCGATGCAAATGGTGCGTATTGTATTGCATTAAAAGGGTTATA
    TGAAATTAAACAAATTACCGAAAATTGGAAAGAAGATGGTAAATTTTCGCGCGATA
    AACTCAAAATCAGCAATAAAGATTGGTTCGACTTTATCCAGAATAAGCGCTATCTCT
    AA
  • Exemplary nucleotide sequences coding for SEQ ID NO: 1 can include, e.g., SEQ ID NOs: 23-42:
  • SEQ ID NO: 23
    ATGAACAACGGAACAAATAATTTTCAGAACTTTATTGGGATCAGTTCGCTTCAGAAA
    ACGCTTCGTAATGCTCTGATTCCCACAGAAACCACTCAGCAGTTTATCGTAAAGAAT
    GGCATTATCAAGGAGGATGAATTACGCGGCGAGAACCGCCAAATCTTAAAAGATAT
    CATGGACGACTACTACCGCGGTTTCATTAGCGAAACTCTTAGTTCAATTGACGACAT
    TGACTGGACGTCCTTGTTCGAAAAGATGGAGATTCAATTAAAGAACGGTGATAACA
    AGGATACGTTGATTAAAGAACAGACGGAGTACCGTAAGGCTATCCACAAAAAATTT
    GCAAACGACGACCGCTTTAAAAATATGTTTAGCGCAAAATTAATCTCCGACATCCTG
    CCTGAATTCGTCATCCATAACAATAACTATAGCGCCTCGGAAAAAGAAGAAAAAAC
    GCAGGTTATTAAACTTTTCTCGCGCTTTGCAACAAGCTTTAAGGATTACTTCAAAAA
    TCGCGCCAATTGTTTTTCAGCCGACGACATTAGCTCCAGTTCCTGCCACCGTATTGTG
    AATGACAACGCTGAGATTTTTTTTTCCAATGCGCTGGTTTATCGTCGTATTGTTAAGA
    GCCTTAGTAACGACGACATTAATAAAATTAGCGGTGATATGAAGGATAGCTTGAAA
    GAAATGAGTCTGGAAGAGATCTATAGTTACGAGAAGTACGGCGAATTTATTACCCA
    GGAGGGCATTTCATTTTACAATGATATCTGTGGAAAAGTCAACTCCTTTATGAACTT
    GTATTGCCAAAAGAATAAAGAAAACAAAAACCTGTACAAACTGCAAAAGTTACACA
    AGCAGATTTTGTGTATCGCAGACACGTCATACGAAGTACCGTACAAGTTTGAGTCCG
    ATGAAGAAGTGTACCAAAGCGTTAATGGCTTTTTGGATAACATTTCGAGCAAACATA
    TCGTAGAGCGTTTGCGTAAGATTGGTGATAATTACAACGGTTACAATTTAGACAAAA
    TCTATATCGTCTCTAAGTTTTACGAAAGTGTTTCTCAGAAAACTTACCGCGATTGGG
    AGACGATCAACACTGCGCTGGAGATTCATTACAATAATATCCTTCCAGGTAACGGTA
    AAAGCAAAGCTGATAAGGTGAAAAAGGCGGTTAAAAATGACCTTCAAAAGTCTATC
    ACAGAAATCAACGAATTGGTCAGCAATTATAAGCTTTGCAGTGACGATAACATTAA
    GGCCGAGACTTACATCCATGAGATCTCTCACATTCTTAATAATTTTGAAGCGCAAGA
    GCTGAAATACAATCCTGAAATCCATCTGGTCGAAAGTGAATTAAAAGCCTCCGAATT
    AAAAAATGTCTTGGACGTGATCATGAATGCGTTCCATTGGTGCTCAGTTTTTATGAC
    GGAAGAGTTGGTGGACAAAGACAACAATTTTTACGCCGAGCTTGAGGAAATTTACG
    ACGAAATTTACCCCGTTATTTCGTTATACAACCTTGTGCGTAATTACGTTACACAAA
    AGCCCTATTCGACAAAGAAAATCAAGTTAAATTTCGGGATTCCCACATTAGCTGATG
    GATGGTCCAAATCCAAAGAATACTCGAATAACGCTATCATCCTTATGCGTGATAATT
    TGTACTACTTAGGCATCTTCAATGCGAAGAACAAACCTGACAAGAAAATTATCGAA
    GGAAACACTTCGGAGAACAAAGGTGATTATAAAAAGATGATCTACAACTTGCTTCC
    CGGGCCAAACAAAATGATTCCCAAGGTATTTTTGAGTTCTAAAACCGGTGTCGAAAC
    TTACAAACCAAGTGCTTATATTTTGGAAGGATACAAACAGAACAAACATATCAAGT
    CTTCGAAAGACTTCGATATTACGTTCTGCCACGATCTGATCGATTACTTCAAGAACT
    GTATTGCTATTCACCCCGAGTGGAAGAACTTTGGATTTGATTTCTCCGACACGTCCA
    CTTATGAAGATATCTCTGGCTTCTATCGCGAGGTTGAATTACAAGGGTATAAGATTG
    ACTGGACTTATATTTCGGAGAAGGATATCGATCTTTTGCAAGAAAAAGGGCAACTTT
    ATTTATTTCAGATCTATAACAAGGACTTTTCAAAAAAGAGCACTGGAAATGACAATC
    TGCATACCATGTACCTTAAGAACCTGTTCTCGGAAGAGAACCTGAAGGACATTGTAC
    TTAAACTGAATGGAGAGGCAGAGATCTTCTTTCGCAAATCAAGCATTAAGAACCCA
    ATTATTCACAAAAAGGGGAGTATCTTAGTAAATCGCACATATGAGGCTGAGGAAAA
    AGATCAGTTTGGTAACATTCAGATCGTGCGTAAGAACATTCCTGAAAATATCTATCA
    GGAACTTTATAAGTATTTCAACGATAAAAGTGATAAAGAGCTGAGTGACGAAGCGG
    CTAAACTTAAGAATGTTGTGGGACACCATGAGGCAGCAACCAATATTGTGAAGGAT
    TATCGCTATACGTACGACAAATACTTTTTACACATGCCCATCACTATTAATTTTAAAG
    CTAATAAGACTGGCTTCATTAACGATCGCATCCTGCAGTACATTGCTAAGGAAAAGG
    ATCTTCACGTTATCGGTATCGATCGCGGGGAGCGTAATCTTATCTACGTCTCTGTCAT
    TGACACGTGTGGCAATATTGTGGAGCAAAAGTCCTTCAATATTGTTAACGGCTATGA
    CTATCAGATTAAATTGAAACAGCAGGAAGGTGCGCGTCAGATTGCCCGCAAGGAAT
    GGAAGGAAATTGGCAAGATCAAAGAAATTAAGGAGGGCTACTTAAGCTTAGTAATT
    CACGAAATTAGTAAAATGGTTATCAAATACAACGCCATCATCGCGATGGAGGATCTT
    TCGTACGGGTTTAAGAAAGGTCGTTTTAAAGTGGAGCGTCAGGTGTACCAGAAATTT
    GAAACTATGCTTATTAACAAACTTAACTACCTGGTTTTCAAGGATATCAGTATTACT
    GAAAACGGGGGGCTGTTAAAAGGGTATCAATTAACTTACATTCCAGACAAATTAAA
    GAACGTTGGACATCAGTGTGGCTGCATTTTTTATGTACCAGCTGCATACACTTCAAA
    GATCGATCCTACGACTGGGTTCGTGAACATTTTTAAGTTTAAAGACTTGACGGTAGA
    TGCCAAGCGCGAATTCATCAAGAAATTCGACAGCATTCGCTACGACTCTGAGAAAA
    ATCTTTTCTGTTTCACATTCGATTATAACAATTTCATTACGCAGAACACAGTAATGTC
    CAAGTCTTCTTGGAGTGTTTATACATATGGTGTCCGCATTAAGCGCCGTTTCGTCAAC
    GGCCGCTTCAGTAATGAGAGCGATACTATTGACATCACAAAAGACATGGAAAAAAC
    ACTGGAAATGACCGACATCAATTGGCGTGACGGCCATGACTTACGTCAGGATATCAT
    TGATTATGAGATCGTTCAACACATCTTCGAAATCTTTCGCTTGACTGTTCAAATGCGC
    AATTCCTTGTCGGAATTGGAGGACCGTGATTATGACCGCTTAATTTCCCCCGTCTTAA
    ATGAAAACAATATTTTTTATGACTCTGCAAAAGCTGGAGATGCTCTGCCGAAAGACG
    CCGATGCAAATGGGGCATATTGCATTGCTTTAAAGGGGCTTTACGAGATCAAGCAA
    ATCACCGAAAACTGGAAAGAGGATGGAAAGTTTTCGCGTGATAAACTGAAGATCTC
    TAACAAAGACTGGTTCGACTTTATCCAGAACAAGCGTTATTT
  • SEQ ID NO: 24
    ATGAACAACGGCACCAATAACTTCCAAAACTTCATCGGGATCTCTAGCCTTCAGAAG
    ACGCTTCGCAATGCTCTTATCCCAACTGAGACCACTCAACAATTTATTGTGAAGAAT
    GGAATTATTAAAGAGGACGAACTGCGTGGCGAGAATCGTCAGATCTTAAAGGACAT
    TATGGATGATTATTACCGTGGATTCATCTCCGAAACATTATCGTCGATCGATGATAT
    CGATTGGACTTCTCTGTTCGAGAAAATGGAAATTCAATTGAAAAACGGAGATAATA
    AAGATACGCTTATCAAAGAACAGACGGAATATCGTAAAGCGATTCATAAGAAATTC
    GCAAATGACGATCGTTTCAAAAATATGTTCAGTGCCAAGCTTATTTCGGACATTTTA
    CCTGAATTTGTAATTCATAATAATAACTACTCAGCAAGTGAGAAGGAGGAGAAAAC
    CCAAGTTATTAAACTGTTCTCTCGTTTCGCAACGTCCTTTAAAGATTACTTTAAAAAC
    CGCGCGAATTGCTTTAGCGCTGACGACATTTCCAGCTCATCCTGTCATCGCATCGTA
    AACGACAATGCGGAAATCTTCTTCAGCAACGCCCTGGTTTACCGCCGCATCGTCAAA
    AGCTTATCGAATGACGACATCAATAAGATCTCAGGAGATATGAAGGACTCGCTTAA
    GGAGATGTCTCTGGAGGAAATTTATAGTTACGAAAAGTATGGAGAGTTCATTACCCA
    GGAGGGAATCTCGTTCTACAATGACATTTGCGGGAAGGTGAACTCCTTCATGAACTT
    ATACTGCCAGAAAAACAAAGAGAACAAAAATCTGTATAAATTGCAGAAATTACATA
    AACAGATTCTTTGTATTGCTGACACTTCCTACGAAGTACCCTATAAATTCGAGTCAG
    ATGAAGAAGTATACCAGTCCGTGAACGGATTTCTGGACAATATCTCCTCAAAACACA
    TCGTGGAACGCTTACGTAAAATTGGCGATAATTATAATGGTTACAATCTTGACAAAA
    TTTATATCGTATCTAAATTTTACGAGAGTGTGAGCCAAAAGACCTACCGCGACTGGG
    AGACCATCAACACAGCTTTAGAAATTCACTATAATAATATCTTACCCGGCAATGGTA
    AGAGCAAGGCTGACAAGGTAAAAAAGGCCGTCAAGAATGATTTGCAGAAATCTATT
    ACAGAAATTAATGAGTTAGTCTCCAACTATAAGCTTTGTTCCGACGATAACATCAAA
    GCTGAGACATATATTCATGAGATTAGTCACATTCTTAACAACTTCGAGGCCCAGGAA
    CTTAAGTACAATCCTGAAATTCATCTTGTCGAGTCTGAGCTGAAAGCTAGTGAATTG
    AAAAATGTTTTAGACGTTATTATGAACGCATTCCACTGGTGCTCTGTGTTTATGACA
    GAAGAACTGGTCGACAAGGACAATAACTTCTATGCCGAACTTGAGGAAATCTACGA
    TGAAATTTACCCTGTAATCTCCTTGTATAATCTTGTACGTAATTACGTCACTCAAAAA
    CCTTACAGCACGAAAAAAATTAAATTGAACTTCGGGATTCCTACACTTGCCGACGGG
    TGGTCTAAATCCAAGGAATATAGCAACAATGCCATTATTTTAATGCGCGACAATCTT
    TACTATTTAGGAATTTTTAACGCTAAGAACAAGCCCGATAAAAAGATTATTGAAGGA
    AACACGTCTGAAAATAAGGGCGACTACAAAAAGATGATTTATAACCTTTTGCCCGGT
    CCAAACAAAATGATCCCAAAGGTATTCCTGTCATCCAAAACAGGGGTTGAGACATA
    TAAGCCCAGCGCATATATTCTGGAAGGATACAAACAGAATAAACATATCAAAAGCA
    GCAAAGATTTTGACATTACTTTTTGCCACGATTTAATCGACTACTTCAAAAACTGTAT
    CGCTATCCACCCTGAATGGAAGAATTTCGGATTTGATTTCTCAGATACAAGTACGTA
    TGAGGATATCAGCGGTTTCTATCGCGAAGTTGAACTTCAAGGGTATAAAATTGACTG
    GACCTACATTAGTGAGAAGGACATCGACCTGTTACAGGAAAAAGGCCAATTGTACT
    TGTTTCAGATCTACAATAAGGATTTCTCAAAAAAATCGACCGGCAATGATAACTTGC
    ACACCATGTACCTGAAGAACCTTTTTTCGGAGGAAAACCTTAAAGACATTGTCCTGA
    AGTTGAATGGAGAAGCGGAGATTTTCTTTCGTAAGTCTTCCATTAAAAATCCAATTA
    TTCATAAGAAGGGCAGCATCCTTGTGAACCGTACGTACGAGGCGGAAGAGAAGGAC
    CAATTCGGTAACATTCAAATCGTCCGCAAGAACATCCCTGAAAATATTTATCAGGAG
    CTTTACAAGTATTTCAATGATAAGTCCGACAAGGAATTATCAGATGAGGCTGCGAAG
    TTGAAAAATGTTGTTGGTCATCACGAGGCGGCGACGAATATTGTAAAGGATTATCGC
    TACACTTATGACAAGTACTTTCTGCACATGCCGATCACCATTAATTTCAAGGCGAAC
    AAAACAGGATTTATTAATGACCGCATCTTACAATACATTGCCAAAGAAAAGGACTT
    ACACGTTATTGGCATTGATCGTGGAGAACGCAACTTAATCTACGTAAGCGTTATTGA
    CACTTGCGGGAATATCGTAGAACAAAAGAGCTTCAACATCGTGAATGGTTACGATT
    ACCAGATCAAGCTTAAGCAGCAGGAGGGAGCGCGCCAGATCGCGCGCAAGGAATG
    GAAGGAGATTGGTAAGATCAAGGAAATCAAGGAAGGTTATCTGTCCTTGGTAATCC
    ACGAAATTTCGAAAATGGTTATCAAATACAATGCTATTATTGCAATGGAGGACTTGT
    CCTACGGCTTTAAAAAAGGACGCTTTAAGGTGGAGCGCCAGGTTTATCAAAAGTTTG
    AAACAATGCTGATTAACAAGCTGAACTATTTGGTCTTTAAAGATATCTCCATCACCG
    AAAATGGTGGGCTTTTGAAAGGCTATCAACTTACATATATCCCTGATAAGCTTAAGA
    ATGTGGGTCATCAGTGCGGGTGCATTTTTTATGTTCCTGCAGCCTACACGTCCAAAA
    TCGATCCTACAACTGGATTTGTTAATATCTTCAAATTTAAGGATCTTACCGTCGACGC
    GAAGCGCGAATTTATCAAGAAATTCGATAGTATTCGTTATGATTCCGAAAAAAACCT
    TTTCTGTTTCACCTTTGATTATAATAACTTTATCACGCAAAATACTGTCATGAGCAAA
    TCGAGTTGGTCTGTGTACACTTACGGAGTACGCATCAAGCGTCGTTTTGTTAATGGG
    CGCTTCAGTAACGAGTCAGACACGATTGATATCACAAAAGATATGGAGAAAACGCT
    GGAGATGACAGACATCAATTGGCGCGATGGTCATGACTTACGTCAAGACATTATCG
    ATTATGAAATTGTCCAGCATATCTTTGAGATCTTTCGTTTGACTGTTCAGATGCGCAA
    CAGCCTGTCAGAATTGGAGGATCGTGACTATGATCGCCTTATTTCTCCCGTCTTAAAT
    GAGAACAATATCTTCTACGACTCAGCCAAGGCTGGAGATGCACTGCCAAAAGACGC
    CGACGCAAATGGGGCCTACTGTATTGCATTGAAGGGGTTGTACGAGATCAAACAGA
    TTACAGAAAATTGGAAGGAGGACGGTAAGTTCTCTCGTGATAAGCTGAAGATTTCTA
    ACAAAGACTGGTTCGATTTCATTCAGAACAAACGTTACCTG
  • SEQ ID NO: 25
    ATGAACAACGGTACCAATAACTTTCAGAATTTCATTGGAATCAGCAGCTTACAGAAA
    ACCCTGCGCAATGCACTTATCCCCACTGAGACAACCCAGCAGTTCATTGTAAAGAAC
    GGGATTATTAAAGAAGATGAGCTTCGCGGGGAGAATCGTCAGATCTTAAAGGATAT
    TATGGACGATTACTACCGTGGCTTCATTTCGGAGACGCTGTCGTCGATCGACGACAT
    CGACTGGACATCCTTGTTTGAAAAGATGGAAATCCAACTGAAGAATGGCGATAACA
    AGGACACGTTAATCAAAGAGCAGACGGAATACCGTAAAGCTATCCACAAAAAGTTC
    GCTAATGACGACCGCTTTAAGAACATGTTCTCAGCAAAACTTATTAGCGATATTTTA
    CCTGAATTTGTCATCCACAATAACAATTACTCCGCGAGTGAAAAAGAGGAGAAAAC
    CCAGGTGATTAAGCTGTTTTCCCGTTTTGCAACCAGTTTCAAGGACTATTTTAAGAAT
    CGTGCTAATTGTTTCTCTGCAGACGACATTTCCTCGTCGTCCTGCCATCGCATTGTTA
    ATGATAATGCTGAAATCTTTTTTTCAAACGCACTTGTGTATCGTCGCATTGTCAAAAG
    CTTAAGTAATGACGATATCAATAAGATCTCAGGAGACATGAAGGACTCCCTGAAAG
    AAATGTCATTGGAAGAAATTTACTCTTATGAAAAGTATGGAGAATTTATTACGCAGG
    AGGGTATCAGCTTCTATAACGACATTTGTGGTAAAGTGAACAGCTTTATGAATCTTT
    ATTGTCAAAAGAATAAAGAGAACAAAAATCTGTACAAGCTGCAGAAATTGCATAAA
    CAAATTCTGTGCATTGCAGATACTTCGTATGAGGTTCCTTACAAATTCGAGTCGGAT
    GAGGAGGTGTATCAAAGCGTAAACGGATTTTTGGATAACATTAGTAGTAAGCATATT
    GTGGAACGCCTTCGCAAGATTGGTGACAACTATAACGGATACAACTTAGACAAGAT
    CTATATTGTCTCGAAGTTTTACGAAAGTGTTTCCCAAAAGACTTATCGCGACTGGGA
    GACAATCAACACTGCGCTGGAAATTCACTATAACAATATCTTGCCGGGGAACGGAA
    AAAGTAAGGCAGATAAGGTGAAGAAAGCAGTCAAAAATGATCTGCAAAAAAGCAT
    TACTGAAATTAACGAACTTGTGTCAAATTACAAATTGTGTTCGGATGACAATATTAA
    AGCGGAAACGTATATCCACGAGATCTCGCACATTCTTAATAATTTCGAGGCGCAGGA
    ATTAAAGTATAATCCTGAGATCCATTTGGTGGAATCAGAACTTAAAGCTAGTGAACT
    GAAAAATGTCCTGGACGTTATTATGAATGCATTTCACTGGTGTTCTGTCTTTATGACA
    GAAGAACTTGTCGACAAAGACAACAACTTTTATGCGGAATTAGAAGAGATTTACGA
    CGAAATTTATCCCGTTATTTCGTTATATAATTTAGTTCGTAATTACGTGACTCAGAAA
    CCCTACAGCACAAAAAAGATTAAATTAAACTTTGGGATTCCGACTCTTGCTGATGGA
    TGGAGCAAGTCCAAGGAGTACTCTAATAACGCCATTATCTTGATGCGTGACAACCTG
    TACTACCTGGGCATTTTTAACGCTAAAAACAAACCCGACAAAAAGATCATTGAAGG
    GAACACCTCGGAAAATAAGGGGGACTATAAAAAAATGATCTACAATCTGTTGCCAG
    GCCCAAATAAGATGATCCCAAAGGTTTTTTTATCTTCCAAAACTGGCGTAGAAACTT
    ACAAGCCGAGCGCATACATCCTTGAAGGATATAAACAAAACAAACATATCAAAAGT
    TCAAAGGACTTCGATATTACGTTCTGCCATGATTTAATCGATTATTTCAAGAATTGCA
    TCGCGATTCACCCAGAGTGGAAAAACTTTGGGTTTGATTTTTCAGACACCAGCACTT
    ACGAGGATATTAGTGGATTCTATCGTGAGGTTGAACTGCAGGGCTATAAAATTGACT
    GGACCTATATTTCTGAAAAAGATATTGATCTGCTTCAGGAGAAAGGCCAATTGTACT
    TATTTCAAATCTATAACAAGGATTTCTCCAAGAAGTCCACGGGTAATGACAACTTAC
    ACACAATGTATCTGAAGAATCTGTTTAGTGAGGAGAACTTGAAGGACATTGTGCTGA
    AGCTTAATGGCGAGGCCGAAATCTTTTTTCGTAAGTCCTCCATTAAAAACCCTATTA
    TCCATAAGAAAGGGAGTATTCTTGTCAACCGCACGTATGAGGCCGAAGAAAAGGAC
    CAATTCGGAAACATCCAAATTGTCCGTAAAAATATTCCTGAGAACATTTACCAGGAG
    CTTTACAAGTATTTCAACGACAAGAGTGATAAAGAACTTTCAGATGAGGCGGCGAA
    ACTGAAGAATGTAGTGGGGCACCACGAAGCTGCCACGAATATTGTAAAGGATTACC
    GTTACACCTACGACAAGTACTTTTTGCATATGCCCATCACAATTAATTTTAAGGCCA
    ATAAAACTGGTTTTATCAACGATCGTATCTTACAGTACATTGCTAAGGAAAAAGATC
    TGCACGTTATCGGTATCGATCGCGGGGAACGCAATCTGATTTATGTTAGTGTGATTG
    ACACGTGCGGAAATATTGTTGAGCAGAAGAGCTTTAATATCGTAAATGGATATGACT
    ATCAAATTAAACTGAAGCAACAGGAAGGGGCCCGCCAGATTGCCCGCAAGGAGTGG
    AAAGAAATTGGAAAGATCAAGGAGATTAAAGAAGGGTACCTTTCCCTTGTTATCCA
    CGAAATCTCGAAAATGGTGATCAAGTACAATGCCATTATTGCTATGGAGGATCTGTC
    ATATGGGTTTAAGAAAGGCCGCTTTAAGGTGGAACGTCAGGTTTACCAGAAGTTTGA
    GACCATGCTTATCAATAAGCTGAATTATCTTGTCTTCAAAGACATCTCAATCACAGA
    GAACGGCGGGCTGTTAAAAGGATATCAGCTGACCTATATCCCCGACAAACTGAAAA
    ATGTCGGGCACCAATGCGGCTGTATTTTCTACGTGCCCGCTGCATACACATCTAAAA
    TTGACCCAACGACTGGATTCGTAAATATTTTTAAGTTTAAGGATCTTACGGTAGATG
    CAAAGCGCGAATTTATCAAGAAATTTGATAGTATCCGTTACGACAGCGAGAAAAAC
    TTATTTTGTTTTACGTTCGATTATAACAACTTCATCACGCAAAATACCGTCATGTCAA
    AATCTTCCTGGTCAGTCTATACGTATGGCGTCCGTATCAAGCGCCGCTTCGTCAACG
    GGCGTTTTTCAAACGAGTCAGATACCATCGATATCACCAAAGATATGGAAAAAACA
    TTGGAGATGACGGACATCAATTGGCGCGATGGTCATGACTTACGCCAGGACATTATT
    GACTACGAAATCGTACAACATATTTTTGAGATTTTCCGTCTGACCGTGCAAATGCGC
    AACTCATTATCCGAACTTGAGGATCGTGATTACGACCGCTTGATCAGTCCTGTTCTG
    AACGAGAATAATATTTTTTACGACAGTGCCAAGGCGGGAGACGCACTGCCCAAGGA
    CGCTGACGCTAACGGAGCTTATTGTATTGCGTTGAAGGGACTTTACGAAATCAAGCA
    AATCACTGAAAACTGGAAGGAGGATGGTAAATTCTCACGCGACAAGTTGAAAATTT
    CGAACAAGGACTGGTTCGATTTCATCCAAAACAAGCGTTATTTA
  • SEQ ID NO: 26
    ATGAACAACGGGACTAATAACTTCCAGAACTTCATCGGTATTTCATCATTACAAAAA
    ACGCTTCGTAACGCCTTGATCCCAACAGAAACGACCCAACAATTTATTGTAAAAAAC
    GGCATCATCAAAGAAGACGAACTGCGTGGCGAAAATCGCCAAATTTTGAAGGACAT
    TATGGATGACTATTATCGTGGGTTTATCTCGGAGACATTATCCTCCATCGACGACATT
    GATTGGACGAGTCTTTTTGAGAAAATGGAGATCCAGCTTAAAAATGGTGATAACAA
    GGATACATTGATCAAGGAGCAAACCGAGTACCGCAAGGCCATCCATAAGAAGTTCG
    CAAATGACGACCGCTTCAAAAATATGTTTAGTGCCAAATTGATCTCGGATATCCTTC
    CTGAGTTCGTAATTCACAACAATAATTATAGCGCATCCGAAAAGGAGGAAAAGACT
    CAAGTCATTAAGCTTTTCAGTCGCTTTGCTACCTCGTTTAAGGACTATTTCAAGAACC
    GCGCGAACTGCTTCTCAGCGGATGACATTTCTTCCTCGTCGTGTCACCGCATCGTGA
    ATGATAATGCGGAGATCTTCTTTAGTAATGCCTTGGTATACCGCCGCATTGTTAAAT
    CCCTGTCTAACGACGATATCAATAAGATCTCAGGAGATATGAAGGATAGCCTTAAA
    GAAATGTCTCTGGAAGAAATTTACTCCTATGAAAAGTACGGTGAGTTTATCACCCAA
    GAGGGGATTAGCTTTTATAACGATATCTGCGGGAAGGTGAATTCGTTTATGAACCTT
    TATTGTCAAAAGAATAAGGAGAATAAGAACTTATATAAGCTTCAGAAACTGCATAA
    ACAAATCTTATGCATTGCCGATACTAGCTATGAAGTTCCGTATAAATTCGAGAGCGA
    TGAAGAAGTTTATCAGAGCGTCAATGGGTTCTTGGATAACATTTCATCAAAACACAT
    CGTGGAACGTCTGCGTAAGATTGGGGATAACTACAACGGATATAATCTTGACAAAA
    TTTATATTGTATCTAAATTCTATGAGTCGGTGAGTCAAAAGACCTACCGTGATTGGG
    AAACAATCAATACCGCGTTAGAAATCCACTATAACAACATTCTGCCAGGGAATGGT
    AAAAGTAAAGCGGACAAAGTCAAGAAGGCTGTGAAGAACGATCTGCAAAAGAGTA
    TTACAGAGATTAACGAATTAGTCTCCAATTATAAGTTATGCTCGGACGATAACATTA
    AGGCGGAGACGTATATTCATGAGATTTCGCATATTCTTAACAACTTCGAGGCACAAG
    AGCTTAAGTATAACCCAGAGATTCACCTTGTCGAATCGGAGCTGAAGGCATCGGAA
    TTAAAAAATGTCTTAGATGTAATCATGAACGCGTTCCATTGGTGCAGTGTTTTCATG
    ACTGAGGAGTTAGTTGACAAGGACAATAACTTCTACGCAGAATTAGAAGAGATCTA
    TGATGAGATTTATCCAGTGATTTCGCTGTATAATCTGGTACGTAATTACGTCACTCAA
    AAGCCCTACTCAACAAAAAAAATTAAGCTGAACTTCGGAATTCCGACTCTGGCCGA
    CGGGTGGTCCAAGTCAAAGGAGTATTCTAATAATGCTATCATCCTGATGCGCGATAA
    CTTATACTATTTGGGAATTTTCAATGCCAAAAATAAACCAGATAAAAAGATTATCGA
    AGGTAATACAAGCGAGAATAAGGGTGACTATAAGAAAATGATTTACAATCTTCTTC
    CAGGCCCTAACAAGATGATTCCCAAAGTTTTTTTGTCCAGTAAAACAGGGGTCGAAA
    CTTACAAGCCCAGTGCCTATATCCTTGAAGGGTACAAGCAGAATAAGCACATCAAA
    TCCTCGAAAGACTTTGATATTACATTTTGTCATGACTTAATCGATTATTTTAAGAACT
    GTATCGCAATCCATCCAGAATGGAAGAACTTCGGGTTTGATTTCTCTGATACTTCCA
    CGTATGAGGATATTTCCGGGTTCTACCGCGAAGTAGAGCTTCAGGGCTATAAAATTG
    ACTGGACATATATTTCAGAAAAAGACATCGATCTGTTACAAGAAAAAGGACAGTTG
    TATCTGTTTCAAATCTATAATAAGGATTTCTCCAAAAAGTCAACTGGAAATGATAAC
    TTACATACAATGTATCTGAAAAATCTTTTTAGTGAAGAGAATTTGAAGGATATCGTG
    CTGAAGTTAAATGGCGAAGCAGAGATCTTCTTCCGCAAGTCCTCGATCAAGAATCCT
    ATCATCCACAAGAAAGGTAGTATTCTGGTTAACCGCACGTACGAGGCCGAGGAAAA
    AGACCAGTTCGGTAATATCCAGATTGTACGTAAGAATATTCCTGAAAATATTTACCA
    GGAATTATACAAGTATTTTAACGACAAATCGGATAAGGAGCTTTCAGATGAGGCCG
    CAAAGTTGAAGAACGTCGTAGGACACCATGAGGCCGCTACGAATATCGTCAAGGAC
    TACCGCTATACGTATGACAAGTACTTCCTGCACATGCCTATTACTATCAATTTCAAA
    GCTAATAAAACAGGATTCATCAATGATCGTATCCTTCAGTACATTGCCAAAGAAAAA
    GATCTGCACGTAATCGGAATCGACCGTGGCGAACGTAATCTGATTTACGTATCAGTT
    ATCGACACATGTGGTAACATCGTGGAGCAGAAATCTTTTAACATTGTTAACGGCTAT
    GATTATCAGATTAAGCTTAAACAGCAGGAGGGGGCACGCCAAATCGCTCGTAAAGA
    ATGGAAGGAGATTGGAAAGATTAAAGAGATTAAAGAGGGGTACCTTTCGCTGGTTA
    TTCACGAAATTTCCAAGATGGTGATTAAGTACAATGCAATCATCGCGATGGAAGATC
    TTAGTTACGGATTCAAAAAGGGACGCTTCAAAGTTGAGCGTCAGGTCTACCAGAAA
    TTTGAAACGATGCTGATTAACAAATTGAATTACTTGGTATTCAAAGATATCTCAATT
    ACTGAAAATGGTGGCTTATTAAAGGGTTACCAGCTTACCTATATCCCGGATAAGCTG
    AAGAACGTGGGCCATCAATGCGGCTGCATCTTTTACGTCCCTGCCGCATATACCTCT
    AAAATTGACCCCACCACCGGATTCGTAAATATTTTTAAATTCAAGGACCTGACGGTG
    GACGCCAAGCGCGAATTCATCAAAAAATTCGACTCAATCCGCTATGATTCCGAAAA
    AAATCTTTTCTGCTTTACGTTCGATTATAATAACTTCATTACCCAAAACACGGTGATG
    TCAAAATCGTCCTGGAGCGTGTATACTTATGGAGTGCGTATCAAGCGCCGCTTTGTT
    AATGGGCGCTTCAGTAACGAAAGCGATACCATCGACATTACCAAAGACATGGAGAA
    GACGCTTGAAATGACGGATATCAATTGGCGTGACGGACACGATCTTCGTCAGGATAT
    CATCGACTACGAGATTGTGCAACATATCTTTGAGATTTTCCGTTTAACTGTTCAAATG
    CGTAACTCCTTGTCCGAATTGGAAGACCGTGATTACGACCGCTTGATTTCACCAGTG
    CTTAACGAGAATAACATCTTCTACGACTCCGCCAAAGCAGGCGATGCCCTGCCAAA
    GGACGCTGATGCAAATGGTGCATACTGTATCGCGTTGAAGGGCTTATACGAGATTAA
    GCAAATCACCGAAAATTGGAAAGAGGATGGAAAGTTCAGTCGCGATAAGCTGAAGA
    TCTCTAATAAAGATTGGTTTGACTTTATCCAGAACAAACGTTATTTA
  • SEQ ID NO: 27
    ATGAACAACGGTACCAATAATTTCCAAAATTTCATCGGAATCTCATCCTTGCAAAAA
    ACCTTGCGCAATGCTTTGATCCCCACCGAAACCACGCAGCAGTTCATCGTGAAAAAC
    GGCATTATCAAAGAGGATGAGTTGCGCGGGGAAAACCGTCAAATTCTTAAGGATAT
    CATGGACGATTACTACCGTGGGTTTATCAGTGAGACCCTGTCAAGCATTGACGACAT
    TGACTGGACCAGCTTATTTGAGAAGATGGAGATTCAATTAAAGAACGGGGACAATA
    AGGACACGCTTATCAAAGAGCAGACAGAATACCGTAAAGCGATTCATAAGAAATTT
    GCAAATGACGATCGCTTCAAGAACATGTTTTCAGCAAAATTAATCAGCGACATCCTT
    CCCGAATTTGTGATTCATAATAACAACTATTCGGCTAGCGAAAAAGAGGAGAAAAC
    TCAGGTTATTAAGCTTTTCTCGCGTTTTGCCACTTCGTTCAAAGACTATTTTAAGAAT
    CGCGCAAACTGCTTTTCGGCTGATGATATTTCCAGTTCTAGCTGCCATCGTATCGTTA
    ACGATAATGCTGAGATTTTCTTCTCTAATGCCCTGGTGTATCGTCGTATCGTTAAATC
    TTTGAGCAACGACGATATTAATAAGATTTCAGGCGACATGAAGGATTCTTTAAAGGA
    GATGTCTTTAGAAGAGATTTATTCCTATGAGAAATATGGCGAGTTTATCACCCAAGA
    AGGAATTTCGTTCTACAACGACATCTGTGGCAAAGTGAACAGCTTCATGAATTTATA
    CTGCCAAAAGAATAAGGAGAATAAAAATTTATATAAACTGCAGAAACTGCATAAGC
    AAATTCTTTGCATTGCAGACACCTCTTATGAAGTTCCTTATAAGTTTGAATCGGACG
    AGGAGGTATATCAGAGTGTGAACGGGTTCCTGGACAATATTTCATCCAAGCATATTG
    TTGAACGTTTACGCAAAATTGGAGACAATTACAATGGGTATAACCTTGACAAAATTT
    ACATCGTGTCGAAGTTTTACGAATCGGTAAGCCAGAAGACCTATCGTGACTGGGAA
    ACTATCAATACCGCCTTAGAAATTCATTACAACAATATTCTTCCTGGTAACGGCAAA
    AGCAAAGCCGATAAGGTAAAGAAGGCTGTCAAGAACGACCTGCAAAAGTCTATCAC
    AGAGATCAACGAGTTAGTCTCTAACTACAAATTATGTTCCGACGACAATATTAAAGC
    CGAAACCTACATCCATGAGATCTCACACATTCTTAACAATTTTGAGGCCCAGGAGCT
    GAAATATAACCCAGAAATTCACCTTGTAGAGAGCGAATTAAAAGCCTCCGAGCTGA
    AGAACGTTTTGGATGTAATCATGAACGCATTTCATTGGTGCAGCGTATTTATGACAG
    AGGAGTTGGTCGACAAGGACAATAACTTTTACGCCGAGCTTGAAGAAATCTACGAT
    GAAATTTACCCGGTAATTAGTTTATATAATTTAGTTCGCAACTACGTAACTCAGAAA
    CCCTACAGTACCAAGAAGATTAAATTGAACTTTGGGATCCCGACACTTGCTGACGGT
    TGGAGTAAATCAAAAGAATACTCCAATAATGCAATTATCCTGATGCGCGACAATCTT
    TACTACTTGGGGATCTTTAACGCAAAGAACAAACCAGATAAGAAAATCATCGAGGG
    CAACACCAGCGAGAATAAAGGCGATTACAAGAAAATGATCTATAATCTTTTGCCGG
    GACCGAACAAAATGATCCCAAAGGTTTTCCTGTCGTCGAAAACGGGAGTCGAGACA
    TATAAACCATCTGCGTACATCTTGGAAGGTTACAAACAGAATAAGCATATTAAGTCT
    AGTAAAGACTTCGACATCACCTTTTGTCATGACCTGATTGATTATTTCAAGAACTGT
    ATTGCTATCCATCCAGAATGGAAAAACTTCGGATTTGACTTCTCCGATACTAGCACC
    TACGAAGACATTTCGGGTTTTTATCGCGAAGTAGAGCTTCAAGGGTACAAAATTGAT
    TGGACATATATTAGCGAGAAAGACATTGATTTGCTTCAAGAGAAGGGACAGTTATA
    TTTATTCCAGATCTACAACAAAGACTTCTCGAAGAAATCCACCGGTAATGATAATCT
    TCACACTATGTACCTGAAGAATTTATTTTCAGAGGAAAATCTGAAGGACATTGTACT
    TAAACTTAATGGAGAAGCCGAAATCTTCTTCCGCAAGAGTTCCATTAAAAATCCGAT
    TATTCATAAAAAGGGAAGTATCCTTGTGAACCGCACGTATGAGGCCGAAGAGAAGG
    ATCAGTTTGGGAATATTCAAATTGTCCGCAAAAACATCCCCGAGAACATCTACCAGG
    AACTGTATAAATACTTTAATGATAAATCTGATAAAGAGTTATCAGACGAGGCTGCCA
    AACTGAAAAACGTAGTCGGTCATCATGAGGCAGCGACCAATATTGTAAAGGACTAC
    CGTTACACCTACGACAAGTATTTCCTTCACATGCCGATCACGATTAATTTTAAGGCT
    AACAAGACCGGCTTTATCAATGACCGCATCTTGCAGTACATCGCGAAAGAGAAAGA
    TTTACACGTCATCGGAATTGATCGTGGAGAGCGTAATCTTATCTACGTCAGCGTCAT
    CGACACCTGTGGAAACATTGTGGAACAAAAAAGTTTTAATATCGTAAACGGCTACG
    ACTATCAAATTAAACTTAAACAGCAAGAGGGAGCTCGCCAGATCGCTCGCAAAGAG
    TGGAAAGAGATTGGGAAAATTAAAGAAATTAAAGAGGGTTACCTGTCGCTGGTAAT
    TCACGAAATCTCGAAAATGGTCATCAAATATAATGCAATTATCGCTATGGAGGATCT
    GTCCTACGGGTTCAAGAAGGGACGTTTTAAAGTAGAGCGCCAGGTGTATCAAAAAT
    TCGAAACCATGTTGATCAATAAGCTTAACTATTTGGTCTTCAAAGATATTTCGATTAC
    GGAGAACGGAGGTTTGTTGAAAGGATATCAGCTGACGTATATCCCAGACAAGTTGA
    AAAACGTGGGGCATCAATGTGGATGTATTTTCTATGTGCCCGCGGCCTACACGAGTA
    AGATCGATCCTACCACTGGTTTCGTCAACATTTTCAAATTTAAAGATCTTACCGTGG
    ATGCGAAGCGCGAATTTATTAAGAAATTTGATAGCATTCGCTATGATTCCGAAAAGA
    ACCTGTTCTGTTTTACGTTCGACTATAACAATTTCATTACCCAAAACACGGTGATGA
    GCAAATCCTCTTGGTCAGTTTATACATACGGTGTACGTATCAAACGCCGTTTCGTTA
    ACGGACGCTTTTCCAATGAGTCTGATACAATCGATATCACGAAAGATATGGAAAAA
    ACATTAGAGATGACTGATATCAACTGGCGTGACGGGCACGACCTGCGTCAAGACAT
    TATTGACTACGAGATTGTGCAGCATATCTTCGAAATCTTTCGCTTAACTGTGCAAAT
    GCGTAACTCGTTATCCGAGTTAGAAGACCGTGACTACGATCGCCTGATTTCACCCGT
    CTTGAACGAAAATAACATCTTCTACGATTCCGCGAAGGCTGGGGACGCATTGCCCAA
    GGACGCAGACGCGAATGGAGCGTACTGTATTGCGCTTAAAGGATTATATGAAATCA
    AGCAGATCACCGAAAATTGGAAGGAGGACGGGAAGTTCTCACGCGACAAACTGAA
    GATTTCAAATAAGGACTGGTTCGATTTCATTCAGAATAAGCGTTACCTG
  • SEQ ID NO: 28
    TGAATAATGGTACGAACAACTTTCAGAACTTCATCGGCATCTCCAGCCTTCAAAAGA
    CTTTACGCAACGCATTGATTCCCACGGAGACTACGCAACAGTTTATCGTAAAAAATG
    GTATTATCAAAGAAGATGAATTACGCGGGGAGAATCGCCAGATTCTTAAGGACATT
    ATGGACGATTATTACCGTGGATTCATCAGTGAGACACTGAGCTCCATTGATGACATC
    GACTGGACGTCATTGTTTGAAAAGATGGAAATCCAGTTGAAAAATGGCGATAACAA
    AGATACATTGATTAAAGAGCAGACAGAGTACCGCAAAGCAATTCACAAGAAATTCG
    CCAATGATGATCGTTTTAAGAACATGTTTAGTGCCAAGCTTATTTCGGATATCTTACC
    CGAATTCGTGATTCACAACAACAATTATTCGGCAAGTGAGAAAGAGGAAAAGACCC
    AGGTTATCAAATTGTTTTCGCGCTTCGCCACTTCGTTCAAAGATTATTTCAAGAACCG
    TGCAAACTGTTTCTCCGCTGACGACATCAGTTCCAGCTCATGCCACCGTATTGTAAA
    TGACAATGCGGAGATCTTTTTCAGTAATGCCTTAGTATATCGTCGCATTGTAAAGAG
    CTTATCTAATGATGACATTAACAAGATCTCGGGTGATATGAAGGACTCACTTAAGGA
    GATGAGTCTGGAAGAGATCTACTCCTACGAAAAATACGGGGAATTCATCACCCAGG
    AGGGAATTTCATTCTACAACGATATCTGCGGCAAAGTTAACTCCTTTATGAATCTGT
    ACTGTCAAAAGAACAAGGAGAATAAAAACCTGTATAAATTGCAGAAACTTCATAAA
    CAAATTTTGTGTATCGCAGACACGAGTTATGAAGTACCTTATAAATTCGAATCCGAC
    GAAGAGGTATATCAGTCCGTAAATGGGTTCCTGGACAATATCAGTAGTAAGCACATT
    GTGGAACGCTTACGCAAAATTGGAGACAATTACAACGGGTATAACCTGGACAAAAT
    CTACATCGTATCCAAATTTTATGAAAGCGTGTCTCAAAAAACTTATCGTGATTGGGA
    AACAATCAACACGGCTCTTGAGATCCATTACAATAACATCTTGCCGGGTAACGGCAA
    ATCGAAGGCAGACAAAGTTAAAAAAGCAGTTAAGAACGACTTACAGAAAAGCATTA
    CGGAGATTAACGAGTTAGTAAGTAATTACAAATTATGCTCCGACGATAATATCAAA
    GCTGAAACCTACATCCATGAAATTAGCCACATTTTGAACAATTTCGAAGCGCAGGAG
    CTGAAATATAACCCTGAAATCCATCTGGTAGAGTCTGAGTTGAAGGCGTCAGAACTG
    AAAAACGTTCTTGACGTCATCATGAATGCCTTTCACTGGTGTAGTGTTTTTATGACTG
    AGGAGCTTGTAGATAAGGACAACAACTTCTATGCTGAACTTGAAGAGATCTACGAT
    GAAATCTACCCCGTAATCAGTCTGTATAATTTAGTTCGTAACTACGTCACGCAGAAA
    CCCTATTCGACTAAGAAAATTAAGCTGAACTTTGGGATCCCTACTTTGGCAGACGGG
    TGGAGCAAGAGTAAAGAATACAGTAATAATGCAATTATCTTGATGCGCGATAACTT
    ATATTACTTAGGTATTTTCAATGCTAAGAACAAACCTGATAAGAAGATTATCGAAGG
    AAATACGAGTGAGAATAAGGGAGACTACAAAAAGATGATTTACAACTTGCTGCCAG
    GGCCTAATAAGATGATTCCAAAAGTTTTTCTGTCGAGCAAGACAGGGGTTGAAACTT
    ATAAGCCATCCGCTTATATCCTTGAGGGGTACAAGCAGAATAAGCATATCAAGTCCT
    CCAAAGATTTTGATATTACATTTTGCCACGACTTAATTGATTACTTCAAGAACTGCAT
    CGCAATCCATCCCGAATGGAAGAATTTCGGCTTCGATTTCTCAGATACGTCCACGTA
    TGAGGATATCTCAGGCTTTTACCGCGAAGTTGAGCTGCAAGGTTATAAAATTGATTG
    GACATACATCTCCGAAAAAGACATTGATCTTTTACAGGAAAAGGGCCAATTATACTT
    ATTTCAAATCTATAACAAAGATTTTAGCAAGAAGTCCACAGGTAATGATAACCTGCA
    TACGATGTATTTGAAAAATCTTTTCAGTGAAGAGAATTTGAAGGATATCGTCCTGAA
    GCTGAACGGTGAGGCTGAGATCTTCTTCCGCAAATCGTCTATCAAAAACCCCATCAT
    TCACAAAAAGGGAAGTATCTTAGTAAACCGCACTTATGAAGCGGAGGAAAAGGATC
    AGTTCGGGAACATCCAGATCGTGCGCAAGAACATTCCAGAAAACATCTATCAGGAA
    CTTTACAAATATTTCAATGACAAGTCTGATAAAGAATTATCAGACGAGGCGGCGAA
    ACTTAAAAATGTTGTTGGACACCACGAAGCAGCGACGAATATTGTAAAGGATTATC
    GCTACACATACGATAAATACTTTTTGCACATGCCAATCACCATTAACTTTAAGGCGA
    ACAAGACAGGTTTCATTAACGACCGTATTCTGCAATATATCGCAAAGGAAAAAGAC
    CTGCACGTTATTGGGATCGATCGTGGCGAACGCAATTTGATCTACGTAAGCGTTATC
    GACACTTGCGGAAATATCGTTGAACAAAAAAGCTTTAATATCGTCAATGGATACGAT
    TACCAAATCAAGCTGAAACAACAAGAAGGGGCACGTCAGATCGCTCGTAAAGAATG
    GAAAGAGATTGGTAAGATCAAAGAGATTAAAGAAGGGTATCTTTCTTTAGTAATTC
    ACGAGATTTCGAAAATGGTTATTAAATACAATGCGATTATTGCTATGGAAGACTTAA
    GCTACGGCTTTAAGAAAGGTCGCTTCAAAGTGGAGCGCCAAGTGTATCAGAAGTTT
    GAAACGATGTTGATTAACAAATTAAATTACCTGGTCTTTAAGGACATCAGTATCACA
    GAAAATGGGGGGTTGCTTAAAGGGTACCAGCTTACATACATCCCTGATAAACTGAA
    AAATGTCGGTCATCAGTGCGGATGTATCTTCTATGTACCAGCAGCCTATACCAGTAA
    GATTGACCCTACTACTGGCTTTGTGAATATTTTTAAATTCAAGGATTTAACCGTGGAC
    GCCAAGCGTGAATTTATTAAAAAATTTGATTCGATTCGCTACGACAGTGAGAAAAAC
    CTTTTCTGCTTTACCTTTGACTACAACAATTTTATTACCCAGAACACCGTAATGTCAA
    AGAGTTCGTGGTCTGTATATACCTACGGTGTTCGCATCAAGCGCCGCTTCGTAAACG
    GGCGTTTCAGTAACGAATCTGACACCATCGACATCACTAAAGATATGGAGAAGACA
    TTGGAAATGACGGACATTAATTGGCGTGATGGCCATGACTTACGTCAGGACATTATT
    GATTACGAAATTGTGCAGCATATCTTCGAGATTTTCCGTTTGACAGTTCAGATGCGC
    AACTCACTGAGTGAGTTAGAAGATCGCGATTACGACCGTCTGATCTCACCGGTCCTT
    AATGAAAACAACATTTTCTACGACTCAGCAAAGGCGGGTGATGCCCTGCCAAAGGA
    TGCGGACGCTAATGGCGCCTACTGCATCGCCCTGAAAGGATTGTATGAAATTAAGCA
    GATTACAGAAAATTGGAAGGAAGATGGTAAATTTAGCCGTGATAAATTAAAAATCT
    CGAACAAGGATTGGTTCGATTTTATTCAGAACAAACGTTATTTG
  • SEQ ID NO: 29
    ATGAACAATGGAACAAATAATTTTCAAAATTTTATCGGCATCTCAAGTCTTCAAAAA
    ACCCTTCGCAATGCCCTGATTCCAACTGAAACAACCCAGCAATTTATCGTCAAGAAC
    GGCATCATTAAGGAAGACGAGTTACGCGGGGAGAACCGTCAAATCCTGAAAGATAT
    CATGGATGACTACTATCGTGGGTTCATTTCGGAAACCTTGTCTTCAATCGACGACAT
    TGACTGGACGAGTCTTTTCGAGAAAATGGAAATTCAGCTTAAAAATGGAGACAACA
    AGGATACTCTGATTAAGGAACAGACAGAATATCGCAAAGCTATCCACAAAAAGTTC
    GCTAATGATGATCGTTTCAAAAATATGTTTTCTGCTAAATTGATTTCCGATATCTTGC
    CTGAATTTGTAATCCACAACAACAATTATTCTGCTTCCGAGAAGGAAGAGAAGACCC
    AGGTCATTAAATTATTCAGCCGCTTTGCAACCAGCTTTAAAGACTACTTTAAGAATC
    GCGCTAACTGCTTTTCGGCGGATGACATCTCATCATCATCATGCCACCGCATTGTGA
    ACGACAATGCGGAGATCTTCTTTTCGAATGCGTTAGTTTATCGTCGCATTGTCAAAA
    GTCTTAGCAATGATGACATCAACAAGATCTCAGGAGACATGAAAGATTCCTTAAAG
    GAGATGTCTCTTGAGGAAATCTATTCGTATGAGAAATACGGCGAGTTCATTACCCAG
    GAAGGTATTAGTTTCTACAATGATATCTGCGGCAAAGTAAATTCTTTTATGAATCTG
    TATTGCCAAAAAAACAAAGAAAACAAGAATCTTTATAAGTTACAAAAGTTACATAA
    GCAAATTCTGTGCATCGCTGATACATCTTATGAGGTACCCTACAAATTTGAAAGTGA
    TGAGGAGGTCTATCAGAGTGTCAACGGCTTCTTAGACAACATCTCTTCCAAACATAT
    CGTGGAACGCCTGCGTAAAATCGGAGATAACTACAACGGATATAACTTAGATAAAA
    TCTACATCGTGTCCAAGTTTTATGAAAGTGTGAGCCAAAAAACATATCGTGACTGGG
    AAACCATTAACACCGCATTGGAAATTCACTATAACAACATTTTGCCAGGCAACGGG
    AAAAGTAAGGCGGACAAAGTTAAGAAAGCAGTTAAAAATGACCTGCAAAAAAGCA
    TCACTGAAATTAACGAATTGGTATCGAATTACAAATTATGTAGCGACGATAATATCA
    AAGCAGAAACTTACATTCACGAGATTAGTCACATTTTAAATAACTTCGAGGCCCAGG
    AATTGAAATACAATCCCGAAATTCATTTGGTTGAATCAGAACTGAAAGCATCAGAGT
    TGAAAAATGTGTTAGATGTCATTATGAATGCGTTTCATTGGTGCTCTGTGTTCATGAC
    CGAGGAACTGGTTGATAAAGATAACAACTTTTACGCTGAATTGGAGGAGATTTACG
    ATGAGATTTACCCGGTCATTTCGCTTTATAACTTAGTGCGCAATTATGTGACGCAGA
    AACCATATTCCACGAAGAAAATCAAACTTAATTTTGGCATCCCTACTCTGGCTGATG
    GTTGGTCGAAATCGAAAGAGTACAGCAACAACGCGATCATTCTTATGCGTGACAAT
    CTTTACTATTTGGGCATTTTTAATGCCAAGAATAAGCCAGATAAGAAAATCATTGAG
    GGGAATACTTCCGAGAATAAGGGGGATTACAAAAAGATGATCTATAACTTGCTGCC
    CGGCCCCAACAAAATGATTCCTAAGGTTTTCTTGTCAAGCAAGACGGGCGTCGAAAC
    ATATAAGCCGTCAGCTTATATTCTGGAAGGCTATAAACAGAATAAGCACATCAAGTC
    TTCCAAGGACTTTGACATCACTTTTTGCCACGATTTGATCGACTACTTTAAGAACTGT
    ATTGCGATTCATCCGGAATGGAAGAACTTCGGTTTCGACTTTTCCGATACCTCAACA
    TACGAGGATATCAGCGGCTTCTACCGTGAAGTCGAGCTTCAAGGCTACAAGATCGAT
    TGGACATATATTTCAGAGAAGGACATTGATTTGTTACAAGAGAAAGGTCAACTTTAC
    TTATTTCAGATCTATAACAAAGACTTTTCGAAGAAATCGACAGGAAACGATAACTTA
    CACACTATGTATTTAAAAAATCTGTTTTCGGAGGAAAACCTGAAAGATATTGTGCTG
    AAACTTAACGGCGAGGCAGAGATCTTTTTCCGTAAAAGCTCAATCAAGAATCCTATC
    ATCCATAAAAAAGGTAGTATTCTTGTCAACCGCACATATGAAGCGGAGGAGAAGGA
    CCAATTCGGAAACATCCAAATTGTCCGTAAGAATATTCCGGAGAACATTTACCAAGA
    GTTGTATAAATACTTTAACGATAAGTCAGATAAGGAACTTAGCGATGAGGCGGCGA
    AGCTTAAAAACGTAGTTGGGCATCATGAAGCTGCTACCAACATTGTAAAAGATTACC
    GTTACACCTATGACAAGTATTTCTTGCACATGCCCATTACGATCAATTTCAAAGCAA
    ATAAGACAGGCTTTATCAATGATCGCATCCTGCAGTACATTGCTAAAGAGAAGGATT
    TGCATGTTATCGGTATTGATCGCGGAGAGCGCAATTTGATCTACGTCTCCGTAATCG
    ACACTTGCGGTAACATTGTTGAGCAGAAGTCGTTCAACATCGTTAATGGTTATGATT
    ACCAAATCAAGCTGAAGCAGCAAGAGGGTGCCCGCCAGATCGCGCGTAAGGAATGG
    AAAGAAATCGGGAAAATTAAAGAGATCAAAGAAGGCTATTTGTCTCTGGTAATTCA
    CGAAATCAGCAAGATGGTGATCAAGTATAACGCGATCATTGCGATGGAGGATCTTT
    CTTATGGCTTCAAGAAAGGGCGCTTTAAAGTCGAACGCCAGGTCTACCAGAAATTTG
    AGACAATGCTTATCAACAAGCTTAACTATCTTGTATTTAAGGATATTTCCATCACTG
    AGAACGGAGGACTTTTAAAGGGGTACCAACTGACGTACATTCCTGATAAGCTGAAG
    AACGTTGGTCATCAATGCGGATGCATCTTCTATGTGCCAGCGGCTTACACCTCCAAA
    ATCGATCCCACTACAGGCTTTGTCAATATCTTCAAATTCAAGGATTTGACCGTTGAC
    GCGAAGCGCGAGTTTATCAAGAAGTTTGATAGCATTCGCTACGACAGCGAAAAAAA
    TTTATTTTGTTTTACTTTCGACTACAATAACTTTATTACTCAGAACACTGTCATGTCA
    AAGAGTTCGTGGAGTGTCTACACGTACGGAGTACGTATTAAGCGCCGTTTCGTCAAC
    GGACGCTTCTCAAACGAAAGCGACACGATCGACATCACCAAAGACATGGAAAAAAC
    TCTTGAGATGACGGATATCAATTGGCGCGACGGCCATGACCTGCGTCAGGATATCAT
    TGATTACGAGATCGTTCAGCACATCTTCGAAATCTTCCGCCTTACCGTCCAGATGCG
    CAACAGTTTAAGCGAGCTTGAAGACCGCGACTACGATCGTTTGATTAGCCCCGTTCT
    GAACGAGAATAATATTTTCTACGACAGCGCAAAGGCCGGTGATGCTTTGCCAAAGG
    ACGCAGACGCGAATGGAGCCTACTGCATCGCCCTGAAGGGCTTATATGAGATTAAG
    CAAATTACCGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGATAAGCTTAAAAT
    TAGCAATAAGGATTGGTTCGACTTCATCCAGAACAAACGTTACCTG
    SEQ ID NO: 30
    ATGAACAACGGAACAAACAATTTCCAAAACTTCATCGGTATCTCTTCGTTGCAGAAG
    ACTCTGCGTAATGCTTTGATCCCGACGGAGACAACCCAACAATTTATCGTCAAAAAC
    GGTATTATTAAGGAGGACGAGTTACGTGGAGAAAATCGTCAAATCCTTAAGGACAT
    CATGGACGATTATTATCGCGGGTTTATTTCTGAAACCCTGAGCAGTATCGATGATAT
    CGACTGGACCTCACTTTTTGAGAAAATGGAGATCCAGTTGAAGAACGGTGATAACA
    AAGACACTCTGATCAAAGAGCAAACTGAATACCGCAAGGCAATTCACAAAAAGTTC
    GCCAACGACGACCGTTTCAAGAATATGTTCTCAGCTAAGTTAATCAGCGACATTTTG
    CCAGAGTTCGTTATCCACAACAATAATTATAGTGCTTCAGAGAAGGAGGAAAAAAC
    CCAAGTGATTAAACTTTTTTCGCGCTTTGCAACCTCATTCAAGGACTACTTCAAGAAT
    CGCGCGAATTGCTTCAGTGCGGACGACATTTCTTCTTCAAGTTGCCATCGTATCGTTA
    ACGATAACGCGGAAATTTTCTTCTCTAATGCTTTGGTGTATCGCCGCATTGTAAAATC
    GCTTAGTAACGATGACATTAATAAGATCTCAGGTGATATGAAAGATTCATTGAAGG
    AAATGAGCTTGGAAGAGATTTACAGTTACGAAAAATATGGAGAATTTATTACTCAG
    GAAGGCATCTCATTCTATAACGATATCTGCGGGAAGGTAAATTCGTTTATGAACTTA
    TATTGCCAGAAAAATAAAGAGAATAAAAATTTGTATAAGCTTCAGAAGTTGCACAA
    ACAGATCCTGTGCATTGCAGACACCTCGTATGAGGTTCCGTATAAATTTGAGTCCGA
    TGAAGAAGTGTATCAGTCTGTGAATGGTTTCTTAGATAATATCTCTTCCAAGCATATT
    GTCGAACGCCTGCGCAAAATTGGTGATAACTATAACGGATACAATCTGGATAAAAT
    TTACATCGTTTCTAAATTTTACGAGTCAGTCTCGCAGAAGACCTACCGCGACTGGGA
    AACAATTAACACGGCATTGGAGATTCACTACAATAATATCTTGCCTGGTAACGGTAA
    GTCTAAGGCAGATAAGGTAAAAAAAGCTGTGAAAAACGACCTTCAGAAAAGCATCA
    CGGAGATTAATGAGCTGGTGAGTAATTACAAATTATGTTCAGACGATAATATTAAAG
    CTGAAACGTATATCCATGAAATCTCGCATATCTTGAACAACTTCGAGGCCCAAGAAC
    TTAAATATAACCCCGAAATCCATTTAGTCGAGTCTGAATTGAAAGCGTCGGAATTAA
    AAAACGTCTTAGACGTCATTATGAACGCGTTTCACTGGTGTTCAGTTTTCATGACCG
    AAGAGCTGGTCGACAAAGACAACAACTTCTATGCGGAATTGGAGGAAATCTATGAT
    GAAATCTACCCTGTTATTTCACTGTATAACCTTGTGCGCAACTATGTCACTCAGAAG
    CCGTATTCGACCAAAAAAATTAAATTGAATTTCGGTATCCCTACTCTTGCAGACGGA
    TGGAGTAAAAGCAAGGAATACAGTAATAACGCCATTATTCTTATGCGCGACAATTTA
    TACTACCTGGGCATCTTTAACGCAAAGAATAAGCCGGATAAGAAGATTATTGAGGG
    TAACACCAGTGAGAACAAGGGCGACTATAAGAAGATGATCTATAACTTATTGCCAG
    GTCCAAATAAAATGATCCCAAAAGTATTCTTATCATCAAAGACGGGAGTTGAAACCT
    ATAAGCCTAGTGCCTATATTCTTGAGGGATATAAACAGAACAAGCACATTAAGTCGT
    CTAAGGATTTTGACATTACGTTCTGCCATGACTTAATCGACTATTTTAAAAACTGTAT
    TGCGATTCACCCCGAATGGAAGAATTTTGGATTCGATTTTTCGGATACCTCGACCTA
    TGAAGATATTTCGGGATTTTATCGTGAAGTGGAGTTGCAAGGCTATAAAATCGATTG
    GACCTATATCTCAGAAAAAGACATTGATTTATTACAGGAAAAGGGACAACTGTACC
    TTTTCCAAATTTATAACAAGGACTTTTCTAAAAAGTCCACAGGAAATGATAACCTTC
    ACACCATGTACCTGAAGAACCTTTTCTCAGAGGAAAACCTGAAGGACATTGTCCTTA
    AGTTAAATGGAGAAGCGGAGATCTTTTTCCGTAAATCTAGTATCAAGAATCCGATTA
    TCCATAAAAAAGGTTCGATTTTGGTAAATCGCACCTATGAAGCGGAAGAGAAAGAT
    CAATTTGGTAACATCCAGATCGTGCGCAAGAATATCCCGGAGAACATTTACCAAGA
    GCTGTATAAGTACTTCAATGATAAGTCTGATAAGGAACTGTCAGATGAAGCTGCGA
    AATTGAAGAACGTGGTTGGGCATCATGAAGCCGCTACCAATATCGTCAAGGATTAC
    CGTTATACCTATGACAAATATTTCTTACACATGCCGATTACGATCAATTTTAAGGCA
    AACAAGACAGGATTCATCAACGACCGTATCTTGCAGTATATTGCCAAAGAGAAGGA
    TCTGCATGTGATCGGTATTGACCGCGGGGAGCGCAATTTAATCTATGTATCGGTGAT
    CGATACTTGTGGTAACATCGTAGAACAAAAGAGCTTTAACATCGTGAATGGTTACGA
    CTATCAGATCAAGCTGAAACAACAGGAAGGAGCCCGCCAGATCGCTCGCAAGGAAT
    GGAAAGAAATCGGGAAAATTAAGGAAATCAAGGAAGGCTACCTTTCATTGGTCATT
    CACGAAATTTCGAAAATGGTAATTAAGTACAACGCGATCATCGCCATGGAGGACCT
    TTCGTACGGATTTAAGAAGGGTCGTTTCAAAGTTGAGCGCCAGGTATACCAAAAATT
    CGAGACTATGCTTATCAACAAACTTAACTACTTGGTCTTTAAGGACATTTCTATTACC
    GAAAACGGCGGCTTACTTAAAGGCTATCAATTGACATATATTCCCGACAAACTGAA
    GAATGTTGGACATCAATGCGGGTGTATTTTCTATGTGCCGGCAGCTTACACTAGTAA
    GATCGACCCTACAACCGGGTTCGTAAACATTTTTAAATTCAAAGACTTAACAGTCGA
    TGCGAAGCGTGAATTTATTAAGAAGTTTGATAGTATCCGCTATGACAGTGAAAAGA
    ACTTGTTTTGCTTTACGTTCGACTACAATAACTTTATTACACAGAACACGGTCATGTC
    TAAATCATCATGGTCGGTTTACACATATGGGGTGCGCATCAAGCGTCGCTTTGTAAA
    TGGCCGTTTTAGTAATGAGAGCGACACAATCGACATCACAAAGGATATGGAGAAAA
    CTCTTGAGATGACAGACATCAATTGGCGTGACGGTCATGACTTACGCCAAGATATCA
    TCGACTACGAAATCGTACAGCATATTTTTGAGATTTTTCGTCTTACTGTGCAAATGCG
    TAATTCTTTATCCGAACTGGAAGATCGTGATTACGACCGCTTGATTAGTCCCGTCTTA
    AATGAGAACAATATTTTCTATGATTCTGCGAAAGCCGGAGATGCACTGCCCAAAGA
    CGCTGATGCCAATGGCGCGTATTGCATTGCATTAAAAGGATTATATGAGATTAAACA
    GATTACCGAAAATTGGAAAGAGGACGGTAAATTCTCACGCGATAAATTGAAGATTT
    CTAACAAGGACTGGTTCGACTTTATCCAAAATAAACGTTATCTT
    SEQ ID NO: 31
    ATGAATAACGGTACCAACAACTTTCAGAATTTCATTGGCATTAGCTCGCTTCAAAAA
    ACTTTACGCAATGCTCTTATTCCGACTGAGACGACACAACAGTTTATCGTTAAGAAT
    GGCATCATCAAAGAAGATGAATTACGCGGAGAAAACCGCCAGATCCTGAAAGACAT
    TATGGACGATTATTACCGTGGGTTCATCTCCGAGACGTTGTCATCGATCGATGACAT
    CGACTGGACGTCACTTTTTGAAAAAATGGAGATCCAGTTAAAGAACGGTGACAATA
    AGGATACATTGATCAAAGAACAGACCGAGTACCGTAAAGCGATTCATAAAAAGTTT
    GCGAACGATGATCGCTTCAAGAATATGTTTTCTGCGAAATTAATTTCCGACATTTTA
    CCTGAATTTGTTATTCATAATAACAACTACTCGGCGTCTGAGAAAGAGGAGAAAACC
    CAAGTGATTAAACTTTTTTCACGTTTCGCAACGTCGTTCAAAGACTATTTTAAAAATC
    GTGCTAATTGCTTTAGCGCGGATGACATCAGCTCTAGTTCATGTCATCGCATTGTCA
    ACGATAATGCTGAGATCTTTTTCAGTAATGCGTTAGTGTACCGTCGTATTGTGAAGT
    CCTTATCTAATGATGATATCAATAAGATCAGCGGGGATATGAAGGACTCACTTAAGG
    AGATGAGCTTGGAGGAAATCTATTCCTATGAGAAGTATGGTGAGTTTATTACGCAAG
    AAGGAATTAGCTTTTACAACGATATCTGTGGAAAGGTGAATTCGTTTATGAATTTGT
    ATTGCCAGAAAAATAAGGAGAACAAGAACCTTTATAAATTGCAAAAGTTACACAAG
    CAAATCCTGTGCATTGCAGATACTTCCTACGAGGTGCCTTACAAGTTTGAATCCGAC
    GAAGAGGTCTACCAATCTGTAAACGGTTTCTTAGATAATATTAGTTCCAAGCATATT
    GTGGAGCGCCTTCGTAAAATTGGCGATAATTACAACGGTTACAATTTAGACAAAATT
    TACATTGTCAGTAAATTCTACGAGTCCGTATCTCAAAAGACGTATCGTGATTGGGAG
    ACTATCAATACGGCCCTGGAGATCCACTACAACAATATCTTGCCCGGTAATGGTAAG
    TCGAAGGCCGATAAAGTTAAGAAAGCGGTGAAAAATGACTTACAGAAGTCAATCAC
    CGAAATTAACGAATTGGTGTCCAATTATAAATTGTGTTCAGATGATAATATCAAAGC
    CGAGACCTACATTCATGAGATTTCCCATATCTTAAATAATTTCGAGGCGCAAGAGCT
    TAAGTATAACCCAGAAATCCACCTGGTAGAATCTGAGTTGAAGGCGTCAGAGTTAA
    AAAATGTTTTAGATGTCATTATGAACGCGTTTCACTGGTGCTCCGTATTTATGACGG
    AGGAATTAGTAGATAAAGACAACAATTTCTATGCCGAACTTGAGGAAATCTATGAT
    GAGATCTATCCCGTCATTAGCCTGTATAACTTGGTCCGCAACTATGTTACCCAAAAA
    CCGTACAGTACCAAGAAGATTAAGCTGAATTTCGGCATTCCTACACTGGCTGATGGT
    TGGAGTAAATCGAAGGAATATTCGAATAACGCGATTATCTTGATGCGCGACAACTTA
    TACTATTTGGGGATCTTTAACGCCAAAAACAAACCGGATAAGAAGATTATTGAGGG
    AAACACATCAGAGAACAAAGGCGACTACAAAAAAATGATTTACAACTTGTTACCGG
    GGCCTAACAAAATGATCCCGAAGGTGTTCTTATCCAGTAAAACAGGCGTTGAGACCT
    ACAAACCTTCCGCATACATCCTGGAAGGGTATAAGCAGAACAAGCACATTAAGTCC
    AGCAAGGATTTCGATATTACCTTCTGTCATGATTTAATTGACTATTTCAAGAACTGTA
    TTGCAATCCACCCCGAGTGGAAGAACTTCGGATTCGACTTCTCAGATACGAGCACAT
    ATGAGGACATCTCGGGGTTCTATCGTGAAGTAGAACTGCAGGGATATAAAATTGATT
    GGACATATATTTCCGAAAAAGACATCGACCTTTTACAAGAGAAGGGTCAACTTTACT
    TGTTCCAAATTTACAATAAAGACTTCTCAAAAAAAAGCACGGGTAACGATAATTTAC
    ACACTATGTATTTAAAGAACCTTTTCTCGGAAGAGAATTTAAAGGATATCGTATTGA
    AGTTGAATGGAGAAGCGGAGATCTTCTTCCGTAAGTCCAGTATTAAAAACCCTATTA
    TTCACAAGAAGGGATCGATTTTAGTTAACCGCACATACGAGGCCGAAGAGAAGGAC
    CAATTTGGGAACATTCAAATTGTCCGCAAAAACATCCCTGAGAACATTTATCAAGAG
    CTTTATAAGTACTTTAACGATAAGTCCGATAAGGAATTGTCAGATGAGGCGGCAAA
    GTTGAAGAATGTCGTGGGGCATCATGAAGCTGCCACCAACATTGTGAAGGACTACC
    GCTACACTTACGACAAATACTTCCTGCACATGCCCATTACGATCAATTTTAAGGCCA
    ATAAGACAGGCTTTATTAACGACCGTATTCTTCAATATATCGCTAAGGAGAAGGACC
    TTCATGTGATTGGGATCGACCGCGGAGAACGTAATTTAATTTATGTGTCCGTCATCG
    ATACGTGTGGAAATATCGTGGAACAGAAATCATTCAATATCGTGAATGGCTATGATT
    ACCAGATCAAATTAAAACAGCAGGAGGGCGCTCGCCAAATTGCGCGTAAGGAATGG
    AAAGAGATCGGAAAAATCAAAGAAATCAAAGAAGGATATTTGTCATTGGTGATCCA
    TGAGATTTCAAAAATGGTAATTAAATATAATGCAATTATCGCAATGGAAGACCTGTC
    CTATGGTTTTAAGAAGGGTCGTTTCAAGGTAGAACGCCAAGTGTATCAAAAGTTCGA
    GACGATGCTGATCAATAAGCTGAATTATCTTGTGTTTAAGGACATTAGCATCACGGA
    AAATGGAGGGCTGTTGAAAGGCTATCAACTGACGTATATCCCTGACAAGCTGAAAA
    ATGTTGGCCATCAGTGCGGGTGCATTTTCTACGTCCCCGCGGCGTATACAAGCAAGA
    TCGATCCTACTACGGGATTCGTAAATATTTTTAAATTCAAAGACTTAACCGTGGACG
    CCAAGCGCGAATTCATTAAGAAGTTTGATAGCATTCGCTACGATTCAGAAAAAAATC
    TTTTCTGTTTTACGTTCGATTACAACAATTTTATCACCCAGAACACAGTGATGAGCAA
    GTCATCCTGGTCTGTCTATACCTACGGTGTCCGTATCAAACGCCGCTTCGTCAACGG
    ACGCTTCTCTAATGAATCTGATACCATTGACATCACCAAGGACATGGAAAAGACACT
    TGAGATGACAGATATTAACTGGCGTGACGGACATGACCTGCGTCAGGACATCATCG
    ATTATGAGATTGTTCAGCATATCTTCGAGATCTTCCGCCTGACAGTACAAATGCGCA
    ATTCACTGTCAGAACTTGAAGACCGCGACTATGACCGCCTGATCTCTCCAGTATTAA
    ATGAGAACAATATCTTTTATGACAGTGCTAAGGCCGGCGATGCCCTTCCGAAAGATG
    CTGATGCTAACGGAGCTTATTGTATTGCATTAAAGGGTCTTTATGAGATCAAGCAAA
    TTACCGAGAATTGGAAGGAGGATGGCAAATTCTCGCGCGACAAACTGAAAATCAGT
    AACAAGGACTGGTTCGATTTTATTCAGAATAAACGTTACCTG
    SEQ ID NO: 32
    ATGAATAACGGAACGAACAACTTCCAGAACTTCATCGGCATCAGTTCTTTACAAAAA
    ACCCTGCGTAACGCCCTTATTCCGACTGAGACAACACAACAGTTCATCGTTAAAAAC
    GGAATTATCAAAGAGGACGAGTTGCGCGGCGAGAATCGCCAAATTTTGAAAGATAT
    TATGGACGACTATTATCGTGGTTTTATTTCAGAAACACTGAGTTCGATTGACGATAT
    CGATTGGACGAGCCTGTTTGAGAAAATGGAAATCCAGTTGAAAAATGGCGATAATA
    AAGACACTTTAATCAAAGAACAAACCGAGTATCGTAAAGCGATCCATAAAAAGTTC
    GCTAATGACGATCGTTTTAAGAATATGTTCAGTGCGAAACTGATTTCAGACATTTTG
    CCCGAGTTCGTGATCCATAATAACAACTATTCCGCCTCGGAAAAGGAAGAAAAAAC
    CCAGGTGATTAAGCTGTTCAGTCGCTTCGCAACATCTTTCAAGGATTATTTCAAGAA
    TCGCGCGAATTGCTTCAGTGCGGACGATATTTCTAGTTCAAGCTGCCATCGTATCGTT
    AATGATAACGCGGAGATTTTTTTTAGCAATGCTCTGGTGTACCGCCGCATTGTTAAG
    TCACTGTCCAACGATGATATTAACAAGATCTCAGGAGACATGAAAGACTCGCTTAA
    AGAGATGAGTCTGGAAGAGATCTATTCTTATGAGAAGTATGGCGAGTTTATTACCCA
    AGAAGGAATCTCATTCTACAATGATATTTGTGGAAAGGTGAACAGCTTTATGAATCT
    TTACTGCCAAAAAAACAAGGAGAATAAGAATCTTTACAAACTTCAGAAGTTACATA
    AACAGATTTTGTGTATTGCGGATACGTCTTATGAAGTCCCCTACAAATTTGAATCGG
    ATGAAGAGGTATACCAAAGTGTGAACGGATTCTTGGACAATATTTCTTCTAAACATA
    TTGTTGAACGCTTACGTAAGATCGGGGATAACTACAATGGCTACAATCTTGACAAAA
    TCTACATTGTTAGCAAATTCTACGAGAGTGTCAGCCAAAAGACGTACCGCGATTGGG
    AAACAATTAATACTGCGCTTGAGATTCACTATAATAACATTTTACCAGGCAACGGCA
    AGTCCAAGGCGGATAAAGTTAAAAAAGCTGTTAAAAACGATTTGCAAAAATCTATC
    ACAGAAATTAACGAGTTAGTTAGTAACTACAAACTGTGCTCCGATGACAACATTAA
    GGCTGAGACGTATATCCATGAGATCTCTCACATCTTAAACAATTTTGAAGCTCAAGA
    ACTTAAGTACAATCCGGAAATCCACCTGGTGGAATCCGAGCTGAAGGCTAGCGAAC
    TGAAGAACGTATTGGACGTGATCATGAACGCGTTCCACTGGTGTTCTGTCTTTATGA
    CGGAAGAGCTTGTCGACAAAGATAATAACTTTTACGCGGAACTTGAGGAAATTTAC
    GATGAGATTTACCCAGTTATTTCATTGTATAACCTTGTCCGTAATTACGTGACCCAAA
    AGCCTTATAGTACGAAAAAAATCAAATTAAATTTTGGAATCCCAACACTGGCTGACG
    GTTGGAGCAAATCTAAGGAGTATTCTAATAACGCAATCATCTTAATGCGTGACAACC
    TGTATTATTTGGGTATCTTCAATGCCAAAAATAAGCCTGACAAAAAGATTATCGAAG
    GAAATACTTCGGAGAATAAGGGGGATTACAAAAAAATGATTTACAATTTGCTGCCC
    GGGCCGAACAAGATGATCCCCAAAGTGTTCTTATCCTCGAAGACTGGTGTAGAAAC
    ATACAAGCCAAGCGCATACATTCTGGAGGGTTACAAGCAAAACAAACACATCAAAT
    CTTCAAAAGACTTTGACATTACATTTTGCCATGATCTTATTGACTACTTCAAAAACTG
    CATTGCTATTCACCCCGAGTGGAAGAACTTTGGGTTTGACTTCAGCGACACGTCTAC
    GTATGAGGACATCTCCGGGTTCTACCGTGAAGTTGAGTTACAAGGGTATAAGATTGA
    CTGGACGTATATTTCAGAGAAAGATATCGATCTTTTGCAGGAAAAGGGCCAGTTATA
    TTTATTCCAGATTTACAACAAGGACTTTAGTAAGAAGTCAACAGGAAATGACAACTT
    GCATACGATGTATTTGAAAAATCTTTTTTCTGAGGAAAATCTTAAGGACATCGTACT
    GAAATTGAATGGCGAGGCTGAAATCTTCTTCCGTAAATCCTCCATTAAGAATCCCAT
    TATCCACAAAAAGGGGTCTATCCTGGTGAATCGTACCTACGAGGCAGAGGAGAAGG
    ATCAATTCGGAAATATTCAGATTGTTCGTAAGAACATCCCCGAGAACATTTATCAAG
    AATTGTATAAGTACTTTAATGACAAATCTGACAAAGAGTTATCCGACGAAGCTGCGA
    AACTGAAAAACGTTGTTGGTCACCACGAGGCCGCCACTAATATCGTAAAAGACTAC
    CGTTATACCTATGACAAGTACTTTTTGCACATGCCGATCACTATCAACTTCAAGGCG
    AATAAGACGGGCTTCATTAACGATCGTATCCTGCAATACATCGCCAAGGAGAAGGA
    CCTTCACGTCATTGGGATTGACCGTGGTGAGCGTAACCTGATTTATGTAAGCGTCAT
    TGATACCTGCGGTAATATCGTCGAACAGAAAAGTTTCAACATTGTAAATGGATATGA
    CTATCAGATCAAACTTAAGCAGCAGGAGGGTGCACGCCAGATTGCCCGCAAGGAAT
    GGAAGGAGATTGGGAAGATTAAGGAAATTAAAGAAGGTTACTTATCACTGGTTATT
    CACGAGATCAGTAAAATGGTAATCAAATATAACGCGATCATTGCCATGGAGGATCT
    GAGCTATGGCTTTAAAAAGGGCCGTTTCAAAGTCGAGCGCCAGGTATATCAAAAGT
    TTGAAACAATGCTGATTAACAAATTAAACTATCTGGTTTTCAAAGATATTTCGATCA
    CTGAAAATGGCGGGCTGTTGAAGGGATACCAACTTACATACATCCCTGACAAACTG
    AAAAATGTCGGTCACCAATGTGGATGTATCTTTTATGTACCAGCAGCGTATACGAGC
    AAAATCGATCCAACTACGGGTTTTGTGAACATCTTTAAGTTCAAGGATTTGACAGTA
    GATGCCAAACGCGAGTTCATTAAAAAATTTGATTCAATTCGCTACGATTCAGAGAAA
    AATCTTTTTTGTTTCACGTTCGATTACAATAATTTCATTACGCAGAACACAGTAATGT
    CAAAGTCAAGCTGGTCGGTCTACACGTATGGAGTCCGTATTAAACGTCGTTTTGTAA
    ACGGCCGTTTCTCAAATGAATCAGATACAATTGATATTACGAAGGATATGGAGAAG
    ACATTAGAGATGACTGACATTAACTGGCGCGACGGACATGATCTTCGTCAGGACATT
    ATTGATTATGAGATTGTACAGCATATCTTTGAGATCTTCCGCCTGACCGTTCAGATGC
    GCAATTCGTTGTCCGAGTTAGAAGACCGCGATTACGACCGTTTAATCAGTCCCGTCT
    TAAACGAAAATAACATCTTCTACGATTCAGCCAAGGCAGGCGATGCCTTGCCAAAG
    GATGCTGACGCAAATGGCGCATACTGTATTGCGTTGAAAGGCCTTTATGAAATCAAG
    CAAATTACCGAAAACTGGAAAGAAGACGGAAAATTCTCCCGTGATAAGTTGAAAAT
    CTCTAATAAGGATTGGTTCGATTTCATCCAAAATAAACGCTATTTG
    SEQ ID NO: 33
    ATGAACAACGGAACTAATAATTTCCAAAATTTTATAGGCATCTCTTCTTTACAGAAG
    ACTCTTCGTAACGCCCTAATCCCGACTGAGACCACACAACAATTCATAGTGAAAAAT
    GGGATCATTAAAGAAGACGAGCTGCGTGGGGAGAACAGGCAGATCCTAAAAGACA
    TAATGGACGATTATTATAGAGGGTTCATCTCAGAGACATTATCTAGCATCGACGACA
    TTGACTGGACCTCCCTGTTTGAAAAAATGGAAATCCAGCTGAAGAATGGTGACAAT
    AAAGACACATTAATAAAAGAACAAACAGAGTACAGGAAAGCCATCCACAAGAAGT
    TCGCAAACGATGACAGATTCAAAAATATGTTCAGTGCGAAGCTAATATCCGACATCT
    TACCAGAGTTTGTAATACACAATAACAATTACAGCGCGAGCGAAAAGGAAGAGAAA
    ACGCAAGTAATTAAGCTTTTTAGTAGGTTCGCTACCTCTTTCAAAGATTACTTCAAA
    AATCGTGCTAACTGCTTCTCAGCCGACGACATATCTTCAAGTTCCTGTCACCGTATCG
    TGAATGATAACGCTGAGATATTCTTCTCAAACGCCCTTGTATACCGTAGGATCGTAA
    AGTCCTTATCTAACGATGATATAAACAAGATCAGTGGAGACATGAAAGACAGCCTT
    AAAGAGATGTCTCTAGAAGAAATTTACTCCTATGAAAAGTATGGGGAGTTTATAAC
    ACAGGAGGGGATCAGCTTCTACAACGACATCTGCGGAAAGGTGAACAGTTTCATGA
    ATCTTTACTGCCAGAAGAATAAAGAGAACAAAAATCTTTATAAGCTTCAAAAGTTGC
    ACAAACAAATACTGTGCATTGCCGATACATCATATGAGGTCCCCTATAAGTTCGAAT
    CTGATGAGGAAGTTTATCAATCTGTTAACGGCTTTCTAGACAATATCAGCTCAAAAC
    ACATCGTAGAAAGACTGAGGAAAATAGGTGATAATTATAATGGATACAACTTGGAT
    AAAATATATATAGTCTCTAAATTTTACGAGTCAGTATCCCAGAAAACGTATAGGGAT
    TGGGAGACCATCAACACGGCGTTAGAGATTCATTACAATAACATCTTACCGGGAAA
    CGGAAAAAGTAAGGCGGACAAAGTAAAGAAAGCCGTTAAAAATGACTTACAAAAG
    AGTATAACAGAAATAAACGAACTAGTAAGCAACTACAAGCTTTGTTCCGATGATAA
    TATCAAGGCCGAGACATATATCCATGAGATCTCCCACATTCTAAACAATTTCGAAGC
    GCAAGAACTTAAATATAATCCCGAAATCCACCTGGTGGAAAGTGAACTAAAGGCTA
    GTGAGTTAAAGAACGTTCTTGATGTTATCATGAACGCCTTCCATTGGTGCTCTGTTTT
    TATGACCGAGGAGTTGGTTGATAAAGATAATAATTTCTACGCTGAATTAGAGGAGAT
    ATACGACGAAATCTACCCAGTGATTTCACTATACAACTTGGTCAGGAACTATGTTAC
    ACAAAAGCCGTACAGCACTAAGAAAATTAAGCTAAATTTCGGTATCCCCACGTTAG
    CCGACGGGTGGAGCAAGTCCAAAGAATATTCCAACAATGCGATTATTTTAATGCGTG
    ACAATCTTTATTACCTTGGCATCTTCAATGCCAAAAACAAACCTGACAAAAAGATTA
    TAGAAGGTAATACGTCCGAGAACAAAGGCGATTACAAGAAGATGATTTATAACCTA
    CTGCCCGGACCAAACAAAATGATCCCCAAAGTTTTTCTTAGTTCTAAAACCGGCGTA
    GAGACGTATAAACCTTCTGCCTATATCTTAGAGGGATATAAGCAGAACAAACATATC
    AAATCTTCCAAGGACTTTGATATTACATTCTGCCACGATTTAATTGACTACTTCAAAA
    ATTGCATAGCGATACATCCGGAGTGGAAGAACTTTGGCTTCGACTTCAGTGATACAT
    CCACCTATGAGGATATATCAGGCTTCTATCGTGAGGTCGAATTGCAAGGGTACAAAA
    TCGATTGGACGTATATATCCGAGAAAGACATAGACCTTCTTCAAGAAAAGGGGCAG
    TTATATTTATTCCAAATATACAACAAGGACTTCAGTAAGAAGTCAACAGGTAATGAC
    AACTTACACACCATGTACTTGAAAAATTTATTTTCTGAAGAAAACCTAAAGGACATT
    GTACTAAAACTGAACGGGGAGGCAGAAATTTTTTTTAGAAAGAGCAGCATAAAAAA
    CCCAATAATTCATAAGAAAGGAAGCATTTTAGTTAATAGGACGTACGAGGCAGAGG
    AAAAGGACCAGTTTGGCAATATCCAGATCGTAAGGAAAAATATTCCTGAAAACATA
    TATCAGGAACTATATAAATACTTTAACGACAAATCCGACAAAGAATTATCCGACGA
    GGCTGCAAAGCTGAAGAACGTCGTAGGGCACCATGAGGCAGCGACTAATATTGTGA
    AAGACTATAGGTATACATACGACAAATACTTTCTGCACATGCCCATCACGATTAACT
    TCAAGGCGAACAAGACGGGATTCATTAACGACCGTATATTACAATATATTGCTAAG
    GAGAAAGATCTGCATGTAATAGGTATCGACAGAGGCGAACGTAATTTAATCTACGT
    GTCCGTCATCGACACGTGCGGGAACATCGTAGAGCAAAAGAGTTTTAATATAGTAA
    ATGGCTATGATTACCAAATTAAGCTAAAGCAGCAAGAAGGAGCAAGACAGATAGCT
    AGGAAAGAATGGAAGGAGATAGGAAAAATAAAGGAGATCAAGGAGGGGTATCTTA
    GCCTAGTAATTCATGAAATATCTAAGATGGTTATCAAATACAACGCTATCATAGCGA
    TGGAAGACTTATCTTATGGTTTCAAGAAAGGAAGGTTCAAAGTAGAGCGTCAAGTTT
    ATCAAAAGTTCGAAACGATGTTGATTAATAAACTAAACTATTTGGTATTTAAAGATA
    TATCTATCACCGAGAATGGTGGTCTACTAAAGGGTTACCAGCTTACATACATACCGG
    ACAAACTTAAAAACGTCGGACATCAGTGTGGATGCATTTTCTACGTTCCAGCTGCAT
    ATACCAGCAAGATCGACCCAACGACTGGGTTCGTAAATATTTTTAAATTCAAGGATT
    TGACTGTCGACGCCAAAAGAGAGTTCATAAAAAAGTTCGATTCAATTAGGTACGAC
    AGCGAAAAGAATTTGTTCTGCTTTACTTTTGACTATAACAATTTCATTACTCAGAACA
    CTGTAATGTCTAAGTCCTCTTGGTCAGTCTATACTTATGGCGTTCGTATCAAACGTAG
    ATTTGTTAACGGTAGATTCTCAAATGAAAGTGATACAATAGATATCACGAAAGATAT
    GGAGAAAACATTAGAAATGACAGACATAAACTGGAGAGACGGACATGACTTGAGA
    CAGGACATTATTGACTACGAGATCGTGCAGCACATCTTTGAGATCTTTCGTTTGACC
    GTACAAATGCGTAACAGTTTATCTGAGCTTGAGGACAGGGACTACGATAGATTGAT
    ATCACCTGTATTAAATGAGAATAACATCTTCTATGATTCCGCAAAAGCAGGCGACGC
    TCTACCCAAAGACGCTGATGCGAACGGTGCTTATTGCATAGCTTTAAAGGGTTTGTA
    TGAGATCAAACAGATAACAGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGACA
    AGCTTAAAATATCAAATAAGGACTGGTTCGATTTTATACAGAATAAGCGTTATTA
    SEQ ID NO: 34
    ATGAACAATGGAACTAATAACTTCCAGAATTTCATTGGTATCTCCTCTTTACAAAAA
    ACTCTAAGAAACGCCCTAATTCCGACTGAAACTACACAGCAATTCATCGTCAAAAAC
    GGGATCATTAAGGAGGATGAGTTGAGGGGTGAAAATCGTCAAATTCTTAAAGACAT
    CATGGACGACTACTACAGGGGGTTCATCAGCGAGACGTTATCTAGTATAGACGATAT
    AGACTGGACTTCACTGTTCGAGAAGATGGAAATCCAATTAAAAAATGGGGACAATA
    AAGATACACTTATAAAGGAACAGACAGAGTATAGAAAGGCAATACACAAAAAGTTT
    GCCAACGACGATCGTTTCAAGAACATGTTTAGTGCTAAATTGATTTCAGATATTCTG
    CCGGAATTTGTTATTCACAACAATAATTATAGCGCCAGTGAGAAAGAAGAAAAAAC
    GCAGGTTATCAAACTGTTCAGTCGTTTCGCTACATCTTTTAAGGATTACTTTAAAAAC
    CGTGCAAATTGTTTTTCAGCCGACGATATTAGTAGCAGCTCTTGTCACCGTATTGTTA
    ATGATAATGCGGAGATTTTCTTTTCAAACGCATTGGTCTACAGGAGGATAGTCAAGT
    CCCTTTCAAATGACGACATTAATAAGATCTCAGGTGACATGAAAGATTCCTTAAAGG
    AAATGTCCCTGGAAGAGATCTATTCCTATGAAAAGTACGGTGAGTTCATTACTCAAG
    AGGGTATAAGCTTTTACAATGACATATGTGGTAAGGTTAATAGCTTTATGAACCTGT
    ATTGCCAGAAGAACAAAGAAAATAAGAATCTGTATAAGTTGCAAAAGCTACACAAA
    CAAATTTTGTGCATTGCCGATACATCATACGAGGTGCCATACAAATTCGAGAGCGAT
    GAGGAGGTTTATCAGAGCGTGAATGGATTCCTGGACAATATTAGTAGTAAGCATATC
    GTGGAAAGGCTTAGAAAGATAGGTGACAATTACAATGGCTACAATCTGGATAAAAT
    CTACATCGTCTCAAAATTCTATGAAAGTGTATCCCAGAAGACGTACCGTGATTGGGA
    AACTATCAACACCGCTCTGGAGATACATTACAACAATATACTTCCCGGAAACGGCA
    AGTCAAAAGCCGACAAAGTCAAAAAAGCGGTCAAGAACGATTTACAAAAGTCTATC
    ACTGAAATTAATGAATTAGTTAGTAATTACAAACTGTGTAGTGATGATAATATTAAG
    GCAGAGACTTACATACACGAAATTTCACACATTTTAAACAACTTCGAGGCACAGGA
    ACTTAAATATAATCCTGAAATTCACCTGGTTGAAAGTGAATTGAAAGCCAGCGAGCT
    AAAGAACGTTTTGGACGTAATCATGAACGCATTCCACTGGTGCTCTGTCTTTATGAC
    AGAGGAACTAGTGGATAAGGACAATAATTTTTATGCGGAGCTGGAGGAAATATACG
    ATGAGATATATCCCGTAATATCATTATATAATCTGGTAAGAAACTATGTGACTCAAA
    AGCCGTATAGCACCAAGAAAATTAAACTTAATTTCGGCATACCCACTTTAGCGGACG
    GCTGGTCAAAATCCAAAGAGTATAGTAATAATGCCATCATCCTGATGCGTGACAACC
    TGTACTATTTAGGTATATTTAACGCCAAAAATAAACCCGACAAAAAGATTATAGAG
    GGCAACACCTCAGAGAACAAAGGTGATTATAAGAAGATGATTTACAACCTTTTACC
    CGGTCCTAATAAGATGATTCCCAAAGTCTTTCTATCTAGCAAAACTGGTGTTGAAAC
    ATACAAACCCTCAGCTTATATTTTAGAAGGGTATAAGCAGAATAAGCATATTAAAA
    GCTCCAAAGATTTCGATATTACCTTTTGCCATGACTTGATAGACTATTTCAAAAATTG
    TATTGCCATTCACCCTGAATGGAAAAACTTCGGATTTGACTTCTCTGACACATCCAC
    CTACGAAGACATTTCAGGTTTTTACAGGGAAGTCGAGCTACAGGGTTATAAAATTGA
    TTGGACATACATCAGCGAGAAAGATATTGACCTACTTCAAGAAAAAGGGCAGCTAT
    ACCTGTTCCAGATATACAATAAAGACTTCAGTAAAAAAAGCACCGGGAACGATAAT
    CTTCACACAATGTACTTAAAAAATTTATTTAGTGAAGAGAATCTGAAGGATATAGTG
    CTGAAGTTAAACGGGGAGGCAGAGATATTTTTTAGAAAATCTAGTATTAAGAATCC
    GATCATCCACAAGAAGGGTTCTATCCTTGTTAATAGGACTTATGAGGCAGAAGAAA
    AAGACCAATTCGGCAACATACAAATTGTCCGTAAAAATATCCCTGAGAACATTTATC
    AGGAACTATACAAGTACTTCAATGATAAAAGCGACAAGGAGCTGAGCGACGAGGCT
    GCTAAGTTAAAGAATGTGGTGGGCCACCATGAGGCAGCAACGAATATTGTGAAGGA
    CTATCGTTATACCTACGATAAATACTTTCTTCATATGCCGATCACCATTAATTTCAAG
    GCAAACAAAACTGGCTTCATTAACGATCGTATCTTACAATATATCGCAAAAGAGAA
    AGACCTTCACGTTATCGGGATCGATAGAGGCGAGCGTAACCTAATTTATGTTTCTGT
    GATAGACACCTGTGGGAACATAGTCGAACAGAAATCATTTAATATTGTTAACGGCTA
    CGATTATCAGATAAAGTTGAAGCAACAAGAGGGTGCACGTCAAATAGCAAGGAAAG
    AATGGAAAGAAATAGGCAAGATTAAAGAAATAAAAGAAGGTTATTTATCCCTTGTA
    ATACACGAAATTAGCAAAATGGTGATTAAATATAATGCGATCATTGCCATGGAGGA
    TCTTTCTTACGGCTTCAAAAAGGGGAGATTCAAAGTCGAGAGGCAGGTGTATCAGA
    AGTTTGAGACCATGCTAATCAATAAACTAAATTATCTAGTATTCAAAGACATAAGCA
    TCACCGAAAATGGCGGCTTGTTGAAGGGTTATCAATTGACCTACATCCCAGATAAAC
    TAAAAAACGTAGGGCATCAATGCGGATGTATATTTTACGTTCCAGCCGCATACACTT
    CCAAAATCGATCCAACTACGGGTTTTGTGAACATCTTCAAATTCAAAGACTTGACTG
    TCGATGCTAAGAGGGAGTTTATCAAGAAATTTGACTCCATTAGATACGACAGTGAG
    AAGAATCTGTTCTGTTTTACCTTTGATTATAACAACTTTATAACTCAAAACACAGTCA
    TGAGTAAGTCATCTTGGTCAGTGTATACGTATGGTGTGAGGATTAAAAGGAGGTTTG
    TTAACGGGAGATTTTCCAATGAAAGTGATACAATAGATATAACCAAGGACATGGAA
    AAGACTCTTGAAATGACCGACATTAACTGGAGAGATGGCCACGACTTACGTCAAGA
    TATAATCGATTACGAGATAGTGCAACATATCTTTGAGATATTTAGGCTTACTGTCCA
    AATGCGTAACTCATTAAGTGAGTTGGAGGACAGGGATTACGATAGGCTAATAAGTC
    CTGTTCTTAACGAAAACAATATATTCTACGATTCAGCAAAGGCGGGAGACGCCCTGC
    CCAAGGACGCGGATGCTAACGGCGCATACTGTATTGCCCTGAAAGGCTTGTACGAG
    ATAAAACAGATCACGGAGAACTGGAAAGAAGATGGAAAATTCAGTCGTGACAAGTT
    AAAAATTAGTAACAAAGACTGGTTCGACTTTATTCAGAACAAGAGATATCTG
    SEQ ID NO: 35
    ATGAACAACGGAACCAATAACTTTCAAAACTTTATAGGCATCTCCAGTCTACAGAAG
    ACACTACGTAACGCTTTGATACCAACTGAGACCACGCAGCAGTTTATCGTCAAGAAC
    GGTATTATAAAGGAAGACGAGCTAAGGGGGGAAAACCGTCAGATCTTAAAGGACAT
    CATGGATGACTACTACAGAGGCTTCATAAGTGAGACTTTGTCTAGTATAGACGACAT
    CGACTGGACCAGTTTATTTGAGAAGATGGAAATTCAGTTAAAGAACGGGGACAATA
    AAGACACACTAATTAAAGAGCAGACCGAATACAGAAAAGCTATACACAAAAAGTTT
    GCCAACGATGATAGATTCAAAAATATGTTTTCAGCAAAATTGATTTCCGACATATTG
    CCAGAATTCGTAATCCATAATAACAATTATTCTGCAAGTGAGAAGGAAGAGAAGAC
    CCAAGTAATCAAGCTGTTTTCCCGTTTTGCTACGAGTTTCAAAGATTATTTCAAGAAT
    AGGGCTAATTGTTTCTCCGCGGACGACATAAGTAGCAGTTCCTGTCACAGGATTGTG
    AACGATAATGCTGAGATATTTTTTTCCAATGCCCTAGTGTATAGGAGAATAGTTAAA
    AGCTTAAGCAACGACGATATCAATAAAATTTCAGGGGACATGAAGGACAGCTTAAA
    GGAAATGAGTTTGGAGGAGATTTACAGTTATGAAAAATACGGAGAGTTTATAACTC
    AGGAAGGCATCTCTTTCTATAATGATATCTGTGGGAAGGTAAACTCCTTCATGAATT
    TATATTGCCAGAAGAATAAGGAAAACAAAAATCTTTACAAGCTTCAAAAGTTACAT
    AAGCAGATCTTATGTATTGCCGACACGAGTTATGAAGTGCCTTATAAATTCGAGAGT
    GATGAGGAAGTGTATCAGTCTGTTAACGGATTCCTAGATAATATAAGTTCCAAACAT
    ATAGTCGAGAGGCTGAGGAAGATTGGCGATAACTATAATGGATATAATCTTGACAA
    AATCTATATAGTCTCTAAATTTTATGAAAGCGTCAGCCAGAAGACATATAGAGATTG
    GGAAACTATAAACACAGCCCTTGAAATACATTACAATAACATCCTACCCGGCAATG
    GTAAGTCTAAGGCAGACAAAGTTAAAAAAGCAGTAAAGAATGACTTACAGAAGTCA
    ATCACGGAGATAAATGAGTTGGTCAGTAACTACAAATTATGCTCCGACGATAATATT
    AAGGCCGAAACATATATACACGAGATAAGTCATATATTAAACAATTTCGAAGCCCA
    GGAGTTAAAATATAACCCTGAAATTCATCTGGTCGAAAGTGAGTTAAAGGCCAGTG
    AGTTAAAGAATGTACTTGACGTAATTATGAATGCTTTTCATTGGTGCTCCGTGTTCAT
    GACCGAGGAGTTAGTAGATAAAGACAATAACTTTTACGCCGAACTTGAAGAGATAT
    ACGACGAGATTTATCCGGTAATCAGCTTGTACAACTTAGTTAGAAATTATGTAACAC
    AGAAGCCTTACTCTACTAAAAAAATAAAACTGAACTTTGGTATCCCAACTCTTGCAG
    ATGGTTGGAGTAAAAGCAAGGAATATAGCAACAATGCGATCATCTTGATGAGAGAC
    AACTTGTACTATTTGGGAATCTTCAACGCGAAAAATAAACCCGACAAAAAAATCAT
    CGAAGGGAATACCTCTGAGAATAAAGGTGACTATAAGAAAATGATTTACAATCTAC
    TTCCTGGTCCTAATAAAATGATCCCGAAAGTGTTTCTTAGTTCTAAGACTGGTGTCG
    AGACGTACAAACCTAGCGCGTACATCTTAGAAGGGTACAAGCAGAATAAACACATC
    AAATCAAGCAAAGACTTCGATATTACTTTTTGCCATGACTTGATAGACTACTTTAAA
    AACTGCATAGCAATCCACCCGGAGTGGAAAAACTTTGGCTTTGATTTCTCTGACACC
    TCTACATATGAGGACATATCTGGTTTTTACCGTGAGGTTGAATTGCAGGGATACAAA
    ATTGACTGGACTTACATATCTGAAAAAGATATCGATCTATTGCAGGAGAAAGGCCA
    GCTTTACCTTTTCCAGATCTATAATAAGGACTTCTCTAAGAAGTCTACAGGGAATGA
    TAATTTGCACACTATGTACTTAAAAAATCTGTTTTCCGAGGAAAACTTGAAAGACAT
    TGTTTTAAAGTTGAACGGAGAAGCTGAAATATTTTTCAGAAAGAGCTCCATAAAAA
    ACCCGATCATTCATAAGAAGGGATCTATCCTGGTTAACAGAACGTACGAAGCGGAA
    GAAAAAGACCAATTCGGAAACATTCAAATTGTTAGAAAGAATATCCCTGAGAACAT
    CTACCAGGAGTTATATAAGTATTTTAATGATAAGTCAGATAAGGAACTATCTGACGA
    AGCGGCGAAGCTTAAAAATGTTGTAGGACACCATGAGGCTGCTACAAATATAGTCA
    AGGACTACCGTTATACCTACGATAAGTACTTTCTACACATGCCCATTACCATCAATTT
    TAAAGCTAATAAAACGGGTTTTATCAACGATCGTATCCTACAATATATTGCGAAAGA
    GAAGGATTTGCATGTCATTGGCATTGATAGAGGTGAGAGGAACCTAATATACGTATC
    CGTGATTGATACGTGCGGGAACATAGTTGAACAGAAATCATTTAATATAGTTAATGG
    GTACGACTATCAGATTAAGCTAAAGCAACAAGAAGGCGCCAGGCAAATTGCCCGTA
    AAGAATGGAAAGAGATCGGGAAGATCAAGGAAATAAAAGAAGGATACCTTTCCCT
    GGTCATCCATGAAATTAGCAAAATGGTGATTAAGTACAATGCCATAATCGCGATGG
    AGGACTTAAGCTACGGGTTCAAAAAGGGGAGGTTTAAGGTGGAGAGGCAAGTGTAC
    CAGAAATTTGAGACCATGCTAATCAACAAACTGAACTACCTAGTTTTTAAGGACATT
    TCAATTACAGAGAATGGAGGACTTTTAAAGGGTTACCAACTAACGTATATACCAGAT
    AAGTTGAAAAATGTCGGTCACCAGTGTGGCTGCATCTTTTACGTTCCCGCCGCTTAT
    ACATCTAAAATTGATCCAACCACAGGCTTTGTAAATATCTTTAAATTCAAAGATTTA
    ACTGTGGATGCAAAAAGAGAGTTTATCAAGAAATTCGATAGCATTCGTTATGATAGC
    GAGAAGAACCTGTTCTGCTTTACTTTCGACTATAACAACTTTATAACTCAAAACACC
    GTGATGTCAAAAAGCTCATGGTCAGTCTACACCTATGGTGTAAGGATTAAAAGGCGT
    TTCGTGAATGGGAGATTCTCCAATGAAAGTGACACGATCGACATAACAAAGGACAT
    GGAGAAGACACTAGAGATGACTGATATTAATTGGAGAGACGGACACGATCTGCGTC
    AAGATATAATTGATTATGAGATAGTACAGCACATATTTGAGATCTTCCGTTTGACTG
    TCCAAATGCGTAATTCCCTTTCTGAGCTGGAAGATAGGGACTATGATAGATTAATAT
    CCCCTGTACTAAATGAGAACAACATTTTCTATGATAGTGCAAAAGCCGGGGATGCAT
    TGCCGAAAGACGCTGACGCTAATGGGGCGTACTGTATAGCTTTAAAGGGGCTTTACG
    AAATAAAGCAGATAACCGAAAACTGGAAGGAAGATGGCAAATTCTCAAGGGACAA
    ACTTAAGATCTCTAACAAGGATTGGTTCGATTTTATACAAAACAAACGTTATTTG
    SEQ ID NO: 36
    ATGAATAATGGTACAAACAACTTTCAGAATTTCATTGGGATCTCTAGCTTACAGAAG
    ACCCTGAGGAATGCGTTGATTCCAACTGAAACAACCCAGCAATTCATCGTGAAAAA
    TGGGATAATCAAAGAGGATGAGTTAAGGGGTGAAAACCGTCAAATATTGAAGGATA
    TTATGGACGACTACTACCGTGGATTCATCTCAGAGACGTTGAGCAGCATTGACGACA
    TAGACTGGACTAGCCTTTTCGAGAAGATGGAAATTCAGTTAAAGAACGGAGATAAC
    AAAGATACACTAATCAAGGAACAGACAGAATACAGAAAAGCAATTCATAAGAAATT
    CGCTAATGACGATCGTTTTAAAAACATGTTCTCTGCAAAATTAATTAGCGACATTCT
    GCCGGAATTCGTTATACATAATAATAACTACAGTGCTTCTGAAAAGGAAGAGAAAA
    CTCAGGTAATAAAACTGTTCTCTCGTTTTGCCACATCCTTCAAAGACTACTTTAAAAA
    TAGAGCGAACTGCTTTAGCGCCGACGATATTAGTTCTTCCTCATGCCACAGGATTGT
    CAACGATAATGCAGAGATATTCTTTTCTAACGCACTAGTCTACAGAAGGATTGTAAA
    GTCTTTGTCAAATGATGACATAAACAAGATTAGTGGAGATATGAAAGACTCTCTAAA
    GGAAATGAGCCTTGAGGAGATATACTCTTATGAAAAGTACGGTGAGTTTATTACCCA
    AGAAGGCATTAGTTTCTATAATGACATTTGTGGAAAAGTTAACAGTTTTATGAATCT
    ATACTGTCAAAAAAATAAGGAGAATAAAAATCTTTATAAGTTGCAAAAACTGCATA
    AGCAGATATTATGTATAGCAGACACGAGCTATGAGGTACCGTACAAGTTCGAGAGC
    GATGAGGAAGTCTACCAATCTGTCAACGGATTTTTGGACAACATTTCTTCAAAACAT
    ATTGTGGAGAGGCTTAGGAAAATAGGCGACAATTATAATGGATATAACTTAGATAA
    GATATATATTGTTTCCAAATTCTACGAATCTGTAAGCCAGAAGACATACAGAGATTG
    GGAAACGATAAACACAGCCCTTGAAATTCACTATAACAACATACTACCTGGAAACG
    GCAAATCAAAGGCCGACAAAGTTAAGAAGGCCGTAAAGAATGATTTACAGAAGAG
    CATAACGGAGATCAATGAGCTGGTGTCTAACTATAAATTGTGTAGCGATGACAACAT
    AAAAGCCGAGACTTACATTCACGAAATTTCACACATACTTAACAACTTTGAAGCTCA
    GGAATTAAAGTATAATCCCGAAATACACCTTGTGGAGTCCGAACTAAAGGCTAGTG
    AGCTTAAGAACGTCCTAGACGTAATTATGAATGCCTTCCACTGGTGTAGTGTTTTTAT
    GACCGAGGAACTTGTTGACAAAGATAATAATTTTTATGCAGAACTAGAAGAGATAT
    ACGATGAAATATACCCGGTGATCAGTTTGTACAATCTTGTCAGGAACTATGTGACAC
    AAAAGCCCTATTCAACAAAGAAAATAAAACTTAATTTCGGAATTCCTACGTTAGCTG
    ATGGCTGGTCTAAATCCAAGGAATACAGCAACAACGCTATAATTCTGATGAGAGAT
    AACTTGTACTATCTAGGCATCTTCAATGCCAAAAATAAGCCTGATAAGAAGATTATA
    GAGGGCAACACTTCAGAGAACAAGGGCGACTACAAGAAAATGATCTATAACCTATT
    GCCTGGCCCAAACAAGATGATTCCGAAGGTCTTCCTATCATCCAAGACCGGCGTTGA
    GACATACAAGCCATCAGCGTATATTTTAGAGGGGTACAAACAAAACAAGCACATAA
    AGTCTAGTAAAGACTTCGATATAACATTTTGTCATGACTTAATTGACTACTTTAAGA
    ATTGCATCGCTATACACCCGGAATGGAAGAATTTCGGCTTCGACTTCTCTGATACAT
    CTACCTACGAGGACATTAGCGGGTTTTACCGTGAAGTCGAATTACAAGGGTATAAG
    ATAGATTGGACGTACATCTCTGAGAAAGACATAGACTTGCTTCAGGAAAAGGGCCA
    GTTGTATCTATTCCAAATATACAATAAGGATTTTTCCAAGAAATCTACGGGTAATGA
    CAATCTTCACACAATGTATCTTAAGAACCTTTTCTCAGAAGAGAACCTGAAGGACAT
    TGTCTTAAAACTAAATGGCGAAGCTGAGATTTTTTTCAGGAAGTCTTCAATTAAGAA
    CCCGATAATCCACAAGAAGGGGAGTATTCTTGTGAATAGAACTTACGAGGCCGAAG
    AAAAAGACCAATTTGGTAACATCCAGATAGTCAGAAAGAACATTCCAGAGAACATC
    TACCAAGAGCTATACAAATATTTCAACGACAAGTCCGATAAGGAACTGTCCGATGA
    GGCAGCCAAGTTGAAGAATGTCGTGGGTCATCATGAAGCTGCTACTAACATTGTCAA
    GGACTATCGTTATACTTACGACAAGTATTTCCTACACATGCCGATAACAATTAATTT
    CAAGGCTAACAAAACAGGCTTTATCAACGATCGTATCTTGCAGTACATAGCTAAGG
    AAAAGGATTTGCATGTGATTGGCATTGATAGAGGGGAGCGTAACTTGATATATGTGT
    CTGTCATAGACACGTGTGGCAACATCGTCGAACAGAAATCATTCAACATAGTAAAC
    GGCTACGATTACCAAATTAAGCTGAAACAGCAAGAGGGTGCACGTCAAATTGCGCG
    TAAAGAGTGGAAAGAAATTGGTAAAATCAAGGAAATTAAAGAAGGCTACTTGTCTC
    TTGTTATACATGAAATTTCCAAGATGGTTATAAAGTATAACGCGATAATTGCTATGG
    AAGACTTATCATACGGGTTTAAAAAGGGGAGGTTCAAGGTAGAGAGGCAGGTCTAT
    CAAAAGTTCGAGACGATGTTGATTAATAAACTAAACTATCTAGTGTTCAAAGATATC
    AGCATTACGGAGAACGGGGGGCTACTGAAAGGATATCAACTAACGTACATTCCCGA
    TAAGTTAAAGAACGTTGGTCATCAATGTGGTTGCATCTTCTACGTGCCTGCTGCCTAT
    ACGTCCAAAATAGATCCAACTACTGGATTTGTTAACATCTTTAAATTCAAAGATTTA
    ACCGTAGACGCCAAAAGGGAATTTATAAAAAAATTTGACAGCATCCGTTACGATAG
    CGAAAAGAATCTGTTCTGTTTTACTTTCGACTACAATAATTTCATCACGCAAAATAC
    GGTAATGTCTAAGTCAAGTTGGAGCGTCTACACGTATGGAGTCAGGATCAAGAGGC
    GTTTCGTAAATGGAAGATTCTCTAATGAGTCAGATACTATAGACATCACGAAAGATA
    TGGAGAAAACCTTGGAGATGACGGATATTAACTGGCGTGATGGACACGATTTAAGA
    CAGGACATTATTGACTATGAGATTGTGCAACACATCTTCGAAATATTCCGTCTAACA
    GTCCAAATGAGGAATAGCCTAAGTGAATTGGAGGACCGTGATTACGATAGGCTTAT
    AAGTCCTGTCCTTAACGAAAACAATATTTTCTATGATAGTGCTAAGGCGGGGGACGC
    ACTGCCTAAAGACGCAGATGCTAACGGGGCATACTGCATTGCGTTAAAGGGTCTGT
    ACGAAATCAAGCAGATTACGGAAAACTGGAAAGAGGATGGCAAGTTTAGCAGAGA
    TAAGTTGAAGATAAGTAACAAAGATTGGTTTGACTTTATTCAGAATAAAAGGTATTTA
    SEQ ID NO: 37
    ATGAATAACGGCACTAATAATTTCCAGAATTTCATCGGCATTAGCAGCTTACAAAAG
    ACGTTGAGGAATGCCTTAATACCCACAGAAACTACTCAACAATTTATAGTGAAGAAT
    GGGATAATTAAGGAAGACGAGTTGAGAGGTGAAAATAGGCAAATCTTGAAAGACAT
    TATGGATGACTACTACAGGGGCTTCATTAGTGAAACGTTGTCTTCAATAGATGACAT
    TGATTGGACTTCTTTGTTTGAGAAGATGGAAATACAGTTAAAGAACGGCGACAATA
    AGGATACACTTATCAAAGAGCAAACAGAATATAGAAAAGCAATTCACAAAAAGTTT
    GCTAACGATGATAGGTTCAAGAACATGTTTAGCGCTAAACTAATATCAGACATCCTT
    CCCGAGTTCGTTATTCATAACAATAACTATAGTGCAAGTGAAAAAGAGGAGAAGAC
    ACAGGTGATTAAGCTGTTCTCCAGATTCGCGACTTCTTTCAAAGATTACTTCAAAAA
    CAGAGCCAACTGTTTTTCAGCTGACGATATCTCTAGTAGTAGTTGTCACCGTATAGT
    GAACGATAACGCTGAGATCTTCTTTAGCAATGCATTAGTGTATAGAAGGATAGTTAA
    GTCTCTAAGCAATGATGATATCAATAAAATTTCCGGAGACATGAAGGACTCCCTAAA
    GGAAATGTCCTTAGAAGAGATCTACTCATATGAGAAATACGGGGAATTTATTACGC
    AGGAAGGGATCTCCTTTTACAATGACATATGCGGGAAGGTCAACTCTTTCATGAACT
    TATACTGCCAAAAGAACAAGGAGAACAAGAATTTATATAAACTTCAGAAACTTCAC
    AAACAAATACTGTGCATAGCCGATACCTCATATGAGGTTCCTTACAAATTTGAATCA
    GATGAAGAGGTATACCAATCCGTTAACGGCTTTCTTGACAATATTAGCTCAAAGCAC
    ATCGTGGAGAGGTTGAGAAAGATTGGTGATAATTATAATGGCTACAATCTAGATAA
    GATATATATTGTTAGCAAGTTCTACGAGTCTGTGTCCCAAAAAACATATAGGGATTG
    GGAGACAATTAATACTGCTCTAGAAATCCATTACAACAACATCCTTCCTGGAAATGG
    CAAGAGTAAGGCCGACAAAGTCAAGAAAGCAGTGAAAAATGATCTGCAAAAATCA
    ATTACTGAGATAAACGAGCTAGTATCTAATTACAAGCTTTGTAGCGACGATAACATT
    AAGGCAGAAACGTACATACACGAGATTAGTCACATCTTAAATAATTTTGAAGCCCA
    AGAACTGAAATATAACCCTGAGATACACCTTGTTGAATCCGAGTTAAAGGCGTCTGA
    ACTAAAAAACGTGTTAGACGTTATTATGAATGCCTTCCACTGGTGTAGCGTCTTTAT
    GACTGAGGAGTTGGTTGATAAGGATAATAACTTTTACGCTGAATTGGAAGAAATTTA
    TGACGAAATCTATCCTGTTATTTCTCTATATAATTTGGTGAGAAATTACGTAACGCA
    AAAGCCCTATAGTACGAAAAAAATAAAACTAAATTTCGGGATCCCTACCCTAGCCG
    ACGGTTGGTCTAAATCCAAGGAGTACTCAAACAATGCAATAATATTGATGAGGGAC
    AACCTGTACTACCTAGGCATATTTAATGCCAAAAATAAGCCCGATAAAAAGATTATA
    GAAGGGAACACGTCAGAAAATAAAGGAGACTATAAGAAAATGATCTACAACCTTTT
    GCCCGGCCCCAATAAAATGATCCCGAAGGTCTTCCTAAGTAGCAAGACTGGCGTAG
    AGACCTACAAACCATCTGCATACATTTTGGAGGGGTACAAGCAAAACAAGCACATA
    AAGAGTAGTAAGGATTTTGACATTACATTCTGCCATGACTTAATTGACTACTTTAAA
    AATTGCATCGCAATTCACCCTGAATGGAAAAATTTTGGATTTGATTTCTCTGATACTT
    CAACATATGAGGATATTTCAGGGTTCTACAGGGAGGTCGAACTACAGGGTTACAAA
    ATAGACTGGACGTATATTTCTGAGAAAGATATAGATTTGCTTCAGGAAAAGGGTCA
    GCTATATCTGTTCCAGATATATAATAAGGACTTCTCCAAAAAGAGTACCGGAAATGA
    TAATCTGCACACAATGTACTTAAAAAACTTGTTCTCTGAGGAGAATCTAAAAGACAT
    CGTACTAAAACTTAACGGGGAGGCCGAAATTTTTTTTAGGAAGTCCAGCATCAAGA
    ACCCGATTATTCATAAAAAAGGTAGCATTTTGGTGAACCGTACTTATGAGGCGGAAG
    AAAAAGACCAATTCGGTAATATTCAAATCGTTAGAAAGAACATCCCTGAGAACATT
    TATCAGGAACTATACAAATACTTTAACGACAAATCAGATAAGGAGCTTTCTGATGAG
    GCAGCTAAATTGAAAAATGTAGTGGGACATCACGAAGCAGCCACTAACATAGTGAA
    GGACTACAGATACACATACGATAAGTACTTCCTGCACATGCCTATTACAATTAACTT
    TAAAGCAAATAAAACAGGGTTTATTAACGACAGAATCTTACAGTATATTGCCAAAG
    AAAAGGATCTGCATGTGATAGGAATAGACAGAGGAGAAAGAAACCTGATATACGTC
    TCCGTGATTGATACATGTGGGAACATAGTAGAACAGAAGTCCTTTAACATTGTTAAT
    GGGTACGATTATCAAATTAAATTAAAACAACAAGAAGGAGCACGTCAAATAGCTAG
    GAAAGAATGGAAAGAGATAGGAAAAATTAAGGAAATTAAGGAGGGTTACCTGTCC
    CTTGTAATTCATGAAATATCCAAAATGGTAATTAAATATAACGCGATCATCGCGATG
    GAAGATCTAAGCTACGGGTTCAAAAAAGGCAGGTTTAAGGTGGAGAGGCAAGTTTA
    CCAAAAGTTCGAGACAATGTTGATTAATAAGTTAAACTACTTAGTTTTCAAAGATAT
    CTCCATAACCGAGAATGGCGGGCTTTTAAAAGGGTACCAACTAACATATATCCCGG
    ATAAATTGAAGAACGTTGGACACCAGTGTGGCTGCATATTTTATGTACCCGCTGCGT
    ATACTTCTAAAATTGACCCGACCACCGGGTTTGTAAACATATTCAAGTTTAAGGACC
    TAACAGTTGACGCCAAACGTGAGTTCATCAAGAAGTTCGATAGTATAAGGTATGACT
    CTGAGAAGAACCTTTTCTGCTTCACGTTTGACTATAATAATTTCATCACCCAAAATAC
    AGTTATGTCAAAAAGCTCTTGGTCAGTATATACGTATGGCGTAAGGATTAAGCGTAG
    GTTCGTGAACGGTAGATTTTCCAACGAGTCAGATACTATTGATATTACCAAGGATAT
    GGAGAAGACATTAGAAATGACAGATATAAATTGGAGGGATGGGCACGATCTAAGGC
    AAGATATCATTGATTACGAAATTGTTCAGCACATATTCGAGATATTCCGTCTTACAG
    TACAAATGCGTAACAGCTTGTCTGAGTTGGAAGATCGTGACTATGACAGGTTGATAT
    CACCGGTCTTGAACGAGAACAATATATTCTACGACAGCGCTAAGGCGGGAGACGCT
    CTGCCTAAAGACGCAGATGCCAATGGGGCGTACTGCATTGCCTTAAAAGGCTTATAC
    GAGATTAAACAGATCACAGAGAACTGGAAAGAGGACGGCAAGTTTTCTAGAGATAA
    ATTGAAAATCTCAAACAAAGACTGGTTCGATTTCATCCAAAACAAAAGATACCTT
    SEQ ID NO: 38
    ATGAACAATGGAACTAACAACTTCCAGAACTTTATCGGCATCTCTTCCCTCCAAAAG
    ACACTGAGAAATGCACTGATCCCAACCGAAACGACTCAACAATTTATTGTTAAGAA
    CGGCATCATAAAAGAAGACGAGCTTCGCGGCGAGAACCGCCAGATACTTAAGGATA
    TTATGGACGATTATTACCGAGGCTTTATCAGCGAAACTCTTAGCTCTATTGATGATAT
    CGACTGGACCTCCCTCTTCGAAAAAATGGAGATACAGCTCAAGAACGGCGATAATA
    AAGACACCTTGATAAAGGAACAGACTGAGTACAGGAAAGCGATCCACAAGAAATTC
    GCGAACGACGACAGGTTTAAAAACATGTTCTCTGCAAAATTGATATCCGACATCTTG
    CCGGAATTTGTGATACACAACAATAACTATAGCGCTTCAGAGAAAGAAGAGAAGAC
    CCAAGTAATCAAGTTGTTCAGCCGCTTCGCAACGTCTTTTAAAGATTACTTTAAGAA
    CCGGGCCAATTGTTTCTCCGCGGATGATATTAGCTCATCAAGTTGCCATCGAATTGT
    CAATGATAATGCGGAGATCTTCTTCAGCAATGCGCTGGTCTACAGACGAATCGTAAA
    AAGTCTTTCAAATGACGACATCAATAAGATTAGTGGAGATATGAAGGATTCCCTTAA
    GGAAATGAGTCTTGAAGAAATATACTCATACGAAAAGTACGGGGAATTTATTACCC
    AGGAGGGGATCTCCTTCTATAACGACATCTGTGGAAAAGTAAACTCATTCATGAACC
    TGTACTGTCAGAAAAACAAAGAAAACAAAAATCTGTATAAACTCCAAAAATTGCAC
    AAGCAAATATTGTGTATAGCGGACACATCATACGAGGTTCCATATAAGTTCGAAAGT
    GATGAAGAAGTCTACCAATCAGTGAATGGGTTTCTGGACAACATTAGTTCCAAGCAC
    ATAGTTGAACGACTGCGAAAGATTGGTGACAATTACAACGGCTATAATTTGGACAA
    GATTTATATAGTTAGCAAATTTTATGAATCCGTATCACAAAAGACTTATAGAGACTG
    GGAAACAATCAACACGGCACTTGAGATCCATTATAACAATATTCTTCCAGGGAACG
    GCAAAAGCAAGGCTGATAAGGTAAAAAAGGCCGTTAAGAATGATCTTCAAAAATCC
    ATAACGGAGATCAACGAACTTGTAAGTAACTACAAATTGTGCTCTGACGACAATAT
    AAAGGCTGAAACGTATATTCACGAGATTAGCCATATCCTGAATAACTTTGAGGCCCA
    AGAACTCAAGTATAACCCGGAAATACATTTGGTAGAAAGCGAGCTTAAAGCGAGTG
    AGCTGAAAAACGTCCTCGATGTGATCATGAATGCTTTCCACTGGTGTAGTGTCTTTA
    TGACTGAGGAGTTGGTTGATAAAGACAATAATTTCTACGCTGAACTGGAAGAAATTT
    ACGACGAAATCTATCCAGTGATCTCCCTCTATAACCTCGTTCGAAACTACGTGACGC
    AGAAACCTTATTCTACAAAGAAAATTAAGTTGAACTTCGGCATTCCTACACTTGCTG
    ACGGATGGTCCAAATCCAAAGAGTACTCAAACAACGCAATCATCCTCATGCGGGAT
    AACCTTTATTATTTGGGCATTTTCAACGCCAAAAACAAACCTGATAAAAAGATAATT
    GAAGGCAATACGAGTGAGAACAAGGGCGACTACAAAAAAATGATATATAACTTGTT
    GCCAGGCCCCAACAAGATGATTCCTAAAGTTTTTCTGTCTTCTAAGACTGGAGTTGA
    AACTTACAAACCCTCCGCCTACATTCTTGAAGGGTATAAACAGAATAAGCACATAA
    AGTCCTCAAAGGATTTCGACATTACGTTTTGCCATGACCTCATCGACTATTTCAAGA
    ACTGTATCGCCATACATCCGGAGTGGAAGAATTTTGGATTTGATTTCTCCGACACAT
    CTACCTATGAAGACATAAGCGGTTTCTACCGGGAGGTCGAGCTTCAGGGCTATAAG
    ATAGATTGGACATACATTAGTGAAAAAGATATCGATCTTCTGCAAGAAAAGGGACA
    ACTTTACCTTTTTCAGATTTATAATAAAGACTTTTCAAAAAAGTCCACAGGGAACGA
    TAATCTGCACACCATGTATCTCAAGAATCTGTTTAGTGAAGAAAACCTTAAAGACAT
    AGTTTTGAAGCTTAACGGAGAGGCTGAGATTTTTTTTAGAAAGTCCTCAATTAAAAA
    CCCTATAATACACAAGAAAGGCTCTATTCTTGTTAACAGGACATATGAAGCCGAGG
    AGAAAGATCAGTTTGGCAATATCCAGATTGTTCGCAAGAATATCCCGGAAAATATAT
    ATCAGGAGCTGTATAAATACTTTAACGACAAGAGCGACAAGGAGCTGAGTGACGAG
    GCCGCGAAGCTTAAGAATGTAGTAGGTCACCACGAAGCAGCCACCAATATCGTCAA
    AGACTATAGGTACACGTACGACAAGTACTTTTTGCACATGCCTATAACTATAAACTT
    CAAAGCTAATAAAACTGGGTTTATTAATGACAGGATTCTCCAATACATCGCTAAAGA
    GAAGGATCTGCATGTAATTGGCATAGACAGAGGTGAGAGAAACTTGATATATGTCA
    GCGTAATAGACACATGTGGCAATATCGTGGAACAGAAGTCTTTTAACATCGTCAATG
    GTTACGACTACCAAATTAAGTTGAAACAGCAGGAAGGCGCACGACAGATCGCACGA
    AAGGAATGGAAAGAGATAGGCAAAATAAAAGAAATAAAGGAGGGCTATCTCAGTC
    TCGTTATACACGAAATTTCAAAAATGGTTATTAAGTACAATGCAATCATAGCGATGG
    AGGATCTCAGTTATGGGTTCAAAAAGGGTCGGTTTAAAGTTGAGCGCCAAGTGTACC
    AAAAGTTCGAGACAATGCTGATTAACAAGCTGAACTACCTCGTCTTCAAAGATATAA
    GTATTACGGAGAACGGTGGCCTTCTTAAAGGCTATCAACTTACTTACATCCCGGACA
    AGCTCAAAAACGTAGGGCACCAATGCGGGTGTATTTTCTATGTGCCTGCGGCATATA
    CGTCAAAGATTGACCCAACCACAGGATTCGTAAACATATTCAAGTTTAAGGACCTCA
    CCGTTGATGCGAAAAGGGAGTTCATTAAAAAATTTGATTCTATTCGATATGATAGTG
    AGAAAAATCTCTTTTGTTTCACATTTGACTATAATAATTTTATTACTCAGAATACTGT
    CATGAGCAAGTCATCTTGGTCAGTGTACACATACGGGGTGCGGATCAAACGCAGGT
    TCGTCAATGGTCGCTTCTCAAACGAATCAGACACCATTGACATCACAAAGGACATGG
    AAAAAACCCTTGAGATGACCGACATTAATTGGCGCGATGGTCATGATCTGCGGCAA
    GACATCATAGACTACGAAATCGTCCAACACATCTTTGAGATCTTTCGCTTGACGGTC
    CAAATGCGGAACTCCCTGTCCGAGCTCGAGGATAGAGATTATGATCGGCTGATATCT
    CCCGTGCTTAATGAAAATAACATCTTCTACGACTCCGCCAAGGCGGGTGATGCCCTG
    CCGAAGGATGCGGATGCTAATGGCGCTTATTGCATTGCTCTTAAGGGGCTCTATGAG
    ATAAAGCAGATCACGGAAAACTGGAAAGAAGACGGTAAGTTTAGTAGAGACAAGC
    TGAAGATCTCAAATAAAGACTGGTTTGATTTCATACAGAACAAGCGGTACCTG
    SEQ ID NO: 39
    ATGAACAATGGCACTAACAATTTTCAGAATTTCATCGGCATTTCAAGTCTGCAAAAA
    ACTCTGAGGAATGCTTTGATCCCTACTGAAACCACTCAGCAATTTATAGTCAAGAAC
    GGTATAATTAAAGAAGATGAACTCAGGGGTGAAAATAGACAAATACTCAAGGACAT
    TATGGATGACTATTATAGAGGCTTCATCTCAGAGACTCTCTCATCAATAGATGATAT
    CGATTGGACTAGCCTTTTCGAGAAAATGGAGATTCAGTTGAAAAATGGTGATAACA
    AAGATACGTTGATAAAGGAACAGACCGAGTACAGGAAAGCCATTCATAAGAAATTT
    GCTAATGACGATAGATTTAAGAATATGTTTAGTGCAAAACTGATTAGTGACATTCTG
    CCGGAGTTCGTTATCCATAATAATAACTACTCTGCATCCGAAAAGGAGGAAAAGAC
    GCAAGTTATTAAACTGTTCAGCCGCTTCGCCACAAGCTTCAAGGACTACTTCAAAAA
    TAGAGCCAACTGCTTTTCTGCCGACGATATATCATCATCTTCATGCCATCGGATCGTT
    AACGATAACGCCGAGATATTCTTCAGCAACGCCCTTGTATATCGAAGAATAGTCAAA
    AGTCTGAGTAATGATGATATTAATAAAATTAGCGGTGATATGAAAGACTCCCTGAA
    GGAAATGTCACTGGAGGAAATTTATAGTTACGAAAAGTACGGCGAATTCATTACTC
    AAGAAGGCATATCCTTCTATAACGACATTTGCGGAAAGGTCAACTCATTCATGAACC
    TTTATTGCCAGAAGAATAAGGAGAATAAAAATCTTTACAAATTGCAAAAACTTCAC
    AAACAAATTCTTTGCATCGCGGATACGTCCTACGAAGTTCCTTACAAATTTGAATCC
    GATGAGGAAGTGTATCAGAGTGTCAATGGATTTTTGGATAATATCTCTTCAAAACAT
    ATTGTGGAGAGATTGCGCAAAATAGGTGATAACTACAATGGCTACAACCTGGACAA
    GATTTATATTGTTAGCAAGTTCTATGAAAGTGTCAGTCAAAAGACCTACAGAGATTG
    GGAGACAATCAACACGGCGCTCGAAATACACTACAATAACATCCTCCCCGGCAATG
    GGAAGAGTAAAGCCGATAAGGTTAAAAAAGCTGTTAAGAACGACCTCCAGAAATCC
    ATCACGGAAATAAACGAGCTGGTTTCCAACTATAAGCTGTGTAGCGATGATAATATT
    AAGGCTGAGACATATATACATGAGATCAGCCACATTCTCAACAATTTCGAGGCACA
    GGAACTCAAATACAATCCCGAGATTCACTTGGTGGAAAGTGAGTTGAAGGCGTCAG
    AGCTTAAGAATGTACTTGACGTAATAATGAATGCTTTTCATTGGTGCTCCGTGTTCAT
    GACTGAGGAACTCGTGGATAAGGATAATAACTTTTATGCGGAGTTGGAAGAGATAT
    ACGATGAAATATACCCGGTTATCTCACTGTATAATCTGGTCAGAAATTACGTGACCC
    AAAAGCCTTATAGTACAAAAAAAATAAAGTTGAACTTCGGTATTCCGACATTGGCA
    GATGGTTGGTCCAAAAGCAAAGAATACTCTAATAACGCCATTATATTGATGCGAGA
    CAATTTGTATTACCTTGGGATCTTTAACGCGAAAAACAAACCGGATAAGAAGATCAT
    CGAAGGTAATACATCTGAGAATAAGGGGGATTACAAGAAGATGATTTATAATCTGT
    TGCCGGGGCCAAACAAGATGATTCCGAAGGTCTTTCTGTCATCTAAGACAGGAGTA
    GAGACCTACAAACCTTCTGCGTACATTTTGGAAGGCTACAAACAGAACAAGCATAT
    AAAATCTAGCAAGGACTTTGATATCACGTTTTGTCATGATCTGATAGATTATTTCAA
    AAACTGCATCGCTATACATCCTGAGTGGAAGAATTTCGGCTTTGACTTTTCTGACAC
    CAGCACATACGAAGACATCTCAGGTTTCTACCGGGAAGTCGAGCTCCAGGGGTACA
    AGATTGACTGGACATATATAAGTGAAAAAGACATCGACCTCCTCCAAGAGAAGGGC
    CAACTTTACCTGTTCCAGATCTATAACAAAGACTTTTCTAAAAAGTCCACGGGTAAC
    GACAACTTGCACACTATGTATCTGAAAAACTTGTTCTCTGAAGAGAACCTCAAGGAC
    ATCGTCCTGAAGCTTAACGGGGAGGCGGAGATCTTCTTTAGAAAGTCCTCTATCAAA
    AATCCCATTATCCATAAAAAGGGCTCTATACTCGTTAATAGGACATATGAAGCGGAG
    GAAAAAGATCAATTTGGGAACATCCAGATCGTCCGGAAAAATATACCTGAGAATAT
    CTATCAAGAGCTGTACAAGTATTTTAATGATAAGTCAGACAAAGAGCTCAGTGATG
    AGGCGGCAAAGCTCAAGAACGTGGTGGGGCATCATGAAGCTGCGACGAACATTGTC
    AAAGATTATAGATACACTTACGATAAATACTTCCTCCACATGCCGATAACGATTAAC
    TTCAAAGCCAATAAGACGGGGTTTATAAATGATCGGATCCTTCAGTACATTGCGAAA
    GAGAAAGACCTCCATGTGATCGGAATTGACCGAGGAGAAAGGAATCTGATTTACGT
    GTCCGTGATTGATACTTGCGGGAATATAGTCGAGCAAAAGAGTTTCAACATAGTCAA
    CGGGTATGACTATCAGATAAAGCTCAAACAGCAGGAAGGTGCGAGGCAAATTGCGC
    GCAAAGAGTGGAAGGAGATAGGCAAGATTAAAGAAATCAAGGAAGGTTATCTCAG
    CTTGGTGATCCATGAAATATCTAAGATGGTTATAAAGTACAATGCCATAATAGCCAT
    GGAGGATCTTTCCTACGGGTTTAAGAAGGGCCGATTTAAAGTGGAGCGACAAGTTT
    ACCAGAAGTTCGAAACCATGTTGATTAACAAACTTAACTATTTGGTGTTCAAGGATA
    TAAGTATAACCGAAAACGGCGGTTTGCTTAAGGGTTATCAGCTCACGTATATTCCTG
    ATAAACTTAAAAACGTTGGACACCAGTGTGGATGTATCTTCTACGTGCCAGCCGCTT
    ACACTAGTAAGATAGATCCTACCACGGGGTTTGTGAATATTTTTAAGTTTAAAGACT
    TGACAGTCGACGCCAAAAGGGAATTTATAAAAAAGTTTGATTCTATCCGCTACGATA
    GTGAAAAAAATCTCTTTTGCTTTACTTTCGACTATAACAACTTCATTACGCAGAACA
    CTGTCATGAGTAAGTCCAGCTGGAGCGTCTACACATATGGCGTCCGAATTAAACGAC
    GATTTGTAAACGGGCGGTTTTCAAACGAATCTGACACGATAGACATTACCAAGGAT
    ATGGAGAAGACACTTGAGATGACCGACATAAACTGGCGGGACGGTCACGATCTTCG
    GCAGGACATAATTGATTACGAAATCGTCCAGCATATATTCGAAATATTTCGACTTAC
    AGTGCAAATGCGGAACAGTCTCTCTGAACTGGAAGATCGCGATTATGACCGGTTGAT
    TTCTCCGGTCCTCAATGAAAATAACATATTTTATGATAGTGCTAAGGCAGGTGATGC
    GTTGCCAAAGGATGCAGACGCTAATGGTGCCTATTGTATCGCGCTCAAGGGATTGTA
    CGAGATAAAGCAAATTACGGAGAACTGGAAGGAGGATGGTAAGTTTAGCCGAGAC
    AAGTTGAAGATTAGCAATAAAGACTGGTTTGATTTTATCCAAAACAAGAGGTACCTG
    SEQ ID NO: 40
    ATGAATAACGGAACTAATAACTTTCAAAATTTCATAGGTATTTCAAGCTTGCAGAAG
    ACCCTGAGGAATGCCCTGATTCCAACCGAGACAACGCAGCAGTTCATAGTCAAAAA
    TGGCATTATTAAGGAAGATGAGCTGCGGGGGGAAAACCGACAGATACTCAAGGATA
    TTATGGACGACTATTACCGGGGATTTATCTCAGAAACGCTGAGCAGTATTGATGACA
    TCGATTGGACCAGTCTTTTCGAGAAAATGGAAATTCAACTTAAGAATGGTGACAATA
    AAGACACTCTCATAAAGGAGCAAACTGAATACCGAAAAGCCATACACAAAAAGTTT
    GCCAACGATGACCGCTTTAAAAACATGTTTTCAGCTAAGCTCATTAGCGACATTCTC
    CCCGAGTTTGTGATTCATAACAATAACTATAGCGCATCCGAGAAGGAGGAAAAAAC
    CCAAGTTATCAAATTGTTCAGTAGATTCGCTACGAGCTTTAAAGATTACTTTAAAAA
    CCGGGCTAACTGCTTCAGTGCAGACGATATCAGCTCCTCATCCTGTCATCGCATCGT
    CAATGATAATGCTGAGATCTTCTTTTCTAATGCACTGGTTTACCGCAGGATAGTTAA
    GTCTCTTAGTAACGACGACATCAACAAGATATCAGGAGATATGAAGGATTCCCTTAA
    AGAAATGAGTCTCGAGGAGATATATTCTTATGAAAAATACGGCGAATTTATTACCCA
    AGAGGGCATTAGTTTCTATAATGACATATGCGGAAAAGTTAATAGTTTTATGAATCT
    CTATTGTCAGAAGAATAAGGAGAATAAGAACCTCTACAAATTGCAGAAGTTGCACA
    AGCAAATTCTGTGTATCGCGGACACCTCTTACGAGGTCCCATATAAGTTCGAGAGTG
    ATGAAGAAGTATACCAGAGCGTTAATGGGTTCCTGGACAACATCTCAAGTAAACAC
    ATAGTCGAAAGGCTCCGAAAGATCGGTGATAACTATAACGGATATAATTTGGATAA
    AATTTATATAGTTAGCAAATTTTACGAGAGCGTCAGTCAGAAGACCTACCGGGACTG
    GGAGACCATAAACACAGCGCTGGAAATACATTATAACAACATACTGCCTGGGAACG
    GTAAGTCAAAGGCAGACAAGGTTAAAAAGGCTGTGAAGAATGACCTGCAAAAATCA
    ATTACAGAAATAAATGAGTTGGTAAGTAATTACAAACTTTGCAGCGATGATAATATA
    AAGGCAGAGACGTACATACATGAAATATCTCATATCCTCAACAATTTCGAAGCCCA
    AGAACTGAAGTACAACCCGGAAATTCATCTTGTAGAGTCTGAGTTGAAGGCCTCCG
    AATTGAAAAACGTTCTTGACGTAATTATGAATGCCTTCCACTGGTGCTCAGTATTCA
    TGACGGAAGAGCTCGTGGATAAAGACAACAATTTTTACGCTGAACTGGAAGAAATA
    TATGACGAGATTTACCCCGTAATTTCACTCTACAACTTGGTACGAAATTACGTTACC
    CAAAAGCCATACTCAACAAAAAAAATTAAACTGAACTTCGGGATACCCACCCTCGC
    AGATGGATGGTCAAAGTCCAAAGAGTACAGTAACAATGCAATTATCCTGATGCGAG
    ACAACCTTTATTACCTCGGGATTTTCAACGCTAAAAATAAACCTGATAAAAAAATAA
    TTGAGGGTAATACCTCTGAAAACAAGGGGGATTATAAAAAGATGATATACAATCTG
    CTGCCTGGCCCGAACAAAATGATTCCTAAAGTCTTCTTGTCTTCCAAGACTGGAGTC
    GAAACCTACAAGCCAAGTGCTTATATACTCGAAGGGTACAAACAAAATAAGCACAT
    AAAATCCAGCAAGGATTTTGATATTACATTCTGCCACGATTTGATTGATTATTTTAAG
    AACTGTATAGCCATCCACCCAGAATGGAAGAATTTTGGTTTTGATTTTAGCGATACC
    TCAACATATGAGGATATCTCTGGCTTTTACCGCGAGGTAGAACTGCAAGGTTATAAG
    ATCGATTGGACTTATATTTCTGAAAAGGACATAGATCTCCTGCAAGAGAAAGGGCA
    ACTTTATTTGTTTCAAATATACAACAAAGATTTTAGTAAGAAGAGTACTGGCAATGA
    TAACCTTCACACTATGTATCTGAAGAACCTTTTTTCTGAGGAGAACTTGAAGGACAT
    AGTCCTTAAACTCAATGGGGAAGCTGAAATATTCTTTCGCAAAAGCTCCATTAAAAA
    CCCGATCATTCATAAAAAGGGTTCCATCTTGGTAAACCGCACATACGAGGCGGAAG
    AAAAAGATCAGTTCGGAAATATCCAGATCGTAAGGAAGAATATCCCCGAAAATATA
    TACCAAGAGCTTTACAAATATTTTAACGATAAGTCAGACAAGGAACTGTCAGACGA
    AGCAGCCAAGTTGAAGAATGTCGTAGGGCACCACGAAGCAGCTACAAACATAGTTA
    AAGATTATCGGTACACCTACGATAAATATTTCCTGCATATGCCAATAACCATAAACT
    TCAAAGCCAACAAAACAGGGTTCATCAATGACCGAATACTTCAGTATATAGCCAAG
    GAAAAAGACCTGCATGTTATAGGAATAGATAGAGGTGAGCGCAACTTGATATATGT
    CAGCGTGATAGACACCTGCGGAAATATCGTCGAGCAAAAAAGTTTCAACATTGTTA
    ATGGCTACGATTACCAAATTAAATTGAAGCAGCAAGAGGGGGCTCGGCAAATCGCG
    CGAAAGGAATGGAAAGAAATCGGGAAGATTAAAGAAATTAAAGAGGGCTACCTGT
    CTCTTGTAATTCACGAAATATCTAAGATGGTCATCAAGTATAATGCCATTATTGCGA
    TGGAAGATCTGTCCTACGGATTTAAGAAAGGCAGGTTTAAAGTCGAAAGGCAGGTG
    TACCAGAAATTCGAGACCATGCTGATTAATAAGCTCAACTATCTCGTATTTAAGGAT
    ATTTCTATAACTGAAAATGGAGGGCTTCTCAAAGGATATCAACTCACATACATACCT
    GATAAGCTGAAGAACGTAGGCCACCAGTGTGGATGCATATTCTATGTACCAGCTGC
    ATACACAAGCAAGATCGATCCAACTACTGGGTTTGTCAATATCTTCAAATTTAAGGA
    CTTGACGGTCGATGCCAAACGGGAGTTCATCAAAAAGTTTGATAGTATTCGATATGA
    TAGTGAGAAGAACTTGTTTTGCTTCACATTTGACTACAACAATTTCATAACGCAAAA
    TACGGTTATGTCTAAATCCTCATGGAGCGTCTACACTTACGGAGTGAGGATAAAGCG
    GCGCTTCGTAAATGGCAGGTTTAGCAATGAATCCGACACGATTGACATAACCAAGG
    ATATGGAGAAAACCCTCGAGATGACCGATATAAATTGGCGGGATGGACACGATCTG
    CGACAAGACATAATCGATTATGAAATCGTGCAGCACATATTTGAGATATTCAGGCTT
    ACGGTCCAAATGAGAAATTCCCTTTCCGAACTTGAAGACCGCGATTACGACCGACTG
    ATAAGCCCCGTTCTGAACGAAAATAACATCTTCTACGACAGCGCTAAAGCGGGAGA
    CGCGCTGCCGAAAGATGCGGACGCAAATGGAGCCTATTGTATCGCCTTGAAAGGGT
    TGTACGAGATCAAACAGATAACCGAGAATTGGAAGGAGGATGGGAAGTTTAGTCGA
    GACAAACTTAAAATAAGCAACAAGGACTGGTTCGACTTTATTCAAAACAAACGATA
    TCTC
    SEQ ID NO: 41
    ATGAATAATGGTACTAACAATTTTCAAAACTTTATCGGCATCTCTTCACTTCAGAAA
    ACTCTTCGGAACGCCCTTATACCGACGGAGACAACGCAGCAGTTTATAGTTAAAAAC
    GGGATCATTAAAGAAGATGAACTCAGAGGGGAAAACAGGCAAATATTGAAGGACA
    TTATGGACGATTACTACCGGGGGTTTATTTCAGAGACCCTTTCATCTATTGATGACAT
    AGATTGGACCTCCCTTTTCGAGAAAATGGAGATACAATTGAAAAACGGCGACAATA
    AAGATACACTTATCAAGGAACAAACTGAGTATCGCAAGGCGATTCACAAGAAGTTT
    GCGAATGACGATCGCTTTAAGAATATGTTTTCTGCGAAGCTCATAAGTGACATTCTG
    CCTGAATTTGTCATTCATAACAACAATTATTCTGCTAGCGAAAAAGAGGAAAAAACT
    CAAGTCATTAAGCTTTTTAGCAGGTTCGCTACTAGTTTTAAAGACTATTTTAAGAACC
    GGGCGAATTGCTTTAGCGCTGACGACATATCATCCTCATCCTGTCATCGCATAGTCA
    ATGATAATGCAGAAATATTCTTTTCTAATGCGCTCGTGTATCGGAGAATAGTGAAAA
    GCCTCTCTAACGATGACATTAACAAAATAAGCGGCGATATGAAGGATAGTCTGAAG
    GAAATGTCCCTCGAAGAAATATACTCATACGAGAAGTACGGAGAATTTATCACCCA
    GGAAGGAATTAGTTTTTACAACGACATCTGTGGTAAGGTTAACTCTTTTATGAATCT
    GTATTGTCAAAAGAATAAAGAAAATAAAAATCTTTATAAGCTCCAAAAGCTTCACA
    AACAAATCTTGTGCATTGCGGATACGTCATACGAAGTACCTTACAAATTTGAAAGCG
    ACGAAGAGGTGTATCAGTCAGTGAATGGGTTCCTTGACAATATTTCTAGCAAACATA
    TTGTGGAGCGACTTCGAAAGATCGGTGATAATTACAATGGCTATAATTTGGATAAAA
    TTTACATAGTTAGTAAGTTTTATGAATCCGTCTCACAAAAGACGTACCGAGATTGGG
    AGACCATCAACACTGCTCTGGAGATTCATTACAATAATATATTGCCTGGGAATGGGA
    AGTCAAAGGCCGACAAGGTTAAAAAAGCCGTAAAAAACGATCTTCAAAAGTCCATT
    ACCGAGATAAATGAACTTGTATCCAACTATAAGTTGTGCTCTGACGATAATATTAAA
    GCAGAAACGTATATCCACGAAATAAGTCACATCCTGAACAACTTCGAAGCTCAAGA
    GCTCAAGTATAATCCTGAAATTCATCTCGTCGAAAGCGAGCTGAAAGCATCCGAGTT
    GAAGAATGTGCTTGATGTGATCATGAACGCATTCCATTGGTGCAGTGTGTTCATGAC
    CGAAGAACTTGTAGACAAAGACAACAACTTCTACGCTGAATTGGAAGAGATTTACG
    ATGAAATTTACCCCGTGATATCCCTCTATAATCTGGTAAGAAATTACGTCACGCAAA
    AACCATACAGTACCAAGAAAATAAAGCTCAACTTTGGTATTCCGACGTTGGCAGAT
    GGGTGGAGTAAGAGCAAGGAGTATTCTAACAATGCAATCATCCTCATGCGCGACAA
    TTTGTATTATCTGGGGATCTTCAACGCGAAAAATAAGCCCGACAAAAAGATAATAG
    AAGGCAATACGTCCGAGAACAAAGGGGACTATAAGAAAATGATTTATAACCTTCTT
    CCAGGACCCAACAAGATGATCCCAAAGGTTTTCTTGAGTTCAAAAACCGGCGTAGA
    AACTTATAAACCGTCCGCCTACATTCTGGAAGGGTACAAGCAAAACAAGCACATTA
    AGTCATCTAAGGATTTCGACATTACTTTTTGTCATGATTTGATAGACTACTTCAAAAA
    TTGTATAGCGATACATCCGGAATGGAAAAATTTTGGGTTCGATTTTTCCGACACAAG
    TACTTATGAAGACATCTCAGGGTTTTATAGGGAAGTTGAACTGCAAGGTTACAAAAT
    AGACTGGACTTATATTAGTGAGAAGGACATTGATTTGCTCCAGGAAAAGGGTCAATT
    GTATCTGTTCCAGATATATAACAAGGATTTCTCTAAAAAATCTACAGGTAACGACAA
    TCTCCACACGATGTACCTCAAGAATCTCTTCAGCGAAGAGAATTTGAAGGATATCGT
    ACTTAAGCTCAATGGAGAAGCGGAAATATTCTTCAGAAAGTCCAGCATTAAGAATC
    CTATAATTCACAAGAAAGGGTCAATTCTCGTAAACCGGACTTATGAGGCCGAAGAA
    AAAGATCAGTTTGGTAACATTCAGATTGTACGGAAAAACATTCCCGAGAACATCTAT
    CAAGAACTGTATAAATACTTTAATGATAAATCCGACAAGGAACTTTCTGACGAGGCT
    GCAAAATTGAAGAACGTAGTGGGACACCATGAGGCCGCAACCAATATAGTAAAGGA
    TTACAGATACACTTATGATAAGTATTTCCTCCATATGCCGATCACGATTAATTTCAAG
    GCGAATAAAACCGGCTTCATTAACGATCGCATTTTGCAATATATTGCGAAGGAAAA
    GGATTTGCACGTGATAGGTATAGACCGGGGTGAACGAAACTTGATTTACGTCTCTGT
    GATCGACACATGCGGAAATATAGTTGAACAGAAGTCCTTTAATATTGTGAATGGTTA
    CGACTACCAGATAAAATTGAAGCAACAGGAGGGCGCAAGACAGATAGCTCGCAAA
    GAGTGGAAGGAAATCGGCAAGATCAAAGAAATAAAGGAGGGTTATCTTTCCCTGGT
    AATTCATGAAATTAGCAAGATGGTTATTAAGTATAATGCTATAATAGCTATGGAGGA
    CCTTTCCTATGGGTTCAAGAAAGGTCGCTTCAAAGTGGAGCGACAAGTGTATCAAAA
    GTTCGAGACTATGTTGATAAATAAATTGAATTATTTGGTTTTTAAAGACATTTCAATA
    ACTGAGAACGGGGGTCTCTTGAAGGGGTACCAATTGACTTATATTCCGGACAAGTTG
    AAGAATGTCGGACACCAGTGTGGTTGCATTTTCTACGTGCCTGCCGCTTACACCTCA
    AAAATCGATCCGACCACTGGTTTTGTAAATATATTTAAATTCAAAGATCTCACCGTT
    GATGCCAAACGGGAGTTTATCAAAAAATTCGATTCCATTCGCTACGACTCTGAGAAA
    AACCTTTTTTGTTTCACGTTCGATTATAACAACTTTATAACCCAAAATACTGTAATGT
    CCAAGTCAAGTTGGTCTGTCTATACTTACGGAGTAAGGATCAAGCGCCGCTTCGTTA
    ATGGGAGATTCTCAAACGAGTCTGATACCATAGACATAACTAAAGACATGGAAAAA
    ACCCTGGAAATGACGGACATCAATTGGCGAGACGGGCATGATCTTCGACAGGACAT
    AATAGATTACGAAATTGTTCAACACATTTTCGAGATATTTCGACTTACGGTTCAGAT
    GAGGAATTCCCTTTCCGAATTGGAAGACCGGGATTATGATCGACTTATATCTCCCGT
    GCTCAATGAAAACAATATTTTTTATGATTCAGCGAAAGCTGGGGACGCGCTGCCAAA
    AGATGCCGATGCCAATGGAGCATACTGTATCGCCCTGAAGGGTTTGTATGAGATTAA
    GCAAATTACTGAAAACTGGAAGGAAGATGGCAAGTTTTCTAGAGATAAGCTTAAGA
    TTAGCAATAAGGACTGGTTTGACTTCATTCAAAATAAAAGGTATCTT
    SEQ ID NO: 42
    ATGAATAATGGAACAAATAATTTTCAAAATTTTATTGGTATCAGTTCATTGCAAAAG
    ACTTTGAGAAATGCTTTGATCCCGACTGAGACCACACAGCAGTTCATCGTCAAAAAT
    GGCATAATCAAGGAAGACGAACTTAGGGGTGAGAATAGACAAATATTGAAGGACAT
    CATGGATGACTATTATAGGGGGTTCATTTCCGAAACGCTCAGTAGTATTGATGACAT
    TGACTGGACTAGTCTTTTCGAGAAAATGGAAATTCAGCTTAAGAACGGGGACAATA
    AAGACACGCTGATCAAGGAGCAAACGGAATATAGGAAGGCGATCCATAAAAAATTC
    GCGAATGATGATCGGTTTAAAAACATGTTTAGTGCCAAGTTGATCAGCGACATACTG
    CCCGAATTCGTGATCCACAACAATAATTACAGCGCCTCCGAAAAGGAGGAAAAAAC
    TCAGGTCATTAAATTGTTTAGCCGATTCGCAACGAGTTTCAAAGATTATTTTAAGAA
    CCGGGCCAACTGTTTTTCAGCGGATGATATTAGCTCCAGCAGCTGCCATCGCATAGT
    AAATGATAACGCTGAAATCTTTTTTAGCAACGCACTTGTCTACCGGAGGATTGTAAA
    ATCACTGTCAAATGATGACATTAACAAAATATCTGGAGATATGAAGGACTCACTCA
    AAGAAATGAGCCTGGAAGAAATATATTCATACGAAAAATACGGGGAGTTTATTACC
    CAGGAAGGTATCAGTTTTTATAATGATATATGTGGAAAAGTTAATTCATTTATGAAT
    CTTTACTGTCAAAAAAATAAGGAGAACAAGAATTTGTACAAGCTCCAAAAACTTCA
    TAAACAGATTCTGTGCATCGCAGACACAAGTTATGAGGTACCGTACAAATTTGAGA
    GCGACGAAGAAGTTTATCAGAGTGTGAATGGTTTCCTGGACAATATCTCTTCTAAAC
    ACATTGTTGAGAGGCTTAGGAAGATCGGTGATAATTATAACGGCTATAATCTGGACA
    AAATTTATATTGTATCAAAGTTTTATGAATCAGTCTCTCAAAAGACGTATCGGGATT
    GGGAAACAATTAACACGGCTCTGGAGATCCACTACAATAACATTCTGCCCGGCAAC
    GGGAAGAGCAAAGCTGATAAGGTCAAGAAGGCAGTCAAGAACGACCTTCAGAAGA
    GCATAACAGAAATTAACGAATTGGTCAGTAACTACAAACTGTGTAGTGATGACAAC
    ATAAAAGCCGAAACATACATCCATGAAATAAGCCATATCCTGAATAACTTCGAAGC
    CCAAGAACTTAAATACAATCCCGAGATTCATCTTGTCGAATCAGAACTCAAGGCGTC
    CGAGCTCAAAAATGTCCTTGACGTGATAATGAATGCCTTCCACTGGTGCAGCGTATT
    CATGACGGAGGAGTTGGTAGATAAAGACAACAACTTTTATGCCGAATTGGAAGAGA
    TTTATGATGAGATTTACCCCGTTATTTCTCTGTACAACTTGGTTCGAAACTACGTAAC
    ACAAAAACCATACTCAACCAAAAAGATCAAACTCAATTTTGGCATACCTACATTGGC
    TGATGGTTGGTCCAAGTCAAAGGAATATAGCAATAATGCAATAATTCTCATGCGAG
    ATAACTTGTATTATTTGGGGATCTTTAACGCTAAGAACAAACCAGATAAAAAGATAA
    TCGAGGGGAACACAAGTGAGAACAAGGGTGATTACAAAAAAATGATTTACAATCTG
    CTTCCTGGGCCTAACAAAATGATTCCGAAGGTGTTTCTTAGCTCTAAAACTGGAGTG
    GAGACGTATAAGCCTTCCGCGTACATTCTCGAAGGCTACAAGCAAAATAAGCATAT
    CAAGTCCAGTAAGGACTTCGACATCACTTTTTGCCACGATCTCATCGATTACTTTAA
    GAACTGTATCGCAATACACCCCGAGTGGAAAAACTTTGGTTTTGATTTTTCAGACAC
    TAGTACCTACGAGGACATTTCCGGCTTCTATCGAGAAGTCGAACTCCAGGGCTACAA
    AATCGATTGGACGTACATTTCTGAGAAGGACATCGACTTGCTCCAAGAGAAAGGTC
    AACTTTACCTCTTCCAAATTTACAATAAAGACTTTTCAAAGAAGAGCACCGGTAATG
    ACAACTTGCATACCATGTATCTGAAGAACCTGTTTTCTGAGGAGAACCTCAAGGATA
    TTGTATTGAAGTTGAATGGCGAAGCAGAAATATTTTTCCGAAAGTCATCTATCAAGA
    ACCCCATTATACACAAAAAAGGCTCTATCCTGGTGAACCGGACTTACGAGGCAGAG
    GAGAAGGATCAATTCGGAAACATACAGATAGTCCGCAAAAACATCCCTGAGAATAT
    CTATCAGGAACTCTATAAGTACTTCAATGATAAATCAGACAAGGAGCTTAGCGACG
    AAGCAGCTAAACTTAAAAACGTGGTTGGCCATCACGAGGCCGCTACCAACATAGTC
    AAAGACTACCGCTATACTTATGACAAGTACTTTTTGCACATGCCCATAACAATTAAT
    TTCAAAGCTAACAAAACAGGGTTTATAAATGACAGAATCCTCCAATACATCGCCAA
    AGAGAAGGACCTCCATGTAATCGGGATTGATAGAGGCGAACGGAACTTGATTTACG
    TTAGTGTCATTGATACCTGTGGTAACATTGTCGAACAAAAGTCATTCAACATAGTCA
    ATGGATATGATTATCAGATAAAACTCAAGCAACAAGAAGGCGCGAGGCAGATTGCC
    AGGAAGGAATGGAAAGAAATCGGGAAGATCAAGGAGATCAAGGAGGGTTACCTGT
    CCTTGGTGATACACGAGATTTCAAAAATGGTTATAAAATACAATGCCATTATCGCGA
    TGGAGGATTTGTCTTATGGATTTAAGAAGGGGAGGTTCAAAGTCGAACGACAAGTC
    TATCAGAAGTTTGAAACAATGCTCATTAACAAGCTCAATTACCTTGTTTTCAAGGAT
    ATAAGCATCACTGAAAACGGCGGACTCCTTAAGGGATATCAGCTGACTTATATCCCC
    GACAAGCTCAAGAACGTAGGGCACCAATGCGGATGCATCTTTTACGTGCCTGCAGC
    ATATACTTCAAAAATTGATCCGACTACTGGCTTTGTTAACATTTTCAAGTTCAAGGAT
    CTGACGGTAGACGCTAAGAGAGAATTCATAAAAAAGTTTGACAGCATCAGGTACGA
    TAGTGAAAAGAACCTTTTTTGTTTTACCTTTGACTACAATAATTTTATTACGCAAAAT
    ACAGTTATGAGCAAATCAAGTTGGAGCGTTTACACATATGGCGTTCGGATCAAGCGC
    AGATTCGTCAATGGTCGCTTCTCAAATGAGAGCGATACAATCGATATAACGAAGGA
    TATGGAGAAGACGCTTGAGATGACAGATATCAACTGGCGGGACGGACATGACCTTA
    GACAAGACATAATCGATTACGAAATAGTACAGCATATCTTTGAGATTTTTAGGCTTA
    CAGTTCAGATGCGGAACTCTCTTTCCGAACTGGAGGACCGGGATTATGATCGGTTGA
    TCTCCCCAGTACTGAACGAAAATAATATCTTTTACGATAGCGCGAAGGCTGGTGATG
    CACTCCCAAAAGACGCTGATGCGAACGGAGCTTATTGCATAGCCCTTAAAGGGCTTT
    ACGAGATTAAACAAATAACAGAAAATTGGAAGGAAGATGGCAAATTTTCCCGCGAC
    AAGTTGAAGATTAGTAACAAAGACTGGTTCGACTTCATTCAGAATAAACGCTACCTC
  • Nucleic acid-guided nucleases can encompass a native sequence, an engineered sequence, or engineered nucleotide sequences of synthesized variants. Non-limiting examples of the types of engineering that can be done to obtain a non-naturally occurring nuclease system are as follows. Engineering can include codon optimization to facilitate or improve expression in a host cell, such as a heterologous host cell. Engineering can reduce the size or molecular weight of the nuclease in order to facilitate expression or delivery. Engineering can alter PAM selection in order to change PAM specificity or to broaden the range of recognized PAMs. Engineering can alter, increase, or decrease stability, processivity, specificity, or efficiency of a targetable nuclease system. Engineering can alter, increase, or decrease protein stability. Engineering can alter, increase, or decrease processivity of nucleic acid scanning. Engineering can alter, increase, or decrease target sequence specificity. Engineering can alter, increase, or decrease nuclease activity. Engineering can alter, increase, or decrease editing efficiency. Engineering can alter, increase, or decrease transformation efficiency. Engineering can alter, increase, or decrease nuclease or guide nucleic acid expression. As used herein, a non-naturally occurring nucleic acid sequence can be an engineered sequence or engineered nucleotide sequences of synthesized variants. Such non-naturally occurring nucleic acid sequences can be amplified, cloned, assembled, synthesized, generated from synthesized oligonucleotides or dNTPs, or otherwise obtained using methods known to those skilled in the art. In certain embodiments, examples of non-naturally occurring nucleic acid-guided nucleases disclosed herein can include those nucleic acid-guided nucleases with engineered polypeptide sequences (e.g., SEQ ID NOs: 2-4) and those nucleotide sequences of synthesized variants (e.g., SEQ ID NOs: 43-63).
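  • By way of non-limiting illustration only, codon optimization as described above changes the nucleotide sequence while leaving the encoded polypeptide unchanged. The following Python sketch is not part of the disclosed methods; it assumes the standard genetic code, and the helper names `translate` and `codon_differences` are hypothetical. It translates the first 39 nucleotides of SEQ ID NO: 38 and SEQ ID NO: 39 as listed above and compares the resulting peptides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(dna: str) -> list[str]:
    """Strip whitespace, upper-case, and split a coding sequence into codons."""
    cleaned = "".join(dna.split()).upper()
    if len(cleaned) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [cleaned[i:i + 3] for i in range(0, len(cleaned), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna))


def codon_differences(dna_a: str, dna_b: str) -> int:
    """Count codon positions at which two equal-length sequences differ."""
    codons_a, codons_b = split_codons(dna_a), split_codons(dna_b)
    if len(codons_a) != len(codons_b):
        raise ValueError("sequences must contain the same number of codons")
    return sum(a != b for a, b in zip(codons_a, codons_b))


# First 13 codons of SEQ ID NO: 38 and SEQ ID NO: 39, copied from the listing.
SEQ38_START = "ATGAACAATGGAACTAACAACTTCCAGAACTTTATCGGC"
SEQ39_START = "ATGAACAATGGCACTAACAATTTTCAGAATTTCATCGGC"

if __name__ == "__main__":
    print(translate(SEQ38_START))
    print(translate(SEQ39_START))
    print(codon_differences(SEQ38_START, SEQ39_START))
```

    In this sketch the two fragments differ at several codon positions yet translate to the same peptide, which also appears in SEQ ID NO: 2 immediately after the N-terminal tag and NLS residues. A full-length comparison would apply the same functions to the complete sequences.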
  • SEQ ID NO: 2
    MGHHHHHHSSGVDLGTENLYFQSPAAKKKKLDGSVDMNNGTNNFQNFIGISSLQKTLR
    NALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGFISETLSSIDDIDWTSLFEK
    MEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSA
    SEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRI
    VKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLY
    CQKNKENKNLYKLQKLHKQILCIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERL
    RKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKV
    KKAVKNDLQKSITEINELVSNYKLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVE
    SELKASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVR
    NYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKI
    IEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSS
    KDFDITFCHDLIDYFKNCIAIHPEWKNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYIS
    EKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEI
    FFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKEL
    SDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFKANKTGFINDRILQYIAK
    EKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEW
    KEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLI
    NKLNYLVFKDISITENGGLLKGYQLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFV
    NIFKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGV
    RIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTV
    QMRNSLSELEDRDYDRLISPVLNENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIK
    QITENWKEDGKFSRDKLKISNKDWFDFIQNKRYLKRPAATKKAGQAKKKKASGSGAGS
    PKKKRKVEDPKKKRKVIPG*
    SEQ ID NO: 3
    SPAAKKKKLDGSVDMNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELR
    GENRQILKDIMDDYYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRK
    AIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYF
    KNRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLKEM
    SLEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCI
    ADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYE
    SVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYK
    LCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCS
    VFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLAD
    GWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGP
    NKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPE
    WKNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDF
    SKKSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTY
    EAEEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIV
    KDYRYTYDKYFLHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVID
    TCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKM
    VIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGY
    QLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIR
    YDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDM
    EKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLN
    ENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKD
    WFDFIQNKRYLKRPAATKKAGQAKKKKASGSGAGSPKKKRKVEDPKKKRKVIPG*
    SEQ ID NO: 4
    PAAKKKKLDGSVDMNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRG
    ENRQILKDIMDDYYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAI
    HKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFK
    NRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMS
    LEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIA
    DTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYES
    VSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKL
    CSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSV
    FMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADG
    WSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPN
    KMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEW
    KNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSK
    KSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEA
    EEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKD
    YRYTYDKYFLHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTC
    GNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVI
    KYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQ
    LTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRY
    DSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDME
    KTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLNE
    NNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKD
    WFDFIQNKRYLKRPAATKKAGQAKKKKASGSGAGSPKKKRKVEDPKKKRKVIPG*
    SEQ ID NO: 109
    SMSRRRKANPTKLSENAKKLAKEVENASGSGAGSKRPAATKKAGQAKKKKASGSGAG
    SPAAKKKKLDGSVDASGSGAGSPKKKRKVEDASGSGAGSPKKKRKVASGSGAGSMNN
    GTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGF
    ISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSA
    KLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHR
    IVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGI
    SFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKFESDEEVYQ
    SVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIH
    YNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKLCSDDNIKAETYIHEISHILN
    NFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELE
    EIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNNAIILMRDNL
    YYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSKTGVETYKPS
    AYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEWKNFGFDFSDTSTYEDISGFY
    REVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLHTMYLKNLFS
    EENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKNIPENI
    YQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFK
    ANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQI
    KLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIAMEDLSYGFKKG
    RFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQLTYIPDKLKNVGHQCGCIF
    YVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQ
    NTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLR
    QDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLNENNIFYDSAKAGDALPKD
    ADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKDWFDFIQNKRYL
    SEQ ID NO: 110
    MSRRRKANPTKLSENAKKLAKEVENASGSGAGSKRPAATKKAGQAKKKKASGSGAGS
    PAAKKKKLDGSVDASGSGAGSPKKKRKVEDASGSGAGSPKKKRKVASGSGAGSMNNG
    TNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGFI
    SETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSAK
    LISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRI
    VNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLKEMSLEEIYSYEKYGEFITQEGI
    SFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVPYKFESDEEVYQ
    SVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIH
    YNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKLCSDDNIKAETYIHEISHILN
    NFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELE
    EIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLADGWSKSKEYSNNAIILMRDNL
    YYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPNKMIPKVFLSSKTGVETYKPS
    AYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEWKNFGFDFSDTSTYEDISGFY
    REVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLHTMYLKNLFS
    EENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKNIPENI
    YQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFK
    ANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVIDTCGNIVEQKSFNIVNGYDYQI
    KLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIAMEDLSYGFKKG
    RFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQLTYIPDKLKNVGHQCGCIF
    YVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQ
    NTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLR
    QDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLNENNIFYDSAKAGDALPKD
    ADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKDWFDFIQNKRYL
    SEQ ID NO: 111
    GHHHHHHSSGVDLGTENLYFQSMSRRRKANPTKLSENAKKLAKEVENASGSGAGSKRP
    AATKKAGQAKKKKASGSGAGSPAAKKKKLDGSVDASGSGAGSPKKKRKVEDASGSGA
    GSPKKKRKVASGSGAGSMNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKED
    ELRGENRQILKDIMDDYYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQTEY
    RKAIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSFKD
    YFKNRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDSLK
    EMSLEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQIL
    CIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFY
    ESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVSNY
    KLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFHW
    CSVFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPTLA
    DGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPG
    PNKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPE
    WKNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDF
    SKKSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVNRTY
    EAEEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAATNIV
    KDYRYTYDKYFLHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVSVID
    TCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKM
    VIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGY
    QLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIR
    YDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDITKDM
    EKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLISPVLN
    ENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKD
    WFDFIQNKRYL*
    SEQ ID NO: 112
    MGHHHHHHSSGVDLGTENLYFQSMSRRRKANPTKLSENAKKLAKEVENASGSGAGSK
    RPAATKKAGQAKKKKASGSGAGSPAAKKKKLDGSVDASGSGAGSPKKKRKVEDASGS
    GAGSPKKKRKVASGSGAGSMNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIK
    EDELRGENRQILKDIMDDYYRGFISETLSSIDDIDWTSLFEKMEIQLKNGDNKDTLIKEQT
    EYRKAIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEEKTQVIKLFSRFATSF
    KDYFKNRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKDS
    LKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHK
    QILCIADTSYEVPYKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVS
    KFYESVSQKTYRDWETINTALEIHYNNILPGNGKSKADKVKKAVKNDLQKSITEINELVS
    NYKLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLVESELKASELKNVLDVIMNAFH
    WCSVFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKIKLNFGIPT
    LADGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLL
    PGPNKMIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAI
    HPEWKNFGFDFSDTSTYEDISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYN
    KDFSKKSTGNDNLHTMYLKNLFSEENLKDIVLKLNGEAEIFFRKSSIKNPIIHKKGSILVN
    RTYEAEEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKELSDEAAKLKNVVGHHEAAT
    NIVKDYRYTYDKYFLHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYVS
    VIDTCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGHIKEIKEGYLSLVIHEI
    SKMVIKYNAIIAMEDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLL
    KGYQLTYIPDKLKNVGHQCGCIFYVPAAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKF
    DSIRYDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGVRIKRRFVNGRFSNESDTIDIT
    KDMEKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDRDYDRLIS
    PVLNENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKI
    SNKDWFDFIQNKRYL*
    SEQ ID NO: 43
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAACGGTACCAATAACTTCCAGAACTTCATCGGTATTTCTAGCCTGCAAAAGACCCT
    GCGTAACGCGCTGATTCCGACCGAGACTACCCAGCAATTCATCGTGAAAAACGGTA
    TCATTAAGGAAGATGAATTGCGCGGTGAGAATCGTCAGATTCTGAAAGATATCATG
    GATGACTACTATCGCGGTTTCATTAGCGAAACCCTGTCGAGCATCGATGATATCGAT
    TGGACGAGCCTCTTCGAGAAAATGGAAATTCAACTGAAAAATGGTGACAACAAAGA
    TACCCTGATTAAAGAACAAACGGAATACCGCAAGGCAATCCATAAAAAGTTTGCGA
    ATGACGACCGTTTTAAGAATATGTTCTCGGCCAAGCTGATTTCCGACATCCTGCCAG
    AGTTCGTCATTCACAACAACAATTACAGCGCAAGCGAGAAAGAGGAAAAGACTCAG
    GTCATTAAGCTGTTTAGCCGCTTTGCGACGTCCTTCAAAGACTACTTCAAGAATCGT
    GCGAATTGCTTTAGCGCGGATGACATCTCTAGCTCTAGCTGTCACCGTATTGTTAAC
    GACAATGCAGAGATTTTCTTCAGCAACGCCCTGGTGTATCGCCGTATTGTCAAGTCT
    CTGAGCAACGACGACATTAACAAGATCAGCGGCGACATGAAAGACAGCCTGAAAG
    AAATGTCTCTGGAAGAAATCTACAGCTACGAGAAATATGGTGAGTTTATCACCCAA
    GAGGGCATTAGCTTCTACAATGATATCTGTGGTAAGGTTAATAGCTTTATGAATCTG
    TACTGCCAGAAGAATAAAGAAAACAAGAACTTGTACAAGCTGCAAAAGCTGCATAA
    GCAAATTCTGTGCATCGCCGATACTAGCTATGAAGTTCCGTACAAGTTCGAGTCTGA
    TGAAGAGGTGTATCAGTCAGTCAACGGTTTTCTGGATAACATCAGCAGCAAGCACAT
    CGTCGAGCGCCTGCGCAAGATTGGTGACAACTACAATGGTTATAACCTGGACAAGA
    TCTATATCGTGTCGAAGTTTTACGAGAGCGTGTCCCAGAAAACGTACCGTGATTGGG
    AAACGATTAACACGGCCTTGGAAATTCACTATAACAATATCCTGCCGGGCAACGGC
    AAGAGCAAAGCTGACAAAGTCAAAAAAGCTGTGAAAAACGATCTGCAAAAGTCCAT
    CACCGAGATCAACGAACTGGTTAGCAACTATAAGCTGTGTAGCGACGACAACATTA
    AAGCTGAAACGTATATCCACGAAATCAGCCACATCCTGAATAACTTTGAGGCACAA
    GAACTGAAATACAATCCTGAGATCCATCTGGTAGAGAGCGAGCTGAAGGCAAGCGA
    GTTGAAAAACGTTCTCGACGTTATCATGAATGCTTTCCACTGGTGTAGCGTGTTTATG
    ACCGAAGAACTGGTTGACAAAGATAACAATTTCTATGCAGAGCTGGAAGAAATCTA
    TGATGAAATCTACCCGGTCATCAGCCTGTATAACCTGGTTCGTAACTACGTGACGCA
    GAAGCCGTACAGCACCAAAAAGATCAAGCTGAACTTCGGTATTCCGACCTTGGCGG
    ACGGTTGGAGCAAATCCAAAGAATACTCCAATAATGCGATTATTCTGATGCGTGATA
    ATCTGTACTATCTGGGTATCTTCAATGCGAAGAACAAGCCAGATAAAAAGATTATTG
    AAGGCAACACCAGCGAGAATAAAGGCGACTACAAGAAAATGATCTACAACTTATTG
    CCGGGTCCGAACAAGATGATCCCGAAAGTTTTTCTGAGCAGCAAGACCGGCGTTGA
    AACCTATAAGCCGAGCGCGTACATTTTAGAGGGCTATAAACAAAACAAGCACATCA
    AGAGCAGCAAAGATTTTGATATTACGTTCTGCCACGACCTGATCGACTATTTCAAGA
    ATTGTATTGCGATTCACCCTGAGTGGAAGAACTTCGGTTTTGACTTTTCCGATACCTC
    CACCTATGAAGATATTAGCGGTTTTTACCGTGAAGTCGAGTTGCAGGGTTATAAGAT
    TGATTGGACTTACATTTCCGAGAAAGACATCGACCTGTTGCAAGAGAAAGGTCAGCT
    GTACCTGTTTCAGATCTATAACAAAGATTTCAGCAAAAAGTCGACGGGCAATGATA
    ATCTGCACACCATGTATCTGAAAAACCTGTTTAGCGAAGAGAACCTGAAAGACATT
    GTTCTTAAGCTGAATGGTGAGGCCGAGATCTTCTTCCGTAAAAGCTCCATTAAGAAC
    CCGATTATCCACAAAAAGGGCTCTATTCTGGTTAACCGCACGTACGAAGCGGAAGA
    GAAAGATCAATTTGGTAACATCCAGATCGTGCGTAAGAATATCCCGGAGAACATTT
    ACCAAGAACTGTATAAGTATTTCAATGACAAGAGCGATAAAGAATTGAGCGATGAA
    GCGGCAAAGCTGAAAAACGTCGTTGGCCACCACGAAGCCGCGACGAATATCGTGAA
    AGATTATCGTTACACCTACGACAAGTACTTTCTGCACATGCCGATCACCATCAATTT
    CAAAGCGAATAAAACGGGTTTTATCAATGACCGTATCCTGCAGTACATTGCGAAAG
    AAAAAGATTTACACGTGATTGGTATTGATCGCGGCGAGCGCAATCTGATTTACGTCA
    GCGTTATCGACACGTGCGGCAATATTGTGGAGCAGAAAAGCTTCAATATCGTCAATG
    GTTACGACTACCAGATCAAACTGAAGCAACAAGAGGGCGCCCGCCAGATTGCGCGT
    AAAGAGTGGAAAGAAATCGGTAAGATTAAAGAAATCAAGGAAGGCTACCTGTCCCT
    GGTGATCCATGAAATCAGCAAAATGGTGATCAAGTACAACGCTATCATTGCGATGG
    AAGATCTGAGCTACGGTTTTAAAAAGGGTCGCTTCAAAGTTGAGCGTCAAGTGTATC
    AGAAATTTGAGACTATGCTGATTAACAAGTTGAACTATCTGGTTTTTAAAGACATCA
    GCATTACCGAGAATGGTGGCCTGCTGAAGGGTTATCAACTGACCTATATTCCTGACA
    AGTTGAAAAATGTTGGTCATCAGTGTGGTTGCATTTTCTACGTACCGGCAGCGTACA
    CGAGCAAGATTGACCCGACCACGGGTTTCGTTAACATTTTCAAGTTTAAAGATTTGA
    CCGTGGACGCCAAGCGTGAGTTCATTAAAAAGTTCGACAGCATCAGATACGACTCT
    GAGAAGAATCTGTTCTGCTTTACGTTCGACTACAATAACTTCATTACCCAAAATACC
    GTTATGAGCAAAAGCTCCTGGAGCGTGTACACGTACGGCGTCCGTATCAAGCGTCGT
    TTTGTGAATGGTCGCTTTTCCAACGAATCTGACACCATTGACATTACCAAAGATATG
    GAAAAGACCCTTGAGATGACCGACATTAATTGGCGTGATGGCCATGACTTGCGCCA
    AGACATTATCGACTACGAAATTGTTCAGCACATCTTTGAGATTTTTCGTCTGACGGTC
    CAGATGCGCAACTCGCTGAGCGAGTTGGAAGATCGTGACTATGACCGTCTGATTAGC
    CCGGTGCTGAATGAAAACAATATCTTCTATGATAGCGCAAAGGCCGGTGACGCGCT
    GCCGAAAGATGCGGATGCTAACGGTGCATACTGCATTGCACTGAAGGGTCTGTACG
    AAATCAAACAGATCACCGAGAATTGGAAAGAGGATGGTAAGTTTAGCCGTGATAAG
    CTGAAGATTAGCAATAAAGACTGGTTCGACTTTATTCAAAACAAGCGCTATCTGAAA
    CGTCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTA
    GCGGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACG
    TAAGGTTATTCCGGGCTAA
    SEQ ID NO: 44
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGAACAAATAATTTTCAGAACTTTATTGGGATCAGTTCGCTTCAGAAAACGCT
    TCGTAATGCTCTGATTCCCACAGAAACCACTCAGCAGTTTATCGTAAAGAATGGCAT
    TATCAAGGAGGATGAATTACGCGGCGAGAACCGCCAAATCTTAAAAGATATCATGG
    ACGACTACTACCGCGGTTTCATTAGCGAAACTCTTAGTTCAATTGACGACATTGACT
    GGACGTCCTTGTTCGAAAAGATGGAGATTCAATTAAAGAACGGTGATAACAAGGAT
    ACGTTGATTAAAGAACAGACGGAGTACCGTAAGGCTATCCACAAAAAATTTGCAAA
    CGACGACCGCTTTAAAAATATGTTTAGCGCAAAATTAATCTCCGACATCCTGCCTGA
    ATTCGTCATCCATAACAATAACTATAGCGCCTCGGAAAAAGAAGAAAAAACGCAGG
    TTATTAAACTTTTCTCGCGCTTTGCAACAAGCTTTAAGGATTACTTCAAAAATCGCGC
    CAATTGTTTTTCAGCCGACGACATTAGCTCCAGTTCCTGCCACCGTATTGTGAATGAC
    AACGCTGAGATTTTTTTTTCCAATGCGCTGGTTTATCGTCGTATTGTTAAGAGCCTTA
    GTAACGACGACATTAATAAAATTAGCGGTGATATGAAGGATAGCTTGAAAGAAATG
    AGTCTGGAAGAGATCTATAGTTACGAGAAGTACGGCGAATTTATTACCCAGGAGGG
    CATTTCATTTTACAATGATATCTGTGGAAAAGTCAACTCCTTTATGAACTTGTATTGC
    CAAAAGAATAAAGAAAACAAAAACCTGTACAAACTGCAAAAGTTACACAAGCAGA
    TTTTGTGTATCGCAGACACGTCATACGAAGTACCGTACAAGTTTGAGTCCGATGAAG
    AAGTGTACCAAAGCGTTAATGGCTTTTTGGATAACATTTCGAGCAAACATATCGTAG
    AGCGTTTGCGTAAGATTGGTGATAATTACAACGGTTACAATTTAGACAAAATCTATA
    TCGTCTCTAAGTTTTACGAAAGTGTTTCTCAGAAAACTTACCGCGATTGGGAGACGA
    TCAACACTGCGCTGGAGATTCATTACAATAATATCCTTCCAGGTAACGGTAAAAGCA
    AAGCTGATAAGGTGAAAAAGGCGGTTAAAAATGACCTTCAAAAGTCTATCACAGAA
    ATCAACGAATTGGTCAGCAATTATAAGCTTTGCAGTGACGATAACATTAAGGCCGA
    GACTTACATCCATGAGATCTCTCACATTCTTAATAATTTTGAAGCGCAAGAGCTGAA
    ATACAATCCTGAAATCCATCTGGTCGAAAGTGAATTAAAAGCCTCCGAATTAAAAA
    ATGTCTTGGACGTGATCATGAATGCGTTCCATTGGTGCTCAGTTTTTATGACGGAAG
    AGTTGGTGGACAAAGACAACAATTTTTACGCCGAGCTTGAGGAAATTTACGACGAA
    ATTTACCCCGTTATTTCGTTATACAACCTTGTGCGTAATTACGTTACACAAAAGCCCT
    ATTCGACAAAGAAAATCAAGTTAAATTTCGGGATTCCCACATTAGCTGATGGATGGT
    CCAAATCCAAAGAATACTCGAATAACGCTATCATCCTTATGCGTGATAATTTGTACT
    ACTTAGGCATCTTCAATGCGAAGAACAAACCTGACAAGAAAATTATCGAAGGAAAC
    ACTTCGGAGAACAAAGGTGATTATAAAAAGATGATCTACAACTTGCTTCCCGGGCC
    AAACAAAATGATTCCCAAGGTATTTTTGAGTTCTAAAACCGGTGTCGAAACTTACAA
    ACCAAGTGCTTATATTTTGGAAGGATACAAACAGAACAAACATATCAAGTCTTCGA
    AAGACTTCGATATTACGTTCTGCCACGATCTGATCGATTACTTCAAGAACTGTATTG
    CTATTCACCCCGAGTGGAAGAACTTTGGATTTGATTTCTCCGACACGTCCACTTATG
    AAGATATCTCTGGCTTCTATCGCGAGGTTGAATTACAAGGGTATAAGATTGACTGGA
    CTTATATTTCGGAGAAGGATATCGATCTTTTGCAAGAAAAAGGGCAACTTTATTTAT
    TTCAGATCTATAACAAGGACTTTTCAAAAAAGAGCACTGGAAATGACAATCTGCAT
    ACCATGTACCTTAAGAACCTGTTCTCGGAAGAGAACCTGAAGGACATTGTACTTAAA
    CTGAATGGAGAGGCAGAGATCTTCTTTCGCAAATCAAGCATTAAGAACCCAATTATT
    CACAAAAAGGGGAGTATCTTAGTAAATCGCACATATGAGGCTGAGGAAAAAGATCA
    GTTTGGTAACATTCAGATCGTGCGTAAGAACATTCCTGAAAATATCTATCAGGAACT
    TTATAAGTATTTCAACGATAAAAGTGATAAAGAGCTGAGTGACGAAGCGGCTAAAC
    TTAAGAATGTTGTGGGACACCATGAGGCAGCAACCAATATTGTGAAGGATTATCGCT
    ATACGTACGACAAATACTTTTTACACATGCCCATCACTATTAATTTTAAAGCTAATA
    AGACTGGCTTCATTAACGATCGCATCCTGCAGTACATTGCTAAGGAAAAGGATCTTC
    ACGTTATCGGTATCGATCGCGGGGAGCGTAATCTTATCTACGTCTCTGTCATTGACA
    CGTGTGGCAATATTGTGGAGCAAAAGTCCTTCAATATTGTTAACGGCTATGACTATC
    AGATTAAATTGAAACAGCAGGAAGGTGCGCGTCAGATTGCCCGCAAGGAATGGAAG
    GAAATTGGCAAGATCAAAGAAATTAAGGAGGGCTACTTAAGCTTAGTAATTCACGA
    AATTAGTAAAATGGTTATCAAATACAACGCCATCATCGCGATGGAGGATCTTTCGTA
    CGGGTTTAAGAAAGGTCGTTTTAAAGTGGAGCGTCAGGTGTACCAGAAATTTGAAA
    CTATGCTTATTAACAAACTTAACTACCTGGTTTTCAAGGATATCAGTATTACTGAAA
    ACGGGGGGCTGTTAAAAGGGTATCAATTAACTTACATTCCAGACAAATTAAAGAAC
    GTTGGACATCAGTGTGGCTGCATTTTTTATGTACCAGCTGCATACACTTCAAAGATC
    GATCCTACGACTGGGTTCGTGAACATTTTTAAGTTTAAAGACTTGACGGTAGATGCC
    AAGCGCGAATTCATCAAGAAATTCGACAGCATTCGCTACGACTCTGAGAAAAATCTT
    TTCTGTTTCACATTCGATTATAACAATTTCATTACGCAGAACACAGTAATGTCCAAGT
    CTTCTTGGAGTGTTTATACATATGGTGTCCGCATTAAGCGCCGTTTCGTCAACGGCCG
    CTTCAGTAATGAGAGCGATACTATTGACATCACAAAAGACATGGAAAAAACACTGG
    AAATGACCGACATCAATTGGCGTGACGGCCATGACTTACGTCAGGATATCATTGATT
    ATGAGATCGTTCAACACATCTTCGAAATCTTTCGCTTGACTGTTCAAATGCGCAATTC
    CTTGTCGGAATTGGAGGACCGTGATTATGACCGCTTAATTTCCCCCGTCTTAAATGA
    AAACAATATTTTTTATGACTCTGCAAAAGCTGGAGATGCTCTGCCGAAAGACGCCGA
    TGCAAATGGGGCATATTGCATTGCTTTAAAGGGGCTTTACGAGATCAAGCAAATCAC
    CGAAAACTGGAAAGAGGATGGAAAGTTTTCGCGTGATAAACTGAAGATCTCTAACA
    AAGACTGGTTCGACTTTATCCAGAACAAGCGTTATTTGAAACGTCCGGCAGCGACCA
    AAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCC
    GAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCT
    AA
    SEQ ID NO: 45
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGCACCAATAACTTCCAAAACTTCATCGGGATCTCTAGCCTTCAGAAGACGCT
    TCGCAATGCTCTTATCCCAACTGAGACCACTCAACAATTTATTGTGAAGAATGGAAT
    TATTAAAGAGGACGAACTGCGTGGCGAGAATCGTCAGATCTTAAAGGACATTATGG
    ATGATTATTACCGTGGATTCATCTCCGAAACATTATCGTCGATCGATGATATCGATT
    GGACTTCTCTGTTCGAGAAAATGGAAATTCAATTGAAAAACGGAGATAATAAAGAT
    ACGCTTATCAAAGAACAGACGGAATATCGTAAAGCGATTCATAAGAAATTCGCAAA
    TGACGATCGTTTCAAAAATATGTTCAGTGCCAAGCTTATTTCGGACATTTTACCTGA
    ATTTGTAATTCATAATAATAACTACTCAGCAAGTGAGAAGGAGGAGAAAACCCAAG
    TTATTAAACTGTTCTCTCGTTTCGCAACGTCCTTTAAAGATTACTTTAAAAACCGCGC
    GAATTGCTTTAGCGCTGACGACATTTCCAGCTCATCCTGTCATCGCATCGTAAACGA
    CAATGCGGAAATCTTCTTCAGCAACGCCCTGGTTTACCGCCGCATCGTCAAAAGCTT
    ATCGAATGACGACATCAATAAGATCTCAGGAGATATGAAGGACTCGCTTAAGGAGA
    TGTCTCTGGAGGAAATTTATAGTTACGAAAAGTATGGAGAGTTCATTACCCAGGAGG
    GAATCTCGTTCTACAATGACATTTGCGGGAAGGTGAACTCCTTCATGAACTTATACT
    GCCAGAAAAACAAAGAGAACAAAAATCTGTATAAATTGCAGAAATTACATAAACAG
    ATTCTTTGTATTGCTGACACTTCCTACGAAGTACCCTATAAATTCGAGTCAGATGAA
    GAAGTATACCAGTCCGTGAACGGATTTCTGGACAATATCTCCTCAAAACACATCGTG
    GAACGCTTACGTAAAATTGGCGATAATTATAATGGTTACAATCTTGACAAAATTTAT
    ATCGTATCTAAATTTTACGAGAGTGTGAGCCAAAAGACCTACCGCGACTGGGAGAC
    CATCAACACAGCTTTAGAAATTCACTATAATAATATCTTACCCGGCAATGGTAAGAG
    CAAGGCTGACAAGGTAAAAAAGGCCGTCAAGAATGATTTGCAGAAATCTATTACAG
    AAATTAATGAGTTAGTCTCCAACTATAAGCTTTGTTCCGACGATAACATCAAAGCTG
    AGACATATATTCATGAGATTAGTCACATTCTTAACAACTTCGAGGCCCAGGAACTTA
    AGTACAATCCTGAAATTCATCTTGTCGAGTCTGAGCTGAAAGCTAGTGAATTGAAAA
    ATGTTTTAGACGTTATTATGAACGCATTCCACTGGTGCTCTGTGTTTATGACAGAAG
    AACTGGTCGACAAGGACAATAACTTCTATGCCGAACTTGAGGAAATCTACGATGAA
    ATTTACCCTGTAATCTCCTTGTATAATCTTGTACGTAATTACGTCACTCAAAAACCTT
    ACAGCACGAAAAAAATTAAATTGAACTTCGGGATTCCTACACTTGCCGACGGGTGG
    TCTAAATCCAAGGAATATAGCAACAATGCCATTATTTTAATGCGCGACAATCTTTAC
    TATTTAGGAATTTTTAACGCTAAGAACAAGCCCGATAAAAAGATTATTGAAGGAAA
    CACGTCTGAAAATAAGGGCGACTACAAAAAGATGATTTATAACCTTTTGCCCGGTCC
    AAACAAAATGATCCCAAAGGTATTCCTGTCATCCAAAACAGGGGTTGAGACATATA
    AGCCCAGCGCATATATTCTGGAAGGATACAAACAGAATAAACATATCAAAAGCAGC
    AAAGATTTTGACATTACTTTTTGCCACGATTTAATCGACTACTTCAAAAACTGTATCG
    CTATCCACCCTGAATGGAAGAATTTCGGATTTGATTTCTCAGATACAAGTACGTATG
    AGGATATCAGCGGTTTCTATCGCGAAGTTGAACTTCAAGGGTATAAAATTGACTGGA
    CCTACATTAGTGAGAAGGACATCGACCTGTTACAGGAAAAAGGCCAATTGTACTTGT
    TTCAGATCTACAATAAGGATTTCTCAAAAAAATCGACCGGCAATGATAACTTGCACA
    CCATGTACCTGAAGAACCTTTTTTCGGAGGAAAACCTTAAAGACATTGTCCTGAAGT
    TGAATGGAGAAGCGGAGATTTTCTTTCGTAAGTCTTCCATTAAAAATCCAATTATTC
    ATAAGAAGGGCAGCATCCTTGTGAACCGTACGTACGAGGCGGAAGAGAAGGACCA
    ATTCGGTAACATTCAAATCGTCCGCAAGAACATCCCTGAAAATATTTATCAGGAGCT
    TTACAAGTATTTCAATGATAAGTCCGACAAGGAATTATCAGATGAGGCTGCGAAGTT
    GAAAAATGTTGTTGGTCATCACGAGGCGGCGACGAATATTGTAAAGGATTATCGCT
    ACACTTATGACAAGTACTTTCTGCACATGCCGATCACCATTAATTTCAAGGCGAACA
    AAACAGGATTTATTAATGACCGCATCTTACAATACATTGCCAAAGAAAAGGACTTAC
    ACGTTATTGGCATTGATCGTGGAGAACGCAACTTAATCTACGTAAGCGTTATTGACA
    CTTGCGGGAATATCGTAGAACAAAAGAGCTTCAACATCGTGAATGGTTACGATTACC
    AGATCAAGCTTAAGCAGCAGGAGGGAGCGCGCCAGATCGCGCGCAAGGAATGGAA
    GGAGATTGGTAAGATCAAGGAAATCAAGGAAGGTTATCTGTCCTTGGTAATCCACG
    AAATTTCGAAAATGGTTATCAAATACAATGCTATTATTGCAATGGAGGACTTGTCCT
    ACGGCTTTAAAAAAGGACGCTTTAAGGTGGAGCGCCAGGTTTATCAAAAGTTTGAA
    ACAATGCTGATTAACAAGCTGAACTATTTGGTCTTTAAAGATATCTCCATCACCGAA
    AATGGTGGGCTTTTGAAAGGCTATCAACTTACATATATCCCTGATAAGCTTAAGAAT
    GTGGGTCATCAGTGCGGGTGCATTTTTTATGTTCCTGCAGCCTACACGTCCAAAATC
    GATCCTACAACTGGATTTGTTAATATCTTCAAATTTAAGGATCTTACCGTCGACGCG
    AAGCGCGAATTTATCAAGAAATTCGATAGTATTCGTTATGATTCCGAAAAAAACCTT
    TTCTGTTTCACCTTTGATTATAATAACTTTATCACGCAAAATACTGTCATGAGCAAAT
    CGAGTTGGTCTGTGTACACTTACGGAGTACGCATCAAGCGTCGTTTTGTTAATGGGC
    GCTTCAGTAACGAGTCAGACACGATTGATATCACAAAAGATATGGAGAAAACGCTG
    GAGATGACAGACATCAATTGGCGCGATGGTCATGACTTACGTCAAGACATTATCGAT
    TATGAAATTGTCCAGCATATCTTTGAGATCTTTCGTTTGACTGTTCAGATGCGCAACA
    GCCTGTCAGAATTGGAGGATCGTGACTATGATCGCCTTATTTCTCCCGTCTTAAATG
    AGAACAATATCTTCTACGACTCAGCCAAGGCTGGAGATGCACTGCCAAAAGACGCC
    GACGCAAATGGGGCCTACTGTATTGCATTGAAGGGGTTGTACGAGATCAAACAGAT
    TACAGAAAATTGGAAGGAGGACGGTAAGTTCTCTCGTGATAAGCTGAAGATTTCTA
    ACAAAGACTGGTTCGATTTCATTCAGAACAAACGTTACCTGAAACGTCCGGCAGCG
    ACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCA
    GCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCG
    GGCTAA
    SEQ ID NO: 46
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGTACCAATAACTTTCAGAATTTCATTGGAATCAGCAGCTTACAGAAAACCCT
    GCGCAATGCACTTATCCCCACTGAGACAACCCAGCAGTTCATTGTAAAGAACGGGA
    TTATTAAAGAAGATGAGCTTCGCGGGGAGAATCGTCAGATCTTAAAGGATATTATG
    GACGATTACTACCGTGGCTTCATTTCGGAGACGCTGTCGTCGATCGACGACATCGAC
    TGGACATCCTTGTTTGAAAAGATGGAAATCCAACTGAAGAATGGCGATAACAAGGA
    CACGTTAATCAAAGAGCAGACGGAATACCGTAAAGCTATCCACAAAAAGTTCGCTA
    ATGACGACCGCTTTAAGAACATGTTCTCAGCAAAACTTATTAGCGATATTTTACCTG
    AATTTGTCATCCACAATAACAATTACTCCGCGAGTGAAAAAGAGGAGAAAACCCAG
    GTGATTAAGCTGTTTTCCCGTTTTGCAACCAGTTTCAAGGACTATTTTAAGAATCGTG
    CTAATTGTTTCTCTGCAGACGACATTTCCTCGTCGTCCTGCCATCGCATTGTTAATGA
    TAATGCTGAAATCTTTTTTTCAAACGCACTTGTGTATCGTCGCATTGTCAAAAGCTTA
    AGTAATGACGATATCAATAAGATCTCAGGAGACATGAAGGACTCCCTGAAAGAAAT
    GTCATTGGAAGAAATTTACTCTTATGAAAAGTATGGAGAATTTATTACGCAGGAGGG
    TATCAGCTTCTATAACGACATTTGTGGTAAAGTGAACAGCTTTATGAATCTTTATTGT
    CAAAAGAATAAAGAGAACAAAAATCTGTACAAGCTGCAGAAATTGCATAAACAAAT
    TCTGTGCATTGCAGATACTTCGTATGAGGTTCCTTACAAATTCGAGTCGGATGAGGA
    GGTGTATCAAAGCGTAAACGGATTTTTGGATAACATTAGTAGTAAGCATATTGTGGA
    ACGCCTTCGCAAGATTGGTGACAACTATAACGGATACAACTTAGACAAGATCTATAT
    TGTCTCGAAGTTTTACGAAAGTGTTTCCCAAAAGACTTATCGCGACTGGGAGACAAT
    CAACACTGCGCTGGAAATTCACTATAACAATATCTTGCCGGGGAACGGAAAAAGTA
    AGGCAGATAAGGTGAAGAAAGCAGTCAAAAATGATCTGCAAAAAAGCATTACTGA
    AATTAACGAACTTGTGTCAAATTACAAATTGTGTTCGGATGACAATATTAAAGCGGA
    AACGTATATCCACGAGATCTCGCACATTCTTAATAATTTCGAGGCGCAGGAATTAAA
    GTATAATCCTGAGATCCATTTGGTGGAATCAGAACTTAAAGCTAGTGAACTGAAAA
    ATGTCCTGGACGTTATTATGAATGCATTTCACTGGTGTTCTGTCTTTATGACAGAAGA
    ACTTGTCGACAAAGACAACAACTTTTATGCGGAATTAGAAGAGATTTACGACGAAA
    TTTATCCCGTTATTTCGTTATATAATTTAGTTCGTAATTACGTGACTCAGAAACCCTA
    CAGCACAAAAAAGATTAAATTAAACTTTGGGATTCCGACTCTTGCTGATGGATGGAG
    CAAGTCCAAGGAGTACTCTAATAACGCCATTATCTTGATGCGTGACAACCTGTACTA
    CCTGGGCATTTTTAACGCTAAAAACAAACCCGACAAAAAGATCATTGAAGGGAACA
    CCTCGGAAAATAAGGGGGACTATAAAAAAATGATCTACAATCTGTTGCCAGGCCCA
    AATAAGATGATCCCAAAGGTTTTTTTATCTTCCAAAACTGGCGTAGAAACTTACAAG
    CCGAGCGCATACATCCTTGAAGGATATAAACAAAACAAACATATCAAAAGTTCAAA
    GGACTTCGATATTACGTTCTGCCATGATTTAATCGATTATTTCAAGAATTGCATCGCG
    ATTCACCCAGAGTGGAAAAACTTTGGGTTTGATTTTTCAGACACCAGCACTTACGAG
    GATATTAGTGGATTCTATCGTGAGGTTGAACTGCAGGGCTATAAAATTGACTGGACC
    TATATTTCTGAAAAAGATATTGATCTGCTTCAGGAGAAAGGCCAATTGTACTTATTT
    CAAATCTATAACAAGGATTTCTCCAAGAAGTCCACGGGTAATGACAACTTACACAC
    AATGTATCTGAAGAATCTGTTTAGTGAGGAGAACTTGAAGGACATTGTGCTGAAGCT
    TAATGGCGAGGCCGAAATCTTTTTTCGTAAGTCCTCCATTAAAAACCCTATTATCCAT
    AAGAAAGGGAGTATTCTTGTCAACCGCACGTATGAGGCCGAAGAAAAGGACCAATT
    CGGAAACATCCAAATTGTCCGTAAAAATATTCCTGAGAACATTTACCAGGAGCTTTA
    CAAGTATTTCAACGACAAGAGTGATAAAGAACTTTCAGATGAGGCGGCGAAACTGA
    AGAATGTAGTGGGGCACCACGAAGCTGCCACGAATATTGTAAAGGATTACCGTTAC
    ACCTACGACAAGTACTTTTTGCATATGCCCATCACAATTAATTTTAAGGCCAATAAA
    ACTGGTTTTATCAACGATCGTATCTTACAGTACATTGCTAAGGAAAAAGATCTGCAC
    GTTATCGGTATCGATCGCGGGGAACGCAATCTGATTTATGTTAGTGTGATTGACACG
    TGCGGAAATATTGTTGAGCAGAAGAGCTTTAATATCGTAAATGGATATGACTATCAA
    ATTAAACTGAAGCAACAGGAAGGGGCCCGCCAGATTGCCCGCAAGGAGTGGAAAG
    AAATTGGAAAGATCAAGGAGATTAAAGAAGGGTACCTTTCCCTTGTTATCCACGAA
    ATCTCGAAAATGGTGATCAAGTACAATGCCATTATTGCTATGGAGGATCTGTCATAT
    GGGTTTAAGAAAGGCCGCTTTAAGGTGGAACGTCAGGTTTACCAGAAGTTTGAGAC
    CATGCTTATCAATAAGCTGAATTATCTTGTCTTCAAAGACATCTCAATCACAGAGAA
    CGGCGGGCTGTTAAAAGGATATCAGCTGACCTATATCCCCGACAAACTGAAAAATG
    TCGGGCACCAATGCGGCTGTATTTTCTACGTGCCCGCTGCATACACATCTAAAATTG
    ACCCAACGACTGGATTCGTAAATATTTTTAAGTTTAAGGATCTTACGGTAGATGCAA
    AGCGCGAATTTATCAAGAAATTTGATAGTATCCGTTACGACAGCGAGAAAAACTTAT
    TTTGTTTTACGTTCGATTATAACAACTTCATCACGCAAAATACCGTCATGTCAAAATC
    TTCCTGGTCAGTCTATACGTATGGCGTCCGTATCAAGCGCCGCTTCGTCAACGGGCG
    TTTTTCAAACGAGTCAGATACCATCGATATCACCAAAGATATGGAAAAAACATTGG
    AGATGACGGACATCAATTGGCGCGATGGTCATGACTTACGCCAGGACATTATTGACT
    ACGAAATCGTACAACATATTTTTGAGATTTTCCGTCTGACCGTGCAAATGCGCAACT
    CATTATCCGAACTTGAGGATCGTGATTACGACCGCTTGATCAGTCCTGTTCTGAACG
    AGAATAATATTTTTTACGACAGTGCCAAGGCGGGAGACGCACTGCCCAAGGACGCT
    GACGCTAACGGAGCTTATTGTATTGCGTTGAAGGGACTTTACGAAATCAAGCAAATC
    ACTGAAAACTGGAAGGAGGATGGTAAATTCTCACGCGACAAGTTGAAAATTTCGAA
    CAAGGACTGGTTCGATTTCATCCAAAACAAGCGTTATTTAAAACGTCCGGCAGCGAC
    CAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGC
    CCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 47
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGGACTAATAACTTCCAGAACTTCATCGGTATTTCATCATTACAAAAAACGCT
    TCGTAACGCCTTGATCCCAACAGAAACGACCCAACAATTTATTGTAAAAAACGGCAT
    CATCAAAGAAGACGAACTGCGTGGCGAAAATCGCCAAATTTTGAAGGACATTATGG
    ATGACTATTATCGTGGGTTTATCTCGGAGACATTATCCTCCATCGACGACATTGATTG
    GACGAGTCTTTTTGAGAAAATGGAGATCCAGCTTAAAAATGGTGATAACAAGGATA
    CATTGATCAAGGAGCAAACCGAGTACCGCAAGGCCATCCATAAGAAGTTCGCAAAT
    GACGACCGCTTCAAAAATATGTTTAGTGCCAAATTGATCTCGGATATCCTTCCTGAG
    TTCGTAATTCACAACAATAATTATAGCGCATCCGAAAAGGAGGAAAAGACTCAAGT
    CATTAAGCTTTTCAGTCGCTTTGCTACCTCGTTTAAGGACTATTTCAAGAACCGCGCG
    AACTGCTTCTCAGCGGATGACATTTCTTCCTCGTCGTGTCACCGCATCGTGAATGATA
    ATGCGGAGATCTTCTTTAGTAATGCCTTGGTATACCGCCGCATTGTTAAATCCCTGTC
    TAACGACGATATCAATAAGATCTCAGGAGATATGAAGGATAGCCTTAAAGAAATGT
    CTCTGGAAGAAATTTACTCCTATGAAAAGTACGGTGAGTTTATCACCCAAGAGGGG
    ATTAGCTTTTATAACGATATCTGCGGGAAGGTGAATTCGTTTATGAACCTTTATTGTC
    AAAAGAATAAGGAGAATAAGAACTTATATAAGCTTCAGAAACTGCATAAACAAATC
    TTATGCATTGCCGATACTAGCTATGAAGTTCCGTATAAATTCGAGAGCGATGAAGAA
    GTTTATCAGAGCGTCAATGGGTTCTTGGATAACATTTCATCAAAACACATCGTGGAA
    CGTCTGCGTAAGATTGGGGATAACTACAACGGATATAATCTTGACAAAATTTATATT
    GTATCTAAATTCTATGAGTCGGTGAGTCAAAAGACCTACCGTGATTGGGAAACAATC
    AATACCGCGTTAGAAATCCACTATAACAACATTCTGCCAGGGAATGGTAAAAGTAA
    AGCGGACAAAGTCAAGAAGGCTGTGAAGAACGATCTGCAAAAGAGTATTACAGAG
    ATTAACGAATTAGTCTCCAATTATAAGTTATGCTCGGACGATAACATTAAGGCGGAG
    ACGTATATTCATGAGATTTCGCATATTCTTAACAACTTCGAGGCACAAGAGCTTAAG
    TATAACCCAGAGATTCACCTTGTCGAATCGGAGCTGAAGGCATCGGAATTAAAAAA
    TGTCTTAGATGTAATCATGAACGCGTTCCATTGGTGCAGTGTTTTCATGACTGAGGA
    GTTAGTTGACAAGGACAATAACTTCTACGCAGAATTAGAAGAGATCTATGATGAGA
    TTTATCCAGTGATTTCGCTGTATAATCTGGTACGTAATTACGTCACTCAAAAGCCCTA
    CTCAACAAAAAAAATTAAGCTGAACTTCGGAATTCCGACTCTGGCCGACGGGTGGT
    CCAAGTCAAAGGAGTATTCTAATAATGCTATCATCCTGATGCGCGATAACTTATACT
    ATTTGGGAATTTTCAATGCCAAAAATAAACCAGATAAAAAGATTATCGAAGGTAAT
    ACAAGCGAGAATAAGGGTGACTATAAGAAAATGATTTACAATCTTCTTCCAGGCCCT
    AACAAGATGATTCCCAAAGTTTTTTTGTCCAGTAAAACAGGGGTCGAAACTTACAAG
    CCCAGTGCCTATATCCTTGAAGGGTACAAGCAGAATAAGCACATCAAATCCTCGAA
    AGACTTTGATATTACATTTTGTCATGACTTAATCGATTATTTTAAGAACTGTATCGCA
    ATCCATCCAGAATGGAAGAACTTCGGGTTTGATTTCTCTGATACTTCCACGTATGAG
    GATATTTCCGGGTTCTACCGCGAAGTAGAGCTTCAGGGCTATAAAATTGACTGGACA
    TATATTTCAGAAAAAGACATCGATCTGTTACAAGAAAAAGGACAGTTGTATCTGTTT
    CAAATCTATAATAAGGATTTCTCCAAAAAGTCAACTGGAAATGATAACTTACATACA
    ATGTATCTGAAAAATCTTTTTAGTGAAGAGAATTTGAAGGATATCGTGCTGAAGTTA
    AATGGCGAAGCAGAGATCTTCTTCCGCAAGTCCTCGATCAAGAATCCTATCATCCAC
    AAGAAAGGTAGTATTCTGGTTAACCGCACGTACGAGGCCGAGGAAAAAGACCAGTT
    CGGTAATATCCAGATTGTACGTAAGAATATTCCTGAAAATATTTACCAGGAATTATA
    CAAGTATTTTAACGACAAATCGGATAAGGAGCTTTCAGATGAGGCCGCAAAGTTGA
    AGAACGTCGTAGGACACCATGAGGCCGCTACGAATATCGTCAAGGACTACCGCTAT
    ACGTATGACAAGTACTTCCTGCACATGCCTATTACTATCAATTTCAAAGCTAATAAA
    ACAGGATTCATCAATGATCGTATCCTTCAGTACATTGCCAAAGAAAAAGATCTGCAC
    GTAATCGGAATCGACCGTGGCGAACGTAATCTGATTTACGTATCAGTTATCGACACA
    TGTGGTAACATCGTGGAGCAGAAATCTTTTAACATTGTTAACGGCTATGATTATCAG
    ATTAAGCTTAAACAGCAGGAGGGGGCACGCCAAATCGCTCGTAAAGAATGGAAGGA
    GATTGGAAAGATTAAAGAGATTAAAGAGGGGTACCTTTCGCTGGTTATTCACGAAA
    TTTCCAAGATGGTGATTAAGTACAATGCAATCATCGCGATGGAAGATCTTAGTTACG
    GATTCAAAAAGGGACGCTTCAAAGTTGAGCGTCAGGTCTACCAGAAATTTGAAACG
    ATGCTGATTAACAAATTGAATTACTTGGTATTCAAAGATATCTCAATTACTGAAAAT
    GGTGGCTTATTAAAGGGTTACCAGCTTACCTATATCCCGGATAAGCTGAAGAACGTG
    GGCCATCAATGCGGCTGCATCTTTTACGTCCCTGCCGCATATACCTCTAAAATTGAC
    CCCACCACCGGATTCGTAAATATTTTTAAATTCAAGGACCTGACGGTGGACGCCAAG
    CGCGAATTCATCAAAAAATTCGACTCAATCCGCTATGATTCCGAAAAAAATCTTTTC
    TGCTTTACGTTCGATTATAATAACTTCATTACCCAAAACACGGTGATGTCAAAATCG
    TCCTGGAGCGTGTATACTTATGGAGTGCGTATCAAGCGCCGCTTTGTTAATGGGCGC
    TTCAGTAACGAAAGCGATACCATCGACATTACCAAAGACATGGAGAAGACGCTTGA
    AATGACGGATATCAATTGGCGTGACGGACACGATCTTCGTCAGGATATCATCGACTA
    CGAGATTGTGCAACATATCTTTGAGATTTTCCGTTTAACTGTTCAAATGCGTAACTCC
    TTGTCCGAATTGGAAGACCGTGATTACGACCGCTTGATTTCACCAGTGCTTAACGAG
    AATAACATCTTCTACGACTCCGCCAAAGCAGGCGATGCCCTGCCAAAGGACGCTGA
    TGCAAATGGTGCATACTGTATCGCGTTGAAGGGCTTATACGAGATTAAGCAAATCAC
    CGAAAATTGGAAAGAGGATGGAAAGTTCAGTCGCGATAAGCTGAAGATCTCTAATA
    AAGATTGGTTTGACTTTATCCAGAACAAACGTTATTTAAAACGTCCGGCAGCGACCA
    AAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCC
    GAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCT
    AA
    SEQ ID NO: 48
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGTACCAATAATTTCCAAAATTTCATCGGAATCTCATCCTTGCAAAAAACCTT
    GCGCAATGCTTTGATCCCCACCGAAACCACGCAGCAGTTCATCGTGAAAAACGGCA
    TTATCAAAGAGGATGAGTTGCGCGGGGAAAACCGTCAAATTCTTAAGGATATCATG
    GACGATTACTACCGTGGGTTTATCAGTGAGACCCTGTCAAGCATTGACGACATTGAC
    TGGACCAGCTTATTTGAGAAGATGGAGATTCAATTAAAGAACGGGGACAATAAGGA
    CACGCTTATCAAAGAGCAGACAGAATACCGTAAAGCGATTCATAAGAAATTTGCAA
    ATGACGATCGCTTCAAGAACATGTTTTCAGCAAAATTAATCAGCGACATCCTTCCCG
    AATTTGTGATTCATAATAACAACTATTCGGCTAGCGAAAAAGAGGAGAAAACTCAG
    GTTATTAAGCTTTTCTCGCGTTTTGCCACTTCGTTCAAAGACTATTTTAAGAATCGCG
    CAAACTGCTTTTCGGCTGATGATATTTCCAGTTCTAGCTGCCATCGTATCGTTAACGA
    TAATGCTGAGATTTTCTTCTCTAATGCCCTGGTGTATCGTCGTATCGTTAAATCTTTG
    AGCAACGACGATATTAATAAGATTTCAGGCGACATGAAGGATTCTTTAAAGGAGAT
    GTCTTTAGAAGAGATTTATTCCTATGAGAAATATGGCGAGTTTATCACCCAAGAAGG
    AATTTCGTTCTACAACGACATCTGTGGCAAAGTGAACAGCTTCATGAATTTATACTG
    CCAAAAGAATAAGGAGAATAAAAATTTATATAAACTGCAGAAACTGCATAAGCAAA
    TTCTTTGCATTGCAGACACCTCTTATGAAGTTCCTTATAAGTTTGAATCGGACGAGG
    AGGTATATCAGAGTGTGAACGGGTTCCTGGACAATATTTCATCCAAGCATATTGTTG
    AACGTTTACGCAAAATTGGAGACAATTACAATGGGTATAACCTTGACAAAATTTACA
    TCGTGTCGAAGTTTTACGAATCGGTAAGCCAGAAGACCTATCGTGACTGGGAAACTA
    TCAATACCGCCTTAGAAATTCATTACAACAATATTCTTCCTGGTAACGGCAAAAGCA
    AAGCCGATAAGGTAAAGAAGGCTGTCAAGAACGACCTGCAAAAGTCTATCACAGAG
    ATCAACGAGTTAGTCTCTAACTACAAATTATGTTCCGACGACAATATTAAAGCCGAA
    ACCTACATCCATGAGATCTCACACATTCTTAACAATTTTGAGGCCCAGGAGCTGAAA
    TATAACCCAGAAATTCACCTTGTAGAGAGCGAATTAAAAGCCTCCGAGCTGAAGAA
    CGTTTTGGATGTAATCATGAACGCATTTCATTGGTGCAGCGTATTTATGACAGAGGA
    GTTGGTCGACAAGGACAATAACTTTTACGCCGAGCTTGAAGAAATCTACGATGAAA
    TTTACCCGGTAATTAGTTTATATAATTTAGTTCGCAACTACGTAACTCAGAAACCCTA
    CAGTACCAAGAAGATTAAATTGAACTTTGGGATCCCGACACTTGCTGACGGTTGGAG
    TAAATCAAAAGAATACTCCAATAATGCAATTATCCTGATGCGCGACAATCTTTACTA
    CTTGGGGATCTTTAACGCAAAGAACAAACCAGATAAGAAAATCATCGAGGGCAACA
    CCAGCGAGAATAAAGGCGATTACAAGAAAATGATCTATAATCTTTTGCCGGGACCG
    AACAAAATGATCCCAAAGGTTTTCCTGTCGTCGAAAACGGGAGTCGAGACATATAA
    ACCATCTGCGTACATCTTGGAAGGTTACAAACAGAATAAGCATATTAAGTCTAGTAA
    AGACTTCGACATCACCTTTTGTCATGACCTGATTGATTATTTCAAGAACTGTATTGCT
    ATCCATCCAGAATGGAAAAACTTCGGATTTGACTTCTCCGATACTAGCACCTACGAA
    GACATTTCGGGTTTTTATCGCGAAGTAGAGCTTCAAGGGTACAAAATTGATTGGACA
    TATATTAGCGAGAAAGACATTGATTTGCTTCAAGAGAAGGGACAGTTATATTTATTC
    CAGATCTACAACAAAGACTTCTCGAAGAAATCCACCGGTAATGATAATCTTCACACT
    ATGTACCTGAAGAATTTATTTTCAGAGGAAAATCTGAAGGACATTGTACTTAAACTT
    AATGGAGAAGCCGAAATCTTCTTCCGCAAGAGTTCCATTAAAAATCCGATTATTCAT
    AAAAAGGGAAGTATCCTTGTGAACCGCACGTATGAGGCCGAAGAGAAGGATCAGTT
    TGGGAATATTCAAATTGTCCGCAAAAACATCCCCGAGAACATCTACCAGGAACTGT
    ATAAATACTTTAATGATAAATCTGATAAAGAGTTATCAGACGAGGCTGCCAAACTG
    AAAAACGTAGTCGGTCATCATGAGGCAGCGACCAATATTGTAAAGGACTACCGTTA
    CACCTACGACAAGTATTTCCTTCACATGCCGATCACGATTAATTTTAAGGCTAACAA
    GACCGGCTTTATCAATGACCGCATCTTGCAGTACATCGCGAAAGAGAAAGATTTACA
    CGTCATCGGAATTGATCGTGGAGAGCGTAATCTTATCTACGTCAGCGTCATCGACAC
    CTGTGGAAACATTGTGGAACAAAAAAGTTTTAATATCGTAAACGGCTACGACTATCA
    AATTAAACTTAAACAGCAAGAGGGAGCTCGCCAGATCGCTCGCAAAGAGTGGAAAG
    AGATTGGGAAAATTAAAGAAATTAAAGAGGGTTACCTGTCGCTGGTAATTCACGAA
    ATCTCGAAAATGGTCATCAAATATAATGCAATTATCGCTATGGAGGATCTGTCCTAC
    GGGTTCAAGAAGGGACGTTTTAAAGTAGAGCGCCAGGTGTATCAAAAATTCGAAAC
    CATGTTGATCAATAAGCTTAACTATTTGGTCTTCAAAGATATTTCGATTACGGAGAA
    CGGAGGTTTGTTGAAAGGATATCAGCTGACGTATATCCCAGACAAGTTGAAAAACG
    TGGGGCATCAATGTGGATGTATTTTCTATGTGCCCGCGGCCTACACGAGTAAGATCG
    ATCCTACCACTGGTTTCGTCAACATTTTCAAATTTAAAGATCTTACCGTGGATGCGA
    AGCGCGAATTTATTAAGAAATTTGATAGCATTCGCTATGATTCCGAAAAGAACCTGT
    TCTGTTTTACGTTCGACTATAACAATTTCATTACCCAAAACACGGTGATGAGCAAAT
    CCTCTTGGTCAGTTTATACATACGGTGTACGTATCAAACGCCGTTTCGTTAACGGAC
    GCTTTTCCAATGAGTCTGATACAATCGATATCACGAAAGATATGGAAAAAACATTAG
    AGATGACTGATATCAACTGGCGTGACGGGCACGACCTGCGTCAAGACATTATTGACT
    ACGAGATTGTGCAGCATATCTTCGAAATCTTTCGCTTAACTGTGCAAATGCGTAACT
    CGTTATCCGAGTTAGAAGACCGTGACTACGATCGCCTGATTTCACCCGTCTTGAACG
    AAAATAACATCTTCTACGATTCCGCGAAGGCTGGGGACGCATTGCCCAAGGACGCA
    GACGCGAATGGAGCGTACTGTATTGCGCTTAAAGGATTATATGAAATCAAGCAGAT
    CACCGAAAATTGGAAGGAGGACGGGAAGTTCTCACGCGACAAACTGAAGATTTCAA
    ATAAGGACTGGTTCGATTTCATTCAGAATAAGCGTTACCTGAAACGTCCGGCAGCGA
    CCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAG
    CCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 49
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAATGGTACGAACAACTTTCAGAACTTCATCGGCATCTCCAGCCTTCAAAAGACTTT
    ACGCAACGCATTGATTCCCACGGAGACTACGCAACAGTTTATCGTAAAAAATGGTAT
    TATCAAAGAAGATGAATTACGCGGGGAGAATCGCCAGATTCTTAAGGACATTATGG
    ACGATTATTACCGTGGATTCATCAGTGAGACACTGAGCTCCATTGATGACATCGACT
    GGACGTCATTGTTTGAAAAGATGGAAATCCAGTTGAAAAATGGCGATAACAAAGAT
    ACATTGATTAAAGAGCAGACAGAGTACCGCAAAGCAATTCACAAGAAATTCGCCAA
    TGATGATCGTTTTAAGAACATGTTTAGTGCCAAGCTTATTTCGGATATCTTACCCGAA
    TTCGTGATTCACAACAACAATTATTCGGCAAGTGAGAAAGAGGAAAAGACCCAGGT
    TATCAAATTGTTTTCGCGCTTCGCCACTTCGTTCAAAGATTATTTCAAGAACCGTGCA
    AACTGTTTCTCCGCTGACGACATCAGTTCCAGCTCATGCCACCGTATTGTAAATGAC
    AATGCGGAGATCTTTTTCAGTAATGCCTTAGTATATCGTCGCATTGTAAAGAGCTTA
    TCTAATGATGACATTAACAAGATCTCGGGTGATATGAAGGACTCACTTAAGGAGAT
    GAGTCTGGAAGAGATCTACTCCTACGAAAAATACGGGGAATTCATCACCCAGGAGG
    GAATTTCATTCTACAACGATATCTGCGGCAAAGTTAACTCCTTTATGAATCTGTACTG
    TCAAAAGAACAAGGAGAATAAAAACCTGTATAAATTGCAGAAACTTCATAAACAAA
    TTTTGTGTATCGCAGACACGAGTTATGAAGTACCTTATAAATTCGAATCCGACGAAG
    AGGTATATCAGTCCGTAAATGGGTTCCTGGACAATATCAGTAGTAAGCACATTGTGG
    AACGCTTACGCAAAATTGGAGACAATTACAACGGGTATAACCTGGACAAAATCTAC
    ATCGTATCCAAATTTTATGAAAGCGTGTCTCAAAAAACTTATCGTGATTGGGAAACA
    ATCAACACGGCTCTTGAGATCCATTACAATAACATCTTGCCGGGTAACGGCAAATCG
    AAGGCAGACAAAGTTAAAAAAGCAGTTAAGAACGACTTACAGAAAAGCATTACGG
    AGATTAACGAGTTAGTAAGTAATTACAAATTATGCTCCGACGATAATATCAAAGCTG
    AAACCTACATCCATGAAATTAGCCACATTTTGAACAATTTCGAAGCGCAGGAGCTGA
    AATATAACCCTGAAATCCATCTGGTAGAGTCTGAGTTGAAGGCGTCAGAACTGAAA
    AACGTTCTTGACGTCATCATGAATGCCTTTCACTGGTGTAGTGTTTTTATGACTGAGG
    AGCTTGTAGATAAGGACAACAACTTCTATGCTGAACTTGAAGAGATCTACGATGAA
    ATCTACCCCGTAATCAGTCTGTATAATTTAGTTCGTAACTACGTCACGCAGAAACCC
    TATTCGACTAAGAAAATTAAGCTGAACTTTGGGATCCCTACTTTGGCAGACGGGTGG
    AGCAAGAGTAAAGAATACAGTAATAATGCAATTATCTTGATGCGCGATAACTTATAT
    TACTTAGGTATTTTCAATGCTAAGAACAAACCTGATAAGAAGATTATCGAAGGAAAT
    ACGAGTGAGAATAAGGGAGACTACAAAAAGATGATTTACAACTTGCTGCCAGGGCC
    TAATAAGATGATTCCAAAAGTTTTTCTGTCGAGCAAGACAGGGGTTGAAACTTATAA
    GCCATCCGCTTATATCCTTGAGGGGTACAAGCAGAATAAGCATATCAAGTCCTCCAA
    AGATTTTGATATTACATTTTGCCACGACTTAATTGATTACTTCAAGAACTGCATCGCA
    ATCCATCCCGAATGGAAGAATTTCGGCTTCGATTTCTCAGATACGTCCACGTATGAG
    GATATCTCAGGCTTTTACCGCGAAGTTGAGCTGCAAGGTTATAAAATTGATTGGACA
    TACATCTCCGAAAAAGACATTGATCTTTTACAGGAAAAGGGCCAATTATACTTATTT
    CAAATCTATAACAAAGATTTTAGCAAGAAGTCCACAGGTAATGATAACCTGCATAC
    GATGTATTTGAAAAATCTTTTCAGTGAAGAGAATTTGAAGGATATCGTCCTGAAGCT
    GAACGGTGAGGCTGAGATCTTCTTCCGCAAATCGTCTATCAAAAACCCCATCATTCA
    CAAAAAGGGAAGTATCTTAGTAAACCGCACTTATGAAGCGGAGGAAAAGGATCAGT
    TCGGGAACATCCAGATCGTGCGCAAGAACATTCCAGAAAACATCTATCAGGAACTT
    TACAAATATTTCAATGACAAGTCTGATAAAGAATTATCAGACGAGGCGGCGAAACT
    TAAAAATGTTGTTGGACACCACGAAGCAGCGACGAATATTGTAAAGGATTATCGCT
    ACACATACGATAAATACTTTTTGCACATGCCAATCACCATTAACTTTAAGGCGAACA
    AGACAGGTTTCATTAACGACCGTATTCTGCAATATATCGCAAAGGAAAAAGACCTG
    CACGTTATTGGGATCGATCGTGGCGAACGCAATTTGATCTACGTAAGCGTTATCGAC
    ACTTGCGGAAATATCGTTGAACAAAAAAGCTTTAATATCGTCAATGGATACGATTAC
    CAAATCAAGCTGAAACAACAAGAAGGGGCACGTCAGATCGCTCGTAAAGAATGGA
    AAGAGATTGGTAAGATCAAAGAGATTAAAGAAGGGTATCTTTCTTTAGTAATTCACG
    AGATTTCGAAAATGGTTATTAAATACAATGCGATTATTGCTATGGAAGACTTAAGCT
    ACGGCTTTAAGAAAGGTCGCTTCAAAGTGGAGCGCCAAGTGTATCAGAAGTTTGAA
    ACGATGTTGATTAACAAATTAAATTACCTGGTCTTTAAGGACATCAGTATCACAGAA
    AATGGGGGGTTGCTTAAAGGGTACCAGCTTACATACATCCCTGATAAACTGAAAAA
    TGTCGGTCATCAGTGCGGATGTATCTTCTATGTACCAGCAGCCTATACCAGTAAGAT
    TGACCCTACTACTGGCTTTGTGAATATTTTTAAATTCAAGGATTTAACCGTGGACGCC
    AAGCGTGAATTTATTAAAAAATTTGATTCGATTCGCTACGACAGTGAGAAAAACCTT
    TTCTGCTTTACCTTTGACTACAACAATTTTATTACCCAGAACACCGTAATGTCAAAGA
    GTTCGTGGTCTGTATATACCTACGGTGTTCGCATCAAGCGCCGCTTCGTAAACGGGC
    GTTTCAGTAACGAATCTGACACCATCGACATCACTAAAGATATGGAGAAGACATTG
    GAAATGACGGACATTAATTGGCGTGATGGCCATGACTTACGTCAGGACATTATTGAT
    TACGAAATTGTGCAGCATATCTTCGAGATTTTCCGTTTGACAGTTCAGATGCGCAAC
    TCACTGAGTGAGTTAGAAGATCGCGATTACGACCGTCTGATCTCACCGGTCCTTAAT
    GAAAACAACATTTTCTACGACTCAGCAAAGGCGGGTGATGCCCTGCCAAAGGATGC
    GGACGCTAATGGCGCCTACTGCATCGCCCTGAAAGGATTGTATGAAATTAAGCAGA
    TTACAGAAAATTGGAAGGAAGATGGTAAATTTAGCCGTGATAAATTAAAAATCTCG
    AACAAGGATTGGTTCGATTTTATTCAGAACAAACGTTATTTGAAACGTCCGGCAGCG
    ACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCA
    GCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCG
    GGCTAA
    SEQ ID NO: 50
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAATGGAACAAATAATTTTCAAAATTTTATCGGCATCTCAAGTCTTCAAAAAACCCT
    TCGCAATGCCCTGATTCCAACTGAAACAACCCAGCAATTTATCGTCAAGAACGGCAT
    CATTAAGGAAGACGAGTTACGCGGGGAGAACCGTCAAATCCTGAAAGATATCATGG
    ATGACTACTATCGTGGGTTCATTTCGGAAACCTTGTCTTCAATCGACGACATTGACT
    GGACGAGTCTTTTCGAGAAAATGGAAATTCAGCTTAAAAATGGAGACAACAAGGAT
    ACTCTGATTAAGGAACAGACAGAATATCGCAAAGCTATCCACAAAAAGTTCGCTAA
    TGATGATCGTTTCAAAAATATGTTTTCTGCTAAATTGATTTCCGATATCTTGCCTGAA
    TTTGTAATCCACAACAACAATTATTCTGCTTCCGAGAAGGAAGAGAAGACCCAGGTC
    ATTAAATTATTCAGCCGCTTTGCAACCAGCTTTAAAGACTACTTTAAGAATCGCGCT
    AACTGCTTTTCGGCGGATGACATCTCATCATCATCATGCCACCGCATTGTGAACGAC
    AATGCGGAGATCTTCTTTTCGAATGCGTTAGTTTATCGTCGCATTGTCAAAAGTCTTA
    GCAATGATGACATCAACAAGATCTCAGGAGACATGAAAGATTCCTTAAAGGAGATG
    TCTCTTGAGGAAATCTATTCGTATGAGAAATACGGCGAGTTCATTACCCAGGAAGGT
    ATTAGTTTCTACAATGATATCTGCGGCAAAGTAAATTCTTTTATGAATCTGTATTGCC
    AAAAAAACAAAGAAAACAAGAATCTTTATAAGTTACAAAAGTTACATAAGCAAATT
    CTGTGCATCGCTGATACATCTTATGAGGTACCCTACAAATTTGAAAGTGATGAGGAG
    GTCTATCAGAGTGTCAACGGCTTCTTAGACAACATCTCTTCCAAACATATCGTGGAA
    CGCCTGCGTAAAATCGGAGATAACTACAACGGATATAACTTAGATAAAATCTACAT
    CGTGTCCAAGTTTTATGAAAGTGTGAGCCAAAAAACATATCGTGACTGGGAAACCA
    TTAACACCGCATTGGAAATTCACTATAACAACATTTTGCCAGGCAACGGGAAAAGT
    AAGGCGGACAAAGTTAAGAAAGCAGTTAAAAATGACCTGCAAAAAAGCATCACTG
    AAATTAACGAATTGGTATCGAATTACAAATTATGTAGCGACGATAATATCAAAGCA
    GAAACTTACATTCACGAGATTAGTCACATTTTAAATAACTTCGAGGCCCAGGAATTG
    AAATACAATCCCGAAATTCATTTGGTTGAATCAGAACTGAAAGCATCAGAGTTGAA
    AAATGTGTTAGATGTCATTATGAATGCGTTTCATTGGTGCTCTGTGTTCATGACCGAG
    GAACTGGTTGATAAAGATAACAACTTTTACGCTGAATTGGAGGAGATTTACGATGA
    GATTTACCCGGTCATTTCGCTTTATAACTTAGTGCGCAATTATGTGACGCAGAAACC
    ATATTCCACGAAGAAAATCAAACTTAATTTTGGCATCCCTACTCTGGCTGATGGTTG
    GTCGAAATCGAAAGAGTACAGCAACAACGCGATCATTCTTATGCGTGACAATCTTTA
    CTATTTGGGCATTTTTAATGCCAAGAATAAGCCAGATAAGAAAATCATTGAGGGGA
    ATACTTCCGAGAATAAGGGGGATTACAAAAAGATGATCTATAACTTGCTGCCCGGC
    CCCAACAAAATGATTCCTAAGGTTTTCTTGTCAAGCAAGACGGGCGTCGAAACATAT
    AAGCCGTCAGCTTATATTCTGGAAGGCTATAAACAGAATAAGCACATCAAGTCTTCC
    AAGGACTTTGACATCACTTTTTGCCACGATTTGATCGACTACTTTAAGAACTGTATTG
    CGATTCATCCGGAATGGAAGAACTTCGGTTTCGACTTTTCCGATACCTCAACATACG
    AGGATATCAGCGGCTTCTACCGTGAAGTCGAGCTTCAAGGCTACAAGATCGATTGG
    ACATATATTTCAGAGAAGGACATTGATTTGTTACAAGAGAAAGGTCAACTTTACTTA
    TTTCAGATCTATAACAAAGACTTTTCGAAGAAATCGACAGGAAACGATAACTTACAC
    ACTATGTATTTAAAAAATCTGTTTTCGGAGGAAAACCTGAAAGATATTGTGCTGAAA
    CTTAACGGCGAGGCAGAGATCTTTTTCCGTAAAAGCTCAATCAAGAATCCTATCATC
    CATAAAAAAGGTAGTATTCTTGTCAACCGCACATATGAAGCGGAGGAGAAGGACCA
    ATTCGGAAACATCCAAATTGTCCGTAAGAATATTCCGGAGAACATTTACCAAGAGTT
    GTATAAATACTTTAACGATAAGTCAGATAAGGAACTTAGCGATGAGGCGGCGAAGC
    TTAAAAACGTAGTTGGGCATCATGAAGCTGCTACCAACATTGTAAAAGATTACCGTT
    ACACCTATGACAAGTATTTCTTGCACATGCCCATTACGATCAATTTCAAAGCAAATA
    AGACAGGCTTTATCAATGATCGCATCCTGCAGTACATTGCTAAAGAGAAGGATTTGC
    ATGTTATCGGTATTGATCGCGGAGAGCGCAATTTGATCTACGTCTCCGTAATCGACA
    CTTGCGGTAACATTGTTGAGCAGAAGTCGTTCAACATCGTTAATGGTTATGATTACC
    AAATCAAGCTGAAGCAGCAAGAGGGTGCCCGCCAGATCGCGCGTAAGGAATGGAA
    AGAAATCGGGAAAATTAAAGAGATCAAAGAAGGCTATTTGTCTCTGGTAATTCACG
    AAATCAGCAAGATGGTGATCAAGTATAACGCGATCATTGCGATGGAGGATCTTTCTT
    ATGGCTTCAAGAAAGGGCGCTTTAAAGTCGAACGCCAGGTCTACCAGAAATTTGAG
    ACAATGCTTATCAACAAGCTTAACTATCTTGTATTTAAGGATATTTCCATCACTGAG
    AACGGAGGACTTTTAAAGGGGTACCAACTGACGTACATTCCTGATAAGCTGAAGAA
    CGTTGGTCATCAATGCGGATGCATCTTCTATGTGCCAGCGGCTTACACCTCCAAAAT
    CGATCCCACTACAGGCTTTGTCAATATCTTCAAATTCAAGGATTTGACCGTTGACGC
    GAAGCGCGAGTTTATCAAGAAGTTTGATAGCATTCGCTACGACAGCGAAAAAAATT
    TATTTTGTTTTACTTTCGACTACAATAACTTTATTACTCAGAACACTGTCATGTCAAA
    GAGTTCGTGGAGTGTCTACACGTACGGAGTACGTATTAAGCGCCGTTTCGTCAACGG
    ACGCTTCTCAAACGAAAGCGACACGATCGACATCACCAAAGACATGGAAAAAACTC
    TTGAGATGACGGATATCAATTGGCGCGACGGCCATGACCTGCGTCAGGATATCATTG
    ATTACGAGATCGTTCAGCACATCTTCGAAATCTTCCGCCTTACCGTCCAGATGCGCA
    ACAGTTTAAGCGAGCTTGAAGACCGCGACTACGATCGTTTGATTAGCCCCGTTCTGA
    ACGAGAATAATATTTTCTACGACAGCGCAAAGGCCGGTGATGCTTTGCCAAAGGAC
    GCAGACGCGAATGGAGCCTACTGCATCGCCCTGAAGGGCTTATATGAGATTAAGCA
    AATTACCGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGATAAGCTTAAAATTA
    GCAATAAGGATTGGTTCGACTTCATCCAGAACAAACGTTACCTGAAACGTCCGGCA
    GCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAG
    GCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTAT
    TCCGGGCTAA
    SEQ ID NO: 51
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGAACAAACAATTTCCAAAACTTCATCGGTATCTCTTCGTTGCAGAAGACTCT
    GCGTAATGCTTTGATCCCGACGGAGACAACCCAACAATTTATCGTCAAAAACGGTAT
    TATTAAGGAGGACGAGTTACGTGGAGAAAATCGTCAAATCCTTAAGGACATCATGG
    ACGATTATTATCGCGGGTTTATTTCTGAAACCCTGAGCAGTATCGATGATATCGACT
    GGACCTCACTTTTTGAGAAAATGGAGATCCAGTTGAAGAACGGTGATAACAAAGAC
    ACTCTGATCAAAGAGCAAACTGAATACCGCAAGGCAATTCACAAAAAGTTCGCCAA
    CGACGACCGTTTCAAGAATATGTTCTCAGCTAAGTTAATCAGCGACATTTTGCCAGA
    GTTCGTTATCCACAACAATAATTATAGTGCTTCAGAGAAGGAGGAAAAAACCCAAG
    TGATTAAACTTTTTTCGCGCTTTGCAACCTCATTCAAGGACTACTTCAAGAATCGCGC
    GAATTGCTTCAGTGCGGACGACATTTCTTCTTCAAGTTGCCATCGTATCGTTAACGAT
    AACGCGGAAATTTTCTTCTCTAATGCTTTGGTGTATCGCCGCATTGTAAAATCGCTTA
    GTAACGATGACATTAATAAGATCTCAGGTGATATGAAAGATTCATTGAAGGAAATG
    AGCTTGGAAGAGATTTACAGTTACGAAAAATATGGAGAATTTATTACTCAGGAAGG
    CATCTCATTCTATAACGATATCTGCGGGAAGGTAAATTCGTTTATGAACTTATATTGC
    CAGAAAAATAAAGAGAATAAAAATTTGTATAAGCTTCAGAAGTTGCACAAACAGAT
    CCTGTGCATTGCAGACACCTCGTATGAGGTTCCGTATAAATTTGAGTCCGATGAAGA
    AGTGTATCAGTCTGTGAATGGTTTCTTAGATAATATCTCTTCCAAGCATATTGTCGAA
    CGCCTGCGCAAAATTGGTGATAACTATAACGGATACAATCTGGATAAAATTTACATC
    GTTTCTAAATTTTACGAGTCAGTCTCGCAGAAGACCTACCGCGACTGGGAAACAATT
    AACACGGCATTGGAGATTCACTACAATAATATCTTGCCTGGTAACGGTAAGTCTAAG
    GCAGATAAGGTAAAAAAAGCTGTGAAAAACGACCTTCAGAAAAGCATCACGGAGA
    TTAATGAGCTGGTGAGTAATTACAAATTATGTTCAGACGATAATATTAAAGCTGAAA
    CGTATATCCATGAAATCTCGCATATCTTGAACAACTTCGAGGCCCAAGAACTTAAAT
    ATAACCCCGAAATCCATTTAGTCGAGTCTGAATTGAAAGCGTCGGAATTAAAAAAC
    GTCTTAGACGTCATTATGAACGCGTTTCACTGGTGTTCAGTTTTCATGACCGAAGAG
    CTGGTCGACAAAGACAACAACTTCTATGCGGAATTGGAGGAAATCTATGATGAAAT
    CTACCCTGTTATTTCACTGTATAACCTTGTGCGCAACTATGTCACTCAGAAGCCGTAT
    TCGACCAAAAAAATTAAATTGAATTTCGGTATCCCTACTCTTGCAGACGGATGGAGT
    AAAAGCAAGGAATACAGTAATAACGCCATTATTCTTATGCGCGACAATTTATACTAC
    CTGGGCATCTTTAACGCAAAGAATAAGCCGGATAAGAAGATTATTGAGGGTAACAC
    CAGTGAGAACAAGGGCGACTATAAGAAGATGATCTATAACTTATTGCCAGGTCCAA
    ATAAAATGATCCCAAAAGTATTCTTATCATCAAAGACGGGAGTTGAAACCTATAAG
    CCTAGTGCCTATATTCTTGAGGGATATAAACAGAACAAGCACATTAAGTCGTCTAAG
    GATTTTGACATTACGTTCTGCCATGACTTAATCGACTATTTTAAAAACTGTATTGCGA
    TTCACCCCGAATGGAAGAATTTTGGATTCGATTTTTCGGATACCTCGACCTATGAAG
    ATATTTCGGGATTTTATCGTGAAGTGGAGTTGCAAGGCTATAAAATCGATTGGACCT
    ATATCTCAGAAAAAGACATTGATTTATTACAGGAAAAGGGACAACTGTACCTTTTCC
    AAATTTATAACAAGGACTTTTCTAAAAAGTCCACAGGAAATGATAACCTTCACACCA
    TGTACCTGAAGAACCTTTTCTCAGAGGAAAACCTGAAGGACATTGTCCTTAAGTTAA
    ATGGAGAAGCGGAGATCTTTTTCCGTAAATCTAGTATCAAGAATCCGATTATCCATA
    AAAAAGGTTCGATTTTGGTAAATCGCACCTATGAAGCGGAAGAGAAAGATCAATTT
    GGTAACATCCAGATCGTGCGCAAGAATATCCCGGAGAACATTTACCAAGAGCTGTA
    TAAGTACTTCAATGATAAGTCTGATAAGGAACTGTCAGATGAAGCTGCGAAATTGA
    AGAACGTGGTTGGGCATCATGAAGCCGCTACCAATATCGTCAAGGATTACCGTTATA
    CCTATGACAAATATTTCTTACACATGCCGATTACGATCAATTTTAAGGCAAACAAGA
    CAGGATTCATCAACGACCGTATCTTGCAGTATATTGCCAAAGAGAAGGATCTGCATG
    TGATCGGTATTGACCGCGGGGAGCGCAATTTAATCTATGTATCGGTGATCGATACTT
    GTGGTAACATCGTAGAACAAAAGAGCTTTAACATCGTGAATGGTTACGACTATCAG
    ATCAAGCTGAAACAACAGGAAGGAGCCCGCCAGATCGCTCGCAAGGAATGGAAAG
    AAATCGGGAAAATTAAGGAAATCAAGGAAGGCTACCTTTCATTGGTCATTCACGAA
    ATTTCGAAAATGGTAATTAAGTACAACGCGATCATCGCCATGGAGGACCTTTCGTAC
    GGATTTAAGAAGGGTCGTTTCAAAGTTGAGCGCCAGGTATACCAAAAATTCGAGAC
    TATGCTTATCAACAAACTTAACTACTTGGTCTTTAAGGACATTTCTATTACCGAAAAC
    GGCGGCTTACTTAAAGGCTATCAATTGACATATATTCCCGACAAACTGAAGAATGTT
    GGACATCAATGCGGGTGTATTTTCTATGTGCCGGCAGCTTACACTAGTAAGATCGAC
    CCTACAACCGGGTTCGTAAACATTTTTAAATTCAAAGACTTAACAGTCGATGCGAAG
    CGTGAATTTATTAAGAAGTTTGATAGTATCCGCTATGACAGTGAAAAGAACTTGTTT
    TGCTTTACGTTCGACTACAATAACTTTATTACACAGAACACGGTCATGTCTAAATCA
    TCATGGTCGGTTTACACATATGGGGTGCGCATCAAGCGTCGCTTTGTAAATGGCCGT
    TTTAGTAATGAGAGCGACACAATCGACATCACAAAGGATATGGAGAAAACTCTTGA
    GATGACAGACATCAATTGGCGTGACGGTCATGACTTACGCCAAGATATCATCGACTA
    CGAAATCGTACAGCATATTTTTGAGATTTTTCGTCTTACTGTGCAAATGCGTAATTCT
    TTATCCGAACTGGAAGATCGTGATTACGACCGCTTGATTAGTCCCGTCTTAAATGAG
    AACAATATTTTCTATGATTCTGCGAAAGCCGGAGATGCACTGCCCAAAGACGCTGAT
    GCCAATGGCGCGTATTGCATTGCATTAAAAGGATTATATGAGATTAAACAGATTACC
    GAAAATTGGAAAGAGGACGGTAAATTCTCACGCGATAAATTGAAGATTTCTAACAA
    GGACTGGTTCGACTTTATCCAAAATAAACGTTATCTTAAACGTCCGGCAGCGACCAA
    AAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCG
    AAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTA
    A
    SEQ ID NO: 52
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAACGGTACCAACAACTTTCAGAATTTCATTGGCATTAGCTCGCTTCAAAAAACTTT
    ACGCAATGCTCTTATTCCGACTGAGACGACACAACAGTTTATCGTTAAGAATGGCAT
    CATCAAAGAAGATGAATTACGCGGAGAAAACCGCCAGATCCTGAAAGACATTATGG
    ACGATTATTACCGTGGGTTCATCTCCGAGACGTTGTCATCGATCGATGACATCGACT
    GGACGTCACTTTTTGAAAAAATGGAGATCCAGTTAAAGAACGGTGACAATAAGGAT
    ACATTGATCAAAGAACAGACCGAGTACCGTAAAGCGATTCATAAAAAGTTTGCGAA
    CGATGATCGCTTCAAGAATATGTTTTCTGCGAAATTAATTTCCGACATTTTACCTGAA
    TTTGTTATTCATAATAACAACTACTCGGCGTCTGAGAAAGAGGAGAAAACCCAAGT
    GATTAAACTTTTTTCACGTTTCGCAACGTCGTTCAAAGACTATTTTAAAAATCGTGCT
    AATTGCTTTAGCGCGGATGACATCAGCTCTAGTTCATGTCATCGCATTGTCAACGAT
    AATGCTGAGATCTTTTTCAGTAATGCGTTAGTGTACCGTCGTATTGTGAAGTCCTTAT
    CTAATGATGATATCAATAAGATCAGCGGGGATATGAAGGACTCACTTAAGGAGATG
    AGCTTGGAGGAAATCTATTCCTATGAGAAGTATGGTGAGTTTATTACGCAAGAAGG
    AATTAGCTTTTACAACGATATCTGTGGAAAGGTGAATTCGTTTATGAATTTGTATTGC
    CAGAAAAATAAGGAGAACAAGAACCTTTATAAATTGCAAAAGTTACACAAGCAAAT
    CCTGTGCATTGCAGATACTTCCTACGAGGTGCCTTACAAGTTTGAATCCGACGAAGA
    GGTCTACCAATCTGTAAACGGTTTCTTAGATAATATTAGTTCCAAGCATATTGTGGA
    GCGCCTTCGTAAAATTGGCGATAATTACAACGGTTACAATTTAGACAAAATTTACAT
    TGTCAGTAAATTCTACGAGTCCGTATCTCAAAAGACGTATCGTGATTGGGAGACTAT
    CAATACGGCCCTGGAGATCCACTACAACAATATCTTGCCCGGTAATGGTAAGTCGAA
    GGCCGATAAAGTTAAGAAAGCGGTGAAAAATGACTTACAGAAGTCAATCACCGAAA
    TTAACGAATTGGTGTCCAATTATAAATTGTGTTCAGATGATAATATCAAAGCCGAGA
    CCTACATTCATGAGATTTCCCATATCTTAAATAATTTCGAGGCGCAAGAGCTTAAGT
    ATAACCCAGAAATCCACCTGGTAGAATCTGAGTTGAAGGCGTCAGAGTTAAAAAAT
    GTTTTAGATGTCATTATGAACGCGTTTCACTGGTGCTCCGTATTTATGACGGAGGAA
    TTAGTAGATAAAGACAACAATTTCTATGCCGAACTTGAGGAAATCTATGATGAGATC
    TATCCCGTCATTAGCCTGTATAACTTGGTCCGCAACTATGTTACCCAAAAACCGTAC
    AGTACCAAGAAGATTAAGCTGAATTTCGGCATTCCTACACTGGCTGATGGTTGGAGT
    AAATCGAAGGAATATTCGAATAACGCGATTATCTTGATGCGCGACAACTTATACTAT
    TTGGGGATCTTTAACGCCAAAAACAAACCGGATAAGAAGATTATTGAGGGAAACAC
    ATCAGAGAACAAAGGCGACTACAAAAAAATGATTTACAACTTGTTACCGGGGCCTA
    ACAAAATGATCCCGAAGGTGTTCTTATCCAGTAAAACAGGCGTTGAGACCTACAAA
    CCTTCCGCATACATCCTGGAAGGGTATAAGCAGAACAAGCACATTAAGTCCAGCAA
    GGATTTCGATATTACCTTCTGTCATGATTTAATTGACTATTTCAAGAACTGTATTGCA
    ATCCACCCCGAGTGGAAGAACTTCGGATTCGACTTCTCAGATACGAGCACATATGAG
    GACATCTCGGGGTTCTATCGTGAAGTAGAACTGCAGGGATATAAAATTGATTGGAC
    ATATATTTCCGAAAAAGACATCGACCTTTTACAAGAGAAGGGTCAACTTTACTTGTT
    CCAAATTTACAATAAAGACTTCTCAAAAAAAAGCACGGGTAACGATAATTTACACA
    CTATGTATTTAAAGAACCTTTTCTCGGAAGAGAATTTAAAGGATATCGTATTGAAGT
    TGAATGGAGAAGCGGAGATCTTCTTCCGTAAGTCCAGTATTAAAAACCCTATTATTC
    ACAAGAAGGGATCGATTTTAGTTAACCGCACATACGAGGCCGAAGAGAAGGACCAA
    TTTGGGAACATTCAAATTGTCCGCAAAAACATCCCTGAGAACATTTATCAAGAGCTT
    TATAAGTACTTTAACGATAAGTCCGATAAGGAATTGTCAGATGAGGCGGCAAAGTT
    GAAGAATGTCGTGGGGCATCATGAAGCTGCCACCAACATTGTGAAGGACTACCGCT
    ACACTTACGACAAATACTTCCTGCACATGCCCATTACGATCAATTTTAAGGCCAATA
    AGACAGGCTTTATTAACGACCGTATTCTTCAATATATCGCTAAGGAGAAGGACCTTC
    ATGTGATTGGGATCGACCGCGGAGAACGTAATTTAATTTATGTGTCCGTCATCGATA
    CGTGTGGAAATATCGTGGAACAGAAATCATTCAATATCGTGAATGGCTATGATTACC
    AGATCAAATTAAAACAGCAGGAGGGCGCTCGCCAAATTGCGCGTAAGGAATGGAAA
    GAGATCGGAAAAATCAAAGAAATCAAAGAAGGATATTTGTCATTGGTGATCCATGA
    GATTTCAAAAATGGTAATTAAATATAATGCAATTATCGCAATGGAAGACCTGTCCTA
    TGGTTTTAAGAAGGGTCGTTTCAAGGTAGAACGCCAAGTGTATCAAAAGTTCGAGA
    CGATGCTGATCAATAAGCTGAATTATCTTGTGTTTAAGGACATTAGCATCACGGAAA
    ATGGAGGGCTGTTGAAAGGCTATCAACTGACGTATATCCCTGACAAGCTGAAAAAT
    GTTGGCCATCAGTGCGGGTGCATTTTCTACGTCCCCGCGGCGTATACAAGCAAGATC
    GATCCTACTACGGGATTCGTAAATATTTTTAAATTCAAAGACTTAACCGTGGACGCC
    AAGCGCGAATTCATTAAGAAGTTTGATAGCATTCGCTACGATTCAGAAAAAAATCTT
    TTCTGTTTTACGTTCGATTACAACAATTTTATCACCCAGAACACAGTGATGAGCAAG
    TCATCCTGGTCTGTCTATACCTACGGTGTCCGTATCAAACGCCGCTTCGTCAACGGA
    CGCTTCTCTAATGAATCTGATACCATTGACATCACCAAGGACATGGAAAAGACACTT
    GAGATGACAGATATTAACTGGCGTGACGGACATGACCTGCGTCAGGACATCATCGA
    TTATGAGATTGTTCAGCATATCTTCGAGATCTTCCGCCTGACAGTACAAATGCGCAA
    TTCACTGTCAGAACTTGAAGACCGCGACTATGACCGCCTGATCTCTCCAGTATTAAA
    TGAGAACAATATCTTTTATGACAGTGCTAAGGCCGGCGATGCCCTTCCGAAAGATGC
    TGATGCTAACGGAGCTTATTGTATTGCATTAAAGGGTCTTTATGAGATCAAGCAAAT
    TACCGAGAATTGGAAGGAGGATGGCAAATTCTCGCGCGACAAACTGAAAATCAGTA
    ACAAGGACTGGTTCGATTTTATTCAGAATAAACGTTACCTGAAACGTCCGGCAGCGA
    CCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAG
    CCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 53
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAACGGAACGAACAACTTCCAGAACTTCATCGGCATCAGTTCTTTACAAAAAACCCT
    GCGTAACGCCCTTATTCCGACTGAGACAACACAACAGTTCATCGTTAAAAACGGAAT
    TATCAAAGAGGACGAGTTGCGCGGCGAGAATCGCCAAATTTTGAAAGATATTATGG
    ACGACTATTATCGTGGTTTTATTTCAGAAACACTGAGTTCGATTGACGATATCGATT
    GGACGAGCCTGTTTGAGAAAATGGAAATCCAGTTGAAAAATGGCGATAATAAAGAC
    ACTTTAATCAAAGAACAAACCGAGTATCGTAAAGCGATCCATAAAAAGTTCGCTAA
    TGACGATCGTTTTAAGAATATGTTCAGTGCGAAACTGATTTCAGACATTTTGCCCGA
    GTTCGTGATCCATAATAACAACTATTCCGCCTCGGAAAAGGAAGAAAAAACCCAGG
    TGATTAAGCTGTTCAGTCGCTTCGCAACATCTTTCAAGGATTATTTCAAGAATCGCG
    CGAATTGCTTCAGTGCGGACGATATTTCTAGTTCAAGCTGCCATCGTATCGTTAATG
    ATAACGCGGAGATTTTTTTTAGCAATGCTCTGGTGTACCGCCGCATTGTTAAGTCACT
    GTCCAACGATGATATTAACAAGATCTCAGGAGACATGAAAGACTCGCTTAAAGAGA
    TGAGTCTGGAAGAGATCTATTCTTATGAGAAGTATGGCGAGTTTATTACCCAAGAAG
    GAATCTCATTCTACAATGATATTTGTGGAAAGGTGAACAGCTTTATGAATCTTTACT
    GCCAAAAAAACAAGGAGAATAAGAATCTTTACAAACTTCAGAAGTTACATAAACAG
    ATTTTGTGTATTGCGGATACGTCTTATGAAGTCCCCTACAAATTTGAATCGGATGAA
    GAGGTATACCAAAGTGTGAACGGATTCTTGGACAATATTTCTTCTAAACATATTGTT
    GAACGCTTACGTAAGATCGGGGATAACTACAATGGCTACAATCTTGACAAAATCTA
    CATTGTTAGCAAATTCTACGAGAGTGTCAGCCAAAAGACGTACCGCGATTGGGAAA
    CAATTAATACTGCGCTTGAGATTCACTATAATAACATTTTACCAGGCAACGGCAAGT
    CCAAGGCGGATAAAGTTAAAAAAGCTGTTAAAAACGATTTGCAAAAATCTATCACA
    GAAATTAACGAGTTAGTTAGTAACTACAAACTGTGCTCCGATGACAACATTAAGGCT
    GAGACGTATATCCATGAGATCTCTCACATCTTAAACAATTTTGAAGCTCAAGAACTT
    AAGTACAATCCGGAAATCCACCTGGTGGAATCCGAGCTGAAGGCTAGCGAACTGAA
    GAACGTATTGGACGTGATCATGAACGCGTTCCACTGGTGTTCTGTCTTTATGACGGA
    AGAGCTTGTCGACAAAGATAATAACTTTTACGCGGAACTTGAGGAAATTTACGATG
    AGATTTACCCAGTTATTTCATTGTATAACCTTGTCCGTAATTACGTGACCCAAAAGCC
    TTATAGTACGAAAAAAATCAAATTAAATTTTGGAATCCCAACACTGGCTGACGGTTG
    GAGCAAATCTAAGGAGTATTCTAATAACGCAATCATCTTAATGCGTGACAACCTGTA
    TTATTTGGGTATCTTCAATGCCAAAAATAAGCCTGACAAAAAGATTATCGAAGGAA
    ATACTTCGGAGAATAAGGGGGATTACAAAAAAATGATTTACAATTTGCTGCCCGGG
    CCGAACAAGATGATCCCCAAAGTGTTCTTATCCTCGAAGACTGGTGTAGAAACATAC
    AAGCCAAGCGCATACATTCTGGAGGGTTACAAGCAAAACAAACACATCAAATCTTC
    AAAAGACTTTGACATTACATTTTGCCATGATCTTATTGACTACTTCAAAAACTGCATT
    GCTATTCACCCCGAGTGGAAGAACTTTGGGTTTGACTTCAGCGACACGTCTACGTAT
    GAGGACATCTCCGGGTTCTACCGTGAAGTTGAGTTACAAGGGTATAAGATTGACTGG
    ACGTATATTTCAGAGAAAGATATCGATCTTTTGCAGGAAAAGGGCCAGTTATATTTA
    TTCCAGATTTACAACAAGGACTTTAGTAAGAAGTCAACAGGAAATGACAACTTGCA
    TACGATGTATTTGAAAAATCTTTTTTCTGAGGAAAATCTTAAGGACATCGTACTGAA
    ATTGAATGGCGAGGCTGAAATCTTCTTCCGTAAATCCTCCATTAAGAATCCCATTAT
    CCACAAAAAGGGGTCTATCCTGGTGAATCGTACCTACGAGGCAGAGGAGAAGGATC
    AATTCGGAAATATTCAGATTGTTCGTAAGAACATCCCCGAGAACATTTATCAAGAAT
    TGTATAAGTACTTTAATGACAAATCTGACAAAGAGTTATCCGACGAAGCTGCGAAA
    CTGAAAAACGTTGTTGGTCACCACGAGGCCGCCACTAATATCGTAAAAGACTACCGT
    TATACCTATGACAAGTACTTTTTGCACATGCCGATCACTATCAACTTCAAGGCGAAT
    AAGACGGGCTTCATTAACGATCGTATCCTGCAATACATCGCCAAGGAGAAGGACCT
    TCACGTCATTGGGATTGACCGTGGTGAGCGTAACCTGATTTATGTAAGCGTCATTGA
    TACCTGCGGTAATATCGTCGAACAGAAAAGTTTCAACATTGTAAATGGATATGACTA
    TCAGATCAAACTTAAGCAGCAGGAGGGTGCACGCCAGATTGCCCGCAAGGAATGGA
    AGGAGATTGGGAAGATTAAGGAAATTAAAGAAGGTTACTTATCACTGGTTATTCAC
    GAGATCAGTAAAATGGTAATCAAATATAACGCGATCATTGCCATGGAGGATCTGAG
    CTATGGCTTTAAAAAGGGCCGTTTCAAAGTCGAGCGCCAGGTATATCAAAAGTTTGA
    AACAATGCTGATTAACAAATTAAACTATCTGGTTTTCAAAGATATTTCGATCACTGA
    AAATGGCGGGCTGTTGAAGGGATACCAACTTACATACATCCCTGACAAACTGAAAA
    ATGTCGGTCACCAATGTGGATGTATCTTTTATGTACCAGCAGCGTATACGAGCAAAA
    TCGATCCAACTACGGGTTTTGTGAACATCTTTAAGTTCAAGGATTTGACAGTAGATG
    CCAAACGCGAGTTCATTAAAAAATTTGATTCAATTCGCTACGATTCAGAGAAAAATC
    TTTTTTGTTTCACGTTCGATTACAATAATTTCATTACGCAGAACACAGTAATGTCAAA
    GTCAAGCTGGTCGGTCTACACGTATGGAGTCCGTATTAAACGTCGTTTTGTAAACGG
    CCGTTTCTCAAATGAATCAGATACAATTGATATTACGAAGGATATGGAGAAGACATT
    AGAGATGACTGACATTAACTGGCGCGACGGACATGATCTTCGTCAGGACATTATTGA
    TTATGAGATTGTACAGCATATCTTTGAGATCTTCCGCCTGACCGTTCAGATGCGCAA
    TTCGTTGTCCGAGTTAGAAGACCGCGATTACGACCGTTTAATCAGTCCCGTCTTAAA
    CGAAAATAACATCTTCTACGATTCAGCCAAGGCAGGCGATGCCTTGCCAAAGGATG
    CTGACGCAAATGGCGCATACTGTATTGCGTTGAAAGGCCTTTATGAAATCAAGCAAA
    TTACCGAAAACTGGAAAGAAGACGGAAAATTCTCCCGTGATAAGTTGAAAATCTCT
    AATAAGGATTGGTTCGATTTCATCCAAAATAAACGCTATTTGAAACGTCCGGCAGCG
    ACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCA
    GCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCG
    GGCTAA
    SEQ ID NO: 54
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGAACTAATAATTTCCAAAATTTTATAGGCATCTCTTCTTTACAGAAGACTCTT
    CGTAACGCCCTAATCCCGACTGAGACCACACAACAATTCATAGTGAAAAATGGGAT
    CATTAAAGAAGACGAGCTGCGTGGGGAGAACAGGCAGATCCTAAAAGACATAATG
    GACGATTATTATAGAGGGTTCATCTCAGAGACATTATCTAGCATCGACGACATTGAC
    TGGACCTCCCTGTTTGAAAAAATGGAAATCCAGCTGAAGAATGGTGACAATAAAGA
    CACATTAATAAAAGAACAAACAGAGTACAGGAAAGCCATCCACAAGAAGTTCGCAA
    ACGATGACAGATTCAAAAATATGTTCAGTGCGAAGCTAATATCCGACATCTTACCAG
    AGTTTGTAATACACAATAACAATTACAGCGCGAGCGAAAAGGAAGAGAAAACGCA
    AGTAATTAAGCTTTTTAGTAGGTTCGCTACCTCTTTCAAAGATTACTTCAAAAATCGT
    GCTAACTGCTTCTCAGCCGACGACATATCTTCAAGTTCCTGTCACCGTATCGTGAAT
    GATAACGCTGAGATATTCTTCTCAAACGCCCTTGTATACCGTAGGATCGTAAAGTCC
    TTATCTAACGATGATATAAACAAGATCAGTGGAGACATGAAAGACAGCCTTAAAGA
    GATGTCTCTAGAAGAAATTTACTCCTATGAAAAGTATGGGGAGTTTATAACACAGGA
    GGGGATCAGCTTCTACAACGACATCTGCGGAAAGGTGAACAGTTTCATGAATCTTTA
    CTGCCAGAAGAATAAAGAGAACAAAAATCTTTATAAGCTTCAAAAGTTGCACAAAC
    AAATACTGTGCATTGCCGATACATCATATGAGGTCCCCTATAAGTTCGAATCTGATG
    AGGAAGTTTATCAATCTGTTAACGGCTTTCTAGACAATATCAGCTCAAAACACATCG
    TAGAAAGACTGAGGAAAATAGGTGATAATTATAATGGATACAACTTGGATAAAATA
    TATATAGTCTCTAAATTTTACGAGTCAGTATCCCAGAAAACGTATAGGGATTGGGAG
    ACCATCAACACGGCGTTAGAGATTCATTACAATAACATCTTACCGGGAAACGGAAA
    AAGTAAGGCGGACAAAGTAAAGAAAGCCGTTAAAAATGACTTACAAAAGAGTATA
    ACAGAAATAAACGAACTAGTAAGCAACTACAAGCTTTGTTCCGATGATAATATCAA
    GGCCGAGACATATATCCATGAGATCTCCCACATTCTAAACAATTTCGAAGCGCAAGA
    ACTTAAATATAATCCCGAAATCCACCTGGTGGAAAGTGAACTAAAGGCTAGTGAGT
    TAAAGAACGTTCTTGATGTTATCATGAACGCCTTCCATTGGTGCTCTGTTTTTATGAC
    CGAGGAGTTGGTTGATAAAGATAATAATTTCTACGCTGAATTAGAGGAGATATACG
    ACGAAATCTACCCAGTGATTTCACTATACAACTTGGTCAGGAACTATGTTACACAAA
    AGCCGTACAGCACTAAGAAAATTAAGCTAAATTTCGGTATCCCCACGTTAGCCGACG
    GGTGGAGCAAGTCCAAAGAATATTCCAACAATGCGATTATTTTAATGCGTGACAATC
    TTTATTACCTTGGCATCTTCAATGCCAAAAACAAACCTGACAAAAAGATTATAGAAG
    GTAATACGTCCGAGAACAAAGGCGATTACAAGAAGATGATTTATAACCTACTGCCC
    GGACCAAACAAAATGATCCCCAAAGTTTTTCTTAGTTCTAAAACCGGCGTAGAGACG
    TATAAACCTTCTGCCTATATCTTAGAGGGATATAAGCAGAACAAACATATCAAATCT
    TCCAAGGACTTTGATATTACATTCTGCCACGATTTAATTGACTACTTCAAAAATTGCA
    TAGCGATACATCCGGAGTGGAAGAACTTTGGCTTCGACTTCAGTGATACATCCACCT
    ATGAGGATATATCAGGCTTCTATCGTGAGGTCGAATTGCAAGGGTACAAAATCGATT
    GGACGTATATATCCGAGAAAGACATAGACCTTCTTCAAGAAAAGGGGCAGTTATAT
    TTATTCCAAATATACAACAAGGACTTCAGTAAGAAGTCAACAGGTAATGACAACTT
    ACACACCATGTACTTGAAAAATTTATTTTCTGAAGAAAACCTAAAGGACATTGTACT
    AAAACTGAACGGGGAGGCAGAAATTTTTTTTAGAAAGAGCAGCATAAAAAACCCAA
    TAATTCATAAGAAAGGAAGCATTTTAGTTAATAGGACGTACGAGGCAGAGGAAAAG
    GACCAGTTTGGCAATATCCAGATCGTAAGGAAAAATATTCCTGAAAACATATATCA
    GGAACTATATAAATACTTTAACGACAAATCCGACAAAGAATTATCCGACGAGGCTG
    CAAAGCTGAAGAACGTCGTAGGGCACCATGAGGCAGCGACTAATATTGTGAAAGAC
    TATAGGTATACATACGACAAATACTTTCTGCACATGCCCATCACGATTAACTTCAAG
    GCGAACAAGACGGGATTCATTAACGACCGTATATTACAATATATTGCTAAGGAGAA
    AGATCTGCATGTAATAGGTATCGACAGAGGCGAACGTAATTTAATCTACGTGTCCGT
    CATCGACACGTGCGGGAACATCGTAGAGCAAAAGAGTTTTAATATAGTAAATGGCT
    ATGATTACCAAATTAAGCTAAAGCAGCAAGAAGGAGCAAGACAGATAGCTAGGAA
    AGAATGGAAGGAGATAGGAAAAATAAAGGAGATCAAGGAGGGGTATCTTAGCCTA
    GTAATTCATGAAATATCTAAGATGGTTATCAAATACAACGCTATCATAGCGATGGAA
    GACTTATCTTATGGTTTCAAGAAAGGAAGGTTCAAAGTAGAGCGTCAAGTTTATCAA
    AAGTTCGAAACGATGTTGATTAATAAACTAAACTATTTGGTATTTAAAGATATATCT
    ATCACCGAGAATGGTGGTCTACTAAAGGGTTACCAGCTTACATACATACCGGACAA
    ACTTAAAAACGTCGGACATCAGTGTGGATGCATTTTCTACGTTCCAGCTGCATATAC
    CAGCAAGATCGACCCAACGACTGGGTTCGTAAATATTTTTAAATTCAAGGATTTGAC
    TGTCGACGCCAAAAGAGAGTTCATAAAAAAGTTCGATTCAATTAGGTACGACAGCG
    AAAAGAATTTGTTCTGCTTTACTTTTGACTATAACAATTTCATTACTCAGAACACTGT
    AATGTCTAAGTCCTCTTGGTCAGTCTATACTTATGGCGTTCGTATCAAACGTAGATTT
    GTTAACGGTAGATTCTCAAATGAAAGTGATACAATAGATATCACGAAAGATATGGA
    GAAAACATTAGAAATGACAGACATAAACTGGAGAGACGGACATGACTTGAGACAG
    GACATTATTGACTACGAGATCGTGCAGCACATCTTTGAGATCTTTCGTTTGACCGTA
    CAAATGCGTAACAGTTTATCTGAGCTTGAGGACAGGGACTACGATAGATTGATATCA
    CCTGTATTAAATGAGAATAACATCTTCTATGATTCCGCAAAAGCAGGCGACGCTCTA
    CCCAAAGACGCTGATGCGAACGGTGCTTATTGCATAGCTTTAAAGGGTTTGTATGAG
    ATCAAACAGATAACAGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGACAAGCT
    TAAAATATCAAATAAGGACTGGTTCGATTTTATACAGAATAAGCGTTATTAAAACGT
    CCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCG
    GCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAA
    GGTTATTCCGGGCTAA
    SEQ ID NO: 55
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAATGGAACTAATAACTTCCAGAATTTCATTGGTATCTCCTCTTTACAAAAAACTCT
    AAGAAACGCCCTAATTCCGACTGAAACTACACAGCAATTCATCGTCAAAAACGGGA
    TCATTAAGGAGGATGAGTTGAGGGGTGAAAATCGTCAAATTCTTAAAGACATCATG
    GACGACTACTACAGGGGGTTCATCAGCGAGACGTTATCTAGTATAGACGATATAGA
    CTGGACTTCACTGTTCGAGAAGATGGAAATCCAATTAAAAAATGGGGACAATAAAG
    ATACACTTATAAAGGAACAGACAGAGTATAGAAAGGCAATACACAAAAAGTTTGCC
    AACGACGATCGTTTCAAGAACATGTTTAGTGCTAAATTGATTTCAGATATTCTGCCG
    GAATTTGTTATTCACAACAATAATTATAGCGCCAGTGAGAAAGAAGAAAAAACGCA
    GGTTATCAAACTGTTCAGTCGTTTCGCTACATCTTTTAAGGATTACTTTAAAAACCGT
    GCAAATTGTTTTTCAGCCGACGATATTAGTAGCAGCTCTTGTCACCGTATTGTTAATG
    ATAATGCGGAGATTTTCTTTTCAAACGCATTGGTCTACAGGAGGATAGTCAAGTCCC
    TTTCAAATGACGACATTAATAAGATCTCAGGTGACATGAAAGATTCCTTAAAGGAA
    ATGTCCCTGGAAGAGATCTATTCCTATGAAAAGTACGGTGAGTTCATTACTCAAGAG
    GGTATAAGCTTTTACAATGACATATGTGGTAAGGTTAATAGCTTTATGAACCTGTAT
    TGCCAGAAGAACAAAGAAAATAAGAATCTGTATAAGTTGCAAAAGCTACACAAACA
    AATTTTGTGCATTGCCGATACATCATACGAGGTGCCATACAAATTCGAGAGCGATGA
    GGAGGTTTATCAGAGCGTGAATGGATTCCTGGACAATATTAGTAGTAAGCATATCGT
    GGAAAGGCTTAGAAAGATAGGTGACAATTACAATGGCTACAATCTGGATAAAATCT
    ACATCGTCTCAAAATTCTATGAAAGTGTATCCCAGAAGACGTACCGTGATTGGGAAA
    CTATCAACACCGCTCTGGAGATACATTACAACAATATACTTCCCGGAAACGGCAAGT
    CAAAAGCCGACAAAGTCAAAAAAGCGGTCAAGAACGATTTACAAAAGTCTATCACT
    GAAATTAATGAATTAGTTAGTAATTACAAACTGTGTAGTGATGATAATATTAAGGCA
    GAGACTTACATACACGAAATTTCACACATTTTAAACAACTTCGAGGCACAGGAACTT
    AAATATAATCCTGAAATTCACCTGGTTGAAAGTGAATTGAAAGCCAGCGAGCTAAA
    GAACGTTTTGGACGTAATCATGAACGCATTCCACTGGTGCTCTGTCTTTATGACAGA
    GGAACTAGTGGATAAGGACAATAATTTTTATGCGGAGCTGGAGGAAATATACGATG
    AGATATATCCCGTAATATCATTATATAATCTGGTAAGAAACTATGTGACTCAAAAGC
    CGTATAGCACCAAGAAAATTAAACTTAATTTCGGCATACCCACTTTAGCGGACGGCT
    GGTCAAAATCCAAAGAGTATAGTAATAATGCCATCATCCTGATGCGTGACAACCTGT
    ACTATTTAGGTATATTTAACGCCAAAAATAAACCCGACAAAAAGATTATAGAGGGC
    AACACCTCAGAGAACAAAGGTGATTATAAGAAGATGATTTACAACCTTTTACCCGGT
    CCTAATAAGATGATTCCCAAAGTCTTTCTATCTAGCAAAACTGGTGTTGAAACATAC
    AAACCCTCAGCTTATATTTTAGAAGGGTATAAGCAGAATAAGCATATTAAAAGCTCC
    AAAGATTTCGATATTACCTTTTGCCATGACTTGATAGACTATTTCAAAAATTGTATTG
    CCATTCACCCTGAATGGAAAAACTTCGGATTTGACTTCTCTGACACATCCACCTACG
    AAGACATTTCAGGTTTTTACAGGGAAGTCGAGCTACAGGGTTATAAAATTGATTGGA
    CATACATCAGCGAGAAAGATATTGACCTACTTCAAGAAAAAGGGCAGCTATACCTG
    TTCCAGATATACAATAAAGACTTCAGTAAAAAAAGCACCGGGAACGATAATCTTCA
    CACAATGTACTTAAAAAATTTATTTAGTGAAGAGAATCTGAAGGATATAGTGCTGAA
    GTTAAACGGGGAGGCAGAGATATTTTTTAGAAAATCTAGTATTAAGAATCCGATCAT
    CCACAAGAAGGGTTCTATCCTTGTTAATAGGACTTATGAGGCAGAAGAAAAAGACC
    AATTCGGCAACATACAAATTGTCCGTAAAAATATCCCTGAGAACATTTATCAGGAAC
    TATACAAGTACTTCAATGATAAAAGCGACAAGGAGCTGAGCGACGAGGCTGCTAAG
    TTAAAGAATGTGGTGGGCCACCATGAGGCAGCAACGAATATTGTGAAGGACTATCG
    TTATACCTACGATAAATACTTTCTTCATATGCCGATCACCATTAATTTCAAGGCAAAC
    AAAACTGGCTTCATTAACGATCGTATCTTACAATATATCGCAAAAGAGAAAGACCTT
    CACGTTATCGGGATCGATAGAGGCGAGCGTAACCTAATTTATGTTTCTGTGATAGAC
    ACCTGTGGGAACATAGTCGAACAGAAATCATTTAATATTGTTAACGGCTACGATTAT
    CAGATAAAGTTGAAGCAACAAGAGGGTGCACGTCAAATAGCAAGGAAAGAATGGA
    AAGAAATAGGCAAGATTAAAGAAATAAAAGAAGGTTATTTATCCCTTGTAATACAC
    GAAATTAGCAAAATGGTGATTAAATATAATGCGATCATTGCCATGGAGGATCTTTCT
    TACGGCTTCAAAAAGGGGAGATTCAAAGTCGAGAGGCAGGTGTATCAGAAGTTTGA
    GACCATGCTAATCAATAAACTAAATTATCTAGTATTCAAAGACATAAGCATCACCGA
    AAATGGCGGCTTGTTGAAGGGTTATCAATTGACCTACATCCCAGATAAACTAAAAA
    ACGTAGGGCATCAATGCGGATGTATATTTTACGTTCCAGCCGCATACACTTCCAAAA
    TCGATCCAACTACGGGTTTTGTGAACATCTTCAAATTCAAAGACTTGACTGTCGATG
    CTAAGAGGGAGTTTATCAAGAAATTTGACTCCATTAGATACGACAGTGAGAAGAAT
    CTGTTCTGTTTTACCTTTGATTATAACAACTTTATAACTCAAAACACAGTCATGAGTA
    AGTCATCTTGGTCAGTGTATACGTATGGTGTGAGGATTAAAAGGAGGTTTGTTAACG
    GGAGATTTTCCAATGAAAGTGATACAATAGATATAACCAAGGACATGGAAAAGACT
    CTTGAAATGACCGACATTAACTGGAGAGATGGCCACGACTTACGTCAAGATATAAT
    CGATTACGAGATAGTGCAACATATCTTTGAGATATTTAGGCTTACTGTCCAAATGCG
    TAACTCATTAAGTGAGTTGGAGGACAGGGATTACGATAGGCTAATAAGTCCTGTTCT
    TAACGAAAACAATATATTCTACGATTCAGCAAAGGCGGGAGACGCCCTGCCCAAGG
    ACGCGGATGCTAACGGCGCATACTGTATTGCCCTGAAAGGCTTGTACGAGATAAAA
    CAGATCACGGAGAACTGGAAAGAAGATGGAAAATTCAGTCGTGACAAGTTAAAAAT
    TAGTAACAAAGACTGGTTCGACTTTATTCAGAACAAGAGATATCTGAAACGTCCGGC
    AGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCA
    GGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTA
    TTCCGGGCTAA
    SEQ ID NO: 56
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAACGGAACCAATAACTTTCAAAACTTTATAGGCATCTCCAGTCTACAGAAGACACT
    ACGTAACGCTTTGATACCAACTGAGACCACGCAGCAGTTTATCGTCAAGAACGGTAT
    TATAAAGGAAGACGAGCTAAGGGGGGAAAACCGTCAGATCTTAAAGGACATCATGG
    ATGACTACTACAGAGGCTTCATAAGTGAGACTTTGTCTAGTATAGACGACATCGACT
    GGACCAGTTTATTTGAGAAGATGGAAATTCAGTTAAAGAACGGGGACAATAAAGAC
    ACACTAATTAAAGAGCAGACCGAATACAGAAAAGCTATACACAAAAAGTTTGCCAA
    CGATGATAGATTCAAAAATATGTTTTCAGCAAAATTGATTTCCGACATATTGCCAGA
    ATTCGTAATCCATAATAACAATTATTCTGCAAGTGAGAAGGAAGAGAAGACCCAAG
    TAATCAAGCTGTTTTCCCGTTTTGCTACGAGTTTCAAAGATTATTTCAAGAATAGGGC
    TAATTGTTTCTCCGCGGACGACATAAGTAGCAGTTCCTGTCACAGGATTGTGAACGA
    TAATGCTGAGATATTTTTTTCCAATGCCCTAGTGTATAGGAGAATAGTTAAAAGCTT
    AAGCAACGACGATATCAATAAAATTTCAGGGGACATGAAGGACAGCTTAAAGGAAA
    TGAGTTTGGAGGAGATTTACAGTTATGAAAAATACGGAGAGTTTATAACTCAGGAA
    GGCATCTCTTTCTATAATGATATCTGTGGGAAGGTAAACTCCTTCATGAATTTATATT
    GCCAGAAGAATAAGGAAAACAAAAATCTTTACAAGCTTCAAAAGTTACATAAGCAG
    ATCTTATGTATTGCCGACACGAGTTATGAAGTGCCTTATAAATTCGAGAGTGATGAG
    GAAGTGTATCAGTCTGTTAACGGATTCCTAGATAATATAAGTTCCAAACATATAGTC
    GAGAGGCTGAGGAAGATTGGCGATAACTATAATGGATATAATCTTGACAAAATCTA
    TATAGTCTCTAAATTTTATGAAAGCGTCAGCCAGAAGACATATAGAGATTGGGAAA
    CTATAAACACAGCCCTTGAAATACATTACAATAACATCCTACCCGGCAATGGTAAGT
    CTAAGGCAGACAAAGTTAAAAAAGCAGTAAAGAATGACTTACAGAAGTCAATCACG
    GAGATAAATGAGTTGGTCAGTAACTACAAATTATGCTCCGACGATAATATTAAGGCC
    GAAACATATATACACGAGATAAGTCATATATTAAACAATTTCGAAGCCCAGGAGTT
    AAAATATAACCCTGAAATTCATCTGGTCGAAAGTGAGTTAAAGGCCAGTGAGTTAA
    AGAATGTACTTGACGTAATTATGAATGCTTTTCATTGGTGCTCCGTGTTCATGACCGA
    GGAGTTAGTAGATAAAGACAATAACTTTTACGCCGAACTTGAAGAGATATACGACG
    AGATTTATCCGGTAATCAGCTTGTACAACTTAGTTAGAAATTATGTAACACAGAAGC
    CTTACTCTACTAAAAAAATAAAACTGAACTTTGGTATCCCAACTCTTGCAGATGGTT
    GGAGTAAAAGCAAGGAATATAGCAACAATGCGATCATCTTGATGAGAGACAACTTG
    TACTATTTGGGAATCTTCAACGCGAAAAATAAACCCGACAAAAAAATCATCGAAGG
    GAATACCTCTGAGAATAAAGGTGACTATAAGAAAATGATTTACAATCTACTTCCTGG
    TCCTAATAAAATGATCCCGAAAGTGTTTCTTAGTTCTAAGACTGGTGTCGAGACGTA
    CAAACCTAGCGCGTACATCTTAGAAGGGTACAAGCAGAATAAACACATCAAATCAA
    GCAAAGACTTCGATATTACTTTTTGCCATGACTTGATAGACTACTTTAAAAACTGCA
    TAGCAATCCACCCGGAGTGGAAAAACTTTGGCTTTGATTTCTCTGACACCTCTACAT
    ATGAGGACATATCTGGTTTTTACCGTGAGGTTGAATTGCAGGGATACAAAATTGACT
    GGACTTACATATCTGAAAAAGATATCGATCTATTGCAGGAGAAAGGCCAGCTTTACC
    TTTTCCAGATCTATAATAAGGACTTCTCTAAGAAGTCTACAGGGAATGATAATTTGC
    ACACTATGTACTTAAAAAATCTGTTTTCCGAGGAAAACTTGAAAGACATTGTTTTAA
    AGTTGAACGGAGAAGCTGAAATATTTTTCAGAAAGAGCTCCATAAAAAACCCGATC
    ATTCATAAGAAGGGATCTATCCTGGTTAACAGAACGTACGAAGCGGAAGAAAAAGA
    CCAATTCGGAAACATTCAAATTGTTAGAAAGAATATCCCTGAGAACATCTACCAGG
    AGTTATATAAGTATTTTAATGATAAGTCAGATAAGGAACTATCTGACGAAGCGGCG
    AAGCTTAAAAATGTTGTAGGACACCATGAGGCTGCTACAAATATAGTCAAGGACTA
    CCGTTATACCTACGATAAGTACTTTCTACACATGCCCATTACCATCAATTTTAAAGCT
    AATAAAACGGGTTTTATCAACGATCGTATCCTACAATATATTGCGAAAGAGAAGGA
    TTTGCATGTCATTGGCATTGATAGAGGTGAGAGGAACCTAATATACGTATCCGTGAT
    TGATACGTGCGGGAACATAGTTGAACAGAAATCATTTAATATAGTTAATGGGTACG
    ACTATCAGATTAAGCTAAAGCAACAAGAAGGCGCCAGGCAAATTGCCCGTAAAGAA
    TGGAAAGAGATCGGGAAGATCAAGGAAATAAAAGAAGGATACCTTTCCCTGGTCAT
    CCATGAAATTAGCAAAATGGTGATTAAGTACAATGCCATAATCGCGATGGAGGACT
    TAAGCTACGGGTTCAAAAAGGGGAGGTTTAAGGTGGAGAGGCAAGTGTACCAGAAA
    TTTGAGACCATGCTAATCAACAAACTGAACTACCTAGTTTTTAAGGACATTTCAATT
    ACAGAGAATGGAGGACTTTTAAAGGGTTACCAACTAACGTATATACCAGATAAGTT
    GAAAAATGTCGGTCACCAGTGTGGCTGCATCTTTTACGTTCCCGCCGCTTATACATCT
    AAAATTGATCCAACCACAGGCTTTGTAAATATCTTTAAATTCAAAGATTTAACTGTG
    GATGCAAAAAGAGAGTTTATCAAGAAATTCGATAGCATTCGTTATGATAGCGAGAA
    GAACCTGTTCTGCTTTACTTTCGACTATAACAACTTTATAACTCAAAACACCGTGATG
    TCAAAAAGCTCATGGTCAGTCTACACCTATGGTGTAAGGATTAAAAGGCGTTTCGTG
    AATGGGAGATTCTCCAATGAAAGTGACACGATCGACATAACAAAGGACATGGAGAA
    GACACTAGAGATGACTGATATTAATTGGAGAGACGGACACGATCTGCGTCAAGATA
    TAATTGATTATGAGATAGTACAGCACATATTTGAGATCTTCCGTTTGACTGTCCAAA
    TGCGTAATTCCCTTTCTGAGCTGGAAGATAGGGACTATGATAGATTAATATCCCCTG
    TACTAAATGAGAACAACATTTTCTATGATAGTGCAAAAGCCGGGGATGCATTGCCG
    AAAGACGCTGACGCTAATGGGGCGTACTGTATAGCTTTAAAGGGGCTTTACGAAAT
    AAAGCAGATAACCGAAAACTGGAAGGAAGATGGCAAATTCTCAAGGGACAAACTT
    AAGATCTCTAACAAGGATTGGTTCGATTTTATACAAAACAAACGTTATTTGAAACGT
    CCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCG
    GCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAA
    GGTTATTCCGGGCTAA
    SEQ ID NO: 57
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAATGGTACAAACAACTTTCAGAATTTCATTGGGATCTCTAGCTTACAGAAGACCCT
    GAGGAATGCGTTGATTCCAACTGAAACAACCCAGCAATTCATCGTGAAAAATGGGA
    TAATCAAAGAGGATGAGTTAAGGGGTGAAAACCGTCAAATATTGAAGGATATTATG
    GACGACTACTACCGTGGATTCATCTCAGAGACGTTGAGCAGCATTGACGACATAGA
    CTGGACTAGCCTTTTCGAGAAGATGGAAATTCAGTTAAAGAACGGAGATAACAAAG
    ATACACTAATCAAGGAACAGACAGAATACAGAAAAGCAATTCATAAGAAATTCGCT
    AATGACGATCGTTTTAAAAACATGTTCTCTGCAAAATTAATTAGCGACATTCTGCCG
    GAATTCGTTATACATAATAATAACTACAGTGCTTCTGAAAAGGAAGAGAAAACTCA
    GGTAATAAAACTGTTCTCTCGTTTTGCCACATCCTTCAAAGACTACTTTAAAAATAG
    AGCGAACTGCTTTAGCGCCGACGATATTAGTTCTTCCTCATGCCACAGGATTGTCAA
    CGATAATGCAGAGATATTCTTTTCTAACGCACTAGTCTACAGAAGGATTGTAAAGTC
    TTTGTCAAATGATGACATAAACAAGATTAGTGGAGATATGAAAGACTCTCTAAAGG
    AAATGAGCCTTGAGGAGATATACTCTTATGAAAAGTACGGTGAGTTTATTACCCAAG
    AAGGCATTAGTTTCTATAATGACATTTGTGGAAAAGTTAACAGTTTTATGAATCTAT
    ACTGTCAAAAAAATAAGGAGAATAAAAATCTTTATAAGTTGCAAAAACTGCATAAG
    CAGATATTATGTATAGCAGACACGAGCTATGAGGTACCGTACAAGTTCGAGAGCGA
    TGAGGAAGTCTACCAATCTGTCAACGGATTTTTGGACAACATTTCTTCAAAACATAT
    TGTGGAGAGGCTTAGGAAAATAGGCGACAATTATAATGGATATAACTTAGATAAGA
    TATATATTGTTTCCAAATTCTACGAATCTGTAAGCCAGAAGACATACAGAGATTGGG
    AAACGATAAACACAGCCCTTGAAATTCACTATAACAACATACTACCTGGAAACGGC
    AAATCAAAGGCCGACAAAGTTAAGAAGGCCGTAAAGAATGATTTACAGAAGAGCAT
    AACGGAGATCAATGAGCTGGTGTCTAACTATAAATTGTGTAGCGATGACAACATAA
    AAGCCGAGACTTACATTCACGAAATTTCACACATACTTAACAACTTTGAAGCTCAGG
    AATTAAAGTATAATCCCGAAATACACCTTGTGGAGTCCGAACTAAAGGCTAGTGAG
    CTTAAGAACGTCCTAGACGTAATTATGAATGCCTTCCACTGGTGTAGTGTTTTTATGA
    CCGAGGAACTTGTTGACAAAGATAATAATTTTTATGCAGAACTAGAAGAGATATAC
    GATGAAATATACCCGGTGATCAGTTTGTACAATCTTGTCAGGAACTATGTGACACAA
    AAGCCCTATTCAACAAAGAAAATAAAACTTAATTTCGGAATTCCTACGTTAGCTGAT
    GGCTGGTCTAAATCCAAGGAATACAGCAACAACGCTATAATTCTGATGAGAGATAA
    CTTGTACTATCTAGGCATCTTCAATGCCAAAAATAAGCCTGATAAGAAGATTATAGA
    GGGCAACACTTCAGAGAACAAGGGCGACTACAAGAAAATGATCTATAACCTATTGC
    CTGGCCCAAACAAGATGATTCCGAAGGTCTTCCTATCATCCAAGACCGGCGTTGAGA
    CATACAAGCCATCAGCGTATATTTTAGAGGGGTACAAACAAAACAAGCACATAAAG
    TCTAGTAAAGACTTCGATATAACATTTTGTCATGACTTAATTGACTACTTTAAGAATT
    GCATCGCTATACACCCGGAATGGAAGAATTTCGGCTTCGACTTCTCTGATACATCTA
    CCTACGAGGACATTAGCGGGTTTTACCGTGAAGTCGAATTACAAGGGTATAAGATA
    GATTGGACGTACATCTCTGAGAAAGACATAGACTTGCTTCAGGAAAAGGGCCAGTT
    GTATCTATTCCAAATATACAATAAGGATTTTTCCAAGAAATCTACGGGTAATGACAA
    TCTTCACACAATGTATCTTAAGAACCTTTTCTCAGAAGAGAACCTGAAGGACATTGT
    CTTAAAACTAAATGGCGAAGCTGAGATTTTTTTCAGGAAGTCTTCAATTAAGAACCC
    GATAATCCACAAGAAGGGGAGTATTCTTGTGAATAGAACTTACGAGGCCGAAGAAA
    AAGACCAATTTGGTAACATCCAGATAGTCAGAAAGAACATTCCAGAGAACATCTAC
    CAAGAGCTATACAAATATTTCAACGACAAGTCCGATAAGGAACTGTCCGATGAGGC
    AGCCAAGTTGAAGAATGTCGTGGGTCATCATGAAGCTGCTACTAACATTGTCAAGG
    ACTATCGTTATACTTACGACAAGTATTTCCTACACATGCCGATAACAATTAATTTCA
    AGGCTAACAAAACAGGCTTTATCAACGATCGTATCTTGCAGTACATAGCTAAGGAA
    AAGGATTTGCATGTGATTGGCATTGATAGAGGGGAGCGTAACTTGATATATGTGTCT
    GTCATAGACACGTGTGGCAACATCGTCGAACAGAAATCATTCAACATAGTAAACGG
    CTACGATTACCAAATTAAGCTGAAACAGCAAGAGGGTGCACGTCAAATTGCGCGTA
    AAGAGTGGAAAGAAATTGGTAAAATCAAGGAAATTAAAGAAGGCTACTTGTCTCTT
    GTTATACATGAAATTTCCAAGATGGTTATAAAGTATAACGCGATAATTGCTATGGAA
    GACTTATCATACGGGTTTAAAAAGGGGAGGTTCAAGGTAGAGAGGCAGGTCTATCA
    AAAGTTCGAGACGATGTTGATTAATAAACTAAACTATCTAGTGTTCAAAGATATCAG
    CATTACGGAGAACGGGGGGCTACTGAAAGGATATCAACTAACGTACATTCCCGATA
    AGTTAAAGAACGTTGGTCATCAATGTGGTTGCATCTTCTACGTGCCTGCTGCCTATA
    CGTCCAAAATAGATCCAACTACTGGATTTGTTAACATCTTTAAATTCAAAGATTTAA
    CCGTAGACGCCAAAAGGGAATTTATAAAAAAATTTGACAGCATCCGTTACGATAGC
    GAAAAGAATCTGTTCTGTTTTACTTTCGACTACAATAATTTCATCACGCAAAATACG
    GTAATGTCTAAGTCAAGTTGGAGCGTCTACACGTATGGAGTCAGGATCAAGAGGCG
    TTTCGTAAATGGAAGATTCTCTAATGAGTCAGATACTATAGACATCACGAAAGATAT
    GGAGAAAACCTTGGAGATGACGGATATTAACTGGCGTGATGGACACGATTTAAGAC
    AGGACATTATTGACTATGAGATTGTGCAACACATCTTCGAAATATTCCGTCTAACAG
    TCCAAATGAGGAATAGCCTAAGTGAATTGGAGGACCGTGATTACGATAGGCTTATA
    AGTCCTGTCCTTAACGAAAACAATATTTTCTATGATAGTGCTAAGGCGGGGGACGCA
    CTGCCTAAAGACGCAGATGCTAACGGGGCATACTGCATTGCGTTAAAGGGTCTGTAC
    GAAATCAAGCAGATTACGGAAAACTGGAAAGAGGATGGCAAGTTTAGCAGAGATA
    AGTTGAAGATAAGTAACAAAGATTGGTTTGACTTTATTCAGAATAAAAGGTATTTAA
    AACGTCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGG
    TAGCGGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAA
    CGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 58
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAACGGCACTAATAATTTCCAGAATTTCATCGGCATTAGCAGCTTACAAAAGACGTT
    GAGGAATGCCTTAATACCCACAGAAACTACTCAACAATTTATAGTGAAGAATGGGA
    TAATTAAGGAAGACGAGTTGAGAGGTGAAAATAGGCAAATCTTGAAAGACATTATG
    GATGACTACTACAGGGGCTTCATTAGTGAAACGTTGTCTTCAATAGATGACATTGAT
    TGGACTTCTTTGTTTGAGAAGATGGAAATACAGTTAAAGAACGGCGACAATAAGGA
    TACACTTATCAAAGAGCAAACAGAATATAGAAAAGCAATTCACAAAAAGTTTGCTA
    ACGATGATAGGTTCAAGAACATGTTTAGCGCTAAACTAATATCAGACATCCTTCCCG
    AGTTCGTTATTCATAACAATAACTATAGTGCAAGTGAAAAAGAGGAGAAGACACAG
    GTGATTAAGCTGTTCTCCAGATTCGCGACTTCTTTCAAAGATTACTTCAAAAACAGA
    GCCAACTGTTTTTCAGCTGACGATATCTCTAGTAGTAGTTGTCACCGTATAGTGAAC
    GATAACGCTGAGATCTTCTTTAGCAATGCATTAGTGTATAGAAGGATAGTTAAGTCT
    CTAAGCAATGATGATATCAATAAAATTTCCGGAGACATGAAGGACTCCCTAAAGGA
    AATGTCCTTAGAAGAGATCTACTCATATGAGAAATACGGGGAATTTATTACGCAGG
    AAGGGATCTCCTTTTACAATGACATATGCGGGAAGGTCAACTCTTTCATGAACTTAT
    ACTGCCAAAAGAACAAGGAGAACAAGAATTTATATAAACTTCAGAAACTTCACAAA
    CAAATACTGTGCATAGCCGATACCTCATATGAGGTTCCTTACAAATTTGAATCAGAT
    GAAGAGGTATACCAATCCGTTAACGGCTTTCTTGACAATATTAGCTCAAAGCACATC
    GTGGAGAGGTTGAGAAAGATTGGTGATAATTATAATGGCTACAATCTAGATAAGAT
    ATATATTGTTAGCAAGTTCTACGAGTCTGTGTCCCAAAAAACATATAGGGATTGGGA
    GACAATTAATACTGCTCTAGAAATCCATTACAACAACATCCTTCCTGGAAATGGCAA
    GAGTAAGGCCGACAAAGTCAAGAAAGCAGTGAAAAATGATCTGCAAAAATCAATTA
    CTGAGATAAACGAGCTAGTATCTAATTACAAGCTTTGTAGCGACGATAACATTAAGG
    CAGAAACGTACATACACGAGATTAGTCACATCTTAAATAATTTTGAAGCCCAAGAA
    CTGAAATATAACCCTGAGATACACCTTGTTGAATCCGAGTTAAAGGCGTCTGAACTA
    AAAAACGTGTTAGACGTTATTATGAATGCCTTCCACTGGTGTAGCGTCTTTATGACT
    GAGGAGTTGGTTGATAAGGATAATAACTTTTACGCTGAATTGGAAGAAATTTATGAC
    GAAATCTATCCTGTTATTTCTCTATATAATTTGGTGAGAAATTACGTAACGCAAAAG
    CCCTATAGTACGAAAAAAATAAAACTAAATTTCGGGATCCCTACCCTAGCCGACGGT
    TGGTCTAAATCCAAGGAGTACTCAAACAATGCAATAATATTGATGAGGGACAACCT
    GTACTACCTAGGCATATTTAATGCCAAAAATAAGCCCGATAAAAAGATTATAGAAG
    GGAACACGTCAGAAAATAAAGGAGACTATAAGAAAATGATCTACAACCTTTTGCCC
    GGCCCCAATAAAATGATCCCGAAGGTCTTCCTAAGTAGCAAGACTGGCGTAGAGAC
    CTACAAACCATCTGCATACATTTTGGAGGGGTACAAGCAAAACAAGCACATAAAGA
    GTAGTAAGGATTTTGACATTACATTCTGCCATGACTTAATTGACTACTTTAAAAATTG
    CATCGCAATTCACCCTGAATGGAAAAATTTTGGATTTGATTTCTCTGATACTTCAACA
    TATGAGGATATTTCAGGGTTCTACAGGGAGGTCGAACTACAGGGTTACAAAATAGA
    CTGGACGTATATTTCTGAGAAAGATATAGATTTGCTTCAGGAAAAGGGTCAGCTATA
    TCTGTTCCAGATATATAATAAGGACTTCTCCAAAAAGAGTACCGGAAATGATAATCT
    GCACACAATGTACTTAAAAAACTTGTTCTCTGAGGAGAATCTAAAAGACATCGTACT
    AAAACTTAACGGGGAGGCCGAAATTTTTTTTAGGAAGTCCAGCATCAAGAACCCGA
    TTATTCATAAAAAAGGTAGCATTTTGGTGAACCGTACTTATGAGGCGGAAGAAAAA
    GACCAATTCGGTAATATTCAAATCGTTAGAAAGAACATCCCTGAGAACATTTATCAG
    GAACTATACAAATACTTTAACGACAAATCAGATAAGGAGCTTTCTGATGAGGCAGC
    TAAATTGAAAAATGTAGTGGGACATCACGAAGCAGCCACTAACATAGTGAAGGACT
    ACAGATACACATACGATAAGTACTTCCTGCACATGCCTATTACAATTAACTTTAAAG
    CAAATAAAACAGGGTTTATTAACGACAGAATCTTACAGTATATTGCCAAAGAAAAG
    GATCTGCATGTGATAGGAATAGACAGAGGAGAAAGAAACCTGATATACGTCTCCGT
    GATTGATACATGTGGGAACATAGTAGAACAGAAGTCCTTTAACATTGTTAATGGGTA
    CGATTATCAAATTAAATTAAAACAACAAGAAGGAGCACGTCAAATAGCTAGGAAAG
    AATGGAAAGAGATAGGAAAAATTAAGGAAATTAAGGAGGGTTACCTGTCCCTTGTA
    ATTCATGAAATATCCAAAATGGTAATTAAATATAACGCGATCATCGCGATGGAAGA
    TCTAAGCTACGGGTTCAAAAAAGGCAGGTTTAAGGTGGAGAGGCAAGTTTACCAAA
    AGTTCGAGACAATGTTGATTAATAAGTTAAACTACTTAGTTTTCAAAGATATCTCCA
    TAACCGAGAATGGCGGGCTTTTAAAAGGGTACCAACTAACATATATCCCGGATAAA
    TTGAAGAACGTTGGACACCAGTGTGGCTGCATATTTTATGTACCCGCTGCGTATACT
    TCTAAAATTGACCCGACCACCGGGTTTGTAAACATATTCAAGTTTAAGGACCTAACA
    GTTGACGCCAAACGTGAGTTCATCAAGAAGTTCGATAGTATAAGGTATGACTCTGAG
    AAGAACCTTTTCTGCTTCACGTTTGACTATAATAATTTCATCACCCAAAATACAGTTA
    TGTCAAAAAGCTCTTGGTCAGTATATACGTATGGCGTAAGGATTAAGCGTAGGTTCG
    TGAACGGTAGATTTTCCAACGAGTCAGATACTATTGATATTACCAAGGATATGGAGA
    AGACATTAGAAATGACAGATATAAATTGGAGGGATGGGCACGATCTAAGGCAAGAT
    ATCATTGATTACGAAATTGTTCAGCACATATTCGAGATATTCCGTCTTACAGTACAA
    ATGCGTAACAGCTTGTCTGAGTTGGAAGATCGTGACTATGACAGGTTGATATCACCG
    GTCTTGAACGAGAACAATATATTCTACGACAGCGCTAAGGCGGGAGACGCTCTGCC
    TAAAGACGCAGATGCCAATGGGGCGTACTGCATTGCCTTAAAAGGCTTATACGAGA
    TTAAACAGATCACAGAGAACTGGAAAGAGGACGGCAAGTTTTCTAGAGATAAATTG
    AAAATCTCAAACAAAGACTGGTTCGATTTCATCCAAAACAAAAGATACCTTAAACG
    TCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGC
    GGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTA
    AGGTTATTCCGGGCTAA
    SEQ ID NO: 59
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAATGGAACTAACAACTTCCAGAACTTTATCGGCATCTCTTCCCTCCAAAAGACACT
    GAGAAATGCACTGATCCCAACCGAAACGACTCAACAATTTATTGTTAAGAACGGCA
    TCATAAAAGAAGACGAGCTTCGCGGCGAGAACCGCCAGATACTTAAGGATATTATG
    GACGATTATTACCGAGGCTTTATCAGCGAAACTCTTAGCTCTATTGATGATATCGAC
    TGGACCTCCCTCTTCGAAAAAATGGAGATACAGCTCAAGAACGGCGATAATAAAGA
    CACCTTGATAAAGGAACAGACTGAGTACAGGAAAGCGATCCACAAGAAATTCGCGA
    ACGACGACAGGTTTAAAAACATGTTCTCTGCAAAATTGATATCCGACATCTTGCCGG
    AATTTGTGATACACAACAATAACTATAGCGCTTCAGAGAAAGAAGAGAAGACCCAA
    GTAATCAAGTTGTTCAGCCGCTTCGCAACGTCTTTTAAAGATTACTTTAAGAACCGG
    GCCAATTGTTTCTCCGCGGATGATATTAGCTCATCAAGTTGCCATCGAATTGTCAAT
    GATAATGCGGAGATCTTCTTCAGCAATGCGCTGGTCTACAGACGAATCGTAAAAAGT
    CTTTCAAATGACGACATCAATAAGATTAGTGGAGATATGAAGGATTCCCTTAAGGA
    AATGAGTCTTGAAGAAATATACTCATACGAAAAGTACGGGGAATTTATTACCCAGG
    AGGGGATCTCCTTCTATAACGACATCTGTGGAAAAGTAAACTCATTCATGAACCTGT
    ACTGTCAGAAAAACAAAGAAAACAAAAATCTGTATAAACTCCAAAAATTGCACAAG
    CAAATATTGTGTATAGCGGACACATCATACGAGGTTCCATATAAGTTCGAAAGTGAT
    GAAGAAGTCTACCAATCAGTGAATGGGTTTCTGGACAACATTAGTTCCAAGCACATA
    GTTGAACGACTGCGAAAGATTGGTGACAATTACAACGGCTATAATTTGGACAAGAT
    TTATATAGTTAGCAAATTTTATGAATCCGTATCACAAAAGACTTATAGAGACTGGGA
    AACAATCAACACGGCACTTGAGATCCATTATAACAATATTCTTCCAGGGAACGGCA
    AAAGCAAGGCTGATAAGGTAAAAAAGGCCGTTAAGAATGATCTTCAAAAATCCATA
    ACGGAGATCAACGAACTTGTAAGTAACTACAAATTGTGCTCTGACGACAATATAAA
    GGCTGAAACGTATATTCACGAGATTAGCCATATCCTGAATAACTTTGAGGCCCAAGA
    ACTCAAGTATAACCCGGAAATACATTTGGTAGAAAGCGAGCTTAAAGCGAGTGAGC
    TGAAAAACGTCCTCGATGTGATCATGAATGCTTTCCACTGGTGTAGTGTCTTTATGA
    CTGAGGAGTTGGTTGATAAAGACAATAATTTCTACGCTGAACTGGAAGAAATTTACG
    ACGAAATCTATCCAGTGATCTCCCTCTATAACCTCGTTCGAAACTACGTGACGCAGA
    AACCTTATTCTACAAAGAAAATTAAGTTGAACTTCGGCATTCCTACACTTGCTGACG
    GATGGTCCAAATCCAAAGAGTACTCAAACAACGCAATCATCCTCATGCGGGATAAC
    CTTTATTATTTGGGCATTTTCAACGCCAAAAACAAACCTGATAAAAAGATAATTGAA
    GGCAATACGAGTGAGAACAAGGGCGACTACAAAAAAATGATATATAACTTGTTGCC
    AGGCCCCAACAAGATGATTCCTAAAGTTTTTCTGTCTTCTAAGACTGGAGTTGAAAC
    TTACAAACCCTCCGCCTACATTCTTGAAGGGTATAAACAGAATAAGCACATAAAGTC
    CTCAAAGGATTTCGACATTACGTTTTGCCATGACCTCATCGACTATTTCAAGAACTGT
    ATCGCCATACATCCGGAGTGGAAGAATTTTGGATTTGATTTCTCCGACACATCTACC
    TATGAAGACATAAGCGGTTTCTACCGGGAGGTCGAGCTTCAGGGCTATAAGATAGA
    TTGGACATACATTAGTGAAAAAGATATCGATCTTCTGCAAGAAAAGGGACAACTTT
    ACCTTTTTCAGATTTATAATAAAGACTTTTCAAAAAAGTCCACAGGGAACGATAATC
    TGCACACCATGTATCTCAAGAATCTGTTTAGTGAAGAAAACCTTAAAGACATAGTTT
    TGAAGCTTAACGGAGAGGCTGAGATTTTTTTTAGAAAGTCCTCAATTAAAAACCCTA
    TAATACACAAGAAAGGCTCTATTCTTGTTAACAGGACATATGAAGCCGAGGAGAAA
    GATCAGTTTGGCAATATCCAGATTGTTCGCAAGAATATCCCGGAAAATATATATCAG
    GAGCTGTATAAATACTTTAACGACAAGAGCGACAAGGAGCTGAGTGACGAGGCCGC
    GAAGCTTAAGAATGTAGTAGGTCACCACGAAGCAGCCACCAATATCGTCAAAGACT
    ATAGGTACACGTACGACAAGTACTTTTTGCACATGCCTATAACTATAAACTTCAAAG
    CTAATAAAACTGGGTTTATTAATGACAGGATTCTCCAATACATCGCTAAAGAGAAGG
    ATCTGCATGTAATTGGCATAGACAGAGGTGAGAGAAACTTGATATATGTCAGCGTA
    ATAGACACATGTGGCAATATCGTGGAACAGAAGTCTTTTAACATCGTCAATGGTTAC
    GACTACCAAATTAAGTTGAAACAGCAGGAAGGCGCACGACAGATCGCACGAAAGG
    AATGGAAAGAGATAGGCAAAATAAAAGAAATAAAGGAGGGCTATCTCAGTCTCGTT
    ATACACGAAATTTCAAAAATGGTTATTAAGTACAATGCAATCATAGCGATGGAGGA
    TCTCAGTTATGGGTTCAAAAAGGGTCGGTTTAAAGTTGAGCGCCAAGTGTACCAAAA
    GTTCGAGACAATGCTGATTAACAAGCTGAACTACCTCGTCTTCAAAGATATAAGTAT
    TACGGAGAACGGTGGCCTTCTTAAAGGCTATCAACTTACTTACATCCCGGACAAGCT
    CAAAAACGTAGGGCACCAATGCGGGTGTATTTTCTATGTGCCTGCGGCATATACGTC
    AAAGATTGACCCAACCACAGGATTCGTAAACATATTCAAGTTTAAGGACCTCACCGT
    TGATGCGAAAAGGGAGTTCATTAAAAAATTTGATTCTATTCGATATGATAGTGAGAA
    AAATCTCTTTTGTTTCACATTTGACTATAATAATTTTATTACTCAGAATACTGTCATG
    AGCAAGTCATCTTGGTCAGTGTACACATACGGGGTGCGGATCAAACGCAGGTTCGTC
    AATGGTCGCTTCTCAAACGAATCAGACACCATTGACATCACAAAGGACATGGAAAA
    AACCCTTGAGATGACCGACATTAATTGGCGCGATGGTCATGATCTGCGGCAAGACAT
    CATAGACTACGAAATCGTCCAACACATCTTTGAGATCTTTCGCTTGACGGTCCAAAT
    GCGGAACTCCCTGTCCGAGCTCGAGGATAGAGATTATGATCGGCTGATATCTCCCGT
    GCTTAATGAAAATAACATCTTCTACGACTCCGCCAAGGCGGGTGATGCCCTGCCGAA
    GGATGCGGATGCTAATGGCGCTTATTGCATTGCTCTTAAGGGGCTCTATGAGATAAA
    GCAGATCACGGAAAACTGGAAAGAAGACGGTAAGTTTAGTAGAGACAAGCTGAAG
    ATCTCAAATAAAGACTGGTTTGATTTCATACAGAACAAGCGGTACCTGAAACGTCCG
    GCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCG
    CAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGT
    TATTCCGGGCTAA
    SEQ ID NO: 60
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    CAATGGCACTAACAATTTTCAGAATTTCATCGGCATTTCAAGTCTGCAAAAAACTCT
    GAGGAATGCTTTGATCCCTACTGAAACCACTCAGCAATTTATAGTCAAGAACGGTAT
    AATTAAAGAAGATGAACTCAGGGGTGAAAATAGACAAATACTCAAGGACATTATGG
    ATGACTATTATAGAGGCTTCATCTCAGAGACTCTCTCATCAATAGATGATATCGATT
    GGACTAGCCTTTTCGAGAAAATGGAGATTCAGTTGAAAAATGGTGATAACAAAGAT
    ACGTTGATAAAGGAACAGACCGAGTACAGGAAAGCCATTCATAAGAAATTTGCTAA
    TGACGATAGATTTAAGAATATGTTTAGTGCAAAACTGATTAGTGACATTCTGCCGGA
    GTTCGTTATCCATAATAATAACTACTCTGCATCCGAAAAGGAGGAAAAGACGCAAG
    TTATTAAACTGTTCAGCCGCTTCGCCACAAGCTTCAAGGACTACTTCAAAAATAGAG
    CCAACTGCTTTTCTGCCGACGATATATCATCATCTTCATGCCATCGGATCGTTAACGA
    TAACGCCGAGATATTCTTCAGCAACGCCCTTGTATATCGAAGAATAGTCAAAAGTCT
    GAGTAATGATGATATTAATAAAATTAGCGGTGATATGAAAGACTCCCTGAAGGAAA
    TGTCACTGGAGGAAATTTATAGTTACGAAAAGTACGGCGAATTCATTACTCAAGAA
    GGCATATCCTTCTATAACGACATTTGCGGAAAGGTCAACTCATTCATGAACCTTTAT
    TGCCAGAAGAATAAGGAGAATAAAAATCTTTACAAATTGCAAAAACTTCACAAACA
    AATTCTTTGCATCGCGGATACGTCCTACGAAGTTCCTTACAAATTTGAATCCGATGA
    GGAAGTGTATCAGAGTGTCAATGGATTTTTGGATAATATCTCTTCAAAACATATTGT
    GGAGAGATTGCGCAAAATAGGTGATAACTACAATGGCTACAACCTGGACAAGATTT
    ATATTGTTAGCAAGTTCTATGAAAGTGTCAGTCAAAAGACCTACAGAGATTGGGAG
    ACAATCAACACGGCGCTCGAAATACACTACAATAACATCCTCCCCGGCAATGGGAA
    GAGTAAAGCCGATAAGGTTAAAAAAGCTGTTAAGAACGACCTCCAGAAATCCATCA
    CGGAAATAAACGAGCTGGTTTCCAACTATAAGCTGTGTAGCGATGATAATATTAAG
    GCTGAGACATATATACATGAGATCAGCCACATTCTCAACAATTTCGAGGCACAGGA
    ACTCAAATACAATCCCGAGATTCACTTGGTGGAAAGTGAGTTGAAGGCGTCAGAGC
    TTAAGAATGTACTTGACGTAATAATGAATGCTTTTCATTGGTGCTCCGTGTTCATGAC
    TGAGGAACTCGTGGATAAGGATAATAACTTTTATGCGGAGTTGGAAGAGATATACG
    ATGAAATATACCCGGTTATCTCACTGTATAATCTGGTCAGAAATTACGTGACCCAAA
    AGCCTTATAGTACAAAAAAAATAAAGTTGAACTTCGGTATTCCGACATTGGCAGATG
    GTTGGTCCAAAAGCAAAGAATACTCTAATAACGCCATTATATTGATGCGAGACAATT
    TGTATTACCTTGGGATCTTTAACGCGAAAAACAAACCGGATAAGAAGATCATCGAA
    GGTAATACATCTGAGAATAAGGGGGATTACAAGAAGATGATTTATAATCTGTTGCC
    GGGGCCAAACAAGATGATTCCGAAGGTCTTTCTGTCATCTAAGACAGGAGTAGAGA
    CCTACAAACCTTCTGCGTACATTTTGGAAGGCTACAAACAGAACAAGCATATAAAAT
    CTAGCAAGGACTTTGATATCACGTTTTGTCATGATCTGATAGATTATTTCAAAAACT
    GCATCGCTATACATCCTGAGTGGAAGAATTTCGGCTTTGACTTTTCTGACACCAGCA
    CATACGAAGACATCTCAGGTTTCTACCGGGAAGTCGAGCTCCAGGGGTACAAGATT
    GACTGGACATATATAAGTGAAAAAGACATCGACCTCCTCCAAGAGAAGGGCCAACT
    TTACCTGTTCCAGATCTATAACAAAGACTTTTCTAAAAAGTCCACGGGTAACGACAA
    CTTGCACACTATGTATCTGAAAAACTTGTTCTCTGAAGAGAACCTCAAGGACATCGT
    CCTGAAGCTTAACGGGGAGGCGGAGATCTTCTTTAGAAAGTCCTCTATCAAAAATCC
    CATTATCCATAAAAAGGGCTCTATACTCGTTAATAGGACATATGAAGCGGAGGAAA
    AAGATCAATTTGGGAACATCCAGATCGTCCGGAAAAATATACCTGAGAATATCTATC
    AAGAGCTGTACAAGTATTTTAATGATAAGTCAGACAAAGAGCTCAGTGATGAGGCG
    GCAAAGCTCAAGAACGTGGTGGGGCATCATGAAGCTGCGACGAACATTGTCAAAGA
    TTATAGATACACTTACGATAAATACTTCCTCCACATGCCGATAACGATTAACTTCAA
    AGCCAATAAGACGGGGTTTATAAATGATCGGATCCTTCAGTACATTGCGAAAGAGA
    AAGACCTCCATGTGATCGGAATTGACCGAGGAGAAAGGAATCTGATTTACGTGTCC
    GTGATTGATACTTGCGGGAATATAGTCGAGCAAAAGAGTTTCAACATAGTCAACGG
    GTATGACTATCAGATAAAGCTCAAACAGCAGGAAGGTGCGAGGCAAATTGCGCGCA
    AAGAGTGGAAGGAGATAGGCAAGATTAAAGAAATCAAGGAAGGTTATCTCAGCTTG
    GTGATCCATGAAATATCTAAGATGGTTATAAAGTACAATGCCATAATAGCCATGGA
    GGATCTTTCCTACGGGTTTAAGAAGGGCCGATTTAAAGTGGAGCGACAAGTTTACCA
    GAAGTTCGAAACCATGTTGATTAACAAACTTAACTATTTGGTGTTCAAGGATATAAG
    TATAACCGAAAACGGCGGTTTGCTTAAGGGTTATCAGCTCACGTATATTCCTGATAA
    ACTTAAAAACGTTGGACACCAGTGTGGATGTATCTTCTACGTGCCAGCCGCTTACAC
    TAGTAAGATAGATCCTACCACGGGGTTTGTGAATATTTTTAAGTTTAAAGACTTGAC
    AGTCGACGCCAAAAGGGAATTTATAAAAAAGTTTGATTCTATCCGCTACGATAGTGA
    AAAAAATCTCTTTTGCTTTACTTTCGACTATAACAACTTCATTACGCAGAACACTGTC
    ATGAGTAAGTCCAGCTGGAGCGTCTACACATATGGCGTCCGAATTAAACGACGATTT
    GTAAACGGGCGGTTTTCAAACGAATCTGACACGATAGACATTACCAAGGATATGGA
    GAAGACACTTGAGATGACCGACATAAACTGGCGGGACGGTCACGATCTTCGGCAGG
    ACATAATTGATTACGAAATCGTCCAGCATATATTCGAAATATTTCGACTTACAGTGC
    AAATGCGGAACAGTCTCTCTGAACTGGAAGATCGCGATTATGACCGGTTGATTTCTC
    CGGTCCTCAATGAAAATAACATATTTTATGATAGTGCTAAGGCAGGTGATGCGTTGC
    CAAAGGATGCAGACGCTAATGGTGCCTATTGTATCGCGCTCAAGGGATTGTACGAG
    ATAAAGCAAATTACGGAGAACTGGAAGGAGGATGGTAAGTTTAGCCGAGACAAGTT
    GAAGATTAGCAATAAAGACTGGTTTGATTTTATCCAAAACAAGAGGTACCTGAAAC
    GTCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAG
    CGGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGT
    AAGGTTATTCCGGGCTAA
    SEQ ID NO: 61
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAACGGAACTAATAACTTTCAAAATTTCATAGGTATTTCAAGCTTGCAGAAGACCCT
    GAGGAATGCCCTGATTCCAACCGAGACAACGCAGCAGTTCATAGTCAAAAATGGCA
    TTATTAAGGAAGATGAGCTGCGGGGGGAAAACCGACAGATACTCAAGGATATTATG
    GACGACTATTACCGGGGATTTATCTCAGAAACGCTGAGCAGTATTGATGACATCGAT
    TGGACCAGTCTTTTCGAGAAAATGGAAATTCAACTTAAGAATGGTGACAATAAAGA
    CACTCTCATAAAGGAGCAAACTGAATACCGAAAAGCCATACACAAAAAGTTTGCCA
    ACGATGACCGCTTTAAAAACATGTTTTCAGCTAAGCTCATTAGCGACATTCTCCCCG
    AGTTTGTGATTCATAACAATAACTATAGCGCATCCGAGAAGGAGGAAAAAACCCAA
    GTTATCAAATTGTTCAGTAGATTCGCTACGAGCTTTAAAGATTACTTTAAAAACCGG
    GCTAACTGCTTCAGTGCAGACGATATCAGCTCCTCATCCTGTCATCGCATCGTCAAT
    GATAATGCTGAGATCTTCTTTTCTAATGCACTGGTTTACCGCAGGATAGTTAAGTCTC
    TTAGTAACGACGACATCAACAAGATATCAGGAGATATGAAGGATTCCCTTAAAGAA
    ATGAGTCTCGAGGAGATATATTCTTATGAAAAATACGGCGAATTTATTACCCAAGAG
    GGCATTAGTTTCTATAATGACATATGCGGAAAAGTTAATAGTTTTATGAATCTCTATT
    GTCAGAAGAATAAGGAGAATAAGAACCTCTACAAATTGCAGAAGTTGCACAAGCAA
    ATTCTGTGTATCGCGGACACCTCTTACGAGGTCCCATATAAGTTCGAGAGTGATGAA
    GAAGTATACCAGAGCGTTAATGGGTTCCTGGACAACATCTCAAGTAAACACATAGT
    CGAAAGGCTCCGAAAGATCGGTGATAACTATAACGGATATAATTTGGATAAAATTT
    ATATAGTTAGCAAATTTTACGAGAGCGTCAGTCAGAAGACCTACCGGGACTGGGAG
    ACCATAAACACAGCGCTGGAAATACATTATAACAACATACTGCCTGGGAACGGTAA
    GTCAAAGGCAGACAAGGTTAAAAAGGCTGTGAAGAATGACCTGCAAAAATCAATTA
    CAGAAATAAATGAGTTGGTAAGTAATTACAAACTTTGCAGCGATGATAATATAAAG
    GCAGAGACGTACATACATGAAATATCTCATATCCTCAACAATTTCGAAGCCCAAGA
    ACTGAAGTACAACCCGGAAATTCATCTTGTAGAGTCTGAGTTGAAGGCCTCCGAATT
    GAAAAACGTTCTTGACGTAATTATGAATGCCTTCCACTGGTGCTCAGTATTCATGAC
    GGAAGAGCTCGTGGATAAAGACAACAATTTTTACGCTGAACTGGAAGAAATATATG
    ACGAGATTTACCCCGTAATTTCACTCTACAACTTGGTACGAAATTACGTTACCCAAA
    AGCCATACTCAACAAAAAAAATTAAACTGAACTTCGGGATACCCACCCTCGCAGAT
    GGATGGTCAAAGTCCAAAGAGTACAGTAACAATGCAATTATCCTGATGCGAGACAA
    CCTTTATTACCTCGGGATTTTCAACGCTAAAAATAAACCTGATAAAAAAATAATTGA
    GGGTAATACCTCTGAAAACAAGGGGGATTATAAAAAGATGATATACAATCTGCTGC
    CTGGCCCGAACAAAATGATTCCTAAAGTCTTCTTGTCTTCCAAGACTGGAGTCGAAA
    CCTACAAGCCAAGTGCTTATATACTCGAAGGGTACAAACAAAATAAGCACATAAAA
    TCCAGCAAGGATTTTGATATTACATTCTGCCACGATTTGATTGATTATTTTAAGAACT
    GTATAGCCATCCACCCAGAATGGAAGAATTTTGGTTTTGATTTTAGCGATACCTCAA
    CATATGAGGATATCTCTGGCTTTTACCGCGAGGTAGAACTGCAAGGTTATAAGATCG
    ATTGGACTTATATTTCTGAAAAGGACATAGATCTCCTGCAAGAGAAAGGGCAACTTT
    ATTTGTTTCAAATATACAACAAAGATTTTAGTAAGAAGAGTACTGGCAATGATAACC
    TTCACACTATGTATCTGAAGAACCTTTTTTCTGAGGAGAACTTGAAGGACATAGTCC
    TTAAACTCAATGGGGAAGCTGAAATATTCTTTCGCAAAAGCTCCATTAAAAACCCGA
    TCATTCATAAAAAGGGTTCCATCTTGGTAAACCGCACATACGAGGCGGAAGAAAAA
    GATCAGTTCGGAAATATCCAGATCGTAAGGAAGAATATCCCCGAAAATATATACCA
    AGAGCTTTACAAATATTTTAACGATAAGTCAGACAAGGAACTGTCAGACGAAGCAG
    CCAAGTTGAAGAATGTCGTAGGGCACCACGAAGCAGCTACAAACATAGTTAAAGAT
    TATCGGTACACCTACGATAAATATTTCCTGCATATGCCAATAACCATAAACTTCAAA
    GCCAACAAAACAGGGTTCATCAATGACCGAATACTTCAGTATATAGCCAAGGAAAA
    AGACCTGCATGTTATAGGAATAGATAGAGGTGAGCGCAACTTGATATATGTCAGCG
    TGATAGACACCTGCGGAAATATCGTCGAGCAAAAAAGTTTCAACATTGTTAATGGCT
    ACGATTACCAAATTAAATTGAAGCAGCAAGAGGGGGCTCGGCAAATCGCGCGAAAG
    GAATGGAAAGAAATCGGGAAGATTAAAGAAATTAAAGAGGGCTACCTGTCTCTTGT
    AATTCACGAAATATCTAAGATGGTCATCAAGTATAATGCCATTATTGCGATGGAAGA
    TCTGTCCTACGGATTTAAGAAAGGCAGGTTTAAAGTCGAAAGGCAGGTGTACCAGA
    AATTCGAGACCATGCTGATTAATAAGCTCAACTATCTCGTATTTAAGGATATTTCTAT
    AACTGAAAATGGAGGGCTTCTCAAAGGATATCAACTCACATACATACCTGATAAGC
    TGAAGAACGTAGGCCACCAGTGTGGATGCATATTCTATGTACCAGCTGCATACACAA
    GCAAGATCGATCCAACTACTGGGTTTGTCAATATCTTCAAATTTAAGGACTTGACGG
    TCGATGCCAAACGGGAGTTCATCAAAAAGTTTGATAGTATTCGATATGATAGTGAGA
    AGAACTTGTTTTGCTTCACATTTGACTACAACAATTTCATAACGCAAAATACGGTTA
    TGTCTAAATCCTCATGGAGCGTCTACACTTACGGAGTGAGGATAAAGCGGCGCTTCG
    TAAATGGCAGGTTTAGCAATGAATCCGACACGATTGACATAACCAAGGATATGGAG
    AAAACCCTCGAGATGACCGATATAAATTGGCGGGATGGACACGATCTGCGACAAGA
    CATAATCGATTATGAAATCGTGCAGCACATATTTGAGATATTCAGGCTTACGGTCCA
    AATGAGAAATTCCCTTTCCGAACTTGAAGACCGCGATTACGACCGACTGATAAGCCC
    CGTTCTGAACGAAAATAACATCTTCTACGACAGCGCTAAAGCGGGAGACGCGCTGC
    CGAAAGATGCGGACGCAAATGGAGCCTATTGTATCGCCTTGAAAGGGTTGTACGAG
    ATCAAACAGATAACCGAGAATTGGAAGGAGGATGGGAAGTTTAGTCGAGACAAACT
    TAAAATAAGCAACAAGGACTGGTTCGACTTTATTCAAAACAAACGATATCTCAAAC
    GTCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAG
    CGGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGT
    AAGGTTATTCCGGGCTAA
    SEQ ID NO: 62
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAATGGTACTAACAATTTTCAAAACTTTATCGGCATCTCTTCACTTCAGAAAACTCTT
    CGGAACGCCCTTATACCGACGGAGACAACGCAGCAGTTTATAGTTAAAAACGGGAT
    CATTAAAGAAGATGAACTCAGAGGGGAAAACAGGCAAATATTGAAGGACATTATGG
    ACGATTACTACCGGGGGTTTATTTCAGAGACCCTTTCATCTATTGATGACATAGATT
    GGACCTCCCTTTTCGAGAAAATGGAGATACAATTGAAAAACGGCGACAATAAAGAT
    ACACTTATCAAGGAACAAACTGAGTATCGCAAGGCGATTCACAAGAAGTTTGCGAA
    TGACGATCGCTTTAAGAATATGTTTTCTGCGAAGCTCATAAGTGACATTCTGCCTGA
    ATTTGTCATTCATAACAACAATTATTCTGCTAGCGAAAAAGAGGAAAAAACTCAAGT
    CATTAAGCTTTTTAGCAGGTTCGCTACTAGTTTTAAAGACTATTTTAAGAACCGGGC
    GAATTGCTTTAGCGCTGACGACATATCATCCTCATCCTGTCATCGCATAGTCAATGA
    TAATGCAGAAATATTCTTTTCTAATGCGCTCGTGTATCGGAGAATAGTGAAAAGCCT
    CTCTAACGATGACATTAACAAAATAAGCGGCGATATGAAGGATAGTCTGAAGGAAA
    TGTCCCTCGAAGAAATATACTCATACGAGAAGTACGGAGAATTTATCACCCAGGAA
    GGAATTAGTTTTTACAACGACATCTGTGGTAAGGTTAACTCTTTTATGAATCTGTATT
    GTCAAAAGAATAAAGAAAATAAAAATCTTTATAAGCTCCAAAAGCTTCACAAACAA
    ATCTTGTGCATTGCGGATACGTCATACGAAGTACCTTACAAATTTGAAAGCGACGAA
    GAGGTGTATCAGTCAGTGAATGGGTTCCTTGACAATATTTCTAGCAAACATATTGTG
    GAGCGACTTCGAAAGATCGGTGATAATTACAATGGCTATAATTTGGATAAAATTTAC
    ATAGTTAGTAAGTTTTATGAATCCGTCTCACAAAAGACGTACCGAGATTGGGAGACC
    ATCAACACTGCTCTGGAGATTCATTACAATAATATATTGCCTGGGAATGGGAAGTCA
    AAGGCCGACAAGGTTAAAAAAGCCGTAAAAAACGATCTTCAAAAGTCCATTACCGA
    GATAAATGAACTTGTATCCAACTATAAGTTGTGCTCTGACGATAATATTAAAGCAGA
    AACGTATATCCACGAAATAAGTCACATCCTGAACAACTTCGAAGCTCAAGAGCTCA
    AGTATAATCCTGAAATTCATCTCGTCGAAAGCGAGCTGAAAGCATCCGAGTTGAAG
    AATGTGCTTGATGTGATCATGAACGCATTCCATTGGTGCAGTGTGTTCATGACCGAA
    GAACTTGTAGACAAAGACAACAACTTCTACGCTGAATTGGAAGAGATTTACGATGA
    AATTTACCCCGTGATATCCCTCTATAATCTGGTAAGAAATTACGTCACGCAAAAACC
    ATACAGTACCAAGAAAATAAAGCTCAACTTTGGTATTCCGACGTTGGCAGATGGGT
    GGAGTAAGAGCAAGGAGTATTCTAACAATGCAATCATCCTCATGCGCGACAATTTGT
    ATTATCTGGGGATCTTCAACGCGAAAAATAAGCCCGACAAAAAGATAATAGAAGGC
    AATACGTCCGAGAACAAAGGGGACTATAAGAAAATGATTTATAACCTTCTTCCAGG
    ACCCAACAAGATGATCCCAAAGGTTTTCTTGAGTTCAAAAACCGGCGTAGAAACTTA
    TAAACCGTCCGCCTACATTCTGGAAGGGTACAAGCAAAACAAGCACATTAAGTCAT
    CTAAGGATTTCGACATTACTTTTTGTCATGATTTGATAGACTACTTCAAAAATTGTAT
    AGCGATACATCCGGAATGGAAAAATTTTGGGTTCGATTTTTCCGACACAAGTACTTA
    TGAAGACATCTCAGGGTTTTATAGGGAAGTTGAACTGCAAGGTTACAAAATAGACT
    GGACTTATATTAGTGAGAAGGACATTGATTTGCTCCAGGAAAAGGGTCAATTGTATC
    TGTTCCAGATATATAACAAGGATTTCTCTAAAAAATCTACAGGTAACGACAATCTCC
    ACACGATGTACCTCAAGAATCTCTTCAGCGAAGAGAATTTGAAGGATATCGTACTTA
    AGCTCAATGGAGAAGCGGAAATATTCTTCAGAAAGTCCAGCATTAAGAATCCTATA
    ATTCACAAGAAAGGGTCAATTCTCGTAAACCGGACTTATGAGGCCGAAGAAAAAGA
    TCAGTTTGGTAACATTCAGATTGTACGGAAAAACATTCCCGAGAACATCTATCAAGA
    ACTGTATAAATACTTTAATGATAAATCCGACAAGGAACTTTCTGACGAGGCTGCAAA
    ATTGAAGAACGTAGTGGGACACCATGAGGCCGCAACCAATATAGTAAAGGATTACA
    GATACACTTATGATAAGTATTTCCTCCATATGCCGATCACGATTAATTTCAAGGCGA
    ATAAAACCGGCTTCATTAACGATCGCATTTTGCAATATATTGCGAAGGAAAAGGATT
    TGCACGTGATAGGTATAGACCGGGGTGAACGAAACTTGATTTACGTCTCTGTGATCG
    ACACATGCGGAAATATAGTTGAACAGAAGTCCTTTAATATTGTGAATGGTTACGACT
    ACCAGATAAAATTGAAGCAACAGGAGGGCGCAAGACAGATAGCTCGCAAAGAGTG
    GAAGGAAATCGGCAAGATCAAAGAAATAAAGGAGGGTTATCTTTCCCTGGTAATTC
    ATGAAATTAGCAAGATGGTTATTAAGTATAATGCTATAATAGCTATGGAGGACCTTT
    CCTATGGGTTCAAGAAAGGTCGCTTCAAAGTGGAGCGACAAGTGTATCAAAAGTTC
    GAGACTATGTTGATAAATAAATTGAATTATTTGGTTTTTAAAGACATTTCAATAACT
    GAGAACGGGGGTCTCTTGAAGGGGTACCAATTGACTTATATTCCGGACAAGTTGAA
    GAATGTCGGACACCAGTGTGGTTGCATTTTCTACGTGCCTGCCGCTTACACCTCAAA
    AATCGATCCGACCACTGGTTTTGTAAATATATTTAAATTCAAAGATCTCACCGTTGA
    TGCCAAACGGGAGTTTATCAAAAAATTCGATTCCATTCGCTACGACTCTGAGAAAAA
    CCTTTTTTGTTTCACGTTCGATTATAACAACTTTATAACCCAAAATACTGTAATGTCC
    AAGTCAAGTTGGTCTGTCTATACTTACGGAGTAAGGATCAAGCGCCGCTTCGTTAAT
    GGGAGATTCTCAAACGAGTCTGATACCATAGACATAACTAAAGACATGGAAAAAAC
    CCTGGAAATGACGGACATCAATTGGCGAGACGGGCATGATCTTCGACAGGACATAA
    TAGATTACGAAATTGTTCAACACATTTTCGAGATATTTCGACTTACGGTTCAGATGA
    GGAATTCCCTTTCCGAATTGGAAGACCGGGATTATGATCGACTTATATCTCCCGTGC
    TCAATGAAAACAATATTTTTTATGATTCAGCGAAAGCTGGGGACGCGCTGCCAAAA
    GATGCCGATGCCAATGGAGCATACTGTATCGCCCTGAAGGGTTTGTATGAGATTAAG
    CAAATTACTGAAAACTGGAAGGAAGATGGCAAGTTTTCTAGAGATAAGCTTAAGAT
    TAGCAATAAGGACTGGTTTGACTTCATTCAAAATAAAAGGTATCTTAAACGTCCGGC
    AGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCA
    GGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTA
    TTCCGGGCTAA
    SEQ ID NO: 63
    ATGGGCCATCATCATCATCATCACAGCAGCGGCGTCGATCTGGGTACCGAGAATTTG
    TATTTCCAGAGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAA
    TAATGGAACAAATAATTTTCAAAATTTTATTGGTATCAGTTCATTGCAAAAGACTTT
    GAGAAATGCTTTGATCCCGACTGAGACCACACAGCAGTTCATCGTCAAAAATGGCA
    TAATCAAGGAAGACGAACTTAGGGGTGAGAATAGACAAATATTGAAGGACATCATG
    GATGACTATTATAGGGGGTTCATTTCCGAAACGCTCAGTAGTATTGATGACATTGAC
    TGGACTAGTCTTTTCGAGAAAATGGAAATTCAGCTTAAGAACGGGGACAATAAAGA
    CACGCTGATCAAGGAGCAAACGGAATATAGGAAGGCGATCCATAAAAAATTCGCGA
    ATGATGATCGGTTTAAAAACATGTTTAGTGCCAAGTTGATCAGCGACATACTGCCCG
    AATTCGTGATCCACAACAATAATTACAGCGCCTCCGAAAAGGAGGAAAAAACTCAG
    GTCATTAAATTGTTTAGCCGATTCGCAACGAGTTTCAAAGATTATTTTAAGAACCGG
    GCCAACTGTTTTTCAGCGGATGATATTAGCTCCAGCAGCTGCCATCGCATAGTAAAT
    GATAACGCTGAAATCTTTTTTAGCAACGCACTTGTCTACCGGAGGATTGTAAAATCA
    CTGTCAAATGATGACATTAACAAAATATCTGGAGATATGAAGGACTCACTCAAAGA
    AATGAGCCTGGAAGAAATATATTCATACGAAAAATACGGGGAGTTTATTACCCAGG
    AAGGTATCAGTTTTTATAATGATATATGTGGAAAAGTTAATTCATTTATGAATCTTTA
    CTGTCAAAAAAATAAGGAGAACAAGAATTTGTACAAGCTCCAAAAACTTCATAAAC
    AGATTCTGTGCATCGCAGACACAAGTTATGAGGTACCGTACAAATTTGAGAGCGAC
    GAAGAAGTTTATCAGAGTGTGAATGGTTTCCTGGACAATATCTCTTCTAAACACATT
    GTTGAGAGGCTTAGGAAGATCGGTGATAATTATAACGGCTATAATCTGGACAAAAT
    TTATATTGTATCAAAGTTTTATGAATCAGTCTCTCAAAAGACGTATCGGGATTGGGA
    AACAATTAACACGGCTCTGGAGATCCACTACAATAACATTCTGCCCGGCAACGGGA
    AGAGCAAAGCTGATAAGGTCAAGAAGGCAGTCAAGAACGACCTTCAGAAGAGCAT
    AACAGAAATTAACGAATTGGTCAGTAACTACAAACTGTGTAGTGATGACAACATAA
    AAGCCGAAACATACATCCATGAAATAAGCCATATCCTGAATAACTTCGAAGCCCAA
    GAACTTAAATACAATCCCGAGATTCATCTTGTCGAATCAGAACTCAAGGCGTCCGAG
    CTCAAAAATGTCCTTGACGTGATAATGAATGCCTTCCACTGGTGCAGCGTATTCATG
    ACGGAGGAGTTGGTAGATAAAGACAACAACTTTTATGCCGAATTGGAAGAGATTTA
    TGATGAGATTTACCCCGTTATTTCTCTGTACAACTTGGTTCGAAACTACGTAACACA
    AAAACCATACTCAACCAAAAAGATCAAACTCAATTTTGGCATACCTACATTGGCTGA
    TGGTTGGTCCAAGTCAAAGGAATATAGCAATAATGCAATAATTCTCATGCGAGATA
    ACTTGTATTATTTGGGGATCTTTAACGCTAAGAACAAACCAGATAAAAAGATAATCG
    AGGGGAACACAAGTGAGAACAAGGGTGATTACAAAAAAATGATTTACAATCTGCTT
    CCTGGGCCTAACAAAATGATTCCGAAGGTGTTTCTTAGCTCTAAAACTGGAGTGGAG
    ACGTATAAGCCTTCCGCGTACATTCTCGAAGGCTACAAGCAAAATAAGCATATCAA
    GTCCAGTAAGGACTTCGACATCACTTTTTGCCACGATCTCATCGATTACTTTAAGAA
    CTGTATCGCAATACACCCCGAGTGGAAAAACTTTGGTTTTGATTTTTCAGACACTAG
    TACCTACGAGGACATTTCCGGCTTCTATCGAGAAGTCGAACTCCAGGGCTACAAAAT
    CGATTGGACGTACATTTCTGAGAAGGACATCGACTTGCTCCAAGAGAAAGGTCAAC
    TTTACCTCTTCCAAATTTACAATAAAGACTTTTCAAAGAAGAGCACCGGTAATGACA
    ACTTGCATACCATGTATCTGAAGAACCTGTTTTCTGAGGAGAACCTCAAGGATATTG
    TATTGAAGTTGAATGGCGAAGCAGAAATATTTTTCCGAAAGTCATCTATCAAGAACC
    CCATTATACACAAAAAAGGCTCTATCCTGGTGAACCGGACTTACGAGGCAGAGGAG
    AAGGATCAATTCGGAAACATACAGATAGTCCGCAAAAACATCCCTGAGAATATCTA
    TCAGGAACTCTATAAGTACTTCAATGATAAATCAGACAAGGAGCTTAGCGACGAAG
    CAGCTAAACTTAAAAACGTGGTTGGCCATCACGAGGCCGCTACCAACATAGTCAAA
    GACTACCGCTATACTTATGACAAGTACTTTTTGCACATGCCCATAACAATTAATTTCA
    AAGCTAACAAAACAGGGTTTATAAATGACAGAATCCTCCAATACATCGCCAAAGAG
    AAGGACCTCCATGTAATCGGGATTGATAGAGGCGAACGGAACTTGATTTACGTTAGT
    GTCATTGATACCTGTGGTAACATTGTCGAACAAAAGTCATTCAACATAGTCAATGGA
    TATGATTATCAGATAAAACTCAAGCAACAAGAAGGCGCGAGGCAGATTGCCAGGAA
    GGAATGGAAAGAAATCGGGAAGATCAAGGAGATCAAGGAGGGTTACCTGTCCTTGG
    TGATACACGAGATTTCAAAAATGGTTATAAAATACAATGCCATTATCGCGATGGAG
    GATTTGTCTTATGGATTTAAGAAGGGGAGGTTCAAAGTCGAACGACAAGTCTATCAG
    AAGTTTGAAACAATGCTCATTAACAAGCTCAATTACCTTGTTTTCAAGGATATAAGC
    ATCACTGAAAACGGCGGACTCCTTAAGGGATATCAGCTGACTTATATCCCCGACAAG
    CTCAAGAACGTAGGGCACCAATGCGGATGCATCTTTTACGTGCCTGCAGCATATACT
    TCAAAAATTGATCCGACTACTGGCTTTGTTAACATTTTCAAGTTCAAGGATCTGACG
    GTAGACGCTAAGAGAGAATTCATAAAAAAGTTTGACAGCATCAGGTACGATAGTGA
    AAAGAACCTTTTTTGTTTTACCTTTGACTACAATAATTTTATTACGCAAAATACAGTT
    ATGAGCAAATCAAGTTGGAGCGTTTACACATATGGCGTTCGGATCAAGCGCAGATTC
    GTCAATGGTCGCTTCTCAAATGAGAGCGATACAATCGATATAACGAAGGATATGGA
    GAAGACGCTTGAGATGACAGATATCAACTGGCGGGACGGACATGACCTTAGACAAG
    ACATAATCGATTACGAAATAGTACAGCATATCTTTGAGATTTTTAGGCTTACAGTTC
    AGATGCGGAACTCTCTTTCCGAACTGGAGGACCGGGATTATGATCGGTTGATCTCCC
    CAGTACTGAACGAAAATAATATCTTTTACGATAGCGCGAAGGCTGGTGATGCACTCC
    CAAAAGACGCTGATGCGAACGGAGCTTATTGCATAGCCCTTAAAGGGCTTTACGAG
    ATTAAACAAATAACAGAAAATTGGAAGGAAGATGGCAAATTTTCCCGCGACAAGTT
    GAAGATTAGTAACAAAGACTGGTTCGACTTCATTCAGAATAAACGCTACCTCAAAC
    GTCCGGCAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAG
    CGGCGCAGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGT
    AAGGTTATTCCGGGCTAA
    SEQ ID NO: 64
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGTA
    CCAATAACTTCCAGAACTTCATCGGTATTTCTAGCCTGCAAAAGACCCTGCGTAACG
    CGCTGATTCCGACCGAGACTACCCAGCAATTCATCGTGAAAAACGGTATCATTAAGG
    AAGATGAATTGCGCGGTGAGAATCGTCAGATTCTGAAAGATATCATGGATGACTAC
    TATCGCGGTTTCATTAGCGAAACCCTGTCGAGCATCGATGATATCGATTGGACGAGC
    CTCTTCGAGAAAATGGAAATTCAACTGAAAAATGGTGACAACAAAGATACCCTGAT
    TAAAGAACAAACGGAATACCGCAAGGCAATCCATAAAAAGTTTGCGAATGACGACC
    GTTTTAAGAATATGTTCTCGGCCAAGCTGATTTCCGACATCCTGCCAGAGTTCGTCAT
    TCACAACAACAATTACAGCGCAAGCGAGAAAGAGGAAAAGACTCAGGTCATTAAGC
    TGTTTAGCCGCTTTGCGACGTCCTTCAAAGACTACTTCAAGAATCGTGCGAATTGCTT
    TAGCGCGGATGACATCTCTAGCTCTAGCTGTCACCGTATTGTTAACGACAATGCAGA
    GATTTTCTTCAGCAACGCCCTGGTGTATCGCCGTATTGTCAAGTCTCTGAGCAACGA
    CGACATTAACAAGATCAGCGGCGACATGAAAGACAGCCTGAAAGAAATGTCTCTGG
    AAGAAATCTACAGCTACGAGAAATATGGTGAGTTTATCACCCAAGAGGGCATTAGC
    TTCTACAATGATATCTGTGGTAAGGTTAATAGCTTTATGAATCTGTACTGCCAGAAG
    AATAAAGAAAACAAGAACTTGTACAAGCTGCAAAAGCTGCATAAGCAAATTCTGTG
    CATCGCCGATACTAGCTATGAAGTTCCGTACAAGTTCGAGTCTGATGAAGAGGTGTA
    TCAGTCAGTCAACGGTTTTCTGGATAACATCAGCAGCAAGCACATCGTCGAGCGCCT
    GCGCAAGATTGGTGACAACTACAATGGTTATAACCTGGACAAGATCTATATCGTGTC
    GAAGTTTTACGAGAGCGTGTCCCAGAAAACGTACCGTGATTGGGAAACGATTAACA
    CGGCCTTGGAAATTCACTATAACAATATCCTGCCGGGCAACGGCAAGAGCAAAGCT
    GACAAAGTCAAAAAAGCTGTGAAAAACGATCTGCAAAAGTCCATCACCGAGATCAA
    CGAACTGGTTAGCAACTATAAGCTGTGTAGCGACGACAACATTAAAGCTGAAACGT
    ATATCCACGAAATCAGCCACATCCTGAATAACTTTGAGGCACAAGAACTGAAATAC
    AATCCTGAGATCCATCTGGTAGAGAGCGAGCTGAAGGCAAGCGAGTTGAAAAACGT
    TCTCGACGTTATCATGAATGCTTTCCACTGGTGTAGCGTGTTTATGACCGAAGAACT
    GGTTGACAAAGATAACAATTTCTATGCAGAGCTGGAAGAAATCTATGATGAAATCT
    ACCCGGTCATCAGCCTGTATAACCTGGTTCGTAACTACGTGACGCAGAAGCCGTACA
    GCACCAAAAAGATCAAGCTGAACTTCGGTATTCCGACCTTGGCGGACGGTTGGAGC
    AAATCCAAAGAATACTCCAATAATGCGATTATTCTGATGCGTGATAATCTGTACTAT
    CTGGGTATCTTCAATGCGAAGAACAAGCCAGATAAAAAGATTATTGAAGGCAACAC
    CAGCGAGAATAAAGGCGACTACAAGAAAATGATCTACAACTTATTGCCGGGTCCGA
    ACAAGATGATCCCGAAAGTTTTTCTGAGCAGCAAGACCGGCGTTGAAACCTATAAG
    CCGAGCGCGTACATTTTAGAGGGCTATAAACAAAACAAGCACATCAAGAGCAGCAA
    AGATTTTGATATTACGTTCTGCCACGACCTGATCGACTATTTCAAGAATTGTATTGCG
    ATTCACCCTGAGTGGAAGAACTTCGGTTTTGACTTTTCCGATACCTCCACCTATGAA
    GATATTAGCGGTTTTTACCGTGAAGTCGAGTTGCAGGGTTATAAGATTGATTGGACT
    TACATTTCCGAGAAAGACATCGACCTGTTGCAAGAGAAAGGTCAGCTGTACCTGTTT
    CAGATCTATAACAAAGATTTCAGCAAAAAGTCGACGGGCAATGATAATCTGCACAC
    CATGTATCTGAAAAACCTGTTTAGCGAAGAGAACCTGAAAGACATTGTTCTTAAGCT
    GAATGGTGAGGCCGAGATCTTCTTCCGTAAAAGCTCCATTAAGAACCCGATTATCCA
    CAAAAAGGGCTCTATTCTGGTTAACCGCACGTACGAAGCGGAAGAGAAAGATCAAT
    TTGGTAACATCCAGATCGTGCGTAAGAATATCCCGGAGAACATTTACCAAGAACTGT
    ATAAGTATTTCAATGACAAGAGCGATAAAGAATTGAGCGATGAAGCGGCAAAGCTG
    AAAAACGTCGTTGGCCACCACGAAGCCGCGACGAATATCGTGAAAGATTATCGTTA
    CACCTACGACAAGTACTTTCTGCACATGCCGATCACCATCAATTTCAAAGCGAATAA
    AACGGGTTTTATCAATGACCGTATCCTGCAGTACATTGCGAAAGAAAAAGATTTACA
    CGTGATTGGTATTGATCGCGGCGAGCGCAATCTGATTTACGTCAGCGTTATCGACAC
    GTGCGGCAATATTGTGGAGCAGAAAAGCTTCAATATCGTCAATGGTTACGACTACCA
    GATCAAACTGAAGCAACAAGAGGGCGCCCGCCAGATTGCGCGTAAAGAGTGGAAA
    GAAATCGGTAAGATTAAAGAAATCAAGGAAGGCTACCTGTCCCTGGTGATCCATGA
    AATCAGCAAAATGGTGATCAAGTACAACGCTATCATTGCGATGGAAGATCTGAGCT
    ACGGTTTTAAAAAGGGTCGCTTCAAAGTTGAGCGTCAAGTGTATCAGAAATTTGAGA
    CTATGCTGATTAACAAGTTGAACTATCTGGTTTTTAAAGACATCAGCATTACCGAGA
    ATGGTGGCCTGCTGAAGGGTTATCAACTGACCTATATTCCTGACAAGTTGAAAAATG
    TTGGTCATCAGTGTGGTTGCATTTTCTACGTACCGGCAGCGTACACGAGCAAGATTG
    ACCCGACCACGGGTTTCGTTAACATTTTCAAGTTTAAAGATTTGACCGTGGACGCCA
    AGCGTGAGTTCATTAAAAAGTTCGACAGCATCAGATACGACTCTGAGAAGAATCTG
    TTCTGCTTTACGTTCGACTACAATAACTTCATTACCCAAAATACCGTTATGAGCAAA
    AGCTCCTGGAGCGTGTACACGTACGGCGTCCGTATCAAGCGTCGTTTTGTGAATGGT
    CGCTTTTCCAACGAATCTGACACCATTGACATTACCAAAGATATGGAAAAGACCCTT
    GAGATGACCGACATTAATTGGCGTGATGGCCATGACTTGCGCCAAGACATTATCGAC
    TACGAAATTGTTCAGCACATCTTTGAGATTTTTCGTCTGACGGTCCAGATGCGCAAC
    TCGCTGAGCGAGTTGGAAGATCGTGACTATGACCGTCTGATTAGCCCGGTGCTGAAT
    GAAAACAATATCTTCTATGATAGCGCAAAGGCCGGTGACGCGCTGCCGAAAGATGC
    GGATGCTAACGGTGCATACTGCATTGCACTGAAGGGTCTGTACGAAATCAAACAGA
    TCACCGAGAATTGGAAAGAGGATGGTAAGTTTAGCCGTGATAAGCTGAAGATTAGC
    AATAAAGACTGGTTCGACTTTATTCAAAACAAGCGCTATCTGAAACGTCCGGCAGCG
    ACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCA
    GCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCG
    GGCTAA
    SEQ ID NO: 65
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAA
    CAAATAATTTTCAGAACTTTATTGGGATCAGTTCGCTTCAGAAAACGCTTCGTAATG
    CTCTGATTCCCACAGAAACCACTCAGCAGTTTATCGTAAAGAATGGCATTATCAAGG
    AGGATGAATTACGCGGCGAGAACCGCCAAATCTTAAAAGATATCATGGACGACTAC
    TACCGCGGTTTCATTAGCGAAACTCTTAGTTCAATTGACGACATTGACTGGACGTCC
    TTGTTCGAAAAGATGGAGATTCAATTAAAGAACGGTGATAACAAGGATACGTTGAT
    TAAAGAACAGACGGAGTACCGTAAGGCTATCCACAAAAAATTTGCAAACGACGACC
    GCTTTAAAAATATGTTTAGCGCAAAATTAATCTCCGACATCCTGCCTGAATTCGTCA
    TCCATAACAATAACTATAGCGCCTCGGAAAAAGAAGAAAAAACGCAGGTTATTAAA
    CTTTTCTCGCGCTTTGCAACAAGCTTTAAGGATTACTTCAAAAATCGCGCCAATTGTT
    TTTCAGCCGACGACATTAGCTCCAGTTCCTGCCACCGTATTGTGAATGACAACGCTG
    AGATTTTTTTTTCCAATGCGCTGGTTTATCGTCGTATTGTTAAGAGCCTTAGTAACGA
    CGACATTAATAAAATTAGCGGTGATATGAAGGATAGCTTGAAAGAAATGAGTCTGG
    AAGAGATCTATAGTTACGAGAAGTACGGCGAATTTATTACCCAGGAGGGCATTTCAT
    TTTACAATGATATCTGTGGAAAAGTCAACTCCTTTATGAACTTGTATTGCCAAAAGA
    ATAAAGAAAACAAAAACCTGTACAAACTGCAAAAGTTACACAAGCAGATTTTGTGT
    ATCGCAGACACGTCATACGAAGTACCGTACAAGTTTGAGTCCGATGAAGAAGTGTA
    CCAAAGCGTTAATGGCTTTTTGGATAACATTTCGAGCAAACATATCGTAGAGCGTTT
    GCGTAAGATTGGTGATAATTACAACGGTTACAATTTAGACAAAATCTATATCGTCTC
    TAAGTTTTACGAAAGTGTTTCTCAGAAAACTTACCGCGATTGGGAGACGATCAACAC
    TGCGCTGGAGATTCATTACAATAATATCCTTCCAGGTAACGGTAAAAGCAAAGCTGA
    TAAGGTGAAAAAGGCGGTTAAAAATGACCTTCAAAAGTCTATCACAGAAATCAACG
    AATTGGTCAGCAATTATAAGCTTTGCAGTGACGATAACATTAAGGCCGAGACTTACA
    TCCATGAGATCTCTCACATTCTTAATAATTTTGAAGCGCAAGAGCTGAAATACAATC
    CTGAAATCCATCTGGTCGAAAGTGAATTAAAAGCCTCCGAATTAAAAAATGTCTTGG
    ACGTGATCATGAATGCGTTCCATTGGTGCTCAGTTTTTATGACGGAAGAGTTGGTGG
    ACAAAGACAACAATTTTTACGCCGAGCTTGAGGAAATTTACGACGAAATTTACCCCG
    TTATTTCGTTATACAACCTTGTGCGTAATTACGTTACACAAAAGCCCTATTCGACAA
    AGAAAATCAAGTTAAATTTCGGGATTCCCACATTAGCTGATGGATGGTCCAAATCCA
    AAGAATACTCGAATAACGCTATCATCCTTATGCGTGATAATTTGTACTACTTAGGCA
    TCTTCAATGCGAAGAACAAACCTGACAAGAAAATTATCGAAGGAAACACTTCGGAG
    AACAAAGGTGATTATAAAAAGATGATCTACAACTTGCTTCCCGGGCCAAACAAAAT
    GATTCCCAAGGTATTTTTGAGTTCTAAAACCGGTGTCGAAACTTACAAACCAAGTGC
    TTATATTTTGGAAGGATACAAACAGAACAAACATATCAAGTCTTCGAAAGACTTCGA
    TATTACGTTCTGCCACGATCTGATCGATTACTTCAAGAACTGTATTGCTATTCACCCC
    GAGTGGAAGAACTTTGGATTTGATTTCTCCGACACGTCCACTTATGAAGATATCTCT
    GGCTTCTATCGCGAGGTTGAATTACAAGGGTATAAGATTGACTGGACTTATATTTCG
    GAGAAGGATATCGATCTTTTGCAAGAAAAAGGGCAACTTTATTTATTTCAGATCTAT
    AACAAGGACTTTTCAAAAAAGAGCACTGGAAATGACAATCTGCATACCATGTACCT
    TAAGAACCTGTTCTCGGAAGAGAACCTGAAGGACATTGTACTTAAACTGAATGGAG
    AGGCAGAGATCTTCTTTCGCAAATCAAGCATTAAGAACCCAATTATTCACAAAAAG
    GGGAGTATCTTAGTAAATCGCACATATGAGGCTGAGGAAAAAGATCAGTTTGGTAA
    CATTCAGATCGTGCGTAAGAACATTCCTGAAAATATCTATCAGGAACTTTATAAGTA
    TTTCAACGATAAAAGTGATAAAGAGCTGAGTGACGAAGCGGCTAAACTTAAGAATG
    TTGTGGGACACCATGAGGCAGCAACCAATATTGTGAAGGATTATCGCTATACGTACG
    ACAAATACTTTTTACACATGCCCATCACTATTAATTTTAAAGCTAATAAGACTGGCTT
    CATTAACGATCGCATCCTGCAGTACATTGCTAAGGAAAAGGATCTTCACGTTATCGG
    TATCGATCGCGGGGAGCGTAATCTTATCTACGTCTCTGTCATTGACACGTGTGGCAA
    TATTGTGGAGCAAAAGTCCTTCAATATTGTTAACGGCTATGACTATCAGATTAAATT
    GAAACAGCAGGAAGGTGCGCGTCAGATTGCCCGCAAGGAATGGAAGGAAATTGGC
    AAGATCAAAGAAATTAAGGAGGGCTACTTAAGCTTAGTAATTCACGAAATTAGTAA
    AATGGTTATCAAATACAACGCCATCATCGCGATGGAGGATCTTTCGTACGGGTTTAA
    GAAAGGTCGTTTTAAAGTGGAGCGTCAGGTGTACCAGAAATTTGAAACTATGCTTAT
    TAACAAACTTAACTACCTGGTTTTCAAGGATATCAGTATTACTGAAAACGGGGGGCT
    GTTAAAAGGGTATCAATTAACTTACATTCCAGACAAATTAAAGAACGTTGGACATCA
    GTGTGGCTGCATTTTTTATGTACCAGCTGCATACACTTCAAAGATCGATCCTACGACT
    GGGTTCGTGAACATTTTTAAGTTTAAAGACTTGACGGTAGATGCCAAGCGCGAATTC
    ATCAAGAAATTCGACAGCATTCGCTACGACTCTGAGAAAAATCTTTTCTGTTTCACA
    TTCGATTATAACAATTTCATTACGCAGAACACAGTAATGTCCAAGTCTTCTTGGAGT
    GTTTATACATATGGTGTCCGCATTAAGCGCCGTTTCGTCAACGGCCGCTTCAGTAAT
    GAGAGCGATACTATTGACATCACAAAAGACATGGAAAAAACACTGGAAATGACCGA
    CATCAATTGGCGTGACGGCCATGACTTACGTCAGGATATCATTGATTATGAGATCGT
    TCAACACATCTTCGAAATCTTTCGCTTGACTGTTCAAATGCGCAATTCCTTGTCGGAA
    TTGGAGGACCGTGATTATGACCGCTTAATTTCCCCCGTCTTAAATGAAAACAATATT
    TTTTATGACTCTGCAAAAGCTGGAGATGCTCTGCCGAAAGACGCCGATGCAAATGG
    GGCATATTGCATTGCTTTAAAGGGGCTTTACGAGATCAAGCAAATCACCGAAAACTG
    GAAAGAGGATGGAAAGTTTTCGCGTGATAAACTGAAGATCTCTAACAAAGACTGGT
    TCGACTTTATCCAGAACAAGCGTTATTTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 66
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGCA
    CCAATAACTTCCAAAACTTCATCGGGATCTCTAGCCTTCAGAAGACGCTTCGCAATG
    CTCTTATCCCAACTGAGACCACTCAACAATTTATTGTGAAGAATGGAATTATTAAAG
    AGGACGAACTGCGTGGCGAGAATCGTCAGATCTTAAAGGACATTATGGATGATTAT
    TACCGTGGATTCATCTCCGAAACATTATCGTCGATCGATGATATCGATTGGACTTCTC
    TGTTCGAGAAAATGGAAATTCAATTGAAAAACGGAGATAATAAAGATACGCTTATC
    AAAGAACAGACGGAATATCGTAAAGCGATTCATAAGAAATTCGCAAATGACGATCG
    TTTCAAAAATATGTTCAGTGCCAAGCTTATTTCGGACATTTTACCTGAATTTGTAATT
    CATAATAATAACTACTCAGCAAGTGAGAAGGAGGAGAAAACCCAAGTTATTAAACT
    GTTCTCTCGTTTCGCAACGTCCTTTAAAGATTACTTTAAAAACCGCGCGAATTGCTTT
    AGCGCTGACGACATTTCCAGCTCATCCTGTCATCGCATCGTAAACGACAATGCGGAA
    ATCTTCTTCAGCAACGCCCTGGTTTACCGCCGCATCGTCAAAAGCTTATCGAATGAC
    GACATCAATAAGATCTCAGGAGATATGAAGGACTCGCTTAAGGAGATGTCTCTGGA
    GGAAATTTATAGTTACGAAAAGTATGGAGAGTTCATTACCCAGGAGGGAATCTCGTT
    CTACAATGACATTTGCGGGAAGGTGAACTCCTTCATGAACTTATACTGCCAGAAAAA
    CAAAGAGAACAAAAATCTGTATAAATTGCAGAAATTACATAAACAGATTCTTTGTAT
    TGCTGACACTTCCTACGAAGTACCCTATAAATTCGAGTCAGATGAAGAAGTATACCA
    GTCCGTGAACGGATTTCTGGACAATATCTCCTCAAAACACATCGTGGAACGCTTACG
    TAAAATTGGCGATAATTATAATGGTTACAATCTTGACAAAATTTATATCGTATCTAA
    ATTTTACGAGAGTGTGAGCCAAAAGACCTACCGCGACTGGGAGACCATCAACACAG
    CTTTAGAAATTCACTATAATAATATCTTACCCGGCAATGGTAAGAGCAAGGCTGACA
    AGGTAAAAAAGGCCGTCAAGAATGATTTGCAGAAATCTATTACAGAAATTAATGAG
    TTAGTCTCCAACTATAAGCTTTGTTCCGACGATAACATCAAAGCTGAGACATATATT
    CATGAGATTAGTCACATTCTTAACAACTTCGAGGCCCAGGAACTTAAGTACAATCCT
    GAAATTCATCTTGTCGAGTCTGAGCTGAAAGCTAGTGAATTGAAAAATGTTTTAGAC
    GTTATTATGAACGCATTCCACTGGTGCTCTGTGTTTATGACAGAAGAACTGGTCGAC
    AAGGACAATAACTTCTATGCCGAACTTGAGGAAATCTACGATGAAATTTACCCTGTA
    ATCTCCTTGTATAATCTTGTACGTAATTACGTCACTCAAAAACCTTACAGCACGAAA
    AAAATTAAATTGAACTTCGGGATTCCTACACTTGCCGACGGGTGGTCTAAATCCAAG
    GAATATAGCAACAATGCCATTATTTTAATGCGCGACAATCTTTACTATTTAGGAATT
    TTTAACGCTAAGAACAAGCCCGATAAAAAGATTATTGAAGGAAACACGTCTGAAAA
    TAAGGGCGACTACAAAAAGATGATTTATAACCTTTTGCCCGGTCCAAACAAAATGAT
    CCCAAAGGTATTCCTGTCATCCAAAACAGGGGTTGAGACATATAAGCCCAGCGCAT
    ATATTCTGGAAGGATACAAACAGAATAAACATATCAAAAGCAGCAAAGATTTTGAC
    ATTACTTTTTGCCACGATTTAATCGACTACTTCAAAAACTGTATCGCTATCCACCCTG
    AATGGAAGAATTTCGGATTTGATTTCTCAGATACAAGTACGTATGAGGATATCAGCG
    GTTTCTATCGCGAAGTTGAACTTCAAGGGTATAAAATTGACTGGACCTACATTAGTG
    AGAAGGACATCGACCTGTTACAGGAAAAAGGCCAATTGTACTTGTTTCAGATCTACA
    ATAAGGATTTCTCAAAAAAATCGACCGGCAATGATAACTTGCACACCATGTACCTGA
    AGAACCTTTTTTCGGAGGAAAACCTTAAAGACATTGTCCTGAAGTTGAATGGAGAA
    GCGGAGATTTTCTTTCGTAAGTCTTCCATTAAAAATCCAATTATTCATAAGAAGGGC
    AGCATCCTTGTGAACCGTACGTACGAGGCGGAAGAGAAGGACCAATTCGGTAACAT
    TCAAATCGTCCGCAAGAACATCCCTGAAAATATTTATCAGGAGCTTTACAAGTATTT
    CAATGATAAGTCCGACAAGGAATTATCAGATGAGGCTGCGAAGTTGAAAAATGTTG
    TTGGTCATCACGAGGCGGCGACGAATATTGTAAAGGATTATCGCTACACTTATGACA
    AGTACTTTCTGCACATGCCGATCACCATTAATTTCAAGGCGAACAAAACAGGATTTA
    TTAATGACCGCATCTTACAATACATTGCCAAAGAAAAGGACTTACACGTTATTGGCA
    TTGATCGTGGAGAACGCAACTTAATCTACGTAAGCGTTATTGACACTTGCGGGAATA
    TCGTAGAACAAAAGAGCTTCAACATCGTGAATGGTTACGATTACCAGATCAAGCTTA
    AGCAGCAGGAGGGAGCGCGCCAGATCGCGCGCAAGGAATGGAAGGAGATTGGTAA
    GATCAAGGAAATCAAGGAAGGTTATCTGTCCTTGGTAATCCACGAAATTTCGAAAAT
    GGTTATCAAATACAATGCTATTATTGCAATGGAGGACTTGTCCTACGGCTTTAAAAA
    AGGACGCTTTAAGGTGGAGCGCCAGGTTTATCAAAAGTTTGAAACAATGCTGATTA
    ACAAGCTGAACTATTTGGTCTTTAAAGATATCTCCATCACCGAAAATGGTGGGCTTT
    TGAAAGGCTATCAACTTACATATATCCCTGATAAGCTTAAGAATGTGGGTCATCAGT
    GCGGGTGCATTTTTTATGTTCCTGCAGCCTACACGTCCAAAATCGATCCTACAACTG
    GATTTGTTAATATCTTCAAATTTAAGGATCTTACCGTCGACGCGAAGCGCGAATTTA
    TCAAGAAATTCGATAGTATTCGTTATGATTCCGAAAAAAACCTTTTCTGTTTCACCTT
    TGATTATAATAACTTTATCACGCAAAATACTGTCATGAGCAAATCGAGTTGGTCTGT
    GTACACTTACGGAGTACGCATCAAGCGTCGTTTTGTTAATGGGCGCTTCAGTAACGA
    GTCAGACACGATTGATATCACAAAAGATATGGAGAAAACGCTGGAGATGACAGACA
    TCAATTGGCGCGATGGTCATGACTTACGTCAAGACATTATCGATTATGAAATTGTCC
    AGCATATCTTTGAGATCTTTCGTTTGACTGTTCAGATGCGCAACAGCCTGTCAGAATT
    GGAGGATCGTGACTATGATCGCCTTATTTCTCCCGTCTTAAATGAGAACAATATCTT
    CTACGACTCAGCCAAGGCTGGAGATGCACTGCCAAAAGACGCCGACGCAAATGGGG
    CCTACTGTATTGCATTGAAGGGGTTGTACGAGATCAAACAGATTACAGAAAATTGG
    AAGGAGGACGGTAAGTTCTCTCGTGATAAGCTGAAGATTTCTAACAAAGACTGGTTC
    GATTTCATTCAGAACAAACGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCCGG
    CCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAA
    CGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 67
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGTA
    CCAATAACTTTCAGAATTTCATTGGAATCAGCAGCTTACAGAAAACCCTGCGCAATG
    CACTTATCCCCACTGAGACAACCCAGCAGTTCATTGTAAAGAACGGGATTATTAAAG
    AAGATGAGCTTCGCGGGGAGAATCGTCAGATCTTAAAGGATATTATGGACGATTAC
    TACCGTGGCTTCATTTCGGAGACGCTGTCGTCGATCGACGACATCGACTGGACATCC
    TTGTTTGAAAAGATGGAAATCCAACTGAAGAATGGCGATAACAAGGACACGTTAAT
    CAAAGAGCAGACGGAATACCGTAAAGCTATCCACAAAAAGTTCGCTAATGACGACC
    GCTTTAAGAACATGTTCTCAGCAAAACTTATTAGCGATATTTTACCTGAATTTGTCAT
    CCACAATAACAATTACTCCGCGAGTGAAAAAGAGGAGAAAACCCAGGTGATTAAGC
    TGTTTTCCCGTTTTGCAACCAGTTTCAAGGACTATTTTAAGAATCGTGCTAATTGTTT
    CTCTGCAGACGACATTTCCTCGTCGTCCTGCCATCGCATTGTTAATGATAATGCTGAA
    ATCTTTTTTTCAAACGCACTTGTGTATCGTCGCATTGTCAAAAGCTTAAGTAATGACG
    ATATCAATAAGATCTCAGGAGACATGAAGGACTCCCTGAAAGAAATGTCATTGGAA
    GAAATTTACTCTTATGAAAAGTATGGAGAATTTATTACGCAGGAGGGTATCAGCTTC
    TATAACGACATTTGTGGTAAAGTGAACAGCTTTATGAATCTTTATTGTCAAAAGAAT
    AAAGAGAACAAAAATCTGTACAAGCTGCAGAAATTGCATAAACAAATTCTGTGCAT
    TGCAGATACTTCGTATGAGGTTCCTTACAAATTCGAGTCGGATGAGGAGGTGTATCA
    AAGCGTAAACGGATTTTTGGATAACATTAGTAGTAAGCATATTGTGGAACGCCTTCG
    CAAGATTGGTGACAACTATAACGGATACAACTTAGACAAGATCTATATTGTCTCGAA
    GTTTTACGAAAGTGTTTCCCAAAAGACTTATCGCGACTGGGAGACAATCAACACTGC
    GCTGGAAATTCACTATAACAATATCTTGCCGGGGAACGGAAAAAGTAAGGCAGATA
    AGGTGAAGAAAGCAGTCAAAAATGATCTGCAAAAAAGCATTACTGAAATTAACGAA
    CTTGTGTCAAATTACAAATTGTGTTCGGATGACAATATTAAAGCGGAAACGTATATC
    CACGAGATCTCGCACATTCTTAATAATTTCGAGGCGCAGGAATTAAAGTATAATCCT
    GAGATCCATTTGGTGGAATCAGAACTTAAAGCTAGTGAACTGAAAAATGTCCTGGA
    CGTTATTATGAATGCATTTCACTGGTGTTCTGTCTTTATGACAGAAGAACTTGTCGAC
    AAAGACAACAACTTTTATGCGGAATTAGAAGAGATTTACGACGAAATTTATCCCGTT
    ATTTCGTTATATAATTTAGTTCGTAATTACGTGACTCAGAAACCCTACAGCACAAAA
    AAGATTAAATTAAACTTTGGGATTCCGACTCTTGCTGATGGATGGAGCAAGTCCAAG
    GAGTACTCTAATAACGCCATTATCTTGATGCGTGACAACCTGTACTACCTGGGCATT
    TTTAACGCTAAAAACAAACCCGACAAAAAGATCATTGAAGGGAACACCTCGGAAAA
    TAAGGGGGACTATAAAAAAATGATCTACAATCTGTTGCCAGGCCCAAATAAGATGA
    TCCCAAAGGTTTTTTTATCTTCCAAAACTGGCGTAGAAACTTACAAGCCGAGCGCAT
    ACATCCTTGAAGGATATAAACAAAACAAACATATCAAAAGTTCAAAGGACTTCGAT
    ATTACGTTCTGCCATGATTTAATCGATTATTTCAAGAATTGCATCGCGATTCACCCAG
    AGTGGAAAAACTTTGGGTTTGATTTTTCAGACACCAGCACTTACGAGGATATTAGTG
    GATTCTATCGTGAGGTTGAACTGCAGGGCTATAAAATTGACTGGACCTATATTTCTG
    AAAAAGATATTGATCTGCTTCAGGAGAAAGGCCAATTGTACTTATTTCAAATCTATA
    ACAAGGATTTCTCCAAGAAGTCCACGGGTAATGACAACTTACACACAATGTATCTGA
    AGAATCTGTTTAGTGAGGAGAACTTGAAGGACATTGTGCTGAAGCTTAATGGCGAG
    GCCGAAATCTTTTTTCGTAAGTCCTCCATTAAAAACCCTATTATCCATAAGAAAGGG
    AGTATTCTTGTCAACCGCACGTATGAGGCCGAAGAAAAGGACCAATTCGGAAACAT
    CCAAATTGTCCGTAAAAATATTCCTGAGAACATTTACCAGGAGCTTTACAAGTATTT
    CAACGACAAGAGTGATAAAGAACTTTCAGATGAGGCGGCGAAACTGAAGAATGTAG
    TGGGGCACCACGAAGCTGCCACGAATATTGTAAAGGATTACCGTTACACCTACGAC
    AAGTACTTTTTGCATATGCCCATCACAATTAATTTTAAGGCCAATAAAACTGGTTTTA
    TCAACGATCGTATCTTACAGTACATTGCTAAGGAAAAAGATCTGCACGTTATCGGTA
    TCGATCGCGGGGAACGCAATCTGATTTATGTTAGTGTGATTGACACGTGCGGAAATA
    TTGTTGAGCAGAAGAGCTTTAATATCGTAAATGGATATGACTATCAAATTAAACTGA
    AGCAACAGGAAGGGGCCCGCCAGATTGCCCGCAAGGAGTGGAAAGAAATTGGAAA
    GATCAAGGAGATTAAAGAAGGGTACCTTTCCCTTGTTATCCACGAAATCTCGAAAAT
    GGTGATCAAGTACAATGCCATTATTGCTATGGAGGATCTGTCATATGGGTTTAAGAA
    AGGCCGCTTTAAGGTGGAACGTCAGGTTTACCAGAAGTTTGAGACCATGCTTATCAA
    TAAGCTGAATTATCTTGTCTTCAAAGACATCTCAATCACAGAGAACGGCGGGCTGTT
    AAAAGGATATCAGCTGACCTATATCCCCGACAAACTGAAAAATGTCGGGCACCAAT
    GCGGCTGTATTTTCTACGTGCCCGCTGCATACACATCTAAAATTGACCCAACGACTG
    GATTCGTAAATATTTTTAAGTTTAAGGATCTTACGGTAGATGCAAAGCGCGAATTTA
    TCAAGAAATTTGATAGTATCCGTTACGACAGCGAGAAAAACTTATTTTGTTTTACGT
    TCGATTATAACAACTTCATCACGCAAAATACCGTCATGTCAAAATCTTCCTGGTCAG
    TCTATACGTATGGCGTCCGTATCAAGCGCCGCTTCGTCAACGGGCGTTTTTCAAACG
    AGTCAGATACCATCGATATCACCAAAGATATGGAAAAAACATTGGAGATGACGGAC
    ATCAATTGGCGCGATGGTCATGACTTACGCCAGGACATTATTGACTACGAAATCGTA
    CAACATATTTTTGAGATTTTCCGTCTGACCGTGCAAATGCGCAACTCATTATCCGAA
    CTTGAGGATCGTGATTACGACCGCTTGATCAGTCCTGTTCTGAACGAGAATAATATT
    TTTTACGACAGTGCCAAGGCGGGAGACGCACTGCCCAAGGACGCTGACGCTAACGG
    AGCTTATTGTATTGCGTTGAAGGGACTTTACGAAATCAAGCAAATCACTGAAAACTG
    GAAGGAGGATGGTAAATTCTCACGCGACAAGTTGAAAATTTCGAACAAGGACTGGT
    TCGATTTCATCCAAAACAAGCGTTATTTAAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 68
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGGA
    CTAATAACTTCCAGAACTTCATCGGTATTTCATCATTACAAAAAACGCTTCGTAACG
    CCTTGATCCCAACAGAAACGACCCAACAATTTATTGTAAAAAACGGCATCATCAAA
    GAAGACGAACTGCGTGGCGAAAATCGCCAAATTTTGAAGGACATTATGGATGACTA
    TTATCGTGGGTTTATCTCGGAGACATTATCCTCCATCGACGACATTGATTGGACGAG
    TCTTTTTGAGAAAATGGAGATCCAGCTTAAAAATGGTGATAACAAGGATACATTGAT
    CAAGGAGCAAACCGAGTACCGCAAGGCCATCCATAAGAAGTTCGCAAATGACGACC
    GCTTCAAAAATATGTTTAGTGCCAAATTGATCTCGGATATCCTTCCTGAGTTCGTAAT
    TCACAACAATAATTATAGCGCATCCGAAAAGGAGGAAAAGACTCAAGTCATTAAGC
    TTTTCAGTCGCTTTGCTACCTCGTTTAAGGACTATTTCAAGAACCGCGCGAACTGCTT
    CTCAGCGGATGACATTTCTTCCTCGTCGTGTCACCGCATCGTGAATGATAATGCGGA
    GATCTTCTTTAGTAATGCCTTGGTATACCGCCGCATTGTTAAATCCCTGTCTAACGAC
    GATATCAATAAGATCTCAGGAGATATGAAGGATAGCCTTAAAGAAATGTCTCTGGA
    AGAAATTTACTCCTATGAAAAGTACGGTGAGTTTATCACCCAAGAGGGGATTAGCTT
    TTATAACGATATCTGCGGGAAGGTGAATTCGTTTATGAACCTTTATTGTCAAAAGAA
    TAAGGAGAATAAGAACTTATATAAGCTTCAGAAACTGCATAAACAAATCTTATGCA
    TTGCCGATACTAGCTATGAAGTTCCGTATAAATTCGAGAGCGATGAAGAAGTTTATC
    AGAGCGTCAATGGGTTCTTGGATAACATTTCATCAAAACACATCGTGGAACGTCTGC
    GTAAGATTGGGGATAACTACAACGGATATAATCTTGACAAAATTTATATTGTATCTA
    AATTCTATGAGTCGGTGAGTCAAAAGACCTACCGTGATTGGGAAACAATCAATACC
    GCGTTAGAAATCCACTATAACAACATTCTGCCAGGGAATGGTAAAAGTAAAGCGGA
    CAAAGTCAAGAAGGCTGTGAAGAACGATCTGCAAAAGAGTATTACAGAGATTAACG
    AATTAGTCTCCAATTATAAGTTATGCTCGGACGATAACATTAAGGCGGAGACGTATA
    TTCATGAGATTTCGCATATTCTTAACAACTTCGAGGCACAAGAGCTTAAGTATAACC
    CAGAGATTCACCTTGTCGAATCGGAGCTGAAGGCATCGGAATTAAAAAATGTCTTA
    GATGTAATCATGAACGCGTTCCATTGGTGCAGTGTTTTCATGACTGAGGAGTTAGTT
    GACAAGGACAATAACTTCTACGCAGAATTAGAAGAGATCTATGATGAGATTTATCC
    AGTGATTTCGCTGTATAATCTGGTACGTAATTACGTCACTCAAAAGCCCTACTCAAC
    AAAAAAAATTAAGCTGAACTTCGGAATTCCGACTCTGGCCGACGGGTGGTCCAAGT
    CAAAGGAGTATTCTAATAATGCTATCATCCTGATGCGCGATAACTTATACTATTTGG
    GAATTTTCAATGCCAAAAATAAACCAGATAAAAAGATTATCGAAGGTAATACAAGC
    GAGAATAAGGGTGACTATAAGAAAATGATTTACAATCTTCTTCCAGGCCCTAACAA
    GATGATTCCCAAAGTTTTTTTGTCCAGTAAAACAGGGGTCGAAACTTACAAGCCCAG
    TGCCTATATCCTTGAAGGGTACAAGCAGAATAAGCACATCAAATCCTCGAAAGACTT
    TGATATTACATTTTGTCATGACTTAATCGATTATTTTAAGAACTGTATCGCAATCCAT
    CCAGAATGGAAGAACTTCGGGTTTGATTTCTCTGATACTTCCACGTATGAGGATATT
    TCCGGGTTCTACCGCGAAGTAGAGCTTCAGGGCTATAAAATTGACTGGACATATATT
    TCAGAAAAAGACATCGATCTGTTACAAGAAAAAGGACAGTTGTATCTGTTTCAAATC
    TATAATAAGGATTTCTCCAAAAAGTCAACTGGAAATGATAACTTACATACAATGTAT
    CTGAAAAATCTTTTTAGTGAAGAGAATTTGAAGGATATCGTGCTGAAGTTAAATGGC
    GAAGCAGAGATCTTCTTCCGCAAGTCCTCGATCAAGAATCCTATCATCCACAAGAAA
    GGTAGTATTCTGGTTAACCGCACGTACGAGGCCGAGGAAAAAGACCAGTTCGGTAA
    TATCCAGATTGTACGTAAGAATATTCCTGAAAATATTTACCAGGAATTATACAAGTA
    TTTTAACGACAAATCGGATAAGGAGCTTTCAGATGAGGCCGCAAAGTTGAAGAACG
    TCGTAGGACACCATGAGGCCGCTACGAATATCGTCAAGGACTACCGCTATACGTATG
    ACAAGTACTTCCTGCACATGCCTATTACTATCAATTTCAAAGCTAATAAAACAGGAT
    TCATCAATGATCGTATCCTTCAGTACATTGCCAAAGAAAAAGATCTGCACGTAATCG
    GAATCGACCGTGGCGAACGTAATCTGATTTACGTATCAGTTATCGACACATGTGGTA
    ACATCGTGGAGCAGAAATCTTTTAACATTGTTAACGGCTATGATTATCAGATTAAGC
    TTAAACAGCAGGAGGGGGCACGCCAAATCGCTCGTAAAGAATGGAAGGAGATTGG
    AAAGATTAAAGAGATTAAAGAGGGGTACCTTTCGCTGGTTATTCACGAAATTTCCAA
    GATGGTGATTAAGTACAATGCAATCATCGCGATGGAAGATCTTAGTTACGGATTCAA
    AAAGGGACGCTTCAAAGTTGAGCGTCAGGTCTACCAGAAATTTGAAACGATGCTGA
    TTAACAAATTGAATTACTTGGTATTCAAAGATATCTCAATTACTGAAAATGGTGGCT
    TATTAAAGGGTTACCAGCTTACCTATATCCCGGATAAGCTGAAGAACGTGGGCCATC
    AATGCGGCTGCATCTTTTACGTCCCTGCCGCATATACCTCTAAAATTGACCCCACCA
    CCGGATTCGTAAATATTTTTAAATTCAAGGACCTGACGGTGGACGCCAAGCGCGAAT
    TCATCAAAAAATTCGACTCAATCCGCTATGATTCCGAAAAAAATCTTTTCTGCTTTAC
    GTTCGATTATAATAACTTCATTACCCAAAACACGGTGATGTCAAAATCGTCCTGGAG
    CGTGTATACTTATGGAGTGCGTATCAAGCGCCGCTTTGTTAATGGGCGCTTCAGTAA
    CGAAAGCGATACCATCGACATTACCAAAGACATGGAGAAGACGCTTGAAATGACGG
    ATATCAATTGGCGTGACGGACACGATCTTCGTCAGGATATCATCGACTACGAGATTG
    TGCAACATATCTTTGAGATTTTCCGTTTAACTGTTCAAATGCGTAACTCCTTGTCCGA
    ATTGGAAGACCGTGATTACGACCGCTTGATTTCACCAGTGCTTAACGAGAATAACAT
    CTTCTACGACTCCGCCAAAGCAGGCGATGCCCTGCCAAAGGACGCTGATGCAAATG
    GTGCATACTGTATCGCGTTGAAGGGCTTATACGAGATTAAGCAAATCACCGAAAATT
    GGAAAGAGGATGGAAAGTTCAGTCGCGATAAGCTGAAGATCTCTAATAAAGATTGG
    TTTGACTTTATCCAGAACAAACGTTATTTAAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 69
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGTA
    CCAATAATTTCCAAAATTTCATCGGAATCTCATCCTTGCAAAAAACCTTGCGCAATG
    CTTTGATCCCCACCGAAACCACGCAGCAGTTCATCGTGAAAAACGGCATTATCAAAG
    AGGATGAGTTGCGCGGGGAAAACCGTCAAATTCTTAAGGATATCATGGACGATTAC
    TACCGTGGGTTTATCAGTGAGACCCTGTCAAGCATTGACGACATTGACTGGACCAGC
    TTATTTGAGAAGATGGAGATTCAATTAAAGAACGGGGACAATAAGGACACGCTTAT
    CAAAGAGCAGACAGAATACCGTAAAGCGATTCATAAGAAATTTGCAAATGACGATC
    GCTTCAAGAACATGTTTTCAGCAAAATTAATCAGCGACATCCTTCCCGAATTTGTGA
    TTCATAATAACAACTATTCGGCTAGCGAAAAAGAGGAGAAAACTCAGGTTATTAAG
    CTTTTCTCGCGTTTTGCCACTTCGTTCAAAGACTATTTTAAGAATCGCGCAAACTGCT
    TTTCGGCTGATGATATTTCCAGTTCTAGCTGCCATCGTATCGTTAACGATAATGCTGA
    GATTTTCTTCTCTAATGCCCTGGTGTATCGTCGTATCGTTAAATCTTTGAGCAACGAC
    GATATTAATAAGATTTCAGGCGACATGAAGGATTCTTTAAAGGAGATGTCTTTAGAA
    GAGATTTATTCCTATGAGAAATATGGCGAGTTTATCACCCAAGAAGGAATTTCGTTC
    TACAACGACATCTGTGGCAAAGTGAACAGCTTCATGAATTTATACTGCCAAAAGAAT
    AAGGAGAATAAAAATTTATATAAACTGCAGAAACTGCATAAGCAAATTCTTTGCATT
    GCAGACACCTCTTATGAAGTTCCTTATAAGTTTGAATCGGACGAGGAGGTATATCAG
    AGTGTGAACGGGTTCCTGGACAATATTTCATCCAAGCATATTGTTGAACGTTTACGC
    AAAATTGGAGACAATTACAATGGGTATAACCTTGACAAAATTTACATCGTGTCGAA
    GTTTTACGAATCGGTAAGCCAGAAGACCTATCGTGACTGGGAAACTATCAATACCGC
    CTTAGAAATTCATTACAACAATATTCTTCCTGGTAACGGCAAAAGCAAAGCCGATAA
    GGTAAAGAAGGCTGTCAAGAACGACCTGCAAAAGTCTATCACAGAGATCAACGAGT
    TAGTCTCTAACTACAAATTATGTTCCGACGACAATATTAAAGCCGAAACCTACATCC
    ATGAGATCTCACACATTCTTAACAATTTTGAGGCCCAGGAGCTGAAATATAACCCAG
    AAATTCACCTTGTAGAGAGCGAATTAAAAGCCTCCGAGCTGAAGAACGTTTTGGAT
    GTAATCATGAACGCATTTCATTGGTGCAGCGTATTTATGACAGAGGAGTTGGTCGAC
    AAGGACAATAACTTTTACGCCGAGCTTGAAGAAATCTACGATGAAATTTACCCGGTA
    ATTAGTTTATATAATTTAGTTCGCAACTACGTAACTCAGAAACCCTACAGTACCAAG
    AAGATTAAATTGAACTTTGGGATCCCGACACTTGCTGACGGTTGGAGTAAATCAAAA
    GAATACTCCAATAATGCAATTATCCTGATGCGCGACAATCTTTACTACTTGGGGATC
    TTTAACGCAAAGAACAAACCAGATAAGAAAATCATCGAGGGCAACACCAGCGAGA
    ATAAAGGCGATTACAAGAAAATGATCTATAATCTTTTGCCGGGACCGAACAAAATG
    ATCCCAAAGGTTTTCCTGTCGTCGAAAACGGGAGTCGAGACATATAAACCATCTGCG
    TACATCTTGGAAGGTTACAAACAGAATAAGCATATTAAGTCTAGTAAAGACTTCGAC
    ATCACCTTTTGTCATGACCTGATTGATTATTTCAAGAACTGTATTGCTATCCATCCAG
    AATGGAAAAACTTCGGATTTGACTTCTCCGATACTAGCACCTACGAAGACATTTCGG
    GTTTTTATCGCGAAGTAGAGCTTCAAGGGTACAAAATTGATTGGACATATATTAGCG
    AGAAAGACATTGATTTGCTTCAAGAGAAGGGACAGTTATATTTATTCCAGATCTACA
    ACAAAGACTTCTCGAAGAAATCCACCGGTAATGATAATCTTCACACTATGTACCTGA
    AGAATTTATTTTCAGAGGAAAATCTGAAGGACATTGTACTTAAACTTAATGGAGAAG
    CCGAAATCTTCTTCCGCAAGAGTTCCATTAAAAATCCGATTATTCATAAAAAGGGAA
    GTATCCTTGTGAACCGCACGTATGAGGCCGAAGAGAAGGATCAGTTTGGGAATATT
    CAAATTGTCCGCAAAAACATCCCCGAGAACATCTACCAGGAACTGTATAAATACTTT
    AATGATAAATCTGATAAAGAGTTATCAGACGAGGCTGCCAAACTGAAAAACGTAGT
    CGGTCATCATGAGGCAGCGACCAATATTGTAAAGGACTACCGTTACACCTACGACA
    AGTATTTCCTTCACATGCCGATCACGATTAATTTTAAGGCTAACAAGACCGGCTTTA
    TCAATGACCGCATCTTGCAGTACATCGCGAAAGAGAAAGATTTACACGTCATCGGA
    ATTGATCGTGGAGAGCGTAATCTTATCTACGTCAGCGTCATCGACACCTGTGGAAAC
    ATTGTGGAACAAAAAAGTTTTAATATCGTAAACGGCTACGACTATCAAATTAAACTT
    AAACAGCAAGAGGGAGCTCGCCAGATCGCTCGCAAAGAGTGGAAAGAGATTGGGA
    AAATTAAAGAAATTAAAGAGGGTTACCTGTCGCTGGTAATTCACGAAATCTCGAAA
    ATGGTCATCAAATATAATGCAATTATCGCTATGGAGGATCTGTCCTACGGGTTCAAG
    AAGGGACGTTTTAAAGTAGAGCGCCAGGTGTATCAAAAATTCGAAACCATGTTGAT
    CAATAAGCTTAACTATTTGGTCTTCAAAGATATTTCGATTACGGAGAACGGAGGTTT
    GTTGAAAGGATATCAGCTGACGTATATCCCAGACAAGTTGAAAAACGTGGGGCATC
    AATGTGGATGTATTTTCTATGTGCCCGCGGCCTACACGAGTAAGATCGATCCTACCA
    CTGGTTTCGTCAACATTTTCAAATTTAAAGATCTTACCGTGGATGCGAAGCGCGAAT
    TTATTAAGAAATTTGATAGCATTCGCTATGATTCCGAAAAGAACCTGTTCTGTTTTAC
    GTTCGACTATAACAATTTCATTACCCAAAACACGGTGATGAGCAAATCCTCTTGGTC
    AGTTTATACATACGGTGTACGTATCAAACGCCGTTTCGTTAACGGACGCTTTTCCAA
    TGAGTCTGATACAATCGATATCACGAAAGATATGGAAAAAACATTAGAGATGACTG
    ATATCAACTGGCGTGACGGGCACGACCTGCGTCAAGACATTATTGACTACGAGATTG
    TGCAGCATATCTTCGAAATCTTTCGCTTAACTGTGCAAATGCGTAACTCGTTATCCGA
    GTTAGAAGACCGTGACTACGATCGCCTGATTTCACCCGTCTTGAACGAAAATAACAT
    CTTCTACGATTCCGCGAAGGCTGGGGACGCATTGCCCAAGGACGCAGACGCGAATG
    GAGCGTACTGTATTGCGCTTAAAGGATTATATGAAATCAAGCAGATCACCGAAAATT
    GGAAGGAGGACGGGAAGTTCTCACGCGACAAACTGAAGATTTCAAATAAGGACTGG
    TTCGATTTCATTCAGAATAAGCGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 70
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTA
    CGAACAACTTTCAGAACTTCATCGGCATCTCCAGCCTTCAAAAGACTTTACGCAACG
    CATTGATTCCCACGGAGACTACGCAACAGTTTATCGTAAAAAATGGTATTATCAAAG
    AAGATGAATTACGCGGGGAGAATCGCCAGATTCTTAAGGACATTATGGACGATTAT
    TACCGTGGATTCATCAGTGAGACACTGAGCTCCATTGATGACATCGACTGGACGTCA
    TTGTTTGAAAAGATGGAAATCCAGTTGAAAAATGGCGATAACAAAGATACATTGAT
    TAAAGAGCAGACAGAGTACCGCAAAGCAATTCACAAGAAATTCGCCAATGATGATC
    GTTTTAAGAACATGTTTAGTGCCAAGCTTATTTCGGATATCTTACCCGAATTCGTGAT
    TCACAACAACAATTATTCGGCAAGTGAGAAAGAGGAAAAGACCCAGGTTATCAAAT
    TGTTTTCGCGCTTCGCCACTTCGTTCAAAGATTATTTCAAGAACCGTGCAAACTGTTT
    CTCCGCTGACGACATCAGTTCCAGCTCATGCCACCGTATTGTAAATGACAATGCGGA
    GATCTTTTTCAGTAATGCCTTAGTATATCGTCGCATTGTAAAGAGCTTATCTAATGAT
    GACATTAACAAGATCTCGGGTGATATGAAGGACTCACTTAAGGAGATGAGTCTGGA
    AGAGATCTACTCCTACGAAAAATACGGGGAATTCATCACCCAGGAGGGAATTTCAT
    TCTACAACGATATCTGCGGCAAAGTTAACTCCTTTATGAATCTGTACTGTCAAAAGA
    ACAAGGAGAATAAAAACCTGTATAAATTGCAGAAACTTCATAAACAAATTTTGTGT
    ATCGCAGACACGAGTTATGAAGTACCTTATAAATTCGAATCCGACGAAGAGGTATA
    TCAGTCCGTAAATGGGTTCCTGGACAATATCAGTAGTAAGCACATTGTGGAACGCTT
    ACGCAAAATTGGAGACAATTACAACGGGTATAACCTGGACAAAATCTACATCGTAT
    CCAAATTTTATGAAAGCGTGTCTCAAAAAACTTATCGTGATTGGGAAACAATCAACA
    CGGCTCTTGAGATCCATTACAATAACATCTTGCCGGGTAACGGCAAATCGAAGGCA
    GACAAAGTTAAAAAAGCAGTTAAGAACGACTTACAGAAAAGCATTACGGAGATTAA
    CGAGTTAGTAAGTAATTACAAATTATGCTCCGACGATAATATCAAAGCTGAAACCTA
    CATCCATGAAATTAGCCACATTTTGAACAATTTCGAAGCGCAGGAGCTGAAATATAA
    CCCTGAAATCCATCTGGTAGAGTCTGAGTTGAAGGCGTCAGAACTGAAAAACGTTCT
    TGACGTCATCATGAATGCCTTTCACTGGTGTAGTGTTTTTATGACTGAGGAGCTTGTA
    GATAAGGACAACAACTTCTATGCTGAACTTGAAGAGATCTACGATGAAATCTACCCC
    GTAATCAGTCTGTATAATTTAGTTCGTAACTACGTCACGCAGAAACCCTATTCGACT
    AAGAAAATTAAGCTGAACTTTGGGATCCCTACTTTGGCAGACGGGTGGAGCAAGAG
    TAAAGAATACAGTAATAATGCAATTATCTTGATGCGCGATAACTTATATTACTTAGG
    TATTTTCAATGCTAAGAACAAACCTGATAAGAAGATTATCGAAGGAAATACGAGTG
    AGAATAAGGGAGACTACAAAAAGATGATTTACAACTTGCTGCCAGGGCCTAATAAG
    ATGATTCCAAAAGTTTTTCTGTCGAGCAAGACAGGGGTTGAAACTTATAAGCCATCC
    GCTTATATCCTTGAGGGGTACAAGCAGAATAAGCATATCAAGTCCTCCAAAGATTTT
    GATATTACATTTTGCCACGACTTAATTGATTACTTCAAGAACTGCATCGCAATCCATC
    CCGAATGGAAGAATTTCGGCTTCGATTTCTCAGATACGTCCACGTATGAGGATATCT
    CAGGCTTTTACCGCGAAGTTGAGCTGCAAGGTTATAAAATTGATTGGACATACATCT
    CCGAAAAAGACATTGATCTTTTACAGGAAAAGGGCCAATTATACTTATTTCAAATCT
    ATAACAAAGATTTTAGCAAGAAGTCCACAGGTAATGATAACCTGCATACGATGTATT
    TGAAAAATCTTTTCAGTGAAGAGAATTTGAAGGATATCGTCCTGAAGCTGAACGGTG
    AGGCTGAGATCTTCTTCCGCAAATCGTCTATCAAAAACCCCATCATTCACAAAAAGG
    GAAGTATCTTAGTAAACCGCACTTATGAAGCGGAGGAAAAGGATCAGTTCGGGAAC
    ATCCAGATCGTGCGCAAGAACATTCCAGAAAACATCTATCAGGAACTTTACAAATAT
    TTCAATGACAAGTCTGATAAAGAATTATCAGACGAGGCGGCGAAACTTAAAAATGT
    TGTTGGACACCACGAAGCAGCGACGAATATTGTAAAGGATTATCGCTACACATACG
    ATAAATACTTTTTGCACATGCCAATCACCATTAACTTTAAGGCGAACAAGACAGGTT
    TCATTAACGACCGTATTCTGCAATATATCGCAAAGGAAAAAGACCTGCACGTTATTG
    GGATCGATCGTGGCGAACGCAATTTGATCTACGTAAGCGTTATCGACACTTGCGGAA
    ATATCGTTGAACAAAAAAGCTTTAATATCGTCAATGGATACGATTACCAAATCAAGC
    TGAAACAACAAGAAGGGGCACGTCAGATCGCTCGTAAAGAATGGAAAGAGATTGGT
    AAGATCAAAGAGATTAAAGAAGGGTATCTTTCTTTAGTAATTCACGAGATTTCGAAA
    ATGGTTATTAAATACAATGCGATTATTGCTATGGAAGACTTAAGCTACGGCTTTAAG
    AAAGGTCGCTTCAAAGTGGAGCGCCAAGTGTATCAGAAGTTTGAAACGATGTTGAT
    TAACAAATTAAATTACCTGGTCTTTAAGGACATCAGTATCACAGAAAATGGGGGGTT
    GCTTAAAGGGTACCAGCTTACATACATCCCTGATAAACTGAAAAATGTCGGTCATCA
    GTGCGGATGTATCTTCTATGTACCAGCAGCCTATACCAGTAAGATTGACCCTACTAC
    TGGCTTTGTGAATATTTTTAAATTCAAGGATTTAACCGTGGACGCCAAGCGTGAATT
    TATTAAAAAATTTGATTCGATTCGCTACGACAGTGAGAAAAACCTTTTCTGCTTTAC
    CTTTGACTACAACAATTTTATTACCCAGAACACCGTAATGTCAAAGAGTTCGTGGTC
    TGTATATACCTACGGTGTTCGCATCAAGCGCCGCTTCGTAAACGGGCGTTTCAGTAA
    CGAATCTGACACCATCGACATCACTAAAGATATGGAGAAGACATTGGAAATGACGG
    ACATTAATTGGCGTGATGGCCATGACTTACGTCAGGACATTATTGATTACGAAATTG
    TGCAGCATATCTTCGAGATTTTCCGTTTGACAGTTCAGATGCGCAACTCACTGAGTG
    AGTTAGAAGATCGCGATTACGACCGTCTGATCTCACCGGTCCTTAATGAAAACAACA
    TTTTCTACGACTCAGCAAAGGCGGGTGATGCCCTGCCAAAGGATGCGGACGCTAAT
    GGCGCCTACTGCATCGCCCTGAAAGGATTGTATGAAATTAAGCAGATTACAGAAAA
    TTGGAAGGAAGATGGTAAATTTAGCCGTGATAAATTAAAAATCTCGAACAAGGATT
    GGTTCGATTTTATTCAGAACAAACGTTATTTGAAACGTCCGGCAGCGACCAAAAAAG
    CCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAA
    GAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 71
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAA
    CAAATAATTTTCAAAATTTTATCGGCATCTCAAGTCTTCAAAAAACCCTTCGCAATG
    CCCTGATTCCAACTGAAACAACCCAGCAATTTATCGTCAAGAACGGCATCATTAAGG
    AAGACGAGTTACGCGGGGAGAACCGTCAAATCCTGAAAGATATCATGGATGACTAC
    TATCGTGGGTTCATTTCGGAAACCTTGTCTTCAATCGACGACATTGACTGGACGAGT
    CTTTTCGAGAAAATGGAAATTCAGCTTAAAAATGGAGACAACAAGGATACTCTGAT
    TAAGGAACAGACAGAATATCGCAAAGCTATCCACAAAAAGTTCGCTAATGATGATC
    GTTTCAAAAATATGTTTTCTGCTAAATTGATTTCCGATATCTTGCCTGAATTTGTAAT
    CCACAACAACAATTATTCTGCTTCCGAGAAGGAAGAGAAGACCCAGGTCATTAAAT
    TATTCAGCCGCTTTGCAACCAGCTTTAAAGACTACTTTAAGAATCGCGCTAACTGCT
    TTTCGGCGGATGACATCTCATCATCATCATGCCACCGCATTGTGAACGACAATGCGG
    AGATCTTCTTTTCGAATGCGTTAGTTTATCGTCGCATTGTCAAAAGTCTTAGCAATGA
    TGACATCAACAAGATCTCAGGAGACATGAAAGATTCCTTAAAGGAGATGTCTCTTG
    AGGAAATCTATTCGTATGAGAAATACGGCGAGTTCATTACCCAGGAAGGTATTAGTT
    TCTACAATGATATCTGCGGCAAAGTAAATTCTTTTATGAATCTGTATTGCCAAAAAA
    ACAAAGAAAACAAGAATCTTTATAAGTTACAAAAGTTACATAAGCAAATTCTGTGC
    ATCGCTGATACATCTTATGAGGTACCCTACAAATTTGAAAGTGATGAGGAGGTCTAT
    CAGAGTGTCAACGGCTTCTTAGACAACATCTCTTCCAAACATATCGTGGAACGCCTG
    CGTAAAATCGGAGATAACTACAACGGATATAACTTAGATAAAATCTACATCGTGTCC
    AAGTTTTATGAAAGTGTGAGCCAAAAAACATATCGTGACTGGGAAACCATTAACAC
    CGCATTGGAAATTCACTATAACAACATTTTGCCAGGCAACGGGAAAAGTAAGGCGG
    ACAAAGTTAAGAAAGCAGTTAAAAATGACCTGCAAAAAAGCATCACTGAAATTAAC
    GAATTGGTATCGAATTACAAATTATGTAGCGACGATAATATCAAAGCAGAAACTTA
    CATTCACGAGATTAGTCACATTTTAAATAACTTCGAGGCCCAGGAATTGAAATACAA
    TCCCGAAATTCATTTGGTTGAATCAGAACTGAAAGCATCAGAGTTGAAAAATGTGTT
    AGATGTCATTATGAATGCGTTTCATTGGTGCTCTGTGTTCATGACCGAGGAACTGGT
    TGATAAAGATAACAACTTTTACGCTGAATTGGAGGAGATTTACGATGAGATTTACCC
    GGTCATTTCGCTTTATAACTTAGTGCGCAATTATGTGACGCAGAAACCATATTCCAC
    GAAGAAAATCAAACTTAATTTTGGCATCCCTACTCTGGCTGATGGTTGGTCGAAATC
    GAAAGAGTACAGCAACAACGCGATCATTCTTATGCGTGACAATCTTTACTATTTGGG
    CATTTTTAATGCCAAGAATAAGCCAGATAAGAAAATCATTGAGGGGAATACTTCCG
    AGAATAAGGGGGATTACAAAAAGATGATCTATAACTTGCTGCCCGGCCCCAACAAA
    ATGATTCCTAAGGTTTTCTTGTCAAGCAAGACGGGCGTCGAAACATATAAGCCGTCA
    GCTTATATTCTGGAAGGCTATAAACAGAATAAGCACATCAAGTCTTCCAAGGACTTT
    GACATCACTTTTTGCCACGATTTGATCGACTACTTTAAGAACTGTATTGCGATTCATC
    CGGAATGGAAGAACTTCGGTTTCGACTTTTCCGATACCTCAACATACGAGGATATCA
    GCGGCTTCTACCGTGAAGTCGAGCTTCAAGGCTACAAGATCGATTGGACATATATTT
    CAGAGAAGGACATTGATTTGTTACAAGAGAAAGGTCAACTTTACTTATTTCAGATCT
    ATAACAAAGACTTTTCGAAGAAATCGACAGGAAACGATAACTTACACACTATGTAT
    TTAAAAAATCTGTTTTCGGAGGAAAACCTGAAAGATATTGTGCTGAAACTTAACGGC
    GAGGCAGAGATCTTTTTCCGTAAAAGCTCAATCAAGAATCCTATCATCCATAAAAAA
    GGTAGTATTCTTGTCAACCGCACATATGAAGCGGAGGAGAAGGACCAATTCGGAAA
    CATCCAAATTGTCCGTAAGAATATTCCGGAGAACATTTACCAAGAGTTGTATAAATA
    CTTTAACGATAAGTCAGATAAGGAACTTAGCGATGAGGCGGCGAAGCTTAAAAACG
    TAGTTGGGCATCATGAAGCTGCTACCAACATTGTAAAAGATTACCGTTACACCTATG
    ACAAGTATTTCTTGCACATGCCCATTACGATCAATTTCAAAGCAAATAAGACAGGCT
    TTATCAATGATCGCATCCTGCAGTACATTGCTAAAGAGAAGGATTTGCATGTTATCG
    GTATTGATCGCGGAGAGCGCAATTTGATCTACGTCTCCGTAATCGACACTTGCGGTA
    ACATTGTTGAGCAGAAGTCGTTCAACATCGTTAATGGTTATGATTACCAAATCAAGC
    TGAAGCAGCAAGAGGGTGCCCGCCAGATCGCGCGTAAGGAATGGAAAGAAATCGG
    GAAAATTAAAGAGATCAAAGAAGGCTATTTGTCTCTGGTAATTCACGAAATCAGCA
    AGATGGTGATCAAGTATAACGCGATCATTGCGATGGAGGATCTTTCTTATGGCTTCA
    AGAAAGGGCGCTTTAAAGTCGAACGCCAGGTCTACCAGAAATTTGAGACAATGCTT
    ATCAACAAGCTTAACTATCTTGTATTTAAGGATATTTCCATCACTGAGAACGGAGGA
    CTTTTAAAGGGGTACCAACTGACGTACATTCCTGATAAGCTGAAGAACGTTGGTCAT
    CAATGCGGATGCATCTTCTATGTGCCAGCGGCTTACACCTCCAAAATCGATCCCACT
    ACAGGCTTTGTCAATATCTTCAAATTCAAGGATTTGACCGTTGACGCGAAGCGCGAG
    TTTATCAAGAAGTTTGATAGCATTCGCTACGACAGCGAAAAAAATTTATTTTGTTTT
    ACTTTCGACTACAATAACTTTATTACTCAGAACACTGTCATGTCAAAGAGTTCGTGG
    AGTGTCTACACGTACGGAGTACGTATTAAGCGCCGTTTCGTCAACGGACGCTTCTCA
    AACGAAAGCGACACGATCGACATCACCAAAGACATGGAAAAAACTCTTGAGATGAC
    GGATATCAATTGGCGCGACGGCCATGACCTGCGTCAGGATATCATTGATTACGAGAT
    CGTTCAGCACATCTTCGAAATCTTCCGCCTTACCGTCCAGATGCGCAACAGTTTAAG
    CGAGCTTGAAGACCGCGACTACGATCGTTTGATTAGCCCCGTTCTGAACGAGAATAA
    TATTTTCTACGACAGCGCAAAGGCCGGTGATGCTTTGCCAAAGGACGCAGACGCGA
    ATGGAGCCTACTGCATCGCCCTGAAGGGCTTATATGAGATTAAGCAAATTACCGAA
    AATTGGAAGGAAGATGGTAAGTTCTCCCGTGATAAGCTTAAAATTAGCAATAAGGA
    TTGGTTCGACTTCATCCAGAACAAACGTTACCTGAAACGTCCGGCAGCGACCAAAA
    AAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAA
    AAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 72
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAA
    CAAACAATTTCCAAAACTTCATCGGTATCTCTTCGTTGCAGAAGACTCTGCGTAATG
    CTTTGATCCCGACGGAGACAACCCAACAATTTATCGTCAAAAACGGTATTATTAAGG
    AGGACGAGTTACGTGGAGAAAATCGTCAAATCCTTAAGGACATCATGGACGATTAT
    TATCGCGGGTTTATTTCTGAAACCCTGAGCAGTATCGATGATATCGACTGGACCTCA
    CTTTTTGAGAAAATGGAGATCCAGTTGAAGAACGGTGATAACAAAGACACTCTGAT
    CAAAGAGCAAACTGAATACCGCAAGGCAATTCACAAAAAGTTCGCCAACGACGACC
    GTTTCAAGAATATGTTCTCAGCTAAGTTAATCAGCGACATTTTGCCAGAGTTCGTTAT
    CCACAACAATAATTATAGTGCTTCAGAGAAGGAGGAAAAAACCCAAGTGATTAAAC
    TTTTTTCGCGCTTTGCAACCTCATTCAAGGACTACTTCAAGAATCGCGCGAATTGCTT
    CAGTGCGGACGACATTTCTTCTTCAAGTTGCCATCGTATCGTTAACGATAACGCGGA
    AATTTTCTTCTCTAATGCTTTGGTGTATCGCCGCATTGTAAAATCGCTTAGTAACGAT
    GACATTAATAAGATCTCAGGTGATATGAAAGATTCATTGAAGGAAATGAGCTTGGA
    AGAGATTTACAGTTACGAAAAATATGGAGAATTTATTACTCAGGAAGGCATCTCATT
    CTATAACGATATCTGCGGGAAGGTAAATTCGTTTATGAACTTATATTGCCAGAAAAA
    TAAAGAGAATAAAAATTTGTATAAGCTTCAGAAGTTGCACAAACAGATCCTGTGCA
    TTGCAGACACCTCGTATGAGGTTCCGTATAAATTTGAGTCCGATGAAGAAGTGTATC
    AGTCTGTGAATGGTTTCTTAGATAATATCTCTTCCAAGCATATTGTCGAACGCCTGCG
    CAAAATTGGTGATAACTATAACGGATACAATCTGGATAAAATTTACATCGTTTCTAA
    ATTTTACGAGTCAGTCTCGCAGAAGACCTACCGCGACTGGGAAACAATTAACACGG
    CATTGGAGATTCACTACAATAATATCTTGCCTGGTAACGGTAAGTCTAAGGCAGATA
    AGGTAAAAAAAGCTGTGAAAAACGACCTTCAGAAAAGCATCACGGAGATTAATGAG
    CTGGTGAGTAATTACAAATTATGTTCAGACGATAATATTAAAGCTGAAACGTATATC
    CATGAAATCTCGCATATCTTGAACAACTTCGAGGCCCAAGAACTTAAATATAACCCC
    GAAATCCATTTAGTCGAGTCTGAATTGAAAGCGTCGGAATTAAAAAACGTCTTAGAC
    GTCATTATGAACGCGTTTCACTGGTGTTCAGTTTTCATGACCGAAGAGCTGGTCGAC
    AAAGACAACAACTTCTATGCGGAATTGGAGGAAATCTATGATGAAATCTACCCTGTT
    ATTTCACTGTATAACCTTGTGCGCAACTATGTCACTCAGAAGCCGTATTCGACCAAA
    AAAATTAAATTGAATTTCGGTATCCCTACTCTTGCAGACGGATGGAGTAAAAGCAAG
    GAATACAGTAATAACGCCATTATTCTTATGCGCGACAATTTATACTACCTGGGCATC
    TTTAACGCAAAGAATAAGCCGGATAAGAAGATTATTGAGGGTAACACCAGTGAGAA
    CAAGGGCGACTATAAGAAGATGATCTATAACTTATTGCCAGGTCCAAATAAAATGA
    TCCCAAAAGTATTCTTATCATCAAAGACGGGAGTTGAAACCTATAAGCCTAGTGCCT
    ATATTCTTGAGGGATATAAACAGAACAAGCACATTAAGTCGTCTAAGGATTTTGACA
    TTACGTTCTGCCATGACTTAATCGACTATTTTAAAAACTGTATTGCGATTCACCCCGA
    ATGGAAGAATTTTGGATTCGATTTTTCGGATACCTCGACCTATGAAGATATTTCGGG
    ATTTTATCGTGAAGTGGAGTTGCAAGGCTATAAAATCGATTGGACCTATATCTCAGA
    AAAAGACATTGATTTATTACAGGAAAAGGGACAACTGTACCTTTTCCAAATTTATAA
    CAAGGACTTTTCTAAAAAGTCCACAGGAAATGATAACCTTCACACCATGTACCTGAA
    GAACCTTTTCTCAGAGGAAAACCTGAAGGACATTGTCCTTAAGTTAAATGGAGAAG
    CGGAGATCTTTTTCCGTAAATCTAGTATCAAGAATCCGATTATCCATAAAAAAGGTT
    CGATTTTGGTAAATCGCACCTATGAAGCGGAAGAGAAAGATCAATTTGGTAACATC
    CAGATCGTGCGCAAGAATATCCCGGAGAACATTTACCAAGAGCTGTATAAGTACTTC
    AATGATAAGTCTGATAAGGAACTGTCAGATGAAGCTGCGAAATTGAAGAACGTGGT
    TGGGCATCATGAAGCCGCTACCAATATCGTCAAGGATTACCGTTATACCTATGACAA
    ATATTTCTTACACATGCCGATTACGATCAATTTTAAGGCAAACAAGACAGGATTCAT
    CAACGACCGTATCTTGCAGTATATTGCCAAAGAGAAGGATCTGCATGTGATCGGTAT
    TGACCGCGGGGAGCGCAATTTAATCTATGTATCGGTGATCGATACTTGTGGTAACAT
    CGTAGAACAAAAGAGCTTTAACATCGTGAATGGTTACGACTATCAGATCAAGCTGA
    AACAACAGGAAGGAGCCCGCCAGATCGCTCGCAAGGAATGGAAAGAAATCGGGAA
    AATTAAGGAAATCAAGGAAGGCTACCTTTCATTGGTCATTCACGAAATTTCGAAAAT
    GGTAATTAAGTACAACGCGATCATCGCCATGGAGGACCTTTCGTACGGATTTAAGAA
    GGGTCGTTTCAAAGTTGAGCGCCAGGTATACCAAAAATTCGAGACTATGCTTATCAA
    CAAACTTAACTACTTGGTCTTTAAGGACATTTCTATTACCGAAAACGGCGGCTTACT
    TAAAGGCTATCAATTGACATATATTCCCGACAAACTGAAGAATGTTGGACATCAATG
    CGGGTGTATTTTCTATGTGCCGGCAGCTTACACTAGTAAGATCGACCCTACAACCGG
    GTTCGTAAACATTTTTAAATTCAAAGACTTAACAGTCGATGCGAAGCGTGAATTTAT
    TAAGAAGTTTGATAGTATCCGCTATGACAGTGAAAAGAACTTGTTTTGCTTTACGTT
    CGACTACAATAACTTTATTACACAGAACACGGTCATGTCTAAATCATCATGGTCGGT
    TTACACATATGGGGTGCGCATCAAGCGTCGCTTTGTAAATGGCCGTTTTAGTAATGA
    GAGCGACACAATCGACATCACAAAGGATATGGAGAAAACTCTTGAGATGACAGACA
    TCAATTGGCGTGACGGTCATGACTTACGCCAAGATATCATCGACTACGAAATCGTAC
    AGCATATTTTTGAGATTTTTCGTCTTACTGTGCAAATGCGTAATTCTTTATCCGAACT
    GGAAGATCGTGATTACGACCGCTTGATTAGTCCCGTCTTAAATGAGAACAATATTTT
    CTATGATTCTGCGAAAGCCGGAGATGCACTGCCCAAAGACGCTGATGCCAATGGCG
    CGTATTGCATTGCATTAAAAGGATTATATGAGATTAAACAGATTACCGAAAATTGGA
    AAGAGGACGGTAAATTCTCACGCGATAAATTGAAGATTTCTAACAAGGACTGGTTC
    GACTTTATCCAAAATAAACGTTATCTTAAACGTCCGGCAGCGACCAAAAAAGCCGG
    CCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAA
    CGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 73
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGTA
    CCAACAACTTTCAGAATTTCATTGGCATTAGCTCGCTTCAAAAAACTTTACGCAATG
    CTCTTATTCCGACTGAGACGACACAACAGTTTATCGTTAAGAATGGCATCATCAAAG
    AAGATGAATTACGCGGAGAAAACCGCCAGATCCTGAAAGACATTATGGACGATTAT
    TACCGTGGGTTCATCTCCGAGACGTTGTCATCGATCGATGACATCGACTGGACGTCA
    CTTTTTGAAAAAATGGAGATCCAGTTAAAGAACGGTGACAATAAGGATACATTGAT
    CAAAGAACAGACCGAGTACCGTAAAGCGATTCATAAAAAGTTTGCGAACGATGATC
    GCTTCAAGAATATGTTTTCTGCGAAATTAATTTCCGACATTTTACCTGAATTTGTTAT
    TCATAATAACAACTACTCGGCGTCTGAGAAAGAGGAGAAAACCCAAGTGATTAAAC
    TTTTTTCACGTTTCGCAACGTCGTTCAAAGACTATTTTAAAAATCGTGCTAATTGCTT
    TAGCGCGGATGACATCAGCTCTAGTTCATGTCATCGCATTGTCAACGATAATGCTGA
    GATCTTTTTCAGTAATGCGTTAGTGTACCGTCGTATTGTGAAGTCCTTATCTAATGAT
    GATATCAATAAGATCAGCGGGGATATGAAGGACTCACTTAAGGAGATGAGCTTGGA
    GGAAATCTATTCCTATGAGAAGTATGGTGAGTTTATTACGCAAGAAGGAATTAGCTT
    TTACAACGATATCTGTGGAAAGGTGAATTCGTTTATGAATTTGTATTGCCAGAAAAA
    TAAGGAGAACAAGAACCTTTATAAATTGCAAAAGTTACACAAGCAAATCCTGTGCA
    TTGCAGATACTTCCTACGAGGTGCCTTACAAGTTTGAATCCGACGAAGAGGTCTACC
    AATCTGTAAACGGTTTCTTAGATAATATTAGTTCCAAGCATATTGTGGAGCGCCTTC
    GTAAAATTGGCGATAATTACAACGGTTACAATTTAGACAAAATTTACATTGTCAGTA
    AATTCTACGAGTCCGTATCTCAAAAGACGTATCGTGATTGGGAGACTATCAATACGG
    CCCTGGAGATCCACTACAACAATATCTTGCCCGGTAATGGTAAGTCGAAGGCCGATA
    AAGTTAAGAAAGCGGTGAAAAATGACTTACAGAAGTCAATCACCGAAATTAACGAA
    TTGGTGTCCAATTATAAATTGTGTTCAGATGATAATATCAAAGCCGAGACCTACATT
    CATGAGATTTCCCATATCTTAAATAATTTCGAGGCGCAAGAGCTTAAGTATAACCCA
    GAAATCCACCTGGTAGAATCTGAGTTGAAGGCGTCAGAGTTAAAAAATGTTTTAGAT
    GTCATTATGAACGCGTTTCACTGGTGCTCCGTATTTATGACGGAGGAATTAGTAGAT
    AAAGACAACAATTTCTATGCCGAACTTGAGGAAATCTATGATGAGATCTATCCCGTC
    ATTAGCCTGTATAACTTGGTCCGCAACTATGTTACCCAAAAACCGTACAGTACCAAG
    AAGATTAAGCTGAATTTCGGCATTCCTACACTGGCTGATGGTTGGAGTAAATCGAAG
    GAATATTCGAATAACGCGATTATCTTGATGCGCGACAACTTATACTATTTGGGGATC
    TTTAACGCCAAAAACAAACCGGATAAGAAGATTATTGAGGGAAACACATCAGAGAA
    CAAAGGCGACTACAAAAAAATGATTTACAACTTGTTACCGGGGCCTAACAAAATGA
    TCCCGAAGGTGTTCTTATCCAGTAAAACAGGCGTTGAGACCTACAAACCTTCCGCAT
    ACATCCTGGAAGGGTATAAGCAGAACAAGCACATTAAGTCCAGCAAGGATTTCGAT
    ATTACCTTCTGTCATGATTTAATTGACTATTTCAAGAACTGTATTGCAATCCACCCCG
    AGTGGAAGAACTTCGGATTCGACTTCTCAGATACGAGCACATATGAGGACATCTCG
    GGGTTCTATCGTGAAGTAGAACTGCAGGGATATAAAATTGATTGGACATATATTTCC
    GAAAAAGACATCGACCTTTTACAAGAGAAGGGTCAACTTTACTTGTTCCAAATTTAC
    AATAAAGACTTCTCAAAAAAAAGCACGGGTAACGATAATTTACACACTATGTATTTA
    AAGAACCTTTTCTCGGAAGAGAATTTAAAGGATATCGTATTGAAGTTGAATGGAGA
    AGCGGAGATCTTCTTCCGTAAGTCCAGTATTAAAAACCCTATTATTCACAAGAAGGG
    ATCGATTTTAGTTAACCGCACATACGAGGCCGAAGAGAAGGACCAATTTGGGAACA
    TTCAAATTGTCCGCAAAAACATCCCTGAGAACATTTATCAAGAGCTTTATAAGTACT
    TTAACGATAAGTCCGATAAGGAATTGTCAGATGAGGCGGCAAAGTTGAAGAATGTC
    GTGGGGCATCATGAAGCTGCCACCAACATTGTGAAGGACTACCGCTACACTTACGA
    CAAATACTTCCTGCACATGCCCATTACGATCAATTTTAAGGCCAATAAGACAGGCTT
    TATTAACGACCGTATTCTTCAATATATCGCTAAGGAGAAGGACCTTCATGTGATTGG
    GATCGACCGCGGAGAACGTAATTTAATTTATGTGTCCGTCATCGATACGTGTGGAAA
    TATCGTGGAACAGAAATCATTCAATATCGTGAATGGCTATGATTACCAGATCAAATT
    AAAACAGCAGGAGGGCGCTCGCCAAATTGCGCGTAAGGAATGGAAAGAGATCGGA
    AAAATCAAAGAAATCAAAGAAGGATATTTGTCATTGGTGATCCATGAGATTTCAAA
    AATGGTAATTAAATATAATGCAATTATCGCAATGGAAGACCTGTCCTATGGTTTTAA
    GAAGGGTCGTTTCAAGGTAGAACGCCAAGTGTATCAAAAGTTCGAGACGATGCTGA
    TCAATAAGCTGAATTATCTTGTGTTTAAGGACATTAGCATCACGGAAAATGGAGGGC
    TGTTGAAAGGCTATCAACTGACGTATATCCCTGACAAGCTGAAAAATGTTGGCCATC
    AGTGCGGGTGCATTTTCTACGTCCCCGCGGCGTATACAAGCAAGATCGATCCTACTA
    CGGGATTCGTAAATATTTTTAAATTCAAAGACTTAACCGTGGACGCCAAGCGCGAAT
    TCATTAAGAAGTTTGATAGCATTCGCTACGATTCAGAAAAAAATCTTTTCTGTTTTAC
    GTTCGATTACAACAATTTTATCACCCAGAACACAGTGATGAGCAAGTCATCCTGGTC
    TGTCTATACCTACGGTGTCCGTATCAAACGCCGCTTCGTCAACGGACGCTTCTCTAAT
    GAATCTGATACCATTGACATCACCAAGGACATGGAAAAGACACTTGAGATGACAGA
    TATTAACTGGCGTGACGGACATGACCTGCGTCAGGACATCATCGATTATGAGATTGT
    TCAGCATATCTTCGAGATCTTCCGCCTGACAGTACAAATGCGCAATTCACTGTCAGA
    ACTTGAAGACCGCGACTATGACCGCCTGATCTCTCCAGTATTAAATGAGAACAATAT
    CTTTTATGACAGTGCTAAGGCCGGCGATGCCCTTCCGAAAGATGCTGATGCTAACGG
    AGCTTATTGTATTGCATTAAAGGGTCTTTATGAGATCAAGCAAATTACCGAGAATTG
    GAAGGAGGATGGCAAATTCTCGCGCGACAAACTGAAAATCAGTAACAAGGACTGGT
    TCGATTTTATTCAGAATAAACGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 74
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGAA
    CGAACAACTTCCAGAACTTCATCGGCATCAGTTCTTTACAAAAAACCCTGCGTAACG
    CCCTTATTCCGACTGAGACAACACAACAGTTCATCGTTAAAAACGGAATTATCAAAG
    AGGACGAGTTGCGCGGCGAGAATCGCCAAATTTTGAAAGATATTATGGACGACTAT
    TATCGTGGTTTTATTTCAGAAACACTGAGTTCGATTGACGATATCGATTGGACGAGC
    CTGTTTGAGAAAATGGAAATCCAGTTGAAAAATGGCGATAATAAAGACACTTTAAT
    CAAAGAACAAACCGAGTATCGTAAAGCGATCCATAAAAAGTTCGCTAATGACGATC
    GTTTTAAGAATATGTTCAGTGCGAAACTGATTTCAGACATTTTGCCCGAGTTCGTGA
    TCCATAATAACAACTATTCCGCCTCGGAAAAGGAAGAAAAAACCCAGGTGATTAAG
    CTGTTCAGTCGCTTCGCAACATCTTTCAAGGATTATTTCAAGAATCGCGCGAATTGCT
    TCAGTGCGGACGATATTTCTAGTTCAAGCTGCCATCGTATCGTTAATGATAACGCGG
    AGATTTTTTTTAGCAATGCTCTGGTGTACCGCCGCATTGTTAAGTCACTGTCCAACGA
    TGATATTAACAAGATCTCAGGAGACATGAAAGACTCGCTTAAAGAGATGAGTCTGG
    AAGAGATCTATTCTTATGAGAAGTATGGCGAGTTTATTACCCAAGAAGGAATCTCAT
    TCTACAATGATATTTGTGGAAAGGTGAACAGCTTTATGAATCTTTACTGCCAAAAAA
    ACAAGGAGAATAAGAATCTTTACAAACTTCAGAAGTTACATAAACAGATTTTGTGTA
    TTGCGGATACGTCTTATGAAGTCCCCTACAAATTTGAATCGGATGAAGAGGTATACC
    AAAGTGTGAACGGATTCTTGGACAATATTTCTTCTAAACATATTGTTGAACGCTTAC
    GTAAGATCGGGGATAACTACAATGGCTACAATCTTGACAAAATCTACATTGTTAGCA
    AATTCTACGAGAGTGTCAGCCAAAAGACGTACCGCGATTGGGAAACAATTAATACT
    GCGCTTGAGATTCACTATAATAACATTTTACCAGGCAACGGCAAGTCCAAGGCGGAT
    AAAGTTAAAAAAGCTGTTAAAAACGATTTGCAAAAATCTATCACAGAAATTAACGA
    GTTAGTTAGTAACTACAAACTGTGCTCCGATGACAACATTAAGGCTGAGACGTATAT
    CCATGAGATCTCTCACATCTTAAACAATTTTGAAGCTCAAGAACTTAAGTACAATCC
    GGAAATCCACCTGGTGGAATCCGAGCTGAAGGCTAGCGAACTGAAGAACGTATTGG
    ACGTGATCATGAACGCGTTCCACTGGTGTTCTGTCTTTATGACGGAAGAGCTTGTCG
    ACAAAGATAATAACTTTTACGCGGAACTTGAGGAAATTTACGATGAGATTTACCCAG
    TTATTTCATTGTATAACCTTGTCCGTAATTACGTGACCCAAAAGCCTTATAGTACGAA
    AAAAATCAAATTAAATTTTGGAATCCCAACACTGGCTGACGGTTGGAGCAAATCTA
    AGGAGTATTCTAATAACGCAATCATCTTAATGCGTGACAACCTGTATTATTTGGGTA
    TCTTCAATGCCAAAAATAAGCCTGACAAAAAGATTATCGAAGGAAATACTTCGGAG
    AATAAGGGGGATTACAAAAAAATGATTTACAATTTGCTGCCCGGGCCGAACAAGAT
    GATCCCCAAAGTGTTCTTATCCTCGAAGACTGGTGTAGAAACATACAAGCCAAGCGC
    ATACATTCTGGAGGGTTACAAGCAAAACAAACACATCAAATCTTCAAAAGACTTTG
    ACATTACATTTTGCCATGATCTTATTGACTACTTCAAAAACTGCATTGCTATTCACCC
    CGAGTGGAAGAACTTTGGGTTTGACTTCAGCGACACGTCTACGTATGAGGACATCTC
    CGGGTTCTACCGTGAAGTTGAGTTACAAGGGTATAAGATTGACTGGACGTATATTTC
    AGAGAAAGATATCGATCTTTTGCAGGAAAAGGGCCAGTTATATTTATTCCAGATTTA
    CAACAAGGACTTTAGTAAGAAGTCAACAGGAAATGACAACTTGCATACGATGTATT
    TGAAAAATCTTTTTTCTGAGGAAAATCTTAAGGACATCGTACTGAAATTGAATGGCG
    AGGCTGAAATCTTCTTCCGTAAATCCTCCATTAAGAATCCCATTATCCACAAAAAGG
    GGTCTATCCTGGTGAATCGTACCTACGAGGCAGAGGAGAAGGATCAATTCGGAAAT
    ATTCAGATTGTTCGTAAGAACATCCCCGAGAACATTTATCAAGAATTGTATAAGTAC
    TTTAATGACAAATCTGACAAAGAGTTATCCGACGAAGCTGCGAAACTGAAAAACGT
    TGTTGGTCACCACGAGGCCGCCACTAATATCGTAAAAGACTACCGTTATACCTATGA
    CAAGTACTTTTTGCACATGCCGATCACTATCAACTTCAAGGCGAATAAGACGGGCTT
    CATTAACGATCGTATCCTGCAATACATCGCCAAGGAGAAGGACCTTCACGTCATTGG
    GATTGACCGTGGTGAGCGTAACCTGATTTATGTAAGCGTCATTGATACCTGCGGTAA
    TATCGTCGAACAGAAAAGTTTCAACATTGTAAATGGATATGACTATCAGATCAAACT
    TAAGCAGCAGGAGGGTGCACGCCAGATTGCCCGCAAGGAATGGAAGGAGATTGGG
    AAGATTAAGGAAATTAAAGAAGGTTACTTATCACTGGTTATTCACGAGATCAGTAA
    AATGGTAATCAAATATAACGCGATCATTGCCATGGAGGATCTGAGCTATGGCTTTAA
    AAAGGGCCGTTTCAAAGTCGAGCGCCAGGTATATCAAAAGTTTGAAACAATGCTGA
    TTAACAAATTAAACTATCTGGTTTTCAAAGATATTTCGATCACTGAAAATGGCGGGC
    TGTTGAAGGGATACCAACTTACATACATCCCTGACAAACTGAAAAATGTCGGTCACC
    AATGTGGATGTATCTTTTATGTACCAGCAGCGTATACGAGCAAAATCGATCCAACTA
    CGGGTTTTGTGAACATCTTTAAGTTCAAGGATTTGACAGTAGATGCCAAACGCGAGT
    TCATTAAAAAATTTGATTCAATTCGCTACGATTCAGAGAAAAATCTTTTTTGTTTCAC
    GTTCGATTACAATAATTTCATTACGCAGAACACAGTAATGTCAAAGTCAAGCTGGTC
    GGTCTACACGTATGGAGTCCGTATTAAACGTCGTTTTGTAAACGGCCGTTTCTCAAA
    TGAATCAGATACAATTGATATTACGAAGGATATGGAGAAGACATTAGAGATGACTG
    ACATTAACTGGCGCGACGGACATGATCTTCGTCAGGACATTATTGATTATGAGATTG
    TACAGCATATCTTTGAGATCTTCCGCCTGACCGTTCAGATGCGCAATTCGTTGTCCGA
    GTTAGAAGACCGCGATTACGACCGTTTAATCAGTCCCGTCTTAAACGAAAATAACAT
    CTTCTACGATTCAGCCAAGGCAGGCGATGCCTTGCCAAAGGATGCTGACGCAAATG
    GCGCATACTGTATTGCGTTGAAAGGCCTTTATGAAATCAAGCAAATTACCGAAAACT
    GGAAAGAAGACGGAAAATTCTCCCGTGATAAGTTGAAAATCTCTAATAAGGATTGG
    TTCGATTTCATCCAAAATAAACGCTATTTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 75
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAA
    CTAATAATTTCCAAAATTTTATAGGCATCTCTTCTTTACAGAAGACTCTTCGTAACGC
    CCTAATCCCGACTGAGACCACACAACAATTCATAGTGAAAAATGGGATCATTAAAG
    AAGACGAGCTGCGTGGGGAGAACAGGCAGATCCTAAAAGACATAATGGACGATTAT
    TATAGAGGGTTCATCTCAGAGACATTATCTAGCATCGACGACATTGACTGGACCTCC
    CTGTTTGAAAAAATGGAAATCCAGCTGAAGAATGGTGACAATAAAGACACATTAAT
    AAAAGAACAAACAGAGTACAGGAAAGCCATCCACAAGAAGTTCGCAAACGATGAC
    AGATTCAAAAATATGTTCAGTGCGAAGCTAATATCCGACATCTTACCAGAGTTTGTA
    ATACACAATAACAATTACAGCGCGAGCGAAAAGGAAGAGAAAACGCAAGTAATTA
    AGCTTTTTAGTAGGTTCGCTACCTCTTTCAAAGATTACTTCAAAAATCGTGCTAACTG
    CTTCTCAGCCGACGACATATCTTCAAGTTCCTGTCACCGTATCGTGAATGATAACGC
    TGAGATATTCTTCTCAAACGCCCTTGTATACCGTAGGATCGTAAAGTCCTTATCTAAC
    GATGATATAAACAAGATCAGTGGAGACATGAAAGACAGCCTTAAAGAGATGTCTCT
    AGAAGAAATTTACTCCTATGAAAAGTATGGGGAGTTTATAACACAGGAGGGGATCA
    GCTTCTACAACGACATCTGCGGAAAGGTGAACAGTTTCATGAATCTTTACTGCCAGA
    AGAATAAAGAGAACAAAAATCTTTATAAGCTTCAAAAGTTGCACAAACAAATACTG
    TGCATTGCCGATACATCATATGAGGTCCCCTATAAGTTCGAATCTGATGAGGAAGTT
    TATCAATCTGTTAACGGCTTTCTAGACAATATCAGCTCAAAACACATCGTAGAAAGA
    CTGAGGAAAATAGGTGATAATTATAATGGATACAACTTGGATAAAATATATATAGT
    CTCTAAATTTTACGAGTCAGTATCCCAGAAAACGTATAGGGATTGGGAGACCATCAA
    CACGGCGTTAGAGATTCATTACAATAACATCTTACCGGGAAACGGAAAAAGTAAGG
    CGGACAAAGTAAAGAAAGCCGTTAAAAATGACTTACAAAAGAGTATAACAGAAAT
    AAACGAACTAGTAAGCAACTACAAGCTTTGTTCCGATGATAATATCAAGGCCGAGA
    CATATATCCATGAGATCTCCCACATTCTAAACAATTTCGAAGCGCAAGAACTTAAAT
    ATAATCCCGAAATCCACCTGGTGGAAAGTGAACTAAAGGCTAGTGAGTTAAAGAAC
    GTTCTTGATGTTATCATGAACGCCTTCCATTGGTGCTCTGTTTTTATGACCGAGGAGT
    TGGTTGATAAAGATAATAATTTCTACGCTGAATTAGAGGAGATATACGACGAAATCT
    ACCCAGTGATTTCACTATACAACTTGGTCAGGAACTATGTTACACAAAAGCCGTACA
    GCACTAAGAAAATTAAGCTAAATTTCGGTATCCCCACGTTAGCCGACGGGTGGAGC
    AAGTCCAAAGAATATTCCAACAATGCGATTATTTTAATGCGTGACAATCTTTATTAC
    CTTGGCATCTTCAATGCCAAAAACAAACCTGACAAAAAGATTATAGAAGGTAATAC
    GTCCGAGAACAAAGGCGATTACAAGAAGATGATTTATAACCTACTGCCCGGACCAA
    ACAAAATGATCCCCAAAGTTTTTCTTAGTTCTAAAACCGGCGTAGAGACGTATAAAC
    CTTCTGCCTATATCTTAGAGGGATATAAGCAGAACAAACATATCAAATCTTCCAAGG
    ACTTTGATATTACATTCTGCCACGATTTAATTGACTACTTCAAAAATTGCATAGCGAT
    ACATCCGGAGTGGAAGAACTTTGGCTTCGACTTCAGTGATACATCCACCTATGAGGA
    TATATCAGGCTTCTATCGTGAGGTCGAATTGCAAGGGTACAAAATCGATTGGACGTA
    TATATCCGAGAAAGACATAGACCTTCTTCAAGAAAAGGGGCAGTTATATTTATTCCA
    AATATACAACAAGGACTTCAGTAAGAAGTCAACAGGTAATGACAACTTACACACCA
    TGTACTTGAAAAATTTATTTTCTGAAGAAAACCTAAAGGACATTGTACTAAAACTGA
    ACGGGGAGGCAGAAATTTTTTTTAGAAAGAGCAGCATAAAAAACCCAATAATTCAT
    AAGAAAGGAAGCATTTTAGTTAATAGGACGTACGAGGCAGAGGAAAAGGACCAGTT
    TGGCAATATCCAGATCGTAAGGAAAAATATTCCTGAAAACATATATCAGGAACTAT
    ATAAATACTTTAACGACAAATCCGACAAAGAATTATCCGACGAGGCTGCAAAGCTG
    AAGAACGTCGTAGGGCACCATGAGGCAGCGACTAATATTGTGAAAGACTATAGGTA
    TACATACGACAAATACTTTCTGCACATGCCCATCACGATTAACTTCAAGGCGAACAA
    GACGGGATTCATTAACGACCGTATATTACAATATATTGCTAAGGAGAAAGATCTGCA
    TGTAATAGGTATCGACAGAGGCGAACGTAATTTAATCTACGTGTCCGTCATCGACAC
    GTGCGGGAACATCGTAGAGCAAAAGAGTTTTAATATAGTAAATGGCTATGATTACC
    AAATTAAGCTAAAGCAGCAAGAAGGAGCAAGACAGATAGCTAGGAAAGAATGGAA
    GGAGATAGGAAAAATAAAGGAGATCAAGGAGGGGTATCTTAGCCTAGTAATTCATG
    AAATATCTAAGATGGTTATCAAATACAACGCTATCATAGCGATGGAAGACTTATCTT
    ATGGTTTCAAGAAAGGAAGGTTCAAAGTAGAGCGTCAAGTTTATCAAAAGTTCGAA
    ACGATGTTGATTAATAAACTAAACTATTTGGTATTTAAAGATATATCTATCACCGAG
    AATGGTGGTCTACTAAAGGGTTACCAGCTTACATACATACCGGACAAACTTAAAAA
    CGTCGGACATCAGTGTGGATGCATTTTCTACGTTCCAGCTGCATATACCAGCAAGAT
    CGACCCAACGACTGGGTTCGTAAATATTTTTAAATTCAAGGATTTGACTGTCGACGC
    CAAAAGAGAGTTCATAAAAAAGTTCGATTCAATTAGGTACGACAGCGAAAAGAATT
    TGTTCTGCTTTACTTTTGACTATAACAATTTCATTACTCAGAACACTGTAATGTCTAA
    GTCCTCTTGGTCAGTCTATACTTATGGCGTTCGTATCAAACGTAGATTTGTTAACGGT
    AGATTCTCAAATGAAAGTGATACAATAGATATCACGAAAGATATGGAGAAAACATT
    AGAAATGACAGACATAAACTGGAGAGACGGACATGACTTGAGACAGGACATTATTG
    ACTACGAGATCGTGCAGCACATCTTTGAGATCTTTCGTTTGACCGTACAAATGCGTA
    ACAGTTTATCTGAGCTTGAGGACAGGGACTACGATAGATTGATATCACCTGTATTAA
    ATGAGAATAACATCTTCTATGATTCCGCAAAAGCAGGCGACGCTCTACCCAAAGAC
    GCTGATGCGAACGGTGCTTATTGCATAGCTTTAAAGGGTTTGTATGAGATCAAACAG
    ATAACAGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGACAAGCTTAAAATATC
    AAATAAGGACTGGTTCGATTTTATACAGAATAAGCGTTATTAAAACGTCCGGCAGCG
    ACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCA
    GCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCG
    GGCTAA
    SEQ ID NO: 76
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAA
    CTAATAACTTCCAGAATTTCATTGGTATCTCCTCTTTACAAAAAACTCTAAGAAACG
    CCCTAATTCCGACTGAAACTACACAGCAATTCATCGTCAAAAACGGGATCATTAAGG
    AGGATGAGTTGAGGGGTGAAAATCGTCAAATTCTTAAAGACATCATGGACGACTAC
    TACAGGGGGTTCATCAGCGAGACGTTATCTAGTATAGACGATATAGACTGGACTTCA
    CTGTTCGAGAAGATGGAAATCCAATTAAAAAATGGGGACAATAAAGATACACTTAT
    AAAGGAACAGACAGAGTATAGAAAGGCAATACACAAAAAGTTTGCCAACGACGAT
    CGTTTCAAGAACATGTTTAGTGCTAAATTGATTTCAGATATTCTGCCGGAATTTGTTA
    TTCACAACAATAATTATAGCGCCAGTGAGAAAGAAGAAAAAACGCAGGTTATCAAA
    CTGTTCAGTCGTTTCGCTACATCTTTTAAGGATTACTTTAAAAACCGTGCAAATTGTT
    TTTCAGCCGACGATATTAGTAGCAGCTCTTGTCACCGTATTGTTAATGATAATGCGG
    AGATTTTCTTTTCAAACGCATTGGTCTACAGGAGGATAGTCAAGTCCCTTTCAAATG
    ACGACATTAATAAGATCTCAGGTGACATGAAAGATTCCTTAAAGGAAATGTCCCTG
    GAAGAGATCTATTCCTATGAAAAGTACGGTGAGTTCATTACTCAAGAGGGTATAAG
    CTTTTACAATGACATATGTGGTAAGGTTAATAGCTTTATGAACCTGTATTGCCAGAA
    GAACAAAGAAAATAAGAATCTGTATAAGTTGCAAAAGCTACACAAACAAATTTTGT
    GCATTGCCGATACATCATACGAGGTGCCATACAAATTCGAGAGCGATGAGGAGGTT
    TATCAGAGCGTGAATGGATTCCTGGACAATATTAGTAGTAAGCATATCGTGGAAAG
    GCTTAGAAAGATAGGTGACAATTACAATGGCTACAATCTGGATAAAATCTACATCGT
    CTCAAAATTCTATGAAAGTGTATCCCAGAAGACGTACCGTGATTGGGAAACTATCAA
    CACCGCTCTGGAGATACATTACAACAATATACTTCCCGGAAACGGCAAGTCAAAAG
    CCGACAAAGTCAAAAAAGCGGTCAAGAACGATTTACAAAAGTCTATCACTGAAATT
    AATGAATTAGTTAGTAATTACAAACTGTGTAGTGATGATAATATTAAGGCAGAGACT
    TACATACACGAAATTTCACACATTTTAAACAACTTCGAGGCACAGGAACTTAAATAT
    AATCCTGAAATTCACCTGGTTGAAAGTGAATTGAAAGCCAGCGAGCTAAAGAACGT
    TTTGGACGTAATCATGAACGCATTCCACTGGTGCTCTGTCTTTATGACAGAGGAACT
    AGTGGATAAGGACAATAATTTTTATGCGGAGCTGGAGGAAATATACGATGAGATAT
    ATCCCGTAATATCATTATATAATCTGGTAAGAAACTATGTGACTCAAAAGCCGTATA
    GCACCAAGAAAATTAAACTTAATTTCGGCATACCCACTTTAGCGGACGGCTGGTCAA
    AATCCAAAGAGTATAGTAATAATGCCATCATCCTGATGCGTGACAACCTGTACTATT
    TAGGTATATTTAACGCCAAAAATAAACCCGACAAAAAGATTATAGAGGGCAACACC
    TCAGAGAACAAAGGTGATTATAAGAAGATGATTTACAACCTTTTACCCGGTCCTAAT
    AAGATGATTCCCAAAGTCTTTCTATCTAGCAAAACTGGTGTTGAAACATACAAACCC
    TCAGCTTATATTTTAGAAGGGTATAAGCAGAATAAGCATATTAAAAGCTCCAAAGAT
    TTCGATATTACCTTTTGCCATGACTTGATAGACTATTTCAAAAATTGTATTGCCATTC
    ACCCTGAATGGAAAAACTTCGGATTTGACTTCTCTGACACATCCACCTACGAAGACA
    TTTCAGGTTTTTACAGGGAAGTCGAGCTACAGGGTTATAAAATTGATTGGACATACA
    TCAGCGAGAAAGATATTGACCTACTTCAAGAAAAAGGGCAGCTATACCTGTTCCAG
    ATATACAATAAAGACTTCAGTAAAAAAAGCACCGGGAACGATAATCTTCACACAAT
    GTACTTAAAAAATTTATTTAGTGAAGAGAATCTGAAGGATATAGTGCTGAAGTTAAA
    CGGGGAGGCAGAGATATTTTTTAGAAAATCTAGTATTAAGAATCCGATCATCCACAA
    GAAGGGTTCTATCCTTGTTAATAGGACTTATGAGGCAGAAGAAAAAGACCAATTCG
    GCAACATACAAATTGTCCGTAAAAATATCCCTGAGAACATTTATCAGGAACTATACA
    AGTACTTCAATGATAAAAGCGACAAGGAGCTGAGCGACGAGGCTGCTAAGTTAAAG
    AATGTGGTGGGCCACCATGAGGCAGCAACGAATATTGTGAAGGACTATCGTTATAC
    CTACGATAAATACTTTCTTCATATGCCGATCACCATTAATTTCAAGGCAAACAAAAC
    TGGCTTCATTAACGATCGTATCTTACAATATATCGCAAAAGAGAAAGACCTTCACGT
    TATCGGGATCGATAGAGGCGAGCGTAACCTAATTTATGTTTCTGTGATAGACACCTG
    TGGGAACATAGTCGAACAGAAATCATTTAATATTGTTAACGGCTACGATTATCAGAT
    AAAGTTGAAGCAACAAGAGGGTGCACGTCAAATAGCAAGGAAAGAATGGAAAGAA
    ATAGGCAAGATTAAAGAAATAAAAGAAGGTTATTTATCCCTTGTAATACACGAAAT
    TAGCAAAATGGTGATTAAATATAATGCGATCATTGCCATGGAGGATCTTTCTTACGG
    CTTCAAAAAGGGGAGATTCAAAGTCGAGAGGCAGGTGTATCAGAAGTTTGAGACCA
    TGCTAATCAATAAACTAAATTATCTAGTATTCAAAGACATAAGCATCACCGAAAATG
    GCGGCTTGTTGAAGGGTTATCAATTGACCTACATCCCAGATAAACTAAAAAACGTAG
    GGCATCAATGCGGATGTATATTTTACGTTCCAGCCGCATACACTTCCAAAATCGATC
    CAACTACGGGTTTTGTGAACATCTTCAAATTCAAAGACTTGACTGTCGATGCTAAGA
    GGGAGTTTATCAAGAAATTTGACTCCATTAGATACGACAGTGAGAAGAATCTGTTCT
    GTTTTACCTTTGATTATAACAACTTTATAACTCAAAACACAGTCATGAGTAAGTCAT
    CTTGGTCAGTGTATACGTATGGTGTGAGGATTAAAAGGAGGTTTGTTAACGGGAGAT
    TTTCCAATGAAAGTGATACAATAGATATAACCAAGGACATGGAAAAGACTCTTGAA
    ATGACCGACATTAACTGGAGAGATGGCCACGACTTACGTCAAGATATAATCGATTA
    CGAGATAGTGCAACATATCTTTGAGATATTTAGGCTTACTGTCCAAATGCGTAACTC
    ATTAAGTGAGTTGGAGGACAGGGATTACGATAGGCTAATAAGTCCTGTTCTTAACGA
    AAACAATATATTCTACGATTCAGCAAAGGCGGGAGACGCCCTGCCCAAGGACGCGG
    ATGCTAACGGCGCATACTGTATTGCCCTGAAAGGCTTGTACGAGATAAAACAGATC
    ACGGAGAACTGGAAAGAAGATGGAAAATTCAGTCGTGACAAGTTAAAAATTAGTAA
    CAAAGACTGGTTCGACTTTATTCAGAACAAGAGATATCTGAAACGTCCGGCAGCGA
    CCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAG
    CCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 77
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAA
    CCAATAACTTTCAAAACTTTATAGGCATCTCCAGTCTACAGAAGACACTACGTAACG
    CTTTGATACCAACTGAGACCACGCAGCAGTTTATCGTCAAGAACGGTATTATAAAGG
    AAGACGAGCTAAGGGGGGAAAACCGTCAGATCTTAAAGGACATCATGGATGACTAC
    TACAGAGGCTTCATAAGTGAGACTTTGTCTAGTATAGACGACATCGACTGGACCAGT
    TTATTTGAGAAGATGGAAATTCAGTTAAAGAACGGGGACAATAAAGACACACTAAT
    TAAAGAGCAGACCGAATACAGAAAAGCTATACACAAAAAGTTTGCCAACGATGATA
    GATTCAAAAATATGTTTTCAGCAAAATTGATTTCCGACATATTGCCAGAATTCGTAA
    TCCATAATAACAATTATTCTGCAAGTGAGAAGGAAGAGAAGACCCAAGTAATCAAG
    CTGTTTTCCCGTTTTGCTACGAGTTTCAAAGATTATTTCAAGAATAGGGCTAATTGTT
    TCTCCGCGGACGACATAAGTAGCAGTTCCTGTCACAGGATTGTGAACGATAATGCTG
    AGATATTTTTTTCCAATGCCCTAGTGTATAGGAGAATAGTTAAAAGCTTAAGCAACG
    ACGATATCAATAAAATTTCAGGGGACATGAAGGACAGCTTAAAGGAAATGAGTTTG
    GAGGAGATTTACAGTTATGAAAAATACGGAGAGTTTATAACTCAGGAAGGCATCTC
    TTTCTATAATGATATCTGTGGGAAGGTAAACTCCTTCATGAATTTATATTGCCAGAA
    GAATAAGGAAAACAAAAATCTTTACAAGCTTCAAAAGTTACATAAGCAGATCTTAT
    GTATTGCCGACACGAGTTATGAAGTGCCTTATAAATTCGAGAGTGATGAGGAAGTGT
    ATCAGTCTGTTAACGGATTCCTAGATAATATAAGTTCCAAACATATAGTCGAGAGGC
    TGAGGAAGATTGGCGATAACTATAATGGATATAATCTTGACAAAATCTATATAGTCT
    CTAAATTTTATGAAAGCGTCAGCCAGAAGACATATAGAGATTGGGAAACTATAAAC
    ACAGCCCTTGAAATACATTACAATAACATCCTACCCGGCAATGGTAAGTCTAAGGCA
    GACAAAGTTAAAAAAGCAGTAAAGAATGACTTACAGAAGTCAATCACGGAGATAA
    ATGAGTTGGTCAGTAACTACAAATTATGCTCCGACGATAATATTAAGGCCGAAACAT
    ATATACACGAGATAAGTCATATATTAAACAATTTCGAAGCCCAGGAGTTAAAATAT
    AACCCTGAAATTCATCTGGTCGAAAGTGAGTTAAAGGCCAGTGAGTTAAAGAATGT
    ACTTGACGTAATTATGAATGCTTTTCATTGGTGCTCCGTGTTCATGACCGAGGAGTTA
    GTAGATAAAGACAATAACTTTTACGCCGAACTTGAAGAGATATACGACGAGATTTA
    TCCGGTAATCAGCTTGTACAACTTAGTTAGAAATTATGTAACACAGAAGCCTTACTC
    TACTAAAAAAATAAAACTGAACTTTGGTATCCCAACTCTTGCAGATGGTTGGAGTAA
    AAGCAAGGAATATAGCAACAATGCGATCATCTTGATGAGAGACAACTTGTACTATTT
    GGGAATCTTCAACGCGAAAAATAAACCCGACAAAAAAATCATCGAAGGGAATACCT
    CTGAGAATAAAGGTGACTATAAGAAAATGATTTACAATCTACTTCCTGGTCCTAATA
    AAATGATCCCGAAAGTGTTTCTTAGTTCTAAGACTGGTGTCGAGACGTACAAACCTA
    GCGCGTACATCTTAGAAGGGTACAAGCAGAATAAACACATCAAATCAAGCAAAGAC
    TTCGATATTACTTTTTGCCATGACTTGATAGACTACTTTAAAAACTGCATAGCAATCC
    ACCCGGAGTGGAAAAACTTTGGCTTTGATTTCTCTGACACCTCTACATATGAGGACA
    TATCTGGTTTTTACCGTGAGGTTGAATTGCAGGGATACAAAATTGACTGGACTTACA
    TATCTGAAAAAGATATCGATCTATTGCAGGAGAAAGGCCAGCTTTACCTTTTCCAGA
    TCTATAATAAGGACTTCTCTAAGAAGTCTACAGGGAATGATAATTTGCACACTATGT
    ACTTAAAAAATCTGTTTTCCGAGGAAAACTTGAAAGACATTGTTTTAAAGTTGAACG
    GAGAAGCTGAAATATTTTTCAGAAAGAGCTCCATAAAAAACCCGATCATTCATAAG
    AAGGGATCTATCCTGGTTAACAGAACGTACGAAGCGGAAGAAAAAGACCAATTCGG
    AAACATTCAAATTGTTAGAAAGAATATCCCTGAGAACATCTACCAGGAGTTATATAA
    GTATTTTAATGATAAGTCAGATAAGGAACTATCTGACGAAGCGGCGAAGCTTAAAA
    ATGTTGTAGGACACCATGAGGCTGCTACAAATATAGTCAAGGACTACCGTTATACCT
    ACGATAAGTACTTTCTACACATGCCCATTACCATCAATTTTAAAGCTAATAAAACGG
    GTTTTATCAACGATCGTATCCTACAATATATTGCGAAAGAGAAGGATTTGCATGTCA
    TTGGCATTGATAGAGGTGAGAGGAACCTAATATACGTATCCGTGATTGATACGTGCG
    GGAACATAGTTGAACAGAAATCATTTAATATAGTTAATGGGTACGACTATCAGATTA
    AGCTAAAGCAACAAGAAGGCGCCAGGCAAATTGCCCGTAAAGAATGGAAAGAGAT
    CGGGAAGATCAAGGAAATAAAAGAAGGATACCTTTCCCTGGTCATCCATGAAATTA
    GCAAAATGGTGATTAAGTACAATGCCATAATCGCGATGGAGGACTTAAGCTACGGG
    TTCAAAAAGGGGAGGTTTAAGGTGGAGAGGCAAGTGTACCAGAAATTTGAGACCAT
    GCTAATCAACAAACTGAACTACCTAGTTTTTAAGGACATTTCAATTACAGAGAATGG
    AGGACTTTTAAAGGGTTACCAACTAACGTATATACCAGATAAGTTGAAAAATGTCG
    GTCACCAGTGTGGCTGCATCTTTTACGTTCCCGCCGCTTATACATCTAAAATTGATCC
    AACCACAGGCTTTGTAAATATCTTTAAATTCAAAGATTTAACTGTGGATGCAAAAAG
    AGAGTTTATCAAGAAATTCGATAGCATTCGTTATGATAGCGAGAAGAACCTGTTCTG
    CTTTACTTTCGACTATAACAACTTTATAACTCAAAACACCGTGATGTCAAAAAGCTC
    ATGGTCAGTCTACACCTATGGTGTAAGGATTAAAAGGCGTTTCGTGAATGGGAGATT
    CTCCAATGAAAGTGACACGATCGACATAACAAAGGACATGGAGAAGACACTAGAG
    ATGACTGATATTAATTGGAGAGACGGACACGATCTGCGTCAAGATATAATTGATTAT
    GAGATAGTACAGCACATATTTGAGATCTTCCGTTTGACTGTCCAAATGCGTAATTCC
    CTTTCTGAGCTGGAAGATAGGGACTATGATAGATTAATATCCCCTGTACTAAATGAG
    AACAACATTTTCTATGATAGTGCAAAAGCCGGGGATGCATTGCCGAAAGACGCTGA
    CGCTAATGGGGCGTACTGTATAGCTTTAAAGGGGCTTTACGAAATAAAGCAGATAA
    CCGAAAACTGGAAGGAAGATGGCAAATTCTCAAGGGACAAACTTAAGATCTCTAAC
    AAGGATTGGTTCGATTTTATACAAAACAAACGTTATTTGAAACGTCCGGCAGCGACC
    AAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCC
    CGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGG
    CTAA
    SEQ ID NO: 78
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTA
    CAAACAACTTTCAGAATTTCATTGGGATCTCTAGCTTACAGAAGACCCTGAGGAATG
    CGTTGATTCCAACTGAAACAACCCAGCAATTCATCGTGAAAAATGGGATAATCAAA
    GAGGATGAGTTAAGGGGTGAAAACCGTCAAATATTGAAGGATATTATGGACGACTA
    CTACCGTGGATTCATCTCAGAGACGTTGAGCAGCATTGACGACATAGACTGGACTAG
    CCTTTTCGAGAAGATGGAAATTCAGTTAAAGAACGGAGATAACAAAGATACACTAA
    TCAAGGAACAGACAGAATACAGAAAAGCAATTCATAAGAAATTCGCTAATGACGAT
    CGTTTTAAAAACATGTTCTCTGCAAAATTAATTAGCGACATTCTGCCGGAATTCGTT
    ATACATAATAATAACTACAGTGCTTCTGAAAAGGAAGAGAAAACTCAGGTAATAAA
    ACTGTTCTCTCGTTTTGCCACATCCTTCAAAGACTACTTTAAAAATAGAGCGAACTG
    CTTTAGCGCCGACGATATTAGTTCTTCCTCATGCCACAGGATTGTCAACGATAATGC
    AGAGATATTCTTTTCTAACGCACTAGTCTACAGAAGGATTGTAAAGTCTTTGTCAAA
    TGATGACATAAACAAGATTAGTGGAGATATGAAAGACTCTCTAAAGGAAATGAGCC
    TTGAGGAGATATACTCTTATGAAAAGTACGGTGAGTTTATTACCCAAGAAGGCATTA
    GTTTCTATAATGACATTTGTGGAAAAGTTAACAGTTTTATGAATCTATACTGTCAAA
    AAAATAAGGAGAATAAAAATCTTTATAAGTTGCAAAAACTGCATAAGCAGATATTA
    TGTATAGCAGACACGAGCTATGAGGTACCGTACAAGTTCGAGAGCGATGAGGAAGT
    CTACCAATCTGTCAACGGATTTTTGGACAACATTTCTTCAAAACATATTGTGGAGAG
    GCTTAGGAAAATAGGCGACAATTATAATGGATATAACTTAGATAAGATATATATTGT
    TTCCAAATTCTACGAATCTGTAAGCCAGAAGACATACAGAGATTGGGAAACGATAA
    ACACAGCCCTTGAAATTCACTATAACAACATACTACCTGGAAACGGCAAATCAAAG
    GCCGACAAAGTTAAGAAGGCCGTAAAGAATGATTTACAGAAGAGCATAACGGAGAT
    CAATGAGCTGGTGTCTAACTATAAATTGTGTAGCGATGACAACATAAAAGCCGAGA
    CTTACATTCACGAAATTTCACACATACTTAACAACTTTGAAGCTCAGGAATTAAAGT
    ATAATCCCGAAATACACCTTGTGGAGTCCGAACTAAAGGCTAGTGAGCTTAAGAAC
    GTCCTAGACGTAATTATGAATGCCTTCCACTGGTGTAGTGTTTTTATGACCGAGGAA
    CTTGTTGACAAAGATAATAATTTTTATGCAGAACTAGAAGAGATATACGATGAAATA
    TACCCGGTGATCAGTTTGTACAATCTTGTCAGGAACTATGTGACACAAAAGCCCTAT
    TCAACAAAGAAAATAAAACTTAATTTCGGAATTCCTACGTTAGCTGATGGCTGGTCT
    AAATCCAAGGAATACAGCAACAACGCTATAATTCTGATGAGAGATAACTTGTACTA
    TCTAGGCATCTTCAATGCCAAAAATAAGCCTGATAAGAAGATTATAGAGGGCAACA
    CTTCAGAGAACAAGGGCGACTACAAGAAAATGATCTATAACCTATTGCCTGGCCCA
    AACAAGATGATTCCGAAGGTCTTCCTATCATCCAAGACCGGCGTTGAGACATACAA
    GCCATCAGCGTATATTTTAGAGGGGTACAAACAAAACAAGCACATAAAGTCTAGTA
    AAGACTTCGATATAACATTTTGTCATGACTTAATTGACTACTTTAAGAATTGCATCGC
    TATACACCCGGAATGGAAGAATTTCGGCTTCGACTTCTCTGATACATCTACCTACGA
    GGACATTAGCGGGTTTTACCGTGAAGTCGAATTACAAGGGTATAAGATAGATTGGA
    CGTACATCTCTGAGAAAGACATAGACTTGCTTCAGGAAAAGGGCCAGTTGTATCTAT
    TCCAAATATACAATAAGGATTTTTCCAAGAAATCTACGGGTAATGACAATCTTCACA
    CAATGTATCTTAAGAACCTTTTCTCAGAAGAGAACCTGAAGGACATTGTCTTAAAAC
    TAAATGGCGAAGCTGAGATTTTTTTCAGGAAGTCTTCAATTAAGAACCCGATAATCC
    ACAAGAAGGGGAGTATTCTTGTGAATAGAACTTACGAGGCCGAAGAAAAAGACCAA
    TTTGGTAACATCCAGATAGTCAGAAAGAACATTCCAGAGAACATCTACCAAGAGCT
    ATACAAATATTTCAACGACAAGTCCGATAAGGAACTGTCCGATGAGGCAGCCAAGT
    TGAAGAATGTCGTGGGTCATCATGAAGCTGCTACTAACATTGTCAAGGACTATCGTT
    ATACTTACGACAAGTATTTCCTACACATGCCGATAACAATTAATTTCAAGGCTAACA
    AAACAGGCTTTATCAACGATCGTATCTTGCAGTACATAGCTAAGGAAAAGGATTTGC
    ATGTGATTGGCATTGATAGAGGGGAGCGTAACTTGATATATGTGTCTGTCATAGACA
    CGTGTGGCAACATCGTCGAACAGAAATCATTCAACATAGTAAACGGCTACGATTAC
    CAAATTAAGCTGAAACAGCAAGAGGGTGCACGTCAAATTGCGCGTAAAGAGTGGAA
    AGAAATTGGTAAAATCAAGGAAATTAAAGAAGGCTACTTGTCTCTTGTTATACATGA
    AATTTCCAAGATGGTTATAAAGTATAACGCGATAATTGCTATGGAAGACTTATCATA
    CGGGTTTAAAAAGGGGAGGTTCAAGGTAGAGAGGCAGGTCTATCAAAAGTTCGAGA
    CGATGTTGATTAATAAACTAAACTATCTAGTGTTCAAAGATATCAGCATTACGGAGA
    ACGGGGGGCTACTGAAAGGATATCAACTAACGTACATTCCCGATAAGTTAAAGAAC
    GTTGGTCATCAATGTGGTTGCATCTTCTACGTGCCTGCTGCCTATACGTCCAAAATAG
    ATCCAACTACTGGATTTGTTAACATCTTTAAATTCAAAGATTTAACCGTAGACGCCA
    AAAGGGAATTTATAAAAAAATTTGACAGCATCCGTTACGATAGCGAAAAGAATCTG
    TTCTGTTTTACTTTCGACTACAATAATTTCATCACGCAAAATACGGTAATGTCTAAGT
    CAAGTTGGAGCGTCTACACGTATGGAGTCAGGATCAAGAGGCGTTTCGTAAATGGA
    AGATTCTCTAATGAGTCAGATACTATAGACATCACGAAAGATATGGAGAAAACCTT
    GGAGATGACGGATATTAACTGGCGTGATGGACACGATTTAAGACAGGACATTATTG
    ACTATGAGATTGTGCAACACATCTTCGAAATATTCCGTCTAACAGTCCAAATGAGGA
    ATAGCCTAAGTGAATTGGAGGACCGTGATTACGATAGGCTTATAAGTCCTGTCCTTA
    ACGAAAACAATATTTTCTATGATAGTGCTAAGGCGGGGGACGCACTGCCTAAAGAC
    GCAGATGCTAACGGGGCATACTGCATTGCGTTAAAGGGTCTGTACGAAATCAAGCA
    GATTACGGAAAACTGGAAAGAGGATGGCAAGTTTAGCAGAGATAAGTTGAAGATAA
    GTAACAAAGATTGGTTTGACTTTATTCAGAATAAAAGGTATTTAAAACGTCCGGCAG
    CGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGG
    CAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTC
    CGGGCTAA
    SEQ ID NO: 79
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGCA
    CTAATAATTTCCAGAATTTCATCGGCATTAGCAGCTTACAAAAGACGTTGAGGAATG
    CCTTAATACCCACAGAAACTACTCAACAATTTATAGTGAAGAATGGGATAATTAAG
    GAAGACGAGTTGAGAGGTGAAAATAGGCAAATCTTGAAAGACATTATGGATGACTA
    CTACAGGGGCTTCATTAGTGAAACGTTGTCTTCAATAGATGACATTGATTGGACTTC
    TTTGTTTGAGAAGATGGAAATACAGTTAAAGAACGGCGACAATAAGGATACACTTA
    TCAAAGAGCAAACAGAATATAGAAAAGCAATTCACAAAAAGTTTGCTAACGATGAT
    AGGTTCAAGAACATGTTTAGCGCTAAACTAATATCAGACATCCTTCCCGAGTTCGTT
    ATTCATAACAATAACTATAGTGCAAGTGAAAAAGAGGAGAAGACACAGGTGATTAA
    GCTGTTCTCCAGATTCGCGACTTCTTTCAAAGATTACTTCAAAAACAGAGCCAACTG
    TTTTTCAGCTGACGATATCTCTAGTAGTAGTTGTCACCGTATAGTGAACGATAACGC
    TGAGATCTTCTTTAGCAATGCATTAGTGTATAGAAGGATAGTTAAGTCTCTAAGCAA
    TGATGATATCAATAAAATTTCCGGAGACATGAAGGACTCCCTAAAGGAAATGTCCTT
    AGAAGAGATCTACTCATATGAGAAATACGGGGAATTTATTACGCAGGAAGGGATCT
    CCTTTTACAATGACATATGCGGGAAGGTCAACTCTTTCATGAACTTATACTGCCAAA
    AGAACAAGGAGAACAAGAATTTATATAAACTTCAGAAACTTCACAAACAAATACTG
    TGCATAGCCGATACCTCATATGAGGTTCCTTACAAATTTGAATCAGATGAAGAGGTA
    TACCAATCCGTTAACGGCTTTCTTGACAATATTAGCTCAAAGCACATCGTGGAGAGG
    TTGAGAAAGATTGGTGATAATTATAATGGCTACAATCTAGATAAGATATATATTGTT
    AGCAAGTTCTACGAGTCTGTGTCCCAAAAAACATATAGGGATTGGGAGACAATTAA
    TACTGCTCTAGAAATCCATTACAACAACATCCTTCCTGGAAATGGCAAGAGTAAGGC
    CGACAAAGTCAAGAAAGCAGTGAAAAATGATCTGCAAAAATCAATTACTGAGATAA
    ACGAGCTAGTATCTAATTACAAGCTTTGTAGCGACGATAACATTAAGGCAGAAACG
    TACATACACGAGATTAGTCACATCTTAAATAATTTTGAAGCCCAAGAACTGAAATAT
    AACCCTGAGATACACCTTGTTGAATCCGAGTTAAAGGCGTCTGAACTAAAAAACGT
    GTTAGACGTTATTATGAATGCCTTCCACTGGTGTAGCGTCTTTATGACTGAGGAGTT
    GGTTGATAAGGATAATAACTTTTACGCTGAATTGGAAGAAATTTATGACGAAATCTA
    TCCTGTTATTTCTCTATATAATTTGGTGAGAAATTACGTAACGCAAAAGCCCTATAGT
    ACGAAAAAAATAAAACTAAATTTCGGGATCCCTACCCTAGCCGACGGTTGGTCTAA
    ATCCAAGGAGTACTCAAACAATGCAATAATATTGATGAGGGACAACCTGTACTACC
    TAGGCATATTTAATGCCAAAAATAAGCCCGATAAAAAGATTATAGAAGGGAACACG
    TCAGAAAATAAAGGAGACTATAAGAAAATGATCTACAACCTTTTGCCCGGCCCCAA
    TAAAATGATCCCGAAGGTCTTCCTAAGTAGCAAGACTGGCGTAGAGACCTACAAAC
    CATCTGCATACATTTTGGAGGGGTACAAGCAAAACAAGCACATAAAGAGTAGTAAG
    GATTTTGACATTACATTCTGCCATGACTTAATTGACTACTTTAAAAATTGCATCGCAA
    TTCACCCTGAATGGAAAAATTTTGGATTTGATTTCTCTGATACTTCAACATATGAGG
    ATATTTCAGGGTTCTACAGGGAGGTCGAACTACAGGGTTACAAAATAGACTGGACG
    TATATTTCTGAGAAAGATATAGATTTGCTTCAGGAAAAGGGTCAGCTATATCTGTTC
    CAGATATATAATAAGGACTTCTCCAAAAAGAGTACCGGAAATGATAATCTGCACAC
    AATGTACTTAAAAAACTTGTTCTCTGAGGAGAATCTAAAAGACATCGTACTAAAACT
    TAACGGGGAGGCCGAAATTTTTTTTAGGAAGTCCAGCATCAAGAACCCGATTATTCA
    TAAAAAAGGTAGCATTTTGGTGAACCGTACTTATGAGGCGGAAGAAAAAGACCAAT
    TCGGTAATATTCAAATCGTTAGAAAGAACATCCCTGAGAACATTTATCAGGAACTAT
    ACAAATACTTTAACGACAAATCAGATAAGGAGCTTTCTGATGAGGCAGCTAAATTG
    AAAAATGTAGTGGGACATCACGAAGCAGCCACTAACATAGTGAAGGACTACAGATA
    CACATACGATAAGTACTTCCTGCACATGCCTATTACAATTAACTTTAAAGCAAATAA
    AACAGGGTTTATTAACGACAGAATCTTACAGTATATTGCCAAAGAAAAGGATCTGC
    ATGTGATAGGAATAGACAGAGGAGAAAGAAACCTGATATACGTCTCCGTGATTGAT
    ACATGTGGGAACATAGTAGAACAGAAGTCCTTTAACATTGTTAATGGGTACGATTAT
    CAAATTAAATTAAAACAACAAGAAGGAGCACGTCAAATAGCTAGGAAAGAATGGA
    AAGAGATAGGAAAAATTAAGGAAATTAAGGAGGGTTACCTGTCCCTTGTAATTCAT
    GAAATATCCAAAATGGTAATTAAATATAACGCGATCATCGCGATGGAAGATCTAAG
    CTACGGGTTCAAAAAAGGCAGGTTTAAGGTGGAGAGGCAAGTTTACCAAAAGTTCG
    AGACAATGTTGATTAATAAGTTAAACTACTTAGTTTTCAAAGATATCTCCATAACCG
    AGAATGGCGGGCTTTTAAAAGGGTACCAACTAACATATATCCCGGATAAATTGAAG
    AACGTTGGACACCAGTGTGGCTGCATATTTTATGTACCCGCTGCGTATACTTCTAAA
    ATTGACCCGACCACCGGGTTTGTAAACATATTCAAGTTTAAGGACCTAACAGTTGAC
    GCCAAACGTGAGTTCATCAAGAAGTTCGATAGTATAAGGTATGACTCTGAGAAGAA
    CCTTTTCTGCTTCACGTTTGACTATAATAATTTCATCACCCAAAATACAGTTATGTCA
    AAAAGCTCTTGGTCAGTATATACGTATGGCGTAAGGATTAAGCGTAGGTTCGTGAAC
    GGTAGATTTTCCAACGAGTCAGATACTATTGATATTACCAAGGATATGGAGAAGAC
    ATTAGAAATGACAGATATAAATTGGAGGGATGGGCACGATCTAAGGCAAGATATCA
    TTGATTACGAAATTGTTCAGCACATATTCGAGATATTCCGTCTTACAGTACAAATGC
    GTAACAGCTTGTCTGAGTTGGAAGATCGTGACTATGACAGGTTGATATCACCGGTCT
    TGAACGAGAACAATATATTCTACGACAGCGCTAAGGCGGGAGACGCTCTGCCTAAA
    GACGCAGATGCCAATGGGGCGTACTGCATTGCCTTAAAAGGCTTATACGAGATTAA
    ACAGATCACAGAGAACTGGAAAGAGGACGGCAAGTTTTCTAGAGATAAATTGAAAA
    TCTCAAACAAAGACTGGTTCGATTTCATCCAAAACAAAAGATACCTTAAACGTCCGG
    CAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGC
    AGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTT
    ATTCCGGGCTAA
    SEQ ID NO: 80
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAA
    CTAACAACTTCCAGAACTTTATCGGCATCTCTTCCCTCCAAAAGACACTGAGAAATG
    CACTGATCCCAACCGAAACGACTCAACAATTTATTGTTAAGAACGGCATCATAAAA
    GAAGACGAGCTTCGCGGCGAGAACCGCCAGATACTTAAGGATATTATGGACGATTA
    TTACCGAGGCTTTATCAGCGAAACTCTTAGCTCTATTGATGATATCGACTGGACCTC
    CCTCTTCGAAAAAATGGAGATACAGCTCAAGAACGGCGATAATAAAGACACCTTGA
    TAAAGGAACAGACTGAGTACAGGAAAGCGATCCACAAGAAATTCGCGAACGACGA
    CAGGTTTAAAAACATGTTCTCTGCAAAATTGATATCCGACATCTTGCCGGAATTTGT
    GATACACAACAATAACTATAGCGCTTCAGAGAAAGAAGAGAAGACCCAAGTAATCA
    AGTTGTTCAGCCGCTTCGCAACGTCTTTTAAAGATTACTTTAAGAACCGGGCCAATT
    GTTTCTCCGCGGATGATATTAGCTCATCAAGTTGCCATCGAATTGTCAATGATAATG
    CGGAGATCTTCTTCAGCAATGCGCTGGTCTACAGACGAATCGTAAAAAGTCTTTCAA
    ATGACGACATCAATAAGATTAGTGGAGATATGAAGGATTCCCTTAAGGAAATGAGT
    CTTGAAGAAATATACTCATACGAAAAGTACGGGGAATTTATTACCCAGGAGGGGAT
    CTCCTTCTATAACGACATCTGTGGAAAAGTAAACTCATTCATGAACCTGTACTGTCA
    GAAAAACAAAGAAAACAAAAATCTGTATAAACTCCAAAAATTGCACAAGCAAATAT
    TGTGTATAGCGGACACATCATACGAGGTTCCATATAAGTTCGAAAGTGATGAAGAA
    GTCTACCAATCAGTGAATGGGTTTCTGGACAACATTAGTTCCAAGCACATAGTTGAA
    CGACTGCGAAAGATTGGTGACAATTACAACGGCTATAATTTGGACAAGATTTATATA
    GTTAGCAAATTTTATGAATCCGTATCACAAAAGACTTATAGAGACTGGGAAACAATC
    AACACGGCACTTGAGATCCATTATAACAATATTCTTCCAGGGAACGGCAAAAGCAA
    GGCTGATAAGGTAAAAAAGGCCGTTAAGAATGATCTTCAAAAATCCATAACGGAGA
    TCAACGAACTTGTAAGTAACTACAAATTGTGCTCTGACGACAATATAAAGGCTGAA
    ACGTATATTCACGAGATTAGCCATATCCTGAATAACTTTGAGGCCCAAGAACTCAAG
    TATAACCCGGAAATACATTTGGTAGAAAGCGAGCTTAAAGCGAGTGAGCTGAAAAA
    CGTCCTCGATGTGATCATGAATGCTTTCCACTGGTGTAGTGTCTTTATGACTGAGGA
    GTTGGTTGATAAAGACAATAATTTCTACGCTGAACTGGAAGAAATTTACGACGAAAT
    CTATCCAGTGATCTCCCTCTATAACCTCGTTCGAAACTACGTGACGCAGAAACCTTA
    TTCTACAAAGAAAATTAAGTTGAACTTCGGCATTCCTACACTTGCTGACGGATGGTC
    CAAATCCAAAGAGTACTCAAACAACGCAATCATCCTCATGCGGGATAACCTTTATTA
    TTTGGGCATTTTCAACGCCAAAAACAAACCTGATAAAAAGATAATTGAAGGCAATA
    CGAGTGAGAACAAGGGCGACTACAAAAAAATGATATATAACTTGTTGCCAGGCCCC
    AACAAGATGATTCCTAAAGTTTTTCTGTCTTCTAAGACTGGAGTTGAAACTTACAAA
    CCCTCCGCCTACATTCTTGAAGGGTATAAACAGAATAAGCACATAAAGTCCTCAAAG
    GATTTCGACATTACGTTTTGCCATGACCTCATCGACTATTTCAAGAACTGTATCGCCA
    TACATCCGGAGTGGAAGAATTTTGGATTTGATTTCTCCGACACATCTACCTATGAAG
    ACATAAGCGGTTTCTACCGGGAGGTCGAGCTTCAGGGCTATAAGATAGATTGGACA
    TACATTAGTGAAAAAGATATCGATCTTCTGCAAGAAAAGGGACAACTTTACCTTTTT
    CAGATTTATAATAAAGACTTTTCAAAAAAGTCCACAGGGAACGATAATCTGCACAC
    CATGTATCTCAAGAATCTGTTTAGTGAAGAAAACCTTAAAGACATAGTTTTGAAGCT
    TAACGGAGAGGCTGAGATTTTTTTTAGAAAGTCCTCAATTAAAAACCCTATAATACA
    CAAGAAAGGCTCTATTCTTGTTAACAGGACATATGAAGCCGAGGAGAAAGATCAGT
    TTGGCAATATCCAGATTGTTCGCAAGAATATCCCGGAAAATATATATCAGGAGCTGT
    ATAAATACTTTAACGACAAGAGCGACAAGGAGCTGAGTGACGAGGCCGCGAAGCTT
    AAGAATGTAGTAGGTCACCACGAAGCAGCCACCAATATCGTCAAAGACTATAGGTA
    CACGTACGACAAGTACTTTTTGCACATGCCTATAACTATAAACTTCAAAGCTAATAA
    AACTGGGTTTATTAATGACAGGATTCTCCAATACATCGCTAAAGAGAAGGATCTGCA
    TGTAATTGGCATAGACAGAGGTGAGAGAAACTTGATATATGTCAGCGTAATAGACA
    CATGTGGCAATATCGTGGAACAGAAGTCTTTTAACATCGTCAATGGTTACGACTACC
    AAATTAAGTTGAAACAGCAGGAAGGCGCACGACAGATCGCACGAAAGGAATGGAA
    AGAGATAGGCAAAATAAAAGAAATAAAGGAGGGCTATCTCAGTCTCGTTATACACG
    AAATTTCAAAAATGGTTATTAAGTACAATGCAATCATAGCGATGGAGGATCTCAGTT
    ATGGGTTCAAAAAGGGTCGGTTTAAAGTTGAGCGCCAAGTGTACCAAAAGTTCGAG
    ACAATGCTGATTAACAAGCTGAACTACCTCGTCTTCAAAGATATAAGTATTACGGAG
    AACGGTGGCCTTCTTAAAGGCTATCAACTTACTTACATCCCGGACAAGCTCAAAAAC
    GTAGGGCACCAATGCGGGTGTATTTTCTATGTGCCTGCGGCATATACGTCAAAGATT
    GACCCAACCACAGGATTCGTAAACATATTCAAGTTTAAGGACCTCACCGTTGATGCG
    AAAAGGGAGTTCATTAAAAAATTTGATTCTATTCGATATGATAGTGAGAAAAATCTC
    TTTTGTTTCACATTTGACTATAATAATTTTATTACTCAGAATACTGTCATGAGCAAGT
    CATCTTGGTCAGTGTACACATACGGGGTGCGGATCAAACGCAGGTTCGTCAATGGTC
    GCTTCTCAAACGAATCAGACACCATTGACATCACAAAGGACATGGAAAAAACCCTT
    GAGATGACCGACATTAATTGGCGCGATGGTCATGATCTGCGGCAAGACATCATAGA
    CTACGAAATCGTCCAACACATCTTTGAGATCTTTCGCTTGACGGTCCAAATGCGGAA
    CTCCCTGTCCGAGCTCGAGGATAGAGATTATGATCGGCTGATATCTCCCGTGCTTAA
    TGAAAATAACATCTTCTACGACTCCGCCAAGGCGGGTGATGCCCTGCCGAAGGATG
    CGGATGCTAATGGCGCTTATTGCATTGCTCTTAAGGGGCTCTATGAGATAAAGCAGA
    TCACGGAAAACTGGAAAGAAGACGGTAAGTTTAGTAGAGACAAGCTGAAGATCTCA
    AATAAAGACTGGTTTGATTTCATACAGAACAAGCGGTACCTGAAACGTCCGGCAGC
    GACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGC
    AGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCC
    GGGCTAA
    SEQ ID NO: 81
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGCA
    CTAACAATTTTCAGAATTTCATCGGCATTTCAAGTCTGCAAAAAACTCTGAGGAATG
    CTTTGATCCCTACTGAAACCACTCAGCAATTTATAGTCAAGAACGGTATAATTAAAG
    AAGATGAACTCAGGGGTGAAAATAGACAAATACTCAAGGACATTATGGATGACTAT
    TATAGAGGCTTCATCTCAGAGACTCTCTCATCAATAGATGATATCGATTGGACTAGC
    CTTTTCGAGAAAATGGAGATTCAGTTGAAAAATGGTGATAACAAAGATACGTTGAT
    AAAGGAACAGACCGAGTACAGGAAAGCCATTCATAAGAAATTTGCTAATGACGATA
    GATTTAAGAATATGTTTAGTGCAAAACTGATTAGTGACATTCTGCCGGAGTTCGTTA
    TCCATAATAATAACTACTCTGCATCCGAAAAGGAGGAAAAGACGCAAGTTATTAAA
    CTGTTCAGCCGCTTCGCCACAAGCTTCAAGGACTACTTCAAAAATAGAGCCAACTGC
    TTTTCTGCCGACGATATATCATCATCTTCATGCCATCGGATCGTTAACGATAACGCCG
    AGATATTCTTCAGCAACGCCCTTGTATATCGAAGAATAGTCAAAAGTCTGAGTAATG
    ATGATATTAATAAAATTAGCGGTGATATGAAAGACTCCCTGAAGGAAATGTCACTG
    GAGGAAATTTATAGTTACGAAAAGTACGGCGAATTCATTACTCAAGAAGGCATATC
    CTTCTATAACGACATTTGCGGAAAGGTCAACTCATTCATGAACCTTTATTGCCAGAA
    GAATAAGGAGAATAAAAATCTTTACAAATTGCAAAAACTTCACAAACAAATTCTTT
    GCATCGCGGATACGTCCTACGAAGTTCCTTACAAATTTGAATCCGATGAGGAAGTGT
    ATCAGAGTGTCAATGGATTTTTGGATAATATCTCTTCAAAACATATTGTGGAGAGAT
    TGCGCAAAATAGGTGATAACTACAATGGCTACAACCTGGACAAGATTTATATTGTTA
    GCAAGTTCTATGAAAGTGTCAGTCAAAAGACCTACAGAGATTGGGAGACAATCAAC
    ACGGCGCTCGAAATACACTACAATAACATCCTCCCCGGCAATGGGAAGAGTAAAGC
    CGATAAGGTTAAAAAAGCTGTTAAGAACGACCTCCAGAAATCCATCACGGAAATAA
    ACGAGCTGGTTTCCAACTATAAGCTGTGTAGCGATGATAATATTAAGGCTGAGACAT
    ATATACATGAGATCAGCCACATTCTCAACAATTTCGAGGCACAGGAACTCAAATAC
    AATCCCGAGATTCACTTGGTGGAAAGTGAGTTGAAGGCGTCAGAGCTTAAGAATGT
    ACTTGACGTAATAATGAATGCTTTTCATTGGTGCTCCGTGTTCATGACTGAGGAACT
    CGTGGATAAGGATAATAACTTTTATGCGGAGTTGGAAGAGATATACGATGAAATAT
    ACCCGGTTATCTCACTGTATAATCTGGTCAGAAATTACGTGACCCAAAAGCCTTATA
    GTACAAAAAAAATAAAGTTGAACTTCGGTATTCCGACATTGGCAGATGGTTGGTCCA
    AAAGCAAAGAATACTCTAATAACGCCATTATATTGATGCGAGACAATTTGTATTACC
    TTGGGATCTTTAACGCGAAAAACAAACCGGATAAGAAGATCATCGAAGGTAATACA
    TCTGAGAATAAGGGGGATTACAAGAAGATGATTTATAATCTGTTGCCGGGGCCAAA
    CAAGATGATTCCGAAGGTCTTTCTGTCATCTAAGACAGGAGTAGAGACCTACAAACC
    TTCTGCGTACATTTTGGAAGGCTACAAACAGAACAAGCATATAAAATCTAGCAAGG
    ACTTTGATATCACGTTTTGTCATGATCTGATAGATTATTTCAAAAACTGCATCGCTAT
    ACATCCTGAGTGGAAGAATTTCGGCTTTGACTTTTCTGACACCAGCACATACGAAGA
    CATCTCAGGTTTCTACCGGGAAGTCGAGCTCCAGGGGTACAAGATTGACTGGACATA
    TATAAGTGAAAAAGACATCGACCTCCTCCAAGAGAAGGGCCAACTTTACCTGTTCCA
    GATCTATAACAAAGACTTTTCTAAAAAGTCCACGGGTAACGACAACTTGCACACTAT
    GTATCTGAAAAACTTGTTCTCTGAAGAGAACCTCAAGGACATCGTCCTGAAGCTTAA
    CGGGGAGGCGGAGATCTTCTTTAGAAAGTCCTCTATCAAAAATCCCATTATCCATAA
    AAAGGGCTCTATACTCGTTAATAGGACATATGAAGCGGAGGAAAAAGATCAATTTG
    GGAACATCCAGATCGTCCGGAAAAATATACCTGAGAATATCTATCAAGAGCTGTAC
    AAGTATTTTAATGATAAGTCAGACAAAGAGCTCAGTGATGAGGCGGCAAAGCTCAA
    GAACGTGGTGGGGCATCATGAAGCTGCGACGAACATTGTCAAAGATTATAGATACA
    CTTACGATAAATACTTCCTCCACATGCCGATAACGATTAACTTCAAAGCCAATAAGA
    CGGGGTTTATAAATGATCGGATCCTTCAGTACATTGCGAAAGAGAAAGACCTCCATG
    TGATCGGAATTGACCGAGGAGAAAGGAATCTGATTTACGTGTCCGTGATTGATACTT
    GCGGGAATATAGTCGAGCAAAAGAGTTTCAACATAGTCAACGGGTATGACTATCAG
    ATAAAGCTCAAACAGCAGGAAGGTGCGAGGCAAATTGCGCGCAAAGAGTGGAAGG
    AGATAGGCAAGATTAAAGAAATCAAGGAAGGTTATCTCAGCTTGGTGATCCATGAA
    ATATCTAAGATGGTTATAAAGTACAATGCCATAATAGCCATGGAGGATCTTTCCTAC
    GGGTTTAAGAAGGGCCGATTTAAAGTGGAGCGACAAGTTTACCAGAAGTTCGAAAC
    CATGTTGATTAACAAACTTAACTATTTGGTGTTCAAGGATATAAGTATAACCGAAAA
    CGGCGGTTTGCTTAAGGGTTATCAGCTCACGTATATTCCTGATAAACTTAAAAACGT
    TGGACACCAGTGTGGATGTATCTTCTACGTGCCAGCCGCTTACACTAGTAAGATAGA
    TCCTACCACGGGGTTTGTGAATATTTTTAAGTTTAAAGACTTGACAGTCGACGCCAA
    AAGGGAATTTATAAAAAAGTTTGATTCTATCCGCTACGATAGTGAAAAAAATCTCTT
    TTGCTTTACTTTCGACTATAACAACTTCATTACGCAGAACACTGTCATGAGTAAGTCC
    AGCTGGAGCGTCTACACATATGGCGTCCGAATTAAACGACGATTTGTAAACGGGCG
    GTTTTCAAACGAATCTGACACGATAGACATTACCAAGGATATGGAGAAGACACTTG
    AGATGACCGACATAAACTGGCGGGACGGTCACGATCTTCGGCAGGACATAATTGAT
    TACGAAATCGTCCAGCATATATTCGAAATATTTCGACTTACAGTGCAAATGCGGAAC
    AGTCTCTCTGAACTGGAAGATCGCGATTATGACCGGTTGATTTCTCCGGTCCTCAAT
    GAAAATAACATATTTTATGATAGTGCTAAGGCAGGTGATGCGTTGCCAAAGGATGC
    AGACGCTAATGGTGCCTATTGTATCGCGCTCAAGGGATTGTACGAGATAAAGCAAA
    TTACGGAGAACTGGAAGGAGGATGGTAAGTTTAGCCGAGACAAGTTGAAGATTAGC
    AATAAAGACTGGTTTGATTTTATCCAAAACAAGAGGTACCTGAAACGTCCGGCAGC
    GACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGC
    AGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCC
    GGGCTAA
    SEQ ID NO: 82
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGAA
    CTAATAACTTTCAAAATTTCATAGGTATTTCAAGCTTGCAGAAGACCCTGAGGAATG
    CCCTGATTCCAACCGAGACAACGCAGCAGTTCATAGTCAAAAATGGCATTATTAAG
    GAAGATGAGCTGCGGGGGGAAAACCGACAGATACTCAAGGATATTATGGACGACTA
    TTACCGGGGATTTATCTCAGAAACGCTGAGCAGTATTGATGACATCGATTGGACCAG
    TCTTTTCGAGAAAATGGAAATTCAACTTAAGAATGGTGACAATAAAGACACTCTCAT
    AAAGGAGCAAACTGAATACCGAAAAGCCATACACAAAAAGTTTGCCAACGATGACC
    GCTTTAAAAACATGTTTTCAGCTAAGCTCATTAGCGACATTCTCCCCGAGTTTGTGAT
    TCATAACAATAACTATAGCGCATCCGAGAAGGAGGAAAAAACCCAAGTTATCAAAT
    TGTTCAGTAGATTCGCTACGAGCTTTAAAGATTACTTTAAAAACCGGGCTAACTGCT
    TCAGTGCAGACGATATCAGCTCCTCATCCTGTCATCGCATCGTCAATGATAATGCTG
    AGATCTTCTTTTCTAATGCACTGGTTTACCGCAGGATAGTTAAGTCTCTTAGTAACGA
    CGACATCAACAAGATATCAGGAGATATGAAGGATTCCCTTAAAGAAATGAGTCTCG
    AGGAGATATATTCTTATGAAAAATACGGCGAATTTATTACCCAAGAGGGCATTAGTT
    TCTATAATGACATATGCGGAAAAGTTAATAGTTTTATGAATCTCTATTGTCAGAAGA
    ATAAGGAGAATAAGAACCTCTACAAATTGCAGAAGTTGCACAAGCAAATTCTGTGT
    ATCGCGGACACCTCTTACGAGGTCCCATATAAGTTCGAGAGTGATGAAGAAGTATA
    CCAGAGCGTTAATGGGTTCCTGGACAACATCTCAAGTAAACACATAGTCGAAAGGC
    TCCGAAAGATCGGTGATAACTATAACGGATATAATTTGGATAAAATTTATATAGTTA
    GCAAATTTTACGAGAGCGTCAGTCAGAAGACCTACCGGGACTGGGAGACCATAAAC
    ACAGCGCTGGAAATACATTATAACAACATACTGCCTGGGAACGGTAAGTCAAAGGC
    AGACAAGGTTAAAAAGGCTGTGAAGAATGACCTGCAAAAATCAATTACAGAAATAA
    ATGAGTTGGTAAGTAATTACAAACTTTGCAGCGATGATAATATAAAGGCAGAGACG
    TACATACATGAAATATCTCATATCCTCAACAATTTCGAAGCCCAAGAACTGAAGTAC
    AACCCGGAAATTCATCTTGTAGAGTCTGAGTTGAAGGCCTCCGAATTGAAAAACGTT
    CTTGACGTAATTATGAATGCCTTCCACTGGTGCTCAGTATTCATGACGGAAGAGCTC
    GTGGATAAAGACAACAATTTTTACGCTGAACTGGAAGAAATATATGACGAGATTTA
    CCCCGTAATTTCACTCTACAACTTGGTACGAAATTACGTTACCCAAAAGCCATACTC
    AACAAAAAAAATTAAACTGAACTTCGGGATACCCACCCTCGCAGATGGATGGTCAA
    AGTCCAAAGAGTACAGTAACAATGCAATTATCCTGATGCGAGACAACCTTTATTACC
    TCGGGATTTTCAACGCTAAAAATAAACCTGATAAAAAAATAATTGAGGGTAATACC
    TCTGAAAACAAGGGGGATTATAAAAAGATGATATACAATCTGCTGCCTGGCCCGAA
    CAAAATGATTCCTAAAGTCTTCTTGTCTTCCAAGACTGGAGTCGAAACCTACAAGCC
    AAGTGCTTATATACTCGAAGGGTACAAACAAAATAAGCACATAAAATCCAGCAAGG
    ATTTTGATATTACATTCTGCCACGATTTGATTGATTATTTTAAGAACTGTATAGCCAT
    CCACCCAGAATGGAAGAATTTTGGTTTTGATTTTAGCGATACCTCAACATATGAGGA
    TATCTCTGGCTTTTACCGCGAGGTAGAACTGCAAGGTTATAAGATCGATTGGACTTA
    TATTTCTGAAAAGGACATAGATCTCCTGCAAGAGAAAGGGCAACTTTATTTGTTTCA
    AATATACAACAAAGATTTTAGTAAGAAGAGTACTGGCAATGATAACCTTCACACTAT
    GTATCTGAAGAACCTTTTTTCTGAGGAGAACTTGAAGGACATAGTCCTTAAACTCAA
    TGGGGAAGCTGAAATATTCTTTCGCAAAAGCTCCATTAAAAACCCGATCATTCATAA
    AAAGGGTTCCATCTTGGTAAACCGCACATACGAGGCGGAAGAAAAAGATCAGTTCG
    GAAATATCCAGATCGTAAGGAAGAATATCCCCGAAAATATATACCAAGAGCTTTAC
    AAATATTTTAACGATAAGTCAGACAAGGAACTGTCAGACGAAGCAGCCAAGTTGAA
    GAATGTCGTAGGGCACCACGAAGCAGCTACAAACATAGTTAAAGATTATCGGTACA
    CCTACGATAAATATTTCCTGCATATGCCAATAACCATAAACTTCAAAGCCAACAAAA
    CAGGGTTCATCAATGACCGAATACTTCAGTATATAGCCAAGGAAAAAGACCTGCAT
    GTTATAGGAATAGATAGAGGTGAGCGCAACTTGATATATGTCAGCGTGATAGACAC
    CTGCGGAAATATCGTCGAGCAAAAAAGTTTCAACATTGTTAATGGCTACGATTACCA
    AATTAAATTGAAGCAGCAAGAGGGGGCTCGGCAAATCGCGCGAAAGGAATGGAAA
    GAAATCGGGAAGATTAAAGAAATTAAAGAGGGCTACCTGTCTCTTGTAATTCACGA
    AATATCTAAGATGGTCATCAAGTATAATGCCATTATTGCGATGGAAGATCTGTCCTA
    CGGATTTAAGAAAGGCAGGTTTAAAGTCGAAAGGCAGGTGTACCAGAAATTCGAGA
    CCATGCTGATTAATAAGCTCAACTATCTCGTATTTAAGGATATTTCTATAACTGAAA
    ATGGAGGGCTTCTCAAAGGATATCAACTCACATACATACCTGATAAGCTGAAGAAC
    GTAGGCCACCAGTGTGGATGCATATTCTATGTACCAGCTGCATACACAAGCAAGATC
    GATCCAACTACTGGGTTTGTCAATATCTTCAAATTTAAGGACTTGACGGTCGATGCC
    AAACGGGAGTTCATCAAAAAGTTTGATAGTATTCGATATGATAGTGAGAAGAACTT
    GTTTTGCTTCACATTTGACTACAACAATTTCATAACGCAAAATACGGTTATGTCTAA
    ATCCTCATGGAGCGTCTACACTTACGGAGTGAGGATAAAGCGGCGCTTCGTAAATG
    GCAGGTTTAGCAATGAATCCGACACGATTGACATAACCAAGGATATGGAGAAAACC
    CTCGAGATGACCGATATAAATTGGCGGGATGGACACGATCTGCGACAAGACATAAT
    CGATTATGAAATCGTGCAGCACATATTTGAGATATTCAGGCTTACGGTCCAAATGAG
    AAATTCCCTTTCCGAACTTGAAGACCGCGATTACGACCGACTGATAAGCCCCGTTCT
    GAACGAAAATAACATCTTCTACGACAGCGCTAAAGCGGGAGACGCGCTGCCGAAAG
    ATGCGGACGCAAATGGAGCCTATTGTATCGCCTTGAAAGGGTTGTACGAGATCAAA
    CAGATAACCGAGAATTGGAAGGAGGATGGGAAGTTTAGTCGAGACAAACTTAAAAT
    AAGCAACAAGGACTGGTTCGACTTTATTCAAAACAAACGATATCTCAAACGTCCGG
    CAGCGACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGC
    AGGCAGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTT
    ATTCCGGGCTAA
    SEQ ID NO: 83
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTA
    CTAACAATTTTCAAAACTTTATCGGCATCTCTTCACTTCAGAAAACTCTTCGGAACGC
    CCTTATACCGACGGAGACAACGCAGCAGTTTATAGTTAAAAACGGGATCATTAAAG
    AAGATGAACTCAGAGGGGAAAACAGGCAAATATTGAAGGACATTATGGACGATTAC
    TACCGGGGGTTTATTTCAGAGACCCTTTCATCTATTGATGACATAGATTGGACCTCCC
    TTTTCGAGAAAATGGAGATACAATTGAAAAACGGCGACAATAAAGATACACTTATC
    AAGGAACAAACTGAGTATCGCAAGGCGATTCACAAGAAGTTTGCGAATGACGATCG
    CTTTAAGAATATGTTTTCTGCGAAGCTCATAAGTGACATTCTGCCTGAATTTGTCATT
    CATAACAACAATTATTCTGCTAGCGAAAAAGAGGAAAAAACTCAAGTCATTAAGCT
    TTTTAGCAGGTTCGCTACTAGTTTTAAAGACTATTTTAAGAACCGGGCGAATTGCTTT
    AGCGCTGACGACATATCATCCTCATCCTGTCATCGCATAGTCAATGATAATGCAGAA
    ATATTCTTTTCTAATGCGCTCGTGTATCGGAGAATAGTGAAAAGCCTCTCTAACGAT
    GACATTAACAAAATAAGCGGCGATATGAAGGATAGTCTGAAGGAAATGTCCCTCGA
    AGAAATATACTCATACGAGAAGTACGGAGAATTTATCACCCAGGAAGGAATTAGTT
    TTTACAACGACATCTGTGGTAAGGTTAACTCTTTTATGAATCTGTATTGTCAAAAGA
    ATAAAGAAAATAAAAATCTTTATAAGCTCCAAAAGCTTCACAAACAAATCTTGTGC
    ATTGCGGATACGTCATACGAAGTACCTTACAAATTTGAAAGCGACGAAGAGGTGTA
    TCAGTCAGTGAATGGGTTCCTTGACAATATTTCTAGCAAACATATTGTGGAGCGACT
    TCGAAAGATCGGTGATAATTACAATGGCTATAATTTGGATAAAATTTACATAGTTAG
    TAAGTTTTATGAATCCGTCTCACAAAAGACGTACCGAGATTGGGAGACCATCAACAC
    TGCTCTGGAGATTCATTACAATAATATATTGCCTGGGAATGGGAAGTCAAAGGCCGA
    CAAGGTTAAAAAAGCCGTAAAAAACGATCTTCAAAAGTCCATTACCGAGATAAATG
    AACTTGTATCCAACTATAAGTTGTGCTCTGACGATAATATTAAAGCAGAAACGTATA
    TCCACGAAATAAGTCACATCCTGAACAACTTCGAAGCTCAAGAGCTCAAGTATAATC
    CTGAAATTCATCTCGTCGAAAGCGAGCTGAAAGCATCCGAGTTGAAGAATGTGCTTG
    ATGTGATCATGAACGCATTCCATTGGTGCAGTGTGTTCATGACCGAAGAACTTGTAG
    ACAAAGACAACAACTTCTACGCTGAATTGGAAGAGATTTACGATGAAATTTACCCC
    GTGATATCCCTCTATAATCTGGTAAGAAATTACGTCACGCAAAAACCATACAGTACC
    AAGAAAATAAAGCTCAACTTTGGTATTCCGACGTTGGCAGATGGGTGGAGTAAGAG
    CAAGGAGTATTCTAACAATGCAATCATCCTCATGCGCGACAATTTGTATTATCTGGG
    GATCTTCAACGCGAAAAATAAGCCCGACAAAAAGATAATAGAAGGCAATACGTCCG
    AGAACAAAGGGGACTATAAGAAAATGATTTATAACCTTCTTCCAGGACCCAACAAG
    ATGATCCCAAAGGTTTTCTTGAGTTCAAAAACCGGCGTAGAAACTTATAAACCGTCC
    GCCTACATTCTGGAAGGGTACAAGCAAAACAAGCACATTAAGTCATCTAAGGATTT
    CGACATTACTTTTTGTCATGATTTGATAGACTACTTCAAAAATTGTATAGCGATACAT
    CCGGAATGGAAAAATTTTGGGTTCGATTTTTCCGACACAAGTACTTATGAAGACATC
    TCAGGGTTTTATAGGGAAGTTGAACTGCAAGGTTACAAAATAGACTGGACTTATATT
    AGTGAGAAGGACATTGATTTGCTCCAGGAAAAGGGTCAATTGTATCTGTTCCAGATA
    TATAACAAGGATTTCTCTAAAAAATCTACAGGTAACGACAATCTCCACACGATGTAC
    CTCAAGAATCTCTTCAGCGAAGAGAATTTGAAGGATATCGTACTTAAGCTCAATGGA
    GAAGCGGAAATATTCTTCAGAAAGTCCAGCATTAAGAATCCTATAATTCACAAGAA
    AGGGTCAATTCTCGTAAACCGGACTTATGAGGCCGAAGAAAAAGATCAGTTTGGTA
    ACATTCAGATTGTACGGAAAAACATTCCCGAGAACATCTATCAAGAACTGTATAAAT
    ACTTTAATGATAAATCCGACAAGGAACTTTCTGACGAGGCTGCAAAATTGAAGAAC
    GTAGTGGGACACCATGAGGCCGCAACCAATATAGTAAAGGATTACAGATACACTTA
    TGATAAGTATTTCCTCCATATGCCGATCACGATTAATTTCAAGGCGAATAAAACCGG
    CTTCATTAACGATCGCATTTTGCAATATATTGCGAAGGAAAAGGATTTGCACGTGAT
    AGGTATAGACCGGGGTGAACGAAACTTGATTTACGTCTCTGTGATCGACACATGCGG
    AAATATAGTTGAACAGAAGTCCTTTAATATTGTGAATGGTTACGACTACCAGATAAA
    ATTGAAGCAACAGGAGGGCGCAAGACAGATAGCTCGCAAAGAGTGGAAGGAAATC
    GGCAAGATCAAAGAAATAAAGGAGGGTTATCTTTCCCTGGTAATTCATGAAATTAG
    CAAGATGGTTATTAAGTATAATGCTATAATAGCTATGGAGGACCTTTCCTATGGGTT
    CAAGAAAGGTCGCTTCAAAGTGGAGCGACAAGTGTATCAAAAGTTCGAGACTATGT
    TGATAAATAAATTGAATTATTTGGTTTTTAAAGACATTTCAATAACTGAGAACGGGG
    GTCTCTTGAAGGGGTACCAATTGACTTATATTCCGGACAAGTTGAAGAATGTCGGAC
    ACCAGTGTGGTTGCATTTTCTACGTGCCTGCCGCTTACACCTCAAAAATCGATCCGA
    CCACTGGTTTTGTAAATATATTTAAATTCAAAGATCTCACCGTTGATGCCAAACGGG
    AGTTTATCAAAAAATTCGATTCCATTCGCTACGACTCTGAGAAAAACCTTTTTTGTTT
    CACGTTCGATTATAACAACTTTATAACCCAAAATACTGTAATGTCCAAGTCAAGTTG
    GTCTGTCTATACTTACGGAGTAAGGATCAAGCGCCGCTTCGTTAATGGGAGATTCTC
    AAACGAGTCTGATACCATAGACATAACTAAAGACATGGAAAAAACCCTGGAAATGA
    CGGACATCAATTGGCGAGACGGGCATGATCTTCGACAGGACATAATAGATTACGAA
    ATTGTTCAACACATTTTCGAGATATTTCGACTTACGGTTCAGATGAGGAATTCCCTTT
    CCGAATTGGAAGACCGGGATTATGATCGACTTATATCTCCCGTGCTCAATGAAAACA
    ATATTTTTTATGATTCAGCGAAAGCTGGGGACGCGCTGCCAAAAGATGCCGATGCCA
    ATGGAGCATACTGTATCGCCCTGAAGGGTTTGTATGAGATTAAGCAAATTACTGAAA
    ACTGGAAGGAAGATGGCAAGTTTTCTAGAGATAAGCTTAAGATTAGCAATAAGGAC
    TGGTTTGACTTCATTCAAAATAAAAGGTATCTTAAACGTCCGGCAGCGACCAAAAAA
    GCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAA
    AGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 84
    AGCCCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGAA
    CAAATAATTTTCAAAATTTTATTGGTATCAGTTCATTGCAAAAGACTTTGAGAAATG
    CTTTGATCCCGACTGAGACCACACAGCAGTTCATCGTCAAAAATGGCATAATCAAGG
    AAGACGAACTTAGGGGTGAGAATAGACAAATATTGAAGGACATCATGGATGACTAT
    TATAGGGGGTTCATTTCCGAAACGCTCAGTAGTATTGATGACATTGACTGGACTAGT
    CTTTTCGAGAAAATGGAAATTCAGCTTAAGAACGGGGACAATAAAGACACGCTGAT
    CAAGGAGCAAACGGAATATAGGAAGGCGATCCATAAAAAATTCGCGAATGATGATC
    GGTTTAAAAACATGTTTAGTGCCAAGTTGATCAGCGACATACTGCCCGAATTCGTGA
    TCCACAACAATAATTACAGCGCCTCCGAAAAGGAGGAAAAAACTCAGGTCATTAAA
    TTGTTTAGCCGATTCGCAACGAGTTTCAAAGATTATTTTAAGAACCGGGCCAACTGT
    TTTTCAGCGGATGATATTAGCTCCAGCAGCTGCCATCGCATAGTAAATGATAACGCT
    GAAATCTTTTTTAGCAACGCACTTGTCTACCGGAGGATTGTAAAATCACTGTCAAAT
    GATGACATTAACAAAATATCTGGAGATATGAAGGACTCACTCAAAGAAATGAGCCT
    GGAAGAAATATATTCATACGAAAAATACGGGGAGTTTATTACCCAGGAAGGTATCA
    GTTTTTATAATGATATATGTGGAAAAGTTAATTCATTTATGAATCTTTACTGTCAAAA
    AAATAAGGAGAACAAGAATTTGTACAAGCTCCAAAAACTTCATAAACAGATTCTGT
    GCATCGCAGACACAAGTTATGAGGTACCGTACAAATTTGAGAGCGACGAAGAAGTT
    TATCAGAGTGTGAATGGTTTCCTGGACAATATCTCTTCTAAACACATTGTTGAGAGG
    CTTAGGAAGATCGGTGATAATTATAACGGCTATAATCTGGACAAAATTTATATTGTA
    TCAAAGTTTTATGAATCAGTCTCTCAAAAGACGTATCGGGATTGGGAAACAATTAAC
    ACGGCTCTGGAGATCCACTACAATAACATTCTGCCCGGCAACGGGAAGAGCAAAGC
    TGATAAGGTCAAGAAGGCAGTCAAGAACGACCTTCAGAAGAGCATAACAGAAATTA
    ACGAATTGGTCAGTAACTACAAACTGTGTAGTGATGACAACATAAAAGCCGAAACA
    TACATCCATGAAATAAGCCATATCCTGAATAACTTCGAAGCCCAAGAACTTAAATAC
    AATCCCGAGATTCATCTTGTCGAATCAGAACTCAAGGCGTCCGAGCTCAAAAATGTC
    CTTGACGTGATAATGAATGCCTTCCACTGGTGCAGCGTATTCATGACGGAGGAGTTG
    GTAGATAAAGACAACAACTTTTATGCCGAATTGGAAGAGATTTATGATGAGATTTAC
    CCCGTTATTTCTCTGTACAACTTGGTTCGAAACTACGTAACACAAAAACCATACTCA
    ACCAAAAAGATCAAACTCAATTTTGGCATACCTACATTGGCTGATGGTTGGTCCAAG
    TCAAAGGAATATAGCAATAATGCAATAATTCTCATGCGAGATAACTTGTATTATTTG
    GGGATCTTTAACGCTAAGAACAAACCAGATAAAAAGATAATCGAGGGGAACACAA
    GTGAGAACAAGGGTGATTACAAAAAAATGATTTACAATCTGCTTCCTGGGCCTAAC
    AAAATGATTCCGAAGGTGTTTCTTAGCTCTAAAACTGGAGTGGAGACGTATAAGCCT
    TCCGCGTACATTCTCGAAGGCTACAAGCAAAATAAGCATATCAAGTCCAGTAAGGA
    CTTCGACATCACTTTTTGCCACGATCTCATCGATTACTTTAAGAACTGTATCGCAATA
    CACCCCGAGTGGAAAAACTTTGGTTTTGATTTTTCAGACACTAGTACCTACGAGGAC
    ATTTCCGGCTTCTATCGAGAAGTCGAACTCCAGGGCTACAAAATCGATTGGACGTAC
    ATTTCTGAGAAGGACATCGACTTGCTCCAAGAGAAAGGTCAACTTTACCTCTTCCAA
    ATTTACAATAAAGACTTTTCAAAGAAGAGCACCGGTAATGACAACTTGCATACCATG
    TATCTGAAGAACCTGTTTTCTGAGGAGAACCTCAAGGATATTGTATTGAAGTTGAAT
    GGCGAAGCAGAAATATTTTTCCGAAAGTCATCTATCAAGAACCCCATTATACACAAA
    AAAGGCTCTATCCTGGTGAACCGGACTTACGAGGCAGAGGAGAAGGATCAATTCGG
    AAACATACAGATAGTCCGCAAAAACATCCCTGAGAATATCTATCAGGAACTCTATA
    AGTACTTCAATGATAAATCAGACAAGGAGCTTAGCGACGAAGCAGCTAAACTTAAA
    AACGTGGTTGGCCATCACGAGGCCGCTACCAACATAGTCAAAGACTACCGCTATACT
    TATGACAAGTACTTTTTGCACATGCCCATAACAATTAATTTCAAAGCTAACAAAACA
    GGGTTTATAAATGACAGAATCCTCCAATACATCGCCAAAGAGAAGGACCTCCATGT
    AATCGGGATTGATAGAGGCGAACGGAACTTGATTTACGTTAGTGTCATTGATACCTG
    TGGTAACATTGTCGAACAAAAGTCATTCAACATAGTCAATGGATATGATTATCAGAT
    AAAACTCAAGCAACAAGAAGGCGCGAGGCAGATTGCCAGGAAGGAATGGAAAGAA
    ATCGGGAAGATCAAGGAGATCAAGGAGGGTTACCTGTCCTTGGTGATACACGAGAT
    TTCAAAAATGGTTATAAAATACAATGCCATTATCGCGATGGAGGATTTGTCTTATGG
    ATTTAAGAAGGGGAGGTTCAAAGTCGAACGACAAGTCTATCAGAAGTTTGAAACAA
    TGCTCATTAACAAGCTCAATTACCTTGTTTTCAAGGATATAAGCATCACTGAAAACG
    GCGGACTCCTTAAGGGATATCAGCTGACTTATATCCCCGACAAGCTCAAGAACGTAG
    GGCACCAATGCGGATGCATCTTTTACGTGCCTGCAGCATATACTTCAAAAATTGATC
    CGACTACTGGCTTTGTTAACATTTTCAAGTTCAAGGATCTGACGGTAGACGCTAAGA
    GAGAATTCATAAAAAAGTTTGACAGCATCAGGTACGATAGTGAAAAGAACCTTTTTT
    GTTTTACCTTTGACTACAATAATTTTATTACGCAAAATACAGTTATGAGCAAATCAA
    GTTGGAGCGTTTACACATATGGCGTTCGGATCAAGCGCAGATTCGTCAATGGTCGCT
    TCTCAAATGAGAGCGATACAATCGATATAACGAAGGATATGGAGAAGACGCTTGAG
    ATGACAGATATCAACTGGCGGGACGGACATGACCTTAGACAAGACATAATCGATTA
    CGAAATAGTACAGCATATCTTTGAGATTTTTAGGCTTACAGTTCAGATGCGGAACTC
    TCTTTCCGAACTGGAGGACCGGGATTATGATCGGTTGATCTCCCCAGTACTGAACGA
    AAATAATATCTTTTACGATAGCGCGAAGGCTGGTGATGCACTCCCAAAAGACGCTG
    ATGCGAACGGAGCTTATTGCATAGCCCTTAAAGGGCTTTACGAGATTAAACAAATA
    ACAGAAAATTGGAAGGAAGATGGCAAATTTTCCCGCGACAAGTTGAAGATTAGTAA
    CAAAGACTGGTTCGACTTCATTCAGAATAAACGCTACCTCAAACGTCCGGCAGCGAC
    CAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGC
    CCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 85
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGTACCAA
    TAACTTCCAGAACTTCATCGGTATTTCTAGCCTGCAAAAGACCCTGCGTAACGCGCT
    GATTCCGACCGAGACTACCCAGCAATTCATCGTGAAAAACGGTATCATTAAGGAAG
    ATGAATTGCGCGGTGAGAATCGTCAGATTCTGAAAGATATCATGGATGACTACTATC
    GCGGTTTCATTAGCGAAACCCTGTCGAGCATCGATGATATCGATTGGACGAGCCTCT
    TCGAGAAAATGGAAATTCAACTGAAAAATGGTGACAACAAAGATACCCTGATTAAA
    GAACAAACGGAATACCGCAAGGCAATCCATAAAAAGTTTGCGAATGACGACCGTTT
    TAAGAATATGTTCTCGGCCAAGCTGATTTCCGACATCCTGCCAGAGTTCGTCATTCA
    CAACAACAATTACAGCGCAAGCGAGAAAGAGGAAAAGACTCAGGTCATTAAGCTGT
    TTAGCCGCTTTGCGACGTCCTTCAAAGACTACTTCAAGAATCGTGCGAATTGCTTTA
    GCGCGGATGACATCTCTAGCTCTAGCTGTCACCGTATTGTTAACGACAATGCAGAGA
    TTTTCTTCAGCAACGCCCTGGTGTATCGCCGTATTGTCAAGTCTCTGAGCAACGACG
    ACATTAACAAGATCAGCGGCGACATGAAAGACAGCCTGAAAGAAATGTCTCTGGAA
    GAAATCTACAGCTACGAGAAATATGGTGAGTTTATCACCCAAGAGGGCATTAGCTTC
    TACAATGATATCTGTGGTAAGGTTAATAGCTTTATGAATCTGTACTGCCAGAAGAAT
    AAAGAAAACAAGAACTTGTACAAGCTGCAAAAGCTGCATAAGCAAATTCTGTGCAT
    CGCCGATACTAGCTATGAAGTTCCGTACAAGTTCGAGTCTGATGAAGAGGTGTATCA
    GTCAGTCAACGGTTTTCTGGATAACATCAGCAGCAAGCACATCGTCGAGCGCCTGCG
    CAAGATTGGTGACAACTACAATGGTTATAACCTGGACAAGATCTATATCGTGTCGAA
    GTTTTACGAGAGCGTGTCCCAGAAAACGTACCGTGATTGGGAAACGATTAACACGG
    CCTTGGAAATTCACTATAACAATATCCTGCCGGGCAACGGCAAGAGCAAAGCTGAC
    AAAGTCAAAAAAGCTGTGAAAAACGATCTGCAAAAGTCCATCACCGAGATCAACGA
    ACTGGTTAGCAACTATAAGCTGTGTAGCGACGACAACATTAAAGCTGAAACGTATA
    TCCACGAAATCAGCCACATCCTGAATAACTTTGAGGCACAAGAACTGAAATACAAT
    CCTGAGATCCATCTGGTAGAGAGCGAGCTGAAGGCAAGCGAGTTGAAAAACGTTCT
    CGACGTTATCATGAATGCTTTCCACTGGTGTAGCGTGTTTATGACCGAAGAACTGGT
    TGACAAAGATAACAATTTCTATGCAGAGCTGGAAGAAATCTATGATGAAATCTACC
    CGGTCATCAGCCTGTATAACCTGGTTCGTAACTACGTGACGCAGAAGCCGTACAGCA
    CCAAAAAGATCAAGCTGAACTTCGGTATTCCGACCTTGGCGGACGGTTGGAGCAAA
    TCCAAAGAATACTCCAATAATGCGATTATTCTGATGCGTGATAATCTGTACTATCTG
    GGTATCTTCAATGCGAAGAACAAGCCAGATAAAAAGATTATTGAAGGCAACACCAG
    CGAGAATAAAGGCGACTACAAGAAAATGATCTACAACTTATTGCCGGGTCCGAACA
    AGATGATCCCGAAAGTTTTTCTGAGCAGCAAGACCGGCGTTGAAACCTATAAGCCG
    AGCGCGTACATTTTAGAGGGCTATAAACAAAACAAGCACATCAAGAGCAGCAAAGA
    TTTTGATATTACGTTCTGCCACGACCTGATCGACTATTTCAAGAATTGTATTGCGATT
    CACCCTGAGTGGAAGAACTTCGGTTTTGACTTTTCCGATACCTCCACCTATGAAGAT
    ATTAGCGGTTTTTACCGTGAAGTCGAGTTGCAGGGTTATAAGATTGATTGGACTTAC
    ATTTCCGAGAAAGACATCGACCTGTTGCAAGAGAAAGGTCAGCTGTACCTGTTTCAG
    ATCTATAACAAAGATTTCAGCAAAAAGTCGACGGGCAATGATAATCTGCACACCAT
    GTATCTGAAAAACCTGTTTAGCGAAGAGAACCTGAAAGACATTGTTCTTAAGCTGAA
    TGGTGAGGCCGAGATCTTCTTCCGTAAAAGCTCCATTAAGAACCCGATTATCCACAA
    AAAGGGCTCTATTCTGGTTAACCGCACGTACGAAGCGGAAGAGAAAGATCAATTTG
    GTAACATCCAGATCGTGCGTAAGAATATCCCGGAGAACATTTACCAAGAACTGTAT
    AAGTATTTCAATGACAAGAGCGATAAAGAATTGAGCGATGAAGCGGCAAAGCTGAA
    AAACGTCGTTGGCCACCACGAAGCCGCGACGAATATCGTGAAAGATTATCGTTACA
    CCTACGACAAGTACTTTCTGCACATGCCGATCACCATCAATTTCAAAGCGAATAAAA
    CGGGTTTTATCAATGACCGTATCCTGCAGTACATTGCGAAAGAAAAAGATTTACACG
    TGATTGGTATTGATCGCGGCGAGCGCAATCTGATTTACGTCAGCGTTATCGACACGT
    GCGGCAATATTGTGGAGCAGAAAAGCTTCAATATCGTCAATGGTTACGACTACCAG
    ATCAAACTGAAGCAACAAGAGGGCGCCCGCCAGATTGCGCGTAAAGAGTGGAAAG
    AAATCGGTAAGATTAAAGAAATCAAGGAAGGCTACCTGTCCCTGGTGATCCATGAA
    ATCAGCAAAATGGTGATCAAGTACAACGCTATCATTGCGATGGAAGATCTGAGCTA
    CGGTTTTAAAAAGGGTCGCTTCAAAGTTGAGCGTCAAGTGTATCAGAAATTTGAGAC
    TATGCTGATTAACAAGTTGAACTATCTGGTTTTTAAAGACATCAGCATTACCGAGAA
    TGGTGGCCTGCTGAAGGGTTATCAACTGACCTATATTCCTGACAAGTTGAAAAATGT
    TGGTCATCAGTGTGGTTGCATTTTCTACGTACCGGCAGCGTACACGAGCAAGATTGA
    CCCGACCACGGGTTTCGTTAACATTTTCAAGTTTAAAGATTTGACCGTGGACGCCAA
    GCGTGAGTTCATTAAAAAGTTCGACAGCATCAGATACGACTCTGAGAAGAATCTGTT
    CTGCTTTACGTTCGACTACAATAACTTCATTACCCAAAATACCGTTATGAGCAAAAG
    CTCCTGGAGCGTGTACACGTACGGCGTCCGTATCAAGCGTCGTTTTGTGAATGGTCG
    CTTTTCCAACGAATCTGACACCATTGACATTACCAAAGATATGGAAAAGACCCTTGA
    GATGACCGACATTAATTGGCGTGATGGCCATGACTTGCGCCAAGACATTATCGACTA
    CGAAATTGTTCAGCACATCTTTGAGATTTTTCGTCTGACGGTCCAGATGCGCAACTC
    GCTGAGCGAGTTGGAAGATCGTGACTATGACCGTCTGATTAGCCCGGTGCTGAATGA
    AAACAATATCTTCTATGATAGCGCAAAGGCCGGTGACGCGCTGCCGAAAGATGCGG
    ATGCTAACGGTGCATACTGCATTGCACTGAAGGGTCTGTACGAAATCAAACAGATC
    ACCGAGAATTGGAAAGAGGATGGTAAGTTTAGCCGTGATAAGCTGAAGATTAGCAA
    TAAAGACTGGTTCGACTTTATTCAAAACAAGCGCTATCTGAAACGTCCGGCAGCGAC
    CAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGC
    CCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 86
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAACAA
    ATAATTTTCAGAACTTTATTGGGATCAGTTCGCTTCAGAAAACGCTTCGTAATGCTCT
    GATTCCCACAGAAACCACTCAGCAGTTTATCGTAAAGAATGGCATTATCAAGGAGG
    ATGAATTACGCGGCGAGAACCGCCAAATCTTAAAAGATATCATGGACGACTACTAC
    CGCGGTTTCATTAGCGAAACTCTTAGTTCAATTGACGACATTGACTGGACGTCCTTG
    TTCGAAAAGATGGAGATTCAATTAAAGAACGGTGATAACAAGGATACGTTGATTAA
    AGAACAGACGGAGTACCGTAAGGCTATCCACAAAAAATTTGCAAACGACGACCGCT
    TTAAAAATATGTTTAGCGCAAAATTAATCTCCGACATCCTGCCTGAATTCGTCATCC
    ATAACAATAACTATAGCGCCTCGGAAAAAGAAGAAAAAACGCAGGTTATTAAACTT
    TTCTCGCGCTTTGCAACAAGCTTTAAGGATTACTTCAAAAATCGCGCCAATTGTTTTT
    CAGCCGACGACATTAGCTCCAGTTCCTGCCACCGTATTGTGAATGACAACGCTGAGA
    TTTTTTTTTCCAATGCGCTGGTTTATCGTCGTATTGTTAAGAGCCTTAGTAACGACGA
    CATTAATAAAATTAGCGGTGATATGAAGGATAGCTTGAAAGAAATGAGTCTGGAAG
    AGATCTATAGTTACGAGAAGTACGGCGAATTTATTACCCAGGAGGGCATTTCATTTT
    ACAATGATATCTGTGGAAAAGTCAACTCCTTTATGAACTTGTATTGCCAAAAGAATA
    AAGAAAACAAAAACCTGTACAAACTGCAAAAGTTACACAAGCAGATTTTGTGTATC
    GCAGACACGTCATACGAAGTACCGTACAAGTTTGAGTCCGATGAAGAAGTGTACCA
    AAGCGTTAATGGCTTTTTGGATAACATTTCGAGCAAACATATCGTAGAGCGTTTGCG
    TAAGATTGGTGATAATTACAACGGTTACAATTTAGACAAAATCTATATCGTCTCTAA
    GTTTTACGAAAGTGTTTCTCAGAAAACTTACCGCGATTGGGAGACGATCAACACTGC
    GCTGGAGATTCATTACAATAATATCCTTCCAGGTAACGGTAAAAGCAAAGCTGATA
    AGGTGAAAAAGGCGGTTAAAAATGACCTTCAAAAGTCTATCACAGAAATCAACGAA
    TTGGTCAGCAATTATAAGCTTTGCAGTGACGATAACATTAAGGCCGAGACTTACATC
    CATGAGATCTCTCACATTCTTAATAATTTTGAAGCGCAAGAGCTGAAATACAATCCT
    GAAATCCATCTGGTCGAAAGTGAATTAAAAGCCTCCGAATTAAAAAATGTCTTGGA
    CGTGATCATGAATGCGTTCCATTGGTGCTCAGTTTTTATGACGGAAGAGTTGGTGGA
    CAAAGACAACAATTTTTACGCCGAGCTTGAGGAAATTTACGACGAAATTTACCCCGT
    TATTTCGTTATACAACCTTGTGCGTAATTACGTTACACAAAAGCCCTATTCGACAAA
    GAAAATCAAGTTAAATTTCGGGATTCCCACATTAGCTGATGGATGGTCCAAATCCAA
    AGAATACTCGAATAACGCTATCATCCTTATGCGTGATAATTTGTACTACTTAGGCAT
    CTTCAATGCGAAGAACAAACCTGACAAGAAAATTATCGAAGGAAACACTTCGGAGA
    ACAAAGGTGATTATAAAAAGATGATCTACAACTTGCTTCCCGGGCCAAACAAAATG
    ATTCCCAAGGTATTTTTGAGTTCTAAAACCGGTGTCGAAACTTACAAACCAAGTGCT
    TATATTTTGGAAGGATACAAACAGAACAAACATATCAAGTCTTCGAAAGACTTCGAT
    ATTACGTTCTGCCACGATCTGATCGATTACTTCAAGAACTGTATTGCTATTCACCCCG
    AGTGGAAGAACTTTGGATTTGATTTCTCCGACACGTCCACTTATGAAGATATCTCTG
    GCTTCTATCGCGAGGTTGAATTACAAGGGTATAAGATTGACTGGACTTATATTTCGG
    AGAAGGATATCGATCTTTTGCAAGAAAAAGGGCAACTTTATTTATTTCAGATCTATA
    ACAAGGACTTTTCAAAAAAGAGCACTGGAAATGACAATCTGCATACCATGTACCTT
    AAGAACCTGTTCTCGGAAGAGAACCTGAAGGACATTGTACTTAAACTGAATGGAGA
    GGCAGAGATCTTCTTTCGCAAATCAAGCATTAAGAACCCAATTATTCACAAAAAGG
    GGAGTATCTTAGTAAATCGCACATATGAGGCTGAGGAAAAAGATCAGTTTGGTAAC
    ATTCAGATCGTGCGTAAGAACATTCCTGAAAATATCTATCAGGAACTTTATAAGTAT
    TTCAACGATAAAAGTGATAAAGAGCTGAGTGACGAAGCGGCTAAACTTAAGAATGT
    TGTGGGACACCATGAGGCAGCAACCAATATTGTGAAGGATTATCGCTATACGTACG
    ACAAATACTTTTTACACATGCCCATCACTATTAATTTTAAAGCTAATAAGACTGGCTT
    CATTAACGATCGCATCCTGCAGTACATTGCTAAGGAAAAGGATCTTCACGTTATCGG
    TATCGATCGCGGGGAGCGTAATCTTATCTACGTCTCTGTCATTGACACGTGTGGCAA
    TATTGTGGAGCAAAAGTCCTTCAATATTGTTAACGGCTATGACTATCAGATTAAATT
    GAAACAGCAGGAAGGTGCGCGTCAGATTGCCCGCAAGGAATGGAAGGAAATTGGC
    AAGATCAAAGAAATTAAGGAGGGCTACTTAAGCTTAGTAATTCACGAAATTAGTAA
    AATGGTTATCAAATACAACGCCATCATCGCGATGGAGGATCTTTCGTACGGGTTTAA
    GAAAGGTCGTTTTAAAGTGGAGCGTCAGGTGTACCAGAAATTTGAAACTATGCTTAT
    TAACAAACTTAACTACCTGGTTTTCAAGGATATCAGTATTACTGAAAACGGGGGGCT
    GTTAAAAGGGTATCAATTAACTTACATTCCAGACAAATTAAAGAACGTTGGACATCA
    GTGTGGCTGCATTTTTTATGTACCAGCTGCATACACTTCAAAGATCGATCCTACGACT
    GGGTTCGTGAACATTTTTAAGTTTAAAGACTTGACGGTAGATGCCAAGCGCGAATTC
    ATCAAGAAATTCGACAGCATTCGCTACGACTCTGAGAAAAATCTTTTCTGTTTCACA
    TTCGATTATAACAATTTCATTACGCAGAACACAGTAATGTCCAAGTCTTCTTGGAGT
    GTTTATACATATGGTGTCCGCATTAAGCGCCGTTTCGTCAACGGCCGCTTCAGTAAT
    GAGAGCGATACTATTGACATCACAAAAGACATGGAAAAAACACTGGAAATGACCGA
    CATCAATTGGCGTGACGGCCATGACTTACGTCAGGATATCATTGATTATGAGATCGT
    TCAACACATCTTCGAAATCTTTCGCTTGACTGTTCAAATGCGCAATTCCTTGTCGGAA
    TTGGAGGACCGTGATTATGACCGCTTAATTTCCCCCGTCTTAAATGAAAACAATATT
    TTTTATGACTCTGCAAAAGCTGGAGATGCTCTGCCGAAAGACGCCGATGCAAATGG
    GGCATATTGCATTGCTTTAAAGGGGCTTTACGAGATCAAGCAAATCACCGAAAACTG
    GAAAGAGGATGGAAAGTTTTCGCGTGATAAACTGAAGATCTCTAACAAAGACTGGT
    TCGACTTTATCCAGAACAAGCGTTATTTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 87
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGCACCA
    ATAACTTCCAAAACTTCATCGGGATCTCTAGCCTTCAGAAGACGCTTCGCAATGCTC
    TTATCCCAACTGAGACCACTCAACAATTTATTGTGAAGAATGGAATTATTAAAGAGG
    ACGAACTGCGTGGCGAGAATCGTCAGATCTTAAAGGACATTATGGATGATTATTACC
    GTGGATTCATCTCCGAAACATTATCGTCGATCGATGATATCGATTGGACTTCTCTGTT
    CGAGAAAATGGAAATTCAATTGAAAAACGGAGATAATAAAGATACGCTTATCAAAG
    AACAGACGGAATATCGTAAAGCGATTCATAAGAAATTCGCAAATGACGATCGTTTC
    AAAAATATGTTCAGTGCCAAGCTTATTTCGGACATTTTACCTGAATTTGTAATTCATA
    ATAATAACTACTCAGCAAGTGAGAAGGAGGAGAAAACCCAAGTTATTAAACTGTTC
    TCTCGTTTCGCAACGTCCTTTAAAGATTACTTTAAAAACCGCGCGAATTGCTTTAGCG
    CTGACGACATTTCCAGCTCATCCTGTCATCGCATCGTAAACGACAATGCGGAAATCT
    TCTTCAGCAACGCCCTGGTTTACCGCCGCATCGTCAAAAGCTTATCGAATGACGACA
    TCAATAAGATCTCAGGAGATATGAAGGACTCGCTTAAGGAGATGTCTCTGGAGGAA
    ATTTATAGTTACGAAAAGTATGGAGAGTTCATTACCCAGGAGGGAATCTCGTTCTAC
    AATGACATTTGCGGGAAGGTGAACTCCTTCATGAACTTATACTGCCAGAAAAACAA
    AGAGAACAAAAATCTGTATAAATTGCAGAAATTACATAAACAGATTCTTTGTATTGC
    TGACACTTCCTACGAAGTACCCTATAAATTCGAGTCAGATGAAGAAGTATACCAGTC
    CGTGAACGGATTTCTGGACAATATCTCCTCAAAACACATCGTGGAACGCTTACGTAA
    AATTGGCGATAATTATAATGGTTACAATCTTGACAAAATTTATATCGTATCTAAATTT
    TACGAGAGTGTGAGCCAAAAGACCTACCGCGACTGGGAGACCATCAACACAGCTTT
    AGAAATTCACTATAATAATATCTTACCCGGCAATGGTAAGAGCAAGGCTGACAAGG
    TAAAAAAGGCCGTCAAGAATGATTTGCAGAAATCTATTACAGAAATTAATGAGTTA
    GTCTCCAACTATAAGCTTTGTTCCGACGATAACATCAAAGCTGAGACATATATTCAT
    GAGATTAGTCACATTCTTAACAACTTCGAGGCCCAGGAACTTAAGTACAATCCTGAA
    ATTCATCTTGTCGAGTCTGAGCTGAAAGCTAGTGAATTGAAAAATGTTTTAGACGTT
    ATTATGAACGCATTCCACTGGTGCTCTGTGTTTATGACAGAAGAACTGGTCGACAAG
    GACAATAACTTCTATGCCGAACTTGAGGAAATCTACGATGAAATTTACCCTGTAATC
    TCCTTGTATAATCTTGTACGTAATTACGTCACTCAAAAACCTTACAGCACGAAAAAA
    ATTAAATTGAACTTCGGGATTCCTACACTTGCCGACGGGTGGTCTAAATCCAAGGAA
    TATAGCAACAATGCCATTATTTTAATGCGCGACAATCTTTACTATTTAGGAATTTTTA
    ACGCTAAGAACAAGCCCGATAAAAAGATTATTGAAGGAAACACGTCTGAAAATAAG
    GGCGACTACAAAAAGATGATTTATAACCTTTTGCCCGGTCCAAACAAAATGATCCCA
    AAGGTATTCCTGTCATCCAAAACAGGGGTTGAGACATATAAGCCCAGCGCATATATT
    CTGGAAGGATACAAACAGAATAAACATATCAAAAGCAGCAAAGATTTTGACATTAC
    TTTTTGCCACGATTTAATCGACTACTTCAAAAACTGTATCGCTATCCACCCTGAATGG
    AAGAATTTCGGATTTGATTTCTCAGATACAAGTACGTATGAGGATATCAGCGGTTTC
    TATCGCGAAGTTGAACTTCAAGGGTATAAAATTGACTGGACCTACATTAGTGAGAA
    GGACATCGACCTGTTACAGGAAAAAGGCCAATTGTACTTGTTTCAGATCTACAATAA
    GGATTTCTCAAAAAAATCGACCGGCAATGATAACTTGCACACCATGTACCTGAAGA
    ACCTTTTTTCGGAGGAAAACCTTAAAGACATTGTCCTGAAGTTGAATGGAGAAGCGG
    AGATTTTCTTTCGTAAGTCTTCCATTAAAAATCCAATTATTCATAAGAAGGGCAGCA
    TCCTTGTGAACCGTACGTACGAGGCGGAAGAGAAGGACCAATTCGGTAACATTCAA
    ATCGTCCGCAAGAACATCCCTGAAAATATTTATCAGGAGCTTTACAAGTATTTCAAT
    GATAAGTCCGACAAGGAATTATCAGATGAGGCTGCGAAGTTGAAAAATGTTGTTGG
    TCATCACGAGGCGGCGACGAATATTGTAAAGGATTATCGCTACACTTATGACAAGTA
    CTTTCTGCACATGCCGATCACCATTAATTTCAAGGCGAACAAAACAGGATTTATTAA
    TGACCGCATCTTACAATACATTGCCAAAGAAAAGGACTTACACGTTATTGGCATTGA
    TCGTGGAGAACGCAACTTAATCTACGTAAGCGTTATTGACACTTGCGGGAATATCGT
    AGAACAAAAGAGCTTCAACATCGTGAATGGTTACGATTACCAGATCAAGCTTAAGC
    AGCAGGAGGGAGCGCGCCAGATCGCGCGCAAGGAATGGAAGGAGATTGGTAAGAT
    CAAGGAAATCAAGGAAGGTTATCTGTCCTTGGTAATCCACGAAATTTCGAAAATGGT
    TATCAAATACAATGCTATTATTGCAATGGAGGACTTGTCCTACGGCTTTAAAAAAGG
    ACGCTTTAAGGTGGAGCGCCAGGTTTATCAAAAGTTTGAAACAATGCTGATTAACAA
    GCTGAACTATTTGGTCTTTAAAGATATCTCCATCACCGAAAATGGTGGGCTTTTGAA
    AGGCTATCAACTTACATATATCCCTGATAAGCTTAAGAATGTGGGTCATCAGTGCGG
    GTGCATTTTTTATGTTCCTGCAGCCTACACGTCCAAAATCGATCCTACAACTGGATTT
    GTTAATATCTTCAAATTTAAGGATCTTACCGTCGACGCGAAGCGCGAATTTATCAAG
    AAATTCGATAGTATTCGTTATGATTCCGAAAAAAACCTTTTCTGTTTCACCTTTGATT
    ATAATAACTTTATCACGCAAAATACTGTCATGAGCAAATCGAGTTGGTCTGTGTACA
    CTTACGGAGTACGCATCAAGCGTCGTTTTGTTAATGGGCGCTTCAGTAACGAGTCAG
    ACACGATTGATATCACAAAAGATATGGAGAAAACGCTGGAGATGACAGACATCAAT
    TGGCGCGATGGTCATGACTTACGTCAAGACATTATCGATTATGAAATTGTCCAGCAT
    ATCTTTGAGATCTTTCGTTTGACTGTTCAGATGCGCAACAGCCTGTCAGAATTGGAG
    GATCGTGACTATGATCGCCTTATTTCTCCCGTCTTAAATGAGAACAATATCTTCTACG
    ACTCAGCCAAGGCTGGAGATGCACTGCCAAAAGACGCCGACGCAAATGGGGCCTAC
    TGTATTGCATTGAAGGGGTTGTACGAGATCAAACAGATTACAGAAAATTGGAAGGA
    GGACGGTAAGTTCTCTCGTGATAAGCTGAAGATTTCTAACAAAGACTGGTTCGATTT
    CATTCAGAACAAACGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCCGGCCAGG
    CGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAACGTAA
    AGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 88
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGTACCA
    ATAACTTTCAGAATTTCATTGGAATCAGCAGCTTACAGAAAACCCTGCGCAATGCAC
    TTATCCCCACTGAGACAACCCAGCAGTTCATTGTAAAGAACGGGATTATTAAAGAA
    GATGAGCTTCGCGGGGAGAATCGTCAGATCTTAAAGGATATTATGGACGATTACTAC
    CGTGGCTTCATTTCGGAGACGCTGTCGTCGATCGACGACATCGACTGGACATCCTTG
    TTTGAAAAGATGGAAATCCAACTGAAGAATGGCGATAACAAGGACACGTTAATCAA
    AGAGCAGACGGAATACCGTAAAGCTATCCACAAAAAGTTCGCTAATGACGACCGCT
    TTAAGAACATGTTCTCAGCAAAACTTATTAGCGATATTTTACCTGAATTTGTCATCCA
    CAATAACAATTACTCCGCGAGTGAAAAAGAGGAGAAAACCCAGGTGATTAAGCTGT
    TTTCCCGTTTTGCAACCAGTTTCAAGGACTATTTTAAGAATCGTGCTAATTGTTTCTC
    TGCAGACGACATTTCCTCGTCGTCCTGCCATCGCATTGTTAATGATAATGCTGAAAT
    CTTTTTTTCAAACGCACTTGTGTATCGTCGCATTGTCAAAAGCTTAAGTAATGACGAT
    ATCAATAAGATCTCAGGAGACATGAAGGACTCCCTGAAAGAAATGTCATTGGAAGA
    AATTTACTCTTATGAAAAGTATGGAGAATTTATTACGCAGGAGGGTATCAGCTTCTA
    TAACGACATTTGTGGTAAAGTGAACAGCTTTATGAATCTTTATTGTCAAAAGAATAA
    AGAGAACAAAAATCTGTACAAGCTGCAGAAATTGCATAAACAAATTCTGTGCATTG
    CAGATACTTCGTATGAGGTTCCTTACAAATTCGAGTCGGATGAGGAGGTGTATCAAA
    GCGTAAACGGATTTTTGGATAACATTAGTAGTAAGCATATTGTGGAACGCCTTCGCA
    AGATTGGTGACAACTATAACGGATACAACTTAGACAAGATCTATATTGTCTCGAAGT
    TTTACGAAAGTGTTTCCCAAAAGACTTATCGCGACTGGGAGACAATCAACACTGCGC
    TGGAAATTCACTATAACAATATCTTGCCGGGGAACGGAAAAAGTAAGGCAGATAAG
    GTGAAGAAAGCAGTCAAAAATGATCTGCAAAAAAGCATTACTGAAATTAACGAACT
    TGTGTCAAATTACAAATTGTGTTCGGATGACAATATTAAAGCGGAAACGTATATCCA
    CGAGATCTCGCACATTCTTAATAATTTCGAGGCGCAGGAATTAAAGTATAATCCTGA
    GATCCATTTGGTGGAATCAGAACTTAAAGCTAGTGAACTGAAAAATGTCCTGGACGT
    TATTATGAATGCATTTCACTGGTGTTCTGTCTTTATGACAGAAGAACTTGTCGACAA
    AGACAACAACTTTTATGCGGAATTAGAAGAGATTTACGACGAAATTTATCCCGTTAT
    TTCGTTATATAATTTAGTTCGTAATTACGTGACTCAGAAACCCTACAGCACAAAAAA
    GATTAAATTAAACTTTGGGATTCCGACTCTTGCTGATGGATGGAGCAAGTCCAAGGA
    GTACTCTAATAACGCCATTATCTTGATGCGTGACAACCTGTACTACCTGGGCATTTTT
    AACGCTAAAAACAAACCCGACAAAAAGATCATTGAAGGGAACACCTCGGAAAATA
    AGGGGGACTATAAAAAAATGATCTACAATCTGTTGCCAGGCCCAAATAAGATGATC
    CCAAAGGTTTTTTTATCTTCCAAAACTGGCGTAGAAACTTACAAGCCGAGCGCATAC
    ATCCTTGAAGGATATAAACAAAACAAACATATCAAAAGTTCAAAGGACTTCGATAT
    TACGTTCTGCCATGATTTAATCGATTATTTCAAGAATTGCATCGCGATTCACCCAGA
    GTGGAAAAACTTTGGGTTTGATTTTTCAGACACCAGCACTTACGAGGATATTAGTGG
    ATTCTATCGTGAGGTTGAACTGCAGGGCTATAAAATTGACTGGACCTATATTTCTGA
    AAAAGATATTGATCTGCTTCAGGAGAAAGGCCAATTGTACTTATTTCAAATCTATAA
    CAAGGATTTCTCCAAGAAGTCCACGGGTAATGACAACTTACACACAATGTATCTGAA
    GAATCTGTTTAGTGAGGAGAACTTGAAGGACATTGTGCTGAAGCTTAATGGCGAGG
    CCGAAATCTTTTTTCGTAAGTCCTCCATTAAAAACCCTATTATCCATAAGAAAGGGA
    GTATTCTTGTCAACCGCACGTATGAGGCCGAAGAAAAGGACCAATTCGGAAACATC
    CAAATTGTCCGTAAAAATATTCCTGAGAACATTTACCAGGAGCTTTACAAGTATTTC
    AACGACAAGAGTGATAAAGAACTTTCAGATGAGGCGGCGAAACTGAAGAATGTAGT
    GGGGCACCACGAAGCTGCCACGAATATTGTAAAGGATTACCGTTACACCTACGACA
    AGTACTTTTTGCATATGCCCATCACAATTAATTTTAAGGCCAATAAAACTGGTTTTAT
    CAACGATCGTATCTTACAGTACATTGCTAAGGAAAAAGATCTGCACGTTATCGGTAT
    CGATCGCGGGGAACGCAATCTGATTTATGTTAGTGTGATTGACACGTGCGGAAATAT
    TGTTGAGCAGAAGAGCTTTAATATCGTAAATGGATATGACTATCAAATTAAACTGAA
    GCAACAGGAAGGGGCCCGCCAGATTGCCCGCAAGGAGTGGAAAGAAATTGGAAAG
    ATCAAGGAGATTAAAGAAGGGTACCTTTCCCTTGTTATCCACGAAATCTCGAAAATG
    GTGATCAAGTACAATGCCATTATTGCTATGGAGGATCTGTCATATGGGTTTAAGAAA
    GGCCGCTTTAAGGTGGAACGTCAGGTTTACCAGAAGTTTGAGACCATGCTTATCAAT
    AAGCTGAATTATCTTGTCTTCAAAGACATCTCAATCACAGAGAACGGCGGGCTGTTA
    AAAGGATATCAGCTGACCTATATCCCCGACAAACTGAAAAATGTCGGGCACCAATG
    CGGCTGTATTTTCTACGTGCCCGCTGCATACACATCTAAAATTGACCCAACGACTGG
    ATTCGTAAATATTTTTAAGTTTAAGGATCTTACGGTAGATGCAAAGCGCGAATTTAT
    CAAGAAATTTGATAGTATCCGTTACGACAGCGAGAAAAACTTATTTTGTTTTACGTT
    CGATTATAACAACTTCATCACGCAAAATACCGTCATGTCAAAATCTTCCTGGTCAGT
    CTATACGTATGGCGTCCGTATCAAGCGCCGCTTCGTCAACGGGCGTTTTTCAAACGA
    GTCAGATACCATCGATATCACCAAAGATATGGAAAAAACATTGGAGATGACGGACA
    TCAATTGGCGCGATGGTCATGACTTACGCCAGGACATTATTGACTACGAAATCGTAC
    AACATATTTTTGAGATTTTCCGTCTGACCGTGCAAATGCGCAACTCATTATCCGAACT
    TGAGGATCGTGATTACGACCGCTTGATCAGTCCTGTTCTGAACGAGAATAATATTTT
    TTACGACAGTGCCAAGGCGGGAGACGCACTGCCCAAGGACGCTGACGCTAACGGAG
    CTTATTGTATTGCGTTGAAGGGACTTTACGAAATCAAGCAAATCACTGAAAACTGGA
    AGGAGGATGGTAAATTCTCACGCGACAAGTTGAAAATTTCGAACAAGGACTGGTTC
    GATTTCATCCAAAACAAGCGTTATTTAAAACGTCCGGCAGCGACCAAAAAAGCCGG
    CCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAA
    CGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 89
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGGACTA
    ATAACTTCCAGAACTTCATCGGTATTTCATCATTACAAAAAACGCTTCGTAACGCCT
    TGATCCCAACAGAAACGACCCAACAATTTATTGTAAAAAACGGCATCATCAAAGAA
    GACGAACTGCGTGGCGAAAATCGCCAAATTTTGAAGGACATTATGGATGACTATTAT
    CGTGGGTTTATCTCGGAGACATTATCCTCCATCGACGACATTGATTGGACGAGTCTT
    TTTGAGAAAATGGAGATCCAGCTTAAAAATGGTGATAACAAGGATACATTGATCAA
    GGAGCAAACCGAGTACCGCAAGGCCATCCATAAGAAGTTCGCAAATGACGACCGCT
    TCAAAAATATGTTTAGTGCCAAATTGATCTCGGATATCCTTCCTGAGTTCGTAATTCA
    CAACAATAATTATAGCGCATCCGAAAAGGAGGAAAAGACTCAAGTCATTAAGCTTT
    TCAGTCGCTTTGCTACCTCGTTTAAGGACTATTTCAAGAACCGCGCGAACTGCTTCTC
    AGCGGATGACATTTCTTCCTCGTCGTGTCACCGCATCGTGAATGATAATGCGGAGAT
    CTTCTTTAGTAATGCCTTGGTATACCGCCGCATTGTTAAATCCCTGTCTAACGACGAT
    ATCAATAAGATCTCAGGAGATATGAAGGATAGCCTTAAAGAAATGTCTCTGGAAGA
    AATTTACTCCTATGAAAAGTACGGTGAGTTTATCACCCAAGAGGGGATTAGCTTTTA
    TAACGATATCTGCGGGAAGGTGAATTCGTTTATGAACCTTTATTGTCAAAAGAATAA
    GGAGAATAAGAACTTATATAAGCTTCAGAAACTGCATAAACAAATCTTATGCATTGC
    CGATACTAGCTATGAAGTTCCGTATAAATTCGAGAGCGATGAAGAAGTTTATCAGA
    GCGTCAATGGGTTCTTGGATAACATTTCATCAAAACACATCGTGGAACGTCTGCGTA
    AGATTGGGGATAACTACAACGGATATAATCTTGACAAAATTTATATTGTATCTAAAT
    TCTATGAGTCGGTGAGTCAAAAGACCTACCGTGATTGGGAAACAATCAATACCGCG
    TTAGAAATCCACTATAACAACATTCTGCCAGGGAATGGTAAAAGTAAAGCGGACAA
    AGTCAAGAAGGCTGTGAAGAACGATCTGCAAAAGAGTATTACAGAGATTAACGAAT
    TAGTCTCCAATTATAAGTTATGCTCGGACGATAACATTAAGGCGGAGACGTATATTC
    ATGAGATTTCGCATATTCTTAACAACTTCGAGGCACAAGAGCTTAAGTATAACCCAG
    AGATTCACCTTGTCGAATCGGAGCTGAAGGCATCGGAATTAAAAAATGTCTTAGATG
    TAATCATGAACGCGTTCCATTGGTGCAGTGTTTTCATGACTGAGGAGTTAGTTGACA
    AGGACAATAACTTCTACGCAGAATTAGAAGAGATCTATGATGAGATTTATCCAGTG
    ATTTCGCTGTATAATCTGGTACGTAATTACGTCACTCAAAAGCCCTACTCAACAAAA
    AAAATTAAGCTGAACTTCGGAATTCCGACTCTGGCCGACGGGTGGTCCAAGTCAAA
    GGAGTATTCTAATAATGCTATCATCCTGATGCGCGATAACTTATACTATTTGGGAAT
    TTTCAATGCCAAAAATAAACCAGATAAAAAGATTATCGAAGGTAATACAAGCGAGA
    ATAAGGGTGACTATAAGAAAATGATTTACAATCTTCTTCCAGGCCCTAACAAGATGA
    TTCCCAAAGTTTTTTTGTCCAGTAAAACAGGGGTCGAAACTTACAAGCCCAGTGCCT
    ATATCCTTGAAGGGTACAAGCAGAATAAGCACATCAAATCCTCGAAAGACTTTGAT
    ATTACATTTTGTCATGACTTAATCGATTATTTTAAGAACTGTATCGCAATCCATCCAG
    AATGGAAGAACTTCGGGTTTGATTTCTCTGATACTTCCACGTATGAGGATATTTCCG
    GGTTCTACCGCGAAGTAGAGCTTCAGGGCTATAAAATTGACTGGACATATATTTCAG
    AAAAAGACATCGATCTGTTACAAGAAAAAGGACAGTTGTATCTGTTTCAAATCTATA
    ATAAGGATTTCTCCAAAAAGTCAACTGGAAATGATAACTTACATACAATGTATCTGA
    AAAATCTTTTTAGTGAAGAGAATTTGAAGGATATCGTGCTGAAGTTAAATGGCGAA
    GCAGAGATCTTCTTCCGCAAGTCCTCGATCAAGAATCCTATCATCCACAAGAAAGGT
    AGTATTCTGGTTAACCGCACGTACGAGGCCGAGGAAAAAGACCAGTTCGGTAATAT
    CCAGATTGTACGTAAGAATATTCCTGAAAATATTTACCAGGAATTATACAAGTATTT
    TAACGACAAATCGGATAAGGAGCTTTCAGATGAGGCCGCAAAGTTGAAGAACGTCG
    TAGGACACCATGAGGCCGCTACGAATATCGTCAAGGACTACCGCTATACGTATGAC
    AAGTACTTCCTGCACATGCCTATTACTATCAATTTCAAAGCTAATAAAACAGGATTC
    ATCAATGATCGTATCCTTCAGTACATTGCCAAAGAAAAAGATCTGCACGTAATCGGA
    ATCGACCGTGGCGAACGTAATCTGATTTACGTATCAGTTATCGACACATGTGGTAAC
    ATCGTGGAGCAGAAATCTTTTAACATTGTTAACGGCTATGATTATCAGATTAAGCTT
    AAACAGCAGGAGGGGGCACGCCAAATCGCTCGTAAAGAATGGAAGGAGATTGGAA
    AGATTAAAGAGATTAAAGAGGGGTACCTTTCGCTGGTTATTCACGAAATTTCCAAGA
    TGGTGATTAAGTACAATGCAATCATCGCGATGGAAGATCTTAGTTACGGATTCAAAA
    AGGGACGCTTCAAAGTTGAGCGTCAGGTCTACCAGAAATTTGAAACGATGCTGATT
    AACAAATTGAATTACTTGGTATTCAAAGATATCTCAATTACTGAAAATGGTGGCTTA
    TTAAAGGGTTACCAGCTTACCTATATCCCGGATAAGCTGAAGAACGTGGGCCATCAA
    TGCGGCTGCATCTTTTACGTCCCTGCCGCATATACCTCTAAAATTGACCCCACCACCG
    GATTCGTAAATATTTTTAAATTCAAGGACCTGACGGTGGACGCCAAGCGCGAATTCA
    TCAAAAAATTCGACTCAATCCGCTATGATTCCGAAAAAAATCTTTTCTGCTTTACGTT
    CGATTATAATAACTTCATTACCCAAAACACGGTGATGTCAAAATCGTCCTGGAGCGT
    GTATACTTATGGAGTGCGTATCAAGCGCCGCTTTGTTAATGGGCGCTTCAGTAACGA
    AAGCGATACCATCGACATTACCAAAGACATGGAGAAGACGCTTGAAATGACGGATA
    TCAATTGGCGTGACGGACACGATCTTCGTCAGGATATCATCGACTACGAGATTGTGC
    AACATATCTTTGAGATTTTCCGTTTAACTGTTCAAATGCGTAACTCCTTGTCCGAATT
    GGAAGACCGTGATTACGACCGCTTGATTTCACCAGTGCTTAACGAGAATAACATCTT
    CTACGACTCCGCCAAAGCAGGCGATGCCCTGCCAAAGGACGCTGATGCAAATGGTG
    CATACTGTATCGCGTTGAAGGGCTTATACGAGATTAAGCAAATCACCGAAAATTGG
    AAAGAGGATGGAAAGTTCAGTCGCGATAAGCTGAAGATCTCTAATAAAGATTGGTT
    TGACTTTATCCAGAACAAACGTTATTTAAAACGTCCGGCAGCGACCAAAAAAGCCG
    GCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAA
    ACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 90
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGTACCA
    ATAATTTCCAAAATTTCATCGGAATCTCATCCTTGCAAAAAACCTTGCGCAATGCTTT
    GATCCCCACCGAAACCACGCAGCAGTTCATCGTGAAAAACGGCATTATCAAAGAGG
    ATGAGTTGCGCGGGGAAAACCGTCAAATTCTTAAGGATATCATGGACGATTACTACC
    GTGGGTTTATCAGTGAGACCCTGTCAAGCATTGACGACATTGACTGGACCAGCTTAT
    TTGAGAAGATGGAGATTCAATTAAAGAACGGGGACAATAAGGACACGCTTATCAAA
    GAGCAGACAGAATACCGTAAAGCGATTCATAAGAAATTTGCAAATGACGATCGCTT
    CAAGAACATGTTTTCAGCAAAATTAATCAGCGACATCCTTCCCGAATTTGTGATTCA
    TAATAACAACTATTCGGCTAGCGAAAAAGAGGAGAAAACTCAGGTTATTAAGCTTT
    TCTCGCGTTTTGCCACTTCGTTCAAAGACTATTTTAAGAATCGCGCAAACTGCTTTTC
    GGCTGATGATATTTCCAGTTCTAGCTGCCATCGTATCGTTAACGATAATGCTGAGAT
    TTTCTTCTCTAATGCCCTGGTGTATCGTCGTATCGTTAAATCTTTGAGCAACGACGAT
    ATTAATAAGATTTCAGGCGACATGAAGGATTCTTTAAAGGAGATGTCTTTAGAAGAG
    ATTTATTCCTATGAGAAATATGGCGAGTTTATCACCCAAGAAGGAATTTCGTTCTAC
    AACGACATCTGTGGCAAAGTGAACAGCTTCATGAATTTATACTGCCAAAAGAATAA
    GGAGAATAAAAATTTATATAAACTGCAGAAACTGCATAAGCAAATTCTTTGCATTGC
    AGACACCTCTTATGAAGTTCCTTATAAGTTTGAATCGGACGAGGAGGTATATCAGAG
    TGTGAACGGGTTCCTGGACAATATTTCATCCAAGCATATTGTTGAACGTTTACGCAA
    AATTGGAGACAATTACAATGGGTATAACCTTGACAAAATTTACATCGTGTCGAAGTT
    TTACGAATCGGTAAGCCAGAAGACCTATCGTGACTGGGAAACTATCAATACCGCCTT
    AGAAATTCATTACAACAATATTCTTCCTGGTAACGGCAAAAGCAAAGCCGATAAGG
    TAAAGAAGGCTGTCAAGAACGACCTGCAAAAGTCTATCACAGAGATCAACGAGTTA
    GTCTCTAACTACAAATTATGTTCCGACGACAATATTAAAGCCGAAACCTACATCCAT
    GAGATCTCACACATTCTTAACAATTTTGAGGCCCAGGAGCTGAAATATAACCCAGAA
    ATTCACCTTGTAGAGAGCGAATTAAAAGCCTCCGAGCTGAAGAACGTTTTGGATGTA
    ATCATGAACGCATTTCATTGGTGCAGCGTATTTATGACAGAGGAGTTGGTCGACAAG
    GACAATAACTTTTACGCCGAGCTTGAAGAAATCTACGATGAAATTTACCCGGTAATT
    AGTTTATATAATTTAGTTCGCAACTACGTAACTCAGAAACCCTACAGTACCAAGAAG
    ATTAAATTGAACTTTGGGATCCCGACACTTGCTGACGGTTGGAGTAAATCAAAAGAA
    TACTCCAATAATGCAATTATCCTGATGCGCGACAATCTTTACTACTTGGGGATCTTTA
    ACGCAAAGAACAAACCAGATAAGAAAATCATCGAGGGCAACACCAGCGAGAATAA
    AGGCGATTACAAGAAAATGATCTATAATCTTTTGCCGGGACCGAACAAAATGATCC
    CAAAGGTTTTCCTGTCGTCGAAAACGGGAGTCGAGACATATAAACCATCTGCGTACA
    TCTTGGAAGGTTACAAACAGAATAAGCATATTAAGTCTAGTAAAGACTTCGACATCA
    CCTTTTGTCATGACCTGATTGATTATTTCAAGAACTGTATTGCTATCCATCCAGAATG
    GAAAAACTTCGGATTTGACTTCTCCGATACTAGCACCTACGAAGACATTTCGGGTTT
    TTATCGCGAAGTAGAGCTTCAAGGGTACAAAATTGATTGGACATATATTAGCGAGA
    AAGACATTGATTTGCTTCAAGAGAAGGGACAGTTATATTTATTCCAGATCTACAACA
    AAGACTTCTCGAAGAAATCCACCGGTAATGATAATCTTCACACTATGTACCTGAAGA
    ATTTATTTTCAGAGGAAAATCTGAAGGACATTGTACTTAAACTTAATGGAGAAGCCG
    AAATCTTCTTCCGCAAGAGTTCCATTAAAAATCCGATTATTCATAAAAAGGGAAGTA
    TCCTTGTGAACCGCACGTATGAGGCCGAAGAGAAGGATCAGTTTGGGAATATTCAA
    ATTGTCCGCAAAAACATCCCCGAGAACATCTACCAGGAACTGTATAAATACTTTAAT
    GATAAATCTGATAAAGAGTTATCAGACGAGGCTGCCAAACTGAAAAACGTAGTCGG
    TCATCATGAGGCAGCGACCAATATTGTAAAGGACTACCGTTACACCTACGACAAGT
    ATTTCCTTCACATGCCGATCACGATTAATTTTAAGGCTAACAAGACCGGCTTTATCA
    ATGACCGCATCTTGCAGTACATCGCGAAAGAGAAAGATTTACACGTCATCGGAATT
    GATCGTGGAGAGCGTAATCTTATCTACGTCAGCGTCATCGACACCTGTGGAAACATT
    GTGGAACAAAAAAGTTTTAATATCGTAAACGGCTACGACTATCAAATTAAACTTAA
    ACAGCAAGAGGGAGCTCGCCAGATCGCTCGCAAAGAGTGGAAAGAGATTGGGAAA
    ATTAAAGAAATTAAAGAGGGTTACCTGTCGCTGGTAATTCACGAAATCTCGAAAAT
    GGTCATCAAATATAATGCAATTATCGCTATGGAGGATCTGTCCTACGGGTTCAAGAA
    GGGACGTTTTAAAGTAGAGCGCCAGGTGTATCAAAAATTCGAAACCATGTTGATCA
    ATAAGCTTAACTATTTGGTCTTCAAAGATATTTCGATTACGGAGAACGGAGGTTTGT
    TGAAAGGATATCAGCTGACGTATATCCCAGACAAGTTGAAAAACGTGGGGCATCAA
    TGTGGATGTATTTTCTATGTGCCCGCGGCCTACACGAGTAAGATCGATCCTACCACT
    GGTTTCGTCAACATTTTCAAATTTAAAGATCTTACCGTGGATGCGAAGCGCGAATTT
    ATTAAGAAATTTGATAGCATTCGCTATGATTCCGAAAAGAACCTGTTCTGTTTTACG
    TTCGACTATAACAATTTCATTACCCAAAACACGGTGATGAGCAAATCCTCTTGGTCA
    GTTTATACATACGGTGTACGTATCAAACGCCGTTTCGTTAACGGACGCTTTTCCAAT
    GAGTCTGATACAATCGATATCACGAAAGATATGGAAAAAACATTAGAGATGACTGA
    TATCAACTGGCGTGACGGGCACGACCTGCGTCAAGACATTATTGACTACGAGATTGT
    GCAGCATATCTTCGAAATCTTTCGCTTAACTGTGCAAATGCGTAACTCGTTATCCGA
    GTTAGAAGACCGTGACTACGATCGCCTGATTTCACCCGTCTTGAACGAAAATAACAT
    CTTCTACGATTCCGCGAAGGCTGGGGACGCATTGCCCAAGGACGCAGACGCGAATG
    GAGCGTACTGTATTGCGCTTAAAGGATTATATGAAATCAAGCAGATCACCGAAAATT
    GGAAGGAGGACGGGAAGTTCTCACGCGACAAACTGAAGATTTCAAATAAGGACTGG
    TTCGATTTCATTCAGAATAAGCGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 91
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTACGAA
    CAACTTTCAGAACTTCATCGGCATCTCCAGCCTTCAAAAGACTTTACGCAACGCATT
    GATTCCCACGGAGACTACGCAACAGTTTATCGTAAAAAATGGTATTATCAAAGAAG
    ATGAATTACGCGGGGAGAATCGCCAGATTCTTAAGGACATTATGGACGATTATTACC
    GTGGATTCATCAGTGAGACACTGAGCTCCATTGATGACATCGACTGGACGTCATTGT
    TTGAAAAGATGGAAATCCAGTTGAAAAATGGCGATAACAAAGATACATTGATTAAA
    GAGCAGACAGAGTACCGCAAAGCAATTCACAAGAAATTCGCCAATGATGATCGTTT
    TAAGAACATGTTTAGTGCCAAGCTTATTTCGGATATCTTACCCGAATTCGTGATTCAC
    AACAACAATTATTCGGCAAGTGAGAAAGAGGAAAAGACCCAGGTTATCAAATTGTT
    TTCGCGCTTCGCCACTTCGTTCAAAGATTATTTCAAGAACCGTGCAAACTGTTTCTCC
    GCTGACGACATCAGTTCCAGCTCATGCCACCGTATTGTAAATGACAATGCGGAGATC
    TTTTTCAGTAATGCCTTAGTATATCGTCGCATTGTAAAGAGCTTATCTAATGATGACA
    TTAACAAGATCTCGGGTGATATGAAGGACTCACTTAAGGAGATGAGTCTGGAAGAG
    ATCTACTCCTACGAAAAATACGGGGAATTCATCACCCAGGAGGGAATTTCATTCTAC
    AACGATATCTGCGGCAAAGTTAACTCCTTTATGAATCTGTACTGTCAAAAGAACAAG
    GAGAATAAAAACCTGTATAAATTGCAGAAACTTCATAAACAAATTTTGTGTATCGCA
    GACACGAGTTATGAAGTACCTTATAAATTCGAATCCGACGAAGAGGTATATCAGTCC
    GTAAATGGGTTCCTGGACAATATCAGTAGTAAGCACATTGTGGAACGCTTACGCAA
    AATTGGAGACAATTACAACGGGTATAACCTGGACAAAATCTACATCGTATCCAAATT
    TTATGAAAGCGTGTCTCAAAAAACTTATCGTGATTGGGAAACAATCAACACGGCTCT
    TGAGATCCATTACAATAACATCTTGCCGGGTAACGGCAAATCGAAGGCAGACAAAG
    TTAAAAAAGCAGTTAAGAACGACTTACAGAAAAGCATTACGGAGATTAACGAGTTA
    GTAAGTAATTACAAATTATGCTCCGACGATAATATCAAAGCTGAAACCTACATCCAT
    GAAATTAGCCACATTTTGAACAATTTCGAAGCGCAGGAGCTGAAATATAACCCTGA
    AATCCATCTGGTAGAGTCTGAGTTGAAGGCGTCAGAACTGAAAAACGTTCTTGACGT
    CATCATGAATGCCTTTCACTGGTGTAGTGTTTTTATGACTGAGGAGCTTGTAGATAA
    GGACAACAACTTCTATGCTGAACTTGAAGAGATCTACGATGAAATCTACCCCGTAAT
    CAGTCTGTATAATTTAGTTCGTAACTACGTCACGCAGAAACCCTATTCGACTAAGAA
    AATTAAGCTGAACTTTGGGATCCCTACTTTGGCAGACGGGTGGAGCAAGAGTAAAG
    AATACAGTAATAATGCAATTATCTTGATGCGCGATAACTTATATTACTTAGGTATTTT
    CAATGCTAAGAACAAACCTGATAAGAAGATTATCGAAGGAAATACGAGTGAGAATA
    AGGGAGACTACAAAAAGATGATTTACAACTTGCTGCCAGGGCCTAATAAGATGATT
    CCAAAAGTTTTTCTGTCGAGCAAGACAGGGGTTGAAACTTATAAGCCATCCGCTTAT
    ATCCTTGAGGGGTACAAGCAGAATAAGCATATCAAGTCCTCCAAAGATTTTGATATT
    ACATTTTGCCACGACTTAATTGATTACTTCAAGAACTGCATCGCAATCCATCCCGAA
    TGGAAGAATTTCGGCTTCGATTTCTCAGATACGTCCACGTATGAGGATATCTCAGGC
    TTTTACCGCGAAGTTGAGCTGCAAGGTTATAAAATTGATTGGACATACATCTCCGAA
    AAAGACATTGATCTTTTACAGGAAAAGGGCCAATTATACTTATTTCAAATCTATAAC
    AAAGATTTTAGCAAGAAGTCCACAGGTAATGATAACCTGCATACGATGTATTTGAA
    AAATCTTTTCAGTGAAGAGAATTTGAAGGATATCGTCCTGAAGCTGAACGGTGAGG
    CTGAGATCTTCTTCCGCAAATCGTCTATCAAAAACCCCATCATTCACAAAAAGGGAA
    GTATCTTAGTAAACCGCACTTATGAAGCGGAGGAAAAGGATCAGTTCGGGAACATC
    CAGATCGTGCGCAAGAACATTCCAGAAAACATCTATCAGGAACTTTACAAATATTTC
    AATGACAAGTCTGATAAAGAATTATCAGACGAGGCGGCGAAACTTAAAAATGTTGT
    TGGACACCACGAAGCAGCGACGAATATTGTAAAGGATTATCGCTACACATACGATA
    AATACTTTTTGCACATGCCAATCACCATTAACTTTAAGGCGAACAAGACAGGTTTCA
    TTAACGACCGTATTCTGCAATATATCGCAAAGGAAAAAGACCTGCACGTTATTGGGA
    TCGATCGTGGCGAACGCAATTTGATCTACGTAAGCGTTATCGACACTTGCGGAAATA
    TCGTTGAACAAAAAAGCTTTAATATCGTCAATGGATACGATTACCAAATCAAGCTGA
    AACAACAAGAAGGGGCACGTCAGATCGCTCGTAAAGAATGGAAAGAGATTGGTAA
    GATCAAAGAGATTAAAGAAGGGTATCTTTCTTTAGTAATTCACGAGATTTCGAAAAT
    GGTTATTAAATACAATGCGATTATTGCTATGGAAGACTTAAGCTACGGCTTTAAGAA
    AGGTCGCTTCAAAGTGGAGCGCCAAGTGTATCAGAAGTTTGAAACGATGTTGATTA
    ACAAATTAAATTACCTGGTCTTTAAGGACATCAGTATCACAGAAAATGGGGGGTTGC
    TTAAAGGGTACCAGCTTACATACATCCCTGATAAACTGAAAAATGTCGGTCATCAGT
    GCGGATGTATCTTCTATGTACCAGCAGCCTATACCAGTAAGATTGACCCTACTACTG
    GCTTTGTGAATATTTTTAAATTCAAGGATTTAACCGTGGACGCCAAGCGTGAATTTA
    TTAAAAAATTTGATTCGATTCGCTACGACAGTGAGAAAAACCTTTTCTGCTTTACCTT
    TGACTACAACAATTTTATTACCCAGAACACCGTAATGTCAAAGAGTTCGTGGTCTGT
    ATATACCTACGGTGTTCGCATCAAGCGCCGCTTCGTAAACGGGCGTTTCAGTAACGA
    ATCTGACACCATCGACATCACTAAAGATATGGAGAAGACATTGGAAATGACGGACA
    TTAATTGGCGTGATGGCCATGACTTACGTCAGGACATTATTGATTACGAAATTGTGC
    AGCATATCTTCGAGATTTTCCGTTTGACAGTTCAGATGCGCAACTCACTGAGTGAGT
    TAGAAGATCGCGATTACGACCGTCTGATCTCACCGGTCCTTAATGAAAACAACATTT
    TCTACGACTCAGCAAAGGCGGGTGATGCCCTGCCAAAGGATGCGGACGCTAATGGC
    GCCTACTGCATCGCCCTGAAAGGATTGTATGAAATTAAGCAGATTACAGAAAATTG
    GAAGGAAGATGGTAAATTTAGCCGTGATAAATTAAAAATCTCGAACAAGGATTGGT
    TCGATTTTATTCAGAACAAACGTTATTTGAAACGTCCGGCAGCGACCAAAAAAGCCG
    GCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAA
    ACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 92
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAACAA
    ATAATTTTCAAAATTTTATCGGCATCTCAAGTCTTCAAAAAACCCTTCGCAATGCCCT
    GATTCCAACTGAAACAACCCAGCAATTTATCGTCAAGAACGGCATCATTAAGGAAG
    ACGAGTTACGCGGGGAGAACCGTCAAATCCTGAAAGATATCATGGATGACTACTAT
    CGTGGGTTCATTTCGGAAACCTTGTCTTCAATCGACGACATTGACTGGACGAGTCTT
    TTCGAGAAAATGGAAATTCAGCTTAAAAATGGAGACAACAAGGATACTCTGATTAA
    GGAACAGACAGAATATCGCAAAGCTATCCACAAAAAGTTCGCTAATGATGATCGTT
    TCAAAAATATGTTTTCTGCTAAATTGATTTCCGATATCTTGCCTGAATTTGTAATCCA
    CAACAACAATTATTCTGCTTCCGAGAAGGAAGAGAAGACCCAGGTCATTAAATTATT
    CAGCCGCTTTGCAACCAGCTTTAAAGACTACTTTAAGAATCGCGCTAACTGCTTTTC
    GGCGGATGACATCTCATCATCATCATGCCACCGCATTGTGAACGACAATGCGGAGAT
    CTTCTTTTCGAATGCGTTAGTTTATCGTCGCATTGTCAAAAGTCTTAGCAATGATGAC
    ATCAACAAGATCTCAGGAGACATGAAAGATTCCTTAAAGGAGATGTCTCTTGAGGA
    AATCTATTCGTATGAGAAATACGGCGAGTTCATTACCCAGGAAGGTATTAGTTTCTA
    CAATGATATCTGCGGCAAAGTAAATTCTTTTATGAATCTGTATTGCCAAAAAAACAA
    AGAAAACAAGAATCTTTATAAGTTACAAAAGTTACATAAGCAAATTCTGTGCATCGC
    TGATACATCTTATGAGGTACCCTACAAATTTGAAAGTGATGAGGAGGTCTATCAGAG
    TGTCAACGGCTTCTTAGACAACATCTCTTCCAAACATATCGTGGAACGCCTGCGTAA
    AATCGGAGATAACTACAACGGATATAACTTAGATAAAATCTACATCGTGTCCAAGTT
    TTATGAAAGTGTGAGCCAAAAAACATATCGTGACTGGGAAACCATTAACACCGCAT
    TGGAAATTCACTATAACAACATTTTGCCAGGCAACGGGAAAAGTAAGGCGGACAAA
    GTTAAGAAAGCAGTTAAAAATGACCTGCAAAAAAGCATCACTGAAATTAACGAATT
    GGTATCGAATTACAAATTATGTAGCGACGATAATATCAAAGCAGAAACTTACATTCA
    CGAGATTAGTCACATTTTAAATAACTTCGAGGCCCAGGAATTGAAATACAATCCCGA
    AATTCATTTGGTTGAATCAGAACTGAAAGCATCAGAGTTGAAAAATGTGTTAGATGT
    CATTATGAATGCGTTTCATTGGTGCTCTGTGTTCATGACCGAGGAACTGGTTGATAA
    AGATAACAACTTTTACGCTGAATTGGAGGAGATTTACGATGAGATTTACCCGGTCAT
    TTCGCTTTATAACTTAGTGCGCAATTATGTGACGCAGAAACCATATTCCACGAAGAA
    AATCAAACTTAATTTTGGCATCCCTACTCTGGCTGATGGTTGGTCGAAATCGAAAGA
    GTACAGCAACAACGCGATCATTCTTATGCGTGACAATCTTTACTATTTGGGCATTTTT
    AATGCCAAGAATAAGCCAGATAAGAAAATCATTGAGGGGAATACTTCCGAGAATAA
    GGGGGATTACAAAAAGATGATCTATAACTTGCTGCCCGGCCCCAACAAAATGATTC
    CTAAGGTTTTCTTGTCAAGCAAGACGGGCGTCGAAACATATAAGCCGTCAGCTTATA
    TTCTGGAAGGCTATAAACAGAATAAGCACATCAAGTCTTCCAAGGACTTTGACATCA
    CTTTTTGCCACGATTTGATCGACTACTTTAAGAACTGTATTGCGATTCATCCGGAATG
    GAAGAACTTCGGTTTCGACTTTTCCGATACCTCAACATACGAGGATATCAGCGGCTT
    CTACCGTGAAGTCGAGCTTCAAGGCTACAAGATCGATTGGACATATATTTCAGAGAA
    GGACATTGATTTGTTACAAGAGAAAGGTCAACTTTACTTATTTCAGATCTATAACAA
    AGACTTTTCGAAGAAATCGACAGGAAACGATAACTTACACACTATGTATTTAAAAA
    ATCTGTTTTCGGAGGAAAACCTGAAAGATATTGTGCTGAAACTTAACGGCGAGGCA
    GAGATCTTTTTCCGTAAAAGCTCAATCAAGAATCCTATCATCCATAAAAAAGGTAGT
    ATTCTTGTCAACCGCACATATGAAGCGGAGGAGAAGGACCAATTCGGAAACATCCA
    AATTGTCCGTAAGAATATTCCGGAGAACATTTACCAAGAGTTGTATAAATACTTTAA
    CGATAAGTCAGATAAGGAACTTAGCGATGAGGCGGCGAAGCTTAAAAACGTAGTTG
    GGCATCATGAAGCTGCTACCAACATTGTAAAAGATTACCGTTACACCTATGACAAGT
    ATTTCTTGCACATGCCCATTACGATCAATTTCAAAGCAAATAAGACAGGCTTTATCA
    ATGATCGCATCCTGCAGTACATTGCTAAAGAGAAGGATTTGCATGTTATCGGTATTG
    ATCGCGGAGAGCGCAATTTGATCTACGTCTCCGTAATCGACACTTGCGGTAACATTG
    TTGAGCAGAAGTCGTTCAACATCGTTAATGGTTATGATTACCAAATCAAGCTGAAGC
    AGCAAGAGGGTGCCCGCCAGATCGCGCGTAAGGAATGGAAAGAAATCGGGAAAAT
    TAAAGAGATCAAAGAAGGCTATTTGTCTCTGGTAATTCACGAAATCAGCAAGATGG
    TGATCAAGTATAACGCGATCATTGCGATGGAGGATCTTTCTTATGGCTTCAAGAAAG
    GGCGCTTTAAAGTCGAACGCCAGGTCTACCAGAAATTTGAGACAATGCTTATCAACA
    AGCTTAACTATCTTGTATTTAAGGATATTTCCATCACTGAGAACGGAGGACTTTTAA
    AGGGGTACCAACTGACGTACATTCCTGATAAGCTGAAGAACGTTGGTCATCAATGC
    GGATGCATCTTCTATGTGCCAGCGGCTTACACCTCCAAAATCGATCCCACTACAGGC
    TTTGTCAATATCTTCAAATTCAAGGATTTGACCGTTGACGCGAAGCGCGAGTTTATC
    AAGAAGTTTGATAGCATTCGCTACGACAGCGAAAAAAATTTATTTTGTTTTACTTTC
    GACTACAATAACTTTATTACTCAGAACACTGTCATGTCAAAGAGTTCGTGGAGTGTC
    TACACGTACGGAGTACGTATTAAGCGCCGTTTCGTCAACGGACGCTTCTCAAACGAA
    AGCGACACGATCGACATCACCAAAGACATGGAAAAAACTCTTGAGATGACGGATAT
    CAATTGGCGCGACGGCCATGACCTGCGTCAGGATATCATTGATTACGAGATCGTTCA
    GCACATCTTCGAAATCTTCCGCCTTACCGTCCAGATGCGCAACAGTTTAAGCGAGCT
    TGAAGACCGCGACTACGATCGTTTGATTAGCCCCGTTCTGAACGAGAATAATATTTT
    CTACGACAGCGCAAAGGCCGGTGATGCTTTGCCAAAGGACGCAGACGCGAATGGAG
    CCTACTGCATCGCCCTGAAGGGCTTATATGAGATTAAGCAAATTACCGAAAATTGGA
    AGGAAGATGGTAAGTTCTCCCGTGATAAGCTTAAAATTAGCAATAAGGATTGGTTCG
    ACTTCATCCAGAACAAACGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCCGGC
    CAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAAC
    GTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 93
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAACAA
    ACAATTTCCAAAACTTCATCGGTATCTCTTCGTTGCAGAAGACTCTGCGTAATGCTTT
    GATCCCGACGGAGACAACCCAACAATTTATCGTCAAAAACGGTATTATTAAGGAGG
    ACGAGTTACGTGGAGAAAATCGTCAAATCCTTAAGGACATCATGGACGATTATTATC
    GCGGGTTTATTTCTGAAACCCTGAGCAGTATCGATGATATCGACTGGACCTCACTTT
    TTGAGAAAATGGAGATCCAGTTGAAGAACGGTGATAACAAAGACACTCTGATCAAA
    GAGCAAACTGAATACCGCAAGGCAATTCACAAAAAGTTCGCCAACGACGACCGTTT
    CAAGAATATGTTCTCAGCTAAGTTAATCAGCGACATTTTGCCAGAGTTCGTTATCCA
    CAACAATAATTATAGTGCTTCAGAGAAGGAGGAAAAAACCCAAGTGATTAAACTTT
    TTTCGCGCTTTGCAACCTCATTCAAGGACTACTTCAAGAATCGCGCGAATTGCTTCA
    GTGCGGACGACATTTCTTCTTCAAGTTGCCATCGTATCGTTAACGATAACGCGGAAA
    TTTTCTTCTCTAATGCTTTGGTGTATCGCCGCATTGTAAAATCGCTTAGTAACGATGA
    CATTAATAAGATCTCAGGTGATATGAAAGATTCATTGAAGGAAATGAGCTTGGAAG
    AGATTTACAGTTACGAAAAATATGGAGAATTTATTACTCAGGAAGGCATCTCATTCT
    ATAACGATATCTGCGGGAAGGTAAATTCGTTTATGAACTTATATTGCCAGAAAAATA
    AAGAGAATAAAAATTTGTATAAGCTTCAGAAGTTGCACAAACAGATCCTGTGCATT
    GCAGACACCTCGTATGAGGTTCCGTATAAATTTGAGTCCGATGAAGAAGTGTATCAG
    TCTGTGAATGGTTTCTTAGATAATATCTCTTCCAAGCATATTGTCGAACGCCTGCGCA
    AAATTGGTGATAACTATAACGGATACAATCTGGATAAAATTTACATCGTTTCTAAAT
    TTTACGAGTCAGTCTCGCAGAAGACCTACCGCGACTGGGAAACAATTAACACGGCA
    TTGGAGATTCACTACAATAATATCTTGCCTGGTAACGGTAAGTCTAAGGCAGATAAG
    GTAAAAAAAGCTGTGAAAAACGACCTTCAGAAAAGCATCACGGAGATTAATGAGCT
    GGTGAGTAATTACAAATTATGTTCAGACGATAATATTAAAGCTGAAACGTATATCCA
    TGAAATCTCGCATATCTTGAACAACTTCGAGGCCCAAGAACTTAAATATAACCCCGA
    AATCCATTTAGTCGAGTCTGAATTGAAAGCGTCGGAATTAAAAAACGTCTTAGACGT
    CATTATGAACGCGTTTCACTGGTGTTCAGTTTTCATGACCGAAGAGCTGGTCGACAA
    AGACAACAACTTCTATGCGGAATTGGAGGAAATCTATGATGAAATCTACCCTGTTAT
    TTCACTGTATAACCTTGTGCGCAACTATGTCACTCAGAAGCCGTATTCGACCAAAAA
    AATTAAATTGAATTTCGGTATCCCTACTCTTGCAGACGGATGGAGTAAAAGCAAGGA
    ATACAGTAATAACGCCATTATTCTTATGCGCGACAATTTATACTACCTGGGCATCTTT
    AACGCAAAGAATAAGCCGGATAAGAAGATTATTGAGGGTAACACCAGTGAGAACA
    AGGGCGACTATAAGAAGATGATCTATAACTTATTGCCAGGTCCAAATAAAATGATC
    CCAAAAGTATTCTTATCATCAAAGACGGGAGTTGAAACCTATAAGCCTAGTGCCTAT
    ATTCTTGAGGGATATAAACAGAACAAGCACATTAAGTCGTCTAAGGATTTTGACATT
    ACGTTCTGCCATGACTTAATCGACTATTTTAAAAACTGTATTGCGATTCACCCCGAAT
    GGAAGAATTTTGGATTCGATTTTTCGGATACCTCGACCTATGAAGATATTTCGGGAT
    TTTATCGTGAAGTGGAGTTGCAAGGCTATAAAATCGATTGGACCTATATCTCAGAAA
    AAGACATTGATTTATTACAGGAAAAGGGACAACTGTACCTTTTCCAAATTTATAACA
    AGGACTTTTCTAAAAAGTCCACAGGAAATGATAACCTTCACACCATGTACCTGAAGA
    ACCTTTTCTCAGAGGAAAACCTGAAGGACATTGTCCTTAAGTTAAATGGAGAAGCG
    GAGATCTTTTTCCGTAAATCTAGTATCAAGAATCCGATTATCCATAAAAAAGGTTCG
    ATTTTGGTAAATCGCACCTATGAAGCGGAAGAGAAAGATCAATTTGGTAACATCCA
    GATCGTGCGCAAGAATATCCCGGAGAACATTTACCAAGAGCTGTATAAGTACTTCA
    ATGATAAGTCTGATAAGGAACTGTCAGATGAAGCTGCGAAATTGAAGAACGTGGTT
    GGGCATCATGAAGCCGCTACCAATATCGTCAAGGATTACCGTTATACCTATGACAAA
    TATTTCTTACACATGCCGATTACGATCAATTTTAAGGCAAACAAGACAGGATTCATC
    AACGACCGTATCTTGCAGTATATTGCCAAAGAGAAGGATCTGCATGTGATCGGTATT
    GACCGCGGGGAGCGCAATTTAATCTATGTATCGGTGATCGATACTTGTGGTAACATC
    GTAGAACAAAAGAGCTTTAACATCGTGAATGGTTACGACTATCAGATCAAGCTGAA
    ACAACAGGAAGGAGCCCGCCAGATCGCTCGCAAGGAATGGAAAGAAATCGGGAAA
    ATTAAGGAAATCAAGGAAGGCTACCTTTCATTGGTCATTCACGAAATTTCGAAAATG
    GTAATTAAGTACAACGCGATCATCGCCATGGAGGACCTTTCGTACGGATTTAAGAAG
    GGTCGTTTCAAAGTTGAGCGCCAGGTATACCAAAAATTCGAGACTATGCTTATCAAC
    AAACTTAACTACTTGGTCTTTAAGGACATTTCTATTACCGAAAACGGCGGCTTACTT
    AAAGGCTATCAATTGACATATATTCCCGACAAACTGAAGAATGTTGGACATCAATGC
    GGGTGTATTTTCTATGTGCCGGCAGCTTACACTAGTAAGATCGACCCTACAACCGGG
    TTCGTAAACATTTTTAAATTCAAAGACTTAACAGTCGATGCGAAGCGTGAATTTATT
    AAGAAGTTTGATAGTATCCGCTATGACAGTGAAAAGAACTTGTTTTGCTTTACGTTC
    GACTACAATAACTTTATTACACAGAACACGGTCATGTCTAAATCATCATGGTCGGTT
    TACACATATGGGGTGCGCATCAAGCGTCGCTTTGTAAATGGCCGTTTTAGTAATGAG
    AGCGACACAATCGACATCACAAAGGATATGGAGAAAACTCTTGAGATGACAGACAT
    CAATTGGCGTGACGGTCATGACTTACGCCAAGATATCATCGACTACGAAATCGTACA
    GCATATTTTTGAGATTTTTCGTCTTACTGTGCAAATGCGTAATTCTTTATCCGAACTG
    GAAGATCGTGATTACGACCGCTTGATTAGTCCCGTCTTAAATGAGAACAATATTTTC
    TATGATTCTGCGAAAGCCGGAGATGCACTGCCCAAAGACGCTGATGCCAATGGCGC
    GTATTGCATTGCATTAAAAGGATTATATGAGATTAAACAGATTACCGAAAATTGGAA
    AGAGGACGGTAAATTCTCACGCGATAAATTGAAGATTTCTAACAAGGACTGGTTCG
    ACTTTATCCAAAATAAACGTTATCTTAAACGTCCGGCAGCGACCAAAAAAGCCGGC
    CAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAAAC
    GTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 94
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGTACCAA
    CAACTTTCAGAATTTCATTGGCATTAGCTCGCTTCAAAAAACTTTACGCAATGCTCTT
    ATTCCGACTGAGACGACACAACAGTTTATCGTTAAGAATGGCATCATCAAAGAAGA
    TGAATTACGCGGAGAAAACCGCCAGATCCTGAAAGACATTATGGACGATTATTACC
    GTGGGTTCATCTCCGAGACGTTGTCATCGATCGATGACATCGACTGGACGTCACTTT
    TTGAAAAAATGGAGATCCAGTTAAAGAACGGTGACAATAAGGATACATTGATCAAA
    GAACAGACCGAGTACCGTAAAGCGATTCATAAAAAGTTTGCGAACGATGATCGCTT
    CAAGAATATGTTTTCTGCGAAATTAATTTCCGACATTTTACCTGAATTTGTTATTCAT
    AATAACAACTACTCGGCGTCTGAGAAAGAGGAGAAAACCCAAGTGATTAAACTTTT
    TTCACGTTTCGCAACGTCGTTCAAAGACTATTTTAAAAATCGTGCTAATTGCTTTAGC
    GCGGATGACATCAGCTCTAGTTCATGTCATCGCATTGTCAACGATAATGCTGAGATC
    TTTTTCAGTAATGCGTTAGTGTACCGTCGTATTGTGAAGTCCTTATCTAATGATGATA
    TCAATAAGATCAGCGGGGATATGAAGGACTCACTTAAGGAGATGAGCTTGGAGGAA
    ATCTATTCCTATGAGAAGTATGGTGAGTTTATTACGCAAGAAGGAATTAGCTTTTAC
    AACGATATCTGTGGAAAGGTGAATTCGTTTATGAATTTGTATTGCCAGAAAAATAAG
    GAGAACAAGAACCTTTATAAATTGCAAAAGTTACACAAGCAAATCCTGTGCATTGC
    AGATACTTCCTACGAGGTGCCTTACAAGTTTGAATCCGACGAAGAGGTCTACCAATC
    TGTAAACGGTTTCTTAGATAATATTAGTTCCAAGCATATTGTGGAGCGCCTTCGTAA
    AATTGGCGATAATTACAACGGTTACAATTTAGACAAAATTTACATTGTCAGTAAATT
    CTACGAGTCCGTATCTCAAAAGACGTATCGTGATTGGGAGACTATCAATACGGCCCT
    GGAGATCCACTACAACAATATCTTGCCCGGTAATGGTAAGTCGAAGGCCGATAAAG
    TTAAGAAAGCGGTGAAAAATGACTTACAGAAGTCAATCACCGAAATTAACGAATTG
    GTGTCCAATTATAAATTGTGTTCAGATGATAATATCAAAGCCGAGACCTACATTCAT
    GAGATTTCCCATATCTTAAATAATTTCGAGGCGCAAGAGCTTAAGTATAACCCAGAA
    ATCCACCTGGTAGAATCTGAGTTGAAGGCGTCAGAGTTAAAAAATGTTTTAGATGTC
    ATTATGAACGCGTTTCACTGGTGCTCCGTATTTATGACGGAGGAATTAGTAGATAAA
    GACAACAATTTCTATGCCGAACTTGAGGAAATCTATGATGAGATCTATCCCGTCATT
    AGCCTGTATAACTTGGTCCGCAACTATGTTACCCAAAAACCGTACAGTACCAAGAAG
    ATTAAGCTGAATTTCGGCATTCCTACACTGGCTGATGGTTGGAGTAAATCGAAGGAA
    TATTCGAATAACGCGATTATCTTGATGCGCGACAACTTATACTATTTGGGGATCTTTA
    ACGCCAAAAACAAACCGGATAAGAAGATTATTGAGGGAAACACATCAGAGAACAA
    AGGCGACTACAAAAAAATGATTTACAACTTGTTACCGGGGCCTAACAAAATGATCC
    CGAAGGTGTTCTTATCCAGTAAAACAGGCGTTGAGACCTACAAACCTTCCGCATACA
    TCCTGGAAGGGTATAAGCAGAACAAGCACATTAAGTCCAGCAAGGATTTCGATATT
    ACCTTCTGTCATGATTTAATTGACTATTTCAAGAACTGTATTGCAATCCACCCCGAGT
    GGAAGAACTTCGGATTCGACTTCTCAGATACGAGCACATATGAGGACATCTCGGGG
    TTCTATCGTGAAGTAGAACTGCAGGGATATAAAATTGATTGGACATATATTTCCGAA
    AAAGACATCGACCTTTTACAAGAGAAGGGTCAACTTTACTTGTTCCAAATTTACAAT
    AAAGACTTCTCAAAAAAAAGCACGGGTAACGATAATTTACACACTATGTATTTAAA
    GAACCTTTTCTCGGAAGAGAATTTAAAGGATATCGTATTGAAGTTGAATGGAGAAG
    CGGAGATCTTCTTCCGTAAGTCCAGTATTAAAAACCCTATTATTCACAAGAAGGGAT
    CGATTTTAGTTAACCGCACATACGAGGCCGAAGAGAAGGACCAATTTGGGAACATT
    CAAATTGTCCGCAAAAACATCCCTGAGAACATTTATCAAGAGCTTTATAAGTACTTT
    AACGATAAGTCCGATAAGGAATTGTCAGATGAGGCGGCAAAGTTGAAGAATGTCGT
    GGGGCATCATGAAGCTGCCACCAACATTGTGAAGGACTACCGCTACACTTACGACA
    AATACTTCCTGCACATGCCCATTACGATCAATTTTAAGGCCAATAAGACAGGCTTTA
    TTAACGACCGTATTCTTCAATATATCGCTAAGGAGAAGGACCTTCATGTGATTGGGA
    TCGACCGCGGAGAACGTAATTTAATTTATGTGTCCGTCATCGATACGTGTGGAAATA
    TCGTGGAACAGAAATCATTCAATATCGTGAATGGCTATGATTACCAGATCAAATTAA
    AACAGCAGGAGGGCGCTCGCCAAATTGCGCGTAAGGAATGGAAAGAGATCGGAAA
    AATCAAAGAAATCAAAGAAGGATATTTGTCATTGGTGATCCATGAGATTTCAAAAA
    TGGTAATTAAATATAATGCAATTATCGCAATGGAAGACCTGTCCTATGGTTTTAAGA
    AGGGTCGTTTCAAGGTAGAACGCCAAGTGTATCAAAAGTTCGAGACGATGCTGATC
    AATAAGCTGAATTATCTTGTGTTTAAGGACATTAGCATCACGGAAAATGGAGGGCTG
    TTGAAAGGCTATCAACTGACGTATATCCCTGACAAGCTGAAAAATGTTGGCCATCAG
    TGCGGGTGCATTTTCTACGTCCCCGCGGCGTATACAAGCAAGATCGATCCTACTACG
    GGATTCGTAAATATTTTTAAATTCAAAGACTTAACCGTGGACGCCAAGCGCGAATTC
    ATTAAGAAGTTTGATAGCATTCGCTACGATTCAGAAAAAAATCTTTTCTGTTTTACGT
    TCGATTACAACAATTTTATCACCCAGAACACAGTGATGAGCAAGTCATCCTGGTCTG
    TCTATACCTACGGTGTCCGTATCAAACGCCGCTTCGTCAACGGACGCTTCTCTAATG
    AATCTGATACCATTGACATCACCAAGGACATGGAAAAGACACTTGAGATGACAGAT
    ATTAACTGGCGTGACGGACATGACCTGCGTCAGGACATCATCGATTATGAGATTGTT
    CAGCATATCTTCGAGATCTTCCGCCTGACAGTACAAATGCGCAATTCACTGTCAGAA
    CTTGAAGACCGCGACTATGACCGCCTGATCTCTCCAGTATTAAATGAGAACAATATC
    TTTTATGACAGTGCTAAGGCCGGCGATGCCCTTCCGAAAGATGCTGATGCTAACGGA
    GCTTATTGTATTGCATTAAAGGGTCTTTATGAGATCAAGCAAATTACCGAGAATTGG
    AAGGAGGATGGCAAATTCTCGCGCGACAAACTGAAAATCAGTAACAAGGACTGGTT
    CGATTTTATTCAGAATAAACGTTACCTGAAACGTCCGGCAGCGACCAAAAAAGCCG
    GCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGAA
    ACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 95
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGAACGA
    ACAACTTCCAGAACTTCATCGGCATCAGTTCTTTACAAAAAACCCTGCGTAACGCCC
    TTATTCCGACTGAGACAACACAACAGTTCATCGTTAAAAACGGAATTATCAAAGAG
    GACGAGTTGCGCGGCGAGAATCGCCAAATTTTGAAAGATATTATGGACGACTATTAT
    CGTGGTTTTATTTCAGAAACACTGAGTTCGATTGACGATATCGATTGGACGAGCCTG
    TTTGAGAAAATGGAAATCCAGTTGAAAAATGGCGATAATAAAGACACTTTAATCAA
    AGAACAAACCGAGTATCGTAAAGCGATCCATAAAAAGTTCGCTAATGACGATCGTT
    TTAAGAATATGTTCAGTGCGAAACTGATTTCAGACATTTTGCCCGAGTTCGTGATCC
    ATAATAACAACTATTCCGCCTCGGAAAAGGAAGAAAAAACCCAGGTGATTAAGCTG
    TTCAGTCGCTTCGCAACATCTTTCAAGGATTATTTCAAGAATCGCGCGAATTGCTTCA
    GTGCGGACGATATTTCTAGTTCAAGCTGCCATCGTATCGTTAATGATAACGCGGAGA
    TTTTTTTTAGCAATGCTCTGGTGTACCGCCGCATTGTTAAGTCACTGTCCAACGATGA
    TATTAACAAGATCTCAGGAGACATGAAAGACTCGCTTAAAGAGATGAGTCTGGAAG
    AGATCTATTCTTATGAGAAGTATGGCGAGTTTATTACCCAAGAAGGAATCTCATTCT
    ACAATGATATTTGTGGAAAGGTGAACAGCTTTATGAATCTTTACTGCCAAAAAAACA
    AGGAGAATAAGAATCTTTACAAACTTCAGAAGTTACATAAACAGATTTTGTGTATTG
    CGGATACGTCTTATGAAGTCCCCTACAAATTTGAATCGGATGAAGAGGTATACCAAA
    GTGTGAACGGATTCTTGGACAATATTTCTTCTAAACATATTGTTGAACGCTTACGTA
    AGATCGGGGATAACTACAATGGCTACAATCTTGACAAAATCTACATTGTTAGCAAAT
    TCTACGAGAGTGTCAGCCAAAAGACGTACCGCGATTGGGAAACAATTAATACTGCG
    CTTGAGATTCACTATAATAACATTTTACCAGGCAACGGCAAGTCCAAGGCGGATAA
    AGTTAAAAAAGCTGTTAAAAACGATTTGCAAAAATCTATCACAGAAATTAACGAGT
    TAGTTAGTAACTACAAACTGTGCTCCGATGACAACATTAAGGCTGAGACGTATATCC
    ATGAGATCTCTCACATCTTAAACAATTTTGAAGCTCAAGAACTTAAGTACAATCCGG
    AAATCCACCTGGTGGAATCCGAGCTGAAGGCTAGCGAACTGAAGAACGTATTGGAC
    GTGATCATGAACGCGTTCCACTGGTGTTCTGTCTTTATGACGGAAGAGCTTGTCGAC
    AAAGATAATAACTTTTACGCGGAACTTGAGGAAATTTACGATGAGATTTACCCAGTT
    ATTTCATTGTATAACCTTGTCCGTAATTACGTGACCCAAAAGCCTTATAGTACGAAA
    AAAATCAAATTAAATTTTGGAATCCCAACACTGGCTGACGGTTGGAGCAAATCTAA
    GGAGTATTCTAATAACGCAATCATCTTAATGCGTGACAACCTGTATTATTTGGGTAT
    CTTCAATGCCAAAAATAAGCCTGACAAAAAGATTATCGAAGGAAATACTTCGGAGA
    ATAAGGGGGATTACAAAAAAATGATTTACAATTTGCTGCCCGGGCCGAACAAGATG
    ATCCCCAAAGTGTTCTTATCCTCGAAGACTGGTGTAGAAACATACAAGCCAAGCGCA
    TACATTCTGGAGGGTTACAAGCAAAACAAACACATCAAATCTTCAAAAGACTTTGA
    CATTACATTTTGCCATGATCTTATTGACTACTTCAAAAACTGCATTGCTATTCACCCC
    GAGTGGAAGAACTTTGGGTTTGACTTCAGCGACACGTCTACGTATGAGGACATCTCC
    GGGTTCTACCGTGAAGTTGAGTTACAAGGGTATAAGATTGACTGGACGTATATTTCA
    GAGAAAGATATCGATCTTTTGCAGGAAAAGGGCCAGTTATATTTATTCCAGATTTAC
    AACAAGGACTTTAGTAAGAAGTCAACAGGAAATGACAACTTGCATACGATGTATTT
    GAAAAATCTTTTTTCTGAGGAAAATCTTAAGGACATCGTACTGAAATTGAATGGCGA
    GGCTGAAATCTTCTTCCGTAAATCCTCCATTAAGAATCCCATTATCCACAAAAAGGG
    GTCTATCCTGGTGAATCGTACCTACGAGGCAGAGGAGAAGGATCAATTCGGAAATA
    TTCAGATTGTTCGTAAGAACATCCCCGAGAACATTTATCAAGAATTGTATAAGTACT
    TTAATGACAAATCTGACAAAGAGTTATCCGACGAAGCTGCGAAACTGAAAAACGTT
    GTTGGTCACCACGAGGCCGCCACTAATATCGTAAAAGACTACCGTTATACCTATGAC
    AAGTACTTTTTGCACATGCCGATCACTATCAACTTCAAGGCGAATAAGACGGGCTTC
    ATTAACGATCGTATCCTGCAATACATCGCCAAGGAGAAGGACCTTCACGTCATTGGG
    ATTGACCGTGGTGAGCGTAACCTGATTTATGTAAGCGTCATTGATACCTGCGGTAAT
    ATCGTCGAACAGAAAAGTTTCAACATTGTAAATGGATATGACTATCAGATCAAACTT
    AAGCAGCAGGAGGGTGCACGCCAGATTGCCCGCAAGGAATGGAAGGAGATTGGGA
    AGATTAAGGAAATTAAAGAAGGTTACTTATCACTGGTTATTCACGAGATCAGTAAA
    ATGGTAATCAAATATAACGCGATCATTGCCATGGAGGATCTGAGCTATGGCTTTAAA
    AAGGGCCGTTTCAAAGTCGAGCGCCAGGTATATCAAAAGTTTGAAACAATGCTGAT
    TAACAAATTAAACTATCTGGTTTTCAAAGATATTTCGATCACTGAAAATGGCGGGCT
    GTTGAAGGGATACCAACTTACATACATCCCTGACAAACTGAAAAATGTCGGTCACC
    AATGTGGATGTATCTTTTATGTACCAGCAGCGTATACGAGCAAAATCGATCCAACTA
    CGGGTTTTGTGAACATCTTTAAGTTCAAGGATTTGACAGTAGATGCCAAACGCGAGT
    TCATTAAAAAATTTGATTCAATTCGCTACGATTCAGAGAAAAATCTTTTTTGTTTCAC
    GTTCGATTACAATAATTTCATTACGCAGAACACAGTAATGTCAAAGTCAAGCTGGTC
    GGTCTACACGTATGGAGTCCGTATTAAACGTCGTTTTGTAAACGGCCGTTTCTCAAA
    TGAATCAGATACAATTGATATTACGAAGGATATGGAGAAGACATTAGAGATGACTG
    ACATTAACTGGCGCGACGGACATGATCTTCGTCAGGACATTATTGATTATGAGATTG
    TACAGCATATCTTTGAGATCTTCCGCCTGACCGTTCAGATGCGCAATTCGTTGTCCGA
    GTTAGAAGACCGCGATTACGACCGTTTAATCAGTCCCGTCTTAAACGAAAATAACAT
    CTTCTACGATTCAGCCAAGGCAGGCGATGCCTTGCCAAAGGATGCTGACGCAAATG
    GCGCATACTGTATTGCGTTGAAAGGCCTTTATGAAATCAAGCAAATTACCGAAAACT
    GGAAAGAAGACGGAAAATTCTCCCGTGATAAGTTGAAAATCTCTAATAAGGATTGG
    TTCGATTTCATCCAAAATAAACGCTATTTGAAACGTCCGGCAGCGACCAAAAAAGCC
    GGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAAGA
    AACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 96
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAACTA
    ATAATTTCCAAAATTTTATAGGCATCTCTTCTTTACAGAAGACTCTTCGTAACGCCCT
    AATCCCGACTGAGACCACACAACAATTCATAGTGAAAAATGGGATCATTAAAGAAG
    ACGAGCTGCGTGGGGAGAACAGGCAGATCCTAAAAGACATAATGGACGATTATTAT
    AGAGGGTTCATCTCAGAGACATTATCTAGCATCGACGACATTGACTGGACCTCCCTG
    TTTGAAAAAATGGAAATCCAGCTGAAGAATGGTGACAATAAAGACACATTAATAAA
    AGAACAAACAGAGTACAGGAAAGCCATCCACAAGAAGTTCGCAAACGATGACAGA
    TTCAAAAATATGTTCAGTGCGAAGCTAATATCCGACATCTTACCAGAGTTTGTAATA
    CACAATAACAATTACAGCGCGAGCGAAAAGGAAGAGAAAACGCAAGTAATTAAGC
    TTTTTAGTAGGTTCGCTACCTCTTTCAAAGATTACTTCAAAAATCGTGCTAACTGCTT
    CTCAGCCGACGACATATCTTCAAGTTCCTGTCACCGTATCGTGAATGATAACGCTGA
    GATATTCTTCTCAAACGCCCTTGTATACCGTAGGATCGTAAAGTCCTTATCTAACGAT
    GATATAAACAAGATCAGTGGAGACATGAAAGACAGCCTTAAAGAGATGTCTCTAGA
    AGAAATTTACTCCTATGAAAAGTATGGGGAGTTTATAACACAGGAGGGGATCAGCT
    TCTACAACGACATCTGCGGAAAGGTGAACAGTTTCATGAATCTTTACTGCCAGAAGA
    ATAAAGAGAACAAAAATCTTTATAAGCTTCAAAAGTTGCACAAACAAATACTGTGC
    ATTGCCGATACATCATATGAGGTCCCCTATAAGTTCGAATCTGATGAGGAAGTTTAT
    CAATCTGTTAACGGCTTTCTAGACAATATCAGCTCAAAACACATCGTAGAAAGACTG
    AGGAAAATAGGTGATAATTATAATGGATACAACTTGGATAAAATATATATAGTCTCT
    AAATTTTACGAGTCAGTATCCCAGAAAACGTATAGGGATTGGGAGACCATCAACAC
    GGCGTTAGAGATTCATTACAATAACATCTTACCGGGAAACGGAAAAAGTAAGGCGG
    ACAAAGTAAAGAAAGCCGTTAAAAATGACTTACAAAAGAGTATAACAGAAATAAA
    CGAACTAGTAAGCAACTACAAGCTTTGTTCCGATGATAATATCAAGGCCGAGACAT
    ATATCCATGAGATCTCCCACATTCTAAACAATTTCGAAGCGCAAGAACTTAAATATA
    ATCCCGAAATCCACCTGGTGGAAAGTGAACTAAAGGCTAGTGAGTTAAAGAACGTT
    CTTGATGTTATCATGAACGCCTTCCATTGGTGCTCTGTTTTTATGACCGAGGAGTTGG
    TTGATAAAGATAATAATTTCTACGCTGAATTAGAGGAGATATACGACGAAATCTACC
    CAGTGATTTCACTATACAACTTGGTCAGGAACTATGTTACACAAAAGCCGTACAGCA
    CTAAGAAAATTAAGCTAAATTTCGGTATCCCCACGTTAGCCGACGGGTGGAGCAAG
    TCCAAAGAATATTCCAACAATGCGATTATTTTAATGCGTGACAATCTTTATTACCTTG
    GCATCTTCAATGCCAAAAACAAACCTGACAAAAAGATTATAGAAGGTAATACGTCC
    GAGAACAAAGGCGATTACAAGAAGATGATTTATAACCTACTGCCCGGACCAAACAA
    AATGATCCCCAAAGTTTTTCTTAGTTCTAAAACCGGCGTAGAGACGTATAAACCTTC
    TGCCTATATCTTAGAGGGATATAAGCAGAACAAACATATCAAATCTTCCAAGGACTT
    TGATATTACATTCTGCCACGATTTAATTGACTACTTCAAAAATTGCATAGCGATACA
    TCCGGAGTGGAAGAACTTTGGCTTCGACTTCAGTGATACATCCACCTATGAGGATAT
    ATCAGGCTTCTATCGTGAGGTCGAATTGCAAGGGTACAAAATCGATTGGACGTATAT
    ATCCGAGAAAGACATAGACCTTCTTCAAGAAAAGGGGCAGTTATATTTATTCCAAAT
    ATACAACAAGGACTTCAGTAAGAAGTCAACAGGTAATGACAACTTACACACCATGT
    ACTTGAAAAATTTATTTTCTGAAGAAAACCTAAAGGACATTGTACTAAAACTGAACG
    GGGAGGCAGAAATTTTTTTTAGAAAGAGCAGCATAAAAAACCCAATAATTCATAAG
    AAAGGAAGCATTTTAGTTAATAGGACGTACGAGGCAGAGGAAAAGGACCAGTTTGG
    CAATATCCAGATCGTAAGGAAAAATATTCCTGAAAACATATATCAGGAACTATATA
    AATACTTTAACGACAAATCCGACAAAGAATTATCCGACGAGGCTGCAAAGCTGAAG
    AACGTCGTAGGGCACCATGAGGCAGCGACTAATATTGTGAAAGACTATAGGTATAC
    ATACGACAAATACTTTCTGCACATGCCCATCACGATTAACTTCAAGGCGAACAAGAC
    GGGATTCATTAACGACCGTATATTACAATATATTGCTAAGGAGAAAGATCTGCATGT
    AATAGGTATCGACAGAGGCGAACGTAATTTAATCTACGTGTCCGTCATCGACACGTG
    CGGGAACATCGTAGAGCAAAAGAGTTTTAATATAGTAAATGGCTATGATTACCAAA
    TTAAGCTAAAGCAGCAAGAAGGAGCAAGACAGATAGCTAGGAAAGAATGGAAGGA
    GATAGGAAAAATAAAGGAGATCAAGGAGGGGTATCTTAGCCTAGTAATTCATGAAA
    TATCTAAGATGGTTATCAAATACAACGCTATCATAGCGATGGAAGACTTATCTTATG
    GTTTCAAGAAAGGAAGGTTCAAAGTAGAGCGTCAAGTTTATCAAAAGTTCGAAACG
    ATGTTGATTAATAAACTAAACTATTTGGTATTTAAAGATATATCTATCACCGAGAAT
    GGTGGTCTACTAAAGGGTTACCAGCTTACATACATACCGGACAAACTTAAAAACGTC
    GGACATCAGTGTGGATGCATTTTCTACGTTCCAGCTGCATATACCAGCAAGATCGAC
    CCAACGACTGGGTTCGTAAATATTTTTAAATTCAAGGATTTGACTGTCGACGCCAAA
    AGAGAGTTCATAAAAAAGTTCGATTCAATTAGGTACGACAGCGAAAAGAATTTGTT
    CTGCTTTACTTTTGACTATAACAATTTCATTACTCAGAACACTGTAATGTCTAAGTCC
    TCTTGGTCAGTCTATACTTATGGCGTTCGTATCAAACGTAGATTTGTTAACGGTAGAT
    TCTCAAATGAAAGTGATACAATAGATATCACGAAAGATATGGAGAAAACATTAGAA
    ATGACAGACATAAACTGGAGAGACGGACATGACTTGAGACAGGACATTATTGACTA
    CGAGATCGTGCAGCACATCTTTGAGATCTTTCGTTTGACCGTACAAATGCGTAACAG
    TTTATCTGAGCTTGAGGACAGGGACTACGATAGATTGATATCACCTGTATTAAATGA
    GAATAACATCTTCTATGATTCCGCAAAAGCAGGCGACGCTCTACCCAAAGACGCTG
    ATGCGAACGGTGCTTATTGCATAGCTTTAAAGGGTTTGTATGAGATCAAACAGATAA
    CAGAAAATTGGAAGGAAGATGGTAAGTTCTCCCGTGACAAGCTTAAAATATCAAAT
    AAGGACTGGTTCGATTTTATACAGAATAAGCGTTATTAAAACGTCCGGCAGCGACCA
    AAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCC
    GAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCT
    AA
    SEQ ID NO: 97
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAACTA
    ATAACTTCCAGAATTTCATTGGTATCTCCTCTTTACAAAAAACTCTAAGAAACGCCC
    TAATTCCGACTGAAACTACACAGCAATTCATCGTCAAAAACGGGATCATTAAGGAG
    GATGAGTTGAGGGGTGAAAATCGTCAAATTCTTAAAGACATCATGGACGACTACTA
    CAGGGGGTTCATCAGCGAGACGTTATCTAGTATAGACGATATAGACTGGACTTCACT
    GTTCGAGAAGATGGAAATCCAATTAAAAAATGGGGACAATAAAGATACACTTATAA
    AGGAACAGACAGAGTATAGAAAGGCAATACACAAAAAGTTTGCCAACGACGATCGT
    TTCAAGAACATGTTTAGTGCTAAATTGATTTCAGATATTCTGCCGGAATTTGTTATTC
    ACAACAATAATTATAGCGCCAGTGAGAAAGAAGAAAAAACGCAGGTTATCAAACTG
    TTCAGTCGTTTCGCTACATCTTTTAAGGATTACTTTAAAAACCGTGCAAATTGTTTTT
    CAGCCGACGATATTAGTAGCAGCTCTTGTCACCGTATTGTTAATGATAATGCGGAGA
    TTTTCTTTTCAAACGCATTGGTCTACAGGAGGATAGTCAAGTCCCTTTCAAATGACG
    ACATTAATAAGATCTCAGGTGACATGAAAGATTCCTTAAAGGAAATGTCCCTGGAA
    GAGATCTATTCCTATGAAAAGTACGGTGAGTTCATTACTCAAGAGGGTATAAGCTTT
    TACAATGACATATGTGGTAAGGTTAATAGCTTTATGAACCTGTATTGCCAGAAGAAC
    AAAGAAAATAAGAATCTGTATAAGTTGCAAAAGCTACACAAACAAATTTTGTGCAT
    TGCCGATACATCATACGAGGTGCCATACAAATTCGAGAGCGATGAGGAGGTTTATC
    AGAGCGTGAATGGATTCCTGGACAATATTAGTAGTAAGCATATCGTGGAAAGGCTT
    AGAAAGATAGGTGACAATTACAATGGCTACAATCTGGATAAAATCTACATCGTCTC
    AAAATTCTATGAAAGTGTATCCCAGAAGACGTACCGTGATTGGGAAACTATCAACA
    CCGCTCTGGAGATACATTACAACAATATACTTCCCGGAAACGGCAAGTCAAAAGCC
    GACAAAGTCAAAAAAGCGGTCAAGAACGATTTACAAAAGTCTATCACTGAAATTAA
    TGAATTAGTTAGTAATTACAAACTGTGTAGTGATGATAATATTAAGGCAGAGACTTA
    CATACACGAAATTTCACACATTTTAAACAACTTCGAGGCACAGGAACTTAAATATAA
    TCCTGAAATTCACCTGGTTGAAAGTGAATTGAAAGCCAGCGAGCTAAAGAACGTTTT
    GGACGTAATCATGAACGCATTCCACTGGTGCTCTGTCTTTATGACAGAGGAACTAGT
    GGATAAGGACAATAATTTTTATGCGGAGCTGGAGGAAATATACGATGAGATATATC
    CCGTAATATCATTATATAATCTGGTAAGAAACTATGTGACTCAAAAGCCGTATAGCA
    CCAAGAAAATTAAACTTAATTTCGGCATACCCACTTTAGCGGACGGCTGGTCAAAAT
    CCAAAGAGTATAGTAATAATGCCATCATCCTGATGCGTGACAACCTGTACTATTTAG
    GTATATTTAACGCCAAAAATAAACCCGACAAAAAGATTATAGAGGGCAACACCTCA
    GAGAACAAAGGTGATTATAAGAAGATGATTTACAACCTTTTACCCGGTCCTAATAAG
    ATGATTCCCAAAGTCTTTCTATCTAGCAAAACTGGTGTTGAAACATACAAACCCTCA
    GCTTATATTTTAGAAGGGTATAAGCAGAATAAGCATATTAAAAGCTCCAAAGATTTC
    GATATTACCTTTTGCCATGACTTGATAGACTATTTCAAAAATTGTATTGCCATTCACC
    CTGAATGGAAAAACTTCGGATTTGACTTCTCTGACACATCCACCTACGAAGACATTT
    CAGGTTTTTACAGGGAAGTCGAGCTACAGGGTTATAAAATTGATTGGACATACATCA
    GCGAGAAAGATATTGACCTACTTCAAGAAAAAGGGCAGCTATACCTGTTCCAGATA
    TACAATAAAGACTTCAGTAAAAAAAGCACCGGGAACGATAATCTTCACACAATGTA
    CTTAAAAAATTTATTTAGTGAAGAGAATCTGAAGGATATAGTGCTGAAGTTAAACG
    GGGAGGCAGAGATATTTTTTAGAAAATCTAGTATTAAGAATCCGATCATCCACAAG
    AAGGGTTCTATCCTTGTTAATAGGACTTATGAGGCAGAAGAAAAAGACCAATTCGG
    CAACATACAAATTGTCCGTAAAAATATCCCTGAGAACATTTATCAGGAACTATACAA
    GTACTTCAATGATAAAAGCGACAAGGAGCTGAGCGACGAGGCTGCTAAGTTAAAGA
    ATGTGGTGGGCCACCATGAGGCAGCAACGAATATTGTGAAGGACTATCGTTATACCT
    ACGATAAATACTTTCTTCATATGCCGATCACCATTAATTTCAAGGCAAACAAAACTG
    GCTTCATTAACGATCGTATCTTACAATATATCGCAAAAGAGAAAGACCTTCACGTTA
    TCGGGATCGATAGAGGCGAGCGTAACCTAATTTATGTTTCTGTGATAGACACCTGTG
    GGAACATAGTCGAACAGAAATCATTTAATATTGTTAACGGCTACGATTATCAGATAA
    AGTTGAAGCAACAAGAGGGTGCACGTCAAATAGCAAGGAAAGAATGGAAAGAAAT
    AGGCAAGATTAAAGAAATAAAAGAAGGTTATTTATCCCTTGTAATACACGAAATTA
    GCAAAATGGTGATTAAATATAATGCGATCATTGCCATGGAGGATCTTTCTTACGGCT
    TCAAAAAGGGGAGATTCAAAGTCGAGAGGCAGGTGTATCAGAAGTTTGAGACCATG
    CTAATCAATAAACTAAATTATCTAGTATTCAAAGACATAAGCATCACCGAAAATGGC
    GGCTTGTTGAAGGGTTATCAATTGACCTACATCCCAGATAAACTAAAAAACGTAGG
    GCATCAATGCGGATGTATATTTTACGTTCCAGCCGCATACACTTCCAAAATCGATCC
    AACTACGGGTTTTGTGAACATCTTCAAATTCAAAGACTTGACTGTCGATGCTAAGAG
    GGAGTTTATCAAGAAATTTGACTCCATTAGATACGACAGTGAGAAGAATCTGTTCTG
    TTTTACCTTTGATTATAACAACTTTATAACTCAAAACACAGTCATGAGTAAGTCATCT
    TGGTCAGTGTATACGTATGGTGTGAGGATTAAAAGGAGGTTTGTTAACGGGAGATTT
    TCCAATGAAAGTGATACAATAGATATAACCAAGGACATGGAAAAGACTCTTGAAAT
    GACCGACATTAACTGGAGAGATGGCCACGACTTACGTCAAGATATAATCGATTACG
    AGATAGTGCAACATATCTTTGAGATATTTAGGCTTACTGTCCAAATGCGTAACTCAT
    TAAGTGAGTTGGAGGACAGGGATTACGATAGGCTAATAAGTCCTGTTCTTAACGAA
    AACAATATATTCTACGATTCAGCAAAGGCGGGAGACGCCCTGCCCAAGGACGCGGA
    TGCTAACGGCGCATACTGTATTGCCCTGAAAGGCTTGTACGAGATAAAACAGATCAC
    GGAGAACTGGAAAGAAGATGGAAAATTCAGTCGTGACAAGTTAAAAATTAGTAACA
    AAGACTGGTTCGACTTTATTCAGAACAAGAGATATCTGAAACGTCCGGCAGCGACC
    AAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCC
    CGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGG
    CTAA
    SEQ ID NO: 98
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAACGGAACCA
    ATAACTTTCAAAACTTTATAGGCATCTCCAGTCTACAGAAGACACTACGTAACGCTT
    TGATACCAACTGAGACCACGCAGCAGTTTATCGTCAAGAACGGTATTATAAAGGAA
    GACGAGCTAAGGGGGGAAAACCGTCAGATCTTAAAGGACATCATGGATGACTACTA
    CAGAGGCTTCATAAGTGAGACTTTGTCTAGTATAGACGACATCGACTGGACCAGTTT
    ATTTGAGAAGATGGAAATTCAGTTAAAGAACGGGGACAATAAAGACACACTAATTA
    AAGAGCAGACCGAATACAGAAAAGCTATACACAAAAAGTTTGCCAACGATGATAGA
    TTCAAAAATATGTTTTCAGCAAAATTGATTTCCGACATATTGCCAGAATTCGTAATC
    CATAATAACAATTATTCTGCAAGTGAGAAGGAAGAGAAGACCCAAGTAATCAAGCT
    GTTTTCCCGTTTTGCTACGAGTTTCAAAGATTATTTCAAGAATAGGGCTAATTGTTTC
    TCCGCGGACGACATAAGTAGCAGTTCCTGTCACAGGATTGTGAACGATAATGCTGA
    GATATTTTTTTCCAATGCCCTAGTGTATAGGAGAATAGTTAAAAGCTTAAGCAACGA
    CGATATCAATAAAATTTCAGGGGACATGAAGGACAGCTTAAAGGAAATGAGTTTGG
    AGGAGATTTACAGTTATGAAAAATACGGAGAGTTTATAACTCAGGAAGGCATCTCTT
    TCTATAATGATATCTGTGGGAAGGTAAACTCCTTCATGAATTTATATTGCCAGAAGA
    ATAAGGAAAACAAAAATCTTTACAAGCTTCAAAAGTTACATAAGCAGATCTTATGT
    ATTGCCGACACGAGTTATGAAGTGCCTTATAAATTCGAGAGTGATGAGGAAGTGTAT
    CAGTCTGTTAACGGATTCCTAGATAATATAAGTTCCAAACATATAGTCGAGAGGCTG
    AGGAAGATTGGCGATAACTATAATGGATATAATCTTGACAAAATCTATATAGTCTCT
    AAATTTTATGAAAGCGTCAGCCAGAAGACATATAGAGATTGGGAAACTATAAACAC
    AGCCCTTGAAATACATTACAATAACATCCTACCCGGCAATGGTAAGTCTAAGGCAG
    ACAAAGTTAAAAAAGCAGTAAAGAATGACTTACAGAAGTCAATCACGGAGATAAAT
    GAGTTGGTCAGTAACTACAAATTATGCTCCGACGATAATATTAAGGCCGAAACATAT
    ATACACGAGATAAGTCATATATTAAACAATTTCGAAGCCCAGGAGTTAAAATATAA
    CCCTGAAATTCATCTGGTCGAAAGTGAGTTAAAGGCCAGTGAGTTAAAGAATGTACT
    TGACGTAATTATGAATGCTTTTCATTGGTGCTCCGTGTTCATGACCGAGGAGTTAGT
    AGATAAAGACAATAACTTTTACGCCGAACTTGAAGAGATATACGACGAGATTTATC
    CGGTAATCAGCTTGTACAACTTAGTTAGAAATTATGTAACACAGAAGCCTTACTCTA
    CTAAAAAAATAAAACTGAACTTTGGTATCCCAACTCTTGCAGATGGTTGGAGTAAAA
    GCAAGGAATATAGCAACAATGCGATCATCTTGATGAGAGACAACTTGTACTATTTGG
    GAATCTTCAACGCGAAAAATAAACCCGACAAAAAAATCATCGAAGGGAATACCTCT
    GAGAATAAAGGTGACTATAAGAAAATGATTTACAATCTACTTCCTGGTCCTAATAAA
    ATGATCCCGAAAGTGTTTCTTAGTTCTAAGACTGGTGTCGAGACGTACAAACCTAGC
    GCGTACATCTTAGAAGGGTACAAGCAGAATAAACACATCAAATCAAGCAAAGACTT
    CGATATTACTTTTTGCCATGACTTGATAGACTACTTTAAAAACTGCATAGCAATCCA
    CCCGGAGTGGAAAAACTTTGGCTTTGATTTCTCTGACACCTCTACATATGAGGACAT
    ATCTGGTTTTTACCGTGAGGTTGAATTGCAGGGATACAAAATTGACTGGACTTACAT
    ATCTGAAAAAGATATCGATCTATTGCAGGAGAAAGGCCAGCTTTACCTTTTCCAGAT
    CTATAATAAGGACTTCTCTAAGAAGTCTACAGGGAATGATAATTTGCACACTATGTA
    CTTAAAAAATCTGTTTTCCGAGGAAAACTTGAAAGACATTGTTTTAAAGTTGAACGG
    AGAAGCTGAAATATTTTTCAGAAAGAGCTCCATAAAAAACCCGATCATTCATAAGA
    AGGGATCTATCCTGGTTAACAGAACGTACGAAGCGGAAGAAAAAGACCAATTCGGA
    AACATTCAAATTGTTAGAAAGAATATCCCTGAGAACATCTACCAGGAGTTATATAAG
    TATTTTAATGATAAGTCAGATAAGGAACTATCTGACGAAGCGGCGAAGCTTAAAAA
    TGTTGTAGGACACCATGAGGCTGCTACAAATATAGTCAAGGACTACCGTTATACCTA
    CGATAAGTACTTTCTACACATGCCCATTACCATCAATTTTAAAGCTAATAAAACGGG
    TTTTATCAACGATCGTATCCTACAATATATTGCGAAAGAGAAGGATTTGCATGTCAT
    TGGCATTGATAGAGGTGAGAGGAACCTAATATACGTATCCGTGATTGATACGTGCG
    GGAACATAGTTGAACAGAAATCATTTAATATAGTTAATGGGTACGACTATCAGATTA
    AGCTAAAGCAACAAGAAGGCGCCAGGCAAATTGCCCGTAAAGAATGGAAAGAGAT
    CGGGAAGATCAAGGAAATAAAAGAAGGATACCTTTCCCTGGTCATCCATGAAATTA
    GCAAAATGGTGATTAAGTACAATGCCATAATCGCGATGGAGGACTTAAGCTACGGG
    TTCAAAAAGGGGAGGTTTAAGGTGGAGAGGCAAGTGTACCAGAAATTTGAGACCAT
    GCTAATCAACAAACTGAACTACCTAGTTTTTAAGGACATTTCAATTACAGAGAATGG
    AGGACTTTTAAAGGGTTACCAACTAACGTATATACCAGATAAGTTGAAAAATGTCG
    GTCACCAGTGTGGCTGCATCTTTTACGTTCCCGCCGCTTATACATCTAAAATTGATCC
    AACCACAGGCTTTGTAAATATCTTTAAATTCAAAGATTTAACTGTGGATGCAAAAAG
    AGAGTTTATCAAGAAATTCGATAGCATTCGTTATGATAGCGAGAAGAACCTGTTCTG
    CTTTACTTTCGACTATAACAACTTTATAACTCAAAACACCGTGATGTCAAAAAGCTC
    ATGGTCAGTCTACACCTATGGTGTAAGGATTAAAAGGCGTTTCGTGAATGGGAGATT
    CTCCAATGAAAGTGACACGATCGACATAACAAAGGACATGGAGAAGACACTAGAG
    ATGACTGATATTAATTGGAGAGACGGACACGATCTGCGTCAAGATATAATTGATTAT
    GAGATAGTACAGCACATATTTGAGATCTTCCGTTTGACTGTCCAAATGCGTAATTCC
    CTTTCTGAGCTGGAAGATAGGGACTATGATAGATTAATATCCCCTGTACTAAATGAG
    AACAACATTTTCTATGATAGTGCAAAAGCCGGGGATGCATTGCCGAAAGACGCTGA
    CGCTAATGGGGCGTACTGTATAGCTTTAAAGGGGCTTTACGAAATAAAGCAGATAA
    CCGAAAACTGGAAGGAAGATGGCAAATTCTCAAGGGACAAACTTAAGATCTCTAAC
    AAGGATTGGTTCGATTTTATACAAAACAAACGTTATTTGAAACGTCCGGCAGCGACC
    AAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCC
    CGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGG
    CTAA
    SEQ ID NO: 99
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTACAAA
    CAACTTTCAGAATTTCATTGGGATCTCTAGCTTACAGAAGACCCTGAGGAATGCGTT
    GATTCCAACTGAAACAACCCAGCAATTCATCGTGAAAAATGGGATAATCAAAGAGG
    ATGAGTTAAGGGGTGAAAACCGTCAAATATTGAAGGATATTATGGACGACTACTAC
    CGTGGATTCATCTCAGAGACGTTGAGCAGCATTGACGACATAGACTGGACTAGCCTT
    TTCGAGAAGATGGAAATTCAGTTAAAGAACGGAGATAACAAAGATACACTAATCAA
    GGAACAGACAGAATACAGAAAAGCAATTCATAAGAAATTCGCTAATGACGATCGTT
    TTAAAAACATGTTCTCTGCAAAATTAATTAGCGACATTCTGCCGGAATTCGTTATAC
    ATAATAATAACTACAGTGCTTCTGAAAAGGAAGAGAAAACTCAGGTAATAAAACTG
    TTCTCTCGTTTTGCCACATCCTTCAAAGACTACTTTAAAAATAGAGCGAACTGCTTTA
    GCGCCGACGATATTAGTTCTTCCTCATGCCACAGGATTGTCAACGATAATGCAGAGA
    TATTCTTTTCTAACGCACTAGTCTACAGAAGGATTGTAAAGTCTTTGTCAAATGATG
    ACATAAACAAGATTAGTGGAGATATGAAAGACTCTCTAAAGGAAATGAGCCTTGAG
    GAGATATACTCTTATGAAAAGTACGGTGAGTTTATTACCCAAGAAGGCATTAGTTTC
    TATAATGACATTTGTGGAAAAGTTAACAGTTTTATGAATCTATACTGTCAAAAAAAT
    AAGGAGAATAAAAATCTTTATAAGTTGCAAAAACTGCATAAGCAGATATTATGTAT
    AGCAGACACGAGCTATGAGGTACCGTACAAGTTCGAGAGCGATGAGGAAGTCTACC
    AATCTGTCAACGGATTTTTGGACAACATTTCTTCAAAACATATTGTGGAGAGGCTTA
    GGAAAATAGGCGACAATTATAATGGATATAACTTAGATAAGATATATATTGTTTCCA
    AATTCTACGAATCTGTAAGCCAGAAGACATACAGAGATTGGGAAACGATAAACACA
    GCCCTTGAAATTCACTATAACAACATACTACCTGGAAACGGCAAATCAAAGGCCGA
    CAAAGTTAAGAAGGCCGTAAAGAATGATTTACAGAAGAGCATAACGGAGATCAATG
    AGCTGGTGTCTAACTATAAATTGTGTAGCGATGACAACATAAAAGCCGAGACTTAC
    ATTCACGAAATTTCACACATACTTAACAACTTTGAAGCTCAGGAATTAAAGTATAAT
    CCCGAAATACACCTTGTGGAGTCCGAACTAAAGGCTAGTGAGCTTAAGAACGTCCT
    AGACGTAATTATGAATGCCTTCCACTGGTGTAGTGTTTTTATGACCGAGGAACTTGT
    TGACAAAGATAATAATTTTTATGCAGAACTAGAAGAGATATACGATGAAATATACC
    CGGTGATCAGTTTGTACAATCTTGTCAGGAACTATGTGACACAAAAGCCCTATTCAA
    CAAAGAAAATAAAACTTAATTTCGGAATTCCTACGTTAGCTGATGGCTGGTCTAAAT
    CCAAGGAATACAGCAACAACGCTATAATTCTGATGAGAGATAACTTGTACTATCTAG
    GCATCTTCAATGCCAAAAATAAGCCTGATAAGAAGATTATAGAGGGCAACACTTCA
    GAGAACAAGGGCGACTACAAGAAAATGATCTATAACCTATTGCCTGGCCCAAACAA
    GATGATTCCGAAGGTCTTCCTATCATCCAAGACCGGCGTTGAGACATACAAGCCATC
    AGCGTATATTTTAGAGGGGTACAAACAAAACAAGCACATAAAGTCTAGTAAAGACT
    TCGATATAACATTTTGTCATGACTTAATTGACTACTTTAAGAATTGCATCGCTATACA
    CCCGGAATGGAAGAATTTCGGCTTCGACTTCTCTGATACATCTACCTACGAGGACAT
    TAGCGGGTTTTACCGTGAAGTCGAATTACAAGGGTATAAGATAGATTGGACGTACAT
    CTCTGAGAAAGACATAGACTTGCTTCAGGAAAAGGGCCAGTTGTATCTATTCCAAAT
    ATACAATAAGGATTTTTCCAAGAAATCTACGGGTAATGACAATCTTCACACAATGTA
    TCTTAAGAACCTTTTCTCAGAAGAGAACCTGAAGGACATTGTCTTAAAACTAAATGG
    CGAAGCTGAGATTTTTTTCAGGAAGTCTTCAATTAAGAACCCGATAATCCACAAGAA
    GGGGAGTATTCTTGTGAATAGAACTTACGAGGCCGAAGAAAAAGACCAATTTGGTA
    ACATCCAGATAGTCAGAAAGAACATTCCAGAGAACATCTACCAAGAGCTATACAAA
    TATTTCAACGACAAGTCCGATAAGGAACTGTCCGATGAGGCAGCCAAGTTGAAGAA
    TGTCGTGGGTCATCATGAAGCTGCTACTAACATTGTCAAGGACTATCGTTATACTTA
    CGACAAGTATTTCCTACACATGCCGATAACAATTAATTTCAAGGCTAACAAAACAGG
    CTTTATCAACGATCGTATCTTGCAGTACATAGCTAAGGAAAAGGATTTGCATGTGAT
    TGGCATTGATAGAGGGGAGCGTAACTTGATATATGTGTCTGTCATAGACACGTGTGG
    CAACATCGTCGAACAGAAATCATTCAACATAGTAAACGGCTACGATTACCAAATTA
    AGCTGAAACAGCAAGAGGGTGCACGTCAAATTGCGCGTAAAGAGTGGAAAGAAATT
    GGTAAAATCAAGGAAATTAAAGAAGGCTACTTGTCTCTTGTTATACATGAAATTTCC
    AAGATGGTTATAAAGTATAACGCGATAATTGCTATGGAAGACTTATCATACGGGTTT
    AAAAAGGGGAGGTTCAAGGTAGAGAGGCAGGTCTATCAAAAGTTCGAGACGATGTT
    GATTAATAAACTAAACTATCTAGTGTTCAAAGATATCAGCATTACGGAGAACGGGG
    GGCTACTGAAAGGATATCAACTAACGTACATTCCCGATAAGTTAAAGAACGTTGGTC
    ATCAATGTGGTTGCATCTTCTACGTGCCTGCTGCCTATACGTCCAAAATAGATCCAA
    CTACTGGATTTGTTAACATCTTTAAATTCAAAGATTTAACCGTAGACGCCAAAAGGG
    AATTTATAAAAAAATTTGACAGCATCCGTTACGATAGCGAAAAGAATCTGTTCTGTT
    TTACTTTCGACTACAATAATTTCATCACGCAAAATACGGTAATGTCTAAGTCAAGTT
    GGAGCGTCTACACGTATGGAGTCAGGATCAAGAGGCGTTTCGTAAATGGAAGATTC
    TCTAATGAGTCAGATACTATAGACATCACGAAAGATATGGAGAAAACCTTGGAGAT
    GACGGATATTAACTGGCGTGATGGACACGATTTAAGACAGGACATTATTGACTATG
    AGATTGTGCAACACATCTTCGAAATATTCCGTCTAACAGTCCAAATGAGGAATAGCC
    TAAGTGAATTGGAGGACCGTGATTACGATAGGCTTATAAGTCCTGTCCTTAACGAAA
    ACAATATTTTCTATGATAGTGCTAAGGCGGGGGACGCACTGCCTAAAGACGCAGAT
    GCTAACGGGGCATACTGCATTGCGTTAAAGGGTCTGTACGAAATCAAGCAGATTAC
    GGAAAACTGGAAAGAGGATGGCAAGTTTAGCAGAGATAAGTTGAAGATAAGTAAC
    AAAGATTGGTTTGACTTTATTCAGAATAAAAGGTATTTAAAACGTCCGGCAGCGACC
    AAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCC
    CGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGG
    CTAA
    SEQ ID NO: 100
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGCACTAA
    TAATTTCCAGAATTTCATCGGCATTAGCAGCTTACAAAAGACGTTGAGGAATGCCTT
    AATACCCACAGAAACTACTCAACAATTTATAGTGAAGAATGGGATAATTAAGGAAG
    ACGAGTTGAGAGGTGAAAATAGGCAAATCTTGAAAGACATTATGGATGACTACTAC
    AGGGGCTTCATTAGTGAAACGTTGTCTTCAATAGATGACATTGATTGGACTTCTTTGT
    TTGAGAAGATGGAAATACAGTTAAAGAACGGCGACAATAAGGATACACTTATCAAA
    GAGCAAACAGAATATAGAAAAGCAATTCACAAAAAGTTTGCTAACGATGATAGGTT
    CAAGAACATGTTTAGCGCTAAACTAATATCAGACATCCTTCCCGAGTTCGTTATTCA
    TAACAATAACTATAGTGCAAGTGAAAAAGAGGAGAAGACACAGGTGATTAAGCTGT
    TCTCCAGATTCGCGACTTCTTTCAAAGATTACTTCAAAAACAGAGCCAACTGTTTTTC
    AGCTGACGATATCTCTAGTAGTAGTTGTCACCGTATAGTGAACGATAACGCTGAGAT
    CTTCTTTAGCAATGCATTAGTGTATAGAAGGATAGTTAAGTCTCTAAGCAATGATGA
    TATCAATAAAATTTCCGGAGACATGAAGGACTCCCTAAAGGAAATGTCCTTAGAAG
    AGATCTACTCATATGAGAAATACGGGGAATTTATTACGCAGGAAGGGATCTCCTTTT
    ACAATGACATATGCGGGAAGGTCAACTCTTTCATGAACTTATACTGCCAAAAGAAC
    AAGGAGAACAAGAATTTATATAAACTTCAGAAACTTCACAAACAAATACTGTGCAT
    AGCCGATACCTCATATGAGGTTCCTTACAAATTTGAATCAGATGAAGAGGTATACCA
    ATCCGTTAACGGCTTTCTTGACAATATTAGCTCAAAGCACATCGTGGAGAGGTTGAG
    AAAGATTGGTGATAATTATAATGGCTACAATCTAGATAAGATATATATTGTTAGCAA
    GTTCTACGAGTCTGTGTCCCAAAAAACATATAGGGATTGGGAGACAATTAATACTGC
    TCTAGAAATCCATTACAACAACATCCTTCCTGGAAATGGCAAGAGTAAGGCCGACA
    AAGTCAAGAAAGCAGTGAAAAATGATCTGCAAAAATCAATTACTGAGATAAACGAG
    CTAGTATCTAATTACAAGCTTTGTAGCGACGATAACATTAAGGCAGAAACGTACATA
    CACGAGATTAGTCACATCTTAAATAATTTTGAAGCCCAAGAACTGAAATATAACCCT
    GAGATACACCTTGTTGAATCCGAGTTAAAGGCGTCTGAACTAAAAAACGTGTTAGA
    CGTTATTATGAATGCCTTCCACTGGTGTAGCGTCTTTATGACTGAGGAGTTGGTTGAT
    AAGGATAATAACTTTTACGCTGAATTGGAAGAAATTTATGACGAAATCTATCCTGTT
    ATTTCTCTATATAATTTGGTGAGAAATTACGTAACGCAAAAGCCCTATAGTACGAAA
    AAAATAAAACTAAATTTCGGGATCCCTACCCTAGCCGACGGTTGGTCTAAATCCAAG
    GAGTACTCAAACAATGCAATAATATTGATGAGGGACAACCTGTACTACCTAGGCAT
    ATTTAATGCCAAAAATAAGCCCGATAAAAAGATTATAGAAGGGAACACGTCAGAAA
    ATAAAGGAGACTATAAGAAAATGATCTACAACCTTTTGCCCGGCCCCAATAAAATG
    ATCCCGAAGGTCTTCCTAAGTAGCAAGACTGGCGTAGAGACCTACAAACCATCTGC
    ATACATTTTGGAGGGGTACAAGCAAAACAAGCACATAAAGAGTAGTAAGGATTTTG
    ACATTACATTCTGCCATGACTTAATTGACTACTTTAAAAATTGCATCGCAATTCACCC
    TGAATGGAAAAATTTTGGATTTGATTTCTCTGATACTTCAACATATGAGGATATTTCA
    GGGTTCTACAGGGAGGTCGAACTACAGGGTTACAAAATAGACTGGACGTATATTTCT
    GAGAAAGATATAGATTTGCTTCAGGAAAAGGGTCAGCTATATCTGTTCCAGATATAT
    AATAAGGACTTCTCCAAAAAGAGTACCGGAAATGATAATCTGCACACAATGTACTT
    AAAAAACTTGTTCTCTGAGGAGAATCTAAAAGACATCGTACTAAAACTTAACGGGG
    AGGCCGAAATTTTTTTTAGGAAGTCCAGCATCAAGAACCCGATTATTCATAAAAAAG
    GTAGCATTTTGGTGAACCGTACTTATGAGGCGGAAGAAAAAGACCAATTCGGTAAT
    ATTCAAATCGTTAGAAAGAACATCCCTGAGAACATTTATCAGGAACTATACAAATAC
    TTTAACGACAAATCAGATAAGGAGCTTTCTGATGAGGCAGCTAAATTGAAAAATGT
    AGTGGGACATCACGAAGCAGCCACTAACATAGTGAAGGACTACAGATACACATACG
    ATAAGTACTTCCTGCACATGCCTATTACAATTAACTTTAAAGCAAATAAAACAGGGT
    TTATTAACGACAGAATCTTACAGTATATTGCCAAAGAAAAGGATCTGCATGTGATAG
    GAATAGACAGAGGAGAAAGAAACCTGATATACGTCTCCGTGATTGATACATGTGGG
    AACATAGTAGAACAGAAGTCCTTTAACATTGTTAATGGGTACGATTATCAAATTAAA
    TTAAAACAACAAGAAGGAGCACGTCAAATAGCTAGGAAAGAATGGAAAGAGATAG
    GAAAAATTAAGGAAATTAAGGAGGGTTACCTGTCCCTTGTAATTCATGAAATATCCA
    AAATGGTAATTAAATATAACGCGATCATCGCGATGGAAGATCTAAGCTACGGGTTC
    AAAAAAGGCAGGTTTAAGGTGGAGAGGCAAGTTTACCAAAAGTTCGAGACAATGTT
    GATTAATAAGTTAAACTACTTAGTTTTCAAAGATATCTCCATAACCGAGAATGGCGG
    GCTTTTAAAAGGGTACCAACTAACATATATCCCGGATAAATTGAAGAACGTTGGAC
    ACCAGTGTGGCTGCATATTTTATGTACCCGCTGCGTATACTTCTAAAATTGACCCGA
    CCACCGGGTTTGTAAACATATTCAAGTTTAAGGACCTAACAGTTGACGCCAAACGTG
    AGTTCATCAAGAAGTTCGATAGTATAAGGTATGACTCTGAGAAGAACCTTTTCTGCT
    TCACGTTTGACTATAATAATTTCATCACCCAAAATACAGTTATGTCAAAAAGCTCTT
    GGTCAGTATATACGTATGGCGTAAGGATTAAGCGTAGGTTCGTGAACGGTAGATTTT
    CCAACGAGTCAGATACTATTGATATTACCAAGGATATGGAGAAGACATTAGAAATG
    ACAGATATAAATTGGAGGGATGGGCACGATCTAAGGCAAGATATCATTGATTACGA
    AATTGTTCAGCACATATTCGAGATATTCCGTCTTACAGTACAAATGCGTAACAGCTT
    GTCTGAGTTGGAAGATCGTGACTATGACAGGTTGATATCACCGGTCTTGAACGAGAA
    CAATATATTCTACGACAGCGCTAAGGCGGGAGACGCTCTGCCTAAAGACGCAGATG
    CCAATGGGGCGTACTGCATTGCCTTAAAAGGCTTATACGAGATTAAACAGATCACA
    GAGAACTGGAAAGAGGACGGCAAGTTTTCTAGAGATAAATTGAAAATCTCAAACAA
    AGACTGGTTCGATTTCATCCAAAACAAAAGATACCTTAAACGTCCGGCAGCGACCA
    AAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCC
    GAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCT
    AA
    SEQ ID NO: 101
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGAACTA
    ACAACTTCCAGAACTTTATCGGCATCTCTTCCCTCCAAAAGACACTGAGAAATGCAC
    TGATCCCAACCGAAACGACTCAACAATTTATTGTTAAGAACGGCATCATAAAAGAA
    GACGAGCTTCGCGGCGAGAACCGCCAGATACTTAAGGATATTATGGACGATTATTA
    CCGAGGCTTTATCAGCGAAACTCTTAGCTCTATTGATGATATCGACTGGACCTCCCT
    CTTCGAAAAAATGGAGATACAGCTCAAGAACGGCGATAATAAAGACACCTTGATAA
    AGGAACAGACTGAGTACAGGAAAGCGATCCACAAGAAATTCGCGAACGACGACAG
    GTTTAAAAACATGTTCTCTGCAAAATTGATATCCGACATCTTGCCGGAATTTGTGAT
    ACACAACAATAACTATAGCGCTTCAGAGAAAGAAGAGAAGACCCAAGTAATCAAGT
    TGTTCAGCCGCTTCGCAACGTCTTTTAAAGATTACTTTAAGAACCGGGCCAATTGTTT
    CTCCGCGGATGATATTAGCTCATCAAGTTGCCATCGAATTGTCAATGATAATGCGGA
    GATCTTCTTCAGCAATGCGCTGGTCTACAGACGAATCGTAAAAAGTCTTTCAAATGA
    CGACATCAATAAGATTAGTGGAGATATGAAGGATTCCCTTAAGGAAATGAGTCTTG
    AAGAAATATACTCATACGAAAAGTACGGGGAATTTATTACCCAGGAGGGGATCTCC
    TTCTATAACGACATCTGTGGAAAAGTAAACTCATTCATGAACCTGTACTGTCAGAAA
    AACAAAGAAAACAAAAATCTGTATAAACTCCAAAAATTGCACAAGCAAATATTGTG
    TATAGCGGACACATCATACGAGGTTCCATATAAGTTCGAAAGTGATGAAGAAGTCT
    ACCAATCAGTGAATGGGTTTCTGGACAACATTAGTTCCAAGCACATAGTTGAACGAC
    TGCGAAAGATTGGTGACAATTACAACGGCTATAATTTGGACAAGATTTATATAGTTA
    GCAAATTTTATGAATCCGTATCACAAAAGACTTATAGAGACTGGGAAACAATCAAC
    ACGGCACTTGAGATCCATTATAACAATATTCTTCCAGGGAACGGCAAAAGCAAGGC
    TGATAAGGTAAAAAAGGCCGTTAAGAATGATCTTCAAAAATCCATAACGGAGATCA
    ACGAACTTGTAAGTAACTACAAATTGTGCTCTGACGACAATATAAAGGCTGAAACG
    TATATTCACGAGATTAGCCATATCCTGAATAACTTTGAGGCCCAAGAACTCAAGTAT
    AACCCGGAAATACATTTGGTAGAAAGCGAGCTTAAAGCGAGTGAGCTGAAAAACGT
    CCTCGATGTGATCATGAATGCTTTCCACTGGTGTAGTGTCTTTATGACTGAGGAGTTG
    GTTGATAAAGACAATAATTTCTACGCTGAACTGGAAGAAATTTACGACGAAATCTAT
    CCAGTGATCTCCCTCTATAACCTCGTTCGAAACTACGTGACGCAGAAACCTTATTCT
    ACAAAGAAAATTAAGTTGAACTTCGGCATTCCTACACTTGCTGACGGATGGTCCAAA
    TCCAAAGAGTACTCAAACAACGCAATCATCCTCATGCGGGATAACCTTTATTATTTG
    GGCATTTTCAACGCCAAAAACAAACCTGATAAAAAGATAATTGAAGGCAATACGAG
    TGAGAACAAGGGCGACTACAAAAAAATGATATATAACTTGTTGCCAGGCCCCAACA
    AGATGATTCCTAAAGTTTTTCTGTCTTCTAAGACTGGAGTTGAAACTTACAAACCCTC
    CGCCTACATTCTTGAAGGGTATAAACAGAATAAGCACATAAAGTCCTCAAAGGATTT
    CGACATTACGTTTTGCCATGACCTCATCGACTATTTCAAGAACTGTATCGCCATACAT
    CCGGAGTGGAAGAATTTTGGATTTGATTTCTCCGACACATCTACCTATGAAGACATA
    AGCGGTTTCTACCGGGAGGTCGAGCTTCAGGGCTATAAGATAGATTGGACATACATT
    AGTGAAAAAGATATCGATCTTCTGCAAGAAAAGGGACAACTTTACCTTTTTCAGATT
    TATAATAAAGACTTTTCAAAAAAGTCCACAGGGAACGATAATCTGCACACCATGTAT
    CTCAAGAATCTGTTTAGTGAAGAAAACCTTAAAGACATAGTTTTGAAGCTTAACGGA
    GAGGCTGAGATTTTTTTTAGAAAGTCCTCAATTAAAAACCCTATAATACACAAGAAA
    GGCTCTATTCTTGTTAACAGGACATATGAAGCCGAGGAGAAAGATCAGTTTGGCAAT
    ATCCAGATTGTTCGCAAGAATATCCCGGAAAATATATATCAGGAGCTGTATAAATAC
    TTTAACGACAAGAGCGACAAGGAGCTGAGTGACGAGGCCGCGAAGCTTAAGAATGT
    AGTAGGTCACCACGAAGCAGCCACCAATATCGTCAAAGACTATAGGTACACGTACG
    ACAAGTACTTTTTGCACATGCCTATAACTATAAACTTCAAAGCTAATAAAACTGGGT
    TTATTAATGACAGGATTCTCCAATACATCGCTAAAGAGAAGGATCTGCATGTAATTG
    GCATAGACAGAGGTGAGAGAAACTTGATATATGTCAGCGTAATAGACACATGTGGC
    AATATCGTGGAACAGAAGTCTTTTAACATCGTCAATGGTTACGACTACCAAATTAAG
    TTGAAACAGCAGGAAGGCGCACGACAGATCGCACGAAAGGAATGGAAAGAGATAG
    GCAAAATAAAAGAAATAAAGGAGGGCTATCTCAGTCTCGTTATACACGAAATTTCA
    AAAATGGTTATTAAGTACAATGCAATCATAGCGATGGAGGATCTCAGTTATGGGTTC
    AAAAAGGGTCGGTTTAAAGTTGAGCGCCAAGTGTACCAAAAGTTCGAGACAATGCT
    GATTAACAAGCTGAACTACCTCGTCTTCAAAGATATAAGTATTACGGAGAACGGTG
    GCCTTCTTAAAGGCTATCAACTTACTTACATCCCGGACAAGCTCAAAAACGTAGGGC
    ACCAATGCGGGTGTATTTTCTATGTGCCTGCGGCATATACGTCAAAGATTGACCCAA
    CCACAGGATTCGTAAACATATTCAAGTTTAAGGACCTCACCGTTGATGCGAAAAGG
    GAGTTCATTAAAAAATTTGATTCTATTCGATATGATAGTGAGAAAAATCTCTTTTGTT
    TCACATTTGACTATAATAATTTTATTACTCAGAATACTGTCATGAGCAAGTCATCTTG
    GTCAGTGTACACATACGGGGTGCGGATCAAACGCAGGTTCGTCAATGGTCGCTTCTC
    AAACGAATCAGACACCATTGACATCACAAAGGACATGGAAAAAACCCTTGAGATGA
    CCGACATTAATTGGCGCGATGGTCATGATCTGCGGCAAGACATCATAGACTACGAA
    ATCGTCCAACACATCTTTGAGATCTTTCGCTTGACGGTCCAAATGCGGAACTCCCTG
    TCCGAGCTCGAGGATAGAGATTATGATCGGCTGATATCTCCCGTGCTTAATGAAAAT
    AACATCTTCTACGACTCCGCCAAGGCGGGTGATGCCCTGCCGAAGGATGCGGATGCT
    AATGGCGCTTATTGCATTGCTCTTAAGGGGCTCTATGAGATAAAGCAGATCACGGAA
    AACTGGAAAGAAGACGGTAAGTTTAGTAGAGACAAGCTGAAGATCTCAAATAAAGA
    CTGGTTTGATTTCATACAGAACAAGCGGTACCTGAAACGTCCGGCAGCGACCAAAA
    AAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAA
    AAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 102
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAACAATGGCACTAA
    CAATTTTCAGAATTTCATCGGCATTTCAAGTCTGCAAAAAACTCTGAGGAATGCTTT
    GATCCCTACTGAAACCACTCAGCAATTTATAGTCAAGAACGGTATAATTAAAGAAG
    ATGAACTCAGGGGTGAAAATAGACAAATACTCAAGGACATTATGGATGACTATTAT
    AGAGGCTTCATCTCAGAGACTCTCTCATCAATAGATGATATCGATTGGACTAGCCTT
    TTCGAGAAAATGGAGATTCAGTTGAAAAATGGTGATAACAAAGATACGTTGATAAA
    GGAACAGACCGAGTACAGGAAAGCCATTCATAAGAAATTTGCTAATGACGATAGAT
    TTAAGAATATGTTTAGTGCAAAACTGATTAGTGACATTCTGCCGGAGTTCGTTATCC
    ATAATAATAACTACTCTGCATCCGAAAAGGAGGAAAAGACGCAAGTTATTAAACTG
    TTCAGCCGCTTCGCCACAAGCTTCAAGGACTACTTCAAAAATAGAGCCAACTGCTTT
    TCTGCCGACGATATATCATCATCTTCATGCCATCGGATCGTTAACGATAACGCCGAG
    ATATTCTTCAGCAACGCCCTTGTATATCGAAGAATAGTCAAAAGTCTGAGTAATGAT
    GATATTAATAAAATTAGCGGTGATATGAAAGACTCCCTGAAGGAAATGTCACTGGA
    GGAAATTTATAGTTACGAAAAGTACGGCGAATTCATTACTCAAGAAGGCATATCCTT
    CTATAACGACATTTGCGGAAAGGTCAACTCATTCATGAACCTTTATTGCCAGAAGAA
    TAAGGAGAATAAAAATCTTTACAAATTGCAAAAACTTCACAAACAAATTCTTTGCAT
    CGCGGATACGTCCTACGAAGTTCCTTACAAATTTGAATCCGATGAGGAAGTGTATCA
    GAGTGTCAATGGATTTTTGGATAATATCTCTTCAAAACATATTGTGGAGAGATTGCG
    CAAAATAGGTGATAACTACAATGGCTACAACCTGGACAAGATTTATATTGTTAGCAA
    GTTCTATGAAAGTGTCAGTCAAAAGACCTACAGAGATTGGGAGACAATCAACACGG
    CGCTCGAAATACACTACAATAACATCCTCCCCGGCAATGGGAAGAGTAAAGCCGAT
    AAGGTTAAAAAAGCTGTTAAGAACGACCTCCAGAAATCCATCACGGAAATAAACGA
    GCTGGTTTCCAACTATAAGCTGTGTAGCGATGATAATATTAAGGCTGAGACATATAT
    ACATGAGATCAGCCACATTCTCAACAATTTCGAGGCACAGGAACTCAAATACAATC
    CCGAGATTCACTTGGTGGAAAGTGAGTTGAAGGCGTCAGAGCTTAAGAATGTACTT
    GACGTAATAATGAATGCTTTTCATTGGTGCTCCGTGTTCATGACTGAGGAACTCGTG
    GATAAGGATAATAACTTTTATGCGGAGTTGGAAGAGATATACGATGAAATATACCC
    GGTTATCTCACTGTATAATCTGGTCAGAAATTACGTGACCCAAAAGCCTTATAGTAC
    AAAAAAAATAAAGTTGAACTTCGGTATTCCGACATTGGCAGATGGTTGGTCCAAAA
    GCAAAGAATACTCTAATAACGCCATTATATTGATGCGAGACAATTTGTATTACCTTG
    GGATCTTTAACGCGAAAAACAAACCGGATAAGAAGATCATCGAAGGTAATACATCT
    GAGAATAAGGGGGATTACAAGAAGATGATTTATAATCTGTTGCCGGGGCCAAACAA
    GATGATTCCGAAGGTCTTTCTGTCATCTAAGACAGGAGTAGAGACCTACAAACCTTC
    TGCGTACATTTTGGAAGGCTACAAACAGAACAAGCATATAAAATCTAGCAAGGACT
    TTGATATCACGTTTTGTCATGATCTGATAGATTATTTCAAAAACTGCATCGCTATACA
    TCCTGAGTGGAAGAATTTCGGCTTTGACTTTTCTGACACCAGCACATACGAAGACAT
    CTCAGGTTTCTACCGGGAAGTCGAGCTCCAGGGGTACAAGATTGACTGGACATATAT
    AAGTGAAAAAGACATCGACCTCCTCCAAGAGAAGGGCCAACTTTACCTGTTCCAGA
    TCTATAACAAAGACTTTTCTAAAAAGTCCACGGGTAACGACAACTTGCACACTATGT
    ATCTGAAAAACTTGTTCTCTGAAGAGAACCTCAAGGACATCGTCCTGAAGCTTAACG
    GGGAGGCGGAGATCTTCTTTAGAAAGTCCTCTATCAAAAATCCCATTATCCATAAAA
    AGGGCTCTATACTCGTTAATAGGACATATGAAGCGGAGGAAAAAGATCAATTTGGG
    AACATCCAGATCGTCCGGAAAAATATACCTGAGAATATCTATCAAGAGCTGTACAA
    GTATTTTAATGATAAGTCAGACAAAGAGCTCAGTGATGAGGCGGCAAAGCTCAAGA
    ACGTGGTGGGGCATCATGAAGCTGCGACGAACATTGTCAAAGATTATAGATACACT
    TACGATAAATACTTCCTCCACATGCCGATAACGATTAACTTCAAAGCCAATAAGACG
    GGGTTTATAAATGATCGGATCCTTCAGTACATTGCGAAAGAGAAAGACCTCCATGTG
    ATCGGAATTGACCGAGGAGAAAGGAATCTGATTTACGTGTCCGTGATTGATACTTGC
    GGGAATATAGTCGAGCAAAAGAGTTTCAACATAGTCAACGGGTATGACTATCAGAT
    AAAGCTCAAACAGCAGGAAGGTGCGAGGCAAATTGCGCGCAAAGAGTGGAAGGAG
    ATAGGCAAGATTAAAGAAATCAAGGAAGGTTATCTCAGCTTGGTGATCCATGAAAT
    ATCTAAGATGGTTATAAAGTACAATGCCATAATAGCCATGGAGGATCTTTCCTACGG
    GTTTAAGAAGGGCCGATTTAAAGTGGAGCGACAAGTTTACCAGAAGTTCGAAACCA
    TGTTGATTAACAAACTTAACTATTTGGTGTTCAAGGATATAAGTATAACCGAAAACG
    GCGGTTTGCTTAAGGGTTATCAGCTCACGTATATTCCTGATAAACTTAAAAACGTTG
    GACACCAGTGTGGATGTATCTTCTACGTGCCAGCCGCTTACACTAGTAAGATAGATC
    CTACCACGGGGTTTGTGAATATTTTTAAGTTTAAAGACTTGACAGTCGACGCCAAAA
    GGGAATTTATAAAAAAGTTTGATTCTATCCGCTACGATAGTGAAAAAAATCTCTTTT
    GCTTTACTTTCGACTATAACAACTTCATTACGCAGAACACTGTCATGAGTAAGTCCA
    GCTGGAGCGTCTACACATATGGCGTCCGAATTAAACGACGATTTGTAAACGGGCGG
    TTTTCAAACGAATCTGACACGATAGACATTACCAAGGATATGGAGAAGACACTTGA
    GATGACCGACATAAACTGGCGGGACGGTCACGATCTTCGGCAGGACATAATTGATT
    ACGAAATCGTCCAGCATATATTCGAAATATTTCGACTTACAGTGCAAATGCGGAACA
    GTCTCTCTGAACTGGAAGATCGCGATTATGACCGGTTGATTTCTCCGGTCCTCAATG
    AAAATAACATATTTTATGATAGTGCTAAGGCAGGTGATGCGTTGCCAAAGGATGCA
    GACGCTAATGGTGCCTATTGTATCGCGCTCAAGGGATTGTACGAGATAAAGCAAATT
    ACGGAGAACTGGAAGGAGGATGGTAAGTTTAGCCGAGACAAGTTGAAGATTAGCAA
    TAAAGACTGGTTTGATTTTATCCAAAACAAGAGGTACCTGAAACGTCCGGCAGCGA
    CCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAG
    CCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGG
    GCTAA
    SEQ ID NO: 103
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAACGGAACTA
    ATAACTTTCAAAATTTCATAGGTATTTCAAGCTTGCAGAAGACCCTGAGGAATGCCC
    TGATTCCAACCGAGACAACGCAGCAGTTCATAGTCAAAAATGGCATTATTAAGGAA
    GATGAGCTGCGGGGGGAAAACCGACAGATACTCAAGGATATTATGGACGACTATTA
    CCGGGGATTTATCTCAGAAACGCTGAGCAGTATTGATGACATCGATTGGACCAGTCT
    TTTCGAGAAAATGGAAATTCAACTTAAGAATGGTGACAATAAAGACACTCTCATAA
    AGGAGCAAACTGAATACCGAAAAGCCATACACAAAAAGTTTGCCAACGATGACCGC
    TTTAAAAACATGTTTTCAGCTAAGCTCATTAGCGACATTCTCCCCGAGTTTGTGATTC
    ATAACAATAACTATAGCGCATCCGAGAAGGAGGAAAAAACCCAAGTTATCAAATTG
    TTCAGTAGATTCGCTACGAGCTTTAAAGATTACTTTAAAAACCGGGCTAACTGCTTC
    AGTGCAGACGATATCAGCTCCTCATCCTGTCATCGCATCGTCAATGATAATGCTGAG
    ATCTTCTTTTCTAATGCACTGGTTTACCGCAGGATAGTTAAGTCTCTTAGTAACGACG
    ACATCAACAAGATATCAGGAGATATGAAGGATTCCCTTAAAGAAATGAGTCTCGAG
    GAGATATATTCTTATGAAAAATACGGCGAATTTATTACCCAAGAGGGCATTAGTTTC
    TATAATGACATATGCGGAAAAGTTAATAGTTTTATGAATCTCTATTGTCAGAAGAAT
    AAGGAGAATAAGAACCTCTACAAATTGCAGAAGTTGCACAAGCAAATTCTGTGTAT
    CGCGGACACCTCTTACGAGGTCCCATATAAGTTCGAGAGTGATGAAGAAGTATACC
    AGAGCGTTAATGGGTTCCTGGACAACATCTCAAGTAAACACATAGTCGAAAGGCTC
    CGAAAGATCGGTGATAACTATAACGGATATAATTTGGATAAAATTTATATAGTTAGC
    AAATTTTACGAGAGCGTCAGTCAGAAGACCTACCGGGACTGGGAGACCATAAACAC
    AGCGCTGGAAATACATTATAACAACATACTGCCTGGGAACGGTAAGTCAAAGGCAG
    ACAAGGTTAAAAAGGCTGTGAAGAATGACCTGCAAAAATCAATTACAGAAATAAAT
    GAGTTGGTAAGTAATTACAAACTTTGCAGCGATGATAATATAAAGGCAGAGACGTA
    CATACATGAAATATCTCATATCCTCAACAATTTCGAAGCCCAAGAACTGAAGTACAA
    CCCGGAAATTCATCTTGTAGAGTCTGAGTTGAAGGCCTCCGAATTGAAAAACGTTCT
    TGACGTAATTATGAATGCCTTCCACTGGTGCTCAGTATTCATGACGGAAGAGCTCGT
    GGATAAAGACAACAATTTTTACGCTGAACTGGAAGAAATATATGACGAGATTTACC
    CCGTAATTTCACTCTACAACTTGGTACGAAATTACGTTACCCAAAAGCCATACTCAA
    CAAAAAAAATTAAACTGAACTTCGGGATACCCACCCTCGCAGATGGATGGTCAAAG
    TCCAAAGAGTACAGTAACAATGCAATTATCCTGATGCGAGACAACCTTTATTACCTC
    GGGATTTTCAACGCTAAAAATAAACCTGATAAAAAAATAATTGAGGGTAATACCTC
    TGAAAACAAGGGGGATTATAAAAAGATGATATACAATCTGCTGCCTGGCCCGAACA
    AAATGATTCCTAAAGTCTTCTTGTCTTCCAAGACTGGAGTCGAAACCTACAAGCCAA
    GTGCTTATATACTCGAAGGGTACAAACAAAATAAGCACATAAAATCCAGCAAGGAT
    TTTGATATTACATTCTGCCACGATTTGATTGATTATTTTAAGAACTGTATAGCCATCC
    ACCCAGAATGGAAGAATTTTGGTTTTGATTTTAGCGATACCTCAACATATGAGGATA
    TCTCTGGCTTTTACCGCGAGGTAGAACTGCAAGGTTATAAGATCGATTGGACTTATA
    TTTCTGAAAAGGACATAGATCTCCTGCAAGAGAAAGGGCAACTTTATTTGTTTCAAA
    TATACAACAAAGATTTTAGTAAGAAGAGTACTGGCAATGATAACCTTCACACTATGT
    ATCTGAAGAACCTTTTTTCTGAGGAGAACTTGAAGGACATAGTCCTTAAACTCAATG
    GGGAAGCTGAAATATTCTTTCGCAAAAGCTCCATTAAAAACCCGATCATTCATAAAA
    AGGGTTCCATCTTGGTAAACCGCACATACGAGGCGGAAGAAAAAGATCAGTTCGGA
    AATATCCAGATCGTAAGGAAGAATATCCCCGAAAATATATACCAAGAGCTTTACAA
    ATATTTTAACGATAAGTCAGACAAGGAACTGTCAGACGAAGCAGCCAAGTTGAAGA
    ATGTCGTAGGGCACCACGAAGCAGCTACAAACATAGTTAAAGATTATCGGTACACC
    TACGATAAATATTTCCTGCATATGCCAATAACCATAAACTTCAAAGCCAACAAAACA
    GGGTTCATCAATGACCGAATACTTCAGTATATAGCCAAGGAAAAAGACCTGCATGTT
    ATAGGAATAGATAGAGGTGAGCGCAACTTGATATATGTCAGCGTGATAGACACCTG
    CGGAAATATCGTCGAGCAAAAAAGTTTCAACATTGTTAATGGCTACGATTACCAAAT
    TAAATTGAAGCAGCAAGAGGGGGCTCGGCAAATCGCGCGAAAGGAATGGAAAGAA
    ATCGGGAAGATTAAAGAAATTAAAGAGGGCTACCTGTCTCTTGTAATTCACGAAAT
    ATCTAAGATGGTCATCAAGTATAATGCCATTATTGCGATGGAAGATCTGTCCTACGG
    ATTTAAGAAAGGCAGGTTTAAAGTCGAAAGGCAGGTGTACCAGAAATTCGAGACCA
    TGCTGATTAATAAGCTCAACTATCTCGTATTTAAGGATATTTCTATAACTGAAAATG
    GAGGGCTTCTCAAAGGATATCAACTCACATACATACCTGATAAGCTGAAGAACGTA
    GGCCACCAGTGTGGATGCATATTCTATGTACCAGCTGCATACACAAGCAAGATCGAT
    CCAACTACTGGGTTTGTCAATATCTTCAAATTTAAGGACTTGACGGTCGATGCCAAA
    CGGGAGTTCATCAAAAAGTTTGATAGTATTCGATATGATAGTGAGAAGAACTTGTTT
    TGCTTCACATTTGACTACAACAATTTCATAACGCAAAATACGGTTATGTCTAAATCC
    TCATGGAGCGTCTACACTTACGGAGTGAGGATAAAGCGGCGCTTCGTAAATGGCAG
    GTTTAGCAATGAATCCGACACGATTGACATAACCAAGGATATGGAGAAAACCCTCG
    AGATGACCGATATAAATTGGCGGGATGGACACGATCTGCGACAAGACATAATCGAT
    TATGAAATCGTGCAGCACATATTTGAGATATTCAGGCTTACGGTCCAAATGAGAAAT
    TCCCTTTCCGAACTTGAAGACCGCGATTACGACCGACTGATAAGCCCCGTTCTGAAC
    GAAAATAACATCTTCTACGACAGCGCTAAAGCGGGAGACGCGCTGCCGAAAGATGC
    GGACGCAAATGGAGCCTATTGTATCGCCTTGAAAGGGTTGTACGAGATCAAACAGA
    TAACCGAGAATTGGAAGGAGGATGGGAAGTTTAGTCGAGACAAACTTAAAATAAGC
    AACAAGGACTGGTTCGACTTTATTCAAAACAAACGATATCTCAAACGTCCGGCAGC
    GACCAAAAAAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGC
    AGCCCGAAAAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCC
    GGGCTAA
    SEQ ID NO: 104
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGTACTAA
    CAATTTTCAAAACTTTATCGGCATCTCTTCACTTCAGAAAACTCTTCGGAACGCCCTT
    ATACCGACGGAGACAACGCAGCAGTTTATAGTTAAAAACGGGATCATTAAAGAAGA
    TGAACTCAGAGGGGAAAACAGGCAAATATTGAAGGACATTATGGACGATTACTACC
    GGGGGTTTATTTCAGAGACCCTTTCATCTATTGATGACATAGATTGGACCTCCCTTTT
    CGAGAAAATGGAGATACAATTGAAAAACGGCGACAATAAAGATACACTTATCAAGG
    AACAAACTGAGTATCGCAAGGCGATTCACAAGAAGTTTGCGAATGACGATCGCTTT
    AAGAATATGTTTTCTGCGAAGCTCATAAGTGACATTCTGCCTGAATTTGTCATTCATA
    ACAACAATTATTCTGCTAGCGAAAAAGAGGAAAAAACTCAAGTCATTAAGCTTTTTA
    GCAGGTTCGCTACTAGTTTTAAAGACTATTTTAAGAACCGGGCGAATTGCTTTAGCG
    CTGACGACATATCATCCTCATCCTGTCATCGCATAGTCAATGATAATGCAGAAATAT
    TCTTTTCTAATGCGCTCGTGTATCGGAGAATAGTGAAAAGCCTCTCTAACGATGACA
    TTAACAAAATAAGCGGCGATATGAAGGATAGTCTGAAGGAAATGTCCCTCGAAGAA
    ATATACTCATACGAGAAGTACGGAGAATTTATCACCCAGGAAGGAATTAGTTTTTAC
    AACGACATCTGTGGTAAGGTTAACTCTTTTATGAATCTGTATTGTCAAAAGAATAAA
    GAAAATAAAAATCTTTATAAGCTCCAAAAGCTTCACAAACAAATCTTGTGCATTGCG
    GATACGTCATACGAAGTACCTTACAAATTTGAAAGCGACGAAGAGGTGTATCAGTC
    AGTGAATGGGTTCCTTGACAATATTTCTAGCAAACATATTGTGGAGCGACTTCGAAA
    GATCGGTGATAATTACAATGGCTATAATTTGGATAAAATTTACATAGTTAGTAAGTT
    TTATGAATCCGTCTCACAAAAGACGTACCGAGATTGGGAGACCATCAACACTGCTCT
    GGAGATTCATTACAATAATATATTGCCTGGGAATGGGAAGTCAAAGGCCGACAAGG
    TTAAAAAAGCCGTAAAAAACGATCTTCAAAAGTCCATTACCGAGATAAATGAACTT
    GTATCCAACTATAAGTTGTGCTCTGACGATAATATTAAAGCAGAAACGTATATCCAC
    GAAATAAGTCACATCCTGAACAACTTCGAAGCTCAAGAGCTCAAGTATAATCCTGA
    AATTCATCTCGTCGAAAGCGAGCTGAAAGCATCCGAGTTGAAGAATGTGCTTGATGT
    GATCATGAACGCATTCCATTGGTGCAGTGTGTTCATGACCGAAGAACTTGTAGACAA
    AGACAACAACTTCTACGCTGAATTGGAAGAGATTTACGATGAAATTTACCCCGTGAT
    ATCCCTCTATAATCTGGTAAGAAATTACGTCACGCAAAAACCATACAGTACCAAGA
    AAATAAAGCTCAACTTTGGTATTCCGACGTTGGCAGATGGGTGGAGTAAGAGCAAG
    GAGTATTCTAACAATGCAATCATCCTCATGCGCGACAATTTGTATTATCTGGGGATC
    TTCAACGCGAAAAATAAGCCCGACAAAAAGATAATAGAAGGCAATACGTCCGAGA
    ACAAAGGGGACTATAAGAAAATGATTTATAACCTTCTTCCAGGACCCAACAAGATG
    ATCCCAAAGGTTTTCTTGAGTTCAAAAACCGGCGTAGAAACTTATAAACCGTCCGCC
    TACATTCTGGAAGGGTACAAGCAAAACAAGCACATTAAGTCATCTAAGGATTTCGA
    CATTACTTTTTGTCATGATTTGATAGACTACTTCAAAAATTGTATAGCGATACATCCG
    GAATGGAAAAATTTTGGGTTCGATTTTTCCGACACAAGTACTTATGAAGACATCTCA
    GGGTTTTATAGGGAAGTTGAACTGCAAGGTTACAAAATAGACTGGACTTATATTAGT
    GAGAAGGACATTGATTTGCTCCAGGAAAAGGGTCAATTGTATCTGTTCCAGATATAT
    AACAAGGATTTCTCTAAAAAATCTACAGGTAACGACAATCTCCACACGATGTACCTC
    AAGAATCTCTTCAGCGAAGAGAATTTGAAGGATATCGTACTTAAGCTCAATGGAGA
    AGCGGAAATATTCTTCAGAAAGTCCAGCATTAAGAATCCTATAATTCACAAGAAAG
    GGTCAATTCTCGTAAACCGGACTTATGAGGCCGAAGAAAAAGATCAGTTTGGTAAC
    ATTCAGATTGTACGGAAAAACATTCCCGAGAACATCTATCAAGAACTGTATAAATAC
    TTTAATGATAAATCCGACAAGGAACTTTCTGACGAGGCTGCAAAATTGAAGAACGT
    AGTGGGACACCATGAGGCCGCAACCAATATAGTAAAGGATTACAGATACACTTATG
    ATAAGTATTTCCTCCATATGCCGATCACGATTAATTTCAAGGCGAATAAAACCGGCT
    TCATTAACGATCGCATTTTGCAATATATTGCGAAGGAAAAGGATTTGCACGTGATAG
    GTATAGACCGGGGTGAACGAAACTTGATTTACGTCTCTGTGATCGACACATGCGGAA
    ATATAGTTGAACAGAAGTCCTTTAATATTGTGAATGGTTACGACTACCAGATAAAAT
    TGAAGCAACAGGAGGGCGCAAGACAGATAGCTCGCAAAGAGTGGAAGGAAATCGG
    CAAGATCAAAGAAATAAAGGAGGGTTATCTTTCCCTGGTAATTCATGAAATTAGCA
    AGATGGTTATTAAGTATAATGCTATAATAGCTATGGAGGACCTTTCCTATGGGTTCA
    AGAAAGGTCGCTTCAAAGTGGAGCGACAAGTGTATCAAAAGTTCGAGACTATGTTG
    ATAAATAAATTGAATTATTTGGTTTTTAAAGACATTTCAATAACTGAGAACGGGGGT
    CTCTTGAAGGGGTACCAATTGACTTATATTCCGGACAAGTTGAAGAATGTCGGACAC
    CAGTGTGGTTGCATTTTCTACGTGCCTGCCGCTTACACCTCAAAAATCGATCCGACC
    ACTGGTTTTGTAAATATATTTAAATTCAAAGATCTCACCGTTGATGCCAAACGGGAG
    TTTATCAAAAAATTCGATTCCATTCGCTACGACTCTGAGAAAAACCTTTTTTGTTTCA
    CGTTCGATTATAACAACTTTATAACCCAAAATACTGTAATGTCCAAGTCAAGTTGGT
    CTGTCTATACTTACGGAGTAAGGATCAAGCGCCGCTTCGTTAATGGGAGATTCTCAA
    ACGAGTCTGATACCATAGACATAACTAAAGACATGGAAAAAACCCTGGAAATGACG
    GACATCAATTGGCGAGACGGGCATGATCTTCGACAGGACATAATAGATTACGAAAT
    TGTTCAACACATTTTCGAGATATTTCGACTTACGGTTCAGATGAGGAATTCCCTTTCC
    GAATTGGAAGACCGGGATTATGATCGACTTATATCTCCCGTGCTCAATGAAAACAAT
    ATTTTTTATGATTCAGCGAAAGCTGGGGACGCGCTGCCAAAAGATGCCGATGCCAAT
    GGAGCATACTGTATCGCCCTGAAGGGTTTGTATGAGATTAAGCAAATTACTGAAAAC
    TGGAAGGAAGATGGCAAGTTTTCTAGAGATAAGCTTAAGATTAGCAATAAGGACTG
    GTTTGACTTCATTCAAAATAAAAGGTATCTTAAACGTCCGGCAGCGACCAAAAAAG
    CCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAAAAA
    GAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 105
    CCAGCGGCTAAAAAAAAGAAACTGGATGGCAGCGTGGATATGAATAATGGAACAA
    ATAATTTTCAAAATTTTATTGGTATCAGTTCATTGCAAAAGACTTTGAGAAATGCTTT
    GATCCCGACTGAGACCACACAGCAGTTCATCGTCAAAAATGGCATAATCAAGGAAG
    ACGAACTTAGGGGTGAGAATAGACAAATATTGAAGGACATCATGGATGACTATTAT
    AGGGGGTTCATTTCCGAAACGCTCAGTAGTATTGATGACATTGACTGGACTAGTCTT
    TTCGAGAAAATGGAAATTCAGCTTAAGAACGGGGACAATAAAGACACGCTGATCAA
    GGAGCAAACGGAATATAGGAAGGCGATCCATAAAAAATTCGCGAATGATGATCGGT
    TTAAAAACATGTTTAGTGCCAAGTTGATCAGCGACATACTGCCCGAATTCGTGATCC
    ACAACAATAATTACAGCGCCTCCGAAAAGGAGGAAAAAACTCAGGTCATTAAATTG
    TTTAGCCGATTCGCAACGAGTTTCAAAGATTATTTTAAGAACCGGGCCAACTGTTTT
    TCAGCGGATGATATTAGCTCCAGCAGCTGCCATCGCATAGTAAATGATAACGCTGAA
    ATCTTTTTTAGCAACGCACTTGTCTACCGGAGGATTGTAAAATCACTGTCAAATGAT
    GACATTAACAAAATATCTGGAGATATGAAGGACTCACTCAAAGAAATGAGCCTGGA
    AGAAATATATTCATACGAAAAATACGGGGAGTTTATTACCCAGGAAGGTATCAGTTT
    TTATAATGATATATGTGGAAAAGTTAATTCATTTATGAATCTTTACTGTCAAAAAAA
    TAAGGAGAACAAGAATTTGTACAAGCTCCAAAAACTTCATAAACAGATTCTGTGCA
    TCGCAGACACAAGTTATGAGGTACCGTACAAATTTGAGAGCGACGAAGAAGTTTAT
    CAGAGTGTGAATGGTTTCCTGGACAATATCTCTTCTAAACACATTGTTGAGAGGCTT
    AGGAAGATCGGTGATAATTATAACGGCTATAATCTGGACAAAATTTATATTGTATCA
    AAGTTTTATGAATCAGTCTCTCAAAAGACGTATCGGGATTGGGAAACAATTAACACG
    GCTCTGGAGATCCACTACAATAACATTCTGCCCGGCAACGGGAAGAGCAAAGCTGA
    TAAGGTCAAGAAGGCAGTCAAGAACGACCTTCAGAAGAGCATAACAGAAATTAACG
    AATTGGTCAGTAACTACAAACTGTGTAGTGATGACAACATAAAAGCCGAAACATAC
    ATCCATGAAATAAGCCATATCCTGAATAACTTCGAAGCCCAAGAACTTAAATACAAT
    CCCGAGATTCATCTTGTCGAATCAGAACTCAAGGCGTCCGAGCTCAAAAATGTCCTT
    GACGTGATAATGAATGCCTTCCACTGGTGCAGCGTATTCATGACGGAGGAGTTGGTA
    GATAAAGACAACAACTTTTATGCCGAATTGGAAGAGATTTATGATGAGATTTACCCC
    GTTATTTCTCTGTACAACTTGGTTCGAAACTACGTAACACAAAAACCATACTCAACC
    AAAAAGATCAAACTCAATTTTGGCATACCTACATTGGCTGATGGTTGGTCCAAGTCA
    AAGGAATATAGCAATAATGCAATAATTCTCATGCGAGATAACTTGTATTATTTGGGG
    ATCTTTAACGCTAAGAACAAACCAGATAAAAAGATAATCGAGGGGAACACAAGTGA
    GAACAAGGGTGATTACAAAAAAATGATTTACAATCTGCTTCCTGGGCCTAACAAAA
    TGATTCCGAAGGTGTTTCTTAGCTCTAAAACTGGAGTGGAGACGTATAAGCCTTCCG
    CGTACATTCTCGAAGGCTACAAGCAAAATAAGCATATCAAGTCCAGTAAGGACTTC
    GACATCACTTTTTGCCACGATCTCATCGATTACTTTAAGAACTGTATCGCAATACACC
    CCGAGTGGAAAAACTTTGGTTTTGATTTTTCAGACACTAGTACCTACGAGGACATTT
    CCGGCTTCTATCGAGAAGTCGAACTCCAGGGCTACAAAATCGATTGGACGTACATTT
    CTGAGAAGGACATCGACTTGCTCCAAGAGAAAGGTCAACTTTACCTCTTCCAAATTT
    ACAATAAAGACTTTTCAAAGAAGAGCACCGGTAATGACAACTTGCATACCATGTAT
    CTGAAGAACCTGTTTTCTGAGGAGAACCTCAAGGATATTGTATTGAAGTTGAATGGC
    GAAGCAGAAATATTTTTCCGAAAGTCATCTATCAAGAACCCCATTATACACAAAAA
    AGGCTCTATCCTGGTGAACCGGACTTACGAGGCAGAGGAGAAGGATCAATTCGGAA
    ACATACAGATAGTCCGCAAAAACATCCCTGAGAATATCTATCAGGAACTCTATAAGT
    ACTTCAATGATAAATCAGACAAGGAGCTTAGCGACGAAGCAGCTAAACTTAAAAAC
    GTGGTTGGCCATCACGAGGCCGCTACCAACATAGTCAAAGACTACCGCTATACTTAT
    GACAAGTACTTTTTGCACATGCCCATAACAATTAATTTCAAAGCTAACAAAACAGGG
    TTTATAAATGACAGAATCCTCCAATACATCGCCAAAGAGAAGGACCTCCATGTAATC
    GGGATTGATAGAGGCGAACGGAACTTGATTTACGTTAGTGTCATTGATACCTGTGGT
    AACATTGTCGAACAAAAGTCATTCAACATAGTCAATGGATATGATTATCAGATAAA
    ACTCAAGCAACAAGAAGGCGCGAGGCAGATTGCCAGGAAGGAATGGAAAGAAATC
    GGGAAGATCAAGGAGATCAAGGAGGGTTACCTGTCCTTGGTGATACACGAGATTTC
    AAAAATGGTTATAAAATACAATGCCATTATCGCGATGGAGGATTTGTCTTATGGATT
    TAAGAAGGGGAGGTTCAAAGTCGAACGACAAGTCTATCAGAAGTTTGAAACAATGC
    TCATTAACAAGCTCAATTACCTTGTTTTCAAGGATATAAGCATCACTGAAAACGGCG
    GACTCCTTAAGGGATATCAGCTGACTTATATCCCCGACAAGCTCAAGAACGTAGGGC
    ACCAATGCGGATGCATCTTTTACGTGCCTGCAGCATATACTTCAAAAATTGATCCGA
    CTACTGGCTTTGTTAACATTTTCAAGTTCAAGGATCTGACGGTAGACGCTAAGAGAG
    AATTCATAAAAAAGTTTGACAGCATCAGGTACGATAGTGAAAAGAACCTTTTTTGTT
    TTACCTTTGACTACAATAATTTTATTACGCAAAATACAGTTATGAGCAAATCAAGTT
    GGAGCGTTTACACATATGGCGTTCGGATCAAGCGCAGATTCGTCAATGGTCGCTTCT
    CAAATGAGAGCGATACAATCGATATAACGAAGGATATGGAGAAGACGCTTGAGATG
    ACAGATATCAACTGGCGGGACGGACATGACCTTAGACAAGACATAATCGATTACGA
    AATAGTACAGCATATCTTTGAGATTTTTAGGCTTACAGTTCAGATGCGGAACTCTCTT
    TCCGAACTGGAGGACCGGGATTATGATCGGTTGATCTCCCCAGTACTGAACGAAAAT
    AATATCTTTTACGATAGCGCGAAGGCTGGTGATGCACTCCCAAAAGACGCTGATGCG
    AACGGAGCTTATTGCATAGCCCTTAAAGGGCTTTACGAGATTAAACAAATAACAGA
    AAATTGGAAGGAAGATGGCAAATTTTCCCGCGACAAGTTGAAGATTAGTAACAAAG
    ACTGGTTCGACTTCATTCAGAATAAACGCTACCTCAAACGTCCGGCAGCGACCAAAA
    AAGCCGGCCAGGCGAAGAAAAAAAAAGCGTCAGGTAGCGGCGCAGGCAGCCCGAA
    AAAGAAACGTAAAGTCGAGGATCCGAAAAAGAAACGTAAGGTTATTCCGGGCTAA
    SEQ ID NO: 113
    ATGGGCCATCATCATCATCATCATAGCAGCGGCGTGGATCTGGGCACCGAAAACCT
    GTATTTTCAGTCCATGAGCCGCCGCCGCAAAGCGAACCCGACCAAACTGAGCGAAA
    ACGCGAAAAAACTGGCGAAAGAAGTGGAAAACGCAAGCGGCAGCGGCGCGGGCAG
    CAAACGACCGGCGGCGACCAAAAAAGCGGGCCAAGCGAAGAAAAAGAAAGCAAGC
    GGCAGCGGCGCGGGCAGCCCGGCGGCAAAAAAAAAAAAACTGGACGGCAGCGTGG
    ATGCAAGCGGCAGCGGCGCGGGCAGCCCCAAAAAAAAACGCAAAGTTGAAGATGC
    AAGCGGCAGCGGCGCGGGCAGCCCGAAAAAAAAACGTAAAGTGGCAAGCGGCAGC
    GGCGCGGGCAGCATGAACAACGGCACCAACAACTTTCAGAACTTTATTGGCATTAG
    CAGCCTGCAGAAAACCCTGCGCAACGCGCTGATTCCGACCGAAACCACGCAGCAGT
    TTATTGTGAAAAACGGCATTATTAAAGAAGATGAACTGCGCGGCGAAAACCGTCAG
    ATTCTGAAGGACATTATGGATGATTATTATCGCGGCTTTATTAGCGAAACCCTGAGC
    AGCATTGATGATATAGACTGGACGAGCCTGTTTGAAAAAATGGAAATTCAGCTGAA
    AAACGGCGATAACAAAGATACCCTGATTAAAGAACAGACCGAATATCGCAAAGCGA
    TTCATAAGAAGTTTGCGAACGATGATCGCTTTAAAAACATGTTTAGCGCGAAACTGA
    TTAGCGATATTCTGCCGGAATTTGTGATTCATAACAACAACTATAGCGCGAGCGAAA
    AGGAAGAAAAAACCCAAGTGATTAAACTGTTTAGCCGCTTTGCGACGAGCTTTAAA
    GATTATTTTAAAAATCGCGCGAACTGCTTTAGCGCGGATGATATTAGCAGCAGCAGC
    TGCCATCGCATTGTGAACGATAACGCGGAGATCTTTTTTAGCAATGCGCTGGTGTAT
    CGCCGCATTGTGAAAAGCCTGAGCAACGATGATATTAACAAAATTAGCGGCGATAT
    GAAAGATAGCCTGAAAGAAATGAGCCTGGAAGAAATATATAGCTATGAAAAATATG
    GGGAATTTATTACACAAGAGGGCATTAGCTTTTATAACGATATTTGCGGCAAAGTGA
    ACAGCTTTATGAACCTGTATTGTCAGAAAAACAAAGAAAACAAAAACCTGTATAAA
    CTGCAGAAACTGCATAAACAGATTCTGTGCATTGCGGATACGAGCTATGAAGTGCC
    GTATAAATTTGAAAGCGATGAAGAAGTGTATCAGAGCGTGAACGGCTTTCTGGATA
    ACATTAGCAGCAAACATATTGTGGAACGCCTGCGCAAAATTGGCGATAACTATAAC
    GGCTATAACCTGGATAAAATTTATATTGTGAGCAAATTTTATGAAAGCGTGAGTCAG
    AAAACCTATCGCGATTGGGAAACCATTAACACCGCGCTGGAAATTCATTATAACAA
    CATTCTGCCGGGCAACGGCAAAAGTAAAGCGGATAAAGTGAAAAAAGCGGTGAAA
    AACGATCTGCAGAAAAGCATTACGGAAATTAACGAACTGGTGAGCAACTATAAACT
    GTGCAGCGATGATAACATTAAAGCGGAAACCTATATTCACGAGATCAGTCATATTCT
    GAACAACTTTGAAGCGCAAGAACTGAAATATAACCCGGAAATTCATCTGGTGGAAT
    CAGAACTGAAGGCGAGCGAACTTAAGAATGTGCTAGATGTGATTATGAACGCGTTT
    CATTGGTGCAGCGTGTTTATGACCGAAGAACTGGTGGATAAAGATAACAACTTTTAT
    GCGGAACTGGAAGAAATCTACGACGAAATTTATCCGGTGATTAGCCTGTATAACCTG
    GTGCGCAACTATGTGACGCAGAAACCGTATAGCACCAAAAAAATTAAACTGAACTT
    TGGCATTCCGACCCTGGCGGATGGCTGGAGCAAGAGCAAAGAGTATAGCAACAACG
    CTATTATCCTAATGCGCGATAACCTGTATTATCTGGGCATTTTTAACGCGAAAAACA
    AACCGGATAAAAAAATTATTGAAGGCAACACGAGCGAAAACAAAGGCGATTATAA
    AAAAATGATTTATAACCTGCTGCCGGGCCCGAACAAAATGATTCCGAAAGTGTTTCT
    GAGCAGCAAAACCGGCGTGGAAACCTATAAACCGAGCGCGTATATTCTGGAAGGCT
    ATAAACAGAACAAACATATTAAAAGCAGCAAAGATTTTGATATTACCTTTTGCCATG
    ATCTGATTGACTACTTTAAGAACTGTATAGCGATTCATCCGGAATGGAAAAACTTTG
    GCTTTGATTTTAGCGATACGAGCACCTATGAAGACATTAGCGGCTTTTATCGCGAAG
    TGGAACTGCAAGGCTATAAAATTGATTGGACCTATATTAGCGAAAAAGATATTGATC
    TGCTGCAAGAAAAAGGTCAGCTGTATCTGTTTCAGATTTATAACAAAGATTTTAGCA
    AAAAAAGCACCGGCAACGATAACCTGCATACCATGTATCTGAAAAATCTGTTTTCTG
    AAGAAAACCTAAAAGATATTGTCCTGAAACTGAACGGCGAAGCCGAAATTTTTTTTC
    GCAAGAGCAGCATTAAAAACCCGATTATTCACAAAAAAGGTAGCATTCTGGTGAAC
    CGCACATACGAAGCTGAGGAAAAGGATCAGTTTGGCAACATTCAGATTGTGCGCAA
    AAACATTCCGGAAAACATCTACCAAGAACTGTACAAATATTTTAACGATAAAAGCG
    ATAAAGAACTGAGCGACGAGGCTGCGAAGCTGAAGAATGTCGTGGGCCATCATGAA
    GCGGCGACTAACATTGTCAAAGATTATCGCTATACCTATGATAAATATTTTCTGCAT
    ATGCCGATTACCATTAACTTTAAAGCGAACAAAACCGGCTTTATTAACGATCGCATT
    CTGCAGTATATTGCGAAGGAAAAGGATCTGCACGTGATTGGCATTGATCGCGGCGA
    ACGCAACCTGATTTATGTGAGCGTGATTGATACCTGCGGCAACATTGTGGAACAGAA
    AAGCTTTAACATCGTGAACGGCTATGATTATCAGATTAAACTGAAACAGCAAGAAG
    GCGCGCGTCAGATTGCGCGCAAAGAATGGAAAGAAATTGGCAAAATTAAAGAAATT
    AAAGAAGGCTATCTGAGCCTGGTGATTCATGAAATCAGCAAGATGGTGATTAAATA
    TAATGCCATTATTGCGATGGAAGATCTGAGCTATGGCTTTAAAAAAGGCCGCTTTAA
    AGTGGAACGCCAAGTGTATCAGAAATTTGAAACCATGCTGATTAACAAACTGAACT
    ATCTGGTGTTTAAAGATATTAGTATTACTGAAAATGGCGGCCTGCTGAAAGGCTATC
    AGCTGACCTATATTCCGGACAAGCTGAAGAATGTGGGCCATCAGTGCGGCTGCATTT
    TTTATGTGCCGGCGGCGTATACGAGCAAAATTGATCCGACCACCGGCTTTGTGAACA
    TTTTTAAATTTAAAGATCTGACCGTGGATGCGAAACGGGAATTCATAAAAAAATTTG
    ATAGCATTCGCTATGATAGCGAAAAGAATCTGTTTTGCTTCACCTTTGATTATAACA
    ACTTTATAACGCAGAACACCGTGATGAGCAAAAGCAGCTGGAGCGTGTATACCTAT
    GGCGTGCGCATTAAACGCCGCTTTGTGAACGGCCGCTTTAGCAACGAAAGCGATAC
    CATTGATATTACCAAAGATATGGAAAAAACCCTGGAAATGACCGATATTAACTGGC
    GCGATGGCCATGATCTGCGCCAAGATATTATTGATTATGAAATTGTGCAGCATATTT
    TTGAAATTTTTCGCCTGACCGTGCAGATGCGCAACAGCCTGAGCGAACTGGAAGATC
    GCGATTATGATCGCCTGATTAGCCCGGTGCTGAACGAAAACAACATTTTTTATGATA
    GCGCGAAAGCGGGCGATGCGCTGCCGAAAGATGCGGATGCGAACGGCGCGTATTGC
    ATTGCGCTGAAAGGCCTGTATGAAATTAAACAGATTACGGAAAACTGGAAAGAAGA
    TGGCAAATTTAGCCGCGACAAGCTGAAAATTAGCAACAAAGATTGGTTTGATTTTAT
    TCAGAACAAACGCTATCTGTA
  • In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 2. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to the amino acid sequence of SEQ ID NO: 2. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 3. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to the amino acid sequence of SEQ ID NO: 3. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 4. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to the amino acid sequence of SEQ ID NO: 4.
In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to any one of SEQ ID NOs: 109-112. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to the amino acid sequence of any one of SEQ ID NOs: 109-112. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 109. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to SEQ ID NO: 109. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 110. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to SEQ ID NO: 110.
In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 111. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to SEQ ID NO: 111. In certain embodiments, a nucleic acid-guided nuclease, e.g., Type V, preferably Type VA CRISPR nuclease polypeptide disclosed herein includes a polypeptide having an amino acid sequence of at least 50% identity to SEQ ID NO: 112. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence at least 60%, 70%, 80%, 85%, 90%, 95%, 98%, 99%, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical, to SEQ ID NO: 112.
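  • By way of illustration only, a percent-identity threshold of the kind recited above might be checked as in the following minimal Python sketch, which operates on two sequences that have already been aligned (gaps written as "-"). The names `percent_identity` and `meets_threshold`, the gap symbol, and the choice of denominator (aligned columns, skipping columns where both sequences carry a gap) are assumptions made for this example. This passage does not prescribe an alignment tool or an identity convention, and other conventions can yield different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over pre-aligned sequences of equal length.

    Assumed convention: identical residue pairs divided by the number of
    aligned columns, ignoring columns where both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_threshold(aligned_a: str, aligned_b: str, minimum_percent: float) -> bool:
    """True if the identity is at least `minimum_percent` (e.g. 80, 90, 95, 98)."""
    return percent_identity(aligned_a, aligned_b) >= minimum_percent


# Toy 7-residue peptides, not sequences from the listing: one substitution.
toy_reference = "PKKKRKV"
toy_variant = "PKQKRKV"
print(round(percent_identity(toy_reference, toy_variant), 1))  # 6 of 7 columns match
print(meets_threshold(toy_reference, toy_variant, 80))
```

In practice the aligned strings would come from a pairwise alignment of a candidate polypeptide against the reference sequence; the sketch only covers the counting step.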
  • Nuclear Localization Signals (NLSs)
  • In certain embodiments, a composition, e.g., a nuclease, disclosed herein includes one or more nuclear localization sequences (NLSs), such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, a composition, e.g., an engineered nuclease, comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., one or more NLSs at the amino-terminus and one or more NLSs at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In certain embodiments, the engineered nuclease comprises 4 NLSs.
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO:5); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO:7) or RQRRNELKRSP (SEQ ID NO:8); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO:9); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO:10) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO:11) and PPKKARED (SEQ ID NO:12) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO:13) of human p53; the sequence SALI AP (SEQ ID NO:14) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO:15) and PKQKKRK (SEQ ID NO:16) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO:17) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO:18) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO:19) of the human poly(ADP-ribose) polymerase; the sequence RKCLQAGMNLEARKTKK (SEQ ID NO:20) of the steroid hormone receptors (human) glucocorticoid; and the EGL-13 NLS having the sequence MSRRRKANPTKLSENAKKLAKEVEN (SEQ ID NO: 107).
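  • As a hedged illustration of how NLS copies in a polypeptide might be located and counted by terminus, the following Python sketch searches a protein string for exact occurrences of three of the motifs listed above (SEQ ID NOs: 5, 6 and 7). The function name `find_nls`, the 50-residue window used to decide what counts as "at or near" a terminus, and the toy polypeptide are all assumptions made for this example; the disclosure does not define a numeric window in this passage.

```python
# Motifs taken from the list above (SEQ ID NOs: 5, 6 and 7).
NLS_MOTIFS = {
    "SV40": "PKKKRKV",
    "nucleoplasmin": "KRPAATKKAGQAKKKK",
    "c-myc": "PAAKRVKLD",
}


def find_nls(protein: str, motifs: dict = NLS_MOTIFS, window: int = 50) -> list:
    """Return (name, start, region) for every exact motif occurrence.

    `window` is a hypothetical cutoff: a motif starting within `window`
    residues of the first residue is labelled N-terminal, one ending within
    `window` residues of the last residue is labelled C-terminal.
    """
    hits = []
    for name, motif in motifs.items():
        start = protein.find(motif)
        while start != -1:
            end = start + len(motif)
            if start < window:
                region = "N-terminal"
            elif len(protein) - end < window:
                region = "C-terminal"
            else:
                region = "internal"
            hits.append((name, start, region))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Toy polypeptide (not a sequence from the listing): one SV40 NLS near the
# N-terminus, then a glycine spacer, then nucleoplasmin and SV40 NLSs.
toy = "M" + "PKKKRKV" + "G" * 120 + "KRPAATKKAGQAKKKK" + "GS" + "PKKKRKV"
for name, start, region in find_nls(toy):
    print(name, start, region)
```

Exact string matching is the simplest possible choice here; it will miss variant or degenerate NLSs, which would need a pattern- or profile-based search instead.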
  • In certain embodiments, a nuclease provided herein comprises at least one myc-related NLS comprising the sequence PAAKKKKLD (SEQ ID NO:21); in certain embodiments the myc-related NLS is at the N-terminus of the nuclease. In certain embodiments, a nuclease provided herein comprises at least one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6); in certain embodiments the nucleoplasmin NLS is at the C-terminus of the nuclease. In certain embodiments a nuclease provided herein comprises at least one, or at least two, SV40 NLS sequences comprising the sequence PKKKRKV (SEQ ID NO:5); in certain embodiments the SV40 NLSs are at the C-terminus of the nuclease. In certain embodiments, a nuclease provided herein comprises 1 NLS at the N-terminus and 3 NLSs at the C-terminus, for example 1 myc-related NLS at the N-terminus and one nucleoplasmin NLS and two SV40 NLSs at the C-terminus. In certain embodiments, a nuclease provided herein comprises 1 myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21), and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus.
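The 1 + 3 NLS arrangement described above can be illustrated with a minimal Python sketch. The three NLS strings are the ones quoted in the text; the core polypeptide is a made-up placeholder (not SEQ ID NO:4 or any other sequence from this disclosure), and direct fusion without linkers is an assumption made only for illustration.

```python
# NLS sequences exactly as quoted in the text.
MYC_RELATED_NLS = "PAAKKKKLD"            # SEQ ID NO:21
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKK"   # SEQ ID NO:6
SV40_NLS = "PKKKRKV"                     # SEQ ID NO:5

# Hypothetical stand-in for the nuclease polypeptide; NOT a sequence from the source.
PLACEHOLDER_CORE = "MAAAAGGGGSSSS"


def assemble_fusion(core, n_term_nls=(), c_term_nls=()):
    """Join N-terminal NLSs, the core polypeptide, and C-terminal NLSs in order.

    Assumes direct fusion with no linkers, which the text does not specify.
    """
    return "".join([*n_term_nls, core, *c_term_nls])


def count_occurrences(sequence, motif):
    """Count (possibly overlapping) occurrences of motif in sequence."""
    count, start = 0, sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


# One myc-related NLS at the N-terminus; one nucleoplasmin NLS and
# two SV40 NLSs at the C-terminus (four NLSs in total).
fusion = assemble_fusion(
    PLACEHOLDER_CORE,
    n_term_nls=[MYC_RELATED_NLS],
    c_term_nls=[NUCLEOPLASMIN_NLS, SV40_NLS, SV40_NLS],
)
```

Because each NLS is selected independently, other arrangements (for example, several copies of the same NLS at one terminus) can be sketched by changing the two lists.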
  • In general, the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
  • In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one myc-related NLS comprising the sequence PAAKKKKLD (SEQ ID NO:21); in certain embodiments the myc-related NLS is at the N-terminus of the nuclease. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6); in certain embodiments the nucleoplasmin NLS is at the C-terminus of the nuclease. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and at least one, or at least two, SV40 NLS sequences comprising the sequence PKKKRKV (SEQ ID NO: 5); in certain embodiments the SV40 NLSs are at the C-terminus of the nuclease. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and one NLS at the N-terminus and three NLSs at the C-terminus, for example 1 myc-related NLS at the N-terminus and one nucleoplasmin NLS and two SV40 NLSs at the C-terminus. 
In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4, and one myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21) and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, and one, two, or three NLS at the N-terminus and one, two, or three NLS at the C-terminus. In certain embodiments, a nucleic acid-guided nuclease disclosed herein includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, and one myc-related NLS at the N-terminus with the sequence PAAKKKKLD (SEQ ID NO:21) and one nucleoplasmin NLS comprising the sequence KRPAATKKAGQAKKKK (SEQ ID NO:6) and two SV40 NLSs comprising the sequence PKKKRKV (SEQ ID NO:5) at the C-terminus.
  • Purification Tags
  • In certain embodiments, a nucleic acid-guided nuclease provided herein can comprise a tag, e.g., a purification tag, e.g., at the N-terminus. Exemplary tags include a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421) or Gly-8×His tag (SEQ ID NO: 422); short epitope tags such as FLAG, hemagglutinin (HA), c-myc, T7, and Glu-Glu; maltose binding protein (mbp); N-terminal glutathione S-transferase (GST); and calmodulin binding peptide (CBP). In certain embodiments, a nucleic acid-guided nuclease provided herein can comprise a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421), e.g., at the N-terminus. These Gly-6×His tags (SEQ ID NO: 421) are applied for several reasons including: 1) a 6×His tag (SEQ ID NO: 423) can be used in protein purification to allow binding to the chromatographic columns for purification, and 2) the N-terminal glycine allows further, site-specific, chemical modifications that permit advanced protein engineering. Further, the Gly-6×His (SEQ ID NO: 421) is designed for easy removal, if desired, by digestion with Tobacco Etch Virus (TEV) protease. For these constructs, the Gly-6×His tag (SEQ ID NO: 421) was positioned on the N-terminus. Gly-6×His tags (SEQ ID NO: 421) are further described in Martos-Maldonado et al., Nat Commun. (2018) 17;9(1):3307, the disclosure of which is incorporated herein. Thus, in certain embodiments, provided herein is a nucleic acid-guided nuclease that includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4 and a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421). 
In certain embodiments, provided herein is a nucleic acid-guided nuclease that includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 4, a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421), and/or a TEV cleavage site at the N-terminus. In certain embodiments, provided herein is a nucleic acid-guided nuclease having a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421), and a TEV cleavage site at the N-terminus, such as a polypeptide having at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 2. In certain embodiments, provided herein is a nucleic acid-guided nuclease that includes a polypeptide having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO: 1, a poly-His tag at the N-terminus, such as a Gly-6×His tag (SEQ ID NO: 421), and/or a TEV cleavage site at the N-terminus. Additionally or alternatively, the nuclease may comprise one or more NLSs as described herein.
  • Cleavage Sites
  • In addition to, or alternatively to, including one or more NLSs, purification tags, and/or other additional amino acid sequences described herein, an engineered nuclease polypeptide disclosed herein can include one or more cleavage sites, which can be at or near the N-terminus or the C-terminus. Any suitable cleavage site can be used; if a plurality of cleavage sites is used, they may be the same or different. In certain embodiments a cleavage site comprises a Tobacco Etch Virus protease cleavage sequence, herein referred to as a “TEV sequence” (SEQ ID NO: 108). The TEV sequence can be at or near the amino terminus. Generally, the cleavage sequence, e.g., TEV sequence, is located so that cleavage at the cleavage sequence leaves other additional amino acid sequences, in particular any NLS added to the original nuclease polypeptide, intact. A TEV cleavage site can have the amino acid sequence ENLYFQS (SEQ ID NO: 108).
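The placement rule above (cleavage removes the tag but leaves any added NLS intact) can be sketched as follows. This is an illustrative Python sketch only: the cut position between Q and S reflects the generally known specificity of TEV protease rather than anything stated in this passage, the `GHHHHHH` string is an assumed rendering of a Gly-6×His tag, and the core polypeptide is a made-up placeholder.

```python
TEV_SITE = "ENLYFQS"        # SEQ ID NO:108, as quoted in the text
GLY_6HIS = "GHHHHHH"        # assumed rendering of a Gly-6xHis tag (illustrative)
MYC_RELATED_NLS = "PAAKKKKLD"  # SEQ ID NO:21
SV40_NLS = "PKKKRKV"           # SEQ ID NO:5
PLACEHOLDER_CORE = "MAAAAGGGGSSSS"  # hypothetical stand-in, not from the source


def tev_cleave(polypeptide, site=TEV_SITE):
    """Split a polypeptide at the first TEV site.

    Assumes cleavage between the Q and the final S of ENLYFQS, so the
    released fragment ends in ...ENLYFQ and the product starts with S.
    Returns (released_fragment, product); if no site is present, the
    released fragment is empty and the product is the input unchanged.
    """
    idx = polypeptide.find(site)
    if idx == -1:
        return "", polypeptide
    cut = idx + len(site) - 1  # position just before the trailing S
    return polypeptide[:cut], polypeptide[cut:]


# Tag and TEV site sit N-terminal to the first NLS, so cleavage removes
# the tag while every NLS stays on the product.
construct = GLY_6HIS + TEV_SITE + MYC_RELATED_NLS + PLACEHOLDER_CORE + SV40_NLS
released, product = tev_cleave(construct)
```

In this layout the product keeps a single residual serine ahead of the N-terminal NLS; whether that matters for a given construct is outside what the text specifies.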
  • In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 2. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% identity to a polypeptide represented by SEQ ID NO: 2. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 3. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% identity to a polypeptide represented by SEQ ID NO: 3. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least 50% nucleic acid identity to a polypeptide represented by SEQ ID NO: 4. In certain embodiments, provided herein is a nucleic acid sequence encoding a polypeptide having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% identity to a polypeptide represented by SEQ ID NO: 4. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 23-105. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 23-42. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 43-65. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 43-53. 
In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 54-58. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 59-63. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NO: 43. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 64-84. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NO: 64. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 64-74. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 75-79. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 80-84. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 85-105. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NO: 85. 
In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 85-95. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 96-100. In certain embodiments, provided herein is a nucleic acid of at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% polynucleotide identity to any one of SEQ ID NOS: 101-105.
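For orientation, a percent-identity figure of the kind recited above can be computed from a pairwise alignment. The sketch below is a minimal illustration, not the method used in this disclosure: it assumes the two sequences have already been aligned to equal length (with `-` marking gaps) and uses one common convention, identical non-gap columns divided by the total number of alignment columns. Other conventions for the denominator exist and give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Convention assumed here: identical non-gap columns divided by the
    total number of alignment columns, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the pair reaches at least the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Example using two short sequences quoted in this document:
# the c-myc NLS (SEQ ID NO:7) and the myc-related NLS (SEQ ID NO:21).
identity = percent_identity("PAAKRVKLD", "PAAKKKKLD")
```

The same function applies unchanged to aligned nucleotide sequences, since it only compares characters column by column.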
  • A nucleic acid sequence encoding a nucleic acid-guided nuclease can be operably linked to a promoter. Such nucleic acid sequences can be linear or circular. The nucleic acid sequences can be encompassed on a larger linear or circular nucleic acid sequence that comprises additional elements such as an origin of replication, selectable or screenable marker, terminator, other components of a targetable nuclease system, such as a guide nucleic acid, and/or an editing or recorder cassette as disclosed herein. In some aspects, nucleic acid sequences can include sequences that code for at least one glycine, at least one poly-histidine tag, such as a 6× histidine tag (SEQ ID NO: 423), and/or at least one, two, three, four, or five nuclear localization signal tags, some or all of which can be on the amino side of the polypeptide, the carboxy side of the polypeptide, or a combination thereof. Larger nucleic acid sequences can be recombinant expression vectors, as are described in more detail later.
  • Guide Nucleic Acids
  • In certain embodiments, compositions and methods disclosed herein include a guide nucleic acid (gNA), e.g., a gRNA.
  • In general, a guide polynucleotide, also referred to as a guide nucleic acid (gNA), can complex with a compatible nucleic acid-guided nuclease, such as those disclosed herein, and can hybridize with a target nucleic acid sequence, thereby directing the nuclease to the target nucleic acid sequence. A subject nucleic acid-guided nuclease capable of complexing with a guide polynucleotide can be referred to as a nucleic acid-guided nuclease that is compatible with the guide polynucleotide. In addition, a guide polynucleotide capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide polynucleotide or a guide nucleic acid that is compatible with the nucleic acid-guided nuclease. In some embodiments, a guide polynucleotide (e.g., gRNA) disclosed herein can be split into fragments, e.g., two separate polynucleotides, in some cases encompassing a synthetic tracrRNA and crRNA. Such gNAs, e.g., gRNAs, can be referred to as dual or split gNAs, e.g., gRNAs.
  • A guide polynucleotide can be DNA. A guide polynucleotide can be RNA. A guide polynucleotide can include both DNA and RNA. A guide polynucleotide can include modified or non-naturally occurring nucleotides. In cases where the guide polynucleotide comprises RNA, the RNA guide polynucleotide can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • A guide polynucleotide can comprise a guide sequence, also referred to herein as a spacer sequence. A guide (spacer) sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence, also referred to herein as a target nucleic acid sequence, to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In other embodiments, a guide sequence can be less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
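The degree of complementarity and the preferred length range described above can be illustrated with a simplified Python sketch. It assumes an RNA guide and a DNA target strand of equal length, compared in an ungapped, antiparallel fashion with Watson-Crick pairs only; the text instead allows any suitable alignment algorithm, so this is a simplification. The example sequences are hypothetical.

```python
# Watson-Crick pairs as (RNA guide base, DNA target base).
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna, target_dna):
    """Percent of guide positions that pair with the target strand.

    Both sequences are written 5'->3'. The guide hybridizes antiparallel
    to the target, so the target is read in reverse. Assumes equal
    lengths and no gaps (a simplification of 'optimal alignment').
    """
    guide, target = guide_rna.upper(), target_dna.upper()
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal length")
    paired = sum(
        1 for g, t in zip(guide, reversed(target)) if (g, t) in WATSON_CRICK
    )
    return 100.0 * paired / len(guide)


def in_preferred_length_range(guide_rna, low=10, high=30):
    """True if the guide length is within the preferred 10-30 nt range."""
    return low <= len(guide_rna) <= high


# Hypothetical 4-nt toy sequences, far shorter than a real guide.
full = percent_complementarity("AUGC", "GCAT")      # every position pairs
partial = percent_complementarity("AUGC", "GCAA")   # one mismatch
```

A real design tool would also consider off-target sites and secondary structure, which this sketch does not attempt.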
  • A guide polynucleotide can include a scaffold sequence. In general, a “scaffold sequence” can include any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex includes, but is not limited to, a nucleic acid-guided nuclease and a guide polynucleotide that can include a scaffold sequence and a guide sequence. Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are included or encoded on the same polynucleotide. In some cases, the one or two sequence regions are included or encoded on separate polynucleotides. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions. In some embodiments, the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, at least one of the two sequence regions can be about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • A scaffold sequence of a subject guide polynucleotide can comprise a secondary structure. A secondary structure can comprise a pseudoknot region. In some cases, binding kinetics of a guide polynucleotide to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence. In some cases, binding kinetics of a guide polynucleotide to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence. In some aspects, a guide polynucleotide that binds to a nuclease provided herein can include a conserved scaffold sequence. For example, the nucleic acid-guided nucleases for use in the present disclosure can bind to a conserved pseudoknot region.
  • In certain embodiments, the engineered polynucleotide (gRNA) can be split into fragments encompassing a synthetic tracrRNA and crRNA.
  • As used herein, “guide nucleic acid” or “guide polynucleotide” can refer to one or more polynucleotides and can include 1) a guide (spacer) sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein. A guide nucleic acid can be provided as one or more nucleic acids. In specific embodiments, the guide sequence and the scaffold sequence are provided as a single polynucleotide. In other aspects, a guide nucleic acid may include at least one amplicon targeting fragment.
  • A guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence. In certain methods, a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus.
  • For example, native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features. Common features can include sequence outside a pseudoknot region. Common features can include a pseudoknot region. Common features can include a primary sequence or secondary structure.
  • A guide nucleic acid can be engineered to target a desired target sequence by altering the guide (spacer) sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence. A guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid. Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • Engineered guide nucleic acids can be formed using a Synthetic Tracr RNA (STAR) system. STAR, when combined with a Cas12a protein, can form at least one ribonucleoprotein (RNP) complex that targets a specific genomic locus. STAR takes advantage of the natural properties of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system, which functions much like an immune system against invading viruses and plasmid DNA. Short DNA sequences (spacers) from invading viruses are incorporated at CRISPR loci within the bacterial genome and serve as “memory” of previous infections. Reinfection triggers complementary mature CRISPR RNA (crRNA) to find a matching viral sequence. Together, the crRNA and trans-activating crRNA (tracrRNA) guide a CRISPR-associated (Cas) nuclease to make double-strand breaks in “foreign” DNA sequences. The prokaryotic CRISPR “immune system” has been engineered to function as an RNA-guided mammalian genome editing tool that is simple and quick to implement. STAR includes synthetic crRNA and tracrRNA. Engineered guide nucleic acids formed with the STAR system can result in a split gRNA. Split gRNAs, i.e., dual guide RNAs, are described more fully in WO 2021067788A1.
  • In certain embodiments, provided herein are ribonucleoprotein (RNP) complexes that include at least one nuclease disclosed herein. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:2. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:2. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:3. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:3. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO:4. In certain embodiments, a RNP complex can include at least one nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to amino acid sequences of SEQ ID NO:4. In certain embodiments, a RNP complex including a nuclease disclosed herein can further include at least one STAR gRNA (dual guide RNA). In certain embodiments, a RNP complex including a nuclease disclosed herein can further include at least one non-STAR gRNA (e.g., single guide RNA). In certain embodiments, a RNP complex including a nuclease disclosed herein can further include at least one polynucleotide. In certain embodiments, a polynucleotide included in a RNP complex disclosed herein can be greater than about 50 nucleotides in length. In certain embodiments, a polynucleotide included in a RNP complex disclosed herein can be about 50, to about 150, to about 500, to about 1000 nucleotides, or greater than 1000 nucleotides in length. 
In certain embodiments, more than one nuclease can be added to an RNP complex to affect the overall editing efficiency. In certain embodiments, more than one gRNA can be added to the RNP complex to allow for multiplexed editing of more than one site in a single transfection for improved efficiency. In other embodiments, more than one DNA template can be added to the RNP to allow for multiplexed editing at one or more sites based on a specific desired repair outcome.
  • In certain embodiments, a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein, further comprises a guide nucleic acid (gNA), e.g., gRNA, comprising a spacer sequence that targets a target nucleotide sequence (also referred to herein as a target nucleic acid sequence) within a polynucleotide (also referred to herein as a target polynucleotide, as will be clear from context), or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA, is compatible with the Type V, e.g., Type VA, CRISPR nuclease. In general, a polynucleotide within which a target nucleotide sequence (target nucleic acid sequence) is located, as that term is used herein, includes a polynucleotide that includes the target nucleotide sequence (target nucleic acid sequence). Such a polynucleotide can be any suitable polynucleotide, such as a genome of a cell or part of a genome of a cell. In certain embodiments, the target nucleotide sequence (target nucleic acid sequence) is within 50 nucleotides of a protospacer adjacent motif (PAM) sequence specific for the Type V CRISPR nuclease, such as a PAM comprising a sequence of YTTN, wherein Y is T or C and N is A, T, G, or C, or a sequence of YTTV or TTTV, wherein V is A, G, or C. In certain embodiments the PAM comprises a sequence of YTTV or TTTV, wherein V is A, G, or C. In certain embodiments, the gNA is a gRNA, such as a dual (split) gRNA. The gNA, e.g., gRNA, can comprise one or more chemical modifications, such as 2′-O-alkyl, a 2′-O-methyl, a phosphorothioate, a phosphonoacetate, a thiophosphonoacetate, a 2′-O-methyl-3′-phosphorothioate, a 2′-O-methyl-3′-phosphonoacetate, a 2′-O-methyl-3′-thiophosphonoacetate, a 2′-deoxy-3′-phosphonoacetate, a 2′-deoxy-3′-thiophosphonoacetate, a suitable alternative, or a combination thereof. 
In certain embodiments, a ratio of guanine:uracil in the gRNA is at least 51:49, 52:48, 53:47, 54:46, 55:45, 56:44, 57:43, 58:42, 59:41, or 60:40, preferably at least 53:47, more preferably at least 54:46, even more preferably at least 55:45. See Example 12 and FIG. 10. In certain embodiments, a molar ratio of gNA, e.g., gRNA, to Type V CRISPR nuclease is at least 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, or 3:1 and/or not more than 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, 3:1, or 4:1, preferably 1.1:1 to 2.5:1, more preferably 1.2:1 to 2:1, even more preferably 1.2:1 to 1.7:1. See, e.g., Example 13. In certain embodiments a molar amount of gNA, e.g., gRNA, is at least 10, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190 or 200 pmol and/or not more than 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190, 200, 250, or 300 pmol, preferably 25-200 pmol, more preferably 50-100 pmol, even more preferably 65 to 85 pmol. See Example 13.
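The PAM patterns named above (YTTN, YTTV, TTTV) use degenerate base codes, with Y, N, and V as defined in the text. A minimal Python sketch of how such patterns could be located in a DNA sequence is shown below. It scans one strand only and reports every match position; it makes no assumption about where the PAM lies relative to the target sequence, and the example sequence is hypothetical.

```python
import re

# Degenerate codes as defined in the text: Y = T or C, N = A, T, G, or C,
# V = A, G, or C. Plain bases map to themselves.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_regex(pam):
    """Compile a PAM with degenerate codes into a regex.

    A lookahead is used so that overlapping matches are all reported.
    """
    body = "".join(DEGENERATE[base] for base in pam.upper())
    return re.compile("(?=(" + body + "))")


def find_pams(sequence, pam):
    """Return (0-based start, matched bases) for each PAM hit on this strand."""
    return [
        (m.start(), m.group(1))
        for m in pam_regex(pam).finditer(sequence.upper())
    ]


# Hypothetical example sequence, for illustration only.
example = "ACTTTAGGCTTGCA"
yttn_hits = find_pams(example, "YTTN")
yttv_hits = find_pams(example, "YTTV")
tttv_hits = find_pams(example, "TTTV")
```

Scanning the reverse complement as well, and filtering candidate targets to those within 50 nucleotides of a hit, would be natural extensions that are left out here.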
  • In certain embodiments, a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein, further includes a donor template, also referred to as an editing template herein. A donor template can comprise homology arms, that is, nucleotide sequences that are complementary with polynucleotide sequences on either side of a cleavage site at which the donor template will be inserted. The donor template can be present in any suitable amount, e.g., in certain embodiments, at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, or 5 μg μL−1 and/or not more than 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, 5, 7, or 10 μg μL−1, preferably 0.3 to 2 μg μL−1, more preferably 0.5 to 1.5 μg μL−1, even more preferably 0.8 to 1.2 μg μL−1.
  • In certain embodiments, a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide, such as described herein, further includes an anionic polymer. Any suitable anionic polymer may be used. Exemplary anionic polymers include 1,2,3-heptanetriol, 2-Amino-2-(hydroxymethyl)-1,3-propanediol (Tris), 3-(1-pyridino)-1-propane sulfonate (NDSB 201), 3[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), 6-aminocaproic acid, adenosine diphosphate (ADP), adenosine triphosphate (ATP), alpha-cyclodextrin, amidosulfobetaine-14 (ASB-14), ammonium acetate, ammonium nitrate, ammonium sulfate, arginine, arginine ethylester, barium chloride, barium iodide, benzamidine HCl, beta-cyclodextrin, beta-mercaptoethanol (BME), biotin, calcium chloride, cesium chloride, cesium sulfate, cetyltrimethylammonium bromide (CTAB), choline chloride, citric acid, cobalt chloride, copper (II) chloride, cyclohexanol, D-sorbitol, dimethylethylammoniumpropane sulfonate (NDSB 195), dithiothreitol (DTT), erythritol, ethanol, ethylene glycol, ethylene glycol-bis(β-aminoethyl ether)-N,N,N′,N′-tetraacetic acid (EGTA), ethylenediaminetetraacetic acid (EDTA), formamide, gadolinium bromide, gamma butyrolactone, glucose, glutamic acid, glutamine, glycerol, glycine, glycine betaine, glycine-glycine-glycine, guanidine HCl, guanosine triphosphate (GTP), holmium chloride, imidazole, iron (III) chloride, Jeffamine M-600, lanthanum acetate, lauryl sulfobetaine, lauryldimethylamine N-oxide (LDAO), lithium sulfate, magnesium chloride, magnesium sulfate, manganese chloride, mannitol, N-(2-hydroxyethyl)piperazine-N′-(3-propanesulfonic acid) (EPPS), N-dodecyl beta-D-maltoside (DDM), N-ethylurea, n-hexanol, N-lauryl sarcoside, N-lauryl sarcosine, N-methylformamide, N-methylurea, n-octyl-β-D-glucoside (OG: Octyl glucoside), n-pentanol, nickel chloride, non-detergent sulfo betaine (NDSB), Nonidet P40 (NP40), octyl beta-D-glucopyranoside, poly-L-glutamic acid, polyethylene glycol (for
example, PEG 300, PEG 3350, PEG 4000), polyethyleneglycol lauryl ether (Brij 35), polyoxyethylene (2) oleyl ether (Brij 93), polyoxyethylene cetyl ether (Brij 56), polyvinylpyrrolidone 40 (PVP40), potassium chloride, potassium citrate, potassium nitrate, proline, putrescine, spermidine, spermine, riboflavin, samarium bromide, sarcosine, sodium acetate, sodium chloride, sodium dodecyl sulfate (SDS), sodium fluoride, sodium iodide, sodium lauroyl sarcosinate (Sarkosyl), sodium malonate, sodium molybdate, sodium selenite, sodium sulfate, sodium thiocyanate, sucrose, taurine, trehalose, tricine, triethylamine, trimethylamine N-oxide (TMAO), tris(2-carboxyethyl)phosphine (TCEP), Triton X-100, Tween 20, Tween 60, Tween 80, urea, vitamin B12, xylitol, yttrium chloride, yttrium nitrate, zinc chloride, Zwittergent 3-08, Zwittergent 3-14, or a combination thereof. In certain embodiments, an anionic polymer comprises polyglutamic acid. In certain embodiments, the anionic polymer, e.g., PGA, is present at a concentration of at least 20, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, or 500 μg μL−1 and/or not more than 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, 500, 700, or 1000 μg μL−1, preferably 20 to 200 μg μL−1, more preferably 50 to 150 μg μL−1, even more preferably 80 to 120 μg μL−1 (PGA).
  • In certain embodiments, provided herein is a cell containing one or more of the compositions described herein, e.g. a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and, in certain embodiments a purification tag and/or cleavage site. Any suitable cell may be used. In certain embodiments the cell is a human cell, such as an immune cell, e.g., T cell, or a stem cell, e.g., induced pluripotent stem cell (iPSC).
  • In certain embodiments, provided herein are methods of inserting one or more of the compositions described herein, e.g., a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and, in certain embodiments a purification tag and/or cleavage site, into a cell. Any suitable method for insertion may be used. In certain embodiments, electroporation is used. Electroporation conditions can be optimized, see, e.g., Examples.
  • In certain embodiments provided are methods of modifying a target polynucleotide comprising contacting the target polynucleotide with a composition or compositions as described herein, e.g., a composition comprising a Type V, e.g., Type VA, CRISPR nuclease polypeptide comprising one or more NLSs and a suitable gNA, e.g., gRNA, and allowing the composition to modify the target polynucleotide, in some cases a genomic region, such as a genome or part of a genome within a cell, e.g., a human cell such as an immune cell, e.g., T cell, or a stem cell, e.g., iPSC. In certain cases, the composition or compositions comprise a donor template, such as a donor template comprising a polynucleotide coding for a polypeptide to be expressed by the cell; in certain embodiments the polypeptide comprises a chimeric antigen receptor (CAR) or portion thereof; see, e.g., Examples. In certain embodiments the cell is a human cell, e.g., immune cell such as a T cell, or stem cell, such as an iPSC.
  • Nuclease Systems
  • In certain embodiments disclosed herein are targetable nuclease systems. In certain embodiments, a targetable nuclease system can include a nucleic acid-guided nuclease and a compatible guide nucleic acid (also referred to interchangeably herein as “guide polynucleotide” and “gRNA”). A targetable nuclease system can include a nucleic acid-guided nuclease or a polynucleotide sequence encoding the nucleic acid-guided nuclease. A targetable nuclease system can include a guide nucleic acid or a polynucleotide sequence encoding the guide nucleic acid.
  • In general, a targetable nuclease system as disclosed herein can be characterized by elements that promote the formation of a targetable nuclease complex at the site of a target sequence, wherein the targetable nuclease complex includes a nucleic acid-guided nuclease and a guide nucleic acid.
  • A guide nucleic acid together with a nucleic acid-guided nuclease forms a targetable nuclease complex which is capable of binding to a target sequence within a target polynucleotide, as determined by the guide sequence of the guide nucleic acid.
  • In general, to generate a double stranded break, in most cases a targetable nuclease complex binds to a target sequence as determined by the guide nucleic acid, and the nuclease has to recognize a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
  • A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 2 and a compatible guide nucleic acid. A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to the amino acid sequence of SEQ ID NO: 2 and a compatible guide nucleic acid. A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 3 and a compatible guide nucleic acid. A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to the amino acid sequence of SEQ ID NO: 3 and a compatible guide nucleic acid. A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least 50% identity to SEQ ID NO: 4 and a compatible guide nucleic acid. A targetable nuclease complex can include a nucleic acid-guided nuclease having an amino acid sequence of at least about 60%, 65%, 75%, 85%, 95%, 99% or about 100% identity to the amino acid sequence of SEQ ID NO: 4 and a compatible guide nucleic acid. In certain embodiments, the guide nucleic acid can include a scaffold sequence compatible with the nucleic acid-guided nuclease selected. In any of these embodiments, the guide sequence can be engineered to be complementary to any desired target sequence. The guide sequence selected can be engineered to hybridize to any desired target sequence. In certain embodiments, the guide sequence is a dual guide RNA.
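  • As an illustration of how a percent-identity threshold such as those above might be checked, the following is a minimal sketch only. It assumes the two amino acid sequences have already been aligned by some external tool (gaps written as "-"), and it assumes identity is counted as matching residues divided by alignment columns; other conventions exist, and the text does not specify one. The function name and the short example sequences are hypothetical and are not SEQ ID NO: 2, 3, or 4.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention (not taken from the source text): identical residue
    columns divided by all alignment columns, ignoring gap-gap columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 4 identical columns out of 6.
identity = percent_identity("MKT-AY", "MKTLAF")
meets_50_percent = identity >= 50.0
```

  • Under this convention a single gap or mismatch lowers the score equally; a different denominator (for example, the length of the shorter sequence) would give different values for the same alignment.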
  • A target sequence of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro. For example, the target sequence can be a polynucleotide residing in the nucleus of the eukaryotic cell. A target sequence can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). It is contemplated herein that the target sequence should be associated with a PAM; that is, a short sequence recognized by a targetable nuclease complex. The precise sequence and length requirements for a PAM differ depending on the nucleic acid-guided nuclease used, but PAMs can be 2-5 base pair sequences adjacent to the target sequence. Examples of PAM sequences are given in the examples section below, and the skilled person will be able to identify further PAM sequences for use with a given nucleic acid-guided nuclease. Further, engineering of the PAM Interacting (PI) domain may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of a nucleic acid-guided nuclease genome engineering platform. Nucleic acid-guided nucleases may be engineered to alter their PAM specificity, for example as described in Kleinstiver et al., Nature. 2015 Jul. 23; 523 (7561): 481-5, the disclosure of which is incorporated herein in its entirety.
  • A PAM site is a nucleotide sequence in proximity to a target sequence. In most cases, a nucleic acid-guided nuclease can only cleave a target sequence if an appropriate PAM is present. PAMs are nucleic acid-guided nuclease-specific and can be different between two different nucleic acid-guided nucleases. A PAM can be 5′ or 3′ of a target sequence. A PAM can be upstream or downstream of a target sequence. A PAM can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides in length. Often, a PAM is between 2 and 6 nucleotides in length.
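  • The PAM requirement described above can be sketched as a simple scan of one DNA strand for PAM matches with a candidate target sequence immediately 5′ or 3′ of each match. This is an illustrative sketch, not a method taken from the disclosure: the function name, the regular-expression encoding of the PAM, the target length, and the example sequences are all assumptions, and the TTT[ACG] pattern is used only as a placeholder PAM.

```python
import re


def find_pam_sites(dna, pam_regex, target_len, pam_side="5prime"):
    """Return (pam_start, pam, target) tuples found on the given strand.

    pam_side="5prime": the PAM lies 5' of the target (target follows the PAM).
    pam_side="3prime": the PAM lies 3' of the target (target precedes the PAM).
    Only the strand passed in is scanned; the reverse complement is not.
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(f"(?=({pam_regex}))", dna):
        pam = match.group(1)
        start = match.start()
        end = start + len(pam)
        if pam_side == "5prime":
            target = dna[end:end + target_len]
        elif start >= target_len:
            target = dna[start - target_len:start]
        else:
            target = ""
        # Keep only hits with a full-length target on the sequence.
        if len(target) == target_len:
            hits.append((start, pam, target))
    return hits


# Hypothetical example: placeholder PAM 5' of a 10-nucleotide target.
sites_5p = find_pam_sites("GGTTTACCGGAATTCCGGAAGG", "TTT[ACG]", 10)
# Hypothetical example: placeholder PAM 3' of a 5-nucleotide target.
sites_3p = find_pam_sites("ACGTACGTACGG", "[ACGT]GG", 5, pam_side="3prime")
```

  • Because PAMs are nuclease-specific, the pattern and the side on which the PAM sits would have to be chosen for the particular nucleic acid-guided nuclease in use.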
  • In some embodiments disclosed herein, a PAM can be provided on a separate oligonucleotide. In such cases, providing the PAM on an oligonucleotide allows cleavage of a target sequence that otherwise could not be cleaved because no adjacent PAM is present on the same polynucleotide as the target sequence.
  • Polynucleotide sequences encoding a component of a targetable nuclease system can include one or more vectors. In general, the term “vector” as used herein can refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends or no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell. Recombinant expression vectors can include a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which can mean that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed.
  • In some embodiments, a regulatory element can be operably linked to one or more elements of a targetable nuclease system so as to drive expression of the one or more components of the targetable nuclease system.
  • In some embodiments, a vector can include a regulatory element operably linked to a polynucleotide sequence encoding a nucleic acid-guided nuclease. The polynucleotide sequence encoding the nucleic acid-guided nuclease can be codon optimized for expression in targeted cells, such as prokaryotic or eukaryotic cells. Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells. Eukaryotic cells can be those derived from an organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammal including non-human primate.
  • In general, codon optimization can refer to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon or more of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit certain bias for codons of a certain amino acid. As contemplated herein, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp, and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000).
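  • The codon replacement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the disclosure: it uses the standard genetic code, swaps each codon for the most frequent synonymous codon in a caller-supplied usage table, and leaves codons untouched when the table has no entry for that amino acid. The function names are hypothetical, and the usage numbers in the example are placeholders, not real codon usage data.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA coding sequence (length divisible by 3) to amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def codon_optimize(seq, usage):
    """Replace each codon with the most frequent synonymous codon in `usage`.

    usage: dict mapping amino acid letter -> {codon: relative frequency}.
    Codons whose amino acid is missing from `usage` are kept unchanged.
    """
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        # Accept only table entries that really encode the same amino acid.
        candidates = {c: f for c, f in usage.get(aa, {}).items()
                      if CODON_TABLE.get(c) == aa}
        out.append(max(candidates, key=candidates.get) if candidates else codon)
    optimized = "".join(out)
    # The native amino acid sequence must be maintained.
    if translate(optimized) != translate(seq):
        raise RuntimeError("optimization changed the encoded protein")
    return optimized


# Placeholder usage values for illustration only (not real host data).
toy_usage = {"L": {"CTA": 0.1, "CTG": 0.4}, "K": {"AAA": 0.4, "AAG": 0.6}}
optimized = codon_optimize("ATGCTAAAATAA", toy_usage)
```

  • Always choosing the single most frequent codon is only one way to adapt a usage table; the text notes that such tables can be adapted in a number of ways.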
  • A nucleic acid-guided nuclease and one or more guide nucleic acids can be delivered either as DNA or RNA. Delivery of a nucleic acid-guided nuclease and guide nucleic acid both as RNA (unmodified or containing base or backbone modifications) molecules can be used to reduce the amount of time that the nucleic acid-guided nuclease persists in the cell. This may reduce the level of off-target cleavage activity in the target cell. Since a nucleic acid-guided nuclease as mRNA takes time to be translated into protein, it can be advantageous to deliver the guide nucleic acid several hours following the delivery of the nucleic acid-guided nuclease mRNA, to maximize the level of guide nucleic acid available for interaction with the nucleic acid-guided nuclease protein. In other cases, the nucleic acid-guided nuclease mRNA and guide nucleic acid are delivered concomitantly. In other examples, the guide nucleic acid is delivered sequentially, such as 0.5, 1, 2, 3, 4, or more hours after the nucleic acid-guided nuclease mRNA.
  • Guide nucleic acid in the form of RNA or encoded on a DNA expression cassette can be introduced into a host cell that includes a nucleic acid-guided nuclease encoded on a vector or chromosome. The guide nucleic acid may be provided in the cassette as one or more polynucleotides, which may be contiguous or non-contiguous in the cassette. In specific embodiments, the guide nucleic acid is provided in the cassette as a single contiguous polynucleotide.
  • A variety of delivery systems can be used to introduce a nucleic acid-guided nuclease (DNA or RNA) and guide nucleic acid (DNA or RNA) into a host cell. In accordance with these embodiments, systems of use can include, but are not limited to, yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires (Shalek et al., Nano Letters, 2012), and exosomes. Molecular Trojan horse liposomes (Pardridge et al., Cold Spring Harb Protoc; 2010; doi:10.1101/pdb.prot5407) may be used to deliver an engineered nuclease and guide nucleic acid across the blood brain barrier.
  • In some embodiments, an editing template, also referred to herein as a donor template, is also provided. An editing template may be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide. In some cases, an editing template is on the same polynucleotide as a guide nucleic acid. In some embodiments, an editing template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-guided nuclease as a part of a complex as disclosed herein. An editing template polynucleotide can be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In some embodiments, the editing template polynucleotide is complementary to a portion of a polynucleotide that includes the target sequence. When optimally aligned, an editing template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g. about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides). In some embodiments, when an editing template sequence and a polynucleotide that includes a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
  • In some embodiments, methods are provided for delivering one or more polynucleotides, such as one or more vectors or linear polynucleotides as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms that include or are produced from such cells. In some embodiments, an engineered nuclease in combination with (and optionally complexed with) a guide nucleic acid is delivered to a cell.
  • Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in cells, such as prokaryotic cells, eukaryotic cells, mammalian cells, or target tissues. Such methods can be used to administer nucleic acids encoding components of an engineered nucleic acid-guided nuclease system to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. Any gene therapy method known in the art is contemplated for use herein. Methods of non-viral delivery of nucleic acids are contemplated herein. Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures.
  • In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein. In some embodiments, a cell is transfected in vitro, in culture, or ex vivo. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line.
  • In some embodiments, a cell transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein is used to establish a new cell line that includes one or more transfection-derived sequences. In some embodiments, a cell transiently transfected with the components of an engineered nucleic acid-guided nuclease system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of an engineered nuclease complex, is used to establish a new cell line that includes cells containing the modification but lacking any other exogenous sequence.
  • In some embodiments, one or more vectors described herein are used to produce a non-human transgenic cell, organism, animal, or plant. In some embodiments, the transgenic animal is a mammal, such as a mouse, rat, or rabbit. Methods for producing transgenic cells, organisms, plants, and animals are known in the art, and generally begin with a method of cell transformation or transfection, such as described herein.
  • In certain embodiments, in the context of an engineered nuclease complex, “target sequence” can refer to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of an engineered nuclease complex. A target sequence can include any polynucleotide, such as DNA, RNA, or a DNA-RNA hybrid. A target sequence can be located in the nucleus or cytoplasm of a cell. A target sequence can be located in vitro or in a cell-free environment.
  • In some embodiments, formation of an engineered nuclease complex that includes a guide nucleic acid hybridized to a target sequence and complexed with one or more novel engineered nucleases as disclosed herein results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, or more base pairs from) the targeted sequence. Cleavage can occur within a target sequence, 5′ of the target sequence, upstream of a target sequence, 3′ of the target sequence, or downstream of a target sequence.
  • In some embodiments, one or more vectors driving expression of one or more components of a targetable nuclease system are introduced into a host cell or in vitro such that a targetable nuclease complex forms at one or more target sites. For example, a nucleic acid-guided nuclease and a guide nucleic acid can each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of the elements, expressed from the same or different regulatory elements, can be combined in a single vector, with one or more additional vectors providing any components of the targetable nuclease system not included in the first vector. Targetable nuclease system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. In some embodiments, a single promoter drives expression of a transcript encoding a nucleic acid-guided nuclease and one or more guide nucleic acids. In some embodiments, a nucleic acid-guided nuclease and one or more guide nucleic acids are operably linked to and expressed from the same promoter. In other embodiments, one or more guide nucleic acids or polynucleotides encoding the one or more guide nucleic acids are introduced into a cell or in vitro environment that already includes a nucleic acid-guided nuclease or polynucleotide sequence encoding the nucleic acid-guided nuclease.
  • In some embodiments, when multiple different guide sequences are used, a single expression construct may be used to target nuclease activity to multiple different, corresponding target sequences within a cell or in vitro. For example, a single vector can include about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In other embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors can be provided, and optionally, delivered to a cell in vivo or in vitro.
  • In some embodiments, methods and compositions disclosed herein can include more than one guide nucleic acid, such that each guide nucleic acid has a different guide sequence, thereby targeting a different target sequence. In accordance with these embodiments, multiple guide nucleic acids can be used in multiplexing, wherein multiple targets are targeted simultaneously. Additionally or alternatively, the multiple guide nucleic acids are introduced into a population of cells, such that each cell in a population receives a different or random guide nucleic acid, thereby targeting multiple different target sequences across a population of cells. In such cases, the collection of subsequently altered cells can be referred to as a library.
  • In other embodiments, methods and compositions disclosed herein can include multiple different nucleic acid-guided nucleases, each with one or more different corresponding guide nucleic acids, thereby allowing targeting of different target sequences by different nucleic acid-guided nucleases. In some such cases, each nucleic acid-guided nuclease can correspond to a distinct plurality of guide nucleic acids, allowing two or more non-overlapping, partially overlapping, or completely overlapping multiplexing events.
  • In some embodiments, the nucleic acid-guided nuclease has DNA cleavage activity or RNA cleavage activity. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • In certain embodiments, the invention provides for methods of modifying a target sequence in vitro, or in a prokaryotic or eukaryotic cell, which can be in vivo, ex vivo, or in vitro. In some embodiments, the method includes sampling a cell or population of cells such as prokaryotic cells, or those from a human or non-human animal or plant (including micro-algae or other organism), and modifying the cell or cells. Culturing may occur at any stage in vitro or ex vivo. The cell or cells may even be re-introduced into the host, such as a non-human animal or plant (including micro-algae). Re-introduced cells can be stem cells.
  • In some embodiments, the method includes allowing a targetable nuclease complex to bind to the target sequence to effect cleavage of the target sequence, thereby modifying the target sequence, wherein the targetable nuclease complex includes a nucleic acid-guided nuclease complexed with a guide nucleic acid wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within a target polynucleotide. In some aspects, the invention provides a method of modifying expression of a target polynucleotide in vitro or in a prokaryotic or eukaryotic cell. In some embodiments, the method includes allowing a targetable nuclease complex to bind to a target sequence within the target polynucleotide such that the binding can lead to increased or decreased expression of the target polynucleotide; wherein the targetable nuclease complex includes a nucleic acid-guided nuclease complexed with a guide nucleic acid, and wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within the target polynucleotide.
  • In certain embodiments, the invention provides kits containing any one or more of the elements disclosed in the above methods and compositions. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. In some embodiments, the kit includes instructions in one or more languages, for example in more than one language.
  • In some embodiments, a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein. Reagents may be provided in any suitable container. For example, a kit may provide one or more reaction or storage buffers. Reagents can be provided in a form that is usable in an assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form). A buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof. In some embodiments, the buffer is alkaline. In some embodiments, the buffer has a pH from about 7 to about 10. In some embodiments, the kit includes one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element. In some embodiments, the kit includes an editing template.
  • In some embodiments, a targetable nuclease complex has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target sequence in a multiplicity of cell types. As such, a targetable nuclease complex of the invention has a broad spectrum of applications in, e.g., biochemical pathway optimization, genome-wide studies, genome engineering, gene therapy, drug screening, disease diagnosis, and prognosis. An exemplary targetable nuclease complex includes a nucleic acid-guided nuclease as disclosed herein complexed with a guide nucleic acid, wherein the guide sequence of the guide nucleic acid can hybridize to a target sequence within the target polynucleotide. A guide nucleic acid can include a guide sequence linked to a scaffold sequence. A scaffold sequence can include one or more sequence regions with a degree of complementarity such that together they form a secondary structure.
  • An editing template polynucleotide can include a sequence to be integrated (e.g., a mutated gene). A sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The sequence to be integrated may be a mutated form or variant of an endogenous wild-type sequence. Alternatively, the sequence to be integrated may be a wild-type version of an endogenous mutated sequence. Additionally or alternatively, the sequence to be integrated may be a variant or mutated form of an endogenous mutated or variant sequence.
  • In certain embodiments, an upstream or downstream sequence can include from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or about 2500 bp. In some embodiments, an exemplary upstream or downstream sequence has about 15 bp to about 2000 bp, about 30 bp to about 1000 bp, about 50 bp to about 750 bp, about 600 bp to about 1000 bp, or about 700 bp to about 1000 bp.
  • In some embodiments, the editing template polynucleotide can further include a marker. In certain embodiments, some markers can facilitate screening for targeted integrations. Examples of suitable markers can include, but are not limited to, restriction sites, fluorescent proteins, or selectable markers. In certain embodiments, an exogenous polynucleotide template can be constructed using recombinant techniques.
  • In one embodiment of an exemplary method for modifying a target polynucleotide by integrating an editing template polynucleotide, a double-stranded break is introduced into the genome sequence by an engineered nuclease complex, and the break can be repaired via homologous recombination using an editing template such that the template is integrated into the target polynucleotide. The presence of a double-stranded break can increase the efficiency of integration of the editing template.
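  • The relationship between a cleavage site, the upstream and downstream sequences of an editing template, and the integrated result can be sketched as plain string operations. This is a deliberately simplified illustration and not a model of the biology: it works on a single strand written 5′ to 3′, takes the arms directly from the sequence flanking an assumed cut position, and treats repair as exact insertion. All function names, sequences, and the arm length are hypothetical.

```python
def build_editing_template(genome, cut_index, insert, arm_len):
    """Build upstream arm + insert + downstream arm around a cut position.

    cut_index is the position between genome[cut_index - 1] and
    genome[cut_index] at which the double-stranded break is assumed.
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("arms would extend past the end of the sequence")
    upstream = genome[cut_index - arm_len:cut_index]
    downstream = genome[cut_index:cut_index + arm_len]
    return upstream + insert + downstream


def integrate_template(genome, template, arm_len):
    """Insert the template payload where both arms match the genome."""
    if arm_len <= 0 or len(template) < 2 * arm_len:
        raise ValueError("template is too short for the given arm length")
    upstream = template[:arm_len]
    downstream = template[-arm_len:]
    payload = template[arm_len:len(template) - arm_len]
    # The arms must flank a contiguous site in the unmodified sequence.
    site = genome.find(upstream + downstream)
    if site < 0:
        raise ValueError("homology arms do not match the target sequence")
    cut = site + arm_len
    return genome[:cut] + payload + genome[cut:]


# Hypothetical toy sequences; real arms are far longer than 3 nucleotides.
toy_genome = "AAACCCGGGTTT"
toy_template = build_editing_template(toy_genome, cut_index=6,
                                      insert="TAGC", arm_len=3)
edited = integrate_template(toy_genome, toy_template, arm_len=3)
```

  • In practice, arm lengths would be chosen from ranges such as those given above for upstream and downstream sequences, and repair outcomes are not guaranteed to be exact.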
  • Disclosed herein are methods for modifying expression of a polynucleotide in a cell. Some methods include increasing or decreasing expression of a target polynucleotide by using a targetable nuclease complex that binds to the target polynucleotide.
  • Detection of the gene expression level can be conducted in real time in an amplification assay. In one aspect, the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules can be proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art. DNA-binding dyes suitable for this application include, but are not limited to, SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and others known by one of skill in the art.
  • In some embodiments, other fluorescent labels such as sequence specific probes can be employed in the amplification reaction to facilitate the detection and quantification of the amplified products. Probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan™ probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art.
  • In some embodiments, an agent-induced change in expression of sequences associated with a signaling biochemical pathway can also be determined by examining the corresponding gene products. Determining the protein level can involve (a) contacting the protein contained in a biological sample with an agent that specifically binds to a protein associated with a signaling biochemical pathway; and (b) identifying any agent:protein complex so formed. In one aspect of this embodiment, the agent that specifically binds a protein associated with a signaling biochemical pathway is an antibody, preferably a monoclonal antibody.
  • In some embodiments, the amount of agent:polypeptide complexes formed during the binding reaction can be quantified by standard quantitative assays. As illustrated above, the formation of agent:polypeptide complex can be measured directly by the amount of label remaining at the site of binding. In an alternative, the protein associated with a signaling biochemical pathway is tested for its ability to compete with a labeled analog for binding sites on the specific agent. In this competitive assay, the amount of label captured is inversely proportional to the amount of protein sequences associated with a signaling biochemical pathway present in a test sample.
  • In some embodiments, a number of techniques for protein analysis based on the general principles outlined above are known in the art and contemplated herein. They include but are not limited to radioimmunoassays, ELISA (enzyme-linked immunosorbent assays), “sandwich” immunoassays, immunoradiometric assays, in situ immunoassays (using e.g., colloidal gold, enzyme or radioisotope labels), western blot analysis, immunoprecipitation assays, immunofluorescent assays, and SDS-PAGE.
  • In some embodiments, in practicing a subject method, it may be desirable to discern the expression pattern of a protein associated with a signaling biochemical pathway in different bodily tissue, in different cell types, and/or in different subcellular structures. These studies can be performed with the use of tissue-specific, cell-specific or subcellular structure specific antibodies capable of binding to protein markers that are preferentially expressed in certain tissues, cell types, or subcellular structures.
  • In other embodiments, an altered expression of a gene associated with a signaling biochemical pathway can also be determined by examining a change in activity of the gene product relative to a control cell. The assay for an agent-induced change in the activity of a protein associated with a signaling biochemical pathway will depend on the biological activity and/or the signal transduction pathway that is under investigation. For example, where the protein is a kinase, a change in its ability to phosphorylate the downstream substrate(s) can be determined by a variety of assays known in the art. Representative assays include but are not limited to immunoblotting and immunoprecipitation with antibodies such as anti-phosphotyrosine antibodies that recognize phosphorylated proteins. In addition, kinase activity can be detected by high throughput chemiluminescent assays.
  • In certain embodiments, where the protein associated with a signaling biochemical pathway is part of a signaling cascade leading to a fluctuation of intracellular pH condition, pH sensitive molecules such as fluorescent pH dyes can be used as the reporter molecules. In another example where the protein associated with a signaling biochemical pathway is an ion channel, fluctuations in membrane potential and/or intracellular ion concentration can be monitored. A number of commercial kits and high-throughput devices are suited for a rapid and robust screening for modulators of ion channels. Representative instruments include FLIPR™ (Molecular Devices, Inc.) and VIPR (Aurora Biosciences). These instruments are capable of detecting reactions in over 1000 sample wells of a microplate simultaneously, and providing real-time measurement and functional data within a second or even a millisecond.
  • In practicing any of the methods disclosed herein, a suitable vector can be introduced to a cell, tissue, organism, or an embryo via one or more methods known in the art, including without limitation, microinjection, electroporation, sonoporation, biolistics, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, nucleofection transfection, magnetofection, lipofection, impalefection, optical transfection, proprietary agent-enhanced uptake of nucleic acids, and delivery via liposomes, immunoliposomes, virosomes, or artificial virions. In some methods, the vector is introduced into an embryo by microinjection. The vector or vectors may be microinjected into the nucleus or the cytoplasm of the embryo. In some methods, the vector or vectors may be introduced into a cell by nucleofection.
  • A target polynucleotide of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to the host cell. For example, the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell, the genome of a prokaryotic cell, or an extrachromosomal vector of a host cell. The target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or a junk DNA).
  • Some embodiments disclosed herein relate to use of an engineered nucleic acid guided nuclease system disclosed herein; for example, in order to target and knock out genes, amplify genes and/or repair certain mutations associated with DNA repeat instability and a medical disorder. This nuclease system may be used to harness and to correct these defects of genomic instability. In other embodiments, engineered nucleic acid guided nuclease systems disclosed herein can be used for correcting defects in the genes associated with Lafora disease. Lafora disease is an autosomal recessive condition which is characterized by progressive myoclonus epilepsy which may start as epileptic seizures in adolescence. This condition causes seizures, muscle spasms, difficulty walking, dementia, and eventually death.
  • In yet another aspect of the invention, the engineered/novel nucleic acid guided nuclease system can be used to correct genetic eye disorders that arise from several genetic mutations.
  • In certain embodiments disclosed herein engineered nucleic acid guided nuclease constructs can recognize a protospacer adjacent motif (PAM) sequence other than TTTN or in addition to TTTN. In other embodiments, engineered nucleic acid guided nuclease constructs disclosed herein can be further mutated to improve targeting efficiency or can be selected from a library for certain targeted features. Other embodiments disclosed herein concern vectors including constructs disclosed herein of use for further analysis and to select for improved genome editing features.
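  • As an illustration of PAM-based site selection, the following minimal sketch scans one DNA strand for a PAM pattern such as TTTN and reports the adjacent protospacer. It assumes, for illustration only, that the PAM lies immediately 5′ of a 21-nucleotide protospacer and that N matches any base. The function name `find_protospacers` and the example sequence are hypothetical and are not part of the disclosure.

```python
import re


def find_protospacers(dna, pam="TTTN", spacer_len=21):
    """Return (position, pam, protospacer) tuples found on one DNA strand.

    Assumption: the PAM sits immediately 5' of the protospacer and
    'N' in the PAM pattern matches any of A, C, G, or T.
    """
    dna = dna.upper()
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A lookahead is used so that overlapping PAM sites are all reported.
    pattern = re.compile(r"(?=(%s)([ACGT]{%d}))" % (pam_regex, spacer_len))
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Made-up sequence: two flanking bases, a TTTC PAM, and a 21-nt protospacer.
example = "GG" + "TTTC" + "ACCGCGGCCATCCTGCAGGCA" + "GG"
hits = find_protospacers(example)
for position, pam_found, protospacer in hits:
    print(position, pam_found, protospacer)
```

  • Passing a different pattern (for example `pam="CTTN"`) would scan for an alternative PAM in the same way; a complete search would also need to scan the reverse-complement strand, which this sketch omits.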
  • Other embodiments disclosed herein include kits for packaging and transporting nucleic acid guided nuclease constructs and/or novel gRNAs disclosed herein or known gRNAs disclosed herein and further include at least one container. In certain embodiments, several reagents required for the kits can be included for convenience and ease of transport and efficiency.
  • EXAMPLES Example 1: Culture of Jurkat Human T-Cell Leukemia Cell Line and Primary Human T-Cells
  • Human Jurkat T-cell leukemia cells (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH (ACC 282)) were propagated in RPMI 1640 medium (ThermoFisher Scientific) with 10% heat-inactivated fetal bovine serum (FBS) (ThermoFisher Scientific) supplemented with 1% penicillin-streptomycin antibiotic mix (ThermoFisher Scientific). Cells were cultured at 37° C. in 5% CO2 incubators and maintained at a density of 0.5 to 1.5×10⁶ cells mL⁻¹. 24 hours before transfection, cells were passaged at 0.1×10⁶ cells mL⁻¹. Cell culture media supernatant was periodically tested for mycoplasma contamination using the MycoAlert PLUS mycoplasma detection kit (Lonza).
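  • The passaging step above amounts to a simple dilution. The following sketch, offered only as an illustration, computes how much cell suspension and fresh medium would give a chosen seeding density using C1·V1 = C2·V2. The function name, the counted density of 1.0×10⁶ cells per mL, and the 10 mL final volume are hypothetical values chosen for the example.

```python
def passage_volumes(current_density, target_density, final_volume_ml):
    """Split a final volume into cell suspension and fresh medium (both in mL).

    Densities are in cells per mL; the calculation uses C1 * V1 = C2 * V2.
    """
    if not 0 < target_density <= current_density:
        raise ValueError("target density must be positive and not exceed the current density")
    suspension_ml = final_volume_ml * target_density / current_density
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical example: culture counted at 1.0e6 cells/mL, reseeded at 0.1e6 cells/mL in 10 mL.
suspension_ml, medium_ml = passage_volumes(1.0e6, 0.1e6, 10.0)
print(f"{suspension_ml:.2f} mL cell suspension + {medium_ml:.2f} mL fresh medium")
```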
  • Example 2: Primary T-Cell Isolation and Culture
  • T-cells were isolated from human peripheral blood obtained from healthy adults by immune-magnetic negative selection using the EasySep Human T-cell Isolation Kit (STEMCELL Technologies). After isolation, T-cells were activated in 25 μL mL⁻¹ ImmunoCult Human CD3/CD28/CD2 T-Cell Activator (STEMCELL Technologies) in ImmunoCult-XF T-Cell Expansion Medium (STEMCELL Technologies) containing 12.5 ng mL⁻¹ Human Recombinant IL-2, 5 ng mL⁻¹ IL-7, and 5 ng mL⁻¹ IL-15 (STEMCELL Technologies) and seeded at 1.0×10⁶ cells mL⁻¹. Until transfection 48 hours later, the cells were cultured at 37° C. in 5% CO2 incubators.
  • Example 3: RNP Formulation
  • Ribonucleoprotein complexes (RNPs) were generated by incubating respective guide nucleic acids (gNAs) with MAD7 in the molar ratio of 3:2 gNA:MAD7 for 15 minutes at room temperature immediately before transfection. For Jurkat experiments, the RNP complexes were generated by mixing the respective gNA (150 pmol), MAD7 (100 pmol), and nuclease-free water, unless otherwise stated. For T-cell experiments, 1.6 μL of an aqueous solution of 15-50 kDa poly-L-glutamic acid (PGA, 100 μg Alamanda Polymers) was added to gNAs, followed by the addition of MAD7 and nuclease-free water.
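  • The 3:2 gNA:MAD7 mixing step can be sketched as a small pipetting calculation. The stock concentrations (100 μM gNA, 50 μM MAD7) and the 5 μL total volume used below are assumptions made for illustration; the text above specifies only the amounts (150 pmol gNA, 100 pmol MAD7) and the molar ratio. The function name `rnp_mix` is likewise hypothetical.

```python
def rnp_mix(gna_pmol, nuclease_pmol, gna_stock_um, nuclease_stock_um, total_ul):
    """Return pipetting volumes in microliters for one RNP reaction.

    1 uM equals 1 pmol/uL, so volume (uL) = amount (pmol) / stock (uM).
    """
    gna_ul = gna_pmol / gna_stock_um
    nuclease_ul = nuclease_pmol / nuclease_stock_um
    water_ul = total_ul - gna_ul - nuclease_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "gNA_ul": gna_ul,
        "MAD7_ul": nuclease_ul,
        "water_ul": water_ul,
        "molar_ratio": gna_pmol / nuclease_pmol,
    }


# Amounts from the Jurkat protocol; stock concentrations and total volume are assumed.
mix = rnp_mix(150, 100, gna_stock_um=100, nuclease_stock_um=50, total_ul=5.0)
print(mix)
```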
  • Example 4: Generation of Donor Templates Via PCR Amplification
  • Donor templates comprising site-specific homology arms, respective promoter, and respective gene (GFP or Hu19 scFv-CD8α-CD28-CD3ζ CAR) were amplified from corresponding pTwist Ampicillin high-copy plasmids (Twist Bioscience) using homology arms-specific PCR primers. Donor templates were amplified in a two-step PCR program: initial denaturation at 98° C. for 30 seconds, cycle denaturation at 98° C. for 10 seconds, and extension at 72° C. for 30 seconds per kb amplicon for 40 cycles, with a final hold at 72° C. for 10 minutes. Each 50 μL PCR reaction contained 10 ng amplification template (plasmid DNA), 0.5 μM homology arm-specific forward and reverse primers, nuclease-free water (IDT), 3% DMSO, and 1× Phusion High-Fidelity PCR Master Mix with HF Buffer (ThermoFisher Scientific). PCR products were purified using NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel) with two 20 μL elutions. Purified HDR templates were collected and quantified on NanoDrop One Microvolume UV-Vis Spectrophotometer (ThermoFisher Scientific). Templates were concentrated using Amicon Ultra 0.5 mL 30K Centrifugal Filters: 100 μg DNA per unit was transferred, filled with nuclease-free water to 500 μL, and centrifuged at 10,000 g for 10 minutes to reduce volume to 50 μL. DNA was washed twice with nuclease-free water and recovered into a fresh tube by inversion and centrifugation at 10,000 g for 15 seconds. HDR templates were collected, diluted, and concentrations quantified using Qubit dsDNA HS Assay Kit (ThermoFisher Scientific). HDR templates of 0.5 to 1 μg μL⁻¹ were used for cellular studies.
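  • Because the extension time in the cycling program scales with amplicon length (30 seconds per kb), the program can be written out as a short sketch. The function name and the 2,500 bp amplicon length are hypothetical; rounding the extension time up to a whole second is also an assumption rather than something stated in the protocol.

```python
import math


def donor_pcr_program(amplicon_bp, cycles=40, seconds_per_kb=30):
    """Return the cycling steps as (step, temperature_C, seconds, repeats)."""
    # Extension time scales with amplicon length; rounded up to whole seconds (assumption).
    extension_s = math.ceil(seconds_per_kb * amplicon_bp / 1000)
    return [
        ("initial denaturation", 98, 30, 1),
        ("denaturation", 98, 10, cycles),
        ("extension", 72, extension_s, cycles),
        ("final hold", 72, 600, 1),
    ]


# Hypothetical 2,500 bp donor template.
program = donor_pcr_program(2500)
for step, temperature_c, seconds, repeats in program:
    print(f"{step}: {temperature_c} C for {seconds} s x {repeats}")
```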
  • Example 5: Jurkat Cell Transfection
  • A Lonza 4D Nucleofector with Shuttle unit (V4SC-2960 Nucleocuvette Strips) was used for transfection, following the manufacturer's instructions. For transfection, cells were harvested by centrifugation (200 g, RT, 5 minutes) and re-suspended in 20 μL at 10×10⁶ cells mL⁻¹ in the SF Cell Line Nucleofector X Kit buffer (Lonza), unless stated otherwise. The cell suspension was mixed with the RNPs, immediately transferred to the nucleocuvette, and transfected. After transfection, the cells were immediately re-suspended in the pre-warmed cultivation medium, plated onto 96-well, flat-bottom, non-cell culture treated plates (Falcon), cultured at 37° C. in 5% CO2 incubators, and maintained at a density of 0.5 to 1.0×10⁶ cells mL⁻¹. After 48 hours, the cells were harvested for the viability assay and genomic DNA extraction, as described below. For the Homology-Directed Repair Template insertion, the HDR template was added to the cells and the suspension transferred to the RNPs immediately before transfection. The transfection parameters, cell recovery step, and proliferation conditions were as described in Example 1. The cells were harvested 48 hours post-transfection for the viability assessment, after 7 days for CAR insertion efficiency, or after 7 days, 14 days, and 21 days for GFP insertion efficiency.
  • Example 6: Primary T-Cell Transfection
  • 48 hours after isolation, the cells were harvested by centrifugation (300 g, RT, 5 minutes) and re-suspended in 20 μL at 50×10⁶ cells mL⁻¹ in the supplemented P3 Primary Cell Nucleofector Kit buffer (Lonza). The cells were mixed with HDR templates and the suspension transferred to the RNPs immediately before transfection (Nucleofection program EH-115). After transfection, 80 μL of pre-warmed cultivation medium without IL-2 was added to the electroporation cuvettes. When using M3814 (Selleckchem), 80 μL of pre-warmed cultivation medium containing 2 μM M3814 final concentration without IL-2 was added to the electroporation cuvettes. After 10 minutes of incubation at 37° C., T-cells were transferred onto 96-well, flat-bottom, non-cell culture treated plates (Falcon) containing pre-warmed cultivation medium pretreated with 2 μM M3814 final concentration and 12.5 ng mL⁻¹ IL-2. The cells were seeded at a density of 0.25×10⁶ cells mL⁻¹, or 1.3×10⁶ cells mL⁻¹ in the experiment with M3814, and kept at 37° C. in 5% CO2 incubators. The viability assay was carried out 24 hours post-transfection, after which the cells were reseeded in fresh cultivation medium containing IL-2. Insertion efficiency of CAR was measured at 7 days and at 11 days or 13 days post-transfection.
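  • The resuspension volumes and densities given in Examples 5 and 6 imply a cell number per nucleofection reaction, which the following sketch works out. The function name is hypothetical; the inputs (20 μL at 10×10⁶ cells per mL for Jurkat cells, 20 μL at 50×10⁶ cells per mL for primary T-cells) are taken from the text.

```python
def cells_per_reaction(volume_ul, density_cells_per_ml):
    """Number of cells contained in a given volume (uL) at a given density (cells/mL)."""
    return volume_ul * density_cells_per_ml / 1000.0


jurkat_cells = cells_per_reaction(20, 10e6)   # Example 5 conditions
primary_t_cells = cells_per_reaction(20, 50e6)  # Example 6 conditions
print(f"Jurkat: {jurkat_cells:,.0f} cells per reaction")
print(f"Primary T-cells: {primary_t_cells:,.0f} cells per reaction")
```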
  • Example 7: Flow Cytometry
  • Flow cytometric assessments were carried out on a CytoFLEX S instrument (Beckman Coulter) using a 96-well plate format. Measurements of cell viability, PDCD1 expression, GFP expression, and CAR expression were performed on 10,000 or 20,000 single cell events in Jurkat or primary T-cells, respectively.
  • For the cell viability and GFP knock-in measurements, approximately 250,000 cells per sample were transferred onto 96-well V-bottom cell culture plates and assessed following a series of consecutive washing and staining steps. The first step included centrifuging the cells at 300 g for 5 minutes at room temperature, discarding the supernatant, and washing cells in 150 μL Dulbecco's PBS/2% FBS (STEMCELL Technologies) or Cell Staining Buffer (Biolegend), respectively, followed by the second centrifugation and removal of supernatant. The final step included viability staining of cells using 150 μL Dulbecco's PBS/2% FBS with 7-amino-actinomycin D (7-AAD, 1:1,000; ThermoFisher) or 50 μL Cell Staining Buffer with Zombie Violet Dye (1:200; Biolegend), respectively. The measurements of cell viability and GFP expression were collected simultaneously for 7-AAD (excitation: yellow-green laser; emission: 561 nm), Zombie Violet (excitation: violet laser; emission 405 nm), and GFP (excitation: blue laser; emission 488 nm) as needed.
  • For detection of CAR knock-in efficiency, approx. 250,000 cells per sample were transferred onto 96-well V-bottom cell culture plates, washed as described above using Cell Staining Buffer, and re-suspended in 50 μL Cell Staining Buffer with PE Anti-Myc tag antibody [9E10] (1:50; Abcam) and Zombie Violet Dye (1:200; Biolegend) for 30 minutes. Afterwards, the cells were washed in two subsequent washing steps using 150 μL Cell Staining Buffer, and finally re-suspended in 100 μL Cell Staining Buffer for the flow cytometry measurements (excitation: yellow-green laser; emission: 561 nm).
  • For detection of PDCD1 knock-out efficiency, approx. 250,000 Jurkat cells per sample were transferred onto 96-well V-bottom cell culture plates and assessed following a series of consecutive washing and staining steps. The first step included centrifuging the cells at 300 g for 5 minutes at 4° C. and discarding the supernatant. Afterwards, the cells were stained using 100 μL Cell Staining Buffer (Biolegend) with APC/Cyanine7 anti-human CD279 (PD-1) antibody (1:100; Biolegend) and incubated for 30 minutes at 4° C. in the dark. The cells were then centrifuged at 300 g for 5 minutes at 4° C. and the supernatant discarded. The next step included two repeats of centrifugation at 300 g for 5 minutes at 4° C., supernatant removal, and cell washing in 150 μL ice-cold Cell Staining Buffer (Biolegend). In the final step, the cells were re-suspended in 100 μL Cell Staining Buffer for the flow cytometry measurements (excitation: red laser; emission: 633 nm).
  • Example 8: DNA Extraction
  • Cells were harvested 48-h post-transfection by centrifugation (1,000 g, 10 minutes) in 96-well, V-bottom plates (Greiner), washed with PBS (Sigma Aldrich) and lysed in 20 μL QuickExtract DNA Extraction Solution (Epicentre, Lucigen). DNA was extracted following the manufacturer's protocol: 15 minutes at 65° C., 15 minutes at 68° C., 10 minutes at 95° C., cooled to 4° C., and stored at 4° C. Genomic DNA was diluted 20-fold in nuclease-free water before amplicon PCR reactions.
  • Example 9: Amplicon Sequencing
  • Extracted genomic DNA was quantified using the NanoDrop (ThermoFisher Scientific). Amplicons were constructed in two PCR steps: in the first PCR, regions of interest (150-400 bp) were amplified from 10 to 30 ng of genomic DNA with primers containing Illumina forward and reverse adapters on both ends comprising suitable loci-specific complementary sequences, using Phusion High-Fidelity PCR Master Mix (ThermoFisher Scientific). Amplification products were purified with Agencourt AMPure XP beads (Ramcon), using the sample to beads ratio of 1:1.8. The DNA was eluted from the beads with nuclease-free water and the size of the purified amplicons analyzed on a 2% agarose E-gel using the E-gel electrophoresis system (ThermoFisher Scientific). In the second PCR, unique pairs of Illumina-compatible indexes (Nextera XT Index Kit v2) were added to the amplicons using the KAPA HiFi HotStart Ready Mix (Roche). The amplified products were purified with Agencourt AMPure XP beads (Ramcon), using the sample to bead ratio of 1:1.8. The DNA was eluted from the beads with 10 mM Tris-HCl pH 8.5, 0.1% Tween 20. Sizes of the purified DNA fragments were validated on a 2% agarose gel using the E-gel electrophoresis system (ThermoFisher Scientific), quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher) and then pooled in equimolar concentrations. Quality of the amplicon library was validated using Bioanalyzer, High Sensitivity DNA Kit (Agilent) before sequencing. The final library was sequenced on Illumina MiSeq System using the MiSeq Reagent Kit v.2 (300 cycles, 2×250 bp, paired-end reads). De-multiplexed FASTQ files were obtained from BaseSpace (Illumina).
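  • Pooling libraries in equimolar amounts requires converting a mass concentration into a molar one. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the library names, concentrations, fragment lengths, and the 50 fmol target are hypothetical values, and the helper names are not taken from any kit or instrument software.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (widely used approximation)


def molarity_nm(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a given fragment length."""
    return conc_ng_per_ul / (AVG_BP_MASS * length_bp) * 1e6


def equimolar_volumes(libraries, fmol_each):
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    `libraries` maps a name to (concentration in ng/uL, mean length in bp).
    Since 1 nM equals 1 fmol/uL, volume = fmol / nM.
    """
    return {
        name: fmol_each / molarity_nm(conc, length_bp)
        for name, (conc, length_bp) in libraries.items()
    }


# Hypothetical Qubit readings and amplicon lengths.
libs = {"sample_A": (10.0, 400), "sample_B": (4.0, 350)}
volumes = equimolar_volumes(libs, fmol_each=50)
for name, volume_ul in volumes.items():
    print(f"{name}: {volume_ul:.2f} uL")
```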
  • Example 10: NGS Data Analysis
  • Initial quality assessment of the obtained reads was performed with FastQC36. The sequencing data were aligned and analyzed with the CRISPResso2 software, using the CRISPRessoBatch command with the parameters --cleavage_offset 1 --quantification_window_size 10 --quantification_window_center 1 --expand_ambiguous_alignments for the INDEL frequency analysis. For the ORF disruption analysis, the CRISPRessoBatch command with the parameters --cleavage_offset 1 --coding_seq <EXON_SEQ> --quantification_window_size 0 --quantification_window_center 1 --expand_ambiguous_alignments was used. Modification rates from the CRISPResso2 software output were analyzed in Excel.
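  • The two parameter sets described above differ only in the quantification window size and the optional coding sequence, so they can be assembled from one helper. This sketch only builds the argument lists and does not run the tool. The `--batch_settings` option and the file name `batch.tsv` are assumptions about how a batch run might be supplied with its samples, and `<EXON_SEQ>` is kept as the placeholder used in the text.

```python
import shlex


def build_batch_command(batch_settings, window_size, coding_seq=None):
    """Assemble a CRISPRessoBatch argument list (the command is not executed here)."""
    cmd = [
        "CRISPRessoBatch",
        "--batch_settings", batch_settings,  # assumed input file listing the samples
        "--cleavage_offset", "1",
        "--quantification_window_size", str(window_size),
        "--quantification_window_center", "1",
        "--expand_ambiguous_alignments",
    ]
    if coding_seq is not None:
        # Only the ORF disruption analysis passes a coding sequence.
        cmd += ["--coding_seq", coding_seq]
    return cmd


indel_cmd = build_batch_command("batch.tsv", window_size=10)
orf_cmd = build_batch_command("batch.tsv", window_size=0, coding_seq="<EXON_SEQ>")
print(shlex.join(indel_cmd))
print(shlex.join(orf_cmd))
```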
  • Example 11: CRISPR-MAD7 Platform for Human Genome Editing Using the Jurkat T-Cell Leukemia Line
  • MAD7 nucleases comprising a His6 tag (SEQ ID NO: 423) and either one (MAD7-1NLS) or four (MAD7-4NLS) nuclear localization signals (NLS) were used (FIG. 1). RNPs were generated as described in Example 3. Editing frequency of the MAD7 nuclease complexed with one or more guide nucleic acids comprising a spacer sequence of SEQ ID NOs: 86-384 as shown in Table 1 was determined by nucleofection of RNPs in Jurkat T-cells using the Lonza recommended nucleofection program SE-CL-120 (Example 5), followed by genomic DNA extraction (Example 8), amplification of the edited locus and targeted next-generation sequencing (Example 9) for identification of the edits, and finally by computational analysis (Example 10) of modification frequency using the CRISPResso2 algorithm.
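  • Rows of a spacer table written as whitespace-separated "name, PAM, SEQ ID NO, spacer sequence" fields, as in Table 1, can be read programmatically. The following is a minimal sketch under that layout assumption; the function name `parse_spacer_row` and the returned field names are hypothetical, and the two example rows are copied from Table 1.

```python
def parse_spacer_row(line):
    """Parse one 'name PAM SEQ_ID spacer' row into a dictionary."""
    name, pam, seq_id, spacer = line.split()
    if set(spacer) - set("ACGU"):
        raise ValueError(f"{name}: spacer contains characters outside A, C, G, U")
    return {
        "name": name,
        "pam": pam,
        "seq_id_no": int(seq_id),
        "spacer_rna": spacer,
        # Same sequence written in the DNA alphabet (U replaced by T).
        "spacer_dna": spacer.replace("U", "T"),
        "length": len(spacer),
    }


rows = [
    parse_spacer_row("crCD247_1 TTTC 114 ACCGCGGCCAUCCUGCAGGCA"),
    parse_spacer_row("crCD247_6 CTTC 119 CUGAGGGUUCUUCCUUCUCUG"),
]
for row in rows:
    print(row["name"], row["pam"], row["seq_id_no"], row["spacer_dna"], row["length"])
```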
  • TABLE 1
    Spacer sequences
    Name PAM SEQ ID NO Spacer sequence
    crCD247_1 TTTC 114 ACCGCGGCCAUCCUGCAGGCA
    crCD247_2 TTTC 115 UGAGGGAAAGGACAAGAUGAA
    crCD247_3 TTTG 116 GGAUCCAGCAGGCCAAAGCUC
    crCD247_4 TTTC 117 CUAGCAGAGAAGGAAGAACCC
    crCD247_5 TTTC 118 UGUGUUGCAGUUCAGCAGGAG
    crCD247_6 CTTC 119 CUGAGGGUUCUUCCUUCUCUG
    crCD247_7 CTTC 120 CCGUUGUCUUUCCUAGCAGAG
    crCD247_8 TTTC 121 UGCAGUUCCUGCAGAAGAGGG
    crCD247_9 CTTC 122 UGCAGGAACUGCAGAAAGAUA
    crCD247_10 TTTC 123 AUCCCAAUCUCACUGUAGGCC
    crCD247_11 CTTT 124 CAUCCCAAUCUCACUGUAGGC
    crCD247_12 TTTT 125 CUCAUUUCACUCCCAAACAAC
    crCD247_13 TTTC 126 UCAUUUCACUCCCAAACAACC
    crCD247_14 TTTC 127 ACUCCCAAACAACCAGCGCCG
    crCD247_15 CTTA 128 CGUUAUAGAGCUGGUUCUGGC
    crCD247_16 TTTG 129 UUUUCUGAUUUGCUUUCACGC
    crCD247_17 TTTC 130 UGAUUUGCUUUCACGCCAGGG
    crCD247_18 TTTG 131 CUUUCACGCCAGGGUCUCAGU
    crCD247_19 TTTC 132 ACGCCAGGGUCUCAGUACAGC
    crCD247_20 TTTC 133 CGGAGGGUCUACGGCGAGGCU
    crCD247_21 TTTC 134 UUAUCUGUUAUAGGAGCUCAA
    crCD247_22 CTTA 135 UCUGUUAUAGGAGCUCAAUCU
    crCD247_23 CTTG 136 UCCAAAACAUCGUACUCCUCU
    crCD247_24 TTTC 137 CCCCCAUCUCAGGGUCCCGGC
    crCD247_25 TTTG 138 GACAAGAGACGUGGCCGGGAC
    crCD247_26 TTTC 139 UCUCCCUCUAACGUCUUCCCG
    crCTLA4_1 TTTG 140 CCUGGAGAUGCAUACUCACAC
    crCTLA4_2 TTTG 141 CAGAAGACAGGGAUGAAGAGA
    crCTLA4_3 TTTC 142 CACUGGAGGUGCCCGUGCAGA
    crCTLA4_4 TTTG 143 UGUGUGAGUAUGCAUCUCCAG
    crCTLA4_5 TTTC 144 AGCGGCACAAGGCUCAGCUGA
    crCTLA4_6 CTTG 145 UGCCGCUGAAAUCCAAGGCAA
    crCTLA4_7 CTTT 146 UCCAUGCUAGCAAUGCACGUG
    crCTLA4_8 TTTT 147 CCAUGCUAGCAAUGCACGUGG
    crCTLA4_9 CTTT 148 GUGUGUGAGUAUGCAUCUCCA
    crCTLA4_10 CTTT 149 GCCUGGAGAUGCAUACUCACA
    crCTLA4_11 CTTC 150 GGCAGGCUGACAGCCAGGUGA
    crCTLA4_12 CTTC 151 AGUCACCUGGCUGUCAGCCUG
    crCTLA4_13 CTTC 152 CUAGAUGAUUCCAUCUGCACG
    crCTLA4_14 CTTG 153 CCUUGGAUUUCAGCGGCACAA
    crCTLA4_15 CTTG 154 AUUUCCACUGGAGGUGCCCGU
    crCTLA4_16 CTTG 155 GAUAGUGAGGUUCACUUGAUU
    crCTLA4_17 CTTG 156 CAGAUGUAGAGUCCCGUGUCC
    crCTLA4_18 TTTG 157 CUCACCAAUUACAUAAAUCUG
    crCTLA4_19 CTTT 158 GCUCACCAAUUACAUAAAUCU
    crCTLA4_20 CTTT 159 GUUUUCUGUUGCAGAUCCAGA
    crCTLA4_21 TTTG 160 UUUUCUGUUGCAGAUCCAGAA
    crCTLA4_22 TTTT 161 CUGUUGCAGAUCCAGAACCGU
    crCTLA4_23 CTTC 162 CUCCUCUGGAUCCUUGCAGCA
    crCTLA4_24 CTTG 163 CAGCAGUUAGUUCGGGGUUGU
    crCTLA4_25 CTTG 164 GAUUUCAGCGGCACAAGGCUC
    crCTLA4_26 TTTT 165 UUUAUAGCUUUCUCCUCACAG
    crCTLA4_27 CTTT 166 CUCCUCACAGCUGUUUCUUUG
    crCTLA4_28 TTTC 167 UCCUCACAGCUGUUUCUUUGA
    crCTLA4_29 TTTT 168 GCUCAAAGAAACAGCUGUGAG
    crCTLA4_30 TTTC 169 UUUUUGUGUUUGACAGCUAAA
    crCTLA4_31 TTTT 170 UGUGUUUGACAGCUAAAGAAA
    crCTLA4_32 TTTG 171 ACAGCUAAAGAAAAGAAGCCC
    crCTLA4_33 TTTT 172 CACAUAGACCCCUGUUGUAAG
    crCTLA4_34 TTTT 173 CACAUUCUGGCUCUGUUGGGG
    crCTLA4_35 CTTT 174 UCACAUUCUGGCUCUGUUGGG
    crCTLA4_36 TTTC 175 AGCCUUAUUUUAUUCCCAUCA
    crCTLA4_37 TTTC 176 UCAAUUGAUGGGAAUAAAAUA
    crCTLA4_38 TTTT 177 UUCUUCUCUUCAUCCCUGUCU
    crCTLA4_39 CTTT 178 GCAGAAGACAGGGAUGAAGAG
    crCTLA4_40 CTTT 179 GGCUUUUCCAUGCUAGCAAUG
    crCTLA4_41 TTTG 180 GCUUUUCCAUGCUAGCAAUGC
    crLAG3_1 TTTG 181 GGGUGCAUACCUGUCUGGCUG
    crLAG3_2 TTTG 182 GGUCACCUGGAUCCCUGGGGA
    crLAG3_3 TTTC 183 UCAGGACCUUGGCUGGAGGCA
    crLAG3_4 TTTC 184 CCAGCCUUGGCAAUGCCAGCU
    crLAG3_5 TTTG 185 UGAGGUGACUCCAGUAUCUGG
    crLAG3_6 CTTG 186 CUGUUUCUGCAGCCGCUUUGG
    crLAG3_7 CTTG 187 CACAGUGACUGCCAGCCCCCC
    crLAG3_8 TTTT 188 GAACUGCUCCUUCAGCCGCCC
    crLAG3_9 CTTC 189 AGCCGCCCUGACCGCCCAGCC
    crLAG3_10 TTTC 190 CGCUAAGUGGUGAUGGGGGGA
    crLAG3_11 CTTT 191 CCGCUAAGUGGUGAUGGGGGG
    crLAG3_12 CTTA 192 GCGGAAAGCUUCCUCUUCCUG
    crLAG3_13 CTTG 193 GGGCAGGAAGAGGAAGCUUUC
    crLAG3_14 CTTC 194 CUCUUCCUGCCCCAAGUCAGC
    crLAG3_15 CTTC 195 AACGUCUCCAUCAUGUAUAAC
    crLAG3_16 TTTT 196 CUUUUCUCUUCAGGUCUGGAG
    crLAG3_17 TTTC 197 UGCAGCCGCUUUGGGUGGCUC
    crLAG3_18 TTTT 198 CUCUUCAGGUCUGGAGCCCCC
    crLAG3_19 CTTG 199 ACAGUGUACGCUGGAGCAGGU
    crLAG3_20 CTTG 200 GCAGUGAGGAAAGACCGGGUC
    crLAG3_21 TTTC 201 CUCACUGCCAAGUGGACUCCU
    crLAG3_22 CTTT 202 ACCCUUCGACUAGAGGAUGUG
    crLAG3_23 TTTA 203 CCCUUCGACUAGAGGAUGUGA
    crLAG3_24 CTTC 204 GACUAGAGGAUGUGAGCCAGG
    crLAG3_25 TTTC 205 CCACCUGAGGCUGACCUGUGA
    crLAG3_26 CTTT 206 CCCACCUGAGGCUGACCUGUG
    crLAG3_27 CTTC 207 UACUCUUUUCAGUGACUCCCA
    crLAG3_28 TTTT 208 ACCUGGAGCCACCCAAAGCGG
    crLAG3_29 TTTT 209 CAGUGACUCCCAAAUCCUUUG
    crLAG3_30 CTTC 210 CCCAGGGAUCCAGGUGACCCA
    crLAG3_31 CTTT 211 GGGUCACCUGGAUCCCUGGGG
    crLAG3_32 CTTT 212 GUGAGGUGACUCCAGUAUCUG
    crLAG3_33 CTTT 213 GUGUGGAGCUCUCUGGACACC
    crLAG3_34 TTTG 214 UGUGGAGCUCUCUGGACACCC
    crLAG3_35 CTTG 215 GCUGGAGGCACAGGAGGCCCA
    crLAG3_36 TTTT 216 GCUCACCUAGUGAAGCCUCUC
    crLAG3_37 CTTT 217 CCCAGCCUUGGCAAUGCCAGC
    crLAG3_38 CTTG 218 GCAAUGCCAGCUGUACCAGGG
    crLAG3_39 CTTC 219 UUGGAGCAGCAGUGUACUUCA
    crLAG3_40 CTTC 220 ACAGAGCUGUCUAGCCCAGGU
    crLAG3_41 CTTT 221 CUCCAUAGGUGCCCAACGCUC
    crLAG3_42 TTTC 222 UCCAUAGGUGCCCAACGCUCU
    crLAG3_43 TTTC 223 UCAUCCUUGGUGUCCUUUCUC
    crLAG3_44 CTTG 224 GUGUCCUUUCUCUGCUCCUUU
    crLAG3_45 CTTT 225 CUCUGCUCCUUUUGGUGACUG
    crLAG3_46 CTTC 226 UGCGAAGAGCAGGGGUCACUU
    crLAG3_47 CTTT 227 UGGUGACUGGAGCCUUUGGCU
    crLAG3_48 TTTT 228 GGUGACUGGAGCCUUUGGCUU
    crLAG3_49 CTTT 229 GGCUUUCACCUUUGGAGAAGA
    crLAG3_50 TTTG 230 GCUUUCACCUUUGGAGAAGAC
    crLAG3_51 CTTG 231 CUCUAAGGCAGAAAAUCGUCU
    crLAG3_52 TTTT 232 CUGCCUUAGAGCAAGGGAUUC
    crLAG3_53 CTTA 233 GAGCAAGGGAUUCACCCUCCG
    crLAG3_54 TTTC 234 CCGCCCAGUGGCCCGCCCGCU
    crLAG3_55 CTTC 235 UCGCUAUGGCUGCGCCCAGCC
    crLAG3_56 TTTA 236 UCCUUGCACAGUGACUGCCAG
    crPDCD1_1 TTTA 237 GCACGAAGCUCUCCGAUGUGU
    crPDCD1_2 TTTC 238 UCUGCAGGGACAAUAGGAGCC
    crPDCD1_3 TTTC 239 CAGUGGCGAGAGAAGACCCCG
    crPDCD1_4 TTTC 240 CUAGCGGAAUGGGCACCUCAU
    crPDCD1_5 CTTC 241 GUGCUAAACUGGUACCGCAUG
    crPDCD1_6 CTTC 242 AACCUGACCUGGGACAGUUUC
    crPDCD1_7 CTTG 243 UCCGUCUGGUUGCUGGGGCUC
    crPDCD1_8 CTTC 244 CCCGAGGACCGCAGCCAGCCC
    crPDCD1_9 CTTC 245 CGUGUCACACAACUGCCCAAC
    crPDCD1_10 CTTC 246 CACAUGAGCGUGGUCAGGGCC
    crPDCD1_11 CTTT 247 GAUCUGCGCCUUGGGGGCCAG
    crPDCD1_12 TTTG 248 AUCUGCGCCUUGGGGGCCAGG
    crPDCD1_13 CTTG 249 GGGGCCAGGGAGAUGGCCCCA
    crPDCD1_14 CTTT 250 GUGCCCUUCCAGAGAGAAGGG
    crPDCD1_15 TTTG 251 UGCCCUUCCAGAGAGAAGGGC
    crPDCD1_16 TTTC 252 CCUUCCGCUCACCUCCGCCUG
    crPDCD1_17 CTTC 253 CAGAGAGAAGGGCAGAAGUGC
    crPDCD1_18 CTTC 254 UGCCCUUCUCUCUGGAAGGGC
    crPDCD1_19 TTTG 255 GAACUGGCCGGCUGGCCUGGG
    crPDCD1_20 CTTT 256 CUCCUCAAAGAAGGAGGACCC
    crPDCD1_21 TTTC 257 UCCUCAAAGAAGGAGGACCCC
    crPDCD1_22 CTTC 258 UCUCGCCACUGGAAAUCCAGC
    crPDCD1_23 CTTT 259 CCUAGCGGAAUGGGCACCUCA
    crPDCD1_24 CTTC 260 CGCUCACCUCCGCCUGAGCAG
    crPDCD1_25 CTTG 261 GCCCCUCUGACCGGCUUCCUU
    crPDCD1_26 CTTC 262 UCCACUGCUCAGGCGGAGGUG
    crPDCD1_27 CTTC 263 UCCCCAGCCCUGCUCGUGGUG
    crPDCD1_28 CTTC 264 GGUCACCACGAGCAGGGCUGG
    crPDCD1_29 CTTC 265 ACCUGCAGCUUCUCCAACACA
    crPDCD1_30 CTTC 266 UCCAACACAUCGGAGAGCUUC
    crPTPN1_1 TTTA 267 CCUGACAGCGAAUCAUAACAU
    crPTPN1_2 TTTC 268 AUUCCAACUUACCUAACGGAA
    crPTPN1_3 TTTC 269 UGUGCGCACUGGUGAUGACAA
    crPTPN11_4 TTTC 270 CAAUCUGCUCACCUGCUUGAG
    crPTPN11_5 TTTC 271 UUCUAGUUGAUCAUACCAGGG
    crPTPN11_6 TTTA 272 AUAACUUACCUCAAAUUCUUC
    crPTPN11_7 CTTA 273 CCUAACGGAAAGUGUGAAGUC
    crPTPN11_8 TTTC 274 CAGACACUACAACAACAGGAG
    crPTPN11_9 TTTA 275 GGUGGUUUCAUGGACAUCUCU
    crPTPN11_10 TTTC 276 CCAGAGAGAUGUCCAUGAAAC
    crPTPN6_1 TTTC 277 UAUGACCUGUAUGGAGGGGAG
    crPTPN6_2 TTTG 278 CGACUCUGACAGAGCUGGUGG
    crPTPN6_3 TTTG 279 CAGAAGCAGGAGGUGAAGAAC
    crPTPN6_4 TTTG 280 ACUGCCCCCCACCCAGGCCUG
    crPTPN6_5 CTTA 281 UGGGCCCUACUCUGUGACCAA
    crPTPN6_6 TTTC 282 ACCGAGACCUCAGUGGGCUGG
    crPTPN6_7 CTTC 283 UCUAGGUGGUACCAUGGCCAC
    crPTPN6_8 CTTG 284 GCCUGCAGCAGCGUCUCUGCC
    crPTPN6_9 TTTC 285 UUGUGCGUGAGAGCCUCAGCC
    crPTPN6_10 CTTC 286 GUGCUUUCUGUGCUCAGUGAC
    crPTPN6_11 CTTG 287 GGCUGGUCACUGAGCACAGAA
    crPTPN6_12 CTTT 288 CUGUGCUCAGUGACCAGCCCA
    crPTPN6_13 TTTC 289 UGUGCUCAGUGACCAGCCCAA
    crPTPN6_14 CTTG 290 AUGUGGGUGACCCUGAGCGGG
    crPTPN6_15 CTTA 291 CCUCGCACAUGACCUUGAUGU
    crPTPN6_16 TTTG 292 GCUCCCCCCAGGGUGGACGCU
    crPTPN6_17 CTTG 293 AGCAGGGUCUCUGCAUCCAGC
    crPTPN6_18 TTTG 294 GAGACCUUCGACAGCCUCACG
    crPTPN6_19 CTTC 295 GACAGCCUCACGGACCUGGUG
    crPTPN6_20 TTTC 296 AAGAAGACGGGGAUUGAGGAG
    crPTPN6_21 CTTC 297 UUGUUCAGUUCCAACACUCGG
    crPTPN6_22 CTTG 298 GCUGUAUCCUCGGACUCCUGC
    crPTPN6_23 TTTC 299 CCCACCCACAUCUCAGAGUUU
    crPTPN6_24 CTTC 300 CAGACGCUGGUGCAAGUUCUU
    crPTPN6_25 CTTG 301 CACCAGCGUCUGGAAGGGCAG
    crPTPN6_26 CTTG 302 UUCUCUGGCCGCUGCCCUUCC
    crPTPN6_27 CTTG 303 AUGUAGUUGGCAUUGAUGUAG
    crPTPN6_28 CTTG 304 CGUCCAGAACCAGCUGCUAGG
    crPTPN6_29 CTTC 305 UGGCAGAUGGCGUGGCAGGAG
    crPTPN6_30 TTTC 306 UCCACCUCUCGGGUGGUCAUG
    crPTPN6_31 CTTT 307 CUCCACCUCUCGGGUGGUCAU
    crPTPN6_32 CTTT 308 CCAGAACAAAUGCGUCCCAUA
    crPTPN6_33 TTTC 309 CAGAACAAAUGCGUCCCAUAC
    crPTPN6_34 TTTG 310 UAUUCGGUUGUGUCAUGCUCC
    crPTPN6_35 CTTA 311 CAGGUCUCCCCGCUGGACAAU
    crPTPN6_36 CTTC 312 CUGGCUCGGCCCAGUCGCAAG
    crPTPN6_37 CTTA 313 GGGAGACCUGAUUCGGGAGAU
    crPTPN6_38 CTTC 314 CUGGACCAGAUCAACCAGCGG
    crPTPN6_39 TTTC 315 CUGCCGCUGGUUGAUCUGGUC
    crPTPN6_40 CTTT 316 CCUGCCGCUGGUUGAUCUGGU
    crPTPN6_41 CTTG 317 GUGGAGAUGUUCUCCAUGAGC
    crPTPN6_42 CTTG 318 UACUGCGCCUCCGUCUGCACC
    crPTPN6_43 TTTC 319 AAUGAACUGGGCGAUGGCCAC
    crPTPN6_44 CTTC 320 UUCUUAGUGGUUUCAAUGAAC
    crPTPN6_45 CTTC 321 UCCCCUCCAUACAGGUCAUAG
    crPTPN6_46 CTTG 322 GAGUCUAGUGCAGGGACCGUG
    crPTPN6_47 CTTG 323 CCCCCCUGCACCCGGCUGCAG
    crPTPN6_48 CTTG 324 UGUCUGCAGCCGGGUGCAGGG
    crPTPN6_49 TTTC 325 UCCUCCCUCUUGUUCUUAGUG
    crPTPN6_50 CTTT 326 CUCCUCCCUCUUGUUCUUAGU
    crPTPN6_51 CTTC 327 UUCACUUUCUCCUCCCUCUUG
    crPTPN6_52 CTTG 328 AGGUGGAUGAUGGUGCCGUCG
    crPTPN6_53 CTTC 329 CCUGACGCUGCCUUCUCUAGG
    crTIGIT_1 TTTC 330 AGGCCUUACCUGAGGCGAGGG
    crTIGIT_2 TTTT 331 GUCCUCCCUCUAGUGGCUGAG
    crTIGIT_3 CTTG 332 GGGUGGCACAUCUCCCCAUCC
    crTIGIT_4 TTTC 333 UGCAGAGAAAGGUGGCUCUAU
    crTIGIT_5 TTTG 334 UAAUGCUGACUUGGGGUGGCA
    crTIGIT_6 CTTA 335 CCUGAGGCGAGGGGAGCCUGC
    crTIGIT_7 CTTG 336 AAGGAUGGGGAGAUGUGCCAC
    crTIGIT_8 CTTC 337 AAGGAUCGAGUGGCCCCAGGU
    crTIGIT_9 CTTC 338 UGCAUCUAUCACACCUACCCU
    crTIGIT_10 TTTC 339 UAGGACCUCCAGGAAGAUUCU
    crTIGIT_11 CTTT 340 CUAGGACCUCCAGGAAGAUUC
    crTIGIT_12 CTTG 341 CUCCAGCAGGAAUACCUGAGC
    crTIGIT_13 CTTG 342 GAGCCAUGGCCGCGACGCUGG
    crTIGIT_14 TTTC 343 UAGUCAACGCGACCACCACGA
    crTIGIT_15 CTTT 344 CUAGUCAACGCGACCACCACG
    crTIGIT_16 TTTG 345 UAGUUUGUUUGUUUUUAGAAG
    crTIGIT_17 TTTG 346 UUUGUUUUUAGAAGAAAGCCC
    crTIGIT_18 TTTG 347 UUUUUAGAAGAAAGCCCUCAG
    crTIGIT_19 TTTT 348 UAGAAGAAAGCCCUCAGAAUC
    crTIGIT_20 CTTC 349 CACAGAAUGGAUUCUGAGGGC
    crTIGIT_21 TTTT 350 CUCCUGAGGUCACCUUCCACA
    crTIGIT_22 CTTC 351 CUGGGGGUGAGGGAGCACUGG
    crTIGIT_23 CTTC 352 UGCCUGGACACAGCUUCCUGG
    crTIGIT_24 CTTC 353 GUCCUCUUCCCUAGGAAUGAU
    crTIGIT_25 CTTC 354 UGUAACUCAGGACAUUGAAGU
    crTIGIT_26 CTTC 355 AAUGUCCUGAGUUACAGAAGC
    crTIGIT_27 TTTC 356 UAUUGUGCCUGUCAUCAUUCC
    crTIGIT_28 TTTC 357 UCUGCAGAAAUGUUCCCCGUU
    crTIGIT_29 CTTT 358 CUCUGCAGAAAUGUUCCCCGU
    crTIGIT_30 CTTG 359 UGCCGUGGUGGAGGAGAGGUG
    crTIGIT_31 CTTC 360 UGGCCAUUUGUAAUGCUGACU
    crTIM3_1 CTTA 361 CUUGUAAGUAGUAGCAGCAGC
    crTIM3_2 TTTC 362 CAAGGAUGCUUACCACCAGGG
    crTIM3_3 CTTG 363 UAAGUAGUAGCAGCAGCAGCA
    crTIM3_4 CTTA 364 CCACCAGGGGACAUGGCCCAG
    crTIM3_5 TTTG 365 AAUGUGGCAACGUGGUGCUCA
    crTIM3_6 CTTT 366 UCUUCUGCAAGCUCCAUGUUU
    crTIM3_7 CTTT 367 GCCCCAGCAGACGGGCACGAG
    crTIM3_8 TTTC 368 AUCAGUCCUGAGCACCACGUU
    crTIM3_9 CTTT 369 CAUCAGUCCUGAGCACCACGU
    crTIM3_10 TTTA 370 GCCAGUAUCUGGAUGUCCAAU
    crTIM3_11 TTTG 371 CGGAAAUCCCCAUUUAGCCAG
    crTIM3_12 CTTT 372 GCGGAAAUCCCCAUUUAGCCA
    crTIM3_13 TTTC 373 CGCAAAGGAGAUGUGUCCCUG
    crTIM3_14 TTTG 374 GAUCCGGCAGCAGUAGAUCCC
    crTIM3_15 TTTT 375 UCAUCAUUCAUUAUGCCUGGG
    crTIM3_16 TTTT 376 CUUCUGCAAGCUCCAUGUUUU
    crTIM3_17 CTTC 377 AGGUUAAAUUUUUCAUCAUUC
    crTIM3_18 TTTG 378 AUGACCAACUUCAGGUUAAAU
    crTIM3_19 TTTA 379 ACCUGAAGUUGGUCAUCAAAC
    crTIM3_20 CTTA 380 UGUUGUUUCUGACAUUAGCCA
    crTIM3_21 TTTC 381 UGACAUUAGCCAAGGUCACCC
    crTIM3_22 CTTG 382 GAAAGGCUGCAGUGAAGUCUC
    crTIM3_23 CTTC 383 ACUGCAGCCUUUCCAAGGAUG
    crTIM3_24 CTTT 384 CCAAGGAUGCUUACCACCAGG
    crTIM3_25 TTTT 385 CACAUCUUCCCUUUGACUGUG
    crTIM3_26 TTTT 386 UAUAGCAGAGACACAGACACU
    crTIM3_27 TTTA 387 UAUCAGGGAGGCUCCCCAGUG
    crTIM3_28 CTTA 388 CUGUUAGAUUUAUAUCAGGGA
    crTIM3_29 TTTG 389 UGUUUCCAUAGCAAAUAUCCA
    crTIM3_30 TTTC 390 CAUAGCAAAUAUCCACAUUGG
    crTIM3_31 CTTA 391 CGGGACUCUGGAGCAACCAUC
    crTIM3_32 TTTG 392 AAAAUUAAAGCGCCGAAGAUA
    crTIM3_33 CTTA 393 CAUUUGAAAAUUAAAGCGCCG
    crTIM3_34 CTTT 394 UGUUUCCCCCUUACUAGGGUA
    crTIM3_35 TTTT 395 GUUUCCCCCUUACUAGGGUAU
    crTIM3_36 CTTT 396 GACUGUGUCCUGCUGCUGCUG
    crTIM3_37 TTTC 397 CCCCUUACUAGGGUAUUCUCA
    crTIM3_38 CTTA 398 CUAGGGUAUUCUCAUAGCAAA
    crTIM3_39 CTTA 399 AAUUCUGUAUCUUCUCUUUGC
    crTIM3_40 CTTT 400 AUUUCCACAGCCUCAUCUCUU
    crTIM3_41 TTTA 401 UUUCCACAGCCUCAUCUCUUU
    crTIM3_42 TTTC 402 CACAGCCUCAUCUCUUUGGCC
    crTIM3_43 TTTG 403 GCCAACCUCCCUCCCUCAGGA
    crTIM3_44 TTTG 404 CCAAUCCUGAGGGAGGGAGGU
    crTIM3_45 TTTT 405 CUUCUGAGCGAAUUCCCUCUG
    crTIM3_46 CTTC 406 AUAUACGUUCUCUUCAAUGGU
    crTIM3_47 CTTT 407 GGGUUGUCGCUUUGCAAUGCC
    crTIM3_48 TTTG 408 GGUUGUCGCUUUGCAAUGCCA
    crTIM3_49 CTTC 409 UCUCUCUAUGCAGGGUCCUCA
    crTIM3_50 CTTC 410 UACACCCCAGCCGCCCCAGGG
    crTIM3_51 TTTG 411 CCCCAGCAGACGGGCACGAGG
    crAAVS1 TTTC 412 TTAGGATGGCCTTCTCCGACG
  • First, using a gNA targeting the DNMT1 locus, the editing frequency of MAD7 comprising either one or four NLS complexed with the respective gNA was compared. RNP concentration-dependent modification efficiency was observed, as evidenced by an increased fraction of modified amplicons (FIG. 2, left axis, dark grey for MAD7-1NLS and light grey for MAD7-4NLS). Error bars represent one standard deviation for a sample of 3 (n=3). In this experiment, editing frequency was enhanced in Jurkat cells when treated with RNPs comprising MAD7-4NLS, which indicates that optimization of the NLS can improve editing efficiency. A slight decrease in cell viability was seen at higher concentrations of RNP for those comprising four NLS as compared to one NLS (FIG. 2, right axis). Specifically, FIG. 2 shows editing frequency at the DNMT1 locus (n=3; Mean±SD) and cell viability of T-cell leukemic cells as a function of MAD7 comprising one or four nuclear localization signals (NLS) and MAD7-RNP amounts (pmol; constant ratio of 1:1.5 MAD7:gNA). Dark grey bars and circles represent mean modification frequency and viability using MAD7-1NLS, respectively. Light grey bars and triangles represent mean modification frequency and viability using MAD7-4NLS, respectively.
  • To optimize editing activity, 93 different transfection conditions (31 nucleofection programs in combination with three buffers) were tested on the Lonza Nucleofector 96-well Shuttle System (FIGS. 3-5). FIGS. 3, 4, and 5 show the editing frequency (bars; x-axis) of each of the electroporation conditions (buffers SE, SF, and SG, respectively) as compared to a control (y-axis, control at the top). The majority of buffer-program transfection combinations resulted in suboptimal viability (dots; x-axis) and editing frequency; however, the analysis revealed several conditions that supported substantial rates of both cell viability and editing. Two improved conditions observed in the screen, namely SF-CA-137 and SG-CA-138, were then validated and compared to the Lonza recommended nucleofection programs for T-cell leukemia, namely SE-CL-120 and SE-CK-116 (FIG. 6). Specifically, FIG. 6 shows editing frequency at the DNMT1 locus (n=4; Mean±SD) in T-cell leukemic cell line achieved by utilization of the transfection conditions identified in FIG. 2 (100 pmol MAD7-4NLS) and Lonza recommended nucleofection programs SE-CK-116 and SE-CL-120, as well as the two best nucleofection programs observed in this study, SF-CA-137 and SG-CA-138 (FIGS. 3-5). Dark grey bars represent mean modification frequency using crDNMT1. Light grey bars represent mean modification frequency using crIDTneg (Integrated DNA Technologies, IDT).
  • Example 12: Scalable High-Level MAD7-RNP Editing of Immunologically Relevant Genes in Jurkat T-Cell Leukemia Cell Line
  • The Jurkat T-cell leukemia cell line was used as a model system to screen for gNAs demonstrating high editing efficiency. The screen included 298 unique gNAs comprising one or more spacer sequences of SEQ ID NOs: 86-384 of Table 1 targeting the immune checkpoint receptors PDCD1, TIM3, LAG3, TIGIT, and CTLA4, the checkpoint phosphatases PTPN6 (SHP-1) and PTPN11 (SHP-2), and the TCR signaling subunit CD247 (CD3ζ). RNPs were generated as described in Example 3 and nucleofected (Example 5), genomic DNA was extracted (Example 8), the edited loci were amplified and sequenced (Example 9), and the sequencing data were computationally analyzed (Example 10) using the CRISPResso2 algorithm.
  • CRISPResso2 software reports the frequency of modifications (insertions, deletions, and substitutions) within a quantification window flanking the position of MAD7-induced cleavage in the amplicon sequence. To better understand detection of editing events, the types of modifications detected in 230 amplicons that were sequenced in both gNA-treated and MOCK samples (no MAD7) were compared. Relatively high modification frequencies (median 1%) in MOCK reactions were observed as a result of a high frequency of substitutions (FIG. 7, light grey bars); substitutions were detected at a median frequency of 0.96%, likely due to errors in NGS base calling or substitutions arising during DNA amplification, while insertions and deletions were found at much lower median frequencies of 0.003% and 0.042%, respectively. Specifically, FIG. 7 shows editing frequency at eight different loci using 298 gNAs (n=3; Mean±SD) in T-cell leukemic cell line as a function of various editing types: all modifications, only insertions, only deletions, only substitutions, or insertions and deletions (INDELs). Edits were achieved using the transfection conditions identified in Example 11, FIG. 2 (100 pmol MAD7-4NLS) and one of the tested Lonza nucleofection programs (FIG. 6; SF-CA-137). Dark grey boxplots represent mean modification frequency using gNAs. Light grey boxplots represent mean modification frequency using crIDTneg (IDT). Thus, the combined frequency of insertions and deletions (INDELs) was used to quantify the editing activity of the CRISPR-MAD7 system and to minimize low-end noise. Moreover, low INDEL frequencies in MOCK reactions enabled sensitive detection of editing events at a significantly greater fraction of sites (Fisher exact test, P=3×10⁻¹²; FIG. 8). Analysis of gNAs with low INDEL frequencies showed statistically significant editing in gNA-treated samples compared to MOCK samples at INDEL frequencies as low as 0.5% (Fisher exact test, P=4×10⁻⁸; FIG. 8). 
This indicates the sensitivity of the assay to detect modifications in the sub-1% range. Specifically, FIG. 8 shows INDEL frequency at eight different loci using 298 gNAs (n=3; Mean±SD) in T-cell leukemic cell line as a function of two modification types: all modifications <1%, and INDELs <1%, or <0.5%, or <0.1%, with lower INDEL frequencies in MOCK compared to gNA reactions at INDELs <1% (Fisher exact test, P=3×10⁻¹²) and <0.5% (Fisher exact test, P=4×10⁻⁸). Dark grey boxplots represent mean INDEL frequency using gNAs. Light grey boxplots represent mean INDEL frequency using crIDTneg (IDT).
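  • As a rough illustration of the kind of comparison reported above, the following Python sketch computes a two-sided Fisher exact test on a 2×2 table, for example sites with and without detectable INDELs in gNA-treated versus MOCK samples. The function name is hypothetical and the sketch is not the analysis code used in this Example; it only assumes the standard hypergeometric definition of the test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
```

With purely illustrative counts (not data from this application), `fisher_exact_two_sided(8, 2, 1, 5)` evaluates to about 0.035.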
  • Since MAD7 can target a wide range of PAMs, gNAs adjacent to all YTTN PAM variants were screened and the editing specificity of MAD7 in Jurkat cells was analyzed. MAD7 demonstrated editing with all eight combinations of YTTN PAM; in this experiment, editing was higher at the YTTV and TTTV consensus sequences (Fisher exact test; P=2×10⁻³ and P=2×10⁻⁴, respectively). While the majority of highly-active (>50% INDEL frequency) gNAs were found at sites with YTTV and TTTV PAMs, moderately-active (>10% INDEL frequency) gNAs were found to target every PAM sequence with the exception of CTTT. This indicates that MAD7 can edit a wide range of target PAMs, albeit at reduced frequencies (FIG. 9). Specifically, FIG. 9 shows INDEL frequency at eight different loci using 298 gNAs (n=3; Mean±SD) in T-cell leukemic cell line as a function of eight YTTN PAM combinations, and TTTV, YTTN, and YTTV PAM motifs. A grey zone on the plot represents moderately-active gNAs (10-50% INDELs), the zone above highly-active gNAs (>50% INDELs), and the zone below active gNAs (1-10% INDELs). INDEL frequency at the YTTV and TTTV PAM motifs is significantly higher compared to the YTTN motif (Fisher exact test, P=2×10⁻³ and P=2×10⁻⁴, respectively).
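  • The degenerate PAM motifs discussed above use standard IUPAC nucleotide codes (Y = C or T, V = A, C, or G, N = any base). The following Python sketch, with hypothetical function names, shows how a concrete four-base PAM can be checked against such a motif and how a motif expands into its concrete sequences. It is an illustration only and makes no claim about how PAMs were classified in this Example.

```python
from itertools import product

# IUPAC degenerate nucleotide codes needed for the motifs YTTN, YTTV, and TTTV.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # pyrimidine
    "V": "ACG",   # any base except T
    "N": "ACGT",  # any base
}


def matches_motif(pam, motif):
    """Return True if a concrete PAM (e.g. 'CTTG') fits a degenerate motif."""
    if len(pam) != len(motif):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(pam.upper(), motif.upper()))


def expand_motif(motif):
    """List every concrete sequence covered by a degenerate motif."""
    return ["".join(bases)
            for bases in product(*(IUPAC[code] for code in motif.upper()))]
```

Under these definitions, `expand_motif("YTTN")` yields the eight YTTN combinations referred to above, and `matches_motif("CTTT", "YTTV")` is false because V excludes T.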
  • Given the large number of gNAs analyzed, it was examined whether the targeted DNA sequence biases editing efficiency. Sequence logos were made to compare the DNA-complementary gNA sequences of inactive (<1% INDELs), active (1-10% INDELs), moderately-active (10-50% INDELs), and highly-active (>50% INDELs) gNAs (FIG. 10A). While no strong biases for ribonucleotides at specific positions were identified in this experiment, guanine appeared overrepresented and uracil underrepresented on moderately-active and highly-active gNAs. Next, the frequency of ribonucleotide bases was analyzed within the same four classes of gNAs (FIG. 10B). The analysis confirmed significant enrichment of guanine and depletion of uracil on highly-active gNAs. Specifically, FIG. 10 shows (A) sequence logos comparing DNA-complementary gNA sequences of highly-active (>50% INDELs), moderately-active (10-50% INDELs), active (1-10% INDELs), and inactive (<1% INDELs) gNAs, which show no strong biases for ribonucleotides at specific positions, although guanine appeared overrepresented and uracil underrepresented on highly-active and moderately-active gNAs; and (B) nucleotide frequency on inactive (<1% INDELs; dark grey box), active (1-10% INDELs; medium grey box), moderately-active (10-50% INDELs; light grey box), and highly-active (>50% INDELs; white box) gNAs, with significant enrichment of guanine and depletion of uracil on highly-active gNAs compared to inactive gNAs (Fisher exact test, P=4×10⁻³ and P=3×10⁻⁴, respectively). Also, significant enrichment of guanine-cytosine content and depletion of adenine-uracil content was observed on moderately-active gNAs compared to inactive gNAs (Fisher exact test, P=1×10⁻²). Moreover, the data showed that nearly 40% of inactive gNAs had runs of three or more adenine or uracil ribonucleotides, while none of the highly-active and <20% of moderately-active gNAs contained such runs (FIG. 11). 
These sequence features can act as an algorithm for selecting putative high-activity gNAs during initial rounds of screening, and could reduce the overall cost of identifying gNAs for various genes of interest. Specifically, FIG. 11 shows the fraction of gNAs with AAA and/or UUU runs as a function of INDEL frequency of highly-active (>50% INDELs), moderately-active (10-50% INDELs), active (1-10% INDELs), and inactive (<1% INDELs) gNAs. The fraction of inactive (<1% INDELs) and active (1-10% INDELs) gNAs containing such runs is higher compared to highly-active (>50% INDELs) gNAs (Fisher exact test, P=1×10⁻³ and P=4×10⁻⁴, respectively).
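  • A minimal sketch of how the two sequence features described above (guanine-cytosine content and the presence of AAA or UUU runs) might be computed for a spacer is given below in Python. The function names are hypothetical, no activity threshold is implied, and the sketch is not presented as the selection procedure of this application.

```python
import re

# Homopolymer runs of three or more adenines or three or more uracils.
AU_RUN = re.compile(r"A{3,}|U{3,}")


def gc_fraction(spacer):
    """Fraction of guanine and cytosine ribonucleotides in an RNA spacer."""
    spacer = spacer.upper()
    return sum(base in "GC" for base in spacer) / len(spacer)


def has_au_run(spacer):
    """True if the spacer contains an AAA or UUU run (or longer)."""
    return AU_RUN.search(spacer.upper()) is not None


def spacer_features(spacer):
    """Collect both features for one spacer in a small dictionary."""
    return {"gc_fraction": gc_fraction(spacer),
            "has_au_run": has_au_run(spacer)}
```

For example, applied to two spacers listed in Table 1, `has_au_run("UAGUUUGUUUGUUUUUAGAAG")` (crTIGIT_16) is true, while `has_au_run("GAGCCAUGGCCGCGACGCUGG")` (crTIGIT_13) is false; the sequences are used here only to exercise the functions.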
  • Example 13: Validation of gNAs for Gene Editing and Disruption of Immunologically Relevant Genes Using T-Cell Leukemia Line
  • High-efficiency gNAs identified in our initial analysis were validated by assaying INDEL frequency for the top three to five gNAs for each of the selected immunologically relevant genes (FIG. 12). Specifically, FIG. 12 shows INDEL (dark grey bars) and frameshift (light grey bars) frequencies (n=3; Mean±SD) in T-cell leukemic cell line as a function of 38 high-efficiency gNAs. Alternating grey and white zones on the plot represent groups of three to five high-efficiency gNAs per locus. In the validation experiment, the INDEL frequency was significantly correlated to the measurements from the initial screen, highlighting the reproducibility of the INDEL assay (FIG. 13). Specifically, FIG. 13 shows correlation of INDEL frequency in the gNA validation experiment versus INDEL formation in the gNA screen experiment (Spearman's correlation=0.91; P=9×10⁻¹⁴), highlighting reproducibility of the INDEL assay. Using the CRISPResso2 software, the degree of open reading frame (ORF) disruption for each of the validated gNAs was estimated (FIG. 12). In addition, for four high-efficiency gNAs targeting three different exons at the PDCD1 locus, surface expression of the PDCD1 protein was measured by flow cytometry 4, 7, and 11 days post-transfection (data not shown). The data revealed that the protein surface expression after transfection with crPDCD1_2, a gNA targeting the PDCD1 gene at the extracellular domain of the protein, was as low as 10% 4 days post-transfection and remained at this level even at day 11 post-transfection. The surface expression after transfection with the remaining three gNAs was significantly higher, 35% and 85% after transfection with crPDCD1_3 and both crPDCD1_4 and crPDCD1_5, respectively. 
This is in line with the ORF data analysis, which showed that for most of the gNAs including the high-efficiency crPDCD1s, the predicted number of INDELs leading to frameshifts was similar to that expected from an unbiased DNA repair process, with frameshifts in two-thirds of the edited loci (FIG. 14). However, several of the gNAs had a markedly different degree of ORF disruption; crCD247_4 resulted in frameshifts with 97% frequency, while crTIM3_1 and crTIM3_3 resulted in frameshifts with 23% and 44% frequency, respectively (FIG. 14). Specifically, FIG. 14 shows the fraction of frameshift to INDEL frequency (dark grey bars) in T-cell leukemic cell line as a function of 38 high-efficiency gNAs. The average fraction of INDELs leading to frameshifts (dashed line) is approx. 66%. Alternating grey and white zones on the plot represent groups of three to five high-efficiency gNAs per locus. The analysis of repair products indicates that in the case of crTIM3_1, and to some extent crTIM3_3, the bias arose from directly repeated sequences at the DNA cleavage site, which possibly promoted microhomology-mediated end joining (MMEJ) repair following DNA cleavage. These data help inform selection of gNAs for gene KO since some gNAs, such as crTIM3_1, have a much lower frequency of gene disruption than would be predicted based on the frequency of INDEL formation.
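  • The two-thirds expectation mentioned above follows from the reading frame being preserved only when the net length change of an INDEL is a multiple of three. The Python sketch below, with hypothetical function names and hypothetical input, makes that arithmetic explicit; it assumes, as a simplification, that every INDEL is described by a single net length change (negative for deletions, positive for insertions).

```python
def is_frameshift(net_length_change):
    """An INDEL shifts the reading frame unless its net length change
    (insertions positive, deletions negative) is a multiple of three."""
    return net_length_change % 3 != 0


def frameshift_fraction(net_length_changes):
    """Fraction of INDELs that are frameshifts; zero-length entries
    (unedited reads) are ignored."""
    changes = [c for c in net_length_changes if c != 0]
    if not changes:
        return 0.0
    return sum(is_frameshift(c) for c in changes) / len(changes)
```

If INDEL lengths were spread evenly over lengths 1 to 30, `frameshift_fraction(range(1, 31))` would be 20/30, i.e. two-thirds. A gNA whose repair products are dominated by one in-frame deletion, as suggested above for MMEJ-biased sites, would fall well below that value.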
  • Another consideration for selecting gNAs is the potential for off-target cleavage events. The list of validated gNAs was analyzed using the CasOFFinder software to predict potential off-target editing sites in the genome with up to four mismatches between the gNA and the target DNA sequence. Using the Bioconductor R packages, the predicted off-target sites were matched with the human gene database, and those sites that targeted exons and introns within the genes were extracted. Afterwards, the degree of editing activity at these sites was examined by targeted next-generation sequencing, more specifically, at 25 predicted off-target sites for the top-two PDCD1 gNAs, i.e., crPDCD1_1 and crPDCD1_2. The analysis revealed low-level off-target activity at the crPDCD1_2_13 and crPDCD1_2_15 sites; however, INDEL formation at these two sites was not statistically significant compared to MOCK samples (non-targeting gNAs) (Pairwise T-test, P≥0.05; FIGS. 15 and 16). INDEL frequency at 43 putative off-target sites with up to three mismatches between gNA and target DNA sequence was assayed for the top-two gNAs targeting the seven remaining genes (i.e., TIM3, LAG3, TIGIT, CTLA4, PTPN6, PTPN11, and CD247; spacer sequences in Table 1). The analysis revealed no detectable activity at any of the putative off-target sites (FIGS. 15 and 16), which confirms the high cleavage fidelity of MAD7-gNA complexes. Specifically, FIGS. 15-16 show INDEL frequency of MAD7 (n=3; Mean±SD) in T-cell leukemic cell line at predicted off-target sites analyzed by targeted deep sequencing. For crPDCD1, INDEL frequency was analyzed at the putative off-target editing sites with ≤4 mismatches between the gNA and target DNA sequence, and with ≤3 mismatches for the remaining gNAs. PAM sequences and spacer sequences with mismatches marked in red are displayed next to their respective measured INDEL frequencies. No significant INDEL frequency at any of the off-target sites was detected (Pairwise T-test, P≥0.05).
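  • To make the mismatch criterion above concrete, the following Python sketch counts position-by-position mismatches between a spacer and an equal-length candidate site and keeps candidates within a chosen limit. It is a simplified illustration with hypothetical function names and is not the CasOFFinder algorithm, which additionally searches a genome and checks for a PAM.

```python
def _as_dna(seq):
    """Normalize RNA or DNA text to upper-case DNA letters."""
    return seq.upper().replace("U", "T")


def count_mismatches(spacer, site):
    """Number of positions at which spacer and site differ."""
    a, b = _as_dna(spacer), _as_dna(site)
    if len(a) != len(b):
        raise ValueError("spacer and site must have the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(spacer, sites, max_mismatches=4):
    """Return (site, mismatches) pairs with 1..max_mismatches mismatches.

    Perfect matches are skipped on the assumption that they are the
    intended on-target site.
    """
    hits = []
    for site in sites:
        n = count_mismatches(spacer, site)
        if 1 <= n <= max_mismatches:
            hits.append((site, n))
    return hits
```

The default of four mismatches mirrors the limit used for the PDCD1 gNAs above; passing `max_mismatches=3` corresponds to the limit used for the remaining gNAs.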
  • Insertion of exogenous transgenes is an important aspect of mammalian cell engineering. Gene insertion with CRISPR-Cas is achieved by homology-directed repair of CRISPR-induced DNA breaks using HDR-donor templates to copy exogenous genetic sequences into targeted DNA loci. Several studies indicate that HDR templates, composed of linear double stranded DNA, provide the most robust and efficient method of transgene insertion using CRISPR-Cas genome editing systems.
  • The Jurkat T-cell leukemia cell line was used to evaluate the transgene insertion and expression efficiency using CRISPR-MAD7 RNP complexes. A highly active gNA targeting the AAVS1 safe-harbor locus (spacer sequence in Table 1; FIG. 17) was used in combination with eight different HDR-repair templates flanked with symmetric homology arms (HA) of 500 base pairs (bp) in the amount of 0.5 μg μL⁻¹. Specifically, FIG. 17 shows INDEL frequency at the AAVS1 locus (n=3; Mean±SD) in T-cell leukemic cell line as a function of MAD7-RNP amounts (pmol; constant ratio of 1:1.5 MAD7:gNA). Dark grey bars represent mean INDEL frequency using crAAVS1. Light grey bars represent mean modification frequency using crIDTneg (IDT). The HDR inserts comprised eight promoters (Table 2) differing in both size and promoter strength to drive GFP expression (FIG. 18). When the transient GFP expression diminished at day 14 post-transfection, comparable insertion efficiencies were observed with stable GFP expression of up to 30% using four (JET, PGK, EF1a, and CAG) out of eight promoters (FIG. 18), suggesting that the insert size did not affect the integration efficiency at AAVS1 in the human T-cell leukemia cell line. Specifically, FIG. 18 shows GFP insertion efficiency at AAVS1 (n=3; Mean±SD) and cell viability of T-cell leukemic cell line measured at day 14 post-transfection. HDR templates consisting of eight different promoters and flanked with symmetric homology arms of 500 base pairs in the amount of 0.5 μg μL⁻¹ were used. Size of promoters in base pairs: CMV, 1400; SCP, 970; CMVe-SCP, 1270; CMVmax, 1830; JET, 1100; CAG, 2600; PGK, 1410; EF-1α, 2090. Dark grey bars and circles represent mean insertion frequency and cell viability using crAAVS1. Light grey bars represent mean insertion frequency and cell viability using crIDTneg (IDT).
  • TABLE 2
    Name SEQ ID NO Sequence
    CMV 413 CGTTACATAACTTACGGTAA
    ATGGCCCGCCTGGCTGACCG
    CCCAACGACCCCCGCCCATT
    GACGTCAATAATGACGTATG
    TTCCCATAGTAACGCCAATA
    GGGACTTTCCATTGACGTCA
    ATGGGTGGAGTATTTACGGT
    AAACTGCCCACTTGGCAGTA
    CATCAAGTGTATCATATGCC
    AAGTACGCCCCCTATTGACG
    TCAATGACGGTAAATGGCCC
    GCCTGGCATTATGCCCAGTA
    CATGACCTTATGGGACTTTC
    CTACTTGGCAGTACATCTAC
    GTATTAGTCATCGCTATTAC
    CATGGTGATGCGGTTTTGGC
    AGTACATCAATGGGCGTGGA
    TAGCGGTTTGACTCACGGGG
    ATTTCCAAGTCTCCACCCCA
    TTGACGTCAATGGGAGTTTG
    TTTTGGCACCAAAATCAACG
    GGACTTTCCAAAATGTCGTA
    ACAACTCCGCCCCATTGACG
    CAAATGGGCGGTAGGCGTGT
    ACGGTGGGAGGTCTATATAA
    GCAGAGCT
    SCP 414 GTACTTATATAAGGGGGTGG
    GGGCGCGTTCGTCCTCAGTC
    GCGATCGAACACTCGAGCCG
    AGCAGACGTGCCTACGGACC
    G
    CMVe-SCP 415 CGTTACATAACTTACGGTAA
    ATGGCCCGCCTGGCTGACCG
    CCCAACGACCCCCGCCCATT
    GACGTCAATAATGACGTATG
    TTCCCATAGTAACGCCAATA
    GGGACTTTCCATTGACGTCA
    ATGGGTGGAGTATTTACGGT
    AAACTGCCCACTTGGCAGTA
    CATCAAGTGTATCATATGCC
    AAGTACGCCCCCTATTGACG
    TCAATGACGGTAAATGGCCC
    GCCTGGCATTATGCCCAGTA
    CATGACCTTATGGGACTTTC
    CTACTTGGCAGTACATCTAC
    GTATTAGTCATCGCTATTAC
    CATGGTACTTATATAAGGGG
    GTGGGGGCGCGTTCGTCCTC
    AGTCGCGATCGAACACTCGA
    GCCGAGCAGACGTGCCTACG
    GACCG
    CMVmax 416 TCAATATTGGCCATTAGCCA
    TATTATTCATTGGTTATATA
    GCATAAATCAATATTGGCTA
    TTGGCCATTGCATACGTTGT
    ATCTATATCATAATATGTAC
    ATTTATATTGGCTCATGTCC
    AATATGACCGCCATGTTGGC
    ATTGATTATTGACTAGTTAT
    TAATAGTAATCAATTACGGG
    GTCATTAGTTCATAGCCCAT
    ATATGGAGTTCCGCGTTACA
    TAACTTACGGTAAATGGCCC
    GCCTGGCTGACCGCCCAACG
    ACCCCCGCCCATTGACGTCA
    ATAATGACGTATGTTCCCAT
    AGTAACGCCAATAGGGACTT
    TCCATTGACGTCAATGGGTG
    GAGTATTTACGGTAAACTGC
    CCACTTGGCAGTACATCAAG
    TGTATCATATGCCAAGTCCG
    CCCCCTATTGACGTCAATGA
    CGGTAAATGGCCCGCCTGGC
    ATTATGCCCAGTACATGACC
    TTACGGGACTTTCCTACTTG
    GCAGTACATCTACGTATTAG
    TCATCGCTATTACCATGGTG
    ATGCGGTTTTGGCAGTACAC
    CAATGGGCGTGGATAGCGGT
    TTGACTCACGGGGATTTCCA
    AGTCTCCACCCCATTGACGT
    CAATGGGAGTTTGTTTTGGC
    ACCAAAATCAACGGGACTTT
    CCAAAATGTCGTAATAACCC
    CGCCCCGTTGACGCAAATGG
    GCGGTAGGCGTGTACGGTGG
    GAGGTCTATATAAGCAGAGG
    TCGTTTAGTGAACCGTCAGA
    TCACTAGTAGCTTTATTGCG
    GTAGTTTATCACAGTTAAAT
    TGCTAACGCAGTCAGTGCTC
    GACTGATCACAGGTAAGTAT
    CAAGGTTACAAGACAGGTTT
    AAGGAGGCCAATAGAAACTG
    GGCTTGTCGAGACAGAGAAG
    ATTCTTGCGTTTCTGATAGG
    CACCTATTGGTCTTACTGAC
    ATCCACTTTGCCTTTCTCTC
    CACAGGG
    JET 417 GAATTCGGGCGGAGTTAGGG
    CGGAGCCAATCAGCGTGCGC
    CGTTCCGAAAGTTGCCTTTT
    ATGGCTGGGCGGAGAATGGG
    CGGTGAACGCCGATGATTAT
    ATAAGGACGCGCCGGGTGTG
    GCACAGCTAGTTCCGTCGCA
    GCCGGGATTTGGGTCGCGGT
    TCTTGTTTGTGGATCCCTGT
    GATCGTCACTTGACA
    CAG 418 ATCTCGACTAGTTATTAATA
    GTAATCAATTACGGGGTCAT
    TAGTTCATAGCCCATATATG
    GAGTTCCGCGTTACATAACT
    TACGGTAAATGGCCCGCCTG
    GCTGACCGCCCAACGACCCC
    CGCCCATTGACGTCAATAAT
    GACGTATGTTCCCATAGTAA
    CGCCAATAGGGACTTTCCAT
    TGACGTCAATGGGTGGAGTA
    TTTACGGTAAACTGCCCACT
    TGGCAGTACATCAAGTGTAT
    CATATGCCAAGTACGCCCCC
    TATTGACGTCAATGACGGTA
    AATGGCCCGCCTGGCATTAT
    GCCCAGTACATGACCTTATG
    GGACTTTCCTACTTGGCAGT
    ACATCTACGTATTAGTCATC
    GCTATTACCATGGTCGAGGT
    GAGCCCCACGTTCTGCTTCA
    CTCTCCCCATCTCCCCCCCC
    TCCCCACCCCCAATTTTGTA
    TTTATTTATTTTTTAATTAT
    TTTGTGCAGCGATGGGGGCG
    GGGGGGGGGGGGGGGCGCGC
    GCCAGGCGGGGCGGGGCGGG
    GCGAGGGGCGGGGCGGGGCG
    AGGCGGAGAGGTGCGGCGGC
    AGCCAATCAGAGCGGCGCGC
    TCCGAAAGTTTCCTTTTATG
    GCGAGGCGGCGGCGGCGGCG
    GCCCTATAAAAAGCGAAGCG
    CGCGGCGGGCGGGGAGTCGC
    TGCGACGCTGCCTTCGCCCC
    GTGCCCCGCTCCGCCGCCGC
    CTCGCGCCGCCCGCCCCGGC
    TCTGACTGACCGCGTTACTC
    CCACAGGTGAGCGGGCGGGA
    CGGCCCTTCTCCTCCGGGCT
    GTAATTAGCGCTTGGTTTAA
    TGACGGCTTGTTTCTTTTCT
    GTGGCTGCGTGAAAGCCTTG
    AGGGGCTCCGGGAGGGCCCT
    TTGTGCGGGGGGAGCGGCTC
    GGGGGGTGCGTGCGTGTGTG
    TGTGCGTGGGGAGCGCCGCG
    TGCGGCTCCGCGCTGCCCGG
    CGGCTGTGAGCGCTGCGGGC
    GCGGCGCGGGGCTTTGTGCG
    CTCCGCAGTGTGCGCGAGGG
    GAGCGCGGCCGGGGGCGGTG
    CCCCGCGGTGCGGGGGGGGC
    TGCGAGGGGAACAAAGGCTG
    CGTGCGGGGTGTGTGCGTGG
    GGGGGTGAGCAGGGGGTGTG
    GGCGCGTCGGTCGGGCTGCA
    ACCCCCCCTGCACCCCCCTC
    CCCGAGTTGCTGAGCACGGC
    CCGGCTTCGGGTGCGGGGCT
    CCGTACGGGGCGTGGCGCGG
    GGCTCGCCGTGCCGGGCGGG
    GGGTGGCGGCAGGTGGGGGT
    GCCGGGCGGGGCGGGGCCGC
    CTCGGGCCGGGGAGGGCTCG
    GGGGAGGGGCGCGGCGGCCC
    CCGGAGCGCCGGCGGCTGTC
    GAGGCGCGGCGAGCCGCAGC
    CATTGCCTTTTATGGTAATC
    GTGCGAGAGGGCGCAGGGAC
    TTCCTTTGTCCCAAATCTGT
    GCGGAGCCGAAATCTGGGAG
    GCGCCGCCGCACCCCCTCTA
    GCGGGCGCGGGGCGAAGCGG
    TGCGGCGCCGGCAGGAAGGA
    AATGGGCGGGGAGGGCCTTC
    GTGCGTCGCCGCGCCGCCGT
    CCCCTTCTCCCTCTCCAGCC
    TCGGGGCTGTCCGCGGGGGG
    ACGGCTGCCTTCGGGGGGGA
    CGGGGCAGGGCGGGGTTCGG
    CTTCTGGCGTGTGACCGGCG
    GCTCTAGAGCCTCTGCTAAC
    CATGTTCATGCCTTCTTCTT
    TTTCCTACAGCTCCTGGGCA
    ACGTGCTGGTTATTGTGCTG
    TCTCATCATTTTGGCAAAGA
    ATT
    PGK 419 GGGGTTGGGGTTGCGCCTTT
    TCCAAGGCAGCCCTGGGTTT
    GCGCAGGGACGCGGCTGCTC
    TGGGCGTGGTTCCGGGAAAC
    GCAGCGGCGCCGACCCTGGG
    TCTCGCACATTCTTCACGTC
    CGTTCGCAGCGTCACCCGGA
    TCTTCGCCGCTACCCTTGTG
    GGCCCCCCGGCGACGCTTCC
    TGCTCCGCCCCTAAGTCGGG
    AAGGTTCCTTGCGGTTCGCG
    GCGTGCCGGACGTGACAAAC
    GGAAGCCGCACGTCTCACTA
    GTACCCTCGCAGACGGACAG
    CGCCAGGGAGCAATGGCAGC
    GCGCCGACCGCGATGGGCTG
    TGGCCAATAGCGGCTGCTCA
    GCAGGGCGCGCCGAGAGCAG
    CGGCCGGGAAGGGGCGGTGC
    GGGAGGCGGGGTGTGGGGCG
    GTAGTGTGGGCCCTGTTCCT
    GCCCGCGCGGTGTTCCGCAT
    TCTGCAAGCCTCCGGAGCGC
    ACGTCGGCAGTCGGCTCCCT
    CGTTGACCGAATCACCGACC
    TCTCTCCCCAG
    EF-1a 420 GAATTCAGGCTCCGGTGCCC
    GTCAGTGGGCAGAGCGCACA
    TCGCCCACAGTCCCCGAGAA
    GTTGGGGGGAGGGGTCGGCA
    ATTGAACCGGTGCCTAGAGA
    AGGTGGCGCGGGGTAAACTG
    GGAAAGTGATGTCGTGTACT
    GGCTCCGCCTTTTTCCCGAG
    GGTGGGGGAGAACCGTATAT
    AAGTGCAGTAGTCGCCGTGA
    ACGTTCTTTTTCGCAACGGG
    TTTGCCGCCAGAACACAGGT
    AAGTGCCGTGTGTGGTTCCC
    GCGGGCCTGGCCTCTTTACG
    GGTTATGGCCCTTGCGTGCC
    TTGAATTACTTCCACCTGGC
    TGCAGTACGTGATTCTTGAT
    CCCGAGCTTCGGGTTGGAAG
    TGGGTGGGAGAGTTCGAGGC
    CTTGCGCTTAAGGAGCCCCT
    TCGCCTCGTGCTTGAGTTGA
    GGCCTGGCCTGGGCGCTGGG
    GCCGCCGCGTGCGAATCTGG
    TGGCACCTTCGCGCCTGTCT
    CGCTGCTTTCGATAAGTCTC
    TAGCCATTTAAAATTTTTGA
    TGACCTGCTGCGACGCTTTT
    TTTCTGGCAAGATAGTCTTG
    TAAATGCGGGCCAAGATCTG
    CACACTGGTATTTCGGTTTT
    TGGGGCCGCGGGCGGCGACG
    GGGCCCGTGCGTCCCAGCGC
    ACATGTTCGGCGAGGCGGGG
    CCTGCGAGCGCGGCCACCGA
    GAATCGGACGGGGGTAGTCT
    CAAGCTGGCCGGCCTGCTCT
    GGTGCCTGGTCTCGCGCCGC
    CGTGTATCGCCCCGCCCTGG
    GCGGCAAGGCTGGCCCGGTC
    GGCACCAGTTGCGTGAGCGG
    AAAGATGGCCGCTTCCCGGC
    CCTGCTGCAGGGAGCTCAAA
    ATGGAGGACGCGGCGCTCGG
    GAGAGCGGGCGGGTGAGTCA
    CCCACACAAAGGAAAAGGGC
    CTTTCCGTCCTCAGCCGTCG
    CTTCATGTGACTCCACGGAG
    TACCGGGCGCCGTCCAGGCA
    CCTCGATTAGTTCTCGAGCT
    TTTGGAGTACGTCGTCTTTA
    GGTTGGGGGGAGGGGTTTTA
    TGCGATGGAGTTTCCCCACA
    CTGAGTGGGTGGAGACTGAA
    GTTAGGCCAGCTTGGCACTT
    GATGTAATTCTCCTTGGAAT
    TTGCCCTTTTTGAGTTTGGA
    TCTTGGTTCATTCTCAAGCC
    TCAGACAGTGGTTCAAAGTT
    TTTTTCTTCCATTTCAGGTG
    TCGTGACATCATTTT
  • Subsequently, keeping the MAD7-RNP amounts constant, the effect of various homology arm lengths (100 vs 500 bp) and HDR template amounts (0.125 μg μL⁻¹, 0.25 μg μL⁻¹, 0.5 μg μL⁻¹, and 1 μg μL⁻¹) on the insertion efficiency was evaluated using JET and EF1a promoters. Up to 30% higher integration efficiency was observed with HDR templates flanked with HA of 500 compared to 100 base pairs. Moreover, the data showed improved insertion efficiencies with increasing amounts of HDR templates flanked with either 100 or 500 base pair HA but at the same time somewhat reduced cell viability (FIG. 19). Specifically, FIG. 19 shows GFP insertion efficiency at AAVS1 (n=3; Mean±SD) in T-cell leukemic cell line measured at days 2, 7, 14, and 21 post-transfection as a function of donor template amount. No transient GFP expression was observed at day 21 post-transfection. Cell viability (black circles) was measured at day 2 post-transfection. Top panels display GFP insertion efficiencies using donor template flanked with short homology arms (100 bp HA), and bottom panels donor template flanked with long homology arms (500 bp HA). Left panels display GFP insertion efficiencies using donor template containing EF-1α promoter (long, ~2000 bp), and right panels donor template containing JET promoter (short, ~1000 bp). Amount of donor template, represented by the gradient above the bars, increases from 0.125, 0.25, 0.5 to 1 μg μL⁻¹. Dark grey bars represent mean insertion frequency using crAAVS1. Light grey bars represent mean insertion frequency using crIDTneg (IDT).
  • Next, using primary T-cells isolated from the human peripheral blood from three donors and a protocol selected from the experiments above, i.e., 150:100 pmol gNA:MAD7 RNP complex together with 1 μg μL⁻¹ HDR template, in combination with 100 μg μL⁻¹ poly-L-glutamic acid (PGA), integration efficiency of a clinically relevant CAR transgene containing JET or EF1a promoter flanked with HA of 100 or 500 base pairs and a bovine growth hormone derived polyadenylation sequence was analyzed. An anti-CD19 CAR with fully human variable regions (Hu19CAR), CD8α hinge and transmembrane domains, a CD28 costimulatory domain, and CD3ζ activation domain was used. Moderate insertion efficiency at AAVS1 but stable CAR expression of up to 14% and 16% was observed using HDR templates flanked with 100 and 500 base pair HA, respectively. The normalized cell viability measured 24 h post-transfection was in some cases relatively low, ranging from 22% with JET-500-CAR, 35% with JET-100-CAR, 43% with EF1a-100-CAR, to 55% with EF1a-500-CAR (FIG. 20). It is important to emphasize that both CAR insertion efficiency and cell viability were higher in the treatment with PGA compared to the treatment without PGA (P≤0.05; data not shown). Specifically, FIG. 20 shows CAR insertion efficiency at AAVS1 (D=3; n=3; Mean±SD) in primary Pan T-cells measured at days 7 and 11 post-transfection. Cell viability was measured 24 hours post-transfection. Individual panels display CAR insertion efficiencies using donor template structure as described in FIG. 19. Amount of donor template, MAD7-RNP, and PGA was 1 μg μL⁻¹, 100:150 pmol MAD7:gNA, and 100 μg μL⁻¹, in that order. Nucleofection program P3-EH-115 for transfection of primary T-cells was used. D represents the number of biological replicates, and n the number of technical replicates per D. Dark grey bars represent mean insertion frequency using crAAVS1. Light grey bars represent mean insertion frequency using crIDTneg (IDT).
  • Multiple parameters were reevaluated to further optimize primary T-cell viability and CAR insertion efficiencies at AAVS1. Using Pan T-cells isolated from the blood from two donors, the effect of RNP amount with 100 μg μL⁻¹ PGA and EF1a-500-CAR template amount on CAR insertion efficiency and cell viability was tested (data not shown). Reducing the RNP amount to 75:50 pmol gNA:MAD7 RNP complex while increasing the donor template amount to 1.5 μg μL⁻¹ led to improved CAR insertion efficiencies without significantly affecting cell viability (P≥0.05; data not shown). In addition, using the abovementioned transfection conditions in combination with cell recovery in a post-transfection cultivation medium pretreated with 2 μM M3814 resulted in nearly 5 times more efficient CAR insertion than other experiments (FIG. 21). The optimized CRISPR-MAD7 transfection protocol resulted in CAR insertion efficiency of up to 85% 13 days post-transfection (median 65%) together with a median normalized cell viability as high as 62% 24 hours post-transfection. Specifically, FIG. 21 shows CAR insertion efficiency at AAVS1 (D=5; n=3) in primary Pan T-cells measured at day 7 post-transfection, and re-measured in two biological replicates at day 13 post-transfection (D=2; n=3). Cell viability was measured 24 hours post-transfection (D=5; n=3; Mean±SD). Amount or concentration of donor template, MAD7-RNP, PGA, and M3814 was 1.5 μg μL⁻¹, 50:75 pmol MAD7:gNA, 100 μg μL⁻¹, and 2 μM, respectively. Nucleofection program P3-EH-115 for transfection of primary T-cells was used. D represents the number of biological replicates, and n the number of technical replicates per D. Dark grey bars represent mean insertion frequency using crAAVS1. Light grey bars represent mean insertion frequency using crIDTneg (IDT).
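  • The RNP amounts quoted in these Examples (100:150 pmol and 50:75 pmol MAD7:gNA) follow the constant 1:1.5 MAD7:gNA molar ratio stated for FIG. 2 and FIG. 17. A small Python helper, with a hypothetical name, makes the arithmetic explicit; it is only a convenience sketch and not part of the described protocol.

```python
def rnp_amounts(mad7_pmol, gna_per_mad7=1.5):
    """Return (MAD7 pmol, gNA pmol) for a fixed gNA:MAD7 molar ratio.

    The default ratio of 1.5 reflects the constant 1:1.5 MAD7:gNA ratio
    stated in the text; other ratios are not described there.
    """
    if mad7_pmol <= 0:
        raise ValueError("MAD7 amount must be positive")
    return mad7_pmol, mad7_pmol * gna_per_mad7
```

For instance, `rnp_amounts(100)` gives 100 pmol MAD7 with 150 pmol gNA, and `rnp_amounts(50)` gives 50 pmol MAD7 with 75 pmol gNA.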
  • EQUIVALENTS
  • Throughout the description, where compositions are described as having, including, or comprising specific components, or where processes and methods are described as having, including, or comprising specific steps, it is contemplated that, additionally, there are compositions of the present invention that consist essentially of, or consist of, the recited components, and that there are processes and methods according to the present invention that consist essentially of, or consist of, the recited processing steps.
  • In the application, where an element or component is said to be included in and/or selected from a list of recited elements or components, it should be understood that the element or component can be any one of the recited elements or components, or the element or component can be selected from a group consisting of two or more of the recited elements or components.
  • Further, it should be understood that elements and/or features of a composition or a method described herein can be combined in a variety of ways without departing from the spirit and scope of the present invention, whether explicit or implicit herein. For example, where reference is made to a particular compound, that compound can be used in various embodiments of compositions of the present invention and/or in methods of the present invention, unless otherwise understood from the context. In other words, within this application, embodiments have been described and depicted in a way that enables a clear and concise application to be written and drawn, but it is intended and will be appreciated that embodiments may be variously combined or separated without departing from the present teachings and invention(s). For example, it will be appreciated that all features described and depicted herein can be applicable to all aspects of the invention(s) described and depicted herein.
  • The terms “a” and “an” and “the” and similar references in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. For example, the term “a cell” includes a plurality of cells, including mixtures thereof. Where the plural form is used for compounds, salts, or the like, this is taken to mean also a single compound, salt, or the like.
  • It should be understood that the expression “at least one of” includes individually each of the recited objects after the expression and the various combinations of two or more of the recited objects unless otherwise understood from the context and use. The expression “and/or” in connection with three or more recited objects should be understood to have the same meaning unless otherwise understood from the context.
  • The use of the term “include,” “includes,” “including,” “have,” “has,” “having,” “contain,” “contains,” or “containing,” including grammatical equivalents thereof, should be understood generally as open-ended and non-limiting, for example, not excluding additional unrecited elements or steps, unless otherwise specifically stated or understood from the context.
  • Where the use of the term “about” is before a quantitative value, the present invention also includes the specific quantitative value itself, unless specifically stated otherwise. As used herein, the term “about” refers to a ±10% variation from the nominal value unless otherwise indicated or inferred.
  • It should be understood that the order of steps or order for performing certain actions is immaterial so long as the present invention remains operable. Moreover, two or more steps or actions may be conducted simultaneously.
  • The use of any and all examples, or exemplary language herein, for example, “such as” or “including,” is intended merely to illustrate better the present invention and does not pose a limitation on the scope of the invention unless claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the present invention.
  • The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein. Scope of the invention is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
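  • For illustration, the ±10% convention stated above for the term "about" can be expressed as a short numeric check. This is a sketch under stated assumptions: the function names `about_range` and `is_about` are hypothetical, and treating the interval endpoints as included is an assumption, since the text does not say whether the bounds are inclusive.

```python
def about_range(nominal: float, tolerance: float = 0.10) -> tuple[float, float]:
    """Return the (low, high) interval for 'about nominal' at +/- tolerance."""
    delta = abs(nominal) * tolerance
    return (nominal - delta, nominal + delta)


def is_about(value: float, nominal: float, tolerance: float = 0.10) -> bool:
    """True if value lies within +/- tolerance of nominal.

    Endpoints are treated as inside the range (an assumption).
    """
    low, high = about_range(nominal, tolerance)
    return low <= value <= high


# "about 100" under the default 10% variation spans roughly 90 to 110.
print(about_range(100))
print(is_about(95, 100), is_about(111, 100))
```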
  • Embodiments
  • In embodiment 1 provided herein is a composition comprising a nucleic acid-guided nuclease comprising a Type V CRISPR nuclease polypeptide comprising at least one nuclear localization signal (NLS) at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 2 provided herein is the composition of embodiment 1 wherein the nuclease is a Type Va nuclease. In embodiment 3 provided herein is the composition of embodiment 1 or embodiment 2 wherein the Type V CRISPR nuclease polypeptide has at least 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, or 100% sequence identity, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% sequence identity with SEQ ID NO: 1. In embodiment 4 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises two NLSs, one or both of which are at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 5 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises three NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 6 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises four NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 7 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises at least five NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 8 provided herein is the composition of any one of embodiments 4 through 7 wherein at least two of the NLSs are at or near the N-terminus of the polypeptide. 
In embodiment 9 provided herein is the composition of any one of embodiments 5 through 7 wherein at least three of the NLSs are at or near the N-terminus of the polypeptide. In embodiment 10 provided herein is the composition of any one of embodiments 6 through 7 wherein at least four of the NLSs are at or near the N-terminus of the polypeptide. In embodiment 11 provided herein is the composition of embodiment 7 wherein the 5 NLSs are at or near the N-terminus of the polypeptide. In embodiment 12 provided herein is the composition of embodiment 11 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112. In embodiment 13 provided herein is the composition of any one of embodiments 1 through 3 wherein the Type V CRISPR nuclease polypeptide comprises at least 1-30, 1-20, 1-15, 1-10, 1-9, 1-8, 1-7, 1-6, 1-5, 2-30, 2-20, 2-15, 2-10, 2-9, 2-8, 2-7, 2-6, 2-5, 3-30, 3-20, 3-15, 3-10, 3-9, 3-8, 3-7, 3-6, or 3-5, preferably 1-10, more preferably 2-10, even more preferably 3-10 NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide. In embodiment 14 provided herein is the composition of any one of embodiments 4 through 11 wherein at least two of the NLSs have different nuclear localization mechanisms. In embodiment 15 provided herein is the composition of any one of embodiments 5 through 7 or 9 through 11 wherein at least three of the NLSs have different nuclear localization mechanisms. In embodiment 16 provided herein is the composition of any previous embodiment wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen, an NLS from nucleoplasmin, e.g. 
a nucleoplasmin bipartite NLS, a c-myc NLS; a hRNPA1 M9 NLS; an IBB domain of importin-alpha NLS; a myoma T protein NLS; a sequence from human p53 NLS; a sequence of mouse c-abl IV NLS; a sequence of influenza virus NS1 NLS; a sequence of Hepatitis virus delta antigen NLS; a sequence of mouse Mx1 protein NLS; a sequence of human poly(ADP-ribose) polymerase NLS; a sequence of steroid hormone receptors (human) glucocorticoid NLS; and/or a sequence of EGL-13 NLS. In embodiment 17 provided herein is the composition of embodiment 16 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen. In embodiment 18 provided herein is the composition of embodiment 16 wherein two or more of the NLSs comprises an NLS of the SV40 virus large T-antigen. In embodiment 19 provided herein is the composition of embodiment 17 or embodiment 18 wherein the NLS or NLSs comprises the sequence of SEQ ID NO: 5. In embodiment 20 provided herein is the composition of any one of embodiments 16 through 19 wherein one or more of the NLSs comprises an NLS from nucleoplasmin. In embodiment 21 provided herein is the composition of embodiment 20 wherein the nucleoplasmin NLS comprises the sequence of SEQ ID NO: 6. In embodiment 22 provided herein is the composition of any one of embodiments 16 through 21 wherein one or more of the NLSs comprises a c-myc NLS. In embodiment 23 provided herein is the composition of embodiment 22 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 21. In embodiment 24 provided herein is the composition of embodiment 23 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 21. In embodiment 25 provided herein is the composition of any one of embodiments 16 through 24 wherein one or more of the NLSs comprises a sequence of EGL-13 NLS. In embodiment 26 provided herein is the composition of embodiment 25 wherein the EGL-13 NLS comprises the sequence of SEQ ID NO: 107. 
In embodiment 27 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide further comprises a purification tag. In embodiment 28 provided herein is the composition of embodiment 27 wherein the purification tag is at or near the N-terminus of the nuclease polypeptide. In embodiment 29 provided herein is the composition of embodiment 27 or embodiment 28 wherein the purification tag comprises a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421) or Gly-8×His tag (SEQ ID NO: 422); short epitope tags, e.g., FLAG, hemagglutinin (HA), c-myc, T7, Glu-Glu; maltose binding protein (mbp); N-terminal glutathione S-transferase (GST); or calmodulin binding peptide (CBP). In embodiment 30 provided herein is the composition of embodiment 29 wherein the purification tag comprises a poly-his tag. In embodiment 31 provided herein is the composition of embodiment 30 wherein the purification tag comprises a gly-6×His tag (SEQ ID NO: 421). In embodiment 32 provided herein is the composition of embodiment 30 wherein the purification tag comprises a gly-8×His tag (SEQ ID NO: 422). In embodiment 33 provided herein is the composition of any previous embodiment wherein the Type V CRISPR nuclease polypeptide comprises a cleavage site. In embodiment 34 provided herein is the composition of embodiment 33 wherein the cleavage site is at or near the N-terminus of the nuclease polypeptide. In embodiment 35 provided herein is the composition of embodiment 33 or embodiment 34 wherein the cleavage site comprises a Tobacco Etch Virus (TEV) cleavage site. In embodiment 36 provided herein is the composition of embodiment 35 wherein the cleavage site comprises the sequence of SEQ ID NO: 108. In embodiment 37 provided herein is the composition of embodiment 36 comprising 5 NLSs at or near the N-terminus of the polypeptide, a purification tag, and the cleavage site, wherein the cleavage site is after the purification tag. 
In embodiment 38 provided herein is the composition of embodiment 37 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 111 or 112. In embodiment 39 provided herein is the composition of embodiment 37 comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 112. In embodiment 40 provided herein is the composition of any previous embodiment further comprising a guide nucleic acid (gNA), e.g., gRNA, comprising a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA is compatible with the Type V CRISPR nuclease. In embodiment 41 provided herein is the composition of embodiment 40 wherein the target nucleotide is within 50 nucleotides of a protospacer adjacent motif (PAM) sequence specific for the Type V CRISPR nuclease. In embodiment 42 provided herein is the composition of embodiment 41 wherein the PAM comprises a sequence of YTTN, wherein Y is T or C and N is A, T, G, or C. In embodiment 43 provided herein is the composition of embodiment 42 wherein the PAM comprises a sequence of YTTV or TTTV, wherein V is A, G, or C. In embodiment 44 provided herein is the composition of embodiment 40 wherein the gNA is a gRNA. In embodiment 45 provided herein is the composition of embodiment 44 wherein the gRNA is a dual gRNA. In embodiment 46 provided herein is the composition of embodiment 44 or embodiment 45 wherein the composition comprises the gRNA and the gRNA comprises one or more chemical modifications. 
In embodiment 47 provided herein is the composition of embodiment 46 wherein the chemical modification comprises a 2′-O-alkyl, a 2′-O-methyl, a phosphorothioate, a phosphonoacetate, a thiophosphonoacetate, a 2′-O-methyl-3′-phosphorothioate, a 2′-O-methyl-3′-phosphonoacetate, a 2′-O-methyl-3′-thiophosphonoacetate, a 2′-deoxy-3′-phosphonoacetate, a 2′-deoxy-3′-thiophosphonoacetate, a suitable alternative, or a combination thereof. In embodiment 48 provided herein is the composition of any one of embodiments 44 through 47 wherein a ratio of guanine:uracil in the gRNA is at least 51:49, 52:48, 53:47, 54:46, 55:45, 56:44, 57:43, 58:42, 59:41, or 60:40, preferably at least 53:47, more preferably at least 54:46, even more preferably at least 55:45. In embodiment 49 provided herein is the composition of any one of embodiments 40 through 48 wherein the molar ratio of gNA, e.g., gRNA to Type V CRISPR nuclease is at least 1.1:1, 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, or 3:1 and/or not more than 1.2:1, 1.3:1, 1.4:1, 1.5:1, 1.6:1, 1.7:1, 1.8:1, 2:1, 2.2:1, 2.5:1, 3:1, or 4:1, preferably 1.1:1 to 2.5:1, more preferably 1.2:1 to 2:1, even more preferably 1.2:1 to 1.7:1. In embodiment 50 provided herein is the composition of any one of embodiments 40 through 49 wherein the molar amount of gNA, e.g., gRNA, is at least 10, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190 or 200 pmol and/or not more than 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, 130, 140, 150, 170, 190, 200, 250, or 300 pmol, preferably 25-200 pmol, more preferably 50-100 pmol, even more preferably 65 to 85 pmol. In embodiment 51 provided herein is the composition of any one of embodiments 40 through 50 further comprising a donor template. In embodiment 52 provided herein is the composition of embodiment 51 wherein the donor template comprises homology arms. 
In embodiment 53 provided herein is the composition of embodiment 51 or embodiment 52 wherein the donor template is present in an amount of at least 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, or 5 μg μL−1 and/or not more than 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2, 2.5, 3, 4, 5, 7, or 10 μg μL−1, preferably 0.3 to 2 μg μL−1, more preferably 0.5 to 1.5 μg μL−1, even more preferably 0.8 to 1.2 μg μL−1. In embodiment 54 provided herein is the composition of any one of embodiments 40 through 53 further comprising an anionic polymer. In embodiment 55 provided herein is the composition of embodiment 54 wherein the anionic polymer comprises polyglutamic acid (PGA). In embodiment 56 provided herein is the composition of embodiment 54 or embodiment 55 wherein the anionic polymer is present at a concentration of at least 20, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, or 500 μg μL−1 and/or not more than 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 170, 200, 250, 300, 400, 500, 700, or 1000 μg μL−1, preferably 20 to 200 μg μL−1, more preferably 50 to 150 μg μL−1, even more preferably 80 to 120 μg μL−1.
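  • The PAM definitions in embodiments 42 and 43 (YTTN, YTTV, and TTTV, with Y = T or C, N = A, T, G, or C, and V = A, G, or C) and the guanine:uracil ratio of embodiment 48 lend themselves to a short illustrative check. The following is a sketch only: the function names are hypothetical, the scan covers a single strand read 5′ to 3′, and reading "a ratio of at least 51:49" as G/(G+U) ≥ 0.51 is an assumption about how the ratio is computed.

```python
import re

# IUPAC-style motifs from embodiments 42-43 expanded to regular expressions.
PAM_PATTERNS = {
    "YTTN": re.compile(r"[TC]TT[ATGC]"),
    "YTTV": re.compile(r"[TC]TT[AGC]"),
    "TTTV": re.compile(r"TTT[AGC]"),
}


def matches_pam(seq: str, motif: str = "YTTN") -> bool:
    """True if the 4-nt DNA sequence is an instance of the named PAM motif."""
    return PAM_PATTERNS[motif].fullmatch(seq.upper()) is not None


def find_pam_sites(dna: str, motif: str = "YTTN") -> list[int]:
    """0-based start positions of all (possibly overlapping) PAM matches."""
    # A zero-width lookahead lets overlapping matches be reported.
    lookahead = re.compile(f"(?=({PAM_PATTERNS[motif].pattern}))")
    return [m.start() for m in lookahead.finditer(dna.upper())]


def guanine_uracil_ratio_ok(grna: str, min_g_fraction: float = 0.51) -> bool:
    """True if G/(G+U) in the gRNA meets the threshold (assumed interpretation)."""
    s = grna.upper()
    g, u = s.count("G"), s.count("U")
    if g + u == 0:
        return False  # ratio undefined when neither base is present
    return g / (g + u) >= min_g_fraction


print(find_pam_sites("ACTTTAGG", "YTTN"))
print(find_pam_sites("ACTTTAGG", "YTTV"))
```

  • In the example sequence, CTTT and TTTA both satisfy YTTN, while only TTTA satisfies YTTV because V excludes T.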
  • In embodiment 57 provided herein is a cell containing the composition of any previous embodiment. In embodiment 58 provided herein is the cell of embodiment 57 wherein the cell is a human cell. In embodiment 59 provided herein is the cell of embodiment 58 wherein the cell is an immune cell or a stem cell. In embodiment 60 provided herein is the cell of embodiment 59 wherein the cell is an immune cell. In embodiment 61 provided herein is the cell of embodiment 60 wherein the cell is a T cell. In embodiment 62 provided herein is the cell of embodiment 59 wherein the cell is a stem cell. In embodiment 63 provided herein is the cell of embodiment 62 wherein the cell is an induced pluripotent stem cell (iPSC).
  • In embodiment 64 provided herein is a method comprising inserting a composition of any one of embodiments 1 through 56 into a cell. In embodiment 65 provided herein is the method of embodiment 64 wherein inserting the composition into the cell comprises electroporation.
  • In embodiment 66 provided herein is a method for modifying a target polynucleotide comprising (i) contacting the composition of any one of embodiments 40 through 56 and (ii) allowing the nuclease and the guide nucleic acid to modify a targeted genomic region. In embodiment 67 provided herein is the method of embodiment 66 wherein the composition is a composition of any one of embodiments 51 through 56. In embodiment 68 provided herein is the method of embodiment 66 or embodiment 67 wherein the target polynucleotide is a genome or a portion of a genome within a cell. In embodiment 69 provided herein is the method of embodiment 68 wherein the cell is a human cell. In embodiment 70 provided herein is the method of embodiment 69 wherein the cell is an immune cell or a stem cell. In embodiment 71 provided herein is the method of embodiment 70 wherein the cell is an immune cell. In embodiment 72 provided herein is the method of embodiment 71 wherein the cell is a T cell. In embodiment 73 provided herein is the method of embodiment 70 wherein the cell is a stem cell. In embodiment 74 provided herein is the method of embodiment 73 wherein the stem cell is an iPSC. In embodiment 75 provided herein is the method of any one of embodiments 67 through 74 wherein the donor template comprises a mutation in a PAM within 50 nucleotides of the target nucleotide sequence in the target polynucleotide. In embodiment 76 provided herein is the method of any one of embodiments 68 through 74 wherein the composition is a composition of embodiment 67 and the donor template comprises a polynucleotide coding for a polypeptide to be expressed by the cell. In embodiment 77 provided herein is the method of embodiment 76 wherein the polypeptide to be expressed by the cell comprises a chimeric antigen receptor (CAR) or a portion thereof. In embodiment 78 provided herein is the method of embodiment 77 wherein the cell is a human T cell or a human iPSC. 
In embodiment 79 provided herein is the method of embodiment 77 wherein the cell is a human T cell. In embodiment 80 provided herein is the method of embodiment 77 wherein the cell is a human iPSC.
  • In embodiment 81 provided herein is a composition comprising a first polynucleotide coding for a polypeptide comprising a nucleic acid-guided nuclease comprising a CRISPR Type V nuclease polypeptide, wherein the polynucleotide has less than 75% sequence identity to SEQ ID NO: 22. In embodiment 82 provided herein is the composition of embodiment 81 wherein the nuclease polypeptide comprises at least 1, 2, 3, 4, or 5 NLSs, wherein each of the NLSs is at or near the N-terminus or the C-terminus of the nuclease polypeptide. In embodiment 83 provided herein is the composition of embodiment 82 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen, an NLS from nucleoplasmin, e.g. a nucleoplasmin bipartite NLS, a c-myc NLS; a hRNPA1 M9 NLS; an IBB domain of importin-alpha NLS; a myoma T protein NLS; a sequence from human p53 NLS; a sequence of mouse c-abl IV NLS; a sequence of influenza virus NS1 NLS; a sequence of Hepatitis virus delta antigen NLS; a sequence of mouse Mx1 protein NLS; a sequence of human poly(ADP-ribose) polymerase NLS; a sequence of steroid hormone receptors (human) glucocorticoid NLS; and/or a sequence of EGL-13 NLS. In embodiment 84 provided herein is the composition of embodiment 83 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen. In embodiment 85 provided herein is the composition of embodiment 84 wherein the NLS or NLSs comprises the sequence of SEQ ID NO: 5. In embodiment 86 provided herein is the composition of any one of embodiments 83 through 85 wherein one or more of the NLSs comprises an NLS from nucleoplasmin. In embodiment 87 provided herein is the composition of embodiment 86 wherein the nucleoplasmin NLS comprises the sequence of SEQ ID NO: 6. In embodiment 88 provided herein is the composition of any one of embodiments 83 through 87 wherein one or more of the NLSs comprises a c-myc NLS. 
In embodiment 89 provided herein is the composition of embodiment 88 wherein the c-myc NLS comprises the sequence of SEQ ID NO: 7, SEQ ID NO: 8, or SEQ ID NO: 21. In embodiment 90 provided herein is the composition of embodiment 88 wherein the c-myc NLS comprises the sequence SEQ ID NO: 21. In embodiment 91 provided herein is the composition of any one of embodiments 83 through 90 wherein one or more of the NLSs comprises a sequence of EGL-13 NLS. In embodiment 92 provided herein is the composition of embodiment 91 wherein the EGL-13 NLS comprises the sequence of SEQ ID NO: 107. In embodiment 93 provided herein is the composition of any one of embodiments 82 through 92 wherein the NLS or NLSs is at or near the N-terminus of the polypeptide. In embodiment 94 provided herein is the composition of any one of embodiments 81 through 93 wherein the first polynucleotide comprises a polynucleotide coding for a purification tag. In embodiment 95 provided herein is the composition of embodiment 94 wherein the purification tag is at or near the N-terminus of the nuclease polypeptide. In embodiment 96 provided herein is the composition of embodiment 94 or 95 wherein the purification tag comprises a poly-his tag, such as a Gly-6×His tag (SEQ ID NO: 421) or Gly-8×His tag (SEQ ID NO: 422); short epitope tags, e.g., FLAG, hemagglutinin (HA), c-myc, T7, Glu-Glu; maltose binding protein (mbp); N-terminal glutathione S-transferase (GST); or calmodulin binding peptide (CBP). In embodiment 97 provided herein is the composition of embodiment 96 wherein the purification tag comprises a poly-his tag. In embodiment 98 provided herein is the composition of embodiment 97 wherein the purification tag comprises a gly-6×His tag (SEQ ID NO: 421). In embodiment 99 provided herein is the composition of embodiment 97 wherein the purification tag comprises a gly-8×His tag (SEQ ID NO: 422). 
In embodiment 100 provided herein is the composition of any one of embodiments 81 through 99 wherein the Type V CRISPR nuclease polypeptide comprises a cleavage site. In embodiment 101 provided herein is the composition of embodiment 100 wherein the cleavage site is at or near the N-terminus of the nuclease polypeptide. In embodiment 102 provided herein is the composition of embodiment 100 or 101 wherein the cleavage site comprises a Tobacco Etch Virus (TEV) cleavage site. In embodiment 103 provided herein is the composition of embodiment 102 wherein the cleavage site comprises the sequence of SEQ ID NO: 108. In embodiment 104 provided herein is the composition of embodiment 103 comprising 5 NLSs at or near the N-terminus of the polypeptide, a purification tag, and the cleavage site, wherein the cleavage site is after the purification tag. In embodiment 105 provided herein is the composition of any one of embodiments 81 through 104 wherein the polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to any one of SEQ ID NOs: 109-112. In embodiment 106 provided herein is the composition of any one of embodiments 81 through 105 wherein the polynucleotide codes for a polypeptide comprising a sequence at least 60, 70, 80, 85, 90, 95, 98, 99%, or 100%, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 112. 
In embodiment 107 provided herein is the composition of any one of embodiments 81 through 105 wherein the first polynucleotide comprises a sequence at least 50, 60, 70, 80, 90, 95, 97, or 99% identical, or 100% identical, preferably at least 80%, more preferably at least 90%, even more preferably at least 95%, still more preferably at least 98% identical to SEQ ID NO: 113. In embodiment 108 provided herein is the composition of any one of embodiments 81 through 107 further comprising a second polynucleotide coding for a gNA or portion thereof, wherein the gNA, e.g., gRNA, comprises a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, e.g., gRNA, wherein the gNA, e.g., gRNA is compatible with the Type V CRISPR nuclease. In embodiment 109 provided herein is the composition of embodiment 108 wherein the first and second polynucleotides are the same. In embodiment 110 provided herein is the composition of any one of embodiments 81 through 109 further comprising a third polynucleotide that comprises a donor template.
  • In embodiment 111 provided herein is a vector comprising the polynucleotide or polynucleotides of any one of embodiments 81 through 110.
  • In embodiment 112 provided herein is a cell comprising a composition of any one of embodiments 81 through 110. In embodiment 113 provided herein is the composition of embodiment 112 wherein the cell is a human cell. In embodiment 114 provided herein is the composition of embodiment 113 wherein the cell is an immune cell or a stem cell. In embodiment 115 provided herein is the composition of embodiment 113 wherein the cell is an immune cell. In embodiment 116 provided herein is the composition of embodiment 115 wherein the cell is a T cell. In embodiment 117 provided herein is the composition of embodiment 113 wherein the cell is a stem cell. In embodiment 118 provided herein is the composition of embodiment 117 wherein the cell is an iPSC.
  • In embodiment 119 provided herein is a method comprising inserting the composition of any one of embodiments 81 through 111 into a cell. In embodiment 120 provided herein is the method of embodiment 119 wherein inserting the composition into the cell comprises electroporation.
  • In embodiment 121 provided herein is a method comprising (i) inserting a composition of any one of embodiments 81 through 107 into a cell and (ii) inserting a gNA, e.g. a gRNA, compatible with the Type V CRISPR nuclease coded for by the composition, into the cell. In embodiment 122 provided herein is the method of embodiment 121 wherein steps (i) and (ii) comprise electroporation.
  • While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (28)

1.-122. (canceled)
123. A composition comprising a nucleic acid-guided nuclease comprising a Type V CRISPR nuclease polypeptide comprising at least three nuclear localization signals (NLS) at or near the N-terminus or the C-terminus of the polypeptide, wherein at least two of the NLSs are at or near the N-terminus of the polypeptide, or a polynucleotide encoding the nuclease.
124. The composition of claim 123 wherein the nuclease is a Type Va nuclease.
125. The composition of claim 123 wherein the Type V CRISPR nuclease polypeptide has at least 80% sequence identity with SEQ ID NO: 1.
126. The composition of claim 123 wherein the Type V CRISPR nuclease polypeptide comprises four NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
127. The composition of claim 123 wherein the Type V CRISPR nuclease polypeptide comprises at least five NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
128. The composition of claim 123 wherein at least three of the NLSs are at or near the N-terminus of the polypeptide.
129. The composition of claim 126 wherein at least four of the NLSs are at or near the N-terminus of the polypeptide.
130. The composition of claim 127 wherein the 5 NLSs are at or near the N-terminus of the polypeptide.
131. The composition of claim 130 wherein the polypeptide comprises a sequence at least 80% identical to any one of SEQ ID NOs: 109-112.
132. The composition of claim 123 wherein at least two of the NLSs have different nuclear localization mechanisms.
133. The composition of claim 123 wherein one or more of the NLSs comprises an NLS of the SV40 virus large T-antigen, an NLS from nucleoplasmin, e.g. a nucleoplasmin bipartite NLS, a c-myc NLS; a hRNPA1 M9 NLS; an IBB domain of importin-alpha NLS; a myoma T protein NLS; a sequence from human p53 NLS; a sequence of mouse c-abl IV NLS; a sequence of influenza virus NS1 NLS; a sequence of Hepatitis virus delta antigen NLS; a sequence of mouse Mx1 protein NLS; a sequence of human poly(ADP-ribose) polymerase NLS; a sequence of steroid hormone receptors (human) glucocorticoid NLS; and/or a sequence of EGL-13 NLS.
134. The composition of claim 123 wherein the Type V CRISPR nuclease polypeptide further comprises a purification tag at or near the N-terminus of the nuclease polypeptide.
135. The composition of claim 134 wherein the Type V CRISPR nuclease polypeptide comprises a cleavage site at or near the N-terminus of the nuclease polypeptide.
136. The composition of claim 135 comprising 5 NLSs at or near the N-terminus of the polypeptide, a purification tag, and the cleavage site, wherein the cleavage site is after the purification tag.
137. The composition of claim 136 comprising a sequence at least 80% identical to SEQ ID NO: 111 or 112.
138. The composition of claim 123 further comprising a guide nucleic acid (gNA) comprising a spacer sequence that targets a target nucleotide sequence within a polynucleotide, or a polynucleotide coding for the gNA, wherein the gNA is compatible with the Type V CRISPR nuclease.
139. The composition of claim 138 wherein the gRNA is a dual gRNA.
140. A method for modifying a target polynucleotide comprising:
(A) contacting the polynucleotide with a composition comprising
(i) a nucleic acid-guided nuclease comprising a Type V CRISPR nuclease polypeptide comprising at least three nuclear localization signals (NLS) at or near the N-terminus or the C-terminus of the polypeptide, wherein at least two of the NLSs are at or near the N-terminus of the polypeptide, or a polynucleotide encoding the nuclease, and
(ii) a guide RNA compatible with the nuclease and comprising a spacer sequence complementary to the target polynucleotide or a portion thereof; and
(B) allowing the nuclease to modify the target polynucleotide.
141. The method of claim 140 wherein the target polynucleotide is a genome or a portion of a genome within a cell.
142. The method of claim 140 wherein the guide RNA is a dual guide RNA.
143. The method of claim 140 wherein the nuclease is a Type Va nuclease.
144. The method of claim 140 wherein the Type V CRISPR nuclease polypeptide has at least 80% sequence identity with SEQ ID NO: 1.
145. The method of claim 140 wherein the Type V CRISPR nuclease polypeptide comprises four NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
146. The method of claim 140 wherein the Type V CRISPR nuclease polypeptide comprises at least five NLSs, each of which is at or near the N-terminus or the C-terminus of the polypeptide.
147. The method of claim 146 wherein at least three of the NLSs are at or near the N-terminus of the polypeptide.
148. The method of claim 146 wherein at least four of the NLSs are at or near the N-terminus of the polypeptide.
149. The method of claim 146 wherein the 5 NLSs are at or near the N-terminus of the polypeptide.
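The NLS count and placement limitations recited in claims 140 and 146 to 148 can be illustrated with a minimal sketch. This is only an illustration of the counting logic, not a claim construction or a method disclosed in the application; the class NlsLayout and the functions below are hypothetical names introduced here, and the sketch assumes each NLS has already been classified as being at or near the N-terminus or at or near the C-terminus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NlsLayout:
    """Hypothetical summary of where NLSs sit on a nuclease polypeptide."""
    n_terminal: int  # number of NLSs at or near the N-terminus
    c_terminal: int  # number of NLSs at or near the C-terminus

    @property
    def total(self) -> int:
        # Total NLSs at or near either terminus
        return self.n_terminal + self.c_terminal


def meets_claim_140(layout: NlsLayout) -> bool:
    # Claim 140: at least three NLSs, at least two at or near the N-terminus
    return layout.total >= 3 and layout.n_terminal >= 2


def meets_claim_146(layout: NlsLayout) -> bool:
    # Claim 146 (depends on 140): at least five NLSs in total
    return meets_claim_140(layout) and layout.total >= 5


def meets_claim_147(layout: NlsLayout) -> bool:
    # Claim 147 (depends on 146): at least three NLSs at or near the N-terminus
    return meets_claim_146(layout) and layout.n_terminal >= 3


def meets_claim_148(layout: NlsLayout) -> bool:
    # Claim 148 (depends on 146): at least four NLSs at or near the N-terminus
    return meets_claim_146(layout) and layout.n_terminal >= 4


if __name__ == "__main__":
    example = NlsLayout(n_terminal=4, c_terminal=1)
    print(meets_claim_140(example), meets_claim_148(example))
```

Each dependent-claim check calls the check for the claim it depends on, mirroring how a dependent claim incorporates the limitations of its parent.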
Application US18/141,363 (priority date 2021-05-06, filing date 2023-04-28), Modified nucleases, status Pending, published as US20230340437A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/141,363 US20230340437A1 (en) 2021-05-06 2023-04-28 Modified nucleases

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163185315P 2021-05-06 2021-05-06
US202263315483P 2022-03-01 2022-03-01
PCT/US2022/028208 WO2022236147A1 (en) 2021-05-06 2022-05-06 Modified nucleases
US18/141,363 US20230340437A1 (en) 2021-05-06 2023-04-28 Modified nucleases

Related Parent Applications (1)

Application Number Relation Publication Priority Date Filing Date Title
PCT/US2022/028208 Continuation WO2022236147A1 (en) 2021-05-06 2022-05-06 Modified nucleases

Publications (1)

Publication Number Publication Date
US20230340437A1 (en) 2023-10-26

Family

ID=81975392

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
US18/141,363 Pending US20230340437A1 (en) 2021-05-06 2023-04-28 Modified nucleases

Country Status (4)

Country Link
US (1) US20230340437A1 (en)
JP (1) JP2024518413A (en)
CA (1) CA3218053A1 (en)
WO (1) WO2022236147A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017106569A1 (en) * 2015-12-18 2017-06-22 The Regents Of The University Of California Modified site-directed modifying polypeptides and methods of use thereof
WO2020011985A1 (en) * 2018-07-12 2020-01-16 Keygene N.V. Type v crispr/nuclease-system for genome editing in plant cells
WO2020092057A1 (en) * 2018-10-30 2020-05-07 Yale University Compositions and methods for rapid and modular generation of chimeric antigen receptor t cells
US20200299661A1 (en) * 2017-12-11 2020-09-24 Editas Medicine, Inc. Cpf1-related methods and compositions for gene editing
US11649442B2 (en) * 2017-09-08 2023-05-16 The Regents Of The University Of California RNA-guided endonuclease fusion polypeptides and methods of use thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9790490B2 (en) 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems
US9896696B2 (en) 2016-02-15 2018-02-20 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
CN115103910A (en) 2019-10-03 2022-09-23 工匠开发实验室公司 CRISPR system with engineered dual guide nucleic acids
WO2021074191A1 (en) * 2019-10-14 2021-04-22 KWS SAAT SE & Co. KGaA Mad7 nuclease in plants and expanding its pam recognition capability
EP4100524A1 (en) 2020-02-05 2022-12-14 Danmarks Tekniske Universitet Compositions and methods for targeting, editing or modifying human genes

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Del’Guidice et al (PLOS ONE, 2018, 13:e0195558) *
Fernandez et al (Methods, 2018, 150:11-18) *
Li et al (Nature Biotechnology, 2018, 36:324-327 + online methods) *
Li et al Supplemental Information (Nature Biotechnology, 2018, 36:324) *
Liu et al (The CRISPR Journal, 2020, 3:97-108) *
Peters et al (Genome editing in human pluripotent stem cells. In: StemBook. Harvard Stem Cell Institute, Cambridge (MA); 2008. PMID: 23785737) *
Tsang et al (Genesis, 2018, 56(11-12):e23261) *
Zhang et al (Science Advances, 2017, 3:e1602814) *

Also Published As

Publication number Publication date
WO2022236147A1 (en) 2022-11-10
JP2024518413A (en) 2024-05-01
CA3218053A1 (en) 2022-11-10

Similar Documents

Publication Publication Date Title
US20200239863A1 (en) Tracking and Manipulating Cellular RNA via Nuclear Delivery of CRISPR/CAS9
CN115651927B (en) Methods and compositions for editing RNA
US20230332119A1 (en) Compositions comprising a cas12i2 variant polypeptide and uses thereof
CA3036926C (en) Modified stem cell memory t cells, methods of making and methods of using same
KR20220004674A (en) Methods and compositions for editing RNA
CA3026372A1 (en) High specificity genome editing using chemically modified guide rnas
US20230235363A1 (en) Crispr systems with engineered dual guide nucleic acids
TW202118873A (en) Compositions and methods for treatment of disorders associated with repetitive dna
JP2021517815A (en) Lymphopoiesis manipulation using the CAS9 base editor
WO2023023515A1 (en) Persistent allogeneic modified immune cells and methods of use thereof
US20230340437A1 (en) Modified nucleases
KR102666695B1 (en) Methods and compositions for editing RNA
WO2023137233A2 (en) Compositions and methods for editing genomes
WO2022256448A2 (en) Compositions and methods for targeting, editing, or modifying genes
US20230193243A1 (en) Compositions comprising a cas12i2 polypeptide and uses thereof
US20240042029A1 (en) Delivery of molecules to cells using trogocytosis and engineered cells
WO2023167882A1 (en) Composition and methods for transgene insertion
CN118019846A (en) Compositions comprising CRISPR nucleases and uses thereof
CN117136233A (en) Compositions comprising variant Cas12i4 polypeptides and uses thereof
WO2023183434A2 (en) Compositions and methods for generating cells with reduced immunogenicty
WO2024081383A2 (en) Compositions and methods for targeting, editing, or modifying genes
WO2024025908A2 (en) Compositions and methods for genome editing
CN117355607A (en) Non-viral homology mediated end ligation
EP4370676A2 (en) Compositions and methods for targeting, editing or modifying human genes
WO2023019243A1 (en) Compositions comprising a variant cas12i3 polypeptide and uses thereof

Legal Events

Date Code Title Description
AS Assignment

Owner name: ARTISAN DEVELOPMENT LABS, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAUMGARTNER, ROLAND;WARNECKE, TANYA;REEL/FRAME:063799/0362

Effective date: 20230525

AS Assignment

Owner name: FIRST-CITIZENS BANK & TRUST COMPANY, AS AGENT, CALIFORNIA

Free format text: SECURITY INTEREST;ASSIGNOR:ARTISAN DEVELOPMENT LABS, INC.;REEL/FRAME:064442/0952

Effective date: 20230728

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

AS Assignment

Owner name: CELYNTRA THERAPEUTICS SA, BELGIUM

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ARTISAN (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC;REEL/FRAME:066708/0978

Effective date: 20240305

Owner name: ARTISAN (ASSIGNMENT FOR THE BENEFIT OF CREDITORS), LLC, COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:ARTISAN DEVELOPMENT LABS, INC.;REEL/FRAME:066708/0973

Effective date: 20240108

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED