US20210207130A1 - Methods and compositions for the making and using of guide nucleic acids - Google Patents

Methods and compositions for the making and using of guide nucleic acids

Info

Publication number
US20210207130A1
Authority
US
United States
Prior art keywords
sequence
collection
dna
nucleic acid
segment
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US16/995,761
Inventor
Stephane B. GOURGUECHON
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARC Bio LLC
Original Assignee
ARC Bio LLC
Application filed by ARC Bio LLC
Priority to US16/995,761
Publication of US20210207130A1
Assigned to ARC BIO, LLC; assignment of assignors interest (see document for details). Assignors: GOURGUECHON, STEPHANE B.

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00 Libraries per se, e.g. arrays, mixtures
    • C40B40/04 Libraries containing only organic compounds
    • C40B40/06 Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111 General methods applicable to biologically active non-coding nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/115 Aptamers, i.e. nucleic acids binding a target molecule specifically and with high affinity without hybridising therewith; Nucleic acids binding to non-nucleic acids, e.g. aptamers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/10 Applications; Uses in screening processes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2750/00 ssDNA viruses
    • C12N2750/00011 Details
    • C12N2750/14011 Parvoviridae
    • C12N2750/14111 Dependovirus, e.g. adenoassociated viruses
    • C12N2750/14141 Use of virus, viral particle or viral elements as a vector
    • C12N2750/14143 Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/70 Vectors containing special elements for cloning, e.g. topoisomerase, adaptor sites

Definitions

  • gNA guide nucleic acid
  • gRNA guide RNA
  • Cas9 guide RNA-mediated Cas systems
  • gNA e.g., gRNA
  • compositions and methods to generate gNAs and collections of gNAs from any source nucleic acid; for example, gNAs and collections of gNAs can be generated from source DNA, such as genomic DNA.
  • source DNA such as genomic DNA.
  • Such gNAs and collections of the same are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest, genome-wide labeling, genome-wide editing, genome-wide functional screens, and genome-wide regulation.
  • the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.
  • the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence.
  • the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
  • the size of the second segment varies from 15-250 bp across the collection of nucleic acids. In some embodiments, at least 10% of the second segments in the collection are greater than 21 bp.
  • the size of the second segment is not 20 bp. In some embodiments, the size of the second segment is not 21 bp.
  • the collection of nucleic acids is a collection of DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.
  • the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
  • the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10^2 unique nucleic acid molecules. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
  • the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism.
  • the PAM sequence is AGG, CGG, or TGG.
  • the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • the third segment comprises DNA encoding a gRNA stem-loop sequence.
  • the third segment encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
  • the sequence of the third segment encodes for a crRNA and a tracrRNA.
  • the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.
  • a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence.
  • the third segments of the collection encode a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
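The three-segment layout described in the embodiments above (regulatory region, targeting sequence, protein-binding sequence) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the method of the disclosure: the T7 promoter string, the example targeting sequences, and the helper names `build_template` and `fraction_varying_in_size` are all introduced here, and "vary in size" is read as "differ in length from the most common length in the collection", which is one possible interpretation.

```python
from collections import Counter

# Assumption: a commonly used minimal T7 promoter sequence. The text names
# T7, SP6, and T3 promoters but does not give their sequences.
T7_PROMOTER = "TAATACGACTCACTATAG"

# SEQ ID NO: 1 as listed in the text (RNA alphabet).
SEQ_ID_NO_1_RNA = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
    "CCGAGUCGGUGCUUUUUU"
)


def rna_to_dna(rna: str) -> str:
    """Return the DNA-alphabet version of an RNA sequence (U -> T)."""
    return rna.replace("U", "T")


def build_template(targeting_dna: str, promoter: str = T7_PROMOTER) -> str:
    """Concatenate the first, second, and third segments in order."""
    return promoter + targeting_dna + rna_to_dna(SEQ_ID_NO_1_RNA)


def fraction_varying_in_size(templates: list[str]) -> float:
    """Fraction of templates whose length differs from the most common length."""
    counts = Counter(len(t) for t in templates)
    modal_count = counts.most_common(1)[0][1]
    return 1 - modal_count / len(templates)


# Hypothetical targeting sequences of 20, 20, 20, 25, and 40 bp.
targets = ["ACGT" * 5, "TTGCA" * 4, "GATC" * 5, "ACGTA" * 5, "ACGT" * 10]
collection = [build_template(t) for t in targets]
varying = fraction_varying_in_size(collection)
```

With these hypothetical inputs, two of the five templates differ from the most common length, so `varying` is 0.4, above the 10% figure mentioned in the text.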
  • the invention described herein provides for a collection of guide RNAs (gRNAs), comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.
  • the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
  • the size of the first segment varies from 15-250 bp across the collection of gRNAs.
  • at least 10% of the first segments in the collection are greater than 21 bp.
  • the size of the first segment is not 20 bp.
  • the size of the first segment is not 21 bp.
  • the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the first segments is RNA encoded by sequences selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least (unique gRNAs. In some embodiments, the gRNAs comprise cytosine, guanine, and adenine.
  • a subset of the gRNAs further comprises thymine. In some embodiments, a subset of the gRNAs further comprises uracil.
  • the first segment is at least 80% complementary to a target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments the PAM sequence is AGG, CGG, or TGG.
  • the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • the second segment comprises a gRNA stem-loop sequence.
  • the third segment comprises DNA encoding a gRNA stem-loop sequence.
  • the third segment comprises the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
  • the second segment comprises a crRNA and a tracrRNA.
  • the nucleic acid-guided nuclease system protein is from a bacterial species.
  • the nucleic acid-guided nuclease system protein is from an archaea species.
  • the CRISPR/Cas system protein is a Type I, Type II, or Type III protein.
  • the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase.
  • the second segment comprises a Cas9-binding sequence. In some embodiments, at least 10% of the gRNAs in the collection vary in their 5′ terminal-end sequence.
  • the collection comprises targeting sequences directed to sequences of interest spaced every 10,000 bp or less across the genome of an organism.
  • a plurality of second segments of the collection comprise a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the second segments of the collection comprise a second nucleic acid-guided nuclease system protein binding sequence.
  • the second segments of the collection comprise a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
  • a plurality of the gRNAs of the collection are attached to a substrate.
  • a plurality of the gRNAs of the collection comprise a label.
  • a plurality of the gRNAs of the collection comprise different labels.
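The relationship between a targeting sequence, the PAM, and the "at least 80% complementary" criterion in the embodiments above can be sketched as follows. This is a minimal illustration, assuming sequences are written 5′ to 3′ in the DNA alphabet and a fixed targeting length of 20 nt (the text allows 15-250); the function names and the example sequence are hypothetical and not taken from the disclosure.

```python
import re

# PAM sequences named in the text: AGG, CGG, TGG. A lookahead is used so that
# overlapping sites are all reported.
PAM_PATTERN = re.compile(r"(?=([ACT]GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def find_targets(dna: str, length: int = 20):
    """Return (start, sequence 5' of the PAM, PAM) for each PAM with enough 5' sequence."""
    hits = []
    for match in PAM_PATTERN.finditer(dna):
        pam_start = match.start()
        if pam_start >= length:
            start = pam_start - length
            hits.append((start, dna[start:pam_start], match.group(1)))
    return hits


def percent_complementary(guide: str, opposite_strand: str) -> float:
    """Percent of guide positions that base-pair with the opposite strand (antiparallel)."""
    if len(guide) != len(opposite_strand):
        raise ValueError("sequences must have equal length")
    paired = reverse_complement(opposite_strand)
    matches = sum(g == p for g, p in zip(guide, paired))
    return 100.0 * matches / len(guide)


# Hypothetical input: 20 nt of sequence followed by a TGG PAM.
dna = "ACGT" * 5 + "TGG" + "CCCC"
hits = find_targets(dna)
start, protospacer, pam = hits[0]
opposite = reverse_complement(protospacer)
perfect = percent_complementary(protospacer, opposite)
mismatched = percent_complementary("TGCA" + protospacer[4:], opposite)
```

Here a guide identical to the sequence 5′ of the PAM is fully complementary to the opposite strand, while a guide with four mismatches out of twenty sits exactly at the 80% level.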
  • the invention described herein provides a nucleic acid comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the targeting sequence is greater than 30 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence.
  • the nucleic acid-guided nuclease is a CRISPR/Cas system protein.
  • the nucleic acid is DNA.
  • the second segment is single stranded DNA.
  • the third segment is single stranded DNA.
  • the second segment is double stranded DNA.
  • the third segment is double stranded DNA.
  • the regulatory region is a region capable of binding a transcription factor.
  • the regulatory region comprises a promoter.
  • the promoter is selected from the group consisting of T7, SP6, and T3.
  • the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome.
  • the targeting sequence is directed at abundant or repetitive DNA.
  • the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA.
  • the sequence of the second segments is selected from Table 3 and/or Table 4.
  • the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
  • the target genomic sequence of interest is 5′ upstream of a PAM sequence.
  • the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • the third segment comprises DNA encoding a gRNA stem-loop sequence.
  • the third segment encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUC (SEQ ID NO: 2).
  • the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.
  • the invention described herein provides a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.
  • the nucleic acid-guided nuclease is a CRISPR/Cas system protein.
  • the gRNA comprises an adenine, a guanine, and a cytosine.
  • the gRNA further comprises a thymine.
  • the gRNA further comprises a uracil.
  • the size of the first RNA segment is between 30 and 250 bp.
  • the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the first segment is at least 80% complementary to the target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the second segment comprises a gRNA stem-loop sequence.
  • the sequence of the second segment comprises GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
  • the sequence of the third segment comprises a crRNA and a tracrRNA.
  • the nucleic acid-guided nuclease system protein is from a bacterial species.
  • the nucleic acid-guided nuclease system protein is from an archaea species.
  • the CRISPR/Cas system protein is a Type I, Type II, or Type III protein.
  • the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase.
  • the second segment is a Cas9-binding sequence.
  • the invention provides a complex comprising a nucleic acid-guided nuclease system protein and a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.
  • the invention described herein provides a method for depleting and partitioning of targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collection of gRNAs provided herein; and (ii) nucleic acid-guided nuclease system proteins.
  • the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins.
  • the CRISPR/Cas system proteins are Cas9 proteins.
  • the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; (b) performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, but never completely remove the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; (c) ligating adapters comprising a recognition sequence to the resulting DNA molecules of step b; (d) contacting the DNA molecules of step c with a restriction enzyme that recognizes the recognition sequence of step c, whereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5
  • the nucleic acid-guided nuclease is a CRISPR/Cas nucleic acid-guided nuclease system protein.
  • the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence.
  • the regulatory sequence comprises a promoter.
  • the promoter comprises a T7, SP6, or T3 sequence.
  • the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA.
  • the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral.
  • the DNA segments encoding a targeting sequence are at least 22 bp.
  • the DNA segments encoding a targeting sequence are 15-250 bp in size range.
  • the PAM sequence is AGG, CGG, or TGG.
  • the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence wherein residual nucleotides from HGG and/or CCD sequences is (are) left behind.
  • step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence.
  • the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme, and a T7 Endonuclease I enzyme.
  • step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang.
  • the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand, and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped.
  • the restriction enzyme of step (d) is MlyI.
  • the restriction enzyme of step (d) is BaeI.
  • step (d) further comprises contacting the DNA molecules with an XhoI enzyme.
  • the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUC (SEQ ID NO: 2).
  • the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
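The pairing of CCD nick sites with HGG sequences in the method above follows from base complementarity: in IUPAC notation D is A, G, or T and H is A, C, or T, so the reverse complement of any CCD site is an HGG site on the other strand. The Python sketch below locates such sites by plain string matching. It is an illustration only: the example sequence and function names are hypothetical, and it does not model where Nt.CviPII or T7 Endonuclease I actually cut.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# IUPAC codes: D = A, G, or T; H = A, C, or T.
CCD = re.compile(r"(?=(CC[AGT]))")
HGG = re.compile(r"(?=([ACT]GG))")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def sites(pattern: re.Pattern, dna: str):
    """Return (start, matched triplet) for every occurrence, including overlaps."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]


def bottom_ccd_as_top_hgg(dna: str):
    """Map each CCD site on the bottom strand to its HGG site in top-strand coordinates."""
    mapped = []
    for pos, triplet in sites(CCD, reverse_complement(dna)):
        top_start = len(dna) - pos - 3
        mapped.append((top_start, dna[top_start:top_start + 3], triplet))
    return mapped


# Hypothetical top strand with one CCT site and one TGG site.
dna = "AAAACCTAAAATGGAAAA"
top_ccd = sites(CCD, dna)
top_hgg = sites(HGG, dna)
mapped = bottom_ccd_as_top_hgg(dna)
```

In this example the single bottom-strand CCA site maps onto the top-strand TGG at position 11, which is the same site reported by scanning the top strand for HGG directly.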
  • the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing a plurality of double stranded DNA molecules, each comprising a sequence of interest, an NGG site, and its complement CCN site; (b) contacting the molecules with an enzyme capable of creating a nick in a single strand at a CCN site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest 5′ to the NGG site, wherein the DNA molecules are nicked at the CCD sites; (c) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest, wherein the fragments comprise a terminal overhang; (d)
  • the nucleic acid-guided nuclease is a CRISPR/Cas system protein.
  • the plurality of double stranded DNA molecules have a regulatory sequence 5′ upstream of the NGG sites.
  • the regulatory sequence comprises a T7, SP6, or T3 sequence.
  • the NGG site comprises AGG, CGG, or TGG.
  • the CCN site comprises CCT, CCG, or CCA.
  • the plurality of double stranded DNA molecules, each comprising a sequence of interest, comprise sheared fragments of genomic DNA.
  • the genomic DNA is mammalian, prokaryotic, eukaryotic, avian, bacterial or viral.
  • the plurality of double stranded DNA molecules in step (a) are at least 500 bp.
  • the enzyme in step b is a Nt.CviPII enzyme.
  • the enzyme in step c is a T7 Endonuclease I.
  • the enzyme in step d is a T4 DNA Polymerase.
  • the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or encodes an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUC (SEQ ID NO: 2).
  • step e additionally comprises ligating adaptors carrying a MlyI recognition site and digesting with MlyI enzyme.
  • the sequence of interest is spaced every 10,000 bp or less across the genome.
  • the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence and a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing genomic DNA comprising a plurality of sequences of interest, comprising NGG and CCN sites; (b) contacting the genomic DNA with an enzyme capable of creating nicks in the genomic DNA, whereby generating nicked genomic DNA, nicked at CCN sites; (c) contacting the nicked genomic DNA with an endonuclease, whereby generating double stranded DNA fragments, with an overhang; (d) ligating the DNA with overhangs from step c to a Y-shaped adapter, thereby introducing a restriction enzyme recognition sequence only at 3′ of the NGG site and a regulatory sequence 5′ of the sequence of interest; (e) contacting the product from step d with an enzyme that cleaves away the NGG site together with the
  • the nucleic acid-guided nuclease is a CRISPR/Cas system protein.
  • the NGG site comprises AGG, CGG, or TGG.
  • the CCN site comprises CCT, CCG, or CCA.
  • the regulatory sequence comprises a promoter sequence.
  • the promoter sequence comprises a T7, SP6, or T3 sequence.
  • the DNA fragments are sheared fragments of genomic DNA.
  • the genomic DNA is mammalian, prokaryotic, eukaryotic, or viral.
  • the fragments are at least 200 bp.
  • the enzyme in step b is a Nt.CviPII enzyme.
  • the enzyme in step c is a T7 Endonuclease I.
  • step d further comprises PCR amplification of the adaptor-ligated DNA.
  • the DNA encoding nucleic acid-guided nuclease system protein-binding sequence encodes for a RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU (SEQ ID NO: 1) or encodes for a RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUC (SEQ ID NO: 2).
  • the enzyme removing NGG site in step e is MlyI.
  • the target of interest of the collection is spaced every 10,000 bp or less across the genome.
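The spacing condition above (targets every 10,000 bp or less) can be checked with a short Python sketch. It assumes target positions are given as base-pair coordinates on a single contig and only compares consecutive targets; the function names and example coordinates are hypothetical.

```python
def max_gap(positions):
    """Largest distance in bp between consecutive target positions."""
    ordered = sorted(positions)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


def spaced_within(positions, limit=10_000):
    """True if no two neighbouring targets are more than `limit` bp apart."""
    return max_gap(positions) <= limit


example_positions = [100, 9_000, 18_500]  # hypothetical coordinates
largest = max_gap(example_positions)
ok = spaced_within(example_positions)
```

Distances from the first and last target to the ends of the contig are not considered in this sketch.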
  • kits and/or reagents useful for performing a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, as described in the embodiments herein.
  • the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.
  • the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a CRISPR/Cas system protein-binding sequence.
  • the invention described herein provides a kit comprising a collection of guide RNAs comprising a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.
  • the invention described herein provides a method of making a collection of guide nucleic acids, comprising: a. obtaining abundant cells in a source sample; b. collecting nucleic acids from said abundant cells; and c. preparing a collection of guide nucleic acids (gNAs) from said nucleic acids.
  • said abundant cells comprise cells from one or more most abundant bacterial species in said source sample.
  • said abundant cells comprise cells from more than one species.
  • said abundant cells comprise human cells.
  • said abundant cells comprise animal cells.
  • said abundant cells comprise plant cells.
  • said abundant cells comprise bacterial cells.
  • the method further comprises contacting nucleic acid-guided nucleases with said library of gNAs to form nucleic acid-guided nuclease-gNA complexes. In some embodiments, the method further comprises using said nucleic acid-guided nuclease-gNA complexes to cleave target nucleic acids at target sites, wherein said gNAs are complementary to said target sites.
  • said target nucleic acids are from said source sample.
  • a species of said target nucleic acids is the same as a species of said source sample. In some embodiments, said species of said target nucleic acids and said species of said source sample is human. In some embodiments, said species of said target nucleic acids and said species of said source sample is animal. In some embodiments, said species of said target nucleic acids and said species of said source sample is plant.
  • the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing double-stranded breaks at proximal nicks; and c. repairing overhangs of said double-stranded breaks, thereby producing a double-stranded fragment comprising (i) a targeting sequence and (ii) said nicking enzyme recognition site.
  • the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b.
  • the method further comprises producing a double-stranded fragment comprising said targeting sequence from said single-stranded fragment.
  • said producing said double-stranded fragment comprises random priming and extension.
  • said random priming is conducted with a primer comprising a random n-mer region and a promoter region.
  • said random n-mer region is a random hexamer region.
  • said random n-mer region is a random octamer region.
  • said promoter region is a T7 promoter region.
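A primer with a promoter region and a random n-mer region, as described above, might be modelled as follows. This is a sketch only: the T7 promoter sequence shown is a commonly cited one and is an assumption rather than a sequence taken from this document, and placing the promoter 5′ of the random region is likewise an assumption.

```python
import random

# Assumption: a commonly cited T7 promoter sequence, not taken from the source.
T7_PROMOTER = "TAATACGACTCACTATAG"


def random_primer(n=6, promoter=T7_PROMOTER, rng=None):
    """Return promoter + random n-mer (n=6 hexamer, n=8 octamer), 5' to 3'."""
    rng = rng or random.Random()
    n_mer = "".join(rng.choice("ACGT") for _ in range(n))
    return promoter + n_mer


primer = random_primer(6, rng=random.Random(0))  # seeded only for repeatability
```

In practice a random primer is a degenerate pool; each call here samples a single member of such a pool.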
  • the method further comprises ligating a nuclease recognition site nucleic acid comprising a nuclease recognition site to said double-stranded fragment.
  • said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to the length of said nicking enzyme recognition sites.
  • said nuclease recognition site is a MlyI recognition site.
  • said nuclease recognition site is a BaeI recognition site.
  • the method further comprises digesting said double-stranded fragment with said nuclease, thereby removing said nicking enzyme recognition site from said double-stranded fragment.
  • the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site.
  • said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
  • said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence.
  • said length of said targeting sequence is 20 base pairs.
  • said nuclease recognition site is a MmeI recognition site.
  • the method further comprises digesting said double-stranded fragment with said nuclease.
  • said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence plus a length of said nicking enzyme recognition sites.
  • said length of said targeting sequence plus a length of said nicking enzyme recognition sites is 23 base pairs.
  • said nuclease recognition site is a EcoP15I recognition site.
  • the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
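The cut distances described above (the length of the targeting sequence, e.g. 20 bp, or that length plus the nicking enzyme recognition site, e.g. 23 bp) can be illustrated with simple string slicing. The fragment layout, the placeholder adapter and site sequences, and the function name `cut_downstream` are all hypothetical; the sketch treats one strand only and ignores overhangs.

```python
def cut_downstream(seq, site_end, distance):
    """Split seq `distance` bases past `site_end`, the index just after the recognition site."""
    cut = site_end + distance
    if cut > len(seq):
        raise ValueError("cut position lies beyond the fragment")
    return seq[:cut], seq[cut:]


adapter = "AAAAGGGG"                # placeholder adapter ending in a recognition site
targeting = "ACGTACGTACGTACGTACGT"  # 20 bp placeholder targeting sequence
nick_site = "CCA"                   # 3 bp placeholder nicking enzyme recognition site
fragment = adapter + targeting + nick_site + "TTTT"

# Cutting at the targeting length keeps adapter + targeting sequence.
kept_20, _ = cut_downstream(fragment, len(adapter), len(targeting))
# Cutting at targeting length + site length also keeps the nicking site.
kept_23, _ = cut_downstream(fragment, len(adapter), len(targeting) + len(nick_site))
```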
  • the invention described herein provides a kit comprising all essential reagents and instructions for carrying out the methods of aspects of the invention described herein.
  • FIG. 1 illustrates an exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.
  • FIG. 2 illustrates another exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.
  • FIG. 3 illustrates an exemplary scheme for nicking of DNA and subsequent treatment with polymerase to generate blunt ends.
  • FIG. 4 illustrates an exemplary scheme for sequential production of a library of gNAs using three adapters.
  • FIG. 5 illustrates an exemplary scheme for sequential production of a library of gNAs using one adapter and one oligo.
  • FIG. 6 illustrates an exemplary scheme for generation of a large pool of DNA fragments with blunt ends using Nicking Enzyme Mediated DNA Amplification (NEMDA).
  • FIG. 7 illustrates an exemplary scheme for generation of a large pool of gNAs using Nicking Enzyme Mediated DNA Amplification (NEMDA).
  • nucleic acid refers to a molecule comprising one or more nucleic acid subunits.
  • a nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same.
  • a nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof.
  • a nucleic acid may be single-stranded and/or double-stranded.
  • nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles.
  • nucleotide or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well.
  • Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • nucleic acids and “polynucleotides” are used interchangeably herein.
  • Polynucleotide is used to describe a nucleic acid polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No.
  • Naturally-occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively).
  • DNA and RNA have deoxyribose and ribose sugar backbones, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
  • in PNA, various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds.
  • a locked nucleic acid is a modified RNA nucleotide.
  • the ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in the A-form duplexes.
  • LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired.
  • the term “unstructured nucleic acid,” or “UNA” is a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability.
  • an unstructured nucleic acid may contain a G′ residue and a C′ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively.
  • Unstructured nucleic acid is described in US20050233340, which is incorporated by reference herein for disclosure of UNA.
  • oligonucleotide denotes a single-stranded multimer of nucleotides.
  • nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • cleaving refers to a reaction that breaks the phosphodiester bonds between two adjacent nucleotides in both strands of a double-stranded DNA molecule, thereby resulting in a double-stranded break in the DNA molecule.
  • nicking refers to a reaction that breaks the phosphodiester bond between two adjacent nucleotides in only one strand of a double-stranded DNA molecule, thereby resulting in a break in one strand of the DNA molecule.
  • cleavage site refers to the site at which a double-stranded DNA molecule has been cleaved.
  • nucleic acid-guided nuclease-gNA complex refers to a complex comprising a nucleic acid-guided nuclease protein and a guide nucleic acid (gNA, for example a gRNA or a gDNA).
  • Cas9-gRNA complex refers to a complex comprising a Cas9 protein and a guide RNA (gRNA).
  • the nucleic acid-guided nuclease may be any type of nucleic acid-guided nuclease, including but not limited to wild type nucleic acid-guided nuclease, a catalytically dead nucleic acid-guided nuclease, or a nucleic acid-guided nuclease-nickase.
  • nucleic acid-guided nuclease-associated guide NA refers to a guide nucleic acid (guide NA).
  • the nucleic acid-guided nuclease-associated guide NA may exist as an isolated nucleic acid, or as part of a nucleic acid-guided nuclease-gNA complex, for example a Cas9-gRNA complex.
  • capture and “enrichment” are used interchangeably herein, and refer to the process of selectively isolating a nucleic acid region containing: sequences of interest, targeted sites of interest, sequences not of interest, or targeted sites not of interest.
  • hybridization refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art.
  • a nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.).
  • high stringency conditions include hybridization at about 42° C. in 50% formamide, 5× SSC, 5× Denhardt's solution, 0.5% SDS and 100 μg/ml denatured carrier DNA followed by washing two times in 2× SSC and 0.5% SDS at room temperature and two additional times in 0.1× SSC and 0.5% SDS at 42° C.
  • duplex or “duplexed,” as used herein, describes two complementary polynucleotides that are base-paired, i.e., hybridized together.
  • amplifying refers to generating one or more copies of a target nucleic acid, using the target nucleic acid as a template.
  • genomic region refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant.
  • an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.
  • genomic sequence refers to a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequence that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.
  • genomic fragment refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant.
  • a genomic fragment may be an entire chromosome, or a fragment of a chromosome.
  • a genomic fragment may be adapter ligated (in which case it has an adapter ligated to one or both ends of the fragment, or to at least the 5′ end of a molecule), or may not be adapter ligated.
  • an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.
  • Such an oligonucleotide may be employed in an assay that uses a sample containing a test genome, where the test genome contains a binding site for the oligonucleotide.
  • ligating refers to the enzymatically catalyzed joining of the terminal nucleotide at the 5′ end of a first DNA molecule to the terminal nucleotide at the 3′ end of a second DNA molecule.
  • nucleic acids are “complementary,” each base of one of the nucleic acids base pairs with corresponding nucleotides in the other nucleic acid.
  • complementary and perfectly complementary are used synonymously herein.
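The definition of complementarity above, together with the 5′ to 3′ writing convention, can be expressed as a short Python check. This is a sketch for DNA sequences only; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfectly_complementary(a, b):
    """True if every base of a pairs with the corresponding base of b.

    Both sequences are written 5' to 3', so b must equal the reverse
    complement of a.
    """
    return len(a) == len(b) and reverse_complement(a).upper() == b.upper()


rc = reverse_complement("ATGC")
```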
  • separating refers to physical separation of two elements (e.g., by size or affinity, etc.) as well as degradation of one element, leaving the other intact.
  • size exclusion can be employed to separate nucleic acids, including cleaved targeted sequences.
  • DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands.
  • complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands.
  • the assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure.
  • the first and second strands are distinct molecules.
  • the “top” and “bottom” strands of a double-stranded nucleic acid in which the top and bottom strands have been covalently linked will still be described as the “top” and “bottom” strands.
  • the top and bottom strands of a double-stranded DNA do not need to be separated molecules.
  • the nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions (e.g., BACs, assemblies, chromosomes, etc.) are known and may be found in NCBI's Genbank database, for example.
  • top strand refers to either strand of a nucleic acid but not both strands of a nucleic acid.
  • oligonucleotide or a primer binds or anneals “only to a top strand,” it binds to only one strand but not the other.
  • bottom strand refers to the strand that is complementary to the “top strand.”
  • an oligonucleotide binds or anneals “only to one strand,” it binds to only one strand, e.g., the first or second strand, but not the other strand.
  • the oligonucleotide may have two regions, a first region that hybridizes with the top strand of the double-stranded DNA, and a second region that hybridizes with the bottom strand of the double-stranded DNA.
  • double-stranded DNA molecule refers to both double-stranded DNA molecules in which the top and bottom strands are not covalently linked, as well as double-stranded DNA molecules in which the top and bottom strands are covalently linked.
  • the top and bottom strands of a double-stranded DNA are base paired with one another by Watson-Crick interactions.
  • denaturing refers to the separation of at least a portion of the base pairs of a nucleic acid duplex by placing the duplex in suitable denaturing conditions. Denaturing conditions are well known in the art.
  • the duplex in order to denature a nucleic acid duplex, the duplex may be exposed to a temperature that is above the Tm of the duplex, thereby releasing one strand of the duplex from the other.
  • a nucleic acid may be denatured by exposing it to a temperature of at least 90° C. for a suitable amount of time (e.g., at least 30 seconds, up to 30 mins).
  • fully denaturing conditions may be used to completely separate the base pairs of the duplex.
  • partially denaturing conditions e.g., with a lower temperature than fully denaturing conditions
  • Nucleic acid may also be denatured chemically (e.g., using urea or NaOH).
  • genotyping refers to any type of analysis of a nucleic acid sequence, and includes sequencing, polymorphism (SNP) analysis, and analysis to identify rearrangements.
  • sequencing refers to a method by which the identity of consecutive nucleotides of a polynucleotide are obtained.
  • next-generation sequencing refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, for example, those currently employed by Illumina, Life Technologies, and Roche, etc.
  • Next-generation sequencing methods may also include nanopore sequencing methods or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.
  • complementary DNA refers to a double-stranded DNA sample that was produced from an RNA sample by reverse transcription of RNA (using primers such as random hexamers or oligo-dT primers) followed by second-strand synthesis by digestion of the RNA with RNaseH and synthesis by DNA polymerase.
  • RNA promoter adapter is an adapter that contains a promoter for a bacteriophage RNA polymerase, e.g., the RNA polymerase from bacteriophage T3, T7, SP6 or the like.
  • Guide Nucleic Acids (gNAs)
  • Provided herein are guide nucleic acids (gNAs) derivable from any nucleic acid source.
  • the gNAs can be guide RNAs (gRNAs) or guide DNAs (gDNAs).
  • the nucleic acid source can be DNA or RNA.
  • Provided herein are methods to generate gNAs from any source nucleic acid, including DNA from a single organism, or mixtures of DNA from multiple organisms, or mixtures of DNA from multiple species, or DNA from clinical samples, or DNA from forensic samples, or DNA from environmental samples, or DNA from metagenomic DNA samples (for example a sample that contains more than one species of organism).
  • Examples of any source DNA include, but are not limited to any genome, any genome fragment, cDNA, synthetic DNA, or a DNA collection (e.g. a SNP collection, DNA libraries).
  • the gNAs provided herein can be used for genome-wide applications.
  • the gNAs are derived from genomic sequences (e.g., genomic DNA). In some embodiments, the gNAs are derived from mammalian genomic sequences. In some embodiments, the gNAs are derived from eukaryotic genomic sequences. In some embodiments, the gNAs are derived from prokaryotic genomic sequences. In some embodiments, the gNAs are derived from viral genomic sequences. In some embodiments, the gNAs are derived from bacterial genomic sequences. In some embodiments, the gNAs are derived from plant genomic sequences. In some embodiments, the gNAs are derived from microbial genomic sequences. In some embodiments, the gNAs are derived from genomic sequences from a parasite, for example a eukaryotic parasite.
  • the gNAs are derived from repetitive DNA. In some embodiments, the gNAs are derived from abundant DNA. In some embodiments, the gNAs are derived from mitochondrial DNA. In some embodiments, the gNAs are derived from ribosomal DNA. In some embodiments, the gNAs are derived from centromeric DNA. In some embodiments, the gNAs are derived from DNA comprising Alu elements (Alu DNA). In some embodiments, the gNAs are derived from DNA comprising long interspersed nuclear elements (LINE DNA). In some embodiments, the gNAs are derived from DNA comprising short interspersed nuclear elements (SINE DNA). In some embodiments the abundant DNA comprises ribosomal DNA.
  • the abundant DNA comprises host DNA (e.g., host genomic DNA or all host DNA).
  • the gNAs can be derived from host DNA (e.g., human, animal, plant) for the depletion of host DNA to allow for easier analysis of other DNA that is present (e.g., bacterial, viral, or other metagenomic DNA).
  • the gNAs can be derived from the one or more most abundant types (e.g., species) in a mixed sample, such as the one or more most abundant bacteria species in a metagenomic sample.
  • the one or more most abundant types can comprise the two, three, four, five, six, seven, eight, nine, ten, or more than ten most abundant types (e.g., species).
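Selecting the one or more most abundant types in a mixed sample can be sketched as a simple count over per-read classifications. The input format (one species label per read), the function name, and the example labels are assumptions made for illustration; the text does not prescribe how abundance is measured.

```python
from collections import Counter


def most_abundant(assignments, n=1):
    """Return the n most frequent labels, most abundant first."""
    return [label for label, _ in Counter(assignments).most_common(n)]


# Hypothetical per-read species assignments from a metagenomic sample.
reads = ["E. coli", "E. coli", "B. subtilis", "E. coli", "B. subtilis", "S. aureus"]
top_two = most_abundant(reads, 2)
```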
  • the most abundant types can be the most abundant kingdoms, phyla or divisions, classes, orders, families, genera, species, or other classifications.
  • the most abundant types can be the most abundant cell types, such as epithelial cells, bone cells, muscle cells, blood cells, adipose cells, or other cell types.
  • the most abundant types can be non-cancerous cells.
  • the most abundant types can be cancerous cells.
  • the most abundant types can be animal, human, plant, fungal, bacterial, or viral.
  • gNAs can be derived from both a host and the one or more most abundant non-host types (e.g., species) in a sample, such as from both human DNA and the DNA of the one or more most abundant bacterial species.
  • the abundant DNA comprises DNA from the more abundant or most abundant cells in a sample.
  • the highly abundant cells can be extracted and their DNA can be used to produce gNAs; these gNAs can be used to produce a depletion library that is applied to the original sample to enable or enhance sequencing or detection of low abundance targets.
  • the gNAs are derived from DNA comprising short tandem repeats (STRs).
  • the gNAs are derived from a genomic fragment, comprising a region of the genome, or the whole genome itself.
  • the genome is a DNA genome.
  • the genome is a RNA genome.
  • the gNAs are derived from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or virus; from an animal parasite; from a pathogen.
  • the gNAs are derived from any mammalian organism.
  • the mammal is a human.
  • the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey.
  • a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, a rat.
  • the mammal is a type of a monkey.
  • the gNAs are derived from any bird or avian organism.
  • An avian organism includes but is not limited to chicken, turkey, duck and goose.
  • the gNAs are derived from a plant.
  • the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.
  • the gNAs are derived from a species of bacteria.
  • the bacteria are tuberculosis-causing bacteria.
  • the gNAs are derived from a virus.
  • the gNAs are derived from a species of fungi.
  • the gNAs are derived from a species of algae.
  • the gNAs are derived from any mammalian parasite.
  • the parasite is a worm.
  • the parasite is a malaria-causing parasite.
  • the parasite is a Leishmaniasis-causing parasite.
  • the parasite is an amoeba.
  • the gNAs are derived from a nucleic acid target.
  • Contemplated targets include, but are not limited to, pathogens; single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations; human SNPs or STRs; potential toxins; or animals, fungi, and plants.
  • the gRNAs are derived from pathogens, and are pathogen-specific gNAs.
  • a guide NA of the invention comprises a first NA segment comprising a targeting sequence, wherein the targeting sequence is 15-250 bp; and a second NA segment comprising a nucleic acid guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
  • the targeting sequence is greater than 21 bp, greater than 22 bp, greater than 23 bp, greater than 24 bp, greater than 25 bp, greater than 26 bp, greater than 27 bp, greater than 28 bp, greater than 29 bp, greater than 30 bp, greater than 40 bp, greater than 50 bp, greater than 60 bp, greater than 70 bp, greater than 80 bp, greater than 90 bp, greater than 100 bp, greater than 110 bp, greater than 120 bp, greater than 130 bp, greater than 140 bp, or even greater than 150 bp. In an exemplary embodiment, the targeting sequence is greater than 30 bp.
  • the targeting sequences of the present invention range in size from 30-50 bp. In some embodiments, targeting sequences of the present invention range in size from 30-75 bp. In some embodiments, targeting sequences of the present invention range in size from 30-100 bp.
  • a targeting sequence can be at least 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp.
  • the targeting sequence is at least 22 bp. In specific embodiments, the targeting sequence is at least 30 bp.
  • target-specific gNAs can comprise a nucleic acid sequence that is complementary to a region on the opposite strand of the targeted nucleic acid sequence 5′ to a PAM sequence, which can be recognized by a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein.
  • the targeted nucleic acid sequence is immediately 5′ to a PAM sequence.
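To illustrate a targeted sequence lying immediately 5′ to a PAM, the sketch below collects every window of a chosen length that directly precedes an NGG trinucleotide on the given strand. It assumes an NGG PAM with N being any base and scans one strand only; the function name and example sequence are hypothetical.

```python
def targets_5prime_of_pam(seq, length):
    """Return (start, target, pam) for each `length`-base window immediately 5' of an NGG."""
    seq = seq.upper()
    found = []
    for p in range(length, len(seq) - 2):  # p is the index of the N in NGG
        if seq[p + 1:p + 3] == "GG":
            found.append((p - length, seq[p - length:p], seq[p:p + 3]))
    return found


example = "ACGTACGTAC" + "TGG" + "AA"  # hypothetical: 10 bases, then a PAM
hits = targets_5prime_of_pam(example, 10)
```

Sites on the opposite strand could be found by running the same function on the reverse complement of the sequence.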
  • the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 15-250 bp.
  • the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or 100 bp.
  • the targeting sequence is not 20 bp. In some particular embodiments, the targeting sequence is not 21 bp.
  • the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).
  • the gNAs comprise a label, are attached to a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label.
  • a label includes, but is not limited to, enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.
  • the gNAs are attached to a substrate.
  • the substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis.
  • Substrates need not be flat.
  • the substrate is a 2-dimensional array.
  • the 2-dimensional array is flat.
  • the 2-dimensional array is not flat, for example, the array is a wave-like array.
  • Substrates include any type of shape including spherical shapes (e.g., beads).
  • the substrate is a 3-dimensional array, for example, a microsphere.
  • the microsphere is magnetic.
  • the microsphere is glass.
  • the microsphere is made of polystyrene.
  • the microsphere is silica-based.
  • the substrate is an array with interior surface, for example, is a straw, tube, capillary, cylindrical, or microfluidic chamber array.
  • the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.
  • nucleic acids encoding for gNAs e.g., gRNAs or gDNAs.
  • a gNA results from the transcription of a nucleic acid encoding for a gNA (e.g., gRNA).
  • the nucleic acid is a template for the transcription of a gNA (e.g., gRNA).
  • a gNA results from the reverse transcription of a nucleic acid encoding for a gNA.
  • nucleic acid is a template for the reverse transcription of a gNA. In some embodiments, by encoding, it is meant that a gNA results from the amplification of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the amplification of a gNA.
  • the nucleic acid encoding for a gNA comprises a first segment comprising a regulatory region; a second segment comprising targeting sequence, wherein the second segment can range from 15 bp-250 bp; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
  • the nucleic acids encoding for gNAs comprise DNA.
  • the first segment is double stranded DNA.
  • the first segment is single stranded DNA.
  • the second segment is single stranded DNA.
  • the third segment is single stranded DNA.
  • the second segment is double stranded DNA.
  • the third segment is double stranded DNA.
  • the nucleic acids encoding for gNAs comprise RNA.
  • the nucleic acids encoding for gNAs comprise DNA and RNA.
  • the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
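The three-segment layout described above (regulatory region, targeting sequence, protein-binding sequence) can be illustrated with a minimal sketch. It assumes a T7 promoter as the regulatory region and uses the stem-loop DNA listed later in this document as SEQ ID NO: 3 as the third segment. The promoter string and the function name `build_gna_template` are illustrative assumptions, not taken from the source.

```python
# Assumption: a commonly used minimal T7 promoter sequence (not given in the source).
T7_PROMOTER = "TAATACGACTCACTATAG"

# Stem-loop DNA as listed in this document (SEQ ID NO: 3), whitespace removed.
STEM_LOOP_DNA = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
    "AGTCGGTGCTTTTTTT"
)


def build_gna_template(targeting_sequence, promoter=T7_PROMOTER,
                       stem_loop=STEM_LOOP_DNA):
    """Concatenate the three segments in 5'-to-3' order (hypothetical helper)."""
    target = targeting_sequence.upper()
    if set(target) - set("ACGT"):
        raise ValueError("targeting sequence must contain only A, C, G, T")
    # The text gives 15 bp to 250 bp as the range for the second segment.
    if not 15 <= len(target) <= 250:
        raise ValueError("targeting sequence must be 15-250 bp long")
    return promoter + target + stem_loop
```

For example, `build_gna_template("ACGTACGTACGTACGTACGTAC")` returns one strand of a template whose middle 22 bases are the targeting sequence.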
  • collections (interchangeably referred to as libraries) of gNAs.
  • a collection of gNAs denotes a mixture of gNAs containing at least 10^2 unique gNAs.
  • a collection of gNAs contains at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 unique gNAs.
  • a collection of gNAs contains a total of at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 gNAs.
  • a collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the gNAs in the collection vary in size.
  • the first and second segments are in 5′-to-3′ order.
  • the size of the first segment varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 21 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 25 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 30 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 15-50 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 30-100 bp.
  • the size of the first segment is not 20 bp.
  • the size of the first segment is not 21 bp.
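The size criteria above can be checked for a given collection with a few lines of code. The following is a minimal sketch, assuming the targeting sequences (first segments) are available as plain strings. The function name `summarize_collection` and the dictionary keys are illustrative and do not appear in the source.

```python
def summarize_collection(targeting_sequences):
    """Report counts and length percentages for a collection (hypothetical helper)."""
    lengths = [len(seq) for seq in targeting_sequences]
    total = len(lengths)
    if total == 0:
        raise ValueError("collection is empty")

    def percent(predicate):
        # Percentage of first segments whose length satisfies the predicate.
        return 100.0 * sum(1 for n in lengths if predicate(n)) / total

    return {
        "total": total,
        "unique": len(set(targeting_sequences)),
        "pct_over_21": percent(lambda n: n > 21),
        "pct_over_25": percent(lambda n: n > 25),
        "pct_over_30": percent(lambda n: n > 30),
        "pct_15_to_50": percent(lambda n: 15 <= n <= 50),
        "pct_30_to_100": percent(lambda n: 30 <= n <= 100),
    }
```

A collection would satisfy, for instance, the "at least 10% greater than 21 bp" embodiment when `summarize_collection(seqs)["pct_over_21"] >= 10`.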
  • the gNAs and/or the targeting sequence of the gNAs in the collection of gNAs comprise unique 5′ ends.
  • the collection of gNAs exhibits variability in sequence of the 5′ end of the targeting sequence, across the members of the collection.
  • the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence, across the members of the collection.
  • the 3′ end of the gNA targeting sequence can be any purine or pyrimidine (and/or modified versions of the same).
  • the 3′ end of the gNA targeting sequence is an adenine.
  • the 3′ end of the gNA targeting sequence is a guanine.
  • the 3′ end of the gNA targeting sequence is a cytosine.
  • the 3′ end of the gNA targeting sequence is a uracil.
  • the 3′ end of the gNA targeting sequence is a thymine. In some embodiments, the 3′ end of the gNA targeting sequence is not cytosine.
  • the collection of gNAs comprises targeting sequences which can base-pair with the targeted DNA, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 b
  • the collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the gNAs in the collection can have a variety of second NA segments with various specificities for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system).
  • gNAs can comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same.
  • a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins.
  • a collection of gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • a plurality of the gNA members of the collection are attached to a label, comprise a label or are capable of being labeled.
  • the gNA comprises a moiety that is further capable of being attached to a label.
  • a label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.
  • a plurality of the gNA members of the collection are attached to a substrate.
  • the substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis.
  • Substrates need not be flat.
  • the substrate is a 2-dimensional array.
  • the 2-dimensional array is flat.
  • the 2-dimensional array is not flat, for example, the array is a wave-like array.
  • Substrates include any type of shape including spherical shapes (e.g., beads).
  • the substrate is a 3-dimensional array, for example, a microsphere.
  • the microsphere is magnetic.
  • the microsphere is glass.
  • the microsphere is made of polystyrene.
  • the microsphere is silica-based.
  • the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array.
  • the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.
  • nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs).
  • a gNA results from the transcription of a nucleic acid encoding for a gNA.
  • the nucleic acid is a template for the transcription of a gNA.
  • a collection of nucleic acids encoding for gNAs denotes a mixture of nucleic acids containing at least 10^2 unique nucleic acids.
  • a collection of nucleic acids encoding for gNAs contains at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 unique nucleic acids encoding for gNAs.
  • a collection of nucleic acids encoding for gNAs contains a total of at least 10^2, at least 10^3, at least 10^4, at least 10^5, at least 10^6, at least 10^7, at least 10^8, at least 10^9, or at least 10^10 nucleic acids encoding for gNAs.
  • a collection of nucleic acids encoding for gNAs comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.
  • the first, second, and third segments are in 5′-to-3′ order.
  • the nucleic acids encoding for gNAs comprise DNA.
  • the first segment is single stranded DNA.
  • the first segment is double stranded DNA.
  • the second segment is single stranded DNA.
  • the third segment is single stranded DNA.
  • the second segment is double stranded DNA.
  • the third segment is double stranded DNA.
  • the nucleic acids encoding for gNAs comprise RNA.
  • the nucleic acids encoding for gNAs comprise DNA and RNA.
  • the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
  • the size of the second segments (targeting sequence) in the collection varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 21 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 25 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 30 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 15-50 bp.
  • At least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 30-100 bp.
  • the size of the second segment is not 20 bp.
  • the size of the second segment is not 21 bp.
  • the gNAs and/or the targeting sequence of the gNAs in the collection of gNAs comprise unique 5′ ends.
  • the collection of gNAs exhibits variability in sequence of the 5′ end of the targeting sequence, across the members of the collection.
  • the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence, across the members of the collection.
  • the collection of nucleic acids comprises targeting sequences, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp,
  • the collection of nucleic acids encoding for gNAs comprises a third segment encoding for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the segments in the collection vary in their specificity for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system).
  • a collection of nucleic acids encoding for gNAs as provided herein can comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same.
  • a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins.
  • a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • gNAs and collections of gNAs derived from any source DNA (for example from genomic DNA, cDNA, artificial DNA, DNA libraries), that can be used to target sequences of interest in a sample for a variety of applications including, but not limited to, enrichment, depletion, capture, partitioning, labeling, regulation, and editing.
  • the gNAs comprise a targeting sequence, directed at sequences of interest.
  • the sequences of interest are genomic sequences (genomic DNA). In some embodiments, the sequences of interest are mammalian genomic sequences. In some embodiments, the sequences of interest are eukaryotic genomic sequences. In some embodiments, the sequences of interest are prokaryotic genomic sequences. In some embodiments, the sequences of interest are viral genomic sequences. In some embodiments, the sequences of interest are bacterial genomic sequences. In some embodiments, the sequences of interest are plant genomic sequences. In some embodiments, the sequences of interest are microbial genomic sequences. In some embodiments, the sequences of interest are genomic sequences from a parasite, for example a eukaryotic parasite.
  • the sequences of interest are host genomic sequences (e.g., the host organism of a microbiome, a parasite, or a pathogen).
  • the sequences of interest are abundant genomic sequences, such as sequences from the genome or genomes of the most abundant species in a sample.
  • the sequences of interest comprise repetitive DNA. In some embodiments, the sequences of interest comprise abundant DNA. In some embodiments, the sequences of interest comprise mitochondrial DNA. In some embodiments, the sequences of interest comprise ribosomal DNA. In some embodiments, the sequences of interest comprise centromeric DNA. In some embodiments, the sequences of interest comprise DNA comprising Alu elements (Alu DNA). In some embodiments, the sequences of interest comprise long interspersed nuclear elements (LINE DNA). In some embodiments, the sequences of interest comprise short interspersed nuclear elements (SINE DNA). In some embodiments, the abundant DNA comprises ribosomal DNA.
  • sequences of interest comprise single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), cancer genes, inserts, deletions, structural variations, exons, genetic mutations, or regulatory regions.
  • the sequences of interest can be a genomic fragment, comprising a region of the genome, or the whole genome itself.
  • the genome is a DNA genome.
  • the genome is an RNA genome.
  • the sequences of interest are from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or a virus; from an animal parasite; or from a pathogen.
  • the sequences of interest are from any mammalian organism.
  • the mammal is a human.
  • the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey.
  • a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat.
  • the mammal is a type of a monkey.
  • sequences of interest are from any bird or avian organism.
  • An avian organism includes but is not limited to chicken, turkey, duck and goose.
  • the sequences of interest are from a plant.
  • the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.
  • sequences of interest are from a species of bacteria.
  • the bacteria are tuberculosis-causing bacteria.
  • sequences of interest are from a virus.
  • sequences of interest are from a species of fungi.
  • sequences of interest are from a species of algae.
  • sequences of interest are from any mammalian parasite.
  • the sequences of interest are obtained from any mammalian parasite.
  • the parasite is a worm.
  • the parasite is a malaria-causing parasite.
  • the parasite is a Leishmaniasis-causing parasite.
  • the parasite is an amoeba.
  • sequences of interest are from a pathogen.
  • a targeting sequence is one that directs the gNA to the sequences of interest in a sample.
  • a targeting sequence targets a particular sequence of interest, for example the targeting sequence targets a genomic sequence of interest.
  • gNAs and collections of gNAs that comprise a segment that comprises a targeting sequence.
  • nucleic acids encoding for gNAs and collections of nucleic acids encoding for gNAs that comprise a segment encoding for a targeting sequence.
  • the targeting sequence comprises DNA.
  • the targeting sequence comprises RNA.
  • the targeting sequence comprises RNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest, except that the RNA comprises uracils instead of thymines.
  • the PAM sequence is AGG, CGG, or TGG.
  • the targeting sequence comprises DNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest.
  • the targeting sequence comprises RNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • the targeting sequence comprises DNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • a DNA encoding for a targeting sequence of a gRNA shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
  • the PAM sequence is AGG, CGG, or TGG.
  • a DNA encoding for a targeting sequence of a gRNA is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence and is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to a sequence 5′ to a PAM sequence on a sequence of interest.
  • the PAM sequence is AGG, CGG, or TGG.
  • gNAs and collections of gNAs comprising a segment that comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
  • nucleic acids encoding for gNAs and collections of nucleic acids encoding for gNAs that comprise a segment encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
  • a nucleic acid-guided nuclease system can be an RNA-guided nuclease system.
  • a nucleic acid-guided nuclease system can be a DNA-guided nuclease system.
  • nucleic acid-guided nuclease systems can utilize nucleic acid-guided nucleases.
  • a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity.
  • Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.
  • the nucleic acid-guided nucleases provided herein can be DNA guided DNA nucleases; DNA guided RNA nucleases; RNA guided DNA nucleases; or RNA guided RNA nucleases.
  • the nucleases can be endonucleases.
  • the nucleases can be exonucleases.
  • the nucleic acid-guided nuclease is a nucleic acid-guided-DNA endonuclease.
  • the nucleic acid-guided nuclease is a nucleic acid-guided-RNA endonuclease.
  • a nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system.
  • a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.
  • the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V.
  • CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.
  • the nucleic acid-guided nuclease is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, Csf1, C2c2, and NgAgo.
  • nucleic acid-guided nuclease system proteins can be from any bacterial or archaeal species.
  • the nucleic acid-guided nuclease system proteins are from, or are derived from nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola t
  • examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins can be naturally occurring or engineered versions.
  • naturally occurring nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • Engineered versions of such proteins can also be employed.
  • engineered examples of nucleic acid-guided nuclease system proteins include catalytically dead nucleic acid-guided nuclease system proteins.
  • catalytically dead generally refers to a nucleic acid-guided nuclease system protein that has inactivated nucleases (e.g., HNH and RuvC nucleases).
  • Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the target nucleic acid (e.g., double-stranded DNA).
  • the nucleic acid-guided nuclease system catalytically dead protein is a catalytically dead CRISPR/Cas system protein, such as catalytically dead Cas9 (dCas9).
  • dCas9 allows separation of the mixture into unbound nucleic acids and dCas9-bound fragments.
  • a dCas9/gRNA complex binds to targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.
  • the dCas9 can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.
  • Naturally occurring catalytically dead nucleic acid-guided nuclease system proteins can also be employed.
  • engineered examples of nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins also include nucleic acid-guided nickases (e.g., Cas nickases).
  • a nucleic acid-guided nickase refers to a modified version of a nucleic acid-guided nuclease system protein, containing a single inactive catalytic domain.
  • the nucleic acid-guided nickase is a Cas nickase, such as Cas9 nickase.
  • a Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC- or the HNH-domain.
  • the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”.
  • the guide NA-hybridized strand or the non-hybridized strand may be cleaved.
  • Nucleic acid-guided nickases bound to 2 gNAs that target opposite strands will create a double-strand break in a target double-stranded DNA.
  • This “dual nickase” strategy can increase the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA (e.g., Cas9/gRNA) complexes be specifically bound at a site before a double-strand break is formed.
  • Naturally occurring nickase nucleic acid-guided nuclease system proteins can also be employed.
  • engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins.
  • a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein may be fused to another protein, for example an activator, a repressor, a nuclease, a fluorescent molecule, a radioactive tag, or a transposase.
  • the nucleic acid-guided nuclease system protein-binding sequence comprises a gNA (e.g., gRNA) stem-loop sequence.
  • a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4).
  • a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
  • the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA CCGAGUCGGUGCUUUUUU) (SEQ ID NO: 1)
  • a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAA AGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCAITTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).
  • a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.
  • the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGA AAAAGUGGCACCGAGUCGGUGCUUUUUUC) (SEQ ID NO: 2).
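The strand relationships stated above (one strand, its reverse complement, and the RNA copied from a single-stranded template) can be checked in a few lines. The following is a minimal sketch, not part of the disclosure: the function names are illustrative, SEQ ID NO: 3 is copied from the text with the line-wrap space removed, and the transcription model is a plain base-by-base copy that ignores transcription start and termination details.

```python
# Complement table for the four unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, read 5'>3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def transcribe_from_template(template: str) -> str:
    """Return the RNA (5'>3') copied from a DNA template strand given 5'>3'.

    Simplification: every template base is copied; promoter and
    terminator behavior are not modeled.
    """
    return reverse_complement(template).replace("T", "U")


# SEQ ID NO: 3 as printed above, with the line-wrap space removed.
SEQ_ID_NO_3 = (
    "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
    "AGTCGGTGCTTTTTTT"
)

# The other strand of the duplex, read 5'>3'.
bottom_strand = reverse_complement(SEQ_ID_NO_3)
# RNA copied from that strand when it is used as the template.
rna = transcribe_from_template(bottom_strand)

print(bottom_strand)
print(rna)
```

Comparing the printed bottom strand against the sequence listed as SEQ ID NO: 4 is a quick way to catch transcription errors in a sequence listing.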
  • a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence.
  • the third segment comprises a single transcribed component, which upon transcription yields a NA (e.g., RNA) stem-loop sequence.
  • the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG AGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT TTCTAGCTCTAAAAC) (SEQ ID NO: 4).
  • the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT TTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
  • the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAA AGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).
  • the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.
  • the yielded gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGA AAAAGUGGCACCGAGUCGGUGCUUUUUUC) (SEQ ID NO: 2).
  • the third segment comprises two sub-segments, which encode for a crRNA and a tracrRNA upon transcription.
  • the crRNA does not comprise the N20 plus the extra sequence which can hybridize with tracrRNA.
  • the crRNA comprises the extra sequence which can hybridize with tracrRNA.
  • the two sub-segments are independently transcribed. In some embodiments, the two sub-segments are transcribed as a single unit.
  • the DNA encoding the crRNA comprises N_target GTTTTAGAGCTATGCTGTTTTG (SEQ ID NO: 7), where N_target represents the targeting sequence.
  • the DNA encoding the tracrRNA comprises the sequence GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGC ACCGAGTCGGTGCTTTTTTTTT (SEQ ID NO: 8).
  • a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence.
  • the third segment comprises a DNA sequence, which upon transcription yields a gRNA stem-loop sequence capable of binding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein.
  • the DNA sequence can be double-stranded.
  • the third segment double stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG AGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT TTCTAGCTCTAAAAC) (SEQ ID NO: 4).
  • the third segment double stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAA AGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).
  • the DNA sequence can be single-stranded.
  • the third segment single stranded DNA comprises the following DNA sequence (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT TTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
  • the third segment single stranded DNA comprises the following DNA sequence (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCT ATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.
  • the third segment comprises a DNA sequence which, upon transcription, yields a first RNA sequence that is capable of forming a hybrid with a second RNA sequence, and which hybrid is capable of CRISPR/Cas system protein binding.
  • the third segment is double-stranded DNA comprising the DNA sequence on one strand: (5′>3′, GTTTTAGAGCTATGCTGTTTTG) (SEQ ID NO: 9) and its reverse complementary DNA sequence on the other strand: (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10).
  • the third segment is single-stranded DNA comprising the DNA sequence of (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10).
  • the second segment and the third segment together encode for a crRNA sequence.
  • the second RNA sequence that is capable of forming a hybrid with the first RNA sequence encoded by the third segment of the nucleic acid encoding a gRNA is a tracrRNA.
  • the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG GCACCGAGUCGGUGCUUUUUU) (SEQ ID NO: 11).
  • the tracrRNA is encoded by a double-stranded DNA comprising sequence of (5′>3′, GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGC ACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 8), and optionally fused with a regulatory sequence at its 5′ end.
  • the regulatory sequence can be bound by a transcription factor.
  • the regulatory sequence is a promoter. In some embodiments, the regulatory sequence is a T7 promoter, comprising the sequence of (5′>3′, GCCTCGAGCTAATACGACTCACTATAGAG) (SEQ ID NO: 12).
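The three-segment layout described above (regulatory region, targeting sequence, protein-binding sequence) can be illustrated by simple concatenation. This is a hedged sketch under stated assumptions: the T7 promoter (SEQ ID NO: 12) stands in for the first segment, SEQ ID NO: 9 for the third segment, the example N20 is hypothetical, and the helper names are illustrative. It does not model which promoter bases end up in the transcript.

```python
T7_PROMOTER = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 12
CRRNA_REPEAT = "GTTTTAGAGCTATGCTGTTTTG"         # SEQ ID NO: 9


def build_crrna_template(n20: str) -> str:
    """Concatenate first, second, and third segments (coding strand, 5'>3')."""
    n20 = n20.upper()
    if len(n20) != 20 or set(n20) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T targeting sequence")
    return T7_PROMOTER + n20 + CRRNA_REPEAT


def encoded_crrna(n20: str) -> str:
    """RNA encoded by the second and third segments together (T replaced by U)."""
    return (n20.upper() + CRRNA_REPEAT).replace("T", "U")


example_n20 = "ACGTACGTACGTACGTACGT"  # hypothetical targeting sequence
print(build_crrna_template(example_n20))
print(encoded_crrna(example_n20))
```

Swapping `CRRNA_REPEAT` for a single-molecule stem-loop sequence such as SEQ ID NO: 3 would give the corresponding template for a single gRNA rather than a crRNA.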
  • a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence.
  • the third segment encodes for a RNA sequence that, upon post-transcriptional cleavage, yields a first RNA segment and a second RNA segment.
  • the first RNA segment comprises a crRNA and the second RNA segment comprises a tracrRNA, which can form a hybrid and together, provide for nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding.
  • the third segment further comprises a spacer in between the transcriptional unit for the first RNA segment and the second RNA segment, which spacer comprises an enzyme cleavage site.
  • a gNA comprising a first NA segment comprising a targeting sequence and a second NA segment comprising a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence.
  • the size of the first segment is greater than 30 bp.
  • the second segment comprises a single segment, which comprises the gRNA stem-loop sequence.
  • the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA CCGAGUCGGUGCUUUUUU) (SEQ ID NO: 1). In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGA AAAAGUGGCACCGAGUCGGUGCUUUUUUC) (SEQ ID NO: 2).
  • the second segment comprises two sub-segments: a first RNA sub-segment (crRNA) that forms a hybrid with a second RNA sub-segment (tracrRNA), which together act to direct nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding.
  • the sequence of the second sub-segment comprises GUUUUAGAGCUAUGCUGUUUUG.
  • the first RNA segment and the second RNA segment together form a crRNA sequence.
  • the other RNA that will form a hybrid with the second RNA segment is a tracrRNA.
  • the tracrRNA comprises the sequence of 5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG GCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 11).
  • CRISPR/Cas system proteins are used in the embodiments provided herein.
  • CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.
  • CRISPR/Cas system proteins can be from any bacterial or archaeal species.
  • the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.
  • the CRISPR/Cas system proteins are from, or are derived from CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus,
  • examples of CRISPR/Cas system proteins can be naturally occurring or engineered versions.
  • naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.
  • the CRISPR/Cas system protein comprises Cas9.
  • a “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA).
  • the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA.
  • the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
  • a CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type CRISPR/Cas system protein.
  • the CRISPR/Cas system protein may have all the functions of a wild type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
  • CRISPR/Cas system protein-associated guide NA refers to a guide NA.
  • the CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.
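The percent-identity thresholds above presuppose some way of scoring identity between a candidate protein and the wild type. The text does not name an alignment method, so the following is only an illustrative sketch: it assumes the two sequences have already been aligned to equal length (gaps written as "-"), counts gap columns in the denominator, and uses a hypothetical function name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the inputs are pre-aligned to equal length; gap columns count
    toward the denominator but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical residues): 5 of 6 columns match.
print(percent_identity("MKRT-L", "MKRTAL") >= 60.0)
```

Other conventions (for example, excluding gap columns from the denominator) give different numbers, so a threshold is only meaningful once the convention is fixed.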
  • the CRISPR/Cas System protein nucleic acid-guided nuclease is or comprises Cas9.
  • the Cas9 of the present invention can be isolated, recombinantly produced, or synthetic.
  • Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang; “In vivo genome editing using Staphylococcus aureus Cas9” Nature 520, 186-191 (9 Apr. 2015) doi:10.1038/nature14299, which is incorporated herein by reference.
  • the Cas9 is a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Sta
  • the Cas9 is a Type II CRISPR system derived from S. pyogenes and the PAM sequence is NGG located on the immediate 3′ end of the target specific guide sequence.
  • the PAM sequences of Type II CRISPR systems from exemplary bacterial species can also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA), and Treponema denticola (NAAAAC), all of which are usable without deviating from the present invention.
  • Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR then cloned into pET30 (from EMD biosciences) to express in bacteria and purify the recombinant 6His tagged protein.
  • a “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a guide NA.
  • a Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein.
  • the Cas9 protein may have all the functions of a wild type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.
  • Cas9-associated guide NA refers to a guide NA as described above.
  • the Cas9-associated guide NA may exist isolated, or as part of a Cas9-gNA complex.
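The PAM sequences listed above use IUPAC degenerate codes (N for any base, R for A or G), and the PAM lies immediately 3′ of the target-specific guide sequence. A minimal sketch of how candidate sites might be enumerated on one strand follows. The function and dictionary names are illustrative assumptions, only the given strand is scanned, and a 20-nt guide length is assumed as the default.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# PAM sequences as listed in the text above.
PAMS = {
    "Streptococcus pyogenes": "NGG",
    "Staphylococcus aureus": "NNGRRT",
    "Neisseria meningitidis": "NNNNGATT",
    "Streptococcus thermophilus": "NNAGAA",
    "Treponema denticola": "NAAAAC",
}


def find_targets(seq: str, pam: str, guide_len: int = 20):
    """Return (start, guide, pam_match) for each guide followed 3' by the PAM."""
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


toy = "A" * 20 + "TGG" + "CC"  # hypothetical input sequence
print(find_targets(toy, PAMS["Streptococcus pyogenes"]))
```

Running the same function on the reverse complement of the input would cover sites on the opposite strand.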
  • non-CRISPR/Cas system proteins are used in the embodiments provided herein.
  • the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.
  • the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.
  • the non-CRISPR/Cas system proteins are from, or are derived from Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococc
  • non-CRISPR/Cas system proteins can be naturally occurring or engineered versions.
  • a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).
  • a “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA).
  • the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA.
  • the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
  • a non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type non-CRISPR/Cas system protein.
  • the non-CRISPR/Cas system protein may have all the functions of a wild type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
  • non-CRISPR/Cas system protein-associated guide NA refers to a guide NA.
  • the non-CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.
  • engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases).
  • the term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nucleases, for example inactivated HNH and RuvC nucleases.
  • Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the nucleic acid.
  • the catalytically dead nucleic acid-guided nuclease allows separation of the mixture into unbound nucleic acids and catalytically dead nucleic acid-guided nuclease-bound fragments.
  • a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.
  • the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.
  • the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2C2, or dNgAgo.
  • the catalytically dead nucleic acid-guided nuclease protein is a dCas9.
  • engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).
  • engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.
  • the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2C2 nickase, or a NgAgo nickase.
  • the nucleic acid-guided nuclease nickase is a Cas9 nickase.
  • a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to two gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.
  • a Cas9 nickase can be used to bind to a target sequence.
  • the term “Cas9 nickase” refers to a modified version of the Cas9 protein, containing a single inactive catalytic domain, i.e., either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide RNA-hybridized strand or the non-hybridized strand may be cleaved.
  • Cas9 nickases bound to 2 gRNAs that target opposite strands will create a double-strand break in the DNA.
  • This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.
  • Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase.
  • a nucleic acid-guided nuclease nickase cuts a single strand of double stranded nucleic acid, wherein the double stranded region comprises methylated nucleotides.
  • thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases).
  • the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is lowered, allowing for the generation of additional cleaved target sequences.
  • thermostable nucleic acid-guided nucleases maintain at least 50% activity, at least 55% activity, at least 60% activity, at least 65% activity, at least 70% activity, at least 75% activity, at least 80% activity, at least 85% activity, at least 90% activity, at least 95% activity, at least 96% activity, at least 97% activity, at least 98% activity, at least 99% activity, or 100% activity, when maintained at 75° C. or above for at least 1 minute.
  • thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained for at least 1 minute at least at 75° C., at least at 80° C., at least at 85° C., at least at 90° C., at least at 91° C., at least at 92° C., at least at 93° C., at least at 94° C., at least at 95° C., at least at 96° C., at least at 97° C., at least at 98° C., at least at 99° C., or at least at 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained at least at 75° C.
  • thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.
  • thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2C2, or thermostable NgAgo.
  • thermostable CRISPR/Cas system protein is thermostable Cas9.
  • Thermostable nucleic acid-guided nucleases can be isolated, for example by identifying them through sequence homology in the genomes of the thermophilic bacteria Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.
  • in another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease.
  • the sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.
  • Methods provided herein can employ enzymatic methods including but not limited to digestion, ligation, extension, overhang filling, transcription, reverse transcription, and amplification.
  • the method can comprise providing a nucleic acid (e.g., DNA); employing a first enzyme (or combinations of first enzymes) that cuts at a part of the PAM sequence in the nucleic acid, in a way that a residual nucleotide sequence from the PAM sequence is left; ligating an adapter that positions a type IIS restriction enzyme site (a type IIS enzyme cuts outside yet near its recognition motif) at a distance to eliminate the PAM sequence; employing a second, type IIS enzyme (or combination of second enzymes) to eliminate the PAM sequence together with the adapter; and fusing a sequence that can be recognized by protein members of the nucleic acid-guided nuclease (e.g., CRISPR/Cas) system, for example, a gRNA stem-loop sequence.
  • the first enzymatic reaction cuts part of the PAM sequence in a way that a residual nucleotide sequence from the PAM sequence is left, and the nucleotide immediately 5′ to the PAM sequence can be any purine or pyrimidine, not just a cytosine, for example, not just sites such as C/NGG or C/TAG, etc.
  • Table 1 shows exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.
  • Table 2 shows additional exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.
  • CRISPR/Cas System Species: Streptococcus pyogenes (SP); SpCas9. PAM Sequence: NGG. First Enzyme/Component: CviPII. Exemplary Strategy: Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I; ligate to adapter; cut with MlyI to remove PAM and 3′ adapter; ligate gRNA stem-loop sequence at 3′ end. Adapter oligo sequence (with Inosine overhangs, all in 5′>3′ direction): Adapter oligo 1: ggggGACTCggatccctatagtgatac aaagacgatgacgacaagcg (SEQ ID NO: 4404); Adapter oligo 2: gcctcgagc*t*a*atacgactcactatag ggatccaagtccc (*
  • Exemplary applications of the compositions and methods described herein are provided in FIG. 1, FIG. 2, FIG. 3, FIG. 4, FIG. 5, FIG. 6, and FIG. 7.
  • the figures depict non-limiting exemplary embodiments of the present invention that includes a method of constructing a gNA library (e.g., gRNA library) from input nucleic acids (e.g., DNA), such as genomic DNA (e.g., human genomic DNA).
  • the starting material can be fragmented genomic DNA (e.g., human) or other source DNA. These fragments are blunt-ended before constructing the library 101.
  • T7 promoter adapters are ligated to the blunt-ended DNA fragments 102, which are then PCR amplified.
  • Nt.CviPII is then used to generate a nick on one strand of the PCR product immediately 5′ to the CCD sequence 103.
  • T7 Endonuclease I cleaves on the opposite strand 1, 2, or 3 bp 5′ of the nick 104.
  • the resulting DNA fragments are blunt-ended with T4 DNA Polymerase, leaving an HGG sequence at the end of the DNA fragment 105.
  • the resulting DNA is cleaned and recovered on beads.
  • An adapter carrying a MlyI recognition site is ligated to the blunt-ended DNA fragment immediately 3′ of the HGG sequence 106.
  • MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 107.
  • the resulting DNA fragments are cleaned and recovered again on beads.
  • a gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome 108.
  • This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
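The FIG. 1 workflow relies on the fact that a CCD nick site on one strand reads as HGG on the complementary strand, so the 20 nucleotides immediately 5′ of each HGG are the candidate N20 sequences. The sketch below enumerates those candidates in silico on both strands. It is illustrative only: the function names and the toy input are assumptions, and it ignores enzyme efficiency, fragment sizes, and adapter chemistry.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, read 5'>3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


# H = A, C, or T. HGG on one strand is the reverse complement of CCD
# (D = A, G, or T) on the other strand.
_HGG_SITE = re.compile(r"(?=([ACGT]{20})([ACT]GG))")


def candidate_n20s(seq: str):
    """List (strand, n20, hgg) for every 20-mer immediately 5' of an HGG site."""
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in _HGG_SITE.finditer(s):
            hits.append((strand, m.group(1), m.group(2)))
    return hits


toy = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"  # hypothetical input
print(candidate_n20s(toy))
```

Because the pattern requires H rather than N before GG, sites preceded by G (that is, GGG) are not reported, which mirrors the HGG sequence left by the CCD nicking step.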
  • the starting material can be intact genomic DNA (e.g., human) or other source DNA 201.
  • Nt.CviPII and T7 Endonuclease I are used to generate nicks on each strand of the human genomic DNA, resulting in smaller DNA fragments 202.
  • DNA fragments of 200-600 bp are size selected on beads, then ligated with Y-shaped adapters carrying a GG overhang on the 5′ end.
  • One strand of the Y-shaped adapter contains a MlyI recognition site, whereas the other strand contains a mutated MlyI site and a T7 promoter sequence 203.
  • the T7 promoter sequence is at the distal end of the HGG sequence.
  • the MlyI sequence is at the rear end of HGG 204.
  • Digestion with MlyI generates a cleavage immediately 5′ of the HGG sequence 205.
  • MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 206.
  • a gRNA stem-loop sequence is then ligated to the blunt-end cleaved by MlyI, forming a gRNA library covering the human genome. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
  • the source DNA e.g., genomic DNA
  • the nicking enzyme can have a recognition site that is three or fewer bases in length.
  • CviPII is used, which can recognize and nick at a sequence of CCD (where D represents a base other than C).
  • Nicks can be proximal, surrounding a region containing the sequence (represented by the thicker line) which will be used to yield the guide RNA N20 sequence. When nicks are proximal, a double stranded break can occur and lead to 5′ or 3′ overhangs 302.
  • repair can comprise synthesizing a complementary strand.
  • repair can comprise removing overhangs. Repair can result in a blunt end including the N20 guide sequence and a sequence complementary to the nick recognition sequence (e.g., HGG, where H represents a base other than G).
  • a polymerase e.g., T4 polymerase
  • as shown in FIG. 4, different combinations of adapters can be ligated to the DNA to allow for the desired cleaving.
  • Adapters with a recognition site for a nuclease enzyme that cuts 3 base pairs from the site can be ligated 401, and digestion at that site can be used to remove a left-over sequence, such as an HGG sequence 402.
  • These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BsaXI).
  • the first enzyme can be used to cut 20 nucleotides down, thereby keeping the N20 sequence 404.
  • a promoter adapter e.g., T7
  • the nuclease corresponding to the second recognition site e.g., BsaXI
  • the guide RNA stem-loop sequence adapter can be ligated to the N20 sequence 407 to prepare for guide RNA production.
  • the protocol shown in FIG. 5 can follow the end of a protocol such as that shown in FIG. 3 .
  • Adapters with a recognition site for a nuclease enzyme that cleaves 25 nucleotides from the site (e.g., EcoP15I)
  • These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BaeI) and any other left-over sequence, such as HGG.
  • the enzyme corresponding to the first recognition site e.g., EcoP15I
  • a promoter adapter e.g., T7
  • the enzyme corresponding to the second recognition site e.g., BaeI
  • the guide RNA stem-loop sequence adapter can be ligated (e.g., by single strand ligation) to the N20 sequence 505.
  • a nick can be introduced by a nicking enzyme (e.g., CviPII) 601.
  • the nick recognition site is three or fewer bases in length.
  • CviPII is used, which can recognize and nick at a sequence of CCD.
  • a polymerase e.g., Bst large fragment DNA polymerase
  • the nick can be sealed and made available to be nicked again 603.
  • target sequences 604 can be made double stranded, for example by random priming and extension.
  • double stranded nucleic acids comprising N20 sequences can then be further processed by methods disclosed herein, such as those shown in FIG. 4 or FIG. 5 .
  • the protocol shown in FIG. 7 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5 .
  • a nick can be introduced by a nicking enzyme (e.g., CviPII) 701.
  • the nicking enzyme recognition site is three or fewer bases in length.
  • CviPII is used, which can recognize and nick at a sequence of CCD.
  • a polymerase e.g., Bst large fragment DNA polymerase
  • Bst large fragment DNA polymerase can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand (e.g., nicking endonuclease-mediated strand-displacement DNA amplification (NEMDA)).
  • the reaction parameters can be adjusted to control the size of the single stranded DNA produced. For example, the nickase:polymerase ratio (e.g., CviPII:Bst large fragment polymerase ratio) can be adjusted. Reaction temperature can also be adjusted.
  • an oligonucleotide can be added 704 which has (in the 5′>3′ direction) a promoter (e.g., T7 promoter) 702 followed by a random n-mer (e.g., random 6-mer, random 8-mer) 703.
  • the random n-mer region can bind to a region of the single stranded DNA generated previously.
  • binding can be conducted by denaturing at high temperature followed by rapid cool down, which can allow the random n-mer region to bind to the single stranded DNA generated by NEMDA.
  • the DNA is denatured at 98° C. for 7 minutes then cooled down rapidly to 10° C.
  • Extension and/or amplification can be used to produce double-stranded DNA.
  • Blunt ends can be produced, for example enzymatically (e.g., by treatment with DNA polymerase I at 20° C.). This can result in one end ending at the promoter (e.g., T7 promoter) and the other end ending at any nicking enzyme recognition sites (e.g., any CCD sites). These fragments can then be purified, for example by size selection (e.g., by gel purification, capillary electrophoresis, or other fragment separation techniques).
  • the target fragments are about 50 base pairs in length (adapter sequence (e.g., T7 adapter)+target N20 sequence+nicking enzyme recognition site or complement (e.g., HGG)). Fragments can then be ligated to an adapter comprising a nuclease recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705 .
  • an adapter comprising a nuclease recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705 .
  • a three-nucleotide long nicking enzyme recognition site e.g., CCD for CviPII
  • BaeI can be used.
  • the appropriate nuclease e.g., BaeI
  • the remaining nucleic acid sequence (e.g., the N20 site) can then be ligated to the final stem-loop sequence for the guide RNA 707 .
  • Amplification e.g., PCR
  • Guide RNAs can be produced.
  • a collection of gNAs e.g., gRNAs
  • mtDNA human mitochondrial DNA
  • the targeting sequences of this collection of gNAs are encoded by DNA sequences comprising at least the 20 nt sequence provided in the second column from the right of Table 3 (if the NGG sequence is on positive strand) and Table 4 (if the NGG sequence is on negative strand).
  • a collection of gRNA nucleic acids, as provided herein, with specificity for human mitochondrial DNA comprises a plurality of members, wherein the members comprise a plurality of targeting sequences provided in the second column from the right of Table 3 and/or the second column from the right of Table 4.
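The positive-strand and negative-strand cases described above can be sketched in Python as follows. This is a hedged illustration only: it enumerates every 20 nt sequence immediately 5′ of an NGG motif on each strand of an input sequence, and it is not the procedure used to build Tables 3 and 4. The function names are hypothetical.

```python
import re

# 20 nt followed by NGG; the lookahead reports overlapping candidates too.
N20_NGG = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_targets(seq: str) -> dict[str, list[str]]:
    """Collect 20 nt sequences lying directly 5' of an NGG motif.

    "+" holds candidates read from the sequence as given; "-" holds
    candidates read from its reverse complement (NGG on the negative strand).
    """
    seq = seq.upper()
    return {
        "+": [m.group(1) for m in N20_NGG.finditer(seq)],
        "-": [m.group(1) for m in N20_NGG.finditer(reverse_complement(seq))],
    }


site = "A" * 20 + "TGG"  # hypothetical 23 nt input: N20 followed by TGG
print(candidate_targets(site))
```

For the hypothetical input above, the only candidate is found on the "+" strand; passing the reverse complement of the same input moves it to the "-" list.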
  • gNAs e.g., gRNAs
  • collections of gNAs e.g., gRNAs
  • gNAs e.g., gRNAs
  • depletion, partitioning, capture, or enrichment of target sequences of interest genome-wide labeling
  • genome-wide editing genome-wide function screens
  • genome-wide regulation
  • the gNAs are selective for host nucleic acids in a biological sample from a host, but are not selective for non-host nucleic acids in the sample from a host. In one embodiment, the gNAs are selective for non-host nucleic acids from a biological sample from a host but are not selective for the host nucleic acids in the sample. In one embodiment, the gNAs are selective for both host nucleic acids and a subset of the non-host nucleic acids in a biological sample from a host. For example, where a complex biological sample comprises host nucleic acids and nucleic acids from more than one non-host organism, the gRNAs may be selective for more than one of the non-host species.
  • the gNAs are used to serially deplete or partition the sequences that are not of interest.
  • saliva from a human contains human DNA, as well as the DNA of more than one bacterial species, but may also contain the genomic material of an unknown pathogenic organism.
  • gNAs directed at the human DNA and the known bacteria can be used to serially deplete the human DNA, and the DNA of the known bacteria, thus resulting in a sample comprising the genomic material of the unknown pathogenic organism.
  • the gNAs are selective for human host DNA obtained from a biological sample from the host, but do not hybridize with DNA from an unknown pathogen(s) also obtained from the sample.
  • the gNAs are useful for depleting and partitioning of targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collection of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.
  • a plurality of complexes comprising (i) any one of the collection of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.
  • the gNAs are useful for a method of depletion and partitioning of targeted sequences in a sample comprising: providing nucleic acids extracted from a sample, wherein the extracted nucleic acids comprise sequences of interest and targeted sequences for one of depletion and partitioning; contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the nucleic acids in the sample.
  • nucleic acid-guided nuclease e.g., CRISPR/Cas
  • the gNAs are useful for enriching a sample for non-host nucleic acids comprising: providing a sample comprising host nucleic acids and non-host nucleic acids; contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the host nucleic acids in the sample, thereby depleting the sample of host nucleic acids, and allowing for the enrichment of non-host nucleic acids.
  • a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas
  • the gNAs are useful for one method for serially depleting targeted nucleic acids in a sample comprising: providing a biological sample from a host comprising host nucleic acids and non-host nucleic acids, wherein the non-host nucleic acids comprise nucleic acids from at least one known non-host organism and nucleic acids from an unknown non-host organism; providing a plurality of complexes comprising (i) a collection of gNAs provided herein, directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins; mixing the nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes (e.g., gRNA-CRISPR/Cas system protein complexes) configured to hybridize to targeted sequences in the host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted
  • the gNAs generated herein are used to perform genome-wide or targeted functional screens in a population of cells.
  • libraries of in vitro-transcribed gNAs e.g., gRNAs
  • vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein, in a way that gNA-directed nucleic acid-guided nuclease system protein editing can be achieved to sequences across the entire genome or to a specific region of the genome.
  • CRISPR/Cas CRISPR/Cas
  • the nucleic acid-guided nuclease system protein can be introduced as a DNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as mRNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as protein. In one exemplary embodiment, the nucleic acid-guided nuclease system protein is Cas9.
  • the gNAs generated herein are used for the selective capture and/or enrichment of nucleic acid sequences of interest.
  • the gNAs generated herein are used for capturing target nucleic acid sequences comprising: providing a sample comprising a plurality of nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.
  • nucleic acid-guided nuclease e.g., CRISPR/Cas
  • the gNAs generated herein are used for introducing labeled nucleotides at targeted sites of interest comprising: (a) providing a sample comprising a plurality of nucleic acid fragments; (b) contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-nickases (e.g.
  • Cas9-nickases wherein the gNAs are complementary to targeted sites of interest in the nucleic acid fragments, thereby generating a plurality of nicked nucleic acid fragments at the targeted sites of interest; and (c) contacting the plurality of nicked nucleic acid fragments with an enzyme capable of initiating nucleic acid synthesis at a nicked site, and labeled nucleotides, thereby generating a plurality of nucleic acid fragments comprising labeled nucleotides in the targeted sites of interest.
  • the gNAs generated herein are used for capturing target nucleic acid sequences of interest comprising: (a) providing a sample comprising a plurality of adapter-ligated nucleic acids, wherein the nucleic acids are ligated to a first adapter at one end and are ligated to a second adapter at the other end; and (b) contacting the sample with a collection of gNAs which comprise a plurality of dead nucleic acid-guided nuclease-gNA complexes (e.g., dCas9-gRNA complexes), wherein the dead nucleic acid-guided nuclease (e.g., dCas9) is fused to a transposase, wherein the gNAs are complementary to targeted sites of interest contained in a subset of the nucleic acids, and wherein the dead nucleic acid-guided nuclease-gNA transposase complexes (e.g., d
  • the gNAs generated herein are used to perform genome-wide or targeted activation or repression in a population of cells.
  • libraries of in vitro-transcribed gNAs e.g., gRNAs
  • vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a catalytically dead nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein fused to an activator or repressor domain (catalytically dead nucleic acid-guided nuclease system protein-fusion protein), in a way that gNA-directed catalytically dead nucleic acid-guided nuclease system protein-mediated activation or repression can be achieved at sequences across the entire genome or to a specific region of the genome.
  • a catalytically dead nucleic acid-guided nuclease e.g., CRISPR/Cas
  • the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as DNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as mRNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as protein. In some embodiments, the collection of gNAs or nucleic acids encoding for gNAs exhibit specificity for more than one nucleic acid-guided nuclease system protein. In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease system protein is dCas9.
  • the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for Cas9 and one or more CRISPR/Cas system proteins selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for various catalytically dead CRISPR/Cas system proteins fused to different fluorophores, for example for use in the labeling and/or visualization of different genomes or portions of genomes, for use in the labeling and/or visualization of different chromosomal regions, or for use in the labeling and/or visualization of the integration of viral genes/genomes into a genome.
  • the collection of gNAs (or nucleic acids encoding for gNAs) have specificity for different nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, and target different sequences of interest, for example from different species.
  • nucleic acid-guided nuclease e.g., CRISPR/Cas
  • a first subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a first species can be first mixed with a first nucleic acid-guided nuclease system protein member (or an engineered version); and a second subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a second species can be mixed with a second different nucleic acid-guided nuclease system protein member (or an engineered version).
  • the nucleic acid-guided nuclease system proteins can be catalytically dead versions (for example dCas9) fused with different fluorophores, so that different targeted sequences of interest, e.g. different species genomes, or different chromosomes of one species, can be labeled by different fluorescent labels.
  • different chromosomal regions can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of genetic translocations.
  • different viral genomes can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of integration of different viral genomes into the host genome.
  • the nucleic acid-guided nuclease system protein can be dCas9 fused with either an activation or a repression domain, so that different targeted sequences of interest, e.g. different chromosomes of a genome, can be differentially regulated.
  • the nucleic acid-guided nuclease system protein can be dCas9 fused to different protein domains which can be recognized by different antibodies, so that different targeted sequences of interest, e.g. different DNA sequences within a sample mixture, can be differentially isolated.
  • a composition comprising a nucleic acid fragment, a nickase nucleic acid-guided nuclease-gNA complex, and labeled nucleotides.
  • a composition comprising a nucleic acid fragment, a nickase Cas9-gRNA complex, and labeled nucleotides.
  • the nucleic acid may comprise DNA.
  • the nucleotides can be labeled, for example with biotin.
  • the nucleotides can be part of an antibody-conjugate pair.
  • composition comprising a nucleic acid fragment and a catalytically dead nucleic acid-guided nuclease-gNA complex, wherein the catalytically dead nucleic acid-guided nuclease is fused to a transposase.
  • a composition comprising a DNA fragment and a dCas9-gRNA complex, wherein the dCas9 is fused to a transposase.
  • composition comprising a nucleic acid fragment comprising methylated nucleotides, a nickase nucleic acid-guided nuclease-gNA complex, and unmethylated nucleotides.
  • a composition comprising a DNA fragment comprising methylated nucleotides, a nickase Cas9-gRNA complex, and unmethylated nucleotides.
  • a gDNA complexed with a nucleic acid-guided-DNA endonuclease is NgAgo.
  • a gDNA complexed with a nucleic acid-guided-RNA endonuclease is provided herein.
  • provided herein is a gRNA complexed with a nucleic acid-guided-DNA endonuclease.
  • a gRNA complexed with a nucleic acid-guided-RNA endonuclease comprises C2c2.
  • kits comprising any one or more of the compositions described herein, not limited to adapters, gNAs (e.g., gRNAs), gNA collections (e.g., gRNA collections), nucleic acid molecules encoding the gNA collections, and the like.
  • gNAs e.g., gRNAs
  • gNA collections e.g., gRNA collections
  • nucleic acid molecules encoding the gNA collections e.g., gRNAs
  • the kit comprises a collection of DNA molecules capable of transcribing into a library of gRNAs wherein the gRNAs are targeted to human genomic or other sources of DNA sequences.
  • the kit comprises a collection of gNAs wherein the gNAs are targeted to human genomic or other sources of DNA sequences.
  • kits comprising any of the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits comprising any of the collection of gNAs, as described herein.
  • kits that comprise all essential reagents and instructions for carrying out the methods of making individual gNAs and collections of gNAs as described herein.
  • the software can compute and report the abundance of non-target sequence in the sample before and after providing the gNA collection to ensure no off-target targeting occurs, and wherein the software can check the efficacy of targeted depletion/enrichment/capture/partitioning/labeling/regulation/editing by comparing the abundance of the target sequence before and after providing the gNA collection to the sample.
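A before/after abundance comparison of the kind described above might be sketched as follows. This is a minimal illustration, not the software referred to in the text: it assumes abundance is expressed as a fraction of total read counts, and the function names, category labels, and counts are all hypothetical.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts per category into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {name: n / total for name, n in counts.items()}


def abundance_report(before: dict[str, int], after: dict[str, int]) -> dict[str, dict]:
    """Compare fractional abundance before and after gNA treatment.

    fold_change < 1 means the category was depleted relative to the rest of
    the sample; fold_change > 1 means it was enriched.
    """
    frac_before = relative_abundance(before)
    frac_after = relative_abundance(after)
    report = {}
    for name, fb in frac_before.items():
        fa = frac_after.get(name, 0.0)
        report[name] = {
            "before": fb,
            "after": fa,
            "fold_change": fa / fb if fb > 0 else None,
        }
    return report


# Hypothetical read counts, for illustration only.
before = {"target": 900, "non_target": 100}
after = {"target": 90, "non_target": 100}
for name, row in abundance_report(before, after).items():
    print(name, row)
```

In this made-up example the target fraction falls while the non-target read count is unchanged, which is the pattern expected for a depletion with no off-target loss. Comparing raw counts alongside the fractions helps separate true off-target loss from changes caused only by normalization.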
  • Example 1 Construction of a gRNA Library from a T7 Promoter Human DNA Library
  • Human genomic DNA 400 ng was fragmented using an S2 Covaris sonicator (Covaris) for 8 cycles, to yield fragments of 200-300 bp in length. Fragmented DNA was repaired using the NEBNext End Repair Module (NEB) and incubated at 25° C. for 30 min, then heat inactivated at 75° C. for 20 min.
  • NEB NEBNext End Repair Module
  • To make T7 promoter adapters, oligos T7-1 (5′GCCTCGAGC*T*A*ATACGACTCACTATAGAG3′, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4397) and T7-2 (sequence 5′Phos-CTCTATAGTGAGTCGTATTA3′) (SEQ ID NO: 4398) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. T7 promoter blunt adapters (15 pmol total) were then added to the blunt-ended human genomic DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C.
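For planning purposes, the duration of the annealing program described above (98° C. for 3 min, then a 0.1° C./min ramp down to 30° C.) can be estimated with a short calculation. The sketch below is only arithmetic on the stated values; the function name is hypothetical.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_min


hold_min = 3.0                             # 98 C hold from the protocol
cool_min = ramp_minutes(98.0, 30.0, 0.1)   # slow cooling step
total_min = hold_min + cool_min
print(f"cooling: {cool_min:.0f} min, total: {total_min / 60:.1f} h")
```

At this ramp rate the cooling step alone takes roughly 680 minutes, so the annealing is effectively an overnight program.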
  • NEB Blunt/TA Ligase Master Mix
  • Ligations were amplified with 2 μM oligo T7-1, using Hi-Fidelity 2× Master Mix (NEB) for 10 cycles of PCR (98° C. for 20 s, 63° C. for 20 s, 72° C. for 35 s). Amplification was verified by agarose gel electrophoresis of a small aliquot. PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • PCR amplified T7 promoter DNA (2 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 10 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C. ((3) in FIG. 1), then heat inactivated at 75° C. for 20 min. An additional 10 μL of NEB buffer 2 with 1 μL of T7 Endonuclease I (NEB) was added to the reaction, and incubated at 37° C. for 20 min ((4) in FIG. 1).
  • DNA was then blunted using T4 DNA Polymerase (NEB) for 20 min at 25° C., followed by heat inactivation at 75° C. for 20 min ((5) in FIG. 1 ).
  • NEB T4 DNA Polymerase
  • oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4399) and MlyI-2 (sequence 5′>3′, TCACTATAGGGATCCGAGTCCC) (SEQ ID NO: 4400) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C.
  • MlyI adapters (15 pmol total) were then added to T4 DNA Polymerase-blunted DNA, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 1). Ligations were heat inactivated at 75° C. for 20 min, then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., so that HGG motifs are eliminated ((7) in FIG. 1). Digests were then cleaned using 0.8× AxyPrep beads (Axygen), and DNA was resuspended in 10 μL of 10 mM Tris-Cl pH 8.
  • oligos stlgR (sequence 5′>3′, 5′Phos-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTGGATCCGATGC) (SEQ ID NO: 4401) and stlgRev (sequence 5′>3′, GGATCCAAAAAAAGCACCGACTCGGTGCCACUITTITCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4402) were admixed at 15 μM, heated to 98° C.
  • PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • the T7/gRU amplified library of PCR products was then used as template for in vitro transcription, using the HiScribe T7 In Vitro Transcription Kit (NEB). 500-1000 ng of template was incubated overnight at 37° C. according to the manufacturer's instructions.
  • To transcribe the guide libraries into gRNAs the following in vitro transcription reaction mixture was assembled: 10 μL of purified library (~500 ng), 6.5 μL of H2O, 2.25 μL of ATP, 2.25 μL of CTP, 2.25 μL of GTP, 2.25 μL of UTP, 2.25 μL of 10× reaction buffer (NEB) and 2.25 μL of T7 RNA Polymerase mix. The reaction was incubated at 37° C. for 24 hr, then purified using the RNA cleanup kit (Life Technologies), eluted with 100 μL of RNase-free water, quantified and stored at −20° C. until use.
  • Human genomic DNA ((1) in FIG. 2; 20 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 40 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C., then heat inactivated at 75° C. for 20 min. An additional 40 μL of NEB buffer 2 and 1 μL of T7 Endonuclease I (NEB) were added to the reaction, with 20 min incubation at 37° C. (e.g., (2) in FIG. 2).
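The in vitro transcription mixture listed above can be tabulated to check its total volume and to scale it for several reactions. The sketch below uses only the volumes given in the text; the scaling helper and its name are hypothetical, and any overage for pipetting loss is left to the user.

```python
# Per-reaction component volumes in microliters, as listed in the text.
IVT_COMPONENTS_UL = {
    "purified library (~500 ng)": 10.0,
    "H2O": 6.5,
    "ATP": 2.25,
    "CTP": 2.25,
    "GTP": 2.25,
    "UTP": 2.25,
    "10x reaction buffer": 2.25,
    "T7 RNA Polymerase mix": 2.25,
}


def total_volume(components: dict[str, float]) -> float:
    """Sum of all component volumes for one reaction, in microliters."""
    return sum(components.values())


def scale(components: dict[str, float], n_reactions: int) -> dict[str, float]:
    """Multiply every component volume by the number of reactions."""
    return {name: vol * n_reactions for name, vol in components.items()}


print(total_volume(IVT_COMPONENTS_UL))        # 30.0
print(scale(IVT_COMPONENTS_UL, 4)["H2O"])     # 26.0
```

The listed components add up to a 30 μL reaction.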
  • DNA fragments between 200 and 600 bp were recovered by adding 0.3× AxyPrep beads (Axygen), incubating at 25° C. for 5 min, capturing beads on a magnetic stand and transferring the supernatant to a new tube. DNA fragments below 600 bp do not bind to beads at this bead/DNA ratio and remain in the supernatant. 0.7× AxyPrep beads (Axygen) were then added to the supernatant (this will bind all DNA molecules longer than 200 bp) and allowed to bind for 5 min.
  • oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4404) and T7-7 (sequence 5′>3′, GCCTCGAGC*T*A*ATACGACTCACTATAGGGATCCAAGTCCC, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C.
  • Nt.CviPII/T7 Endonuclease I digested DNA 100 ng was then ligated to 15 pmol of T7/MlyI adapters using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((3) in FIG. 2 ). Ligations were then amplified by 10 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C.
  • PCR products were then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., and heat inactivated at 75° C. for 20 min ((5) in FIG. 2 ).
  • 5 pmol of adapter StlgR (in Example 1) was ligated using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 2 ).
  • Ligations were then amplified by PCR using Hi-Fidelity 2× Master Mix (NEB), 2 μM of both oligos T7-7 and gRU (in Example 1) and 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s).
  • PCR amplified products were recovered using 0.6× AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • the DNA was then ligated to MlyI adapter (see, for example, Example 4) or BaeI/EcoP15I adapters (see, for example, Example 5)
  • Adapter MlyI was made by combining 2 μmoles of MlyI Ad1 and MlyAd2 in 40 μL water.
  • Adapter BsaXI/MmeI was made by combining 2 μmoles oligo BsMm-Ad1 and 2 μmoles oligo BsMm-Ad2 in 40 μL water.
  • T7 adapter was made by combining 1.5 μmoles of T7-Ad1 and T7-Ad2 oligos in 100 μL water.
  • Stem-loop adapter was made by combining 1.5 μmoles of gR-top and gR-bot oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.
  • the DNA containing the CCD blunt ends was then ligated to 50 pmoles of adapter MlyI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes.
  • the DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). These steps eliminate small (<100 nucleotides) DNA and MlyI adapter dimers.
  • DNA was then digested by adding 20 units of MlyI (New England Biolabs) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was recovered from the digest by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 30 μL buffer 4.
  • MlyI New England Biolabs
  • CCD complementary HGG
  • the purified DNA was then ligated to 50 pmoles of adapter BsaXI/MmeI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes.
  • the DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9).
  • DNA was then digested by addition of 20 units MmeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7 adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4, then digested with 20 units of BsaXI for 1 hour at 37° C.
  • MmeI New England Biolabs
  • SAM S-adenosyl methionine
  • the guide RNA stem-loop sequences were added by adding 15 pmoles stem-loop adapter and using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 min. DNA was then recovered using a PCR cleanup kit (Zymo), eluted in 20 μL elution buffer and PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad1 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate guide RNAs.
  • Adapter BaeI/EcoP15I was made by combining 2 μmoles of BE Ad1 and BE Ad2 in 40 μL water.
  • T7-E adapter was made by combining 1.5 μmoles of T7-Ad3 and T7-Ad4 oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.
  • the DNA containing the CCD blunt ends was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes.
  • the DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9).
  • DNA was then digested by addition of 20 units EcoP15I (New England Biolabs) and 1 mM ATP at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7-E adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4.
  • EcoP15I New England Biolabs
  • Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.
  • BaeI New England Biolabs
  • SAM S-adenosyl methionine
  • ss ligation buffer 10 mM Bis-Tris-Propane-HCl, 10 mM MgCl 2 , 1 mM DTT, 2.5 mM MnCl 2 , pH 7 @ 25° C.
  • DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs).
  • Primers T7-Ad3 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 20 sec, 30 cycles).
  • the PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.
  • NEMDA Nicking Endonuclease Mediated DNA Amplification
  • 100 μL thermo polymerase buffer (20 mM Tris-HCl, 10 mM (NH4)2SO4, 10 mM KCl, 6 mM MgSO4, 0.1% Triton® X-100, pH 8.8) supplemented with 0.3 mM dNTPs, 40 units of Bst large fragment DNA polymerase, and 0.1 units of Nt.CviPII (New England Biolabs) at 55° C. for 45 min, followed by 65° C. for 30 min and finally 80° C. for 20 min in a thermal cycler.
  • the DNA was then diluted with 300 μL of buffer 4 supplemented with 200 pmoles of T7-RND8 oligo (sequence 5′>3′ gcctcgagctaatacgactcactatagagnnnnnnnn) (SEQ ID NO: 4420) and boiled at 98° C. for 10 min followed by rapid cooling to 10° C. for 5 min.
  • the reaction was then supplemented with 40 units of E. coli DNA polymerase I and 0.1 mM dNTPs (New England Biolabs) and incubated at room temperature for 20 min followed by heat inactivation at 75° C. for 20 min. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 30 μL elution buffer.
  • DNA was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes.
  • the DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9).
  • Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.
  • BaeI New England Biolabs
  • SAM S-adenosyl methionine
  • DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs).
  • Primers T7-Ad3 (sequence 5 ′> 3 ′ gctcgagctaatacgactcactatagag) (SEQ ID NO: 12) and gRU (sequence 5 ′> 3 ′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.
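The primer pair in the last step can be sanity-checked in silico. The following is a minimal sketch, not part of the described protocol: it tests whether the gRU primer is the reverse complement of the 3′ end of the DNA encoding SEQ ID NO: 1 (as given in the Summary), and whether T7-Ad3 contains a T7 promoter core. The `T7_CORE` string is an assumption (the commonly cited minimal T7 promoter), and the helper name `reverse_complement` is illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# Primers as written in the protocol.
T7_AD3 = "gctcgagctaatacgactcactatagag".upper()
GRU = "AAAAAAGCACCGACTCGGTG"

# SEQ ID NO: 1 (RNA) as given in the Summary, with whitespace removed.
SEQ_ID_NO_1_RNA = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
    "CCGAGUCGGUGCUUUUUUU"
)
scaffold_dna = SEQ_ID_NO_1_RNA.replace("U", "T")

# Assumption: commonly cited minimal T7 promoter core (not stated in the text).
T7_CORE = "TAATACGACTCACTATAG"

# Site on the scaffold-encoding strand that the gRU primer would anneal opposite to.
gru_site = reverse_complement(GRU)

print("gRU matches scaffold 3' region:", gru_site in scaffold_dna)
print("T7-Ad3 contains T7 core:", T7_CORE in T7_AD3)
```

Under these assumptions, a transcript made from the amplicon would carry the promoter-proximal primer sequence at its 5′ end and the scaffold at its 3′ end.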


Abstract

Provided herein are methods and compositions to make guide nucleic acids (gNAs), nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs from any source nucleic acid. Also provided herein are methods and compositions to use the resulting gNAs, nucleic acids encoding gNAs, collections of gNAs, and nucleic acids encoding for a collection of gNAs in a variety of applications.

Description

    CROSS-REFERENCE
  • This application is a continuation of U.S. application Ser. No. 15/742,862, filed Jan. 8, 2018, which is a U.S. National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2016/065420, filed Dec. 7, 2016, which claims the benefit of U.S. Provisional Application No. 62/264,262, filed Dec. 7, 2015, and of U.S. Provisional Application No. 62/298,963, filed Feb. 23, 2016, each of which is hereby incorporated by reference in its entirety.
  • INCORPORATION BY REFERENCE OF SEQUENCE LISTING
  • The present application is being filed with a Sequence Listing in electronic format. The ASCII text file of the sequence listing named “155949-00086_Sequence_Listing.TXT”, which is 797 kb in size, was created on Aug. 17, 2020, and electronically submitted via EFS-Web herewith; its content is incorporated herein by reference in its entirety.
  • BACKGROUND
  • Human clinical DNA samples and sample libraries such as cDNA libraries derived from RNA contain highly abundant sequences that have little informative value and increase the cost of sequencing. While methods have been developed to deplete these unwanted sequences (e.g., via hybridization capture), these methods are often time-consuming and can be inefficient.
  • Although guide nucleic acid (gNA)-mediated nuclease systems (such as guide RNA (gRNA)-mediated Cas systems) can efficiently deplete any target DNA, targeted depletion of very high numbers of unique DNA molecules is not feasible. For example, a sequencing library derived from human blood may contain >99% human genomic DNA. Using a gRNA-mediated Cas9 system-based method to deplete this genomic DNA to detect an infectious agent circulating in the human blood would require extremely high numbers of gRNAs (about 10-100 million gRNAs), in order to ensure that a gRNA will be present every 30-50 base pairs (bp), and that no target DNA will be missed. Very large numbers of gRNAs can be predicted computationally and then synthesized chemically, but at a prohibitively expensive cost.
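The order of magnitude quoted above can be reproduced with a short calculation. This is an illustrative sketch only; the genome size used here (about 3.1 billion bp for a haploid human genome) is an assumption that does not appear in the text, and `guides_needed` is a hypothetical helper name.

```python
# Assumption: approximate haploid human genome size (not stated in the text).
HUMAN_GENOME_BP = 3_100_000_000


def guides_needed(genome_bp: int, spacing_bp: int) -> int:
    """Number of guides needed to place one guide every `spacing_bp` bp (rounded up)."""
    return -(-genome_bp // spacing_bp)  # ceiling division on integers


for spacing in (30, 50):
    n = guides_needed(HUMAN_GENOME_BP, spacing)
    print(f"one gRNA every {spacing} bp: about {n / 1e6:.0f} million gRNAs")
```

With this assumed genome size, the result falls at the upper end of the "about 10-100 million gRNAs" range given above.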
  • Therefore, there is a need in the art to provide a cost-effective method of converting any DNA into a gNA (e.g., gRNA) library to enable, for example, genome-wide depletion of unwanted DNA sequences from those of interest, without prior knowledge about their sequences. Provided herein are methods and compositions that address this need.
  • SUMMARY
  • Provided herein are compositions and methods to generate gNAs and collections of gNAs from any source nucleic acid. For example, gRNAs and collections of gRNAs can be generated from source DNA, such as genomic DNA. Such gNAs and collections of the same are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest, genome-wide labeling, genome-wide editing, genome-wide functional screens, and genome-wide regulation.
  • In one aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size. In another aspect, the invention described herein provides a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the second segment varies from 15-250 bp across the collection of nucleic acids. In some embodiments, at least 10% of the second segments in the collection are greater than 21 bp. In some embodiments, the size of the second segment is not 20 bp. In some embodiments, the size of the second segment is not 21 bp. In some embodiments, the collection of nucleic acids is a collection of DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. 
In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10² unique nucleic acid molecules. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment encodes for a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. 
In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence. In some embodiments, a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence. In some embodiments, the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
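The relationship between a targeting sequence and a PAM sequence described in this aspect can be illustrated computationally. The sketch below is not the enzymatic method of the invention; it only scans a DNA string for AGG, CGG, or TGG sites on either strand and reports the nucleotides immediately 5′ to each site. The function name `targets_5prime_of_pam` and the default length of 20 nt are illustrative assumptions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def targets_5prime_of_pam(seq: str, length: int = 20) -> list[tuple[str, str, str]]:
    """Return (strand, candidate targeting sequence, PAM) for every AGG/CGG/TGG
    site that has at least `length` nucleotides 5' to it on the same strand."""
    hits = []
    top = seq.upper()
    for strand, s in (("+", top), ("-", reverse_complement(top))):
        # Zero-width lookahead so that overlapping PAM sites are all reported.
        for match in re.finditer(r"(?=([ACT]GG))", s):
            start = match.start() - length
            if start >= 0:
                hits.append((strand, s[start:match.start()], match.group(1)))
    return hits


toy = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"  # hypothetical input sequence
for strand, target, pam in targets_5prime_of_pam(toy):
    print(strand, target, pam)
```

The `length` parameter can be varied to mimic collections in which the targeting sequence is longer than 21 bp.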
  • In another aspect, the invention described herein provides for a collection of guide RNAs (gRNAs), comprising: a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a nucleic acid-guided nuclease system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size. In some embodiments, the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein. In some embodiments, the size of the first segment varies from 15-250 bp across the collection of gRNAs. In some embodiments, at least 10% of the first segments in the collection are greater than 21 bp. In some embodiments, the size of the first segment is not 20 bp. In some embodiments, the size of the first segment is not 21 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the first segments is RNA encoded by sequences selected from Table 3 and/or Table 4. In some embodiments, the collection comprises at least 10² unique gRNAs. In some embodiments, the gRNAs comprise cytosine, guanine, and adenine. In some embodiments, a subset of the gRNAs further comprises thymine. In some embodiments, a subset of the gRNAs further comprises uracil. In some embodiments, the first segment is at least 80% complementary to a target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG. 
In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. In some embodiments, the third segment comprises the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the second segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the second segment comprises a Cas9-binding sequence. In some embodiments, at least 10% of the gRNAs in the collection vary in their 5′ terminal-end sequence. In some embodiments, the collection comprises targeting sequences directed to sequences of interest spaced every 10,000 bp or less across the genome of an organism. In some embodiments, a plurality of second segments of the collection comprise a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the second segments of the collection comprise a second nucleic acid-guided nuclease system protein binding sequence. 
In some embodiments, the second segments of the collection comprise a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins. In some embodiments, a plurality of the gRNAs of the collection are attached to a substrate. In some embodiments, a plurality of the gRNAs of the collection comprise a label. In some particular embodiments, a plurality of the gRNAs of the collection comprise different labels.
  • In another aspect, the invention described herein provides a nucleic acid comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the targeting sequence is greater than 30 bp; and a third segment encoding a nucleic acid encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the nucleic acid is DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA. In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome. In some embodiments, the targeting sequence is directed at abundant or repetitive DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the sequence of the second segments is selected from Table 3 and/or Table 4. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the target genomic sequence of interest is 5′ upstream of a PAM sequence. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the third segment comprises DNA encoding a gRNA stem-loop sequence. 
In some embodiments, the third segment encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the third segment comprises DNA encoding a Cas9-binding sequence.
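The three-segment arrangement (regulatory region, targeting sequence, protein-binding sequence) can be modelled as simple string concatenation. The following is a hedged sketch: the T7 promoter string and the rule that transcription starts at its final G are assumptions not stated in this passage, the 35-nt targeting sequence is a made-up example, and `build_template` and `transcribe` are hypothetical helper names.

```python
# Assumption: commonly cited minimal T7 promoter; transcription is assumed
# to begin at its final G.
T7_PROMOTER = "TAATACGACTCACTATAG"

# SEQ ID NO: 1 (RNA), whitespace removed.
SCAFFOLD_RNA = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
    "CCGAGUCGGUGCUUUUUUU"
)


def build_template(targeting_dna: str) -> str:
    """Concatenate the first segment (promoter), second segment (targeting
    sequence) and third segment (DNA encoding the protein-binding sequence)."""
    return T7_PROMOTER + targeting_dna.upper() + SCAFFOLD_RNA.replace("U", "T")


def transcribe(template: str) -> str:
    """Return the RNA read from the sense strand, starting at the last
    nucleotide of the promoter (the assumed transcription start)."""
    start = template.index(T7_PROMOTER) + len(T7_PROMOTER) - 1
    return template[start:].replace("T", "U")


targeting = "ACGTACGTACGTACGTACGTACGTACGTACGTACG"  # hypothetical 35-nt example (> 30 bp)
grna = transcribe(build_template(targeting))
print(len(targeting), len(grna))
print(grna)
```

In this model the transcript consists of the initiating G, the targeting sequence, and the protein-binding sequence, in that order.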
  • In another aspect, the invention described herein provides a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the gRNA comprises an adenine, a guanine, and a cytosine. In some embodiments, the gRNA further comprises a thymine. In some embodiments, the gRNA further comprises a uracil. In some embodiments, the size of the first RNA segment is between 30 and 250 bp. In some embodiments, the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or viral genome. In some embodiments, the targeting sequence is directed at repetitive or abundant DNA. In some embodiments, the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA. In some embodiments, the first segment is at least 80% complementary to the target genomic sequence of interest. In some embodiments, the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the second segment comprises a gRNA stem-loop sequence. In some embodiments, the sequence of the second segment comprises GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or comprises the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the sequence of the third segment comprises a crRNA and a tracrRNA. In some embodiments, the nucleic acid-guided nuclease system protein is from a bacterial species. In some embodiments, the nucleic acid-guided nuclease system protein is from an archaea species. 
In some embodiments, the CRISPR/Cas system protein is a Type I, Type II, or Type III protein. In some embodiments, the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase. In some embodiments, the second segment is a Cas9-binding sequence.
  • In another aspect, the invention provides a complex comprising a nucleic acid-guided nuclease system protein and a guide RNA comprising a first segment comprising a targeting sequence, wherein the size of the first segment is greater than 30 bp; and a second segment comprising a nucleic acid-guided nuclease system protein-binding sequence.
  • In another aspect, the invention described herein provides a method for depleting and partitioning of targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collection of gRNAs provided herein; and (ii) nucleic acid-guided nuclease system proteins. In some embodiments, the nucleic acid-guided nuclease system proteins are CRISPR/Cas system proteins. In some embodiments, the CRISPR/Cas system proteins are Cas9 proteins.
  • In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing double-stranded DNA molecules, each comprising a sequence of interest 5′ to a PAM sequence, and its reverse complementary sequence on the opposite strand; (b) performing an enzymatic digestion reaction on the double stranded DNA molecules, wherein cleavages are generated at the PAM sequence and/or its reverse complementary sequence on the opposite strand, but never completely remove the PAM sequence and/or its reverse complementary sequence on the opposite strand from the double stranded DNA; (c) ligating adapters comprising a recognition sequence to the resulting DNA molecules of step b; (d) contacting the DNA molecules of step c with a restriction enzyme that recognizes the recognition sequence of step c, whereby generating DNA fragments comprising blunt-ended double strand breaks immediately 5′ to the PAM sequence, whereby removing the PAM sequence and the adapter containing the enzyme recognition site; and (e) ligating the resulting double stranded DNA fragments of step d with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas nucleic acid-guided nuclease system protein. In some embodiments, the starting DNA molecules of the collection further comprise a regulatory sequence upstream of the sequence of interest 5′ to the PAM sequence. In some embodiments, the regulatory sequence comprises a promoter. In some embodiments, the promoter comprises a T7, Sp6, or T3 sequence. 
In some embodiments, the double stranded DNA molecules are genomic DNA, intact DNA, or sheared DNA. In some embodiments, the genomic DNA is human, mouse, avian, fish, plant, insect, bacterial, or viral. In some embodiments, the DNA segments encoding a targeting sequence are at least 22 bp. In some embodiments, the DNA segments encoding a targeting sequence are 15-250 bp in size range. In some embodiments, the PAM sequence is AGG, CGG, or TGG. In some embodiments, the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, step (b) further comprises (1) contacting the DNA molecules with an enzyme capable of creating a nick in a single strand at a CCD site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest followed by an HGG sequence, wherein the DNA molecules are nicked at the CCD sites; and (2) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest followed by an HGG sequence wherein residual nucleotides from HGG and/or CCD sequences is (are) left behind. In some embodiments, step (d) further comprises PCR amplification of the adaptor-ligated DNA fragments from step (c) before cutting with the restriction enzyme recognizing the recognition sequence of step (c), wherein after PCR, the recognition sequence is positioned 3′ of the PAM sequence, and a regulatory sequence is positioned at the 5′ distal end of the PAM sequence. In some embodiments, the enzymatic reaction of step (b) comprises the use of a Nt.CviPII enzyme, and a T7 Endonuclease I enzyme. In some embodiments, step (c) further comprises a blunt-end reaction with a T4 DNA Polymerase, if the adapter to be ligated does not comprise an overhang. 
In some embodiments, the adapter of step (c) is either (1) double stranded, comprising a restriction enzyme recognition sequence in one strand, and a regulatory sequence in the other strand, if the adapter is Y-shaped and comprises an overhang; or (2) has a palindromic enzyme recognition sequence in both strands, if the adapter is not Y-shaped. In some embodiments, the restriction enzyme of step (d) is MlyI. In some embodiments, the restriction enzyme of step (d) is BaeI. In some embodiments, step (d) further comprises contacting the DNA molecules with an XhoI enzyme. In some embodiments, in step (e) the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the targeted sequences of interest are spaced every 10,000 bp or less across the genome of an organism.
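The spacing criterion in the last embodiment (targeted sequences spaced every 10,000 bp or less) can be checked for a given set of target positions. This is a minimal sketch under stated assumptions: a linear genome, positions given as 0-based start coordinates, and gaps measured to both genome ends. The function names and example coordinates are hypothetical.

```python
def max_gap(positions: list[int], genome_length: int) -> int:
    """Largest stretch (in bp) between consecutive target starts, including
    the stretches from the genome start and to the genome end."""
    points = [0] + sorted(positions) + [genome_length]
    return max(b - a for a, b in zip(points, points[1:]))


def meets_spacing(positions: list[int], genome_length: int, limit: int = 10_000) -> bool:
    """True if no stretch between targets exceeds `limit` bp."""
    return max_gap(positions, genome_length) <= limit


example_positions = [5_000, 12_000, 21_000]  # hypothetical target coordinates
print(max_gap(example_positions, 30_000))
print(meets_spacing(example_positions, 30_000))
```

For a circular genome, the two end gaps would need to be merged into one; that case is not handled here.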
  • In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing a plurality of double stranded DNA molecules, each comprising a sequence of interest, an NGG site, and its complement CCN site; (b) contacting the molecules with an enzyme capable of creating a nick in a single strand at a CCN site, whereby generating a plurality of nicked double stranded DNA molecules, each comprising a sequence of interest 5′ to the NGG site, wherein the DNA molecules are nicked at the CCD sites; (c) contacting the nicked double stranded DNA molecules with an endonuclease, whereby generating a plurality of double stranded DNA fragments, each comprising a sequence of interest, wherein the fragments comprise a terminal overhang; (d) contacting the double stranded DNA fragments with an enzyme without 5′ to 3′ exonuclease activity to blunt end the double stranded DNA fragments, whereby generating a plurality of blunt ended double stranded fragments, each comprising a sequence of interest; (e) contacting the blunt ended double stranded fragments of step d with an enzyme that cleaves the terminal NGG site; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system-protein binding sequence, whereby generating a plurality of DNA fragments, each comprising a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the plurality of double stranded DNA molecules have a regulatory sequence 5′ upstream of the NGG sites. In some embodiments, the regulatory sequence comprises a T7, SP6, or T3 sequence. 
In some embodiments, the NGG site comprises AGG, CGG, or TGG, and the CCN site comprises CCT, CCG, or CCA. In some embodiments, the plurality of double stranded DNA molecules, each comprising a sequence of interest comprise sheared fragments of genomic DNA. In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, avian, bacterial or viral. In some embodiments, the plurality of double stranded DNA molecules in step (a) are at least 500 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, the enzyme in step d is a T4 DNA Polymerase. In some embodiments, in step f the DNA encoding a nucleic acid-guided nuclease system-protein binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the step e additionally comprises ligating adaptors carrying a MlyI recognition site and digesting with MlyI enzyme. In some embodiments, the sequence of interest is spaced every 10,000 bp or less across the genome.
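The pairing of the listed NGG and CCN sites follows from reverse complementation: each CCN site is the reverse complement of the corresponding NGG site on the opposite strand. A short illustrative check, with a hypothetical helper name:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


NGG_SITES = ["AGG", "CGG", "TGG"]  # sites listed in the text
CCN_SITES = ["CCT", "CCG", "CCA"]  # their stated counterparts

# Map each NGG site to the sequence found on the opposite strand.
pairs = {site: reverse_complement(site) for site in NGG_SITES}
for ngg, ccn in pairs.items():
    print(f"{ngg} on one strand corresponds to {ccn} on the opposite strand")
```

The fourth possible NGG site, GGG, pairs with CCC, which is not among the CCN sites listed in this embodiment.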
  • In another aspect, the invention provides a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence and a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, comprising: (a) providing genomic DNA comprising a plurality of sequences of interest, comprising NGG and CCN sites; (b) contacting the genomic DNA with an enzyme capable of creating nicks in the genomic DNA, whereby generating nicked genomic DNA, nicked at CCN sites; (c) contacting the nicked genomic DNA with an endonuclease, whereby generating double stranded DNA fragments, with an overhang; (d) ligating the DNA with overhangs from step c to a Y-shaped adapter, thereby introducing a restriction enzyme recognition sequence only at 3′ of the NGG site and a regulatory sequence 5′ of the sequence of interest; (e) contacting the product from step d with an enzyme that cleaves away the NGG site together with the adaptor carrying the enzyme recognition sequence; and (f) ligating the resulting double stranded DNA fragments of step e with a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, whereby generating a plurality of DNA fragments, each comprising a sequence of interest ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence. In some embodiments, the nucleic acid-guided nuclease is a CRISPR/Cas system protein. In some embodiments, the NGG site comprises AGG, CGG, or TGG, and CCN site comprises CCT, CCG, or CCA. In some embodiments, the regulatory sequence comprises a promoter sequence. In some embodiments, the promoter sequence comprises a T7, SP6, or T3 sequence. In some embodiments, the DNA fragments are sheared fragments of genomic DNA.
  • In some embodiments, the genomic DNA is mammalian, prokaryotic, eukaryotic, or viral. In some embodiments, the fragments are at least 200 bp. In some embodiments, the enzyme in step b is a Nt.CviPII enzyme. In some embodiments, the enzyme in step c is a T7 Endonuclease I. In some embodiments, step d further comprises PCR amplification of the adaptor-ligated DNA. In some embodiments, in step f, the DNA encoding nucleic acid-guided nuclease system protein-binding sequence encodes for an RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for an RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2). In some embodiments, the enzyme removing NGG site in step e is MlyI. In some embodiments, the target of interest of the collection is spaced every 10,000 bp or less across the genome.
  • In another aspect, the invention provides kits and/or reagents useful for performing a method of making a collection of nucleic acids, each comprising a DNA encoding a targeting sequence ligated to a DNA encoding a nucleic acid-guided nuclease system protein-binding sequence, as described in the embodiments herein.
  • In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment encoding a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.
  • In another aspect, the invention described herein provides a kit comprising a collection of nucleic acids, a plurality of the nucleic acids in the collection comprising: a first segment comprising a regulatory region; a second segment encoding a targeting sequence, wherein the size of the second segment is greater than 21 bp; and a third segment encoding a CRISPR/Cas system protein-binding sequence.
  • In another aspect, the invention described herein provides a kit comprising a collection of guide RNAs (gRNAs), each comprising a first RNA segment comprising a targeting sequence; and a second RNA segment comprising a CRISPR/Cas system protein-binding sequence, wherein at least 10% of the gRNAs in the collection vary in size.
  • In another aspect, the invention described herein provides a method of making a collection of guide nucleic acids, comprising: a. obtaining abundant cells in a source sample; b. collecting nucleic acids from said abundant cells; and c. preparing a collection of guide nucleic acids (gNAs) from said nucleic acids. In some embodiments, said abundant cells comprise cells from one or more most abundant bacterial species in said source sample. In some embodiments, said abundant cells comprise cells from more than one species. In some embodiments, said abundant cells comprise human cells. In some embodiments, said abundant cells comprise animal cells. In some embodiments, said abundant cells comprise plant cells. In some embodiments, said abundant cells comprise bacterial cells. In some embodiments, the method further comprises contacting nucleic acid-guided nucleases with said library of gNAs to form nucleic acid-guided nuclease-gNA complexes. In some embodiments, the method further comprises using said nucleic acid-guided nuclease-gNA complexes to cleave target nucleic acids at target sites, wherein said gNAs are complementary to said target sites. In some embodiments, said target nucleic acids are from said source sample. In some embodiments, a species of said target nucleic acids is the same as a species of said source sample. In some embodiments, said species of said target nucleic acids and said species of said source sample is human. In some embodiments, said species of said target nucleic acids and said species of said source sample is animal. In some embodiments, said species of said target nucleic acids and said species of said source sample is plant.
  • In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing double-stranded breaks at proximal nicks; and c. repairing overhangs of said double-stranded breaks, thereby producing a double-stranded fragment comprising (i) a targeting sequence and (ii) said nicking enzyme recognition site. In another aspect, the invention described herein provides a method of making a collection of nucleic acids, each comprising a targeting sequence, comprising: a. obtaining source DNA; b. nicking said source DNA with a nicking enzyme at nicking enzyme recognition sites, thereby producing a nick; and c. synthesizing a new strand from said nick, thereby producing a single-stranded fragment of said source DNA comprising a targeting sequence. In some embodiments, the method further comprises producing a double-stranded fragment comprising said targeting sequence from said single-stranded fragment. In some embodiments, said producing said double-stranded fragment comprises random priming and extension. In some embodiments, said random priming is conducted with a primer comprising a random n-mer region and a promoter region. In some embodiments, said random n-mer region is a random hexamer region. In some embodiments, said random n-mer region is a random octamer region. In some embodiments, said promoter region is a T7 promoter region. In some embodiments, the method further comprises ligating a nuclease recognition site nucleic acid comprising a nuclease recognition site to said double-stranded fragment. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to the length of said nicking enzyme recognition sites. 
In some embodiments, said nuclease recognition site is a MlyI recognition site. In some embodiments, said nuclease recognition site is a BaeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease, thereby removing said nicking enzyme recognition site from said double-stranded fragment. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence. In some embodiments, said length of said targeting sequence is 20 base pairs. In some embodiments, said nuclease recognition site is a MmeI recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, said nuclease recognition site corresponds to a nuclease that cuts at a distance from said nuclease recognition site equal to a length of said targeting sequence plus a length of said nicking enzyme recognition sites. In some embodiments, said length of said targeting sequence plus a length of said nicking enzyme recognition sites is 23 base pairs. In some embodiments, said nuclease recognition site is a EcoP15I recognition site. In some embodiments, the method further comprises digesting said double-stranded fragment with said nuclease. In some embodiments, the method further comprises ligating said double-stranded fragment to a nucleic acid-guided nuclease system protein recognition site nucleic acid comprising a nucleic acid-guided nuclease system protein recognition site. 
In some embodiments, said nucleic acid-guided nuclease system protein recognition site comprises a guide RNA stem-loop sequence.
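  • The cut-distance arithmetic in the preceding embodiments can be sketched as follows. This is a simplified model under stated assumptions: the nuclease is treated as making a blunt cut a fixed number of bases downstream of its recognition site on a single top strand, the recognition-site string and example fragment are hypothetical placeholders rather than real enzyme sequences, and the 3 bp nicking-site length is taken from the 20 bp and 23 bp figures given above.

```python
def required_cut_distance(targeting_len: int, nicking_site_len: int = 0) -> int:
    """Distance from the recognition site at which the nuclease must cut
    so that the targeting sequence (plus, optionally, the nicking enzyme
    recognition site) is retained."""
    return targeting_len + nicking_site_len


def trim_downstream_of_site(top_strand: str, site: str, cut_distance: int):
    """Split `top_strand` at `cut_distance` bases past the end of `site`.

    Returns (kept, released). Blunt cutting on one strand is a
    simplification made for illustration.
    """
    start = top_strand.find(site)
    if start == -1:
        raise ValueError("recognition site not found")
    cut = start + len(site) + cut_distance
    if cut > len(top_strand):
        raise ValueError("cut position falls beyond the end of the fragment")
    return top_strand[:cut], top_strand[cut:]


# Hypothetical placeholder recognition site (not a real enzyme site).
PLACEHOLDER_SITE = "AACCAACC"
nicking_site = "CCA"      # 3 bp, illustrative
targeting = "T" * 20      # 20 bp targeting sequence, illustrative
construct = PLACEHOLDER_SITE + nicking_site + targeting + "GGGG"

distance = required_cut_distance(len(targeting), len(nicking_site))
kept, released = trim_downstream_of_site(construct, PLACEHOLDER_SITE, distance)
print(distance, kept, released)
```

With a 20 bp targeting sequence alone the required distance is 20 bp; adding a 3 bp nicking enzyme recognition site gives the 23 bp distance described above.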
  • In another aspect, the invention described herein provides a kit comprising all essential reagents and instructions for carrying out the methods of aspects of the invention described herein.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates an exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.
  • FIG. 2 illustrates another exemplary scheme for producing a collection of gRNAs (a gRNA library) from genomic DNA.
  • FIG. 3 illustrates an exemplary scheme for nicking of DNA and subsequent treatment with polymerase to generate blunt ends.
  • FIG. 4 illustrates an exemplary scheme for sequential production of a library of gNAs using three adapters.
  • FIG. 5 illustrates an exemplary scheme for sequential production of a library of gNAs using one adapter and one oligo.
  • FIG. 6 illustrates an exemplary scheme for generation of a large pool of DNA fragments with blunt ends using Nicking Enzyme Mediated DNA Amplification (NEMDA).
  • FIG. 7 illustrates an exemplary scheme for generation of a large pool of gNAs using Nicking Enzyme Mediated DNA Amplification (NEMDA).
  • DETAILED DESCRIPTION OF THE INVENTION
  • There is a need in the art for a scalable, low-cost approach to generate large numbers of diverse guide nucleic acids (gNAs) (e.g., gRNAs, gDNAs) for a variety of downstream applications.
  • Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described.
  • Numeric ranges are inclusive of the numbers defining the range.
  • For purposes of interpreting this specification, the following definitions will apply and whenever appropriate, terms used in the singular will also include the plural and vice versa. In the event that any definition set forth below conflicts with any document incorporated herein by reference, the definition set forth shall control.
  • As used herein, the singular form “a”, “an”, and “the” includes plural references unless indicated otherwise.
  • It is understood that aspects and embodiments of the invention described herein include “comprising,” “consisting,” and “consisting essentially of” aspects and embodiments.
  • The term “about” as used herein refers to the usual error range for the respective value readily known to the skilled person in this technical field. Reference to “about” a value or parameter herein includes (and describes) embodiments that are directed to that value or parameter per se.
  • The term “nucleic acid,” as used herein, refers to a molecule comprising one or more nucleic acid subunits. A nucleic acid can include one or more subunits selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), and modified versions of the same. A nucleic acid comprises deoxyribonucleic acid (DNA), ribonucleic acid (RNA), combinations, or derivatives thereof. A nucleic acid may be single-stranded and/or double-stranded.
  • The nucleic acids comprise “nucleotides”, which, as used herein, is intended to include those moieties that contain purine and pyrimidine bases, and modified versions of the same. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the term “nucleotide” or “polynucleotide” includes those moieties that contain hapten or fluorescent labels and may contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides, nucleotides or polynucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.
  • The term “nucleic acids” and “polynucleotides” are used interchangeably herein. Polynucleotide is used to describe a nucleic acid polymer of any length, e.g., greater than about 2 bases, greater than about 10 bases, greater than about 100 bases, greater than about 500 bases, greater than 1000 bases, up to about 10,000 or more bases composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, and may be produced enzymatically or synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Naturally-occurring nucleotides include guanine, cytosine, adenine and thymine (G, C, A and T, respectively). DNA and RNA have a deoxyribose and ribose sugar backbones, respectively, whereas PNA's backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. In PNA various purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. A locked nucleic acid (LNA), often referred to as inaccessible RNA, is a modified RNA nucleotide. The ribose moiety of an LNA nucleotide is modified with an extra bridge connecting the 2′ oxygen and 4′ carbon. The bridge “locks” the ribose in the 3′-endo (North) conformation, which is often found in the A-form duplexes. LNA nucleotides can be mixed with DNA or RNA residues in the oligonucleotide whenever desired. The term “unstructured nucleic acid,” or “UNA” is a nucleic acid containing non-natural nucleotides that bind to each other with reduced stability. 
For example, an unstructured nucleic acid may contain a G′ residue and a C′ residue, where these residues correspond to non-naturally occurring forms, i.e., analogs, of G and C that base pair with each other with reduced stability, but retain an ability to base pair with naturally occurring C and G residues, respectively. Unstructured nucleic acid is described in US20050233340, which is incorporated by reference herein for disclosure of UNA.
  • The term “oligonucleotide” as used herein denotes a single-stranded multimer of nucleotides.
  • Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • The term “cleaving,” as used herein, refers to a reaction that breaks the phosphodiester bonds between two adjacent nucleotides in both strands of a double-stranded DNA molecule, thereby resulting in a double-stranded break in the DNA molecule.
  • The term “nicking” as used herein, refers to a reaction that breaks the phosphodiester bond between two adjacent nucleotides in only one strand of a double-stranded DNA molecule, thereby resulting in a break in one strand of the DNA molecule.
  • The term “cleavage site,” as used herein, refers to the site at which a double-stranded DNA molecule has been cleaved.
  • The “nucleic acid-guided nuclease-gNA complex” refers to a complex comprising a nucleic acid-guided nuclease protein and a guide nucleic acid (gNA, for example a gRNA or a gDNA). For example the “Cas9-gRNA complex” refers to a complex comprising a Cas9 protein and a guide RNA (gRNA). The nucleic acid-guided nuclease may be any type of nucleic acid-guided nuclease, including but not limited to wild type nucleic acid-guided nuclease, a catalytically dead nucleic acid-guided nuclease, or a nucleic acid-guided nuclease-nickase.
  • The term “nucleic acid-guided nuclease-associated guide NA” refers to a guide nucleic acid (guide NA). The nucleic acid-guided nuclease-associated guide NA may exist as an isolated nucleic acid, or as part of a nucleic acid-guided nuclease-gNA complex, for example a Cas9-gRNA complex.
  • The terms “capture” and “enrichment” are used interchangeably herein, and refer to the process of selectively isolating a nucleic acid region containing: sequences of interest, targeted sites of interest, sequences not of interest, or targeted sites not of interest.
  • The term “hybridization” refers to the process by which a strand of nucleic acid joins with a complementary strand through base pairing as known in the art. A nucleic acid is considered to be “selectively hybridizable” to a reference nucleic acid sequence if the two sequences specifically hybridize to one another under moderate to high stringency hybridization and wash conditions. Moderate and high stringency hybridization conditions are known (see, e.g., Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons 1995 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Edition, 2001 Cold Spring Harbor, N.Y.). One example of high stringency conditions includes hybridization at about 42° C. in 50% formamide, 5×SSC, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml denatured carrier DNA followed by washing two times in 2×SSC and 0.5% SDS at room temperature and two additional times in 0.1×SSC and 0.5% SDS at 42° C.
  • The term “duplex,” or “duplexed,” as used herein, describes two complementary polynucleotides that are base-paired, i.e., hybridized together.
  • The term “amplifying” as used herein refers to generating one or more copies of a target nucleic acid, using the target nucleic acid as a template.
  • The term “genomic region,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example.
  • The term “genomic sequence,” as used herein, refers to a sequence that occurs in a genome. Because RNAs are transcribed from a genome, this term encompasses sequences that exist in the nuclear genome of an organism, as well as sequences that are present in a cDNA copy of an RNA (e.g., an mRNA) transcribed from such a genome.
  • The term “genomic fragment,” as used herein, refers to a region of a genome, e.g., an animal or plant genome such as the genome of a human, monkey, rat, fish or insect or plant. A genomic fragment may be an entire chromosome, or a fragment of a chromosome. A genomic fragment may be adapter ligated (in which case it has an adapter ligated to one or both ends of the fragment, or to at least the 5′ end of a molecule), or may not be adapter ligated.
  • In certain cases, an oligonucleotide used in the method described herein may be designed using a reference genomic region, i.e., a genomic region of known nucleotide sequence, e.g., a chromosomal region whose sequence is deposited at NCBI's Genbank database or other databases, for example. Such an oligonucleotide may be employed in an assay that uses a sample containing a test genome, where the test genome contains a binding site for the oligonucleotide.
  • The term “ligating,” as used herein, refers to the enzymatically catalyzed joining of the terminal nucleotide at the 5′ end of a first DNA molecule to the terminal nucleotide at the 3′ end of a second DNA molecule.
  • If two nucleic acids are “complementary,” each base of one of the nucleic acids base pairs with corresponding nucleotides in the other nucleic acid. The term “complementary” and “perfectly complementary” are used synonymously herein.
  • The term “separating,” as used herein, refers to physical separation of two elements (e.g., by size or affinity, etc.) as well as degradation of one element, leaving the other intact. For example, size exclusion can be employed to separate nucleic acids, including cleaved targeted sequences.
  • In a cell, DNA usually exists in a double-stranded form, and as such, has two complementary strands of nucleic acid referred to herein as the “top” and “bottom” strands. In certain cases, complementary strands of a chromosomal region may be referred to as “plus” and “minus” strands, the “first” and “second” strands, the “coding” and “noncoding” strands, the “Watson” and “Crick” strands or the “sense” and “antisense” strands. The assignment of a strand as being a top or bottom strand is arbitrary and does not imply any particular orientation, function or structure. Until they become covalently linked, the first and second strands are distinct molecules. For ease of description, the “top” and “bottom” strands of a double-stranded nucleic acid in which the top and bottom strands have been covalently linked will still be described as the “top” and “bottom” strands. In other words, for the purposes of this disclosure, the top and bottom strands of a double-stranded DNA do not need to be separated molecules. The nucleotide sequences of the first strand of several exemplary mammalian chromosomal regions (e.g., BACs, assemblies, chromosomes, etc.) is known, and may be found in NCBI's Genbank database, for example.
  • The term “top strand,” as used herein, refers to either strand of a nucleic acid but not both strands of a nucleic acid. When an oligonucleotide or a primer binds or anneals “only to a top strand,” it binds to only one strand but not the other. The term “bottom strand,” as used herein, refers to the strand that is complementary to the “top strand.” When an oligonucleotide binds or anneals “only to one strand,” it binds to only one strand, e.g., the first or second strand, but not the other strand. If an oligonucleotide binds or anneals to both strands of a double-stranded DNA, the oligonucleotide may have two regions, a first region that hybridizes with the top strand of the double-stranded DNA, and a second region that hybridizes with the bottom strand of the double-stranded DNA.
  • The term “double-stranded DNA molecule” refers to both double-stranded DNA molecules in which the top and bottom strands are not covalently linked, as well as double-stranded DNA molecules in which the top and bottom strands are covalently linked. The top and bottom strands of a double-stranded DNA are base paired with one another by Watson-Crick interactions.
  • The term “denaturing,” as used herein, refers to the separation of at least a portion of the base pairs of a nucleic acid duplex by placing the duplex in suitable denaturing conditions. Denaturing conditions are well known in the art. In one embodiment, in order to denature a nucleic acid duplex, the duplex may be exposed to a temperature that is above the Tm of the duplex, thereby releasing one strand of the duplex from the other. In certain embodiments, a nucleic acid may be denatured by exposing it to a temperature of at least 90° C. for a suitable amount of time (e.g., at least 30 seconds, up to 30 mins). In certain embodiments, fully denaturing conditions may be used to completely separate the base pairs of the duplex. In other embodiments, partially denaturing conditions (e.g., with a lower temperature than fully denaturing conditions) may be used to separate the base pairs of certain parts of the duplex (e.g., regions enriched for A-T base pairs may separate while regions enriched for G-C base pairs may remain paired). Nucleic acid may also be denatured chemically (e.g., using urea or NaOH).
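  • The dependence of duplex stability on base composition noted above (A-T-rich regions separating before G-C-rich regions) can be illustrated with the Wallace rule, a rough rule of thumb for short oligonucleotides that assigns about 2° C. per A or T and about 4° C. per G or C. The rule is not part of this disclosure and is used here only as an assumption for illustration; the function name is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (degrees C) of a short oligo
    using the Wallace rule: Tm = 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# An A-T-rich stretch is estimated to melt at a lower temperature
# than a G-C-rich stretch of the same length.
print(wallace_tm("ATATATAT"))
print(wallace_tm("GCGCGCGC"))
```

The estimate is only meaningful for short oligonucleotides; longer duplexes require nearest-neighbor or salt-adjusted models.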
  • The term “genotyping” as used herein, refers to any type of analysis of a nucleic acid sequence, and includes sequencing, polymorphism (SNP) analysis, and analysis to identify rearrangements.
  • The term “sequencing,” as used herein, refers to a method by which the identity of consecutive nucleotides of a polynucleotide is obtained.
  • The term “next-generation sequencing” refers to the so-called parallelized sequencing-by-synthesis or sequencing-by-ligation platforms, for example, those currently employed by Illumina, Life Technologies, and Roche, etc. Next-generation sequencing methods may also include nanopore sequencing methods or electronic-detection based methods such as Ion Torrent technology commercialized by Life Technologies.
  • The term “complementary DNA” or cDNA refers to a double-stranded DNA sample that was produced from an RNA sample by reverse transcription of RNA (using primers such as random hexamers or oligo-dT primers) followed by second-strand synthesis by digestion of the RNA with RNaseH and synthesis by DNA polymerase.
  • The term “RNA promoter adapter” is an adapter that contains a promoter for a bacteriophage RNA polymerase, e.g., the RNA polymerase from bacteriophage T3, T7, SP6 or the like.
  • Other definitions of terms may appear throughout the specification.
  • For any of the structural and functional characteristics described herein, methods of determining these characteristics are known in the art.
  • Guide Nucleic Acids (gNAs)
  • Provided herein are guide nucleic acids (gNAs) derivable from any nucleic acid source. The gNAs can be guide RNAs (gRNAs) or guide DNAs (gDNAs). The nucleic acid source can be DNA or RNA. Provided herein are methods to generate gNAs from any source nucleic acid, including DNA from a single organism, or mixtures of DNA from multiple organisms, or mixtures of DNA from multiple species, or DNA from clinical samples, or DNA from forensic samples, or DNA from environmental samples, or DNA from metagenomic DNA samples (for example a sample that contains more than one species of organism). Examples of any source DNA include, but are not limited to any genome, any genome fragment, cDNA, synthetic DNA, or a DNA collection (e.g. a SNP collection, DNA libraries). The gNAs provided herein can be used for genome-wide applications.
  • In some embodiments, the gNAs are derived from genomic sequences (e.g., genomic DNA). In some embodiments, the gNAs are derived from mammalian genomic sequences. In some embodiments, the gNAs are derived from eukaryotic genomic sequences. In some embodiments, the gNAs are derived from prokaryotic genomic sequences. In some embodiments, the gNAs are derived from viral genomic sequences. In some embodiments, the gNAs are derived from bacterial genomic sequences. In some embodiments, the gNAs are derived from plant genomic sequences. In some embodiments, the gNAs are derived from microbial genomic sequences. In some embodiments, the gNAs are derived from genomic sequences from a parasite, for example a eukaryotic parasite.
  • In some embodiments, the gNAs are derived from repetitive DNA. In some embodiments, the gNAs are derived from abundant DNA. In some embodiments, the gNAs are derived from mitochondrial DNA. In some embodiments, the gNAs are derived from ribosomal DNA. In some embodiments, the gNAs are derived from centromeric DNA. In some embodiments, the gNAs are derived from DNA comprising Alu elements (Alu DNA). In some embodiments, the gNAs are derived from DNA comprising long interspersed nuclear elements (LINE DNA). In some embodiments, the gNAs are derived from DNA comprising short interspersed nuclear elements (SINE DNA). In some embodiments the abundant DNA comprises ribosomal DNA. In some embodiments, the abundant DNA comprises host DNA (e.g., host genomic DNA or all host DNA). In an example, the gNAs can be derived from host DNA (e.g., human, animal, plant) for the depletion of host DNA to allow for easier analysis of other DNA that is present (e.g., bacterial, viral, or other metagenomic DNA). In another example, the gNAs can be derived from the one or more most abundant types (e.g., species) in a mixed sample, such as the one or more most abundant bacteria species in a metagenomic sample. The one or more most abundant types (e.g., species) can comprise the two, three, four, five, six, seven, eight, nine, ten, or more than ten most abundant types (e.g., species). The most abundant types can be the most abundant kingdoms, phyla or divisions, classes, orders, families, genuses, species, or other classifications. The most abundant types can be the most abundant cell types, such as epithelial cells, bone cells, muscle cells, blood cells, adipose cells, or other cell types. The most abundant types can be non-cancerous cells. The most abundant types can be cancerous cells. The most abundant types can be animal, human, plant, fungal, bacterial, or viral. 
gNAs can be derived from both a host and the one or more most abundant non-host types (e.g., species) in a sample, such as from both human DNA and the DNA of the one or more most abundant bacterial species. In some embodiments, the abundant DNA comprises DNA from the more abundant or most abundant cells in a sample. For example, for a specific sample, the highly abundant cells can be extracted and their DNA can be used to produce gNAs; these gNAs can be used to produce a depletion library that is applied to the original sample to enable or enhance sequencing or detection of low-abundance targets.
  • In some embodiments, the gNAs are derived from DNA comprising short tandem repeats (STRs).
  • In some embodiments, the gNAs are derived from a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is a RNA genome.
  • In some embodiments, the gNAs are derived from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacteria or virus; from an animal parasite; from a pathogen.
  • In some embodiments, the gNAs are derived from any mammalian organism. In one embodiment, the mammal is a human. In another embodiment, the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, the mammal is a domestic pet, for example a cat, a dog, a gerbil, a mouse, or a rat. In another embodiment, the mammal is a type of monkey.
  • In some embodiments, the gNAs are derived from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.
  • In some embodiments, the gNAs are derived from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.
  • In some embodiments, the gNAs are derived from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.
  • In some embodiments, the gNAs are derived from a virus.
  • In some embodiments, the gNAs are derived from a species of fungi.
  • In some embodiments, the gNAs are derived from a species of algae.
  • In some embodiments, the gNAs are derived from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.
  • In some embodiments, the gNAs are derived from a nucleic acid target. Contemplated targets include, but are not limited to, pathogens; single nucleotide polymorphisms (SNPs), insertions, deletions, tandem repeats, or translocations; human SNPs or STRs; potential toxins; or animals, fungi, and plants. In some embodiments, the gNAs are derived from pathogens, and are pathogen-specific gNAs.
  • In some embodiments, a guide NA of the invention comprises a first NA segment comprising a targeting sequence, wherein the targeting sequence is 15-250 bp; and a second NA segment comprising a nucleic acid guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. In some embodiments, the targeting sequence is greater than 21 bp, greater than 22 bp, greater than 23 bp, greater than 24 bp, greater than 25 bp, greater than 26 bp, greater than 27 bp, greater than 28 bp, greater than 29 bp, greater than 30 bp, greater than 40 bp, greater than 50 bp, greater than 60 bp, greater than 70 bp, greater than 80 bp, greater than 90 bp, greater than 100 bp, greater than 110 bp, greater than 120 bp, greater than 130 bp, greater than 140 bp, or even greater than 150 bp. In an exemplary embodiment, the targeting sequence is greater than 30 bp. In some embodiments, the targeting sequences of the present invention range in size from 30-50 bp. In some embodiments, targeting sequences of the present invention range in size from 30-75 bp. In some embodiments, targeting sequences of the present invention range in size from 30-100 bp. For example, a targeting sequence can be at least 15 bp, 20 bp, 25 bp, 30 bp, 35 bp, 40 bp, 45 bp, 50 bp, 55 bp, 60 bp, 65 bp, 70 bp, 75 bp, 80 bp, 85 bp, 90 bp, 95 bp, 100 bp, 110 bp, 120 bp, 130 bp, 140 bp, 150 bp, 160 bp, 170 bp, 180 bp, 190 bp, 200 bp, 210 bp, 220 bp, 230 bp, 240 bp, or 250 bp. In specific embodiments, the targeting sequence is at least 22 bp. In specific embodiments, the targeting sequence is at least 30 bp.
  • In some embodiments, target-specific gNAs can comprise a nucleic acid sequence that is complementary to a region on the opposite strand of the targeted nucleic acid sequence 5′ to a PAM sequence, which can be recognized by a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein. In some embodiments the targeted nucleic acid sequence is immediately 5′ to a PAM sequence. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 15-250 bp. In specific embodiments, the nucleic acid sequence of the gNA that is complementary to a region in a target nucleic acid is 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or 100 bp.
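  • The relationship between a targeted sequence and its adjacent PAM can be sketched as a simple scan of one strand for sites lying immediately 5′ to a PAM. This is a minimal illustration, assuming an NGG PAM (any base followed by GG), a single strand written 5′ to 3′, and a fixed targeting length supplied by the caller; the function name and example sequence are hypothetical.

```python
def find_pam_adjacent_targets(seq: str, target_len: int = 20):
    """Return (start, target) pairs for every stretch of `target_len`
    bases that lies immediately 5' to an NGG PAM on the given strand."""
    seq = seq.upper()
    hits = []
    # pam_start is the index of the 'N' in NGG; the two bases after it
    # must both be G, and a full-length target must fit before it.
    for pam_start in range(target_len, len(seq) - 2):
        if seq[pam_start + 1 : pam_start + 3] == "GG":
            start = pam_start - target_len
            hits.append((start, seq[start:pam_start]))
    return hits


example = "ACGT" * 5 + "TGG"  # 20 bp followed by a TGG PAM
print(find_pam_adjacent_targets(example, target_len=20))
```

Changing `target_len` allows the same scan to be used for the longer targeting sequences described above (e.g., 22 bp or 30 bp); scanning the reverse complement of the input would cover the other strand.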
  • In some particular embodiments, the targeting sequence is not 20 bp. In some particular embodiments, the targeting sequence is not 21 bp.
  • In some embodiments, the gNAs comprise any purines or pyrimidines (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, uracil, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, and cytosine (and/or modified versions of the same). In some embodiments, the gNAs comprise adenine, thymine, guanine, cytosine and uracil (and/or modified versions of the same).
  • In some embodiments, the gNAs comprise a label, are attached to a label, or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.
  • In some embodiments, the gNAs are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.
  • Nucleic Acids Encoding gNAs
  • Also provided herein are nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA (e.g., gRNA). In some embodiments, by encoding, it is meant that a gNA results from the reverse transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the reverse transcription of a gNA. In some embodiments, by encoding, it is meant that a gNA results from the amplification of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the amplification of a gNA.
  • In some embodiments the nucleic acid encoding for a gNA comprises a first segment comprising a regulatory region; a second segment comprising targeting sequence, wherein the second segment can range from 15 bp-250 bp; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence.
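  • The three-segment arrangement described above can be sketched as a small data structure. This is a minimal sketch that assumes the segments are joined in the order listed; the class name `GuideTemplate`, its field names, and the placeholder sequences in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideTemplate:
    """Hypothetical model of a nucleic acid encoding a gNA."""
    regulatory_region: str   # first segment, e.g. a promoter
    targeting_sequence: str  # second segment, 15-250 bp
    binding_sequence: str    # third segment, encodes the protein-binding sequence

    def __post_init__(self):
        n = len(self.targeting_sequence)
        if not 15 <= n <= 250:
            raise ValueError(f"targeting sequence is {n} bp; expected 15-250 bp")

    def sequence(self):
        # Assumption: segments are concatenated in the order listed.
        return (self.regulatory_region
                + self.targeting_sequence
                + self.binding_sequence)
```

  • For example, `GuideTemplate("PROMOTER", "ACGT" * 6, "SCAFFOLD").sequence()` joins a 24 bp targeting sequence between two placeholder strings; real promoter and protein-binding sequences would be substituted in practice.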
  • In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.
  • In some embodiments, the nucleic acids encoding for gNAs comprise RNA.
  • In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.
  • In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
  • Collections of gNAs
  • Provided herein are collections (interchangeably referred to as libraries) of gNAs.
  • As used herein, a collection of gNAs denotes a mixture of gNAs containing at least 10² unique gNAs. In some embodiments a collection of gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique gNAs. In some embodiments a collection of gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ gNAs.
  • In some embodiments, a collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the gNAs in the collection vary in size. In some embodiments, the first and second segments are in 5′- to 3′-order.
  • In some embodiments, the size of the first segment varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 21 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 25 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are greater than 30 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 15-50 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the first segments in the collection are 30-100 bp.
  • In some particular embodiments, the size of the first segment is not 20 bp.
  • In some particular embodiments, the size of the first segment is not 21 bp.
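  • The size criteria above are stated as fractions of the first segments in a collection. A minimal sketch of how such fractions might be computed for a list of targeting sequences follows; the function names are hypothetical, and the toy collection in the usage note is invented for illustration rather than taken from any experiment.

```python
def fraction_longer_than(targeting_sequences, threshold_bp):
    """Fraction of targeting sequences strictly longer than threshold_bp."""
    if not targeting_sequences:
        raise ValueError("collection is empty")
    longer = sum(len(s) > threshold_bp for s in targeting_sequences)
    return longer / len(targeting_sequences)


def fraction_in_range(targeting_sequences, low_bp, high_bp):
    """Fraction of targeting sequences whose length is low_bp..high_bp inclusive."""
    if not targeting_sequences:
        raise ValueError("collection is empty")
    inside = sum(low_bp <= len(s) <= high_bp for s in targeting_sequences)
    return inside / len(targeting_sequences)
```

  • For a toy collection with first segments of 20, 22, 30, and 45 bp, `fraction_longer_than(collection, 21)` evaluates to 0.75 and `fraction_in_range(collection, 30, 100)` to 0.5.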
  • In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.
  • In some embodiments, the 3′ end of the gNA targeting sequence can be any purine or pyrimidine (and/or modified versions of the same). In some embodiments, the 3′ end of the gNA targeting sequence is an adenine. In some embodiments, the 3′ end of the gNA targeting sequence is a guanine. In some embodiments, the 3′ end of the gNA targeting sequence is a cytosine. In some embodiments, the 3′ end of the gNA targeting sequence is a uracil. In some embodiments, the 3′ end of the gNA targeting sequence is a thymine. In some embodiments, the 3′ end of the gNA targeting sequence is not cytosine.
  • In some embodiments, the collection of gNAs comprises targeting sequences which can base-pair with the targeted DNA, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.
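  • One way to check a spacing criterion of this kind is sketched below. It assumes, as an interpretation that the text does not spell out, that "spaced at least every N bp" means no stretch of the genome longer than N bp lacks a target site. The function names and the coordinate convention (0-based start positions on a linear genome) are hypothetical.

```python
def max_gap(positions, genome_length):
    """Largest distance between consecutive target sites, counting the
    stretches before the first site and after the last one."""
    if not positions:
        return genome_length
    points = sorted(set(positions))
    edges = [0] + points + [genome_length]
    return max(b - a for a, b in zip(edges, edges[1:]))


def spaced_at_least_every(positions, genome_length, n_bp):
    """True if no gap between target sites exceeds n_bp (assumed reading)."""
    return max_gap(positions, genome_length) <= n_bp
```

  • With target sites at positions 10, 60, and 150 of a 200 bp sequence, the largest gap is 90 bp, so the collection would satisfy a 100 bp spacing under this reading but not a 50 bp spacing.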
  • In some embodiments, the collection of gNAs comprises a first NA segment comprising a targeting sequence; and a second NA segment comprising a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the gNAs in the collection can have a variety of second NA segments with various specificities for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example a collection of gNAs as provided herein, can comprise members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose second segment comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments a collection of gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • In some embodiments, a plurality of the gNA members of the collection are attached to a label, comprise a label or are capable of being labeled. In some embodiments, the gNA comprises a moiety that is further capable of being attached to a label. A label includes, but is not limited to, an enzyme, an enzyme substrate, an antibody, an antigen binding fragment, a peptide, a chromophore, a lumiphore, a fluorophore, a chromogen, a hapten, an antigen, a radioactive isotope, a magnetic particle, a metal nanoparticle, a redox active marker group (capable of undergoing a redox reaction), an aptamer, one member of a binding pair, a member of a FRET pair (either a donor or acceptor fluorophore), and combinations thereof.
  • In some embodiments, a plurality of the gNA members of the collection are attached to a substrate. The substrate can be made of glass, plastic, silicon, silica-based materials, functionalized polystyrene, functionalized polyethyleneglycol, functionalized organic polymers, nitrocellulose or nylon membranes, paper, cotton, and materials suitable for synthesis. Substrates need not be flat. In some embodiments, the substrate is a 2-dimensional array. In some embodiments, the 2-dimensional array is flat. In some embodiments, the 2-dimensional array is not flat, for example, the array is a wave-like array. Substrates include any type of shape including spherical shapes (e.g., beads). Materials attached to substrates may be attached to any portion of the substrates (e.g., may be attached to an interior portion of a porous substrate material). In some embodiments, the substrate is a 3-dimensional array, for example, a microsphere. In some embodiments, the microsphere is magnetic. In some embodiments, the microsphere is glass. In some embodiments, the microsphere is made of polystyrene. In some embodiments, the microsphere is silica-based. In some embodiments, the substrate is an array with an interior surface, for example, a straw, tube, capillary, cylinder, or microfluidic chamber array. In some embodiments, the substrate comprises multiple straws, capillaries, tubes, cylinders, or chambers.
  • Collections of Nucleic Acids Encoding gNAs
  • Provided herein are collections (interchangeably referred to as libraries) of nucleic acids encoding for gNAs (e.g., gRNAs or gDNAs). In some embodiments, by encoding it is meant that a gNA results from the transcription of a nucleic acid encoding for a gNA. In some embodiments, by encoding, it is meant that the nucleic acid is a template for the transcription of a gNA.
  • As used herein, a collection of nucleic acids encoding for gNAs denotes a mixture of nucleic acids containing at least 10² unique nucleic acids. In some embodiments a collection of nucleic acids encoding for gNAs contains at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ unique nucleic acids encoding for gNAs. In some embodiments a collection of nucleic acids encoding for gNAs contains a total of at least 10², at least 10³, at least 10⁴, at least 10⁵, at least 10⁶, at least 10⁷, at least 10⁸, at least 10⁹, or at least 10¹⁰ nucleic acids encoding for gNAs.
  • In some embodiments, a collection of nucleic acids encoding for gNAs comprises a first segment comprising a regulatory region; a second segment comprising a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein at least 10% of the nucleic acids in the collection vary in size.
  • In some embodiments, the first, second, and third segments are in 5′- to 3′-order.
  • In some embodiments, the nucleic acids encoding for gNAs comprise DNA. In some embodiments, the first segment is single stranded DNA. In some embodiments, the first segment is double stranded DNA. In some embodiments, the second segment is single stranded DNA. In some embodiments, the third segment is single stranded DNA. In some embodiments, the second segment is double stranded DNA. In some embodiments, the third segment is double stranded DNA.
  • In some embodiments, the nucleic acids encoding for gNAs comprise RNA.
  • In some embodiments the nucleic acids encoding for gNAs comprise DNA and RNA.
  • In some embodiments, the regulatory region is a region capable of binding a transcription factor. In some embodiments, the regulatory region comprises a promoter. In some embodiments, the promoter is selected from the group consisting of T7, SP6, and T3.
  • In some embodiments, the size of the second segments (targeting sequence) in the collection varies from 15-250 bp, or 30-100 bp, or 22-30 bp, or 15-50 bp, or 15-75 bp, or 15-100 bp, or 15-125 bp, or 15-150 bp, or 15-175 bp, or 15-200 bp, or 15-225 bp, or 15-250 bp, or 22-50 bp, or 22-75 bp, or 22-100 bp, or 22-125 bp, or 22-150 bp, or 22-175 bp, or 22-200 bp, or 22-225 bp, or 22-250 bp across the collection of gNAs.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 21 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 25 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are greater than 30 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 15-50 bp.
  • In some embodiments, at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75%, or at least 80%, or at least 85%, or at least 90%, or at least 95%, or 100% of the second segments in the collection are 30-100 bp.
  • In some particular embodiments, the size of the second segment is not 20 bp.
  • In some particular embodiments, the size of the second segment is not 21 bp.
  • In some embodiments, the gNAs and/or the targeting sequences of the gNAs in the collection of gNAs comprise unique 5′ ends. In some embodiments, the collection of gNAs exhibits variability in the sequence of the 5′ end of the targeting sequence across the members of the collection. In some embodiments, the collection of gNAs exhibits at least 5%, or at least 10%, or at least 15%, or at least 20%, or at least 25%, or at least 30%, or at least 35%, or at least 40%, or at least 45%, or at least 50%, or at least 55%, or at least 60%, or at least 65%, or at least 70%, or at least 75% variability in the sequence of the 5′ end of the targeting sequence across the members of the collection.
  • In some embodiments, the collection of nucleic acids comprises targeting sequences, wherein the target of interest is spaced at least every 1 bp, at least every 2 bp, at least every 3 bp, at least every 4 bp, at least every 5 bp, at least every 6 bp, at least every 7 bp, at least every 8 bp, at least every 9 bp, at least every 10 bp, at least every 11 bp, at least every 12 bp, at least every 13 bp, at least every 14 bp, at least every 15 bp, at least every 16 bp, at least every 17 bp, at least every 18 bp, at least every 19 bp, at least every 20 bp, at least every 25 bp, at least every 30 bp, at least every 40 bp, at least every 50 bp, at least every 100 bp, at least every 200 bp, at least every 300 bp, at least every 400 bp, at least every 500 bp, at least every 600 bp, at least every 700 bp, at least every 800 bp, at least every 900 bp, at least every 1000 bp, at least every 2500 bp, at least every 5000 bp, at least every 10,000 bp, at least every 15,000 bp, at least every 20,000 bp, at least every 25,000 bp, at least every 50,000 bp, at least every 100,000 bp, at least every 250,000 bp, at least every 500,000 bp, at least every 750,000 bp, or even at least every 1,000,000 bp across a genome of interest.
  • In some embodiments, the collection of nucleic acids encoding for gNAs comprises a third segment encoding for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence, wherein the third segments in the collection vary in their specificity for protein members of the nucleic acid-guided nuclease system (e.g., CRISPR/Cas system). For example, a collection of nucleic acids encoding for gNAs as provided herein can comprise members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a first nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein; and also comprises members whose third segment encodes for a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence specific for a second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein, wherein the first and second nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins are not the same. In some embodiments, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity to at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or even at least 20 nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins. In one specific embodiment, a collection of nucleic acids encoding for gNAs as provided herein comprises members that exhibit specificity for a Cas9 protein and another protein selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
  • Sequences of Interest
  • Provided herein are gNAs and collections of gNAs, derived from any source DNA (for example from genomic DNA, cDNA, artificial DNA, DNA libraries), that can be used to target sequences of interest in a sample for a variety of applications including, but not limited to, enrichment, depletion, capture, partitioning, labeling, regulation, and editing. The gNAs comprise a targeting sequence, directed at sequences of interest.
  • In some embodiments, the sequences of interest are genomic sequences (genomic DNA). In some embodiments, the sequences of interest are mammalian genomic sequences. In some embodiments, the sequences of interest are eukaryotic genomic sequences. In some embodiments, the sequences of interest are prokaryotic genomic sequences. In some embodiments, the sequences of interest are viral genomic sequences. In some embodiments, the sequences of interest are bacterial genomic sequences. In some embodiments, the sequences of interest are plant genomic sequences. In some embodiments, the sequences of interest are microbial genomic sequences. In some embodiments, the sequences of interest are genomic sequences from a parasite, for example a eukaryotic parasite. In some embodiments, the sequences of interest are host genomic sequences (e.g., the host organism of a microbiome, a parasite, or a pathogen). In some embodiments, the sequences of interest are abundant genomic sequences, such as sequences from the genome or genomes of the most abundant species in a sample.
  • In some embodiments, the sequences of interest comprise repetitive DNA. In some embodiments, the sequences of interest comprise abundant DNA. In some embodiments, the sequences of interest comprise mitochondrial DNA. In some embodiments, the sequences of interest comprise ribosomal DNA. In some embodiments, the sequences of interest comprise centromeric DNA. In some embodiments, the sequences of interest comprise DNA comprising Alu elements (Alu DNA). In some embodiments, the sequences of interest comprise long interspersed nuclear elements (LINE DNA). In some embodiments, the sequences of interest comprise short interspersed nuclear elements (SINE DNA). In some embodiments, the abundant DNA comprises ribosomal DNA.
  • In some embodiments, the sequences of interest comprise single nucleotide polymorphisms (SNPs), short tandem repeats (STRs), cancer genes, inserts, deletions, structural variations, exons, genetic mutations, or regulatory regions.
  • In some embodiments, the sequences of interest can be a genomic fragment, comprising a region of the genome, or the whole genome itself. In one embodiment, the genome is a DNA genome. In another embodiment, the genome is a RNA genome.
  • In some embodiments, the sequences of interest are from a eukaryotic or prokaryotic organism; from a mammalian organism or a non-mammalian organism; from an animal or a plant; from a bacterium or a virus; from an animal parasite; or from a pathogen.
  • In some embodiments, the sequences of interest are from any mammalian organism. In one embodiment the mammal is a human. In another embodiment the mammal is a livestock animal, for example a horse, a sheep, a cow, a pig, or a donkey. In another embodiment, a mammalian organism is a domestic pet, for example a cat, a dog, a gerbil, a mouse, a rat. In another embodiment the mammal is a type of a monkey.
  • In some embodiments, the sequences of interest are from any bird or avian organism. An avian organism includes but is not limited to chicken, turkey, duck and goose.
  • In some embodiments, the sequences of interest are from a plant. In one embodiment, the plant is rice, maize, wheat, rose, grape, coffee, fruit, tomato, potato, or cotton.
  • In some embodiments, the sequences of interest are from a species of bacteria. In one embodiment, the bacteria are tuberculosis-causing bacteria.
  • In some embodiments, the sequences of interest are from a virus.
  • In some embodiments, the sequences of interest are from a species of fungi.
  • In some embodiments, the sequences of interest are from a species of algae.
  • In some embodiments, the sequences of interest are from any mammalian parasite.
  • In some embodiments, the sequences of interest are obtained from any mammalian parasite. In one embodiment, the parasite is a worm. In another embodiment, the parasite is a malaria-causing parasite. In another embodiment, the parasite is a Leishmaniasis-causing parasite. In another embodiment, the parasite is an amoeba.
  • In some embodiments, the sequences of interest are from a pathogen.
  • Targeting Sequences
  • As used herein, a targeting sequence is one that directs the gNA to the sequences of interest in a sample. For example, a targeting sequence targets a particular sequence of interest, for example the targeting sequence targets a genomic sequence of interest.
  • Provided herein are gNAs and collections of gNAs that comprise a segment that comprises a targeting sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding for a targeting sequence.
  • In some embodiments, the targeting sequence comprises DNA.
  • In some embodiments, the targeting sequence comprises RNA.
  • In some embodiments, the targeting sequence comprises RNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest, except that the RNA comprises uracils instead of thymines. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • In some embodiments, the targeting sequence comprises DNA, and shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to a sequence 5′ to a PAM sequence on a sequence of interest.
  • In some embodiments, the targeting sequence comprises RNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • In some embodiments, the targeting sequence comprises DNA and is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the targeting sequence is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • In some embodiments, a DNA encoding for a targeting sequence of a gRNA shares at least 70% sequence identity, at least 75% sequence identity, at least 80% sequence identity, at least 85% sequence identity, at least 90% sequence identity, at least 95% sequence identity, or shares 100% sequence identity to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
  • In some embodiments, a DNA encoding for a targeting sequence of a gRNA is complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence and is at least 70% complementary, at least 75% complementary, at least 80% complementary, at least 85% complementary, at least 90% complementary, at least 95% complementary, or is 100% complementary to a sequence 5′ to a PAM sequence on a sequence of interest. In some embodiments, the PAM sequence is AGG, CGG, or TGG.
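  • The identity and complementarity thresholds above can be illustrated with a short calculation. The sketch below assumes ungapped, equal-length comparison and treats uracil as equivalent to thymine, consistent with the statement that an RNA targeting sequence comprises uracils instead of thymines. The function names are hypothetical, and a real analysis would more likely use an alignment tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _as_dna(seq):
    # Normalize case and treat U as T so RNA and DNA can be compared.
    return seq.upper().replace("U", "T")


def percent_identity(targeting, reference):
    """Ungapped percent identity between two equal-length sequences."""
    a, b = _as_dna(targeting), _as_dna(reference)
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def reverse_complement(seq):
    """Reverse complement, returned as DNA read 5' to 3'."""
    return _as_dna(seq).translate(_COMPLEMENT)[::-1]


def percent_complementarity(targeting, opposite_strand):
    """Percent of positions at which the targeting sequence base-pairs
    with the opposite strand (both written 5' to 3')."""
    return percent_identity(targeting, reverse_complement(opposite_strand))
```

  • Under these assumptions, a 20-nt RNA targeting sequence that differs at one position from the 20 bp sequence 5′ to the PAM has 95% identity to that sequence, and the same percentage of complementarity to the opposite strand.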
  • Nucleic Acid-Guided Nuclease System Proteins
  • Provided herein are gNAs and collections of gNAs comprising a segment that comprises a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. Also provided herein, are nucleic acids encoding for gNAs, and collections of nucleic acids encoding for gNAs that comprise a segment encoding a nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) protein-binding sequence. A nucleic acid-guided nuclease system can be an RNA-guided nuclease system. A nucleic acid-guided nuclease system can be a DNA-guided nuclease system.
  • Methods of the present disclosure can utilize nucleic acid-guided nucleases. As used herein, a “nucleic acid-guided nuclease” is any nuclease that cleaves DNA, RNA or DNA/RNA hybrids, and which uses one or more guide nucleic acids (gNAs) to confer specificity. Nucleic acid-guided nucleases include CRISPR/Cas system proteins as well as non-CRISPR/Cas system proteins.
  • The nucleic acid-guided nucleases provided herein can be DNA guided DNA nucleases; DNA guided RNA nucleases; RNA guided DNA nucleases; or RNA guided RNA nucleases. The nucleases can be endonucleases. The nucleases can be exonucleases. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-DNA endonuclease. In one embodiment, the nucleic acid-guided nuclease is a nucleic acid-guided-RNA endonuclease.
  • A nucleic acid-guided nuclease system protein-binding sequence is a nucleic acid sequence that binds any protein member of a nucleic acid-guided nuclease system. For example, a CRISPR/Cas system protein-binding sequence is a nucleic acid sequence that binds any protein member of a CRISPR/Cas system.
  • In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of CAS Class I Type I, CAS Class I Type III, CAS Class I Type IV, CAS Class II Type II, and CAS Class II Type V. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems. In some embodiments, the nucleic acid-guided nuclease is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and NgAgo.
  • In some embodiments, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) can be from any bacterial or archaeal species.
  • In some embodiments, the nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) are from, or are derived from, nucleic acid-guided nuclease system proteins (e.g., CRISPR/Cas system proteins) from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
  • In some embodiments, examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins can be naturally occurring or engineered versions.
  • In some embodiments, naturally occurring nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cmr5. Engineered versions of such proteins can also be employed.
  • In some embodiments, engineered examples of nucleic acid-guided nuclease system (e.g., CRISPR/Cas system) proteins include catalytically dead nucleic acid-guided nuclease system proteins. The term “catalytically dead” generally refers to a nucleic acid-guided nuclease system protein that has inactivated nucleases (e.g., HNH and RuvC nucleases). Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the target nucleic acid (e.g., double-stranded DNA). In some embodiments, the nucleic acid-guided nuclease system catalytically dead protein is a catalytically dead CRISPR/Cas system protein, such as catalytically dead Cas9 (dCas9). Accordingly, the dCas9 allows separation of a mixture into unbound nucleic acids and dCas9-bound fragments. In one embodiment, a dCas9/gRNA complex binds to targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed. In another embodiment, the dCas9 can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site. Naturally occurring catalytically dead nucleic acid-guided nuclease system proteins can also be employed.
  • In some embodiments, engineered examples of nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins also include nucleic acid-guided nickases (e.g., Cas nickases). A nucleic acid-guided nickase refers to a modified version of a nucleic acid-guided nuclease system protein, containing a single inactive catalytic domain. In one embodiment, the nucleic acid-guided nickase is a Cas nickase, such as Cas9 nickase. A Cas9 nickase may contain a single inactive catalytic domain, for example, either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nickases bound to 2 gNAs that target opposite strands will create a double-strand break in a target double-stranded DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA (e.g., Cas9/gRNA) complexes be specifically bound at a site before a double-strand break is formed. Naturally occurring nickase nucleic acid-guided nuclease system proteins can also be employed.
  • In some embodiments, engineered examples of nucleic acid-guided nuclease system proteins also include nucleic acid-guided nuclease system fusion proteins. For example, a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein may be fused to another protein, for example an activator, a repressor, a nuclease, a fluorescent molecule, a radioactive tag, or a transposase.
  • In some embodiments, the nucleic acid-guided nuclease system protein-binding sequence comprises a gNA (e.g., gRNA) stem-loop sequence.
  • In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4).
  • In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
  • In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
  • In some embodiments, a double-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6).
  • In some embodiments, a single-stranded DNA sequence encoding the gNA (e.g., gRNA) stem-loop sequence comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template.
  • In some embodiments, the gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2).
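  • The relationship between the coding strand, the template strand, and the transcribed stem-loop can be checked with a short sketch. The sequences are SEQ ID NOs: 3, 4 and 1 as listed above, with line-wrap spaces treated as artifacts; the helper names are hypothetical and the code only illustrates base-pairing rules, not any laboratory procedure.

```python
# SEQ ID NO: 3 (coding strand), SEQ ID NO: 4 (template strand), SEQ ID NO: 1 (RNA),
# each written 5'>3' and split here only to keep the lines short.
SEQ_ID_3 = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
            "AGTCGGTGCTTTTTTT")
SEQ_ID_4 = ("AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTAT"
            "TTCTAGCTCTAAAAC")
SEQ_ID_1 = ("GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCA"
            "CCGAGUCGGUGCUUUUUUU")

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5'>3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def transcribe_template(template_dna):
    """RNA (5'>3') copied from a template strand that is itself written 5'>3'."""
    # The transcript has the sequence of the coding strand, with U in place of T.
    return reverse_complement(template_dna).replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement(SEQ_ID_3) == SEQ_ID_4)
    print(transcribe_template(SEQ_ID_4) == SEQ_ID_1)
```

  • The same two checks can be applied to SEQ ID NOs: 5, 6 and 2 to spot transcription or typesetting errors in a listed sequence.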
  • In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a single transcribed component, which upon transcription yields an NA (e.g., RNA) stem-loop sequence. In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the resulting gNA (e.g., gRNA) stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1).
In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is double-stranded, and comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In some embodiments, the third segment comprising a single transcribed component that encodes for the gNA (e.g., gRNA) stem-loop sequence is single-stranded, and comprises the following DNA sequence: (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, upon transcription from the single transcribed component, the yielded gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the third segment comprises two sub-segments, which encode for a crRNA and a tracrRNA upon transcription. In some embodiments, the crRNA does not comprise the N20 plus the extra sequence which can hybridize with tracrRNA. In some embodiments, the crRNA comprises the extra sequence which can hybridize with tracrRNA. In some embodiments, the two sub-segments are independently transcribed. In some embodiments, the two sub-segments are transcribed as a single unit. In some embodiments, the DNA encoding the crRNA comprises NtargetGTTTTAGAGCTATGCTGTTTTG (SEQ ID NO: 7), where Ntarget represents the targeting sequence. In some embodiments, the DNA encoding the tracrRNA comprises the sequence GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT (SEQ ID NO: 8).
  • In some embodiments, provided herein is a nucleic acid encoding for a gNA (e.g., gRNA) comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment comprises a DNA sequence, which upon transcription yields a gRNA stem-loop sequence capable of binding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein. In one embodiment, the DNA sequence can be double-stranded. In some embodiments, the third segment double-stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 3), and its reverse-complementary DNA on the other strand (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4). In some embodiments, the third segment double-stranded DNA comprises the following DNA sequence on one strand (5′>3′, GTTTTAGAGCTATGCTGGAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTC) (SEQ ID NO: 5), and its reverse-complementary DNA on the other strand (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6). In one embodiment, the DNA sequence can be single-stranded. In some embodiments, the third segment single-stranded DNA comprises the following DNA sequence (5′>3′, AAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4), wherein the single-stranded DNA serves as a transcription template.
In some embodiments, the third segment single-stranded DNA comprises the following DNA sequence (5′>3′, GAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATGCTGTTTCCAGCATAGCTCTAAAAC) (SEQ ID NO: 6), wherein the single-stranded DNA serves as a transcription template. In some embodiments, the third segment comprises a DNA sequence which, upon transcription, yields a first RNA sequence that is capable of forming a hybrid with a second RNA sequence, and which hybrid is capable of CRISPR/Cas system protein binding. In some embodiments, the third segment is double-stranded DNA comprising the DNA sequence on one strand: (5′>3′, GTTTTAGAGCTATGCTGTTTTG) (SEQ ID NO: 9) and its reverse-complementary DNA sequence on the other strand: (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the third segment is single-stranded DNA comprising the DNA sequence of (5′>3′, CAAAACAGCATAGCTCTAAAAC) (SEQ ID NO: 10). In some embodiments, the second segment and the third segment together encode for a crRNA sequence. In some embodiments, the second RNA sequence that is capable of forming a hybrid with the first RNA sequence encoded by the third segment of the nucleic acid encoding a gRNA is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence (5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 11). In some embodiments, the tracrRNA is encoded by a double-stranded DNA comprising the sequence of (5′>3′, GGAACCATTCAAAACAGCATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT) (SEQ ID NO: 8), optionally fused with a regulatory sequence at its 5′ end. In some embodiments, the regulatory sequence can be bound by a transcription factor. In some embodiments, the regulatory sequence is a promoter. In some embodiments, the regulatory sequence is a T7 promoter, comprising the sequence of (5′>3′, GCCTCGAGCTAATACGACTCACTATAGAG) (SEQ ID NO: 12).
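  • The three-segment layout described above can be sketched as a simple string assembly. The sketch assumes the segments are joined directly in the order regulatory region, targeting sequence, protein-binding sequence (5′>3′ on the coding strand), uses the T7 promoter of SEQ ID NO: 12 and the stem-loop of SEQ ID NO: 3 as defaults, and uses hypothetical function names; it does not model transcription start sites or any cloning step.

```python
T7_PROMOTER = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 12 (first segment)
STEM_LOOP_DNA = ("GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCG"
                 "AGTCGGTGCTTTTTTT")            # SEQ ID NO: 3 (third segment)


def build_gna_template(target, regulatory=T7_PROMOTER, binding=STEM_LOOP_DNA):
    """Join regulatory, targeting and protein-binding segments (coding strand, 5'>3')."""
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must be a non-empty string of A, C, G and T")
    return regulatory + target + binding


def split_gna_template(template, regulatory=T7_PROMOTER, binding=STEM_LOOP_DNA):
    """Recover the targeting sequence from a template built by build_gna_template."""
    if not (template.startswith(regulatory) and template.endswith(binding)):
        raise ValueError("template does not have the expected flanking segments")
    return template[len(regulatory):len(template) - len(binding)]


if __name__ == "__main__":
    example_target = "ACGTACGTACGTACGTACGT"  # arbitrary 20-nt placeholder, not from the text
    template = build_gna_template(example_target)
    print(len(template), split_gna_template(template))
```

  • Passing the crRNA-encoding sequence of SEQ ID NO: 9 as the `binding` argument gives the two-molecule (crRNA plus tracrRNA) variant instead of the single stem-loop variant.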
  • In some embodiments, provided herein is a nucleic acid encoding for a gNA comprising a first segment comprising a regulatory region; a second segment encoding a targeting sequence; and a third segment comprising a nucleic acid encoding a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the third segment encodes for an RNA sequence that, upon post-transcriptional cleavage, yields a first RNA segment and a second RNA segment. In some embodiments, the first RNA segment comprises a crRNA and the second RNA segment comprises a tracrRNA, which can form a hybrid and together provide for nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the third segment further comprises a spacer in between the transcriptional units for the first RNA segment and the second RNA segment, which spacer comprises an enzyme cleavage site.
  • In some embodiments, provided herein is a gNA (e.g., gRNA) comprising a first NA segment comprising a targeting sequence and a second NA segment comprising a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-binding sequence. In some embodiments, the size of the first segment is greater than 30 bp. In some embodiments, the second segment comprises a single segment, which comprises the gRNA stem-loop sequence. In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU) (SEQ ID NO: 1). In some embodiments, the gRNA stem-loop sequence comprises the following RNA sequence: (5′>3′, GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC) (SEQ ID NO: 2). In some embodiments, the second segment comprises two sub-segments: a first RNA sub-segment (crRNA) that forms a hybrid with a second RNA sub-segment (tracrRNA), which together act to direct nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein binding. In some embodiments, the sequence of the second sub-segment comprises GUUUUAGAGCUAUGCUGUUUUG. In some embodiments, the first RNA segment and the second RNA segment together form a crRNA sequence. In some embodiments, the other RNA that will form a hybrid with the second RNA segment is a tracrRNA. In some embodiments, the tracrRNA comprises the sequence of 5′>3′, GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 11).
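  • The statement that the crRNA-derived sequence can hybridize with the tracrRNA can be illustrated by searching for the longest stretch of perfect Watson-Crick pairing between the two RNAs. This is a simplified sketch: it ignores G-U wobble pairs, bulges, and thermodynamics, and the function names are hypothetical. The two sequences are taken from the paragraph above.

```python
CRRNA_REPEAT = "GUUUUAGAGCUAUGCUGUUUUG"
TRACRRNA = ("GGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUG"
            "GCACCGAGUCGGUGCUUUUUUU")  # SEQ ID NO: 11

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(rna):
    """Reverse complement of an RNA string written 5'>3'."""
    return rna.upper().translate(_RNA_COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Longest contiguous string present in both a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


def longest_paired_stretch(rna_a, rna_b):
    """Region of rna_b (5'>3') that pairs perfectly with a contiguous region of rna_a."""
    # A stretch of rna_b pairs with rna_a if it equals part of rna_a's reverse complement.
    return longest_common_substring(rna_reverse_complement(rna_a), rna_b)


if __name__ == "__main__":
    stretch = longest_paired_stretch(CRRNA_REPEAT, TRACRRNA)
    print(stretch, len(stretch))
```

  • With the sequences as listed, the sketch reports a 14-nucleotide tracrRNA stretch (CAAAACAGCAUAGC) that is perfectly complementary to the 3′ portion of the crRNA-derived sequence.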
  • CRISPR/Cas System Nucleic Acid-Guided Nucleases
  • In some embodiments, CRISPR/Cas system proteins are used in the embodiments provided herein. In some embodiments, CRISPR/Cas system proteins include proteins from CRISPR Type I systems, CRISPR Type II systems, and CRISPR Type III systems.
  • In some embodiments, CRISPR/Cas system proteins can be from any bacterial or archaeal species.
  • In some embodiments, the CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.
  • In some embodiments, the CRISPR/Cas system proteins are from, or are derived from, CRISPR/Cas system proteins from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
  • In some embodiments, examples of CRISPR/Cas system proteins can be naturally occurring or engineered versions.
  • In some embodiments, naturally occurring CRISPR/Cas system proteins can belong to CAS Class I Type I, III, or IV, or CAS Class II Type II or V, and can include Cas9, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cmr5, Csf1, C2c2, and Cpf1.
  • In an exemplary embodiment, the CRISPR/Cas system protein comprises Cas9.
  • A “CRISPR/Cas system protein-gNA complex” refers to a complex comprising a CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
  • A CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type CRISPR/Cas system protein. The CRISPR/Cas system protein may have all the functions of a wild type CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
  • The term “CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a CRISPR/Cas system protein-gNA complex.
  • Cas9
  • In some embodiments, the CRISPR/Cas System protein nucleic acid-guided nuclease is or comprises Cas9. The Cas9 of the present invention can be isolated, recombinantly produced, or synthetic.
  • Examples of Cas9 proteins that can be used in the embodiments herein can be found in F. A. Ran, L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang; “In vivo genome editing using Staphylococcus aureus Cas9” Nature 520, 186-191 (9 Apr. 2015) doi:10.1038/nature14299, which is incorporated herein by reference.
  • In some embodiments, the Cas9 is from a Type II CRISPR system derived from Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, or Corynebacterium diphtheriae.
  • In some embodiments, the Cas9 is from a Type II CRISPR system derived from S. pyogenes and the PAM sequence is NGG, located immediately 3′ of the target-specific guide sequence. The PAM sequences of Type II CRISPR systems from exemplary bacterial species can also include: Streptococcus pyogenes (NGG), Staphylococcus aureus (NNGRRT), Neisseria meningitidis (NNNNGATT), Streptococcus thermophilus (NNAGAA) and Treponema denticola (NAAAAC), which are all usable without deviating from the present invention.
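  • The species-specific PAM sequences above use degenerate nucleotide codes (N for any base, R for A or G). A minimal sketch of how such PAMs could be located in a DNA string, together with the sequence immediately 5′ of each PAM, is shown below. The 20-nucleotide guide length, the single-strand scan, and the function names are assumptions made for illustration; the space inside the N. meningitidis PAM in the source text is treated as a typesetting artifact.

```python
import re

# Degenerate (IUPAC) codes needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}

PAMS = {
    "Streptococcus pyogenes": "NGG",
    "Staphylococcus aureus": "NNGRRT",
    "Neisseria meningitidis": "NNNNGATT",
    "Streptococcus thermophilus": "NNAGAA",
    "Treponema denticola": "NAAAAC",
}


def pam_regex(pam):
    """Translate a degenerate PAM such as 'NNGRRT' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_sites(seq, pam, guide_len=20):
    """Return (sequence 5' of PAM, PAM) pairs found on the given strand only."""
    seq = seq.upper()
    # A lookahead lets overlapping PAM occurrences be reported as well.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        if start >= guide_len:  # skip PAMs with too little upstream sequence
            sites.append((seq[start - guide_len:start], match.group(1)))
    return sites


if __name__ == "__main__":
    demo = "ACGTACGTACGTACGTACGT" + "TGG" + "AAT"  # arbitrary demo sequence
    print(find_sites(demo, PAMS["Streptococcus pyogenes"]))
```

  • Scanning the reverse complement of the same input with the same function would cover PAM sites on the opposite strand.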
  • In one exemplary embodiment, a Cas9 sequence can be obtained, for example, from the pX330 plasmid (available from Addgene), re-amplified by PCR, and then cloned into pET30 (from EMD biosciences) to express in bacteria and purify the recombinant 6His-tagged protein.
  • A “Cas9-gNA complex” refers to a complex comprising a Cas9 protein and a guide NA. A Cas9 protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type Cas9 protein, e.g., to the Streptococcus pyogenes Cas9 protein. The Cas9 protein may have all the functions of a wild type Cas9 protein, or only one or some of the functions, including binding activity and nuclease activity.
  • The term “Cas9-associated guide NA” refers to a guide NA as described above. The Cas9-associated guide NA may exist isolated, or as part of a Cas9-gNA complex.
  • Non-CRISPR/Cas System Nucleic Acid-Guided Nucleases
  • In some embodiments, non-CRISPR/Cas system proteins are used in the embodiments provided herein.
  • In some embodiments, the non-CRISPR/Cas system proteins can be from any bacterial or archaeal species.
  • In some embodiments, the non-CRISPR/Cas system protein is isolated, recombinantly produced, or synthetic.
  • In some embodiments, the non-CRISPR/Cas system proteins are from, or are derived from, Aquifex aeolicus, Thermus thermophilus, Streptococcus pyogenes, Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus, Treponema denticola, Francisella tularensis, Pasteurella multocida, Campylobacter jejuni, Campylobacter lari, Mycoplasma gallisepticum, Nitratifractor salsuginis, Parvibaculum lavamentivorans, Roseburia intestinalis, Neisseria cinerea, Gluconacetobacter diazotrophicus, Azospirillum, Sphaerochaeta globus, Flavobacterium columnare, Fluviicola taffensis, Bacteroides coprophilus, Mycoplasma mobile, Lactobacillus farciminis, Streptococcus pasteurianus, Lactobacillus johnsonii, Staphylococcus pseudintermedius, Filifactor alocis, Legionella pneumophila, Sutterella wadsworthensis, Natronobacterium gregoryi, or Corynebacterium diphtheriae.
  • In some embodiments, the non-CRISPR/Cas system proteins can be naturally occurring or engineered versions.
  • In some embodiments, a naturally occurring non-CRISPR/Cas system protein is NgAgo (Argonaute from Natronobacterium gregoryi).
  • A “non-CRISPR/Cas system protein-gNA complex” refers to a complex comprising a non-CRISPR/Cas system protein and a guide NA (e.g. a gRNA or a gDNA). Where the gNA is a gRNA, the gRNA may be composed of two molecules, i.e., one RNA (“crRNA”) which hybridizes to a target and provides sequence specificity, and one RNA, the “tracrRNA”, which is capable of hybridizing to the crRNA. Alternatively, the guide RNA may be a single molecule (i.e., a gRNA) that contains crRNA and tracrRNA sequences.
  • A non-CRISPR/Cas system protein may be at least 60% identical (e.g., at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% identical) to a wild type non-CRISPR/Cas system protein. The non-CRISPR/Cas system protein may have all the functions of a wild type non-CRISPR/Cas system protein, or only one or some of the functions, including binding activity and nuclease activity.
  • The term “non-CRISPR/Cas system protein-associated guide NA” refers to a guide NA. The non-CRISPR/Cas system protein-associated guide NA may exist as isolated NA, or as part of a non-CRISPR/Cas system protein-gNA complex.
  • Catalytically Dead Nucleic Acid-Guided Nucleases
  • In some embodiments, engineered examples of nucleic acid-guided nucleases include catalytically dead nucleic acid-guided nucleases (CRISPR/Cas system nucleic acid-guided nucleases or non-CRISPR/Cas system nucleic acid-guided nucleases). The term “catalytically dead” generally refers to a nucleic acid-guided nuclease that has inactivated nucleases, for example inactivated HNH and RuvC nucleases. Such a protein can bind to a target site in any nucleic acid (where the target site is determined by the guide NA), but the protein is unable to cleave or nick the nucleic acid.
  • Accordingly, the catalytically dead nucleic acid-guided nuclease allows separation of a mixture into unbound nucleic acids and catalytically dead nucleic acid-guided nuclease-bound fragments. In one exemplary embodiment, a dCas9/gRNA complex binds to the targets determined by the gRNA sequence. The bound dCas9 can prevent cutting by Cas9 while other manipulations proceed.
  • In another embodiment, the catalytically dead nucleic acid-guided nuclease can be fused to another enzyme, such as a transposase, to target that enzyme's activity to a specific site.
  • In some embodiments, the catalytically dead nucleic acid-guided nuclease is dCas9, dCpf1, dCas3, dCas8a-c, dCas10, dCse1, dCsy1, dCsn2, dCas4, dCsm2, dCmr5, dCsf1, dC2C2, or dNgAgo.
  • In one exemplary embodiment the catalytically dead nucleic acid-guided nuclease protein is a dCas9.
  • Nucleic Acid-Guided Nuclease Nickases
  • In some embodiments, engineered examples of nucleic acid-guided nucleases include nucleic acid-guided nuclease nickases (referred to interchangeably as nickase nucleic acid-guided nucleases).
  • In some embodiments, engineered examples of nucleic acid-guided nucleases include CRISPR/Cas system nickases or non-CRISPR/Cas system nickases, containing a single inactive catalytic domain.
  • In some embodiments, the nucleic acid-guided nuclease nickase is a Cas9 nickase, Cpf1 nickase, Cas3 nickase, Cas8a-c nickase, Cas10 nickase, Cse1 nickase, Csy1 nickase, Csn2 nickase, Cas4 nickase, Csm2 nickase, Cmr5 nickase, Csf1 nickase, C2C2 nickase, or a NgAgo nickase.
  • In one embodiment, the nucleic acid-guided nuclease nickase is a Cas9 nickase.
  • In some embodiments, a nucleic acid-guided nuclease nickase can be used to bind to a target sequence. With only one active nuclease domain, the nucleic acid-guided nuclease nickase cuts only one strand of a target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide NA-hybridized strand or the non-hybridized strand may be cleaved. Nucleic acid-guided nuclease nickases bound to 2 gNAs that target opposite strands can create a double-strand break in the nucleic acid. This “dual nickase” strategy increases the specificity of cutting because it requires that both nucleic acid-guided nuclease/gNA complexes be specifically bound at a site before a double-strand break is formed.
  • In exemplary embodiments, a Cas9 nickase can be used to bind to a target sequence. The term “Cas9 nickase” refers to a modified version of the Cas9 protein, containing a single inactive catalytic domain, i.e., either the RuvC- or the HNH-domain. With only one active nuclease domain, the Cas9 nickase cuts only one strand of the target DNA, creating a single-strand break or “nick”. Depending on which mutant is used, the guide RNA-hybridized strand or the non-hybridized strand may be cleaved. Cas9 nickases bound to 2 gRNAs that target opposite strands will create a double-strand break in the DNA. This “dual nickase” strategy can increase the specificity of cutting because it requires that both Cas9/gRNA complexes be specifically bound at a site before a double-strand break is formed.
  • Capture of DNA can be carried out using a nucleic acid-guided nuclease nickase. In one exemplary embodiment, a nucleic acid-guided nuclease nickase cuts a single strand of double stranded nucleic acid, wherein the double stranded region comprises methylated nucleotides.
  • Dissociable and Thermostable Nucleic Acid-Guided Nucleases
  • In some embodiments, thermostable nucleic acid-guided nucleases are used in the methods provided herein (thermostable CRISPR/Cas system nucleic acid-guided nucleases or thermostable non-CRISPR/Cas system nucleic acid-guided nucleases). In such embodiments, the reaction temperature is elevated, inducing dissociation of the protein; the reaction temperature is then lowered, allowing for the generation of additional cleaved target sequences. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, at least 55% activity, at least 60% activity, at least 65% activity, at least 70% activity, at least 75% activity, at least 80% activity, at least 85% activity, at least 90% activity, at least 95% activity, at least 96% activity, at least 97% activity, at least 98% activity, at least 99% activity, or 100% activity, when maintained at least at 75° C. for at least 1 minute. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained for at least 1 minute at least at 75° C., at least at 80° C., at least at 85° C., at least at 90° C., at least at 91° C., at least at 92° C., at least at 93° C., at least at 94° C., at least at 95° C., at least at 96° C., at least at 97° C., at least at 98° C., at least at 99° C., or at least at 100° C. In some embodiments, thermostable nucleic acid-guided nucleases maintain at least 50% activity, when maintained at least at 75° C. for at least 1 minute, 2 minutes, 3 minutes, 4 minutes, or 5 minutes. In some embodiments, a thermostable nucleic acid-guided nuclease maintains at least 50% activity when the temperature is elevated and then lowered to 25° C.-50° C. In some embodiments, the temperature is lowered to 25° C., to 30° C., to 35° C., to 40° C., to 45° C., or to 50° C. In one exemplary embodiment, a thermostable enzyme retains at least 90% activity after 1 min at 95° C.
  • In some embodiments, the thermostable nucleic acid-guided nuclease is thermostable Cas9, thermostable Cpf1, thermostable Cas3, thermostable Cas8a-c, thermostable Cas10, thermostable Cse1, thermostable Csy1, thermostable Csn2, thermostable Cas4, thermostable Csm2, thermostable Cmr5, thermostable Csf1, thermostable C2C2, or thermostable NgAgo.
  • In some embodiments, the thermostable CRISPR/Cas system protein is thermostable Cas9.
  • Thermostable nucleic acid-guided nucleases can be isolated, for example, after being identified by sequence homology in the genomes of the thermophilic microorganisms Streptococcus thermophilus and Pyrococcus furiosus. Nucleic acid-guided nuclease genes can then be cloned into an expression vector. In one exemplary embodiment, a thermostable Cas9 protein is isolated.
  • In another embodiment, a thermostable nucleic acid-guided nuclease can be obtained by in vitro evolution of a non-thermostable nucleic acid-guided nuclease. The sequence of a nucleic acid-guided nuclease can be mutagenized to improve its thermostability.
  • Methods of Making Collections of gNAs
  • Provided herein are methods that enable the generation of a large number of diverse gNAs (e.g., gRNAs), that is, collections of gNAs, from any source nucleic acid (e.g., DNA). Methods provided herein can employ enzymatic methods including but not limited to digestion, ligation, extension, overhang filling, transcription, reverse transcription, and amplification.
  • Generally, the method can comprise providing a nucleic acid (e.g., DNA); employing a first enzyme (or combinations of first enzymes) that cuts at a part of the PAM sequence in the nucleic acid, in a way that a residual nucleotide sequence from the PAM sequence is left; ligating an adapter that positions a typeIIS restriction enzyme site (a site for an enzyme that cuts outside yet near its recognition motif) at a distance to eliminate the PAM sequence; employing a second typeIIS enzyme (or combination of second enzymes) to eliminate the PAM sequence together with the adapter; and fusing a sequence that can be recognized by protein members of the nucleic acid-guided nuclease (e.g., CRISPR/Cas) system, for example, a gRNA stem-loop sequence. In some embodiments, the first enzymatic reaction cuts part of the PAM sequence in a way that a residual nucleotide sequence from the PAM sequence is left, and that the nucleotide sequence immediately 5′ to the PAM sequence can be any purine or pyrimidine, not just those with a cytosine 5′ to the PAM sequence, for example, not just those that are C/NGG or C/TAG, etc.
  • Table 1 shows exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.
  • TABLE 1
    Exemplary strategies for preparing a collection of guide nucleic acids.
    CRISPR/Cas System Species | PAM Sequence | First Enzyme/Components | Strategy | 3′ Adapter sequence with typeIIS enzyme site (provided with only one strand sequence 5′→3′)
    Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I, blunt with T4 DNA polymerase; ligate to adapter; cut with MlyI to remove PAM and adapter; ligate gRNA stem-loop sequence at 3′ end | ggGACTCggatccctatagtc (SEQ ID NO: 4421)
    Staphylococcus aureus (SA); SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut, blunt with T4 DNA polymerase; ligate to adapter SA; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422)
    Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut, blunt with T4 DNA polymerase; ligate to adapter NM; cut with EcoRI to eliminate unwanted DNA and EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | TCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4428)
    Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut, blunt with T4 DNA polymerase; ligate to adapter ST; cut with EcoP15I to remove PAM and adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | ttacggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4429)
    Treponema denticola (TD) | NAAAAC | Cly7489II | Cut, blunt with T4 DNA polymerase; ligate to adapter TD; cut with EcoP15I to remove PAM and adapter | tttagcggccgcctgecgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4430)
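  • As a rough illustration of how the PAM sequences in Table 1 define candidate targets, the following Python sketch converts each PAM (written with IUPAC codes) into a regular expression and reports every position where the PAM is preceded by a 20 nt stretch on the supplied strand. The function names, the `guide_len` parameter, and the choice to scan a single strand are assumptions made for illustration; the disclosed methods act enzymatically on physical DNA, not on sequence strings.

```python
import re

# IUPAC codes needed for the PAM sequences listed in Table 1.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "W": "[AT]",
}

# PAM sequences copied from Table 1 (first listed form for each system).
PAMS = {
    "SP": "NGG",
    "SA": "NNGRRT",
    "NM": "NNNNGATT",
    "ST": "NNAGAAW",
    "TD": "NAAAAC",
}


def pam_regex(pam):
    """Translate an IUPAC-coded PAM into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq, pam, guide_len=20):
    """Yield (start, guide, pam_match) for each PAM on the given strand
    that has at least guide_len bases 5' of it. start is 0-based."""
    seq = seq.upper()
    # A lookahead is used so that overlapping PAM occurrences are all found.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    for match in pattern.finditer(seq):
        pam_start = match.start()
        if pam_start >= guide_len:
            guide = seq[pam_start - guide_len:pam_start]
            yield pam_start - guide_len, guide, match.group(1)


if __name__ == "__main__":
    # 23 nt example taken from the first row of Table 3.
    example = "ATCACCCTATTAACCACTCACGG"
    for hit in find_targets(example, PAMS["SP"]):
        print(hit)
```

    To cover both strands under the same assumptions, the scan could be repeated on the reverse complement of the input.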
  • Table 2 shows additional exemplary strategies/protocols to convert any source nucleic acid (e.g., DNA) into a collection of gNAs (e.g., gRNAs) using different restriction enzymes.
  • TABLE 2
    Additional exemplary strategies for preparing a collection of guide nucleic acids.
    CRISPR/Cas System Species | PAM Sequence | First Enzyme/Component | Exemplary Strategy | Adapter oligo sequence (with Inosine overhangs, all in 5′→3′ direction)
    Streptococcus pyogenes (SP); SpCas9 | NGG | CviPII | Nicks immediately 5′ of CCD sequence, nicks the other strand with T7 endonuclease I; ligate to adapter; cut with MlyI to remove PAM and 3′ adapter; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: ggggGACTCggatccctatagtgatacaaagacgatgacgacaagcg (SEQ ID NO: 4404); Adapter oligo 2: gcctcgagc*t*a*atacgactcactatagggatccaagtccc (* denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405)
    Staphylococcus aureus (SA), SaCas9 | NNGRRT or NNGRR(N) | AlwI | Cut; ligate to adapter SA; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: IttttagcggccgcctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4422); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcaggcggccgctaaaa (SEQ ID NO: 4423)
    Neisseria meningitidis (NM) | NNNNGATT | TfiI | Cut; ligate to adapter NM; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: attTCgcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4424); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcGA (SEQ ID NO: 4425)
    Streptococcus thermophilus (ST) | NNAGAAW | BsmI | Cut; ligate to adapter ST; cut with EcoP15I to remove PAM and 3′ adapter; blunt end; ligate gRNA stem-loop sequence at 3′ end | Adapter oligo 1: gcggccgcttttattctgctgCTCtacaaagacgatgacgacaagcgt (SEQ ID NO: 4426); Adapter oligo 2: gagatcagcttctgcattgatgcGAGcagcagaataaaagcggccgcIG (SEQ ID NO: 4427)
  • Exemplary applications of the compositions and methods described herein are provided in FIG. 1, FIG. 2, FIG. 3, FIG. 4, FIG. 5, FIG. 6, and FIG. 7. The figures depict non-limiting exemplary embodiments of the present invention that include a method of constructing a gNA library (e.g., gRNA library) from input nucleic acids (e.g., DNA), such as genomic DNA (e.g., human genomic DNA).
  • In FIG. 1, the starting material can be fragmented genomic DNA (e.g., human) or other source DNA. These fragments are blunt-ended before constructing the library 101. T7 promoter adapters are ligated to the blunt-ended DNA fragments 102, which are then PCR amplified. Nt.CviPII is then used to generate a nick on one strand of the PCR product immediately 5′ to the CCD sequence 103. T7 Endonuclease I cleaves on the opposite strand 1, 2, or 3 bp 5′ of the nick 104. The resulting DNA fragments are blunt-ended with T4 DNA Polymerase, leaving an HGG sequence at the end of the DNA fragment 105. The resulting DNA is cleaned and recovered on beads. An adapter carrying a MlyI recognition site is ligated to the blunt-ended DNA fragment immediately 3′ of the HGG sequence 106. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 107. The resulting DNA fragments are cleaned and recovered again on beads. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome 108. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
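  • Steps 106 and 107 of FIG. 1 can be mimicked on sequence strings. The Python sketch below appends the Table 1 adapter (SEQ ID NO: 4421) to a fragment ending in HGG and then trims at the adapter's MlyI site. The 5 bp cut offset is an assumption, chosen because the adapter places two bases ("gg") between the HGG end and the GACTC motif, so a cut 5 bp from the motif falls immediately 5′ of HGG as described; the offset is exposed as a parameter, and the function names are hypothetical.

```python
# The adapter from Table 1 (SEQ ID NO: 4421). On this strand the MlyI site
# appears as GACTC, the reverse complement of GAGTC.
ADAPTER = "ggGACTCggatccctatagtc"
MLYI_SITE_ON_TOP_STRAND = "GACTC"


def ligate(fragment, adapter=ADAPTER):
    """Join the adapter to the 3' end of a blunt-ended fragment (step 106)."""
    return fragment.upper() + adapter.upper()


def mlyi_trim(ligated, cut_offset=5):
    """Return the sequence left of the blunt cut (step 107).

    cut_offset is the assumed distance, in bp, between the cut and the
    start of the GACTC motif. Only the right-most motif (the one in the
    adapter) is considered; a real digest would also cut at any internal
    sites present in the source DNA.
    """
    site = ligated.rfind(MLYI_SITE_ON_TOP_STRAND)
    if site < 0:
        raise ValueError("no MlyI motif found")
    cut = site - cut_offset
    if cut < 0:
        raise ValueError("cut position falls outside the fragment")
    return ligated[:cut]


if __name__ == "__main__":
    n20 = "ATCACCCTATTAACCACTCA"   # 20 nt target from the first row of Table 3
    fragment = n20 + "CGG"         # fragment ending in HGG after blunting
    trimmed = mlyi_trim(ligate(fragment))
    print(trimmed == n20)
```

    With these assumptions the HGG and the adapter are removed together, leaving the 20 nt target with a blunt end to which the gRNA stem-loop sequence would be ligated (step 108).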
  • In FIG. 2, the starting material can be intact genomic DNA (e.g., human) or other source DNA 201. Nt.CviPII and T7 Endonuclease I are used to generate nicks on each strand of the human genomic DNA, resulting in smaller DNA fragments 202. DNA fragments of 200-600 bp are size selected on beads, then ligated with Y-shaped adapters carrying a GG overhang on the 5′ end. One strand of the Y-shaped adapter contains a MlyI recognition site, whereas the other strand contains a mutated MlyI site and a T7 promoter sequence 203. Because of these features, after PCR amplification, the T7 promoter sequence is at the distal end of the HGG sequence, and the MlyI sequence is at the rear end of HGG 204. Digestion with MlyI generates a cleavage immediately 5′ of the HGG sequence 205. MlyI generates a blunt-end cleavage immediately 5′ to the HGG sequence, removing HGG together with the adapter sequence 206. A gRNA stem-loop sequence is then ligated to the blunt end cleaved by MlyI, forming a gRNA library covering the human genome. This library of DNA is then PCR amplified and cleaned on beads, ready for in vitro transcription.
  • In FIG. 3, the source DNA (e.g., genomic DNA) can be nicked 301, for example with a nicking enzyme. In some cases, the nicking enzyme can have a recognition site that is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD (where D represents a base other than C). Nicks can be proximal, surrounding a region containing the sequence (represented by the thicker line) which will be used to yield the guide RNA N20 sequence. When nicks are proximal, a double stranded break can occur and lead to 5′ or 3′ overhangs 302. These overhangs can be repaired, for example with a polymerase (e.g., T4 polymerase). In some cases, such as with 5′ overhangs, repair can comprise synthesizing a complementary strand. In some cases, such as with 3′ overhangs, repair can comprise removing overhangs. Repair can result in a blunt end including the N20 guide sequence and a sequence complementary to the nick recognition sequence (e.g., HGG, where H represents a base other than G).
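  • To make the CCD/HGG relationship concrete, the following Python sketch lists candidate nick positions on both strands of a sequence and pairs nicks that lie close together. It assumes, following the description of FIG. 1, that the nick falls immediately 5′ of CCD on whichever strand carries the site; the `max_gap` threshold and all function names are hypothetical and only serve the illustration.

```python
import re


def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ccd_sites(seq):
    """0-based starts of CCD (D = A, G or T) on the given strand."""
    return [m.start() for m in re.finditer(r"(?=CC[AGT])", seq.upper())]


def hgg_sites(seq):
    """0-based starts of HGG (H = A, C or T) on the given strand.
    Each HGG marks a CCD site on the opposite strand."""
    return [m.start() for m in re.finditer(r"(?=[ACT]GG)", seq.upper())]


def nick_positions(seq):
    """Return (top, bottom) nick positions as boundaries in top-strand
    coordinates: a value p means the nick lies between bases p-1 and p."""
    top = ccd_sites(seq)                      # immediately 5' of CCD
    bottom = [p + 3 for p in hgg_sites(seq)]  # 5' of CCD on the other strand
    return top, bottom


def proximal_pairs(seq, max_gap=30):
    """Pairs of (top nick, bottom nick) no more than max_gap bases apart."""
    top, bottom = nick_positions(seq)
    return [(t, b) for t in top for b in bottom if abs(t - b) <= max_gap]


if __name__ == "__main__":
    example = "ATCACCCTATTAACCACTCACGG"  # first 23 nt entry of Table 3
    print(nick_positions(example))
    print(proximal_pairs(example, max_gap=10))
```

    A bottom-strand nick at boundary p + 3 sits directly after an HGG on the top strand, which is why blunting can leave HGG at the end of the fragment that carries the N20 sequence.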
  • In FIG. 4, continuing for example from the end of FIG. 3, different combinations of adapters can be ligated to the DNA to allow for the desired cleaving. Adapters with a recognition site for a nuclease enzyme that cuts 3 base pairs from the site (e.g., MlyI) can be ligated 401, and digestion at that site can be used to remove a left-over sequence, such as an HGG sequence 402. Adapters with a recognition site for a nuclease that cuts 20 base pairs from the site (e.g., MmeI) can then be ligated 403. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BsaXI). The first enzyme can be used to cut 20 nucleotides down, thereby keeping the N20 sequence 404. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 405. Then, the nuclease corresponding to the second recognition site (e.g., BsaXI) can be used to remove the adapter for the site that cuts 20 nucleotides away (e.g., MmeI) 406. Finally, the guide RNA stem-loop sequence adapter can be ligated to the N20 sequence 407 to prepare for guide RNA production.
  • Alternatively, the protocol shown in FIG. 5 can follow the end of a protocol such as that shown in FIG. 3. Adapters with a recognition site for a nuclease enzyme that cleaves 25 nucleotides from the site (e.g., EcoP15I) can be ligated to the DNA 501. These adapters can also include a second recognition site for a nuclease that cuts the proper number of nucleotides from the site to later remove the first recognition site (e.g., BaeI) and any other left-over sequence, such as HGG. The enzyme corresponding to the first recognition site (e.g., EcoP15I) can then be used to cleave after the N20 sequence 502. Then, a promoter adapter (e.g., T7) can be ligated next to the N20 sequence 503. The enzyme corresponding to the second recognition site (e.g., BaeI) can then be used to remove the recognition sites and any residual sequence (e.g., HGG) 504. Finally, the guide RNA stem-loop sequence adapter can be ligated (e.g., by single strand ligation) to the N20 sequence 505.
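  • The adapter geometry implied by FIG. 4 and FIG. 5 comes down to simple arithmetic: the distance an enzyme cuts from its recognition site has to equal the retained N20 length plus any residual bases (such as HGG) plus whatever spacer the adapter places in front of the site. The Python sketch below works this out under a simplified reading of the text that treats each cut as a single blunt position and ignores staggered ends; the function and parameter names are hypothetical.

```python
def spacer_length(cut_distance, retained_len=20, residual_len=0):
    """Number of spacer bases needed between the fragment end and the
    recognition site so that the cut lands at the far end of the
    retained sequence.

    cut_distance : nucleotides between the recognition site and the cut
    retained_len : length of the sequence to keep (N20 by default)
    residual_len : left-over bases between N20 and the adapter (e.g. 3 for HGG)
    """
    spacer = cut_distance - retained_len - residual_len
    if spacer < 0:
        raise ValueError("the enzyme cuts too close to its site for this layout")
    return spacer


if __name__ == "__main__":
    # FIG. 5 layout: a site cutting 25 nt away, with HGG still present.
    print(spacer_length(25, retained_len=20, residual_len=3))
    # FIG. 4 layout: a site cutting 20 nt away, after HGG has been removed.
    print(spacer_length(20, retained_len=20, residual_len=0))
```

    Under these assumptions the 25 nt case leaves room for a two-base spacer, while the 20 nt case requires the site to sit directly against the N20 sequence.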
  • As an alternative to protocols such as that shown in FIG. 3, the protocol shown in FIG. 6 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 601. In some cases, the nick recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand 602. Because of the DNA synthesis, the nick can be sealed and made available to be nicked again 603. Subsequent cycles of nicking and synthesis can be used to yield large amounts of target sequences 604. These single stranded copies of target sequences can be made double stranded, for example by random priming and extension. These double stranded nucleic acids comprising N20 sequences can then be further processed by methods disclosed herein, such as those shown in FIG. 4 or FIG. 5.
  • As another alternative to protocols such as that shown in FIG. 3 or FIG. 6, the protocol shown in FIG. 7 can be used in preparation for protocols such as those shown in FIG. 4 or FIG. 5. A nick can be introduced by a nicking enzyme (e.g., CviPII) 701. In some cases, the nicking enzyme recognition site is three or fewer bases in length. In some cases, CviPII is used, which can recognize and nick at a sequence of CCD. A polymerase (e.g., Bst large fragment DNA polymerase) can then be used to synthesize a new DNA strand starting from the nick while displacing the old strand (e.g., nicking endonuclease-mediated strand-displacement DNA amplification (NEMDA)). The reaction parameters can be adjusted to control the size of the single stranded DNA produced. For example, the nickase:polymerase ratio (e.g., CviPII:Bst large fragment polymerase ratio) can be adjusted. Reaction temperature can also be adjusted. Next, an oligonucleotide can be added 704 which has (in the 5′→3′ direction) a promoter (e.g., T7 promoter) 702 followed by a random n-mer (e.g., random 6-mer, random 8-mer) 703. The random n-mer region can bind to a region of the single stranded DNA generated previously. For example, binding can be conducted by denaturing at high temperature followed by rapid cool down, which can allow the random n-mer region to bind to the single stranded DNA generated by NEMDA. In some cases, the DNA is denatured at 98° C. for 7 minutes then cooled down rapidly to 10° C. Extension and/or amplification can be used to produce double-stranded DNA. Blunt ends can be produced, for example enzymatically (e.g., by treatment with DNA polymerase I at 20° C.). This can result in one end ending at the promoter (e.g., T7 promoter) and the other end ending at any nicking enzyme recognition sites (e.g., any CCD sites). These fragments can then be purified, for example by size selection (e.g., by gel purification, capillary electrophoresis, or other fragment separation techniques). 
In some cases, the target fragments are about 50 base pairs in length (adapter sequence (e.g., T7 adapter)+target N20 sequence+nicking enzyme recognition site or complement (e.g., HGG)). Fragments can then be ligated to an adapter comprising a nuclease recognition site for a nuclease that cuts an appropriate distance away to remove the nicking enzyme recognition site 705. For example, for a three-nucleotide long nicking enzyme recognition site (e.g., CCD for CviPII), BaeI can be used. The appropriate nuclease (e.g., BaeI) can then be used to remove the nuclease recognition site and the nicking enzyme recognition site 706. The remaining nucleic acid sequence (e.g., the N20 site) can then be ligated to the final stem-loop sequence for the guide RNA 707. Amplification (e.g., PCR) can be conducted. Guide RNAs can be produced.
  • In some embodiments, a collection of gNAs (e.g., gRNAs) targeting human mitochondrial DNA (mtDNA) is created that can be used for directing nucleic acid-guided nuclease (e.g., Cas9) proteins, the collection comprising the nucleic acid-guided nuclease (e.g., Cas9) target sequence. In some embodiments, the targeting sequences of this collection of gNAs (e.g., gRNAs) are encoded by DNA sequences comprising at least the 20 nt sequence provided in the second column from the right of Table 3 (if the NGG sequence is on the positive strand) and Table 4 (if the NGG sequence is on the negative strand). In some embodiments, a collection of gRNA nucleic acids, as provided herein, with specificity for human mitochondrial DNA, comprises a plurality of members, wherein the members comprise a plurality of targeting sequences provided in the second column from the right of Table 3 and/or the second column from the right of Table 4.
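  • The relationship between the columns of Table 3 can be expressed as a small consistency check. The Python sketch below assumes that the start and end positions are inclusive coordinates on the (+) strand, so that a 23 nt entry spans end − start = 22, and that the 20 nt target is the portion of that entry preceding NGG; the function names and the `first_position` parameter are hypothetical.

```python
def row_is_consistent(start, end, site_23nt, target_20nt):
    """Check one Table 3 style row: coordinates span 23 nt, the 23 nt
    entry ends in GG, and the 20 nt target is its first 20 bases."""
    site = site_23nt.upper()
    return (
        end - start == 22
        and len(site) == 23
        and site.endswith("GG")
        and site[:20] == target_20nt.upper()
    )


def ngg_rows(plus_strand, first_position=1):
    """Derive (start, end, 23 nt entry, 20 nt target) rows from a (+)
    strand sequence whose first base has coordinate first_position."""
    seq = plus_strand.upper()
    rows = []
    for i in range(len(seq) - 22):
        window = seq[i:i + 23]
        if window.endswith("GG"):
            start = i + first_position
            rows.append((start, start + 22, window, window[:20]))
    return rows


if __name__ == "__main__":
    # First row of Table 3.
    print(row_is_consistent(13, 35, "ATCACCCTATTAACCACTCACGG",
                            "ATCACCCTATTAACCACTCA"))
    # The 24 nt stretch covering the first two rows of Table 3.
    for row in ngg_rows("ATCACCCTATTAACCACTCACGGG", first_position=13):
        print(row)
```

    A check of this kind can flag transcription errors in coordinates or sequences, since each row carries the same information in more than one column.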
  • TABLE 3
    gRNA target sequence for human mtDNA carrying NGG sequence on the (+) strand.
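    Columns, in order: Chr start position (+ strand) | Chr end position (+ strand) | nt sequence on the (+) strand containing gRNA target sequence followed by NGG | SEQ ID NO: | 20 nt gRNA target sequence (will encode the gRNA targeting sequence) | SEQ ID NO:
    Each entry spans two lines; the second line carries the remainder of the two sequences.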
    13 35 ATCACCCTATTAACCAC 13 ATCACCCTATTAACCA 436
    TCACGG CTCA
    14 36 TCACCCTATTAACCACT 14 TCACCCTATTAACCAC 437
    CACGGG TCAC
    32 54 ACGGGAGCTCTCCATGC 15 ACGGGAGCTCTCCATG 438
    ATTTGG CATT
    45 67 ATGCATTTGGTATTTTC 16 ATGCATTTGGTATTTT 439
    GTCTGG CGTC
    46 68 TGCATTTGGTATTTTCGT 17 TGCATTTGGTATTTTC 440
    CTGGG GTCT
    47 69 GCATTTGGTATTTTCGT 18 GCATTTGGTATTTTCG 441
    CTGGGG TCTG
    48 70 CATTTGGTATTTTCGTCT 19 CATTTGGTATTTTCGTC 442
    GGGGG TGG
    49 71 ATTTGGTATTTTCGTCTG 20 ATTTGGTATTTTCGTCT 443
    GGGGG GGG
    79 101 GCGATAGCATTGCGAGA 21 GCGATAGCATTGCGAG 444
    CGCTGG ACGC
    85 107 GCATTGCGAGACGCTGG 22 GCATTGCGAGACGCTG 445
    AGCCGG GAGC
    163 185 GCACCTACGTTCAATAT 23 GCACCTACGTTCAATA 446
    TACAGG TTAC
    207 229 GTTAATTAATTAATGCT 24 GTTAATTAATTAATGC 447
    TGTAGG TTGT
    301 323 AACCCCCCCTCCCCCGC 25 AACCCCCCCTCCCCCG 448
    TTCTGG CTTC
    388 410 AGATTTCAAATTTTATC 26 AGATTTCAAATTTTAT 449
    TTTTGG CTTT
    391 413 TTTCAAATTTTATCTTTT 27 TTTCAAATTTTATCTTT 450
    GGCGG TGG
    604 626 ATACACTGAAAATGTTT 28 ATACACTGAAAATGTT 451
    AGACGG TAGA
    605 627 TACACTGAAAATGTTTA 29 TACACTGAAAATGTTT 452
    GACGGG AGAC
    631 653 ACATCACCCCATAAACA 30 ACATCACCCCATAAAC 453
    AATAGG AAAT
    636 658 ACCCCATAAACAAATAG 31 ACCCCATAAACAAATA 454
    GTTTGG GGTT
    727 749 TCTAAATCACCACGATC 32 TCTAAATCACCACGAT 455
    AAAAGG CAAA
    788 810 TTAGCCTAGCCACACCC 33 TTAGCCTAGCCACACC 456
    CCACGG CCCA
    789 811 TAGCCTAGCCACACCCC 34 TAGCCTAGCCACACCC 457
    CACGGG CCAC
    851 873 AACTAAGCTATACTAAC 35 AACTAAGCTATACTAA 458
    CCCAGG CCCC
    852 874 ACTAAGCTATACTAACC 36 ACTAAGCTATACTAAC 459
    CCAGGG CCCA
    856 878 AGCTATACTAACCCCAG 37 AGCTATACTAACCCCA 460
    GGTTGG GGGT
    880 902 CAATTTCGTGCCAGCCA 38 CAATTTCGTGCCAGCC 461
    CCGCGG ACCG
    912 934 TAACCCAAGTCAATAGA 39 TAACCCAAGTCAATAG 462
    AGCCGG AAGC
    1009 1031 CACAAAATAGACTACG 40 CACAAAATAGACTACG 463
    AAAGTGG AAAG
    1051 1073 ACAATAGCTAAGACCCA 41 ACAATAGCTAAGACCC 464
    AACTGG AAAC
    1052 1074 CAATAGCTAAGACCCAA 42 CAATAGCTAAGACCCA 465
    ACTGGG AACT
    1148 1170 AGCCACAGCTTAAAACT 43 AGCCACAGCTTAAAAC 466
    CAAAGG TCAA
    1154 1176 AGCTTAAAACTCAAAGG 44 AGCTTAAAACTCAAAG 467
    ACCTGG GACC
    1157 1179 TTAAAACTCAAAGGACC 45 TTAAAACTCAAAGGAC 468
    TGGCGG CTGG
    1178 1200 GGTGCTTCATATCCCTC 46 GGTGCTTCATATCCCT 469
    TAGAGG CTAG
    1267 1289 TCTTCAGCAAACCCTGA 47 TCTTCAGCAAACCCTG 470
    TGAAGG ATGA
    1306 1328 AGTACCCACGTAAAGAC 48 AGTACCCACGTAAAGA 471
    GTTAGG CGTT
    1312 1334 CACGTAAAGACGTTAGG 49 CACGTAAAGACGTTAG 472
    TCAAGG GTCA
    1326 1348 AGGTCAAGGTGTAGCCC 50 AGGTCAAGGTGTAGCC 473
    ATGAGG CATG
    1329 1351 TCAAGGTGTAGCCCATG 51 TCAAGGTGTAGCCCAT 474
    AGGTGG GAGG
    1339 1361 GCCCATGAGGTGGCAA 52 GCCCATGAGGTGGCAA 475
    GAAATGG GAAA
    1340 1362 CCCATGAGGTGGCAAG 53 CCCATGAGGTGGCAAG 476
    AAATGGG AAAT
    1389 1411 GATAGCCCTTATGAAAC 54 GATAGCCCTTATGAAA 477
    TTAAGG CTTA
    1390 1412 ATAGCCCTTATGAAACT 55 ATAGCCCTTATGAAAC 478
    TAAGGG TTAA
    1397 1419 TTATGAAACTTAAGGGT 56 TTATGAAACTTAAGGG 479
    CGAAGG TCGA
    1400 1422 TGAAACTTAAGGGTCGA 57 TGAAACTTAAGGGTCG 480
    AGGTGG AAGG
    1441 1463 AGTAGAGTGCTTAGTTG 58 AGTAGAGTGCTTAGTT 481
    AACAGG GAAC
    1442 1464 GTAGAGTGCTTAGTTGA 59 GTAGAGTGCTTAGTTG 482
    ACAGGG AACA
    1494 1516 CCTCCTCAAGTATACTT 60 CCTCCTCAAGTATACT 483
    CAAAGG TCAA
    1530 1552 ACCCCTACGCATTTATA 61 ACCCCTACGCATTTAT 484
    TAGAGG ATAG
    1548 1570 AGAGGAGACAAGTCGT 62 AGAGGAGACAAGTCG 485
    AACATGG TAACA
    1560 1582 TCGTAACATGGTAAGTG 63 TCGTAACATGGTAAGT 486
    TACTGG GTAC
    1573 1595 AGTGTACTGGAAAGTGC 64 AGTGTACTGGAAAGTG 487
    ACTTGG CACT
    1620 1642 AAAGCACCCAACTTACA 65 AAAGCACCCAACTTAC 488
    CTTAGG ACTT
    1726 1748 CATTTACCCAAATAAAG 66 CATTTACCCAAATAAA 489
    TATAGG GTAT
    1746 1768 AGGCGATAGAAATTGA 67 AGGCGATAGAAATTG 490
    AACCTGG AAACC
    1770 1792 GCAATAGATATAGTACC 68 GCAATAGATATAGTAC 491
    GCAAGG CGCA
    1771 1793 CAATAGATATAGTACCG 69 CAATAGATATAGTACC 492
    CAAGGG GCAA
    1809 1831 TAACCAAGCATAATATA 70 TAACCAAGCATAATAT 493
    GCAAGG AGCA
    1862 1884 TAACTAGAAATAACTTT 71 TAACTAGAAATAACTT 494
    GCAAGG TGCA
    1947 1969 CCGTCTATGTAGCAAAA 72 CCGTCTATGTAGCAAA 495
    TAGTGG ATAG
    1948 1970 CGTCTATGTAGCAAAAT 73 CGTCTATGTAGCAAAA 496
    AGTGGG TAGT
    1960 1982 AAAATAGTGGGAAGAT 74 AAAATAGTGGGAAGA 497
    TTATAGG TTTAT
    1966 1988 GTGGGAAGATTTATAGG 75 GTGGGAAGATTTATAG 498
    TAGAGG GTAG
    1987 2009 GGCGACAAACCTACCG 76 GGCGACAAACCTACCG 499
    AGCCTGG AGCC
    1997 2019 CTACCGAGCCTGGTGAT 77 CTACCGAGCCTGGTGA 500
    AGCTGG TAGC
    2086 2108 ATTTAACTGTTAGTCCA 78 ATTTAACTGTTAGTCC 501
    AAGAGG AAAG
    2099 2121 TCCAAAGAGGAACAGC 79 TCCAAAGAGGAACAG 502
    TCTTTGG CTCTT
    2107 2129 GGAACAGCTCTTTGGAC 80 GGAACAGCTCTTTGGA 503
    ACTAGG CACT
    2152 2174 AAAAATTTAACACCCAT 81 AAAAATTTAACACCCA 504
    AGTAGG TAGT
    2247 2269 CTGAACTCCTCACACCC 82 CTGAACTCCTCACACC 505
    AATTGG CAAT
    2414 2436 CCTCACTGTCAACCCAA 83 CCTCACTGTCAACCCA 506
    CACAGG ACAC
    2427 2449 CCAACACAGGCATGCTC 84 CCAACACAGGCATGCT 507
    ATAAGG CATA
    2432 2454 ACAGGCATGCTCATAAG 85 ACAGGCATGCTCATAA 508
    GAAAGG GGAA
    2449 2471 GAAAGGTTAAAAAAAG 86 GAAAGGTTAAAAAAA 509
    TAAAAGG GTAAA
    2456 2478 TAAAAAAAGTAAAAGG 87 TAAAAAAAGTAAAAG 510
    AACTCGG GAACT
    2515 2537 TCTAGCATCACCAGTAT 88 TCTAGCATCACCAGTA 511
    TAGAGG TTAG
    2546 2568 GCCCAGTGACACATGTT 89 GCCCAGTGACACATGT 512
    TAACGG TTAA
    2552 2574 TGACACATGTTTAACGG 90 TGACACATGTTTAACG 513
    CCGCGG GCCG
    2571 2593 GCGGTACCCTAACCGTG 91 GCGGTACCCTAACCGT 514
    CAAAGG GCAA
    2599 2621 TAATCACTTGTTCCTTA 92 TAATCACTTGTTCCTT 515
    AATAGG AAAT
    2600 2622 AATCACTTGTTCCTTAA 93 AATCACTTGTTCCTTA 516
    ATAGGG AATA
    2614 2636 TAAATAGGGACCTGTAT 94 TAAATAGGGACCTGTA 517
    GAATGG TGAA
    2624 2646 CCTGTATGAATGGCTCC 95 CCTGTATGAATGGCTC 518
    ACGAGG CACG
    2625 2647 CTGTATGAATGGCTCCA 96 CTGTATGAATGGCTCC 519
    CGAGGG ACGA
    2676 2698 AAATTGACCTGCCCGTG 97 AAATTGACCTGCCCGT 520
    AAGAGG GAAG
    2679 2701 TTGACCTGCCCGTGAAG 98 TTGACCTGCCCGTGAA 521
    AGGCGG GAGG
    2680 2702 TGACCTGCCCGTGAAGA 99 TGACCTGCCCGTGAAG 522
    GGCGGG AGGC
    2711 2733 AGCAAGACGAGAAGAC 100 AGCAAGACGAGAAGA 523
    CCTATGG CCCTA
    2755 2777 ACAGTACCTAACAAACC 101 ACAGTACCTAACAAAC 524
    CACAGG CCAC
    2789 2811 CAAACCTGCATTAAAAA 102 CAAACCTGCATTAAAA 525
    TTTCGG ATTT
    2793 2815 CCTGCATTAAAAATTTC 103 CCTGCATTAAAAATTT 526
    GGTTGG CGGT
    2794 2816 CTGCATTAAAAATTTCG 104 CTGCATTAAAAATTTC 527
    GTTGGG GGTT
    2795 2817 TGCATTAAAAATTTCGG 105 TGCATTAAAAATTTCG 528
    TTGGGG GTTG
    2804 2826 AATTTCGGTTGGGGCGA 106 AATTTCGGTTGGGGCG 529
    CCTCGG ACCT
    2895 2917 TGATCCAATAACTTGAC 107 TGATCCAATAACTTGA 530
    CAACGG CCAA
    2911 2933 CCAACGGAACAAGTTAC 108 CCAACGGAACAAGTTA 531
    CCTAGG CCCT
    2912 2934 CAACGGAACAAGTTACC 109 CAACGGAACAAGTTAC 532
    CTAGGG CCTA
    2954 2976 CTAGAGTCCATATCAAC 110 CTAGAGTCCATATCAA 533
    AATAGG CAAT
    2955 2977 TAGAGTCCATATCAACA 111 TAGAGTCCATATCAAC 534
    ATAGGG AATA
    2974 2996 AGGGTTTACGACCTCGA 112 AGGGTTTACGACCTCG 535
    TGTTGG ATGT
    2980 3002 TACGACCTCGATGTTGG 113 TACGACCTCGATGTTG 536
    ATCAGG GATC
    2992 3014 GTTGGATCAGGACATCC 114 GTTGGATCAGGACATC 537
    CGATGG CCGA
    3010 3032 GATGGTGCAGCCGCTAT 115 GATGGTGCAGCCGCTA 538
    TAAAGG TTAA
    3058 3080 TACGTGATCTGAGTTCA 116 TACGTGATCTGAGTTC 539
    GACCGG AGAC
    3069 3091 AGTTCAGACCGGAGTAA 117 AGTTCAGACCGGAGTA 540
    TCCAGG ATCC
    3073 3095 CAGACCGGAGTAATCCA 118 CAGACCGGAGTAATCC 541
    GGTCGG AGGT
    3110 3132 CAAATTCCTCCCTGTAC 119 CAAATTCCTCCCTGTA 542
    GAAAGG CGAA
    3125 3147 ACGAAAGGACAAGAGA 120 ACGAAAGGACAAGAG 543
    AATAAGG AAATA
    3203 3225 ACCCACACCCACCCAAG 121 ACCCACACCCACCCAA 544
    AACAGG GAAC
    3204 3226 CCCACACCCACCCAAGA 122 CCCACACCCACCCAAG 545
    ACAGGG AACA
    3217 3239 AAGAACAGGGTTTGTTA 123 AAGAACAGGGTTTGTT 546
    AGATGG AAGA
    3227 3249 TTTGTTAAGATGGCAGA 124 TTTGTTAAGATGGCAG 547
    GCCCGG AGCC
    3262 3284 ACTTAAAACTTTACAGT 125 ACTTAAAACTTTACAG 548
    CAGAGG TCAG
    3294 3316 TCTTCTTAACAACATAC 126 TCTTCTTAACAACATA 549
    CCATGG CCCA
    3336 3358 TGTACCCATTCTAATCG 127 TGTACCCATTCTAATC 550
    CAATGG GCAA
    3370 3392 CTTACCGAACGAAAAAT 128 CTTACCGAACGAAAAA 551
    TCTAGG TTCT
    3391 3413 GGCTATATACAACTACG 129 GGCTATATACAACTAC 552
    CAAAGG GCAA
    3406 3428 CGCAAAGGCCCCAACGT 130 CGCAAAGGCCCCAAC 553
    TGTAGG GTTGT
    3415 3437 CCCAACGTTGTAGGCCC 131 CCCAACGTTGTAGGCC 554
    CTACGG CCTA
    3416 3438 CCAACGTTGTAGGCCCC 132 CCAACGTTGTAGGCCC 555
    TACGGG CTAC
    3570 3592 CCTCCCCATACCCAACC 133 CCTCCCCATACCCAAC 556
    CCCTGG CCCC
    3586 3608 CCCCTGGTCAACCTCAA 134 CCCCTGGTCAACCTCA 557
    CCTAGG ACCT
    3643 3665 GTTTACTCAATCCTCTG 135 GTTTACTCAATCCTCT 558
    ATCAGG GATC
    3644 3666 TTTACTCAATCCTCTGA 136 TTTACTCAATCCTCTG 559
    TCAGGG ATCA
    3676 3698 AACTCAAACTACGCCCT 137 AACTCAAACTACGCCC 560
    GATCGG TGAT
    3757 3779 CTATCAACATTACTAAT 138 CTATCAACATTACTAA 561
    AAGTGG TAAG
    3828 3850 ACTCCTGCCATCATGAC 139 ACTCCTGCCATCATGA 562
    CCTTGG CCCT
    3892 3914 ACCCCCTTCGACCTTGC 140 ACCCCCTTCGACCTTG 563
    CGAAGG CCGA
    3893 3915 CCCCCTTCGACCTTGCC 141 CCCCCTTCGACCTTGC 564
    GAAGGG CGAA
    3894 3916 CCCCTTCGACCTTGCCG 142 CCCCTTCGACCTTGCC 565
    AAGGGG GAAG
    3913 3935 GGGGAGTCCGAACTAGT 143 GGGGAGTCCGAACTA 566
    CTCAGG GTCTC
    3937 3959 TTCAACATCGAATACGC 144 TTCAACATCGAATACG 567
    CGCAGG CCGC
    4015 4037 CTCACCACTACAATCTT 145 CTCACCACTACAATCT 568
    CCTAGG TCCT
    4287 4309 ACTTTGATAGAGTAAAT 146 ACTTTGATAGAGTAAA 569
    AATAGG TAAT
    4311 4333 GCTTAAACCCCCTTATT 147 GCTTAAACCCCCTTAT 570
    TCTAGG TTCT
    4386 4408 TCACACCCCATCCTAAA 148 TCACACCCCATCCTAA 571
    GTAAGG AGTA
    4406 4428 AGGTCAGCTAAATAAGC 149 AGGTCAGCTAAATAAG 572
    TATCGG CTAT
    4407 4429 GGTCAGCTAAATAAGCT 150 GGTCAGCTAAATAAGC 573
    ATCGGG TATC
    4428 4450 GGCCCATACCCCGAAAA 151 GGCCCATACCCCGAAA 574
    TGTTGG ATGT
    4460 4482 TCCCGTACTAATTAATC 152 TCCCGTACTAATTAAT 575
    CCCTGG CCCC
    4494 4516 ATCTACTCTACCATCTTT 153 ATCTACTCTACCATCT 576
    GCAGG TTGC
    4542 4564 CACTGATTTTTTACCTG 154 CACTGATTTTTTACCT 577
    AGTAGG GAGT
    4692 4714 CTCTTCAACAATATACT 155 CTCTTCAACAATATAC 578
    CTCCGG TCTC
    4767 4789 ATAGCTATAGCAATAAA 156 ATAGCTATAGCAATAA 579
    ACTAGG AACT
    4799 4821 CTTTCACTTCTGAGTCC 157 CTTTCACTTCTGAGTC 580
    CAGAGG CCAG
    4809 4831 TGAGTCCCAGAGGTTAC 158 TGAGTCCCAGAGGTTA 581
    CCAAGG CCCA
    4827 4849 CAAGGCACCCCTCTGAC 159 CAAGGCACCCCTCTGA 582
    ATCCGG CATC
    4941 4963 TCAATCTTATCCATCAT 160 TCAATCTTATCCATCA 583
    AGCAGG TAGC
    4950 4972 TCCATCATAGCAGGCAG 161 TCCATCATAGCAGGCA 584
    TTGAGG GTTG
    4953 4975 ATCATAGCAGGCAGTTG 162 ATCATAGCAGGCAGTT 585
    AGGTGG GAGG
    5010 5032 TACTCCTCAATTACCCA 163 TACTCCTCAATTACCC 586
    CATAGG ACAT
    5202 5224 CCATCCACCCTCCTCTC 164 CCATCCACCCTCCTCT 587
    CCTAGG CCCT
    5205 5227 TCCACCCTCCTCTCCCT 165 TCCACCCTCCTCTCCCT 588
    AGGAGG AGG
    5223 5245 GGAGGCCTGCCCCCGCT 166 GGAGGCCTGCCCCCGC 589
    AACCGG TAAC
    5239 5261 TAACCGGCTTTTTGCCC 167 TAACCGGCTTTTTGCC 590
    AAATGG CAAA
    5240 5262 AACCGGCTTTTTGCCCA 168 AACCGGCTTTTTGCCC 591
    AATGGG AAAT
    5500 5522 TAATAATCTTATAGAAA 169 TAATAATCTTATAGAA 592
    TTTAGG ATTT
    5569 5591 CTTAATTTCTGTAACAG 170 CTTAATTTCTGTAACA 593
    CTAAGG GCTA
    5646 5668 CTAAGCCCTTACTAGAC 171 CTAAGCCCTTACTAGA 594
    CAATGG CCAA
    5647 5669 TAAGCCCTTACTAGACC 172 TAAGCCCTTACTAGAC 595
    AATGGG CAAT
    5697 5719 AGCTAAGCACCCTAATC 173 AGCTAAGCACCCTAAT 596
    AACTGG CAAC
    5723 5745 CAATCTACTTCTCCCGC 174 CAATCTACTTCTCCCG 597
    CGCCGG CCGC
    5724 5746 AATCTACTTCTCCCGCC 175 AATCTACTTCTCCCGC 598
    GCCGGG CGCC
    5732 5754 TCTCCCGCCGCCGGGAA 176 TCTCCCGCCGCCGGGA 599
    AAAAGG AAAA
    5735 5757 CCCGCCGCCGGGAAAA 177 CCCGCCGCCGGGAAA 600
    AAGGCGG AAAGG
    5736 5758 CCGCCGCCGGGAAAAA 178 CCGCCGCCGGGAAAA 601
    AGGCGGG AAGGC
    5747 5769 AAAAAAGGCGGGAGAA 179 AAAAAAGGCGGGAGA 602
    GCCCCGG AGCCC
    5751 5773 AAGGCGGGAGAAGCCC 180 AAGGCGGGAGAAGCC 603
    CGGCAGG CCGGC
    5800 5822 ATTCAATATGAAAATCA 181 ATTCAATATGAAAATC 604
    CCTCGG ACCT
    5806 5828 TATGAAAATCACCTCGG 182 TATGAAAATCACCTCG 605
    AGCTGG GAGC
    5816 5838 ACCTCGGAGCTGGTAAA 183 ACCTCGGAGCTGGTAA 606
    AAGAGG AAAG
    5928 5950 TCTACAAACCACAAAGA 184 TCTACAAACCACAAAG 607
    CATTGG ACAT
    5949 5971 GGAACACTATACCTATT 185 GGAACACTATACCTAT 608
    ATTCGG TATT
    5961 5983 CTATTATTCGGCGCATG 186 CTATTATTCGGCGCAT 609
    AGCTGG GAGC
    5970 5992 GGCGCATGAGCTGGAGT 187 GGCGCATGAGCTGGA 610
    CCTAGG GTCCT
    6005 6027 CCTCCTTATTCGAGCCG 188 CCTCCTTATTCGAGCC 611
    AGCTGG GAGC
    6006 6028 CTCCTTATTCGAGCCGA 189 CTCCTTATTCGAGCCG 612
    GCTGGG AGCT
    6027 6049 GGCCAGCCAGGCAACCT 190 GGCCAGCCAGGCAAC 613
    TCTAGG CTTCT
    6108 6130 ATAGTAATACCCATCAT 191 ATAGTAATACCCATCA 614
    AATCGG TAAT
    6111 6133 GTAATACCCATCATAAT 192 GTAATACCCATCATAA 615
    CGGAGG TCGG
    6117 6139 CCCATCATAATCGGAGG 193 CCCATCATAATCGGAG 616
    CTTTGG GCTT
    6144 6166 TGACTAGTTCCCCTAAT 194 TGACTAGTTCCCCTAA 617
    AATCGG TAAT
    6158 6180 AATAATCGGTGCCCCCG 195 AATAATCGGTGCCCCC 618
    ATATGG GATA
    6236 6258 CCTGCTCGCATCTGCTA 196 CCTGCTCGCATCTGCT 619
    TAGTGG ATAG
    6239 6261 GCTCGCATCTGCTATAG 197 GCTCGCATCTGCTATA 620
    TGGAGG GTGG
    6243 6265 GCATCTGCTATAGTGGA 198 GCATCTGCTATAGTGG 621
    GGCCGG AGGC
    6249 6271 GCTATAGTGGAGGCCGG 199 GCTATAGTGGAGGCCG 622
    AGCAGG GAGC
    6255 6277 GTGGAGGCCGGAGCAG 200 GTGGAGGCCGGAGCA 623
    GAACAGG GGAAC
    6282 6304 ACAGTCTACCCTCCCTT 201 ACAGTCTACCCTCCCT 624
    AGCAGG TAGC
    6283 6305 CAGTCTACCCTCCCTTA 202 CAGTCTACCCTCCCTT 625
    GCAGGG AGCA
    6300 6322 GCAGGGAACTACTCCCA 203 GCAGGGAACTACTCCC 626
    CCCTGG ACCC
    6342 6364 ATCTTCTCCTTACACCT 204 ATCTTCTCCTTACACCT 627
    AGCAGG AGC
    6360 6382 GCAGGTGTCTCCTCTAT 205 GCAGGTGTCTCCTCTA 628
    CTTAGG TCTT
    6361 6383 CAGGTGTCTCCTCTATC 206 CAGGTGTCTCCTCTAT 629
    TTAGGG CTTA
    6362 6384 AGGTGTCTCCTCTATCT 207 AGGTGTCTCCTCTATC 630
    TAGGGG TTAG
    6495 6517 TCTCTCCCAGTCCTAGC 208 TCTCTCCCAGTCCTAG 631
    TGCTGG CTGC
    6552 6574 ACCACCTTCTTCGACCC 209 ACCACCTTCTTCGACC 632
    CGCCGG CCGC
    6555 6577 ACCTTCTTCGACCCCGC 210 ACCTTCTTCGACCCCG 633
    CGGAGG CCGG
    6558 6580 TTCTTCGACCCCGCCGG 211 TTCTTCGACCCCGCCG 634
    AGGAGG GAGG
    6597 6619 CAACACCTATTCTGATT 212 CAACACCTATTCTGAT 635
    TTTCGG TTTT
    6630 6652 GTTTATATTCTTATCCTA 213 GTTTATATTCTTATCCT 636
    CCAGG ACC
    6636 6658 ATTCTTATCCTACCAGG 214 ATTCTTATCCTACCAG 637
    CTTCGG GCTT
    6669 6691 CATATTGTAACTTACTA 215 CATATTGTAACTTACT 638
    CTCCGG ACTC
    6687 6709 TCCGGAAAAAAAGAAC 216 TCCGGAAAAAAAGAA 639
    CATTTGG CCATT
    6696 6718 AAAGAACCATTTGGATA 217 AAAGAACCATTTGGAT 640
    CATAGG ACAT
    6701 6723 ACCATTTGGATACATAG 218 ACCATTTGGATACATA 641
    GTATGG GGTA
    6723 6745 GTCTGAGCTATGATATC 219 GTCTGAGCTATGATAT 642
    AATTGG CAAT
    6732 6754 ATGATATCAATTGGCTT 220 ATGATATCAATTGGCT 643
    CCTAGG TCCT
    6733 6755 TGATATCAATTGGCTTC 221 TGATATCAATTGGCTT 644
    CTAGGG CCTA
    6768 6790 GCACACCATATATTTAC 222 GCACACCATATATTTA 645
    AGTAGG CAGT
    6831 6853 ATAATCATCGCTATCCC 223 ATAATCATCGCTATCC 646
    CACCGG CCAC
    6867 6889 AGCTGACTCGCCACACT 224 AGCTGACTCGCCACAC 647
    CCACGG TCCA
    6909 6931 GCTGCAGTGCTCTGAGC 225 GCTGCAGTGCTCTGAG 648
    CCTAGG CCCT
    6933 6955 TTCATCTTTCTTTTCACC 226 TTCATCTTTCTTTTCAC 649
    GTAGG CGT
    6936 6958 ATCTTTCTTTTCACCGTA 227 ATCTTTCTTTTCACCGT 650
    GGTGG AGG
    6945 6967 TTCACCGTAGGTGGCCT 228 TTCACCGTAGGTGGCC 651
    GACTGG TGAC
    7032 7054 TTCCACTATGTCCTATC 229 TTCCACTATGTCCTAT 652
    AATAGG CAAT
    7053 7075 GGAGCTGTATTTGCCAT 230 GGAGCTGTATTTGCCA 653
    CATAGG TCAT
    7056 7078 GCTGTATTTGCCATCAT 231 GCTGTATTTGCCATCA 654
    AGGAGG TAGG
    7086 7108 CACTGATTTCCCCTATT 232 CACTGATTTCCCCTAT 655
    CTCAGG TCTC
    7140 7162 CATTTCACTATCATATT 233 CATTTCACTATCATAT 656
    CATCGG TCAT
    7176 7198 TTCTTCCCACAACACTT 234 TTCTTCCCACAACACT 657
    TCTCGG TTCT
    7185 7207 CAACACTTTCTCGGCCT 235 CAACACTTTCTCGGCC 658
    ATCCGG TATC
    7205 7227 CGGAATGCCCCGACGTT 236 CGGAATGCCCCGACGT 659
    ACTCGG TACT
    7251 7273 TGAAACATCCTATCATC 237 TGAAACATCCTATCAT 660
    TGTAGG CTGT
    7358 7380 AGAAGAACCCTCCATAA 238 AGAAGAACCCTCCATA 661
    ACCTGG AACC
    7371 7393 ATAAACCTGGAGTGACT 239 ATAAACCTGGAGTGAC 662
    ATATGG TATA
    7432 7454 ACATAAAATCTAGACAA 240 ACATAAAATCTAGACA 663
    AAAAGG AAAA
    7436 7458 AAAATCTAGACAAAAA 241 AAAATCTAGACAAAA 664
    AGGAAGG AAGGA
    7457 7479 GGAATCGAACCCCCCAA 242 GGAATCGAACCCCCCA 665
    AGCTGG AAGC
    7476 7498 CTGGTTTCAAGCCAACC 243 CTGGTTTCAAGCCAAC 666
    CCATGG CCCA
    7499 7521 CCTCCATGACTTTTTCA 244 CCTCCATGACTTTTTC 667
    AAAAGG AAAA
    7544 7566 CTTTGTCAAAGTTAAAT 245 CTTTGTCAAAGTTAAA 668
    TATAGG TTAT
    7567 7589 CTAAATCCTATATATCT 246 CTAAATCCTATATATC 669
    TAATGG TTAA
    7586 7608 ATGGCACATGCAGCGCA 247 ATGGCACATGCAGCGC 670
    AGTAGG AAGT
    7741 7763 TACTAACATCTCAGACG 248 TACTAACATCTCAGAC 671
    CTCAGG GCTC
    7831 7853 CATCCTTTACATAACAG 249 CATCCTTTACATAACA 672
    ACGAGG GACG
    7865 7887 TCCCTTACCATCAAATC 250 TCCCTTACCATCAAAT 673
    AATTGG CAAT
    7875 7897 TCAAATCAATTGGCCAC 251 TCAAATCAATTGGCCA 674
    CAATGG CCAA
    7904 7926 ACCTACGAGTACACCGA 252 ACCTACGAGTACACCG 675
    CTACGG ACTA
    7907 7929 TACGAGTACACCGACTA 253 TACGAGTACACCGACT 676
    CGGCGG ACGG
    7955 7977 CCCCCATTATTCCTAGA 254 CCCCCATTATTCCTAG 677
    ACCAGG AACC
    8069 8091 TCATGAGCTGTCCCCAC 255 TCATGAGCTGTCCCCA 678
    ATTAGG CATT
    8093 8115 TTAAAAACAGATGCAAT 256 TTAAAAACAGATGCAA 679
    TCCCGG TTCC
    8131 8153 CACTTTCACCGCTACAC 257 CACTTTCACCGCTACA 680
    GACCGG CGAC
    8132 8154 ACTTTCACCGCTACACG 258 ACTTTCACCGCTACAC 681
    ACCGGG GACC
    8133 8155 CTTTCACCGCTACACGA 259 CTTTCACCGCTACACG 682
    CCGGGG ACCG
    8134 8156 TTTCACCGCTACACGAC 260 TTTCACCGCTACACGA 683
    CGGGGG CCGG
    8144 8166 ACACGACCGGGGGTAT 261 ACACGACCGGGGGTAT 684
    ACTACGG ACTA
    8165 8187 GGTCAATGCTCTGAAAT 262 GGTCAATGCTCTGAAA 685
    CTGTGG TCTG
    8228 8250 CCCCTAAAAATCTTTGA 263 CCCCTAAAAATCTTTG 686
    AATAGG AAAT
    8229 8251 CCCTAAAAATCTTTGAA 264 CCCTAAAAATCTTTGA 687
    ATAGGG AATA
    8370 8392 CCCAACTAAATACTACC 265 CCCAACTAAATACTAC 688
    GTATGG CGTA
    8551 8573 TTCATTGCCCCCACAAT 266 TTCATTGCCCCCACAA 689
    CCTAGG TCCT
    8698 8720 ATAACCATACACAACAC 267 ATAACCATACACAACA 690
    TAAAGG CTAA
    8761 8783 ATTGCCACAACTAACCT 268 ATTGCCACAACTAACC 691
    CCTCGG TCCT
    8817 8839 ACTATCTATAAACCTAG 269 ACTATCTATAAACCTA 692
    CCATGG GCCA
    8835 8857 CATGGCCATCCCCTTAT 270 CATGGCCATCCCCTTA 693
    GAGCGG TGAG
    8836 8858 ATGGCCATCCCCTTATG 271 ATGGCCATCCCCTTAT 694
    AGCGGG GAGC
    8851 8873 TGAGCGGGCACAGTGAT 272 TGAGCGGGCACAGTG 695
    TATAGG ATTAT
    8899 8921 CTAGCCCACTTCTTACC 273 CTAGCCCACTTCTTAC 696
    ACAAGG CACA
    8973 8995 ACTCATTCAACCAATAG 274 ACTCATTCAACCAATA 697
    CCCTGG GCCC
    9004 9026 CTAACCGCTAACATTAC 275 CTAACCGCTAACATTA 698
    TGCAGG CTGC
    9028 9050 CACCTACTCATGCACCT 276 CACCTACTCATGCACC 699
    AATTGG TAAT
    9243 9265 CCCAGCCCATGACCCCT 277 CCCAGCCCATGACCCC 700
    AACAGG TAAC
    9244 9266 CCAGCCCATGACCCCTA 278 CCAGCCCATGACCCCT 701
    ACAGGG AACA
    9245 9267 CAGCCCATGACCCCTAA 279 CAGCCCATGACCCCTA 702
    CAGGGG ACAG
    9273 9295 TCAGCCCTCCTAATGAC 280 TCAGCCCTCCTAATGA 703
    CTCCGG CCTC
    9321 9343 TCCATAACGCTCCTCAT 281 TCCATAACGCTCCTCA 704
    ACTAGG TACT
    9358 9380 CACTAACCATATACCAA 282 CACTAACCATATACCA 705
    TGATGG ATGA
    9390 9412 ACACGAGAAAGCACAT 283 ACACGAGAAAGCACA 706
    ACCAAGG TACCA
    9417 9439 CACACACCACCTGTCCA 284 CACACACCACCTGTCC 707
    AAAAGG AAAA
    9429 9451 GTCCAAAAAGGCCTTCG 285 GTCCAAAAAGGCCTTC 708
    ATACGG GATA
    9430 9452 TCCAAAAAGGCCTTCGA 286 TCCAAAAAGGCCTTCG 709
    TACGGG ATAC
    9471 9493 TCAGAAGTTTTTTTCTTC 287 TCAGAAGTTTTTTTCTT 710
    GCAGG CGC
    9522 9544 CTAGCCCCTACCCCCCA 288 CTAGCCCCTACCCCCC 711
    ATTAGG AATT
    9525 9547 GCCCCTACCCCCCAATT 289 GCCCCTACCCCCCAAT 712
    AGGAGG TAGG
    9526 9548 CCCCTACCCCCCAATTA 290 CCCCTACCCCCCAATT 713
    GGAGGG AGGA
    9532 9554 CCCCCCAATTAGGAGGG 291 CCCCCCAATTAGGAGG 714
    CACTGG GCAC
    9543 9565 GGAGGGCACTGGCCCCC 292 GGAGGGCACTGGCCCC 715
    AACAGG CAAC
    9606 9628 ACATCCGTATTACTCGC 293 ACATCCGTATTACTCG 716
    ATCAGG CATC
    9692 9714 ACTGCTTATTACAATTT 294 ACTGCTTATTACAATT 717
    TACTGG TTAC
    9693 9715 CTGCTTATTACAATTTT 295 CTGCTTATTACAATTTT 718
    ACTGGG ACT
    9756 9778 TCTCCCTTCACCATTTCC 296 TCTCCCTTCACCATTTC 719
    GACGG CGA
    9765 9787 ACCATTTCCGACGGCAT 297 ACCATTTCCGACGGCA 720
    CTACGG TCTA
    9789 9811 TCAACATTTTTTGTAGC 298 TCAACATTTTTTGTAG 721
    CACAGG CCAC
    9798 9820 TTTGTAGCCACAGGCTT 299 TTTGTAGCCACAGGCT 722
    CCACGG TCCA
    9816 9838 CACGGACTTCACGTCAT 300 CACGGACTTCACGTCA 723
    TATTGG TTAT
    9885 9907 TTTACATCCAAACATCA 301 TTTACATCCAAACATC 724
    CTTTGG ACTT
    9910 9932 TCGAAGCCGCCGCCTGA 302 TCGAAGCCGCCGCCTG 725
    TACTGG ATAC
    9926 9948 ATACTGGCATTTTGTAG 303 ATACTGGCATTTTGTA 726
    ATGTGG GATG
    9963 9985 TATGTCTCCATCTATTG 304 TATGTCTCCATCTATT 727
    ATGAGG GATG
    9964 9986 ATGTCTCCATCTATTGA 305 ATGTCTCCATCTATTG 728
    TGAGGG ATGA
    10122 10144 TTTTGACTACCACAACT 306 TTTTGACTACCACAAC 729
    CAACGG TCAA
    10155 10177 AAATCCACCCCTTACGA 307 AAATCCACCCCTTACG 730
    GTGCGG AGTG
    10343 10365 CATCATCCTAGCCCTAA 308 CATCATCCTAGCCCTA 731
    GTCTGG AGTC
    10365 10387 GCCTATGAGTGACTACA 309 GCCTATGAGTGACTAC 732
    AAAAGG AAAA
    10385 10407 AGGATTAGACTGAACCG 310 AGGATTAGACTGAACC 733
    AATTGG GAAT
    10500 10522 GCATTTACCATCTCACT 311 GCATTTACCATCTCAC 734
    TCTAGG TTCT
    10551 10573 TCCTCCCTACTATGCCT 312 TCCTCCCTACTATGCC 735
    AGAAGG TAGA
    10664 10686 CTTTGCCGCCTGCGAAG 313 CTTTGCCGCCTGCGAA 736
    CAGCGG GCAG
    10667 10689 TGCCGCCTGCGAAGCAG 314 TGCCGCCTGCGAAGCA 737
    CGGTGG GCGG
    10668 10690 GCCGCCTGCGAAGCAGC 315 GCCGCCTGCGAAGCAG 738
    GGTGGG CGGT
    10704 10726 GTCTCAATCTCCAACAC 316 GTCTCAATCTCCAACA 739
    ATATGG CATA
    10972 10994 ACTCCTACCCCTCACAA 317 ACTCCTACCCCTCACA 740
    TCATGG ATCA
    11128 11150 AACCACACTTATCCCCA 318 AACCACACTTATCCCC 741
    CCTTGG ACCT
    11147 11169 TTGGCTATCATCACCCG 319 TTGGCTATCATCACCC 742
    ATGAGG GATG
    11174 11196 CAGCCAGAACGCCTGA 320 CAGCCAGAACGCCTGA 743
    ACGCAGG ACGC
    11204 11226 TTCCTATTCTACACCCT 321 TTCCTATTCTACACCCT 744
    AGTAGG AGT
    11252 11274 ATTTACACTCACAACAC 322 ATTTACACTCACAACA 745
    CCTAGG CCCT
    11369 11391 ATAGTAAAGATACCTCT 323 ATAGTAAAGATACCTC 746
    TTACGG TTTA
    11417 11439 CATGTCGAAGCCCCCAT 324 CATGTCGAAGCCCCCA 747
    CGCTGG TCGC
    11418 11440 ATGTCGAAGCCCCCATC 325 ATGTCGAAGCCCCCAT 748
    GCTGGG CGCT
    11453 11475 GCCGCAGTACTCTTAAA 326 GCCGCAGTACTCTTAA 749
    ACTAGG AACT
    11456 11478 GCAGTACTCTTAAAACT 327 GCAGTACTCTTAAAAC 750
    AGGCGG TAGG
    11462 11484 CTCTTAAAACTAGGCGG 328 CTCTTAAAACTAGGCG 751
    CTATGG GCTA
    11540 11562 TTCCTTGTACTATCCCTA 329 TTCCTTGTACTATCCCT 752
    TGAGG ATG
    11669 11691 CAAACCCCCTGAAGCTT 330 CAAACCCCCTGAAGCT 753
    CACCGG TCAC
    11696 11718 GTCATTCTCATAATCGC 331 GTCATTCTCATAATCG 754
    CCACGG CCCA
    11697 11719 TCATTCTCATAATCGCC 332 TCATTCTCATAATCGC 755
    CACGGG CCAC
    11777 11799 CGCATCATAATCCTCTC 333 CGCATCATAATCCTCT 756
    TCAAGG CTCA
    11866 11888 ACCCCCCACTATTAACC 334 ACCCCCCACTATTAAC 757
    TACTGG CTAC
    11867 11889 CCCCCCACTATTAACCT 335 CCCCCCACTATTAACC 758
    ACTGGG TACT
    11927 11949 AATATCACTCTCCTACT 336 AATATCACTCTCCTAC 759
    TACAGG TTAC
    11985 12007 ACATATTTACCACAACA 337 ACATATTTACCACAAC 760
    CAATGG ACAA
    11986 12008 CATATTTACCACAACAC 338 CATATTTACCACAACA 761
    AATGGG CAAT
    11987 12009 ATATTTACCACAACACA 339 ATATTTACCACAACAC 762
    ATGGGG AATG
    12104 12126 CTCAACCCCGACATCAT 340 CTCAACCCCGACATCA 763
    TACCGG TTAC
    12105 12127 TCAACCCCGACATCATT 341 TCAACCCCGACATCAT 764
    ACCGGG TACC
    12164 12186 GATTGTGAATCTGACAA 342 GATTGTGAATCTGACA 765
    CAGAGG ACAG
    12235 12257 TGCCCCCATGTCTAACA 343 TGCCCCCATGTCTAAC 766
    ACATGG AACA
    12254 12276 ATGGCTTTCTCAACTTTT 344 ATGGCTTTCTCAACTT 767
    AAAGG TTAA
    12272 12294 AAAGGATAACAGCTATC 345 AAAGGATAACAGCTAT 768
    CATTGG CCAT
    12279 12301 AACAGCTATCCATTGGT 346 AACAGCTATCCATTGG 769
    CTTAGG TCTT
    12294 12316 GTCTTAGGCCCCAAAAA 347 GTCTTAGGCCCCAAAA 770
    TTTTGG ATTT
    12608 12630 CTGTAGCATTGTTCGTT 348 CTGTAGCATTGTTCGT 771
    ACATGG TACA
    12742 12764 AACCTATTCCAACTGTT 349 AACCTATTCCAACTGT 772
    CATCGG TCAT
    12750 12772 CCAACTGTTCATCGGCT 350 CCAACTGTTCATCGGC 773
    GAGAGG TGAG
    12751 12773 CAACTGTTCATCGGCTG 351 CAACTGTTCATCGGCT 774
    AGAGGG GAGA
    12757 12779 TTCATCGGCTGAGAGGG 352 TTCATCGGCTGAGAGG 775
    CGTAGG GCGT
    12847 12869 GCAATCCTATACAACCG 353 GCAATCCTATACAACC 776
    TATCGG GTAT
    12856 12878 TACAACCGTATCGGCGA 354 TACAACCGTATCGGCG 777
    TATCGG ATAT
    12958 12980 CCAAGCCTCACCCCACT 355 CCAAGCCTCACCCCAC 778
    ACTAGG TACT
    12979 13001 GGCCTCCTCCTAGCAGC 356 GGCCTCCTCCTAGCAG 779
    AGCAGG CAGC
    12997 13019 GCAGGCAAATCAGCCC 357 GCAGGCAAATCAGCCC 780
    AATTAGG AATT
    13030 13052 TGACTCCCCTCAGCCAT 358 TGACTCCCCTCAGCCA 781
    AGAAGG TAGA
    13081 13103 TCAAGCACTATAGTTGT 359 TCAAGCACTATAGTTG 782
    AGCAGG TAGC
    13156 13178 CAAACTCTAACACTATG 360 CAAACTCTAACACTAT 783
    CTTAGG GCTT
    13246 13268 TTCTCCACTTCAAGTCA 361 TTCTCCACTTCAAGTC 784
    ACTAGG AACT
    13267 13289 GGACTCATAATAGTTAC 362 GGACTCATAATAGTTA 785
    AATCGG CAAT
    13345 13367 GCCATACTATTTATGTG 363 GCCATACTATTTATGT 786
    CTCCGG GCTC
    13346 13368 CCATACTATTTATGTGC 364 CCATACTATTTATGTG 787
    TCCGGG CTCC
    13393 13415 GAACAAGATATTCGAA 365 GAACAAGATATTCGAA 788
    AAATAGG AAAT
    13396 13418 CAAGATATTCGAAAAAT 366 CAAGATATTCGAAAAA 789
    AGGAGG TAGG
    13441 13463 ACTTCAACCTCCCTCAC 367 ACTTCAACCTCCCTCA 790
    CATTGG CCAT
    13459 13481 ATTGGCAGCCTAGCATT 368 ATTGGCAGCCTAGCAT 791
    AGCAGG TAGC
    13477 13499 GCAGGAATACCTTTCCT 369 GCAGGAATACCTTTCC 792
    CACAGG TCAC
    13612 13634 ATAATTCTTCTCACCCT 370 ATAATTCTTCTCACCC 793
    AACAGG TAAC
    13686 13708 ACTAAACCCCATTAAAC 371 ACTAAACCCCATTAAA 794
    GCCTGG CGCC
    13693 13715 CCCATTAAACGCCTGGC 372 CCCATTAAACGCCTGG 795
    AGCCGG CAGC
    13708 13730 GCAGCCGGAAGCCTATT 373 GCAGCCGGAAGCCTAT 796
    CGCAGG TCGC
    13804 13826 GCCCTCGCTGTCACTTT 374 GCCCTCGCTGTCACTT 797
    CCTAGG TCCT
    13894 13916 TTTTATTTCTCCAACATA 375 TTTTATTTCTCCAACAT 798
    CTCGG ACT
    13936 13958 CACCGCACAATCCCCTA 376 CACCGCACAATCCCCT 799
    TCTAGG ATCT
    14059 14081 ATCATCACCTCAACCCA 377 ATCATCACCTCAACCC 800
    AAAAGG AAAA
    14237 14259 TACAAAGCCCCCGCACC 378 TACAAAGCCCCCGCAC 801
    AATAGG CAAT
    14417 14439 ACCCCTGACCCCCATGC 379 ACCCCTGACCCCCATG 802
    CTCAGG CCTC
    14579 14601 AATACTAAACCCCCATA 380 AATACTAAACCCCCAT 803
    AATAGG AAAT
    14585 14607 AAACCCCCATAAATAGG 381 AAACCCCCATAAATAG 804
    AGAAGG GAGA
    14664 14686 CATACATCATTATTCTC 382 CATACATCATTATTCT 805
    GCACGG CGCA
    14825 14847 ATCTCCGCATGATGAAA 383 ATCTCCGCATGATGAA 806
    CTTCGG ACTT
    14837 14859 TGAAACTTCGGCTCACT 384 TGAAACTTCGGCTCAC 807
    CCTTGG TCCT
    14867 14889 CTGATCCTCCAAATCAC 385 CTGATCCTCCAAATCA 808
    CACAGG CCAC
    14951 14973 ATCACTCGAGACGTAAA 386 ATCACTCGAGACGTAA 809
    TTATGG ATTA
    14981 15003 ATCCGCTACCTTCACGC 387 ATCCGCTACCTTCACG 810
    CAATGG CCAA
    15020 15042 ATCTGCCTCTTCCTACA 388 ATCTGCCTCTTCCTAC 811
    CATCGG ACAT
    15021 15043 TCTGCCTCTTCCTACAC 389 TCTGCCTCTTCCTACA 812
    ATCGGG CATC
    15026 15048 CTCTTCCTACACATCGG 390 CTCTTCCTACACATCG 813
    GCGAGG GGCG
    15038 15060 ATCGGGCGAGGCCTATA 391 ATCGGGCGAGGCCTAT 814
    TTACGG ATTA
    15071 15093 TACTCAGAAACCTGAAA 392 TACTCAGAAACCTGAA 815
    CATCGG ACAT
    15113 15135 ACTATAGCAACAGCCTT 393 ACTATAGCAACAGCCT 816
    CATAGG TCAT
    15131 15153 ATAGGCTATGTCCTCCC 394 ATAGGCTATGTCCTCC 817
    GTGAGG CGTG
    15149 15171 TGAGGCCAAATATCATT 395 TGAGGCCAAATATCAT 818
    CTGAGG TCTG
    15150 15172 GAGGCCAAATATCATTC 396 GAGGCCAAATATCATT 819
    TGAGGG CTGA
    15151 15173 AGGCCAAATATCATTCT 397 AGGCCAAATATCATTC 820
    GAGGGG TGAG
    15194 15216 CTATCCGCCATCCCATA 398 CTATCCGCCATCCCAT 821
    CATTGG ACAT
    15195 15217 TATCCGCCATCCCATAC 399 TATCCGCCATCCCATA 822
    ATTGGG CATT
    15221 15243 GACCTAGTTCAATGAAT 400 GACCTAGTTCAATGAA 823
    CTGAGG TCTG
    15224 15246 CTAGTTCAATGAATCTG 401 CTAGTTCAATGAATCT 824
    AGGAGG GAGG
    15334 15356 CCTCCTATTCTTGCACG 402 CCTCCTATTCTTGCAC 825
    AAACGG GAAA
    15335 15357 CTCCTATTCTTGCACGA 403 CTCCTATTCTTGCACG 826
    AACGGG AAAC
    15353 15375 ACGGGATCAAACAACC 404 ACGGGATCAAACAAC 827
    CCCTAGG CCCCT
    15416 15438 TACACAATCAAAGACGC 405 TACACAATCAAAGACG 828
    CCTCGG CCCT
    15476 15498 CTATTCTCACCAGACCT 406 CTATTCTCACCAGACC 829
    CCTAGG TCCT
    15590 15612 CGATCCGTCCCTAACAA 407 CGATCCGTCCCTAACA 830
    ACTAGG AACT
    15593 15615 TCCGTCCCTAACAAACT 408 TCCGTCCCTAACAAAC 831
    AGGAGG TAGG
    15740 15762 CTCCTCATTCTAACCTG 409 CTCCTCATTCTAACCT 832
    AATCGG GAAT
    15743 15765 CTCATTCTAACCTGAAT 410 CTCATTCTAACCTGAA 833
    CGGAGG TCGG
    15776 15798 AGCTACCCTTTTACCAT 411 AGCTACCCTTTTACCA 834
    CATTGG TCAT
    15861 15883 TTGAAAACAAAATACTC 412 TTGAAAACAAAATACT 835
    AAATGG CAAA
    15862 15884 TGAAAACAAAATACTCA 413 TGAAAACAAAATACTC 836
    AATGGG AAAT
    15906 15928 AATACACCAGTCTTGTA 414 AATACACCAGTCTTGT 837
    AACCGG AAAC
    15928 15950 GAGATGAAAACCTTTTT 415 GAGATGAAAACCTTTT 838
    CCAAGG TCCA
    16012 16034 AACTATTCTCTGTTCTTT 416 AACTATTCTCTGTTCTT 839
    CATGG TCA
    16013 16035 ACTATTCTCTGTTCTTTC 417 ACTATTCTCTGTTCTTT 840
    ATGGG CAT
    16014 16036 CTATTCTCTGTTCTTTCA 418 CTATTCTCTGTTCTTTC 841
    TGGGG ATG
    16026 16048 CTTTCATGGGGAAGCAG 419 CTTTCATGGGGAAGCA 842
    ATTTGG GATT
    16027 16049 TTTCATGGGGAAGCAGA 420 TTTCATGGGGAAGCAG 843
    TTTGGG ATTT
    16108 16130 CAGCCACCATGAATATT 421 CAGCCACCATGAATAT 844
    GTACGG TGTA
    16252 16274 AAAGCCACCCCTCACCC 422 AAAGCCACCCCTCACC 845
    ACTAGG CACT
    16348 16370 CAAATCCCTTCTCGTCC 423 CAAATCCCTTCTCGTC 846
    CCATGG CCCA
    16367 16389 ATGGATGACCCCCCTCA 424 ATGGATGACCCCCCTC 847
    GATAGG AGAT
    16368 16390 TGGATGACCCCCCTCAG 425 TGGATGACCCCCCTCA 848
    ATAGGG GATA
    16369 16391 GGATGACCCCCCTCAGA 426 GGATGACCCCCCTCAG 849
    TAGGGG ATAG
    16434 16456 GAGTGCTACTCTCCTCG 427 GAGTGCTACTCTCCTC 850
    CTCCGG GCTC
    16435 16457 AGTGCTACTCTCCTCGC 428 AGTGCTACTCTCCTCG 851
    TCCGGG CTCC
    16449 16471 CGCTCCGGGCCCATAAC 429 CGCTCCGGGCCCATAA 852
    ACTTGG CACT
    16450 16472 GCTCCGGGCCCATAACA 430 GCTCCGGGCCCATAAC 853
    CTTGGG ACTT
    16451 16473 CTCCGGGCCCATAACAC 431 CTCCGGGCCCATAACA 854
    TTGGGG CTTG
    16452 16474 TCCGGGCCCATAACACT 432 TCCGGGCCCATAACAC 855
    TGGGGG TTGG
    16482 16504 AGTGAACTGTATCCGAC 433 AGTGAACTGTATCCGA 856
    ATCTGG CATC
    16495 16517 CGACATCTGGTTCCTAC 434 CGACATCTGGTTCCTA 857
    TTCAGG CTTC
    16496 16518 GACATCTGGTTCCTACT 435 GACATCTGGTTCCTAC 858
    TCAGGG TTCA
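The relationship between the columns of these tables can be sketched in Python. This is a minimal illustration, assuming the convention visible in the rows themselves: each listed (+) strand site is 23 nt long (end position = start position + 22); for a site ending in NGG the 20 nt gRNA target is the first 20 nt of the site, and for a site beginning with CCN the gRNA target is the reverse complement of the 20 nt that follow the CCN. The function names are hypothetical and are not taken from the application.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def plus_strand_target(site: str) -> str:
    """20 nt target for a 23 nt (+) strand site that ends in NGG."""
    if len(site) != 23 or site[21:] != "GG":
        raise ValueError("expected a 23 nt site ending in NGG")
    return site[:20]


def minus_strand_target(site: str) -> str:
    """20 nt target for a 23 nt (+) strand site that begins with CCN.

    The NGG lies on the (-) strand, so the target is the reverse
    complement of the 20 nt that follow the CCN on the (+) strand.
    """
    if len(site) != 23 or site[:2] != "CC":
        raise ValueError("expected a 23 nt site beginning with CCN")
    return reverse_complement(site[3:])


# Rows taken from the tables: positions 6117-6139 (NGG on the (+) strand)
# and positions 17-39 (NGG on the (-) strand).
print(plus_strand_target("CCCATCATAATCGGAGGCTTTGG"))
print(minus_strand_target("CCCTATTAACCACTCACGGGAGC"))
```

Recomputing the second sequence column from the first in this way is also a quick consistency check on individual rows of the listing.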
  • TABLE 4
    gRNA target sequence for human mtDNA carrying NGG sequence on the (−) strand.
    Columns, in order: Chr start position (+ strand); Chr end position (+ strand); nt sequence on the (+) strand containing CCN sequence followed by the 20 nt gRNA target sequence; SEQ ID NO:; reverse complementary sequence of gRNA target sequence (will encode the gRNA targeting sequence); SEQ ID NO:
    17 39 CCCTATTAACCACTCAC 859 GCTCCCGTGAGTGGTT 2628
    GGGAGC AATA
    18 40 CCTATTAACCACTCACG 860 AGCTCCCGTGAGTGGT 2629
    GGAGCT TAAT
    26 48 CCACTCACGGGAGCTCT 861 GCATGGAGAGCTCCCG 2630
    CCATGC TGAG
    43 65 CCATGCATTTGGTATTT 862 AGACGAAAATACCAA 2631
    TCGTCT ATGCA
    104 126 CCGGAGCACCCTATGTC 863 TACTGCGACATAGGGT 2632
    GCAGTA GCTC
    112 134 CCCTATGTCGCAGTATC 864 AAGACAGATACTGCG 2633
    TGTCTT ACATA
    113 135 CCTATGTCGCAGTATCT 865 AAAGACAGATACTGC 2634
    GTCTTT GACAT
    140 162 CCTGCCTCATCCTATTA 866 GATAAATAATAGGATG 2635
    TTTATC AGGC
    144 166 CCTCATCCTATTATTTAT 867 GTGCGATAAATAATAG 2636
    CGCAC GATG
    150 172 CCTATTATTTATCGCAC 868 ACGTAGGTGCGATAAA 2637
    CTACGT TAAT
    166 188 CCTACGTTCAATATTAC 869 TCGCCTGTAATATTGA 2638
    AGGCGA ACGT
    261 283 CCACTTTCCACACAGAC 870 TATGATGTCTGTGTGG 2639
    ATCATA AAAG
    268 290 CCACACAGACATCATAA 871 TTTTTGTTATGATGTCT 2640
    CAAAAA GTG
    298 320 CCAAACCCCCCCTCCCC 872 GAAGCGGGGGAGGGG 2641
    CGCTTC GGGTT
    304 326 CCCCCCTCCCCCGCTTC 873 TGGCCAGAAGCGGGG 2642
    TGGCCA GAGGG
    305 327 CCCCCTCCCCCGCTTCT 874 GTGGCCAGAAGCGGG 2643
    GGCCAC GGAGG
    306 328 CCCCTCCCCCGCTTCTG 875 TGTGGCCAGAAGCGG 2644
    GCCACA GGGAG
    307 329 CCCTCCCCCGCTTCTGG 876 CTGTGGCCAGAAGCGG 2645
    CCACAG GGGA
    308 330 CCTCCCCCGCTTCTGGC 877 GCTGTGGCCAGAAGCG 2646
    CACAGC GGGG
    311 333 CCCCCGCTTCTGGCCAC 878 AGTGCTGTGGCCAGAA 2647
    AGCACT GCGG
    312 334 CCCCGCTTCTGGCCACA 879 AAGTGCTGTGGCCAGA 2648
    GCACTT AGCG
    313 335 CCCGCTTCTGGCCACAG 880 TAAGTGCTGTGGCCAG 2649
    CACTTA AAGC
    314 336 CCGCTTCTGGCCACAGC 881 TTAAGTGCTGTGGCCA 2650
    ACTTAA GAAG
    324 346 CCACAGCACTTAAACAC 882 AGAGATGTGTTTAAGT 2651
    ATCTCT GCTG
    348 370 CCAAACCCCAAAAACA 883 GGTTCTTTGTTTTTGGG 2652
    AAGAACC GTT
    353 375 CCCCAAAAACAAAGAA 884 GTTAGGGTTCTTTGTTT 2653
    CCCTAAC TTG
    354 376 CCCAAAAACAAAGAAC 885 TGTTAGGGTTCTTTGTT 2654
    CCTAACA TTT
    355 377 CCAAAAACAAAGAACC 886 GTGTTAGGGTTCTTTG 2655
    CTAACAC TTTT
    369 391 CCCTAACACCAGCCTAA 887 ATCTGGTTAGGCTGGT 2656
    CCAGAT GTTA
    370 392 CCTAACACCAGCCTAAC 888 AATCTGGTTAGGCTGG 2657
    CAGATT TGTT
    377 399 CCAGCCTAACCAGATTT 889 AATTTGAAATCTGGTT 2658
    CAAATT AGGC
    381 403 CCTAACCAGATTTCAAA 890 ATAAAATTTGAAATCT 2659
    TTTTAT GGTT
    386 408 CCAGATTTCAAATTTTA 891 AAAAGATAAAATTTGA 2660
    TCTTTT AATC
    433 455 CCCCCCAACTAACACAT 892 AAAATAATGTGTTAGT 2661
    TATTTT TGGG
    434 456 CCCCCAACTAACACATT 893 GAAAATAATGTGTTAG 2662
    ATTTTC TTGG
    435 457 CCCCAACTAACACATTA 894 GGAAAATAATGTGTTA 2663
    TTTTCC GTTG
    436 458 CCCAACTAACACATTAT 895 GGGAAAATAATGTGTT 2664
    TTTCCC AGTT
    437 459 CCAACTAACACATTATT 896 GGGGAAAATAATGTGT 2665
    TTCCCC TAGT
    456 478 CCCCTCCCACTCCCATA 897 TAGTAGTATGGGAGTG 2666
    CTACTA GGAG
    457 479 CCCTCCCACTCCCATAC 898 TTAGTAGTATGGGAGT 2667
    TACTAA GGGA
    458 480 CCTCCCACTCCCATACT 899 ATTAGTAGTATGGGAG 2668
    ACTAAT TGGG
    461 483 CCCACTCCCATACTACT 900 GAGATTAGTAGTATGG 2669
    AATCTC GAGT
    462 484 CCACTCCCATACTACTA 901 TGAGATTAGTAGTATG 2670
    ATCTCA GGAG
    467 489 CCCATACTACTAATCTC 902 ATTGATGAGATTAGTA 2671
    ATCAAT GTAT
    468 490 CCATACTACTAATCTCA 903 TATTGATGAGATTAGT 2672
    TCAATA AGTA
    494 516 CCCCCGCCCATCCTACC 904 GTGCTGGGTAGGATGG 2673
    CAGCAC GCGG
    495 517 CCCCGCCCATCCTACCC 905 TGTGCTGGGTAGGATG 2674
    AGCACA GGCG
    496 518 CCCGCCCATCCTACCCA 906 GTGTGCTGGGTAGGAT 2675
    GCACAC GGGC
    497 519 CCGCCCATCCTACCCAG 907 TGTGTGCTGGGTAGGA 2676
    CACACA TGGG
    500 522 CCCATCCTACCCAGCAC 908 GTGTGTGTGCTGGGTA 2677
    ACACAC GGAT
    501 523 CCATCCTACCCAGCACA 909 TGTGTGTGTGCTGGGT 2678
    CACACA AGGA
    505 527 CCTACCCAGCACACACA 910 GCGGTGTGTGTGTGCT 2679
    CACCGC GGGT
    509 531 CCCAGCACACACACACC 911 AGCAGCGGTGTGTGTG 2680
    GCTGCT TGCT
    510 532 CCAGCACACACACACCG 912 TAGCAGCGGTGTGTGT 2681
    CTGCTA GTGC
    524 546 CCGCTGCTAACCCCATA 913 TCGGGGTATGGGGTTA 2682
    CCCCGA GCAG
    534 556 CCCCATACCCCGAACCA 914 TTTGGTTGGTTCGGGG 2683
    ACCAAA TATG
    535 557 CCCATACCCCGAACCAA 915 GTTTGGTTGGTTCGGG 2684
    CCAAAC GTAT
    536 558 CCATACCCCGAACCAAC 916 GGTTTGGTTGGTTCGG 2685
    CAAACC GGTA
    541 563 CCCCGAACCAACCAAAC 917 TTTGGGGTTTGGTTGG 2686
    CCCAAA TTCG
    542 564 CCCGAACCAACCAAACC 918 CTTTGGGGTTTGGTTG 2687
    CCAAAG GTTC
    543 565 CCGAACCAACCAAACCC 919 TCTTTGGGGTTTGGTT 2688
    CAAAGA GGTT
    548 570 CCAACCAAACCCCAAA 920 GGGTGTCTTTGGGGTT 2689
    GACACCC TGGT
    552 574 CCAAACCCCAAAGACA 921 TGGGGGGTGTCTTTGG 2690
    CCCCCCA GGTT
    557 579 CCCCAAAGACACCCCCC 922 AACTGTGGGGGGTGTC 2691
    ACAGTT TTTG
    558 580 CCCAAAGACACCCCCCA 923 AAACTGTGGGGGGTGT 2692
    CAGTTT CTTT
    559 581 CCAAAGACACCCCCCAC 924 TAAACTGTGGGGGGTG 2693
    AGTTTA TCTT
    568 590 CCCCCCACAGTTTATGT 925 TAAGCTACATAAACTG 2694
    AGCTTA TGGG
    569 591 CCCCCACAGTTTATGTA 926 GTAAGCTACATAAACT 2695
    GCTTAC GTGG
    570 592 CCCCACAGTTTATGTAG 927 GGTAAGCTACATAAAC 2696
    CTTACC TGTG
    571 593 CCCACAGTTTATGTAGC 928 AGGTAAGCTACATAAA 2697
    TTACCT CTGT
    572 594 CCACAGTTTATGTAGCT 929 GAGGTAAGCTACATAA 2698
    TACCTC ACTG
    591 613 CCTCCTCAAAGCAATAC 930 TTCAGTGTATTGCTTT 2699
    ACTGAA GAGG
    594 616 CCTCAAAGCAATACACT 931 ATTTTCAGTGTATTGC 2700
    GAAAAT TTTG
    637 659 CCCCATAAACAAATAGG 932 ACCAAACCTATTTGTT 2701
    TTTGGT TATG
    638 660 CCCATAAACAAATAGGT 933 GACCAAACCTATTTGT 2702
    TTGGTC TTAT
    639 661 CCATAAACAAATAGGTT 934 GGACCAAACCTATTTG 2703
    TGGTCC TTTA
    660 682 CCTAGCCTTTCTATTAG 935 TAAGAGCTAATAGAA 2704
    CTCTTA AGGCT
    665 687 CCTTTCTATTAGCTCTTA 936 CTTACTAAGAGCTAAT 2705
    GTAAG AGAA
    705 727 CCCCGTTCCAGTGAGTT 937 AGGGTGAACTCACTGG 2706
    CACCCT AACG
    706 728 CCCGTTCCAGTGAGTTC 938 GAGGGTGAACTCACTG 2707
    ACCCTC GAAC
    707 729 CCGTTCCAGTGAGTTCA 939 AGAGGGTGAACTCACT 2708
    CCCTCT GGAA
    712 734 CCAGTGAGTTCACCCTC 940 GATTTAGAGGGTGAAC 2709
    TAAATC TCAC
    724 746 CCCTCTAAATCACCACG 941 TTTGATCGTGGTGATT 2710
    ATCAAA TAGA
    725 747 CCTCTAAATCACCACGA 942 TTTTGATCGTGGTGAT 2711
    TCAAAA TTAG
    736 758 CCACGATCAAAAGGAA 943 ATGCTTGTTCCTTTTGA 2712
    CAAGCAT TCG
    792 814 CCTAGCCACACCCCCAC 944 TTTCCCGTGGGGGTGT 2713
    GGGAAA GGCT
    797 819 CCACACCCCCACGGGAA 945 TGCTGTTTCCCGTGGG 2714
    ACAGCA GGTG
    802 824 CCCCCACGGGAAACAG 946 ATCACTGCTGTTTCCC 2715
    CAGTGAT GTGG
    803 825 CCCCACGGGAAACAGC 947 AATCACTGCTGTTTCC 2716
    AGTGATT CGTG
    804 826 CCCACGGGAAACAGCA 948 TAATCACTGCTGTTTC 2717
    GTGATTA CCGT
    805 827 CCACGGGAAACAGCAG 949 TTAATCACTGCTGTTT 2718
    TGATTAA CCCG
    828 850 CCTTTAGCAATAAACGA 950 AAACTTTCGTTTATTG 2719
    AAGTTT CTAA
    867 889 CCCCAGGGTTGGTCAAT 951 CACGAAATTGACCAAC 2720
    TTCGTG CCTG
    868 890 CCCAGGGTTGGTCAATT 952 GCACGAAATTGACCAA 2721
    TCGTGC CCCT
    869 891 CCAGGGTTGGTCAATTT 953 GGCACGAAATTGACCA 2722
    CGTGCC ACCC
    890 912 CCAGCCACCGCGGTCAC 954 AATCGTGTGACCGCGG 2723
    ACGATT TGGC
    894 916 CCACCGCGGTCACACGA 955 GGTTAATCGTGTGACC 2724
    TTAACC GCGG
    897 919 CCGCGGTCACACGATTA 956 TTGGGTTAATCGTGTG 2725
    ACCCAA ACCG
    915 937 CCCAAGTCAATAGAAGC 957 ACGCCGGCTTCTATTG 2726
    CGGCGT ACTT
    916 938 CCAAGTCAATAGAAGCC 958 TACGCCGGCTTCTATT 2727
    GGCGTA GACT
    931 953 CCGGCGTAAAGAGTGTT 959 ATCTAAAACACTCTTT 2728
    TTAGAT ACGC
    956 978 CCCCCTCCCCAATAAAG 960 TTTTAGCTTTATTGGG 2729
    CTAAAA GAGG
    957 979 CCCCTCCCCAATAAAGC 961 GTTTTAGCTTTATTGG 2730
    TAAAAC GGAG
    958 980 CCCTCCCCAATAAAGCT 962 AGTTTTAGCTTTATTG 2731
    AAAACT GGGA
    959 981 CCTCCCCAATAAAGCTA 963 GAGTTTTAGCTTTATT 2732
    AAACTC GGGG
    962 984 CCCCAATAAAGCTAAAA 964 GGTGAGTTTTAGCTTT 2733
    CTCACC ATTG
    963 985 CCCAATAAAGCTAAAAC 965 AGGTGAGTTTTAGCTT 2734
    TCACCT TATT
    964 986 CCAATAAAGCTAAAACT 966 CAGGTGAGTTTTAGCT 2735
    CACCTG TTAT
    983 1005 CCTGAGTTGTAAAAAAC 967 ACTGGAGTTTTTTACA 2736
    TCCAGT ACTC
    1001 1023 CCAGTTGACACAAAATA 968 GTAGTCTATTTTGTGT 2737
    GACTAC CAAC
    1064 1086 CCCAAACTGGGATTAGA 969 GGGGTATCTAATCCCA 2738
    TACCCC GTTT
    1065 1087 CCAAACTGGGATTAGAT 970 TGGGGTATCTAATCCC 2739
    ACCCCA AGTT
    1083 1105 CCCCACTATGCTTAGCC 971 GTTTAGGGCTAAGCAT 2740
    CTAAAC AGTG
    1084 1106 CCCACTATGCTTAGCCC 972 GGTTTAGGGCTAAGCA 2741
    TAAACC TAGT
    1085 1107 CCACTATGCTTAGCCCT 973 AGGTTTAGGGCTAAGC 2742
    AAACCT ATAG
    1098 1120 CCCTAAACCTCAACAGT 974 GATTTAACTGTTGAGG 2743
    TAAATC TTTA
    1099 1121 CCTAAACCTCAACAGTT 975 TGATTTAACTGTTGAG 2744
    AAATCA GTTT
    1105 1127 CCTCAACAGTTAAATCA 976 TTTTGTTGATTTAACTG 2745
    ACAAAA TTG
    1135 1157 CCAGAACACTACGAGCC 977 AGCTGTGGCTCGTAGT 2746
    ACAGCT GTTC
    1150 1172 CCACAGCTTAAAACTCA 978 GTCCTTTGAGTTTTAA 2747
    AAGGAC GCTG
    1172 1194 CCTGGCGGTGCTTCATA 979 GAGGGATATGAAGCA 2748
    TCCCTC CCGCC
    1190 1212 CCCTCTAGAGGAGCCTG 980 ACAGAACAGGCTCCTC 2749
    TTCTGT TAGA
    1191 1213 CCTCTAGAGGAGCCTGT 981 TACAGAACAGGCTCCT 2750
    TCTGTA CTAG
    1203 1225 CCTGTTCTGTAATCGAT 982 GGGTTTATCGATTACA 2751
    AAACCC GAAC
    1223 1245 CCCCGATCAACCTCACC 983 AGAGGTGGTGAGGTTG 2752
    ACCTCT ATCG
    1224 1246 CCCGATCAACCTCACCA 984 AAGAGGTGGTGAGGTT 2753
    CCTCTT GATC
    1225 1247 CCGATCAACCTCACCAC 985 CAAGAGGTGGTGAGG 2754
    CTCTTG TTGAT
    1233 1255 CCTCACCACCTCTTGCT 986 AGGCTGAGCAAGAGG 2755
    CAGCCT TGGTG
    1238 1260 CCACCTCTTGCTCAGCC 987 TATATAGGCTGAGCAA 2756
    TATATA GAGG
    1241 1263 CCTCTTGCTCAGCCTAT 988 CGGTATATAGGCTGAG 2757
    ATACCG CAAG
    1253 1275 CCTATATACCGCCATCT 989 TGCTGAAGATGGCGGT 2758
    TCAGCA ATAT
    1261 1283 CCGCCATCTTCAGCAAA 990 TCAGGGTTTGCTGAAG 2759
    CCCTGA ATGG
    1264 1286 CCATCTTCAGCAAACCC 991 TCATCAGGGTTTGCTG 2760
    TGATGA AAGA
    1278 1300 CCCTGATGAAGGCTACA 992 TTACTTTGTAGCCTTC 2761
    AAGTAA ATCA
    1279 1301 CCTGATGAAGGCTACAA 993 CTTACTTTGTAGCCTTC 2762
    AGTAAG ATC
    1310 1332 CCCACGTAAAGACGTTA 994 TTGACCTAACGTCTTT 2763
    GGTCAA ACGT
    1311 1333 CCACGTAAAGACGTTAG 995 CTTGACCTAACGTCTT 2764
    GTCAAG TACG
    1340 1362 CCCATGAGGTGGCAAG 996 CCCATTTCTTGCCACC 2765
    AAATGGG TCAT
    1341 1363 CCATGAGGTGGCAAGA 997 GCCCATTTCTTGCCAC 2766
    AATGGGC CTCA
    1375 1397 CCCCAGAAAACTACGAT 998 AGGGCTATCGTAGTTT 2767
    AGCCCT TCTG
    1376 1398 CCCAGAAAACTACGATA 999 AAGGGCTATCGTAGTT 2768
    GCCCTT TTCT
    1377 1399 CCAGAAAACTACGATA 1000 TAAGGGCTATCGTAGT 2769
    GCCCTTA TTTC
    1394 1416 CCCTTATGAAACTTAAG 1001 TCGACCCTTAAGTTTC 2770
    GGTCGA ATAA
    1395 1417 CCTTATGAAACTTAAGG 1002 TTCGACCCTTAAGTTT 2771
    GTCGAA CATA
    1465 1487 CCCTGAAGCGCGTACAC 1003 GGCGGTGTGTACGCGC 2772
    ACCGCC TTCA
    1466 1488 CCTGAAGCGCGTACACA 1004 GGGCGGTGTGTACGCG 2773
    CCGCCC CTTC
    1483 1505 CCGCCCGTCACCCTCCT 1005 TACTTGAGGAGGGTGA 2774
    CAAGTA CGGG
    1486 1508 CCCGTCACCCTCCTCAA 1006 GTATACTTGAGGAGGG 2775
    GTATAC TGAC
    1487 1509 CCGTCACCCTCCTCAAG 1007 AGTATACTTGAGGAGG 2776
    TATACT GTGA
    1493 1515 CCCTCCTCAAGTATACT 1008 CTTTGAAGTATACTTG 2777
    TCAAAG AGGA
    1494 1516 CCTCCTCAAGTATACTT 1009 CCTTTGAAGTATACTT 2778
    CAAAGG GAGG
    1497 1519 CCTCAAGTATACTTCAA 1010 TGTCCTTTGAAGTATA 2779
    AGGACA CTTG
    1531 1553 CCCCTACGCATTTATAT 1011 TCCTCTATATAAATGC 2780
    AGAGGA GTAG
    1532 1554 CCCTACGCATTTATATA 1012 CTCCTCTATATAAATG 2781
    GAGGAG CGTA
    1533 1555 CCTACGCATTTATATAG 1013 TCTCCTCTATATAAAT 2782
    AGGAGA GCGT
    1601 1623 CCAGAGTGTAGCTTAAC 1014 CTTTGTGTTAAGCTAC 2783
    ACAAAG ACTC
    1626 1648 CCCAACTTACACTTAGG 1015 AAATCTCCTAAGTGTA 2784
    AGATTT AGTT
    1627 1649 CCAACTTACACTTAGGA 1016 GAAATCTCCTAAGTGT 2785
    GATTTC AAGT
    1662 1684 CCGCTCTGAGCTAAACC 1017 GGGCTAGGTTTAGCTC 2786
    TAGCCC AGAG
    1677 1699 CCTAGCCCCAAACCCAC 1018 GGTGGAGTGGGTTTGG 2787
    TCCACC GGCT
    1682 1704 CCCCAAACCCACTCCAC 1019 AGTAAGGTGGAGTGG 2788
    CTTACT GTTTG
    1683 1705 CCCAAACCCACTCCACC 1020 TAGTAAGGTGGAGTGG 2789
    TTACTA GTTT
    1684 1706 CCAAACCCACTCCACCT 1021 GTAGTAAGGTGGAGTG 2790
    TACTAC GGTT
    1689 1711 CCCACTCCACCTTACTA 1022 GTCTGGTAGTAAGGTG 2791
    CCAGAC GAGT
    1690 1712 CCACTCCACCTTACTAC 1023 TGTCTGGTAGTAAGGT 2792
    CAGACA GGAG
    1695 1717 CCACCTTACTACCAGAC 1024 AAGGTTGTCTGGTAGT 2793
    AACCTT AAGG
    1698 1720 CCTTACTACCAGACAAC 1025 GCTAAGGTTGTCTGGT 2794
    CTTAGC AGTA
    1706 1728 CCAGACAACCTTAGCCA 1026 ATGGTTTGGCTAAGGT 2795
    AACCAT TGTC
    1714 1736 CCTTAGCCAAACCATTT 1027 TTGGGTAAATGGTTTG 2796
    ACCCAA GCTA
    1720 1742 CCAAACCATTTACCCAA 1028 CTTTATTTGGGTAAAT 2797
    ATAAAG GGTT
    1725 1747 CCATTTACCCAAATAAA 1029 CTATACTTTATTTGGG 2798
    GTATAG TAAA
    1732 1754 CCCAAATAAAGTATAGG 1030 CTATCGCCTATACTTT 2799
    CGATAG ATTT
    1733 1755 CCAAATAAAGTATAGGC 1031 TCTATCGCCTATACTTT 2800
    GATAGA ATT
    1764 1786 CCTGGCGCAATAGATAT 1032 GGTACTATATCTATTG 2801
    AGTACC CGCC
    1785 1807 CCGCAAGGGAAAGATG 1033 AATTTTTCATCTTTCCC 2802
    AAAAATT TTG
    1812 1834 CCAAGCATAATATAGCA 1034 AGTCCTTGCTATATTA 2803
    AGGACT TGCT
    1837 1859 CCCCTATACCTTCTGCA 1035 TCATTATGCAGAAGGT 2804
    TAATGA ATAG
    1838 1860 CCCTATACCTTCTGCAT 1036 TTCATTATGCAGAAGG 2805
    AATGAA TATA
    1839 1861 CCTATACCTTCTGCATA 1037 ATTCATTATGCAGAAG 2806
    ATGAAT GTAT
    1845 1867 CCTTCTGCATAATGAAT 1038 TAGTTAATTCATTATG 2807
    TAACTA CAGA
    1889 1911 CCAAAGCTAAGACCCCC 1039 GGTTTCGGGGGTCTTA 2808
    GAAACC GCTT
    1901 1923 CCCCCGAAACCAGACG 1040 GGTAGCTCGTCTGGTT 2809
    AGCTACC TCGG
    1902 1924 CCCCGAAACCAGACGA 1041 AGGTAGCTCGTCTGGT 2810
    GCTACCT TTCG
    1903 1925 CCCGAAACCAGACGAG 1042 TAGGTAGCTCGTCTGG 2811
    CTACCTA TTTC
    1904 1926 CCGAAACCAGACGAGC 1043 TTAGGTAGCTCGTCTG 2812
    TACCTAA GTTT
    1910 1932 CCAGACGAGCTACCTAA 1044 CTGTTCTTAGGTAGCT 2813
    GAACAG CGTC
    1922 1944 CCTAAGAACAGCTAAA 1045 GTGCTCTTTTAGCTGTT 2814
    AGAGCAC CTT
    1946 1968 CCCGTCTATGTAGCAAA 1046 CACTATTTTGCTACAT 2815
    ATAGTG AGAC
    1947 1969 CCGTCTATGTAGCAAAA 1047 CCACTATTTTGCTACA 2816
    TAGTGG TAGA
    1996 2018 CCTACCGAGCCTGGTGA 1048 CAGCTATCACCAGGCT 2817
    TAGCTG CGGT
    2000 2022 CCGAGCCTGGTGATAGC 1049 CAACCAGCTATCACCA 2818
    TGGTTG GGCT
    2005 2027 CCTGGTGATAGCTGGTT 1050 TTGGACAACCAGCTAT 2819
    GTCCAA CACC
    2024 2046 CCAAGATAGAATCTTAG 1051 GTTGAACTAAGATTCT 2820
    TTCAAC ATCT
    2057 2079 CCCACAGAACCCTCTAA 1052 GGGGATTTAGAGGGTT 2821
    ATCCCC CTGT
    2058 2080 CCACAGAACCCTCTAAA 1053 AGGGGATTTAGAGGGT 2822
    TCCCCT TCTG
    2066 2088 CCCTCTAAATCCCCTTG 1054 AATTTACAAGGGGATT 2823
    TAAATT TAGA
    2067 2089 CCTCTAAATCCCCTTGT 1055 AAATTTACAAGGGGAT 2824
    AAATTT TTAG
    2076 2098 CCCCTTGTAAATTTAAC 1056 CTAACAGTTAAATTTA 2825
    TGTTAG CAAG
    2077 2099 CCCTTGTAAATTTAACT 1057 ACTAACAGTTAAATTT 2826
    GTTAGT ACAA
    2078 2100 CCTTGTAAATTTAACTG 1058 GACTAACAGTTAAATT 2827
    TTAGTC TACA
    2100 2122 CCAAAGAGGAACAGCT 1059 TCCAAAGAGCTGTTCC 2828
    CTTTGGA TCTT
    2136 2158 CCTTGTAGAGAGAGTAA 1060 AATTTTTTACTCTCTCT 2829
    AAAATT ACA
    2164 2186 CCCATAGTAGGCCTAAA 1061 GCTGCTTTTAGGCCTA 2830
    AGCAGC CTAT
    2165 2187 CCATAGTAGGCCTAAAA 1062 GGCTGCTTTTAGGCCT 2831
    GCAGCC ACTA
    2175 2197 CCTAAAAGCAGCCACCA 1063 CTTAATTGGTGGCTGC 2832
    ATTAAG TTTT
    2186 2208 CCACCAATTAAGAAAGC 1064 TTGAACGCTTTCTTAA 2833
    GTTCAA TTGG
    2189 2211 CCAATTAAGAAAGCGTT 1065 AGCTTGAACGCTTTCT 2834
    CAAGCT TAAT
    2217 2239 CCCACTACCTAAAAAAT 1066 TTTGGGATTTTTTAGG 2835
    CCCAAA TAGT
    2218 2240 CCACTACCTAAAAAATC 1067 GTTTGGGATTTTTTAG 2836
    CCAAAC GTAG
    2224 2246 CCTAAAAAATCCCAAAC 1068 TTATATGTTTGGGATT 2837
    ATATAA TTTT
    2234 2256 CCCAAACATATAACTGA 1069 AGGAGTTCAGTTATAT 2838
    ACTCCT GTTT
    2235 2257 CCAAACATATAACTGAA 1070 GAGGAGTTCAGTTATA 2839
    CTCCTC TGTT
    2254 2276 CCTCACACCCAATTGGA 1071 GATTGGTCCAATTGGG 2840
    CCAATC TGTG
    2261 2283 CCCAATTGGACCAATCT 1072 GGTGATAGATTGGTCC 2841
    ATCACC AATT
    2262 2284 CCAATTGGACCAATCTA 1073 GGGTGATAGATTGGTC 2842
    TCACCC CAAT
    2271 2293 CCAATCTATCACCCTAT 1074 TCTTCTATAGGGTGAT 2843
    AGAAGA AGAT
    2282 2304 CCCTATAGAAGAACTAA 1075 CTAACATTAGTTCTTC 2844
    TGTTAG TATA
    2283 2305 CCTATAGAAGAACTAAT 1076 ACTAACATTAGTTCTT 2845
    GTTAGT CTAT
    2328 2350 CCTCCGCATAAGCCTGC 1077 TCTGACGCAGGCTTAT 2846
    GTCAGA GCGG
    2331 2353 CCGCATAAGCCTGCGTC 1078 TAATCTGACGCAGGCT 2847
    AGATTA TATG
    2340 2362 CCTGCGTCAGATTAAAA 1079 TCAGTGTTTTAATCTG 2848
    CACTGA ACGC
    2378 2400 CCCAATATCTACAATCA 1080 GTTGGTTGATTGTAGA 2849
    ACCAAC TATT
    2379 2401 CCAATATCTACAATCAA 1081 TGTTGGTTGATTGTAG 2850
    CCAACA ATAT
    2396 2418 CCAACAAGTCATTATTA 1082 TGAGGGTAATAATGAC 2851
    CCCTCA TTGT
    2413 2435 CCCTCACTGTCAACCCA 1083 CTGTGTTGGGTTGACA 2852
    ACACAG GTGA
    2414 2436 CCTCACTGTCAACCCAA 1084 CCTGTGTTGGGTTGAC 2853
    CACAGG AGTG
    2426 2448 CCCAACACAGGCATGCT 1085 CTTATGAGCATGCCTG 2854
    CATAAG TGTT
    2427 2449 CCAACACAGGCATGCTC 1086 CCTTATGAGCATGCCT 2855
    ATAAGG GTGT
    2488 2510 CCCCGCCTGTTTACCAA 1087 ATGTTTTTGGTAAACA 2856
    AAACAT GGCG
    2489 2511 CCCGCCTGTTTACCAAA 1088 GATGTTTTTGGTAAAC 2857
    AACATC AGGC
    2490 2512 CCGCCTGTTTACCAAAA 1089 TGATGTTTTTGGTAAA 2858
    ACATCA CAGG
    2493 2515 CCTGTTTACCAAAAACA 1090 AGGTGATGTTTTTGGT 2859
    TCACCT AAAC
    2501 2523 CCAAAAACATCACCTCT 1091 GATGCTAGAGGTGATG 2860
    AGCATC TTTT
    2513 2535 CCTCTAGCATCACCAGT 1092 TCTAATACTGGTGATG 2861
    ATTAGA CTAG
    2525 2547 CCAGTATTAGAGGCACC 1093 GCAGGCGGTGCCTCTA 2862
    GCCTGC ATAC
    2540 2562 CCGCCTGCCCAGTGACA 1094 AACATGTGTCACTGGG 2863
    CATGTT CAGG
    2543 2565 CCTGCCCAGTGACACAT 1095 TTAAACATGTGTCACT 2864
    GTTTAA GGGC
    2547 2569 CCCAGTGACACATGTTT 1096 GCCGTTAAACATGTGT 2865
    AACGGC CACT
    2548 2570 CCAGTGACACATGTTTA 1097 GGCCGTTAAACATGTG 2866
    ACGGCC TCAC
    2569 2591 CCGCGGTACCCTAACCG 1098 TTTGCACGGTTAGGGT 2867
    TGCAAA ACCG
    2577 2599 CCCTAACCGTGCAAAGG 1099 ATGCTACCTTTGCACG 2868
    TAGCAT GTTA
    2578 2600 CCTAACCGTGCAAAGGT 1100 TATGCTACCTTTGCAC 2869
    AGCATA GGTT
    2583 2605 CCGTGCAAAGGTAGCAT 1101 GTGATTATGCTACCTT 2870
    AATCAC TGCA
    2611 2633 CCTTAAATAGGGACCTG 1102 TTCATACAGGTCCCTA 2871
    TATGAA TTTA
    2624 2646 CCTGTATGAATGGCTCC 1103 CCTCGTGGAGCCATTC 2872
    ACGAGG ATAC
    2639 2661 CCACGAGGGTTCAGCTG 1104 AAGAGACAGCTGAAC 2873
    TCTCTT CCTCG
    2670 2692 CCAGTGAAATTGACCTG 1105 CACGGGCAGGTCAATT 2874
    CCCGTG TCAC
    2683 2705 CCTGCCCGTGAAGAGGC 1106 ATGCCCGCCTCTTCAC 2875
    GGGCAT GGGC
    2687 2709 CCCGTGAAGAGGCGGG 1107 TGTTATGCCCGCCTCT 2876
    CATAACA TCAC
    2688 2710 CCGTGAAGAGGCGGGC 1108 GTGTTATGCCCGCCTC 2877
    ATAACAC TTCA
    2726 2748 CCCTATGGAGCTTTAAT 1109 TAATAAATTAAAGCTC 2878
    TTATTA CATA
    2727 2749 CCTATGGAGCTTTAATT 1110 TTAATAAATTAAAGCT 2879
    TATTAA CCAT
    2761 2783 CCTAACAAACCCACAGG 1111 TTAGGACCTGTGGGTT 2880
    TCCTAA TGTT
    2770 2792 CCCACAGGTCCTAAACT 1112 TTTGGTAGTTTAGGAC 2881
    ACCAAA CTGT
    2771 2793 CCACAGGTCCTAAACTA 1113 GTTTGGTAGTTTAGGA 2882
    CCAAAC CCTG
    2779 2801 CCTAAACTACCAAACCT 1114 TAATGCAGGTTTGGTA 2883
    GCATTA GTTT
    2788 2810 CCAAACCTGCATTAAAA 1115 CGAAATTTTTAATGCA 2884
    ATTTCG GGTT
    2793 2815 CCTGCATTAAAAATTTC 1116 CCAACCGAAATTTTTA 2885
    GGTTGG ATGC
    2821 2843 CCTCGGAGCAGAACCCA 1117 GGAGGTTGGGTTCTGC 2886
    ACCTCC TCCG
    2834 2856 CCCAACCTCCGAGCAGT 1118 GCATGTACTGCTCGGA 2887
    ACATGC GGTT
    2835 2857 CCAACCTCCGAGCAGTA 1119 AGCATGTACTGCTCGG 2888
    CATGCT AGGT
    2839 2861 CCTCCGAGCAGTACATG 1120 TCTTAGCATGTACTGC 2889
    CTAAGA TCGG
    2842 2864 CCGAGCAGTACATGCTA 1121 AAGTCTTAGCATGTAC 2890
    AGACTT TGCT
    2867 2889 CCAGTCAAAGCGAACTA 1122 GTATAGTAGTTCGCTT 2891
    CTATAC TGAC
    2899 2921 CCAATAACTTGACCAAC 1123 TGTTCCGTTGGTCAAG 2892
    GGAACA TTAT
    2911 2933 CCAACGGAACAAGTTAC 1124 CCTAGGGTAACTTGTT 2893
    CCTAGG CCGT
    2927 2949 CCCTAGGGATAACAGCG 1125 GGATTGCGCTGTTATC 2894
    CAATCC CCTA
    2928 2950 CCTAGGGATAACAGCGC 1126 AGGATTGCGCTGTTAT 2895
    AATCCT CCCT
    2948 2970 CCTATTCTAGAGTCCAT 1127 GTTGATATGGACTCTA 2896
    ATCAAC GAAT
    2961 2983 CCATATCAACAATAGGG 1128 CGTAAACCCTATTGTT 2897
    TTTACG GATA
    2985 3007 CCTCGATGTTGGATCAG 1129 GATGTCCTGATCCAAC 2898
    GACATC ATCG
    3007 3029 CCCGATGGTGCAGCCGC 1130 TTAATAGCGGCTGCAC 2899
    TATTAA CATC
    3008 3030 CCGATGGTGCAGCCGCT 1131 TTTAATAGCGGCTGCA 2900
    ATTAAA CCAT
    3020 3042 CCGCTATTAAAGGTTCG 1132 AACAAACGAACCTTTA 2901
    TTTGTT ATAG
    3056 3078 CCTACGTGATCTGAGTT 1133 GGTCTGAACTCAGATC 2902
    CAGACC ACGT
    3077 3099 CCGGAGTAATCCAGGTC 1134 GAAACCGACCTGGATT 2903
    GGTTTC ACTC
    3087 3109 CCAGGTCGGTTTCTATC 1135 AANGTAGATAGAAAC 2904
    TACNTT CGACC
    3116 3138 CCTCCCTGTACGAAAGG 1136 TCTTGTCCTTTCGTACA 2905
    ACAAGA GGG
    3119 3141 CCCTGTACGAAAGGACA 1137 TTCTCTTGTCCTTTCGT 2906
    AGAGAA ACA
    3120 3142 CCTGTACGAAAGGACA 1138 TTTCTCTTGTCCTTTCG 2907
    AGAGAAA TAC
    3148 3170 CCTACTTCACAAAGCGC 1139 GGGAAGGCGCTTTGTG 2908
    CTTCCC AAGT
    3164 3186 CCTTCCCCCGTAAATGA 1140 ATGATATCATTTACGG 2909
    TATCAT GGGA
    3168 3190 CCCCCGTAAATGATATC 1141 TGAGATGATATCATTT 2910
    ATCTCA ACGG
    3169 3191 CCCCGTAAATGATATCA 1142 TTGAGATGATATCATT 2911
    TCTCAA TACG
    3170 3192 CCCGTAAATGATATCAT 1143 GTTGAGATGATATCAT 2912
    CTCAAC TTAC
    3171 3193 CCGTAAATGATATCATC 1144 AGTTGAGATGATATCA 2913
    TCAACT TTTA
    3204 3226 CCCACACCCACCCAAGA 1145 CCCTGTTCTTGGGTGG 2914
    ACAGGG GTGT
    3205 3227 CCACACCCACCCAAGAA 1146 ACCCTGTTCTTGGGTG 2915
    CAGGGT GGTG
    3210 3232 CCCACCCAAGAACAGG 1147 AACAAACCCTGTTCTT 2916
    GTTTGTT GGGT
    3211 3233 CCACCCAAGAACAGGG 1148 TAACAAACCCTGTTCT 2917
    TTTGTTA TGGG
    3214 3236 CCCAAGAACAGGGTTTG 1149 TCTTAACAAACCCTGT 2918
    TTAAGA TCTT
    3215 3237 CCAAGAACAGGGTTTGT 1150 ATCTTAACAAACCCTG 2919
    TAAGAT TTCT
    3245 3267 CCCGGTAATCGCATAAA 1151 TTAAGTTTTATGCGAT 2920
    ACTTAA TACC
    3246 3268 CCGGTAATCGCATAAAA 1152 TTTAAGTTTTATGCGA 2921
    CTTAAA TTAC
    3292 3314 CCTCTTCTTAACAACAT 1153 ATGGGTATGTTGTTAA 2922
    ACCCAT GAAG
    3310 3332 CCCATGGCCAACCTCCT 1154 AGGAGTAGGAGGTTG 2923
    ACTCCT GCCAT
    3311 3333 CCATGGCCAACCTCCTA 1155 GAGGAGTAGGAGGTT 2924
    CTCCTC GGCCA
    3317 3339 CCAACCTCCTACTCCTC 1156 TACAATGAGGAGTAG 2925
    ATTGTA GAGGT
    3321 3343 CCTCCTACTCCTCATTGT 1157 TGGGTACAATGAGGA 2926
    ACCCA GTAGG
    3324 3346 CCTACTCCTCATTGTAC 1158 GAATGGGTACAATGA 2927
    CCATTC GGAGT
    3330 3352 CCTCATTGTACCCATTC 1159 CGATTAGAATGGGTAC 2928
    TAATCG AATG
    3340 3362 CCCATTCTAATCGCAAT 1160 AATGCCATTGCGATTA 2929
    GGCATT GAAT
    3341 3363 CCATTCTAATCGCAATG 1161 GAATGCCATTGCGATT 2930
    GCATTC AGAA
    3363 3385 CCTAATGCTTACCGAAC 1162 TTTTTCGTTCGGTAAG 2931
    GAAAAA CATT
    3374 3396 CCGAACGAAAAATTCTA 1163 ATAGCCTAGAATTTTT 2932
    GGCTAT CGTT
    3414 3436 CCCCAACGTTGTAGGCC 1164 CGTAGGGGCCTACAAC 2933
    CCTACG GTTG
    3415 3437 CCCAACGTTGTAGGCCC 1165 CCGTAGGGGCCTACAA 2934
    CTACGG CGTT
    3416 3438 CCAACGTTGTAGGCCCC 1166 CCCGTAGGGGCCTACA 2935
    TACGGG ACGT
    3429 3451 CCCCTACGGGCTACTAC 1167 AGGGTTGTAGTAGCCC 2936
    AACCCT GTAG
    3430 3452 CCCTACGGGCTACTACA 1168 AAGGGTTGTAGTAGCC 2937
    ACCCTT CGTA
    3431 3453 CCTACGGGCTACTACAA 1169 GAAGGGTTGTAGTAGC 2938
    CCCTTC CCGT
    3448 3470 CCCTTCGCTGACGCCAT 1170 AGTTTTATGGCGTCAG 2939
    AAAACT CGAA
    3449 3471 CCTTCGCTGACGCCATA 1171 GAGTTTTATGGCGTCA 2940
    AAACTC GCGA
    3461 3483 CCATAAAACTCTTCACC 1172 CTCTTTGGTGAAGAGT 2941
    AAAGAG TTTA
    3476 3498 CCAAAGAGCCCCTAAA 1173 GGCGGGTTTTAGGGGC 2942
    ACCCGCC TCTT
    3484 3506 CCCCTAAAACCCGCCAC 1174 GTAGATGTGGCGGGTT 2943
    ATCTAC TTAG
    3485 3507 CCCTAAAACCCGCCACA 1175 GGTAGATGTGGCGGGT 2944
    TCTACC TTTA
    3486 3508 CCTAAAACCCGCCACAT 1176 TGGTAGATGTGGCGGG 2945
    CTACCA TTTT
    3493 3515 CCCGCCACATCTACCAT 1177 AGGGTGATGGTAGATG 2946
    CACCCT TGGC
    3494 3516 CCGCCACATCTACCATC 1178 GAGGGTGATGGTAGAT 2947
    ACCCTC GTGG
    3497 3519 CCACATCTACCATCACC 1179 GTAGAGGGTGATGGTA 2948
    CTCTAC GATG
    3506 3528 CCATCACCCTCTACATC 1180 GGCGGTGATGTAGAG 2949
    ACCGCC GGTGA
    3512 3534 CCCTCTACATCACCGCC 1181 GGTCGGGGCGGTGATG 2950
    CCGACC TAGA
    3513 3535 CCTCTACATCACCGCCC 1182 AGGTCGGGGCGGTGAT 2951
    CGACCT GTAG
    3524 3546 CCGCCCCGACCTTAGCT 1183 GGTGAGAGCTAAGGTC 2952
    CTCACC GGGG
    3527 3549 CCCCGACCTTAGCTCTC 1184 GATGGTGAGAGCTAA 2953
    ACCATC GGTCG
    3528 3550 CCCGACCTTAGCTCTCA 1185 CGATGGTGAGAGCTAA 2954
    CCATCG GGTC
    3529 3551 CCGACCTTAGCTCTCAC 1186 GCGATGGTGAGAGCTA 2955
    CATCGC AGGT
    3533 3555 CCTTAGCTCTCACCATC 1187 AAGAGCGATGGTGAG 2956
    GCTCTT AGCTA
    3545 3567 CCATCGCTCTTCTACTA 1188 GGTTCATAGTAGAAGA 2957
    TGAACC GCGA
    3566 3588 CCCCCCTCCCCATACCC 1189 GGGGTTGGGTATGGGG 2958
    AACCCC AGGG
    3567 3589 CCCCCTCCCCATACCCA 1190 GGGGGTTGGGTATGGG 2959
    ACCCCC GAGG
    3568 3590 CCCCTCCCCATACCCAA 1191 AGGGGGTTGGGTATGG 2960
    CCCCCT GGAG
    3569 3591 CCCTCCCCATACCCAAC 1192 CAGGGGGTTGGGTATG 2961
    CCCCTG GGGA
    3570 3592 CCTCCCCATACCCAACC 1193 CCAGGGGGTTGGGTAT 2962
    CCCTGG GGGG
    3573 3595 CCCCATACCCAACCCCC 1194 TGACCAGGGGGTTGGG 2963
    TGGTCA TATG
    3574 3596 CCCATACCCAACCCCCT 1195 TTGACCAGGGGGTTGG 2964
    GGTCAA GTAT
    3575 3597 CCATACCCAACCCCCTG 1196 GTTGACCAGGGGGTTG 2965
    GTCAAC GGTA
    3580 3602 CCCAACCCCCTGGTCAA 1197 TTGAGGTTGACCAGGG 2966
    CCTCAA GGTT
    3581 3603 CCAACCCCCTGGTCAAC 1198 GTTGAGGTTGACCAGG 2967
    CTCAAC GGGT
    3585 3607 CCCCCTGGTCAACCTCA 1199 CTAGGTTGAGGTTGAC 2968
    ACCTAG CAGG
    3586 3608 CCCCTGGTCAACCTCAA 1200 CCTAGGTTGAGGTTGA 2969
    CCTAGG CCAG
    3587 3609 CCCTGGTCAACCTCAAC 1201 GCCTAGGTTGAGGTTG 2970
    CTAGGC ACCA
    3588 3610 CCTGGTCAACCTCAACC 1202 GGCCTAGGTTGAGGTT 2971
    TAGGCC GACC
    3597 3619 CCTCAACCTAGGCCTCC 1203 TAAATAGGAGGCCTAG 2972
    TATTTA GTTG
    3603 3625 CCTAGGCCTCCTATTTA 1204 CTAGAATAAATAGGA 2973
    TTCTAG GGCCT
    3609 3631 CCTCCTATTTATTCTAGC 1205 AGGTGGCTAGAATAA 2974
    CACCT ATAGG
    3612 3634 CCTATTTATTCTAGCCA 1206 TAGAGGTGGCTAGAAT 2975
    CCTCTA AAAT
    3626 3648 CCACCTCTAGCCTAGCC 1207 GTAAACGGCTAGGCTA 2976
    GTTTAC GAGG
    3629 3651 CCTCTAGCCTAGCCGTT 1208 TGAGTAAACGGCTAGG 2977
    TACTCA CTAG
    3636 3658 CCTAGCCGTTTACTCAA 1209 AGAGGATTGAGTAAA 2978
    TCCTCT CGGCT
    3641 3663 CCGTTTACTCAATCCTC 1210 TGATCAGAGGATTGAG 2979
    TGATCA TAAA
    3654 3676 CCTCTGATCAGGGTGAG 1211 TTGATGCTCACCCTGA 2980
    CATCAA TCAG
    3689 3711 CCCTGATCGGCGCACTG 1212 TGCTCGCAGTGCGCCG 2981
    CGAGCA ATCA
    3690 3712 CCTGATCGGCGCACTGC 1213 CTGCTCGCAGTGCGCC 2982
    GAGCAG GATC
    3716 3738 CCCAAACAATCTCATAT 1214 GACTTCATATGAGATT 2983
    GAAGTC GTTT
    3717 3739 CCAAACAATCTCATATG 1215 TGACTTCATATGAGAT 2984
    AAGTCA TGTT
    3740 3762 CCCTAGCCATCATTCTA 1216 TGATAGTAGAATGATG 2985
    CTATCA GCTA
    3741 3763 CCTAGCCATCATTCTAC 1217 TTGATAGTAGAATGAT 2986
    TATCAA GGCT
    3746 3768 CCATCATTCTACTATCA 1218 TAATGTTGATAGTAGA 2987
    ACATTA ATGA
    3782 3804 CCTTTAACCTCTCCACC 1219 GATAAGGGTGGAGAG 2988
    CTTATC GTTAA
    3789 3811 CCTCTCCACCCTTATCA 1220 GTGTTGTGATAAGGGT 2989
    CAACAC GGAG
    3794 3816 CCACCCTTATCACAACA 1221 TTCTTGTGTTGTGATA 2990
    CAAGAA AGGG
    3797 3819 CCCTTATCACAACACAA 1222 GTGTTCTTGTGTTGTG 2991
    GAACAC ATAA
    3798 3820 CCTTATCACAACACAAG 1223 GGTGTTCTTGTGTTGT 2992
    AACACC GATA
    3819 3841 CCTCTGATTACTCCTGC 1224 ATGATGGCAGGAGTA 2993
    CATCAT ATCAG
    3831 3853 CCTGCCATCATGACCCT 1225 TGGCCAAGGGTCATGA 2994
    TGGCCA TGGC
    3835 3857 CCATCATGACCCTTGGC 1226 ATTATGGCCAAGGGTC 2995
    CATAAT ATGA
    3844 3866 CCCTTGGCCATAATATG 1227 ATAAATCATATTATGG 2996
    ATTTAT CCAA
    3845 3867 CCTTGGCCATAATATGA 1228 GATAAATCATATTATG 2997
    TTTATC GCCA
    3851 3873 CCATAATATGATTTATC 1229 TGTGGAGATAAATCAT 2998
    TCCACA ATTA
    3869 3891 CCACACTAGCAGAGACC 1230 TCGGTTGGTCTCTGCT 2999
    AACCGA AGTG
    3884 3906 CCAACCGAACCCCCTTC 1231 AAGGTCGAAGGGGGT 3000
    GACCTT TCGGT
    3888 3910 CCGAACCCCCTTCGACC 1232 CGGCAAGGTCGAAGG 3001
    TTGCCG GGGTT
    3893 3915 CCCCCTTCGACCTTGCC 1233 CCCTTCGGCAAGGTCG 3002
    GAAGGG AAGG
    3894 3916 CCCCTTCGACCTTGCCG 1234 CCCCTTCGGCAAGGTC 3003
    AAGGGG GAAG
    3895 3917 CCCTTCGACCTTGCCGA 1235 TCCCCTTCGGCAAGGT 3004
    AGGGGA CGAA
    3896 3918 CCTTCGACCTTGCCGAA 1236 CTCCCCTTCGGCAAGG 3005
    GGGGAG TCGA
    3903 3925 CCTTGCCGAAGGGGAGT 1237 GTTCGGACTCCCCTTC 3006
    CCGAAC GGCA
    3908 3930 CCGAAGGGGAGTCCGA 1238 GACTAGTTCGGACTCC 3007
    ACTAGTC CCTT
    3920 3942 CCGAACTAGTCTCAGGC 1239 GTTGAAGCCTGAGACT 3008
    TTCAAC AGTT
    3953 3975 CCGCAGGCCCCTTCGCC 1240 GAATAGGGCGAAGGG 3009
    CTATTC GCCTG
    3960 3982 CCCCTTCGCCCTATTCTT 1241 CTATGAAGAATAGGGC 3010
    CATAG GAAG
    3961 3983 CCCTTCGCCCTATTCTTC 1242 GCTATGAAGAATAGG 3011
    ATAGC GCGAA
    3962 3984 CCTTCGCCCTATTCTTCA 1243 GGCTATGAAGAATAG 3012
    TAGCC GGCGA
    3968 3990 CCCTATTCTTCATAGCC 1244 GTATTCGGCTATGAAG 3013
    GAATAC AATA
    3969 3991 CCTATTCTTCATAGCCG 1245 TGTATTCGGCTATGAA 3014
    AATACA GAAT
    3983 4005 CCGAATACACAAACATT 1246 TATAATAATGTTTGTG 3015
    ATTATA TATT
    4013 4035 CCCTCACCACTACAATC 1247 TAGGAAGATTGTAGTG 3016
    TTCCTA GTGA
    4014 4036 CCTCACCACTACAATCT 1248 CTAGGAAGATTGTAGT 3017
    TCCTAG GGTG
    4019 4041 CCACTACAATCTTCCTA 1249 TGTTCCTAGGAAGATT 3018
    GGAACA GTAG
    4032 4054 CCTAGGAACAACATATG 1250 GTGCGTCATATGTTGT 3019
    ACGCAC TCCT
    4058 4080 CCCCTGAACTCTACACA 1251 ATATGTTGTGTAGAGT 3020
    ACATAT TCAG
    4059 4081 CCCTGAACTCTACACAA 1252 AATATGTTGTGTAGAG 3021
    CATATT TTCA
    4060 4082 CCTGAACTCTACACAAC 1253 AAATATGTTGTGTAGA 3022
    ATATTT GTTC
    4088 4110 CCAAGACCCTACTTCTA 1254 GGAGGTTAGAAGTAG 3023
    ACCTCC GGTCT
    4094 4116 CCCTACTTCTAACCTCC 1255 GAACAGGGAGGTTAG 3024
    CTGTTC AAGTA
    4095 4117 CCTACTTCTAACCTCCC 1256 AGAACAGGGAGGTTA 3025
    TGTTCT GAAGT
    4106 4128 CCTCCCTGTTCTTATGA 1257 TCGAATTCATAAGAAC 3026
    ATTCGA AGGG
    4109 4131 CCCTGTTCTTATGAATT 1258 TGTTCGAATTCATAAG 3027
    CGAACA AACA
    4110 4132 CCTGTTCTTATGAATTC 1259 CTGTTCGAATTCATAA 3028
    GAACAG GAAC
    4137 4159 CCCCCGATTCCGCTACG 1260 GTTGGTCGTAGCGGAA 3029
    ACCAAC TCGG
    4138 4160 CCCCGATTCCGCTACGA 1261 AGTTGGTCGTAGCGGA 3030
    CCAACT ATCG
    4139 4161 CCCGATTCCGCTACGAC 1262 GAGTTGGTCGTAGCGG 3031
    CAACTC AATC
    4140 4162 CCGATTCCGCTACGACC 1263 TGAGTTGGTCGTAGCG 3032
    AACTCA GAAT
    4146 4168 CCGCTACGACCAACTCA 1264 GGTGTATGAGTTGGTC 3033
    TACACC GTAG
    4155 4177 CCAACTCATACACCTCC 1265 TTCATAGGAGGTGTAT 3034
    TATGAA GAGT
    4167 4189 CCTCCTATGAAAAAACT 1266 GTAGGAAGTTTTTTCA 3035
    TCCTAC TAGG
    4170 4192 CCTATGAAAAAACTTCC 1267 GTGGTAGGAAGTTTTT 3036
    TACCAC TCAT
    4185 4207 CCTACCACTCACCCTAG 1268 GTAATGCTAGGGTGAG 3037
    CATTAC TGGT
    4189 4211 CCACTCACCCTAGCATT 1269 ATAAGTAATGCTAGGG 3038
    ACTTAT TGAG
    4196 4218 CCCTAGCATTACTTATA 1270 ATATCATATAAGTAAT 3039
    TGATAT GCTA
    4197 4219 CCTAGCATTACTTATAT 1271 CATATCATATAAGTAA 3040
    GATATG TGCT
    4223 4245 CCATACCCATTACAATC 1272 GCTGGAGATTGTAATG 3041
    TCCAGC GGTA
    4228 4250 CCCATTACAATCTCCAG 1273 GGAATGCTGGAGATTG 3042
    CATTCC TAAT
    4229 4251 CCATTACAATCTCCAGC 1274 GGGAATGCTGGAGATT 3043
    ATTCCC GTAA
    4241 4263 CCAGCATTCCCCCTCAA 1275 TTAGGTTTGAGGGGGA 3044
    ACCTAA ATGC
    4249 4271 CCCCCTCAAACCTAAGA 1276 CATATTTCTTAGGTTT 3045
    AATATG GAGG
    4250 4272 CCCCTCAAACCTAAGAA 1277 ACATATTTCTTAGGTT 3046
    ATATGT TGAG
    4251 4273 CCCTCAAACCTAAGAAA 1278 GACATATTTCTTAGGT 3047
    TATGTC TTGA
    4252 4274 CCTCAAACCTAAGAAAT 1279 AGACATATTTCTTAGG 3048
    ATGTCT TTTG
    4259 4281 CCTAAGAAATATGTCTG 1280 TTTTATCAGACATATT 3049
    ATAAAA TCTT
    4318 4340 CCCCCTTATTTCTAGGA 1281 TCATAGTCCTAGAAAT 3050
    CTATGA AAGG
    4319 4341 CCCCTTATTTCTAGGAC 1282 CTCATAGTCCTAGAAA 3051
    TATGAG TAAG
    4320 4342 CCCTTATTTCTAGGACT 1283 TCTCATAGTCCTAGAA 3052
    ATGAGA ATAA
    4321 4343 CCTTATTTCTAGGACTA 1284 TTCTCATAGTCCTAGA 3053
    TGAGAA AATA
    4349 4371 CCCATCCCTGAGAATCC 1285 AATTTTGGATTCTCAG 3054
    AAAATT GGAT
    4350 4372 CCATCCCTGAGAATCCA 1286 GAATTTTGGATTCTCA 3055
    AAATTC GGGA
    4354 4376 CCCTGAGAATCCAAAAT 1287 CGGAGAATTTTGGATT 3056
    TCTCCG CTCA
    4355 4377 CCTGAGAATCCAAAATT 1288 ACGGAGAATTTTGGAT 3057
    CTCCGT TCTC
    4364 4386 CCAAAATTCTCCGTGCC 1289 ATAGGTGGCACGGAG 3058
    ACCTAT AATTT
    4374 4396 CCGTGCCACCTATCACA 1290 ATGGGGTGTGATAGGT 3059
    CCCCAT GGCA
    4379 4401 CCACCTATCACACCCCA 1291 TTAGGATGGGGTGTGA 3060
    TCCTAA TAGG
    4382 4404 CCTATCACACCCCATCC 1292 ACTTTAGGATGGGGTG 3061
    TAAAGT TGAT
    4391 4413 CCCCATCCTAAAGTAAG 1293 GCTGACCTTACTTTAG 3062
    GTCAGC GATG
    4392 4414 CCCATCCTAAAGTAAGG 1294 AGCTGACCTTACTTTA 3063
    TCAGCT GGAT
    4393 4415 CCATCCTAAAGTAAGGT 1295 TAGCTGACCTTACTTT 3064
    CAGCTA AGGA
    4397 4419 CCTAAAGTAAGGTCAGC 1296 TATTTAGCTGACCTTA 3065
    TAAATA CTTT
    4430 4452 CCCATACCCCGAAAATG 1297 AACCAACATTTTCGGG 3066
    TTGGTT GTAT
    4431 4453 CCATACCCCGAAAATGT 1298 TAACCAACATTTTCGG 3067
    TGGTTA GGTA
    4436 4458 CCCCGAAAATGTTGGTT 1299 GGGTATAACCAACATT 3068
    ATACCC TTCG
    4437 4459 CCCGAAAATGTTGGTTA 1300 AGGGTATAACCAACAT 3069
    TACCCT TTTC
    4438 4460 CCGAAAATGTTGGTTAT 1301 AAGGGTATAACCAAC 3070
    ACCCTT ATTTT
    4456 4478 CCCTTCCCGTACTAATT 1302 GGGATTAATTAGTACG 3071
    AATCCC GGAA
    4457 4479 CCTTCCCGTACTAATTA 1303 GGGGATTAATTAGTAC 3072
    ATCCCC GGGA
    4461 4483 CCCGTACTAATTAATCC 1304 GCCAGGGGATTAATTA 3073
    CCTGGC GTAC
    4462 4484 CCGTACTAATTAATCCC 1305 GGCCAGGGGATTAATT 3074
    CTGGCC AGTA
    4476 4498 CCCCTGGCCCAACCCGT 1306 TAGATGACGGGTTGGG 3075
    CATCTA CCAG
    4477 4499 CCCTGGCCCAACCCGTC 1307 GTAGATGACGGGTTGG 3076
    ATCTAC GCCA
    4478 4500 CCTGGCCCAACCCGTCA 1308 AGTAGATGACGGGTTG 3077
    TCTACT GGCC
    4483 4505 CCCAACCCGTCATCTAC 1309 GGTAGAGTAGATGAC 3078
    TCTACC GGGTT
    4484 4506 CCAACCCGTCATCTACT 1310 TGGTAGAGTAGATGAC 3079
    CTACCA GGGT
    4488 4510 CCCGTCATCTACTCTAC 1311 AAGATGGTAGAGTAG 3080
    CATCTT ATGAC
    4489 4511 CCGTCATCTACTCTACC 1312 AAAGATGGTAGAGTA 3081
    ATCTTT GATGA
    4504 4526 CCATCTTTGCAGGCACA 1313 GATGAGTGTGCCTGCA 3082
    CTCATC AAGA
    4555 4577 CCTGAGTAGGCCTAGAA 1314 GTTTATTTCTAGGCCT 3083
    ATAAAC ACTC
    4565 4587 CCTAGAAATAAACATGC 1315 AAGCTAGCATGTTTAT 3084
    TAGCTT TTCT
    4593 4615 CCAGTTCTAACCAAAAA 1316 TTTATTTTTTTGGTTAG 3085
    AATAAA AAC
    4603 4625 CCAAAAAAATAAACCCT 1317 GGAACGAGGGTTTATT 3086
    CGTTCC TTTT
    4616 4638 CCCTCGTTCCACAGAAG 1318 TGGCAGCTTCTGTGGA 3087
    CTGCCA ACGA
    4617 4639 CCTCGTTCCACAGAAGC 1319 ATGGCAGCTTCTGTGG 3088
    TGCCAT AACG
    4624 4646 CCACAGAAGCTGCCATC 1320 ATACTTGATGGCAGCT 3089
    AAGTAT TCTG
    4636 4658 CCATCAAGTATTTCCTC 1321 TTGCGTGAGGAAATAC 3090
    ACGCAA TTGA
    4649 4671 CCTCACGCAAGCAACCG 1322 TGGATGCGGTTGCTTG 3091
    CATCCA CGTG
    4663 4685 CCGCATCCATAATCCTT 1323 TATTAGAAGGATTATG 3092
    CTAATA GATG
    4669 4691 CCATAATCCTTCTAATA 1324 GATAGCTATTAGAAGG 3093
    GCTATC ATTA
    4676 4698 CCTTCTAATAGCTATCC 1325 TGAAGAGGATAGCTAT 3094
    TCTTCA TAGA
    4691 4713 CCTCTTCAACAATATAC 1326 CGGAGAGTATATTGTT 3095
    TCTCCG GAAG
    4711 4733 CCGGACAATGAACCATA 1327 ATTGGTTATGGTTCAT 3096
    ACCAAT TGTC
    4723 4745 CCATAACCAATACTACC 1328 TTGATTGGTAGTATTG 3097
    AATCAA GTTA
    4729 4751 CCAATACTACCAATCAA 1329 TGAGTATTGATTGGTA 3098
    TACTCA GTAT
    4738 4760 CCAATCAATACTCATCA 1330 TATTAATGATGAGTAT 3099
    TTAATA TGAT
    4795 4817 CCCCCTTTCACTTCTGA 1331 TGGGACTCAGAAGTGA 3100
    GTCCCA AAGG
    4796 4818 CCCCTTTCACTTCTGAG 1332 CTGGGACTCAGAAGTG 3101
    TCCCAG AAAG
    4797 4819 CCCTTTCACTTCTGAGT 1333 TCTGGGACTCAGAAGT 3102
    CCCAGA GAAA
    4798 4820 CCTTTCACTTCTGAGTC 1334 CTCTGGGACTCAGAAG 3103
    CCAGAG TGAA
    4814 4836 CCCAGAGGTTACCCAAG 1335 GGGTGCCTTGGGTAAC 3104
    GCACCC CTCT
    4815 4837 CCAGAGGTTACCCAAGG 1336 GGGGTGCCTTGGGTAA 3105
    CACCCC CCTC
    4825 4847 CCCAAGGCACCCCTCTG 1337 GGATGTCAGAGGGGT 3106
    ACATCC GCCTT
    4826 4848 CCAAGGCACCCCTCTGA 1338 CGGATGTCAGAGGGGT 3107
    CATCCG GCCT
    4834 4856 CCCCTCTGACATCCGGC 1339 AAGCAGGCCGGATGTC 3108
    CTGCTT AGAG
    4835 4857 CCCTCTGACATCCGGCC 1340 GAAGCAGGCCGGATG 3109
    TGCTTC TCAGA
    4836 4858 CCTCTGACATCCGGCCT 1341 AGAAGCAGGCCGGAT 3110
    GCTTCT GTCAG
    4846 4868 CCGGCCTGCTTCTTCTC 1342 TCATGTGAGAAGAAGC 3111
    ACATGA AGGC
    4850 4872 CCTGCTTCTTCTCACAT 1343 TTTGTCATGTGAGAAG 3112
    GACAAA AAGC
    4879 4901 CCCCCATCTCAATCATA 1344 TTGGTATATGATTGAG 3113
    TACCAA ATGG
    4880 4902 CCCCATCTCAATCATAT 1345 TTTGGTATATGATTGA 3114
    ACCAAA GATG
    4881 4903 CCCATCTCAATCATATA 1346 ATTTGGTATATGATTG 3115
    CCAAAT AGAT
    4882 4904 CCATCTCAATCATATAC 1347 GATTTGGTATATGATT 3116
    CAAATC GAGA
    4898 4920 CCAAATCTCTCCCTCAC 1348 CGTTTAGTGAGGGAGA 3117
    TAAACG GATT
    4908 4930 CCCTCACTAAACGTAAG 1349 AGAAGGCTTACGTTTA 3118
    CCTTCT GTGA
    4909 4931 CCTCACTAAACGTAAGC 1350 GAGAAGGCTTACGTTT 3119
    CTTCTC AGTG
    4925 4947 CCTTCTCCTCACTCTCTC 1351 AGATTGAGAGAGTGA 3120
    AATCT GGAGA
    4931 4953 CCTCACTCTCTCAATCTT 1352 TGGATAAGATTGAGAG 3121
    ATCCA AGTG
    4951 4973 CCATCATAGCAGGCAGT 1353 ACCTCAACTGCCTGCT 3122
    TGAGGT ATGA
    4982 5004 CCAAACCCAGCTACGCA 1354 AGATTTTGCGTAGCTG 3123
    AAATCT GGTT
    4987 5009 CCCAGCTACGCAAAATC 1355 TGCTAAGATTTTGCGT 3124
    TTAGCA AGCT
    4988 5010 CCAGCTACGCAAAATCT 1356 ATGCTAAGATTTTGCG 3125
    TAGCAT TAGC
    5014 5036 CCTCAATTACCCACATA 1357 TCATCCTATGTGGGTA 3126
    GGATGA ATTG
    5023 5045 CCCACATAGGATGAATA 1358 TGCTATTATTCATCCT 3127
    ATAGCA ATGT
    5024 5046 CCACATAGGATGAATAA 1359 CTGCTATTATTCATCCT 3128
    TAGCAG ATG
    5052 5074 CCGTACAACCCTAACAT 1360 ATGGTTATGTTAGGGT 3129
    AACCAT TGTA
    5060 5082 CCCTAACATAACCATTC 1361 AATTAAGAATGGTTAT 3130
    TTAATT GTTA
    5061 5083 CCTAACATAACCATTCT 1362 AAATTAAGAATGGTTA 3131
    TAATTT TGTT
    5071 5093 CCATTCTTAATTTAACT 1363 ATAAATAGTTAAATTA 3132
    ATTTAT AGAA
    5099 5121 CCTAACTACTACCGCAT 1364 GTAGGAATGCGGTAGT 3133
    TCCTAC AGTT
    5110 5132 CCGCATTCCTACTACTC 1365 TAAGTTGAGTAGTAGG 3134
    AACTTA AATG
    5117 5139 CCTACTACTCAACTTAA 1366 TGGAGTTTAAGTTGAG 3135
    ACTCCA TAGT
    5137 5159 CCAGCACCACGACCCTA 1367 TAGTAGTAGGGTCGTG 3136
    CTACTA GTGC
    5143 5165 CCACGACCCTACTACTA 1368 GCGAGATAGTAGTAG 3137
    TCTCGC GGTCG
    5149 5171 CCCTACTACTATCTCGC 1369 TCAGGTGCGAGATAGT 3138
    ACCTGA AGTA
    5150 5172 CCTACTACTATCTCGCA 1370 TTCAGGTGCGAGATAG 3139
    CCTGAA TAGT
    5167 5189 CCTGAAACAAGCTAACA 1371 TAGTCATGTTAGCTTG 3140
    TGACTA TTTC
    5193 5215 CCCTTAATTCCATCCAC 1372 AGGAGGGTGGATGGA 3141
    CCTCCT ATTAA
    5194 5216 CCTTAATTCCATCCACC 1373 GAGGAGGGTGGATGG 3142
    CTCCTC AATTA
    5202 5224 CCATCCACCCTCCTCTC 1374 CCTAGGGAGAGGAGG 3143
    CCTAGG GTGGA
    5206 5228 CCACCCTCCTCTCCCTA 1375 GCCTCCTAGGGAGAGG 3144
    GGAGGC AGGG
    5209 5231 CCCTCCTCTCCCTAGGA 1376 CAGGCCTCCTAGGGAG 3145
    GGCCTG AGGA
    5210 5232 CCTCCTCTCCCTAGGAG 1377 CCAGGCCTCCTAGGGA 3146
    GCCTGC GAGG
    5213 5235 CCTCTCCCTAGGAGGCC 1378 GGGGCAGGCCTCCTAG 3147
    TGCCCC GGAG
    5218 5240 CCCTAGGAGGCCTGCCC 1379 TAGCGGGGGCAGGCCT 3148
    CCGCTA CCTA
    5219 5241 CCTAGGAGGCCTGCCCC 1380 TTAGCGGGGGCAGGCC 3149
    CGCTAA TCCT
    5228 5250 CCTGCCCCCGCTAACCG 1381 AAAAGCCGGTTAGCG 3150
    GCTTTT GGGGC
    5232 5254 CCCCCGCTAACCGGCTT 1382 GGCAAAAAGCCGGTT 3151
    TTTGCC AGCGG
    5233 5255 CCCCGCTAACCGGCTTT 1383 GGGCAAAAAGCCGGT 3152
    TTGCCC TAGCG
    5234 5256 CCCGCTAACCGGCTTTT 1384 TGGGCAAAAAGCCGG 3153
    TGCCCA TTAGC
    5235 5257 CCGCTAACCGGCTTTTT 1385 TTGGGCAAAAAGCCG 3154
    GCCCAA GTTAG
    5242 5264 CCGGCTTTTTGCCCAAA 1386 GGCCCATTTGGGCAAA 3155
    TGGGCC AAGC
    5253 5275 CCCAAATGGGCCATTAT 1387 TCTTCGATAATGGCCC 3156
    CGAAGA ATTT
    5254 5276 CCAAATGGGCCATTATC 1388 TTCTTCGATAATGGCC 3157
    GAAGAA CATT
    5263 5285 CCATTATCGAAGAATTC 1389 TTTTGTGAATTCTTCG 3158
    ACAAAA ATAA
    5294 5316 CCTCATCATCCCCACCA 1390 CTATGATGGTGGGGAT 3159
    TCATAG GATG
    5303 5325 CCCCACCATCATAGCCA 1391 TGATGGTGGCTATGAT 3160
    CCATCA GGTG
    5304 5326 CCCACCATCATAGCCAC 1392 GTGATGGTGGCTATGA 3161
    CATCAC TGGT
    5305 5327 CCACCATCATAGCCACC 1393 GGTGATGGTGGCTATG 3162
    ATCACC ATGG
    5308 5330 CCATCATAGCCACCATC 1394 GAGGGTGATGGTGGCT 3163
    ACCCTC ATGA
    5317 5339 CCACCATCACCCTCCTT 1395 GAGGTTAAGGAGGGT 3164
    AACCTC GATGG
    5320 5342 CCATCACCCTCCTTAAC 1396 GTAGAGGTTAAGGAG 3165
    CTCTAC GGTGA
    5326 5348 CCCTCCTTAACCTCTAC 1397 GTAGAAGTAGAGGTTA 3166
    TTCTAC AGGA
    5327 5349 CCTCCTTAACCTCTACTT 1398 GGTAGAAGTAGAGGTT 3167
    CTACC AAGG
    5330 5352 CCTTAACCTCTACTTCT 1399 GTAGGTAGAAGTAGA 3168
    ACCTAC GGTTA
    5336 5358 CCTCTACTTCTACCTAC 1400 TTAGGCGTAGGTAGAA 3169
    GCCTAA GTAG
    5348 5370 CCTACGCCTAATCTACT 1401 AGGTGGAGTAGATTAG 3170
    CCACCT GCGT
    5354 5376 CCTAATCTACTCCACCT 1402 TGATTGAGGTGGAGTA 3171
    CAATCA GATT
    5365 5387 CCACCTCAATCACACTA 1403 GGGGAGTAGTGTGATT 3172
    CTCCCC GAGG
    5368 5390 CCTCAATCACACTACTC 1404 TATGGGGAGTAGTGTG 3173
    CCCATA ATTG
    5384 5406 CCCCATATCTAACAACG 1405 TTTTTACGTTGTTAGAT 3174
    TAAAAA ATG
    5385 5407 CCCATATCTAACAACGT 1406 ATTTTTACGTTGTTAG 3175
    AAAAAT ATAT
    5386 5408 CCATATCTAACAACGTA 1407 TATTTTTACGTTGTTAG 3176
    AAAATA ATA
    5433 5455 CCCACCCCATTCCTCCC 1408 AGTGTGGGGAGGAAT 3177
    CACACT GGGGT
    5434 5456 CCACCCCATTCCTCCCC 1409 GAGTGTGGGGAGGAA 3178
    ACACTC TGGGG
    5437 5459 CCCCATTCCTCCCCACA 1410 GATGAGTGTGGGGAG 3179
    CTCATC GAATG
    5438 5460 CCCATTCCTCCCCACAC 1411 CGATGAGTGTGGGGA 3180
    TCATCG GGAAT
    5439 5461 CCATTCCTCCCCACACT 1412 GCGATGAGTGTGGGG 3181
    CATCGC AGGAA
    5444 5466 CCTCCCCACACTCATCG 1413 TAAGGGCGATGAGTGT 3182
    CCCTTA GGGG
    5447 5469 CCCCACACTCATCGCCC 1414 TGGTAAGGGCGATGA 3183
    TTACCA GTGTG
    5448 5470 CCCACACTCATCGCCCT 1415 GTGGTAAGGGCGATG 3184
    TACCAC AGTGT
    5449 5471 CCACACTCATCGCCCTT 1416 CGTGGTAAGGGCGATG 3185
    ACCACG AGTG
    5461 5483 CCCTTACCACGCTACTC 1417 AGGTAGGAGTAGCGT 3186
    CTACCT GGTAA
    5462 5484 CCTTACCACGCTACTCC 1418 TAGGTAGGAGTAGCGT 3187
    TACCTA GGTA
    5467 5489 CCACGCTACTCCTACCT 1419 GGAGATAGGTAGGAG 3188
    ATCTCC TAGCG
    5477 5499 CCTACCTATCTCCCCTTT 1420 GTATAAAAGGGGAGA 3189
    TATAC TAGGT
    5481 5503 CCTATCTCCCCTTTTATA 1421 ATTAGTATAAAAGGGG 3190
    CTAAT AGAT
    5488 5510 CCCCTTTTATACTAATA 1422 TAAGATTATTAGTATA 3191
    ATCTTA AAAG
    5489 5511 CCCTTTTATACTAATAA 1423 ATAAGATTATTAGTAT 3192
    TCTTAT AAAA
    5490 5512 CCTTTTATACTAATAAT 1424 TATAAGATTATTAGTA 3193
    CTTATA TAAA
    5534 5556 CCAAGAGCCTTCAAAGC 1425 CTGAGGGCTTTGAAGG 3194
    CCTCAG CTCT
    5541 5563 CCTTCAAAGCCCTCAGT 1426 CAACTTACTGAGGGCT 3195
    AAGTTG TTGA
    5550 5572 CCCTCAGTAAGTTGCAA 1427 TAAGTATTGCAACTTA 3196
    TACTTA CTGA
    5551 5573 CCTCAGTAAGTTGCAAT 1428 TTAAGTATTGCAACTT 3197
    ACTTAA ACTG
    5601 5623 CCCCACTCTGCATCAAC 1429 CGTTCAGTTGATGCAG 3198
    TGAACG AGTG
    5602 5624 CCCACTCTGCATCAACT 1430 GCGTTCAGTTGATGCA 3199
    GAACGC GAGT
    5603 5625 CCACTCTGCATCAACTG 1431 TGCGTTCAGTTGATGC 3200
    AACGCA AGAG
    5632 5654 CCACTTTAATTAAGCTA 1432 AGGGCTTAGCTTAATT 3201
    AGCCCT AAAG
    5651 5673 CCCTTACTAGACCAATG 1433 AAGTCCCATTGGTCTA 3202
    GGACTT GTAA
    5652 5674 CCTTACTAGACCAATGG 1434 TAAGTCCCATTGGTCT 3203
    GACTTA AGTA
    5662 5684 CCAATGGGACTTAAACC 1435 TTTGTGGGTTTAAGTC 3204
    CACAAA CCAT
    5677 5699 CCCACAAACACTTAGTT 1436 GCTGTTAACTAAGTGT 3205
    AACAGC TTGT
    5678 5700 CCACAAACACTTAGTTA 1437 AGCTGTTAACTAAGTG 3206
    ACAGCT TTTG
    5706 5728 CCCTAATCAACTGGCTT 1438 AGATTGAAGCCAGTTG 3207
    CAATCT ATTA
    5707 5729 CCTAATCAACTGGCTTC 1439 TAGATTGAAGCCAGTT 3208
    AATCTA GATT
    5735 5757 CCCGCCGCCGGGAAAA 1440 CCGCCTTTTTTCCCGG 3209
    AAGGCGG CGGC
    5736 5758 CCGCCGCCGGGAAAAA 1441 CCCGCCTTTTTTCCCG 3210
    AGGCGGG GCGG
    5739 5761 CCGCCGGGAAAAAAGG 1442 TCTCCCGCCTTTTTTCC 3211
    CGGGAGA CGG
    5742 5764 CCGGGAAAAAAGGCGG 1443 GCTTCTCCCGCCTTTTT 3212
    GAGAAGC TCC
    5764 5786 CCCCGGCAGGTTTGAAG 1444 AAGCAGCTTCAAACCT 3213
    CTGCTT GCCG
    5765 5787 CCCGGCAGGTTTGAAGC 1445 GAAGCAGCTTCAAACC 3214
    TGCTTC TGCC
    5766 5788 CCGGCAGGTTTGAAGCT 1446 AGAAGCAGCTTCAAAC 3215
    GCTTCT CTGC
    5817 5839 CCTCGGAGCTGGTAAAA 1447 GCCTCTTTTTACCAGC 3216
    AGAGGC TCCG
    5839 5861 CCTAACCCCTGTCTTTA 1448 TAAATCTAAAGACAGG 3217
    GATTTA GGTT
    5844 5866 CCCCTGTCTTTAGATTT 1449 GACTGTAAATCTAAAG 3218
    ACAGTC ACAG
    5845 5867 CCCTGTCTTTAGATTTA 1450 GGACTGTAAATCTAAA 3219
    CAGTCC GACA
    5846 5868 CCTGTCTTTAGATTTAC 1451 TGGACTGTAAATCTAA 3220
    AGTCCA AGAC
    5866 5888 CCAATGCTTCACTCAGC 1452 AAAATGGCTGAGTGA 3221
    CATTTT AGCAT
    5882 5904 CCATTTTACCTCACCCC 1453 TCAGTGGGGGTGAGGT 3222
    CACTGA AAAA
    5890 5912 CCTCACCCCCACTGATG 1454 GGCGAACATCAGTGG 3223
    TTCGCC GGGTG
    5895 5917 CCCCCACTGATGTTCGC 1455 CGGTCGGCGAACATCA 3224
    CGACCG GTGG
    5896 5918 CCCCACTGATGTTCGCC 1456 ACGGTCGGCGAACATC 3225
    GACCGT AGTG
    5897 5919 CCCACTGATGTTCGCCG 1457 AACGGTCGGCGAACAT 3226
    ACCGTT CAGT
    5898 5920 CCACTGATGTTCGCCGA 1458 CAACGGTCGGCGAAC 3227
    CCGTTG ATCAG
    5911 5933 CCGACCGTTGACTATTC 1459 TGTAGAGAATAGTCAA 3228
    TCTACA CGGT
    5915 5937 CCGTTGACTATTCTCTA 1460 GGTTTGTAGAGAATAG 3229
    CAAACC TCAA
    5936 5958 CCACAAAGACATTGGA 1461 ATAGTGTTCCAATGTC 3230
    ACACTAT TTTG
    5960 5982 CCTATTATTCGGCGCAT 1462 CAGCTCATGCGCCGAA 3231
    GAGCTG TAAT
    5987 6009 CCTAGGCACAGCTCTAA 1463 GGAGGCTTAGAGCTGT 3232
    GCCTCC GCCT
    6005 6027 CCTCCTTATTCGAGCCG 1464 CCAGCTCGGCTCGAAT 3233
    AGCTGG AAGG
    6008 6030 CCTTATTCGAGCCGAGC 1465 GGCCCAGCTCGGCTCG 3234
    TGGGCC AATA
    6019 6041 CCGAGCTGGGCCAGCCA 1466 GTTGCCTGGCTGGCCC 3235
    GGCAAC AGCT
    6029 6051 CCAGCCAGGCAACCTTC 1467 TACCTAGAAGGTTGCC 3236
    TAGGTA TGGC
    6033 6055 CCAGGCAACCTTCTAGG 1468 TCGTTACCTAGAAGGT 3237
    TAACGA TGCC
    6041 6063 CCTTCTAGGTAACGACC 1469 AGATGTGGTCGTTACC 3238
    ACATCT TAGA
    6056 6078 CCACATCTACAACGTTA 1470 TGACGATAACGTTGTA 3239
    TCGTCA GATG
    6082 6104 CCCATGCATTTGTAATA 1471 GAAGATTATTACAAAT 3240
    ATCTTC GCAT
    6083 6105 CCATGCATTTGTAATAA 1472 AGAAGATTATTACAAA 3241
    TCTTCT TGCA
    6117 6139 CCCATCATAATCGGAGG 1473 CCAAAGCCTCCGATTA 3242
    CTTTGG TGAT
    6118 6140 CCATCATAATCGGAGGC 1474 GCCAAAGCCTCCGATT 3243
    TTTGGC ATGA
    6153 6175 CCCCTAATAATCGGTGC 1475 TCGGGGGCACCGATTA 3244
    CCCCGA TTAG
    6154 6176 CCCTAATAATCGGTGCC 1476 ATCGGGGGCACCGATT 3245
    CCCGAT ATTA
    6155 6177 CCTAATAATCGGTGCCC 1477 TATCGGGGGCACCGAT 3246
    CCGATA TATT
    6169 6191 CCCCCGATATGGCGTTT 1478 GCGGGGAAACGCCAT 3247
    CCCCGC ATCGG
    6170 6192 CCCCGATATGGCGTTTC 1479 TGCGGGGAAACGCCAT 3248
    CCCGCA ATCG
    6171 6193 CCCGATATGGCGTTTCC 1480 ATGCGGGGAAACGCC 3249
    CCGCAT ATATC
    6172 6194 CCGATATGGCGTTTCCC 1481 TATGCGGGGAAACGCC 3250
    CGCATA ATAT
    6186 6208 CCCCGCATAAACAACAT 1482 AAGCTTATGTTGTTTA 3251
    AAGCTT TGCG
    6187 6209 CCCGCATAAACAACATA 1483 GAAGCTTATGTTGTTT 3252
    AGCTTC ATGC
    6188 6210 CCGCATAAACAACATAA 1484 AGAAGCTTATGTTGTT 3253
    GCTTCT TATG
    6219 6241 CCTCCCTCTCTCCTACTC 1485 AGCAGGAGTAGGAGA 3254
    CTGCT GAGGG
    6222 6244 CCCTCTCTCCTACTCCTG 1486 GCGAGCAGGAGTAGG 3255
    CTCGC AGAGA
    6223 6245 CCTCTCTCCTACTCCTGC 1487 TGCGAGCAGGAGTAG 3256
    TCGCA GAGAG
    6230 6252 CCTACTCCTGCTCGCAT 1488 TAGCAGATGCGAGCA 3257
    CTGCTA GGAGT
    6236 6258 CCTGCTCGCATCTGCTA 1489 CCACTATAGCAGATGC 3258
    TAGTGG GAGC
    6262 6284 CCGGAGCAGGAACAGG 1490 TGTTCAACCTGTTCCT 3259
    TTGAACA GCTC
    6290 6312 CCCTCCCTTAGCAGGGA 1491 AGTAGTTCCCTGCTAA 3260
    ACTACT GGGA
    6291 6313 CCTCCCTTAGCAGGGAA 1492 GAGTAGTTCCCTGCTA 3261
    CTACTC AGGG
    6294 6316 CCCTTAGCAGGGAACTA 1493 TGGGAGTAGTTCCCTG 3262
    CTCCCA CTAA
    6295 6317 CCTTAGCAGGGAACTAC 1494 GTGGGAGTAGTTCCCT 3263
    TCCCAC GCTA
    6313 6335 CCCACCCTGGAGCCTCC 1495 GTCTACGGAGGCTCCA 3264
    GTAGAC GGGT
    6314 6336 CCACCCTGGAGCCTCCG 1496 GGTCTACGGAGGCTCC 3265
    TAGACC AGGG
    6317 6339 CCCTGGAGCCTCCGTAG 1497 TTAGGTCTACGGAGGC 3266
    ACCTAA TCCA
    6318 6340 CCTGGAGCCTCCGTAGA 1498 GTTAGGTCTACGGAGG 3267
    CCTAAC CTCC
    6325 6347 CCTCCGTAGACCTAACC 1499 GAAGATGGTTAGGTCT 3268
    ATCTTC ACGG
    6328 6350 CCGTAGACCTAACCATC 1500 GGAGAAGATGGTTAG 3269
    TTCTCC GTCTA
    6335 6357 CCTAACCATCTTCTCCTT 1501 GGTGTAAGGAGAAGA 3270
    ACACC TGGTT
    6340 6362 CCATCTTCTCCTTACAC 1502 TGCTAGGTGTAAGGAG 3271
    CTAGCA AAGA
    6349 6371 CCTTACACCTAGCAGGT 1503 GGAGACACCTGCTAGG 3272
    GTCTCC TGTA
    6356 6378 CCTAGCAGGTGTCTCCT 1504 AGATAGAGGAGACAC 3273
    CTATCT CTGCT
    6370 6392 CCTCTATCTTAGGGGCC 1505 ATTGATGGCCCCTAAG 3274
    ATCAAT ATAG
    6385 6407 CCATCAATTTCATCACA 1506 AATTGTTGTGATGAAA 3275
    ACAATT TTGA
    6420 6442 CCCCCTGCCATAACCCA 1507 TGGTATTGGGTTATGG 3276
    ATACCA CAGG
    6421 6443 CCCCTGCCATAACCCAA 1508 TTGGTATTGGGTTATG 3277
    TACCAA GCAG
    6422 6444 CCCTGCCATAACCCAAT 1509 TTTGGTATTGGGTTAT 3278
    ACCAAA GGCA
    6423 6445 CCTGCCATAACCCAATA 1510 GTTTGGTATTGGGTTA 3279
    CCAAAC TGGC
    6427 6449 CCATAACCCAATACCAA 1511 GGGCGTTTGGTATTGG 3280
    ACGCCC GTTA
    6433 6455 CCCAATACCAAACGCCC 1512 GAAGAGGGGCGTTTG 3281
    CTCTTC GTATT
    6434 6456 CCAATACCAAACGCCCC 1513 CGAAGAGGGGCGTTTG 3282
    TCTTCG GTAT
    6440 6462 CCAAACGCCCCTCTTCG 1514 ATCAGACGAAGAGGG 3283
    TCTGAT GCGTT
    6447 6469 CCCCTCTTCGTCTGATC 1515 AGGACGGATCAGACG 3284
    CGTCCT AAGAG
    6448 6470 CCCTCTTCGTCTGATCC 1516 TAGGACGGATCAGAC 3285
    GTCCTA GAAGA
    6449 6471 CCTCTTCGTCTGATCCG 1517 TTAGGACGGATCAGAC 3286
    TCCTAA GAAG
    6463 6485 CCGTCCTAATCACAGCA 1518 TAGGACTGCTGTGATT 3287
    GTCCTA AGGA
    6467 6489 CCTAATCACAGCAGTCC 1519 GAAGTAGGACTGCTGT 3288
    TACTTC GATT
    6482 6504 CCTACTTCTCCTATCTCT 1520 CTGGGAGAGATAGGA 3289
    CCCAG GAAGT
    6491 6513 CCTATCTCTCCCAGTCC 1521 CAGCTAGGACTGGGA 3290
    TAGCTG GAGAT
    6500 6522 CCCAGTCCTAGCTGCTG 1522 TGATGCCAGCAGCTAG 3291
    GCATCA GACT
    6501 6523 CCAGTCCTAGCTGCTGG 1523 GTGATGCCAGCAGCTA 3292
    CATCAC GGAC
    6506 6528 CCTAGCTGCTGGCATCA 1524 GTATAGTGATGCCAGC 3293
    CTATAC AGCT
    6539 6561 CCGCAACCTCAACACCA 1525 AGAAGGTGGTGTTGAG 3294
    CCTTCT GTTG
    6545 6567 CCTCAACACCACCTTCT 1526 GGTCGAAGAAGGTGG 3295
    TCGACC TGTTG
    6553 6575 CCACCTTCTTCGACCCC 1527 TCCGGCGGGGTCGAAG 3296
    GCCGGA AAGG
    6556 6578 CCTTCTTCGACCCCGCC 1528 TCCTCCGGCGGGGTCG 3297
    GGAGGA AAGA
    6566 6588 CCCCGCCGGAGGAGGA 1529 TGGGGTCTCCTCCTCC 3298
    GACCCCA GGCG
    6567 6589 CCCGCCGGAGGAGGAG 1530 ATGGGGTCTCCTCCTC 3299
    ACCCCAT CGGC
    6568 6590 CCGCCGGAGGAGGAGA 1531 AATGGGGTCTCCTCCT 3300
    CCCCATT CCGG
    6571 6593 CCGGAGGAGGAGACCC 1532 TAGAATGGGGTCTCCT 3301
    CATTCTA CCTC
    6584 6606 CCCCATTCTATACCAAC 1533 ATAGGTGTTGGTATAG 3302
    ACCTAT AATG
    6585 6607 CCCATTCTATACCAACA 1534 AATAGGTGTTGGTATA 3303
    CCTATT GAAT
    6586 6608 CCATTCTATACCAACAC 1535 GAATAGGTGTTGGTAT 3304
    CTATTC AGAA
    6596 6618 CCAACACCTATTCTGAT 1536 CGAAAAATCAGAATA 3305
    TTTTCG GGTGT
    6602 6624 CCTATTCTGATTTTTCGG 1537 GGTGACCGAAAAATC 3306
    TCACC AGAAT
    6623 6645 CCCTGAAGTTTATATTC 1538 GGATAAGAATATAAA 3307
    TTATCC CTTCA
    6624 6646 CCTGAAGTTTATATTCT 1539 AGGATAAGAATATAA 3308
    TATCCT ACTTC
    6644 6666 CCTACCAGGCTTCGGAA 1540 AGATTATTCCGAAGCC 3309
    TAATCT TGGT
    6648 6670 CCAGGCTTCGGAATAAT 1541 TGGGAGATTATTCCGA 3310
    CTCCCA AGCC
    6667 6689 CCCATATTGTAACTTAC 1542 GGAGTAGTAAGTTACA 3311
    TACTCC ATAT
    6668 6690 CCATATTGTAACTTACT 1543 CGGAGTAGTAAGTTAC 3312
    ACTCCG AATA
    6688 6710 CCGGAAAAAAAGAACC 1544 TCCAAATGGTTCTTTTT 3313
    ATTTGGA TTC
    6702 6724 CCATTTGGATACATAGG 1545 ACCATACCTATGTATC 3314
    TATGGT CAAA
    6749 6771 CCTAGGGTTTATCGTGT 1546 GTGCTCACACGATAAA 3315
    GAGCAC CCCT
    6773 6795 CCATATATTTACAGTAG 1547 CTATTCCTACTGTAAA 3316
    GAATAG TATA
    6820 6842 CCTCCGCTACCATAATC 1548 AGCGATGATTATGGTA 3317
    ATCGCT GCGG
    6823 6845 CCGCTACCATAATCATC 1549 GATAGCGATGATTATG 3318
    GCTATC GTAG
    6829 6851 CCATAATCATCGCTATC 1550 GGTGGGGATAGCGAT 3319
    CCCACC GATTA
    6845 6867 CCCCACCGGCGTCAAAG 1551 TAAATACTTTGACGCC 3320
    TATTTA GGTG
    6846 6868 CCCACCGGCGTCAAAGT 1552 CTAAATACTTTGACGC 3321
    ATTTAG CGGT
    6847 6869 CCACCGGCGTCAAAGTA 1553 GCTAAATACTTTGACG 3322
    TTTAGC CCGG
    6850 6872 CCGGCGTCAAAGTATTT 1554 TCAGCTAAATACTTTG 3323
    AGCTGA ACGC
    6877 6899 CCACACTCCACGGAAGC 1555 CATATTGCTTCCGTGG 3324
    AATATG AGTG
    6884 6906 CCACGGAAGCAATATG 1556 ATCATTTCATATTGCTT 3325
    AAATGAT CCG
    6925 6947 CCCTAGGATTCATCTTT 1557 GAAAAGAAAGATGAA 3326
    CTTTTC TCCTA
    6926 6948 CCTAGGATTCATCTTTC 1558 TGAAAAGAAAGATGA 3327
    TTTTCA ATCCT
    6949 6971 CCGTAGGTGGCCTGACT 1559 AATGCCAGTCAGGCCA 3328
    GGCATT CCTA
    6959 6981 CCTGACTGGCATTGTAT 1560 TTGCTAATACAATGCC 3329
    TAGCAA AGTC
    7027 7049 CCCACTTCCACTATGTC 1561 TGATAGGACATAGTGG 3330
    CTATCA AAGT
    7028 7050 CCACTTCCACTATGTCC 1562 TTGATAGGACATAGTG 3331
    TATCAA GAAG
    7034 7056 CCACTATGTCCTATCAA 1563 CTCCTATTGATAGGAC 3332
    TAGGAG ATAG
    7043 7065 CCTATCAATAGGAGCTG 1564 CAAATACAGCTCCTAT 3333
    TATTTG TGAT
    7066 7088 CCATCATAGGAGGCTTC 1565 GTGAATGAAGCCTCCT 3334
    ATTCAC ATGA
    7095 7117 CCCCTATTCTCAGGCTA 1566 AGGGTGTAGCCTGAGA 3335
    CACCCT ATAG
    7096 7118 CCCTATTCTCAGGCTAC 1567 TAGGGTGTAGCCTGAG 3336
    ACCCTA AATA
    7097 7119 CCTATTCTCAGGCTACA 1568 CTAGGGTGTAGCCTGA 3337
    CCCTAG GAAT
    7114 7136 CCCTAGACCAAACCTAC 1569 TTTGGCGTAGGTTTGG 3338
    GCCAAA TCTA
    7115 7137 CCTAGACCAAACCTACG 1570 TTTTGGCGTAGGTTTG 3339
    CCAAAA GTCT
    7121 7143 CCAAACCTACGCCAAAA 1571 AATGGATTTTGGCGTA 3340
    TCCATT GGTT
    7126 7148 CCTACGCCAAAATCCAT 1572 AGTGAAATGGATTTTG 3341
    TTCACT GCGT
    7132 7154 CCAAAATCCATTTCACT 1573 TATGATAGTGAAATGG 3342
    ATCATA ATTT
    7139 7161 CCATTTCACTATCATAT 1574 CGATGAATATGATAGT 3343
    TCATCG GAAA
    7181 7203 CCCACAACACTTTCTCG 1575 ATAGGCCGAGAAAGT 3344
    GCCTAT GTTGT
    7182 7204 CCACAACACTTTCTCGG 1576 GATAGGCCGAGAAAG 3345
    CCTATC TGTTG
    7199 7221 CCTATCCGGAATGCCCC 1577 AACGTCGGGGCATTCC 3346
    GACGTT GGAT
    7204 7226 CCGGAATGCCCCGACGT 1578 CGAGTAACGTCGGGGC 3347
    TACTCG ATTC
    7212 7234 CCCCGACGTTACTCGGA 1579 GGGTAGTCCGAGTAAC 3348
    CTACCC GTCG
    7213 7235 CCCGACGTTACTCGGAC 1580 GGGGTAGTCCGAGTAA 3349
    TACCCC CGTC
    7214 7236 CCGACGTTACTCGGACT 1581 CGGGGTAGTCCGAGTA 3350
    ACCCCG ACGT
    7232 7254 CCCCGATGCATACACCA 1582 TTCATGTGGTGTATGC 3351
    CATGAA ATCG
    7233 7255 CCCGATGCATACACCAC 1583 TTTCATGTGGTGTATG 3352
    ATGAAA CATC
    7234 7256 CCGATGCATACACCACA 1584 GTTTCATGTGGTGTAT 3353
    TGAAAC GCAT
    7246 7268 CCACATGAAACATCCTA 1585 AGATGATAGGATGTTT 3354
    TCATCT CATG
    7259 7281 CCTATCATCTGTAGGCT 1586 TGAATGAGCCTACAGA 3355
    CATTCA TGAT
    7327 7349 CCTTCGCTTCGAAGCGA 1587 GACTTTTCGCTTCGAA 3356
    AAAGTC GCGA
    7349 7371 CCTAATAGTAGAAGAAC 1588 TGGAGGGTTCTTCTAC 3357
    CCTCCA TATT
    7365 7387 CCCTCCATAAACCTGGA 1589 AGTCACTCCAGGTTTA 3358
    GTGACT TGGA
    7366 7388 CCTCCATAAACCTGGAG 1590 TAGTCACTCCAGGTTT 3359
    TGACTA ATGG
    7369 7391 CCATAAACCTGGAGTGA 1591 ATATAGTCACTCCAGG 3360
    CTATAT TTTA
    7376 7398 CCTGGAGTGACTATATG 1592 GGCATCCATATAGTCA 3361
    GATGCC CTCC
    7397 7419 CCCCCCACCCTACCACA 1593 CGAATGTGTGGTAGGG 3362
    CATTCG TGGG
    7398 7420 CCCCCACCCTACCACAC 1594 TCGAATGTGTGGTAGG 3363
    ATTCGA GTGG
    7399 7421 CCCCACCCTACCACACA 1595 TTCGAATGTGTGGTAG 3364
    TTCGAA GGTG
    7400 7422 CCCACCCTACCACACAT 1596 CTTCGAATGTGTGGTA 3365
    TCGAAG GGGT
    7401 7423 CCACCCTACCACACATT 1597 TCTTCGAATGTGTGGT 3366
    CGAAGA AGGG
    7404 7426 CCCTACCACACATTCGA 1598 GGTTCTTCGAATGTGT 3367
    AGAACC GGTA
    7405 7427 CCTACCACACATTCGAA 1599 GGGTTCTTCGAATGTG 3368
    GAACCC TGGT
    7409 7431 CCACACATTCGAAGAAC 1600 ATACGGGTTCTTCGAA 3369
    CCGTAT TGTG
    7425 7447 CCCGTATACATAAAATC 1601 TGTCTAGATTTTATGT 3370
    TAGACA ATAC
    7426 7448 CCGTATACATAAAATCT 1602 TTGTCTAGATTTTATGT 3371
    AGACAA ATA
    7466 7488 CCCCCCAAAGCTGGTTT 1603 GGCTTGAAACCAGCTT 3372
    CAAGCC TGGG
    7467 7489 CCCCCAAAGCTGGTTTC 1604 TGGCTTGAAACCAGCT 3373
    AAGCCA TTGG
    7468 7490 CCCCAAAGCTGGTTTCA 1605 TTGGCTTGAAACCAGC 3374
    AGCCAA TTTG
    7469 7491 CCCAAAGCTGGTTTCAA 1606 GTTGGCTTGAAACCAG 3375
    GCCAAC CTTT
    7470 7492 CCAAAGCTGGTTTCAAG 1607 GGTTGGCTTGAAACCA 3376
    CCAACC GCTT
    7487 7509 CCAACCCCATGGCCTCC 1608 AGTCATGGAGGCCATG 3377
    ATGACT GGGT
    7491 7513 CCCCATGGCCTCCATGA 1609 AAAAAGTCATGGAGG 3378
    CTTTTT CCATG
    7492 7514 CCCATGGCCTCCATGAC 1610 GAAAAAGTCATGGAG 3379
    TTTTTC GCCAT
    7493 7515 CCATGGCCTCCATGACT 1611 TGAAAAAGTCATGGA 3380
    TTTTCA GGCCA
    7499 7521 CCTCCATGACTTTTTCA 1612 CCTTTTTGAAAAAGTC 3381
    AAAAGG ATGG
    7502 7524 CCATGACTTTTTCAAAA 1613 ATACCTTTTTGAAAAA 3382
    AGGTAT GTCA
    7533 7555 CCATTTCATAACTTTGT 1614 ACTTTGACAAAGTTAT 3383
    CAAAGT GAAA
    7573 7595 CCTATATATCTTAATGG 1615 CATGTGCCATTAAGAT 3384
    CACATG ATAT
    7626 7648 CCCCTATCATAGAAGAG 1616 GATAAGCTCTTCTATG 3385
    CTTATC ATAG
    7627 7649 CCCTATCATAGAAGAGC 1617 TGATAAGCTCTTCTAT 3386
    TTATCA GATA
    7628 7650 CCTATCATAGAAGAGCT 1618 GTGATAAGCTCTTCTA 3387
    TATCAC TGAT
    7650 7672 CCTTTCATGATCACGCC 1619 TATGAGGGCGTGATCA 3388
    CTCATA TGAA
    7665 7687 CCCTCATAATCATTTTC 1620 GATAAGGAAAATGATT 3389
    CTTATC ATGA
    7666 7688 CCTCATAATCATTTTCCT 1621 AGATAAGGAAAATGA 3390
    TATCT TTATG
    7681 7703 CCTTATCTGCTTCCTAGT 1622 ACAGGACTAGGAAGC 3391
    CCTGT AGATA
    7693 7715 CCTAGTCCTGTATGCCC 1623 GGAAAAGGGCATACA 3392
    TTTTCC GGACT
    7699 7721 CCTGTATGCCCTTTTCCT 1624 GTGTTAGGAAAAGGG 3393
    AACAC CATAC
    7707 7729 CCCTTTTCCTAACACTC 1625 TGTTGTGAGTGTTAGG 3394
    ACAACA AAAA
    7708 7730 CCTTTTCCTAACACTCA 1626 TTGTTGTGAGTGTTAG 3395
    CAACAA GAAA
    7714 7736 CCTAACACTCACAACAA 1627 TTAGTTTTGTTGTGAG 3396
    AACTAA TGTT
    7773 7795 CCGTCTGAACTATCCTG 1628 GGCGGGCAGGATAGTT 3397
    CCCGCC CAGA
    7786 7808 CCTGCCCGCCATCATCC 1629 GGACTAGGATGATGGC 3398
    TAGTCC GGGC
    7790 7812 CCCGCCATCATCCTAGT 1630 ATGAGGACTAGGATG 3399
    CCTCAT ATGGC
    7791 7813 CCGCCATCATCCTAGTC 1631 GATGAGGACTAGGAT 3400
    CTCATC GATGG
    7794 7816 CCATCATCCTAGTCCTC 1632 GGCGATGAGGACTAG 3401
    ATCGCC GATGA
    7801 7823 CCTAGTCCTCATCGCCC 1633 ATGGGAGGGCGATGA 3402
    TCCCAT GGACT
    7807 7829 CCTCATCGCCCTCCCAT 1634 GTAGGGATGGGAGGG 3403
    CCCTAC CGATG
    7815 7837 CCCTCCCATCCCTACGC 1635 AAGGATGCGTAGGGA 3404
    ATCCTT TGGGA
    7816 7838 CCTCCCATCCCTACGCA 1636 AAAGGATGCGTAGGG 3405
    TCCTTT ATGGG
    7819 7841 CCCATCCCTACGCATCC 1637 TGTAAAGGATGCGTAG 3406
    TTTACA GGAT
    7820 7842 CCATCCCTACGCATCCT 1638 ATGTAAAGGATGCGTA 3407
    TTACAT GGGA
    7824 7846 CCCTACGCATCCTTTAC 1639 TGTTATGTAAAGGATG 3408
    ATAACA CGTA
    7825 7847 CCTACGCATCCTTTACA 1640 CTGTTATGTAAAGGAT 3409
    TAACAG GCGT
    7834 7856 CCTTTACATAACAGACG 1641 TGACCTCGTCTGTTAT 3410
    AGGTCA GTAA
    7862 7884 CCCTCCCTTACCATCAA 1642 ATTGATTTGATGGTAA 3411
    ATCAAT GGGA
    7863 7885 CCTCCCTTACCATCAAA 1643 AATTGATTTGATGGTA 3412
    TCAATT AGGG
    7866 7888 CCCTTACCATCAAATCA 1644 GCCAATTGATTTGATG 3413
    ATTGGC GTAA
    7867 7889 CCTTACCATCAAATCAA 1645 GGCCAATTGATTTGAT 3414
    TTGGCC GGTA
    7872 7894 CCATCAAATCAATTGGC 1646 TTGGTGGCCAATTGAT 3415
    CACCAA TTGA
    7888 7910 CCACCAATGGTACTGAA 1647 CGTAGGTTCAGTACCA 3416
    CCTACG TTGG
    7891 7913 CCAATGGTACTGAACCT 1648 ACTCGTAGGTTCAGTA 3417
    ACGAGT CCAT
    7905 7927 CCTACGAGTACACCGAC 1649 GCCGTAGTCGGTGTAC 3418
    TACGGC TCGT
    7917 7939 CCGACTACGGCGGACTA 1650 GAAGATTAGTCCGCCG 3419
    ATCTTC TAGT
    7944 7966 CCTACATACTTCCCCCA 1651 GAATAATGGGGGAAG 3420
    TTATTC TATGT
    7955 7977 CCCCCATTATTCCTAGA 1652 CCTGGTTCTAGGAATA 3421
    ACCAGG ATGG
    7956 7978 CCCCATTATTCCTAGAA 1653 GCCTGGTTCTAGGAAT 3422
    CCAGGC AATG
    7957 7979 CCCATTATTCCTAGAAC 1654 CGCCTGGTTCTAGGAA 3423
    CAGGCG TAAT
    7958 7980 CCATTATTCCTAGAACC 1655 TCGCCTGGTTCTAGGA 3424
    AGGCGA ATAA
    7966 7988 CCTAGAACCAGGCGACC 1656 GTCGCAGGTCGCCTGG 3425
    TGCGAC TTCT
    7973 7995 CCAGGCGACCTGCGACT 1657 TCAAGGAGTCGCAGGT 3426
    CCTTGA CGCC
    7981 8003 CCTGCGACTCCTTGACG 1658 TGTCAACGTCAAGGAG 3427
    TTGACA TCGC
    7990 8012 CCTTGACGTTGACAATC 1659 CTACTCGATTGTCAAC 3428
    GAGTAG GTCA
    8017 8039 CCCGATTGAAGCCCCCA 1660 TACGAATGGGGGCTTC 3429
    TTCGTA AATC
    8018 8040 CCGATTGAAGCCCCCAT 1661 ATACGAATGGGGGCTT 3430
    TCGTAT CAAT
    8028 8050 CCCCCATTCGTATAATA 1662 TGTAATTATTATACGA 3431
    ATTACA ATGG
    8029 8051 CCCCATTCGTATAATAA 1663 ATGTAATTATTATACG 3432
    TTACAT AATG
    8030 8052 CCCATTCGTATAATAAT 1664 GATGTAATTATTATAC 3433
    TACATC GAAT
    8031 8053 CCATTCGTATAATAATT 1665 TGATGTAATTATTATA 3434
    ACATCA CGAA
    8080 8102 CCCCACATTAGGCTTAA 1666 CTGTTTTTAAGCCTAA 3435
    AAACAG TGTG
    8081 8103 CCCACATTAGGCTTAAA 1667 TCTGTTTTTAAGCCTA 3436
    AACAGA ATGT
    8082 8104 CCACATTAGGCTTAAAA 1668 ATCTGTTTTTAAGCCT 3437
    ACAGAT AATG
    8111 8133 CCCGGACGTCTAAACCA 1669 GTGGTTTGGTTTAGAC 3438
    AACCAC GTCC
    8112 8134 CCGGACGTCTAAACCAA 1670 AGTGGTTTGGTTTAGA 3439
    ACCACT CGTC
    8125 8147 CCAAACCACTTTCACCG 1671 GTGTAGCGGTGAAAGT 3440
    CTACAC GGTT
    8130 8152 CCACTTTCACCGCTACA 1672 CGGTCGTGTAGCGGTG 3441
    CGACCG AAAG
    8139 8161 CCGCTACACGACCGGGG 1673 GTATACCCCCGGTCGT 3442
    GTATAC GTAG
    8150 8172 CCGGGGGTATACTACGG 1674 CATTGACCGTAGTATA 3443
    TCAATG CCCC
    8194 8216 CCACAGTTTCATGCCCA 1675 GGACGATGGGCATGA 3444
    TCGTCC AACTG
    8207 8229 CCCATCGTCCTAGAATT 1676 GGAATTAATTCTAGGA 3445
    AATTCC CGAT
    8208 8230 CCATCGTCCTAGAATTA 1677 GGGAATTAATTCTAGG 3446
    ATTCCC ACGA
    8215 8237 CCTAGAATTAATTCCCC 1678 TTTTTAGGGGAATTAA 3447
    TAAAAA TTCT
    8228 8250 CCCCTAAAAATCTTTGA 1679 CCTATTTCAAAGATTT 3448
    AATAGG TTAG
    8229 8251 CCCTAAAAATCTTTGAA 1680 CCCTATTTCAAAGATT 3449
    ATAGGG TTTA
    8230 8252 CCTAAAAATCTTTGAAA 1681 GCCCTATTTCAAAGAT 3450
    TAGGGC TTTT
    8252 8274 CCCGTATTTACCCTATA 1682 GGGTGCTATAGGGTAA 3451
    GCACCC ATAC
    8253 8275 CCGTATTTACCCTATAG 1683 GGGGTGCTATAGGGTA 3452
    CACCCC AATA
    8262 8284 CCCTATAGCACCCCCTC 1684 GGGGTAGAGGGGGTG 3453
    TACCCC CTATA
    8263 8285 CCTATAGCACCCCCTCT 1685 GGGGGTAGAGGGGGT 3454
    ACCCCC GCTAT
    8272 8294 CCCCCTCTACCCCCTCT 1686 GGCTCTAGAGGGGGTA 3455
    AGAGCC GAGG
    8273 8295 CCCCTCTACCCCCTCTA 1687 GGGCTCTAGAGGGGGT 3456
    GAGCCC AGAG
    8274 8296 CCCTCTACCCCCTCTAG 1688 TGGGCTCTAGAGGGGG 3457
    AGCCCA TAGA
    8275 8297 CCTCTACCCCCTCTAGA 1689 GTGGGCTCTAGAGGGG 3458
    GCCCAC GTAG
    8281 8303 CCCCCTCTAGAGCCCAC 1690 TTTACAGTGGGCTCTA 3459
    TGTAAA GAGG
    8282 8304 CCCCTCTAGAGCCCACT 1691 CTTTACAGTGGGCTCT 3460
    GTAAAG AGAG
    8283 8305 CCCTCTAGAGCCCACTG 1692 GCTTTACAGTGGGCTC 3461
    TAAAGC TAGA
    8284 8306 CCTCTAGAGCCCACTGT 1693 AGCTTTACAGTGGGCT 3462
    AAAGCT CTAG
    8293 8315 CCCACTGTAAAGCTAAC 1694 TGCTAAGTTAGCTTTA 3463
    TTAGCA CAGT
    8294 8316 CCACTGTAAAGCTAACT 1695 ATGCTAAGTTAGCTTT 3464
    TAGCAT ACAG
    8320 8342 CCTTTTAAGTTAAAGAT 1696 CTCTTAATCTTTAACTT 3465
    TAAGAG AAA
    8345 8367 CCAACACCTCTTTACAG 1697 ATTTCACTGTAAAGAG 3466
    TGAAAT GTGT
    8351 8373 CCTCTTTACAGTGAAAT 1698 TGGGGCATTTCACTGT 3467
    GCCCCA AAAG
    8369 8391 CCCCAACTAAATACTAC 1699 CATACGGTAGTATTTA 3468
    CGTATG GTTG
    8370 8392 CCCAACTAAATACTACC 1700 CCATACGGTAGTATTT 3469
    GTATGG AGTT
    8371 8393 CCAACTAAATACTACCG 1701 GCCATACGGTAGTATT 3470
    TATGGC TAGT
    8385 8407 CCGTATGGCCCACCATA 1702 GGTAATTATGGTGGGC 3471
    ATTACC CATA
    8393 8415 CCCACCATAATTACCCC 1703 AGTATGGGGGTAATTA 3472
    CATACT TGGT
    8394 8416 CCACCATAATTACCCCC 1704 GAGTATGGGGGTAATT 3473
    ATACTC ATGG
    8397 8419 CCATAATTACCCCCATA 1705 AAGGAGTATGGGGGT 3474
    CTCCTT AATTA
    8406 8428 CCCCCATACTCCTTACA 1706 GAATAGTGTAAGGAGT 3475
    CTATTC ATGG
    8407 8429 CCCCATACTCCTTACAC 1707 GGAATAGTGTAAGGA 3476
    TATTCC GTATG
    8408 8430 CCCATACTCCTTACACT 1708 AGGAATAGTGTAAGG 3477
    ATTCCT AGTAT
    8409 8431 CCATACTCCTTACACTA 1709 GAGGAATAGTGTAAG 3478
    TTCCTC GAGTA
    8416 8438 CCTTACACTATTCCTCA 1710 GGGTGATGAGGAATA 3479
    TCACCC GTGTA
    8428 8450 CCTCATCACCCAACTAA 1711 ATATTTTTAGTTGGGT 3480
    AAATAT GATG
    8436 8458 CCCAACTAAAAATATTA 1712 TGTGTTTAATATTTTTA 3481
    AACACA GTT
    8437 8459 CCAACTAAAAATATTAA 1713 TTGTGTTTAATATTTTT 3482
    ACACAA AGT
    8464 8486 CCACCTACCTCCCTCAC 1714 GCTTTGGTGAGGGAGG 3483
    CAAAGC TAGG
    8467 8489 CCTACCTCCCTCACCAA 1715 TGGGCTTTGGTGAGGG 3484
    AGCCCA AGGT
    8471 8493 CCTCCCTCACCAAAGCC 1716 TTTATGGGCTTTGGTG 3485
    CATAAA AGGG
    8474 8496 CCCTCACCAAAGCCCAT 1717 ATTTTTATGGGCTTTG 3486
    AAAAAT GTGA
    8475 8497 CCTCACCAAAGCCCATA 1718 TATTTTTATGGGCTTTG 3487
    AAAATA GTG
    8480 8502 CCAAAGCCCATAAAAAT 1719 TTTTTTATTTTTATGGG 3488
    AAAAAA CTT
    8486 8508 CCCATAAAAATAAAAA 1720 TTATAATTTTTTATTTT 3489
    ATTATAA TAT
    8487 8509 CCATAAAAATAAAAAA 1721 GTTATAATTTTTTATTT 3490
    TTATAAC TTA
    8513 8535 CCCTGAGAACCAAAATG 1722 TTCGTTCATTTTGGTTC 3491
    AACGAA TCA
    8514 8536 CCTGAGAACCAAAATG 1723 TTTCGTTCATTTTGGTT 3492
    AACGAAA CTC
    8522 8544 CCAAAATGAACGAAAA 1724 GAACAGATTTTCGTTC 3493
    TCTGTTC ATTT
    8558 8580 CCCCCACAATCCTAGGC 1725 GGGTAGGCCTAGGATT 3494
    CTACCC GTGG
    8559 8581 CCCCACAATCCTAGGCC 1726 CGGGTAGGCCTAGGAT 3495
    TACCCG TGTG
    8560 8582 CCCACAATCCTAGGCCT 1727 GCGGGTAGGCCTAGG 3496
    ACCCGC ATTGT
    8561 8583 CCACAATCCTAGGCCTA 1728 GGCGGGTAGGCCTAG 3497
    CCCGCC GATTG
    8568 8590 CCTAGGCCTACCCGCCG 1729 GTACTGCGGCGGGTAG 3498
    CAGTAC GCCT
    8574 8596 CCTACCCGCCGCAGTAC 1730 TGATCAGTACTGCGGC 3499
    TGATCA GGGT
    8578 8600 CCCGCCGCAGTACTGAT 1731 AGAATGATCAGTACTG 3500
    CATTCT CGGC
    8579 8601 CCGCCGCAGTACTGATC 1732 TAGAATGATCAGTACT 3501
    ATTCTA GCGG
    8582 8604 CCGCAGTACTGATCATT 1733 AAATAGAATGATCAGT 3502
    CTATTT ACTG
    8605 8627 CCCCCTCTATTGATCCC 1734 GAGGTGGGGATCAAT 3503
    CACCTC AGAGG
    8606 8628 CCCCTCTATTGATCCCC 1735 GGAGGTGGGGATCAA 3504
    ACCTCC TAGAG
    8607 8629 CCCTCTATTGATCCCCA 1736 TGGAGGTGGGGATCA 3505
    CCTCCA ATAGA
    8608 8630 CCTCTATTGATCCCCAC 1737 TTGGAGGTGGGGATCA 3506
    CTCCAA ATAG
    8619 8641 CCCCACCTCCAAATATC 1738 TGATGAGATATTTGGA 3507
    TCATCA GGTG
    8620 8642 CCCACCTCCAAATATCT 1739 TTGATGAGATATTTGG 3508
    CATCAA AGGT
    8621 8643 CCACCTCCAAATATCTC 1740 GTTGATGAGATATTTG 3509
    ATCAAC GAGG
    8624 8646 CCTCCAAATATCTCATC 1741 GTTGTTGATGAGATAT 3510
    AACAAC TTGG
    8627 8649 CCAAATATCTCATCAAC 1742 TCGGTTGTTGATGAGA 3511
    AACCGA TATT
    8646 8668 CCGACTAATCACCACCC 1743 ATTGTTGGGTGGTGAT 3512
    AACAAT TAGT
    8657 8679 CCACCCAACAATGACTA 1744 TTTGATTAGTCATTGTT 3513
    ATCAAA GGG
    8660 8682 CCCAACAATGACTAATC 1745 TAGTTTGATTAGTCAT 3514
    AAACTA TGTT
    8661 8683 CCAACAATGACTAATCA 1746 TTAGTTTGATTAGTCA 3515
    AACTAA TTGT
    8684 8706 CCTCAAAACAAATGATA 1747 TATGGTTATCATTTGTT 3516
    ACCATA TTG
    8702 8724 CCATACACAACACTAAA 1748 TCGTCCTTTAGTGTTGT 3517
    GGACGA GTA
    8726 8748 CCTGATCTCTTATACTA 1749 GGATACTAGTATAAGA 3518
    GTATCC GATC
    8747 8769 CCTTAATCATTTTTATTG 1750 TGTGGCAATAAAAATG 3519
    CCACA ATTA
    8765 8787 CCACAACTAACCTCCTC 1751 GAGTCCGAGGAGGTTA 3520
    GGACTC GTTG
    8775 8797 CCTCCTCGGACTCCTGC 1752 AGTGAGGCAGGAGTC 3521
    CTCACT CGAGG
    8778 8800 CCTCGGACTCCTGCCTC 1753 ATGAGTGAGGCAGGA 3522
    ACTCAT GTCCG
    8787 8809 CCTGCCTCACTCATTTA 1754 TTGGTGTAAATGAGTG 3523
    CACCAA AGGC
    8791 8813 CCTCACTCATTTACACC 1755 GTGGTTGGTGTAAATG 3524
    AACCAC AGTG
    8806 8828 CCAACCACCCAACTATC 1756 TTTATAGATAGTTGGG 3525
    TATAAA TGGT
    8810 8832 CCACCCAACTATCTATA 1757 TAGGTTTATAGATAGT 3526
    AACCTA TGGG
    8813 8835 CCCAACTATCTATAAAC 1758 GGCTAGGTTTATAGAT 3527
    CTAGCC AGTT
    8814 8836 CCAACTATCTATAAACC 1759 TGGCTAGGTTTATAGA 3528
    TAGCCA TAGT
    8829 8851 CCTAGCCATGGCCATCC 1760 ATAAGGGGATGGCCAT 3529
    CCTTAT GGCT
    8834 8856 CCATGGCCATCCCCTTA 1761 CGCTCATAAGGGGATG 3530
    TGAGCG GCCA
    8840 8862 CCATCCCCTTATGAGCG 1762 TGTGCCCGCTCATAAG 3531
    GGCACA GGGA
    8844 8866 CCCCTTATGAGCGGGCA 1763 TCACTGTGCCCGCTCA 3532
    CAGTGA TAAG
    8845 8867 CCCTTATGAGCGGGCAC 1764 ATCACTGTGCCCGCTC 3533
    AGTGAT ATAA
    8846 8868 CCTTATGAGCGGGCACA 1765 AATCACTGTGCCCGCT 3534
    GTGATT CATA
    8897 8919 CCCTAGCCCACTTCTTA 1766 TTGTGGTAAGAAGTGG 3535
    CCACAA GCTA
    8898 8920 CCTAGCCCACTTCTTAC 1767 CTTGTGGTAAGAAGTG 3536
    CACAAG GGCT
    8903 8925 CCCACTTCTTACCACAA 1768 TGTGCCTTGTGGTAAG 3537
    GGCACA AAGT
    8904 8926 CCACTTCTTACCACAAG 1769 GTGTGCCTTGTGGTAA 3538
    GCACAC GAAG
    8914 8936 CCACAAGGCACACCTAC 1770 AGGGGTGTAGGTGTGC 3539
    ACCCCT CTTG
    8926 8948 CCTACACCCCTTATCCC 1771 AGTATGGGGATAAGG 3540
    CATACT GGTGT
    8932 8954 CCCCTTATCCCCATACT 1772 ATAACTAGTATGGGGA 3541
    AGTTAT TAAG
    8933 8955 CCCTTATCCCCATACTA 1773 AATAACTAGTATGGGG 3542
    GTTATT ATAA
    8934 8956 CCTTATCCCCATACTAG 1774 TAATAACTAGTATGGG 3543
    TTATTA GATA
    8940 8962 CCCCATACTAGTTATTA 1775 TTTCGATAATAACTAG 3544
    TCGAAA TATG
    8941 8963 CCCATACTAGTTATTAT 1776 GTTTCGATAATAACTA 3545
    CGAAAC GTAT
    8942 8964 CCATACTAGTTATTATC 1777 GGTTTCGATAATAACT 3546
    GAAACC AGTA
    8963 8985 CCATCAGCCTACTCATT 1778 TGGTTGAATGAGTAGG 3547
    CAACCA CTGA
    8970 8992 CCTACTCATTCAACCAA 1779 GGGCTATTGGTTGAAT 3548
    TAGCCC GAGT
    8983 9005 CCAATAGCCCTGGCCGT 1780 AGGCGTACGGCCAGG 3549
    ACGCCT GCTAT
    8990 9012 CCCTGGCCGTACGCCTA 1781 AGCGGTTAGGCGTACG 3550
    ACCGCT GCCA
    8991 9013 CCTGGCCGTACGCCTAA 1782 TAGCGGTTAGGCGTAC 3551
    CCGCTA GGCC
    8996 9018 CCGTACGCCTAACCGCT 1783 AATGTTAGCGGTTAGG 3552
    AACATT CGTA
    9003 9025 CCTAACCGCTAACATTA 1784 CTGCAGTAATGTTAGC 3553
    CTGCAG GGTT
    9008 9030 CCGCTAACATTACTGCA 1785 GTGGCCTGCAGTAATG 3554
    GGCCAC TTAG
    9027 9049 CCACCTACTCATGCACC 1786 CAATTAGGTGCATGAG 3555
    TAATTG TAGG
    9030 9052 CCTACTCATGCACCTAA 1787 TTCCAATTAGGTGCAT 3556
    TTGGAA GAGT
    9042 9064 CCTAATTGGAAGCGCCA 1788 CTAGGGTGGCGCTTCC 3557
    CCCTAG AATT
    9056 9078 CCACCCTAGCAATATCA 1789 AATGGTTGATATTGCT 3558
    ACCATT AGGG
    9059 9081 CCCTAGCAATATCAACC 1790 GTTAATGGTTGATATT 3559
    ATTAAC GCTA
    9060 9082 CCTAGCAATATCAACCA 1791 GGTTAATGGTTGATAT 3560
    TTAACC TGCT
    9074 9096 CCATTAACCTTCCCTCT 1792 AAGTGTAGAGGGAAG 3561
    ACACTT GTTAA
    9081 9103 CCTTCCCTCTACACTTAT 1793 AGATGATAAGTGTAGA 3562
    CATCT GGGA
    9085 9107 CCCTCTACACTTATCAT 1794 GTGAAGATGATAAGTG 3563
    CTTCAC TAGA
    9086 9108 CCTCTACACTTATCATC 1795 TGTGAAGATGATAAGT 3564
    TTCACA GTAG
    9129 9151 CCTAGAAATCGCTGTCG 1796 TTAAGGCGACAGCGAT 3565
    CCTTAA TTCT
    9146 9168 CCTTAATCCAAGCCTAC 1797 GAAAACGTAGGCTTGG 3566
    GTTTTC ATTA
    9153 9175 CCAAGCCTACGTTTTCA 1798 GAAGTGTGAAAACGT 3567
    CACTTC AGGCT
    9158 9180 CCTACGTTTTCACACTT 1799 TACTAGAAGTGTGAAA 3568
    CTAGTA ACGT
    9183 9205 CCTCTACCTGCACGACA 1800 ATGTGTTGTCGTGCAG 3569
    ACACAT GTAG
    9189 9211 CCTGCACGACAACACAT 1801 GTCATTATGTGTTGTC 3570
    AATGAC GTGC
    9211 9233 CCCACCAATCACATGCC 1802 ATGATAGGCATGTGAT 3571
    TATCAT TGGT
    9212 9234 CCACCAATCACATGCCT 1803 TATGATAGGCATGTGA 3572
    ATCATA TTGG
    9215 9237 CCAATCACATGCCTATC 1804 CTATATGATAGGCATG 3573
    ATATAG TGAT
    9226 9248 CCTATCATATAGTAAAA 1805 GCTGGGTTTTACTATA 3574
    CCCAGC TGAT
    9243 9265 CCCAGCCCATGACCCCT 1806 CCTGTTAGGGGTCATG 3575
    AACAGG GGCT
    9244 9266 CCAGCCCATGACCCCTA 1807 CCCTGTTAGGGGTCAT 3576
    ACAGGG GGGC
    9248 9270 CCCATGACCCCTAACAG 1808 GGGCCCCTGTTAGGGG 3577
    GGGCCC TCAT
    9249 9271 CCATGACCCCTAACAGG 1809 AGGGCCCCTGTTAGGG 3578
    GGCCCT GTCA
    9255 9277 CCCCTAACAGGGGCCCT 1810 GCTGAGAGGGCCCCTG 3579
    CTCAGC TTAG
    9256 9278 CCCTAACAGGGGCCCTC 1811 GGCTGAGAGGGCCCCT 3580
    TCAGCC GTTA
    9257 9279 CCTAACAGGGGCCCTCT 1812 GGGCTGAGAGGGCCC 3581
    CAGCCC CTGTT
    9268 9290 CCCTCTCAGCCCTCCTA 1813 GGTCATTAGGAGGGCT 3582
    ATGACC GAGA
    9269 9291 CCTCTCAGCCCTCCTAA 1814 AGGTCATTAGGAGGGC 3583
    TGACCT TGAG
    9277 9299 CCCTCCTAATGACCTCC 1815 TAGGCCGGAGGTCATT 3584
    GGCCTA AGGA
    9278 9300 CCTCCTAATGACCTCCG 1816 CTAGGCCGGAGGTCAT 3585
    GCCTAG TAGG
    9281 9303 CCTAATGACCTCCGGCC 1817 TGGCTAGGCCGGAGGT 3586
    TAGCCA CATT
    9289 9311 CCTCCGGCCTAGCCATG 1818 AAATCACATGGCTAGG 3587
    TGATTT CCGG
    9292 9314 CCGGCCTAGCCATGTGA 1819 GTGAAATCACATGGCT 3588
    TTTCAC AGGC
    9296 9318 CCTAGCCATGTGATTTC 1820 GGAAGTGAAATCACAT 3589
    ACTTCC GGCT
    9301 9323 CCATGTGATTTCACTTC 1821 GGAGTGGAAGTGAAA 3590
    CACTCC TCACA
    9317 9339 CCACTCCATAACGCTCC 1822 GTATGAGGAGCGTTAT 3591
    TCATAC GGAG
    9322 9344 CCATAACGCTCCTCATA 1823 GCCTAGTATGAGGAGC 3592
    CTAGGC GTTA
    9332 9354 CCTCATACTAGGCCTAC 1824 TGGTTAGTAGGCCTAG 3593
    TAACCA TATG
    9344 9366 CCTACTAACCAACACAC 1825 TGGTTAGTGTGTTGGT 3594
    TAACCA TAGT
    9352 9374 CCAACACACTAACCATA 1826 TTGGTATATGGTTAGT 3595
    TACCAA GTGT
    9364 9386 CCATATACCAATGATGG 1827 ATCGCGCCATCATTGG 3596
    CGCGAT TATA
    9371 9393 CCAATGATGGCGCGATG 1828 GTGTTACATCGCGCCA 3597
    TAACAC TCAT
    9407 9429 CCAAGGCCACCACACAC 1829 CAGGTGGTGTGTGGTG 3598
    CACCTG GCCT
    9413 9435 CCACCACACACCACCTG 1830 TTTGGACAGGTGGTGT 3599
    TCCAAA GTGG
    9416 9438 CCACACACCACCTGTCC 1831 CTTTTTGGACAGGTGG 3600
    AAAAAG TGTG
    9423 9445 CCACCTGTCCAAAAAGG 1832 CGAAGGCCTTTTTGGA 3601
    CCTTCG CAGG
    9426 9448 CCTGTCCAAAAAGGCCT 1833 TATCGAAGGCCTTTTT 3602
    TCGATA GGAC
    9431 9453 CCAAAAAGGCCTTCGAT 1834 TCCCGTATCGAAGGCC 3603
    ACGGGA TTTT
    9440 9462 CCTTCGATACGGGATAA 1835 ATAGGATTATCCCGTA 3604
    TCCTAT TCGA
    9458 9480 CCTATTTATTACCTCAG 1836 AAACTTCTGAGGTAAT 3605
    AAGTTT AAAT
    9469 9491 CCTCAGAAGTTTTTTTCT 1837 TGCGAAGAAAAAAAC 3606
    TCGCA TTCTG
    9505 9527 CCTTTTACCACTCCAGC 1838 GGCTAGGCTGGAGTGG 3607
    CTAGCC TAAA
    9512 9534 CCACTCCAGCCTAGCCC 1839 GGGTAGGGGCTAGGCT 3608
    CTACCC GGAG
    9517 9539 CCAGCCTAGCCCCTACC 1840 TTGGGGGGTAGGGGCT 3609
    CCCCAA AGGC
    9521 9543 CCTAGCCCCTACCCCCC 1841 CTAATTGGGGGGTAGG 3610
    AATTAG GGCT
    9526 9548 CCCCTACCCCCCAATTA 1842 CCCTCCTAATTGGGGG 3611
    GGAGGG GTAG
    9527 9549 CCCTACCCCCCAATTAG 1843 GCCCTCCTAATTGGGG 3612
    GAGGGC GGTA
    9528 9550 CCTACCCCCCAATTAGG 1844 TGCCCTCCTAATTGGG 3613
    AGGGCA GGGT
    9532 9554 CCCCCCAATTAGGAGGG 1845 CCAGTGCCCTCCTAAT 3614
    CACTGG TGGG
    9533 9555 CCCCCAATTAGGAGGGC 1846 GCCAGTGCCCTCCTAA 3615
    ACTGGC TTGG
    9534 9556 CCCCAATTAGGAGGGCA 1847 GGCCAGTGCCCTCCTA 3616
    CTGGCC ATTG
    9535 9557 CCCAATTAGGAGGGCAC 1848 GGGCCAGTGCCCTCCT 3617
    TGGCCC AATT
    9536 9558 CCAATTAGGAGGGCACT 1849 GGGGCCAGTGCCCTCC 3618
    GGCCCC TAAT
    9555 9577 CCCCCAACAGGCATCAC 1850 AGCGGGGTGATGCCTG 3619
    CCCGCT TTGG
    9556 9578 CCCCAACAGGCATCACC 1851 TAGCGGGGTGATGCCT 3620
    CCGCTA GTTG
    9557 9579 CCCAACAGGCATCACCC 1852 TTAGCGGGGTGATGCC 3621
    CGCTAA TGTT
    9558 9580 CCAACAGGCATCACCCC 1853 TTTAGCGGGGTGATGC 3622
    GCTAAA CTGT
    9571 9593 CCCCGCTAAATCCCCTA 1854 GACTTCTAGGGGATTT 3623
    GAAGTC AGCG
    9572 9594 CCCGCTAAATCCCCTAG 1855 GGACTTCTAGGGGATT 3624
    AAGTCC TAGC
    9573 9595 CCGCTAAATCCCCTAGA 1856 GGGACTTCTAGGGGAT 3625
    AGTCCC TTAG
    9582 9604 CCCCTAGAAGTCCCACT 1857 TTTAGGAGTGGGACTT 3626
    CCTAAA CTAG
    9583 9605 CCCTAGAAGTCCCACTC 1858 GTTTAGGAGTGGGACT 3627
    CTAAAC TCTA
    9584 9606 CCTAGAAGTCCCACTCC 1859 TGTTTAGGAGTGGGAC 3628
    TAAACA TTCT
    9593 9615 CCCACTCCTAAACACAT 1860 ATACGGATGTGTTTAG 3629
    CCGTAT GAGT
    9594 9616 CCACTCCTAAACACATC 1861 AATACGGATGTGTTTA 3630
    CGTATT GGAG
    9599 9621 CCTAAACACATCCGTAT 1862 CGAGTAATACGGATGT 3631
    TACTCG GTTT
    9610 9632 CCGTATTACTCGCATCA 1863 TACTCCTGATGCGAGT 3632
    GGAGTA AATA
    9640 9662 CCTGAGCTCACCATAGT 1864 TATTAGACTATGGTGA 3633
    CTAATA GCTC
    9650 9672 CCATAGTCTAATAGAAA 1865 GGTTGTTTTCTATTAG 3634
    ACAACC ACTA
    9671 9693 CCGAAACCAAATAATTC 1866 GTGCTTGAATTATTTG 3635
    AAGCAC GTTT
    9677 9699 CCAAATAATTCAAGCAC 1867 TAAGCAGTGCTTGAAT 3636
    TGCTTA TATT
    9727 9749 CCCTCCTACAAGCCTCA 1868 GTACTCTGAGGCTTGT 3637
    GAGTAC AGGA
    9728 9750 CCTCCTACAAGCCTCAG 1869 AGTACTCTGAGGCTTG 3638
    AGTACT TAGG
    9731 9753 CCTACAAGCCTCAGAGT 1870 CGAAGTACTCTGAGGC 3639
    ACTTCG TTGT
    9739 9761 CCTCAGAGTACTTCGAG 1871 GGGAGACTCGAAGTA 3640
    TCTCCC CTCTG
    9759 9781 CCCTTCACCATTTCCGA 1872 ATGCCGTCGGAAATGG 3641
    CGGCAT TGAA
    9760 9782 CCTTCACCATTTCCGAC 1873 GATGCCGTCGGAAATG 3642
    GGCATC GTGA
    9766 9788 CCATTTCCGACGGCATC 1874 GCCGTAGATGCCGTCG 3643
    TACGGC GAAA
    9772 9794 CCGACGGCATCTACGGC 1875 TGTTGAGCCGTAGATG 3644
    TCAACA CCGT
    9805 9827 CCACAGGCTTCCACGGA 1876 GTGAAGTCCGTGGAAG 3645
    CTTCAC CCTG
    9815 9837 CCACGGACTTCACGTCA 1877 CAATAATGACGTGAAG 3646
    TTATTG TCCG
    9848 9870 CCTCACTATCTGCTTCA 1878 GGCGGATGAAGCAGA 3647
    TCCGCC TAGTG
    9866 9888 CCGCCAACTAATATTTC 1879 TAAAGTGAAATATTAG 3648
    ACTTTA TTGG
    9869 9891 CCAACTAATATTTCACT 1880 ATGTAAAGTGAAATAT 3649
    TTACAT TAGT
    9892 9914 CCAAACATCACTTTGGC 1881 TTCGAAGCCAAAGTGA 3650
    TTCGAA TGTT
    9916 9938 CCGCCGCCTGATACTGG 1882 AAAATGCCAGTATCAG 3651
    CATTTT GCGG
    9919 9941 CCGCCTGATACTGGCAT 1883 TACAAAATGCCAGTAT 3652
    TTTGTA CAGG
    9922 9944 CCTGATACTGGCATTTT 1884 ATCTACAAAATGCCAG 3653
    GTAGAT TATC
    9970 9992 CCATCTATTGATGAGGG 1885 GTAAGACCCTCATCAA 3654
    TCTTAC TAGA
    10012 10034 CCGTTAACTTCCAATTA 1886 ACTAGTTAATTGGAAG 3655
    ACTAGT TTAA
    10022 10044 CCAATTAACTAGTTTTG 1887 TGTTGTCAAAACTAGT 3656
    ACAACA TAAT
    10069 10091 CCTTAATTTTAATAATC 1888 GGTGTTGATTATTAAA 3657
    AACACC ATTA
    10090 10112 CCCTCCTAGCCTTACTA 1889 TATTAGTAGTAAGGCT 3658
    CTAATA AGGA
    10091 10113 CCTCCTAGCCTTACTAC 1890 TTATTAGTAGTAAGGC 3659
    TAATAA TAGG
    10094 10116 CCTAGCCTTACTACTAA 1891 TAATTATTAGTAGTAA 3660
    TAATTA GGCT
    10099 10121 CCTTACTACTAATAATT 1892 TGTAATAATTATTAGT 3661
    ATTACA AGTA
    10131 10153 CCACAACTCAACGGCTA 1893 TCTATGTAGCCGTTGA 3662
    CATAGA GTTG
    10159 10181 CCACCCCTTACGAGTGC 1894 GAAGCCGCACTCGTAA 3663
    GGCTTC GGGG
    10162 10184 CCCCTTACGAGTGCGGC 1895 GTCGAAGCCGCACTCG 3664
    TTCGAC TAAG
    10163 10185 CCCTTACGAGTGCGGCT 1896 GGTCGAAGCCGCACTC 3665
    TCGACC GTAA
    10164 10186 CCTTACGAGTGCGGCTT 1897 GGGTCGAAGCCGCACT 3666
    CGACCC CGTA
    10184 10206 CCCTATATCCCCCGCCC 1898 GGACGCGGGCGGGGG 3667
    GCGTCC ATATA
    10185 10207 CCTATATCCCCCGCCCG 1899 GGGACGCGGGCGGGG 3668
    CGTCCC GATAT
    10192 10214 CCCCCGCCCGCGTCCCT 1900 GGAGAAAGGGACGCG 3669
    TTCTCC GGCGG
    10193 10215 CCCCGCCCGCGTCCCTT 1901 TGGAGAAAGGGACGC 3670
    TCTCCA GGGCG
    10194 10216 CCCGCCCGCGTCCCTTT 1902 ATGGAGAAAGGGACG 3671
    CTCCAT CGGGC
    10195 10217 CCGCCCGCGTCCCTTTC 1903 TATGGAGAAAGGGAC 3672
    TCCATA GCGGG
    10198 10220 CCCGCGTCCCTTTCTCC 1904 TTTTATGGAGAAAGGG 3673
    ATAAAA ACGC
    10199 10221 CCGCGTCCCTTTCTCCA 1905 ATTTTATGGAGAAAGG 3674
    TAAAAT GACG
    10205 10227 CCCTTTCTCCATAAAAT 1906 AGAAGAATTTTATGGA 3675
    TCTTCT GAAA
    10206 10228 CCTTTCTCCATAAAATT 1907 AAGAAGAATTTTATGG 3676
    CTTCTT AGAA
    10213 10235 CCATAAAATTCTTCTTA 1908 AGCTACTAAGAAGAAT 3677
    GTAGCT TTTA
    10240 10262 CCTTCTTATTATTTGATC 1909 TTCTAGATCAAATAAT 3678
    TAGAA AAGA
    10267 10289 CCCTCCTTTTACCCCTAC 1910 TCATGGTAGGGGTAAA 3679
    CATGA AGGA
    10268 10290 CCTCCTTTTACCCCTACC 1911 CTCATGGTAGGGGTAA 3680
    ATGAG AAGG
    10271 10293 CCTTTTACCCCTACCAT 1912 GGGCTCATGGTAGGGG 3681
    GAGCCC TAAA
    10278 10300 CCCCTACCATGAGCCCT 1913 GTTTGTAGGGCTCATG 3682
    ACAAAC GTAG
    10279 10301 CCCTACCATGAGCCCTA 1914 TGTTTGTAGGGCTCAT 3683
    CAAACA GGTA
    10280 10302 CCTACCATGAGCCCTAC 1915 TTGTTTGTAGGGCTCA 3684
    AAACAA TGGT
    10284 10306 CCATGAGCCCTACAAAC 1916 TTAGTTGTTTGTAGGG 3685
    AACTAA CTCA
    10291 10313 CCCTACAAACAACTAAC 1917 TGGCAGGTTAGTTGTT 3686
    CTGCCA TGTA
    10292 10314 CCTACAAACAACTAACC 1918 GTGGCAGGTTAGTTGT 3687
    TGCCAC TTGT
    10307 10329 CCTGCCACTAATAGTTA 1919 ATGACATAACTATTAG 3688
    TGTCAT TGGC
    10311 10333 CCACTAATAGTTATGTC 1920 AGGGATGACATAACTA 3689
    ATCCCT TTAG
    10330 10352 CCCTCTTATTAATCATC 1921 TAGGATGATGATTAAT 3690
    ATCCTA AAGA
    10331 10353 CCTCTTATTAATCATCA 1922 CTAGGATGATGATTAA 3691
    TCCTAG TAAG
    10349 10371 CCTAGCCCTAAGTCTGG 1923 CATAGGCCAGACTTAG 3692
    CCTATG GGCT
    10354 10376 CCCTAAGTCTGGCCTAT 1924 TCACTCATAGGCCAGA 3693
    GAGTGA CTTA
    10355 10377 CCTAAGTCTGGCCTATG 1925 GTCACTCATAGGCCAG 3694
    AGTGAC ACTT
    10366 10388 CCTATGAGTGACTACAA 1926 TCCTTTTTGTAGTCACT 3695
    AAAGGA CAT
    10399 10421 CCGAATTGGTATATAGT 1927 GTTTAAACTATATACC 3696
    TTAAAC AATT
    10466 10488 CCAAATGCCCCTCATTT 1928 TTATGTAAATGAGGGG 3697
    ACATAA CATT
    10473 10495 CCCCTCATTTACATAAA 1929 ATAATATTTATGTAAA 3698
    TATTAT TGAG
    10474 10496 CCCTCATTTACATAAAT 1930 TATAATATTTATGTAA 3699
    ATTATA ATGA
    10475 10497 CCTCATTTACATAAATA 1931 GTATAATATTTATGTA 3700
    TTATAC AATG
    10507 10529 CCATCTCACTTCTAGGA 1932 TAGTATTCCTAGAAGT 3701
    ATACTA GAGA
    10544 10566 CCTCATATCCTCCCTAC 1933 GGCATAGTAGGGAGG 3702
    TATGCC ATATG
    10552 10574 CCTCCCTACTATGCCTA 1934 TCCTTCTAGGCATAGT 3703
    GAAGGA AGGG
    10555 10577 CCCTACTATGCCTAGAA 1935 TATTCCTTCTAGGCAT 3704
    GGAATA AGTA
    10556 10578 CCTACTATGCCTAGAAG 1936 TTATTCCTTCTAGGCA 3705
    GAATAA TAGT
    10565 10587 CCTAGAAGGAATAATAC 1937 GCGATAGTATTATTCC 3706
    TATCGC TTCT
    10612 10634 CCCTCAACACCCACTCC 1938 TAAGAGGGAGTGGGT 3707
    CTCTTA GTTGA
    10613 10635 CCTCAACACCCACTCCC 1939 CTAAGAGGGAGTGGG 3708
    TCTTAG TGTTG
    10621 10643 CCCACTCCCTCTTAGCC 1940 AATATTGGCTAAGAGG 3709
    AATATT GAGT
    10622 10644 CCACTCCCTCTTAGCCA 1941 CAATATTGGCTAAGAG 3710
    ATATTG GGAG
    10627 10649 CCCTCTTAGCCAATATT 1942 AGGCACAATATTGGCT 3711
    GTGCCT AAGA
    10628 10650 CCTCTTAGCCAATATTG 1943 TAGGCACAATATTGGC 3712
    TGCCTA TAAG
    10636 10658 CCAATATTGTGCCTATT 1944 TATGGCAATAGGCACA 3713
    GCCATA ATAT
    10647 10669 CCTATTGCCATACTAGT 1945 GCAAAGACTAGTATGG 3714
    CTTTGC CAAT
    10654 10676 CCATACTAGTCTTTGCC 1946 GCAGGCGGCAAAGAC 3715
    GCCTGC TAGTA
    10669 10691 CCGCCTGCGAAGCAGCG 1947 GCCCACCGCTGCTTCG 3716
    GTGGGC CAGG
    10672 10694 CCTGCGAAGCAGCGGTG 1948 TAGGCCCACCGCTGCT 3717
    GGCCTA TCGC
    10691 10713 CCTAGCCCTACTAGTCT 1949 AGATTGAGACTAGTAG 3718
    CAATCT GGCT
    10696 10718 CCCTACTAGTCTCAATC 1950 GTTGGAGATTGAGACT 3719
    TCCAAC AGTA
    10697 10719 CCTACTAGTCTCAATCT 1951 TGTTGGAGATTGAGAC 3720
    CCAACA TAGT
    10714 10736 CCAACACATATGGCCTA 1952 GTAGTCTAGGCCATAT 3721
    GACTAC GTGT
    10727 10749 CCTAGACTACGTACATA 1953 TTAGGTTATGTACGTA 3722
    ACCTAA GTCT
    10745 10767 CCTAAACCTACTCCAAT 1954 TTTAGCATTGGAGTAG 3723
    GCTAAA GTTT
    10751 10773 CCTACTCCAATGCTAAA 1955 ATTAGTTTTAGCATTG 3724
    ACTAAT GAGT
    10757 10779 CCAATGCTAAAACTAAT 1956 GGGACGATTAGTTTTA 3725
    CGTCCC GCAT
    10777 10799 CCCAACAATTATATTAC 1957 GTGGTAGTAATATAAT 3726
    TACCAC TGTT
    10778 10800 CCAACAATTATATTACT 1958 AGTGGTAGTAATATAA 3727
    ACCACT TTGT
    10796 10818 CCACTGACATGACTTTC 1959 TTTTTGGAAAGTCATG 3728
    CAAAAA TCAG
    10812 10834 CCAAAAAACACATAATT 1960 GATTCAAATTATGTGT 3729
    TGAATC TTTT
    10842 10864 CCACCCACAGCCTAATT 1961 GCTAATAATTAGGCTG 3730
    ATTAGC TGGG
    10845 10867 CCCACAGCCTAATTATT 1962 GATGCTAATAATTAGG 3731
    AGCATC CTGT
    10846 10868 CCACAGCCTAATTATTA 1963 TGATGCTAATAATTAG 3732
    GCATCA GCTG
    10852 10874 CCTAATTATTAGCATCA 1964 GAGGGATGATGCTAAT 3733
    TCCCTC AATT
    10870 10892 CCCTCTACTATTTTTTAA 1965 TTTGGTTAAAAAATAG 3734
    CCAAA TAGA
    10871 10893 CCTCTACTATTTTTTAAC 1966 ATTTGGTTAAAAAATA 3735
    CAAAT GTAG
    10888 10910 CCAAATCAACAACAACC 1967 TAAATAGGTTGTTGTT 3736
    TATTTA GATT
    10903 10925 CCTATTTAGCTGTTCCC 1968 AGGTTGGGGAACAGCT 3737
    CAACCT AAAT
    10917 10939 CCCCAACCTTTTCCTCC 1969 GGGGTCGGAGGAAAA 3738
    GACCCC GGTTG
    10918 10940 CCCAACCTTTTCCTCCG 1970 GGGGGTCGGAGGAAA 3739
    ACCCCC AGGTT
    10919 10941 CCAACCTTTTCCTCCGA 1971 AGGGGGTCGGAGGAA 3740
    CCCCCT AAGGT
    10923 10945 CCTTTTCCTCCGACCCC 1972 TGTTAGGGGGTCGGAG 3741
    CTAACA GAAA
    10929 10951 CCTCCGACCCCCTAACA 1973 GGGGGTTGTTAGGGGG 3742
    ACCCCC TCGG
    10932 10954 CCGACCCCCTAACAACC 1974 GAGGGGGGTTGTTAGG 3743
    CCCCTC GGGT
    10936 10958 CCCCCTAACAACCCCCC 1975 TTAGGAGGGGGGTTGT 3744
    TCCTAA TAGG
    10937 10959 CCCCTAACAACCCCCCT 1976 ATTAGGAGGGGGGTTG 3745
    CCTAAT TTAG
    10938 10960 CCCTAACAACCCCCCTC 1977 TATTAGGAGGGGGGTT 3746
    CTAATA GTTA
    10939 10961 CCTAACAACCCCCCTCC 1978 GTATTAGGAGGGGGGT 3747
    TAATAC TGTT
    10947 10969 CCCCCCTCCTAATACTA 1979 GGTAGTTAGTATTAGG 3748
    ACTACC AGGG
    10948 10970 CCCCCTCCTAATACTAA 1980 AGGTAGTTAGTATTAG 3749
    CTACCT GAGG
    10949 10971 CCCCTCCTAATACTAAC 1981 CAGGTAGTTAGTATTA 3750
    TACCTG GGAG
    10950 10972 CCCTCCTAATACTAACT 1982 TCAGGTAGTTAGTATT 3751
    ACCTGA AGGA
    10951 10973 CCTCCTAATACTAACTA 1983 GTCAGGTAGTTAGTAT 3752
    CCTGAC TAGG
    10954 10976 CCTAATACTAACTACCT 1984 GGAGTCAGGTAGTTAG 3753
    GACTCC TATT
    10968 10990 CCTGACTCCTACCCCTC 1985 GATTGTGAGGGGTAGG 3754
    ACAATC AGTC
    10975 10997 CCTACCCCTCACAATCA 1986 TTGCCATGATTGTGAG 3755
    TGGCAA GGGT
    10979 11001 CCCCTCACAATCATGGC 1987 TGGCTTGCCATGATTG 3756
    AAGCCA TGAG
    10980 11002 CCCTCACAATCATGGCA 1988 TTGGCTTGCCATGATT 3757
    AGCCAA GTGA
    10981 11003 CCTCACAATCATGGCAA 1989 GTTGGCTTGCCATGAT 3758
    GCCAAC TGTG
    10999 11021 CCAACGCCACTTATCCA 1990 GTTCACTGGATAAGTG 3759
    GTGAAC GCGT
    11005 11027 CCACTTATCCAGTGAAC 1991 ATAGTGGTTCACTGGA 3760
    CACTAT TAAG
    11013 11035 CCAGTGAACCACTATCA 1992 TTTTCGTGATAGTGGT 3761
    CGAAAA TCAC
    11021 11043 CCACTATCACGAAAAAA 1993 TAGAGTTTTTTTCGTG 3762
    ACTCTA ATAG
    11044 11066 CCTCTCTATACTAATCT 1994 GTAGGGAGATTAGTAT 3763
    CCCTAC AGAG
    11061 11083 CCCTACAAATCTCCTTA 1995 TATAATTAAGGAGATT 3764
    ATTATA TGTA
    11062 11084 CCTACAAATCTCCTTAA 1996 TTATAATTAAGGAGAT 3765
    TTATAA TTGT
    11073 11095 CCTTAATTATAACATTC 1997 GGCTGTGAATGTTATA 3766
    ACAGCC ATTA
    11094 11116 CCACAGAACTAATCATA 1998 ATAAAATATGATTAGT 3767
    TTTTAT TCTG
    11130 11152 CCACACTTATCCCCACC 1999 AGCCAAGGTGGGGAT 3768
    TTGGCT AAGTG
    11140 11162 CCCCACCTTGGCTATCA 2000 GGGTGATGATAGCCAA 3769
    TCACCC GGTG
    11141 11163 CCCACCTTGGCTATCAT 2001 CGGGTGATGATAGCCA 3770
    CACCCG AGGT
    11142 11164 CCACCTTGGCTATCATC 2002 TCGGGTGATGATAGCC 3771
    ACCCGA AAGG
    11145 11167 CCTTGGCTATCATCACC 2003 TCATCGGGTGATGATA 3772
    CGATGA GCCA
    11160 11182 CCCGATGAGGCAACCA 2004 TTCTGGCTGGTTGCCT 3773
    GCCAGAA CATC
    11161 11183 CCGATGAGGCAACCAG 2005 GTTCTGGCTGGTTGCC 3774
    CCAGAAC TCAT
    11173 11195 CCAGCCAGAACGCCTGA 2006 CTGCGTTCAGGCGTTC 3775
    ACGCAG TGGC
    11177 11199 CCAGAACGCCTGAACGC 2007 GTGCCTGCGTTCAGGC 3776
    AGGCAC GTTC
    11185 11207 CCTGAACGCAGGCACAT 2008 GGAAGTATGTGCCTGC 3777
    ACTTCC GTTC
    11206 11228 CCTATTCTACACCCTAG 2009 AGCCTACTAGGGTGTA 3778
    TAGGCT GAAT
    11217 11239 CCCTAGTAGGCTCCCTT 2010 TAGGGGAAGGGAGCC 3779
    CCCCTA TACTA
    11218 11240 CCTAGTAGGCTCCCTTC 2011 GTAGGGGAAGGGAGC 3780
    CCCTAC CTACT
    11229 11251 CCCTTCCCCTACTCATC 2012 TAGTGCGATGAGTAGG 3781
    GCACTA GGAA
    11230 11252 CCTTCCCCTACTCATCG 2013 TTAGTGCGATGAGTAG 3782
    CACTAA GGGA
    11234 11256 CCCCTACTCATCGCACT 2014 TAAATTAGTGCGATGA 3783
    AATTTA GTAG
    11235 11257 CCCTACTCATCGCACTA 2015 GTAAATTAGTGCGATG 3784
    ATTTAC AGTA
    11236 11258 CCTACTCATCGCACTAA 2016 TGTAAATTAGTGCGAT 3785
    TTTACA GAGT
    11268 11290 CCCTAGGCTCACTAAAC 2017 TAGAATGTTTAGTGAG 3786
    ATTCTA CCTA
    11269 11291 CCTAGGCTCACTAAACA 2018 GTAGAATGTTTAGTGA 3787
    TTCTAC GCCT
    11307 11329 CCCAAGAACTATCAAAC 2019 TCAGGAGTTTGATAGT 3788
    TCCTGA TCTT
    11308 11330 CCAAGAACTATCAAACT 2020 CTCAGGAGTTTGATAG 3789
    CCTGAG TTCT
    11325 11347 CCTGAGCCAACAACTTA 2021 TCATATTAAGTTGTTG 3790
    ATATGA GCTC
    11331 11353 CCAACAACTTAATATGA 2022 AGCTAGTCATATTAAG 3791
    CTAGCT TTGT
    11381 11403 CCTCTTTACGGACTCCA 2023 CATAAGTGGAGTCCGT 3792
    CTTATG AAAG
    11395 11417 CCACTTATGACTCCCTA 2024 GGGCTTTAGGGAGTCA 3793
    AAGCCC TAAG
    11407 11429 CCCTAAAGCCCATGTCG 2025 GGGCTTCGACATGGGC 3794
    AAGCCC TTTA
    11408 11430 CCTAAAGCCCATGTCGA 2026 GGGGCTTCGACATGGG 3795
    AGCCCC CTTT
    11415 11437 CCCATGTCGAAGCCCCC 2027 AGCGATGGGGGCTTCG 3796
    ATCGCT ACAT
    11416 11438 CCATGTCGAAGCCCCCA 2028 CAGCGATGGGGGCTTC 3797
    TCGCTG GACA
    11427 11449 CCCCCATCGCTGGGTCA 2029 TACTATTGACCCAGCG 3798
    ATAGTA ATGG
    11428 11450 CCCCATCGCTGGGTCAA 2030 GTACTATTGACCCAGC 3799
    TAGTAC GATG
    11429 11451 CCCATCGCTGGGTCAAT 2031 AGTACTATTGACCCAG 3800
    AGTACT CGAT
    11430 11452 CCATCGCTGGGTCAATA 2032 AAGTACTATTGACCCA 3801
    GTACTT GCGA
    11454 11476 CCGCAGTACTCTTAAAA 2033 GCCTAGTTTTAAGAGT 3802
    CTAGGC ACTG
    11494 11516 CCTCACACTCATTCTCA 2034 GGGGGTTGAGAATGA 3803
    ACCCCC GTGTG
    11512 11534 CCCCCTGACAAAACACA 2035 AGGCTATGTGTTTTGT 3804
    TAGCCT CAGG
    11513 11535 CCCCTGACAAAACACAT 2036 TAGGCTATGTGTTTTG 3805
    AGCCTA TCAG
    11514 11536 CCCTGACAAAACACATA 2037 GTAGGCTATGTGTTTT 3806
    GCCTAC GTCA
    11515 11537 CCTGACAAAACACATAG 2038 GGTAGGCTATGTGTTT 3807
    CCTACC TGTC
    11532 11554 CCTACCCCTTCCTTGTA 2039 GGATAGTACAAGGAA 3808
    CTATCC GGGGT
    11536 11558 CCCCTTCCTTGTACTATC 2040 ATAGGGATAGTACAA 3809
    CCTAT GGAAG
    11537 11559 CCCTTCCTTGTACTATCC 2041 CATAGGGATAGTACAA 3810
    CTATG GGAA
    11538 11560 CCTTCCTTGTACTATCCC 2042 TCATAGGGATAGTACA 3811
    TATGA AGGA
    11542 11564 CCTTGTACTATCCCTAT 2043 TGCCTCATAGGGATAG 3812
    GAGGCA TACA
    11553 11575 CCCTATGAGGCATAATT 2044 TGTTATAATTATGCCT 3813
    ATAACA CATA
    11554 11576 CCTATGAGGCATAATTA 2045 TTGTTATAATTATGCC 3814
    TAACAA TCAT
    11580 11602 CCATCTGCCTACGACAA 2046 GTCTGTTTGTCGTAGG 3815
    ACAGAC CAGA
    11587 11609 CCTACGACAAACAGACC 2047 ATTTTAGGTCTGTTTGT 3816
    TAAAAT CGT
    11602 11624 CCTAAAATCGCTCATTG 2048 AGTATGCAATGAGCGA 3817
    CATACT TTTT
    11635 11657 CCACATAGCCCTCGTAG 2049 CTGTTACTACGAGGGC 3818
    TAACAG TATG
    11643 11665 CCCTCGTAGTAACAGCC 2050 GAGAATGGCTGTTACT 3819
    ATTCTC ACGA
    11644 11666 CCTCGTAGTAACAGCCA 2051 TGAGAATGGCTGTTAC 3820
    TTCTCA TACG
    11658 11680 CCATTCTCATCCAAACC 2052 TCAGGGGGTTTGGATG 3821
    CCCTGA AGAA
    11668 11690 CCAAACCCCCTGAAGCT 2053 CGGTGAAGCTTCAGGG 3822
    TCACCG GGTT
    11673 11695 CCCCCTGAAGCTTCACC 2054 TGCGCCGGTGAAGCTT 3823
    GGCGCA CAGG
    11674 11696 CCCCTGAAGCTTCACCG 2055 CTGCGCCGGTGAAGCT 3824
    GCGCAG TCAG
    11675 11697 CCCTGAAGCTTCACCGG 2056 ACTGCGCCGGTGAAGC 3825
    CGCAGT TTCA
    11676 11698 CCTGAAGCTTCACCGGC 2057 GACTGCGCCGGTGAAG 3826
    GCAGTC CTTC
    11688 11710 CCGGCGCAGTCATTCTC 2058 GATTATGAGAATGACT 3827
    ATAATC GCGC
    11712 11734 CCCACGGGCTTACATCC 2059 TAATGAGGATGTAAGC 3828
    TCATTA CCGT
    11713 11735 CCACGGGCTTACATCCT 2060 GTAATGAGGATGTAAG 3829
    CATTAC CCCG
    11727 11749 CCTCATTACTATTCTGC 2061 TGCTAGGCAGAATAGT 3830
    CTAGCA AATG
    11743 11765 CCTAGCAAACTCAAACT 2062 GTTCGTAGTTTGAGTT 3831
    ACGAAC TGCT
    11788 11810 CCTCTCTCAAGGACTTC 2063 GAGTTTGAAGTCCTTG 3832
    AAACTC AGAG
    11815 11837 CCCACTAATAGCTTTTT 2064 GTCATCAAAAAGCTAT 3833
    GATGAC TAGT
    11816 11838 CCACTAATAGCTTTTTG 2065 AGTCATCAAAAAGCTA 3834
    ATGACT TTAG
    11848 11870 CCTCGCTAACCTCGCCT 2066 GGGGTAAGGCGAGGT 3835
    TACCCC TAGCG
    11857 11879 CCTCGCCTTACCCCCCA 2067 TAATAGTGGGGGGTAA 3836
    CTATTA GGCG
    11862 11884 CCTTACCCCCCACTATT 2068 TAGGTTAATAGTGGGG 3837
    AACCTA GGTA
    11867 11889 CCCCCCACTATTAACCT 2069 CCCAGTAGGTTAATAG 3838
    ACTGGG TGGG
    11868 11890 CCCCCACTATTAACCTA 2070 TCCCAGTAGGTTAATA 3839
    CTGGGA GTGG
    11869 11891 CCCCACTATTAACCTAC 2071 CTCCCAGTAGGTTAAT 3840
    TGGGAG AGTG
    11870 11892 CCCACTATTAACCTACT 2072 TCTCCCAGTAGGTTAA 3841
    GGGAGA TAGT
    11871 11893 CCACTATTAACCTACTG 2073 TTCTCCCAGTAGGTTA 3842
    GGAGAA ATAG
    11881 11903 CCTACTGGGAGAACTCT 2074 GCACAGAGAGTTCTCC 3843
    CTGTGC CAGT
    11910 11932 CCACGTTCTCCTGATCA 2075 GATATTTGATCAGGAG 3844
    AATATC AACG
    11919 11941 CCTGATCAAATATCACT 2076 TAGGAGAGTGATATTT 3845
    CTCCTA GATC
    11938 11960 CCTACTTACAGGACTCA 2077 GTATGTTGAGTCCTGT 3846
    ACATAC AAGT
    11970 11992 CCCTATACTCCCTCTAC 2078 AAATATGTAGAGGGA 3847
    ATATTT GTATA
    11971 11993 CCTATACTCCCTCTACA 2079 TAAATATGTAGAGGGA 3848
    TATTTA GTAT
    11979 12001 CCCTCTACATATTTACC 2080 TGTTGTGGTAAATATG 3849
    ACAACA TAGA
    11980 12002 CCTCTACATATTTACCA 2081 GTGTTGTGGTAAATAT 3850
    CAACAC GTAG
    11994 12016 CCACAACACAATGGGG 2082 GAGTGAGCCCCATTGT 3851
    CTCACTC GTTG
    12018 12040 CCCACCACATTAACAAC 2083 TTTTATGTTGTTAATGT 3852
    ATAAAA GGT
    12019 12041 CCACCACATTAACAACA 2084 GTTTTATGTTGTTAAT 3853
    TAAAAC GTGG
    12022 12044 CCACATTAACAACATAA 2085 AGGGTTTTATGTTGTT 3854
    AACCCT AATG
    12041 12063 CCCTCATTCACACGAGA 2086 GTGTTTTCTCGTGTGA 3855
    AAACAC ATGA
    12042 12064 CCTCATTCACACGAGAA 2087 GGTGTTTTCTCGTGTG 3856
    AACACC AATG
    12063 12085 CCCTCATGTTCATACAC 2088 GGATAGGTGTATGAAC 3857
    CTATCC ATGA
    12064 12086 CCTCATGTTCATACACC 2089 GGGATAGGTGTATGAA 3858
    TATCCC CATG
    12079 12101 CCTATCCCCCATTCTCCT 2090 ATAGGAGGAGAATGG 3859
    CCTAT GGGAT
    12084 12106 CCCCCATTCTCCTCCTAT 2091 GAGGGATAGGAGGAG 3860
    CCCTC AATGG
    12085 12107 CCCCATTCTCCTCCTATC 2092 TGAGGGATAGGAGGA 3861
    CCTCA GAATG
    12086 12108 CCCATTCTCCTCCTATCC 2093 TTGAGGGATAGGAGG 3862
    CTCAA AGAAT
    12087 12109 CCATTCTCCTCCTATCCC 2094 GTTGAGGGATAGGAG 3863
    TCAAC GAGAA
    12094 12116 CCTCCTATCCCTCAACC 2095 TGTCGGGGTTGAGGGA 3864
    CCGACA TAGG
    12097 12119 CCTATCCCTCAACCCCG 2096 TGATGTCGGGGTTGAG 3865
    ACATCA GGAT
    12102 12124 CCCTCAACCCCGACATC 2097 GGTAATGATGTCGGGG 3866
    ATTACC TTGA
    12103 12125 CCTCAACCCCGACATCA 2098 CGGTAATGATGTCGGG 3867
    TTACCG GTTG
    12109 12131 CCCCGACATCATTACCG 2099 AAAACCCGGTAATGAT 3868
    GGTTTT GTCG
    12110 12132 CCCGACATCATTACCGG 2100 GAAAACCCGGTAATG 3869
    GTTTTC ATGTC
    12111 12133 CCGACATCATTACCGGG 2101 GGAAAACCCGGTAAT 3870
    TTTTCC GATGT
    12123 12145 CCGGGTTTTCCTCTTGT 2102 ATATTTACAAGAGGAA 3871
    AAATAT AACC
    12132 12154 CCTCTTGTAAATATAGT 2103 GGTTAAACTATATTTA 3872
    TTAACC CAAG
    12153 12175 CCAAAACATCAGATTGT 2104 AGATTCACAATCTGAT 3873
    GAATCT GTTT
    12194 12216 CCCCTTATTTACCGAGA 2105 GAGCTTTCTCGGTAAA 3874
    AAGCTC TAAG
    12195 12217 CCCTTATTTACCGAGAA 2106 TGAGCTTTCTCGGTAA 3875
    AGCTCA ATAA
    12196 12218 CCTTATTTACCGAGAAA 2107 GTGAGCTTTCTCGGTA 3876
    GCTCAC AATA
    12205 12227 CCGAGAAAGCTCACAA 2108 GCAGTTCTTGTGAGCT 3877
    GAACTGC TTCT
    12237 12259 CCCCCATGTCTAACAAC 2109 AGCCATGTTGTTAGAC 3878
    ATGGCT ATGG
    12238 12260 CCCCATGTCTAACAACA 2110 AAGCCATGTTGTTAGA 3879
    TGGCTT CATG
    12239 12261 CCCATGTCTAACAACAT 2111 AAAGCCATGTTGTTAG 3880
    GGCTTT ACAT
    12240 12262 CCATGTCTAACAACATG 2112 GAAAGCCATGTTGTTA 3881
    GCTTTC GACA
    12288 12310 CCATTGGTCTTAGGCCC 2113 TTTTTGGGGCCTAAGA 3882
    CAAAAA CCAA
    12302 12324 CCCCAAAAATTTTGGTG 2114 GAGTTGCACCAAAATT 3883
    CAACTC TTTG
    12303 12325 CCCAAAAATTTTGGTGC 2115 GGAGTTGCACCAAAAT 3884
    AACTCC TTTT
    12304 12326 CCAAAAATTTTGGTGCA 2116 TGGAGTTGCACCAAAA 3885
    ACTCCA TTTT
    12324 12346 CCAAATAAAAGTAATA 2117 GCATGGTTATTACTTT 3886
    ACCATGC TATT
    12341 12363 CCATGCACACTACTATA 2118 GGTGGTTATAGTAGTG 3887
    ACCACC TGCA
    12359 12381 CCACCCTAACCCTGACT 2119 TAGGGAAGTCAGGGTT 3888
    TCCCTA AGGG
    12362 12384 CCCTAACCCTGACTTCC 2120 AATTAGGGAAGTCAG 3889
    CTAATT GGTTA
    12363 12385 CCTAACCCTGACTTCCC 2121 GAATTAGGGAAGTCA 3890
    TAATTC GGGTT
    12368 12390 CCCTGACTTCCCTAATT 2122 GGGGGGAATTAGGGA 3891
    CCCCCC AGTCA
    12369 12391 CCTGACTTCCCTAATTC 2123 TGGGGGGAATTAGGG 3892
    CCCCCA AAGTC
    12377 12399 CCCTAATTCCCCCCATC 2124 GGTAAGGATGGGGGG 3893
    CTTACC AATTA
    12378 12400 CCTAATTCCCCCCATCC 2125 TGGTAAGGATGGGGG 3894
    TTACCA GAATT
    12385 12407 CCCCCCATCCTTACCAC 2126 ACGAGGGTGGTAAGG 3895
    CCTCGT ATGGG
    12386 12408 CCCCCATCCTTACCACC 2127 AACGAGGGTGGTAAG 3896
    CTCGTT GATGG
    12387 12409 CCCCATCCTTACCACCC 2128 TAACGAGGGTGGTAA 3897
    TCGTTA GGATG
    12388 12410 CCCATCCTTACCACCCT 2129 TTAACGAGGGTGGTAA 3898
    CGTTAA GGAT
    12389 12411 CCATCCTTACCACCCTC 2130 GTTAACGAGGGTGGTA 3899
    GTTAAC AGGA
    12393 12415 CCTTACCACCCTCGTTA 2131 TAGGGTTAACGAGGGT 3900
    ACCCTA GGTA
    12398 12420 CCACCCTCGTTAACCCT 2132 TTTGTTAGGGTTAACG 3901
    AACAAA AGGG
    12401 12423 CCCTCGTTAACCCTAAC 2133 TTTTTTGTTAGGGTTA 3902
    AAAAAA ACGA
    12402 12424 CCTCGTTAACCCTAACA 2134 TTTTTTTGTTAGGGTTA 3903
    AAAAAA ACG
    12411 12433 CCCTAACAAAAAAAACT 2135 GGTATGAGTTTTTTTT 3904
    CATACC GTTA
    12412 12434 CCTAACAAAAAAAACTC 2136 GGGTATGAGTTTTTTT 3905
    ATACCC TGTT
    12432 12454 CCCCCATTATGTAAAAT 2137 CAATGGATTTTACATA 3906
    CCATTG ATGG
    12433 12455 CCCCATTATGTAAAATC 2138 ACAATGGATTTTACAT 3907
    CATTGT AATG
    12434 12456 CCCATTATGTAAAATCC 2139 GACAATGGATTTTACA 3908
    ATTGTC TAAT
    12435 12457 CCATTATGTAAAATCCA 2140 CGACAATGGATTTTAC 3909
    TTGTCG ATAA
    12449 12471 CCATTGTCGCATCCACC 2141 AATAAAGGTGGATGC 3910
    TTTATT GACAA
    12461 12483 CCACCTTTATTATCAGT 2142 GAAGAGACTGATAAT 3911
    CTCTTC AAAGG
    12464 12486 CCTTTATTATCAGTCTCT 2143 GGGGAAGAGACTGAT 3912
    TCCCC AATAA
    12483 12505 CCCCACAACAATATTCA 2144 GGCACATGAATATTGT 3913
    TGTGCC TGTG
    12484 12506 CCCACAACAATATTCAT 2145 AGGCACATGAATATTG 3914
    GTGCCT TTGT
    12485 12507 CCACAACAATATTCATG 2146 TAGGCACATGAATATT 3915
    TGCCTA GTTG
    12504 12526 CCTAGACCAAGAAGTTA 2147 AGATAATAACTTCTTG 3916
    TTATCT GTCT
    12510 12532 CCAAGAAGTTATTATCT 2148 AGTTCGAGATAATAAC 3917
    CGAACT TTCT
    12542 12564 CCACAACCCAAACAACC 2149 GAGCTGGGTTGTTTGG 3918
    CAGCTC GTTG
    12548 12570 CCCAAACAACCCAGCTC 2150 TAGGGAGAGCTGGGTT 3919
    TCCCTA GTTT
    12549 12571 CCAAACAACCCAGCTCT 2151 TTAGGGAGAGCTGGGT 3920
    CCCTAA TGTT
    12557 12579 CCCAGCTCTCCCTAAGC 2152 TTTGAAGCTTAGGGAG 3921
    TTCAAA AGCT
    12558 12580 CCAGCTCTCCCTAAGCT 2153 GTTTGAAGCTTAGGGA 3922
    TCAAAC GAGC
    12566 12588 CCCTAAGCTTCAAACTA 2154 GTAGTCTAGTTTGAAG 3923
    GACTAC CTTA
    12567 12589 CCTAAGCTTCAAACTAG 2155 AGTAGTCTAGTTTGAA 3924
    ACTACT GCTT
    12593 12615 CCATAATATTCATCCCT 2156 TGCTACAGGGATGAAT 3925
    GTAGCA ATTA
    12606 12628 CCCTGTAGCATTGTTCG 2157 ATGTAACGAACAATGC 3926
    TTACAT TACA
    12607 12629 CCTGTAGCATTGTTCGT 2158 CATGTAACGAACAATG 3927
    TACATG CTAC
    12632 12654 CCATCATAGAATTCTCA 2159 TCACAGTGAGAATTCT 3928
    CTGTGA ATGA
    12669 12691 CCCAAACATTAATCAGT 2160 TGAAGAACTGATTAAT 3929
    TCTTCA GTTT
    12670 12692 CCAAACATTAATCAGTT 2161 TTGAAGAACTGATTAA 3930
    CTTCAA TGTT
    12708 12730 CCTAATTACCATACTAA 2162 CTAAGATTAGTATGGT 3931
    TCTTAG AATT
    12716 12738 CCATACTAATCTTAGTT 2163 AGCGGTAACTAAGATT 3932
    ACCGCT AGTA
    12734 12756 CCGCTAACAACCTATTC 2164 CAGTTGGAATAGGTTG 3933
    CAACTG TTAG
    12744 12766 CCTATTCCAACTGTTCA 2165 AGCCGATGAACAGTTG 3934
    TCGGCT GAAT
    12750 12772 CCAACTGTTCATCGGCT 2166 CCTCTCAGCCGATGAA 3935
    GAGAGG CAGT
    12788 12810 CCTTCTTGCTCATCAGTT 2167 TCATCAACTGATGAGC 3936
    GATGA AAGA
    12815 12837 CCCGAGCAGATGCCAAC 2168 TGCTGTGTTGGCATCT 3937
    ACAGCA GCTC
    12816 12838 CCGAGCAGATGCCAAC 2169 CTGCTGTGTTGGCATC 3938
    ACAGCAG TGCT
    12827 12849 CCAACACAGCAGCCATT 2170 TGCTTGAATGGCTGCT 3939
    CAAGCA GTGT
    12839 12861 CCATTCAAGCAATCCTA 2171 GTTGTATAGGATTGCT 3940
    TACAAC TGAA
    12852 12874 CCTATACAACCGTATCG 2172 TATCGCCGATACGGTT 3941
    GCGATA GTAT
    12861 12883 CCGTATCGGCGATATCG 2173 TGAAACCGATATCGCC 3942
    GTTTCA GATA
    12885 12907 CCTCGCCTTAGCATGAT 2174 GGATAAATCATGCTAA 3943
    TTATCC GGCG
    12890 12912 CCTTAGCATGATTTATC 2175 GTGTAGGATAAATCAT 3944
    CTACAC GCTA
    12906 12928 CCTACACTCCAACTCAT 2176 GGTCTCATGAGTTGGA 3945
    GAGACC GTGT
    12914 12936 CCAACTCATGAGACCCA 2177 TTGTTGTGGGTCTCAT 3946
    CAACAA GAGT
    12927 12949 CCCACAACAAATAGCCC 2178 TTAGAAGGGCTATTTG 3947
    TTCTAA TTGT
    12928 12950 CCACAACAAATAGCCCT 2179 TTTAGAAGGGCTATTT 3948
    TCTAAA GTTG
    12941 12963 CCCTTCTAAACGCTAAT 2180 GCTTGGATTAGCGTTT 3949
    CCAAGC AGAA
    12942 12964 CCTTCTAAACGCTAATC 2181 GGCTTGGATTAGCGTT 3950
    CAAGCC TAGA
    12958 12980 CCAAGCCTCACCCCACT 2182 CCTAGTAGTGGGGTGA 3951
    ACTAGG GGCT
    12963 12985 CCTCACCCCACTACTAG 2183 GGAGGCCTAGTAGTGG 3952
    GCCTCC GGTG
    12968 12990 CCCCACTACTAGGCCTC 2184 TAGGAGGAGGCCTAGT 3953
    CTCCTA AGTG
    12969 12991 CCCACTACTAGGCCTCC 2185 CTAGGAGGAGGCCTA 3954
    TCCTAG GTAGT
    12970 12992 CCACTACTAGGCCTCCT 2186 GCTAGGAGGAGGCCT 3955
    CCTAGC AGTAG
    12981 13003 CCTCCTCCTAGCAGCAG 2187 TGCCTGCTGCTGCTAG 3956
    CAGGCA GAGG
    12984 13006 CCTCCTAGCAGCAGCAG 2188 ATTTGCCTGCTGCTGC 3957
    GCAAAT TAGG
    12987 13009 CCTAGCAGCAGCAGGC 2189 CTGATTTGCCTGCTGC 3958
    AAATCAG TGCT
    13010 13032 CCCAATTAGGTCTCCAC 2190 TCAGGGGTGGAGACCT 3959
    CCCTGA AATT
    13011 13033 CCAATTAGGTCTCCACC 2191 GTCAGGGGTGGAGAC 3960
    CCTGAC CTAAT
    13023 13045 CCACCCCTGACTCCCCT 2192 TGGCTGAGGGGAGTCA 3961
    CAGCCA GGGG
    13026 13048 CCCCTGACTCCCCTCAG 2193 CTATGGCTGAGGGGAG 3962
    CCATAG TCAG
    13027 13049 CCCTGACTCCCCTCAGC 2194 TCTATGGCTGAGGGGA 3963
    CATAGA GTCA
    13028 13050 CCTGACTCCCCTCAGCC 2195 TTCTATGGCTGAGGGG 3964
    ATAGAA AGTC
    13035 13057 CCCCTCAGCCATAGAAG 2196 TGGGGCCTTCTATGGC 3965
    GCCCCA TGAG
    13036 13058 CCCTCAGCCATAGAAGG 2197 GTGGGGCCTTCTATGG 3966
    CCCCAC CTGA
    13037 13059 CCTCAGCCATAGAAGGC 2198 GGTGGGGCCTTCTATG 3967
    CCCACC GCTG
    13043 13065 CCATAGAAGGCCCCACC 2199 GACTGGGGTGGGGCCT 3968
    CCAGTC TCTA
    13053 13075 CCCCACCCCAGTCTCAG 2200 GTAGGGCTGAGACTGG 3969
    CCCTAC GGTG
    13054 13076 CCCACCCCAGTCTCAGC 2201 AGTAGGGCTGAGACTG 3970
    CCTACT GGGT
    13055 13077 CCACCCCAGTCTCAGCC 2202 GAGTAGGGCTGAGACT 3971
    CTACTC GGGG
    13058 13080 CCCCAGTCTCAGCCCTA 2203 GTGGAGTAGGGCTGA 3972
    CTCCAC GACTG
    13059 13081 CCCAGTCTCAGCCCTAC 2204 AGTGGAGTAGGGCTG 3973
    TCCACT AGACT
    13060 13082 CCAGTCTCAGCCCTACT 2205 GAGTGGAGTAGGGCT 3974
    CCACTC GAGAC
    13070 13092 CCCTACTCCACTCAAGC 2206 TATAGTGCTTGAGTGG 3975
    ACTATA AGTA
    13071 13093 CCTACTCCACTCAAGCA 2207 CTATAGTGCTTGAGTG 3976
    CTATAG GAGT
    13077 13099 CCACTCAAGCACTATAG 2208 CTACAACTATAGTGCT 3977
    TTGTAG TGAG
    13119 13141 CCGCTTCCACCCCCTAG 2209 TTTCTGCTAGGGGGTG 3978
    CAGAAA GAAG
    13125 13147 CCACCCCCTAGCAGAAA 2210 GGCTATTTTCTGCTAG 3979
    ATAGCC GGGG
    13128 13150 CCCCCTAGCAGAAAATA 2211 GTGGGCTATTTTCTGC 3980
    GCCCAC TAGG
    13129 13151 CCCCTAGCAGAAAATAG 2212 AGTGGGCTATTTTCTG 3981
    CCCACT CTAG
    13130 13152 CCCTAGCAGAAAATAGC 2213 TAGTGGGCTATTTTCT 3982
    CCACTA GCTA
    13131 13153 CCTAGCAGAAAATAGCC 2214 TTAGTGGGCTATTTTC 3983
    CACTAA TGCT
    13146 13168 CCCACTAATCCAAACTC 2215 GTGTTAGAGTTTGGAT 3984
    TAACAC TAGT
    13147 13169 CCACTAATCCAAACTCT 2216 AGTGTTAGAGTTTGGA 3985
    AACACT TTAG
    13155 13177 CCAAACTCTAACACTAT 2217 CTAAGCATAGTGTTAG 3986
    GCTTAG AGTT
    13187 13209 CCACTCTGTTCGCAGCA 2218 GCAGACTGCTGCGAAC 3987
    GTCTGC AGAG
    13211 13233 CCCTTACACAAAATGAC 2219 TTTGATGTCATTTTGTG 3988
    ATCAAA TAA
    13212 13234 CCTTACACAAAATGACA 2220 TTTTGATGTCATTTTGT 3989
    TCAAAA GTA
    13244 13266 CCTTCTCCACTTCAAGT 2221 TAGTTGACTTGAAGTG 3990
    CAACTA GAGA
    13250 13272 CCACTTCAAGTCAACTA 2222 GAGTCCTAGTTGACTT 3991
    GGACTC GAAG
    13296 13318 CCAACCACACCTAGCAT 2223 GCAGGAATGCTAGGTG 3992
    TCCTGC TGGT
    13300 13322 CCACACCTAGCATTCCT 2224 ATGTGCAGGAATGCTA 3993
    GCACAT GGTG
    13305 13327 CCTAGCATTCCTGCACA 2225 TACAGATGTGCAGGAA 3994
    TCTGTA TGCT
    13314 13336 CCTGCACATCTGTACCC 2226 AGGCGTGGGTACAGAT 3995
    ACGCCT GTGC
    13328 13350 CCCACGCCTTCTTCAAA 2227 TATGGCTTTGAAGAAG 3996
    GCCATA GCGT
    13329 13351 CCACGCCTTCTTCAAAG 2228 GTATGGCTTTGAAGAA 3997
    CCATAC GGCG
    13334 13356 CCTTCTTCAAAGCCATA 2229 AAATAGTATGGCTTTG 3998
    CTATTT AAGA
    13346 13368 CCATACTATTTATGTGC 2230 CCCGGAGCACATAAAT 3999
    TCCGGG AGTA
    13364 13386 CCGGGTCCATCATCCAC 2231 AAGGTTGTGGATGATG 4000
    AACCTT GACC
    13370 13392 CCATCATCCACAACCTT 2232 ATTGTTAAGGTTGTGG 4001
    AACAAT ATGA
    13377 13399 CCACAACCTTAACAATG 2233 CTTGTTCATTGTTAAG 4002
    AACAAG GTTG
    13383 13405 CCTTAACAATGAACAAG 2234 GAATATCTTGTTCATT 4003
    ATATTC GTTA
    13430 13452 CCATACCTCTCACTTCA 2235 GGAGGTTGAAGTGAG 4004
    ACCTCC AGGTA
    13435 13457 CCTCTCACTTCAACCTC 2236 GTGAGGGAGGTTGAA 4005
    CCTCAC GTGAG
    13448 13470 CCTCCCTCACCATTGGC 2237 TAGGCTGCCAATGGTG 4006
    AGCCTA AGGG
    13451 13473 CCCTCACCATTGGCAGC 2238 TGCTAGGCTGCCAATG 4007
    CTAGCA GTGA
    13452 13474 CCTCACCATTGGCAGCC 2239 ATGCTAGGCTGCCAAT 4008
    TAGCAT GGTG
    13457 13479 CCATTGGCAGCCTAGCA 2240 TGCTAATGCTAGGCTG 4009
    TTAGCA CCAA
    13467 13489 CCTAGCATTAGCAGGAA 2241 AAGGTATTCCTGCTAA 4010
    TACCTT TGCT
    13486 13508 CCTTTCCTCACAGGTTT 2242 GAGTAGAAACCTGTGA 4011
    CTACTC GGAA
    13491 13513 CCTCACAGGTTTCTACT 2243 CTTTGGAGTAGAAACC 4012
    CCAAAG TGTG
    13508 13530 CCAAAGACCACATCATC 2244 GGTTTCGATGATGTGG 4013
    GAAACC TCTT
    13515 13537 CCACATCATCGAAACCG 2245 TGTTTGCGGTTTCGAT 4014
    CAAACA GATG
    13529 13551 CCGCAAACATATCATAC 2246 GTTTGTGTATGATATG 4015
    ACAAAC TTTG
    13553 13575 CCTGAGCCCTATCTATT 2247 GAGAGTAATAGATAG 4016
    ACTCTC GGCTC
    13559 13581 CCCTATCTATTACTCTC 2248 AGCGATGAGAGTAAT 4017
    ATCGCT AGATA
    13560 13582 CCTATCTATTACTCTCAT 2249 TAGCGATGAGAGTAAT 4018
    CGCTA AGAT
    13583 13605 CCTCCCTGACAAGCGCC 2250 GCTATAGGCGCTTGTC 4019
    TATAGC AGGG
    13586 13608 CCCTGACAAGCGCCTAT 2251 AGTGCTATAGGCGCTT 4020
    AGCACT GTCA
    13587 13609 CCTGACAAGCGCCTATA 2252 GAGTGCTATAGGCGCT 4021
    GCACTC TGTC
    13598 13620 CCTATAGCACTCGAATA 2253 AAGAATTATTCGAGTG 4022
    ATTCTT CTAT
    13625 13647 CCCTAACAGGTCAACCT 2254 GAAGCGAGGTTGACCT 4023
    CGCTTC GTTA
    13626 13648 CCTAACAGGTCAACCTC 2255 GGAAGCGAGGTTGAC 4024
    GCTTCC CTGTT
    13639 13661 CCTCGCTTCCCCACCCT 2256 TTAGTAAGGGTGGGGA 4025
    TACTAA AGCG
    13647 13669 CCCCACCCTTACTAACA 2257 CGTTAATGTTAGTAAG 4026
    TTAACG GGTG
    13648 13670 CCCACCCTTACTAACAT 2258 TCGTTAATGTTAGTAA 4027
    TAACGA GGGT
    13649 13671 CCACCCTTACTAACATT 2259 TTCGTTAATGTTAGTA 4028
    AACGAA AGGG
    13652 13674 CCCTTACTAACATTAAC 2260 ATTTTCGTTAATGTTA 4029
    GAAAAT GTAA
    13653 13675 CCTTACTAACATTAACG 2261 TATTTTCGTTAATGTTA 4030
    AAAATA GTA
    13677 13699 CCCCACCCTACTAAACC 2262 TAATGGGGTTTAGTAG 4031
    CCATTA GGTG
    13678 13700 CCCACCCTACTAAACCC 2263 TTAATGGGGTTTAGTA 4032
    CATTAA GGGT
    13679 13701 CCACCCTACTAAACCCC 2264 TTTAATGGGGTTTAGT 4033
    ATTAAA AGGG
    13682 13704 CCCTACTAAACCCCATT 2265 GCGTTTAATGGGGTTT 4034
    AAACGC AGTA
    13683 13705 CCTACTAAACCCCATTA 2266 GGCGTTTAATGGGGTT 4035
    AACGCC TAGT
    13692 13714 CCCCATTAAACGCCTGG 2267 CGGCTGCCAGGCGTTT 4036
    CAGCCG AATG
    13693 13715 CCCATTAAACGCCTGGC 2268 CCGGCTGCCAGGCGTT 4037
    AGCCGG TAAT
    13694 13716 CCATTAAACGCCTGGCA 2269 TCCGGCTGCCAGGCGT 4038
    GCCGGA TTAA
    13704 13726 CCTGGCAGCCGGAAGCC 2270 CGAATAGGCTTCCGGC 4039
    TATTCG TGCC
    13712 13734 CCGGAAGCCTATTCGCA 2271 AAATCCTGCGAATAGG 4040
    GGATTT CTTC
    13719 13741 CCTATTCGCAGGATTTC 2272 TAATGAGAAATCCTGC 4041
    TCATTA GAAT
    13754 13776 CCCCCGCATCCCCCTTC 2273 TGTTTGGAAGGGGGAT 4042
    CAAACA GCGG
    13755 13777 CCCCGCATCCCCCTTCC 2274 TTGTTTGGAAGGGGGA 4043
    AAACAA TGCG
    13756 13778 CCCGCATCCCCCTTCCA 2275 GTTGTTTGGAAGGGGG 4044
    AACAAC ATGC
    13757 13779 CCGCATCCCCCTTCCAA 2276 TGTTGTTTGGAAGGGG 4045
    ACAACA GATG
    13763 13785 CCCCCTTCCAAACAACA 2277 GGGGATTGTTGTTTGG 4046
    ATCCCC AAGG
    13764 13786 CCCCTTCCAAACAACAA 2278 GGGGGATTGTTGTTTG 4047
    TCCCCC GAAG
    13765 13787 CCCTTCCAAACAACAAT 2279 AGGGGGATTGTTGTTT 4048
    CCCCCT GGAA
    13766 13788 CCTTCCAAACAACAATC 2280 GAGGGGGATTGTTGTT 4049
    CCCCTC TGGA
    13770 13792 CCAAACAACAATCCCCC 2281 GGTAGAGGGGGATTGT 4050
    TCTACC TGTT
    13782 13804 CCCCCTCTACCTAAAAC 2282 CTGTGAGTTTTAGGTA 4051
    TCACAG GAGG
    13783 13805 CCCCTCTACCTAAAACT 2283 GCTGTGAGTTTTAGGT 4052
    CACAGC AGAG
    13784 13806 CCCTCTACCTAAAACTC 2284 GGCTGTGAGTTTTAGG 4053
    ACAGCC TAGA
    13785 13807 CCTCTACCTAAAACTCA 2285 GGGCTGTGAGTTTTAG 4054
    CAGCCC GTAG
    13791 13813 CCTAAAACTCACAGCCC 2286 CAGCGAGGGCTGTGA 4055
    TCGCTG GTTTT
    13805 13827 CCCTCGCTGTCACTTTC 2287 TCCTAGGAAAGTGACA 4056
    CTAGGA GCGA
    13806 13828 CCTCGCTGTCACTTTCCT 2288 GTCCTAGGAAAGTGAC 4057
    AGGAC AGCG
    13821 13843 CCTAGGACTTCTAACAG 2289 CTAGGGCTGTTAGAAG 4058
    CCCTAG TCCT
    13838 13860 CCCTAGACCTCAACTAC 2290 GGTTAGGTAGTTGAGG 4059
    CTAACC TCTA
    13839 13861 CCTAGACCTCAACTACC 2291 TGGTTAGGTAGTTGAG 4060
    TAACCA GTCT
    13845 13867 CCTCAACTACCTAACCA 2292 GTTTGTTGGTTAGGTA 4061
    ACAAAC GTTG
    13854 13876 CCTAACCAACAAACTTA 2293 TTATTTTAAGTTTGTTG 4062
    AAATAA GTT
    13859 13881 CCAACAAACTTAAAATA 2294 GGATTTTATTTTAAGT 4063
    AAATCC TTGT
    13880 13902 CCCCACTATGCACATTT 2295 GAAATAAAATGTGCAT 4064
    TATTTC AGTG
    13881 13903 CCCACTATGCACATTTT 2296 AGAAATAAAATGTGC 4065
    ATTTCT ATAGT
    13882 13904 CCACTATGCACATTTTA 2297 GAGAAATAAAATGTG 4066
    TTTCTC CATAG
    13904 13926 CCAACATACTCGGATTC 2298 AGGGTAGAATCCGAGT 4067
    TACCCT ATGT
    13923 13945 CCCTAGCATCACACACC 2299 TTGTGCGGTGTGTGAT 4068
    GCACAA GCTA
    13924 13946 CCTAGCATCACACACCG 2300 ATTGTGCGGTGTGTGA 4069
    CACAAT TGCT
    13938 13960 CCGCACAATCCCCTATC 2301 GGCCTAGATAGGGGAT 4070
    TAGGCC TGTG
    13947 13969 CCCCTATCTAGGCCTTC 2302 TCGTAAGAAGGCCTAG 4071
    TTACGA ATAG
    13948 13970 CCCTATCTAGGCCTTCT 2303 CTCGTAAGAAGGCCTA 4072
    TACGAG GATA
    13949 13971 CCTATCTAGGCCTTCTT 2304 GCTCGTAAGAAGGCCT 4073
    ACGAGC AGAT
    13959 13981 CCTTCTTACGAGCCAAA 2305 GCAGGTTTTGGCTCGT 4074
    ACCTGC AAGA
    13971 13993 CCAAAACCTGCCCCTAC 2306 GGAGGAGTAGGGGCA 4075
    TCCTCC GGTTT
    13977 13999 CCTGCCCCTACTCCTCC 2307 GGTCTAGGAGGAGTA 4076
    TAGACC GGGGC
    13981 14003 CCCCTACTCCTCCTAGA 2308 GTTAGGTCTAGGAGGA 4077
    CCTAAC GTAG
    13982 14004 CCCTACTCCTCCTAGAC 2309 GGTTAGGTCTAGGAGG 4078
    CTAACC AGTA
    13983 14005 CCTACTCCTCCTAGACC 2310 AGGTTAGGTCTAGGAG 4079
    TAACCT GAGT
    13989 14011 CCTCCTAGACCTAACCT 2311 CTAGTCAGGTTAGGTC 4080
    GACTAG TAGG
    13992 14014 CCTAGACCTAACCTGAC 2312 TTTCTAGTCAGGTTAG 4081
    TAGAAA GTCT
    13998 14020 CCTAACCTGACTAGAAA 2313 ATAGCTTTTCTAGTCA 4082
    AGCTAT GGTT
    14003 14025 CCTGACTAGAAAAGCTA 2314 AGGTAATAGCTTTTCT 4083
    TTACCT AGTC
    14023 14045 CCTAAAACAATTTCACA 2315 TGGTGCTGTGAAATTG 4084
    GCACCA TTTT
    14043 14065 CCAAATCTCCACCTCCA 2316 TGATGATGGAGGTGGA 4085
    TCATCA GATT
    14051 14073 CCACCTCCATCATCACC 2317 GGTTGAGGTGATGATG 4086
    TCAACC GAGG
    14054 14076 CCTCCATCATCACCTCA 2318 TTGGGTTGAGGTGATG 4087
    ACCCAA ATGG
    14057 14079 CCATCATCACCTCAACC 2319 TTTTTGGGTTGAGGTG 4088
    CAAAAA ATGA
    14066 14088 CCTCAACCCAAAAAGGC 2320 AATTATGCCTTTTTGG 4089
    ATAATT GTTG
    14072 14094 CCCAAAAAGGCATAATT 2321 AAGTTTAATTATGCCT 4090
    AAACTT TTTT
    14073 14095 CCAAAAAGGCATAATTA 2322 AAAGTTTAATTATGCC 4091
    AACTTT TTTT
    14100 14122 CCTCTCTTTCTTCTTCCC 2323 TGAGTGGGAAGAAGA 4092
    ACTCA AAGAG
    14115 14137 CCCACTCATCCTAACCC 2324 GGAGTAGGGTTAGGAT 4093
    TACTCC GAGT
    14116 14138 CCACTCATCCTAACCCT 2325 AGGAGTAGGGTTAGG 4094
    ACTCCT ATGAG
    14124 14146 CCTAACCCTACTCCTAA 2326 ATGTGATTAGGAGTAG 4095
    TCACAT GGTT
    14129 14151 CCCTACTCCTAATCACA 2327 AGGTTATGTGATTAGG 4096
    TAACCT AGTA
    14130 14152 CCTACTCCTAATCACAT 2328 TAGGTTATGTGATTAG 4097
    AACCTA GAGT
    14136 14158 CCTAATCACATAACCTA 2329 GGGGAATAGGTTATGT 4098
    TTCCCC GATT
    14149 14171 CCTATTCCCCCGAGCAA 2330 TTGAGATTGCTCGGGG 4099
    TCTCAA GAAT
    14155 14177 CCCCCGAGCAATCTCAA 2331 TTGTAATTGAGATTGC 4100
    TTACAA TCGG
    14156 14178 CCCCGAGCAATCTCAAT 2332 ATTGTAATTGAGATTG 4101
    TACAAT CTCG
    14157 14179 CCCGAGCAATCTCAATT 2333 TATTGTAATTGAGATT 4102
    ACAATA GCTC
    14158 14180 CCGAGCAATCTCAATTA 2334 ATATTGTAATTGAGAT 4103
    CAATAT TGCT
    14186 14208 CCAACAAACAATGTTCA 2335 ACTGGTTGAACATTGT 4104
    ACCAGT TTGT
    14204 14226 CCAGTAACTACTACTAA 2336 CGTTGATTAGTAGTAG 4105
    TCAACG TTAC
    14227 14249 CCCATAATCATACAAAG 2337 CGGGGGCTTTGTATGA 4106
    CCCCCG TTAT
    14228 14250 CCATAATCATACAAAGC 2338 GCGGGGGCTTTGTATG 4107
    CCCCGC ATTA
    14244 14266 CCCCCGCACCAATAGGA 2339 GGAGGATCCTATTGGT 4108
    TCCTCC GCGG
    14245 14267 CCCCGCACCAATAGGAT 2340 GGGAGGATCCTATTGG 4109
    CCTCCC TGCG
    14246 14268 CCCGCACCAATAGGATC 2341 CGGGAGGATCCTATTG 4110
    CTCCCG GTGC
    14247 14269 CCGCACCAATAGGATCC 2342 TCGGGAGGATCCTATT 4111
    TCCCGA GGTG
    14252 14274 CCAATAGGATCCTCCCG 2343 TTGATTCGGGAGGATC 4112
    AATCAA CTAT
    14262 14284 CCTCCCGAATCAACCCT 2344 GGGGTCAGGGTTGATT 4113
    GACCCC CGGG
    14265 14287 CCCGAATCAACCCTGAC 2345 AGAGGGGTCAGGGTT 4114
    CCCTCT GATTC
    14266 14288 CCGAATCAACCCTGACC 2346 GAGAGGGGTCAGGGT 4115
    CCTCTC TGATT
    14275 14297 CCCTGACCCCTCTCCTT 2347 TTTATGAAGGAGAGGG 4116
    CATAAA GTCA
    14276 14298 CCTGACCCCTCTCCTTC 2348 ATTTATGAAGGAGAGG 4117
    ATAAAT GGTC
    14281 14303 CCCCTCTCCTTCATAAA 2349 GAATAATTTATGAAGG 4118
    TTATTC AGAG
    14282 14304 CCCTCTCCTTCATAAAT 2350 TGAATAATTTATGAAG 4119
    TATTCA GAGA
    14283 14305 CCTCTCCTTCATAAATT 2351 CTGAATAATTTATGAA 4120
    ATTCAG GGAG
    14288 14310 CCTTCATAAATTATTCA 2352 GGAAGCTGAATAATTT 4121
    GCTTCC ATGA
    14309 14331 CCTACACTATTAAAGTT 2353 GTGGTAAACTTTAATA 4122
    TACCAC GTGT
    14328 14350 CCACAACCACCACCCCA 2354 GTATGATGGGGTGGTG 4123
    TCATAC GTTG
    14334 14356 CCACCACCCCATCATAC 2355 GAAAGAGTATGATGG 4124
    TCTTTC GGTGG
    14337 14359 CCACCCCATCATACTCT 2356 GGTGAAAGAGTATGAT 4125
    TTCACC GGGG
    14340 14362 CCCCATCATACTCTTTC 2357 GTGGGTGAAAGAGTAT 4126
    ACCCAC GATG
    14341 14363 CCCATCATACTCTTTCA 2358 TGTGGGTGAAAGAGTA 4127
    CCCACA TGAT
    14342 14364 CCATCATACTCTTTCAC 2359 CTGTGGGTGAAAGAGT 4128
    CCACAG ATGA
    14358 14380 CCCACAGCACCAATCCT 2360 GGAGGTAGGATTGGTG 4129
    ACCTCC CTGT
    14359 14381 CCACAGCACCAATCCTA 2361 TGGAGGTAGGATTGGT 4130
    CCTCCA GCTG
    14367 14389 CCAATCCTACCTCCATC 2362 GTTAGCGATGGAGGTA 4131
    GCTAAC GGAT
    14372 14394 CCTACCTCCATCGCTAA 2363 GTGGGGTTAGCGATGG 4132
    CCCCAC AGGT
    14376 14398 CCTCCATCGCTAACCCC 2364 TTTAGTGGGGTTAGCG 4133
    ACTAAA ATGG
    14379 14401 CCATCGCTAACCCCACT 2365 TGTTTTAGTGGGGTTA 4134
    AAAACA GCGA
    14389 14411 CCCCACTAAAACACTCA 2366 TCTTGGTGAGTGTTTT 4135
    CCAAGA AGTG
    14390 14412 CCCACTAAAACACTCAC 2367 GTCTTGGTGAGTGTTT 4136
    CAAGAC TAGT
    14391 14413 CCACTAAAACACTCACC 2368 GGTCTTGGTGAGTGTT 4137
    AAGACC TTAG
    14406 14428 CCAAGACCTCAACCCCT 2369 GGGGTCAGGGGTTGA 4138
    GACCCC GGTCT
    14412 14434 CCTCAACCCCTGACCCC 2370 GGCATGGGGGTCAGG 4139
    CATGCC GGTTG
    14418 14440 CCCCTGACCCCCATGCC 2371 TCCTGAGGCATGGGGG 4140
    TCAGGA TCAG
    14419 14441 CCCTGACCCCCATGCCT 2372 ATCCTGAGGCATGGGG 4141
    CAGGAT GTCA
    14420 14442 CCTGACCCCCATGCCTC 2373 TATCCTGAGGCATGGG 4142
    AGGATA GGTC
    14425 14447 CCCCCATGCCTCAGGAT 2374 AGGAGTATCCTGAGGC 4143
    ACTCCT ATGG
    14426 14448 CCCCATGCCTCAGGATA 2375 GAGGAGTATCCTGAGG 4144
    CTCCTC CATG
    14427 14449 CCCATGCCTCAGGATAC 2376 TGAGGAGTATCCTGAG 4145
    TCCTCA GCAT
    14428 14450 CCATGCCTCAGGATACT 2377 TTGAGGAGTATCCTGA 4146
    CCTCAA GGCA
    14433 14455 CCTCAGGATACTCCTCA 2378 GGCTATTGAGGAGTAT 4147
    ATAGCC CCTG
    14445 14467 CCTCAATAGCCATCGCT 2379 TACTACAGCGATGGCT 4148
    GTAGTA ATTG
    14454 14476 CCATCGCTGTAGTATAT 2380 CTTTGGATATACTACA 4149
    CCAAAG GCGA
    14471 14493 CCAAAGACAACCATCAT 2381 GGGGGAATGATGGTTG 4150
    TCCCCC TCTT
    14481 14503 CCATCATTCCCCCTAAA 2382 AATTTATTTAGGGGGA 4151
    TAAATT ATGA
    14489 14511 CCCCCTAAATAAATTAA 2383 GTTTTTTTAATTTATTT 4152
    AAAAAC AGG
    14490 14512 CCCCTAAATAAATTAAA 2384 AGTTTTTTTAATTTATT 4153
    AAAACT TAG
    14491 14513 CCCTAAATAAATTAAAA 2385 TAGTTTTTTTAATTTAT 4154
    AAACTA TTA
    14492 14514 CCTAAATAAATTAAAAA 2386 ATAGTTTTTTTAATTTA 4155
    AACTAT TTT
    14519 14541 CCCATATAACCTCCCCC 2387 AATTTTGGGGGAGGTT 4156
    AAAATT ATAT
    14520 14542 CCATATAACCTCCCCCA 2388 GAATTTTGGGGGAGGT 4157
    AAATTC TATA
    14528 14550 CCTCCCCCAAAATTCAG 2389 ATTATTCTGAATTTTG 4158
    AATAAT GGGG
    14531 14553 CCCCCAAAATTCAGAAT 2390 GTTATTATTCTGAATTT 4159
    AATAAC TGG
    14532 14554 CCCCAAAATTCAGAATA 2391 TGTTATTATTCTGAATT 4160
    ATAACA TTG
    14533 14555 CCCAAAATTCAGAATAA 2392 GTGTTATTATTCTGAA 4161
    TAACAC TTTT
    14534 14556 CCAAAATTCAGAATAAT 2393 TGTGTTATTATTCTGA 4162
    AACACA ATTT
    14557 14579 CCCGACCACACCGCTAA 2394 TGATTGTTAGCGGTGT 4163
    CAATCA GGTC
    14558 14580 CCGACCACACCGCTAAC 2395 TTGATTGTTAGCGGTG 4164
    AATCAA TGGT
    14562 14584 CCACACCGCTAACAATC 2396 AGTATTGATTGTTAGC 4165
    AATACT GGTG
    14567 14589 CCGCTAACAATCAATAC 2397 GGTTTAGTATTGATTG 4166
    TAAACC TTAG
    14588 14610 CCCCCATAAATAGGAGA 2398 AAGCCTTCTCCTATTT 4167
    AGGCTT ATGG
    14589 14611 CCCCATAAATAGGAGA 2399 TAAGCCTTCTCCTATTT 4168
    AGGCTTA ATG
    14590 14612 CCCATAAATAGGAGAA 2400 CTAAGCCTTCTCCTAT 4169
    GGCTTAG TTAT
    14591 14613 CCATAAATAGGAGAAG 2401 TCTAAGCCTTCTCCTA 4170
    GCTTAGA TTTA
    14620 14642 CCCCACAAACCCCATTA 2402 GTTTAGTAATGGGGTT 4171
    CTAAAC TGTG
    14621 14643 CCCACAAACCCCATTAC 2403 GGTTTAGTAATGGGGT 4172
    TAAACC TTGT
    14622 14644 CCACAAACCCCATTACT 2404 GGGTTTAGTAATGGGG 4173
    AAACCC TTTG
    14629 14651 CCCCATTACTAAACCCA 2405 TGAGTGTGGGTTTAGT 4174
    CACTCA AATG
    14630 14652 CCCATTACTAAACCCAC 2406 TTGAGTGTGGGTTTAG 4175
    ACTCAA TAAT
    14631 14653 CCATTACTAAACCCACA 2407 GTTGAGTGTGGGTTTA 4176
    CTCAAC GTAA
    14642 14664 CCCACACTCAACAGAAA 2408 GCTTTGTTTCTGTTGA 4177
    CAAAGC GTGT
    14643 14665 CCACACTCAACAGAAAC 2409 TGCTTTGTTTCTGTTGA 4178
    AAAGCA GTG
    14694 14716 CCACGACCAATGATATG 2410 GTTTTTCATATCATTG 4179
    AAAAAC GTCG
    14700 14722 CCAATGATATGAAAAAC 2411 ACGATGGTTTTTCATA 4180
    CATCGT TCAT
    14716 14738 CCATCGTTGTATTTCAA 2412 TTGTAGTTGAAATACA 4181
    CTACAA ACGA
    14744 14766 CCAATGACCCCAATACG 2413 GTTTTGCGTATTGGGG 4182
    CAAAAC TCAT
    14751 14773 CCCCAATACGCAAAACT 2414 GGGGTTAGTTTTGCGT 4183
    AACCCC ATTG
    14752 14774 CCCAATACGCAAAACTA 2415 GGGGGTTAGTTTTGCG 4184
    ACCCCC TATT
    14753 14775 CCAATACGCAAAACTAA 2416 AGGGGGTTAGTTTTGC 4185
    CCCCCT GTAT
    14770 14792 CCCCCTAATAAAATTAA 2417 GGTTAATTAATTTTAT 4186
    TTAACC TAGG
    14771 14793 CCCCTAATAAAATTAAT 2418 TGGTTAATTAATTTTA 4187
    TAACCA TTAG
    14772 14794 CCCTAATAAAATTAATT 2419 GTGGTTAATTAATTTT 4188
    AACCAC ATTA
    14773 14795 CCTAATAAAATTAATTA 2420 AGTGGTTAATTAATTT 4189
    ACCACT TATT
    14791 14813 CCACTCATTCATCGACC 2421 TGGGGAGGTCGATGA 4190
    TCCCCA ATGAG
    14806 14828 CCTCCCCACCCCATCCA 2422 AGATGTTGGATGGGGT 4191
    ACATCT GGGG
    14809 14831 CCCCACCCCATCCAACA 2423 CGGAGATGTTGGATGG 4192
    TCTCCG GGTG
    14810 14832 CCCACCCCATCCAACAT 2424 GCGGAGATGTTGGATG 4193
    CTCCGC GGGT
    14811 14833 CCACCCCATCCAACATC 2425 TGCGGAGATGTTGGAT 4194
    TCCGCA GGGG
    14814 14836 CCCCATCCAACATCTCC 2426 TCATGCGGAGATGTTG 4195
    GCATGA GATG
    14815 14837 CCCATCCAACATCTCCG 2427 ATCATGCGGAGATGTT 4196
    CATGAT GGAT
    14816 14838 CCATCCAACATCTCCGC 2428 CATCATGCGGAGATGT 4197
    ATGATG TGGA
    14820 14842 CCAACATCTCCGCATGA 2429 GTTTCATCATGCGGAG 4198
    TGAAAC ATGT
    14829 14851 CCGCATGATGAAACTTC 2430 TGAGCCGAAGTTTCAT 4199
    GGCTCA CATG
    14854 14876 CCTTGGCGCCTGCCTGA 2431 GGAGGATCAGGCAGG 4200
    TCCTCC CGCCA
    14862 14884 CCTGCCTGATCCTCCAA 2432 GGTGATTTGGAGGATC 4201
    ATCACC AGGC
    14866 14888 CCTGATCCTCCAAATCA 2433 CTGTGGTGATTTGGAG 4202
    CCACAG GATC
    14872 14894 CCTCCAAATCACCACAG 2434 ATAGTCCTGTGGTGAT 4203
    GACTAT TTGG
    14875 14897 CCAAATCACCACAGGAC 2435 GGAATAGTCCTGTGGT 4204
    TATTCC GATT
    14883 14905 CCACAGGACTATTCCTA 2436 CATGGCTAGGAATAGT 4205
    GCCATG CCTG
    14896 14918 CCTAGCCATGCACTACT 2437 CTGGTGAGTAGTGCAT 4206
    CACCAG GGCT
    14901 14923 CCATGCACTACTCACCA 2438 GGCGTCTGGTGAGTAG 4207
    GACGCC TGCA
    14915 14937 CCAGACGCCTCAACCGC 2439 GAAAAGGCGGTTGAG 4208
    CTTTTC GCGTC
    14922 14944 CCTCAACCGCCTTTTCA 2440 GATTGATGAAAAGGC 4209
    TCAATC GGTTG
    14928 14950 CCGCCTTTTCATCAATC 2441 GTGGGCGATTGATGAA 4210
    GCCCAC AAGG
    14931 14953 CCTTTTCATCAATCGCC 2442 GATGTGGGCGATTGAT 4211
    CACATC GAAA
    14946 14968 CCCACATCACTCGAGAC 2443 ATTTACGTCTCGAGTG 4212
    GTAAAT ATGT
    14947 14969 CCACATCACTCGAGACG 2444 AATTTACGTCTCGAGT 4213
    TAAATT GATG
    14983 15005 CCGCTACCTTCACGCCA 2445 CGCCATTGGCGTGAAG 4214
    ATGGCG GTAG
    14989 15011 CCTTCACGCCAATGGCG 2446 TTGAGGCGCCATTGGC 4215
    CCTCAA GTGA
    14997 15019 CCAATGGCGCCTCAATA 2447 AAAGAATATTGAGGC 4216
    TTCTTT GCCAT
    15006 15028 CCTCAATATTCTTTATCT 2448 GAGGCAGATAAAGAA 4217
    GCCTC TATTG
    15025 15047 CCTCTTCCTACACATCG 2449 CTCGCCCGATGTGTAG 4218
    GGCGAG GAAG
    15031 15053 CCTACACATCGGGCGAG 2450 ATAGGCCTCGCCCGAT 4219
    GCCTAT GTGT
    15049 15071 CCTATATTACGGATCAT 2451 AGAGAAATGATCCGTA 4220
    TTCTCT ATAT
    15081 15103 CCTGAAACATCGGCATT 2452 GAGGATAATGCCGATG 4221
    ATCCTC TTTC
    15100 15122 CCTCCTGCTTGCAACTA 2453 TTGCTATAGTTGCAAG 4222
    TAGCAA CAGG
    15103 15125 CCTGCTTGCAACTATAG 2454 CTGTTGCTATAGTTGC 4223
    CAACAG AAGC
    15126 15148 CCTTCATAGGCTATGTC 2455 CGGGAGGACATAGCCT 4224
    CTCCCG ATGA
    15142 15164 CCTCCCGTGAGGCCAAA 2456 ATGATATTTGGCCTCA 4225
    TATCAT CGGG
    15145 15167 CCCGTGAGGCCAAATAT 2457 AGAATGATATTTGGCC 4226
    CATTCT TCAC
    15146 15168 CCGTGAGGCCAAATATC 2458 CAGAATGATATTTGGC 4227
    ATTCTG CTCA
    15154 15176 CCAAATATCATTCTGAG 2459 TGGCCCCTCAGAATGA 4228
    GGGCCA TATT
    15174 15196 CCACAGTAATTACAAAC 2460 TAGTAAGTTTGTAATT 4229
    TTACTA ACTG
    15198 15220 CCGCCATCCCATACATT 2461 TGTCCCAATGTATGGG 4230
    GGGACA ATGG
    15201 15223 CCATCCCATACATTGGG 2462 GTCTGTCCCAATGTAT 4231
    ACAGAC GGGA
    15205 15227 CCCATACATTGGGACAG 2463 CTAGGTCTGTCCCAAT 4232
    ACCTAG GTAT
    15206 15228 CCATACATTGGGACAGA 2464 ACTAGGTCTGTCCCAA 4233
    CCTAGT TGTA
    15223 15245 CCTAGTTCAATGAATCT 2465 CTCCTCAGATTCATTG 4234
    GAGGAG AACT
    15263 15285 CCCACCCTCACACGATT 2466 GTAAAGAATCGTGTGA 4235
    CTTTAC GGGT
    15264 15286 CCACCCTCACACGATTC 2467 GGTAAAGAATCGTGTG 4236
    TTTACC AGGG
    15267 15289 CCCTCACACGATTCTTT 2468 AAAGGTAAAGAATCG 4237
    ACCTTT TGTGA
    15268 15290 CCTCACACGATTCTTTA 2469 GAAAGGTAAAGAATC 4238
    CCTTTC GTGTG
    15285 15307 CCTTTCACTTCATCTTGC 2470 GAAGGGCAAGATGAA 4239
    CCTTC GTGAA
    15302 15324 CCCTTCATTATTGCAGC 2471 GCTAGGGCTGCAATAA 4240
    CCTAGC TGAA
    15303 15325 CCTTCATTATTGCAGCC 2472 TGCTAGGGCTGCAATA 4241
    CTAGCA ATGA
    15318 15340 CCCTAGCAACACTCCAC 2473 TAGGAGGTGGAGTGTT 4242
    CTCCTA GCTA
    15319 15341 CCTAGCAACACTCCACC 2474 ATAGGAGGTGGAGTGT 4243
    TCCTAT TGCT
    15331 15353 CCACCTCCTATTCTTGC 2475 TTTCGTGCAAGAATAG 4244
    ACGAAA GAGG
    15334 15356 CCTCCTATTCTTGCACG 2476 CCGTTTCGTGCAAGAA 4245
    AAACGG TAGG
    15337 15359 CCTATTCTTGCACGAAA 2477 ATCCCGTTTCGTGCAA 4246
    CGGGAT GAAT
    15367 15389 CCCCCTAGGAATCACCT 2478 AATGGGAGGTGATTCC 4247
    CCCATT TAGG
    15368 15390 CCCCTAGGAATCACCTC 2479 GAATGGGAGGTGATTC 4248
    CCATTC CTAG
    15369 15391 CCCTAGGAATCACCTCC 2480 GGAATGGGAGGTGATT 4249
    CATTCC CCTA
    15370 15392 CCTAGGAATCACCTCCC 2481 CGGAATGGGAGGTGA 4250
    ATTCCG TTCCT
    15381 15403 CCTCCCATTCCGATAAA 2482 GGTGATTTTATCGGAA 4251
    ATCACC TGGG
    15384 15406 CCCATTCCGATAAAATC 2483 GAAGGTGATTTTATCG 4252
    ACCTTC GAAT
    15385 15407 CCATTCCGATAAAATCA 2484 GGAAGGTGATTTTATC 4253
    CCTTCC GGAA
    15390 15412 CCGATAAAATCACCTTC 2485 AGGGTGGAAGGTGATT 4254
    CACCCT TTAT
    15402 15424 CCTTCCACCCTTACTAC 2486 GATTGTGTAGTAAGGG 4255
    ACAATC TGGA
    15406 15428 CCACCCTTACTACACAA 2487 CTTTGATTGTGTAGTA 4256
    TCAAAG AGGG
    15409 15431 CCCTTACTACACAATCA 2488 CGTCTTTGATTGTGTA 4257
    AAGACG GTAA
    15410 15432 CCTTACTACACAATCAA 2489 GCGTCTTTGATTGTGT 4258
    AGACGC AGTA
    15432 15454 CCCTCGGCTTACTTCTCT 2490 AAGGAAGAGAAGTAA 4259
    TCCTT GCCGA
    15433 15455 CCTCGGCTTACTTCTCTT 2491 GAAGGAAGAGAAGTA 4260
    CCTTC AGCCG
    15451 15473 CCTTCTCTCCTTAATGA 2492 TTAATGTCATTAAGGA 4261
    CATTAA GAGA
    15459 15481 CCTTAATGACATTAACA 2493 GAATAGTGTTAATGTC 4262
    CTATTC ATTA
    15485 15507 CCAGACCTCCTAGGCGA 2494 TCTGGGTCGCCTAGGA 4263
    CCCAGA GGTC
    15490 15512 CCTCCTAGGCGACCCAG 2495 AATTGTCTGGGTCGCC 4264
    ACAATT TAGG
    15493 15515 CCTAGGCGACCCAGACA 2496 TATAATTGTCTGGGTC 4265
    ATTATA GCCT
    15502 15524 CCCAGACAATTATACCC 2497 TGGCTAGGGTATAATT 4266
    TAGCCA GTCT
    15503 15525 CCAGACAATTATACCCT 2498 TTGGCTAGGGTATAAT 4267
    AGCCAA TGTC
    15516 15538 CCCTAGCCAACCCCTTA 2499 GGTGTTTAAGGGGTTG 4268
    AACACC GCTA
    15517 15539 CCTAGCCAACCCCTTAA 2500 GGGTGTTTAAGGGGTT 4269
    ACACCC GGCT
    15522 15544 CCAACCCCTTAAACACC 2501 GGGAGGGGTGTTTAAG 4270
    CCTCCC GGGT
    15526 15548 CCCCTTAAACACCCCTC 2502 TGTGGGGAGGGGTGTT 4271
    CCCACA TAAG
    15527 15549 CCCTTAAACACCCCTCC 2503 ATGTGGGGAGGGGTGT 4272
    CCACAT TTAA
    15528 15550 CCTTAAACACCCCTCCC 2504 GATGTGGGGAGGGGT 4273
    CACATC GTTTA
    15537 15559 CCCCTCCCCACATCAAG 2505 TTCGGGCTTGATGTGG 4274
    CCCGAA GGAG
    15538 15560 CCCTCCCCACATCAAGC 2506 ATTCGGGCTTGATGTG 4275
    CCGAAT GGGA
    15539 15561 CCTCCCCACATCAAGCC 2507 CATTCGGGCTTGATGT 4276
    CGAATG GGGG
    15542 15564 CCCCACATCAAGCCCGA 2508 TATCATTCGGGCTTGA 4277
    ATGATA TGTG
    15543 15565 CCCACATCAAGCCCGAA 2509 ATATCATTCGGGCTTG 4278
    TGATAT ATGT
    15544 15566 CCACATCAAGCCCGAAT 2510 AATATCATTCGGGCTT 4279
    GATATT GATG
    15554 15576 CCCGAATGATATTTCCT 2511 GCGAATAGGAAATATC 4280
    ATTCGC ATTC
    15555 15577 CCGAATGATATTTCCTA 2512 GGCGAATAGGAAATA 4281
    TTCGCC TCATT
    15568 15590 CCTATTCGCCTACACAA 2513 GGAGAATTGTGTAGGC 4282
    TTCTCC GAAT
    15576 15598 CCTACACAATTCTCCGA 2514 GACGGATCGGAGAATT 4283
    TCCGTC GTGT
    15589 15611 CCGATCCGTCCCTAACA 2515 CTAGTTTGTTAGGGAC 4284
    AACTAG GGAT
    15594 15616 CCGTCCCTAACAAACTA 2516 GCCTCCTAGTTTGTTA 4285
    GGAGGC GGGA
    15598 15620 CCCTAACAAACTAGGAG 2517 GGACGCCTCCTAGTTT 4286
    GCGTCC GTTA
    15599 15621 CCTAACAAACTAGGAG 2518 AGGACGCCTCCTAGTT 4287
    GCGTCCT TGTT
    15619 15641 CCTTGCCCTATTACTAT 2519 GGATGGATAGTAATAG 4288
    CCATCC GGCA
    15624 15646 CCCTATTACTATCCATC 2520 GATGAGGATGGATAGT 4289
    CTCATC AATA
    15625 15647 CCTATTACTATCCATCC 2521 GGATGAGGATGGATA 4290
    TCATCC GTAAT
    15636 15658 CCATCCTCATCCTAGCA 2522 GATTATTGCTAGGATG 4291
    ATAATC AGGA
    15640 15662 CCTCATCCTAGCAATAA 2523 TGGGGATTATTGCTAG 4292
    TCCCCA GATG
    15646 15668 CCTAGCAATAATCCCCA 2524 GGAGGATGGGGATTAT 4293
    TCCTCC TGCT
    15658 15680 CCCCATCCTCCATATAT 2525 GTTTGGATATATGGAG 4294
    CCAAAC GATG
    15659 15681 CCCATCCTCCATATATC 2526 TGTTTGGATATATGGA 4295
    CAAACA GGAT
    15660 15682 CCATCCTCCATATATCC 2527 TTGTTTGGATATATGG 4296
    AAACAA AGGA
    15664 15686 CCTCCATATATCCAAAC 2528 TTTGTTGTTTGGATAT 4297
    AACAAA ATGG
    15667 15689 CCATATATCCAAACAAC 2529 TGCTTTGTTGTTTGGAT 4298
    AAAGCA ATA
    15675 15697 CCAAACAACAAAGCAT 2530 AAATATTATGCTTTGT 4299
    AATATTT TGTT
    15700 15722 CCCACTAAGCCAATCAC 2531 AATAAAGTGATTGGCT 4300
    TTTATT TAGT
    15701 15723 CCACTAAGCCAATCACT 2532 CAATAAAGTGATTGGC 4301
    TTATTG TTAG
    15709 15731 CCAATCACTTTATTGAC 2533 CTAGGAGTCAATAAAG 4302
    TCCTAG TGAT
    15727 15749 CCTAGCCGCAGACCTCC 2534 GAATGAGGAGGTCTGC 4303
    TCATTC GGCT
    15732 15754 CCGCAGACCTCCTCATT 2535 GGTTAGAATGAGGAG 4304
    CTAACC GTCTG
    15739 15761 CCTCCTCATTCTAACCT 2536 CGATTCAGGTTAGAAT 4305
    GAATCG GAGG
    15742 15764 CCTCATTCTAACCTGAA 2537 CTCCGATTCAGGTTAG 4306
    TCGGAG AATG
    15753 15775 CCTGAATCGGAGGACA 2538 TACTGGTTGTCCTCCG 4307
    ACCAGTA ATTC
    15770 15792 CCAGTAAGCTACCCTTT 2539 ATGGTAAAAGGGTAG 4308
    TACCAT CTTAC
    15781 15803 CCCTTTTACCATCATTG 2540 CTTGTCCAATGATGGT 4309
    GACAAG AAAA
    15782 15804 CCTTTTACCATCATTGG 2541 ACTTGTCCAATGATGG 4310
    ACAAGT TAAA
    15789 15811 CCATCATTGGACAAGTA 2542 GGATGCTACTTGTCCA 4311
    GCATCC ATGA
    15810 15832 CCGTACTATACTTCACA 2543 GATTGTTGTGAAGTAT 4312
    ACAATC AGTA
    15832 15854 CCTAATCCTAATACCAA 2544 AGATAGTTGGTATTAG 4313
    CTATCT GATT
    15838 15860 CCTAATACCAACTATCT 2545 TTAGGGAGATAGTTGG 4314
    CCCTAA TATT
    15845 15867 CCAACTATCTCCCTAAT 2546 TTTTCAATTAGGGAGA 4315
    TGAAAA TAGT
    15855 15877 CCCTAATTGAAAACAAA 2547 GAGTATTTTGTTTTCA 4316
    ATACTC ATTA
    15856 15878 CCTAATTGAAAACAAAA 2548 TGAGTATTTTGTTTTCA 4317
    TACTCA ATT
    15885 15907 CCTGTCCTTGTAGTATA 2549 TTAGTTTATACTACAA 4318
    AACTAA GGAC
    15890 15912 CCTTGTAGTATAAACTA 2550 GTGTATTAGTTTATAC 4319
    ATACAC TACA
    15912 15934 CCAGTCTTGTAAACCGG 2551 TCATCTCCGGTTTACA 4320
    AGATGA AGAC
    15925 15947 CCGGAGATGAAAACCTT 2552 TGGAAAAAGGTTTTCA 4321
    TTTCCA TCTC
    15938 15960 CCTTTTTCCAAGGACAA 2553 TCTGATTTGTCCTTGG 4322
    ATCAGA AAAA
    15945 15967 CCAAGGACAAATCAGA 2554 CTTTTTCTCTGATTTGT 4323
    GAAAAAG CCT
    15977 15999 CCACCATTAGCACCCAA 2555 TTAGCTTTGGGTGCTA 4324
    AGCTAA ATGG
    15980 16002 CCATTAGCACCCAAAGC 2556 ATCTTAGCTTTGGGTG 4325
    TAAGAT CTAA
    15989 16011 CCCAAAGCTAAGATTCT 2557 TAAATTAGAATCTTAG 4326
    AATTTA CTTT
    15990 16012 CCAAAGCTAAGATTCTA 2558 TTAAATTAGAATCTTA 4327
    ATTTAA GCTT
    16052 16074 CCACCCAAGTATTGACT 2559 TGGGTGAGTCAATACT 4328
    CACCCA TGGG
    16055 16077 CCCAAGTATTGACTCAC 2560 TGATGGGTGAGTCAAT 4329
    CCATCA ACTT
    16056 16078 CCAAGTATTGACTCACC 2561 TTGATGGGTGAGTCAA 4330
    CATCAA TACT
    16071 16093 CCCATCAACAACCGCTA 2562 AATACATAGCGGTTGT 4331
    TGTATT TGAT
    16072 16094 CCATCAACAACCGCTAT 2563 AAATACATAGCGGTTG 4332
    GTATTT TTGA
    16082 16104 CCGCTATGTATTTCGTA 2564 GTAATGTACGAAATAC 4333
    CATTAC ATAG
    16107 16129 CCAGCCACCATGAATAT 2565 CGTACAATATTCATGG 4334
    TGTACG TGGC
    16111 16133 CCACCATGAATATTGTA 2566 GTACCGTACAATATTC 4335
    CGGTAC ATGG
    16114 16136 CCATGAATATTGTACGG 2567 ATGGTACCGTACAATA 4336
    TACCAT TTCA
    16133 16155 CCATAAATACTTGACCA 2568 TACAGGTGGTCAAGTA 4337
    CCTGTA TTTA
    16147 16169 CCACCTGTAGTACATAA 2569 GGGTTTTTATGTACTA 4338
    AAACCC CAGG
    16150 16172 CCTGTAGTACATAAAAA 2570 ATTGGGTTTTTATGTA 4339
    CCCAAT CTAC
    16167 16189 CCCAATCCACATCAAAA 2571 AGGGGGTTTTGATGTG 4340
    CCCCCT GATT
    16168 16190 CCAATCCACATCAAAAC 2572 GAGGGGGTTTTGATGT 4341
    CCCCTC GGAT
    16173 16195 CCACATCAAAACCCCCT 2573 ATGGGGAGGGGGTTTT 4342
    CCCCAT GATG
    16184 16206 CCCCCTCCCCATGCTTA 2574 TGCTTGTAAGCATGGG 4343
    CAAGCA GAGG
    16185 16207 CCCCTCCCCATGCTTAC 2575 TTGCTTGTAAGCATGG 4344
    AAGCAA GGAG
    16186 16208 CCCTCCCCATGCTTACA 2576 CTTGCTTGTAAGCATG 4345
    AGCAAG GGGA
    16187 16209 CCTCCCCATGCTTACAA 2577 ACTTGCTTGTAAGCAT 4346
    GCAAGT GGGG
    16190 16212 CCCCATGCTTACAAGCA 2578 TGTACTTGCTTGTAAG 4347
    AGTACA CATG
    16191 16213 CCCATGCTTACAAGCAA 2579 CTGTACTTGCTTGTAA 4348
    GTACAG GCAT
    16192 16214 CCATGCTTACAAGCAAG 2580 GCTGTACTTGCTTGTA 4349
    TACAGC AGCA
    16221 16243 CCCTCAACTATCACACA 2581 AGTTGATGTGTGATAG 4350
    TCAACT TTGA
    16222 16244 CCTCAACTATCACACAT 2582 CAGTTGATGTGTGATA 4351
    CAACTG GTTG
    16250 16272 CCAAAGCCACCCCTCAC 2583 TAGTGGGTGAGGGGTG 4352
    CCACTA GCTT
    16256 16278 CCACCCCTCACCCACTA 2584 GTATCCTAGTGGGTGA 4353
    GGATAC GGGG
    16259 16281 CCCCTCACCCACTAGGA 2585 TTGGTATCCTAGTGGG 4354
    TACCAA TGAG
    16260 16282 CCCTCACCCACTAGGAT 2586 GTTGGTATCCTAGTGG 4355
    ACCAAC GTGA
    16261 16283 CCTCACCCACTAGGATA 2587 TGTTGGTATCCTAGTG 4356
    CCAACA GGTG
    16266 16288 CCCACTAGGATACCAAC 2588 AGGTTTGTTGGTATCC 4357
    AAACCT TAGT
    16267 16289 CCACTAGGATACCAACA 2589 TAGGTTTGTTGGTATC 4358
    AACCTA CTAG
    16278 16300 CCAACAAACCTACCCAC 2590 TTAAGGGTGGGTAGGT 4359
    CCTTAA TTGT
    16286 16308 CCTACCCACCCTTAACA 2591 ATGTACTGTTAAGGGT 4360
    GTACAT GGGT
    16290 16312 CCCACCCTTAACAGTAC 2592 TACTATGTACTGTTAA 4361
    ATAGTA GGGT
    16291 16313 CCACCCTTAACAGTACA 2593 GTACTATGTACTGTTA 4362
    TAGTAC AGGG
    16294 16316 CCCTTAACAGTACATAG 2594 TATGTACTATGTACTG 4363
    TACATA TTAA
    16295 16317 CCTTAACAGTACATAGT 2595 TTATGTACTATGTACT 4364
    ACATAA GTTA
    16320 16342 CCATTTACCGTACATAG 2596 AATGTGCTATGTACGG 4365
    CACATT TAAA
    16327 16349 CCGTACATAGCACATTA 2597 TGACTGTAATGTGCTA 4366
    CAGTCA TGTA
    16353 16375 CCCTTCTCGTCCCCATG 2598 GTCATCCATGGGGACG 4367
    GATGAC AGAA
    16354 16376 CCTTCTCGTCCCCATGG 2599 GGTCATCCATGGGGAC 4368
    ATGACC GAGA
    16363 16385 CCCCATGGATGACCCCC 2600 TCTGAGGGGGGTCAT 4369
    CTCAGA CCATG
    16364 16386 CCCATGGATGACCCCCC 2601 ATCTGAGGGGGGTCAT 4370
    TCAGAT CCAT
    16365 16387 CCATGGATGACCCCCCT 2602 TATCTGAGGGGGGTCA 4371
    CAGATA TCCA
    16375 16397 CCCCCCTCAGATAGGGG 2603 AAGGGACCCCTATCTG 4372
    TCCCTT AGGG
    16376 16398 CCCCCTCAGATAGGGGT 2604 CAAGGGACCCCTATCT 4373
    CCCTTG GAGG
    16377 16399 CCCCTCAGATAGGGGTC 2605 TCAAGGGACCCCTATC 4374
    CCTTGA TGAG
    16378 16400 CCCTCAGATAGGGGTCC 2606 GTCAAGGGACCCCTAT 4375
    CTTGAC CTGA
    16379 16401 CCTCAGATAGGGGTCCC 2607 GGTCAAGGGACCCCTA 4376
    TTGACC TCTG
    16393 16415 CCCTTGACCACCATCCT 2608 TCACGGAGGATGGTGG 4377
    CCGTGA TCAA
    16394 16416 CCTTGACCACCATCCTC 2609 TTCACGGAGGATGGTG 4378
    CGTGAA GTCA
    16400 16422 CCACCATCCTCCGTGAA 2610 ATTGATTTCACGGAGG 4379
    ATCAAT ATGG
    16403 16425 CCATCCTCCGTGAAATC 2611 GATATTGATTTCACGG 4380
    AATATC AGGA
    16407 16429 CCTCCGTGAAATCAATA 2612 GCGGGATATTGATTTC 4381
    TCCCGC ACGG
    16410 16432 CCGTGAAATCAATATCC 2613 TGTGCGGGATATTGAT 4382
    CGCACA TTCA
    16425 16447 CCCGCACAAGAGTGCTA 2614 GGAGAGTAGCACTCTT 4383
    CTCTCC GTGC
    16426 16448 CCGCACAAGAGTGCTAC 2615 AGGAGAGTAGCACTCT 4384
    TCTCCT TGTG
    16446 16468 CCTCGCTCCGGGCCCAT 2616 AGTGTTATGGGCCCGG 4385
    AACACT AGCG
    16453 16475 CCGGGCCCATAACACTT 2617 ACCCCCAAGTGTTATG 4386
    GGGGGT GGCC
    16458 16480 CCCATAACACTTGGGGG 2618 TAGCTACCCCCAAGTG 4387
    TAGCTA TTAT
    16459 16481 CCATAACACTTGGGGGT 2619 TTAGCTACCCCCAAGT 4388
    AGCTAA GTTA
    16494 16516 CCGACATCTGGTTCCTA 2620 CTGAAGTAGGAACCA 4389
    CTTCAG GATGT
    16507 16529 CCTACTTCAGGGTCATA 2621 AGGCTTTATGACCCTG 4390
    AAGCCT AAGT
    16527 16549 CCTAAATAGCCCACACG 2622 GGGGAACGTGTGGGCT 4391
    TTCCCC ATTT
    16536 16558 CCCACACGTTCCCCTTA 2623 CTTATTTAAGGGGAAC 4392
    AATAAG GTGT
    16537 16559 CCACACGTTCCCCTTAA 2624 TCTTATTTAAGGGGAA 4393
    ATAAGA CGTG
    16546 16568 CCCCTTAAATAAGACAT 2625 ATCGTGATGTCTTATT 4394
    CACGAT TAAG
    16547 16569 CCCTTAAATAAGACATC 2626 CATCGTGATGTCTTAT 4395
    ACGATG TTAA
    16548 16570 CCTTAAATAAGACATCA 2627 CCATCGTGATGTCTTA 4396
    CGATGG TTTA
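The rows above appear to follow one pattern, which the sketch below makes explicit. It assumes (the column headings are not shown in this part of the table) that each row lists a start and end coordinate, a 23-nt site beginning with CC (an NGG PAM read from the opposite strand), its SEQ ID NO, and a 20-nt targeting sequence with its SEQ ID NO. Under that assumption the targeting sequence is the reverse complement of site positions 4 to 23, and the end coordinate is the start plus 22. The function names are illustrative only and do not come from the patent.

```python
# Minimal consistency check for one table row (assumed column meaning, see lead-in).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def targeting_sequence(site: str) -> str:
    """Derive the 20-nt targeting sequence from a 23-nt site that starts with CC.

    The leading 'CCN' is the NGG PAM seen from the opposite strand, so the
    targeting sequence is the reverse complement of the remaining 20 nt.
    """
    if len(site) != 23 or not site.startswith("CC"):
        raise ValueError("expected a 23-nt site beginning with CC")
    return reverse_complement(site[3:])


def row_is_consistent(start: int, end: int, site: str, target: str) -> bool:
    """Check the coordinate span and the site/target relationship for one row."""
    return end - start == 22 and targeting_sequence(site) == target


# First complete row of this part of the table (SEQ ID NOs 2316 and 4085).
FIRST_SITE = "CCAAATCTCCACCTCCATCATCA"
FIRST_TARGET = "TGATGATGGAGGTGGAGATT"
print(row_is_consistent(14043, 14065, FIRST_SITE, FIRST_TARGET))
```

A check of this kind is a convenient way to spot transcription errors in long sequence tables, since a single dropped or substituted base breaks the site/target relationship.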
  • Applications
  • The gNAs (e.g., gRNAs) and collections of gNAs (e.g., gRNAs) provided herein are useful for a variety of applications, including depletion, partitioning, capture, or enrichment of target sequences of interest; genome-wide labeling; genome-wide editing; genome-wide functional screens; and genome-wide regulation.
  • In one embodiment, the gNAs are selective for host nucleic acids in a biological sample from a host, but are not selective for non-host nucleic acids in the sample from a host. In one embodiment, the gNAs are selective for non-host nucleic acids from a biological sample from a host but are not selective for the host nucleic acids in the sample. In one embodiment, the gNAs are selective for both host nucleic acids and a subset of the non-host nucleic acids in a biological sample from a host. For example, where a complex biological sample comprises host nucleic acids and nucleic acids from more than one non-host organism, the gRNAs may be selective for more than one of the non-host species. In such embodiments, the gNAs are used to serially deplete or partition the sequences that are not of interest. For example, saliva from a human contains human DNA, as well as the DNA of more than one bacterial species, but may also contain the genomic material of an unknown pathogenic organism. In such an embodiment, gNAs directed at the human DNA and the known bacteria can be used to serially deplete the human DNA and the DNA of the known bacteria, thus resulting in a sample comprising the genomic material of the unknown pathogenic organism.
  • In an exemplary embodiment, the gNAs are selective for human host DNA obtained from a biological sample from the host, but do not hybridize with DNA from an unknown pathogen(s) also obtained from the sample.
  • In some embodiments, the gNAs are useful for depleting and partitioning of targeted sequences in a sample, enriching a sample for non-host nucleic acids, or serially depleting targeted nucleic acids in a sample comprising: providing nucleic acids extracted from a sample; and contacting the sample with a plurality of complexes comprising (i) any one of the collection of gNAs described herein and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins.
  • In some embodiments, the gNAs are useful for a method of depletion and partitioning of targeted sequences in a sample comprising: providing nucleic acids extracted from a sample, wherein the extracted nucleic acids comprise sequences of interest and targeted sequences for one of depletion and partitioning; contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the nucleic acids in the sample.
  • In some embodiments, the gNAs are useful for enriching a sample for non-host nucleic acids comprising: providing a sample comprising host nucleic acids and non-host nucleic acids; contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein comprising targeting sequences directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, under conditions in which the nucleic acid-guided nuclease system proteins cleave the host nucleic acids in the sample, thereby depleting the sample of host nucleic acids, and allowing for the enrichment of non-host nucleic acids.
  • In some embodiments, the gNAs are useful for a method for serially depleting targeted nucleic acids in a sample comprising: providing a biological sample from a host comprising host nucleic acids and non-host nucleic acids, wherein the non-host nucleic acids comprise nucleic acids from at least one known non-host organism and nucleic acids from an unknown non-host organism; providing a plurality of complexes comprising (i) a collection of gNAs provided herein, directed at the host nucleic acids; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins; mixing the nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes (e.g., gRNA-CRISPR/Cas system protein complexes) configured to hybridize to targeted sequences in the host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the host nucleic acids, and wherein at least a portion of the host nucleic acids are cleaved; mixing the remaining nucleic acids from the biological sample with the gNA-nucleic acid-guided nuclease system protein complexes configured to hybridize to targeted sequences in the at least one known non-host nucleic acids, wherein at least a portion of the complexes hybridizes to the targeted sequences in the at least one non-host nucleic acids, and wherein at least a portion of the non-host nucleic acids are cleaved; and isolating the remaining nucleic acids from the unknown non-host organism and preparing for further analysis.
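The serial depletion described in the preceding paragraph can be pictured with a toy in silico model. The sketch below is illustrative only and rests on several simplifying assumptions that are not taken from the text: fragments are plain DNA strings, a fragment counts as cleaved when a 20-nt targeting sequence followed by an NGG PAM occurs exactly on either strand, and every cleaved fragment is treated as removed from the sample. The bacterial guide and all fragment sequences are invented for the example; only the host guide is taken from the table above (SEQ ID NO 4085).

```python
# Toy model of serial depletion: host guides first, then known non-host guides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_cleaved(fragment: str, guides: list[str]) -> bool:
    """True if any guide target followed by an NGG PAM occurs on either strand."""
    for strand in (fragment, reverse_complement(fragment)):
        for guide in guides:
            pos = strand.find(guide)
            while pos != -1:
                pam = strand[pos + len(guide): pos + len(guide) + 3]
                if len(pam) == 3 and pam[1:] == "GG":
                    return True
                pos = strand.find(guide, pos + 1)
    return False


def deplete(fragments: list[str], guides: list[str]) -> list[str]:
    """Keep only fragments that none of the guides would cleave."""
    return [f for f in fragments if not is_cleaved(f, guides)]


def serial_depletion(fragments: list[str], guide_collections: list[list[str]]) -> list[str]:
    """Apply each guide collection in turn to what remains of the sample."""
    remaining = list(fragments)
    for guides in guide_collections:
        remaining = deplete(remaining, guides)
    return remaining


# Hypothetical sample: one host, one known bacterial, one unknown fragment.
HOST_GUIDES = ["TGATGATGGAGGTGGAGATT"]      # from the table above
BACTERIAL_GUIDES = ["GATTACAGATTACAGATTAC"]  # invented for illustration
HOST_FRAGMENT = "AAAA" + "CCAAATCTCCACCTCCATCATCA" + "TTTT"
BACTERIAL_FRAGMENT = "TT" + "GATTACAGATTACAGATTAC" + "AGG" + "TT"
UNKNOWN_FRAGMENT = "ATATATATATCGCGCGATATATAT"

SAMPLE = [HOST_FRAGMENT, BACTERIAL_FRAGMENT, UNKNOWN_FRAGMENT]
print(serial_depletion(SAMPLE, [HOST_GUIDES, BACTERIAL_GUIDES]))
```

In a real workflow cleavage is neither exact-match nor complete, and cleaved fragments are separated by a downstream step rather than vanishing, so this model only conveys the order of operations: each round removes one class of known sequences and passes the remainder to the next.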
  • In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted functional screens in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein, in a way that gNA-directed nucleic acid-guided nuclease system protein editing can be achieved to sequences across the entire genome or to a specific region of the genome. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as a DNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as mRNA. In one embodiment, the nucleic acid-guided nuclease system protein can be introduced as protein. In one exemplary embodiment, the nucleic acid-guided nuclease system protein is Cas9.
  • In some embodiments, the gNAs generated herein are used for the selective capture and/or enrichment of nucleic acid sequences of interest. For example, in some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences comprising: providing a sample comprising a plurality of nucleic acids; and contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins. Once the sequences of interest are captured, they can be further ligated to create, for example, a sequencing library.
  • In some embodiments, the gNAs generated herein are used for introducing labeled nucleotides at targeted sites of interest comprising: (a) providing a sample comprising a plurality of nucleic acid fragments; (b) contacting the sample with a plurality of complexes comprising (i) a collection of gNAs provided herein; and (ii) nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein-nickases (e.g. Cas9-nickases), wherein the gNAs are complementary to targeted sites of interest in the nucleic acid fragments, thereby generating a plurality of nicked nucleic acid fragments at the targeted sites of interest; and (c) contacting the plurality of nicked nucleic acid fragments with an enzyme capable of initiating nucleic acid synthesis at a nicked site, and labeled nucleotides, thereby generating a plurality of nucleic acid fragments comprising labeled nucleotides in the targeted sites of interest.
  • In some embodiments, the gNAs generated herein are used for capturing target nucleic acid sequences of interest comprising: (a) providing a sample comprising a plurality of adapter-ligated nucleic acids, wherein the nucleic acids are ligated to a first adapter at one end and are ligated to a second adapter at the other end; and (b) contacting the sample with a collection of gNAs which comprise a plurality of dead nucleic acid-guided nuclease-gNA complexes (e.g., dCas9-gRNA complexes), wherein the dead nucleic acid-guided nuclease (e.g., dCas9) is fused to a transposase, wherein the gNAs are complementary to targeted sites of interest contained in a subset of the nucleic acids, and wherein the dead nucleic acid-guided nuclease-gNA transposase complexes (e.g., dCas9-gRNA transposase complexes) are loaded with a plurality of third adapters, to generate a plurality of nucleic acids fragments comprising either a first or second adapter at one end and a third adapter at the other end. In one embodiment the method further comprises amplifying the product of step (b) using first or second adapter and third adapter-specific PCR.
  • In some embodiments, the gNAs generated herein are used to perform genome-wide or targeted activation or repression in a population of cells. In such an embodiment, libraries of in vitro-transcribed gNAs (e.g., gRNAs) or vectors encoding the gNAs can be introduced into a population of cells via transfection or other laboratory techniques known in the art, along with a catalytically dead nucleic acid-guided nuclease (e.g., CRISPR/Cas) system protein fused to an activator or repressor domain (catalytically dead nucleic acid-guided nuclease system protein-fusion protein), in a way that gNA-directed catalytically dead nucleic acid-guided nuclease system protein-mediated activation or repression can be achieved at sequences across the entire genome or to a specific region of the genome. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as DNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as mRNA. In one embodiment, the catalytically dead nucleic acid-guided nuclease system protein-fusion protein can be introduced as protein. In some embodiments, the collection of gNAs or nucleic acids encoding for gNAs exhibit specificity for more than one nucleic acid-guided nuclease system protein. In one exemplary embodiment, the catalytically dead nucleic acid-guided nuclease system protein is dCas9.
  • In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for Cas9 and one or more CRISPR/Cas system proteins selected from the group consisting of Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5. In some embodiments, the collection comprises gRNAs or nucleic acids encoding for gRNAs with specificity for various catalytically dead CRISPR/Cas system proteins fused to different fluorophores, for example for use in the labeling and/or visualization of different genomes or portions of genomes, for use in the labeling and/or visualization of different chromosomal regions, or for use in the labeling and/or visualization of the integration of viral genes/genomes into a genome.
  • In some embodiments, the collection of gNAs (or nucleic acids encoding for gNAs) have specificity for different nucleic acid-guided nuclease (e.g., CRISPR/Cas) system proteins, and target different sequences of interest, for example from different species. For example, a first subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a first species can be first mixed with a first nucleic acid-guided nuclease system protein member (or an engineered version); and a second subset of gNAs from a collection of gNAs (or transcribed from a population of nucleic acids encoding such gNAs) targeting a genome from a second species can be mixed with a second different nucleic acid-guided nuclease system protein member (or an engineered version). In one embodiment, the nucleic acid-guided nuclease system proteins can be a catalytically dead version (for example dCas9) fused with different fluorophores, so that different targeted sequences of interest, e.g. different species genomes, or different chromosomes of one species, can be labeled by different fluorescent labels. For example, different chromosomal regions can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of genetic translocations. For example, different viral genomes can be labeled by different gRNA-targeted dCas9-fluorophores, for visualization of integration of different viral genomes into the host genome. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with either an activation or a repression domain, so that different targeted sequences of interest, e.g. different chromosomes of a genome, can be differentially regulated. In another embodiment, the nucleic acid-guided nuclease system protein can be dCas9 fused with different protein domains which can be recognized by different antibodies, so that different targeted sequences of interest, e.g. different DNA sequences within a sample mixture, can be differentially isolated.
  • Exemplary Compositions of the Invention
  • In one embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase nucleic acid-guided nuclease-gNA complex, and labeled nucleotides. In one exemplary embodiment, provided herein is a composition comprising a nucleic acid fragment, a nickase Cas9-gRNA complex, and labeled nucleotides. In such embodiments, the nucleic acid may comprise DNA. The nucleotides can be labeled, for example with biotin. The nucleotides can be part of an antibody-conjugate pair.
  • In one embodiment, provided herein is a composition comprising a nucleic acid fragment and a catalytically dead nucleic acid-guided nuclease-gNA complex, wherein the catalytically dead nucleic acid-guided nuclease is fused to a transposase. In one exemplary embodiment, provided herein is a composition comprising a DNA fragment and a dCas9-gRNA complex, wherein the dCas9 is fused to a transposase.
  • In one embodiment, provided herein is a composition comprising a nucleic acid fragment comprising methylated nucleotides, a nickase nucleic acid-guided nuclease-gNA complex, and unmethylated nucleotides. In an exemplary embodiment, provided herein is a composition comprising a DNA fragment comprising methylated nucleotides, a nickase Cas9-gRNA complex, and unmethylated nucleotides.
  • In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-DNA endonuclease. In an exemplary embodiment, the nucleic acid-guided-DNA endonuclease is NgAgo.
  • In one embodiment, provided herein is a gDNA complexed with a nucleic acid-guided-RNA endonuclease.
  • In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-DNA endonuclease.
  • In one embodiment, provided herein is a gRNA complexed with a nucleic acid-guided-RNA endonuclease. In one embodiment, the nucleic acid-guided-RNA endonuclease comprises C2c2.
  • Kits and Articles of Manufacture
  • The present application provides kits comprising any one or more of the compositions described herein, not limited to adapters, gNAs (e.g., gRNAs), gNA collections (e.g., gRNA collections), nucleic acid molecules encoding the gNA collections, and the like.
  • In one exemplary embodiment, the kit comprises a collection of DNA molecules capable of transcribing into a library of gRNAs wherein the gRNAs are targeted to human genomic or other sources of DNA sequences.
  • In one embodiment, the kit comprises a collection of gNAs wherein the gNAs are targeted to human genomic or other sources of DNA sequences.
  • In some embodiments, provided herein are kits comprising any of the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits comprising any of the collection of gNAs, as described herein.
  • The present application also provides all essential reagents and instructions for carrying out the methods of making the gNAs and the collection of nucleic acids encoding gNAs, as described herein. In some embodiments, provided herein are kits that comprise all essential reagents and instructions for carrying out the methods of making individual gNAs and collections of gNAs as described herein.
  • Also provided herein is computer software monitoring the information before and after contacting a sample with a gNA collection produced herein. In one exemplary embodiment, the software can compute and report the abundance of non-target sequence in the sample before and after providing the gNA collection to ensure no off-target targeting occurs, and the software can check the efficacy of targeted depletion/enrichment/capture/partitioning/labeling/regulation/editing by comparing the abundance of the target sequence before and after providing the gNA collection to the sample.
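  • The before/after comparison described above can be illustrated with a minimal sketch. It is not the software of the application; the function name `abundance_report`, the read labels, and the example counts are all hypothetical, and the sketch assumes that reads have already been classified as target or non-target and that the two samples were sequenced to comparable depth.

```python
from collections import Counter

def abundance_report(before_labels, after_labels, target_label="target"):
    """Compare target and non-target read counts before and after treatment.

    Assumption: each read has already been assigned a label, and both
    samples are comparable in depth (no normalization is done here).
    """
    def split(labels):
        counts = Counter(labels)
        target = counts.get(target_label, 0)
        return target, sum(counts.values()) - target

    target_before, non_target_before = split(before_labels)
    target_after, non_target_after = split(after_labels)
    return {
        "target_before": target_before,
        "target_after": target_after,
        "non_target_before": non_target_before,
        "non_target_after": non_target_after,
        # Ratio after/before; None when the "before" count is zero.
        "target_ratio": target_after / target_before if target_before else None,
        "non_target_ratio": (
            non_target_after / non_target_before if non_target_before else None
        ),
    }

# Hypothetical depletion experiment: target reads drop, other reads are stable.
before = ["target"] * 80 + ["other"] * 20
after = ["target"] * 8 + ["other"] * 19
report = abundance_report(before, after)
```

  • In this sketch a low target ratio with a non-target ratio near 1 would suggest effective depletion without off-target loss. A real pipeline would need depth normalization and a statistical test.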
  • The following examples are included for illustrative purposes and are not intended to limit the scope of the invention.
  • Examples
  • Example 1: Construction of a gRNA Library from a T7 Promoter Human DNA Library
  • T7 Promoter Library Construction
  • Human genomic DNA (400 ng) was fragmented using an S2 Covaris sonicator (Covaris) for 8 cycles, to yield fragments of 200-300 bp in length. Fragmented DNA was repaired using the NEBNext End Repair Module (NEB) and incubated at 25° C. for 30 min, then heat inactivated at 75° C. for 20 min. To make T7 promoter adapters, oligos T7-1 (5′GCCTCGAGC*T*A*ATACGACTCACTATAGAG3′, * denotes a phosphorothioate backbone linkage)(SEQ ID NO: 4397) and T7-2 (sequence 5′Phos-CTCTATAGTGAGTCGTATTA3′) (SEQ ID NO: 4398) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. T7 promoter blunt adapters (15 pmol total) were then added to the blunt-ended human genomic DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((2) in FIG. 1). Ligations were amplified with 2 μM oligo T7-1, using Hi-Fidelity 2× Master Mix (NEB) for 10 cycles of PCR (98° C. for 20 s, 63° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
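  • As an illustration of why oligos T7-1 and T7-2 form a blunt adapter when annealed, the following sketch checks their complementarity. The sequences are copied from the paragraph above with the phosphorothioate marks (*) removed; the helper names are hypothetical.

```python
# Translation table for the Watson-Crick complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

T7_1 = "GCCTCGAGCTAATACGACTCACTATAGAG"  # SEQ ID NO: 4397, '*' marks removed
T7_2 = "CTCTATAGTGAGTCGTATTA"           # SEQ ID NO: 4398

def duplex_start(long_oligo, short_oligo):
    """Index in long_oligo where short_oligo base-pairs, or -1 if none."""
    return long_oligo.upper().find(reverse_complement(short_oligo))

start = duplex_start(T7_1, T7_2)
overhang_5p = T7_1[:start]                    # single-stranded 5' tail of T7-1
blunt_end = start + len(T7_2) == len(T7_1)    # duplex runs to the 3' end of T7-1
```

  • Under these assumptions the duplex covers the 3′ end of T7-1, which is the blunt end available for ligation, while the 5′ portion of T7-1 stays single stranded.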
  • Digestion of DNA
  • PCR amplified T7 promoter DNA (2 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 10 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C. ((3) in FIG. 1), then heat inactivated at 75° C. for 20 min. An additional 10 μL of NEB buffer 2 with 1 μL of T7 Endonuclease I (NEB) was added to the reaction, and incubated at 37° C. for 20 min ((4) in FIG. 1). Enzymatic digestion of DNA was verified by agarose gel electrophoresis. Digested DNA was recovered by adding 0.6×AxyPrep beads (Axygen), according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • Ligation of Adapters and Removal of HGG
  • DNA was then blunted using T4 DNA Polymerase (NEB) for 20 min at 25° C., followed by heat inactivation at 75° C. for 20 min ((5) in FIG. 1).
  • To make MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4399) and MlyI-2 (sequence 5′>3′, TCACTATAGGGATCCGAGTCCC) (SEQ ID NO: 4400) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. MlyI adapters (15 pmol total) were then added to T4 DNA Polymerase-blunted DNA, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 1). Ligations were heat inactivated at 75° C. for 20 min, then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., so that HGG motifs are eliminated ((7) in FIG. 1). Digests were then cleaned using 0.8×AxyPrep beads (Axygen), and DNA was resuspended in 10 μL of 10 mM Tris-Cl pH 8.
  • To make StlgR adapters, oligos stlgR (sequence 5′>3′, 5′Phos-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTTGGATCCGATGC) (SEQ ID NO: 4401) and stlgRev (sequence 5′>3′, GGATCCAAAAAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC) (SEQ ID NO: 4402) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 60° C. StlgR adapters (5 pmol total) were added to HGG-removed DNA fragments, and incubated with Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((8) in FIG. 1). Ligations were then incubated with Hi-Fidelity 2× Master Mix (NEB), using 2 μM of both oligos T7-1 and gRU (sequence 5′>3′, AAAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4403), and amplified using 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • In Vitro Transcription
  • The T7/gRU amplified library of PCR products was then used as template for in vitro transcription, using the HiScribe T7 In Vitro Transcription Kit (NEB). 500-1000 ng of template was incubated overnight at 37° C. according to the manufacturer's instructions. To transcribe the guide libraries into gRNAs, the following in vitro transcription reaction mixture was assembled: 10 μL of purified library (˜500 ng), 6.5 μL of H2O, 2.25 μL of ATP, 2.25 μL of CTP, 2.25 μL of GTP, 2.25 μL of UTP, 2.25 μL of 10× reaction buffer (NEB) and 2.25 μL of T7 RNA Polymerase mix. The reaction was incubated at 37° C. for 24 hr, then purified using the RNA cleanup kit (Life Technologies), eluted with 100 μL of RNase-free water, quantified and stored at −20° C. until use.
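  • The reaction mixture listed above can be tabulated and scaled with a short sketch. The component volumes are taken from the paragraph above; the function `scale_mix` and its `overage` parameter are hypothetical conveniences for preparing a master mix and are not part of the described method.

```python
# Volumes in microliters for one in vitro transcription reaction,
# as listed in the text above.
IVT_MIX_UL = {
    "purified library (~500 ng)": 10.0,
    "H2O": 6.5,
    "ATP": 2.25,
    "CTP": 2.25,
    "GTP": 2.25,
    "UTP": 2.25,
    "10x reaction buffer": 2.25,
    "T7 RNA Polymerase mix": 2.25,
}

def scale_mix(mix, n_reactions, overage=0.0):
    """Scale every component for n_reactions, plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: volume * factor for name, volume in mix.items()}

total_volume_ul = sum(IVT_MIX_UL.values())   # volume of a single reaction
two_reactions = scale_mix(IVT_MIX_UL, 2)     # e.g. two reactions, no overage
```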
  • Example 2: Construction of gRNA Library from Intact Human Genomic DNA
  • Digestion of DNA
  • Human genomic DNA ((1) in FIG. 2; 20 μg total per digestion) was digested with 0.1 μL of Nt.CviPII (NEB) in 40 μL of NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl pH 7.9, 10 mM MgCl2, 100 μg/mL BSA) for 10 min at 37° C., then heat inactivated at 75° C. for 20 min. An additional 40 μL of NEB buffer 2 and 1 μL of T7 Endonuclease I (NEB) were added to the reaction, with 20 min incubation at 37° C. (e.g., (2) in FIG. 2). Fragmentation of genomic DNA was verified with a small aliquot by agarose gel electrophoresis. DNA fragments between 200 and 600 bp were recovered by adding 0.3×AxyPrep beads (Axygen), incubating at 25° C. for 5 min, capturing beads on a magnetic stand and transferring the supernatant to a new tube. DNA fragments below 600 bp do not bind to beads at this bead/DNA ratio and remain in the supernatant. 0.7×AxyPrep beads (Axygen) were then added to the supernatant (this will bind all DNA molecules longer than 200 bp) and allowed to bind for 5 min. Beads were captured on a magnetic stand, washed twice with 80% ethanol, and air dried. DNA was then resuspended in 15 μL of 10 mM Tris-HCl pH 8. DNA concentration was determined using a Qubit assay (Life Technologies).
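  • The two-step bead selection described above can be modeled as two successive length filters. This is an idealized sketch: real bead binding is gradual rather than a sharp cutoff, and the function name and example fragment lengths are hypothetical. Only the 600 bp and 200 bp thresholds come from the text.

```python
def double_sided_selection(lengths_bp, upper_bp=600, lower_bp=200):
    """Idealized model of the two bead steps described above.

    Step 1 (0.3x beads): fragments below upper_bp stay in the supernatant,
    which is kept; longer fragments bind the beads and are discarded.
    Step 2 (0.7x beads): fragments longer than lower_bp bind the beads
    and are kept; shorter fragments are washed away.
    """
    supernatant = [n for n in lengths_bp if n < upper_bp]
    return [n for n in supernatant if n > lower_bp]

# Hypothetical fragment lengths in base pairs.
fragments = [100, 250, 500, 800, 1500]
recovered = double_sided_selection(fragments)
```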
  • Ligation of Adapters
  • To make T7/MlyI adapters, oligos MlyI-1 (sequence 5′>3′, 5′Phos-GGGGGACTCGGATCCCTATAGTGATACAAAGACGATGACGACAAGCG) (SEQ ID NO: 4404) and T7-7 (sequence 5′>3′, GCCTCGAGC*T*A*ATACGACTCACTATAGGGATCCAAGTCCC, * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4405) were admixed at 15 μM, heated to 98° C. for 3 min then cooled slowly (0.1° C./min) to 30° C. The purified, Nt.CviPII/T7 Endonuclease I digested DNA (100 ng) was then ligated to 15 pmol of T7/MlyI adapters using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((3) in FIG. 2). Ligations were then amplified by 10 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s) using Hi-Fidelity 2× Master Mix (NEB), and 2 μM of both oligos T7-17 (GCCTCGAGC*T*A*ATACGACTCACTATAGGG * denotes a phosphorothioate backbone linkage) (SEQ ID NO: 4406) and Flag (sequence 5′>3′, CGCTTGTCGTCATCGTCTTTGTA) (SEQ ID NO: 4407). PCR amplification increases the yield of DNA and, given the nature of the Y-shaped adapters we used, always resulted in T7 promoter being added distal to the HGG site and MlyI site being added next to the HGG motif ((4) in FIG. 2).
  • PCR products were then digested with MlyI and XhoI (NEB) for 1 hr at 37° C., and heat inactivated at 75° C. for 20 min ((5) in FIG. 2). Following that, 5 pmol of adapter StlgR (in Example 1) was ligated using Blunt/TA Ligase Master Mix (NEB) at 25° C. for 30 min ((6) in FIG. 2). Ligations were then amplified by PCR using Hi-Fidelity 2× Master Mix (NEB), 2 μM of both oligos T7-7 and gRU (in Example 1) and 20 cycles of PCR (98° C. for 20 s, 60° C. for 20 s, 72° C. for 35 s). Amplification was verified by running a small aliquot on agarose gel electrophoresis. PCR amplified products were recovered using 0.6×AxyPrep beads (Axygen) according to the manufacturer's instructions, and resuspended in 15 μL of 10 mM Tris-HCl pH 8.
  • Samples were then used as templates for in vitro transcription reaction as described in Example 1.
  • Example 3: Direct Cutting with CviPII
  • 30 μg of human genomic DNA was digested with 2 units of Nt.CviPII (New England Biolabs) for 1 hour at 37° C., followed by heat inactivation at 75° C. for 20 minutes. The size of the fragments was verified to be 200-1,000 base pairs using a fragment analyzer instrument (Advanced Analytical). The 5′ or 3′ protruding ends (as shown, for example, in FIG. 3) were converted to blunt ends by adding 100 units of T4 DNA polymerase (New England Biolabs), 100 μM dNTPs and incubating at 12° C. for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer. The DNA was then ligated to MlyI adapter (see, for example, Example 4) or BaeI/EcoP15I adapters (see, for example, Example 5).
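  • The examples refer to CCD motifs and their complementary HGG motifs. The sketch below locates both on one strand of a sequence, using the IUPAC codes D (A, G or T) and H (A, C or T). The function name and the test sequence are hypothetical, and the sketch only finds motif positions; it does not model where the enzyme nicks.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
CCD_PATTERN = re.compile(r"(?=CC[AGT])")   # D = A, G or T
HGG_PATTERN = re.compile(r"(?=[ACT]GG)")   # H = A, C or T

def motif_positions(seq):
    """Return 0-based start positions of CCD and HGG motifs on the given strand.

    An HGG on this strand corresponds to a CCD on the opposite strand.
    """
    seq = seq.upper()
    return {
        "CCD": [m.start() for m in CCD_PATTERN.finditer(seq)],
        "HGG": [m.start() for m in HGG_PATTERN.finditer(seq)],
    }

# Hypothetical test sequence.
sites = motif_positions("AACCATTTGGCCGTAGG")
```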
  • Example 4: Use of MlyI Adapter
  • Adapter MlyI was made by combining 2 μmoles of MlyI-Ad1 and MlyI-Ad2 in 40 μL water. Adapter BsaXI/MmeI was made by combining 2 μmoles oligo BsMm-Ad1 and 2 μmoles oligo BsMm-Ad2 in 40 μL water. T7 adapter was made by combining 1.5 μmoles of T7-Ad1 and T7-Ad2 oligos in 100 μL water. Stem-loop adapter was made by combining 1.5 μmoles of gR-top and gR-bot oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.
  • TABLE 5
    Oligonucleotides used with MlyI Adapter.
    SEQ ID NO   Oligo name   Sequence (5′>3′)   Modification
    4408   MlyI-Ad1   gagatcagcttctgcattgatgccagcagcccgagtcag   none
    4409   MlyI-Ad2   ctgactcgggctgctgtacaaagacgatgacgacaagcgtta   5′phosphate
    4410   BsMm-Ad1   gagatcagcttctgcattgatgcGGAGCCGCAGTACACTATCCAAC   none
    4411   BsMm-Ad2   GTTGGATAGTGTACTGCGGCTCCtacaaagacgatgacgacaagcg   5′phosphate
    4412   T7-Ad1   gcctcgagctaatacgactcactatagagNN   none
    4398   T7-Ad2   Ctctatagtgagtcgtatta   5′phosphate
    4413   gR-top   ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt   5′phosphate
    4414   gR-bot   aaaaaagcaccgactcggtgccactttttcaagttgataacggactagccttattttaacttgctatttctagctctaaaac   none
  • The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter MlyI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). These steps eliminate small (<100 nucleotides) DNA and MlyI adapter dimers.
  • Purified DNA was then digested by adding 20 units of MlyI (New England Biolabs) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was recovered from the digest by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 30 μL buffer 4.
  • The purified DNA was then ligated to 50 pmoles of adapter BsaXI/MmeI, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). DNA was then digested by addition of 20 units MmeI (New England Biolabs) and 40 pmol/μL SAM (S-adenosyl methionine) at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7 adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4, then digested with 20 units of BsaXI for 1 hour at 37° C. The guide RNA stem-loop sequences were added by adding 15 pmoles stem-loop adapter and using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 min. DNA was then recovered using a PCR cleanup kit (Zymo), eluted in 20 μL elution buffer and PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad1 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate guide RNAs.
  • Example 5: Use of BaeI/EcoP15I Adapter
  • Adapter BaeI/EcoP15I was made by combining 2 μmoles of BE Ad1 and BE Ad2 in 40 μL water. T7-E adapter was made by combining 1.5 μmoles of T7-Ad3 and T7-Ad4 oligos in 100 μL water. In all cases, after mixing, adapters were heated to 98° C. for 3 min then cooled to room temperature at a cooling rate of 1° C./min in a thermal cycler.
  • TABLE 6
    Oligonucleotides used with BaeI/EcoP15I Adapter.
    SEQ ID NO   Oligo name   Sequence (5′>3′)   Modification
    4415   BE Ad1   ActgctgacACAAgtatcTTTTTTTTTTgtttaaacTTTTTTTTTTgatacACAAgtcagcagA   5′phosphate
    4416   BE Ad2   TagctgacTTGTgtatcAAAAAAAAAAgtttaaacAAAAAAAAAAgatacTTGTgtcagcagT   5′phosphate
    12   T7-Ad3   gcctcgagctaatacgactcactatagag   none
    4417   T7-Ad4   NNctctatagtgagtcgtatta   5′phosphate
    4418   stlgR   ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccgagtcggtgctttttt   5′adenylation
  • The DNA containing the CCD blunt ends (from earlier section) was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.
  • DNA was then digested by addition of 20 units EcoP15I (New England Biolabs) and 1 mM ATP at 37° C. for 1 hour, followed by heat inactivation at 75° C. for 20 minutes. DNA was then ligated to 30 pmoles T7-E adapter using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL buffer 4.
  • Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.
  • Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.
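  • To illustrate why gRU can serve as the reverse primer after stlgR ligation, the sketch below tests whether the reverse complement of gRU matches the 3′ end of the stlgR oligo. Both sequences are copied from this example (SEQ ID NOs: 4418 and 4419); the helper and variable names are hypothetical, and the check is sequence identity only, not a prediction of annealing behavior.

```python
# Translation table for the Watson-Crick complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

STLGR = (
    "ttagagctagaaatagcaagttaaaataaggctagtccgttatcaacttgaaaaagtggcaccg"
    "agtcggtgctttttt"
)  # SEQ ID NO: 4418
GRU = "AAAAAAGCACCGACTCGGTG"  # SEQ ID NO: 4419

# The primer binds the strand shown above where that strand reads as the
# reverse complement of the primer.
primer_site = reverse_complement(GRU)
anneals_at_3p_end = STLGR.upper().endswith(primer_site)
```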
  • Example 6: NEMDA Method
  • NEMDA (Nicking Endonuclease Mediated DNA Amplification) was performed using 50 ng of human genomic DNA. The DNA was incubated in 100 μL thermo polymerase buffer (20 mM Tris-HCl, 10 mM (NH4)2SO4, 10 mM KCl, 6 mM MgSO4, 0.1% Triton® X-100, pH 8.8) supplemented with 0.3 mM dNTPs, 40 units of Bst large fragment DNA polymerase, and 0.1 units of NtCviPII (New England Biolabs) at 55° C. for 45 min, followed by 65° C. for 30 min and finally 80° C. for 20 min in a thermal cycler.
  • The DNA was then diluted with 300 μL of buffer 4 supplemented with 200 pmoles of T7-RND8 oligo (sequence 5′>3′ gcctcgagctaatacgactcactatagagnnnnnnnn) (SEQ ID NO: 4420) and boiled at 98° C. for 10 min followed by rapid cooling to 10° C. for 5 min. The reaction was then supplemented with 40 units of E. coli DNA polymerase I and 0.1 mM dNTPs (New England Biolabs) and incubated at room temperature for 20 min followed by heat inactivation at 75° C. for 20 min. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 30 μL elution buffer.
  • DNA was then ligated to 50 pmoles of adapter BaeI/EcoP15I, using the blunt/TA ligation master mix (New England Biolabs) at room temperature for 30 minutes. The DNA was then recovered by incubating with 0.6× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4 (50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 100 μg/mL BSA, pH 7.9). Recovered DNA was then digested with 20 units PmeI for 30 min at 37° C.; DNA was then recovered by incubating with 1.2× Kapa SPRI beads (Kapa Biosystems) for 5 minutes, capturing the beads with a magnetic rack, washing twice with 80% ethanol, air drying the beads for 5 minutes and finally resuspending the DNA in 50 μL buffer 4. These steps eliminate small (<100 nucleotides) DNA and BaeI/EcoP15I adapter multimers.
  • Purified DNA was then digested by adding 20 units of BaeI (New England Biolabs), 40 pmol/μL SAM (S-adenosyl methionine) and incubating at 37° C. for 1 hour to eliminate both the adapter derived sequences and the CCD (and complementary HGG) motifs. DNA was then recovered using a PCR cleanup kit (Zymo) and eluted in 20 μL elution buffer.
  • Recovered DNA was then ligated to the stlgR oligo using Thermostable 5′ AppDNA/RNA Ligase (New England Biolabs) by adding 20 units ligase, 20 pmol stlgR oligo, in 20 μL ss ligation buffer (10 mM Bis-Tris-Propane-HCl, 10 mM MgCl2, 1 mM DTT, 2.5 mM MnCl2, pH 7 @ 25° C.) and incubating at 65° C. for 1 hour followed by heat inactivation at 90° C. for 5 min. DNA product was then PCR amplified using HiFidelity 2× master mix (New England Biolabs). Primers T7-Ad3 (sequence 5′>3′ gcctcgagctaatacgactcactatagag) (SEQ ID NO: 12) and gRU (sequence 5′>3′ AAAAAAGCACCGACTCGGTG) (SEQ ID NO: 4419) were used to amplify with the following settings (98° C. for 3 min; 98° C. for 20 sec, 60° C. for 30 secs, 72° C. for 20 sec, 30 cycles). The PCR amplicon was cleaned up using the PCR cleanup kit and verified by DNA sequencing, then used as template for an in vitro transcription reaction to generate the guide RNAs.

Claims (38)

1. A collection of nucleic acids, the nucleic acids in the collection comprising:
a second segment encoding a targeting sequence; and
a third segment encoding a nucleic acid-guided nuclease system protein-binding sequence, wherein the collection of nucleic acids comprises at least 10^5 unique nucleic acid molecules.
2. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is a CRISPR/Cas system protein.
3. The collection of claim 1, wherein the size of the second segment varies from 15-250 bp across the collection of nucleic acids.
4. The collection of claim 1, wherein at least 10% of the second segments in the collection are greater than 21 bp.
5. The collection of claim 1, wherein the size of the second segment is not 20 bp and is not 21 bp.
6. (canceled)
7. The collection of claim 1, wherein the collection of nucleic acids is a collection of DNA.
8. The collection of claim 7, wherein the second segment is single stranded DNA.
9. The collection of claim 7, wherein the third segment is single stranded DNA.
10. The collection of claim 7, wherein the third segment is double stranded DNA.
11. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region is a region capable of binding a transcription factor.
12. The collection of claim 1, further comprising a first segment comprising a regulatory region, wherein the regulatory region comprises a promoter.
13. The collection of claim 13, wherein the promoter is selected from the group consisting of T7, SP6, and T3.
14. The collection of claim 1, wherein the targeting sequence is directed at a mammalian genome, eukaryotic genome, prokaryotic genome, or a viral genome.
15. The collection of claim 1, wherein the targeting sequence is directed at repetitive or abundant DNA.
16. The collection of claim 1, wherein the targeting sequence is directed at mitochondrial DNA, ribosomal DNA, Alu DNA, centromeric DNA, SINE DNA, LINE DNA, or STR DNA.
17. The collection of claim 1, wherein the sequence of the second segments is selected from Table 3 and/or Table 4.
18. (canceled)
19. The collection of claim 1, wherein the targeting sequence is at least 80% complementary to the strand opposite to a sequence of nucleotides 5′ to a PAM sequence.
20. The collection of claim 1, wherein the collection comprises targeting sequences directed to sequences of interest spaced about every 10,000 bp or less across the genome of an organism.
21. The collection of claim 20, wherein the PAM sequence is AGG, CGG, or TGG.
22. The collection of claim 20, wherein the PAM sequence is specific for a CRISPR/Cas system protein selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, and Cm5.
23. The collection of claim 1, wherein the third segment comprises DNA encoding a gRNA stem-loop sequence.
24. The collection of claim 1, wherein the sequence of the third segment encodes for a RNA comprising the sequence GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU (SEQ ID NO: 1) or encodes for a RNA comprising the sequence GUUUUAGAGCUAUGCUGGAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUC (SEQ ID NO: 2).
25. The collection of claim 1, wherein the sequence of the third segment encodes for a crRNA and a tracrRNA.
26. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from a bacterial species.
27. The collection of claim 1, wherein the nucleic acid-guided nuclease system protein is from an archaea species.
28. The collection of claim 2, wherein the CRISPR/Cas system protein is a Type I, Type II, or Type III protein.
29. The collection of claim 2, wherein the CRISPR/Cas system protein is selected from the group consisting of Cas9, Cpf1, Cas3, Cas8a-c, Cas10, Cse1, Csy1, Csn2, Cas4, Csm2, Cm5, dCas9 and cas9 nickase.
30. The collection of claim 1, wherein the third segment comprises DNA encoding a Cas9-binding sequence.
31. The collection of claim 1, wherein a plurality of third segments of the collection encode for a first nucleic acid-guided nuclease system protein binding sequence, and a plurality of the third segments of the collection encode for a second nucleic acid-guided nuclease system protein binding sequence.
32. The collection of claim 1, wherein the third segments of the collection encode for a plurality of different binding sequences of a plurality of different nucleic acid-guided nuclease system proteins.
33.-192. (canceled)
193. A kit comprising the collection of nucleic acids of claim 1.
194.-235. (canceled)
236. The collection of claim 1, wherein at least 10% of the nucleic acids in the collection vary in size.
237. The collection of claim 1, further comprising a first segment comprising a regulatory region.
238. A collection of guide RNAs generated by transcribing the collection of claim 1.
US16/995,761 2015-12-07 2020-08-17 Methods and compositions for the making and using of guide nucleic acids Pending US20210207130A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/995,761 US20210207130A1 (en) 2015-12-07 2020-08-17 Methods and compositions for the making and using of guide nucleic acids

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201562264262P 2015-12-07 2015-12-07
US201662298963P 2016-02-23 2016-02-23
PCT/US2016/065420 WO2017100343A1 (en) 2015-12-07 2016-12-07 Methods and compositions for the making and using of guide nucleic acids
US201815742862A 2018-01-08 2018-01-08
US16/995,761 US20210207130A1 (en) 2015-12-07 2020-08-17 Methods and compositions for the making and using of guide nucleic acids

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2016/065420 Continuation WO2017100343A1 (en) 2015-12-07 2016-12-07 Methods and compositions for the making and using of guide nucleic acids
US15/742,862 Continuation US10787662B2 (en) 2015-12-07 2016-12-07 Methods and compositions for the making and using of guide nucleic acids

Publications (1)

Publication Number Publication Date
US20210207130A1 (en) 2021-07-08

Family

ID=59013518

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/742,862 Active 2037-02-18 US10787662B2 (en) 2015-12-07 2016-12-07 Methods and compositions for the making and using of guide nucleic acids
US16/995,761 Pending US20210207130A1 (en) 2015-12-07 2020-08-17 Methods and compositions for the making and using of guide nucleic acids

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/742,862 Active 2037-02-18 US10787662B2 (en) 2015-12-07 2016-12-07 Methods and compositions for the making and using of guide nucleic acids

Country Status (8)

Country Link
US (2) US10787662B2 (en)
EP (2) EP3386550B1 (en)
JP (3) JP6995751B2 (en)
CN (1) CN109310784B (en)
AU (1) AU2016365720B2 (en)
CA (1) CA3006781A1 (en)
DK (1) DK3386550T3 (en)
WO (1) WO2017100343A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
PL3105328T3 (en) 2014-02-11 2020-10-19 The Regents Of The University Of Colorado, A Body Corporate Crispr enabled multiplexed genome engineering
CA3006781A1 (en) 2015-12-07 2017-06-15 Arc Bio, Llc Methods and compositions for the making and using of guide nucleic acids
LT3474669T (en) 2016-06-24 2022-06-10 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries
US20200190508A1 (en) * 2017-06-07 2020-06-18 Arc Bio, Llc Creation and use of guide nucleic acids
IT201700067084A1 (en) * 2017-06-16 2018-12-16 Univ Degli Studi Di Palermo In vitro method for enrichment in target genomic sequences by CRISPR-Cas system
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
US20210130817A1 (en) * 2017-07-14 2021-05-06 Cure Genetics Co., Ltd. Gene Editing System and Gene Editing Method
AU2018320864B2 (en) 2017-08-22 2024-02-22 Napigen, Inc. Organelle genome modification using polynucleotide guided endonuclease
KR20210045360A (en) 2018-05-16 2021-04-26 신테고 코포레이션 Methods and systems for guide RNA design and use
US20210198660A1 (en) 2018-06-07 2021-07-01 Arc Bio, Llc Compositions and methods for making guide nucleic acids
KR102200109B1 (en) * 2018-09-12 2021-01-08 한국과학기술정보연구원 Nucleotide for amplification of target nucleic acid sequence and amplification method using the same
EP4296373A3 (en) 2018-10-04 2024-01-24 Arc Bio, LLC Normalization controls for managing low sample inputs in next generation sequencing
JP7389135B2 (en) 2019-03-18 2023-11-29 リジェネロン・ファーマシューティカルズ・インコーポレイテッド CRISPR/CAS dropout screening platform to reveal genetic vulnerabilities associated with tau aggregation
JP7461368B2 (en) 2019-03-18 2024-04-03 リジェネロン・ファーマシューティカルズ・インコーポレイテッド CRISPR/CAS Screening Platform to Identify Genetic Modifiers of Tau Seeding or Aggregation
CA3136228A1 (en) * 2019-04-09 2020-10-15 Arc Bio, Llc Compositions and methods for nucleotide modification-based depletion
AU2020290509A1 (en) 2019-06-14 2021-11-11 Regeneron Pharmaceuticals, Inc. Models of tauopathy
CN114293264A (en) * 2021-12-21 2022-04-08 翌圣生物科技(上海)股份有限公司 Preparation method of enzyme method target sequence random sgRNA library
CN114277447A (en) * 2021-12-21 2022-04-05 翌圣生物科技(上海)股份有限公司 Preparation method of target sequence random sgRNA full-coverage group

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015122967A1 (en) * 2014-02-13 2015-08-20 Clontech Laboratories, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
US20150353927A1 (en) * 2013-01-10 2015-12-10 Ge Healthcare Dharmacon, Inc. Templates, Libraries, Kits and Methods for Generating Molecules
WO2016100955A2 (en) * 2014-12-20 2016-06-23 Identifygenomics, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
US20180002736A1 (en) * 2015-01-28 2018-01-04 The Regents Of The University Of California Methods and compositions for labeling a single-stranded target nucleic acid

Family Cites Families (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8476011B1 (en) * 1996-06-18 2013-07-02 Bp Corporation North America Inc. Construction and use of catalogued nucleic acid libraries that contain advantageously adjusted representations of defined components
US5948902A (en) 1997-11-20 1999-09-07 South Alabama Medical Science Foundation Antisense oligonucleotides to human serine/threonine protein phosphatase genes
US20050233340A1 (en) 2004-04-20 2005-10-20 Barrett Michael T Methods and compositions for assessing CpG methylation
US8323930B2 (en) 2007-07-28 2012-12-04 Dna Twopointo, Inc. Methods, compositions and kits for one-step DNA cloning using DNA topoisomerase
WO2009133466A2 (en) 2008-04-30 2009-11-05 Population Genetics Technologies Ltd. Asymmetric adapter library construction
CN102301009B (en) 2009-02-03 2015-09-30 新英格兰生物实验室公司 Enzyme is used to produce random ds breakage in DNA
JP5780971B2 (en) 2009-03-09 2015-09-16 ニユー・イングランド・バイオレイブス・インコーポレイテツド Random double-strand breaks in DNA using enzymes
WO2012051327A2 (en) 2010-10-12 2012-04-19 Cornell University Method of dual-adapter recombination for efficient concatenation of multiple dna fragments in shuffled or specified arrangements
EP2500436B1 (en) 2011-03-17 2016-05-25 Institut Pasteur Method, probe and kit for DNA in situ hybridation and use thereof
DE202013012242U1 (en) 2012-05-25 2016-02-02 Emmanuelle Charpentier Compositions for RNA-directed modification of a target DNA and for RNA-driven modulation of transcription
RU2014153918A (en) * 2012-06-12 2016-07-27 Дженентек, Инк. METHODS AND COMPOSITIONS FOR OBTAINING CONDITIONALLY KO-KOLLUTE ALLEYS
KR102146721B1 (en) 2012-07-13 2020-08-21 엑스-켐, 인크. Dna-encoded libraries having encoding oligonucleotide linkages not readable by polymerases
CN103233028B (en) * 2013-01-25 2015-05-13 南京徇齐生物技术有限公司 Specie limitation-free eucaryote gene targeting method having no bio-safety influence and helical-structure DNA sequence
US9234213B2 (en) * 2013-03-15 2016-01-12 System Biosciences, Llc Compositions and methods directed to CRISPR/Cas genomic engineering systems
US10435740B2 (en) 2013-04-01 2019-10-08 University Of Florida Research Foundation, Incorporated Determination of methylation state and chromatin structure of target genetic loci
US9873907B2 (en) 2013-05-29 2018-01-23 Agilent Technologies, Inc. Method for fragmenting genomic DNA using CAS9
WO2014196863A1 (en) 2013-06-07 2014-12-11 Keygene N.V. Method for targeted sequencing
KR102291045B1 (en) 2013-08-05 2021-08-19 트위스트 바이오사이언스 코포레이션 De novo synthesized gene libraries
EP3611268A1 (en) * 2013-08-22 2020-02-19 E. I. du Pont de Nemours and Company Plant genome modification using guide rna/cas endonuclease systems and methods of use
WO2015048690A1 (en) * 2013-09-27 2015-04-02 The Regents Of The University Of California Optimized small guide rnas and methods of use
EP3052651B1 (en) 2013-10-01 2019-11-27 Life Technologies Corporation Systems and methods for detecting structural variants
WO2015050501A1 (en) * 2013-10-03 2015-04-09 Agency For Science, Technology And Research Amplification paralleled library enrichment
EP3080259B1 (en) * 2013-12-12 2023-02-01 The Broad Institute, Inc. Engineering of systems, methods and optimized guide compositions with new architectures for sequence manipulation
CA3194412A1 (en) * 2014-02-27 2015-09-03 Monsanto Technology Llc Compositions and methods for site directed genomic modification
US20170088819A1 (en) 2014-05-16 2017-03-30 Vrije Universiteit Brussel Genetic correction of myotonic dystrophy type 1
US20160053304A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Depleting Target Sequences Using CRISPR
US10435685B2 (en) * 2014-08-19 2019-10-08 Pacific Biosciences Of California, Inc. Compositions and methods for enrichment of nucleic acids
LT3250691T (en) 2015-01-28 2023-09-11 Caribou Biosciences, Inc. Crispr hybrid dna/rna polynucleotides and methods of use
WO2016133764A1 (en) 2015-02-17 2016-08-25 Complete Genomics, Inc. Dna sequencing using controlled strand displacement
US20160362680A1 (en) 2015-05-15 2016-12-15 Nugen Technologies, Inc. Compositions and methods for negative selection of non-desired nucleic acid sequences
WO2016196805A1 (en) 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
EP3337898B1 (en) 2015-08-19 2021-07-28 Arc Bio, LLC Capture of nucleic acids using a nucleic acid-guided nuclease-based system
KR20180054871A (en) 2015-10-08 2018-05-24 프레지던트 앤드 펠로우즈 오브 하바드 칼리지 Multiplexed genome editing
EP3374507A1 (en) 2015-11-09 2018-09-19 IFOM Fondazione Istituto Firc di Oncologia Molecolare Crispr-cas sgrna library
CA3006781A1 (en) 2015-12-07 2017-06-15 Arc Bio, Llc Methods and compositions for the making and using of guide nucleic acids
JP2019506875A (en) 2016-02-23 2019-03-14 アーク バイオ, エルエルシー Methods and compositions for target detection
US20200190508A1 (en) 2017-06-07 2020-06-18 Arc Bio, Llc Creation and use of guide nucleic acids

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150353927A1 (en) * 2013-01-10 2015-12-10 Ge Healthcare Dharmacon, Inc. Templates, Libraries, Kits and Methods for Generating Molecules
WO2015122967A1 (en) * 2014-02-13 2015-08-20 Clontech Laboratories, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
WO2016100955A2 (en) * 2014-12-20 2016-06-23 Identifygenomics, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
US20180298421A1 (en) * 2014-12-20 2018-10-18 Identifygenomics, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
US20180002736A1 (en) * 2015-01-28 2018-01-04 The Regents Of The University Of California Methods and compositions for labeling a single-stranded target nucleic acid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Koike-Yusa et al. in "Genome-wide recessive genetics screening in mammalian cells with a lentiviral CRISPR-guide RNA library" (Nature Biotechnology; Vol 32, pages 267-273; published December 23, 2013). (Year: 2013) *
Schmidt et al in "Synthesis of an arrayed sgRNA library targeting the human genome" (Scientific Reports 5:14987; pages 1-10; published October 8, 2015). (Year: 2015) *

Also Published As

Publication number Publication date
JP2024051149A (en) 2024-04-10
CA3006781A1 (en) 2017-06-15
JP2021141904A (en) 2021-09-24
US10787662B2 (en) 2020-09-29
WO2017100343A1 (en) 2017-06-15
CN109310784B (en) 2022-08-19
EP3386550A1 (en) 2018-10-17
EP3386550B1 (en) 2021-01-20
CN109310784A (en) 2019-02-05
AU2016365720A1 (en) 2018-06-14
EP3386550A4 (en) 2019-09-18
JP2018535689A (en) 2018-12-06
US20190270984A1 (en) 2019-09-05
AU2016365720B2 (en) 2020-11-26
EP3871695A1 (en) 2021-09-01
DK3386550T3 (en) 2021-04-26
JP6995751B2 (en) 2022-02-04

Similar Documents

Publication Publication Date Title
US20210207130A1 (en) Methods and compositions for the making and using of guide nucleic acids
US9567604B2 (en) Using truncated guide RNAs (tru-gRNAs) to increase specificity for RNA-guided genome editing
US10774365B2 (en) Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using CRISPR/Cas system proteins
AU2020262371A1 (en) Nucleic acid constructs and methods for their manufacture
US5807718A (en) Enzymatic DNA molecules
JP7282692B2 (en) Preparation and Use of Guide Nucleic Acids
CA2975855A1 (en) Compositions and methods for synthetic gene assembly
US20150010953A1 (en) Method for producing a population of oligonucleotides that has reduced synthesis errors
JP2016013127A (en) Template-independent ligation of single-stranded dna
CN107760772B (en) Methods, compositions, systems, instruments, and kits for nucleic acid paired-end sequencing
CA2584984A1 (en) Methods for assembly of high fidelity synthetic polynucleotides
CN118086471A (en) Methods, compositions, systems, instruments and kits for nucleic acid amplification
US20210198660A1 (en) Compositions and methods for making guide nucleic acids
US20230220434A1 (en) Compositions and methods for crispr enabled dna synthesis
US20080213841A1 (en) Novel Method for Assembling DNA Metasegments to use as Substrates for Homologous Recombination in a Cell
US20240167020A1 (en) Analyzing expression of protein-coding variants in cells
CA3128755C (en) Compositions and methods for treating hemoglobinopathies
TW202342069A (en) Modified crispr-based gene editing system and methods of use
Joyce et al. Enzymatic DNA molecules

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

AS Assignment

Owner name: ARC BIO, LLC, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GOURGUECHON, STEPHANE B.;REEL/FRAME:063988/0540

Effective date: 20171228

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCV Information on status: appeal procedure

Free format text: NOTICE OF APPEAL FILED