US20220213458A1 - Mad nucleases - Google Patents

Mad nucleases

Publication number
US20220213458A1
Authority
US
United States
Prior art keywords
seq
nuclease
sequence seq
nucleic acid
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/691,018
Inventor
Juhan Kim
Benjamin Mijts
Aamir Mir
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inscripta Inc
Original Assignee
Inscripta Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inscripta Inc
Priority to US17/691,018
Assigned to INSCRIPTA, INC. Assignment of assignors interest (see document for details). Assignors: KIM, JUHAN; MIJTS, BENJAMIN; MIR, AAMIR
Publication of US20220213458A1
Current legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y301/00 Hydrolases acting on ester bonds (3.1)
    • C12Y301/21 Endodeoxyribonucleases producing 5'-phosphomonoesters (3.1.21)
    • C12Y301/21004 Type II site-specific deoxyribonuclease (3.1.21.4)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2320/00 Applications; Uses
    • C12N2320/50 Methods for regulating/modulating their activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites

Definitions

  • the present disclosure provides new RNA-guided nuclease systems and engineered nickases for making rational, direct edits to nucleic acids in live cells.
  • nucleases that allow manipulation of gene sequence and, hence, gene function.
  • These nucleases include nucleic acid-guided nucleases.
  • the range of target sequences that nucleic acid-guided nucleases can recognize, however, is constrained by the need for a specific PAM to be located near the desired target sequence.
  • PAMs are short nucleotide sequences recognized by a gRNA/nuclease complex where this complex directs editing of the target sequence.
  • PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence.
  • Engineering nucleic acid-guided nucleases or mining for new nucleic acid-guided nucleases may provide nucleases with altered PAM preferences and/or altered activity or fidelity; all changes that may increase the versatility of a nucleic acid-guided nuclease for certain editing tasks.
  • Type II MAD nucleases (e.g., RNA-guided nucleases or RGNs) with varied PAM preferences and/or varied activity in mammalian cells.
  • MAD nuclease systems that perform nucleic acid-guided nuclease editing, including:
  • a MAD2015 system comprising SEQ ID Nos. 1 (MAD2015 nuclease), 2 (CRISPR RNA) and 3 (trans-activating CRISPR RNA);
  • a MAD2016 system comprising SEQ ID Nos. 4 (MAD2016 nuclease), 5 (CRISPR RNA) and 6 (trans-activating CRISPR RNA);
  • a MAD2017 system comprising SEQ ID Nos. 7 (MAD2017 nuclease), 8 (CRISPR RNA) and 9 (trans-activating CRISPR RNA);
  • a MAD2019 system comprising SEQ ID Nos. 10 (MAD2019 nuclease), 11 (CRISPR RNA) and 12 (trans-activating CRISPR RNA);
  • a MAD2020 system comprising SEQ ID Nos. 13 (MAD2020 nuclease), 14 (CRISPR RNA) and 15 (trans-activating CRISPR RNA);
  • a MAD2021 system comprising SEQ ID Nos. 16 (MAD2021 nuclease), 17 (CRISPR RNA) and 18 (trans-activating CRISPR RNA);
  • a MAD2022 system comprising SEQ ID Nos. 19 (MAD2022 nuclease), 20 (CRISPR RNA) and 21 (trans-activating CRISPR RNA);
  • a MAD2023 system comprising SEQ ID Nos. 22 (MAD2023 nuclease), 23 (CRISPR RNA) and 24 (trans-activating CRISPR RNA);
  • a MAD2024 system comprising SEQ ID Nos. 25 (MAD2024 nuclease), 26 (CRISPR RNA) and 27 (trans-activating CRISPR RNA);
  • a MAD2025 system comprising SEQ ID Nos. 28 (MAD2025 nuclease), 29 (CRISPR RNA) and 30 (trans-activating CRISPR RNA);
  • a MAD2026 system comprising SEQ ID Nos. 31 (MAD2026 nuclease), 32 (CRISPR RNA) and 33 (trans-activating CRISPR RNA);
  • a MAD2027 system comprising SEQ ID Nos. 34 (MAD2027 nuclease), 35 (CRISPR RNA) and 36 (trans-activating CRISPR RNA);
  • a MAD2028 system comprising SEQ ID Nos. 37 (MAD2028 nuclease), 38 (CRISPR RNA) and 39 (trans-activating CRISPR RNA);
  • a MAD2029 system comprising SEQ ID Nos. 40 (MAD2029 nuclease), 41 (CRISPR RNA) and 42 (trans-activating CRISPR RNA);
  • a MAD2030 system comprising SEQ ID Nos. 43 (MAD2030 nuclease), 44 (CRISPR RNA) and 45 (trans-activating CRISPR RNA);
  • a MAD2031 system comprising SEQ ID Nos. 46 (MAD2031 nuclease), 47 (CRISPR RNA) and 48 (trans-activating CRISPR RNA);
  • a MAD2032 system comprising SEQ ID Nos. 49 (MAD2032 nuclease), 50 (CRISPR RNA) and 51 (trans-activating CRISPR RNA);
  • a MAD2033 system comprising SEQ ID Nos. 52 (MAD2033 nuclease), 53 (CRISPR RNA) and 54 (trans-activating CRISPR RNA);
  • a MAD2034 system comprising SEQ ID Nos. 55 (MAD2034 nuclease), 56 (CRISPR RNA) and 57 (trans-activating CRISPR RNA);
  • a MAD2035 system comprising SEQ ID Nos. 58 (MAD2035 nuclease), 59 (CRISPR RNA) and 60 (trans-activating CRISPR RNA);
  • a MAD2036 system comprising SEQ ID Nos. 61 (MAD2036 nuclease), 62 (CRISPR RNA) and 63 (trans-activating CRISPR RNA);
  • a MAD2037 system comprising SEQ ID Nos. 64 (MAD2037 nuclease), 65 (CRISPR RNA) and 66 (trans-activating CRISPR RNA);
  • a MAD2038 system comprising SEQ ID Nos. 67 (MAD2038 nuclease), 68 (CRISPR RNA) and 69 (trans-activating CRISPR RNA);
  • a MAD2039 system comprising SEQ ID Nos. 70 (MAD2039 nuclease), 71 (CRISPR RNA) and 72 (trans-activating CRISPR RNA); and
  • a MAD2040 system comprising SEQ ID Nos. 73 (MAD2040 nuclease), 74 (CRISPR RNA) and 75 (trans-activating CRISPR RNA).
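The SEQ ID numbering in the list of systems above appears to follow a regular pattern: each system takes three consecutive SEQ ID Nos. (nuclease, CRISPR RNA, trans-activating CRISPR RNA), in the order listed, and no MAD2018 system appears. A minimal Python sketch of that pattern follows. It assumes the pattern holds for every listed system, including the entries where the extracted text is incomplete; the names `SYSTEMS` and `seq_ids` are illustrative and not part of the disclosure.

```python
# System names in the order they are listed; MAD2018 is not in the list.
SYSTEMS = ["MAD2015", "MAD2016", "MAD2017"] + [f"MAD{n}" for n in range(2019, 2041)]

def seq_ids(system: str) -> dict:
    """Return the assumed SEQ ID Nos. for the three components of one system."""
    first = 3 * SYSTEMS.index(system) + 1  # three consecutive IDs per system
    return {"nuclease": first, "crRNA": first + 1, "tracrRNA": first + 2}

print(seq_ids("MAD2016"))  # {'nuclease': 4, 'crRNA': 5, 'tracrRNA': 6}
```

A lookup of this kind is only a convenience for cross-referencing; the sequence listing itself remains the authority.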
  • the MAD system components are delivered as sequences to be transcribed (in the case of the gRNA components) and transcribed and translated (in the case of the MAD nuclease), and in some aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on the same vector. In other aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on different vectors, and in some aspects, the gRNA component sequences are located in an editing cassette which also comprises a donor DNA (e.g., homology arm). In other aspects, the MAD nuclease is delivered to the cells as a peptide, or the MAD nuclease and gRNA components are delivered to the cells as a ribonucleoprotein complex.
  • engineered nickases derived from the nucleases from the above-referenced systems, including MAD2016-H851A (SEQ ID NO: 178); MAD2016-N874A (SEQ ID NO: 179); MAD2032-H590A (SEQ ID NO: 180); MAD2039-H587A (SEQ ID NO: 181); MAD2039-N610A (SEQ ID NO: 182).
  • FIG. 1 is an exemplary workflow for creating and screening mined MAD nucleases or RGNs.
  • FIG. 2 is a simplified depiction of an in vitro test conducted on candidate enzymes.
  • FIG. 3 is a list of novel Type II MADzymes that have been identified.
  • FIG. 4 is a map of Type II MADzymes in cluster 59.
  • FIG. 5 is a map of Type II MADzymes in clusters 55, 56, 57 and 58.
  • FIG. 6 is a map of Type II MADzymes in cluster 141.
  • FIG. 7 is a reproduction of a gel showing nicked plasmid formation with different MADzyme nickases compared to corresponding MADzyme nucleases.
  • Nuclease-specific techniques can be found in, e.g., Genome Editing and Engineering From TALENs and CRISPRs to Molecular Surgery, Appasani and Church, 2018; and CRISPR: Methods and Protocols, Lindgren and Charpentier, 2015; both of which are herein incorporated in their entirety by reference for all purposes.
  • Basic methods for enzyme engineering may be found in Enzyme Engineering Methods and Protocols, Samuelson, ed., 2013; Protein Engineering, Kaumaya, ed., (2012); and Kaur and Sharma, “Directed Evolution: An Approach to Engineer Enzymes”, Crit. Rev. Biotechnology, 26:165-69 (2006).
  • reference to “an oligonucleotide” refers to one or more oligonucleotides.
  • Terms such as “first,” “second,” “third,” etc., merely identify one of a number of portions, components, steps, operations, functions, and/or points of reference as disclosed herein, and likewise do not necessarily limit embodiments of the present disclosure to any particular configuration or orientation.
  • the term “complementary” refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds.
  • a nucleic acid includes a nucleotide sequence described as having a “percent complementarity” or “percent homology” to a specified second nucleotide sequence.
  • a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence.
  • the nucleotide sequence 3′-TCGA-5′ is 100% complementary to the nucleotide sequence 5′-AGCT-3′; and the nucleotide sequence 3′-TCGA-5′ is 100% complementary to a region of the nucleotide sequence 5′-TAGCTG-3′.
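The percent-complementarity arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, assuming ungapped strands of equal length with the second strand written 3′ to 5′ so that facing positions line up; the function names are illustrative and do not come from the disclosure.

```python
# Watson-Crick partners; U is normalized to T before comparison.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def percent_complementarity(strand_5to3: str, strand_3to5: str) -> float:
    """Percent of facing positions that form a Watson-Crick pair."""
    a = strand_5to3.upper().replace("U", "T")
    b = strand_3to5.upper().replace("U", "T")
    if len(a) != len(b) or not a:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(PAIRS.get(x) == y for x, y in zip(a, b))
    return 100.0 * paired / len(a)

def best_region_complementarity(short_3to5: str, long_5to3: str) -> float:
    """Best percent complementarity of the short strand to any region of the long one."""
    n = len(short_3to5)
    return max(percent_complementarity(long_5to3[i:i + n], short_3to5)
               for i in range(len(long_5to3) - n + 1))

print(percent_complementarity("AGCT", "TCGA"))              # 100.0
print(best_region_complementarity("TCGA", "TAGCTG"))        # 100.0
print(percent_complementarity("AAAAAAAAAA", "TTTTTTTTGG"))  # 80.0
```

The first two calls reproduce the 3′-TCGA-5′ examples from the text, and the last shows the 8-of-10 case.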
  • “control sequences” refers collectively to promoter sequences, polyadenylation signals, transcription termination sequences, upstream regulatory domains, origins of replication, internal ribosome entry sites, nuclear localization sequences, enhancers, and the like, which collectively provide for the replication, transcription and translation of a coding sequence in a recipient cell. Not all of these types of control sequences need to be present so long as a selected coding sequence is capable of being replicated, transcribed and, for some components, translated in an appropriate host cell.
  • “donor DNA” or “donor nucleic acid” refers to a nucleic acid that is designed to introduce a DNA sequence modification (insertion, deletion, substitution) into a locus by homologous recombination using nucleic acid-guided nucleases.
  • the donor DNA must have sufficient homology to the regions flanking the “cut site” or site to be edited in the genomic target sequence.
  • the length of the homology arm(s) will depend on, e.g., the type and size of the modification being made.
  • the donor DNA will have two regions of sequence homology (e.g., two homology arms) to the genomic target locus.
  • an “insert” region or “DNA sequence modification” region (the nucleic acid modification that one desires to be introduced into a genome target locus in a cell) will be located between two regions of homology.
  • the DNA sequence modification may change one or more bases of the target genomic DNA sequence at one specific site or multiple specific sites.
  • a change may include changing 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence.
  • a deletion or insertion may be a deletion or insertion of 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence.
  • “guide nucleic acid” or “guide RNA” or “gRNA” refers to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a genomic target locus, and 2) a scaffold sequence capable of interacting or complexing with a nucleic acid-guided nuclease.
  • “Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or, more often in the context of the present disclosure, between two nucleic acid molecules.
  • the term “homologous region” or “homology arm” refers to a region on the donor DNA with a certain degree of homology with the target genomic DNA sequence. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences.
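The position-by-position comparison described above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length and simply counts a gap character as a non-matching position; the function name is illustrative, and real alignments would normally come from a dedicated alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions occupied by the same base or amino acid."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a.upper(), aligned_b.upper()))
    return 100.0 * matches / len(aligned_a)

print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

The same function works for nucleic acid or peptide sequences, since it only compares characters at matching positions.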
  • “operably linked” refers to an arrangement of elements where the components so described are configured so as to perform their usual function.
  • control sequences operably linked to a coding sequence are capable of effecting the transcription, and in some cases, the translation, of a coding sequence.
  • the control sequences need not be contiguous with the coding sequence so long as they function to direct the expression of the coding sequence.
  • intervening untranslated yet transcribed sequences can be present between a promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence.
  • such sequences need not reside on the same contiguous DNA molecule (i.e. chromosome) and may still have interactions resulting in altered regulation.
  • a “promoter” or “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a polynucleotide or polypeptide coding sequence such as messenger RNA, ribosomal RNA, small nuclear or nucleolar RNA, guide RNA, or any kind of RNA transcribed by any class of RNA polymerase (I, II or III). Promoters may be constitutive or inducible and, in some embodiments (particularly many embodiments in which selection is employed), the transcription of at least one component of the nucleic acid-guided nuclease editing system is under the control of an inducible promoter.
  • “selectable marker” refers to a gene introduced into a cell, which confers a trait suitable for artificial selection.
  • General use selectable markers are well-known to those of ordinary skill in the art.
  • Drug selectable markers such as ampicillin/carbenicillin, kanamycin, chloramphenicol, erythromycin, tetracycline, gentamicin, bleomycin, streptomycin, rhamnose, puromycin, hygromycin, blasticidin, and G418 may be employed.
  • selectable markers include, but are not limited to human nerve growth factor receptor (detected with a MAb, such as described in U.S. Pat. No.
  • “target genomic DNA sequence” refers to any locus in vitro or in vivo, or in a nucleic acid (e.g., genome) of a cell or population of cells, in which a change of at least one nucleotide is desired using a nucleic acid-guided nuclease editing system.
  • the target sequence can be a genomic locus or extrachromosomal locus.
  • a “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to and/or expressed in a cell.
  • Vectors are typically composed of DNA, although RNA vectors are also available.
  • Vectors include, but are not limited to, plasmids, fosmids, phagemids, virus genomes, synthetic chromosomes, and the like.
  • an “engine vector” comprises a coding sequence for a nuclease to be used in the nucleic acid-guided nuclease systems and methods of the present disclosure.
  • the engine vector may also comprise, in a bacterial system, the λ Red recombineering system or an equivalent thereto.
  • Engine vectors also typically comprise a selectable marker.
  • the phrase “editing vector” comprises a donor nucleic acid, optionally including an alteration to the target sequence that prevents nuclease binding at a PAM or spacer in the target sequence after editing has taken place, and a coding sequence for a gRNA.
  • the editing vector may also comprise a selectable marker and/or a barcode.
  • the engine vector and editing vector may be combined; that is, the contents of the engine vector may be found on the editing vector.
  • the engine and editing vectors comprise control sequences operably linked to, e.g., the nuclease coding sequence, recombineering system coding sequences (if present), donor nucleic acid, guide nucleic acid, and selectable marker(s).
  • RNA-guided nucleases have rapidly become the foundational tools for genome engineering of prokaryotes and eukaryotes.
  • Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)
  • mobile genetic elements (MGEs)
  • RGNs are a major part of this defense system because they identify and destroy MGEs.
  • RGNs can be repurposed for genome editing in various organisms by reprogramming the CRISPR RNA (crRNA) that guides the RGN to a specific target DNA.
  • CRISPR RNA (crRNA)
  • a number of different RGNs have been identified to date for various applications; however, there are various properties that make some RGNs more desirable than others for specific applications.
  • RGNs can be used for creating specific double strand breaks (DSBs), creating specific nicks of one strand of DNA, or guiding another moiety to a specific DNA sequence.
  • double strand breaks (DSBs)
  • RNA-guided nuclease (RGN)
  • protospacer adjacent motif (PAM)
  • Type V RGNs such as MAD7, AsCas12a and LbCas12a tend to access DNA targets that contain YTTN/TTTN on the 5′ end
  • type II RGNs, such as the MADzymes disclosed herein, target DNA sequences containing a specific short motif on the 3′ end.
  • An example well known in the art for a type II RGN is SpCas9 which requires an NGG on the 3′ end of the target DNA.
  • Type II RGNs, unlike type V RGNs, require a transactivating RNA (tracrRNA) in addition to a crRNA for optimal function. Compared to type V RGNs, the type II RGNs create a double-strand break closer to the PAM sequence, which is highly desirable for precise genome editing applications.
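The difference in PAM placement between type V and type II RGNs can be made concrete with a small search routine. The Python sketch below is illustrative only: it scans a single strand, uses the SpCas9 NGG and Cas12a-style TTTN motifs quoted in the text rather than any MADzyme PAM (which are not given here), and assumes a 20-nucleotide spacer. The function name and parameters are hypothetical.

```python
import re

# IUPAC degenerate bases used in the PAMs quoted in the text.
IUPAC = {"N": "[ACGT]", "Y": "[CT]"}

def find_targets(seq: str, pam: str, pam_side: str, spacer_len: int = 20) -> list:
    """Return (start, spacer) pairs for candidate targets on the given strand.

    pam_side="3": the PAM follows the spacer (type II style, e.g. NGG).
    pam_side="5": the PAM precedes the spacer (type V style, e.g. TTTN).
    """
    pam_re = "".join(IUPAC.get(base, base) for base in pam)
    spacer_re = "([ACGT]{%d})" % spacer_len
    if pam_side == "3":
        body = spacer_re + pam_re
    elif pam_side == "5":
        body = pam_re + spacer_re
    else:
        raise ValueError("pam_side must be '3' or '5'")
    # A lookahead lets overlapping candidate sites be reported.
    return [(m.start(1), m.group(1)) for m in re.finditer("(?=" + body + ")", seq.upper())]

site = "TTTA" + "ACGTACGTACGTACGTACGT" + "TGG"
print(find_targets(site, "NGG", "3"))   # [(4, 'ACGTACGTACGTACGTACGT')]
print(find_targets(site, "TTTN", "5"))  # [(4, 'ACGTACGTACGTACGTACGT')]
```

In this toy sequence the same 20-mer is reachable by either class of nuclease because it happens to be flanked by both motifs; in a real genome the two searches generally return different sites.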
  • transactivating RNA (tracrRNA)
  • type II RGNs have been discovered so far; however, their use in widespread applications is limited by restrictive PAMs. For example, the PAM of SpCas9 occurs less frequently in AT-rich regions of the genome. New type II RGNs with new and less restrictive PAMs are beneficial for the field. Further, not all type II nucleases are active in multiple organisms. For example, a number of RGNs have been discussed in the scientific literature but only a few have been demonstrated to be active in vitro and fewer still are active in cells, particularly in mammalian cells. The present disclosure identifies multiple type II RGNs that have novel PAMs and are active in mammalian cells.
  • the type II RGNs or MADzymes may be delivered to cells to be edited as a polypeptide; alternatively, a polynucleotide sequence encoding the MADzyme is transformed or transfected into the cells to be edited.
  • the polynucleotide sequence encoding the MADzyme may be codon optimized for expression in particular cells, such as archaeal, prokaryotic or eukaryotic cells.
  • Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells.
  • Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammals including non-human primates.
  • the choice of the MADzyme to be employed depends on many factors, such as what type of edit is to be made in the target sequence and whether an appropriate PAM is located close to the desired target sequence.
  • the MADzyme may be encoded by a DNA sequence on a vector (e.g., the engine vector) and be under the control of a constitutive or inducible promoter.
  • the sequence encoding the nuclease is under the control of an inducible promoter, and the inducible promoter may be separate from, but of the same type as, an inducible promoter controlling transcription of the guide nucleic acid; that is, separate inducible promoters may drive the transcription of the nuclease and guide nucleic acid sequences, but the two inducible promoters may be the same type of inducible promoter (e.g., both are pL promoters).
  • the inducible promoter controlling expression of the nuclease may be different from the inducible promoter controlling transcription of the guide nucleic acid; that is, e.g., the nuclease may be under the control of the pBAD inducible promoter, and the guide nucleic acid may be under the control of the pL inducible promoter.
  • a guide nucleic acid (e.g., gRNA) complexes with a compatible nucleic acid-guided nuclease and can then hybridize with a target sequence, thereby directing the nuclease to the target sequence.
  • the nucleic acid-guided nuclease editing system uses two separate guide nucleic acid components that combine and function as a guide nucleic acid; that is, a CRISPR RNA (crRNA) and a transactivating CRISPR RNA (tracrRNA).
  • the gRNA may be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct, or the coding sequence may reside within an editing cassette, and is under the control of a constitutive promoter or, in some embodiments, an inducible promoter as described below.
  • a guide nucleic acid comprises a guide polynucleotide sequence having sufficient complementarity with a target sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence.
  • the degree of complementarity between a guide sequence and the corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences.
  • a guide sequence is about or more than about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 or 15-20 nucleotides long, or 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • the components of the guide nucleic acid are provided as a sequence to be expressed from a plasmid or vector and comprise both the guide sequence and the scaffold sequence as a single transcript under the control of a promoter, and in some embodiments, an inducible promoter.
  • the gRNA/nuclease complex binds to a target sequence as determined by the guide RNA, and the nuclease recognizes a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
  • the target sequence can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro.
  • the target sequence can be a polynucleotide residing in the nucleus of a eukaryotic cell.
  • a target sequence can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide, an intron, a PAM, or “junk” DNA).
  • the guide nucleic acid may be part of an editing cassette that encodes the donor nucleic acid.
  • the guide nucleic acid may not be part of the editing cassette and instead may be encoded on the engine or editing vector backbone.
  • a sequence coding for a guide nucleic acid can be assembled or inserted into a vector backbone first, followed by insertion of the donor nucleic acid in, e.g., the editing cassette.
  • the donor nucleic acid in, e.g., an editing cassette can be inserted or assembled into a vector backbone first, followed by insertion of the sequence coding for the guide nucleic acid.
  • sequence encoding the guide nucleic acid and the donor nucleic acid are simultaneously but separately inserted or assembled into a vector.
  • sequence encoding the guide nucleic acid and the sequence encoding the donor nucleic acid are both included in the editing cassette.
  • the target sequence is associated with a PAM, which is a short nucleotide sequence recognized by the gRNA/nuclease complex.
  • the precise PAM sequence and length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence.
  • Engineering of the PAM-interacting domain of a nucleic acid-guided nuclease may alter PAM specificity, improve fidelity, or decrease fidelity.
  • the genome editing of a target sequence both introduces a desired DNA change to a target sequence, e.g., the genomic DNA of a cell, and removes, mutates, or renders inactive a protospacer adjacent motif (PAM) region in the target sequence. Rendering the PAM at the target sequence inactive precludes additional editing of the cell genome at that target sequence, e.g., upon subsequent exposure to a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid in later rounds of editing.
  • Cells that did not undergo the first editing event will be cut, producing a double-stranded DNA break, and thus will not continue to be viable.
  • the cells containing the desired target sequence edit and PAM alteration will not be cut, as these edited cells no longer contain the necessary PAM site and will continue to grow and propagate.
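The selection logic described above (an inactivated PAM protects edited cells from re-cutting, while unedited cells are cut) can be sketched as a toy model. The Python example below assumes an NGG-type 3′ PAM purely for illustration, treats any site with an intact spacer plus PAM as cut, and ignores repair pathways and partial PAM recognition. All names are hypothetical.

```python
import re

def is_cut(site: str, spacer: str, pam_pattern: str = "[ACGT]GG") -> bool:
    """True if the site still carries the spacer followed by a recognized PAM."""
    return re.search(re.escape(spacer) + pam_pattern, site) is not None

def survivors(cells: dict, spacer: str) -> list:
    """Names of cells whose target site is no longer cut by the gRNA/nuclease complex."""
    return [name for name, site in cells.items() if not is_cut(site, spacer)]

spacer = "ACGTACGTACGTACGTACGT"
cells = {
    "unedited": spacer + "TGG",  # PAM intact: the site is cut
    "edited": spacer + "TCC",    # PAM altered by the edit: the site is immune
}
print(survivors(cells, spacer))  # ['edited']
```

In this simplified view, only the cell carrying the PAM alteration remains viable after exposure to the nuclease.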
  • nucleic acid-guided nucleases can recognize some PAMs very well (e.g., canonical PAMs), and other PAMs less well or poorly (e.g., non-canonical PAMs). Because the mined MAD nucleases disclosed herein may recognize different PAMs, the mined MAD nucleases increase the number of target sequences that can be targeted for editing; that is, mined MAD nucleases decrease the regions of “PAM deserts” in the genome.
  • the mined MAD nucleases expand the scope of target sequences that may be edited by increasing the number (variety) of PAM sequences recognized.
  • cocktails of mined MAD nucleases may be delivered to cells such that target sequences adjacent to several different PAMs may be edited in a single editing run.
  • the donor nucleic acid is on the same polynucleotide (e.g., editing vector or editing cassette) as the guide nucleic acid and may be (but is not necessarily) under the control of the same promoter as the guide nucleic acid (e.g., a single promoter driving the transcription of both the guide nucleic acid and the donor nucleic acid).
  • the donor nucleic acid is designed to serve as a template for homologous recombination with a target sequence nicked or cleaved by the nucleic acid-guided nuclease as a part of the gRNA/nuclease complex.
  • a donor nucleic acid polynucleotide may be of any suitable length, such as about or more than about 20, 25, 50, 75, 100, 150, 200, 500, or 1000 nucleotides in length.
  • the donor nucleic acid can be provided as an oligonucleotide of between 20-300 nucleotides, more preferably between 50-250 nucleotides.
  • the donor nucleic acid comprises a region that is complementary to a portion of the target sequence (e.g., a homology arm). When optimally aligned, the donor nucleic acid overlaps with (is complementary to) the target sequence by, e.g., about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90 or more nucleotides. In many embodiments, the donor nucleic acid comprises two homology arms (regions complementary to the target sequence) flanking the mutation or difference between the donor nucleic acid and the target template.
  • the donor nucleic acid comprises at least one mutation or alteration compared to the target sequence, such as an insertion, deletion, modification, or any combination thereof compared to the target sequence.
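The donor layout described above (two homology arms flanking the intended change) can be sketched as follows. This is a hedged illustration only: `build_donor`, its parameters, and the 40-nt default arm length are hypothetical, and the sketch writes the donor on the same strand as the target for simplicity.

```python
def build_donor(target, edit_start, edit_end, replacement, arm_len=40):
    """Return left arm + replacement + right arm.

    target[edit_start:edit_end] is the region to change (0-based, end-exclusive).
    An empty `replacement` models a deletion; edit_start == edit_end models an
    insertion. Names and defaults are illustrative assumptions.
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit coordinates fall outside the target")
    if edit_start < arm_len or len(target) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = target[edit_start - arm_len:edit_start]
    right_arm = target[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm
```

For example, replacing a three-base stretch with a single base using 40-nt arms yields an 81-nt donor, which falls within the 50-250 nucleotide oligonucleotide range mentioned above.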
  • the donor nucleic acid is provided as an editing cassette, which is inserted into a vector backbone
  • the vector backbone may comprise a promoter driving transcription of the gRNA and the coding sequence of the gRNA, or the vector backbone may comprise a promoter driving the transcription of the gRNA but not the gRNA itself.
  • the promoter driving transcription of the gRNA and the donor nucleic acid is an inducible promoter.
  • Inducible editing is advantageous in that isolated cells can be grown for several to many cell doublings to establish colonies before editing is initiated, which increases the likelihood that cells with edits will survive, as the double-strand cuts caused by active editing are largely toxic to the cells. This toxicity results both in cell death in the edited colonies, as well as a lag in growth for the edited cells that do survive but must repair and recover following editing. However, once the edited cells have a chance to recover, the size of the colonies of the edited cells will eventually catch up to the size of the colonies of unedited cells.
  • a guide nucleic acid may be efficacious in directing the edit of more than one donor nucleic acid in an editing cassette; e.g., if the desired edits are close to one another in a target sequence.
  • an editing cassette may comprise one or more primer sites.
  • the primer sites can be used to amplify the editing cassette by using oligonucleotide primers; for example, if the primer sites flank one or more of the other components of the editing cassette.
  • the editing cassette may comprise a barcode.
  • a barcode is a unique DNA sequence that corresponds to the donor DNA sequence such that the barcode can identify the edit made to the corresponding target sequence.
  • the barcode typically comprises four or more nucleotides.
  • the editing cassettes comprise a collection of donor nucleic acids representing, e.g., gene-wide or genome-wide libraries of donor nucleic acids. The library of editing cassettes is cloned into vector backbones where, e.g., each different donor nucleic acid is associated with a different barcode.
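The one-to-one association between donors and barcodes can be sketched as below. This is a simplified assumption-laden example: the helper names are hypothetical, barcodes are enumerated in lexicographic order, and practical designs would typically add constraints (for instance avoiding homopolymer runs) that are not modeled here.

```python
from itertools import product


def assign_barcodes(donors, barcode_len=4):
    """Map each distinct donor sequence to a unique barcode of `barcode_len` nt."""
    unique_donors = list(dict.fromkeys(donors))  # drop duplicates, keep order
    if len(unique_donors) > 4 ** barcode_len:
        raise ValueError("barcode length is too short for this library size")
    barcodes = ("".join(p) for p in product("ACGT", repeat=barcode_len))
    return dict(zip(unique_donors, barcodes))


def identify_edit(barcode, barcode_map):
    """Look up which donor (and hence which edit) a sequenced barcode denotes."""
    inverse = {bc: donor for donor, bc in barcode_map.items()}
    return inverse.get(barcode)
```

A four-nucleotide barcode distinguishes at most 4^4 = 256 donors, so gene-wide or genome-wide libraries would need longer barcodes.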
  • an expression vector or cassette encoding components of the nucleic acid-guided nuclease system further encodes one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs.
  • the nuclease comprises NLSs at or near the amino-terminus of the MADzyme, NLSs at or near the carboxy-terminus of the MADzyme, or a combination.
  • the engine and editing vectors comprise control sequences operably linked to the component sequences to be transcribed.
  • the promoters driving transcription of one or more components of the mined MAD nuclease editing system may be inducible, and an inducible system is likely employed if selection is to be performed.
  • a number of gene regulation control systems have been developed for the controlled expression of genes in plant, microbe, and animal cells (including mammalian cells); these include the pL promoter (induced by heat inactivation of the cI857 repressor), the pBAD promoter (induced by the addition of arabinose to the cell growth medium), and the rhamnose inducible promoter (induced by the addition of rhamnose to the cell growth medium).
  • performing genome editing in live cells entails transforming cells with the components necessary to perform nucleic acid-guided nuclease editing.
  • the cells may be transformed simultaneously with separate engine and editing vectors; the cells may already be expressing the mined MAD nuclease (e.g., the cells may have already been transformed with an engine vector or the coding sequence for the mined MAD nuclease may be stably integrated into the cellular genome) such that only the editing vector needs to be transformed into the cells; or the cells may be transformed with a single vector comprising all components required to perform nucleic acid-guided nuclease genome editing.
  • a variety of delivery systems can be used to introduce (e.g., transform or transfect) nucleic acid-guided nuclease editing system components into a host cell.
  • These delivery systems include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires, and exosomes.
  • molecular trojan horse liposomes may be used to deliver nucleic acid-guided nuclease components across the blood brain barrier.
  • electroporation particularly flow-through electroporation (either as a stand-alone instrument or as a module in an automated multi-module system) as described in, e.g., U.S. Pat. Nos. 10,435,713; 10,443,074; 10,323,258; and 10,415,058.
  • the cells are cultured under conditions that promote editing.
  • constitutive promoters are used to drive transcription of the mined MAD nucleases and/or gRNA
  • the transformed cells need only be cultured in a typical culture medium under typical conditions (e.g., temperature, CO2 atmosphere, etc.)
  • editing is inducible—by, e.g., activating inducible promoters that control transcription of one or more of the components needed for nucleic acid-guided nuclease editing, such as, e.g., transcription of the gRNA, donor DNA, nuclease, or, in the case of bacteria, a recombineering system—the cells are subjected to inducing conditions.
  • FIG. 1 shows an exemplary workflow for creating and for in vitro screening of MADzymes, including those in untapped clusters.
  • metagenome mining was performed to identify putative RGNs of interest based on, e.g., sequence (HMMER profile) and a search for CRISPR arrays.
  • candidate pools were created and each MADzyme was identified by cluster, the tracrRNA was identified, and the sgRNA structure was predicted. Final candidates were identified, then the genes were synthesized.
  • An in vitro depletion test was performed (see FIG.
  • FIG. 2 depicts the in vitro depletion test in more detail.
  • The NCBI Metagenome database was used to search for novel, putative CRISPR nucleases using HMMER hidden Markov model searches. Hundreds of potential nucleases were identified. For each potential nuclease candidate, putative CRISPR arrays were identified and CRISPR repeats and anti-repeats were identified. Thirteen nucleases (FIG. 3) were chosen for in vitro validation and 11 active MADzymes were identified and assigned to clusters. There was less than 40% sequence identity between clusters. Cluster 59 shown in FIG. 4 presents two unique subclusters with distinct sgRNA architecture. Clusters 55-57 are shown in FIG. 5. These new MADzymes have diverse PAM preferences and distinct sgRNA structure. Cluster 141 (FIG.
  • Table 1 lists the identified MADzymes, including amino acid sequences, origin, and nucleic acid sequences of the CRISPR RNA and the trans-activating crispr RNA.
  • Organism MAD Clus- (meta- CRISPR name ter Contig_id genome) Source aa_seq repeat tracrRNA MAD2015 59 DPZI01000013.1 Vagococcus MGKNYTIGLDIGTNSVGWSVVTENQQLVKKRMKIRGDS GTTTT TGTTGGT sp.
  • EKKQVKKNFWGVRLFDEGETAEATRLKRTTRRRYTRRR AGAGC AGCATTC NRVVDLQNIFKDEINQKDSNFFNRLNESFLVVEDKKQP TATGC AAAACAA KQMIFGTVEEEASYHESFPTIYHLRKELVDNKDQADIR TGTTT CATAGCA LVYLAMAHMIKYRGHFLIEGQLSTENTSVEEKFHLFLK TGAAT AGTTAAA EYNSTFCKQEDGSLVNPVNEDINGEEILMGTLSRSKKA GCTTC ATAAGGC EQIMKSFEGEKSNGVFSQFLKMIVGNQGNFKKAFNLEE CAAAA TTTGTCC DAKIQFAKEEYDEDLTTLLSNIGDEYANVFSLAKETYE C GTTCTCA AIELSGILSTKDKETYAKLSSSMTERYEDHEKDLASLK [SEQ ACTTTTA SFFREHLPEKYAVMFKDVSKNGYAGYIENSNKISQEEF
  • AAACACG AKFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQE 11] TGGCACC MRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLA GATTCGG RGNSDFAWSIRKRNEKITPWNFEDVIDKESSAEAFINR TGC MTSFDLYLPEEKVLPKHSLLYETFTVYNELTKVRFIAE [SEQ ID GMSDYQFLDSKQKKDIVRLYFKGKRKVKVTDKDIIEYL NO.
  • MAD2023 56 DCGJ01000048.1 Lachno- Feces MEKNNYLLGLDIGTDSVGYAVTNDKYDILKFHGEPAWG GTTTG AGACCCC clostridium of six- VTIFDEASLSTEKRSFRVSRRRLDRRQQRVLLVQELFA AGAGT TATGGAT sp.
  • MAD2024 56 CADAKQ010000027.1 uncultured Cattle MNFDGEYFLGLDIGTDSVGYAVTDQRYNLVKFKGEPMW GTTTG GAGCCCT Lachno- rumen GSHLFDAANQCAERRGFRTARRRLDRRQQRVKLVDEIF AGAGT CTGGATT spiraceae APEVAKVDPNFYIRKMESALYPEDKSNKGDLYLYFNKQ AGTGT TACACTA bacterium EYDEKHYYKDYPTIHHLICALMNDEKTKFDIRLINIAI AAATC CGAGTTC DWLVAHRGHFLSEVGTDSVDKVLDFRKIYDEFMALFSD CAGAG AAATAAA EDDAVSSKPWENINPDELGKVLKIHGKNAKRNELKKLL GGCTC AATTATT YGGKIPTDEDSFIDRKLLIDFIAGTSVQCNKLFRNSEY CAAAA TCAAATC EDDLKITISNSDEREVVLPQLEDFH
  • MAD2025 56 DOQG01000053.1 Rumino- human MSFKENSKFYFGLDIGTDSVGWAVTDNLYKLYKYKNNL GTTTG TTTTACT coccaceae gut MWGVSLFEAASPAEDRRNHRTARRRLDRRQQRVALLRE AGAGT ACCCTAT bacterium LFAKEILKTDPDFFLRLKESSLYPEDRTNKNVNTYFDD AGTGT AAATTTA ADFKDSDYFKMYPTVHHLIKELSESDKPHDVRLVYLAC AAATT CACTACG AFIVAHRGHFLNGADENNVQEVLDFNSSYCEFTDWFKS TATAG AGTTCAA NDIEDNPFSESTENEFSVILRKKIGITAKEKEIKNLLF GGTAG ATAAAAA GTTKTPDCYKDEEYPIDIDVLIKFISGGKTNLAKLFRN TAAAA TTATTTC PAYDELDIQTVEVGKADFADTIDLLASSMEDTDVPLLS C AAATCGT AVK
  • MAD2026 65 CADBQN010000053.1 uncultured Cattle MEQKDYYIGLDIGTNSVGWAVVDEGYQLCRFKKYDMWG GTTTG GACTACC Firmicutes rumen VRLFDSAETAAERRMNRVNRRRNRRKKQRIDLLQGLFA AGAGT ATATGAG bacterium EEIAKIDRTFFVRLNESRLHPEDKSTAFRHPLFNDPNY AGTGT ATTACAC TDVDYYKEYPTIYHLRKELMDSAEPHDIRLVYLALHHI AATTT TACACGG LKNRGHFLIEGGFEDSKKFEPTFRQLLEVLTEELGLKM CATAT TTCAAAT DGADAALAESVLKDRGMKKTEKVKRLKNVFTLNTTDMD GGTAG AAAGAAT QESQKKQKAQIDAVCKFLAGSKGDFKKLVADEALNELK TCAAA GTTCGAA LDTFALGTSKAEDIGLEIEKSAPQYCVVFESVKSVFDW C ACC
  • MAD2027 65 CACWRN010000001.1 uncultured Cattle MSKKFAGEYYLGLDIGTDSVGWAVTDNQYNVLKFNGKS GTTTG TTTACCA Succini- rumen MWGIRLFDAAQTAAERRMFRTARRRVERRRWRLELLQE AGAGT TCCAGTG clasticum LFQNEIEKKDPDFFQRMKDSALYPEDSKTGKPFALFCD AATGT AGTTTAC sp.
  • MAD2029 66 DBKT01000013.1 Bacillales gut MADKLFIGLDVGSESVGWAATDENFHLYRLKGKTAWGA GTTTG GCATTGT bacterium meta- RIFSEANDAKTRRGFRVAGRRLARRKERIRLLNTLFDP AGAGC AAGACAA genome LLKKDPAFLLRLENSAIQNDDPNKPIQAIADCPLLVNK AGTGT CACTGCT QEEKDYYKRYPTIWHLRKALMENDDHAFSDIRFLYLAI TGTCT ACGTTCA HHIIKYRGNFLREGDIKIGQFDYSIFDKLNETLAVLFD TATAT AATAAGC LQNEDGENEEGRFIGLPKSQYEAFITCANDRNLPKQPK AGCTC ATATTGC KAKLLSMFEKTEESKAFLEMFCTLCSGGEFSTKKLNAK GAAAA TACAAGG GEETYQDAKISFNSSYDENEGAYQEILGDFFDLVDIAK C TTCTCCA AVFDY
  • MAD2030 66 DBLD01000015.1 Bacillales gut MEQNTKKLFIGLDVGTDSVGWAATDEYFNLYRLKGKTA GTTTG GCATTGT bacterium meta- WGARLFLDAANAKDRRQHRVSGRRLARRKERIRLLNAL AGAGC AAGACAA genome FDPLLKKVDPTFLLRLESSTLQNDDPNKDQRAVSDALL AGTGT CACTGCA FGNKKHEKAYYAAFPTIWHLRKALIENDDKAFSDIRYL TGTCT CGTTCAA YLAIHHIIKYRGNFLRQGEIKIGEFDFSCFDKLNQFFD TAAAT ATAAGCA IYFSKEDEEEVEFIGLPNENYQRFIDCAADKNLGKGKK AGCTC GATTGCT KGDLLKLMSFSEDEKPFCEMFCSLCAGLAFSTKKLNKK GAAAA ACAAGGT DETVFEDIKVEFNGKFDDKQEEIKSVLGDAYDLVELAK C T
  • MAD2031 141 CACVOG010000001.1 uncultured Cattle MNYILGLDIGIASVGWAAVALDANDEPCKILDLNARIF ATTGT TTGTAAT Seleno- rumen EAAEQPKTGASLAAPRREARGSRRRTRRRRHRMERLRH ACCAT AACCTAT monadaceae LFAREELISAENIAALFEAPADVYRLRAEGLSRRLDEG AGCGA TTTACCT bacterium EWARVLYHIAKRRGFKSNRKGAASDADEGKVLEAVKEN GTTAA CGCTATG EALLKNYKTVGEMMFRDEKFQTAKRNKGGSYTFCVSRG ATTAG GCACAAT MLAEEIGELFAAQREQGNPHASETFETAYSKIFADQRS GGAAT TTGTTAT FDDGPDANSRSPYAGNQIEKMIGTCSLETDPPEKRAAK TACAA TACATGG ASYSFMRFSLLQKINHLRLKDAKGEERPLTDEERAAVE C ACATTAT ALAWKSPSL
  • AAAAAAG RGFGHISVKACRKLIPHLEKGMTYDKACKEAGYDLQKT CAACGAA GGEKTKLLSGNLDEIREIPNPVVRRAIAQTVKVVNAVI AAACGTG RRYGSPVAVNVELAREMGRTFQERRDMMKSMEDNNAEN CTGGCAG EKRKEELKGYGVVHPSGLDIVKLKLYKEQGGVCAYSLA CAA AMPIEKVLKDHDYAEVDHILPYSRSFDDSYANKVLVLS [SEQ ID KENRDKGNRTPMEYMANMPGRRHDFITWVKSAVRNPRK NO.
  • MAD2032 141 CACVWE010000020.1 uncultured Cattle MKYIIGLDMGITSVGFATMMLDDKDEPCRIIRMGSRIF GTTGT ATTGTAT Rumino- rumen EAAEHPKDGSSLAAPRRINRGMRRRLRRKSHRKERIKD AGTTC CATACCA coccus LIIKNELMTADEISAIYSTGKQLSDIYQIRAEALDRKL CCTAA AGAACAA sp.
  • MAD2033 141 DCJP01000021.1 un- Feces MKNTLYGIGLDIGVASVGWAVVGLNGTGEPVGLHRLGV GTTGT TTATACC cultivated of RIFDKAEQPKTGESLAAPRRMARGMRRRLRRKALRRAD AGTTC ATACCAA Faecali three- VYALLERSGLSTREALAQMFEAGGLEDIYALRTRALDE CCTAA GAACTGT bacterium weeks PVGKAEFSRILLHLAQRRGFKSNRRTASDGEDGRLLAA CAGTT TATGGTT sp.
  • MAD2034 141 CACXAV010000001.1 uncultured Cattle MAYGIGLDIGIASVGFATVALNEQDEPCGILRMGSRIF GTTGT TTATACC
  • AGTTC ATACCAA diales LLVESCLISQDGLGSLFEGRLEDIYALRTRALDERLTD CCTAA GAACTGT bacterium
  • AELCRVLIHLAQRRGFRSNRKADAADKEAGKLLKAVSE CGGTT TGGGTTA NDRRMEENGYRTVGEMLYKDPLFAEHRRNKGEAYLSTV CTTGG CTACAAT TRTAVEQEARLVLSTQREKGNAAITEDFVEKYLDILLS TATGG AAGGTAG QRPFDVGPGGNSPYGGNMIEKMIGRCTFEPDELRAPKA TATAA TAAACCG SYSFEYFQLLQKVNHIRLLRDGRSEPLSEEQRRAIIDL T AAAAGCT ALASA
  • MAD2040 141 DHKF01000115.1 Clostri- Feces MHRYAIGLDIGITSVGWAAIALDAEENPCGMLDFGSRI GTTGT TTATACC diales FTGAEHPKTGASLAAPRREARGARRRLRRHRHRNERIR AGTTC ATACCAA bacterium RLMVSGGLISQEQLESLFAGQLEDIYALRTRALDEQVA CCTGA GAACTGC UBA4701 REELARIMLHLSQRRGFRSNRKGGADAEDGKLLEAVGD TGGTT TCAGGTT NKRRMDEKGYRTAGEMFFKDEAFAAHKRNKGGNYIATV CTTGG ACTATGA TRAMTEDEVHRIFAAQRGFGAEYANEKLEAAYLDILLS TATGG TAAGGTA QRSFDEGPGGDSPYGGSQIERMIGTCAFEPDQPRAAKA TATAA GTAAACC AYSFEYFSLLEKLNHIRLVSGGKSEPLTDAQRKKLIEL T GAAGAGC AHKQDTLSYAK
  • the MADzyme coding sequences were cloned into a pUC57 vector with T7-promoter sequence attached to the 5′-end of the coding sequence and a T7-terminator sequence attached to the 3′-end of the coding sequence.
  • Q5 Hot Start 2× master mix reagent (NEB, Ipswich, MA) was used to amplify the MADzyme sequences cloned in the pUC57 vector.
  • 1 μM primers were used in a 10 μL PCR reaction using 3.3 μL boiled cell samples as templates in 96 well PCR plates. The PCR conditions shown in Table 2 were used:
  • TTTATTTATCAAA [SEQ ID NO. 91] [SEQ ID NO. 90] sgM GTTTGAGAGCCTTGT NONE NONE NONE NONE 2021 AAAACCGTATATCTC TCAAGCGAAAGATAA TGTTTTACAAGGCGA GTTCAAATAAGGATT TATCCGAAATCGCTT GCGTGCATTGGCACC ATCTATCTTTTAAGA CTTTCTTTGAAAGTC TT [SEQ ID NO.
  • TTTCAAATCGT ACCTTCACAAGTGTT TGAG [SEQ ID 102] ACTTTTTAGTA GTGAATATTAACTCA NO. 101] CCTTCACAAGT CCTTCGGGTGAG GTTGTGAATAT [SEQ ID NO. TAACTCACCTT 100] CGGGTGAG [SEQ ID NO. 103] sgM GTTTGAGAGTAGTGT NONE NONE NONE 2026 AATTTCATATGGTAG TCAAACGACTACCAT ATGAGATTACACTAC ACGGTTCAAATAAAG AATGTTCGAAACCGC CCTTTGGGGCCCGCT TGTTGCGGATTTACA GACTTGATATCAAGT CTG [SEQ ID NO.
  • AAAAGCTCTGA GTTTGCGCAGGACGT [SEQ ID NO. 126] CGTCTTGTTTG CATCTTTATATCAGA 125] CGCAGGACGTC CGGATG [SEQ ID ATCTTTATATC NO. 124] AGACGGATG [SEQ ID NO. 127] sgM GTTGTAGTCCCCTGA NONE NONE NONE 2035 TGGTTTCTGGAATGG TATAATGAAATTATA CCATTCCAGAAACTA TTATGGTCACTACAA TAAGGTATTAGACCG TAGAGCACTAACACC CCATTTGGGGTGTTA TCTCTTTAAACTGTC CAAAATTTAGTATTG CAATTATTGA [SEQ ID NO.
  • gRNA length different lengths of spacer, repeat:anti-repeat duplex and 3′ end of the tracrRNA were included. These gRNAs were then synthesized as a single stranded DNA downstream of the T7 promoter (see Table 4). These sgRNAs were amplified using two primers (5′-AAACCCCTCCGTTTAGAGAG [SEQ ID NO. 174] and 5′-AAGCTAATACGACTCACTATAGGCCAGTC [SEQ ID NO. 175]) and 1 uL of 10 uM diluted single stranded DNA as a template in 25 uL PCR reactions for each sgRNA according to the conditions of Table 5.
  • the target library was designed based on an assumption that the eight randomized NNNNNNNN [SEQ ID NO. 176] PAMs of these nucleases reside on the 3′ end of the target sequence (5′-CCAGTCAGTAATGTTACTGG [SEQ ID NO. 177]).
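The composition of such a target library can be sketched by enumerating every 8-nt PAM appended to the 3′ end of the target. This is an illustrative enumeration under stated assumptions: a physical library would normally be synthesized with degenerate bases rather than enumerated, and the function name is hypothetical. The target sequence is the one given above as SEQ ID NO. 177.

```python
from itertools import product

TARGET = "CCAGTCAGTAATGTTACTGG"  # target sequence quoted in the text (SEQ ID NO. 177)


def pam_library(target=TARGET, pam_len=8):
    """Yield target + PAM for every possible PAM of length `pam_len`.

    The randomized PAM is placed on the 3' end of the target, as assumed
    in the library design described above.
    """
    for pam in product("ACGT", repeat=pam_len):
        yield target + "".join(pam)
```

With eight randomized positions the library contains 4^8 = 65,536 distinct members, which is the space a depletion readout would need to cover.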
  • the MADzymes were tested for activity by in vitro transcription and translation (TXTL). Both the gRNA plasmid and nuclease plasmid were included in each TXTL reaction.
  • a PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.) was used to produce MADzymes from the PCR-amplified MADzyme library and also to produce the gRNA libraries.
  • the reagents listed in Table 6 were mixed to start the production of MADzymes and gRNAs:
  • a master mix containing all reagents except the PCR-amplified T7-MADzymes was prepared on ice in a volume sufficient for the 96-well plates used in the assay. After 21 μL of the master mix was distributed into each well of the 96-well plates, 4 μL of the mixture of PCR-amplified MADzymes and gRNA under the control of the T7 promoter was added. The 96-well plates were sealed and incubated for 4 hrs at 37° C. in a thermal cycler. The plates were kept at room temperature until the target pool was added to perform the target depletion reaction.
  • RNAseA/Proteinase K treated samples were purified with DNA purification kits and the purified DNA samples were then amplified and sequenced.
  • the PCR conditions are shown in Table 7:
  • Proteins were produced in vitro using a PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.). Guide RNAs that target the target plasmid were also produced under a T7 promoter in the same mixture.
  • Supercoiled plasmid target was diluted into the digestion buffer, then the RNP complex was added to the same digestion buffer to initiate the plasmid digestion. After incubation at 37° C.
  • Table 8 lists the identified MADzyme nickases, including the variations from the nuclease sequence in Table 1 and the amino acid sequence.

Abstract

The present disclosure provides new RNA-guided nuclease systems and engineered nickases for making rational, direct edits to nucleic acids in live cells.

Description

    RELATED CASES
  • This application is a continuation of U.S. Ser. No. 17/463,498, filed 31 Aug. 2021, now allowed; which claims priority to U.S. Ser. No. 63/133,502, filed 4 Jan. 2021, entitled “MAD NUCLEASES”, which is incorporated herein in its entirety.
  • INCORPORATION BY REFERENCE
  • Submitted with the present application is an electronically filed sequence listing via EFS-Web as an ASCII formatted sequence listing, entitled “INSC083US2_SEQLIST_20220309”, created Mar. 9, 2022, and 359,000 bytes in size. The sequence listing is part of the specification filed Mar. 9, 2022 and is incorporated by reference in its entirety.
  • FIELD OF THE INVENTION
  • The present disclosure provides new RNA-guided nuclease systems and engineered nickases for making rational, direct edits to nucleic acids in live cells.
  • BACKGROUND OF THE INVENTION
  • In the following discussion certain articles and methods will be described for background and introductory purposes. Nothing contained herein is to be construed as an “admission” of prior art. Applicant expressly reserves the right to demonstrate, where appropriate, that the methods referenced herein do not constitute prior art under the applicable statutory provisions.
  • The ability to make precise, targeted changes to the genome of living cells has been a long-standing goal in biomedical research and development. Recently, various nucleases have been identified that allow manipulation of gene sequence and, hence, gene function. These nucleases include nucleic acid-guided nucleases. The range of target sequences that nucleic acid-guided nucleases can recognize, however, is constrained by the need for a specific protospacer adjacent motif (PAM) to be located near the desired target sequence. PAMs are short nucleotide sequences recognized by a gRNA/nuclease complex, where this complex directs editing of the target sequence. The precise PAM sequence and PAM length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering nucleic acid-guided nucleases or mining for new nucleic acid-guided nucleases may provide nucleases with altered PAM preferences and/or altered activity or fidelity, all of which are changes that may increase the versatility of a nucleic acid-guided nuclease for certain editing tasks.
  • There is thus a need in the art of nucleic acid-guided nuclease gene editing for novel nucleases with varied PAM preferences, varied activity in cells from different organisms such as mammals and/or altered enzyme fidelity. The novel MAD nucleases described herein satisfy this need.
  • SUMMARY OF THE INVENTION
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the following written Detailed Description including those aspects illustrated in the accompanying drawings and defined in the appended claims.
  • The present disclosure provides Type II MAD nucleases (e.g., RNA-guided nucleases or RGNs) with varied PAM preferences, and/or varied activity in mammalian cells.
  • Thus, in one embodiment there are provided MAD nuclease systems that perform nucleic acid-guided nuclease editing including a MAD2015 system comprising SEQ ID Nos. 1 (MAD2015 nuclease), 2 (CRISPR RNA) and 3 (trans-activating CRISPR RNA); a MAD2016 system comprising SEQ ID Nos. 4 (MAD2016 nuclease), 5 (CRISPR RNA) and 6 (trans-activating CRISPR RNA); a MAD2017 system comprising SEQ ID Nos. 7 (MAD2017 nuclease), 8 (CRISPR RNA) and 9 (trans-activating CRISPR RNA); a MAD2019 system comprising SEQ ID Nos. 10 (MAD2019 nuclease), 11 (CRISPR RNA) and 12 (trans-activating CRISPR RNA); a MAD2020 system comprising SEQ ID Nos. 13 (MAD2020 nuclease), 14 (CRISPR RNA) and 15 (trans-activating CRISPR RNA); a MAD2021 system comprising SEQ ID Nos. 16 (MAD2021 nuclease), 17 (CRISPR RNA) and 18 (trans-activating CRISPR RNA); a MAD2022 system comprising SEQ ID Nos. 19 (MAD2022 nuclease), 20 (CRISPR RNA) and 21 (trans-activating CRISPR RNA); a MAD2023 system comprising SEQ ID Nos. 22 (MAD2023 nuclease), 23 (CRISPR RNA) and 24 (trans-activating CRISPR RNA); a MAD2024 system comprising SEQ ID Nos. 25 (MAD2024 nuclease), 26 (CRISPR RNA) and 27 (trans-activating CRISPR RNA); a MAD2025 system comprising SEQ ID Nos. 28 (MAD2025 nuclease), 29 (CRISPR RNA) and 30 (trans-activating CRISPR RNA); a MAD2026 system comprising SEQ ID Nos. 31 (MAD2026 nuclease), 32 (CRISPR RNA) and 33 (trans-activating CRISPR RNA); a MAD2027 system comprising SEQ ID Nos. 34 (MAD2027 nuclease), 35 (CRISPR RNA) and 36 (trans-activating CRISPR RNA); a MAD2028 system comprising SEQ ID Nos. 37 (MAD2028 nuclease), 38 (CRISPR RNA) and 39 (trans-activating CRISPR RNA); a MAD2029 system comprising SEQ ID Nos. 40 (MAD2029 nuclease), 41 (CRISPR RNA) and 42 (trans-activating CRISPR RNA); a MAD2030 system comprising SEQ ID Nos. 43 (MAD2030 nuclease), 44 (CRISPR RNA) and 45 (trans-activating CRISPR RNA); a MAD2031 system comprising SEQ ID Nos. 46 (MAD2031 nuclease), 47 (CRISPR RNA) and 48 (trans-activating CRISPR RNA); a MAD2032 system comprising SEQ ID Nos. 49 (MAD2032 nuclease), 50 (CRISPR RNA) and 51 (trans-activating CRISPR RNA); a MAD2033 system comprising SEQ ID Nos. 52 (MAD2033 nuclease), 53 (CRISPR RNA) and 54 (trans-activating CRISPR RNA); a MAD2034 system comprising SEQ ID Nos. 55 (MAD2034 nuclease), 56 (CRISPR RNA) and 57 (trans-activating CRISPR RNA); a MAD2035 system comprising SEQ ID Nos. 58 (MAD2035 nuclease), 59 (CRISPR RNA) and 60 (trans-activating CRISPR RNA); a MAD2036 system comprising SEQ ID Nos. 61 (MAD2036 nuclease), 62 (CRISPR RNA) and 63 (trans-activating CRISPR RNA); a MAD2037 system comprising SEQ ID Nos. 64 (MAD2037 nuclease), 65 (CRISPR RNA) and 66 (trans-activating CRISPR RNA); a MAD2038 system comprising SEQ ID Nos. 67 (MAD2038 nuclease), 68 (CRISPR RNA) and 69 (trans-activating CRISPR RNA); a MAD2039 system comprising SEQ ID Nos. 70 (MAD2039 nuclease), 71 (CRISPR RNA) and 72 (trans-activating CRISPR RNA); and a MAD2040 system comprising SEQ ID Nos. 73 (MAD2040 nuclease), 74 (CRISPR RNA) and 75 (trans-activating CRISPR RNA). In some aspects, the MAD system components are delivered as sequences to be transcribed (in the case of the gRNA components) and transcribed and translated (in the case of the MAD nuclease), and in some aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on the same vector. In other aspects, the coding sequence for the MAD nuclease and the gRNA component sequences are on different vectors and, in some aspects, the gRNA component sequences are located in an editing cassette which also comprises a donor DNA (e.g., homology arm). In other aspects, the MAD nuclease is delivered to the cells as a peptide or the MAD nuclease and gRNA components are delivered to the cells as a ribonucleoprotein complex.
  • Additionally, there are provided engineered nickases derived from the nucleases of the above-referenced systems, including MAD2016-H851A (SEQ ID NO: 178); MAD2016-N874A (SEQ ID NO: 179); MAD2032-H590A (SEQ ID NO: 180); MAD2039-H587A (SEQ ID NO: 181); and MAD2039-N610A (SEQ ID NO: 182).
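The nickase names follow the common substitution notation of wild-type residue, 1-based position, and replacement residue, so that H851A denotes histidine 851 replaced by alanine. A minimal Python sketch of applying such a substitution to a protein sequence is shown below; the function name is hypothetical and the short sequence used in the usage note is a toy example, not a sequence from the sequence listing.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a point substitution such as 'H851A' to a protein sequence.

    The wild-type residue in the mutation string is checked against the
    sequence so that an off-by-one position is caught rather than silently
    applied. Positions are 1-based.
    """
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild_type, position, replacement = m.group(1), int(m.group(2)), m.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError("position lies outside the protein sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[:position - 1] + replacement + protein[position:]
```

For instance, `apply_substitution("MKHLN", "H3A")` gives `"MKALN"`, while a mismatched wild-type residue raises an error.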
  • These aspects and other features and advantages of the invention are described below in more detail.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 is an exemplary workflow for creating and screening mined MAD nucleases or RGNs.
  • FIG. 2 is a simplified depiction of an in vitro test conducted on candidate enzymes.
  • FIG. 3 is a list of novel Type II MADzymes that have been identified.
  • FIG. 4 is a map of Type II MADzymes in cluster 59.
  • FIG. 5 is a map of Type II MADzymes in clusters 55, 56, 57 and 58.
  • FIG. 6 is a map of Type II MADzymes in cluster 141.
  • FIG. 7 is a reproduction of a gel showing nicked plasmid formation with different MADzyme nickases compared to corresponding MADzyme nucleases.
  • It should be understood that the drawings are not necessarily to scale.
  • DETAILED DESCRIPTION
  • The description set forth below in connection with the appended drawings is intended to be a description of various, illustrative embodiments of the disclosed subject matter. Specific features and functionalities are described in connection with each illustrative embodiment; however, it will be apparent to those skilled in the art that the disclosed embodiments may be practiced without each of those specific features and functionalities. Moreover, all of the functionalities described in connection with one embodiment are intended to be applicable to the additional embodiments described herein except where expressly stated or where the feature or function is incompatible with the additional embodiments. For example, where a given feature or function is expressly described in connection with one embodiment but not expressly mentioned in connection with an alternative embodiment, it should be understood that the feature or function may be deployed, utilized, or implemented in connection with the alternative embodiment unless the feature or function is incompatible with the alternative embodiment.
  • The practice of the techniques described herein may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, biological emulsion generation, and sequencing technology, which are within the skill of those who practice in the art. Such conventional techniques include polymer array synthesis, hybridization and ligation of polynucleotides, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the examples herein. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Green, et al., Eds. (1999), Genome Analysis: A Laboratory Manual Series (Vols. I-IV); Weiner, Gabriel, Stephens, Eds. (2007), Genetic Variation: A Laboratory Manual; Dieffenbach, Dveksler, Eds. (2003), PCR Primer: A Laboratory Manual; Bowtell and Sambrook (2003), DNA Microarrays: A Molecular Cloning Manual; Mount (2004), Bioinformatics: Sequence and Genome Analysis; Sambrook and Russell (2006), Condensed Protocols from Molecular Cloning: A Laboratory Manual; and Sambrook and Russell (2002), Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press); Stryer, L. (1995) Biochemistry (4th Ed.) W.H. Freeman, New York N.Y.; Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London; Nelson and Cox (2000), Lehninger, Principles of Biochemistry 3rd Ed., W. H. Freeman Pub., New York, N.Y.; Berg et al. (2002) Biochemistry, 5th Ed., W.H. Freeman Pub., New York, N.Y.; Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, eds., John Wiley & Sons 1998), all of which are herein incorporated in their entirety by reference for all purposes. 
Nuclease-specific techniques can be found in, e.g., Genome Editing and Engineering From TALENs and CRISPRs to Molecular Surgery, Appasani and Church, 2018; and CRISPR: Methods and Protocols, Lindgren and Charpentier, 2015; both of which are herein incorporated in their entirety by reference for all purposes. Basic methods for enzyme engineering may be found in, Enzyme Engineering Methods and Protocols, Samuelson, ed., 2013; Protein Engineering, Kaumaya, ed., (2012); and Kaur and Sharma, “Directed Evolution: An Approach to Engineer Enzymes”, Crit. Rev. Biotechnology, 26:165-69 (2006).
  • Note that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “an oligonucleotide” refers to one or more oligonucleotides. Terms such as “first,” “second,” “third,” etc., merely identify one of a number of portions, components, steps, operations, functions, and/or points of reference as disclosed herein, and likewise do not necessarily limit embodiments of the present disclosure to any particular configuration or orientation.
  • Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All publications mentioned herein are incorporated by reference for the purpose of describing and disclosing devices, methods and cell populations that may be used in connection with the presently described invention.
  • Where a range of values is provided, it is understood that each intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
  • In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention.
  • The term “complementary” as used herein refers to Watson-Crick base pairing between nucleotides and specifically refers to nucleotides hydrogen bonded to one another with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds. In general, a nucleic acid includes a nucleotide sequence described as having a “percent complementarity” or “percent homology” to a specified second nucleotide sequence. For example, a nucleotide sequence may have 80%, 90%, or 100% complementarity to a specified second nucleotide sequence, indicating that 8 of 10, 9 of 10 or 10 of 10 nucleotides of a sequence are complementary to the specified second nucleotide sequence. For instance, the nucleotide sequence 3′-TCGA-5′ is 100% complementary to the nucleotide sequence 5′-AGCT-3′; and the nucleotide sequence 3′-TCGA-5′ is 100% complementary to a region of the nucleotide sequence 5′-TAGCTG-3′.
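  • The percent-complementarity arithmetic above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the disclosure: the function name `percent_complementarity` and the choice to report the best-matching window of the longer sequence are assumptions made for the example.

```python
# Watson-Crick partners; U is treated like T so RNA strands can be compared too.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(query_3to5: str, target_5to3: str) -> float:
    """Best percent complementarity of the query to any equal-length window of the target.

    The query is written 3'->5' and the target 5'->3', so position i of the
    query sits directly opposite position i of the window (antiparallel pairing).
    """
    q = query_3to5.upper()
    t = target_5to3.upper()
    if not q or len(q) > len(t):
        raise ValueError("query must be non-empty and no longer than the target")
    best = 0
    for start in range(len(t) - len(q) + 1):
        window = t[start:start + len(q)]
        paired = sum((a, b) in PAIRS for a, b in zip(q, window))
        best = max(best, paired)
    return 100.0 * best / len(q)


# The two examples from the definition: a full-length match and a match to a region.
full_match = percent_complementarity("TCGA", "AGCT")
region_match = percent_complementarity("TCGA", "TAGCTG")
```

  • Here both `full_match` and `region_match` evaluate to 100.0, and a 10-nucleotide query with two unpaired positions would give 80.0, matching the "8 of 10" wording.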
  • The term DNA “control sequences” refers collectively to promoter sequences, polyadenylation signals, transcription termination sequences, upstream regulatory domains, origins of replication, internal ribosome entry sites, nuclear localization sequences, enhancers, and the like, which collectively provide for the replication, transcription and translation of a coding sequence in a recipient cell. Not all of these types of control sequences need to be present so long as a selected coding sequence is capable of being replicated, transcribed and—for some components—translated in an appropriate host cell.
  • As used herein the term “donor DNA” or “donor nucleic acid” refers to nucleic acid that is designed to introduce a DNA sequence modification (insertion, deletion, substitution) into a locus by homologous recombination using nucleic acid-guided nucleases. For homology-directed repair, the donor DNA must have sufficient homology to the regions flanking the “cut site” or site to be edited in the genomic target sequence. The length of the homology arm(s) will depend on, e.g., the type and size of the modification being made. In many instances and preferably, the donor DNA will have two regions of sequence homology (e.g., two homology arms) to the genomic target locus. Preferably, an “insert” region or “DNA sequence modification” region—the nucleic acid modification that one desires to be introduced into a genome target locus in a cell—will be located between two regions of homology. The DNA sequence modification may change one or more bases of the target genomic DNA sequence at one specific site or multiple specific sites. A change may include changing 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence. A deletion or insertion may be a deletion or insertion of 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more base pairs of the target sequence.
  • The terms “guide nucleic acid” or “guide RNA” or “gRNA” refer to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a genomic target locus, and 2) a scaffold sequence capable of interacting or complexing with a nucleic acid-guided nuclease.
  • “Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or, more often in the context of the present disclosure, between two nucleic acid molecules. The term “homologous region” or “homology arm” refers to a region on the donor DNA with a certain degree of homology with the target genomic DNA sequence. Homology can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same base or amino acid, then the molecules are homologous at that position. A degree of homology between sequences is a function of the number of matching or homologous positions shared by the sequences.
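  • As a rough illustration of "a function of the number of matching or homologous positions", the sketch below counts identical columns in two sequences that have already been aligned. The function name `percent_identity`, the use of "-" as the gap character, and the choice of alignment length as the denominator are assumptions for the example; other conventions exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same base or amino acid.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    A column with a gap on one side counts as a mismatch; a column with gaps
    on both sides is ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGT", "ACGA")  # three of four positions match
```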
  • “Operably linked” refers to an arrangement of elements where the components so described are configured so as to perform their usual function. Thus, control sequences operably linked to a coding sequence are capable of effecting the transcription, and in some cases, the translation, of a coding sequence. The control sequences need not be contiguous with the coding sequence so long as they function to direct the expression of the coding sequence. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the coding sequence and the promoter sequence can still be considered “operably linked” to the coding sequence. In fact, such sequences need not reside on the same contiguous DNA molecule (i.e. chromosome) and may still have interactions resulting in altered regulation.
  • A “promoter” or “promoter sequence” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a polynucleotide or polypeptide coding sequence such as messenger RNA, ribosomal RNA, small nuclear or nucleolar RNA, guide RNA, or any kind of RNA transcribed by any class of RNA polymerase (I, II or III). Promoters may be constitutive or inducible and, in some embodiments (particularly many embodiments in which selection is employed), the transcription of at least one component of the nucleic acid-guided nuclease editing system is under the control of an inducible promoter.
  • As used herein the term “selectable marker” refers to a gene introduced into a cell, which confers a trait suitable for artificial selection. General use selectable markers are well-known to those of ordinary skill in the art. Drug selectable markers such as ampicillin/carbenicillin, kanamycin, chloramphenicol, erythromycin, tetracycline, gentamicin, bleomycin, streptomycin, rhamnose, puromycin, hygromycin, blasticidin, and G418 may be employed. In other embodiments, selectable markers include, but are not limited to human nerve growth factor receptor (detected with a MAb, such as described in U.S. Pat. No. 6,365,373); truncated human growth factor receptor (detected with MAb); mutant human dihydrofolate reductase (DHFR; fluorescent MTX substrate available); secreted alkaline phosphatase (SEAP; fluorescent substrate available); human thymidylate synthase (TS; confers resistance to anti-cancer agent fluorodeoxyuridine); human glutathione S-transferase alpha (GSTA1; conjugates glutathione to the stem cell selective alkylator busulfan; chemoprotective selectable marker in CD34+ cells); CD24 cell surface antigen in hematopoietic stem cells; human CAD gene to confer resistance to N-phosphonacetyl-L-aspartate (PALA); human multi-drug resistance-1 (MDR-1; P-glycoprotein surface protein selectable by increased drug resistance or enriched by FACS); human CD25 (IL-2Rα; detectable by MAb-FITC); Methylguanine-DNA methyltransferase (MGMT; selectable by carmustine); and Cytidine deaminase (CD; selectable by Ara-C). “Selective medium” as used herein refers to cell growth medium to which has been added a chemical compound or biological moiety that selects for or against selectable markers.
  • The terms “target genomic DNA sequence”, “target sequence”, or “genomic target locus” refer to any locus in vitro or in vivo, or in a nucleic acid (e.g., genome) of a cell or population of cells, in which a change of at least one nucleotide is desired using a nucleic acid-guided nuclease editing system. The target sequence can be a genomic locus or extrachromosomal locus.
  • A “vector” is any of a variety of nucleic acids that comprise a desired sequence or sequences to be delivered to and/or expressed in a cell. Vectors are typically composed of DNA, although RNA vectors are also available. Vectors include, but are not limited to, plasmids, fosmids, phagemids, virus genomes, synthetic chromosomes, and the like. As used herein, the phrase “engine vector” comprises a coding sequence for a nuclease to be used in the nucleic acid-guided nuclease systems and methods of the present disclosure. The engine vector may also comprise, in a bacterial system, the λ Red recombineering system or an equivalent thereto. Engine vectors also typically comprise a selectable marker. As used herein the phrase “editing vector” comprises a donor nucleic acid, optionally including an alteration to the target sequence that prevents nuclease binding at a PAM or spacer in the target sequence after editing has taken place, and a coding sequence for a gRNA. The editing vector may also comprise a selectable marker and/or a barcode. In some embodiments, the engine vector and editing vector may be combined; that is, the contents of the engine vector may be found on the editing vector. Further, the engine and editing vectors comprise control sequences operably linked to, e.g., the nuclease coding sequence, recombineering system coding sequences (if present), donor nucleic acid, guide nucleic acid, and selectable marker(s).
  • Editing in Nucleic Acid-Guided Nuclease Genome Systems
  • RNA-guided nucleases (RGNs) have rapidly become the foundational tools for genome engineering of prokaryotes and eukaryotes. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems are adaptive immune systems that protect prokaryotes against mobile genetic elements (MGEs). RGNs are a major part of this defense system because they identify and destroy MGEs. RGNs can be repurposed for genome editing in various organisms by reprogramming the CRISPR RNA (crRNA) that guides the RGN to a specific target DNA. A number of different RGNs have been identified to date for various applications; however, there are various properties that make some RGNs more desirable than others for specific applications. RGNs can be used for creating specific double strand breaks (DSBs), for creating specific nicks in one strand of DNA, or for guiding another moiety to a specific DNA sequence.
  • The ability of an RGN to specifically target any genomic sequence is perhaps the most desirable feature of RGNs; however, RGNs can only access their desired target if the target DNA also contains a short motif called a PAM (protospacer adjacent motif) that is specific to each RGN. Type V RGNs such as MAD7, AsCas12a and LbCas12a tend to access DNA targets that contain YTTN/TTTN on the 5′ end, whereas type II RGNs, such as the MADzymes disclosed herein, target DNA sequences containing a specific short motif on the 3′ end. An example well known in the art for a type II RGN is SpCas9, which requires an NGG on the 3′ end of the target DNA. Type II RGNs, unlike type V RGNs, require a transactivating RNA (tracrRNA) in addition to a crRNA for optimal function. Compared to type V RGNs, the type II RGNs create a double-strand break closer to the PAM sequence, which is highly desirable for precise genome editing applications.
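  • The difference between a 3′ PAM (type II, e.g., NGG for SpCas9) and a 5′ PAM (type V, e.g., TTTN) can be made concrete with a small scanning sketch. It is illustrative only: the function name `find_targets`, the 20-nucleotide default spacer length, and the restriction to one strand and to the IUPAC codes A, C, G, T, N and Y are assumptions for the example.

```python
import re

# IUPAC codes needed for the PAMs named above (N = any base, Y = C or T).
# Other IUPAC codes are not handled in this sketch and raise a KeyError.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "Y": "[CT]"}


def find_targets(seq: str, pam: str, pam_side: str, spacer_len: int = 20):
    """Return (spacer_start, spacer, pam_found) for each PAM match on the given strand.

    pam_side="3prime": PAM lies immediately after the spacer (type II style).
    pam_side="5prime": PAM lies immediately before the spacer (type V style).
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    spacer_re = "[ACGT]{%d}" % spacer_len
    # A lookahead lets overlapping candidate sites all be reported.
    if pam_side == "3prime":
        pattern = "(?=(%s)(%s))" % (spacer_re, pam_re)
    else:
        pattern = "(?=(%s)(%s))" % (pam_re, spacer_re)
    hits = []
    for m in re.finditer(pattern, seq):
        if pam_side == "3prime":
            hits.append((m.start(), m.group(1), m.group(2)))
        else:
            hits.append((m.start() + len(pam), m.group(2), m.group(1)))
    return hits


spacer = "ACGT" * 5                                   # hypothetical 20-nt target
type_ii_hits = find_targets(spacer + "AGGTT", "NGG", "3prime")
type_v_hits = find_targets("TTTA" + spacer, "TTTN", "5prime")
```

  • A fuller tool would also scan the reverse complement and accept the remaining IUPAC codes; both are omitted here to keep the example short.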
  • A number of type II RGNs have been discovered so far; however, their use in widespread applications is limited by restrictive PAMs. For example, the PAM of SpCas9 occurs less frequently in AT-rich regions of the genome. New type II RGNs with new and less restrictive PAMs are beneficial for the field. Further, not all type II nucleases are active in multiple organisms. For example, a number of RGNs have been discussed in the scientific literature but only a few have been demonstrated to be active in vitro and fewer still are active in cells, particularly in mammalian cells. The present disclosure identifies multiple type II RGNs that have novel PAMs and are active in mammalian cells.
  • In performing nucleic acid-guided nuclease editing, the type II RGNs or MADzymes may be delivered to cells to be edited as a polypeptide; alternatively, a polynucleotide sequence encoding the MADzyme is transformed or transfected into the cells to be edited. The polynucleotide sequence encoding the MADzyme may be codon optimized for expression in particular cells, such as archaeal, prokaryotic or eukaryotic cells. Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells. Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammals including non-human primates. The choice of the MADzyme to be employed depends on many factors, such as what type of edit is to be made in the target sequence and whether an appropriate PAM is located close to the desired target sequence. The MADzyme may be encoded by a DNA sequence on a vector (e.g., the engine vector) and be under the control of a constitutive or inducible promoter. In some embodiments, the sequence encoding the nuclease is under the control of an inducible promoter, and the inducible promoter may be separate from but the same as an inducible promoter controlling transcription of the guide nucleic acid; that is, a separate inducible promoter may drive the transcription of the nuclease and guide nucleic acid sequences but the two inducible promoters may be the same type of inducible promoter (e.g., both are pL promoters). Alternatively, the inducible promoter controlling expression of the nuclease may be different from the inducible promoter controlling transcription of the guide nucleic acid; that is, e.g., the nuclease may be under the control of the pBAD inducible promoter, and the guide nucleic acid may be under the control of the pL inducible promoter.
  • In general, a guide nucleic acid (e.g., gRNA) complexes with a compatible nucleic acid-guided nuclease and can then hybridize with a target sequence, thereby directing the nuclease to the target sequence. With the type II MADzymes described herein, the nucleic acid-guided nuclease editing system uses two separate guide nucleic acid components that combine and function as a guide nucleic acid; that is, a CRISPR RNA (crRNA) and a transactivating CRISPR RNA (tracrRNA). The gRNA may be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct, or the coding sequence may reside within an editing cassette; in either case it is under the control of a constitutive promoter or, in some embodiments, an inducible promoter as described below.
  • A guide nucleic acid comprises a guide polynucleotide sequence having sufficient complementarity with a target sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and the corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 or 15-20 nucleotides long, or 15, 16, 17, 18, 19, or 20 nucleotides in length.
  • In the present methods and compositions, the components of the guide nucleic acid are provided as a sequence to be expressed from a plasmid or vector that comprises both the guide sequence and the scaffold sequence as a single transcript under the control of a promoter, and in some embodiments, an inducible promoter. In general, to generate an edit in a target sequence, the gRNA/nuclease complex binds to a target sequence as determined by the guide RNA, and the nuclease recognizes a protospacer adjacent motif (PAM) sequence adjacent to the target sequence. The target sequence can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro. For example, the target sequence can be a polynucleotide residing in the nucleus of a eukaryotic cell. A target sequence can be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide, an intron, a PAM, or “junk” DNA).
  • The guide nucleic acid may be part of an editing cassette that encodes the donor nucleic acid. Alternatively, the guide nucleic acid may not be part of the editing cassette and instead may be encoded on the engine or editing vector backbone. For example, a sequence coding for a guide nucleic acid can be assembled or inserted into a vector backbone first, followed by insertion of the donor nucleic acid in, e.g., the editing cassette. In other cases, the donor nucleic acid in, e.g., an editing cassette can be inserted or assembled into a vector backbone first, followed by insertion of the sequence coding for the guide nucleic acid. In yet other cases, the sequence encoding the guide nucleic acid and the donor nucleic acid (inserted, for example, in an editing cassette) are simultaneously but separately inserted or assembled into a vector. In yet other embodiments, the sequence encoding the guide nucleic acid and the sequence encoding the donor nucleic acid are both included in the editing cassette.
  • The target sequence is associated with a PAM, which is a short nucleotide sequence recognized by the gRNA/nuclease complex. The precise PAM sequence and length requirements for different nucleic acid-guided nucleases vary; however, PAMs typically are 2-7 base-pair sequences adjacent or in proximity to the target sequence and, depending on the nuclease, can be 5′ or 3′ to the target sequence. Engineering of the PAM-interacting domain of a nucleic acid-guided nuclease may allow for alteration of PAM specificity, improve fidelity, or decrease fidelity. In certain embodiments, the genome editing of a target sequence both introduces a desired DNA change to a target sequence, e.g., the genomic DNA of a cell, and removes, mutates, or renders inactive a protospacer adjacent motif (PAM) region in the target sequence. Rendering the PAM at the target sequence inactive precludes additional editing of the cell genome at that target sequence, e.g., upon subsequent exposure to a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid in later rounds of editing. Thus, cells having the desired target sequence edit and an altered PAM can be selected using a nucleic acid-guided nuclease complexed with a synthetic guide nucleic acid complementary to the target sequence. Cells that did not undergo the first editing event will be cut, producing a double-stranded DNA break, and thus will not continue to be viable. The cells containing the desired target sequence edit and PAM alteration will not be cut, as these edited cells no longer contain the necessary PAM site, and will continue to grow and propagate.
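  • The selection logic described above (cells with an inactivated PAM escape re-cutting, unedited cells do not) can be sketched as follows. The sequences, the names `is_cuttable` and `survivors`, and the NGG PAM used as the default are hypothetical and chosen only for illustration; the sketch checks one strand and ignores mismatch tolerance.

```python
import re


def is_cuttable(locus: str, protospacer: str, pam_regex: str = "[ACGT]GG") -> bool:
    """True if the protospacer is present and immediately followed (3') by a matching PAM."""
    pattern = re.escape(protospacer.upper()) + "(?:" + pam_regex + ")"
    return re.search(pattern, locus.upper()) is not None


def survivors(cells: dict, protospacer: str, pam_regex: str = "[ACGT]GG") -> list:
    """Names of cells whose locus can no longer be cut, i.e. those expected to keep growing."""
    return [
        name
        for name, locus in cells.items()
        if not is_cuttable(locus, protospacer, pam_regex)
    ]


PROTOSPACER = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt target
CELLS = {
    "unedited": "CC" + PROTOSPACER + "TGG" + "AA",  # intact NGG PAM, will be cut
    "edited": "CC" + PROTOSPACER + "TCC" + "AA",    # PAM altered by the donor (GG -> CC)
}

surviving = survivors(CELLS, PROTOSPACER)
```

  • In this toy population only the cell carrying the altered PAM is returned, mirroring the statement that edited cells "will not be cut".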
  • As mentioned previously, the range of target sequences that nucleic acid-guided nucleases can recognize is constrained by the need for a specific PAM to be located near the desired target sequence. As a result, it often can be difficult to target edits with the precision that is necessary for genome editing. It has been found that nucleases can recognize some PAMs very well (e.g., canonical PAMs), and other PAMs less well or poorly (e.g., non-canonical PAMs). Because the mined MAD nucleases disclosed herein may recognize different PAMs, the mined MAD nucleases increase the number of target sequences that can be targeted for editing; that is, mined MAD nucleases decrease the regions of “PAM deserts” in the genome. Thus, the mined MAD nucleases expand the scope of target sequences that may be edited by increasing the number (variety) of PAM sequences recognized. Moreover, cocktails of mined MAD nucleases may be delivered to cells such that target sequences adjacent to several different PAMs may be edited in a single editing run.
  • Another component of the nucleic acid-guided nuclease system is the donor nucleic acid. In some embodiments, the donor nucleic acid is on the same polynucleotide (e.g., editing vector or editing cassette) as the guide nucleic acid and may be (but not necessarily) under the control of the same promoter as the guide nucleic acid (e.g., a single promoter driving the transcription of both the guide nucleic acid and the donor nucleic acid). For cassettes of this type, see U.S. Pat. Nos. 10,240,167; 10,266,849; 9,982,278; 10,351,877; 10,364,442; 10,435,715; and 10,465,207. The donor nucleic acid is designed to serve as a template for homologous recombination with a target sequence nicked or cleaved by the nucleic acid-guided nuclease as a part of the gRNA/nuclease complex. A donor nucleic acid polynucleotide may be of any suitable length, such as about or more than about 20, 25, 50, 75, 100, 150, 200, 500, or 1000 nucleotides in length. In certain preferred aspects, the donor nucleic acid can be provided as an oligonucleotide of between 20-300 nucleotides, more preferably between 50-250 nucleotides. The donor nucleic acid comprises a region that is complementary to a portion of the target sequence (e.g., a homology arm). When optimally aligned, the donor nucleic acid overlaps with (is complementary to) the target sequence by, e.g., about 20, 25, 30, 35, 40, 50, 60, 70, 80, 90 or more nucleotides. In many embodiments, the donor nucleic acid comprises two homology arms (regions complementary to the target sequence) flanking the mutation or difference between the donor nucleic acid and the target template. The donor nucleic acid comprises at least one mutation or alteration compared to the target sequence, such as an insertion, deletion, modification, or any combination thereof compared to the target sequence.
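  • The layout of a donor with two homology arms flanking the intended change can be sketched as simple string assembly. This is an assumption-laden illustration: the function name `build_donor`, the 30-nucleotide default arm length, and the choice to copy the arms from the same strand as the target are made for the example and are not taken from the disclosure.

```python
def build_donor(target: str, edit_start: int, edit_end: int,
                replacement: str, arm_len: int = 30) -> str:
    """Return left homology arm + replacement + right homology arm.

    target[edit_start:edit_end] is the stretch being replaced: an empty slice
    gives a pure insertion, an empty replacement gives a pure deletion.
    The arms are copied from the target so they match the flanking sequence.
    """
    if not (0 <= edit_start <= edit_end <= len(target)):
        raise ValueError("edit coordinates out of range")
    if edit_start < arm_len or len(target) - edit_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = target[edit_start - arm_len:edit_start]
    right_arm = target[edit_end:edit_end + arm_len]
    return left_arm + replacement + right_arm


# Hypothetical locus: replace the central CCC with TT using 5-nt arms.
locus = "A" * 10 + "CCC" + "G" * 10
donor = build_donor(locus, 10, 13, "TT", arm_len=5)
```

  • With these toy inputs `donor` is `AAAAATTGGGGG`; realistic arm lengths depend on the type and size of the modification, as noted in the definition of donor DNA.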
  • Often the donor nucleic acid is provided as an editing cassette, which is inserted into a vector backbone where the vector backbone may comprise a promoter driving transcription of the gRNA and the coding sequence of the gRNA, or the vector backbone may comprise a promoter driving the transcription of the gRNA but not the gRNA itself. Moreover, there may be more than one, e.g., two, three, four, or more guide nucleic acid/donor nucleic acid cassettes inserted into an engine vector, where each guide nucleic acid is under the control of separate different promoters, separate like promoters, or where all guide nucleic acid/donor nucleic acid pairs are under the control of a single promoter. In some embodiments the promoter driving transcription of the gRNA and the donor nucleic acid (or driving more than one gRNA/donor nucleic acid pair) is an inducible promoter. Inducible editing is advantageous in that isolated cells can be grown for several to many cell doublings to establish colonies before editing is initiated, which increases the likelihood that cells with edits will survive, as the double-strand cuts caused by active editing are largely toxic to the cells. This toxicity results both in cell death in the edited colonies, as well as a lag in growth for the edited cells that do survive but must repair and recover following editing. However, once the edited cells have a chance to recover, the size of the colonies of the edited cells will eventually catch up to the size of the colonies of unedited cells. See, e.g., U.S. Pat. Nos. 10,533,152; 10,550,363; 10,532,324; 10,633,626; 10,633,627; 10,647,958; 10,760,043; 10,723,995; 10,801,008; and 10,851,339. Further, a guide nucleic acid may be efficacious in directing the edit of more than one donor nucleic acid in an editing cassette; e.g., if the desired edits are close to one another in a target sequence.
  • In addition to the donor nucleic acid, an editing cassette may comprise one or more primer sites. The primer sites can be used to amplify the editing cassette by using oligonucleotide primers; for example, if the primer sites flank one or more of the other components of the editing cassette.
  • In addition, the editing cassette may comprise a barcode. A barcode is a unique DNA sequence that corresponds to the donor DNA sequence such that the barcode can identify the edit made to the corresponding target sequence. The barcode typically comprises four or more nucleotides. In some embodiments, the editing cassettes comprise a collection of donor nucleic acids representing, e.g., gene-wide or genome-wide libraries of donor nucleic acids. The library of editing cassettes is cloned into vector backbones where, e.g., each different donor nucleic acid is associated with a different barcode.
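  • A one-to-one association between donors and barcodes can be sketched with a dictionary. The scheme below (enumerating barcodes in alphabetical order) and the name `assign_barcodes` are assumptions for illustration; practical barcode sets are usually also filtered for properties such as homopolymer runs or minimum distance between barcodes, which is not done here.

```python
from itertools import product


def assign_barcodes(donors, barcode_len: int = 4) -> dict:
    """Map each distinct donor sequence to its own barcode of barcode_len nucleotides."""
    unique = list(dict.fromkeys(donors))  # drop duplicates, keep first-seen order
    if len(unique) > 4 ** barcode_len:
        raise ValueError("barcode too short for this many donors")
    codes = ("".join(p) for p in product("ACGT", repeat=barcode_len))
    return dict(zip(unique, codes))


# Hypothetical three-cassette library; the repeated donor gets a single barcode.
library = assign_barcodes(["donorA", "donorB", "donorA", "donorC"])
# Reverse lookup: a sequenced barcode identifies the edit that was delivered.
edit_for_barcode = {code: donor for donor, code in library.items()}
```

  • A barcode of four nucleotides can distinguish at most 4^4 = 256 donors, which is why larger libraries need longer barcodes.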
  • Additionally, in some embodiments, an expression vector or cassette encoding components of the nucleic acid-guided nuclease system further encodes one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the nuclease comprises NLSs at or near the amino-terminus of the MADzyme, NLSs at or near the carboxy-terminus of the MADzyme, or a combination.
  • The engine and editing vectors comprise control sequences operably linked to the component sequences to be transcribed. As stated above, the promoters driving transcription of one or more components of the mined MAD nuclease editing system may be inducible, and an inducible system is likely employed if selection is to be performed. A number of gene regulation control systems have been developed for the controlled expression of genes in plant, microbe, and animal cells, including mammalian cells, including the pL promoter (induced by heat inactivation of the CI857 repressor), the pBAD promoter (induced by the addition of arabinose to the cell growth medium), and the rhamnose inducible promoter (induced by the addition of rhamnose to the cell growth medium). Other systems include the tetracycline-controlled transcriptional activation system (Tet-On/Tet-Off, Clontech, Inc. (Palo Alto, Calif.); Bujard and Gossen, PNAS, 89(12):5547-5551 (1992)), the Lac Switch Inducible system (Wyborski et al., Environ Mol Mutagen, 28(4):447-58 (1996); DuCoeur et al., Strategies 5(3):70-72 (1992); U.S. Pat. No. 4,833,080), the ecdysone-inducible gene expression system (No et al., PNAS, 93(8):3346-3351 (1996)), the cumate gene-switch system (Mullick et al., BMC Biotechnology, 6:43 (2006)), and the tamoxifen-inducible gene expression (Zhang et al., Nucleic Acids Research, 24:543-548 (1996)) as well as others.
  • Typically, performing genome editing in live cells entails transforming cells with the components necessary to perform nucleic acid-guided nuclease editing. For example, the cells may be transformed simultaneously with separate engine and editing vectors; the cells may already be expressing the mined MAD nuclease (e.g., the cells may have already been transformed with an engine vector or the coding sequence for the mined MAD nuclease may be stably integrated into the cellular genome) such that only the editing vector needs to be transformed into the cells; or the cells may be transformed with a single vector comprising all components required to perform nucleic acid-guided nuclease genome editing.
  • A variety of delivery systems can be used to introduce (e.g., transform or transfect) nucleic acid-guided nuclease editing system components into a host cell. These delivery systems include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires, and exosomes. Alternatively, molecular trojan horse liposomes may be used to deliver nucleic acid-guided nuclease components across the blood brain barrier. Of particular interest is the use of electroporation, particularly flow-through electroporation (either as a stand-alone instrument or as a module in an automated multi-module system) as described in, e.g., U.S. Pat. Nos. 10,435,713; 10,443,074; 10,323,258; and 10,415,058.
  • After the cells are transformed with the components necessary to perform nucleic acid-guided nuclease editing, the cells are cultured under conditions that promote editing. For example, if constitutive promoters are used to drive transcription of the mined MAD nucleases and/or gRNA, the transformed cells need only be cultured in a typical culture medium under typical conditions (e.g., temperature, CO2 atmosphere, etc.). Alternatively, if editing is inducible (by, e.g., activating inducible promoters that control transcription of one or more of the components needed for nucleic acid-guided nuclease editing, such as, e.g., transcription of the gRNA, donor DNA, nuclease, or, in the case of bacteria, a recombineering system), the cells are subjected to inducing conditions.
  • EXAMPLES
  • The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent or imply that the experiments below are all of or the only experiments performed. It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific aspects without departing from the spirit or scope of the invention as broadly described. The present aspects are, therefore, to be considered in all respects as illustrative and not restrictive.
  • Example 1 Exemplary Workflow Overview
  • The disclosed MADzyme Type II CRISPR enzymes were identified by the method depicted in FIG. 1. FIG. 1 shows an exemplary workflow for creating and for in vitro screening of MADzymes, including those in untapped clusters. In a first step, metagenome mining was performed to identify putative RGNs of interest based on, e.g., sequence (HMMER profile) and a search for CRISPR arrays. Once putative RGNs of interest were identified in silico, candidate pools were created and each MADzyme was identified by cluster, the tracrRNA was identified, and the sgRNA structure was predicted. Final candidates were identified, then the genes were synthesized. An in vitro depletion test was performed (see FIG. 2), where a synthetic target library was constructed in which to test target depletion for each of the candidate MADzymes. After target depletion, amplicons were produced for analysis for in vivo analysis. FIG. 2 depicts the in vitro depletion test in more detail.
  • Example 2 Metagenome Mining
  • The NCBI Metagenome database was used to search for novel, putative CRISPR nucleases using HMMER hidden Markov model searches. Hundreds of potential nucleases were identified. For each potential nuclease candidate, putative CRISPR arrays were identified and CRISPR repeats and anti-repeats were identified. Thirteen nucleases (FIG. 3) were chosen for in vitro validation and 11 active MADzymes were identified and assigned to clusters. There was less than 40% sequence identity between clusters. Cluster 59 shown in FIG. 4 presents two unique subclusters with distinct sgRNA architecture. Clusters 55-57 are shown in FIG. 5. These new MADzymes have diverse PAM preferences and distinct sgRNA structure. Cluster 141 (FIG. 6) is a distant cluster from 55, 56, 57 and 59 and shows diverse Cas protein structure and smaller-sized enzymes (e.g., approximately 200 amino acids shorter than the counterparts from the 55, 56, 57 and 59 clusters). Table 1 lists the identified MADzymes, including amino acid sequences, origin, and nucleic acid sequences of the CRISPR RNA and the trans-activating CRISPR RNA.
  • TABLE 1
    Columns: MAD name, Cluster, Contig_id, Organism (metagenome), Source, aa_seq, CRISPR repeat, tracrRNA
    MAD name: MAD2015
    Cluster: 59
    Contig_id: DPZI01000013.1
    Organism (metagenome): Vagococcus sp.
    Source:
    CRISPR repeat [SEQ ID NO. 2]: GTTTTAGAGCTATGCTGTTTTGAATGCTTCCAAAAC
    tracrRNA [SEQ ID NO. 3]: TGTTGGTAGCATTCAAAACAACATAGCAAGTTAAAATAAGGCTTTGTCCGTTCTCAACTTTTAGTGACGCTGTTTCGGCG
    aa_seq [SEQ ID NO. 1]:
    MGKNYTIGLDIGTNSVGWSVVTENQQLVKKRMKIRGDS
    EKKQVKKNFWGVRLFDEGETAEATRLKRTTRRRYTRRR
    NRVVDLQNIFKDEINQKDSNFFNRLNESFLVVEDKKQP
    KQMIFGTVEEEASYHESFPTIYHLRKELVDNKDQADIR
    LVYLAMAHMIKYRGHFLIEGQLSTENTSVEEKFHLFLK
    EYNSTFCKQEDGSLVNPVNEDINGEEILMGTLSRSKKA
    EQIMKSFEGEKSNGVFSQFLKMIVGNQGNFKKAFNLEE
    DAKIQFAKEEYDEDLTTLLSNIGDEYANVFSLAKETYE
    AIELSGILSTKDKETYAKLSSSMTERYEDHEKDLASLK
    SFFREHLPEKYAVMFKDVSKNGYAGYIENSNKISQEEF
    YKYTKKLIGQIEGADYFIKKMEQEAFLRKQRTYDNGVI
    PYQVHLSELTHIINNQKKYYPFLLEKEEEIKSILTFKI
    PYYIGPLAKGNSDFAWLIRNSNDKITPSNFNEVLDIEN
    SASQFIERMTNNDVYLPEEKVLPKNSMLYQKYIVFNEL
    TKVRYINDRGTECNFSGEEKLQIFERFFKDSSTKVKKV
    SLENYLNKEYMIESPTIKGIEDDFNASFRTYHDFIKLG
    VSREMLDDIDNEEMFEDIVKILTIFEDRQMIKKQLEKY
    KDVFDSDILKKMVRRHYTGWGRLSKKLLHEMKDDNSGK
    TILDYLIEDDRLPKHINRNFMQLINDSNLSFKEKIEKA
    QLTDGTEDIDSVVKNLIGSPAIKKGISQSLKIVEELVS
    IMGYQPTSIVVEMARENQTTSKGKRQSIQRYKRLEAAI
    NELGSDLLKVCPTDNHALKDDRLYLYYLQNGRDMYTGL
    ELDIHNLSQYDIDHIVPRSFITDNSIDNRVLVSSKKNR
    GKLDNVPSKEIVQKNKLLWMNLKKSKLMSEKKYANLIK
    GETGGLTEDDKAKFLNRQLVETRQITKNVAQILDQRFN
    TQKDEKGNIIREVKVITLKSALVSQFRQNFEFYKVREV
    NDFHHANDAYLNAVVANTLLKVYPKLTPDFVYGEYRKG
    NPFKNTKATAKKHYYSNIMENLCHETTIIDDETGEILW
    DKKCIGTIKQVLNYHQVNVVKKVETQTGRFSEETLVPR
    GSTKNPIALKSHLDPQKYGGFKSPTIAYTIVIEYKKGK
    KDILIKELLGISIMNRGAFEKNNKEYLEKLNYKEPRVL
    MVLPKYSLFELENGRRRLLASDKESQKGNQMAVPSYLN
    NLLYHTNKSLSKNAKSLEYVNEHRQQFEELLEEIIDFA
    NQFTLAEKNTLLIADLYESNKEADIELLASSFINLLRF
    NQMGAPAEFSFFEKPIPRKRYSSTFELLKGKVIHQSIT
    GLYETHQKV
    MAD2016 59 DGLK01000042.1 Entero- New MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNT GTTTT TCTTTTG
    coccus York EKKKIKKNFWGVRLFEEGHTAEDRRLKRTARRIISRRR AGAGT GGACTAT
    faecalis City NRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWH CATGT TCTAAAC
    MTA RHPIFAKLEDEVAYHETYPTIYHLRKKLADSSEQADLR TGTTT AACATAG
    subway LIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFMI AGAAT CAAGTTA
    IYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKS GGTAC AAATAAG
    EKVLQQFPQEKANGLFGQFLKLMVGNKADFKKVFGLEE CAAAA GTTTTAA
    EAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYD C CCGTAAT
    AVELSTILADSDKKSHAKLSSSMIVRFTEHQEDLKKFK [SEQ CAACTGT
    RFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQLKF ID NO. AAAGTGG
    YQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVI 5] CGCTGTT
    PHQIHLAELQAIIHRQAAYYPFLKENQEKIEQLVTFRI TCGGCGC
    PYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLD [SEQ ID
    QSATAFIERMTNFDTYLPSEKVLPKHSLLYEKFMVFNE NO. 6]
    LTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKVKKK
    DIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKC
    GLTRAELDHPDNAEKLEDIIKILTIFEDRQRIRTQLST
    FKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESG
    KTILGYLIKDDGVSKHYNRNFMQLINDSQLSFKNAIQK
    AQSSEHEETLSETVNELAGSPAIKKGIYQSLKIVDELV
    AIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKA
    MAEIGSNLLKEQPTTNEQLRDTRLFLYYMQNGKDMYTG
    DELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTEN
    RGKSDDVPSKEVVKDMKAYWEKLYAAGLISQRKFQRLT
    KGEQGGLTLEDKAHFIQRQLVETRQITKNVAGILDQRY
    NANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYH
    HGQDAYLNCVVATTLLKVYPNLAPEFVYGEYPKFQTFK
    ENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYL
    KTIKKELNYHQMNIVKKVEVQKGGFSKESIKPKGPSNK
    LIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKGKKPLIK
    QEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPK
    YTLYEFPEGRRRLLASAKEAQKGNQMVLPEHLLTLLYH
    AKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHT
    LAKSKVQQIVKLFEANQTADVKEIAASFIQLMQFNAMG
    APSTFKFFQKDIERARYTSIKEIFDATIIYQSTTGLYE
    TRRKVVD [SEQ ID NO. 4]
    MAD2017 59 DMKA01000006.1 Strepto- MKKPYSIGLDIGTNSVGWAVITDDYKVPAKKMKVLGNT GTTTT TGTTGGA
    coccus DKKYIKKNLLGALLFDSGETAEVTRLKRTARRRYTRRK AGAGC ACTATTC
    sp. NRLRYLQEIFAKEMTKVDESFFQRLEESFLTDDDKTFD TGTGC GAAACAA
    (firmi- SHPIFGNKAEEDAYHQKFPTIYHLRKYLADSQEKADLR TGTTT CACAGCG
    cutes) LVYLALAHMIKYRGHFLIEGELNAENTDVQKLFNVFVE CGAAT AGTTAAA
    TYDKIVDESHLSEIEVDASSILTEKVSKSRRLENLIKQ GGTTC ATAAGGC
    YPTEKKNTLFGNLIALALGLQPNFKTNFKLSEDAKLQF CAAAA TTTGTCC
    SKDTYEEDLEELLGKVGDDYADLFISAKNLYDAILLSG C GTACACA
    ILTVDDNSTKAPLSASMIKRYVEHHEDLEKLKEFIKIN [SEQ ACTTGTA
    KLKLYHDIFKDKTKNGYAGYIDNGVKQDEFYKYLKTIL ID NO. AAAGGGG
    TKIDDSDYFLDKIERDDFLRKQRTFDNGSIPHQIHLQE 8] CACCCGA
    MHSILRRQGEYYPFLKENQAKIEKILTFRIPYYVGPLA TTCGGGT
    RKDSRFAWANYHSDEPITPWNFDEVVDKEKSAEKFITR GCA
    MTLNDLYLPEEKVLPKHSHVYETFTVYNELTKIKYVNE [SEQ ID
    QGESFFFDANMKQEIFDHVFKENRKVTKAKLLSYLNNE NO. 9]
    FEEFRINDLIGLDKDSKSFNASLGTYHDLKKILDKSFL
    DDKTNEQIIEDIVLTLTLFEDRDMIHERLQKYSDFFTS
    QQLKKLERRHYTGWGRLSYKLINGIRNKENNKTILDFL
    IDDGHANRNFMQLINDESLSFKTIIQEAQVVGDVDDIE
    AVVHDLPGSPAIKKGILQSVKIVDELVKVMGDNPDNIV
    IEMARENQTTGYGRNKSNQRLKRLQDSLKEFGSDILSK
    KKPSYVDSKVENSHLQNDRLFLYYIQNGKDMYTGEELD
    IDRLSDYDIDHIIPQAFIKDNSIDNKVLTSSAKNRGKS
    DDVPSIEIVRNRRSYWYKLYKSGLISKRKFDNLTKAER
    GGLTEADKAGFIKRQLVETRQITKHVAQILDARFNTKR
    DENDKVIRDVKVITLKSNLVSQFRKEFKFYKVREINDY
    HHANDAYLNAVVGTALLKKYPKLTPEFVYGEYKKYDVR
    KLIAKSSDDYSEMGKATAKYFFYSNLMNFFKTEVKYAD
    GRVFERPDIETNADGEVVWNKQKDFDIVRKVLSYPQVN
    IVKKVEAQTGGFSKESILSKGDSDKLIPRKTKKVYWNT
    KKYGGFDSPTVAYSVLVVADIEKGKAKKLKTVKELVGI
    SIMERSFFEENPVSFLEKKGYHNVQEDKLIKLPKYSLF
    EFEGGRRRLLASATELQKGNEVMLPAHLVELLYHAHRI
    DSFNSTEHLKYVSEHKKEFEKVLSCVENFSNLYVDVEK
    NLSKVRAAAESMTNFSLEEISASFINLLTLTALGAPAD
    FNFLGEKIPRKRYTSTKECLSATLIHQSVTGLYETRID
    LSKLGEE [SEQ ID NO. 7]
    MAD2019 59 DOTL01000042.1 Strepto- MTKPYSIGLDIGTNSVGWAVITDDYKVPSKKMKVLGNT GTTTT GGTTTGA
    coccus SKKYIKKNLLGALLFDSGITAEGRRLKRTARRRYTRRR AGAGC AACCATT
    sp. NRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDS TGTGT CGAAACA
    (firmi- KYPIFGNLVEEKAYHDEFPTIYHLRKYLADSTKKADLR TGTTT ATACAGC
    cutes) LVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLD CGAAT AAAGTTA
    TYNAIFESDLSLENSKQLEEIVKDKISKLEKKDRILKL GGTTC AAATAAG
    FPGEKNSGIFSEFLKLIVGNQADFKKYFNLDEKASLHF CAAAA GCTAGTC
    SKESYDEDLETLLGYIGDDYSDVFLKAKKLYDAILLSG C CGTATAC
    ILTVTDNGTETPLSSAMIMRYKEHEEDLGLLKAYIRNI [SEQ AACGTGA
    SLKTYNEVFNDDTKNGYAGYIDGKTNQEDFYVYLKKLL ID NO. AAACACG
    AKFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQE 11] TGGCACC
    MRAILDKQAKFYPFLAKNKERIEKILTFRIPYYVGPLA GATTCGG
    RGNSDFAWSIRKRNEKITPWNFEDVIDKESSAEAFINR TGC
    MTSFDLYLPEEKVLPKHSLLYETFTVYNELTKVRFIAE [SEQ ID
    GMSDYQFLDSKQKKDIVRLYFKGKRKVKVTDKDIIEYL NO. 12]
    HAIDGYDGIELKGIEKQFNSSLSTYHDLLNIINDKEFL
    DDSSNEAIIEEIIHTLTIFEDREMIKQRLSKFENIFDK
    SVLKKLSRRHYTGWGKLSAKLINGIRDEKSGNTILDYL
    IDDGISNRNFMQLIHDDALSFKKKIQKAQIIGDKDKDN
    IKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGRKPES
    IVVEMARENQYTNQGKSNSQQRLKRLEESLEELGSKIL
    KENIPAKLSKIDNNSLQNDRLYLYYLQNGKDMYTGDDL
    DIDRLSNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGK
    SDDVPSLEVVKKRKTLWYQLLKSKLISQRKFDNLTKAE
    RGGLSPEDKAGFIQRQLVETRQITKHVARLLDEKFNNK
    KDENNRAVRTVKIITLKSTLVSQFRKDFELYKVREIND
    FHHAHDAYLNAVVASALLKKYPKLEPEFVYGDYPKYNS
    FRERKSATEKVYFYSNIMNIFKKSISLADGRVIERPLI
    EVNEETGESVWNKESDLATVRRVLSYPQVNVVKKVEVQ
    SGGFSKELVQPHGNSDKLIPRKTKKMIWDTKKYGGFDS
    PIVAYSVLVMAEREKGKSKKLKPVKELVRITIMEKESF
    KENTIDFLERRGLRNIQDENIILLPKFSLFELENGRRR
    LLASAKELQKGNEFILPNKLVKLLYHAKNIHNTLEPEH
    LEYVESHRADFGKILDVVSVFSEKYILAEAKLEKIKEI
    YRKNMNTEIHEMATAFINLLTFTSIGAPATFKFFGHNI
    ERKRYSSVAEILNATLIHQSVTGLYETRIDLGKLGED
    [SEQ ID NO. 10]
    MAD2020 55 DQFW01000027.1 Achole- human MKNNEETLKKLRLGLDIGTNSVGYALLDENNKLIKKNG GTTTG TGTAAAT
    plasmatales gut HTFWGVRMFDEAETAKDRGSYRKSRRRLLRRKERMEIL CTAGT AACATAA
    bacterium RSFFTKEICDIDPTFFERLDDSFYYKEDKKNKNTYNLF TATGT CGAGTGC
    TSEYTDKDFYLEYPTIYHLRKAMQEEDKKFDIRMVYLA TATTT AAATAAG
    IAHIIKYRGNFLYPGEEFSTSEYTSIKQFFLDFNDILD ATAGT CGTTTCG
    ELSNELEDNEDYSAEYFDKIENINDDFLEKLKVILMEI ATTAA CGAAAAT
    KGISNKKKELLDLFNVNKKSIYNELVIPFISGSAKVNI GCAAA TTACAGT
    SSLSVIKNSKYPKTEISLGSEELEGQVEEAISVAPEIK C GGCCCTG
    SVLEMIIKIKEISDFYFINKILSDSKTISESMVKMYDE [SEQ CTGTGGG
    HNEDLKKLKGFFKKYAEDQYNEIFKIRDEKLANYVAYV ID NO. GCCTTTT
    GFNKLRKNKVERFKHASREEFYGYLKQKLNNIKYAEAQ 14] TTATTTA
    EEIKYFIDKIDNNEFLLKQNSNQNGAFPMQLHLKELKT TCAAA
    ILNNQEKYYPFLSEGNDGYSIKEKIILTFKYKIPYYVG [SEQ ID
    PLNKESKYSWVVREDEKIYPWNFDKVVKLDETAEKFIL NO. 15]
    RMQNKCTYLKGDNDYCLPKNSLIFSEYSCLSYLNKLSI
    NGKPIDPIMKSKIFNEVFLIKKQPTKKDIIEFIKTNYN
    ADALTTTEKELPEATCNMASYIKMKEIFGKDFNDNKEM
    IENIIKDITIFEDKSILGNRLKELYKLNNDRIKQIKGL
    NYKGYSRLSKNLLVGLQIVDNQTGEIKGNVIEVMRKTN
    LNLQEILYLDGYRLIDAIDEYNRKNSLNDSYLCARDYI
    AENLVISPSFKRALIQTCSIIQEIERIFHKKIDEFYVE
    VTRTNKDKNKGKTTSSRYDKIKKIYSSCQELAMAYNFD
    MKRLKNELESNKDNLKSDILYFYFTQLGKCMYSLEDID
    ISDLTNNYHYDIDHIYPQSIIKDDSLSNRVLVDKKKNA
    AKTDKFLFEAKVLNPKAQQFYKKLLSLELISKEKYRRL
    TQKEISKDELEGFVNRQLVSTNQSVMGLIKLLKEYYKV
    DEKNIIYSKGENVSDFRHTFDLVKSRTANNFHHANDAY
    LNVVVGGILNKYYTSRRFYQFSDIARIENEGESLNPSR
    IFTKRDILKANGKVIWDKKEDIKRIEKDLYHRFDITET
    IRTYNPNKMYSKVTILPKGEGESAVPFQTTTPRVDVEK
    YGGITSNKFSRYVIIEAHGKKGLDTILEAIPKTACGDN
    NKIEKDIDNYIASLDEYQKYTSYKVVNYNIKANVVIQE
    GSFKYIITGKSGNQYVLQNVQDRFFSKKAMITIKNIDK
    YLNNKKLGIIMAKDNEKIIVSPARGKNNEEIFFEKTEL
    VNLLKEIKTMYSKDIYSFSAIQNIVNNIDCSIDYSIDD
    FIIICNNLLQILKTNERKNADLRLIHLSGNSGTLYLGK
    KLKSGMKFIWQSITGYYEEILYEVK [SEQ ID NO.
    13]
    MAD2021 57 DEED01000018.1 Lachno- MSEKYFVGLDMGTSSVGWAVTDEHYHLLRRKGKDLWGA GTTTG GATAATG
    spiraceae RLFDEAETAAGRRTNRVSRRRLARQRARIGWLKELFRP AGAGC TTTTACA
    bacterium YLEEKDAGFLQRLEESRFFLEDKTVKQPYALFSDKEFT CTTGT AGGCGAG
    DKDYYQKYPTIFHLRKELLESKAPHDVRLVFLAVLNMY AAAAC TTCAAAT
    AHRGHFLNPELQEGTLGDIHDLLSRLDAYIQDLFEDQG CGTAT AAGGATT
    WSILENVEEQQKVLAEKNISNTVRLEKILSAIGTSPKD ATCTC TATCCGA
    KEKKPLIEIYKLICGLKGSLSLAFSGVEMNETDAQMKF TCAAG AATCGCT
    SFSDSNLEENEPEIERILGERYFEMYSILKEIHAWGLL C TGCGTGC
    SEIMSDDSGKTYPYISYAKVDLYQKHHEQLRMLKKIIR [SEQ ATTGGCA
    TYAPDEYHRMFRSMEDNTYSAYVGSVNSKNKKQRRGAK ID NO. CCATCTA
    STDFFKEVKRIIEKIEKEHGELPECEEILDLIARDSFL 17] TCTTTTA
    PKQLTTANGVIPNQVYATELRQIVTNAAAYLPFLNDKD
    DTGLTNAEKIVEMFKFHIPYYIGPLKNDGNGTAWVVRK AGACTTT
    QQGTVYPWNIDEKVDMAKTRDQFILNLVRKCSYLNDET CTTTGAA
    VLPASSLLYEKFKVLNELNNLTINGQKISVELKQDIFR AGTCTT
    DLFRATGKRVTTRKLMGYLRRKAVIDADADETCLEGFD [SEQ ID
    KTQGGFVSTLSSYHKFMEIFSTDVLTDRQREIAEGAIY NO. 18]
    FATVYGEDKSFLKKVLRDKFSPAELSQAQIDRLSGIRF
    KDWSHLSREFLLLEEADHSTGEIMTIIDRLWNTNENLM
    QIIHSDEYTYKQAIEERTARLEKSLSEVSFEDIEDSYM
    SAPVRRMVWQTIRILQEIEEVMGSEPARVFVEMTRSEG
    EKGDKGRKDSRKKKLKELYKKCKDDDQGLLSDIEGRDE
    RDFRIRKLYLYYMQKGLCMYSGHPIDFGKLFDDSYYDI
    DHIYPRHYVKDDSIENNLVLVESKLNRDKKDTLLCPDI
    QERMHPVWEMLHRQGFMNDEKFKRLMRKEPFSEEEFAH
    FIERQLVETGQGTKEIARILNDVLGNKDENNKVIYVKA
    GNVSSFRNDNKKNPEFVKCRVINDHHHAKDAYLNIVVG
    NTYYTKFTLHPANFIRELRNKSHPTLEDQYNMDKLFAR
    RVERNGYTAWNPDTDFQTVKQVLRKNSVLISRRSFIEH
    GQIADLQLVSGRKISEVNGKGYLPIKASDIRLSGPSGT
    MKYGGYNKASGAYFFLVEHELKGKLVRTIEPVYVYMMA
    SIHGKEDLEKYCQEELGYIHPRICLKKIPMYSHIRING
    FDYYLTGRSNDRLFICNAVQLTLSSEWSAYIKALSKAV
    DEKWDAAYIEQQASRIQDSLKSEEVFISKERNDQLYKV
    LLQKHLEGFFNNRINSIGTIMKEGYDSFRALPVNEQAE
    TLMEILKISQLVNIGANLVSIGGKSRSGVATVSKKISD
    SKSFQLISDSVTGIFQRATDLLTI [SEQ ID NO.
    16]
    MAD2022 57 CACYWR010000004.1 uncultured Cattle MEKEYYLGLDMGTSSVGWAVTDKEYRLLRAKGKDMWGI GTTTG GAGAATT
    Lachno- rumen REFEEAQTAVERRTHRLSKRRRARQLVRIGLLKDYFHD AGAGT AACAAGA
    spiraceae EIMKIDPNFYIRLENSKYYLEDKDVRLASSNGIFDDKN CTTGT CGAGTGC
    bacterium YTDKDYYEQYKTIFHLRSELIHNSQKHDVRLVYLALLN TAATT AAATAAG
    MFKHRGHFLFEGDAYVQGNIGDIYKEFIQLLKNEYYED CTTAA GTTTATC
    ENVKLTDQIDYFKLKEILSNSEFSRTAKAEKINSLVHI AGGTG CGGAATC
    DKKNKLENTYIRLLCGLEIELKILFPEIDEKIKICFAK TAAAA GTCAATA
    GYDEKLVEITEILTDNQLQILENLKKIHDIAALDKIRK C TGACCTG
    GKEYLSDARVAEYEKHREDLALLKKIYREYMTKQDYDR [SEQ CATTGTG
    MFREGEDGSYSAYVNSYNTSKKQRRNMKHRKIDEFYGT ID NO. CAGAATC
    IRKDLKLLLKQGIQDDNIERILEEIDGNNDNKFMPKQL 20] TTTAAAA
    SFANGVIPNSLHKAEMKAILRNAETYLPFLLETDESGL TCATATG
    TVSERILQLFSFHIPYYIGPVSVNSEKNNGNGWVVRRE ATTTCAT
    DGEVLPWNIEQKIDYGETSKRFIEKMVRRCTYISGEQV ATGGTTT
    LPKNSFIYEKYCVLNEINNIKIDGERITVELKQNIYND TA
    LYLHGKRVTKKQLINYLNNRGMIEDENQVSGIDINLNN [SEQ ID
    YLGSYGKFLPIFEEKLKEDNYIKIAEDIIYLASIYGDS NO. 21]
    KKMLKSQIKSKYGDILDDKQIKRILGLKFKDWGRISRR
    FLELEGLDKETGEITTIIKAMWDYNLNFMEIIHSDAFD
    FKDKIEELHANSIKPLAEIEVEDLDDMYFSAPVKRMIW
    QTFKVIKEIEKVMGCPPKKVFIEMTRINDKKSKGKRTN
    SRKEKFLSLYKNIHDELVDWKQLIISSDESGKLNSKKM
    YLYLTQQGICMYTGRRINLEELFDDNKYDIDHIYPRHF
    VKDDNLENNLVLVEKQSNSRKSDTYPIDKSIRNNSQVY
    KHWKSLREGNFISKEKYDRLTGKNEFTDEQKAGFIARQ
    MVETSQGTKGVADIIKQALPQSRIIYSKASNVSEFRRK
    YDILKSRTVNEFHHANDAYLNIVVGNVYDTKFTSNPLN
    FIKKQYNVDRKANNYNLDKMFVYDVKRGNEIAWIGWNP
    KKSEDSSEMSKRGTIVTVKKMLSKNTPLMTRMSFVGHG
    GIAEDNLSSHFVAKNKGYMPNGKESDVTKYGGYKKAKT
    AYFFVVEHGQTNNRIRTIETLPIYRRREVEKYEDGLIK
    YCEQSLSLLNPIIIYKKIKIQSLMKINGYYAYISGKSN
    EVYTFRNGVNMCLSQEWINYVKKLENYIEKDRQDRMIT
    YEKNIELYEIILRKYSTTILNKRLSKMDKKLINAKDRF
    CILNVKEQSQVLINVFVLSRIGDNQTDLSKIGIGKQSG
    QITQNKKITGCKEFKLVNQSVTGLYENEIDLLTV
    [SEQ ID NO. 19]
    MAD2023 56 DCGJ01000048.1 Lachno- Feces MEKNNYLLGLDIGTDSVGYAVTNDKYDILKFHGEPAWG GTTTG AGACCCC
    clostridium of six- VTIFDEASLSTEKRSFRVSRRRLDRRQQRVLLVQELFA AGAGT TATGGAT
    sp. years SEVAKVDKDFFKRIQESNLYRSDAENQAGLFIGEDYCD AGTGT TTACATT
    old REYYGQYPTIHHLISDLMNGTSPHDVRLVYLACAWLVA AAATC GCGAGTT
    elephant HRGHFLSNIDKDNLSGLKDFSSVYEGLMQYFSDNGYER CATAG CAAATAA
    PWNANVDVKALGDALKKKQGVTAKTKELLALLLDSAKA GGGTC AAGTTTA
    EKLPREEFPFSQDGIIKLLAGGTYKLSELFGNEEYKDF TCAAA CTCAAAT
    GSVKLSMDDEKLGEIMSNIGEDYELIASLRIVSDWAVL C CGTTGGC
    VDVLGESATISEAKVGIYNQHKADLEVLKKIIRKYTGK [SEQ TTGACCA
    EGYKKVFRQVDSKENYVAYSQHESDGKAPKEKGIDIAT ID NO. ACCGCAC
    FSKFILNIVRLLDVEPEDKEVYEDMVARLELNSFLPKQ 23] AGCGTGT
    VNTDNRVIPYQLYWFELHKILENASIYLPMLTEKDSNG
    ISVMEKLESVFMFRIPYFVGPLNKHSKYAWLERKEGKI GCTTAAA
    YPWNFENMVDLDASEANFIKRMTNTCTYLPGQNVLPKD GATCTCT
    SLRYHRFMVLNEINNLRINNERISVELKQKIYSELFLN TCAGTGA
    VKKVTRKRLVDFLISNGELRKGEESSLTGIDVEIKANL GGTC
    APQIAFKKLMESGQLTEEDVESIIERASYAEDKARLAH [SEQ ID
    WLEAKYSKLSEIDRKYICGIKIKDFGRLSKMFLSELEG NO. 24]
    VDKTTGEMTTILGAMWNSQLNLMELINSELYSFREAIC
    AYQTDYYSTHSSSLEERMNEMYLSNAVKRPVYRTLDIV
    KDVKKAFGEPKKIFVEMTRGASEEQKGKRTKSRKEQIL
    ELYKQCKDEDVRILQQQLEEMGDLADNKLQGDKLFLYY
    MQKGKCMYTGTPIVLEQLGSKAYDIDHIYPQAYVKDDS
    ILNNRVLVLSEANGKKKDIYPIEKETRDKMHGFWTYLN
    DKGMITEEKYKRLTRTTGFTEEEKWSFINRQLTETSQA
    TKAVATLLGELFPNAEIVYSKARLTSEFRQEFNLLKCR
    SYNDLHHAVDAYLNIVCGNVYNMKFTKRWFNINKDYSI
    KTKTVFTHPVVCGGQVVWDGQEMLNKVIRNAKKNTAHF
    TKYAYIRKGGFFDQMPVKAAEGLTPLKKDMPTAVYGGY
    NKPSVAFLIPTRYKAGKKTEIIILSVEHLFGERFLRDE
    AYAKEYAAERLKKILGKQVDEVSFPMGMRPWKINTVLS
    LDGFLICISGIGSGGKCLRAQSIMQFSSDYRWTIYLKR
    LERLVEKITVNAKYVYSEEFDKVSTIENIELYDLYIEK
    YKATIFSKRVNSPEEIIESGRDKFVKLDVLSQARALLC
    IHQTFGRIVGGCDLGLIGGKKNSAATGNFSSTISNWAK
    YYKDVRIIDQSTSGLWVRKSENLLELV
    [SEQ ID NO. 22]
    MAD2024 56 CADAKQ010000027.1 uncultured Cattle MNFDGEYFLGLDIGTDSVGYAVTDQRYNLVKFKGEPMW GTTTG GAGCCCT
    Lachno- rumen GSHLFDAANQCAERRGFRTARRRLDRRQQRVKLVDEIF AGAGT CTGGATT
    spiraceae APEVAKVDPNFYIRKMESALYPEDKSNKGDLYLYFNKQ AGTGT TACACTA
    bacterium EYDEKHYYKDYPTIHHLICALMNDEKTKFDIRLINIAI AAATC CGAGTTC
    DWLVAHRGHFLSEVGTDSVDKVLDFRKIYDEFMALFSD CAGAG AAATAAA
    EDDAVSSKPWENINPDELGKVLKIHGKNAKRNELKKLL GGCTC AATTATT
    YGGKIPTDEDSFIDRKLLIDFIAGTSVQCNKLFRNSEY CAAAA TCAAATC
    EDDLKITISNSDEREVVLPQLEDFHADIIAKLSSMYDW C GCCGCTA
    SVLSDILSGSTYISESKVKVYEQHKKDLKELKEFVRKY [SEQ TGTCGGC
    APEKYNDIFRLASKETYNYTAYSYNLKSVKDEKDLPKG ID NO. CGCACAG
    KASKEDFYSYLKKTLKLDKAENYNFVNDADTRFFDDMV 26] TGTGTGC
    ERISSGTFLPKQVNSDNRVIPYQVYYIELKKILENAKK ATTAAGA
    HYAFFEEKDEDGYSNVEKIMSVFTFRIPYYVGPLRNDD AAAGTCC
    KSPYAWIRRKADGKIYPWDFEEKVDLDASENAFIDRMT GAAAGGG
    NSCTYIPGADVLPKWSLLYTKYMVLNEINNIKVNNIGI C
    SVEAKQGIYNELFCKKAKVSLKAIREYLISNGFMQKDD [SEQ ID
    EMSGIDITVKSSLKSRYDFRHLLEKNELTTDDVEAIIS NO. 27]
    RSTYAEDKARFKKWLKKEFPQLSDEDYKYVSKLKYKDF
    GRLSRSLLNGLEGASKETGEIGTIMHFLWETNDNLMQL
    LSDRYTFMEEINKKRQDYYIEHKLTLNEQMEELGISNA
    VKRPVTRTLAVVKDVVSAIGYAPQKIFVEMARQEDEKK
    KRSVTRKEQILELYKNVEEDTKELERQLKKMGDTANNE
    LQSDALFLYYLQLGKCMYSGKPIDLTQIKTTKKYDIDH
    IWPQSMVKDDSLLNNRVLVLSEINGDKKDVYPIDESIR
    SKMHSYWKMLLDKNLITKEKYSRLTRPTPFTESEKLGF
    INRQLVETRQSMKAVTQLLNNMYPDSEIVYVKAKLAAD
    FKQDFKLAPKSRIINDLHHAKDTYLNVVAGNVYNERFT
    KKWFNVNEKYSMKTKVLFGHDVKIGDRLIWDSKKDLQT
    VKNTYEKNNIHLTRYAYCQKGGLFDQMPVKKGQGQIQL
    KKGMDIDRYGGYNKATASFFIIARYLRGGKKEVSFVPV
    ELMVSEKFLNDDNFAIEYITNVLTGMNTKKIENVELPL
    GKRVIKIKTVLLLDGYKVWVNGKASGGTRVMLTSAESL
    RMPKEYVEYLKKMENYSEKKKSNRNFMHDSENDGLSEE
    KNILLYDKLLEKLDENHFKKMPGNQCETMKSGRVKFIE
    LDFDVQISTLLNCIDLLKSGRTGGCDLKNIGGKSASGV
    VYISANLSACKYNDVHIIDISPAGLHENISCNLMELFE
    [SEQ ID NO. 25]
    MAD2025 56 DOQG01000053.1 Rumino- human MSFKENSKFYFGLDIGTDSVGWAVTDNLYKLYKYKNNL GTTTG TTTTACT
    coccaceae gut MWGVSLFEAASPAEDRRNHRTARRRLDRRQQRVALLRE AGAGT ACCCTAT
    bacterium LFAKEILKTDPDFFLRLKESSLYPEDRTNKNVNTYFDD AGTGT AAATTTA
    ADFKDSDYFKMYPTVHHLIKELSESDKPHDVRLVYLAC AAATT CACTACG
    AFIVAHRGHFLNGADENNVQEVLDFNSSYCEFTDWFKS TATAG AGTTCAA
    NDIEDNPFSESTENEFSVILRKKIGITAKEKEIKNLLF GGTAG ATAAAAA
    GTTKTPDCYKDEEYPIDIDVLIKFISGGKTNLAKLFRN TAAAA TTATTTC
    PAYDELDIQTVEVGKADFADTIDLLASSMEDTDVPLLS C AAATCGT
    AVKAMYDWSLLIDVLKGQKTISDAKVCEYEQHKSDLKA [SEQ ACTTTTT
    LKHIVRKYLDKAQYDEIFRTAGEKPNYVSYSYNVTDVK ID NO. AGTACCT
    LKQLPSNFKKKYSEEFCKYINSKLEKIKPEPDDEAVYN 29] TCACAAG
    ELIEKCNSKTLCPKQVTDENRVIPYQLYYHELSMILDK TGTTGTG
    ASAYLDFLNETEDGISVKQKILTLMKFRIPYFVGPSVK AATATTA
    RNETDNVWIVRKAEGRIYPWNFENMVDYDKSEDGFIRR ACTCACC
    MTCKCTYLAGEDVLPKYSLLYSRYTVLNEINNIKVKDV TTCGGGT
    KISPELKQDIFNELFMKTSRVTVKKITELLKRKGAFSE GAG
    ENGDSLSGVDINIKSSLKSYLDFRRLLENGSLSESDVE [SEQ ID
    RIIERITVTTDKPRLISWLKTEYPALPAEDIRYISRLS NO. 30]
    YKDYGRLSAKMLTGCYELDMDTGEIGGRSIIDLMWAEN
    INLMQIMSDSYGYKSFIEEENKKYYAINPTGSIAQTLR
    EMYVSPSASRAIIRTMDIVKELRKIIKRDPDKIFVEMA
    RGSKPEDKGKRTSSRREQIEKLFASAKEFVSDEEISHL
    RSQLGSLSDEQLRSEKYYLYFTQFGKCVYSGEAIDFSR
    LGDNHCYDIDHIYPQSKVKDDSLHNKVLVKSQLNGEKS
    DDYPIKEQIRNKMHPIWKNLFYRDPKNPTDKIKYERLT
    RSTPFTEDELAGFIERQLVETRQSTKAVATLLKEMFPD
    SKIVYVKAGQVSKFRHDFDMLKCREINDLHHAKDAYLN
    VVVGNVHDVKFTSNPLNFVKNADKHYTIKIKETLKHKV
    ARNGETAWNPETDFDTVKRMMSKNSVRYVRYCYKRKGE
    LFKQQPKKAGNPDLAWLKKNLDPVKYGGYNSKSISCFS
    LIKCTGVGVVIIPVELLCEKRYFSDDSFASEYAYSVLK
    NALPAKNIAKISIDDISFPLKRRPIKINTLFEFDGYRV
    NIRSKDSYSVFRISSAMAAIYSKDTSDYIKAISSYIDK
    SDKGSKFKPGEAFDVLSNLKAYDEIAKKCISEPFCKIS
    KLAEAGKKMEEGRNKFAELSIIEQMKTLLLLVDVLKTG
    RVDKCNLKPVGGVDNFHTERMSAILKNTKYSDIRIIDQ
    SPTGLYENKSDNLLEL [SEQ ID NO. 28]
    MAD2026 65 CADBQN010000053.1 uncultured Cattle MEQKDYYIGLDIGTNSVGWAVVDEGYQLCRFKKYDMWG GTTTG GACTACC
    Firmicutes rumen VRLFDSAETAAERRMNRVNRRRNRRKKQRIDLLQGLFA AGAGT ATATGAG
    bacterium EEIAKIDRTFFVRLNESRLHPEDKSTAFRHPLFNDPNY AGTGT ATTACAC
    TDVDYYKEYPTIYHLRKELMDSAEPHDIRLVYLALHHI AATTT TACACGG
    LKNRGHFLIEGGFEDSKKFEPTFRQLLEVLTEELGLKM CATAT TTCAAAT
    DGADAALAESVLKDRGMKKTEKVKRLKNVFTLNTTDMD GGTAG AAAGAAT
    QESQKKQKAQIDAVCKFLAGSKGDFKKLVADEALNELK TCAAA GTTCGAA
    LDTFALGTSKAEDIGLEIEKSAPQYCVVFESVKSVFDW C ACCGCCC
    KIMTQILGDESTFSSAKVKEYEKHHENLIILRELIRKY [SEQ TTTGGGG
    CDKETYRHFFNNVNGGYSRYIGSLKKNGKKYYVAGCTQ ID NO. CCCGCTT
    EEFYKELKGLLKSIDQRVDPEDRPVYQRVLAETEDETF 32] GTTGCGG
    LPLLRSKANSAIPRQIHQKELDDILQNASVYLPFLNDV ATTTACA
    DEDGLSAAEKIRSIFTFRIPYYVGPLSLRHKDKGAHVW GACTTGA
    IKRKEEGYIYPWNYEKKIDREKSNEEFIRRLINQCTYL TATCAAG
    KDEKVLPKKSLLYSEFMVLNELNNLRIRGKRLSEEQVE TCTG
    LKQRIYRDLFMTKTRVTKKTLLNYLRKEDSDLTEEDLS [SEQ ID
    GFDNDFKASLSSCLELKNKVFGDRIEEDRVRKIAEDLI NO. 33]
    RWLTIYDDDKKMIKEVIRAEYPNEFTNEQLDVICRLKF
    SGWGNLSEAFLCGVEGADKDTGEVFTIIEALRNTNHNL
    MELLSGNYTFTEKIREHNAALSSEIKAKDYESLVRDLY
    VSPACKRGIWQTIRITEEIKKIMGHEPKKIFVEMTREH
    RDSGRTTSRKDQLLALYQKCEEDARDWVKEIEDREERD
    FSSIKLFLYYLQQGKCMYSGEAIDLDELMSKNSRWDRD
    HIYPQSKIKDDSLDNLVLVKKELNAVKDNGEIAPDIQK
    RMKGFWLSLLRQGFLSKKKFDRLTRTGPFTSEELAGFI
    SRQLVETSQMSKAVAELLNQLYEDSRVVYVKAGLVSQF
    RQKDLGVLKSRSVNDYHHAKDAYLNVVVGDMFDRKFTS
    DPARWFKKNKKVNYSINQVFRRDYEENGKLIWKGIDRG
    EDGKPLFRDGLIHGGTIDLVRAIAKRNTNIRYTEYTYC
    ETGQLYNLTLLPKTDTAITIPLKKELPAAKYGGFKGAG
    TSYFSLIEFDDKKGHHHKQIVGVPIYVANMLEHNENAF
    IEYLETVCSFRNITVLCEKIKKNALISVNGYPMRIRGE
    NEILNMLKNNLQLVLSQEGEETLRHIEKYFNKKPGFEP
    DKEHDGIDRDAMAALYDEMTEKLCTVYKKRPTNQGELL
    KNNRGLFLNLEKRSEMAKVLSETAKMFGTTAQTTADLS
    LIKGSKYAGKIVINKNTLGAAKLILIHQSVTGLFETRV
    EL [SEQ ID NO. 31]
    MAD2027 65 CACWRN010000001.1 uncultured Cattle MSKKFAGEYYLGLDIGTDSVGWAVTDNQYNVLKFNGKS GTTTG TTTACCA
    Succini- rumen MWGIRLFDAAQTAAERRMFRTARRRVERRRWRLELLQE AGAGT TCCAGTG
    clasticum LFQNEIEKKDPDFFQRMKDSALYPEDSKTGKPFALFCD AATGT AGTTTAC
    sp. KDLNDKLYYKQYPTIYHLRKALLTENSKFDIRLVYLAI AAATT ATTACAA
    HHILKHRGHFLFNGDFSNVTRFSFAFEQLQTCLCNELD CATAG GTTCAAA
    MDFECNNVQKLSEILKDTHMSKNDKVKASVALFENSGD GATGG TAAAAAT
    KKQLQAVIGLFCGAKKKLADVFLDETLNDTEMPSISIA TAAAA TTATTCA
    DKPYEELRPELESILAEKCCVIDYIKAVYDWAILADML C ACCCGTT
    DGGEYGNRTYISVARVRQYEKHHDDLKKLKKLVRRYCK [SEQ CTTCGGA
    SEYKSFFSVAGTDNYCAYIGDDIETDDRKSVKKCKQED ID NO. ACCTCCA
    FYKRIKGLLKKAIENGCPKDEVVEIIKDIDAQVFLPLQ 35] CCGTGTG
    VTKDNGVIPHQVHEMELKQILKNAEKYYPFLCKKDEEG GAACATT
    IVTSNKILQLFKFRIPYYVGPLNSRIGKNSWIVRRAEG AAGGTCT
    KIYPWNFEEKVDFDKSEEGFIRRMTNPCTYMAGADVLP GCTTTGC
    KYSLLYSEFMVLNELNNVRICGDKLSVEIKQTIIKDLF AGGCC
    QRTRRVTVRKLCDKLKAEGVISRNSNQKDIDIKGIDQD [SEQ ID
    LKSSMVSYVDFKNIFGKEIEKYSVQQMCERIIFLLTIH NO. 36]
    HDDKRRLQKRIRAEFTEAQITDDQLQKVLRLNYQGWGR
    FSAEFLKELKGVDTETGEVFSIINALRETDDNLMQLLS
    NRYTFAEELEKYNSNKRKKIEALTYDNIMEGIVASPAI
    KRSAWQAISIVMELSKIMGREPKRIFVEMARGPEEKKH
    TISRKNQLLELYKSVKDESRDWKTELETKTESDFRSIK
    LFLYYTQMGRCMYTGEPIDLDQLANTTIYDRDHIYPQS
    LTKDDSLNNLVLVKKVENANKGNGLISADIQKKMRGFW
    AELKKKGLISDEKFSRLTRTTPLSDDELAGFINRQLVE
    TRQSSKIVADLFHQLYPTTQVVYVKAKIVSDFRHETLD
    MVKVRSLNDLHHAKDAYLNIVTGNVYYEKFSGNPLTWL
    RKNPDRNYSLNQMFNYDIVKKTKEGTSYVWKKGKDGSI
    AVVRRTMERNDILYTRQATENKNGGLFDQNIVSSKNKP
    FIPVKKGLDVNKYGGYKGITPAYFALIEFTDKKGSRQR
    LLEAVPLYLRADIDNDSNVLRDFYKNVLGLENPVVILN
    RIKKNSLLKINGFLIHLRGTTGFSASQLKVQNAVEFSL
    PHHMEDYVKKLENYEKHIIAERGSTKNSQIKITEWDGI
    SKEKNLQLYDMFINKMENTIYKFRPANQVSNLKENREV
    FNSLAVEDQCSVLNQVLMLFVCKPVTANLSLIKGSKNA
    GNMALSKIISNMRSAYLIHQSVTGLFEQKIDLLKVSSQ
    KD [SEQ ID NO. 34]
    MAD2028 66 DHKP01000031.1 Bacillales gut MANKLFIGLDVGSDSVGWAATDENFHLYRLKGKTAWGA GTTTG GCATTGT
    bacterium meta- RIFSEASDAKGRRGFRVAGRRLARRKERIRLLNTLFDP AGAGC AAGACAA
    genome LLKEKDPTFLLRLENSAIQNDDPNKPAQAVTDCLLFAN AGTGT CACTGCT
    KQEEKGFYKRYPTIWHLRKALMDNEDCAFSDIRFLYLA TGTCT ACGTTCA
    IHHIIKYRGNFLRDGEIKIGQFDYSVFDKLNETLSVLF TATAT AATAAGC
    DLQSEDEDSQEGHFVGLPKSQYEAFITTANDRNLPKQT AGCTC ATATTGC
    KKTKLLSMFEKDEESKSFLEMFCTLCAGGEFSTKKLNK GAAAA TACAAGG
    KGEETFDDTKISFNASYDQNEPNYQEILGDAFDLVDIA C TTCTCCC
    KAVFDYCDLSDILNGNDNLSNAFVELYDSHKSQLSALK [SEQ TCGGAGA
    AICKQIDNQSNLKGDASVYVKLFNDPNDKSNYPAFTHN ID NO. ATGACCA
    KTLVDKRCDIHTFDKYVIDTVLPYEPLLMGQDATNWQM 38] TTAGGTC
    LKSLAEQDRLLQTIALRSTSVIPMQLHQKELKIILKNA ACTTAGA
    ISRNVKGIAEIEEKILKLFQYKIPYYCGPLTTKSAYSN TAGCCGG
    VVFKNNEYRPLKPWDYEEAIDWDETKKKFMEGLTNKCT TTCTTCT
    YLKDKNVLPKQSILYQDFDAWNKLNNLKVNGSKPSLKE GGCTA
    LKDLFSFVSQRPKTTMKDIQRHFKSDTNSKDKDVVVSG [SEQ ID
    WNPEDYICCSSRASFGKNGVFDLNNPDSSDPKDLSKCE NO. 39]
    RMIFLKTIYADSPKDADVAILKEFPDLTNDQKSLLKTI
    KCKEWSPLSKEFLELRYADKYGEIRESIINLLRSGEGN
    LMQILAKYDYQERIDAYNADSFQTKSKSQIVSDLIEEM
    PPKMRRPVIQAVRIVHEVVKVAKKEPDQISIEVTRENN
    NKEKKQQLTKKAKSRSAQIQTFLKNLVKIDTFEEKRVD
    EVLEELKKYSDRSINGKHLYLYFLQNGKDAYTGKPINI
    DDVLSGNKYDTDHVIPQSKMKDDSIDNLVLVERSINQH
    RSNEYPLPESIRKNPANVAFWSKLKKAGMMSEKKFNNL
    TRANPLTEEELSAFVAAQINVVNRSNIVIRDVLKVLYP
    NAKLIFSKAQYPSQIRKELNIPKLRDLNDTHHAVDAYL
    NIVSGVSLTERYGNLSFIKAAQKNENQTDYSLNMERYI
    SSLIQTKEGEKTSLGKLIDQTSRRHDFLLTYRFSYQDS
    AFYNQTIYKKNAGLIPVHEKLPPERYGGYNSMSTEVNC
    VVTIKGKKERRYLVGVPHLLLEKGNKVADINKEIANSV
    PHKENETIAVSLKDIIQLDSMVKKDGLVYLCTTQNKDL
    VKLKPFGPIFLSRESEVYLSNLNKFVEKYPNIADGNEN
    YSLKTNRYGEKSIDFLQEKTGNVLKELVDLSNQKRFDY
    CPMICKLRTIDYRKGVEGKTLTEQLILIRSFVGVFTRK
    SEALSNGSNFRKARGLVLQDGLVLCSDSITGLYHTERK
    L [SEQ ID NO. 37]
    MAD2029 66 DBKT01000013.1 Bacillales gut MADKLFIGLDVGSESVGWAATDENFHLYRLKGKTAWGA GTTTG GCATTGT
    bacterium meta- RIFSEANDAKTRRGFRVAGRRLARRKERIRLLNTLFDP AGAGC AAGACAA
    genome LLKKDPAFLLRLENSAIQNDDPNKPIQAIADCPLLVNK AGTGT CACTGCT
    QEEKDYYKRYPTIWHLRKALMENDDHAFSDIRFLYLAI TGTCT ACGTTCA
    HHIIKYRGNFLREGDIKIGQFDYSIFDKLNETLAVLFD TATAT AATAAGC
    LQNEDGENEEGRFIGLPKSQYEAFITCANDRNLPKQPK AGCTC ATATTGC
    KAKLLSMFEKTEESKAFLEMFCTLCSGGEFSTKKLNAK GAAAA TACAAGG
    GEETYQDAKISFNSSYDENEGAYQEILGDFFDLVDIAK C TTCTCCA
    AVFDYCDLSDILNGNDNLSSAFVELYDSHKSQLSALKS [SEQ TTGGAGA
    ICKRIDNQNGFIGEKSIYVKLFNDPNDKSNYPAFTNNK ID NO. ATGACCA
    TLVDKRCDIHTFDKYVKETILPYESSLTGRDAVNWQML 41] TTAGGTC
    KSLAEQDRLLQTIALRSTSVIPMQLHQKELKIILKNAV GCTTAGA
    SRNIKGVAEIEEKILKLFQYKIPYYCGPLTTKSDYSNV TAGCCAG
    VFKNNEYRPLKPWDYEEAIDWDGTKQKFMEGLTNKCTY TTCTTCT
    LKDKNVLPKQSVLYQDFDTWNKLNNLKVNGNKPSLEDL GGCTA
    NDLFSFVSQRSKTTMRDIQRYLKSKTNSKENDVVVSGW [SEQ ID
    NSEDYICCSSRASFNKNGIFNLNNSEVLKECERIIFLK NO. 42]
    TIYTDSPKDADAAVLKEFPDLTNNQKTLLKTIKCKEWS
    PLSKEFLELRYSDKYGEIRQSIIDLLRNGEGNLMQILA
    KYDYQEVIDACNAASFQTKSKSQIVSDLIEEMPPKMRR
    PVIQAVRIVQEVAKVAKKEPDEISIEVTRENNDKEKKQ
    QLTKKAKSRSTQIQNFLKNLVKIDASEKKQANEVLEEL
    KKYSDQSINGKHLYLYFLQNGKDAYTGKPINIDDVLSG
    NKYDTDHIIPQSKMKDDSIDNLVLVEREINQHRSNEYP
    LPESIRKNPANVAFWRKLKKAGMMSEKKFNNLTRSNPL
    TEEELGAFVAAQINVVNRSNVVIRDVLKILYPNAKLIF
    SKAQYPSQIRKELNIPKLRDLNDTHHAVDAYLNIVSGV
    TLTDRYGNMRFIKASQDEEKHSLNMERYISSLIQTKEG
    QRTELGELIDQTSRRHDFLLTYRFSYQDSAFYKQTIYK
    KNAGLIPAHDNLPPERYGGYDSMSTEVNCVATIIGKKT
    TRYLVGVPHLLIKKAKDGIDVNDELIKLVPHKENEVVK
    VDLNTTLQLDCTVKKDGFMYLCTSNNIALVKLKPFSPI
    FLSRESEIYLSNLMKYVEKYPNISDENSEYEFKINREN
    VDPIKFTEKQSIEVVQDLIIKAKQDRFSYCSMISKLRD
    INAEEMIHSKSLTEQLKIIKSLIGVFTRKSEILSDKNN
    FRKSRGAILQEDLFLCSDSITGLYHTERKL [SEQ ID
    NO. 40]
    MAD2030 66 DBLD01000015.1 Bacillales gut MEQNTKKLFIGLDVGTDSVGWAATDEYFNLYRLKGKTA GTTTG GCATTGT
    bacterium meta- WGARLFLDAANAKDRRQHRVSGRRLARRKERIRLLNAL AGAGC AAGACAA
    genome FDPLLKKVDPTFLLRLESSTLQNDDPNKDQRAVSDALL AGTGT CACTGCA
    FGNKKHEKAYYAAFPTIWHLRKALIENDDKAFSDIRYL TGTCT CGTTCAA
    YLAIHHIIKYRGNFLRQGEIKIGEFDFSCFDKLNQFFD TAAAT ATAAGCA
    IYFSKEDEEEVEFIGLPNENYQRFIDCAADKNLGKGKK AGCTC GATTGCT
    KGDLLKLMSFSEDEKPFCEMFCSLCAGLAFSTKKLNKK GAAAA ACAAGGT
    DETVFEDIKVEFNGKFDDKQEEIKSVLGDAYDLVELAK C TCCCGTA
    FIFDYCDLKDILGASTNRLSEAFAGIYDSHKEELKALK [SEQ AGGGAAT
    GICREIDRSLGNESKNSLYREVFNDKGIPNNYAAFIHH ID NO. GACCATC
    ETNSSRCGIADFNNYVLQKIEPLENLLSKQNYKNWIQL 44] TGGTCAC
    KQLASQGRLLQTIAIRSTSIIPMQLHLKDLKLILANAE ATGAATA
    KRDIPGIKDIKEKILLLFQFKVPYYCGPLTDRSQYSNV GCCCCCG
    VLKAGTREKITPWNFADQVDLEETKKKFMEGLTNKCTY GCAACGG
    LKDCNVLPRQSLMFQEYDAWNKLNNLSINGNKPSPEEM TGGCTG
    NALFDFASKRRKTTMSDIKKFEKRATMSKENDVTVSGW [SEQ ID
    NENDFIDLSSFVSLSGFFDLGEIHSADYMACEEAILLK NO. 45]
    TIFTDAPQDADPIIAEKFPNLKPNQLAALKKMSCKGWA
    TLSREFLTLKAVDADGEVMNETLLGLMKEGKGNLMQLL
    HSSLYNFQDVIDSHNRAVFGDKSPKQIANDLIEEMPPQ
    MRRPVIQALRIVREVSKVAKKQPDVISIEVTRESNDKK
    KKEEWSKKATDRKKQIDLFLKNLKKTEDVKQTESELDG
    QAINDIDSIRGKHLYLYFLQNGKDAYTGLPIDINDVLN
    GTKYDTDHIIPQSLMKDDSIDNLVLVNREKNQHKSNEF
    PLPRDIQTKANIERWRALKKAGGMSEKKFNNLTRTTPL
    TEEELSAFVAAQINVVNRSNVVIRDVLKILYPNAKLIF
    SKAQYPSQIRRDLEIPKLRDLNDTHHAVDAFLNIVSGV
    ELTKQFGRMDVIKAAAKGDKDHSLNMTRYLERLLKKVD
    ENKNETMTELGNHVFVTSQRHDFLLTYRFDYQDSAFYN
    ATIYSPDKNLIPMHDGMDPERYGGYSSLNIEYNCIATI
    KGKKKTTRYLLGVPHLLALKFKNDGIDITSDLIKLVPH
    KGDEEVSIDWKNPIPLRITVKKDGVEYLLAPFNAQVME
    LKPVSPVFLPREAAEYLARLKKAVDQKKQFIYQNSAEI
    FQSKDKNNALQFGPEQSKNVALKIYALADAKKYDYCAM
    ISKLRDAALRAEMLDSLSSEALFKQYNDLISLLSQLTR
    RSKKISSKYFSKSRGALLQDGLKIVSKSITGLYETERN
    L [SEQ ID NO. 43]
    MAD2031 141 CACVOG010000001.1 uncultured Cattle MNYILGLDIGIASVGWAAVALDANDEPCKILDLNARIF ATTGT TTGTAAT
    Seleno- rumen EAAEQPKTGASLAAPRREARGSRRRTRRRRHRMERLRH ACCAT AACCTAT
    monadaceae LFAREELISAENIAALFEAPADVYRLRAEGLSRRLDEG AGCGA TTTACCT
    bacterium EWARVLYHIAKRRGFKSNRKGAASDADEGKVLEAVKEN GTTAA CGCTATG
    EALLKNYKTVGEMMFRDEKFQTAKRNKGGSYTFCVSRG ATTAG GCACAAT
    MLAEEIGELFAAQREQGNPHASETFETAYSKIFADQRS GGAAT TTGTTAT
    FDDGPDANSRSPYAGNQIEKMIGTCSLETDPPEKRAAK TACAA TACATGG
    ASYSFMRFSLLQKINHLRLKDAKGEERPLTDEERAAVE C ACATTAT
    ALAWKSPSLTYGAIRKALPLPDELRFTDLYYRWDKKPE [SEQ ACTAAAC
    EIEKKKLPFAAPYHEIRKALDKREKGRIQSLTPDALDA ID NO. ATTTCCT
    VGYAFTVFKNDAKIEAALSAAGIDGEDAVALMAAGLTF 47] AAAAAAG
    RGFGHISVKACRKLIPHLEKGMTYDKACKEAGYDLQKT CAACGAA
    GGEKTKLLSGNLDEIREIPNPVVRRAIAQTVKVVNAVI AAACGTG
    RRYGSPVAVNVELAREMGRTFQERRDMMKSMEDNNAEN CTGGCAG
    EKRKEELKGYGVVHPSGLDIVKLKLYKEQGGVCAYSLA CAA
    AMPIEKVLKDHDYAEVDHILPYSRSFDDSYANKVLVLS [SEQ ID
    KENRDKGNRTPMEYMANMPGRRHDFITWVKSAVRNPRK NO. 48]
    RDNLLLEKFGEDKEAAWKERHLTDTKYIGSFIANLLRD
    HLEFAPWLNGKKKQHVLAVNGAVTDYTRKRLGIRKIRE
    DGDLHHAVDAAVIATVTQGNIQKLTDYSKQIERAFVKN
    RDGRYVNPDTGEVLKKDEWIVQRSRHFPEPWPGFRHEL
    EARVSDHPKEMIESLRLPTYTPEEIDGLKPPFVSRMPT
    RKVRGAAHLETVVSPRLKDEGMIVKKVSLDALKLTKDK
    DAIENYYAPESDHLLYEALLHRLQAFGGDGEKAFAESF
    HKPKADGTPGPVVKKVKIAEKSTLSVPVHHGRGLAANG
    GMVRVDVFFIPEGKDRGYYLVPVYTSDVVRGELPMRAV
    VQGKSYAEWKLMREEDFIFSLYPNDLVYIEHEKGVKVK
    IQKKLREISTLPREKTMTSGLFYYRTMGIAVASIHIYA
    PDGVYVQESLGVKTLKEFKKWTIDILGGEPHPVQKEKR
    QDFASVKRDPHAAKSTSSG [SEQ ID NO. 46]
    MAD2032 141 CACVWE010000020.1 uncultured Cattle MKYIIGLDMGITSVGFATMMLDDKDEPCRIIRMGSRIF GTTGT ATTGTAT
    Rumino- rumen EAAEHPKDGSSLAAPRRINRGMRRRLRRKSHRKERIKD AGTTC CATACCA
    coccus LIIKNELMTADEISAIYSTGKQLSDIYQIRAEALDRKL CCTAA AGAACAA
    sp. NTEEFVRLLIHLSQRRGFKSNRKVDAKEKGSDAGKLLS TTATT TTAGGTT
    AVNSNKELMIEKNYRTIGEMLYKDEKFSEYKRNKADDY CTTGG ACTATGA
    SNTFARSEYEDEIRQIFSAQQEHGNPYATDELKESYLD TATGG TAAGGTA
    IYLSQRSFDEGPGGSSPYGGNQIEKMIGNCTLEPEEKR TATAA GTATACC
    AAKATFSFEYFNLLSKVNSIKIVSSSGKRALNNDERQS T GCAAAGC
    VIRLAFAKNAISYTSLRKELNMEYSERFNISYSQSDKS [SEQ TCTAACA
    IEEIEKKTKFTYLTAYHTFKKAYGSVFVEWSADKKNSL ID NO. CCTCATC
    AYALTAYKNDTKIIEYLTQKGFDAAETDIALTLPSFSK 50] TTCGGAT
    WGNLSEKALNNIIPYLEQGMLYHDACTAAGYNFKADDT GAGGTGT
    DKRMYLPAHEKEAPELDDITNPVVRRAISQTIKVINAL TATCT
    IREMGESPCFVNIELARELSKNKAERSKIEKGQKENQV [SEQ ID
    RNDRIMERLRNEFGLLSPTGQDLIKLKLWEEQDGICPY NO. 51]
    SLKPIKIEKLFDVGYTDIDHIIPYSLSFDDTYNNKVLV
    MSSENRQKGNRIPMQYLEGKRQDDFWLWVDNSNLSRRK
    KQNLTKETLSEDDLSGFKKRNLQDTQYLSRFMMNYLKK
    YLALAPNTTGRKNTIQAVNGAVTSYLRKRWGIQKVREN
    GDTHHAVDAVVISCVTAGMTKRVSEYAKYKETEFQNPQ
    TGEFFDVDIRTGEVINRFPLPYARFRNELLMRCSENPS
    RILHEMPLPTYAADEKVAPIFVSRMPKHKVKGSAHKET
    IRRAFEEDGKKYTVSKVPLTDLKLKNGEIENYYNPESD
    GLLYNALKEQUAFGGDAAKAFEQPFYKPKSDGSEGPLV
    KKVKLINKATLTVPVLNNTAVADNGSMVRVDVFFVEGE
    GYYLVPIYVADTVKKELPNKAIIANKPYEEWKEMREEN
    FVFSLYPNDLIKISSRKDMKFNLVNKESTLAPNCQSKE
    ALVYYKGSDISTAAVTAINHDNTYKLRGLGVKTLLKIE
    KYQVDVLGNVFKVGKEKRVRFK [SEQ ID NO. 49]
    MAD2033 141 DCJP01000021.1 un- Feces MKNTLYGIGLDIGVASVGWAVVGLNGTGEPVGLHRLGV GTTGT TTATACC
    cultivated of RIFDKAEQPKTGESLAAPRRMARGMRRRLRRKALRRAD AGTTC ATACCAA
    Faecali three- VYALLERSGLSTREALAQMFEAGGLEDIYALRTRALDE CCTAA GAACTGT
    bacterium weeks PVGKAEFSRILLHLAQRRGFKSNRRTASDGEDGRLLAA CAGTT TATGGTT
    sp. old VNENRRRMAQGGWRTVGEMLYRHEAFALRKRNKADEYL CTTGG GCTATGA
    elephant STVGRDMVAEEASLLFQRQRELGCAWATPELQAEYLSI TATGG TAAGGTC
    LLRQRSFDEGPGGNSPYGGNQVEKMVGRCTFEPDEPRA TATAA TTAGCAC
    AKAAYSFEYFSLLQKLNHIRLAENGETRPLTQPQRQQL T CGTAAAG
    LSLAHKTPDVSLARIRKELALPETVQFNGVRCRANETL [SEQ CTCTGAC
    EESEKKEKFACLPAYHKMRKALDGVVKGRISSLSISQR ID NO. GCCTCGC
    DAAATALSLYKNEDTLRAKLTEAGFQAPEIDALAGLTG 53] TTTCAGC
    FSKFGHLSLKACRKLIPHLEQGLTYDQACSAAGYDFKG GGGGCGT
    HGAGERAFTLPAAAPEMEQITSPVVRRAVAQTIKVVNG CATCTTT
    IIREMDASPAWVRIELARELSKTFGERQEMDRSMRENA TTTGCCC
    AQNERLMQELRDTFHLLSPTGQDLVKYRLWKEQDGVCA AAAAGAC
    YSLRRLDVERLFEPGYVDVDHIVPYSLSFDDRRSNKVL ACGGATA
    VLSSENRQKGNRLPLQYLQGKRREDFIVWTNSSVRDYR TTTTT
    KRQNLLREKFSGDEAEGFRQRNLQDTQHMARFLYNYIS [SEQ ID
    DHLAFAQSEALGKKRVFAVSGAVTSHLRKRWGLSKVRA NO. 54]
    DGDLHHALDAAVIACTTDGMIRRISGYYGHIEGEYLQD
    ADGAGSQHARTKERFPAPWPRFRDELIVRLSEQPGEHL
    LDINPAFYCEYGTEHICPVFVSRMPRRKVTGPGHKETI
    KGAAAADEGLLTVRKALTELKLDKDGEIKDYYMPSSDT
    LLYEALKAQLRRFGGDGKKAFAEPFYKPKADGTPGPLV
    RKVKTIEKATLTVPVHGGAASNDTMVRVDVFLVPGDGY
    YWVPVYVADTLKPELPNRAVVAFKPYSEWKEMREEDFI
    FSLYPNDLVYVEHKSGLKFTLQNADSTLEKTWVPKASF
    AYFVGGDISTAAISLRTHDNAYGLRGLGIKTLKVLKKY
    QVDVLGNISPVHRETRQRFR [SEQ ID NO. 52]
    MAD2034 141 CACXAV010000001.1 uncultured Cattle MAYGIGLDIGIASVGFATVALNEQDEPCGILRMGSRIF GTTGT TTATACC
    Clostri- rumen DAAEHPKNGASLAAPRREARSARRRLRRHRHRLERIRN AGTTC ATACCAA
    diales LLVESCLISQDGLGSLFEGRLEDIYALRTRALDERLTD CCTAA GAACTGT
    bacterium AELCRVLIHLAQRRGFRSNRKADAADKEAGKLLKAVSE CGGTT TGGGTTA
    NDRRMEENGYRTVGEMLYKDPLFAEHRRNKGEAYLSTV CTTGG CTACAAT
    TRTAVEQEARLVLSTQREKGNAAITEDFVEKYLDILLS TATGG AAGGTAG
    QRPFDVGPGGNSPYGGNMIEKMIGRCTFEPDELRAPKA TATAA TAAACCG
    SYSFEYFQLLQKVNHIRLLRDGRSEPLSEEQRRAIIDL T AAAAGCT
    ALASADVTFAKIRKALSLPDSVRFNDVYYRESAEEAEK [SEQ CTGACGT
    KKKLGCMDAYHEMRKALDKVAKGRICAIPVEQRNAIAY ID NO. CTTGTTT
    VLTVHKTDERILTELQNINLERSDIDQLMQMKGFSKFG 56] GCGCAGG
    HLSIKACDRIIPYLEQGMTYSDACTAAGYAFRGHEGGE ACGTCAT
    HSLYLPAQTPEMDEITSPVVRRAVSQTIKVVNALIREQ CTTTATA
    GESPTFVNIELAREMSKDFAERNDIRRENEKNAKANEA TCAGACG
    VMNELRRTFGLVNPSGQDLVKYKLFLEQGGVCPYTQRP GATG
    MEPGRLFEAGYADVDHIVPYSISFDDRYCNKVLTFASV [SEQ ID
    NRKEKGNRLPLQFLKGERRESFIVYVKANVRDYRKQRL NO. 57]
    LLKETVTEEDRKGFRDRNLQDTKHMAAFLHSYINDHLQ
    FAPFQTDRKRHVTAVNGAVTAYLRKRWGIRKVRAEGDL
    HHASDALVIACTTPGMIQRLSRYAELREAEYMQTEDGA
    VRFDPATGEVLEKFPYPWPCFRQEWTARVSDDPQAMLQ
    DMKLTDYRGLPLEQVKPVFVSRMPKHKVTGAAHKDTVK
    SAKALDRGVVLVKRALTDLKLKDGEIENYYDPASDRLL
    YEALKERLIAFGGDAQKAFAEPFHKPKRDGTPGPLVKK
    VKLMEKSSLTVPVHDGKGVADNDSMVRIDVFFVAGEGY
    YFVPIYVADTVKPELPNRAVVANKPYAEWKEMKDEDFL
    FSLYPSDLMRVTQKKGIKLSLINKESTLKKEEMAQSIL
    LYYVKGSISTGSITAENHDRTYAINSLGIKTLEKLEKY
    QVDVLGNVSPVGKEKRLTFC [SEQ ID NO. 55]
    MAD2035 141 CADATZ010000012.1 uncultured Cattle MLPYAIGLDIGIASVGWAVVGLDTNERPFCILGMGSRI GTTGT TTATACC
    Chloroflexi rumen FDKAEQPKTGASLALPRREARSLRRRLRRHRHRNERIR AGTCC ATTCCAG
    bacterium NLLLREKIISESELQDLFSGTLSDIYQLRVEALDRKLD CCTGA AAACTAT
    DKEFSRVLIHIAQRRGFKSNRKNAAASQEDGKLLSAVT TGGTT TATGGTC
    ENQQRMNDKGYRTVSEMLLRDDKFKDHKRNKGGEYLTT TCTGG ACTACAA
    VTRTMVEDEVHKIFSAQRTHGNLKADNQLESEYLEILL AATGG TAAGGTA
    SQRSFDEGPGGDSPYGGSQIEKMIGKCTFFPEEKRAAK TATAA TTAGACC
    ATYTFEYFNLLEKINHIRLVSKDNLPEPLSDFQRRSLI T GTAGAGC
    ELAYKVENLTYDRIRKELHISPELKFNTIRYESDDLPE [SEQ ACTAACA
    NEKKQKLNCLKAYHElRKALDKLGKGTINTLSKEQLNT ID NO. CCCCATT
    IGTVLSMYKTSEIIKNKMEQIPAEIVDKLDEEGINFSK 59] TGGGGTG
    FGHLSIKACELIIPGLEKGLNYNDACEEAGLNFKAHNN TTATCTC
    EEKSFLLHPTEDDYADITSPVVKRAASQTIKVINAIIR TTTAAAC
    KQGCSPTYINIEVARELSKDFYERDKINKRNEANRAEN TGTCCAA
    ERSLEQIRKEYGKSNASGLDLVKFKLYQKQDGVCAYSQ AATTTAG
    KQISFERLFEPNYVEVDHIIPYSKCFDDRESNKVLVFA TATTGCA
    KENREKGNRLPLEYLDGKKRESFIVWVNSKVKDYRKKQ ATTATTG
    NLLKESLSEEEEKQFKERNLQDTKTVSKFLMNYINDNL A
    IFSSSNKRKKHVTAVSGGVTSYMRKRWGISKVREDGDQ [SEQ ID
    HHAVDALVIVCTTDGMIQQVSKYVEYKECQYIQTDAGS NO. 60]
    LAVDPYTGEVLRSFPYPWARFHEDAVTWTEKIFVSRMP
    MRKVTGPAHKETIKSPKALGEGLLIVRKPLTELKLKNG
    EIENYYKPEADLLLYNGLKERLMEFGGDAKKAFAEPFP
    KPGNPQKIVKKVRLTEKSTLNVPVLKGEGRADNDSMVR
    VDVFLKDGKYYLVPIYVADTLKPELPNKACIAHKPYDE
    WATMDDGDFLFSLYPNDLIYIKHKKGIKLTKINKNSTL
    ADSIEGKEFFLFYKTMGISSAVLTCTNHDNTYYIESLG
    VKTLESLEKCVVGVLGEIHKVRKEKRTGFSGN [SEQ
    ID NO. 58]
    MAD2036 141 CADAWQ010000026.1 Ruminococcaceae Cattle MLPYAIGLDIGISSVGWASVALDEEDKPCGIIGMGSRI GTTAT TTATACC
    bacterium rumen FDAAEQPKTGDSLAAPRRAARSARRRLRRRRHRNERIR AGTTC ATACCAA
    ALMLREGLLSEAELAALFDGRLEDICALRVRALDEAVT CCTGT GAACGAA
    NDELARILLHLSQRRGFRSNRKTAATQEDGELLAAVSA TCGTT GCAGGTT
    NRALMQERGYRTVAEMLLRDERYRDHRRNKGGAYIATV CTTGG ACTATGA
    GRDMVEDEVRQIFAAQRALGSTAASETLETAYLEILLS TATGG TAAGGTA
    QRSFDAGPGEPSPYAGGQIERMIGRCTFEPDEPRAARA TATAA GTATACC
    TYSFEYFSLLEAVNHIRLTEAGESVPLTKEQREKLIAL T GCAGAGC
    AHRTADLSYAKIRKELGVPESQRFNMVTYGKTDSADEA [SEQ TCCAACG
    EKKTKLKQLRAYHQMRAAFEKAAKGSFVLLTKEQRNAV ID NO. CCTCGCT
    GQTLSIYKTSDNIRPRLREAGLTEAEIDVAEGLSFSKF 62] TTTGCGG
    GHLSVKACDKIIPFLEQGMKYSEACVAAGYAFRGHEGQ GGCGTTG
    DKQRLLPPLDNDAKDTITSPVVLRAVSQTIKVVNAIIR TCTCT
    ERGGSPTFINIELAREMAKDFSERSQIKREQDSNRARN [SEQ ID
    ERMMERIKTEYGKSSPTGLDLVKLKLYEEQAGVCAYSL NO. 63]
    KQMSLEHLFDPNYAEIDHIIPYSISFDDGYKNKVLVLA
    KENRDKGNRLPLEYLNGKRREDFIVWVNSSVRDWRKKQ
    NLLKEHVTPEDEAKFKERNLQDTKTASRFLLNYIADNL
    AFAPFQTERKKRVTAVNGSVTAYLRKRWGIAKVRANGD
    LHHAVDALVIACTTDGLIQKVSRYACYQENRYSEAGGV
    IVDSATGEVVAQFPEPWPRFRHELEARLSDDPARAVLG
    LGLAHYMTGEIRPRPLFVSRMPRRKVTGAAHKETVKSP
    RALDEGQLVTKTPLSALKLGKDGEIPGYYKPESDRLLY
    EALKARLRQFGGDGKKAFAEPFHKPKHDGTPGPVVTKV
    KLCEPATLSVPVHGGLGAANNDSMVRIDVFHVEGDGYY
    FVPIYIADTLKLELPNKACVKIKKISEWKHMKPQDFMF
    SLYPNDLFRIVSKKGITLNLVSKESTLPTSVNVSDTLL
    YFVSAGIASACLTCRNHDNTYQIESLGIKTLEKLEKYT
    VDVLGNVHRVEKEPRMSFSQKGD [SEQ ID NO.
    61]
    MAD2037 141 DGSQ01000028.1 Clostri- low MLPYGIGLDIGITSVGWATVALDENDRPYGIIGMGSRI GTTAT TTATACC
    diales methane FDAAEQPKTGESLAAPRRAARSARRRLRRHRHRNERIR AGTTC ATACCAA
    bacterium producing ALILRENLLSEGQLLHLYDGQLSDVYSLRVKALDERVS CCTGA GAACTAT
    sheep NEEFARILIHISQRRGFKSNRKGASSKEDSELLAAISA TAGTT GAGGTTG
    NQVRMQQQGYRTVAEMYLKDPIYQEHRRNKGGNYIATV CTTGG CTATAAT
    SRAMVEDEVHQIFTGQRACGNPAATKELEEAYVEILLS TATGG AAGGTAG
    QRSFDDGPGDGSPYAGSQIERMIGKCQLEKEAGEPRAA TATAA TAAACCG
    KATYSFEYFSLLAAINNISIISNGQLSPLTKEQREMLI T CAGAGCT
    ALAHKTSELNYARIRKELGLSEAQRFNTVSYGKMEIAE [SEQ CTAACGC
    AEKKTKFEHLKAYHKMRREFERIAKGHFASITIEQRNA ID NO. CTCACAT
    IGDVLSKYKTDAKIRPALREAGLTELDIDAAEALNFSK 65] TTGTGGG
    FGHISIKACKKIIPWLEQGMKYSEACNAAGYNFKGHDG GCGTTAT
    QEKSHLLPPLDEESRNVITSPVALRAISQTIKVVNAII CTCT
    RERGCSPTFINIELAREMSKDFYERIEIKKEQDGNRAK
    NERMMERIRTEYGKASPTGQDLVKFKLYEEQGGVCAYS [SEQ ID
    LKQMSLAHLFEPDYAEVDHIVPYSISFDDGYKNKVLVL NO. 66]
    AKENRDKGNRLPLQYLQGKRREDFIAWVNSCVRDYKKR
    QRLLKESISEDDLRAFKERNLQDTKTASRFLLNYISDH
    LEFTQFATERKKHVTAVNGSVTAYLRKRWGITKIRENG
    DLHHAVDALVIACTTDGMIQQVSRFAQHRENQYSLAED
    SRFIIDPETGEVIKEFPYPWPRFRQELEARLSSNPGLA
    VRDRGFLLYMAESIPVHPLFVSRMPRRKVTGAAHKETI
    KSGKAQKDGLLIVKKPLTDLKLDKEGEIANYYNPMSDR
    LLYEALKKRLTAFNGDGKKAFADPFYKPKSDGTQGPLV
    NKVKLCEPSTLNVSVIGGKGVAENDSMVRIDVFRVEGD
    GYYFVPVYVADTVKPELPNKACVANKPYTDWKEMRESD
    FLFSLYPNDLLKVTHKKALILTKAQKDSDLPDCKETKS
    EMLYFVSASISTASLACRTHDNSYRINSLGIKTLEALE
    KYTVDVLGEYHPVRRETRQTFTGRESSGHSGIS [SEQ
    ID NO. 64]
    MAD2038 141 CACWHR010000008.1 Rumino- Cattle MRPYGIGLDIGISSVGWAAIALDHQDSPCGILDMGARI GTTGT TTATACC
    coccaceae rumen FDAAENPKDGASLAAPRREKRSQRRRLRRHRHRNERIR AGTTC ATACCAA
    bacterium RMLLKEGLLTEAELTGLFDGALEDIYALRTRALDEALT CCTGA GAACGAT
    KQEFARVLLHLSQRRGFRSNRRATAAQEDGKLLDAVSE TCGTT CAGGTTG
    NAKRMADCGYRTVGEMLCRDATFAKHKRNKGGEYLTTV CTTGG CTACAAT
    SRAMIEDEVKLVFASQRRLGSAFASEALEQGYLDILLS TATGG AAGGTAG
    QRSFDEGPGGNSPYGGAQIERMIGKCTFYPEEPRAARA TATAA TAAACCG
    CYSFEYFSLLQKVNHIRLQKDGESTPLTSEQRLQLIEL T AAGAGCT
    ANKTENLDYARIRRALQIPDAYRFNTVSYRIESDPAAA [SEQ CTAACGC
    EKKEKFQYLRAYHTMRKAIDGASKGRFALLSQEQRDQI ID NO. CCCGTTT
    GTVLTLYKSQERISEKLTEAGIEPCDIAALESVSGFSK 68] CTTTACG
    TGHISLRACKELIPYLEQGMNYNEACAAAGIEFHGHSG GGGCGTT
    TERTVVLHPTPDDLADITSPVVRRAVAQTVKVINAVIR ATCTCT
    RYGSPVFVNIELARELAKDFTERKKLEKDNKTNRAENE [SEQ ID
    RLMRRIREEYGKMNPTGLDLVKLRLYEEQAGVCPYSQK NO. 69]
    QMSLQRLFEPNYAEVDHIIPYSISFDDSRRNKVLVLAE
    ENRNKGNRLPLQYLTGERRDNFIVWVNSSVRDYRKKQK
    LLKPTVTDEDKQQFKERNLQDTKTMSRFLMNYINDHLQ
    FGVSAKERKKRVTAVNGIVTSYLRKRWGITKIRGDGDL
    HHAVDALVIACATDGMIRQITRYAQYRECRYMQTDTGS
    AAIDEATGEVLRIFPYPWEHFRKELEARLSSDPARAVN
    ALRLPFYLDSGEPLPKPLFVSRMPRRKVSGAAHKDTVK
    SPKAMAEGKVIVRRALTDLKLKNGEIENYFDPGSDRLL
    YDALKARLAAFGGDGAKAFREPFYKPRHDGTPGPLVKK
    VKLCEPTTLNVAVHGGKGVADNDSMVRIDVFRVEGDGY
    YFVPIYIADTLKPVLPNKACVAFKPYSEWRTMDDRDFI
    FSLYPNDLIRVTHKSALKLSRVSKESTLPESIESKTAL
    LYYVSAGISGAAVSCRNHDNSYEIKSMGIKTLEKLEKY
    TVDVLGEYHKVEKERRMPFTGKRS [SEQ ID NO.
    67]
    MAD2039 141 CACZLL010000017.1 Rumino- Cattle MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRI GTTAT TTATACC
    coccaceae rumen FDAAEQPKTGESLAAPRRAARGSRRRLRRHRHRNERIR AGTTC ATACCAA
    bacterium SLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVS CCTGA GAACTAT
    RTDFARILLHISQRRGFKSNRKNPTTKEDGVLLAAVNE TAGTT TTAGGTT
    NKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITTV CTTGG ACTATGA
    ARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLS TATGG TAAGGTT
    QRSFDEGPGGNSPYGGSQIERMVGRCTFFPDEPRAAKA TATAA TAGTACA
    TYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIEL T CCTTAGA
    AHTTKDVSYAKIRKVLKLSDKQLFNIRYSDNSPAEDSE [SEQ GCTCTGA
    KKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNAIG ID NO. CGCCTCG
    TALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKF 71] CTTTTGC
    GHISVKACDMLIPFLEQGMNYNEACAAAGLNFKGHDAG GAGGCGT
    EKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRR TATCTCT
    EGCSPTFINIELAREMAKDFRERNRIKKENDDNRAKNE TTATATT
    RLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMYSLK GCCAAAA
    QMSLEKLFEPNYAEVDHIVPYSISFDDSRKNKVLVLTE ATGCAAA
    ENRNKGNRLPLQYLKGRRREDFIVWVNNNVKDYRKRRL TATATCG
    LLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLE TACAATG
    FAESTRGRKKKVTAVNGAVTAYMRKRWGITKIREDGDC GTGGC
    HHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQTESGS [SEQ ID
    VAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVIN NO. 72]
    DLHLPFYMSAGRPLPEPVFVSRMPRRKVTGAAHKDTIK
    SARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCL
    YDALKNALIEHGGDAKKAFAGEFRKPKRDGTPGPIVKK
    VKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLSGGKYY
    LVPIYVADTLKPELPNKAVTRGKKYSEWLEMADEDFIF
    SLYPNDLICATSKNGITLSVCRKDSTLPPTVESKSFML
    YYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYT
    VDVLGEYHKVGKEVRQPFNIKRRKACPSEML [SEQ
    ID NO. 70]
    MAD2040 141 DHKF01000115.1 Clostri- Feces MHRYAIGLDIGITSVGWAAIALDAEENPCGMLDFGSRI GTTGT TTATACC
    diales FTGAEHPKTGASLAAPRREARGARRRLRRHRHRNERIR AGTTC ATACCAA
    bacterium RLMVSGGLISQEQLESLFAGQLEDIYALRTRALDEQVA CCTGA GAACTGC
    UBA4701 REELARIMLHLSQRRGFRSNRKGGADAEDGKLLEAVGD TGGTT TCAGGTT
    NKRRMDEKGYRTAGEMFFKDEAFAAHKRNKGGNYIATV CTTGG ACTATGA
    TRAMTEDEVHRIFAAQRGFGAEYANEKLEAAYLDILLS TATGG TAAGGTA
    QRSFDEGPGGDSPYGGSQIERMIGTCAFEPDQPRAAKA TATAA GTAAACC
    AYSFEYFSLLEKLNHIRLVSGGKSEPLTDAQRKKLIEL T GAAGAGC
    AHKQDTLSYAKIRKELELNEAVRFNSVRYTDDATFEEQ [SEQ TCTAATG
    EKKEKIVCMKAYHAMRKAVDKNAKGRFAYLTIPQRNEI ID NO. CCCCGTC
    GRVLSTYKTSAKIEPALAAAGIEPCDIAALEGLSFSKF 74] TCGCACG
    GHLSIKACDKLIPFLEKAMNYNDACAAAGYDFRGHSRD GGGCATT
    GRQMYLPPLGGDCTEITSPVVRRAVSQTIKVINAIIRR ATCTCTA
    YGTSPVYVNIELAREMSKDFAERNKIKKQNDDNRSKNE ACAGCGA
    KIKEQVAEYKHGAATGLDIVKMKLFNEQGGICAYSQRQ AAAGGCA
    MSLERLFDPNYAEVDHIVPYSISFDDRYKNKVLVLTEE AA
    NRNKGNRLPLQYLTGERRDRFIVWVNNSVRDFQKRKLL [SEQ ID
    LKEALTPEEENDWKERNLQDTKFVSSFLLNYINDNLLF NO. 75]
    APSVRRKKRVTAVNGAVTDYMRKRWGISKVREDGDRHH
    AVDAVVIACTNDALIQKVSRYESWHERHYMPTENGSIL
    VDPATGEIKQTFPYPWAMFRKELEARLSNDPSRAVADL
    KLPFYMDADAPPVKPLFVSRMPTRKVTGAAHKDTVKSA
    RALADGLAIVRRPLTALKLDKDGEIAGYYNKDSDRLLY
    DALKARLTEYGGNAAKAFAEPFYKPKSDGTPGPVVNKV
    KLTEPTTLSVPVQDGTGIADNDSMVRIDVFRVVGDGYY
    FVPVYVADTLKQELPDRAVVAFKAHSEWKVMSDGDFVF
    SLYPNDLVKVTRKKDVILKRSFDNSTLPETIASNECLL
    YYAGADISTGAISCVTNDNAYSIRGLGIKTLVSMEKYT
    VDILGEYHPVRKEERQRFNTKR [SEQ ID NO. 73]
  • Example 3 Vector Cloning, MADZYME Library Construction and PCR
  • The MADzyme coding sequences were cloned into a pUC57 vector with a T7-promoter sequence attached to the 5′-end of the coding sequence and a T7-terminator sequence attached to the 3′-end of the coding sequence.
  • First, Q5 Hot Start 2× master mix reagent (NEB, Ipswich, MA) was used to amplify the MADzyme sequences cloned in the pUC57 vector. The forward primer 5′-TTGGGTAACGCCAGGGTTTT [SEQ ID NO. 172] and reverse primer 5′-TGTGTGGAATTGTGAGCGGA [SEQ ID NO. 173] amplified the sequences flanking the MADzyme in the pUC57 vector, including the T7-promoter and T7-terminator components at the 5′- and 3′-ends of the MADzymes, respectively. Primers at 1 μM were used in 10 μL PCR reactions with 3.3 μL of boiled cell sample as template in 96-well PCR plates. The PCR conditions shown in Table 2 were used:
  • TABLE 2
    STEP TEMPERATURE TIME
    DENATURATION 98° C. 30 SEC
    30 CYCLES 98° C. 10 SEC
    66° C. 30 SEC
    72° C. 3 MIN
    FINAL EXTENSION 72° C. 2 MIN
    HOLD 12° C.
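  • As an illustration only, the cycling conditions of Table 2 can be written down as a small data structure, which makes it easy to estimate the programmed hold time of a run. The class and field names below are hypothetical and not part of the described method; ramp times between temperatures and the final 12° C. hold are not counted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time at that temperature


@dataclass
class Program:
    initial: Step            # initial denaturation
    cycles: int              # number of repeats of cycle_steps
    cycle_steps: List[Step]  # steps repeated in every cycle
    final: Step              # final extension

    def hold_seconds(self) -> int:
        """Sum of programmed hold times (ramping and final hold excluded)."""
        per_cycle = sum(step.seconds for step in self.cycle_steps)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# Values transcribed from Table 2.
TABLE_2 = Program(
    initial=Step(98, 30),
    cycles=30,
    cycle_steps=[Step(98, 10), Step(66, 30), Step(72, 180)],
    final=Step(72, 120),
)

if __name__ == "__main__":
    total = TABLE_2.hold_seconds()
    print(f"Programmed hold time: {total} s ({total / 60:.1f} min)")
```

  • The same structure could hold any single-stage program with a fixed cycle count; a program with two cycling stages would need a list of stages instead of one `cycles` field.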
  • Example 4 gRNA Construction
  • Several functional gRNAs associated with each MADzyme were designed by truncating the 5′ region, the 3′ region and the repeat/anti-repeat duplex (see Table 3).
  • TABLE 3
    gRNA 
    name sgRNAv1 sgRNAv2 sgRNAv3 sgRNAv4 sgRNAv5
    sgM GTTTTAGAGCTATGC GTTTTAGAGCTATGC GTTTTAGAGCTATGC GTTTTAGAGCT NONE
    2015 TGTTTTGAATGCTTC TGTTTTGAATGCTTC TGTTAACAACATAGC ATGCAAACATA
    CAAAACGAAATGTTG GTAGCATTCAAAACA AAGTTAAAATAAGGC GCAAGTTAAAA
    GTAGCATTCAAAACA ACATAGCAAGTTAAA TTTGTCCGTTCTCAA TAAGGCTTTGT
    ACATAGCAAGTTAAA ATAAGGCTTTGTCCG CTTTTAGTGACGCTG CCGTTCTCAAC
    ATAAGGCTTTGTCCG TTCTCAACTTTTAGT TTTCGGCG TTTTAGTGACG
    TTCTCAACTTTTAGT GACGCTGTTTCGGCG [SEQ ID NO. 78] CTGTTTCGGCG
    GACGCTGTTTCGGCG [SEQ ID NO. 77] [SEQ ID NO.
    [SEQ ID NO. 76] 79]
    sgM GTTTTAGAGTCATGT GTTTTAGAGTCATGT GTTTTAGAGTCATGT NONE NONE
    2016 TGTTTAGAATGGTAC TGTAAAAACAACATA TGTAAAAACAACATA
    CAAAACATCTTTTGG GCAAGTTAAAATAAG GCAAGTTAAAATAAG
    GACTATTCTAAACAA GTTTTAACCGTAATC CGTAATCAACTGTAA
    CATAGCAAGTTAAAA AACTGTAAAGTGGCG AGTGGCGCTGTTTCG
    TAAGGTTTTAACCGT CTGTTTCGGCGC GCGC
    AATCAACTGTAAAGT [SEQ ID NO. 81] [SEQ ID NO. 82]
    GGCGCTGTTTCGGCG
    C [SEQ ID NO.
    80]
    sgM GTTTTAGAGCTGTGC GTTTTAGAGCTGTGC GTTTTAGAGCTGTGC GTTTTAGAGCT NONE
    2017 TGTTTCGAATGGTTC TGTTTCGAAAAATCG TGTAAAAACAACACA GTGCAAACACA
    CAAAACGAAATGTTG AAACAACACAGCGAG GCGAGTTAAAATAAG GCGAGTTAAAA
    GAACTATTCGAAACA TTAAAATAAGGCTTT GCTTTGTCCGTACAC TAAGGCTTTGT
    ACACAGCGAGTTAAA GTCCGTACACAACTT AACTTGTAAAAGGGG CCGTACACAAC
    ATAAGGCTTTGTCCG GTAAAAGGGGCACCC CACCCGATTCGGGTG TTGTAAAAGGG
    TACACAACTTGTAAA GATTCGGGTGC C GCACCCGATTC
    AGGGGCACCCGATTC [SEQ ID NO. 84] [SEQ ID NO. 85] GGGTGC
    GGGTGCA [SEQ ID NO.
    [SEQ ID NO. 83] 86]
    sgM GTTTTAGAGCTGTGT GTTTTAGAGCTGTGT GTTTTAGAGCTGTGT NONE NONE
    2019 TGTTTCGAATGGTTC TGTAAAAACAATACA TGTAAAAACAATACA
    CAAAACGGTTTGAAA GCAAAGTTAAAATAA GCAAGTTAAAATAAG
    CCATTCGAAACAATA GGCTAGTCCGTATAC GCTAGTCCGTATACA
    CAGCAAAGTTAAAAT AACGTGAAAACACGT ACGTGAAAACACGTG
    AAGGCTAGTCCGTAT GGCACCGATTCGGTG GCACCGATTCGGTGC
    ACAACGTGAAAACAC C [SEQ ID NO. 89
    GTGGCACCGATTCGG [SEQ ID NO. 88]
    TGC [SEQ ID NO.
    87]
    sgM GTTTGCTAGTTATGT GTTTGCTAGTTATGT GTTTGCTAGTTATGT NONE NONE
    2020 TATTTATAGTATTAA TATAAAAATAACATA TATAAAAATAACATA
    GCAAACTGTAAATAA ACGAGTGCAAATAAG ACGAGTGCAAATAAG
    CATAACGAGTGCAAA CGTTTCGCGAAAATT CGTTTCGCGAAAATT
    TAAGCGTTTCGCGAA TACAGTGGCCCTGCT TACAGTGGCCCTGCT
    AATTTACAGTGGCCC GTGGGGCCTTTTTTA GTGGGGCC
    TGCTGTGGGGCCTTT TTTATCAAA [SEQ ID NO. 92]
    TTTATTTATCAAA [SEQ ID NO. 91]
    [SEQ ID NO. 90]
    sgM GTTTGAGAGCCTTGT NONE NONE NONE NONE
    2021 AAAACCGTATATCTC
    TCAAGCGAAAGATAA
    TGTTTTACAAGGCGA
    GTTCAAATAAGGATT
    TATCCGAAATCGCTT
    GCGTGCATTGGCACC
    ATCTATCTTTTAAGA
    CTTTCTTTGAAAGTC
    TT [SEQ ID NO.
    93]
    sgM GTTTGAGAGTCTTGT GTTTGAGAGTCTTGT GTTTGAGAGTCTTGT GTTTGAGAGTC NONE
    2022 TAATTCTTAAAGGTG AAAAACAAGACGAGT AAAAACAAGACGAGT TTGTTAATTCA
    TAAAACGAGAATTAA GCAAATAAGGTTTAT GCAAATAAGGTTTAT AAAGAATTAAC
    CAAGACGAGTGCAAA CCGGAATCGTCAATA CCGGAATCGTCAATA AAGACGAGTGC
    TAAGGTTTATCCGGA TGACCTGCATTGTGC TGACCTGCATTGTGC AAATAAGGTTT
    ATCGTCAATATGACC AGAATCTTTAAAATC AG [SEQ ID NO. ATCCGGAATCG
    TGCATTGTGCAGAAT ATATGATTTCATATG 96] TCAATATGACC
    CTTTAAAATCATATG GTTTTA [SEQ ID TGCATTGTGCA
    ATTTCATATGGTTTT NO. 95] GAATCTTTAAA
    A [SEQ ID NO. ATCATATGATT
    94] TCATATGGTTT
    TA [SEQ ID
    NO. 97]
    sgM GTTTGAGAGTAGTGT NONE NONE NONE NONE
    2023 AAATCCATAGGGGTC
    TCAAACGAAAAGACC
    CCTATGGATTTACAT
    TGCGAGTTCAAATAA
    AAGTTTACTCAAATC
    GTTGGCTTGACCAAC
    CGCACAGCGTGTGCT
    TAAAGATCTCTTCAG
    TGAGGTC [SEQ ID
    NO. 98]
    sgM GTTTGAGAGTAGTGT NONE NONE NONE NONE
    2024 AAATCCAGAGGGCTC
    CAAAACGAGCCCTCT
    GGATTTACACTACGA
    GTTCAAATAAAAATT
    ATTTCAAATCGCCGC
    TATGTCGGCCGCACA
    GTGTGTGCATTAAGA
    AAAGTCCGAAAGGGC
    [SEQ ID NO. 99]
    sgM GTTTGAGAGTAGTGT GTTTGAGAGTAGTGT GTTTGAGAGTAGTGT GTTTGAGAGTA NONE
    2025 AAATTTATAGGGTAG AAAAATACACTACGA AAAAATACACTACGA GTGTAAATTTA
    TAAAACAAATTTTAC GTTCAAATAAAAATT GTTCAAATAAAAATT TAGGAAAACCT
    TACCCTATAAATTTA ATTTCAAATCGTACT ATTTCAAATCGTACT ATAAATTTACA
    CACTACGAGTTCAAA TTTTAGTACCTTCAC TTTTAGTACCTTCAC CTACGAGTTCA
    TAAAAATTATTTCAA AAGTGTTGTGAATAT AAGTGTTGTGAA AATAAAAATTA
    ATCGTACTTTTTAGT TAACTCACCTTCGGG [SEQ ID NO. TTTCAAATCGT
    ACCTTCACAAGTGTT TGAG [SEQ ID 102] ACTTTTTAGTA
    GTGAATATTAACTCA NO. 101] CCTTCACAAGT
    CCTTCGGGTGAG GTTGTGAATAT
    [SEQ ID NO. TAACTCACCTT
    100] CGGGTGAG
    [SEQ ID NO. 
    103]
    sgM GTTTGAGAGTAGTGT NONE NONE NONE NONE
    2026 AATTTCATATGGTAG
    TCAAACGACTACCAT
    ATGAGATTACACTAC
    ACGGTTCAAATAAAG
    AATGTTCGAAACCGC
    CCTTTGGGGCCCGCT
    TGTTGCGGATTTACA
    GACTTGATATCAAGT
    CTG [SEQ ID NO.
    104]
    sgM GTTTGAGAGTAATGT GTTTGAGAGTAATGT GTTTGAGAGTAATGT GTTTGAGAGTA NONE
    2027 AAATTCATAGGATGG AAAAATACATTACAA AAAAATACATTACAA ATGTAAATTCA
    TAAAACGAAATTTAC GTTCAAATAAAAATT GTTCAAATAAAAATT TAAAAGTGAGT
    CATCCAGTGAGTTTA TATTCAACCCGTTCT TATTCAACCCGTTCT TTACATTACAA
    CATTACAAGTTCAAA TCGGAACCTCCACCG TCGGAACCTCCACCG GTTCAAATAAA
    TAAAAATTTATTCAA TGTGGAACATTAAGG TGTGGA [SEQ ID AATTTATTCAA
    CCCGTTCTTCGGAAC TCTGCTTTGCAGGCC NO. 107] CCCGTTCTTCG
    CTCCACCGTGTGGAA [SEQ ID NO. GAACCTCCACC
    C [SEQ ID NO. 106] GTGTGGAACAT
    105] TAAG [SEQ
    ID NO. 108]
    sgM GTTTGAGAGCAGTGT NONE NONE NONE NONE
    2028 TGTCTTATATAGCTC
    GAAAACGCATTGTAA
    GACAACACTGCTACG
    TTCAAATAAGCATAT
    TGCTACAAGGTTCTC
    CCTCGGAGAATGACC
    ATTAGGTCACTTAGA
    TAGCCGGTTCTTCTG
    GCTA [SEQ ID
    NO. 109]
    sgM GTTTGAGAGCAGTGT GTTTGAGAGCAGTGT GTTTGAGAGCAGTGT GTTTGAGAGCA NONE
    2029 TGTCTTATATAGCTC AAAAACACTGCTACG AAAAACACTGCTACG GTGTTGTCAAA
    GAAAACGCATTGTAA TTCAAATAAGCATAT TTCAAATAAGCATAT AGACAACACTG
    GACAACACTGCTACG TGCTACAAGGTTCTC TGCTACAAGGTTCTC CTACGTTCAAA
    TTCAAATAAGCATAT CATTGGAGAATGACC CATTGGAGAATGACC TAAGCATATTG
    TGCTACAAGGTTCTC ATTAGGTCGCTTAGA ATTAGGTC [SEQ CTACAAGGTTC
    CATTGGAGAATGACC TAGCCAGTTCTTCTG ID NO. 112] TCCATTGGAGA
    ATTAGGTCGCTTAGA GCTA [SEQ ID ATGACCATTAG
    TAGCCAGTTCTTCTG NO. 111] GTCGCTTAGAT
    GCTA [SEQ ID AGCCAGTTCTT
    NO. 110] CTGGCTA
    [SEQ ID NO. 
    113]
    sgM GTTTGAGAGCAGTGT NONE NONE NONE NONE
    2030 TGTCTTAAATAGCTC
    GAAAACGCATTGTAA
    GACAACACTGCACGT
    TCAAATAAGCAGATT
    GCTACAAGGTTCCCG
    TAAGGGAATGACCAT
    CTGGTCACATGAATA
    GCCCCCGGCAACGGT
    GGCTG [SEQ ID
    NO. 114]
    sgM ATTGTACCATAGCGA NONE NONE NONE NONE
    2031 GTTAAATTAGGGAAT
    TACAACGAAATTGTA
    ATAACCTATTTTACC
    TCGCTATGGCACAAT
    TTGTTATTACATGGA
    CATTATACTAAACAT
    TTCCTAAAAAAGCAA
    CGAAAAACGTGCT
    [SEQ ID NO.
    115]
    sgM GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCC NONE
    2032 TTATTCTTGGTATGG TTATTCTTGGTAAAA TTATTCTTGGTAAAA CTAATTATTCT
    TATAATGAAAATTGT ACCAAGAACAATTAG ACCAAGAACAATTAG TGGTATGGTAA
    ATCATACCAAGAACA GTTACTATGATAAGG GTTACTATGATAAGG AAATATCATAC
    ATTAGGTTACTATGA TAGTATACCGCAAAG TAGTATACCGCAAAG CAAGAACAATA
    TAAGGTAGTATACCG CTCTAACACCTCATC CTCTAACACCTCATC GGTTACTATGA
    CAAAGCTCTAACACC TTCGGATGAGGTGTT TTCGGATGAG [SEQ TAAGGTAGTAT
    TCATCTTCGGATGAG A [SEQ ID NO. ID NO. 118] ACCGCAAAGCT
    GTGTTATCT [SEQ 117] CTAACACCTCA
    ID NO. 116] TCTTCGGATGA
    GGTGTTATCT
    [SEQ ID NO. 
    119]
    sgM GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCC NONE
    2033 CAGTTCTTGGTATGG CAGTTCTAAAAAGAA CAGTTCTAAAAAGAA CTAACAGTAAA
    TATAATAAAAATTAT CTGTTATGGTTGCTA CTGTTATGGTTGCTA AACTGTTATGG
    ACCATACCAAGAACT TGATAAGGTCTTAGC TGATAAGGTCTTAGC TTGCTATGATA
    GTTATGGTTGCTATG ACCGTAAAGCTCTGA ACCGTAAAGCTCTGA AGGTCTTAGCA
    ATAAGGTCTTAGCAC CGCCTCGCTTTCAGC CGCCTCGCTTTCAGC CCGTAAAGCTC
    CGTAAAGCTCTGACG GGGGCGTCA [SEQ GGGG [SEQ ID TGACGCCTCGC
    CCTCGCTTTCAGCGG ID NO. 121] NO. 122] TTTCAGCGGGG
    GGCGTCATCTTTTTT CGTCA
    GCCCAAAAGACACGG [SEQ ID NO. 
    ATATTTTT [SEQ 123]
    ID NO. 120]
    sgM GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCCCTAA GTTGTAGTTCC NONE
    2034 CGGTTCTTGGTATGG CGGTACTGTTGGGTT CGGTACTGTTGGGTT CTAACGGTTCT
    TATAATGAATTATAC ACTACAATAAGGTAG ACTACAATAAGGTAG TGAAAACAAGA
    CATACCAAGAACTGT TAAACCGAAAAGCTC TAAACCGAAAAGCTC ACTGTTGGGTT
    TGGGTTACTACAATA TGACGTCTTGTTTGC TGACGTCTTGTTTGC ACTACAATAAG
    AGGTAGTAAACCGAA GCAGGACGTCATCTT GCAGGACGTCATCTT GTAGTAAACCG
    AAGCTCTGACGTCTT TATATCAGACGGATG T [SEQ ID NO. AAAAGCTCTGA
    GTTTGCGCAGGACGT [SEQ ID NO. 126] CGTCTTGTTTG
    CATCTTTATATCAGA 125] CGCAGGACGTC
    CGGATG [SEQ ID ATCTTTATATC
    NO. 124] AGACGGATG
    [SEQ ID NO. 
    127]
    sgM GTTGTAGTCCCCTGA NONE NONE NONE NONE
    2035 TGGTTTCTGGAATGG
    TATAATGAAATTATA
    CCATTCCAGAAACTA
    TTATGGTCACTACAA
    TAAGGTATTAGACCG
    TAGAGCACTAACACC
    CCATTTGGGGTGTTA
    TCTCTTTAAACTGTC
    CAAAATTTAGTATTG
    CAATTATTGA [SEQ
    ID NO. 128]
    sgM GTTATAGTTCCCTGT NONE NONE NONE NONE
    2036 TCGTTCTTGGTATGG
    TATAATGAAATTATA
    CCATACCAAGAACGA
    AGCAGGTTACTATGA
    TAAGGTAGTATACCG
    CAGAGCTCCAACGCC
    TCGCTTTTGCGGGGC
    GTTGTCTCT [SEQ
    ID NO. 128]
    sgM GTTATAGTTCCCTGA NONE NONE NONE NONE
    2037 TAGTTCTTGGTATGG
    TATAATGAAATTATA
    CCATACCAAGAACTA
    TGAGGTTGCTATAAT
    AAGGTAGTAAACCGC
    AGAGCTCTAACGCCT
    CACATTTGTGGGGCG
    TTATCTCT [SEQ
    ID NO. 129]
    sgM GTTGTAGTTCCCTGA NONE NONE NONE NONE
    2038 TCGTTCTTGGTATGG
    TATAATGAAATTATA
    CCATACCAAGAACGA
    TCAGGTTGCTACAAT
    AAGGTAGTAAACCGA
    AGAGCTCTAACGCCC
    CGTTTCTTTACGGGG
    CGTTATCTCT [SEQ
    ID NO. 130]
    sgM GTTATAGTTCCCTGA GTTATAGTTCCCTGA GTTATAGTTCCCTGA GTTATAGTTCC GTTATAGTTC
    2039 TAGTTCTTGGTATGG TAGTTCTTGGTATGG TAGTTCTTAACCAAG CTGATAGTTCT CCTGATAGTT
    TATAATGAATTATAC TATAATGAATTATAC AACTATTTAGGTTAC TGCAAGAACTA CTTGCAAGAA
    CATACCAAGAACTAT CATACCAAGAACTAT TATGATAAGGTTTAG TTTAGGTTACT CTATTTAGGT
    TTAGGTTACTATGAT TTAGGTTACTATGAT TACACCTTAGAGCTC ATGATAAGGTT TACTATGATA
    AAGGTTTAGTACACC AAGGTTTAGTACACC TGACGCCTCGCTTTT TAGTACACCTT AGGTTTAGTA
    TTAGAGCTCTGACGC TTAGAGCTCTGACGC GCGAGGCGTTATCTC AGAGCTCTGAC CACCTTAGAG
    CTCGCTTTTGCGAGG CTCGCTTTTGCGAGG T [SEQ ID NO. GCCTCGCTTTT CTCTGACGCC
    CGTTATCTCTTTATA CGTTATCTCT [SEQ 133] GCGAGGCGTTA AAAAGGCGTT
    TTGCCAAAAATGCAA ID NO. 132] TCTCT ATCTCT
    ATATATCGTACAATG [SEQ ID [SEQ ID
    GTGGC [SEQ ID NO. 134]  NO. 135]
    NO. 131]
    sgM GTTGTAGTTCCCTGA NONE GTTGTAGTTCCCTGA GTTGTAGTTCC NONE
    2040 TGGTTCTTGGTATGG TGGTTCTTGAAAAAG CTGATGGTTCT
    TATAATAAATTATAC AACTGCTCAGGTTAC TGAAAAAGAAC
    CATACCAAGAACTGC TATGATAAGGTAGTA TGCTCAGGTTA
    TCAGGTTACTATGAT AACCGAAGAGCTCTA CTATGATAAGG
    AAGGTAGTAAACCGA ATGCCCCGTCTCGCA TAGTAAACCGA
    AGAGCTCTAATGCCC CGGGGCATTATCTCT AGAGCTCTAAT
    CGTCTCGCACGGGGC [SEQ ID NO. GCCAAAGGGCA
    ATTATCTCT [SEQ 137] TTATCTCT
    ID NO. 136] [SEQ ID NO. 
    138]
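  • The relationship between truncated variants in Table 3 can be checked with a few lines of code. The sketch below is a minimal example, using the sgM2020 entries as transcribed from Table 3 (sgRNAv2, SEQ ID NO. 91, and sgRNAv3, SEQ ID NO. 92); the helper name `three_prime_trim` is hypothetical. It tests whether the shorter guide is a 5′ prefix of the longer one, i.e. whether the two differ only by a 3′ truncation.

```python
from typing import Optional

# sgM2020 sgRNAv2 and sgRNAv3, joined from the 15-nt chunks listed in Table 3.
SGM2020_V2 = (
    "GTTTGCTAGTTATGT" "TATAAAAATAACATA" "ACGAGTGCAAATAAG"
    "CGTTTCGCGAAAATT" "TACAGTGGCCCTGCT" "GTGGGGCCTTTTTTA" "TTTATCAAA"
)
SGM2020_V3 = (
    "GTTTGCTAGTTATGT" "TATAAAAATAACATA" "ACGAGTGCAAATAAG"
    "CGTTTCGCGAAAATT" "TACAGTGGCCCTGCT" "GTGGGGCC"
)


def three_prime_trim(longer: str, shorter: str) -> Optional[int]:
    """Return the number of 3'-terminal nucleotides removed if `shorter`
    is a 5' prefix of `longer`; return None otherwise."""
    if longer.startswith(shorter):
        return len(longer) - len(shorter)
    return None


if __name__ == "__main__":
    removed = three_prime_trim(SGM2020_V2, SGM2020_V3)
    print("nucleotides trimmed from the 3' end:", removed)
    print("trimmed tail:", SGM2020_V2[len(SGM2020_V3):])
```

  • A prefix test only detects 3′ truncations. Variants that shorten the repeat/anti-repeat duplex differ in the middle of the sequence and would need an alignment-based comparison instead.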
  • To find the optimal gRNA length, different lengths of spacer, repeat:anti-repeat duplex and 3′ end of the tracrRNA were included. These gRNAs were then synthesized as single-stranded DNA downstream of the T7 promoter (see Table 4). These sgRNAs were amplified using two primers (5′-AAACCCCTCCGTTTAGAGAG [SEQ ID NO. 174] and 5′-AAGCTAATACGACTCACTATAGGCCAGTC [SEQ ID NO. 175]) and 1 μL of 10 μM diluted single-stranded DNA as a template in 25 μL PCR reactions for each sgRNA according to the conditions of Table 5.
  • TABLE 4
    Name Sequence
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGGT
    6v1 TAAAACCTTATTTTAACTTGCTATGTTGTTTAGAATAGTCCCAAAAGATGTTTTGGTACCATTCTAAACAA
    CATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 139]
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGGT
    6v2 TAAAACCTTATTTTAACTTGCTATGTTGTTTTTACAACATGACTCTAAAACCCAGTAACATTACTGACTGG
    CCTATAGTGAGTCGTATTA [SEQ ID NO. 140]
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCGCCGAAACAGCGCCACTTTACAGTTGATTACGCT
    6v3 TATTTTAACTTGCTATGTTGTTTTTACAACATGACTCTAAAACCCAGTAACATTACTGACTGGCCTATAGT
    GAGTCGTATTA [SEQ ID NO. 141]
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATA
    9v1 CGGACTAGCCTTATTTTAACTTTGCTGTATTGTTTCGAATGGTTTCAAACCGTTTTGGAACCATTCGAAAC
    AACACAGCTCTAAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO.
    142]
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATA
    9v2 CGGACTAGCCTTATTTTAACTTTGCTGTATTGTTTTTACAACACAGCTCTAAAACCCAGTAACATTACTGA
    CTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 143]
    sg M201 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGCACCGAATCGGTGCCACGTGTTTTCACGTTGTATA
    9v3 CGGACTAGCCTTATTTTAACTTGCTGTATTGTTTTTACAACACAGCTCTAAAACCCAGTAACATTACTGAC
    TGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 144]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTTGATAAATAAAAAAGGCCCCACAGCAGGGCCACT
    0v1 GTAAATTTTCGCGAAACGCTTATTTGCACTCGTTATGTTATTTACAGTTTGCTTAATACTATAAATAACAT
    AACTAGCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 145]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTTGATAAATAAAAAAGGCCCCACAGCAGGGCCACT
    0v2 GTAAATTTTCGCGAAACGCTTATTTGCACTCGTTATGTTATTTTTATAACATAACTAGCAAACCCAGTAAC
    ATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 146]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCCCACAGCAGGGCCACTGTAAATTTTCGCGAAA
    0v3 CGCTTATTTGCACTCGTTATGTTATTTTTATAACATAACTAGCAAACCCAGTAACATTACTGACTGGCCTA
    TAGTGAGTCGTATTA [SEQ ID NO. 147]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAAAACCATATGAAATCATATGATTTTAAAGATTCT
    2v1 GCACAATGCAGGTCATATTGACGATTCCGGATAAACCTTATTTGCACTCGTCTTGTTAATTCTTTTGAATT
    AACAAGACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO.
    148]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAAAACCATATGAAATCATATGATTTTAAAGATTCT
    2v2 GCACAATGCAGGTCATATTGACGATTCCGGATAAACCTTATTTGCACTCGTCTTGTTTTTACAAGACTCTC
    AAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 149]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTGCACAATGCAGGTCATATTGACGATTCCGGATAA
    2v3 ACCTTATTTGCACTCGTCTTGTTTTTACAAGACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGA
    GTCGTATTA [SEQ ID NO. 150]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGCTCACCCGAAGGTGAGTTAATATTCACAACACTTGTGAA
    5v1 GGTACTAAAAAGTACGATTTGAAATAATTTTTATTTGAACTCGTAGTGTAAATTTATAGGTTTTCCTATAA
    ATTTACACTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO.
    151]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTCACCCGAAGGTGAGTTAATATTCACAACACTTGT
    5v2 GAAGGTACTAAAAAGTACGATTTGAAATAATTTTTATTTGAACTCGTAGTGTATTTTTACACTACTCTCAA
    ACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 152]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATTCACAACACTTGTGAAGGTACTAAAAAGTACGATT
    5v3 TGAAATAATTTTTATTTGAACTCGTAGTGTATTTTTACACTACTCTCAAACCCAGTAACATTACTGACTGG
    CCTATAGTGAGTCGTATTA [SEQ ID NO. 153]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCTGCAAAGCAGACCTTAATGTTCCACACGGTGG
    7v1 AGGTTCCGAAGAACGGGTTGAATAAATTTTTATTTGAACTTGTAATGTAAACTCACTTTTATGAATTTACA
    TTACTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 154]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGGCCTGCAAAGCAGACCTTAATGTTCCACACGGTGG
    7v2 AGGTTCCGAAGAACGGGTTGAATAAATTTTTATTTGAACTTGTAATGTATTTTTACATTACTCTCAAACCC
    AGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 155]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATCCACACGGTGGAGGTTCCGAAGAACGGGTTGAATA
    7v3 AATTTTTATTTGAACTTGTAATGTATTTTTACATTACTCTCAAACCCAGTAACATTACTGACTGGCCTATA
    GTGAGTCGTATTA [SEQ ID NO. 156]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAGCCAGAAGAACTGGCTATCTAAGCGACCTAATGG
    9v1 TCATTCTCCAATGGAGAACCTTGTAGCAATATGCTTATTTGAACGTAGCAGTGTTGTCTTTTGACAACACT
    GCTCTCAAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 157]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAGCCAGAAGAACTGGCTATCTAAGCGACCTAATGG
    9v2 TCATTCTCCAATGGAGAACCTTGTAGCAATATGCTTATTTGAACGTAGCAGTGTTTTTACACTGCTCTCAA
    ACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 158]
    sg M202 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAGACCTAATGGTCATTCTCCAATGGAGAACCTTGTAG
    9v3 CAATATGCTTATTTGAACGTAGCAGTGTTTTTACACTGCTCTCAAACCCAGTAACATTACTGACTGGCCTA
    TAGTGAGTCGTATTA [SEQ ID NO. 159]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGATAACACCTCATCCGAAGATGAGGTGTTAGAGCT
    2v1 TTGCGGTATACTACCTTATCATAGTAACCTAATTGTTCTTGGTATGATATTTTTACCATACCAAGAATAAT
    TAGGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 160]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATAACACCTCATCCGAAGATGAGGTGTTAGAGCTTTG
    2v2 CGGTATACTACCTTATCATAGTAACCTAATTGTTCTTGGTTTTTACCAAGAATAATTAGGGAACTACAACC
    CAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 161]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACTCATCCGAAGATGAGGTGTTAGAGCTTTGCGGTAT
    2v3 ACTACCTTATCATAGTAACCTAATTGTTCTTGGTTTTTACCAAGAATAATTAGGGAACTACAACCCAGTAA
    CATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 162]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATGACGCCCCGCTGAAAGCGAGGCGTCAGAGCTTTAC
    3v1 GGTGCTAAGACCTTATCATAGCAACCATAACAGTTTTTACTGTTAGGGAACTACAACCCAGTAACATTACT
    GACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 163]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTATGACGCCCCGCTGAAAGCGAGGCGTCAGAGCTTTAC
    3v2 GGTGCTAAGACCTTATCATAGCAACCATAACAGTTCTTTTTAGAACTGTTAGGGAACTACAACCCAGTAAC
    ATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 164]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACCCCGCTGAAAGCGAGGCGTCAGAGCTTTACGGTGC
    3v3 TAAGACCTTATCATAGCAACCATAACAGTTCTTTTTAGAACTGTTAGGGAACTACAACCCAGTAACATTAC
    TGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 165]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACATCCGTCTGATATAAAGATGACGTCCTGCGCAAAC
    4v1 AAGACGTCAGAGCTTTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTTCTTGTTTTCAAGAACCGTTA
    GGGAACTACAACCCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 166]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTACATCCGTCTGATATAAAGATGACGTCCTGCGCAAAC
    4v2 AAGACGTCAGAGCTTTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTACCGTTAGGGAACTACAACCC
    AGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 167]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAAAGATGACGTCCTGCGCAAACAAGACGTCAGAGCT
    4v3 TTTCGGTTTACTACCTTATTGTAGTAACCCAACAGTACCGTTAGGGAACTACAACCCAGTAACATTACTGA
    CTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 168]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTCGCAAAAGCGAGGCGTCAGAGCT
    9v1 CTAAGGTGTACTAAACCTTATCATAGTAACCTAAATAGTTCTTGCAAGAACTATCAGGGAACTATAACCCA
    GTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 169]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTTTTGGCGTCAGAGCTCTAAGGTG
    9v2 TACTAAACCTTATCATAGTAACCTAAATAGTTCTTGCAAGAACTATCAGGGAACTATAACCCAGTAACATT
    ACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 170]
    sg M203 AAACCCCTCCGTTTAGAGAGGGGTTATGCTAGTTAAGAGATAACGCCTCGCAAAAGCGAGGCGTCAGAGCT
    9v3 CTAAGGTGTACTAAACCTTATCATAGTAACCTAAATAGTTCTTGGTTAAGAACTATCAGGGAACTATAACC
    CAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA [SEQ ID NO. 171]
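  • The Table 4 entries, as listed, begin with the sequence of the SEQ ID NO. 174 primer and end with a common 3′ tail. A short sketch, assuming only the sequences printed in this document, shows that the reverse complement of that tail reads as a T7 promoter followed by the 20-nt target sequence of SEQ ID NO. 177, which is consistent with the oligonucleotides being written as the antisense strand. The constant `T7_PROMOTER` is the commonly used minimal T7 promoter sequence and is an assumption of this example, as are all variable and function names.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


PRIMER_174 = "AAACCCCTCCGTTTAGAGAG"            # SEQ ID NO. 174
PRIMER_175 = "AAGCTAATACGACTCACTATAGGCCAGTC"   # SEQ ID NO. 175
TARGET_177 = "CCAGTCAGTAATGTTACTGG"            # SEQ ID NO. 177
T7_PROMOTER = "TAATACGACTCACTATAGG"            # assumed minimal T7 promoter

# 3' tail shared by the Table 4 oligonucleotides as printed.
OLIGO_3P_TAIL = "CCAGTAACATTACTGACTGGCCTATAGTGAGTCGTATTA"

# Sense-strand reading of the tail: promoter first, then the spacer.
SENSE_5P_END = reverse_complement(OLIGO_3P_TAIL)

if __name__ == "__main__":
    print("sense 5' end:", SENSE_5P_END)
    print("promoter + target:", SENSE_5P_END == T7_PROMOTER + TARGET_177)
    # The SEQ ID NO. 175 primer carries four extra 5' bases before the promoter.
    print("primer 175 anneals:", SENSE_5P_END.startswith(PRIMER_175[4:]))
```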
  • TABLE 5
    STEP TEMPERATURE TIME
    DENATURATION 98° C. 30 SEC
    12 CYCLES 98° C. 10 SEC
    66° C. 30 SEC
    72° C. 2 MIN
    FINAL EXTENSION 72° C. 2 MIN
    HOLD 12° C.
  • The target library was designed on the assumption that the PAMs of these nucleases reside at the 3′ end of the target sequence (5′-CCAGTCAGTAATGTTACTGG [SEQ ID NO. 177]); the PAM positions were represented by eight randomized nucleotides, NNNNNNNN [SEQ ID NO. 176].
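  • To make the size of such a library concrete, the following sketch enumerates every target carrying an eight-nucleotide randomized region immediately 3′ of the SEQ ID NO. 177 target sequence. It is an illustration only: the function name is hypothetical, and any vector or adapter sequence flanking the target in the actual library is not given here and is therefore omitted.

```python
from itertools import product
from typing import Iterator

TARGET = "CCAGTCAGTAATGTTACTGG"  # SEQ ID NO. 177
PAM_LENGTH = 8                   # NNNNNNNN, SEQ ID NO. 176


def pam_library(target: str, pam_length: int) -> Iterator[str]:
    """Yield the target followed by every possible PAM of the given length."""
    for pam in product("ACGT", repeat=pam_length):
        yield target + "".join(pam)


if __name__ == "__main__":
    library = list(pam_library(TARGET, PAM_LENGTH))
    print("library members:", len(library))  # 4 ** 8 combinations
    print("first member:", library[0])
```

  • With four bases at each of eight positions, the randomized region spans 4^8 = 65,536 distinct sequences.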
  • Example 5 In Vitro Transcription and Translation for Production of MAD Nucleases and gRNAs
  • The MADZYMEs were tested for activity by in vitro transcription and translation (txtl). Both the gRNA plasmid and nuclease plasmid were included in each txtl reaction. A PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.) was used to produce MADzymes from the PCR-amplified MADZYME library and also to produce the gRNA libraries. In each well in a 96-well plate, the reagents listed in Table 6 were mixed to start the production of MADzymes and gRNAs:
  • TABLE 6
    REAGENTS VOLUME (μl)
    1 SolA (NEB kit) 10
    2 SolB (NEB kit) 7.5
    3 PCR amplified gRNA 0.4
    4 Murine RNase inhibitor (NEB) 0.5
    5 Water 3.0
    6 PCR amplified T7 MADZYMEs 3.6
  • A master mix containing all reagents except the PCR-amplified T7-MADZYMEs was prepared on ice in a volume sufficient for the 96-well plates used in the assay. After 21 μL of the master mix was distributed into each well of the 96-well plates, 4 μL of the mixture of PCR-amplified MADZYMEs and gRNA under the control of the T7 promoter was added. The 96-well plates were sealed and incubated for 4 hrs at 37° C. in a thermal cycler. The plates were kept at room temperature until the target pool was added to perform the target depletion reaction.
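  • The per-well volumes of Table 6 add up as follows: components 1, 2, 4 and 5 total 21 μL, and components 3 and 6 total 4 μL, for a 25 μL reaction. A minimal sketch of scaling the master mix to a number of wells is given below; the function name and the 10% overage default are assumptions for illustration and are not stated in the text.

```python
from typing import Dict

# Per-well volumes in microliters, transcribed from Table 6.
MASTER_MIX_UL: Dict[str, float] = {
    "SolA (NEB kit)": 10.0,
    "SolB (NEB kit)": 7.5,
    "Murine RNase inhibitor (NEB)": 0.5,
    "Water": 3.0,
}
TEMPLATE_UL: Dict[str, float] = {
    "PCR amplified gRNA": 0.4,
    "PCR amplified T7 MADZYMEs": 3.6,
}


def scale_master_mix(n_wells: int, overage: float = 0.10) -> Dict[str, float]:
    """Return the volume of each master-mix component for n_wells wells.

    `overage` is a hypothetical pipetting allowance (fraction of extra volume).
    """
    factor = n_wells * (1.0 + overage)
    return {name: volume * factor for name, volume in MASTER_MIX_UL.items()}


if __name__ == "__main__":
    print("master mix per well (uL):", sum(MASTER_MIX_UL.values()))
    print("template mix per well (uL):", sum(TEMPLATE_UL.values()))
    for name, volume in scale_master_mix(96).items():
        print(f"{name}: {volume:.1f} uL for one 96-well plate")
```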
  • After 4 hours of incubation to allow production of the MADzymes and gRNAs, 4 μL of the target library pool (10 ng/μL) was added to 10 μL aliquots of the in vitro transcription/translation reaction mixture and allowed to deplete for 30 min, 3 hrs or overnight at 37° C. and 48° C. The target depletion reaction mixtures were diluted into PCR-grade water containing RNase A and incubated for 5 min at room temperature. Proteinase K was then added and the mixtures were incubated for 5 min at 55° C. The RNase A/Proteinase K-treated samples were purified with DNA purification kits, and the purified DNA samples were then amplified and sequenced. The PCR conditions are shown in Table 7:
  • TABLE 7
    STEP              TEMPERATURE   TIME
    DENATURATION      98° C.        30 SEC
    4 CYCLES          98° C.        10 SEC
                      66° C.        30 SEC
                      72° C.        20 SEC
    12 CYCLES         98° C.        10 SEC
                      72° C.        20 SEC
    FINAL EXTENSION   72° C.        2 MINUTES
    HOLD              12° C.
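  • The cycling program in Table 7 can be written down as data and its programmed hold time summed, as sketched below. The stage labels and the function name `programmed_seconds` are hypothetical; the total excludes temperature ramping and the final 12° C. hold, so actual run time on an instrument would be longer.

```python
# Each stage: (label, number of cycles, [(temperature in deg C, seconds), ...]).
# Values are taken from Table 7; the final 12 deg C hold is open-ended and omitted.
PCR_PROGRAM = [
    ("denaturation", 1, [(98, 30)]),
    ("first cycling stage", 4, [(98, 10), (66, 30), (72, 20)]),
    ("second cycling stage", 12, [(98, 10), (72, 20)]),
    ("final extension", 1, [(72, 120)]),
]


def programmed_seconds(program):
    """Sum the step durations over all cycles, ignoring ramp times."""
    return sum(
        cycles * sum(seconds for _temp, seconds in steps)
        for _label, cycles, steps in program
    )


total = programmed_seconds(PCR_PROGRAM)
print(f"{total} s ({total / 60:.1f} min) of programmed step time")
```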
  • Example 6 Measurement of Nicked Plasmid with Nickase RNP Complexes
  • Proteins were produced in vitro using a PURExpress® In Vitro Protein Synthesis Kit (NEB, Ipswich, Mass.). Guide RNAs directed against the target plasmid were produced under a T7 promoter in the same mixture. The MADzyme nickase or nuclease and the guide RNA formed complexes (RNP complexes) as they were produced in the in vitro transcription and translation reagent. Supercoiled plasmid target was diluted into the digestion buffer, and the RNP complex was then added to the same digestion buffer to initiate plasmid digestion. After incubation at 37° C. to allow digestion of the plasmid, the resulting mixtures were treated with RNase and Proteinase K; the target plasmid was then purified with a PCR cleanup kit and run on a TAE-agarose gel to observe the formation of nicked or double-strand-cut plasmid. The results are shown in FIG. 7. Table 8 lists the identified MADzyme nickases, including the variations from the nuclease sequence in Table 1 and the amino acid sequence.
  • TABLE 8
    MADzyme Nickase Name | SEQ ID NO | Amino Acid Sequence
    MAD2016- 178 MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRR
    H851A LKRTARRIISRRRNRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWHRHPIFAKLEDEV
    AYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFM
    IIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMV
    GNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADS
    DKKSHAKLSSSMIVRFTEHQEDLKKFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQL
    KFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFL
    KENQEKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERM
    TNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKV
    KKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKI
    LTIFEDRQRIRTQLSTFKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILGYLIK
    DDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIV
    DELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRD
    TRLFLYYMQNGKDMYTGDELSLHRLSHYDIDAIIPQSFMKDDSLDNLVLVGSTENRGKSDDVP
    SKEVVKDMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGI
    LDQRYNANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYP
    NLAPEFVYGEYPKFQTFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKEL
    NYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKG
    KKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKE
    AQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHTLAKSKV
    QQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQ
    STTGLYETRRKVVD
    MAD2016- 179 MKKDYVIGLDIGTNSVGWAVMTEDYQLVKKKMPIYGNTEKKKIKKNFWGVRLFEEGHTAEDRR
    N874A LKRTARRIISRRRNRLRYLQAFFEEAMTDLDENFFARLQESFLVPEDKKWHRHPIFAKLEDEV
    AYHETYPTIYHLRKKLADSSEQADLRLIYLALAHIVKYRGHFLIEGKLSTENISVKEQFQQFM
    IIYNQTFVNGESRLVSAPLPESVLIEEELTEKASRTKKSEKVLQQFPQEKANGLFGQFLKLMV
    GNKADFKKVFGLEEEAKITYASESYEEDLEGILAKVGDEYSDVFLAAKNVYDAVELSTILADS
    DKKSHAKLSSSMIVRFTEHQEDLKKFKRFIRENCPDEYDNLFKNEQKDGYAGYIAHAGKVSQL
    KFYQYVKKIIQDIAGAEYFLEKIAQENFLRKQRTFDNGVIPHQIHLAELQAIIHRQAAYYPFL
    KENQEKIEQLVTFRIPYYVGPLSKGDASTFAWLKRQSEEPIRPWNLQETVDLDQSATAFIERM
    TNFDTYLPSEKVLPKHSLLYEKFMVFNELTKISYTDDRGIKANFSGKEKEKIFDYLFKTRRKV
    KKKDIIQFYRNEYNTEIVTLSGLEEDQFNASFSTYQDLLKCGLTRAELDHPDNAEKLEDIIKI
    LTIFEDRQRIRTQLSTFKGQFSAEVLKKLERKHYTGWGRLSKKLINGIYDKESGKTILGYLIK
    DDGVSKHYNRNFMQLINDSQLSFKNAIQKAQSSEHEETLSETVNELAGSPAIKKGIYQSLKIV
    DELVAIMGYAPKRIVVEMARENQTTSTGKRRSIQRLKIVEKAMAEIGSNLLKEQPTTNEQLRD
    TRLFLYYMQNGKDMYTGDELSLHRLSHYDIDHIIPQSFMKDDSLDNLVLVGSTEARGKSDDVP
    SKEVVKDMKAYWEKLYAAGLISQRKFQRLTKGEQGGLTLEDKAHFIQRQLVETRQITKNVAGI
    LDQRYNANSKEKKVQIITLKASLTSQFRSIFGLYKVREVNDYHHGQDAYLNCVVATTLLKVYP
    NLAPEFVYGEYPKFQTFKENKATAKAIIYTNLLRFFTEDEPRFTKDGEILWSNSYLKTIKKEL
    NYHQMNIVKKVEVQKGGFSKESIKPKGPSNKLIPVKNGLDPQKYGGFDSPIVAYTVLFTHEKG
    KKPLIKQEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRLLASAKE
    AQKGNQMVLPEHLLTLLYHAKQCLLPNQSESLTYVEQHQPEFQEILERVVDFAEVHTLAKSKV
    QQIVKLFEANQTADVKEIAASFIQLMQFNAMGAPSTFKFFQKDIERARYTSIKEIFDATIIYQ
    STTGLYETRRKVVD
    MAD2032- 180 MKYIIGLDMGITSVGFATMMLDDKDEPCRIIRMGSRIFEAAEHPKDGSSLAAPRRINRGMRRR
    H590A LRRKSHRKERIKDLIIKNELMTADEISAIYSTGKQLSDIYQIRAEALDRKLNTEEFVRLLIHL
    SQRRGFKSNRKVDAKEKGSDAGKLLSAVNSNKELMIEKNYRTIGEMLYKDEKFSEYKRNKADD
    YSNTFARSEYEDEIRQIFSAQQEHGNPYATDELKESYLDIYLSQRSFDEGPGGSSPYGGNQIE
    KMIGNCTLEPEEKRAAKATFSFEYFNLLSKVNSIKIVSSSGKRALNNDERQSVIRLAFAKNAI
    SYTSLRKELNMEYSERFNISYSQSDKSIEEIEKKTKFTYLTAYHTFKKAYGSVFVEWSADKKN
    SLAYALTAYKNDTKIIEYLTQKGFDAAETDIALTLPSFSKWGNLSEKALNNIIPYLEQGMLYH
    DACTAAGYNFKADDTDKRMYLPAHEKEAPELDDITNPVVRRAISQTIKVINALIREMGESPCF
    VNIELARELSKNKAERSKIEKGQKENQVRNDRIMERLRNEFGLLSPTGQDLIKLKLWEEQDGI
    CPYSLKPIKIEKLFDVGYTDIDAIIPYSLSFDDTYNNKVLVMSSENRQKGNRIPMQYLEGKRQ
    DDFWLWVDNSNLSRRKKQNLTKETLSEDDLSGFKKRNLQDTQYLSRFMMNYLKKYLALAPNTT
    GRKNTIQAVNGAVTSYLRKRWGIQKVRENGDTHHAVDAVVISCVTAGMTKRVSEYAKYKETEF
    QNPQTGEFFDVDIRTGEVINRFPLPYARFRNELLMRCSENPSRILHEMPLPTYAADEKVAPIF
    VSRMPKHKVKGSAHKETIRRAFEEDGKKYTVSKVPLTDLKLKNGEIENYYNPESDGLLYNALK
    EQUAFGGDAAKAFEQPFYKPKSDGSEGPLVKKVKLINKATLTVPVLNNTAVADNGSMVRVDVF
    FVEGEGYYLVPIYVADTVKKELPNKAIIANKPYEEWKEMREENFVFSLYPNDLIKISSRKDMK
    FNLVNKESTLAPNCQSKEALVYYKGSDISTAAVTAINHDNTYKLRGLGVKTLLKIEKYQVDVL
    GNVFKVGKEKRVRFK
    MAD2039- 181 MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRIFDAAEQPKTGESLAAPRRAARGSRR
    H587A RLRRHRHRNERIRSLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVSRTDFARILLHIS
    QRRGFKSNRKNPTTKEDGVLLAAVNENKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITT
    VARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLSQRSFDEGPGGNSPYGGSQIERMVG
    RCTFFPDEPRAAKATYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIELAHTTKDVSYAK
    IRKVLKLSDKQLFNIRYSDNSPAEDSEKKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNA
    IGTALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKFGHISVKACDMLIPFLEQGMNYNE
    ACAAAGLNFKGHDAGEKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRREGCSPTFINI
    ELAREMAKDFRERNRIKKENDDNRAKNERLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMY
    SLKQMSLEKLFEPNYAEVDAIVPYSISFDDSRKNKVLVLTEENRNKGNRLPLQYLKGRRREDF
    IVWVNNNVKDYRKRRLLLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLEFAESTRGRK
    KKVTAVNGAVTAYMRKRWGITKIREDGDCHHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQT
    ESGSVAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVINDLHLPFYMSAGRPLPEPVFVS
    RMPRRKVTGAAHKDTIKSARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCLYDALKNAL
    IEHGGDAKKAFAGEFRKPKRDGTPGPIVKKVKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLS
    GGKYYLVPIYVADTLKPELPNKAVTRGKKYSEWLEMADEDFIFSLYPNDLICATSKNGITLSV
    CRKDSTLPPTVESKSFMLYYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYTVDVLGEY
    HKVGKEVRQPFNIKRRKACPSEML
    MAD2039- 182 MRPYAIGLDIGITSVGWATVALDADESPCGIIGLGSRIFDAAEQPKTGESLAAPRRAARGSRR
    N610A RLRRHRHRNERIRSLMLEERLISQDELETLFDGRLEDIYALRVKALDEIVSRTDFARILLHIS
    QRRGFKSNRKNPTTKEDGVLLAAVNENKQRMSEHGYRTVGEMFLLDETFKDHKRNKGGNYITT
    VARDMVADEVRAIFSAQRELGASFASEEFEERYLEILLSQRSFDEGPGGNSPYGGSQIERMVG
    RCTFFPDEPRAAKATYSFEYFTLLQKVNHIRIVENGVASKLTDEQRRIIIELAHTTKDVSYAK
    IRKVLKLSDKQLFNIRYSDNSPAEDSEKKEKLGIMKAYHQMRSAIDRVSKGRFAMMPRAQRNA
    IGTALSLYKTSDKIRKYLTDAGLDEIDINSADSIGSFSKFGHISVKACDMLIPFLEQGMNYNE
    ACAAAGLNFKGHDAGEKSKLLHPKEEDYEDITSPVVRRAIAQTIKVINAIIRREGCSPTFINI
    ELAREMAKDFRERNRIKKENDDNRAKNERLLERIRTEYGKNNPTGLDLVKLRLYEEQSGVCMY
    SLKQMSLEKLFEPNYAEVDHIVPYSISFDDSRKNKVLVLTEENRNKGNRLPLQYLKGRRREDF
    IVWVNNNVKDYRKRRLLLKEELTAEDESGFKERNLQDTKTMSRFLLNYIADNLEFAESTRGRK
    KKVTAVNGAVTAYMRKRWGITKIREDGDCHHAVDAVVIACTTDAMIRQVSRYAQFRECEYMQT
    ESGSVAVDTGTGEVLRTFPYPWPDFRKELEARLANDPAKVINDLHLPFYMSAGRPLPEPVFVS
    RMPRRKVTGAAHKDTIKSARELDNGYLIVKRPLTDLKLKNGEIENYYNPQSDKCLYDALKNAL
    IEHGGDAKKAFAGEFRKPKRDGTPGPIVKKVKLLEPTTMCVPVHGGKGAADNDSMVRVDVFLS
    GGKYYLVPIYVADTLKPELPAKAVTRGKKYSEWLEMADEDFIFSLYPNDLICATSKNGITLSV
    CRKDSTLPPTVESKSFMLYYRGTDISTGSISCITHDNAYKLRGLGVKTLEKLEKYTVDVLGEY
    HKVGKEVRQPFNIKRRKACPSEML
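  • The nickase names in Table 8 follow the common substitution notation of original residue, position, new residue (for example, H851A). The sketch below compares two aligned sequences of equal length and reports each difference in that notation. The two fragments are excerpted from the MAD2016-N874A row (which retains the histidine) and the MAD2016-H851A row; the start offset of 846 is an assumption chosen so that the numbering matches the variant name, and the function name `list_substitutions` is hypothetical.

```python
def list_substitutions(parent, variant, start=1):
    """Return point substitutions between two equal-length sequences.

    `start` is the 1-based position of the first residue of the fragment
    within the full-length protein.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{p}{start + i}{v}"
        for i, (p, v) in enumerate(zip(parent, variant))
        if p != v
    ]


# Fragments excerpted from the MAD2016 rows of Table 8.
PARENT_FRAGMENT = "HYDIDHIIPQSF"
VARIANT_FRAGMENT = "HYDIDAIIPQSF"
FRAGMENT_START = 846  # assumed offset of the first residue shown

substitutions = list_substitutions(PARENT_FRAGMENT, VARIANT_FRAGMENT, FRAGMENT_START)
print(substitutions)
```

Running the same comparison over full-length parent and variant sequences (with `start=1`) would list every position at which a nickase differs from its parent nuclease.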
  • While this invention is satisfied by embodiments in many different forms, as described in detail in connection with preferred embodiments of the invention, it is understood that the present disclosure is to be considered as exemplary of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated and described herein. Numerous variations may be made by persons skilled in the art without departure from the spirit of the invention. The scope of the invention will be measured by the appended claims and their equivalents. The abstract and the title are not to be construed as limiting the scope of the present invention, as their purpose is to enable the appropriate authorities, as well as the general public, to quickly determine the general nature of the invention. In the claims that follow, unless the term “means” is used, none of the features or elements recited therein should be construed as means-plus-function limitations pursuant to 35 U.S.C. § 112, ¶6.

Claims (8)

We claim:
1. A system for CRISPR editing of live cells comprising a MAD2015 nuclease having a sequence SEQ ID NO: 1, a CRISPR repeat RNA having a sequence SEQ ID NO: 2, and a tracr RNA having a sequence SEQ ID NO: 3; a MAD2016 nuclease having a sequence SEQ ID NO: 4, a CRISPR repeat RNA having a sequence SEQ ID NO: 5, and a tracr RNA having a sequence SEQ ID NO: 6; a MAD2017 nuclease having a sequence SEQ ID NO: 7, a CRISPR repeat RNA having a sequence SEQ ID NO: 8, and a tracr RNA having a sequence SEQ ID NO: 9; a MAD2019 nuclease having a sequence SEQ ID NO: 10, a CRISPR repeat RNA having a sequence SEQ ID NO: 11, and a tracr RNA having a sequence SEQ ID NO: 12; a MAD2020 nuclease having a sequence SEQ ID NO: 13, a CRISPR repeat RNA having a sequence SEQ ID NO: 14, and a tracr RNA having a sequence SEQ ID NO: 15; a MAD2021 nuclease having a sequence SEQ ID NO: 16, a CRISPR repeat RNA having a sequence SEQ ID NO: 17, and a tracr RNA having a sequence SEQ ID NO: 18; or a MAD2022 nuclease having a sequence SEQ ID NO: 19, a CRISPR repeat RNA having a sequence SEQ ID NO: 20, and a tracr RNA having a sequence SEQ ID NO: 21.
2. The system for CRISPR editing of live cells of claim 1, comprising a MAD2015 nuclease having a sequence SEQ ID NO: 1, a CRISPR repeat RNA having a sequence SEQ ID NO: 2, and a tracr RNA having a sequence SEQ ID NO: 3.
3. The system for CRISPR editing of live cells of claim 1, comprising a MAD2016 nuclease having a sequence SEQ ID NO: 4, a CRISPR repeat RNA having a sequence SEQ ID NO: 5, and a tracr RNA having a sequence SEQ ID NO: 6.
4. The system for CRISPR editing of live cells of claim 1, comprising a MAD2017 nuclease having a sequence SEQ ID NO: 7, a CRISPR repeat RNA having a sequence SEQ ID NO: 8, and a tracr RNA having a sequence SEQ ID NO: 9.
5. The system for CRISPR editing of live cells of claim 1, comprising a MAD2019 nuclease having a sequence SEQ ID NO: 10, a CRISPR repeat RNA having a sequence SEQ ID NO: 11, and a tracr RNA having a sequence SEQ ID NO: 12.
6. The system for CRISPR editing of live cells of claim 1, comprising a MAD2020 nuclease having a sequence SEQ ID NO: 13, a CRISPR repeat RNA having a sequence SEQ ID NO: 14, and a tracr RNA having a sequence SEQ ID NO: 15.
7. The system for CRISPR editing of live cells of claim 1, comprising a MAD2021 nuclease having a sequence SEQ ID NO: 16, a CRISPR repeat RNA having a sequence SEQ ID NO: 17, and a tracr RNA having a sequence SEQ ID NO: 18.
8. The system for CRISPR editing of live cells of claim 1, comprising a MAD2022 nuclease having a sequence SEQ ID NO: 19, a CRISPR repeat RNA having a sequence SEQ ID NO: 20, and a tracr RNA having a sequence SEQ ID NO: 21.
US17/691,018 2021-01-04 2022-03-09 Mad nucleases Pending US20220213458A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/691,018 US20220213458A1 (en) 2021-01-04 2022-03-09 Mad nucleases

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202163133502P 2021-01-04 2021-01-04
US17/463,498 US11306298B1 (en) 2021-01-04 2021-08-31 Mad nucleases
US17/691,018 US20220213458A1 (en) 2021-01-04 2022-03-09 Mad nucleases

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US17/463,498 Continuation US11306298B1 (en) 2021-01-04 2021-08-31 Mad nucleases

Publications (1)

Publication Number Publication Date
US20220213458A1 true US20220213458A1 (en) 2022-07-07

Family

ID=80473291

Family Applications (4)

Application Number Title Priority Date Filing Date
US17/463,498 Active US11306298B1 (en) 2021-01-04 2021-08-31 Mad nucleases
US17/463,581 Active US11268078B1 (en) 2021-01-04 2021-09-01 Nucleic acid-guided nickases
US17/676,218 Active 2042-06-01 US11965186B2 (en) 2021-01-04 2022-02-20 Nucleic acid-guided nickases
US17/691,018 Pending US20220213458A1 (en) 2021-01-04 2022-03-09 Mad nucleases

Country Status (5)

Country Link
US (4) US11306298B1 (en)
EP (2) EP4271802A1 (en)
AU (1) AU2021415461A1 (en)
CA (1) CA3204158A1 (en)
WO (2) WO2022146497A1 (en)

Also Published As

Publication number Publication date
US11965186B2 (en) 2024-04-23
US11268078B1 (en) 2022-03-08
CA3204158A1 (en) 2022-07-07
US20220213457A1 (en) 2022-07-07
US11306298B1 (en) 2022-04-19
WO2022146498A1 (en) 2022-07-07
WO2022146497A1 (en) 2022-07-07
EP4271802A1 (en) 2023-11-08
AU2021415461A1 (en) 2023-08-17
EP4271803A1 (en) 2023-11-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: INSCRIPTA, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIM, JUHAN;MIJTS, BENJAMIN;MIR, AAMIR;REEL/FRAME:059215/0706

Effective date: 20220112

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED