WO2023154892A1 - Rna-guided genome recombineering at kilobase scale - Google Patents


Info

Publication number
WO2023154892A1
Authority
WO
WIPO (PCT)
Prior art keywords
protein
seq
composition
sequence
nucleic acid
Application number
PCT/US2023/062431
Other languages
French (fr)
Inventor
Mengdi WANG
Original Assignee
Possible Medicines Llc
Application filed by Possible Medicines Llc
Publication of WO2023154892A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/16 Aptamers
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2310/30 Chemical structure
    • C12N2310/35 Nature of the modification
    • C12N2310/351 Conjugate
    • C12N2310/3519 Fusion with another nucleic acid

Definitions

  • the present invention relates to RNA-guided recombineering-editing systems using phage recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof.
  • CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
  • for example, disclosed herein are systems comprising a Cas12f protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a recombination protein.
  • the Cas12f protein is a dCas12f1 protein.
  • the Cas12f protein is a dCas12f protein.
  • the Cas12f protein is catalytically dead.
  • the recombination protein may be a single-stranded DNA annealing protein (SSAP), including but not limited to a microbial recombination protein, for example, RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • SSAP: single-stranded DNA annealing protein
  • the system further comprises donor DNA.
  • the target DNA sequence is a genomic DNA sequence in a host cell.
  • the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 6 or derivative or variant or functional portion thereof.
  • the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 6 or derivative or variant or functional portion thereof.
  • the system or composition comprises a donor nucleic acid.
  • the donor nucleic acid comprises homology arms.
  • the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein.
  • the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein.
  • the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
  • the nucleic acid molecule or nucleic acid molecules additionally comprise the at least one RNA aptamer sequence, or comprise one, two, three, or more RNA aptamer sequences.
  • two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
  • the aptamer binding protein comprises an MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences. In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain embodiments, an aptamer sequence comprises a GCN4 peptide sequence.
  • the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
  • the recombination protein and the aptamer binding protein are operably linked by a linker.
  • the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein.
  • NLS: nuclear localization sequence
  • the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
  • the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
  • the recruitment system is adaptable to a multitude of combinations and configurations of recombination proteins.
  • the system can comprise multiple recombination proteins, which may be the same or different and in various ratios.
  • the system comprises an exonuclease.
  • the system comprises an SSAP.
  • the system comprises an SSB.
  • the system comprises an exonuclease and an SSAP.
  • the system comprises an exonuclease and an SSB.
  • the system comprises an SSAP and an SSB.
  • the system comprises an exonuclease and an SSAP and does not comprise an SSB. In certain embodiments, the system comprises an exonuclease and an SSB and does not comprise an SSAP. In certain embodiments, the system comprises an SSAP and an SSB and does not comprise an exonuclease. In certain embodiments, the system comprises an exonuclease, an SSAP, and an SSB.
  • the invention provides a recombination system comprising an SSAP, a Cas12f, and a reverse transcriptase (RT).
  • the invention provides a system or composition comprising: (i) one or more reverse transcriptases (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single-stranded DNA annealing protein (SSAP), or a single-stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii).
  • RT: reverse transcriptase
  • in the RT system or composition, (i) can be an enzyme, (ii) nucleic acid molecule(s), and (iii) nucleic acid molecules; or (i) can be nucleic acid molecule(s) encoding the enzyme(s), (ii) nucleic acid molecule(s), and (iii) a protein; or all of (i), (ii), and (iii) can be nucleic acid molecules.
  • the RT system or composition can include more than one reverse transcriptase. When there is more than one reverse transcriptase there can be more than one RNA for reverse transcription.
  • the composition of (i), (ii), and (iii) further comprises a Cas protein; or (iv) further comprises nucleic acid molecule(s) encoding a Cas protein, e.g., (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contain nucleic acid molecule(s) encoding a Cas protein.
  • Reverse transcriptases that can be used according to the invention include, without limitation, reverse transcriptases, retrotransposon reverse transcriptases, retron reverse transcriptases, LINE-1 reverse transcriptase, Ec86 reverse transcriptase, human immunodeficiency virus (HIV) RT, avian myeloblastosis virus (AMV) RT, Moloney murine leukemia virus (M-MLV) RT, a group II intron RT, a group II intron-like RT, a chimeric RT, Rous sarcoma virus (RSV) reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase, and myeloblastosis-associated virus (MAV) reverse transcriptase or other avian sarcoma-leukosis virus (Avian sarcoma le
  • Such engineered polymerases include, without limitation, human DNA polymerase η, which has reverse transcriptase activity in cellular environments (Su et al. 2019, J. Biol. Chem. 294(15):6073-81), and Taq DNA polymerase engineered to enhance reverse transcription and strand displacement (Barnes et al., Front. Bioeng. Biotechnol., 14 January 2021, doi.org/10.3389/fbioe.2020.553474).
  • compositions comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein.
  • the microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • compositions may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
  • the nucleic acid molecule further comprises at least one RNA aptamer sequence.
  • the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
  • vectors comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein.
  • the recombination protein comprises a microbial recombination protein, including without limitation RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • the vectors may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
  • the nucleic acid molecule further comprises at least one RNA aptamer sequence.
  • the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
  • the RecE and RecT recombination proteins are derived from E. coli.
  • the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to the amino acid sequence of SEQ ID NO:9.
  • linker may be a peptide of 5-30, 10-30, 10-20 or 15 amino acid residues.
  • the linker may be -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:492), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:494).
  • the linker is -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493).
  • the amino acid sequence of SEQ ID NO:561 may be encoded by the nucleic acid sequence of SEQ ID NO:495.
  • a linker is made up of a majority of amino acids that are sterically unhindered, such as glycine and alanine.
  • exemplary linkers are polyglycines (particularly (Gly)5), poly(Gly-Ala), and polyalanines.
  • One exemplary suitable linker, as shown in the Examples below, is a Gly-Ser linker such as -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:492), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:494).
  • Linkers may also be non-peptide linkers.
  • These alkyl linkers may further be substituted by any non-sterically hindering group such as lower alkyl (e.g., C1-4), lower acyl, halogen (e.g., Cl, Br), CN, NH2, phenyl, etc.
  • a eukaryotic cell comprising the systems or vectors disclosed herein.
  • methods of altering a target genomic DNA sequence in a host cell comprise contacting the systems, compositions, or vectors described herein with a target DNA sequence (e.g., introducing the systems, compositions, or vectors described herein into a host cell comprising a target genomic DNA sequence). Kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods are also disclosed herein.
  • Patent law, e.g., they can mean “includes”, “included”, “including”, and the like; and that terms such as “consisting essentially of” and “consists essentially of” have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention.
  • FIG. 1 depicts an SSAP with dCas12f1 protein as a compact editor for precise, large knock-in.
  • FIG. 2 depicts an SSAP with dCas12f1(D225A) protein as a compact editor for precise, large knock-in (knock-in of an mKate transgene).
  • FIG. 3 depicts an SSAP with different versions of dCas12f1, using scaffold no. 4, which gives the best signal-to-noise ratio.
  • the present disclosure is directed to a system and the components for DNA editing.
  • the disclosed system is based on CRISPR targeting and homology-directed repair by phage recombination enzymes.
  • the system results in superior recombination efficiency and accuracy at a kilobase scale.
  • the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the number 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
  • complementary and complementarity refer to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types of pairing.
  • the degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary).
  • Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence.
  • Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%).
  • nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions.
  • Exemplary moderate stringency conditions include overnight incubation at 37° C in a solution comprising 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt’s solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C., or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra.
  • High stringency conditions are conditions that use, for example (1) low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C, (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C., or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt’s solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate
  • a cell has been “genetically modified,” “transformed,” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell.
  • the presence of the exogenous DNA results in permanent or transient genetic change.
  • the transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
  • the transforming DNA may be maintained on an episomal element such as a plasmid.
  • a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication.
  • a “nucleic acid” or a “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively.
  • the present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like.
  • the polymers or oligomers may be heterogenous or homogenous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced.
  • the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
  • a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41(14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506, incorporated herein by reference), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 97: 5633-5638 (2000), incorporated herein by reference), cyclohexenyl nucleic acids (see Wang, J. Am. Chem.
  • nucleic acid or “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single or double-stranded, and represent the sense or antisense strand.
  • nucleic acid refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
  • a “peptide” or “polypeptide” is a linked sequence of two or more amino acids linked by peptide bonds.
  • the peptide or polypeptide can be natural, synthetic, or a modification or combination of natural and synthetic.
  • Polypeptides include proteins such as binding proteins, receptors, and antibodies. The proteins may be modified by the addition of sugars, lipids or other moieties not included in the amino acid chain.
  • the terms “polypeptide” and “protein,” are used interchangeably herein.
  • percent sequence identity refers to the percentage of nucleotides or nucleotide analogs in a nucleic acid sequence, or amino acids in an amino acid sequence, that is identical with the corresponding nucleotides or amino acids in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity.
  • when a nucleic acid according to the technology is longer than a reference sequence, additional nucleotides in the nucleic acid that do not align with the reference sequence are not taken into account for determining sequence identity.
  • Methods and computer programs for alignment are well known in the art, including BLAST, Align 2, and FASTA.
  • a “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert,” may be attached or incorporated so as to bring about the replication of the attached segment in a cell.
  • wild-type refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source.
  • a wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene.
  • modified,” “mutant,” or “polymorphic” refers to a gene or gene product that displays modifications in sequence and or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
  • CRISPR/Cas systems provide immunity by incorporating fragments of invading phage, virus, and plasmid DNA into CRISPR loci and using corresponding CRISPR RNAs (“crRNAs”) to guide the degradation of homologous sequences.
  • crRNAs: CRISPR RNAs
  • Each CRISPR locus encodes acquired “spacers” that are separated by repeat sequences. Transcription of a CRISPR locus produces a “pre-crRNA,” which is processed to yield crRNAs containing spacer-repeat fragments that guide effector nuclease complexes to cleave dsDNA sequences complementary to the spacer.
  • Three types of CRISPR systems, type I, type II, and type III, were initially described, classified based on the Cas protein type and the use of a protospacer adjacent motif (PAM) for selection of protospacers in invading DNA.
  • the endogenous type II systems comprise the Cas9 protein and two noncoding crRNAs: trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array containing nuclease guide sequences (also referred to as “spacers”) interspaced by identical direct repeats (DRs).
  • tracrRNA is important for processing the pre-crRNA and formation of the Cas9 complex.
  • tracrRNAs hybridize to repeat regions of the pre-crRNA.
  • each mature complex locates a target double stranded DNA (dsDNA) sequence and cleaves both strands using the nuclease activity of Cas9.
  • dsDNA: double-stranded DNA
  • CRISPR/Cas gene editing systems have been developed to enable targeted modifications to a specific gene of interest in eukaryotic cells.
  • CRISPR/Cas gene editing systems are commonly based on the RNA-guided Cas9 nuclease from the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system.
  • Engineering CRISPR/Cas systems for use in eukaryotic cells typically involves reconstitution of the crRNA-tracrRNA-Cas9 complex.
  • the Cas9 amino acid sequence may be codon-optimized and modified to include an appropriate nuclear localization signal, and the crRNA and tracrRNA sequences may be expressed individually or as a single chimeric molecule via an RNA polymerase II promoter.
  • the crRNA and tracrRNA sequences are expressed as a chimera and are referred to collectively as “guide RNA” (gRNA) or single guide RNA (sgRNA).
  • gRNA: guide RNA
  • sgRNA: single guide RNA
  • synthetic guide RNA are used interchangeably herein and refer to a nucleic acid sequence comprising a tracrRNA and a pre-crRNA array containing a guide sequence.
  • guide sequence refers to the approximately 20-nucleotide sequence within a guide RNA that specifies the target site.
  • the guide RNA contains an approximately 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas9 via Watson-Crick base pairing to a target sequence.
  • PAM: protospacer adjacent motif
  • the disclosure provides a system for RNA-guided recombineering utilizing tools from CRISPR gene editing systems.
  • the system comprises: a Cas12f protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a microbial recombination protein.
  • Cas protein families are described in further detail in, e.g., Haft et al., PLoS Comput. Biol., 1(6): e60 (2005), incorporated herein by reference.
  • the Cas protein is Cas12f.
  • the amino acid sequences of Cas proteins from a variety of species are publicly available through the GenBank and UniProt databases.
  • Class 2 CRISPR systems are exceptionally diverse; nevertheless, all share a single effector protein that contains a conserved RuvC-like nuclease domain.
  • Cas12f, also known as Cas14, is the smallest class 2 CRISPR-Cas effector reported to date, with a length of ~400-700 amino acids (Harrington et al., Science. 2018; 362:839-842).
  • Cas12f proteins were identified almost exclusively within a superphylum of symbiotic archaea, DPANN (Harrington et al., Science. 2018; 362:839-842). Initially found to be specific for ssDNA, Cas12f was recently reported to also recognize dsDNA with 5' T-rich protospacer adjacent motifs (PAMs) (Karvelis et al., Nucleic Acids Res. 2020; 48:5016-5023). Cas12f associates with a crRNA and a tracrRNA, which can be fused into a single guide RNA (sgRNA), to target substrate DNA.
  • Cas12f is a Mg2+-dependent endonuclease that functions best at low salt concentrations and at ~46 °C (Karvelis et al., Nucleic Acids Res. 2020; 48:5016-5023). Similar to other Cas12 nucleases, Cas12f is capable of cleaving non-specific ssDNA in trans after binding complementary target DNA, thus enabling its development for nucleic acid detection (Harrington et al., Science. 2018; 362:839-842).
  • Type-V effectors employ multiple domains distributed in a recognition lobe (REC) and a nuclease lobe (NUC) for substrate recognition and cleavage.
  • the REC lobe is responsible for substrate recognition, whereas the NUC lobe contains a nuclease site located within the RuvC domain.
  • a miniature Cas12f, which is only about half the size of other Cas12 nucleases, fulfills all functional requirements for target recognition and cleavage.
  • the Cas12f protein is a catalytically dead Cas12f.
  • Cas12f may be modified by introducing codons encoding the D228A and D225A substitutions into the Cas12f gene (see, e.g., Bigelyte et al., NATURE COMMUNICATIONS (2021) 12:6191 https://doi.org/10.1038/s41467-021-26469-4).
  • the system comprises a nucleic acid molecule comprising a guide RNA sequence complementary to a target DNA sequence.
  • the guide RNA sequence specifies the target site with an approximately 20-nucleotide guide sequence followed by a protospacer adjacent motif (PAM) that directs Cas12f via Watson-Crick base pairing to a target sequence.
  • target DNA sequence refers to a polynucleotide (nucleic acid, gene, chromosome, genome, etc.) to which a guide sequence (e.g., a guide RNA) is designed to have complementarity, wherein hybridization between the target sequence and a guide sequence promotes the formation of a Cas12f/CRISPR complex, provided sufficient conditions for binding exist.
  • the target sequence is a genomic DNA sequence.
  • genomic refers to a nucleic acid sequence (e.g., a gene or locus) that is located on a chromosome in a cell.
  • the target sequence and guide sequence need not exhibit complete complementarity, provided that there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex.
  • a target sequence may comprise any polynucleotide, such as DNA or RNA.
  • Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, referenced herein and incorporated by reference.
  • the strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the DNA-targeting RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.”
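The relationships above (guide sequence, PAM, complementary and non-complementary strands, tolerated mismatches) can be illustrated with a short Python sketch. It assumes, purely for illustration, a 5'-TTTR-3' PAM lying immediately 5' of a 20-nt protospacer; that PAM rule, the function names, and the toy sequence are hypothetical and not taken from this disclosure. The mismatch count is one simple way to quantify departure from full complementarity; the text itself sets no numeric threshold.

```python
# Sketch only. ASSUMPTION: a 5'-TTTR-3' PAM (R = A or G) lying immediately 5' of a
# 20-nt protospacer on the non-complementary strand; not specified in this section.
SPACER_LEN = 20

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# DNA base expected opposite each guide RNA base in a Watson-Crick pair.
RNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (both written 5'->3')."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def find_protospacers(dna: str, spacer_len: int = SPACER_LEN) -> list[tuple[int, str]]:
    """Scan one strand for TTTR PAMs and return (start, protospacer) pairs."""
    hits = []
    for i in range(len(dna) - 4 - spacer_len + 1):
        pam = dna[i:i + 4]
        if pam[:3] == "TTT" and pam[3] in "AG":
            hits.append((i + 4, dna[i + 4:i + 4 + spacer_len]))
    return hits


def guide_from_protospacer(protospacer: str) -> str:
    """The guide matches the non-complementary strand, with U in place of T."""
    return protospacer.replace("T", "U")


def count_mismatches(guide_rna: str, complementary_strand: str) -> int:
    """Count positions that are not Watson-Crick paired; both inputs are 5'->3'."""
    partner = complementary_strand[::-1]  # antiparallel alignment with the guide
    return sum(1 for g, d in zip(guide_rna, partner) if RNA_PARTNER[g] != d)


dna = "GGTTTA" + "ACGT" * 5 + "CC"
start, protospacer = find_protospacers(dna)[0]
guide = guide_from_protospacer(protospacer)
complementary = reverse_complement(protospacer)
print(start, guide, count_mismatches(guide, complementary))  # 6 ACGUACGUACGUACGUACGU 0
```

The guide is written to match the non-complementary strand so that it base-pairs with the complementary strand, mirroring the strand terminology defined above.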
  • the target genomic DNA sequence may encode a gene product.
  • gene product refers to any biochemical product resulting from expression of a gene.
  • Gene products may be RNA or protein.
  • RNA gene products include non-coding RNA, such as tRNA, rRNA, micro RNA (miRNA), and small interfering RNA (siRNA), and coding RNA, such as messenger RNA (mRNA).
  • the target genomic DNA sequence encodes a protein or polypeptide.
  • when the system includes a catalytically dead Cas12f, two nucleic acid molecules comprising a guide RNA sequence may be utilized.
  • the two nucleic acid molecules may have the same or different guide RNA sequences, and thus may be complementary to the same or different target DNA sequences.
  • the guide RNA sequences of the two nucleic acid molecules are complementary to target DNA sequences at opposite ends (e.g., 3' or 5') and/or on opposite strands of the insert location.
  • the system further comprises a recruitment system comprising at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein.
  • the aptamer sequence is an RNA aptamer sequence.
  • the nucleic acid molecule comprising the guide RNA also comprises one or more RNA aptamers, or distinct RNA secondary structures or sequences that can recruit and bind another molecular species, an adaptor molecule, such as a nucleic acid or protein.
  • the RNA aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target molecular species.
  • the nucleic acid comprises two or more aptamer sequences.
  • the aptamer sequences may be the same or different and may target the same or different adaptor proteins.
  • the nucleic acid comprises two aptamer sequences.
  • any known RNA aptamer/aptamer binding protein pair may be selected and used in connection with the present disclosure (see, e.g., Jayasena, S.D., Clinical Chemistry, 1999. 45(9): p. 1628-1650; Gelinas, et al., Current Opinion in Structural Biology, 2016. 36: p. 122-132; and Hasegawa, H., Molecules, 2016; 21(4): p. 421, incorporated herein by reference).
  • a variety of RNA aptamer binding, or adaptor, proteins exist, including a diverse array of bacteriophage coat proteins.
  • coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, …
  • the RNA aptamer binds MS2 bacteriophage coat protein or a functional derivative, fragment or variant thereof.
  • MS2-binding RNA aptamers commonly have a simple stem-loop structure, classically defined by a 19-nucleotide RNA molecule with a single bulged adenine on the 5' leg of the stem (Witherell G.W., et al., (1991) Prog. Nucleic Acid Res. Mol. Biol., 40, 185-220, incorporated herein by reference).
  • for MS2 aptamers and the MS2 coat protein, see, e.g., Parrott AM, et al., Nucleic Acids Res. 2000;28(2):489-497, and Buenrostro JD, et al., Nature Biotechnology 2014;32:562-568, incorporated herein by reference.
  • the MS2 RNA aptamer sequence comprises: AACAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO: 145), AGCAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO: 146), or AGCGUGAGGAUCACCCAUGCCUGCAG (SEQ ID NO: 147).
  • N-proteins (Nut-utilization site proteins) of bacteriophages contain arginine-rich conserved RNA recognition motifs of ~20 amino acids, referred to as N peptides.
  • the RNA aptamer may bind a phage N peptide or a functional derivative, fragment or variant thereof.
  • the phage N peptide is the lambda or P22 phage N peptide or a functional derivative, fragment or variant thereof.
  • the N peptide is lambda phage N22 peptide, or a functional derivative, fragment or variant thereof.
  • the N22 peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNARTRRRERRAEKQAQWKAAN (SEQ ID NO: 149).
  • the N22 peptide, the 22-amino-acid RNA-binding domain of the λ bacteriophage antiterminator protein N (λN-(1-22) or λN peptide), binds specific stem-loop structures, including but not limited to the BoxB stem-loop. See, for example, Cilley and Williamson, RNA 1997; 3(1):57-67, incorporated herein by reference.
  • the N22 peptide RNA aptamer sequence comprises a nucleotide sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCCCUGAAAAAGGGC (SEQ ID NO: 150), GCCCUGAAGAAGGGC (SEQ ID NO: 151), GCGCUGAAAAAGCGC (SEQ ID NO: 152), GCCCUGACAAAGGGC (SEQ ID NO: 153), and GCGCUGACAAAGCGC (SEQ ID NO: 154).
  • the N22 peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 150-154.
  • the N peptide is the P22 phage N peptide, or a functional derivative, fragment or variant thereof.
  • a number of different BoxB stem-loop primary sequences are known to bind the P22 phage N peptide and variants thereof and any of those may be utilized in connection with the present disclosure. See, for example Cocozaki, Ghattas, and Smith, Journal of Bacteriology 2008; 190(23):7699-7708, incorporated herein by reference.
  • the P22 phage N peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNAKTRRHERRRKLAIERDTI (SEQ ID NO: 155).
  • the P22 phage N peptide RNA aptamer sequence comprises a sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCGCUGACAAAGCGC (SEQ ID NO: 156) and CCGCCGACAACGCGG (SEQ ID NO: 157). In some embodiments, the P22 phage N peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 156-157, UGCGCUGACAAAGCGCG (SEQ ID NO: 158), and ACCGCCGACAACGCGGU (SEQ ID NO: 159).
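To see why the BoxB-type aptamers above behave as hairpins, a naive check can pair bases inward from both ends of the sequence and report the stem length and the remaining loop. This Python sketch is an illustrative simplification, not a folding algorithm: it ignores bulges and thermodynamics, and allowing G-U wobble pairs is an optional assumption. A bulged design such as the MS2 aptamer would need a real folding tool (for example ViennaRNA's RNAfold). The function name is hypothetical.

```python
# Naive terminal hairpin check; no bulges, no free-energy model.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}  # optional G-U pairs (an assumption)


def terminal_stem(rna: str, allow_wobble: bool = False, min_loop: int = 3) -> tuple[int, str]:
    """Pair bases inward from both ends; return (stem length in bp, loop sequence)."""
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    i, j = 0, len(rna) - 1
    # Keep pairing while a loop of at least min_loop nucleotides would remain.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in allowed:
        i += 1
        j -= 1
    return i, rna[i:j + 1]


for name, seq in [("SEQ ID NO: 150", "GCCCUGAAAAAGGGC"),
                  ("SEQ ID NO: 157", "CCGCCGACAACGCGG")]:
    print(name, terminal_stem(seq))
```

Applied to SEQ ID NO: 150, the check reports a 5-bp stem closing a GAAAA loop; SEQ ID NO: 157 yields only a 4-bp terminal stem under these simple rules.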
  • the aptamer sequence is a peptide aptamer sequence.
  • the peptide aptamers can be naturally occurring or synthetic peptides that are specifically recognized by an affinity agent.
  • Such aptamers include, but are not limited to, a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a 7x His tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, or a VSV-G epitope.
  • Corresponding aptamer binding proteins are well-known in the art and include, for example, primary antibodies, biotin, affimers, single domain antibodies, and antibody mimetics.
  • An exemplary peptide aptamer includes a GCN4 peptide (Tanenbaum et al., Cell 2014; 159(3):635-646, incorporated herein by reference).
  • Antibodies or other GCN4-binding proteins can be used as the aptamer binding proteins.
  • the peptide aptamer sequence is conjugated to the Cas12f protein.
  • the peptide aptamer sequence may be fused to Cas12f in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus).
  • the peptide aptamer is fused to the C-terminus of the Cas12f protein.
  • multiple peptide aptamer sequences may be conjugated to the Cas12f protein.
  • the aptamer sequences may be the same or different and may target the same or different aptamer binding proteins.
  • 1 to 24 tandem repeats of the same peptide aptamer sequence are conjugated to the Cas12f protein.
  • between 4 and 18 tandem repeats are conjugated to the Cas12f protein.
  • the individual aptamers may be separated by a linker region. Suitable linker regions are known in the art. The linker may be flexible or configured to allow the binding of affinity agents to adjacent aptamers without or with decreased steric hindrance.
  • the linker sequences may provide an unstructured or linear region of the polypeptide, for example, with the inclusion of one or more glycine and/or serine residues.
  • the linker sequences can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length.
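The tandem-repeat and linker arrangement above can be sketched as simple string assembly. The FLAG octapeptide (DYKDDDDK) is one of the peptide aptamers listed earlier; the GGGGS glycine/serine linker, the placeholder Cas12f sequence, and the function names are hypothetical choices for illustration, not sequences from this disclosure.

```python
FLAG = "DYKDDDDK"    # FLAG octapeptide
GS_LINKER = "GGGGS"  # hypothetical Gly/Ser linker chosen for illustration


def tandem_array(peptide: str, copies: int, linker: str = GS_LINKER) -> str:
    """Join `copies` repeats of a peptide aptamer, separated by a linker."""
    if not 1 <= copies <= 24:
        raise ValueError("the text describes 1 to 24 tandem repeats")
    return linker.join([peptide] * copies)


def fuse_c_terminal(protein: str, array: str, linker: str = GS_LINKER) -> str:
    """Append an aptamer array to a protein's C-terminus through a linker."""
    return protein + linker + array


cas12f_placeholder = "MCAS"  # hypothetical stand-in, not a real Cas12f sequence
fusion = fuse_c_terminal(cas12f_placeholder, tandem_array(FLAG, 4))
print(len(fusion), fusion)
```

A flexible Gly/Ser linker is used here only because the text mentions glycine and/or serine residues as a way to keep the region unstructured.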
  • the fusion protein comprises a microbial recombination protein functionally linked to an aptamer binding protein.
  • the microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
  • the microbial recombination protein is RecE or RecT, or a derivative or variant thereof.
  • Derivatives or variants of RecE and RecT are functionally equivalent proteins or polypeptides which possess substantially similar function to wild type RecE and RecT.
  • RecE and RecT derivatives or variants include biologically active amino acid sequences similar to the wild-type sequences but differing due to amino acid substitutions, additions, deletions, truncations, post-translational modifications, or other modifications.
  • the derivatives may improve translation, purification, biological half-life, activity, or eliminate or lessen any undesirable side effects or reactions.
  • the derivatives or variants may be naturally occurring polypeptides, synthetic or chemically synthesized polypeptides, or genetically engineered polypeptides.
  • RecE and RecT bioactivities are known to, and easily assayed by, those of ordinary skill in the art, and include, for example exonuclease and single-stranded nucleic acid binding, respectively.
  • the RecE or RecT may be from a number of microbial organisms, including Escherichia coli, Pantoea brenneri, Type-F symbiont of Plautia stali, Providencia sp. MGF014, Shigella sonnei, and Pseudobacteriovorax antillogorgiicola, among others.
  • the RecE and RecT proteins are derived from Escherichia coli.
  • the fusion protein comprises RecE, or a derivative or variant thereof.
  • the RecE, or derivative or variant thereof may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecE, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
  • the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In exemplary embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
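The percentage thresholds above can be checked, in the simplest case, with an ungapped position-by-position identity calculation. This is a stand-in: the disclosure speaks of "similarity," which is usually computed from an alignment (possibly with a substitution matrix), so the method, the function names, and the placeholder reference sequence below are assumptions for illustration only.

```python
# Threshold tiers recited in the text, in percent.
TIERS = (70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)


def percent_identity(query: str, reference: str) -> float:
    """Ungapped identity over the reference length (equal-length inputs only)."""
    if len(query) != len(reference):
        raise ValueError("this sketch compares equal-length, ungapped sequences only")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def tiers_met(query: str, reference: str, tiers: tuple = TIERS) -> list[int]:
    """Return every threshold tier that the query meets or exceeds."""
    pid = percent_identity(query, reference)
    return [t for t in tiers if pid >= t]


reference = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical placeholder, not SEQ ID NO: 1
variant = "WWW" + reference[3:]     # three substitutions -> 17/20 identical
print(percent_identity(variant, reference), tiers_met(variant, reference))
```

For a real RecE or RecT variant, an alignment-based score would replace `percent_identity`; the tier filtering would stay the same.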
  • the fusion protein comprises RecT, or a derivative or variant thereof.
  • the RecT, or derivative or variant thereof may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
  • the RecT, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
  • the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In exemplary embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to SEQ ID NO: 9.
  • Truncations may be from either the C-terminal or N-terminal end, or both. For example, as demonstrated below, a diverse set of truncations from either end or both provided a functional product. In some embodiments, one or more (2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 100, 120 or more) amino acids may be truncated from the C-terminal end, the N-terminal end, or both, as compared to the wild-type sequence.
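A panel of N- and C-terminal truncations like those described above can be enumerated by slicing. In this Python sketch the placeholder sequence and function name are hypothetical; in practice an N-terminally truncated open reading frame would also need a start codon, which the sketch does not model.

```python
def truncation_variants(sequence: str, n_terminal_cuts, c_terminal_cuts) -> dict:
    """Map (residues removed at N-terminus, at C-terminus) to the truncated sequence."""
    variants = {}
    for n in n_terminal_cuts:
        for c in c_terminal_cuts:
            if n + c < len(sequence):  # skip combinations that remove everything
                variants[(n, c)] = sequence[n:len(sequence) - c]
    return variants


wild_type = "MKTAYIAKQR"  # hypothetical placeholder, not a RecE or RecT sequence
for cuts, seq in truncation_variants(wild_type, [0, 2], [0, 3]).items():
    print(cuts, seq)
```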
  • the microbial recombination protein may be linked to either terminus of the aptamer binding protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus).
  • the microbial recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
  • the overall fusion protein from N- to C-terminus comprises the aptamer binding protein (N- to C- terminus) linked to the microbial recombination protein (N- to C-terminus).
  • the recombination protein may be functionally linked, as a fusion protein, chimera, or chimeric molecule, to a nuclease (including an endonuclease or an exonuclease) and/or to a Cas or dCas.
  • alternatively, the recombination protein may be expressed independently of, rather than as a fusion protein with, a nuclease (including an endonuclease or an exonuclease) and/or a Cas or dCas.
  • the recombination protein may be functionally linked, as a fusion protein, chimera, or chimeric molecule, to an aptamer and/or aptamer binding protein, or may be expressed independently of, rather than as a fusion protein with, an aptamer and/or aptamer binding protein.
  • the recombination protein may be functionally linked, as a fusion protein, chimera, or chimeric molecule, to a nuclease and/or a Cas or dCas and/or to an aptamer and/or aptamer binding protein.
  • the recombination protein may be expressed independently of, rather than as a fusion protein with, a nuclease and/or a Cas or dCas and/or an aptamer and/or aptamer binding protein.
  • the aptamer and/or aptamer binding protein is MCP (MS2 coat protein).
  • the recombination protein may be an SSAP (single-strand annealing protein).
  • nuclease refers to an agent, such as a protein or small molecule, that is capable of cleaving phosphodiester bonds that join nucleotide residues in a nucleic acid molecule.
  • the nuclease is a protein, e.g., an enzyme that is capable of binding to a nucleic acid molecule and cleaving phosphodiester bonds linking nucleotide residues in the nucleic acid molecule.
  • the nuclease may be an endonuclease, which cleaves a phosphodiester bond in a polynucleotide strand, or an exonuclease, which cleaves a phosphodiester bond at the end of a polynucleotide strand.
  • the nuclease is a site-specific nuclease that binds to and/or cleaves a particular phosphodiester bond within a particular nucleotide sequence, which is also referred to herein as a "recognition sequence," "nuclease target site", or "target site”.
  • the nuclease is an RNA-guided (i.e., RNA-programmable) nuclease that complexes (e.g., binds) to RNA having a sequence complementary to the target site, thereby providing sequence specificity of the nuclease.
  • the nuclease recognizes a single-stranded target site, while in other embodiments, the nuclease recognizes a double-stranded target site, e.g., a double-stranded DNA target site.
  • Target sites for many naturally occurring nucleases, for example, many naturally occurring DNA restriction nucleases, are well known to those skilled in the art.
  • DNA nucleases such as EcoRI, HindIII, or BamHI recognize palindromic double-stranded DNA target sites that are 4 to 10 base pairs in length and cut each of the two DNA strands at specific positions within the target site.
  • Some endonucleases symmetrically cleave a double-stranded nucleic acid target site, i.e., cleave both strands at the same position, such that the ends comprise base-paired nucleotides, also referred to herein as blunt ends.
  • other endonucleases cleave double-stranded nucleic acid target sites asymmetrically, i.e., each strand is cleaved at a different position such that the ends contain unpaired nucleotides.
  • Unpaired nucleotides at the ends of a double-stranded DNA molecule are also referred to as "overhangs," e.g., "5'-overhangs" or "3'-overhangs," depending on whether the unpaired nucleotides form the 5' or 3' end of the corresponding DNA strand.
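The palindrome and overhang logic above can be made concrete: for a palindromic site, the bottom strand is cut at the mirror image of the top-strand cut, so comparing the two positions tells whether the ends are blunt, 5'-overhanging, or 3'-overhanging. In this Python sketch the recognition sites and cut offsets are the standard ones for the named enzymes; EcoRV (blunt) and PstI (3'-overhang) are added for contrast and are not mentioned in the text. Function names are hypothetical.

```python
# name -> (recognition site 5'->3', top-strand cut offset from the site's 5' end)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "EcoRV": ("GATATC", 3),  # blunt cutter, added for contrast
    "PstI": ("CTGCAG", 5),   # 3'-overhang cutter, added for contrast
}

_COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMP)[::-1]


def is_palindromic(site: str) -> bool:
    """A double-stranded palindrome reads the same on both strands 5'->3'."""
    return site == reverse_complement(site)


def end_type(site: str, top_cut: int) -> tuple[str, str]:
    """Classify the ends left by cutting a palindromic site on both strands."""
    bottom_cut = len(site) - top_cut  # mirrored cut, in top-strand coordinates
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        return ("5'-overhang", site[top_cut:bottom_cut])
    return ("3'-overhang", site[bottom_cut:top_cut])


def find_sites(dna: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of a site."""
    return [i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)]


for name, (site, cut) in ENZYMES.items():
    print(name, site, is_palindromic(site), end_type(site, cut))
```

For EcoRI this reports a 5'-overhang of AATT, matching the familiar sticky end.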
  • Nuclease proteins typically comprise a "binding domain” that mediates interaction of the protein with a nucleic acid substrate (in some cases also specifically binding to a target site) and a "cleavage domain” that catalyzes the cleavage of phosphodiester bonds within the nucleic acid backbone.
  • the nuclease protein is capable of binding and cleaving a nucleic acid molecule in a monomeric form, while in other embodiments, the nuclease protein must dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding and cleavage domains of naturally occurring nucleases, as well as modular binding and cleavage domains that can be fused to create nucleases, are well known to those of skill in the art.
  • a zinc finger or transcriptional activator-like element can be used as a binding domain to specifically bind a desired target site and fused or conjugated to a cleavage domain, such as the cleavage domain of FokI, to create an engineered nuclease that cleaves the target site.
  • Non-limiting examples of an exonuclease include exonuclease I, exonuclease II, exonuclease III, exonuclease IV, exonuclease V, exonuclease VII, exonuclease VIII, lambda exonuclease, Xml, mung bean nuclease, TREX2, exonuclease T, T7 exonuclease, strandase exonuclease, 3'-5' exophosphodiesterase, and Bal31 nuclease.
  • the fusion protein further comprises a linker between the microbial recombination protein and the aptamer binding protein.
  • the linkers may comprise any amino acid sequence of any length.
  • the linkers may be flexible such that they do not constrain either of the two components they link together in any particular orientation.
  • the linkers may essentially act as a spacer.
  • the linker links the C-terminus of the microbial recombination protein to the N-terminus of the aptamer binding protein.
  • the linker comprises the amino acid sequence of the 16-residue XTEN linker, SGSETPGTSESATPES (SEQ ID NO: 15) or the 37-residue EXTEN linker, SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS (SEQ ID NO: 148).
  • the fusion protein further comprises a nuclear localization sequence (NLS).
  • the nuclear localization sequence may be at any location within the fusion protein (e.g., C-terminal of the aptamer binding protein, N-terminal of the aptamer binding protein, C-terminal of the microbial recombination protein).
  • the nuclear localization sequence is linked to the C-terminus of the microbial recombination protein.
  • a number of nuclear localization sequences are known in the art (see, e.g., Lange, A., et al., J Biol Chem. 2007; 282(8): 5101-5105, incorporated herein by reference) and may be used in connection with the present invention.
  • the nuclear localization sequence may be the SV40 NLS, PKKKRKV (SEQ ID NO: 16); the Ty1 NLS, NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH (SEQ ID NO: 17); the c-Myc NLS, PAAKRVKLD (SEQ ID NO: 18); the biSV40 NLS, KRTADGSEFESPKKKRKV (SEQ ID NO: 19); or the Mut NLS, PEKKRRRPSGSVPVLARPSPPKAGKSSCI (SEQ ID NO: 20).
  • the nuclear localization sequence is the SV40 NLS, PKKKRKV (SEQ ID NO: 16).
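The domain order described above can be written out as an annotated concatenation. This Python sketch follows one embodiment: aptamer binding protein, then the 16-residue XTEN linker (SEQ ID NO: 15), then the recombination protein with the SV40 NLS (SEQ ID NO: 16) on its C-terminus. The binder and recombinase strings are hypothetical placeholders, not real MCP or RecT sequences, and the function name is illustrative. Other orientations described in the text would simply reorder the domain list.

```python
XTEN16 = "SGSETPGTSESATPES"  # SEQ ID NO: 15
SV40_NLS = "PKKKRKV"         # SEQ ID NO: 16


def assemble_fusion(domains):
    """Concatenate (name, sequence) domains N->C; record 0-based [start, end) spans."""
    sequence = ""
    spans = []
    for name, domain_seq in domains:
        start = len(sequence)
        sequence += domain_seq
        spans.append((name, start, len(sequence)))
    return sequence, spans


# Hypothetical placeholder sequences for illustration only.
binder = "M" + "A" * 9
recombinase = "M" + "G" * 19

fusion, spans = assemble_fusion([
    ("aptamer binding protein", binder),
    ("XTEN linker", XTEN16),
    ("recombination protein", recombinase),
    ("SV40 NLS", SV40_NLS),
])
for name, start, end in spans:
    print(f"{name}: residues {start + 1}-{end}")
```

Recording the spans alongside the sequence keeps the domain boundaries available for annotation when linkers or tags are swapped.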
  • the Cas12f protein and the fusion protein are desirably included in a single composition, either alone, in combination with each other, and/or in combination with the polynucleotide(s) (e.g., a vector) comprising the guide RNA sequence and the aptamer sequence.
  • the Cas12f protein and/or the fusion protein may or may not be physically or chemically bound to the polynucleotide.
  • the Cas12f protein and/or the microbial recombination protein can be associated with a polynucleotide using any suitable method for protein-protein linking or protein-virus linking known in the art.
  • compositions and vectors comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an RNA aptamer binding protein.
  • compositions or vectors may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
  • the nucleic acid molecule comprising a guide RNA sequence further comprises at least one RNA aptamer sequence.
  • the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
  • the descriptions of the nucleic acid molecule comprising a guide RNA sequence, the aptamer sequences, the Cas12f proteins, the microbial recombination proteins, and the aptamer binding proteins set forth above in connection with the inventive system also apply to the polynucleotides of the recited compositions and vectors.
  • the invention involves vectors, e.g. for delivering or introducing in a cell, but also for propagating these components (e.g. in prokaryotic cells).
  • a "vector” is a tool that allows or facilitates the transfer of an entity from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment.
  • a vector is capable of replication when associated with the proper control elements.
  • the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
  • Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, or no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
  • plasmid refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques.
  • a viral vector is one in which virally derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., …).
  • Viral vectors also include polynucleotides carried by a virus for transfection into a host cell.
  • Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
  • Other vectors e.g., non-episomal mammalian vectors are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
  • certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "expression vectors." Vectors that direct expression in a eukaryotic cell can be referred to herein as "eukaryotic expression vectors." Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.
  • Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed.
  • "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
  • Vector delivery (e.g., plasmid or viral delivery):
  • the CRISPR enzyme, for instance a Type V protein such as Cas12f, and/or any of the present RNAs, for instance a guide RNA, can be delivered using any suitable vector, e.g., plasmid or viral vectors, such as adeno-associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof.
  • Effector proteins and one or more guide RNAs can be packaged into one or more vectors, e.g., plasmid or viral vectors.
  • the vector e.g., plasmid or viral vector is delivered to the tissue of interest by, for example, an intramuscular injection, while other times the delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose, or multiple doses.
  • the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choice, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
  • Such a dosage may further contain, for example, a carrier (water, saline, ethanol, glycerol, lactose, sucrose, calcium phosphate, gelatin, dextran, agar, pectin, peanut oil, sesame oil, etc.), a diluent, a pharmaceutically-acceptable carrier (e.g., phosphate-buffered saline), a pharmaceutically-acceptable excipient, and/or other compounds known in the art.
  • the dosage may further contain one or more pharmaceutically acceptable salts such as, for example, a mineral acid salt such as a hydrochloride, a hydrobromide, a phosphate, a sulfate, etc.; and the salts of organic acids such as acetates, propionates, malonates, benzoates, etc.
  • auxiliary substances such as wetting or emulsifying agents, pH buffering substances, gels or gelling materials, flavorings, colorants, microspheres, polymers, suspension agents, etc. may also be present herein.
  • Suitable exemplary ingredients include microcrystalline cellulose, carboxymethylcellulose sodium, polysorbate 80, phenylethyl alcohol, chlorobutanol, potassium sorbate, sorbic acid, sulfur dioxide, propyl gallate, the parabens, ethyl vanillin, glycerin, phenol, parachlorophenol, gelatin, albumin and a combination thereof.
  • the delivery is via an adenovirus, which may be administered as a single booster dose containing at least 1 × 10⁵ particles (also referred to as particle units, pu) of adenoviral vector.
  • the dose preferably is at least about 1 × 10⁶ particles (for example, about 1 × 10⁶ to 1 × 10¹¹ particles), more preferably at least about 1 × 10⁷ particles, more preferably at least about 1 × 10⁸ particles (e.g., about 1 × 10⁸ to 1 × 10¹¹ particles or about 1 × 10⁹ to 1 × 10¹² particles), and most preferably at least about 1 × 10¹⁰ particles (e.g., about 1 × 10⁹ to 1 × 10¹⁰ particles or about 1 × 10⁹ to 1 × 10¹² particles), or even at least about 1 × 10¹⁰ particles (e.g., about 1 × 10¹⁰ to 1 × 10¹² particles) of the adenoviral vector.
  • the dose comprises no more than about 1 × 10¹⁴ particles, preferably no more than about 1 × 10¹³ particles, even more preferably no more than about 1 × 10¹² particles, even more preferably no more than about 1 × 10¹¹ particles, and most preferably no more than about 1 × 10¹⁰ particles (e.g., no more than about 1 × 10⁹ particles).
  • the dose may contain a single dose of adenoviral vector with, for example, about 1 × 10⁶ particle units (pu), about 2 × 10⁶ pu, about 4 × 10⁶ pu, about 1 × 10⁷ pu, about 2 × 10⁷ pu, about 4 × 10⁷ pu, about 1 × 10⁸ pu, about 2 × 10⁸ pu, about 4 × 10⁸ pu, about 1 × 10⁹ pu, about 2 × 10⁹ pu, about 4 × 10⁹ pu, about 1 × 10¹⁰ pu, about 2 × 10¹⁰ pu, about 4 × 10¹⁰ pu, about 1 × 10¹¹ pu, about 2 × 10¹¹ pu, about 4 × 10¹¹ pu, about 1 × 10¹² pu, about 2 × 10¹² pu, or about 4 × 10¹² pu of adenoviral vector.
  • see, e.g., the adenoviral vectors in U.S. Pat. No. 8,454,972 B2 to Nabel et al., granted on Jun. 4, 2013, incorporated by reference herein, and the dosages at col. 29, lines 36-58 thereof.
  • the adenovirus is delivered via multiple doses.
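The particle-unit values listed above follow a regular 1, 2, 4 × 10ⁿ pattern from 10⁶ to 10¹² pu, which makes them easy to enumerate and filter against a preferred range. A minimal Python sketch; the function names and the choice of the 1 × 10⁸ to 1 × 10¹¹ window are illustrative.

```python
def adenovirus_dose_grid(min_exponent: int = 6, max_exponent: int = 12,
                         mantissas: tuple = (1, 2, 4)) -> list[int]:
    """Enumerate doses of the form m x 10^e particle units, in ascending order."""
    return [m * 10**e for e in range(min_exponent, max_exponent + 1) for m in mantissas]


def within(doses: list[int], low: int, high: int) -> list[int]:
    """Keep doses inside an inclusive [low, high] window."""
    return [d for d in doses if low <= d <= high]


grid = adenovirus_dose_grid()
preferred = within(grid, 10**8, 10**11)  # e.g. the 1 x 10^8 to 1 x 10^11 range
print(len(grid), len(preferred), f"{preferred[0]:.0e}", f"{preferred[-1]:.0e}")
```

Integers are used rather than floats so that exact comparisons against range limits behave predictably.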
  • the delivery is via an AAV.
  • a therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of from about 20 to about 50 ml of saline solution containing from about 1 × 10¹⁰ to about 1 × 10¹⁰ functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects.
  • the AAV dose is generally in the range of concentrations of from about 1 × 10⁵ to 1 × 10⁵⁰ genomes AAV, from about 1 × 10⁸ to 1 × 10²⁰ genomes AAV, from about 1 × 10¹⁰ to about 1 × 10¹⁶ genomes, or about 1 × 10¹¹ to about 1 × 10¹⁶ genomes AAV.
  • a human dosage may be about 1 × 10¹³ genomes AAV.
  • concentrations may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution.
  • Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar, et al., granted on Mar. 26, 2013, at col. 27, lines 45-60.
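The AAV figures above reduce to volume × concentration arithmetic. For example, 20 to 50 ml at 1 × 10¹⁰ functional AAV/ml corresponds to 2 × 10¹¹ to 5 × 10¹¹ functional AAV in total, and delivering a 1 × 10¹³ genome human dose in 10 to 25 ml of carrier implies roughly 4 × 10¹¹ to 1 × 10¹² genomes/ml. The Python sketch treats "functional AAV" and "genomes" simply as counts per ml, which is a simplifying assumption; function names are hypothetical.

```python
def total_particles(volume_ml: float, per_ml: float) -> float:
    """Total AAV delivered = volume x concentration."""
    return volume_ml * per_ml


def required_concentration(total: float, volume_ml: float) -> float:
    """Concentration needed to deliver `total` in `volume_ml` of carrier."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return total / volume_ml


low = total_particles(20, 1e10)   # 20 ml at 1e10 per ml
high = total_particles(50, 1e10)  # 50 ml at 1e10 per ml
print(low, high)
print(required_concentration(1e13, 25), required_concentration(1e13, 10))
```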
  • Ways to package nucleic acid-targeting effector protein include the following. To achieve NHEJ-mediated gene knockout:
    - Single virus vector: a vector containing two or more expression cassettes:
      Promoter-nucleic acid-targeting effector protein coding nucleic acid molecule-terminator
      Promoter-guide RNA1-terminator
      Promoter-guide RNA (N)-terminator (up to the size limit of the vector)
    - Double virus vector:
      Vector 1 containing one expression cassette for driving the expression of the nucleic acid-targeting effector protein (such as a Type V protein such as C2c1 or C2c3):
        Promoter-nucleic acid-targeting effector protein coding nucleic acid molecule-terminator
      Vector 2 containing one or more expression cassettes for driving the expression of one or more guide RNAs:
        Promoter-guide RNA1-terminator
        Promoter-…
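The phrase "up to the size limit of the vector" implies a simple capacity budget: once the effector cassette is in place, the remaining space bounds how many guide RNA cassettes fit. This Python sketch uses roughly 4.7 kb as an approximate AAV packaging limit; that figure and every cassette size below are assumptions for illustration, not values from this disclosure.

```python
APPROX_AAV_CAPACITY_BP = 4700  # approximate AAV packaging limit (assumption)


def max_guide_cassettes(capacity_bp: int, guide_cassette_bp: int,
                        effector_cassette_bp: int = 0) -> int:
    """How many guide RNA cassettes fit after reserving space for the effector."""
    if guide_cassette_bp <= 0:
        raise ValueError("guide cassette size must be positive")
    free_bp = capacity_bp - effector_cassette_bp
    return max(free_bp, 0) // guide_cassette_bp


# Hypothetical sizes: 2,000 bp effector cassette, 400 bp per guide cassette.
single_vector = max_guide_cassettes(APPROX_AAV_CAPACITY_BP, 400, 2000)
guide_only_vector = max_guide_cassettes(APPROX_AAV_CAPACITY_BP, 400)  # Vector 2
print(single_vector, guide_only_vector)
```

Under these assumed sizes, the double virus layout trades a second vector for nearly twice as many guide cassettes.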
  • the promoter used to drive nucleic acid-targeting effector protein coding nucleic acid molecule expression can include: AAV ITR can serve as a promoter: this is advantageous for eliminating the need for an additional promoter element (which can take up space in the vector). The additional space freed up can be used to drive the expression of additional elements (gRNA, etc.). Also, ITR activity is relatively weaker, so can be used to reduce potential toxicity due to over expression of nucleic acid-targeting effector protein. For ubiquitous expression, can use promoters: CMV, CAG, CBh, PGK, SV40, Ferritin heavy or light chains, etc.
  • For brain or other CNS expression, promoters that can be used include Synapsin I for all neurons, CaMKIIalpha for excitatory neurons, and GAD67, GAD65, or VGAT for GABAergic neurons, etc.
  • For liver expression can use Albumin promoter.
  • For lung expression can use SP-B.
  • For endothelial cells can use ICAM.
  • For hematopoietic cells can use IFNbeta or CD45.
  • For osteoblasts, OG-2 can be used.
  • the promoter used to drive guide RNA expression can include Pol III promoters such as U6 or H1, or a Pol II promoter with intronic cassettes to express the guide RNA.
  • Adeno Associated Virus (AAV)
  • Nucleic acid-targeting effector protein and one or more guide RNA can be delivered using adeno associated virus (AAV), lentivirus, adenovirus or other plasmid or viral vector types, in particular, using formulations and doses from, for example, U.S. Pat. No. 8,454,972 (formulations, doses for adenovirus), U.S. Pat. No. 8,404,658 (formulations, doses for AAV) and U.S. Pat. No. 5,846,946 (formulations, doses for DNA plasmids) and from clinical trials and publications regarding the clinical trials involving lentivirus, AAV and adenovirus.
  • For AAV, the route of administration, formulation and dose can be as in U.S. Pat. No. 8,404,658 and as in clinical trials involving AAV.
  • For adenovirus, the route of administration, formulation and dose can be as in U.S. Pat. No. 8,454,972 and as in clinical trials involving adenovirus.
  • For plasmid delivery, the route of administration, formulation and dose can be as in U.S. Pat. No. 5,846,946 and as in clinical studies involving plasmids.
  • Doses may be based on or extrapolated to an average 70 kg individual (e.g., a male adult human), and can be adjusted for patients, subjects, mammals of different weight and species.
  • Frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), depending on usual factors including the age, sex, general health, other conditions of the patient or subject and the particular condition or symptoms being addressed.
  • the viral vectors can be injected into the tissue of interest.
  • the expression of nucleic acid-targeting effector can be driven by a cell-type specific promoter.
  • liver-specific expression might use the Albumin promoter and neuron-specific expression (e.g., for targeting CNS disorders) might use the Synapsin I promoter.
  • AAV is advantageous over other viral vectors for two reasons: low toxicity (this may be because the purification method does not require ultracentrifugation of cell particles, which can activate the immune response) and a low probability of causing insertional mutagenesis, because AAV does not integrate into the host genome.
  • AAV has a packaging limit of 4.5 or 4.75 kb.
  • This means that the nucleic acid-targeting effector protein (such as a Type V protein such as C2c1 or C2c3), a promoter, and a transcription terminator all have to fit into the same viral vector. Therefore, embodiments of the invention include utilizing shorter homologs of the nucleic acid-targeting effector protein (such as a Type V protein such as C2c1 or C2c3).
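As a rough illustration of the packaging constraint above, the sketch below sums element lengths and compares the total with an assumed 4.7 kb limit (a value inside the quoted 4.5-4.75 kb range). All element sizes are hypothetical placeholders, not values from this document, and the helper names are illustrative only; the comparison layout uses the 1,368-residue length of SpCas9.

```python
"""Hypothetical AAV packaging-budget check (illustrative sizes only)."""

# Assumed limit within the 4.5-4.75 kb range quoted in the text.
AAV_LIMIT_BP = 4_700


def effector_cds_bp(n_amino_acids: int) -> int:
    """Coding length in bp for a protein of n residues, including one stop codon."""
    return 3 * (n_amino_acids + 1)


def cassette_size(elements: dict[str, int]) -> int:
    """Total length in bp of every element to be packaged, ITRs included."""
    return sum(elements.values())


def fits_in_aav(elements: dict[str, int], limit_bp: int = AAV_LIMIT_BP) -> bool:
    """True if the packaged sequence does not exceed the assumed limit."""
    return cassette_size(elements) <= limit_bp


# Hypothetical layout: ITRs, promoter, effector ORF, terminator (bp).
small = {
    "ITR_5": 145,
    "promoter": 500,
    "effector": effector_cds_bp(530),  # hypothetical ~530-residue compact effector
    "terminator": 250,
    "ITR_3": 145,
}
# Same layout with a 1,368-residue effector (the length of SpCas9).
large = dict(small, effector=effector_cds_bp(1368))

if __name__ == "__main__":
    for name, layout in (("compact", small), ("SpCas9-sized", large)):
        print(name, cassette_size(layout), fits_in_aav(layout))
```

With these placeholder numbers the compact layout totals 2,633 bp and fits, while the SpCas9-sized layout totals 5,147 bp and does not; a guide RNA cassette would add further length, which is why shorter effector homologs help.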
  • the AAV can be AAV1, AAV2, AAV5 or any combination thereof.
  • AAV8 is useful for delivery to the liver. The promoters and vectors herein are each preferred individually.
  • Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and psi2 cells or PA317 cells, which package retrovirus.
  • Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome.
  • Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences.
  • the cell line may also be infected with adenovirus as a helper.
  • the helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid.
  • the helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.
  • Millington-Ward et al. (Molecular Therapy, vol. 19 no. 4, 642-649 April 2011) describes adeno-associated virus (AAV) vectors to deliver an RNA interference (RNAi)-based rhodopsin suppressor and a codon-modified rhodopsin replacement gene resistant to suppression due to nucleotide alterations at degenerate positions over the RNAi target site.
  • Millington-Ward et al. subretinally injected either 6.0 x 10^8 vp or 1.8 x 10^10 vp of AAV into the eyes.
  • Dalkara et al. (Sci Transl Med 5, 189ra76 (2013)) also relates to in vivo directed evolution to fashion an AAV vector that delivers wild-type versions of defective genes throughout the retina after noninjurious injection into the eyes' vitreous humor. Dalkara et al. describe a 7-mer peptide display library and an AAV library constructed by DNA shuffling of cap genes from AAV1, 2, 4, 5, 6, 8, and 9.
  • the rcAAV libraries and rAAV vectors expressing GFP under a CAG or Rho promoter were packaged and deoxyribonuclease-resistant genomic titers were obtained through quantitative PCR.
  • the libraries were pooled, and two rounds of evolution were performed, each consisting of initial library diversification followed by three in vivo selection steps.
  • P30 rho-GFP mice were intravitreally injected with 2 µl of iodixanol-purified, phosphate-buffered saline (PBS)-dialyzed library with a genomic titer of about 1 x 10^12 vg/ml.
  • the AAV vectors of Dalkara et al. may be applied to the nucleic acid-targeting system of the present invention, contemplating a dose of about 1 x 10^15 to about 1 x 10^16 vg/ml administered to a human.
  • the nucleic acid sequence encoding the Cas12f protein and/or the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be provided to a cell on the same vector (e.g., in cis) as the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence.
  • a unidirectional promoter can be used to control expression of each nucleic acid sequence.
  • a combination of bidirectional and unidirectional promoters can be used to control expression of multiple nucleic acid sequences.
  • a nucleic acid sequence encoding the Cas12f protein, the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein, and the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence can be provided to a cell on separate vectors (e.g., in trans).
  • Each of the nucleic acid sequences in each of the separate vectors can comprise the same or different expression control sequences.
  • the separate vectors can be provided to cells simultaneously or sequentially.
  • the vector(s) comprising the nucleic acid sequences encoding the Cas12f protein and encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be introduced into a host cell that is capable of expressing the polypeptide encoded thereby, including any suitable prokaryotic or eukaryotic cell.
  • the invention provides an isolated cell comprising the vector or nucleic acid sequences disclosed herein.
  • Preferred host cells are those that can be easily and reliably grown, have reasonably fast growth rates, have well characterized expression systems, and can be transformed or transfected easily and efficiently.
  • suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella, and Erwinia.
  • Suitable eukaryotic cells are known in the art and include, for example, yeast cells, insect cells, and mammalian cells.
  • suitable yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces.
  • Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Luckow et al., J. Virol., 67: 4566-4579 (1993), incorporated herein by reference.
  • the host cell is a mammalian cell, and in some embodiments, the host cell is a human cell.
  • a number of suitable mammalian and human host cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.).
  • suitable mammalian cells include, but are not limited to, Chinese hamster ovary cells (CHO) (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92).
  • Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 (ATCC No. CRL1651) cell lines.
  • mammalian host cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, as well as primary explants, are also suitable.
  • Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable mammalian host cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art.
  • the invention also provides a method of altering a target DNA.
  • the method alters genomic DNA sequence in a cell, although any desired nucleic acid may be modified.
  • the method comprises introducing the systems, compositions, or vectors described herein into a cell comprising a target genomic DNA sequence.
  • Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the Cas12f proteins, the microbial recombination proteins, the recruitment systems, and polynucleotides encoding thereof, the cell, the target genomic DNA sequence, and components thereof, set forth above in connection with the inventive system are also applicable to the method of altering a target genomic DNA sequence in a cell.
  • the systems, composition or vectors may be introduced in any manner known in the art including, but not limited to, chemical transfection, electroporation, microinjection, biolistic delivery via gene guns, or magnetic-assisted transfection, depending on the cell type.
  • the guide RNA sequence binds to the target genomic DNA sequence in the cell genome.
  • the Cas12f protein associates with the guide RNA and may induce a double strand break or single strand nick in the target genomic DNA sequence, and the aptamer recruits the microbial recombination proteins to the target genomic DNA sequence through the aptamer binding protein of the fusion protein, thereby altering the target genomic DNA sequence in the cell.
  • the nucleic acid molecule comprising a guide RNA sequence, the Cas12f protein, and the fusion protein are first expressed in the cell.
  • the cell is in an organism or host, such that introducing the disclosed systems, compositions, or vectors into the cell comprises administration to a subject.
  • the method may comprise providing or administering to the subject, in vivo, or by transplantation of ex vivo treated cells, systems, compositions, vectors of the present system.
  • a “subject” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein. Likewise, a subject may include either adults or juveniles (e.g., children). Moreover, a subject may mean any living organism, preferably a mammal (e.g., human or non-human), that may benefit from the administration of compositions contemplated herein.
  • mammals include, but are not limited to, any member of the class Mammalia: humans, non-human primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, and swine; domestic animals such as rabbits, dogs, and cats; and laboratory animals including rodents, such as rats, mice, and guinea pigs, and the like.
  • non-mammals include, but are not limited to, birds, fish, and the like.
  • the mammal is a human.
  • Plants include without limitation sugar cane, corn, wheat, rice, oil palm fruit, potatoes, soybeans, vegetables, cassava, sugar beets, tomatoes, barley, bananas, watermelon, onions, sweet potatoes, cucumbers, apples, seed cotton, oranges, and the like.
  • the terms “providing,” “administering,” and “introducing” are used interchangeably herein and refer to the placement of the systems of the invention into a subject by a method or route which results in at least partial localization of the system to a desired site.
  • the systems can be administered by any appropriate route which results in delivery to a desired location in the subject.
  • altering a DNA sequence refers to modifying at least one physical feature of a DNA sequence of interest.
  • DNA alterations include, for example, single or double strand DNA breaks, deletion, or insertion of one or more nucleotides, and other modifications that affect the structural integrity or nucleotide sequence of the DNA sequence.
  • the modifications of a target sequence in genomic DNA may lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, gene knock-down, and the like.
  • the systems and methods described herein may be used to correct one or more defects or mutations in a gene (referred to as “gene correction”).
  • the target genomic DNA sequence encodes a defective version of a gene
  • the system further comprises a donor nucleic acid molecule which encodes a wild-type or corrected version of the gene.
  • the target genomic DNA sequence is a “disease-associated” gene.
  • the term “disease-associated gene” refers to any gene or polynucleotide whose gene products are expressed at an abnormal level or in an abnormal form in cells obtained from a disease-affected individual as compared with tissues or cells obtained from an individual not affected by the disease.
  • a disease-associated gene may be expressed at an abnormally high level or at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease.
  • a disease-associated gene also refers to a gene, the mutation or genetic variation of which is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease.
  • genes responsible for such “single gene” or “monogenic” diseases include, but are not limited to, adenosine deaminase, α-1 antitrypsin, cystic fibrosis transmembrane conductance regulator (CFTR), β-hemoglobin (HBB), oculocutaneous albinism II (OCA2), Huntingtin (HTT), dystrophia myotonica-protein kinase (DMPK), low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), neurofibromin 1 (NF1), polycystic kidney disease 1 (PKD1), polycystic kidney disease 2 (PKD2), coagulation factor VIII (F8), dystrophin (DMD), phosphate-regulating endopeptidase homologue, X-linked (PHEX), methyl-CpG-binding protein 2 (MECP2), and ubiquitin-specific peptidase 9Y, Y-linked (USP9Y).
  • the invention provides knock-ins of large transgenes at therapeutically relevant loci in the human genome.
  • the locus provides cell or tissue-specific expression.
  • the invention comprises insertion of nucleic acids into the albumin (ALB) locus.
  • the ALB locus provides for liver targeting in human hepatocytes and is highly expressed in a liver-specific manner.
  • the invention comprises insertion of nucleic acids into the AAVS1 locus.
  • the AAVS1 locus is a safe-harbor locus for gene therapy that is well expressed in certain tissue types and can be used in a wide variety of treatments, with low expression in liver.
  • US Patent Publication 2018/0214490 A1 describes gene therapy for lysosomal storage diseases, including targeting transgenes to “safe harbor” loci such as the AAVS1, HPRT, and CCR5 genes in human cells, and Rosa26 in murine cells.
  • US Patent 9267154 describes integration of exogenous nucleic acid sequences into the PPP1R12C locus, which is widely expressed in most tissues. Cell-specific expression has also been described by targeting transgenes (e.g., encoding chimeric antigen receptors (CARs)) to the T-cell receptor α constant (TRAC) locus. These loci are exemplary and nonlimiting as to loci that can be targeted according to the invention.
  • the target genomic DNA sequence can comprise a gene, the mutation of which contributes to a particular disease in combination with mutations in other genes.
  • Diseases caused by the contribution of multiple genes which lack simple (e.g., Mendelian) inheritance patterns are referred to in the art as a “multifactorial” or “polygenic” disease.
  • multifactorial or polygenic diseases include, but are not limited to, asthma, diabetes, epilepsy, hypertension, bipolar disorder, and schizophrenia.
  • Certain developmental abnormalities also can be inherited in a multifactorial or polygenic pattern and include, for example, cleft lip/palate, congenital heart defects, and neural tube defects.
  • the method of altering a target genomic DNA sequence can be used to delete nucleic acids from a target sequence in a cell by cleaving the target sequence and allowing the cell to repair the cleaved sequence in the absence of an exogenously provided donor nucleic acid molecule.
  • Deletion of a nucleic acid sequence in this manner can be used in a variety of applications, such as, for example, to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knock-outs or knock-downs, and to generate mutations for disease models in research.
  • donor nucleic acid molecule refers to a nucleotide sequence that is inserted into the target DNA (e.g., genomic DNA).
  • the donor DNA may include, for example, a gene or part of a gene, a sequence encoding a tag or localization sequence, or a regulating element.
  • the donor nucleic acid molecule may be of any length. In some embodiments, the donor nucleic acid molecule is between 10 and 10,000 nucleotides in length, between about 100 and 5,000 nucleotides in length, between about 200 and 2,000 nucleotides in length, between about 500 and 1,000 nucleotides in length, between about 500 and 5,000 nucleotides in length, between about 1,000 and 5,000 nucleotides in length, or between about 1,000 and 10,000 nucleotides in length.
  • the disclosed systems and methods overcome challenges encountered during conventional gene editing, including low efficiency and off-target events, particularly with kilobase-scale nucleic acids.
  • the disclosed systems and methods improve the efficiency of gene editing.
  • the disclosed systems and methods can have a 2- to 10-fold increase in efficiency over conventional CRISPR-Cas9 systems and methods, as shown in Examples 2, 3, and 5.
  • the improvement in efficiency is accompanied by a reduction in off-target events.
  • the off-target events may be reduced by more than 50% compared to conventional CRISPR-Cas9 systems and methods; for example, Example 3 shows a reduction of off-target events of about 90%.
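The efficiency and off-target comparisons above reduce to two ratios. The sketch below computes them; the function names and the example numbers are hypothetical and are not data from Examples 2, 3, or 5.

```python
"""Illustrative fold-change and percent-reduction arithmetic (hypothetical inputs)."""


def fold_change(new_rate: float, baseline_rate: float) -> float:
    """Editing efficiency of the new method relative to the baseline method."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


def percent_reduction(new_count: float, baseline_count: float) -> float:
    """Percent decrease in off-target events relative to the baseline method."""
    if baseline_count <= 0:
        raise ValueError("baseline_count must be positive")
    return 100.0 * (1.0 - new_count / baseline_count)


if __name__ == "__main__":
    # Hypothetical: 12% vs. 3% knock-in, and 5 vs. 50 off-target sites.
    print(f"{fold_change(12.0, 3.0):.1f}-fold")  # 4.0-fold
    print(f"{percent_reduction(5, 50):.0f}% fewer off-target events")  # 90% fewer off-target events
```

In these terms, a 2- to 10-fold improvement means `fold_change` values between 2 and 10, and a reduction of more than 50% means `percent_reduction` above 50.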
  • the invention further provides kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods described herein.
  • kits may include CRISPR reagents (Cas12f protein, guide RNA, vectors, compositions, etc.), recombineering reagents (recombination protein-aptamer binding protein fusion protein, the aptamer sequence, vectors, compositions, etc.), transfection or administration reagents, negative and positive control samples (e.g., cells, template DNA), cells, containers housing one or more components (e.g., microcentrifuge tubes, boxes), detectable labels, detection and analysis instruments, software, instructions, and the like.
  • the RNAs may be delivered using adeno associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof.
  • the RNAs can be packaged into one or more viral vectors.
  • the viral vector may be delivered to the tissue of interest by, for example, intramuscular injection, or via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be via a single dose or multiple doses.
  • the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector chosen, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
  • the delivery is via an adenovirus, which may be at a single booster dose containing at least 1 x 10^5 particles (also referred to as particle units, pu) of adenoviral vector.
  • the dose preferably is at least about 1 x 10^6 particles (for example, about 1 x 10^6-1 x 10^12 particles), more preferably at least about 1 x 10^7 particles, more preferably at least about 1 x 10^8 particles (e.g., about 1 x 10^8-1 x 10^11 particles or about 1 x 10^8-1 x 10^12 particles), and most preferably at least about 1 x 10^9 particles (e.g., about 1 x 10^9-1 x 10^10 particles or about 1 x 10^9-1 x 10^12 particles), or even at least about 1 x 10^10 particles (e.g., about 1 x 10^10-1 x 10^12 particles) of the adenoviral vector.
  • the dose comprises no more than about 1 x 10^14 particles, preferably no more than about 1 x 10^13 particles, even more preferably no more than about 1 x 10^12 particles, even more preferably no more than about 1 x 10^11 particles, and most preferably no more than about 1 x 10^10 particles (e.g., no more than about 1 x 10^9 particles).
  • the dose may contain a single dose of adenoviral vector with, for example, about 1 x 10^6 particle units (pu), about 2 x 10^6 pu, about 4 x 10^6 pu, about 1 x 10^7 pu, about 2 x 10^7 pu, about 4 x 10^7 pu, about 1 x 10^8 pu, about 2 x 10^8 pu, about 4 x 10^8 pu, about 1 x 10^9 pu, about 2 x 10^9 pu, about 4 x 10^9 pu, about 1 x 10^10 pu, about 2 x 10^10 pu, about 4 x 10^10 pu, about 1 x 10^11 pu, about 2 x 10^11 pu, about 4 x 10^11 pu, about 1 x 10^12 pu, about 2 x 10^12 pu, or about 4 x 10^12 pu of adenoviral vector.
  • see, e.g., the adenoviral vectors in U.S. Pat. No. 8,454,972 B2 to Nabel et al., granted on Jun. 4, 2013, incorporated by reference herein, and the dosages at col. 29, lines 36-58 thereof.
  • the adenovirus is delivered via multiple doses.
  • the delivery is via an AAV.
  • a therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of from about 20 to about 50 ml of saline solution containing from about 1 x 10^10 to about 1 x 10^10 functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects.
  • the AAV dose is generally in the range of concentrations of from about 1 x 10^5 to 1 x 10^50 genomes AAV, from about 1 x 10^8 to 1 x 10^20 genomes AAV, from about 1 x 10^10 to about 1 x 10^16 genomes, or about 1 x 10^11 to about 1 x 10^16 genomes AAV.
  • a human dosage may be about 1 x 10^13 genomes AAV.
  • these amounts may be delivered in about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution.
  • Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar, et al., granted on Mar. 26, 2013, at col. 27, lines 45-60.
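The AAV dose figures above combine a concentration with a delivery volume. A minimal sketch of that arithmetic follows; the 1 x 10^12 vg/ml stock used in the second usage line is a hypothetical assumption, while the other numbers come from the text, and the function names are illustrative.

```python
"""Illustrative AAV dose arithmetic: total genomes = concentration x volume."""


def total_genomes(conc_per_ml: float, volume_ml: float) -> float:
    """Total vector genomes delivered from a stock of the given concentration."""
    return conc_per_ml * volume_ml


def volume_for_dose(target_genomes: float, conc_per_ml: float) -> float:
    """Volume (ml) of stock needed to deliver target_genomes."""
    if conc_per_ml <= 0:
        raise ValueError("conc_per_ml must be positive")
    return target_genomes / conc_per_ml


if __name__ == "__main__":
    # 20-50 ml of a 1 x 10^10 AAV/ml solution, as stated in the text.
    low, high = total_genomes(1e10, 20), total_genomes(1e10, 50)
    print(f"{low:.1e} to {high:.1e} genomes")  # 2.0e+11 to 5.0e+11 genomes
    # A 1 x 10^13 genome human dose drawn from a hypothetical 1 x 10^12 vg/ml stock.
    vol = volume_for_dose(1e13, 1e12)
    print(f"{vol:.1f} ml; within 10-25 ml carrier range: {10 <= vol <= 25}")
```

Under that assumed stock concentration, the 1 x 10^13 genome dose needs 10 ml, which falls inside the 10 to 25 ml carrier range given above.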
  • the delivery is via a plasmid.
  • the dosage should be a sufficient amount of plasmid to elicit a response.
  • suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg, or from about 1 µg to about 10 µg.
  • the doses herein are based on an average 70 kg individual.
  • the frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian) or scientist skilled in the art. Mice used in experiments are about 20 g, and the dose administered to a 20 g mouse can be extrapolated to a 70 kg individual.
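The text does not say how a 20 g mouse dose is extrapolated to a 70 kg individual. One simple assumption is linear scaling by body weight; a common alternative for drug doses is body-surface-area scaling with the FDA Km factors (3 for mouse, 37 for adult human). The sketch shows both; which, if either, suits a given vector is an assumption that would need experimental justification, and the function names are hypothetical.

```python
"""Two illustrative ways to extrapolate a mouse dose to a 70 kg human."""

MOUSE_KG = 0.020  # about 20 g mouse, as stated in the text
HUMAN_KG = 70.0   # average individual assumed in the text


def scale_by_weight(mouse_dose: float, mouse_kg: float = MOUSE_KG,
                    human_kg: float = HUMAN_KG) -> float:
    """Keep dose per kg constant (linear body-weight scaling)."""
    return mouse_dose * human_kg / mouse_kg


def scale_by_bsa(mouse_dose: float, mouse_kg: float = MOUSE_KG,
                 human_kg: float = HUMAN_KG, km_mouse: float = 3.0,
                 km_human: float = 37.0) -> float:
    """Body-surface-area scaling via Km factors (human-equivalent dose)."""
    dose_per_kg = mouse_dose / mouse_kg
    human_dose_per_kg = dose_per_kg * km_mouse / km_human
    return human_dose_per_kg * human_kg


if __name__ == "__main__":
    mouse_dose = 1e9  # hypothetical total dose given to one mouse
    print(f"weight-based: {scale_by_weight(mouse_dose):.2e}")
    print(f"BSA-based:    {scale_by_bsa(mouse_dose):.2e}")
```

For the hypothetical 1 x 10^9 mouse dose, weight-based scaling gives about 3.5 x 10^12 and surface-area scaling about 2.8 x 10^11, so the choice of method changes the estimate by more than tenfold.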
  • Lentiviruses are complex retroviruses that have the ability to infect and express their genes in both mitotic and post-mitotic cells.
  • the most commonly known lentivirus is the human immunodeficiency virus (HIV); HIV-based vectors can use the envelope glycoproteins of other viruses to target a broad range of cell types.
  • lentiviral vectors may be VSV-g pseudotyped and produced using a packaging plasmid such as psPAX2, which encodes gag/pol/rev/tat.
  • Transfection was done in 4 mL OptiMEM with a cationic lipid delivery agent (50 µL Lipofectamine 2000 and 100 µL Plus reagent). After 6 hours, the medium was changed to antibiotic-free DMEM with 10% fetal bovine serum.
  • Lentivirus may be purified as follows. Viral supernatants were harvested after 48 hours. Supernatants were first cleared of debris and filtered through a 0.45 µm low-protein-binding (PVDF) filter. They were then spun in an ultracentrifuge for 2 hours at 24,000 rpm. Viral pellets were resuspended in 50 µl of DMEM overnight at 4 °C. They were then aliquoted and immediately frozen at -80 °C.
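The ultracentrifugation step above is given in rpm, which only fixes the force once the rotor is known. Converting to relative centrifugal force uses the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in cm. The 15.3 cm radius below is a hypothetical value, since the text does not name the rotor.

```python
"""Convert between rotor speed (rpm) and relative centrifugal force (x g)."""

import math

K = 1.118e-5  # standard constant when the radius is given in cm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given radius."""
    return K * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach the given force at the given radius."""
    return math.sqrt(rcf / (K * radius_cm))


if __name__ == "__main__":
    radius_cm = 15.3  # hypothetical maximum rotor radius
    rcf = rcf_from_rpm(24_000, radius_cm)
    print(f"24,000 rpm at r = {radius_cm} cm is about {rcf:,.0f} x g")
```

With that assumed radius, 24,000 rpm corresponds to roughly 98,500 x g; a different rotor gives a different force, which is why reporting x g is more portable between instruments.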
  • minimal non-primate lentiviral vectors based on the equine infectious anemia virus (EIAV) are also contemplated, especially for ocular gene therapy (see, e.g., Balaggan, J Gene Med 2006; 8: 275-285, Published online 21 Nov. 2005 in Wiley InterScience; available at the website: interscience.wiley.com. DOI: 10.1002/jgm.845).
  • RetinoStat®, an equine infectious anemia virus-based lentiviral gene therapy vector that expresses the angiostatic proteins endostatin and angiostatin and is delivered via subretinal injection for the treatment of the wet form of age-related macular degeneration, is also contemplated and may be modified for the system of the present invention (see, e.g., Binley et al., HUMAN GENE THERAPY 23: 980-991 (September 2012)).
  • Lentiviral vectors have been disclosed for the treatment of Parkinson's Disease; see, e.g., US Patent Publication No. 20120295960 and U.S. Pat. Nos. 7,303,910 and 7,351,585. Lentiviral vectors have also been disclosed for the treatment of ocular diseases; see, e.g., US Patent Publication Nos. 20060281180, 20090007284, US20110117189; US20090017543;
  • a particle is defined as a small object that behaves as a whole unit with respect to its transport and properties. Particles are further classified according to diameter. Coarse particles cover a range between 2,500 and 10,000 nanometers. Fine particles are sized between 100 and 2,500 nanometers. Ultrafine particles, or nanoparticles, are generally between 1 and 100 nanometers in size. The 100 nm limit is based on the fact that novel properties that differentiate particles from the bulk material typically develop at a critical length scale of under 100 nm.
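The diameter classes above can be expressed as a simple lookup. In this sketch each class includes its upper bound (for example, exactly 100 nm counts as ultrafine); the text does not say how boundary values are assigned, so that choice is an assumption, and the function name is illustrative.

```python
"""Classify particles by diameter (nm) using the size classes in the text."""


def classify_particle(diameter_nm: float) -> str:
    """Return 'ultrafine', 'fine', 'coarse', or 'unclassified'."""
    if 1 <= diameter_nm <= 100:
        return "ultrafine"  # nanoparticles, 1-100 nm
    if 100 < diameter_nm <= 2_500:
        return "fine"
    if 2_500 < diameter_nm <= 10_000:
        return "coarse"
    return "unclassified"  # outside the 1-10,000 nm ranges given


if __name__ == "__main__":
    for d in (50, 500, 5_000, 20_000):
        print(d, classify_particle(d))
```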
  • Example 1: Materials and Methods. RecE/T homolog screening: the RefSeq non-redundant protein database was downloaded from NCBI on October 29, 2019. The database was searched with E. coli Rac prophage RecT (NP_415865.1) and RecE (NP_415866.1) as queries using position-specific iterated BLAST (PSI-BLAST)¹ to retrieve protein homologs. Hits were clustered with CD-HIT², and representative sequences were selected from each cluster for multiple alignment with MUSCLE³. Then, FastTree⁴ was used for maximum likelihood tree reconstruction with default parameters. A diverse set of RecET homologs was selected, synthesized by GenScript, and cloned into pMPH MCP vectors for testing.
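The clustering step above yields one representative per cluster. Assuming CD-HIT's default .clstr output, in which the representative member line ends with `*`, cluster sizes and representatives could be extracted as sketched below. CD-HIT also writes the representative sequences to its main FASTA output, so parsing .clstr is mainly useful for recovering cluster sizes. The sample text and sequence IDs are hypothetical, and this is not the authors' script.

```python
"""Extract cluster representatives from CD-HIT .clstr text (illustrative)."""

import re
from typing import Optional

# Member lines look like "0\t310aa, >seqA... *" (representative)
# or "1\t305aa, >seqB... at 92.13%" (non-representative member).
MEMBER_RE = re.compile(r">(?P<id>.+?)\.\.\.")


def cluster_representatives(clstr_text: str) -> list[tuple[str, Optional[str], int]]:
    """Return (cluster number, representative ID, cluster size) per cluster."""
    clusters: list[dict] = []
    for raw in clstr_text.splitlines():
        line = raw.strip()
        if line.startswith(">Cluster"):
            clusters.append({"cluster": line.split()[1], "rep": None, "size": 0})
            continue
        match = MEMBER_RE.search(line)
        if match is None or not clusters:
            continue  # skip blank or unexpected lines
        current = clusters[-1]
        current["size"] += 1
        if line.endswith("*"):
            current["rep"] = match.group("id")
    return [(c["cluster"], c["rep"], c["size"]) for c in clusters]


SAMPLE = (
    ">Cluster 0\n"
    "0\t310aa, >seqA... *\n"
    "1\t305aa, >seqB... at 92.13%\n"
    ">Cluster 1\n"
    "0\t287aa, >seqC... *\n"
)

if __name__ == "__main__":
    print(cluster_representatives(SAMPLE))
```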
  • Plasmid construction: pX330, pMPH, and pU6-(BbsI)_CBh-Cas9-T2A-BFP plasmids were obtained from Addgene. Tested effector DNA fragments were ordered from IDT, Genewiz, and GenScript. The fragments were Gibson-assembled into the backbones using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs). All sgRNAs (Table 1) were inserted into backbones using Golden Gate cloning. All constructs were sequence-verified by Sanger sequencing of prepped plasmids.
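For Golden Gate insertion of guides into BbsI-digested backbones such as pX330, a widely used convention is to order a top oligo 5'-CACC(G)+spacer-3' and a bottom oligo 5'-AAAC+reverse complement(+C)-3', adding a 5' G for U6 transcription when the spacer does not already start with one. The sketch below assumes that convention; the overhangs actually used for the guide backbones in this work are not stated here and may differ, and the function names are hypothetical.

```python
"""Design annealed-oligo pairs for BbsI Golden Gate cloning (pX330-style)."""

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def bbsi_oligos(spacer: str) -> tuple[str, str]:
    """Return (top, bottom) oligos, 5'->3', with CACC/AAAC overhangs.

    A 5' G is prepended for U6 transcription if the spacer lacks one,
    and the matching C is appended to the bottom oligo.
    """
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty ACGT sequence")
    if spacer.startswith("G"):
        return "CACC" + spacer, "AAAC" + reverse_complement(spacer)
    return "CACCG" + spacer, "AAAC" + reverse_complement(spacer) + "C"


if __name__ == "__main__":
    top, bottom = bbsi_oligos("AACCGG")  # hypothetical short spacer for display
    print(top, bottom)  # CACCGAACCGG AAACCCGGTTC
```

After annealing, the two 4-nt 5' overhangs (CACC and AAAC) match the ends left by BbsI digestion of the backbone, so the duplex ligates in a fixed orientation.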
  • Cell culture: HEK293T (human embryonic kidney 293T), HeLa, and HepG2 cells were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM, Life Technologies) with 10% fetal bovine serum (FBS, HyClone), 100 U/mL penicillin, and 100 µg/mL streptomycin (Life Technologies) at 37 °C with 5% CO2.
  • hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 °C with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use, and cells were supplemented with 10 µM Y27632 (Sigma) for the first 24 hours after passaging. Culture medium was changed every 24 hours.
  • Transfection: HEK293T cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well.
  • HeLa and HepG2 cells were seeded into 48-well plates (Corning) one day prior to transfection at a density of 50,000 and 30,000 cells/well respectively, and 400 ng of total DNA was transfected per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) following the manufacturer’s instructions.
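The per-well seeding densities and DNA amounts above scale linearly with the number of wells. The sketch below tabulates them and totals a master mix; the 10% overage is a hypothetical pipetting allowance rather than a value from the text, the helper names are illustrative, and splitting the DNA among individual plasmids is left out because the ratios are not given here.

```python
"""Scale per-well seeding and DNA amounts to a transfection master mix."""

from dataclasses import dataclass


@dataclass(frozen=True)
class WellSetup:
    plate: str
    cells_per_well: int
    dna_ng_per_well: float


# Per-well values as described in the text.
SETUPS = {
    "HEK293T": WellSetup("96-well", 30_000, 250.0),
    "HeLa": WellSetup("48-well", 50_000, 400.0),
    "HepG2": WellSetup("48-well", 30_000, 400.0),
}


def master_mix(cell_line: str, n_wells: int, overage: float = 0.10) -> dict[str, float]:
    """Total cells and DNA (ng) for n_wells, plus a fractional overage."""
    setup = SETUPS[cell_line]
    scale = n_wells * (1.0 + overage)
    return {"cells": setup.cells_per_well * scale,
            "dna_ng": setup.dna_ng_per_well * scale}


if __name__ == "__main__":
    mix = master_mix("HEK293T", n_wells=24)
    print(f"{mix['cells']:,.0f} cells, {mix['dna_ng']:,.0f} ng DNA")
```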
  • Fluorescence-activated cell sorting (FACS): mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). 72 hours after transfection, cells were washed once with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). The cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300 x g for 5 minutes. After removing the supernatant, pelleted cells were resuspended in 50 µl of 4% FBS in PBS, and cells were sorted within 30 minutes of preparation.
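Knock-in efficiency from the flow cytometry step can be expressed as the percentage of gated cells that are mKate-positive. A minimal sketch follows; the gating, the event counts, and the optional negative-control subtraction are assumptions for illustration rather than details given in the text, and the function names are hypothetical.

```python
"""Percent mKate-positive cells from flow cytometry counts (illustrative)."""


def knock_in_percent(positive_events: int, gated_events: int) -> float:
    """Percent of gated (e.g., live, single) cells scored mKate-positive."""
    if gated_events <= 0:
        raise ValueError("gated_events must be positive")
    if not 0 <= positive_events <= gated_events:
        raise ValueError("positive_events must lie in [0, gated_events]")
    return 100.0 * positive_events / gated_events


def background_corrected(sample_pct: float, control_pct: float) -> float:
    """Subtract a negative-control percentage, flooring the result at zero."""
    return max(0.0, sample_pct - control_pct)


if __name__ == "__main__":
    sample = knock_in_percent(1_250, 10_000)  # hypothetical counts
    control = knock_in_percent(30, 10_000)    # hypothetical negative control
    print(f"{sample:.2f}% raw, {background_corrected(sample, control):.2f}% corrected")
```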
  • RFLP analysis: HEK293T cells were transfected with plasmid DNA and PCR templates and harvested after 72 hours for genomic DNA extraction using QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer’s protocol.
  • The target genomic region was amplified using specific primers binding outside the homology arms of the PCR template.
  • PCR products were purified with the Monarch PCR & DNA Cleanup Kit (New England BioLabs). 300 ng of purified product was digested with BsrGI (EMX1; New England BioLabs) or XbaI (VEGFA; New England BioLabs), and the digested products were analyzed on a 5% Mini-PROTEAN TBE gel (Bio-Rad).
  • iGUIDE off-target analysis: Genome-wide, unbiased off-target analysis was performed following the iGUIDE pipeline (Nobles, C.L., et al. Genome Biol 20, 14 (2019), incorporated herein by reference), which builds on the previously developed GUIDE-seq method (Tsai, S., et al. Nat Biotechnol 33, 187-197 (2015), incorporated herein by reference).
  • HEK293T cells were transfected in 20 μL Lonza SF Cell Line Nucleofector Solution on a Lonza 4D-Nucleofector with program DS-150 according to the manufacturer’s instructions.
  • gRNA-Cas9 plasmids (or 150 ng of each gRNA-Cas9n plasmid for the double nickase), 150 ng of the effector plasmids, and 5 pmol of double-stranded oligonucleotides (dsODN) were transfected.
  • Cells were harvested after 72 hours for genomic DNA extraction using the Agencourt DNAdvance reagent kit. 400 ng of purified gDNA was then fragmented to an average size of 500 bp and ligated with adaptors using the NEBNext Ultra II FS DNA Library Prep Kit following the manufacturer’s instructions.
  • Microbial recombineering has two major steps: the template DNA is chewed back by exonucleases (Exo), and the single-strand annealing protein (SSAP) then supports homology-directed repair using the template, optionally facilitated by a nuclease inhibitor.
  • SSAP single-strand annealing protein
  • A system for RNA-guided targeting of RecE/T recombineering activities was developed, achieving kilobase (kb)-scale human gene editing without DNA cutting.
  • kb kilobase
  • The NCBI protein database was systematically searched for RecE/T homologs. To develop a portable tool, evolutionary relationships and protein lengths were examined. Co-occurrence analysis revealed that most RecE/T systems have only one of the two proteins. Because prophage integration can be imprecise, the 11% of species harboring both homologs were prioritized, taking co-occurrence as evidence of intact functionality.
  • The top 12 candidates were codon-optimized, and MS2 coat protein (MCP) fusions were constructed to recruit these RecE/T homologs, hereafter termed “recombinators”, to wild-type Streptococcus pyogenes Cas9 (wtCas9) via MS2 RNA aptamers.
  • MCP MS2 coat protein
  • RecT is only 269 amino acids (AA) long
  • Two truncated RecE variants were made, one starting from AA587 (RecE_587) and one consisting of the carboxy-terminal domain (RecE CTD), based on functional studies (Muyrers, J.P., et al., Genes Dev. 14, 1971-1982 (2000), incorporated herein by reference).
  • HDR homology directed repair
  • RecE was active without recruitment, whereas RecT showed recruitment-dependent increases in efficiency. Without being bound by theory, this may be explained by RecE exonuclease activity acting promiscuously.
  • The RecE/T recombineering-editing (REDIT) tools were termed REDITv1, with REDITv1 RecT as the preferred variant.
  • REDITv1 activity was robust across multiple genomic sites in HEK, A549, HepG2, and HeLa cells. Notably, in human embryonic stem cells (hESCs), REDITv1 exhibited consistent increases in kilobase knock-in efficiency at HSP90AA1 and OCT4, with up to 3.5-fold improvement relative to Cas9-HDR. Different template designs were also tested. REDITv1 performed efficient kilobase editing using homology arms (HAs) as short as 200 bp in total, with longer HAs supporting higher efficiency.
  • REDIT was examined for its ability to edit long sequences in the absence of any nicking or cutting of the target DNA.
  • dCas9 catalytically dead Cas9
  • Although REDITv2D has lower efficiency than REDITv2N, it achieved programmable, DNA-damage-free editing at kilobase scale with 1-2% efficiency and no selection. Two processes were hypothesized to contribute to REDITv2D recombineering. One possibility was DNA unwinding by dCas9.
  • Because dCas9 can unwind DNA as it induces sequence-specific R-loop formation, double binding by two dCas9s would be expected to increase genome accessibility to RecE/T. However, no significant increase was observed upon delivering two guide RNAs. Another possibility was that DNA unwinding during the cell cycle permitted RecE/T, recruited by dCas9 binding, to access the target region.
  • A 1-kb knock-in was performed with different REDIT tools at varying serum levels (10% regular, 2% reduced, and no serum). As serum starvation arrests cell proliferation, the results indicated that cell-cycle progression correlated positively with REDITv2D recombineering.
  • REDITv3: Microscopy analysis revealed incomplete nuclear targeting of REDITv1, particularly REDITv1 RecT. Hence, different designs of protein linkers and nuclear localization signals (NLSs) were tested. The extended XTEN linker with a C-terminal SV40 NLS was identified as a preferred configuration, termed REDITv3. REDITv3 further achieved a 2- to 3-fold increase in HDR efficiency over REDITv2 across genome targets and Cas9 variants (wtCas9, Cas9n, dCas9).
  • REDITv3 was used in hESCs to engineer kilobase knock-in alleles.
  • REDITv3N single- and double-nicking designs resulted in 5-fold and 20-fold increases in HDR efficiency over no-recombinator controls, respectively.
  • The efficacy and fidelity were confirmed via a combination of the assays described for previous REDIT versions.
  • REDITv3 works effectively with Staphylococcus aureus Cas9 (SaCas9), a compact CRISPR system suitable for in vivo delivery.
  • RecT and RecE_587 variants were truncated at various lengths. The resulting efficiencies were measured using an mKate knock-in assay with both wild-type SpCas9 and Cas9n(D10A), using single- and double-nicking, at the DYNLT1 locus. Efficiencies of the no-recombinator group are shown as the control.
  • The truncated versions of both RecT and RecE_587 retained significant recombineering activity when used with different Cas9s.
  • New truncated versions such as RecT(93-264aa) are over 30% smaller, yet they preserve essentially the full activity of RecT in stimulating recombination in eukaryotic cells.
  • Truncated versions such as RecE_587(120-221aa) and RecE_587(120-209aa) are over 60% smaller but still retain high recombination activity in human cells.
  • The following exonuclease proteins were used: the exonuclease from phage lambda, the RecE_587 core domain of the E. coli RecE protein, and the exonuclease (gene name gp6) from phage T7.
  • Gene-editing activity was measured using the mKate knock-in assay at genomic loci (DYNLT1 and HSP90AA1).
  • A REDIT system using SunTag recruitment was developed. Because SunTag recruitment relies on protein fusions, the sgRNAs are the same as in the wild-type CRISPR system. Specifically, the REDIT recombinator proteins were fused to an scFv antibody peptide (replacing MCP), and the GCN4 peptide was fused in tandem (10 copies of the GCN4 peptide separated by linkers) to the Cas9 protein. Thus, scFv-REDIT could be recruited to the Cas9 complex via the affinity of scFv for GCN4.
  • The knock-in cells were clonally isolated, and the target genomic region was amplified for colony Sanger sequencing using primers binding completely outside of the donor DNAs.
  • Junction sequencing analysis (~48 colonies per gene per condition) revealed varying degrees of indels at the 5’ and 3’ knock-in junctions, occurring at either a single junction or both junctions.
  • HDR donors had better precision than MMEJ donors, and REDIT modestly improved the knock-in yield compared with Cas9, though junction indels were still observed.
  • Next-generation sequencing was used to quantify the editing events. Comparable levels of indels were observed between Cas9 and REDIT, with improved HDR efficiencies using REDIT.
  • REDIT: The sensitivity of REDIT’s ability to promote HDR was tested in the presence or absence of two distinct pharmacological inhibitors of RAD51, B02 and RI-1.
  • RAD51 inhibition significantly lowered HDR efficiencies.
  • RAD51 inhibition decreased REDIT and REDITdn efficiencies only moderately, as both REDIT and REDITdn maintained significantly higher knock-in efficiencies than Cas9 and Cas9dn under RAD51 inhibition.
  • Mirin, a potent chemical inhibitor of DSB repair that has also been shown to prevent MRN complex formation and MRN-dependent ATM activation and to inhibit Mre11 exonuclease activity, was also used.
  • With Mirin, only the editing efficiencies of the Cas9 reference experiments were affected, whereas the REDIT versions performed essentially the same as vehicle-treated groups across all genomic targets.
  • REDIT was applied in human embryonic stem cells (hESCs) to test their ability to engineer long sequences in non-transformed human cells.
  • Robust stimulation of HDR was observed across all three genomic sites (HSP90AA1, ACTB, and OCT4/POU5F1) using REDIT and REDITdn.
  • REDIT and REDITdn editing used donor DNAs with 200-bp HAs on each side and achieved up to more than 5% efficiency for kb-scale gene editing without selection, compared with ~1% efficiency using non-REDIT methods.
  • REDIT improved knock-in efficiencies in A549 (lung-derived), HepG2 (liver-derived), and HeLa (cervix-derived) cells, demonstrating up to ~15% kb-scale genomic knock-in without selection. Knock-in efficiencies were up to 4-fold higher than in the Cas9 groups, supporting the potential of using REDIT methods in different cell types.
  • The cleavage-free dCas9 editor dCas9-EcRecT (SAFE-dCas9) was tested in vivo via hydrodynamic tail vein injection.
  • Successful gene editing of liver hepatocytes was monitored by transgene-encoded protein expression from the albumin locus.
  • The perfused mouse livers were dissected.
  • The lobes of the liver were homogenized and processed to extract genomic DNA from the primary hepatocytes.
  • The extracted genomic DNA was used for three downstream analyses: 1) PCR using knock-in-specific primers and agarose gel electrophoresis; 2) Sanger sequencing of the knock-in PCR product; and 3) high-throughput deep sequencing of the knock-in junction to confirm and quantify the accuracy of gene editing using SAFE-dCas9 in vivo.
  • Each downstream analysis confirmed knock-in success.
  • LTC mice carry three genomic alleles: 1) the Lkb1 (flox/flox) allele allows Lkb1 knockout when Cre is expressed; 2) the R26(LSL-TdTom) allele allows detection of AAV-transduced cells via TdTom red fluorescent protein; and 3) the H11(LSL-Cas9) allele allows expression of Cas9 in AAV-transduced cells.
  • Successful gene editing using the gene editing vector leads to Kras alleles that drive tumor growth in the lung of the treated mice.
  • Escherichia coli RecE amino acid sequence (SEQ ID NO:1): MSTKPLFLLRKAKKSSGEPDVVLWASNDFESTCATLDYLIVKSGKKLSSYFKAVATNFP VVNDLPAEGEIDFTWSERYQLSKDSMTWELKPGAAPDNAHYQGNTNVNGEDMTEIEEN MLLPISGQELPIRWLAQHGSEKPVTHVSRDGLQALHIARAEELPAVTALAVSHKTSLLDP LEIRELHKLVRDTDKVFPNPGNSNLGLITAFFEAYLNADYTDRGLLTKEWMKGNRVSHI TRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLARSMDLDIYNLHPAHAKRIEEI IAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVIPAHVTEYLNKVLTETDHA NPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGTTAVEQGEAETMEPD
  • Escherichia coli CTD RecE amino acid sequence (SEQ ID NO:3):
  • Pantoea brenneri RecE amino acid sequence (SEQ ID NO:4):
  • Type-F symbiont of Plautia stali RecT amino acid sequence (SEQ ID NO: 11):
  • SV40 NLS amino acid sequence (SEQ ID NO: 16): PKKKRKV
  • biSV40 NLS amino acid sequence (SEQ ID NO: 19): KRTADGSEFESPKKKRKV
  • VEGFA HDR template sequence (SEQ ID NO:80):
  • HSP90AA1 HDR template sequence (SEQ ID NO:82):
  • OCT4 HDR template sequence (SEQ ID NO:84):
  • Pantoea stewartii RecT DNA (SEQ ID NO:85):
  • Pantoea stewartii RecE DNA (SEQ ID NO:86):
  • Pantoea brenneri RecT DNA (SEQ ID NO:87):
  • Pantoea brenneri RecE DNA (SEQ ID NO:88):
  • Pantoea dispersa RecT DNA (SEQ ID NO: 89):
  • Pantoea dispersa RecE DNA (SEQ ID NO: 90):
  • Type-F symbiont of Plautia stali RecE DNA (SEQ ID NO:92):
  • Shewanella putrefaciens RecT DNA SEQ ID NO:97:
  • Shewanella putrefaciens RecE DNA SEQ ID NO:98:
  • Salmonella enterica RecT DNA SEQ ID NO: 1023:
  • Salmonella enterica RecE DNA SEQ ID NO: 1044:
  • Acetobacter RecT DNA SEQ ID NO: 1057
  • Acetobacter RecE DNA SEQ ID NO: 1066:
  • Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE DNA SEQ ID NO: 1
  • Pantoea stewartii RecT Protein (SEQ ID NO:115):
  • Pantoea stewartii RecE Protein (SEQ ID NO: 116):
  • Pantoea brenneri RecT Protein (SEQ ID NO: 117):
  • Pantoea brenneri RecE Protein (SEQ ID NO: 118):
  • Pantoea dispersa RecT Protein SEQ ID NO:119: MSNQPPLATADLQKTQQSNQVAKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMIR IVTTEIRKTPALAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQLI IGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNESAPITHVYAVAR LKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIEM QKAVVLDEKAESDVDQDNASVLSAEYSVLESGTGE
  • Pantoea dispersa RecE Protein (SEQ ID NO: 120): MEPGIYYDISNEAYHSGPGISKSQLDDIARSPAIFQWRKDAPVDTEKTKALDLGTDFHCA VLEPERFADMYRVGPEVNRRTTAGKAEEKEFFEKCEKDGAVPITHDDARKVELMRGSV MAHPIAKQMIAAQGHAEASIYWHDESTGNLCRCRPDKFIPDWNWIVDVKTTADMKKFR REFYDLRYHVQDAFYTDGYAAQFGERPTFVFVVTSTTIDCGRYPTEVFFLDEETKAAGR SEYQSNLVTYSECLSRNEWPGIATLSLPHWAKELRNV
  • Type-F symbiont of Plautia stali RecT Protein (SEQ ID NO: 121): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNEDAPITHVYAV ARLKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLEGDGGE
  • Type-F symbiont of Plautia stali RecE Protein (SEQ ID NO: 122): MQPGIYYDISNEDYHGGPGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFEIGPEVNRRTTAGKEKEKEFMERCEAEGVTPITHDDNRKLRLMRDSAM AHPIARWMLEAQGNAEASIYWNDRDTGVLSRCRPDKIITDFNWCVDVKSTADIIKFQKD FYSYRYHVQDAFYSDGYESHFDETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPYWAKELRNE
  • Shewanella putrefaciens RecT Protein (SEQ ID NO: 127):
  • Bacillus sp. MUM 116 RecT Protein (SEQ ID NO:129):
  • Salmonella enterica RecE Protein SEQ ID NO: 1344:
  • Acetobacter RecT Protein (SEQ ID NO: 135):
  • Acetobacter RecE Protein SEQ ID NO: 1336:
  • Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecT Protein (SEQ ID NO: 137): MPKQPPIAKADLQKTQGARTPTAVKNNNDVISFINQPSMKEQLAAALPRHMTAERMIRI ATTEIRKVPALGDCDTMSFVSAIVQCSQLGLEPGGALGHAYLLPFGNRNEKSGKKNVQL IIGYRGMIDLARRSGQIASLSARVVREGDDFSFEFGLEEKLVHRPGENEDAPVTHVYAVA RLKDGGTQFEVMTRKQIELVRAQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEI
  • Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE Protein SEQ ID NO: 138:
  • Pseudobacteriovorax antillogorgiicola RecT Protein (SEQ ID NO: 139): MGHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAK PAFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQ VQQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHK PKALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKE AGYGVNQ
  • Pseudobacteriovorax antillogorgiicola RecE Protein (SEQ ID NO: 140): MSKLSNLKVSNSDVDTLSRIRMKEGVYRDLPIESYHQSPGYSKTSLCQIDKAPIYLKTKV PQKSTKSLNIGTAFHEAMEGVFKDKYVVHPDPGVNKTTKSWKDFVKRYPKHMPLKRSE YDQVLAMYDAARSYRPFQKYHLSRGFYESSFYWHDAVTNSLIKCRPDYITPDGMSVIDF KTTVDPSPKGFQYQAYKYHYYVSAALTLEGIEAVTGIRPKEYLFLAVSNSAPYLTALYR ASEKEIALGDHFIRRSLLTLKTCLESGKWPGLQEEILELGLPFSGLKELREEQEVEDEFME LVG
  • Photobacterium sp. JCM 19050 RecT Protein (SEQ ID NO: 141): MNTDMIAMPPSPAISMLDTSKLDVMVRAAELMSQAVVMVPDHFKGKPADCLAVVMQ ADQWGMNPFTVAQKTHLVSGTLGYESQLVNAVISSSKAIKGRFHYEWSDGWERLAGK VQYVKESRQRKGQQGSYQVTVAKPTWKPEDEQGLWVRCGAVLAGEKDITWGPKLYL ASVLVRNSELWTTKPYQQAAYTALKDWSRLYTPAVMQGSMTGKSWSLTGRLISPR
  • Photobacterium sp. JCM 19050 RecE Protein (SEQ ID NO: 142): MAERVRTYQRDAVFAHELKAEFDEAVENGKTGVTLEDQARAKRMVHEATTNPASRN WFRYDGELAACERSYFWRDEEAGLVLKARPDKEIGNNLIDVKSIEVPTDVCACDLNAYI NRQIEKRGYHISAAHYLSGTGKDRFFWIFINKVKGYEWVAIVEASPLHIELGTYEVLEGL RSIASSTKEADYPAPLSHPVNERGIPQPLMSNLSTYAMKRLEQFREL
  • DSM 30120 RecT Protein SEQ ID NO:143: MKAQLAAALPKHITSDRMIRIVSTEIRKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGH AYLLPFGNGKSDNGKSNVQLIIGYRGMIDLARRSGQIISISARTVRQGDNFHFEYGLNEN LTHIPEGNEDSPITHVYAVARLKDEGVQFEVMTYNQIEKVRDSSKAGKNGPWVTHWEE MAKKTVIRRLFKYLPVSIEMQKAVILDEKAEANIEQDHSAIFEAEFEEVDSNGN
  • DSM 30120 RecE Protein SEQ ID NO:144: MNEGIYYDISNEDYHHGLGISKSQLDLIDESPADFIWHRDAPVDNEKTKALDFGTALHCL LLEPDEFQKRFRIAPEVNRRTNAGKEQEKEFLEMCEKENITPITNEDNRKLSLMKDSAM
  • Mouse Albumin knock-in sense template (SEQ ID NO: 160): CACCTTCAGATTTTCCTGTAACGATCGGGAACTGGCATCTTCAGGGAGTAGctgacctcttc tcttcctcccacaggATCCTGGAGCCACCCGCAGTTCGAAAAGCTCAGTGAAGAGAAGAACA AAAAGCAGCATATTACAGTTAGTTGTCTTCATCAATCTTTAAATATGTTGTGTGGTTT
  • Mouse Albumin knock-in anti-sense template (SEQ ID NO: 161): GTGGAAACAGGGAGAAAAAAACCACACAACATATTTAAAGATTGATGAAGACAACT AACTGTAATATGCTGCTTTTTGTTCTTCTCTTCACTGAGCTTTTCGAACTGCGGGTGG CTCCAGGATcctgtgggaggaagagaagaggtcagCTACTCCCTGAAGATGCCAGTTCCCGATCGT TACAGGAAAATCTGAAGGTGAAGGTGAAGGTGAAGGTGAAGGTGAAGGTG
  • FIG. 1 depicts SSAP with dCas12f1 protein as a compact editor for precision large knock-in.
  • SSAP with dCas12f1 protein as a compact editor for precision large knock-in has three components plus donor DNA.
  • The donor sequence is provided in Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264.
  • The three components are dCas12f1 protein, a guide RNA with an MS2 aptamer, and an MCP-SSAP fusion protein.
  • The dCas12f1 protein can carry different mutations that convert Cas12f1 into dCas12f1.
  • The MS2 aptamer is inserted into the stem-loop region of the optimized Cas12f1 guide RNA scaffold; other aptamers, as described in the NAR paper, can be used instead. A 20-bp guide sequence was used, but it can be longer or shorter (15 bp to 35 bp, possibly longer).
  • One or more MS2 aptamers can be used.
  • FIG. 2 depicts SSAP with dCas12f1(D225A) protein as a compact editor for precision large knock-in (knock-in of an mKate transgene).
  • FIG. 3 depicts SSAP with different versions of dCas12f1, using scaffold no. 4, which gave the best signal-to-noise ratio.
  • Table 5 provides representative SSAP proteins for use with Cas12f according to the invention.
  • SSAP proteins were identified from sequence data for use in systems and compositions comprising Cas12f and/or other nucleases.
  • The SSAPs were screened for activity with Cas9 and dCas9.
  • Gene-editing activities of the top-scoring SSAP proteins are shown in Table 6, followed by the amino acid sequences of the proteins.
  • The table shows editing efficiency as the normalized average of two targets (HSP90 and ACTB), absolute editing efficiency, and cell viability.
  • SSAP proteins are identified by UniParc deposit number and SEQ ID NO.
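The RFLP readout described above (BsrGI digestion at EMX1, XbaI at VEGFA, then TBE gel analysis) is commonly summarized as the fraction of amplicon that is cut. The sketch below is a minimal illustration, assuming that the HDR template introduces the restriction site and that background-subtracted band intensities scale with DNA mass; `rflp_cut_fraction` and the lane values are hypothetical and not taken from the source.

```python
from __future__ import annotations


def rflp_cut_fraction(cut_band_intensities: list[float], uncut_band_intensity: float) -> float:
    """Fraction of amplicon mass found in the digested (cut) bands.

    Assumes background-subtracted intensities proportional to DNA mass.
    Cleavage splits one amplicon into fragments whose masses add back up to
    the parent, so all cut-fragment intensities are summed before dividing.
    """
    if any(x < 0 for x in cut_band_intensities) or uncut_band_intensity < 0:
        raise ValueError("intensities must be non-negative")
    cut = sum(cut_band_intensities)
    total = cut + uncut_band_intensity
    if total == 0:
        raise ValueError("no signal in any band")
    return cut / total


# Hypothetical lane: two cut fragments and one uncut band (arbitrary units).
print(f"{rflp_cut_fraction([300.0, 200.0], 1500.0):.0%}")  # 25%
```

Because the cut fragments of one amplicon together carry the same mass as the uncut band, their intensities are summed rather than averaged.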
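The two-step microbial recombineering model described above (exonuclease resection of the template, then SSAP-mediated annealing) can be illustrated with a string-level toy model. This is a conceptual sketch only: it assumes 5'-to-3' resection of a linear double-stranded donor, ignores protein kinetics and strand exchange, and the helper names and sequences are hypothetical.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def resect(top: str, n: int) -> tuple[str, str]:
    """Toy 5'-to-3' resection of n nt from both 5' ends of a linear duplex.

    `top` is the donor top strand written 5'->3'. Returns the two exposed
    3' single-stranded overhangs, each written 5'->3'.
    """
    if not 0 < n <= len(top) // 2:
        raise ValueError("n must be positive and at most half the duplex length")
    bottom = revcomp(top)
    # Chewing the bottom strand's 5' end exposes the top strand's 3' end, and vice versa.
    return top[-n:], bottom[-n:]


def can_anneal(overhang: str, target_top: str) -> bool:
    """True if the overhang is complementary to a stretch of either target strand."""
    probe = revcomp(overhang)  # sequence a partner strand must contain, read 5'->3'
    return probe in target_top or probe in revcomp(target_top)


# Hypothetical donor: left arm + insert + right arm; the "genome" lacks the insert.
left, insert, right = "GATTACAGATTACAGGCC", "AAAAAAAAAAAA", "TTGACCATGGTCAAGCTA"
donor, genome = left + insert + right, left + right
top_oh, bottom_oh = resect(donor, len(right))
print(top_oh == right, can_anneal(top_oh, genome), can_anneal(bottom_oh, genome))
```

Resecting exactly one arm length exposes each homology arm as a 3' single strand that can pair with the genomic target, which is the substrate an SSAP such as RecT is thought to act on.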
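The co-occurrence filter described above (prioritizing species that carry both RecE and RecT homologs) reduces to a set-inclusion test per species. A minimal sketch, assuming homolog hits have already been assigned to species; the species names and hits are hypothetical placeholders, not the source's search results.

```python
from __future__ import annotations

REQUIRED = {"RecE", "RecT"}


def prioritize_pairs(homologs_by_species: dict[str, set[str]]) -> tuple[list[str], float]:
    """Return (species carrying both RecE and RecT homologs, their fraction of all species)."""
    if not homologs_by_species:
        raise ValueError("no species supplied")
    both = sorted(sp for sp, found in homologs_by_species.items() if REQUIRED <= found)
    return both, len(both) / len(homologs_by_species)


# Hypothetical search hits, not data from the source.
hits = {
    "species_A": {"RecE", "RecT"},
    "species_B": {"RecT"},
    "species_C": {"RecE"},
    "species_D": {"RecT"},
}
print(prioritize_pairs(hits))  # (['species_A'], 0.25)
```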
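The REDITv3 layout described above (recombinator fused downstream of MCP through an extended XTEN linker, with a C-terminal SV40 NLS) can be tracked with a small domain-map helper. This is a sketch under stated assumptions: the 16-aa XTEN sequence is a widely used XTEN linker, not necessarily the extended linker used for REDITv3; the MCP and RecT entries are placeholders of arbitrary length; and only the SV40 NLS sequence (SEQ ID NO: 16) comes from this document.

```python
from __future__ import annotations

SV40_NLS = "PKKKRKV"          # SEQ ID NO: 16 in this document
XTEN16 = "SGSETPGTSESATPES"   # a common 16-aa XTEN linker; assumption, not the REDITv3 linker


def assemble_fusion(parts: list[tuple[str, str]]) -> tuple[str, list[tuple[str, int, int]]]:
    """Join (name, sequence) parts N- to C-terminally.

    Returns the fused sequence and a domain map of (name, start, end) using
    1-based, inclusive residue coordinates.
    """
    pieces: list[str] = []
    domains: list[tuple[str, int, int]] = []
    start = 1
    for name, seq in parts:
        if not seq:
            raise ValueError(f"empty part: {name}")
        pieces.append(seq)
        domains.append((name, start, start + len(seq) - 1))
        start += len(seq)
    return "".join(pieces), domains


# Placeholder MCP and RecT sequences ("X" = unspecified residue), for layout only.
fusion, layout = assemble_fusion([
    ("MCP", "M" + "X" * 9),
    ("XTEN", XTEN16),
    ("RecT", "X" * 5),
    ("SV40 NLS", SV40_NLS),
])
for name, first, last in layout:
    print(f"{name}: {first}-{last}")
```

Keeping an explicit domain map makes it easy to confirm that the NLS sits at the extreme C-terminus when linkers or recombinators are swapped.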
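The size reductions quoted above for the truncated recombinators can be checked with simple arithmetic. The sketch assumes a full length of 269 residues for RecT (the length of E. coli RecT) and assumes that RecE_587 spans residues 587 to 866 of the 866-residue E. coli RecE (280 residues); neither parent length is stated alongside the truncation data, and truncation coordinates are taken as 1-based and inclusive within each parent protein.

```python
def percent_smaller(full_length: int, start: int, end: int) -> float:
    """Percent reduction in length when keeping residues start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= full_length:
        raise ValueError("need 1 <= start <= end <= full_length")
    kept = end - start + 1
    return 100.0 * (1 - kept / full_length)


RECT_LEN = 269       # assumed full length of RecT
RECE_587_LEN = 280   # assumed: RecE_587 = residues 587-866 of the 866-aa E. coli RecE

for label, full, s, e in [
    ("RecT(93-264aa)", RECT_LEN, 93, 264),
    ("RecE_587(120-221aa)", RECE_587_LEN, 120, 221),
    ("RecE_587(120-209aa)", RECE_587_LEN, 120, 209),
]:
    print(f"{label}: {percent_smaller(full, s, e):.1f}% smaller")
# RecT(93-264aa): 36.1% smaller
# RecE_587(120-221aa): 63.6% smaller
# RecE_587(120-209aa): 67.9% smaller
```

Under these assumptions the results are consistent with the "over 30%" and "over 60%" figures in the text.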
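The next-generation sequencing readout described above classifies amplicon reads as HDR, unmodified, or indel-bearing. The sketch below is a deliberately simplified exact-match version, assuming reads have already been trimmed to the amplicon; dedicated amplicon-analysis tools align reads and tolerate sequencing errors, which this toy classifier does not. The amplicon sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def classify_reads(reads: list[str], wt_amplicon: str, hdr_amplicon: str) -> dict[str, float]:
    """Toy exact-match classifier: fraction of reads that are HDR, unmodified, or other."""
    if not reads:
        raise ValueError("no reads")
    counts: Counter[str] = Counter()
    for read in reads:
        if read == hdr_amplicon:
            counts["HDR"] += 1
        elif read == wt_amplicon:
            counts["unmodified"] += 1
        else:
            counts["other"] += 1  # indels, imperfect junctions, or sequencing errors
    return {k: counts[k] / len(reads) for k in ("HDR", "unmodified", "other")}


# Hypothetical amplicons and reads.
wt, hdr = "ACGTACGT", "ACGTTTTTACGT"
print(classify_reads([hdr, hdr, wt, "ACGTACG"], wt, hdr))
```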
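The guide RNA design described above for the compact dCas12f1 editor (MS2 aptamer inserted into a scaffold stem-loop, one or more aptamer copies, and a spacer of roughly 15 to 35 nt) can be expressed as a string-assembly helper. This sketch makes several assumptions not stated in the source: the 19-nt MS2 hairpin shown is a commonly used variant, the spacer is placed 3' of the scaffold as in type V guides, and the scaffold and insertion index are placeholders rather than the optimized Cas12f1 scaffold.

```python
from __future__ import annotations

MS2_HAIRPIN = "ACATGAGGATCACCCATGT"  # commonly used MS2 hairpin (DNA alphabet); an assumption here


def build_guide(scaffold: str, loop_index: int, spacer: str, aptamer: str = MS2_HAIRPIN,
                copies: int = 1, min_len: int = 15, max_len: int = 35) -> str:
    """Insert `copies` aptamers into the scaffold at loop_index, then append the spacer 3'."""
    if not min_len <= len(spacer) <= max_len:
        raise ValueError(f"spacer length {len(spacer)} outside {min_len}-{max_len} nt")
    if not 0 <= loop_index <= len(scaffold):
        raise ValueError("loop_index outside scaffold")
    if copies < 1:
        raise ValueError("need at least one aptamer copy")
    return scaffold[:loop_index] + aptamer * copies + scaffold[loop_index:] + spacer


# Hypothetical placeholder scaffold and 20-nt spacer.
guide = build_guide("GGGGAAAACCCCTTTT", 8, "ACGT" * 5)
print(len(guide), guide)
```

The length bounds are parameters because the source allows spacers longer than 35 nt.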
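Table 6 reports editing efficiency as the normalized average of two targets (HSP90 and ACTB), but the normalization reference is not defined in this section. A minimal sketch, assuming each candidate's efficiency is divided by a reference efficiency at the same target before averaging; the function name and numbers are hypothetical.

```python
from __future__ import annotations


def normalized_average(efficiency: dict[str, float], reference: dict[str, float]) -> float:
    """Mean over targets of (candidate efficiency / reference efficiency at that target)."""
    if not reference or set(efficiency) != set(reference):
        raise ValueError("efficiency and reference must cover the same, non-empty target set")
    ratios = []
    for target, ref in reference.items():
        if ref <= 0:
            raise ValueError(f"reference efficiency at {target} must be positive")
        ratios.append(efficiency[target] / ref)
    return sum(ratios) / len(ratios)


# Hypothetical percentages, not values from Table 6.
print(normalized_average({"HSP90": 4.0, "ACTB": 3.0}, {"HSP90": 2.0, "ACTB": 3.0}))  # 1.5
```

Normalizing per target before averaging keeps the locus with the higher absolute efficiency from dominating the score.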


Abstract

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, originally found in bacteria and archaea as part of the immune system to defend against invading viruses, forms the basis for genome editing technologies that can be programmed to target specific stretches of a genome or other DNA for editing at precise locations. The present disclosure provides recombineering-editing systems using CRISPR and recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof. The methods and systems provide means for altering target DNA, including genomic DNA in a host cell.

Description

RNA-GUIDED GENOME RECOMBINEERING AT KILOBASE SCALE
RELATED APPLICATIONS AND INCORPORATION BY REFERENCE
[0001] This application claims priority under 35 U.S.C. § 119(e) to U.S. Patent Application Serial No. 63/308,759, filed February 10, 2022, which is incorporated herein by reference in its entirety.
[0002] The foregoing applications, and all documents cited therein or during their prosecution (“appln cited documents”) and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein (“herein cited documents”), and all documents cited or referenced in herein cited documents, together with any manufacturer’s instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
SEQUENCE LISTING
[0003] This application contains a sequence listing filed in electronic form in extensible Markup Language (XML) format entitled J0217-99007_Sequence_Listing.xml, created on February 10, 2023 and having a size of 612,954 bytes. The content of the sequence listing is incorporated herein in its entirety.
FIELD OF THE INVENTION
[0004] The present invention relates to RNA-guided recombineering-editing systems using phage recombination enzymes as well as methods, vectors, nucleic acid compositions, and kits thereof.
BACKGROUND OF THE INVENTION
[0005] The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, originally found in bacteria and archaea as part of the immune system to defend against invading viruses, forms the basis for genome editing technologies that can be programmed to target specific stretches of a genome or other DNA for editing at precise locations. While various CRISPR-based tools are available, the majority are geared towards editing short sequences. Long-sequence editing is highly sought after in the engineering of model systems, therapeutic cell production and gene therapy. Prior studies have developed technologies to improve Cas9-mediated homology-directed repair (HDR), and tools leveraging nucleic acid modification enzymes with Cas9, e.g., prime editing, have demonstrated editing of up to 80 base pairs (bp) in length. Despite this progress, there is continued demand for large-scale mammalian genome engineering with high efficiency and fidelity.
[0006] Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.
SUMMARY OF THE INVENTION
[0007] Provided herein are systems and methods that facilitate nucleic acid editing in a manner that allows large-scale nucleic acid editing with high accuracy and low off-target errors. These systems and methods employ a combination of microbial recombination components with CRISPR recombination components.
[0008] For example, disclosed herein are systems comprising a Cas12f protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a recombination protein. In an advantageous embodiment, the Cas12f protein is a dCas12f1 protein. In another advantageous embodiment, the Cas12f protein is a dCas12f protein. In a particularly advantageous embodiment, the Cas12f protein is a dCas12f1 protein.
[0009] In some embodiments, the Cas12f protein is catalytically dead.
[0010] The recombination protein may be a single stranded DNA annealing protein (SSAP), including but not limited to a microbial recombination protein, for example, RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. In some embodiments, the system further comprises donor DNA. In some embodiments, the target DNA sequence is a genomic DNA sequence in a host cell.
[0011] In certain embodiments, the recombination protein comprises a microbial recombination protein or active portion thereof, a mitochondrial recombination protein or active portion thereof, a viral recombination protein or active portion thereof, or a eukaryotic recombination protein or active portion thereof, including without limitation, a recombination protein set forth in Table 6 or derivative or variant or functional portion thereof. In certain embodiments, the recombination protein comprises an amino acid sequence with at least 70% identity, or at least 75% identity, or at least 80% identity, or at least 85% identity, or at least 90% identity, or at least 92% identity, or at least 95% identity, or at least 96% identity, or at least 97% identity, or at least 98% identity, or at least 99% identity to a recombination protein set forth in Table 6 or derivative or variant or functional portion thereof.
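The identity thresholds above depend on how percent identity is computed, which this paragraph does not specify. The sketch below assumes the two sequences have already been aligned (for example with a global aligner) and uses one common convention: identical residues divided by aligned columns, with single-gap columns counted and columns that are gaps in both sequences ignored. Other conventions, such as dividing by the shorter sequence length, give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over aligned columns, counting single-gap columns in the denominator."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: 4 identical residues over 6 columns.
pid = percent_identity("MKT-LL", "MKTALV")
print(f"{pid:.2f}% identity; meets 70% threshold: {pid >= 70}")
```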
[0012] In certain embodiments, the system or composition comprises a donor nucleic acid. In certain embodiments, the donor nucleic acid comprises homology arms.
[0013] In certain embodiments, the system or composition comprises a recruitment system for recruiting a guide nucleic acid and a recombination protein. In certain embodiments, the recruitment system comprises at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein. In certain embodiments, the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence. In certain embodiments, the nucleic acid molecule or nucleic acid molecules additionally comprise the at least one RNA aptamer sequence or comprise one, two, three, or more RNA aptamer sequences. In certain embodiments, two aptamer sequences comprise the same sequence or comprise sequences that bind to the same aptamer binding protein.
[0014] In certain embodiments, the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof. In certain embodiments, the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof. In certain embodiments, the at least one peptide aptamer sequence is conjugated to the guide RNA. In certain embodiments, the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences. In certain embodiments, two or more aptamer sequences comprise the same sequence. In certain embodiments, an aptamer sequence comprises a GCN4 peptide sequence.
[0015] In certain embodiments, the recombination protein N-terminus is linked to the aptamer binding protein C-terminus. In certain embodiments, the recombination protein and the aptamer binding protein are operably linked by a linker.
[0016] In certain embodiments, the recombination system or composition comprises at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is linked to the recombination protein. In certain embodiments, the NLS is located at the recombination protein C-terminus or at the recombination protein N-terminus.
[0017] In certain embodiments, the recombination system is comprised in a cell, for example, a eukaryotic cell, a mammalian cell, an animal cell, a human cell, or a plant cell.
[0018] The recruitment system is adaptable to a multitude of combinations and configurations of recombination proteins. For example, by selecting and incorporating multiple nucleic acid aptamers, the system can comprise multiple recombination proteins, which may be the same or different and in various ratios. In certain embodiments, the system comprises an exonuclease. In certain embodiments, the system comprises an SSAP. In certain embodiments, the system comprises an SSB. In certain embodiments, the system comprises an exonuclease and an SSAP. In certain embodiments, the system comprises an exonuclease and an SSB. In certain embodiments, the system comprises an SSAP and an SSB. In certain embodiments, the system comprises an exonuclease and an SSAP and does not comprise an SSB. In certain embodiments, the system comprises an exonuclease and an SSB and does not comprise an SSAP. In certain embodiments, the system comprises an SSAP and an SSB and does not comprise an exonuclease. In certain embodiments, the system comprises an exonuclease, an SSAP, and an SSB.
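The paragraph above enumerates which of the three component classes (exonuclease, SSAP, SSB) the recruitment system may carry. Reading the "does not comprise" embodiments as exact compositions, the possibilities are the seven non-empty subsets of the three classes, which a short enumeration makes explicit. This is only a bookkeeping sketch; it does not capture ratios or multiple copies of the same protein, which the paragraph also allows.

```python
from __future__ import annotations

from itertools import combinations

COMPONENTS = ("exonuclease", "SSAP", "SSB")


def exact_configurations(components: tuple[str, ...] = COMPONENTS) -> list[frozenset[str]]:
    """Every non-empty set of recombination component classes, i.e. each exact composition."""
    return [
        frozenset(combo)
        for size in range(1, len(components) + 1)
        for combo in combinations(components, size)
    ]


for config in exact_configurations():
    print(" + ".join(c for c in COMPONENTS if c in config))
```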
[0019] In an aspect, the invention provides a recombination system comprising an SSAP, a Cas12f, and a reverse transcriptase (RT). In certain embodiments, the invention provides a system or composition comprising: (i) a reverse transcriptase(s) (RT); (ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription, or nucleic acid molecules comprising a guide RNA sequence that is complementary to a target DNA sequence and RNA for reverse transcription; and (iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or, (iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) for expression in vivo in a cell; or, (v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell. In this system or composition involving an RT (“the RT system or composition”), (iv) can involve (i) being enzyme, (ii) being nucleic acid molecule(s), and (iii) being nucleic acid molecules; or (i) being nucleic acid molecule(s) encoding the enzyme(s), (ii) being nucleic acid molecule(s), and (iii) being protein, or all of (i), (ii) and (iii) being nucleic acid molecules. In some embodiments the RT system or composition can include more than one reverse transcriptase. When there is more than one reverse transcriptase there can be more than one RNA for reverse transcription. In some embodiments, in the RT system or composition (i), (ii) and (iii) further comprise a Cas protein; or (iv) further comprises nucleic acid molecule(s) encoding a Cas protein, e.g., (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii) and/or (iii) and/or a Cas protein for expression in vivo in a cell; or the vector(s) of (v) additionally contain nucleic acid molecule(s) encoding a Cas protein.
[0020] Reverse transcriptases that can be used according to the invention include, without limitation, reverse transcriptases, retrotransposon reverse transcriptases, retron reverse transcriptases, LINE-1 reverse transcriptase, Ec86 reverse transcriptase, human immunodeficiency virus (HIV) RT, avian myeloblastosis virus (AMV) RT, Moloney murine leukemia virus (M-MLV) RT, a group II intron RT, a group II intron-like RT, a chimeric RT, Rous sarcoma virus (RSV) reverse transcriptase, Rous-associated virus (RAV) reverse transcriptase and myeloblastosis-associated virus (MAV) reverse transcriptase or other avian sarcoma-leukosis virus (ASLV) reverse transcriptases, and other naturally occurring and engineered nucleic acid polymerases. Such engineered polymerases include, without limitation, human DNA polymerase η, which has reverse transcriptase activity in cellular environments (Su et al. 2019, J. Biol. Chem. 294(15):6073-81), and Taq DNA polymerase engineered to enhance reverse transcription and strand displacement (Barnes et al., Front. Bioeng. Biotechnol., 14 January 2021, doi.org/10.3389/fbioe.2020.553474).
[0021] Disclosed herein are compositions comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein. The recombination protein may be a microbial recombination protein such as RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. The compositions may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
[0022] Also disclosed herein are vectors comprising a nucleic acid sequence encoding a fusion protein comprising a recombination protein functionally linked to an aptamer binding protein. In certain embodiments, the recombination protein comprises a microbial recombination protein, including without limitation RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof. The vectors may further comprise one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
[0023] In some embodiments, the RecE and RecT recombination proteins are derived from E. coli. In some embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In some embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to the amino acid sequence of SEQ ID NO: 9.
[0024] In the systems or compositions discussed herein, components may be fused directly or joined by linkers. Without limitation, the linker may be a peptide of 5-30, 10-30, 10-20, or 15 amino acid residues. The linker may be -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:492), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:494). In certain embodiments, the linker is -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493). The amino acid sequence of SEQ ID NO:561 may be encoded by the nucleic acid sequence of SEQ ID NO:495.
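As a minimal illustration of the Gly4Ser linker family recited above, the Python sketch below builds -(Gly-Gly-Gly-Gly-Ser)n- linkers in one-letter code, checks their length against the 5-30 residue range, and joins two domains N- to C-terminus. The helper names (`g4s_linker`, `linker_in_range`, `fuse`) are hypothetical and are not part of the disclosure; the sketch handles sequence strings only and says nothing about folding or activity of the fused protein.

```python
# Sketch only: helper names are illustrative, not taken from the disclosure.
G4S_UNIT = "GGGGS"  # one Gly-Gly-Gly-Gly-Ser repeat, one-letter amino acid code


def g4s_linker(repeats: int) -> str:
    """Return the -(Gly-Gly-Gly-Gly-Ser)n- linker for n = repeats."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    return G4S_UNIT * repeats


def linker_in_range(linker: str, low: int = 5, high: int = 30) -> bool:
    """Check the linker length against an inclusive residue range."""
    return low <= len(linker) <= high


def fuse(n_terminal: str, c_terminal: str, linker: str = "") -> str:
    """Join two protein sequences N- to C-terminus, directly or via a linker."""
    return n_terminal + linker + c_terminal


if __name__ == "__main__":
    for n in (2, 3, 4):  # the (G4S)2, (G4S)3 and (G4S)4 linkers named above
        seq = g4s_linker(n)
        print(n, seq, len(seq), linker_in_range(seq))
```

Run as a script, it reports lengths of 10, 15, and 20 residues for n = 2, 3, and 4, all within the recited 10-30 residue range.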
[0025] In certain embodiments, a linker is made up of a majority of amino acids that are sterically unhindered, such as glycine and alanine. Exemplary linkers are polyglycines (particularly (Gly)5), poly(Gly-Ala), and polyalanines. One exemplary suitable linker, as shown in the Examples below, is a (Gly-Ser) linker, such as -(Gly-Gly-Gly-Gly-Ser)2- (SEQ ID NO:492), -(Gly-Gly-Gly-Gly-Ser)3- (SEQ ID NO:493), or -(Gly-Gly-Gly-Gly-Ser)4- (SEQ ID NO:494).
[0026] Linkers may also be non-peptide linkers. For example, alkyl linkers such as -NH-(CH2)s-C(O)-, wherein s = 2-20, can be used. These alkyl linkers may further be substituted by any non-sterically hindering group, such as lower alkyl (e.g., C1-4), lower acyl, halogen (e.g., Cl, Br), CN, NH2, phenyl, etc.
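The non-peptide linker -NH-(CH2)s-C(O)- can be made concrete with a short composition calculation. The sketch below, which assumes standard average atomic weights and counts only the atoms of the bracketed unit (not the groups it connects or any substituents), returns the molecular formula and approximate mass for a given s in the recited 2-20 range. For s = 5 the unit is the familiar 6-aminohexanoic acid residue, C6H11NO, about 113.16 g/mol. Function names are illustrative only.

```python
# Illustrative sketch: atoms of the repeating unit -NH-(CH2)s-C(O)- only;
# end groups and optional substituents are not counted.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}  # g/mol


def alkyl_linker_composition(s: int) -> dict[str, int]:
    """Atom counts for -NH-(CH2)s-C(O)-, with s restricted to 2-20 as in the text."""
    if not 2 <= s <= 20:
        raise ValueError("s must be between 2 and 20")
    # -NH- gives N1 H1, (CH2)s gives Cs H2s, -C(O)- gives C1 O1
    return {"C": s + 1, "H": 2 * s + 1, "N": 1, "O": 1}


def formula(composition: dict[str, int]) -> str:
    """Formula string in C, H, N, O order, omitting counts of 1."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in composition.items() if n)


def average_mass(composition: dict[str, int]) -> float:
    """Average mass in g/mol from the atomic weights above."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in composition.items())


if __name__ == "__main__":
    for s in (2, 5, 20):
        comp = alkyl_linker_composition(s)
        print(s, formula(comp), round(average_mass(comp), 2))
```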
[0027] Also disclosed is a eukaryotic cell comprising the systems or vectors disclosed herein.
[0028] Further disclosed herein are methods of altering a target genomic DNA sequence in a host cell. The methods comprise contacting the systems, compositions, or vectors described herein with a target DNA sequence (e.g., introducing the systems, compositions, or vectors described herein into a host cell comprising a target genomic DNA sequence). Kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods are also disclosed herein.
[0029] Other aspects and embodiments of the disclosure will be apparent in light of the following detailed description and accompanying figures.
[0030] Accordingly, it is an object of the invention not to encompass within the invention any previously known product, process of making the product, or method of using the product, such that Applicants reserve the right and hereby disclose a disclaimer of any previously known product, process, or method. It is further noted that the invention does not intend to encompass within the scope of the invention any product, process, or making of the product or method of using the product which does not meet the written description and enablement requirements of the USPTO (35 U.S.C. §112, first paragraph) or the EPO (Article 83 of the EPC), such that Applicants reserve the right and hereby disclose a disclaimer of any previously described product, process of making the product, or method of using the product. It may be advantageous in the practice of the invention to be in compliance with Art. 53(c) EPC and Rule 28(b) and (c) EPC. All rights to explicitly disclaim any embodiments that are the subject of any granted patent(s) of applicant in the lineage of this application or in any other lineage or in any prior filed application of any third party are explicitly reserved. Nothing herein is to be construed as a promise.
[0031] It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as "comprises", "comprised", "comprising" and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean "includes", "included", "including", and the like; and that terms such as "consisting essentially of" and "consists essentially of" have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention.
[0032] These and other embodiments are disclosed or are obvious from and encompassed by, the following Detailed Description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0034] The following detailed description, given by way of example, but not intended to limit the invention solely to the specific embodiments described, may best be understood in conjunction with the accompanying drawings.
[0035] FIG. 1 depicts SSAP with dCas12f1 protein as a compact editor for precision large knock-in.
[0036] FIG. 2 depicts SSAP with dCas12f1(D225A) protein as a compact editor for precision large knock-in (knock-in of an mKate transgene).
[0037] FIG. 3 depicts SSAP with different versions of dCas12f1 using scaffold no. 4, which gave the best signal-to-noise ratio.
DETAILED DESCRIPTION OF THE INVENTION
[0038] The present disclosure is directed to a system and its components for DNA editing. In particular, the disclosed system is based on CRISPR targeting and homology-directed repair by phage recombination enzymes. The system results in superior recombination efficiency and accuracy at kilobase scale.
[0039] To facilitate an understanding of the present technology, a number of terms and phrases are provided below. Additional terminology is set forth throughout the detailed description.
[0040] The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of,” and “consisting essentially of” the embodiments or elements presented herein, whether explicitly set forth or not.
[0041] For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
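One way to read the range convention of [0041] is to take the precision from the endpoints as written and enumerate every value at that precision. The sketch below does this with Python's `decimal` module so that values such as 6.1 are not distorted by binary floating point; treating the finer of the two endpoint precisions as “the same degree of precision” is an assumption of this example, not language from the disclosure.

```python
from decimal import Decimal


def contemplated_values(low: str, high: str) -> list[str]:
    """Enumerate every value from low to high at the endpoints' precision.

    Endpoints are strings so their written precision ("6.0" vs "6") survives.
    """
    lo, hi = Decimal(low), Decimal(high)
    if lo > hi:
        raise ValueError("low must not exceed high")
    # Decimal places of the more precise endpoint (0 for whole numbers).
    places = max(-lo.as_tuple().exponent, -hi.as_tuple().exponent, 0)
    step = Decimal(1).scaleb(-places)  # e.g. 0.1 when places == 1
    values = []
    value = lo
    while value <= hi:
        values.append(str(value.quantize(step)))
        value += step
    return values


if __name__ == "__main__":
    print(contemplated_values("6", "9"))
    print(contemplated_values("6.0", "7.0"))
```

For the two examples in [0041], this yields 6, 7, 8, 9 and the eleven values 6.0 through 7.0.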
[0042] Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. For example, any nomenclature used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics, and protein and nucleic acid chemistry and hybridization described herein are those that are well known and commonly used in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
[0043] The terms “complementary” and “complementarity” refer to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types of pairing. The degree of complementarity between two nucleic acid sequences can be indicated by the percentage of nucleotides in a nucleic acid sequence which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 50%, 60%, 70%, 80%, 90%, and 100% complementary). Two nucleic acid sequences are “perfectly complementary” if all the contiguous nucleotides of a nucleic acid sequence will hydrogen bond with the same number of contiguous nucleotides in a second nucleic acid sequence. Two nucleic acid sequences are “substantially complementary” if the degree of complementarity between the two nucleic acid sequences is at least 60% (e.g., 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100%) over a region of at least 8 nucleotides (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides), or if the two nucleic acid sequences hybridize under at least moderate, preferably high, stringency conditions. Exemplary moderate stringency conditions include overnight incubation at 37° C in a solution comprising 20% formamide, 5× SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5× Denhardt’s solution, 10% dextran sulfate, and 20 mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1× SSC at about 37-50° C, or substantially similar conditions, e.g., the moderately stringent conditions described in Sambrook et al., infra.
High stringency conditions are conditions that, for example, (1) use low ionic strength and high temperature for washing, such as 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate (SDS) at 50° C; (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin (BSA)/0.1% Ficoll/0.1% polyvinylpyrrolidone (PVP)/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride and 75 mM sodium citrate at 42° C; or (3) employ 50% formamide, 5× SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt’s solution, sonicated salmon sperm DNA (50 µg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C, with washes at (i) 42° C in 0.2× SSC, (ii) 55° C in 50% formamide, and (iii) 55° C in 0.1× SSC (preferably in combination with EDTA). Additional details and an explanation of stringency of hybridization reactions are provided in, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd ed., Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2001); and Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing Associates and John Wiley & Sons, New York (1994).
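The percentage-based part of the complementarity definition in [0043] can be sketched as follows. The example assumes two equal-length sequences, both written 5’ to 3’, that pair antiparallel end to end with no gaps or offsets. It counts only Watson-Crick pairs (A-T, A-U, G-C), so it ignores the non-traditional pairing mentioned above, and it does not model the alternative hybridization-under-stringency criterion. Function names and keyword thresholds are illustrative.

```python
# Watson-Crick pairs for DNA and RNA; wobble and other non-canonical pairs are
# deliberately excluded in this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(seq_a: str, seq_b: str) -> float:
    """Percent of positions in seq_a that Watson-Crick pair with seq_b.

    Both sequences are written 5'->3' and pair antiparallel over their full
    length (no gaps, no offset), so they must be the same length.
    """
    a, b = seq_a.upper(), seq_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    # Antiparallel: the 5' end of seq_a faces the 3' end of seq_b.
    paired = sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))
    return 100.0 * paired / len(a)


def is_perfectly_complementary(seq_a: str, seq_b: str) -> bool:
    return percent_complementarity(seq_a, seq_b) == 100.0


def is_substantially_complementary(
    seq_a: str, seq_b: str, min_percent: float = 60.0, min_length: int = 8
) -> bool:
    """>= min_percent complementarity over a region of >= min_length nt."""
    return len(seq_a) >= min_length and percent_complementarity(seq_a, seq_b) >= min_percent
```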
[0044] A cell has been “genetically modified,” “transformed,” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell. In prokaryotes, yeast, and mammalian cells, for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that comprise a population of daughter cells containing the transforming DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
[0045] As used herein, a “nucleic acid” or a “nucleic acid sequence” refers to a polymer or oligomer of pyrimidine and/or purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. The present technology contemplates any deoxyribonucleotide, ribonucleotide, or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated, or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogeneous in composition and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
In some embodiments, a nucleic acid or nucleic acid sequence comprises other kinds of nucleic acid structures such as, for instance, a DNA/RNA helix, peptide nucleic acid (PNA), morpholino nucleic acid (see, e.g., Braasch and Corey, Biochemistry, 41(14): 4503-4510 (2002) and U.S. Pat. No. 5,034,506, incorporated herein by reference), locked nucleic acid (LNA; see Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 97: 5633-5638 (2000), incorporated herein by reference), cyclohexenyl nucleic acids (see Wang, J. Am. Chem. Soc., 122: 8595-8602 (2000), incorporated herein by reference), and/or a ribozyme. Hence, the term “nucleic acid” or “nucleic acid sequence” may also encompass a chain comprising non-natural nucleotides, modified nucleotides, and/or non-nucleotide building blocks that can exhibit the same function as natural nucleotides (e.g., “nucleotide analogs”); further, the term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide, or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin, which may be single- or double-stranded, and represent the sense or antisense strand. The terms “nucleic acid,” “polynucleotide,” “nucleotide sequence,” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
[0046] A “peptide” or “polypeptide” is a sequence of two or more amino acids linked by peptide bonds. The peptide or polypeptide can be natural, synthetic, or a modification or combination of natural and synthetic. Polypeptides include proteins such as binding proteins, receptors, and antibodies. The proteins may be modified by the addition of sugars, lipids, or other moieties not included in the amino acid chain. The terms “polypeptide” and “protein” are used interchangeably herein.
[0047] As used herein, the term “percent sequence identity” refers to the percentage of nucleotides or nucleotide analogs in a nucleic acid sequence, or amino acids in an amino acid sequence, that are identical with the corresponding nucleotides or amino acids in a reference sequence after aligning the two sequences and introducing gaps, if necessary, to achieve the maximum percent identity. Hence, if a nucleic acid according to the technology is longer than a reference sequence, additional nucleotides in the nucleic acid that do not align with the reference sequence are not taken into account for determining sequence identity. Methods and computer programs for alignment are well known in the art, including BLAST, Align 2, and FASTA.
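Percent sequence identity as defined in [0047] can be computed once an alignment exists. The sketch below assumes the alignment has already been produced (for example by one of the programs named above) and is supplied as two equal-length strings with '-' marking gaps. Columns where the reference has a gap are skipped, so extra nucleotides in a longer query do not count, while a query gap opposite a reference residue counts as a non-identity. Using the aligned reference positions as the denominator is an interpretation adopted for this example.

```python
def percent_identity(query_aln: str, ref_aln: str, gap: str = "-") -> float:
    """Percent identity of a pre-aligned query against a reference.

    Both strings come from the same alignment and have equal length.
    Columns with a gap in the reference (query overhangs or insertions)
    are ignored, per the "not taken into account" rule.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_positions = 0
    identical = 0
    for q, r in zip(query_aln.upper(), ref_aln.upper()):
        if r == gap:
            continue  # query residue with no reference counterpart
        ref_positions += 1
        if q == r:
            identical += 1
    if ref_positions == 0:
        raise ValueError("reference has no aligned residues")
    return 100.0 * identical / ref_positions
```

For instance, a query `ACGTTT` aligned to a reference `ACGT--` scores 100% because the two overhanging nucleotides are ignored.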
[0048] A “vector” or “expression vector” is a replicon, such as a plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert,” may be attached or incorporated so as to bring about the replication of the attached segment in a cell.
[0049] The term “wild-type” refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the term “modified,” “mutant,” or “polymorphic” refers to a gene or gene product that displays modifications in sequence and/or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
[0050] In bacteria and archaea, CRISPR/Cas systems provide immunity by incorporating fragments of invading phage, virus, and plasmid DNA into CRISPR loci and using corresponding CRISPR RNAs (“crRNAs”) to guide the degradation of homologous sequences. Each CRISPR locus encodes acquired “spacers” that are separated by repeat sequences. Transcription of a CRISPR locus produces a “pre-crRNA,” which is processed to yield crRNAs containing spacer-repeat fragments that guide effector nuclease complexes to cleave dsDNA sequences complementary to the spacer. CRISPR systems were originally classified into three types (type I, type II, and type III) based on the Cas protein type and the use of a protospacer adjacent motif (PAM) for selection of protospacers in invading DNA. The endogenous type II systems comprise the Cas9 protein and two noncoding RNAs: trans-activating crRNA (tracrRNA) and a precursor crRNA (pre-crRNA) array containing nuclease guide sequences (also referred to as “spacers”) interspaced by identical direct repeats (DRs). tracrRNA is important for processing the pre-crRNA and formation of the Cas9 complex. First, tracrRNAs hybridize to repeat regions of the pre-crRNA. Second, endogenous RNase III cleaves the hybridized crRNA-tracrRNAs, and a second event removes the 5’ end of each spacer, yielding mature crRNAs that remain associated with both the tracrRNA and Cas9. Third, each mature complex locates a target double-stranded DNA (dsDNA) sequence and cleaves both strands using the nuclease activity of Cas9.
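The spacer-repeat organization of a CRISPR locus described in [0050] can be illustrated at the string level. The toy sketch below splits an array on a known repeat and returns the spacers lying between consecutive repeats, plus spacer-repeat units loosely analogous to the fragments carried by crRNAs. It does not model RNase III cleavage or the secondary trimming step, and the repeat and spacer sequences used to exercise it are invented placeholders.

```python
def spacers_from_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be non-empty")
    parts = array.upper().split(repeat.upper())
    # parts[0] is any leader before the first repeat and parts[-1] any
    # trailing sequence after the last repeat; everything between is a spacer.
    return parts[1:-1]


def spacer_repeat_units(array: str, repeat: str) -> list[str]:
    """Pair each spacer with the repeat that follows it (toy crRNA units)."""
    return [spacer + repeat.upper() for spacer in spacers_from_array(array, repeat)]
```

An array with a single repeat, or none, yields no spacers, which matches the requirement that spacers sit between repeats.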
[0051] CRISPR/Cas gene editing systems have been developed to enable targeted modifications to a specific gene of interest in eukaryotic cells. CRISPR/Cas gene editing systems are commonly based on the RNA-guided Cas9 nuclease from the type II prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system. Engineering CRISPR/Cas systems for use in eukaryotic cells typically involves reconstitution of the crRNA-tracrRNA-Cas9 complex. In human cells, for example, the Cas9 amino acid sequence may be codon-optimized and modified to include an appropriate nuclear localization signal, and the crRNA and tracrRNA sequences may be expressed individually or as a single chimeric molecule via an RNA polymerase III promoter. Typically, the crRNA and tracrRNA sequences are expressed as a chimera and are referred to collectively as “guide RNA” (gRNA) or single guide RNA (sgRNA). Thus, the terms “guide RNA,” “single guide RNA,” and “synthetic guide RNA” are used interchangeably herein and refer to a nucleic acid sequence comprising a tracrRNA and a pre-crRNA array containing a guide sequence. The terms “guide sequence,” “guide,” and “spacer” are used interchangeably herein and refer to the approximately 20-nucleotide sequence within a guide RNA that specifies the target site. In CRISPR/Cas9 systems, the guide RNA contains an approximately 20-nucleotide guide sequence that directs Cas9 via Watson-Crick base pairing to a target sequence followed by a protospacer adjacent motif (PAM).
[0052] In some embodiments, the disclosure provides a system for RNA-guided recombineering utilizing tools from CRISPR gene editing systems. The system comprises a Cas12f protein, a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence, and a microbial recombination protein.
[0053] Cas protein families are described in further detail in, e.g., Haft et al., PLoS Comput. Biol., 1(6): e60 (2005), incorporated herein by reference. In the systems described herein, the Cas protein is Cas12f. The amino acid sequences of Cas proteins from a variety of species are publicly available through the GenBank and UniProt databases. Class 2 CRISPR systems are exceptionally diverse; nevertheless, all share a single effector protein that contains a conserved RuvC-like nuclease domain. Notably, the size of these CRISPR-associated (Cas) nucleases ranges from >1000 amino acids (aa) for Cas9/Cas12a to as small as 400-600 aa for Cas12f. For in vivo genome editing applications, compact RNA-guided nucleases are desirable and would streamline cellular delivery approaches.
[0054] Cas12f, also known as Cas14, is the smallest class 2 CRISPR-Cas effector reported to date, with a length between ~400-700 amino acids (Harrington et al., Science. 2018; 362:839-842). Cas12f proteins were identified almost exclusively within a superphylum of symbiotic archaea, DPANN (Harrington et al., Science. 2018; 362:839-842). Initially found to be specific for ssDNA, Cas12f was recently reported to also recognize dsDNA with 5' T-rich protospacer adjacent motifs (PAMs) (Karvelis et al., Nucleic Acids Res. 2020; 48:5016-5023). Cas12f associates with a crRNA and a tracrRNA, which can be fused into a single guide RNA (sgRNA), to target substrate DNA. Cas12f is a Mg2+-dependent endonuclease that functions best in low salt concentrations and at ~46°C (Karvelis et al., Nucleic Acids Res. 2020; 48:5016-5023). Similar to other Cas12 nucleases, Cas12f is capable of cleaving non-specific ssDNA in trans after binding complementary target DNA, thus enabling its development for nucleic acid detection (Harrington et al., Science. 2018; 362:839-842).
[0055] Type V effectors employ multiple domains distributed in a recognition lobe (REC) and a nuclease lobe (NUC) for substrate recognition and cleavage. The REC lobe is responsible for substrate recognition, whereas the NUC lobe contains a nuclease site located within the RuvC domain. How the miniature Cas12f, which is only about half the size of other Cas12 nucleases, meets all functional requirements for target recognition and cleavage was initially unclear. By determining the atomic structures of Cas12f-sgRNA in the presence and absence of target dsDNA, Xiao et al. (Nucleic Acids Research, Volume 49, Issue 7, 19 April 2021, pages 4120-4128) showed that two copies of Cas12f are required for substrate recognition and cleavage.
[0056] In some embodiments, the Cas12f protein is a catalytically dead Cas12f. For example, to obtain a dead or nuclease-inactivated Cas12f protein, Cas12f may be modified by introducing codons encoding the D228A and D225A substitutions into the Cas12f gene (see, e.g., Bigelyte et al., NATURE COMMUNICATIONS (2021) 12:6191 https://doi.org/10.1038/s41467-021-26469-4).
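Substitutions such as D225A and D228A use the usual notation of wild-type residue, 1-based position, and new residue. The sketch below applies such substitutions to a protein sequence, refusing to proceed if the expected wild-type residue is not found, and shows the corresponding codon swap in an in-frame coding sequence. The helper names and the toy sequences used to exercise them are invented; position numbering must follow the specific Cas12f ortholog being modified, and the alanine codon GCC is only one of the four possible GCN codons.

```python
import re

# Wild-type residue, 1-based position, new residue, e.g. "D225A".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply substitutions such as "D225A" to a one-letter protein sequence."""
    residues = list(protein.upper())
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"expected {wild_type} at {position}, found {found}")
        residues[position - 1] = new
    return "".join(residues)


def replace_codon(cds: str, position: int, new_codon: str) -> str:
    """Swap the codon for residue `position` (1-based) in an in-frame CDS."""
    start = (position - 1) * 3
    if len(new_codon) != 3 or position < 1 or start + 3 > len(cds):
        raise ValueError("invalid codon or position")
    return cds[:start] + new_codon.upper() + cds[start + 3:]
```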
[0057] In some embodiments, the system comprises a nucleic acid molecule comprising a guide RNA sequence complementary to a target DNA sequence. The guide RNA sequence, as described above, specifies the target site with an approximately 20-nucleotide guide sequence that directs Cas12f via Watson-Crick base pairing to a target sequence adjacent to a protospacer adjacent motif (PAM).
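Target-site selection for a Cas12f guide can be sketched as a scan for a 5’ PAM followed by an approximately 20-nucleotide protospacer. The disclosure only characterizes the Cas12f PAM as 5’ T-rich; the default `TTTR` pattern below is an assumption standing in for whatever PAM applies to the ortholog in use, and the IUPAC table covers only common codes. The scan reports hits on the supplied strand only; the other strand can be searched by passing `reverse_complement(dna)`, with positions then given in that strand's coordinates. The guide is simply the protospacer written as RNA; no off-target or efficiency scoring is attempted, and all names are hypothetical.

```python
import re

# Common IUPAC nucleotide codes translated to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_5prime_pam_sites(
    dna: str, pam: str = "TTTR", spacer_len: int = 20
) -> list[tuple[int, str, str, str]]:
    """Return (position, pam, protospacer, guide) for each 5'-PAM-protospacer hit."""
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # Zero-width lookahead so overlapping candidate sites are all reported.
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    hits = []
    for m in pattern.finditer(dna.upper()):
        protospacer = m.group(2)
        guide = protospacer.replace("T", "U")  # guide spacer written as RNA
        hits.append((m.start(), m.group(1), protospacer, guide))
    return hits
```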
[0058] The terms “target DNA sequence,” “target nucleic acid,” “target sequence,” and “target site” are used interchangeably herein to refer to a polynucleotide (nucleic acid, gene, chromosome, genome, etc.) to which a guide sequence (e.g., a guide RNA) is designed to have complementarity, wherein hybridization between the target sequence and a guide sequence promotes the formation of a Cas12f/CRISPR complex, provided sufficient conditions for binding exist. In some embodiments, the target sequence is a genomic DNA sequence. The term “genomic,” as used herein, refers to a nucleic acid sequence (e.g., a gene or locus) that is located on a chromosome in a cell. The target sequence and guide sequence need not exhibit complete complementarity, provided that there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, referenced herein and incorporated by reference. The strand of the target DNA that is complementary to and hybridizes with the DNA-targeting RNA is referred to as the “complementary strand,” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the DNA-targeting RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.”
[0059] The target genomic DNA sequence may encode a gene product. The term “gene product,” as used herein, refers to any biochemical product resulting from expression of a gene. Gene products may be RNA or protein. RNA gene products include non-coding RNA, such as tRNA, rRNA, microRNA (miRNA), and small interfering RNA (siRNA), and coding RNA, such as messenger RNA (mRNA). In some embodiments, the target genomic DNA sequence encodes a protein or polypeptide.
[0060] In some embodiments, for instance when the system includes a catalytically dead Cas12f, two nucleic acid molecules comprising a guide RNA sequence may be utilized. The two nucleic acid molecules may have the same or different guide RNA sequences and thus be complementary to the same or different target DNA sequences. In some embodiments, the guide RNA sequences of the two nucleic acid molecules are complementary to target DNA sequences at opposite ends (e.g., 3’ and 5’) and/or on opposite strands of the insert location.
[0061] In some embodiments, the system further comprises a recruitment system comprising at least one aptamer sequence and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein.
[0062] In some embodiments, the aptamer sequence is an RNA aptamer sequence. In some embodiments, the nucleic acid molecule comprising the guide RNA also comprises one or more RNA aptamers, or distinct RNA secondary structures or sequences that can recruit and bind another molecular species, an adaptor molecule, such as a nucleic acid or protein. The RNA aptamers can be naturally occurring or synthetic oligonucleotides that have been engineered through repeated rounds of in vitro selection or SELEX (systematic evolution of ligands by exponential enrichment) to bind to a specific target molecular species. In some embodiments, the nucleic acid comprises two or more aptamer sequences. The aptamer sequences may be the same or different and may target the same or different adaptor proteins. In select embodiments, the nucleic acid comprises two aptamer sequences.
[0063] Any known RNA aptamer/aptamer binding protein pair may be selected and used in connection with the present disclosure (see, e.g., Jayasena, S.D., Clinical Chemistry, 1999. 45(9): p. 1628-1650; Gelinas, et al., Current Opinion in Structural Biology, 2016. 36: p. 122-132; and Hasegawa, H., Molecules, 2016; 21(4): p. 421, incorporated herein by reference).
[0064] A number of RNA aptamer binding, or adaptor, proteins exist, including a diverse array of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, φCb5, φCb8r, φCb12r, φCb23r, 7s, and PRR1. In some embodiments, the RNA aptamer binds MS2 bacteriophage coat protein or a functional derivative, fragment, or variant thereof. MS2-binding RNA aptamers commonly have a simple stem-loop structure, classically defined by a 19-nucleotide RNA molecule with a single bulged adenine on the 5’ leg of the stem (Witherell G.W., et al., (1991) Prog. Nucleic Acid Res. Mol. Biol., 40, 185-220, incorporated herein by reference). However, a number of vastly different primary sequences were found to be able to bind the MS2 coat protein (Parrott AM, et al., Nucleic Acids Res. 2000;28(2):489-497; Buenrostro JD, et al., Nature Biotechnology 2014; 32, 562-568, each incorporated herein by reference). Any of the RNA aptamer sequences known to bind the MS2 bacteriophage coat protein may be utilized in connection with the present disclosure. In select embodiments, the MS2 RNA aptamer sequence comprises: AACAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO: 145), AGCAUGAGGAUCACCCAUGUCUGCAG (SEQ ID NO: 146), or AGCGUGAGGAUCACCCAUGCCUGCAG (SEQ ID NO: 147).
[0065] N-proteins (Nut-utilization site proteins) of bacteriophages contain arginine-rich conserved RNA recognition motifs of ~20 amino acids, referred to as N peptides. The RNA aptamer may bind a phage N peptide or a functional derivative, fragment or variant thereof. In some embodiments, the phage N peptide is the lambda or P22 phage N peptide or a functional derivative, fragment or variant thereof.
[0066] In select embodiments, the N peptide is lambda phage N22 peptide, or a functional derivative, fragment, or variant thereof. In some embodiments, the N22 peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNARTRRRERRAEKQAQWKAAN (SEQ ID NO: 149). The N22 peptide, the 22-amino-acid RNA-binding domain of the λ bacteriophage antiterminator protein N (λN-(1-22) or λN peptide), is capable of specifically binding to specific stem-loop structures, including but not limited to the BoxB stem-loop. See, for example, Cilley and Williamson, RNA 1997; 3(1):57-67, incorporated herein by reference. A number of different BoxB stem-loop primary sequences are known to bind the N22 peptide, and any of those may be utilized in connection with the present disclosure. In some embodiments, the N22 peptide RNA aptamer sequence comprises a nucleotide sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCCCUGAAAAAGGGC (SEQ ID NO: 150), GCCCUGAAGAAGGGC (SEQ ID NO: 151), GCGCUGAAAAAGCGC (SEQ ID NO: 152), GCCCUGACAAAGGGC (SEQ ID NO: 153), and GCGCUGACAAAGCGC (SEQ ID NO: 154). In some embodiments, the N22 peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 150-154.
[0067] In select embodiments, the N peptide is the P22 phage N peptide, or a functional derivative, fragment, or variant thereof. A number of different BoxB stem-loop primary sequences are known to bind the P22 phage N peptide and variants thereof, and any of those may be utilized in connection with the present disclosure. See, for example, Cocozaki, Ghattas, and Smith, Journal of Bacteriology 2008; 190(23):7699-7708, incorporated herein by reference. In some embodiments, the P22 phage N peptide comprises an amino acid sequence with at least 70% similarity to the amino acid sequence GNAKTRRHERRRKLAIERDTI (SEQ ID NO: 155). In some embodiments, the P22 phage N peptide RNA aptamer sequence comprises a sequence with at least 70% similarity to an RNA sequence selected from the group consisting of GCGCUGACAAAGCGC (SEQ ID NO: 156) and CCGCCGACAACGCGG (SEQ ID NO: 157). In some embodiments, the P22 phage N peptide RNA aptamer sequence is selected from the group consisting of SEQ ID NOs: 156-157, UGCGCUGACAAAGCGCG (SEQ ID NO: 158), and ACCGCCGACAACGCGGU (SEQ ID NO: 159).
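The BoxB aptamers listed in [0066] and [0067] are short stem-loops whose 5’ and 3’ ends pair to form the stem. A simple way to inspect them is to pair nucleotides inward from both ends until a mismatch, which the sketch below does, optionally allowing G·U wobble pairs and requiring a loop of at least three nucleotides. This is a naive terminal-stem check, not a thermodynamic folding algorithm; for real structure prediction a dedicated tool such as ViennaRNA's RNAfold would be more appropriate. The function name is hypothetical.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def terminal_hairpin(rna: str, allow_wobble: bool = True, min_loop: int = 3) -> tuple[int, str]:
    """Pair bases inward from both ends; return (stem length, loop sequence).

    Returns (0, "") when the two terminal bases cannot pair.
    """
    seq = rna.upper().replace("T", "U")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    i, j = 0, len(seq) - 1
    # Extend the stem only while the enclosed loop stays >= min_loop nt.
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in allowed:
        i += 1
        j -= 1
    if i == 0:
        return 0, ""
    return i, seq[i:j + 1]
```

For SEQ ID NO: 150 this reports a 5-bp stem closed by a GAAAA loop, while SEQ ID NO: 158 forms its sixth, terminal pair only if the U·G wobble is allowed.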
[0068] In some embodiments, the aptamer sequence is a peptide aptamer sequence. The peptide aptamers can be naturally occurring or synthetic peptides that are specifically recognized by an affinity agent. Such aptamers include, but are not limited to, a c-Myc affinity tag, an HA affinity tag, a His affinity tag, an S affinity tag, a methionine-His affinity tag, an RGD-His affinity tag, a 7x His tag, a FLAG octapeptide, a strep tag or strep tag II, a V5 tag, or a VSV-G epitope. Corresponding aptamer binding proteins are well-known in the art and include, for example, primary antibodies, biotin, affimers, single domain antibodies, and antibody mimetics.
[0069] An exemplary peptide aptamer is a GCN4 peptide (Tanenbaum et al., Cell 2014; 159(3):635-646, incorporated herein by reference). Antibodies or GCN4-binding proteins can be used as the aptamer binding proteins.
[0070] In some embodiments, the peptide aptamer sequence is conjugated to the Cas12f protein. The peptide aptamer sequence may be fused to Cas12f in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, the peptide aptamer is fused to the C-terminus of the Cas12f protein.
[0071] In some embodiments, between 1 and 24 peptide aptamer sequences may be conjugated to the Cas12f protein. The aptamer sequences may be the same or different and may target the same or different aptamer binding proteins. In select embodiments, 1 to 24 tandem repeats of the same peptide aptamer sequence are conjugated to the Cas12f protein. In preferred embodiments, between 4 and 18 tandem repeats are conjugated to the Cas12f protein. The individual aptamers may be separated by a linker region. Suitable linker regions are known in the art. The linker may be flexible or configured to allow affinity agents to bind adjacent aptamers with reduced or no steric hindrance. The linker sequences may provide an unstructured or linear region of the polypeptide, for example, through the inclusion of one or more glycine and/or serine residues. The linker sequences can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10 or more amino acids in length.
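The C-terminal tandem-repeat arrangement described in [0070] and [0071] can be sketched as simple sequence assembly. In this Python sketch the FLAG octapeptide (DYKDDDDK) stands in for the peptide aptamer, GGGGS is an assumed glycine/serine linker, and the Cas12f sequence is a poly-alanine placeholder rather than a real protein; the function names are hypothetical.

```python
GS_LINKER = "GGGGS"  # assumed Gly/Ser linker; the text only says linkers may contain Gly and/or Ser
FLAG = "DYKDDDDK"    # FLAG octapeptide, one of the peptide aptamers listed above


def tandem_array(epitope: str, copies: int, linker: str = GS_LINKER) -> str:
    """Join `copies` repeats of a peptide aptamer, separated by `linker`."""
    if not 1 <= copies <= 24:
        raise ValueError("the text contemplates 1 to 24 aptamer copies")
    return linker.join([epitope] * copies)


def c_terminal_fusion(cas12f: str, epitope: str, copies: int, linker: str = GS_LINKER) -> str:
    """Cas12f, then a linker, then the tandem aptamer array at the C-terminus."""
    return cas12f + linker + tandem_array(epitope, copies, linker)


# Hypothetical 500-residue placeholder standing in for a Cas12f sequence.
cas12f_placeholder = "M" + "A" * 499

for n in (4, 18):  # the preferred range of tandem repeats
    fusion = c_terminal_fusion(cas12f_placeholder, FLAG, n)
    print(f"{n} copies: {len(fusion)} residues")
```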
[0072] In some embodiments, the fusion protein comprises a microbial recombination protein functionally linked to an aptamer binding protein. The microbial recombination protein may be RecE, RecT, lambda exonuclease (Exo), Bet protein (betA, redB), exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
[0073] In select embodiments, the microbial recombination protein is RecE or RecT, or a derivative or variant thereof. Derivatives or variants of RecE and RecT are functionally equivalent proteins or polypeptides that possess substantially similar function to wild-type RecE and RecT. RecE and RecT derivatives or variants include biologically active amino acid sequences similar to the wild-type sequences but differing due to amino acid substitutions, additions, deletions, truncations, post-translational modifications, or other modifications. In some embodiments, the derivatives may improve translation, purification, biological half-life, or activity, or may eliminate or lessen any undesirable side effects or reactions. The derivatives or variants may be naturally occurring polypeptides, synthetic or chemically synthesized polypeptides, or genetically engineered polypeptides. RecE and RecT bioactivities are known to, and easily assayed by, those of ordinary skill in the art, and include, for example, exonuclease activity and single-stranded nucleic acid binding, respectively.
[0074] The RecE or RecT may be from any of a number of microbial organisms, including Escherichia coli, Pantoea brenneri, Type-F symbiont of Plautia stali, Providencia sp. MGF014, Shigella sonnei, and Pseudobacteriovorax antillogorgiicola, among others. In preferred embodiments, the RecE and RecT proteins are derived from Escherichia coli.
[0075] In some embodiments, the fusion protein comprises RecE, or a derivative or variant thereof. The RecE, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. The RecE, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In select embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8. In exemplary embodiments, the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
[0076] In some embodiments, the fusion protein comprises RecT, or a derivative or variant thereof. The RecT, or derivative or variant thereof, may comprise an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. The RecT, or derivative or variant thereof, may comprise an amino acid sequence with at least 70% (e.g., 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%) similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In select embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14. In exemplary embodiments, the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 90% similarity to the amino acid sequence of SEQ ID NO: 9.
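Paragraphs [0075] and [0076] define RecE and RecT variants by similarity thresholds (70% and 90%) to reference SEQ ID NOs, and truncated variants are also contemplated, so the compared sequences can differ in length. The Python sketch below, a hypothetical helper, scores such pairs with a textbook Needleman-Wunsch global alignment. The scoring (match +1, mismatch -1, linear gap -1) and the identity definition (identical columns divided by alignment length, end gaps included) are assumptions: the text does not say how similarity is computed, and similarity in the strict sense would also credit conservative substitutions, for example via a BLOSUM62 matrix.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a Needleman-Wunsch global alignment (identical columns / all columns)."""
    n, m = len(a), len(b)
    if n == 0 and m == 0:
        return 100.0

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, counting alignment columns and identities.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns


def identity_tier(pid: float) -> str:
    """Map a percent identity onto the 90% and 70% thresholds used above."""
    if pid >= 90.0:
        return "at least 90%"
    if pid >= 70.0:
        return "at least 70%"
    return "below 70%"


reference = "MACDEFGHIKLMNPQRSTVWY"  # toy sequence, not one of SEQ ID NOs: 1-14
variant = reference[:-2]             # two residues truncated from the C-terminus
pid = global_identity(reference, variant)
print(f"{pid:.1f}% -> {identity_tier(pid)}")
```

Because end gaps count as alignment columns here, truncated variants score lower than they would under a local alignment; other tools normalize identity differently, so thresholds should be applied with one fixed convention.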
[0077] Truncations may be from the C-terminal end, the N-terminal end, or both. For example, as demonstrated below, a diverse set of truncations from either end or both ends provided a functional product. In some embodiments, one or more (e.g., 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 100, 120 or more) amino acids may be truncated from the C-terminal end, the N-terminal end, or both, as compared to the wild-type sequence.
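The truncation series in [0077] can be enumerated systematically before cloning. This Python sketch is a hypothetical helper: the toy sequence is not a RecE or RecT sequence, and prepending a start methionine to N-terminally truncated products is an assumed expression convenience that the text does not specify.

```python
from typing import Dict, Iterable, Tuple

# Truncation lengths taken from the examples listed in the text.
TRUNCATION_LENGTHS = (1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 100, 120)


def truncation_panel(seq: str, lengths: Iterable[int],
                     add_start_met: bool = True) -> Dict[Tuple[str, int], str]:
    """Enumerate N-terminal, C-terminal, and two-sided truncations keyed by (end, residues removed)."""
    panel: Dict[Tuple[str, int], str] = {}
    for k in lengths:
        if k <= 0 or 2 * k >= len(seq):
            continue  # skip lengths for which a two-sided cut would leave no residues
        n_cut, c_cut, both = seq[k:], seq[:-k], seq[k:-k]
        if add_start_met:
            # Assumed convenience: give N-terminally truncated products a start methionine.
            n_cut = n_cut if n_cut.startswith("M") else "M" + n_cut
            both = both if both.startswith("M") else "M" + both
        panel[("N", k)] = n_cut
        panel[("C", k)] = c_cut
        panel[("both", k)] = both
    return panel


toy = "MACDEFGHIKLMNPQRSTVWY"  # toy sequence, not RecE or RecT
for key, truncated in sorted(truncation_panel(toy, TRUNCATION_LENGTHS).items()):
    print(key, truncated)
```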
[0078] In the fusion protein, the microbial recombination protein may be linked to either terminus of the aptamer binding protein in any orientation (e.g., N-terminus to C-terminus, C-terminus to N-terminus, N-terminus to N-terminus). In select embodiments, the microbial recombination protein N-terminus is linked to the aptamer binding protein C-terminus. Thus, the overall fusion protein from N- to C-terminus comprises the aptamer binding protein (N- to C-terminus) linked to the microbial recombination protein (N- to C-terminus).
[0079] In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to a nuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to an endonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to an exonuclease. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to a nuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to an endonuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to an exonuclease and/or a Cas or dCas. In some embodiments, the recombination protein may be expressed independently of a nuclease, rather than as a fusion protein with it. In some embodiments, the recombination protein may be expressed independently of an endonuclease, rather than as a fusion protein with it. In some embodiments, the recombination protein may be expressed independently of an exonuclease, rather than as a fusion protein with it. In some embodiments, the recombination protein may be expressed independently of a nuclease and/or a Cas or dCas, rather than as a fusion protein with them. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to an aptamer and/or aptamer binding protein. In some embodiments, the recombination protein may be expressed independently of an aptamer and/or aptamer binding protein, rather than as a fusion protein with them. In some embodiments, the recombination protein may be functionally linked as a fusion protein, chimera, or chimeric molecule to a nuclease and/or Cas or dCas and/or to an aptamer and/or aptamer binding protein.
In some embodiments, the recombination protein may be expressed independently of a nuclease and/or a Cas or dCas and/or an aptamer and/or aptamer binding protein, rather than as a fusion protein with them. In some embodiments, the aptamer and/or aptamer binding protein is an MCP protein. In some embodiments, the recombination protein may be an SSAP (single-strand annealing protein).
[0080] The term “nuclease” as used herein refers to an agent, such as a protein or small molecule, that is capable of cleaving phosphodiester bonds that join nucleotide residues in a nucleic acid molecule. In some embodiments, the nuclease is a protein, e.g., an enzyme that is capable of binding to a nucleic acid molecule and cleaving phosphodiester bonds linking nucleotide residues in the nucleic acid molecule. The nuclease may be an endonuclease, which cleaves a phosphodiester bond within a polynucleotide strand, or an exonuclease, which cleaves a phosphodiester bond at the end of a polynucleotide strand. In some embodiments, the nuclease is a site-specific nuclease that binds to and/or cleaves a particular phosphodiester bond within a particular nucleotide sequence, which is also referred to herein as a "recognition sequence," "nuclease target site," or "target site." In some embodiments, the nuclease is an RNA-guided (i.e., RNA-programmable) nuclease that complexes with (e.g., binds) an RNA having a sequence complementary to the target site, thereby providing sequence specificity of the nuclease. In some embodiments, the nuclease recognizes a single-stranded target site, while in other embodiments, the nuclease recognizes a double-stranded target site, e.g., a double-stranded DNA target site. Target sites for many naturally occurring nucleases, for example many naturally occurring DNA restriction nucleases, are well known to those skilled in the art. In many cases, DNA nucleases such as EcoRI, HindIII, or BamHI recognize palindromic double-stranded DNA target sites that are 4 to 10 base pairs in length and cut each of the two DNA strands at specific positions within the target site. Some endonucleases symmetrically cleave a double-stranded nucleic acid target site, i.e., cleave both strands at the same position, such that the ends comprise base-paired nucleotides, also referred to herein as blunt ends.
Other endonucleases cleave double-stranded nucleic acid target sites asymmetrically, i.e., each strand is cleaved at a different position such that the ends contain unpaired nucleotides. Unpaired nucleotides at the ends of a double-stranded DNA molecule are also referred to as "overhangs," e.g., "5'-overhangs" or "3'-overhangs," depending on whether the unpaired nucleotides form the 5' or 3' end of the corresponding DNA strand. The ends of a double-stranded DNA molecule that terminate in unpaired nucleotides are also referred to as sticky ends, because they can "stick" to the ends of other double-stranded DNA molecules that contain complementary unpaired nucleotides. Nuclease proteins typically comprise a "binding domain" that mediates interaction of the protein with a nucleic acid substrate (in some cases also specifically binding to a target site) and a "cleavage domain" that catalyzes the cleavage of phosphodiester bonds within the nucleic acid backbone. In some embodiments, the nuclease protein is capable of binding and cleaving a nucleic acid molecule in a monomeric form, while in other embodiments, the nuclease protein must dimerize or multimerize in order to cleave a target nucleic acid molecule. Binding and cleavage domains of naturally occurring nucleases, as well as modular binding and cleavage domains that can be fused to create nucleases, are well known to those of skill in the art. For example, a zinc finger or transcription activator-like effector can be used as a binding domain to specifically bind a desired target site and fused or conjugated to a cleavage domain, such as the cleavage domain of FokI, to create an engineered nuclease that cleaves the target site.
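The distinction between blunt ends and 5' or 3' overhangs follows from where each strand is cut. For a palindromic site cut symmetrically on both strands, the Python sketch below derives the end type and the overhang sequence from the top-strand cut position alone. The cut positions are the standard ones for EcoRI (G^AATTC), HindIII (A^AGCTT), BamHI (G^GATCC), EcoRV (GAT^ATC), and PstI (CTGCA^G); the helper `cut_ends` is illustrative and not part of any library. Non-palindromic sites and enzymes that cut outside their recognition sequence, such as FokI, are out of scope for this sketch.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def cut_ends(site: str, top_cut: int) -> Tuple[str, str]:
    """Classify the ends left when a palindromic site is cut after `top_cut` top-strand nucleotides."""
    site = site.upper()
    if reverse_complement(site) != site:
        raise ValueError("this sketch only handles palindromic recognition sites")
    # A palindromic site is cut at the mirror-image position on the bottom strand,
    # which lies after len(site) - top_cut nucleotides in top-strand coordinates.
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return "blunt", ""
    if top_cut < bottom_cut:
        return "5' overhang", site[top_cut:bottom_cut]
    return "3' overhang", site[bottom_cut:top_cut]


ENZYMES = {  # recognition site and top-strand cut position
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
    "EcoRV": ("GATATC", 3),
    "PstI": ("CTGCAG", 5),
}

for name, (site, cut) in ENZYMES.items():
    kind, overhang = cut_ends(site, cut)
    print(f"{name}: {kind} {overhang}".rstrip())
```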
[0081] Non-limiting examples of an exonuclease include exonuclease I, exonuclease II, exonuclease III, exonuclease IV, exonuclease V, exonuclease VII, exonuclease VIII, lambda exonuclease, Xrn1, mung bean nuclease, TREX2, exonuclease T, T7 exonuclease, strandase exonuclease, 3'-5' exophosphodiesterase, and Bal31 nuclease.
[0082] In some embodiments, the fusion protein further comprises a linker between the microbial recombination protein and the aptamer binding protein. The linkers may comprise any amino acid sequence of any length. The linkers may be flexible such that they do not constrain either of the two components they link together in any particular orientation. The linkers may essentially act as a spacer. In select embodiments, the linker links the C-terminus of the microbial recombination protein to the N-terminus of the aptamer binding protein. In select embodiments, the linker comprises the amino acid sequence of the 16-residue XTEN linker, SGSETPGTSESATPES (SEQ ID NO: 15) or the 37-residue EXTEN linker, SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS (SEQ ID NO: 148).
[0083] In some embodiments, the fusion protein further comprises a nuclear localization sequence (NLS). The nuclear localization sequence may be at any location within the fusion protein (e.g., C-terminal of the aptamer binding protein, N-terminal of the aptamer binding protein, C-terminal of the microbial recombination protein). In select embodiments, the nuclear localization sequence is linked to the C-terminus of the microbial recombination protein. A number of nuclear localization sequences are known in the art (see, e.g., Lange, A., et al., J Biol Chem. 2007; 282(8): 5101-5105, incorporated herein by reference) and may be used in connection with the present invention. The nuclear localization sequence may be the SV40 NLS, PKKKRKV (SEQ ID NO: 16); the Ty1 NLS, NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH (SEQ ID NO: 17); the c-Myc NLS, PAAKRVKLD (SEQ ID NO: 18); the biSV40 NLS, KRTADGSEFESPKKKRKV (SEQ ID NO: 19); or the Mut NLS, PEKKRRRPSGSVPVLARPSPPKAGKSSCI (SEQ ID NO: 20). In select embodiments, the nuclear localization sequence is the SV40 NLS, PKKKRKV (SEQ ID NO: 16).
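One arrangement consistent with [0078] and [0083] places, from N- to C-terminus, the aptamer binding protein, a linker, the microbial recombination protein, and an SV40 NLS at the recombination protein's C-terminus. The Python sketch below assembles that order with the XTEN and EXTEN linkers (SEQ ID NOs: 15 and 148) and the SV40 NLS (SEQ ID NO: 16) given above. The binder and recombinase sequences are poly-residue placeholders, not MCP or RecT, and `assemble_fusion` is a hypothetical name. [0082] also contemplates the reverse orientation, which this sketch does not model.

```python
XTEN = "SGSETPGTSESATPES"                        # 16-residue XTEN linker, SEQ ID NO: 15
EXTEN = "SASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS"  # 37-residue EXTEN linker, SEQ ID NO: 148
SV40_NLS = "PKKKRKV"                             # SV40 NLS, SEQ ID NO: 16


def assemble_fusion(aptamer_binder: str, recombinase: str,
                    linker: str = XTEN, nls: str = SV40_NLS) -> str:
    """N- to C-terminus: aptamer binding protein, linker, recombination protein, NLS."""
    return aptamer_binder + linker + recombinase + nls


# Hypothetical placeholders, not real MCP or RecT sequences.
binder_placeholder = "M" + "A" * 99        # 100 residues
recombinase_placeholder = "M" + "G" * 199  # 200 residues

for name, linker in (("XTEN", XTEN), ("EXTEN", EXTEN)):
    fusion = assemble_fusion(binder_placeholder, recombinase_placeholder, linker)
    print(f"{name}: {len(fusion)} residues, ends with NLS: {fusion.endswith(SV40_NLS)}")
```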
[0084] The Cas12f protein and the fusion protein are desirably included in a single composition, either alone, in combination with each other, and/or with the polynucleotide(s) (e.g., a vector) comprising the guide RNA sequence and the aptamer sequence. The Cas12f protein and/or the fusion protein may or may not be physically or chemically bound to the polynucleotide. The Cas12f protein and/or the microbial recombination protein can be associated with a polynucleotide using any suitable method for protein-protein linking or protein-virus linking known in the art.
[0085] The invention further provides compositions and vectors comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an RNA aptamer binding protein.
[0086] The compositions or vectors may further comprise at least one or both of a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence. In some embodiments, the nucleic acid molecule comprising a guide RNA sequence further comprises at least one RNA aptamer sequence. In some embodiments, the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
[0087] Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the aptamer sequences, the Cas12f proteins, the microbial recombination proteins, and the aptamer binding proteins set forth above in connection with the inventive system also are applicable to the polynucleotides of the recited compositions and vectors.
[0088] In certain aspects the invention involves vectors, e.g., for delivering or introducing components into a cell, but also for propagating these components (e.g., in prokaryotic cells). As used herein, a "vector" is a tool that allows or facilitates the transfer of an entity from one environment to another. It is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. In general, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, or no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a "plasmid," which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g., retroviruses, replication-defective retroviruses, adenoviruses, replication-defective adenoviruses, and adeno-associated viruses (AAVs)). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "expression vectors." Vectors designed for, and that result in, expression in a eukaryotic cell can be referred to herein as "eukaryotic expression vectors." Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids.
[0089] Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). With regard to recombination and cloning methods, mention is made of U.S. patent application Ser. No. 10/815,730, published Sep. 2, 2004 as US 2004-0171156 A1, the contents of which are herein incorporated by reference in their entirety.
[0090] Vector delivery, e.g., plasmid, viral delivery: The CRISPR enzyme, for instance a Type V protein such as Cas12f, and/or any of the present RNAs, for instance a guide RNA, can be delivered using any suitable vector, e.g., plasmid or viral vectors, such as adeno-associated virus (AAV), lentivirus, adenovirus or other viral vector types, or combinations thereof. Effector proteins and one or more guide RNAs can be packaged into one or more vectors, e.g., plasmid or viral vectors. In some embodiments, the vector, e.g., plasmid or viral vector, is delivered to the tissue of interest by, for example, an intramuscular injection, while in other embodiments the delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choice, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
[0091] Such a dosage may further contain, for example, a carrier (water, saline, ethanol, glycerol, lactose, sucrose, calcium phosphate, gelatin, dextran, agar, pectin, peanut oil, sesame oil, etc.), a diluent, a pharmaceutically-acceptable carrier (e.g., phosphate-buffered saline), a pharmaceutically-acceptable excipient, and/or other compounds known in the art. The dosage may further contain one or more pharmaceutically acceptable salts such as, for example, a mineral acid salt such as a hydrochloride, a hydrobromide, a phosphate, a sulfate, etc.; and the salts of organic acids such as acetates, propionates, malonates, benzoates, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, gels or gelling materials, flavorings, colorants, microspheres, polymers, suspension agents, etc. may also be present herein. In addition, one or more other conventional pharmaceutical ingredients, such as preservatives, humectants, suspending agents, surfactants, antioxidants, anticaking agents, fillers, chelating agents, coating agents, chemical stabilizers, etc. may also be present, especially if the dosage form is a reconstitutable form. Suitable exemplary ingredients include microcrystalline cellulose, carboxymethylcellulose sodium, polysorbate 80, phenylethyl alcohol, chlorobutanol, potassium sorbate, sorbic acid, sulfur dioxide, propyl gallate, the parabens, ethyl vanillin, glycerin, phenol, parachlorophenol, gelatin, albumin and a combination thereof. A thorough discussion of pharmaceutically acceptable excipients is available in REMINGTON'S PHARMACEUTICAL SCIENCES (Mack Pub. Co., N.J. 1991) which is incorporated by reference herein.
[0092] In an embodiment herein the delivery is via an adenovirus, which may be at a single booster dose containing at least 1 × 10^5 particles (also referred to as particle units, pu) of adenoviral vector. In an embodiment herein, the dose preferably is at least about 1 × 10^6 particles (for example, about 1 × 10^6 to 1 × 10^11 particles), more preferably at least about 1 × 10^7 particles, more preferably at least about 1 × 10^8 particles (e.g., about 1 × 10^8 to 1 × 10^11 particles or about 1 × 10^9 to 1 × 10^12 particles), and most preferably at least about 1 × 10^10 particles (e.g., about 1 × 10^9 to 1 × 10^10 particles or about 1 × 10^9 to 1 × 10^12 particles), or even at least about 1 × 10^10 particles (e.g., about 1 × 10^10 to 1 × 10^12 particles) of the adenoviral vector. Alternatively, the dose comprises no more than about 1 × 10^14 particles, preferably no more than about 1 × 10^13 particles, even more preferably no more than about 1 × 10^12 particles, even more preferably no more than about 1 × 10^11 particles, and most preferably no more than about 1 × 10^10 particles (e.g., no more than about 1 × 10^9 particles). Thus, the dose may contain a single dose of adenoviral vector with, for example, about 1 × 10^6 particle units (pu), about 2 × 10^6 pu, about 4 × 10^6 pu, about 1 × 10^7 pu, about 2 × 10^7 pu, about 4 × 10^7 pu, about 1 × 10^8 pu, about 2 × 10^8 pu, about 4 × 10^8 pu, about 1 × 10^9 pu, about 2 × 10^9 pu, about 4 × 10^9 pu, about 1 × 10^10 pu, about 2 × 10^10 pu, about 4 × 10^10 pu, about 1 × 10^11 pu, about 2 × 10^11 pu, about 4 × 10^11 pu, about 1 × 10^12 pu, about 2 × 10^12 pu, or about 4 × 10^12 pu of adenoviral vector. See, for example, the adenoviral vectors in U.S. Pat. No. 8,454,972 B2 to Nabel, et al., granted on Jun. 4, 2013, incorporated by reference herein, and the dosages at col 29, lines 36-58 thereof. In an embodiment herein, the adenovirus is delivered via multiple doses.
[0093] In an embodiment herein, the delivery is via an AAV. A therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of from about 20 to about 50 ml of saline solution containing about 1 × 10^10 functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects. In an embodiment herein, the AAV dose is generally in the range of concentrations of from about 1 × 10^5 to 1 × 10^50 genomes AAV, from about 1 × 10^8 to 1 × 10^20 genomes AAV, from about 1 × 10^10 to about 1 × 10^16 genomes, or about 1 × 10^11 to about 1 × 10^16 genomes AAV. A human dosage may be about 1 × 10^13 genomes AAV. Such concentrations may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution. Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose-response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar, et al., granted on Mar. 26, 2013, at col. 27, lines 45-60.
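The volume and titer figures in [0093] are related by simple multiplication. As a worked check, the example human dosage of about 1 × 10^13 genomes delivered in 10 to 25 ml of carrier implies a titer of about 4 × 10^11 to 1 × 10^12 genomes/ml, and 20 to 50 ml at about 1 × 10^10 functional AAV/ml amounts to about 2 × 10^11 to 5 × 10^11 AAV in total. The Python helpers below are a hypothetical sketch of that arithmetic only.

```python
def total_dose(titer_per_ml: float, volume_ml: float) -> float:
    """Total genomes (or particles) delivered by volume_ml of a preparation at titer_per_ml."""
    return titer_per_ml * volume_ml


def required_titer(total: float, volume_ml: float) -> float:
    """Titer per ml needed to deliver `total` genomes in volume_ml of carrier."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return total / volume_ml


HUMAN_DOSE = 1e13  # genomes AAV, the example human dosage above

for volume in (10, 25):
    print(f"{HUMAN_DOSE:.0e} genomes in {volume} ml -> {required_titer(HUMAN_DOSE, volume):.1e} genomes/ml")

for volume in (20, 50):
    print(f"{volume} ml at 1e10/ml -> {total_dose(1e10, volume):.1e} total")
```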
[0094] Ways to package nucleic acid-targeting effector protein coding nucleic acid molecules (e.g., DNA) into vectors (e.g., viral vectors) to mediate genome modification in vivo include:
To achieve NHEJ-mediated gene knockout:
Single virus vector: a vector containing two or more expression cassettes:
Promoter-nucleic acid-targeting effector protein coding nucleic acid molecule-terminator
Promoter-guide RNA1-terminator
Promoter-guide RNA (N)-terminator (up to the size limit of the vector)
Double virus vector:
Vector 1 containing one expression cassette for driving the expression of the nucleic acid-targeting effector protein (such as a Type V protein such as C2c1 or C2c3):
Promoter-nucleic acid-targeting effector protein coding nucleic acid molecule-terminator
Vector 2 containing one or more expression cassettes for driving the expression of one or more guide RNAs:
Promoter-guide RNA1-terminator
Promoter-guide RNA (N)-terminator (up to the size limit of the vector)
To mediate homology-directed repair: in addition to the single and double virus vector approaches described above, an additional vector is used to deliver a homology-directed repair template.
[0095] The promoter used to drive expression of the nucleic acid-targeting effector protein coding nucleic acid molecule can include the following. The AAV ITR can serve as a promoter; this is advantageous because it eliminates the need for an additional promoter element, which would take up space in the vector. The space freed up can be used to drive the expression of additional elements (gRNA, etc.). Also, because ITR activity is relatively weak, it can be used to reduce potential toxicity due to overexpression of the nucleic acid-targeting effector protein. For ubiquitous expression, promoters such as CMV, CAG, CBh, PGK, SV40, and Ferritin heavy or light chains can be used. For brain or other CNS expression, promoters such as Synapsin I for all neurons, CaMKIIalpha for excitatory neurons, and GAD67, GAD65, or VGAT for GABAergic neurons can be used. For liver expression, the Albumin promoter can be used. For lung expression, SP-B can be used. For endothelial cells, ICAM can be used. For hematopoietic cells, IFNbeta or CD45 can be used. For osteoblasts, OG-2 can be used.
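The promoter options in [0095] amount to a lookup from target cell type to candidate promoters. The Python sketch below encodes that list; the category labels, the `promoter_options` name, and the fallback to ubiquitous promoters for unlisted targets are assumptions made for illustration.

```python
from typing import List

# Promoter options transcribed from the paragraph above; the keys are illustrative labels.
EFFECTOR_PROMOTERS = {
    "ubiquitous": ["CMV", "CAG", "CBh", "PGK", "SV40", "Ferritin heavy chain", "Ferritin light chain"],
    "all neurons": ["Synapsin I"],
    "excitatory neurons": ["CaMKIIalpha"],
    "GABAergic neurons": ["GAD67", "GAD65", "VGAT"],
    "liver": ["Albumin"],
    "lung": ["SP-B"],
    "endothelial cells": ["ICAM"],
    "hematopoietic cells": ["IFNbeta", "CD45"],
    "osteoblasts": ["OG-2"],
}


def promoter_options(target: str, space_limited: bool = False) -> List[str]:
    """Candidate effector promoters; unlisted targets fall back to ubiquitous promoters (assumption)."""
    if space_limited:
        # The AAV ITR can itself act as a weak promoter, freeing vector space.
        return ["AAV ITR"]
    return list(EFFECTOR_PROMOTERS.get(target, EFFECTOR_PROMOTERS["ubiquitous"]))


print(promoter_options("liver"))
print(promoter_options("kidney"))
```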
[0096] The promoter used to drive guide RNA can include Pol III promoters such as U6 or H1. Alternatively, a Pol II promoter and intronic cassettes can be used to express guide RNA.
Adeno Associated Virus (AAV)
[0097] Nucleic acid-targeting effector protein and one or more guide RNAs can be delivered using adeno-associated virus (AAV), lentivirus, adenovirus, or other plasmid or viral vector types, in particular using formulations and doses from, for example, U.S. Pat. No. 8,454,972 (formulations, doses for adenovirus), U.S. Pat. No. 8,404,658 (formulations, doses for AAV), and U.S. Pat. No. 5,846,946 (formulations, doses for DNA plasmids), and from clinical trials and publications regarding clinical trials involving lentivirus, AAV, and adenovirus. For example, for AAV, the route of administration, formulation, and dose can be as in U.S. Pat. No. 8,404,658 and as in clinical trials involving AAV. For adenovirus, the route of administration, formulation, and dose can be as in U.S. Pat. No. 8,454,972 and as in clinical trials involving adenovirus. For plasmid delivery, the route of administration, formulation, and dose can be as in U.S. Pat. No. 5,846,946 and as in clinical studies involving plasmids. Doses may be based on or extrapolated to an average 70 kg individual (e.g., a male adult human) and can be adjusted for patients, subjects, or mammals of different weight and species. Frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), depending on usual factors including the age, sex, general health, and other conditions of the patient or subject and the particular condition or symptoms being addressed. The viral vectors can be injected into the tissue of interest. For cell-type-specific genome modification, the expression of the nucleic acid-targeting effector can be driven by a cell-type-specific promoter. For example, liver-specific expression might use the Albumin promoter, and neuron-specific expression (e.g., for targeting CNS disorders) might use the Synapsin I promoter.
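[0097] notes that doses may be based on an average 70 kg individual and adjusted for weight. A minimal Python sketch of that adjustment follows, assuming simple linear per-kilogram scaling. That rule is an assumption: the text does not specify how to adjust, and cross-species conversions often use body-surface-area scaling instead. The function name is hypothetical.

```python
REFERENCE_WEIGHT_KG = 70.0  # average adult reference weight named in the text


def weight_adjusted_dose(reference_dose: float, weight_kg: float,
                         reference_weight_kg: float = REFERENCE_WEIGHT_KG) -> float:
    """Scale a dose defined for the reference weight linearly to another body weight."""
    if weight_kg <= 0 or reference_weight_kg <= 0:
        raise ValueError("weights must be positive")
    return reference_dose * weight_kg / reference_weight_kg


# Example: a 1e13-genome reference dose scaled to a 35 kg subject.
print(f"{weight_adjusted_dose(1e13, 35):.1e}")
```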
[0098] In terms of in vivo delivery, AAV is advantageous over other viral vectors for two reasons: low toxicity (this may be due to the purification method not requiring ultracentrifugation of cell particles that can activate the immune response) and low probability of causing insertional mutagenesis, because it does not integrate into the host genome.
[0099] AAV has a packaging limit of 4.5 or 4.75 kb. This means that the nucleic acid-targeting effector protein coding sequence (such as one encoding a Type V protein such as C2c1 or C2c3), as well as a promoter and transcription terminator, must all fit into the same viral vector. Therefore, embodiments of the invention include utilizing shorter homologs of the nucleic acid-targeting effector protein (such as a Type V protein such as C2c1 or C2c3).
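Whether a design fits the single or double virus vector layouts in [0094] depends on the packaging limit stated in [0099]. The Python sketch below, with hypothetical function names, first checks whether a set of cassettes fits in one AAV and otherwise distributes them with a first-fit-decreasing heuristic. The 4,500 bp default is the conservative end of the stated 4.5 to 4.75 kb range, and the cassette sizes in the example are placeholders, not measured lengths of any real construct.

```python
from typing import Dict, List

AAV_LIMIT_BP = 4500  # conservative end of the 4.5-4.75 kb packaging limit


def fits_single_aav(cassettes: Dict[str, int], limit_bp: int = AAV_LIMIT_BP) -> bool:
    """True if all cassettes together fit within one vector."""
    return sum(cassettes.values()) <= limit_bp


def split_into_vectors(cassettes: Dict[str, int], limit_bp: int = AAV_LIMIT_BP) -> List[List[str]]:
    """First-fit decreasing: place each cassette, largest first, into the first vector with room."""
    vectors: List[List[str]] = []
    loads: List[int] = []
    for name, size in sorted(cassettes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit_bp:
            raise ValueError(f"{name} alone exceeds the packaging limit")
        for idx, load in enumerate(loads):
            if load + size <= limit_bp:
                vectors[idx].append(name)
                loads[idx] += size
                break
        else:  # no existing vector had room, so start a new one
            vectors.append([name])
            loads.append(size)
    return vectors


# Placeholder sizes in bp (promoter + coding sequence + terminator).
design = {"effector": 3200, "gRNA1": 400, "gRNA2": 400, "gRNA3": 400, "gRNA4": 400}
print(fits_single_aav(design))  # 4,800 bp in total
print(split_into_vectors(design))
```

First-fit packing places guide cassettes alongside the effector cassette whenever they fit, whereas the double virus vector layout in [0094] keeps the effector cassette alone in vector 1; that layout is recovered by packing the guide cassettes on their own.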
[00100] As to AAV, the AAV can be AAV1, AAV2, AAV5 or any combination thereof. One can select the AAV serotype with regard to the cells to be targeted; e.g., one can select AAV serotypes 1, 2, 5 or a hybrid capsid AAV1, AAV2, AAV5 or any combination thereof for targeting brain or neuronal cells; and one can select AAV4 for targeting cardiac tissue. AAV8 is useful for delivery to the liver. The herein promoters and vectors are preferred individually.
[00101] Packaging cells are typically used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and psi2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by producing a cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host, other viral sequences being replaced by an expression cassette for the polynucleotide(s) to be expressed. The missing viral functions are typically supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess ITR sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line may also be infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV. Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US20030087817, incorporated herein by reference.
[00102] Millington-Ward et al. (Molecular Therapy, vol. 19 no. 4, 642-649, April 2011) describe adeno-associated virus (AAV) vectors that deliver an RNA interference (RNAi)-based rhodopsin suppressor and a codon-modified rhodopsin replacement gene that resists suppression because of nucleotide alterations at degenerate positions over the RNAi target site. Millington-Ward et al. subretinally injected either 6.0 × 10^8 vp or 1.8 × 10^10 vp of AAV into the eyes. The AAV vectors of Millington-Ward et al. may be applied to the system of the present invention, contemplating a dose of about 2 × 10^11 to about 6 × 10^11 vp administered to a human.
[00103] Dalkara et al. (Sci Transl Med 5, 189ra76 (2013)) also relates to in vivo directed evolution to fashion an AAV vector that delivers wild-type versions of defective genes throughout the retina after noninjurious injection into the eyes' vitreous humor. Dalkara describes a 7-mer peptide display library and an AAV library constructed by DNA shuffling of cap genes from AAV1, 2, 4, 5, 6, 8, and 9. The rcAAV libraries and rAAV vectors expressing GFP under a CAG or Rho promoter were packaged, and deoxyribonuclease-resistant genomic titers were obtained through quantitative PCR. The libraries were pooled, and two rounds of evolution were performed, each consisting of initial library diversification followed by three in vivo selection steps. In each such step, P30 rho-GFP mice were intravitreally injected with 2 ml of iodixanol-purified, phosphate-buffered saline (PBS)-dialyzed library with a genomic titer of about 1 × 10^12 vg/ml. The AAV vectors of Dalkara et al. may be applied to the nucleic acid-targeting system of the present invention, contemplating a dose of about 1 × 10^15 to about 1 × 10^16 vg/ml administered to a human.
[00104] The nucleic acid sequence encoding the Cas12f protein and/or the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be provided to a cell on the same vector (e.g., in cis) as the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence. In such embodiments, a unidirectional promoter can be used to control expression of each nucleic acid sequence. In another embodiment, a combination of bidirectional and unidirectional promoters can be used to control expression of multiple nucleic acid sequences.
[00105] In other embodiments, the nucleic acid sequence encoding the Cas12f protein, the nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein, and the nucleic acid molecule comprising the guide RNA sequence and/or the RNA aptamer sequence can be provided to a cell on separate vectors (e.g., in trans). Each of the nucleic acid sequences in each of the separate vectors can comprise the same or different expression control sequences. The separate vectors can be provided to cells simultaneously or sequentially.
[00106] The vector(s) comprising the nucleic acid sequences encoding the Cas12f protein and encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein can be introduced into a host cell that is capable of expressing the polypeptides encoded thereby, including any suitable prokaryotic or eukaryotic cell. As such, the invention provides an isolated cell comprising the vector or nucleic acid sequences disclosed herein. Preferred host cells are those that can be easily and reliably grown, have reasonably fast growth rates, have well-characterized expression systems, and can be transformed or transfected easily and efficiently. Examples of suitable prokaryotic cells include, but are not limited to, cells from the genera Bacillus (such as Bacillus subtilis and Bacillus brevis), Escherichia (such as E. coli), Pseudomonas, Streptomyces, Salmonella, and Erwinia. Suitable eukaryotic cells are known in the art and include, for example, yeast cells, insect cells, and mammalian cells. Examples of suitable yeast cells include those from the genera Kluyveromyces, Pichia, Rhinosporidium, Saccharomyces, and Schizosaccharomyces. Exemplary insect cells include Sf-9 and HI5 (Invitrogen, Carlsbad, Calif.) and are described in, for example, Kitts et al., Biotechniques, 14: 810-817 (1993); Luckow, Curr. Opin. Biotechnol., 4: 564-572 (1993); and Luckow et al., J. Virol., 67: 4566-4579 (1993), incorporated herein by reference. Desirably, the host cell is a mammalian cell, and in some embodiments, the host cell is a human cell. A number of suitable mammalian and human host cells are known in the art, and many are available from the American Type Culture Collection (ATCC, Manassas, Va.). Examples of suitable mammalian cells include, but are not limited to, Chinese hamster ovary (CHO) cells (ATCC No. CCL61), CHO DHFR- cells (Urlaub et al., Proc. Natl. Acad. Sci. USA, 77: 4216-4220 (1980)), human embryonic kidney (HEK) 293 or 293T cells (ATCC No. CRL1573), and 3T3 cells (ATCC No. CCL92). Other suitable mammalian cell lines are the monkey COS-1 (ATCC No. CRL1650) and COS-7 (ATCC No. CRL1651) cell lines, as well as the CV-1 cell line (ATCC No. CCL70). Further exemplary mammalian host cells include primate, rodent, and human cell lines, including transformed cell lines. Normal diploid cells, cell strains derived from in vitro culture of primary tissue, and primary explants are also suitable. Other suitable mammalian cell lines include, but are not limited to, mouse neuroblastoma N2A cells, HeLa, HEK, A549, HepG2, mouse L-929 cells, and BHK or HaK hamster cell lines. Methods for selecting suitable mammalian host cells and methods for transformation, culture, amplification, screening, and purification of cells are known in the art.
[00107] The invention also provides a method of altering a target DNA. In some embodiments, the method alters a genomic DNA sequence in a cell, although any desired nucleic acid may be modified. When applied to DNA contained in cells, the method comprises introducing the systems, compositions, or vectors described herein into a cell comprising a target genomic DNA sequence. Descriptions of the nucleic acid molecule comprising a guide RNA sequence, the Cas12f proteins, the microbial recombination proteins, the recruitment systems, and polynucleotides encoding them, the cell, the target genomic DNA sequence, and components thereof, set forth above in connection with the inventive system, are also applicable to the method of altering a target genomic DNA sequence in a cell. The systems, compositions, or vectors may be introduced in any manner known in the art including, but not limited to, chemical transfection, electroporation, microinjection, biolistic delivery via gene guns, or magnetic-assisted transfection, depending on the cell type.
[00108] Upon introduction of the systems described herein into a cell comprising a target genomic DNA sequence, the guide RNA sequence binds to the target genomic DNA sequence in the cell genome, the Cas12f protein associates with the guide RNA and may induce a double-strand break or single-strand nick in the target genomic DNA sequence, and the aptamer recruits the microbial recombination proteins to the target genomic DNA sequence through the aptamer binding protein of the fusion protein, thereby altering the target genomic DNA sequence in the cell. When the compositions or vectors described herein are introduced into the cell, the nucleic acid molecule comprising a guide RNA sequence, the Cas12f protein, and the fusion protein are first expressed in the cell.
[00109] In some embodiments, the cell is in an organism or host, such that introducing the disclosed systems, compositions, or vectors into the cell comprises administration to a subject. The method may comprise providing or administering the systems, compositions, or vectors of the present invention to the subject in vivo, or by transplantation of ex vivo treated cells.
[00110] A “subject” may be human or non-human and may include, for example, animal strains or species used as “model systems” for research purposes, such as a mouse model as described herein. Likewise, a subject may be either an adult or a juvenile (e.g., a child). Moreover, subject may mean any living organism, preferably a mammal (e.g., human or non-human), that may benefit from the administration of compositions contemplated herein. Examples of mammals include, but are not limited to, any member of the class Mammalia: humans; non-human primates such as chimpanzees and other apes and monkey species; farm animals such as cattle, horses, sheep, goats, and swine; domestic animals such as rabbits, dogs, and cats; and laboratory animals including rodents, such as rats, mice, and guinea pigs, and the like. Examples of non-mammals include, but are not limited to, birds, fish, and the like. In one embodiment of the methods and compositions provided herein, the mammal is a human. Plants include without limitation sugar cane, corn, wheat, rice, oil palm fruit, potatoes, soybeans, vegetables, cassava, sugar beets, tomatoes, barley, bananas, watermelon, onions, sweet potatoes, cucumbers, apples, seed cotton, oranges, and the like.
[00111] As used herein, the terms “providing,” “administering,” and “introducing” are used interchangeably and refer to the placement of the systems of the invention into a subject by a method or route that results in at least partial localization of the system to a desired site. The systems can be administered by any appropriate route that results in delivery to a desired location in the subject.
[00112] The phrase “altering a DNA sequence,” as used herein, refers to modifying at least one physical feature of a DNA sequence of interest. DNA alterations include, for example, single or double strand DNA breaks, deletion, or insertion of one or more nucleotides, and other modifications that affect the structural integrity or nucleotide sequence of the DNA sequence. The modifications of a target sequence in genomic DNA may lead to, for example, gene correction, gene replacement, gene tagging, transgene insertion, nucleotide deletion, gene disruption, gene mutation, gene knock-down, and the like.
[00113] In some embodiments, the systems and methods described herein may be used to correct one or more defects or mutations in a gene (referred to as “gene correction”). In such cases, the target genomic DNA sequence encodes a defective version of a gene, and the system further comprises a donor nucleic acid molecule that encodes a wild-type or corrected version of the gene. In other words, the target genomic DNA sequence is a “disease-associated” gene. The term “disease-associated gene” refers to any gene or polynucleotide whose gene products are expressed at an abnormal level or in an abnormal form in cells obtained from a disease-affected individual as compared with tissues or cells obtained from an individual not affected by the disease. A disease-associated gene may be expressed at an abnormally high level or at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene whose mutation or genetic variation is directly responsible for, or is in linkage disequilibrium with a gene(s) responsible for, the etiology of a disease. Examples of genes responsible for such “single gene” or “monogenic” diseases include, but are not limited to, adenosine deaminase, α-1 antitrypsin, cystic fibrosis transmembrane conductance regulator (CFTR), β-hemoglobin (HBB), oculocutaneous albinism II (OCA2), huntingtin (HTT), dystrophia myotonica-protein kinase (DMPK), low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), neurofibromin 1 (NF1), polycystic kidney disease 1 (PKD1), polycystic kidney disease 2 (PKD2), coagulation factor VIII (F8), dystrophin (DMD), phosphate-regulating endopeptidase homologue, X-linked (PHEX), methyl-CpG-binding protein 2 (MECP2), and ubiquitin-specific peptidase 9Y, Y-linked (USP9Y). Other single gene or monogenic diseases are known in the art and described in, e.g., Chial, H. Rare Genetic Disorders: Learning About Genetic Disease Through Gene Mapping, SNPs, and Microarray Data, Nature Education 1(1): 192 (2008), incorporated herein by reference; Online Mendelian Inheritance in Man (OMIM); and the Human Gene Mutation Database (HGMD).
[00114] The invention provides knock-ins of large transgenes at therapeutically relevant loci in the human genome. In certain embodiments, the locus provides cell- or tissue-specific expression. In certain embodiments, the invention comprises insertion of nucleic acids into the albumin (ALB) locus. The ALB locus is highly expressed in human hepatocytes in a liver-specific manner and thus provides for liver targeting. In certain embodiments, the invention comprises insertion of nucleic acids into the AAVS1 locus. The AAVS1 locus is a safe-harbor locus for gene therapy that is well expressed in certain tissue types, with low expression in liver, and can be used in a wide variety of treatments. US Patent Publication 2018/0214490 A1 describes gene therapy for lysosomal storage diseases, including targeting transgenes to “safe harbor” loci such as the AAVS1, HPRT, and CCR5 genes in human cells and Rosa26 in murine cells. US Patent 9267154 describes integration of exogenous nucleic acid sequences into the PPP1R12C locus, which is widely expressed in most tissues, and describes cell-specific expression by targeting transgenes (e.g., encoding chimeric antigen receptors (CARs)) to the T-cell receptor α constant (TRAC) locus. These loci are exemplary and nonlimiting as to loci that can be targeted according to the invention.
[00115] In another embodiment, the target genomic DNA sequence can comprise a gene, the mutation of which contributes to a particular disease in combination with mutations in other genes. Diseases caused by the contribution of multiple genes which lack simple (e.g., Mendelian) inheritance patterns are referred to in the art as “multifactorial” or “polygenic” diseases. Examples of multifactorial or polygenic diseases include, but are not limited to, asthma, diabetes, epilepsy, hypertension, bipolar disorder, and schizophrenia. Certain developmental abnormalities also can be inherited in a multifactorial or polygenic pattern and include, for example, cleft lip/palate, congenital heart defects, and neural tube defects.
[00116] In another embodiment, the method of altering a target genomic DNA sequence can be used to delete nucleic acids from a target sequence in a cell by cleaving the target sequence and allowing the cell to repair the cleaved sequence in the absence of an exogenously provided donor nucleic acid molecule. Deletion of a nucleic acid sequence in this manner can be used in a variety of applications, such as, for example, to remove disease-causing trinucleotide repeat sequences in neurons, to create gene knock-outs or knock-downs, and to generate mutations for disease models in research.
[00117] The term “donor nucleic acid molecule” refers to a nucleotide sequence that is inserted into the target DNA (e.g., genomic DNA). As described above, the donor DNA may include, for example, a gene or part of a gene, a sequence encoding a tag or localization sequence, or a regulatory element. The donor nucleic acid molecule may be of any length. In some embodiments, the donor nucleic acid molecule is between 10 and 10,000 nucleotides in length, for example, between about 100 and 5,000 nucleotides, between about 200 and 2,000 nucleotides, between about 500 and 1,000 nucleotides, between about 500 and 5,000 nucleotides, between about 1,000 and 5,000 nucleotides, or between about 1,000 and 10,000 nucleotides in length.
[00118] The disclosed systems and methods overcome challenges encountered during conventional gene editing, including low efficiency and off-target events, particularly with kilobase-scale nucleic acids. In some embodiments, the disclosed systems and methods improve the efficiency of gene editing. For example, the disclosed systems and methods can provide a 2- to 10-fold increase in efficiency over conventional CRISPR-Cas9 systems and methods, as shown in Examples 2, 3, and 5. In some embodiments, the improvement in efficiency is accompanied by a reduction in off-target events. Off-target events may be reduced by more than 50% compared to conventional CRISPR-Cas9 systems and methods; for example, Example 3 shows a reduction of off-target events of about 90%. Another aspect of increasing the overall accuracy of a gene editing system is reducing on-target insertions and deletions (indels), a byproduct of HDR-based editing. In some embodiments, the disclosed systems and methods reduce on-target indels by more than 90% compared to conventional CRISPR-Cas9 systems and methods, as shown in Example 3.
[00119] The invention further provides kits containing one or more reagents or other components useful, necessary, or sufficient for practicing any of the methods described herein. For example, kits may include CRISPR reagents (Cas12f protein, guide RNA, vectors, compositions, etc.), recombineering reagents (recombination protein-aptamer binding protein fusion protein, the aptamer sequence, vectors, compositions, etc.), transfection or administration reagents, negative and positive control samples (e.g., cells, template DNA), cells, containers housing one or more components (e.g., microcentrifuge tubes, boxes), detectable labels, detection and analysis instruments, software, instructions, and the like.
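The efficiency and accuracy figures in [00118] are simple ratios against a conventional CRISPR-Cas9 baseline: a fold change for editing efficiency and a percent reduction for unwanted outcomes. The sketch below shows that arithmetic only; the function names and the input values are hypothetical and are not data from Examples 2, 3, or 5.

```python
def fold_change(new_efficiency_pct: float, baseline_efficiency_pct: float) -> float:
    """Ratio of editing efficiency (new system) to a baseline such as Cas9 HDR."""
    if baseline_efficiency_pct <= 0:
        raise ValueError("baseline efficiency must be positive")
    return new_efficiency_pct / baseline_efficiency_pct


def percent_reduction(new_rate: float, baseline_rate: float) -> float:
    """Percent decrease of an unwanted outcome (off-target edits, on-target indels)."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return 100.0 * (baseline_rate - new_rate) / baseline_rate


# Hypothetical values for illustration only.
print(fold_change(12.0, 3.0))        # 4.0-fold improvement
print(percent_reduction(0.5, 5.0))   # ~90 % fewer unwanted events
```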
[00120] The RNAs may be delivered using adeno-associated virus (AAV), lentivirus, adenovirus, or other viral vector types, or combinations thereof. The RNAs can be packaged into one or more viral vectors. In some embodiments, the viral vector is delivered to the tissue of interest by, for example, intramuscular injection; in other embodiments, viral delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector chosen, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
[00121] In an embodiment herein, the delivery is via an adenovirus, which may be administered as a single booster dose containing at least 1 × 10^5 particles (also referred to as particle units, pu) of adenoviral vector. In an embodiment herein, the dose preferably is at least about 1 × 10^6 particles (for example, about 1 × 10^6 to 1 × 10^12 particles), more preferably at least about 1 × 10^10 particles, more preferably at least about 1 × 10^8 particles (e.g., about 1 × 10^8 to 1 × 10^11 particles or about 1 × 10^8 to 1 × 10^12 particles), and most preferably at least about 1 × 10^9 particles (e.g., about 1 × 10^9 to 1 × 10^10 particles or about 1 × 10^9 to 1 × 10^12 particles), or even at least about 1 × 10^10 particles (e.g., about 1 × 10^10 to 1 × 10^12 particles) of the adenoviral vector. Alternatively, the dose comprises no more than about 1 × 10^14 particles, preferably no more than about 1 × 10^13 particles, even more preferably no more than about 1 × 10^12 particles, even more preferably no more than about 1 × 10^11 particles, and most preferably no more than about 1 × 10^10 particles (e.g., no more than about 1 × 10^9 particles). Thus, the dose may contain a single dose of adenoviral vector with, for example, about 1 × 10^6 particle units (pu), about 2 × 10^6 pu, about 4 × 10^6 pu, about 1 × 10^7 pu, about 2 × 10^7 pu, about 4 × 10^7 pu, about 1 × 10^8 pu, about 2 × 10^8 pu, about 4 × 10^8 pu, about 1 × 10^9 pu, about 2 × 10^9 pu, about 4 × 10^9 pu, about 1 × 10^10 pu, about 2 × 10^10 pu, about 4 × 10^10 pu, about 1 × 10^11 pu, about 2 × 10^11 pu, about 4 × 10^11 pu, about 1 × 10^12 pu, about 2 × 10^12 pu, or about 4 × 10^12 pu of adenoviral vector. See, for example, the adenoviral vectors in U.S. Pat. No. 8,454,972 B2 to Nabel et al., granted on Jun. 4, 2013, incorporated by reference herein, and the dosages at col. 29, lines 36-58 thereof. In an embodiment herein, the adenovirus is delivered via multiple doses.
[00122] In an embodiment herein, the delivery is via an AAV. A therapeutically effective dosage for in vivo delivery of the AAV to a human is believed to be in the range of from about 20 to about 50 ml of saline solution containing from about 1 × 10^10 to about 1 × 10^10 functional AAV/ml solution. The dosage may be adjusted to balance the therapeutic benefit against any side effects. In an embodiment herein, the AAV dose is generally in the range of from about 1 × 10^5 to 1 × 10^50 genomes AAV, from about 1 × 10^8 to 1 × 10^20 genomes AAV, from about 1 × 10^10 to about 1 × 10^16 genomes, or from about 1 × 10^11 to about 1 × 10^16 genomes AAV. A human dosage may be about 1 × 10^13 genomes AAV. Such doses may be delivered in from about 0.001 ml to about 100 ml, about 0.05 to about 50 ml, or about 10 to about 25 ml of a carrier solution. Other effective dosages can be readily established by one of ordinary skill in the art through routine trials establishing dose-response curves. See, for example, U.S. Pat. No. 8,404,658 B2 to Hajjar et al., granted on Mar. 26, 2013, at col. 27, lines 45-60.
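The AAV dose in [00122] is a total number of vector genomes delivered in a carrier volume, so the concentration required is the dose divided by the volume. For example, about 1 × 10^13 genomes in 20 ml of carrier corresponds to 5 × 10^11 genomes/ml. A minimal sketch of this conversion follows; the helper names are hypothetical.

```python
def genomes_per_ml(total_genomes_dose: float, volume_ml: float) -> float:
    """AAV genomes per ml needed to deliver a total dose in a given carrier volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return total_genomes_dose / volume_ml


def total_genomes(conc_per_ml: float, volume_ml: float) -> float:
    """Total AAV genomes delivered by a volume at a given concentration."""
    return conc_per_ml * volume_ml


HUMAN_DOSE = 1e13  # "about 1 x 10^13 genomes AAV" from [00122]
for volume in (10.0, 20.0, 25.0):  # within the "about 10 to about 25 ml" carrier range
    print(f"{volume:g} ml -> {genomes_per_ml(HUMAN_DOSE, volume):.1e} genomes/ml")
```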
[00123] In an embodiment herein, the delivery is via a plasmid. In such plasmid compositions, the dosage should be an amount of plasmid sufficient to elicit a response. For instance, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg, or from about 1 µg to about 10 µg.
[00124] The doses herein are based on an average 70 kg individual. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or scientist skilled in the art. Mice used in experiments are about 20 g. From that which is administered to a 20 g mouse, one can extrapolate to a 70 kg individual.
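[00124] says that a dose given to a ~20 g mouse can be extrapolated to a 70 kg individual but does not state the rule. The simplest reading, linear scaling by body mass, multiplies the total mouse dose by 70 / 0.020 = 3,500. The sketch below implements that assumption only; body-surface-area (allometric) conversion is a common alternative and typically gives a considerably smaller factor. The function name and mouse dose are hypothetical.

```python
MOUSE_KG = 0.020   # "Mice used in experiments are about 20 g" ([00124])
HUMAN_KG = 70.0    # "an average 70 kg individual" ([00124])


def extrapolate_by_mass(mouse_dose: float, mouse_kg: float = MOUSE_KG,
                        human_kg: float = HUMAN_KG) -> float:
    """Scale a total per-animal dose linearly with body mass (assumed rule)."""
    if mouse_kg <= 0 or human_kg <= 0:
        raise ValueError("body masses must be positive")
    return mouse_dose * (human_kg / mouse_kg)


# Hypothetical mouse dose, for illustration only.
mouse_vg = 1e10
print(f"scale factor: {HUMAN_KG / MOUSE_KG:.0f}")
print(f"human-equivalent dose: {extrapolate_by_mass(mouse_vg):.2e} vg")
```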
[00125] Lentiviruses are complex retroviruses that have the ability to infect and express their genes in both mitotic and post-mitotic cells. The most commonly known lentivirus is the human immunodeficiency virus (HIV); lentiviral vectors derived from it can be pseudotyped with the envelope glycoproteins of other viruses to target a broad range of cell types.
[00126] Lentiviruses may be prepared as follows. After cloning pCasES10 (which contains a lentiviral transfer plasmid backbone), low-passage (p=5) HEK293FT cells were seeded in a T-75 flask to 50% confluence the day before transfection in DMEM with 10% fetal bovine serum and without antibiotics. After 20 hours, the medium was changed to OptiMEM (serum-free) medium, and transfection was performed 4 hours later. Cells were transfected with 10 µg of lentiviral transfer plasmid (pCasES10) and the following packaging plasmids: 5 µg of pMD2.G (VSV-G pseudotype) and 7.5 µg of psPAX2 (gag/pol/rev/tat). Transfection was performed in 4 mL OptiMEM with a cationic lipid delivery agent (50 µL Lipofectamine 2000 and 100 µL Plus reagent). After 6 hours, the medium was changed to antibiotic-free DMEM with 10% fetal bovine serum.
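When the lentivirus preparation in [00126] is run in a vessel other than a T-75 flask, one common approach is to scale every component in proportion to growth area while keeping the 10 : 5 : 7.5 µg ratio of transfer, envelope, and packaging plasmids. The sketch below assumes linear scaling and a nominal 75 cm² T-75 growth area; neither is stated in the source, lipid-to-DNA ratios may need re-optimization in practice, and the names are hypothetical.

```python
from typing import Dict

# Per-flask amounts for one T-75 transfection, as listed in [00126].
T75_RECIPE: Dict[str, float] = {
    "pCasES10 transfer plasmid (ug)": 10.0,
    "pMD2.G (ug)": 5.0,
    "psPAX2 (ug)": 7.5,
    "OptiMEM (mL)": 4.0,
    "Lipofectamine 2000 (uL)": 50.0,
    "Plus reagent (uL)": 100.0,
}
T75_AREA_CM2 = 75.0  # assumed nominal growth area of a T-75 flask


def scale_recipe(recipe: Dict[str, float], target_area_cm2: float,
                 source_area_cm2: float = T75_AREA_CM2) -> Dict[str, float]:
    """Scale every component linearly with growth area (assumption, not a validated rule)."""
    if target_area_cm2 <= 0 or source_area_cm2 <= 0:
        raise ValueError("growth areas must be positive")
    factor = target_area_cm2 / source_area_cm2
    return {component: amount * factor for component, amount in recipe.items()}


# Example: a vessel with twice the T-75 growth area (150 cm^2).
for component, amount in scale_recipe(T75_RECIPE, 150.0).items():
    print(f"{component}: {amount:g}")
```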
[00127] Lentivirus may be purified as follows. Viral supernatants were harvested after 48 hours. Supernatants were first cleared of debris and filtered through a 0.45 µm low-protein-binding (PVDF) filter. They were then spun in an ultracentrifuge for 2 hours at 24,000 rpm. Viral pellets were resuspended in 50 µL of DMEM overnight at 4 °C. They were then aliquoted and immediately frozen at -80 °C.
[00128] In another embodiment, minimal non-primate lentiviral vectors based on the equine infectious anemia virus (EIAV) are also contemplated, especially for ocular gene therapy (see, e.g., Balaggan, J Gene Med 2006; 8: 275-285, published online 21 Nov. 2005 in Wiley InterScience, available at the website interscience.wiley.com, DOI: 10.1002/jgm.845). In another embodiment, RetinoStat®, an equine infectious anemia virus-based lentiviral gene therapy vector that expresses the angiostatic proteins endostatin and angiostatin and is delivered via subretinal injection for the treatment of the wet form of age-related macular degeneration, is also contemplated and may be modified for the system of the present invention (see, e.g., Binley et al., Human Gene Therapy 23: 980-991 (September 2012)).
[00129] Lentiviral vectors have been disclosed for the treatment of Parkinson's disease; see, e.g., US Patent Publication No. US20120295960 and U.S. Pat. Nos. 7,303,910 and 7,351,585. Lentiviral vectors have also been disclosed for the treatment of ocular diseases; see, e.g., US Patent Publication Nos. US20060281180, US20090007284, US20110117189, US20090017543, US20070054961, and US20100317109. Lentiviral vectors have also been disclosed for delivery to the brain; see, e.g., US Patent Publication Nos. US20110293571, US20040013648, US20070025970, and US20090111106 and U.S. Pat. No. 7,259,015.
[00130] Several types of particle delivery systems and/or formulations are known to be useful in a diverse spectrum of biomedical applications. In general, a particle is defined as a small object that behaves as a whole unit with respect to its transport and properties. Particles are further classified according to diameter. Coarse particles cover a range between 2,500 and 10,000 nanometers. Fine particles are sized between 100 and 2,500 nanometers. Ultrafine particles, or nanoparticles, are generally between 1 and 100 nanometers in size. The basis of the 100-nm limit is the fact that novel properties that differentiate particles from the bulk material typically develop at a critical length scale of under 100 nm.
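The size classes in [00130] amount to a lookup on particle diameter. In the sketch below, the handling of the shared endpoints (100 nm and 2,500 nm, each assigned to the larger class) is an assumption, because the stated ranges overlap at those values; the function name is hypothetical.

```python
def classify_particle(diameter_nm: float) -> str:
    """Size class per [00130]; boundary handling is an assumption."""
    if diameter_nm < 1 or diameter_nm > 10_000:
        return "outside stated ranges"
    if diameter_nm < 100:
        return "ultrafine (nanoparticle)"
    if diameter_nm < 2_500:
        return "fine"
    return "coarse"


for d in (50, 100, 800, 2_500, 12_000):
    print(d, "nm ->", classify_particle(d))
```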
[00131] Any element of any suitable CRISPR/Cas gene editing system known in the art can be employed in the systems and methods described herein, as appropriate. CRISPR/Cas gene editing technology is described in detail in, for example, U.S. Patent Nos. 8,546,553; 8,697,359; 8,771,945; 8,795,965; 8,865,406; 8,871,445; 8,889,356; 8,889,418; 8,895,308; 8,906,616; 8,932,814; 8,945,839; 8,993,233; 8,999,641; 9,115,348; 9,149,049; 9,493,844; 9,567,603; 9,637,739; 9,663,782; 9,404,098; 9,885,026; 9,951,342; 10,087,431; 10,227,610; 10,266,850; 10,601,748; 10,604,771; and 10,760,064; and U.S. Patent Application Publication Nos. US2010/0076057; US2014/0113376; US2015/0050699; US2015/0031134; US2014/0357530; US2014/0349400; US2014/0315985; US2014/0310830; US2014/0310828; US2014/0309487; US2014/0294773; US2014/0287938; US2014/0273230; US2014/0242699; US2014/0242664; US2014/0212869; US2014/0201857; US2014/0199767; US2014/0189896; US2014/0186919; US2014/0186843; and US2014/0179770, each incorporated herein by reference.
[00132] Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
[00133] Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined in the appended claims.
[00134] The present invention will be further illustrated in the following Examples which are given for illustration purposes only and are not intended to limit the invention in any way.
Examples
Example 1: Materials and Methods
[00135] RecE/T Homolog Screening. The RefSeq non-redundant protein database was downloaded from NCBI on October 29, 2019. The database was searched with E. coli Rac prophage RecT (NP_415865.1) and RecE (NP_415866.1) as queries using position-specific iterated BLAST (PSI-BLAST)¹ to retrieve protein homologs. Hits were clustered with CD-HIT², and representative sequences were selected from each cluster for multiple alignment with MUSCLE³. FastTree⁴ was then used for maximum-likelihood tree reconstruction with default parameters. A diverse set of RecE/RecT homologs was selected, synthesized by GenScript, and cloned into pMPH MCP vectors for testing.
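The screening steps in [00135] map onto standard command-line tools. The sketch below only assembles and runs those commands; it assumes BLAST+ `psiblast`, `cd-hit`, MUSCLE v3.8, and `FastTree` are installed, uses `refseq_protein` as a hypothetical local database name, and uses placeholder iteration, E-value, and identity thresholds because the Example does not state them. Retrieving full-length hit sequences from the search output (for example with BLAST+ `blastdbcmd`) is left out, and all function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Thresholds below are placeholders; [00135] does not state them.


def psiblast_cmd(query_fasta: str, out_tsv: str, db: str = "refseq_protein",
                 iterations: int = 3, evalue: float = 1e-5) -> List[str]:
    """Iterated profile search of a protein database with one query."""
    return ["psiblast", "-query", query_fasta, "-db", db,
            "-num_iterations", str(iterations), "-evalue", str(evalue),
            "-outfmt", "6", "-out", out_tsv]


def cdhit_cmd(hits_fasta: str, reps_fasta: str, identity: float = 0.9) -> List[str]:
    """Cluster hits; the output FASTA holds one representative per cluster."""
    return ["cd-hit", "-i", hits_fasta, "-o", reps_fasta, "-c", str(identity)]


def muscle_cmd(reps_fasta: str, aln_fasta: str) -> List[str]:
    """Multiple alignment (MUSCLE v3.8 syntax; MUSCLE v5 uses -align/-output)."""
    return ["muscle", "-in", reps_fasta, "-out", aln_fasta]


def fasttree_cmd(aln_fasta: str) -> List[str]:
    """Maximum-likelihood tree with default parameters; Newick goes to stdout."""
    return ["FastTree", aln_fasta]


def run_tree_steps(hits_fasta: str, workdir: str) -> Path:
    """Cluster, align, and build a tree from a FASTA of PSI-BLAST hit sequences."""
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    reps, aln, tree = out / "reps.fasta", out / "aln.fasta", out / "tree.nwk"
    subprocess.run(cdhit_cmd(hits_fasta, str(reps)), check=True)
    subprocess.run(muscle_cmd(str(reps), str(aln)), check=True)
    with open(tree, "w") as handle:
        subprocess.run(fasttree_cmd(str(aln)), check=True, stdout=handle)
    return tree


if __name__ == "__main__":
    # The two E. coli Rac prophage queries named in [00135].
    for name in ("NP_415865.1", "NP_415866.1"):
        print(" ".join(psiblast_cmd(f"{name}.fasta", f"{name}_hits.tsv")))
```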
[00136] Plasmid construction. pX330, pMPH, and pU6-(BbsI)_CBh-Cas9-T2A-BFP plasmids were obtained from Addgene. Tested effector DNA fragments were ordered from IDT, Genewiz, and GenScript. The fragments were Gibson-assembled into the backbones using NEBuilder HiFi DNA Assembly Master Mix (New England BioLabs). All sgRNAs (Table 1) were inserted into backbones using Golden Gate cloning. All constructs were sequence-verified by Sanger sequencing of prepped plasmids.
[Table 1: sgRNA sequences (image not reproduced)]
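[00136] states only that sgRNAs were inserted by Golden Gate cloning. For pX330-style U6 cassettes opened with BbsI, spacers are commonly cloned as annealed oligos carrying CACC and AAAC overhangs, with an extra G added for U6 transcription when the spacer does not already start with G. The sketch below assumes that convention; the overhangs used for the Cas12f guide backbones are not given in the source and may differ, and the function names and spacer are hypothetical.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def bbsi_sgrna_oligos(spacer: str) -> Tuple[str, str]:
    """Top and bottom oligos (5'->3') for ligation into a BbsI-opened U6 vector.

    Assumes pX330-style overhangs: top = CACC + [G] + spacer and
    bottom = AAAC + revcomp(spacer) + [C]. The optional G/C pair supplies a
    5' G for U6 transcription when the spacer does not start with G.
    """
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty A/C/G/T sequence")
    extra_g = "" if spacer.startswith("G") else "G"
    top = "CACC" + extra_g + spacer
    bottom = "AAAC" + reverse_complement(spacer) + ("C" if extra_g else "")
    return top, bottom


# Hypothetical 20-nt spacer, for illustration only.
top_oligo, bottom_oligo = bbsi_sgrna_oligos("ACGTTGCAACGTTGCAACGT")
print(top_oligo)
print(bottom_oligo)
```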
[00137] Cell culture. Human embryonic kidney (HEK) 293T, HeLa, and HepG2 cells were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM, Life Technologies) with 10% fetal bovine serum (FBS, HyClone), 100 U/mL penicillin, and 100 µg/mL streptomycin (Life Technologies) at 37 °C with 5% CO2.
[00138] hES-H9 cells were maintained in mTeSR1 medium (StemCell Technologies) at 37 °C with 5% CO2. Culture plates were pre-coated with Matrigel (Corning) 12 hours prior to use, and cells were supplemented with 10 µM Y-27632 (Sigma) for the first 24 hours after passaging. Culture medium was changed every 24 hours.
[00139] Transfection. HEK293T cells were seeded into 96-well plates (Corning) 12-24 hours prior to transfection at a density of 30,000 cells/well, and 250 ng of total DNA was transfected per well. HeLa and HepG2 cells were seeded into 48-well plates (Corning) one day prior to transfection at densities of 50,000 and 30,000 cells/well, respectively, and 400 ng of total DNA was transfected per well. Transfections were performed with Lipofectamine 3000 (Life Technologies) following the manufacturer’s instructions.
[00140] Electroporation. For hES-H9 transfection experiments, the P3 Primary Cell 4D-Nucleofector™ X Kit S (Lonza) was used following the manufacturer’s protocol. For each reaction, 300,000 cells were nucleofected with 4 µg of total DNA using the DC100 Nucleofector program.
[00141] Fluorescence-activated cell sorting (FACS). mKate knock-in efficiency was analyzed on a CytoFLEX flow cytometer (Beckman Coulter; Stanford Stem Cell FACS Core). At 72 hours after transfection, cells were washed once with PBS and dissociated with TrypLE Express Enzyme (Thermo Fisher Scientific). The cell suspension was then transferred to a 96-well U-bottom plate (Thermo Fisher Scientific) and centrifuged at 300 × g for 5 minutes. After removal of the supernatant, pelleted cells were resuspended in 50 µL of 4% FBS in PBS, and cells were sorted within 30 minutes of preparation.
[00142] RFLP. HEK293T cells were transfected with plasmid DNA and PCR templates and harvested after 72 hours for genomic DNA extraction using QuickExtract DNA Extraction Solution (Biosearch Technologies) following the manufacturer’s protocol. The target genomic region was amplified using specific primers outside the homology arms of the PCR template. PCR products were purified with the Monarch PCR & DNA Cleanup Kit (New England BioLabs). A 300 ng aliquot of purified product was digested with BsrGI (EMX1; New England BioLabs) or XbaI (VEGFA; NEB), and the digested products were analyzed on a 5% Mini-PROTEAN TBE gel (Bio-Rad).
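The RFLP gel in [00142] can be quantified by densitometry. Assuming the knock-in introduces the BsrGI or XbaI site so that only edited amplicons are cut, the percentage of edited amplicons is the intensity of the cleaved bands divided by the total lane intensity. The sketch below shows that calculation; the band values are hypothetical, intensities are assumed to be background-subtracted and proportional to DNA mass, and the function name is hypothetical.

```python
from typing import Sequence


def rflp_knockin_percent(cleaved_bands: Sequence[float], uncleaved_band: float) -> float:
    """Percent of amplicons cut by the diagnostic enzyme.

    The cleaved fragments together span the full-length amplicon, so their
    summed (mass-proportional) intensity equals the mass of cut product, and
    the ratio below is also the molar fraction of cut molecules.
    """
    cleaved = sum(cleaved_bands)
    total = cleaved + uncleaved_band
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cleaved / total


# Hypothetical densitometry values (arbitrary units).
print(rflp_knockin_percent([30.0, 20.0], 150.0))  # 25.0
```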
[00143] Next-Generation Sequencing Library Preparation. At 72 hours after transfection, genomic DNA was extracted using QuickExtract DNA Extraction Solution (Biosearch Technologies). 200 ng of total DNA was used for NGS library preparation. Genes of interest were amplified using specific primers (Table 2) in a first-round PCR. Illumina adapters and index barcodes were added to the fragments in a second-round PCR using the primers listed in Table 2. Round 2 PCR products were purified by electrophoresis on a 2% agarose gel using the Monarch DNA Gel Extraction Kit (NEB). The purified product was quantified with the Qubit dsDNA HS Assay Kit (Thermo Fisher) and sequenced on an Illumina MiSeq according to the manufacturer’s instructions.
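Before loading, the Qubit mass concentration is usually converted to molarity using the amplicon length. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the 500-bp amplicon, the 4 nM loading target, and the 20 µl final volume in the example are illustrative assumptions, not values from the source.

```python
AVG_MASS_PER_BP = 660.0  # approximate g/mol per base pair of dsDNA


def library_nm(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration in ng/ul to molarity in nM."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * length_bp)


def dilution_volume_ul(stock_nm: float, target_nm: float, final_volume_ul: float) -> float:
    """Volume of stock needed for final_volume_ul at target_nm (C1*V1 = C2*V2)."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    return target_nm * final_volume_ul / stock_nm


stock = library_nm(3.3, 500)              # 3.3 ng/ul of a 500-bp amplicon, about 10 nM
vol = dilution_volume_ul(stock, 4.0, 20)  # about 8 ul of stock made up to 20 ul
```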
[00144] High-throughput Sequencing Data Analysis. Processed (demultiplexed, trimmed, and merged) sequencing reads were analyzed with CRISPResso2 (ref. 25) to determine editing outcomes by aligning sequenced amplicons to the reference and expected HDR amplicons. The quantification window was increased to 10 bp surrounding the expected cut site to better capture diverse editing outcomes, but substitutions were ignored to avoid counting sequencing errors. Only reads with no mismatches to the expected HDR amplicon were counted toward HDR quantification; reads containing indels that partially matched the expected amplicons were included in the overall reported indel frequency.
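CRISPResso2 performs full alignment and many additional checks; the simplified sketch below only illustrates the classification rules described above and is not its implementation. It assumes merged reads spanning the whole amplicon, uses Python's `difflib` in place of a proper aligner, treats the 10-bp window as extending on each side of the cut site (an assumption about how the window is defined), ignores equal-length substitutions, and counts a read as HDR only if it matches the expected HDR amplicon exactly. All sequences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def classify_read(read: str, reference: str, hdr_amplicon: str,
                  cut_site: int, window: int = 10) -> str:
    """Classify one merged read as 'HDR', 'indel' or 'unmodified' (simplified)."""
    if read == hdr_amplicon:
        return "HDR"  # only perfect matches to the expected amplicon count as HDR
    lo, hi = cut_site - window, cut_site + window
    matcher = SequenceMatcher(None, reference, read, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            continue  # same-length replacement = substitution(s): ignored
        if i1 <= hi and i2 >= lo:
            return "indel"  # insertion/deletion touching the quantification window
    return "unmodified"


def summarize(reads, reference, hdr_amplicon, cut_site, window=10):
    """Return the fraction of reads in each outcome class."""
    counts = Counter(classify_read(r, reference, hdr_amplicon, cut_site, window)
                     for r in reads)
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("HDR", "indel", "unmodified")}


ref = "AAAACCCCGGGGTTTTACGT"        # hypothetical reference amplicon, cut site at 10
hdr = "AAAACCCCGGTCTAGAGGTTTTACGT"  # hypothetical expected HDR amplicon
reads = [ref, hdr, "AAAACCCCGGTTTTACGT", "AAAACCCCGAGGTTTTACGT"]
print(summarize(reads, ref, hdr, cut_site=10, window=2))
# {'HDR': 0.25, 'indel': 0.25, 'unmodified': 0.5}
```

An imperfect HDR read carrying an insertion near the cut site differs from the reference by an indel in the window, so it falls into the indel class, consistent with the rule above.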
[00145] Statistical Analysis. Unless otherwise stated, all statistical analyses and comparisons were performed using t-tests, with a 1% false discovery rate (FDR) controlled by the two-stage step-up method of Benjamini, Krieger and Yekutieli (Benjamini, Y., et al., Biometrika 93, 491-507 (2006), incorporated herein by reference). All experiments were performed in triplicate unless otherwise noted to ensure sufficient statistical power in the analysis.
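The two-stage step-up procedure cited above can be written out directly. The sketch follows the 2006 definition: run Benjamini-Hochberg at q/(1+q), use the number of stage-1 rejections r1 to estimate the number of true nulls as m - r1, then rerun Benjamini-Hochberg at the adjusted level q/(1+q) · m/(m - r1). The example p-values are hypothetical. In practice, statsmodels' `multipletests` provides this procedure as `method='fdr_tsbky'`.

```python
def bh_reject(pvals: list[float], alpha: float) -> list[bool]:
    """Benjamini-Hochberg linear step-up: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= k * alpha / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def bky_two_stage(pvals: list[float], q: float = 0.01) -> list[bool]:
    """Two-stage step-up procedure of Benjamini, Krieger and Yekutieli (2006)."""
    m = len(pvals)
    q1 = q / (1 + q)
    r1 = sum(bh_reject(pvals, q1))  # stage 1
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    m0_hat = m - r1  # estimated number of true null hypotheses
    return bh_reject(pvals, q1 * m / m0_hat)  # stage 2


print(bky_two_stage([0.0001, 0.0004, 0.03, 0.5], q=0.01))
# [True, True, False, False]
```

Because the second stage raises the threshold when many hypotheses are rejected in the first, this procedure is more powerful than plain Benjamini-Hochberg at the same nominal FDR.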
[00146] Determination of editing at predicted Cas9 off-target sites. To evaluate RecT/RecE off-target editing activity at known Cas9 off-target sites, the same genomic DNA extracts used for knock-in analysis served as templates for PCR amplification of the top-scoring predicted off-target sites (as predicted by CRISPOR, a web-based analysis tool) for the EMX1 and VEGFA guides; primer sequences are listed in Table 2.
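CRISPOR ranks candidate sites with its own specificity scores, which the sketch below does not reproduce. To illustrate what a "predicted off-target site" is, it simply enumerates positions on both strands where a 20-nt protospacer followed by an NGG PAM (the SpCas9 PAM) differs from the guide by at most a given number of mismatches. Bulges, non-canonical PAMs, and scoring are ignored, and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_sites(genome: str, protospacer: str, max_mismatches: int = 3):
    """Return (strand, start, mismatches) for NGG-adjacent sites within max_mismatches.

    start is the leftmost forward-strand coordinate (0-based) of protospacer + PAM.
    """
    genome = genome.upper()
    protospacer = protospacer.upper()
    n, length = len(protospacer), len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(length - n - 2):
            if seq[i + n + 1 : i + n + 3] != "GG":  # PAM = N G G after the protospacer
                continue
            mismatches = sum(a != b for a, b in zip(seq[i : i + n], protospacer))
            if mismatches <= max_mismatches:
                start = i if strand == "+" else length - (i + n + 3)
                hits.append((strand, start, mismatches))
    return hits


guide = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt protospacer
print(find_candidate_sites("TTT" + guide + "TGG" + "TTT", guide))  # [('+', 3, 0)]
```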
[00147] iGUIDE Off-target Analysis. Genome-wide, unbiased off-target analysis was performed following the iGUIDE pipeline (Nobles, C.L., et al. Genome Biol 20, 14 (2019), incorporated herein by reference), which is based on the previously developed GUIDE-seq method (Tsai, S., et al. Nat Biotechnol 33, 187-197 (2015), incorporated herein by reference). HEK293T cells were transfected in 20 µL Lonza SF Cell Line Nucleofector Solution on a Lonza 4D-Nucleofector with program DS-150 according to the manufacturer’s instructions. 300 ng of gRNA-Cas9 plasmid (or 150 ng of each gRNA-Cas9n plasmid for the double nickase), 150 ng of the effector plasmids, and 5 pmol of double-stranded oligonucleotides (dsODN) were transfected. Cells were harvested after 72 hours, and genomic DNA was isolated using the Agencourt DNAdvance reagent kit. 400 ng of purified gDNA was then fragmented to an average of 500 bp and ligated with adaptors using the NEBNext Ultra II FS DNA Library Prep kit following the manufacturer’s instructions. Two rounds of nested anchored PCR from the oligo tag to the ligated adaptor sequence were performed to amplify targeted DNA, and the amplified library was purified, size-selected, and sequenced on an Illumina MiSeq (V2, PE300). Sequencing data were analyzed using the published iGUIDE pipeline, with an added downsampling step to ensure an unbiased comparison across samples.
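The source does not say how the downsampling step was implemented. A minimal sketch, assuming each sample is a list of read identifiers and that every library is randomly subsampled without replacement to the depth of the shallowest one, is shown below; the sample names and sizes are hypothetical.

```python
import random


def downsample(samples: dict[str, list[str]], seed: int = 0) -> dict[str, list[str]]:
    """Subsample every sample, without replacement, to the smallest library size."""
    depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    return {name: rng.sample(reads, depth) for name, reads in samples.items()}


libraries = {"Cas9": [f"c{i}" for i in range(1000)],
             "REDIT": [f"r{i}" for i in range(400)]}
equalized = downsample(libraries)
print({name: len(reads) for name, reads in equalized.items()})  # {'Cas9': 400, 'REDIT': 400}
```

Equalizing depth matters because the number of detected off-target sites grows with sequencing depth, so unequal libraries would bias site counts toward the deeper sample.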
Example 2
[00148] In contrast to mammals, convenient recombineering tools are available for bacteria, e.g., phage lambda Red and RecE/T. Microbial recombineering has two major steps: the template DNA is chewed back by an exonuclease (Exo), and a single-strand annealing protein (SSAP) then supports homology-directed repair using the template, optionally facilitated by a nuclease inhibitor. A system for RNA-guided targeting of RecE/T recombineering activities was developed, achieving kilobase (kb)-scale editing of the human genome without DNA cutting.
[00149] Candidate microbial systems with recombineering activities were surveyed. Two lines of reasoning guided the search: 1) orthogonality: prioritizing proteins with minimal resemblance to mammalian repair enzymes; and 2) parsimony: focusing on systems with the fewest interdependent components. Three protein families were identified: lambda Red, RecE/T, and the phage T7 gp6 (Exo) and gp2.5 (SSAP) recombination machinery. Based on phylogenetic reconstruction, RecE/T proteins were determined to be the most distant from eukaryotic recombination proteins and among the most compact. RecE/T systems were therefore selected for downstream analysis.
[00150] The NCBI protein database was systematically searched for RecE/T homologs. To develop a portable tool, their evolutionary relationships and lengths were examined. Co-occurrence analysis revealed that most species encoding RecE/T homologs harbor only one of the two proteins. Because prophage integration can be imprecise, the 11% of species harboring both homologs were prioritized, taking co-occurrence as evidence of intact functionality.
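The co-occurrence filter can be expressed as a simple set test. The sketch assumes a hypothetical mapping from species to the homolog families detected in each genome. It returns the fraction of species encoding both RecE and RecT (11% in the survey described above) and the species to prioritize.

```python
def cooccurrence(homologs_by_species: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Fraction of species encoding both RecE and RecT, and the sorted list of those species."""
    both = sorted(sp for sp, fams in homologs_by_species.items() if {"RecE", "RecT"} <= fams)
    return len(both) / len(homologs_by_species), both


survey = {"species_A": {"RecT"}, "species_B": {"RecE", "RecT"},
          "species_C": {"RecE"}, "species_D": {"RecT"}}
print(cooccurrence(survey))  # (0.25, ['species_B'])
```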
[00151] The top 12 candidates were codon-optimized, and fusions to the MS2 coat protein (MCP) were constructed to recruit these RecE/T homologs, hereafter termed “recombinators”, to wild-type Streptococcus pyogenes Cas9 (wtCas9) via MS2 RNA aptamers. To understand their respective molecular effects as Exo and SSAP, each protein was tested independently. In genome knock-in assays, initial results identified the Escherichia coli RecE/T proteins (hereafter RecE and RecT) as promising candidates. While RecT is only 269 amino acids (AA) long, RecE was truncated to the fragment beginning at AA 587 (RecE_587) and to its carboxy-terminal domain (RecE CTD) based on functional studies (Muyrers, J.P., Genes Dev. (2000); 14, 1971-1982, incorporated herein by reference).
[00152] To validate RecE/T recombineering in human cells, homology-directed repair (HDR) was measured at five genomic sites with two templates. While the RecE variants (RecE_587, RecE CTD) produced variable increases in knock-in efficiency, RecT significantly enhanced HDR in all cases, replacing ~16-bp sequences at EMX1 and VEGFA and knocking in a ~1-kb cassette at HSP90AA1, DYNLT1, and AAVS1. These results were verified by imaging, and junction sites were Sanger sequenced to confirm precise insertion. To test whether these activities were truly sequence-specific, a no-recruitment control was employed using the PP7 coat protein (PCP), which recognizes PP7 aptamers rather than MS2 aptamers. RecE showed activity without recruitment, whereas RecT increased efficiency in a recruitment-dependent manner. Without being bound by theory, this may be explained by the RecE exonuclease acting promiscuously. These RecE/T recombineering-edit (REDIT) tools were termed REDITv1, with REDITv1 RecT as the preferred variant.
Example 3
[00153] Three tests of REDITv1 were performed to explore: 1) activity across cell types, 2) optimal HDR template designs, and 3) specificity. REDITv1 activity was robust across multiple genomic sites in HEK, A549, HepG2, and HeLa cells. Notably, in human embryonic stem cells (hESCs), REDITv1 consistently increased kilobase knock-in efficiency at HSP90AA1 and OCT4, with up to 3.5-fold improvement relative to Cas9-HDR. Different template designs were also tested. REDITv1 performed efficient kilobase editing with total homology arm (HA) length as short as 200 bp, with longer HAs supporting higher efficiency. It achieved up to 10% efficiency (without selection) for kb-scale knock-in, a 5-fold increase over Cas9-HDR and significantly higher than the typical 1-2% efficiency. Lastly, the accuracy of REDITv1 was assessed by deep sequencing of predicted off-target sites (OTSs) and by GUIDE-seq. Although REDITv1 did not increase off-target effects, detectable OTSs remained at previously reported sites for EMX1 and VEGFA. In short, REDITv1 achieved kilobase-scale genome recombineering but retained the off-target issues, with REDITv1 RecT having the highest efficiency.
Example 4
[00154] To reduce unwanted edits, a version of REDIT using Cas9 nickases (Cas9n), which nick a single strand rather than cleaving both, was assessed. A similar strategy was previously employed (Ran, F.A., et al., Cell (2013), 154: 1380-1389, incorporated herein by reference) to address off-target issues but had low HDR efficiency. REDIT was tested to determine whether it could overcome the limitations of endogenous repair and promote nicking-mediated recombination. Indeed, the nickase versions demonstrated higher efficiencies, with the best results from Cas9n(D10A) with single- and double-nicking. This Cas9n(D10A) variant was designated REDITv2N. Knock-in efficiencies of 5-10% without selection were observed using REDITv2N double-nicking, comparable to REDITv1 using wtCas9. Junction sequencing confirmed the precision of knock-in for all targets. This result represented a 6- to 10-fold improvement over Cas9n-HDR. Even with single-nicking REDITv2N, ~2% efficiency for 1-kb knock-in was observed, considerably higher than the 0.46% HDR efficiency in a previous report (Cong, L. et al., Science. 339, 819-823, incorporated herein by reference) that used regular single-nicking Cas9n and a less challenging 12-bp knock-in template.
[00155] The off-target activity of REDITv2N was investigated using GUIDE-seq. Results showed minimal off-target cleavage and a ~90% reduction in OTSs compared with REDITv1. Specifically, for DYNLT1-targeting guides, the most abundant OTS, at KIF6, was significantly enriched in the REDITv1 group but disappeared when using REDITv2N, indicating that REDITv2N was highly accurate.
[00156] Another byproduct of HDR editing is on-target insertions and deletions (indels), which can drastically lower gene-editing yields, especially for long sequences. Indel formation was measured in an EMX1 knock-in experiment using deep sequencing. REDITv2N increased HDR to the same efficiency as its wtCas9 counterpart while reducing unwanted on-target indels by 92%.
[00157] Concepts from GUIDE-seq, LAM-PCR, and TLA were combined to develop an NGS-based assay for identifying genome-wide insertion sites (GIS), termed GIS-seq. Using GIS-seq, NGS read clusters (peaks) representing knock-in insertion sites were obtained, including representative reads from the on-target site. GIS-seq was applied to the DYNLT1 and ACTB loci to measure knock-in accuracy. Sequencing results indicated that, when considering sites called with high confidence based on maximum likelihood estimation, REDIT produced fewer off-target insertion sites than Cas9. Together, the clonal Sanger sequencing of knock-in junctions, GUIDE-seq analysis, and GIS-seq results indicated that REDIT can be an efficient method for inserting kilobase-length sequences with fewer unwanted editing events.
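GIS-seq calls insertion sites from clusters of reads. A minimal sketch of the clustering step is shown below, assuming reads have already been mapped to (chromosome, position) coordinates. It groups positions that lie within a hypothetical maximum gap and keeps clusters with a hypothetical minimum read count; the maximum-likelihood confidence scoring mentioned above is not reproduced.

```python
def cluster_sites(positions, max_gap=100, min_reads=3):
    """Group mapped read positions into clusters.

    positions: iterable of (chrom, pos) tuples.
    Returns (chrom, start, end, n_reads) for clusters with at least min_reads reads.
    """
    clusters = []
    for chrom, pos in sorted(positions):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and pos - last[2] <= max_gap:
            clusters[-1] = (chrom, last[1], pos, last[3] + 1)  # extend current cluster
        else:
            clusters.append((chrom, pos, pos, 1))              # start a new cluster
    return [c for c in clusters if c[3] >= min_reads]


reads = [("chr1", 1000), ("chr1", 1010), ("chr1", 1050), ("chr1", 5000),
         ("chr2", 300), ("chr2", 320), ("chr2", 330), ("chr2", 340)]
print(cluster_sites(reads))
# [('chr1', 1000, 1050, 3), ('chr2', 300, 340, 4)]
```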
Example 5
[00158] REDIT was examined for long-sequence editing in the absence of any nicking or cutting of the target DNA. Remarkably, when catalytically dead Cas9 (dCas9) was used to construct REDITv2D, exact genomic knock-in of a kilobase cassette was observed in human cells. While REDITv2D had lower efficiency than REDITv2N, it achieved programmable, DNA-damage-free editing at kilobase scale with 1-2% efficiency and no selection. Two processes were hypothesized to contribute to REDITv2D recombineering. One possibility was DNA unwinding by dCas9: if dCas9 unwinds DNA as it induces sequence-specific loop formation, binding of two dCas9 complexes would be expected to further increase genome accessibility to RecE/T. However, no significant increase was observed upon delivering two guide RNAs. Another possibility was that DNA unwinding during the cell cycle permitted RecE/T, recruited by dCas9 binding, to access the target region. To test this, a 1-kb knock-in was performed with different REDIT tools at varying serum levels (10% regular, 2% reduced, and no serum). Because serum starvation arrests cell proliferation, the results indicated that cell-cycle progression correlated positively with REDITv2D recombineering: upon no-serum treatment, HDR efficiency dropped only in the REDITv2D (dCas9) group, whereas REDITv1 (wtCas9) and REDITv2N (D10A) were unaffected, supporting the hypothesis that DNA unwinding during the cell cycle permits RecE/T to access the target region.
Example 6
[00159] Microscopy analysis revealed incomplete nuclear targeting of REDITv1, particularly REDITv1 RecT. Hence, different designs of protein linkers and nuclear localization signals (NLSs) were tested. The extended XTEN linker with a C-terminal SV40 NLS was identified as the preferred configuration, termed REDITv3. REDITv3 achieved a further 2- to 3-fold increase in HDR efficiency over REDITv2 across genome targets and Cas9 variants (wtCas9, Cas9n, dCas9).
[00160] Finally, REDITv3 was used in hESCs to engineer kilobase knock-in alleles in human stem cells. REDITv3N single- and double-nicking designs increased HDR efficiencies 5-fold and 20-fold, respectively, over no-recombinator controls. Efficacy and fidelity were confirmed using the combination of assays described for previous REDIT versions. Additionally, REDITv3 works effectively with Staphylococcus aureus Cas9 (SaCas9), a compact CRISPR system suitable for in vivo delivery.
Example 7
[00161] To further investigate the RecT and RecE_587 variants, both proteins were truncated to various lengths. The resulting efficiencies were measured using an mKate knock-in assay at the DYNLT1 locus, with both wild-type SpCas9 and Cas9n(D10A) in single- and double-nicking configurations. Efficiencies of the no-recombinator group served as the control.
[00162] The truncated versions of both RecT and RecE_587 retained significant recombineering activity with different Cas9s. In particular, compared with full-length RecT (1-269aa), truncated versions such as RecT(93-264aa) are over 30% smaller yet preserved essentially the full activity of RecT in stimulating recombination in eukaryotic cells. Similarly, compared with the untruncated RecE_587 (1-280aa), truncated versions such as RecE_587(120-221aa) and RecE_587(120-209aa) are over 60% smaller but still retained high recombination activity in human cells. These truncated versions not only demonstrate the potential to further engineer minimal functional recombineering enzymes from RecE and RecT variants, but also provide compact recombineering tools for human genome editing that are well suited to in vitro, ex vivo, and in vivo delivery given their small size.
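The size reductions quoted above follow directly from the fragment boundaries. The check below assumes inclusive, 1-based residue ranges and takes the full lengths from the text (269 aa for RecT and 280 aa for RecE_587).

```python
def size_reduction(full_len: int, start: int, end: int) -> float:
    """Fractional size reduction of the inclusive, 1-based fragment start..end."""
    return 1 - (end - start + 1) / full_len


for name, full, start, end in [("RecT(93-264aa)", 269, 93, 264),
                               ("RecE_587(120-221aa)", 280, 120, 221),
                               ("RecE_587(120-209aa)", 280, 120, 209)]:
    print(f"{name}: {end - start + 1} aa, {size_reduction(full, start, end):.0%} smaller")
# RecT(93-264aa): 172 aa, 36% smaller
# RecE_587(120-221aa): 102 aa, 64% smaller
# RecE_587(120-209aa): 90 aa, 68% smaller
```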
[00163] Overall, REDIT combines the specificity of CRISPR genome targeting with the efficiency of RecE/RecT recombineering. The disclosed high-efficiency, low-error system is a powerful addition to existing CRISPR toolkits. The balanced efficiency and accuracy of REDITv3N make it an attractive therapeutic option for knock-in of large cassettes in immune and stem cells.
Example 8
[00164] Phylogenetic trees reconstructed for RecE and RecT together with eukaryotic recombination enzymes from yeast and human show the evolutionary distance of the proteins based on sequence homology. The dotted boxes indicate the full-length E. coli RecB and E. coli RecE proteins; the catalytic core domains of E. coli RecB and E. coli RecE (solid boxes) were used for the comparison. The gene-editing activities of these families of recombineering proteins were measured using the MS2-MCP recruitment system, in which an sgRNA bearing an MS2 stem-loop is used with recombineering proteins that are fused to MCP via a peptide linker and carry nuclear localization signals.
[00165] Three exonucleases were tested: the exonuclease from phage lambda, the RecE_587 core domain of E. coli RecE, and the exonuclease from phage T7 (gene name gp6). Gene-editing activity was measured using the mKate knock-in assay at two genomic loci (DYNLT1 and HSP90AA1).
[00166] Similar measurements were made testing the genome editing efficiencies of three single-strand DNA annealing proteins (SSAPs) from the same three species of microbes as the exonucleases, namely Bet protein from phage Lambda, RecT protein from E. coli, and SSAP (gene name gp2.5) from phage T7.
[00167] From these results, the genome recombineering activities of all three major families of phage/microbial recombination systems were systematically measured and validated in eukaryotic cells (lambda phage exonuclease and Beta proteins; E. coli prophage RecE and RecT proteins; T7 phage exonuclease gp6 and single-strand binding gp2.5 proteins). All six proteins from the three systems achieved efficient gene editing to knock in kilobase-long sequences into the mammalian genome at both genomic loci. Overall, the exonucleases showed ~3-fold higher recombination efficiency (up to 4% mKate genome knock-in) than no-recombinator controls. The single-strand annealing proteins (SSAPs) showed higher activities, with 4-fold to 8-fold higher gene-editing activity than the control groups. This demonstrated that microbial recombination proteins in the exonuclease and SSAP families can generally be engineered via the Cas9-based fusion protein system to achieve highly efficient genome recombination in mammalian cells.
Example 9
[00168] To demonstrate the generalizability of the REDIT protein design, alternative recruitment systems were developed and tested. For a more compact REDIT system, the REDIT recombinator proteins were fused to the N22 peptide (replacing MCP), and the sgRNA included boxB, the short cognate RNA sequence of the N22 peptide (replacing the MS2 aptamer). In side-by-side comparisons, this boxB-N22 system demonstrated editing efficiencies comparable to the MS2-MCP recruitment system at the two genomic sites tested.
[00169] A REDIT system using SunTag, a protein-based recruitment system, was also developed. Because SunTag relies on fusion protein design, the sgRNAs or guide RNAs are the same as in the wild-type CRISPR system. Specifically, the REDIT recombinator proteins were fused to an scFv antibody fragment (replacing MCP), and GCN4 peptides were fused in tandem (10 copies of the GCN4 peptide separated by linkers) to the Cas9 protein. The scFv-REDIT fusion could thus be recruited to the Cas9 complex through the affinity of the scFv for GCN4.
[00170] mKate knock-in experiments were used to measure editing efficiencies at the DYNLT1 and HSP90AA1 loci. The SunTag-based REDIT system significantly increased knock-in efficiency at the DYNLT1 genomic site tested. In addition, the SunTag design significantly increased HDR efficiencies to ~2-fold over Cas9, but did not achieve increases as high as the MS2-aptamer system.
Example 10
[00171] To demonstrate the generalizability of the REDIT protein design and develop a versatile REDIT system applicable to a range of CRISPR enzymes, a Cpf1/Cas12a-based REDIT system using the SunTag recruitment design was developed. Two different Cpf1/Cas12a proteins were tested (from Lachnospiraceae bacterium ND2006, LbCpf1, and from Acidaminococcus sp. BV3L6) using the mKate knock-in assay as previously described.
[00172] These results showed that microbial recombination proteins (exonucleases and single-strand annealing proteins) can be engineered using alternative designs such as the SunTag recruitment system to perform genome editing in eukaryotic cells. Such protein-based recruitment systems do not require RNA aptamers or RNA-binding proteins; instead, they use fusion protein domains directly connected to the CRISPR enzymes to recruit REDIT proteins.
[00173] In addition to the flexibility in recruitment system design, these results using Cpf1/Cas12a-type CRISPR enzymes also demonstrated the general adaptability of REDIT proteins to various CRISPR systems for genome recombination. Cpf1/Cas12a enzymes have different catalytic residues and DNA-recognition mechanisms from Cas9 enzymes. Hence, the REDIT recombination proteins (exonucleases and single-strand annealing proteins) can function independently of the specific choice of CRISPR enzyme (Cas9, Cpf1/Cas12a, and others). This demonstrated the generalizability of the REDIT system and opens up the possibility of using additional CRISPR enzymes (known and unknown) as components of REDIT systems to achieve accurate genome editing in eukaryotic cells.
Example 11
[00174] Fifteen microbial species encoding RecE/RecT proteins were selected for a screen of RecE and RecT proteins across the microbial kingdom (Table 3). Each protein was codon-optimized and synthesized. As previously described for E. coli RecE/RecT-based REDIT systems, each protein was fused via an E-XTEN linker to MCP with an additional nuclear localization signal. The mKate knock-in gene-editing assay was used to measure efficiencies at the DYNLT1 locus (Table 4) and the HSP90AA1 locus (Table 4). The homologs demonstrated the ability to enable and enhance precision gene editing.
Example 12
[00175] Next, to benchmark the RecT-based REDIT design, it was compared with three categories of existing HDR-enhancing tools: a fusion of the DNA repair enzyme CtIP with Cas9 (Cas9-HE), a fusion of the functional domain (amino acids 1 to 110) of human Geminin with Cas9 (Cas9-Gem), and nocodazole, a small-molecule enhancer of HDR acting via cell-cycle control. Across the endogenous targets tested, the RecT-based REDIT design performed favorably compared with the three alternative strategies. Furthermore, because the RecT-based REDIT design putatively acts independently of the other approaches, it may synergize with existing methods. To test this hypothesis, the RecT-based REDIT design was combined with each of the three approaches (conveniently, through the MS2 aptamer). The RecT-based REDIT design indeed further enhanced the HDR-promoting activities of the tested tools.
Example 13
[00176] The effect of template HA length on REDIT editing efficiency was quantified using canonical HDR donors bearing HAs of at least 100 bp on each side. Higher HDR rates were observed for both Cas9 and RecT groups with increasing HA length, and REDIT effectively stimulated HDR over Cas9 using HAs as short as ~100 bp on each side. When supplied with a longer template bearing 600-800 bp of total HA, RecT achieved over 10% HDR efficiency for kb-scale knock-in without selection, significantly higher than the 2-3% efficiency obtained with Cas9 alone. Recent reports found that donor DNAs with shorter HAs (usually between 10 and 50 bp) can significantly stimulate knock-in efficiency owing to the high repair activity of the microhomology-mediated end joining (MMEJ) pathway. Knock-in efficiencies of the REDIT-based method were therefore compared with Cas9 using donor DNAs with 0-bp (NHEJ-based), 10-bp, or 50-bp (MMEJ-based) HAs. Short-HA donors leveraging MMEJ yielded higher editing efficiencies than HDR donors. At the same time, REDIT enhanced knock-in efficiency whenever an HA was present (with no effect for the 0-bp NHEJ donor). This effect was particularly significant with the 10-bp donors, which were therefore chosen for further characterization and comparison with the HDR donors.
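The donors compared above differ mainly in arm length. A minimal sketch is shown below, assuming the insert is placed exactly at the cut site and both arms are copied unchanged from the flanking genomic sequence (no PAM-blocking mutations). It builds donors for the arm lengths tested; the sequence, cut coordinate, and placeholder cassette are hypothetical.

```python
def build_donor(genome: str, cut_site: int, insert: str, arm_len: int) -> str:
    """Return left arm + insert + right arm for a cut between cut_site-1 and cut_site.

    arm_len = 0 gives an arm-less (NHEJ-type) donor, 10-50 bp a short-arm
    (MMEJ-type) donor, and >= 100 bp per side a canonical HDR donor.
    """
    if arm_len < 0 or cut_site - arm_len < 0 or cut_site + arm_len > len(genome):
        raise ValueError("homology arms extend beyond the supplied sequence")
    left = genome[cut_site - arm_len:cut_site]
    right = genome[cut_site:cut_site + arm_len]
    return left + insert + right


locus = "ACGT" * 100   # hypothetical 400-bp locus
cassette = "N" * 1000  # placeholder for a ~1-kb knock-in cassette
for arm in (0, 10, 50, 100, 200):
    print(arm, len(build_donor(locus, 200, cassette, arm)))
# donor lengths: 1000, 1020, 1100, 1200, 1400 bp
```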
[00177] The knock-in cells were clonally isolated, and the target genomic region was amplified for colony Sanger sequencing using primers binding entirely outside the donor DNA. Junction sequencing analysis (~48 colonies per gene per condition) revealed varying degrees of indels at the 5’ and 3’ knock-in junctions, affecting either one or both junctions. Overall, HDR donors gave better precision than MMEJ donors, and REDIT modestly improved the knock-in yield compared with Cas9, though junction indels were still observed.
[00178] Furthermore, the efficiencies of REDIT and Cas9 were compared for edits of different lengths. For longer edits, 2-kb knock-in cassettes were used, and for shorter edits, single-stranded oligo donors (ssODNs) were used. When the knock-in length was increased to ~2 kb using a dual mKate/GFP template, REDIT maintained its HDR-promoting activity relative to Cas9 across the endogenous targets tested. For the ssODN tests, REDIT and Cas9 were used to introduce 12-16-bp exogenous sequences at two well-established loci, EMX1 and VEGFA. Because ssODN templates are short (<100-bp HAs on each side), next-generation sequencing (NGS) was used to quantify the editing events. Indel levels were comparable between Cas9 and REDIT, with improved HDR efficiency using REDIT.
Example 14
[00179] The sensitivity of REDIT’s HDR-promoting activity to two distinct pharmacological inhibitors of RAD51, B02 and RI-1, was tested by editing in their presence or absence. As expected, RAD51 inhibition significantly lowered HDR efficiency for Cas9-based editing. Intriguingly, RAD51 inhibition decreased REDIT and REDITdn efficiencies only moderately, and both REDIT and REDITdn maintained significantly higher knock-in efficiencies than Cas9/Cas9dn under RAD51 inhibition.
[00180] Mirin, a potent chemical inhibitor of DSB repair that has also been shown to prevent MRN complex formation, block MRN-dependent ATM activation, and inhibit Mre11 exonuclease activity, was also used. When cells were treated with Mirin, only the editing efficiencies of the Cas9 reference experiments were affected, whereas the REDIT versions performed essentially the same as vehicle-treated groups across all genomic targets.
[00181] To test whether cell-cycle inhibition affected recombination, cells were chemically synchronized at the G1/S boundary using a double thymidine block (DTB). REDIT versions had reduced editing efficiencies under DTB treatment, though they maintained higher editing efficiencies than the Cas9 reference experiments under DNA repair pathway inhibition when Mirin, RI-1, or B02 was combined with DTB treatment.
[00182] To validate REDIT in different contexts, REDIT was applied in human embryonic stem cells (hESCs) to test its ability to engineer long sequences in non-transformed human cells. Robust stimulation of HDR was observed across all three genomic sites (HSP90AA1, ACTB, and OCT4/POU5F1) using REDIT and REDITdn. Of note, REDIT and REDITdn editing used donor DNAs with 200-bp HAs on each side and achieved up to >5% efficiency for kb-scale gene editing without selection, compared with ~1% efficiency using non-REDIT methods. Additionally, REDIT improved knock-in efficiencies in A549 (lung-derived), HepG2 (liver-derived), and HeLa (cervix-derived) cells, reaching up to ~15% kb-scale genomic knock-in without selection. This improvement was up to 4-fold over the Cas9 groups, supporting the use of REDIT methods in different cell types.
Example 15
[00183] In vivo use of the cleavage-free editor dCas9-EcRecT (SAFE-dCas9) was tested by hydrodynamic tail vein injection. A gene-editing vector (60 µg) and template DNA (60 µg) were delivered to the mouse by hydrodynamic tail vein injection. Successful gene editing of liver hepatocytes was monitored by expression of the transgene-encoded protein from the albumin locus.
[00184] At approximately seven days after injection, the perfused mouse livers were dissected. The liver lobes were homogenized and processed to extract genomic DNA from the primary hepatocytes. The extracted genomic DNA was used for three downstream analyses: 1) PCR using knock-in-specific primers followed by agarose gel electrophoresis; 2) Sanger sequencing of the knock-in PCR product; and 3) high-throughput deep sequencing of the knock-in junction to confirm and quantify the accuracy of gene editing by SAFE-dCas9 in vivo. Each downstream analysis confirmed successful knock-in.
[00185] In addition, in vivo use was tested using adeno-associated virus (AAV) delivery into the lungs of LTC mice. LTC mice carry three alleles: 1) an Lkb1 (flox/flox) allele, which allows Lkb1 knockout upon Cre expression; 2) an R26(LSL-TdTom) allele, which allows detection of AAV-transduced cells via the TdTom red fluorescent protein; and 3) an H11(LSL-Cas9) allele, which allows Cas9 expression in AAV-transduced cells. Successful editing with the gene-editing vector produces Kras alleles that drive tumor growth in the lungs of treated mice.
[00186] Approximately fourteen weeks after AAV injection, perfused mouse lungs were dissected. Fixed lung tissue was imaged to identify tumors arising from successful gene editing. Quantification of surface tumor number by imaging analysis showed increased gene-editing efficiency and a higher total number of tumors in the REDIT-treated mice.
[00187] Escherichia coli RecE amino acid sequence (SEQ ID NO:1):
MSTKPLFLLRKAKKSSGEPDVVLWASNDFESTCATLDYLIVKSGKKLSSYFKAVATNFP
VVNDLPAEGEIDFTWSERYQLSKDSMTWELKPGAAPDNAHYQGNTNVNGEDMTEIEEN
MLLPISGQELPIRWLAQHGSEKPVTHVSRDGLQALHIARAEELPAVTALAVSHKTSLLDP
LEIRELHKLVRDTDKVFPNPGNSNLGLITAFFEAYLNADYTDRGLLTKEWMKGNRVSHI
TRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLARSMDLDIYNLHPAHAKRIEEI
IAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVIPAHVTEYLNKVLTETDHA
NPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGTTAVEQGEAETMEPDATEHHQ
DTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDDDKLLAASRGEFVDGISDPN
DPKWVKGIQTRDCVYQNQPETEKTSPDMNQPEPVVQQEPEIACNACGQTGGDNCPDCG
AVMGDATYQETFDEESQVEAKENDPEEMEGAEHPHNENAGSDPHRDCSDETGEVADP
VIVEDIEPGIYYGISNENYHAGPGISKSQLDDIADTPALYLWRKNAPVDTTKTKTLDLGT
AFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECASTGKTVITAEEGRKIELMY
QSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDKIIPEFHWIMDVKTTADIQRF
KTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIECGRYPVEIFMMGEEAKLA
GQQEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND
[00188] Escherichia coli RecE_587 amino acid sequence (SEQ ID NO:2):
ADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLWRKNAPVDTTKTKTLD
LGTAFHCRVLEPEEFSNRFIVAPEFNRRTNSGKEEEKAFLRECASTGKTVITAEEGRKIEL
MYQSVMALPLGQWLVESAGHAESSIYWEDPETAILCRCRPDKIIPEFHWIMDVKTTADI
QRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIECGRYPVEIFMMGEEA
KLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND*
[00189] Escherichia coli CTD RecE amino acid sequence (SEQ ID NO:3):
GISNENYHAGPGVSKSQLDDIADTPALYLWRKNAPVDTTKTKTLDLGTAFHCRVLEPEE
FSNRFIVAPEFNRRTNSGKEEEKAFLRECASTGKTVITAEEGRKIELMYQSVMALPLGQW
LVESAGHAESSIYWEDPETAILCRCRPDKIIPEFHWIMDVKTTADIQRFKTAYYDYRYHV
QDAFYSDGYEAQFGVQPTFVFLVASTTIECGRYPVEIFMMGEEAKLAGQLEYHRNLRTL
ADCLNTDEWPAIKTLSLPRWAKEYAND*
[00190] Pantoea brenneri RecE amino acid sequence (SEQ ID NO:4):
MQPGIYYDISNEDYHRGAGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL
LLEPDEFSKRFQIGPEVNRRTTAGKEKEKEFIERCEAEGITPITHDDNRKLKLMRDSALAH
PIARWMLEAQGNAEASIYWNDRDAGVLSRCRPDKIITEFNWCVDVKSTADIMKFQKDF
YSYRYHVQDAFYSDGYESHFHETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE
YKRNIHTFAECLSRNEWPGIATLSLPFWAKELRNE
[00191] Type-F symbiont of Plautia stali RecE amino acid sequence (SEQ ID NO:5):
MQPGIYYDISNEDYHGGPGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL
LLEPDEFSKRFEIGPEVNRRTTAGKEKEKEFMERCEAEGVTPITHDDNRKLRLMRDSAM
AHPIARWMLEAQGNAEASIYWNDRDTGVLSRCRPDKIITDFNWCVDVKSTADIIKFQKD
FYSYRYHVQDAFYSDGYESHFDETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE
YKRNIHTFAECLSRNEWPGIATLSLPYWAKELRNE
[00192] Providencia sp. MGF014 RecE amino acid sequence (SEQ ID NO:6):
MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL
LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA
HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN
YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK
HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE
[00193] Shigella sonnei RecE amino acid sequence (SEQ ID NO:7):
DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDIATGVLARS
MDVDIYNLHPAHAKRIEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVI
PAHVTAYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKLQPSGTTA
DEQGEAETMEPDATKHHQDTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDA
DKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKTSPDMKQPEPVVQQEPE
IAFNACGQTGGDNCPDCGAVMGDATYQETFDEENQVEAKENDPEEMEGAEHPHNENA
GSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLW
RKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECA
STGKMVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDK
IIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTTIE
CGRYPVEIFMMGEEAKLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND
[00194] Pseudobacteriovorax antillogorgiicola RecE amino acid sequence (SEQ ID NO:8):
MSKLSNLKVSNSDVDTLSRIRMKEGVYRDLPIESYHQSPGYSKTSLCQIDKAPIYLKTKV
PQKSTKSLNIGTAFHEAMEGVFKDKYVVHPDPGVNKTTKSWKDFVKRYPKHMPLKRSE
YDQVLAMYDAARSYRPFQKYHLSRGFYESSFYWHDAVTNSLIKCRPDYITPDGMSVIDF
KTTVDPSPKGFQYQAYKYHYYVSAALTLEGIEAVTGIRPKEYLFLAVSNSAPYLTALYR
ASEKEIALGDHFIRRSLLTLKTCLESGKWPGLQEEILELGLPFSGLKELREEQEVEDEFME LVG
[00195] Escherichia coli RecT amino acid sequence (SEQ ID NO:9):
MTKQPPIAKADLQKTQGNRAPAAVKNSDVISFINQPSMKEQLAAALPRHMTAERMIRIA TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII
GYRGMIDLARRSGQIASLSARVVREGDEFSFEFGLDEKLIHRPGENEDAPVTHVYAVAR
LKDGGTQFEVMTRKQIELVRSLSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR
AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE*
[00196] Pantoea brenneri RecT amino acid sequence (SEQ ID NO: 10):
MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI
RIVTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ
LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAV
ARLKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI
EMQKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN
[00197] Type-F symbiont of Plautia stali RecT amino acid sequence (SEQ ID NO: 11):
MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI
RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ
LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNEDAPITHVYAV
ARLKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI
EMQKAVVLDEKAESDVDQDNASVLSAEYSVLEGDGGE
[00198] Providencia sp. MGF014 RecT amino acid sequence (SEQ ID NO: 12):
MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT
TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII
GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL
KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ
KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN
[00199] Shigella sonnei RecT amino acid sequence (SEQ ID NO: 13):
MTKQPPIAKADLQKTQENRAPAAIKNNDVISFINQPSMKEQLAAALPRHMTAERMIRIA
TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII
GYRGMIDLARRSGQIASLSARVVREGDEFNFEFGLDEKLIHRPGENEDAPVTHVYAVAR
LKDGGTQFEVMTRRQIELVRSQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR
AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE
[00200] Pseudobacteriovorax antillogorgiicola RecT amino acid sequence (SEQ ID NO:14):
MGHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAK PAFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQ VQQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHK PKALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKE AGYGVNQ
[00201] SV40 NLS amino acid sequence (SEQ ID NO: 16): PKKKRKV
[00202] Ty1 NLS amino acid sequence (SEQ ID NO: 17):
NSKKRSLEDNETEIKVSRDTWNTKNMRSLEPPRSKKRIH
[00203] c-Myc NLS amino acid sequence (SEQ ID NO: 18): PAAKRVKLD
[00204] biSV40 NLS amino acid sequence (SEQ ID NO: 19): KRTADGSEFESPKKKRKV
[00205] Mut NLS amino acid sequence (SEQ ID NO:20):
PEKKRRRPSGSVPVLARPSPPKAGKSSCI
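Several nuclear localization signals (NLSs) are listed above. This section does not specify how they are attached to a recombinase. The sketch below assumes a terminal fusion through a short flexible linker. The `GSG` linker, the default C-terminal placement, and the function name `fuse_nls` are illustrative choices and are not taken from the source.

```python
# NLS peptides copied from SEQ ID NOs: 16, 18 and 19 above.
NLS = {
    "SV40": "PKKKRKV",
    "c-Myc": "PAAKRVKLD",
    "biSV40": "KRTADGSEFESPKKKRKV",
}


def fuse_nls(protein: str, nls: str, terminus: str = "C", linker: str = "GSG") -> str:
    """Attach an NLS to one end of a protein through a linker.

    Both the default C-terminal placement and the GSG linker are
    illustrative assumptions, not design choices stated in the source.
    """
    core = protein.rstrip("*")  # drop a trailing stop marker before fusing
    if terminus == "C":
        return core + linker + nls
    if terminus == "N":
        # Keep the initiator methionine at the very start of the fusion.
        body = core[1:] if core.startswith("M") else core
        return "M" + nls + linker + body
    raise ValueError("terminus must be 'N' or 'C'")
```

Note that the biSV40 sequence ends with the SV40 NLS itself, so `NLS["biSV40"].endswith(NLS["SV40"])` is `True`.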
[00206] Template DNA sequences (underlining marks the replaced or inserted editing sequences):
EMX1 HDR template sequence (SEQ ID NO:79):
CATTCTGCCTCTCTGTATGGAAAAGAGCATGGGGCTGGCCCGTGGGGTGGTGTCCAC TTTAGGCCCTGTGGGAGATCATGGGAACCCACGCAGTGGGTcataggctctctcatttactactcacat ccactctgtgaagaagcgattatgatctctcctctagaaaCTCGTAGAGTCCCATGTCTGCCGGCTTCCAGAG CCTGCACTCCTCCACCTTGGCTTGGCTTTGCTGGGGCTAGAGGAGCTAGGATGCACA GCAGCTCTGTGACCCTTTGTTTGAGAGGAACAGGAAAACCACCCTTCTCTCTGGCCC ACTGTGTCCTCTTCCTGCCCTGCCATCCCCTTCTGTGAATGTTAGACCCATGGGAGCA GCTGGTCAGAGGGGACCCCGGCCTGGGGCCCCTAACCCTATGTAGCCTCAGTCTTCC CATCAGGCTCTCAGCTCAGCCTGAGTGTTGAGGCCCCAGTGGCTGCTCTGGGGGCCT CCTGAGTTTCTCATCTGTGCCCCTCCCTCCCTGGCCCAGGTGAAGGTGTGGTTCCAG AACCGGAGGACAAAGTACAAACGGCAGAAGCTGGAGGAGGAAGGGCCTGAGTCCG AGCAGAAGAAGAAGGGCTCCCATCACATCAACCGGTGGCGCATTGCCACGAAGCAG GCCAATGGGGAGGACATCGATGTCACCTCCAATGACTCGGATGTACACGGTCTGCA ACCACAAACCCACGAGGGCAGAGTGCTGCTTGCTGCTGGCCAGGCCCCTGCGTGGG CCCAAGCTGGACTCTGGCCACTCCCTGGCCAGGCTTTGGGGAGGCCTGGAGTCATGG CCCCACAGGGCTTGAAGCCCGGGGCCGCCATTGACAGAGGGACAAGCAATGGGCTG GCTGAGGCCTGGGACCACTTGGCCTTCTCCTCGGAGAGCCTGCCTGCCTGGGCGGGC CCGCCCGCCACCGCAGCCTCCCAGCTGCTCTCCGTGTCTCCAATCTCCCTTTTGTTTT GATGCATTTCTGTTTTAATTTATTTTCCAGGCACCACTGTAGTTTAGTGATCCCCAGT GTCCCCCTTCCCTATGGGAATAATAAAAGTCTCTCTCTTAATGACACGGGCATCCAG CTCCAGCCCCAGAGCCTGGGGTGGTAGATTCCGGCTCTGAGGGCCAGTGGGGGCTG GTAGAGCAAACGCGTTCAGGGCCTGGGAGCCTGGGGTGGGGTACTGGTGGAGGGGG TCAAGGGTAATTCATTAACTCCTCTCTTTTGTTGGGGGACCCTGGTCTCTACCTCCAG CTCCACAGCAGGAGAAACAGGCTAGACATAGGGAAGGGCCATCCTGTATCTTGAGG GAGGACAGGCCCAGGTCTTTCTTAACGTATTGAGAGGTGGGAATCAGGCCCAGGTA GTTCAATGGG
[00207] VEGFA HDR template sequence (SEQ ID NO:80):
AGGTTTGAATCATCACGCAGGCCCTGGCCTCCACCCGCCCCCACCAGCCCCCTGGCC TCAGTTCCCTGGCAACATCTGGGGTTGGGGGGGCAGCAGGAACAAGGGCCTCTGTC TGCCCAGCTGCCTCCCCCTTTGGGTTTTGCCAGACTCCACAGTGCATACGTGGGCTC CAACAGGTCCTCTTCCCTCCCAGTCACTGACTAACCCCGGAACCACACAGCTTCCCG TTctcagctccacaaacttggtgccaaattcttctcccctgggaagcatccctggacacttcccaaaggaccccagtcactccagcctgttg gctgccgctcactttgatgtctgcaggccagatgagggctccagatggcacattgtcagagggacacactgtggcccctgtgcccagccct gggctctctgtacatgaagcaactccagtcccaaatatgtagctgtttgggaggtcagaaatagggggtccaggagcaaactccccccacc ccctttccaaagcccattccctctttagccagagccggggtgtgcagacggcagtcactagggggcgctcggccaccacagggaagctg ggtgaatggagcgagcagcgtcttcgagagtgaggacgtgtgtgtctgtgtgggtgagtgagtgtgCgcACTCTAGAGgtgtCg Tgttgagggcgttggagcggggagaaggccaggggtcactccaggattccaatagatctgtgtgtccctctccccacccgtccctgtccg gctctccgccttcccctgcccccttcaatattcctagcaaagagggaacggctctcaggccctgtccgcacgtaacctcactttcctgctccct cctcgccaatgccccgcgggcgcgtgtctctggacagagtttccgggggcggatgggtaattttcaggctgtgaaccttggtgggggtcga gcttccccttcattgcggcgggctGCGGGCCAGGCTTCACTGAGCGTCCGCAGAGCCCGGGCCCGA GCCGCGTGTGGAAGGGCTGAGGCTCGCCTGTccccgccccccggggcgggccgggggcggggtcccgg cggggcggAGCCATGCGCCCCCCCCttttttttttAAAAGTCGGCTGGTAGCGGGGAGGATCGC
GGAGGCTTGGGGCAGCCGGGTAGCTCGGAGGTCGTGGCGCTGGGGGCTAGCACCAG CGCTCTGTCGGGAGGCGCAGCGGTTAGGTGGACCGGTCAGCGGACTCACCGGCCAG GGCGCTCGGTGCTGGAATTTGATATTCATTGATCCGGGttttatccctcttcttttttcttaaacatttttttttA AAACTGTATTGTTTCTCGTTTTAATTTATTTTTGCTTGCCATTCCCCACTTGAAT
[00208] DYNLT1 HDR template sequence (SEQ ID NO:81):
AGTGACCTGTGTAATTATGCAGAAGAATGGAGCTGGATTACACACAGCAAGTTCCT GCTTCTGGGACAGCTCTACTGACGGTATGATTTTCATTCATGTTTGTGAAGTTTTGTT GTGTGAAATATATGACTGGAAGTTTCCTATCTTTGAATGCAATGCATGTTTATCACCT TTTAAAACATTTAATAATAGACTTGCCAAGGTTCTTTGTGTAGCATAGAGATGGGTA CTTGAATGTTGGCCTTATTGTGAGTAAAACGTCGTCCCCCAGCTTTCCCTGCCGTAAA TGCTGCTCTCTTCCCTCCCGCAGGGAGCTGCACTGTGCGATGGGAGAATAAGACCAT GTACTGCATCGTCAGTGCCTTCGGACTGTCTATTGGAAGCGGAGCTACTAACTTCAG CCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgccaccatggtgagcgagct gattaaggagaacatgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagc cctacgagggcacccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccagcttcatgta cggcagcaaaaccttcatcaaccacacccagggcatccccgacttctttaagcagtccttccccgagggcttcacatgggagagagtcacc acatacgaagatgggggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatcagaggggt gaactcccatccaacggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctgacggcggcc tggaaggcagagccgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatccaagaaaccc gctaagaacctcaagatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagca gcacgaggtggctgtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAACCaGCtGTCCtGCCT ATGGCCTTTCTCCTTTTGTCTCTAGTTCATCCTCTAACCACCAGCCATGAATTCAGTG AACTCTTTTCTCATTCTCTTTGTTTTGTGGCACTTTCACAATGTAGAGGAAAAAACCA
AATGACCGCACTGTGATGTGAATGGCACCGAAGTCAGATGAGTATCCCTGTAGGTC ACCTGCAGCCTGCGTTGCCACTTGTCTTAACTCTGAATATTTCATTTCAAAGGTGCTA AAATCTGAAATCTGCTAGTGTGAAACTTGCTCTACTCTCTGAAATGATTCAAATACA CTAATTTTCCATACTTTATACTTTTGTTAGAATAAATTATTCAAATCTAAAGTCTGTT GTGTTCTTCATAGTCTGCATAGTATCATAAACG
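In the DYNLT1 template above, the uppercase stretch just before the lowercase coding sequence can be read by translating it in frame. The same applies to the corresponding junctions in the HSP90AA1, AAVS1, and OCT4 templates. The following is a minimal sketch using the standard genetic code; `translate` is a hypothetical helper. Whitespace is removed first because the extraction wraps sequences mid-codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...), matching the amino-acid string.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()  # drop wrapping spaces; case is ignored
    if len(seq) % 3:
        raise ValueError("length is not a multiple of 3")
    # A KeyError here means a non-ACGT character survived extraction.
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    junction = "GGAAGCGGAGCTACTAACTTCAG CCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCT"
    print(translate(junction))  # GSGATNFSLLKQAGDVEENPGP
```

The 66-nt junction translates to GSGATNFSLLKQAGDVEENPGP. This sequence matches a GSG linker followed by the widely used P2A peptide; that identification comes from the translation, not from this section. The same helper applied to the start of SEQ ID NO:88 (`CAGCCTGGCATCTACTATGACATC`) gives `QPGIYYDI`. That result matches SEQ ID NO:4 from its second residue onward, which suggests the listed DNA omits the initiator ATG.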
[00209] HSP90AA1 HDR template sequence (SEQ ID NO:82):
GCAGCAAAGAAACACCTGGAGATAAACCCTGACCATTCCATTATTGAGACCTTAAG GCAAAAGGCAGAGGCTGATAAGAACGACAAGTCTGTGAAGGATCTGGTCATCTTGC TTTATGAAACTGCGCTCCTGTCTTCTGGCTTCAGTCTGGAAGATCCCCAGACACATG CTAACAGGATCTACAGGATGATCAAACTTGGTCTGGGTAAGCCTTATACTATGTAAT GTTAAAAAGAAAATAAACACACGTGACATTGAAGAAAATGGTGAACTTTCAGTTAT CCAAACTTGGAGCACCTTGTCCTGCTTGCTGCTTGGAGGTATTAAAGTATGttttttttAGG GATAAGTAAGGTCTTACAAGAGCAAAGAAATGAAATTGAGACTCATATGTCCTGTA ATACTGTCTTGAAAGCAGATAGAAACCAAGAGTATTACCCTAATAGCTGGCTTTAAG AAATCTTTGTAATATGAGGATTTTATTTTGGAAACAGGTATTGATGAAGATGACCCT ACTGCTGATGATACCAGTGCTGCTGTAACTGAAGAAATGCCACCCCTTGAAGGAGAT GACGACACATCACGCATGGAAGAAGTAGACGGAAGCGGAGCTACTAACTTCAGCCT GCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgtgagcgagctgattaaggagaaca tgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagccctacgagggcac ccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccagcttcatgtacggcagcaaaacc ttcatcaaccacacccagggcatccccgacttctttaagcagtcctccccgagggcttcacatgggagagagtcaccacatacgaagatgg gggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatcagaggggtgaacttcccatccaa cggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctgacggcggcctggaaggcagagc cgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatccaagaaacccgctaagaacctcaa gatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagcagcacgaggtggct gtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAaAT C TgT GGC T G AGGG AT G AC T T A CCTGTTCAGTACTCTACAATTCCTCTGATAATATATTTTCAAGGATGTTTTTCTTTATT TTTGTTAATATTAAAAAGTCTGTATGGCATGACAACTACTTTAAGGGGAAGATAAGA TTTCTGTCTACTAAGTGATGCTGTGATACCTTAGGCACTAAAGCAGAGCTAGTAATG CTTTTTGAGTTTCATGTTGGTTTATTTTCACAGATTGGGGTAACGTGCACTGTAAGAC GTATGTAACATGATGTTAACTTTGTGGTCTAAAGTGTTTAGCTGTCAAGCCGGATGC CTAAGTAGACCAAATCTTGTTATTGAAGTGTTCTGAGCTGTATCTTGATGTTTAGAA AAGTATTCGTTACATCTTGTAGGATCTACTTTTTGAACTTTTCATTCCCTGTAGTTGA CAATTCTGCATGTACTAGTCCTCTAGAAATAGGTTAAACTGAAGCAACTTGATGGAA GGATCTCTCCACAGGGCTTGTTTTCCAAAGAAAAGTATTGTTTGGAGGAGCAAAGTT 
AAAAGCCTACCTAAGCATATCGTAAAGCTGTTCAAAAATAACTCAGACCCAGTCTTG TGGA
[00210] AAVS1 HDR template sequence (SEQ ID NO:83): gatgctctttccggagcacttccttctcggcgctgcaccacgtgatgtcctctgagcggatcctccccgtgtctgggtcctctccgggcatctc tcctccctcacccaaccccatgccgtcttcactcgctgggttcccttttccttctccttctggggcctgtgccatctctcgtttcttaggatggcctt ctccgacggatgtctcccttgcgtcccgcctccccttcttgtaggcctgcatcatcaccgtttttctggacaaccccaaagtaccccgtctccct ggctttagccacctctccatcctcttgctttctttgcctggacaccccgttctcctgtggattcgggtcacctctcactcctttcatttgggcagctc ccctaccccccttacctctctagtctgtgctagctcttccagccccctgtcatggcatcttccaggggtccgagagctcagctagtcttcttcctc caacccgggcccctatgtccacttcaggacagcatgtttgctgcctccagggatcctgtgtccccgagctgggaccaccttatattcccagg gccggttaatgtggctctggttctgggtactttatctgtcccctccaccccacagtggggcaagcttctgacctcttctcttcctcccacaggg cctcgagagatctggcagcggaGGAAGCGGAGCTACTAACTTCAGCCTGCTGAAGCAGGCTGGA GACGTGGAGGAGAACCCTGGACCTgtgagcgagctgattaaggagaacatgcacatgaagctgtacatggagggc accgtgaacaaccaccacttcaagtgcacatccgagggcgaaggcaagccctacgagggcacccagaccatgagaatcaaggcggtcg agggcggccctctccccttcgccttcgacatcctggctaccagcttcatgtacggcagcaaaacctcatcaaccacacccagggcatcccc gacttctttaagcagtccttccccgagggcttcacatgggagagagtcaccacatacgaagatgggggcgtgctgaccgctacccaggac accagcctccaggacggctgcctcatctacaacgtcaagatcagaggggtgaacttcccatccaacggccctgtgatgcagaagaaaaca ctcggctgggaggcctccaccgagacactgtaccccgctgacggcggcctggaaggcagagccgacatggccctgaagctcgtgggc gggggccacctgatctgcaaccttaagaccacatacagatccaagaaacccgctaagaacctcaagatgcccggcgtctactatgtggac aggagactggaaagaatcaaggaggccgacaaagagacatacgtcgagcagcacgaggtggctgtggccagatactgcgacctcccta gcaaactggggcacaaacttaattccTAaactagggacaggatggtgacagaaaagccccatccttaggcctcctccttcctagtctcct gatattgggtctaacccccacctcctgttaggcagattccttatctggtgacacacccccatttcctggagccatctctctccttgccagaacct ctaaggtttgcttacgatggagccagagaggatcctgggagggagagcttggcagggggtgggagggaagggggggatgcgtgacctg cccggttctcagtggccaccctgcgctaccctctcccagaacctgagctgctctgacgcggctgtctggtgcgtttcactgatcctggtgctg cagcttccttacacttcccaagaggagaagcagtttggaaaaacaaaatcagaataagttggtcctgagttctaactttggctcttcacctttcta 
gtccccaatttatattgttcctccgtgcgtcagttttacctgtgagataaggccagtagccagccccgtcctggcagggctgtggtgaggagg ggggtgtccgtgtggaaaactccctttgtgagaatggtgcgtcctaggtgttcaccaggtcgtggccgcctctactccctttctctttctccatc cttctttccttaaagagtccccagtgctatctgggacatattcctccgcccagagcagggtcccgcttccctaaggccctgctctgggcttctg ggtttgagtccttggc
[00211] OCT4 HDR template sequence (SEQ ID NO:84):
GCGACTATGCACAACGAGAGGATTTTGAGGCTGCTGGGTCTCCTTTCTCAGGGGGAC CAGTGTCCTTTCCTCTGGCCCCAGGGCCCCATTTTGGTACCCCAGGCTATGGGAGCC CTCACTTCACTGCACTGTACTCCTCGGTCCCTTTCCCTGAGGGGGAAGCCTTTCCCCC TGTCTCCGTCACCACTCTGGGCTCTCCCATGCATTCAAAtGGAAGCGGAGCTACTAAC TTCAGCCTGCTGAAGCAGGCTGGAGACGTGGAGGAGAACCCTGGACCTgccaccatggtga gcgagctgattaaggagaacatgcacatgaagctgtacatggagggcaccgtgaacaaccaccacttcaagtgcacatccgagggcgaa ggcaagccctacgagggcacccagaccatgagaatcaaggcggtcgagggcggccctctccccttcgccttcgacatcctggctaccag cttcatgtacggcagcaaaaccttcatcaaccacacccagggcatccccgacttctttaagcagtccttccccgagggcttcacatgggaga gagtcaccacatacgaagatgggggcgtgctgaccgctacccaggacaccagcctccaggacggctgcctcatctacaacgtcaagatc agaggggtgaacttcccatccaacggccctgtgatgcagaagaaaacactcggctgggaggcctccaccgagacactgtaccccgctga cggcggcctggaaggcagagccgacatggccctgaagctcgtgggcgggggccacctgatctgcaaccttaagaccacatacagatcc aagaaacccgctaagaacctcaagatgcccggcgtctactatgtggacaggagactggaaagaatcaaggaggccgacaaagagacat acgtcgagcagcacgaggtggctgtggccagatactgcgacctccctagcaaactggggcacaaacttaattccTAaT G AC TAG GAATGGGGGACAGGGGGAGGGGAGGAGCTAGGGAAAGAAAACCTGGAGTTTGTGC CAGGGTTTTTGGGATTAAGTTCTTCATTCACTAAGGAAGGAATTGGGAACACAAAGG
GTGGGGGCAGGGGAGTTTGGGGCAACTGGTTGGAGGGAAGGTGAAGTTCAATGATG
CTCTTGATTTTAATCCCACATCATGTATCACTTTTTTCTTAAATAAAGAAGCCTGGGA
CACAGTAGATAGACACACTT
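Each HDR template places the edit between two stretches homologous to the target locus (homology arms). When the cargo sequence is known, the arm lengths can be recovered with an exact search. The following is a minimal sketch; `split_template` is hypothetical. It assumes the cargo occurs exactly once and compares sequences case-insensitively, because the source marks edits by underlining rather than by a separate cargo field.

```python
from typing import Tuple


def split_template(template: str, cargo: str) -> Tuple[str, str, str]:
    """Split an HDR template into (left arm, cargo, right arm).

    Matching ignores case and whitespace, so the returned pieces are
    upper-cased. The cargo must occur exactly once.
    """
    t = "".join(template.split()).upper()
    c = "".join(cargo.split()).upper()
    start = t.find(c)
    if start == -1:
        raise ValueError("cargo not found in template")
    if t.find(c, start + 1) != -1:
        raise ValueError("cargo occurs more than once")
    end = start + len(c)
    return t[:start], t[start:end], t[end:]
```

The lengths of the first and last returned strings give the left and right homology-arm lengths in nucleotides.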
[00212] Pantoea stewartii RecT DNA (SEQ ID NO:85):
AGCAACCAGCCCCCTATCGCCTCCGCCGATCTGCAGAAGGCCAACACCGGCAAGCA
GGTGGCCAATAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAA
TGAAGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACAGCCGATCGGATGATC
AGAATCGTGACCACAGAGATCCGCAAGACCCCCGCCCTGGCCACATGCGACCAGAG
CTCCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAGCGC
CCTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGAGCAAGTCCGGACAGT
CCAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCTG
GCCAGATCGTGTCTCTGAGCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCCTTTG
AGTACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGAGAATGAGGACGCACCC
ATCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGT
GATGACAGTGAAGCAGATCGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAACG
GACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG
TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGATCCTGGATGAGAA
GGCCGAGTCTGACGTGGATCAGGACAATGCCTCCGTGCTGTCTGCCGAGTATAGCGT
GCTGGACGGCTCCTCTGAGGAG
[00213] Pantoea stewartii RecE DNA (SEQ ID NO:86):
CAGCCCGGCGTGTACTATGACATCTCCAACGAGGAGTATCACGCCGGCCCTGGCATC
AGCAAGTCCCAGCTGGACGACATCGCCGTGTCCCCAGCCATCTTCCAGTGGAGAAA
GTCTGCCCCCGTGGACGATGAGAAAACCGCCGCCCTGGACCTGGGCACAGCCCTGC
ACTGCCTGCTGCTGGAGCCTGATGAGTTCTCCAAGAGGTTTATGATCGGCCCAGAGG
TGAACCGGAGAACCAATGCCGGCAAGCAGAAGGAGCAGGACTTCCTGGATATGTGC
GAGCAGCAGGGCATCACCCCTATCACACACGACGATAACCGGAAGCTGAGACTGAT
GAGGGACTCTGCCTTTGCCCACCCAGTGGCCAGATGGATGCTGGAGACAGAGGGCA
AGGCCGAGGCCTCTATCTACTGGAATGACAGGGATACACAGATCCTGAGCAGGTGC
CGCCCCGACAAGCTGATCACCGAGTTCTCTTGGTGCGTGGACGTGAAGAGCACAGC
CGACATCGGCAAGTTCCAGAAGGACTTCTACAGCTATCGCTACCACGTGCAGGACG CCTTCTATTCCGATGGCTACGAGGCCCAGTTTTGCGAGGTGCCAACCTTCGCCTTTCT
GGTGGTGAGCTCCTCTATCGATTGTGGCCGGTATCCCGTGCAGGTGTTTATCATGGA
CCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTATAAGCGGAACCTGACCACATAC
GCCGAGTGCCAGGCAAGGAATGAGTGGCCTGGCATCGCCACACTGAGCCTGCCTTA
CTGGGCCAAGGAGATCCGGAATGTG
[00214] Pantoea brenneri RecT DNA (SEQ ID NO:87):
AGCAACCAGCCCCCTATCGCCTCCGCCGATCTGCAGAAAACCCAGCAGTCCAAGCA
GGTGGCCAACAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAA
TGAAGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGATCGGATGATC
AGAATCGTGACCACAGAGATCCGCAAGACACCACAGCTGGCCCAGTGCGACCAGAG
CTCCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAGCGC
CCTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGAG
CAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCCG
GACAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCTTTTG
AGTACGGCCTGGATGAGAACCTGGTGCACCGGCCAGGCGAGAATGAGGACGCACCC
ATCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGT
GATGACAGTGAAGCAGGTGGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAATG
GCCCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG
TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGATGAGAA
GGCCGAGTCTGACGTGGATCAGGACAACGCCTCTGTGCTGAGCGCCGAGTATTCCGT
GCTGGAGTCTGGCGACGAGGCCACAAAT
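The RecT and RecE DNA entries in this listing appear to be coding sequences written without an initiator ATG or a stop codon. SEQ ID NO:87, for example, begins AGC AAC CAG (S N Q), which matches SEQ ID NO:10 from its second residue. It ends GCC ACA AAT (A T N), which matches the C-terminal ...ATN. The following is a minimal sanity-check sketch; the function name and the fields it reports are hypothetical choices.

```python
from typing import Dict, List, Union

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(dna: str) -> Dict[str, Union[bool, int, List[int]]]:
    """Report frame length, start codon, and in-frame stops (1-based codon index)."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return {
        "length_ok": len(seq) % 3 == 0,
        "n_codons": len(codons),
        "starts_with_atg": seq.startswith("ATG"),
        "stop_codons_at": [i + 1 for i, c in enumerate(codons) if c in STOP_CODONS],
    }
```

For the entries above, the expected result is `length_ok` true, `starts_with_atg` false, and an empty `stop_codons_at` list. Any internal stop would point to a transcription or extraction error.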
[00215] Pantoea brenneri RecE DNA (SEQ ID NO:88):
CAGCCTGGCATCTACTATGACATCAGCAACGAGGATTATCACAGGGGAGCAGGCAT
CAGCAAGTCCCAGCTGGACGACATCGCCATCTCCCCAGCCATCTACCAGTGGAGAA
AGCACGCCCCCGTGGACGAGGAGAAAACCGCCGCCCTGGATCTGGGCACAGCCCTG
CACTGCCTGCTGCTGGAGCCTGACGAGTTCTCTAAGAGGTTTCAGATCGGCCCAGAG
GTGAACCGGAGAACCACAGCCGGCAAGGAGAAGGAGAAGGAGTTCATCGAGCGGT
GCGAGGCAGAGGGAATCACCCCAATCACACACGACGATAATAGGAAGCTGAAGCT
GATGAGGGATTCCGCCCTGGCCCACCCAATCGCAAGGTGGATGCTGGAGGCACAGG
GAAACGCAGAGGCCTCTATCTATTGGAATGACAGAGATGCCGGCGTGCTGAGCAGG
TGCCGCCCCGACAAGATCATCACCGAGTTCAACTGGTGCGTGGACGTGAAGTCCAC AGCCGACATCATGAAGTTCCAGAAGGACTTCTACTCTTACAGATACCACGTGCAGGA
CGCCTTCTATTCCGATGGCTACGAGTCTCACTTTCACGAGACACCCACATTCGCCTTT
CTGGCCGTGTCTACCAGCATCGACTGCGGCAGGTATCCTGTGCAGGTGTTTATCATG
GACCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTACAAGAGAAACATCCACACCT
TCGCCGAGTGTCTGAGCAGGAATGAGTGGCCTGGCATCGCCACACTGTCCCTGCCTT
TTTGGGCCAAGGAGCTGCGCAATGAG
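GC content is a quick property to compare across the RecE and RecT DNA sequences in this listing. The following is a minimal sketch that is not drawn from the source. It ignores whitespace and reports the GC fraction over the whole sequence and over sliding windows; the window and step sizes are arbitrary defaults.

```python
from typing import List, Tuple


def gc_fraction(dna: str) -> float:
    """Fraction of G or C bases, ignoring whitespace and case."""
    seq = "".join(dna.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(dna: str, window: int = 50, step: int = 25) -> List[Tuple[int, float]]:
    """(1-based window start, GC fraction) for each full-length window."""
    seq = "".join(dna.split()).upper()
    return [
        (i + 1, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Windows that overrun the end of the sequence are skipped, so a sequence shorter than `window` yields an empty list.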
[00216] Pantoea dispersa RecT DNA (SEQ ID NO: 89):
TCCAACCAGCCACCTCTGGCCACCGCAGATCTGCAGAAAACCCAGCAGTCTAACCA
GGTGGCCAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAATGA
AGAGCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGATCGGATGATCAGA
ATCGTGACCACAGAGATCCGCAAGACACCCGCCCTGGCCCAGTGCGACCAGAGCTC
CTTCATCGGAGCAGTGGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCTCCGCCCT
GGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGAGCA
ATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGATCTGGCCCGGAGATCCGGA
CAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCTTTTGAG
TACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGACAATGAGTCCGCCCCCAT
CACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGTGA
TGACAGCCAAGCAGGTGGAGAAGGTGAAGGCCCAGTCCAAGGCCTCTAGCAACGG
ACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGT
TTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGACGAGAAG
GCCGAGAGCGACGTGGATCAGGACAATGCCTCTGTGCTGAGCGCCGAGTATTCCGT
GCTGGAGTCTGGCACAGGCGAG
[00217] Pantoea dispersa RecE DNA (SEQ ID NO: 90):
GAGCCAGGCATCTACTATGACATCAGCAACGAGGCCTACCACTCCGGCCCCGGCAT
CAGCAAGTCCCAGCTGGACGACATCGCCAGGAGCCCTGCCATCTTCCAGTGGCGCA
AGGACGCCCCAGTGGATACCGAGAAAACCAAGGCCCTGGACCTGGGCACCGATTTC
CACTGCGCCGTGCTGGAGCCAGAGAGGTTTGCAGACATGTATCGCGTGGGCCCTGA
AGTGAATCGGAGAACCACAGCCGGCAAGGCCGAGGAGAAGGAGTTCTTTGAGAAGT
GTGAGAAGGATGGAGCCGTGCCCATCACCCACGACGATGCACGGAAGGTGGAGCTG
ATGAGAGGCTCCGTGATGGCCCACCCTATCGCCAAGCAGATGATCGCAGCACAGGG
ACACGCAGAGGCCTCTATCTACTGGCACGACGAGAGCACAGGCAACCTGTGCCGGT GTAGACCCGACAAGTTTATCCCTGATTGGAATTGGATCGTGGACGTGAAAACCACA
GCCGATATGAAGAAGTTCAGGCGCGAGTTTTACGATCTGCGGTATCACGTGCAGGA
CGCCTTCTACACCGATGGCTATGCCGCCCAGTTTGGCGAGCGGCCTACCTTCGTGTT
TGTGGTGACATCCACCACAATCGACTGCGGCAGATACCCCACCGAGGTGTTCTTTCT
GGATGAGGAGACAAAGGCCGCCGGCAGGTCTGAGTACCAGAGCAACCTGGTGACCT
ATTCCGAGTGTCTGTCTCGCAATGAGTGGCCAGGCATCGCCACACTGTCTCTGCCCC
ACTGGGCCAAGGAGCTGAGGAACGTG
[00218] Type-F symbiont of Plautia stali RecT DNA (SEQ ID NO:91):
TCCAACCAGCCCCCTATCGCCTCTGCCGATCTGCAGAAAACCCAGCAGTCTAAGCAG
GTGGCCAACAAGACCCCTGAGCAGACACTGGTGGGCTTCATGAATCAGCCAGCAAT
GAAGTCCCAGCTGGCCGCCGCCCTGCCAAGGCACATGACAGCCGATCGGATGATCA
GAATCGTGACCACAGAGATCCGCAAGACCCCCGCCCTGGCCACATGCGACCAGAGC
TCCTTCATCGGAGCAGTGGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCTCCGCC
CTGGGCCACGCCTACCTGCTGCCATTTGGCAACGGCCGGTCCAAGTCTGGCCAGTCT
AATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGACCTGGCCCGGAGAAGCGG
ACAGATCGTGAGCCTGTCCGCCAGGGTGGTGCGCGCAGACGATGAGTTCTCCTTTGA
GTACGGCCTGGATGAGAACCTGATCCACCGGCCAGGCGATAATGAGGACGCCCCCA
TCACCCACGTGTATGCAGTGGCAAGACTGAAGGACGGAGGCACCCAGTTCGAAGTG
ATGACAGCCAAGCAGGTGGAGAAGGTGAAGGCCCAGAGCAAGGCCTCTAGCAACG
GACCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTG
TTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGGTGCTGGATGAGAA
GGCCGAGAGCGACGTGGATCAGGACAATGCCTCTGTGCTGAGCGCCGAGTATTCCG
TGCTGGAGGGCGACGGCGGCGAG
[00219] Type-F symbiont of Plautia stali RecE DNA (SEQ ID NO:92):
CAGCCTGGCATCTACTATGACATCAGCAACGAGGATTATCACGGCGGCCCTGGCATC
AGCAAGTCCCAGCTGGACGACATCGCCATCTCCCCAGCCATCTACCAGTGGAGGAA
GCACGCCCCCGTGGACGAGGAGAAAACCGCCGCCCTGGATCTGGGCACAGCCCTGC
ACTGCCTGCTGCTGGAGCCTGACGAGTTCTCTAAGAGATTTGAGATCGGCCCAGAGG
TGAACCGGAGAACCACAGCCGGCAAGGAGAAGGAGAAGGAGTTCATGGAGAGGTG
TGAGGCAGAGGGAGTGACCCCTATCACACACGACGATAATCGGAAGCTGAGACTGA
TGAGGGATAGCGCAATGGCCCACCCAATCGCCAGATGGATGCTGGAGGCACAGGGA AACGCAGAGGCCTCTATCTATTGGAATGACAGGGATACCGGCGTGCTGAGCAGGTG
CCGCCCCGACAAGATCATCACCGACTTCAACTGGTGCGTGGACGTGAAGTCCACAG
CCGACATCATCAAGTTCCAGAAGGACTTTTACTCTTATCGCTACCACGTGCAGGACG
CCTTCTATTCCGATGGCTACGAGTCTCACTTTGACGAGACACCAACATTCGCCTTTCT
GGCCGTGTCTACAAGCATCGATTGCGGCCGGTATCCCGTGCAGGTGTTCATCATGGA
CCAGCAGGCAAAGGATGCAGGAAGGGCCGAGTACAAGCGGAACATCCACACCTTTG
CCGAGTGTCTGAGCCGCAATGAGTGGCCTGGCATCGCCACACTGTCCCTGCCTTACT
GGGCCAAGGAGCTGCGGAATGAG
[00220] Providencia stuartii RecT DNA (SEQ ID NO:93):
AGCAACCCACCTCTGGCCCAGGCAGACCTGCAGAAAACCCAGGGCACAGAGGTGAA
GGAGAAAACCAAGGATCAGATGCTGGTGGAGCTGATCAATAAGCCTTCCATGAAGG
CACAGCTGGCCGCCGCCCTGCCAAGGCACATGACACCCGACCGGATGATCAGAATC
GTGACCACAGAGATCAGAAAGACCCCCGCCCTGGCCACATGCGATATGCAGAGCTT
CGTGGGAGCAGTGGTGCAGTGTTCCCAGCTGGGCCTGGAGCCTGGCAACGCCCTGG
GACACGCCTACCTGCTGCCTTTTGGCAACGGCAAGTCTAAGAGCGGCCAGTCTAATG
TGCAGCTGATCATCGGCTATCGGGGCATGATCGACCTGGCCCGGAGAAGCGGCCAG
ATCGTGTCCATCTCTGCCAGGACCGTGCGCCAGGGCGATAACTTCCACTTTGAGTAC
GGCCTGAACGAGAATCTGACCCACGTGCCTGGCGAGAATGAGGACTCTCCAATCAC
ACACGTGTACGCAGTGGCAAGGCTGAAGGATGGAGGCGTGCAGTTCGAAGTGATGA
CCTATAACCAGATCGAGAAGGTGCGCGCCAGCTCCAAGGCAGGACAGAATGGACCC
TGGGTGAGCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGTTCAA
GTACCTGCCCGTGTCTATCGAGATGCAGAAGGCCGTGATCCTGGACGAGAAGGCCG
AGGCCAACATCGATCAGGAGAATGCCACCATCTTTGAGGGCGAGTATGAGGAAGTG
GGCACAGACGGCAAG
[00221] Providencia stuartii RecE DNA (SEQ ID NO:94):
GAGGGCATCTACTATAACATCAGCAATGAGGACTACCACAACGGCCTGGGCATCTC
CAAGTCTCAGCTGGATCTGATCAATGAGATGCCTGCCGAGTATATCTGGTCCAAGGA
GGCCCCCGTGGACGAGGAGAAGATCAAGCCTCTGGAGATCGGCACCGCCCTGCACT
GCCTGCTGCTGGAGCCAGACGAGTACCACAAGAGATATAAGATCGGCCCCGATGTG
AACCGGAGAACAAATGCCGGCAAGGAGAAGGAGAAGGAGTTCTTTGATATGTGCGA
GAAGGAGGGCATCACCCCCATCACACACGACGATAACCGGAAGCTGATGATCATGA GAGACTCTGCCCTGGCCCACCCTATCGCCAAGTGGTGTCTGGAGGCCGATGGCGTGA
GCGAGAGCTCCATCTACTGGACCGACAAGGAGACAGATGTGCTGTGCAGGTGTCGC
CCAGACCGCATCATCACCGCCCACAACTACATCGTGGATGTGAAGTCTAGCGGCGA
CATCGAGAAGTTCGATTACGAGTACTACAACTACAGATACCACGTGCAGGACGCCTT
TTACTCCGATGGCTATAAGGAGGTGACCGGCATCACCCCTACATTCCTGTTTCTGGT
GGTGTCTACCAAGATCGACTGCGGCAAGTACCCCGTGCGGACCTACGTGATGAGCG
AGGAGGCAAAGTCCGCCGGAAGGACCGCCTACAAGCACAACCTGCTGACCTATGCC
GAGTGTCTGAAAACCGATGAGTGGGCCGGCATCAGGACACTGTCTCTGCCCAGATG
GGCAAAGGAGCTGCGGAATGAG
[00222] Providencia sp. MGF014 RecT DNA (SEQ ID NO:95):
TCTAACCCCCCTCTGGCCCAGAGCGACCTGCAGAAAACCCAGGGCACAGAGGTGAA
GGTGAAAACCAAGGATCAGCAGCTGATCCAGTTCATCAATCAGCCTTCTATGAAGG
CACAGCTGGCCGCCGCCCTGCCAAGGCACATGACACCCGACCGGATGATCAGAATC
GTGACCACAGAGATCAGAAAGACCCCCGCCCTGGCCACATGCGATATGCAGTCCTT
CGTGGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCTGGCAACGCCCTGG
GACACGCCTACCTGCTGCCTTTTGGCAACGGCAAGGCCAAGTCCGGCCAGTCTAATG
TGCAGCTGATCATCGGCTATCGGGGCATGATCGACCTGGCCCGGAGATCCAACCAG
ATCATCTCTATCAGCGCCAGGACCGTGCGCCAGGGCGATAACTTCCACTTTGAGTAC
GGCCTGAATGAGGACCTGACCCACACACCTAGCGAGAATGAGGATTCCCCAATCAC
CCACGTGTACGCAGTGGCAAGGCTGAAGGACGGAGGCGTGCAGTTTGAAGTGATGA
CATATAACCAGGTGGAGAAGGTGCGCGCCAGCTCCAAGGCAGGACAGAATGGACCC
TGGGTGAGCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGCGCCTGTTCAA
GTACCTGCCCGTGTCCATCGAGATGCAGAAGGCAGTGGTGCTGGACGAGAAGGCAG
AGGCCAACGTGGATCAGGAGAATGCCACCATCTTTGAGGGCGAGTATGAGGAAGTG
GGCACAGATGGCAAT
[00223] Providencia sp. MGF014 RecE DNA (SEQ ID NO:96):
AAGGAGGGCATCTACTATAACATCAGCAATGAGGACTACCACAACGGCCTGGGCAT
CTCCAAGTCTCAGCTGGATCTGATCAATGAGATGCCTGCCGAGTATATCTGGTCCAA
GGAGGCCCCCGTGGACGAGGAGAAGATCAAGCCTCTGGAGATCGGCACCGCCCTGC
ACTGCCTGCTGCTGGAGCCAGACGAGTACCACAAGAGATATAAGATCGGCCCCGAT
GTGAACCGGAGAACAAATGTGGGCAAGGAGAAGGAGAAGGAGTTCTTTGATATGTG CGAGAAGGAGGGCATCACCCCCATCACACACGACGATAACCGGAAGCTGATGATCA
TGAGAGACTCTGCCCTGGCCCACCCTATCGCCAAGTGGTGTCTGGAGGCCGATGGCG
TGAGCGAGAGCTCCATCTACTGGACCGACAAGGAGACAGATGTGCTGTGCAGGTGT
CGCCCAGACCGCATCATCACCGCCCACAACTACATCATCGATGTGAAGTCTAGCGGC
GACATCGAGAAGTTCGATTACGAGTACTACAACTACAGATACCACGTGCAGGACGC
CTTTTACTCCGATGGCTATAAGGAGGTGACCGGCATCACCCCTACATTCCTGTTTCTG
GTGGTGTCTACCAAGATCGACTGCGGCAAGTACCCCGTGCGGACCTACGTGATGAG
CGAGGAGGCAAAGTCCGCCGGAAGGACCGCCTACAAGCACAACCTGCTGACCTATG
CCGAGTGTCTGAAAACCGATGAGTGGGCCGGCATCAGGACACTGTCTCTGCCCAGA
TGGGCAAAGGAGCTGCGGAATGAG
[00224] Shewanella putrefaciens RecT DNA (SEQ ID NO:97):
CAGACCGCACAGGTGAAGCTGAGCGTGCCCCACCAGCAGGTGTACCAGGACAACTT
CAATTATCTGAGCTCCCAGGTGGTGGGCCACCTGGTGGATCTGAACGAGGAGATCG
GCTACCTGAACCAGATCGTGTTTAATTCTCTGAGCACCGCCTCTCCCCTGGACGTGG
CAGCACCTTGGAGCGTGTACGGCCTGCTGCTGAACGTGTGCCGGCTGGGCCTGTCCC
TGAATCCAGAGAAGAAGCTGGCCTATGTGATGCCCTCCTGGTCTGAGACAGGCGAG
ATCATCATGAAGCTGTACCCCGGCTATAGGGGCGAGATCGCCATCGCCTCTAACTTC
AATGTGATCAAGAACGCCAATGCCGTGCTGGTGTATGAGAACGATCACTTCCGCATC
CAGGCAGCAACCGGCGAGATCGAGCACTTTGTGACAAGCCTGTCCATCGACCCTAG
GGTGCGCGGAGCATGCAGCGGAGGCTACTGTCGGTCCGTGCTGATGGATAATACAA
TCCAGATCTCTTATCTGAGCATCGAGGAGATGAACGCCATCGCCCAGAATCAGATCG
AGGCCAACATGGGCAATACCCCTTGGAACTCCATCTGGCGGACAGAGATGAATAGA
GTGGCCCTGTACCGGAGAGCAGCAAAGGACTGGAGGCAGCTGATCAAGGCCACCCC
AGAGATCCAGTCCGCCCTGTCTGATACAGAGTAT
[00225] Shewanella putrefaciens RecE DNA (SEQ ID NO:98):
GGCACCGCCCTGGCCCAGACAATCAGCCTGGACTGGCAGGATACCATCCAGCCAGC
ATACACAGCCTCCGGCAAGCCTAACTTCCTGAATGCCCAGGGCGAGATCGTGGAGG
GCATCTACACCGATCTGCCTAATTCCGTGTATCACGCCCTGGACGCACACAGCTCCA
CCGGCATCAAGACATTCGCCAAGGGCCGCCACCACTACTTTCGGCAGTATCTGTCTG
ACGTGTGCCGGCAGAGAACAAAGCAGCAGGAGTACACCTTCGACGCCGGCACCTAC
GGCCACATGCTGGTGCTGGAGCCAGAGAACTTCCACGGCAACTTCATGAGGAACCC CGTGCCTGACGATTTTCCAGACATCGAGCTGATCGAGAGCATCCCACAGCTGAAGG
CCGCCCTGGCCAAGAGCAACCTGCCCGTGTCCGGAGCAAAGGCCGCCCTGATCGAG
AGACTGTACGCCTTCGACCCATCCCTGCCCCTGTTTGAGAAGATGAGGGAGAAGGC
CATCACCGACTATCTGGATCTGCGCTACGCCAAGTATCTGCGGACCGACGTGGAGCT
GGATGAGATGGCCACATTCTACGGCATCGATACCTCTCAGACACGGGAGAAGAAGA
TCGAGGAGATCCTGGCCATCTCTCCTAGCCAGCCAATCTGGGAGAAGCTGATCAGCC
AGCACGTGATCGACCACATCGTGTGGGACGATGCCATGAGGGTGGAGAGATCCACC
AGGGCCCACCCTAAGGCAGACTGGCTGATCTCTGATGGCTATGCCGAGCTGACAAT
CATCGCAAGGTGCCCAACCACCGGCCTGCTGCTGAAGGTGCGGTTTGACTGGCTGA
GGAATGATGCCATCGGCGTGGACTTCAAGACCACACTGTCTACCAACCCCACAAAG
TTTGGCTACCAGATCAAGGACCTGCGGTATGATCTGCAGCAGGTGTTCTACTGTTAT
GTGGCCAATCTGGCCGGCATCCCTGTGAAGCACTTCTGCTTTGTGGCCACCGAGTAC
AAGGACGCCGATAACTGTGAGACATTTGAGCTGTCTCACAAGAAAGTGATCGAGAG
CACCGAGGAGATGTTCGACCTGCTGGATGAGTTTAAGGAGGCCCTGACCTCCGGCA
ATTGGTATGGCCACGACAGGTCCCGCTCTACATGGGTCATCGAGGTG
[00226] Bacillus sp. MUM 116 RecT DNA (SEQ ID NO:99):
AGCAAGCAGCTGACCACAGTGAATACCCAGGCCGTGGTGGGCACATTCTCCCAGGC
CGAGCTGGATACCCTGAAGCAGACAATCGCCAAGGGCACCACAAACGAGCAGTTCG
CCCTGTTTGTGCAGACCTGCGCCAACTCTAGGCTGAATCCATTTCTGAACCACATCC
ACTGTATCGTGTATAACGGCAAGGAGGGCGCCACCATGAGCCTGCAGATCGCAGTG
GAGGGCATCCTGTACCTGGCACGCAAGACAGACGGCTATAAGGGCATCGAGTGCCA
GCTGATCCACGAGAATGACGAGTTCAAGTTTGATGCCAAGTCCAAGGAGGTGGATC
ACCAGATCGGATTCCCCAGGGGCAACGTGATCGGAGGATATGCAATCGCAAAGAGG
GAGGGCTTTGACGATGTGGTGGTGCTGATGGAGTCTAACGAGGTGGACCACATGCT
GAAGGGCCGGAATGGCCACATGTGGAGAGACTGGTTCAACGATATGTTTAAGAAGC
ACATCATGAAGCGGGCCGCCAAGCTGCAGTACGGCATCGAGATCGCAGAGGACGAG
ACAGTGAGCAGCGGACCTAGCGTGGATAATATCCCAGAGTATAAGCCACAGCCCCG
GAAGGACATCACACCCAACCAGGACGTGATCGATGCCCCCCCTCAGCAGCCTAAGC
AGGACGATGAGGCCGCCAAGCTGAAGGCCGCCAGATCTGAGGTGAGCAAGAAGTTC
AAGAAGCTGGGCATCGTGAAGGAGGATCAGACCGAGTACGTGGAGAAGCACGTGC CTGGCTTCAAGGGCACACTGTCCGACTTTATCGGCCTGTCTCAGCTGCTGGATCTGA
ATATCGAGGCCCAGGAGGCCCAGTCCGCCGACGGCGATCTGCTGGAC
[00227] Bacillus sp. MUM 116 RecE DNA (SEQ ID NO:100):
ACCTACGCCGCCGACGAGACACTGGTGCAGCTGCTGCTGTCCGTGGATGGCAAGCA
GCTGCTGCTGGGAAGGGGCCTGAAGAAGGGCAAGGCCCAGTACTATATCAATGAGG
TGCCATCTAAGGCCAAGGAGTTCGAGGAGATCCGGGACCAGCTGTTTGACAAGGAT
CTGTTCATGTCCCTGTTTAACCCCTCTTACTTCTTTACCCTGCACTGGGAGAAGCAGA
GGGCCATGATGCTGAAGTATGTGACAGCCCCCGTGTCTAAGGAGGTGCTGAAGAAT
CTGCCTGAGGCCCAGTCCGAGGTGCTGGAGAGATACCTGAAGAAGCACTCTCTGGT
GGATCTGGAGAAGATCCACAAGGACAACAAGAATAAGCAGGATAAGGCCTATATCT
CTGCCCAGAGCAGGACCAACACACTGAAGGAGCAGCTGATGCAGCTGACCGAGGA
GAAGCTGGACATCGATTCCATCAAGGCCGAGCTGGCCCACATCGACATGCAGGTCA
TCGAGCTGGAGAAGCAGATGGATACAGCCTTCGAGAAGAACCAGGCCTTTAATCTG
CAGGCCCAGATCAGGAATCTGCAGGACAAGATCGAGATGAGCAAGGAGCGGTGGC
CCTCCCTGAAGAACGAAGTGATCGAGGATACCTGCCGGACATGCAAGCGGCCCCTG
GACGAGGATAGCGTGGAGGCCGTGAAGGCCGACAAGGATAATCGGATCGCCGAGT
ACAAGGCCAAGCACAACTCCCTGGTGTCTCAGAGAAATGAGCTGAAGGAGCAGCTG
AACACCATCGAGTATATCGACGTGACAGAGCTGAGAGAGCAGATCAAGGAGCTGGA
TGAGTCCGGACAGCCTCTGAGGGAGCAGGTGCGCATCTACAGCCAGTATCAGAATC
TGGACACCCAGGTGAAGTCCGCCGAGGCAGACGAGAACGGCATCCTGCAGGATCTG
AAGGCCTCTATCTTCATCCTGGATAGCATCAAGGCCTTTAGGGGCAAGGAGGCCGA
GATGCAGGCCGAGAAGGTGCAGGCCCTGTTCACCACACTGAGCGTGCGCCTGTTTA
AGCAGAATAAGGGCGACGGCGAGATCAAGCCAGATTTCGAGATCGAGATGAACGA
CAAGCCCTATCGGACCCTGAGCCTGTCCGAGGGCATCCGGGCAGGCCTGGAGCTGC
GGGACGTGCTGAGCCAGCAGTCCGAGCTGGTGACCCCTACATTCGTGGATAATGCC
GAGTCTATCACCAGCTTCAAGCAGCCAAACGGCCAGCTGATCATCAGCCGGGTGGT
GGCAGGACAGGAGCTGAAGATCGAGGCCGTGAGCGAG
[00228] Shigella sonnei RecT DNA (SEQ ID NO: 101):
ACCAAGCAGCCCCCTATCGCCAAGGCCGACCTGCAGAAAACCCAGGAGAACAGGGC
ACCAGCAGCCATCAAGAACAATGATGTGATCTCCTTTATCAATCAGCCCTCTATGAA
GGAGCAGCTGGCCGCCGCCCTGCCTAGGCACATGACCGCCGAGAGGATGATCCGCA TCGCCACCACAGAGATCCGCAAGGTGCCTGCCCTGGGCAACTGCGACACAATGAGC
TTCGTGAGCGCCATCGTGCAGTGTAGCCAGCTGGGCCTGGAGCCAGGCTCCGCCCTG
GGCCACGCCTACCTGCTGCCCTTCGGCAACAAGAATGAGAAGTCCGGCAAGAAGAA
TGTGCAGCTGATCATCGGCTATAGGGGCATGATCGATCTGGCCCGGAGATCTGGCCA
GATCGCCTCTCTGAGCGCCAGAGTGGTGCGGGAGGGCGACGAGTTCAACTTTGAGTT
CGGCCTGGATGAGAAGCTGATCCACCGGCCTGGCGAGAATGAGGACGCCCCAGTGA
CCCACGTGTACGCAGTGGCCAGACTGAAGGATGGCGGCACCCAGTTTGAAGTGATG
ACAAGGCGCCAGATCGAGCTGGTGAGGTCCCAGTCTAAGGCCGGCAACAATGGCCC
TTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGCCATCCGGAGACTGTTCA
AGTACCTGCCAGTGTCTATCGAGATCCAGCGCGCCGTGAGCATGGACGAGAAGGAG
CCACTGACCATCGACCCCGCCGATAGCTCCGTGCTGACAGGCGAGTATTCTGTGATC
GATAACAGCGAGGAG
[00229] Shigella sonnei RecE DNA (SEQ ID NO: 102):
GATCGCGGCCTGCTGACAAAGGAGTGGAGGAAGGGAAACCGGGTGAGCCGGATCA
CCAGGACAGCCAGCGGAGCAAACGCAGGAGGAGGAAATCTGACCGACAGAGGCGA
GGGCTTCGTGCACGATCTGACAAGCCTGGCCCGCGACATCGCAACCGGCGTGCTGG
CCCGGAGCATGGACGTGGACATCTACAACCTGCACCCTGCCCACGCCAAGAGGATC
GAGGAGATCATCGCCGAGAATAAGCCCCCTTTCAGCGTGTTTAGAGACAAGTTTATC
ACAATGCCAGGCGGCCTGGACTACTCCAGGGCCATCGTGGTGGCCTCTGTGAAGGA
GGCCCCAATCGGCATCGAAGTGATCCCCGCCCACGTGACCGCCTATCTGAACAAGG
TGCTGACCGAGACAGACCACGCCAATCCAGATCCCGAGATCGTGGACATCGCATGC
GGCAGAAGCTCCGCCCCTATGCCACAGAGGGTGACCGAGGAGGGCAAGCAGGACG
ATGAGGAGAAGCTGCAGCCTTCTGGCACCACAGCAGATGAGCAGGGAGAGGCAGA
GACAATGGAGCCAGACGCCACAAAGCACCACCAGGATACCCAGCCTCTGGACGCCC
AGAGCCAGGTGAACAGCGTGGATGCCAAGTATCAGGAGCTGAGAGCCGAGCTGCAC
GAGGCCAGGAAGAACATCCCTTCCAAGAATCCAGTGGACGCAGATAAGCTGCTGGC
CGCCTCTCGCGGCGAGTTCGTGGACGGCATCAGCGACCCAAACGATCCCAAGTGGG
TGAAGGGCATCCAGACACGGGATTCCGTGTACCAGAATCAGCCTGAGACAGAGAAA
ACCAGCCCCGACATGAAGCAGCCAGAGCCTGTGGTGCAGCAGGAGCCTGAGATCGC
CTTCAACGCCTGCGGACAGACCGGCGGCGACAATTGCCCAGATTGTGGCGCCGTGA
TGGGCGATGCCACCTATCAGGAGACATTTGACGAGGAGAACCAGGTGGAGGCCAAG GAGAATGATCCTGAGGAGATGGAGGGCGCCGAGCACCCACACAACGAGAATGCCG
GCAGCGACCCCCACAGAGACTGTTCCGATGAGACAGGCGAGGTGGCCGATCCCGTG
ATCGTGGAGGACATCGAGCCTGGCATCTACTATGGCATCAGCAACGAGAATTACCA
CGCAGGCCCCGGCGTGTCCAAGTCTCAGCTGGACGACATCGCCGACACACCTGCCCT
GTATCTGTGGAGGAAGAACGCCCCAGTGGATACCACAAAGACCAAGACACTGGACC
TGGGCACCGCATTCCACTGCCGCGTGCTGGAGCCAGAGGAGTTCAGCAATCGGTTTA
TCGTGGCCCCCGAGTTCAACCGGAGAACAAATGCCGGCAAGGAGGAGGAGAAGGC
CTTTCTGATGGAGTGTGCCTCCACAGGCAAGATGGTCATCACCGCCGAGGAGGGCA
GAAAGATCGAGCTGATGTACCAGTCTGTGATGGCACTGCCACTGGGACAGTGGCTG
GTGGAGAGCGCCGGACACGCAGAGTCTAGCATCTATTGGGAGGACCCCGAGACAGG
CATCCTGTGCAGGTGTCGCCCCGACAAGATCATCCCTGAGTTCCACTGGATCATGGA
CGTGAAAACCACAGCCGACATCCAGCGGTTCAAGACAGCCTACTATGATTACAGGT
ATCACGTGCAGGATGCCTTCTACTCCGACGGCTATGAGGCCCAGTTTGGCGTGCAGC
CCACCTTCGTGTTTCTGGTGGCCTCTACCACAATCGAGTGCGGCAGATACCCCGTGG
AGATCTTTATGATGGGAGAGGAGGCAAAGCTGGCCGGACAGCTGGAGTATCACCGC
AACCTGCGGACACTGGCCGATTGTCTGAATACCGACGAGTGGCCAGCCATCAAGAC
CCTGTCCCTGCCCAGATGGGCAAAGGAGTACGCCAACGAC
[00230] Salmonella enterica RecT DNA (SEQ ID NO: 103):
ACCAAGCAGCCCCCTATCGCCAAGGCCGACCTGCAGAAAACCCAGGGAAACAGGGC
ACCTGCAGCAGTGAATGACAAGGATGTGCTGTGCGTGATCAACAGCCCTGCCATGA
AGGCACAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGAGAGGATGATCCGC
ATCGCCACCACAGAGATCAGGAAGGTGCCAGAGCTGCGCAACTGCGACAGCACCAG
CTTCATCGGCGCCATCGTGCAGTGTTCTCAGCTGGGCCTGGAGCCCGGCAGCGCCCT
GGGCCACGCCTACCTGCTGCCTTTTGGCAATGGCAAGGCCAAGAACGGCAAGAAGA
ATGTGCAGCTGATCATCGGCTATCGGGGCATGATCGATCTGGCCCGGAGATCTGGCC
AGATCATCTCCCTGAGCGCCAGAGTGGTGCGGGAGTGTGACGAGTTCTCCTACGAGC
TGGGCCTGGATGAGAAGCTGGTGCACCGGCCAGGCGAGAACGAGGACGCACCCATC
ACCCACGTGTATGCCGTGGCCAAGCTGAAGGATGGCGGCGTGCAGTTTGAAGTGAT
GACCAAGAAGCAGGTGGAGAAGGTGAGAGATACACACTCCAAGGCCGCCAAGAAT
GCCGCCTCTAAGGGCGCCAGCTCCATCTGGGACGAGCACTTCGAGGATATGGCCAA
GAAAACCGTGATCCGGAAGCTGTTTAAGTACCTGCCCGTGAGCATCGAGATCCAGA GAGCCGTGAGCATGGACGGCAAGGAGGTGGAGACAATCAACCCAGACGACATCAG
CGTGATCGCCGGCGAGTATTCCGTGATCGATAATCCCGAGGAG
[00231] Salmonella enterica RecE DNA (SEQ ID NO: 104):
GATCGCGGCCTGCTGACAAAGGAGTGGAGGAAGGGAAACCGGGTGAGCCGGATCA
CCAGGACAGCCAGCGGAGCAAACGCAGGAGGAGGAAATCTGACCGACAGAGGCGA
GGGCTTCGTGCACGATCTGACAAGCCTGGCCCGCGACGTGGCAACCGGCGTGCTGG
CCCGGAGCATGGACGTGGACATCTACAACCTGCACCCTGCCCACGCCAAGAGGGTG
GAGGAGATCATCGCCGAGAATAAGCCCCCTTTCAGCGTGTTTAGAGACAAGTTTATC
ACAATGCCTGGCGGCCTGGACTACTCCAGGGCCATCGTGGTGGCCTCTGTGAAGGA
GGCCCCTATCGGCATCGAAGTGATCCCAGCCCACGTGACCGAGTATCTGAACAAGG
TGCTGACCGAGACAGACCACGCCAATCCAGATCCCGAGATCGTGGACATCGCATGC
GGCAGAAGCTCCGCCCCTATGCCACAGAGGGTGACCGAGGAGGGCAAGCAGGACG
ATGAGGAGAAGCCCCAGCCTTCTGGAGCTATGGCCGACGAGCAGGCAACCGCAGAG
ACAGTGGAGCCAAACGCCACAGAGCACCACCAGAATACCCAGCCCCTGGATGCCCA
GAGCCAGGTGAACTCCGTGGACGCCAAGTATCAGGAGCTGAGAGCCGAGCTGCAGG
AGGCCAGGAAGAACATCCCCTCCAAGAATCCTGTGGACGCAGATAAGCTGCTGGCC
GCCTCTCGCGGCGAGTTCGTGGATGGCATCAGCGACCCTAACGATCCAAAGTGGGT
GAAGGGCATCCAGACACGGGATTCCGTGTACCAGAATCAGCCCGAGACAGAGAAG
ATCTCTCCTGACGCCAAGCAGCCAGAGCCCGTGGTGCAGCAGGAGCCCGAGACAGT
GTGCAACGCCTGTGGACAGACCGGCGGCGACAATTGCCCTGATTGTGGCGCCGTGA
TGGGCGACGCCACATATCAGGAGACATTCGGCGAGGAGAATCAGGTGGAGGCCAAG
GAGAAGGACCCCGAGGAGATGGAGGGAGCAGAGCACCCTCACAACGAGAATGCCG
GCAGCGACCCACACAGAGACTGTTCCGATGAGACAGGCGAGGTGGCCGATCCAGTG
ATCGTGGAGGACATCGAGCCTGGCATCTACTATGGCATCAGCAACGAGAATTACCA
CGCAGGCCCCGGCGTGTCCAAGTCTCAGCTGGACGACATCGCCGACACACCCGCCC
TGTATCTGTGGAGGAAGAACGCCCCTGTGGATACCACAAAGACCAAGACACTGGAC
CTGGGCACCGCATTCCACTGCCGCGTGCTGGAGCCTGAGGAGTTCAGCAATCGGTTT
ATCGTGGCCCCAGAGTTCAACCGGAGAACAAATGCCGGCAAGGAGGAGGAGAAGG
CCTTTCTGATGGAGTGTGCCTCCACCGGCAAGACAGTGATCACCGCCGAGGAGGGC
AGAAAGATCGAGCTGATGTACCAGTCTGTGATGGCACTGCCTCTGGGACAGTGGCT
GGTGGAGAGCGCCGGACACGCAGAGTCTAGCATCTATTGGGAGGACCCCGAGACAG GCATCCTGTGCAGGTGTCGCCCAGACAAGATCATCCCCGAGTTCCACTGGATCATGG
ACGTGAAAACCACAGCCGACATCCAGCGGTTCAAGACAGCCTACTATGATTACAGG
TATCACGTGCAGGATGCCTTCTACTCCGACGGCTATGAGGCCCAGTTTGGCGTGCAG
CCAACCTTCGTGTTTCTGGTGGCCTCTACCACAGTGGAGTGCGGCAGATACCCCGTG
GAGATCTTTATGATGGGAGAGGAGGCAAAGCTGGCCGGACAGCAGGAGTATCACCG
CAACCTGCGGACACTGGCCGATTGTCTGAATACCGACGAGTGGCCTGCCATCAAGA
CCCTGTCCCTGCCACGGTGGGCCAAGGAGTACGCCAACGAC
[00232] Acetobacter RecT DNA (SEQ ID NO: 105):
AACGCCCCCCAGAAGCAGAATACCAGAGCCGCCGTGAAGAAGATCAGCCCTCAGGA
GTTCGCCGAGCAGTTTGCCGCCATCATCCCACAGGTGAAGTCCGTGCTGCCCGCCCA
CGTGACCTTCGAGAAGTTTGAGCGGGTGGTGAGACTGGCCGTGCGGAAGAACCCTG
ACCTGCTGACATGCTCCCCAGCCTCTCTGTTCATGGCATGTATCCAGGCAGCCTCCG
ACGGCCTGCTGCCTGATGGAAGGGAGGGAGCAATCGTGAGCCGGTGGAGCTCCAAG
AAGAGCTGCAACGAGGCCTCCTGGATGCCAATGGTGGCCGGCCTGATGAAGCTGGC
CCGGAACAGCGGCGACATCGCCAGCATCTCTAGCCAGGTGGTGTTCGAGGGCGAGC
ACTTTAGAGTGGTGCTGGGCGACGAGGAGAGGATCGAGCACGAGCGCGATCTGGGC
AAGACCGGCGGCAAGATCGTGGCAGCCTACGCCGTGGCAAGGCTGAAGGACGGCA
GCGATCCAATCCGCGAGATCATGTCCTGGGGCCAGATCGAGAAGATCAGAAACACA
AATAAGAAGTGGGAGTGGGGACCCTGGAAGGCCTGGGAGGACGAGATGGCCAGAA
AGACCGTGATCCGGAGACTGGCCAAGAGACTGCCCATGTCTACAGATAAGGAGGGA
GAGAGGCTGCGCAGCGCCATCGAGAGGATCGACTCCCTGGTGGACATCTCTGCCAA
CGTGGACGCACCTCAGATCGCAGCAGACGATGAGTTTGCCGCCGCCGCCCACGGCG
TGGAGCCACAGCAGATCGCAGCACCTGACCTGATCGGCCGCCTGGCCCAGATGCAG
TCCCTGGAGCAGGTGCAGGACATCGAGCCCCAGGTGTCTCACGCCATCCAGGAGGC
CGACAAGAGGGGCGACAGCGATACAGCCAATGCCCTGGATGCCGCCCTGCAGAGCG
CCCTGTCCCGCACCTCTACAGCCAAGGAGGAGGTGCCTGCC
[00233] Acetobacter RecE DNA (SEQ ID NO: 106):
GTGATCTCTAAGAGCGGCATCTACGACCTGACCAACGAGCAGTATCACGCCGATCCT
TGCCCAGAGATGTCCCTGAGCTCCTCTGGAGCCAGGGACCTGCTGAGCTCCTGTCCT
GCCAAGTTCATCGCCGCCAAGCAGCTGCCACAGCAGAATAAGAGGTGCTTTGACAT
CGGCTCTGCCGGACACCTGATGGTGCTGGAGCCACACCTGTTCGACCAGAAGGTGT GCGAGATCAAGCACCCTGATTGGCGCACAAAGGCAGCAAAGGAGGAGCGGGACGC
CGCCTACGCCGAGGGAAGAATCCCCCTGCTGAGCCGCGAGGTGGAGGACATCAGGG
CAATGCACTCCGTGGTGTGGAGAGATTCTCTGGGAGCCAGGGCCTTCAGCGGAGGC
AAGGCAGAGCAGTCCCTGGTGTGGCGCGACGAGGAGTTTGGCATCTGGTGCCGGCT
GCGGCCCGATTACGTGCCTAACAATGCCGTGCGGATCTTCGACTATAAGACCGCCAC
AAACGGCTCCCCCGATGCCTTTATGAAGGAGATCTACAATCGGGGCTATCACCAGC
AGGCCGCCTGGTATCTGGACGGATATGAGGCAGTGACCGGCCACAGGCCACGCGAG
TTCTGGTTTGTGGTGCAGGAGAAAACCGCCCCCTTCCTGCTGTCTTTCTTTCAGATGG
ATGAGATGAGCCTGGAGATCGGCCGGACCCTGAACAGACAGGCCAAGGGCATCTTT
GCCTGGTGCCTGCGCAACAATTGTTGGCCAGGCTATCAGCCCGAGGTGGATGGCAA
GGTGAGATTCTTTACCACATCTCCCCCTGCCTGGCTGGTGAGGGAGTACGAGTTTAA
GAATGAGCACGGCGCCTATGAGCCACCCGAGATCAAGCGGAAGGAGGTGGCC
[00234] Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecT DNA (SEQ ID NO: 107):
CCAAAGCAGCCCCCTATCGCCAAGGCAGACCTGCAGAAAACCCAGGGAGCACGGAC
CCCAACAGCAGTGAAGAACAATAACGATGTGATCTCCTTTATCAATCAGCCTTCTAT
GAAGGAGCAGCTGGCCGCCGCCCTGCCAAGGCACATGACCGCCGAGCGGATGATCA
GAATCGCCACCACAGAGATCAGGAAGGTGCCCGCCCTGGGCGACTGCGATACAATG
TCTTTTGTGAGCGCCATCGTGCAGTGTAGCCAGCTGGGCCTGGAGCCTGGCGGCGCC
CTGGGCCACGCCTACCTGCTGCCTTTCGGCAATCGGAACGAGAAGTCCGGCAAGAA
GAATGTGCAGCTGATCATCGGCTATAGAGGCATGATCGACCTGGCCCGGAGATCCG
GACAGATCGCCAGCCTGTCCGCCAGGGTGGTGCGCGAGGGCGACGATTTCTCTTTTG
AGTTCGGCCTGGAGGAGAAGCTGGTGCACAGGCCAGGCGAGAACGAGGACGCCCC
CGTGACCCACGTGTACGCAGTGGCACGCCTGAAGGATGGAGGCACCCAGTTTGAAG
TGATGACACGGAAGCAGATCGAGCTGGTGAGAGCCCAGTCTAAGGCCGGCAATAAC
GGCCCTTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGCCATCAGGCGCCT
GTTCAAGTACCTGCCCGTGAGCATCGAGATCCAGAGGGCCGTGAGCATGGATGAGA
AGGAGACACTGACAATCGACCCAGCCGATGCCAGCGTGATCACCGGCGAGTATTCC
GTGGTGGAGAATGCCGGCGTGGAGGAGAACGTGACAGCC
[00235] Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE DNA (SEQ ID NO: 108):
TACTATGACATCCCAAACGAGGCCTACCACGCAGGCCCCGGCGTGTCTAAGAGCCA
GCTGGACGACATCGCCGATACCCCCGCCATCTATCTGTGGCGGAAGAATGCCCCTGT
GGACACCGAGAAAACCAAGTCCCTGGATACCGGCACAGCCTTCCACTGCAGGGTGC
TGGAGCCAGAGGAGTTCAGCAAGCGGTTCATCATCGCCCCCGAGTTCAACCGGAGA
ACCTCCGCCGGCAAGGAGGAGGAGAAAACCTTCCTGGAGGAGTGTACCCGGACAGG
CAGAACCGTGCTGACAGCCGAGGAGGGCAGGAAGATCGAGCTGATGTACCAGTCCG
TGATGGCACTGCCACTGGGACAGTGGCTGGTGGAGTCTGCCGGCTACGCCGAGAGC
TCCGTGTATTGGGAGGACCCTGAGACAGGCATCCTGTGCCGGTGTAGACCCGATAA
GATCATCCCTGAGTTCCACTGGATCATGGACGTGAAAACCACAGCCGACATCCAGA
GGTTTCGCACCGCCTACTATGACTACAGATACCACGTGCAGGACGCCTTCTACTCTG
ATGGCTATAGAGCCCAGTTTGGCGAGATCCCTACATTCGTGTTTCTGGTGGCCAGCA
CCACAGCAGAGTGCGGCAGATACCCCGTGGAGATCTTTATGATGGGAGAGGACGCA
AAGCTGGCCGGACAGCGCGAGTATAGGCGCAATCTGCAGACCCTGGCCGAGTGTCT
GAACAATGATGAGTGGCCTGCCATCAAGACACTGTCTCTGCCACGGTGGGCCAAGG
AGAACGCCAATGCC
[00236] Pseudobacteriovorax antillogorgiicola RecT DNA (SEQ ID NO: 109):
GGCCACCTGGTGAGCAAGACCGAGCAGGATTACATCAAGCAGCACTATGCCAAGGG
CGCCACAGACCAGGAGTTCGAGCACTTTATCGGCGTGTGCAGGGCCAGAGGCCTGA
ACCCAGCCGCCAATCAGATCTACTTCGTGAAGTATCGGTCCAAGGATGGACCAGCA
AAGCCAGCCTTTATCCTGTCTATCGACAGCCTGAGGCTGATCGCACACCGCACCGGC
GATTACGCAGGATGCTCTGAGCCCATCTTCACAGACGGCGGCAAGGCCTGTACCGT
GACAGTGCGGAGAAACCTGAAGAGCGGCGAGACAGGCAATTTCTCCGGCATGGCCT
TTTATGACGAGCAGGTGCAGCAGAAGAACGGCCGGCCTACCTCCTTTTGGCAGTCTA
AGCCAAGAACAATGCTGGAGAAGTGTGCAGAGGCAAAGGCCCTGAGGAAGGCCTTC
CCTCAGGATCTGGGCCAGTTTTACATCAGAGAGGAGATGCCCCCTCAGTATGACGAG
CCTATCCAGGTGCACAAGCCAAAGGCCCTGGAGGAGCCCAGGTTCAGCAAGTCCGA
TCTGTCCAGGCGCAAGGGCCTGAACAGGAAGCTGTCTGCCCTGGGAGTGGACCCCA
GCCGCTTCGATGAGGTGGCCACCTTTCTGGACGGCACACCTGATCGCGAGCTGGGCC
AGAAGCTGAAGCTGTGGCTGAAGGAGGCCGGCTACGGCGTGAATCAG
[00237] Pseudobacteriovorax antillogorgiicola RecE DNA (SEQ ID NO: 110):
AGCAAGCTGTCCAACCTGAAGGTGTCTAATAGCGACGTGGATACACTGAGCCGGAT CAGAATGAAGGAGGGCGTGTATCGGGACCTGCCAATCGAGAGCTACCACCAGTCCC
CCGGCTATTCTAAGACCAGCCTGTGCCAGATCGATAAGGCCCCTATCTACCTGAAAA
CCAAGGTGCCACAGAAGTCCACAAAGTCTCTGAACATCGGCACCGCCTTCCACGAG
GCTATGGAGGGCGTGTTTAAGGACAAGTATGTGGTGCACCCCGATCCTGGCGTGAAT
AAGACCACAAAGTCTTGGAAGGACTTCGTGAAGAGGTATCCTAAGCACATGCCACT
GAAGCGCAGCGAGTACGACCAGGTGCTGGCCATGTACGATGCCGCCCGGTCTTATA
GACCTTTTCAGAAGTACCACCTGAGCCGGGGCTTCTACGAGAGCTCCTTTTATTGGC
ACGATGCCGTGACAAACAGCCTGATCAAGTGCAGACCCGACTATATCACCCCTGAT
GGCATGAGCGTGATCGACTTCAAGACCACAGTGGACCCCAGCCCCAAGGGCTTTCA
GTACCAGGCCTACAAGTATCACTACTACGTGAGCGCCGCCCTGACCCTGGAGGGAA
TCGAGGCAGTGACCGGCATCAGGCCAAAGGAGTACCTGTTCCTGGCCGTGTCCAATT
CTGCCCCATACCTGACCGCCCTGTATCGCGCCTCTGAGAAGGAGATCGCCCTGGGCG
ACCACTTTATCCGGCGGAGCCTGCTGACCCTGAAAACCTGTCTGGAGTCTGGCAAGT
GGCCCGGCCTGCAGGAGGAGATCCTGGAGCTGGGCCTGCCTTTCTCCGGCCTGAAG
GAGCTGAGAGAGGAGCAGGAGGTGGAGGATGAGTTTATGGAGCTGGTGGGC
[00238] Photobacterium sp. JCM 19050 RecT DNA (SEQ ID NO: 111):
AACACCGACATGATCGCCATGCCCCCTTCTCCAGCCATCAGCATGCTGGACACAAGC
AAGCTGGATGTGATGGTGCGGGCAGCAGAGCTGATGTCCCAGGCCGTGGTCATGGT
GCCCGACCACTTCAAGGGCAAGCCAGCCGATTGCCTGGCAGTGGTCATGCAGGCAG
ACCAGTGGGGCATGAACCCCTTTACCGTGGCCCAGAAAACCCACCTGGTGAGCGGC
ACCCTGGGATACGAGTCCCAGCTGGTGAATGCCGTGATCAGCTCCTCTAAGGCCATC
AAGGGCCGGTTCCACTATGAGTGGTCTGATGGCTGGGAGAGACTGGCCGGCAAGGT
GCAGTACGTGAAGGAGTCTCGGCAGAGAAAGGGCCAGCAGGGCAGCTATCAGGTG
ACCGTGGCCAAGCCAACATGGAAGCCAGAGGACGAGCAGGGCCTGTGGGTGCGGT
GTGGAGCCGTGCTGGCCGGAGAGAAGGACATCACATGGGGCCCTAAGCTGTACCTG
GCCAGCGTGCTGGTGCGGAACAGCGAGCTGTGGACCACAAAGCCCTACCAGCAGGC
CGCCTATACCGCCCTGAAGGATTGGTCCCGCCTGTATACACCTGCCGTGATGCAGGG
CTCTATGACCGGCAAGAGCTGGTCCCTGACAGGCAGGCTGATCAGCCCCCGC
[00239] Photobacterium sp. JCM 19050 RecE DNA (SEQ ID NO: 112):
GCCGAGCGGGTGAGAACCTATCAGCGGGACGCCGTGTTCGCACACGAGCTGAAGGC
CGAGTTTGATGAGGCCGTGGAGAACGGCAAGACCGGCGTGACACTGGAGGACCAGG CCAGGGCCAAGAGGATGGTGCACGAGGCCACCACAAACCCCGCCTCTCGGAATTGG
TTCAGATACGACGGAGAGCTGGCCGCATGCGAGAGGAGCTATTTTTGGCGCGATGA
GGAGGCAGGCCTGGTGCTGAAGGCCAGGCCTGACAAGGAGATCGGCAACAATCTGA
TCGATGTGAAGTCCATCGAGGTGCCAACCGACGTGTGCGCCTGTGATCTGAACGCCT
ATATCAATCGGCAGATCGAGAAGAGAGGCTACCACATCTCCGCCGCCCACTATCTGT
CTGGCACAGGCAAGGACCGCTTCTTTTGGATCTTCATCAATAAGGTGAAGGGCTACG
AGTGGGTGGCAATCGTGGAGGCCTCTCCCCTGCACATCGAGCTGGGCACCTATGAG
GTGCTGGAGGGCCTGCGGAGCATCGCCAGCTCCACAAAGGAGGCAGATTACCCAGC
ACCTCTGTCCCACCCTGTGAACGAGAGAGGCATCCCACAGCCCCTGATGTCTAATCT
GAGCACATACGCCATGAAGAGGCTGGAGCAGTTTCGCGAGCTG
[00240] Providencia alcalifaciens DSM 30120 RecT DNA (SEQ ID NO:113):
[00241] AAGGCACAGCTGGCCGCCGCCCTGCCTAAGCACATCACCAGCGACCGGAT
GATCAGAATCGTGTCCACCGAGATCAGAAAGACCCCATCTCTGGCCAACTGCGACA
TCCAGAGCTTCATCGGCGCCGTGGTGCAGTGTTCTCAGCTGGGCCTGGAGCCAGGCA
ACGCCCTGGGACACGCCTACCTGCTGCCCTTTGGCAATGGCAAGTCCGACAACGGC
AAGTCTAATGTGCAGCTGATCATCGGCTATCGGGGCATGATCGATCTGGCCCGGAGA
AGCGGCCAGATCATCTCTATCAGCGCCAGGACCGTGCGCCAGGGCGACAACTTCCA
CTTTGAGTACGGCCTGAACGAGAATCTGACCCACATCCCCGAGGGCAATGAGGACT
CCCCTATCACACACGTGTACGCAGTGGCACGGCTGAAGGATGAGGGCGTGCAGTTC
GAAGTGATGACATATAACCAGATCGAGAAGGTGAGAGATAGCTCCAAGGCCGGCAA
GAATGGCCCCTGGGTGACCCACTGGGAGGAGATGGCCAAGAAAACCGTGATCAGGC
GCCTGTTTAAGTACCTGCCCGTGAGCATCGAGATGCAGAAGGCCGTGATCCTGGACG
AGAAGGCCGAGGCCAATATCGAGCAGGATCACTCCGCCATCTTCGAGGCCGAGTTT
GAGGAGGTGGACTCTAACGGCAAT
[00242] Providencia alcalifaciens DSM 30120 RecE DNA (SEQ ID NO:114):
AACGAGGGCATCTACTATGACATCTCTAATGAGGACTATCACCACGGCCTGGGCATC
TCTAAGAGCCAGCTGGATCTGATCGACGAGAGCCCCGCCGATTTCATCTGGCACCGG
GATGCCCCTGTGGACAACGAGAAAACCAAGGCCCTGGATTTTGGCACAGCCCTGCA
CTGCCTGCTGCTGGAGCCAGACGAGTTCCAGAAGAGGTTTCGCATCGCCCCCGAGGT
GAACCGGAGAACAAATGCCGGCAAGGAGCAGGAGAAGGAGTTCCTGGAGATGTGC
GAGAAGGAGAATATCACCCCCATCACAAACGAGGATAATAGGAAGCTGTCTCTGAT GAAGGACAGCGCAATGGCCCACCCTATCGCCCGCTGGTGTCTGGAGGCCAAGGGCA
TCGCCGAGAGCTCCATCTATTGGAAGGACAAGGATACAGACATCCTGTGCCGGTGT
AGACCAGACAAGCTGATCGAGGAGCACCACTGGCTGGTGGATGTGAAGTCCACCGC
CGACATCCAGAAGTTCGAGCGGTCTATGTACGAGTATAGATACCACGTGCAGGATTC
CTTTTATTCTGACGGCTACAAGAGCCTGACAGGCGAGATGCCCGTGTTCGTGTTCCT
GGCCGTGTCCACCGTGATCAACTGCGGCAGATACCCCGTGCGGGTGTTCGTGCTGGA
CGAGCAGGCAAAGTCCGTGGGACGGATCACCTATAAGCAGAATCTGTTTACATACG
CCGAGTGTCTGAAAACCGACGAGTGGGCCGGCATCAGAACCCTGAGCCTGCCCTCC
TGGGCAAAGGAGCTGAAGCACGAGCACACCACAGCCTCT
[00243] Pantoea stewartii RecT Protein (SEQ ID NO:115):
MSNQPPIASADLQKANTGKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI
RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ
LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGENEDAPITHVYAV
ARLKDGGTQFEVMTVKQIEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIE MQKAVILDEKAESDVDQDNASVLSAEYSVLDGSSEE
[00244] Pantoea stewartii RecE Protein (SEQ ID NO: 116):
MQPGVYYDISNEEYHAGPGISKSQLDDIAVSPAIFQWRKSAPVDDEKTAALDLGTALHC
LLLEPDEFSKRFMIGPEVNRRTNAGKQKEQDFLDMCEQQGITPITHDDNRKLRLMRDSA FAHPVARWMLETEGKAEASIYWNDRDTQILSRCRPDKLITEFSWCVDVKSTADIGKFQK DFYSYRYHVQDAFYSDGYEAQFCEVPTFAFLVVSSSIDCGRYPVQVFIMDQQAKDAGR
AEYKRNLTTYAECQARNEWPGIATLSLPYWAKEIRNV
[00245] Pantoea brenneri RecT Protein (SEQ ID NO: 117):
MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI
RIVTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAV ARLKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI
EMQKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN
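The Pantoea RecT entries above are closely related; the first wrapped line of SEQ ID NO: 115 and SEQ ID NO: 117 differ at only four of 57 positions. A minimal sketch for quantifying such similarity, assuming the sequences already line up residue for residue with no insertions or deletions (a real comparison would normally use a gapped alignment such as Needleman-Wunsch). The function name `ungapped_identity` is hypothetical.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position.

    No gaps are introduced, so this is only meaningful for sequences that
    already line up residue for residue.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First wrapped line of Pantoea stewartii RecT (SEQ ID NO: 115)
p_stewartii = "MSNQPPIASADLQKANTGKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI"
# First wrapped line of Pantoea brenneri RecT (SEQ ID NO: 117)
p_brenneri = "MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI"
print(f"{ungapped_identity(p_stewartii, p_brenneri):.1f}%")
```

For these excerpts, 53 of 57 residues match, which gives about 93.0%.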
[00246] Pantoea brenneri RecE Protein (SEQ ID NO: 118):
MQPGIYYDISNEDYHRGAGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL
LLEPDEFSKRFQIGPEVNRRTTAGKEKEKEFIERCEAEGITPITHDDNRKLKLMRDSALAH PIARWMLEAQGNAEASIYWNDRDAGVLSRCRPDKIITEFNWCVDVKSTADIMKFQKDF YSYRYHVQDAFYSDGYESHFHETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPFWAKELRNE
[00247] Pantoea dispersa RecT Protein (SEQ ID NO:119): MSNQPPLATADLQKTQQSNQVAKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMIR IVTTEIRKTPALAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQLI IGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNESAPITHVYAVAR LKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIEM QKAVVLDEKAESDVDQDNASVLSAEYSVLESGTGE
[00248] Pantoea dispersa RecE Protein (SEQ ID NO: 120): MEPGIYYDISNEAYHSGPGISKSQLDDIARSPAIFQWRKDAPVDTEKTKALDLGTDFHCA VLEPERFADMYRVGPEVNRRTTAGKAEEKEFFEKCEKDGAVPITHDDARKVELMRGSV MAHPIAKQMIAAQGHAEASIYWHDESTGNLCRCRPDKFIPDWNWIVDVKTTADMKKFR REFYDLRYHVQDAFYTDGYAAQFGERPTFVFVVTSTTIDCGRYPTEVFFLDEETKAAGR SEYQSNLVTYSECLSRNEWPGIATLSLPHWAKELRNV
[00249] Type-F symbiont of Plautia stali RecT Protein (SEQ ID NO: 121): MSNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMI RIVTTEIRKTPALATCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQ LIIGYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLIHRPGDNEDAPITHVYAV ARLKDGGTQFEVMTAKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSI EMQKAVVLDEKAESDVDQDNASVLSAEYSVLEGDGGE
[00250] Type-F symbiont of Plautia stali RecE Protein (SEQ ID NO: 122): MQPGIYYDISNEDYHGGPGISKSQLDDIAISPAIYQWRKHAPVDEEKTAALDLGTALHCL LLEPDEFSKRFEIGPEVNRRTTAGKEKEKEFMERCEAEGVTPITHDDNRKLRLMRDSAM AHPIARWMLEAQGNAEASIYWNDRDTGVLSRCRPDKIITDFNWCVDVKSTADIIKFQKD FYSYRYHVQDAFYSDGYESHFDETPTFAFLAVSTSIDCGRYPVQVFIMDQQAKDAGRAE YKRNIHTFAECLSRNEWPGIATLSLPYWAKELRNE
[00251] Providencia stuartii RecT Protein (SEQ ID NO: 123): MSNPPLAQADLQKTQGTEVKEKTKDQMLVELINKPSMKAQLAAALPRHMTPDRMIRIV TTEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKSKSGQSNVQLI IGYRGMIDLARRSGQIVSISARTVRQGDNFHFEYGLNENLTHVPGENEDSPITHVYAVAR LKDGGVQFEVMTYNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEM QKAVILDEKAEANIDQENATIFEGEYEEVGTDGK
[00252] Providencia stuartii RecE Protein (SEQ ID NO: 124):
EGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLLLE PDEYHKRYKIGPDVNRRTNAGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALAHP IAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIVDVKSSGDIEKFDYEYYNY
RYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYKH NLLTYAECLKTDEWAGIRTLSLPRWAKELRNE
[00253] Providencia sp. MGF014 RecT Protein (SEQ ID NO:125):
MSNPPLAQSDLQKTQGTEVKVKTKDQQLIQFINQPSMKAQLAAALPRHMTPDRMIRIVT
TEIRKTPALATCDMQSFVGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLII
GYRGMIDLARRSNQIISISARTVRQGDNFHFEYGLNEDLTHTPSENEDSPITHVYAVARL
KDGGVQFEVMTYNQVEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQ KAVVLDEKAEANVDQENATIFEGEYEEVGTDGN
[00254] Providencia sp. MGF014 RecE Protein (SEQ ID NO:126):
MKEGIYYNISNEDYHNGLGISKSQLDLINEMPAEYIWSKEAPVDEEKIKPLEIGTALHCLL
LEPDEYHKRYKIGPDVNRRTNVGKEKEKEFFDMCEKEGITPITHDDNRKLMIMRDSALA HPIAKWCLEADGVSESSIYWTDKETDVLCRCRPDRIITAHNYIIDVKSSGDIEKFDYEYYN YRYHVQDAFYSDGYKEVTGITPTFLFLVVSTKIDCGKYPVRTYVMSEEAKSAGRTAYK HNLLTYAECLKTDEWAGIRTLSLPRWAKELRNE
[00255] Shewanella putrefaciens RecT Protein (SEQ ID NO: 127):
MQTAQVKLSVPHQQVYQDNFNYLSSQVVGHLVDLNEEIGYLNQIVFNSLSTASPLDVA
APWSVYGLLLNVCRLGLSLNPEKKLAYVMPSWSETGEIIMKLYPGYRGEIAIASNFNVIK
NANAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDNTIQISYLSIE
EMNAIAQNQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEIQSALSDTE Y
[00256] Shewanella putrefaciens RecE Protein (SEQ ID NO: 128):
MGTALAQTISLDWQDTIQPAYTASGKPNFLNAQGEIVEGIYTDLPNSVYHALDAHSSTGI
KTFAKGRHHYFRQYLSDVCRQRTKQQEYTFDAGTYGHMLVLEPENFHGNFMRNPVPD DFPDIELIESIPQLKAALAKSNLPVSGAKAALIERLYAFDPSLPLFEKMREKAITDYLDLR YAKYLRTDVELDEMATFYGIDTSQTREKKIEEILAISPSQPIWEKLISQHVIDHIVWDDAM RVERSTRAHPKADWLISDGYAELTIIARCPTTGLLLKVRFDWLRNDAIGVDFKTTLSTNP TKFGYQIKDLRYDLQQVFYCYVANLAGIPVKHFCFVATEYKDADNCETFELSHKKVIES TEEMFDLLDEFKEALTSGNWYGHDRSRSTWVIEV
[00257] Bacillus sp. MUM 116 RecT Protein (SEQ ID NO:129):
MSKQLTTVNTQAVVGTFSQAELDTLKQTIAKGTTNEQFALFVQTCANSRLNPFLNHIHCI
VYNGKEGATMSLQIAVEGILYLARKTDGYKGIECQLIHENDEFKFDAKSKEVDHQIGFP
RGNVIGGYAIAKREGFDDVVVLMESNEVDHMLKGRNGHMWRDWFNDMFKKHIMKR AAKLQYGIEIAEDETVSSGPSVDNIPEYKPQPRKDITPNQDVIDAPPQQPKQDDEAAKLK
AARSEVSKKFKKLGIVKEDQTEYVEKHVPGFKGTLSDFIGLSQLLDLNIEAQEAQSADG DLLD
[00258] Bacillus sp. MUM 116 RecE Protein (SEQ ID NO:130):
MTYAADETLVQLLLSVDGKQLLLGRGLKKGKAQYYINEVPSKAKEFEEIRDQLFDKDLF
MSLFNPSYFFTLHWEKQRAMMLKYVTAPVSKEVLKNLPEAQSEVLERYLKKHSLVDLE
KIHKDNKNKQDKAYISAQSRTNTLKEQLMQLTEEKLDIDSIKAELAHIDMQVIELEKQM
DTAFEKNQAFNLQAQIRNLQDKIEMSKERWPSLKNEVIEDTCRTCKRPLDEDSVEAVKA DKDNRIAEYKAKHNSLVSQRNELKEQLNTIEYIDVTELREQIKELDESGQPLREQVRIYS QYQNLDTQVKSAEADENGILQDLKASIFILDSIKAFRGKEAEMQAEKVQALFTTLSVRLF
KQNKGDGEIKPDFEIEMNDKPYRTLSLSEGIRAGLELRDVLSQQSELVTPTFVDNAESITS FKQPNGQLIISRVVAGQELKIEAVSE
[00259] Shigella sonnei RecT Protein (SEQ ID NO: 131):
MTKQPPIAKADLQKTQENRAPAAIKNNDVISFINQPSMKEQLAAALPRHMTAERMIRIA
TTEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLII GYRGMIDLARRSGQIASLSARVVREGDEFNFEFGLDEKLIHRPGENEDAPVTHVYAVAR LKDGGTQFEVMTRRQIELVRSQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQR
AVSMDEKEPLTIDPADSSVLTGEYSVIDNSEE
[00260] Shigella sonnei RecE Protein (SEQ ID NO: 132):
DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDIATGVLARS
MDVDIYNLHPAHAKRIEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEVI
PAHVTAYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKLQPSGTTA DEQGEAETMEPDATKHHQDTQPLDAQSQVNSVDAKYQELRAELHEARKNIPSKNPVDA DKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKTSPDMKQPEPVVQQEPE IAFNACGQTGGDNCPDCGAVMGDATYQETFDEENQVEAKENDPEEMEGAEHPHNENA GSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYLW RKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMECA STGKMVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPDK IIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFL VASTTIE CGRYPVEIFMMGEEAKLAGQLEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAND [00261] Salmonella enterica RecT Protein (SEQ ID NO: 133): MTKQPPIAKADLQKTQGNRAPAAVNDKDVLCVINSPAMKAQLAAALPRHMTAERMIRI ATTEIRKVPELRNCDSTSFIGAIVQCSQLGLEPGSALGHAYLLPFGNGKAKNGKKNVQLII GYRGMIDLARRSGQIISLSARVVRECDEFSYELGLDEKLVHRPGENEDAPITHVYAVAKL KDGGVQFEVMTKKQVEKVRDTHSKAAKNAASKGASSIWDEHFEDMAKKTVIRKLFKY LPVSIEIQRAVSMDGKEVETINPDDISVIAGEYSVIDNPEE
[00262] Salmonella enterica RecE Protein (SEQ ID NO: 134):
DRGLLTKEWRKGNRVSRITRTASGANAGGGNLTDRGEGFVHDLTSLARDVATGVLARS MDVDIYNLHPAHAKRVEEIIAENKPPFSVFRDKFITMPGGLDYSRAIVVASVKEAPIGIEV IPAHVTEYLNKVLTETDHANPDPEIVDIACGRSSAPMPQRVTEEGKQDDEEKPQPSGAM ADEQATAETVEPNATEHHQNTQPLDAQSQVNSVDAKYQELRAELQEARKNIPSKNPVD ADKLLAASRGEFVDGISDPNDPKWVKGIQTRDSVYQNQPETEKISPDAKQPEPVVQQEP ETVCNACGQTGGDNCPDCGAVMGDATYQETFGEENQVEAKEKDPEEMEGAEHPHNEN AGSDPHRDCSDETGEVADPVIVEDIEPGIYYGISNENYHAGPGVSKSQLDDIADTPALYL WRKNAPVDTTKTKTLDLGTAFHCRVLEPEEFSNRFIVAPEFNRRTNAGKEEEKAFLMEC ASTGKTVITAEEGRKIELMYQSVMALPLGQWLVESAGHAESSIYWEDPETGILCRCRPD KIIPEFHWIMDVKTTADIQRFKTAYYDYRYHVQDAFYSDGYEAQFGVQPTFVFLVASTT VECGRYPVEIFMMGEEAKLAGQQEYHRNLRTLADCLNTDEWPAIKTLSLPRWAKEYAN D
[00263] Acetobacter RecT Protein (SEQ ID NO: 135):
MNAPQKQNTRAAVKKISPQEFAEQFAAIIPQVKSVLPAHVTFEKFERVVRLAVRKNPDL LTCSPASLFMACIQAASDGLLPDGREGAIVSRWSSKKSCNEASWMPMVAGLMKLARNS GDIASISSQVVFEGEHFRVVLGDEERIEHERDLGKTGGKIVAAYAVARLKDGSDPIREIM SWGQIEKIRNTNKKWEWGPWKAWEDEMARKTVIRRLAKRLPMSTDKEGERLRSAIERI DSLVDISANVDAPQIAADDEFAAAAHGVEPQQIAAPDLIGRLAQMQSLEQVQDIEPQVS HAIQEADKRGDSDTANALDAALQSALSRTSTAKEEVPA
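The DNA entries appear to encode the matching protein entries. For example, the first 54 nt of Acetobacter RecT DNA (SEQ ID NO: 105) translate to the start of Acetobacter RecT protein (SEQ ID NO: 135) once an initiating methionine is added, which suggests the listed DNA omits the ATG start codon. A minimal sketch using the standard genetic code; `translate` is a hypothetical helper, not part of the source.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace from line wrapping is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First 54 nt of Acetobacter RecT DNA (SEQ ID NO: 105)
acetobacter_rect_5prime = "AACGCCCCCCAGAAGCAGAATACCAGAGCCGCCGTGAAGAAGATCAGCCCTCAG"
print("M" + translate(acetobacter_rect_5prime))  # compare with SEQ ID NO: 135
```

The printed string, MNAPQKQNTRAAVKKISPQ, matches the first 19 residues of SEQ ID NO: 135.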
[00264] Acetobacter RecE Protein (SEQ ID NO: 136):
MVISKSGIYDLTNEQYHADPCPEMSLSSSGARDLLSSCPAKFIAAKQLPQQNKRCFDIGS AGHLMVLEPHLFDQKVCEIKHPDWRTKAAKEERDAAYAEGRIPLLSREVEDIRAMHSV VWRDSLGARAFSGGKAEQSLVWRDEEFGIWCRLRPDYVPNNAVRIFDYKTATNGSPDA FMKEIYNRGYHQQAAWYLDGYEAVTGHRPREFWFVVQEKTAPFLLSFFQMDEMSLEIG RTLNRQAKGIFAWCLRNNCWPGYQPEVDGKVRFFTTSPPAWLVREYEFKNEHGAYEPP EIKRKEVA
[00265] Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecT Protein (SEQ ID NO: 137): MPKQPPIAKADLQKTQGARTPTAVKNNNDVISFINQPSMKEQLAAALPRHMTAERMIRI ATTEIRKVPALGDCDTMSFVSAIVQCSQLGLEPGGALGHAYLLPFGNRNEKSGKKNVQL IIGYRGMIDLARRSGQIASLSARVVREGDDFSFEFGLEEKLVHRPGENEDAPVTHVYAVA RLKDGGTQFEVMTRKQIELVRAQSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEI
QRAVSMDEKETLTIDPADASVITGEYSVVENAGVEENVTA
[00266] Salmonella enterica subsp. enterica serovar Javiana str. 10721 RecE Protein (SEQ ID NO: 138):
MYYDIPNEAYHAGPGVSKSQLDDIADTPAIYLWRKNAPVDTEKTKSLDTGTAFHCRVLE PEEFSKRFIIAPEFNRRTSAGKEEEKTFLEECTRTGRTVLTAEEGRKIELMYQSVMALPLG QWLVESAGYAESSVYWEDPETGILCRCRPDKIIPEFHWIMDVKTTADIQRFRTAYYDYR YHVQDAFYSDGYRAQFGEIPTFVFLVASTTAECGRYPVEIFMMGEDAKLAGQREYRRN LQTLAECLNNDEWPAIKTLSLPRWAKENANA
[00267] Pseudobacteriovorax antillogorgiicola RecT Protein (SEQ ID NO: 139): MGHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAK PAFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQ VQQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHK PKALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKE AGYGVNQ
[00268] Pseudobacteriovorax antillogorgiicola RecE Protein (SEQ ID NO: 140): MSKLSNLKVSNSDVDTLSRIRMKEGVYRDLPIESYHQSPGYSKTSLCQIDKAPIYLKTKV PQKSTKSLNIGTAFHEAMEGVFKDKYVVHPDPGVNKTTKSWKDFVKRYPKHMPLKRSE YDQVLAMYDAARSYRPFQKYHLSRGFYESSFYWHDAVTNSLIKCRPDYITPDGMSVIDF KTTVDPSPKGFQYQAYKYHYYVSAALTLEGIEAVTGIRPKEYLFLAVSNSAPYLTALYR ASEKEIALGDHFIRRSLLTLKTCLESGKWPGLQEEILELGLPFSGLKELREEQEVEDEFME LVG
[00269] Photobacterium sp. JCM 19050 RecT Protein (SEQ ID NO: 141): MNTDMIAMPPSPAISMLDTSKLDVMVRAAELMSQAVVMVPDHFKGKPADCLAVVMQ ADQWGMNPFTVAQKTHLVSGTLGYESQLVNAVISSSKAIKGRFHYEWSDGWERLAGK VQYVKESRQRKGQQGSYQVTVAKPTWKPEDEQGLWVRCGAVLAGEKDITWGPKLYL ASVLVRNSELWTTKPYQQAAYTALKDWSRLYTPAVMQGSMTGKSWSLTGRLISPR
[00270] Photobacterium sp. JCM 19050 RecE Protein (SEQ ID NO: 142): MAERVRTYQRDAVFAHELKAEFDEAVENGKTGVTLEDQARAKRMVHEATTNPASRN WFRYDGELAACERSYFWRDEEAGLVLKARPDKEIGNNLIDVKSIEVPTDVCACDLNAYI NRQIEKRGYHISAAHYLSGTGKDRFFWIFINKVKGYEWVAIVEASPLHIELGTYEVLEGL RSIASSTKEADYPAPLSHPVNERGIPQPLMSNLSTYAMKRLEQFREL
[00271] Providencia alcalifaciens DSM 30120 RecT Protein (SEQ ID NO:143): MKAQLAAALPKHITSDRMIRIVSTEIRKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGH AYLLPFGNGKSDNGKSNVQLIIGYRGMIDLARRSGQIISISARTVRQGDNFHFEYGLNEN LTHIPEGNEDSPITHVYAVARLKDEGVQFEVMTYNQIEKVRDSSKAGKNGPWVTHWEE MAKKTVIRRLFKYLPVSIEMQKAVILDEKAEANIEQDHSAIFEAEFEEVDSNGN
Providencia alcalifaciens DSM 30120 RecE Protein (SEQ ID NO:144): MNEGIYYDISNEDYHHGLGISKSQLDLIDESPADFIWHRDAPVDNEKTKALDFGTALHCL LLEPDEFQKRFRIAPEVNRRTNAGKEQEKEFLEMCEKENITPITNEDNRKLSLMKDSAM AHPIARWCLEAKGIAESSIYWKDKDTDILCRCRPDKLIEEHHWLVDVKSTADIQKFERS MYEYRYHVQDSFYSDGYKSLTGEMPVFVFLAVSTVINCGRYPVRVFVLDEQAKSVGRI TYKQNLFTYAECLKTDEWAGIRTLSLPSWAKELKHEHTTAS
[00272] Mouse Albumin knock-in sense template (SEQ ID NO: 160):
CACCTTCAGATTTTCCTGTAACGATCGGGAACTGGCATCTTCAGGGAGTAGctgacctcttc tcttcctcccacaggATCCTGGAGCCACCCGCAGTTCGAAAAGCTCAGTGAAGAGAAGAACA AAAAGCAGCATATTACAGTTAGTTGTCTTCATCAATCTTTAAATATGTTGTGTGGTTT
[00273] Mouse Albumin knock-in anti-sense template (SEQ ID NO: 161):
GTGGAAACAGGGAGAGAAAAACCACACAACATATTTAAAGATTGATGAAGACAACT AACTGTAATATGCTGCTTTTTGTTCTTCTCTTCACTGAGCTTTTCGAACTGCGGGTGG CTCCAGGATcctgtgggaggaagagaagaggtcagCTACTCCCTGAAGATGCCAGTTCCCGATCGT TACAGGAAAATCTGAAGGTG
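The anti-sense template (SEQ ID NO: 161) appears to be the reverse complement of the sense template (SEQ ID NO: 160). Its final 51 nt reverse-complement to the first 51 nt of SEQ ID NO: 160, and its lowercase segment mirrors the lowercase segment of the sense template. A minimal sketch of that check; `reverse_complement` is a hypothetical helper that drops line-wrap spaces and preserves case.

```python
# Maps each base to its complement; lowercase stays lowercase.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, ignoring line-wrap whitespace."""
    return "".join(seq.split()).translate(COMPLEMENT)[::-1]


sense_start = "CACCTTCAGATTTTCCTGTAACGATCGGGAACTGGCATCTTCAGGGAGTAG"    # SEQ ID NO: 160
antisense_end = "CTACTCCCTGAAGATGCCAGTTCCCGATCGTTACAGGAAAATCTGAAGGTG"  # SEQ ID NO: 161
print(reverse_complement(antisense_end) == sense_start)
print(reverse_complement("cctgtgggaggaagagaagaggtcag"))
```

The first line prints True. The second prints the lowercase segment of the sense template, ctgacctcttctcttcctcccacagg.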
[00274] (SEQ ID NO: 162)
ACTTTGAGTGTAGCAGAGAGGAACCATTGCCACCTTCAGATTTTCCTGTAACGATCG GGAACTGGCATCTTCAGGGAGTAGCTGACCTCTTCTCTTCCTCCCACAGGATCCTGG AGCCACC
Example 16
[00275] FIG. 1 depicts SSAP with dCas12f1 protein as a compact editor for precise large knock-in.
[00276] The SSAP system with dCas12f1 protein as a compact editor for precise large knock-in has three components plus a donor DNA. The donor sequence is provided in Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264. The three components are the dCas12f1 protein, a guide RNA carrying an MS2 aptamer, and an MCP-SSAP fusion protein.
[00277] Cas12f1 can be converted into dCas12f1 with different mutations, as in the variants below.
[00278] dCas12f1 protein with the D225A mutation (with nuclear localization signals similar to the design from Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264):
MAPKKKRKVMIKVYRYEIVKPLDLDWKEFGTILRQLQQETRFALNKATQLAWEWMGF SSDYKDNHGEYPKSKDILGYTNVHGYAYHTIKTKAYRLNSGNLSQTIKRATDRFKAYQ KEILRGDMSIPSYKRDIPLDLIKENISVNRMNHGDYIASLSLLSNPAKQEMNVKRKISVIII VRGAGKTIMDRILSGEYQVSASQIIHDDRKNKWYLNISYDFEPQTRVLDLNKIMGIALGV AVAVYMAFQHTPARYKLEGGEIENFRRQVESRRISMLRQGKYAGGARGGHGRDKRIKP IEQLRDKIANFRDTTNHRYSRYIVDMAIKEGCGTIQMEDLTNIRDIGSRFLQNWTYYDLQ
QKIIYKAEEAGIKVIKIDPQYTSQRCSECGNIDSGNRIGQAIFKCRACGYEANADYNAAR NIAIPNIDKIIAESIKSGGSPKKKRKV (SEQ ID NO:496)
[00279] dCas12f1 protein with D225A and D401A mutations (with nuclear localization signals similar to the design from Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264):
MAPKKKRKVMIKVYRYEIVKPLDLDWKEFGTILRQLQQETRFALNKATQLAWEWMGF SSDYKDNHGEYPKSKDILGYTNVHGYAYHTIKTKAYRLNSGNLSQTIKRATDRFKAYQ KEILRGDMSIPSYKRDIPLDLIKENISVNRMNHGDYIASLSLLSNPAKQEMNVKRKISVIII VRGAGKTIMDRILSGEYQVSASQIIHDDRKNKWYLNISYDFEPQTRVLDLNKIMGIALGV AVAVYMAFQHTPARYKLEGGEIENFRRQVESRRISMLRQGKYAGGARGGHGRDKRIKP IEQLRDKIANFRDTTNHRYSRYIVDMAIKEGCGTIQMEDLTNIRDIGSRFLQNWTYYDLQ QKIIYKAEEAGIKVIKIDPQYTSQRCSECGNIDSGNRIGQAIFKCRACGYEANAAYNAAR NIAIPNIDKIIAESIKSGGSPKKKRKV (SEQ ID NO:497)
[00280] dCas12f1 protein with D225A and E324A mutations (with nuclear localization signals similar to the design from Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264):
MAPKKKRKVMIKVYRYEIVKPLDLDWKEFGTILRQLQQETRFALNKATQLAWEWMGF SSDYKDNHGEYPKSKDILGYTNVHGYAYHTIKTKAYRLNSGNLSQTIKRATDRFKAYQ KEILRGDMSIPSYKRDIPLDLIKENISVNRMNHGDYIASLSLLSNPAKQEMNVKRKISVIII VRGAGKTIMDRILSGEYQVSASQIIHDDRKNKWYLNISYDFEPQTRVLDLNKIMGIALGV AVAVYMAFQHTPARYKLEGGEIENFRRQVESRRISMLRQGKYAGGARGGHGRDKRIKP IEQLRDKIANFRDTTNHRYSRYIVDMAIKEGCGTIQMADLTNIRDIGSRFLQNWTYYDLQ QKIIYKAEEAGIKVIKIDPQYTSQRCSECGNIDSGNRIGQAIFKCRACGYEANADYNAAR NIAIPNIDKIIAESIKSGGSPKKKRKV (SEQ ID NO:498)
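The dCas12f1 variants differ from one another by single-residue substitutions. For example, the E324A change appears in SEQ ID NO: 498 as GCGTIQMEDLTNIR becoming GCGTIQMADLTNIR. A minimal sketch that lists substitutions between two equal-length sequences. The `offset` parameter rests on an assumption: the mutation names are taken to follow the numbering of the protein without the N-terminal MAPKKKRKV tag, so construct positions are shifted by 9. `point_substitutions` is a hypothetical helper, and the I2A example is a toy, not a variant from the source.

```python
def point_substitutions(reference: str, variant: str, offset: int = 0) -> list[str]:
    """List substitutions as <ref><position><alt>, 1-based, shifted by `offset`.

    `offset` is the number of leading residues (e.g., an N-terminal tag) to
    subtract so positions follow the untagged protein's numbering.
    """
    if len(reference) != len(variant):
        raise ValueError("substitution-only comparison needs equal lengths")
    return [
        f"{ref}{index + 1 - offset}{alt}"
        for index, (ref, alt) in enumerate(zip(reference, variant))
        if ref != alt
    ]


NLS_TAG = "MAPKKKRKV"  # N-terminal tag shared by SEQ ID NOs: 496-498

# Excerpt around the E324A site (SEQ ID NO: 496 vs. SEQ ID NO: 498)
print(point_substitutions("GCGTIQMEDLTNIR", "GCGTIQMADLTNIR"))
# Toy example: tagged constructs reported in untagged numbering
print(point_substitutions(NLS_TAG + "MIKVYRYEIV", NLS_TAG + "MAKVYRYEIV", offset=len(NLS_TAG)))
```

The two calls print ['E8A'] (position within the excerpt) and ['I2A'].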
[00281] MCP-SSAP fusion protein similar to the design from Wang et al., Nucleic Acids Res. 2021 Apr 6;49(6):e36. doi: 10.1093/nar/gkaa1264:
MASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCSVRQSSAQKRK YTIKVEVPKVATQTVGGVELPVAAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGN PIPSAIAANSGIYSASGGSSGGSSGSETPGTSESATPESSGGSSGGSGGSMTKQPPIAKADL QKTQGNRAPAAVKNSDVISFINQPSMKEQLAAALPRHMTAERMIRIATTEIRKVPALGN CDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLIIGYRGMIDLARR SGQIASLSARVVREGDEFSFEFGLDEKLIHRPGENEDAPVTHVYAVARLKDGGTQFEVM TRKQIELVRSLSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQRAVSMDEKEPLTI DPADSSVLTGEYSVIDNSEESGGSPKKKRKV (SEQ ID NO:499)
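SEQ ID NO: 499 reads as an N-terminal MCP portion, a Gly/Ser-rich linker, an SSAP portion beginning MTKQPP, and a C-terminal SGGS plus PKKKRKV nuclear localization signal. A minimal sketch that splits such a fusion at exact motif matches. The boundaries are assumptions: the 35-residue linker and the SGGSPKKKRKV tail are read off the sequence above and are not defined in the text. `split_fusion` is a hypothetical helper, and the toy input uses shortened excerpts rather than the full domains.

```python
LINKER = "SGGSSGGSSGSETPGTSESATPESSGGSSGGSGGS"  # assumed MCP/SSAP boundary
C_TAG = "SGGSPKKKRKV"                          # assumed linker + C-terminal NLS


def split_fusion(fusion: str, linker: str = LINKER, c_tag: str = C_TAG) -> tuple[str, str]:
    """Return (N-terminal domain, C-terminal domain) of domain-linker-domain-tag."""
    if fusion.count(linker) != 1:
        raise ValueError("linker must occur exactly once")
    if not fusion.endswith(c_tag):
        raise ValueError("fusion does not end with the expected C-terminal tag")
    start = fusion.index(linker)
    n_domain = fusion[:start]
    c_domain = fusion[start + len(linker):len(fusion) - len(c_tag)]
    return n_domain, c_domain


# Toy fusion built from shortened excerpts of SEQ ID NO: 499
toy = "MASNFTQFVLVDNGG" + LINKER + "MTKQPPIAKADLQK" + C_TAG
print(split_fusion(toy))
```

Exact string matching keeps the sketch simple. It fails loudly, rather than guessing, when a motif is missing or repeated.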
[00282] Guide RNA scaffold designs with an inserted MS2 aptamer. The MS2 aptamer is inserted into the stem-loop region of an optimized Cas12f1 guide RNA scaffold; other aptamers, such as those described in the Wang et al. NAR paper, can be used instead. A 20-bp guide sequence (shown as N) was used, but it can be longer or shorter (15 bp to 35 bp, possibly longer). One or more MS2 aptamers can be inserted.
[00283] Guide RNA with MS2 aptamer (two MS2 aptamers inserted), scaffold 1; the first G is included for expression from the U6 promoter:
Gattcgtcggttcagcgacgataagccgagaagtgccaataaaactgttaagtggtttggtaacgctcggtaaggtagccaaaaggctgaa actccgtgcacaaagaccgcacggacgcttcacatatagctcataaacaacgtacaccatcagggtacgatacagacaccatcagggtctg gggtttgcgagctagcttgtggagtgtgaacNNNNNNNNNNNNNNNNNNNN (SEQ ID NO:500)
[00284] Guide RNA with MS2 aptamer (one MS2 aptamer inserted), scaffold 2; the first G is included for expression from the U6 promoter:
Gattcgtcggttcagcgacgataagccgagaagtgccaataaaacgtacaccatcagggtacggtggtttggtaacgctcggtaaggtag ccaaaaggctgaaactccgtgcacaaagaccgcacggacgcttcacatatagcttgtggagtgtgaacNNNNNNNNNNNNN NNNNNNN (SEQ ID NO:501)
[00285] Guide RNA with MS2 aptamer (one MS2 aptamer inserted), scaffold 3; the first G is included for expression from the U6 promoter:
Gattcgtcggttcagcgacgataagccgagaagtgccaataaaagtggtttggtaacgctcggtaaggtagccaaaaggctgaaactccgt gcacaacgtacaccatcagggtacgagaccgcacggacgcttcacatatagcttgtggagtgtgaacNNNNNNNNNNNNNN NNNNNN (SEQ ID NO:502)
[00286] Guide RNA with MS2 aptamer (one MS2 aptamer inserted), scaffold 4; the first G is included for expression from the U6 promoter:
Gattcgtcggttcagcgacgataagccgagaagtgccaataaaagtggtttggtaacgctcggtaaggtagccaaaaggctgaaactccgt gcacaaagaccgcacggacgcttcacatatcgtacaccatcagggtacgagcttgtggagtgtgaacNNNNNNNNNNNNNN NNNNNN (SEQ ID NO:503)
[00287] FIG. 2 depicts SSAP with dCas12f1(D225A) protein as a compact editor for precise large knock-in (knock-in of an mKate transgene).
[00288] FIG. 3 depicts SSAP with different versions of dCas12f1, using scaffold no. 4, which gave the best signal-to-noise ratio.
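Each scaffold above ends in a run of N placeholders for the guide sequence. This run is 20 nt as listed, and the text allows 15 to 35 nt. A minimal sketch that substitutes a concrete spacer into scaffold 4 (SEQ ID NO: 503), written as DNA with the line-wrap space removed. `fill_spacer` is hypothetical, and the ACGT-repeat spacer is a placeholder, not a real target. The leading-G check reflects the note that the first G is included for U6 expression.

```python
import re

SPACER_RANGE = (15, 35)  # guide length range stated in the text; 20 nt was used

# Scaffold 4 (SEQ ID NO: 503), line-wrap spaces removed, 20-nt N placeholder
SCAFFOLD_4 = (
    "Gattcgtcggttcagcgacgataagccgagaagtgccaataaaagtggtttggtaacgctcggtaaggtag"
    "ccaaaaggctgaaactccgtgcacaaagaccgcacggacgcttcacatatcgtacaccatcagggtacgag"
    "cttgtggagtgtgaac" + "N" * 20
)


def fill_spacer(scaffold: str, spacer: str) -> str:
    """Replace the trailing N placeholder run of a scaffold with a spacer."""
    placeholder = re.search(r"N+$", scaffold)
    if placeholder is None:
        raise ValueError("scaffold has no trailing run of N placeholders")
    if not scaffold.startswith("G"):
        raise ValueError("expected a leading G for U6-driven transcription")
    lo, hi = SPACER_RANGE
    if not lo <= len(spacer) <= hi:
        raise ValueError(f"spacer length {len(spacer)} outside {lo}-{hi} nt")
    if set(spacer.upper()) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G, T")
    return scaffold[: placeholder.start()] + spacer.upper()


guide = fill_spacer(SCAFFOLD_4, "ACGTACGTACGTACGTACGT")  # hypothetical spacer
print(guide[-36:])
```

The spacer length does not have to equal the placeholder length, so shorter or longer guides within the stated range are accepted.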
Example 17
[00289] Table 5 provides representative SSAP proteins for use with Cas12f according to the invention.
[Table 5 appears as images in the original document and is not reproduced here.]
Example 18
[00290] A total of 322 SSAP proteins were identified from sequence data for use in systems and compositions comprising Cas12f and/or other nucleases. The SSAPs were screened for activity with Cas9 and dCas9. Gene editing activities of the top-scoring SSAP proteins are shown below in Table 6, followed by the amino acid sequences of those proteins. The table shows editing efficiency as the normalized average over two targets (HSP90 and ACTB), absolute editing efficiency, and cell viability. SSAP proteins are identified by UniParc identifier and SEQ ID NO.
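The text reports editing efficiency as a normalized average over the HSP90 and ACTB targets but does not say what each target is normalized to. A minimal sketch, assuming each target's efficiency is divided by a reference condition measured on the same target before averaging. The function name and the numbers are hypothetical and illustrative only; they are not values from Table 6.

```python
from statistics import mean


def normalized_average(sample: dict[str, float], reference: dict[str, float]) -> float:
    """Mean over targets of sample efficiency divided by reference efficiency.

    Assumption: each target is normalized to a reference condition measured on
    the same target; the source does not specify the reference.
    """
    if sample.keys() != reference.keys():
        raise ValueError("sample and reference must cover the same targets")
    return mean(sample[target] / reference[target] for target in sample)


# Illustrative numbers only, not values from Table 6.
sample = {"HSP90": 0.2, "ACTB": 0.1}
reference = {"HSP90": 0.4, "ACTB": 0.1}
print(normalized_average(sample, reference))
```

Normalizing per target before averaging keeps a target with high baseline efficiency from dominating the combined score.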
[Table 6 appears as images in the original document and is not reproduced here.]
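Each entry below is headed by a UniParc identifier (UPI followed by 10 hexadecimal characters), an optional description, and a SEQ ID NO. A minimal sketch of a header parser under that assumption. `parse_header` is hypothetical, and malformed identifiers are rejected rather than guessed.

```python
import re

# Assumed header shape: UPI + 10 hex chars, optional note, "(SEQ ID NO: n)"
HEADER = re.compile(
    r"(?P<uniparc>UPI[0-9A-F]{10})\b(?P<note>.*?)\(SEQ ID NO:\s*(?P<seq_id>\d+)\)"
)


def parse_header(line: str) -> tuple[str, int, str]:
    """Return (UniParc ID, SEQ ID NO, description) from a listing header."""
    match = HEADER.search(line)
    if match is None:
        raise ValueError(f"not a UniParc header: {line!r}")
    return match["uniparc"], int(match["seq_id"]), match["note"].strip()


print(parse_header("UPI00000B3F97 Bet [Gammaproteobacteria] (SEQ ID NO: 178)"))
print(parse_header("UPI000009AF52 (SEQ ID NO: 174)"))
```

The two calls print ('UPI00000B3F97', 178, 'Bet [Gammaproteobacteria]') and ('UPI000009AF52', 174, '').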
UPI0000010203 (SEQ ID NO: 172)
ATNESLKNQLSTKKETGLGSAGNTIKGLMNSPAIKKRFEEVLKQRAPQYMSSIVNLVNS
DINLKKCDQMSVVASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL
RTGQYKSINVIEIHEGELIDWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTK
EQITKHKNKFSKSDFGWKKDFDAMARKTVLRNMLSKWGILSIEMQNAYTADQGIIKNEI
IETGEVKENIEYIEADFESYEDNSIEEGGANE
UPI00000105D3 (SEQ ID NO: 173)
ATNESLKNQLTTKKETGLGSAGNTIKGLMNSPAIKKRFEEVLKQRAPQYMSSIVNLVNS
DINLKKCDQMSVVASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL
RTGQYKSINVIEIHEGELIDWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTK
EQITKHKNKFSKSDFGWKKDFDAMARKTVLRNMLSKWGILSIEMQNAYTADQGIIKNEI
METGEVKENIEYIEADFESYEDNSIEEGGANE
UPI0000030D3A | HAW2682705.1 RecT [Escherichia coli] (SEQ ID NO: 167)
TKQPPIAKADLQKTQGNRAPAAVKNSDVISFINQPSMKEQLAAALPRHMTAERMIRIAT
TEIRKVPALGNCDTMSFVSAIVQCSQLGLEPGSALGHAYLLPFGNKNEKSGKKNVQLIIG
YRGMIDLARRSGQIASLSARVVREGDEFSFEFGLDEKLIHRPGENEDAPVTHVYAVARL
KDGGTQFEVMTRKQIELVRSLSKAGNNGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQRA
VSMDEKEPLTIDPADSSVLTGEYSVIDNSEE
UPI0000030D3E (SEQ ID NO: 166)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV
TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI
VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT
QAEAVKALGFLKQKAAEQKVAA
UPI000009AF52 (SEQ ID NO: 174)
ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN
QPAQIVVSRDFYRKRAFQNPNFVGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR
VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS
GTYGEEEYPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVEET GEVIDEEPLEGF
UPI000009B019 (SEQ ID NO: 175)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV
TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI
VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT
QAEAVKALGFLKQKATEQKVAA
UPI000009B628 (SEQ ID NO: 176)
STNDELKNKLANKQNGGQVASAQSLGLKGLLEAPTMRKKFESVLDKKAPQFLTSLLNL
YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL
ALRTGQYKSINVIEVRDGELLKWNRLTEEIELDLDNNTSEKVIGYCGYFQLINGFEKTVY
WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE
PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI
UPI000009BC15 (SEQ ID NO: 177)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKVLGFLKQKASEQKVAA
UPI00000B3F97 Bet [Gammaproteobacteria] (SEQ ID NO: 178)
EKPKLIQRFAERFSVDPNKLFDTLKATAFKQRDGSAPTNEQMMALLVVADQYGLNPFT
KEIFAFPDKQAGIIPVVGVDGWSRIINQHDQFDGMEFKTSENKVSLDGAKECPEWMECII
YRRDRSHPVKITEYLDEVYRPPFEGNGKNGPYRVDGPWQTHTKRMLRHKSMIQCSRIAF
GFVGIFDQDEAERIIEGQATHVVEPSVIPPEQVDDRTRGLVYKLIERAEASNAWNSALEY ANEHFQGVELTFAKQEIFNAQQQAAKALTQPLAS
UPI000019AB49 Bet [Escherichia coli] (SEQ ID NO: 179)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV
TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI
VENTAYTAERQPERDITPVNNETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKATEQKVAA
UPI000034E66D Bet [Lactococcus phage phiLC3] (SEQ ID NO: 180)
ANEIDIYDAKNLNTATVKKFLKGGGQASDEELAMLLAISRNQNMNPFMKEVYFIKYGS
AAAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTKDQELVGAWAR
VHLKNTEIPVYVAVSYDEYVQMKNGQPNSMWTNKPCTMLGKVAESQALRMAFPAEFS
GTYGEEEYPEPEKEPREVNGVKEPDRAQIESFDKENYAARKIEELKEKAQPQKEFVEEIG EAIDEITAEDF
UPI00005F0A78 (SEQ ID NO: 181)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV
TEWMDECRREPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI
VENTAYTAERQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKASEQKVAA
UPI000150D6AC (SEQ ID NO: 182)
ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN
QPAQIVVSRDFYRKRAFQNPNFVGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR
VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS GTYGEEEFPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVEETG EVIDEEPLEGF
UPI0001594E53 (SEQ ID NO: 183)
TTALQTLTNKLAERFEMGDGSGLVETLKSSAFAGATVSDAQMIALLVIANQYQLNPWT KEIYAFPGGNGGLTPIVGVDGWVRIINREPQYDGMEFHFTDDYSACTCTIYRKDRSKPIV VTEFMGECKKSSPAWNSHPKRMLRHKAMIQCARLAFGFTGIYDQDEAERIAENEKPPK NITPQNNVVETTAVELISEEQLSQIRQLMQVTGTEEAKILAYIGVQALNQIPKSQAEAVIK KLNLTLDKQNAEKADNGESVGEEIPL
UPI00015968D7 (SEQ ID NO: 184)
AKNELAKGSYLTDLQKLDGNTLRDFVDPKHQASPQELQALLAIVKGRNLNPFTKEVYFI
KYGSAPAQIVVSKEAIMKRAEENPDFDGFEAGIVVETKDGAIERLTGTIVPKSATLRGGW CKVYRKDRSHAIEADADFAYYTTSKNLWQKMPALMIRKVAIVSAFREAFSESVGGLYT ADEMQRETQAEVRARKMKEAYEEKLYLLTQMEAKSYKKTKSKNENEAKKTKEAEAIE TVEEPTQDGNLEW
UPI00015C01AE (SEQ ID NO: 185)
MAKENYSDPNGKLLNSITTFEVNGEEVKLSGNIIRDYLVSGNAEVTDQEIIMFLQLCKYQ
KLNPFLNEAYLVKFKNTKGPDKPAQIIVSKEAFMKRAETHEQYDGFEAGVIVERGGEIIE LEGAVSLASDKLLGGWAKVFRKDRNRPVSVRISEKEFNKRQSTWNTMPLTMMRKTAV VNAMREAFPDNLGAMYTEEEQGSLQNTETSVQQEIKQNANAEVLDIPSQQNEVPDFKE VREPEHVEMPPIYGEQQSTPPARPY
UPI00015C02E0 (SEQ ID NO: 186)
ATNDELKNQLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL ALRTGQYKSINVIEVRDGELLKWNRLTEEIELDLDNNTSEKVIGYCGYFQLINGFEKTVY WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI
UPI00019E1F9A (SEQ ID NO: 187)
PQEIAKVEYTAADGQEVRLTPGVIAKYIVSGNGLASEKDIYSFMARCQARGLNPLAGDA YMTVYQGKDGNTSSSVIVSKDYFVRTATAQDSFDGMEAGVTVLNGQGQIQKREGCEFF PSLGEKLLGGWAKVHVKDREHPSKAAVTMDEYDQHRSLWKSKPATMIRKVAIVQALR EAYPGQFGGVYDRDEMPPSQEPQQVPVEVYEAPEAYETPDNQNRATEEF
UPI0001BEF484 (SEQ ID NO: 188)
NTPTMKRKFEEVLHENANAFMSNVMTLVSNDSYLAESEPMSILSGALTAATLNLGLDK
NLGYAYLVPFNTKNKQTGKWERKAQFILGYKGYIQLAQRSGKYKALNVIEVYEGELLS WNRLTEEFEFDPNGRQSDDVIGYVGYFELLNGFKKTVYWTKQEIEAHRIANSKDKEKTK LSGVWATDYNAMARKTVLRNMLSKWGILSIEMQEATTSDEKVQQMQEDGNIISETEVE ENTTMKTAEVINEADSDSLNQTDLFDTKNPPLE
UPI0001CE597A CK3_26380 [butyrate-producing bacterium SS3/4] (SEQ ID NO: 189)
ENATAVQQAESQGTQDFSAPVKHNTDFSLGIFGSSDNFLMATQMAKAFASSTIVPKEYQ GNFANGLVAMDIANRLKTSPFMVMQNLDVIQGRPAWRATFLIAMINRSKKYDIELQFEE KRDKNGKPYSCTCWTTKDGRKVTGIEVTMDMAEAEGWTKKNGSKWITMPQVMLRYR AASFFSRMNCPELSNGLYTTDEVYEMADSEYKVYNLEDEVKRDLAQNANKEEFVAPPN ETAPESESKGSEPLDPAVENQKSGDTPDWMKPETM
UPI0001D2DF22 RecT [Cellulosilyticum lentocellum] (SEQ ID NO: 190)
SDKKELVLKETHSRLNQLLATKMEAMPKDFNQTRFLQNCMTVLQDTKGIENCHPVSIA RTLLKGAFLGLDFFQRECYAIPYGGELQFQTDYKGETKMAKKYSIRDIKDIYAKVVRKG DEFKEEIVAGQQVVDFKPLPFNDAEIIGAFAVVLYQDGGMEYETMSTKQIEGIRDNFSK MKNGLMWTKTPEEAYKKTVLRRLTKKIEKDFASIDQAKAYEESSDMQFKQDEQKQDA
KDPFADAVDVEFTEETEGQVRLDGEADGAK
UPI0001E0C499 (SEQ ID NO: 191)
SNELMTKAVTYEVNNEEVKLSGQIVKQYLTSGQAVTDQEVTMFIQLCRYQHLNPFLNE AYLVKFNGKPAQIITSKEAFMKRAESNPNYAGLKAGCIVERNGELIYTEGAFTLKTDNIL GAWADVIRKDRREPTHVEISMDEFSKSQATWKSMPATMIRKTAIVNALREAFPQDLGA LYTEDDKNPNEATQTTYKQEPEVNTTKTADVLAKKFSGAPQIKSVENVQESEEESNNAS
NHGEATEPVNNVEEPTATAEVEQGQLL
UPI0001E2AFC1 (SEQ ID NO: 192)
TNNQLATQIKRDITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYQNNSGTEFSLIVSKEAFMKRAERCEGYDGFEAGITVMRNGEMIEIEGSLKLPEDIL IGGWAVVYRKDRSHRYKVTVDFNEYVKTDRNGNPRSTWKSMPATMIRKTALVQTLRE AFPDELGNMYTDIDGGDTFDAIKDVTPQESREDVVARKMAQIDQFNKEQEANHADPEP
TQNEDPIQGELLDGELEY
UPI0001E35ACE (SEQ ID NO: 193)
TNNQIVEAKGDFLTNPQLLNSGIIRKYLDPQGKASDEELAYFIAQAKAQNLNPFTKEIYFI KYGTQPAQIVTAKSAFEKKADSHPQFDGKEAGVIYLMDGEIKYSKGAFIPKGAEILGGW AKVYRKDRTYPTETEVSFEEYDNSKIRARVKELTQQGKDVTYPVMNSYGKPIGENNWD TMPCVMIRKVALVSAYREAFPAELGASYEADEIQLDNTPKDVTPQESREDVVARKMAEI EQFNKEQEANHADPEPTQNEDPIQGELLDGELEY
UPI00020BA2E0 (SEQ ID NO: 194)
NNEVMEKSVEYEVNGNSVKLTPNMIKQFITKGNADVTDQEAIMFMKLAEQQQLNPFLN EVYLIKFKGKPAQNIVAKEAFMKRAEKHSEYDGLEAGIIVQRGEEIKELPGAVCLPTDNL LGGWARVYRKDRKNPFYVQLDFKEFSKGQATWNQMPKNMIRKTAIVNALREAFPEAL GAMYTEDDARLEEVKTAEPIKEKAETTQILENKFKELSENGQTEVGDEQTNESTEPEPTA KQEQLL
UPI000212F382 (SEQ ID NO: 195)
TVQLVQPRNSDEYDFDQTKLDLIKRTICKGATNDELQLFIHACKRTGLDPFMRQIFAVKR WDSSTKKEIMTIQTGIDGYRLIADRTGKYAPGKDTEFGYDNKGNIRWAKAYIKKMTPD GQWHEISAIAFWEEYVQTTREGKSTLFWLKKSHIMLSKCTEALALRKTFPAERSGIYTKE EMAQEFSPLEEHLVERIAASRNDQGRS
UPI00022F8B4D (SEQ ID NO: 196)
SNNQLSTQQAKRDIAIDTSVWTFQDVKRYFDPQNLLTEKQVGQALSLIKGRNLNPLANE
VYIVAYKKKTGGTEFSLIVSKEAFLKRAAQNPNYEGFEAGVVTVDTDGVMHERKGALM
LPGDTLVGGWARVYRKNFKVPVEIFVSREEYDKKQSTWNAMPATMIRKTALVNALRE
AFPEDLGNMYTEDDGGETFDRIKQAEPVESREDVMARKMAQIEQMKQEQAQRQIDTSY PTDDVIDPDDEPAQGELLEDLEY
UPI0002314B74 (SEQ ID NO: 197)
AITPNPIPAQDGSPIPSPDDIVGELARRKIYAGIPDDDVALALALCQKYGFDPLLKHLVLL
ATKDRDETTGQGQKHYNAYVTRDGLLHVAHTSGMLDGLETIQGKDDLGEWAEAVVY
RKDMSRPFRYRVYLSEYVREAKGVWKTHPQAMLTKTAEVFALRRAFDVALTPFEEMG
FDNQNIAGDTGPSPKTGFTEKAGFTGNTDFSAEASLPGKARFSTEAGLTDMTVIPPNRVT GSIPETSRLNTSAGSTGRQRRQLF
UPI00025CAD2E (SEQ ID NO: 198)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDDESCTCRIYRKDRNHPICV
TEWMDECRRAPFKTREGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAER
IVENTAYTTERQPERDITPVNEETMSEINALLTSMEKTWDDDLLPLCSQIFRRYIRASSEL SQAEAEKVLGFLKQKATEQKVAA
UPI00025CF49A (SEQ ID NO: 199)
EKPKLIQRFAERFSVDPNKLFDTLKATAFKQRDGSAPTNEQMMALLVVADQYGLNPFT
KEIFAFPDKQAGIIPVVGVDGWSRIINQHDQFDGMEFKTSETKVSLDGAKECPEWMECII
YRRDRSHPVKITEYLDEVYRPPFEGNGKNGPYRVDGPWQTHTKRMLRHKSMIQCSRIAF
GFVGIFDQDEAERIIEGQATHVVEPSVIPPEQVDDRTRGLVYKLIERAEASNAWNSALEY ANEHFQGVELTFAKQEIFNAQQQAAKALTQPLAS
UPI0002AD92E7 (SEQ ID NO:200)
TTVNQTELKNKLAEKAKTPAKTGNTVFDLIRKMEPEIKRALPKQISPERFARIAMTAVRN
TPKLQACEPISFIAALMQSAQLGLEPNTPLGQAYLIPYGKEVQFQLGYQGMLTLAYRTGE
YQSIYAMPVYANDEFEYEYGLNEKLVHKPAPDPEGEPIYYYAVYKLKNGGHGFVVMSR
QQIERHRDKYSPSAKQGKFSPWNTDFDSMAKKTVLKQLLKYAPKSVEFATQIAQDETIK TEIAEDMTEVQGIEVEYEATDDQENQENQEQED
UPI0002B78771 (SEQ ID NO:201)
EFETDEEEKEMSNNQLSTQQAKRDIAIDTSVWTFQDVKRYFDPQNLLTEKQVGQALSLI
KGRNLNPLANEVYIVAYKKKTGGTEFSLIVSKEAFLKRAAQNPNYEGFEAGVVTVDTD
GVMHERKGALMLPGDTLVGGWARVYRKNFKVPVEIFVSREEYDKKQSTWNAMPATM
IRKTALVNALREAFPEDLGNMYTEDDGGETFDRIKQAEPVESREDVMARKMAQIEQMK QEQAQRQIDTSYPTDDVIDPDDEPAQGELLEDLEY
UPI0002B78B34 (SEQ ID NO:202)
TTNQVVTHKNFFNAPNVQKSFDDVWKGAGVQFATSILSVIQGNASLKSASNESIMTSAM
KAAVLNLPIEPSLGRAYLVPYKGQVQFQLGYKGLIELAQRSGKYKSINAGPVYKSQFVS YDPLFEELTLDFTQPQDEVIGYFASFSLLNGFRKLTYWTKAEVEAHGKKFSKTFGNGPW KTDFDAMARKTVLKHILSIYGPLSVEMQTGMQNDESENDNATRDIKTAEPVNADQQLL EDLMNVDTETGEILEEVSELKDNGELDLKYEDPNAR
UPI0002B884F0 / WP_003158887.1 Bet [Pseudomonas aeruginosa] (SEQ ID NO:203)
GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT
KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI
SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP
AEQYEDVSEAICLIKDSPTMEDLQSAFSNAWKAYKTKGARDQLTAAKDQRKKELLDAP IDVEFEETGDDRAA
UPI0002CB4A67 / WP_010792303.1 Bet [Pseudomonas aeruginosa] (SEQ ID NO:204)
GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT
KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI
SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP
AEQYEDVSEAVCLIKDSPTMEDLQSAFSNAWKAYKTKGARDQLTAAKDQRKKELLDA PIDVEFEETGDDRAA
UPI0002E4C0BF (SEQ ID NO:205)
SSIAAAAESAEVTPASIINKYRDDIATVLPPKLRERIDRWIRLAIGAVNSNPELISRVRADQ
GASMMQALMKCAALGHEPGSGLFHLVPKGSRIEGWEDYKGILQRIDRSGVYARTVIGV
VYANDEYSYDQNVDERPRHVRATGDRGEPISSYAYAVYPSGAITTVAEATPEQIASSKS KARGADNAASPWRAPGAPMHRKVAVRLLEKHVATSAEDRREPISRSAANDVVIDATA DYYQEP
UPI0003282677 (SEQ ID NO:206)
TNELTQTKGAYLTDLQKLDGATLRNFVDPKHQASPQELQTLLAIVKNRNLNPFTKEVYF
IKYGNNPAQIVVSKDAFMKRAEQNQNYDGFESGIIYEDASGELKNKKGVILPKNCTLIGG
WCEVYRKDRTRPVYREVELSAYNTGKNWWQKAPGQMIEKVAIVAAVRDSFSEDVGGL
YTSEEMEQAAPIDVTPQESQEEVRTRKMAQIEEMKREQEKHQSSAYPEDEIPNFEDEPLQ GELLEEMEY
UPI00033853AF (SEQ ID NO:207)
NERTNLQYAPAPVERFKECLNSHEIKARLKNSLKNNWTQFQTSMLDLYSGDAYLQKCD
PMAVALECVKAATLDLPISKSLGFAYVVPYNNVPTFTLGYKGLIQLAQRTGQYRTINAD VVYEGEIRGADKLSGMVDLSGERTGDEVVGYFAYFKLINGFEKMIYMTRAEAEKWRD
DYSPSAKSKYSPWRTDFDKMALKTCIRRLISKYGIMSVEMQGVMTEEAEPRAAAAAKR AEETVQANANSKVIDIDAAPPAANESPAEAAPQPDF
UPI0003427695 (SEQ ID NO:208)
DYVTKIQEVLNRLLDAKHDALPSGFKKTRFSENCRAYVKEYTDLQKYDEEEVALVLFK
GAVLGLDFLAKECHVITEGSALRFQTDYKGEMALVKKYSVRPILDIYAKNVREGDVFRE
EISEGKPLIHFNPLAFNNSQIIGSFAVALFSDGGMVYETMPAEEIESIRRNYGKNPGSDTW
EKSQGEMYKRTVLRRLCKTIEIDFDAEQSLAYEAGSSFEFNREPQPKKRSPFNPPEVEESE VLSDDGTSEAE
UPI000353091F (SEQ ID NO:209)
SNALTITQDQTEFTPKQLSVLENLGVQGAAPQEVAMFFDYCQRTGLSPWARQIYMIGR
WDRNLGRKKYAVQVSIDGQRLVAERSGVYEGQTAPQWCGPDGQWVDVWLANEPPQA
ARVGVWRKSFREPAYGVARLSSYMPVTRDGKPQGLWGTMPDVMLAKCAESLALRKA
FPLELSGLYTSEEMQQADAPRTEPAPVDEDVVDAEIVDDEERMQWVEAIQAAETTDVL
RKMWADIKTCPDALQAELRELIPARAKELAA
UPI000386D631 (SEQ ID NO:210)
IECAKLGLEPNNILGQAYLVPVCVDGVNKVEFQLGYKGLIELAYRSGKIKSLYANEVFE
KDEFHIDYGLDQKLIHKPFLGGDRGEVIGYYAVYQMDNRGASFVFMTRDEILGHSRKYS
RSFGCDLWESEFDAMAKKTVIKKLLKYAPLSIELQKSVSVDESVKGIGCIGVI
UPI0003E3D237 (SEQ ID NO:211)
GTALTPLLTKFATRYEMGTTPEEVANTLKQTCFKGQVNDSQMVALLIVADQYKLNPFT
KELYAFPDKNNGIVPVVGVDGWARIINENPQFDGMEFSMDQQGTECTCKIYRKDRSHAI
SATEYMAECKRNTQPWQSHPRRMLRHKAMIQCARLAFGFAGIYDQDEAERIVERDVTP
GEPVEDVTEALSLINSAPTMDDLQAAFSDAWKAYKSKGARDQLTVAKDQRKKELLEAP IDVEFEETGDDRAA
UPI00044F7143 (SEQ ID NO:212)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT
KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV
TEWMDECRREPFKTREGKEIIGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAERI
VENTACTAEHQPERDITPVNDETMQEINTLLIALDKTWDDDLLPLCSQIFRRDIRASSELT QAEAVKALGFLKQKATEQKVAA
UPI0004995B90 (SEQ ID NO:213)
TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV
YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRKNFKVPVEIFVSREEYDKKQSTWNTMPATMIRKVALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAQIEQFNKEQEANHADPEPAQ TEETIQGELLDGELEY
UPI00051F5876 (SEQ ID NO:214)
SNREIEVIRACSKAGNNGGSSPWDSFPDEMARKAIVKRASKYWPRRDRLDTAIDYLNTQ GGEGIILNADHIPERDVTPASDEIINEITQAITEINKTWDDLLPLCSKTFRRTIASHEYLSQE EAVKTLDFVKKKAARNKATAEAKIHATTENNSEAVS
UPI000588C848 (SEQ ID NO:215)
ATNDELKNKLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL ALRTGQYKSINVIEVREGELLKWNRLTEEIELDLDNNTSEKVVGYCGYFQLINGFEKTVY WTRKEIEAHKKKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTDDESIPDIIEAPITPSDTLEAGSVVQGSMI
UPI000598CD40 (SEQ ID NO:216)
TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRI<NFI<VPVEIFVSREEYDI<I<QSTWNTMPATMIRI<VALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAQIEQFNKEQEANHADPEPAQ TEEPIQGELLDGELEY
UPI0005DCEBAD (SEQ ID NO:217)
TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRI<NFI<VPVEIFVSREEYDI<I<QSTWNTMPATMIRI<VALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAEIEQFNKEQEANHADPEPAQ TEESIQGELLDGELEY
UPI0005E4CB74 (SEQ ID NO:218)
TNNQLATQTKRNITTDPSLLTGADIKKYFDPQNLLTEKQVGQALALCKGRNLNPFANEV YIVAYTNRNGGKEFSLIVSKEAFLKRAAQCKDYEGFEAGVVVVDSEGVMHERKGAIML PEDTLIGGWARVHRI<NFI<VPVEIVVSREEYDI<I<QSTWNTMPATMIRI<VALVNALREAF PEDLGNMYTEDDGGETFDRIKDVTPQESREDVVARKMAEIEQFNKEQETNHADPEPAQ TEETIQGELLDGELEY
UPI0005FEB4B0 (SEQ ID NO:219)
NEIQAYDKINDRDGMEMLGAAIQRSGMFGAETKEQGIILALQCMVEKKPPLEMAKNYH IIQGKLSKRADAMLADFRKAGGKFIFADLKNPTVQKAKVTFEDYKDFDVEYSIDDAKTA GVYNAKGAWVKYPGAMLRARLVSETLRAIAPEIVTGVYTPEELETPINAKPELKCAQPV KAKPEPKKAQPDVIEATVCESELDAKLVELIGDREQIVNLYWEKKGLIDGLDTTWRDLN
DDTKRKMIDQFDQFMDAAQRKAAQ
UPI00062002D2 (SEQ ID NO:220)
AENEKQALLQEENKSENVVSTVKRTALATNPFSDTDQFNNIFKMAQLISQSDMIPATYK GKPMNCVIALEQANRMGVSPLMVMQNLYVVKGVPSWSGQGCMMIIQGCGKFRDVDY VYSGEKGTDSRSCKVVATRISDGKRIEGTEITMQMVKSEGWISNTKWKNMPEQMLGYR AATFFARMYCPNELNGFATEGEAEDMNHKPQRIEAINVLGDTAHE
UPI00064B44C1 (SEQ ID NO:221)
TIMDLLNDPKMKSQIQRALPNGMSAERIARIALTALRMNPQLQECSPQSFAAALMTSAQ LGLEPNTPLGHAWLIPRKNHGKMEVQFELGYKGMLDLVRRSGMITAIFAEEVREKDEFE FEYGTNPYLKHKPYLGGDRGKVLFYYAVATFKDGGYAFKVMSIPEIEEARKLSQSANSP YSPWNRFYDEMAKKTVLKRLCKYLPLSIEVQRNLAQDETIRTQIEADDILDLPNENEFEV VEVEEIPGEEEKEEAKEGPFPNKALRESPTPLT
UPI00064D5E13 (SEQ ID NO:222)
STALTTLTSQLSQRFKLDGGEELLTTLKQTAFKGQVTDAQMTALLIVANQFGLNPWTKE IYAFPDKNGGIVPVVGVDGWARIINEHPQFDGMDFEMDGEQSCTCVIYRKDRTRPIRITE YMAECKKTGGGPWQSHPRRMLRHKAMIQCARMAFGFGGISDEDDAERIREKDITPQAE VVPKALEPYPADKFEENFEQWKSLIESGRRSADDVIAKIKSRNTMTDEQETRLRACGGE EGKTYENA
UPI00065C2D47 Bet [Pseudomonas phage PS-1] (SEQ ID NO:223)
SNVATIKPSSLSARMAERFGVDPNEMMATLKATAFKGQVSDAQMQALLIVADQYGLNP WTKEIYAFPDKGGIVPVVGVDGWSRIINENGAFDGMDFQQDDESCTCIIYRKDRNHPIK VTEWMAECKRNTQPWQSHPKRMLRHKAMIQCARLAFGYTGIFDEDEAQRIVEKDVTP AVNEPDITPALEAIKNASSMEELHAAFKAAWNQHPSARARLTAVKDERKKALSEPIEGE LVENEDGPAQQ
UPI00067A7349 RecT [Streptococcus phage APCM01] (SEQ ID NO:224)
AKNELVKGEYLTDLQKLDGNTLRNFVDPKHRASPQELQALLAIVKNRNLNPFTKEVYFI KYGSAPAQIVVSKEAIMKRAEENPNFDGFEAGIVIETKSGSIERLTGTIAPKRAELRGGWC KVYRKDRSHAIEADADFAYYTTGKNLWQKMPALMIRKVAIVSAFREAFSESVGGLYTA DEMEQNNTQETQEEVRARRMKQAYEEKLRLLTEMEAKSYKKVEDESASKEIEAAKTTK NTKEVEVIEETEVTEEPTQEDSLEW
UPI0006CE3F5D (SEQ ID NO:225)
STALATLAGKLAERVGMDSVDPQELITTLRQTAFKGDASDAQFIALLIVANQYGLNPWT KEIYAFPDKQNGIVPVVGVDGWSRIINENQQFDGMDFEQDNESCTCRIYRKDRNHPICV TEWMDECRREPFKTRDGREITGPWQSHPKRMLRHKAMIQCARLAFGFAGIYDKDEAER IVENTTYTTDRQPERDITPVSDETMREINDLLITMNKTWDDDLLPLCSQIFRRDIGASSDL
TQIEAVKALGFLKQKAAEQKVEA
UPI00078E90BE RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:226)
SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLVPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE
UPI00078EBE91 RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:227)
SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK LADCTPESFMRCLLDLSSWGLEPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI IDAVSVAVTQRLSKAAMPLIGANETGDTE
UPI00078ED021 (SEQ ID NO:228)
SEIQQQAEAQTQAHPTAVLDDYRGAIASVAPPGTNIDLFIRMTKSNVNRSDEIVAAVKR NPGLFMQAVMDSAALGHIPGSEYYYLTPRRDGISGIESWKGVAKRIFNTGRYQRIVCEV VYEGEQWEFQPGEDLKPKHVIDWDARQVGSKVRFTYAYAVDFEGNPSTVAVCTKLDL DKAQKQSRGKVWDQWYEQMAKKTAIKRLEDFVDTSAVDLRADGSSRRHSAEVAE
UPI000795D815 (SEQ ID NO:229)
ASKNEAIEVSPAEIASVKEKPASIVKAEKAKKEPCALVKYEDAEGREVVLTREDIINTISS NPRITDKEIKLFIELARAQKLNPFTREIFITKYGDYPATFIVGKDVFTKRAQSNPLFKGMQ AGIIVQRGNAVDQREGSATFGDEMLIGGWCKVYVQGYDVPIYDSVSFNEYAARKTDGT LNAMWASKPATMIRKVAIVHALREAFPSDFQGLYDQSEMGLSGQGGE
UPI00079B135B (SEQ ID NO:230)
ATSLKRAVTGDGKPATVQQLLTNPKIKSQIALALPQHLTPERLTRIVLTEIRRTPALAKCK PESLLAAVMQCAQLGLEPGGSLGHAWLMPFKNEVQFIIGYRGMIDLARRSGQVLSIEAR GVYESDTFHVSFGLEPDLTHQPDWDPADRGKLAFVYAVARLKDGGFQFDVMSRAEVE KIRAQSPAGKSGPWVTHFEEMAKKTVIRRLFKYLPVSVEMARAVGLDEAAERGEQSDA IDADCVIESEEEATPEEKGDSAA
UPI0007B45EC7 (SEQ ID NO:231)
SEISKAVATQQNPLAVVARYKRELGTVLPTVLRQDPDRWLMAAENAARKNPDIMAVT KADQGASYMRALVECARLGHEPGSKDFHFIKRGNAISGEESYRGIIKRVLNSGFYRSVV ARTVFSNDTYSFDPLTDIVPNHVPAQGDRGKPLSAYAFAVHWDGTPSTVAEATPERIAT AKAKSFASDKPTSPWQLPTGVMYRKTAIRELEPYVHVAPEPQPRRHLDGTVGGIPATDF DVDDGDVLDITADQLAEAGEIV
UPI0007B642FE (SEQ ID NO:232)
SELQQAAQGQADAGPVQVIYSHAKEIQNVLAKGTDMDRWLQMARLAVMRDPNLVNA AKRDPGSLMQAMLDCAEKGHIPGTEDYYLVPRKGGIQGMESWKGIAKRIMRSGRYQSI VAEVVYEGEDFDFNPNTMDRPVHQIKYMARTSGQPVLSYAYAVDHEGKPSTIAVADPR YIAKVKANSKGTVWADWDEAMYKKTAVKMLVDYVDTSSTDRRGVSTVQVDGPVGTF IDGVLEIEGGDQ
UPI0007B64693 (SEQ ID NO:233)
SELQQAAQGQQSNNPVSFIYSHAKDIQNVLTKGTDMDRWLQMARLAVMRDQNLVASA KRDPGSLMQALLDCAEKGHVPGTEDYYLVPRKGGIQGMESWKGIAKRIMRSGRYQSIV NEVVYEGETFEFNPNTMDRPVHNINYMTRTSGKPVMSYAYALDHDGKPSSVAIADPRYI AKVKKNSRGSVWEDWDEQMYRKTAVKMLQDYVDTSSVDRRDVSTVQVDDANVIDV DDTFTAREAGE
UPI0007BCAEAB (SEQ ID NO:234)
TQDLATAIADQQPAQRRTAFDLVESMRGELHKALPEHASIDNFLRLALTELKMNPQLGN CSGESLLGALMTAARVGLEVGGPLGQFYLTPRRLKRDGWAVVPIVGYRGLITLARRAG VGQVNAVVVHEGDTFREGASSERGFFFDWEPAVERGKPVGALAAARLAGGDVQHRYL SLAEVHERRDRGGFKDGSNSPWATDYDAMVRKTALRALVPLLPQSTALSFAVQADEQ VQRYDAGDIDIPALDETDTEDTK
UPI0007F13B78 (SEQ ID NO:235)
TNQLAHKDFFNTPAVKQKFQEVLNGNERQFTASLLSIVNNNKLLARASNTSIMTAAMK AAVLNLPIEPSLGFAYIVPYGQDAQFQLGYKGLIQLAIRSGQFKAINSGKVYKAQFKSYD PLFETLDIDFTQPEDEVYGYFATFELVNGFKKLTFWTKEQAESHGKRFSKTYAKGPWST DFDAMAQKTVLKSILSKYAPLSTEMQEGLISDNQTEEVETDPIDVTPKNEDTQTLLSDLM SDEAESETEKV
UPI000865F43D (SEQ ID NO:236)
TSQQLDTTHTINQQVTTFRHTLVQMKNEIAAALPAHMTGDRFLRLILTEVRKNPELAECS TESIFGGILTAAALGLEPGLNGECWLIPRKVGKGPGSRKEATFQVGYKGIIKLFWQNPLA SYLDTGVVYANDAWKFRKGLDPILEHTPATGDRGAVRGYYAVVGLTTGARIFDFFTPK QISALRGTAGPNGGISDPEHWMERKTALLQVMKMAPKSTDLASAASVDGTVQTVEAA AQVAAASTGPVNPTTGEVLEAEPVEGGAA
UPI000865FB15 (SEQ ID NO:237)
TQQMPIEAQGEPTKELQQKAAVDRFNATLHQMQNEIARALPKHMTGDRFVRIVLTEVR KDPTLALCDPLTMFGSLLTAAALGLEPGLNGECWLVPRKNHGTLEAQLQVGYRGVVKL FWQNPAAAYLDTGYVCERDHFRFAKGLNPILEHTPAEGDRGKVVRYYAVAGLNTGAR VFDVFTPAQIKTLRGGKVGSNGDIPDPEHWMERKTALLQVLKLMPKSTQLAAVPAADG RAHTISDAQQIFGGVDTSTGEVLEAEPVEGDAA
UPI0008D18539 (SEQ ID NO:238)
ETIDIKQELASQAQTDSKKEVKLTKAMSIAEMIKAMMPEIKRALPSMITPERFTRIALSAL NNTPELQACTPMSFISALLNAAQLGLEINSPLGHAYLIPYKNKGVLECQFQIGYLGLIALA YRNELMQTIQAQCVYENDEFLYEYGLNPKLVHRPATSDRGEPVFFYGLFKMINSGFGFC VMSKQEMDEFARTYSKGLASSFSPWKTSYNEMAKKTVIKQALKYAPIKTDFQKALSTD ESIKYAISEDMTEAVNEIVSQNTEVA
UPI0008D990CB (SEQ ID NO:239)
SNLKNQLANKAGGTATKKQPQTMQDWIKVMEPQIKKALPSVITAERFTRMALTAISTNP
KLAECTPESFMGALMNAAQLGLEPNTPLGQAYLIPYGKSVQFQVGYKGLMELAQRSGQ FKSIYAHTVYENDEFEVEYGLTQNIVHKPNFDDRGKPIGFYAVYKLTNGGENFVFMTQR EVEEFGKAKSKTFNNGPWKTDFEAMAKKTVLKQLLKYAPIKVEFQREIAQDATIKTEIA EDMTEVPEEMVEAEYEVVEQNTMAEDADLKGTPFETK
UPI0008E12231 (SEQ ID NO:240)
SNNELLAKPVEFEVNGEAVKLTGKTVKNFLVSGNGEVSDQEVVMFINLCKYQKLNPFL NEAYLVKFKSKSGPDKPAQVIVSKEAFMKRAEKHPNYEGFEAGIIVERDGQLVDIEGAIK
LTNDKLVGGWARVYRSDRQKPITTRISLSEFSKGQSTWNSMPLTMIRKSAIVNAQREAF PETLGALYTEDDAKLDTTSSHDQEQVIEQEIKTKANQEVIDVEYTEESEQKSPQQEQTET TQAGPGF
UPI0008EA8633 (SEQ ID NO:241)
ATNSSLKNQLSKKENVTIGNTMQGLLNNPKMKKRFEEILDKKAPQYMSSILNLYNGDTS LQKCEPMSVLSSSMIAATMDLPVDKNLGYAWIVPYKNKAQFQMGYKGYIQLALRTGQ YKHINAIEIHEGELVNWNPLTEELEIDFTKKESDKIIGYAGYFELLNGFKKSTYWTKTQIE NHRKKFSKSDYGWNKDFDAMAIKTVIRNMLSKWGILSIEMQNAYTADENIIKDSFIDDS ENVSANIEDLVEADYTVNQDSLESKEEFEGTPLE
UPI00091F1EB0 (SEQ ID NO:242)
KKMTVMKTSAPLCYADVAEVKCEEFYEDQYKAGAEELFDNTSYDRLKVYLEKHGGLE GVHADVVRAGDTFVYRPGVIRRHGYVPGEQRGQVYAVYAKAHIKGGATRCVILARHE VEIDMDAI<HGGNPDGDWENLAI<VVALRSLAEALPLPSAVLQSCRTWSAI<
UPI000958E115 (SEQ ID NO:243)
SNPPLAQADLQKTQGTEVKTKTKDQQLIHFINQPSMKAQLAAALPRHMTPDRMIRIVTT
EIRKTPALANCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLIIG YRGMIDLARRSNQIISISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVYAVARLK DGGVQFEVMTHNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQK AVILDEKAEANVDQENASVFEGEFEEVSQSA
UPI0009805C1D (SEQ ID NO: 244)
ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN
QPAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR
VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS
GTYGEEELPEPEKEPREVNGVKEPDRAQIESFDKEDYAAKKIEELKEKAQPQKEVVKET
GEVIDKITAEDF
UPI0009805F63 (SEQ ID NO:245)
ANELGIFSVDNLNMTTIKQYLDGGGKASDAELVLLINLCKQNNMNPFMKEVYFIKYGN
QPAQIVVSRDFYRKRAFQNPNFAGIEVGVIVLNKDGVLEHNEGTFKTHEQELVGAWAR
VHLKNTEIPVYVAVSYDEYVQMKDGHPNKMWTNKPCTMLGKVAESQALRMAFPAEFS
GTYGEEELPEPEKEPREVNGVKEPDRAQIESFDKEDYAARKIEELKEKAQPQKEVVKET
GEVIDEITAEDF
UPI0009880690 (SEQ ID NO:246)
TNNQLVEAKGDFLTNPQLLNSGIIRKYLDPQGKASDEELAYFIAQAKAQNLNPFTKEIYFI
KYGTQPAQIVTAKSAFEKKADSHPQFDGKEAGVIYLLDGEIKYSKGAFIPKGAEILGGW
AKVYRKDRTYPTETEVSFEEYDNSKIRARVKELTQQGKDVTYPVMNSYGKPIGENNWD
TMPCVMIRKVALVSAYREAFPAELGASYEADEIQLDNTPKDITPQENREDVIARKMAQIE
QFNKEQAHTDPEPTQTEEPIQGELLDGELEY
UPI0009F5E532 (SEQ ID NO:247)
RTDGTKEAGAAATAPTEGKAPAKAHKPADTIGAMIEKLKPQIERALPKHVTPDRMARM
ALTAIRNNPKLGQAEAVSLMGSIIQASQLGLEPNTPLGQCYIIPYNSKNGMQAQFQMGY
KGIVDLAHRSGQYRQLTAHPVDEADEFRYSYGLNPDLVHVPAEKPSGKITHYYAVYHL
TNGGFDFRVWSREKVEAHAKQYSKSFSSGPWQTNFDQMACKTVMIDLLRYAPKSVEIA
KATSADNRTHTINPEDPDLNIDTIDGDFELEGEER
UPI0009F8F604 (SEQ ID NO:248)
EALLLRRWQMGNLTKTTGFALAPQNLEQAMQLATMICNSQLAPNNYKGKPEDTLVAM
MMGHELGLNPLQSIQNIAVINGRPSIYGDALLALVQNSPAFGGIQESFDEDTMTATCTV
WRKGGEKHTQHYSKDDADTAGLWGKQGPWKQHPKRMLAMRARGFAVRNQFADALA
GLVTREEAEDMEKEINPTPAPQAQSKRIGQKQSRTQYSESDFNENFPKWKAAVESGKKT
SEQIISMVSTKGDLTQGMIEAIESIEAGEPA
UPI000A08A794 (SEQ ID NO:249)
GHLVSKTEQDYIKQHYAKGATDQEFEHFIGVCRARGLNPAANQIYFVKYRSKDGPAKP
AFILSIDSLRLIAHRTGDYAGCSEPIFTDGGKACTVTVRRNLKSGETGNFSGMAFYDEQV
QQKNGRPTSFWQSKPRTMLEKCAEAKALRKAFPQDLGQFYIREEMPPQYDEPIQVHKP
KALEEPRFSKSDLSRRKGLNRKLSALGVDPSRFDEVATFLDGTPDRELGQKLKLWLKEA
GYGVNQ
UPI000B36BD3F (SEQ ID NO:250)
TDVKQELERKVGKQDSTAVRLTKNMSIPDMIKALEPEIRRALPAVLTPERFLRMALSAV NNTPKLAECTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGIIDL AYRTGQMQMIQAHAVHEFDDFEYEYGLNPKLIHRPGDGNRGEITYFYGLFKLVNGGFG FEVMNREAMEAFAQQYSQSYGSQYSPWVKNFEDMAKKTVIKKALKYGPVKAEFQKAI SMDETIKTEIAVDMTEVQNEESE
UPI000B38B374 (SEQ ID NO:251)
ENEVMTQDQAYEVASPFGSSENFQKLFDIGKMFASSSLVPDRYRGKPMDCTIAVDMAN RMGVSPMMVMQQLYVVKGNPQWSGQACMSLIRGSSEYKNVRPVYTGKKGEDSWGC YIEAEKKKTGEIVKGTEVTIAMAKAEGWYSKKDKYGNETSKWQTMPELMLAYRAAAF FARVYIPNALMGCAVEGEAEDIMKRAITAEDPFKEDAK
UPI000B49B5D9 (SEQ ID NO:252)
TLQAVCPTQDKAVESQLDQTKFELIKRTICKGTTDDEFQLFIHACKRTGLDPFMRQIFAV KRWDSAERREVMTIQTGIDGYRLIADRTGRYAPGRDAEFGYDAHGGLRWAKAYVKKM TPDGHWHEISATAFWTEYVQTTKDGRPTVFWMKKGHVMLSKCAEALALRKTFPAELS GIYTQEEMAQTMSLPDTKGDSQTIGSDKAYEIERSIDNDPEFKTQLLTRLQRAFGCKSFS DLPQDQFKNVKKVIENHQIKEKIA
UPI000B4BEFE6 / WP_088258624.1 Bet [Fimbriiglobus ruber] (SEQ ID NO:253)
TDIAHRSYSAPQLSLIRRTVAKDTNQDEFDLFIEICKQQGLDPFKKQIFAQVYNKDKADK RQIVIVTSIDGYRAKAQRCGDYRPAEEETRFEADAALKIRRQPMGSFVRSCGLQVRPGQ GVVSRVGEARWDEFAPLDDAEFDWVDTGETWPDTGKPKKKKVAKSAKKTLKEGNWK NMPHVMLGKCAEAQALGGGGRKRSAACTSKRRWTRSTWT
UPI000B5661AA (SEQ ID NO:254)
TASKQTDIFSFVSGGEDITITLADIKNYFCANATDQECVLFGQLCKANGLNPWLKEAYLI KYDKNAPAAMVTGKDAYMKRANEHPAFDGYEAGVKVYLPDVGQVEYREGTAYYEDL GEQLIGGYAKVYRKDRSRPYYEEVPLKEYDTKQSKWKTSPATMIRKVALVHALREAFP TNIQGMYDADETPYAADYEGSFREMDDPTPAPSMRGRIAPAPVADPLEDLEADVIEAGD VE
UPI000B94B1D1 (SEQ ID NO:255)
ADLTKTANGADLAAAIGGKQAETGRATAFDLVKSMEAEFAKALPRHVPVEQFMRTAV TELRQNADLQRSTSESLLGAFLTAARLGLEVGGPMGEFYLTPRFAKLPGQDQKAWQVV PIVGYRGLVKLARNAGVGAVKAWVVYEGDHFVEGANSERGPFFDFHPVPGDPAGRKE VGVLAVARLSGGDVQHTYLTIEQVEKRKARGSAGDKGPWATDRAAMIRKSGIRALAGE LPQSTLLALARVVDEEVQTYVPGSLVDVGTGELEA
UPI000BD04ECE (SEQ ID NO:256)
NTELETMNNVYDNLQSVIMQQGIAALLPAQVTPEQFTRTAATALIENVDLQNADKQSLV
LALTRCAKDGLMPDGREAALVVRSTKVNKQFVKKAVYMPMVDGVIKRARQSGQVANI
I l l lAKVVYSQDEFEYVIDENGEHLTHRPAFVDGDDIVKVYAFAKLNSGELVVEVMSRAGV
EKIRDTVQSAKYDSSPWVKWFDRMALKTVIHRLARRLPCASELFSLFEVYEDANSTEKT
LRMAPASFKRLSIN
WP_032686941.1 RecT [Raoultella planticola] (SEQ ID NO:257)
TKQPPIAKADLQKTQGTRVSSPKGNNDVISFINQPSMKEQLAAALPRHMTAERMIRIATT
EIRKVPALASCDTMSFVSAIVQCSQLGLEPGGALGHAYLLPFGNKNEKSGKKNVQLIIGY
RGMIDLARRSGQIASLSARVVREGDEFSYEFGLEEKLTHRPGENEDAPVTHVYAVARLK
DGGTQFEVLTSKQIELVRSQSKAANSGPWVTHWEEMAKKTAIRRLFKYLPVSIEIQRAV SIDEKEALTIDPADTSVLTGEYSVINSESEE
WP_069728515.1 RecT [Pantoea brenneri] (SEQ ID NO:258)
SNQPPIASADLQKTQQSKQVANKTPEQTLVGFMNQPAMKSQLAAALPRHMTADRMIRI
VTTEIRKTPQLAQCDQSSFIGAVVQCSQLGLEPGSALGHAYLLPFGNGRSKSGQSNVQLII
GYRGMIDLARRSGQIVSLSARVVRADDEFSFEYGLDENLVHRPGENEDAPITHVYAVAR
LKDGGTQFEVMTVKQVEKVKAQSKASSNGPWVTHWEEMAKKTVIRRLFKYLPVSIEM
QKAVVLDEKAESDVDQDNASVLSAEYSVLESGDEATN
WP_045958294.1 RecT [Xenorhabdus poinarii] (SEQ ID NO:259)
TNTPPLAQADLQKAQPQTKVAATKDQALIQFINKPSMKAQLAAALPRHMAPDRMIRIVT
TEIRKTPALANCDMQSFVGAVVQCSQLGLEPGSALGHAYLLPFGNGKSKTGQSNVQLII
GYRGMIDLARRSGQIVSISARTVRDGDQFHYEYGLNENLTHIPGENEDAPITHVYAVARL
QDGGVQFEVMTRKQVEKVREKSSAGNNGPWVTHWEEMAKKTVIRRLFKYLPVSIEMQ
KAVILDEKADANIDQDNAAIFEGEFEEVGNDG
WP_102086779.1 RecT [Proteus mirabilis] (SEQ ID NO:260)
SNPPLAQADLQKTQGTEVKTKTKDQQLIHFINQPSMKTQLAAALPRHMTPDRMIRIVTT
EIRKTPALANCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAKSGQSNVQLIIG
YRGMIDLARRSNQIISISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVYAVARLK
DGGVQFEVMTHNQIEKVRASSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQK
AVILDEKAEANVDQENASVFEGEFEEVGSNGN
WP_109615067.1 RecT [Edwardsiella piscicida] (SEQ ID NO:261)
TNNQQPPIATADLQKAQSQAPAVKPDQKLINFINQPSMKGQIAAALPRHMAPDRMIRIIT
TEIRKTPALATCDMQSFIGSVVQCSQLGLEPGGALGHAYLLPFGNGKAKSGQSNVQLIIG
YRGMIDLARRSGQIVSISARTVRDGDQFHYEYGLDETLKHVPGDNESSPITHVYAVAKL
KDGGVQFEVMTFNQIEKVRGQSKAGNNGPWQTHWEEMAKKTVIRRLFKYLPVSIEMQ
KAVILDEKAEANIDQENASVISAEFSVVED
WP_124537594.1 RecT [Morganella morganii] (SEQ ID NO:262)
SNPPIAQADLQKAQGTAVKEKTKDQQLIQFINQPGMKAQLAAALPRHITPDRMIRIVTTE
IRKTPSLATCDMQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKAASGQSNVQLIIGYR GMIDLARRSGQIISISARTVREGDSFHFEYGLNEDLTHVPGENDSGPITHVYAVARLKEG GVQFEVMSFSQIEKVRDSSKAGKNGPWVSHWEEMAKKTVIRRLFKYLPVSIEMQRAVIL DEKAEANVDQEHASIFEGEYETVSPE
WP_006657622.1 RecT [Providencia alcalifaciens] (SEQ ID NO:263)
STPPLAKSDLQKTQGTEVKIKTNEQKLVEFINQPGMKAQLAAALPKHITSDRMIRIVSTEI
RKTPSLANCDIQSFIGAVVQCSQLGLEPGNALGHAYLLPFGNGKSDNGQQNVQLIIGYR GMIDLARRSGQIISISARTVRQGDNFHFEYGLNENLTHIPEGNEDSPITHVYAVARLKDG
GVQFEVMTYNQIEKVRNLSKAGKNGPWVTHWEEMAKKTVIRRLFKYLPVSIEMQKAVI LDEKAEANIEQEHSAIFEAEFEEVDSNGN
WP_109401438.1 RecT [Proteus terrae] (SEQ ID NO:264)
SNPPLAQADLQKTQGTEVREKTKDQMLVEFINKPNMKAQLAAALPRHMAPDRMIRIVT TEIRKTPELANCDMQSFVGAVVQCSQLGLEPGNALGHAYILPFEKKRKQGNQWVTVRT DAQLIIGYRGMIDLARRSGQIVSISARTVRQGDSFHFEYGLNENLTHVPGENEDSPITHVY
AVARLKDGGVQFEVMTHNQIEKVRTSSKAGQNGPWVSHWEEMAKKTVIRRLFKYLPV SIEMQKAVILDEKAEANVDQENSSVFEGEFEEVGQGA
WP_115149784.1 RecT [Plesiomonas shigelloides] (SEQ ID NO:265)
SNQRPPIATADLQKAQSQPPAAKPEQNLINFINQPSMKSQIAAALPRHMAPERMIRIITTEI
RKTPKLATCDVQSFIGAVVQCSQLGLEPGGGLGHAYLLPFGNGKAESGKPNVQLIIGYR GMIDLARRSGQIVSISSRIVREGDQFHYEYGLNETLKHVPGDNESAPITHVYAVAKLKDG
GTQFEVMSFNEIEKIRGQSKAGNDGPWIKHWEEMAKKTVIRRLFKYLPVSIEMQRAVIL DEKAEADIEQDNASIIGAEYSVVENAA
WP_034910107.1 RecT [Gilliamella apicola] (SEQ ID NO:266)
SEQNQPPIAKSDLEKTQLTNQDKKPATLAELVNSPKIKNQLAMALPKHMNPDRMARIVT
TEIRKTPALADSNIQSFLGAVVQCSQLGLEPGGALGHAYLLPFGNGKAKDGKSNVQLIIG
YRGMIDLARRSGQIISISARTVREGDDFHYEYGLNEDLKHTPKADESAPITYVYAVARLK DGGSQFEVMTFNQIESVRKQSKAGDKGPWITHWEEMAKKTVIRRLFKYLPVSIEIQQAVI LDEKAEAGISQDNEMILDADFSVVEA
WP_016979878.1 RecT [Pseudomonas fluorescens] (SEQ ID NO:267)
NSTAETATPFSSQDLEKTQPTKAQSKTGSLASLLASPKMKSQFAAALPKHMTPERMARI
VTTEIRKNPELVKCEQHSFLGAVIQCAQLGLEPGNTLGHAYILPYGKQAQLIIGYRGMID LARRSGQIISISARTVREGDYFEYEFGLDENLIHRPVETTQPGAVTHVYAVARLKDGGRQ
FEVMSRAQIEEVRVQSKAAKSGPWVTHWEEMAKKTVIRRVFKYLPVSVEIQRAVMLDE KAEAGVCQENECVFDGDFEVITDTEE
WP_080977968.1 RecT [Pseudomonas stutzeri] (SEQ ID NO:268)
STENVAPFSQKDMQQATGQQVKPRSPADSLAAMLASPKMKAQFAAALPKHMTAERM ARIVTTEIRKTPALVKCDQHSFLGSVIQCAQLGLEPGNSLGHAYLLPYGNQVQLIIGYRG MIDLARRSGQIVSLSARTVREHDEFDYQLGLHEDLTHKPFEGEHAGEITHVYAVARLQG GGVQFEVMSKAQVEAVRAQSKAGKSGPWVSHWEEMAKKTVIRRLFKYLPVSVEIQRA VTLDEAAEAGLPQGNEYVFDGDFEVVNDASGAQQ
KXJ39364.1 AXA67_02205 [Methylothermaceae bacteria B42] (SEQ ID NO:269)
ATSLKRAVTGDGKPATVQQLLTNPKIKSQIALALPQHLTPERLTRIVLTEIRRTPALAKCK PESLLAAVMQCAQLGLEPGGSLGHAWLMPFKNEVQFIIGYRGMIDLARRSGQVLSIEAR GVYESDTFHVSFGLEPDLTHQPDWDPADRGKLAFVYAVARLKDGGFQFDVMSRAEVE KIRAQSPAGKSGPWVTHFEEMAKKTVIRRLFKYLPVSVEMARAVGLDEAAERGEQSDA IDADCVIESEEEATPEEKGDSAA
WP_106478153.1 RecT [Halomonadaceae bacterium R4HLG17] (SEQ ID NO:270)
SEVATQDTLGKELQQHSGQQKKPMPTTIQGMLKDPRFTSQIARALPKHITPDRITRIALTE VNKTPALGKCDPVTLFGSIIQSAQLGLELGGALGHAYLVPYGNQAQFIIGYRGMIDLARR SGQMVSLQAHTVHDNDEFDFEYGLDEKLRHVPARGDRGPMVAVYSVAKLVGGGHQIE VMWKEDVDAIRSKSKAGNSGPWRDHYEEMAKKTAIRRLFKYLPVSVEMQKAVALDEQ AEAGVQDNNVFDGEFSYGEAE
WP_129141488.1 RecT [Halomonas coralii] (SEQ ID NO:271)
TDQATAEPQEDLGKQLQQHSQRKPMPTTIQGMLKDDRFTGQIARALPKHITPDRISRIAL TEVNKTPALGKCDPMSLFGSIIQSAQLGLELGGALGHAYLVPYKDQAQFIIGYRGMIDLA RRSGQMVSLQAHTVHENDDFEFEYGLDEKLRHVPARGQRGPMIAVYAVAKLTGGGHQ IEVMWKEDVDAIRQQSKAGNSGPWRDHYEEMAKKTAIRRLFKYLPVSVEMQKAVSLD EQAEAGVQDNNVFDGEFSYQEPE
WP_084261900.1 RecT [Zymobacter palmae] (SEQ ID NO:272)
TNTVQQQAPQQDQLAQQLQQASGNTPQKKPMPSTIQGMLKDDRFKTQIARALPKHVTP ERIMRIALTEINKTPKLKECDPIGLFGSIVQSAQLGLELGGALGHAYLVPYGKQAQFIIGY RGMIDLARRSGQMVSLQAHTVHENDEFNFEYGLNENLRHVPARGERGPMIAVYAVAK
LVGGGHQIEVMWKEDVDAVRKSSKAGGSGPWRDHYEEMAKKTAIRRLFKYLPVSVEM QRAVSLDEQAEEGVQDNNVFDGDYTVAEH
WP_020007369.1 RecT [Salinicoccus albus] (SEQ ID NO:273)
STNESLKNQVATNQKNEVSNGNKPKTIGDYIDQMAPAMAQALPKHMSVERMTRMATT VIRTTPQLKEADVASLLGAVMQSAQLGLEPGPMGHCYFLPFKNNKKGTTEVTFIIGYKG MIDLARRSGHISTIYAHAVYENDEFEYELGLHADLKHKPSEDERGAFKGAYAVAHFKD
GGYQFEYMPKSDIDKRRSRSKAGNSNYSPWATDYEEMAKKTVIRHMWKYLPVSVEMQ QAVAHDEGTGKDIKDVTPDEDSFVDMPEYIADVPAEGEGE
WP_131521405.1 RecT [unclassified Lysinibacillus] (SEQ ID NO:274)
ATTTDLKAQMQQAPATQQKPKTIDDYLKQMAPAMAQALPKHMDVDRLMRLAMTTIR TTPALKDADVSSLLGAVMQAAQLGLEPGLMGHCYLLPFKNNKKGITEVQFIIGYKGMID LARRSGHIQSIYAHAVYQKDEFEYELGLDPKLKHKPCMDEDKGNFVGAYAVAHFKDG
GYQFEFMSKAEIEKRKGRSKAANSTYSPWATDYEEMAKKTVVRHMWKYLPISVEMQQ
QVAYDEGTAPKREMKDITPETEFFVDAPEIEVEVVNE
WP_132769795.1 RecT [Tepidibacillus fermentans] (SEQ ID NO:275)
ATNEKVKTQLANRANGQAPTPTPEQTIAAYMKKMAPRFAEVLPKHMDIDRMTRIALTTI
RTNPKLLEASVPSLLGAIMQAAQLGLEPGLVGHCYLVPFKNGKTGQTEVQFIIGYKGMI
DLARRSGNIESIYAHAVYENDTFEYEYGLHPKLVHKPAMTDRGEFIGAYAVAHFKDGG
YQFEFMPKEEIEKRRNRSKTANGGPWVTDYEEMAKKTVVRHMWKYLPISIEIQQAAAQ
DEVIRKDVTSEPEFVDDVIDISTEIEEQSVEVEGEEAQ
WP_120191052.1 RecT [Ammoniphilus oxalaticus] (SEQ ID NO:276)
STKATSNELKNQLANRQGNNAATNNNPANTIAAYLKRMAPEIEKALPAHMDADRLARI
ALTTIRTTPKLLECTIPSLMGAVMQSAQLGLEPGLIGHCYIIPYGKEATFIIGYKGMIDLAR
RSGNIESIYAHAVYKNDEFEYEYGLKPNLVHKPAMSDQGDFIGAYAVAHFKDGGYQFE
FMPKEEIDKRRNRSAASKGGPWVTDYEEMAKKTVVRHMWKYLPISIEIQQAATQDEVV
RKDITEDPMPVDVLDIPFEASDAEETSEEGEINFD
WP_066790810.1 RecT [Rummeliibacillus stabekisii] (SEQ ID NO:277)
ATTTELKEQMKQQAPAQTKKPKTIEDYMKQMAPAMAEALPKHMSVDRLTRLAMTTIR
TTPALRQADVSSLLGAVMQAAQLGLEPGLLGQCYLLPFKNKKKAITEVQFIIGYKGMID
LARRSGHIQSIYSHAVFENDVFEYELGLEPKLKHTPTMSTDKGAFIGAYAVAHFKDGGH
QFEFMSKADIEKRKGRSKAANSDYSPWLTDYEEMAKKTVIRHMWKYLPISVEMQEQV
AYDEGVGRSIKDVTPEEDVFVQAPDEILEAEATEA
WP_098408280.1 RecT [Bacillus] (multispecies) (SEQ ID NO:278)
TQAEKLKNDIAKQEQKNEVAQDDKPKTILDVMMQHKESFEMALPKHLDADRLIRLAVT
EFRKNPMLKECTPESLLGAVMQAAQVGLEPDALGSAYLVPYYNKNKNVKEVQLQIGY
KGLIELVRRSGQVTSIVANEVYENDEFDFEYGINEKLYHKPTMDADRGKLKCFYAYARF
KDGGHAFTVMSVEQINQIRDKFSKSQKNGKHFGPWADHYESMAKKTVIKQLVKYMPIS
VEIQNQITRDETVHSSFKEEPKPIYAFEESPDIIDAPIEN
WP_047150996.1 RecT [Aneurinibacillus tyrosinisolvens] (SEQ ID NO:279)
SDLKEKLEKRANETEAAPPSPAQTIAAYLKRMEPEIARALPKHMDVERLTRIALTTIRTN
PRLLECTVPSLLGAVMQAAQLGLEPGLLGQCYIIPHGREATFIIGYKGMIDLARRSGNIKS IYAHDVRENDEFEYEYGLHPFLKHRPAMTDRGKFIGVYAVAHFNDGGYQFEFMPYEEIE RRKLRSRSYKNGPWVTDYEEMAKKTVIRHMFKYLPLSVEIMRSAAQDETVRPDLTSDP
VSIYERPIEGKIITAEDVQPEEIPNVPDAEQGDV
WP_018705791.1 RecT [Siminovitchia fordii] (SEQ ID NO:280)
ATNQDIKNQLANKANGNKPASPANTIAAYLKKMGPEIEKALPKHMDADRLARIALTTIR
TTPKLLECNISSLMGAVMQSAQLGLEPGLIGHCYIIPYGKEATFIIGYKGMIDLARRSGQI QNIYAHAVFENDEFDYALGLHPKLEHKPAGSNRGEIIGAYAVAHFKDGGYQFEYMAKE DIEKRKSRSAAARSKHSPWATDYEEMAKKTVIRHMWKYLPISVEIQQQAIQDEVVRKD VTSEPEFIDMEDMPEVEEGQSEESEQVEAPFD
WP_035430909.1 RecT [Bacillus sp. UNC322MFChir4.1] (SEQ ID NO:281)
ATNKDVKNQLANRKENKPATPEQKVEAYMTAMAPRFAEVLPKHMSMDRMSRIALTTI RTNPKLLECSVPSLMGAVMQAVQLGLEPGLLGHCYILPYKSEATFIIGYKGMIDLARRSG HIQSIYAHAVYENDEFDYELGLHPKLTHKPSFGERGEFIGAYAVAHFKDGGHQMEFMPK SEIEKRRSRSASGNSSYSPWKSDYEEMAKKTVVRYMFKYLPISIEVQSQAQHDEVVRKD ITEEPEFIEMDSIEVAEASEGDGQKEFVIEE
RDC50983.1 RecT [Acinetobacter sp. RIT592] (SEQ ID NO:282)
ADLKNKLANKAAGTVTKTSPNAGMKQLMKSMSKEIEAALPSHMSSERFQRVALTAFG NNPKLMNCDPMSFIAAMMDSAQLGLEPNTPLGQAYLIPYGTKVQFQVGYKGLLELALR SGKIKTLYAHEVRENDTFEVKYGLHQDLIHEPVLKGNRGEVIGYYAVYHLDTGGHSFVF MTKDEVLEHAKGKSKTFNNGPWQTDFDAMAKKTVIKQLLKYAPLSIEMQKAVSSDET VKSKIDEDMSLVVDESDSIEANFEIKEDEDGQLDVYVK
WP_150051132.1 RecT [Methylomonas rhizoryzae] (SEQ ID NO:283)
SELLSALNAPETQKPQTLPAMLKQHQPRFKAIAPRDVDVTRFSAALMADVRSNQKLAE CNPMTVLGAFIRSTQLGLEPGSQLGQAYFVPFKGECQLVIGYRGMIELAYRSGKVASISA RTVYENDVFEWELGTDERITHKPATGDRGALVAVYAMAKLTTGGIHFEVLDLAEIEKA KRASKSSSFGPWKDHFEEMAKKTAIRRLFKYLPVGTDLTRAVALDEKAESGSQQNDIEA ETVLDGEFYPAGGGNDG
WP_097006457.1 [Lacrimispora amygdalina] (SEQ ID NO:284)
AVDVKNELERKASGQNSQVKLTKSMTIADMVKALEPEIKRALPAVLTPERFTRMALSAI NNTPELAGCTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQIGYKGMID LAYRTGQIQVIQGQAVREFDYFEYQYGLDPKLVHRPGEEERGEITFIYGLFRLSNGGYGF EVSNKADMDAFAAKYSKSFGSKYSPWTENYEDMAKKTVIKRALKYAPVSVDFQKAMS MDETIKTEISVDMSEIRNECPEISENGEAA
WP_087225255.1 RecT [Lachnoclostridium sp. Anl4] (SEQ ID NO:285)
TDVKQELERKVGKQDSTAVRLTKNMSIPDMIKALEPEIRRALPAVLTPERFLRMALSAV NNTPKLAECTPMSFIAAMMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGIIDL AYRTGQMQMIQAHAVHEFDDFEYEYGLNPKLIHRPGDGNRGEITYFYGLFKLVNGGFG FEVMNREAMEAFAQQYSQSYGSQYSPWVKNFEDMAKKTVIKKALKYGPVKAEFQKAI SMDETIKTEIAVDMTEVQNEESE
WP_002566991.1 RecT [Enterocloster bolteae] (SEQ ID NO:286)
GVNVKHELEQRAAGQGASVRLTKNMTIVDMVKALEPEIRRALPAVLTPERFTRMALSSI
NNTPELAECTPMSFIAALLNAAQLGLEPNTPLGQAYLIPYKNKGKLECQFQLGYKGLIDL AYRTGQVQIIQAQVVREFDSFEYQYGLDSKLVHKPGEGARGEITYVYGLFKLSNGGYGF
EVSNKTEMDTFAARYSKSFGSKYSPWTEDYESMAKKTVIKRVLKYAPISSDFQKALSM
DETIKTGIAVDMSEIRNECLPEEAGSEAA
WP_132412730.1 RecT [Kribbella albertanoniae] (SEQ ID NO:287)
ATADSVREELARSKEVERTQPKASNADNVIGLINRSLPEIAKALPGHVKPERIARIATTAV RVTPKLADCTQASFLGALLTAAQLGLEPNTPTGEAYLLPFGRNVQLIIGYRGYIKLANQS GQVRNIMAMTVYENDHFDYKYGSNPFLEHTPTLGQDPGPVKCWYACATFTNGGTNFV VLDKFKVEGYRARARSKDDGPWVTDYDAMARKTCIRRLAPYLPMSVELAQAMQVDE EVTAFTPGVSDPEVLATLAGVDTGTGEVQQ
WP_130067396.1 RecT [Bacillus albus] (SEQ ID NO:288)
ATNEKLKNQLANRKESAPATPEQTVEAYMKKMAPKMAEVLPKHMDMGRMSRMALTT MRTSPKLLNCTVSSLMGAVMQAVQLGLEPGLLGHCYILPYKGEATFIIGYKGMIDLARR SGHIQSIYAHAVHENDEFDYELGLHPKLEHKPVHGDRGAFVGAYAVAHFKDGGYQME FMPKSEIEKRRKRSASANSSFSPWKSDYEEMAKKTVIRYIFKYLPISIEVQLLAAQDEVVR KDITEEPEFIEADPIDVEQPTTEGDGQQEFSIEE
WP_087099033.1 RecT [Bacillus cytotoxicus] (SEQ ID NO:289)
ATNEKIKNQLANRKANASLSPEQTVEAYMKKMAPRFAEVLPKHMDMDRMSRIALTTIR TNPKLLECNVPSLMGAVMQAVQLGLEPGLLGHCYILPYKGEATFIIGYKGMIDLARRSG HIQSIYAHAVYENDEFEYELGLNPQLKHKPSFGDRGEFIGAYAVAHFKDGGHQMEFMP KSEIEKRRKRSASANSNYSPWKSDYEEMAKKTVVRYMFKYLPISIEVQSQAQHDEVVR KDITEEPQFIEADSVEVEETPTEGTNQEEFVIEE
WP_149216302.1 RecT [Bacillus sp. JAS24-2] (SEQ ID NO:290)
ATNKDVKNQLANRKASAPVTTEQTVEAYMKKMGPKMAEVLPKHMDMDRMSRIALTT IRTNPKLLECSVPSLMGAVMSAVQLGLEPGLLGHCYILPYKSEATFIIGYKGMIDLARRS GHIQSIYAHAVYENDEFEYELGLHPQLKHKPSFGDRGEFIGAYAVAHFKDGGHQMEFM PKSEIEKRRGRSASANSNYSPWKTDYEEMAKKTVVRYMFKYLPISIEVQSQAQQDEVVR KDITEEPEFIEVEQQTEGDGQGDFVIEGE
WP_125141636.1 RecT [Clostridium transplantifaecale] (SEQ ID NO:291)
TDVKEELARKAGNTGKQEIRLNKNMSIPDMVKVLEPEIKRALPSVLTPERFTRMALSAIN NTPKLAECSPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKNQLECQFQIGYKGYIDL AYRTGQVQMIQAQAVHEFDYFEYEYGLTPKLVHRPGEGERGEITYFYGLFKMINGGFGF EVMNRAAMDAFAKQYSQSINSKYSPWNSQYEEMAKKTIIKKALKYGPVKSDFQKAISM DESIKTELSIDMSEVRNEDLIDGEFEEAA
WP_120055566.1 RecT [Lachnoclostridium pacaense] (SEQ ID NO:292)
TDVKQELEKRAGSSNQAIKLTKSMTIVDMVKALEPEIKRALPAVLTPERFTRMALSAINS TPKLAECTPMSFIAALMNAAQLGLEPNTPLGQAYLLPYKNKGVLECQFQIGYKGVIDLA YRTGQIQMIQAQAVRESDYFEYQYGLEPKLVHRPGDGARGEVTFIYGMFRLTNGGYGF
EVSNKADMDAFAEKYSKSYGSRYSPWTENYEDMAKKTVIKRALKYAPISSDLQKALSS
DETIKTVLSVDMSEINNECQIDEVIQEDAA
WP_118246619.1 RecT [Clostridium sp. AM58-1XD] (SEQ ID NO:293)
SVDVKNELEKRAAGTVNPAVKLTKNMTIVDMVRALEPEIKRALPTILTPERFMRMALSA
INNTPELADCTPMSFIAALMNAAQLGMEPNTPLGQAYLIPYKNKGTLECQFQIGYKGLID
LAYRTGLIQVIQAQTVREFDSFEYQYGLDSRLTHRPGDGERGEITYIYGLFKLTNGGYGF
EVSNKADMDAFAEKYSKSFGSRFSPWKENYEDMAKKTVIKRALKYAPVSSDFQKALS
MDETIKSELSIDMSEIRNECQVEASGQEGAA
WP_025114396.1 RecT [Lysinibacillus fusiformis] (SEQ ID NO:294)
ATTNELKAKSQNQVQQNVTPEQSLNTLLKRMGPQIQRALPKHMDADRIARIALTAVRA
TPKLLECDQMSFVAALMQSAQLGVEPNTGLGQAYLIPYGKQVQFQLGYKGLIDLAVRS
GQYKAIYAHEVYKEDEFSFAYGLHKDLVHVPSTNPEGEPIGYYAVYHLKNGGYDFVYW
TRERIDKHAHEFSQAVKKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIELQKVVEAD
ETIKTEVSEDMSDVIDVTDYSVIEDESAQEELIIEQ
WP_083048409.1 RecT [Marispirochaeta aestuarii] (SEQ ID NO:295)
RTDGTKEAGAAATAPTEGKAPAKAHKPADTIGAMIEKLKPQIERALPKHVTPDRMARM
ALTAIRNNPKLGQAEAVSLMGSIIQASQLGLEPNTPLGQCYIIPYNSKNGMQAQFQMGY
KGIVDLAHRSGQYRQLTAHPVDEADEFRYSYGLNPDLVHVPAEKPSGKITHYYAVYHL
TNGGFDFRVWSREKVEAHAKQYSKSFSSGPWQTNFDQMACKTVMIDLLRYAPKSVEIA
KATSADNRTHTINPEDPDLNIDTIDGDFELEGEER
WP_099424140.1 RecT [Solibacillus sp. R5-41] (SEQ ID NO:296)
ATSNELKKQAQGQVTAKPTTPEGSLNALLKKMGPEIQRALPKHMDADRIARIALTAVRT
TPKLLECDQLSFVAALMQSAQLGVEPNTGLGQAYLIPYGGKVQFQLGYKGLIDLAVRSG
QYKAIYAHEVYADDEFSFAYGLHKDLVHVPSANPSGDPIGYYAVYHLKNGGYDFVYW
TRERIDIHSKAFSQAVQKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIEMQKVVEADE
TIKNEVAPDMSNVIDVTDYSILEDPQDVTDAQ
WP_076065282.1 RecT [Viridibacillus sp. FSL H8-0123] (SEQ ID NO:297)
ATNNALKEQMKQAPSKEVKPEQSLNTLLKRMGPEIQRALPKHMDADRIARIALTAVRN
TPKLLDCDQMSFVAALMQSAQLGVEPNTGLGQAYLIPYGKQVQFQLGYKGLIDLAVRS
GQYKAIYAHEVYEDDEFSFAYGLHKDLVHVPAPNPTGEPIGYYAVYHLQNGGYDFVY
WTRERIDQHAHKFSMAVQKGWTSPWKTNFDAMAKKTVLKEVLKYAPKSIEMQKVVD
ADETVKTDVSDDMSNVIDVTDYTVMDQEQETIQEPTK
WP_024292388.1 RecT [Lacrimispora indolis] (SEQ ID NO:298)
SDVKQELEKRAAGGGGQSQSVRLTKNMTIVDMVKALEPEIKRALPSILTPERFTRMALS
AINNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLI DLAYRNERMQSVEAQVVYENDEFSYELGLHPSLIHRPSFDEPGEIRAFYAIFRLDNGGFR FEVMSKSYVDAYAARYSKAFTSDFSPWKSNYEGMAKKTVIKQLLKYAPMKSEFQKAV TMDETIKTELSVDMSEVSNQEVIDRELTEQVA
WP_009524931.1 RecT [Peptoanaerobacter stomatis] (SEQ ID NO:299)
GAKELIQKKQENKQISPTSNMNMLLQSMAGAIKKALPAQINSERFQRVALTAFSSNQKL QQCDPISFLAAMMQSAQLGLEPNTPLGQAYLIPYGKQVQFQVGYKGLLELAQRSGQFK SIYSHEVRENDEFEMEYGLNQKLVHKPNLKQERGEVIGYYACYHLTNGGESMFFMTKD EIINFGKSKSKTFNNGPWQTDFDAMAKKTVLKQLLKYAPLSIESQKFMSMDETVKSDIS ANMDEINNDTVDFEVDIQTGEVINDIVVENTNEDEAN
WP_015358111.1 RecT [Thermoclostridium stercorarium] (SEQ ID NO:300)
TTVNQTELKNKLAEKAKTPAKTGNTVFDLIRKMEPEIKRALPKQISPERFARIAMTAVRN TPKLQACEPISFIAALMQSAQLGLEPNTPLGQAYLIPYGKEVQFQLGYQGMLTLAYRTGE YQSIYAMPVYANDEFEYEYGLNEKLVHKPAPDPEGEPIYYYAVYKLKNGGHGFVVMSR QQIERHRDKYSPSAKQGKFSPWNTDFDSMAKKTVLKQLLKYAPKSVEFATQIAQDETIK TEIAEDMTEVQGIEVEYEATDDQENQENQEQED
WP_002595146.1 RecT [Enterocloster clostridioformis] (SEQ ID NO:301)
GIDVKHELEKRAAGQDKPVKLTRNMTIADMVKALEPEIKRALPAILTPERFTRMALSAV NNTPELANCTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGTLECQFQLGYKGLID LAYRTGQIQIIQAQAVREFDYFEYQYGLDSRLVHKPGNEERGQITFIYGLFKLSNGGYGF EVSNKAEMDAFAAKYSKSFGSKYSPWTEDYESMAKKTVIKRALKYAPVSSDFQKALSL DETVKSEIAVDMSEIRNDCIPADMGTEAA
WP_100306418.1 RecT [Lacrimispora celerecrescens] (SEQ ID NO:302)
SDVKQELEKRAAGGGSQSQSVKLTKNMTIVDMVKALEPEIKRALPSILTPERFTRMALS AINNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLI DLAYRNDRMQSIEAQVVYENDEFSYELGLHPSLTHRPSFDEPGEIRAFYAIFRLDNGGFR FEVMSKSYVDAYATKYSKAFTSDFSPWKNNYEGMAKKTVIKQLLKYAPIKSDFQKAIT LDETVKTQLSIDMSEIRNECLPDTSENSEVA
WP_071062796.1 RecT [Andreesenia angusta] (SEQ ID NO:303)
SNLKNQLANKAGGTATKKQPQTMQDWIKVMEPQIKKALPSVITAERFTRMALTAISTNP KLAECTPESFMGALMNAAQLGLEPNTPLGQAYLIPYGKSVQFQVGYKGLMELAQRSGQ FKSIYAHTVYENDEFEVEYGLTQNIVHKPNFDDRGKPIGFYAVYKLTNGGENFVFMTQR EVEEFGKAKSKTFNNGPWKTDFEAMAKKTVLKQLLKYAPIKVEFQREIAQDATIKTEIA EDMTEVPEEMVEAEYEVVEQNTMAEDADLKGTPFETK
SFO83314.1 RecT [Amycolatopsis arida] (SEQ ID NO:304)
HGTALNPERFTRVALTVIRQSADLQRCRPESLLGALMTSAQLGLEPGPLGEAYLVPYGD
QVTFIPGYRGLIKLAWQSGQLRHISARVVHEGDRFSYSYGLHPDLIHQPTRGDRGPITDV YAAATLIDGGVEFEVLDVATVETIRARSRAGRKGPWVTDWEAMARKTAIRQLAKWLP
MATVMSRAIAAEGTVRTDLDADALDDLTADPGPEVLDADPAWDGPEPPGDQARNQEP TTQGDA
WP_110092637.1 RecT [Corynebacterium striatum] (SEQ ID NO:305)
GTNLEQRMAANNAPAKQNRPVTLADQIRSMESQFQLAMPKGMEAQQLVRDALTCLRQ
TPKLAECTPQSVLGGLMTCSQLGLRPGVLGHAYLLPFWDRKQGGMVAQLVVGYRGLV
ELAHRSGQIQSLIARTVYENDHFDVDYGLDDKLVHKPCMNGPKGNPIAYYAVAKFTTG
GHSFIVMSKDEMLAYRDEFAKAKNKQGEVFGPWADNFDAMAHKTCVRQLAKWMPSS
TDLDRGIAADETVRVDLSESALDYPQHVDGEVVDSKPAEDEAA
WP_129692339.1 RecT [Gottfriedia acidiceleris] (SEQ ID NO:306)
ATPAELKNLLAAKPKGEVKLTPDQQVSSYLKAYEGTFRQIAPKHFNTERFQRIALSEIRK
NPKLLDCNLPSLMSAVLQSVKLGLEPGLFGQAYLIPYGKEVQFQIGYKGLIELAQRSGRI
AKIQAREVYEHDEFEVSYGIDDTIIHKPKLDGDRGDVRLYYAVAWFKDGAAQFEIMSKS
DVENHRDKFSKTKNYGPWKENFDAMARKTVLKKLVNQLPMDVEFHEAVQEDETVRK TINDEPEVIAAEYEIIDAPEVVEGNE
WP_118016648.1 RecT [unclassified Coprococcus] (multispecies) (SEQ ID NO:307)
ANNIDLKQELAEQASKVPAKKDEEVKLTKSMTIPDMVKAMMPEIKKALPAVMTPERFT
RIALSALNTTPALNQCTPMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNHGTLECQFQIG
YKGLIELAYRSGQMQTIQAQTVYENDEFAYQYGLEPVLVHRPAYSDRGEVKYFYGIFKT
VNGGYGMAVMSRAEMDLYAKTYSKAYDSSYSPWKSNYEDMAKKTVIKQALKYAPIK
TDFQRALSFDETIKKEISLDMSTVKNELLDVA
WP_051200279.1 RecT [Butyrivibrio sp. FCS006] (SEQ ID NO:308)
PYLFGGQMKEQEIKNQLAAKAVETTNPKLSKNMNIADLIKAIEPEIKKALPTVITPERFTR
IALSALNTTPKLAECSQMSFLAALMNAAQLGLEVNSPLGQAYLIPYNNKGKLECQFQIG
YKGMLGLAYRNPEIQTIQAQVVYENDDFKYELGLDSKLYHKPSLSDRGKVRCYYALYK
LRNGGYGFEVMSRRDVEEYAKRYSKVTDSLYSPWANNFDSMAKKTVIKQLLKYAPLR
TDLEKAMSMDESIKTRVSVDMSEVENEETFDAEVEV
WP_107514794.1 RecT [Staphylococcus equorum] (SEQ ID NO:309)
ATNETLKQKVVERKPNGVKEQSPKTQLNHLLKKMAPEIQRALPKHMDSDRMARIAMT
AVSNTPKLLECDQMSFIAALMQASQLGVEPNTGLGQAYLIPYAGKVQFQLSYKGLIDLA
TRSGQYKSIYAHEVYTNDEFEYRYGLFKDLIHIPSQEPEGNPIGYYAVYHLKNGGYDFV
YWTRERVDKHAKEFSQAVQKGWTSPWITNYDAMAKKTVLKEVLKYAPKSIEMNKAV
ENDSTIKEEIDKDMSTVIDVTDYSEVEEQESLETGGQTSK
WP_117624242.1 RecT [Hungatella hathewayi] (SEQ ID NO:310)
RRDRNVTAVKQELEKKAAGTSQAVKLTKNMTIVDMVKALEPEIKRALPSILTPERFTRM
ALSAINNTPKLAECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGY RGMIDLAYRNERMQSIEAQTVYEHDEFFYELGLHPALVHRPTFEDRGEIRAFYAIFRLDN GGYRFEVMSKSYVDAYAMRYSKAFTSEFSPWKSNYEGMAKKTVIKQLLKYAPVKSEF QKAITLDETVKTELSVDMSEVQNEDLSETLTAESAA
WP_118771779.1 RecT [Roseburia intestinalis] (SEQ ID NO:311)
GDIRSELAKKAEQTQGNTKLTKSMSIADLIKAMEPEIQKALPSVITPERFTRMALSALNTT PKLQECTPMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNKNVLECQFQLGYRGMIDLA YRNGHMQSIEAQAVYENDVFSYALGLHPELVHKPTLEEKGALKAFYAIFRLDNGGFRFE VMGKTYIDWYANRYSKAFTSEFSPWKSNYEGMAKKTVIKQLLKYAPLKTEFQRALSTD ETIKNSLNVDMGEVLSEDIIDMPCEEVA
WP_107378794.1 RecT [Staphylococcus chromogenes] (SEQ ID NO:312)
ANAKEFKKQMNSKNEVAETNNAPQKAKGPRQQVSDLLDRMAPEIQKALPNNMSAERM ARIAMTAVSSNPKLLECDPKSFIGALMQASQIGLEPNTALGQAYLIPYGNQVQLQLSYLG LIELATRTGQYKAIYAHEVYKDDEFSYEYGLYKNLIHKPVDDPNGEPIGYYAVYHLMNG GYDFAYWTRKKVEAHAQQYSKAVQQGWNSPWKSDFNAMAKKTVLKDLLKYAPKAIE VSQAIGSDSKVSEINDEGEIIDVTDYSQEEEK
WP_094369469.1 RecT [Romboutsia weinsteinii] (SEQ ID NO:313)
TNLKNTLKNKEAKGNNLAINPSYAMKQLMIKMKGEITSALPKELCSERFQRVALTAFNS NPKLQNCAPMTFIAAMMQSDQLGLEPNTPLGQAYLIPYKVKGIDKVQFQIGYKGLLELA HRSGRLKTLYAHEVRENDEFDIDYGLEQRLIHKPLLKGNRGEVIGYYAVYHLEHNGYSF VFMTYDEVLEHGKKYSKSFEGGIWEKEFDSMAKKTVIKKLLKYAPLSIEIQKAINFDESV KGSIDSDMLLVDKADESIDVEGNVLNQRGIKYGCI
CDF42377.1 [Roseburia sp. CAG:182] (SEQ ID NO:314)
DVKEELAKMAEEKPTKKLTKSMSIQDMIKVIEPEIKKALPSVLTPERFTRMALSAINNTP KLAECSQISFLAALMNAAQLGLEPNTPLGQAYLIPFQNKGKLECQFQIGYKGIIELVYRN PLIQTIQAQVVYENDEFEYELGLNSRLFHRPALYDRGETVLFYALFKMSNGGYGFEVLS KQDMDAYAKRYSKGISSEYSPWKSNYEEMAKKTMIKKVLKYAPIRTDFQKAVSMDESI KKELSVDMSEVSNENIIDMEEITQEEE
WP_123609006.1 RecT [Mobilisporobacter senegalensis] (SEQ ID NO:315)
KDIKSALEKKVDKQDVKLTKSMSITDMIKALEPEIKKALPSVITPERFTRMALSAVNNTP KLAECSQMSFLAALMNAAQLGLEPNTSLGQAYLIPYQNKGKLECQFQLGFKGMIDLVY RNEKVQTIQAHCVYEEDYFEYELGLDSKLAHKPALANRGKMILVYAFFKLENGGFGFE VMSKEDIDIHALKYSKGYSSQYSPWKSNYEDMAKKTVIKKVLKYAPLKIDFQRAISVDE TVKAEISIDMSEVQNEEIIDGQCTDVGEIEEK
WP_115856892.1 RecT [Staphylococcus felis] (SEQ ID NO:316)
ANANSFKEQVSNKNEVSENNNTPQQKTKGPRQQVSDLLERMAPEIQKALPSHMSAERM ARIAMTAISSNTQLLECNPRSLIGALLQASQIGLEPNTALGQAYLIPYYNRNKGEFEAQLQ LSYLGLIELATRTGQYKAIYAHEVYKEDEFYYEYGLHKNLVHKPVDDPKSEPIGYYAVY
HLQNGGYDFSFWTRNKVELHSGQYSKAVQKGWNSPWKTDFNAMAKKTVLKDLLKYA PKSVEVSRAVGTDSKVSEISQNGEIIDVTDYSKEEE
WP_108404827.1 RecT [Corynebacterium liangguodongii] (SEQ ID NO:317)
KDLETRMAANQQPAQQRPTTLADQIRGMEQQFALAMPKGAEASQLVRDALTALRQAP
KLAQCTPQSVLGSLMTCAQLGLRPGVLGHAYLIPFYDRRAGGLVAQLVIGYQGLVELA
HRSGQIKSLIARTVYENDVFDVDYGLEDKLVHKPYMGGDKGQPIAYYAVAKFTTGGHA
FYVMSHPEMLDYRARFAKSAERGPWVDNFEAMALKTCVRQLSKWMPKSTELATAIAA
DESVRVDLTPDAINYPEHVDGEVVDAQGTTEDTAGEGEQSA
WP_021747387.1 RecT [unclassified Oscillibacter] (multispecies) (SEQ ID NO:318)
KEGLIQGTQSAQAAKKGPATMQDYIKKMQGEIAKALPSVLTPERFTRITLSALSTNPKLA
QTTPKSFLGAMMTAAQLGMEPNTPLGQAYLIPFKNHGVLECQFQLGYKGLIDLAYRSG
EVSTIQAQTVYENDEFEYELGLEPKLHHVPAKGERGEPVYFYAVFRTKDGGYGFEVMS
VDDVRTHAKKYSKAYSNGPWQTNFEEMAKKTVLKKALKYAPLKTEFMRGLTSDETIK TEISEDMYSVPDETVIEAEGYEVDGDTGEVIERPADGQ
WP_103110615.1 RecT [Brevibacillus reuszeri] (SEQ ID NO:319)
SNKLAQRAGQQTQPVKPDQQISALLKRMEPEIARALPKHLTSDRLARIAMTSIRQNPKLL
ACDQMSLLAGVMQSAQLGLEPNTPLGEAYLIPYGKEAQFQVGYKGIISLAHRTGEYQAI
YAHEVFKNDEFSYSYGLDKTLNHKPADEPEGDPIYFYAVYRLKNGGFDFVVWSTKKID
AHAKKYSQAYQKGWTTPWKTDFVAMAKKTVLKEVLKYAPKSAEMAKALVMDETVK
NEISEDMSEVPGMVIDIEADAANVEETAGGGASE
WP_016998679.1 RecT [Mammaliicoccus] (SEQ ID NO:320)
ATNESIKNQVASRKKNEVQNKSPKTQLNDLLIKMGPEIQRALPKHMDADRMARIAMTA
VSTTPKLLECDQMSFIGALMQASQLGVEPNTGLGQAYLIPYGGKVQFQLSYKGLIDLAT
RSGQYKAIYAHEVFPNDEFNYQYGLFKNLEHIPSQEPEGEPIGYYAVYHLKNGGYDFVY
WTRERVDKHAKDFSQAVQKGWTSPWKTNYDAMAKKTVLKEVLKYAPKSIEMNKAVN SDSTIKDEINEDMSSVIDITDYEEVNDQQEEKKEESK
WP_147540090.1 RecT [Clostridiaceae bacterium] (SEQ ID NO:321)
SNLKKALKTNETKGNSVTVSKAYAMKQLMIKMKGEITSALPTNLSSERFEKVALTAFNS
NPKLQKCDPRTFIAAMMQSAQLGLEPNTALGLAYLIPYEVKGINKVQFQIGYKGLLELA
NRSGKLKTLYAHEVRENDEFDIDYGLEQKLIHKPLLKGNRGNVIGYYAVYHLEPSGYNF
VFMTYDEVLEHGKKYSKSFEGGVWEKEFDSMAKKTVIKKLLKYAPLSIEMQKAIVFDE SVKGSIDSDMLLVDKEDESIEGSELN
WP_019168122.1 RecT [Staphylococcus intermedius] (SEQ ID NO:322)
ANANSFKEQVSKNEVQETNNEKPKGPRQQVSDLLERMAPEIQKALPSHMSAERMARIA
MTAISSNPQLLECNPRSLIGALLQASQIGLEPNTALGQAYLIPYYNHKKKEFEAQLQLSYL GLIELATRTGQYKAIYAHEVYKEDEFYYEYGLHKNLVHKPVDDPNGEPVGYYAVYHLQ
NGGFDFAYWTKNKIELHAGNYSKAVQKGWNSPWKTDFNAMAKKTVLKDLLKYAPKSI EISQAVGSDSKVTEINKQGEIIDITEYGQEALEG
WP_148820236.1 RecT [Corynebacterium urealyticum] (SEQ ID NO:323)
AKNLEARMQQSTNAPARADKPLSLPDQIRQMEDQFRLAMPKGAEATQLVRDALTCLR
QTPQLAQCTPASVLGGLMTCAQLGLRPGVLGHAYLIPFNDRRSGNSVAQLVIGYQGLVE
LAHRSGQIKALIARTVYENDHFDVDYGLEDKLVHKPHMGADKGNPVAYYAVVKFTTG
GHAFYVMSHPEMLQYRDKNAKSPKRGPWVDNFEAMAHKTCVRQLAKWMPKSTEFSQ ALATDESIRLDVTPDAINYPDHPAEGEVIDGEVEQDGGQQ
WP_096823857.1 RecT [Staphylococcus nepalensis] (SEQ ID NO:324)
ATQNQFKNQLTQKKENNNQPQQKAVGPKQEISNLLDRMAPQIQKALPQHMSAERMARI
AMTAVSSTPKLLECDPKSLIGALMQSSQIGLEPNTNLGQAYLIPYGKEVQLQVSYLGMIE
LANRSKQYKAIYAHEVYPEDYFEYQYGLQKDLIHKPADNPQSEPIGYYAVYHLLNGGY
DFVYWSKAKIDDHARQFSKAVQKGWQSPWKTNFNAMAKKTVLKDLLKFAPKSIEMN NAVSSDSKAQQIDDDGNIIDVTDYSQVNDEPEQLQEGQ
WP_098170605.1 RecT [Bacillus sp. AFS017336] (SEQ ID NO:325)
ATNESLKNQITNKKTGEVPLTPAQQVSSYLKAYEGTFQQIAPKHFNTERFQRIALSEIRKN
PKLLECSVPSLMSAVLQSVKLGLEPGLFGQAYLIPYGKEVQFQIGYRGLIELSQRSGRILK
IQAREVYENDEFEVSYGIDDNIIHKPALDVDRGKVRLYYAVAWFKDGGAQFELMSISDV
EKHRDKFSKTAKFGPWKDHFDEMAKKTVLKKLVKQLPMDVEFQEAVQEDETVRKTIT DEPEILQAEFEIVDQPEISVE
WP_087290962.1 RecT [Pseudoflavonifractor sp. An184] (SEQ ID NO:326)
ATEKAIQRATGRAPALENRPALQQYIKQMSGEIKKALPSVMTPERFTRIVLSALSTNPKL
AETTPQSFLGAMMTAAQLGLEPNTPLGQAYLLPYWNSKANAYECQFQLGYKGLLDLA
YRSGEISVIQAHVVYSEDQFSYSFGLKPELKHIPAGEERGEPVYVYAIFHTKDGGYGFEV
CSIDDIRAHAQRYSKSFQNGPWQTNFEEMAKKTVLKRVLKYAPLKSEFLRGLAQDETIK QEISEDMYMVEAAYAEPDVSSAEND
WP_051264703.1 RecT [Nakamurella lactea] (SEQ ID NO:327)
ASNLAARAAEQVEQQTAPNRPPTIKEQIGRMESQFALAMPRGSEAAQLVRDAITAINTN
PQLAECTPASVLGALMTCAQLGLRPGVLGHAWVLPFRSKGVMQAQLVIGYQGLVELA
HRTGQVASLIAREVHERDHFDVDYGLADSLIHKPLLNGDRGPVTGYYAIVKFKGGGHSF
IYASKADVEAHRDKFSKMKSFGPWVDNFDSMALKTVVRMLAKWMPKSTEFANAISAD
EGVRVDYSPTADVAQATEYVQPQLEEAPVEGVVVSEGGES
CCZ61365.1 [Clostridium hathewayi CAG:224] (SEQ ID NO:328)
ANDIRGELARRASGTETQAVKLTKNMSIPDMIKALEPEIKRALPTILTPERFTRIALSAINN
TPKLAECSPMSFIAALMNAAQLGLEPNTPLGQAFLIPYKVKGSLECQFQIGYRGMIDLAY RNERVQSIEAHTVYENDVFEYELGLNPRLVHIPTMEEPGDPIAFYGIFRLDNGGFRFEVM
NKNAIDAYAARYSKAYDSASSPWKNNYESMACKTVLKQLLKYSPMKSEFQKAVSMDE
SVKTELSVDMSEVQNVNLIEETQEDAA
WP_068720576.1 RecT [Veillonellaceae bacterium DNF00626] (SEQ ID NO:329)
KTTGGLQQQQQQQAQALQNGGTTLKGYLQAMMPEIKKALPTVMTPERFTRIVMTTIST
NPALQNCTPQSFLGAVMQAAQLGVEPNTPLGQAYLIPYGNQVQFQLGYKGLIDLAYRS
GEVQSLQAHEVYQNDTFEYELGLNPKLKHIPALTNRGDVILYYAVIKFKNGGEGFEVMS
KEDVEAFAKSKSKTYGRGPWQTDFDEMAKKTVLKKVLKYAPMKTDFIRAVATDETVK
SSVAEQMADLPDETVTIDTEAQVVVDKETGEVKS
WP_037404193.1 RecT [Solobacterium moorei] (SEQ ID NO:330)
TEIKAAKAPATVAKAGVSTQNKTIKDYITIMKPEIEKALPSTITPERFTRITLSAVSNNPKL
QACSPSTFLSAMMQSAQLGLEPNTPLGQAYLIPYGNSCQFQLGYKGLLQLAYNSGQIKTI
RTETVYENDEFKYELGLHSDLVHVPAMSNRGNPTAYYAVIEYTNGGYGFEVMSHDDVL
EHAKKFSKTFNNGPWQSDFESMAKKTVLKQALKYAPLSTELVSKINTDETVKSSISDHM
EEVKNDIDLSQIIDAETGEIHE
WP_027347470.1 RecT [Helcococcus sueciensis] (SEQ ID NO:331)
AQAKELLENKTNNTVKKSEKQTMENLLTLMADEIKKALPENVKSERFRRIALTAFNGN
KDLQQCEPTSFLAAMMQSAQLGLEPNTPLGQAYLIPYNNSKKNIKEVQFQVGYKGMLD
LAHRTNQYKNIQANIVYEKDEFDIEYGLNPKLKHIPNMKEDRGQAIGYYAVYNLINGGQ
GFEYMTRAEVEKHAQKFSKTYRNGPWQTDFDEMAKKTVLKKVLKYAPMSTELQEATA
IDERVVNEENIKSKNEDKFVDVDWSYVDDVEEDVIE
WP_072526012.1 RecT [Clostridium sp. Marseille-P3244] (SEQ ID NO:332)
AARATNSVKEELAKKAETKAVGEKKLTRSMSIADLIKAMAPEIKKALPEVITPERFTRM
ALSALNTTPKLQECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPFNNKGTMECQFQIGY
KGLIDLGYRNPQMQIISAQAVYENDEFEYELGLNPKLEHRPALHDRGELRLFYGLFKLV
NGGFGFEVMSKEAVDAYAKEYSKSFDSSFSPWKTNYEAMAKKTVIKQALKYAPIKADF
RKALSTDETIKNEIAEDMSEIHGEDIFDAEYTEQTA
WP_092453396.1 RecT [Clostridium fimetarium] (SEQ ID NO:333)
ETIDIKQELASQAQTDSKKEVKLTKAMSIAEMIKAMMPEIKRALPSMITPERFTRIALSAL
NNTPELQACTPMSFISALLNAAQLGLEINSPLGHAYLIPYKNKGVLECQFQIGYLGLIALA
YRNELMQTIQAQCVYENDEFLYEYGLNPKLVHRPATSDRGEPVFFYGLFKMINSGFGFC
VMSKQEMDEFARTYSKGLASSFSPWKTSYNEMAKKTVIKQALKYAPIKTDFQKALSTD
ESIKYAISEDMTEAVNEIVSQNTEVA
WP_027295741.1 RecT [Robinsoniella sp. KNHs210] (SEQ ID NO:334)
TTRTGNIKEELAKKAEGTNGDTRLTKAMSIADLIKAMEPEIKKALPEVITPERFTRMALS
ALNTTPKLRECTQISFLAAMMNAAQLGLEPNTPLGQAYLIPFNNKGTMECQFQIGYKGM IDLSYRNPQMQMISAQAVYENDEFKYELGLNPTLIHRPVLRGRGEVILFYGLFKLTNGGY
GFEVMSKEEMDAYAKAYSKAIDSSFSPWKSNYNGMAKKTVIKQVLKYAPIKADFRKAL
SSDETIKNEISENMSEIHGEIIFDTDYMEESA
WP_117768035.1 RecT [Blautia sp. OF03-15BH] (SEQ ID NO:335)
NVKEELAQKAEITQKEVKLKKSMSISDMIRALQPEIKKALPSVVTPERFIRMALSALNTT
PKLAECSQISVLAALMNAAQLGLEPNTPMGQAYLIPFNNKGKMECQFQIGYKGLLELVY
RNPAIQIIQAQTVYENDYFEYELGLNSRLIHRPELEDRGEIRLFYGLFKMVNGGYGFEVM SRQEMDQYAARYSKSFASGFSPWENNYEDMAKKTMIKRVLKYAPVKIETARALINDESI KLHLSEDMSEVENETVVDGQAEEKAA
SCJ42694.1 [Ruminococcus sp.] (SEQ ID NO:336)
GTSIQKNVENNALQKEKMPTMQAYIKKMEGEIKKALPSVMTPERFTRITLSALSTNPKL
AATTPGSFLGAMMTAAQLGLEPNTPLGQAYLIPYSNKGKLECQFQIGYKGLIDLAYRSG SISVIQAHTVYENDDFEYELGLDPKLKHIPSKSADKGNPAWFYAVFKTKDGGYGFEVMS IEDIRSHAAKYSQSYNSAYSPWKTNFEEMAKKTVLKKALKYAPLKSDFVRQISTDETIKT KLSDDMFSVPAETIEVEGIEVDTETGEITEVDHA
WP_092724975.1 RecT [Romboutsia lituseburensis] (SEQ ID NO:337)
SNLKNVLKNQEDKGQGITVNPTYAMKQLMIKMKNDIDLALPKNLSSERFQKVSMSAFN
NNEKLQNCEPTTFIAAMMQSAQLGLEPNTPLGQVYLIPHNLNGVDKVQFQVGYKGLLQ LAHRSGKLKTLYAHEVKENDEFEIDYGLEQKLIHKPLLKGNRGDVIGYYAVYHLEPSGY SFEFMTYDEVAKHGKKYSKDFEGGIWEKDFDSMAKKTVIKKLLKYAPLSIEMQKAVAF DESVKSSIDSDMLLVESIGE
KKZ74881.1 VO63_05385 [Streptomyces showdoensis] (SEQ ID NO:338)
TSDARNAVARRAANVGQVEQAGEQPKPTMAQQIERMKPEIARALPKHMDADRIARIAL
TLIRKNPDLANCTTESFLGALMTCSQLGFEPGSPTQEAYIIPRKGQAEFQLGYQGMVTLF
YQHPMASSVKVETVRENDYFEHEEGLEERLIHRPFADGPRGKAIAYYSVARLINGGRTF
KVMYPAEIEERRQKLPSKNSPAWRDNYDEMAKKTVLRNHFKALPKSAELARALAHDG TVRTDWQPDAIDVPPEYLSEPQRPELEAGAQ
WP_055284109.1 RecT [Dorea longicatena] (SEQ ID NO:339)
TVGKTDEIKQELARKVENTKAGTKLKKSMSIADMIKVMEPQIKKALPEVITPERFTRMA
LSALNTTPKLNECTPMSFLAALMNAAQLGLEPNTPLGQAFLIPYNNKGKMECQFQLGY
KGLIDLSYRNPNMQIITAHTVYENDEFEYELGLNPCLDHRPTLGERGEIRLFYGLFKLTN GGFGFEVMSKTAMDDFAKEYSKAFDSSFSPWRTNYESMALKTIIKKALKYAPLKSEFRN ALSTDETIKNEIGADMSEINSENIFDTVYQEECA
SDL28883.1 RecT [Streptomyces indicus] (SEQ ID NO:340)
STDARNAVARRAETVGQVEQQAQQQPTLAQQIERMKPEMERALPKHMSADRMARIAL
TLIRKNPDLATCNTQSFLGALMTCSQLGFEPGSPTQEAYIIPRKGNAEFQLGYQGMVTLF YQHPMASSIKVETVRENDYFEHEEGLEERLVHRPCATGPRGRAIAYYSVARLINGGRTF
KVMYPDEIEERRQKLPSKNSPAWRDNYDEMAKKTVLRNHFKALPKSAQLARALAHDG TVRTDATADVIDVAPEYPQRPELEAGPTA
WP_145458209.1 RecT [Staphylococcus pettenkoferi] (SEQ ID NO:341)
ATQKDFKNQISQKETQQKQEVQKKKKGPRQQVSDLLDRMAPEIEKALPNHLSADRMAR
VAMTAVSSNPKLLECDPKSFIGAVMQSAQLGLEPNTALGEAYLVPYAGKVNFQLSYLG
LINLATRSGQYKAIYAHEVYAEDEFRYQYGLHKDLIHKPVDNPKGKPIGYYAVYHLLNG
GYDFVYWTTERIQKHAKKYSFAVQKGYQSPWNDEFDAMAKKTVLKDLLKYAPKSIEM NNAVRSDDKQSELSDEGVVIDVTNYDEENGEEK
WP_117787252.1 RecT [Tyzzerella nexilis] (SEQ ID NO:342)
AGVKEELAKKAESTKGETKLTKSMSIADLIKAMEPEIKKALPEVITPERFTRMALSALNT
TPKLRECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPYNNKGVMECQFQIGYKGLIDLS
YRNPQMQIISAQAVYENDDFSYELGLNPKLEHCPTLGERGEVRLFYGFFKLVNGGFGFE
VMSKTAMDEYAKEYSKAFDSSFSPWKSNYIGMAKKTVIKQALKYAPLKTDFRKALSND ETIKTELSDDMSDIHGEEIWDVEYQEKTA
WP_073112630.1 RecT [Hespellia stercorisuis] (SEQ ID NO:343)
ADIKEELAKKVAEGTEDKKKLTKSMSIADLIKAMEPEIKKALPEVITPERFTRMALSALN
TTPKLKECTQTSFLTALMNAAQLGLEPNTPLGQAYLIPYKNKGNLECQFQIGYKGLIDLS YRNRQMQIIQAQAVYENDEFEYELGLNPVLVHRPALQNRGAVKLFYGIFKLTNGGFGFE VMSKADMDAYAKEYSKAFDSSFSPWKSNYIGMAKKTVIKQAIKYAPLKTDFRKALSTD ETIKTEFCEDMSEVQCKDIWDTEYKERSA
CDD36322.1 [Roseburia sp. CAG:309] (SEQ ID NO:344)
DVKNELAKKAENTGKVKLTKSMSIADMIKTLEPEIARALPSVITPERFTRMALNALNNTP
KLAECTQMSFLAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQIGYKGMLDLVY
RNEMVQTVQAQVVYQNDEFHYALGLTGRLEHIPTLRDRGEPYAFYALFKLENGGYGFE
VMSKTDMDAFALQYSKGISSEYSPWKTNYIDMAKKTVIKKVLKYAPLKTEFQRALSND ETIKTHFAVDMSEVEPETVIDMEEGELLESAS
WP_128520904.1 RecT [Absicoccus porci] (SEQ ID NO:345)
TTTNQQGMITKKANNSVAKKTNRTMKDYITMYQGEIAKALPSVMTPERFVRIATTAVT
NTPKLASCTPQSFIGALLNAAQLGLEPNTPLGQAYLIPYGNQCQFQIGYLGMVELAQRA GTNVDAHVVYANDEFDYSLGLHPDIKHVPAMKDRGEAIAYYAVWHNGENFGFEVMSR
EDVEKHMKKYSKTYSNGPWKTEFDEMAKKTVLKRALKYAPKKTDLARAVMQDETIK QFNPKADNDMADAKNDFFDVEYDEVDENTDPVTGEVK
GAK01483.1 RecT [Geomicrobium sp. JCM 19055] (SEQ ID NO:346)
GYKGMIDLARRSGHIKSIYAHTVHANDEFEYELGLEPKLVHKPATGDRGNMEYAYAVA HFVDGGYQFEVFSHHDIEQVKKRSKAGNFGPWKTDYEEMAKKTVVRRMFKYLPISIEIQ QHASQDETVRRDITEEAEKVDNIIDLPNYEDPNNIDVPDEEQDEQKDEKQKQQGSAEEIA
LDFK
WP_135329961.1 RecT [Streptomyces sp. MZ04] (SEQ ID NO:347)
STNLAARVEARRQNPTTKQPARRGKAAQQPTLVQFVQSMRGEIARALPSHVASPERIAR IALTELRRVDHLAECTQESFGGALMTCAALGLEPGGVGGEAYLLPFWNKKVRAYEVTL VIGYQGMVRLFWQHPAAAGLAAHTVHEGDEFDFEYGLEPFLRHKPARTGRGKPTDYY AVAKMANGGSAFVVMNVEDIEAIRHRSKARDAGPWSTDYGLRRHGAQDLHSAVVQV AAEVC
WP_079588582.1 RecT [Acetoanaerobium noterae] (SEQ ID NO:348)
SNLKNELAKKANNSVTDGNKEPQTIKDWIKVMEPAIKKALPSVITPERFTRMALTAISVN PKLAECTPKSFMGSLMNAAQLGLEPNTPLGQAYLIPYKNKGNMEVQFQIGYKGLIELAY RSGEFANIYAKEVFENDEFEYEFGLEPVLKHKPASGNRGEVIAYYAVFKLTNGGFGFEV MSKEDITNHAKTYSQAYSSSYSPWSKNFDEMAKKTVLKKVLKYAPIKVEFVKQIVQDS TIKTEINSDMTEVESQNVFEAEETDYEVIDQEETK
WP_107635892.1 RecT [Staphylococcus haemolyticus] (SEQ ID NO:349)
ATQNEFKNQLAKKEDKGNTNAPTQTKSTNPRTIAQNYLAKMKPEIEKALPAHMSHERM TRIALSAVNSNPELTEVILNNPTSFLGALMQSAQLGLEPNTNLGHAYLIPYYDKNSGKKI VNLQLGYMGLLDLAHRSGMYQKIFAMPVYKDDYFEYQYGTNEKLNHVPAQVSKGEPI GYYAFYKLTNGGVHFVYWSRQKMQMHKDRYTRRGSVWNNNFDAMALKTVIKDVLK YAPKSVEMGEAVQSDENNFEFNEDSKVIDVTDYETEENK
WP_107638953.1 RecT [Staphylococcus hominis] (SEQ ID NO:350)
ATANDFKNQVTKKESDNTKESSNKKTELATTSPRQVAQNYLEKMKPEIAKALPAHMSH ERMTRIALSAVNSNPQLTEVILNNPTSFLGALMQSAQLGLEPNTSLGHAYLIPYNFKGKK IVNLQLGYMGLLELAHRSGLYKKIFAMPVFKDDFFEYQYGTNEKLNHIPAQVQNGDAV GYYAFYQLTNGGVHFVYWSRQKMERHKDLYTRKGSVWNTNFDAMALKTVIKDVLKY APKSVEMSSAVQSDNSNFEFSEDSSTVIDVTDYETEDNK
SUY49750.1 RecT [Lacrimispora sphenoides] (SEQ ID NO:351)
ADVKQELEKRAAGSGGQSVKLTKNMTIVDMVKALEPEIKRALPCILTPERFSRMALS AINNTPKLGECTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKGVLECQFQLGYRGLID LAYRNERMQSIEAQVVYDNDEFSYELGLHPSLIHRPTFDEPGEIQAFYAIFRLDNGGFRFE VMSKNYVDSLCHALFKSIYFRFQSLEK
CDE68291.1 [Clostridium sp. CAG:277] (SEQ ID NO:352)
DFKEELAAKAEVAATTKKSDGVKLTKNMSIVDMIKALEPEIKRALPSVLTPERFVRMAL TAVNNTPALAQCTPMSFIAALMNAAQLGLEPNTPLGQAYLIPYKNKKKGVVECQFQIGY KGMIDLVYRNDNVQTIQAHIVRENDHFEYELGLESKLRHIPAMEGRGEMMYVYALFKL TNGGYGFEVLNKEAVIAHAERYSPSYDGFSPWKTDFESKGLELFLILDLSSKQSGK
WP_060905391.1 RecT [Streptomyces scabiei] (SEQ ID NO:353)
DADRMARIALTLIRKNPDLATCSGESFLGALMTCSQLGFEPGSPTQEAFIVPYKGEATFQ
LGYQGMVTLFYQHPMASSVKVETVRENDYFEHEEGLEEKLVHRPCKTGPRGKAIAYYS
VARLINGGRTFKVMYPAEIEERREKLPSKNSPAWRNSYDEMAKKTVLRNHFKALPKSA
ELARAMAHDGTVRTDWQPDAIDVPPEYLSEPQRPELGTGSTQ
WP_146678271.1 RecT [Pirellula sp. SH-Sr6A] (SEQ ID NO:354)
SEATKEAKPETAIAKKPNGIKDWLKSDALKTQIASVAPKHMAPERVMRIALNAVSRTPK
LADCTPESFMRCLLDLSSWGLEPDGRHAHLIPYGTECTLVLDYKGLVTLAYRSGWVKKI
HADVVFEGDIFVYSLGTVCQHIPWEFRDDANKPEHKGHFRAAYCVVTMADGIEKHEV
MTASEIDAIKAKSRSGNSGPWKDHYTEMAKKTVYRRASKWLPLSPEQADAMERDDDRI
IDAVSVAVTQRLSKAAMPLIGANETGDTE
WP_126032909.1 RecT [Bifidobacterium castoris] (SEQ ID NO:355)
GALATTAKNNELTTMNTMGDIHALIRGRRAQIESVMSGVLTPERLYSLLQSAVSHEPKL
LQCTPESIVACCMKCAVLGLEPSNVDGLGKAYILPYGNKNYQTGQVEATFILGYKGMIE
LARRSGEIKSLNVTPVFEDDGIKLFMDEAGQPYIKAGEVNPLANHTPDKLMFVFLNAEF
TNGGHYRTYMTRAEIDAAKKRSSAGDRGPWKTDYVAMARKTVVRRAFPYLPVSTEAQ
SAAVEDETTPHFDFLDRNTTPVGEPSDVMQEATA
WP_114599505.1 RecT [Staphylococcus warneri] (SEQ ID NO:356)
ATQNDFKNQITDKKENKPQQSTNPRQVASDLLERMKPEIAKALPAHMSQDRMTRIALS
AVNSNPKLSEVILNNPTSFLGALMQSAQLGLEPNTNLGHAYLIPYGNIVQLQLGYLGLLE
LAYRSGKYQKIMAMPVYKDDFFEYQYGTDEKLNHIPAQQQTGDAVGYYAFYKLTNGG
THFVYWSRQKMNMHQQQYSKGGNVWRNNFDAMALKTVIKDVLKYAPKSIEMGEAVT
SDNNNFDFKDGGDIIDVTDYETEEN
SCQ72869.1 RecT protein [Propionibacterium freudenreichii] (SEQ ID NO:357)
TQQMPIKAQGEPTKELQQKAAVDRFNATLHQMQNEIARALPKHMTGDRFVRIVLTEVR
KNPTLALCDPLTMFGSLLTAAALGLEPGLNGECWLVPRKNHGTLEAQLQVGYRGVVKL
FWQNPAATYLDTGYVCERDEFRFAKGLNPILEHTPAEGDRGKVVRYYAVAGLNTGAR
VFDVFTPAQIKTLRGGKVGSNGDIPDPEHWMERKTALLQVLKLMPKSTQLAAVPAADG
RAHTISDAQQIFGGVDPTTGEVLDAEPVEDGAA
WP_127100780.1 RecT [Asaia sp. W19] (SEQ ID NO:358)
SNALATPTEKLRTQITSMTGEFRNALPSHIKPEKFQRVVMTVVQQNQGLMNADRKSLLA
SCLKCAADGLIPDGREAALVMFGQQVQYMPMLAGIQKRIRNSGEIASIQAHVIYENDHFI
WHQGIDASIEHRPLFPGDRGKAIGAYAVAKFKDGSDPQFEVMDVAAIEKVRAVSRAGK
SGPWVQWWDEMARKTVFRRLSKWLPMDTEAEDLMRRDDENDAQDVAAPTIRVEAEA
PSKLDALEHDDDGVVLEETRELEGSAA
EIC09117.1 RecT protein [Microbacterium laevaniformans OR221] (SEQ ID NO:359)
TDLSTVAAAAKQNPTMKDLVEAQLPAIERQLGGTMNSDAFVRAVLSEITKSPDLMQAD
PKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKDHGRMICLPIVGFQGMVKLALRSEFVTN
VQAFIVREGDDFTYGANAERGMFYDWTPKDFEEKRPMVGVVATARMKQGGTTWAYL
TREQVEDRRPSYWQKTPWGSHPDEMAKKTAVRALAKYLPKATDLGRAIEADEQKVQH
VKGLDEVTVTRLDDEPETVVVQETTDAWAATPVAEVQP
WP_136046271.1 RecT [Microbacterium sp. K41] (SEQ ID NO:360)
SKDLSTAAAAAKSQPTMKDLVEAQLPAIERQLGGAMNSDAFVRAVLSEIGKSPDLMNA
DPKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKDRGRQICLPIIGYQGMIKLALRSEYVLN
VQAFLVREGDDFTYGGNSERGMFYDWTPKDFEESRPWIGVVATAKMRGGGTTWVYLT
RTQVIDRRPSYWASTPWKTNEDEMVKKTAVRALAKFLPKSTDLGRALEADEAKVQHL
KGVDEVQVTRLDDDAETFVVQEQDPMSRTPEEQAEDEANR
WP_136309287.1 RecT [Streptococcus pyogenes] (SEQ ID NO:361)
SDLSVAAAAAKTQPTMKDLVEAQLPAIERQLGGAMNSAAFVRAVLSEIGKSPDLMAAD
PKTLLGGVMLAAQLRLEIGSGLGEFYLTPRKERGRQICLPIIGYQGLIKLALRSEFVMNV
QAFLVRQGDQFSYGANAERGMFYDWVPQDFEETRDWIGVVATARMRSGGTTWVYLT
RTQVIDRRPSYWNSTPWKTNEDEMVKKTAVRALAKFLPKSTDLGRALEADEAKVQTLR
GLDEVEVTRLDDEADTVVVQEQNPMSRTPEEQAEDAEAQR
WP_110990907.1 RecT [Mesotoga sp. TolDC] (SEQ ID NO:362)
KASEIASMVKKEDERRNHKPDPLAGIVKNLTSIKGEIANALPDAGITPERMIRIVVTLLRQ
NKSLAEAAMQNPASLLGAVMMAAQLGLDPTNGLDQCALVPRKGKVCFDIMYEGLVEL
GYRSDRMESIVARTVYEKDTFSLKYGLNEELVHIPYLDGDPGESKGYYMVGKLKGGGN
IIVYMTKEQVHKIRDRYSVAYKAGLSGSRKDSPWFTSEDRMGEKTVVKAGFRWIPKSPII
RTALALDETAREASRLPMRN
WP_109196224.1 RecT [Streptomyces sp. CS014] (SEQ ID NO:363)
TENTVTAAVAVRDTGPAAQIEAYRDEYAALVPSHINADQWVRLAVGAIRGNEDLTNAA
RTDIGVFLRELKTAARLGLEPGTEQFYLTPRKSKAHRGQKIIKGIVGYQGIIELIYRAGAV
SSVIVESVRANDTFRYVIGRDERPVHEIDWFGGDRGDLVGVYAYATMKDGATSKVVVL
NHAQVMQIRAKSDSKHSEYSPWNTNPESMWLKSAVRQLMKWVPTSAEYMREQLRAQ
AEVAAEQPPAADLPPMPSVELNDEDEAVDAELVDEEA
WP_068202759.1 RecT [Isoptericola dokdonensis] (SEQ ID NO:364)
TQDLATAIADQQPAQRRTAFDLVESMRGELHKALPEHASIDNFLRLALTELKMNPQLGN
CSGESLLGALMTAARVGLEVGGPLGQFYLTPRRLKRDGWAVVPIVGYRGLITLARRAG
VGQVNAVVVHEGDTFREGASSERGFFFDWEPAVERGKPVGALAAARLAGGDVQHRYL
SLAEVHERRDRGGFKDGSNSPWATDYDAMVRKTALRALVPLLPQSTALSFAVQADEQ
VQRYDAGDIDIPALDETDTEDTK
WP_114797327.1 RecT [Gaiella occulta] (SEQ ID NO:365)
STAVARRDPVAEVCTTIASKEFEAKIVQALPDGVTPARFVRTTLTAIQQNPDVVKGTRQS LYNAVIRCAQDGLLPDGREAALVVFRAKGTDVVQYLPMIGGLRKIAAEYGIKIETAVVY ERDKFEWELGFEPRVLHVPPALGEDRGEPIGAYAVATDKLGRKYVEVMSRQEIEEVRK VSRAATSEYGPWVKWWAEMARKTVGRRLFKQLPLHDLDERGERVISASDAEISFSPSG LDSLPHVDPSEPEEVLTGDVMDDDDDDGIPFGEPAA
PAV10712.1 CBG25_01455 [Arsenophonus sp. ENCA] (SEQ ID NO:366)
NTELETMNNVYDNLQSVIMQQGIAALLPAQVTPEQFTRTAATALIENVDLQNADKQSLV LALTRCAKDGLMPDGREAALVVRSTKVNKQFVKKAVYMPMVDGVIKRARQSGQVANI IAKVVYSQDEFEYVIDENGEHLTHRPAFVDGDDIVKVYAFAKLNSGELVVEVMSRAGV EKIRDTVQSAKYDSSPWVKWFDRMALKTVIHRLARRLPCASELFSLFEVYEDANSTEKT LRMAPASFKRLSIN
WP_147981944.1 RecT [Streptomyces sp. msl91] (SEQ ID NO:367)
VEHYKADLAQVMPSHVKPDTFIRLAVGVLRRDRNLAQAAQNNPAALMGALMDAAQL GLTPGTEQFYLVPRKKAGRLEVQGIRGYQGEIELIYRAGAVSSVIVEVVRQADTFRYSPG RDERPEHEIDWDAEDRGPLRLVYAYAVMKDGATSKVVVLNRAQVMKAKAMSQGSDS AYSPWQKHEEAMWMKTAAHRLTKWVPTSAEYMREQLRAAAEVAAEHRPTPVAAAPG MPSVPGEDEAIEAEFVDEDEVA
BAQ93806.1 phage RecT family (TIGR00616) [uncultured Mediterranean phage uvMED] (SEQ ID NO:368)
TSSITPLVAMQGTLEKMADKFTEALPRQMDVNKFISVAKLTLNKNPRLLQADKTSLMQ TFMKAAQDGLYLDGKEAAAVQYGQSVQYIPMVEGIIKVLHNSGLIKTISAEVVYENDFF DYELGTAPKITHKPLIVGDRGKPMCVYAVAITTNDGEYYEVMNMDQINQCRQVSKASS SPHSPWVKWFDQMAKKTVIHRIAKRLPKNDAINSVVTVDDEPNFQQAVNVTPSEPKDS LSRLRDSIGMEGKDVEQAANDLLEKYNKEAREE
WP_061405262.1 RecT [Streptomyces] (multispecies) (SEQ ID NO:369)
SQISNALATRDQGPAAQIEQYRDEYAALVPSHVNADQWVRLAVGAVRGDEKLMEAAQ NDIGLFLREMKTAARLGLEPGTEQFYLTPRKSKPHGGRKVIKGIVGYQGIVELIYRAGAA STVIVEAVRENDTFRYVPGRDDRPVHEIDWFANDRGPLVGVYAYAVMKDGAVSKVVV LNRSRVMEFKAKSDSKHSEYSPWNTNEEAMWLKSAVRQLAKWVPTSAEYRRDQLLAH TETADSVVASVSTAPLPPQPSALDDADPDDDGPIDAELVD
WP_114014965.1 RecT [Streptomyces reniochalinae] (SEQ ID NO:370)
SQISNAVAKRDNSPGAMVQQYKADFSTVLPDHVKPDTWVRLAQGVLRRDKNLAQAAE RNPGSLMTALLDCARLGHEPGTESFYLVPFGGEVQGIEGYRGVVERMYRAGAIASVKA EVVCQGDDFDYQPDMDKPRHRVDWFGDRGPIVGAYAYAIFKDGSTSRVAVINRAYIDK VKKESKGSDRATSPWMKWEEQMVLKTVAKRLEPWVPTSNEWRREQLRAAREVANEP TPPTTPAPPAPEQVDPDTGEVIDGELVDDTPTQ
WP_027699748.1 RecT [Weissella oryzae] (SEQ ID NO:371)
SNNLTSAQYFNAPNIKGKFEEVLGKNANGYVTSLLSVINGSQQLQRAEPSSIMVAAMKA
ATLNLPIESSLGFAYIVPYGNNAQFQIGYKGLIQLALRSGQIKGLNSGVVYETQFISYDPL
FEELEIDFKKPAEGKIAGYFASMKLTNGFSKVVYWTKEQVEQHRDRFSKGKNNGPWKS
DFDAMAQKTVMKAMISKYAPLNQEMQQAIVEDSESELTVPRDVTTSNEAAELNSLLTT
PKVQEGANTDLSEPFPNAEETQLFDDLASVTGD
SYW13692.1 Phage RecT family protein [Oenococcus oeni] (SEQ ID NO:372)
SNELKTILNAPTTKEKFDEVLGRNAQGYINSVLNAVGNSKLLQNASPNSILSGAMKAAT
LNLSIDPNLGYAYFVPYGHEAQLQIGYQGLIQLAQRSGQIKILNAAPIYDEQFKSLDPVTG
KLTLNKKIVPDTNKKPTGYVAYLKTVTGFEHTEFMSYADIEKFAKRFSKSFNSSTSPWK
TDFNAMAKKTLIKQVLKYAPMSIDLQTAVSADNDDIEPKDITPDEDKETVDKISNLISDN
KQDDTLSQLEEVANANN
WP_141158250.1 RecT [Pseudarthrobacter sp. NIBRBAC000502771] (SEQ ID NO:373)
TSQLAEATAAKAVEQRKNPTARDLIQAQQAAIETQLAGAMNSAAFVRAAISSVSASPQL
QQATPASLLGGIMLAAQLKLEIGPALGHFHLTPRMVSKKDGDNWVEVWTCLPIIGYQG
YIELAYRSGRIEKIESLLVRKGDKFDHGANSERGRFFDWAPADYEETREWTGVIALAKIK
GAGTVWAYLPKEKVIARRPDRWEKTPWATNEEEMARKSGIRALAPYLPKSTELGKALE
ADEHKVEHIAGVHDLVVSKAEDEPLEEPTA
TAK04183.1 EPO34_03495 [Patescibacteria group bacterium] (SEQ ID NO:374)
TNQPTTHVATTPNQRPATTLEQFRHQLVGDYQKQVLNYFNGQKEKAMKFMSAVVYSA
QKNPALLECDRTTLLHAFMACAEYQLYPSSVSGEAYVIPYKGKAQFQLGYQGIITLLYR
AGVEAVNAQIICENDAFEYEEGLEPNLVHKPNVLKDRGKPIGVYAIAAINGHKLFKVLSE
AEVMKFKGFSQSKNSEYTPWNPDNDPELWMWRKTAIKQLSKLLPKNDALQKAISEDN
QDSVIEARRSTLDAGGPAVGRALHDPNASNEPEGK
WP_092601202.1 RecT [Actinopolyspora xinjiangensis] (SEQ ID NO:375)
TGQTIGTAVAKKDDENPSAIIATNRADLARVMPSHVRTDSWVRIAQGIVRRDKNLAHAA
RQSPGTLMVALMEAARLGLEPGTEQYYLTPRKNKGKPEVLGIPGYQGLIELMYRSGAV
SSVVVETVRENDTFQWAPGRMERPEHEADWFAINGERGQLRGVYAYAIMSNGATSKV
VVLNRNDIARARDSAQGADSEHSPWKNHEEAMWLKTAARRLAKWVPTSTEDRRIVQG
VAERSDQPTEAPLDLTDEPDTDQPIEGELVDEEATQ
WP_067024969.1 RecT [Mycobacterium sp. 1245499.0] (SEQ ID NO:376)
TTQPEYKPVAQGADKQMTTGKLLKMLEPEIGRALPKGMDPDRICRLVMTEVRKNPMLT
QCTQESFAGALLTASALGLEPGVNGEAWLVPYRDRKRGIVECQFIMGYNGVAKLFWQS
PHADRLDAQLVCANDHFRYVKGLSPILEHVESDGDRGDPIAYYAIVGVKGAQPMWDVF
KPEAIAQLRGGRVGTKGDIDDPQRWMERKTALKQVLKLAPKSTRLDLAIRADERSGSDL
YKSQGMEVHAIEPGFIETEAEPETQEQ
WP_075737485.1 RecT [Streptomyces acidiscabies] (SEQ ID NO:377)
TDNAISNAIATRDNGPEAIVQQHRDDLTLVLPAHHKGETWMRLATGALRRDANLRQTA ARNPGSLMNALLECARLGHEPGTESFYLVPFGNEVQGIEGYRGIVERIYRAGAVKAVKA EVVYENDHFRYHPGMDRPEHEPDYFADRGRIIGAYAYGVFQDGSTSRVVVINRAYIDK VKKESKGSDRASSPWVKWEEGMVLKTVARRLEPWVPTAVEWRTEPTPASAAEATAPV GDGVKAIAAPAPTSPYDDEGPIEGEFVDEYDGGAA
AKT73182.1 RecT (prophage associated) [Yersinia pestis] (SEQ ID NO:378)
NQVATLESIHADLSSALTRQGIQSLLPSHVSPEQFTRTAATALVADPELQNADRQSLVMS LIRCAQDGLVPDGREAAMVVYNTKQGDQWVKKAQYLPMVDGVLKRARQSGQVANIT GKVVHMADKFDYWVDENGEHIEHRPAFENHGEIRLVYAFAKLTSGEIVVEVMSRSEVE KVRDATAKKDRDGKPKVPAVWQKWFDRMALKTVLHRLARRLPCASELYSLLDVNQIA DEAEKPAECGAQRESSTTAA
WP_123127078.1 RecT [Rufibacter latericius] (SEQ ID NO:379)
SNQLQVAREQVISAQKSFKNVPNNKLDFEREAGFAMQMIQSNPFLASMDANSIRNCIVN VALTGLTLNPVLKLAYLVPRKGKLILDPSYMGLINVLVTSGAAKKIEADVVCENDFFDY EKGTNGFIKHKPSLSSRGEIIAAYAIAHLPNGEVQFEIMNREELEKVRKSSEAAKKGSSPY DGWASEMMRKAPIRRLFKYLPKHNIPDQVINTLSLDEQNNGVDFSAQKQEAFKGKAAD FFEDEPANTVDADYTDMSHEEADNELAA
WP_093587584.1 RecT [unclassified Streptomyces] (multispecies) (SEQ ID NO:380)
SQIGNAIATRDEGPAAQIEVYRDEYAALVPSHVNADQWVRLAVGAIRGNDDLLKAAGN DIGLFLREMKTAARLGLEPGTEQFYLTPRKSKAHGGRPVIKGIVGYQGIVELIYRAGAAS TVIVEAVRQNDVFRYVPGRDDRPVHEIDWFGQDRGPLVGVYAYAVMKDGAVSKVVVL NKARVMELKAKSDSKNSPYSPWNTNEEAMWLKSGVRQLAKWVPTSAEFRRDQLLAHT DTADGVIASVSAPPLPPQSAALEDLDPDDEGPIDGELLDD
WP_030975214.1 RecT [Streptomyces sp. NRRL S-1824] (SEQ ID NO:381)
SEISNAIATRDQGPAAQIEAYRDEYAALVPSHINVDQWVRLAAGAIRGNEDLMEAARND IGVFLRELKTAARLGLEPGTEQFYLTARKSKAHGYALIIKGIVGYQGIVELIYRAGAVSSV IVEAVRANDTFSYVPGRDDRPIHEIDWFGGDRGPLVGVYAYAVMKDGAVSKVVVMNH KRVMEIKARSDSKNSQYSPWNTDEESMWLKSAIRQLAKWVPTSAEYKSEQLRAHAEAI GELASVASAPLPPQPSVLDDVDPDDEGPIEGELVD
RKT60104.1 RecT [Agromyces sp. OV415] (SEQ ID NO:382)
STTVALPAQKAEAVIQQVTGAANGFAAALADRIGPDRFVRAAVTSIRTSPQLAQCEPLSI LGGLFVAAQLALEVGGPRGLAYLVPYGREAQLIVGYRGYVELFYRAGARKVEWFIVRD GDTFRQWSTGRGGRDYEWTPLDDDSNRRPIGAVAQIQGAHGEFQFEHMTVDQINERRP KRATSGPWVDWYEEMALKTVMRQLAKTARQSTDDLAFAAANDGAVITQVEGGQARV VHPATSEPEQPLSLDALERTPGELAEETNP
WP_017415747.1 RecT [Clostridium tunisiense] (SEQ ID NO:383)
TTKANVTSVKNALKEQIQVQQVAAQTDTSFQGVLTKQLQHQFKAIQSLVPKHVTPERLC RIGINAASRNPQLMNCTPETIVGAIVNCATLGLEPNLLGHAYIVPFYNNKTGKMEAQFQV GYKGALDLIRRTGAVSTLSAHEVYGPRSIFWTQYFY
RYE05836.1 EOP33_01060 [Rickettsiaceae bacterium] (SEQ ID NO:384)
TNSIETNIEDLSPGNQTKTQISETNAPIVSEIRTIKRDGVYDLCSSRREKVLPFLGNNSQKF ERLARSFAFEINTNPKLASCDQLSILQAFYKCCEYGLDPASSLQQIWMIPYNGKIDFQIGY KGWLQLLWGSKLITNAYSCAVYQGDQFEYELGLNPNIKHIPQHKSINDVNELIATYGVI KLKSNEVQLRVCWRDELEKSKKSSKSNGREDSPWNRHFEAMALIVPMRKMAKNLALA LRAEDFDDEDYVNENNNQGMA
WP_052399147.1 RecT [Francisella sp. FSC1006] (SEQ ID NO:385)
SNLVVAKQCLASAEKSFIGISGDEKKYKRECNFAVQSLQANSYMLQQANANPNSLRNAI INLASMGLTLNPAEKKAYLVPYGGKNPRVDLQISYMGLIDLAISDGAIMWAQAKVVRQ NDLFNITGVDTPPEHKYNPFDSEQTRGDVIGVYCVAKFPNSDYITEIMSITDINSIKSRSSG VKSGNTTPWDTDFDEMAKKTVIKRASKYWKGSSKLSKAIDFLNNENNEGINFNKQEEK PKQNINDLMNDDVVDIDSEVGDE
WP_067349107.1 RecT [Streptomyces noursei] (SEQ ID NO:386)
TSPIRAAVARRAGDPAALISQYTADFAAVLPSHIKPATFVRLAQGILRRDEKLAQAAAND PGQFMSVLLDAARLGLEPGTEAYYLVPFKGRVQGIVGWQGEIELMYRAGAISSVIVETV REDDVFVWTPGLVDRETPPRWEGPMSYPFHEVEWAGDRGPLRLVYAYAVMKGGATS KVVVLNAQDIERAKKTSQGADSPSSPWRQHEAAMWSKTAVHRLAKYVPTSAEYITAQ VRAVRQADALSAPPVEEVVDVELVGDGQEQEARR
WP_143887802.1 RecT [Streptococcus lutetiensis] (SEQ ID NO:387)
ANQLQMSHKDFFNRPAVKNKFSEVVSGKSDQFITSLLSVVNNNKLLSKADNNSILNAA MKAATLNLPIEPSLGSAYIVPFKGQAQFQLGYKGLIELAQRSGQYKSINAGVVYKAQFK SYDPLFETLDLDFNQPQDEVIGYFACFELLNGFRKITYWTREEVYNHGKRFSKSFNNGP WKTDFDAMAKKTLLKSIIGTYGPKSVDMQEAITDDNKTEYEKAEPIDVTPQEENLTDLI GETPQEELPIANPETGEIQEEQTALFNQLGDLTDD
WP_073793143.1 RecT [Streptomyces uncialis] (SEQ ID NO:388)
SQISTAIATRDNGPAAVVEQYRESLALVMPSHLQQRVGAWIRNTQGLLRRDSKLMEAA QNDVGQFVAVLMDAARLGLEPGTEHYYLVPRWNNKKRATEVTGVRGYQGEIELMYR AGAVSSVIVEVVHTQDQFRFRPGRDARPVHDIDWDLEDRGSLRLVYAYAVMKDGATS KVVVLNRQHIAAARAKSDSAAKDWSPWNTDEEAMWLKTAAHRLTKWVPTSAEYLRE QIRAQVAVESEQRPEPLPVAPPPAPGTVDADPDDEGPIDGELVD
WP_116200709.1 RecT [Amycolatopsis circi] (SEQ ID NO:389)
ISQTVTTAVAQQKDSSPAALVRKYRTDFATVLPSHIKPETWLRIATGALRRSPQLANAA KRNPSSLLVALLEAARKGLEPGTEQFYLVPRKGKNGPEVLGITGYQGEVELMYRAGAV SSVKVEVVREHDTFAYNPGEHDRPVHEIDWRADRGDLVLTYAYAVMRDGATSNVVVL
SADDIAVILKKADGADSPFSPWQWNPKAMWLKSAARQLAKWVPTSAEYVRLPDVPLE
SLPPAKPLDLPRVDDVVDAEIVEDWPTAPDDTADGAR
WP_020135111.1 RecT [Streptomyces sp. 351MFTsu5.1] (SEQ ID NO:390)
SQISNAIEKRDQGPGAVIEQYKQELALVAASHVKVDTFARLAVGALRQNPKLAAAAQS NPGSLMSALMTAARLGLEPGTEQFYLRPIKRKGVAEVQGIVGYQGIVELIYNAGAASSV VVEVVRANDQFNYVPGLHERPVHNVDWFGDRGDLVGVYAYAVMAGGATSKVVVLSR THINRAKAKSDGADSDYSPWRTDEEAMWLKTAARRLGKWVPTSAERLTMPAERTDTV LPVGSAAPALDAADPDEDEGPVDGELEPAGGWPETAQPPQ
WP_099421180.1 RecT [Streptococcus macedonicus] (SEQ ID NO:391)
ANQMQVSHKDFFNSPAVKNKLSEVVGGKSDRFIASLLSILNNNKLLSSADNNSILTAAM KAATLNLPIEPSLGFAYIVPYKRQAQFQLGYKGLIQLAIRSGQIKSINSGVIYKAQFKSYD PLFETLEVDFSQPEDEVAGYFATIELLNGFKKLIYWTKERAYNHGKRFSKSFGNSPWQT DFDAMAQKTLLKQIISKYAPLSVELQEAITADNENEDEKAAPIDVTPQEESLSDLIGEAA QEELPAADPETGEIQEEQTALFEQLGDLTDD
WP_141925904.1 RecT [Haloactinospora alba] (SEQ ID NO:392)
GQSVTNAVAQRDTSPSGMVGKYRDDFAQVMPSHVNGAGWVRIAQGILRRDAKLAEAA RNAPQSLMSALMDAAQQGLTPGTTEFYLVPRKRKGSLEVQGITGYQGEIELIYRAGAVA SVVAEIVHEHDTFEWIPGKHERPIHEADWFGNRGTMVGAYAYAVMNSGSTSKVVILNQ HDIEKARAMSDGADSSYSPWQKWPESMWLKTAAHRLAKWVPTSAEYRHEQERARAR SEDTEIPASPDSDVVHAEIVEENDDEQAT
WP_136710836.1 RecT [Clostridium tyrobutyricum] (SEQ ID NO:393)
SDKKMVVLGESHKALSKLLETKQEALPKDFNKARFLQNCMTVLQDTKDIDKCQPISVA
RTMLKGAFLGLDFFNRECYAIPYGGNLQFQTDYKGEIKLAKKYSFNSIKDIYAKIVREGD DFQESIEDGRQTINFKPLPFNNGEIIGAFAVCLFQDGSMLYETMTKQEIEDIRNNFSKAKN SPAWVKTPGEMYKKTVLRRLCKLIELDFDSVETKKTYDETSEFEFGSANHEVSNFDKDD SNIIEADAEIQDDVQEGDGEDE
WP_132110073.1 RecT [Actinocrispum wychmicini] (SEQ ID NO:394)
SQTVTAAVAQRDNGAQALIAKYRTDFAQVLPSHLRPTTFVRLSQGLLRRNVKLAEAAE RNPASFLAALLECARLGHDPGTDQFALVPFNDRKRNTVEVVGIEQYQGVIERMYRAGA VRSVKAEVVRAADPFEYAPDVMDRPGHKPNWFADRGELIGVYAYAEFFDGSTSRVVM MNRETVMAHKAKSRGATSEDSPWQAWEESMWLKTAVHELEKWVPTSSEDRRAARDG TADPAPVEVPRVADEVLDADLVEDDHADHPTATPTGDVR
WP_125769509.1 RecT [Companilactobacillus furfuricola] (SEQ ID NO:395)
VNNLAKLPIQTLVKEPKIVEKFESVLGNKSAQFVTSLINVVNSNQSLKNVDQMSVVASA MVAASLDLPINQDLGYMWLVPYGGKAQPQMGYKGYIQLAQRTGQYKHLNAVAVYED EFQSYNPLTEQLDYEPHFKDRDSSEKPVGYVGYFELTSGFEKTVYWTRKQIDDHRQSFS KMSGKSKPSGVWATNFDAMALKTVLRNLISKWGPMSVEMQKAYESDEHATTISANDI KDIEVQEQEPATDVSQLINGSATEVNVNDSTTNSKDSE
WP_004234437.1 RecT [Streptococcus parauberis] (SEQ ID NO:396)
ANQLTVVNTLQSDAVKEKFEAVMGEKANGFVSSVLSVVTNNNILSKADFNSVYTSAMK AAVLDLPVEPSLGMAHIVPYKGKAQFQIGYKGLIQLALRSGQVVGLNAGKVYEGQFKS FNALTEKLDIVDIYNPKKDEPIVGYFAYMKLSNGFEKTTYWTKEQVEEHGKKYSQSYDS KFSPWQTNFDAMARKTVLKSILSTYAPLTIEMQNANDFDNGKNTGIEPLEVKDVTPETD NESLLTDLLEDEPSVNTETGEIIEDTELDLDYGQINAK
WP_006845711.1 RecT [Weissella koreensis] (SEQ ID NO:397)
ANELVKQLKSEKVAAQFETTAGKNAAAFASEVAISVMGNKALENASLSSVVVEATKAS ALGLSLLPTVGEAYLVPYKGQAQFQLGYKGLVQLAMRSGQMKSFGTVKVYEGEHPRW DKYSQELHTDGDETGEVVGYYAQFTLINGFKKADYWTKSAVEEHRSRFSKSKSGPWST DFDAMAQKTVLKSILQYAPKSSEMTRAMASEDMNGDISEGTAKPIDITPETETPKVEEA NQNQQIDTNEMVDEIKEYAKETNEAPKEQTVSAADEFFK
WP_073846185.1 RecT [Amycolatopsis sp. CB00013] (SEQ ID NO:398)
TTQTVTSAVAQQDSSPAALVRKYRTDFATVLPSHIKPETWLRIATGALRRSSQLAHAAE KNPTSLLVALLDAARKGLEPGTEQYYLVPRKTKRGPEVLGITGYQGEVELMYRAGAVS SVKVEGVREHDTFAYNPGEHDRPVHEINWRANRGDLVLAYAYAYAKMRDGATSNVS VLSADDIAVILSKAEGADSPFSPWQWNPKAMWLKSAAHQLAKWVPTSAERVWQPDGP PLEAPPATPVTLPTVEDVVDAEVVEDWPTTPADTADSEQ
WP_142511229.1 RecT [Leuconostoc pseudomesenteroides] (SEQ ID NO:399)
ANEITLAKQLSSDKVVEQFAATAGESAKSFAKEVALTISGNPALQHAKLGTVIVEATKAS ALGLSLLPTVGEAYLVPYKGDAQFQLGYKGIVQLAMRSGQMKSFGAESVYEGENPKW DKYNQELVTDGEETGKIIGYYAFFTLVNGFKMAAYWPKEKVEAHRDRFSKSKKGPWST DFDAMAKKTVLKSILQYAPKSSEMKRALAEDTQAEYVQAGIQDVTPEPANIEAPIETAN APEINAQEESLFGELSDVDKETAPNPFAQNLGGDN
WP_023055804.1 RecT [Peptoniphilus sp. BV3C26] (SEQ ID NO:400)
TNIQKQENRALSPVNQMKNLLANQGMQNLFADALKENKDRFIASIIDLYNGDNYLQNC DPKEVAMEALKAATLNLPINKSLGYAYIVPFKNKGKLTPQFQIGYKGYIQMAQRSGQYK ALNAGIMYEGMEIKRDFLRGTFEIVGEPKSDKAIGYFAYFQLLNGYEKALYMSKEDITD HAKRYSQSFGSDFSPWKNQFDEMAQKTVLRRLLTKYGVLTTEFQEAAKREEDEEVLKA TEENAMIEMNSQEETIAVDPKTGEIIEETEAPF
PCR98661.1 RecT [Lactococcus fujiensis JCM 16395] (SEQ ID NO:401)
KSAPVQARFQEVLGKKSSGFVSSLLTVVNNNNLLKRATPDSIMTAAMKAATLDLPIEPS
LGFAYIIPYGQEAQFQIGYKGLIQLALRSGQITGLNSGIVYKSQFISYDPLFEELEIDFMQP EDEVVGYFASMKLSNGFMKVVYWTKARVENHKKRFSKAGAKSPWATDFDAMAQKT VLKAMISKFAPLSQEMQIAVIADNESETLEPKDVTPEQPLISIDEPKENENSQSQISIPEDQ APQQENEEFVEELFPVGQA
WP_106316803.1 RecT [Actinoplanes italicus] (SEQ ID NO:402)
PETIANAVAQRDQSPTALVADYRNDFAAVLPSHLPPATFVRLAQGVLRRDQNLMRTAM
NNPGSLMTAMLDCARLGHEPGTPAYYFVPIKGAIEGWEGYRGVIERIYRAGAVQSVRA
EVVRENDFYEYEEGMPHPIHRYERFASPEQRGPLLGVWAYAVMLDGGMSRPVEMGRE
EVLAHRDMNPSNNRSDSPWKKWERSMWLKCAVHELEKWVPSSTEYRREIARMSAPQP AAAAAPVTYVPPQVGQRDAIEGEVAEDWPEPAEVPGGAQ
WP_013655830.1 RecT [Cellulosilyticum lentocellum] (SEQ ID NO:403)
SDKKELVLKETHSRLNQLLATKMEAMPKDFNQTRFLQNCMTVLQDTKGIENCHPVSIA
RTLLKGAFLGLDFFQRECYAIPYGGELQFQTDYKGETKMAKKYSIRDIKDIYAKVVRKG
DEFKEEIVAGQQVVDFKPLPFNDAEIIGAFAVVLYQDGGMEYETMSTKQIEGIRDNFSK
MKNGLMWTKTPEEAYKKTVLRRLTKKIEKDFASIDQAKAYEESSDMQFKQDEQKQDA KDPFADAVDVEFTEETEGQVRLDGEADGAK
WP_148001988.1 RecT [Streptomyces sp. adml3(2018)] (SEQ ID NO:404)
SQIGNEIARQSHSPAAIIEQHKADLAVVAASHVRVDTFARLAVGVLRQNEKLAAAAANN
PGSLMSALMTAARLGLEPGTEQFYLRPIKRKGQLEVQGIVGYQGIVELIYNAGAAQSVV
VEVVRARDEFAWTPGALDEHRPPRWPGAMKQPHHKVDWFGDRGPLVGAYAYAVMQ GGAISKVVVLNRDHIARAKAKSDGADTDYSPWRTDEEAMWLKTAARRLGKWVPTSAE KRTGVIERLDTPPAPLNEIDPDEDDEPIDGELVD
WP_011988985.1 RecT [Clostridium kluyveri] (SEQ ID NO:405)
PDKKMMVLSESHKALNKLLETKKEALPKDFNKSRFLQNCMTVLQDTKDIDKCQPISVA
RTMLKGAFLGLDFFNRECYAIPYNGNLQFQTDYKGEIKLAKKYSINPIKDIYAKVVRKG DEFQESIVNGHQTVNFKPLPFNNDEIIGAFAVCLFQDGSMIYETMTKQEIEDIRNNFSKAK NSPAWVKTPGEMYKKTVLRRLCKFIELDFNSIESKKTYDEASDFQFEHEPNKEVSNFDK
GSIDEDKTVEADTETEAKEDNREYAFKESE
GAC42786.1 recombinational DNA repair protein [Paenibacillus popilliae ATCC 14706] (SEQ ID NO:406)
STSHLLTIHNNLEKLIDSKREAMPKSFNKTRFLQNCMTVLQDTKDVGKCDPQSVARTLL
KGAFLGLDFFNKECYAITYGGSVQFQTDYKGEKKLAKKYSVRPVKDIYAKLVREGDEFI
EEIKDGQPTVQFKPLPFNDSEIKGAFAVSLFEDDGLAYEVMSVAEIELTRKNYSKQPNGQ
AWVKSKGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSSDFEFNKEPKQAQQSPLNPQA TVIDAEYEEVKEESDNETNQE
OBR91022.1 RecT [Clostridium ragsdalei P11] (SEQ ID NO:407)
LDKQANGFITSLLNLKQDKLKGCNDMTVLGSALKAAPLKLPIDPNLGFAWIIPFKNHGK LEAQFQVGYRGFIQMAQRSAQYKKLNVTEIYEGQLKSFNPLTEELELDLDNKQSDEVVG YAAYFRRLNGFEKMVYWSKEKVTAHARRFSKSFGNGPWKTDFDAMARKTVLKNMLS TWGILSIDMQEAITSDSKIIKTTEDDYELLEEGTEDESNANVTDVEYTESDESGKEEDGK DPYEGTPFSENNTES
SEI77195.1 RecT [Paenibacillus polymyxa] (SEQ ID NO:408)
PDKLLVIHDNLNKMLDEKSEAMPTSFNKTRFLQNCMAVLQDTKDIEQCDAKSVARTML KGAFLGLDFFNKECYAIPYNENIGTKNKPRWIKSLQFQTDYKGEKKLAKKYSTRRVKDI YAKLVRDGDDFREEIESGQPTINFRPLPFNDGIIRGAFAVALFEDGGMIYETMSLKEIEKT RDDYSKQSTGKAWTKSPGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSSDFDMNKEL KPQQQSPLNPNTTIIDAEYEEIKEEPADGPEQE
KKT72154.1 RecT [Candidatus Collierbacteria bacterium GW2011_GWB1_44_6] (SEQ ID NO:409)
SNQIQIKSEVDLKMILANQYMKQINNFFGNEKQAMKFLSSVMSAVQRIPELLNCEPKSLI NSFMTMAQLGLMPSEVSGEAYVLPYNNKNGKVAQFQLGYQGLVTLFFRAGGQKIRAEI VRKNDEVSYVNGEIKHTIDIFKSNEERGEAVGAYAVATINGQEVCKYMNATDILAFGSR FSKSWTTSFTPWKEANDPELNMWKKTVLKQLGKMLPKNESINLAIAEDNKDSIISDRLL PAVEESKNLTMGSIVKTEEPVIEVEPEEIKQ
WP_125777163.1 RecT [Antribacter gilvus] (SEQ ID NO:410)
SADVVIRQHATELTSVLPSHLAEKGDGWLNAAVAAVRKDRNLWNAANSDPGAVMNA LAEAARLGLQPGSKEYYLTVRGGKVLGIVGYQGEIELMYRAGAVSSVIVEPVFERDGFE YTPGVDDRPKHRIDWDADDRGPIRLAYAYAVMKDGAVSKVVVVNKTRIRRAKDASAT AGKSHSPWTSDEVAMWMKTAAHDLAKWVPTSAEYIREQLRAVKEVEAEPARASDPRP EPVHIVEAQILDEDPFPNAPEDGAA
WP_130123223.1 RecT [Lactococcus sp. S-13] (SEQ ID NO:411)
SNQITKTQQTLKSPEVKAKFEEVLGKKADGFVASLLSVVGNSNLKTVEANSVMTAAMK AATLDLPIEPSLGFAYVIPYGREAQFQIGYKGFIQLALRSGQLTGLNCGIVYESQFVSYDP LFEELELDFTQQASGDAVGYFASMKLANGFKKVTYWSKEQVLAHKKKFVKSANGPWR DHFDAMAQKTVLKAMLTKYAPASIESKMIQTAITEDDSERFENAKDVTPDEPVISIDEPV TSEVSQNESSAESQEQFPEDEVEELFPIGKS
WP_147265819.1 RecT [Nocardia puris] (SEQ ID NO:412)
AESISSEVARQASPLAVVARYRSELAGSLPAAVRHDVDRWLMVAEMAVRRSPDLMEIV RRDQGASLMRALIECARLGHEPGSPEFYLIPRGGIVSGEESYRGIIKRILNSGEYQRVVAR VVHERDRFSFDPRIDEIPDHRPAEGERGAPARAYAFAVRWDGTPSTVGEATPERIIAAKA KARGVDRKDSPWNSPTGVMYRKTAIRELASYVHTSAEPRPRPAAPTEPPAVDEVSTVY DAEVIDEVDVLDITAEPTA
TCP18101.1 RecT [Nicoletella semolina] (SEQ ID NO:413)
LKNADPQSVFNAACMAATLNLPIQNGLGFAYIVPYQNKKEKKTEAQFQLGYKGLIQLA
QRSGQFKRLVAVPVYEKQLIAEDPINGFEFDWKQKPENGEKPIGYYAYFKLLNDFTAEL
YMTTHEVDEHAQRYSQTYRTYLDKKSKGQWASSVWADNFEAMALKTVMKLLLSKQA
PLSVEMQQAVLADQAVVKNVETNEFSYVDNQIEEAEYTELKVSTDIFEKCKQSILNKET TLQELCDSGYEFSQEQYAELEKLEVE
OAB27843.1 recombinase [Paenibacillus macquariensis subsp. defensor] (SEQ ID NO:414)
SDKLLVIHKNLENLLDSKREAMPSNFNKTRFLQNCMTVLQDTRDIDKCDATSVARTML
KGAFLGLDFFNKECYAITYAGAVQFQTDYKGEKKLAKKYSVRPVRDIYAKLVKEGDDF
KEEVKDGQQTIQFAPKPFNDGEVLGAFAVALFEDGGLVYEVMSKVDIETTRKNYSKQA
NGQAWTKSPGEMYKKTVLRRLCKNIELDFDTIEQAQAFEDSGDMDLNKEVKPPQVSPL NATVIDAEYTEIREGDPNATNQE
WP_019417330.1 RecT [Anoxybacillus] (SEQ ID NO:415)
ATTQSLKNQIAKKQNSNIQQGVTLKQLLNSESMKKRFEEVLGKRAQQFATSILNLYNSE
KMLQKCEPMSIISSAMVAASLDLPVDKNLGYMYIVPYGTTATPIMGYRGYIQLALRTGQ
YKHINVIEVYEGELQKWDRLTEEFEMDSKQKKSDVVVGYAAYFELINGFRKTVYWTRE
QIEAHRKKYSKSDFGWKNDFDAMAKKTVLKSLLSKWGILSIEMQNAFNEDEKEVDTKE VKDITSEVQEAEYIEAEAFEVPIETETPQQEEIVFDAQ
CDA71469.1 phage RecT family [Ruminococcus sp. CAG:579] (SEQ ID NO:416)
NERTNLQYAPAPVERFKECLNSHEIKARLKNSLKNNWTQFQTSMLDLYSGDAYLQKCD
PMAVALECVKAATLDLPISKSLGFAYVVPYNNVPTFTLGYKGLIQLAQRTGQYRTINAD VVYEGEIRGADKLSGMVDLSGERTGDEVVGYFAYFKLINGFEKMIYMTRAEAEKWRD DYSPSAKSKYSPWRTDFDKMALKTCIRRLISKYGIMSVEMQGVMTEEAEPRAAAAAKR AEETVQANANSKVIDIDAAPPAANESPAEAAPQPDF
WP_019108121.1 RecT [Peptoniphilus senegalensis] (SEQ ID NO:417)
TNQIARKPVNEIKNVLSVPSVRNLFDNALADNAGAFVSSLIDLYGGDSYLQNCEPKDVV
MEALKAATLKLPINKNLGFGYVVPFKNKNGKLVPTFIIGYKGLIQLAMRTGQYKAINSGI
IYEGMEIKEDVLRGTLEIKGSKQSEKIKGYFAYFQLINGFEKALYMDVEEAADWGRKYS
KSFAKGPWTTEFDAQAQKTCLRRLLSKYGVLSTEMQRLEKTEEDVDIAVGTIENNAVEE LNIPSSQADYIVDEETGEILDDEEIVAPF
AFH22576.1 RecT family protein [environmental Halophage eHP-30] (SEQ ID NO:418)
TEQNQTPAKTESKSPIKAQLYKDNVQQRFQELLGERASAFMTSVMSVVKDNDQLSQAE
PSSVLNAAMTAATLDMPIDNNLGMAYIVPYKDGKSGKTYAQFQLGYKGFIQLAQRSGQ
FKTISATPVRQGQIVTADPLRGYEFDFTQGQDKEVVGYAAYFALLNGFEKTLYMSKAE
MEQHAASYAAGYKKGYSNWNRKFDEMALKTVIKQLLSKYAPLSVDMQKAQQTDQTV
SVEEPNAIEQQEAAPEIDASSNNNQNQ
WP_138067957.1 RecT [Streptococcus pseudoporcinus] (SEQ ID NO:419)
ANQLTVVNTLQSDAVKEKFEAVMGEKANGFVSSVLSVVTNNNLLAKADFNSVYTSAM KAAVLDLPVEPSLGMAYIVPYKGKAQFQIGYKGLIQLAQRSGKVTKLNSGKIYKGQFKS YNALSEELDIDDIYTPKEDEEVVGYFGYMKLSNGFEKITYWTKERVEKHGKKYSQSYDS KFSPWQTNFDAMAEKTVLKSILSTYAPLTIEMQNANDFDNGKNTGIEPLEVKDVTPEND NESLLSDLLEDEPSVDAETGEIMENTELDLDYGSINAK
WP_072904346.1 RecT [Hathewaya proteolytica] (SEQ ID NO:420)
ADSKKELILKESYSVLDRLIETKISAMPKDFNRTRFLQNCMTVLQDTKDIEKCQPISVART LLKGAFLGLDFFQKECYAIPYGGTLQFQTDYKGETKMAKKYSIRPIKDIYAKVVREDDL FEEEIKEGQQFVNFKPIPFSDKPIIGAFAVVLYQDGGMEYETMSKTQIEGIRDNFSKMKN GLMWTKTPEEAYKKTVLRRLTKKIEKDFDTIEQAKTYEETSDSEFKKEEKCNEKSVFDV EYSEVESEELEQQTMLENSPFGGEQ
GAE17732.1 RecT [Bacteroides pyogenes DSM 20611 = JCM 6294] (SEQ ID NO:421)
QVADPQSVLNSAVIAATLDLPINPNLGFAAIVPYNDRKSGKCIAQFQLMYKGLVELCLRS GQFASLIDEVVYEGQIVKKNKFTGEYIFDEDAKTSNKVIGYMAYFRLVNGFEKTFYMTS EEVTAHAKAYSQSFKSGYGVWKDNFDIMARKTVLKLLLSKYAPKSIEMQRAITFDQAA VKGDLTETNVDEAEIEYIDNESGSDKIKQAAEDAVIQSQQKTLL
CDF09406.1 [Eubacterium sp. CAG:76] (SEQ ID NO:422)
AERKQITTKEYLAEVKGGLENELNLNAKALPENFNQSRFVLNCISLIKSNLSNYNNITPES VYLALAKGAYLGLDFFNGECYAIPYSGEVNFQTDYKGEIKLAKTYSRNPIKDIYAKNVR DGDFFEEIIESGKQSVNFRPVPFSDKKIIGTFAVVLFKDGSMMYDTMSVKEIEEVRNNFS KAKNSKAWAATPGEMYKKTVLRRLCKLIDLDFNSQQRLAYEDAGDFDKEKADEPVAD DTVNVFDAEFKEVEPENKDAAIIEEMGLEEA
WP_099299656.1 RecT [Pediococcus pentosaceus] (SEQ ID NO:423)
MNDISKVPMKVLVQQDKVQRMLENTLKGKTRQFTTSLINVVNSNQSLADVDQMSVIKS AMVAASLDLPIDQNLGFMWLVPYKGMATPQIGYKGYIQLALRTGQYKKLNTIVVHEGE MKYWNPLTEDFEYDPKGKESDEVIGYLGYLRMINGFEKTVYWTKQNIEDHRMKFSKM SGKAKPSGVWASNYDAMALKTVMRNLLSKWGIMSIEMQQAVVQDEKAPETDVRDVT PTETNSIDSLLAPEPKGEPINDSNEATVPTNAE
WP_118227047.1 RecT [Bacteroides eggerthii] (SEQ ID NO:424)
GTVTTVPQLKSMLANENVKSRFKEILGKKAPGFISSIVAVANSNTLLQKAEPQSIMNAAV IAATLDLPINPNLGFAYIIPYGNQASFQIGYKGMTQLAMRSGQYKTINVTEVYEGEIKSEN RFTGEYTFGERKSDKIVGYMAYFSLTNGFEKYMYMSREECEKHGKKFSQTYKRGGGL WATDFDSMSKKTVLKMLISKYGILSIDMQRAQTFDQAVVKDDLVEKNIDEAEVSYEDN PTNADVRRNAMKEALEEAEVVDETTGEIFNQPAQ
WP_094754495.1 RecT [Criibacterium bergeronii] (SEQ ID NO:425)
EVNNMNNQMQQTATQVTPINQMKNLLANKGINQMFEQALKMNAGAFISSLIDLYNSD
GYLQKCEPKDVAMEALKAATLNLPINKGLGFAYIVPYGKAPQFQIGYKGYIQLAMRTG
QYKHINAGAVYEGEEVKENRLAGTVEILGDKKNDNETGYFAYFKLTNGFEKCLYMSKQ
EMTTHAQRFSKAFKNGPWQSDFSAMATKTVLRLLLSKYGVLSTQMQEAIAKENDDELQ QQINQNANKEVIDIEKIDNKNVIDIEAIDAADDDIEAPF
WP_045553720.1 RecT [Listeria] (multispecies) (SEQ ID NO:426)
ATNDELKNQLANKQNGGQVASAQSLDLKGLLEAPTMRKKFEKVLDKKAPQFLTSLLNL
YNGDDYLQKTDPMTVVTSAMVAATLDLPIDKNLGYAWIVPYKGRAQFQLGYKGYIQL
ALRSGQYKSINVIEVREGELLKWNRLTEEIELDLDNNTSEKVVGYCGYFQLINGFEKTVY
WTRKEIEAHKQKFSKSDFGWKKDYDAMAKKTVLRNMLSKWGILSIDMQTAVTEDEAE PRERKDVTEDESIPDIIDAPITPSDTLEAGSEVQGSMI
WP_106024518.1 RecT [Clostridium thermopalmarium] (SEQ ID NO:427)
ATVNELKNEIATKKETGVGSAGNTIKGLINSPAIKKRFEDVLNKKAPQYMSSIVNLVNGD
TNLKKCDQMSVIASCMVAATLDLPIDKNLGYAWIVPYGNRAQFQLGYKGYVQLALRT
GQYKAINVIEVHEGELIEWNPLTEELKIDFSQKKSDAIIGYAGYFELLNGFKKSTYWTKE
QIIRHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADQATIRPEA VETGDIKGNVDYVEADFEENYEGTPFEEVEEGGVNE
WP 073010654.1 RecT [Virgibacillus chiguensis] (SEQ ID NO:428)
ATNSSLKNQIANKGNGNQNTPQGYTVKQLMSASSVKNRFEETLGKKAPQFMASVINLV
NGDTNLQKCDQMSVVSSAMVAAALDLPIDKNLGYAWVVPYGNKATFQMGYKGYIQL
ALRTGQYKNINVIEVYEGEVKSFNRLTEEIELEFEGKESDKVIGYVGYFELINGFRKTVY
WSKDEIERHKKRFSKTGFAWKDNYDAMAKKTVIRNMLNKWGILSIDMQTAVTTDGNA VTQDFEQEDSGLVIDAEFSEVNEASEGQQEIKFENADA
WP_111921306.1 RecT [Clostridium cochlearium] (SEQ ID NO:429)
ATNESLKNQLATKKETGIGSAGNTIKSLINSPVIKKRFEEVLDKRAPQYMSSIVNLVNSDT
NLKKCDQMSVIASCMVAATMDLPVDKNLGYAWIVPYGNKAQFQMGYKGYVQLALRT
GQYKSINVIEVHEGELEEWNPLTEELKIDFSKKESDAVIGYAGYFELLNGFKKSTYWTKE
QITKHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADHGVIKNEI METGEVKENVEYIEADFESYEGTSIEEGGSNE
WP_019125538.1 RecT [Peptoniphilus grossensis] (SEQ ID NO:430)
TNIQKQENRALSPVNQMKNLLKNQGMQNLFADALKENKDRFLASIIDLYNGDTSLQDC
NPKEVVMEALKAATLNLPINKNLGYAYIVPYNSKGTTRPQFQIGYKGYIQMAQRSGQY
KALNAGILYEGMEVKRDFLRGTFEIIGEPKSDKVMGYFAYFQLLNGYEKAIYMTKDEVT
EHAERYSQSYGSKYSPWKKQFDEMGQKTVIKKLLSKYGVLTTEFQDAVKEEEDREVLR ATENNAMLEMTNPDEEEETIEVNPETGEIIEDDVKAPF
ERL63827.1 YqaK [Schleiferilactobacillus shenzhenensis LY-73] (SEQ ID NO:431)
SAVSESKDLQHVDQLSVLNSAMTAASLNLPINQNLGFFYLVPYKGIAQAQMGYKGYIQ LAQRSGQYQRLNAIPVYADEFGSWNPLTEELDYTPHFEDRKASDKPVGYVGFFKLANG FEKTVYWSRKQIEAHRDRFSKSSKSSASPWNTDFDAMALKTVLRNLITKWGPMTTDIQR ANDADEGDYKNDLSTDTSEPKDVTPGASLEQFLGETDQQQKPATKPAPKKKAEEAKPN DLKPDVTHDPNEHTEQTSLSDDDLPFD
WP_051267408.1 RecT [Gulosibacter molinativorax] (SEQ ID NO:432)
TDLTEKIATKAVAVKKDPKIADLMKSYEPQFARSLGKSMDAAKFGQDALTAIKQTPKLL EADQRSLFGAIFLAAQLKLPVGGPLAQFHLTPRKVAGEMTVVPIIGYNGYIQLAMNTGL YSKVGAFTVHANDHFRTGANSERGEFYDYERATGDRGELTGVIGYAKVKGFDESSFVY LDAATVRERHRPKFWDKTPWASDEGEMFRKTAIRVLQKYLPKSIEAAPLALAAQADQA TVRRVDGVDDLQIDHEDIAIAEVIEDD
WP_112330076.1 RecT [Cereibacter johrii] (SEQ ID NO:433)
TENTAQAPAAARQLTPIQAISQTLESDAFAPKISASLDGTGISPARFKRAALACLSRPEAS YLVEKCDRGSIFTAVMNAAAAELELHPALGQAYIVPRGGQAVLQVGYKGFIALASRAG LAVEADVIYAGDRFSIRKGTNPDVSVEPELDPAKRGEWVAVYVITHYASGAKTLTFMTR AEVEAIRNRYSDAYKRGGAGAKTWNESPEEMAKKTCIRRASKLWPISVPGGGDDDGGE VIEADPAPVPAPRMRDVTPGGGLDRLAASL
WP_063601171.1 RecT [Clostridium coskatii] (SEQ ID NO:434)
SDKKMVVLNESHTMLNKLLETKQEALPKDFNKARFLQNCMTVLQDTKGIEQCQPITVA RTMLKGAFLGLDFFNKECYAIPYKDNLQFQTDYKGEIKLAKKYSFNPIKDIYAKIVRQG DDFQEAIINGQQTINFTPVPFNNGEIIGAFAVCLFQDGSMLYETMAKQEIENTRKNFSKAP NSPAWTKTPGEMYKKTVLRRLCKLIELDFDSVECKKVYNETSDFEFENQQHEVSNFDK KDIDEDKIVEADVEVQDDNENNVPEDGE
WP_118206945.1 RecT [Bacteroides stercoris] (SEQ ID NO:435)
STITTIPQLKSMLANDNVKARFKEILGKKAPGFISSIVAVANSNTLLQKAEPQSIMNAAVV AATLDLPINPNLGFAYVVPYGNQAQFQMGWRGFVQLAMRSGQYKTINVNEIYEGEIKK SNRFTGEYEFGERASDKIVGYMAYFSLINGFEKFLYMSKEDCEKHGRKFSQTYKRGTGI WSTDFDSMAKKTVLKMLLSKFGILSIEMQRAQTFDQAIIKDNLAETDIDEAEVSYNDNP DNEEARRNAMKEALQEAEVVDENTGELFNTETK
WP_099840029.1 RecT [Clostridium combesii] (SEQ ID NO:436)
ANTKAIVLQETANNLNTLLKAKVKALPKGFNETRFLQNCMTVLQDTRNIEKCNSVSVA RTMLKGAFLGLDFFSKECYAIPYNDYKTGKCHLEFQTDYKGERKLMKQYSVRPIKDIYA KVVREGDKFEEIIEKGIPTINFRPKPFSNEKIIGVFAVVLFEDGGLLYETMSVEDVEKIKVG FAKRDKEGNYSKAWTATPEEMYKKTVIRRLRKSVELEFDSVEQQKTYEEASEFDVKRD EEVKEEASPFENVDFEEAEEGNTIEAKQE
WP_069686512.1 RecT [Oceanobacillus sp. E9] (SEQ ID NO:437)
ATNDSLKNQLSSKQGNQNTPSGYTIKQLMGAESVKKRFEEMLDSKASQFMASVINLVN GDTNLQKCDQMSVVSSAMVAATLDLPIDRNLGYAWVIPYGNQATFQLGYKGYIQLALR SGQYRNINVIEVYEGELQSFNRLTEEIELDFEKRTSDKVIGYTGFFELINGFRKTVYWSKA EIEKHKNKFSKSGFGWKNDWDAMAKKTVVRNMLNKWGILSIDMQKAYVEETKDPSEP NGEVIDLNLTEDELTAAQEQFSDENANE
RMD50745.1 [Candidatus Parcubacteria bacterium] (SEQ ID NO:438)
TEYKRPQQPEQTKMLSAKLNQAGAPNKVSSFDVQLRDWFKKHSRKMQTLAGSKEEAN KIITSLIFVAQRNPKLMTCTMESIGECLMQSAQLKLYPGPLQECAYVPFGNRATFMPQYQ GLCKLAYNSGVVRSIATEVVYANDLFEFELGTNAYLRHVPTLSDNRGERIAAWCVVKT THGEVIIVKPISFIEGIRKRSPAGNKKDSPWNTSDDDYDAMARKTVLKQALKTIPKSSDL AAAIQVDNAVESGSVDNVVTHIEPVTDPTPEETKE
WP_061413958.1 RecT [Lactococcus sp. DD01] (SEQ ID NO:439)
ANLTPTQTVLKSDAAKRKFEEVLGKKTNGFVGSLLSLVGSTNLKNVDSNSVMTAAMK
AATLDLPIEPSLGFAYVIPYGREAQFQIGYKGLIQLAIRSGQVTKLNAGPVYENQFIKYDS LFEELEIDFSMPQGVEIAGYFASMELANGFRKIIYWDKEKVTAHGKRFSKSFNRSSSPWQ TDFDAMATKTVLKAMLSTYAPLSTEMQQAIVADNESATPKDATPVTDDLVLEAVEDSK QIEENEIINDQVASENYQEPQGEPEVLDLEL
WP_147129628.1 RecT [Nocardia ninae] (SEQ ID NO:440)
AESISKEVARQANPLAVVAKYQNELGKSMPAAIRGDVGRWMMVAEMAVRKNPKLLSI
VQADQGASLMRALIECARLGHEPGTKYFYLVPRGNQISGEEGYHGIIKRVLNSGHYQKV LARTVFERDEYSFDPLTDQLPTHVPASGERGKPVSAYAFALHWDGTPSTVAEASPERIA AAKAKSYGTDRKDSPWQSVTGVMYRKTAIRELEPYVHTSAEPQPRQDNAGSRGAVMD PSTYDDAEPLDADVLDITADQIAEHDGEGAL
WP_074846740.1 RecT [Clostridium cadaveris] (SEQ ID NO:441)
ATNSSLKNQLSKKENVTIGNTMQGLLNNPKMKKRFEEILDKKAPQYMSSILNLYNGDTS LQKCEPMSVLSSSMIAATMDLPVDKNLGYAWIVPYKNKAQFQMGYKGYIQLALRTGQ YKHINAIEIHEGELVNWNPLTEELEIDFTKKESDKIIGYAGYFELLNGFKKSTYWTKTQIE NHRKKFSKSDYGWNKDFDAMAIKTVIRNMLSKWGILSIEMQNAYTADENIIKDSFIDDS ENVSANIEDLVEADYTVNQDSLESKEEFEGTPLE
WP_038246219.1 RecT [Virgibacillus] (multispecies) (SEQ ID NO:442)
ATNDSVKNQIANKNQGSNQVNPNNLGLKQLLSTPTMRKKFDEVLDKKAPQFMSSLLNL YSNDSYLQKAEPMSVVTSALVAATLDLPIDKNLGYAWIVPYGGKAQFQLGYKGYIQLA LRTGQYRNINVIEVYEGELKSFNRLTEEMELDFEQKQSDKVIGYTGYFELINGFRKTVY WSKEEIEKHKKRFSKSDFGWKKDWDAMAKKTVIRNMLNKWGILSIDMQKGIVEDNKD PIEKANEFDEQDIIEADFSEVNDDQEIDFSDAQ
WP_106064284.1 RecT [Clostridium liquoris] (SEQ ID NO:443)
TTASELKNQLATRKETGVGSAGNTVKGLLESPAIKKRFEEVLKQRAPQYMSSIVNLVNG DANLKKCDQMSVIASCMVAATLDLPVDKNLGYAWVVPYGNKAQFQLGYKGYVQLAL RTGQYKSINVIEIHEGELIEWNPLTEELRIDFEKKKSDAIIGYAGYFELINGFRKSTYWTKE QITKHKNKFSKSDFGWKKDFDAMAKKTVLRNMLSKWGILSIEMQNAYTADQETIKSEV LETGNIKENVEYVEADFDVDFEGTPFEEGVTNE
WP_028562280.1 RecT [Paenibacillus pinihumi] (SEQ ID NO:444)
ADANKLLVINEKLIKLIESKQDAMPKSFNKTRFIQNCMAVLQDTDEIDKCDATSVARTLL KGAFLGLDFMNKECYPIIYGGKCTFQTDYKGEIKLAQKYSVRPVLNIYAELVREGDFFL KEVKDGQRTIQHKPPEGFNDGKVIGAFAIVLYKDGGMDCESMSVAEIETTRKNYSKQA NGPSWTKSPGEMQKKTVLRRLCKTIQLDFDTIEAKEAFEDGGDFDFKQDPKPQQQSPFD KNATVVDAEYEEVEEEDQSESAT
WP_068672306.1 RecT [Oceanobacillus sp. Castelsardo] (SEQ ID NO:445)
ATNSTLKNQISNKKQGNNQVGKTQGTTMKQLLASPAVMNRFEEVLGKRANQFTASILG LYNSEKMLQKAEPMSVISSAMIAATLDLPVDKNLGYAWIVPYGGKAQFQMGYKGYIQL ALRTGQYRNINVIEVYEGELKKWDRLTEEIELDFESRTSDKVIGYTGYFELINGFRKTVY WSKEDVEKHKKRFSKSDFGWKNDWDAMARKTVIRNMLNKWGILSIDMQKGMVEDSK DPVEVNEEFSSDVIDADYEVVGENEQQDFTVEENA
WP_067592792.1 RecT [Nocardia terpenica] (SEQ ID NO:446)
SSIAAAAESAEVTPASIINKYRDDIATVLPPKLRERIDRWIRLAIGAVNSNPELISRVRADQ GASMMQALMKCAALGHEPGSGLFHLVPKGSRIEGWEDYKGILQRIDRSGVYARTVIGV VYANDEYSYDQNVDERPRHVRATGDRGEPISSYAYAVYPSGAITTVAEATPEQIASSKS KARGADNAASPWRAPGAPMHRKVAVRLLEKHVATSAEDRREPISRSAANDVVIDATA DYYQEP
WP_079708113.1 RecT [Paraliobacillus ryukyuensis] (SEQ ID NO:447)
ATNDTLKNQISNKKNNQVAEGKQGTTMKGLLNSPAVMKRFEEVLGKRANQFTASILSL YNNEKTLQKSEPMSVISSAMIAATLDLPIDKNLGYAWIVPYGNKAQFQLGYKGYIQLAL RTGQYRNINVIEVYEGELVKWNRLTEELELDFEQKKSDKVIGYTGYFELINGFRKTVYW SKADIEKHKQKFSKSNFGWSNDWDAMAKKTVIRNMLNKWGILSIDMQKAYSTDEIEQE QESNDFIDGEWAEVSEDDITEAMNEV
OLA20462.1 BHW17_09115 [Dorea sp. 42_8] (SEQ ID NO:448)
AVNNSLAKRDQSMKLSVYLQNDAVKKQINQVVGGKNGTRFISSIVSAVQSTPALQECTS PSIVNAALLGEALNLSPSPQLGQFYMVPFDNRKKGCKEAQFQLGYKGYIQLAERSGYYK KLNVLAIKEGELIRYDPLDEEIEVELIDDDVIREETPAMGYYAMFEYENGFCIQQKWRSE DFGTFRAGQNSGKGSLEVFFFLVQRF
WP_058906805.1 RecT [Lactiplantibacillus plantarum] (SEQ ID NO:449)
SNELAHMPMKQLVKQDAIQQMLSRTLADKASQFSTSLINLVNGNQSLAKVDQMSVIQS
AMVAATLNLPIDQNLGYMWLVPYKGRATPQIGYKGYIQLAQRTGQYLAMNAIAIHSGE
LKGWNPLTEDFQFDPMGRTSDEVIGYVGYFKLTNGFEKTVFWTKASMEEHRMSFSKMS
GGKTPQGVWASNYDAMAIKTVLRNMLSKWGPMSIEMEQALANDETAPQTPLNVEAEE SASETTDNMLDKFRQQQGEVNTSDQEHNTEDQGDPRDQS
RZT66774.1 RecT [Leucobacter luti] (SEQ ID NO:450)
SDLSQAAVAVKKSKTVEDYLTEYEPQFQRALGKSMDAAKFSQDALTAIKQTPQLGQAD
LQTLFGSLFLAAQLKLPVGGPLAQFHLTPRKRGDKLEVLPIIGFGGYVQLIMNTGLYSKV
GAFLIYEKDYFDEGANSERGEFYDFKKSRGDRGPVVGVIAYVKLKGFDESQYVFLDAD
TIRSRHRPRYWEKTPWGSDEGEMFKKTGVRVLQKLLPKSVEAAPLALAADADQATVR KVDGIEDLTIQHDVVDAEVVPDGVPV
WP_087916041.1 RecT [Paenibacillus donghaensis] (SEQ ID NO:451)
SNTQLATIHNNLERLIDSKRDAMPSSFNKTRFLQNCMTVLQDTYGIEKADPVSIARTMLK GAFLGLDFFNKECYAIIYGGKVEFMTDYKGEVKLAKKYSIKRIKDIYAKVVRAGDEFEE
TIEGGNQSINFKPLPFNDGEVLGAFAVVVYEDGSMNYDTMSVKEIESIKENFSKKSKDTG
QFSKAWVVTTSEMYRKTVLRRLCKNIELDFDTIEAKQAFEDGGDFEFNKDKKPAQESPL NPKSTVIDGEFTAVGEGAADGTE
WP_009411480.1 RecT [Capnocytophaga sp. oral taxon 324] (SEQ ID NO:452)
ETQVLQKQSLANFLNKSDKFLEQNLGAKKSEFVSNLLALSDSNKELSQCEPADLMKCA
MNATALNLPLNKNLGYAYVIPYFDGKTNRTIPQFQMGYKGFVQLAIRSGQYKTINTCEI
REGEIKRNKVTGHIDFLGENPSGAVIGYLAYIELLNGFQQSLFMTIEEVQAHARKYSKIY
AKTNRGLWKDEFDLMAKKTVLKLLLNRYGVLSVEMQKAIEKDQADNEGNYIDNPQGR YIQDAEVIEQNEPTENAQPVQPVTSEEPNKVDFKDV
WP_116232802.1 RecT [Paenibacillus sp. VMFN-D1] (SEQ ID NO:453)
AKALLENKLQERAAGASTPSTQGTSLKALLNSPAIKKRFDELLDKRSAQYMTSIVNLYN
SDAMLQKAEPMSVISSCIVAATLDLPVDKNLGYAWIVPYSGKAQFQLGYKGYIQLALRT
GQYKAINVIEVYEGELVKWNPLTEALELDFEKRKSDAVIGYAGYFELINGFRKSVYWTR
EQIESHRKKFSKSDFGWKKDYDAMAKKTIIRNMLSKWGILSIEMQDAYSKEIEAIPPLNN ENEEDPPIDLTPEDYRVGDEPQDGKEQGEMNFE
WP_123849158.1 RecT [Chitinophaga lutea] (SEQ ID NO:454)
SNVNAPAAPVKSKIEVLKDIMNAPSVQEQFQNALRENSGVFVASVIDLFNSDTYLQNCE
PKQVVMECLKAATLKLPINKNLGFAYVVPYKSNGKQIPQFQIGYKGYIQLAMRTGQYRI INADKVYEGEYRTKNKLTGEFDLSGTATSETVVGYFAHIEMLNGFAKTLYMTKEKVAA
HAKKYSKSFGKETSPWHTEFDAMALKTVLRNLLSHYGYLSVEMMGAMNADIESDQVG SEVSQTINDKANKQEMTFDDAEVVDDDEKEQNPI
WP_078410260.1 RecT [Priestia abyssalis] (SEQ ID NO:455)
ATNQSLKNQLQSRQSAGTPAQQSNSLKALLSSPTVKKRFEEVLDKRSAQFMTSIVNLYN SEKMLQKCEPMSVISSAMVAATLDLPVDKNLGYAWIVPYKNTASFQLGYKGYIQLALR TSQYRFINVTPVHEGELMKWNPLTEEIEIDFDARQSDVIIGYAAYFELLNGFRKTVYWTK NQVEKHRKKFAKSDFGWKNDYDAMAMKTVLKAMLSKWGILSIEMQKAYSEDEEPREL KDITEEAQEVDYIEAEVIDVPAEEKASAFDQENFHIE
AAT90028.1 phage recombination protein [Leifsonia xyli subsp. xyli str. CTCB07] (SEQ ID NO:456)
AVKKNPTIEDYLIKYEPEFQRALGASMDAAKFAQDALTAIKQNPKIGHSDPRSLFGALFL AAQLKLPVGGPLAQFHLTTRTVKGNLTVVPIVGYGGYVQLIMNTGLYSRVSAFLIHAGD YFVTGANSERGEFYDFRRADSDRGEVKGVIAYAKVKGHNESSWVYIDAETMRAKHRP KYWESTPWADDAGEMFKKTGIRVLQKYLPKSVESLNVALAASADQAIVRKVDGVPDL DIQHDRDTETVAVPEQPVSVPQPGDET
WP_080022455.1 RecT [Clostridium thermobutyricum] (SEQ ID NO:457)
QSTGDIVFPQNYNYSNALKSAQLILAETVDRNKVPVLQSCSKPSICNALLDMVIQGLSPA KKQCYFVPYGGKLQLMKSYLGNIAATKRLKGVKDVFANVIYEGDVFEYKLNLNTGLIEI EKHEQKFENISKKILGAYAVVVRENQNNYVEVMNIEQIKNAWNQGAAKGNSQAHKNF AEEMAKKTVINRACKRFVNTSDDSDTLIESINRTNEYKEEDIIETTKSEVGEEIKENANTE NLGLEDTEVVEAEVIENIEFEGDK
WP_081759639.1 RecT [Clostridium jeddahense] (SEQ ID NO:458)
LGERTPQFISSIVSLVNADANLQRAFYDAPVTVIQSALKVATFNLPIDPNLGYAYIVPFNN TVKNPDGSIRKRIEASFIMGYKGMNQLALRTGVYKTINVVDVREGELKSYNRLTEDIEL DFVEDDEEREKLPIIGWVGYYRLINGTEKTIYMTRKQIETHEKKNHKGQYMGKGWRED FDSMAMETVFRRLIGKWCLMSIDYQRANPGTLAAADALAHGQFDDEDPLPDAVPLQAE AQEVNPETGEVQS
WP_089281299.1 RecT [Anaerovirgula multivorans] (SEQ ID NO:459)
DAKHLTVVHQNLNTLLKAKADALPKGFNQTRFLQNCMTVLQDTKDIESVEPKSVARTM LKGAFLGLDFFNKECYAIVYNKKAGNSWIKTLEFQTDYKGEIKLAKKYSINTIKDIYAKL VREGDEFEEGVKDGKQVINFKPKPFNNNKILGAFAVAYYENGSMIYDTMSVEEIESVKK AYAKADKEGKYSKAWIESTGEMYKKTVLRRLCKLIELDFDTIEQKQAFDEGSGMEFKQ EGKTDKPKSSLEAEFVEAEYEEVEESETSEVVEE
RDI65706.1 phage RecT family recombinase [Nocardia pseudobrasiliensis] (SEQ ID NO:460)
SSIANAANASELTPASIVNRYRDDIAAVLPPKLQARIDRWLRLAIGAVNSNADLVDRVR ADQGASMMQTLMKCAALGHEPGSGLFHLVPKGPRIEGWEDYKGVLQRIDRSGVYARV VVEVVHANDDYAYDPNLDDRPQHKRAAADRGEPVSAYAYAVYPNGAVTAVAEATPE LIAASKAKARGADNASSPWRAPGAPMHRKVAIRQLEKFVATSAEDMREVAVRNAAPD VEDAPADYYQEP
WP_076170610.1 RecT [Paenibacillus rhizosphaerae] (SEQ ID NO:461)
SSKLVEINSKLDSFLDAQHKAMPKGFNKTRFLQNSMSVLRDIEGLEQCDPKSVALVMLK GAFLGLDFFNKECYPVVYAGKVEFQTDYKGEVKLVKKYSTKPVREIYAKLVRQGDDFS EEIVAGSQTINFKPLPFNNGEIVGAFAVVNYVDGTMQYDTMSTEEIEKIKVNFSRKSKKT NEYSKAWVVTPGEMYKKTVLRRLCKTIDLDFDTIEQAQAFEDAADMDFNQDSKPQQQS PLNPMVIDVEYEEVKEEQADAAEQE
WP_106833617.1 RecT [Brevibacillus porteri] (SEQ ID NO:462)
ADQNKLVVIYNNLEKLLDSKREAMPTSFNKTRFLQNCMTVLQETKDIELCNPTTVART MLKGAFLGLDFFNKECYAVVYKGSVEFQTDYKGEVKLAKKYSTKPVREVYAKLVREG DEFAEEINSGNQTINFRPKPFNNEEILGAFAVVNYMDGTMAYDIMSKEEIEKIKENFSRKS KQTGEYSKAWVVTPGEMYKKTVLRRLCKNIDLDFDTIEQRQAFEDAGDVDFNQEVKPA QQSPLNSTVIEAEFEEVSEEQTNAAEQE
RDE19343.1 RecT [Parageobacillus thermoglucosidasius] (SEQ ID NO:463)
AKQADLKNKLANKNSTNPTAYLKNLVYAPTVQQKFKEVLKEKAAHFLTSLISLVDSSPD LQKCNPMTIIASAMKAATLELPIDKNLGYAWIVPYKNVATFQIGYKGYIQLALRTGLYR SINVIEVYEGELRKWNRLTEELDIDEGARKSDHVIGYAGYFELTNGFIKRVYWSKEDIER HRKKFAKSDFGWENNYDAMAKKTVLRNMLSKWGILSIDMQRAYVNDIDDPEQTKEVI DVEWSEIIEEANVANSPEQQEIVFEQ
WP_138600901.1 RecT [Pseudoalteromonas] (multispecies) (SEQ ID NO:464)
SLSLQEYQNLLYGKLTACKGQFDACLSENGYKLDFNTELNYVYQIVMSGLNVEYSFPYT PVESVITSFLKAAKIGLSLCPTEQLCFLKTEYSESSGQYVTQLGLGYKGILKLAYRSGKV KQINANVFYEKDNFQYNGVNSKVTHTTTVLSKAMRGQLAGGYCQTELIDGSFKTTVMP PEEILAIEEQGKAMGNEAWLSVHVDQMREKTLIKRHWKTLCPCIYRDSVMNDPMLFDD QDCQHSSNQQAYEEQFESAYSREAY
WP_082209600.1 RecT [Peptostreptococcaceae bacterium VA2] (SEQ ID NO:465)
QPFLVQRYPHLDVVLNDQVHVLKSFFFQNHIILYLYKYIECLQIFHKPLLKGDRGKVIGY YAVYHLEPNGYNFVFMTYDEVKNHGKKYSKNFEGGIWEKEFDSMAKKTVIKKLLKYA PLSIEMQKAVTFDESVKGSIDNDMLLVESIEDVEEIQLDTNI
WP_026627303.1 RecT [Dysgonomonas capnocytophagoides] (SEQ ID NO:466)
STQQVQQQTKPLSLANFLNAPSTANFLKETLAEKKSEFVSNLIALCDADPKLAQCDPAQ LMKCAMNATSLNLPLNKNLGYAYVIAYKGVPSFQIGYKGLIQLAIRTGQYKFINATEIRE GEIRHNKITGEVIFNGEKPDAPIVGYMSYLELVNGFTASLYMTEEQIEQHALRFSQTYKN DKQYRSSTSKWSDPLARPTMCKKTVLKLLLGTYGLMTTEFAKALDSDSDDEVSTSGHR FEEAEIVQQGEPNEEQSDEPKRMEI
WP_109523733.1 RecT [Nocardia aurea] (SEQ ID NO:467)
SESISAAADAQKVTPRIVLDRHRDAFAQVLPPTINLDRWLRLAESAINASAGLLDIFRRD RGASALKALMKCAQLGHEPGSGLFHLVPKGQAIEGWEDYKGILQRILRSALYAKVVVA PVYANDEYAFDVNVDERPRHKQAAGDRGEPVRAYAYAVHRDGSTSTIAEATPAMIAG
AKAKGHKTDASTSPWQNPRAPMHQKVAVRELERFVSTSAVDLRVTGDVTDLIIEEP
GAE09585.1 [Paenibacillus sp. JCM 10914] (SEQ ID NO:468)
TMDYVTKIQDALDRELDAKHDALPSGFKKTRFSENCRAYVKDYKDLQKYDAEEVASV
LFKGAVLGLDFLAKECHVITEGSALRFQTDYKGEMTLVKKYSVRPILDIYAKNVREGDD
FREEISGGKPLIHFNPRAFNNSKITGSFAVALFTDGGMVYETMPAEEIESIREHYGKNPGS
DTWEKSQGEMYKRTVLRRLCKTIEIDFDAEQSLAYEAGSSFEFDREQQPKKRSPFNPPEV
EESEVLSNDGITETQ
RRG08833.1 RecT [Lactobacillus sp.] (SEQ ID NO:469)
NSLSGALNSRNQAGSPTSMIKNLMRSDSIKNRFDEVMGAKAPQFMASITNLVNSNQDLQ
HVDAMSVVASAMVAATLDLPIDPNLGYMYIVPYRGQAQPQMGYKGYIQLALRTGQYK
HINALPVYDDEVKSWNPLTEELEYESSGTSHDNQTPAGYVGYFQLINGFEKTTYWTYDQ
INSHRQKFSKMSSKTDPTGVWKSNFDAMALKTVLRNLISKWGIMSIEMQQAFVKDERP
QEFDHETGEIQDVQEVEAEEENVAPETQGSTDKKEE
GEA30849.1 CDIOL_17720 [Clostridium diolis] (SEQ ID NO:470)
ATNSSLKNQLIEKEQSTVNVQETIFKNLINSDEIKSKFTEVLKDKAFEYINSIINLVKEIPVP
NALGASDSHQSADLGSLLIECEPRSIIDACMIAASLDLSIDKNLEYVWIIPYKKKSNFQLG
YKGYIQLLLRTGEYKAINVIEVYEGQLKSWNPLTEEFDIDVSAKKSDAVIGYAGYFEMV
NGFRKYVYWSKDNMDAFRNNSFKGDPRWNNDYKAMAKRTVMRNMLSKWGRLSAE
MQRAYLEDINTDKFINGN
WP_077867213.1 RecT [Clostridium saccharobutylicum] (SEQ ID NO:471)
ATNSSLKNQLIEKEQTTVNVQETMFKNLINSDDVKSKFTEVLKDKAIQYINSIINLVNSDK
DLIECEPKSIIDACMSAVSLDLSVDKNLEYVEIIPYKKKANFQLGYRGYIQLLLRTGEYKS
VNIIEVYEGQLKSWNQLAEEFDIDFTYKKSDAVIGYAGYFEMLNGFRKSVYWSKENMD
ALRENSFKSDTRWNNDYKAMAKRAVIRNMISKWGSLSIEMEKAYCEDLNTDKFVNGN
WP_132305216.1 RecT [Paenibacillus sp. BK033] (SEQ ID NO:472)
AANTQLITIHNNLEKLIEAKKDAMPQGFNKTRFIQNCMTVLQDTYGIEKCEPTTVARTLL
KGAFLGLDFFNKECYAIPYGASMNFQTDYKGERKLAKKYSVRKVKDIYAKLVRAGDVF
EENITDGQQTIQFAPVPFNNGDIVGAFAVVLFHDGGMLYETMSIAEMEHIKENYSKKSK
DTGKFSKAWEVSTGEMYKKTVLRRLCKNIELDFDTIEQARAFEDAADVDFNKKTAPQQ
TSPLNVVEAEYEVVNDGSATEAQSE
RPI78794.1 EHM45_05245 [Desulfobacteraceae bacterium] (SEQ ID NO:473)
ATPNTPTTTDAGDFLKKSEKSLKNYAVRKYDFTSFLKSAMIAINDNTTLSECLRTEAGK
KSLFNAMRYAATTGLSLNPQEGKAALIGYKNKAGEMVLNYQIMKNGLIDLALSSGKVE
FVTADLVRANDEFSIKKSASGDDYSFSPAIRDRGEVIGFVAALKLKGSATYVKWMSTEE VAEFRDKYSSMYKNRPDASPWTHSFNGMGIKTVMKALLRSVSISPDVDAAVKSDDYIE AEFTVHGTTADDAVTQLQTPSKPVKAEEGQGELL
WP_051624047.1 RecT [Clostridium akagii] (SEQ ID NO:474)
ATSESLKNQLVNKETRPPKDPFKALVYSAGIKKRFEDMLDKQANGFITSLLNLKQDKLK SCDDFTVLGSALKAAALKLPIDPNLGFAWIIPFKNHGKLEAQFQIGYKGFIQMAQRSGQY KKLNVTEIYEGQLKSFNPLTEEIVLDLDNIKSDLRKINKRYLIVMRMNLLALHLRKISKG
WP_081735325.1 RecT [Paenibacillus gorillae] (SEQ ID NO:475)
LEAKHDALPSGFNAVRFVQNCKAYLPEVRNFERFNPDEIALQFLKGAILGLDFLAKECH VITEGSAARFQTDYKGEMKLAMKHSVRPLLNIYAKNVREGDVFRESVVEGRPVVSFDP LPFNNSKIIGSFAVAQFNDGGMDYESMSSTEIESIRTHYGKNPGSDTWEKSQGEMYKRT ALRRLCKTIEIDFDAEQRLAFDAGSSFEFNREPRPQQQSPLNLESEVLTDEVEQG
WP_084505057.1 RecT [Acetobacterium dehalogenans] (SEQ ID NO:476)
CLRSWTRSFSNSVPLKIRFRLLYTAFLSQGSPLSSVNTTQSADKGSPIFFYAMFKTKDGG YGFEVMSVEDVRAHAKKYSQSFSSAYSPWSKNFEEMAKKTVLKKALKYAPLKSDFVR GIVVDETIKREISEDMYAAPSIEIEYEVDEDGVIQDEPTSNELTEAEK
AGF93134.1 RecT protein [uncultured organism] (SEQ ID NO:477)
SNELQNIKPEVFGEVEDKLGSLADNNGIDLPENYSARNALKQAYLKLQSKDEPVFDKYK DETIYNALLDTLTQGLNPGKDQVYYIGYGNHLTAQKSYFGNIALAKRMAGVQEVSSNVI LEGDEVDISIERGQQVIESHDRNFDSMDGQVKGAYAVISFEDERKDKYEIMTLKELKQA WAQGKSFGGNGKSPHHKFTKEMAKKTVINRALKPLIKASDDSGLIKEKPKLEKLKDGQ QERTEGEKIEEVDVDKEEVVEVDYDV
WP_076079849.1 RecT [Paenibacillus sp. FSL R7-0333] (SEQ ID NO:478)
TVAIELQVQETLDRILDSKHDALPSDFNKKRFSENCKAYVADEKDLHKYSPEEIAANLFK GAVLGLDFLAKECHLISGGVELKFQTDYKGEMKLTKKYSVRPLLDVYAKNVREGDEFR EEVIEGRPVIHFAPLPFNASSIIGSFAVALFQDGGMVYESIPAGEIEEIRKNYGKSLGDAW DKSQGEMYKRTVLRRLCKTIETDFDAEQRLIYDAGGAFEFTKQPARSRQQSPFNPPEESE VTQDDRVAETDQG
WP_119800346.1 RecT [Paenibacillus sp. 1011MAR3C5] (SEQ ID NO:479)
ATEQIISSLEALLEAKHDALPSGFNPTRFVQNCIAYLPEIRNWDRFNAEDLAIQFFKGAVL GLDFLAKECHIIAEGSGVRFQTDYKGEMKLAMKHSVRPLLTIYAKNVREGDCIEEAVIE GRPVINFNPLPFNNSSISGSFAVAQYTDGGMVYETMSAEEIEAVRTNYGKNPGSDTWDK SKGEMYKRTVLRRLCKTIEIDFDAEQRLAFEAGSEFDFSKQPRPQQRSPFEEKEVGPDEV EQG
WP_025706233.1 RecT [Paenibacillus graminis] (SEQ ID NO:480)
TVAIETQVQETLDRILDSKHDALPSDFKKKRFSENCKVYVAEEKDLHKYTIDDIVANLFK GAVLGLDFLAKECHLITGGVDLKFQTDYKGEMKLTKKYSVRPMLDVYAKNVREGDIFR EQIIEGRPAIHFDPLPFNASKIIGSFAIALFQDGGMVYESIPAGEIEEIRKNYGKSLGDAWE KSQGEMYKRTVLRRLCKTIETDFDAEQRLIYEMGGAFEFTKQPTRSRQQSPFNPPEESEV IQNDRAAETDQG
OIO76374.1 AUJ88_06865 [Gallionellaceae bacterium CG1_02_56_997] (SEQ ID NO:481)
GRKEVERIRDGSRGYQAAKKYKKESTWDTDFVAMGLKTAIRRICKFLPKSPELATALA MDEQAGRQNLNLDDVINGSYTPVVDKDTGEIVDVADGGKTGNSNAATSTKLEKLEKIV AALRDANSVEALDEIYIRAEGDLDDANLEIAMREYRKCKDAISNSLI
WP_131535536.1 RecT [Pedobacter nototheniae] (SEQ ID NO:482)
STEQSQQQTAARVPAKFQEGTVDSILKRVSDFQNTGELVLPANYIPENAVRAAWLMLM ETTDRNDKPAIEVCTKESIANAFLEMVTKGLSVVKKQCYFVVYGNKLSLEDSYIGKIAIA KREAGVKEVNAVTIYEGDIFKYENDIETGRKRILEHKQELKNINPDKIVGA
WP_028113352.1 RecT [Ferrimonas kyonanensis] (SEQ ID NO:483)
NQMINEPDFVTALKDSRETYIDLTQNGGFNLNYGLEAGWAHQQIEASRYQNLDLTCSEP GSIMQAFCEAARLGLSFDPRKKHIYLMGQKDVQSGRTITILYVGYKGMIALACRTGFMI GGHADLVFEEDTFTYRSGTQLPVHEHDGRPNHERGRLKCGYVVAHQPGGMVKTLLVP KEVLLEAASNGLNAGGSNNTWCGPYMEMMYQKTCWRYAFNAWYSELEAVGMTQAQ LESATTAVSYQ
WP_100916003.1 RecT [Pseudoalteromonas spongiae] (SEQ ID NO:484)
NKFQHLQTELSSQLLSTKERFNELNNKNNLKVNFEEEYNFFYHLVTSSFYNINGIATCTF SSLKEAFLNIAKYGLSINPKLNLCYIRTEQSCAQANVNIAVYDFGYKGLLKLITRTGKVKI VTADVFYENDNFEFRGTREPVKHSTKTLSAAARGAMAGGYCSSELVAGGVVTTIMTPE ELREIESICQSTGNEAWNSVFIDELRRKTLIKRHWKTLMQVIEEQNLSVPIEETYQCDFAN GGY
WP_125711747.1 RecT [Companilactobacillus kedongensis] (SEQ ID NO:485)
MKDLARIPVKELVRSDTIKSKFNDVLGKRAPQFISSIVNIVNSNQDLKNVDQTSVISSALV AASLDLPINQSFGYMYLVPYSGKAQPQMGYKGYIQLAQRSGQYKRLNAISVSKEKVPD KMVIFIPDYRMEEAETQIDMYQDHIEDVKAGRVEPTRCGKCDYCKSTAKLGKIVSMDD LID
WP_002845682.1 RecT [Peptostreptococcus anaerobius] (SEQ ID NO:486)
SNQVTESKKGYVAEKNITDSALNAINKYMNDGVLHLPKTYSVENAMKSAYLTLSQAKD KNGKSVLESCTKESIYQSLLDMAVQGLTPAKNQCYFIPYGSKLTMSRSYLGTIAVTKSA VPEVKDVKGYAIYDKDVFETEFDYNTGCIKIKKFERNFDSIDTNSIKGAFALIIGEHGVLH TEVMNMAQIRNAWSMGATNGKSKAHNQFTDQMAIRTVINRACKFYINTSDDTSVLFAD SYANSDEDTSSEREVEIVDENVREKK
WP_115407185.1 RecT [Shewanella morhuae] (SEQ ID NO:487)
QTAQVKLSVPHQQVYQDNFNYLSSQVVGHLVDLNEEIGYLNQIVFNSLSTASPLDVAAP WSVYGLLLNVCRLGLSLNPEKKLAYVMPSWSETGEIIMKLYPGYRGEIAIASNFNVIKN ANAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDNTIQISYLSIEE MNAIAQNQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEIQSALSDTEY
WP_081955873.1 RecT [Helicobacter trogontum] (SEQ ID NO:488)
SNITTIQRKNEALALLENKEIQERLCALCGNEASKDKFKASLLNIALDSNLSACSMQSIVK ASLDIAGLKLSLNKNLGKAYIVPRKVKIGNDYITEARIDIGYKGWLELAKRSKLSVKAHS VFDCDDFVYSVDGVDEYMKLTPNFELRQEHDSAWVKEHLKGIVVGIKDLKSGDSEVKF VSKGTLLKIMQKNDSVKNGKYSAYTDWLHEMLLAKAIKSCLSKTAMSEDTFYLIISNNK LFI
WP_064664300.1 RecT [Pseudoalteromonas sp. MQS005] (SEQ ID NO:489)
SISQQDYENLLYSKLYECESQYQAYLAEHNEKLNFNAELNYMYKAVMSGVGIEGGFPY TPLESIVESFLKAAKLGLSLDPSEQFCFLRSQYDHSTGLYHTELGLGYKGVLHLAYRSGK VKQIVSNVFYNKDNFQFNGPNSKVTHTMTVLSTSARGNLAGGYCQTELVDGSFIVTVM PPEEILAIEEQGKSVGNPAWLSAHVNQMREKTLILRHWKTLYPAIYSSSLLDSAQIFDDE CEEFPFSSPSQGFSESQTIGSY
WP_069455496.1 RecT [Shewanella xiamenensis] (SEQ ID NO:490)
QTAQVKLSVPHQQVFQDNFNYLSSQIVGHQVDLNEEIGYLNQIVFNSLATTSPLDVAAS WSVYRLLLNVCRLGLSLDPEKKLAYVIPSLSETGEKIMKLYPGYRGEIAIASNANVLKNA NAVLVYENDHFRIQAATGEIEHFVTSLSIDPRVRGACSGGYCRSVLMDGSVLMSYLSIEE MDSIAQHQIEANMGNTPWNSIWRTEMNRVALYRRAAKDWRQLIKATPEMQSSLLDTEF
RTL04618.1 EKK58_09925 [Candidatus Dependentiae bacterium] (SEQ ID NO:491)
CHVLNFQTDYKGEIKLAHKYSVRKIIDIYAKVVRDGDVLEIRVENGSQIVNFNPKVFNDG KIIGAFAVVKFVDGSLLYETMSKSEIDHTRVTFSKMPNGMAWKDSEGEMCRKTVLRRIC KLIDLHFDSVEQEQAWNDGSDADLTKNEPVKPEIQNPFPTKAVEAVIVTEEEKLRKQLK DKDPTLQDWQIDALVREHKEANQ
[00291] Additional exemplary and non-limiting aspects and embodiments of the disclosure are summarized as follows in the form of numbered paragraphs.
1. A system comprising: a Cas12f protein; a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and a microbial recombination protein, wherein the microbial recombination protein is selected from the group consisting of RecE, RecT, lambda exonuclease, Bet protein, exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
2. The system of paragraph 1, further comprising a recruitment system comprising: at least one aptamer sequence; and an aptamer binding protein functionally linked to the microbial recombination protein as part of a fusion protein.
3. The system of paragraph 2, wherein the at least one aptamer sequence is an RNA aptamer sequence or a peptide aptamer sequence.
4. The system of paragraph 3, wherein the nucleic acid molecule comprises the at least one RNA aptamer sequence.
5. The system of paragraph 4, wherein the nucleic acid molecule comprises two RNA aptamer sequences.
6. The system of paragraph 5, wherein the two RNA aptamer sequences comprise the same sequence.
7. The system of any of paragraphs 2-6, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
8. The system of any of paragraphs 2-6, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
9. The system of paragraph 3, wherein the at least one peptide aptamer sequence is conjugated to the Cas12f protein.
10. The system of paragraph 9, wherein the at least one peptide aptamer sequence comprises between 1 and 24 peptide aptamer sequences.
11. The system of paragraph 9 or 10, wherein the aptamer sequences comprise the same sequence.
12. The system of any of paragraphs 2-3 or 9-11, wherein the aptamer sequence comprises a GCN4 peptide sequence.
13. The system of any of paragraphs 2-12, wherein the microbial recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
14. The system of any of paragraphs 2-13, wherein the fusion protein further comprises a linker between the microbial recombination protein and the aptamer binding protein.
15. The system of paragraph 14, wherein the linker comprises the amino acid sequence of SEQ ID NO: 15.
16. The system of any of paragraphs 2-15, wherein the fusion protein further comprises a nuclear localization sequence.
17. The system of paragraph 16, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO: 16.
18. The system of paragraph 16 or paragraph 17, wherein the nuclear localization sequence is on the microbial recombination protein C-terminus.
19. The system of any of paragraphs 1-18, wherein the RecE or RecT recombination protein is derived from E. coli.
20. The system of any of paragraphs 1-19, wherein the microbial recombination protein comprises RecE, or derivative or variant thereof.
21. The system of any of paragraphs 1-20, wherein the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-8.
22. The system of any of paragraphs 1-21, wherein the RecE, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-3.
23. The system of any of paragraphs 1-19, wherein the microbial recombination protein comprises RecT, or derivative or variant thereof.
24. The system of any of paragraphs 1-19 or 23, wherein the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 9-14.
25. The system of any of paragraphs 1-19 or 23-24, wherein the RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% similarity to the amino acid sequence of SEQ ID NO: 9.
26. The system of any of paragraphs 1-25, wherein the Cas12f protein is catalytically dead.
27. The system of any of paragraphs 1-26, further comprising a donor nucleic acid.
28. The system of any of paragraphs 1-27, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
29. A composition comprising: a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein, wherein the microbial recombination protein is RecE, RecT, lambda exonuclease, Bet protein, exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
30. The composition of paragraph 29, further comprising at least one of: a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein; and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
31. The composition of paragraph 30, wherein the nucleic acid molecule further comprises at least one RNA aptamer sequence.
32. The composition of paragraph 31, wherein the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
33. A vector comprising a polynucleotide comprising a nucleic acid sequence encoding a fusion protein comprising a microbial recombination protein functionally linked to an aptamer binding protein, wherein the microbial recombination protein is RecE, RecT, lambda exonuclease, Bet protein, exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
34. The vector of paragraph 33, further comprising at least one of: a polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein; and a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence.
35. The vector of paragraph 34, wherein the nucleic acid molecule further comprises at least one RNA aptamer sequence.
36. The vector of paragraph 34, wherein the polynucleotide comprising a nucleic acid sequence encoding a Cas12f protein further comprises a sequence encoding at least one peptide aptamer sequence.
37. The vector of any one of paragraphs 33-36, wherein the vector is AAV.
38. A eukaryotic cell comprising the system of any one of paragraphs 1-28, the composition of any one of paragraphs 29-32, or the vector of any of paragraphs 33-37.
39. A method of altering a target genomic DNA sequence in a cell, comprising introducing the system of any one of paragraphs 1-28, the composition of any one of paragraphs 29-32, or the vector of any one of paragraphs 33-37 into a cell comprising a target genomic DNA sequence.
40. The method of paragraph 39, wherein the cell is a mammalian cell.
41. The method of paragraph 39 or paragraph 40, wherein the cell is a human cell.
42. The method of any one of paragraphs 39-41, wherein the cell is a stem cell.
43. The method of any one of paragraphs 39-42, wherein the target genomic DNA sequence encodes a gene product.
44. The method of any one of paragraphs 39-43, wherein the introducing into a cell comprises administering to a subject.
45. The method of paragraph 44, wherein the subject is a human.
46. The method of paragraph 44 or 45, wherein the administering comprises in vivo administration.
47. The method of paragraph 44 or 46, wherein the administering comprises transplantation of ex vivo treated cells comprising the system, composition, or vector.
48. Use of the system of any one of paragraphs 1-28, the composition of any one of paragraphs 29-32, or the vector of any one of paragraphs 33-37 for the alteration of a target DNA sequence in a cell.
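Paragraphs 21, 22, 24, and 25 define RecE and RecT variants by a minimum percent similarity to reference sequences, and the claims use percent identity thresholds in the same way. The calculation is not spelled out in this section, so the following Python sketch only illustrates one common convention, under stated assumptions: the two sequences are already aligned (for example by a separate alignment tool) with gaps written as `-`, columns where both sequences carry a gap are ignored, a gap opposite a residue counts as a mismatch, and percent identity is identical columns divided by the remaining columns. Percent similarity, which also credits conservative substitutions through a matrix such as BLOSUM62, is not computed here. The names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned protein sequences.

    Assumed convention (illustrative only): identical columns divided by
    aligned columns, skipping columns where both sequences have a gap.
    A gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # gap-gap columns carry no information; skip them
        columns += 1
        if a == b:  # a gap never equals a residue, so it scores as a mismatch
            identical += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * identical / columns


def meets_threshold(aligned_query: str, aligned_reference: str,
                    threshold: float = 70.0) -> bool:
    """True if the query reaches the identity threshold (e.g. 70 or 95)."""
    return percent_identity(aligned_query, aligned_reference) >= threshold


if __name__ == "__main__":
    # Short fragments from SEQ ID NO:479 (reference) and SEQ ID NO:478 (query).
    ref = "MKLAMKHSVRPLL"
    qry = "MKLTKKYSVRPLL"
    print(round(percent_identity(qry, ref), 1))  # prints 76.9
    print(meets_threshold(qry, ref, 70.0))       # prints True
```

With these two 13-residue fragments, 10 of 13 columns match. A fragment score says nothing about full-length identity, and other conventions (for example dividing by the length of the shorter sequence) can give a different number for the same pair.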
* * *
[00292] Having thus described in detail preferred embodiments of the present invention, it is to be understood that the invention defined by the above paragraphs is not to be limited to particular details set forth in the above description as many apparent variations thereof are possible without departing from the spirit or scope of the present invention.

Claims

WHAT IS CLAIMED IS:
1. A system or composition comprising:
(i) a Cas12f protein;
(ii) a nucleic acid molecule comprising a guide RNA sequence that is complementary to a target DNA sequence; and
(iii) a recombination protein, wherein the recombination protein comprises an exonuclease, a single stranded DNA annealing protein (SSAP), or a single stranded DNA binding protein (SSB), or a combination of two or more thereof; or,
(iv) nucleic acid molecule(s) encoding or delivering (i), and/or (ii), and/or (iii) for expression in vivo in a cell; or,
(v) vector(s) containing the nucleic acid molecule(s) of (iv) for expression in vivo in a cell.
2. The system or composition of claim 1, further comprising a recruitment system comprising at least one aptamer; and an aptamer binding protein functionally linked to the recombination protein as part of a fusion protein.
3. The system or composition of claim 2, wherein the at least one aptamer is an RNA aptamer or a peptide aptamer.
4. The system or composition of claim 3, wherein the nucleic acid molecule or nucleic acid molecules comprises two RNA aptamers.
5. The system or composition of claim 4, wherein the two RNA aptamer sequences comprise the same sequence.
6. The system or composition of any of claims 2-4, wherein the aptamer binding protein comprises a MS2 coat protein, or a functional derivative or variant thereof.
7. The system or composition of any of claims 2-4, wherein the aptamer binding protein comprises phage N peptide, or a functional derivative or variant thereof.
8. The system or composition of claim 2, wherein the at least one aptamer sequence is linked to the Cas12f protein.
9. The system or composition of claim 2, wherein the at least one aptamer sequence is linked to the guide RNA.
10. The system or composition of claim 2, wherein the recruitment system comprises from 1 to 24 aptamers.
11. The system or composition of any one of claims 8 to 10, wherein two or more aptamers comprise the same sequence.
12. The system or composition of any of claims 2, 3 or 8-11, wherein the aptamer comprises a GCN4 peptide sequence.
13. The system or composition of any of claims 2-12, wherein the recombination protein N-terminus is linked to the aptamer binding protein C-terminus.
14. The system or composition of any one of claims 1-13, wherein the system or composition comprises a SSAP and an exonuclease.
15. The system or composition of any of claims 2-13, further comprising a linker between the recombination protein and the aptamer binding protein.
16. The system or composition of claim 15, wherein the linker comprises the amino acid sequence of SEQ ID NO: 15.
17. The system or composition of any of claims 1-15, including at least one nuclear localization sequence (NLS), optionally wherein the NLS(s) is/are linked to the recombination protein or the Cas12f protein.
18. The system or composition of claim 17, wherein the nuclear localization sequence comprises the amino acid sequence of SEQ ID NO: 16.
19. The system or composition of claim 17 or 18, which comprises a NLS on one or more of the recombination protein C-terminus, the recombination protein N-terminus, or the Cas12f protein.
20. The system or composition of any one of claims 1-18, wherein the recombination protein comprises a SSAP.
21. The system or composition of claim 20, wherein the SSAP is fused to a Cas12f or a dCas12f.
22. The system or composition of any one of claims 1-19, wherein the recombination protein comprises an exonuclease.
23. The system or composition of claim 22, wherein the exonuclease is fused to a Cas12f or a dCas12f.
24. The system or composition of any one of claims 1-19, which comprises a fusion protein comprising a SSAP, an exonuclease, and a Cas12f.
25. The system or composition of claim 24, wherein the Cas12f is a dCas12f.
26. The system or composition of any one of claims 1-19, wherein the recombination protein comprises an exonuclease.
27. The system or composition of any one of claims 1-19, wherein the recombination protein comprises a microbial recombination protein or active portion thereof.
28. The system or composition of claim 20, wherein the recombination protein comprises RecE, RecT, lambda exonuclease, Bet protein, exonuclease gp6, single-stranded DNA-binding protein gp2.5, or a derivative or variant thereof.
29. The system or composition of claim 20, wherein the recombination protein comprises RecE, RecT, or derivative or variant thereof.
30. The system or composition of claim 29, wherein the RecE, RecT, or derivative or variant thereof, comprises an amino acid sequence with at least 70% identity or similarity to an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-14.
31. The system or composition of any one of claims 1-19, wherein the recombination protein comprises a mitochondrial recombination protein or active portion thereof.
32. The system or composition of any one of claims 1-19, wherein the recombination protein comprises a viral recombination protein or active portion thereof.
33. The system or composition of any one of claims 1-19, wherein the recombination protein comprises a eukaryotic recombination protein or active portion thereof.
34. The system or composition of any one of claims 1-19, wherein the recombination protein comprises an amino acid sequence having at least 95% identity to a recombination protein of SEQ ID NO: 166 to SEQ ID NO: 491.
35. The system or composition of any one of claims 1-19, wherein the recombination protein comprises a recombination protein of Table 6 or derivative or variant or functional portion thereof having at least 70%, at least 80%, at least 90%, or at least 95% identity thereto.
36. The system or composition of any one of claims 1-19, wherein the recombination protein comprises an amino acid sequence having at least 95% identity to SEQ ID NO: 179, SEQ ID NO: 185, SEQ ID NO:205, SEQ ID NO:321, SEQ ID NO:353, SEQ ID NO:359, SEQ ID NO:366, SEQ ID NO:424, or SEQ ID NO:479.
37. The system or composition of any of claims 1-35, wherein the Cas12f protein is catalytically inactive (less than 5% nuclease activity as compared with a wild-type or non-mutated form of the Cas protein) or catalytically dead.
38. The system or composition of claim 36, wherein the Cas12f protein comprises Cas12f1.
39. The system or composition of any one of claims 1-38, wherein the Cas12f and the recombination protein are covalently or non-covalently linked.
40. The system or composition of any one of claims 1-38, wherein the Cas12f and the recombination protein comprise a fusion protein.
41. The system or composition of any one of claims 1-40, wherein (i), (ii), and (iii) further comprise a nucleic acid polymerase; or (iv) comprises nucleic acid molecule(s) encoding or delivering (i), and/or (ii), and/or (iii), and/or a nucleic acid polymerase for expression in vivo in a cell; or the vector(s) of (v) additionally contain nucleic acid molecule(s) encoding a nucleic acid polymerase.
42. The system or composition of claim 41, wherein the nucleic acid polymerase comprises reverse transcriptase activity.
43. The system or composition of claim 41, wherein the nucleic acid polymerase comprises a retron RT.
44. The system or composition of any of claims 1-43, further comprising a donor nucleic acid.
45. The system or composition of any of claims 1-43, wherein the target DNA sequence is a genomic DNA sequence in a host cell.
46. The system or composition of any of claims 1-45, wherein the vector comprises AAV.
47. A eukaryotic cell comprising the system of any one of claims 1-45, or the vector system of claim 46.
48. A method of altering a target genomic DNA sequence in a cell, comprising introducing the system or composition of any one of claims 1-45, or the vector system of claim 46 into a cell comprising a target genomic DNA sequence.
49. The method of claim 48, wherein the cell is a mammalian cell.
50. The method of claim 48 or claim 49, wherein the cell is a human cell.
51. The method of any one of claims 48-50, wherein the cell is a stem cell.
52. The method of any one of claims 48-51, wherein the target genomic DNA sequence encodes a gene product.
53. The method of any one of claims 48-52, wherein the target comprises a genomic sequence of albumin (ALB), AAVS1, HSP90AA1, DYNLT1, ACTB, BCAP31, HIST1H2BK, CLTA, or RAB11A.
54. The method of any one of claims 48-52, wherein the introducing into a cell comprises administering to a subject.
55. The method of claim 54, wherein the subject is a human.
56. The method of claim 54 or 55, wherein the administering comprises in vivo administration.
57. The method of claim 54 or 56, wherein the administering comprises transplantation of ex vivo treated cells comprising the system, composition, or vector.
58. Use of the system or composition of any one of claims 1-45, or the vector system of claim 46, for the alteration of a target DNA sequence in a cell.
PCT/US2023/062431 2022-02-10 2023-02-10 Rna-guided genome recombineering at kilobase scale WO2023154892A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263308759P 2022-02-10 2022-02-10
US63/308,759 2022-02-10

Publications (1)

Publication Number Publication Date
WO2023154892A1 true WO2023154892A1 (en) 2023-08-17

Family

ID=87565155

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/062431 WO2023154892A1 (en) 2022-02-10 2023-02-10 Rna-guided genome recombineering at kilobase scale

Country Status (1)

Country Link
WO (1) WO2023154892A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020088450A1 (en) * 2018-10-29 2020-05-07 中国农业大学 Novel crispr/cas12f enzyme and system
WO2021016135A1 (en) * 2019-07-19 2021-01-28 Synthego Corporation Stabilized crispr complexes
WO2021178432A1 (en) * 2020-03-03 2021-09-10 The Board Of Trustees Of The Leland Stanford Junior University Rna-guided genome recombineering at kilobase scale

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23753721

Country of ref document: EP

Kind code of ref document: A1