WO2024055013A1 - Systems and methods for transposing cargo nucleotide sequences

Info

Publication number: WO2024055013A1
Application number: PCT/US2023/073796
Authority: WO
Inventors: Brian C. Thomas; Christopher Brown; Daniela S.A. Goltsman
Original assignee: Metagenomi, Inc.
Priority date: 2022-09-08
Filing date: 2023-09-08
Publication date: 2024-03-14

Abstract

The present disclosure provides systems and methods for transposing a cargo nucleotide sequence to a target nucleic acid site. These systems and methods may comprise a first double-stranded nucleic acid comprising the cargo nucleotide sequence, wherein the cargo nucleotide sequence is configured to interact with a transposase, and the transposase, wherein said transposase is configured to transpose the cargo nucleotide sequence to the target nucleic acid site.

Description

SYSTEMS AND METHODS FOR TRANSPOSING CARGO NUCLEOTIDE SEQUENCES

CROSS-REFERENCE

[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/404,859 filed September 8, 2022, which is incorporated by reference in its entirety herein.

SUMMARY

[0002] Transposons are mobile genetic elements evolved to execute highly efficient integration of their genes into the genomes of their host cells. The ability of transposons to naturally transfer DNA throughout the genome has been harnessed for a wide variety of research and therapeutic applications including gene editing applications.

[0003] Described herein, in certain embodiments, are engineered transposase systems, comprising: (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence; and (b) a transposase configured to interact with the double-stranded nucleic acid to transpose the cargo nucleotide sequence to a target nucleic acid site; and comprising a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-38. In some embodiments, the cargo nucleotide sequence is flanked by a left-hand transposase recognition sequence and a right-hand transposase recognition sequence recognized by the transposase. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence as a double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the transposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the transposase. In some embodiments, the NLS comprises a sequence according to any one of SEQ ID NOs: 1480-1495.

[0004] Described herein, in certain embodiments, are methods for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide, comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a transposase configured to transpose the cargo nucleotide sequence to a target nucleic acid site; and comprising a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-38.

[0005] Described herein, in certain embodiments, are methods of modifying a target nucleic acid site, comprising contacting the target nucleic acid site with the engineered transposase system described herein. In some embodiments, modifying the target nucleic acid site comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid site. In some embodiments, the target nucleic acid site comprises deoxyribonucleic acid (DNA). In some embodiments, the target nucleic acid site comprises genomic DNA, viral DNA, or bacterial DNA.

[0006] Described herein, in certain embodiments, are methods for transposing a cargo nucleotide sequence into a target nucleic acid site comprising introducing the engineered transposase system described herein to a cell.

[0007] Described herein, in certain embodiments, are cells comprising the engineered transposase system described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell. In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof. In some embodiments, the cell is an engineered cell. In some embodiments, the cell is a stable cell. In some embodiments, the cell is a primary cell. In some embodiments, the cell is a T cell. In some embodiments, the cell is a hematopoietic stem cell (HSC).

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:

[0009] FIGs. 1A-1C depict MG63-961. FIG. 1A depicts the genomic context of a bacterial Tc1/Mariner superfamily transposase. The transposase encodes a predicted DDE superfamily endonuclease domain. FIG. 1B depicts alignment of identified, imperfect terminal inverted repeats. FIG. 1B discloses SEQ ID NOs: 55-56, respectively, in order of appearance. FIG. 1C depicts 3D structure prediction of an MG Tc1/Mariner-like superfamily transposase, which folds best after a eukaryotic Mos1 transposase.

[0010] FIGs. 2A-2B depict multiple sequence alignments (MSA) that identified key transposon features. FIG. 2A depicts MSA of transposase proteins vs. the catalytic domain of Sleeping Beauty, which identified conserved catalytic residues DDE (boxes). FIG. 2A discloses SEQ ID NOs: 57-66, respectively, in order of appearance. FIG. 2B depicts length distribution of the distance between the second aspartate (D) and the glutamate (E) catalytic residues of the canonical DDE transposases motif. The x-axis represents the sequence length between the D and E catalytic residues, inclusive.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

[0011] The Sequence Listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in methods, compositions, and systems according to the disclosure. Below are exemplary descriptions of sequences therein.

[0012] MG63

[0013] SEQ ID NOs: 1-38 show the full-length peptide sequences of MG63 transposition proteins.

DETAILED DESCRIPTION

[0014] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

[0015] The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.), PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R.I. Freshney, ed. (2010)).

[0016] As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.

[0017] The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.

[0018] The term “nucleotide,” as used herein, refers to a base-sugar-phosphate combination. Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides. Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide includes ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them. The term nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrative examples of ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots. Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Fluorescent labels of nucleotides include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2'7'-dimethoxy-4'5-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, IL; Fluorescein-15-dATP, Fluorescein-12-dUTP, Tetramethyl-rhodamine-6-dUTP, IR770-9-dATP, Fluorescein-12-ddUTP, Fluorescein-12-UTP, and Fluorescein-15-2'-dATP available from Boehringer Mannheim, Indianapolis, Ind.; and Chromosome Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP available from Molecular Probes, Eugene, Oreg. The term nucleotide encompasses chemically modified nucleotides. An exemplary chemically-modified nucleotide is biotin-dNTP. Non-limiting examples of biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).

[0019] The terms “polynucleotide,” “oligonucleotide,” and “nucleic acid” are used interchangeably to refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form. Contemplated polynucleotides include a gene or fragment thereof.
Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers. In a polynucleotide when referring to a T, a T means U (Uracil) in RNA and T (Thymine) in DNA. A polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment. The term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer. Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. The sequence of nucleotides may be interrupted by non-nucleotide components.

[0020] The terms “transfection” or “transfected” refer to introduction of a polynucleotide into a cell by non-viral or viral-based methods. The polynucleotides may be gene sequences encoding complete proteins or functional portions thereof. See, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88.

[0021] The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein to refer to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains). The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component. The terms “amino acid” and “amino acids,” as used herein, refer to natural and non-natural amino acids, including, but not limited to, modified amino acids. Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid. The term “amino acid” includes both D-amino acids and L-amino acids.

[0022] As used herein, the term “non-native” refers to a nucleic acid or polypeptide sequence that is non-naturally occurring. Non-native refers to a non-naturally occurring nucleic acid or polypeptide sequence that comprises modifications such as mutations, insertions, or deletions. The term non-native encompasses fusion nucleic acids or polypeptides that encode or exhibit an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) of the nucleic acid or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence includes those linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.

[0023] The term “promoter”, as used herein, refers to the regulatory DNA region which controls transcription or expression of a polynucleotide (e.g., a gene) and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription. Eukaryotic basal promoters typically, though not necessarily, contain a TATA-box and/or a CAAT box.

[0024] The term “expression”, as used herein, refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, the term expression includes splicing of the mRNA in a eukaryotic cell.

[0025] As used herein, “operably linked”, “operable linkage”, “operatively linked”, or grammatical equivalents thereof refer to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element. The effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element. For example, two genetic elements are operably linked if movement of the first element causes an activation of the second element. For instance, a regulatory element, which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.

[0026] A “vector” as used herein, refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which mediates delivery of the polynucleotide to a cell. Examples of vectors include nucleic acid-based vectors (e.g., plasmids and viral vectors) and liposomes. An exemplary nucleic acid-based vector comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.

[0027] As used herein, “expression cassette” and “nucleic acid cassette” are used interchangeably to refer to a component of a vector comprising a combination of nucleic acid sequences or elements (e.g., therapeutic gene, promoter, and a terminator) that are expressed together or are operably linked for expression. The terms encompass an expression cassette including a combination of regulatory elements and a gene or genes to which they are operably linked for expression.

[0028] A “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence. A biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.

[0029] The terms “engineered,” “synthetic,” and “artificial” are used interchangeably herein to refer to an object that has been modified by human intervention. For example, the terms refer to a polynucleotide or polypeptide that is non-naturally occurring. An engineered peptide has, but does not require, low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, VPR and VP64 domains are synthetic transactivation domains. Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property. An “engineered” system comprises at least one engineered component.

[0030] As used herein, the term “transposable element” refers to a DNA sequence that can move from one location in the genome to another (i.e., it can be “transposed”). Transposable elements can be generally divided into two classes. Class I transposable elements, or “retrotransposons”, are transposed via transcription and translation of an RNA intermediate which is subsequently reincorporated at its new location in the genome via reverse transcription (a process mediated by a reverse transcriptase). Class II transposable elements, or “DNA transposons”, are transposed via a complex of single- or double-stranded DNA flanked on either side by a transposase.

[0031] As used herein, the term “Tc1/Mariner” refers to a class and superfamily of DNA transposons. Tc1/Mariner transposons consist of a transposase gene flanked by terminal inverted repeats (“TIRs”) and short target site duplications (“TSDs”). Transposition, which occurs by a “cut and paste” mechanism, is initiated by two transposases’ recognition and binding of the TIR sequences. The transposases join together and promote double-stranded DNA cleavage, after which the DNA-transposase complex inserts the DNA at the target sequence. The Tc1/Mariner superfamily exhibits a characteristic DDE catalytic triad.
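As an illustration of what it means for two transposon ends to be terminal inverted repeats, the following Python sketch compares one end against the reverse complement of the other and counts mismatches, so that imperfect repeats can also be scored. This is only a sketch: the function names and the example sequences are hypothetical, the sequences are not taken from the Sequence Listing, and the comparison is ungapped and assumes ends of equal length.

```python
# Illustrative sketch only. Function names and sequences are hypothetical
# and are not taken from the disclosure or its Sequence Listing.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_mismatches(left_end: str, right_end: str) -> int:
    """Count mismatches between the left end and the reverse complement
    of the right end. Zero means the ends are perfect inverted repeats.
    Assumes an ungapped comparison of equal-length ends."""
    if len(left_end) != len(right_end):
        raise ValueError("ends must have equal length for this comparison")
    rc_right = reverse_complement(right_end).upper()
    return sum(1 for a, b in zip(left_end.upper(), rc_right) if a != b)


# Made-up example: the right end is the reverse complement of a sequence
# that differs from the left end at one position (an imperfect repeat).
left = "CAGTGGTTC"
right = reverse_complement("CAGTGCTTC")
print(tir_mismatches(left, right))
```

A gapped alignment tool would be needed to compare ends of unequal length or ends containing insertions and deletions; this sketch does not attempt that.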

[0032] The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with the Smith-Waterman homology search algorithm parameters with a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and max iterations of 1000; Novafold with default parameters; HMMER hmmalign with default parameters.
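As a rough illustration of the percent identity calculation, the following Python sketch scores two sequences that have already been aligned by one of the algorithms named above. It does not perform the alignment and is not a reimplementation of BLASTP or any other listed tool. The function names, the example sequences, and the choice of denominator (all alignment columns in which at least one sequence has a residue) are assumptions made for illustration; different tools may normalize differently.

```python
# Illustrative sketch only. Names, sequences, and the denominator
# convention are assumptions, not the method of any particular tool.


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns with identical residues.

    Both inputs must be rows of the same alignment (equal length, gaps
    marked with `gap`). Columns where both rows are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair reaches `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up aligned peptides: 5 identical columns out of 7.
query = "MKT-LLD"
reference = "MKTALID"
print(round(percent_identity(query, reference), 2))
print(meets_threshold(query, reference, 75.0))
```

A threshold such as "at least 75% sequence identity" would then be checked against the value computed from the optimal alignment, whichever alignment algorithm and parameters are used to produce it.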

[0033] The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acids residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.

[0034] Included in the current disclosure are variants of any of the enzymes described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of the transposase protein sequences described herein (e.g. MG63 family transposases described herein, or any other family transposase described herein). In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues of the transposase are not disrupted. In some embodiments, a functional variant of any of the proteins described herein lacks substitution of at least one of the conserved or functional residues called out in FIG. 2A. 
In some embodiments, a functional variant of any of the proteins described herein lacks substitution of all of the conserved or functional residues called out in FIG. 2A.

[0035] Also included in the current disclosure are variants of any of the enzymes described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme (e.g., decreased-activity variants). In some embodiments, a decreased-activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues called out in FIG. 2A.

[0036] Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition (December 1993))). The following eight groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Glycine (G);

2) Aspartic acid (D), Glutamic acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);

7) Serine (S), Threonine (T); and

8) Cysteine (C), Methionine (M)
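The eight groups above can be encoded directly as a lookup, as in the following Python sketch. It is offered only as an illustration of how the table might be applied: the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the sketch tests group membership only, without predicting whether a given substitution preserves structure or function. Note that methionine (M) appears in both group 5 and group 8, so the groups are stored as a list of sets rather than as a one-to-one mapping.

```python
# Illustrative sketch only. Names are hypothetical; the groups are the
# eight conservative substitution groups listed above.

CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two one-letter residues differ and share a group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("D", "E"))  # same group (2)
print(is_conservative("M", "C"))  # same group (8)
print(is_conservative("C", "L"))  # no shared group
```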

Overview

[0037] The discovery of new transposable elements with unique functionality and structure may offer the potential to further disrupt deoxyribonucleic acid (DNA) editing technologies, improving speed, specificity, functionality, and ease of use. Relative to the predicted prevalence of transposable elements in microbes and the sheer diversity of microbial species, relatively few functionally characterized transposable elements exist in the literature. This is partly because a huge number of microbial species may not be readily cultivated in laboratory conditions. Metagenomic sequencing from natural environmental niches containing large numbers of microbial species may offer the potential to drastically increase the number of new transposable elements documented and speed the discovery of new oligonucleotide editing functionalities.

[0038] Transposable elements are deoxyribonucleic acid sequences that can change position within a genome, often resulting in the generation or amelioration of mutations. In eukaryotes, a great proportion of the genome, and a large share of the mass of cellular DNA, is attributable to transposable elements. Although transposable elements are “selfish genes” which propagate themselves at the expense of other genes, they have been found to serve various important functions and to be crucial to genome evolution. Based on their mechanism, transposable elements are classified as either Class I “retrotransposons” or Class II “DNA transposons.”

[0039] Class I transposable elements, also referred to as retrotransposons, function according to a two-part “copy and paste” mechanism involving an RNA intermediate. First, the retrotransposon is transcribed. The resulting RNA is subsequently converted back to DNA by reverse transcriptase (generally encoded by the retrotransposon itself), and the reverse transcribed retrotransposon is finally integrated into its new position in the genome by integrase. Retrotransposons are further classified into three orders. Retrotransposons with long terminal repeats (“LTRs”) encode reverse transcriptase and are flanked by long strands of repeating DNA. Retrotransposons with long interspersed nuclear elements (“LINEs”) encode reverse transcriptase, lack LTRs, and are transcribed by RNA polymerase II. Retrotransposons with short interspersed nuclear elements (“SINEs”) are transcribed by RNA polymerase III but lack reverse transcriptase, instead relying on the reverse transcription machinery of other transposable elements (e.g., LINEs).

[0040] Class II transposable elements, also referred to as DNA transposons, function according to mechanisms that do not involve an RNA intermediate. Many DNA transposons display a “cut and paste” mechanism in which transposase binds terminal inverted repeats (“TIRs”) flanking the transposon, cleaves the transposon from the donor region, and inserts it into the target region of the genome. Others, referred to as “helitrons,” display a “rolling circle” mechanism involving a single-stranded DNA intermediate and mediated by an undocumented protein believed to possess HUH endonuclease function and 5’ to 3’ helicase activity. First, a circular strand of DNA is nicked to create two single DNA strands. The protein remains attached to the 5’ phosphate of the nicked strand, leaving the 3’ hydroxyl end of the complementary strand exposed and thus allowing a polymerase to replicate the non-nicked strand. Once replication is complete, the new strand disassociates and is itself replicated along with the original template strand. Still other DNA transposons, “Polintons,” are theorized to undergo a “self-synthesis” mechanism. The transposition is initiated by an integrase’s excision of a single-stranded extra-chromosomal Polinton element, which forms a racket-like structure. The Polinton undergoes replication with DNA polymerase B, and the double-stranded Polinton is inserted into the genome by the integrase. Finally, some DNA transposons, such as those in the IS200/IS605 family, proceed via a “peel and paste” mechanism in which TnpA excises a piece of single-stranded DNA (as a circular “transposon joint”) from the lagging strand template of the donor gene and reinserts it into the replication fork of the target gene.

[0041] While transposable elements have found some use as biological tools, documented transposable elements do not encompass the full range of possible biodiversity and targetability, and may not represent all possible activities. Here, thousands of genomic fragments were mined from numerous metagenomes for transposable elements. The documented diversity of transposable elements may have been expanded and novel systems may have been developed into highly targetable, compact, and precise gene editing agents.

Gene Editing Systems

MG Enzymes

[0042] Described herein, in certain embodiments, are engineered transposase systems, comprising (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence; and (b) a transposase configured to interact with the double-stranded nucleic acid to transpose the cargo nucleotide sequence to a target nucleic acid site. In some embodiments, the transposase is a MG63 transposase (i.e., SEQ ID NOs: 1-38). See FIGs. 1A-1C.

[0043] In some embodiments, the engineered transposase system is discovered through metagenomic sequencing. In some embodiments, the metagenomic sequencing is conducted on samples collected from various environments. In some embodiments, the environment is a human microbiome, an animal microbiome, an environment with high temperatures, an environment with low temperatures, or sediment.

[0044] In some embodiments, the transposase is a MG63 transposase (i.e., SEQ ID NOs: 1-38). In some embodiments, the transposase comprises a sequence having at least about 70% sequence identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase has at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-38. In some embodiments, the transposase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-38.
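As an illustration of how a percent-identity threshold such as those recited above might be checked, the following is a minimal sketch. It assumes the two sequences have already been aligned (gaps written as "-") and assumes one common convention, namely that identity is computed over columns in which neither sequence has a gap; the disclosure does not fix this convention here, and other definitions exist. The function names and the toy peptides are hypothetical and are not taken from SEQ ID NOs: 1-38.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns are excluded from the denominator. Other
    conventions (e.g., dividing by alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold_pct: float) -> bool:
    """True if the aligned pair has at least threshold_pct identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


# Toy aligned peptides (hypothetical placeholders).
reference = "MKT-LLVAG"
candidate = "MKTALLIAG"
print(percent_identity(reference, candidate))     # 7 matches over 8 compared columns
print(meets_threshold(reference, candidate, 75))  # passes a 75% threshold
print(meets_threshold(reference, candidate, 90))  # fails a 90% threshold
```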

[0045] In some embodiments, the transposase is not a Tc1/Mariner transposase. In some embodiments, the transposase has less than about 90%, less than about 85%, less than about 80%, less than about 75%, less than about 70%, less than about 65%, less than about 60%, less than about 55%, less than about 50%, less than about 45%, less than about 40%, less than about 35%, less than about 30%, less than about 25%, less than about 20%, less than about 15%, less than about 10%, or less than about 5% sequence identity to a Tc1/Mariner transposase.

[0046] In some embodiments, the transposase comprises a sequence complementary to a eukaryotic, fungal, plant, mammalian, or human genomic polynucleotide sequence. In some embodiments, the transposase comprises a sequence complementary to a eukaryotic genomic polynucleotide sequence. In some embodiments, the transposase comprises a sequence complementary to a fungal genomic polynucleotide sequence. In some embodiments, the transposase comprises a sequence complementary to a plant genomic polynucleotide sequence. In some embodiments, the transposase comprises a sequence complementary to a mammalian genomic polynucleotide sequence. In some embodiments, the transposase comprises a sequence complementary to a human genomic polynucleotide sequence.

[0047] In some embodiments, the transposase comprises a nuclear localization sequence (NLS). In some embodiments, the NLS is at an N-terminus of the transposase. In some embodiments, the NLS is at a C-terminus of the transposase. In some embodiments, the NLS is at an N-terminus and a C-terminus of the transposase.

[0048] In some embodiments, the NLS comprises a sequence of any one of SEQ ID NOs: 39-54, or a sequence having at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 80% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 85% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 90% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 91% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 92% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 93% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 94% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 95% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 96% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 97% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 98% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having at least about 99% identity to SEQ ID NOs: 39-54. In some embodiments, the NLS comprises a sequence having 100% identity to SEQ ID NOs: 39-54.

Table 1: Example NLS Sequences that can be used with the transposases according to the present disclosure

[0049] In some embodiments, a transposase sequence described herein is determined by a BLASTP, CLUSTALW, MUSCLE, or MAFFT algorithm, or a CLUSTALW algorithm with the Smith-Waterman homology search algorithm parameters. In some embodiments, a transposase sequence is determined by the BLASTP homology search algorithm using parameters of a wordlength (W) of 3, an expectation (E) of 10, and a BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment.
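The BLASTP parameters recited above can be expressed as an NCBI BLAST+ command line. The sketch below only assembles and prints the argument list; it does not run BLAST. The mapping of the recited parameters onto `blastp` flags is an assumption of this sketch (in particular, `-comp_based_stats 2` is taken to correspond to the conditional compositional score matrix adjustment), and the file names, the choice of `-subject` over a database, and the tabular output format are hypothetical choices not found in the disclosure.

```python
import shlex
from typing import List


def blastp_command(query_fasta: str, subject_fasta: str) -> List[str]:
    """Build a blastp argument list using the parameters recited in [0049]."""
    return [
        "blastp",
        "-query", query_fasta,        # hypothetical file of candidate proteins
        "-subject", subject_fasta,    # hypothetical file of reference proteins
        "-word_size", "3",            # wordlength (W) of 3
        "-evalue", "10",              # expectation (E) of 10
        "-matrix", "BLOSUM62",        # BLOSUM62 scoring matrix
        "-gapopen", "11",             # gap existence cost of 11
        "-gapextend", "1",            # gap extension cost of 1
        "-comp_based_stats", "2",     # assumed: conditional compositional adjustment
        "-outfmt", "6",               # tabular output (a choice of this sketch)
    ]


cmd = blastp_command("candidate.faa", "mg63_reference.faa")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` on a machine where BLAST+ is installed; the percent identity column of the tabular output could then be compared against a chosen threshold.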

Cargo Polynucleotides

[0050] Described herein, in certain embodiments, are engineered transposase systems comprising a transposase and a cargo nucleotide sequence. In some embodiments, the transposase is configured to interact with the cargo nucleotide sequence and transpose the cargo nucleotide sequence to a target nucleic acid site. In some embodiments, the cargo nucleotide sequence is flanked by a left-hand transposase recognition sequence recognized by the transposase and a right-hand transposase recognition sequence recognized by the transposase.

[0051] In some embodiments, the cargo nucleotide sequence is double stranded. In some embodiments, the cargo nucleotide sequence is double stranded DNA. In some embodiments, the cargo nucleotide sequence is single stranded. In some embodiments, the cargo nucleotide sequence is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.
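The arrangement in which a cargo nucleotide sequence is flanked by left-hand and right-hand transposase recognition sequences can be pictured with a small data structure. This is a sketch for illustration only: the class name, field names, and the short placeholder sequences are hypothetical, and no actual recognition sequence of an MG63 transposase is shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DonorCassette:
    """Cargo flanked by left-hand and right-hand recognition sequences."""
    left_end: str   # left-hand transposase recognition sequence (placeholder)
    cargo: str      # cargo nucleotide sequence (placeholder)
    right_end: str  # right-hand transposase recognition sequence (placeholder)

    def sequence(self) -> str:
        """Top-strand sequence of the full donor cassette."""
        return self.left_end + self.cargo + self.right_end

    def cargo_span(self) -> Tuple[int, int]:
        """Zero-based, end-exclusive coordinates of the cargo in the cassette."""
        start = len(self.left_end)
        return start, start + len(self.cargo)


# Placeholder sequences chosen only to make the example concrete.
donor = DonorCassette(left_end="ACGTTGCA", cargo="ATGGCCTAA", right_end="TGCAACGT")
start, end = donor.cargo_span()
print(donor.sequence())
print(donor.sequence()[start:end])  # recovers the cargo
```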

[0052] In some embodiments, the target nucleic acid is double stranded. In some embodiments, the target nucleic acid is double stranded DNA. In some embodiments, the target nucleic acid is single stranded. In some embodiments, the target nucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

MG Systems

[0053] Described herein, in certain embodiments, are engineered transposase systems, comprising (a) a double-stranded nucleic acid comprising a cargo nucleotide sequence; and (b) a transposase configured to interact with the double-stranded nucleic acid to transpose the cargo nucleotide sequence to a target nucleic acid site.

[0054] In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide. In some embodiments, the engineered transposase system comprises (a) a transposase comprising a sequence having 100% identity to any one of SEQ ID NOs: 1-38, and (b) a cargo nucleotide.

Delivery and Vectors

[0055] Disclosed herein, in some embodiments, are nucleic acid sequences encoding an engineered transposase system disclosed herein.

[0056] In some embodiments, the nucleic acid encoding the engineered transposase system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA. In some embodiments, the nucleic acid encoding the engineered transposase system is an RNA, for example a mRNA.

[0057] In some embodiments, the nucleic acid encoding the engineered transposase system is delivered by a nucleic acid-based vector. In some embodiments, the nucleic acid-based vector is a plasmid (e.g., circular DNA molecules that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosomes (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosomes (PAC), phagemid, phage derivative, bacmid, or virus. In some embodiments, the nucleic acid-based vector is selected from the list consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20, pSF-Tac, pRI 101-AN DNA, pCambia2301, pTYB21, pKLAC2, pAc5.1/V5-His A, and pDEST8.

[0058] In some embodiments, the nucleic acid-based vector comprises a promoter. In some embodiments, the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof. In some embodiments, the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof. In some embodiments, the promoter is a U6 promoter. In some embodiments, the promoter is a CAG promoter.

[0059] In some embodiments, the nucleic acid-based vector is a virus. In some embodiments, the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the virus is an alphavirus. In some embodiments, the virus is a parvovirus. In some embodiments, the virus is an adenovirus. In some embodiments, the virus is an AAV. In some embodiments, the virus is a baculovirus. In some embodiments, the virus is a Dengue virus. In some embodiments, the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.

[0060] In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV type 1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.

[0061] In some embodiments, the virus is AAV1 or a derivative thereof. In some embodiments, the virus is AAV2 or a derivative thereof. In some embodiments, the virus is AAV3 or a derivative thereof. In some embodiments, the virus is AAV4 or a derivative thereof. In some embodiments, the virus is AAV5 or a derivative thereof. In some embodiments, the virus is AAV6 or a derivative thereof. In some embodiments, the virus is AAV7 or a derivative thereof. In some embodiments, the virus is AAV8 or a derivative thereof. In some embodiments, the virus is AAV9 or a derivative thereof. In some embodiments, the virus is AAV10 or a derivative thereof. In some embodiments, the virus is AAV11 or a derivative thereof. In some embodiments, the virus is AAV12 or a derivative thereof. In some embodiments, the virus is AAV13 or a derivative thereof. In some embodiments, the virus is AAV14 or a derivative thereof. In some embodiments, the virus is AAV15 or a derivative thereof. In some embodiments, the virus is AAV16 or a derivative thereof. In some embodiments, the virus is AAV-rh8 or a derivative thereof. In some embodiments, the virus is AAV-rh10 or a derivative thereof. In some embodiments, the virus is AAV-rh20 or a derivative thereof. In some embodiments, the virus is AAV-rh39 or a derivative thereof. In some embodiments, the virus is AAV-rh74 or a derivative thereof. In some embodiments, the virus is AAV-rhM4-1 or a derivative thereof. In some embodiments, the virus is AAV-hu37 or a derivative thereof. In some embodiments, the virus is AAV-Anc80 or a derivative thereof. In some embodiments, the virus is AAV-Anc80L65 or a derivative thereof. In some embodiments, the virus is AAV-7m8 or a derivative thereof. In some embodiments, the virus is AAV-PHP-B or a derivative thereof. In some embodiments, the virus is AAV-PHP-EB or a derivative thereof. In some embodiments, the virus is AAV-2.5 or a derivative thereof. In some embodiments, the virus is AAV-2tYF or a derivative thereof. In some embodiments, the virus is AAV-3B or a derivative thereof. In some embodiments, the virus is AAV-LK03 or a derivative thereof. In some embodiments, the virus is AAV-HSC1 or a derivative thereof. In some embodiments, the virus is AAV-HSC2 or a derivative thereof. In some embodiments, the virus is AAV-HSC3 or a derivative thereof. In some embodiments, the virus is AAV-HSC4 or a derivative thereof. In some embodiments, the virus is AAV-HSC5 or a derivative thereof. In some embodiments, the virus is AAV-HSC6 or a derivative thereof. In some embodiments, the virus is AAV-HSC7 or a derivative thereof. In some embodiments, the virus is AAV-HSC8 or a derivative thereof. In some embodiments, the virus is AAV-HSC9 or a derivative thereof. In some embodiments, the virus is AAV-HSC10 or a derivative thereof. In some embodiments, the virus is AAV-HSC11 or a derivative thereof. In some embodiments, the virus is AAV-HSC12 or a derivative thereof. In some embodiments, the virus is AAV-HSC13 or a derivative thereof. In some embodiments, the virus is AAV-HSC14 or a derivative thereof. In some embodiments, the virus is AAV-HSC15 or a derivative thereof. In some embodiments, the virus is AAV-TT or a derivative thereof. In some embodiments, the virus is AAV-DJ/8 or a derivative thereof. In some embodiments, the virus is AAV-Myo or a derivative thereof. In some embodiments, the virus is AAV-NP40 or a derivative thereof. In some embodiments, the virus is AAV-NP59 or a derivative thereof. In some embodiments, the virus is AAV-NP22 or a derivative thereof. In some embodiments, the virus is AAV-NP66 or a derivative thereof. In some embodiments, the virus is AAV-HSC16 or a derivative thereof.

[0062] In some embodiments, the virus is HSV-1 or a derivative thereof. In some embodiments, the virus is HSV-2 or a derivative thereof. In some embodiments, the virus is VZV or a derivative thereof. In some embodiments, the virus is EBV or a derivative thereof. In some embodiments, the virus is CMV or a derivative thereof. In some embodiments, the virus is HHV-6 or a derivative thereof. In some embodiments, the virus is HHV-7 or a derivative thereof. In some embodiments, the virus is HHV-8 or a derivative thereof.

[0063] In some embodiments, the nucleic acid encoding the engineered transposase system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system). In some embodiments, the non-viral delivery system is a liposome. In some embodiments, the nucleic acid is associated with a lipid. The nucleic acid associated with a lipid, in some embodiments, is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid. In some embodiments, the nucleic acid is comprised in a lipid nanoparticle (LNP).

[0064] In some embodiments, the engineered transposase system is introduced into the cell in any suitable way, either stably or transiently. In some embodiments, an engineered transposase system is transfected into the cell. In some embodiments, the cell is transduced or transfected with a nucleic acid construct that encodes an engineered transposase system. For example, a cell is transduced (e.g., with a virus encoding an engineered transposase system), or transfected (e.g., with a plasmid encoding an engineered transposase system) with a nucleic acid that encodes an engineered transposase system, or the translated engineered transposase system. In some embodiments, the transduction is a stable or transient transduction. In some embodiments, a plasmid expressing an engineered transposase system is introduced into cells through electroporation, transient (e.g., lipofection) and stable genome integration (e.g., piggybac) and viral transduction (for example lentivirus or AAV) or other methods known to those of skill in the art. In some embodiments, the gene editing system is introduced into the cell as one or more polypeptides. In some embodiments, delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.

[0065] Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™ and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024. In some embodiments, the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). In some embodiments, the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.

[0066] Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817.

Cells

[0067] Described herein, in certain embodiments, is a cell comprising the engineered transposase system described herein.

[0068] In some embodiments, the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell, baby hamster kidney (BHK), human embryo kidney (HEK), mouse myeloma (NS0), or human retinal cells), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, a MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, a N2a cell, or a SY5Y cell), an insect cell (e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, a Drosophila melanogaster cell, a S2 cell, or a Heliothis virescens cell), a yeast cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), a plant cell (e.g., a parenchyma cell, a collenchyma cell, or a sclerenchyma cell), a fungal cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), or a prokaryotic cell (e.g., a E. coli cell, a streptococcus bacterium cell, a streptomyces soil bacteria cell, or an archaea cell). In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell.

[0069] In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or derivative thereof.

Methods of Use

[0070] Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered transposase system disclosed herein. In some embodiments, the engineered transposase system comprises a transposase and cargo nucleotide sequence. In some embodiments, the target nucleic acid is double stranded. In some embodiments, the target nucleic acid is double stranded DNA. In some embodiments, the target nucleic acid is single stranded.

[0071] In some embodiments, the methods are used to introduce a modification in the genome of a cell. In some embodiments, the modification is an insertion, deletion, or mutation. In some embodiments, the methods are used to introduce site-directed insertions, deletions, and/or mutations in the genome of a cell (for example an insertion and a mutation). In some embodiments, the methods are used in combination with a nucleic acid template to facilitate site-directed insertions into the genome of a cell.

[0072] In some embodiments, the cell is a human cell. In some embodiments, the cell genome or a vector comprised in the cell is modified. In some embodiments, the cell genome is modified ex vivo. In some embodiments, the cell genome is modified in vivo.

[0073] Systems of the present disclosure may be used for various applications, such as, for example, nucleic acid editing (e.g., gene editing) or binding to a nucleic acid molecule (e.g., sequence-specific binding). Such systems may be used, for example, for addressing (e.g., removing or replacing) a genetically inherited mutation that may cause a disease in a subject, inactivating a gene in order to ascertain its function in a cell, as a diagnostic tool to detect disease-causing genetic elements (e.g., via cleavage of reverse-transcribed viral RNA or an amplified DNA sequence encoding a disease-causing mutation), as deactivated enzymes in combination with a probe to target and detect a specific nucleotide sequence (e.g., sequence encoding antibiotic resistance in bacteria), to render viruses inactive or incapable of infecting host cells by targeting viral genomes, to add genes or amend metabolic pathways to engineer organisms to produce valuable small molecules, macromolecules, or secondary metabolites, to establish a gene drive element for evolutionary selection, or to detect cell perturbations by foreign small molecules and nucleotides as a biosensor.

[0074] Described herein, in certain embodiments, are methods for modifying a target nucleic acid comprising providing an engineered transposase system. In some embodiments, the present disclosure provides a method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the method comprises contacting the double-stranded deoxyribonucleic acid polynucleotide with a transposase. In some embodiments, the transposase induces a single-stranded break or a double-stranded break at or proximal to the target nucleic acid site. In some embodiments, the transposase induces a staggered single-stranded break within or 5’ to the target site.

[0075] In some embodiments, the double-stranded deoxyribonucleic acid polynucleotide is a eukaryotic, plant, fungal, mammalian, rodent, or human double-stranded deoxyribonucleic acid polynucleotide.

[0076] In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence as single-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence as double-stranded deoxyribonucleic acid polynucleotide. In some embodiments, the transposase is configured to transpose the cargo nucleotide sequence via a ribonucleic acid polynucleotide intermediate. In some embodiments, the cargo nucleotide sequence is flanked by a 3’ untranslated region (UTR) and a 5’ untranslated region (UTR).

[0077] In some embodiments, the present disclosure provides a method of modifying a target nucleic acid site. In some embodiments, the method comprises delivering to the target nucleic acid site the engineered transposase system described herein. In some embodiments, the engineered transposase system is configured such that upon binding of the engineered transposase system to the target nucleic acid site, the engineered transposase system modifies the target nucleic acid site.

[0078] In some embodiments, modifying the target nucleic acid site comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid site. In some embodiments, the target nucleic acid site comprises deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In some embodiments, the target nucleic acid comprises genomic DNA, viral DNA, viral RNA, or bacterial DNA. In some embodiments, the target nucleic acid site is in vitro. In some embodiments, the target nucleic acid site is within a cell. In some embodiments, the cell is a prokaryotic cell, a bacterial cell, a eukaryotic cell, a fungal cell, a plant cell, an animal cell, a mammalian cell, a rodent cell, a primate cell, or a human cell. In some embodiments, the cell is a primary cell. In some embodiments, the primary cell is a T cell. In some embodiments, the primary cell is a hematopoietic stem cell (HSC). In some embodiments, the cell is a human cell. In some embodiments, the cell is genome edited ex vivo. In some embodiments, the cell is genome edited in vivo.

[0079] In some embodiments, delivery of the engineered transposase system to the target nucleic acid site comprises delivering the nucleic acid described herein or the vector described herein. In some embodiments, delivery of engineered transposase system to the target nucleic acid site comprises delivering a nucleic acid comprising an open reading frame encoding the transposase. In some embodiments, the nucleic acid comprises a promoter. In some embodiments, the open reading frame encoding the transposase is operably linked to the promoter.

[0080] In some embodiments, delivery of the engineered transposase system to the target nucleic acid site comprises delivering a capped mRNA containing the open reading frame encoding the transposase. In some embodiments, delivery of the engineered transposase system to the target nucleic acid site comprises delivering a translated polypeptide.

[0081] In some embodiments, the transposase does not induce a break at or proximal to the target nucleic acid site.

[0082] In some embodiments, the transposition activity is measured in vitro by introducing the transposase to cells comprising the target nucleic acid site and detecting transposition of the target nucleic acid site in the cells. In some embodiments, the composition comprises 20 pmoles or less of the transposase. In some embodiments, the composition comprises 1 pmol or less of the transposase.

[0083] Further described herein, in certain embodiments, are methods of manufacturing a transposase. In some embodiments, the method comprises cultivating a host cell with the engineered transposase system described herein.

[0084] In some embodiments, the host cell is a bacterial cell. In some embodiments, the bacterial cell is Bifidobacterium longum, Bifidobacterium lactis, Bifidobacterium animalis, Bifidobacterium breve, Bifidobacterium infantis, Bifidobacterium adolescentis, Lactobacillus acidophilus, Lactobacillus casei, Lactobacillus paracasei, Lactobacillus salivarius, Lactobacillus reuteri, Lactobacillus rhamnosus, Lactobacillus johnsonii, Lactobacillus plantarum, Lactobacillus fermentum, Lactococcus lactis, Streptococcus thermophilus, Lactococcus lactis, Lactococcus diacetylactis, Lactococcus cremoris, Lactobacillus bulgaricus, Lactobacillus helveticus, Lactobacillus delbrueckii, or Escherichia coli. In some embodiments, the host cell is an E. coli cell. In some embodiments, the E. coli cell is a λDE3 lysogen or a BL21(DE3) strain. In some embodiments, the E. coli cell has an ompT lon genotype.

[0085] In some embodiments, the host cell is an E. coli cell. In some embodiments, the E. coli cell is a λDE3 lysogen or the E. coli cell is a BL21(DE3) strain. In some embodiments, the E. coli cell has an ompT lon genotype.

[0086] In some embodiments, the open reading frame is operably linked to a promoter sequence. In some embodiments, the promoter is selected from the group consisting of a mini promoter, an inducible promoter, a constitutive promoter, and derivatives thereof. In some embodiments, the promoter is selected from the group consisting of CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBAD, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, and derivatives thereof.

[0087] In some embodiments, the open reading frame is operably linked to a T7 promoter sequence, a T7-lac promoter sequence, a lac promoter sequence, a tac promoter sequence, a trc promoter sequence, a ParaBAD promoter sequence, a PrhaBAD promoter sequence, a T5 promoter sequence, a cspA promoter sequence, an araPBAD promoter, a strong leftward promoter from phage lambda (pL promoter), or any combination thereof.

[0088] In some embodiments, the open reading frame comprises a sequence encoding an affinity tag linked in-frame to a sequence encoding the transposase. In some embodiments, the affinity tag is an immobilized metal affinity chromatography (IMAC) tag. In some embodiments, the IMAC tag is a polyhistidine tag. In some embodiments, the affinity tag is a myc tag, a human influenza hemagglutinin (HA) tag, a maltose binding protein (MBP) tag, a glutathione S-transferase (GST) tag, a streptavidin tag, a FLAG tag, or any combination thereof. In some embodiments, the affinity tag is linked in-frame to the sequence encoding the transposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site is a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof.

[0089] In some embodiments, the open reading frame is codon-optimized for expression in the host cell. In some embodiments, the open reading frame is provided on a vector. In some embodiments, the open reading frame is integrated into a genome of the host cell.

[0090] In some embodiments, the present disclosure provides a culture comprising a host cell described herein in compatible liquid medium.

[0091] In some embodiments, the present disclosure provides a method of producing a transposase, comprising cultivating a host cell described herein in compatible growth medium. In some embodiments, the method further comprises inducing expression of the transposase by addition of an additional chemical agent or an increased amount of a nutrient. In some embodiments, the additional chemical agent or increased amount of a nutrient comprises isopropyl β-D-1-thiogalactopyranoside (IPTG) or additional amounts of lactose. In some embodiments, the method further comprises isolating the host cell after the cultivation and lysing the host cell to produce a protein extract. In some embodiments, the method further comprises subjecting the protein extract to IMAC, or ion-affinity chromatography. In some embodiments, the open reading frame comprises a sequence encoding an IMAC affinity tag linked in-frame to a sequence encoding the transposase. In some embodiments, the IMAC affinity tag is linked in-frame to the sequence encoding the transposase via a linker sequence encoding a protease cleavage site. In some embodiments, the protease cleavage site comprises a tobacco etch virus (TEV) protease cleavage site, a PreScission® protease cleavage site, a Thrombin cleavage site, a Factor Xa cleavage site, an enterokinase cleavage site, or any combination thereof. In some embodiments, the method further comprises cleaving the IMAC affinity tag by contacting a protease corresponding to the protease cleavage site to the transposase. In some embodiments, the method further comprises performing subtractive IMAC affinity chromatography to remove the affinity tag from a composition comprising the transposase.

Kits

[0092] In some embodiments, this disclosure provides kits comprising one or more nucleic acid constructs encoding the various components of the transposase or gene editing system described herein, e.g., comprising a nucleotide sequence encoding the components of the transposase or gene editing system capable of modifying a target DNA sequence. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the gene editing system components.

[0093] In some embodiments, any of the transposase or gene editing systems disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications. A kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.

[0094] The kit may be designed to facilitate use of the methods described herein by researchers and can take many forms. Each of the compositions of the kit, where applicable, may be provided in liquid form (e.g., in solution), or in solid form, (e.g., a dry powder). In certain cases, some of the compositions may be constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit. As used herein, "instructions" can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions, in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.

EXAMPLES

Example 1 - A method of metagenomic analysis for new proteins

[0095] Metagenomic samples were collected from sediment, soil, and animals. Deoxyribonucleic acid (DNA) was extracted and sequenced. Additional raw sequence data from public sources included animal microbiomes, sediment, soil, hot springs, hydrothermal vents, marine, peat bogs, permafrost, and sewage sequences. Metagenomic sequence data was searched using Hidden Markov Models generated based on known transposase protein sequences to identify new transposases. Transposase proteins identified by the search were aligned to known proteins to identify potential active sites. This metagenomic workflow resulted in the delineation of the MG63 family described herein.

Example 2 - Discovery of MG63 Family of Transposases

[0096] Analysis of the data from the metagenomic analysis of Example 1 revealed a new cluster of previously undescribed putative transposase systems comprising 1 family (MG63). The corresponding protein sequences for these new enzymes and their exemplary subdomains are presented as SEQ ID NOs: 1-38.
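The family members of SEQ ID NOs: 1-38 are referred to elsewhere in this disclosure by a percent sequence identity threshold (at least 75%). The following is a minimal sketch of one way such a percentage could be computed from a pairwise alignment that has already been produced by an alignment tool. The function name, the gap convention (a gap opposite a residue counts as a mismatch), and the toy sequences are illustrative assumptions, not the method used in this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both strings are the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy (hypothetical) aligned fragments, not taken from any SEQ ID NO.
identity = percent_identity("MKT-LLA", "MKTALLV")
meets_threshold = identity >= 75.0
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention should be stated alongside any reported percentage.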

Example 3 - Integrase in vitro activity (prophetic)

[0097] Integrase activity is preferentially conducted via expression in an E. coli lysate-based expression system. The required components for in vitro testing are three plasmids: an expression plasmid with the transposon gene(s) under a T7 promoter, a target plasmid, and a donor plasmid which contains the required left end (LE) and right end (RE) DNA sequences for transposition around a cargo gene (e.g. Tet resistance gene). The lysate-based expression products, target DNA, and donor DNA are incubated to allow for transposition to occur. Transposition is detected via PCR. In addition, the transposition product will be tagmented with Tn5 and sequenced via NGS to determine the insertion sites on a population of transposition events. Alternatively, the in vitro transposition products can be transformed into E. coli under antibiotic (e.g. Tet) selection, where growth requires the transposition cargo to be stably inserted into a plasmid. Either single colonies or a population of E. coli can be sequenced to determine the insertion sites.

[0098] Integration efficiency can be measured via ddPCR or qPCR of the experimental output of target DNA with integrated cargo, normalized to the amount of unmodified target DNA also measured via ddPCR.
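The normalization described above can be sketched as a simple ratio. This is a minimal illustration, assuming that the ddPCR or qPCR readout has already been converted to copy numbers (or concentrations in the same units) for the integrated and unmodified target; the function name and the example values are hypothetical.

```python
def integration_efficiency(integrated_copies: float,
                           unmodified_copies: float) -> float:
    """Integrated-cargo target copies normalized to unmodified target copies.

    Assumption: both inputs are in the same units (e.g. copies per
    microliter from the same ddPCR run).
    """
    if integrated_copies < 0:
        raise ValueError("integrated_copies must be non-negative")
    if unmodified_copies <= 0:
        raise ValueError("unmodified_copies must be positive")
    return integrated_copies / unmodified_copies


# Hypothetical readout: 50 integrated copies/uL, 1000 unmodified copies/uL.
efficiency = integration_efficiency(50.0, 1000.0)
```

If a fraction of total target is preferred, the denominator would instead be the sum of integrated and unmodified copies; that is a different convention from the one stated in the paragraph above.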

[0099] This assay may also be conducted with purified protein components rather than from lysate-based expression. In this case, the proteins are expressed in a protease-deficient E. coli B strain under a T7 inducible promoter, the cells are lysed using sonication, and the His-tagged protein of interest is purified. Purity is determined using densitometry of the protein bands resolved on SDS-PAGE and Coomassie-stained acrylamide gels. The protein is desalted in storage buffer composed of 50 mM Tris-HCl, 300 mM NaCl, 1 mM TCEP, 5% glycerol; pH 7.5 (or other buffers as determined for maximum stability) and stored at -80°C. After purification the transposon gene(s) are added to the target DNA and donor DNA as described above in a reaction buffer, for example 26 mM HEPES pH 7.5, 4.2 mM TRIS pH 8, 50 µg/mL BSA, 2 mM ATP, 2.1 mM DTT, 0.05 mM EDTA, 0.2 mM MgCl2, 28 mM NaCl, 21 mM KCl, 1.35% glycerol, (final pH 7.5) supplemented with 15 mM MgOAc2.
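As an illustration of how the example reaction buffer might be assembled, the sketch below applies the standard dilution relation C1·V1 = C2·V2 to a few of the listed components. The final concentrations are taken from the paragraph above; the stock concentrations and the 20 µL reaction volume are hypothetical values chosen only for the example.

```python
def stock_volume_ul(final_mm: float, stock_mm: float, reaction_ul: float) -> float:
    """Volume of stock (uL) needed to reach final_mm in reaction_ul (C1*V1 = C2*V2)."""
    if stock_mm <= 0 or reaction_ul <= 0:
        raise ValueError("stock concentration and reaction volume must be positive")
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_mm * reaction_ul / stock_mm


# Final concentrations (mM) from the example reaction buffer.
FINAL_MM = {"HEPES pH 7.5": 26.0, "NaCl": 28.0, "KCl": 21.0, "ATP": 2.0}

# Hypothetical stock concentrations (mM) and reaction volume (uL).
STOCK_MM = {"HEPES pH 7.5": 1000.0, "NaCl": 5000.0, "KCl": 1000.0, "ATP": 100.0}
REACTION_UL = 20.0

volumes = {name: stock_volume_ul(FINAL_MM[name], STOCK_MM[name], REACTION_UL)
           for name in FINAL_MM}
```

Sub-microliter volumes such as these are usually handled by preparing a concentrated master mix rather than pipetting each stock directly.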

Example 4 - Transposon end verification via gel shift (prophetic)

[00100] The transposon ends are tested for transposase binding via an electrophoretic mobility shift assay (EMSA). In this case, the potential LE or RE is synthesized as a DNA fragment (100-500 bp) and end-labeled with FAM via PCR with FAM-labeled primers. The transposase protein is synthesized in an in vitro transcription/translation system. After synthesis, 1 µL of protein is added to 50 nM of the labeled RE or LE in a 10 µL reaction in binding buffer (e.g. 20 mM HEPES pH 7.5, 2.5 mM Tris pH 7.5, 10 mM NaCl, 0.0625 mM EDTA, 5 mM TCEP, 0.005% BSA, 1 µg/mL poly(dI-dC), and 5% glycerol). The binding reaction is incubated at 30°C for 40 minutes, then 2 µL of 6X loading buffer (60 mM KCl, 10 mM Tris pH 7.6, 50% glycerol) is added. The binding reaction is separated on a 5% TBE gel and visualized. Shifts of the LE or RE in the presence of transposase protein can be attributed to successful binding and are indicative of transposase activity. This assay can also be performed with transposase truncations or mutations, as well as using E. coli extract or purified protein.
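The quantities in the binding reaction above can be checked with two short unit conversions. The sketch below is illustrative only; the function names are hypothetical, and the numbers are those stated in the paragraph (50 nM labeled end in 10 µL, then 2 µL of 6X loading buffer).

```python
def amount_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount of substance in pmol for a concentration in nM and a volume in uL.

    nM * uL = fmol, so divide by 1000 to obtain pmol.
    """
    return conc_nm * volume_ul / 1000.0


def final_fold(stock_fold: float, added_ul: float, sample_ul: float) -> float:
    """Final fold-concentration of a loading buffer after adding it to a sample."""
    return stock_fold * added_ul / (added_ul + sample_ul)


labeled_end_pmol = amount_pmol(50.0, 10.0)   # labeled LE or RE per reaction
loading_fold = final_fold(6.0, 2.0, 10.0)    # 6X buffer after addition
```

With these inputs, the reaction contains 0.5 pmol of labeled DNA and the loading buffer ends up at 1X in the 12 µL loaded sample.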

Example 5 - Integrase activity in E. coli (prophetic)

[00101] Engineered E. coli strains are transformed with a plasmid expressing the transposon genes and a plasmid containing a temperature-sensitive origin of replication with a selectable marker flanked by left end (LE) and right end (RE) transposon motifs for integration. Transformants induced for expression of these genes are then screened for transfer of the marker to a genomic target by selection at restrictive temperature for plasmid replication and the marker integration in the genome is confirmed by PCR.

[00102] Integrations are screened using an unbiased approach. In brief, purified gDNA is tagmented with Tn5, and DNA of interest is then PCR amplified using primers specific to the Tn5 tagmentation and the selectable marker. The amplicons are then prepared for NGS sequencing. For analysis, the resulting sequences are trimmed of the transposon sequences, and the flanking sequences are mapped to the genome to determine insertion positions and insertion rates.
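The trimming and mapping step described above can be sketched as follows. This is a toy illustration under strong simplifying assumptions: reads are error-free, the transposon end is found by exact string match, and the flank is located in the reference by exact match rather than with a read aligner. All sequences, names, and the minimum flank length are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def flank_after_transposon_end(read: str, transposon_end: str,
                               min_flank: int = 10) -> Optional[str]:
    """Return the sequence downstream of the transposon end, or None.

    None is returned if the end is absent or the remaining flank is
    shorter than min_flank bases.
    """
    idx = read.find(transposon_end)
    if idx == -1:
        return None
    flank = read[idx + len(transposon_end):]
    return flank if len(flank) >= min_flank else None


def map_insertions(reads: Iterable[str], transposon_end: str,
                   genome: str, min_flank: int = 10) -> Counter:
    """Count reads per insertion position (0-based start of flank in genome)."""
    sites: Counter = Counter()
    for read in reads:
        flank = flank_after_transposon_end(read, transposon_end, min_flank)
        if flank is None:
            continue
        pos = genome.find(flank)  # exact match stands in for a real aligner
        if pos != -1:
            sites[pos] += 1
    return sites


# Hypothetical toy reference and transposon end sequence.
genome = "ACGTACGTTTGACCAGGCTAAGCTTGCA"
end_seq = "TGTTGGAAC"
reads = [end_seq + genome[10:22], end_seq + genome[10:22],
         "NNNN" + end_seq + genome[3:15], "ACGT"]
sites = map_insertions(reads, end_seq, genome)
```

Dividing each site count by the total number of mapped reads gives a relative insertion rate per position; a production analysis would also need to handle both strands, sequencing errors, and PCR duplicates.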

[00103] Alternatively, a polA mutant E. coli strain, MM383, which produces a DNA polymerase I (Pol I) that is defective at 42°C, is used to detect integration as described previously. Resistance to a selectable marker after growth at 42°C indicates incorporation of donor DNA into the chromosome. The pUC19 plasmid without donor is used as a control following growth for 24 hours at 42°C without antibiotic selection.

[00104] E. coli strains that successfully grow in selection media are presumed to have integrated the donor DNA encoding the cargo resistance gene. Colonies growing in antibiotic selection plates are genotyped for cargo presence and NGS of whole genome sequence is performed.

Example 6 - Integrase activity in mammalian cells (prophetic)

[00105] To show targeting and cleavage activity in mammalian cells, each of the transposon proteins is purified with 2 NLS peptides on either terminus of the protein sequence. A plasmid containing a selectable neomycin resistance marker (NeoR) or a fluorescent marker flanked by the left end (LE) and right end (RE) motifs is synthesized. Cells are then transfected with the plasmid, recovered for 4-6 hours, and subsequently electroporated with transposon proteins. Antibiotic resistance integration into the genome is quantified by G418-resistant colony counts, and positive transposition by the fluorescent marker is assayed by fluorescence activated cell cytometry. 72 hours after cotransfection, genomic DNA is extracted and used for the preparation of an NGS-library. Integration frequency is assayed by Tn5 tagmentation.

Table 2 - Protein and nucleic acid sequences referred to herein

[00106] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

WHAT IS CLAIMED IS:

1. An engineered transposase system, comprising:

(a) a double-stranded nucleic acid and comprising a cargo nucleotide sequence; and

(b) a transposase configured to interact with the double-stranded nucleic acid to transpose the cargo nucleotide sequence to a target nucleic acid site; and comprising a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-38.

2. The engineered transposase system of claim 1, wherein the cargo nucleotide sequence is flanked by a left-hand transposase recognition sequence and a right-hand transposase recognition sequence recognized by the transposase.

3. The engineered transposase system of any one of claims 1-2, wherein the transposase is configured to transpose the cargo nucleotide sequence as double-stranded deoxyribonucleic acid polynucleotide.

4. The engineered transposase system of any one of claims 1-3, wherein the transposase comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the transposase.

5. The engineered transposase system of claim 4, wherein the NLS comprises a sequence according to any one of SEQ ID NOs: 1480-1495.

6. A method for binding, nicking, cleaving, marking, modifying, or transposing a double-stranded deoxyribonucleic acid polynucleotide, comprising contacting the double-stranded deoxyribonucleic acid polynucleotide with a transposase configured to transpose the cargo nucleotide sequence to a target nucleic acid site; and comprising a sequence having at least 75% sequence identity to any one of SEQ ID NOs: 1-38.

7. A method of modifying a target nucleic acid site, comprising contacting the target nucleic acid site with the engineered transposase system of any one of claims 1-5.

8. The method of claim 7, wherein modifying the target nucleic acid site comprises binding, nicking, cleaving, marking, modifying, or transposing the target nucleic acid site.

9. The method of any one of claims 7-8, wherein the target nucleic acid site comprises deoxyribonucleic acid (DNA).

10. The method of claim 9, wherein the target nucleic acid site comprises genomic DNA, viral DNA, or bacterial DNA.

11. A method for transposing a cargo nucleotide sequence into a target nucleic acid site comprising introducing the engineered transposase system of any one of claims 1-5 to a cell.

12. A cell comprising the engineered transposase system of any one of claims 1-5.

13. The cell of claim 12, wherein the cell is a eukaryotic cell.

14. The cell of claim 12, wherein the cell is a mammalian cell.

15. The cell of claim 12, wherein the cell is an immortalized cell.

16. The cell of claim 12, wherein the cell is an insect cell.

17. The cell of claim 12, wherein the cell is a yeast cell.

18. The cell of claim 12, wherein the cell is a plant cell.

19. The cell of claim 12, wherein the cell is a fungal cell.

20. The cell of claim 12, wherein the cell is a prokaryotic cell.

21. The cell of claim 12, wherein the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, primary cell, or a derivative thereof.

22. The cell of claim 12, wherein the cell is an engineered cell.

23. The cell of claim 12, wherein the cell is a stable cell.

24. The cell of claim 12, wherein the cell is a primary cell.

25. The cell of claim 12, wherein the cell is a T cell.

26. The cell of claim 12, wherein the cell is a hematopoietic stem cell (HSC).