WO2023196647A1 - Computer-implemented systems and methods for targeting microhomology-mediated excision - Google Patents

Computer-implemented systems and methods for targeting microhomology-mediated excision

Info

Publication number
WO2023196647A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
region
cleaved
microhomology
template nucleic
Application number
PCT/US2023/017959
Other languages
French (fr)
Inventor
Thomas James CRADICK
Original Assignee
Excision Biotherapeutics Inc
Application filed by Excision Biotherapeutics Inc
Publication of WO2023196647A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/1131 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against viruses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • Gene editing systems have the potential to treat an array of diseases; however, one of the major challenges facing their implementation is improving the rate of safe, high-fidelity, and efficient editing.
  • the development of safe and efficient gene editing systems and methods is important for the broad implementation of genome-editing technologies in the treatment of disease.
  • the challenges of developing safe and effective delivery of genome-editing technologies differ from those facing classical gene replacement therapy, which requires long-term transgene expression and does not involve base or genome editing.
  • the traditional goal of genome-editing technologies has focused on the delivery of one or a limited number of doses of programmable nucleases, in a transient manner, with the goal of achieving sufficient effective editing efficiencies to yield clinical benefits. Due to potential limitations with current gene editing technologies and methods, sufficient effective editing efficiencies may not yield clinical benefits or such clinical benefits may be hard to achieve.
  • MMEJ-mediated excision of template nucleic acids e.g., viral sequences
  • MMEJ-mediated deletions are considered to be limited to indels at single cut sites having smaller distances (e.g., about 15 nucleotides or fewer) between microhomologous sequences.
  • MMEJ prediction algorithms generally reduce MMEJ predictions as a function of the distance between microhomologous sequences (e.g., reducing predicted MMEJ frequencies as the distance between microhomologous sequences increases).
  • compositions and methods described herein are advantageous for achieving MMEJ-mediated excision between two cut sites and/or over large distances (e.g., >100 base pairs, >1,000 base pairs, etc.) separating a first and a second cleaved region.
  • methods described herein are further advantageous for modelling and/or predicting a range of repair options with multiple cuts (e.g., MMEJ-mediated excision, MMEJ-mediated excision using inversions, etc.).
  • the described methods better allow for the selection of target sites predicted to provide more of the desired outcome, such as improved MMEJ-mediated excision efficiencies and/or MMEJ-mediated inversions.
  • compositions and methods described herein are additionally useful for excising viral nucleic acid molecules (e.g., to inactivate a virus).
  • using microhomology scoring to model the competing repair possibilities, including individual cut site repair, inversions and/or excision modelling can allow choice of target sites that maximize the desired outcome in viral excision.
  • choosing target sites (e.g., cleavable regions or guide RNAs) by identifying and/or characterizing high microhomology can provide excision or inversion levels that are substantially higher than expected from non-homology repair.
  • compositions comprising:
  • a first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence;
  • the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
  • the first target site and the second target site are different.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
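The two thresholds above (a minimum run length and a minimum GC content) can be checked with a few lines of code. The following is a minimal sketch, not the applicant's implementation: the function names `gc_fraction` and `qualifies_as_microhomology` are hypothetical, and applying both thresholds together is an illustrative choice, since the source lists them as separate embodiments.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def qualifies_as_microhomology(seq: str, min_len: int = 3, min_gc: float = 0.5) -> bool:
    """True when seq has at least min_len bases and GC content >= min_gc.

    Assumption: the defaults (3 bases, 50% GC) mirror the thresholds quoted
    in the text; combining them in one test is a choice made for this sketch.
    """
    return len(seq) >= min_len and gc_fraction(seq) >= min_gc
```

Raising `min_len` to 5, 10, 15, or 20 corresponds to the alternative lengths listed in the text.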
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
  • the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • MMEJ microhomology-mediated end joining
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • the deletion removes one or more genes within the template nucleic acid molecule.
  • the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the deletion comprises an inversion.
  • the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000) base pairs.
  • the template nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • compositions comprising:
  • a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) a second CRISPR-associated endonuclease system comprising (i) a second guide ribonucleic acid
  • the first gRNA and the second gRNA are different.
  • the microhomology comprises three or more complementary nucleotides having a GC (guanine or cytosine) content greater than or equal to 50%.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • the deletion removes one or more genes within the template nucleic acid molecule.
  • the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the deletion comprises an inversion.
  • the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000) base pairs.
  • the template nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence;
  • the first target site and the second target site are different.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs). In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the deletion comprises an inversion.
  • the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000) base pairs.
  • the template nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • the first target site and the second target site are different.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • the viral nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the viral nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the viral nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • Provided herein are computer-implemented methods of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
  • a template nucleic acid sequence e.g., a viral genome sequence
  • positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
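As an illustration of the positional data described above, the sketch below extracts the two flanks of a cleavable region from a template string. It is a hedged example only: `cleavable_region` is a hypothetical name, the 0-based cut index convention is an assumption, and the 10-base flank is taken from the "about 10 base pairs" wording.

```python
from typing import Tuple


def cleavable_region(template: str, cut_index: int, flank: int = 10) -> Tuple[str, str]:
    """Return the (five_prime, three_prime) flanks around a cut site.

    Assumption: cut_index is the 0-based index of the first base 3' of the
    cut, so the cut falls between template[cut_index - 1] and
    template[cut_index]. Flanks are truncated at the template ends.
    """
    if not 0 <= cut_index <= len(template):
        raise ValueError("cut_index lies outside the template")
    five_prime = template[max(0, cut_index - flank):cut_index]
    three_prime = template[cut_index:cut_index + flank]
    return five_prime, three_prime
```

Returning the flanks separately keeps the cut position explicit, which matters when a microhomology must lie at or near the terminus of a cleaved region.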
  • the microhomology data is a function of:
  • (i) total nucleobase complementarity of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
  • (ii) the length (e.g., number of contiguous nucleobases) of complementary nucleobases of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
  • orientation and/or strand location (e.g., for identifying inversion outcomes);
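One way to compute microhomology data of this kind for a pair of cleavable regions is sketched below. This is an assumption-laden illustration, not the scoring used in the application: microhomology is treated as the longest identical run on the same strand of both windows (so that resected single strands would be complementary), the inversion orientation is approximated by comparing against the reverse complement of the second window, and the names `longest_shared_run` and `microhomology_data` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> str:
    """Longest contiguous substring present in both a and b (first found wins ties)."""
    a, b = a.upper(), b.upper()
    best = ""
    # prev[j] holds the length of the common suffix ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


def microhomology_data(region_a: str, region_b: str) -> dict:
    """Longest shared run in direct and inverted orientation (illustrative only)."""
    direct = longest_shared_run(region_a, region_b)
    inverted = longest_shared_run(region_a, reverse_complement(region_b))
    return {
        "direct_run": direct,
        "direct_length": len(direct),
        "inverted_run": inverted,
        "inverted_length": len(inverted),
    }
```

For example, `microhomology_data("ACCTGGCATA", "TGTGGCAGAC")` reports the direct run `TGGCA`; a fuller model would also account for total complementarity and the distance of each run from its cut site.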
  • the template nucleic acid sequence comprises a consensus sequence
  • the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
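A consensus template of the kind described above can be sketched as a per-column majority vote. The example assumes the input sequences have already been aligned to equal length by an external tool (the alignment step itself is not shown), and the function name `consensus` and the gap symbol `-` are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def consensus(aligned: List[str], gap: str = "-") -> str:
    """Majority-vote consensus over pre-aligned, equal-length sequences.

    Assumption: columns whose most common symbol is the gap character are
    dropped from the consensus. Ties go to the symbol seen first in the column.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        base, _ = counts.most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)
```

With the inputs `ACGTA`, `ACCTG`, and `TCGTG`, the result is `ACGTG`, which matches none of the three inputs, consistent with the statement that the template can differ from each input sequence.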
  • the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
  • the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
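Positional entropy over a set of aligned sequences can be sketched with the standard Shannon formula, H = -Σ p·log2(p), applied column by column. The function name `positional_entropy` is hypothetical, and treating every symbol (including any gap character) as a distinct state is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import List


def positional_entropy(aligned: List[str]) -> List[float]:
    """Shannon entropy (in bits) of each column of pre-aligned sequences.

    A fully conserved position scores 0.0; a position split evenly across
    A, C, G, and T scores 2.0.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        total = sum(counts.values())
        entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(entropy)
    return entropies
```

Low-entropy positions are conserved across the input sequences, so cut sites placed there would be expected to apply to more variants.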
  • the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site,
  • the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • identifying a first cleavable region and a second cleavable region having microhomology by: (i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and (ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
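The two-step identification above (generate microhomology data for many cleavable regions, then pick a pair) can be sketched as a pairwise scan over candidate cut sites. Everything named here is hypothetical: `rank_cut_site_pairs`, the 10-base flank, the 3-base minimum run, and the 50-base minimum separation are assumptions drawn loosely from the thresholds quoted elsewhere in the text, and ranking by run length alone is a simplification.

```python
from itertools import combinations
from typing import List, Tuple


def window(template: str, cut: int, flank: int = 10) -> str:
    """Bases within `flank` positions on either side of a 0-based cut index."""
    return template[max(0, cut - flank):cut + flank]


def longest_shared_run(a: str, b: str) -> str:
    """Longest contiguous substring found in both a and b ('' if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
        for j in range(len(b) - k + 1):
            if b[j:j + k] in kmers:
                return b[j:j + k]
    return ""


def rank_cut_site_pairs(
    template: str,
    cut_sites: List[int],
    flank: int = 10,
    min_len: int = 3,
    min_distance: int = 50,
) -> List[Tuple[int, int, str]]:
    """Return (first_cut, second_cut, shared_run) tuples, longest run first."""
    template = template.upper()
    results = []
    for first, second in combinations(sorted(cut_sites), 2):
        if second - first < min_distance:
            continue  # pair too close to count as a two-cut excision here
        run = longest_shared_run(
            window(template, first, flank), window(template, second, flank)
        )
        if len(run) >= min_len:
            results.append((first, second, run))
    results.sort(key=lambda item: len(item[2]), reverse=True)
    return results
```

In practice the ranking would likely be combined with the additional data listed earlier (entropy, PAM proximity, off-target homology) before a guide pair is chosen.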
  • FIG. 1A, FIG. 1B, and FIG. 1C show a schematic representation of generating cleaved regions having microhomology.
  • FIG. 2A and FIG. 2B show a schematic representation of microhomologous sequences on or within cleaved regions.
  • FIG. 3A and FIG. 3B show a schematic representation of microhomologous sequences on or within cleaved regions.
  • FIG. 4A, FIG. 4B, and FIG. 4C show representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CasX and encompassing approximately 3,500 base pairs.
  • FIG. 5 shows representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CasX and encompassing approximately 4,500 base pairs.
  • FIG. 6 shows representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using SluCas and encompassing approximately 3,400 base pairs.
  • FIG. 7A and FIG. 7B show representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CpeCas and encompassing approximately 3,500-4,500 base pairs.
  • FIG. 8 shows representations and data of single MMEJ-mediated deletion at a single cleaved region of an HBV nucleic acid using CpeCas.
  • FIG. 9 shows an exemplary flowchart of a method for selecting a guide RNA.
  • FIG. 10 schematically illustrates a computer system that is programmed or otherwise configured to implement methods provided herein.
  • a gene editing system generally refers to and includes a system comprising one or more nucleic acid-modifying enzymes capable of binding a nucleic acid molecule (e.g., a template nucleic acid molecule).
  • gene editing systems are generally used for modifying the nucleic acid of a target gene and/or for modulating the expression of a target gene (e.g., as measured by mRNA expression, protein expression, or protein function).
  • the one or more nucleic acid-binding domains or components are associated with the one or more nucleic acid-modifying enzymes or components, such that the one or more nucleic acid-binding domains target the one or more nucleic acid-modifying enzymes or components to a specific nucleic acid site (e.g., a specific sequence).
  • the gene editing systems described herein are useful for targeting two or more sites on a template nucleic acid (e.g., a viral genome) to excise a portion (e.g., full or partial) through microhomology-mediated repair pathways (e.g., MMEJ).
  • Gene editing systems generally include, but are not limited to, zinc finger nucleases, transcription activator-like effector nucleases (TALENs), clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems, meganuclease systems, and recombinase-based systems.
  • the gene editing systems described herein are useful for targeting at least a first site (e.g., region) on a template nucleic acid and a second site on the template nucleic acid.
  • Target or target site e.g., as used in target region, or target gene
  • the gene editing system comprises a nucleic acid encoding a gene editing system configured to target multiple target sites (e.g., a first target site and a second target site) on a template nucleic acid molecule.
  • gene editing systems targeting a first and a second target site provide advantages when combined with targeting regions having microhomology: generating a first and a second double-stranded break comprising microhomology promotes efficient excision (e.g., microhomology-mediated excision) of the region between the first site and second site, whereas a single cut may be readily repaired by host cell machinery.
  • the gene editing system targets a first target site and a second target site on a template nucleic acid.
  • targeting a first target site and a second target site on a template nucleic acid provides the benefit of promoting excision of the region between the first site and second site, wherein this effect is achieved by a gene editing system targeting the first and second target site.
  • the gene editing system can be a CRISPR-Cas system (e.g., comprising a first gRNA and second gRNA, or a single gRNA having a target site repeated within a template sequence), a meganuclease system (e.g., comprising a first meganuclease and a second meganuclease, or a single meganuclease having a target site repeated within a template sequence), a TALEN system (e.g., comprising a first TALEN and a second TALEN, or a single TALEN having a target site repeated within a template sequence), or a zinc finger nuclease system (e.g., comprising a first ZFN and a second ZFN, or a single ZFN having a target site repeated within a template sequence).
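  • As a minimal sketch of how a target site repeated within a template sequence might be located computationally (an illustration only, not a method described herein), the Python below scans both strands of a template for exact copies of a target sequence. The function names `reverse_complement` and `find_target_sites` are hypothetical, and exact matching ignores PAM requirements and mismatch tolerance, which a practical guide-selection tool would likely need to consider.

```python
# Hypothetical helper names; exact-match search only (no PAM or mismatch handling).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target_sites(template, target):
    """Return sorted (start_index, strand) tuples for every exact copy of
    `target` in `template`. Indices are 0-based on the top strand."""
    template, target = template.upper(), target.upper()
    queries = [("+", target)]
    rc = reverse_complement(target)
    if rc != target:  # a palindromic site would otherwise be reported twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        start = template.find(query)
        while start != -1:
            hits.append((start, strand))
            start = template.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)
```

  • Under these assumptions, a target that yields two or more hits could in principle be cut at multiple sites by a single gRNA, meganuclease, TALEN, or ZFN.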
  • a vector comprising a nucleic acid encoding a gene editing system that is configured to target (e.g., cut) multiple sites (identical sites or different sites) within a template nucleic acid is useful for excision of a region between the multiple sites.
  • microhomology-mediated excision provides advantages over targeting a single site, targeting sites without homology or introducing a single cut in a template sequence.
  • the gene editing system is a CRISPR-Cas system.
  • CRISPR system refers to and includes elements involved in the expression of or directing the activity of a CRISPR-associated (Cas) endonuclease, including guide RNA sequences and components thereof, such as a tracr (trans-activating CRISPR) sequence (e.g. tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a spacer sequence (also referred to as a guide sequence), or other sequences and transcripts from a CRISPR locus.
  • a CRISPR system is characterized by such elements that promote the formation of a CRISPR complex at the site of a target sequence.
  • a target sequence refers to a sequence to which a spacer sequence is designed to hybridize, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
  • a CRISPR system is characterized by such elements that promote the formation of a CRISPR complex that is directed to the target sequence or sequences.
  • the gene editing system is a meganuclease system.
  • Meganuclease generally refers to and includes an endonuclease that binds double-stranded DNA at a recognition sequence that is greater than about 12 to about 40 base pairs.
  • a meganuclease can be an endonuclease that is derived from I-CreI, and can refer to an engineered variant of I-CreI that has been modified relative to natural I-CreI with respect to, for example, DNA-binding specificity, DNA cleavage activity, DNA-binding affinity, or dimerization properties. Methods for producing such modified variants of I-CreI are known.
  • a meganuclease as used herein binds to double-stranded DNA as a heterodimer or as a single-chain meganuclease in which a pair of DNA-binding domains are joined into a single polypeptide using a peptide linker.
  • a single-chain meganuclease refers to and includes a polypeptide comprising a pair of meganuclease subunits joined by a linker.
  • a single-chain meganuclease has the organization: N-terminal subunit - Linker - C-terminal subunit.
  • the two meganuclease subunits are generally non-identical in amino acid sequence and recognize non-identical DNA sequences.
  • meganucleases can refer to a dimeric or single-chain meganuclease.
  • meganucleases can be divided into families based on sequence and structure motifs: LAGLIDADG, GIY-YIG, HNH, His-Cys box and PD-(D/E)XK.
  • a range of other meganuclease systems are possible, such as those derived from naturally occurring enzymes or through rational design.
  • crystal structures illustrate the mode of sequence specificity and cleavage mechanism for the meganucleases (e.g., LAGLIDADG meganucleases) where (i) specificity contacts arise from the burial of extended β-strands into the major groove of the DNA, with the DNA-binding elements having a pitch and contour mimicking the helical twist of the DNA; (ii) cleavage, which generates 4-nt 3'-OH overhangs, occurs across the minor groove, wherein the scissile phosphate bonds are brought closer to the protein catalytic core by a distortion of the DNA in the central “4-base” region; (iii) cleavage occurs via a proposed two-metal mechanism; and (iv) finally, additional affinity and/or specificity contacts can arise from adapted scaffolds or modifications in regions outside the core a/
  • the gene editing system is a transcription activator-like effector nuclease (TALEN) system.
  • TALENs generally refer to and include a polypeptide comprising a transcription activator-like effector domain (TALE) for DNA binding and a FokI nuclease domain.
  • TALENs can be rapidly designed and assembled with flexible targeting sequences with potentially high potency and specificity.
  • the TALE domain has a central DNA-binding domain composed of 13-28 repeat monomers of 33-34 amino acids. The amino acids of each monomer are highly conserved, except for hypervariable amino acid residues at positions 12 and 13. The two variable amino acids are called repeat-variable di-residues (RVDs).
  • the amino acid pairs NI, NG, HD, and NN of RVDs preferentially recognize adenine, thymine, cytosine, and guanine/adenine, respectively, and modulation of RVDs can recognize consecutive DNA bases.
  • Other natural, selected or rationally designed RVDs can be used, including NK and NH to recognize guanine.
  • the relationship between amino acid sequence and DNA recognition allows for the engineering of specific DNA binding domains by selecting a combination of repeat segments containing the appropriate RVDs.
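  • The RVD-to-base relationship described above can be sketched as a simple lookup. The Python below is an illustrative assumption rather than a disclosed design tool: it maps each base of a target sequence to one of the RVDs named in the text (NI, NG, HD, NN, with NK or NH as alternatives for guanine). The names `design_rvds` and `rvds_recognize` are hypothetical, and the sketch does not enforce the 13-28 repeat range or any other design constraint.

```python
# RVD code as stated in the text: NI->A, NG->T, HD->C, NN->G (NN also tolerates A).
RVD_FOR_BASE = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}
BASES_FOR_RVD = {
    "NI": {"A"},
    "NG": {"T"},
    "HD": {"C"},
    "NN": {"G", "A"},
    "NK": {"G"},
    "NH": {"G"},
}


def design_rvds(target, guanine_rvd="NN"):
    """Return one RVD per base of `target`, read 5' to 3'.
    `guanine_rvd` selects which RVD is used for guanine (NN, NK, or NH)."""
    if guanine_rvd not in ("NN", "NK", "NH"):
        raise ValueError("guanine_rvd must be 'NN', 'NK', or 'NH'")
    rvds = []
    for base in target.upper():
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        rvds.append(guanine_rvd if base == "G" else RVD_FOR_BASE[base])
    return rvds


def rvds_recognize(rvds, dna):
    """True if every RVD in the array is listed as recognizing the aligned base."""
    dna = dna.upper()
    return len(rvds) == len(dna) and all(
        base in BASES_FOR_RVD.get(rvd, set()) for rvd, base in zip(rvds, dna)
    )
```

  • For example, under this simplified code an NN-containing array designed for `ATCG` would also be predicted to recognize `ATCA`, whereas an NK- or NH-containing array would not.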
  • the transcription activator-like effector (TALE) DNA binding domain can be fused to a functional domain, such as a recombinase, a nuclease, a transposase or a helicase, thereby conferring sequence specificity to the functional domain.
  • the gene editing system is a zinc finger nuclease (ZFN) system.
  • Zinc finger nucleases generally refer to and include a chimeric polypeptide molecule comprising at least one zinc finger DNA binding domain, and generally three to five domains effectively linked to at least one nuclease capable of cleaving DNA. In many embodiments cleavage requires dimerization of two zinc finger nuclease domains.
  • zinc finger nucleases are generally capable of directing targeted genetic recombination or targeted mutation in a host cell by causing a double-stranded break at a target locus.
  • zinc finger nucleases include a DNA-binding domain and a DNA-cleavage domain, wherein the DNA binding domain is comprised of at least one zinc finger and is operatively linked to a DNA-cleavage domain.
  • the zinc finger DNA-binding domain is at the N-terminus of the chimeric protein molecule and the DNA-cleavage domain is located at the C-terminus of the molecule.
  • cleavage requires dimerization of two zinc finger nuclease domains to cleave the intervening sequence.
  • targetable gene editing systems e.g., such as those described herein
  • a template nucleic acid sequence e.g., a viral genome
  • a first target sequence generating a first cleaved region
  • a second target sequence generating a second cleaved region
  • the second cleaved region comprising microhomology or a region (e.g., a sequence within the cleaved region) of microhomology to the first cleaved region.
  • Microhomology-mediated end joining (MMEJ) generally refers to and includes a mechanism for repairing double-stranded breaks (DSBs) in a template nucleic acid molecule (e.g., within a genome), which relies on exposed microhomologous sequences (i.e., sequences having microhomology) flanking the broken junction to repair DSBs in a Ku- and ligase IV-independent manner.
  • MMEJ generally involves five steps for repairing a double stranded break: resection of the DSB ends (generally 5’ to 3’ resection), annealing of region/sequences having microhomology, removal of heterologous flaps, fill-in synthesis (i.e., polymerase extension), and ligation. Additional pathways for repair of the cleaved or cleavable regions described herein
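  • The net sequence outcome of these steps for a two-cut excision can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the method disclosed herein: both cuts are treated as blunt, the microhomologous sequences are represented as identical direct repeats on the top strand (one upstream of the first cut, one downstream of the second), and the pair requiring the least flap trimming is chosen. The function name `mmej_excision` and the `mh_len` and `window` parameters are hypothetical.

```python
def mmej_excision(template, cut1, cut2, mh_len=3, window=10):
    """Predict a two-cut MMEJ product on the top strand.

    cut1/cut2 are 0-based positions between bases (a cut at k separates
    template[:k] from template[k:]). Returns (product, microhomology),
    or None if no shared sequence of length mh_len lies within `window`
    bases of both cuts.
    """
    if not 0 < cut1 < cut2 < len(template):
        raise ValueError("require 0 < cut1 < cut2 < len(template)")
    left = template[:cut1]    # retained end upstream of the first cut
    right = template[cut2:]   # retained end downstream of the second cut
    best = None
    for i in range(max(0, cut1 - window), cut1 - mh_len + 1):
        mh = left[i:i + mh_len]
        j = right[:window].find(mh)
        if j == -1:
            continue
        # Bases between each microhomology copy and its cut are trimmed as flaps.
        flap = (cut1 - (i + mh_len)) + j
        if best is None or flap < best[0]:
            best = (flap, i, j, mh)
    if best is None:
        return None
    _, i, j, mh = best
    # Annealing, flap removal, fill-in, and ligation leave one microhomology copy.
    return left[:i] + right[j:], mh
```

  • In this model the region between the two cuts, any flaps, and one copy of the microhomology are lost, which is why the product retains a single copy of the shared sequence at the junction.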
  • microhomology can be determined by various known methods, such as Microhomology-Predictor (Bae, S., Kweon, J., Kim, H. et al. Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 11, 705-706 (2014)) and MENTHU (Robust Activation of Microhomology-mediated End Joining for Precision Gene Editing Applications. Ata H, Ekstrom TL, Martinez-Galvez G, Mann CM, Dvornikov AV, Schaefbauer KJ, Ma AC, Dobbs D, Clark KJ, Ekker SC. PLOS Genetics 14(9): e1007652), inDelphi (Max W.
  • FORCAST a fully integrated and open source pipeline to design Cas-mediated mutagenesis experiments
  • Lindel and MENdel (Gabriel Martinez-Galvez, Parnal Joshi, Iddo Friedberg, Armando Manduca, Stephen C Ekker, Deploying MMEJ using MENdel in precision gene editing applications for gene therapy and functional genomics, Nucleic Acids Research, Volume 49, Issue 1, 11 January 2021), each of which is herein incorporated by reference for the application of determining and/or identifying microhomology.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
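  • The length and GC-content criterion above can be checked directly. The following Python is a minimal sketch; the function names `gc_fraction` and `meets_microhomology_criterion` are hypothetical, and the defaults (3 nucleotides, 50% GC) are taken from the sentence above.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def meets_microhomology_criterion(mh, min_len=3, min_gc=0.5):
    """True if `mh` is at least `min_len` nucleotides long and has a
    GC content greater than or equal to `min_gc`."""
    return len(mh) >= min_len and gc_fraction(mh) >= min_gc
```

  • For instance, `GCA` (2 of 3 bases are G or C) would pass, while `ATA` and the two-nucleotide `GC` would not.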
  • a template nucleic acid molecule FIG. 1A - 100; FIG. 1B - 100
  • a first sequence FIG. 1A - 110; FIG. 1B - 110
  • a second sequence FIG. 1A - 112; FIG.
  • targetable gene editing systems e.g., CRISPR-Cas systems, meganuclease systems, TALEN systems, or ZFN systems
  • FIG. 1A - 120 target site
  • FIG. 1B - 122 target site and 132 (cut/cleavage site) located within (FIG. 1A - 122) or proximal (FIG. 1B - 122) to the second sequence, thereby generating a first cleaved region (FIG. 1A - 140; FIG. 1B - 140) and a second cleaved region (FIG. 1A - 142; FIG. 1B - 142) comprising the sequences having microhomology (FIG. 1A - 110 and 112; FIG.
  • microhomology-based repair mechanisms e.g., MMEJ
  • join facilitate excision through, in certain instances, resection of the DSB ends (generally 5’ to 3’ resection), annealing of region/sequences having microhomology, removal of heterologous flaps, fill-in synthesis (i.e., polymerase extension), and ligation.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
  • sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A or 1B - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A or 1B - 142), wherein the first sequence and the second sequence are complementary (e.g., full, substantially, or partially) to one another.
  • first sequence and/or second sequence comprise about 2 nucleotides to about 20 nucleotides.
  • first sequence and/or second sequence comprise about 2 nucleotides to about 3 nucleotides, about 2 nucleotides to about 4 nucleotides, about 2 nucleotides to about 5 nucleotides, about 2 nucleotides to about 6 nucleotides, about 2 nucleotides to about 7 nucleotides, about 2 nucleotides to about 8 nucleotides, about 2 nucleotides to about 9 nucleotides, about 2 nucleotides to about 10 nucleotides, about 2 nucleotides to about 12 nucleotides, about 2 nucleotides to about 15 nucleotides, about 2 nucleotides to about 20 nucleotides, about 3 nucleotides to about 4 nucleotides, about 3 nucleotides to about 5 nucleotides, about 3 nucleotides to about 6 nucleotides, about 3 nucleotides to about 7 nucleotides, about 3
  • first sequence and/or second sequence comprise about 2 nucleotides, about 3 nucleotides, about 4 nucleotides, about 5 nucleotides, about 6 nucleotides, about 7 nucleotides, about 8 nucleotides, about 9 nucleotides, about 10 nucleotides, about 12 nucleotides, about 15 nucleotides, or about 20 nucleotides.
  • first sequence and/or second sequence comprise at least about 2 nucleotides, about 3 nucleotides, about 4 nucleotides, about 5 nucleotides, about 6 nucleotides, about 7 nucleotides, about 8 nucleotides, about 9 nucleotides, about 10 nucleotides, about 12 nucleotides, or about 15 nucleotides.
  • first and second sequences having microhomology comprise about 2 complementary nucleotides to about 15 complementary nucleotides. In certain embodiments, first and second sequences having microhomology comprise about 2 complementary nucleotides to about 3 complementary nucleotides, about 2 complementary nucleotides to about 4 complementary nucleotides, about 2 complementary nucleotides to about 5 complementary nucleotides, about 2 complementary nucleotides to about 6 complementary nucleotides, about 2 complementary nucleotides to about 7 complementary nucleotides, about 2 complementary nucleotides to about 8 complementary nucleotides, about 2 complementary nucleotides to about 9 complementary nucleotides, about 2 complementary nucleotides to about 10 complementary nucleotides, about 2 complementary nucleotides to about 12 complementary nucleotides, about 2 complementary nucleotides to about 15 complementary nucleotides, about 3 complementary nucleotides to about 4 complementary nucleotides, about 3 complementary nucle
  • first and second sequences having microhomology comprise about 2 complementary nucleotides, about 3 complementary nucleotides, about 4 complementary nucleotides, about 5 complementary nucleotides, about 6 complementary nucleotides, about 7 complementary nucleotides, about 8 complementary nucleotides, about 9 complementary nucleotides, about 10 complementary nucleotides, about 12 complementary nucleotides, or about 15 complementary nucleotides.
  • first and second sequences having microhomology comprise at least about 2 complementary nucleotides, about 3 complementary nucleotides, about 4 complementary nucleotides, about 5 complementary nucleotides, about 6 complementary nucleotides, about 7 complementary nucleotides, about 8 complementary nucleotides, about 9 complementary nucleotides, about 10 complementary nucleotides, or about 12 complementary nucleotides.
  • sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A or 1B - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A or 1B - 142), wherein the first sequence and the second sequence are capable of annealing to one another (e.g., under physiological conditions).
  • first and second sequences having microhomology comprise about 2 nucleotides capable of annealing to about 20 nucleotides capable of annealing.
  • first and second sequences having microhomology comprise about 2 nucleotides capable of annealing to about 3 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 4 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 5 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 6 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 12
  • first and second sequences having microhomology comprise about 2 nucleotides capable of annealing, about 3 nucleotides capable of annealing, about 4 nucleotides capable of annealing, about 5 nucleotides capable of annealing, about 6 nucleotides capable of annealing, about 7 nucleotides capable of annealing, about 8 nucleotides capable of annealing, about 9 nucleotides capable of annealing, about 10 nucleotides capable of annealing, about 12 nucleotides capable of annealing, about 15 nucleotides capable of annealing, or about 20 nucleotides capable of annealing.
  • first and second sequences having microhomology comprise at least about 2 nucleotides capable of annealing, about 3 nucleotides capable of annealing, about 4 nucleotides capable of annealing, about 5 nucleotides capable of annealing, about 6 nucleotides capable of annealing, about 7 nucleotides capable of annealing, about 8 nucleotides capable of annealing, about 9 nucleotides capable of annealing, about 10 nucleotides capable of annealing, about 12 nucleotides capable of annealing, or about 15 nucleotides capable of annealing.
  • sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A - 142), wherein the first sequence and/or the second sequence are located at the terminal end of a first cleaved region and/or second cleaved region, respectively.
  • the terminal end is the 3’ end of a cleaved region (FIG. 2A - 211 and 213).
  • sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG.
  • first sequence and/or the second sequence are proximally located (e.g., within 2 nucleobase positions) at the terminal end of a first cleaved region and/or second cleaved region, respectively.
  • the terminal end is the 3’ end of a cleaved region (FIG. 2B - 211 and 213).
  • the first sequence and/or the second sequence are located within 1 to 25 nucleobase positions from the terminal end of a first cleaved region and/or second cleaved region, respectively.
  • the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end to about 25 nucleobase positions from the terminal end. In certain embodiments, the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end to about 2 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 3 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 4 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 5 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 10 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 15 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 20 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 25 nucleobase positions from the terminal end, about 2 nucleo
  • the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end, about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, about 20 nucleobase positions from the terminal end, or about 25 nucleobase positions from the terminal end.
  • the first sequence and/or the second sequence are located within at least about 1 nucleobase position from the terminal end, about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, or about 20 nucleobase positions from the terminal end.
  • the first sequence and/or the second sequence are located within at most about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, about 20 nucleobase positions from the terminal end, or about 25 nucleobase positions from the terminal end.
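  • The distance of a microhomologous sequence from the terminal end of a cleaved region can be computed as sketched below. This Python is illustrative only: it assumes the cleaved end is given 5' to 3' with its last base at the 3' terminal end, counts the nucleobase positions between the microhomology and that end, and uses the hypothetical names `distance_from_terminal_end` and `is_proximal`.

```python
def distance_from_terminal_end(cleaved_end, mh):
    """Number of nucleobase positions between the 3'-most copy of `mh`
    and the 3' terminal end of `cleaved_end` (0 means `mh` sits at the
    terminal end). Returns None if `mh` is absent."""
    cleaved_end, mh = cleaved_end.upper(), mh.upper()
    idx = cleaved_end.rfind(mh)
    if idx == -1:
        return None
    return len(cleaved_end) - (idx + len(mh))


def is_proximal(cleaved_end, mh, max_positions=2):
    """True if `mh` lies within `max_positions` of the terminal end."""
    distance = distance_from_terminal_end(cleaved_end, mh)
    return distance is not None and distance <= max_positions
```

  • The default of 2 positions mirrors the "within 2 nucleobase positions" example above; the larger thresholds listed (up to about 25 positions) can be passed as `max_positions`.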
  • the first and second sequences having microhomology are located in different genes. In some embodiments, the first and second sequences having microhomology are located in coding regions of different genes. In certain embodiments, the first and second sequences having microhomology are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by about 50 base pairs to about 8,000 base pairs.
  • the first and second sequences having microhomology are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about
  • the first and second sequences having microhomology are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
  • the first and second sequences having microhomology are located on different template nucleic acids (e.g., an episomal genome and an integrated genome).
  • compositions comprising gene editing systems targeting regions having microhomology
  • compositions comprising: (a) a first gene editing system, wherein: the first gene editing system is configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) a second gene editing system, wherein: the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
  • the first target site and the second target site are different.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • FIGS. 3A-3B show an example of a cleaved region (340 and 342) on a template nucleic acid molecule, wherein a first and a second cleaved region comprise a first (310 and 311) and a second sequence (312 and 313) having microhomology.
  • a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 10 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about
  • a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
  • a cleaved or cleavable region comprises at least about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site.
  • a cleaved or cleavable region comprises at most about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
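  • One way to picture a cleaved region defined as a fixed number of base pairs 5' and 3' of a cut site is as a window on the template, which can then be compared with a second window to find shared sequence. The Python below is a rough sketch under assumptions: cuts are blunt and given as 0-based positions between bases, only the top strand is compared, and shared sequence is found as the longest common substring. The names `cleaved_region` and `longest_shared_sequence` are hypothetical.

```python
def cleaved_region(template, cut, flank=10):
    """Return the window spanning `flank` bases 5' and 3' of a cut site,
    clipped at the ends of the template."""
    start = max(0, cut - flank)
    end = min(len(template), cut + flank)
    return template[start:end]


def longest_shared_sequence(region1, region2):
    """Longest contiguous sequence present in both regions (empty string
    if none), found with a standard dynamic-programming scan."""
    best = ""
    prev = [0] * (len(region2) + 1)
    for i in range(1, len(region1) + 1):
        cur = [0] * (len(region2) + 1)
        for j in range(1, len(region2) + 1):
            if region1[i - 1] == region2[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending here
                if cur[j] > len(best):
                    best = region1[i - cur[j]:i]
        prev = cur
    return best
```

  • The `flank` argument corresponds to the about 5 to about 25 base pair values listed above; a candidate pair of cut sites might be kept only when the shared sequence meets a chosen minimum length.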
  • compositions comprising: (a) a first gene editing system, the first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region; and (b) a second gene editing system, the second gene editing system configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, and the second target site comprising microhomology to the first target site.
  • compositions comprising: (a) a first gene editing system, the first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region on the template nucleic acid molecule, and (b) a second gene editing system, the second gene editing system configured to enzymatically cleave at a second target site on a template nucleic acid molecule and generate a second cleaved region on the template nucleic acid molecule, the second cleaved region having microhomology to the first cleaved region.
  • any targetable nuclease system can be used to target sites or regions comprising microhomology.
  • the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • the microhomology comprises at least 2, at least
  • microhomology comprises about 2 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 5 complementary nucleotides, about 2 complementary nucleotides to about 10 complementary nucleotides, about 2 complementary nucleotides to about 15 complementary nucleotides, about 2 complementary nucleotides to about 20 complementary nucleotides, about 5 complementary nucleotides to about 10 complementary nucleotides, about 5 complementary nucleotides to about 15 complementary nucleotides, about 5 complementary nucleotides to about 20 complementary nucleotides, about 10 complementary nucleotides to about 15 complementary nucleotides, about 10 complementary nucleotides to about 20 complementary nucleotides, or about 15 complementary nucleotides to about 20 complementary nucleotides.
  • microhomology comprises about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, about 15 complementary nucleotides, or about 20 complementary nucleotides. In some embodiments, microhomology comprises at least about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, or about 15 complementary nucleotides. In some embodiments, microhomology comprises more than 20 complementary nucleotides. In some embodiments, microhomology comprises more than 30 complementary nucleotides. In some embodiments, microhomology comprises more than 40 complementary nucleotides.
  • the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more protein coding regions of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are identical or substantially identical (e.g., greater than 75% sequence identity).
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • MMEJ microhomology-mediated end joining
  • generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 8,000 base pairs.
  • the deletion comprises about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs.
  • the deletion comprises about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the deletion comprises at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the deletion comprises at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
  • the deletion removes one or more genes within the template nucleic acid molecule.
  • the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the deletion comprises an inversion.
  • the first and second gene editing systems excise all or substantially all (e.g., greater than half of the total template nucleic acid) of the template nucleic acid.
  • the excised region comprises about 10% of the total template nucleic acid to about 100% of the total template nucleic acid.
  • the excised region comprises about 10% of the total template nucleic acid to about 25% of the total template nucleic acid, about 10% of the total template nucleic acid to about 50% of the total template nucleic acid, about 10% of the total template nucleic acid to about 60% of the total template nucleic acid, about 10% of the total template nucleic acid to about 70% of the total template nucleic acid, about 10% of the total template nucleic acid to about 80% of the total template nucleic acid, about 10% of the total template nucleic acid to about 90% of the total template nucleic acid, about 10% of the total template nucleic acid to about 100% of the total template nucleic acid, about 25% of the total template nucleic acid to about 50% of the total template nucleic acid, about 25% of the total template nucleic acid to about 60% of the total template nucleic acid, about 25% of the total template nucleic acid to about 70% of the total template nucleic acid, about 25% of the total template nucleic acid to about 80% of the total template nucleic acid, about
  • the excised region comprises about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, about 90% of the total template nucleic acid, or about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises at least about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, or about 90% of the total template nucleic acid.
  • the first target site and the second target site are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 8,000 base pairs.
  • the first and second target sites are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs
  • the first and second target sites are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the first and second target sites are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
  • the template nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template nucleic acid molecule is a viral genome.
  • compositions comprising: (a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) a second CRISPR-associated endonuclease system comprising: (i) a first guide
  • the first gRNA and the second gRNA are different.
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second sequence.
  • the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
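The length and GC-content criteria recited above for a microhomology (three or more nucleotides, GC content of at least 50%) can be checked programmatically. The following Python sketch is illustrative only: the function names `gc_fraction` and `passes_microhomology_filter` and their default thresholds are assumptions chosen to mirror the stated example values, not an implementation taken from this disclosure.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_microhomology_filter(mh: str, min_len: int = 3, min_gc: float = 0.5) -> bool:
    """True if a candidate microhomology meets the length and GC thresholds.

    Defaults (3 nt, 50% GC) are hypothetical settings mirroring the example
    values in the text; other embodiments recite other thresholds.
    """
    return len(mh) >= min_len and gc_fraction(mh) >= min_gc
```

For example, `passes_microhomology_filter("GCA")` accepts the candidate (3 nt, two of three bases G or C), whereas `"ATA"` fails the GC test and `"GC"` fails the length test.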
  • FIGS. 3A-3B show an example of a cleaved region (340 and 342) on a template nucleic acid molecule, wherein a first and a second cleaved region comprise a first (310 and 311) and a second sequence (312 and 313) having microhomology.
  • a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 10 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about
  • a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
  • a cleaved or cleavable region comprises at least about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site.
  • a cleaved or cleavable region comprises at most about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
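As an illustration of the cleavable-region windows and microhomology lengths recited above, the following Python sketch extracts a window of `flank` base pairs on each side of a cut site and reports the longest sequence shared by two such windows. The function names, the 0-based cut coordinate, and the simplification that microhomology is an identical top-strand substring present in both windows are assumptions made for illustration and are not taken from this disclosure.

```python
def cleavable_region(seq: str, cut: int, flank: int = 10) -> str:
    """Return up to `flank` bases 5' and `flank` bases 3' of a cut site.

    `cut` is assumed to be the 0-based index of the first base 3' of the cut.
    The window is clipped at the ends of the template.
    """
    start = max(0, cut - flank)
    end = min(len(seq), cut + flank)
    return seq[start:end]


def shared_microhomology(region_a: str, region_b: str, min_len: int = 2) -> str:
    """Longest substring of `region_a` (at least `min_len` long) found in `region_b`.

    Returns an empty string when no shared stretch of `min_len` bases exists.
    Ties are resolved in favor of the first occurrence in `region_a`.
    """
    best = ""
    for i in range(len(region_a)):
        # Only consider candidates strictly longer than the current best.
        first_end = i + max(min_len, len(best) + 1)
        for j in range(first_end, len(region_a) + 1):
            candidate = region_a[i:j]
            if candidate in region_b:
                best = candidate
            else:
                break  # longer candidates starting at i cannot match either
    return best
```

With two windows such as `"ACGTGGCCAT"` and `"TTTGGCCAAA"`, the sketch reports the shared stretch `"TGGCCA"`. A quadratic substring scan is adequate here because the windows are only tens of bases long.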
  • the microhomology comprises at least 2, at least
  • microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 5 complementary nucleotides, about
  • microhomology comprises about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, about 5 complementary nucleotides to about 15 complementary nucleotides, about 5 complementary nucleotides to about 20 complementary nucleotides, about 10 complementary nucleotides to about 15 complementary nucleotides, about 10 complementary nucleotides to about 20 complementary nucleotides, or about 15 complementary nucleotides to about 20 complementary nucleotides.
  • microhomology comprises about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, about
  • microhomology comprises at least about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, or about 15 complementary nucleotides. In some embodiments, microhomology comprises more than 20 complementary nucleotides.
  • the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more protein coding regions of the template nucleic acid molecule.
  • the first target site and the second target site are identical or substantially identical (e.g., greater than 75% sequence identity).
  • matching regions e.g., complementary regions
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first target site, the second target site, or both the first target site and the second target site (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a target site).
  • generating a first cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first template sequence and generating a second cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second template sequence activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • MMEJ microhomology-mediated end joining
  • generating a first cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first template sequence and generating a second cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second template sequence excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 8,000 base pairs.
  • the deletion comprises about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, about 2,000 base pairs
  • the deletion comprises about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the deletion comprises at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the deletion comprises at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
  • the deletion removes one or more genes within the template nucleic acid molecule.
  • the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the first and second gene editing systems excise all or substantially all (e.g., greater than half of the total template nucleic acid) of the template nucleic acid.
  • the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template nucleic acid molecule.
  • the excised region comprises about 10% of the total template nucleic acid to about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid to about 25% of the total template nucleic acid, about 10% of the total template nucleic acid to about 50% of the total template nucleic acid, about 10% of the total template nucleic acid to about 60% of the total template nucleic acid, about 10% of the total template nucleic acid to about 70% of the total template nucleic acid, about 10% of the total template nucleic acid to about 80% of the total template nucleic acid, about 10% of the total template nucleic acid to about 90% of the total template nucleic acid, about 10% of the total template nucleic acid to about 100% of the total template nucleic acid, about 25% of the total template nucleic acid to about 50% of the total template nucleic acid, about 25% of the total template nucleic acid.
  • the excised region comprises about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, about 90% of the total template nucleic acid, or about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises at least about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, or about 90% of the total template nucleic acid.
  • the first target site and the second target site are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 8,000 base pairs.
  • the first and second target sites are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs
  • the first and second target sites are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the first and second target sites are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
  • the template nucleic acid molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template nucleic acid molecule is a viral genome.
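The separation, deletion-size, and excised-fraction ranges recited above can be related to one another with simple arithmetic on the two cut coordinates. The Python sketch below is a hedged illustration: `excision_summary`, its parameter names, and the default 50 to 8,000 base pair window are assumptions for this example, and the sketch ignores the few bases of microhomology that MMEJ repair would additionally consume.

```python
def excision_summary(cut1: int, cut2: int, template_len: int,
                     min_bp: int = 50, max_bp: int = 8000) -> dict:
    """Summarize the segment between two cut sites on a linear template.

    Coordinates are 0-based positions along the template (an assumption).
    `min_bp` and `max_bp` are hypothetical defaults mirroring the
    "about 50 base pairs to about 8,000 base pairs" example range.
    """
    if template_len <= 0:
        raise ValueError("template_len must be positive")
    left, right = sorted((cut1, cut2))
    if left < 0 or right > template_len:
        raise ValueError("cut sites must lie within the template")
    deleted = right - left
    return {
        "deleted_bp": deleted,
        "percent_excised": 100.0 * deleted / template_len,
        "within_range": min_bp <= deleted <= max_bp,
    }
```

For a 10,000 base pair template cut at positions 200 and 1,200, the sketch reports a 1,000 base pair deletion corresponding to 10% of the template.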
  • nucleic acids encoding one or more components of the first CRISPR-Cas system and/or the second CRISPR-Cas system described herein.
  • Engineered CRISPR systems generally contain two components: a guide RNA (gRNA or sgRNA) and a CRISPR-associated endonuclease (Cas protein).
  • gRNA or sgRNA guide RNA
  • Cas protein CRISPR-associated endonuclease
  • CRISPR/CRISPR-associated (Cas) systems provide bacteria and archaea with adaptive immunity against viruses and plasmids by using CRISPR RNAs (crRNAs) to guide the silencing of invading nucleic acids.
  • the CRISPR-Cas system is an RNA-mediated adaptive defense system that relies on small RNA molecules for sequence-specific detection and silencing of foreign nucleic acids.
  • CRISPR-Cas systems are composed of cas genes organized in operon(s) and CRISPR array(s) consisting of genome-targeting sequences (termed spacers).
  • CRISPR-Cas systems generally refer to and include an enzyme system that includes a guide RNA sequence that contains a nucleotide sequence complementary or substantially complementary to a region of a target polynucleotide (e.g., a template nucleic acid such as HSV genomic DNA), and a protein with nuclease activity.
  • CRISPR-Cas systems include Type I CRISPR-Cas system, Type II CRISPR-Cas system, Type III CRISPR-Cas system, and derivatives thereof.
  • CRISPR-Cas systems include engineered and/or programmed nuclease systems derived from naturally occurring CRISPR-Cas systems.
  • CRISPR-Cas systems may contain engineered and/or mutated Cas proteins.
  • nucleases generally refer to enzymes capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids.
  • endonucleases are generally capable of cleaving the phosphodiester bond within a polynucleotide chain.
  • Nickases refer to endonucleases that cleave only a single strand of a DNA duplex.
  • the CRISPR-Cas system used herein can be a type I, a type II, or a type III system.
  • suitable CRISPR-Cas proteins include Cas3, Cas4, Cas5, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, CasX, CasΦ, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4,
  • the CRISPR-Cas protein or endonuclease is Cas9. In certain embodiments, the CRISPR-Cas protein or endonuclease is Cas12. In certain embodiments, the CRISPR-Cas protein or endonuclease is CasX. In certain embodiments, the CRISPR-Cas protein or endonuclease is CasΦ.
  • the Cas9 protein can be from or derived from: Staphylococcus aureus, Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis
  • the gene editing system comprises a CRISPR-associated (Cas) protein, or functional fragment or derivative thereof.
  • the Cas protein is an endonuclease, including but not limited to the Cas9 nuclease.
  • the Cas9 protein comprises an amino acid sequence identical to the wild type Streptococcus pyogenes or Staphylococcus aureus Cas9 amino acid sequence.
  • the Cas protein may comprise the amino acid sequence of a Cas protein from other species, for example other Streptococcus species, such as Streptococcus thermophilus; Pseudomonas aeruginosa, Escherichia coli, or other sequenced bacterial genomes and archaea, or other prokaryotic microorganisms.
  • Other Cas proteins useful for the present disclosure are known or can be identified using methods known in the art (see, e.g., Esvelt et al., 2013, Nature Methods, 10: 1116-1121).
  • the Cas protein may comprise a modified amino acid sequence, as compared to its natural source.
  • CRISPR-Cas proteins comprise at least one RNA recognition and/or RNA binding domain.
  • RNA recognition and/or RNA binding domains interact with guide RNAs (gRNAs).
  • CRISPR-Cas proteins can also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, RNase domains, protein-protein interaction domains, dimerization domains, as well as other domains.
  • the CRISPR-Cas-like protein can be a wild type CRISPR-Cas protein, a modified CRISPR-Cas protein, or a fragment of a wild type or modified CRISPR-Cas protein.
  • the CRISPR-Cas-like protein can be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein.
  • nuclease (i.e., DNase, RNase) domains of the CRISPR-Cas-like protein can be modified, deleted, or inactivated.
  • the CRISPR-Cas-like protein can be truncated to remove domains that are not essential for the function of the Cas protein.
  • the CRISPR-Cas-like protein can also be truncated or modified to optimize the activity of the effector domain of the Cas protein.
  • the CRISPR-Cas-like protein can be derived from a wild type Cas protein or fragment thereof.
  • the CRISPR-Cas-like protein is a modified Cas9 protein.
  • the amino acid sequence of the Cas9 protein can be modified to alter one or more properties (e.g., nuclease activity, affinity, stability, etc.) of the protein relative to wild-type or another Cas protein.
  • domains of the Cas9 protein not involved in RNA-guided cleavage can be eliminated from the protein such that the modified Cas9 protein is smaller than the wild-type Cas9 protein.
  • the disclosed CRISPR-Cas compositions should also be construed to include any form of a protein having substantial homology to a Cas protein (e.g., Cas9, saCas9, Cas9 protein) disclosed herein.
  • a protein which is “substantially homologous” is about 50% homologous, more preferably about 70% homologous, even more preferably about 80% homologous, more preferably about 90% homologous, even more preferably, about 95% homologous, and even more preferably about 99% homologous to the amino acid sequence of a Cas protein disclosed herein.
  • the gRNA is a short synthetic RNA composed of a scaffold sequence necessary for Cas binding and a targeting sequence (also referred to as a spacer sequence) that defines the genomic target to be modified.
  • the gRNA functions, in part, by hybridizing to a template nucleic acid molecule (e.g., at a targeted site).
  • Hybridization generally refers to and includes the capacity and/or ability of a first nucleic acid molecule to non-covalently bind (e.g., form Watson-Crick base pairs and/or G/U base pairs), anneal, and/or hybridize to a second nucleic acid molecule under the appropriate or certain in vitro and/or in vivo conditions of temperature, pH, and/or solution ionic strength.
  • standard Watson-Crick base pairing includes: adenine (A) pairing with thymidine (T); adenine (A) pairing with uracil (U); and guanine (G) pairing with cytosine (C).
  • hybridization comprises at least two nucleic acids comprising complementary sequences (e.g., fully complementary, substantially complementary, or partially complementary). In certain embodiments, hybridization comprises at least two nucleic acids comprising fully complementary sequences. In certain embodiments, hybridization comprises at least two nucleic acids comprising substantially complementary sequences (e.g., greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90%, or greater than about 95% complementary). In certain embodiments, hybridization comprises at least two nucleic acids comprising partially complementary sequences (e.g., greater than about 40%, greater than about 50%, greater than about 60%, or greater than about 70% complementary). In certain embodiments, partially complementary sequences comprise one or more regions of fully or substantially complementary sequences.
  • partially complementary sequences comprise one or more regions of fully or substantially complementary sequences, even if the overall complementarity is low (e.g., a total complementarity lower than about 50%, lower than about 40%, lower than about 30%, or lower than about 20%).
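The complementarity categories above can be illustrated with a minimal sketch. It assumes two gap-free strands of equal length written 5' to 3', counts only Watson-Crick pairs, and uses the lowest stated cut-offs (about 75% and about 40%) as fixed thresholds. The function names `percent_complementarity` and `classify` are hypothetical and are not part of the disclosure.

```python
# Watson-Crick partners for DNA and RNA bases.
# Assumption: G/U wobble pairs are not counted in this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that pair when strand_b is read antiparallel to strand_a.

    Both strands are written 5'->3' and must have the same length (no gaps).
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse so position i of a faces position i of b
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


def classify(pct: float) -> str:
    """Map a percentage onto the categories named in the text.

    Assumption: the lowest cut-offs listed in the text (75% and 40%) are used.
    """
    if pct == 100.0:
        return "fully complementary"
    if pct > 75.0:
        return "substantially complementary"
    if pct > 40.0:
        return "partially complementary"
    return "below the partial-complementarity threshold"


if __name__ == "__main__":
    pct = percent_complementarity("ATGC", "GCAT")
    print(pct, classify(pct))
```

A whole-strand percentage of this kind does not capture the case described above in which a short, fully complementary region sits inside an otherwise poorly matched pair of sequences; a local (windowed) comparison would be needed for that.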
  • the conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementation, variables well known in the art. For example, the greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences.
  • for nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or less, 30 or less, 25 or less, 22 or less, 20 or less, or 18 or less nucleotides), the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8).
  • Complementary or complementarity generally refers to a polynucleotide that includes a nucleotide sequence capable of selectively annealing to an identifying region of a target polynucleotide under certain conditions.
  • substantially complementary and grammatical equivalents is intended to mean a polynucleotide that includes a nucleotide sequence capable of specifically annealing to an identifying region of a target polynucleotide under certain conditions.
  • Annealing refers to the nucleotide basepairing interaction of one nucleic acid with another nucleic acid that results in the formation of a duplex, triplex, or other higher-ordered structure.
  • the primary interaction is typically nucleotide base specific, e.g., A:T, A:U, and G:C, by Watson-Crick and Hoogsteen-type hydrogen bonding.
  • base stacking and hydrophobic interactions can also contribute to duplex stability.
  • Conditions under which a polynucleotide anneals to complementary or substantially complementary regions of target nucleic acids are well known in the art, e.g., as described in Nucleic Acid Hybridization, A Practical Approach, Hames and Higgins, eds., IRL Press, Washington, D.C. (1985) and Wetmur and Davidson, Mol. Biol. 31:349 (1968).
  • Hybridization generally refers to a process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide.
  • the composition comprises multiple different gRNA molecules, each targeted (e.g., capable of hybridizing) to a different target sequence.
  • this multiplexed strategy provides for increased efficacy.
  • These multiplex gRNAs or combination of gRNAs can be expressed separately in different vectors or expressed in one single vector.
  • the temperature and solution salt concentration are generally recognized as factors facilitating hybridization, and may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementarity.
  • Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001).
  • the conditions of temperature and ionic strength determine the stringency of the hybridization.
  • hybridization is measured under physiological temperature (e.g., 37 degrees Celsius) and salt concentrations (e.g., 0.15 molar or 0.9% salt in solution).
  • Target specificity can be used in reference to a guide RNA, or a crRNA specific to a target polynucleotide sequence or region and further includes a sequence of nucleotides capable of selectively annealing/hybridizing to a target (sequence or region) of a target polynucleotide, e.g., a target DNA.
  • Target specific nucleotides can have a single species of oligonucleotide, or it can include two or more species with different sequences.
  • the target specific nucleotide can be two or more sequences, including 3, 4, 5, 6, 7, 8, 9 or 10 or more different sequences.
  • a crRNA or the derivative thereof contains a target-specific nucleotide region complementary to a region of the target DNA sequence.
  • a crRNA or the derivative thereof may contain other nucleotide sequences besides a target-specific nucleotide region.
  • the other nucleotide sequences may be from a tracrRNA sequence.
  • gRNAs are generally supported by a scaffold, wherein a scaffold refers to the portions of gRNA or crRNA molecules comprising sequences which are substantially identical or are highly conserved across natural biological species (e.g., not conferring target specificity).
  • Scaffolds include the tracrRNA segment and the portion of the crRNA segment other than the polynucleotide-targeting guide sequence at or near the 5' end of the crRNA segment, excluding any unnatural portions comprising sequences not conserved in native crRNAs and tracrRNAs.
  • the gRNA comprises a CRISPR RNA (crRNA):trans-activating crRNA (tracrRNA) duplex.
  • the gRNA comprises a stem-loop that mimics the natural duplex between the crRNA and tracrRNA.
  • the stem-loop comprises a nucleotide sequence comprising a non-naturally occurring sequence.
  • the composition comprises a synthetic or chimeric guide RNA comprising a crRNA, stem, and tracrRNA.
  • a protospacer adjacent motif is also an important sequence element mediating enzymatic activity of a Cas nuclease.
  • a PAM sequence or element also refers to and includes an approximately 2-6 base pair DNA sequence that is an important targeting component of a Cas nuclease.
  • the PAM sequence further comprises, in certain instances, a DNA sequence that may be required for a Cas/sgRNA to form an R-loop to interrogate a specific DNA sequence through Watson-Crick pairing of its guide RNA with the genome.
  • the PAM specificity can be a function of the DNA-binding specificity of the Cas protein (e.g., a PAM recognition domain of a Cas), wherein, a protospacer adjacent motif recognition domain refers to a Cas amino acid sequence that comprises a binding site to a DNA target PAM sequence.
  • the PAM sequence is on either strand, and is downstream in the 5' to 3' direction of the Cas9 cut site.
  • the canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes, or SpCas9) is 5'-NGG-3', wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases.
  • any given Cas9 nuclease e.g., SpCas9
  • the protospacer region DNA typically immediately precedes a 5'-NGG or NAG proto-spacer adjacent motif (PAM).
  • Other Cas9 orthologs can have different PAM specificities. For example, Cas9 from S. thermophilus requires 5'-NNAGAA for CRISPR1 and 5'-NGGNG for CRISPR3, and Cas9 from Neisseria meningitidis (nmCas9) requires 5'-NNNNGATT.
  • Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities.
  • Cas9 from Staphylococcus aureus recognizes NGRRT or NGRRN.
  • Cas9 from Neisseria meningitidis recognizes NNNNGATT.
  • Cas9 from Streptococcus thermophilus recognizes NNAGAAW.
  • Cas9 from Treponema denticola recognizes NAAAAC.
  • non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site.
  • non-SpCas9s can have other characteristics that make them more useful than SpCas9.
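The PAM motifs listed above use IUPAC degenerate codes (N = any base, R = A or G, W = A or T). A minimal sketch of how candidate PAM positions might be located on both strands of a DNA sequence follows. The dictionary keys, the function names, and the choice of regular expressions are assumptions made for illustration; this is not a description of the disclosed software.

```python
import re

# IUPAC codes needed for the PAMs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

# PAM motifs as given in the text; the dictionary keys are illustrative labels.
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NGRRT",
    "NmCas9": "NNNNGATT",
    "St1Cas9": "NNAGAAW",
    "TdCas9": "NAAAAC",
}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a degenerate PAM into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[code] for code in pam.upper())
    return re.compile(f"(?=({body}))")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_pams(seq: str, pam: str) -> list:
    """Return (strand, start) pairs for every PAM match.

    start is a 0-based index on the named strand, read 5'->3'.
    """
    seq = seq.upper()
    rx = pam_regex(pam)
    hits = [("+", m.start()) for m in rx.finditer(seq)]
    hits += [("-", m.start()) for m in rx.finditer(revcomp(seq))]
    return hits


if __name__ == "__main__":
    for name, motif in PAMS.items():
        print(name, find_pams("ATTGGCCAAAACTTGAGTAT", motif))
```

Reporting minus-strand hits in minus-strand coordinates keeps the sketch short; a practical tool would usually convert them back to plus-strand coordinates before deriving cut sites.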
  • the gRNA spacer sequence comprises about 15 nucleotides to about 28 nucleotides. In some embodiments, the gRNA comprises at least about 15 nucleotides. In some embodiments, the gRNA spacer sequence comprises at most about 28 nucleotides.
  • the gRNA spacer sequence comprises about 15 nucleotides to about 16 nucleotides, about 15 nucleotides to about 17 nucleotides, about 15 nucleotides to about 18 nucleotides, about 15 nucleotides to about 19 nucleotides, about 15 nucleotides to about 20 nucleotides, about 15 nucleotides to about 21 nucleotides, about 15 nucleotides to about 22 nucleotides, about 15 nucleotides to about 23 nucleotides, about 15 nucleotides to about 24 nucleotides, about 15 nucleotides to about 25 nucleotides, about 15 nucleotides to about 28 nucleotides, about 16 nucleotides to about 17 nucleotides, about 16 nucleotides to about 18 nucleotides, about 16 nucleotides to about 19 nucleotides, about 16 nucleotides to about 20 nucleotides, about 16
  • the gRNA spacer sequence comprises about 15 nucleotides, about 16 nucleotides, about 17 nucleotides, about 18 nucleotides, about 19 nucleotides, about 20 nucleotides, about 21 nucleotides, about 22 nucleotides, about 23 nucleotides, about 24 nucleotides, about 25 nucleotides, or about 28 nucleotides.
  • the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; and (b) a first guide nucleic acid that hybridizes to a first target site of a template nucleic acid molecule (e.g., a sequence repeated within the template nucleic acid molecule).
  • the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; (b) a first guide nucleic acid that hybridizes to a first target site within a template nucleic acid molecule; and (c) a second guide nucleic acid that hybridizes to a second target site within the template nucleic acid molecule.
  • the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; (b) a first guide nucleic acid that hybridizes to a first target site within a template nucleic acid molecule; (c) a second guide nucleic acid that hybridizes to a second target site within the template nucleic acid molecule; and (d) a third guide nucleic acid that hybridizes to a third target site within the template nucleic acid molecule.
  • a nucleic acid molecule from a template nucleic acid molecule
  • the method comprising: (a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
  • (a) comprises contacting the template nucleic acid with a first gene editing system or a first CRISPR-associated nuclease system and cleaving the template nucleic acid molecule, and wherein (b) comprises contacting the template nucleic acid with a second gene editing system or a second CRISPR-associated nuclease system.
  • the first target site and the second target site are different.
  • a template nucleic acid molecule comprising: (a) cleaving the template nucleic acid molecule at a first target site on a template nucleic acid molecule, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first template sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second template sequence, the second template sequence comprising microhomology to the first template sequence.
  • Also provided are methods of editing a template nucleic acid molecule comprising: (a) cleaving the template nucleic acid molecule at a first target site, thereby generating a first cleaved region on the template nucleic acid molecule, the first cleaved region or segment thereof comprising a first template sequence; and (b) cleaving the template nucleic acid molecule at a second target site comprising microhomology to the first target site, thereby generating a second cleaved region on the template nucleic acid molecule, the second cleaved region or segment thereof comprising a second template sequence, the first template sequence comprising microhomology to the second template sequence.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
  • methods of inactivating a virus comprising: (a) cleaving the template nucleic acid molecule at a first target site on a viral genome, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first viral sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the viral genome, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second viral sequence, the second viral sequence comprising microhomology to the first viral sequence.
  • Also provided are methods of inactivating a virus comprising: (a) cleaving a viral genome at a first target site, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first viral sequence; and (b) cleaving the viral genome at a second target site, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second viral sequence, the second viral sequence comprising microhomology to the first viral sequence.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
  • the method comprises, prior to (a), selecting the first template sequence or first viral sequence, and selecting the second template sequence or second viral sequence.
  • the cleaving is performed by a CRISPR-Cas system, a meganuclease system, a TALEN system, or a ZFN system.
  • the method comprises prior to (a), contacting the template nucleic acid molecule with one or more enzymes selected from the group consisting of: a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • cleaving is performed by the compositions described herein.
  • a template nucleic acid sequence e.g., a viral genome sequence
  • positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
  • the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
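The cleavable-region window (about 10 base pairs on each side of a cut site) and the microhomology criterion (three or more nucleotides with GC content of at least 50%) can be sketched as follows. The coordinate convention, in which a cut at index `i` falls between `template[i-1]` and `template[i]`, and all function names are assumptions made for illustration.

```python
def cleavable_region(template: str, cut_index: int, flank: int = 10):
    """Return the (5' flank, 3' flank) sequences around a cut site.

    Assumption: the cut falls between template[cut_index - 1] and
    template[cut_index]. Flanks are shorter near the ends of the template.
    """
    if not 0 <= cut_index <= len(template):
        raise ValueError("cut_index is outside the template")
    five_prime = template[max(0, cut_index - flank):cut_index]
    three_prime = template[cut_index:cut_index + flank]
    return five_prime, three_prime


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def qualifies_as_microhomology(shared: str, min_len: int = 3, min_gc: float = 0.5) -> bool:
    """Apply the length and GC-content criteria from the text to a shared sequence."""
    return len(shared) >= min_len and gc_fraction(shared) >= min_gc


if __name__ == "__main__":
    template = "A" * 12 + "C" * 12
    print(cleavable_region(template, 12))      # ten A's, then ten C's
    print(qualifies_as_microhomology("GCA"))   # True: 3 nt, GC content 2/3
```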
  • the microhomology data is a function of:
  • orientation and/or strand location (e.g., for identifying inversion outcomes);
  • the template nucleic acid sequence comprises a consensus sequence
  • the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
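As an illustration of how a consensus template might be derived from aligned inputs, the sketch below takes the most frequent symbol in each alignment column. It assumes the input sequences have already been aligned to equal length (for example by an external aligner), treats a gap character as an ordinary symbol, and breaks ties alphabetically; these choices and the function name `consensus` are hypothetical.

```python
from collections import Counter


def consensus(aligned: list) -> str:
    """Majority-vote consensus of pre-aligned, equal-length sequences.

    Assumptions: ties are broken alphabetically, and gap characters ('-')
    are counted like any other symbol.
    """
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    result = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        # Most frequent symbol first; alphabetical order settles ties.
        result.append(min(counts, key=lambda base: (-counts[base], base)))
    return "".join(result)


if __name__ == "__main__":
    inputs = ["AAGT", "ACGA", "TCGT"]
    template = consensus(inputs)
    print(template, template in inputs)  # ACGT False
```

The example inputs are chosen so that the consensus matches none of them, which mirrors the statement above that the template can differ from each input sequence.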
  • the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
  • the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
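Positional entropy can be illustrated with the Shannon entropy of each alignment column, H = -Σ p·log2(p), where p is the observed frequency of each symbol at that position. The sketch assumes equal-length, pre-aligned inputs; the function name `positional_entropy` is hypothetical.

```python
import math
from collections import Counter


def positional_entropy(aligned: list) -> list:
    """Shannon entropy (in bits) of each column of pre-aligned sequences.

    A fully conserved position scores 0; a position with four equally
    frequent bases scores 2 bits.
    """
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    entropies = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    print(positional_entropy(["AA", "AC"]))
```

Low-entropy positions are conserved across the input sequences, which is one reason such a score could be useful when ranking candidate cut sites.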
  • the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location (e.g., coding region data) data for a cut site and/or nucleobase positions proximal to the cut site, a distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or
  • the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location (e.g., coding region data) data, a distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • a template nucleic acid sequence e.g., a viral genome sequence
  • identifying or providing by one or more computers, a plurality of cut sites within the template nucleic acid sequence
  • Also provided are computer-implemented methods of cut site evaluation for cutting a template polynucleotide molecule comprising: (a) generating, by one or more computers, positional data for a first cut site and a second cut site within a template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site within the template and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the first cut site and the second cut site; and (b) generating, by one or more computers, microhomology data using the positional data, wherein the microhomology data identifies a degree of microhomology between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site.
  • the method further comprises identifying, by one or more computers, a first target sequence within or adjacent to the first cut site and a second target sequence within or adjacent to the second cut site.
  • the nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, or at least 15 nucleobase positions.
  • the microhomology data is a function of: (i) total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (ii) the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (iii) nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (iv) orientation and/or strand location (e.g., for identifying inversion outcomes); (v) base content of complementary nucleobases between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site
  • a computer-implemented method of cut site identification for cutting a template polynucleotide molecule comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); and (b) identifying or providing, by one or more computers, a plurality of cut sites within the template nucleic acid sequence; (c) generating, by one or more computers, positional data for a cut site of the plurality of cut sites using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut site; (d) generating, by one or more computers, microhomology data for the plurality of cut sites using the positional data; and (e) identifying, by one or more computers, at least 2 cut sites,
  • a template nucleic acid sequence
  • the method further comprises identifying, by one or more computers, target sequences comprising or adjacent to the cut sites identified in (e).
  • the nucleobases proximal to the cut sites comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, or at least 15 nucleobase positions.
  • the microhomology data is a function of: total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the cut sites; nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the cut sites; melting temperature of nucleobases proximal to the cut sites; base content of complementary nucleobases between nucleobases proximal to the cut sites; or a combination thereof.
  • the function includes or utilizes sequence alignment data (e.g., of the proximal nucleobase positions) to generate the microhomology data.
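One way to picture microhomology data for a pair of cut sites is sketched below. It is an assumption-laden illustration, not the disclosed scoring function: it compares the flank 5' of the first cut with the flank 3' of the second cut on the same strand, takes the longest run of bases shared by both flanks as the candidate microhomology, and reports its length, GC fraction, and a rough melting temperature from the Wallace rule of thumb for short oligonucleotides (2 °C per A/T, 4 °C per G/C). All names are hypothetical.

```python
def longest_shared_run(a: str, b: str) -> str:
    """Longest substring present in both a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def microhomology_data(template: str, cut1: int, cut2: int, flank: int = 10) -> dict:
    """Summarize shared sequence between the flanks left after a two-cut excision.

    Assumptions: cut1 < cut2; a cut at index i falls between template[i-1]
    and template[i]; only the retained flanks on one strand are compared.
    """
    if not 0 <= cut1 < cut2 <= len(template):
        raise ValueError("require 0 <= cut1 < cut2 <= len(template)")
    template = template.upper()
    upstream = template[max(0, cut1 - flank):cut1]  # 5' of the first cut
    downstream = template[cut2:cut2 + flank]        # 3' of the second cut
    shared = longest_shared_run(upstream, downstream)
    gc = sum(base in "GC" for base in shared)
    return {
        "sequence": shared,
        "length": len(shared),
        "gc_fraction": gc / len(shared) if shared else 0.0,
        # Wallace rule of thumb, used here only as a rough stand-in for Tm.
        "wallace_tm_c": 2 * (len(shared) - gc) + 4 * gc,
    }


if __name__ == "__main__":
    demo = "TTTTGCGCAA" + "ACTACTACTA" + "GGGCGCATTT"
    print(microhomology_data(demo, 10, 20))
```

The other factors listed above, such as orientation, strand location, and complementarity at the 5’ or 3’ edge, are not modeled in this sketch.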
  • the template nucleic acid sequence comprises a consensus sequence
  • the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
  • the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
  • the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
  • the two or more input nucleic acid sequences comprise at least 5 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 50 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 100 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 250 sequences. In some embodiments, the two or more input nucleic acid sequences are from a pathogen.
  • the two or more input nucleic acid sequences are from a virus. In some embodiments, the two or more input nucleic acid sequences are sequences within different subclades of the virus. In some embodiments, the two or more input nucleic acid sequences are sequences within a single or same subclade of a virus.
  • the two or more input nucleic acid sequences comprise about 2 input sequences to about 1,000 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise about 2 input sequences to about 5 input sequences, about 2 input sequences to about 25 input sequences, about 2 input sequences to about 50 input sequences, about 2 input sequences to about 100 input sequences, about 2 input sequences to about 500 input sequences, about 2 input sequences to about 1,000 input sequences, about 5 input sequences to about 25 input sequences, about 5 input sequences to about 50 input sequences, about 5 input sequences to about 100 input sequences, about 5 input sequences to about 500 input sequences, about 5 input sequences to about 1,000 input sequences, about 25 input sequences to about 50 input sequences, about 25 input sequences to about 100 input sequences, about 25 input sequences to about 500 input sequences, about 25 input sequences to about 1,000 input sequences, about 50 input sequences to about 100 input sequences, about
  • the two or more input nucleic acid sequences comprise about 2 input sequences, about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, about 500 input sequences, or about 1,000 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise at least about 2 input sequences, about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, or about 500 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise at most about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, about 500 input sequences, or about 1,000 input sequences.
  • a non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873-5877.
  • Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403-410.
  • PSI-Blast can be used to perform an iterated search which detects distant relationships between molecules.
  • sequence alignment may be carried out using the CLUSTAL algorithm (e.g., as provided in the program Clustal Omega), as described by Higgins et al., 1996, Methods Enzymol. 266:383-402.
  • the template nucleic acid sequence is a human sequence
  • the template nucleic acid sequence is a viral sequence.
  • the virus is Hepatitis B virus (HBV), Human Immunodeficiency virus (HIV), JC virus (JCV), herpes simplex virus (HSV), or SARS-CoV-2.
  • a virus comprises Hepatitis B virus (HBV), Human Immunodeficiency virus (HIV), JC virus (JCV), herpes simplex virus (HSV), or SARS-CoV-2.
  • the virus is Hepatitis B virus (HBV) and the genotype is HBV-A.
  • the virus is Hepatitis B virus (HBV) and the genotype is HBV-B.
  • the virus is Hepatitis B virus (HBV) and the genotype is HBV-C.
  • the virus is Hepatitis B virus (HBV) and the different genotypes comprise HBV-A, HBV-B, HBV-C, or combinations thereof.
  • the virus is Hepatitis B virus (HBV) and different genotypes comprise HBV-A, HBV-B, HBV-C, HBV-D, HBV-E, HBV-F, HBV-G, HBV-H, or combinations thereof.
  • the virus is Hepatitis B virus (HBV) and various subclades within HBV-A, HBV-B, HBV-C, HBV-D, HBV-E, HBV-F, HBV-G, HBV-H, or combinations thereof.
  • the subclade within HBV-A comprises HBV-A1, HBV-A2, HBV-QS-A3, HBV-A4, or combinations thereof.
  • the subclade within HBV-B comprises HBV-B1, HBV-B2, HBV-QS-B3, HBV-B4, HBV-B5, or combinations thereof.
  • the subclade within HBV-C comprises HBV-C1, HBV-QS-C2, HBV-C3, HBV-C4, HBV-C5, HBV-C6-C15, or combinations thereof.
  • the subclade within HBV-D comprises HBV-D1, HBV-D2, HBV-D3, HBV-D4, HBV-D5, HBV-D6, or combinations thereof.
  • the subclade within HBV-F comprises HBV-F1, HBV-F2, HBV-F3, HBV-F4, or combinations thereof.
  • HBV genotypes and subgenotypes/subclades in populations differ between geographic regions.
  • the subclades in North America include HBV-A2, HBV-D2, HBV-B5, HBV-B4, and HBV-G.
  • the subclades in Central America include HBV-A2, HBV-F1 , HBV-H, HBV-G, HBV-B2, HBV-F3, HBV-C1 , and HBV-F4.
  • the subclades in Caribbean include HBV-A1 , HBV-QS-A3, HBV-D4, HBV-A2, and HBV-D3.
  • the subclades in South America include HBV-F1 , HBV-F4, HBV-D3, HBV- F3, HBV-F2, HBV-A1 , HBV-A2, and HBV-D2.
  • the subclades in Northern Europe include HBV-D2, HBV-A2, HBV-D3, and HBV-E.
  • the subclades in Southern Europe include HBV-D3, HBV-D2, HBV-D1 , and HBV- A2.
  • the subclades in Western Europe include HBV-A2, HBV-D1 , HBV-D2, HBV-D3, and HBV-E.
  • the subclades in Eastern Europe include HBV-D2, HBV-A2, HBV-D1 , and HBV-D3.
  • the subclades in Northern Africa include HBV-D1 , HBV-E, HBV-D6, HBV-D2, and HBV-D3.
  • the subclades in Western Africa include HBV-E and HBV-A2.
  • the subclades in Middle Africa include HBV-E and HBV-QS-A3.
  • the subclades in Eastern Africa include HBV-A1, HBV-D2, and HBV-E.
  • the subclades in Southern Africa include HBV-A1, HBV-D3, HBV-E, and HBV-A2.
  • the subclades in Western Asia include HBV-D1 and HBV-D2.
  • the subclades in Southern Asia include HBV-D1, HBV-D3, HBV-D2, HBV-A1, HBV-C1, and HBV-D5.
  • the subclades in Central Asia include HBV-D1, HBV-D2, HBV-QS-C2, and HBV-A2.
  • the subclades in Eastern Asia include HBV-QS-C2, HBV-B2, HBV-C1, HBV-QS-B3, and HBV-C6-C15.
  • the subclades in Southeastern Asia include HBV-C1, HBV-B2, HBV-QS-B3, HBV-B4, and HBV-QS-C2.
  • the subclades in Melanesia include HBV-D2, HBV-C3, and HBV-C6-C15.
  • the subclades in Polynesia include HBV-C3.
  • the subclades in Australia and New Zealand include HBV-D1, HBV-C4, HBV-C3, and HBV-D4.
  • the most frequently observed subclades for various geographic regions comprise, for North America: HBV-A2 > HBV-D2 > HBV-B5 > HBV-B4 > HBV-G; for Central America: HBV-A2 > HBV-F1 > HBV-H > HBV-G > HBV-B2 > HBV-F3 > HBV-C1 > HBV-F4; for the Caribbean: HBV-A1 > HBV-QS-A3 > HBV-D4 > HBV-A2 > HBV-D3; for South America: HBV-F1 > HBV-F4 > HBV-D3 > HBV-F3 > HBV-F2 > HBV-A1 > HBV-A2 > HBV-D2; for Northern Europe: HBV-D2 > HBV-A2 > HBV-D3 > HBV-E; for Southern Europe: HBV-D3 > HBV-D2 > HBV-D1 > HBV-A2; for Western Europe: HBV-A2
  • the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
  • the method further comprises identifying a first target nucleic acid sequence comprising or adjacent to the first cut site and a second target nucleic acid sequence comprising or adjacent to the second cut site.
  • the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using the additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value.
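By way of non-limiting illustration, the score-and-threshold selection described above can be sketched as follows. The feature names, the equal weights, and the normalizations are assumptions made for this sketch only; the on-target value is assumed to be precomputed by an external tool, and nothing here reproduces a specific published scoring model.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """Hypothetical per-pair features (all names are illustrative)."""
    microhomology_len: int   # number of matching nucleobases
    mean_entropy: float      # mean Shannon entropy in bits, 0..2 for A/C/G/T
    human_homology: float    # 0..1, similarity to the closest human match
    on_target: float         # 0..1, precomputed specificity/selectivity value


def pair_score(f: PairFeatures, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of normalized features; higher is assumed to be better."""
    w_mh, w_cons, w_spec, w_on = weights
    return (
        w_mh * min(f.microhomology_len, 20) / 20   # cap at 20 nucleobases
        + w_cons * (1.0 - f.mean_entropy / 2.0)    # conserved positions favored
        + w_spec * (1.0 - f.human_homology)        # low human homology favored
        + w_on * f.on_target
    )


def select_pairs(candidates: dict, threshold: float) -> list:
    """Return the names of candidate pairs whose score is above the threshold."""
    return [name for name, f in candidates.items() if pair_score(f) > threshold]


candidates = {
    "pair_1": PairFeatures(10, 0.0, 0.0, 1.0),
    "pair_2": PairFeatures(2, 1.0, 0.5, 0.2),
}
selected = select_pairs(candidates, threshold=0.5)
```

The threshold is passed in as a parameter because the text describes it as a value that may be user-defined.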
  • the first cut site and the second cut site are cleavable by one or more programmable nucleases.
  • the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems.
  • the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary or substantially complementary (e.g., greater than 75% complementarity) nucleobases.
  • the first cut site is located within a first gene of the template polynucleotide molecule and the second cut site is located within a second gene of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more genes of the template polynucleotide molecule. In some embodiments, the first cut site is located within a first protein coding region of the template polynucleotide molecule and the second cut site is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more protein coding regions of the template polynucleotide molecule.
  • the first cut site and the second cut site are identical or substantially identical (e.g., greater than 75% sequence identity).
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
  • the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
  • cutting at the first cut site and cutting at the second cut site results in a deletion (e.g., excising) between the first cut site and the second cut site.
  • cutting at the first cut site and cutting at the second cut site results in (i) microhomology-mediated end joining (MMEJ) of the first cut site and the second cut site, and/or (ii) a deletion in the template polynucleotide molecule.
  • the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
  • the deletion removes one or more genes within the template polynucleotide molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
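By way of non-limiting illustration, whether a deletion between two cut sites removes a gene fully or partially can be checked with simple interval arithmetic. The gene names and coordinates below are hypothetical placeholders, not real annotations, and the sketch assumes a linear coordinate system with half-open intervals.

```python
def classify_deletion(del_start: int, del_end: int, genes: dict) -> dict:
    """Classify each gene as 'full', 'partial', or 'none' for a deletion.

    The deletion and every gene are half-open intervals [start, end)
    on the same linear coordinate system.
    """
    result = {}
    for name, (g_start, g_end) in genes.items():
        # Length of the overlap between the deletion and the gene.
        overlap = min(del_end, g_end) - max(del_start, g_start)
        if overlap <= 0:
            result[name] = "none"
        elif overlap == g_end - g_start:
            result[name] = "full"
        else:
            result[name] = "partial"
    return result


# Hypothetical annotations for illustration only.
genes = {"geneA": (100, 400), "geneB": (350, 900), "geneC": (1000, 1200)}
outcome = classify_deletion(50, 500, genes)
deleted_length = 500 - 50  # base pairs removed between the two cut sites
```

A circular genome would need the coordinates unwrapped first; that step is left out of this sketch.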
  • the template polynucleotide molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
  • a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying cut sites for cutting a template polynucleotide molecule
  • the target site identification application is programmed to: (a) generate microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut sites; and (b) identify a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
  • non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut sites; and (b) identify a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
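By way of non-limiting illustration, steps (a) and (b) above can be sketched as a search over pairs of cut sites. The sketch models microhomology as an identical run of top-strand nucleobases immediately 5’ of the first cut and immediately 3’ of the second cut, which is one possible model assumed here rather than taken from the text. A cut position `c` means a cut between index `c - 1` and index `c`; the function names are hypothetical.

```python
from itertools import combinations


def flank_microhomology(template: str, cut1: int, cut2: int, max_len: int = 20) -> int:
    """Longest k such that the k bases ending at cut1 equal the k bases starting at cut2."""
    best = 0
    for k in range(1, max_len + 1):
        if cut1 - k < 0 or cut2 + k > len(template):
            break  # flank would run off the template
        if template[cut1 - k:cut1] == template[cut2:cut2 + k]:
            best = k
    return best


def microhomology_pairs(template: str, cut_sites, min_len: int = 2, max_len: int = 20):
    """Return (cut1, cut2, length) for cut-site pairs sharing microhomology, longest first."""
    pairs = []
    for c1, c2 in combinations(sorted(cut_sites), 2):
        k = flank_microhomology(template, c1, c2, max_len)
        if k >= min_len:
            pairs.append((c1, c2, k))
    return sorted(pairs, key=lambda p: -p[2])


# Toy template: an ACGT repeat ends at position 6 and starts again at position 14.
template = "GG" + "ACGT" + "T" * 8 + "ACGT" + "CC"
pairs = microhomology_pairs(template, cut_sites=[6, 10, 14])
```

The minimum length of 2 mirrors the "at least 2" lower bound given for microhomology elsewhere in the text.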
  • a computer-implemented method of cut site identification for cutting a template polynucleotide molecule comprising:
  • a template nucleic acid sequence e.g., a viral genome sequence
  • microhomology data is a function of: total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; melting temperature of nucleobases proximal to the first cut site and nucleobases
  • a template nucleic acid sequence e.g., a viral genome sequence
  • the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences; (ii) identifying a pair of target nucleic acid sequences comprising microhomology; and (iii) identifying the first target nucleic acid sequence and the second target nucleic acid sequence comprising microhomology.
  • the computer-implemented method further comprises: (d) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence.
  • the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
  • a template nucleic acid sequence e.g., a viral genome sequence
  • a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence
  • a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to a (e.g., within 20, 10, or 5 nucleobase positions) a
  • the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences to the template nucleic acid sequence; (ii) identifying regions proximal (e.g., within 20, 10, or 5 nucleobase positions) to the plurality of target nucleic acid sequences within the template nucleic acid sequence; (iii) identifying (1) pairs of target nucleic acid sequences comprising microhomology, (2) pairs of proximal sequences comprising microhomology that are proximal to a pair of target nucleic acid sequences, or (3) a combination thereof; and (iv) identifying the first target nucleic acid sequence and the second target nucleic acid sequence from the pairs of target nucleic acid sequences, the pairs of proximal sequences, or the combination thereof.
  • the computer-implemented method further comprises: (d) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence.
  • the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
  • Also provided are computer-implemented methods of target site identification for cutting a template polynucleotide molecule comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); and (b) identifying, by one or more computers, at least one pair of microhomologous sequences within the template nucleic acid sequence, wherein the pair of microhomologous sequences comprise microhomology; (c) generating, by one or more computers, a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (d) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences and at least one pair of microhomologous sequences, wherein: (i)
  • the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences to the template nucleic acid sequence; (ii) identifying regions proximal (e.g., within 20, 10, or 5 nucleobase positions) to the plurality of target nucleic acid sequences within the template nucleic acid sequence; (iii) identifying (1) pairs of target nucleic acid sequences comprising microhomology, (2) pairs of proximal sequences comprising microhomology that are proximal to a pair of target nucleic acid sequences, or (3) a combination thereof; and (iv) identifying the first target nucleic acid sequence and the second target nucleic acid sequence from the pairs of target nucleic acid sequences, the pairs of proximal sequences, or the combination thereof.
  • the computer-implemented method further comprises: (e) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence.
  • the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
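By way of non-limiting illustration, a scoring function combining the three named terms can be sketched as follows. The weights, the edge width, and the position-by-position comparison of two equal-length flanks are assumptions of this sketch. The melting temperature term uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a rough approximation for short oligonucleotides that is swapped in here for illustration and is not stated in the text.

```python
def wallace_tm(seq: str) -> float:
    """Approximate melting temperature in deg C using the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def microhomology_score(a: str, b: str, edge: int = 3, weights=(1.0, 2.0, 0.1)) -> float:
    """Score two equal-length top-strand flanks compared position by position.

    Terms: total matches, matches in the last `edge` positions (the 3' region),
    and the Wallace-rule Tm of the matching bases.
    """
    matches = [x == y for x, y in zip(a.upper(), b.upper())]
    total = sum(matches)
    edge_matches = sum(matches[-edge:])
    matched_bases = "".join(x for x, m in zip(a.upper(), matches) if m)
    w_total, w_edge, w_tm = weights
    return w_total * total + w_edge * edge_matches + w_tm * wallace_tm(matched_bases)


perfect = microhomology_score("ACGTAC", "ACGTAC")
poor = microhomology_score("ACGTAC", "TTTTTT")
```

Weighting the 3’ region more heavily than the total is a design choice of the sketch; any of the three terms can be disabled by setting its weight to zero.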
  • a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); and (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic
  • Non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); and (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology.
  • a template nucleic acid sequence e.g., a viral genome sequence
  • a target nucleic acid comprises a portion (e.g., less
  • a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); and (b) generating, by one or more computers, a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein: (i
  • the first target nucleic acid sequence and the second target nucleic acid sequence are non-naturally occurring sequences.
  • the template nucleic acid sequence comprises a consensus sequence generated by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • the computer-implemented method comprises in (a) generating a consensus sequence, wherein the consensus sequence is generated by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • the consensus sequence is different from each input nucleic acid sequence.
  • the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
  • the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the consensus sequence.
  • the method comprises generating, by one or more computers, a positional entropy score from the positional entropy data, and cleaning the plurality of target nucleic acid sequences by removing target nucleic acid sequences having a positional entropy score lower than a defined threshold score.
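By way of non-limiting illustration, the consensus, positional entropy, and cleaning steps above can be sketched as follows. The sketch assumes the input sequences are already aligned to equal length, uses a majority-vote consensus, and defines the positional entropy score so that a higher score means a more conserved window; that scoring convention and all function names are assumptions, since the text does not define the score.

```python
from collections import Counter
from math import log2


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the symbols observed at one aligned position."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def consensus_and_entropy(aligned):
    """Majority-vote consensus and per-position entropy for equal-length sequences."""
    consensus, entropy = [], []
    for column in zip(*aligned):
        consensus.append(Counter(column).most_common(1)[0][0])
        entropy.append(column_entropy(column))
    return "".join(consensus), entropy


def filter_targets(targets, entropy, threshold: float):
    """Keep (start, sequence) targets whose conservation score is not below the threshold.

    Score = 1 - mean entropy / 2, so 1.0 is fully conserved for A/C/G/T.
    """
    kept = []
    for start, seq in targets:
        window = entropy[start:start + len(seq)]
        score = 1.0 - (sum(window) / len(window)) / 2.0
        if score >= threshold:
            kept.append((start, seq))
    return kept


aligned = ["ACGT", "ACGA", "ACTT", "ACGT"]
consensus, entropy = consensus_and_entropy(aligned)
kept = filter_targets([(0, "AC"), (2, "GT")], entropy, threshold=0.9)
```

Ties in the majority vote are resolved in favor of the symbol seen first, which is one reason a consensus may differ from every input sequence.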
  • the two or more input nucleic acid sequences comprise at least 5 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 50 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 100 sequences.
Processing Device(s)
  • the methods, systems, and media described herein include at least one digital processing device, or use of the same.
  • the digital processing device includes one or more hardware central processing units (CPUs) or general-purpose graphics processing units (GPGPUs) that carry out the device's functions.
  • the digital processing device further comprises an operating system configured to perform executable instructions.
  • the digital processing device is optionally connected to a computer network.
  • the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web.
  • the digital processing device is optionally connected to a cloud computing infrastructure.
  • the digital processing device is optionally connected to an intranet.
  • the digital processing device is optionally connected to a data storage device.
  • suitable digital processing devices include, by way of non-limiting examples, commercial server computers and desktop computers known to those of skill in the art. Suitable digital processing devices also include devices custom-built using hardware and techniques known to those of skill in the art.
  • the additional data comprises positional entropy data, gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • the additional data comprises positional entropy data (e.g., Shannon entropy).
  • the additional data comprises gene location data (e.g., coding region data).
  • the additional data comprises distance data (e.g., distance from other target sequences). In certain embodiments, the additional data comprises proximity to one or more PAM sequences. In certain embodiments, the additional data comprises homology data (e.g., homology to a human genome sequence where reduced or no homology is favored). In certain embodiments, the additional data comprises target specificity and selectivity data (e.g., Azimuth 2.0). In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value (e.g., a user-defined value).
  • the plurality of target nucleic acid sequences is generated by applying a sliding window (e.g., of about 18, 20, 22, 25, 27, 30, etc.) across the template nucleic acid sequence, thereby generating the plurality of target nucleic acid sequences.
  • the plurality of target nucleic acid sequences are user supplied and need not be generated.
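By way of non-limiting illustration, generating the plurality of target nucleic acid sequences with a sliding window can be sketched as follows. The function name, the step of one nucleobase, and the linear (non-circular) treatment of the template are assumptions of this sketch.

```python
def sliding_window_targets(template: str, window: int = 20, step: int = 1):
    """Return (start, subsequence) for every full-length window across the template."""
    return [
        (i, template[i:i + window])
        for i in range(0, len(template) - window + 1, step)
    ]


targets = sliding_window_targets("ACGTACGTAC", window=4)
```

A template shorter than the window yields an empty list. For a circular genome, one option is to append the first `window - 1` nucleobases to the end of the template before windowing.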
  • the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more programmable nucleases (e.g., the gene editing systems described herein).
  • the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems.
  • the first target nucleic acid sequence and the second target nucleic acid sequence comprise a PAM sequence or are adjacent to a PAM sequence.
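By way of non-limiting illustration, identifying target sequences adjacent to a PAM can be sketched as follows. The sketch assumes an SpCas9-style NGG PAM located immediately 3’ of a 20-nucleobase protospacer, with the cut placed 3 nucleobases 5’ of the PAM. It scans the top strand only, and the function and key names are hypothetical. Other nucleases use different PAMs and cut positions.

```python
import re


def find_ngg_targets(template: str, spacer_len: int = 20):
    """Find top-strand protospacers followed immediately by an NGG PAM."""
    seq = template.upper()
    hits = []
    # The lookahead lets overlapping PAM candidates be reported.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = m.start()
        start = pam_start - spacer_len
        if start < 0:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        hits.append({
            "protospacer": seq[start:pam_start],
            "pam": seq[pam_start:pam_start + 3],
            "cut_site": pam_start - 3,  # cut between index cut_site - 1 and cut_site
        })
    return hits


hits = find_ngg_targets("A" * 20 + "TGG" + "CC")
```

Scanning the reverse complement in the same way would cover targets on the bottom strand.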
  • the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides.
  • the first target nucleic acid sequence is located within a first gene of the template polynucleotide molecule and the second target nucleic acid sequence is located within a second gene of the template polynucleotide molecule.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are located within two or more genes of the template polynucleotide molecule.
  • the first target nucleic acid sequence is located within a first protein coding region of the template polynucleotide molecule and the second target nucleic acid sequence is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are located within two or more protein coding regions of the template polynucleotide molecule.
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first target nucleic acid sequence or cut site associated with the first target nucleic acid sequence and the second target nucleic acid sequence or cut site associated with the second target nucleic acid sequence. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first target nucleic acid sequence or cut site associated with the first target nucleic acid sequence and the second target nucleic acid sequence or cut site associated with the second target nucleic acid sequence.
  • cutting within or proximal to the first target nucleic acid sequence and cutting within or proximal to the second target nucleic acid sequence results in a deletion (e.g., excising) between the first target nucleic acid sequence and the second target nucleic acid sequence.
  • cutting within or proximal to the first target nucleic acid sequence and cutting within or proximal to the second target nucleic acid sequence results in (i) microhomology-mediated end joining (MMEJ) of a region within or proximal to the first target nucleic acid sequence and a region within or proximal to the second target nucleic acid sequence, and/or (ii) a deletion in the template polynucleotide molecule.
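By way of non-limiting illustration, the joined product expected after two cuts and MMEJ can be sketched as follows. The sketch assumes blunt cuts, a microhomology of `k` identical top-strand nucleobases ending at the first cut and starting at the second cut, and retention of exactly one copy of the microhomology in the product. Actual repair outcomes in cells are heterogeneous, so this represents one idealized outcome only.

```python
def mmej_product(template: str, cut1: int, cut2: int, k: int) -> str:
    """Join the two outer ends, keeping a single copy of a k-base microhomology.

    A cut position c means a cut between index c - 1 and index c.
    With k = 0 the function returns the plain blunt-end join.
    """
    if not (0 <= cut1 - k and cut1 <= cut2 and cut2 + k <= len(template)):
        raise ValueError("cut positions or k fall outside the template")
    if template[cut1 - k:cut1] != template[cut2:cut2 + k]:
        raise ValueError("flanks do not share a microhomology of length k")
    # The copy 5' of cut1 is kept; the copy 3' of cut2 is consumed by annealing.
    return template[:cut1] + template[cut2 + k:]


template = "GG" + "ACGT" + "T" * 8 + "ACGT" + "CC"
product = mmej_product(template, cut1=6, cut2=14, k=4)
deleted = len(template) - len(product)
```

The deleted length is the excised segment plus one copy of the microhomology, here 8 + 4 nucleobases.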
  • the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the deletion removes one or more genes within the template polynucleotide molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
  • the template polynucleotide molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
  • the digital processing device includes an operating system configured to perform executable instructions.
  • the operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications.
  • suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®.
  • suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®.
  • the operating system is provided by cloud computing.
  • the device includes a storage and/or memory device.
  • the storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis.
  • the device is volatile memory and requires power to maintain stored information.
  • the device is non-volatile memory and retains stored information when the digital processing device is not powered.
  • the memory may comprise non-volatile memory, such as flash memory, ferroelectric random-access memory (FRAM), or phase-change random access memory (PRAM), or volatile memory, such as dynamic random-access memory (DRAM), or the like.
  • the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing-based storage, and the like.
  • the storage and/or memory device is a combination of devices such as those disclosed herein.
  • the digital processing device optionally includes a display to send visual information to a user.
  • Suitable displays include liquid crystal displays (LCD), thin film transistor liquid crystal displays (TFT-LCD), organic light emitting diode (OLED) displays (including passive-matrix OLED (PMOLED) and active-matrix OLED (AMOLED) displays), plasma displays, video projectors, and head-mounted displays (such as a VR headset) in communication with the digital processing device.
  • Suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like.
  • the display is a combination of devices such as those disclosed herein.
  • the digital processing device optionally includes one or more input devices to receive information from a user.
  • Suitable input devices include keyboards, pointing devices (including, by way of non-limiting examples, a mouse, a trackball, a track pad, a joystick, a game controller, and a stylus), touch screens or multi-touch screens, microphones to capture voice or other sound input, and video cameras or other sensors to capture motion or visual input.
  • the input device is a Kinect, Leap Motion, or the like.
  • the input device is a combination of devices such as those disclosed herein.
  • an exemplary digital processing device 401 is programmed or otherwise configured to assemble short-read DNA sequences into fully phased complete genomic sequences.
  • the device 401 can regulate various aspects of the sequence assembly methods of the present disclosure, such as, for example, performing initial alignments, quality checking, performing subsequent alignments, resolving ambiguity, and phasing heterozygous loci.
  • the digital processing device 401 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 405, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the digital processing device 401 also includes memory or memory location 410 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 415 (e.g., hard disk), communication interface 420 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 425, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 410, storage unit 415, interface 420 and peripheral devices 425 are in communication with the CPU 405 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 415 can be a data storage unit (or data repository) for storing data.
  • the digital processing device 401 can be operatively coupled to a computer network (“network”) 430 with the aid of the communication interface 420.
  • the network 430 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 430 in some embodiments is a telecommunication and/or data network.
  • the network 430 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 430, in some embodiments with the aid of the device 401, can implement a peer-to-peer network, which may enable devices coupled to the device 401 to behave as a client or a server.
  • the CPU 405 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 410.
  • the instructions can be directed to the CPU 405, which can subsequently program or otherwise configure the CPU 405 to implement methods of the present disclosure. Examples of operations performed by the CPU 405 can include fetch, decode, execute, and write back.
  • the CPU 405 can be part of a circuit, such as an integrated circuit. One or more other components of the device 401 can be included in the circuit.
  • the circuit is an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the storage unit 415 can store files, such as drivers, libraries and saved programs.
  • the storage unit 415 can store user data, e.g., user preferences and user programs.
  • the digital processing device 401 in some embodiments can include one or more additional data storage units that are external, such as located on a remote server that is in communication through an intranet or the Internet.
  • the digital processing device 401 can communicate with one or more remote computer systems through the network 430.
  • the device 401 can communicate with a remote computer system of a user.
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the digital processing device 401, such as, for example, on the memory 410 or electronic storage unit 415.
  • the machine executable or machine-readable code can be provided in the form of software.
  • the code can be executed by the processor 405.
  • the code can be retrieved from the storage unit 415 and stored on the memory 410 for ready access by the processor 405.
  • the electronic storage unit 415 can be precluded, and machine-executable instructions are stored on memory 410.
  • the methods, systems, and media disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device.
  • a computer readable storage medium is a tangible component of a digital processing device.
  • a computer readable storage medium is optionally removable from a digital processing device.
  • a computer readable storage medium includes, by way of nonlimiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like.
  • the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
  • the methods, systems, and media disclosed herein include at least one computer program, or use of the same.
  • a computer program includes a sequence of instructions, executable in the digital processing device's CPU, written to perform a specified task.
  • Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types.
  • a computer program may be written in various versions of various languages.
  • a computer program comprises one sequence of instructions. In other embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various implementations, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
Standalone Application
  • a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in.
  • standalone applications are often compiled.
  • a compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB.NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program.
  • a computer program includes one or more executable compiled applications.
  • a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof.
  • a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof.
  • the one or more software modules comprise a web application, a mobile application, and a standalone application.
  • software modules are in one computer program or application.
  • software modules are in more than one computer program or application.
  • software modules are hosted on one machine.
  • software modules are hosted on more than one machine.
  • software modules are hosted on one or more cloud computing platforms and/or services.
  • software modules are hosted on one or more machines in one location.
  • software modules are hosted on one or more machines in more than one location.
Databases
  • the methods, systems, and media disclosed herein include one or more databases, or use of the same.
  • databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases.
  • Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase.
  • a database is internet-based.
  • a database is web-based.
  • a database is cloud computing-based.
  • a database is based on one or more local computer storage devices.
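As an illustration of how positional data for cut sites might be held in a relational database, the following is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and both helper functions are hypothetical and are not taken from the disclosure.

```python
import sqlite3


def build_cut_site_db(records):
    """Create an in-memory database of (position, upstream, downstream) rows.

    The schema is a hypothetical layout chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cut_site ("
        "position INTEGER PRIMARY KEY, "
        "upstream TEXT NOT NULL, "
        "downstream TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO cut_site VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def sites_with_upstream_suffix(conn, motif):
    """Return positions whose upstream flank ends with `motif`."""
    cur = conn.execute(
        "SELECT position FROM cut_site WHERE upstream LIKE ? ORDER BY position",
        ("%" + motif,),
    )
    return [row[0] for row in cur]


records = [
    (10, "ACACAGGCCG", "TTTTTTTTTT"),
    (60, "TTTTTTTTTT", "GGCCGTGTGA"),
]
conn = build_cut_site_db(records)
print(sites_with_upstream_suffix(conn, "GGCCG"))
```

An in-memory database keeps the sketch self-contained; a file path or a server-based database could be substituted without changing the queries.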
  • a method of excising a nucleic acid molecule from a template nucleic acid molecule comprising:
  • identifying a first cleavable region and a second cleavable region having microhomology by: (i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and (ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
  • positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases that are 5’ and/or 3’) to the cut sites;
  • generating or providing a template nucleic acid sequence e.g., a viral genome sequence
  • positional data for a cut site of the plurality of cut sites using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut site;
  • positional data for a first cut site and a second cut site within a template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site within the template and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the first cut site and the second cut site;
  • the microhomology data identifies a degree of microhomology between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site
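The generation of microhomology data from positional data can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: microhomology is treated as the longest identical run of nucleobases shared by the sequence retained 5’ of the first cut site and the sequence retained 3’ of the second cut site, and the function names, the 10-nucleobase window, and the example template are all hypothetical.

```python
from itertools import combinations


def longest_shared(a, b):
    """Return the longest substring of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best match found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break
    return best


def microhomology_table(template, cut_sites, width=10, min_len=2):
    """Rank pairs of cut sites by the microhomology of their retained ends."""
    rows = []
    for c1, c2 in combinations(sorted(cut_sites), 2):
        upstream = template[max(0, c1 - width):c1]  # retained 5' of first cut
        downstream = template[c2:c2 + width]        # retained 3' of second cut
        shared = longest_shared(upstream, downstream)
        if len(shared) >= min_len:
            rows.append((c1, c2, shared))
    # Longest microhomology first.
    return sorted(rows, key=lambda row: len(row[2]), reverse=True)


# Hypothetical 70-nucleobase template with a GGCCG repeat flanking a T-rich region.
template = "ACACAGGCCG" + "T" * 50 + "GGCCGTGTGA"
print(microhomology_table(template, [10, 35, 60]))
```

Comparing only the retained ends is a simplifying design choice; a fuller treatment could also score sequences internal to each cleavable region.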
  • the method further comprises identifying a first target sequence within or adjacent to the first cut site and a second target sequence within or adjacent to the second cut site.
  • the nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, at least 15 nucleobase positions, at least 20 nucleobase positions, or at least 30 nucleobase positions.
  • the microhomology data is a function of:
  • nucleobase complementarity at a 5’ edge e.g., the at least two nucleobase positions at a 5’ terminus
  • nucleobases proximal to the first cut site and nucleobases proximal to the second cut site e.g., the at least two nucleobase positions at a 5’ terminus
  • the first cut site and the second cut site are cleavable by one or more programmable nucleases.
  • the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof.
  • the template polynucleotide is in a cell.
  • the template polynucleotide is a genome.
  • the genome is a viral genome.
  • the method further comprises identifying a first target nucleic acid sequence comprising or adjacent to the first cut site and a second target nucleic acid sequence comprising or adjacent to the second cut site.
  • the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using the additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value.
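One way to turn additional data into a thresholded score is sketched below. The Shannon entropy calculation is standard; the scoring formula, the weights, and the threshold are hypothetical choices made only for illustration and are not specified in the disclosure. Each alignment column is assumed to be a string holding the nucleobase observed at one position across several aligned sequences.

```python
from collections import Counter
from math import log2


def shannon_entropy(column):
    """Shannon entropy (in bits) of the nucleobases in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def target_score(columns, microhomology_len, w_conservation=1.0, w_mh=0.1):
    """Hypothetical score: reward conserved positions and longer microhomology."""
    mean_entropy = sum(shannon_entropy(c) for c in columns) / len(columns)
    conservation = 1.0 - mean_entropy / 2.0  # 2 bits is the maximum for A, C, G, T
    return w_conservation * conservation + w_mh * microhomology_len


def passes(columns, microhomology_len, threshold=1.2):
    """Keep a candidate only if its score is above the (assumed) threshold."""
    return target_score(columns, microhomology_len) > threshold


conserved = ["AAAA", "CCCC", "GGGG"]
variable = ["ACGT", "ACGT", "ACGT"]
print(passes(conserved, 5), passes(variable, 5))
```

Other inputs named in the text, such as distance data or homology to a human genome sequence, could be added as further weighted terms in the same way.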
  • the first cut site and the second cut site are cleavable by one or more programmable nucleases.
  • the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems.
  • the first cut site and the second cut site are adjacent to a PAM sequence.
  • the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides.
  • the first cut site is located within a first gene of the template polynucleotide molecule and the second cut site is located within a second gene of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more genes of the template polynucleotide molecule. In some embodiments, the first cut site is located within a first protein coding region of the template polynucleotide molecule and the second cut site is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more protein coding regions of the template polynucleotide molecule.
  • the first cut site and the second cut site are identical or substantially identical (e.g., greater than 75% sequence identity).
  • the microhomology is located at the terminus (e.g., the 3’ end) of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
  • the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
  • cutting at the first cut site and cutting at the second cut site results in a deletion (e.g., excision) between the first cut site and the second cut site.
  • cutting at the first cut site and cutting at the second cut site results in (i) microhomology-mediated end joining (MMEJ) of the first cut site and the second cut site, and/or (ii) a deletion in the template polynucleotide molecule.
  • the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
  • the deletion removes one or more genes within the template polynucleotide molecule.
  • the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule.
  • the deletion is a full deletion of a gene or a partial deletion of a gene.
  • the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
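The deletion produced by cutting at two sites and joining the ends through microhomology can be illustrated with a simplified model. The model is an assumption for illustration: it only considers microhomology that sits directly at the retained ends (immediately 5’ of the first cut and immediately 3’ of the second cut) and keeps a single copy of it in the product. Function names and the example template are hypothetical.

```python
def terminal_microhomology(template, cut1, cut2, max_len=20):
    """Length of the longest identical run ending at cut1 and starting at cut2."""
    limit = min(max_len, cut1, len(template) - cut2)
    for k in range(limit, 0, -1):
        if template[cut1 - k:cut1] == template[cut2:cut2 + k]:
            return k
    return 0


def mmej_product(template, cut1, cut2):
    """Return (joined sequence, deleted length) for cuts at cut1 < cut2."""
    k = terminal_microhomology(template, cut1, cut2)
    # Keep the copy 5' of the first cut and drop the copy 3' of the second cut.
    product = template[:cut1] + template[cut2 + k:]
    return product, len(template) - len(product)


template = "ACACAGGCCG" + "T" * 50 + "GGCCGTGTGA"
product, deleted = mmej_product(template, 10, 60)
print(product, deleted)
```

In this model the deleted length equals the distance between the cut sites plus the length of the microhomology, which is how the result can be compared against the deletion sizes listed above.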
  • the template polynucleotide molecule is in a cell.
  • the cell is in an individual.
  • the individual is a human.
  • the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
  • Exemplary embodiment 1 A composition, comprising:
  • a first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence;
  • the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
  • Exemplary embodiment 2 The composition of embodiment 1, wherein the first target site and the second target site are different.
  • Exemplary embodiment 3 The composition of embodiment 1 or 2, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
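The length and GC-content condition on the microhomology can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical, and the defaults simply mirror the "three or more" nucleotides and "greater than or equal to 50%" values in the text.

```python
def gc_fraction(seq):
    """Fraction of nucleobases in `seq` that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def meets_gc_criterion(microhomology, min_len=3, min_gc=0.5):
    """True if the microhomology is long enough and sufficiently GC-rich."""
    return len(microhomology) >= min_len and gc_fraction(microhomology) >= min_gc


for candidate in ("GGCCG", "ATGC", "ATAT", "GC"):
    print(candidate, meets_gc_criterion(candidate))
```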
  • Exemplary embodiment 4 The composition of any one of embodiments 1 to 3, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • Exemplary embodiment 5 The composition of any one of embodiments 1 to 3, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • Exemplary embodiment 6 The composition of any one of embodiments 1 to 5, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • Exemplary embodiment 7 The composition of any one of embodiments 1 to 8, wherein the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • Exemplary embodiment 8 The composition of any one of embodiments 1 to 7, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • Exemplary embodiment 9 The composition of any one of embodiments 1 to 8, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
  • Exemplary embodiment 10 The composition of any one of embodiments 1 to 9, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
  • Exemplary embodiment 11 The composition of any one of embodiments 1 to 10, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • Exemplary embodiment 12 The composition of any one of embodiments 1 to 11, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 13 The composition of any one of embodiments 1 to 11, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 14 The composition of any one of embodiments 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 15 The composition of any one of embodiments 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 16 The composition of any one of embodiments 14 to 15, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 17 The composition of any one of embodiments 14 to 16, wherein the deletion removes one or more genes within the template nucleic acid molecule, wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
  • Exemplary embodiment 18 The composition of any one of embodiments 14 to 17, wherein the deletion comprises an inversion.
  • Exemplary embodiment 19 The composition of any one of embodiments 1 to 18, wherein the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 20 The composition of any one of embodiments 1 to 19, wherein the template nucleic acid molecule is in a cell.
  • Exemplary embodiment 21 The composition of embodiment 20, wherein the cell is in an individual.
  • Exemplary embodiment 22 The composition of embodiment 21, wherein the individual is a human.
  • Exemplary embodiment 23 The composition of any one of embodiments 1 to 24, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • Exemplary embodiment 24 The composition of any one of embodiments 1 to 24, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • Exemplary embodiment 25 A composition, comprising: (a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
  • a second CRISPR-associated endonuclease system comprising (i) a second guide ribonucleic acid (gRNA) comprising a second spacer sequence that hybridizes to a second target site on the template nucleic acid molecule, and (ii) a second Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein:
  • the second CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second target site and generates a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
  • Exemplary embodiment 26 The composition of embodiment 25, wherein the first gRNA and the second gRNA are different.
  • Exemplary embodiment 27 The composition of embodiment 25 or 26, wherein the microhomology comprises three or more complementary nucleotides having a GC (guanine or cytosine) content greater than or equal to 50%.
  • Exemplary embodiment 28 The composition of any one of embodiments 25 to 27, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • Exemplary embodiment 29 The composition of any one of embodiments 25 to 28, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
  • Exemplary embodiment 30 The composition of any one of embodiments 25 to 29, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • Exemplary embodiment 31 The composition of any one of embodiments 25 to 30, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • Exemplary embodiment 32 The composition of any one of embodiments 25 to 31, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
  • Exemplary embodiment 33 The composition of any one of embodiments 25 to 32, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
  • Exemplary embodiment 34 The composition of any one of embodiments 25 to 33, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • Exemplary embodiment 35 The composition of any one of embodiments 25 to 34, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 36 The composition of any one of embodiments 25 to 34, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 37 The composition of any one of embodiments 25 to 36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 38 The composition of any one of embodiments 25-36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 39 The composition of any one of embodiments 37 or 38, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 40 The composition of any one of embodiments 37 to 39, wherein the deletion removes one or more genes within the template nucleic acid molecule, and wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
  • Exemplary embodiment 41 The composition of any one of embodiments 37 to 40, wherein the deletion comprises an inversion.
  • Exemplary embodiment 42 The composition of any one of embodiments 37 to 41, wherein the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 43 The composition of any one of embodiments 1 to 42, wherein the template nucleic acid molecule is in a cell.
  • Exemplary embodiment 44 The composition of embodiment 43, wherein the cell is in an individual.
  • Exemplary embodiment 45 The composition of embodiment 44, wherein the individual is a human.
  • Exemplary embodiment 46 The composition of any one of embodiments 1 to 45, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • Exemplary embodiment 47 The composition of any one of embodiments 1 to 46, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • Exemplary embodiment 48 Use of the composition of any one of embodiments 1 to 47 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
  • Exemplary embodiment 49 A nucleic acid vector encoding one or more components of the first gene editing system and/or the second gene editing system of any one of embodiments 1 to 47.
  • Exemplary embodiment 50 Use of the nucleic acid vector of embodiment 49 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
  • Exemplary embodiment 51 A nucleic acid vector encoding one or more components of the first CRISPR-Cas system and/or the second CRISPR-Cas system of any one of embodiments 25 to 47.
  • Exemplary embodiment 52 A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
  • Exemplary embodiment 53 The method of embodiment 52, wherein (a) comprises contacting the template nucleic acid with the first gene editing system of any one of embodiments 1-24 or the first CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the template nucleic acid molecule, and wherein (b) comprises contacting the template nucleic acid with the second gene editing system of any one of embodiments 1-24 or the second CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the template nucleic acid molecule.
  • Exemplary embodiment 54 The method of any one of embodiments 52 to 53, wherein the first target site and the second target site are different.
  • Exemplary embodiment 55 The method of any one of embodiments 52 to 54, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • Exemplary embodiment 56 The method of any one of embodiments 52 to 55, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • Exemplary embodiment 57 The method of any one of embodiments 52 to 55, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
  • Exemplary embodiment 58 The method of any one of embodiments 52 to 57, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • Exemplary embodiment 59 The method of any one of embodiments 52 to 58, wherein the first gene editing system and the second gene editing system is selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
  • Exemplary embodiment 60 The method of any one of embodiments 52 to 59, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
  • Exemplary embodiment 61 The method of any one of embodiments 52 to 60, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
  • Exemplary embodiment 62 The method of any one of embodiments 52 to 61, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
  • Exemplary embodiment 63 The method of any one of embodiments 52 to 62, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
  • Exemplary embodiment 64 The method of any one of embodiments 52 to 63, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 65 The method of any one of embodiments 52 to 64, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
  • Exemplary embodiment 66 The method of any one of embodiments 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 67 The method of any one of embodiments 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
  • Exemplary embodiment 68 The method of any one of embodiments 66 to 67, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 69 The method of any one of embodiments 66 to 68, wherein the deletion removes one or more genes within the template nucleic acid molecule.
  • Exemplary embodiment 70 The method of any one of embodiments 66 to 69, wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
  • Exemplary embodiment 71 The method of any one of embodiments 52 to 70, wherein the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
  • Exemplary embodiment 72 The method of any one of embodiments 52 to 71, wherein the template nucleic acid molecule is in a cell.
  • Exemplary embodiment 73 The method of embodiment 72, wherein the cell is in an individual.
  • Exemplary embodiment 74 The method of embodiment 73, wherein the individual is a human.
  • Exemplary embodiment 75 The method of any one of embodiments 52 to 74, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • Exemplary embodiment 76 The method of any one of embodiments 52 to 75, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • Exemplary embodiment 77 A method of inactivating a virus, comprising:
  • Exemplary embodiment 78 The method of embodiment 77, wherein (a) comprises contacting the viral nucleic acid with the first gene editing system of any one of embodiments 1-24 or the first CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the viral nucleic acid molecule, and wherein (b) comprises contacting the viral nucleic acid with the second gene editing system of any one of embodiments 1-24 or the second CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the viral nucleic acid molecule.
  • Exemplary embodiment 79 The method of any one of embodiments 77 to 78, wherein the first target site and the second target site are different.
  • Exemplary embodiment 80 The method of any one of embodiments 77 to 79, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • Exemplary embodiment 81 The method of any one of embodiments 77 to 80, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
  • Exemplary embodiment 82 The method of any one of embodiments 77 to 80, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
  • Exemplary embodiment 83 The method of any one of embodiments 77 to 82, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
  • Exemplary embodiment 84 The method of any one of embodiments 77 to 83, wherein the viral nucleic acid molecule is in a cell.
  • Exemplary embodiment 85 The method of embodiment 84, wherein the cell is in an individual.
  • Exemplary embodiment 86 The method of embodiment 85, wherein the individual is a human.
  • Exemplary embodiment 87 The method of any one of embodiments 77 to 86, wherein the viral nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
  • Exemplary embodiment 88 The method of any one of embodiments 77 to 86, wherein the viral nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
  • Exemplary embodiment 89 A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
  • Exemplary embodiment 90 A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
  • a template nucleic acid sequence e.g., a viral genome sequence
  • positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
  • Exemplary embodiment 91 The computer-implemented method of any one of embodiments 89-90, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
  • Exemplary embodiment 92 The computer-implemented method of any one of embodiments 89-91, wherein a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
  • Exemplary embodiment 93 The computer-implemented method of any one of embodiments 89-92, wherein the microhomology data is a function of:
  • orientation and/or strand location (e.g., for identifying inversion outcomes);
  • Exemplary embodiment 94 The computer-implemented method of any one of embodiments 89-93, wherein the template nucleic acid sequence comprises a consensus sequence, and wherein the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
  • Exemplary embodiment 95 The computer-implemented method of any one of embodiments 89-94, wherein the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
  • Exemplary embodiment 96 The computer-implemented method of any one of embodiments 89-95, wherein the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
  • Exemplary embodiment 97 The computer-implemented method of any one of embodiments 89-95, wherein the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
  • Exemplary embodiment 98 The computer-implemented method of any one of embodiments 89-97, further comprising: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location (e.g., coding region data) data for a cut site and/or nucleobase positions proximal to the cut site, a distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut
  • Exemplary embodiment 99 The computer-implemented method of any one of embodiments 58-94, wherein the method further comprises identifying a first target site sequence comprising or adjacent to the first cut site and a second target site sequence comprising or adjacent to the second cut site.
  • Exemplary embodiment 100 The computer-implemented method of embodiment 99, further comprising: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location (e.g., coding region data) data, a distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
  • Exemplary embodiment 101 A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
  • the determination of percent identity or percent similarity between two sequences can be accomplished using a mathematical algorithm.
  • a non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264- 2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873- 5877.
  • Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403-410.
  • PSI-Blast can be used to perform an iterated search which detects distant relationships between molecules.
  • sequence alignment may be carried out using the CLUSTAL algorithm (e.g., as provided in the program Clustal-omega), as described by Higgins et al., 1996, Methods Enzymol. 266:383-402.
  • the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “include” and “includes”) or “containing” (and any form of containing, such as “contain” and “contains”), are inclusive or open-ended and do not exclude additional, unrecited elements or process steps.
  • the term “about” in the context of a given value or range includes and/or refers to a value or range that is within 20%, within 10%, and/or within 5% of the given value or range.
  • the term “and/or” is to be taken as specific disclosure of each of the two specified features or components with or without the other.
  • a and/or B is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each were set out individually herein.
  • a sample includes and/or refers to any fluid or liquid sample which is being analyzed in order to detect and/or quantify an analyte.
  • a sample is a biological sample.
  • samples include without limitation a bodily fluid, an extract, a solution containing proteins and/or DNA, a cell extract, a cell lysate, or a tissue lysate.
  • bodily fluids include urine, saliva, blood, serum, plasma, cerebrospinal fluid, tears, semen, sweat, pleural effusion, liquified fecal matter, and lacrimal gland secretion.
  • MMEJ-mediated excision of template nucleic acids e.g., viral sequences
  • MMEJ-mediated deletions are considered to be limited to indels at single cut sites having smaller distances (e.g., <15 nucleotides) between microhomologous sequences.
  • MMEJ prediction algorithms generally reduce MMEJ predictions as a function of the distance between microhomologous sequences (e.g., reducing predicted MMEJ frequencies as the distance between microhomologous sequences increases).
  • MMEJ-mediated excision is achieved over large distances (e.g., >100 base pairs, >1,000 base pairs, etc.) separating a first and a second cleaved region.
  • compositions and methods are useful for excising viral nucleic acids.
  • Viral nucleic acids can present an increased challenge in targeting because they can generally be present in copy number in a cell, and can be integrated or episomal.
  • using microhomology scoring to model the competing repair possibilities, including individual cut site repair, inversions and/or excision modelling can allow choice of target sites that maximize the desired outcome in viral excision.
  • choosing target sites e.g., cleavable regions or guide RNAs
  • the examples include a representative number of various nucleases (e.g., different Cas enzymes) and target sequences (e.g., differing sets of genes targeted and/or guides used).
  • the data shows that a higher MMEJ ranking (e.g., increased microhomology) is associated with increased frequencies of MMEJ-mediated excision between distant cleaved regions (e.g., upwards of 4,500 base pairs in some cases).
  • Microhomology can be determined by various known methods, such as Microhomology-Predictor (Bae, S., Kweon, J., Kim, H. et al. Microhomology-based choice of Cas9 nuclease target sites.
  • microhomology are a function of the number of complementary base pairs (e.g., microhomology length ≥ 3 contiguous base pairs) and GC content (e.g., ≥ 50% GC content).
  • the first and the second cleaved region have reduced, little, or no internal microhomology (i.e., within the first cleaved region or within the second cleaved region) when each cleaved region is independently considered (e.g., sequences 5’ and 3’ of the first cut site within the first cleaved region have reduced or no microhomology).
  • HEK293FT cells with a stable reporter construct containing regions of interest in the HSV genome were treated with CasX2 nuclease and two guide RNAs, ICP0_g9 and ICP27_g9. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks.
  • the CasX2 nuclease and the guide RNAs were expressed from two expression plasmids with each plasmid containing a single copy of CasX2 nuclease and a singular guide RNA.
  • the plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
  • FIG. 4A shows a graphical representation of the cut sites found in HSV.
  • FIG. 4B shows the microhomology (MH)/MMEJ rank of each excision and the associated frequency of excision as measured by the number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g9 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g9 cut site (e.g., within the ICP0 cleaved region). Excision of ~3,600 base pairs was observed.
  • FIG. 4B aligns pairs of sequences with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined - in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
  • the top six and eighth scoring predicted microhomology mediated excisions were found in the sequencing results.
  • the top scoring site (Rank 1, exemplary MH Score 335) was found in the highest number of reads (2,811).
  • the second top predicted MMEJ score (Rank 2, exemplary MH Score 335) was found in the second highest number of reads (776).
  • FIG. 4C shows a plot of the sequencing reads detecting each of the different microhomology mediated viral excisions plotted against the increasing MMEJ score.
  • the data shows that a higher MMEJ ranking (e.g., increased microhomology) is associated with increased frequencies of MMEJ-mediated excision between distant cleaved regions (e.g., upwards of 4,500 base pairs in some cases).
  • the DNA deletion predicted with the highest score also had the highest number of reads, indicating that it occurred at the highest rate.
  • MMEJ ranking and scoring can be used to analyze multiple different candidate target sites for excision (e.g., excision of 50, 100, 500, 1,000 base pairs, and/or greater).
  • HEK293FT cells with a stable reporter construct containing regions of interest in the HSV genome were treated with CasX2 nuclease and two guide RNAs, ICP0_g6 and ICP27_g9. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks.
  • the CasX2 nuclease and the guide RNAs were expressed from two expression plasmids with each plasmid containing a single copy of CasX2 nuclease and a singular guide RNA.
  • the plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
  • FIG. 5 shows the MMEJ rank of each excision and the associated frequency as measured by the number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g6 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g9 cut site (e.g., within the ICP0 cleaved region). Excision of ~4,500 base pairs was observed.
  • FIG. 5 aligns pairs of sequences with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined - in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
  • the top two scoring predicted microhomology mediated excisions were found in the sequencing results.
  • the top scoring site (Rank 1, exemplary MH Score 233) was found in the highest number of reads (3,988).
  • the second top predicted MMEJ score (Rank 2, exemplary MH Score 184) was found in the second highest number of reads (2,611).
  • HEK293FT cells with a stable reporter construct containing regions of interest in the HSV genome were treated with SluCas9 nuclease and two guide RNAs, ICP0_g3 and ICP27_g4. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks.
  • the SluCas9 nuclease and the guide RNAs were expressed from two expression plasmids with each plasmid containing a single copy of SluCas9 nuclease and a singular guide RNA.
  • the plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
  • FIG. 6 shows the microhomology (MH)/MMEJ rank of each excision and the associated frequency as measured by the number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g4 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g3 cut site (e.g., within the ICP0 cleaved region). Excision of ~3,400 base pairs was observed.
  • FIG. 6 aligns pairs of sequences with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined - in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
  • the top two scoring predicted microhomology mediated excisions were found in the sequencing results.
  • the top scoring site (Rank 1, exemplary MH Score 303.5) was found in the highest number of reads (3,964).
  • the second top predicted MMEJ score (Rank 2, exemplary MH Score 133) was found in the second highest number of reads (408).
  • the CpeCas9 nuclease and the guide RNAs were expressed from two expression plasmids with each plasmid containing a single copy of CpeCas9 nuclease and a singular guide RNA.
  • the plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
  • FIG. 7A shows the microhomology (MH)/MMEJ rank of each excision and the associated frequency as measured by the number of sequencing reads.
  • the excision example on top went from one microhomology region identified near the ICP0_g16 cut site (e.g., within the ICP0 cleaved region) to a predicted matching microhomology region near the ICP27_g20 cut site (e.g., within the ICP27 cleaved region). Excision of ~3,559 base pairs was observed.
  • FIG. 7A aligns pairs of sequences with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results.
  • Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined - in the bottom sequence, one microhomology region and the intervening sequences have been excised.
  • the top two scoring predicted microhomology mediated excisions were found in the sequencing results.
  • the top scoring site (Rank 1, exemplary MH Score 444.9) was found in the highest number of reads.
  • the second top predicted MMEJ score (Rank 2, exemplary MH Score 298.2) was found in the second highest number of reads.
  • FIG. 7B shows MMEJ can be used to predict and generate MMEJ-mediated inversions.
  • the inversion scoring site had an exemplary MH score of 461 and resulted in an excision and then inversion of ~4548 base pairs.
  • HEK293FT cells carrying partial HBV genomic sequences of interest on a stably integrated reporter construct were transfected with CpeCas9 nuclease-encoding mRNA and a series of two guide RNAs. Once expressed, the fully complexed ribonucleoproteins generate double-stranded breaks at the two distinct target sites. Seven days after the transfection, genomic DNA was extracted from the harvested cells and the DNA sequences spanning the target sites were separately amplified by PCR with flanking primers for Sanger sequencing and indel determination.
  • FIG. 8 shows the MMEJ scoring output after cutting the HBV sequences. The figure aligns a pair of sequences with the bottom sequence showing the MMEJ-based excision deduced from the indel determination shown below the sequence before editing.
  • U1 cells with two integrated copies of HIV were treated with SaCas9 nuclease protein and two guide RNAs.
  • the SaCas9 protein was combined with the guide RNAs and introduced into the U1 cells.
  • Single cell clones were isolated and cultured, genomic DNA harvested and amplified with flanking primers for NGS to determine the editing events in the cells.
  • the top two scoring microhomology mediated excisions were identified in the sequencing results.
  • the top scoring site (Rank 1, having an exemplary MH score of 666) was found in 3964 reads.
  • a lower predicted scoring site (Rank 7, having an exemplary MH score of 120) was found in 2 reads.
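The consensus template sequence and positional entropy data mentioned in embodiments 94, 97, and 98 above can be illustrated with a short sketch. It assumes the input sequences are already aligned to equal length and uses Shannon entropy in bits; the function names `positional_entropy` and `consensus` are hypothetical and are not taken from the disclosure.

```python
import math
from collections import Counter


def positional_entropy(aligned_seqs):
    """Return the Shannon entropy (in bits) of each alignment column.

    Assumes every sequence has already been aligned to the same length.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for column in zip(*aligned_seqs):
        total = len(column)
        counts = Counter(column)
        # H = sum(p * log2(1/p)) over the symbols observed in this column.
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


def consensus(aligned_seqs):
    """Most frequent symbol per column; ties go to the symbol seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))
```

A column in which every input carries the same base has an entropy of zero, so low-entropy positions would be the conserved positions across the aligned inputs.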

Abstract

The present disclosure relates to compositions and methods for inactivating viral sequences or gene editing other sequences using microhomology-mediated end joining DNA repair.

Description

COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR TARGETING
MICROHOMOLOGY-MEDIATED EXCISION
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Patent Application No. 63/329,274 filed April 8, 2022, which is herein incorporated by reference in its entirety for all purposes.
BACKGROUND
[0002] Gene editing systems have the potential to treat an array of diseases; however, one of the major challenges facing their implementation is improving the rate of safe, high-fidelity, and efficient editing. The development of safe and efficient gene editing systems and methods is important for the broad implementation of genome-editing technologies in the treatment of disease. The challenges of developing safe and effective delivery of genome-editing technologies differ from those facing classical gene replacement therapy, which requires long-term transgene expression and does not involve base or genome editing. The traditional goal of genome-editing technologies has focused on the delivery of one or a limited number of doses of programmable nucleases, in a transient manner, with the goal of achieving sufficiently effective editing efficiencies to yield clinical benefits. Due to potential limitations with current gene editing technologies and methods, sufficiently effective editing efficiencies may not yield clinical benefits or such clinical benefits may be hard to achieve.
SUMMARY
[0003] Provided and described herein are programmable nuclease systems for MMEJ-mediated excision of template nucleic acids (e.g., viral sequences) by cutting at least two target sites, wherein the cutting generates cleaved regions having microhomology. Generally, MMEJ-mediated deletions are considered to be limited to indels at single cut sites having smaller distances (e.g., <15 nucleotides) between microhomologous sequences. Moreover, MMEJ prediction algorithms generally reduce MMEJ predictions as a function of the distance between microhomologous sequences (e.g., reducing predicted MMEJ frequencies as the distance between microhomologous sequences increases). However, the compositions and methods described herein are advantageous for achieving MMEJ-mediated excision between two cut sites and/or over large distances (e.g., >100 base pairs, >1,000 base pairs, etc.) separating a first and a second cleaved region.
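To make the distance argument concrete, the following sketch assumes a hypothetical scoring model in which a base microhomology score is multiplied by an exponential decay in the distance between the microhomologous sequences. The function name, the decay constant of 20 base pairs, and the base score of 100 are illustrative assumptions only; they are not values from this disclosure or from any particular published predictor.

```python
import math


def distance_penalized_score(base_score, distance_bp, decay_bp=20.0):
    """Hypothetical model: weight a microhomology score by exp(-distance / decay)."""
    return base_score * math.exp(-distance_bp / decay_bp)


# Compare a short-range indel distance with excision-scale distances.
for distance in (5, 15, 100, 1000, 3600):
    print(distance, distance_penalized_score(100.0, distance))
```

Under a penalty of this general shape, the predicted score at kilobase separations is vanishingly small, which is consistent with the statement above that such algorithms would not favor excision between distant cleaved regions.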
[0004] In certain instances, methods described herein are further advantageous for modelling and/or predicting a range of repair options with multiple cuts (e.g., MMEJ-mediated excision, MMEJ-mediated excision using inversions, etc.). For example, the described methods better allow for the selection of target sites predicted to better provide the desired outcome, such as improved MMEJ-mediated excision efficiencies and/or MMEJ-mediated inversions.
[0005] The compositions and methods described herein are additionally useful for excising viral nucleic acid molecules (e.g., to inactivate a virus). In certain embodiments, using microhomology scoring to model the competing repair possibilities, including individual cut site repair, inversions, and/or excision modelling, can allow choice of target sites that maximize the desired outcome in viral excision. In certain instances, choosing target sites (e.g., cleavable regions or guide RNAs) based on identifying and/or characterizing high-scoring microhomologies can provide excision or inversion levels that are substantially higher than expected from non-homology repair.
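One simple way to compare candidate target-site pairs against competing repair outcomes is sketched below. It assumes that a microhomology score has already been computed for each modelled outcome (excision, inversion, and repair at each cut site alone) and ranks pairs by the share of the total score attributed to excision. The `PairScores` structure, the ratio used for ranking, and all names and numbers are hypothetical illustrations, not the scoring method of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PairScores:
    """Hypothetical per-pair microhomology scores for each modelled repair outcome."""
    name: str
    excision: float     # joining of the two distal cleaved regions
    inversion: float    # rejoining with the intervening segment inverted
    single_site: float  # combined score for repair at each cut site alone


def excision_preference(pair):
    """Fraction of the total modelled score attributed to excision."""
    total = pair.excision + pair.inversion + pair.single_site
    return pair.excision / total if total > 0 else 0.0


def rank_pairs(pairs):
    """Order candidate pairs from most to least excision-favoring."""
    return sorted(pairs, key=excision_preference, reverse=True)


# Illustrative values only.
candidates = [
    PairScores("pair_A", excision=300.0, inversion=50.0, single_site=150.0),
    PairScores("pair_B", excision=300.0, inversion=400.0, single_site=100.0),
]
best = rank_pairs(candidates)[0]
```

In this toy comparison both pairs have the same excision score, but `pair_A` ranks first because less of its total score is assigned to inversion and single-site repair.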
[0006] Provided herein are compositions, comprising:
(a) a first gene editing system, wherein: the first gene editing system is configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second gene editing system, wherein: the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0007] In some embodiments, the first target site and the second target site are different. In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence. In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0008] In some embodiments, the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
[0009] In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides, and the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides, and the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides, and the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
[0010] In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0011] In some embodiments, the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
[0012] In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule. In some embodiments, the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs). In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the deletion comprises an inversion. In some embodiments, the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0013] In some embodiments, the template nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0014] Provided herein are compositions, comprising:
(a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second CRISPR-associated endonuclease system comprising (i) a second guide ribonucleic acid (gRNA) comprising a second spacer sequence that hybridizes to a second target site on the template nucleic acid molecule, and (ii) a second Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the second CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second target site and generates a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0015] In some embodiments, the first gRNA and the second gRNA are different. In some embodiments, the microhomology comprises three or more complementary nucleotides having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
[0016] In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides. In some embodiments, the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
[0017] In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule. In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0018] In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
[0019] In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule. In some embodiments, the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs). In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the deletion comprises an inversion. In some embodiments, the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0020] In some embodiments, the template nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
[0021] Provided herein are methods of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0022] In some embodiments, the first target site and the second target site are different. In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence. In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0023] In some embodiments, the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
[0024] In some embodiments, the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
[0025] In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0026] In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule. In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule. In some embodiments, the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs). In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the deletion comprises an inversion.
[0027] In some embodiments, the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0028] In some embodiments, the template nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0029] Provided herein are methods of inactivating a virus, comprising:
(a) cleaving a viral nucleic acid molecule at a first target site on the viral nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the viral nucleic acid molecule at a second target site on the viral nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0030] In some embodiments, the first target site and the second target site are different. In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence. In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0031] In some embodiments, the viral nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the viral nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the viral nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0032] Provided herein are computer-implemented methods of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(b) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
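Steps (a) and (b) above can be sketched in Python as follows. This is a minimal illustration and not the claimed implementation: it assumes that microhomology can be approximated as a contiguous run of identical bases (read on one strand) shared by the windows around two cut sites, and the function names, the 10-base flank, and the 3-base minimum are illustrative choices.

```python
from itertools import combinations


def longest_shared_run(a: str, b: str) -> str:
    """Return the longest contiguous substring present in both a and b."""
    best = ""
    # prev[j] holds the length of the common suffix ending at a[i-2], b[j-1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


def find_microhomologous_pairs(template, cut_sites, flank=10, min_len=3):
    """Pair cut sites whose flanking windows share a run of >= min_len bases."""
    # Positional data: the window of `flank` bases on each side of a cut site.
    windows = {c: template[max(0, c - flank):c + flank] for c in cut_sites}
    hits = []
    for c1, c2 in combinations(sorted(cut_sites), 2):
        run = longest_shared_run(windows[c1], windows[c2])
        if len(run) >= min_len:
            hits.append((c1, c2, run))
    return hits
```

Calling `find_microhomologous_pairs(template, cut_sites)` returns tuples of the form `(first_cut, second_cut, shared_run)`, which a downstream step could rank or filter.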
[0033] Provided herein are computer-implemented methods of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence);
(b) identifying or providing, by one or more computers, a plurality of cleavable regions comprising cut sites within the template nucleic acid sequence;
(c) generating, by one or more computers, positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
(d) generating, by one or more computers, microhomology data for the plurality of cleavable regions using the positional data, and identifying a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
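As a hedged illustration of steps (b) and (c), the sketch below enumerates candidate cut sites and records positional data for each. It assumes, purely for illustration, a nuclease with a 3' NGG PAM that makes a blunt cut 3 bases 5' of the PAM on the forward strand; nucleases with other PAMs or cut offsets would need different constants. The function name, record fields, and default lengths are hypothetical.

```python
import re


def candidate_cut_sites(template: str, flank: int = 10, spacer_len: int = 20):
    """List forward-strand cut sites implied by NGG PAMs (assumed nuclease)."""
    records = []
    # A lookahead is used so overlapping PAMs (e.g. within "GGG") are all found.
    for match in re.finditer(r"(?=[ACGT]GG)", template):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM for a full spacer
        cut = pam_start - 3  # assumed blunt cut 3 bases 5' of the PAM
        if cut + flank > len(template):
            continue  # skip sites without a full 3' window
        records.append({
            "cut_site": cut,
            "pam_start": pam_start,
            "region": template[cut - flank:cut + flank],
        })
    return records
```

Each record carries the location of the cleavable region, the cut site, and the nucleobase sequence within the region, which corresponds to the three kinds of positional data listed in step (c).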
[0034] In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
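The criterion in this paragraph can be checked with a few lines of Python. The sketch below is an illustration only; the function names and default thresholds are assumptions that mirror the "three or more" and "greater than or equal to 50%" language above.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def meets_microhomology_criterion(run: str, min_len: int = 3,
                                  min_gc: float = 0.5) -> bool:
    """True if a contiguous shared run is long enough and GC-rich enough."""
    return len(run) >= min_len and gc_fraction(run) >= min_gc
```

For example, the run `GCA` passes (three bases, two of which are G or C), while `ATA` fails on GC content and `GC` fails on length.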
[0035] In some embodiments, a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
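A cleavable region of this kind can be extracted from a template sequence as sketched below. The function name is hypothetical, the cut site is assumed to be a 0-based index at which the template is split, and "about 10 base pairs" is modelled as an adjustable `flank` parameter.

```python
def cleavable_region(template: str, cut_site: int, flank: int = 10):
    """Return the (5' flank, 3' flank) sequences around a cut site.

    The cut is taken to fall between template[cut_site - 1] and
    template[cut_site]; flanks are truncated at the template ends.
    """
    five_prime = template[max(0, cut_site - flank):cut_site]
    three_prime = template[cut_site:cut_site + flank]
    return five_prime, three_prime
```

Keeping the two flanks separate makes it straightforward to compare, for instance, the 5' flank of one cut with the 3' flank of another.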
[0036] In some embodiments, the microhomology data is a function of:
(i) total nucleobase complementarity of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(ii) the length (e.g., number of contiguous nucleobases) of complementary nucleobases of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iii) GC content of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iv) orientation and/or strand location (e.g., for identifying inversion outcomes);
(v) base content of complementary nucleobases between nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region; or
(vi) a combination of (i)-(v).
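One way factors (ii) through (v) might be combined is sketched below. This is an assumption-laden illustration rather than the scoring used in this disclosure: the longest shared run stands in for length, each G or C is weighted 2 and each A or T is weighted 1 (hypothetical weights), and orientation is handled by also comparing the first region against the reverse complement of the second, which may flag candidate inversion outcomes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> str:
    """Longest contiguous substring of a that also occurs in b."""
    for size in range(min(len(a), len(b)), 0, -1):
        for start in range(len(a) - size + 1):
            if a[start:start + size] in b:
                return a[start:start + size]
    return ""


def microhomology_score(region1: str, region2: str,
                        gc_weight: float = 2.0, at_weight: float = 1.0):
    """Score direct and inverted microhomology between two regions."""
    def weight(run: str) -> float:
        return sum(gc_weight if base in "GC" else at_weight for base in run)

    direct = longest_shared_run(region1, region2)
    inverted = longest_shared_run(region1, reverse_complement(region2))
    return {
        "direct_run": direct,
        "direct_score": weight(direct),
        "inverted_run": inverted,
        "inverted_score": weight(inverted),
    }
```

Comparing `direct_score` with `inverted_score` for a pair of regions gives a rough indication of whether excision or inversion is favoured under these assumed weights.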
[0037] In some embodiments, the template nucleic acid sequence comprises a consensus sequence, and wherein the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
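A consensus sequence of this kind can be sketched as a per-column majority vote. The example assumes the input sequences have already been aligned to equal length by a separate alignment tool, with `-` marking gaps; the function name and the tie-handling (first most common base wins) are illustrative choices.

```python
from collections import Counter


def consensus_sequence(aligned):
    """Majority-vote consensus over pre-aligned, equal-length sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("input sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # Ignore gap characters when counting bases in this column.
        counts = Counter(base for base in column if base != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)
```

With the inputs `ACGT`, `ACCA`, and `TCGA`, the majority base at each column yields `ACGA`, a sequence that matches none of the three inputs exactly.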
[0038] In some embodiments, the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
[0039] In some embodiments, the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
[0040] In some embodiments, the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
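Positional entropy can be illustrated with Shannon entropy computed over each column of a set of aligned input sequences. The sketch below assumes pre-aligned, equal-length sequences and treats every character, including any gap symbol, as a distinct state; the function name is hypothetical.

```python
import math
from collections import Counter


def positional_entropy(aligned):
    """Shannon entropy (in bits) of the bases observed at each position."""
    entropies = []
    for column in zip(*aligned):
        total = len(column)
        frequencies = [count / total for count in Counter(column).values()]
        # Summing the negated terms keeps fully conserved columns at 0.0.
        entropies.append(sum(-p * math.log2(p) for p in frequencies))
    return entropies
```

A fully conserved position scores 0 bits and a position split evenly among all four bases scores 2 bits, so low-entropy positions indicate sequence that is conserved across the input genomes.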
[0041] In some embodiments, the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
[0042] In some embodiments, the method further comprises identifying a first target site sequence comprising or adjacent to the first cut site and a second target site sequence comprising or adjacent to the second cut site.
[0043] In some embodiments, the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
[0044] Provided herein are methods of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) identifying a first cleavable region and a second cleavable region having microhomology by: (i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and (ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
(b) cleaving the template nucleic acid molecule at the first cleavable region on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(c) cleaving the template nucleic acid molecule at the second cleavable region on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise the microhomology.
BRIEF DESCRIPTION OF THE DRAWINGS
[0045] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0046] FIG. 1A, FIG. 1B, and FIG. 1C show a schematic representation of generating cleaved regions having microhomology.
[0047] FIG. 2A and FIG. 2B show a schematic representation of microhomologous sequences on or within cleaved regions.
[0048] FIG. 3A and FIG. 3B show a schematic representation of microhomologous sequences on or within cleaved regions.
[0049] FIG. 4A, FIG. 4B, and FIG. 4C show representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CasX and encompassing ~3,500 base pairs.
[0050] FIG. 5 shows representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CasX and encompassing ~4,500 base pairs.
[0051] FIG. 6 shows representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using SluCas and encompassing ~3,400 base pairs.
[0052] FIG. 7A and FIG. 7B show representations and data of two-cut MMEJ-mediated excision of an HSV nucleic acid using CpeCas and encompassing ~3,500-4,500 base pairs.
[0053] FIG. 8 shows representations and data of single MMEJ-mediated deletion at a single cleaved region of an HBV nucleic acid using CpeCas.
[0054] FIG. 9 shows an exemplary flowchart of a method for selecting a guide RNA.
[0055] FIG. 10 schematically illustrates a computer system that is programmed or otherwise configured to implement methods provided herein.
DETAILED DESCRIPTION
Gene Editing Systems
[0056] A gene editing system generally refers to and includes a system comprising one or more nucleic acid-modifying enzymes capable of binding a nucleic acid molecule (e.g., a template nucleic acid molecule). As described herein, gene editing systems are generally used for modifying the nucleic acid of a target gene and/or for modulating the expression of a target gene (e.g., as measured by mRNA expression, protein expression, or protein function). Furthermore, in general, the one or more nucleic acid-binding domains or components are associated with the one or more nucleic acid-modifying enzymes or components, such that the one or more nucleic acid-binding domains target the one or more nucleic acid-modifying enzymes or components to a specific nucleic acid site (e.g., a specific sequence). Furthermore, the gene editing systems described herein are useful for targeting two or more sites on a template nucleic acid (e.g., a viral genome) to excise a portion (e.g., full or partial) through microhomology-mediated repair pathways (e.g., MMEJ). Gene editing systems generally include, but are not limited to, zinc finger nucleases, transcription activator-like effector nucleases (TALENs), clustered regularly interspaced short palindromic repeats (CRISPR)/Cas systems, meganuclease systems, and recombinase-based systems.
[0057] In some embodiments, the gene editing systems described herein are useful for targeting at least a first site (e.g., region) on a template nucleic acid and a second site on the template nucleic acid. Target or target site (e.g., as used in target region, or target gene) generally refers to and includes a specific nucleic acid (e.g., a sequence, gene, position, region etc.) on which a protein or enzyme is intended to bind or act. In some embodiments, the gene editing system comprises a nucleic acid encoding a gene editing system configured to target multiple target sites (e.g., a first target site and a second target site) on a template nucleic acid molecule. In certain instances, gene editing systems targeting a first and a second target site (e.g., binding and enzymatically generating a first and a second double stranded break) provide advantages when combined with targeting regions having microhomology, in that generating a first and a second double stranded break comprising microhomology promotes efficient excision (e.g., microhomology-mediated excision) of the region between the first site and second site, whereas a single cut may be readily repaired by host cell machinery.
[0058] In some embodiments, the gene editing system targets a first target site and a second target site on a template nucleic acid. In certain instances, targeting a first target site and a second target site on a template nucleic acid provides the benefit of promoting excision of the region between the first site and second site, wherein this effect is achieved by a gene editing system targeting the first and second target site. In such instances, the gene editing system can be a CRISPR-Cas system (e.g., comprising a first gRNA and second gRNA, or a single gRNA having a target site repeated within a template sequence), a meganuclease system (e.g., comprising a first meganuclease and a second meganuclease, or a single meganuclease having a target site repeated within a template sequence), a TALEN system (e.g., comprising a first TALEN and a second TALEN, or a single TALEN having a target site repeated within a template sequence), or a zinc finger nuclease system (e.g., comprising a first ZFN and a second ZFN, or a single ZFN having a target site repeated within a template sequence). In some embodiments, a vector comprising a nucleic acid encoding a gene editing system that is configured to target (e.g., cut) multiple sites (identical sites or different sites) within a template nucleic acid is useful for excision of a region between the multiple sites. In such instances and embodiments, microhomology-mediated excision provides advantages over targeting a single site, targeting sites without homology, or introducing a single cut in a template sequence.
CRISPR-Cas systems
[0059] In some embodiments, the gene editing system is a CRISPR-Cas system. CRISPR system refers to and includes elements involved in the expression of or directing the activity of a CRISPR-associated (Cas) endonuclease, including guide RNA sequences and components thereof, such as a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a spacer sequence (also referred to as a guide sequence), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by such elements that promote the formation of a CRISPR complex at the site of a target sequence. In the context of formation of a CRISPR complex, a target sequence refers to a sequence to which a spacer sequence is designed to hybridize, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
[0060] In some embodiments, the gene editing system is a CRISPR-Cas system. CRISPR system refers to and includes elements involved in the expression of or directing the activity of a CRISPR-associated (“Cas”) endonuclease, including guide RNA sequences and components thereof, such as a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a spacer sequence (also referred to as a guide sequence), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by such elements that promote the formation of a CRISPR complex that is directed to the target sequence or sequences. In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a spacer sequence is designed to hybridize, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
Meganuclease gene editing nucleases
[0061] In some embodiments, the gene editing system is a meganuclease system. Meganuclease generally refers to and includes an endonuclease that binds double-stranded DNA at a recognition sequence that is greater than about 12 to about 40 base pairs. A meganuclease can be an endonuclease that is derived from I-CreI, and can refer to an engineered variant of I-CreI that has been modified relative to natural I-CreI with respect to, for example, DNA-binding specificity, DNA cleavage activity, DNA-binding affinity, or dimerization properties. Methods for producing such modified variants of I-CreI are known. In certain instances, a meganuclease as used herein binds to double-stranded DNA as a heterodimer or as a single-chain meganuclease in which a pair of DNA-binding domains are joined into a single polypeptide using a peptide linker. A single-chain meganuclease refers to and includes a polypeptide comprising a pair of meganuclease subunits joined by a linker. A single-chain meganuclease has the organization: N-terminal subunit - Linker - C-terminal subunit. In certain instances, the two meganuclease subunits are generally non-identical in amino acid sequence and recognize non-identical DNA. In embodiments described herein, unless otherwise specified, meganucleases can refer to a dimeric or single-chain meganuclease.
[0062] In certain instances, meganucleases can be divided into families based on sequence and structure motifs: LAGLIDADG, GIY-YIG, HNH, His-Cys box and PD-(D/E)XK. A range of other meganuclease systems are possible, e.g., such as those derived from naturally occurring enzymes or through rational design. In certain instances, crystal structures illustrate the mode of sequence specificity and cleavage mechanism for the meganucleases (e.g., LAGLIDADG meganucleases) where (i) specificity contacts arise from the burial of extended β-strands into the major groove of the DNA, with the DNA binding elements having a pitch and contour mimicking the helical twist of the DNA; (ii) cleavage, which generates 4-nt 3'-OH overhangs, occurs across the minor groove, wherein the scissile phosphate bonds are brought closer to the protein catalytic core by a distortion of the DNA in the central “4-base” region; (iii) cleavage occurs via a proposed two-metal mechanism; and (iv) finally, additional affinity and/or specificity contacts can arise from adapted scaffolds or modifications in regions outside the core α/β fold (see, e.g., Silva et al., 2011, Meganucleases and other tools for targeted genome engineering, Curr Gene Ther 11(1):11-27).
TALEN Gene Editing Systems
[0063] In some embodiments, the gene editing system is a transcription activator-like effector nuclease (TALEN) system. TALENs generally refer to and include a polypeptide comprising a transcription activator-like effector domain (TALE) for DNA binding and a FokI nuclease domain. In certain instances, TALENs can be rapidly designed and assembled with flexible targeting sequences with potentially high potency and specificity. In general, the TALE domain has a central DNA-binding domain composed of 13-28 repeat monomers of 33-34 amino acids. The amino acids of each monomer are highly conserved, except for hypervariable amino acid residues at positions 12 and 13. The two variable amino acids are called repeat-variable diresidues (RVDs). The amino acid pairs NI, NG, HD, and NN of RVDs preferentially recognize adenine, thymine, cytosine, and guanine/adenine, respectively, and a series of RVDs can recognize consecutive DNA bases. Other natural, selected or rationally designed RVDs can be used, including NK and NH to recognize guanine. In certain instances, the relationship between amino acid sequence and DNA recognition allows for the engineering of specific DNA binding domains by selecting a combination of repeat segments containing the appropriate RVDs. In some embodiments, the transcription activator-like effector (TALE) DNA binding domain can be fused to a functional domain, such as a recombinase, a nuclease, a transposase or a helicase, thereby conferring sequence specificity to the functional domain.
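The RVD-to-base relationship described above lends itself to a simple lookup. The following is a minimal sketch, not a design tool: the function names `design_rvds` and `rvds_recognize` are hypothetical, only the RVD pairings named in the text are encoded, and repeat-count limits and binding-context effects are ignored.

```python
# Illustrative lookup built only from the RVD pairings named in the text.
# NN is used for guanine by default, although the text notes it also
# recognizes adenine; NK and NH are offered as guanine alternatives.
BASE_TO_RVD = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}

RVD_TO_BASES = {
    "NI": {"A"},
    "NG": {"T"},
    "HD": {"C"},
    "NN": {"G", "A"},
    "NK": {"G"},
    "NH": {"G"},
}


def design_rvds(target, guanine_rvd="NN"):
    """Return one RVD per base of `target` (hypothetical helper)."""
    if guanine_rvd not in ("NN", "NK", "NH"):
        raise ValueError(f"unsupported guanine RVD: {guanine_rvd!r}")
    rvds = []
    for base in target.upper():
        if base == "G":
            rvds.append(guanine_rvd)
        elif base in BASE_TO_RVD:
            rvds.append(BASE_TO_RVD[base])
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return rvds


def rvds_recognize(rvds, dna):
    """True if every RVD's preferred bases include the matching DNA base."""
    dna = dna.upper()
    return len(rvds) == len(dna) and all(
        base in RVD_TO_BASES[rvd] for rvd, base in zip(rvds, dna)
    )
```

For example, `design_rvds("ATCG")` yields the repeats NI, NG, HD, NN in order, and passing `guanine_rvd="NH"` swaps in a guanine-specific RVD.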
[0064] In some embodiments, the gene editing system is a zinc finger nuclease (ZFN) system. Zinc finger nucleases generally refer to and include a chimeric polypeptide molecule comprising at least one zinc finger DNA binding domain, and generally three to five domains, effectively linked to at least one nuclease capable of cleaving DNA. Furthermore, zinc finger nucleases are generally capable of directing targeted genetic recombination or targeted mutation in a host cell by causing a double-stranded break at a target locus. In certain instances, zinc finger nucleases include a DNA-binding domain and a DNA-cleavage domain, wherein the DNA binding domain is comprised of at least one zinc finger and is operatively linked to a DNA-cleavage domain. In such instances, the zinc finger DNA-binding domain is at the N-terminus of the chimeric protein molecule and the DNA-cleavage domain is located at the C-terminus of the molecule. In many embodiments, cleavage of the intervening sequence requires dimerization of two zinc finger nuclease domains.
Gene editing systems targeting regions having microhomology
[0065] Provided herein are targetable gene editing systems (e.g., such as those described herein) that specifically target and enzymatically act (e.g., cleave) on a template nucleic acid sequence (e.g., a viral genome) at (i) a first target sequence, generating a first cleaved region, and (ii) a second target sequence, generating a second cleaved region, the second cleaved region comprising microhomology or a region (e.g., a sequence within the cleaved region) of microhomology to the first cleaved region.
Microhomology
[0066] Microhomology-mediated end joining (MMEJ), as used herein, generally refers to and includes a repair mechanism for double stranded breaks (DSBs) in a template nucleic acid molecule (e.g., within a genome), which relies on exposed microhomologous sequences (i.e., sequences having microhomology) flanking the broken junction to fix DSBs in a Ku- and ligase IV-independent manner. MMEJ generally involves five steps for repairing a double stranded break: resection of the DSB ends (generally 5’ to 3’ resection), annealing of regions/sequences having microhomology, removal of heterologous flaps, fill-in synthesis (i.e., polymerase extension), and ligation. Additional pathways for repair of the cleaved or cleavable regions described herein
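The net outcome of the five MMEJ steps can be approximated on a single strand as a string operation. The sketch below is an idealized model under stated assumptions: the two microhomology copies are treated as identical stretches on the same strand (which expose complementary single strands after resection), one copy is retained in the product, and resection extent, flap processing, and fill-in are not modeled. The function name `mmej_product` is hypothetical.

```python
def mmej_product(seq, left_start, right_start, length):
    """Idealized MMEJ outcome for one strand written 5' to 3'.

    `left_start` and `right_start` are 0-based positions of two identical
    microhomology copies of `length` nucleotides. The product keeps the
    left copy and drops everything up to and including the right copy.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if left_start < 0 or right_start + length > len(seq):
        raise ValueError("microhomology coordinates fall outside the sequence")
    if left_start + length > right_start:
        raise ValueError("microhomology copies must not overlap")
    left = seq[left_start:left_start + length]
    right = seq[right_start:right_start + length]
    if left.upper() != right.upper():
        raise ValueError("the two stretches are not identical")
    # Annealed copies collapse into one; the intervening region is excised.
    return seq[:left_start + length] + seq[right_start + length:]


# Toy template: flank, GCC, intervening region, GCC, flank.
template = "AAAGCCTTTTTTGCCAAA"
product = mmej_product(template, left_start=3, right_start=12, length=3)
```

In this toy case the product is shorter than the template by `right_start - left_start` nucleotides, which corresponds to the intervening region plus one microhomology copy.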
[0067] In some embodiments, microhomology can be determined by various known methods, such as Microhomology-Predictor (Bae, S., Kweon, J., Kim, H. et al. Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 11, 705-706 (2014)), MENTHU (Robust Activation of Microhomology-mediated End Joining for Precision Gene Editing Applications. Ata H, Ekstrom TL, Martinez-Galvez G, Mann CM, Dvornikov AV, Schaefbauer KJ, Ma AC, Dobbs D, Clark KJ, Ekker SC. PLOS Genetics 14(9): e1007652), inDelphi (Max W. Shen, Mandana Arbab, Jonathan Y. Hsu, Daniel Worstell, Sannie J. Culbertson, Olga Krabbe, Christopher A. Cassa, David R. Liu, David K. Gifford, and Richard I. Sherwood. "Predictable and precise template-free editing of pathogenic variants." Nature, 2018), FORCAST (Elrick H, Nelakuditi V, Clark G, Brudno M, Ramani AK, Nutter LM. FORCAST: a fully integrated and open source pipeline to design Cas-mediated mutagenesis experiments), Lindel, and MENdel (Gabriel Martinez-Galvez, Parnal Joshi, Iddo Friedberg, Armando Manduca, Stephen C Ekker, Deploying MMEJ using MENdel in precision gene editing applications for gene therapy and functional genomics, Nucleic Acids Research, Volume 49, Issue 1, 11 January 2021), each of which is herein incorporated by reference for the application of determining and/or identifying microhomology.
[0068] In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. For a template nucleic acid molecule (FIG. 1A - 100; FIG. 1B - 100) having a first sequence (FIG. 1A - 110; FIG. 1B - 110) comprising microhomology to a second sequence (FIG. 1A - 112; FIG. 1B - 112), targetable gene editing systems (e.g., CRISPR-Cas systems, meganuclease systems, TALEN systems, or ZFN systems) can be configured to cut at (i) a first target site (FIG. 1A - 120 (target site) and 130 (cut/cleavage site); FIG. 1B - 120 (target site) and 130 (cut/cleavage site)) located within (FIG. 1A - 120) or proximal (FIG. 1B - 120) to the first sequence, and (ii) a second target site (FIG. 1A - 122 (target site) and 132 (cut/cleavage site); FIG. 1B - 122 (target site) and 132 (cut/cleavage site)) located within (FIG. 1A - 122) or proximal (FIG. 1B - 122) to the second sequence, thereby generating a first cleaved region (FIG. 1A - 140; FIG. 1B - 140) and a second cleaved region (FIG. 1A - 142; FIG. 1B - 142) comprising the sequences having microhomology (FIG. 1A - 110 and 112; FIG. 1B - 110 and 112), wherein microhomology-based repair mechanisms (e.g., MMEJ) facilitate excision through, in certain instances, resection of the DSB ends (generally 5’ to 3’ resection), annealing of regions/sequences having microhomology, removal of heterologous flaps, fill-in synthesis (i.e., polymerase extension), and ligation.
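The length and GC-content criterion stated at the start of this paragraph can be checked directly. The sketch below is illustrative only; the function names and default parameters are assumptions chosen to mirror the "three or more nucleotides" and "greater than or equal to 50%" thresholds in the text.

```python
def gc_fraction(seq):
    """Fraction of G or C nucleotides in a non-empty sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def meets_microhomology_criterion(seq, min_length=3, min_gc=0.5):
    """True if `seq` has at least `min_length` nucleotides and GC >= `min_gc`.

    Defaults mirror the thresholds named in the text; both are parameters
    so that other embodiments (e.g., other lengths) can be explored.
    """
    return len(seq) >= min_length and gc_fraction(seq) >= min_gc
```

A candidate such as `GCA` (two of three nucleotides are G or C) passes, while `ATA` fails on GC content and `GC` fails on length.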
[0069] In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second sequence.
[0070] In some embodiments, sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A or 1B - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A or 1B - 142), wherein the first sequence and the second sequence are complementary (e.g., fully, substantially, or partially) to one another. In certain embodiments, the first sequence and/or the second sequence comprise about 2 nucleotides to about 20 nucleotides. In certain embodiments, the first sequence and/or the second sequence comprise about 2 nucleotides to about 3 nucleotides, about 2 nucleotides to about 4 nucleotides, about 2 nucleotides to about 5 nucleotides, about 2 nucleotides to about 6 nucleotides, about 2 nucleotides to about 7 nucleotides, about 2 nucleotides to about 8 nucleotides, about 2 nucleotides to about 9 nucleotides, about 2 nucleotides to about 10 nucleotides, about 2 nucleotides to about 12 nucleotides, about 2 nucleotides to about 15 nucleotides, about 2 nucleotides to about 20 nucleotides, about 3 nucleotides to about 4 nucleotides, about 3 nucleotides to about 5 nucleotides, about 3 nucleotides to about 6 nucleotides, about 3 nucleotides to about 7 nucleotides, about 3 nucleotides to about 8 nucleotides, about 3 nucleotides to about 9 nucleotides, about 3 nucleotides to about 10 nucleotides, about 3 nucleotides to about 12 nucleotides, about 3 nucleotides to about 15 nucleotides, about 3 nucleotides to about 20 nucleotides, about 4 nucleotides to about 5 nucleotides, about 4 nucleotides to about 6 nucleotides, about 4 nucleotides to about 7 nucleotides, about 4 nucleotides to about 8 nucleotides, about 4 nucleotides to about 9 nucleotides, about 4 nucleotides to about 10 nucleotides, about 4 nucleotides to about 12 nucleotides, about 4 nucleotides to about 15 nucleotides, about 4 nucleotides to about 20 nucleotides, about 5 nucleotides to about 6 nucleotides, about 5 nucleotides to about 7 nucleotides, about 5 nucleotides to about 8 
nucleotides, about 5 nucleotides to about 9 nucleotides, about 5 nucleotides to about 10 nucleotides, about 5 nucleotides to about 12 nucleotides, about 5 nucleotides to about 15 nucleotides, about 5 nucleotides to about 20 nucleotides, about 6 nucleotides to about 7 nucleotides, about 6 nucleotides to about 8 nucleotides, about 6 nucleotides to about 9 nucleotides, about 6 nucleotides to about 10 nucleotides, about 6 nucleotides to about 12 nucleotides, about 6 nucleotides to about 15 nucleotides, about 6 nucleotides to about 20 nucleotides, about 7 nucleotides to about 8 nucleotides, about 7 nucleotides to about 9 nucleotides, about 7 nucleotides to about 10 nucleotides, about 7 nucleotides to about 12 nucleotides, about 7 nucleotides to about 15 nucleotides, about 7 nucleotides to about 20 nucleotides, about 8 nucleotides to about 9 nucleotides, about 8 nucleotides to about 10 nucleotides, about 8 nucleotides to about 12 nucleotides, about 8 nucleotides to about 15 nucleotides, about 8 nucleotides to about 20 nucleotides, about 9 nucleotides to about 10 nucleotides, about 9 nucleotides to about 12 nucleotides, about 9 nucleotides to about 15 nucleotides, about 9 nucleotides to about 20 nucleotides, about 10 nucleotides to about 12 nucleotides, about 10 nucleotides to about 15 nucleotides, about 10 nucleotides to about 20 nucleotides, about 12 nucleotides to about 15 nucleotides, about 12 nucleotides to about 20 nucleotides, or about 15 nucleotides to about 20 nucleotides. In certain embodiments, first sequence and/or second sequence comprise about 2 nucleotides, about 3 nucleotides, about 4 nucleotides, about 5 nucleotides, about 6 nucleotides, about 7 nucleotides, about 8 nucleotides, about 9 nucleotides, about 10 nucleotides, about 12 nucleotides, about 15 nucleotides, or about 20 nucleotides. 
In certain embodiments, first sequence and/or second sequence comprise at least about 2 nucleotides, about 3 nucleotides, about 4 nucleotides, about 5 nucleotides, about 6 nucleotides, about 7 nucleotides, about 8 nucleotides, about 9 nucleotides, about 10 nucleotides, about 12 nucleotides, or about 15 nucleotides.
[0071] In certain embodiments, first and second sequences having microhomology comprise about 2 complementary nucleotides to about 15 complementary nucleotides. In certain embodiments, first and second sequences having microhomology comprise about 2 complementary nucleotides to about 3 complementary nucleotides, about 2 complementary nucleotides to about 4 complementary nucleotides, about 2 complementary nucleotides to about 5 complementary nucleotides, about 2 complementary nucleotides to about 6 complementary nucleotides, about 2 complementary nucleotides to about 7 complementary nucleotides, about 2 complementary nucleotides to about 8 complementary nucleotides, about 2 complementary nucleotides to about 9 complementary nucleotides, about 2 complementary nucleotides to about 10 complementary nucleotides, about 2 complementary nucleotides to about 12 complementary nucleotides, about 2 complementary nucleotides to about 15 complementary nucleotides, about 3 complementary nucleotides to about 4 complementary nucleotides, about 3 complementary nucleotides to about 5 complementary nucleotides, about 3 complementary nucleotides to about 6 complementary nucleotides, about 3 complementary nucleotides to about 7 complementary nucleotides, about 3 complementary nucleotides to about 8 complementary nucleotides, about 3 complementary nucleotides to about 9 complementary nucleotides, about 3 complementary nucleotides to about 10 complementary nucleotides, about 3 complementary nucleotides to about 12 complementary nucleotides, about 3 complementary nucleotides to about 15 complementary nucleotides, about 4 complementary nucleotides to about 5 complementary nucleotides, about 4 complementary nucleotides to about 6 complementary nucleotides, about 4 complementary nucleotides to about 7 complementary nucleotides, about 4 complementary nucleotides to about 8 complementary nucleotides, about 4 complementary nucleotides to about 9 complementary nucleotides, about 4 complementary 
nucleotides to about 10 complementary nucleotides, about 4 complementary nucleotides to about 12 complementary nucleotides, about 4 complementary nucleotides to about 15 complementary nucleotides, about 5 complementary nucleotides to about 6 complementary nucleotides, about 5 complementary nucleotides to about 7 complementary nucleotides, about 5 complementary nucleotides to about 8 complementary nucleotides, about 5 complementary nucleotides to about 9 complementary nucleotides, about 5 complementary nucleotides to about 10 complementary nucleotides, about 5 complementary nucleotides to about 12 complementary nucleotides, about 5 complementary nucleotides to about 15 complementary nucleotides, about 6 complementary nucleotides to about 7 complementary nucleotides, about 6 complementary nucleotides to about 8 complementary nucleotides, about 6 complementary nucleotides to about 9 complementary nucleotides, about 6 complementary nucleotides to about 10 complementary nucleotides, about 6 complementary nucleotides to about 12 complementary nucleotides, about 6 complementary nucleotides to about 15 complementary nucleotides, about 7 complementary nucleotides to about 8 complementary nucleotides, about 7 complementary nucleotides to about 9 complementary nucleotides, about 7 complementary nucleotides to about 10 complementary nucleotides, about 7 complementary nucleotides to about 12 complementary nucleotides, about 7 complementary nucleotides to about 15 complementary nucleotides, about 8 complementary nucleotides to about 9 complementary nucleotides, about 8 complementary nucleotides to about 10 complementary nucleotides, about 8 complementary nucleotides to about 12 complementary nucleotides, about 8 complementary nucleotides to about 15 complementary nucleotides, about 9 complementary nucleotides to about 10 complementary nucleotides, about 9 complementary nucleotides to about 12 complementary nucleotides, about 9 complementary nucleotides to about 15 complementary 
nucleotides, about 10 complementary nucleotides to about 12 complementary nucleotides, about 10 complementary nucleotides to about 15 complementary nucleotides, or about 12 complementary nucleotides to about 15 complementary nucleotides. In certain embodiments, first and second sequences having microhomology comprise about 2 complementary nucleotides, about 3 complementary nucleotides, about 4 complementary nucleotides, about 5 complementary nucleotides, about 6 complementary nucleotides, about 7 complementary nucleotides, about 8 complementary nucleotides, about 9 complementary nucleotides, about 10 complementary nucleotides, about 12 complementary nucleotides, or about 15 complementary nucleotides. In certain embodiments, first and second sequences having microhomology comprise at least about 2 complementary nucleotides, about 3 complementary nucleotides, about 4 complementary nucleotides, about 5 complementary nucleotides, about 6 complementary nucleotides, about 7 complementary nucleotides, about 8 complementary nucleotides, about 9 complementary nucleotides, about 10 complementary nucleotides, or about 12 complementary nucleotides.
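Candidate microhomologies within a chosen length range can be enumerated by comparing the two cleaved regions. The sketch below is a simple exhaustive search and is not the method of any tool cited in this document; it assumes both regions are given as same-strand sequences written 5’ to 3’, so that identical stretches correspond to stretches that become complementary single strands after resection. The function name and tuple layout are hypothetical.

```python
def shared_microhomologies(first_region, second_region, min_len=2, max_len=15):
    """List (length, sequence, index_in_first, index_in_second) for every
    stretch of min_len..max_len nucleotides present in both regions."""
    first = first_region.upper()
    second = second_region.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        # Index every k-mer of the second region by its start positions.
        positions = {}
        for j in range(len(second) - k + 1):
            positions.setdefault(second[j:j + k], []).append(j)
        # Look up each k-mer of the first region in that index.
        for i in range(len(first) - k + 1):
            kmer = first[i:i + k]
            for j in positions.get(kmer, []):
                hits.append((k, kmer, i, j))
    return hits


# Toy regions sharing the stretch AGCC.
hits = shared_microhomologies("TTAGCC", "AGCCTT")
longest = max(hits, key=lambda hit: hit[0])
```

Shorter hits nested inside a longer one (here `AGC` and `GCC` inside `AGCC`) are reported as well, so a caller would typically rank or filter the list, for example by length or GC content.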
[0072] In some embodiments, sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A or 1B - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A or 1B - 142), wherein the first sequence and the second sequence are capable of annealing to one another (e.g., under physiological conditions). In certain embodiments, first and second sequences having microhomology comprise about 2 nucleotides capable of annealing to about 20 nucleotides capable of annealing. In certain embodiments, first and second sequences having microhomology comprise about 2 nucleotides capable of annealing to about 3 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 4 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 5 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 6 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 2 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 4 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 5 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 6 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 3 
nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 3 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 5 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 6 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 4 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 6 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 5 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 7 nucleotides capable of annealing, about 6 
nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 6 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 8 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 7 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 8 nucleotides capable of annealing to about 9 nucleotides capable of annealing, about 8 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 8 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 8 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 8 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 9 nucleotides capable of annealing to about 10 nucleotides capable of annealing, about 9 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 9 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 9 nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 10 nucleotides capable of annealing to about 12 nucleotides capable of annealing, about 10 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 10 
nucleotides capable of annealing to about 20 nucleotides capable of annealing, about 12 nucleotides capable of annealing to about 15 nucleotides capable of annealing, about 12 nucleotides capable of annealing to about 20 nucleotides capable of annealing, or about 15 nucleotides capable of annealing to about 20 nucleotides capable of annealing. In certain embodiments, first and second sequences having microhomology comprise about 2 nucleotides capable of annealing, about 3 nucleotides capable of annealing, about 4 nucleotides capable of annealing, about 5 nucleotides capable of annealing, about 6 nucleotides capable of annealing, about 7 nucleotides capable of annealing, about 8 nucleotides capable of annealing, about 9 nucleotides capable of annealing, about 10 nucleotides capable of annealing, about 12 nucleotides capable of annealing, about 15 nucleotides capable of annealing, or about 20 nucleotides capable of annealing. In certain embodiments, first and second sequences having microhomology comprise at least about 2 nucleotides capable of annealing, about 3 nucleotides capable of annealing, about 4 nucleotides capable of annealing, about 5 nucleotides capable of annealing, about 6 nucleotides capable of annealing, about 7 nucleotides capable of annealing, about 8 nucleotides capable of annealing, about 9 nucleotides capable of annealing, about 10 nucleotides capable of annealing, about 12 nucleotides capable of annealing, or about 15 nucleotides capable of annealing.
[0073] In some embodiments, sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A - 142), wherein the first sequence and/or the second sequence are located at the terminal end of a first cleaved region and/or second cleaved region, respectively. In some embodiments, the terminal end is the 3’ end of a cleaved region (FIG. 2A - 211 and 213). In some embodiments, sequences having microhomology comprise a first sequence (e.g., within a first cleaved region; FIG. 1A - 140) and a second sequence (e.g., within a second cleaved region; FIG. 1A - 142), wherein the first sequence and/or the second sequence are proximally located (e.g., within 2 nucleobase positions) to the terminal end of a first cleaved region and/or second cleaved region, respectively. In some embodiments, the terminal end is the 3’ end of a cleaved region (FIG. 2B - 211 and 213). In certain embodiments, the first sequence and/or the second sequence are located within 1 to 25 nucleobase positions from the terminal end of a first cleaved region and/or second cleaved region, respectively. In certain embodiments, the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end to about 25 nucleobase positions from the terminal end. 
In certain embodiments, the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end to about 2 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 3 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 4 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 5 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 10 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 15 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 20 nucleobase positions from the terminal end, about 1 nucleobase position from the terminal end to about 25 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 3 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 4 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 5 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 10 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 15 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 2 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end to about 4 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end to about 5 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end to about 10 nucleobase positions from the terminal end, 
about 3 nucleobase positions from the terminal end to about 15 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end to about 5 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end to about 10 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end to about 15 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end to about 10 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end to about 15 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end to about 15 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end to about 20 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end, or about 20 nucleobase positions from the terminal end to about 25 nucleobase positions from the terminal end.
In certain embodiments, the first sequence and/or the second sequence are located within about 1 nucleobase position from the terminal end, about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, about 20 nucleobase positions from the terminal end, or about 25 nucleobase positions from the terminal end. In certain embodiments, the first sequence and/or the second sequence are located within at least about 1 nucleobase position from the terminal end, about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, or about 20 nucleobase positions from the terminal end. In certain embodiments, the first sequence and/or the second sequence are located within at most about 2 nucleobase positions from the terminal end, about 3 nucleobase positions from the terminal end, about 4 nucleobase positions from the terminal end, about 5 nucleobase positions from the terminal end, about 10 nucleobase positions from the terminal end, about 15 nucleobase positions from the terminal end, about 20 nucleobase positions from the terminal end, or about 25 nucleobase positions from the terminal end.
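By way of non-limiting illustration, the positional criterion above can be sketched in Python. The function names, the 5'-to-3' string representation of a cleaved end, and the default offset of 2 positions are assumptions made for this example only and are not taken from the disclosure.

```python
def offset_from_3prime_end(end_seq, motif):
    """Count the bases lying between the last occurrence of `motif`
    and the 3' end of `end_seq` (written 5'->3').
    Returns None when the motif is absent."""
    idx = end_seq.upper().rfind(motif.upper())
    if idx < 0:
        return None
    return len(end_seq) - (idx + len(motif))


def is_terminal_or_proximal(end_seq, motif, max_offset=2):
    """True when the motif sits at the 3' terminal end (offset 0) or
    within `max_offset` nucleobase positions of it."""
    offset = offset_from_3prime_end(end_seq, motif)
    return offset is not None and offset <= max_offset


# Motif flush with the 3' end, then two positions away from it.
print(offset_from_3prime_end("ATATGCGC", "GCGC"))
print(offset_from_3prime_end("ATGCGCTT", "GCGC"))
```

Setting `max_offset` to 25 would correspond to the widest range recited above.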
[0074] In some embodiments, the first and second sequences having microhomology are located in different genes. In some embodiments, the first and second sequences having microhomology are located in coding regions of different genes. In certain embodiments, the first and second sequences having microhomology are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by about 50 base pairs to about 8,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
In certain embodiments, the first and second sequences having microhomology are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the first and second sequences having microhomology are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
[0075] In some embodiments, the first and second sequences having microhomology are located on different template nucleic acids (e.g., an episomal genome and an integrated genome).
Compositions comprising gene editing systems targeting regions having microhomology
[0076] Provided herein are compositions, comprising: (a) a first gene editing system, wherein: the first gene editing system is configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) a second gene editing system, wherein: the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0077] In some embodiments, the first target site and the second target site are different. In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
[0078] In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region. FIGS. 3A-3B show an example of cleaved regions (340 and 342) on a template nucleic acid molecule, wherein a first and a second cleaved region comprise a first (310 and 311) and a second sequence (312 and 313) having microhomology.
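By way of non-limiting illustration, the following Python sketch extracts a cleaved region of about 10 base pairs on each side of a cut site and lists candidate microhomologies of three or more nucleotides having a GC content of at least 50%. All function names and defaults are hypothetical. The sketch also assumes that microhomology can be modeled as an identical substring occurring on the same strand in both regions (so that the resected ends become complementary); that modeling choice is an assumption of this example and not a statement of the disclosed method.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def cleaved_region(template, cut_site, flank=10):
    """Return up to `flank` bases 5' and 3' of a cut made between
    template[cut_site - 1] and template[cut_site]."""
    start = max(0, cut_site - flank)
    end = min(len(template), cut_site + flank)
    return template[start:end]


def shared_microhomologies(region_a, region_b, min_len=3, min_gc=0.5):
    """Substrings of at least `min_len` bases present in both regions
    whose GC fraction is at least `min_gc`."""
    region_a, region_b = region_a.upper(), region_b.upper()
    hits = set()
    for length in range(min_len, min(len(region_a), len(region_b)) + 1):
        kmers_a = {region_a[i:i + length]
                   for i in range(len(region_a) - length + 1)}
        for j in range(len(region_b) - length + 1):
            kmer = region_b[j:j + length]
            if kmer in kmers_a and gc_fraction(kmer) >= min_gc:
                hits.add(kmer)
    return hits


# Two toy regions sharing the GC-rich stretch GCGCA.
hits = shared_microhomologies("TTTTGCGCATTT", "AAAAGCGCAAAA")
print(max(hits, key=len))
```

A fuller tool would likely also scan the reverse complement and rank candidates by length and position; those steps are omitted here for brevity.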
[0079] In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 10 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site. 
In some embodiments, a cleaved or cleavable region comprises at least about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises at most about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
[0080] Also provided herein are compositions, comprising: (a) a first gene editing system, the first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region; and (b) a second gene editing system, the second gene editing system configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, and the second target site comprising microhomology to the first target site.
[0081] Further provided are compositions, comprising: (a) a first gene editing system, the first gene editing system configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region on the template nucleic acid molecule, and (b) a second gene editing system, the second gene editing system configured to enzymatically cleave at a second target site on a template nucleic acid molecule and generate a second cleaved region on the template nucleic acid molecule, the second cleaved region having microhomology to the first cleaved region.
[0082] Any targetable nuclease system can be used to target sites or regions comprising microhomology. In some embodiments, the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
[0083] In some embodiments, the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 5 complementary nucleotides, about 2 complementary nucleotides to about 10 complementary nucleotides, about 2 complementary nucleotides to about 15 complementary nucleotides, about 2 complementary nucleotides to about 20 complementary nucleotides, about 5 complementary nucleotides to about 10 complementary nucleotides, about 5 complementary nucleotides to about 15 complementary nucleotides, about 5 complementary nucleotides to about 20 complementary nucleotides, about 10 complementary nucleotides to about 15 complementary nucleotides, about 10 complementary nucleotides to about 20 complementary nucleotides, or about 15 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, about 15 complementary nucleotides, or about 20 complementary nucleotides. In some embodiments, microhomology comprises at least about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, or about 15 complementary nucleotides. In some embodiments, microhomology comprises more than 20 complementary nucleotides. In some embodiments, microhomology comprises more than 30 complementary nucleotides. In some embodiments, microhomology comprises more than 40 complementary nucleotides.
[0084] In some embodiments, the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more protein coding regions of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are identical or substantially identical (e.g., greater than 75% sequence identity).
[0085] In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
[0086] In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule. In some embodiments, generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
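By way of non-limiting illustration, the expected sequence outcome of such an MMEJ event can be sketched in Python. The sketch assumes a simplified model in which the two microhomology copies anneal and the repaired product retains a single copy, with the intervening sequence and the other copy removed. The function name `mmej_product` and the toy sequence are hypothetical.

```python
def mmej_product(template, motif):
    """Join a template at the first two non-overlapping occurrences of
    `motif`, keeping one copy and dropping everything between them.
    Simplified model: ignores resection limits and repair efficiency."""
    first = template.find(motif)
    if first < 0:
        raise ValueError("motif not found in template")
    second = template.find(motif, first + len(motif))
    if second < 0:
        raise ValueError("motif must occur at least twice")
    return template[:first + len(motif)] + template[second + len(motif):]


template = "AAAGCGCTTTTTTGCGCCCC"
product = mmej_product(template, "GCGC")
print(product)
print(len(template) - len(product), "bases deleted")
```

In this model the deletion length equals the distance between the start positions of the two microhomology copies.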
[0087] In some embodiments, the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the deletion comprises at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the deletion comprises at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
[0088] In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template nucleic acid molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the deletion comprises an inversion. In some embodiments, the first and second gene editing systems excise all or substantially all (e.g., greater than half of the total template nucleic acid) of the template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid to about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid to about 25% of the total template nucleic acid, about 10% of the total template nucleic acid to about 50% of the total template nucleic acid, about 10% of the total template nucleic acid to about 60% of the total template nucleic acid, about 10% of the total template nucleic acid to about 70% of the total template nucleic acid, about 10% of the total template nucleic acid to about 80% of the total template nucleic acid, about 10% of the total template nucleic acid to about 90% of the total template nucleic acid, about 10% of the total template nucleic acid to about 100% of the total template nucleic acid, about 25% of the total template nucleic acid to about 50% of the total template nucleic acid, about 25% of the total template nucleic acid to about 60% of the total template nucleic acid, about 25% of the total template nucleic acid to about 70% of the total template nucleic acid, about 25% of the total template nucleic acid to about 80% of the total template nucleic acid, about 25% of the total template nucleic acid to about 90% of the total template nucleic acid, about 25% of the total template nucleic acid to about 100% 
of the total template nucleic acid, about 50% of the total template nucleic acid to about 60% of the total template nucleic acid, about 50% of the total template nucleic acid to about 70% of the total template nucleic acid, about 50% of the total template nucleic acid to about 80% of the total template nucleic acid, about 50% of the total template nucleic acid to about 90% of the total template nucleic acid, about 50% of the total template nucleic acid to about 100% of the total template nucleic acid, about 60% of the total template nucleic acid to about 70% of the total template nucleic acid, about 60% of the total template nucleic acid to about 80% of the total template nucleic acid, about 60% of the total template nucleic acid to about 90% of the total template nucleic acid, about 60% of the total template nucleic acid to about 100% of the total template nucleic acid, about 70% of the total template nucleic acid to about 80% of the total template nucleic acid, about 70% of the total template nucleic acid to about 90% of the total template nucleic acid, about 70% of the total template nucleic acid to about 100% of the total template nucleic acid, about 80% of the total template nucleic acid to about 90% of the total template nucleic acid, about 80% of the total template nucleic acid to about 100% of the total template nucleic acid, or about 90% of the total template nucleic acid to about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, about 90% of the total template nucleic acid, or about 100% of the total template nucleic acid. 
In certain embodiments, the excised region comprises at least about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, or about 90% of the total template nucleic acid.
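By way of non-limiting illustration, the size of an excised region and its share of the total template can be computed as in the following Python sketch. The function name and the example coordinates are hypothetical, and the sketch assumes a linear template with 0-based cut positions, ignoring any bases gained or lost during repair.

```python
def excision_summary(template_length, first_cut, second_cut):
    """Return (deleted_bp, fraction_of_template) for the segment lying
    between two cut positions on a linear template."""
    if template_length <= 0:
        raise ValueError("template_length must be positive")
    for cut in (first_cut, second_cut):
        if not 0 <= cut <= template_length:
            raise ValueError("cut position outside the template")
    start, end = sorted((first_cut, second_cut))
    deleted = end - start
    return deleted, deleted / template_length


# A hypothetical 10,000 bp template cut at positions 1,000 and 9,000.
deleted_bp, fraction = excision_summary(10_000, 1_000, 9_000)
print(deleted_bp, f"{fraction:.0%}")
```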
[0089] In some embodiments, the first target site and the second target site are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs.
In certain embodiments, the first and second target sites are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
[0090] In some embodiments, the template nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template nucleic acid molecule is a viral genome.
CRISPR-Cas Systems targeting regions having microhomology
[0091] Provided herein are compositions, comprising: (a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) a second CRISPR-associated endonuclease system comprising (i) a second guide ribonucleic acid (gRNA) comprising a second spacer sequence that hybridizes to a second target site on the template nucleic acid molecule, and (ii) a second Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the second CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second target site and generates a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0092] In some embodiments, the first gRNA and the second gRNA are different. In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%. In some embodiments, the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides. In some embodiments, sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology. In some embodiments, microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
[0093] In some embodiments, the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region. FIGS. 3A-3B show an example of cleaved regions (340 and 342) on a template nucleic acid molecule, wherein a first and a second cleaved region comprise a first (310 and 311) and a second sequence (312 and 313) having microhomology.
[0094] In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site to about 10 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 5 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 12 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 15 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site to about 20 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site to about 25 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site. 
In some embodiments, a cleaved or cleavable region comprises at least about 5 base pairs 5' and 3' of a cut site, about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, or about 20 base pairs 5' and 3' of a cut site. In some embodiments, a cleaved or cleavable region comprises at most about 10 base pairs 5' and 3' of a cut site, about 12 base pairs 5' and 3' of a cut site, about 15 base pairs 5' and 3' of a cut site, about 20 base pairs 5' and 3' of a cut site, or about 25 base pairs 5' and 3' of a cut site.
[0095] In some embodiments, the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides to about 5 complementary nucleotides, about 2 complementary nucleotides to about 10 complementary nucleotides, about 2 complementary nucleotides to about 15 complementary nucleotides, about 2 complementary nucleotides to about 20 complementary nucleotides, about 5 complementary nucleotides to about 10 complementary nucleotides, about 5 complementary nucleotides to about 15 complementary nucleotides, about 5 complementary nucleotides to about 20 complementary nucleotides, about 10 complementary nucleotides to about 15 complementary nucleotides, about 10 complementary nucleotides to about 20 complementary nucleotides, or about 15 complementary nucleotides to about 20 complementary nucleotides. In some embodiments, microhomology comprises about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, about 15 complementary nucleotides, or about 20 complementary nucleotides. In some embodiments, microhomology comprises at least about 2 complementary nucleotides, about 5 complementary nucleotides, about 10 complementary nucleotides, or about 15 complementary nucleotides. In some embodiments, microhomology comprises more than 20 complementary nucleotides.
[0096] In some embodiments, the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more genes of the template nucleic acid molecule. In some embodiments, the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are located within two or more protein coding regions of the template nucleic acid molecule. In some embodiments, the first target site and the second target site are identical or substantially identical (e.g., greater than 75% sequence identity). In certain embodiments, matching regions (e.g., complementary regions) can have mismatched (e.g., non-complementary) nucleotides in the middle of and/or within the matching nucleotides, for example, two matching nucleotides, one non-matching nucleotide, followed by two matching nucleotides.
[0097] In some embodiments, the microhomology is located at the terminus (e.g., the 3' end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first target site, the second target site, or both the first target site and the second target site (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a target site).
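As a rough illustration of microhomology located at the termini of two cleaved regions, the sketch below looks for the longest stretch in which the bases immediately 5' of the first cut match the bases immediately 3' of the second cut, and then predicts a junction that retains one copy of that stretch. This is a simplified model assumed for illustration; the function names, the terminal-only search, and the single-strand representation are hypothetical and are not the method of the disclosure. Microhomology that is only proximal to a terminus would require a windowed search instead.

```python
def terminal_microhomology(seq: str, cut1: int, cut2: int, max_len: int = 20) -> str:
    """Longest run (up to max_len) where the bases ending at cut1 equal the
    bases starting at cut2. Cut positions are 0-based indices of the first
    base 3' of each cut (an assumption of this sketch)."""
    if not 0 <= cut1 <= cut2 <= len(seq):
        raise ValueError("expected 0 <= cut1 <= cut2 <= len(seq)")
    best = ""
    limit = min(max_len, cut1, len(seq) - cut2)
    for k in range(1, limit + 1):
        if seq[cut1 - k:cut1] == seq[cut2:cut2 + k]:
            best = seq[cut2:cut2 + k]  # keep the longest match found so far
    return best


def predicted_junction(seq: str, cut1: int, cut2: int, max_len: int = 20) -> str:
    """Join the two outer fragments, keeping a single copy of any terminal
    microhomology. With no microhomology this is a plain end-to-end join."""
    mh = terminal_microhomology(seq, cut1, cut2, max_len)
    return seq[:cut1] + seq[cut2 + len(mh):]


if __name__ == "__main__":
    template = "TTTTGACC" + "A" * 10 + "GACCTTTT"
    print(terminal_microhomology(template, 8, 18))
    print(predicted_junction(template, 8, 18))
```

In the toy template, the four bases `GACC` sit on both sides of the excised stretch, so the predicted junction keeps one copy of them.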
[0098] In some embodiments, generating a first cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first template sequence and generating a second cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second template sequence activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule. In some embodiments, generating a first cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first template sequence and generating a second cleaved region within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second template sequence excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
[0099] In some embodiments, the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs. In certain embodiments, the deletion comprises about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the deletion comprises at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the deletion comprises at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
[0100] In some embodiments, the deletion removes one or more genes within the template nucleic acid molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template nucleic acid molecule. In some embodiments, the first and second gene editing system excises all or substantially all (e.g., greater than half of the total template nucleic acid) of the template nucleic acid.
[0101] In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid to about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid to about 25% of the total template nucleic acid, about 10% of the total template nucleic acid to about 50% of the total template nucleic acid, about 10% of the total template nucleic acid to about 60% of the total template nucleic acid, about 10% of the total template nucleic acid to about 70% of the total template nucleic acid, about 10% of the total template nucleic acid to about 80% of the total template nucleic acid, about 10% of the total template nucleic acid to about 90% of the total template nucleic acid, about 10% of the total template nucleic acid to about 100% of the total template nucleic acid, about 25% of the total template nucleic acid to about 50% of the total template nucleic acid, about 25% of the total template nucleic acid to about 60% of the total template nucleic acid, about 25% of the total template nucleic acid to about 70% of the total template nucleic acid, about 25% of the total template nucleic acid to about 80% of the total template nucleic acid, about 25% of the total template nucleic acid to about 90% of the total template nucleic acid, about 25% of the total template nucleic acid to about 100% of the total template nucleic acid, about 50% of the total template nucleic acid to about 60% of the total template nucleic acid, about 50% of the total template nucleic acid to about 70% of the total template nucleic acid, about 50% of the total template nucleic acid to about 80% of the total template nucleic acid, about 50% of the total template nucleic acid to about 90% of the total template nucleic acid, about 50% of the total template nucleic acid to about 100% of the total template nucleic acid, 
about 60% of the total template nucleic acid to about 70% of the total template nucleic acid, about 60% of the total template nucleic acid to about 80% of the total template nucleic acid, about 60% of the total template nucleic acid to about 90% of the total template nucleic acid, about 60% of the total template nucleic acid to about 100% of the total template nucleic acid, about 70% of the total template nucleic acid to about 80% of the total template nucleic acid, about 70% of the total template nucleic acid to about 90% of the total template nucleic acid, about 70% of the total template nucleic acid to about 100% of the total template nucleic acid, about 80% of the total template nucleic acid to about 90% of the total template nucleic acid, about 80% of the total template nucleic acid to about 100% of the total template nucleic acid, or about 90% of the total template nucleic acid to about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, about 90% of the total template nucleic acid, or about 100% of the total template nucleic acid. In certain embodiments, the excised region comprises at least about 10% of the total template nucleic acid, about 25% of the total template nucleic acid, about 50% of the total template nucleic acid, about 60% of the total template nucleic acid, about 70% of the total template nucleic acid, about 80% of the total template nucleic acid, or about 90% of the total template nucleic acid.
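The relationship between two cut positions, the deletion size, and the excised fraction of the template can be sketched with simple arithmetic. The function name `excision_summary` and the 0-based cut coordinates are assumptions for illustration; the sketch also assumes a linear template and that the entire interval between the two cuts is removed.

```python
def excision_summary(template_length: int, cut1: int, cut2: int) -> tuple[int, float]:
    """Return (deleted base pairs, percent of the template excised).

    Assumes a linear template in which everything between the two cut
    positions is removed, with cut positions given as 0-based indices.
    """
    if not 0 <= cut1 < cut2 <= template_length:
        raise ValueError("expected 0 <= cut1 < cut2 <= template_length")
    deleted = cut2 - cut1
    return deleted, 100.0 * deleted / template_length


if __name__ == "__main__":
    deleted_bp, percent = excision_summary(template_length=10_000, cut1=1_000, cut2=9_000)
    print(f"{deleted_bp} bp deleted, {percent:.0f}% of the template")
```

In this toy case, cuts 8,000 base pairs apart on a 10,000 base pair template remove 80% of the template.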
[0102] In some embodiments, the first target site and the second target site are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs to about 250 base pairs, about 50 base pairs to about 500 base pairs, about 50 base pairs to about 1,000 base pairs, about 50 base pairs to about 2,000 base pairs, about 50 base pairs to about 5,000 base pairs, about 50 base pairs to about 8,000 base pairs, about 250 base pairs to about 500 base pairs, about 250 base pairs to about 1,000 base pairs, about 250 base pairs to about 2,000 base pairs, about 250 base pairs to about 5,000 base pairs, about 250 base pairs to about 8,000 base pairs, about 500 base pairs to about 1,000 base pairs, about 500 base pairs to about 2,000 base pairs, about 500 base pairs to about 5,000 base pairs, about 500 base pairs to about 8,000 base pairs, about 1,000 base pairs to about 2,000 base pairs, about 1,000 base pairs to about 5,000 base pairs, about 1,000 base pairs to about 8,000 base pairs, about 2,000 base pairs to about 5,000 base pairs, about 2,000 base pairs to about 8,000 base pairs, or about 5,000 base pairs to about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs. In certain embodiments, the first and second target sites are separated by at least about 50 base pairs, about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, or about 5,000 base pairs. In certain embodiments, the first and second target sites are separated by at most about 250 base pairs, about 500 base pairs, about 1,000 base pairs, about 2,000 base pairs, about 5,000 base pairs, or about 8,000 base pairs.
[0103] In some embodiments, the template nucleic acid molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template nucleic acid molecule is a viral genome.
[0104] Also provided herein are nucleic acids encoding one or more components of the first CRISPR-Cas system and/or the second CRISPR-Cas system described herein.
[0105] Engineered CRISPR systems generally contain two components: a guide RNA (gRNA or sgRNA) and a CRISPR-associated endonuclease (Cas protein). In nature, CRISPR/CRISPR-associated (Cas) systems provide bacteria and archaea with adaptive immunity against viruses and plasmids by using CRISPR RNAs (crRNAs) to guide the silencing of invading nucleic acids. CRISPR-Cas is an RNA-mediated adaptive defense system that relies on small RNA molecules for sequence-specific detection and silencing of foreign nucleic acids. CRISPR-Cas systems are composed of cas genes organized in operon(s) and CRISPR array(s) consisting of genome-targeting sequences (termed spacers).
[0106] CRISPR-Cas systems generally refer to and include an enzyme system that includes a guide RNA sequence that contains a nucleotide sequence complementary or substantially complementary to a region of a target polynucleotide (e.g., a template nucleic acid such as HSV genomic DNA), and a protein with nuclease activity. CRISPR-Cas systems include Type I CRISPR-Cas system, Type II CRISPR-Cas system, Type III CRISPR-Cas system, and derivatives thereof. CRISPR-Cas systems include engineered and/or programmed nuclease systems derived from naturally occurring CRISPR-Cas systems. CRISPR-Cas systems may contain engineered and/or mutated Cas proteins. In certain embodiments, nucleases generally refer to enzymes capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids. In certain embodiments, endonucleases are generally capable of cleaving the phosphodiester bond within a polynucleotide chain. Nickases refer to endonucleases that cleave only a single strand of a DNA duplex.
[0107] In some embodiments, the CRISPR-Cas system used herein can be a type I, a type II, or a type III system. Non-limiting examples of suitable CRISPR-Cas proteins include Cas3, Cas4, Cas5, Cas5e (or CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c, Cas9, Cas10, Cas10d, CasF, CasG, CasH, CasX, CasΦ, Csy1, Csy2, Csy3, Cse1 (or CasA), Cse2 (or CasB), Cse3 (or CasE), Cse4 (or CasC), Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CasX, Csx3, Csz1, Csx15, Csf1, Csf2, Csf3, Csf4, and Cu1966. In certain embodiments, the CRISPR-Cas protein or endonuclease is Cas9. In certain embodiments, the CRISPR-Cas protein or endonuclease is Cas12. In certain embodiments, the CRISPR-Cas protein or endonuclease is CasX. In certain embodiments, the CRISPR-Cas protein or endonuclease is CasΦ.
[0108] In some embodiments, the Cas9 protein can be from or derived from: Staphylococcus aureus, Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor bescii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsonii, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, or Acaryochloris marina.
[0109] In some embodiments, the gene editing system comprises a CRISPR-associated (Cas) protein, or functional fragment or derivative thereof. In certain embodiments, the Cas protein is an endonuclease, including but not limited to the Cas9 nuclease. In some embodiments, the Cas9 protein comprises an amino acid sequence identical to the wild type Streptococcus pyogenes or Staphylococcus aureus Cas9 amino acid sequence. In some embodiments, the Cas protein may comprise the amino acid sequence of a Cas protein from other species, for example other Streptococcus species, such as Streptococcus thermophilus; Pseudomonas aeruginosa, Escherichia coli, or other sequenced bacterial genomes and archaea, or other prokaryotic microorganisms. Other Cas proteins useful for the present disclosure are known or can be identified using methods known in the art (see e.g., Esvelt et al., 2013, Nature Methods, 10: 1116-1121). In certain embodiments, the Cas protein may comprise a modified amino acid sequence, as compared to its natural source.
[0110] CRISPR-Cas proteins comprise at least one RNA recognition and/or RNA binding domain. RNA recognition and/or RNA binding domains interact with guide RNAs (gRNAs). CRISPR-Cas proteins can also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, RNase domains, protein-protein interaction domains, dimerization domains, as well as other domains.
[0111] In some embodiments, the CRISPR-Cas-like protein can be a wild type CRISPR-Cas protein, a modified CRISPR-Cas protein, or a fragment of a wild type or modified CRISPR-Cas protein. In some embodiments, the CRISPR-Cas-like protein can be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. For example, nuclease (i.e., DNase, RNase) domains of the CRISPR-Cas-like protein can be modified, deleted, or inactivated. Alternatively, in some embodiments, the CRISPR-Cas-like protein can be truncated to remove domains that are not essential for the function of the Cas protein. In some embodiments, the CRISPR-Cas-like protein can also be truncated or modified to optimize the activity of the effector domain of the Cas protein.
[0112] In some embodiments, the CRISPR-Cas-like protein can be derived from a wild type Cas protein or fragment thereof. In certain embodiments, the CRISPR-Cas-like protein is a modified Cas9 protein. For example, the amino acid sequence of the Cas9 protein can be modified to alter one or more properties (e.g., nuclease activity, affinity, stability, etc.) of the protein relative to wild-type or another Cas protein. Alternatively, in some embodiments, domains of the Cas9 protein not involved in RNA-guided cleavage can be eliminated from the protein such that the modified Cas9 protein is smaller than the wild-type Cas9 protein.
[0113] In some embodiments, the disclosed CRISPR-Cas compositions should also be construed to include any form of a protein having substantial homology to a Cas protein (e.g., Cas9, saCas9, Cas9 protein) disclosed herein. Preferably, a protein which is “substantially homologous” is about 50% homologous, more preferably about 70% homologous, even more preferably about 80% homologous, more preferably about 90% homologous, even more preferably about 95% homologous, and even more preferably about 99% homologous to the amino acid sequence of a Cas protein disclosed herein.
[0114] The gRNA is a short synthetic RNA composed of a scaffold sequence necessary for Cas binding and a targeting sequence (also referred to as a spacer sequence) that defines the genomic target to be modified. The gRNA functions, in part, by hybridizing to a template nucleic acid molecule (e.g., at a targeted site).
[0115] Hybridization, as used herein, generally refers to and includes the capacity and/or ability of a first nucleic acid molecule to non-covalently bind (e.g., form Watson-Crick base pairs and/or G/U base pairs), anneal, and/or hybridize to a second nucleic acid molecule under the appropriate or certain in vitro and/or in vivo conditions of temperature, pH, and/or solution ionic strength. Generally, standard Watson-Crick base pairing includes: adenine (A) pairing with thymidine (T); adenine (A) pairing with uracil (U); and guanine (G) pairing with cytosine (C). In some embodiments, hybridization comprises at least two nucleic acids comprising complementary sequences (e.g., fully complementary, substantially complementary, or partially complementary). In certain embodiments, hybridization comprises at least two nucleic acids comprising fully complementary sequences. In certain embodiments, hybridization comprises at least two nucleic acids comprising substantially complementary sequences (e.g., greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90%, or greater than about 95% complementary). In certain embodiments, hybridization comprises at least two nucleic acids comprising partially complementary sequences (e.g., greater than about 40%, greater than about 50%, greater than about 60%, or greater than about 70% complementary). In certain embodiments, partially complementary sequences comprise one or more regions of fully or substantially complementary sequences. In certain embodiments, partially complementary sequences comprise one or more regions of fully or substantially complementary sequences, even if an overall complementarity is low (e.g., a total complementarity lower than about 50%, lower than about 40%, lower than about 30%, or lower than about 20%).
The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementation, variables well known in the art. For example, the greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8).
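The degrees of complementarity described above can be sketched as a position-by-position comparison of two antiparallel strands. The function names, the equal-length requirement, and the specific cutoffs (100% for fully, above 75% for substantially, above 40% for partially) are illustrative assumptions drawn from the lowest example values in the text, not a definitive scoring scheme; G/U pairing is offered as an optional flag.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def percent_complementarity(strand: str, other: str, allow_wobble: bool = False) -> float:
    """Percent of positions that pair when `other` is aligned antiparallel.

    Both strands are written 5' to 3' and must have equal, non-zero length
    (a simplification of this sketch; no gaps or bulges are modeled).
    """
    if not strand or len(strand) != len(other):
        raise ValueError("strands must be non-empty and of equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    pairs = zip(strand.upper(), reversed(other.upper()))
    matches = sum(pair in allowed for pair in pairs)
    return 100.0 * matches / len(strand)


def classify(percent: float) -> str:
    """Map a percentage onto the illustrative categories assumed above."""
    if percent == 100.0:
        return "fully complementary"
    if percent > 75.0:
        return "substantially complementary"
    if percent > 40.0:
        return "partially complementary"
    return "low complementarity"


if __name__ == "__main__":
    score = percent_complementarity("GAUUACA", "TGTAATC")  # RNA guide vs DNA target
    print(score, classify(score))
```

Because the comparison is purely positional, it does not capture the effect of mismatch position on short duplexes noted in the preceding paragraph.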
[0116] Complementary or complementarity, as used herein, generally refers to a polynucleotide that includes a nucleotide sequence capable of selectively annealing to an identifying region of a target polynucleotide under certain conditions. As used herein, the term substantially complementary and grammatical equivalents is intended to mean a polynucleotide that includes a nucleotide sequence capable of specifically annealing to an identifying region of a target polynucleotide under certain conditions. Annealing refers to the nucleotide base-pairing interaction of one nucleic acid with another nucleic acid that results in the formation of a duplex, triplex, or other higher-ordered structure. The primary interaction is typically nucleotide base specific, e.g., A:T, A:U, and G:C, by Watson-Crick and Hoogsteen-type hydrogen bonding. In certain embodiments, base-stacking and hydrophobic interactions can also contribute to duplex stability. Conditions under which a polynucleotide anneals to complementary or substantially complementary regions of target nucleic acids are well known in the art, e.g., as described in Nucleic Acid Hybridization, A Practical Approach, Hames and Higgins, eds., IRL Press, Washington, D.C. (1985) and Wetmur and Davidson, Mol. Biol. 31:349 (1968). Annealing conditions will depend upon the particular application and can be routinely determined by persons skilled in the art, without undue experimentation. Hybridization generally refers to a process in which two single-stranded polynucleotides bind non-covalently to form a stable double-stranded polynucleotide.
[0117] In certain embodiments, the composition comprises multiple different gRNA molecules, each targeted (e.g., capable of hybridizing) to a different target sequence. In certain embodiments, this multiplexed strategy provides for increased efficacy. These multiplex gRNAs or combination of gRNAs can be expressed separately in different vectors or expressed in one single vector.
[0118] The temperature and solution salt concentration are generally recognized as factors facilitating hybridization, and may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementarity. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein; and Sambrook, J. and Russell, W., Molecular Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (2001). The conditions of temperature and ionic strength determine the stringency of the hybridization. In some embodiments, hybridization is measured under physiological temperature (e.g., 37 degrees Celsius) and salt concentrations (e.g., 0.15 molar or 0.9% salt in solution).
[0119] Target specificity can be used in reference to a guide RNA, or a crRNA specific to a target polynucleotide sequence or region and further includes a sequence of nucleotides capable of selectively annealing/hybridizing to a target (sequence or region) of a target polynucleotide, e.g., a target DNA. Target specific nucleotides can have a single species of oligonucleotide, or it can include two or more species with different sequences. Thus, the target specific nucleotide can be two or more sequences, including 3, 4, 5, 6, 7, 8, 9 or 10 or more different sequences. In certain embodiments, a crRNA or the derivative thereof contains a target-specific nucleotide region complementary to a region of the target DNA sequence. In certain embodiments, a crRNA or the derivative thereof may contain other nucleotide sequences besides a target-specific nucleotide region. In certain embodiments, the other nucleotide sequences may be from a tracrRNA sequence.
[0120] gRNAs are generally supported by a scaffold, wherein a scaffold refers to the portions of gRNA or crRNA molecules comprising sequences which are substantially identical or are highly conserved across natural biological species (e.g., not conferring target specificity). Scaffolds include the tracrRNA segment and the portion of the crRNA segment other than the polynucleotide-targeting guide sequence at or near the 5' end of the crRNA segment, excluding any unnatural portions comprising sequences not conserved in native crRNAs and tracrRNAs. In some embodiments, the gRNA comprises a CRISPR RNA (crRNA):trans-activating crRNA (tracrRNA) duplex. In some embodiments, the gRNA comprises a stem-loop that mimics the natural duplex between the crRNA and tracrRNA. In some embodiments, the stem-loop comprises a nucleotide sequence comprising non-naturally occurring sequence. For example, in some embodiments, the composition comprises a synthetic or chimeric guide RNA comprising a crRNA, stem, and tracrRNA.
[0121] Generally, a protospacer adjacent motif (PAM) is also an important sequence element mediating enzymatic activity of a Cas nuclease. A PAM sequence or element also refers to and includes an approximately 2-6 base pair DNA sequence that is an important targeting component of a Cas nuclease. The PAM sequence further comprises, in certain instances, a DNA sequence that may be required for a Cas/sgRNA to form an R-loop to interrogate a specific DNA sequence through Watson-Crick pairing of its guide RNA with the genome. In certain instances, the PAM specificity can be a function of the DNA-binding specificity of the Cas protein (e.g., a PAM recognition domain of a Cas), wherein, a protospacer adjacent motif recognition domain refers to a Cas amino acid sequence that comprises a binding site to a DNA target PAM sequence.
[0122] Typically, the PAM sequence is on either strand, and is downstream in the 5' to 3' direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5'-NGG-3' wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence. In the CRISPR-Cas system derived from S. pyogenes (spCas9), the protospacer region DNA typically immediately precedes a 5'-NGG or NAG proto-spacer adjacent motif (PAM). Other Cas9 orthologs can have different PAM specificities. For example, Cas9 from S. thermophilus (stCas9) requires 5'-NNAGAA for CRISPR1 and 5'-NGGNG for CRISPR3 and Neisseria meningitidis (nmCas9) requires 5'-NNNNGATT. Cas9 from Staphylococcus aureus subsp. aureus (saCas9) requires 5'-NNGRRT (R=A or G). In some embodiments, Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These examples are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s can have other characteristics that make them more useful than SpCas9.
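PAM motifs such as NGG, NNGRRT, NNNNGATT, and NNAGAAW use degenerate nucleotide codes, so a scan for candidate sites can be sketched by translating each code into a regular-expression character class. The sketch below is an illustration under stated assumptions: the function names, the 20-nucleotide default spacer length, and the forward-strand-only search are hypothetical choices, and a fuller search would also scan the reverse complement.

```python
import re

# IUPAC degenerate nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_regex(pam: str) -> re.Pattern:
    """Compile a PAM motif into a lookahead pattern so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in pam.upper())
    return re.compile(f"(?=({body}))")


def find_protospacers(seq: str, pam: str = "NGG", spacer_len: int = 20):
    """Return (protospacer, PAM, PAM start index) for each forward-strand hit.

    The protospacer is taken as the `spacer_len` bases immediately 5' of the
    PAM; hits too close to the 5' end of `seq` are skipped.
    """
    seq = seq.upper()
    hits = []
    for match in pam_regex(pam).finditer(seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            hits.append((seq[pam_start - spacer_len:pam_start], match.group(1), pam_start))
    return hits


if __name__ == "__main__":
    for hit in find_protospacers("ACGT" * 5 + "TGGA", pam="NGG"):
        print(hit)
```

Passing a different motif string, for example `pam="NNGRRT"`, would scan for the saCas9-style PAM mentioned above using the same code path.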
[0123] In some embodiments, the gRNA spacer sequence comprises about 15 nucleotides to about 28 nucleotides. In some embodiments, the gRNA comprises at least about 15 nucleotides. In some embodiments, the gRNA spacer sequence comprises at most about 28 nucleotides. In some embodiments, the gRNA spacer sequence comprises about 15 nucleotides to about 16 nucleotides, about 15 nucleotides to about 17 nucleotides, about 15 nucleotides to about 18 nucleotides, about 15 nucleotides to about 19 nucleotides, about 15 nucleotides to about 20 nucleotides, about 15 nucleotides to about 21 nucleotides, about 15 nucleotides to about 22 nucleotides, about 15 nucleotides to about 23 nucleotides, about 15 nucleotides to about 24 nucleotides, about 15 nucleotides to about 25 nucleotides, about 15 nucleotides to about 28 nucleotides, about 16 nucleotides to about 17 nucleotides, about 16 nucleotides to about 18 nucleotides, about 16 nucleotides to about 19 nucleotides, about 16 nucleotides to about 20 nucleotides, about 16 nucleotides to about 21 nucleotides, about 16 nucleotides to about 22 nucleotides, about 16 nucleotides to about 23 nucleotides, about 16 nucleotides to about 24 nucleotides, about 16 nucleotides to about 25 nucleotides, about 16 nucleotides to about 28 nucleotides, about 17 nucleotides to about 18 nucleotides, about 17 nucleotides to about 19 nucleotides, about 17 nucleotides to about 20 nucleotides, about 17 nucleotides to about 21 nucleotides, about 17 nucleotides to about 22 nucleotides, about 17 nucleotides to about 23 nucleotides, about 17 nucleotides to about 24 nucleotides, about 17 nucleotides to about 25 nucleotides, about 17 nucleotides to about 28 nucleotides, about 18 nucleotides to about 19 nucleotides, about 18 nucleotides to about 20 nucleotides, about 18 nucleotides to about 21 nucleotides, about 18 nucleotides to about 22 nucleotides, about 18 nucleotides to about 23 nucleotides, about 18 nucleotides to about 24 nucleotides, about 18 nucleotides 
to about 25 nucleotides, about 18 nucleotides to about 28 nucleotides, about 19 nucleotides to about 20 nucleotides, about 19 nucleotides to about 21 nucleotides, about 19 nucleotides to about 22 nucleotides, about 19 nucleotides to about 23 nucleotides, about 19 nucleotides to about 24 nucleotides, about 19 nucleotides to about 25 nucleotides, about 19 nucleotides to about 28 nucleotides, about 20 nucleotides to about 21 nucleotides, about 20 nucleotides to about 22 nucleotides, about 20 nucleotides to about 23 nucleotides, about 20 nucleotides to about 24 nucleotides, about 20 nucleotides to about 25 nucleotides, about 20 nucleotides to about 28 nucleotides, about 21 nucleotides to about 22 nucleotides, about 21 nucleotides to about 23 nucleotides, about 21 nucleotides to about 24 nucleotides, about 21 nucleotides to about 25 nucleotides, about 21 nucleotides to about 28 nucleotides, about 22 nucleotides to about 23 nucleotides, about 22 nucleotides to about 24 nucleotides, about 22 nucleotides to about 25 nucleotides, about 22 nucleotides to about 28 nucleotides, about 23 nucleotides to about 24 nucleotides, about 23 nucleotides to about 25 nucleotides, about 23 nucleotides to about 28 nucleotides, about 24 nucleotides to about 25 nucleotides, about 24 nucleotides to about 28 nucleotides, or about 25 nucleotides to about 28 nucleotides. In some embodiments, the gRNA spacer sequence comprises about 15 nucleotides, about 16 nucleotides, about 17 nucleotides, about 18 nucleotides, about 19 nucleotides, about 20 nucleotides, about 21 nucleotides, about 22 nucleotides, about 23 nucleotides, about 24 nucleotides, about 25 nucleotides, or about 28 nucleotides.
[0124] In some embodiments, the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; and (b) a first guide nucleic acid that hybridizes to a first target site of a template nucleic acid molecule (e.g., a sequence repeated within the template nucleic acid molecule). In some embodiments, the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; (b) a first guide nucleic acid that hybridizes to a first target site within a template nucleic acid molecule; and (c) a second guide nucleic acid that hybridizes to a second target site within the template nucleic acid molecule. In some embodiments, the gene editing system comprises: (a) a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease or a nucleic acid encoding the CRISPR-associated endonuclease; (b) a first guide nucleic acid that hybridizes to a first target site within a template nucleic acid molecule; (c) a second guide nucleic acid that hybridizes to a second target site within the template nucleic acid molecule; and (d) a third guide nucleic acid that hybridizes to a third target site within the template nucleic acid molecule.
Methods
[0125] Provided herein are methods of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising: (a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0126] In some embodiments, (a) comprises contacting the template nucleic acid with a first gene editing system or a first CRISPR-associated nuclease system and cleaving the template nucleic acid molecule, and wherein (b) comprises contacting the template nucleic acid with a second gene editing system or a second CRISPR-associated nuclease system.
[0127] In some embodiments, the first target site and the second target site are different.
[0128] Also provided herein are methods of editing a template nucleic acid molecule, comprising: (a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first template sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second template sequence, the second template sequence comprising microhomology to the first template sequence.
[0129] Also provided are methods of editing a template nucleic acid molecule, comprising: (a) cleaving the template nucleic acid molecule at a first target site, thereby generating a first cleaved region on the template nucleic acid molecule, the first cleaved region or segment thereof comprising a first template sequence; and (b) cleaving the template nucleic acid molecule at a second target site comprising microhomology to the first target site, thereby generating a second cleaved region on the template nucleic acid molecule, the second cleaved region or segment thereof comprising a second template sequence, the first template sequence comprising microhomology to the second template sequence. In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
[0130] Further provided are methods of inactivating a virus, comprising: (a) cleaving the template nucleic acid molecule at a first target site on a viral genome, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first viral sequence; and (b) cleaving the template nucleic acid molecule at a second target site on the viral genome, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second viral sequence, the second viral sequence comprising microhomology to the first viral sequence.
[0131] Also provided are methods of inactivating a virus, comprising: (a) cleaving a viral genome at a first target site, the first target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a first viral sequence; and (b) cleaving the viral genome at a second target site, the second target site being located within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to a second viral sequence, the second viral sequence comprising microhomology to the first viral sequence. In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region. In some embodiments, the microhomology is located proximal to the terminus of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site).
[0132] In some embodiments, the method comprises, prior to (a), selecting the first template sequence or first viral sequence, and selecting the second template sequence or second viral sequence.
[0133] Any of the gene editing systems described herein can be used in the methods. In some embodiments, the cleaving is performed by a CRISPR-Cas system, a meganuclease system, a TALEN system, or a ZFN system. In some embodiments, the method comprises, prior to (a), contacting the template nucleic acid molecule with one or more enzymes selected from the group consisting of: a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system. In some embodiments, cleaving is performed by the compositions described herein.
Computer-implemented Methods and Systems
[0134] Provided herein are computer-implemented methods of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(b) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
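Steps (a) and (b) can be illustrated with a minimal sketch. It assumes that microhomology is scored as the longest identical top-strand substring shared by two cleavable regions, and that a region is a fixed window around each cut site. The function names, the window size, and the minimum length are hypothetical choices for illustration and are not taken from the specification.

```python
from __future__ import annotations
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (DP over suffix lengths)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


def microhomology_pairs(template: str, cut_sites: list[int],
                        flank: int = 10, min_len: int = 3) -> list[tuple[int, int, str]]:
    """Step (a): build microhomology data per pair; step (b): keep pairs that qualify.

    cut_sites are 0-based indices of the first base 3' of each cut (assumed convention).
    """
    regions = {c: template[max(0, c - flank):c + flank] for c in cut_sites}
    hits = []
    for c1, c2 in combinations(sorted(cut_sites), 2):
        if c2 - c1 < 2 * flank:
            continue  # windows overlap, so any shared sequence would be trivial
        shared = longest_common_substring(regions[c1], regions[c2])
        if len(shared) >= min_len:
            hits.append((c1, c2, shared))
    return hits
```

Calling `microhomology_pairs(template, [10, 30, 50])` returns every non-overlapping pair of cut sites whose windows share at least `min_len` identical bases, together with the shared sequence.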
[0135] Provided herein are computer-implemented methods of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence);
(b) identifying or providing, by one or more computers, a plurality of cleavable regions comprising cut sites within the template nucleic acid sequence;
(c) generating, by one or more computers, positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
(d) generating, by one or more computers, microhomology data for the plurality of cleavable regions using the positional data, and identifying a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
[0136] In some embodiments, the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
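The length and GC criteria of this embodiment can be expressed as a simple filter. The sketch below is illustrative only; the function names and default thresholds are assumptions that mirror the "three or more" and "greater than or equal to 50%" values stated above.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def passes_microhomology_filter(shared: str, min_len: int = 3, min_gc: float = 0.5) -> bool:
    """True if the shared sequence is long enough and GC-rich enough."""
    return len(shared) >= min_len and gc_fraction(shared) >= min_gc
```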
[0137] In some embodiments, a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
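As an illustration of this window definition, the following sketch extracts the bases 5’ and 3’ of a cut site. It assumes a 0-based cut index that names the first base 3’ of the cut; that indexing convention and the function name are hypothetical.

```python
from __future__ import annotations


def cleavable_region(template: str, cut_index: int, flank: int = 10) -> tuple[str, str]:
    """Return (five_prime_flank, three_prime_flank) around a cut.

    The cut is assumed to fall between template[cut_index - 1] and
    template[cut_index]. Flanks are truncated at the ends of the template.
    """
    if not 0 <= cut_index <= len(template):
        raise ValueError("cut_index lies outside the template")
    five_prime = template[max(0, cut_index - flank):cut_index]
    three_prime = template[cut_index:cut_index + flank]
    return five_prime, three_prime
```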
[0138] In some embodiments, the microhomology data is a function of:
(i) total nucleobase complementarity of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(ii) the length (e.g., number of contiguous nucleobases) of complementary nucleobases of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iii) GC content of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iv) orientation and/or strand location (e.g., for identifying inversion outcomes);
(v) base content of complementary nucleobases between nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region; or
(vi) a combination of (i)-(v).
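One way to picture such a function is a weighted sum over a few of the listed features. The sketch below covers only features (i) to (iii), compares two equal-length regions position by position without gaps, and uses arbitrary weights; the class name, function names, and weights are all assumptions made for illustration rather than values from the specification.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class MicrohomologyFeatures:
    total_matches: int   # (i) positions that agree in an ungapped comparison
    longest_run: int     # (ii) longest contiguous run of agreeing positions
    gc_fraction: float   # (iii) GC fraction of that longest run


def compute_features(region_a: str, region_b: str) -> MicrohomologyFeatures:
    """Compare two equal-length regions position by position."""
    if len(region_a) != len(region_b):
        raise ValueError("regions must have equal length")
    total = run = best_run = best_end = 0
    for i, (x, y) in enumerate(zip(region_a, region_b)):
        if x == y:
            total += 1
            run += 1
            if run > best_run:
                best_run, best_end = run, i + 1
        else:
            run = 0
    shared = region_a[best_end - best_run:best_end]
    gc = sum(b in "GC" for b in shared) / best_run if best_run else 0.0
    return MicrohomologyFeatures(total, best_run, gc)


def microhomology_score(f: MicrohomologyFeatures, w_total: float = 1.0,
                        w_run: float = 2.0, w_gc: float = 3.0) -> float:
    """Weighted combination of the features (weights are illustrative)."""
    return w_total * f.total_matches + w_run * f.longest_run + w_gc * f.gc_fraction
```

Orientation, strand location, and base content of the complementary nucleobases could be added as further fields in the same way.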
[0139] In some embodiments, the template nucleic acid sequence comprises a consensus sequence, and the computer-implemented method comprises, in (a), generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes). In some embodiments, the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
[0140] In some embodiments, the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
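A consensus template of the kind described above can be sketched as a per-column majority vote. The example assumes the input sequences have already been aligned to equal length by an external aligner, with gaps written as "-"; the function name and the tie-breaking rule are hypothetical.

```python
from __future__ import annotations
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority-vote consensus over pre-aligned, equal-length sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Highest count wins; ties go to the symbol that sorts first,
        # which places the gap character "-" ahead of any base.
        out.append(min(counts, key=lambda b: (-counts[b], b)))
    return "".join(out)
```

With the inputs `["TCGA", "ACGT", "ACCA"]` the result is `"ACGA"`, which matches none of the three inputs. This illustrates how a consensus template can differ from every input sequence used to generate it.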
[0141] In some embodiments, the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
[0142] In some embodiments, the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
[0143] In some embodiments, the method further comprises identifying a first target site sequence comprising or adjacent to the first cut site and a second target site sequence comprising or adjacent to the second cut site.
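The positional entropy data mentioned above can be illustrated with the Shannon entropy of each alignment column, H = -Σ p·log2(p), where p is the observed frequency of each symbol in the column. The sketch assumes pre-aligned, equal-length input sequences; the function name is hypothetical.

```python
from __future__ import annotations
import math
from collections import Counter


def positional_entropy(aligned: list[str]) -> list[float]:
    """Shannon entropy (in bits) of each column of a set of aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    entropies = []
    for column in zip(*aligned):
        n = len(column)
        counts = Counter(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies
```

A fully conserved column has entropy 0, and a column split evenly between two bases has entropy 1 bit, so low values mark positions that are conserved across the input sequences.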
[0144] In some embodiments, the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
[0145] Provided herein are computer-implemented methods of cut site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating, by one or more computers, microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut sites; and (b) identifying, by one or more computers, a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
[0146] Further provided are computer-implemented methods of cut site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) identifying or providing, by one or more computers, a plurality of cut sites within the template nucleic acid sequence; (c) generating, by one or more computers, positional data for a cut site of the plurality of cut sites using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut site; and (d) generating, by one or more computers, microhomology data using the positional data, and identifying a first cut site and a second cut site, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
[0147] Also provided are computer-implemented methods of cut site evaluation for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating, by one or more computers, positional data for a first cut site and a second cut site within a template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site within the template and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the first cut site and the second cut site; and (b) generating, by one or more computers, microhomology data using the positional data, wherein the microhomology data identifies a degree of microhomology between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site.
[0148] In some embodiments, the method further comprises identifying, by one or more computers, a first target sequence within or adjacent to the first cut site and a second target sequence within or adjacent to the second cut site. In some embodiments, the nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, or at least 15 nucleobase positions.
[0149] In some embodiments, the microhomology data is a function of: (i) total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (ii) the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (iii) nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; (iv) orientation and/or strand location (e.g., for identifying inversion outcomes); (v) base content of complementary nucleobases between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; or (vi) a combination of (i)-(v). In certain embodiments, the function includes or utilizes sequence alignment data (e.g., of the proximal nucleobase positions) to generate the microhomology data.
[0150] Also provided is a computer-implemented method of cut site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) identifying or providing, by one or more computers, a plurality of cut sites within the template nucleic acid sequence; (c) generating, by one or more computers, positional data for a cut site of the plurality of cut sites using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut site; (d) generating, by one or more computers, microhomology data for the plurality of cut sites using the positional data; and (e) identifying, by one or more computers, at least 2 cut sites, using the microhomology data, wherein the at least 2 cut sites comprise nucleobase positions proximal to the cut sites having microhomology.
[0151] In some embodiments, the method further comprises identifying, by one or more computers, target sequences comprising or adjacent to the cut sites identified in (e). In some embodiments, the nucleobases proximal to the cut sites comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, or at least 15 nucleobase positions.
[0152] In some embodiments, the microhomology data is a function of: total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the cut sites; nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the cut sites; melting temperature of nucleobases proximal to the cut sites; base content of complementary nucleobases between nucleobases proximal to the cut sites; or a combination thereof. In certain embodiments, the function includes or utilizes sequence alignment data (e.g., of the proximal nucleobase positions) to generate the microhomology data.
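The specification does not state how melting temperature would be estimated. As one illustration, the sketch below applies the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough approximation commonly used for short oligonucleotides. The choice of this rule and the function name are assumptions.

```python
def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in degrees Celsius by the Wallace rule."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this approximation a GC-rich microhomology receives a higher estimate than an AT-rich one of equal length, which is consistent with the GC-content criteria described elsewhere herein.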
[0153] In some embodiments, the template nucleic acid sequence comprises a consensus sequence, and the computer-implemented method comprises, in (a), generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes). In some embodiments, the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
[0154] In some embodiments, the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants). In some embodiments, the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
[0155] In some embodiments, the two or more input nucleic acid sequences comprise at least 5 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 50 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 100 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 250 sequences. In some embodiments, the two or more input nucleic acid sequences are from a pathogen.
[0156] In some embodiments, the two or more input nucleic acid sequences are from a virus. In some embodiments, the two or more input nucleic acid sequences are sequences within different subclades of the virus. In some embodiments, the two or more input nucleic acid sequences are sequences within a single or same subclade of a virus.
[0157] In certain embodiments, the two or more input nucleic acid sequences comprise about 2 input sequences to about 1,000 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise about 2 input sequences to about 5 input sequences, about 2 input sequences to about 25 input sequences, about 2 input sequences to about 50 input sequences, about 2 input sequences to about 100 input sequences, about 2 input sequences to about 500 input sequences, about 2 input sequences to about 1,000 input sequences, about 5 input sequences to about 25 input sequences, about 5 input sequences to about 50 input sequences, about 5 input sequences to about 100 input sequences, about 5 input sequences to about 500 input sequences, about 5 input sequences to about 1,000 input sequences, about 25 input sequences to about 50 input sequences, about 25 input sequences to about 100 input sequences, about 25 input sequences to about 500 input sequences, about 25 input sequences to about 1,000 input sequences, about 50 input sequences to about 100 input sequences, about 50 input sequences to about 500 input sequences, about 50 input sequences to about 1,000 input sequences, about 100 input sequences to about 500 input sequences, about 100 input sequences to about 1,000 input sequences, or about 500 input sequences to about 1,000 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise about 2 input sequences, about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, about 500 input sequences, or about 1,000 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise at least about 2 input sequences, about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, or about 500 input sequences. In certain embodiments, the two or more input nucleic acid sequences comprise at most about 5 input sequences, about 25 input sequences, about 50 input sequences, about 100 input sequences, about 500 input sequences, or about 1,000 input sequences.
[0158] A non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403-410. Alternatively, PSI-Blast can be used to perform an iterated search which detects distant relationships between molecules. When utilizing BLAST, Gapped BLAST, and PSI-Blast programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0) which is part of the GCG sequence alignment software package. Additional algorithms for sequence analysis are known in the art and include ADVANCE and ADAM as described in Torellis and Robotti, 1994, Comput. Appl. Biosci. 10:3-5; and FASTA described in Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85:2444-8. Alternatively, sequence alignment may be carried out using the CLUSTAL algorithm (e.g., as provided in the program Clustal-omega), as described by Higgins et al., 1996, Methods Enzymol. 266:383-402.
[0159] In some embodiments, the template nucleic acid sequence is a human sequence. In some embodiments, the template nucleic acid sequence is a viral sequence. In some embodiments, the virus is Hepatitis B virus (HBV), Human Immunodeficiency virus (HIV), JC virus (JCV), herpes simplex virus (HSV), or SARS-CoV-2.
[0160] In some embodiments, a virus comprises Hepatitis B virus (HBV), Human Immunodeficiency virus (HIV), JC virus (JCV), herpes simplex virus (HSV), or SARS-CoV-2. In some embodiments, the virus is Hepatitis B virus (HBV) and the genotype is HBV-A. In some embodiments, the virus is Hepatitis B virus (HBV) and the genotype is HBV-B. In some embodiments, the virus is Hepatitis B virus (HBV) and the genotype is HBV-C. In some embodiments, the virus is Hepatitis B virus (HBV) and the different genotypes comprise HBV-A, HBV-B, HBV-C, or combinations thereof. In some embodiments, the virus is Hepatitis B virus (HBV) and different genotypes comprise HBV-A, HBV-B, HBV-C, HBV-D, HBV-E, HBV-F, HBV-G, HBV-H, or combinations thereof. In some embodiments, the virus is Hepatitis B virus (HBV) and various subclades within HBV-A, HBV-B, HBV-C, HBV-D, HBV-E, HBV-F, HBV-G, HBV-H, or combinations thereof.
[0161] In some instances, the subclade within HBV-A comprises HBV-A1, HBV-A2, HBV-QS-A3, HBV-A4, or combinations thereof. In some instances, the subclade within HBV-B comprises HBV-B1, HBV-B2, HBV-QS-B3, HBV-B4, HBV-B5, or combinations thereof. In some instances, the subclade within HBV-C comprises HBV-C1, HBV-QS-C2, HBV-C3, HBV-C4, HBV-C5, HBV-C6-C15, or combinations thereof. In some instances, the subclade within HBV-D comprises HBV-D1, HBV-D2, HBV-D3, HBV-D4, HBV-D5, HBV-D6, or combinations thereof. In some instances, the subclade within HBV-F comprises HBV-F1, HBV-F2, HBV-F3, HBV-F4, or combinations thereof.
[0162] In some embodiments, HBV genotypes and subgenotypes/subclades in populations differ between geographic regions. As an example, the subclades in North America include HBV-A2, HBV-D2, HBV-B5, HBV-B4, and HBV-G. As an example, the subclades in Central America include HBV-A2, HBV-F1, HBV-H, HBV-G, HBV-B2, HBV-F3, HBV-C1, and HBV-F4. As an example, the subclades in the Caribbean include HBV-A1, HBV-QS-A3, HBV-D4, HBV-A2, and HBV-D3. As an example, the subclades in South America include HBV-F1, HBV-F4, HBV-D3, HBV-F3, HBV-F2, HBV-A1, HBV-A2, and HBV-D2. As an example, the subclades in Northern Europe include HBV-D2, HBV-A2, HBV-D3, and HBV-E. As an example, the subclades in Southern Europe include HBV-D3, HBV-D2, HBV-D1, and HBV-A2. As an example, the subclades in Western Europe include HBV-A2, HBV-D1, HBV-D2, HBV-D3, and HBV-E. As an example, the subclades in Eastern Europe include HBV-D2, HBV-A2, HBV-D1, and HBV-D3. As an example, the subclades in Northern Africa include HBV-D1, HBV-E, HBV-D6, HBV-D2, and HBV-D3. As an example, the subclades in Western Africa include HBV-E and HBV-A2. As an example, the subclades in Middle Africa include HBV-E and HBV-QS-A3. As an example, the subclades in Eastern Africa include HBV-A1, HBV-D2, and HBV-E. As an example, the subclades in Southern Africa include HBV-A1, HBV-D3, HBV-E, and HBV-A2. As an example, the subclades in Western Asia include HBV-D1 and HBV-D2. As an example, the subclades in Southern Asia include HBV-D1, HBV-D3, HBV-D2, HBV-A1, HBV-C1, and HBV-D5. As an example, the subclades in Central Asia include HBV-D1, HBV-D2, HBV-QS-C2, and HBV-A2. As an example, the subclades in Eastern Asia include HBV-QS-C2, HBV-B2, HBV-C1, HBV-QS-B3, and HBV-C6-C15. As an example, the subclades in Southeastern Asia include HBV-C1, HBV-B2, HBV-QS-B3, HBV-B4, and HBV-QS-C2. As an example, the subclades in Melanesia include HBV-D2, HBV-C3, and HBV-C6-C15. As an example, the subclades in Polynesia include HBV-C3.
As an example, the subclades in Australia and New Zealand include HBV-D1, HBV-C4, HBV-C3, and HBV-D4. In some embodiments, the most frequently observed subclades for various geographic regions, in decreasing order, comprise, for North America: HBV-A2 > HBV-D2 > HBV-B5 > HBV-B4 > HBV-G; for Central America: HBV-A2 > HBV-F1 > HBV-H > HBV-G > HBV-B2 > HBV-F3 > HBV-C1 > HBV-F4; for the Caribbean: HBV-A1 > HBV-QS-A3 > HBV-D4 > HBV-A2 > HBV-D3; for South America: HBV-F1 > HBV-F4 > HBV-D3 > HBV-F3 > HBV-F2 > HBV-A1 > HBV-A2 > HBV-D2; for Northern Europe: HBV-D2 > HBV-A2 > HBV-D3 > HBV-E; for Southern Europe: HBV-D3 > HBV-D2 > HBV-D1 > HBV-A2; for Western Europe: HBV-A2 > HBV-D1 > HBV-D2 > HBV-D3 > HBV-E; for Eastern Europe: HBV-D2 > HBV-A2 > HBV-D1 > HBV-D3; for Northern Africa: HBV-D1 > HBV-E > HBV-D6 > HBV-D2 > HBV-D3; for Western Africa: HBV-E > HBV-A2; for Middle Africa: HBV-E > HBV-QS-A3; for Eastern Africa: HBV-A1 > HBV-D2 > HBV-E; for Southern Africa: HBV-A1 > HBV-D3 > HBV-E > HBV-A2; for Western Asia: HBV-D1 > HBV-D2; for Southern Asia: HBV-D1 > HBV-D3 > HBV-D2 > HBV-A1 > HBV-C1 > HBV-D5; for Central Asia: HBV-D1 > HBV-D2 > HBV-QS-C2 > HBV-A2; for Eastern Asia: HBV-QS-C2 > HBV-B2 > HBV-C1 > HBV-QS-B3 > HBV-C6-C15; for Southeastern Asia: HBV-C1 > HBV-B2 > HBV-QS-B3 > HBV-B4 > HBV-QS-C2; for Melanesia: HBV-D2 > HBV-C3 > HBV-C6-C15; for Polynesia: HBV-C3; and for Australia and New Zealand: HBV-D1 > HBV-C4 > HBV-C3 > HBV-D4.
In some embodiments, the method further comprises: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
[0163] In some embodiments, the method further comprises identifying a first target nucleic acid sequence comprising or adjacent to the first cut site and a second target nucleic acid sequence comprising or adjacent to the second cut site.
[0164] In some embodiments, the method further comprises: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using the additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value.
[0165] In some embodiments, the first cut site and the second cut site are cleavable by one or more programmable nucleases. In some embodiments, the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems.
[0166] In some embodiments, the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary or substantially complementary (e.g., greater than 75% complementarity) nucleobases.
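The complementarity criterion above can be illustrated with a short Python sketch. It assumes two stretches of equal length compared as antiparallel strands and counts Watson-Crick pairs; the function names are hypothetical, and the 75% default is the example value given in the text rather than a required setting.

```python
# Illustrative helpers; the names are hypothetical and not taken from the disclosure.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fraction_complementary(first: str, second: str) -> float:
    """Fraction of positions at which `first` pairs with `second`.

    Both sequences are written 5'->3' and are compared antiparallel,
    so `second` is reversed before the position-by-position check.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(_PAIR.get(x) == y for x, y in zip(first, reversed(second)))
    return paired / len(first)


def is_substantially_complementary(first: str, second: str,
                                   threshold: float = 0.75) -> bool:
    """True when more than `threshold` of the positions form a pair."""
    return fraction_complementary(first, second) > threshold
```

For example, ACGTAC and GTACGA pair at five of six positions (about 83%) and would pass the 75% example threshold, whereas ACGTAC and GAACGA pair at four of six positions and would not.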
[0167] In some embodiments, the first cut site is located within a first gene of the template polynucleotide molecule and the second cut site is located within a second gene of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more genes of the template polynucleotide molecule. In some embodiments, the first cut site is located within a first protein coding region of the template polynucleotide molecule and the second cut site is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more protein coding regions of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are identical or substantially identical (e.g., greater than 75% sequence identity). In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
[0168] In some embodiments, cutting at the first cut site and cutting at the second cut site results in a deletion (e.g., excision) between the first cut site and the second cut site. In some embodiments, cutting at the first cut site and cutting at the second cut site results in (i) microhomology-mediated end joining (MMEJ) of the first cut site and the second cut site, and/or (ii) a deletion in the template polynucleotide molecule. In some embodiments, the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the deletion removes one or more genes within the template polynucleotide molecule.
[0169] In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs.
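The excision outcome described above can be sketched in Python under simplifying assumptions that are not part of the disclosure: both cuts are blunt, positions are 0-based indices between bases on a single strand, the microhomology is read as an identical stretch on that strand that ends the retained left arm and begins the retained right arm, and repair keeps a single copy of it. The name `mmej_product` is hypothetical.

```python
def mmej_product(template: str, first_cut: int, second_cut: int, max_mh: int = 20):
    """Return (product, microhomology, deleted_bp) for two blunt cuts.

    A cut at index i separates template[:i] from template[i:].
    Simplified model: the arms are joined at the longest stretch (up to
    `max_mh`) that both ends the left arm and starts the right arm.
    """
    if not 0 < first_cut < second_cut < len(template):
        raise ValueError("expected 0 < first_cut < second_cut < len(template)")
    left, right = template[:first_cut], template[second_cut:]
    mh_len = 0
    # Search from the longest candidate down, because a match of length k
    # does not imply a match of length k - 1 in this flush arrangement.
    for k in range(min(max_mh, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            mh_len = k
            break
    # Annealing at the microhomology keeps one copy of the repeat.
    product = left + right[mh_len:]
    return product, right[:mh_len], len(template) - len(product)
```

With the toy template GGATCCGATTTTTTTTCCGAGGTA and cuts at indices 8 and 16, the sketch joins the arms at the shared CCGA and reports a 12 base pair deletion (the 8 intervening bases plus one copy of the repeat). When no shared stretch exists, the arms are simply concatenated.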
[0170] In some embodiments, the template polynucleotide molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
[0171] Provided herein are computer-implemented systems comprising: a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying cut sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut sites; and (b) identify a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
[0172] Also provided herein are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut sites; and (b) identify a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
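Steps (a) and (b) of the target site identification application described above can be sketched as follows. This is an illustration only: it assumes cut sites are supplied as 0-based indices between bases, it treats microhomology as an identical stretch that lies immediately 5’ of the first cut and immediately 3’ of the second cut on the same strand, and the function names and the `min_len` and `max_len` defaults are hypothetical.

```python
from itertools import combinations


def flush_microhomology(template, first_cut, second_cut, max_len=20):
    """Longest stretch ending just 5' of the first cut and starting just 3' of the second."""
    left, right = template[:first_cut], template[second_cut:]
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return right[:k]
    return ""


def find_cut_site_pairs(template, cut_sites, min_len=2, max_len=20):
    """Return (first_cut, second_cut, microhomology) for each qualifying pair."""
    pairs = []
    for a, b in combinations(sorted(set(cut_sites)), 2):
        mh = flush_microhomology(template, a, b, max_len)
        if len(mh) >= min_len:
            pairs.append((a, b, mh))
    # Report the longest microhomology first.
    pairs.sort(key=lambda pair: len(pair[2]), reverse=True)
    return pairs
```

The pairwise search is quadratic in the number of cut sites, which is likely acceptable for a viral genome of a few kilobases but may need indexing for larger templates.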
[0173] Provided herein is a computer-implemented method of cut site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) generating, by one or more computers, microhomology data using the template nucleic acid sequence, wherein the microhomology data is a function of: total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; melting temperature of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; base content of complementary nucleobases between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; or a combination thereof; and (c) identifying or generating, by one or more computers, cut sites within the template nucleic acid sequence using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology.
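Several of the quantities listed in step (b) can be computed directly from a candidate microhomology stretch. The sketch below is one possible reading, not the disclosed scoring function: it derives length, GC fraction, and a melting temperature from the Wallace rule of thumb for short oligonucleotides (2 °C per A/T and 4 °C per G/C), then combines them with placeholder weights. All names and default weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicrohomologyFeatures:
    length: int         # number of matching nucleobases
    gc_fraction: float  # base content of the matching stretch
    wallace_tm: float   # rule-of-thumb melting temperature, degrees C


def microhomology_features(mh: str) -> MicrohomologyFeatures:
    mh = mh.upper()
    if not mh or set(mh) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T string")
    gc = mh.count("G") + mh.count("C")
    at = len(mh) - gc
    # Wallace rule: Tm = 2 * (A + T) + 4 * (G + C), intended for short oligos.
    return MicrohomologyFeatures(len(mh), gc / len(mh), 2.0 * at + 4.0 * gc)


def microhomology_score(mh: str, w_length: float = 1.0,
                        w_gc: float = 1.0, w_tm: float = 0.1) -> float:
    """Weighted sum of the features; the default weights are placeholders."""
    f = microhomology_features(mh)
    return w_length * f.length + w_gc * f.gc_fraction + w_tm * f.wallace_tm
```

Under these placeholder weights a GC-rich stretch such as GGCC scores higher than an AT-only stretch of the same length such as ATAT. In practice the weights would need to be calibrated against observed repair outcomes.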
[0174] Further provided herein are computer-implemented methods of target site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) generating, by one or more computers, a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology.
[0175] In some embodiments, in (c), the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences; (ii) identifying a pair of target nucleic acid sequences comprising microhomology; and (iii) identifying the first target nucleic acid sequence and the second target nucleic acid sequence comprising microhomology.
[0176] In some embodiments, the computer-implemented method further comprises: (d) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence. In some embodiments, the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
[0177] Further provided are computer-implemented methods of target site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) generating, by one or more computers, a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to (e.g., within 20, 10, or 5 nucleobase positions of) a first template nucleic acid sequence of the template nucleic acid sequence and the second target nucleic acid sequence is proximal to a second template nucleic acid sequence of the template nucleic acid sequence, wherein the first template nucleic acid sequence and the second template nucleic acid sequence comprise microhomology.
[0178] In some embodiments, in (c), the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences to the template nucleic acid sequence; (ii) identifying regions proximal (e.g., within 20, 10, or 5 nucleobase positions) to the plurality of target nucleic acid sequences within the template nucleic acid sequence; (iii) identifying (1) pairs of target nucleic acid sequences comprising microhomology, (2) pairs of proximal sequences comprising microhomology that are proximal to a pair of target nucleic acid sequences, or (3) a combination thereof; and (iv) identifying the first target nucleic acid sequence and the second target nucleic acid sequence from the pairs of target nucleic acid sequences, the pairs of proximal sequences, or the combination thereof.
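Steps (ii) and (iii) above can be illustrated with a small Python sketch. It assumes each target is described by its 0-based start position on the template, extends the target by a fixed flank on both sides, and reports the longest stretch shared by the two extended regions as a candidate microhomology. The function names and the defaults for `target_len` and `flank` are hypothetical, and regions that overlap on the template would need to be excluded by the caller.

```python
def proximal_region(template, start, length, flank):
    """Target plus up to `flank` positions on either side, clipped to the template."""
    lo = max(0, start - flank)
    hi = min(len(template), start + length + flank)
    return template[lo:hi]


def longest_shared_stretch(a, b):
    """Longest common substring of `a` and `b` (row-by-row dynamic programming)."""
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]


def proximal_microhomology(template, first_start, second_start,
                           target_len=20, flank=5):
    """Longest stretch shared by the regions around two target positions."""
    return longest_shared_stretch(
        proximal_region(template, first_start, target_len, flank),
        proximal_region(template, second_start, target_len, flank),
    )
```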
[0179] In some embodiments, the computer-implemented method further comprises: (d) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence. In certain embodiments, the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
[0180] Provided are computer-implemented methods of target site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from a plurality of target nucleic acid sequences (e.g., user-defined or computer-generated), wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to (e.g., within 20, 10, or 5 nucleobase positions of) a first template nucleic acid sequence of the template nucleic acid sequence and the second target nucleic acid sequence is proximal to a second template nucleic acid sequence of the template nucleic acid sequence, wherein the first template nucleic acid sequence and the second template nucleic acid sequence comprise microhomology; and wherein the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (1) aligning the plurality of target nucleic acid sequences to a template nucleic acid sequence; (2) identifying regions proximal (e.g., within 20, 10, or 5 nucleobase positions) to the plurality of target nucleic acid sequences within the template nucleic acid sequence; (3) identifying pairs of target nucleic acid sequences comprising microhomology, pairs of proximal sequences comprising microhomology that are proximal to a pair of target nucleic acid sequences, or a combination thereof; and (4) identifying the first target nucleic acid sequence and the second target nucleic acid sequence from the pairs of target nucleic acid sequences, the pairs of proximal sequences, or the combination thereof.
[0181] Also provided are computer-implemented methods of target site identification for cutting a template polynucleotide molecule, the computer-implemented method comprising: (a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence); (b) identifying, by one or more computers, at least one pair of microhomologous sequences within the template nucleic acid sequence, wherein the pair of microhomologous sequences comprise microhomology; (c) generating, by one or more computers, a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (d) generating, by one or more computers, a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences and at least one pair of microhomologous sequences, wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to (e.g., within 20, 10, or 5 nucleobase positions of) a first template nucleic acid sequence of the template nucleic acid sequence and the second target nucleic acid sequence is proximal to a second template nucleic acid sequence of the template nucleic acid sequence, wherein the first template nucleic acid sequence and the second template nucleic acid sequence comprise microhomology.
[0182] In some embodiments, in (d), the first target nucleic acid sequence and the second target nucleic acid sequence are generated by: (i) aligning the plurality of target nucleic acid sequences to the template nucleic acid sequence; (ii) identifying regions proximal (e.g., within 20, 10, or 5 nucleobase positions) to the plurality of target nucleic acid sequences within the template nucleic acid sequence; (iii) identifying (1) pairs of target nucleic acid sequences comprising microhomology, (2) pairs of proximal sequences comprising microhomology that are proximal to a pair of target nucleic acid sequences, or (3) a combination thereof; and (iv) identifying the first target nucleic acid sequence and the second target nucleic acid sequence from the pairs of target nucleic acid sequences, the pairs of proximal sequences, or the combination thereof.
[0183] In some embodiments, the computer-implemented method further comprises: (e) generating, by one or more computers, microhomology data for the microhomology of the first target nucleic acid sequence and the second target nucleic acid sequence. In certain embodiments, the microhomology data comprises a scoring function, wherein the scoring function is derived from: total nucleobase complementarity, nucleobase complementarity at a 5’ or 3’ region of the first target nucleic acid sequence and the second target nucleic acid sequence, melting temperature of two or more complementary nucleobases, or a combination thereof.
[0184] Further provided herein are computer-implemented systems and non-transitory computer-readable storage media that incorporate (e.g., creating and executing) the computer-implemented methods described herein.
Accordingly, provided herein are computer-implemented systems comprising: a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology.
[0185] Also provided are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology.
[0186] Further provided herein are computer-implemented systems comprising: a digital processing device comprising: at least one processor, an operating system configured to perform executable instructions, a memory, and a computer program including instructions executable by the digital processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to (e.g., within 20, 10, or 5 nucleobase positions of) a first template nucleic acid sequence of the template nucleic acid sequence and the second target nucleic acid sequence is proximal to a second template nucleic acid sequence of the template nucleic acid sequence, wherein the first template nucleic acid sequence and the second template nucleic acid sequence comprise microhomology.
[0187] Also provided are non-transitory computer-readable storage media encoded with a computer program including instructions executable by a processing device to create a target site identification application for identifying target sites for cutting a template polynucleotide molecule, wherein the target site identification application is programmed to: (a) generate or provide a template nucleic acid sequence (e.g., a viral genome sequence); (b) generate a plurality of target nucleic acid sequences from the template nucleic acid sequence, wherein a target nucleic acid comprises a portion (e.g., less than 50 base pairs) of the template nucleic acid sequence; and (c) generate a first target nucleic acid sequence and a second target nucleic acid sequence from the plurality of target nucleic acid sequences, wherein: (i) the first target nucleic acid sequence and the second target nucleic acid sequence comprise microhomology, or (ii) the first target nucleic acid sequence is proximal to (e.g., within 20, 10, or 5 nucleobase positions of) a first template nucleic acid sequence of the template nucleic acid sequence and the second target nucleic acid sequence is proximal to a second template nucleic acid sequence of the template nucleic acid sequence, wherein the first template nucleic acid sequence and the second template nucleic acid sequence comprise microhomology.
[0188] In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are non-naturally occurring sequences. In some embodiments, the template nucleic acid sequence comprises a consensus sequence generated by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes). In some embodiments, the computer-implemented method comprises, in (a), generating a consensus sequence, wherein the consensus sequence is generated by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
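Consensus generation as described above can be sketched in Python. This illustration assumes the input sequences have already been aligned to equal length by an external tool, with `-` marking gaps, and takes the most frequent nucleobase at each column. The function name and the alphabetical tie-break are assumptions made for the example.

```python
from collections import Counter


def consensus_sequence(aligned):
    """Majority nucleobase at each column of equal-length aligned sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(base for base in column if base != "-")
        if not counts:
            continue  # column contains only gaps
        top = max(counts.values())
        # Break ties alphabetically so the result is deterministic.
        consensus.append(min(b for b, n in counts.items() if n == top))
    return "".join(consensus)
```

For the inputs ACGA, ATGC, and TTGG the sketch returns ATGA, a sequence that matches none of the three inputs exactly.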
[0189] In certain embodiments, the consensus sequence is different from each input nucleic acid sequence. In certain embodiments, the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
[0190] In some embodiments, the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the consensus sequence. In certain embodiments, the method comprises generating, by one or more computers, a positional entropy score from the positional entropy data, and cleaning the plurality of target nucleic acid sequences by removing target nucleic acid sequences having a positional entropy score lower than a defined threshold score.
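The positional entropy step above can be sketched as follows. The text does not define how the positional entropy score is derived from the entropy data, so this example makes an explicit assumption: the score is 1 minus the mean Shannon entropy over the target window, divided by the 2-bit maximum for a four-letter alphabet, so that a higher score means a more conserved target and low-scoring targets are removed. It also assumes gap-free aligned inputs. All function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the nucleobases seen at one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def positional_entropy(aligned):
    """Entropy at each position of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned)]


def conservation_score(entropies, start, length):
    """Assumed score: 1 - (mean entropy over the window / 2 bits)."""
    window = entropies[start:start + length]
    return 1.0 - (sum(window) / len(window)) / 2.0


def filter_targets(entropies, target_starts, length, threshold):
    """Keep targets whose score meets the threshold; drop lower-scoring ones."""
    return [s for s in target_starts
            if conservation_score(entropies, s, length) >= threshold]
```

With the toy alignment ACGT, ACGA, ACCT, ACCA the first two positions are invariant (0 bits) and the last two are evenly split (1 bit each), so a 2-base target starting at position 2 scores 0.5 and would be removed at a threshold of 0.75.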
[0191] In some embodiments, the two or more input nucleic acid sequences comprise at least 5 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 50 sequences. In some embodiments, the two or more input nucleic acid sequences comprise at least 100 sequences.
Processing Device(s)
[0192] The methods, systems, and media described herein include at least one digital processing device, or use of the same. The digital processing device includes one or more hardware central processing units (CPUs) or general-purpose graphics processing units (GPGPUs) that carry out the device's functions. The digital processing device further comprises an operating system configured to perform executable instructions. The digital processing device is optionally connected to a computer network. By way of example, the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web. By way of further example, the digital processing device is optionally connected to a cloud computing infrastructure. By way of further example, the digital processing device is optionally connected to an intranet. By way of still further example, the digital processing device is optionally connected to a data storage device.
[0193] In accordance with the description herein, suitable digital processing devices include, by way of non-limiting examples, commercial server computers and desktop computers known to those of skill in the art. Suitable digital processing devices also include devices custom-built using hardware and techniques known to those of skill in the art.
[0194] In some embodiments, the method further comprises generating, by the one or more computers, additional data associated with the plurality of target sequences, wherein the additional data comprises positional entropy data, gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof. In certain embodiments, the additional data comprises positional entropy data (e.g., Shannon entropy). In certain embodiments, the additional data comprises gene location data (e.g., coding region data). In certain embodiments, the additional data comprises distance data (e.g., distance from other target sequences). In certain embodiments, the additional data comprises proximity to one or more PAM sequences. In certain embodiments, the additional data comprises homology data (e.g., homology to a human genome sequence, where reduced or no homology is favored). In certain embodiments, the additional data comprises target specificity and selectivity data (e.g., Azimuth 2.0). In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value (e.g., a user-defined value).
[0195] In some embodiments, the plurality of target nucleic acid sequences is generated by applying a sliding window (e.g., of about 18, 20, 22, 25, 27, or 30 nucleobases) across the template nucleic acid sequence, thereby generating the plurality of target nucleic acid sequences. In some embodiments, the plurality of target nucleic acid sequences is user supplied and need not be generated.
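The sliding-window generation described above can be sketched in a few lines of Python. The function name, the `step` parameter, and the 0-based start positions are assumptions made for the example. The sketch treats the template as linear, so windows that would wrap around a circular genome are not produced.

```python
def sliding_window_targets(template, window=20, step=1):
    """Yield (start, subsequence) for every full-length window across the template."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # A template of length n has n - window + 1 full windows at step 1.
    for start in range(0, len(template) - window + 1, step):
        yield start, template[start:start + window]
```

Keeping the start position alongside each subsequence lets later steps map a selected target back to its location on the template.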
[0196] In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more programmable nucleases (e.g., the gene editing systems described herein). In certain embodiments, the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems. In certain embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence comprise a PAM sequence or are adjacent to a PAM sequence.
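PAM adjacency can be checked with a simple scan. The sketch below is an illustration for one specific and commonly used case that the text does not single out: the NGG PAM of Streptococcus pyogenes Cas9, with a 20-nucleotide protospacer immediately 5’ of the PAM and a blunt cut 3 base pairs upstream of the PAM. It scans the forward strand only, and the function name is hypothetical.

```python
import re


def find_ngg_targets(template, protospacer_len=20):
    """Return (start, protospacer, pam, cut_index) for forward-strand NGG PAMs.

    `cut_index` is a 0-based position between bases: the cut separates
    template[:cut_index] from template[cut_index:].
    """
    template = template.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", template):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start < 0:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        hits.append((start, template[start:pam_start], match.group(1), pam_start - 3))
    return hits
```

Other nucleases use different PAMs and cut offsets, so both the pattern and the offset of 3 would need to be changed accordingly. A complete scan would also search the reverse complement.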
[0197] In some embodiments, the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides. In some embodiments, the first target nucleic acid sequence is located within a first gene of the template polynucleotide molecule and the second target nucleic acid sequence is located within a second gene of the template polynucleotide molecule. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are located within two or more genes of the template polynucleotide molecule. In some embodiments, the first target nucleic acid sequence is located within a first protein coding region of the template polynucleotide molecule and the second target nucleic acid sequence is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are located within two or more protein coding regions of the template polynucleotide molecule.
[0198] In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first target nucleic acid sequence or cut site associated with the first target nucleic acid sequence and the second target nucleic acid sequence or cut site associated with the second target nucleic acid sequence. In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first target nucleic acid sequence or cut site associated with the first target nucleic acid sequence and the second target nucleic acid sequence or cut site associated with the second target nucleic acid sequence.
[0199] In some embodiments, cutting within or proximal to the first target nucleic acid sequence and cutting within or proximal to the second target nucleic acid sequence results in a deletion (e.g., excision) between the first target nucleic acid sequence and the second target nucleic acid sequence. In some embodiments, cutting within or proximal to the first target nucleic acid sequence and cutting within or proximal to the second target nucleic acid sequence results in (i) microhomology-mediated end joining (MMEJ) of a region within or proximal to the first target nucleic acid sequence and a region within or proximal to the second target nucleic acid sequence, and/or (ii) a deletion in the template polynucleotide molecule. In some embodiments, the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the deletion removes one or more genes within the template polynucleotide molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene.
[0200] In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the template polynucleotide molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
Digital Processing Device
[0201] The digital processing device includes an operating system configured to perform executable instructions. The operating system is, for example, software, including programs and data, which manages the device's hardware and provides services for execution of applications. Those of skill in the art will recognize that suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®. Those of skill in the art will recognize that suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®. In some embodiments, the operating system is provided by cloud computing.
[0202] The device includes a storage and/or memory device. The storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis. In some embodiments, the device is volatile memory and requires power to maintain stored information. In some embodiments, the volatile memory comprises dynamic random-access memory (DRAM). In some embodiments, the device is non-volatile memory and retains stored information when the digital processing device is not powered. The non-volatile memory may comprise flash memory, ferroelectric random-access memory (FRAM), phase-change random-access memory (PRAM), or the like. In other embodiments, the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing-based storage, and the like. In various embodiments, the storage and/or memory device is a combination of devices such as those disclosed herein.
[0203] The digital processing device optionally includes a display to send visual information to a user. Suitable displays include liquid crystal displays (LCD), thin-film transistor liquid crystal displays (TFT-LCD), organic light emitting diode (OLED) displays (including passive-matrix OLED (PMOLED) and active-matrix OLED (AMOLED) displays), plasma displays, video projectors, and head-mounted displays (such as a VR headset) in communication with the digital processing device. Suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like. In various embodiments, the display is a combination of devices such as those disclosed herein.
[0204] The digital processing device optionally includes one or more input devices to receive information from a user. Suitable input devices include keyboards, pointing devices (including, by way of non-limiting examples, a mouse, a trackball, a track pad, a joystick, a game controller, and a stylus), touch screens or multi-touch screens, microphones to capture voice or other sound input, and video cameras or other sensors to capture motion or visual input. In particular embodiments, the input device is a Kinect, Leap Motion, or the like. In various embodiments, the input device is a combination of devices such as those disclosed herein.
[0205] Referring to FIG. 10, in a particular embodiment, an exemplary digital processing device 401 is programmed or otherwise configured to assemble short-read DNA sequences into fully phased complete genomic sequences. The device 401 can regulate various aspects of the sequence assembly methods of the present disclosure, such as, for example, performing initial alignments, quality checking, performing subsequent alignments, resolving ambiguity, and phasing heterozygous loci. In this embodiment, the digital processing device 401 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 405, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The digital processing device 401 also includes memory or memory location 410 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 415 (e.g., hard disk), communication interface 420 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 425, such as cache, other memory, data storage and/or electronic display adapters. The memory 410, storage unit 415, interface 420 and peripheral devices 425 are in communication with the CPU 405 through a communication bus (solid lines), such as a motherboard. The storage unit 415 can be a data storage unit (or data repository) for storing data. The digital processing device 401 can be operatively coupled to a computer network (“network”) 430 with the aid of the communication interface 420. The network 430 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 430 in some embodiments is a telecommunication and/or data network. The network 430 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 430, in some embodiments with the aid of the device 401, can implement a peer-to-peer network, which may enable devices coupled to the device 401 to behave as a client or a server.
[0206] Continuing to refer to FIG. 10, the CPU 405 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 410. The instructions can be directed to the CPU 405, which can subsequently program or otherwise configure the CPU 405 to implement methods of the present disclosure. Examples of operations performed by the CPU 405 can include fetch, decode, execute, and write back. The CPU 405 can be part of a circuit, such as an integrated circuit. One or more other components of the device 401 can be included in the circuit. In some embodiments, the circuit is an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
[0207] Continuing to refer to FIG. 10, the storage unit 415 can store files, such as drivers, libraries and saved programs. The storage unit 415 can store user data, e.g., user preferences and user programs. The digital processing device 401 in some embodiments can include one or more additional data storage units that are external, such as located on a remote server that is in communication through an intranet or the Internet.
[0208] Continuing to refer to FIG. 10, the digital processing device 401 can communicate with one or more remote computer systems through the network 430. For instance, the device 401 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
[0209] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the digital processing device 401, such as, for example, on the memory 410 or electronic storage unit 415. The machine executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 405. In some embodiments, the code can be retrieved from the storage unit 415 and stored on the memory 410 for ready access by the processor 405. In some situations, the electronic storage unit 415 can be precluded, and machine-executable instructions are stored on memory 410.
Non-Transitory Computer Readable Storage Medium
[0210] The methods, systems, and media disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device. In some embodiments, a computer readable storage medium is a tangible component of a digital processing device. In other embodiments, a computer readable storage medium is optionally removable from a digital processing device. A computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like. In some embodiments, the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
Computer Program
[0211] The methods, systems, and media disclosed herein include at least one computer program, or use of the same. A computer program includes a sequence of instructions, executable in the digital processing device's CPU, written to perform a specified task. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
[0212] The functionality of the computer readable instructions may be combined or distributed as desired in various environments. In some embodiments, a computer program comprises one sequence of instructions. In other embodiments, a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules. In various implementations, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
Standalone Application
[0213] In some embodiments, a computer program includes a standalone application, which is a program that is run as an independent computer process, not an add-on to an existing process, e.g., not a plug-in. Those of skill in the art will recognize that standalone applications are often compiled. A compiler is a computer program(s) that transforms source code written in a programming language into binary object code such as assembly language or machine code. Suitable compiled programming languages include, by way of non-limiting examples, C, C++, Objective-C, COBOL, Delphi, Eiffel, Java™, Lisp, Python™, Visual Basic, and VB .NET, or combinations thereof. Compilation is often performed, at least in part, to create an executable program. In some embodiments, a computer program includes one or more executable compiled applications.
Software Modules
[0214] The methods, systems, and media disclosed herein include software, server, and/or database modules, or use of the same. In view of the disclosure provided herein, software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art. The software modules disclosed herein are implemented in a multitude of ways. In various implementations, a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof. In other various implementations, a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof. By way of non-limiting examples, the one or more software modules comprise a web application, a mobile application, and a standalone application. In some embodiments, software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In particular embodiments, software modules are hosted on one or more cloud computing platforms and/or services. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
Databases
[0215] The methods, systems, and media disclosed herein include one or more databases, or use of the same. In view of the disclosure provided herein, those of skill in the art will recognize that many databases are suitable for storage and retrieval of sequence and graph information. Suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object-oriented databases, object databases, entity-relationship model databases, associative databases, and XML databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, and Sybase. In some embodiments, a database is internet-based. In further embodiments, a database is web-based. In still further embodiments, a database is cloud computing-based. In other embodiments, a database is based on one or more local computer storage devices.
Methods for Excision
[0216] A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) identifying a first cleavable region and a second cleavable region having microhomology by: (i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and (ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
(b) cleaving the template nucleic acid molecule at the first cleavable region on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
[0217] (c) cleaving the template nucleic acid molecule at the second cleavable region on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise the microhomology.
[0218] Provided herein are methods of excising a template polynucleotide molecule, the method comprising:
(a) generating microhomology data for a plurality of cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases that are 5’ and/or 3’) to the cut sites;
(b) identifying a first cut site and a second cut site using the microhomology data, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology; and
(c) cutting the template nucleic acid at the first cut site and the second cut site, thereby excising a segment of the template polynucleotide between the first cut site and the second cut site.
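By way of non-limiting illustration, steps (a) and (b) above can be sketched in Python as follows. The sketch assumes that microhomology is measured as the longest identical stretch shared by the bases immediately 5’ of the first cut site and the bases immediately 3’ of the second cut site on the same strand, i.e., the two ends that face each other once the intervening segment is removed. That measure, the function names `longest_common_substring` and `microhomology_pairs`, and the `window` and `min_len` parameters are assumptions made for this example and are not taken from the disclosure.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring present in both a and b (first found on ties)."""
    best = ""
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


def microhomology_pairs(seq: str, cut_sites, window: int = 10, min_len: int = 3):
    """List (first_cut, second_cut, shared_bases) for pairs of cut sites.

    A cut site is given as the index of the first base 3' of the cut.
    Only pairs whose flanking windows share at least min_len bases are kept.
    """
    results = []
    for first, second in combinations(sorted(cut_sites), 2):
        left_flank = seq[max(0, first - window):first]    # 5' of the first cut
        right_flank = seq[second:second + window]         # 3' of the second cut
        shared = longest_common_substring(left_flank, right_flank)
        if len(shared) >= min_len:
            results.append((first, second, shared))
    return results


# Hypothetical template: a 20-base spacer separates two flanks sharing "GCTAGC".
template = "ACGTGCTAGC" + "T" * 20 + "GCTAGCAATT"
pairs = microhomology_pairs(template, [10, 20, 30])
```

In this hypothetical template, only the pair of cut sites at positions 10 and 30 passes the `min_len` filter. Step (c), the cutting itself, is a wet-lab operation and is outside the scope of the sketch.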
[0219] Provided herein are methods of excising a template polynucleotide molecule, the method comprising:
(a) generating or providing a template nucleic acid sequence (e.g., a viral genome sequence);
(b) identifying or providing a plurality of cut sites within the template nucleic acid sequence;
(c) generating positional data for a cut site of the plurality of cut sites using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the cut site;
(d) generating microhomology data using the positional data, and identifying a first cut site and a second cut site, wherein nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise microhomology; and
(e) cutting the template nucleic acid at the first cut site and the second cut site, thereby excising a segment of the template polynucleotide between the first cut site and the second cut site.
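Steps (b) and (c) above, identifying cut sites and generating positional data for each, can be sketched in Python as follows. The sketch is a non-limiting example that assumes an SpCas9-style nuclease recognizing an NGG PAM on the forward strand and cutting 3 base pairs 5’ of the PAM; other nucleases, PAMs, and the reverse strand would need their own rules. The `CutSite` record, the `find_cut_sites` function, and the `flank` and `spacer_len` parameters are hypothetical names introduced here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class CutSite:
    position: int    # index of the first base 3' of the cut
    upstream: str    # bases immediately 5' of the cut
    downstream: str  # bases immediately 3' of the cut


def find_cut_sites(seq: str, flank: int = 10, spacer_len: int = 20):
    """Return positional data for forward-strand NGG-adjacent cut sites.

    Assumption for this sketch: the cut falls 3 bp 5' of the PAM, and a site
    is reported only if a full spacer_len-base protospacer precedes the PAM.
    """
    seq = seq.upper()
    sites = []
    # A zero-width lookahead reports overlapping PAM occurrences as well.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence 5' of the PAM for a protospacer
        cut = pam_start - 3
        sites.append(
            CutSite(
                position=cut,
                upstream=seq[max(0, cut - flank):cut],
                downstream=seq[cut:cut + flank],
            )
        )
    return sites


# Hypothetical input: one PAM ("TGG") preceded by a 20-base protospacer.
example_sites = find_cut_sites("A" * 20 + "TGG" + "C" * 5)
```

The `upstream` and `downstream` fields correspond to the nucleobase sequences proximal to the cut site recited in step (c), and can be passed to whatever microhomology comparison is used in step (d).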
[0220] Provided herein are methods of excising a template polynucleotide molecule, the method comprising:
(a) generating, by one or more computers, positional data for a first cut site and a second cut site within a template nucleic acid sequence, wherein the positional data comprises (i) the location of the cut site within the template and/or (ii) nucleobase sequences of nucleobase positions proximal (e.g., at least two nucleobases 5’ and/or 3’) to the first cut site and the second cut site;
(b) generating, by one or more computers, microhomology data using the positional data, wherein the microhomology data identifies a degree of microhomology between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; and
(c) cutting the template nucleic acid at the first cut site and the second cut site, thereby excising a segment of the template polynucleotide between the first cut site and the second cut site.
[0221] In some embodiments, the method further comprises identifying a first target sequence within or adjacent to the first cut site and a second target sequence within or adjacent to the second cut site. In some embodiments, the nucleobases proximal to the first cut site and nucleobases proximal to the second cut site comprise at least 2 nucleobase positions, at least 5 nucleobase positions, at least 7 nucleobase positions, at least 10 nucleobase positions, at least 15 nucleobase positions, at least 20 nucleobase positions, or at least 30 nucleobase positions.
[0222] In some embodiments, the microhomology data is a function of:
(i) total nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site;
(ii) the length (e.g., number) of nucleobase complementarity of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site;
(iii) nucleobase complementarity at a 5’ edge (e.g., the at least two nucleobase positions at a 5’ terminus) or a 3’ edge of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site;
(iv) melting temperature of nucleobases proximal to the first cut site and nucleobases proximal to the second cut site;
(v) base content of complementary nucleobases between nucleobases proximal to the first cut site and nucleobases proximal to the second cut site; or
(vi) a combination of (i)-(v).
[0223] In some embodiments, the first cut site and the second cut site are cleavable by one or more programmable nucleases. In some embodiments, the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof. In some embodiments, the template polynucleotide is in a cell. In some embodiments, the template polynucleotide is a genome. In some embodiments, the genome is a viral genome.
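Returning to factors (ii), (iv), and (v) of paragraph [0222], one hypothetical way to combine them into a single microhomology value is sketched below in Python. The linear weighting, the weight values, and the function names are assumptions chosen for illustration; the disclosure does not specify a formula. Melting temperature is approximated with the Wallace rule for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a rough estimate.

```python
def gc_fraction(bases: str) -> float:
    """Fraction of G or C bases; 0.0 for an empty string."""
    if not bases:
        return 0.0
    return (bases.count("G") + bases.count("C")) / len(bases)


def wallace_tm(bases: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule."""
    at = bases.count("A") + bases.count("T")
    gc = bases.count("G") + bases.count("C")
    return 2 * at + 4 * gc


def microhomology_score(bases: str, w_len: float = 1.0,
                        w_gc: float = 2.0, w_tm: float = 0.1) -> float:
    """Combine length, GC content, and approximate Tm into one value.

    The weights are hypothetical placeholders, not values from the disclosure.
    """
    bases = bases.upper()
    return (w_len * len(bases)
            + w_gc * gc_fraction(bases)
            + w_tm * wallace_tm(bases))


# Example: score a hypothetical 6-base microhomology.
example_score = microhomology_score("GCTAGC")
```

A weighted sum is used here only because it is simple to inspect; any monotonic combination of the listed factors, or a model fitted to experimental excision data, could take its place.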
[0224] In some embodiments, the method further comprises identifying a first target nucleic acid sequence comprising or adjacent to the first cut site and a second target nucleic acid sequence comprising or adjacent to the second cut site.
[0225] In some embodiments, the method further comprises generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are further generated by using the additional data associated with the first target nucleic acid sequence and the second target nucleic acid sequence to generate a score, wherein the score is above a threshold value.
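As a non-limiting example of the positional entropy data mentioned above, the following Python sketch computes the Shannon entropy of each column of a set of pre-aligned, equal-length sequences (for instance, the same target site taken from several viral isolates). The function names, the use of the mean entropy as a summary, and the `max_entropy` cut-off are assumptions for illustration only; the disclosure does not state how entropy enters the score.

```python
import math
from collections import Counter


def positional_entropy(aligned):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for column in zip(*aligned):
        total = len(column)
        probs = [count / total for count in Counter(column).values()]
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return entropies


def is_conserved(aligned, max_entropy: float = 0.5) -> bool:
    """Hypothetical filter: keep a target whose mean column entropy is low."""
    entropies = positional_entropy(aligned)
    return sum(entropies) / len(entropies) <= max_entropy


# Hypothetical alignment of one 4-base window across four isolates.
window = ["ACGT", "ACGA", "ACCA", "ACCT"]
column_entropy = positional_entropy(window)
```

Columns that are identical across all sequences have an entropy of 0 bits, while a column split evenly between two bases has an entropy of 1 bit, so lower values indicate better conserved positions.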
[0226] In some embodiments, the first cut site and the second cut site are cleavable by one or more programmable nucleases. In some embodiments, the one or more programmable nucleases comprise a CRISPR-Cas system, a meganuclease system, a TALEN system, a ZFN system, or a combination thereof. In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are targetable by (e.g., capable of being cleaved by) one or more CRISPR-Cas systems. In some embodiments, the first cut site and the second cut site are adjacent to a PAM sequence. In some embodiments, the microhomology comprises at least 2, at least 5, at least 10, at least 15, or at least 20 complementary nucleotides.
[0227] In some embodiments, the first cut site is located within a first gene of the template polynucleotide molecule and the second cut site is located within a second gene of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more genes of the template polynucleotide molecule. In some embodiments, the first cut site is located within a first protein coding region of the template polynucleotide molecule and the second cut site is located within a second protein coding region of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are located within two or more protein coding regions of the template polynucleotide molecule. In some embodiments, the first cut site and the second cut site are identical or substantially identical (e.g., greater than 75% sequence identity). In some embodiments, the microhomology is located at the terminus (e.g., the 3’ end) of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site.
[0228] In some embodiments, the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cut site) to the terminus of the first cut site or cut site associated with the first cut site and the second cut site or cut site associated with the second cut site. In some embodiments, cutting at the first cut site and cutting at the second cut site results in a deletion (e.g., excising) between the first cut site and the second cut site.
[0229] In some embodiments, cutting at the first cut site and cutting at the second cut site results in (i) microhomology-mediated end joining (MMEJ) of the first cut site and the second cut site, and/or (ii) a deletion in the template polynucleotide molecule. In some embodiments, the deletion comprises at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the deletion removes one or more genes within the template polynucleotide molecule. In some embodiments, the deletion removes at least about 1, 2, 3, 4, or 5 genes within the template polynucleotide molecule. In some embodiments, the deletion is a full deletion of a gene or a partial deletion of a gene.
[0230] In some embodiments, the first target nucleic acid sequence and the second target nucleic acid sequence are separated by a distance of at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs. In some embodiments, the template polynucleotide molecule is in a cell. In some embodiments, the cell is in an individual. In some embodiments, the individual is a human. In some embodiments, the template polynucleotide molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated). In some embodiments, the template polynucleotide molecule is an episomal or integrated genome exogenous to a host cell.
Exemplary Embodiments
[0231] Exemplary embodiment 1: A composition, comprising:
(a) a first gene editing system, wherein: the first gene editing system is configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second gene editing system, wherein: the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0232] Exemplary embodiment 2: The composition of embodiment 1, wherein the first target site and the second target site are different.
[0233] Exemplary embodiment 3: The composition of embodiment 1 or 2, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
[0234] Exemplary embodiment 4: The composition of any one of embodiments 1 to 3, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
[0235] Exemplary embodiment 5: The composition of any one of embodiments 1 to 3, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
[0236] Exemplary embodiment 6: The composition of any one of embodiments 1 to 5, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0237] Exemplary embodiment 7: The composition of any one of embodiments 1 to 6, wherein the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
[0238] Exemplary embodiment 8: The composition of any one of embodiments 1 to 7, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
[0239] Exemplary embodiment 9: The composition of any one of embodiments 1 to 8, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
[0240] Exemplary embodiment 10: The composition of any one of embodiments 1 to 9, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
[0241] Exemplary embodiment 11: The composition of any one of embodiments 1 to 10, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
[0242] Exemplary embodiment 12: The composition of any one of embodiments 1 to 11, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0243] Exemplary embodiment 13: The composition of any one of embodiments 1 to 11, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0244] Exemplary embodiment 14: The composition of any one of embodiments 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
[0245] Exemplary embodiment 15: The composition of any one of embodiments 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
[0246] Exemplary embodiment 16: The composition of any one of embodiments 14 to 15, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0247] Exemplary embodiment 17: The composition of any one of embodiments 14 to 16, wherein the deletion removes one or more genes within the template nucleic acid molecule, wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
[0248] Exemplary embodiment 18: The composition of any one of embodiments 14 to 17, wherein the deletion comprises an inversion.
[0249] Exemplary embodiment 19: The composition of any one of embodiments 1 to 18, wherein the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0250] Exemplary embodiment 20: The composition of any one of embodiments 1 to 19, wherein the template nucleic acid molecule is in a cell.
[0251] Exemplary embodiment 21: The composition of embodiment 20, wherein the cell is in an individual.
[0252] Exemplary embodiment 22: The composition of embodiment 21, wherein the individual is a human.
[0253] Exemplary embodiment 23: The composition of any one of embodiments 1 to 22, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
[0254] Exemplary embodiment 24: The composition of any one of embodiments 1 to 24, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0255] Exemplary embodiment 25: A composition, comprising:
(a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising: (i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and (ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second CRISPR-associated endonuclease system comprising (i) a second guide ribonucleic acid (gRNA) comprising a second spacer sequence that hybridizes to a second target site on the template nucleic acid molecule, and (ii) a second Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein:
[0256] the second CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second target site and generates a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0257] Exemplary embodiment 26: The composition of embodiment 25, wherein the first gRNA and the second gRNA are different.
[0258] Exemplary embodiment 27: The composition of embodiment 25 or 26, wherein the microhomology comprises three or more complementary nucleotides having a GC (guanine or cytosine) content greater than or equal to 50%.
[0259] Exemplary embodiment 28: The composition of any one of embodiments 25 to 27, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
[0260] Exemplary embodiment 29: The composition of any one of embodiments 25 to 28, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
[0261] Exemplary embodiment 30: The composition of any one of embodiments 25 to 29, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0262] Exemplary embodiment 31: The composition of any one of embodiments 25 to 30, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
[0263] Exemplary embodiment 32: The composition of any one of embodiments 25 to 31, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
[0264] Exemplary embodiment 33: The composition of any one of embodiments 25 to 32, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
[0265] Exemplary embodiment 34: The composition of any one of embodiments 25 to 33, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
[0266] Exemplary embodiment 35: The composition of any one of embodiments 25 to 34, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0267] Exemplary embodiment 36: The composition of any one of embodiments 25 to 34, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0268] Exemplary embodiment 37: The composition of any one of embodiments 25 to 36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
[0269] Exemplary embodiment 38: The composition of any one of embodiments 25-36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
[0270] Exemplary embodiment 39: The composition of any one of embodiments 37 or 38, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0271] Exemplary embodiment 40: The composition of any one of embodiments 37 to 39, wherein the deletion removes one or more genes within the template nucleic acid molecule, and wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
[0272] Exemplary embodiment 41: The composition of any one of embodiments 37 to 40, wherein the deletion comprises an inversion.
[0273] Exemplary embodiment 42: The composition of any one of embodiments 37 to 41, wherein the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0274] Exemplary embodiment 43: The composition of any one of embodiments 1 to 42, wherein the template nucleic acid molecule is in a cell.
[0275] Exemplary embodiment 44: The composition of embodiment 43, wherein the cell is in an individual.
[0276] Exemplary embodiment 45: The composition of embodiment 44, wherein the individual is a human.
[0277] Exemplary embodiment 46: The composition of any one of embodiments 1 to 45, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
[0278] Exemplary embodiment 47: The composition of any one of embodiments 1 to 46, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0279] Exemplary embodiment 48: Use of the composition of any one of embodiments 1 to 47 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
[0280] Exemplary embodiment 49: A nucleic acid vector encoding one or more components of the first gene editing system and/or the second gene editing system of any one of embodiments 1 to 47.
[0281] Exemplary embodiment 50: Use of the nucleic acid vector of embodiment 49 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
[0282] Exemplary embodiment 51: A nucleic acid vector encoding one or more components of the first CRISPR-Cas system and/or the second CRISPR-Cas system of any one of embodiments 25 to 47.
[0283] Exemplary embodiment 52: A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0284] Exemplary embodiment 53: The method of embodiment 52, wherein (a) comprises contacting the template nucleic acid with the first gene editing system of any one of embodiments 1-24 or the first CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the template nucleic acid molecule, and wherein (b) comprises contacting the template nucleic acid with the second gene editing system of any one of embodiments 1-24 or the second CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the template nucleic acid molecule.
[0285] Exemplary embodiment 54: The method of any one of embodiments 52 to 53, wherein the first target site and the second target site are different.
[0286] Exemplary embodiment 55: The method of any one of embodiments 52 to 54, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
[0287] Exemplary embodiment 56: The method of any one of embodiments 52 to 55, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
[0288] Exemplary embodiment 57: The method of any one of embodiments 52 to 55, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
[0289] Exemplary embodiment 58: The method of any one of embodiments 52 to 57, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0290] Exemplary embodiment 59: The method of any one of embodiments 52 to 58, wherein the first gene editing system and the second gene editing system are each selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
[0291] Exemplary embodiment 60: The method of any one of embodiments 52 to 59, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
[0292] Exemplary embodiment 61: The method of any one of embodiments 52 to 60, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
[0293] Exemplary embodiment 62: The method of any one of embodiments 52 to 61, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
[0294] Exemplary embodiment 63: The method of any one of embodiments 52 to 62, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
[0295] Exemplary embodiment 64: The method of any one of embodiments 52 to 63, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0296] Exemplary embodiment 65: The method of any one of embodiments 52 to 64, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
[0297] Exemplary embodiment 66: The method of any one of embodiments 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating a deletion in the template nucleic acid molecule.
[0298] Exemplary embodiment 67: The method of any one of embodiments 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
[0299] Exemplary embodiment 68: The method of any one of embodiments 66 to 67, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0300] Exemplary embodiment 69: The method of any one of embodiments 66 to 68, wherein the deletion removes one or more genes within the template nucleic acid molecule.
[0301] Exemplary embodiment 70: The method of any one of embodiments 66 to 69, wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
[0302] Exemplary embodiment 71: The method of any one of embodiments 52 to 70, wherein the first target site and the second target site are separated by a distance of at least 50 base pairs (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
[0303] Exemplary embodiment 72: The method of any one of embodiments 52 to 71, wherein the template nucleic acid molecule is in a cell.
[0304] Exemplary embodiment 73: The method of embodiment 72, wherein the cell is in an individual.
[0305] Exemplary embodiment 74: The method of embodiment 73, wherein the individual is a human.
[0306] Exemplary embodiment 75: The method of any one of embodiments 52 to 74, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
[0307] Exemplary embodiment 76: The method of any one of embodiments 52 to 75, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
Exemplary embodiment 77: A method of inactivating a virus, comprising:
[0308] (a) cleaving a viral nucleic acid molecule at a first target site on the viral nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the viral nucleic acid molecule at a second target site on the viral nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
[0309] Exemplary embodiment 78: The method of embodiment 77, wherein (a) comprises contacting the viral nucleic acid with the first gene editing system of any one of embodiments 1-24 or the first CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the viral nucleic acid molecule, and wherein (b) comprises contacting the viral nucleic acid with the second gene editing system of any one of embodiments 1-24 or the second CRISPR-associated nuclease system of any one of embodiments 25-47 and cleaving the viral nucleic acid molecule.
[0310] Exemplary embodiment 79: The method of any one of embodiments 77 to 78, wherein the first target site and the second target site are different.
[0311] Exemplary embodiment 80: The method of any one of embodiments 77 to 79, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
Exemplary embodiment 81: The method of any one of embodiments 77 to 80, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
[0312] Exemplary embodiment 82: The method of any one of embodiments 77 to 80, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
[0313] Exemplary embodiment 83: The method of any one of embodiments 77 to 82, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
[0314] Exemplary embodiment 84: The method of any one of embodiments 77 to 83, wherein the viral nucleic acid molecule is in a cell.
[0315] Exemplary embodiment 85: The method of embodiment 84, wherein the cell is in an individual.
[0316] Exemplary embodiment 86: The method of embodiment 85, wherein the individual is a human.
[0317] Exemplary embodiment 87: The method of any one of embodiments 77 to 86, wherein the viral nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
[0318] Exemplary embodiment 88: The method of any one of embodiments 77 to 86, wherein the viral nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
[0319] Exemplary embodiment 89: A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(b) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
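By way of non-limiting illustration, steps (a) and (b) of embodiment 89 can be sketched in Python as follows. The function names, the 10-nucleotide flank, the use of the longest identical same-strand tract as the measure of microhomology, and the toy template are assumptions made for this sketch and are not taken from the disclosure.

```python
from itertools import combinations


def cleavable_region(template, cut_site, flank=10):
    """Return the bases within `flank` positions on either side of a cut site."""
    return template[max(0, cut_site - flank):cut_site + flank]


def longest_shared_tract(a, b):
    """Longest sequence found in both regions (dynamic programming)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def microhomology_data(template, cut_sites, flank=10):
    """Map each pair of cut sites to the longest tract their regions share."""
    regions = {c: cleavable_region(template, c, flank) for c in cut_sites}
    return {
        (c1, c2): longest_shared_tract(regions[c1], regions[c2])
        for c1, c2 in combinations(sorted(cut_sites), 2)
    }


def identify_pair(data, min_len=3):
    """Pick the pair with the longest shared tract, or None below min_len."""
    pair, tract = max(data.items(), key=lambda kv: len(kv[1]))
    return (pair, tract) if len(tract) >= min_len else None


# Toy template (hypothetical): three 20-nt regions; the first and third
# share the tract "GCGCC", the middle region shares only single bases.
template = ("AAAAAAAGCGCCAAAAAAAA"
            "CTCTCTCTCTCTCTCTCTCT"
            "GGGGGGGGCGCCGGGGGGGG")
data = microhomology_data(template, [10, 30, 50])
best = identify_pair(data)
```

In this toy case the pair of cut sites at positions 10 and 50 is selected. A practical implementation would likely replace the single longest-tract measure with a richer score and would also consider the reverse strand.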
[0320] Exemplary embodiment 90: A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence);
(b) identifying or providing, by one or more computers, a plurality of cleavable regions comprising cut sites within the template nucleic acid sequence;
(c) generating, by one or more computers, positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
(d) generating, by one or more computers, microhomology data for the plurality of cleavable regions using the positional data, and identifying a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
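By way of non-limiting illustration, steps (b) and (c) of embodiment 90 can be sketched in Python for a single nuclease. The sketch assumes an SpCas9-like nuclease that recognizes an NGG PAM and cuts three nucleotides 5’ of the PAM, scans only the forward strand, and uses hypothetical function and field names; other nucleases would require different PAM and cut-position rules.

```python
import re


def find_cut_sites(template, spacer_len=20, cut_offset=3):
    """Forward-strand NGG PAMs and the cut index 3 nt 5' of each PAM.

    Assumes an SpCas9-like nuclease; other nucleases need other rules.
    """
    sites = []
    # A lookahead finds overlapping PAMs such as the two in "AGGG".
    for match in re.finditer(r"(?=[ACGT]GG)", template):
        pam_start = match.start()
        if pam_start >= spacer_len:  # room for a full-length spacer
            sites.append({
                "cut_site": pam_start - cut_offset,
                "spacer": template[pam_start - spacer_len:pam_start],
                "pam": template[pam_start:pam_start + 3],
            })
    return sites


def positional_data(template, site, flank=10):
    """Location and sequence of the cleavable region around one cut site."""
    start = max(0, site["cut_site"] - flank)
    end = min(len(template), site["cut_site"] + flank)
    return {
        "cut_site": site["cut_site"],
        "region_start": start,
        "region_end": end,
        "region_sequence": template[start:end],
    }


# Hypothetical 33-nt template containing a single TGG PAM.
template = "A" * 20 + "TGG" + "A" * 10
sites = find_cut_sites(template)
records = [positional_data(template, s) for s in sites]
```

Each record carries the three kinds of positional data recited in step (c): the location of the cleavable region, the cut site within it, and the nucleobase sequence of the region.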
[0321] Exemplary embodiment 91: The computer-implemented method of any one of embodiments 89-90, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
[0322] Exemplary embodiment 92: The computer-implemented method of any one of embodiments 89-91, wherein a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
[0323] Exemplary embodiment 93: The computer-implemented method of any one of embodiments 89-92, wherein the microhomology data is a function of:
(i) total nucleobase complementarity of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(ii) the length (e.g., number of contiguous nucleobases) of complementary nucleobases of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iii) GC content of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iv) orientation and/or strand location (e.g., for identifying inversion outcomes);
(v) base content of complementary nucleobases between nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region; or
(vi) a combination of (i)-(v).
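By way of non-limiting illustration, microhomology data that is a function of tract length and GC content can be sketched in Python as follows. The record fields and the scoring weights (2 for each G or C, 1 for each A or T) are assumptions chosen for this sketch; the thresholds of three nucleotides and 50% GC content follow embodiment 91.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def microhomology_record(tract, min_len=3, min_gc=0.5):
    """Summarize one shared tract between two cleavable regions."""
    tract = tract.upper()
    gc = gc_fraction(tract)
    return {
        "tract": tract,
        "length": len(tract),       # number of contiguous shared bases
        "gc_fraction": gc,          # GC content of the shared tract
        "meets_threshold": len(tract) >= min_len and gc >= min_gc,
        # Illustrative weighting only: G/C count 2, A/T count 1.
        "score": sum(2 if base in "GC" else 1 for base in tract),
    }


# Rank three hypothetical tracts from highest to lowest score.
records = [microhomology_record(t) for t in ("ATA", "GCGCC", "GCAT")]
ranked = sorted(records, key=lambda r: r["score"], reverse=True)
```

Orientation and strand location, factor (iv), are not modeled here; one possible extension is to score the reverse complement of one region separately in order to flag candidate inversion outcomes.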
[0324] Exemplary embodiment 94: The computer-implemented method of any one of embodiments 89-93, wherein the template nucleic acid sequence comprises a consensus sequence, and wherein the computer-implemented method comprises in (a) generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
[0325] Exemplary embodiment 95: The computer-implemented method of any one of embodiments 89-94, wherein the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
[0326] Exemplary embodiment 96: The computer-implemented method of any one of embodiments 89-95, wherein the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
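By way of non-limiting illustration, a consensus sequence can be derived from input sequences that have already been aligned to equal length by taking the most frequent base in each alignment column. The function name and the majority-vote rule are assumptions made for this sketch; the alignment itself (e.g., with an external aligner) is outside its scope.

```python
from collections import Counter


def consensus(aligned):
    """Majority base per column of equal-length, pre-aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("input sequences must be aligned to equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


# Three hypothetical aligned inputs.
inputs = ["ACGT", "AGGA", "TCGA"]
template = consensus(inputs)
```

Here the consensus is "ACGA", which differs from every one of the three inputs. Ties within a column are resolved by whichever base appears first among the inputs, which a practical implementation may wish to handle explicitly (e.g., with an ambiguity code).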
[0327] Exemplary embodiment 97: The computer-implemented method of any one of embodiments 89-95, wherein the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
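By way of non-limiting illustration, positional entropy data can be computed as the Shannon entropy, in bits, of the bases observed in each column of a set of aligned input sequences. The function name and the example alignment are assumptions made for this sketch.

```python
import math
from collections import Counter


def positional_entropy(aligned):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    entropies = []
    for column in zip(*aligned):
        total = len(column)
        counts = Counter(column)
        # H = -sum(p * log2(p)) over the bases observed in this column.
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies


# Hypothetical alignment: columns 1-2 are conserved, columns 3-4 are split 50/50.
entropy = positional_entropy(["ACGT", "ACGA", "ACTA", "ACTT"])
```

A fully conserved position has an entropy of 0 bits and a position split evenly between two bases has an entropy of 1 bit, so lower values indicate positions less likely to vary across the input sequences.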
[0328] Exemplary embodiment 98: The computer-implemented method of any one of embodiments 89-97, further comprising: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, gene location (e.g., coding region data) data for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
[0329] Exemplary embodiment 99: The computer-implemented method of any one of embodiments 58-94, wherein the method further comprises identifying a first target site sequence comprising or adjacent to the first cut site and a second target site sequence comprising or adjacent to the second cut site.
[0330] Exemplary embodiment 100: The computer-implemented method of embodiment 99, further comprising: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location (e.g., coding region data) data, distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
[0331] Exemplary embodiment 101: A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
[0332] (a) identifying a first cleavable region and a second cleavable region having microhomology by:
(i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
(b) cleaving the template nucleic acid molecule at the first cleavable region on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(c) cleaving the template nucleic acid molecule at the second cleavable region on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise the microhomology.
[0333] The determination of percent identity or percent similarity between two sequences can be accomplished using a mathematical algorithm. A non-limiting example of a mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul et al., 1990, J. Mol. Biol. 215:403-410. Alternatively, PSI-Blast can be used to perform an iterated search which detects distant relationships between molecules. When utilizing BLAST, Gapped BLAST, and PSI-Blast programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. Another preferred, non-limiting example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0) which is part of the GCG sequence alignment software package. Additional algorithms for sequence analysis are known in the art and include ADVANCE and ADAM as described in Torelli and Robotti, 1994, Comput. Appl. Biosci. 10:3-5; and FASTA described in Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85:2444-8. Alternatively, sequence alignment may be carried out using the CLUSTAL algorithm (e.g., as provided in the program Clustal-omega), as described by Higgins et al., 1996, Methods Enzymol. 266:383-402.
[0334] As used herein, the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “include” and “includes”) or “containing” (and any form of containing, such as “contain” and “contains”), are inclusive or open-ended and do not exclude additional, unrecited elements or process steps. As also used herein, in any instance or embodiment described herein, “comprising” may be replaced with “consisting essentially of” and/or “consisting of”. As used herein, in any instance or embodiment described herein, “comprises” may be replaced with “consists essentially of” and/or “consists of”.
[0335] As used herein, the term “about” in the context of a given value or range includes and/or refers to a value or range that is within 20%, within 10%, and/or within 5% of the given value or range.
[0336] As used herein, the term “and/or” is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example, “A and/or B” is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each were set out individually herein.
[0337] As used herein, a “sample” includes and/or refers to any fluid or liquid sample which is being analyzed in order to detect and/or quantify an analyte. In some embodiments, a sample is a biological sample. Examples of samples include without limitation a bodily fluid, an extract, a solution containing proteins and/or DNA, a cell extract, a cell lysate, or a tissue lysate. Non-limiting examples of bodily fluids include urine, saliva, blood, serum, plasma, cerebrospinal fluid, tears, semen, sweat, pleural effusion, liquified fecal matter, and lacrimal gland secretion.
EXAMPLES
Example 1
[0338] Exemplified herein is MMEJ-mediated excision of template nucleic acids (e.g., viral sequences) by cutting at least two different target sites, wherein the cutting generates a first and a second cleaved region having microhomology. Generally, MMEJ-mediated deletions are considered to be limited to indels at single cut sites having smaller distances (e.g., 15 nucleotides) between microhomologous sequences. Moreover, MMEJ prediction algorithms generally reduce MMEJ predictions as a function of the distance between microhomologous sequences (e.g., reducing predicted MMEJ frequencies as the distance between microhomologous sequences increases). However, as exemplified below, MMEJ-mediated excision is achieved over large distances (e.g., >100 base pairs, >1,000 base pairs, etc.) separating a first and a second cleaved region.
[0339] In certain instances, the compositions and methods are useful for excising viral nucleic acids. Viral nucleic acids can present an increased challenge in targeting because they can generally be present in multiple copies in a cell, and can be integrated or episomal. In certain embodiments, using microhomology scoring to model the competing repair possibilities (including individual cut site repair, inversions, and/or excision) can allow choice of target sites that maximize the desired outcome in viral excision. In certain instances, choosing target sites (e.g., cleavable regions or guide RNAs) based on identifying and/or characterizing high microhomology can provide excision, inversion, or concatemerization outcomes that are substantially higher than expected from nonhomologous repair.
[0340] The examples include a representative number of various nucleases (e.g., different Cas enzymes) and target sequences (e.g., differing sets of genes targeted and/or guides used). The data shows that a higher MMEJ ranking (e.g., increased microhomology) is associated with increased frequencies of MMEJ- mediated excision between distant cleaved regions (e.g., upwards of 4,500 base pairs in some cases). Microhomology can be determined by various known methods, such as Microhomology-Predictor (Bae, S., Kweon, J., Kim, H. et al. Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 1 1 , 705- 706 (2014) and MENTHLI (Robust Activation of Microhomology-mediated End Joining for Precision Gene Editing Applications. Ata H, Ekstrom TL, Martinez- Galvez G, Mann CM, Dvornikov AV, Schaefbauer KJ, Ma AC, Dobbs D, Clark KJ, Ekker SC. PLOS Genetics 14(9): e1007652), inDelphi (Max W. Shen, Mandana Arbab, Jonathan Y. Hsu, Daniel Worstell, Sannie J. Culbertson, Olga Krabbe, Christopher A. Cassa, David R. Liu, David K. Gifford, and Richard I. Sherwood. "Predictable and precise template-free editing of pathogenic variants." Nature, 2018), ForCasT (Elrick H, Nelakuditi V, Clark G, Brudno M, Ramani AK, Nutter LM. FORCAST: a fully integrated and open source pipeline to design Cas-mediated mutagenesis experiments) Lindel, and MENdel (Gabriel Martinez-Galvez, Parnal Joshi, Iddo Friedberg, Armando Manduca, Stephen C Ekker, Deploying MMEJ using MENdel in precision gene editing applications for gene therapy and functional genomics, Nucleic Acids Research, Volume 49, Issue 1 , 1 1 January 2021 ), each of which are herein incorporated by reference for the application of determining and/or identifying microhomology. Generally, such methods for determining microhomology are a function of the number of complementary base pairs (e.g., microhomology length > 3 contiguous base pairs) and GC content (e.g., > 50% GC content). 
Notably, in certain instances for achieving MMEJ-mediated excision over larger distances (e.g., >100 base pairs, >1 ,000 base pairs, etc.) separating a first and a second cleaved region, the first and the second cleaved region have reduced, little, or no internal microhomology (i.e., within the first cleaved region or within the second cleaved region) when each cleaved region is independently considered (e.g., sequences 5’ and 3’ of the first cut site within the first cleaved region have reduced or no microhomology).
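As a non-limiting illustration of the two criteria named above (contiguous matching length and GC content), the following Python sketch lists short sequences shared between two cleaved regions. It is not the scoring method of any of the cited tools. The function names, the default thresholds (length of at least 3 and GC fraction of at least 0.5, following the claim language), and the input sequences are assumptions made for illustration only, and microhomology is modeled simply as an identical substring present in both regions on the same strand.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def shared_microhomologies(region_a: str, region_b: str,
                           min_len: int = 3, min_gc: float = 0.5) -> list:
    """List substrings found in both regions that pass the length and GC filters.

    Hypothetical helper: both thresholds are illustrative assumptions.
    Results are ordered longest first, then alphabetically.
    """
    a, b = region_a.upper(), region_b.upper()
    found = set()
    for length in range(min_len, min(len(a), len(b)) + 1):
        # All substrings of this length in the first region.
        kmers_a = {a[i:i + length] for i in range(len(a) - length + 1)}
        for j in range(len(b) - length + 1):
            kmer = b[j:j + length]
            if kmer in kmers_a and gc_fraction(kmer) >= min_gc:
                found.add(kmer)
    return sorted(found, key=lambda s: (-len(s), s))


# Made-up 10-nucleotide windows, not sequences from the disclosure.
candidates = shared_microhomologies("ATTGCCGTAA", "TTAGCCGATT")
```

In this sketch the shared sequence ATT is discarded because it contains no G or C, while GCCG and its sub-sequences are retained. A practical scoring scheme would additionally weigh length, position relative to the cut site, and competing repair outcomes.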
Example 1.1 - CasX + ICP0 and ICP27
[0341] HEK293FT cells with a stable reporter construct, containing regions of interest in the HSV genome, were treated with CasX2 nuclease and two guide RNAs, ICP0_g9 and ICP27_g9. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks. The CasX2 nuclease and the guide RNAs were expressed from two expression plasmids, with each plasmid containing a single copy of CasX2 nuclease and a single guide RNA. The plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
[0342] FIG. 4A shows a graphical representation of the cut sites found in HSV. FIG. 4B shows the microhomology (MH)/MMEJ rank and associated frequency of excision as measured by number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g9 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g9 cut site (e.g., within the ICP0 cleaved region). Excision of ~3,600 base pairs was observed. FIG. 4B aligns pairs of sequences, with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined; in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
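The excision product described for the aligned sequence pairs (one microhomology region and the intervening sequence removed, one copy of the microhomology retained) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, its arguments, and the toy reference sequence are hypothetical and are not taken from the figures.

```python
def mmej_excision_product(reference: str, microhomology: str,
                          first_start: int, second_start: int) -> str:
    """Return the sequence left after joining two copies of a microhomology.

    Hypothetical helper: removes the first copy of the microhomology and
    everything up to the second copy, so a single copy remains at the junction.
    Positions are 0-based start indices of each copy in the reference.
    """
    n = len(microhomology)
    if reference[first_start:first_start + n] != microhomology:
        raise ValueError("first position does not match the microhomology")
    if reference[second_start:second_start + n] != microhomology:
        raise ValueError("second position does not match the microhomology")
    if second_start < first_start + n:
        raise ValueError("microhomology copies must not overlap")
    return reference[:first_start] + reference[second_start:]


# Toy 18-nucleotide reference with GCC at positions 3 and 12 (made-up sequence).
toy_reference = "AAAGCCTTTTTTGCCAAA"
toy_product = mmej_excision_product(toy_reference, "GCC", 3, 12)
deleted_length = len(toy_reference) - len(toy_product)
```

Here the deleted length equals the distance between the start positions of the two microhomology copies, which is how a predicted junction can be compared against a sequencing read.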
[0343] The top six and eighth scoring predicted microhomology mediated excisions were found in the sequencing results. The top scoring site (Rank 1, exemplary MH Score 335) was found in the highest number of reads (2,811). The second top predicted MMEJ score (Rank 2, exemplary MH Score 335) was found in the second highest number of reads (776).
[0344] FIG. 4C shows a plot of the sequencing reads detecting each of the different microhomology mediated viral excisions plotted against the increasing MMEJ score. The data shows that a higher MMEJ ranking (e.g., increased microhomology) is associated with increased frequencies of MMEJ-mediated excision between distant cleaved regions (e.g., upwards of 4,500 base pairs in some cases). The DNA deletion predicted with the highest score also had the highest number of reads, indicating that it occurred at the highest rate. The data shows that MMEJ ranking and scoring can be used to analyze multiple different candidate target sites for excision (e.g., excision of 50, 100, 500, 1,000 base pairs, and/or greater).
Example 1.2 - CasX + ICP0 and ICP27
[0345] HEK293FT cells with a stable reporter construct, containing regions of interest in the HSV genome, were treated with CasX2 nuclease and two guide RNAs, ICP0_g6 and ICP27_g9. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks. The CasX2 nuclease and the guide RNAs were expressed from two expression plasmids, with each plasmid containing a single copy of CasX2 nuclease and a single guide RNA. The plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
[0346] FIG. 5 shows the MMEJ rank and associated excision frequency as measured by number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g6 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g9 cut site (e.g., within the ICP0 cleaved region). Excision of ~4,500 base pairs was observed. FIG. 5 aligns pairs of sequences, with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined; in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
[0347] The top two scoring predicted microhomology mediated excisions were found in the sequencing results. The top scoring site (Rank 1, exemplary MH Score 233) was found in the highest number of reads (3,988). The second top predicted MMEJ score (Rank 2, exemplary MH Score 184) was found in the second highest number of reads (2,611).
Example 1.3 - SluCas9 HSV ICP0 g3 and ICP27 g4
[0348] HEK293FT cells with a stable reporter construct, containing regions of interest in the HSV genome, were treated with SluCas9 nuclease and two guide RNAs, ICP0_g3 and ICP27_g4. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks. The SluCas9 nuclease and the guide RNAs were expressed from two expression plasmids, with each plasmid containing a single copy of SluCas9 nuclease and a single guide RNA. The plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
[0349] FIG. 6 shows the microhomology (MH)/MMEJ rank and associated excision frequency as measured by number of sequencing reads. Each of these excisions went from one microhomology region identified near the ICP27_g4 cut site (e.g., within the ICP27 cleaved region) to a predicted matching microhomology region near the ICP0_g3 cut site (e.g., within the ICP0 cleaved region). Excision of ~3,400 base pairs was observed. FIG. 6 aligns pairs of sequences, with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined; in the bottom sequence, one microhomology region and the intervening sequences have been excised. PAM sequences are in lower case letters.
[0350] The top two scoring predicted microhomology mediated excisions were found in the sequencing results. The top scoring site (Rank 1, exemplary MH Score 303.5) was found in the highest number of reads (3,964). The second top predicted MMEJ score (Rank 2, exemplary MH Score 133) was found in the second highest number of reads (408).
Example 1.4 - CpeCas9 HSV ICP0 g10 and ICP27 g20
[0351] HEK293FT cells with a stable reporter construct, containing regions of interest in the HSV genome, were treated with CpeCas9 nuclease and a series of two guide RNAs. Once expressed, the fully complexed ribonucleoproteins target two distinct sites within these regions for double-strand breaks. The CpeCas9 nuclease and the guide RNAs were expressed from two expression plasmids, with each plasmid containing a single copy of CpeCas9 nuclease and a single guide RNA. The plasmids were delivered to the HEK293FT reporter cell lines via transient transfection, and genomic DNA was harvested and amplified with flanking primers for NGS.
[0352] FIG. 7A shows the microhomology (MH)/MMEJ rank and associated excision frequency as measured by number of sequencing reads. The excision example on top went from one microhomology region identified near the ICP0_g16 cut site (e.g., within the ICP0 cleaved region) to a predicted matching microhomology region near the ICP27_g20 cut site (e.g., within the ICP27 cleaved region). Excision of ~3,559 base pairs was observed. The excision example on bottom went from one microhomology region identified near the ICP0_g15 cut site (e.g., within the ICP0 cleaved region) to a predicted matching microhomology region near the ICP27_g10 cut site (e.g., within the ICP27 cleaved region). Excision of ~4,500 base pairs was observed. FIG. 7A aligns pairs of sequences, with the bottom sequence in each pair showing the MMEJ-based excision from the NGS results. Above is the viral reference sequence. In both sequences the microhomology regions are bold and underlined; in the bottom sequence, one microhomology region and the intervening sequences have been excised.
[0353] The top two scoring predicted microhomology mediated excisions were found in the sequencing results. The top scoring site (Rank 1, exemplary MH Score 444.9) was found in the highest number of reads. The second top predicted MMEJ score (Rank 2, exemplary MH Score 298.2) was found in the second highest number of reads. FIG. 7B shows that MMEJ scoring can be used to predict and generate MMEJ-mediated inversions. The inversion scoring site had an exemplary MH score of 461 and resulted in an excision and then inversion of ~4,548 base pairs.
Example 1.5 - HBV targeting by CpeCas9
[0354] HEK293FT cells carrying partial HBV genomic sequences of interest on a stably integrated reporter construct were transfected with CpeCas9 nuclease-encoding mRNA and a series of two guide RNAs. Once expressed, the fully complexed ribonucleoproteins generate double-stranded breaks at the two distinct target sites. Seven days after the transfection, genomic DNA was extracted from the harvested cells and the DNA sequences spanning the target sites were separately amplified by PCR with flanking primers for Sanger sequencing and indel determination.
[0355] FIG. 8 shows the MMEJ scoring output after cutting the HBV sequences. The figure aligns a pair of sequences, with the bottom sequence showing the MMEJ-based excision deduced from the indel determination, shown below the sequence before editing.
[0356] The second-ranked microhomology mediated excision flanking the PL0896 cut site was found in the sequencing results as the major deletion in two different experiments where the PL0896 guide was co-transfected with different guides. The deleted bases in both experiments are shown in the bottom line as dashes underneath the deleted nucleotides. The microhomology repeat is shown in bold (here, TCC).
Example 1.6 - HIV targeting by SaCas9
[0357] U1 cells with two integrated copies of HIV were treated with SaCas9 nuclease protein and two guide RNAs. The SaCas9 protein was combined with the guide RNAs and introduced into the U1 cells. Single cell clones were isolated and cultured, and genomic DNA was harvested and amplified with flanking primers for NGS to determine the editing events in the cells.
[0358] The top two scoring microhomology mediated excisions were identified in the sequencing results. The top scoring site (Rank 1, having an exemplary MH score of 666) was found in 3,964 reads. A lower predicted scoring site (Rank 7, having an exemplary MH score of 120) was found in 2 reads.
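The rank, score, and read values reported in Examples 1.1, 1.2, 1.3, and 1.6 can be tabulated to check that, within each example, read counts do not increase as predicted rank worsens. The sketch below is illustrative only. The dictionary layout and function name are assumptions, the Example 1.1 Rank 1 read count is taken as 2811 (a reading of the figure printed in paragraph [0343]), and Examples 1.4 and 1.5 are omitted because no read counts are stated for them.

```python
# Each entry: (predicted rank, exemplary MH score, sequencing reads),
# transcribed from the Examples above.
REPORTED = {
    "Example 1.1": [(1, 335.0, 2811), (2, 335.0, 776)],
    "Example 1.2": [(1, 233.0, 3988), (2, 184.0, 2611)],
    "Example 1.3": [(1, 303.5, 3964), (2, 133.0, 408)],
    "Example 1.6": [(1, 666.0, 3964), (7, 120.0, 2)],
}


def reads_follow_rank(entries) -> bool:
    """True if read counts never increase as predicted rank gets worse."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    reads = [entry[2] for entry in ordered]
    return all(a >= b for a, b in zip(reads, reads[1:]))


# Map each example to whether its reported reads agree with the rank order.
concordant = {name: reads_follow_rank(rows) for name, rows in REPORTED.items()}
```

With only two reported entries per example, this is a consistency check rather than a statistical test of the association between score and excision frequency.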
[0359] While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the instant disclosure. It should be understood that various alternatives to the embodiments described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the embodiments disclosed herein, and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

CLAIMS Listing of Claims
1. A composition, comprising:
(a) a first gene editing system, wherein: the first gene editing system is configured to enzymatically cleave at a first target site on a template nucleic acid molecule and generate a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second gene editing system, wherein: the second gene editing system is configured to enzymatically cleave at a second target site on the template nucleic acid molecule and generate a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
2. The composition of claim 1, wherein the first target site and the second target site are different.
3. The composition of claim 1 or 2, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
4. The composition of any one of claims 1 to 3, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
5. The composition of any one of claims 1 to 3, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
6. The composition of any one of claims 1 to 5, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
7. The composition of any one of claims 1 to 6, wherein the first gene editing system and the second gene editing system is selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
8. The composition of any one of claims 1 to 7, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
9. The composition of any one of claims 1 to 8, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
10. The composition of any one of claims 1 to 9, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
11. The composition of any one of claims 1 to 10, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
12. The composition of any one of claims 1 to 11, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
13. The composition of any one of claims 1 to 11, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
14. The composition of any one of claims 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating in a deletion in the template nucleic acid molecule.
15. The composition of any one of claims 1 to 13, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
16. The composition of any one of claims 14 to 15, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
17. The composition of any one of claims 14 to 16, wherein the deletion removes one or more genes within the template nucleic acid molecule, and wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
18. The composition of any one of claims 14 to 17, wherein the deletion comprises an inversion.
19. The composition of any one of claims 1 to 18, wherein the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
20. The composition of any one of claims 1 to 19, wherein the template nucleic acid molecule is in a cell.
21. The composition of claim 20, wherein the cell is in an individual.
22. The composition of claim 21, wherein the individual is a human.
23. The composition of any one of claims 1 to 22, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated), a virus derived sequence, or virus-like sequence.
24. The composition of any one of claims 1 to 23, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous a host cell.
25. A composition, comprising:
(a) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease system comprising:
(i) a first guide ribonucleic acid (gRNA) comprising a first spacer sequence that hybridizes to a first target site on a template nucleic acid molecule, and
(ii) a first Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the first CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the first target site and generates a first cleaved region, and the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) a second CRISPR-associated endonuclease system comprising
(i) a second guide ribonucleic acid (gRNA) comprising a second spacer sequence that hybridizes to a second target site on the template nucleic acid molecule, and
(ii) a second Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated nuclease, wherein: the second CRISPR-associated nuclease cleaves the template nucleic acid molecule within or proximal (e.g., within a distance of about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions) to the second target site and generates a second cleaved region, the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
26. The composition of claim 25, wherein the first gRNA and the second gRNA are different.
27. The composition of claim 25 or 26, wherein the microhomology comprises three or more complementary nucleotides having a GC (guanine or cytosine) content greater than or equal to 50%.
28. The composition of any one of claims 25 to 27, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
29. The composition of any one of claims 25 to 28, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
30. The composition of any one of claims 25 to 29, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
31. The composition of any one of claims 25 to 30, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
32. The composition of any one of claims 25 to 31, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
33. The composition of any one of claims 25 to 32, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
34. The composition of any one of claims 25 to 33, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
35. The composition of any one of claims 25 to 34, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
36. The composition of any one of claims 25 to 34, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
37. The composition of any one of claims 25 to 36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating in a deletion in the template nucleic acid molecule.
38. The composition of any one of claims 25-36, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
39. The composition of any one of claims 37 or 38, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
40. The composition of any one of claims 37 to 39, wherein the deletion removes one or more genes within the template nucleic acid molecule, and the deletion is a full deletion of a gene or a partial deletion of a gene.
41. The composition of any one of claims 37 to 40, wherein the deletion comprises an inversion.
42. The composition of any one of claims 37 to 41, wherein the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
43. The composition of any one of claims 1 to 42, wherein the template nucleic acid molecule is in a cell.
44. The composition of claim 43, wherein the cell is in an individual.
45. The composition of claim 44, wherein the individual is a human.
46. The composition of any one of claims 1 to 45, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
47. The composition of any one of claims 1 to 46, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous a host cell.
48. Use of the composition of any one of claims 1 to 47 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
49. A nucleic acid vector encoding one or more components of the first gene editing system and/or the second gene editing system of any one of claims 1 to 47.
50. Use of the nucleic acid vector of claim 49 in a method of excising a nucleic acid molecule from the template nucleic acid molecule.
51. A nucleic acid vector encoding one or more components of the first CRISPR-Cas system and/or the second CRISPR-Cas system of any one of claims 25 to 47.
52. A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) cleaving the template nucleic acid molecule at a first target site on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the template nucleic acid molecule at a second target site on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
53. The method of claim 52, wherein (a) comprises contacting the template nucleic acid with the first gene editing system of any one of claims 1-24 or the first CRISPR-associated nuclease system of any one of claims 25-47 and cleaving the template nucleic acid molecule, and wherein (b) comprises contacting the template nucleic acid with the second gene editing system of any one of claims 1-24 or the second CRISPR-associated nuclease system of any one of claims 25-47 and cleaving the template nucleic acid molecule.
54. The method of any one of claims 52 to 53, wherein the first target site and the second target site are different.
55. The method of any one of claims 52 to 54, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
56. The method of any one of claims 52 to 55, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
57. The method of any one of claims 52 to 55, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of first nucleic acid sequence and the second sequence.
58. The method of any one of claims 52 to 57, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
59. The method of any one of claims 52 to 58, wherein the first gene editing system and the second gene editing system is selected from the group consisting of a CRISPR-Cas system, a meganuclease system, a TALEN system, and a ZFN system.
60. The method of any one of claims 52 to 59, wherein the microhomology comprises at least 3 (e.g., at least 5, at least 10, at least 15, or at least 20) complementary nucleotides.
61. The method of any one of claims 52 to 60, wherein the first target site is located within a first gene of the template nucleic acid molecule and the second target site is located within a second gene of the template nucleic acid molecule.
62. The method of any one of claims 52 to 61, wherein the first target site and the second target site are located within two or more genes of the template nucleic acid molecule.
63. The method of any one of claims 52 to 62, wherein the first target site is located within a first protein coding region of the template nucleic acid molecule and the second target site is located within a second protein coding region of the template nucleic acid molecule.
64. The method of any one of claims 52 to 63, wherein the microhomology is located at the terminus (e.g., the 3’ end) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
65. The method of any one of claims 52 to 64, wherein the microhomology is located proximal (e.g., within about 1, 2, 3, 4, 5, 10, 15 or 20 nucleobase positions from a cleavage site) to the terminus (e.g., 5’ or 3’) of the first cleaved region, the second cleaved region, or both the first cleaved region and the second cleaved region.
66. The method of any one of claims 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region activates microhomology-mediated end joining (MMEJ) of the first cleaved region and the second cleaved region, thereby excising a region of the template nucleic acid molecule and/or generating in a deletion in the template nucleic acid molecule.
67. The method of any one of claims 52 to 65, wherein generating the first cleaved region comprising two or more nucleotides complementary to the second cleaved region excises a region of the template nucleic acid molecule and/or generates a deletion in the template nucleic acid molecule.
68. The method of any one of claims 66 to 67, wherein the deletion comprises 50 base pairs or greater (e.g., at least 50, at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
69. The method of any one of claims 66 to 68, wherein the deletion removes one or more genes within the template nucleic acid molecule.
70. The method of any one of claims 66 to 69, wherein the deletion is a full deletion of a gene or a partial deletion of a gene.
71. The method of any one of claims 52 to 70, wherein the first target site and the second target site are separated by a distance of at least 50 (e.g., at least 100, at least 250, at least 500, at least 750, at least 1,000, at least 2,000, at least 5,000, or at least 8,000 base pairs).
72. The method of any one of claims 52 to 71, wherein the template nucleic acid molecule is in a cell.
73. The method of claim 72, wherein the cell is in an individual.
The method of claim 73, wherein the individual is a human.
The method of any one of claims 52 to 74, wherein the template nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
The method of any one of claims 52 to 75, wherein the template nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
A method of inactivating a virus, comprising:
(a) cleaving a viral nucleic acid molecule at a first target site on the viral nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(b) cleaving the viral nucleic acid molecule at a second target site on the viral nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise microhomology.
The method of claim 77, wherein (a) comprises contacting the viral nucleic acid with the first gene editing system of any one of claims 1-24 or the first CRISPR-associated nuclease system of any one of claims 25-47 and cleaving the viral nucleic acid molecule, and wherein (b) comprises contacting the viral nucleic acid with the second gene editing system of any one of claims 1-24 or the second CRISPR-associated nuclease system of any one of claims 25-47 and cleaving the viral nucleic acid molecule.
The method of any one of claims 77 to 78, wherein the first target site and the second target site are different.
The method of any one of claims 77 to 79, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
The method of any one of claims 77 to 80, wherein: sequences within (e.g., internal to) the first cleaved region lack microhomology; and sequences within (e.g., internal to) the second cleaved region lack microhomology.
The method of any one of claims 77 to 80, wherein: microhomology of sequences within (e.g., internal to) the first cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence; and microhomology of sequences within (e.g., internal to) the second cleaved region is less (e.g., in number or degree) than the microhomology of the first nucleic acid sequence and the second nucleic acid sequence.
The method of any one of claims 77 to 82, wherein the first cleaved region comprises (i) about 10 base pairs 5’ of a first cleaved site within the first cleaved region and (ii) about 10 base pairs 3’ of the first cleaved site within the first cleaved region; and wherein the second cleaved region comprises (i) about 10 base pairs 5’ of a second cleaved site within the second cleaved region and (ii) about 10 base pairs 3’ of the second cleaved site within the second cleaved region.
The method of any one of claims 77 to 83, wherein the viral nucleic acid molecule is in a cell.
The method of claim 84, wherein the cell is in an individual.
The method of claim 85, wherein the individual is a human.
The method of any one of claims 77 to 86, wherein the viral nucleic acid molecule is a viral genome (e.g., integrated, episomal, and/or both episomal and integrated).
The method of any one of claims 77 to 86, wherein the viral nucleic acid molecule is an exogenous nucleic acid molecule relative to a host cell genome, an episomal nucleic acid molecule, or an integrated genome exogenous to a host cell.
A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (i) the location of cut sites and/or (ii) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(b) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
A computer-implemented method of cut site identification and characterization for cutting a template polynucleotide molecule, the computer-implemented method comprising:
(a) generating or providing, by one or more computers, a template nucleic acid sequence (e.g., a viral genome sequence);
(b) identifying or providing, by one or more computers, a plurality of cleavable regions comprising cut sites within the template nucleic acid sequence;
(c) generating, by one or more computers, positional data for a cleavable region of the plurality of cleavable regions using the template nucleic acid sequence, wherein the positional data comprises (i) the location of the cleavable region, (ii) a cut site within the cleavable region, and/or (iii) nucleobase sequences of nucleobase positions within the cleavable region;
(d) generating, by one or more computers, microhomology data for the plurality of cleavable regions using the positional data, and identifying a first cleavable region and a second cleavable region comprising microhomology using the microhomology data.
The computer-implemented method of any one of claims 89-90, wherein the microhomology comprises three or more complementary nucleotides (e.g., in a contiguous sequence) having a GC (guanine or cytosine) content greater than or equal to 50%.
The computer-implemented method of any one of claims 89-91, wherein a cleavable region comprises (i) about 10 base pairs 5’ of a cut site within the cleavable region and (ii) about 10 base pairs 3’ of the cut site within the cleavable region.
The computer-implemented method of any one of claims 89-92, wherein the microhomology data is a function of:
(i) total nucleobase complementarity of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(ii) the length (e.g., number of contiguous nucleobases) of complementary nucleobases of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iii) GC content of nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region;
(iv) orientation and/or strand location (e.g., for identifying inversion outcomes);
(v) base content of complementary nucleobases between nucleobase sequences within the first cleavable region and nucleobase sequences within the second cleavable region; or
(vi) a combination of (i)-(v).
The computer-implemented method of any one of claims 89-93, wherein the template nucleic acid sequence comprises a consensus sequence, and wherein the computer-implemented method comprises, in (a), generating, by one or more computers, the template nucleic acid sequence by aligning two or more input nucleic acid sequences (e.g., two or more viral genomes).
The computer-implemented method of any one of claims 89-94, wherein the template nucleic acid sequence is different from each input nucleic acid sequence used to generate the consensus sequence.
The computer-implemented method of any one of claims 89-95, wherein the two or more input nucleic acid sequences are present within a definable geographical region (e.g., Asia, Europe, North America, etc.), a definable population of individuals (e.g., a patient population), or a definable pathology (e.g., cancer-causing variants).
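The consensus-template claims above can be illustrated with a minimal sketch. It assumes the input sequences have already been aligned to equal length (the alignment step itself is not shown) and uses a simple per-column majority vote; the function name `consensus` and the toy inputs are hypothetical and are not taken from the application.

```python
from collections import Counter

def consensus(aligned):
    """Majority-vote consensus over pre-aligned, equal-length sequences.

    Assumption: '-' marks an alignment gap and is ignored when voting.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(b for b in column if b != "-")
        # Keep a gap only when every sequence has a gap in this column.
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)

# Toy inputs: each carries one private substitution.
inputs = ["TCGTAC", "ACCTAC", "ACGTTC"]
template = consensus(inputs)
```

In this toy case the consensus is not identical to any single input, which is the situation the dependent claim on a template that differs from each input sequence describes.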
The computer-implemented method of any one of claims 89-95, wherein the computer-implemented method further comprises: generating, by one or more computers, positional entropy data for a nucleotide at each position of the template nucleic acid sequence.
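As a hedged illustration of positional entropy data, the sketch below computes Shannon entropy (in bits) for each column of a set of pre-aligned input sequences. Computing entropy across an alignment of inputs is an assumption about how the data would be derived; the function name `positional_entropy` and the example sequences are hypothetical.

```python
import math
from collections import Counter

def positional_entropy(aligned):
    """Shannon entropy (bits) of the symbol distribution at each alignment column."""
    entropies = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = sum(counts.values())
        # H = sum(p * log2(1/p)); a fully conserved column gives 0.0.
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies

aligned = ["ACGT", "ACGA", "ACCT", "ATCA"]
entropy = positional_entropy(aligned)
```

Low-entropy positions are conserved across the inputs, so a cut site whose neighborhood has low entropy would be expected to be present in more of the input sequences.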
The computer-implemented method of any one of claims 89-97, further comprising: generating, by the one or more computers, additional data using the template nucleic acid sequence and/or the positional data, wherein the additional data comprises positional entropy data (e.g., Shannon entropy) for a cut site and/or nucleobase positions proximal to the cut site, orientation data (e.g., for identifying inversion outcomes), gene location data (e.g., coding region data) for a cut site and/or nucleobase positions proximal to the cut site, distance data (e.g., distance from other target sequences) for a cut site and/or nucleobase positions proximal to the cut site, proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence) for a cut site and/or nucleobase positions proximal to the cut site, target specificity and selectivity data (e.g., Azimuth 2.0) for a cut site and/or nucleobase positions proximal to the cut site, or combinations thereof.
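One item in the list above, proximity to PAM sequences, can be sketched as follows. The sketch assumes an SpCas9-style NGG PAM, a 20-nucleotide spacer, and a blunt cut 3 bp upstream of the PAM, and it scans the top strand only; none of these choices is stated in the claims, and the function name `find_cut_sites` is hypothetical.

```python
import re

def find_cut_sites(seq, spacer_len=20):
    """Return (cut_index, spacer, pam) for each top-strand NGG PAM.

    Assumptions: SpCas9-like nuclease, cut 3 bp 5' of the PAM,
    and only PAMs with a full-length spacer upstream are reported.
    """
    sites = []
    # Zero-width lookahead so that overlapping PAMs are all found.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start < spacer_len:
            continue
        spacer = seq[pam_start - spacer_len:pam_start]
        cut = pam_start - 3  # cut falls between seq[cut - 1] and seq[cut]
        sites.append((cut, spacer, m.group(1)))
    return sites

seq = "GATTACAGATTACAGATTAC" + "AGG" + "TT"
sites = find_cut_sites(seq)
```

A fuller treatment would also scan the reverse complement and record strand orientation, which bears on the orientation data mentioned in the same claim.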
The computer-implemented method of any one of claims 58-94, wherein the method further comprises identifying a first target site sequence comprising or adjacent to the first cut site and a second target site sequence comprising or adjacent to the second cut site.
The computer-implemented method of claim 99, further comprising: generating, by the one or more computers, additional data for the first target nucleic acid sequence and the second target nucleic acid sequence, wherein the additional data comprises positional entropy data (e.g., Shannon entropy), gene location data (e.g., coding region data), distance data (e.g., distance from other target sequences), proximity to one or more PAM sequences, homology data (e.g., homology to a human genome sequence), target specificity and selectivity data (e.g., Azimuth 2.0), or combinations thereof.
A method of excising a nucleic acid molecule from a template nucleic acid molecule, the method comprising:
(a) identifying a first cleavable region and a second cleavable region having microhomology by:
(i) generating, by one or more computers, microhomology data for a plurality of cleavable regions comprising cut sites within a template nucleic acid sequence using positional data, wherein the positional data comprises (1) the location of cut sites and/or (2) nucleobase sequences of nucleobase positions within the cleavable regions comprising the cut sites; and
(ii) identifying, by one or more computers, a first cleavable region and a second cleavable region comprising microhomology using the microhomology data;
(b) cleaving the template nucleic acid molecule at the first cleavable region on the template nucleic acid molecule, thereby generating a first cleaved region, wherein the first cleaved region or segment thereof comprises a first nucleic acid sequence; and
(c) cleaving the template nucleic acid molecule at the second cleavable region on the template nucleic acid molecule, thereby generating a second cleaved region, wherein: the second cleaved region or segment thereof comprises a second nucleic acid sequence, and the first nucleic acid sequence and the second nucleic acid sequence comprise the microhomology.
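The computational identification step recited in the claims above can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: a cleavable region is taken as 10 bases on each side of a cut index, microhomology is approximated as the longest substring shared on the same strand by two regions (the claims speak of complementary nucleotides), and candidates must have at least 3 bases and GC content of at least 50%. All function names and the toy sequence are hypothetical.

```python
from itertools import combinations

def window(seq, cut, flank=10):
    """Cleavable region: `flank` bases 5' and `flank` bases 3' of the cut index."""
    return seq[max(0, cut - flank):cut + flank]

def gc_fraction(s):
    """Fraction of G or C bases in a non-empty string."""
    return sum(b in "GC" for b in s) / len(s)

def best_microhomology(a, b, min_len=3, min_gc=0.5):
    """Longest substring shared by a and b that meets the GC threshold, else ''."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
        for j in range(len(b) - k + 1):
            cand = b[j:j + k]
            if cand in kmers and gc_fraction(cand) >= min_gc:
                return cand
    return ""

def rank_pairs(seq, cuts, min_separation=50):
    """Score every pair of cut sites at least `min_separation` apart."""
    results = []
    for c1, c2 in combinations(sorted(cuts), 2):
        if c2 - c1 < min_separation:
            continue
        mh = best_microhomology(window(seq, c1), window(seq, c2))
        if mh:
            results.append((len(mh), c1, c2, mh))
    return sorted(results, reverse=True)  # longest microhomology first

# Toy template: two GC-rich hexamers separated by 60 bases.
seq = "A" * 20 + "GCCGGC" + "T" * 60 + "GCCGGC" + "A" * 20
hits = rank_pairs(seq, cuts=[23, 50, 89])
```

Here only the pair of cuts at indices 23 and 89 is far enough apart to be scored, and the shared poly-A and poly-T runs are rejected by the GC filter, leaving the hexamer as the reported microhomology. A fuller scoring function could also weight total complementarity, orientation, and base content, as listed in the dependent claim on microhomology data.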
PCT/US2023/017959 2022-04-08 2023-04-07 Computer-implemented systems and methods for targeting microhomology-mediated excision WO2023196647A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263329274P 2022-04-08 2022-04-08
US63/329,274 2022-04-08

Publications (1)

Publication Number Publication Date
WO2023196647A1 true WO2023196647A1 (en) 2023-10-12

Family

ID=88243499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/017959 WO2023196647A1 (en) 2022-04-08 2023-04-07 Computer-implemented systems and methods for targeting microhomology-mediated excision

Country Status (1)

Country Link
WO (1) WO2023196647A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018025206A1 (en) * 2016-08-02 2018-02-08 Kyoto University Method for genome editing
WO2020055941A1 (en) * 2018-09-13 2020-03-19 Excision Biotherapeutics, Inc. Compositions and methods for excision with single grna
WO2022119919A1 (en) * 2020-12-01 2022-06-09 Howell Alexandra Compositions and methods for cleaving viral genomes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHIN-IL KIM, TOMOKO MATSUMOTO, HARUNOBU KAGAWA, MICHIKO NAKAMURA, RYOKO HIROHATA, AYANO UENO, MAKI OHISHI, TETSUSHI SAKUMA, TOMOYO: "Microhomology-assisted scarless genome editing in human iPSCs", NATURE COMMUNICATIONS, vol. 9, no. 1, 1 December 2018 (2018-12-01), XP055566588, DOI: 10.1038/s41467-018-03044-y *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23785479

Country of ref document: EP

Kind code of ref document: A1