WO2023014967A1 - Methods for detecting indel produced by genome editing protocol - Google Patents

Methods for detecting indel produced by genome editing protocol

Info

Publication number
WO2023014967A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
certain embodiments
genomic dna
targeted
nuclease
Prior art date
Application number
PCT/US2022/039567
Other languages
French (fr)
Inventor
Dario Boffelli
Stacia WYMAN
David IK MARTIN
Wendy J. MAGIS
Mark Dewitt
Original Assignee
The Regents Of The University Of California
Children's Hospital & Research Center At Oakland
Priority date
Filing date
Publication date
Application filed by The Regents Of The University Of California, Children's Hospital & Research Center At Oakland
Publication of WO2023014967A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/34 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells

Definitions

  • Gene editing typically involves the use of a targeted nuclease to induce double-strand DNA breaks (DSBs) at specific genomic sites.
  • DSBs are then repaired by one of two cellular mechanisms: homology-directed repair (HDR) uses a DNA template to repair the DSB, while nonhomologous end joining (NHEJ) directly repairs the DSB but frequently creates an insertion or deletion mutation (indel) at the DSB site.
  • Because HDR is usually less efficient than NHEJ, even protocols that use a DNA template to edit the region around a DSB result in a large proportion of repaired alleles containing an indel.
  • Current methods for determining the genotypes produced by genome editing involve polymerase chain reaction (PCR) amplification of DNA fragments followed by deep sequencing. These methods have limitations and do not reveal the full extent of the induced indel landscape. Accordingly, new and improved methods for discovering indels induced by a genome editing process are needed.
  • Certain embodiments of the invention provide a method of providing an unbiased, full landscape of mutations induced by a genome editing protocol (e.g., a CRISPR-Cas, TALEN, or ZFN based protocol, or other genome-editing protocol).
  • Certain embodiments of the invention provide a method of identifying in a sample a DNA variant induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease, or a pair of targeted nucleases) that are capable of excising a DNA fragment (e.g., a high molecular weight DNA fragment) from the genomic DNA; isolating the DNA fragment; and sequencing the isolated DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the DNA fragment comprises the editing site and the DNA variant (e.g., large and/or remote DNA variant).
  • Genome editing holds great promise in a wide range of applications, from advancing basic research to revolutionizing treatment for certain intractable diseases. Genome editing protocols are known in the art, and the field continues to evolve. Currently, CRISPR/Cas based protocols are efficient and facile genome editing approaches. Before the advent of CRISPR/Cas, transcription activator-like effector nuclease (TALEN) and zinc finger nuclease (ZFN) based platforms were also widely adopted genome editing technologies.
  • a double stranded break (DSB) and/or a single stranded break may be generated by a targeted nuclease or nickase at a specifically targeted editing site.
  • DNA repair mechanisms often involved in genome editing processes may include, but are not limited to, homology directed repair (HDR) and non-homologous end joining (NHEJ).
  • the NHEJ process may be particularly prone to introducing unintended DNA modifications relative to the original DNA sequence.
  • DNA repair or ligation mechanism(s) are not fully elucidated.
  • the unbiased, full spectrum of DNA variants (e.g., indels) unintentionally introduced by the genome editing process is not well characterized and understood.
  • a pair of PCR primers are designed and located upstream and downstream of the editing site.
  • PCR reaction kinetics may bias the amplified fragments, leading to a skewed and/or incomplete representation of the indel landscape.
  • the resulting template DNA may lack the sequence sufficiently complementary to the designed PCR primer or even lack the sequence entirely due to deletion.
  • PCR based post-editing assessments or quality controls by themselves may be insufficient to faithfully enumerate the whole gamut of indels.
  • the inadequacy of current post-editing evaluation workflows could have major consequences for genome editing applications, including their adoption in medicine to deliver effective and safe therapies.
  • Certain embodiments of the invention provide methods of generating a catalogue of mutations induced by a genome editing protocol.
  • Certain embodiments described herein provide efficient methods suitable to provide an unbiased, full landscape of mutations induced by a genome editing protocol, including but not limited to, large indels, and/or remote indels that are distant from the targeted editing site.
  • the invention provides a method of identifying in a sample a DNA variant unintendedly induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) that are capable of excising a DNA fragment from the genomic DNA, isolating the DNA fragment, and sequencing the DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the DNA fragment comprises the editing site and the DNA variant.
  • the sample is genome edited with a genome editing protocol selected from the group consisting of CRISPR-Cas based protocol, TALEN based protocol, and ZFN based protocol.
  • the sample is genome edited using a CRISPR-Cas based genome-editing protocol.
  • the invention provides a method of identifying in a sample a DNA variant unintendedly induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) that are capable of excising a high molecular weight (HMW) DNA fragment from the genomic DNA, isolating the HMW DNA fragment, and sequencing the HMW DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the HMW DNA fragment comprises the editing site and the DNA variant.
  • DNA variant refers to an unintended DNA sequence modification induced by a genome editing protocol.
  • a double stranded break may be generated by a targeted nuclease at a specifically targeted editing site.
  • unintended DNA modifications may be randomly introduced as a side effect on the edited DNA molecule.
  • the DNA variant is an indel (insertion or deletion).
  • the DNA variant is a deletion.
  • the DNA variant is an insertion.
  • the DNA variant is a point mutation.
  • one or more point mutation DNA variants may be contiguous; for example, multiple point mutations in a row form a segment mutation (e.g., 2 base pairs (bp) or longer in length), so that the segment sequence is entirely replaced but the length of the segment does not change.
  • a DNA variant may be disadvantageous, or even harmful (e.g., to an edited cell, or to a host, or to a recipient of the edited cell).
  • a DNA variant may be harmless, or even beneficial.
  • the term “DNA variant” as described herein also encompasses DNA rearrangement and/or translocation as unintended DNA modification induced by a genome editing protocol.
  • chromothripsis is a mutational phenomenon of clustered chromosomal rearrangement occurring in localized genomic region(s).
  • DNA rearrangement and/or translocation may involve the deletion of a DNA segment at one location of the genomic DNA and the insertion of the DNA segment at another location of the genomic DNA.
  • the DNA variant is a deletion of a DNA segment that has been, completely or partially, rearranged or translocated or inserted into another location of the genomic DNA.
  • the DNA variant is an insertion of a DNA segment that has been, completely or partially, rearranged or translocated or deleted at another location of the genomic DNA.
  • the DNA variant is an insertion of a DNA segment that has been, completely or partially, copied from another location of the genomic DNA so the copy number of the DNA segment might be changed (e.g., increased) in the genome.
  • the DNA variants include short and long DNA variants, near and remote DNA variants, and isolated and clustered DNA variants.
  • the DNA variant has a length of at least about 1bp, 2bp, 3bp, 4bp, 5bp, 10bp, 20bp, 30bp, 40bp, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 1.1kb, 1.2kb, 1.3kb, 1.4kb, 1.5kb, 1.6kb, 1.7kb, 1.8kb, 1.9kb, 2kb, 2.5kb, 3kb, 3.5kb, 4kb, 4.5kb, 5kb, 5.5kb, 6kb, 6.5kb, 7kb, 7.5kb, 8kb, 8.5kb, 9kb, 9.5kb, 10kb, 11kb, 12kb, 13kb, 14kb, 15kb, 16kb, 17kb, 18kb, 19kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, or longer.
  • a short DNA variant indicates a DNA variant that is less than about 100bp in length.
  • a long or large DNA variant indicates a DNA variant that is at least about 100 bp in length.
  • the large DNA variant has a length of about 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 1.1kb, 1.2kb, 1.3kb, 1.4kb, 1.5kb, 1.6kb, 1.7kb, 1.8kb, 1.9kb, 2kb, 2.5kb, 3kb, 3.5kb, 4kb, 4.5kb, 5kb, 5.5kb, 6kb, 6.5kb, 7kb, 7.5kb, 8kb, 8.5kb, 9kb, 9.5kb, 10kb, 11kb, 12kb, 13kb, 14kb, 15kb, 16kb, 17kb, 18kb, 19kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, or longer.
  • the DNA variant is an indel that has a length of at least 1 kilobase (kb). In certain embodiments, the DNA variant is an indel that has a length of at least 2kb. In certain embodiments, the DNA variant is an indel that has a length of at least 3kb. In certain embodiments, the DNA variant is an indel that has a length of at least 4kb. In certain embodiments, the DNA variant is an indel that has a length of at least 5kb. In certain embodiments, the DNA variant is an indel that has a length of at least 6kb. In certain embodiments, the DNA variant is an indel that has a length of at least 7kb.
  • the DNA variant is an indel that has a length of at least 8kb. In certain embodiments, the DNA variant is an indel that has a length of at least 9kb. In certain embodiments, the DNA variant is an indel that has a length of at least 10kb.
  • a large DNA variant has a length of about 100bp to 100kb, 1kb to 90kb, 2kb to 80kb, 5kb to 70kb, 10kb to 60kb, or 15kb to 50kb. Accordingly, the method described herein is capable of detecting a large DNA variant (e.g., having a length of about 100bp to 100kb, such as 2kb, 5kb or 15kb) as described above.
  • a near DNA variant indicates a DNA variant that is less than about 500 bases from the editing site.
  • a remote DNA variant indicates a DNA variant that is at least about 500 bases from the editing site.
  • the DNA variant (e.g., indel) is at least about 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb or further, away from the editing site.
  • the DNA variant (e.g., insertion or deletion) is at least about 500 bases away from the editing site. In certain embodiments, the DNA variant is at least about 1kb, 2kb, 3kb, 4kb or 5kb away from the editing site. In certain embodiments, the DNA variant is at least about 10kb away from the editing site. In certain embodiments, the DNA variant is at least about 20kb away from the editing site. In certain embodiments, the DNA variant is at least about 30kb away from the editing site. In certain embodiments, the DNA variant is at least about 40kb away from the editing site. In certain embodiments, the DNA variant is at least about 50kb away from the editing site.
  • the DNA variant is at least about 60kb away from the editing site. In certain embodiments, the DNA variant is at least about 100kb away from the editing site. In certain embodiments, the DNA variant is at least about 200kb, or further, away from the editing site. As non-limiting examples, in certain embodiments, the DNA variant is about 500bp to 200kb, 1kb to 190kb, 2kb to 180kb, 5kb to 170kb, 10kb to 160kb, or 15kb to 150kb away from the editing site. Accordingly, the method described herein is capable of detecting a remote DNA variant (e.g., from about 500bp to 200kb away, such as 10kb, 50kb or 60kb away from the editing site) as described above.
  • a DNA variant may have any combination of a length described herein and a distance from the editing site described herein. Accordingly, the method described herein is capable of detecting a large and/or remote DNA variant as described herein. For example, in certain embodiments, the method described herein is capable of detecting a large and remote DNA variant that is 5kb in length and at least 60kb away from the editing site. In certain embodiments, a large and remote DNA variant may have a length of at least 100bp and be at least 500bp away from the editing site. In certain embodiments, a DNA variant may have a length of at least 300bp and be at least 800bp away from the editing site.
  • a DNA variant may have a length of at least 500bp and be at least 1kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 800bp and be at least 3kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 1kb and be at least 5kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 2kb and be at least 10kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 3kb and be at least 20kb away from the editing site.
  • a DNA variant may have a length of at least 4kb and is at least 30kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 5kb and is at least 40kb away from the editing site.
  • methods described herein are suitable for generating a full spectrum of DNA variants or characterizing extensive DNA variants induced by a genome-editing protocol.
  • one DNA variant is characterized.
  • two or more DNA variants are characterized.
  • at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or more DNA variants are characterized.
  • at least 2 DNA variants are characterized.
  • at least 5 DNA variants are characterized.
  • at least 10 DNA variants are characterized.
  • at least 20 DNA variants are characterized.
  • at least 30 DNA variants are characterized.
  • at least 40 DNA variants are characterized.
  • At least 50 DNA variants are characterized. In certain embodiments, at least 60 DNA variants are characterized.
  • an isolated DNA variant indicates one DNA variant having no other DNA variant in the proximity within 5kb upstream of the DNA variant 5' end and 5kb downstream of the DNA variant 3' end.
  • clustered DNA variants indicate that the distance between two DNA variants is shorter than 5kb (the distance from the 3' end of one DNA variant to the 5' end of another DNA variant is <5kb).
  • editing site refers to the intended target site on a genomic DNA (e.g., chromosomal, mitochondrial or plasmid DNA) for editing in a genome editing protocol.
  • the target site for editing is purposefully and rationally chosen.
  • the editing site in a CRISPR-Cas based genome editing protocol may be targeted by specifically designed guide RNA (gRNA), while in TALEN or ZFN based protocol the editing site is targeted via specifically designed TALE nuclease or zinc finger nuclease.
  • the desired outcome of genome editing at the editing site may include, but is not limited to, correction of a single point mutation at the editing site, replacement of a deleterious DNA segment with a beneficial DNA segment at the editing site, knocking out of an undesirable DNA segment at the editing site, or knocking in of a desirable DNA segment at the editing site.
  • the editing site may be located at a protein coding region or a non-coding region.
  • the editing site may be located at a regulatory region such as a promoter, enhancer, or 5' or 3'-untranslated region (UTR).
  • the editing site may be located at a transposon or retrotransposon.
  • the editing site may be located at a microRNA coding region.
  • the editing site may be located at a site encoding a splice signal.
  • the editing site may be located within chromosomal DNA, mitochondrial DNA, or plasmid DNA.
  • the editing site may be a single nucleotide in length (e.g., for point mutation editing).
  • the editing site may be a DNA segment (longer than a single nucleotide) that has a 5' end and a 3' end.
  • the editing site may have the same length prior to and after the genome editing protocol (e.g., single point edit, or replacement of a DNA segment of equal length).
  • the editing site may have different lengths prior to and after the genome editing protocol (e.g., knock out, knock in, or replacement of a DNA segment of differing length).
  • the editing site and the immediately adjacent nucleotide(s) surrounding the editing site have definitive location/loci in a genome map (e.g., chromosome map), and the sequence at the editing site and its close proximity can be located and probed precisely before and/or after the genome editing protocol.
  • a cell may carry a disease-causing allele at the editing site, the sequence of which can be probed and ascertained with PCR/sequencing or any other suitable sequencing, genotyping or diagnostic methods.
  • DNA sequence at the editing site can be probed and ascertained to provide an indication whether correct DNA sequence is now present at the editing site as intended.
  • the editing site of a genomic DNA in a cell may have successfully edited DNA sequence as intended by the genome editing protocol.
  • the editing site of a genomic DNA in a cell may have the original, unedited DNA sequence prior to or after the genome editing protocol.
  • the editing site of a genomic DNA may display correct editing on one chromosome and not on its homologous chromosome. It is also possible that the editing site of a genomic DNA in a cell may have partially edited DNA sequence that falls short of the full length of the intended DNA segment for knock out, knock in, or replacement.
  • the methods described herein comprise contacting an edited genomic DNA with one or more targeted nucleases (e.g., one single targeted nuclease or a pair of targeted nucleases) that are capable of excising a DNA fragment, such as a high molecular weight (HMW) DNA fragment, from the edited genomic DNA.
  • the one or more targeted nucleases comprises one, two, or more targeted nucleases. In certain embodiments, the one or more targeted nucleases comprises one single targeted nuclease. In certain embodiments, the single targeted nuclease may be designed to cut a linear genomic DNA for excision and release of a HMW DNA fragment of interest that comprises the editing site and one end of the linear genomic DNA.
  • the one or more targeted nucleases comprises a pair of targeted nucleases.
  • the pair of targeted nucleases (downstream nuclease and upstream nuclease) is designed to cut downstream and upstream of the editing site, respectively, for excision and release of a DNA fragment (e.g., HMW DNA fragment) that includes the editing site and flanking sequences.
  • the HMW DNA fragment may also comprise one or more DNA variants induced by a genome editing protocol (e.g., induced by NHEJ repair following genome editing).
  • the HMW DNA fragment may comprise an unbiased, full spectrum of any DNA variants, including large and/or remote DNA variant(s) as described herein.
  • the one or more targeted nucleases comprises a CRISPR-Cas nuclease, a transcription activator-like effector nuclease (TALEN), a zinc-finger nuclease (ZFN), or a meganuclease.
  • the one or more targeted nucleases comprise a CRISPR-Cas nuclease.
  • the one or more targeted nucleases comprise a CRISPR-Cas9 nuclease.
  • the one or more targeted nucleases comprise Streptococcus pyogenes Cas9 nuclease (SpCas9). In certain embodiments, the one or more targeted nucleases comprise a Staphylococcus aureus Cas9 nuclease (SaCas9). In certain embodiments, the one or more targeted nucleases comprise a CRISPR-Cas12a nuclease.
  • the pair of targeted nucleases comprises two of the same class of nucleases (e.g., two Cas nucleases, or two ZFNs). In certain embodiments, the pair of targeted nucleases comprises a pair of CRISPR-Cas9 nucleases. In certain embodiments, the pair of targeted nucleases comprises two types of nucleases within the same class (e.g., a SpCas9 and a SaCas9; or a Cas9 and a Cas12a). In certain embodiments, the pair of targeted nucleases comprises two different classes of nucleases (e.g., a Cas nuclease and a non-Cas nuclease such as a TALEN or ZFN).
  • Class 2 Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR-Cas)
  • exemplary CRISPR-Cas systems comprise two components: a guide RNA (gRNA or sgRNA) and a CRISPR-associated endonuclease (Cas protein).
  • the gRNA is a short synthetic RNA comprising a scaffold sequence necessary for Cas-binding and a user-defined spacer of about 20 nucleotides that defines the genomic target to be modified.
  • design and generation of targeted nuclease systems, or further engineered CRISPR, TALEN and ZFN derivative systems are known/practiced in the field and further supported by commercially available services.
  • exemplary U.S. patents directed to targeted nuclease systems such as U.S. Patent 8,586,363; U.S. Patent 9,393,257; U.S. Patent 9,982,277; U.S. Patent 10,266,850; and U.S. Patent 10,570,418 are incorporated by reference herein for all purposes.
  • “high molecular weight DNA” or “HMW DNA fragment” as described herein refers to a DNA molecule of at least 10kb in length.
  • the HMW DNA fragment may have a length of at least about 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, 210kb, 220kb, 230kb, 240kb, 250kb, 260kb, 270kb, 280kb, 290kb, 300kb, 310kb, 320kb, 330kb, 340kb, 350k
  • the HMW DNA fragment has a length of at least about 15kb. In certain embodiments, the HMW DNA fragment has a length of at least about 20kb. In certain embodiments, the HMW DNA fragment has a length of at least about 50kb. In certain embodiments, the HMW DNA fragment has a length of at least about 75kb. In certain embodiments, the HMW DNA fragment has a length of at least about 100kb. In certain embodiments, the HMW DNA fragment has a length of at least about 150kb. In certain embodiments, the HMW DNA fragment has a length of at least about 200kb. In certain embodiments, the HMW DNA fragment has a length of at least about 250kb.
  • the HMW DNA fragment has a length of at least about 300kb. In certain embodiments, the HMW DNA fragment has a length of at least about 350kb. In certain embodiments, the HMW DNA fragment has a length of at least about 400kb. In certain embodiments, the HMW DNA fragment has a length of about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above.
  • the HMW DNA fragment has a length of about 15kb to 490kb, 25kb to 430kb, 35kb to 410kb, 45kb to 390kb, or 55kb to 360kb, as described above.
  • one or more targeted nucleases may be used for excision and release of a DNA fragment of interest from one end of a genomic DNA (e.g., a linear genomic DNA), wherein the DNA fragment comprises the editing site, genome-editing induced DNA variant(s), and the end of the genomic DNA.
  • the targeted nuclease cuts the linear genomic DNA at least about 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, 210kb, 220kb, 230kb, 240kb, 250kb, 260kb, 270kb, 280kb, 290kb, 300kb, 310kb, 320kb, 330kb, 340kb, 350kb, 360kb, 370kb, 380kb, 390kb, 400kb, 410kb, 420kb, 430kb, 440
  • one or more targeted nucleases cut the linear genomic DNA at about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above, away from the end of the linear genomic DNA.
  • the targeted nuclease may cut the linear genomic DNA at least about 10kb (e.g., about 100kb or 200kb) downstream of the 5' end of the linear genomic DNA and release a HMW DNA fragment comprising the editing site, genome editing induced DNA variant(s), and the 5' end of the genomic DNA.
  • the targeted nuclease may cut the linear genomic DNA at least about 10kb (e.g., about 100kb or 200kb) upstream of the 3' end of the linear genomic DNA and release a HMW DNA fragment comprising the editing site, genome editing induced DNA variant(s), and the 3' end of the genomic DNA.
  • the released HMW DNA fragment of interest further comprises a telomere region.
  • a telomere region is the end of a linear chromosome and a region of repetitive nucleotide sequences that can be recognized by specialized protein(s), including telomerase.
  • the distance(s) as described above is measured by the cutting location relative to the first non-telomere nucleotide that abuts the telomere region.
  • the targeted nuclease may cut the linear genomic DNA at about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above, away from the first non-telomere nucleotide that abuts the telomere region.
  • the targeted nuclease may cut the linear genomic DNA at 200kb away from the first non-telomere nucleotide that abuts a telomere region of 8kb in length, releasing a DNA fragment of about 208kb in length.
  • one targeted nuclease of the pair cuts the genomic DNA at least about lOObp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, Ikb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, lOkb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, lOOkb, HOkb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, or further, downstream of
  • downstream nuclease cuts the genomic DNA at least about 5kb downstream of the editing site. In certain embodiments, downstream nuclease cuts the genomic DNA at least about lOkb downstream of the editing site. In certain embodiments, downstream nuclease cuts the genomic DNA at least about 50kb downstream of the editing site. In certain embodiments, downstream nuclease cut the genomic DNA at least about lOOkb downstream of the editing site. In certain embodiments, downstream nuclease cuts the genomic DNA at least about 150kb downstream of the editing site. In certain embodiments, downstream nuclease cuts the genomic DNA at least about 200kb downstream of the editing site. The distance of downstream cutting location relative to the editing site is measured by the downstream cutting location relative to the first neighboring nucleotide downstream to the 3' end of the editing site.
  • one targeted nuclease of the pair cuts the genomic DNA at least about 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, or further, upstream nuclease
  • the upstream nuclease cuts the genomic DNA at least about 5kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 10kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 50kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 100kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 150kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 200kb upstream of the editing site.
  • the distance of the upstream cutting location relative to the editing site is measured by the upstream cutting location relative to the first neighboring nucleotide upstream to the 5' end of the editing site.
  • the distance of upstream or downstream cutting location relative to the editing site may be approximately symmetric or asymmetric.
  • the pair of targeted nucleases cut symmetrically or asymmetrically at any combination of a downstream cutting distance described herein and an upstream cutting distance described herein.
  • one targeted nuclease of the pair cuts the genomic DNA at about 8kb downstream of the editing site
  • the other targeted nuclease of the pair cuts the genomic DNA at about 8kb upstream of the editing site.
  • the downstream nuclease cuts the genomic DNA at about 50kb downstream of the editing site
  • the upstream targeted nuclease cuts the genomic DNA at about 50kb upstream of the editing site.
  • the downstream nuclease cuts the genomic DNA at about 60kb downstream of the editing site
  • the upstream targeted nuclease cuts the genomic DNA at about 60kb upstream of the editing site.
  • the downstream nuclease cuts the genomic DNA at about 100kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 100kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 180kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 180kb upstream of the editing site.
  • one targeted nuclease of the pair cuts the genomic DNA at about 10kb downstream of the editing site and the other targeted nuclease of the pair (upstream nuclease) cuts the genomic DNA at about 18kb upstream of the editing site.
  • the downstream nuclease cuts the genomic DNA at about 72kb downstream of the editing site and the upstream nuclease cuts the genomic DNA at about 66kb upstream of the editing site.
  • the downstream nuclease cuts the genomic DNA at about 120kb downstream of the editing site and the upstream nuclease cuts the genomic DNA at about 160kb upstream of the editing site.
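The symmetric and asymmetric cutting examples above can be related to the size of the released fragment with a short sketch. The function names, the `editing_site_kb` parameter, and the 10% symmetry tolerance are assumptions made for illustration; the text does not define a numeric threshold for "approximately symmetric".

```python
def excised_fragment_kb(upstream_kb, downstream_kb, editing_site_kb=0.0):
    """Approximate length (kb) of the fragment released by a pair of cuts.

    Assumption: distances are measured from the nucleotides flanking the
    editing site, so the fragment is upstream + editing site + downstream.
    """
    if min(upstream_kb, downstream_kb, editing_site_kb) < 0:
        raise ValueError("lengths must be non-negative")
    return upstream_kb + editing_site_kb + downstream_kb


def approximately_symmetric(upstream_kb, downstream_kb, tolerance=0.10):
    """True if the two distances differ by at most `tolerance` (hypothetical cutoff)."""
    longest = max(upstream_kb, downstream_kb)
    if longest == 0:
        return True
    return abs(upstream_kb - downstream_kb) / longest <= tolerance


# Examples taken from the distances listed above.
print(excised_fragment_kb(18, 10))        # 28.0
print(approximately_symmetric(66, 72))    # True
print(approximately_symmetric(160, 120))  # False
```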
  • a pair of targeted nucleases are CRISPR-Cas9 nucleases.
  • the upstream and downstream cutting locations are targeted by specifically designed gRNA sequences.
  • the methods described herein do not involve amplifying a DNA fragment surrounding the editing site using a pair of PCR primers.
  • the HMW DNA is cut and released from the edited genomic DNA using one or more targeted nucleases, for example, a single targeted nuclease or a pair of targeted nucleases as described herein.
  • the methods described herein may produce a HMW DNA fragment from a genome edited sample in a faithful and unbiased manner.
  • the design of appropriate PCR primers could be better informed; PCR may then be performed as a secondary, confirmatory test for a certain location or indel(s) discovered by the methods described herein.
  • the sample comprises an edited DNA, or an edited cell comprising an edited DNA.
  • the sample comprises a cell.
  • the method described herein comprises lysing the sample (e.g., a cell) to release the edited genomic DNA (e.g., prior to contacting the DNA with one or more targeted nucleases, such as a single targeted nuclease or a pair of targeted nucleases as described herein).
  • cell can be lysed by chemical or biochemical methods.
  • lysing comprises contacting the sample cell with hypotonic solution, enzyme (e.g., lysozyme or proteinase), and/or cell membrane disrupting agent such as detergent (e.g., SDS).
  • cell can be lysed by physical or mechanical methods, including but not limited to, sonication, freeze-thawing, or other shearing methods.
  • the sample comprises a prokaryotic cell, or a eukaryotic cell.
  • the sample comprises a bacterial cell, yeast cell, insect cell, plant cell, or mammalian cell.
  • the sample comprises an E. coli cell.
  • the sample comprises an animal cell.
  • the sample comprises a mouse cell, a rat cell, a hamster cell, a cow cell, a pig cell, a horse cell, a dog cell, a cat cell, a fish cell, a goat cell, a camelid cell, a sheep cell, or a chicken cell.
  • the sample comprises a zebrafish cell.
  • the sample comprises a human cell.
  • the sample comprises a human stem cell.
  • the sample comprises a human somatic cell (e.g., muscle cell, or neuron).
  • the edited cells are of prophylactic and/or therapeutic use.
  • the sample comprises an edited cell that is suitable for being administered into an animal (if the edited cell harbors the desired edits at the editing site and is free of harmful or dangerous DNA variant induced by the genome editing protocol).
  • the methods described herein comprise comparing the sequence of the DNA fragment, such as an HMW DNA fragment (after sequencing the HMW DNA fragment), to one or more reference sequences (e.g., the original sequence of the sample before genome editing, and/or a control sequence having the wildtype sequence of a gene).
  • the comparison may be conducted using suitable alignment or multiple alignment bioinformatic workflow.
  • the methods described herein comprise determining the nature of DNA variant(s) comprised within the DNA fragment (e.g., HMW DNA fragment). For example, in certain embodiments, a DNA variant(s) is determined to be an indel (e.g., insertion or deletion). In certain embodiments, a DNA variant(s) is determined to be a point mutation. In certain embodiments, a DNA variant(s) is determined to lead to a missense substitution that results in replacement of one amino acid with another. In certain embodiments, a DNA variant(s) is determined to lead to a nonsense substitution that results in a premature stop codon and shortened protein. In certain embodiments, a DNA variant(s) is determined to lead to a frameshift.
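As a rough illustration of how the nature of a variant might be determined after alignment, the sketch below classifies a variant from its reference and alternate alleles. The function name and labels are hypothetical, and the frameshift test assumes the variant lies within a protein-coding region; it is not the workflow used in the methods described herein.

```python
def classify_variant(ref, alt, in_coding_region=True):
    """Classify a variant from its reference and alternate allele strings."""
    if ref == alt:
        return "no change"
    delta = len(alt) - len(ref)
    if delta == 0:
        # Same length: single-base change or a longer substitution.
        return "point mutation" if len(ref) == 1 else "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    if not in_coding_region:
        return kind
    # A length change that is not a multiple of 3 shifts the reading frame.
    frame = "frameshift" if abs(delta) % 3 != 0 else "in-frame"
    return f"{kind} ({frame})"


print(classify_variant("A", "G"))     # point mutation
print(classify_variant("A", "ATG"))   # insertion (frameshift)
print(classify_variant("ACGT", "A"))  # deletion (in-frame)
```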
  • a DNA variant(s) is determined to be part of a rearrangement, or translocation event (e.g., as a result of chromothripsis).
  • a DNA variant(s) is determined to be part of a duplication, or inversion event.
  • a duplication occurs when a stretch of one or more nucleotides in a gene is copied and repeated (e.g., next to the original DNA sequence).
  • An inversion changes more than one nucleotide in a gene by replacing the original sequence with the same sequence in reverse order.
  • a DNA variant(s) is determined to be part of a repeat expansion event that increases the number of times that a short DNA sequence (e.g., trinucleotide or tetranucleotide) is repeated.
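The duplication, inversion, and repeat expansion events defined above can be illustrated on plain strings. This is a minimal sketch with hypothetical helper names. The inversion follows the definition given above (same sequence in reverse order); in double-stranded DNA an inverted segment would typically read as the reverse complement on a given strand, which this sketch does not model.

```python
def duplicate(seq, start, end):
    """Copy seq[start:end] and place the copy next to the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def invert(seq, start, end):
    """Replace seq[start:end] with the same sequence in reverse order."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


def expand_repeat(seq, unit, extra_copies):
    """Insert extra copies of a short repeat unit at its first occurrence."""
    idx = seq.find(unit)
    if idx < 0:
        raise ValueError("repeat unit not found in sequence")
    return seq[:idx] + unit * extra_copies + seq[idx:]


print(duplicate("AACGTT", 2, 4))                # AACGCGTT
print(invert("AACGTT", 1, 5))                   # ATGCAT
print(expand_repeat("TTCAGCAGTT", "CAG", 2))    # TTCAGCAGCAGCAGTT
```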
  • a population of cells subject to the same genome editing protocol as the sample cells tested in methods described herein may be only reserved for future study or discarded, or is suitable for subsequent application when the HMW DNA is generally free of detrimental DNA variant.
  • certain embodiments of the invention provide methods of treatment or a method of medical therapy for a disease.
  • the methods described herein further comprise administering into an animal a population of cells, wherein the administered population of cells and the sample cell were previously edited in the same genome editing protocol (e.g., administered cells, and the sample cells used for quality control were edited in the same genome-editing batch/process).
  • the sample comprises a stem cell. In certain embodiments, the sample comprises a hematopoietic stem cell. In certain embodiments, the sample comprises an induced pluripotent stem cell (iPSC). In certain embodiments, the sample comprises a patient derived iPSC. In certain embodiments, the sample comprises a pluripotent cell. In certain embodiments, the sample comprises a progenitor cell. In certain embodiments, the sample comprises a blood cell. In certain embodiments, the sample comprises an immune cell. In certain embodiments, the sample comprises a T cell (e.g., CAR-T cell). In certain embodiments, the sample comprises a dendritic cell. In certain embodiments, the sample comprises a Natural Killer cell. In certain embodiments, the sample comprises a B cell. In certain embodiments, the sample comprises a cancer cell.
  • the disease is a hereditary disease.
  • the disease is a blood disorder.
  • the disease is sickle cell disease.
  • the disease is thalassemia (e.g., beta thalassemia).
  • the disease is cancer.
  • the disease is an immune disorder.
  • the disease is a neuronal disorder (e.g., fronto-temporal dementia).
  • the disease is a muscular disorder (e.g., muscular dystrophy).
  • the edited cells may be of biomanufacturing use.
  • the cell is a human embryonic kidney (HEK) 293 cell.
  • the cell is a 293F cell.
  • the cell is a 293T cell.
  • the cell is a human embryonic retinal (PER.C6) cell.
  • the cell is an HT-1080 cell.
  • the cell is a Huh-7 cell.
  • the cell is a Monkey kidney epithelial (Vero) cell.
  • the cell is a Chinese Hamster Ovary (CHO) cell.
  • the cell is a baby hamster kidney (BHK) cell.
  • the sample comprises a hybridoma cell.
  • the methods described herein comprise electrophoresing.
  • a sample (e.g., cell) is introduced into a loading compartment of a device or a gel that is suitable for size selection process (e.g., electrophoresis).
  • the loading compartment comprises a solution.
  • sample cells are pipetted into the loading compartment of the device or gel.
  • the sample is lysed in situ within the loading compartment, releasing the edited genomic DNA from sample cell into the loading compartment.
  • sample cells are not lysed in situ within the loading compartment of the device or gel. It is understood by a person skilled in the art that in certain embodiments, such sample preparations are conducted in a suitable container (e.g., a tube) and then introduced into the loading compartment.
  • sample cells are encapsulated within a gel matrix.
  • the sample is lysed in situ within the gel matrix, releasing the edited genomic DNA from sample cell into the gel matrix.
  • the method further comprises one or more pretreatment steps to digest and/or elute lipid, protein, RNA, cellular metabolites, etc.
  • an initial electrophoresis step is conducted to elute smaller cellular content released from the sample cell, while ultra-large genomic DNA is unable to migrate through the gel under the electrophoretic field.
  • the genomic DNA is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) as described herein and incubated for a period of time (e.g., about 15-45 minutes) to generate the DNA fragment (e.g., HMW DNA fragment) comprising the editing site and DNA variant(s).
  • the genomic DNA released from an edited sample (e.g., an edited cell) is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) in a liquid solution (e.g., in liquid phase within a container, or within loading compartment of device or gel).
  • the genomic DNA from an edited sample is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) in a gel matrix (e.g., agarose gel).
  • the genomic DNA from an edited sample is contacted with the pair of downstream and upstream targeted nucleases simultaneously or sequentially.
  • the genomic DNA is contacted with one or the pair of targeted nuclease(s) for about 5 minutes to 6 hours, 10 minutes to 3 hours, 15 minutes to 2 hours, 20 minutes to 1.5 hours, 30 minutes to 1 hour, or 40 minutes to 50 minutes.
  • the genomic DNA from an edited sample is contacted with one or the pair of targeted nuclease(s) for at least about 5 minutes, 10 minutes, 15 minutes, 20 minutes, 25 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours or 3 hours.
  • the targeted nucleases are optionally inactivated to stop further enzymatic activities.
  • genomic DNA is purified using any suitable technique and then contacted with one or more targeted nucleases for incubation within a tube or any suitable container.
  • the resultant genomic DNA mixture, including the released DNA fragment (e.g., HMW DNA fragment), is then transferred into the loading compartment of a device or gel for separation (e.g., via electrophoresis).
  • the method comprises contacting the resultant genomic DNA mixture with a detergent (e.g., SDS).
  • this step may improve electrophoresis efficiency, separate certain DNA binding proteins from genomic DNA, and/or change the charge level of the genomic DNA or fragment.
  • the method comprises an isolating step that isolates the DNA fragment (e.g., HMW DNA fragment) from the genomic DNA mixture.
  • Any DNA isolation technology that isolates, purifies, or separates DNA fragment including high molecular weight (HMW) DNA fragment may be used for methods described herein.
  • the DNA isolation technology involves separating DNA molecules or fragments based on size.
  • the DNA isolating step comprises electrophoresing. In certain embodiments, the DNA isolating step comprises one dimensional electrophoresing.
  • the DNA isolating step comprises electrophoresing the DNA fragment, such as HMW DNA fragment (e.g., for a first period of time in a first direction). In certain embodiments, the DNA isolating step further comprises electrophoresing the DNA fragment, such as HMW DNA fragment, for a second period of time in a second direction. Accordingly, in certain embodiments, the DNA isolating step comprises two-dimensional electrophoresing.
  • the isolating step is conducted in a device suitable for one dimensional, or two-dimensional electrophoresis.
  • the isolating step may be conducted in a SageHLS™ device/protocol as disclosed in U.S. Patent Application 2020/0041449, which is incorporated by reference herein for all purposes.
  • HMW DNA fragment is electrophoresed for a first period of time in one direction for separation by size and then electrophoresed for a second period of time in another direction (e.g., a perpendicular direction) for elution from the gel and then isolated into a collection chamber.
  • the isolating step is conducted in any electrophoresis gel/device or protocol that is suitable for isolating an HMW DNA fragment by size.
  • a DNA ladder is used to help locate the HMW DNA fragment.
  • the HMW DNA fragment is retrieved by cutting the gel cube containing the HMW DNA fragment, followed by dissolving the gel to release the HMW DNA fragment.
  • the HMW DNA fragment is eluted from the gel for collection.
  • the isolated DNA fragments (e.g., HMW DNA fragments) are sequenced to provide a sequence result readout.
  • Any DNA sequencing technology that can provide sequence result over a high molecular weight (HMW) DNA fragment may be used for methods described herein.
  • the sequencing method is a high-throughput sequencing method, for example, a massively parallel signature sequencing (MPSS) method.
  • the sequencing method is a deep sequencing method.
  • the sequencing method is a shotgun sequencing method.
  • the sequencing method is a short-read sequencing method.
  • the sequencing method is a pyrosequencing method.
  • the sequencing method is a long-read sequencing method. In certain embodiments, the sequencing method is a Nanopore DNA sequencing method. In certain embodiments, the sequencing method is a single molecule real time (SMRT) sequencing method.
  • nucleic acid refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, composed of monomers (nucleotides) containing a sugar, phosphate and a base which is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified position thereof (e.g., degenerate codon substitutions) and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucl. Acids Res., 19:508 (1991); Ohtsuka et al., JBC, 260:2605 (1985); Rossolini et al., Mol. Cell. Probes, 8:91 (1994)).
  • a “nucleic acid fragment” is a fraction of a given nucleic acid molecule.
  • Deoxyribonucleic acid (DNA) in the majority of organisms is the genetic material while ribonucleic acid (RNA) is involved in the transfer of information contained within DNA into proteins.
  • nucleotide sequence refers to a polymer of DNA or RNA that can be single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers.
  • nucleic acid refers to a polymer of DNA or RNA that can be single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers.
  • the terms “nucleic acid,” “nucleic acid molecule,” “nucleic acid fragment,” “nucleic acid sequence or segment,” or “polynucleotide” may also be used interchangeably with gene, cDNA, DNA and RNA encoded by a gene.
  • by “portion” or “fragment,” as it relates to a nucleic acid molecule, sequence or segment of the invention, when it is linked to other sequences for expression, is meant a sequence having at least 80 nucleotides, more specifically at least 150 nucleotides, and still more specifically at least 400 nucleotides. If not employed for expressing, a “portion” or “fragment” means at least 9, specifically 12, more specifically 15, even more specifically at least 20, consecutive nucleotides, e.g., probes and primers (oligonucleotides), corresponding to the nucleotide sequence of the nucleic acid molecules of the invention.
  • an “isolated” or “purified” DNA molecule or an “isolated” or “purified” polypeptide is a DNA molecule or polypeptide that exists apart from its native environment and is therefore not a product of nature.
  • An isolated DNA molecule or polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, outside a host cell.
  • an "isolated” or “purified” nucleic acid molecule or protein, or biologically active portion thereof is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
  • an "isolated" nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.
  • the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences that naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived.
  • a protein that is substantially free of cellular material includes preparations of protein or polypeptide having less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating protein.
  • culture medium may represent less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.
  • Fragments and variants of the disclosed nucleotide sequences and proteins or partial-length proteins encoded thereby are also encompassed by the present invention.
  • by “fragment” or “portion” is meant a full length or less than full length of the nucleotide sequence encoding, or the amino acid sequence of, a polypeptide or protein.
  • Naturally occurring is used to describe an object that can be found in nature as distinct from being artificially produced.
  • a protein or nucleotide sequence present in an organism (including a virus), which can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory, is naturally occurring.
  • Recombinant DNA molecule is a combination of DNA sequences that are joined together using recombinant DNA technology and procedures used to join together DNA sequences as described, for example, in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press (3rd edition, 2001).
  • heterologous DNA sequence each refer to a sequence that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form.
  • a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified.
  • the terms also include non-naturally occurring multiple copies of a naturally occurring DNA sequence.
  • the terms refer to a DNA segment that is foreign or heterologous to the cell, or homologous to the cell but in a position within the host cell nucleic acid in which the element is not ordinarily found. Exogenous DNA segments are expressed to yield exogenous polypeptides.
  • a "homologous" DNA sequence is a DNA sequence that is naturally associated with a host cell into which it is introduced.
  • Wild-type refers to the normal gene, or organism found in nature without any known mutation.
  • Genome refers to the complete genetic material of an organism.
  • a “vector” is defined to include, inter alia, any plasmid, cosmid, phage or binary vector in double or single stranded linear or circular form which may or may not be self-transmissible or mobilizable, and which can transform prokaryotic or eukaryotic host either by integration into the cellular genome or exist extrachromosomally (e.g., autonomous replicating plasmid with an origin of replication).
  • Regulatory sequences each refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. As is noted above, the term “suitable regulatory sequences” is not limited to promoters. However, some suitable regulatory sequences useful in the present invention will include, but are not limited to constitutive promoters, tissue-specific promoters, development-specific promoters, inducible promoters and viral promoters.
  • 5' non-coding sequence refers to a nucleotide sequence located 5' (upstream) to the coding sequence. It is present in the fully processed mRNA upstream of the initiation codon and may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency (Turner et al., Mol. Biotech., 3:225 (1995)).
  • 3' non-coding sequence refers to nucleotide sequences located 3' (downstream) to a coding sequence and include polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
  • the polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
  • Promoter refers to a nucleotide sequence, usually upstream (5') to its coding sequence, which controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription.
  • Promoter includes a minimal promoter that is a short DNA sequence comprised of a TATA-box and other sequences that serve to specify the site of transcription initiation, to which regulatory elements are added for control of expression.
  • Promoter also refers to a nucleotide sequence that includes a minimal promoter plus regulatory elements that is capable of controlling the expression of a coding sequence or functional RNA. This type of promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers.
  • an “enhancer” is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even be comprised of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors that control the effectiveness of transcription initiation in response to physiological or developmental conditions.
  • the "initiation site” is the position surrounding the first nucleotide that is part of the transcribed sequence, which is also defined as position +1. With respect to this site all other sequences of the gene and its controlling regions are numbered. Downstream sequences (i.e. further protein encoding sequences in the 3' direction) are denominated positive, while upstream sequences (mostly of the controlling regions in the 5' direction) are denominated negative.
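The numbering convention described above can be sketched as a small helper that labels positions relative to the initiation site. The function name is hypothetical, and the sketch assumes the common convention that there is no position 0, so the nucleotide immediately upstream of +1 is labeled -1; the text above only states that the initiation site is +1 and that upstream positions are negative.

```python
def position_label(index, initiation_index):
    """Label a 0-based sequence index relative to the initiation site (+1).

    Assumption: there is no position 0; the base just upstream of +1 is -1.
    """
    offset = index - initiation_index
    if offset >= 0:
        return f"+{offset + 1}"
    return str(offset)


initiation = 100  # hypothetical 0-based index of the first transcribed nucleotide
print(position_label(100, initiation))  # +1
print(position_label(99, initiation))   # -1
print(position_label(109, initiation))  # +10
print(position_label(65, initiation))   # -35
```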
  • promoter elements particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation are referred to as "minimal or core promoters.”
  • in the presence of a suitable transcription factor, the minimal promoter functions to permit transcription.
  • a “minimal or core promoter” thus consists only of all basal elements needed for transcription initiation, e.g., a TATA box and/or an initiator.
  • the following terms are used to describe the relationships between two or more sequences (e.g., nucleic acids, polynucleotides or polypeptides): (a) “reference sequence,” (b) “comparison window,” (c) “sequence identity,” (d) “percentage of sequence identity,” and (e) “substantial identity.”
  • reference sequence is a defined sequence used as a basis for sequence comparison.
  • a reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full length cDNA, gene sequence or peptide sequence, or the complete cDNA, gene sequence or peptide sequence.
  • comparison window makes reference to a contiguous and specified segment of a sequence, wherein the sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences.
  • the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer.
  • Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to, CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, California); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin, USA). Alignments using these programs can be performed using the default parameters.
  • the CLUSTAL program is well described by Higgins et al., Gene, 73:237 (1988); Higgins et al., CABIOS, 5: 151 (1989); Corpet et al., Nucl.
  • HSPs high scoring sequence pairs
  • Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0).
  • a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction are halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached.
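The halting conditions for word-hit extension described above can be sketched for the ungapped nucleotide case. This is a simplified illustration, not the BLAST implementation: the function name, the `seed_score` parameter, and the default values of M, N, and X are assumptions chosen for the example, and only rightward extension is shown.

```python
def extend_right(query, subject, q_start, s_start, M=1, N=-3, X=5, seed_score=0):
    """Extend an ungapped hit to the right; return (best_score, best_length).

    Extension stops when the score drops by X from its maximum, when the
    cumulative score goes to zero or below, or when either sequence ends.
    """
    score = best = seed_score
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        # Reward a match with M (> 0), penalize a mismatch with N (< 0).
        score += M if query[q_start + i] == subject[s_start + i] else N
        i += 1
        if score > best:
            best, best_len = score, i
        if score <= 0 or best - score >= X:
            break
    return best, best_len


print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0))  # (8, 8)
print(extend_right("AAAA", "AAAA", 0, 0))                  # (4, 4)
```

In the first call the extension continues past one mismatch (score 8 to 5) and halts after the second, when the drop from the maximum reaches X; the best-scoring prefix of length 8 is reported.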
  • in addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences.
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more specifically less than about 0.01, and most specifically less than about 0.001.
  • Gapped BLAST in BLAST 2.0
  • PSI-BLAST in BLAST 2.0
  • the default parameters of the respective programs e.g., BLASTN for nucleotide sequences, BLASTX for proteins
  • W wordlength
  • E expectation
  • BLOSUM62 scoring matrix. See the world wide web at ncbi.nlm.nih.gov. Alignment may also be performed manually by visual inspection.
  • comparison of sequences for determination of percent sequence identity to another sequence may be made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program.
  • by “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the preferred program.
  • sequence identity or “identity” in the context of two nucleic acid or polypeptide sequences makes reference to a specified percentage of residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window, as measured by sequence comparison algorithms or by visual inspection.
  • When percentage of sequence identity is used in reference to proteins, it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule.
  • When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution.
  • Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity." Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
  • percentage of sequence identity means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
  • sequence identity means that a polynucleotide comprises a sequence that has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, or 79%, at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, at least 90%, 91%, 92%, 93%, or 94%, and at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters.
  • nucleotide sequences are substantially identical if two molecules hybridize to each other under stringent conditions (see below).
  • stringent conditions are selected to be about 5 °C lower than the thermal melting point (T m ) for the specific sequence at a defined ionic strength and pH.
  • T m thermal melting point
  • stringent conditions encompass temperatures in the range of about 1°C to about 20°C, depending upon the desired degree of stringency as otherwise qualified herein.
  • Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
  • One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.
  • Substantial identity in the context of a peptide indicates that a peptide comprises a sequence with at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, or 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, at least 90%, 91%, 92%, 93%, or 94%, or 95%, 96%, 97%, 98% or 99%, sequence identity to the reference sequence over a specified comparison window.
  • Optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970).
  • An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide.
  • a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution.
  • For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared.
  • test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated.
  • sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
  • “Hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).
  • “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.
  • T m The thermal melting point
  • T m is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution.
  • T m can be approximated from the equation of Meinkoth and Wahl, Anal.
  • T m = 81.5°C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs.
  • M is the molarity of monovalent cations
  • %GC is the percentage of guanosine and cytosine nucleotides in the DNA
  • % form is the percentage of formamide in the hybridization solution
  • L is the length of the hybrid in base pairs.
  • T m is reduced by about 1°C for each 1% of mismatching; thus, T m , hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the T m can be decreased 10°C.
  • stringent conditions are selected to be about 5°C lower than the T m for the specific sequence and its complement at a defined ionic strength and pH.
  • severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4°C lower than the T m ;
  • moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C lower than the T m ;
  • low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the T m .
  • An example of highly stringent wash conditions is 0.15 M NaCl at 72°C for about 15 minutes.
  • An example of stringent wash conditions is a 0.2X SSC wash at 65°C for 15 minutes (see, Sambrook, infra, for a description of SSC buffer).
  • a high stringency wash is preceded by a low stringency wash to remove background probe signal.
  • An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides is 1X SSC at 45°C for 15 minutes.
  • An example low stringency wash for a duplex of, e.g., more than 100 nucleotides is 4-6X SSC at 40°C for 15 minutes.
  • stringent conditions typically involve salt concentrations of less than about 1.5 M, more specifically about 0.01 to 1.0 M, Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30°C for short probes and at least about 60°C for long probes (e.g., >50 nucleotides).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • a signal to noise ratio of 2X (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.
  • Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
  • Very stringent conditions are selected to be equal to the T m for a particular probe.
  • An example of stringent conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or Northern blot is 50% formamide, e.g., hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60 to 65°C.
  • Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C.
  • a genome editing protocol was implemented in this Example for correction of the sickle mutation in human hematopoietic stem cells, using a Cas9 ribonucleoprotein targeting cleavage of the β-globin gene (HBB) near the mutation site, and an oligonucleotide serving as a donor template for homology-directed repair (HDR) of cleavage and correction of the disease-causing mutation.
  • HBB β-globin gene
  • HDR homology-directed repair
  • This protocol drives correction of more than about 20% of HBB alleles in human hematopoietic stem cells. However, the remainder of edited alleles in the cell population are repaired by NHEJ. Indels induced by the genome editing protocol may produce null alleles equivalent to β-thalassemia mutations, which is a major safety concern.
  • the invention as described herein was partly designed to detect and quantify longer indels, up to tens of kb in size, that may be produced by repair of Cas9-induced DSBs.
  • Human hematopoietic stem cells that had been edited with a CRISPR/Cas9 based protocol were loaded onto the SageHLS™ chip and then were further treated with a pair of targeted nucleases (Cas9/guide RNA complexes) that cleave chromosome 11 approximately 100kb upstream and downstream of the editing site.
  • Cleavage liberated a fragment of approximately 200kb spanning the editing site for the comprehensive detection of a full spectrum of indels, including large deletions (up to 200kb) that may be present in the edited region.
  • the liberated region was separated by gel electrophoresis and eluted from the gel using the SageHLS™ apparatus.
  • the recovered DNA was fragmented and cloned into an Illumina® sequencing library using standard procedures and sequenced in one lane of an Illumina® NovaSeq® apparatus to produce paired end 150 bp reads at a depth of 955,551,306 read pairs.
  • Reads were processed with a bioinformatic pipeline that identifies structural variations (specifically large deletions) by identifying breakpoints that bracket the cut site to the up and downstream region. Sequence reads (150bp PE reads) were aligned to the human reference genome (hg38) using BWA. Reads were then labeled and extracted if they were split reads (a single read whose ends map to non-adjacent regions) or reads that were part of a pair that maps discordantly (farther from each other in the genome than would be expected for the average insert size). Breakpoints with ends that map to distant regions identify the boundaries of large deletions. The reads were then analyzed using structural variation callers and events that span the targeted cut site locus were extracted as putative large deletion events.
  • the invention encompasses each intervening value between the upper and lower limits of the range to at least a tenth of the lower limit's unit, unless the context clearly indicates otherwise. Further, the invention encompasses any other stated intervening values. Moreover, the invention also encompasses ranges excluding either or both of the upper and lower limits of the range, unless specifically excluded from the stated range.
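
The read-selection step of the bioinformatic pipeline described above (keeping split reads and discordantly mapped pairs whose ends bracket the cut site) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the Example: the `Alignment` record, its field names, the `max_insert` threshold, and the coordinates are hypothetical stand-ins for what an aligner such as BWA reports, and a split read is assumed to map both of its parts to the same chromosome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical, simplified view of one aligned read of a pair."""
    name: str
    chrom: str
    pos: int                          # mapped position of this read
    mate_chrom: str
    mate_pos: int                     # mapped position of its mate
    has_split: bool = False           # part of the read maps to a non-adjacent region
    split_pos: Optional[int] = None   # position of that other part (same chromosome assumed)


def is_discordant(aln, max_insert=1000):
    """A pair is discordant if the mates map to different chromosomes or
    farther apart than the assumed maximum insert size (max_insert)."""
    if aln.chrom != aln.mate_chrom:
        return True
    return abs(aln.mate_pos - aln.pos) > max_insert


def candidate_deletion_reads(alignments, cut_site, max_insert=1000):
    """Return names of split reads or discordant pairs whose two ends
    bracket the cut site, i.e. putative large-deletion evidence."""
    kept = []
    for aln in alignments:
        if aln.has_split and aln.split_pos is not None:
            left, right = sorted((aln.pos, aln.split_pos))
        elif is_discordant(aln, max_insert) and aln.chrom == aln.mate_chrom:
            left, right = sorted((aln.pos, aln.mate_pos))
        else:
            continue  # concordant pair, or mates on different chromosomes
        if left < cut_site < right:
            kept.append(aln.name)
    return kept


# Illustrative coordinates only (not real genomic positions).
reads = [
    Alignment("r1", "chr11", 1000, "chr11", 1300),
    Alignment("r2", "chr11", 1000, "chr11", 9000),
    Alignment("r3", "chr11", 4000, "chr11", 4200, True, 12000),
    Alignment("r4", "chr11", 1000, "chr2", 5000),
]
selected = candidate_deletion_reads(reads, cut_site=5000)
```

In practice the selected reads would then be passed to a structural variation caller, as the text describes; the sketch covers only the labeling and extraction step.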

Abstract

Certain embodiments of the present invention provide a method for unbiased discovery of the full spectrum of indels induced by a genome editing protocol. In certain embodiments, the present invention provides a method for discovering large and/or distant indels induced by a genome editing protocol.

Description

METHODS FOR DETECTING INDEL PRODUCED BY GENOME EDITING PROTOCOL
CROSS-REFERENCE TO RELATED APPLICATION(S)
This application claims priority to U.S. Provisional Application No. 63/229,943, filed August 5, 2021, the contents of which are incorporated by reference herein.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made with government support under HL 151319 awarded by National Institutes of Health. The government has certain rights in the invention.
BACKGROUND
Gene editing typically involves the use of a targeted nuclease to induce double-strand DNA breaks (DSBs) at specific genomic sites. DSBs are then repaired by one of two cellular mechanisms: homology-directed repair (HDR) uses a DNA template to repair the DSB, while nonhomologous end joining (NHEJ) directly repairs the DSB but frequently creates an insertion or deletion mutation (indel) at the DSB site. Because HDR is usually less efficient than NHEJ, even protocols that use a DNA template to edit the region around a DSB result in a large proportion of repaired alleles containing an indel. Current methods for determining the genotypes produced by genome editing involve Polymerase Chain Reaction (PCR) amplification of DNA fragments followed by deep sequencing. These methods have limitations and do not reveal the full extent of the induced indel landscape. Accordingly, new and improved methods for discovering indels induced by a genome editing process are needed.
SUMMARY
Certain embodiments of the invention provide a method of providing an unbiased, full landscape of mutations induced by a genome editing protocol (e.g., CRISPR-Cas, TALEN, or ZFN based protocol, or other genome-editing protocol).
Certain embodiments of the invention provide a method of identifying in a sample a DNA variant induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease, or a pair of targeted nucleases) that is capable of excising a DNA fragment (e.g., a high molecular weight DNA fragment) from the genomic DNA; isolating the DNA fragment; and sequencing the isolated DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the DNA fragment comprises the editing site and the DNA variant (e.g., large and/or remote DNA variant).
DETAILED DESCRIPTION
Genome editing holds great promise in a wide range of applications from advancing basic research to revolutionizing treatment for certain intractable diseases. Genome editing protocols are known in the art, and the field continues to evolve. Currently CRISPR/Cas based protocols are efficient and facile genome editing approaches. Before the advent of CRISPR/Cas, transcription activator-like effector nuclease (TALEN), and Zinc finger nuclease (ZFN) based platforms were also widely adopted genome editing technologies.
In the process of genome editing, a double stranded break (DSB) and/or a single stranded break may be generated by a targeted nuclease or nickase at a specifically targeted editing site. During the DNA repair or ligation for the break, unintended DNA modifications may be randomly introduced as DNA variant artifacts. DNA repair mechanisms often involved in genome editing processes may include, but are not limited to, homology directed repair (HDR) and non-homologous end joining (NHEJ). Without wanting to be bound by theory, the NHEJ process may be particularly prone to introducing unintended DNA modifications relative to the original DNA sequence. DNA repair or ligation mechanism(s), however, are not fully elucidated. In addition, the unbiased, full spectrum of DNA variants (e.g., indels) unintentionally introduced by a genome editing process is not well characterized and understood.
Conventionally, validation and characterization of post-editing outcomes involve amplifying a DNA fragment surrounding the editing site via polymerase chain reaction (PCR). For example, a pair of PCR primers are designed and located upstream and downstream of the editing site. Depending on the nature of PCR primers and the genome-edited template DNA, however, PCR reaction kinetics may bias the amplified fragments, leading to a skewed and/or incomplete representation of the indel landscape. For example, since a DNA variant can be unintentionally and unpredictably introduced by a genome editing protocol, the resulting template DNA may lack the sequence sufficiently complementary to the designed PCR primer or even lack the sequence entirely due to deletion. Therefore, PCR based post-editing assessments or quality controls by themselves may be insufficient to faithfully enumerate the whole gamut of indels. There are different types of DNA variants (e.g., indels) that have been unrecognized or underappreciated so far. The inadequacy of current post-editing evaluation workflow could have major consequences for genome editing applications including their adoption in medicine to deliver effective and safe therapies. Certain embodiments of the invention provide methods of generating a catalogue of mutations induced by a genome editing protocol. Certain embodiments described herein provide efficient methods suitable to provide an unbiased, full landscape of mutations induced by a genome editing protocol, including but not limited to, large indels, and/or remote indels that are distant from the targeted editing site.
In certain embodiments, the invention provides a method of identifying in a sample a DNA variant unintendedly induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) that is capable of excising a DNA fragment from the genomic DNA, isolating the DNA fragment, and sequencing the DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the DNA fragment comprises the editing site and the DNA variant. In certain embodiments, the sample is genome edited with a genome editing protocol selected from the group consisting of CRISPR-Cas based protocol, TALEN based protocol, and ZFN based protocol. In certain embodiments, the sample is genome edited using a CRISPR-Cas based genome-editing protocol.
In certain embodiments, the invention provides a method of identifying in a sample a DNA variant unintendedly induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) that is capable of excising a high molecular weight (HMW) DNA fragment from the genomic DNA, isolating the HMW DNA fragment, and sequencing the HMW DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the HMW DNA fragment comprises the editing site and the DNA variant.
The term “DNA variant” as described herein refers to an unintended DNA sequence modification induced by a genome editing protocol. In the process of genome editing, a double stranded break (DSB) may be generated by a targeted nuclease at a specifically targeted editing site. During DNA repair processes, unintended DNA modifications may be randomly introduced as a side effect on the edited DNA molecule. In certain embodiments, the DNA variant is an indel (insertion or deletion). In certain embodiments, the DNA variant is a deletion. In certain embodiments, the DNA variant is an insertion. In certain embodiments, the DNA variant is a point mutation. In certain embodiments, one or more point mutation DNA variants may be contiguous; for example, multiple point mutations in a row form a segment mutation (e.g., 2 base pairs (bp) or longer in length), so that the segment sequence is entirely replaced but no change in length occurs in the segment. In certain embodiments, a DNA variant may be disadvantageous, or even harmful (e.g., to an edited cell, or to a host, or to a recipient of the edited cell). In certain embodiments, a DNA variant may be harmless, or even beneficial. The term “DNA variant” as described herein also encompasses DNA rearrangement and/or translocation as unintended DNA modification induced by a genome editing protocol. For example, chromothripsis is a mutational phenomenon of clustered chromosomal rearrangement occurring in localized genomic region(s). DNA rearrangement and/or translocation may involve the deletion of a DNA segment at one location of the genomic DNA and the insertion of the DNA segment at another location of the genomic DNA. In certain embodiments, the DNA variant is a deletion of a DNA segment that has been, completely or partially, rearranged or translocated or inserted into another location of the genomic DNA.
In certain embodiments, the DNA variant is an insertion of a DNA segment that has been, completely or partially, rearranged or translocated or deleted at another location of the genomic DNA. In certain embodiments, the DNA variant is an insertion of a DNA segment that has been, completely or partially, copied from another location of the genomic DNA so the copy number of the DNA segment might be changed (e.g., increased) in the genome.
Methods described herein are suitable for discovering DNA variants that may have been missed by currently available protocols. In certain embodiments the DNA variants include short and long DNA variants, near and remote DNA variants, and isolated and clustered DNA variants. In certain embodiments, the DNA variant has a length of at least about 1bp, 2bp, 3bp, 4bp, 5bp, 10bp, 20bp, 30bp, 40bp, 50bp, 60bp, 70bp, 80bp, 90bp, 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 1.1kb, 1.2kb, 1.3kb, 1.4kb, 1.5kb, 1.6kb, 1.7kb, 1.8kb, 1.9kb, 2kb, 2.5kb, 3kb, 3.5kb, 4kb, 4.5kb, 5kb, 5.5kb, 6kb, 6.5kb, 7kb, 7.5kb, 8kb, 8.5kb, 9kb, 9.5kb, 10kb, 11kb, 12kb, 13kb, 14kb, 15kb, 16kb, 17kb, 18kb, 19kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, or longer. As used herein, a short DNA variant indicates a DNA variant that has a length of less than about 100bp. As used herein, a long or large DNA variant indicates a DNA variant that is at least about 100bp in length. In certain embodiments, the large DNA variant (e.g., an indel) has a length of about 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 1.1kb, 1.2kb, 1.3kb, 1.4kb, 1.5kb, 1.6kb, 1.7kb, 1.8kb, 1.9kb, 2kb, 2.5kb, 3kb, 3.5kb, 4kb, 4.5kb, 5kb, 5.5kb, 6kb, 6.5kb, 7kb, 7.5kb, 8kb, 8.5kb, 9kb, 9.5kb, 10kb, 11kb, 12kb, 13kb, 14kb, 15kb, 16kb, 17kb, 18kb, 19kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, or longer. In certain embodiments, the DNA variant is an indel that has a length of at least 1 kilobase (kb). In certain embodiments, the DNA variant is an indel that has a length of at least 2kb. In certain embodiments, the DNA variant is an indel that has a length of at least 3kb. In certain embodiments, the DNA variant is an indel that has a length of at least 4kb. In certain embodiments, the DNA variant is an indel that has a length of at least 5kb.
In certain embodiments, the DNA variant is an indel that has a length of at least 6kb. In certain embodiments, the DNA variant is an indel that has a length of at least 7kb. In certain embodiments, the DNA variant is an indel that has a length of at least 8kb. In certain embodiments, the DNA variant is an indel that has a length of at least 9kb. In certain embodiments, the DNA variant is an indel that has a length of at least 10kb. As non-limiting examples, in certain embodiments, a large DNA variant (e.g., indel) has a length of about 100bp to 100kb, 1kb to 90kb, 2kb to 80kb, 5kb to 70kb, 10kb to 60kb, or 15kb to 50kb. Accordingly, the method described herein is capable of detecting a large DNA variant (e.g., having a length of about 100bp to 100kb, such as 2kb, 5kb or 15kb) as described above.
The method described herein is particularly suitable to generate an unbiased, full spectrum of DNA variants, including remote DNA variants induced by a genome editing protocol. As used herein, a near DNA variant indicates a DNA variant that is less than about 500 bases from the editing site. As used herein, a remote DNA variant indicates a DNA variant that is at least about 500 bases from the editing site. In certain embodiments, the DNA variant (e.g., indel) is at least about 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb or further, away from the editing site. In certain embodiments, the DNA variant (e.g., insertion or deletion) is at least about 500 bases away from the editing site. In certain embodiments, the DNA variant is at least about 1kb, 2kb, 3kb, 4kb or 5kb away from the editing site. In certain embodiments, the DNA variant is at least about 10kb away from the editing site. In certain embodiments, the DNA variant is at least about 20kb away from the editing site. In certain embodiments, the DNA variant is at least about 30kb away from the editing site. In certain embodiments, the DNA variant is at least about 40kb away from the editing site. In certain embodiments, the DNA variant is at least about 50kb away from the editing site. In certain embodiments, the DNA variant is at least about 60kb away from the editing site. In certain embodiments, the DNA variant is at least about 100kb away from the editing site. In certain embodiments, the DNA variant is at least about 200kb, or further, away from the editing site. As non-limiting examples, in certain embodiments, the DNA variant is about 500bp to 200kb, 1kb to 190kb, 2kb to 180kb, 5kb to 170kb, 10kb to 160kb, or 15kb to 150kb away from the editing site.
Accordingly, the method described herein is capable of detecting a remote DNA variant (e.g., from about 500bp to 200kb away, such as 10kb, 50kb or 60kb away from the editing site) as described above.
In certain embodiments, a DNA variant may have any combination of a length described herein and a distance from the editing site described herein. Accordingly, the method described herein is capable of detecting a large and/or remote DNA variant as described herein. For example, in certain embodiments, the method described herein is capable of detecting a large and remote DNA variant that is 5kb in length and at least 60kb away from the editing site. In certain embodiments, a large and remote DNA variant may have a length of at least 100bp and be at least 500bp away from the editing site. In certain embodiments, a DNA variant may have a length of at least 300bp and be at least 800bp away from the editing site. In certain embodiments, a DNA variant may have a length of at least 500bp and be at least 1kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 800bp and be at least 3kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 1kb and be at least 5kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 2kb and be at least 10kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 3kb and be at least 20kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 4kb and be at least 30kb away from the editing site. In certain embodiments, a DNA variant may have a length of at least 5kb and be at least 40kb away from the editing site.
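The size and distance classes defined above (short versus long at about 100bp, near versus remote at about 500 bases from the editing site) can be combined in a small helper. This is only a sketch: the function name is hypothetical, and treating the "about" thresholds as hard cutoffs is an assumption made for illustration.

```python
LONG_THRESHOLD_BP = 100     # "long or large" variants are at least about 100 bp
REMOTE_THRESHOLD_BP = 500   # "remote" variants are at least about 500 bases away


def classify_variant(length_bp, distance_bp):
    """Return (size_class, distance_class) for one DNA variant.

    length_bp   -- length of the variant in base pairs
    distance_bp -- distance of the variant from the editing site in bases
    """
    if length_bp < 0 or distance_bp < 0:
        raise ValueError("length and distance must be non-negative")
    size = "long" if length_bp >= LONG_THRESHOLD_BP else "short"
    location = "remote" if distance_bp >= REMOTE_THRESHOLD_BP else "near"
    return size, location


# The 5kb variant located 60kb from the editing site mentioned in the text:
example = classify_variant(5_000, 60_000)
```

Under these assumed cutoffs, the 5kb variant located 60kb from the editing site falls in the long and remote classes, matching the "large and remote" wording of the text.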
In certain embodiments, methods described herein are suitable for generating a full spectrum of DNA variants or characterizing extensive DNA variants induced by a genome-editing protocol. In certain embodiments, one DNA variant is characterized. In certain embodiments, two or more DNA variants are characterized. In certain embodiments, at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90 or more DNA variants are characterized. In certain embodiments, at least 2 DNA variants are characterized. In certain embodiments, at least 5 DNA variants are characterized. In certain embodiments, at least 10 DNA variants are characterized. In certain embodiments, at least 20 DNA variants are characterized. In certain embodiments, at least 30 DNA variants are characterized. In certain embodiments, at least 40 DNA variants are characterized. In certain embodiments, at least 50 DNA variants are characterized. In certain embodiments, at least 60 DNA variants are characterized. As used herein, an isolated DNA variant indicates one DNA variant having no other DNA variant in the proximity within 5kb upstream of the DNA variant 5' end and 5kb downstream of the DNA variant 3' end. As used herein, clustered DNA variants indicate that the distance between two DNA variants is shorter than 5kb (distance from 3' end of one DNA variant to the 5' end of another DNA variant is <5kb).
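The isolated versus clustered definitions above can be applied to a list of variant intervals as sketched below. The function name and the `(start, end)` tuple representation are hypothetical, all variants are assumed to lie on the same chromosome, and a gap of exactly 5kb is treated as isolated, which is an assumption about a boundary the text leaves open.

```python
def label_isolated_or_clustered(variants, window_bp=5000):
    """Label each variant as 'clustered' or 'isolated'.

    variants -- list of (start, end) coordinates, start <= end, one chromosome
    Returns a list of (start, end, label) sorted by coordinate.
    """
    ordered = sorted(variants)
    labelled = []
    for i, (start, end) in enumerate(ordered):
        label = "isolated"
        for j, (other_start, other_end) in enumerate(ordered):
            if i == j:
                continue
            if other_start >= end:        # other variant lies downstream
                gap = other_start - end
            elif start >= other_end:      # other variant lies upstream
                gap = start - other_end
            else:                         # intervals overlap
                gap = 0
            if gap < window_bp:
                label = "clustered"
                break
        labelled.append((start, end, label))
    return labelled


# Illustrative coordinates only.
labels = label_isolated_or_clustered([(1000, 1200), (20000, 20500), (3000, 3100)])
```

The pairwise comparison is quadratic in the number of variants, which is adequate for the tens of variants discussed here; a sorted sweep would be preferable for much larger sets.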
The term “editing site” as described herein refers to the intended target site on a genomic DNA (e.g., chromosomal, mitochondrial or plasmid DNA) for editing in a genome editing protocol. The target site for editing is purposefully and rationally chosen. For example, the editing site in a CRISPR-Cas based genome editing protocol may be targeted by a specifically designed guide RNA (gRNA), while in a TALEN or ZFN based protocol the editing site is targeted via a specifically designed TALE nuclease or zinc finger nuclease. The desired outcome of genome editing at the editing site may include, but is not limited to, correction of a single point mutation at the editing site, replacement of a deleterious DNA segment with a beneficial DNA segment at the editing site, knocking out of an undesirable DNA segment at the editing site, or knocking in of a desirable DNA segment at the editing site. The editing site may be located at a protein coding region or a non-coding region. The editing site may be located at a regulatory region such as a promoter, enhancer, 5' or 3'-untranslated region (UTR). The editing site may be located at a transposon or retrotransposon. The editing site may be located at a microRNA coding region. The editing site may be located at a site encoding a splice signal. The editing site may be located within chromosomal DNA, mitochondrial DNA, or plasmid DNA.
The editing site may be a single nucleotide in length (e.g., for point mutation editing). The editing site may be a DNA segment (longer than a single nucleotide) that has a 5' end and a 3' end. As non-limiting examples, the editing site (e.g., for gene replacement, or knock-out) may have a length of about 1bp, 2bp, 3bp, 4bp, 5bp, 10bp, 25bp, 50bp, 100bp, 200bp, 500bp, 1kb, 2kb, 5kb, 10kb, or longer. The editing site may have the same length prior to and after the genome editing protocol (e.g., single point edit, or replacement of a DNA segment of equal length). The editing site may have different lengths prior to and after the genome editing protocol (e.g., knock out, knock in, or replacement of a DNA segment of differing length). Given the rational design nature of a genome editing protocol, the editing site and the immediately adjacent nucleotide(s) surrounding the editing site have definitive location/loci in a genome map (e.g., chromosome map), and the sequence at the editing site and its close proximity can be located and probed precisely before and/or after the genome editing protocol. For example, prior to a genome editing process, a cell may carry a disease-causing allele at the editing site, the sequence of which can be probed and ascertained with PCR/sequencing or any other suitable sequencing, genotyping or diagnostic methods. Likewise, after the genome editing process, the DNA sequence at the editing site can be probed and ascertained to provide an indication whether the correct DNA sequence is now present at the editing site as intended. Accordingly, the editing site of a genomic DNA in a cell may have a successfully edited DNA sequence as intended by the genome editing protocol. Alternatively, the editing site of a genomic DNA in a cell may have the original, unedited DNA sequence prior to or after the genome editing protocol. In some instances, the editing site of a genomic DNA may display correct editing on one chromosome and not on its homologous chromosome.
It is also possible that the editing site of a genomic DNA in a cell may have partially edited DNA sequence that falls short of the full length of the intended DNA segment for knock out, knock in, or replacement.
In certain embodiments, the methods described herein comprise contacting an edited genomic DNA with one or more targeted nucleases (e.g., one single targeted nuclease or a pair of targeted nucleases) that is capable of excising a DNA fragment, such as a high molecular weight (HMW) DNA fragment, from the edited genomic DNA.
In certain embodiments, the one or more targeted nucleases comprises one, two, or more targeted nucleases. In certain embodiments, the one or more targeted nucleases comprises one single targeted nuclease. In certain embodiments, the single targeted nuclease may be designed to cut a linear genomic DNA for excision and release of a HMW DNA fragment of interest that comprises the editing site and one end of the linear genomic DNA.
In certain embodiments, the one or more targeted nucleases comprises a pair of targeted nucleases. The pair of targeted nucleases (downstream nuclease and upstream nuclease) is designed to cut at downstream and upstream of the editing site respectively for excision and release of a DNA fragment (e.g., HMW DNA fragment) that includes the editing site and flanking sequences.
Accordingly, the HMW DNA fragment may also comprise one or more DNA variant induced by a genome editing protocol (e.g., induced by NHEJ repair following genome editing). In certain embodiments, the HMW DNA fragment may comprise an unbiased, full spectrum of any DNA variants, including large and/or remote DNA variant(s) as described herein.
In certain embodiments, the one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) comprises a CRISPR-Cas nuclease, a transcription activator-like effector nuclease (TALEN), a zinc-finger nuclease (ZFN), or a meganuclease. In certain embodiments, the one or more targeted nucleases comprise a CRISPR-Cas nuclease. In certain embodiments, the one or more targeted nucleases comprise a CRISPR-Cas9 nuclease. In certain embodiments, the one or more targeted nucleases comprise Streptococcus pyogenes Cas9 nuclease (SpCas9). In certain embodiments, the one or more targeted nucleases comprise a Staphylococcus aureus Cas9 nuclease (SaCas9). In certain embodiments, the one or more targeted nucleases comprise a CRISPR-Cas12a nuclease.
In certain embodiments, the pair of targeted nucleases comprises two of the same class of nucleases (e.g., two Cas nucleases, or two ZFNs). In certain embodiments, the pair of targeted nucleases comprises a pair of CRISPR-Cas9 nucleases. In certain embodiments, the pair of targeted nucleases comprises two types of nucleases within the same class (e.g., a SpCas9 and a SaCas9; or a Cas9 and a Cas12a). In certain embodiments, the pair of targeted nucleases comprises two different classes of nucleases (e.g., a Cas nuclease and a non-Cas nuclease such as a TALEN or ZFN).
The methods of using targeted nuclease systems (e.g., CRISPR-Cas, TALEN, or ZFN) for selectively creating a DNA break at a targeted location of a DNA molecule are known in the art and described herein. For example, Class 2 Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems, which form an adaptive immune system in bacteria, were adapted for genome engineering. Due to its comparative simplicity and adaptability, CRISPR has rapidly become the most popular genome editing approach and has gained widespread adoption in both industry and academic labs. Exemplary CRISPR-Cas systems comprise two components: a guide RNA (gRNA or sgRNA) and a CRISPR-associated endonuclease (Cas protein). The gRNA is a short synthetic RNA comprising a scaffold sequence necessary for Cas-binding and a user-defined spacer of about 20 nucleotides that defines the genomic target to be modified. Thus, one can change the genomic target of the Cas protein by changing the sequence of the gRNA. Overall, the design and generation of targeted nuclease systems, or further engineered CRISPR, TALEN and ZFN derivative systems, are known and practiced in the field and further supported by commercially available services. In addition, exemplary U.S. patents directed to targeted nuclease systems, such as U.S. Patent 8,586,363; U.S. Patent 9,393,257; U.S. Patent 9,982,277; U.S. Patent 10,266,850; and U.S. Patent 10,570,418 are incorporated by reference herein for all purposes.
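As a hedged illustration of how a spacer of about 20 nucleotides defines the genomic target, the following Python sketch scans the forward strand of a sequence for 20-nucleotide candidate spacers that are immediately followed by an NGG protospacer adjacent motif (PAM). The NGG PAM requirement of SpCas9 is general background knowledge and is not stated in the passage above; the function name and the toy sequence are hypothetical, and the reverse strand, off-target scoring, and other guide design criteria are deliberately ignored.

```python
import re

# Assumption (not from the source text): SpCas9 recognizes a 20-nt protospacer
# followed by an NGG PAM. Only the forward strand is scanned in this sketch.
SPACER_LENGTH = 20


def find_candidate_spacers(sequence):
    """Return (start, spacer, pam) tuples for forward-strand NGG sites."""
    sequence = sequence.upper()
    # The lookahead lets overlapping candidate sites be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % SPACER_LENGTH)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical toy sequence: 4 nt of padding, a 20-nt spacer, a TGG PAM, padding.
    toy = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
    for start, spacer, pam in find_candidate_spacers(toy):
        print(start, spacer, pam)
```

Changing the spacer reported by such a scan is what retargets the Cas protein; a real design workflow would additionally screen both strands and the whole genome for unintended matches.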
The term “high molecular weight DNA” or “HMW DNA fragment” as described herein refers to a DNA molecule of at least 10kb in length. For example, in certain embodiments, the HMW DNA fragment may have a length of at least about 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, 210kb, 220kb, 230kb, 240kb, 250kb, 260kb, 270kb, 280kb, 290kb, 300kb, 310kb, 320kb, 330kb, 340kb, 350kb, 360kb, 370kb, 380kb, 390kb, 400kb, 410kb, 420kb, 430kb, 440kb, 450kb, 460kb, 470kb, 480kb, 490kb, 500kb, or longer. In certain embodiments, the HMW DNA fragment has a length of at least about 15kb. In certain embodiments, the HMW DNA fragment has a length of at least about 20kb. In certain embodiments, the HMW DNA fragment has a length of at least about 50kb. In certain embodiments, the HMW DNA fragment has a length of at least about 75kb. In certain embodiments, the HMW DNA fragment has a length of at least about 100kb. In certain embodiments, the HMW DNA fragment has a length of at least about 150kb. In certain embodiments, the HMW DNA fragment has a length of at least about 200kb. In certain embodiments, the HMW DNA fragment has a length of at least about 250kb. In certain embodiments, the HMW DNA fragment has a length of at least about 300kb. In certain embodiments, the HMW DNA fragment has a length of at least about 350kb. In certain embodiments, the HMW DNA fragment has a length of at least about 400kb. In certain embodiments, the HMW DNA fragment has a length of about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above. In certain embodiments, the HMW DNA fragment has a length of about 15kb to 490kb, 25kb to 430kb, 35kb to 410kb, 45kb to 390kb, or 55kb to 360kb, as described above.
In certain embodiments, one or more targeted nucleases (e.g., a single targeted nuclease) may be used for excision and release of a DNA fragment of interest from one end of a genomic DNA (e.g., a linear genomic DNA), wherein the DNA fragment comprises the editing site, genome-editing induced DNA variant(s), and the end of the genomic DNA. In certain embodiments, the targeted nuclease cuts the linear genomic DNA at least about 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, 210kb, 220kb, 230kb, 240kb, 250kb, 260kb, 270kb, 280kb, 290kb, 300kb, 310kb, 320kb, 330kb, 340kb, 350kb, 360kb, 370kb, 380kb, 390kb, 400kb, 410kb, 420kb, 430kb, 440kb, 450kb, 460kb, 470kb, 480kb, 490kb, 500kb, or further, away from the end (5' end or 3' end) of the linear genomic DNA. In certain embodiments, one or more targeted nucleases (e.g., a single targeted nuclease) cuts the linear genomic DNA at about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above, away from the end of the linear genomic DNA. For example, the targeted nuclease may cut the linear genomic DNA at least about 10kb (e.g., about 100kb or 200kb) downstream of the 5' end of the linear genomic DNA and release a HMW DNA fragment comprising the editing site, genome editing induced DNA variant(s), and the 5' end of the genomic DNA. Alternatively, the targeted nuclease may cut the linear genomic DNA at least about 10kb (e.g., about 100kb or 200kb) upstream of the 3' end of the linear genomic DNA and release a HMW DNA fragment comprising the editing site, genome editing induced DNA variant(s), and the 3' end of the genomic DNA. In certain embodiments, the released HMW DNA fragment of interest further comprises a telomere region.
As used herein, a telomere region is the end of a linear chromosome and a region of repetitive nucleotide sequences that can be recognized by specialized protein(s) including telomerase. In cases wherein the linear genomic DNA comprises a telomere at the genomic DNA terminus, the distance(s) described above are measured from the cutting location to the first non-telomere nucleotide that abuts the telomere region. For example, the targeted nuclease may cut the linear genomic DNA at about 10kb to 500kb, 20kb to 450kb, 30kb to 400kb, 40kb to 350kb, or 50kb to 300kb, as described above, away from the first non-telomere nucleotide that abuts the telomere region. As one non-limiting example, the targeted nuclease may cut the linear genomic DNA at 200kb away from the first non-telomere nucleotide that abuts a telomere region of 8kb in length, releasing a DNA fragment of about 208kb in length.
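The arithmetic of the single-nuclease case can be sketched in Python as follows. This is a minimal illustration only: the function name and its parameters are hypothetical, and lengths are treated as approximate integers in base pairs, as in the "about 208kb" example.

```python
def released_end_fragment_length(cut_distance_bp, telomere_length_bp=0):
    """Approximate length of the fragment released by one cut near a DNA end.

    cut_distance_bp: distance from the cut to the first non-telomere
        nucleotide abutting the telomere (or to the DNA end if no telomere).
    telomere_length_bp: length of the telomere region carried along, if any.
    """
    if cut_distance_bp < 0 or telomere_length_bp < 0:
        raise ValueError("lengths must be non-negative")
    return cut_distance_bp + telomere_length_bp


if __name__ == "__main__":
    # Example from the text: a cut 200kb from the telomere boundary plus an 8kb telomere.
    print(released_end_fragment_length(200_000, 8_000))
```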
In certain embodiments, for a pair of targeted nucleases capable of excising a DNA fragment of interest, one targeted nuclease of the pair (downstream nuclease) cuts the genomic DNA at least about 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, or further, downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 5kb downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 10kb downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 50kb downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 100kb downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 150kb downstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at least about 200kb downstream of the editing site. The distance of the downstream cutting location relative to the editing site is measured from the downstream cutting location to the first neighboring nucleotide downstream of the 3' end of the editing site.
In certain embodiments, for a pair of targeted nucleases capable of excising a DNA fragment of interest, one targeted nuclease of the pair (upstream nuclease) cuts the genomic DNA at least about 100bp, 200bp, 300bp, 400bp, 500bp, 600bp, 700bp, 800bp, 900bp, 1kb, 2kb, 3kb, 4kb, 5kb, 6kb, 7kb, 8kb, 9kb, 10kb, 15kb, 20kb, 25kb, 30kb, 35kb, 40kb, 45kb, 50kb, 55kb, 60kb, 65kb, 70kb, 75kb, 80kb, 85kb, 90kb, 95kb, 100kb, 110kb, 120kb, 130kb, 140kb, 150kb, 160kb, 170kb, 180kb, 190kb, 200kb, or further, upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 5kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 10kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 50kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 100kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 150kb upstream of the editing site. In certain embodiments, the upstream nuclease cuts the genomic DNA at least about 200kb upstream of the editing site. The distance of the upstream cutting location relative to the editing site is measured from the upstream cutting location to the first neighboring nucleotide upstream of the 5' end of the editing site. The distances of the upstream and downstream cutting locations relative to the editing site may be approximately symmetric or asymmetric. In certain embodiments, the pair of targeted nucleases cut symmetrically or asymmetrically at any combination of a downstream cutting distance described herein and an upstream cutting distance described herein.
For example, in certain embodiments, one targeted nuclease of the pair (downstream nuclease) cuts the genomic DNA at about 8kb downstream of the editing site, and the other targeted nuclease of the pair (upstream nuclease) cuts the genomic DNA at about 8kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 50kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 50kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 60kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 60kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 100kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 100kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 180kb downstream of the editing site, and the upstream nuclease cuts the genomic DNA at about 180kb upstream of the editing site.
In certain embodiments, one targeted nuclease of the pair (downstream nuclease) cuts the genomic DNA at about 10kb downstream of the editing site and the other targeted nuclease of the pair (upstream nuclease) cuts the genomic DNA at about 18kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 72kb downstream of the editing site and the upstream nuclease cuts the genomic DNA at about 66kb upstream of the editing site. In certain embodiments, the downstream nuclease cuts the genomic DNA at about 120kb downstream of the editing site and the upstream nuclease cuts the genomic DNA at about 160kb upstream of the editing site.
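For the paired-nuclease case, the approximate length of the excised fragment follows from the two cutting distances and the length of the editing site. The Python sketch below is illustrative only: the function names are hypothetical, the 10kb threshold reflects the definition of an HMW DNA fragment given herein, and exact cut-position conventions (e.g., where within a recognition site a given nuclease cleaves) are ignored.

```python
HMW_THRESHOLD_BP = 10_000  # "at least 10kb" per the HMW definition herein


def excised_fragment_length(upstream_bp, downstream_bp, editing_site_bp=1):
    """Approximate length of the fragment excised by a pair of nucleases.

    The fragment spans the upstream flank, the editing site, and the
    downstream flank; distances are measured from the editing site edges.
    """
    if min(upstream_bp, downstream_bp, editing_site_bp) < 0:
        raise ValueError("lengths must be non-negative")
    return upstream_bp + editing_site_bp + downstream_bp


def is_hmw(length_bp):
    """True if the fragment meets the 10kb HMW definition."""
    return length_bp >= HMW_THRESHOLD_BP


if __name__ == "__main__":
    # Symmetric (8kb / 8kb) and asymmetric (18kb / 10kb) examples from the text,
    # assuming a single-nucleotide editing site.
    for up, down in [(8_000, 8_000), (18_000, 10_000)]:
        length = excised_fragment_length(up, down)
        print(up, down, length, is_hmw(length))
```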
In certain embodiments, a pair of targeted nucleases are CRISPR-Cas9 nucleases. Hence, the upstream and downstream cutting locations are targeted by specifically designed gRNA sequences.
In certain embodiments, the methods described herein do not involve amplifying a DNA fragment surrounding the editing site using a pair of PCR primers. In contrast, the HMW DNA is cut and released from the edited genomic DNA using one or more targeted nucleases, for example, a single targeted nuclease or a pair of targeted nucleases as described herein. Thus, the methods described herein may produce a HMW DNA fragment from a genome edited sample in a faithful and unbiased manner. However, once the HMW DNA fragment has been characterized using methods described herein, the design of appropriate PCR primers could be better informed, and PCR may then be performed as a secondary, confirmatory test for certain location(s) or indel(s) discovered by the methods described herein.
Methods described herein can be used to characterize a genome edited sample in an unbiased and comprehensive manner to generate a full spectrum documentation on DNA variant(s) induced by a genome editing protocol. In certain embodiments, the sample comprises an edited DNA, or an edited cell comprising an edited DNA. In certain embodiments, the sample comprises a cell. In certain embodiments, the method described herein comprises lysing the sample (e.g., a cell) to release the edited genomic DNA (e.g., prior to contacting the DNA with one or more targeted nucleases, such as a single targeted nuclease or a pair of targeted nucleases as described herein).
Any cell lysis method can be used so long as the integrity of the genomic DNA is not compromised during lysis process. In certain embodiments, cell can be lysed by chemical or biochemical methods. In certain embodiments, lysing comprises contacting the sample cell with hypotonic solution, enzyme (e.g., lysozyme or proteinase), and/or cell membrane disrupting agent such as detergent (e.g., SDS). In certain embodiments, cell can be lysed by physical or mechanical methods, including but not limited to, sonication, freeze-thawing, or other shearing methods.
In certain embodiments, the sample comprises a prokaryotic cell, or a eukaryotic cell. In certain embodiments, the sample comprises a bacterial cell, yeast cell, insect cell, plant cell, or mammalian cell. In certain embodiments, the sample comprises an E. coli cell. In certain embodiments, the sample comprises an animal cell. In certain embodiments, the sample comprises a mouse cell, a rat cell, a hamster cell, a cow cell, a pig cell, a horse cell, a dog cell, a cat cell, a fish cell, a goat cell, a camelids cell, a sheep cell, or a chicken cell. In certain embodiments, the sample comprises a zebra fish cell. In certain embodiments, the sample comprises a human cell. In certain embodiments, the sample comprises a human stem cell. In certain embodiments, the sample comprises a human somatic cell (e.g., muscle cell, or neuron).
Once a population of cells is subject to a genome editing protocol, an aliquot of cells is taken from such population to produce a sample for use in the methods described herein, while the rest of the population is reserved for future application or disposal. In certain embodiments, the edited cells are of prophylactic and/or therapeutic use. In certain embodiments, the sample comprises an edited cell that is suitable for being administered into an animal (if the edited cell harbors the desired edits at the editing site and is free of harmful or dangerous DNA variant induced by the genome editing protocol). Thus, the invention methods described herein provide comprehensive, robust quality control and assurance processes for subsequent applications of edited DNA or cells. In certain embodiments, the methods described herein comprises comparing the sequence of the DNA fragment such as HMW DNA fragment (after sequencing the HMW DNA fragment) to one or more reference sequences (e.g., the original sequence of the sample before genome-editing, and/or a control sequence having wildtype sequence of a gene). For example, the comparison may be conducted using suitable alignment or multiple alignment bioinformatic workflow.
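As a toy illustration of comparing a sequenced fragment to a reference, the Python sketch below locates a single difference by trimming the longest common prefix and suffix. This is not an alignment workflow of the kind referred to above; the function name is hypothetical, the approach assumes at most one contiguous difference, and in repetitive sequence the reported position is only one of several equivalent placements. Real HMW fragment data would call for a dedicated aligner.

```python
def locate_single_variant(reference, observed):
    """Return (position, ref_allele, alt_allele) for one contiguous difference.

    Returns None when the two sequences are identical. Position is 0-based
    and refers to the reference sequence.
    """
    limit = min(len(reference), len(observed))

    # Length of the shared prefix.
    prefix = 0
    while prefix < limit and reference[prefix] == observed[prefix]:
        prefix += 1

    # Length of the shared suffix, not allowed to overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[len(reference) - 1 - suffix]
           == observed[len(observed) - 1 - suffix]):
        suffix += 1

    ref_allele = reference[prefix:len(reference) - suffix]
    alt_allele = observed[prefix:len(observed) - suffix]
    if not ref_allele and not alt_allele:
        return None
    return prefix, ref_allele, alt_allele


if __name__ == "__main__":
    # Hypothetical sequences: a 2-nt insertion and a 3-nt deletion.
    print(locate_single_variant("ACGTACGTAC", "ACGTGGACGTAC"))
    print(locate_single_variant("AAACCCGGGTTT", "AAAGGGTTT"))
```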
In certain embodiments, the methods described herein comprises determining the nature of DNA variant(s) comprised within the DNA fragment (e.g., HMW DNA fragment). For example, in certain embodiments, a DNA variant(s) is determined to be indel (e.g., insertion or deletion). In certain embodiments, a DNA variant(s) is determined to be point mutation. In certain embodiments, a DNA variant(s) is determined to lead to a missense substitution that results in replacement of one amino acid into another. In certain embodiments, a DNA variant(s) is determined to lead to a nonsense substitution that results in a premature stop codon and shortened protein. In certain embodiments, a DNA variant(s) is determined to lead to frameshift. In certain embodiments, a DNA variant(s) is determined to be part of a rearrangement, or translocation event (e.g., as a result of chromothripsis). In certain embodiments, a DNA variant(s) is determined to be part of a duplication, or inversion event. A duplication occurs when a stretch of one or more nucleotides in a gene is copied and repeated (e.g., next to the original DNA sequence). An inversion changes more than one nucleotide in a gene by replacing the original sequence with the same sequence in reverse order. On the other hand, some regions of DNA contain short sequences of nucleotides (e.g., trinucleotide or tetranucleotide) that are repeated a number of times in a row. In certain embodiments, a DNA variant(s) is determined to be part of a repeat expansion event that increases the number of times that a short DNA sequence (e.g., trinucleotide or tetranucleotide) is repeated. 
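A few of the variant categories above can be distinguished from the reference and altered alleles alone, as the following Python sketch suggests. It is a simplified illustration with hypothetical names: it covers only point mutations, multi-nucleotide substitutions, insertions, and deletions, and its frameshift flag is meaningful only if the variant lies within a protein-coding sequence, which the function does not check. Missense versus nonsense calls, rearrangements, inversions, and repeat expansions would need additional context.

```python
def classify_variant(ref_allele, alt_allele):
    """Return (kind, size_change_bp, frameshift) for a simple variant.

    size_change_bp is len(alt) - len(ref); frameshift is True when the
    length change is not a multiple of three (coding context assumed).
    """
    size_change = len(alt_allele) - len(ref_allele)
    if ref_allele == alt_allele:
        kind = "no change"
    elif size_change > 0:
        kind = "insertion"
    elif size_change < 0:
        kind = "deletion"
    elif len(ref_allele) == 1:
        kind = "point mutation"
    else:
        kind = "substitution"
    frameshift = size_change % 3 != 0
    return kind, size_change, frameshift


if __name__ == "__main__":
    # Hypothetical alleles for illustration.
    for ref, alt in [("A", "G"), ("", "GG"), ("CCC", "")]:
        print(repr(ref), repr(alt), classify_variant(ref, alt))
```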
Thus, depending on the target or purpose of a genome-editing protocol, once the nature of the DNA variant(s) comprised within the DNA fragment (e.g., HMW DNA fragment) is determined, it is evident to people skilled in the art whether a population of cells subject to the same genome editing protocol as the sample cells tested in methods described herein should only be reserved for future study or discarded, or is suitable for subsequent application because the HMW DNA is generally free of detrimental DNA variant(s).
Accordingly, certain embodiments of the invention provide methods of treatment or a method of medical therapy for a disease. In certain embodiments, the methods described herein further comprise administering into an animal a population of cells, wherein the administered population of cells and the sample cell were previously edited in the same genome editing protocol (e.g., administered cells, and the sample cells used for quality control were edited in the same genome-editing batch/process).
In certain embodiments, the sample comprises a stem cell. In certain embodiments, the sample comprises a hematopoietic stem cell. In certain embodiments, the sample comprises an induced pluripotent stem cell (iPSC). In certain embodiments, the sample comprises a patient derived iPSC. In certain embodiments, the sample comprises a pluripotent cell. In certain embodiments, the sample comprises a progenitor cell. In certain embodiments, the sample comprises a blood cell. In certain embodiments, the sample comprises an immune cell. In certain embodiments, the sample comprises a T cell (e.g., CAR-T cell). In certain embodiments, the sample comprises a dendritic cell. In certain embodiments, the sample comprises a Natural Killer cell. In certain embodiments, the sample comprises a B cell. In certain embodiments, the sample comprises a cancer cell.
In certain embodiments, the disease is a hereditary disease. In certain embodiments, the disease is a blood disorder. In certain embodiments, the disease is sickle cell disease. In certain embodiments, the disease is thalassemia (e.g., beta thalassemia). In certain embodiments, the disease is cancer. In certain embodiments, the disease is an immune disorder. In certain embodiments, the disease is a neuronal disorder (e.g., fronto-temporal dementia). In certain embodiments, the disease is a muscular disorder (e.g., muscular dystrophy).
In certain embodiments, the edited cells may be of biomanufacturing use. In certain embodiments, the cell is a human embryonic kidney (HEK) 293 cell. In certain embodiments, the cell is a 293F cell. In certain embodiments, the cell is a 293T cell. In certain embodiments, the cell is a human embryonic retinal (PER.C6) cell. In certain embodiments, the cell is a HT-1080 cell. In certain embodiments, the cell is a Huh-7 cell. In certain embodiments, the cell is a Monkey kidney epithelial (Vero) cell. In certain embodiments, the cell is a Chinese Hamster Ovary (CHO) cell. In certain embodiments, the cell is a baby hamster kidney (BHK) cell. In certain embodiments, the sample comprises a hybridoma cell.
Sample preparation, Generation of HMW DNA fragment, and DNA Isolation Methods
In certain embodiments, methods described herein comprises electrophoresing.
In certain embodiments, a sample (e.g., cell) is introduced into a loading compartment of a device or a gel that is suitable for size selection process (e.g., electrophoresis). In certain embodiments, the loading compartment comprises a solution. In certain embodiments, sample cells are pipetted into the loading compartment of the device or gel. In certain embodiments, the sample is lysed in situ within the loading compartment, releasing the edited genomic DNA from sample cell into the loading compartment. However, in certain embodiments, sample cells are not lysed in situ within the loading compartment of device or gel. It is understood by person skilled in the art that in certain embodiments, such sample preparations are conducted in a suitable container (e.g., a tube) and then introduced into the loading compartment.
Alternatively, in certain embodiments, sample cells are encapsulated within a gel matrix. In certain embodiments, the sample is lysed in situ within the gel matrix, releasing the edited genomic DNA from sample cell into the gel matrix.
To optionally clarify the content in the loading compartment or gel matrix, in certain embodiments, the method further comprises one or more pretreatment step to digest and/or elute lipid, protein, RNA, cellular metabolites, etc. For example, an initial electrophoresis step is conducted to elute smaller cellular content released from sample cell, while ultra-large genomic DNA are unable to migrate through the gel under electrophoretic field.
With or without the optional clarification step(s), in certain embodiments, the genomic DNA is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) as described herein and incubated for a period of time (e.g., about 15-45 minutes) to generate the DNA fragment (e.g., HMW DNA fragment) comprising the editing site and DNA variant(s).
In certain embodiments, the genomic DNA released from an edited sample (e.g., an edited cell) is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) in a liquid solution (e.g., in liquid phase within a container, or within loading compartment of device or gel).
In certain embodiments, the genomic DNA from an edited sample is contacted with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases) in a gel matrix (e.g., agarose gel).
In certain embodiments, the genomic DNA from an edited sample is contacted with the pair of downstream and upstream targeted nucleases simultaneously or sequentially. In certain embodiments, the genomic DNA is contacted with one or the pair of targeted nuclease(s) for about 5 minutes to 6 hours, 10 minutes to 3 hours, 15 minutes to 2 hours, 20 minutes to 1.5 hours, 30 minutes to 1 hour, or 40 minutes to 50 minutes. In certain embodiments, the genomic DNA from an edited sample is contacted with one or the pair of targeted nuclease(s) for at least about 5 minutes, 10 minutes, 15 minutes, 20 minutes, 25 minutes, 30 minutes, 40 minutes, 50 minutes, 1 hour, 2 hours or 3 hours.
In certain embodiments, the targeted nucleases are optionally inactivated to stop further enzymatic activities.
It is apparent to people skilled in the art that there are a variety of ways to purify genomic DNA from a genome-edited sample cell and then contact the DNA with one or more targeted nucleases (e.g., one targeted nuclease or a pair of targeted nucleases). For example, in certain embodiments, genomic DNA is purified using any suitable technique and then contacted with one or more targeted nucleases for incubation within a tube or any suitable container. The resultant genomic DNA mixture, including the released DNA fragment (e.g., HMW DNA fragment), is then transferred into the loading compartment of a device or gel for separation (e.g., via electrophoresis).
In certain embodiments, the method comprises contacting the resultant genomic DNA mixture with a detergent (e.g., SDS). Without wanting to be bound by theory, this step may improve electrophoresis efficiency, separate certain DNA binding proteins from genomic DNA, and/or change the charge level of the genomic DNA or fragment.
In certain embodiments, the method comprises an isolating step that isolates the DNA fragment (e.g., HMW DNA fragment) from the genomic DNA mixture. Any DNA isolation technology that isolates, purifies, or separates DNA fragment including high molecular weight (HMW) DNA fragment may be used for methods described herein. In certain embodiments, the DNA isolation technology involves separating DNA molecules or fragments based on size.
In certain embodiments, the DNA isolating step comprises electrophoresing. In certain embodiments, the DNA isolating step comprises one dimensional electrophoresing.
In certain embodiments, the DNA isolating step comprises electrophoresing the DNA fragment, such as HMW DNA fragment (e.g., for a first period of time in a first direction). In certain embodiments, the DNA isolating step further comprises electrophoresing the DNA fragment, such as HMW DNA fragment, for a second period of time in a second direction. Accordingly, in certain embodiments, the DNA isolating step comprises two-dimensional electrophoresing.
In certain embodiments, the isolating step is conducted in a device suitable for one dimensional, or two-dimensional electrophoresis. For non-limiting examples, in certain embodiments, the isolating step may be conducted in a SageHLS™ device/protocol as disclosed in U.S. Patent Application 2020/0041449, which is incorporated by reference herein for all purposes. In certain embodiments using a SageHLS™ device, HMW DNA fragment is electrophoresed for a first period of time in one direction for separation by size and then electrophoresed for a second period of time in another direction (e.g., a perpendicular direction) for elution from the gel and then isolated into a collection chamber.
However, it is also apparent to people skilled in the art, that there are a variety of ways to conduct the isolating step in any electrophoresis gel/device or protocol that is suitable for isolating HMW DNA fragment by size. In certain embodiments, a DNA ladder is used to help locate the HMW DNA fragment. At the end of the isolating step, in certain embodiments, the HMW DNA fragment is retrieved by cutting the gel cube containing the HMW DNA fragment, followed by dissolving the gel to release the HMW DNA fragment. In certain embodiments, the HMW DNA fragment is eluted from the gel for collection.
Deep Sequencing Methods
In certain embodiments, isolated DNA fragments (e.g., HMW DNA fragments) are sequenced to provide a sequence result readout. Any DNA sequencing technology that can provide a sequence result over a high molecular weight (HMW) DNA fragment may be used for methods described herein.
Seminal sequencing technology such as Sanger sequencing has a read-length limit of about 500bp to 800bp. Modern sequencing technologies have since overcome the read-length limit of early generation technology and continue to evolve and improve. Distinct from Sanger sequencing, Next Generation Sequencing (NGS) methods are suitable for methods described herein. People skilled in the art are familiar with a variety of modern sequencing technologies and platforms capable of sequencing HMW DNA fragments. Specific terms described herein are merely non-limiting exemplary sequencing technologies or terms suitable for methods described herein.
In certain embodiments, the sequencing method is a high-throughput sequencing method, for example, a massive parallel signature sequencing (MPSS) method. In certain embodiments, the sequencing method is a deep sequencing method. In certain embodiments, the sequencing method is a shotgun sequencing method. In certain embodiments, the sequencing method is a short-read sequencing method. In certain embodiments, the sequencing method is a pyrosequencing method.
In certain embodiments, the sequencing method is a long-read sequencing method. In certain embodiments, the sequencing method is a Nanopore DNA sequencing method. In certain embodiments, the sequencing method is a single molecule real time (SMRT) sequencing method.
Certain Definitions
The term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form, composed of monomers (nucleotides) containing a sugar, phosphate and a base which is either a purine or pyrimidine. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified position thereof (e.g., degenerate codon substitutions) and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucl. Acids Res., 19:508 (1991); Ohtsuka et al., JBC, 260:2605 (1985); Rossolini et al., Mol. Cell. Probes, 8:91 (1994)). A "nucleic acid fragment" is a fraction of a given nucleic acid molecule. Deoxyribonucleic acid (DNA) in the majority of organisms is the genetic material while ribonucleic acid (RNA) is involved in the transfer of information contained within DNA into proteins. The term "nucleotide sequence" refers to a polymer of DNA or RNA that can be single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases capable of incorporation into DNA or RNA polymers. The terms "nucleic acid," "nucleic acid molecule," "nucleic acid fragment," "nucleic acid sequence or segment," or "polynucleotide" may also be used interchangeably with gene, cDNA, DNA and RNA encoded by a gene.
By “portion” or “fragment,” as it relates to a nucleic acid molecule, sequence or segment of the invention, when it is linked to other sequences for expression, is meant a sequence having at least 80 nucleotides, more specifically at least 150 nucleotides, and still more specifically at least 400 nucleotides. If not employed for expressing, a “portion” or “fragment” means at least 9, specifically 12, more specifically 15, even more specifically at least 20, consecutive nucleotides, e.g., probes and primers (oligonucleotides), corresponding to the nucleotide sequence of the nucleic acid molecules of the invention.
The invention encompasses isolated or substantially purified nucleic acid or protein compositions. In the context of the present invention, an "isolated" or "purified" DNA molecule or an "isolated" or "purified" polypeptide is a DNA molecule or polypeptide that exists apart from its native environment and is therefore not a product of nature. An isolated DNA molecule or polypeptide may exist in a purified form or may exist in a non-native environment such as, for example, outside a host cell. For example, an "isolated" or "purified" nucleic acid molecule or protein, or biologically active portion thereof, is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. In one embodiment, an "isolated" nucleic acid is free of sequences that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated nucleic acid molecule can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences that naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. A protein that is substantially free of cellular material includes preparations of protein or polypeptide having less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating protein. When the protein of the invention, or biologically active portion thereof, is recombinantly produced, culture medium may represent less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or non-protein-of-interest chemicals. Fragments and variants of the disclosed nucleotide sequences and proteins or partial-length proteins encoded thereby are also encompassed by the present invention.
By "fragment" or "portion" is meant a full length or less than full length of the nucleotide sequence encoding, or the amino acid sequence of, a polypeptide or protein.
"Naturally occurring" is used to describe an object that can be found in nature as distinct from being artificially produced. For example, a protein or nucleotide sequence present in an organism (including a virus), which can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory, is naturally occurring.
“Recombinant DNA molecule” is a combination of DNA sequences that are joined together using recombinant DNA technology and procedures used to join together DNA sequences as described, for example, in Sambrook and Russell, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press (3rd edition, 2001).
The terms "heterologous DNA sequence," "exogenous DNA segment" or "heterologous nucleic acid," each refer to a sequence that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified. The terms also include non-naturally occurring multiple copies of a naturally occurring DNA sequence. Thus, the terms refer to a DNA segment that is foreign or heterologous to the cell, or homologous to the cell but in a position within the host cell nucleic acid in which the element is not ordinarily found. Exogenous DNA segments are expressed to yield exogenous polypeptides.
A "homologous" DNA sequence is a DNA sequence that is naturally associated with a host cell into which it is introduced.
"Wild-type" refers to the normal gene, or organism found in nature without any known mutation.
“Genome” refers to the complete genetic material of an organism.
A "vector" is defined to include, inter alia, any plasmid, cosmid, phage or binary vector in double or single stranded linear or circular form which may or may not be self-transmissible or mobilizable, and which can transform a prokaryotic or eukaryotic host either by integration into the cellular genome or by existing extrachromosomally (e.g., an autonomously replicating plasmid with an origin of replication).
"Regulatory sequences" and "suitable regulatory sequences" each refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. As is noted above, the term "suitable regulatory sequences" is not limited to promoters. However, some suitable regulatory sequences useful in the present invention will include, but are not limited to constitutive promoters, tissue-specific promoters, development-specific promoters, inducible promoters and viral promoters.
"5' non-coding sequence" refers to a nucleotide sequence located 5' (upstream) to the coding sequence. It is present in the fully processed mRNA upstream of the initiation codon and may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency (Turner et al., Mol. Biotech., 3:225 (1995)).
"3' non-coding sequence" refers to nucleotide sequences located 3' (downstream) to a coding sequence and include polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
"Promoter" refers to a nucleotide sequence, usually upstream (5') to its coding sequence, which controls the expression of the coding sequence by providing the recognition for RNA polymerase and other factors required for proper transcription. "Promoter" includes a minimal promoter that is a short DNA sequence comprised of a TATA-box and other sequences that serve to specify the site of transcription initiation, to which regulatory elements are added for control of expression. "Promoter" also refers to a nucleotide sequence that includes a minimal promoter plus regulatory elements that is capable of controlling the expression of a coding sequence or functional RNA. This type of promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an "enhancer" is a DNA sequence that can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even be comprised of synthetic DNA segments. A promoter may also contain DNA sequences that are involved in the binding of protein factors that control the effectiveness of transcription initiation in response to physiological or developmental conditions.
The "initiation site" is the position surrounding the first nucleotide that is part of the transcribed sequence, which is also defined as position +1. With respect to this site all other sequences of the gene and its controlling regions are numbered. Downstream sequences (i.e. further protein encoding sequences in the 3' direction) are denominated positive, while upstream sequences (mostly of the controlling regions in the 5' direction) are denominated negative.
Promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation are referred to as "minimal or core promoters." In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription. A “minimal or core promoter” thus consists only of all basal elements needed for transcription initiation, e.g., a TATA box and/or an initiator.
The following terms are used to describe the sequence relationships between two or more sequences (e.g., nucleic acids, polynucleotides or polypeptides): (a) "reference sequence," (b) "comparison window," (c) "sequence identity," (d) "percentage of sequence identity," and (e) "substantial identity."
(a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full length cDNA, gene sequence or peptide sequence, or the complete cDNA, gene sequence or peptide sequence.
(b) As used herein, "comparison window" makes reference to a contiguous and specified segment of a sequence, wherein the sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the sequence a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well known in the art. Thus, the determination of percent identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller, CABIOS, 4: 11 (1988); the local homology algorithm of Smith et al., Adv. Appl. Math., 2:482 (1981); the homology alignment algorithm of Needleman and Wunsch, JMB, 48:443 (1970); the search-for-similarity-method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444 (1988); the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 87:2264 (1990), modified as in Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873 (1993).
Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to, CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, California); the ALIGN program (Version 2.0) and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin, USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al., Gene, 73:237 (1988); Higgins et al., CABIOS, 5:151 (1989); Corpet et al., Nucl. Acids Res., 16:10881 (1988); Huang et al., CABIOS, 8:155 (1992); and Pearson et al., Meth. Mol. Biol., 24:307 (1994). The ALIGN program is based on the algorithm of Myers and Miller, supra. The BLAST programs of Altschul et al., JMB, 215:403 (1990); Nucl. Acids Res., 25:3389 (1997), are based on the algorithm of Karlin and Altschul supra.
Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (available on the world wide web at ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction are halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached.
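The extension step described above can be illustrated with a minimal sketch. It is not the NCBI implementation: it extends an exact word hit in one direction only and without gaps, and the function name, its arguments, and the X value used in the usage note are assumptions chosen for illustration. The reward M=5 and penalty N=-4 follow the BLASTN defaults recited in this specification.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=20):
    """Ungapped rightward extension of an exact word hit (illustrative only).

    Returns (best_score, best_length), where best_length counts aligned
    positions from the start of the seed word to the best-scoring endpoint.
    """
    # The seed word is assumed to match exactly, so it contributes
    # word_len * match to the cumulative score.
    score = best = word_len * match
    best_len = word_len
    i = q_start + word_len
    j = s_start + word_len
    # Stop when either sequence ends, the score reaches zero or below,
    # or the score falls off by x_drop from its maximum achieved value.
    while i < len(query) and j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best = score
            best_len = i - q_start + 1
        if score <= 0 or best - score >= x_drop:
            break
        i += 1
        j += 1
    return best, best_len
```

For example, a four-base seed at position 0 of two sequences sharing their first ten bases, extended with an assumed X of 10, reports the ten-base match as the best-scoring segment and halts after three consecutive mismatches.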
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more specifically less than about 0.01, and most specifically less than about 0.001.
To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al., Nucleic Acids Res. 25:3389 (1997). Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al., supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix. See the world wide web at ncbi.nlm.nih.gov. Alignment may also be performed manually by visual inspection.
For purposes of the present invention, comparison of sequences for determination of percent sequence identity to another sequence may be made using the BlastN program (version 1.4.7 or later) with its default parameters or any equivalent program. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by the preferred program.
(c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences makes reference to a specified percentage of residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window, as measured by sequence comparison algorithms or by visual inspection. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have "sequence similarity" or "similarity." Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California).
(d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity.
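The calculation in (d) can be sketched as follows, assuming the two sequences have already been optimally aligned and padded to equal length with a gap character. The function name and the "-" gap symbol are illustrative assumptions, not part of any alignment program named above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    Both inputs are rows of a pairwise alignment of equal length;
    gap positions count toward the window length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Number of positions at which the identical residue occurs in both rows.
    matched = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    # Divide by the total number of positions in the window, times 100.
    return 100.0 * matched / len(aligned_a)
```

For a nine-position window with one gap and one mismatch, such as "ACGT-ACGT" aligned to "ACGTTACCT", seven positions match and the result is 7/9 × 100, or about 77.8%.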
(e)(i) The term "substantial identity" of sequences means that a polynucleotide comprises a sequence that has at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, or 79%, at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, at least 90%, 91%, 92%, 93%, or 94%, and at least 95%, 96%, 97%, 98%, or 99% sequence identity, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill in the art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning, and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 70%, at least 80%, 90%, at least 95%.
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions (see below). Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions encompass temperatures in the range of about 1°C to about 20°C, depending upon the desired degree of stringency as otherwise qualified herein. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is when the polypeptide encoded by the first nucleic acid is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.
(e)(ii) The term "substantial identity" in the context of a peptide indicates that a peptide comprises a sequence with at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, or 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, at least 90%, 91%, 92%, 93%, or 94%, or 95%, 96%, 97%, 98% or 99%, sequence identity to the reference sequence over a specified comparison window. Optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution.
For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
As noted above, another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA. “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.
"Stringent hybridization conditions" and "stringent hybridization wash conditions" in the context of nucleic acid hybridization experiments such as Southern and Northern hybridizations are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. The thermal melting point (Tm) is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267 (1984): Tm = 81.5°C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. Tm is reduced by about 1°C for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10°C. Generally, stringent conditions are selected to be about 5°C lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4°C lower than the Tm; moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C lower than the Tm; low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the Tm.
Using the equation, hybridization and wash compositions, and desired temperature, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a temperature of less than 45°C (aqueous solution) or 32°C (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology Hybridization with Nucleic Acid Probes, part I chapter 2 "Overview of principles of hybridization and the strategy of nucleic acid probe assays" Elsevier, New York (1993). Generally, highly stringent hybridization and wash conditions are selected to be about 5°C lower than the Tm for the specific sequence at a defined ionic strength and pH.
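As a worked illustration of the Meinkoth and Wahl approximation recited above, the following sketch evaluates Tm = 81.5°C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L and then lowers it by about 1°C per 1% of mismatching. The function names are hypothetical, and the logarithm is taken as base 10.

```python
import math

def tm_meinkoth_wahl(molarity, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    molarity: monovalent cation concentration (M)
    gc_percent: percentage of G and C nucleotides (0-100)
    formamide_percent: percentage of formamide in the solution (0-100)
    length_bp: length of the hybrid in base pairs
    """
    return (81.5
            + 16.6 * math.log10(molarity)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)

def tm_for_identity(tm, identity_percent):
    """Lower Tm by about 1 degree C for each 1% of mismatching tolerated."""
    return tm - (100.0 - identity_percent)
```

For instance, a 500 bp hybrid with 50% GC in 1 M monovalent cation and no formamide gives 81.5 + 0 + 20.5 - 0 - 1.0 = 101.0°C, and seeking sequences with 90% identity lowers this to about 91°C.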
An example of highly stringent wash conditions is 0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a 0.2X SSC wash at 65°C for 15 minutes (see, Sambrook, infra, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1X SSC at 45°C for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6X SSC at 40°C for 15 minutes. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1.5 M, more specifically about 0.01 to 1.0 M, Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30°C for short probes and at least about 60°C for long probes (e.g., >50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. In general, a signal to noise ratio of 2X (or higher) than that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or Northern blot is 50% formamide, e.g., hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60 to 65°C. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37°C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C.
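Because the wash conditions above are stated as multiples of SSC, it can help to convert them to molar concentrations using the stated definition 20X SSC = 3.0 M NaCl/0.3 M trisodium citrate. The sketch below is illustrative only; the function name is hypothetical, and the total sodium ion figure assumes complete dissociation, with three sodium ions per trisodium citrate.

```python
def ssc_molarity(fold):
    """Convert an SSC strength (e.g., 0.1 for 0.1X) to molar concentrations.

    Scales linearly from the 20X stock definition:
    20X SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
    """
    nacl = 3.0 * fold / 20.0
    citrate = 0.3 * fold / 20.0
    return {
        "nacl_M": nacl,
        "citrate_M": citrate,
        # Assumption: full dissociation, 3 Na+ per trisodium citrate.
        "sodium_ion_M": nacl + 3.0 * citrate,
    }
```

Under these assumptions 1X SSC corresponds to 0.15 M NaCl and 0.015 M trisodium citrate (about 0.195 M sodium ion), and a 0.1X SSC wash to 0.015 M NaCl.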
The invention will now be illustrated by the following non-limiting Examples.
EXAMPLE 1
Full Spectrum Characterization of Indels Induced by a Genome Editing Protocol
A genome editing protocol was implemented in this Example for correction of the sickle mutation in human hematopoietic stem cells, using a Cas9 ribonucleoprotein targeting cleavage of the β-globin gene (HBB) near the mutation site, and an oligonucleotide serving as a donor template for homology-directed repair (HDR) of cleavage and correction of the disease-causing mutation. This protocol drives correction of more than about 20% of HBB alleles in human hematopoietic stem cells. However, the remainder of edited alleles in the cell population are repaired by NHEJ. Indels induced by the genome editing protocol may produce null alleles equivalent to β-thalassemia mutations, which is a major safety concern. The invention as described herein was partly designed to detect and quantify longer indels, up to tens of kb in size, that may be produced by repair of Cas9-induced DSBs.
In this Example, to capture a large molecular weight fragment of DNA encompassing the HBB editing site with the very large regions surrounding it, the SageHLS™ CATCH method was used with SageHLS™ apparatus and protocol for the extraction of high molecular weight DNA. Subsequently, the fragment was deep sequenced to detect large-scale indels.
Human hematopoietic stem cells that had been edited with a CRISPR/Cas9 based protocol were loaded onto the SageHLS™ chip and then were further treated with a pair of targeted nucleases (Cas9/guide RNA complexes) that cleave chromosome 11 approximately 100kb upstream and downstream of the editing site. Cleavage liberated a fragment of approximately 200kb spanning the editing site for the comprehensive detection of a full spectrum of indels, including large deletions (up to 200kb) that may be present in the edited region. The liberated region was separated by gel electrophoresis and eluted from the gel using the SageHLS™ apparatus.
The recovered DNA was fragmented and cloned into an Illumina® sequencing library using standard procedures and sequenced in one lane of an Illumina® Novaseq® apparatus to produce paired end 150 bp reads at a depth of 955,551,306 read pairs.
Reads were processed with a bioinformatic pipeline that identifies structural variations (specifically large deletions) by identifying breakpoints that bracket the cut site in the upstream and downstream regions. Sequence reads (150bp PE reads) were aligned to the human reference genome (hg38) using BWA. Reads were then labeled and extracted if they were split reads (a single read whose ends map to non-adjacent regions) or reads that were part of a pair that maps discordantly (farther from each other in the genome than would be expected for the average insert size). Breakpoints with ends that map to distant regions identify the boundaries of large deletions. The reads were then analyzed using structural variation callers and events that span the targeted cut site locus were extracted as putative large deletion events.
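The read labeling step can be sketched as follows. This is a minimal illustration and not the pipeline used in this Example: it parses tab-delimited SAM records directly, treats a supplementary-alignment flag (0x800) or an SA tag as evidence of a split read, and treats a mapped pair on different chromosomes or with an absolute template length above a threshold as discordant. The function names, the 1,000 bp threshold, and the coordinates in the usage note are assumptions.

```python
def classify_sam_record(line, max_insert=1000):
    """Label one SAM alignment line as 'split' and/or 'discordant'."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    tlen = int(fields[8])
    tags = fields[11:]          # optional TAG:TYPE:VALUE fields
    labels = set()
    if flag & 0x4:              # read itself is unmapped
        return labels
    # Split read: supplementary alignment, or an SA tag listing the
    # other aligned segment(s) of the same read.
    if flag & 0x800 or any(t.startswith("SA:Z:") for t in tags):
        labels.add("split")
    # Discordant pair: both mates mapped, but on different chromosomes or
    # farther apart than expected (max_insert is an assumed threshold).
    paired = flag & 0x1
    mate_unmapped = flag & 0x8
    if paired and not mate_unmapped:
        if rnext not in ("=", rname) or abs(tlen) > max_insert:
            labels.add("discordant")
    return labels

def spans_cut_site(start, end, cut_site):
    """True if a called event [start, end] contains the cut-site position."""
    return start <= cut_site <= end
```

As a usage illustration with hypothetical coordinates, a pair whose mates map 60 kb apart on chr11 would be labeled discordant, a read carrying an SA tag would be labeled split, and a called deletion would be retained as a putative editing-induced event only if `spans_cut_site` returns true for the targeted position.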
This analysis revealed the presence of a 5,143 bp deletion that was located approximately 66kb upstream of the β-globin gene and not spanning the cut site. Analysis of read depth indicated that the deletion is heterozygous. To validate the presence of the deletion at the detected breakpoints, PCR primers were designed to amplify the genomic interval spanning the deletion breakpoint and amplify both the deleted and the normal alleles. Sequencing of the amplified alleles confirmed the presence of the deletion in unedited cells and validated the capability of the invention described herein to detect large deletions.
These data show that the invention described herein characterizes the full spectrum of insertions/deletions produced by editing with a gene editing protocol. The invention described herein detects short or long insertions/deletions at the cut site and in the full genomic interval containing the editing site targeted by CRISPR/Cas9.
Although the foregoing specification and examples fully disclose and enable the present invention, they are not intended to limit the scope of the invention, which is defined by the claims appended hereto.
All publications, patents and patent applications are incorporated herein by reference. While in the foregoing specification this invention has been described in relation to certain embodiments thereof, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the invention is susceptible to additional embodiments and that certain of the details described herein may be varied considerably without departing from the basic principles of the invention.
The use of the terms “a” and “an” and “the” and similar referents in the context of describing the invention are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to”) unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
With respect to ranges of values, the invention encompasses each intervening value between the upper and lower limits of the range to at least a tenth of the lower limit's unit, unless the context clearly indicates otherwise. Further, the invention encompasses any other stated intervening values. Moreover, the invention also encompasses ranges excluding either or both of the upper and lower limits of the range, unless specifically excluded from the stated range.
Further, all numbers expressing quantities of ingredients, reaction conditions, % purity, polypeptide and polynucleotide lengths, and so forth, used in the specification and claims, are modified by the term "about," unless otherwise indicated. Accordingly, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties of the present invention. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits, applying ordinary rounding techniques. Nonetheless, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors from the standard deviation of its experimental measurement. Unless defined otherwise, the meanings of all technical and scientific terms used herein are those commonly understood by one of skill in the art to which this invention belongs. One of skill in the art will also appreciate that any methods and materials similar or equivalent to those described herein can also be used to practice or test the invention. Further, all publications mentioned herein are incorporated by reference in their entireties.

Claims

WHAT IS CLAIMED IS:
1. A method of identifying in a sample a DNA variant induced by a genome editing protocol, comprising: contacting a genomic DNA of the genome edited sample with one or more targeted nucleases that is capable of excising a DNA fragment from the genomic DNA; isolating the DNA fragment; and sequencing the isolated DNA fragment; wherein the genomic DNA comprises an editing site targeted by the genome editing protocol, and the DNA fragment comprises the editing site and the DNA variant.
2. The method of claim 1, wherein the DNA fragment is a high molecular weight (HMW) DNA fragment that is at least 10kb in length.
3. The method of any one of claims 1-2, wherein the DNA variant is an indel.
4. The method of any one of claims 1-3, wherein the DNA fragment is at least 100kb in length.
5. The method of any one of claims 1-4, wherein the DNA fragment is at least 200kb in length.
6. The method of any one of claims 1-5, wherein the one or more targeted nucleases comprises a single targeted nuclease, the genomic DNA is a linear DNA comprising two ends, and the DNA fragment further comprises one end of the genomic DNA.
7. The method of claim 6, wherein the targeted nuclease cuts the genomic DNA at least about 10kb away from the end of the genomic DNA.
8. The method of claim 6, wherein the targeted nuclease cuts the genomic DNA at least about 100kb away from the end of the genomic DNA.
9. The method of claim 6, wherein the targeted nuclease cuts the genomic DNA at least about 200kb away from the end of the genomic DNA.
10. The method of any one of claims 1-5, wherein the one or more targeted nucleases comprise a pair of targeted nucleases.
11. The method of claim 10, wherein one targeted nuclease of the pair cuts the genomic DNA at least 5kb downstream of the editing site, and/or the other targeted nuclease of the pair cuts the genomic DNA at least 5kb upstream of the editing site.
12. The method of claim 10, wherein one targeted nuclease of the pair cuts the genomic DNA at least 50kb downstream of the editing site, and/or the other targeted nuclease of the pair cuts the genomic DNA at least 50kb upstream of the editing site.
13. The method of claim 10, wherein one targeted nuclease of the pair cuts the genomic DNA at least 100kb downstream of the editing site, and/or the other targeted nuclease of the pair cuts the genomic DNA at least 100kb upstream of the editing site.
14. The method of any one of claims 1-13, wherein the DNA variant is located at least 500bp away from the editing site.
15. The method of any one of claims 1-14, wherein the DNA variant is located at least 1kb away from the editing site.
16. The method of any one of claims 1-15, wherein the DNA variant is located at least 5kb away from the editing site.
17. The method of any one of claims 1-16, wherein the method is capable of detecting a DNA variant located at least 10kb away from the editing site.
18. The method of any one of claims 1-17, wherein the DNA variant is an indel that is at least 100bp in length.
19. The method of any one of claims 1-18, wherein the DNA variant is an indel that is at least 2kb in length.
20. The method of any one of claims 1-19, wherein the method is capable of detecting a DNA variant that is at least 5kb in length.
21. The method of any one of claims 1-20, wherein the one or more targeted nucleases comprises a CRISPR-Cas nuclease, a transcription activator-like effector nuclease (TALEN), a zinc-finger nuclease (ZFN), or a meganuclease.
22. The method of any one of claims 1-21, wherein the one or more targeted nucleases comprise a pair of CRISPR-Cas9 nucleases.
23. The method of any one of claims 1-22, wherein contacting comprises contacting the genomic DNA with the one or more targeted nucleases in a solution.
24. The method of any one of claims 1-23, wherein contacting comprises contacting the genomic DNA with the one or more targeted nucleases in a gel matrix.
25. The method of any one of claims 1-24, wherein isolating comprises electrophoresing.
26. The method of any one of claims 1-25, wherein isolating comprises two-dimensional electrophoresing.
27. The method of any one of claims 1-26, wherein sequencing comprises deep sequencing the DNA fragment.
28. The method of any one of claims 1-26, wherein sequencing comprises long-read sequencing the DNA fragment.
29. The method of any one of claims 1-28, wherein the sample comprises a bacterial cell, yeast cell, plant cell, or mammalian cell.
30. The method of any one of claims 1-29, wherein the sample comprises a mammalian cell.
31. The method of any one of claims 1-30, wherein the sample comprises a stem cell.
32. The method of any one of claims 1-31, wherein the sample comprises a hematopoietic stem cell.
33. The method of any one of claims 1-32, wherein the sample comprises an immune cell.
34. The method of any one of claims 29-33, further comprises lysing the sample cell to release the genomic DNA.
35. The method of any one of claims 1-34, wherein the sample was edited with a genome editing protocol selected from the group consisting of CRISPR-Cas based protocol, TALEN based protocol, and ZFN based protocol.
36. The method of any one of claims 29-35, further comprises administering into an animal a population of cells, wherein the administered population of cells and the sample cell were previously edited in the same genome editing protocol.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163229943P 2021-08-05 2021-08-05
US63/229,943 2021-08-05

Publications (1)

Publication Number Publication Date
WO2023014967A1 (en) 2023-02-09

Family

ID=85154806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/039567 WO2023014967A1 (en) 2021-08-05 2022-08-05 Methods for detecting indel produced by genome editing protocol

Country Status (1)

Country Link
WO (1) WO2023014967A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200392473A1 (en) * 2017-12-22 2020-12-17 The Broad Institute, Inc. Novel crispr enzymes and systems

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ATKINS ANDREW, CHUNG CHENG-HAN, ALLEN ALEXANDER G., DAMPIER WILL, GURROLA THEODORE E., SARIYER ILKER K., NONNEMACHER MICHAEL R., W: "Off-Target Analysis in Gene Editing and Applications for Clinical Translation of CRISPR/Cas9 in HIV-1 Therapy", FRONTIERS IN GENOME EDITING, vol. 3, XP093033952, DOI: 10.3389/fgeed.2021.673022 *
CHALLA ANIL K., STANFORD DENISE, ALLEN ANTONIO, RASMUSSEN LAWRENCE, AMANOR FERDINAND K., RAJU S. VAMSEE: "Validation of gene editing efficiency with CRISPR-Cas9 system directly in rat zygotes using electroporation mediated delivery and embryo culture", METHODSX, ELSEVIER BV, NL, vol. 8, 1 January 2021 (2021-01-01), NL , pages 101419, XP093033951, ISSN: 2215-0161, DOI: 10.1016/j.mex.2021.101419 *
DABROWSKA MAGDALENA, CZUBAK KAROL, JUZWA WOJCIECH, KRZYZOSIAK WLODZIMIERZ J, OLEJNICZAK MARTA, KOZLOWSKI PIOTR: "qEva-CRISPR: a method for quantitative evaluation of CRISPR/Cas-mediated genome editing in target and off-target sites", NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, GB, vol. 46, no. 17, 28 September 2018 (2018-09-28), GB , pages e101 - e101, XP093033950, ISSN: 0305-1048, DOI: 10.1093/nar/gky505 *
FANG HUAQIANG, BYGRAVE ALEXEI M, ROTH RICHARD H, JOHNSON RICHARD C, HUGANIR RICHARD L: "An optimized CRISPR/Cas9 approach for precise genome editing in neurons", ELIFE, vol. 10, 10 March 2021 (2021-03-10), XP093033949, DOI: 10.7554/eLife.65202 *
KANG SEUNG-HUN, LEE WI-JAE, AN JU-HYUN, LEE JONG-HEE, KIM YOUNG-HYUN, KIM HANSEOP, OH YEOUNSUN, PARK YOUNG-HO, JIN YEUNG BAE, JUN : "Prediction-based highly sensitive CRISPR off-target validation using target-specific DNA enrichment", NATURE COMMUNICATIONS, vol. 11, no. 1, 1 December 2020 (2020-12-01), XP055923681, DOI: 10.1038/s41467-020-17418-8 *
LUQING ZHANG, JIA RUIRUI, PALANGE NORBERTO J., SATHEKA ACHIM CCHITVSANZWHOH, TOGO JACQUES, AN YAO, HUMPHREY MABWI, BAN LUYING, JI : "Large Genomic Fragment Deletions and Insertions in Mouse Using CRISPR/Cas9", PLOS ONE, vol. 10, no. 3, pages e0120396, XP055349967, DOI: 10.1371/journal.pone.0120396 *

Similar Documents

Publication Publication Date Title
US20230091847A1 (en) Compositions and methods for improving homogeneity of dna generated using a crispr/cas9 cleavage system
CN106715694B (en) Nuclease-mediated DNA Assembly
EP3004349B1 (en) A method for producing precise dna cleavage using cas9 nickase activity
US20200172935A1 (en) Modified cpf1 mrna, modified guide rna, and uses thereof
Luan et al. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition
JP2022113799A (en) CRISPR-Cas Systems and Methods for Altering Expression of Gene Products
CN105658796B (en) CRISPR-CAS component systems, methods, and compositions for sequence manipulation
WO2017095967A2 (en) Therapeutic targets for the correction of the human dystrophin gene by gene editing and methods of use
WO2016025759A1 (en) Dna knock-in system
CN113373130A (en) Cas12 protein, gene editing system containing Cas12 protein and application
EP3940078A1 (en) Off-target single nucleotide variants caused by single-base editing and high-specificity off-target-free single-base gene editing tool
AU2017302657A1 (en) Mice comprising mutations resulting in expression of c-truncated fibrillin-1
AU2015323936B2 (en) Recombinase mutants
Guria et al. Circular RNA profiling by illumina sequencing via template‐dependent multiple displacement amplification
Bruijnesteijn et al. Rapid characterization of complex killer cell immunoglobulin-like receptor (KIR) regions using Cas9 enrichment and nanopore sequencing
CN115667283A (en) RNA-guided kilobase-scale genome recombination engineering
CN110551762B (en) CRISPR/ShaCas9 gene editing system and application thereof
WO2023014967A1 (en) Methods for detecting indel produced by genome editing protocol
CN116716298A (en) Guide editing system and fixed-point modification method of target gene sequence
US20190309283A1 (en) Method for preparing long-chain single-stranded dna
CN110551763B (en) CRISPR/SlutCas9 gene editing system and application thereof
CN110577970B (en) CRISPR/Sa-SlutCas9 gene editing system and application thereof
US20190218533A1 (en) Genome-Scale Engineering of Cells with Single Nucleotide Precision
JP7305812B2 (en) Method for preparing long single-stranded DNA
Ramadass Cloning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22853949

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE