WO2024119461A1 - Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation - Google Patents

Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation

Info

Publication number
WO2024119461A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
sequence
target
sites
template
Prior art date
Application number
PCT/CN2022/137789
Other languages
English (en)
Inventor
Zhike LU
Lijia MA
Zhenxing Yu
Original Assignee
Westlake University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Westlake University
Priority to PCT/CN2022/137789
Publication of WO2024119461A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • the present disclosure relates to complexes, polynucleotides, vectors, kits, and methods for detecting cleavage sites of CRISPR/Cas nucleases and DNA translocations at cleavage sites in a genome.
  • CRISPR-based genome editing exhibited enormous potential in both biological research and clinical applications.
  • CRISPR therapy has its unique advantage of directly targeting the nucleic acid sequences of previously undruggable targets.
  • non-specific targeting of gRNAs, which might introduce undesired edits, causes unexpected genotoxicity in cells.
  • there is an urgent need to understand the outcomes of off-target edits and the resulting DNA translocations, which challenge the translational potential of CRISPR technology in treating genetic disorders and other human diseases.
  • GUIDE-seq labeled and enriched double-strand breaks in the genome of living cells using exogenous double-stranded oligodeoxynucleotides (dsODNs), whose integration was mediated by the DNA repair process (Tsai et al., Nat Biotechnol 2015).
  • dsODNs exogenous double-stranded oligodeoxynucleotides
  • BLISS is another type of in cellulo technique, which utilizes in situ DSB ligation in fixed cells and characterizes the off-target sites for both SpCas9 and As/LbCpf1 (Yan et al., Nat Commun 2017).
  • because CRISPR technology holds therapeutic potential for many unmet medical needs, off-target identification for in vivo CRISPR editing and evaluation of the corresponding genotoxicity are in high demand.
  • one strategy is to use in vitro or computational approaches to prioritize a list of genomic regions and validate them in in vivo samples one by one through targeted amplicon sequencing (Amplicon-seq) (Newby et al., Nature 2021; Musunuru et al., Nature 2021; Akcakaya et al., Nature 2018), which risks overlooking in vivo-specific off-targets and becomes laborious if the prior data yield a long candidate list.
  • DISCOVER-seq utilized the chromatin immunoprecipitation signal of MRE11, which is involved in the DNA repair pathway, to represent and enrich genomic sites undergoing DSB-induced repair (Wienert et al., Science 2019).
  • the dynamic nuclease activity of Cas9 might not be fully captured by the “snapshot” signal from MRE11 immunoprecipitation.
  • DNA translocation has been a significant concern for CRISPR editing, as it typically causes higher genotoxicity, although it occurs at a relatively lower frequency (Wei et al., Cell 2016).
  • concern about the potential risk of DNA translocation has often focused on the use of CRISPR editing to produce CAR-T cells, since multiple gRNAs were introduced into T cells and caused risks of translocation between DNA double-strand break (DSB) ends (Liu et al., Cell 2017; Ren et al., Clin Cancer Res 2017).
  • compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation are described in International Application No. PCT/CN2021/124025, filed October 15, 2021, which is incorporated herein by reference in its entirety.
  • CRISPR technology holds significant promise for biological studies and gene therapies because of its high flexibility and efficiency when applied in mammalian cells.
  • endonuclease e.g., Cas9
  • Cas9 potentially generates undesired edits; thus, there is an urgent need to comprehensively identify off-target sites so that the genotoxicities can be accurately assessed.
  • the present disclosure provides a new technology, referred to as “PEAC-seq” in some embodiments, for detecting cleavage sites of CRISPR/Cas nucleases and DNA translocations at cleavage sites in a genome.
  • PEAC-seq adopts the Prime Editor, or a modified version of the Prime Editor, to insert an optimized sequence (i.e., a label or a tag) at the Cas nuclease editing sites and enriches the labeled regions with site-specific primers for high-throughput sequencing (HTS).
  • PEAC-seq employs a Cas nuclease, a reverse transcriptase, and a guide RNA (also called “pegRNA”).
  • PEAC-seq can identify Cas editing sites, as well as DNA translocations, which are more genotoxic but usually overlooked by other off-target detection methods. As PEAC-seq does not rely on exogenous oligodeoxynucleotides (ODNs) to label the editing site, it can be used in vivo for off-target identification. PEAC-seq provides a comprehensive and streamlined strategy to identify CRISPR off-target sites in vitro and in vivo, as well as DNA translocation events. This new technique further diversifies the toolkit for evaluating the genotoxicity of CRISPR in research and clinical applications.
  • ODNs exogenous oligodeoxynucleotides
  • PEAC-seq provides a method to detect Cas9 cleavage sites with high accuracy and sensitivity.
  • PEAC-seq can be used in vitro and in vivo, as illustrated in the Examples of this disclosure.
  • PEAC-seq can also be used to detect DNA translocations at Cas cleavage sites.
  • PEAC-seq is designed to insert an insertion sequence (e.g., a label or a tag) into a Cas9 cleavage site (including both on-target and off-target sites) in the genome. These insertion sequences function as labels, marking the Cas9 cleavage sites.
  • the incorporation of the insertion sequences in the genomic DNA is also referred to herein as “labeling.”
  • the insertion sequence (e.g., a label or a tag) can be optimized in composition and length to increase insertion efficiency. For instance, the insertion sequence can incorporate a tag sequence to represent and enrich the edited sites in the genome.
  • the reverse transcriptase and the Cas9 nuclease are fused together as, e.g., a fusion protein.
  • the labeling of the genomic DNA by an insertion sequence is performed at the same location of a cleavage site right after the Cas nuclease cleaves the genomic DNA at that cleavage site.
  • the Cas cleavage sites can be identified on the genome. The accompanying process of cut-and-insertion ensures consistency between cutting events and insertion events.
  • DNA translocations at Cas9 cleavage sites can be identified by a detection method disclosed herein.
  • the present disclosure provides a comprehensive and streamlined method to identify CRISPR targeting sites both in vitro and in vivo, as well as DNA translocation events.
  • the method employs a guide RNA comprising an insertion sequence reverse transcriptase (RT) template and does not rely on additional exogenous label sequence.
  • RT reverse transcriptase
  • the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase (RT), and a guide RNA which comprises a spacer, a scaffold, an insertion sequence RT template (RTT), and a primer binding site (PBS), wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • the spacer, the scaffold, the insertion sequence RT template, and the PBS are arranged from 5’ to 3’ in the guide RNA.
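As a rough illustration of the 5’ to 3’ arrangement described above, the following Python sketch concatenates the four guide RNA elements in order. The sequences are short made-up placeholders (not any SEQ ID NO from this disclosure), and `build_guide_rna` is a hypothetical helper name introduced only for this example.

```python
def build_guide_rna(spacer: str, scaffold: str, rtt: str, pbs: str) -> str:
    """Concatenate guide RNA elements 5'->3': spacer, scaffold, RTT, PBS."""
    parts = [spacer, scaffold, rtt, pbs]
    for part in parts:
        # Reject empty strings and anything that is not an RNA base.
        if not part or set(part.upper()) - set("ACGU"):
            raise ValueError(f"not a non-empty RNA sequence: {part!r}")
    return "".join(part.upper() for part in parts)


# Placeholder sequences for illustration only (assumptions, not from the source).
guide = build_guide_rna(spacer="GACU", scaffold="GUUUUAGA", rtt="CCAUGG", pbs="AGUC")
print(guide)
```

The order of the arguments mirrors the order stated in the text; real designs would of course use full-length elements.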
  • the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS), wherein the PBS is located downstream to the 3’ end of the insertion sequence RT template, and wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • the RTT-PBS sequence is a linear sequence or a circularized sequence.
  • the RTT-PBS sequence further comprises an MS2 hairpin.
  • the Cas nuclease is selected from Cas9, its variants, and mutants of any of the variants.
  • the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
  • the Cas nuclease and the reverse transcriptase are not fused or linked.
  • the Cas nuclease and the reverse transcriptase are formed as a fusion protein, optionally operably connected by a linker.
  • the fusion protein is encoded by a sequence of SEQ ID NO: 208.
  • the insertion sequence RT template is about 10 to 30 nucleotides.
  • the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
  • the guide RNA comprises an RNA structural motif at the 3’ end.
  • the RNA structural motif is a modified prequeosine1-1 riboswitch aptamer (evopreQ1) or a frameshifting pseudoknot from Moloney Murine Leukemia Virus (MMLV) .
  • evopreQ1 modified prequeosine1-1 riboswitch aptamer
  • MMLV Moloney Murine Leukemia Virus
  • the PBS comprises random nucleotides.
  • the present disclosure provides a polynucleotide encoding the Cas nuclease, the reverse transcriptase, the spacer, the scaffold, the insertion sequence RT template, and the PBS in any one of the complexes disclosed herein.
  • the present disclosure provides a polynucleotide encoding a guide RNA comprising an insertion sequence RT template, wherein the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • the present disclosure provides a polynucleotide encoding an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS) , wherein the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • the present disclosure provides a vector comprising any one of the polynucleotides disclosed herein.
  • the present disclosure provides a kit comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA which comprises a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS), wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • PBS primer binding site
  • the present disclosure provides a kit comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an insertion sequence RT template, and a primer binding site (PBS), wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • the present disclosure provides a kit comprising one or more vectors encoding a Cas nuclease, a reverse transcriptase, and a guide RNA which comprises a spacer, a scaffold, and an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS), wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • DSB DNA double-strand break
  • the present disclosure provides a kit comprising one or more vectors encoding a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an insertion sequence RT template, and a primer binding site (PBS), wherein the Cas nuclease is capable of creating DNA double-strand break (DSB).
  • the Cas nuclease is selected from Cas9, its variants and mutants of any of the variants.
  • the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
  • the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • the present disclosure provides a method for labeling Cas nuclease cleavage sites in genomic DNA, comprising contacting the genomic DNA with the complex in any one of claims 1-16, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more sequences that are reverse transcribed from the insertion sequence RT template in part or in whole are inserted into the one or more cleavage sites, and wherein the one or more sequences inserted into the one or more cleavage sites are labels.
  • the cleavage site is on-target or off-target.
  • the present disclosure provides a method for detecting Cas9 cleavage sites and/or detecting DNA translocation in genomic DNA, comprising
  • the one or more amplicons each comprise a portion of genomic DNA that is immediately upstream or downstream to the one or more labels.
  • the present disclosure provides a method for identifying off-target Cas cleavage sites, comprising comparing the Cas cleavage sites identified by the method disclosed herein with a target sequence, wherein a cleavage site that is not identical to the target sequence is an off-target site.
  • the labeled genomic DNA is processed by Tn5 tagmentation before enrichment, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
  • UMI unique molecular identifiers
  • the labeled genomic DNA is targeted and enriched by PCR or a hybrid capture-based target enrichment method.
  • the enrichment is performed by two PCR reactions, wherein in one reaction the insertion sequence is used as the forward primer binding site and in the other reaction the insertion sequence is used as the reverse primer binding site.
  • the 3’ end of the primers that bind to the insertion sequence is at least 2 bp away from the insertion boundary.
  • the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising
  • a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
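The comparison stated above can be sketched in a few lines of Python. This is a minimal illustration, assuming that an off-target count per guide RNA has already been obtained; the guide names and counts are hypothetical, and `rank_by_specificity` is a helper name invented for this example rather than part of any published pipeline.

```python
def rank_by_specificity(off_target_counts: dict[str, int]) -> list[str]:
    """Order guide RNA names from most specific (fewest off-target sites) to least.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(off_target_counts, key=lambda name: (off_target_counts[name], name))


# Hypothetical off-target counts, for illustration only.
counts = {"gRNA_A": 12, "gRNA_B": 3, "gRNA_C": 7}
print(rank_by_specificity(counts))
```

Only the number of off-target sites is used here; how to weigh other evidence, such as read support per site, is left open.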
  • the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants or mutants of any of the variants comprising
  • the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs comprising
  • a guide RNA having fewer off-target sites and fewer DNA translocations is more specific than a guide RNA having more off-target sites and more DNA translocations.
  • Fig. 1 illustrates an embodiment in this disclosure: PEAC-seq.
  • Fig. 1A is a schematic representation of a PEAC-seq experimental procedure.
  • the gDNA was extracted and underwent Tn5 tagmentation.
  • the Tn5 was embedded with UMI-adaptors to eliminate PCR duplicates. After tagmentation, fragments were amplified by pairs of primers (one priming at the PEAC-seq insertion, the other priming at the Tn5 adaptor).
  • Fig. 1B is a schematic representation of the two forward primers and two reverse primers designed for tag enrichment and library preparation of PEAC-seq.
  • Each forward primer was paired with a downstream Tn5 primer to generate amplicons including the PEAC-seq tag sequence and its downstream genomic sequences.
  • Each reverse primer was paired with an upstream Tn5 primer to generate amplicons including the PEAC-seq tag sequence and its upstream genomic sequences.
  • five Amplicon-seq datasets from the three forward primers and two reverse primers were generated, and six candidate lists of putative off-targets were inferred from the five datasets using a modified GUIDE-seq analysis pipeline.
  • Figs. 1C-1E are Venn diagrams showing the shared and unique off-targets identified by PEAC-seq and GUIDE-seq.
  • Fig. 2 shows an analysis of PEAC-seq off-target sites.
  • Fig. 2A is the visualization of PEAC-seq on-target and off-target sites.
  • the symbol ‘*’ represented a PEAC-seq site that was also called by the GUIDE-seq.
  • the symbol ‘**’ represented a PEAC-seq off-target (PEAC-seq-unique) that was identified by Amplicon-seq but not called by the GUIDE-seq.
  • PEAC score quantitative enrichment of the PEAC-seq tag at the edited sites
  • PEAC-ID each site (on-target or off-target) identified by PEAC-seq was assigned a PEAC-ID, ordered by PEAC score in descending order.
  • Fig. 2B shows that the numbers of reads from the shared PEAC-seq and GUIDE-seq sites are highly correlated.
  • Fig. 2C shows screenshots of PEAC-seq signal tracks from the IGV Genome Browser.
  • One on-target site, one shared off-target site, and one PEAC-seq unique off-target site were presented.
  • signals from both the PEAC-seq and the wild-type (WT, no Cas9-MMLV treatment) samples were included.
  • the first track represented signals from the amplicons of a forward primer and a downstream Tn5 primer
  • the second track represented signals from the amplicons of a reverse primer and an upstream Tn5 primer.
  • the model on the right side showed the direction of spacer and PAM of each case.
  • Fig. 2D shows that the shared off-targets (grey bars) tend to have fewer mismatches compared to the on-target site, while the PEAC-seq unique sites (slashed bars) and the GUIDE-seq unique sites (dashed bars) tend to have more mismatches.
  • Fig. 2E shows the mutation frequencies plotted at each position along the gRNA and PAM sequences (from 5’ to 3’). From top to bottom are profiles of VEGFA TS1, TS2 and TS3.
  • Fig. 3 shows PEAC-seq identified DNA translocations relevant to CRISPR genome editing.
  • Fig. 3A shows signal tracks of one PEAC-seq site with unexpected upstream signals from the F-primer amplicon. Dashed grey bar: cutting site; Light grey and dark grey peaks: expected signals from the F-primer; White peak with board line and slashed bars: unexpected signals from the F-primer.
  • Fig. 3B shows proposed models of the generation of unexpected upstream signals. Both the Receiver Site and the Donor Site can generate DSBs and are proximal to each other within the nucleus. Model (i) and Model (ii) joined DSB ends from the same Receiver Site. Model (iii), Model (iv) and Model (v) joined one Donor DSB and one Receiver DSB. If the Donor DSB carried the PEAC-seq insertion, the unexpected upstream signal would be observed at the Receiver Site. In the models, the gRNA location was set on the top strand.
  • Fig. 3C shows the design of validation PCR to identify the genomic sequence of the Donor Sites.
  • Two specific primers (Nest-F1 and Nest-F2) were designed upstream of the gRNA of the Receiver Site.
  • the Nest-F1 and Nest-F2 were sequentially used with the downstream Tn5 primer, and two amplicons were generated.
  • the 2nd amplicons were sent for Amplicon-seq.
  • Fig. 3D shows the translocation cases identified by PEAC-seq + Amplicon-seq.
  • Fig. 3E shows the translocation scores of all sites. The two arrows indicate the Receiver Site in Fig. 4D. A DNA translocation score was calculated as “translocation reads number” / (“normal reads number” + “translocation reads number” + 10).
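The translocation score formula given for Fig. 3E can be written out as a small Python function. The formula itself, including the constant 10 in the denominator, comes from the text; the function name, the argument names, and the example read counts are assumptions made for illustration.

```python
def translocation_score(translocation_reads: int, normal_reads: int,
                        pseudocount: int = 10) -> float:
    """translocation reads / (normal reads + translocation reads + 10).

    The constant in the denominator keeps the score low for sites that are
    supported by only a handful of reads.
    """
    if translocation_reads < 0 or normal_reads < 0:
        raise ValueError("read counts must be non-negative")
    return translocation_reads / (normal_reads + translocation_reads + pseudocount)


# Hypothetical read counts, for illustration only.
print(translocation_score(30, 60))  # 30 / (60 + 30 + 10)
print(translocation_score(2, 0))    # 2 / (0 + 2 + 10)
```

With these made-up counts, a site with 30 of 90 reads supporting translocation scores 0.3, whereas a site seen only twice scores about 0.17 even though every read supports translocation.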
  • Fig. 4 shows that PEAC-seq identified Pcsk9 off-targets from an edited mouse embryo.
  • Fig. 4A is a schematic representation of an in vivo PEAC-seq experiment.
  • Fig. 4B is a Venn diagram showing the overlap between the PEAC-seq on-target and off-targets of PCSK9 and the top 18 editing sites (including the on-target) identified by DISCOVER-seq.
  • Fig. 4C is the sequence visualization of the PCSK9 on-target and off-targets.
  • One off-target was identified from one of the two embryos. The site was also reported by DISCOVER-seq and validated by Amplicon-seq. The scale bar represented the indel frequency reported by CRISPResso.
  • Fig. 4D shows signal tracks of the on-target and off-target sites identified from PEAC-seq in two different embryos and wild-type control.
  • the signal of the WT control at chr4: 106463845 was 1000-fold lower than the samples and was considered as background.
  • Fig. 5 illustrates ePEAC-seq, an enhanced version of PEAC-seq with higher sensitivity to identify off-targets.
  • Fig. 5A is a schematic representation of the five modified versions of PEAC-seq.
  • Fig. 5B shows the insertion frequencies of PEAC-seq tag in PEAC-seq and its five modifications.
  • Fig. 5C is the Venn diagram of EMX1 off-targets identified by PEAC-seq and GUIDE-seq.
  • Fig. 5D shows that ePEAC-seq identified two more verified off-targets that were missed by PEAC-seq.
  • Fig. 6 shows genomic context of PEAC-seq off-target and translocations.
  • Fig. 6A shows signals of the ATAC-seq peaks and ChIP-seq peaks of multiple histone modifications and proteins surrounding the PEAC-seq off-targets.
  • Fig. 6B shows signals of the DSB surrounding the PEAC-seq translocation sites (left panel) and random controls (right panel) .
  • Fig. 7 illustrates library preparation and modified GUIDE-seq pipeline to generate six lists of candidate sites.
  • Amplicons were enriched by PEAC-seq insertion-specific primers and Tn5 primers. Three forward primers and two reverse primers were used with the upstream (light blue) and downstream (yellow) Tn5 primers, in five separate PCR reactions. A total of five NGS libraries were generated and sequenced.
  • a modified GUIDE-seq analysis pipeline was applied, and six lists of candidate sites were generated, one from each pair of forward and reverse primers.
  • Fig. 8 shows indel frequency and tag insertion ability of a Cas9-MMLV system.
  • Fig. 8A shows the results from Amplicon-seq, which was conducted to quantify the indel frequency of ten on-target sites.
  • the indels were generated by Cas9 or Cas9-MMLV.
  • Fig. 8B shows the frequency of tag insertion estimated at the same ten on-target sites.
  • Fig. 9 shows signal tracks of PEAC-seq at VEGFA TS1. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 10 shows signal tracks of PEAC-seq at VEGFA TS2.
  • Fig. 10A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS2 between PEAC-seq and GUIDE-seq. Eighty-one sites were overlapped. Seventy-one sites were GUIDE-seq unique and thirty-four sites were PEAC-seq unique.
  • Fig. 10B is a GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS2.
  • Fig. 10C shows signal tracks of PEAC-seq sites at VEGFA TS2. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 11 shows signal tracks of PEAC-seq at VEGFA TS3.
  • Fig. 11A is a Venn diagram showing the overlap of on-target and off-targets of VEGFA TS3 between PEAC-seq and GUIDE-seq. Thirty-five sites overlapped. Twenty-five sites were GUIDE-seq unique, and eight sites were PEAC-seq unique.
  • Fig. 11B is a GUIDE-seq visualization output of PEAC-seq sites at VEGFA TS3.
  • Fig. 11C shows signal tracks of PEAC-seq sites at VEGFA TS3. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 12 shows signal tracks of PEAC-seq at EMX1.
  • Fig. 12A is a GUIDE-seq visualization output of PEAC-seq sites at EMX1.
  • Fig. 12B shows signal tracks of PEAC-seq sites at EMX1. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 13 shows signal tracks of PEAC-seq at RNF2.
  • Fig. 13A is a Venn diagram showing the overlap of on-target and off-targets of RNF2 between PEAC-seq and GUIDE-seq. One site was called by both methods.
  • Fig. 13B is a GUIDE-seq visualization output of PEAC-seq sites at RNF2.
  • Fig. 13C shows signal tracks of PEAC-seq sites at RNF2. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 14 shows signal tracks of PEAC-seq at FANCF.
  • Fig. 14A is a GUIDE-seq visualization output of PEAC-seq sites at FANCF.
  • Fig. 14B shows signal tracks of PEAC-seq sites at FANCF. Chromosome locations and the overlap with GUIDE-seq were also shown.
  • Fig. 15 shows the translocations called by PEAC-seq.
  • Fig. 15A shows the primerE.geometric_mean and translocation rates of on-/off-target sites called by PEAC-seq. primerE.geometric_mean: geometric mean of the number of reads, with distinct molecular indices, amplified by the forward/reverse primers. Translocation Rate: the ratio of reads amplified by the PEAC-seq forward primer but with reverse orientation.
  • Figs. 15B and 15C show two of the translocation sites with the highest translocation rates called by PEAC-seq, which were validated by unidirectional targeted sequencing (UDiTaS).
  • Circos plots show the chromosome rearrangements at the receiver sites Translocation Validation site1 (chr22:37266776-37266799) (Fig. 15B) and Translocation Validation site2 (chr14:61612048-61612071) (Fig. 15C). Both sites are off-targets of VEGFA TS3. Arcs were used to represent the rearrangements between the Translocation Validation sites and other sites. The receiver sites were marked as diamonds, and the known VEGFA TS3 off-target sites were marked as stars.
  • Fig. 16 shows that PEAC-seq identified mPnpla3 off-targets from edited mouse embryos.
  • Fig. 16A is a Venn diagram that shows the overlap between the PEAC-seq on-target and off-targets of Pnpla3 and the top 21 off-targets validated by WGS (Anderson et al., 2018). Three cleavage sites were identified from two different embryos in our study. All three sites were reported previously.
  • Fig. 16B is a sequence visualization of the Pnpla3 on-target and off-targets.
  • One off-target site was identified by both embryos, and each embryo identified an embryo-specific off-target. All three off-targets were reported previously and also verified by Amplicon-NGS.
  • Fig. 16C shows signal tracks of the on-target and off-target sites identified by PEAC-seq in two different embryos and wild-type control.
  • Fig. 17 illustrates ePEAC-seq, an enhanced version of PEAC-seq.
  • the Venn diagrams show the VEGFA TS2 off-targets identified by GUIDE-seq and PEAC-seq.
  • Fig. 18 illustrates ePEAC-seq, an enhanced version of PEAC-seq.
  • the Venn diagrams show the EMX1 off-targets identified by GUIDE-seq and PEAC-seq.
  • Fig. 19 illustrates mut-pegRNA, an enhanced version of the pegRNA for PEAC-seq.
  • Random nucleotide was incorporated into the PBS region of pegRNAs to improve the binding between pegRNA and off-targets with PBS mismatches.
  • RTT is an insertion sequence RT template. This illustration shows five different mut-pegRNAs, each including one random nucleotide shown as “N”.
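To make the idea of a single random nucleotide in the PBS concrete, the Python sketch below enumerates PBS variants in which one position is written as “N”, and then expands one such variant into its four concrete sequences. The PBS shown is a made-up placeholder, and which positions are randomized in the actual mut-pegRNAs is not stated here, so substituting every position in turn is purely an assumption; both function names are hypothetical.

```python
def pbs_single_n_variants(pbs: str) -> list[str]:
    """Return one variant per PBS position, with that position replaced by 'N'."""
    return [pbs[:i] + "N" + pbs[i + 1:] for i in range(len(pbs))]


def expand_single_n(seq: str, alphabet: str = "ACGU") -> list[str]:
    """Expand a sequence containing exactly one 'N' into its concrete sequences."""
    if seq.count("N") != 1:
        raise ValueError("expected exactly one 'N'")
    return [seq.replace("N", base) for base in alphabet]


pbs = "AGUCG"  # placeholder PBS, not taken from the disclosure
variants = pbs_single_n_variants(pbs)
print(variants)
print(expand_single_n(variants[0]))
```

Each “N” variant stands for a pool of four sequences, one of which matches the original PBS and three of which carry a single substitution.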
  • Fig. 20 shows target sites identified by PEAC-seq in cellulo.
  • the Cas9 target sequences i.e., cleavage sites
  • the Cas9 target sequences identified by PEAC-seq targeting six genes (VEGFA TS1 (Fig. 20A), VEGFA TS2 (Fig. 20B), VEGFA TS3 (Fig. 20C), EMX1 (Fig. 20D), RNF2 (Fig. 20E) and FANCF (Fig. 20F)), and their chromosome locations.
  • the number of mismatches is also shown.
  • the cleavage site with 0 mismatches is the on-target site and the others are off-target sites.
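The mismatch-based distinction between on-target and off-target sites can be sketched as follows. This is a simplified illustration that compares equal-length sequences position by position; it does not handle bulges, indels, or PAM rules, and the sequences are short made-up placeholders. The function names are hypothetical.

```python
def count_mismatches(site: str, target: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(site) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(site.upper(), target.upper()))


def classify_site(site: str, target: str) -> str:
    """Label a cleavage site as on-target (0 mismatches) or off-target."""
    return "on-target" if count_mismatches(site, target) == 0 else "off-target"


target = "ACGTACGTAC"  # placeholder target sequence
for site in ("ACGTACGTAC", "ACGAACGTAT"):
    print(site, count_mismatches(site, target), classify_site(site, target))
```

In this toy input the second site differs from the target at two positions and is therefore labeled off-target.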
  • Fig. 21 shows target sites identified by PEAC-seq in vivo.
  • Fig. 21A shows the Cas9 target sequences (i.e., cleavage sites) from Embryo #5 and Embryo #12 identified by PEAC-seq targeting Pcsk9.
  • Fig. 21B shows the Cas9 target sequences (i.e., cleavage sites) from Embryo #21 and Embryo #31 identified by PEAC-seq targeting Pnpla3.
  • Fig. 22 shows the primers, oligos, and vectors used in the development of PEAC-seq.
  • Fig. 23 shows the primers used in the validation of chromosome translocation. (Examples 1, 3)
  • Fig. 24 shows the primers and vectors used in PEAC-seq and Amplicon-NGS in vivo.
  • Fig. 25 shows the result of Amplicon-seq validation.
  • Fig. 26 shows the insertion efficiency of four insertion sequences on four target sites, and the nucleotide composition of each insertion sequence.
  • Fig. 27 shows the insertion efficiency of two insertion sequences at HEK3 site, and the nucleotide composition of each insertion sequence.
  • Fig. 28 shows off-targets sites of VEGFA TS2 called with GUIDE-seq, PEAC-seq, and ePEAC-seq.
  • nucleic acids are written left to right in the 5'to 3'orientation; and amino acid sequences are written left to right in amino to carboxy orientation, respectively.
  • “variant” refers to a varied form of a subject, which includes wild-type forms and naturally occurring or artificially mutant forms.
  • an “insertion sequence” refers to a DNA sequence that is encoded by the RT template comprised in a guide RNA and the products reverse transcribed from this RT template. Both partial and full-length products may exist in a reverse transcription. When “insertion sequence” is used to refer to the reverse transcription products, it includes both the partial and full-length products.
  • a guide RNA refers to a synthetic or expressed RNA sequence that comprises a CRISPR binding motif and a spacer.
  • a “spacer” is a DNA-targeting motif, which is a sequence that is complementary to a target specific DNA region.
  • a CRISPR binding motif is sometimes called a “scaffold.”
  • the CRISPR binding motif of a guide RNA can bind to a Cas enzyme, and the DNA-targeting motif of the gRNA can guide the complex to a specific target location on a DNA.
  • a gRNA may further comprise an insertion sequence RT template.
  • the guide RNA is a pegRNA.
  • a “complex” refers to a system of components that achieves a function as disclosed herein, e.g., detecting cleavage sites of CRISPR/Cas nucleases and DNA translocations at cleavage sites. Some or all of the components of the system may be connected (covalently or non-covalently associated) or not connected.
  • a “fusion protein” is a protein comprising at least two domains that are encoded by separate genes that have been joined into a single polypeptide.
  • a fusion protein can comprise two domains that are encoded by separate genes that have been joined so that they are transcribed and translated as a single unit, producing a single polypeptide.
  • the at least two domains are fused together directly.
  • the domains are connected by one or more linkers.
  • the present disclosure provides a new method for identifying Cas protein cleavage sites.
  • This method can be used for off-target identification. As explained below, it takes advantage of the sequence insertion ability of the Prime Editor (PE), so it is referred to as PEAC-seq (Prime Editor Assisted off-target Characterization) in some embodiments (e.g., as illustrated in Fig. 1A, Fig. 17, and Fig. 18).
  • the Prime Editing system is a “search-and-replace” genome editing technology that mediates targeted insertions, deletions, and base-to-base conversions and combinations thereof in human cells without the need for double strand breaks (DSBs) or donor DNA templates.
  • Prime Editors use a reverse transcriptase (RT) fused to an RNA-programmable nickase (e.g., Cas9 nickase) and a prime editing guide RNA (also known as pegRNA) to copy genetic information directly from an extension on the pegRNA into the target genomic locus (Anzalone et al., 2019) .
  • the template sequence on the pegRNA extension will be reverse transcribed into DNA and hybridize to the unedited complementary strand with the help of another endonuclease (e.g., FEN1) .
  • the native PE system utilizes a pegRNA (Prime Editor gRNA) containing extra sequences at the 3’ end of the gRNA, which serve as a priming site and reverse transcriptase template, allowing reverse transcription from the exposed 3’-hydroxyl group of the non-targeting strand to incorporate additional DNA sequences into the cleavage sites.
  • In PEAC-seq, the sequences reverse transcribed from the template and inserted into the cleavage sites are used as labels in subsequent enrichment and identification of the cleavage sites.
  • An optimized reverse transcriptase (RT) template is used to incorporate PEAC-seq label sequences, which are further used to represent and enrich the local sequences of the editing sites from the genome, including both on-target and off-target sites.
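  • As a computational analogue of using the inserted label to pick out editing sites, the sketch below keeps only sequencing reads that contain a label and returns the genomic sequence flanking it. It is an illustration under stated assumptions: `LABEL` is a hypothetical 21-nt placeholder (not SEQ ID NO: 1), the function names are invented here, and only exact, full-length label matches on one strand are considered.

```python
from typing import List, Optional, Tuple

# Hypothetical label sequence; the label of the disclosure is not reproduced here.
LABEL = "GTCATGAGCTAGCTTCAGGCA"


def split_at_label(read: str, label: str = LABEL) -> Optional[Tuple[str, str]]:
    """Return (upstream, downstream) flanks if the read contains the label, else None."""
    idx = read.find(label)
    if idx == -1:
        return None
    return read[:idx], read[idx + len(label):]


def select_labeled_reads(reads: List[str], label: str = LABEL) -> List[Tuple[str, str]]:
    """Keep only reads carrying the label and return their flanking sequences."""
    hits = []
    for read in reads:
        flanks = split_at_label(read, label)
        if flanks is not None:
            hits.append(flanks)
    return hits


reads = ["AAAACCCC" + LABEL + "GGGGTTTT", "AAAACCCCGGGGTTTT"]
print(select_labeled_reads(reads))  # only the first read carries the label
```

  • Partial reverse transcription products, sequencing errors, and reverse-strand reads would need additional handling that this sketch omits.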
  • the PEAC-seq method in the present disclosure replaces the Cas9 nickase in the Prime Editing system with a Cas9, which creates DSBs in the genomic DNA. By creating DSBs, the newly reverse transcribed DNA sequences will be inserted into the cleavage site at a higher efficiency.
  • PEAC-seq accompanies the process of CRISPR editing and label insertion, which ensures the consistency between editing events and PEAC-seq signals.
  • When PEAC-seq was applied to a few promiscuous sites in both in cellulo and in vivo samples, it effectively identified off-targets, as assessed by comparison with the results of GUIDE-seq, DISCOVER-seq, WGS, and Amplicon-seq.
  • DNA translocations can be successfully identified. DNA translocations cannot be directly profiled by currently available methods and are typically more toxic to cells.
  • PEAC-seq is an unbiased method of identifying CRISPR off-targets and off-target-related DNA translocations. Because it bypasses the addition of high-molarity exogenous dsODNs, PEAC-seq also holds immense potential to identify off-targets and translocations for in vivo CRISPR editing, which would be particularly valuable for translational studies.
  • Off-target detection is crucial to biotechnological and clinical applications of the CRISPR technology. Over the past years, many designs have been applied to depict profiles of off-targets in vitro and in cellulo. These methods often involve the addition of exogenous dsODN or chemicals, which limits their application in vivo. Besides these experimental approaches, computational algorithms that consider diverse features of the gRNA have also contributed to generating candidate off-target lists. However, how well these alternative approaches reflect the cellular context remains a concern.
  • PEAC-seq uses a label sequence encoded within the CRISPR-Cas system and inserts it along with the cleavage into the cleavage sites.
  • PEAC-seq has successfully identified and validated off-targets both in HEK293T and in mouse embryos.
  • a Cas nuclease that is capable of creating double-strand breaks (DSB) such as Cas9
  • a pegRNA comprises an insertion sequence RT template for inserting a label sequence by reverse transcription for subsequent enrichment.
  • the Cas/pegRNA creates DSBs in the genome at both on-target and off-target sites, and the label sequence is introduced at the DSB sites through reverse transcription from the pegRNA and incorporated into the genome through the NHEJ pathway of DNA repair.
  • the insertion sequence RT template can be reverse transcribed either in full or in part.
  • the insertion sequence RT template can be designed with the following considerations: (1) avoiding RNA secondary structure within the inserted sequence (i.e., the label sequence) and between the inserted sequence and the gRNA scaffold; (2) sequence uniqueness relative to the host genome; (3) sufficient length for efficient annealing of PCR primers for enrichment.
  • the insertion sequence RT template encodes a 21-nt sequence of SEQ ID NO: 1.
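  • The three design considerations above can be turned into a rough screening check. The sketch below is a heuristic illustration only: the hairpin test merely looks for a short stretch whose reverse complement reappears further along the same sequence, the genome is a plain string, and the thresholds (`stem_len=6`, `min_loop=3`, `min_len=18`) and all function names are assumptions not taken from the disclosure. Structure between the label and the gRNA scaffold is not checked.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_hairpin_stem(seq: str, stem_len: int = 6, min_loop: int = 3) -> bool:
    """Crude check: does any stem_len-mer have its reverse complement downstream?"""
    seq = seq.upper()
    for i in range(len(seq) - stem_len + 1):
        partner = reverse_complement(seq[i:i + stem_len])
        if seq.find(partner, i + stem_len + min_loop) != -1:
            return True
    return False


def is_unique_to_genome(seq: str, genome: str) -> bool:
    """True if neither the sequence nor its reverse complement occurs in the genome."""
    genome = genome.upper()
    return seq.upper() not in genome and reverse_complement(seq) not in genome


def check_label(seq: str, genome: str, min_len: int = 18) -> dict:
    """Evaluate a candidate label against the three considerations."""
    return {
        "no_hairpin_stem": not has_hairpin_stem(seq),
        "unique_to_genome": is_unique_to_genome(seq, genome),
        "long_enough": len(seq) >= min_len,
    }


toy_genome = "ACGT" * 10  # stand-in for a host genome
print(check_label("A" * 21, toy_genome))
```

  • A real design workflow would use a dedicated RNA folding tool and a genome aligner; this check only shows how the three criteria combine.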
  • DNA translocation is also referred to as chromosome translocation, or chromosome rearrangement.
  • In a translocation, a segment from one chromosome is transferred to a nonhomologous chromosome or to a new site on the same chromosome.
  • Chromosomal translocations appear to arise from improper repair of DNA double-strand breaks (DSBs) , which are highly toxic lesions.
  • the “guardians” of genome integrity mostly ensure reliable repair of DSBs; also, unrepaired DSBs can lead to apoptosis or senescence.
  • imprecise repair of DSBs has the potential to be highly deleterious, as it can lead to genome instability, including the formation of chromosomal rearrangements.
  • chromosomal translocations can arise when DNA ends from DSBs on two heterologous chromosomes are improperly joined.
  • the DSB-induced DNA rearrangements, which have not been systematically evaluated by other CRISPR off-target identification techniques, would cause severe chromosomal aberrations, including large-fragment deletion, inversion, and translocation.
  • the resulting PCR amplicons can be used as indicators of chromosome rearrangements, as they can distinguish whether an amplicon came from the joining of expected DSB ends. It has also been noted that the occurrence of DNA translocation is independent of the frequency of DSBs at a particular site, which indicates that other factors, e.g., position or DSB context sequences, might contribute to translocation (Wei et al., Cell 2016).
  • both translocation profiling methods and genotoxicity assessment need to be developed for CRISPR translational applications.
  • the presently disclosed methods, e.g., PEAC-seq, can detect DNA translocation in genomic DNA and address at least some of the problems in the art.
  • the PEAC-seq method and DISCOVER-seq both rely on agent signals that accompany the cleavage events.
  • DISCOVER-seq is not as accurate and efficient as the PEAC-seq method because it uses MRE11 ChIP-seq signals to represent the DSB events occurring in the edited cells, while the ChIP-seq technique by nature captures only a snapshot of MRE11 binding and might not reveal the off-target sites that arise over the course of editing.
  • PEAC-seq relies on the enrichment of an inserted PCR handle. A random sequence screen demonstrated good efficiency of long insertions. Increasing the cell population may further increase the sensitivity of PEAC-seq, as demonstrated by the two verified PEAC-seq-unique off-targets in cellulo. PEAC-seq provides a versatile tool to enhance our understanding of the occurrence of off-targets in different contexts, and is a very informative alternative to the costly WGS.
  • the insertion efficiency of a PEAC-seq label sequence is important for the detection accuracy and efficiency.
  • the insertion efficiency may vary across different pegRNAs and at different off-targets.
  • Recent studies have reported a variety of modifications to the native PE system to increase the editing efficiency, including modifications on pegRNA, MMLV, and transient expression of a dominant negative DNA mismatch repair (MMR) protein, such as the MLH1dn protein (Nelson et al., Nat Biotechnol 2022; Zong et al., Nat Biotechnol 2022; Chen et al., Cell 2021) .
  • incorporating epegRNA is an effective method to improve the insertion efficiency of PEAC-seq labels, which, for example, rescued two missing off-targets from EMX1 PEAC-seq (see Example 5) .
  • Unprotected nuclear RNAs are susceptible to degradation from both the 5′ and 3′ termini by exonucleases.
  • the 3′extension of pegRNAs is likely to be exposed in cells and thus more susceptible to exonucleolytic degradation.
  • the pegRNA comprises a structural RNA motif at its 3’ end. Specifically, addition of structured RNA motifs at the 3’ end of the pegRNA can improve pegRNA stability and minimize degradation.
  • structured RNA motif and “RNA structural motif” are used interchangeably, and they refer to a piece of RNA with a defined secondary and/or tertiary structure.
  • the RNA structural motif is a modified prequeosine1-1 riboswitch aptamer (evopreQ1, SEQ ID NO: 75) . (Nelson et al., Nat Biotechnol 2022; Zong et al., Nat Biotechnol, 2022) .
  • the RNA structural motif is a frameshifting pseudoknot from Moloney murine leukemia virus (MMLV) (mpknot, SEQ ID NO: 503) (Chen et al., Cell 2021). In some embodiments, unnecessary sequence can be trimmed from the RNA structural motif to remove extraneous sequences while maintaining the pegRNA’s editing efficiency.
  • a viral exoribonuclease-resistant RNA (xrRNA) motif is appended to the 3’ end of the pegRNA. This modification can increase pegRNA’s resistance against degradation.
  • the Xrn1-resistant RNAs (xrRNAs) are a group of conserved structures found in flaviviruses, including Dengue, Yellow fever, West Nile, and Zika. Located at the beginning of the 3’ untranslated region (3’ -UTR) of the viral genome, such structure protects the downstream viral RNA from degradation by the 5’ -3’ exoribonuclease Xrn1, resulting in the production of a non-coding sub-genomic viral RNA that functions to enhance viral pathogenicity.
  • the xrRNAs adopt a characteristic knot-like structure that is thought to mechanically impede Xrn1 processing from the 5’ direction. Recent evidence demonstrated that even under bidirectional pulling forces, the xrRNA motif exhibited a remarkably high level of mechanical rigidity and resistance to unfolding. (Zhang, Guiquan, et al. "Enhancement of prime editing via xrRNA motif-joined pegRNA. " Nature communications 13.1 (2022) : 1-12. )
  • the insertion efficiency of PEAC-seq may depend on the length and sequence composition of the insertion sequence (Fig. 26) .
  • the RNA secondary structure of the insertion sequence and sequence uniqueness to the host genome can vary. But the present disclosure provides several considerations to be taken into account in designing insertion sequence RT template. As long as these considerations are taken into account, the insertion sequence (as well as the insertion sequence RT template) is exchangeable. (Figs. 26-27) .
  • the present disclosure provides three insertion sequences, which are SEQ ID NOs: 1 and 497-498, and the insertion sequence RT templates of them are SEQ ID NOs: 504-506, respectively.
  • the PBS (primer binding site) is a 13-nt sequence as in the native PE system (see Anzalone et al., Nature 2019).
  • the PBS is a 17-nt sequence. The present disclosure provides that both the 13-nt and 17-nt PBS worked well in the methods disclosed herein.
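  • A PBS of either length can be derived from the on-target protospacer, as sketched below. The sketch assumes the conventional prime editing geometry for SpCas9, which is not spelled out in this passage: a 20-nt protospacer written 5' to 3' on the PAM-containing strand, cleavage 3 nt upstream of the PAM, and a PBS that is the RNA reverse complement of the bases immediately 5' of the cut. The function names and the example protospacer are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_pbs(protospacer: str, pbs_len: int = 13, cut_offset: int = 3) -> str:
    """Return an RNA PBS complementary to the 3' end of the cut PAM-containing strand.

    cut_offset is the assumed distance (in nt) from the cut to the 3' end of the
    protospacer; 3 is the value commonly used for SpCas9.
    """
    protospacer = protospacer.upper()
    cut = len(protospacer) - cut_offset
    if not 0 < pbs_len <= cut:
        raise ValueError("pbs_len must fit between the protospacer start and the cut")
    primer = protospacer[cut - pbs_len:cut]  # DNA that will prime reverse transcription
    return reverse_complement(primer).replace("T", "U")


# Hypothetical 20-nt protospacer, for illustration only.
protospacer = "ACGTACGTACGTACGTACGT"
print(design_pbs(protospacer, pbs_len=13))
print(design_pbs(protospacer, pbs_len=17))
```

  • With a 20-nt protospacer and this geometry, 17 nt is the longest PBS that stays within the protospacer, since it spans every base 5' of the cut.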
  • the PBS sequences, which are designed to be complementary to the on-target sites, can have mismatches at off-target sites. Many off-targets with PBS mismatches were successfully identified by PEAC-seq, indicating that the effects of PBS mismatches on reverse transcription are complex.
  • the present disclosure further provides that in some embodiments, including random nucleotides in the PBS region of pegRNA can improve the extension efficiency at off-targets with PBS mismatches.
  • the PBS in the pegRNA comprises random nucleotides, for example, proximal to the primer extension site.
  • pegRNAs comprising such random nucleotides in the PBS are referred to as mut-pegRNAs.
  • pegRNA designed from the on-target sequence can enable PEAC-seq tag insertion in most off-target sites, and the incorporation of mut-pegRNA may improve the insertion efficiency of PEAC-seq tags in some off-target sites with critical PBS mismatches.
  • a mix of pegRNA and mut-pegRNAs may also increase the insertion efficiency of the PEAC-seq tag.
  • the mix has 50% pegRNA and 50% mut-pegRNAs.
  • the mix comprises more than one mut-pegRNA, such as two, three, four, or five different mut-pegRNAs.
  • the mix comprises five different mut-pegRNAs, e.g., as shown in Fig. 19, with 10% of each.
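  • The 50% pegRNA plus five mut-pegRNAs at 10% each can be written out as a small sketch. The positions of the random nucleotide are an assumption made here for illustration (one “N” at each of the five PBS positions nearest the insertion sequence RT template, i.e., the 5' end of the PBS); the actual positions are those shown in Fig. 19. The PBS string and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def make_mut_pbs(pbs: str, positions: Sequence[int] = (0, 1, 2, 3, 4)) -> List[str]:
    """Return PBS variants, each with one random nucleotide (N) at a given position."""
    return [pbs[:i] + "N" + pbs[i + 1:] for i in positions]


def build_mix(pbs: str, pegrna_fraction: float = 0.5,
              positions: Sequence[int] = (0, 1, 2, 3, 4)) -> Dict[str, float]:
    """Map each PBS (unmodified and mutated) to its fraction of the guide mix."""
    variants = make_mut_pbs(pbs, positions)
    share = (1.0 - pegrna_fraction) / len(variants)  # split the remainder evenly
    mix = {pbs: pegrna_fraction}
    for variant in variants:
        mix[variant] = share
    return mix


# Hypothetical 13-nt PBS, for illustration only.
for pbs_seq, fraction in build_mix("UACGUACGUACGU").items():
    print(pbs_seq, fraction)
```

  • Only the PBS is varied here; the spacer, scaffold, and insertion sequence RT template of each guide would stay the same.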
  • the kit disclosed herein comprises more than one guide RNA, polynucleotides encoding the guide RNAs, or vectors encoding the guide RNAs, wherein the guide RNAs are a mix of pegRNA and mut-pegRNA.
  • the composition disclosed herein comprises more than one guide RNA, polynucleotides encoding the guide RNAs, or vectors encoding the guide RNAs, wherein the guide RNAs are a mix of pegRNA and mut-pegRNA.
  • Evolving a reverse transcriptase to have error-correcting activity may also improve primer extension efficiency. If a proper enzyme can be evolved and characterized, its 3’ to 5’ exonuclease activity could correct mismatches between the PBS and off-targets.
  • the insertion sequence RT template is provided separately from the gRNA.
  • the insertion sequence RT template (RTT) is provided together with the primer binding sequence (PBS) as a separate RTT-PBS sequence.
  • the RTT-PBS sequence comprises a MS2 hairpin.
  • the RTT-PBS sequence is circular.
  • the RTT-PBS sequence is linear.
  • the present disclosure further provides that in some embodiments, the Cas nuclease and the reverse transcriptase are not fused.
  • the Cas nuclease and reverse transcriptase are provided separately, for example, by two separate vectors.
  • the separate Cas nuclease and/or reverse transcriptase are each fused with one or more tags which facilitate recruitment of the reverse transcriptase by the Cas nuclease or the pegRNA.
  • the reverse transcriptase is fused with an MS2 coat protein, and the pegRNA is incorporated with multiple MS2 stem-loops.
  • the reverse transcriptase is fused with a single-chain variable fragment (scFv)
  • the Cas nuclease is fused with multiple copies of GCN4 peptide (this particular multi-peptide tag is called SunTag) .
  • the Cas nuclease is provided in two or more parts.
  • the Cas nuclease is split into two parts and delivered by two separate vectors.
  • each of the two parts of the Cas nuclease is connected with a trans splicing intein.
  • the reverse transcriptase is a modified Moloney murine leukemia virus reverse transcriptase (M-MLV RT).
  • M-MLV RT is composed of fingers, palm, thumb, and connection domains, each having a unique role in nucleotide incorporation during DNA synthesis.
  • M-MLV RT also has an RNase H domain that functions as a processive endonuclease, cleaving the RNA strand in RNA–DNA heteroduplexes.
  • the reverse transcriptase is a M-MLV RT with decreased or disrupted RNase H activity.
  • the RNase H activity of the M-MLV RT is decreased or disrupted by one or more point mutations within the RNase H domain of the M-MLV RT, for example an Asp524Asn substitution.
  • the whole RNase H domain is deleted from the M-MLV RT.
  • the whole RNase H domain and the connection domain that is linked to the RNase H domain are deleted from the M-MLV RT.
  • Some viral proteins can facilitate reverse transcription, such as the nucleocapsid (NC) protein that has nucleic acid chaperone activity affecting a variety of RT-related functions.
  • the reverse transcriptase is a M-MLV RT which is fused with an NC protein.
  • the NC protein is fused at the C terminus of the M-MLV RT.
  • the NC protein is fused between the Cas nuclease and the M-MLV RT.
  • the present disclosure provides that other optimization and modification to the Prime Editor system can also be similarly applied to the PEAC-seq methods disclosed herein.
  • the PEAC-seq methods disclosed herein adopt the Prime Editor system, or a modified version of the Prime Editor system, to report CRISPR off-targets in cellulo and in vivo, and Cas-dependent DNA rearrangement.
  • PEAC-seq further diversifies the CRISPR off-target identification toolbox and provides a reliable solution to directly identify off-targets for in vivo editing and recognize DNA rearrangements, which would strengthen our ability to assess genotoxicity in clinics.
  • the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase (RT) , and a guide RNA which comprises a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the spacer, the scaffold, the insertion sequence RT template, and the PBS are arranged from 5’ to 3’ in the guide RNA.
  • the present disclosure provides a complex comprising a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS) , wherein the PBS is located downstream to the 3’ end of the insertion sequence RT template, and wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the RTT-PBS sequence is a linear sequence or a circularized sequence.
  • the RTT-PBS sequence further comprises an MS2 hairpin
  • the reverse transcriptase is fused with an MS2 coat protein.
  • MS2 is a 19-nucleotide viral (bacteriophage) RNA sequence present at the ribosomal binding site of the MS2 replicase mRNA, which folds into a hairpin loop structure. This hairpin loop is recognized with high specificity and affinity by the MS2 bacteriophage capsid RNA-binding protein MS2 (Kd of 3-300 × 10⁻⁹, depending on the stem-loop sequence and the MS2 RBP variant).
  • The MS2 RBP, as a chimeric protein containing a peptide tag, facilitates the isolation of the [MS2 RNA/MS2 protein] complex, together with other molecules present in the complex.
  • MS2 has been used widely to tag RNA transcribed in vitro and in vivo for other applications.
  • the double-strand break created by the Cas nuclease has blunt ends. In some embodiments, the double-strand break created by the Cas nuclease has sticky ends.
  • DNA ends refer to the properties of the ends of linear DNA molecules, which are described as “sticky” or “blunt” based on the shape of the complementary strands at the terminus. In sticky ends, one strand is longer than the other (typically by at least a few nucleotides) , such that the longer strand has bases which are left unpaired. In blunt ends, both strands are of equal length, i.e., they end at the same base position, leaving no unpaired bases on either strand.
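  • The blunt versus sticky distinction reduces to whether the two strands end at the same base position. A minimal sketch, assuming each terminus is described by the coordinate at which each strand ends (the function name and coordinate convention are hypothetical):

```python
def classify_end(top_strand_end: int, bottom_strand_end: int) -> str:
    """Classify a double-stranded DNA terminus from where each strand ends."""
    overhang = abs(top_strand_end - bottom_strand_end)
    if overhang == 0:
        return "blunt"  # both strands end at the same base position
    return f"sticky ({overhang}-nt overhang)"  # one strand extends past the other


print(classify_end(17, 17))
print(classify_end(18, 23))
```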
  • the Cas nuclease is selected from Cas9, its variants, and mutants of any one of the variants.
  • CRISPR stands for clustered, regularly interspaced, short palindromic repeats, and Cas stands for CRISPR-associated systems.
  • the present disclosure involves a Cas nuclease or a variant or a mutant of any of the variants thereof.
  • All variants and mutants of Cas9 can be used in a method, composition, or kit disclosed herein, including but not limited to a wild-type Cas9 or a Cas9 nickase (Cas9n) .
  • the Cas9 nuclease used herein can either be wild type or be genetically modified.
  • the Cas9 nucleases to be used herein can be selected from SpCas9 (Cas9 isolated from Streptococcus pyogenes), SaCas9 (Cas9 isolated from Staphylococcus aureus), StCas9 (Cas9 isolated from Streptococcus thermophilus), NmCas9 (Cas9 isolated from Neisseria meningitidis), FnCas9 (Cas9 isolated from Francisella novicida), CjCas9 (Cas9 isolated from Campylobacter jejuni), ScCas9 (Cas9 isolated from Streptococcus canis), and any variants and mutant forms of the Cas9 listed above, such as high-fidelity Cas9 (Kleinstiver et al., Nature. 2016 Jan 28) and enhanced SpCas9 (Slaymaker et al., Science. 2016 Jan 01).
  • the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof, and mutants of any of the variants.
  • the present disclosure involves a reverse transcriptase or a variant or a mutant of any of the variants thereof, which can be provided as a fusion protein with a Cas nuclease, or provided in trans.
  • Reverse transcriptase, also known as RNA-dependent DNA polymerase, is a DNA polymerase enzyme that transcribes single-stranded RNA into DNA.
  • Reverse transcriptases are found in many eukaryotic and prokaryotic systems, such as telomerase, retrotransposons, and retrons, and are abundant in the genomes of plants and animals. Any of the wild type, variant, and mutant forms of reverse transcriptase which are known in the art or which can be made using methods known in the art are contemplated herein.
  • the reverse transcriptases that can be used herein include, but are not limited to, Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, and their variants or mutants of any of the variants.
  • M-MLV RT has decreased or disrupted RNase H activity.
  • the M-MLV RT is fused with a viral nucleocapsid (NC) protein.
  • the reverse transcriptase is fused directly to the Cas nuclease. In some embodiments, the reverse transcriptase is connected to the Cas nuclease with a linker. It would be understood that a person skilled in the art is able to select conditions (e.g., optimal temperature, pH, reaction time, and/or concentration) suitable for a reverse transcriptase to form the insertion double strand DNA and the like.
  • the Cas nuclease and the reverse transcriptase are not fused or linked.
  • the Cas nuclease and the reverse transcriptase are formed as a fusion protein, or operably connected by a linker.
  • a fusion protein can be made from a fusion gene, e.g., created by joining parts of two different genes.
  • the fusion protein is encoded by a sequence of SEQ ID NO: 208.
  • the Cas nuclease is provided in two parts.
  • an intein-mediated split-Cas9 is used in the complex disclosed herein and methods disclosed herein.
  • a bi-lobed shaped structure of Cas9 has recently been discovered. The two lobes consist of a recognition lobe (REC) and a nuclease lobe (NUC) . In between, there is a positively charged groove where the negatively charged nucleic acids of the holo-form reside.
  • Structural studies render the rational engineering of Cas9 possible, either to equip it with new functionalities or to change its characteristics.
  • PE2 can be divided into two parts in the middle of the SpCas9 nickase and then reconstituted into intact functional PE2 if trans splicing inteins are placed at the location of the split.
  • the components of this split-intein PE2 can be delivered into cells in vivo using dual AAV vectors to mediate PE events.
  • a guide RNA is a short synthetic RNA composed of a scaffold sequence necessary for Cas-binding and a user-defined nucleotide spacer that defines the gene target to be modified.
  • the strand of genomic DNA that is bound by the spacer is typically referred to as the complementary strand.
  • the other strand of DNA is typically referred to as the non-complementary strand.
  • the guide RNA used herein is made up of two RNA molecules which are a crRNA and a tracrRNA, wherein the crRNA is customized to bind a target gene, and the tracrRNA serves as a binding scaffold for a Cas enzyme.
  • the guide RNA used herein is a single guide RNA (sgRNA), wherein the single RNA molecule comprises a custom-designed crRNA sequence fused to a scaffold tracrRNA sequence.
  • a single guide RNA is used to increase the editing efficiency.
  • the guide RNA further comprises an extension arm to its 3’ end.
  • the extension arm provides a DNA synthesis template sequence that encodes a single strand DNA flap that is to be inserted into a Cas cleavage site.
  • at the 3’ end of the extension arm is a primer binding site (PBS) that binds to the non-complementary strand of the target gene and serves as a primer for the reverse transcriptase.
  • the insertion sequence RT template and PBS are split off the guide RNA, and are provided separately as an RTT-PBS sequence.
  • the DNA synthesis template sequence for the reverse transcriptase is referred to as an insertion sequence RT template in the present disclosure.
  • the guide RNA comprises a spacer, a scaffold, and an insertion sequence reverse transcriptase (RT) template.
  • the spacer, the scaffold, the insertion sequence RT template, and the PBS are arranged from 5’ to 3’ in the guide RNA.
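  • The 5’ to 3’ arrangement (spacer, scaffold, insertion sequence RT template, PBS), with an optional structured RNA motif at the 3’ end, can be represented as a simple record. This is a sketch only: the class name, field names, and the short component strings are hypothetical placeholders, not sequences from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PegRNA:
    """Components of a pegRNA, stored in 5' to 3' order."""
    spacer: str
    scaffold: str
    rt_template: str
    pbs: str
    motif_3prime: str = ""  # optional structured RNA motif appended at the 3' end

    def sequence(self) -> str:
        """Concatenate the components in their 5' to 3' arrangement."""
        return (self.spacer + self.scaffold + self.rt_template
                + self.pbs + self.motif_3prime)


# Placeholder components, for illustration only.
peg = PegRNA(spacer="AAAA", scaffold="CCCC", rt_template="GGGG", pbs="UUUU")
print(peg.sequence())
```

  • Keeping the parts separate makes it easy to swap one component, such as the PBS of a mut-pegRNA or a 3’ motif, without touching the others.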
  • the insertion sequence RT template is about 10 to 30 nucleotides. In some embodiments, the insertion sequence template comprises a nucleotide sequence of any length, e.g., from about 10bp to 30bp. The insertion sequence can be of any length, including but not limited to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides in length.
  • the insertion sequence RT template comprises a nucleotide sequence of SEQ ID NO: 504.
  • the present disclosure provides two alternative insertion sequences, which are SEQ ID NOs: 505 and 506. (Fig. 27) .
  • the insertion sequence RT template encodes one or more tags suitable for hybrid capture.
  • Hybrid capture is a method used in target DNA enrichment, where a “bait” molecule is used to select target regions from DNA libraries.
  • the hybrid capture methods that can be used herein include, but are not limited to, biotinylated oligonucleotide baits.
  • the guide RNA comprises an RNA structural motif at the 3’ end.
  • the RNA structural motif is a modified prequeosine1-1 riboswitch aptamer (evopreQ1) .
  • the RNA structural motif is a frameshifting pseudoknot from Moloney Murine Leukemia Virus (MMLV) .
  • the PBS comprises random nucleotides.
  • the PBS is a short sequence complementary to the strand of the target gene other than the one targeted by the spacer. PBS binds to the target site and serves as the point of initiation for reverse transcription.
  • A random nucleotide refers to a nucleotide in a PBS sequence that is not complementary to the target gene.
  • the pegRNA comprises 1 random nucleotide. In some embodiments, the pegRNA comprises 2, 3, or 4 random nucleotides. In some embodiments, the random nucleotide is proximal to the reverse transcription initiation site. In some embodiments, the random nucleotide is the nucleotide next to the insertion sequence RT template.
  • the guide RNA comprises a viral exoribonuclease-resistant RNA (xrRNA) motif at its 3’ end.
  • the xrRNA motif is derived from a flavivirus.
  • the flavivirus is Dengue, Yellow fever, West Nile, or Zika.
  • the present disclosure provides a polynucleotide encoding the Cas nuclease, the reverse transcriptase, the spacer, the scaffold, the insertion sequence RT template, and the PBS in any one of the complexes disclosed herein.
  • the present disclosure provides a polynucleotide encoding a guide RNA comprising an insertion sequence RT template, wherein the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • the present disclosure provides a polynucleotide encoding an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS) , wherein the insertion sequence RT template comprises a nucleotide sequence of any one of SEQ ID NOs: 504-506.
  • polynucleotides disclosed herein can be obtained by methods known in the art.
  • the polynucleotide can be obtained from cloned DNA (e.g., from a DNA library) , by chemical synthesis, by cDNA cloning, or by the cloning of genomic DNA or fragments thereof, purified from the desired cell.
  • any method known to those skilled in the art for identification of nucleic acids that encode desired genes can be used. Any method available in the art can be used to obtain a full length (i.e., encompassing the entire coding region) cDNA or genomic DNA encoding a desired protein, such as from a cell or tissue source.
  • Modified or variant polynucleotides can be engineered from a wildtype polynucleotide using standard recombinant DNA methods.
  • Polynucleotides can be cloned or isolated using any available methods known in the art for cloning and isolating nucleic acid molecules. Such methods include PCR amplification of nucleic acids and screening of libraries, including nucleic acid hybridization screening, antibody-based screening, and activity-based screening.
  • Methods for amplification of polynucleotides can be used to isolate polynucleotides encoding a desired protein, including for example, polymerase chain reaction (PCR) methods.
  • PCR can be carried out using any known methods or procedures in the art. Exemplary methods include use of a Perkin-Elmer Cetus thermal cycler and Taq polymerase (Gene Amp) .
  • a nucleic acid containing gene of interest can be used as a source material from which a desired polypeptide-encoding nucleic acid molecule can be amplified.
  • DNA and mRNA preparations, cell extracts, tissue extracts from an appropriate source (e.g., testis, prostate, breast), fluid samples (e.g., blood, serum, saliva), and samples from healthy and/or diseased subjects can be used in amplification methods.
  • the source can be from any eukaryotic species including, but not limited to, vertebrate, mammalian, human, porcine, bovine, feline, avian, equine, canine, and other primate sources.
  • Nucleic acid libraries also can be used as a source material. Primers can be designed to amplify a desired polynucleotide.
  • primers can be designed based on expressed sequences from which a desired polynucleotide is generated. Primers can be designed based on back-translation of a polypeptide amino acid sequence. If desired, degenerate primers can be used for amplification. Oligonucleotide primers that hybridize to sequences at the 3’ and 5’ termini of the desired sequence can be used as primers to amplify by PCR from a nucleic acid sample. Primers can be used to amplify the entire full-length polynucleotide, or a truncated sequence thereof. Nucleic acid molecules generated by amplification can be sequenced and confirmed to encode a desired polypeptide.
  • the present disclosure provides a vector comprising the polynucleotide disclosed herein.
  • the present disclosure provides a vector comprising a first polynucleotide encoding a Cas nuclease and a reverse transcriptase, and a second polynucleotide encoding a guide RNA comprising a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the second polynucleotide is a polynucleotide encoding a guide RNA comprising an insertion sequence RT template, wherein the insertion sequence RT template comprises a nucleotide sequence of SEQ ID NO: 504, or any one of SEQ ID NOs: 505-506.
  • the Cas nuclease is selected from Cas9, its variants and mutants of any of the variants.
  • the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
  • any methods known in the art for the insertion of DNA fragments into a vector can be used to construct expression vectors comprising a polynucleotide disclosed herein. These methods can include in vitro recombinant DNA and synthetic techniques and in vivo (genetic) recombination.
  • the polynucleotide disclosed herein can be operably linked to control sequences in the expression vector (s) to ensure protein expression.
  • control sequences may include, but are not limited to, leader or signal sequences, promoters (e.g., naturally associated or heterologous promoters) , ribosomal binding sites, enhancer or activator elements, translational start and termination sequences, and transcription start and termination sequences, and are chosen to be compatible with the host cell chosen to express the proteins.
  • the promoters may be either naturally occurring promoters, hybrid promoters that combine elements of more than one promoter, or synthetic promoters.
  • An expression construct may be present in a cell on an episome, such as a plasmid, or the expression construct may be inserted in a chromosome such as in a gene locus.
  • the expression vector includes a selectable marker gene to allow the selection of transformed host cells.
  • the vector is an expression vector comprising a nucleotide sequence encoding a variant polypeptide operably linked to at least one regulatory control sequence. Regulatory control sequences for use herein include promoters, enhancers, and other expression control elements.
  • the expression vector is designed for the choice of the host cell to be transformed, the particular variant polypeptide desired to be expressed, the vector's copy number, the ability to control that copy number, and/or the expression of any other protein encoded by the vector, such as antibiotic markers.
  • the vector can include, but is not limited to, viral vectors and plasmid DNA.
  • Viral vectors can include, but are not limited to, adenoviral vectors, lentiviral vectors, retroviral vectors, and adeno-associated viral vectors.
  • expression vectors contain selection markers such as ampicillin-resistance, hygromycin-resistance, tetracycline resistance, kanamycin resistance, or neomycin resistance to permit detection of those cells transformed with the desired DNA sequences.
  • Suitable vectors, promoter, and enhancer elements are known in the art; many are commercially available for generating subject recombinant constructs.
  • the vector is a polycistronic vector.
  • the vector is a bicistronic vector or a tricistronic vector.
  • Bicistronic or polycistronic expression vectors may include (1) multiple promoters fused to each of the open reading frames; (2) insertion of splicing signals between genes; (3) fusion of genes whose expressions are driven by a single promoter; and (4) insertion of proteolytic cleavage sites between genes (self-cleavage peptide) or insertion of internal ribosomal entry sites (IRESs) between genes.
  • A polycistronic vector is used to co-express multiple genes in the same cell.
  • Two strategies are most commonly used to construct a multicistronic vector.
  • an Internal Ribosome Entry Site (IRES) element is typically used for bi-cistronic vectors.
  • the IRES element acting as another ribosome recruitment site, allows initiation of translation from an internal region of the mRNA. Thus, two proteins are translated from one mRNA.
  • IRES elements are quite large (usually 500-600 bp) (Pelletier et al., 1988; Jang et al., 1988) .
  • the engineered CD47 proteins disclosed herein have a smaller size compared to the wild-type full-length human CD47, and thus can be used with an IRES element in multicistronic vectors having limited packaging capacity.
  • the present disclosure provides a kit comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA which comprises a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the present disclosure provides a kit comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an insertion sequence RT template, and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the present disclosure provides a kit comprising one or more vectors encoding a Cas nuclease, a reverse transcriptase, and a guide RNA which comprises a spacer, a scaffold, and an RTT-PBS sequence which comprises an insertion sequence RT template and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the present disclosure provides a kit comprising one or more vectors encoding a Cas nuclease, a reverse transcriptase, a guide RNA which comprises a spacer and a scaffold, and an insertion sequence RT template, and a primer binding site (PBS) , wherein the Cas nuclease is capable of creating DNA double-strand break (DSB) .
  • the Cas nuclease is selected from Cas9, its variants and mutants of any of the variants.
  • the reverse transcriptase is selected from Moloney Murine Leukemia Virus (M-MLV) reverse transcriptase, Avian Myeloblastosis Virus (AMV) reverse transcriptase, variants thereof and mutants of any of the variants.
  • the insertion sequence RT template comprises a nucleotide sequence of SEQ ID NO: 504, or any one of SEQ ID NOs: 505-506.
  • the present disclosure provides a composition comprising one or more polynucleotide sequences encoding a Cas nuclease, a reverse transcriptase, and a guide RNA comprising a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS), wherein the Cas nuclease is capable of creating a DNA double-strand break (DSB).
  • the present disclosure provides a composition comprising one or more vectors encoding a Cas nuclease, a reverse transcriptase, and a guide RNA comprising a spacer, a scaffold, an insertion sequence RT template, and a primer binding site (PBS), wherein the Cas nuclease is capable of creating a DNA double-strand break (DSB).
  • the present disclosure provides a method for labeling Cas nuclease cleavage sites in genomic DNA, comprising contacting the genomic DNA with the complex disclosed herein, wherein the genomic DNA is cleaved at one or more cleavage sites, and one or more sequences that are reverse transcribed from the insertion sequence RT template in part or in whole are inserted into the one or more cleavage sites, and wherein the one or more sequences inserted into the one or more cleavage sites are labels.
  • the cleavage site is on-target or off-target.
  • When a Cas nuclease binds to a genetic locus that has a sequence exactly the same as the target gene, the cleavage site created there is an on-target cleavage site. Otherwise, the cleavage site is an off-target site.
  • the present disclosure provides a method for detecting Cas9 cleavage sites and/or detecting DNA translocation in genomic DNA, comprising (a) labeling the genomic DNA with the method disclosed herein, (b) targeting and amplifying the labeled sites on the genomic DNA to obtain amplicons, (c) sequencing the amplicons to obtain sequencing results, and (d) analyzing the sequencing result to identify Cas cleavage sites and/or DNA translocations at the Cas cleavage sites.
  • the one or more amplicons each comprise a portion of genomic DNA that is immediately upstream or downstream to the one or more labels.
  • the method is used to identify Cas nuclease off-target sites by comparing the Cas cleavage sites identified by the method disclosed herein with a target sequence, and the cleavage site that is not identical to the target sequence is an off-target site. It would be understood that, based on the method disclosed herein, those of ordinary skill in the art are able to locate the cleavage sites on the genome with readily available tools such as Burrows-Wheeler Aligner (BWA) .
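  • As a minimal sketch of the comparison described above, the following Python snippet labels a cleavage site as on-target when its sequence is identical to the target sequence and as off-target otherwise. The sequences and site names are hypothetical placeholders, and the sketch assumes that the site sequence has already been extracted from the aligned genome at the same length and strand as the target; it does not handle indels or PAM checking.

```python
def count_mismatches(site_seq: str, target_seq: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(site_seq) != len(target_seq):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(site_seq.upper(), target_seq.upper()))


def classify_site(site_seq: str, target_seq: str) -> str:
    """Return 'on-target' for an exact match, otherwise 'off-target'."""
    if count_mismatches(site_seq, target_seq) == 0:
        return "on-target"
    return "off-target"


# Hypothetical 20-nt target and two hypothetical cleavage-site sequences.
target = "GACCTTGACCGATAGCTAGC"
sites = {
    "site_A": "GACCTTGACCGATAGCTAGC",  # identical to the target
    "site_B": "GACCTTGACCGTTAGCTAGA",  # differs at two positions
}
labels = {name: classify_site(seq, target) for name, seq in sites.items()}
print(labels)
```

  • The mismatch count is kept as a separate helper because the number of mismatches is also useful for ranking off-target sites.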
  • the genomic DNA is processed by Tn5 tagmentation before amplification.
  • Tn5 tagmentation uses a hyperactive variant of the Tn5 transposase that mediates the fragmentation of double-stranded DNA and ligates synthetic oligonucleotides at both ends (Adey et al. 2010) .
  • Wild-type Tn5 transposon is a composite transposon in which two near-identical insertion sequences (IS50L and IS50R) are flanking three antibiotic resistance genes (Reznikoff 2008) .
  • Each IS50 contains two inverted 19-bp end sequences (ESs) , an outside end (OE) and an inside end (IE) .
  • Tn5 tagmentation platform or kits and their variants or mutants of any of the variants can be used in the present disclosure, such as Nextera DNA kits and on-bead tagmentation.
  • the genomic DNA is processed by Tn5 tagmentation before amplification, wherein sequencing adapters that include unique molecular identifiers (UMI) are embedded in the Tn5 transposases.
  • UMI is a type of molecular barcoding that provides error correction and increased accuracy in sequencing data analysis.
  • the molecular barcodes are short sequences used to uniquely tag each molecule in a sample library.
  • the UMI-included adapters are embedded into Tn5 so that dsDNA fragments after tagmentation are tagged with these UMI-included adapters, which can be used to eliminate PCR duplicates from the sequencing data.
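  • The duplicate removal described above can be sketched as follows. This is an illustrative Python example, not the pipeline used in the disclosure: reads that share the same mapping location and the same UMI are treated as PCR duplicates and collapsed to one representative. The read records, field names and UMI sequences are hypothetical.

```python
from collections import defaultdict


def collapse_pcr_duplicates(reads):
    """Group reads by (chrom, pos, strand, umi) and keep the first read of each group."""
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["umi"])
        groups[key].append(read["name"])
    # Dictionaries preserve insertion order, so names[0] is the first-seen read.
    return {key: names[0] for key, names in groups.items()}


# Hypothetical aligned reads: r1 and r2 share location and UMI (PCR duplicates).
reads = [
    {"name": "r1", "chrom": "chr1", "pos": 1000, "strand": "+", "umi": "ACGTAC"},
    {"name": "r2", "chrom": "chr1", "pos": 1000, "strand": "+", "umi": "ACGTAC"},
    {"name": "r3", "chrom": "chr1", "pos": 1000, "strand": "+", "umi": "TTGCAA"},
    {"name": "r4", "chrom": "chr2", "pos": 500, "strand": "-", "umi": "ACGTAC"},
]
unique = collapse_pcr_duplicates(reads)
print(len(unique), sorted(unique.values()))
```

  • Reads r1 and r3 map to the same position but carry different UMIs, so they are kept as independent molecules.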
  • the genomic DNA comprising the insertion sequence or a portion of the insertion sequence is targeted and enriched by a method selected from PCR and a hybrid capture-based target enrichment method.
  • Hybrid capture-based target enrichment methods that can be used herein include, but are not limited to, biotinylated oligonucleotide baits.
  • In polymerase chain reaction (PCR), a set of flanking primers anneals at the outer regions of the DNA sequence of interest, and therefore unwanted DNA is not amplified.
  • Another available group of methods for target enrichment is hybrid capture-based methods.
  • One commonly used hybridization capture tag uses a biotinylated oligonucleotide bait. Any methods that can effectively enrich a targeted portion of the genomic DNA can be used herein.
  • the enrichment is performed by two rounds of PCR, wherein in one reaction the insertion sequence is used as the forward primer binding site and in the other reaction the insertion sequence is used as the reverse primer binding site.
  • the 3’ end of the primers that bind to the insertion sequence are at least 2-bp away from the insertion boundary so that the extension sequence information can be used to filter out random priming reads (see Fig 1B) . If the primer correctly binds to the insertion sequence, there would be at least 2 bp at the beginning of the extension sequence that are complementary to the insertion sequence.
  • the insertion boundary described herein is the first and last base pair of the insertion sequence.
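  • The extension-based filter described above can be illustrated with a short Python sketch. It assumes, for simplicity, that the read, the primer and the insertion sequence are all written on the same strand, that the read begins with the primer, and that the primer ends 2 bp before the insertion boundary. The 21-nt insertion sequence below is a hypothetical placeholder, not one of the sequences of the disclosure.

```python
def has_correct_extension(read_seq, primer, insertion_seq, min_extension=2):
    """Return True if the bases following the primer continue the insertion sequence."""
    primer_start = insertion_seq.find(primer)
    if primer_start < 0:
        raise ValueError("primer must be a substring of the insertion sequence")
    primer_end = primer_start + len(primer)
    expected = insertion_seq[primer_end:primer_end + min_extension]
    if len(expected) < min_extension:
        raise ValueError("primer 3' end is too close to the insertion boundary")
    if not read_seq.startswith(primer):
        return False
    observed = read_seq[len(primer):len(primer) + min_extension]
    return observed == expected


# Hypothetical 21-nt insertion sequence; the primer stops 2 bp before its end.
insertion = "GTTAGGATTGAGTTAGGTAGA"
primer = insertion[:19]

true_read = primer + "GA" + "CCTGAACTT"    # extension matches the insertion tail
random_read = primer + "TC" + "CTGAACTTG"  # extension does not match: random priming

print(has_correct_extension(true_read, primer, insertion))
print(has_correct_extension(random_read, primer, insertion))
```

  • Only the first read passes the filter, because the two bases after the primer match the last two bases of the insertion sequence.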
  • the method is used to identify DNA translocation in genomic DNA, wherein detection of signals located in the genomic region upstream of the forward primer binding site indicates DNA translocation.
  • the methods disclosed herein can be used in vitro, in cellulo, or in vivo.
  • the present disclosure provides a method for determining the relative specificity of a plurality of guide RNAs comprising (a) identifying the off-target sites for Cas cleavage using each of the guide RNAs with a method disclosed herein, and (b) determining the relative specificity of the guide RNAs based on the total number of off-target sites identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites is more specific than a guide RNA having more off-target sites.
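  • The ranking criterion described above (fewer off-target sites means higher specificity) can be sketched in Python as follows. The guide names and site coordinates are hypothetical, and ties are broken alphabetically purely for reproducibility.

```python
def rank_by_specificity(off_target_sites):
    """Sort guides from most to least specific (fewest distinct off-target sites first)."""
    counts = {guide: len(set(sites)) for guide, sites in off_target_sites.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical off-target calls per guide RNA.
off_targets = {
    "gRNA_1": ["chr1:100", "chr2:200", "chr5:50"],
    "gRNA_2": ["chr3:10"],
    "gRNA_3": [],
}
ranking = rank_by_specificity(off_targets)
print(ranking)
```

  • The same function could be applied to Cas nuclease variants by using variant names as keys instead of guide names.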
  • the present disclosure provides a method for determining the relative specificity of a plurality of Cas nuclease variants and mutants comprising (a) identifying the off-target cleavage site for each of the Cas nuclease variants and mutants with a method disclosed herein, and (b) determining the relative specificity of the Cas nuclease variants and mutants based on the total number of off-target sites identified for each of the Cas nuclease variants and mutants, wherein a Cas nuclease variant or mutant having fewer off-target sites is more specific than a Cas nuclease variant or mutant having more off-target sites.
  • the present disclosure provides a method for determining the relative genotoxicity of a plurality of guide RNAs comprising (a) identifying the off-target cleavage sites and DNA translocations for each of the guide RNAs with a method disclosed herein, and (b) determining the relative genotoxicity of the guide RNAs based on the total number of off-target sites and DNA translocations identified for each of the guide RNAs, wherein a guide RNA having fewer off-target sites and fewer DNA translocations is less genotoxic than a guide RNA having more off-target sites and DNA translocations.
  • sequencing includes any method of determining the sequence of a nucleic acid. Any method of sequencing can be used in the present disclosure, including chain terminator (Sanger) sequencing and dye terminator sequencing. In preferred embodiments, Next Generation Sequencing (NGS) is used. NGS is a high-throughput sequencing technology that performs thousands or millions of sequencing reactions in parallel. Although different NGS platforms use varying assay chemistries, they all generate sequence data from a large number of sequencing reactions run simultaneously on a large number of templates. Typically, the sequence data is collected using a scanner, and then assembled and analyzed bioinformatically. Thus, the sequencing reactions are performed, read, assembled, and analyzed in parallel.
  • Some NGS methods require template amplification and some do not.
  • Amplification-requiring methods include pyrosequencing, the Solexa/Illumina platform, and the Supported Oligonucleotide Ligation and Detection (SOLiD) platform.
  • Methods that do not require amplification include single-molecule sequencing methods, nanopore sequencing, HeliScope, real-time sequencing by synthesis, single molecule real time (SMRT) DNA sequencing methods using zero-mode waveguides (ZMWs) and others.
  • hybridization-based sequence methods or other high-throughput methods can also be used, e.g., microarray analysis, NANOSTRING, ILLUMINA, or other sequencing platforms.
  • the methods described herein can be used in any cell that is capable of repairing a DSB in genomic DNA and synthesizing a new strand of DNA based on a template.
  • the two major DSB repair pathways in eukaryotic cells are homologous recombination and non-homologous end joining (NHEJ) .
  • the methods can be performed in cells capable of any of the repair pathways.
  • the Prime Editor system was adapted by replacing the Cas9 nickase with wildtype Cas9.
  • the RT template of pegRNA was modified.
  • the RT template used here is SEQ ID NO. 9.
  • the Cas9 and pegRNA were assembled into a single vector as the PEAC-seq backbone.
  • the spacer sequences targeting VEGFA, EMX1, RNF2 and FANCF were cloned into the PEAC-seq backbone individually.
  • HEK293T cells were seeded in a 12-well plate and grown to ~80% confluency. Each well was transfected with 3ug of plasmid using Lipofectamine 3000. The transfected cells were collected after 48 hours.
  • the cell sorter (SONY MA900) was used to sort about 100,000 GFP-positive cells. About 500ng of extracted gDNA was digested with NotI and then cleaned up with 0.5x AMPure XP beads to remove carryover plasmids. The gDNA fragments were retained on the AMPure XP beads, and on-bead Tn5 digestion was performed at 55°C for one hour, inserting adaptors at the ends of the fragments. The Tn5 was expressed and embedded with the adaptors in-house. At the end of the Tn5 digestion, 6uL of 0.2% SDS was added to terminate the reaction. The products were purified and size-selected with 1.5x AMPure XP beads and eluted in 50uL H2O.
  • the 21bp insertion sequence was used to enrich the editing sites (both on-target and off-target) in the NGS library preparation.
  • In the 1st round of the nested PCR, two separate reactions were performed. Each reaction used 20uL of template in a total volume of 50uL for ~30 cycles.
  • 2.5uL of the 1st round product was used as the template in the 2nd round of amplification in a total volume of 50uL for 17 cycles, and Illumina adaptors were added.
  • the amplicons were purified by AMPure XP beads using 0.6x+0.25x double size selection.
  • the library was sequenced on the Illumina Novaseq platform as paired-end 150bp.
  • the oligos and vectors are summarized in Fig. 22.
  • the PEAC-seq data was analyzed using a pipeline modified from GUIDE-seq (Tsai et al., Nat Biotechnol 2015). First, adapters were trimmed using cutadapt (Martin M., EMBnet.journal 2011), and reads without an appropriate adapter were removed. The reads were then mapped to the human or mouse genome (hg38, mm10) using bwa. Reads that mapped to the same location and shared the same UMI were considered PCR duplicates and were merged in the subsequent analysis. To fit the target identification pipeline from GUIDE-seq, the read names in the bam files were modified, and the bam files from the forward and backward PCR were labeled and merged.
  • the read counts from the GUIDE-seq output file were normalized to reads per million, and the number of reads with correct primer extension was calculated.
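  • The reads-per-million normalization mentioned above can be sketched in Python as below. The choice of denominator is an assumption made for illustration (the total number of reads in the library after duplicate removal); the site names and counts are hypothetical.

```python
def reads_per_million(site_counts, total_reads):
    """Scale per-site read counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {site: count * 1_000_000 / total_reads for site, count in site_counts.items()}


# Hypothetical per-site read counts and library size.
site_counts = {"chr1:1000": 250, "chr2:500": 50}
rpm = reads_per_million(site_counts, total_reads=2_000_000)
print(rpm)
```

  • Normalizing to a common scale makes read counts comparable between libraries sequenced to different depths.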
  • two nested PCR primers upstream of the gRNA were designed.
  • the site-specific nested PCR primers served as forward primers, and the downstream Tn5 primer served as the reverse primer.
  • the nested primers were sequentially used to amplify the adjacent sequences of translocated DSBs.
  • About 300 ng PEAC-seq gDNA was fragmented by Tn5, purified with 1.5x AMPure XP beads and eluted with 23uL H2O.
  • About 20uL purified DNA was used as template for the 1st round PCR for 20 cycles.
  • 2.5uL of product from the 1st PCR was used as template for another 20 cycles in the 2nd round of the nested PCR.
  • Another 20-cycle PCR was conducted to add the sequencing adaptors.
  • the amplicons were purified by 0.6x then 0.25x double-size beads selection.
  • the library was sequenced on the Illumina Novaseq platform as paired-end 150bp.
  • the oligos and vectors are summarized in Fig. 23.
  • a 21-nt cytosine-depleted sequence was designed as an insertion sequence RT template.
  • PEAC-seq was then conducted in HEK293T cells at six sites (VEGFA TS1, VEGFA TS2, VEGFA TS3, EMX1, FANCF, and RNF2) that have been tested in multiple studies (Kim et al., Nat Methods 2015; Kim et al., Genome Res 2018; Tsai et al., Nat Methods 2017; Cameron et al., Nat Methods 2017; Tsai et al., Nat Biotechnol 2015).
  • a modified GUIDE-seq analysis pipeline was used to rank and filter the identified editing sites. With an analysis of the off-target sites generated from different primer sets for PEAC-seq tag enrichment, the F1 and R2 primers were chosen as the enrichment primers in the following analysis (Figs. 9-14, 20) .
  • Amplicon-seq was then conducted to verify those off-targets that were only identified by GUIDE-seq or PEAC-seq at VEGFA TS1, FANCF, and EMX1 sites (Tsai et al., Nat Biotechnol 2015) (Table 2) .
  • At the VEGFA TS1 site, Amplicon-seq confirmed the two PEAC-seq-unique off-targets, demonstrating good sensitivity of PEAC-seq.
  • the PEAC score, calculated from the sequencing reads of PEAC-seq, quantitatively represents the enrichment of the PEAC-seq tag at the edited sites.
  • the off-target sites identified by both PEAC-seq and GUIDE-seq show higher PEAC scores compared to PEAC-seq-unique off-targets (Fig. 2A).
  • the numbers of sequencing reads surrounding the off-targets were highly correlated at the fourteen shared sites (Fig. 2B), suggesting consistency between the two methods in detecting high-confidence off-targets.
  • the on-target site, shared off-target sites, and PEAC-seq-unique off-target sites show similar tracks (Fig. 2C) .
  • the shared off-target sites contained fewer mismatches than off-target sites unique to one of the methods (Fig. 2D), which is expected as the number of mismatches closely relates to the occurrence of off-target editing.
  • the forward primer (F1) and downstream Tn5 primer would amplify regions downstream, but not upstream, of the PEAC-seq label (Fig. 3A) .
  • unexpected signals located at the upstream genomic region of the F1-Tn5 amplicons were observed (Figs. 3A, 15) . These signals might come from the joining of DSB ends from another genome breaking site.
  • PEAC-seq generates DSBs with three different ends, including one upstream end appended with a complete or partial PEAC-seq tag, one upstream end without PEAC-seq tag, and one downstream end.
  • DSB ends from different breaking points might join together and cause DNA rearrangements.
  • the upstream end with the PEAC-seq label from a distal Donor Site may join to the upstream end of a Receiver Site, but the direction of the PEAC-seq tag is reverse relative to the Receiver Site (Fig. 3B, model (v) ) .
  • This joining generates signals upstream of the PEAC-seq label of the Receiver Site, which would not otherwise be amplified by the F1 and Tn5 primers (Fig. 3A).
  • the directional insertion sequence in PEAC-seq allows identification of the aberrant ends joining from different DSB sites.
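  • As a simplified illustration of the reasoning above, the following Python sketch separates reads at a labeled site into the expected downstream signal and the unexpected upstream signal that may indicate joining with a DSB end from another site. The coordinates are hypothetical, and the sketch assumes that each read has been reduced to a single genomic position on the same chromosome as the F1 primer binding site.

```python
def classify_read(read_pos, primer_pos, primer_strand="+"):
    """Classify a read relative to the F1 primer binding site."""
    offset = read_pos - primer_pos
    if primer_strand == "-":
        offset = -offset  # flip coordinates so that downstream is always non-negative
    if offset >= 0:
        return "expected_downstream"
    return "candidate_translocation"


def summarize_site(read_positions, primer_pos, primer_strand="+"):
    """Count expected downstream reads and candidate translocation reads at one site."""
    counts = {"expected_downstream": 0, "candidate_translocation": 0}
    for pos in read_positions:
        counts[classify_read(pos, primer_pos, primer_strand)] += 1
    return counts


# Hypothetical read positions around a label whose F1 primer sits at position 1000.
positions = [1010, 1150, 1300, 870, 920]
summary = summarize_site(positions, primer_pos=1000, primer_strand="+")
print(summary)
```

  • Reads flagged as candidates would still need to be confirmed, for example with the nested primers described in this section.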
  • primers (Nest-F) located upstream of the F1 primer were designed and paired with the downstream Tn5 primer to identify the sequences of the unknown Donor sites (Fig. 3C).
  • a successful amplification bridging the Donor and the Receiver sites does not require the existence of the PEAC-seq label insertion (Fig. 3B, models (iii) and (iv)), which allows comprehensive estimation of the various rearrangement patterns between the Donor and the Receiver sites.
  • PEAC-seq uses the templated information on the pegRNA to insert label sequences and does not rely on exogenous labels. This straightforward procedure allowed us to investigate its application in vivo.
  • Mouse embryos were edited at the pronuclear stage by injecting in vitro transcribed Cas9-MMLV mRNA and pegRNAs targeting PCSK9 and PNPLA3. Embryos were collected around E14.5 to E21, and the PEAC-seq off-target lists for these two sites were generated (Fig. 4).
  • One PCSK9 on-target and one off-target were identified from the two embryos, which both have been previously reported by DISCOVER-seq (Figs. 4B-4D, 21) .
  • Amplicon-seq verified the edits at the PEAC-seq off-targets and confirmed non-edits at the other reported off-targets.
  • the small number of PCSK9 off-targets in our study might be related to the short editing time window resulting from mRNA injection in embryos, compared to adenovirus delivery in the liver (Wienert et al., Science 2019).
  • PEAC-seq was also conducted at another in vivo CRISPR therapy target, PNPLA3. Three editing sites, including the on-target site, were identified by PEAC-seq from two embryos (Fig. 16, Fig. 21).
  • Both the pegRNA and the mRNA of Cas9-MMLV were prepared by in vitro transcription.
  • the DNA template of pegRNA was amplified from the plasmids “pcsk9-sgRNA” and “mPnpla-sgRNA” by primers T7F and T7R.
  • the PCR products were gel purified using the MinElute Gel Extraction Kit (QIAGEN #28606) and were used as the template for in vitro transcription with the HiScribe T7 Quick High Yield RNA Synthesis Kit (NEB #E2050S).
  • the pCMV-Cas9-PE2 plasmid was linearized by MssI (Thermo #FD1344). According to the manufacturer’s instructions, 1ug of linearized product was used as a template to generate Cas9-PE mRNA by in vitro transcription with the HiScribe T7 ARCA mRNA Kit (NEB #E2060S).
  • C57BL/6 and ICR mice were purchased and housed in the Laboratory Animal Resource Center (LARC) at the Westlake University.
  • the LARC is a certified pathogen-free and environmentally controlled facility (21 ± 2°C, 55 ± 15% humidity and 12:12-h light:dark cycle).
  • the C57BL/6 mice were used for embryo collection, and ICR females were used as recipients. All animal experiments were conducted under the protocol approved by the animal care and ethical committee of the Westlake University.
  • Embryos were then flushed several times to rinse off the hyaluronidase and cumulus cells. Afterward, embryos were transferred into a dish with prewarmed KSOM medium (Millipore #MR-106-D) covered by mineral oil followed by three additional washes.
  • the mixture of Cas9-PE2 mRNA (100ng/uL) and pegRNA (50ng/uL) was injected into the cytoplasm of the zygote in M2 medium.
  • the injection was conducted using a microinjector (NARISHIGE #IM-400B) with constant flow settings.
  • the injected embryos were cultured in KSOM medium with amino acids in a cell culture incubator at 37°C with 5% CO2, and were then transplanted into the oviducts of pseudopregnant ICR females at 0.5 dpc. Pups were sacrificed at E19.5 to E21, and organs were collected, dissected and snap-frozen in liquid nitrogen. Samples were stored at -80°C until further analysis.
  • the gDNA from organs was extracted using TIANamp Genomic DNA Kit (TIANGEN #DP304-03) according to the manufacturer’s instructions. Nested PCR was applied to amplify the targeting regions and attach the Illumina adaptors to amplicons.
  • the in vivo PEAC-seq library was constructed by Tn5 fragmentation as described for the cell line data in the previous section.
  • the PEAC-seq was modified to use epegRNA (engineered pegRNA, incorporating the 3’ RNA structural motif evopreQ1) and to include transient expression of MLH1dn with Cas9-MMLV.
  • using epegRNA, hMLH1, and epegRNA plus MLH1dn, three modified versions of PEAC-seq were developed, and their performance in identifying off-targets at the EMX1 and VEGFA TS2 sites was benchmarked (Fig. 5A).
  • the truncated MMLV was not included as it is reported to be effective in plants but not in mammalian cells (Zong et al., Nat Biotechnol 2022).
  • the PEAC-seq label insertion was the main focus because its efficiency is critical to the overall performance of PEAC-seq.
  • incorporating epegRNA appears to be the most effective way to increase the number of PEAC-seq tag insertions at different cutoffs (Fig. 5B).
  • the epegRNA version of PEAC-seq was named ePEAC-seq.
  • ePEAC-seq successfully identified the two missed off-targets of EMX1 (Figs. 5C-5D), underscoring its higher sensitivity compared with PEAC-seq.
  • ePEAC-seq also called more off-target sites shared with GUIDE-seq compared to PEAC-seq (Fig. 10A, 17). It is not surprising that the transient expression of MLH1dn did not improve the performance, as MLH1dn is a dominant negative MMR protein, which involves DNA heteroduplexes by selectively replacing nicked DNA strands (Chen et al., Cell 2021).
  • the repair pathway activated by PEAC-seq is probably different, as in some embodiments, the wild-type Cas9 replaced the Cas9 nickase in the native PE system.
  • Deeptools ‘computeMatrix’ (command: --referencePoint center --afterRegionStartLength 5000 --beforeRegionStartLength 5000 -p 15 --binSize 500) and ‘plotHeatmap’ functions (Ramirez et al., Nucleic Acids Res 2014) were used to visualize the genomic co-localizations between all the in vitro PEAC-seq off-target sites and epigenetic signals.
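  • For orientation, the Python sketch below assembles deepTools command lines using the parameters quoted above. It is an illustration under stated assumptions: the `reference-point` mode is inferred from the use of `--referencePoint`, and the input and output file names (`signal.bw`, `peac_sites.bed`, `matrix.gz`, `heatmap.png`) are hypothetical placeholders that do not come from the disclosure. The commands are only built and printed, not executed.

```python
import shlex


def build_deeptools_commands(bigwig, regions_bed, matrix_out, heatmap_out):
    """Build argument lists for deepTools computeMatrix and plotHeatmap."""
    compute_matrix = [
        "computeMatrix", "reference-point",
        "--referencePoint", "center",
        "--afterRegionStartLength", "5000",
        "--beforeRegionStartLength", "5000",
        "-p", "15",
        "--binSize", "500",
        "--scoreFileName", bigwig,         # epigenetic signal track (placeholder name)
        "--regionsFileName", regions_bed,  # off-target sites in BED format (placeholder name)
        "--outFileName", matrix_out,
    ]
    plot_heatmap = [
        "plotHeatmap",
        "--matrixFile", matrix_out,
        "--outFileName", heatmap_out,
    ]
    return compute_matrix, plot_heatmap


compute_cmd, plot_cmd = build_deeptools_commands(
    "signal.bw", "peac_sites.bed", "matrix.gz", "heatmap.png"
)
print(shlex.join(compute_cmd))
print(shlex.join(plot_cmd))
```

  • With deepTools installed, each list could be passed to `subprocess.run(cmd, check=True)` to execute the corresponding step.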
  • DSB hotspots were identified from the dsODN-only control (no Cas9/gRNA) from the GUIDE-seq performed in 293T cells. Control genomic regions, which were equally sized regions randomly distributed across the genome, were generated with an in-house Perl script.
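  • The in-house script is not reproduced in this document. As a rough sketch of the idea only, the Python example below draws equally sized regions at random positions, choosing chromosomes in proportion to the number of valid start positions. The chromosome names and sizes are toy values, and the function name and parameters are hypothetical.

```python
import random


def random_control_regions(chrom_sizes, region_size, n_regions, seed=0):
    """Draw n_regions equally sized (chrom, start, end) intervals at random."""
    rng = random.Random(seed)
    # Skip chromosomes that are too short to hold a full region.
    eligible = {c: s for c, s in chrom_sizes.items() if s >= region_size}
    if not eligible:
        raise ValueError("no chromosome is long enough for the requested region size")
    chroms = list(eligible)
    # Weight each chromosome by its number of valid start positions.
    weights = [eligible[c] - region_size + 1 for c in chroms]
    regions = []
    for _ in range(n_regions):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randint(0, eligible[chrom] - region_size)
        regions.append((chrom, start, start + region_size))
    return regions


# Toy chromosome sizes for illustration only.
sizes = {"chrA": 10_000, "chrB": 5_000, "chrTiny": 100}
regions = random_control_regions(sizes, region_size=500, n_regions=20, seed=1)
print(regions[:3])
```

  • A real analysis would typically also exclude gaps and blacklisted regions, which this sketch does not attempt.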
  • Deeptools ‘computeMatrix’ and ‘plotHeatmap’ functions were used to plot the heatmap of the genomic co-localizations between the PEAC-seq translocation sites or control genomic regions.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Plant Pathology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present disclosure relates to complexes, polynucleotides, vectors, kits and methods for detecting the cleavage sites of CRISPR/Cas nucleases and DNA translocations at the cleavage sites in a genome.
PCT/CN2022/137789 2022-12-09 2022-12-09 Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation WO2024119461A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/137789 WO2024119461A1 (fr) 2022-12-09 2022-12-09 Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/137789 WO2024119461A1 (fr) 2022-12-09 2022-12-09 Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation

Publications (1)

Publication Number Publication Date
WO2024119461A1 true WO2024119461A1 (fr) 2024-06-13

Family

ID=84901443

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137789 WO2024119461A1 (fr) Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation 2022-12-09 2022-12-09

Country Status (1)

Country Link
WO (1) WO2024119461A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007140506A1 (fr) * 2006-06-02 2007-12-13 Human Genetic Signatures Pty Ltd Modified microbial nucleic acid for use in detection and analysis of microorganisms
WO2023060539A1 (fr) * 2021-10-15 2023-04-20 Westlake University Compositions and methods for detecting target cleavage sites of CRISPR/Cas nucleases and DNA translocation

Non-Patent Citations (57)

* Cited by examiner, † Cited by third party
Title
AKCAKAYA, P. ET AL.: "In vivo CRISPR editing with no detectable genome-wide off-target mutations", NATURE, vol. 561, 2018, pages 416 - 419, XP036902697, DOI: 10.1038/s41586-018-0500-9
ALANIS-LOBATO ET AL., PROC. NATL. ACAD. SCI. USA, 2021
ALANIS-LOBATO, G. ET AL.: "Frequent loss of heterozygosity in CRISPR-Cas9-edited early human embryos", PROC NATL ACAD SCI U S A, 2021, pages 118
ALT, F.W.ZHANG, Y.MENG, F.L.GUO, C.SCHWER, B.: "Mechanisms of programmed DNA lesions and genomic instability in the immune system", CELL, vol. 152, 2013, pages 417 - 429
ANDERSON KEITH R ET AL: "CRISPR off-target analysis in genetically engineered rats and mice", NATURE METHODS, NATURE PUBLISHING GROUP US, NEW YORK, vol. 15, no. 7, 21 May 2018 (2018-05-21), pages 512 - 514, XP036538714, ISSN: 1548-7091, [retrieved on 20180521], DOI: 10.1038/S41592-018-0011-5 *
ANDERSON, K.R. ET AL.: "CRISPR off-target analysis in genetically engineered rats and mice", NAT METHODS, vol. 15, 2018, pages 512 - 514, XP036542157, DOI: 10.1038/s41592-018-0011-5
ANZALONE ANDREW V. ET AL: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, vol. 576, no. 7785, 21 October 2019 (2019-10-21), London, pages 149 - 157, XP055980447, ISSN: 0028-0836, Retrieved from the Internet <URL:https://www.nature.com/articles/s41586-019-1711-4> DOI: 10.1038/s41586-019-1711-4 *
ANZALONE, A.V. ET AL.: "Search-and-replace genome editing without double-strand breaks or donor DNA", NATURE, vol. 576, 2019, pages 149 - 157, XP055899878, DOI: 10.1038/s41586-019-1711-4
BOIX ET AL., NATURE, 2021
BOIX, C.A.JAMES, B.T.PARK, Y.P.MEULEMAN, W.KELLIS, M.: "Regulatory genomic circuitry of human disease loci by integrative epigenomics", NATURE, vol. 590, 2021, pages 300 - 307, XP037365134, DOI: 10.1038/s41586-020-03145-z
BOTHMER, A. ET AL.: "Detection and Modulation of DNA Translocations During Multi-Gene Genome Editing in T Cells", CRISPR J, vol. 3, 2020, pages 177 - 187
CAMERON, P. ET AL.: "Site-seq: Mapping the genomic landscape of CRISPR-Cas9 cleavage", NAT METHODS, vol. 14, 2017, pages 600 - 606
CHEN, P.J. ET AL.: "Enhanced prime editing systems by manipulating cellular determinants of editing outcomes", CELL, vol. 184, 2021, pages 5635 - 5652
CHIARLE, R. ET AL.: "Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells", CELL, vol. 147, 2011, pages 107 - 119, XP028304221, DOI: 10.1016/j.cell.2011.07.049
CHOI JUNHONG ET AL: "Precise genomic deletions using paired prime editing", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP US, NEW YORK, vol. 40, no. 2, 14 October 2021 (2021-10-14), pages 218 - 226, XP037691460, ISSN: 1087-0156, [retrieved on 20211014], DOI: 10.1038/S41587-021-01025-Z *
DATABASE Genbank [online] 20 November 2022 (2022-11-20), WELLCOME SANGER TREE OF LIFE PROGRAMME: "Hofmannophila pseudospretella genome assembly, chromosome: 11", XP093045215, retrieved from https://www.ncbi.nlm.nih.gov/nuccore/OX376322.1 Database accession no. OX376322.1 *
DATABASE Genebank [online] 9 June 2022 (2022-06-09), WELLCOME SANGER TREE OF LIFE PROGRAMME: "Piscicola geometra assembly chromosome: 11", XP093045210, retrieved from https://www.ncbi.nlm.nih.gov/nuccore/OX030965.1 Database accession no. OX030965 *
ELLEFSON ET AL., SCIENCE, 2016
ELLEFSON, J.W. ET AL.: "Synthetic evolutionary origin of a proofreading reverse transcriptase", SCIENCE, vol. 352, 2016, pages 1590 - 1593, XP055787498, DOI: 10.1126/science.aaf5409
GIANNOUKOS, G. ET AL.: "UDiTaS, a genome editing detection method for indels and genome rearrangements", BMC GENOMICS, vol. 19, 2018, pages 212
GRUNEWALD, JULIAN ET AL.: "Engineered CRISPR prime editors with compact, untethered reverse transcriptases", NATURE BIOTECHNOLOGY, 2022, pages 1 - 7
HU, J. ET AL.: "Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing", NAT PROTOC, vol. 11, 2016, pages 853 - 871, XP055668234, DOI: 10.1038/nprot.2016.043
KIM, D. ET AL.: "Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells", NAT METHODS, vol. 12, 2015, pages 237 - 243, XP055554961, DOI: 10.1038/nmeth.3284
KIM, D.KIM, J.S.: "DIG-seq: a genome-wide CRISPR off-target profiling method using chromatin DNA", GENOME RES, vol. 28, 2018, pages 1894 - 1900
KLEINSTIVER ET AL., NATURE, 28 January 2016 (2016-01-28)
LIANG, G. ET AL.: "Frequent gene conversion in human embryos induced by double strand breaks", BIORXIV, 2020
LIU ET AL., CELL, 2017
LIU PENGPENG ET AL: "Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice", NATURE COMMUNICATIONS, vol. 12, no. 1, 9 April 2021 (2021-04-09), XP055980471, Retrieved from the Internet <URL:http://www.nature.com/articles/s41467-021-22295-w> DOI: 10.1038/s41467-021-22295-w *
LIU, BIN ET AL.: "A split prime editor with untethered reverse transcriptase and circular RNA template", NATURE BIOTECHNOLOGY, 2022, pages 1 - 6
LIU, PENGPENG ET AL.: "Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice", NATURE COMMUNICATIONS, vol. 12, no. 1, 2021, pages 1 - 13, XP055980471, DOI: 10.1038/s41467-021-22295-w
LIU, X. ET AL.: "CRISPR-Cas9-mediated multiplex gene editing in CAR-T cells", CELL RES, vol. 27, 2017, pages 154 - 157, XP055555205, DOI: 10.1038/cr.2016.142
MACLEAN ET AL., NATURE REV. MICROBIOL., vol. 7, 2009, pages 287 - 296
MARTIN, M.: "Cutadapt removes adapter sequences from high-throughput sequencing reads", EMBNET.JOURNAL, vol. 17, 2011, pages 10 - 12
MUSUNURU, K. ET AL.: "In vivo CRISPR base editing of PCSK9 durably lowers cholesterol in primates", NATURE, vol. 593, 2021, pages 429 - 434, XP037513148, DOI: 10.1038/s41586-021-03534-y
NELSON, J.W. ET AL.: "Engineered pegRNAs improve prime editing efficiency", NAT BIOTECHNOL, vol. 40, 2022, pages 402 - 410, XP037720612, DOI: 10.1038/s41587-021-01039-7
NEWBY, G.A. ET AL.: "Base editing of haematopoietic stem cells rescues sickle cell disease in mice", NATURE, 2021
RAMIREZ, F.DUNDAR, F.DIEHL, S.GRUNING, B.A.MANKE, T.: "deepTools: a flexible platform for exploring deep-sequencing data", NUCLEIC ACIDS RES, vol. 42, 2014, pages W187 - 191
REN, J. ET AL.: "Multiplex Genome Editing to Generate Universal CAR T Cells Resistant to PD1 Inhibition", CLIN CANCER RES, vol. 23, 2017, pages 2255 - 2266, XP055565027, DOI: 10.1158/1078-0432.CCR-16-1300
SHENGDAR Q TSAI ET AL: "CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR–Cas9 nuclease off-targets", NATURE METHODS, vol. 14, no. 6, 1 January 2017 (2017-01-01), New York, pages 607 - 614, XP055424040, ISSN: 1548-7091, DOI: 10.1038/nmeth.4278 *
SLAYMAKER ET AL., SCIENCE, 1 January 2016 (2016-01-01)
TRUONG, DONG-JIUNN JEFFERY ET AL.: "Development of an intein-mediated split-Cas9 system for gene therapy", NUCLEIC ACIDS RESEARCH, vol. 43, no. 13, 2015, pages 6450 - 6458, XP055791410, DOI: 10.1093/nar/gkv601
TSAI ET AL., NAT BIOTECHNOL, 2015
TSAI SHENGDAR Q ET AL: "GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP US, NEW YORK, vol. 33, no. 2, 16 December 2014 (2014-12-16), pages 187 - 197, XP037614260, ISSN: 1087-0156, [retrieved on 20141216], DOI: 10.1038/NBT.3117 *
TSAI, S.Q. ET AL.: "CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets", NAT METHODS, vol. 14, 2017, pages 607 - 614, XP055424040, DOI: 10.1038/nmeth.4278
TSAI, S.Q. ET AL.: "GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases", NAT BIOTECHNOL, vol. 33, 2015, pages 187 - 197, XP055555627, DOI: 10.1038/nbt.3117
VOELKERDING ET AL., CLINICAL CHEM., vol. 55, 2009, pages 641 - 658
WEI, P.C. ET AL.: "Long Neural Genes Harbor Recurrent DNA Break Clusters in Neural Stem/Progenitor Cells", CELL, vol. 164, 2016, pages 644 - 655, XP029416800, DOI: 10.1016/j.cell.2015.12.039
WIENERT, B. ET AL.: "Unbiased detection of CRISPR off-targets in vivo using DISCOVER-Seq", SCIENCE, vol. 364, 2019, pages 286 - 289, XP055787709, DOI: 10.1126/science.aav9023
YAN, W.X. ET AL.: "BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks", NAT COMMUN, vol. 8, 2017, pages 15058, XP055485619, DOI: 10.1038/ncomms15058
YIN, J. ET AL.: "Optimizing genome editing strategy by primer-extension-mediated sequencing", CELL DISCOV, vol. 5, 2019, pages 18, XP055773402, DOI: 10.1038/s41421-019-0088-8
YU ZHENXING ET AL: "PEAC-seq adopts Prime Editor to detect CRISPR off-target and DNA translocation", NATURE COMMUNICATIONS, vol. 13, no. 1, 12 December 2022 (2022-12-12), XP093044844, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-022-35086-8> DOI: 10.1038/s41467-022-35086-8 *
ZHANG, GUIQUAN ET AL.: "Enhancement of prime editing via xrRNA motif joined pegRNA", NATURE COMMUNICATIONS, vol. 13, no. 1, 2022, pages 1 - 12
ZHANG, GUIQUAN ET AL.: "Enhancement of prime editing via xrRNA motif-joined pegRNA", NATURE COMMUNICATIONS, vol. 13, no. 1, 2022, pages 1 - 12
ZONG YUAN ET AL: "An engineered prime editor with enhanced editing efficiency in plants", NATURE BIOTECHNOLOGY, vol. 40, no. 9, 24 March 2022 (2022-03-24), New York, pages 1394 - 1402, XP093045317, ISSN: 1087-0156, Retrieved from the Internet <URL:https://www.nature.com/articles/s41587-022-01254-w> DOI: 10.1038/s41587-022-01254-w *
ZONG, Y. ET AL.: "An engineered prime editor with enhanced editing efficiency in plants", NAT BIOTECHNOL, vol. 40, 2022, pages 1394 - 1402
ZONG, YUAN ET AL.: "An engineered prime editor with enhanced editing efficiency in plants", NATURE BIOTECHNOLOGY, 2022, pages 1 - 9
ZUCCARO, M.V. ET AL.: "Allele-Specific Chromosome Removal after Cas9 Cleavage in Human Embryos", CELL, vol. 183, 2020, pages 1650 - 1664

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22840008

Country of ref document: EP

Kind code of ref document: A1