WO2023225358A1 - Generation and tracking of cells with precise edits - Google Patents

Generation and tracking of cells with precise edits

Info

Publication number
WO2023225358A1
Authority
WO
WIPO (PCT)
Prior art keywords
target locus
sequence
target
retron
locus
Prior art date
Application number
PCT/US2023/022989
Other languages
English (en)
Inventor
Shi-An A. CHEN
Alex KERN
Hunter FRASER
Original Assignee
The Board Of Trustees Of The Leland Stanford Junior University
Priority date
Filing date
Publication date
Application filed by The Board Of Trustees Of The Leland Stanford Junior University
Publication of WO2023225358A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/64 General methods for preparing the vector, for introducing it into the cell or for selecting the vector-containing host
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C12N15/1079 Screening libraries by altering the phenotype or phenotypic trait of the host
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80 Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81 Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/905 Stable introduction of foreign DNA into chromosome using homologous recombination in yeast
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12Y ENZYMES
    • C12Y207/00 Transferases transferring phosphorus-containing groups (2.7)
    • C12Y207/07 Nucleotidyltransferases (2.7.7)

Definitions

  • the present disclosure provides a nucleic acid composition that comprises two or more editing modules that are present on an expression vector.
  • the compositions and methods allow for producing combinations of targeted genetic modifications in the genome of a host cell.
  • the disclosure provides a retron-guide RNA cassette comprising: (a) a first retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a first donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a first target locus; and (v) a second inverted repeat sequence coding region; and (b) a first guide RNA (gRNA) coding region; (c) a second retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv
  • the first target locus is located in trans to the second target locus. In some embodiments, the first target locus is located in a trans-regulatory element, and the second target locus is located in the 3’ untranslated region (UTR) of a transcription unit.
  • the first donor DNA sequence comprises a genetic variant compared to the sequences within the first target locus. In some embodiments, the genetic variant comprises a trans-expression quantitative trait locus (eQTL) variant at the first target locus.
  • eQTL: expression quantitative trait locus
  • the first target locus is located in cis to the second target locus.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located in the 3’ untranslated region (UTR) of the transcription unit.
  • the first donor DNA sequence comprises a genetic variant relative to the sequence at the first target locus.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • the second target locus i) is an intron or ii) is not located in genomic sequences that regulate transcription or translation of a gene.
  • the barcode sequence encodes a detectable molecule, a selectable marker, or a cell surface marker.
  • the first or second gRNA coding region is upstream of the first or second retron in the cassette such that transcription of the cassette results in a transcript in which the gRNA is 5’ of the RNA transcribed from the retron. In some embodiments, the first or second gRNA coding region is downstream of the first or second retron in the cassette such that transcription of the cassette results in a transcript in which the gRNA is 3’ of the RNA transcribed from the retron. In some embodiments, the retron-guide RNA cassette further comprises one or more ribozyme sequences. In some embodiments, the first and second retrons are connected by a self-cleaving ribozyme sequence.
  • the ribozyme sequence encodes a ribozyme selected from the group consisting of hepatitis delta virus (HDV) ribozyme, drz-Agam1-1, drzAgam1-2, drzPmar-1, Twister, Hammerhead, and combinations thereof.
  • the one or more ribozyme sequences are different from each other.
  • the retron-guide RNA cassette further comprises a third retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a third donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a third target locus; and (v) a second inverted repeat sequence coding region; and a third guide RNA (gRNA) coding region.
  • the disclosure provides a vector comprising a retron-guide RNA cassette described herein.
  • the disclosure provides a method for identifying a genetic modification at a target locus in a host cell, the method comprising: (a) transforming the host cell with a vector or retron-guide RNA cassette described herein; (b) culturing the host cell or transformed progeny of the host cell under conditions sufficient for expressing from the vector a first retron donor DNA-guide molecule comprising a first retron transcript and the first gRNA coding region and a second retron donor DNA-guide molecule comprising a second retron transcript and the second gRNA coding region, wherein the first and second retron transcripts self-prime reverse transcription by a reverse transcriptase (RT) expressed by the host cell or the transformed progeny of the host cell, wherein at least a portion of the first retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the first target
  • the method identifies a genetic modification at a target locus within a genome of a host cell, where the genome comprises the endogenous genomic chromosomal DNA of the host cell. In some embodiments, the method identifies a genetic modification at a target locus anywhere within a genome of a host cell. In some embodiments, the target locus is located in an exogenous genome that is present in a host cell, such as a viral genome, a bacterial genome, a transposable element or an endovirus genome that are not part of the endogenous host cell genome.
  • the target locus is located in heterologous or exogenous DNA, such as the DNA of transgenes, viruses or transposons, that are present in the host cell or host cell nucleus. In some embodiments, the target locus is located in heterologous or exogenous DNA that is integrated into the host cell genomic DNA. In some embodiments, the target locus is located in heterologous or exogenous DNA that is not integrated into the host cell genomic DNA, such as transiently expressed transgenes, episomes or plasmids. In some embodiments, the first target locus is located in trans to the second target locus.
  • the first target locus is located in a trans-regulatory element, and the second target locus is located in a 5’ untranslated region, protein coding region, or the 3’ untranslated region (UTR) of a transcription unit.
  • the genetic variant comprises a trans-eQTL variant at the first target locus.
  • the first target locus is located in cis to the second target locus.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located in a 5’ untranslated region, protein coding region, or the 3’ untranslated region (UTR) of the transcription unit.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • the first and/or second target locus is located in an intergenic, non-coding region of the host cell genomic DNA.
  • the one or more donor DNA sequences comprise a genetic variant compared to the sequences within the first target locus.
  • the barcode sequence encodes a detectable molecule, a selectable marker, or a cell surface marker.
  • detecting the presence of the unique barcode sequence comprises sequencing the genome of the host cell, or detecting a detectable molecule encoded by the barcode sequence.
  • the vector is no longer present in the host cell when detecting the presence of the unique barcode sequence.
  • greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the host cells comprise both the barcode sequence and the sequence modifications compared to the sequences within the first target locus.
  • the method further comprises: (d) transforming the host cell with a second vector comprising a second retron-guide RNA cassette comprising: a third retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a third donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a third target locus; and (v) a second inverted repeat sequence coding region; and a third guide RNA (gRNA) coding region; a fourth retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) a second msd locus; (iv) a fourth donor DNA sequence located within the second msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a fourth target locus and
  • the one or more donor DNA sequences comprise a genetic variant compared to the sequences within the third target locus.
  • the third target locus is located in trans to the fourth target locus.
  • the third target locus is located in a trans-regulatory element, and the fourth target locus is located in the 3’ untranslated region (UTR) of a transcription unit.
  • the genetic variant comprises a trans-eQTL variant at the third target locus.
  • the third target locus is located in cis to the fourth target locus.
  • the third target locus is located in a cis-regulatory element of a transcription unit, and the fourth target locus is located in the 3’ untranslated region (UTR) of the transcription unit.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • the method further comprises detecting the relative expression of transcription from the transcription units comprising genetic variants at the first and third target loci.
  • (i) the first and third gRNAs are the same; (ii) the first and third target loci are the same; (iii) the genetic modification at the first and third loci is different; (iv) the second and fourth gRNAs are the same; (v) the second and fourth target loci are the same; and (vi) the barcode sequences inserted at the second and fourth target loci are different.
  • (i) the first and third gRNAs are different; (ii) the first and third target loci are different; (iii) the genetic modification at the first and third loci is different; (iv) the second and fourth gRNAs are the same; (v) the second and fourth target loci are the same; and (vi) the barcode sequences inserted at the second and fourth target loci are different.
  • the one or more donor DNA sequences comprise two homology arms, wherein each homology arm has at least about 70% to about 99% similarity to a portion of the sequence of the one or more target loci on either side of a nuclease cleavage site.
  • greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the host cells comprise both the barcode sequence and the sequence modifications compared to the sequences within the third target locus.
  • the method further comprises detecting the presence of the unique barcode at the third target locus, thereby identifying the genetic modification at both the first and third target loci.
  • the method further comprises repeating steps (d)-(f) with a third vector comprising a third retron-guide RNA cassette that inserts a genetic modification at a fifth target locus and a unique barcode sequence at a sixth target locus, thereby identifying the genetic modification at the fifth target locus.
  • the host cell is a prokaryotic cell.
  • the host cell is a eukaryotic cell.
  • the eukaryotic cell is a yeast cell.
  • the eukaryotic cell is a mammalian cell or cell line.
  • the mammalian cell is a human cell or cell line.
  • the host cell comprises a clonal population of host cells.
  • the genetic modifications are induced in greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the population of host cells.
  • the method further comprises transforming a mixture of cells with one or more vectors comprising the first, second or third retron-guide RNA cassettes, and screening the transformed cells for a phenotypic change relative to an untransformed control cell.
  • the method further comprises detecting the presence of the genetic modification at the target locus or the presence of the unique barcode sequence present in each retron-guide RNA cassette.
  • the disclosure provides a method for identifying two or more genetic modifications at two different target loci in a host cell, the method comprising: transforming the host cell with a vector or retron-guide RNA cassette described herein; wherein the vector or retron-guide RNA cassette comprises two or more variant editing cassettes that are expressed in the same transcript, and a donor DNA sequence comprising homology to one or more sequences within a third, different target locus and a unique barcode sequence.
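The embodiments above rely on a unique barcode standing in for a predetermined edit or edit combination. The following is a minimal Python sketch of that lookup, not the method of the disclosure: the barcode strings, the design table `BARCODE_TO_EDITS`, and the function name `identify_edits` are hypothetical, and the variant labels are borrowed from the Fig.1g description purely as examples. Real barcode detection would also involve sequencing and error handling that are not shown.

```python
from collections import Counter

# Hypothetical design table: at library design time, each unique barcode is
# assigned to one predetermined edit (or combination of edits).
BARCODE_TO_EDITS = {
    "ACGTACGTAC": ("chr7:848783 AC>A",),
    "TTGCAAGGCT": ("chr7:847050 C>A",),
    "GGATCCTTAA": (),  # barcode-only control: no variant edit
}


def identify_edits(observed_barcodes):
    """Count the designed edit combinations implied by observed barcodes.

    Barcodes that are absent from the design table are tallied under the
    key None instead of being guessed at.
    """
    counts = Counter()
    for barcode in observed_barcodes:
        counts[BARCODE_TO_EDITS.get(barcode)] += 1
    return counts
```

Because the lookup needs only the barcode, it works on genomic reads taken after the vector has been lost, which is the point of integrating the barcode at a second locus.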
  • Fig.1a-k Design and validation of CRISPEY-BAR for generating and tracking thousands of precise genome edits simultaneously.
  • Fig.1a CRISPEY-BAR dual edit strategy. Top, CRISPEY-BAR expression cassette consisting of pGAL7 galactose-inducible promoter and terminator (brown); self-cleaving HDV-like ribozymes RzCIV, RZHDV and RZSpur3 (magenta); barcode insertion retron-guide cassette (blue) containing programmed barcode (orange) and UMI (yellow); variant editing cassette (green). Middle, the variant editing cassette converts a wildtype (WT) allele into an alternative allele. Bottom, the barcode insertion retron-guide cassette.
  • Fig.1b Schematic for conventional CRISPEY.
  • Fig.1c Schematic for CRISPEY-BAR. Variants tracked across three growth replicates by genomically-integrated barcodes with attached UMIs.
  • Fig.1d Workflow for CRISPEY-BAR library pool construction.
  • Fig. 1e Validation of genomic variant editing rate from CRISPEY-BAR. Blue, randomly picked colonies that contain both genomic-integrated barcode and the designed edit. Orange, randomly picked colonies that contain only the genomic-integrated barcode but not the designed edit.
  • Fig. 1f Schematic for CRISPEY-BAR pooled competition in yeast.
  • Fig.1g Example of CRISPEY-BAR data over time. Each line indicates normalized counts for a single UMI for a given barcode from 1 of 3 replicates in a competition experiment. Counts in later time points are normalized to the first time point. Light blue and blue: two barcodes representing different guides targeting the same variant chr7: 848783 AC>A. Red and dark red: two barcodes representing different guides targeting the same variant chr7: 847050 C>A. Gray scale: Non-targeting of variants, barcode integration only (no-edit control regarding variants). Data shown are from Terbinafine competition across approximately 26 generations.
  • Fig.1h Example of outlier removal.
  • Fig.1k Validation of pooled fitness in fluconazole by pairwise competition.
  • X-axis: fitness effect measured by CRISPEY-BAR pooled competition.
  • Y-axis: fitness effect measured through pairwise competition against GFP strain using flow cytometry. Data shown for 13 variants in fluconazole. Data presented as mean ± SEM.
  • Fig.2a-g Detection of natural variants affecting fitness within QTLs mapped in complex traits.
  • Fig.2a Diagram of library design process using natural variants and QTL regions, as well as library statistics.
  • Fig.2b Schematic for experiment workflow for QTL fine-mapping with CRISPEY-BAR.
  • Fig. 2c Number of variants with fitness effect (FDR < 0.01) within SC and appropriate stress condition.
  • Fig. 2d Annotation enrichment of variants with fitness effect (FDR < 0.01). Blue, variant enrichment for hits in fluconazole condition. Orange, variant enrichment for hits in caffeine condition. Green, variant enrichment for hits in cobalt chloride condition.
  • Fig.2e Fitness effects of example QTL regions. Dark blue, fitness effects in stress condition (FDR < 0.01). Dark orange, fitness effects in SC (FDR < 0.01). Light blue, no fitness effects in stress condition. Gold, no fitness effects in SC. Most variants are represented twice (effect in QTL condition and complete media).
  • Fig. 2f PDR5 fitness effects in CAFF and FLC. Magenta, PDR5 variant fitness measured in caffeine condition. Orange, PDR5 variant fitness measured in fluconazole condition. Dark gray, noncoding regions flanking PDR5. Light gray, coding region of PDR5. Vertical lines connect the same variant fitness values measured in both caffeine and fluconazole.
  • Fig.2g Fitness effects of example QTL regions. Dark blue, fitness effects in stress condition (FDR < 0.01). Dark orange, fitness effects in SC (FDR < 0.01). Light blue, no fitness effects in stress condition. Gold, no fitness effects in SC. Most variants are represented twice (effect in QTL condition and complete media).
  • Fig.3a-h CRISPEY-BAR enabled robust mapping of variant-level GxE interactions within the ergosterol biosynthesis pathway.
  • Fig.3a Ergosterol pathway diagram showing 24 genes from the ergosterol synthesis pathway surveyed in this study. Lovastatin and terbinafine target genes in the ergosterol pathway.
  • Fig.3b The same pool of yeast edited at natural ergosterol pathway variants was grown in six different conditions and tracked by barcode sequencing.
  • Fig.3c Gene level fitness effects of surveyed natural variants in six conditions.
  • X-axis labels indicate the genes containing the variants. Red, causal variants (p < 0.01). Gray, non-significant variants. Target genes are outlined by dashed black lines where applicable.
  • Fig.3d GxE interactions were calculated between each pair of conditions (15 pairwise comparisons).
  • Fig.3e Diagram showing definition of GxE variants in this study: A positive effect variant (black circle) in condition 1 can either have the same effect in another condition (white circle at same height in red region), a stronger positive effect (top white circle in red region), no effect, white circle at zero, or a negative effect (bottom white circle in blue region).
  • Fig. 3f The number of significant GxE interactions for each pairwise comparison.
  • Fig.3g GxE annotation enrichments for variants with GxE. Enrichment of variants with GxE in each category were normalized to all variants tested. Red dashed line indicates an enrichment factor of 1.0, corresponding to no enrichment over the library.
  • Fig.3h Variants with GxE effects within the HMG1 promoter. Clusters of variants with significant GxE effects within 8 bp of each other are in gray highlighted areas.
  • Fig.4a-f Quantifying GxE interactions among ergosterol pathway variants
  • Fig.4a Schematic of rare GxE between conditions (correlated effects).
  • Fig.4b Schematic of common GxE between conditions (uncorrelated effects).
  • Fig.4c Fitness effects of variants within PDR5 in caffeine and fluconazole.
  • Fig.4d Fitness effects of variants within ergosterol pool in lovastatin and CoCl2.
  • Fig.4e Fitness effects of variants within ergosterol pool in lovastatin and CoCl2.
  • Fig.4f Heatmaps showing fitness effects of all variants with a significant effect in any condition. Significant positive effects (red), significant negative effects (blue), non-significant positive effects (pink), and non-significant negative effects (light blue).
  • Fig. 5a-e Types of GxE variants and effect of natural variation on ERG4 expression.
  • Fig. 5a Example of fitness effect detected in only one condition.
  • Fig.5b Example of fitness effect detected in only one condition.
  • Fig.5c Example of fitness effects with same direction detected in two conditions.
  • Fig.5c Example of fitness effects with opposite directions between conditions, showing sign GxE.
  • Fig.5d Sign GxE variants have larger maximum fitness effects. Whiskers represent Q3 + 1.5xIQR and Q1 - 1.5xIQR, or the maximum and minimum values of the dataset if these are respectively lower or higher than the IQR based intervals.
  • Fig.5e Effect of natural variants on ERG4 expression. Top left: Consensus Rpn4p binding motif. Top right: Genomic location of Rpn4p binding site affected by chr7: 472522 C>A variant within ERG4/PDR1 divergent promoter.
  • FIG.6 Schematic for library cloning in CRISPEY-BAR.
  • Fig. 7 Schematic for pooled editing and growth competition in CRISPEY-BAR.
  • Fig.8 Schematic for CRISPEY-BAR sequencing library preparation.
  • Fig.9 Fitness and ERG4 expression for variants in Fig.5e.
  • X-axis Paired fitness from flow cytometry measurements similar to Fig.1i, see also Methods.
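The figure descriptions mention two simple computations: normalizing counts at later time points to the first time point (Fig.1g) and reporting data as mean ± SEM (Fig.1k). The following is a minimal Python sketch of both. The function names are hypothetical, and computing SEM from the sample standard deviation is an assumption here, since the text does not state which estimator was used.

```python
import math
import statistics


def normalize_to_first(counts):
    """Divide every time point's count by the count at the first time point."""
    first = counts[0]
    if first <= 0:
        raise ValueError("first time point must have a positive count")
    return [c / first for c in counts]


def mean_and_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n).

    Assumption: the sample (n - 1) standard deviation is used.
    Requires at least two values.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

Normalizing each UMI trajectory to its own first time point puts barcodes with different starting abundances on a common scale before they are compared.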
  • the present disclosure provides compositions and methods for tracking one or more targeted genetic modifications (also referred to as genetic “edits” or “variants”) made in the genome of a cell or organism.
  • the present disclosure provides a nucleic acid composition that comprises two or more editing modules that are present on an expression vector.
  • the compositions and methods allow for producing combinations of targeted genetic modifications in the genome of a host cell, where the combinations of modifications are predetermined.
  • the first module comprises nucleic acid sequences that can modify a genetic locus in a host cell (e.g., a first target locus) and the second module comprises nucleic acid sequences that modify a second genetic locus in a host cell (e.g., a second target locus).
  • the first target locus is at a different location in the genome than the second target locus.
  • the genetic modification at the first target locus is different than the genetic modification at the second target locus.
  • the genetic modification at the first target locus comprises a mutation, edit, variant or deletion in the nucleic acid sequence of the first target locus.
  • the genetic modification at the second target locus comprises a mutation, edit or variant of the nucleic acid sequence of the second target locus. In some embodiments, the genetic modification at the second target locus comprises or further comprises introducing a unique barcode sequence at the second target locus. In some embodiments, the genetic modification at the second target locus comprises introducing both a mutation, edit, or variant and a unique barcode sequence at the second target locus.
  • compositions and methods can be used to introduce a second genetic modification at a target locus in the same host cell or its progeny by transfecting the cell with a second vector comprising nucleic acid sequences that can modify a third target locus and a second module comprising nucleic acid sequences that can introduce a barcode sequence at a fourth target locus.
  • the first and third target loci are the same, but the genetic modification is different.
  • the second and fourth target loci are the same, but the barcode sequence is different. The above can be repeated to introduce additional genetic modifications along with different unique barcode sequences at the same or different target loci.
  • the vector can be removed or lost in the host cell and its daughter cells.
  • the intended combination of precise edits made in each cell can be determined by detecting the unique barcode sequence assigned to each edit combination.
  • barcode sequence can be detected by Sanger sequencing, next generation sequencing (NGS) or other detection methods that distinguish the unique barcode sequence assigned to each edit combination. This can be performed in a mixture of host cells, a single host cell, or a clonal cell lineage.
  • NGS: next generation sequencing
  • the compositions and methods described herein provide the following advantages. 1) Detecting genetic modifications in the host cell does not require the presence of the expression vector in the host cell or its progeny.
  • the two or more editing modules are present on a bicistronic retron-donor-guide editing vector.
  • the bicistronic retron-donor-guide editing vector allows simultaneous editing of two different genetic target loci.
  • the first and second modules comprise a retron-guide RNA cassette.
  • Retron-guide RNA cassettes are described in US 2019/0330619 A1 (corresponding to WO 2018/049168) and US Provisional Patent App. No. 63/232,080 (filed 8 August 2021), which are hereby incorporated by reference herein in their entirety.
  • the combination of edits that will be made across all modules is predetermined.
  • the three editing modules are present on a retron-donor-guide editing vector.
  • the bicistronic retron-donor-guide editing vector allows simultaneous editing of three different genetic target loci.
  • the first, second and third modules comprise a retron-guide RNA cassette.
  • the first and second modules introduce two (a pair of) genetic edits in two different target sequences, and the third module introduces a unique barcode sequence that is associated with the pair of genetic variants introduced by the first and second modules (a “variant-pair” specific barcode).
  • the first and second editing modules are connected by self-cleaving HDV-like ribozymes to allow separation of either module to detach from the RNA pol2 transcript, which allows Cas9/retron binding and nuclear export.
  • ribozymes are selected from drz-CIV-1, HDV ribozyme, and drz-Spur-3, though other combinations of ribozymes are expressly included herein.
  • Genome editing methods commonly include the provision of both an engineered nuclease or nickase and a donor DNA repair template that contains the DNA sequence to be inserted at a desired location.
  • the CRISPR/Cas9 system utilizes a guide RNA (gRNA) that directs the Cas9 nuclease to introduce a double-strand cut at a specific location.
  • a donor DNA repair template can then be provided, enabling the precise insertion of a new sequence mediated by homology-directed repair of the double-strand cut.
  • the gRNA and donor DNA template have been supplied as separate molecules, meaning that each editing experiment must be performed in a separate tube or vessel.
  • the reverse transcription of the DNA coding unit (msd region) of the retron transcript results in a multicopy single-stranded DNA (msDNA) molecule that contains a donor DNA repair template and is physically tethered to the gRNA, increasing editing efficiency.
  • the practice of the present disclosure employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd edition (1989), Current Protocols in Molecular Biology (F. M.
  • Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Lett. 22:1859-1862 (1981), using an automated synthesizer, as described in Van Devanter et.
  • any method or material similar or equivalent to a method or material described herein can be used in the practice of the present disclosure.
  • the following terms are defined.
  • the terms “a,” “an,” or “the” as used herein not only include aspects with one member, but also include aspects with more than one member.
  • the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise.
  • reference to “a cell” includes a plurality of such cells and reference to “the agent” includes reference to one or more agents known to those skilled in the art, and so forth.
  • the term “about” in relation to a reference numerical value can include a range of values plus or minus 10% from that value.
  • the amount “about 10” includes amounts from 9 to 11, including the reference numbers of 9, 10, and 11.
  • the term “about” in relation to a reference numerical value can also include a range of values plus or minus 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, or 1% from that value.
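  • The numerical meaning of "about" given above can be illustrated with a short sketch. The function names `about_range` and `is_about` are hypothetical; the default tolerance of 10% and the narrower alternatives (9% down to 1%) come from the definition itself.

```python
def about_range(reference: float, tolerance_pct: float = 10.0) -> tuple[float, float]:
    """Return the inclusive (low, high) bounds covered by "about <reference>"."""
    delta = abs(reference) * tolerance_pct / 100.0
    return reference - delta, reference + delta


def is_about(value: float, reference: float, tolerance_pct: float = 10.0) -> bool:
    """True if value lies within plus or minus tolerance_pct of reference."""
    low, high = about_range(reference, tolerance_pct)
    return low <= value <= high
```

  • With the default tolerance, `about_range(10)` spans 9 to 11 inclusive, matching the example in the definition.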
  • the terms “5’ ” and “3’ ” denote the positions of elements or features relative to the overall arrangement of the retron-guide RNA cassettes, vectors, or retron donor DNA-guide molecules of the present disclosure in which they are included. Positions are not, unless otherwise specified, referred to in the context of the orientation of a particular element or features.
  • the msr and msd loci in FIG. 4 are shown in opposite orientations.
  • the msr locus is said to be 5’ of the msd locus.
  • the 3’ end of the msr locus is said to be overlapping with the 5’ end of the msd locus.
  • the term “upstream” refers to a position that is 5’ of a point of reference.
  • the term “downstream” refers to a position that is 3’ of a point of reference.
  • the msr locus is said to be located upstream of the reverse transcriptase sequence, and the reverse transcriptase sequence is said to be located downstream of the msr locus.
  • the term “genome editing” refers to a type of genetic engineering in which DNA is inserted, replaced, or removed from a target DNA (e.g., the genome of a cell) using one or more nucleases and/or nickases.
  • the nucleases create specific double-strand breaks (DSBs) at desired locations in the genome, and harness the cell’s endogenous mechanisms to repair the induced break by homology-directed repair (HDR) (e.g., homologous recombination) or by nonhomologous end joining (NHEJ).
  • two nickases can be used to create two single-strand breaks on opposite strands of a target DNA, thereby generating a blunt or a sticky end.
  • Any suitable DNA nuclease can be introduced into a cell to induce genome editing of a target DNA sequence.
  • the terms “genetic modification,” “genetic edit,” and “genome edit” can be used interchangeably and refer to a change in the nucleic acid sequence of a target polynucleotide (e.g., the genomic DNA of a cell), such that the nucleic acid sequence of the modified DNA is different from the native, endogenous, previously modified, or wild-type sequence of the target DNA.
  • DNA nuclease refers to an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of DNA, and may be an endonuclease or an exonuclease. According to the present disclosure, the DNA nuclease may be an engineered (e.g., programmable or targetable) DNA nuclease which can be used to induce genome editing of a target DNA sequence.
  • Any suitable DNA nuclease can be used, including, but not limited to, CRISPR-associated protein (Cas) nucleases, other endo- or exonucleases, variants thereof, fragments thereof, and combinations thereof.
  • double-strand break or “double-strand cut” refers to the severing or cleavage of both strands of the DNA double helix.
  • the DSB may result in cleavage of both strands at the same position leading to “blunt ends” or staggered cleavage resulting in a region of single-stranded DNA at the end of each DNA fragment, or “sticky ends”.
  • a DSB may arise from the action of one or more DNA nucleases.
  • nonhomologous end joining or “NHEJ” refers to a pathway that repairs double-strand DNA breaks in which the break ends are directly ligated without the need for a homologous template.
  • the most common form of HDR is homologous recombination (HR), a type of genetic recombination in which nucleotide sequences are exchanged between two similar or identical molecules of DNA.
  • nucleic acid refers to deoxyribonucleic acids (DNA), ribonucleic acids (RNA) and polymers thereof in either single-, double- or multi-stranded form.
  • the term includes, but is not limited to, single-, double- or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and/or pyrimidine bases or other natural, chemically modified, biochemically modified, non-natural, synthetic or derivatized nucleotide bases.
  • a nucleic acid can comprise a mixture of DNA, RNA and analogs thereof.
  • the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
  • SNPs are biallelic markers although tri- and tetra-allelic markers can also exist.
  • a nucleic acid molecule comprising SNP A\C may include a C or A at the polymorphic position.
  • the term “gene” means the segment of DNA involved in producing a polypeptide chain. The DNA segment may include regions preceding and following the coding region (leader and trailer) involved in the transcription/translation of the gene product and the regulation of the transcription/translation, as well as intervening sequences (introns) between individual coding segments (exons).
  • cassette refers to a combination of genetic sequence elements that may be introduced as a single element and may function together to achieve a desired result.
  • a cassette typically comprises polynucleotides in combinations that are not found in nature.
  • a cassette can be inserted into a vector, such as an expression vector.
  • operably linked refers to two or more genetic elements, such as a polynucleotide coding sequence and a promoter, placed in relative positions that permit the proper biological functioning of the elements, such as the promoter directing transcription of the coding sequence.
  • inducible promoter refers to a promoter that responds to environmental factors and/or external stimuli that can be artificially controlled in order to modify the expression of, or the level of expression of, a polynucleotide sequence or refers to a combination of elements, for example an exogenous promoter and an additional element such as a trans-activator operably linked to a separate promoter.
  • An inducible promoter may respond to abiotic factors such as oxygen levels or to chemical or biological molecules. In some embodiments, the chemical or biological molecules may be molecules not naturally present in humans.
  • vector and “expression vector” refer to a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular polynucleotide sequence in a host cell.
  • An expression vector may be part of a plasmid, viral genome, or nucleic acid fragment.
  • an expression vector includes a polynucleotide to be transcribed, operably linked to a promoter.
  • promoter is used herein to refer to an array of nucleic acid control sequences that direct transcription of a nucleic acid.
  • a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element.
  • a promoter also optionally includes distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription.
  • Other elements that may be present in an expression vector include those that enhance transcription (e.g., enhancers) and terminate transcription (e.g., terminators).
  • “Recombinant” refers to a genetically modified polynucleotide, polypeptide, cell, tissue, or organism.
  • a recombinant polynucleotide (or a copy or complement of a recombinant polynucleotide) is one that has been manipulated using well known methods.
  • a recombinant expression cassette comprising a promoter operably linked to a second polynucleotide can include a promoter that is heterologous to the second polynucleotide as the result of human manipulation (e.g., by methods described in Sambrook et al., Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, (1989) or Current Protocols in Molecular Biology Volumes 1-3, John Wiley & Sons, Inc. (1994-1998)).
  • a recombinant expression cassette typically comprises polynucleotides in combinations that are not found in nature. For instance, human manipulated restriction sites or plasmid vector sequences can flank or separate the promoter from other sequences.
  • a recombinant protein is one that is expressed from a recombinant polynucleotide, and recombinant cells, tissues, and organisms are those that comprise recombinant sequences (polynucleotide and/or polypeptide).
  • heterologous refers to biological material that is introduced, inserted, or incorporated into a recipient (e.g., host) organism that originates from another organism.
  • heterologous material that is introduced into the recipient organism is not normally found in that organism.
  • Heterologous material can include, but is not limited to, nucleic acids, amino acids, peptides, proteins, and structural elements such as genes, promoters, and cassettes.
  • a host cell can be, but is not limited to, a bacterium, a yeast cell, a mammalian cell, or a plant cell. The introduction of heterologous material into a host cell or organism can result, in some instances, in the expression of additional heterologous material in or by the host cell or organism.
  • the transformation of a yeast host cell with an expression vector that contains DNA sequences encoding a bacterial protein may result in the expression of the bacterial protein by the yeast cell.
  • the incorporation of heterologous material may be permanent or transient.
  • the expression of heterologous material may be permanent or transient.
  • reporter and “selectable marker” can be used interchangeably and refer to a gene product that permits a cell expressing that gene product to be identified and/or isolated from a mixed population of cells. Such isolation might be achieved through the selective killing of cells not expressing the selectable marker, which may be, as a non-limiting example, an antibiotic resistance gene.
  • the selectable marker may permit identification and/or subsequent isolation of cells expressing the marker as a result of the expression of a fluorescent protein such as GFP or the expression of a cell surface marker which permits isolation of cells by fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), or analogous methods.
  • Non-limiting examples of cell surface markers include CD8, CD19, and truncated CD19.
  • cell surface markers used for isolating desired cells are non-signaling molecules, such as subunit or truncated forms of CD8, CD19, or CD20. Suitable markers and techniques are known in the art.
  • culture when referring to cell culture itself or the process of culturing, can be used interchangeably to mean that a cell (e.g., yeast cell) is maintained outside its normal environment under controlled conditions, e.g., under conditions suitable for survival.
  • Cultured cells are allowed to survive, and culturing can result in cell growth, stasis, differentiation or division. The term does not imply that all cells in the culture survive, grow, or divide, as some may naturally die or senesce.
  • Cells are typically cultured in media, which can be changed during the course of the culture.
  • the terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
  • administering includes oral administration, topical contact, administration as a suppository, intravenous, intraperitoneal, intramuscular, intralesional, intrathecal, intranasal, or subcutaneous administration to a subject.
  • Administration is by any route, including parenteral and transmucosal (e.g., buccal, sublingual, palatal, gingival, nasal, vaginal, rectal, or transdermal).
  • Parenteral administration includes, e.g., intravenous, intramuscular, intra-arteriole, intradermal, subcutaneous, intraperitoneal, intraventricular, and intracranial.
  • Other modes of delivery include, but are not limited to, the use of liposomal formulations, intravenous infusion, transdermal patches, etc.
  • the term “treating” refers to an approach for obtaining beneficial or desired results including, but not limited to, a therapeutic benefit and/or a prophylactic benefit.
  • compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
  • effective amount or “sufficient amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
  • the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
  • the specific amount may vary depending on one or more of: the particular agent chosen, the host cell type, the location of the host cell in the subject, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, and the physical delivery system in which it is carried.
  • pharmaceutically acceptable carrier refers to a substance that aids the administration of an active agent to a cell, an organism, or a subject.
  • “Pharmaceutically acceptable carrier” refers to a carrier or excipient that can be included in the compositions of the disclosure and that causes no significant adverse toxicological effect on the patient.
  • Non-limiting examples of pharmaceutically acceptable carriers include water, NaCl, normal saline solutions, lactated Ringer’s, normal sucrose, normal glucose, cell culture media, and the like.
  • degrons can be located anywhere in a protein, and can include short amino acid sequences, structural motifs, or exposed amino acids (e.g., lysine, arginine). Degrons exist in both prokaryotic and eukaryotic organisms. Degrons can be classified as being either ubiquitin-dependent or ubiquitin-independent.
  • cellular localization tag refers to an amino acid sequence, also known as a “protein localization signal,” that targets a protein for localization to a specific cellular or subcellular region, compartment, or organelle (e.g., nuclear localization sequence, Golgi retention signal).
  • Cellular localization tags are typically located at either the N-terminal or C-terminal end of a protein.
  • the term “synthetic response element” refers to a recombinant DNA sequence that is recognized by a transcription factor and facilitates gene regulation by various regulatory agents. A synthetic response element can be located within a gene promoter and/or enhancer region.
  • the term “ribozyme” refers to an RNA molecule that is capable of catalyzing a biochemical reaction.
  • ribozymes function in protein synthesis, catalyzing the linking of amino acids in the ribosome.
  • ribozymes participate in various other RNA processing functions, such as splicing, viral replication, and tRNA biosynthesis.
  • ribozymes can be self-cleaving.
  • Non-limiting examples of ribozymes include the HDV ribozyme, the Lariat capping ribozyme (formerly called GIR1 branching ribozyme), the glmS ribozyme, group I and group II self-splicing introns, the hairpin ribozyme, the hammerhead ribozyme, various rRNA molecules, RNase P, the twister ribozyme, the VS ribozyme, the pistol ribozyme, and the hatchet ribozyme.
  • Percent similarity in the context of polynucleotide or peptide sequences, is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the sequence (e.g., an msr locus sequence) in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence which does not comprise additions or deletions, for optimal alignment of the two sequences.
  • the percentage is calculated by determining the number of positions at which the identical nucleotide or amino acid occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of similarity (e.g., sequence similarity).
  • a polynucleotide or peptide has at least about 70% similarity (e.g., sequence similarity), preferably at least about 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% similarity, to a reference sequence, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection, such sequences are then said to be “substantially similar.”
  • this definition also refers to the complement of a test sequence.
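  • The percent similarity calculation described above (matched positions divided by total positions in the comparison window, multiplied by 100) can be sketched as follows. This is an illustration only: it assumes the two sequences have already been optimally aligned by some other tool, that gaps are written as "-", and the function name `percent_similarity` is hypothetical.

```python
def percent_similarity(aligned_test: str, aligned_ref: str) -> float:
    """Percent of window positions at which both aligned sequences carry the same residue.

    Both inputs must already be aligned and therefore have equal length;
    "-" marks a gap and never counts as a match.
    """
    if len(aligned_test) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_test:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for a, b in zip(aligned_test.upper(), aligned_ref.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_test)
```

  • For example, a ten-position window with eight identical positions, one gap, and one mismatch gives 80% similarity, which would fall between the "at least about 70%" and "at least about 85%" thresholds stated above.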
  • For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
  • The sequence comparison algorithm then calculates the percent sequence similarities for the test sequences relative to the reference sequence, based on the program parameters.
  • For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
  • Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat’l. Acad. Sci.
  • Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
  • the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see, e.g., Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)).
  • the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Nat’l. Acad. Sci. USA, 90:5873-5877 (1993)).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
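  • The smallest sum probability thresholds stated above can be read as a tiered classification. The sketch below is illustrative only: the function name `similarity_tier` and the tier labels are hypothetical, and the word "about" in the thresholds is simplified to strict cut-offs at 0.2, 0.01, and 0.001.

```python
def similarity_tier(smallest_sum_probability: float) -> str:
    """Classify a BLAST smallest sum probability P(N) against the stated thresholds."""
    p = smallest_sum_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p < 0.001:
        return "similar (most preferred)"
    if p < 0.01:
        return "similar (more preferred)"
    if p < 0.2:
        return "similar"
    return "not similar"
```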
  • the present disclosure provides compositions and methods for simultaneously introducing genetic modifications at two different target loci in the genome of a host cell.
  • the disclosure provides methods comprising the use of retron-guide RNA cassettes, vectors comprising said cassettes, and retron donor DNA-guide molecules of the present disclosure to modify nucleic acids of interest at target loci of interest, and to screen genetic loci of interest, in the genomes of host cells.
  • the present disclosure also provides compositions and methods for preventing or treating genetic diseases by enhancing precise genome editing to correct a mutation in target genes associated with the diseases. Kits for genome editing and screening are also provided.
  • the present disclosure can be used with any cell type and at any gene locus that is amenable to nuclease-mediated genome editing technology.
  • the present disclosure provides a retron-guide RNA (gRNA) cassette.
  • the cassette comprises: (a) a first retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a first donor DNA sequence located within the msd locus, wherein the first donor DNA sequence comprises homology to one or more sequences within a first target locus; and (v) a second inverted repeat sequence coding region; and (b) a first guide RNA (gRNA) coding region; (c) a second retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a second donor DNA sequence located within the second msd locus, wherein the second donor DNA sequence comprises homology to one
  • the first donor DNA sequence can introduce a genetic modification or edit at the first target locus.
  • the first and second donor DNA sequences can introduce genetic modifications or edits at the first and second target loci.
  • the first donor DNA sequence comprises a genetic variant compared to the sequences within the first target locus.
  • the first and second donor DNA sequences comprise genetic variants compared to the sequences within the first and second target loci, respectively.
  • the first and second donor DNA sequences can introduce genetic modifications at the first and second target loci by HDR.
  • the second donor DNA sequence comprises a sequence having a mutation (or edit) relative to the nucleic acid sequence of the second target locus.
  • the second donor DNA sequence comprises or further comprises a unique barcode sequence.
  • the second donor DNA sequence comprises both a mutation (or edit) relative to the nucleic acid sequence of the second target locus and a unique barcode sequence.
  • the retron-guide RNA (gRNA) cassette can be used to introduce two mutations/edits at the first and second target loci, or to introduce two mutations/edits at the first and second target loci and a unique barcode sequence at the second target locus.
  • the mutations introduced by the first and second donor DNA sequences are different.
  • the barcode sequence comprises a defined sequence that can be distinguished from endogenous sequences by sequencing the target locus.
  • Examples of barcode sequences include random barcodes synthesized with poly-(N) tracts, which are added to the retron-sgRNA cassettes by PCR and associated with the first edit by paired sequencing of cloned plasmid libraries; programmed barcodes of 12-bp sequences that exclude common restriction sites; and retron, sgRNA or next-generation sequencing (NGS) related sequences with a defined Hamming distance between any pair of barcodes.
  • the barcode sequence encodes a detectable molecule, such as a fluorescent protein, a selectable marker, or a cell surface marker.
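  • One way to picture programmed 12-bp barcodes that avoid restriction sites and keep a defined Hamming distance from one another is the sketch below. It is not the method of the disclosure: the rejection-sampling strategy, the minimum distance of 3, the particular excluded sites (EcoRI, BamHI, HindIII, NotI), and the names `design_barcodes`, `hamming`, and `min_pairwise_distance` are all assumptions made for illustration.

```python
import itertools
import random

# Hypothetical exclusion list: recognition sites of EcoRI, BamHI, HindIII, NotI.
# All four are palindromic, so scanning one strand also covers the other strand.
EXCLUDED_SITES = ("GAATTC", "GGATCC", "AAGCTT", "GCGGCCGC")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(n: int, length: int = 12, min_distance: int = 3,
                    excluded: tuple[str, ...] = EXCLUDED_SITES,
                    seed: int = 0, max_tries: int = 100_000) -> list[str]:
    """Pick n random barcodes that avoid excluded sites and stay min_distance apart."""
    rng = random.Random(seed)
    chosen: list[str] = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if any(site in candidate for site in excluded):
            continue  # contains an excluded restriction site
        if all(hamming(candidate, other) >= min_distance for other in chosen):
            chosen.append(candidate)
    if len(chosen) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return chosen


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes (needs at least two)."""
    return min(hamming(a, b) for a, b in itertools.combinations(barcodes, 2))
```

  • A larger minimum distance makes barcodes more robust to sequencing errors at the cost of a smaller usable barcode set; the value used here is arbitrary.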
  • compositions and methods described herein provide the ability to introduce two or more edits into the genome of a host cell, where a first edit at the first target locus causes a biological effect that can be monitored by measuring the second edit at the second target locus.
  • the first edit comprises an eQTL variant edit that affects expression/transcription of a gene, which can be tracked by the RNA/DNA ratio of the second edit (e.g., by inserting a barcode sequence into the 3’UTR of the gene).
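  • Tracking expression by the RNA/DNA ratio of a barcode can be sketched as follows, assuming that barcode read counts from RNA (cDNA) and genomic DNA sequencing are already available as dictionaries. The function name `rna_dna_ratios`, the library-size normalization, and the pseudocount are assumptions for illustration rather than part of the disclosure.

```python
def rna_dna_ratios(rna_counts: dict[str, int], dna_counts: dict[str, int],
                   pseudocount: float = 0.5) -> dict[str, float]:
    """Per-barcode ratio of RNA read frequency to DNA read frequency.

    Counts are normalized by the total reads in each library so that the ratio
    is not driven by sequencing depth. Barcodes absent from the DNA library
    are skipped because no ratio can be computed for them.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries must contain reads")
    ratios: dict[str, float] = {}
    for barcode, dna in dna_counts.items():
        if dna == 0:
            continue
        rna = rna_counts.get(barcode, 0)
        rna_freq = (rna + pseudocount) / rna_total
        dna_freq = (dna + pseudocount) / dna_total
        ratios[barcode] = rna_freq / dna_freq
    return ratios
```

  • Under these assumptions, a barcode linked to a variant that increases transcription would show a higher ratio than a barcode linked to the reference allele.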
  • the first edit at the first target locus affects the phenotype of a cell, such as cell physiology or growth, cultured in a media comprising a test compound or drug, where the phenotype can be monitored by determining the number of copies of a DNA barcode inserted at the second target locus measured at different timepoints during growth in the media comprising the test compound or drug.
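  • Monitoring a growth phenotype from barcode copy numbers at different time points can be sketched as a log2 fold change in barcode frequency between two samples. This is an illustrative calculation under stated assumptions: the function name `log2_fold_changes` and the pseudocount are hypothetical, and the disclosure does not specify a particular statistic.

```python
import math


def log2_fold_changes(counts_t0: dict[str, int], counts_t1: dict[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """log2 of (barcode frequency at the later time point / frequency at the earlier one)."""
    barcodes = set(counts_t0) | set(counts_t1)
    if not barcodes:
        raise ValueError("no barcodes supplied")
    # The pseudocount is added to every barcode, so totals grow by pseudocount * n.
    total0 = sum(counts_t0.values()) + pseudocount * len(barcodes)
    total1 = sum(counts_t1.values()) + pseudocount * len(barcodes)
    result: dict[str, float] = {}
    for barcode in barcodes:
        f0 = (counts_t0.get(barcode, 0) + pseudocount) / total0
        f1 = (counts_t1.get(barcode, 0) + pseudocount) / total1
        result[barcode] = math.log2(f1 / f0)
    return result
```

  • A positive value indicates that cells carrying the barcode became more frequent during growth in the test medium, and a negative value indicates depletion.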
  • the first edit at the first target locus introduces an amino acid variant in an enzyme, and the second edit inserts a barcode into a gene encoding a substrate of the enzyme.
  • the first edit at the first target locus introduces an amino acid variant into a ubiquitin ligase that affects target protein translation.
  • the first edit can be tracked by sorting cells comprising a barcode and sequences encoding a detectable marker (such as green fluorescent protein (GFP)) integrated at the second target locus, e.g., in sequences encoding the C-terminus of a target protein.
  • the first or second gRNA coding region is upstream of the first or second retron in the cassette such that transcription of the cassette results in a transcript in which the gRNA is 5’ of the RNA transcribed from the retron.
  • transcription products of the retron and the gRNA coding region are physically coupled.
  • the resulting gRNA and donor DNA sequences are also physically coupled (e.g., during genome editing and/or screening).
  • the transcription products are coupled during a single transcription event.
  • the transcription products of the retron and the gRNA coding region are initially coupled, and then subsequently become uncoupled (e.g., after transcription of the retron, or after reverse transcription of the retron transcript), in which case the guide RNA and the donor DNA sequence will also be physically uncoupled during genome editing and/or screening.
  • uncoupling can be induced by a ribozyme.
  • a suitable ribozyme is the hepatitis delta virus (HDV) ribozyme.
  • the cassette further comprises a ribozyme sequence (e.g., HDV ribozyme sequence).
  • the ribozyme sequence encodes a ribozyme selected from the group consisting of hepatitis delta virus (HDV) ribozyme, drz-Agam1-1, drzAgam1-2, drzPmar-1, Twister, Hammerhead and combinations thereof.
  • transcription products of the retron and the gRNA coding region are not initially physically coupled (i.e., the transcription products are created in separate transcription events).
  • the retron and the gRNA coding region can be included in two different retron-gRNA cassettes, which can be included in the same vector or in different vectors.
  • expression from the vector(s) occurs inside a host cell.
  • transcription of the retron and/or the gRNA coding region occurs outside of the host cell, and then the transcription product(s) are introduced into the host cell.
  • the transcription products are created in separate transcription events and are subsequently joined together for genome editing and/or screening, in which case the resulting gRNA and donor DNA sequence will also be physically coupled for genome editing and/or screening. Such joining can occur before or after reverse transcription of the retron transcript (i.e., before or after creation of msDNA from the retron transcript).
  • the transcription products of the retron and the gRNA coding region result in a donor DNA sequence and a gRNA that are never physically coupled.
  • the retron and the gRNA coding region are located in different cassettes and the resulting donor DNA sequence and gRNA act in trans.
  • the gRNA coding region of the cassette is located 3’ of the retron. In other embodiments, the gRNA coding region is located 5’ of the retron. The relative positions of the gRNA coding region and retron may be selected, for example, based upon the particular nuclease being used.
  • the retron-gRNA cassette is at least about 5,000 nucleotides in length.
  • the retron-gRNA cassette is between about 1,000 and 5,000 (i.e., about 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,100, 2,200, 2,300, 2,400, 2,500, 2,600, 2,700, 2,800, 2,900, 3,000, 3,100, 3,200, 3,300, 3,400, 3,500, 3,600, 3,700, 3,800, 3,900, 4,000, 4,100, 4,200, 4,300, 4,400, 4,500, 4,600, 4,700, 4,800, 4,900, or 5,000) nucleotides in length.
  • the cassette is between about 300 and 1,000 (i.e., about 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1,000) nucleotides in length.
  • the cassette is between about 200 and 300 (i.e., about 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, or 300) nucleotides in length.
  • the cassette is between about 30 and 200 (i.e., about 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200) nucleotides in length.
  • the cassette further comprises one or more sequences having homology to a vector cloning site.
  • These vector homology sequences can be about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more nucleotides in length.
  • the vector homology sequences are about 20 nucleotides in length.
  • the vector homology sequences are about 15 nucleotides in length.
  • the vector homology sequences are about 25 nucleotides in length.
  • the present disclosure provides a vector comprising a retron-guide RNA cassette of the present disclosure.
  • the vector further comprises a promoter.
  • the promoter is operably linked to the cassette.
  • the promoter is inducible.
  • the promoter is an RNA polymerase II promoter.
  • the promoter is an RNA polymerase III promoter.
  • a combination of promoters is used.
  • the vector further comprises a terminator sequence.
  • Vectors of the present disclosure can include commercially available recombinant expression vectors and fragments and variants thereof.
  • Vectors of the present disclosure may further comprise a reverse transcriptase (RT) coding sequence and, optionally, may further comprise a nuclear localization sequence (NLS). In some instances, the NLS will be located 5’ of the RT coding sequence.
  • Vectors of the present disclosure can further comprise a nuclease coding sequence. The sequence can encode Cas9, Cpf1, or any other suitable nuclease. Examples of suitable nucleases are provided herein and will also be known to one of skill in the art.
  • expression of the retron-gRNA cassette and the RT coding sequence and/or the nuclease coding sequence can all be under the control of a single promoter.
  • expression of the retron-gRNA cassette and the RT coding sequence and/or the nuclease coding sequence can each be under the control of a different promoter.
  • Other combinations are also possible.
  • expression of the retron-gRNA cassette can be under the control of one promoter, while expression of the RT coding sequence and/or the nuclease coding sequence are under the control of another promoter.
  • expression of the retron-gRNA cassette and expression of the RT coding sequence can be under the control of one promoter, while expression of the nuclease coding sequence can be under the control of another promoter.
  • expression of the retron-gRNA cassette and expression of the nuclease coding sequence can be under the control of one promoter, while the RT coding sequence is under the control of another promoter.
  • one or more of the promoters are inducible.
  • the vector can comprise a retron-gRNA cassette under the control of a Gal7 promoter, an RT coding sequence under the control of a Gal10 promoter, and a nuclease (e.g., Cas9) coding sequence under the control of a Gal1 promoter.
  • a reporter unit that includes a nucleotide sequence encoding a reporter polypeptide (e.g., a detectable polypeptide, fluorescent polypeptide, or a selectable marker (e.g., URA3)).
  • the size of the vector will depend on the size of the individual components within the vector, e.g., retron-gRNA cassette, RT coding sequence, nuclease coding sequence, NLS, and so on.
  • the vector is between about 1,000 and about 20,000 (i.e., about 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, 10,000, 10,500, 11,000, 11,500, 12,000, 12,500, 13,000, 13,500, 14,000, 14,500, 15,000, 15,500, 16,000, 16,500, 17,000, 17,500, 18,000, 18,500, 19,000, 19,500, or 20,000) nucleotides in length.
  • the vector is more than about 20,000 nucleotides in length.
  • the donor DNA sequence is physically coupled to the gRNA, by virtue of the msDNA being physically coupled to the gRNA.
  • at least some of the RNA content of the msDNA is degraded (e.g., by an RNase such as RNase H).
  • Retrons have been known for some time as a class of retroelement, first discovered in gram-negative bacteria such as Myxococcus xanthus (e.g., retrons Mx65 and Mx162), Stigmatella aurantiaca (e.g., retron Sa163), and Escherichia coli (e.g., retrons Ec48, Ec67, Ec73, Ec78, Ec83, Ec86, and Ec107).
  • Retrons are also found in Salmonella typhimurium (e.g., retron St85), Salmonella enteritidis, Vibrio cholerae (e.g., retron Vc95), Vibrio parahaemolyticus (e.g., retron Vp96), Klebsiella pneumoniae, Proteus mirabilis, Xanthomonas campestris, Rhizobium sp., Bradyrhizobium sp., Ralstonia metallidurans, Nannocystis exedens (e.g., retron Ne144), Geobacter sulfurreducens, Trichodesmium erythraeum, Nostoc punctiforme, Nostoc sp., Staphylococcus aureus, Fusobacterium nucleatum, and Flexibacter elegans.
  • the present disclosure provides for retron-guide RNA cassettes that comprise a retron.
  • the retron is derived from the E. coli retron Ec86, which is shown in FIG. 2.
  • Retrons mediate the synthesis in host cells of multicopy single-stranded DNA (msDNA) molecules, which result from the reverse transcription of a retron transcript and typically include a DNA component and an RNA component.
  • the native msDNA molecules reportedly exist as single-stranded DNA-RNA hybrids, characterized by a structure which comprises a single-stranded DNA branching out of an internal guanosine residue of a single-stranded RNA molecule at a 2′,5′-phosphodiester linkage.
  • RNA content of the msDNA molecule is degraded. In some instances, the RNA content is degraded by RNase H.
  • Native retrons have been found to consist of the gene for reverse transcriptase (RT) and msr and msd loci under the control of a single promoter.
  • a vector comprising a retron-guide RNA cassette further comprises a sequence encoding an RT.
  • methods are provided wherein the RT is encoded on a separate plasmid from the retron-guide RNA cassette.
  • the RT is encoded in a sequence that has been integrated into the host cell genome.
  • the msd region of a retron transcript typically codes for the DNA component of msDNA
  • the msr region of a retron transcript typically codes for the RNA component of msDNA.
  • the msr and msd loci have overlapping ends, and may be oriented opposite one another with a promoter located upstream of the msr locus which transcribes through the msr and msd loci.
  • the sequence of the msd locus will vary, depending on the particular donor DNA sequence that is located within the msd locus.
  • the msd and msr regions of retron transcripts generally contain first and second inverted repeat sequences, which together make up a stable stem structure.
  • the combined msr-msd region of the retron transcript serves not only as a template for reverse transcription but, by virtue of its secondary structure, also serves as a primer (i.e., self-priming) for msDNA synthesis by a reverse transcriptase.
  • the first inverted repeat sequence coding region is located within the 5’ end of the msr locus.
  • the second inverted repeat sequence coding region is located 3’ of the msd locus.
  • the first inverted repeat sequence is located within the 5’ end of the msr region.
  • the second inverted repeat sequence is located 3’ of the msd region.
  • a non-limiting example is shown in FIG. 4, wherein the msr and msd loci are arranged in opposite orientations.
  • the first inverted sequence repeat coding region is shown at the 5’ end of the cassette, while the second inverted sequence repeat coding region is shown near the 3’ end of the cassette.
  • sequence of an inverted repeat sequence coding region can be varied, so long as the sequence of the counterpart inverted repeat sequence coding region within the same retron is also varied such that the two resulting inverted repeat sequences (i.e., present within a retron transcript) are complementary and allow for the formation of a stable stem structure.
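  • The pairing constraint on the two inverted repeats can be sketched as a check that the second repeat is the reverse complement of the first. This is a minimal illustration, not the method of the disclosure: the function names are hypothetical, only Watson-Crick pairs are considered (no G-U wobble), and perfect complementarity is assumed even though a stable stem may tolerate some mismatches.

```python
# Hypothetical helpers; RNA alphabet, Watson-Crick pairing only (assumption).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def can_form_stem(first_repeat: str, second_repeat: str) -> bool:
    """True when the two repeats are perfectly complementary (antiparallel)."""
    return second_repeat.upper() == reverse_complement(first_repeat)


def compensate(first_repeat_variant: str) -> str:
    """Return the counterpart repeat that restores pairing after the
    first repeat has been varied."""
    return reverse_complement(first_repeat_variant)
```

  • For example, if the first repeat is changed from AUGC to AAGC, `compensate("AAGC")` gives the matching second repeat, so the pair still passes `can_form_stem`.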
  • Any number of RTs may be used in alternative embodiments of the present disclosure, including prokaryotic and eukaryotic RTs. If desired, the nucleotide sequence of a native RT may be modified, for example using known codon optimization techniques, so that expression within the desired host is optimized.
  • the RT may be targeted to the nucleus so that efficient utilization of the RNA template may take place.
  • An example of such a RT includes any known RT, either prokaryotic or eukaryotic, fused to a nuclear localization sequence or signal (NLS).
  • the vector further comprises an NLS.
  • the NLS is located 5’ of the RT coding sequence.
  • any suitable NLS may also be used, providing that the NLS assists in localizing the RT within the nucleus.
  • an RT lacking an NLS may also be used if the RT is present within the nuclear compartment at a level that synthesizes a product from the RNA template.
  • the retron-guide RNA cassettes and retron donor DNA-guide molecules of the present disclosure comprise guide RNA (gRNA) coding regions and gRNA molecules, respectively.
  • the gRNAs for use in the CRISPR-retron system of the present disclosure typically include a crRNA sequence that is complementary to a target nucleic acid sequence and may include a scaffold sequence (e.g., tracrRNA) that interacts with a Cas nuclease (e.g., Cas9) or a variant or fragment thereof, depending on the particular nuclease being used.
  • the gRNA can comprise any nucleic acid sequence having sufficient complementarity with a target polynucleotide sequence (e.g., target DNA sequence) to hybridize with the target sequence and direct sequence-specific binding of a nuclease to the target sequence.
  • the gRNA may recognize a protospacer adjacent motif (PAM) sequence that may be near or adjacent to the target DNA sequence.
  • the target DNA site may lie immediately 5’ of a PAM sequence, which is specific to the bacterial species of the Cas9 used.
  • the PAM sequence of Streptococcus pyogenes-derived Cas9 is NGG; the PAM sequence of Neisseria meningitidis-derived Cas9 is NNNNGATT; the PAM sequence of Streptococcus thermophilus-derived Cas9 is NNAGAA; and the PAM sequence of Treponema denticola-derived Cas9 is NAAAAC.
  • the PAM sequence can be 5’-NGG, wherein N is any nucleotide; 5’-NRG, wherein N is any nucleotide and R is a purine; or 5’-NNGRR, wherein N is any nucleotide and R is a purine.
  • the selected target DNA sequence should immediately precede (i.e., be located 5’ of) a 5’-NGG PAM, wherein N is any nucleotide, such that the guide sequence of the DNA-targeting RNA (e.g., gRNA) base pairs with the opposite strand to mediate cleavage at about 3 base pairs upstream of the PAM sequence.
  • the target DNA site may lie immediately 3’ of a PAM sequence, e.g., when the Cpf1 endonuclease is used.
  • the PAM sequence is 5’-TTTN, where N is any nucleotide.
  • the target DNA sequence (i.e., the genomic DNA sequence having complementarity for the gRNA) will typically follow (i.e., be located 3’ of) the PAM sequence.
  • Two Cpf1-family nucleases, AsCpf1 (from Acidaminococcus) and LbCpf1 (from Lachnospiraceae), are known to function in human cells. Both AsCpf1 and LbCpf1 cut 19 bp after the PAM sequence on the targeted strand and 23 bp after the PAM sequence on the opposite strand of the DNA molecule.
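  • The PAM placement rule for a 5’-NGG PAM can be sketched as a scan for candidate target sites. This is a minimal illustration under stated assumptions: the function name and the 20-nucleotide default are hypothetical choices, only the strand given as input is scanned (the other strand would need the reverse complement), and the cut position simply applies the "about 3 base pairs upstream of the PAM" rule stated above.

```python
import re


def find_ngg_targets(dna: str, protospacer_len: int = 20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given
    strand that has a full-length protospacer immediately 5' of it."""
    dna = dna.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough sequence 5' of the PAM
        protospacer = dna[pam_start - protospacer_len:pam_start]
        # The break falls between positions cut_index - 1 and cut_index.
        cut_index = pam_start - 3
        hits.append((protospacer, match.group(1), cut_index))
    return hits
```

  • A Cpf1-style search would instead look for a 5’-TTTN PAM and take the target sequence 3’ of it; that variant is not shown here.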
  • the degree of complementarity between a guide sequence of the gRNA (i.e., crRNA sequence) and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
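  • The degree of complementarity between a guide sequence and its target can be illustrated with a simple ungapped comparison. This sketch is a stand-in for, not an implementation of, the alignment algorithms listed above: it assumes the guide and the target strand have equal length and are already optimally aligned without gaps, and the function name is hypothetical.

```python
# Allowed (RNA base, DNA base) Watson-Crick pairs; wobble pairs are ignored (assumption).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide_rna: str, target_strand_dna: str) -> float:
    """Percent of guide positions that base pair with the target strand.

    Both sequences are written 5' to 3'; the target is reversed so that the
    comparison is antiparallel. No gaps are considered.
    """
    guide = guide_rna.upper()
    target = target_strand_dna.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, target))
    return 100.0 * paired / len(guide)
```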
  • a crRNA sequence is about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some instances, a crRNA sequence is about 20 nucleotides in length. In other instances, a crRNA sequence is about 15 nucleotides in length. In other instances, a crRNA sequence is about 25 nucleotides in length.
  • The nucleotide sequence of a modified gRNA can be selected using any of the web-based software described above.
  • Considerations for selecting a DNA-targeting RNA include the PAM sequence for the nuclease (e.g., Cas9 or Cpf1) to be used, and strategies for minimizing off-target modifications.
  • Tools such as the CRISPR Design Tool, can provide sequences for preparing the gRNA, for assessing target modification efficiency, and/or assessing cleavage at off-target sites.
  • the gRNA molecule is about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, or more nucleotides in length.
  • the gRNA is about 100 nucleotides in length.
  • the gRNA is about 90 nucleotides in length.
  • the gRNA is about 110 nucleotides in length.
  • the present disclosure provides retron-guide RNA cassettes comprising a retron that comprises a donor DNA sequence.
  • the present disclosure provides retron donor DNA-guide molecules comprising retron transcripts that comprise donor DNA sequence coding regions, the retron transcripts subsequently being reverse transcribed to yield msDNA that comprises a donor DNA sequence.
  • the donor DNA sequence or sequences participate in homology-directed repair (HDR) of genetic loci of interest following cleavage of genomic DNA at the genetic locus or loci of interest (i.e., after a nuclease has been directed to cut at a specific genetic locus of interest, targeted by binding of gRNA to a target sequence).
  • the recombinant donor repair template (i.e., donor DNA sequence) comprises two homology arms that are homologous to portions of the sequence of the genetic locus of interest at either side of a Cas nuclease (e.g., Cas9 or Cpf1 nuclease) cleavage site.
  • the homology arms may be the same length or may have different lengths.
  • each homology arm has at least about 70 to about 99 percent similarity (i.e., at least about 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99 percent similarity) to a portion of the sequence of the genetic locus of interest at either side of a nuclease (e.g., Cas nuclease) cleavage site.
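  • The layout of a donor repair template with two homology arms flanking the cleavage site can be sketched as follows. This is an illustrative construction only: the function names and parameters are hypothetical, the arms are copied directly from the locus sequence (i.e., 100 percent similarity), and the similarity measure is a plain ungapped identity rather than any particular alignment method.

```python
def build_donor(locus: str, cut_index: int, left_len: int, right_len: int,
                insert: str = "") -> str:
    """Assemble left arm + optional insert + right arm around a cut site.

    cut_index is the position in `locus` immediately 3' of the break.
    The two arms may have the same or different lengths.
    """
    if cut_index - left_len < 0 or cut_index + right_len > len(locus):
        raise ValueError("homology arms run past the ends of the locus")
    left_arm = locus[cut_index - left_len:cut_index]
    right_arm = locus[cut_index:cut_index + right_len]
    return left_arm + insert + right_arm


def percent_similarity(arm: str, reference: str) -> float:
    """Ungapped percent identity between an arm and its reference segment."""
    if not arm or len(arm) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(arm.upper(), reference.upper()))
    return 100.0 * same / len(arm)
```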
  • the recombinant donor repair template comprises or further comprises a reporter unit that includes a nucleotide sequence encoding a reporter polypeptide (e.g., a detectable polypeptide, fluorescent polypeptide, or a selectable marker). If present, the two homology arms can flank the reporter cassette and are homologous to portions of the genetic locus of interest at either side of the Cas nuclease cleavage site.
  • the reporter unit can further comprise a sequence encoding a self-cleavage peptide, one or more nuclear localization signals, and/or a fluorescent polypeptide (e.g., superfolder GFP (sfGFP)). Other suitable reporters are described herein.
  • the donor DNA sequence is at least about 500 to 10,000 (i.e., at least about 500, 600, 700, 800, 900, 1,000, 1,100, 1,200, 1,300, 1,400, 1,500, 1,600, 1,700, 1,800, 1,900, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, or 10,000) nucleotides in length.
  • the donor DNA sequence is between about 600 and 1,000 (i.e., about 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, or 1,000) nucleotides in length.
  • the donor DNA sequence is between about 100 and 500 (i.e., about 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, or 500) nucleotides in length.
  • the donor DNA sequence is less than about 100 (i.e., less than about 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, or 5) nucleotides in length.
  • the donor DNA sequence in the second retron comprises a barcode sequence.
  • the barcode sequence comprises a defined sequence that can be distinguished from endogenous sequences by sequencing the target locus.
  • Exemplary barcode sequences include random barcodes synthesized with poly-(N) tracts, which are added to the retron-sgRNA cassettes by PCR and associated with the first edit by paired sequencing of cloned plasmid libraries; programmed barcodes of 12-bp sequences that exclude common restriction sites; and retron, sgRNA or next-generation sequencing (NGS) related sequences with a defined Hamming distance between any pair of barcodes.
  • the barcode sequence encodes a detectable molecule, such as a fluorescent protein, a selectable marker, or a cell surface marker.
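  • The two screening criteria for programmed barcodes mentioned above (exclusion of common restriction sites and a defined Hamming distance between any pair of barcodes) can be sketched as simple checks. The function names are hypothetical, and the two restriction sites (EcoRI GAATTC and BamHI GGATCC) are an illustrative subset chosen for this example, not a list taken from the disclosure.

```python
from itertools import combinations

# Illustrative subset of common restriction sites (assumption): EcoRI, BamHI.
RESTRICTION_SITES = ("GAATTC", "GGATCC")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes) -> int:
    """Smallest Hamming distance over all pairs in a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def is_usable(barcode: str, sites=RESTRICTION_SITES) -> bool:
    """True when the barcode contains none of the listed restriction sites."""
    return not any(site in barcode.upper() for site in sites)
```

  • A barcode set would be accepted when every member passes `is_usable` and `min_pairwise_hamming` meets the chosen threshold; a larger threshold lets more sequencing errors be tolerated before two barcodes are confused.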
  • the compositions and methods of the disclosure can be used to introduce genetic modifications anywhere in the genomic or chromosomal DNA of a cell, or in exogenous (non-host cell) DNA, such as the DNA of transgenes, viruses or transposons.
  • the exogenous DNA is present in the nucleus of a host cell.
  • the exogenous DNA is integrated into the host cell genomic DNA, for example as a transgene.
  • the compositions and methods of the disclosure can be used to modify a heterologous or exogenous genome, such as a viral genome, a bacterial genome, a transposable element or an endovirus genome that are not part of the endogenous host cell genome.
  • the compositions and methods of the disclosure can be used to modify a heterologous or exogenous genome of a pathogen, such as a virus or bacteria, that is present in the host cell.
  • the target locus is located in heterologous or exogenous DNA that is not integrated into the host cell genomic DNA, such as transiently expressed transgenes, episomes or plasmids.
  • the method identifies a genetic modification at a target locus within a genome of a host cell, where the genome comprises the endogenous genomic chromosomal DNA of the host cell. In some embodiments, the method identifies a genetic modification at a target locus anywhere within a genome of a host cell.
  • the target locus is located in an exogenous genome that is present in a host cell, such as a viral genome, a bacterial genome, a transposable element or an endovirus genome that are not part of the endogenous host cell genome.
  • the target locus is located in heterologous or exogenous DNA, such as the DNA of transgenes, viruses or transposons, that are present in the host cell or host cell nucleus.
  • the target locus is located in heterologous or exogenous DNA that is integrated into the host cell genomic DNA.
  • the retron-guide RNA cassette comprises a first donor DNA sequence having homology to one or more sequences within a first target locus, and a second donor DNA sequence located within the second msd locus, wherein the second donor DNA sequence comprises homology to one or more sequences within a second target locus and a unique barcode sequence, where the first and second target loci are located within the genomic DNA of a host cell.
  • the retron-guide RNA cassette comprises a first donor DNA sequence having homology to one or more sequences within a first target locus, and a second donor DNA sequence located within the second msd locus, wherein the second donor DNA sequence comprises homology to one or more sequences within a second target locus and a unique barcode sequence, where the first and second target loci are located within exogenous or heterologous DNA that is present in a host cell or organism.
  • the first and second target loci are located within exogenous or heterologous DNA that is integrated in the host cell genomic DNA.
  • the first and second target loci are located within exogenous or heterologous DNA that is not integrated in the host cell genomic DNA.
  • the first target locus is located in cis to the second target locus.
  • the first and second target loci are located on the same chromosome, in the same gene, or adjacent to or within the same transcription unit.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located at a different position in the transcription unit.
  • the first target locus is located upstream or 5’ of a gene or transcription unit, and the second target locus is located downstream or 3’ of a gene or transcription unit.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located in the 3’ untranslated region (UTR) of the same transcription unit.
  • the first and/or second target locus is located in an intron or non-coding RNA expressed by a gene.
  • the first donor DNA sequence in the retron cassette comprises a genetic variant, such as a single nucleotide polymorphism, missense mutation, synonymous mutation, nonsense mutation, insertion, or a deletion, relative to the sequence at the first target locus.
  • the genetic variant comprises a cis-expression quantitative trait locus (cis-eQTL) variant at the first target locus.
  • the first target locus is located in trans to the second target locus.
  • the first and second target loci are located on different chromosomes or in different genes.
  • the first target locus is located in a trans-regulatory element, and the second target locus is located in a gene, or in a transcription unit that is in trans to the first target locus.
  • the first target locus is located in a trans-regulatory element, and the second target locus is located in the 3’ untranslated region (UTR) of a transcription unit in trans to the first target locus.
  • the first donor DNA sequence in the retron cassette comprises a genetic variant compared to the sequences within the first target locus.
  • the genetic variant comprises an amino acid change in a transcription factor that regulates the expression (e.g., transcription) of another gene or transcript.
  • the genetic variant comprises a mutation in a transcription factor binding site that modifies the expression of a gene or transcript located in cis or trans to the second target locus.
  • the genetic variant comprises a trans-expression quantitative trait locus (trans-eQTL) variant at the first target locus.
  • multiple rounds of genetic targeting are performed on the same pool of cells, or a single cell that has a genetic modification at a target locus.
  • the first round of genetic editing can introduce a genetic modification at a first target locus and a barcode sequence at a second target locus.
  • a second genetic modification can be introduced at the same first target locus or a different (third) target locus and a new genetic modification in the barcode sequence, or a new unique barcode sequence, is introduced at the second target locus.
  • the consecutive barcodes can be identified by NGS or Sanger sequencing.
  • the barcodes could encode different fluorescent markers and the combinations of markers can be determined by flow cytometry or fluorescence microscopy.
  • the barcodes could encode different peptides and the combinations of peptides can be determined by mass spectrometry.
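  • When barcodes are read out by sequencing the second target locus, each read can be mapped back to the edit associated with its barcode. The sketch below is illustrative only: the flanking sequences, barcode length, and barcode-to-edit table are hypothetical inputs (in practice the table would come from paired sequencing of the cloned plasmid library), and only exact barcode matches are counted.

```python
from collections import Counter


def extract_barcode(read: str, left_flank: str, right_flank: str,
                    barcode_len: int):
    """Return the barcode between the two flanks, or None if not found.

    Uses the first occurrence of the left flank in the read.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = start + barcode_len
    if read[end:end + len(right_flank)] != right_flank:
        return None
    return read[start:end]


def count_edits(reads, barcode_to_edit, left_flank, right_flank, barcode_len):
    """Count reads per edit using an exact barcode lookup."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read, left_flank, right_flank, barcode_len)
        if barcode in barcode_to_edit:
            counts[barcode_to_edit[barcode]] += 1
    return counts
```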
  • the second target locus corresponds to a region of the genome that is transcriptionally competent but is not likely to cause adverse effects on cells resulting from mutated or inserted DNA, often referred to as “safe-harbors.”
  • the second target locus is i) located in an intron or ii) is not located in genomic sequences that regulate transcription or translation of a gene.
  • the second target locus comprises the yeast S. cerevisiae YBR209W locus described in Levy SF, et al., Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature. 2015 Mar 12;519(7542):181-6. doi: 10.1038/nature14279.
  • the second target locus comprises the human AAVS1 locus (also known as the PPP1R12C locus) on chromosome 19.
  • AAVS1 is a well-validated “safe harbor” for inserting DNA transgenes with expected function. It has an open chromatin structure and is transcription-competent. Most importantly, there are no known adverse effects on cells resulting from the inserted DNA fragment of interest. See the internet at www.genecopoeia.com/product/aavs1-safe-harbor/.
  • the CRISPR/Cas system of genome modification includes a Cas nuclease (e.g., Cas9 or Cpf1 nuclease) or a variant or fragment or combination thereof and a DNA-targeting RNA (e.g., guide RNA (gRNA)).
  • the gRNA may contain a guide sequence that targets the Cas nuclease to the target genomic DNA and a scaffold sequence that interacts with the Cas nuclease (e.g., tracrRNA).
  • the system may optionally include a donor repair template.
  • a fragment of a Cas nuclease or a variant thereof with desired properties can be used.
  • the donor repair template can include a nucleotide sequence encoding a reporter polypeptide such as a fluorescent protein or an antibiotic resistance marker, and homology arms that are homologous to the target DNA and flank the site of gene modification.
  • the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated protein) nuclease system is an engineered nuclease system based on a bacterial system that can be used for genome engineering.
  • the crRNA then associates, through a region of partial complementarity, with another type of RNA called tracrRNA to guide the Cas (e.g., Cas9) nuclease to a region homologous to the crRNA in the target DNA called a “protospacer.”
  • the Cas (e.g., Cas9) nuclease cleaves the DNA to generate blunt ends at the double-strand break at sites specified by a 20-nucleotide guide sequence contained within the crRNA transcript.
  • the Cas (e.g., Cas9) nuclease may require both the crRNA and the tracrRNA for site-specific DNA recognition and cleavage.
  • This system has now been engineered such that the crRNA and tracrRNA, if needed, can be combined into one molecule (the “single guide RNA” or “sgRNA”), and the crRNA equivalent portion of the guide RNA can be engineered to guide the Cas (e.g., Cas9) nuclease to target any desired sequence (see, e.g., Jinek et al. (2012) Science, 337:816-821; Jinek et al. (2013) eLife, 2:e00471; Segal (2013) eLife, 2:e00563).
  • the CRISPR/Cas system can be engineered to create a double-strand break at a desired target in a genome of a cell, and harness the cell’s endogenous mechanisms to repair the induced break by homology-directed repair (HDR) or nonhomologous end-joining (NHEJ).
  • the Cas nuclease can direct cleavage of one or both strands at a location in a target DNA sequence.
  • the Cas nuclease can be a nickase having one or more inactivated catalytic domains that cleaves a single strand of a target DNA sequence.
  • Non-limiting examples of Cas nucleases include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, Cpf1, homologs thereof, variants thereof, fragments thereof, mutants thereof, derivatives thereof, and combinations thereof.
  • There are three main types of Cas nucleases (type I, type II, and type III), and 10 subtypes including 5 type I, 3 type II, and 2 type III proteins (see, e.g., Hochstrasser and Doudna, Trends Biochem Sci, 2015:40(1):58-66).
  • Type II Cas nucleases include Cas1, Cas2, Csn2, Cas9, and Cpf1. These Cas nucleases are known to those skilled in the art.
  • the amino acid sequence of the Streptococcus pyogenes wild-type Cas9 polypeptide is set forth, e.g., in NCBI Ref. Seq. No. NP_269215, and the amino acid sequence of Streptococcus thermophilus wild-type Cas9 polypeptide is set forth, e.g., in NCBI Ref. Seq. No. WP_011681470. Furthermore, the amino acid sequence of Acidaminococcus sp. BV3L6 is set forth, e.g., in NCBI Ref. Seq. No. WP_021736722.1.
  • Cas nucleases can be derived from a variety of bacterial species including, but not limited to, Veillonella atypica, Fusobacterium nucleatum, Filifactor alocis, Solobacterium moorei, Coprococcus catus, Treponema denticola, Peptoniphilus duerdenii, Catenibacterium mitsuokai, Streptococcus mutans, Listeria innocua, Staphylococcus pseudintermedius, Acidaminococcus intestini, Olsenella uli, Oenococcus kitaharae, Bifidobacterium bifidum, Lactobacillus rhamnosus, Lactobacillus gasseri, Finegoldia magna, Mycoplasma mobile, Mycoplasma gallisepticum, Mycoplasma ovipneumoniae, Mycoplasma canis, Myco
  • Torquens Ilyobacter polytropus, Ruminococcus albus, Akkermansia muciniphila, Acidothermus cellulolyticus, Bifidobacterium longum, Bifidobacterium dentium, Corynebacterium diphtheria, Elusimicrobium minutum, Nitratifractor salsuginis, Sphaerochaeta globus, Fibrobacter succinogenes subsp.
  • Cpf1 refers to an RNA-guided double-stranded DNA-binding nuclease protein that is a type II Cas nuclease.
  • Wild-type Cpf1 contains a RuvC-like endonuclease domain similar to the RuvC domain of Cas9, but does not have an HNH endonuclease domain and the N-terminal region of Cpf1 does not have the alpha-helix recognition lobe possessed by Cas9.
  • the wild-type protein requires a single RNA molecule, as no tracrRNA is necessary.
  • Wild-type Cpf1 creates staggered-end cuts and utilizes a T-rich protospacer-adjacent motif (PAM) that is 5’ of the guide RNA targeting sequence.
  • Cas9 refers to an RNA-guided double-stranded DNA-binding nuclease protein or nickase protein that is a type II Cas nuclease. Wild-type Cas9 nuclease has two functional domains, e.g., RuvC and HNH, that cut different DNA strands. The wild-type enzyme requires two RNA molecules (e.g., a crRNA and a tracrRNA), or alternatively, a single fusion molecule (e.g., a gRNA comprising a crRNA and a tracrRNA).
  • Wild-type Cas9 utilizes a G-rich protospacer-adjacent motif (PAM) that is 3’ of the guide RNA targeting sequence and creates double-strand cuts having blunt ends. Cas9 can induce double-strand breaks in genomic DNA (target DNA) when both functional domains are active.
  • the Cas9 enzyme can comprise one or more catalytic domains of a Cas9 protein derived from bacteria belonging to the group consisting of Corynebacter, Sutterella, Legionella, Treponema, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphylococcus, Nitratifractor, and Campylobacter.
  • the two catalytic domains are derived from different bacteria species.
  • Useful variants of the Cas9 nuclease can include a single inactive catalytic domain, such as a RuvC- or HNH- enzyme or a nickase.
  • a Cas9 nickase has only one active functional domain and can cut only one strand of the target DNA, thereby creating a single-strand break or nick.
  • a double-strand break can be introduced using a Cas9 nickase if at least two DNA-targeting RNAs that target opposite DNA strands are used.
  • a double-nicked induced double-strand break can be repaired by NHEJ or HDR (Ran et al., 2013, Cell, 154:1380-1389).
  • This gene editing strategy favors HDR and decreases the frequency of insertion/deletion (“indel”) mutations at off-target DNA sites.
  • Cas9 nucleases or nickases are described in, for example, U.S. Patent Nos. 8,895,308; 8,889,418; and 8,865,406 and U.S. Application Publication Nos. 2014/0356959, 2014/0273226 and 2014/0186919.
  • the Cas9 nuclease or nickase can be codon-optimized for the host cell or host organism.
  • the Cas nuclease can be a Cas9 fusion protein such as a polypeptide comprising the catalytic domain of a restriction enzyme (e.g., FokI) linked to dCas9.
  • FokI-dCas9 fusion protein fCas9
  • fCas9 can use two guide RNAs to bind to a single strand of target DNA to generate a double-strand break.
  • a nucleotide sequence encoding the Cas nuclease is present in a recombinant expression vector.
  • the recombinant expression vector is a viral construct, e.g., a recombinant adeno-associated virus construct, a recombinant adenoviral construct, a recombinant lentiviral construct, etc.
  • viral vectors can be based on vaccinia virus, poliovirus, adenovirus, adeno-associated virus, SV40, herpes simplex virus, human immunodeficiency virus, and the like.
  • a retroviral vector can be based on Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, mammary tumor virus, and the like.
  • Useful expression vectors are known to those of skill in the art, and many are commercially available. The following vectors are provided by way of example for eukaryotic host cells: pXT1, pSG5, pSVK3, pBPV, pMSG, and pSVLSV40.
  • any other vector may be used if it is compatible with the host cell.
  • useful expression vectors containing a nucleotide sequence encoding a Cas9 enzyme are commercially available from, e.g., Addgene, Life Technologies, Sigma-Aldrich, and Origene.
  • any of a number of transcription and translation control elements including promoter, transcription enhancers, transcription terminators, and the like, may be used in the expression vector.
  • Useful promoters can be derived from viruses, or any organism, e.g., prokaryotic or eukaryotic organisms.
  • Promoters may also be inducible (i.e., capable of responding to environmental factors and/or external stimuli that can be artificially controlled).
  • Suitable promoters include, but are not limited to: RNA polymerase II promoters (e.g., pGAL7 and pTEF1), RNA polymerase III promoters (e.g., RPR-tetO, SNR52, and tRNA-tyr), the SV40 early promoter, a mouse mammary tumor virus long terminal repeat (LTR) promoter, an adenovirus major late promoter (Ad MLP), a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6), an enhanced U6 promoter, a human H1 promoter (H1), etc.
  • Suitable terminators include, but are not limited to, SNR52 and RPR terminator sequences, which can be used with transcripts created under the control of an RNA polymerase III promoter. Additionally, various primer binding sites may be incorporated into a vector to facilitate vector cloning, sequencing, genotyping, and the like. As a non-limiting example, the Pci1-Up sequence can be incorporated. Other suitable promoter, enhancer, terminator, and primer binding sequences will readily be known to one of skill in the art.
  • D. Methods for identifying genetic modifications at a target locus [0191]
  • The disclosure also provides methods for identifying a genetic modification at a target locus within the genome of a host cell, or within a heterologous or exogenous genome or DNA present in a host cell.
  • the method comprises transforming the host cell with a vector comprising a retron guide cassette described herein.
  • the method is an in vitro method.
  • the method is an in vivo method.
  • the host cell or transformed progeny of the host cell express a first retron donor DNA-guide molecule comprising a first retron transcript and the first gRNA coding region and a second retron donor DNA-guide molecule comprising a second retron transcript and the second gRNA coding region.
  • the first and second retron transcripts self-prime reverse transcription by a reverse transcriptase (RT) expressed by the host cell or the transformed progeny of the host cell.
  • the first retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the first target locus and comprise sequence modifications compared to the sequences within the first target locus.
  • the first target locus is cut by a nuclease expressed by the host cell or transformed progeny of the host cell, wherein the site of nuclease cutting is specified by the first gRNA.
  • the one or more donor DNA sequences recombine with one or more target nucleic acid sequences to insert, delete, and/or substitute one or more bases of the sequence of the one or more target nucleic acid sequences to induce one or more sequence modifications at the first target locus within the genome.
  • at least a portion of the second retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the second target locus.
  • the second target locus is cut by a nuclease expressed by the host cell or transformed progeny of the host cell, wherein the site of nuclease cutting is specified by the second gRNA.
  • the one or more donor DNA sequences recombine with one or more target nucleic acid sequences to insert a unique barcode sequence at the second target locus.
  • the method comprises detecting the presence of the unique barcode sequence, wherein the presence of the unique barcode sequence indicates the presence of the genetic modification at the first target locus, thereby identifying the genetic modification at the first target locus.
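Because each unique barcode at the second target locus is paired with a known edit at the first target locus, detection reduces to a lookup. The following is a minimal sketch under that assumption; the barcode strings, edit labels, and the function name `call_edits` are hypothetical and serve only to illustrate the lookup.

```python
from collections import Counter


def call_edits(barcode_reads, barcode_to_edit):
    """Count how often each known edit is inferred from sequenced barcodes.

    barcode_reads: barcode sequences read out at the second target locus.
    barcode_to_edit: mapping from each barcode to its paired first-locus edit.
    Reads that carry an unknown barcode are ignored.
    """
    counts = Counter()
    for barcode in barcode_reads:
        edit = barcode_to_edit.get(barcode)
        if edit is not None:
            counts[edit] += 1
    return dict(counts)


# Hypothetical barcode-to-edit table and reads.
table = {"ACGTACGTACGT": "edit_A", "GGGGCCCCAAAA": "edit_B"}
reads = ["ACGTACGTACGT", "ACGTACGTACGT", "TTTTTTTTTTTT", "GGGGCCCCAAAA"]
inferred = call_edits(reads, table)
```

In practice, reads would first be trimmed to the barcode region and might be matched with some tolerance for sequencing errors; exact matching is used here to keep the sketch short.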
  • the first target locus is located in cis to the second target locus.
  • the first and second target loci are located on the same chromosome, in the same gene, or adjacent to or within the same transcription unit.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located at a different position in the transcription unit.
  • the first target locus is located upstream or 5’ of a gene or transcription unit, and the second target locus is located downstream or 3’ of a gene or transcription unit.
  • the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located in the 3’ untranslated region (UTR) of the same transcription unit.
  • the first and/or second target locus is located in an intron or non-coding RNA expressed by a gene.
  • the first donor DNA sequence in the retron cassette comprises a genetic variant, such as a single nucleotide polymorphism, insertion, or a deletion, relative to the sequence at the first target locus.
  • the genetic variant comprises a cis-expression quantitative trait locus (cis-eQTL) variant at the first target locus.
  • the first target locus is located in trans to the second target locus.
  • the first and second target loci are located on different chromosomes or in different genes.
  • the first target locus is located in a trans-regulatory element, and the second target locus is located in a gene, or in a transcription unit that is in trans to the first target locus. In some embodiments, the first target locus is located in a trans-regulatory element, and the second target locus is located in the 3’ untranslated region (UTR) of a transcription unit in trans to the first target locus.
  • the first donor DNA sequence in the retron cassette comprises a genetic variant compared to the sequences within the first target locus. In some embodiments, the genetic variant comprises an amino acid change in a transcription factor that regulates the expression (e.g., transcription) of another gene or transcript.
  • the genetic variant comprises a mutation in a transcription factor binding site that modifies the expression of a gene or transcript located in cis or trans to the second target locus.
  • the genetic variant comprises a trans-expression quantitative trait locus (trans-eQTL) variant at the first target locus.
  • the barcode sequence comprises a defined sequence that can be distinguished from endogenous sequences by sequencing the target locus.
  • Exemplary barcode sequences include random barcodes synthesized with poly-(N) tracts, which are added to the retron-sgRNA cassettes by PCR and associated with the first edit by paired sequencing of cloned plasmid libraries; programmed barcodes of 12-bp sequences that exclude common restriction sites; and retron, sgRNA or next-generation sequencing (NGS) related sequences with a defined Hamming distance between any pair of barcodes.
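One way to obtain programmed 12-bp barcodes that avoid restriction sites and keep a defined Hamming distance from one another is a simple greedy filter over random candidates. The sketch below is illustrative only: the particular restriction sites (EcoRI, BamHI, HindIII), the minimum distance of 3, and the function names are assumptions and do not come from this disclosure.

```python
import random

# Assumed examples of common restriction sites: EcoRI, BamHI, HindIII.
RESTRICTION_SITES = ("GAATTC", "GGATCC", "AAGCTT")


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(n, length=12, min_dist=3, seed=0, max_tries=100_000):
    """Greedily collect up to n barcodes that satisfy both constraints."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == n:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if any(site in cand for site in RESTRICTION_SITES):
            continue  # candidate contains a restriction site
        if all(hamming(cand, bc) >= min_dist for bc in chosen):
            chosen.append(cand)
    return chosen


barcodes = design_barcodes(20)
```

A minimum pairwise distance of 3 means a single sequencing error cannot convert one barcode into another, which is the usual reason for imposing such a constraint.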
  • the barcode sequence encodes a detectable molecule, such as a fluorescent protein, a selectable marker, or a cell surface marker.
  • the second target locus corresponds to a region of the genome that is transcriptionally competent but is not likely to cause adverse effects on cells resulting from mutated or inserted DNA, often referred to as “safe-harbors.”
  • the second target locus is i) located in an intron or ii) is not located in genomic sequences that regulate transcription or translation of a gene.
  • the second target locus comprises the yeast S. cerevisiae YBR209W locus described in Levy SF, et al., Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature. 2015 Mar 12;519(7542):181-6. doi: 10.1038/nature14279.
  • the second target locus comprises the human AAVS1 locus (also known as the PPP1R12C locus) on chromosome 19.
  • detecting the presence of the unique barcode sequence comprises sequencing the genome of the host cell, or detecting a detectable molecule encoded by the barcode sequence.
  • the vector is no longer present in the host cell when detecting the presence of the unique barcode sequence. In some embodiments, the vector is not integrated in the genome of the host cell.
  • the vector can be lost from the host cell or its progeny by dilution during cell division.
  • the vector can be actively removed from the cell.
  • the vector contains a gene that is toxic to the host cell.
  • the vector contains the URA3 marker gene and the cells are treated with 5-Fluoroorotic acid (5-FOA) to selectively cause toxicity to cells that retain the vector.
  • the vector can include a gene that can be used for counter-selection to kill host cells that retain the vector. See Mezzadra R, et al., A Traceless Selection: Counter-selection System That Allows Efficient Generation of Transposon and CRISPR-modified T-cell Products.
  • the vector can encode surface markers that are expressed in vector containing cells following the genetic edits, which can be immobilized by antibodies and discarded. The remaining post-edit cells that lost the transient vector can then be retained for later use.
  • the vector contains sequences that can be targeted by gRNA introduced to the cell post-editing to cut the DNA vector and expose it to exonuclease degradation.
  • greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the host cells comprise both the barcode sequence and the sequence modifications compared to the sequences within the first target locus.
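The percentage thresholds above refer to the fraction of host cells that carry both the barcode and the first-locus sequence modification. As a minimal sketch, assuming each genotyped cell or clone has been scored for the two features independently, that fraction could be computed as follows; the function name and the input layout are hypothetical.

```python
def coupling_fraction(cells):
    """Fraction of genotyped cells that carry both the barcode and the edit.

    cells: iterable of (has_barcode, has_edit) boolean pairs, one per cell
    or clone (an assumed input layout).
    """
    cells = list(cells)
    if not cells:
        raise ValueError("at least one genotyped cell is required")
    both = sum(1 for has_barcode, has_edit in cells if has_barcode and has_edit)
    return both / len(cells)


# Hypothetical genotyping results for four clones.
fraction = coupling_fraction(
    [(True, True), (True, False), (True, True), (False, False)]
)
```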
  • the method steps are repeated by transforming the host cell or progeny thereof with a second vector comprising a second retron-guide RNA cassette to introduce a second pair or combination of edits into the genome of the host cell. This allows multiple edits to be tracked in the same cell or clonal population of transformed cells by detecting the presence and/or expression of the different barcodes inserted into the genome of the host cell.
  • the method further comprises transforming the host cell or progeny thereof with a second vector comprising a second retron-guide RNA cassette comprising: a third retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a third donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a third target locus; and (v) a second inverted repeat sequence coding region; and a third guide RNA (gRNA) coding region; a fourth retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) a second msd locus; (iv) a fourth donor DNA sequence located within the second msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a fourth target locus and
  • the host cell expresses a third retron donor DNA-guide molecule comprising a third retron transcript and the third gRNA coding region and a fourth retron donor DNA-guide molecule comprising a fourth retron transcript and the fourth gRNA coding region.
  • the third and fourth retron transcripts self-prime reverse transcription by a reverse transcriptase (RT) expressed by the host cell or the transformed progeny of the host cell.
  • the third retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the third target locus and comprise sequence modifications compared to the sequences within the third target locus.
  • the third target locus is cut by a nuclease expressed by the host cell or transformed progeny of the host cell, wherein the site of nuclease cutting is specified by the third gRNA.
  • the one or more donor DNA sequences recombine with one or more target nucleic acid sequences to insert, delete, and/or substitute one or more bases of the sequence of the one or more target nucleic acid sequences to induce one or more sequence modifications at the third target locus within the genome.
  • at least a portion of the fourth retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the fourth target locus.
  • the fourth target locus is cut by a nuclease expressed by the host cell or transformed progeny of the host cell, wherein the site of nuclease cutting is specified by the fourth gRNA.
  • the one or more donor DNA sequences recombine with one or more target nucleic acid sequences to insert the second unique barcode sequence at the fourth target locus.
  • the method comprises detecting the presence of the second unique barcode sequence, wherein the presence of the second unique barcode sequence indicates the presence of the genetic modification at the third target locus, thereby identifying the genetic modification at the third target locus.
  • the third target locus is located in cis to the fourth target locus.
  • the third and fourth target loci are located on the same chromosome, in the same gene, or adjacent to or within the same transcription unit.
  • the third target locus is located in a cis-regulatory element of a transcription unit, and the fourth target locus is located at a different position in the transcription unit.
  • the third target locus is located upstream or 5’ of a gene or transcription unit, and the fourth target locus is located downstream or 3’ of a gene or transcription unit.
  • the third target locus is located in a cis-regulatory element of a transcription unit, and the fourth target locus is located in the 3’ untranslated region (UTR) of the same transcription unit.
  • the third and/or fourth target locus is located in an intron or non-coding RNA expressed by a gene.
  • the third donor DNA sequence in the second retron-guide RNA cassette comprises a genetic variant, such as a single nucleotide polymorphism, insertion, or a deletion, relative to the sequence at the third target locus.
  • the genetic variant comprises a cis-expression quantitative trait locus (cis-eQTL) variant at the third target locus.
  • the third target locus is located in trans to the fourth target locus.
  • the third and fourth target loci are located on different chromosomes or in different genes.
  • the third target locus is located in a trans-regulatory element, and the fourth target locus is located in a gene, or in a transcription unit that is in trans to the third target locus. In some embodiments, the third target locus is located in a trans-regulatory element, and the fourth target locus is located in the 3’ untranslated region (UTR) of a transcription unit in trans to the third target locus.
  • the third donor DNA sequence in the retron cassette comprises a genetic variant compared to the sequences within the third target locus. In some embodiments, the genetic variant comprises an amino acid change in a transcription factor that regulates the expression (e.g., transcription) of another gene or transcript.
  • the genetic variant comprises a mutation in a transcription factor binding site that modifies the expression of a gene or transcript located in cis or trans to the fourth target locus.
  • the genetic variant comprises a trans-expression quantitative trait locus (trans-eQTL) variant at the third target locus.
  • the second unique barcode sequence comprises a defined sequence that can be distinguished from endogenous sequences by sequencing the target locus.
  • Exemplary barcode sequences include random barcodes synthesized with poly-(N) tracts, which are added to the retron-sgRNA cassettes by PCR and associated with the first edit by paired sequencing of cloned plasmid libraries; programmed barcodes of 12-bp sequences that exclude common restriction sites; and retron, sgRNA or next-generation sequencing (NGS) related sequences with a defined Hamming distance between any pair of barcodes.
  • the second unique barcode sequence encodes a detectable molecule, such as a fluorescent protein, a selectable marker, or a cell surface marker.
  • the second unique barcode sequence is different than the unique barcode sequence (i.e., the first unique barcode sequence) inserted at the second target locus.
  • the fourth target locus corresponds to a region of the genome that is transcriptionally competent but is not likely to cause adverse effects on cells resulting from mutated or inserted DNA, often referred to as “safe-harbors.”
  • the fourth target locus is i) located in an intron or ii) is not located in genomic sequences that regulate transcription or translation of a gene.
  • the fourth target locus comprises the yeast S.
  • the fourth target locus comprises the human AAVS1 locus (also known as the PPP1R12C locus) on chromosome 19.
  • detecting the presence of the second unique barcode sequence comprises sequencing the genome of the host cell, or detecting a detectable molecule encoded by the barcode sequence.
  • the second vector is no longer present in the host cell when detecting the presence of the unique barcode sequence. In some embodiments, the second vector is not integrated in the genome of the host cell. In some embodiments, the second vector can be lost from the host cell or its progeny by dilution during cell division.
  • [0213] In some embodiments, greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the host cells comprise both the second barcode sequence and the sequence modifications compared to the sequences within the third target locus.
  • [0214] In some embodiments, the methods further comprise detecting or determining the relative expression of transcription from the transcription units comprising genetic variants at the first and third target loci.
  • the relative expression can be determined by quantifying the amount of the barcode sequence and determining the relative ratio of transcript sequences to barcode sequences.
  • the amount of the barcode sequence is measured by performing RT-qPCR assays using primers that amplify the barcode sequence.
  • the amount of the barcode sequence is determined by next generation sequencing (NGS).
  • transcript abundance is determined by measuring or quantifying the amount of a detectable marker encoded by the barcode.
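The relative-expression calculation described above, a ratio of transcript sequences to barcode sequences, can be sketched as a per-barcode division. The sketch assumes that both quantities are available as read counts keyed by barcode (for example, from NGS); the function name, the input layout, and the example counts are hypothetical.

```python
def relative_expression(transcript_counts, barcode_counts):
    """Ratio of transcript reads to barcode reads for each barcode.

    transcript_counts: reads of the transcript associated with each barcode.
    barcode_counts: reads of the barcode sequence itself.
    Barcodes with zero barcode reads are skipped to avoid dividing by zero.
    """
    ratios = {}
    for barcode, n_barcode in barcode_counts.items():
        if n_barcode == 0:
            continue
        ratios[barcode] = transcript_counts.get(barcode, 0) / n_barcode
    return ratios


# Hypothetical read counts for three barcodes.
ratios = relative_expression(
    {"BC1": 200, "BC2": 50},
    {"BC1": 100, "BC2": 100, "BC3": 0},
)
```

Comparing the ratios of two barcodes that mark different variants at the same locus then gives the relative effect of those variants on expression.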
  • the TRACE-Seq method tracks recombination alleles and identifies clonal reconstitution dynamics of gene targeted human hematopoietic stem cells.” Nat Commun 12, 472 (2021). https://doi.org/10.1038/s41467-020-20792-y, incorporating the genetic variant and barcode with one guide in a single editing event, which is limited to using amino acid codon replacement as barcodes.
  • the codon swap barcoding strategy also is not applicable for non-coding sequences where it is important to preserve all nucleotides.
  • the current methods allow insertion of the barcode sequence elsewhere in the genome, and do not interfere with the locus comprising the genetic variant edit.
  • the TRACE-seq method is less useful because all loci must be genotyped which limits throughput.
  • the first and third gRNAs are the same.
  • the first and third target loci are the same.
  • the genetic modifications or edits at the first and third loci are different.
  • the second and fourth gRNAs are the same.
  • the first and third gRNAs are the same, and the second and fourth gRNAs are the same.
  • the second and fourth target loci are the same.
  • the barcode sequences inserted at the same target loci are different. In some embodiments, the barcode sequences inserted at the second and fourth target loci are different.
  • [0217] In some embodiments of the methods described herein, different guide RNAs are used to introduce different genetic modifications at different target loci, but the same guide RNA is used to introduce different barcodes at the same target locus. This allows the same validated gRNA to be used to insert the barcode sequence at the target locus with high efficiency. Thus, in some embodiments, the first and third gRNAs are different. In some embodiments, the first and third target loci are different. In some embodiments, the genetic modifications at the first and third loci are different.
  • the second and fourth gRNAs are the same. In some embodiments, the first and third gRNAs are different, and the second and fourth gRNAs are the same. In some embodiments, the second and fourth target loci are the same. In some embodiments, the barcode sequences inserted at the second and fourth target loci are different.
  • [0218] In some embodiments of the methods described herein, different guide RNAs are used to introduce different genetic modifications at different target loci, and different guide RNAs are used to introduce different barcode sequences at different target loci. Thus, in some embodiments, the first and third gRNAs are different, and the second and fourth gRNAs are different.
  • the first and third target loci are different, and the second and fourth target loci are different.
  • the genetic modifications at the first and third loci are different, and the barcode sequences inserted at the second and fourth target loci are different.
  • the one or more donor DNA sequences comprise two homology arms, wherein each homology arm has at least about 70% to about 99% similarity to a portion of the sequence of the one or more target loci on either side of a nuclease cleavage site.
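The homology-arm criterion can be checked by comparing each arm with the target sequence flanking the nuclease cleavage site. The sketch below uses ungapped percent identity as a simple stand-in for similarity, which is an assumption; the function names, the coordinate convention (the cut falls immediately before index `cut_site`), and the example sequences are hypothetical.

```python
def percent_identity(arm, target):
    """Ungapped percent identity between two equal-length sequences."""
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == t for a, t in zip(arm, target))
    return 100.0 * matches / len(arm)


def arm_identities(left_arm, right_arm, locus, cut_site):
    """Percent identity of each homology arm to the sequence flanking the cut."""
    left_start = cut_site - len(left_arm)
    right_end = cut_site + len(right_arm)
    if left_start < 0 or right_end > len(locus):
        raise ValueError("homology arms extend beyond the locus sequence")
    return (
        percent_identity(left_arm, locus[left_start:cut_site]),
        percent_identity(right_arm, locus[cut_site:right_end]),
    )


# Hypothetical 20-nt locus cut after position 10; the left arm has one mismatch.
locus = "ACGTACGTAC" + "GGGGGCCCCC"
identities = arm_identities("ACGTACGTAA", "GGGGGCCCCC", locus, cut_site=10)
```

Arms whose identity falls within the stated range of about 70% to about 99% would satisfy the criterion; a gapped alignment would be needed when the donor introduces insertions or deletions inside an arm.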
  • the methods comprise detecting the presence of the unique barcode at the second target locus, thereby identifying the genetic modification at both the first and third target loci.
  • the methods are repeated with a third vector comprising a third retron-guide RNA cassette that inserts a genetic modification at a fifth target locus and a unique barcode sequence at a sixth target locus, thereby identifying the genetic modification at the fifth target locus.
  • the methods can be repeated multiple times with vectors comprising different retron-guide RNA cassettes to insert additional genetic modifications at the same or different target loci and to introduce additional unique barcodes at specific loci in the host cell genome that can be used to track the corresponding genetic modifications.
  • the host cell is a prokaryotic cell.
  • the host cell is a eukaryotic cell, such as a yeast cell or mammalian cell.
  • the host cell comprises a clonal population of host cells.
  • the genetic modifications are induced in greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the population of host cells.
  • the methods comprise transforming a mixture of cells with one or more vectors comprising the first, second and/or third retron-guide RNA cassettes, and screening the transformed cells for a phenotypic change relative to an untransformed control cell.
  • the methods comprise detecting the presence of the genetic modification at the target locus or the presence of the unique barcode sequence present in each retron-guide RNA cassette.
  • the genetic modifications can be detected by sequencing the genomic DNA comprising the modification, or by detecting a change in one or more phenotypes expressed by the host cell or organism comprising the host cell.
  • the presence of the unique barcode sequence can be detected by sequencing the genomic DNA comprising the barcode sequence, or by detecting a protein or detectable marker encoded by the barcode sequence.
  • Methods for introducing nucleic acids into host cells are known in the art, and any known method can be used to introduce a nuclease or a nucleic acid (e.g., a nucleotide sequence encoding the nuclease or reverse transcriptase, a DNA-targeting RNA (e.g., a guide RNA), a donor repair template for homology-directed repair (HDR), etc.) into a cell.
  • Non-limiting examples of suitable methods include electroporation, viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.
  • the components of the CRISPR-retron system can be introduced into a cell using a delivery system.
  • the delivery system comprises a nanoparticle, a microparticle (e.g., a polymer micropolymer), a liposome, a micelle, a virosome, a viral particle, a nucleic acid complex, a transfection agent, an electroporation agent (e.g., using a NEON transfection system), a nucleofection agent, a lipofection agent, and/or a buffer system that includes a nuclease component (as a polypeptide or encoded by an expression construct), a reverse transcriptase component, and one or more nucleic acid components such as a DNA-targeting RNA (e.g., a guide RNA) and/or a donor repair template.
  • the components can be mixed with a lipofection agent such that they are encapsulated or packaged into cationic submicron oil-in-water emulsions.
  • the components can be delivered without a delivery system, e.g., as an aqueous solution.
  • Methods of preparing liposomes and encapsulating polypeptides and nucleic acids in liposomes are described in, e.g., Methods and Protocols, Volume 1: Pharmaceutical Nanocarriers: Methods and Protocols. (ed. Weissig). Humana Press, 2009 and Heyes et al. (2005) J Controlled Release 107:276-87.
  • Methods of preparing microparticles and encapsulating polypeptides and nucleic acids are described in, e.g., Functional Polymer Colloids and Microparticles volume 4 (Microspheres, microcapsules & liposomes). (eds. Arshady & Guyot). Citus Books, 2002 and Microparticulate Systems for the Delivery of Proteins and Vaccines. (eds. Cohen & Bernstein). CRC Press, 1996.
  • F. Host cells [0228]
  • the present disclosure provides host cells that have been transformed by vectors of the present disclosure.
  • the compositions and methods of the present disclosure can be used for genome editing of any host cell of interest.
  • the host cell can be a cell from any organism, e.g., a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell (e.g., a rice cell, a wheat cell, a tomato cell, an Arabidopsis thaliana cell, a Zea mays cell and the like), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C.
  • a fungal cell e.g., yeast cell, etc.
  • an animal cell e.g., fruit fly, cnidarian, echinoderm, nematode, etc.
  • a cell from a vertebrate animal e.g., fish, amphibian, reptile, bird, mammal, etc.
  • a cell from a mammal a cell from a human, a cell from a healthy human, a cell from a human patient, a cell from a cancer patient, etc.
  • the host cell treated by the method disclosed herein can be transplanted to a subject (e.g., patient).
  • the host cell can be derived from the subject to be treated (e.g., patient).
  • Any type of cell may be of interest, such as a stem cell, e.g., embryonic stem cell, induced pluripotent stem cell, adult stem cell, e.g., mesenchymal stem cell, neural stem cell, hematopoietic stem cell, organ stem cell, a progenitor cell, a somatic cell, e.g., fibroblast, hepatocyte, heart cell, liver cell, pancreatic cell, muscle cell, skin cell, blood cell, neural cell, immune cell, and any other cell of the body, e.g., human body.
  • the cells can be primary cells or primary cell cultures derived from a subject, e.g., an animal subject or a human subject, and allowed to grow in vitro for a limited number of passages.
  • the cells are disease cells or derived from a subject with a disease.
  • the cells can be cancer or tumor cells.
  • the cells can also be immortalized cells (e.g., cell lines), for instance, from a cancer cell line.
  • Cells can be harvested from a subject by any standard method. For instance, cells from tissues, such as skin, muscle, bone marrow, spleen, liver, kidney, pancreas, lung, intestine, stomach, etc., can be harvested by a tissue biopsy or a fine needle aspirate.
  • Blood cells and/or immune cells can be isolated from whole blood, plasma or serum.
  • suitable primary cells include peripheral blood mononuclear cells (PBMC), peripheral blood lymphocytes (PBL), and other blood cell subsets such as, but not limited to, T cell, a natural killer cell, a monocyte, a natural killer T cell, a monocyte-precursor cell, a hematopoietic stem cell or a non-pluripotent stem cell.
  • the cell can be any immune cell, including any T-cell such as tumor infiltrating cells (TILs), CD3+ T-cells, CD4+ T-cells, CD8+ T-cells, or any other type of T-cell.
  • the T cell can also include memory T cells, memory stem T cells, or effector T cells.
  • the T cells can also be skewed towards particular populations and phenotypes.
  • the T cells can be skewed to phenotypically comprise CD45RO(-), CCR7(+), CD45RA(+), CD62L(+), CD27(+), CD28(+) and/or IL-7Rα(+).
  • Suitable cells can be selected that comprise one or more markers selected from a list comprising: CD45RO(-), CCR7(+), CD45RA(+), CD62L(+), CD27(+), CD28(+) and/or IL-7Rα(+).
  • Induced pluripotent stem cells can be generated from differentiated cells according to standard protocols described in, for example, U.S. Patent Nos. 7,682,828, 8,058,065, 8,530,238, 8,871,504, 8,900,871 and 8,791,248, the disclosures of which are herein incorporated by reference in their entirety for all purposes.
  • the host cell is in vitro. In other embodiments, the host cell is ex vivo. In yet other embodiments, the host cell is in vivo. G.
  • the present disclosure provides a method for modifying one or more target nucleic acids of interest at one or more target loci within a genome of a host cell, or within a heterologous or exogenous genome or DNA present in a host cell.
  • the method comprises: (a) transforming the host cell with a vector of the present disclosure; and (b) culturing the host cell or transformed progeny of the host cell under conditions sufficient for expressing from the vector a retron donor DNA-guide molecule comprising a retron transcript and a guide RNA (gRNA) molecule, wherein the retron transcript self-primes reverse transcription by a reverse transcriptase (RT) expressed by the host cell or the transformed progeny of the host cell, wherein at least a portion of the retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the one or more target loci and comprise sequence modifications compared to the one or more target nucleic acids, wherein the one or more target loci are cut by a nuclease expressed by the host cell or the transformed progeny of the host cell, wherein the site of nuclease cutting
  • the host cell is capable of expressing the RT prior to transforming the host cell with the vector.
  • the RT is encoded in a sequence that is integrated into the genome of the host cell.
  • the RT is encoded in a sequence on a separate plasmid.
  • the host cell is capable of expressing the RT at the same time as, or after, transforming the host cell with the vector.
  • the RT is expressed from the vector.
  • the RT is encoded in a sequence on a separate plasmid.
  • the host cell is capable of expressing the nuclease (e.g., Cas9) prior to transforming the host cell with the vector.
  • the nuclease is encoded in a sequence that is integrated into the genome of the host cell. In other instances, the nuclease is encoded in a sequence on a separate plasmid. In other embodiments, the host cell is capable of expressing the nuclease at the same time as, or after, transforming the host cell with the vector. In some instances, the nuclease is expressed from the vector. In other instances, the nuclease is encoded in a sequence on a separate plasmid.
  • [0235] In some embodiments, the vector comprises a retron-gRNA cassette that, when transcribed, yields a retron transcript and gRNA that are physically coupled.
  • the resulting donor DNA sequence within the msDNA and the gRNA can also be physically coupled.
  • the retron transcript and gRNA subsequently become physically uncoupled (e.g., before or after reverse transcription of the retron transcript occurs).
  • Physical uncoupling of the retron transcript and the gRNA can result from, for example, ribozyme cleavage (e.g., the retron-gRNA cassette also contains a ribozyme sequence).
  • the resulting donor DNA sequence within the msDNA and the gRNA will be physically uncoupled (e.g., during genome editing and/or screening).
  • the retron transcript and the gRNA are not initially physically coupled.
  • the retron transcript and the gRNA are subsequently joined together.
  • Transcription event(s) that result in the production of the retron transcript and/or gRNA can occur inside a host cell, outside of a host cell (e.g., followed by introduction of the retron transcript and/or gRNA into the host cell), or a combination thereof.
  • the one or more target nucleic acids of interest are modified by a donor DNA sequence (e.g., within a msDNA) and a gRNA that are never physically coupled.
  • the donor DNA sequence and the gRNA can be expressed from different cassettes (e.g., which are contained in the same vector or different vectors) and the donor DNA sequence and the gRNA can act in trans.
  • the present disclosure provides a method for screening one or more genetic loci of interest in a genome of a host cell, the method comprising: (a) modifying one or more target nucleic acids of interest at one or more target loci within the genome of the host cell according to a method of the present disclosure; (b) incubating the modified host cell under conditions sufficient to elicit a phenotype that is controlled by the one or more genetic loci of interest; (c) identifying the resulting phenotype of the modified host cell; and (d) determining that the identified phenotype was the result of the modifications made to the one or more target nucleic acids of interest at the one or more target loci of interest.
  • the target DNA can be analyzed by standard methods known to those in the art.
  • indel mutations can be identified by sequencing using the SURVEYOR® mutation detection kit (Integrated DNA Technologies, Coralville, IA) or the Guide-it™ Indel Identification Kit (Clontech, Mountain View, CA).
  • Homology-directed repair (HDR) can be detected by PCR-based methods, and in combination with sequencing or RFLP analysis.
  • Non-limiting examples of PCR-based kits include the Guide-it Mutation Detection Kit (Clontech) and the GeneArt® Genomic Cleavage Detection Kit (Life Technologies, Carlsbad, CA). Deep sequencing can also be used, particularly for a large number of samples or potential target/off-target sites.
  • editing efficiency can be assessed by employing a reporter or selectable marker to examine the phenotype of an organism or a population of organisms. In some instances, the marker produces a visible phenotype, such as the color of an organism or population of organisms.
  • edits can be made that either restore or disrupt the function of metabolic pathways that confer a visible phenotype (e.g., a color) to the organism.
  • when a successful genome edit results in a color change in the target organism (e.g., because the edit disrupts a metabolic pathway that results in a color change or because the edit restores function in a pathway that results in a color change), the absolute number or the proportion of organisms or their progeny that exhibit a color change (e.g., an estimated or direct count of the number of organisms exhibiting a color change divided by the total number of organisms for which the genomes were potentially edited) can serve as a measure of editing efficiency.
  • the phenotype is examined by growing the target organisms and/or their progeny under conditions that result in a phenotype, wherein the phenotype may not be visible under ordinary growth conditions.
  • growing yeast in a culture medium that is adenine deficient can lead to a particular phenotype (e.g., a color change) in yeast cells that possess a genetic defect in adenine synthesis.
  • growing yeast cells in adenine-deficient media can allow one to discern the effect of genome edits that putatively target adenine biosynthesis loci.
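The proportion-based measure of editing efficiency described above (organisms showing the reporter phenotype divided by all organisms whose genomes were potentially edited) can be sketched as follows. This is a minimal illustration only; the function name and the colony counts are hypothetical and do not come from the disclosure.

```python
def editing_efficiency(n_with_phenotype: int, n_total: int) -> float:
    """Proportion of potentially edited organisms that show the reporter phenotype.

    n_with_phenotype: organisms (or colonies) exhibiting the phenotype, e.g. a color change.
    n_total: all organisms whose genomes were potentially edited.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_with_phenotype <= n_total:
        raise ValueError("n_with_phenotype must lie between 0 and n_total")
    return n_with_phenotype / n_total


# Hypothetical plate count: 180 color-changed colonies out of 240 scored.
efficiency = editing_efficiency(180, 240)
print(f"editing efficiency: {efficiency:.2%}")
```

The same function applies whether the counts are direct or estimated, as long as both numbers refer to the same scored population.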
  • the reporter or selectable marker is a fluorescent tagged protein, an antibody, a labeled antibody, a chemical stain, a chemical indicator, or a combination thereof.
  • the reporter or selectable marker responds to a stimulus, a biochemical, or a change in environmental conditions.
  • the reporter or selectable marker responds to the concentration of a metabolic product, a protein product, a synthesized drug of interest, a cellular phenotype of interest, a cellular product of interest, or a combination thereof.
  • a cellular product of interest can be, as a non-limiting example, an RNA molecule (e.g., messenger RNA (mRNA), long non-coding RNA (lncRNA), microRNA (miRNA)).
  • Editing efficiency can also be examined or expressed as a function of time. For example, an editing experiment can be allowed to run for a fixed period of time (e.g., 24 or 48 hours) and the number of successful editing events in that fixed time period can be determined. Alternatively, the proportion of successful editing events can be determined for a fixed period of time. Typically, longer editing periods will result in a larger number of successful editing events. Editing experiments or procedures can run for any length of time.
  • a genome editing experiment or procedure runs for several hours (e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 hours). In other embodiments, a genome editing experiment or procedure runs for several days (e.g., about 1, 2, 3, 4, 5, 6, or 7 days).
  • [0242] In addition to the length of time of the editing period, editing efficiency can be affected by the choice of gRNA, the donor DNA sequence, the choice of promoter used, or a combination thereof.
  • [0243] In other embodiments, editing efficiency is compared to a control efficiency.
  • the control efficiency is determined by running a genome editing experiment in which the retron transcript and gRNA molecule are never physically coupled, or are initially coupled but subsequently become uncoupled. In some instances, the retron transcript and gRNA molecule are initially coupled and then become uncoupled (e.g., by ribozyme cleavage). In other instances, the retron-guide RNA (gRNA) cassette is configured such that the transcript products of the retron and gRNA coding region are never physically coupled. In yet other instances, the retron transcript and gRNA are introduced into the host cell separately.
  • the methods and compositions of the present disclosure result in at least about a 1.3- to 3-fold (i.e., at least about a 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2-, 2.1-, 2.2-, 2.3-, 2.4-, 2.5-, 2.6-, 2.7-, 2.8-, 2.9-, or 3-fold) increase in efficiency, compared to when the retron transcript and gRNA are not physically coupled during editing.
  • in other instances, the increase in efficiency is at least about 3- to 10-fold (i.e., at least about 3-, 4-, 5-, 6-, 7-, 8-, 9-, or 10-fold).
  • in still other instances, the increase in efficiency is at least about 10- to 100-fold (i.e., at least about 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80-, 90-, or 100-fold).
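The comparison to a control efficiency described above can be expressed as a simple ratio. The sketch below assumes that both efficiencies were measured as proportions under otherwise identical conditions; the function names, the tier labels, and the example values are illustrative assumptions, not values reported in the disclosure.

```python
def fold_increase(test_efficiency: float, control_efficiency: float) -> float:
    """Ratio of the test editing efficiency to the control editing efficiency."""
    if control_efficiency <= 0:
        raise ValueError("control_efficiency must be positive")
    if test_efficiency < 0:
        raise ValueError("test_efficiency must be non-negative")
    return test_efficiency / control_efficiency


def fold_tier(fold: float) -> str:
    """Report the highest fold-increase floor (1.3, 3, or 10) that the ratio reaches."""
    if fold >= 10:
        return "at least about 10-fold"
    if fold >= 3:
        return "at least about 3-fold"
    if fold >= 1.3:
        return "at least about 1.3-fold"
    return "below 1.3-fold"


# Hypothetical values: coupled retron-gRNA design vs. an uncoupled control.
fold = fold_increase(0.5, 0.125)
print(fold, fold_tier(fold))
```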
  • Editing efficiency can also be improved by performing editing experiments or procedures in a multiplex format.
  • multiplexing comprises cloning two or more editing retron-gRNA cassettes in tandem into a single vector. In some instances, at least about 10 retron-gRNA cassettes (i.e., at least about 2, 3, 4, 5, 6, 7, 8, 9, or 10 retron-gRNA cassettes) are cloned into a single vector.
  • [0245] In other embodiments, multiplexing comprises transforming a host cell with two or more vectors. Each vector can comprise one or multiple retron-gRNA cassettes. In some instances, at least about 10 vectors (i.e., at least about 2, 3, 4, 5, 6, 7, 8, 9, or 10 vectors) are used to transform an individual host cell.
  • multiplexing comprises transforming two or more individual host cells, each with a different vector or combination of vectors.
  • at least about 2 host cells (i.e., at least about 2, 3, 4, 5, 6, 7, 8, 9, or 10 host cells) are transformed.
  • between about 10 and 100 host cells (i.e., about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 host cells) are transformed.
  • between about 100 and 1,000 host cells (i.e., about 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1,000 host cells) are transformed.
  • between about 1,000 and 10,000 host cells (i.e., about 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, or 10,000 host cells) are transformed.
  • between about 10,000 and 100,000 host cells (i.e., about 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 55,000, 60,000, 65,000, 70,000, 75,000, 80,000, 85,000, 90,000, 95,000, or 100,000 host cells) are transformed.
  • at least about 100,000 host cells (i.e., at least about 100,000, 150,000, 200,000, 250,000, 300,000, 350,000, 400,000, 450,000, 500,000, 550,000, 600,000, 650,000, 700,000, 750,000, 800,000, 850,000, 900,000, 950,000 or 1,000,000 host cells) are transformed.
  • more than about 1,000,000 host cells are transformed.
  • multiple embodiments of multiplexing can be combined.
  • [0247] By using one or a combination of the various multiplexing embodiments, it is possible to modify and/or screen any number of loci within a genome. In some instances, at least about 10 (i.e., about 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) genetic loci are modified or screened.
  • loci are modified or screened.
  • between about 100 and 1,000 genetic loci (i.e., about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, or 1,000 genetic loci) are modified or screened.
  • between about 1,000 and 100,000 genetic loci are modified or screened.
  • the host cell comprises a population of host cells.
  • one or more sequence modifications are induced in at least about 20 percent (i.e., at least about 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 percent) of the population of cells. In other instances, one or more sequence modifications are induced in at least about 50 percent (i.e., at least about 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 65, 70, 75, 80, 85, 90, 95, or 100 percent) of the population of cells.
  • one or more sequence modifications are induced in at least about 75 percent (i.e., at least about 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 95, or 100 percent) of the population of cells.
  • one or more sequence modifications are induced in at least about 90 percent (i.e., at least about 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent) of the population of cells.
  • one or more sequence modifications are induced in at least about 95 percent (i.e., at least about 95, 96, 97, 98, 99, or 100 percent) of the population of cells.
  • the precision of genome editing can correspond to the number or percentage of on-target genome editing events relative to the number or percentage of all genome editing events, including on-target and off-target events. Testing for on-target genome editing events can be accomplished by direct sequencing of the target region or other methods described herein.
  • editing precision is at least about 80 percent (i.e., at least about 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 95, or 100 percent), meaning that at least about 80 percent of all genome editing events are on-target editing events.
  • editing precision is at least about 90 percent (i.e., at least about 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 percent), meaning that at least about 90 percent of all genome editing events are on-target editing events.
  • editing precision is at least about 95 percent (i.e., at least about 95, 96, 97, 98, 99, or 100 percent), meaning that at least about 95 percent of all genome editing events are on-target editing events.
  • editing precision is at least about 99 percent (i.e., at least about 99 or 100 percent), meaning that at least 99 percent of all genome editing events are on-target editing events.
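Editing precision as defined above (on-target editing events relative to all editing events) can be sketched as below. The event counts are hypothetical, and the threshold check simply mirrors the 80, 90, 95, and 99 percent levels recited in the text; neither function is taken from the disclosure.

```python
from typing import Optional

# Precision levels recited in the text, highest first.
PRECISION_LEVELS = (0.99, 0.95, 0.90, 0.80)


def editing_precision(on_target: int, off_target: int) -> float:
    """Fraction of all genome editing events that are on-target."""
    if on_target < 0 or off_target < 0:
        raise ValueError("event counts must be non-negative")
    total = on_target + off_target
    if total == 0:
        raise ValueError("no editing events were observed")
    return on_target / total


def highest_level_met(precision: float) -> Optional[float]:
    """Return the highest recited precision level that is met, or None."""
    for level in PRECISION_LEVELS:
        if precision >= level:
            return level
    return None


# Hypothetical sequencing result: 970 on-target and 30 off-target events.
precision = editing_precision(970, 30)
print(precision, highest_level_met(precision))
```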
  • compositions and methods of the present disclosure are suitable for any disease that has a genetic basis and is amenable to prevention or amelioration of disease-associated sequelae or symptoms by editing or correcting one or more genetic loci that are linked to the disease.
  • Non-limiting examples of diseases include X-linked severe combined immune deficiency, sickle cell anemia, thalassemia, hemophilia, neoplasia, cancer, age-related macular degeneration, schizophrenia, trinucleotide repeat disorders, fragile X syndrome, prion-related disorders, amyotrophic lateral sclerosis, drug addiction, autism, Alzheimer’s disease, Parkinson’s disease, cystic fibrosis, blood and coagulation diseases and disorders, inflammation, immune-related diseases and disorders, metabolic diseases and disorders, liver diseases and disorders, kidney diseases and disorders, muscular/skeletal diseases and disorders, neurological and neuronal diseases and disorders, cardiovascular diseases and disorders, pulmonary diseases and disorders, and ocular diseases.
  • compositions and methods of the present disclosure can also be used to prevent or treat any combination of suitable genetic diseases.
  • the subject is treated before any symptoms or sequelae of the genetic disease develop.
  • the subject has symptoms or sequelae of the genetic disease.
  • treatment results in a reduction or elimination of the symptoms or sequelae of the genetic disease.
  • treatment includes administering compositions of the present disclosure directly to a subject.
  • pharmaceutical compositions of the present disclosure can be delivered directly to a subject (e.g., by local injection or systemic administration).
  • compositions of the present disclosure are delivered to a host cell or population of host cells, and then the host cell or population of host cells is administered or transplanted to the subject.
  • the host cell or population of host cells can be administered or transplanted with a pharmaceutically acceptable carrier.
  • editing of the host cell genome has not yet been completed prior to administration or transplantation to the subject.
  • editing of the host cell genome has been completed when administration or transplantation occurs.
  • progeny of the host cell or population of host cells are transplanted into the subject.
  • correct editing of the host cell or population of host cells, or the progeny thereof is verified before administering or transplanting edited cells or the progeny thereof into a subject.
  • compositions of the present disclosure including cells and/or progeny thereof that have had their genomes edited by the methods and/or compositions of the present disclosure, may be administered as a single dose or as multiple doses, for example two doses administered at an interval of about one month, about two months, about three months, about six months or about 12 months. Other suitable dosage schedules can be determined by a medical practitioner.
  • Prevention or treatment can further comprise administering agents and/or performing procedures to prevent or treat concomitant or related conditions. As non-limiting examples, it may be necessary to administer drugs to suppress immune rejection of transplanted cells, or prevent or reduce inflammation or infection.
  • the present disclosure provides kits for modifying one or more target nucleic acids of interest at one or more target loci within a genome of a host cell, or within a heterologous or exogenous genome or DNA present in a host cell, the kit comprising one or a plurality of vectors or retron-guide RNA (gRNA) cassettes of the present disclosure.
  • the kit may further comprise a host cell or a plurality of host cells that are recombinantly modified by the vectors or retron-guide RNA (gRNA) cassettes of the present disclosure.
  • the kit contains one or more reagents.
  • the reagents are useful for transforming a host cell with a vector or a plurality of vectors, and/or inducing expression from the vector or plurality of vectors.
  • the kit may further comprise a reverse transcriptase, a plasmid for expressing a reverse transcriptase, one or more nucleases, one or more plasmids for expressing one or more nucleases, or a combination thereof.
  • the kit may further comprise one or more reagents useful for delivering nucleases or reverse transcriptases into the host cell and/or inducing expression of the reverse transcriptase and/or the one or more nucleases.
  • the kit further comprises instructions for transforming the host cell with the vector, introducing nucleases and/or reverse transcriptases into the host cell, inducing expression of the vector, reverse transcriptase, and/or nucleases, or a combination thereof.
  • the present disclosure provides a kit for modifying one or more target nucleic acids of interest at one or more target loci in a host cell, the kit comprising one or a plurality of retron donor DNA-guide molecules of the present disclosure.
  • the kit may further comprise a host cell or a plurality of host cells comprising genetic modifications introduced by the retron donor DNA-guide molecules of the present disclosure.
  • the kit contains one or more reagents.
  • the reagents are useful for introducing the retron donor DNA-guide molecule or plurality thereof into the host cell.
  • the kit may further comprise a reverse transcriptase, a plasmid for expressing a reverse transcriptase, one or more nucleases, one or more plasmids for expressing one or more nucleases, or a combination thereof.
  • the kit may further comprise one or more reagents useful for delivering into the host cell reverse transcriptases and/or nucleases and/or inducing expression of the reverse transcriptase and/or the one or more nucleases.
  • the kit further comprises instructions for introducing the retron donor DNA-guide molecule or plurality thereof into the host cell, introducing nucleases and/or reverse transcriptases into the host cell, inducing expression of the reverse transcriptase and/or nucleases, or a combination thereof.
  • J. Applications
  • The compositions and methods provided by the present disclosure are useful for any number of applications. As non-limiting examples, genome editing or screening according to the compositions and methods of the present disclosure can be used for cell lineage tracking or the measurement of RNA abundance, or to track the relative abundance of cells targeted by a mixture of edits in parallel. For example, the insertion of barcodes described herein can be used for cell lineage tracking or the measurement of RNA abundance.
  • genome editing or screening according to the compositions and methods of the present disclosure can be used in high-throughput precision editing genetic screens to 1) improve industrial microbial growth; 2) select strains for improving crop yield; 3) track edited cell populations used for medical treatments in vitro or in vivo; and 4) track edited cell populations used in cell therapy.
  • genome editing according to the compositions and methods of the present disclosure can be performed to correct detrimental lesions in order to prevent or treat a disease, or to identify one or more specific genetic loci that contribute to a phenotype, disease, biological function, and the like.
  • genome editing or screening according to the compositions and methods of the present disclosure can be used to improve or optimize a biological function, pathway, or biochemical entity (e.g., protein optimization).
  • optimization applications are especially suited to the compositions and methods of the present disclosure, as they can require the modification of a large number of genetic loci and subsequently assessing the effects.
  • Other non-limiting examples of applications suitable for the compositions and methods of the present disclosure include the production of recombinant proteins for pharmaceutical and industrial use, the production of various pharmaceutical and industrial chemicals, the production of vaccines and viral particles, and the production of fuels and nutraceuticals. All of these applications typically involve high-throughput or high-content screening, making them especially suited to the compositions and methods of the present disclosure.
  • inducing one or more sequence modifications at one or more genetic loci of interest comprises substituting, inserting, and/or deleting one or more nucleotides at the one or more genetic loci of interest. In some instances, inducing the one or more sequence modifications results in the insertion of one or more sequences encoding cellular localization tags, one or more synthetic response elements, and/or one or more sequences encoding degrons into the genome.
  • [0265] In other embodiments, inducing the one or more sequence modifications at the one or more genetic loci of interest results in the insertion of one or more sequences from a heterologous genome. Introducing heterologous DNA sequences into a genome is useful for any number of applications, some of which are described herein.
  • Non-limiting examples are directed protein evolution, biological pathway optimization, and production of recombinant pharmaceuticals.
  • EXAMPLES
  • [0266] The following example provides representative methods for performing an exemplary embodiment of the disclosure. The example demonstrates that the methods of the disclosure can be used for high-throughput genome editing.
  • [0267] Introduction
  • [0268] An important issue in understanding complex traits is the phenomenon of gene-by-environment (GxE) interactions, wherein a genetic variant’s effect depends on the environment to which an organism is exposed¹.
  • Quantitative trait locus (QTL) mapping uses genetic crosses between strains to create diverse progeny through recombination and to calculate statistical signals that associate with environmental response⁸⁻¹¹.
  • reverse genetic approaches such as constructing knockout libraries and measuring their effects on growth have single-gene resolution, and have been invaluable sources of information about the functions of genes in various organisms and their genetic interactions.
  • CRISPEY (Cas9 Retron precISe Parallel Editing via homologY)
  • bacterial retron reverse transcriptase (RT)
  • multi-copy, single-stranded DNA (msDNA)
  • this design has improved statistical power to detect fitness effects by incorporating unique molecular identifiers (UMIs), as well as the ability to maintain strain barcodes in non-selective media, which allows both assaying and detecting GxE effects of thousands of individual genetic variants in any growth condition.
  • This approach allows natural variants throughout the genome to be surveyed in any condition, providing the ability to decipher the precise genetic basis and molecular mechanisms giving rise to complex traits.
  • CRISPEY-BAR was used to measure the effects of 4184 natural variants segregating in yeast (Saccharomyces cerevisiae) across a variety of conditions. A total of 548 variants underlying variation in growth in these environments were identified. Importantly, the resolution of the measurements can differentiate the effects of variants even when they are tightly clustered in the genome, as well as different alleles at the same genomic position. This single-nucleotide resolution of GxE interactions not only allows exploration of the natural landscape of complex traits, but also provides direct mechanistic insights into phenotypic evolution¹⁴,¹⁹. More generally, the methods provide a paradigm for studying genetic variants and their environmental interactions at unprecedented resolution and throughput via multiplexed precision genome editing.
  • CRISPEY-BAR enables high-resolution mapping of genotype to phenotype relationships
  • CRISPEY-BAR is a scalable system for measuring the effects of precise genome edits by tracking an associated genomic barcode (Fig.1a). As described in a previous report, CRISPEY uses a single guide/donor pair to make one precise edit per cell, and in a pooled assay, measures the change in abundance of each guide/donor pair post-editing through high-throughput sequencing of plasmids (Fig.1b)¹⁸.
  • a new vector design was developed incorporating two consecutive retron-guide cassettes flanked by three self-cleaving ribozymes, allowing simultaneous generation of two guide/donor pairs for making two precise edits in the same cell²⁰ (Fig.1a, Fig.6).
  • the different ribozymes prevent unwanted recombination events during pooled cloning and co-transcriptionally separate the two retron-guide RNAs for processing by retron reverse transcriptase (RT).
  • CRISPEY-BAR implements a dual-edit design to simultaneously 1) integrate a unique genomic barcode and 2) make a precise variant edit of interest.
  • Each variant editing guide/donor pair is associated with a unique barcode, which can be used to track change in the abundance of cells edited by a specific guide/donor pair (Fig.1c).
  • UMIs were linked to each barcode to serve as biological replicates for pooled-editing and growth competition (Fig.1c).
  • CRISPEY-BAR was designed to measure the fitness effect of each variant with at least two guide/donor pairs, six UMIs and three pooled competition replicates (Fig.1c, Fig.7).
  • Because the barcode is genomically integrated, no maintenance of an ectopic vector is needed post-editing, and 1:1 stoichiometric measurement of edited strains can be achieved through multiplexed sequencing of barcode amplicons (Fig.1d).
  • the barcode was designed to be covered by 76-base short-read sequencing to minimize sequencing costs and run-time, instead of resequencing the plasmid with 300-base paired-end reads to re-identify guide-donor pairs (Fig.8).
  • This sequencing design uses primers that are specific to the barcode-integrated genomic locus, therefore sequencing only the barcoded strains (Fig. 8).
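Tallying integrated barcodes from short amplicon reads, as described above, can be sketched as follows. The flanking sequences, the barcode length, and the example reads are placeholders chosen for illustration; the actual amplicon layout is defined by the figures and Methods, not by this snippet.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical constant sequences flanking the barcode in the amplicon.
LEFT_FLANK = "ACGTAC"
RIGHT_FLANK = "TTGACC"
BARCODE_LEN = 12  # hypothetical barcode length


def extract_barcode(read: str) -> Optional[str]:
    """Return the barcode between the two flanks, or None if the read does not match."""
    start = read.find(LEFT_FLANK)
    if start == -1:
        return None
    begin = start + len(LEFT_FLANK)
    end = begin + BARCODE_LEN
    # Require the right flank immediately after the barcode.
    if read[end:end + len(RIGHT_FLANK)] != RIGHT_FLANK:
        return None
    return read[begin:end]


def count_barcodes(reads: Iterable[str]) -> Counter:
    """Count reads per barcode, skipping reads without a recognizable barcode."""
    barcodes = (extract_barcode(read) for read in reads)
    return Counter(bc for bc in barcodes if bc is not None)


reads = [
    "GG" + LEFT_FLANK + "AAAACCCCGGGG" + RIGHT_FLANK + "TT",
    "GG" + LEFT_FLANK + "AAAACCCCGGGG" + RIGHT_FLANK + "TT",
    "GG" + LEFT_FLANK + "TTTTGGGGCCCC" + RIGHT_FLANK + "TT",
    "GGGGGGGGGGGGGGGG",  # no barcode, ignored
]
print(count_barcodes(reads))
```

Exact matching of the flanks is a simplification; a production pipeline would typically tolerate sequencing errors.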
  • Selective detection of the integrated barcode edit guarantees the edited cell expresses functional Cas9 and retron components, as well as endogenous cellular factors that facilitate HDR. This strategy allowed for enrichment of strains likely containing variant edits, which is crucial for high-throughput screens.
  • Fig.1i see also Methods
  • CRISPEY-BAR is highly efficient in precision editing and allows massively parallel tracking of variant fitness effects using the dual-edit design.
  • Detection of natural variants affecting fitness within QTLs reveals hidden genetic complexity
  • variants were first characterized within regions likely to be enriched for effects on growth in response to stress conditions, in which the yeast pool has slower growth overall.
  • a total of 36 genomic regions overlapping QTLs for growth of segregants derived from 16 diverse parental strains were measured in three stress conditions: fluconazole (FLC), cobalt chloride (CoCl₂) and caffeine (CAFF) (Fig.2a)⁸.
  • the library could be enriched for variants impacting fitness in these stress conditions (Fig.2a)⁷. Three oligonucleotide pools (corresponding to variants to be assayed in fluconazole, cobalt chloride, and caffeine) were designed for pooled cloning into three separate CRISPEY-BAR libraries, which were then used for pooled editing (see Methods). After plasmid removal, the edited yeast were subjected to pooled growth competitions in synthetic complete media as well as each corresponding stress condition, and changes in barcode abundance across roughly 25 generations were tracked (Fig.2b, Fig.7).
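One common way to turn tracked barcode abundance into a relative fitness estimate is to fit a line to the log ratio of variant to reference counts over generations. The sketch below assumes that approach, a set of reference (for example, neutral) barcode counts for normalization, and a pseudocount to avoid taking the logarithm of zero. These are illustrative assumptions; the analysis actually used is the one described in the Methods.

```python
import math
from typing import Sequence


def log_ratio_slope(
    generations: Sequence[float],
    variant_counts: Sequence[float],
    reference_counts: Sequence[float],
    pseudocount: float = 1.0,
) -> float:
    """Least-squares slope of ln(variant/reference) versus generations.

    A positive slope means the barcode gains abundance relative to the
    reference; a negative slope means it loses abundance.
    """
    n = len(generations)
    if not (n == len(variant_counts) == len(reference_counts)):
        raise ValueError("all inputs must have the same length")
    if n < 2:
        raise ValueError("at least two time points are required")
    y = [
        math.log((v + pseudocount) / (r + pseudocount))
        for v, r in zip(variant_counts, reference_counts)
    ]
    x_mean = sum(generations) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in generations)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(generations, y))
    return sxy / sxx


# Hypothetical counts sampled at 0, 5, 10 and 25 generations.
slope = log_ratio_slope([0, 5, 10, 25], [500, 540, 600, 760], [500, 505, 498, 502])
print(f"relative fitness estimate (per generation): {slope:.4f}")
```

In a design with several UMIs and replicates per variant, one such slope could be computed per UMI and the values combined afterwards.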
  • all pairwise comparisons between the relative fitness measurements for each variant were performed in each condition to see if the effects on growth were significantly different (Fig.3d,e).
  • two identical competitions in SC media were performed and variants tested for GxE interactions between them.
  • CRISPEY-BAR allows measuring more than one variant at the same genomic locus for multiallelic loci within the ergosterol pathway, which highlights the resolution and specificity of the measurements.
  • the other variant was a synonymous variant with no effects on fitness.
  • variants with significant effects in more than one condition can be grouped into two categories: 1) those with significant fitness effects in only one direction (Fig.5b) and 2) those with significant fitness effects in opposite directions, which is referred to as “sign GxE” (Fig. 5c).
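
The two categories above can be written as a small classification rule. The following Python sketch is illustrative only: the function name, the condition labels and the input layout (one fitness effect and one significance flag per condition) are assumptions and do not come from the described method.

```python
from typing import Dict


def classify_variant(effects: Dict[str, float], significant: Dict[str, bool]) -> str:
    """Classify a variant by the direction of its significant fitness effects.

    effects:     condition label -> estimated fitness effect (hypothetical layout)
    significant: condition label -> True if the effect is significant there
    """
    # Keep only the conditions where the variant has a significant, non-zero effect.
    sig_effects = [
        effects[cond]
        for cond in effects
        if significant.get(cond, False) and effects[cond] != 0
    ]
    if len(sig_effects) < 2:
        return "fewer than two significant conditions"

    # Collect the distinct directions (+1 or -1) of the significant effects.
    directions = {1 if value > 0 else -1 for value in sig_effects}
    if len(directions) == 2:
        return "sign GxE"        # category 2: opposite directions
    return "one direction"       # category 1: same direction everywhere


if __name__ == "__main__":
    # Hypothetical values for a variant that helps in one medium and hurts in another.
    print(classify_variant({"SC": 0.02, "LOV": -0.05}, {"SC": True, "LOV": True}))
```

Effects that are not significant in a given condition are ignored, so a variant with one significant effect and one non-significant effect of opposite sign is not labelled sign GxE under this rule.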
  • the pleiotropic variant exhibiting sign GxE at chr7: 472522 C>A was located in a canonical Rpn4p binding site 33 (Fig.5e top, bottom left).
  • This variant's strongest effect was a significant fitness decrease in lovastatin.
  • Because Rpn4p is a transcriptional activator, it was hypothesized that the disruption of the Rpn4p binding site might decrease ERG4 expression.
  • RT-qPCR was used to measure expression of ERG4 in a genotyped strain carrying chr7: 472522 C>A, and its expression was found to be decreased relative to the wildtype strain (Fig. 5e bottom right).
  • CRISPEY-BAR was able to survey thousands of natural variants and identify the variants affecting fitness at the nucleotide-level, directly leading to discovery of molecular mechanisms of GxE interactions.
  • CRISPEY-BAR strategy and its applications provide a solution to rapidly discover natural genetic variants impacting a complex trait. As a proof of principle, 548 variants with significant effects on growth within QTLs were identified, as well as across a core metabolic pathway.
  • CRISPEY-BAR is highly efficient in precise editing.
  • the RT was shown in CRISPEY to be effective in production of msDNA as DNA donors for precision editing 18 .
  • the inventors have since tested additional retron RTs in CRISPEY, showing higher efficiency in yeast, as well as editing activity in human cells 34 . While this study only applied the SpCas9 with an ‘NGG’ PAM site limiting the variants that can be targeted, alternative nucleases with alternative PAM can be interchanged with SpCas9 to target additional variants 35–37 .
  • the CRISPEY-BAR approach has an efficient guide for barcoding, while the variant editing guide can have a range of efficiency.
  • This caveat can be overcome by applying CRISPEY-BAR to additional strains of budding yeast to not only capture the effects of variants within one lab strain, but also the effect of genetic background.
  • the CRISPEY-BAR design also allows for additional ribozymes and CRISPEY cassettes to be incorporated.
  • a single barcode-insertion cassette plus two or more variant editing cassettes can be expressed in the same transcript, allowing simultaneous editing of two genetic variants of choice and integration of a variant-pair specific barcode.
  • gene-by-gene (epistatic) interactions can be observed, as well as gene-by-gene-by-environment (GxGxE) interactions that govern the crosstalk between gene networks and the environment 38–40 .
  • the traits include growth in: 'Cobalt_Chloride;2mM;2', 'Caffeine;15mM;2' and 'Fluconazole;100uM;2', and we refer to these traits as 'stress conditions' 11.
  • For the ergosterol pool, all non-reference alleles from yeast natural variants that were within ±500 bp of the coding region of the selected ergosterol pathway genes were included 4.
  • the guides and donors selected for CRISPEY editing were designed as described 18, with the following parameters or modifications:
  • 1. the alternative allele is within -6 to -1 and +1 to +2 positions of the guide target and PAM sequences;
  • 2. the donor template is 108 bp in length with asymmetric homology arms, 40 bp for the 5’ arm and 68 bp for the 3’ arm;
  • 3. variants were included if two or more guides were found for a given variant.
  • the resulting msDNA donor will have a shorter 3’ homology arm and a longer 5’ arm flanking the variant, an arrangement that was shown to have higher HDR efficiency when using ssDNA as repair donor 41.
  • the donors were further filtered to exclude SphI, AscI and NotI restriction sites used in the cloning process, and to keep a minimum homology arm of 30 bp 5’ of the variant and 55 bp 3’ of the variant in the donor template.
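
The donor filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' pipeline: the function name and the single-nucleotide, 0-based `variant_index` convention are assumptions. The recognition sequences used are the standard ones for SphI (GCATGC), AscI (GGCGCGCC) and NotI (GCGGCCGC); all three are palindromic, so scanning one strand is sufficient.

```python
# Standard recognition sequences of the three cloning enzymes named in the text.
RESTRICTION_SITES = {
    "SphI": "GCATGC",
    "AscI": "GGCGCGCC",
    "NotI": "GCGGCCGC",
}


def donor_passes_filter(donor: str, variant_index: int,
                        min_5p: int = 30, min_3p: int = 55) -> bool:
    """Return True if a donor template passes the site and homology-arm filters.

    donor:         donor template sequence, written 5' to 3'
    variant_index: 0-based position of the (assumed single-nucleotide) variant
    """
    donor = donor.upper()

    # Reject donors carrying any restriction site used during cloning.
    if any(site in donor for site in RESTRICTION_SITES.values()):
        return False

    # Homology on each side of the variant position.
    arm_5p = variant_index
    arm_3p = len(donor) - variant_index - 1
    return arm_5p >= min_5p and arm_3p >= min_3p


if __name__ == "__main__":
    # Toy 108 bp donor with the variant 40 bp from the 5' end.
    toy_donor = "A" * 40 + "C" + "T" * 67
    print(donor_passes_filter(toy_donor, 40))
```

Multi-nucleotide variants would need the 3’ arm measured from the end of the variant rather than from a single position.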
  • the resulting output is 250 bp per oligo and consists of 5’ homology to the pSAC200 CRISPEY-BAR vector, a 12 bp programmed barcode, a restriction site region for cloning, the 108 bp donor template sequence, a 34 bp constant region, the 20 bp guide sequence and 3’ homology to the pSAC200 CRISPEY-BAR vector (Fig. 6).
  • the general sequence is: 5’- GTTGCAGTTAGCTAACAGGCCATGCNNNNNNNNNNGCATGCAGCGGCCGCAG GCGCGCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNN NNNN NNNN NNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
  • Barcodes were designed using a custom script implementing a quaternary Hamming(12,8) code based on the encoding scheme described in a previous study 42 . This encoding scheme generates DNA barcodes with a minimum Hamming distance of 3, allowing for error correction of 1 bp mutations or DNA sequencing errors.
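
The practical consequence of a minimum Hamming distance of 3 is that any read carrying a single substitution is still closer to its true barcode than to any other. The Python sketch below illustrates only that property, using nearest-codeword lookup on a toy barcode set; it does not implement the quaternary Hamming(12,8) encoding itself, and the function names and example barcodes are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: List[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correct_barcode(read: str, barcodes: Iterable[str]) -> Optional[str]:
    """Return the designed barcode within one mismatch of the read, else None.

    With a minimum pairwise distance of 3, at most one designed barcode can
    lie within distance 1 of any read, so the match is unambiguous.
    """
    hits = [b for b in barcodes if hamming(read, b) <= 1]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    # Toy 12 nt barcodes (not from the actual library).
    toy = ["AAAAAAAAAAAA", "AAAAAAAAACCC", "CCCAAAAAAAAA"]
    print(min_pairwise_distance(toy))
    print(correct_barcode("AAAAAAAAAAAG", toy))
```

Reads with two or more mismatches are returned as `None` here and would be discarded rather than assigned.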
  • sgGFP non-editing guide
  • Oligonucleotides were first amplified with Q5 polymerase (NEB) with 1 uM primer #615 in 50 uL reaction following manufacturer instructions and initial denaturation of 98°C for 2 min, and then 5 cycles of 98°C for 10 s and 65°C for 30 s, followed by 25 cycles of 98°C for 10 s and 69°C for 40 s, then final extension of 72°C for 2 min.
  • NEB Q5 polymerase
  • PCR products were then purified with 45 uL nucleoMAG NGS beads (hereafter, “beads”) (Takara) and eluted with 20 uL water. 2 uL of the first round PCR product was further amplified with Q5 polymerase (NEB) with 1 uM primer #615 and #576 in 50 uL reaction following manufacturer instructions and initial denaturation of 98°C for 2 min, and then 15 cycles of 98°C for 10 s and 69°C for 30 s, then final extension of 72°C for 2 min. Second round PCR products were then purified with 45 uL beads and eluted with 20 uL Tris pH 8.0.
  • Beads nucleoMAG NGS beads
  • the pooled oligos were amplified with Q5 polymerase (NEB) with 1 uM primer #617 and #337-343 in 50 uL reaction following manufacturer instructions and initial denaturation of 98°C for 2 min, and then 15 cycles of 98°C for 10 s and 69°C for 40 s, then final extension of 72°C for 2 min, followed by purification using 45 uL beads and indexing PCR using Illumina dual-indexing primers.
  • the indexed amplicons corresponding to each pool were then sequenced by MiSeq using reagent kit v2 Nano to obtain paired-end 150bp reads that are mapped to the designed oligonucleotides.
  • the assembled products were purified by beads and eluted in 10 uL water. 3 uL of the assembled products were used for electroporation with 27 uL Endura Electrocompetent cells for CRISPR DUO (Lucigen). Two electroporation reactions were performed for each pool following manufacturer instructions and recovered in SOC media (Lucigen) for 25 min at 37°C and plated to a single 15 cm LB agar plate with Carbenicillin (GoldBio). A serial dilution of the recovered bacteria was plated to estimate colony forming units (cfu), and all pools contained more than 500,000 cfu.
  • the transformants were incubated for 22 hr at 32°C and the resulting bacterial lawn was collected for storage in LB with 10% glycerol at -80°C. Half of the collected transformant stock was used for plasmid extraction using Nucleobond Xtra Midi Plus (Macherey-Nagel) and eluted as “post-Gibson” plasmid pools, yielding 105-120 ug of plasmid DNA.
  • PCR was performed with 1 uM of each primer following manufacturer instructions, with an initial denaturation of 98°C for 3 min, and then 35 cycles of 98°C for 10 s, 66°C for 30 s, 72°C for 40 s; then final extension of 72°C for 2 min.
  • the ligation product was purified by beads and eluted in 30 uL water. 3 uL of the purified ligation products were used for electroporation with 27 uL Endura Electrocompetent cells for CRISPR DUO (Lucigen). Two electroporation reactions were performed for each pool, one reaction with ligation insert and the other without insert as negative control. Electroporation was performed following manufacturer instructions and recovered in SOC media (Lucigen) for 30 min at 37°C and the with-insert ligations were plated to two 15 cm LB agar plates with Carbenicillin (GoldBio) at 32°C for 22 hr.
  • a serial dilution of the recovered bacteria from both with- and without-insert ligations was plated to estimate cfu, and all pools contained more than 1,000,000 cfu, corresponding to at least 2,500x coverage for each oligonucleotide on average within each pool.
  • Ligation plates were incubated at 32°C for 22 hr, and transformants were stored in LB with 10% glycerol.
  • Ligated plasmids were extracted from one fourth of the collected bacteria from each pool using Nucleobond Xtra Midi Plus (Macherey-Nagel) and eluted as “post-ligation” plasmid pools, yielding 160-240 ug of plasmid DNA per reaction.
  • yeast transformant pools were selected on YNB -histidine -uracil 2% glucose (1.7 g/L yeast nitrogen base (RPI); 5 g/L ammonium sulfate (ACROS Organics); 1.9 g dropout synthetic mix minus histidine, uracil w/o nitrogen base (US Biological) and 20 g/L glucose (Sigma)) 2% agar plates and stored in YNB -histidine -uracil 2% glucose media with 15% glycerol at -80°C.
  • Cells were harvested from the last galactose media growth and stored in YNB -histidine -uracil 2% glucose media with 15% glycerol at -80°C.
  • the plasmid-cured cells were collected and stored in YNB 2% glucose media with 15% glycerol at -80°C.
  • [0319] Pooled competition
  • [0320] Pooled competitions were carried out in 1 L baffled flasks in YNB 2% glucose (SC, hereafter) media with or without specified conditions (Fig. 7). The concentration of each drug/salt was titrated to approximately 5 generations of growth of the ZRS111 strain every 12 hr, indicating overall decreased fitness in each condition, to apply consistent growth stress to cells. In contrast, in SC media only, the ZRS111 strain undergoes approximately 5 generations of growth in 8 hr.
  • Genomic DNA was eluted in 200 uL per sample, further digested with 1 uL RNaseA and quantified by Qubit dsDNA HS assay (Invitrogen). 10 ug of genomic DNA was amplified in a 400 uL Q5 polymerase (NEB) PCR reaction with 1 uM forward primer #261 and 1 uM of an equimolar mix of reverse primers #327-#334 (Fig. 8).
  • PCR was performed following manufacturer’s instructions, with 1 M betaine and initial denaturation of 98°C for 2 min, then 19 cycles of 98°C for 10 s, 65°C for 20 s; then extension at 72°C for 5 min. 100 uL of first round PCR products were purified using 100 uL beads, and 15 uL of the purified amplicons were further indexed in a 50 uL Q5 polymerase (NEB) PCR reaction following manufacturer’s instructions with 1 uM equimolar mix of indexing primers for Illumina sequencing, and initial denaturation of 98°C for 2 min, then 8 cycles of 98°C for 10 s, 70°C for 20 s; then extension at 72°C for 2 min.
  • the indexed amplicons were purified with 50 uL beads, eluted in 100 uL water and quantified by Qubit dsDNA HS assay (Invitrogen).
  • the purified, indexed amplicons from six time point samples for the three replicates per competition were mixed equimolar and purified by SizeSelect II gel (Invitrogen) for the ~300 bp product.
  • the size selected libraries were then purified by beads and submitted for paired-end sequencing on NextSeq 550 using custom read1 primer #354, with custom cycles of 12 cycles for read1, 8 + 8 cycles for dual indices and 64 cycles for read2 using a 1 x 75 bp High-Output Kit (Fig. 8). Data available at PRJNA827354.
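
Given this read structure, barcode abundance per sample reduces to counting 12-nt sequences. The Python sketch below assumes that read 1 begins at the first base of the 12 bp barcode, which is inferred from the matching lengths and is not stated explicitly; the function names and toy reads are hypothetical, and no error correction is applied.

```python
from collections import Counter
from typing import Dict, Iterable

BARCODE_LENGTH = 12  # matches the 12 read-1 cycles and the 12 bp programmed barcode


def count_barcodes(read1_seqs: Iterable[str], designed: Iterable[str]) -> Counter:
    """Count reads whose first 12 bases exactly match a designed barcode."""
    designed_set = {b.upper() for b in designed}
    counts: Counter = Counter()
    for seq in read1_seqs:
        barcode = seq[:BARCODE_LENGTH].upper()
        if barcode in designed_set:
            counts[barcode] += 1
    return counts


def to_frequencies(counts: Counter) -> Dict[str, float]:
    """Convert raw barcode counts to relative frequencies within one sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}


if __name__ == "__main__":
    designed = ["AAAAAAAAAAAA", "CCCCCCCCCCCC"]
    reads = ["AAAAAAAAAAAA", "AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
    print(to_frequencies(count_barcodes(reads, designed)))
```

Comparing these frequencies across the time points of one competition gives the change in barcode abundance that the fitness estimates are based on.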
  • Fluconazole Ecological Enrichment Test
  • To test whether strains from particular ecological origins were enriched for variants with significant effects in a particular direction in fluconazole, we first split the variants with significant fitness effects in fluconazole into positive and negative effect variants. We then checked for each strain in the 1,011 yeast genomes if they were homozygous or heterozygous for the alternate allele we edited in at each significant variant.
  • For positive effect alleles, strains with the alternate allele had 1 added to their score, and for negative effect alleles, strains with the alternate allele had 1 subtracted from their score.
  • the total number of negative effect variants was added to this score for all strains, as any strain with the reference allele for those sites in effect had the positive effect allele.
  • the 1,011 yeast strains were then sorted by this score, and the top 50 were chosen to look at their ecological origins, as they were presumably the strains with the most evidence for being under selection for increased growth in fluconazole.
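
The scoring scheme above can be sketched as follows. This Python example is a hedged illustration: the data layout (a mapping from strain name to the set of variants for which it carries the alternate allele), the function names and the alphabetical tie-break are assumptions, not details given in the text.

```python
from typing import Dict, List, Set


def ecological_scores(carriers: Dict[str, Set[str]],
                      positive: Set[str],
                      negative: Set[str]) -> Dict[str, int]:
    """Score each strain by its positive- and negative-effect alternate alleles.

    carriers: strain -> variants for which the strain is homozygous or
              heterozygous for the alternate allele (hypothetical layout)
    positive: variants whose alternate allele increases fitness in fluconazole
    negative: variants whose alternate allele decreases fitness in fluconazole
    """
    scores = {}
    for strain, alt_variants in carriers.items():
        score = len(alt_variants & positive)   # +1 per positive-effect allele
        score -= len(alt_variants & negative)  # -1 per negative-effect allele
        # Offset applied to every strain: a strain with the reference allele
        # at a negative-effect site in effect has the positive-effect allele.
        score += len(negative)
        scores[strain] = score
    return scores


def top_strains(scores: Dict[str, int], n: int = 50) -> List[str]:
    """Return the n highest-scoring strains (ties broken by name, an assumption)."""
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


if __name__ == "__main__":
    carriers = {"s1": {"v1", "v2"}, "s2": {"v3"}, "s3": set()}
    scores = ecological_scores(carriers, positive={"v1", "v2"}, negative={"v3"})
    print(scores, top_strains(scores, n=2))
```

Because the offset is the same for every strain, it does not change the ranking; it only makes the score equal to the number of positive-effect alleles a strain carries in total.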
  • the resulting PCR products were bead purified and cloned into pSAC200, ligated with UMI-containing insert and transformed into yeast as described for library cloning above.
  • the yeast transformants were induced for editing by culturing in 5 mL YNB -HIS -URA 2% raffinose media for 24 hr, passaged twice in 5 mL YNB -HIS -URA 2% galactose media for 24 hr each, then streaked out on YNB -URA 2% glucose (1.7 g/L yeast nitrogen base (RPI); 5 g/L ammonium sulfate (ACROS Organics); 1.9 g dropout synthetic mix minus uracil, w/o nitrogen base (US Biological) and 20 g/L glucose (Sigma)) 2% agar plates to obtain single edited clones.
  • plasmids were cured from edited clones by restreaking on YNB 2% glucose 2% agar plates with 1 g/L 5-fluoroorotic acid monohydrate (GoldBio). The single plasmid-cured colonies were amplified by growing in YNB 2% glucose media overnight and stored in YNB 2% glucose media with 15% glycerol at -80°C.
  • [0339] Colonies were streaked out from the frozen stock and lysed with Zymolyase 20T (US Biological) solution in 50 mM potassium phosphate buffer, pH 7.5.
  • PCR cycles had an initial denaturation of 95°C for 2 min; then 35 cycles of 95°C for 10 s, 60°C for 15 s, 72°C for 20 s; then a final extension of 72°C for 5 min.
  • PCR products were purified, Sanger sequenced and aligned to the reference genome using SGD BLAST to confirm the intended genotype 50,51 .
  • Genomic amplicons of loci containing the associated variant edit were Sanger sequenced from barcoded colonies to calculate the editing rates shown in Fig.1d.
  • qRT-PCR
  • [0341] Strains containing the Sanger sequencing-verified genotypes were thawed from frozen stock and grown overnight in 5 mL YNB 2% glucose media. 0.5 mL of the overnight culture was passaged to 50 mL YNB 2% glucose media with or without 30 mg/mL lovastatin. Cells were harvested after 5 generations of growth in media, approximately 12 hr after passaging.
  • T1-T6 Harvested cells were spun down and resuspended in 1x DPBS (Gibco) and stored at 4°C and assayed by flow cytometry within 12 hr post-harvest. Generation time was estimated by measuring OD600 of the culture containing ZRS111 and GFP control strain at every time point. Competition for each edited strain against GFP control strain was replicated four times in four different wells, to control for spontaneous mutation during competition.
  • Ratios between each edited strain against GFP control strain were determined by flow cytometry assay, using an Attune NxT Flow Cytometer and Autosampler (ThermoFisher Scientific). GFP was detected using a 530 nm band-pass filter (BL1) with a 488 nm laser. The channel voltages were adapted from a previous study and set as follows: FSC: 200; SSC: 320; and BL1: 480 41. A threshold for FSC of 2.5 x 10^3 A.U. was applied to exclude non-yeast events. Data analysis was performed using Attune NxT Software v2.7.
  • Doublets were removed by FSC gating and cell counts for GFP control strain were determined by BL1 gating and the remaining cells were counted as the non-fluorescent, corresponding to edited strains. Samples with fewer than 500 total cells gated, as well as samples with cell counts of less than 3 for either GFP or edited strains, were excluded. Log2 ratios between edited strain count and GFP control strain count were calculated for each sample and fitted to a slope for the estimated generations within each replicate. The slopes were normalized by subtracting the slope calculated by the competition of a non-variant edit, barcode-only control to the GFP control strain in the same replicate. Finally, the mean and standard error for slopes across four replicates were calculated for each edited strain, representing pairwise fitness values.
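
The calculation described above (count filter, log2 ratio, slope per replicate, control subtraction, mean and standard error) can be sketched in Python. This is a minimal sketch under stated assumptions: the slope is fitted by ordinary least squares, the standard error is the sample standard deviation divided by the square root of the number of replicates, and all function names and toy numbers are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def passes_count_filter(gfp: int, edited: int,
                        min_total: int = 500, min_each: int = 3) -> bool:
    """Keep samples with at least 500 gated cells and at least 3 of each strain."""
    return (gfp + edited) >= min_total and gfp >= min_each and edited >= min_each


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def replicate_slope(generations: Sequence[float],
                    edited_counts: Sequence[int],
                    gfp_counts: Sequence[int]) -> float:
    """Slope of log2(edited / GFP) against estimated generations for one replicate."""
    points = [
        (g, math.log2(e / f))
        for g, e, f in zip(generations, edited_counts, gfp_counts)
        if passes_count_filter(f, e)
    ]
    if len(points) < 2:
        raise ValueError("need at least two samples passing the count filter")
    xs, ys = zip(*points)
    return ols_slope(xs, ys)


def pairwise_fitness(edited_slopes: List[float],
                     control_slopes: List[float]) -> Tuple[float, float]:
    """Mean and standard error of control-normalized slopes across replicates.

    control_slopes[i] is the barcode-only control slope from the same replicate.
    """
    normalized = [e - c for e, c in zip(edited_slopes, control_slopes)]
    se = stdev(normalized) / math.sqrt(len(normalized))
    return mean(normalized), se


if __name__ == "__main__":
    # Toy replicate: the edited strain doubles relative to GFP every 5 generations.
    print(replicate_slope([0, 5, 10], [1000, 2000, 4000], [1000, 1000, 1000]))
```

A positive normalized slope indicates that the edited strain outgrows the GFP control faster than the barcode-only control does, in units of log2 ratio per generation.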
  • a retron-guide RNA cassette comprising: (a) a first retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a first donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a first target locus; and (v) a second inverted repeat sequence coding region; and (b) a first guide RNA (gRNA) coding region; (c) a second retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a second donor DNA sequence located within the second msd locus,
  • the first donor DNA sequence comprises a genetic variant relative to the sequence at the first target locus.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • HDV hepatitis delta virus
  • ribozyme sequence selected from the group consisting of hepatitis delta virus (HDV) ribozyme, drz-CIV-1, drz-Spur-3, drz-Agam1-1, drzAgam1-2, drzPmar-1, Twister, Hammerhead, and combinations thereof.
  • a method for identifying a genetic modification at a target locus in a host cell comprising: (a) transforming the host cell with a vector of embodiment 19; (b) culturing the host cell or transformed progeny of the host cell under conditions sufficient for expressing from the vector a first retron donor DNA-guide molecule comprising a first retron transcript and the first gRNA coding region and a second retron donor DNA-guide molecule comprising a second retron transcript and the second gRNA coding region, wherein the first and second retron transcripts self-prime reverse transcription by a reverse transcriptase (RT) expressed by the host cell or the transformed progeny of the host cell, wherein at least a portion of the first retron transcript is reverse transcribed to produce a multicopy single-stranded DNA (msDNA) molecule having one or more donor DNA sequences, wherein the one or more donor DNA sequences are homologous to the first target locus and comprise sequence modifications compared to the sequences within the first target locus, where
  • 26. The method of embodiment 25, wherein the first target locus is located in a cis-regulatory element of a transcription unit, and the second target locus is located in a 5’ untranslated region, a protein coding region, or a 3’ untranslated region (UTR) of the transcription unit.
  • 27. The method of any one of embodiments 20 to 26, wherein the first and/or second target locus is located in a non-coding intergenic region in the host cell genomic DNA.
  • 28. The method of any one of embodiments 25 or 26, wherein the one or more donor DNA sequences comprise a genetic variant compared to the sequences within the first target locus.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • the barcode sequence encodes a detectable molecule, a selectable marker, or a cell surface marker.
  • detecting the presence of the unique barcode sequence comprises sequencing the genome of the host cell, or detecting a detectable molecule encoded by the barcode sequence.
  • any one of embodiments 20 to 33 further comprising: (d) transforming the host cell with a second vector comprising a second retron-guide RNA cassette comprising: a third retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) an msd locus; (iv) a third donor DNA sequence located within the msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a third target locus; and (v) a second inverted repeat sequence coding region; and a third guide RNA (gRNA) coding region; a fourth retron comprising: (i) an msr locus; (ii) a first inverted repeat sequence coding region; (iii) a second msd locus; (iv) a fourth donor DNA sequence located within the second msd locus, wherein the donor DNA sequence comprises homology to one or more sequences within a fourth target loc
  • the method of embodiment 39 wherein the third target locus is located in a cis-regulatory element of a transcription unit, and the fourth target locus is located in the 3’ untranslated region (UTR) of the transcription unit.
  • the one or more donor DNA sequences comprise a genetic variant compared to the sequences within the third target locus.
  • the genetic variant comprises a cis-eQTL variant at the first target locus.
  • any one of embodiments 34 to 43 wherein (i) the first and third gRNAs are different; (ii) the first and third target loci are different; (iii) the genetic modification at the first and third loci is different; (iv) the second and fourth gRNAs are the same; (v) the second and fourth target loci are the same; and (vi) the barcode sequences inserted at the second and fourth target loci are different.
  • the one or more donor DNA sequences comprise two homology arms, wherein each homology arm has at least about 70% to about 99% similarity to a portion of the sequence of the one or more target loci on either side of a nuclease cleavage site.
  • 52. The method of embodiment 51, wherein the eukaryotic cell is a yeast cell.
  • 53. The method of embodiment 51, wherein the eukaryotic cell is a mammalian cell.
  • the genetic modifications are induced in greater than or equal to about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the population of host cells.
  • 56. The method of any one of embodiments 20 to 49, comprising transforming a mixture of cells with one or more vectors comprising the first, second or third retron-guide RNA cassettes, and screening the transformed cells for a phenotypic change relative to an untransformed control cell.
  • the method of embodiment 56 further comprising detecting the presence of the genetic modification at the target locus or the presence of the unique barcode sequence present in each retron-guide RNA cassette.
  • AATGATACGGCGACCACCGAGATCTACACACTGCATAACACTCTTTCCCTACACGACGCTCTTCCGATCT (Primer #341)
  • 9. AATGATACGGCGACCACCGAGATCTACACAAGGAGTAACACTCTTTCCCTACACGACGCTCTTCCGATCT (Primer #342)
  • TGCGCACCCTTA (inverted repeat sequence)
  • 32. TAAGGGTGCGCA (second inverted repeat)
  • 33. ATGCGCACCCTTAGCGAGAGGTTTATCATTAAGGTCAACCTCTGGATGTTGTTTCGGCATCCTGCATTGAATCTGAGTTACT (msr locus)
  • 34. TCTGAGTTACTGTCTGTTTgaacTGTTGGAACGGAGAGCATCGCCTGATGCTCTCCGAGCCAACtttAAACCCGTTTcTTCTGAC (msd locus, first retron)

Abstract

The disclosure provides compositions and methods for introducing at least two genetic modifications into the genome of a host cell or organism. The compositions include retron-guide RNA cassettes that can be used to introduce a first genetic modification, such as a genetic variant, at a first target locus and a second genetic modification, such as a unique barcode sequence, at a second target locus. The methods allow the first genetic modification to be tracked by detecting the presence of the barcode sequence or of a protein encoded by the barcode sequence. The methods can be used to track multiple genetic variants introduced into a host cell by detecting the presence of multiple unique barcode sequences, without having to detect the vector sequences used to transform the host cell.
PCT/US2023/022989 2022-05-20 2023-05-19 Generation and tracking of cells with precise edits WO2023225358A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263344470P 2022-05-20 2022-05-20
US63/344,470 2022-05-20

Publications (1)

Publication Number Publication Date
WO2023225358A1 (fr) 2023-11-23

Family

ID=88836015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/022989 WO2023225358A1 (fr) 2022-05-20 2023-05-19 Generation and tracking of cells with precise edits

Country Status (1)

Country Link
WO (1) WO2023225358A1 (fr)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050037487A1 (en) * 2003-05-28 2005-02-17 Yoshihiro Kawaoka Recombinant influenza vectors with a PolII promoter and ribozymes for vaccines and gene therapy
US20130316339A1 (en) * 2010-09-01 2013-11-28 Orion Genomics Llc Detection of nucleic acid sequences adjacent to repeated sequences
US20150184199A1 (en) * 2013-12-19 2015-07-02 Amyris, Inc. Methods for genomic integration
US20190330619A1 (en) * 2016-09-09 2019-10-31 The Board Of Trustees Of The Leland Stanford Junior University High-throughput precision genome editing
WO2020163779A1 (fr) * 2019-02-08 2020-08-13 The Board Of Trustees Of The Leland Stanford Junior University Production and tracking of engineered cells with combinatorial genetic modifications
US20200283780A1 (en) * 2019-03-08 2020-09-10 Zymergen Inc. Iterative genome editing in microbes
US20210017530A1 (en) * 2014-12-31 2021-01-21 Synthetic Genomics, Inc. RNA-Guided Endonuclease Expressing Algal Strain for High Efficiency In Vivo Genome Editing
WO2022007959A1 (fr) * 2020-07-10 2022-01-13 中国科学院动物研究所 Nucleic acid editing system and method
US20220049226A1 (en) * 2020-08-13 2022-02-17 Sana Biotechnology, Inc. Methods of treating sensitized patients with hypoimmunogenic cells, and associated methods and compositions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
STRUNZ TOBIAS, GRASSMANN FELIX, GAYÁN JAVIER, NAHKURI SATU, SOUZA-COSTA DEBORA, MAUGEAIS CYRILLE, FAUSER SASCHA, NOGOCEKE EVERSON,: "A mega-analysis of expression quantitative trait loci (eQTL) provides insight into the regulatory architecture of gene expression variation in liver", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 8, no. 1, US , XP093114130, ISSN: 2045-2322, DOI: 10.1038/s41598-018-24219-z *

Similar Documents

Publication Publication Date Title
US11760998B2 (en) High-throughput precision genome editing
US12054754B2 (en) CRISPR-associated transposon systems and components
Yumlu et al. Gene editing and clonal isolation of human induced pluripotent stem cells using CRISPR/Cas9
AU2024202007A1 (en) Novel CRISPR enzymes and systems
JP2022127638A (ja) Systems, methods and compositions for sequence manipulation with optimized functional CRISPR-Cas systems
EP3024964B1 (fr) Genome engineering
CA3111432A1 (fr) Novel CRISPR enzymes and systems
JP2021500036A (ja) Uses of adenosine base editors
CN110520528A (zh) High-fidelity Cas9 variants and applications thereof
WO2017196768A1 (fr) Self-targeting guide RNAs used in a CRISPR system
JP2019526248A (ja) Programmable Cas9-recombinase fusion proteins and uses thereof
WO2016100974A1 (fr) Unbiased identification of double-strand breaks and genomic rearrangement by genome-wide insert capture sequencing
EP3940078A1 (fr) Off-target single-nucleotide variants produced by single-base gene editing and a high-specificity off-target single-base gene editing tool
JP2022520063A (ja) Production and tracking of engineered cells with combinatorial genetic modifications
WO2023225358A1 (fr) Generation and tracking of cells with precise edits
EP4352233A1 (fr) CRISPR-transposon systems for DNA modification
KR20180128864A (ko) Composition for gene editing comprising a guide RNA containing a matched 5' nucleotide, and gene editing method using the same
US20210115500A1 (en) Genotyping edited microbial strains
US20210292752A1 (en) Method for Isolating or Identifying Cell, and Cell Mass
US20240263173A1 (en) High-throughput precision genome editing in human cells
WO2024023734A1 (fr) Multi-gRNA genome editing
US20240287506A1 (en) Library construction method based on long overhang sequence ligation
US20240124873A1 (en) Methods and compositions for combinatorial targeting of the cell transcriptome
US20240240164A1 (en) Non-viral homology mediated end joining
WO2024092217A1 (fr) Systems and methods for gene insertions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23808411

Country of ref document: EP

Kind code of ref document: A1