WO2019236893A2 - Stem cell lines containing endogenous, differentially-expressed tagged proteins, methods of production, and use thereof - Google Patents

Stem cell lines containing endogenous, differentially-expressed tagged proteins, methods of production, and use thereof

Info

Publication number
WO2019236893A2
Authority
WO
WIPO (PCT)
Prior art keywords
cell
selectable marker
protein
cells
tagged
Prior art date
Application number
PCT/US2019/035852
Other languages
French (fr)
Other versions
WO2019236893A3 (en
Inventor
Brock ROBERTS
Original Assignee
Allen Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Allen Institute
Publication of WO2019236893A2
Publication of WO2019236893A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06 Animal cells or tissues; Human cells or tissues
    • C12N5/0602 Vertebrate cells
    • C12N5/0696 Artificially induced pluripotent stem cells, e.g. iPS
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907 Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N5/00 Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
    • C12N5/06 Animal cells or tissues; Human cells or tissues
    • C12N5/0602 Vertebrate cells
    • C12N5/0652 Cells of skeletal and connective tissues; Mesenchyme
    • C12N5/0657 Cardiomyocytes; Heart cells
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2510/00 Genetically modified cells

Definitions

  • the present disclosure relates to the fields of stem cell biology, genetics, and genetic engineering.
  • the present disclosure relates to methods of genetically engineering stem cells to express one or more fluorescently-tagged structural or other proteins that are expressed when the stem cells undergo differentiation, but are otherwise not expressed in a pluripotent state.
  • the methods described herein allow for the generation of genetically-engineered, fluorescently-tagged stem cells, wherein the endogenous functions of the stem cells remain unaltered (see, e.g., pluripotency and genomic stability).
  • the methods allow for three-dimensional live cell imaging of intracellular proteins.
  • the methods allow for use of the cells for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement, or cellular stress in response to a test agent.
  • fusion constructs often result in unpredictable and artificial expression levels of the tagged protein, either as a result of transient expression of transfected constructs, or as a result of copy number variation with transduced constructs. These realities hinder the interpretation of experiments and in turn the study of pathogenesis and drug discovery.
  • CRISPR/Cas9 eliminates many of the challenges associated with genetic engineering and an ever-growing number of studies illuminate the power of this approach.
  • the system is most commonly used in loss-of-function studies, wherein one or more genes are mutated or deleted to generate genetic knock-outs. Less common is the use of the system to introduce exogenous genetic sequences into a target locus.
  • homology-directed repair (HDR) mediates the insertion of a repair template into the target locus and can be used to correct an existing mutation in the genomic sequence or to insert exogenous nucleic acid sequences (e.g., a nucleic acid sequence encoding one or more selectable markers).
  • Although HDR has a low error rate, it is an inherently inefficient process, with rates of less than 10% in normal cells. As such, until now it has been difficult to reproduce HDR-mediated protein tagging across multiple targets to enable systematic use of this process in the study of endogenous protein dynamics, particularly in view of the unpredictability of how the introduction of large fluorescent tags may affect endogenous gene function as well as stem cell viability, pluripotency, and chromosomal stability.
  • the methods provided herein utilize CRISPR/Cas9-mediated gene editing to introduce multiple selectable markers via HDR into the genomic loci of target proteins, into the genomic safe harbor location, or other locations in the genome. Utilizing a first selection of transfected cells, followed by removal of a first selectable marker, cells can be produced that include a tagged, endogenous, differentially-expressed protein. These methods result in the production of isogenic hiPSC clones expressing detectable endogenously-regulated, differentially-expressed fusion proteins unique to each cell line, and do not substantially modify or alter stem cell pluripotency or function.
  • the present invention provides a method for producing a cell comprising at least one tagged endogenous, differentially-expressed protein.
  • the method suitably comprises providing a first nuclease specific for a target genomic locus of a differentially-expressed protein, providing a donor plasmid comprising: a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm.
  • the methods further include introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus; selecting cells expressing the first selectable marker; and introducing into the selected cells: a second nuclease capable of excising the selection cassette to generate an endogenous protein tagged with the second selectable marker, wherein the tagged endogenous protein is substantially free of a scar sequence; thereby producing the cell comprising the at least one tagged endogenous, differentially-expressed protein.
  • a method for producing a stem cell comprising at least one tagged endogenous, differentially-expressed protein.
  • the method suitably comprises providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, differentially-expressed protein in a stem cell, providing a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
  • the method further includes transfecting the complex and the donor plasmid into the stem cell such that the polynucleotide sequences encoding the various components are inserted into the target genomic locus, selecting stem cells expressing the first selectable marker; and transfecting the stem cells of (d) with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous protein tagged with the second selectable marker, thereby producing the stem cell comprising at least one tagged endogenous, differentially-expressed protein.
  • donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
  • a stably tagged cell generated by insertion of the donor plasmids described herein.
  • the donor plasmids can be used for imaging one or more proteins in one or more cells.
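The donor plasmid constraints listed above (two distinct selectable markers, excision sites flanking the first marker, homology arms of at least about 1 kb) can be sketched as a simple design check. This is a minimal illustration only: the class name, field names, and the example marker labels are hypothetical, and treating "about 1 kb" as a hard 1000 bp cutoff is a simplifying assumption, not something the disclosure specifies.

```python
from dataclasses import dataclass

# ASSUMPTION: "at least about 1 kb" is simplified to a hard 1000 bp cutoff.
MIN_ARM_BP = 1000


@dataclass(frozen=True)
class DonorDesign:
    """Hypothetical summary of a donor plasmid design (names are illustrative)."""
    first_marker: str            # marker in the excisable selection cassette
    second_marker: str           # marker that remains as the protein tag
    arm5_bp: int                 # length of the 5' homology arm in base pairs
    arm3_bp: int                 # length of the 3' homology arm in base pairs
    first_marker_flanked: bool   # True if 5' and 3' excision sites flank the first marker


def design_problems(design: DonorDesign) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if design.first_marker == design.second_marker:
        problems.append("the two selectable markers must differ")
    if not design.first_marker_flanked:
        problems.append("excision sites must flank the first selectable marker")
    if min(design.arm5_bp, design.arm3_bp) < MIN_ARM_BP:
        problems.append("homology arms should be at least about 1 kb")
    return problems


if __name__ == "__main__":
    ok = DonorDesign("mCherry", "mEGFP", 1000, 1000, True)
    bad = DonorDesign("mEGFP", "mEGFP", 500, 1000, True)
    print(design_problems(ok))
    print(design_problems(bad))
```

A check of this kind only captures the structural requirements stated in the text; it says nothing about sequence content, reading frame, or excision site design.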
  • a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a gene encoding a differentially-expressed protein.
  • a cell comprising a CRISPR/Cas9 ribonucleoprotein (RNP) complex and a donor polynucleotide
  • the donor polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker, and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
  • a cell comprising an endogenous, differentially-expressed protein stably tagged with a selectable marker.
  • kits comprising an array of stem cells comprising at least one tagged endogenous, differentially-expressed protein.
  • a method of generating a signature for a test agent comprising: (a) admixing the test agent with one or more cells produced by the methods herein, detecting a response in the one or more cells; detecting a response in a control cell; detecting a difference in the response in the one or more cells from the control cell; and generating a data set of the difference in the response.
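The signature method above reduces to comparing a response in treated cells with the same response in control cells and recording the difference as a data set. The sketch below assumes, purely for illustration, that each response is a set of named numeric measurements; the function name and the feature names are hypothetical and do not come from the disclosure.

```python
def response_signature(treated: dict, control: dict) -> dict:
    """Difference (treated minus control) for every feature measured in both.

    ASSUMPTION: responses are represented as {feature_name: numeric_value}.
    Features present in only one of the two inputs are ignored.
    """
    shared = sorted(treated.keys() & control.keys())
    return {name: treated[name] - control[name] for name in shared}


if __name__ == "__main__":
    # Hypothetical measurements for cells exposed to a test agent and for control cells.
    treated = {"golgi_intensity": 1.5, "colony_area": 4.0}
    control = {"golgi_intensity": 1.0, "colony_area": 5.0}
    print(response_signature(treated, control))
```

A plain difference is the simplest choice here; ratios or normalized scores would fit the same outline if the measurements call for them.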
  • a cell comprising at least one tagged endogenous, stimuli-responsive gene
  • the method comprising: providing a first nuclease specific for a target genomic locus of a stimuli-responsive gene; providing a donor plasmid comprising: a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm; introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus; selecting cells expressing the first selectable marker; and introducing into the cells of (d): a second nuclease capable of
  • methods for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising: providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, stimuli-responsive gene in a cell; providing a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length; transfecting the complex and the donor plasmid into
  • a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a stimuli-responsive gene.
  • FIG. 1A - FIG. 1D provide schematics of illustrative gene editing and clone selection protocols.
  • FIG. 1A shows a schematic illustrating design features important for genome editing experiments.
  • FIG. 1B illustrates a schematic of donor plasmids for N-terminal tagging of LMNB1 and C-terminal tagging of DSP.
  • FIG. 1C illustrates a schematic depicting the genome editing process.
  • FIG. 1D shows a schematic overview of the clone isolation, genetic screening, and quality control workflow.
  • FIG. 2A - FIG. 2D illustrate comparisons of gene editing efficiency.
  • FIG. 2A shows flow cytometry plots displaying GFP intensity (y-axis) 3-4 days after editing.
  • FIG. 2B shows a comparison of genome editing efficiency, as defined by FACS, shown as a percentage of GFP+ cells within the gated cell population in each panel of FIG. 2A.
  • FIG. 2C shows estimated percentage of cells in the FACS-enriched populations expressing GFP, as determined by live microscopy.
  • FIG. 2D shows a representative image of the LMNB1 Cr1 FACS-enriched population showing an enrichment of GFP+ cells. Scale bars are 10 µm.
  • FIG. 3 shows a schematic illustrating the sequential process for identifying precisely tagged clones.
  • In step 1, ddPCR was used to identify clones with GFP insertion (normalized genomic GFP copy number ~1 or ~2) and no plasmid integration (normalized genomic plasmid backbone copy number < 0.2). A hypothetical example of a typical editing experiment is shown with examples of pass and fail criteria.
  • In step 2 (FIG. 3B), junctional PCR amplification of the tagged allele was used to determine precise on-target GFP insertion.
  • In step 3 (FIG. 3), the untagged allele of a clone with monoallelic GFP insertion was amplified. The amplicon was then sequenced to ensure that no mutations had been introduced to this allele.
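The step 1 ddPCR criterion can be sketched as a small filter over per-clone copy numbers. This is a minimal sketch under stated assumptions: the pass criteria are read as a GFP copy number of about 1 or about 2 with a plasmid backbone copy number below 0.2 (consistent with the FIG. 5D description of rejected clones having KAN/AMP copy number > 0.2), and the tolerance used for "about" is a hypothetical value chosen for illustration. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass

BACKBONE_MAX = 0.2      # read from the figure descriptions: backbone copy number below 0.2
COPY_TOLERANCE = 0.25   # ASSUMPTION: how close to 1 or 2 counts as "about"; not given in the text


@dataclass(frozen=True)
class CloneDdpcr:
    """Hypothetical record of ddPCR results for one clone."""
    clone_id: str
    gfp_copy: float       # normalized genomic GFP copy number
    backbone_copy: float  # normalized genomic plasmid backbone copy number


def passes_step1(clone: CloneDdpcr, tol: float = COPY_TOLERANCE) -> bool:
    """True if the clone looks like a mono- or biallelic tag insertion without backbone."""
    near_one_or_two = any(abs(clone.gfp_copy - n) <= tol for n in (1, 2))
    return near_one_or_two and clone.backbone_copy < BACKBONE_MAX


if __name__ == "__main__":
    clones = [
        CloneDdpcr("cl_A", 1.05, 0.02),  # tag present, no backbone
        CloneDdpcr("cl_B", 1.00, 0.90),  # backbone integrated
        CloneDdpcr("cl_C", 0.50, 0.00),  # GFP copy number too low
    ]
    print([c.clone_id for c in clones if passes_step1(c)])
```

Clones passing this filter would still need the junctional PCR and untagged-allele sequencing described in steps 2 and 3.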
  • FIG. 4A - FIG. 4E show results of genetic assays to screen for precise genome editing in clones.
  • FIG. 4A shows ddPCR screening data from five experiments representative of experimental outcome categories.
  • FIG. 4B shows examples of ddPCR screening data from experiments representative of the range of outcomes observed. Each data point represents one clone.
  • FIG. 4C shows the rates of clonal confirmation by junctional tiled PCR following selection by ddPCR.
  • FIG. 4D shows the rates of clonal confirmation by junctional tiled PCR when ddPCR was not used as an initial screening criterion.
  • FIG. 4E shows the rate of clonal confirmation by untagged allele amplification and sequencing.
  • FIG. 5A - FIG. 5E show additional results of genetic assays to screen for precise genome editing in clones.
  • FIG. 5A shows percentage of clones confirmed by ddPCR to have incorporated the GFP tag but not the plasmid backbone.
  • FIG. 5B shows percentage of clones confirmed in step 1 that also had correctly sized junctional PCR amplicons.
  • FIG. 5C shows percentage of clones confirmed to have wild type untagged alleles by PCR amplification and Sanger sequencing following steps 1 and 2.
  • FIG. 5D shows the percentage of clones in each experiment with KAN/AMP copy number > 0.2 (y-axis). Stacked bars represent 3 observed subcategories of rejected clones.
  • FIG. 5E shows fragment analysis of complete junctional allele amplification.
  • FIG. 6A - FIG. 6C show amplification of complete junctional (non-tiled)
  • FIG. 6A shows junctional PCR primers complementary to sequences flanking the homology arms in the distal genome were used together to co-amplify tagged and untagged alleles.
  • FIG. 6B shows an assay served to rule out anticipated DNA repair outcomes where tiled junctional PCR data leads to a misleading result because the GFP tag sequence has been duplicated during HDR, as indicated by the schematic.
  • FIG. 6C shows molecular weight markers are as indicated (kb).
  • FIG. 7 illustrates the morphology of final candidate clones with GFP-tagged
  • FIG. 8A - FIG. 8K show live-cell imaging of final 10 edited clonal lines.
  • FIG. 9A - FIG. 9C show cell biological assays to evaluate co-expression of tagged and untagged protein forms and their relative contributions to cellular proteome and structure.
  • FIG. 9A shows comparison of labeled structures in edited cells and unedited WTC parental cells.
  • FIG. 9B shows lysate from ACTB cl. 184 (left), TOMM20 cl. 27 (middle), and LMNB1 cl. 210 (right) are compared to unedited WTC cell lysate by western blot.
  • FIG. 9C shows quantification of the Western blot analyses in FIG. 9B.
  • FIG. 10A - FIG. 10F show an assessment of stem cell quality after genome editing.
  • FIG. 10A shows representative phase contrast images depicting cell and colony morphology of the unedited WTC line and several GFP-tagged clones (LMNB1, ACTB, TOMM20, and PXN).
  • FIG. 10B shows representative flow cytometry plots of gene-edited LMNB1 cl. 210 cells and unedited WTC cells immunostained for indicated pluripotency markers (Nanog, Oct3/4, Sox2, SSEA-3, TRA-1-60) and a marker of differentiation (SSEA-1).
  • FIG. 10C shows representative flow cytometry plots of differentiated unedited WTC cells or gene-edited LMNB1 cl.
  • FIG. 10D shows cardiomyocytes differentiated from unedited WTC cells and stained with cardiac Troponin T (cTnT) antibody to label cardiac myofibrils.
  • FIG. 10E shows representative flow cytometry plots showing cTnT expression in unedited WTC control cells and several gene edited cell lines (LMNB1 cl. 210, ACTB cl. 184, and TOMM20 cl. 27).
  • FIG. 10F shows a quantitative assessment of pluripotency and cardiomyocyte differentiation markers for final clones.
  • FIG. 11A - FIG. 11E illustrate results of phenotypic validation of candidate clones.
  • FIG. 12 illustrates expression levels of the 12 genes attempted for genome editing in the WTC parental cell line.
  • FIG. 13A - FIG. 13E illustrate predicted genome wide CRISPR/Cas9 alternative binding sites, categorized according to sequence profile and location with respect to genes.
  • FIG. 13A shows predicted alternative CRISPR/Cas9 binding sites (SEQ ID NOs: 174 - 186) categorized for each crRNA used.
  • FIG. 13B shows predicted off-target sequence breakdown based on sequence profile.
  • FIG. 13C shows breakdown of sequenced off-target sites by sequence profile.
  • FIG. 13D shows all predicted off-target sites were additionally categorized according to their location with respect to annotated genes.
  • FIG. 13E shows breakdown of sequenced off-target sites by genomic location with respect to annotated genes.
  • FIG. 14A - FIG. 14B illustrate ddPCR screening data.
  • FIG. 14A shows ddPCR screening data for all experiments.
  • FIG. 14B shows a dilution series of the donor plasmid used for the PXN-EGFP tagging experiment was used to confirm equivalent amplification of the AMP and GFP sequences in two-channel ddPCR assays.
  • FIG. 15 illustrates comparison of unedited versus edited cells by immunofluorescence.
  • FIG. 16 illustrates comparison of GFP tag localization and endogenous protein stain in edited cell lines.
  • FIG. 17 shows live cell imaging comparison of transiently transfected cells and genome edited cells. Top panels depict transiently transfected WTC cells and bottom panels depict gene edited clonal lines. Left: WTC transfected with EGFP-tagged alpha tubulin construct compared to the TUBA1B-mEGFP edited cell line. Images are a single apical frame. Middle: WTC transfected with EGFP-tagged desmoplakin construct compared to the DSP-mEGFP edited cell line. Images are maximum intensity projections of apical 4 z-frames. Right: WTC transfected with mCherry-tagged Tom20 construct compared to the TOMM20-mEGFP edited cell line. Images are single basal frames of the cell.
  • FIG. 18A - FIG. 18B show Western blot analysis of all 10 edited clonal lines.
  • FIG. 19A - FIG. 19B show editing experiments testing the feasibility of biallelic editing of the LMNB1 and TUBA1B loci.
  • FIG. 19A shows final clones LMNB1-mEGFP and TUBA1B-mEGFP were transfected using the standard editing protocol with a donor cassette targeting the untagged allele of the tagged locus, encoding mTagRFP-T (sequential delivery, top row).
  • FIG. 19B shows the sorted population from FIG. 19A (indicated by asterisk) revealed similar subcellular localization of GFP and mTagRFP-T signal to the nuclear envelope in the majority of cells, suggesting successful biallelic tagging.
  • FIG. 20A - FIG. 20B show live imaging analysis at two culture time points of TUBA1B-mEGFP edited cells and the four final edited clones that displayed a low abundance of tagged protein.
  • FIG. 21A - FIG. 21C show Western blot analysis of candidate clones at one culture time point and final clones at two culture time points from editing experiments that displayed a low abundance of tagged protein.
  • FIG. 22A - FIG. 22D show flow cytometry analysis of GFP tag expression stability, flow cytometry analysis of cell cycle dynamics, microscopy analysis of mitotic index, and culture growth assays.
  • FIG. 22A shows endogenous GFP signal in final edited clones was compared in otherwise identical cultures separated by four passages (14 days) of culturing time (indicated).
  • FIG. 22B shows propidium iodide staining and flow cytometry were used to quantify numbers of cells in G1 (indicated), S phase (indicated) and G2/M phase (indicated) in final edited clones.
  • FIG. 22C shows DAPI staining of colonies from each of the same five clonal lines was additionally used to quantify the numbers of mitotic cells per colony, as indicated.
  • FIG. 22D shows ATP quantitation was used as an indirect measure of cell growth.
  • FIG. 23 illustrates PCR primers (SEQ ID NOs: 193 - 272) used in experiments. All primers are listed in 5' to 3' orientation.
  • FIG. 24A - FIG. 24B illustrate antibodies used in western blot, immunofluorescence, and flow cytometry experiments.
  • FIG. 25 illustrates a workflow overview and strategy for building predictive models of the dynamic organization and behavior of cells using image-based 3D data sets of fluorescently tagged structures in human induced pluripotent stem cells (hiPSC).
  • FIG. 26A - FIG. 26C illustrate image-based feature extraction: colony growth and fluorescent texture quantification to sort and select drug-induced end point phenotypes.
  • FIG. 27 illustrates high resolution 3D images that reveal drug signatures on target and non-target cell structures as well as the morphological spectrum of each structure.
  • FIG. 28A - FIG. 28C illustrate fluorescence quantification of 3D images to analyze drug-induced Golgi reorganization.
  • FIG. 29A - FIG. 29F illustrate relative fluorescence quantification of 3D images and z-axis intensity profiling to analyze drug-induced cytoskeleton reorganization.
  • FIG. 30 illustrates Z-axis intensity profiling of 3D images to analyze drug-induced cell junction reorganization.
  • FIG. 31 illustrates Z-axis intensity profiling of 3D images to analyze drug-induced cell junction reorganization.
  • FIG. 32 illustrates exemplary factors for producing differentiated cell types from human iPSCs.
  • FIG. 33A - FIG. 33H illustrate a two-step CRISPR/Cas9 mediated targeting via HDR and subsequent microhomology guided excision of a constitutively expressed selection cassette, in accordance with embodiments hereof.
  • FIG. 34A - FIG. 34C illustrate fluorescence assisted cell sorting (FACS) experiments to isolate mCherry-expressing cells and establish efficacy of two-step editing at transcriptionally silent loci, in accordance with embodiments hereof.
  • FIG. 35A - FIG. 35E illustrate FACS-sorting of mCherry-negative cells to measure excision and obtain putatively GFP-tagged cells, in accordance with embodiments hereof.
  • FIG. 36A - FIG. 36F illustrate genetic analysis of precise GFP tagging using two-step targeting and excision in clones, in accordance with embodiments hereof.
  • FIG. 37A - FIG. 37C illustrate quantitative assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised clones, in accordance with embodiments hereof.
  • FIG. 38 provides quality control criteria to evaluate the robustness of clonal line differentiation, pluripotency and genomic stability, in accordance with embodiments hereof.
  • FIG. 39A - FIG. 39C illustrate imaging experiments to evaluate sarcomeric localization of the GFP-tagged alleles, in accordance with embodiments hereof.
  • FIG. 40A - FIG. 40D illustrate quantitative and imaging assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised MYL2 clones, in accordance with embodiments hereof.
  • the present invention provides methods for producing stem cells comprising one or more tagged proteins using the CRISPR/Cas9 gene editing system.
  • the methods described herein enable the insertion of fluorescent tags into a target genomic locus or plurality of target genomic loci to generate stem cells that are phenotypically and functionally similar to the unmodified parent population.
  • Stem cells produced by the methods described herein additionally retain the capacity to self-renew and differentiate into specialized cell types.
  • any concentration range, percentage range, ratio range, or integer range is to be understood to include the value of any integer within the recited range and, when appropriate, fractions thereof (such as one tenth and one hundredth of an integer), unless otherwise indicated.
  • the terms “about” and “approximately” are used as equivalents. Any numerals used in this application with or without about/approximately are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art.
  • the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
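The percentage-based reading of "about" can be made concrete with a short worked example. The sketch below is illustrative only: the function names are hypothetical, the 10% default is just one of the listed percentages picked as an assumption, and the exception for values that would exceed 100% of a possible value is not handled.

```python
def about_range(reference: float, percent: float = 10.0) -> tuple:
    """Lower and upper bounds within `percent` of `reference`, in either direction.

    ASSUMPTION: `percent` is chosen by the reader from the listed values
    (25%, 20%, ..., 1%); the text does not single out one of them.
    """
    delta = abs(reference) * percent / 100.0
    return reference - delta, reference + delta


def is_about(value: float, reference: float, percent: float = 10.0) -> bool:
    """True if `value` falls within `percent` of `reference`."""
    low, high = about_range(reference, percent)
    return low <= value <= high


if __name__ == "__main__":
    # Example: a homology arm described as "about 1 kb" (1000 bp), read at 10%.
    print(about_range(1000, 10))
    print(is_about(950, 1000, 10))
    print(is_about(850, 1000, 10))
```

Under a 10% reading, "about 1 kb" would span 900 to 1100 bp; a 25% reading would widen that to 750 to 1250 bp.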
  • the present invention provides for methods of producing a stem cell comprising at least one tagged endogenous protein.
  • the endogenous protein is a wild-type protein, whereas in other embodiments, the endogenous protein comprises one or more naturally-occurring mutations and/or one or more introduced mutations. Examples of mutations include but are not limited to amino acid insertions, deletions and substitutions.
  • stem cell refers to a multipotent, non-specialized cell with the capacity to self-renew and to differentiate into at least one differentiated cell lineage (e.g., potency).
  • The “stemness” of a stem cell includes the characteristics of self-renewal and multipotency.
  • Self-renewal refers to the proliferation of a stem cell to generate one (asymmetric division) or two (symmetric division) daughter cells with development potentials that are indistinguishable from those of the mother cell.
  • Self-renewal results in an expanded population of stem cells, each of which maintains an undifferentiated state and the ability to differentiate into specialized cells. Typically, an expanded population of stem cells retains the stemness characteristics of the parent cell.
  • Potency refers to the ability of a stem cell to differentiate into at least one type of specialized cell. The greater the number of different specialized cell types a stem cell can differentiate into, the greater its potency.
  • a stem cell may be a totipotent cell, and able to differentiate into any specialized cell type (e.g., a zygote).
  • a stem cell may be pluripotent and able to differentiate into cell types of any of the three germ layers (endoderm, mesoderm, or ectoderm) (e.g., an embryonic stem cell or an induced pluripotent stem cell (iPSC)).
  • the stem cell may be multipotent and have the capacity to differentiate into multiple cell types of a particular cell lineage (e.g., a hematopoietic stem cell).
  • Multipotent stem cells may also be referred to as progenitor cells.
  • stem cells may be obtained from a donor, or they may be generated from a non-stem cell.
  • Non-limiting examples of stem cells include embryonic stem cells and adult stem cells.
  • Stem cells include, but are not limited to, mesenchymal stem cells, adipose tissue-derived stem cells, hematopoietic stem cells, and umbilical cord-derived stem cells.
  • the stem cells described herein are human iPSCs.
  • iPSCs are derived from differentiated adult cells and have been modified to express transcription factors and proteins responsible for the induction and/or maintenance of a pluripotent state (e.g., Oct 3/4, Sox family transcription factors, Klf family transcription factors, and Nanog).
  • the iPSCs described herein are derived from a normal, healthy human donor.
  • the iPSC is a WTC or a WTB cell line (Kreitzer et al., American Journal of Stem Cells, 2:119-31, 2013; Miyaoka et al., Nature Methods, 11:291-3, 2013).
  • the iPSC is derived from a human donor that has been diagnosed with a disease or disorder.
  • the iPSC may be derived from a patient diagnosed with a cardiomyopathy (e.g., arrhythmogenic right ventricular cardiomyopathy, dilated cardiomyopathy, hypertrophic cardiomyopathy, left ventricular non-compaction cardiomyopathy, or restrictive cardiomyopathy), a heritable disease (e.g., deficiency of acyl-CoA dehydrogenase, very long chain (ACADVL), Barth syndrome (BTHS), carnitine-acylcarnitine translocase deficiency (CACTD), congenital disorder of deglycosylation (CDDG), muscular dystrophies (including Emery-Dreifuss muscular dystrophy (EDMD1), autosomal dominant Emery-Dreifuss muscular dystrophy (EDMD2), Duchenne’s muscular dystrophy, and chronic granulomatous disease), Friedreich ataxia 1 (FRDA)
  • Stem cell markers as used herein are defined as gene products (e.g. protein, RNA, glycans, glycoproteins, etc.) that are specifically or predominantly expressed by stem cells.
  • Cells may be identified as a particular type of stem cell based on their expression of one or more of the stem cell markers using techniques commonly available in the art including, but not limited to, analysis of gene expression signatures of cell populations by microarray, qPCR, RNA-sequencing (RNA-Seq), Next-generation sequencing (NGS), serial analysis of gene expression (SAGE), and/or analysis of protein expression by immunohistochemistry, western blot, and flow cytometry.
  • Stem cell markers may be present in the nucleus (e.g., transcription factors), in the cytosol, and/or on the cell membrane (e.g., cell-surface markers).
  • a stem cell marker is a gene product that directly and specifically supports the maintenance of stem cell identity and/or stem cell function.
  • a stem cell marker is a gene that is expressed specifically or predominantly by stem cells but does not necessarily have a specific function in the maintenance of stem cell identity and/or stem cell function. Examples of stem cell markers include, but are not limited to, Oct 3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3.
  • the present invention provides genetically engineered stem cells.
  • the terms “genetically engineered stem cells” or “modified stem cells” or “edited stem cells” refer to stem cells that comprise one or more genetic modifications, such as one or more tags inserted into a locus of one or more endogenous target genes.
  • “Genetic engineering” refers to the process of manipulating a genomic DNA sequence to mutate or delete one or more nucleic acids of the endogenous sequence or to introduce an exogenous nucleic acid sequence into the genomic locus.
  • the genetically-engineered or modified stem cells described herein comprise a genomic DNA sequence that is altered (e.g., genetically engineered to express a tag) compared to an un-modified stem cell or control stem cell.
  • an un-modified or control stem cell refers to a cell or population of cells wherein the genomes have not been experimentally manipulated (e.g., stem cells that have not been genetically engineered to express a tag).
  • the stem cells described herein are derived from a donor (e.g., a healthy donor) and comprise one or more genetic mutations associated with a particular disease or disorder introduced into the iPSC genome. Such embodiments are referred to herein as “mutant stem cells.”
  • Introduction of mutations into an iPSC derived from a healthy donor can mimic the genetic state of a particular disease or disorder, while maintaining the isogenic relationship between the mutant stem cell and the normal iPSC from which it is derived. This allows direct comparisons between the two cell types to be made when assessing the effect of a particular mutation on cellular structure, cellular function, protein localization, protein function, and/or protein expression.
  • mutations may be introduced into the PKD1 and/or PKD2 genes of an iPSC derived from a healthy donor to produce a PC1-mutant stem cell, a PC2-mutant stem cell, or a PC1/PC2-mutant stem cell.
  • These mutant stem cells and the corresponding normal stem cells from which they are derived can then be further engineered to express one or more detectable markers in one or more endogenous target genomic loci.
  • these cells are assayed according to the methods described herein to determine the effect of a particular mutation on cellular structure, cellular function, protein localization, protein function, and/or protein expression, and can elucidate the role of a protein in different diseases, such as polycystic kidney disease.
  • the present invention provides populations of genetically engineered stem cells that have been modified to express one or more tagged endogenous proteins.
  • a “population” of cells refers to any number of cells greater than 1, e.g., at least 1x10^3 cells, at least 1x10^4 cells, at least 1x10^5 cells, at least 1x10^6 cells, at least 1x10^7 cells, at least 1x10^8 cells, at least 1x10^9 cells, or at least 1x10^10 or more cells.
  • the present invention provides methods of producing genetically-engineered stem cells comprising at least one tagged endogenous protein.
  • the method comprises (a) providing a gene-editing system capable of producing double or single stranded DNA breaks at a target endogenous locus; (b) providing a repair template comprising a polynucleotide sequence encoding a detectable tag; (c) introducing the gene-editing system and the repair template into a stem cell such that the polynucleotide sequence encoding the detectable tag is inserted into an endogenous target genomic locus to generate the tagged endogenous protein.
  • the cells are cultured under conditions that allow insertion of the sequence encoding the detectable tag into the target genomic locus, such as any of those disclosed herein.
  • the cells produced in step (c) are cultured under conditions suitable for expression of the tagged endogenous protein.
  • the stem cell is an iPSC, and the methods further comprise generating the iPSC.
  • the iPSCs are generated from cells obtained from a donor, such as a normal, healthy donor or a diseased donor.
  • the methods described herein are used to produce a genetically-engineered stem cell comprising one tagged endogenous protein. In some embodiments, the methods described herein are used to produce a genetically-engineered stem cell comprising two, three, four, five, six, seven, eight, nine, ten, or more tagged endogenous proteins. In some embodiments, the repair template comprises a 5’ homology arm and a 3’ homology arm, each of about 1 kb in length, or each more than 1 kb in length.
  • the term “gene-editing system” refers to a protein, nucleic acid, or combination thereof that is capable of modifying a target locus of an endogenous DNA sequence when introduced into a cell.
  • Numerous gene editing systems suitable for use in the methods of the present invention are known in the art including, but not limited to, zinc-finger nuclease systems, TALEN systems, and CRISPR/Cas systems.
  • the gene editing system used in the methods described herein is a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR Associated) nuclease system, which is an engineered nuclease system based on a bacterial system that can be used for mammalian genome engineering.
  • the system comprises a CRISPR-associated endonuclease (for example, a Cas endonuclease) and a guide RNA (gRNA).
  • the gRNA is comprised of two parts; a crispr-RNA (crRNA) that is specific for a target genomic DNA sequence, and a trans-activating RNA (tracrRNA) that facilitates endonuclease binding to the DNA at the targeted insertion site.
  • the crRNA and tracrRNA may be present in the same RNA oligonucleotide, referred to as a single guide-RNA (sgRNA).
  • the crRNA and tracrRNA may be present as separate RNA oligonucleotides.
  • the gRNA is comprised of a crRNA oligonucleotide and a tracrRNA oligonucleotide that associate to form a crRNA:tracrRNA duplex.
  • the term “guide RNA” or “gRNA” refers to the combination of a tracrRNA and a crRNA, present as either an sgRNA or a crRNA:tracrRNA duplex.
  • the CRISPR/Cas systems described herein comprise a Cas protein, a crRNA, and a tracrRNA.
  • the crRNA and tracrRNA are combined as a duplex RNA molecule to form a gRNA.
  • the crRNA:tracrRNA duplex is formed in vitro prior to introduction to a cell.
  • the crRNA and tracrRNA are introduced into a cell as separate RNA molecules and crRNA:tracrRNA duplex is then formed intracellularly.
  • polynucleotides encoding the crRNA and tracrRNA are provided.
  • the polynucleotides encoding the crRNA and tracrRNA are introduced into a cell and the crRNA and tracrRNA molecules are then transcribed intracellularly.
  • the crRNA and tracrRNA are encoded by a single polynucleotide.
  • the crRNA and tracrRNA are encoded by separate polynucleotides.
  • a detectable tag is inserted into a target locus of an endogenous gene mediated by Cas-mediated DNA cleavage at or near a target insertion site.
  • target insertion site refers to a specific location within a target locus, wherein a polynucleotide sequence encoding a detectable tag can be inserted.
  • a Cas endonuclease is directed to the target insertion site by the sequence specificity of the crRNA portion of the gRNA, which requires the presence of a protospacer adjacent motif (PAM) sequence near the target insertion site.
  • PAM sequences suitable for use with a particular endonuclease are known in the art (see e.g., Nat Methods. 2013 Nov; 10(11): 1116-1121 and Sci Rep. 2014; 4: 5405). Exemplary PAM sequences suitable for use in the present invention are shown in Table 5.
  • the target locus comprises a PAM sequence within 50 base pairs of the target insertion site. In some embodiments, the target locus comprises a PAM sequence within 10 base pairs of the target insertion site.
  • the genomic loci that can be targeted by this method are limited only by the relative distance of the PAM sequence to the target insertion site and the presence of a unique 20 base pair sequence to mediate sequence-specific, gRNA-mediated Cas9 binding.
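  • The two constraints just described (a PAM within a set distance of the target insertion site, and a unique 20 base pair sequence next to it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the specification: the function names, the default "NGG" PAM (the motif commonly associated with SpCas9), the choice to measure distance from the first PAM base, and the restriction to the strand supplied. A practical design tool would also scan the reverse complement and a whole genome rather than a single locus.

```python
import re

def find_pam_sites(locus, insertion_site, pam="NGG", window=50, protospacer_len=20):
    """List (pam_start, protospacer) pairs for PAMs near the insertion site.

    Only the strand given in `locus` is scanned; a fuller search would also
    scan the reverse complement.
    """
    locus = locus.upper()
    # Turn the PAM into a regular expression in which N matches any base.
    pattern = "".join("[ACGT]" if base == "N" else base for base in pam.upper())
    hits = []
    # The lookahead reports overlapping PAMs as well (for example in "GGGG").
    for match in re.finditer("(?=(" + pattern + "))", locus):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # too close to the 5' end for a full-length protospacer
        if abs(pam_start - insertion_site) > window:
            continue  # PAM lies outside the allowed distance
        hits.append((pam_start, locus[pam_start - protospacer_len:pam_start]))
    return hits

def count_occurrences(sequence, query):
    """Count occurrences of `query` in `sequence`, overlapping ones included."""
    count = 0
    start = sequence.find(query)
    while start != -1:
        count += 1
        start = sequence.find(query, start + 1)
    return count

def is_unique(protospacer, reference):
    """True if the protospacer occurs exactly once in the reference sequence."""
    return count_occurrences(reference.upper(), protospacer.upper()) == 1
```

  • Calling `find_pam_sites(locus, insertion_site, window=10)` applies the narrower 10 base pair limit, and `is_unique` can then be used to discard candidates whose 20 base pair sequence recurs elsewhere in the reference.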
  • the target insertion site is located at the 5’ terminus of the target locus. In some embodiments, the target insertion site is located at the 3’ end of the target locus. In some embodiments, the target insertion site is located within an intron or an exon of the target locus.
  • the specificity of a gRNA for a target locus is mediated by the crRNA sequence, which comprises a sequence of about 20 nucleotides that are complementary to the DNA sequence at a target locus.
  • the crRNA sequences used in the methods of the present invention are at least 90% complementary to a DNA sequence of a target locus.
  • the crRNA sequences used in the methods of the present invention are at least 95%, 96%, 97%, 98%, or 99% complementary to a DNA sequence of a target locus.
  • the crRNA sequences used in the methods of the present invention are 100% complementary to a DNA sequence of a target locus.
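  • As an illustration of the complementarity thresholds above, the following sketch computes percent complementarity between a crRNA and an equal-length DNA target. The function name and the ungapped, position-by-position comparison are assumptions made for this example; the specification does not prescribe how complementarity is to be calculated. Both sequences are written 5' to 3', and pairing is treated as antiparallel.

```python
# DNA base -> the RNA base that pairs with it
RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}

def percent_complementarity(crrna, dna_target):
    """Percent of crRNA positions that base-pair with the DNA target.

    Both sequences are given 5'->3' and must have the same length; pairing is
    antiparallel, so crRNA position i is compared with the DNA base at the
    mirrored position.
    """
    crrna = crrna.upper()
    dna_target = dna_target.upper()
    if len(crrna) != len(dna_target):
        raise ValueError("sequences must have the same length")
    if not crrna:
        raise ValueError("sequences must not be empty")
    paired = sum(
        1
        for rna_base, dna_base in zip(crrna, reversed(dna_target))
        if RNA_PARTNER.get(dna_base) == rna_base
    )
    return 100.0 * paired / len(crrna)
```

  • For a 20-nucleotide crRNA, 18 paired positions correspond to 90% complementarity and 19 paired positions to 95%.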
  • the crRNA sequences described herein are designed to minimize off-target binding using algorithms known in the art (e.g., Cas-OFFinder) to identify target sequences that are unique to a particular target locus or target gene.
  • the crRNA sequences used in the methods of the present invention are at least 90% identical to one of SEQ ID NOs: 85 - 140.
  • the crRNA sequences used in the methods of the present invention are at least 95%, 96%, 97%, 98%, or 99% identical to one of SEQ ID NOs: 85 - 140.
  • the crRNA sequences used in the methods of the present invention are 100% identical to one of SEQ ID NOs: 85 - 140. Exemplary crRNA sequences are shown in Table 5.
  • the endonuclease is a Cas protein. In some embodiments, the endonuclease is a Cas9 protein. In some embodiments, the Cas9 protein is derived from Streptococcus pyogenes (e.g., SpCas9), Staphylococcus aureus (e.g., SaCas9), or Neisseria meningitidis (NmeCas9).
  • the Cas endonuclease is a Cas9 protein or a Cas9 ortholog and is selected from the group consisting of SpCas9, SpCas9-HF1, SpCas9-HF2, SpCas9-HF3, SpCas9-HF4, SaCas9, FnCpf, FnCas9, eSpCas9, and NmeCas9.
  • the endonuclease is selected from the group consisting of C2C1, C2C3, Cpf1 (also referred to as Cas12a), Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, and Csf4.
  • the Cas9 is a wildtype (WT) Cas9 protein or ortholog.
  • WT Cas9 comprises two catalytically active domains (HNH and RuvC). Binding of WT Cas9 to DNA based on gRNA specificity results in double-stranded DNA breaks that can be repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR).
  • Cas9 is fused to proteins that recruit DNA-damage signaling proteins, exonucleases, or phosphatases to further increase the likelihood or the rate of repair of the target sequence by one repair mechanism or another.
  • a WT Cas9 is co-expressed with a nucleic acid repair template to facilitate the incorporation of an exogenous nucleic acid sequence by homology-directed repair.
  • a WT Cas9 is co-expressed with an exogenous nucleic acid sequence encoding a detectable tag to facilitate the incorporation of the nucleic acid encoding the detectable tag into an endogenous target locus by homology-directed repair.
  • the Cas9 is a Cas9 nickase mutant.
  • Cas9 nickase mutants comprise only one catalytically active domain (either the HNH domain or the RuvC domain).
  • the Cas9 nickase mutants retain DNA binding based on gRNA specificity, but are capable of cutting only one strand of DNA, resulting in a single-strand break (e.g., a “nick”).
  • two complementary Cas9 nickase mutants (e.g., one Cas9 nickase mutant with an inactivated RuvC domain, and one Cas9 nickase mutant with an inactivated HNH domain) are expressed in the same cell with two gRNAs corresponding to two respective target sequences: one target sequence on the sense DNA strand, and one on the antisense DNA strand.
  • This dual-nickase system results in staggered double stranded breaks and can increase target specificity, as it is unlikely that two off-target nicks will be generated close enough to generate a double stranded break.
  • a Cas9 nickase mutant is co-expressed with a nucleic acid repair template to facilitate the incorporation of an exogenous nucleic acid sequence by homology-directed repair.
  • a Cas9 nickase mutant is co-expressed with an exogenous nucleic acid sequence encoding a detectable tag to facilitate the incorporation of the nucleic acid encoding the detectable tag into an endogenous target locus by homology-directed repair.
  • the components of a gene editing system are introduced into a population of stem cells with a repair template.
  • the repair template comprises a polynucleotide sequence encoding a detectable tag flanked on both the 5’ and 3’ ends by homology arm polynucleotide sequences.
  • the homology arm sequences and detectable tag sequences comprised within a repair template facilitate the repair of the Cas9-induced double-stranded DNA breaks at an endogenous target locus by homology-directed repair (HDR).
  • repair of the double-stranded breaks by HDR results in the insertion of the polynucleotide sequence encoding the detectable tag into the endogenous target locus.
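  • The layout of the repair template and the outcome of HDR described above can be sketched at the sequence level. This is a toy string model under stated assumptions, not a description of the biological mechanism or of any method in the specification: the function names are hypothetical, both homology arms are assumed to match the locus exactly, and the two arms are assumed to be directly adjacent in the unedited locus so that the tag is inserted at their junction.

```python
def build_repair_template(five_prime_arm, tag, three_prime_arm):
    """Concatenate 5' homology arm, tag-encoding sequence, and 3' homology arm."""
    return five_prime_arm + tag + three_prime_arm

def simulate_hdr(locus, five_prime_arm, tag, three_prime_arm):
    """Return the locus with the tag inserted between the two homology arms.

    Assumes the arms are exact matches and contiguous in the unedited locus.
    """
    junction = five_prime_arm + three_prime_arm
    start = locus.find(junction)
    if start == -1:
        raise ValueError("homology arms not found as a contiguous match in the locus")
    insertion_site = start + len(five_prime_arm)
    return locus[:insertion_site] + tag + locus[insertion_site:]
```

  • After the simulated repair, the full repair template sequence is present in the edited locus, which mirrors the statement that HDR inserts the tag-encoding sequence into the endogenous target locus.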
  • the repair template comprises a nucleic acid sequence that is at least about 90% identical to a sequence selected from SEQ ID NOs: 31 - 84. In some embodiments, the repair template comprises a nucleic acid sequence that is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 31 - 84. In some embodiments, the repair template comprises a nucleic acid sequence that is 100% identical to a sequence selected from SEQ ID NOs: 31 - 84.
  • each of the 5’ and 3’ homology arms is at least about
  • the homology arm sequences may be at least 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000 or more base pairs long. In some embodiments, the homology arm sequences are at least about 1000 base pairs long. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus.
  • the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 1 - 15.
  • the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 1 - 15. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 1 - 15.
  • the 3’ homology arm polynucleotide sequence is at least about 90% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus. In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus. In some embodiments, the 3’ homology arm polynucleotide sequence is 100% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus.
  • the 3’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 3’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 16 - 30.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 16 - 30.
  • the 5’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 16 - 30.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 16. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 16.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 16.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 17. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 17.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 17.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 18. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 18.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 18.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 19. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 19.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 19.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 20. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 20.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 20.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 21. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 21.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 21.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 22. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 22.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 22.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 23. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 23.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 23.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 24. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 24.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 24.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 25. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 25.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 25.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 26.
  • the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 26.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 26.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 27. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 27.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 27.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 28. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 28.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 28.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 29. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 29.
  • the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 29.
  • the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 30. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 30. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 30.
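  • The specific pairings recited above follow a regular pattern: the 5’ homology arm of SEQ ID NO: n (n from 1 to 15) is paired with the 3’ homology arm of SEQ ID NO: n + 15. The short sketch below merely enumerates that pattern; the function name is hypothetical and the code adds nothing beyond the pairings already listed.

```python
def paired_three_prime_id(five_prime_id):
    """Return the 3' homology arm SEQ ID NO paired with a 5' arm SEQ ID NO."""
    if not 1 <= five_prime_id <= 15:
        raise ValueError("5' homology arm SEQ ID NOs run from 1 to 15")
    return five_prime_id + 15

# All fifteen (5' arm, 3' arm) SEQ ID NO pairs: (1, 16), (2, 17), ..., (15, 30).
ARM_PAIRS = [(n, paired_three_prime_id(n)) for n in range(1, 16)]
```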
  • the components of the gene-editing system can be intracellularly delivered to a population of cells by any means known in the art.
  • the Cas component of a CRISPR/Cas gene editing system is provided as a protein.
  • the Cas protein may be complexed with a crRNA:tracrRNA duplex in vitro to form a CRISPR/Cas RNP (crRNP) complex.
  • the crRNP complex is introduced to a cell by transfection.
  • the Cas protein may be introduced to a cell before or after a gRNA is introduced to the cell.
  • the Cas protein is introduced to a cell by transfection before or after a gRNA is introduced to the cell.
  • a nucleic acid encoding a Cas protein is provided.
  • the nucleic acid encoding the Cas protein is a DNA nucleic acid and is introduced to the cell by transduction.
  • the Cas9 and gRNA components of a CRISPR/Cas gene editing system are encoded by a single polynucleotide molecule.
  • the polynucleotide encoding the Cas protein and gRNA component are comprised in a viral vector and introduced to the cell by viral transduction.
  • the Cas9 and gRNA components of a CRISPR/Cas gene editing system are encoded by different polynucleotide molecules.
  • the polynucleotide encoding the Cas protein is comprised in a first viral vector and the polynucleotide encoding the gRNA is comprised in a second viral vector.
  • the first viral vector is introduced to a cell prior to the second viral vector.
  • the second viral vector is introduced to a cell prior to the first viral vector.
  • integration of the vectors results in sustained expression of the Cas9 and gRNA components.
  • sustained expression of Cas9 may lead to increased off-target mutations and cutting in some cell types. Therefore, in some embodiments, an mRNA nucleic acid sequence encoding the Cas protein may be introduced to the population of cells by transfection. In such embodiments, the expression of Cas9 will decrease over time, and may reduce the number of off target mutations or cutting sites.
  • each of the Cas9, tracrRNA, crRNA, and repair template components is introduced to a cell by transfection alone or in combination (e.g., transfection of a crRNP). Transfection may be performed by any means known in the art, including but not limited to lipofection, electroporation (e.g., Neon® transfection system or an Amaxa Nucleofector®), sonication, or nucleofection.
  • the gRNA components can be transfected into a population of cells with a plasmid encoding the Cas9 nuclease. In such embodiments, the expression of Cas9 will decrease over time, and may reduce the number of off target mutations or cutting sites.
  • the repair templates described herein comprise a polynucleotide sequence encoding a “detectable tag,” “tag,” or “label.”
  • the terms “detectable tag,” “tag,” and “label” are used interchangeably herein and refer to a protein that is capable of being detected and is linked or fused to a heterologous protein (e.g., an endogenous protein).
  • the detectable tag serves to identify the presence of the heterologous protein. Insertion of a polynucleotide sequence encoding a detectable tag into an endogenous target locus results in the expression of a tagged version of the endogenous protein.
  • Examples of detectable tags include, but are not limited to, FLAG tags, poly-histidine tags (e.g., 6xHis), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent molecules, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins.
  • the detectable tag is a fluorescent protein such as green fluorescent protein (GFP), blue fluorescent protein, cyan fluorescent protein, yellow fluorescent protein, or red fluorescent protein.
  • the detectable tag is GFP.
  • Additional examples of detectable tags suitable for use in the present methods and compositions include mCherry, tdTomato, mNeonGreen, eGFP, Emerald, mEGFP (A208K mutation), mKate, and mTagRFPt.
  • the fluorescent protein is selected from the group consisting of blue/UV proteins (such as TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, and T-Sapphire); cyan proteins (such as ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, and mTFP1); green proteins (such as: EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, and mNeonGreen); yellow proteins (such as EYFP, Citrine, Venus, SYFP2, and TagYFP); orange proteins (such as Monomeric Kusabira-Orange, mKOκ, mKO2, mOrange, and mOrange2); red proteins (such as mRaspberry, mCherry, mStraw
  • the detectable tag can be selected from AmCyan, AsRed, DsRed2, DsRed Express, E2-Crimson, HcRed, ZsGreen, ZsYellow, mCherry, mStrawberry, mOrange, mBanana, mPlum, mRaspberry, tdTomato, DsRed Monomer, and/or AcGFP, all of which are available from Clontech.
  • the polynucleotide sequence encoding the detectable tag is at least about 20 base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is at least 100 base pairs long.
  • the polynucleotide sequence encoding the detectable tag may be about 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000, 4000, 5000 or more base pairs long.
  • the polynucleotide sequence encoding the detectable tag is at least about 300 base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is at least about 500 base pairs long. In further embodiments, the polynucleotide sequence encoding the detectable tag is about 700 to about 750 base pairs long.
  • the polynucleotide sequence encoding the detectable tag may be about 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 740, or about 750 base pairs long.
  • the polynucleotide sequence encoding the detectable tag is between 710 and 730 base pairs long.
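  • The length ranges above can be sanity-checked with simple codon arithmetic: a coding sequence for n amino acids is 3n base pairs, plus 3 if a stop codon is included. The sketch below is illustrative only. The function names are hypothetical, and the 238-residue figure for wild-type Aequorea victoria GFP is background knowledge rather than a value stated in this text.

```python
def coding_length_bp(n_residues: int, include_stop: bool = False) -> int:
    """Length in base pairs of a sequence encoding n_residues amino acids."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    # Three nucleotides per codon; optionally add one stop codon.
    return 3 * n_residues + (3 if include_stop else 0)


def within_range(length_bp: int, low: int = 710, high: int = 730) -> bool:
    """True if the length falls in the inclusive range (default 710-730 bp)."""
    return low <= length_bp <= high


# Assumption: wild-type GFP is 238 residues (not stated in the text above).
GFP_RESIDUES = 238
gfp_bp = coding_length_bp(GFP_RESIDUES)
```

  Under that assumption, `gfp_bp` is 714 and `within_range(gfp_bp)` is true, which is consistent with the 710 to 730 base pair embodiment.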
  • the polynucleotide sequence can encode a full-length detectable tag or a portion or fragment thereof.
  • the polynucleotide sequence encodes a full-length detectable tag. In some embodiments, insertion of the detectable tag into the target locus does not significantly alter the expression or function of either the endogenous protein or the encoded detectable tag.
  • the insertion of the detectable tag sequence into an endogenous gene results in the production of a tagged endogenous protein.
  • the tag is directly fused to the endogenous protein.
  • the term “directly fused” refers to two or more amino acid sequences connected to each other (e.g., by peptide bonds) without intervening or extraneous sequences (e.g., two or more amino acid sequences that are not connected by a linker sequence).
  • the polynucleotide sequence encoding the detectable tag further comprises a linker sequence such that the detectable tag is attached (or linked) to the endogenous protein by a linker sequence.
  • the attachment may be by covalent or non-covalent linkage.
  • the attachment is covalent.
  • the linker sequence is a flexible linker sequence.
  • the tag is directly fused, or attached by a linker, to the C-terminal or N-terminal end of an endogenous protein.
  • the linker sequence is selected from the group consisting of sequences shown in Tables 3 and 4.
  • the donor polynucleotide further comprises a polynucleotide sequence encoding a selectable marker that allows for the selection of cells comprising the donor polynucleotide.
  • selectable markers are known in the art and include antibiotic resistance genes.
  • the antibiotic resistance gene confers resistance to gentamycin, thymidine kinase, ampicillin, and/or kanamycin.
  • the donor polynucleotide is a plasmid, referred to herein as a “donor plasmid.”
  • the donor plasmid comprises a repair template comprising (i) a 5’ homology arm sequence; (ii) a nucleic acid sequence encoding a detectable tag; and (iii) a 3’ homology arm sequence.
  • the repair template comprised within the donor plasmid further comprises a linker sequence located at the 5’ end or the 3’ end of the nucleic acid sequence encoding the detectable tag.
  • the repair template comprised within the donor plasmid further comprises an antibiotic resistance cassette located between the 5’ and 3’ homology arm sequences.
  • the antibiotic resistance cassette may be located 3’ to the 5’ homology arm sequence and 5’ to the nucleic acid sequence encoding the detectable tag.
  • the antibiotic resistance cassette may be located 5’ to the 3’ homology arm sequence and 3’ to the nucleic acid sequence encoding the detectable tag.
  • the donor plasmid does not comprise a promoter.
  • the donor plasmid functions as a vehicle to deliver the tag sequence intracellularly to a cell and does not mediate transcription and/or translation of the tag sequence or any polynucleotide sequence comprised therein.
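  • The repair template layout described above (5’ homology arm, optional linker, tag sequence, 3’ homology arm) can be sketched as a small data structure. This is a minimal illustration using toy sequences. The class name, field names, and the boolean flag for linker placement are hypothetical, and the optional antibiotic resistance cassette is deliberately left out because its position relative to the linker is not specified here.

```python
from dataclasses import dataclass

DNA_ALPHABET = set("ACGT")


@dataclass(frozen=True)
class RepairTemplate:
    """Hypothetical container for the elements of a repair template."""
    five_prime_arm: str
    tag: str
    three_prime_arm: str
    linker: str = ""
    # True: linker sits at the 5' end of the tag; False: at the 3' end.
    linker_at_five_prime_of_tag: bool = True

    def __post_init__(self) -> None:
        # Reject anything that is not an upper-case A/C/G/T string.
        for name in ("five_prime_arm", "tag", "three_prime_arm", "linker"):
            if not set(getattr(self, name)) <= DNA_ALPHABET:
                raise ValueError(f"{name} contains non-ACGT characters")

    def insert(self) -> str:
        """Sequence placed between the homology arms (linker plus tag)."""
        if self.linker_at_five_prime_of_tag:
            return self.linker + self.tag
        return self.tag + self.linker

    def sequence(self) -> str:
        """Full template: 5' arm, insert, 3' arm, read 5' to 3'."""
        return self.five_prime_arm + self.insert() + self.three_prime_arm
```

  With toy inputs, `RepairTemplate("AAAA", "GGGCCC", "TTTT", linker="ACGT").sequence()` places the linker immediately upstream of the tag, which is the arrangement one would expect for a tag fused to the C-terminal end of the endogenous protein.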
  • the present invention provides for methods of inserting one or more detectable tags into one or more endogenous target loci.
  • the target locus is located within an endogenous gene encoding a structural protein or a non-structural protein. Exemplary target genes are shown below in Tables 1 and 2.
  • the structural protein is selected from paxillin (PXN), tubulin-alpha 1b (TUBA1B), lamin B1 (LMNB1), actinin alpha 1 (ACTN1), translocase of outer mitochondrial membrane 20 (TOMM20), desmoplakin (DSP), Sec61 translocon beta subunit (SEC61B), fibrillarin (FBL), actin beta (ACTB), myosin heavy chain 10 (MYH10), vimentin (VIM), tight junction protein 1 (TJP1, also known as ZO-1), safe harbor locus, CAGGS promoter (AAVS1), microtubule-associated protein 1 light chain 3 beta (MAP1LC3B, also known as LC3), ST6 beta-galactoside alpha-2,6-sialyltransferase 1 (ST6GAL1), lysosomal associated membrane protein 1 (LAMP1), centrin 2 (CETN2), solute carrier family 25 member 17 (SLC25A17), R
  • the one or more detectable tags are inserted into an endogenous target locus in a gene encoding a structural protein or a non-structural protein, wherein the expression of the gene and/or the encoded protein is associated with a particular cell type or tissue type.
  • the expression of the gene and/or the encoded protein is associated with cardiomyocytes, hepatocytes, renal cells, epithelial cells, endothelial cells, neurons, mucosal cells of the gut, lung, or nasal passages.
  • the expression of the gene and/or the encoded protein is associated with cardiac tissue including, but not limited to, troponin I1, slow skeletal type (TNNI1), actinin alpha 2 (ACTN2), troponin I3, cardiac type (TNNI3), myosin light chain 2 (MYL2), myosin light chain 7 (MYL7), titin (TTN), SMAD family member 2 (SMAD2), SMAD family member 5 (SMAD5), NK2 homeobox 5 (NKX2-5), Mesoderm posterior bHLH transcription factor 1 (MESP1), Mix paired-like homeobox (MIXL1), and ISL LIM homeobox 1 (ISL1).
  • the expression of the gene and/or the encoded protein is associated with liver tissue including, but not limited to Cytochrome P450E1 (CYP2E1), Transferrin (TF), hemopexin (HPX), and albumin (ALB).
  • the expression of the gene and/or the encoded protein is associated with kidney tissue including, but not limited to Polycystic kidney disease 1 (PKD1) and Polycystic kidney disease 2 (PKD2).
  • the expression of the gene and/or the encoded protein is associated with epithelial tissue including, but not limited to, keratin 5 (KRT5) and laminin subunit gamma 2 (LAMC2). Exemplary genes associated with specific tissue and cell types are shown below in Table 2.
  • a plurality of detectable labels is inserted into a plurality of target loci. For example, one detectable label is inserted at one endogenous locus and a different detectable label is inserted at a different endogenous locus.
  • each of the individual detectable labels is selected such that the detection of one does not interfere, or minimally interferes with, the detection of another.
  • a unique crRNA is generated for each target locus.
  • a CRISPR ribonucleoprotein (crRNP), comprising a Cas protein complexed with a crRNA:tracrRNA duplex, is produced for each target locus.
  • the plurality of nucleic acid sequences encoding the plurality of detectable labels are comprised in a single donor plasmid and are flanked on the 5’ and 3’ ends by homology arms corresponding to genomic sequences within the target locus. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more detectable labels and their corresponding homology arms may be comprised within one donor polynucleotide.
  • the plurality of nucleic acid sequences encoding the plurality of detectable labels and their corresponding homology arms are comprised within at least two different donor plasmids. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more donor plasmids may be used in the present methods. In some embodiments, a plurality of donor plasmids (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) each comprising one sequence encoding a detectable label and the corresponding homology arms may be used in the present methods.
  • a plurality of donor plasmids (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) each comprising a plurality of sequences encoding two or more detectable labels (e.g, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) and the corresponding homology arms may be used in the present methods.
  • the plurality of donor plasmids are introduced to a stem cell at the same time.
  • the plurality of donor plasmids are introduced to a stem cell sequentially.
  • the present disclosure provides edited stem cell clones that stably express one or more tagged endogenous proteins.
  • the stably tagged stem cell clones of the current invention are characterized by (i) mono- or biallelic insertion of a nucleic acid sequence encoding a detectable tag (e.g., GFP) into one or more endogenous proteins (e.g., structural, non-structural, or non-expressed proteins of the stem cell); (ii) pluripotency (e.g., the ability to differentiate into all three germ layers); and (iii) the lack of additional mutations or alterations in the endogenous stem cell genome.
  • Such edited stem cell clones are herein referred to as“stably tagged stem cell clones.”
  • the stably tagged stem cell clones described herein phenotypically differ from non-engineered stem cell clones only by the expression of one or more endogenous proteins that have been tagged with a detectable tag and the incorporation of one or more antibiotic resistance cassettes into the one or more tagged endogenous loci.
  • the stably tagged stem cell clones of the current invention are characterized by (i) mono- or biallelic insertion of a nucleic acid sequence encoding a detectable tag (e.g., GFP) into one or more endogenous proteins (e.g., structural, non-structural, or non-expressed proteins of the stem cell); (ii) pluripotency (e.g., the ability to differentiate into all three germ layers); and (iii) the presence of one or more additional mutations or alterations in the endogenous stem cell genome.
  • Such edited stem cell clones are herein referred to as“stably tagged mutant stem cell clones.”
  • the stably tagged mutant stem cell clones comprise one or more additional mutations or alterations in the endogenous stem cell genome that
  • the stably tagged mutant stem cell clones described herein phenotypically differ from non-engineered stem cell clones by the expression of one or more endogenous proteins that have been tagged with a detectable tag, the incorporation of one or more antibiotic resistance cassettes into the one or more tagged endogenous loci, and the presence of one or more additional mutations not found in the non-engineered stem cell clones.
  • the stably tagged mutant stem cell clones described herein phenotypically differ from the corresponding stably tagged stem cell clones only by the presence of one or more additional mutations.
  • compositions comprising stably tagged stem cell clones made by the methods described herein.
  • the compositions comprise a stably tagged stem cell clone wherein one endogenous protein is tagged.
  • a composition may comprise a stably tagged stem cell clone expressing a tagged endogenous protein wherein the endogenous protein is one selected from Tables 1 and/or 2 (e.g., one of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34),
  • compositions described herein comprise a stably tagged stem cell clone wherein at least two endogenous proteins are tagged.
  • a composition may comprise a stably tagged stem cell clone wherein one endogenous locus is tagged with a detectable tag and wherein another endogenous locus is tagged with a different detectable tag.
  • either of the endogenous loci may be selected from Tables 1 and/or 2.
  • the endogenous proteins may be two or more of those listed in Tables 1 and 2 (e.g., two or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD2, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2).
  • one detectable tag may be inserted into a target locus in TUBA1B and a different detectable tag may be inserted into a target locus in LMNB1.
  • one detectable tag may be inserted into a target locus in SEC61B and a different detectable tag may be inserted into a target locus in LMNB1.
  • one detectable tag may be inserted into a target locus in TOMM20 and a different detectable tag may be inserted into a target locus in TUBA1B.
  • one detectable tag may be inserted into a target locus in SEC61B and a different detectable tag may be inserted into a target locus in TUBA1B.
  • one detectable tag may be inserted into a target locus in TUBA1B and a different detectable tag may be inserted into a target locus in CETN2.
  • one detectable tag may be inserted into a target locus in SEC61B and a different detectable tag may be inserted into a target locus in LMNB1.
  • one detectable tag may be inserted into a target locus in AAVS1 and a different detectable tag may be inserted into a target locus in CAGGS:HIST1H2BJ:2A:CAAX.
  • one detectable tag may be inserted into a target locus in TOMM20 and a different detectable tag may be inserted into a target locus in TUBA1B.
  • compositions described herein comprise a stably tagged stem cell clone wherein at least three endogenous proteins are tagged.
  • a composition may comprise a stably tagged stem cell clone wherein a first endogenous locus is tagged with a first detectable tag, a second endogenous locus is tagged with a second detectable tag, and a third endogenous locus is tagged with a third detectable tag.
  • any of the endogenous loci may be selected from Tables 1 and/or 2.
  • the endogenous proteins may be three or more of those listed in Tables 1 and 2 (e.g., three or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog,
  • the compositions described herein comprise a stably tagged stem cell clone wherein at least four or five or more endogenous proteins are tagged.
  • the endogenous proteins may be four, five, or more of those listed in Tables 1 and 2 (e.g., four, five, or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD2, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2).
  • compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a tagged endogenous protein. In some embodiments, each stably tagged stem cell clone expresses a different tagged endogenous protein. In some embodiments, the compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a different tagged endogenous protein.
  • compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins.
  • compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins.
  • each stably tagged stem cell clone expresses a group of tagged endogenous proteins that is different from the tagged endogenous proteins expressed by another stem cell clone in the same composition.
  • Exemplary endogenous proteins that can be tagged in these embodiments are shown in Tables 1 and 2, including but not limited to PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNN13
  • Exemplary stably tagged stem cell clones that can be produced by the methods and techniques are shown below in Tables 3 and 4.
  • the association of any tag in the table with any structural protein in the table is for illustrative purposes only.
  • any tag (or fluorescent protein) in the Table can be associated with any structural gene in the table.
  • the present invention provides methods for selecting a stem cell that has been modified by the methods described herein to express a tagged endogenous protein.
  • the insertion of the tag sequence into the endogenous target locus does not result in additional genetic mutations or alterations in the endogenous target locus, or any other heterologous locus in the endogenous genome.
  • the insertion of the tag sequence into the endogenous target locus does not modify or alter the expression, function, or localization of the endogenous protein.
  • methods are provided herein for selecting stem cells modified by the methods described herein, wherein the identified stem cells comprise one or more of precise insertion of the nucleic acid sequence encoding a tag; pluripotency; maintained cell viability and function as compared to a non-modified stem cell; maintained levels of expression of the tagged endogenous protein as compared to a non-modified stem cell; maintained protein localization of the tagged endogenous protein as compared to a non- modified stem cell; maintained protein function of the tagged endogenous protein as compared to a non-modified stem cell; maintained expression of stem cell markers as compared to a non- modified stem cell; and/or maintained differentiation potential.
  • the properties of a selected stem cell are validated by one or more of several downstream assays.
  • a population of edited stem cells is sorted based on the cells' relative expression of the detectable tag.
  • cells are sorted by fluorescence activated cell sorting (FACS).
  • Cells that are positive for the inserted tag are selected for further analysis.
  • the selected cells are expanded in a single colony expansion assay to produce individual clones of edited stem cells.
  • edited clones are further analyzed by digital droplet PCR (ddPCR).
  • the clones are further analyzed to determine the copy number of the inserted tag sequence.
  • identified clones have monoallelic or biallelic insertion of the tag sequence.
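  • One way to turn a ddPCR copy-number readout into a monoallelic or biallelic call is to normalize the tag signal to a reference amplicon. The sketch below is an assumption-laden illustration, not the assay described here: it assumes a diploid genome, a reference amplicon present at two copies per genome, and an arbitrary tolerance of 0.2 copies. The function name and labels are hypothetical.

```python
def classify_insertion(tag_copies: float,
                       reference_copies: float,
                       tolerance: float = 0.2) -> str:
    """Classify a clone from tag and reference copy counts.

    Assumes the reference amplicon is present at two copies per
    diploid genome, so tag copies per genome = 2 * tag / reference.
    """
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    if tag_copies < 0:
        raise ValueError("tag_copies must be non-negative")

    tag_per_genome = 2.0 * tag_copies / reference_copies

    if tag_per_genome <= tolerance:
        return "no insertion"
    if abs(tag_per_genome - 1.0) <= tolerance:
        return "monoallelic"
    if abs(tag_per_genome - 2.0) <= tolerance:
        return "biallelic"
    # Values between or above the expected levels may indicate a mixed
    # population or extra integrations and need follow-up.
    return "indeterminate"
```

  For example, a clone with half as many tag copies as reference copies would be called monoallelic, while a ratio well above one would be flagged as indeterminate for further analysis.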
  • the modified cells are assessed for the functional expression of the one or more detectable tags.
  • live cell imaging may be used to observe localization, expression intensity, and persistence of expression of the tagged endogenous protein in the modified stem cells described herein.
  • the expression of one or more detectable tags does not substantially or does not significantly alter the endogenous expression, localization, or function of the tagged protein.
  • the precise insertion of the tag sequence is analyzed by sequencing the edited target locus or a portion thereof.
  • the junctions between the endogenous genomic sequence and the 5’ and 3’ ends of the tag sequence are amplified.
  • the amplification products derived from the population of edited cells are sequenced and compared with sequences of the corresponding target locus derived from a population of non-edited cells.
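  • The junction comparison above can be illustrated with a simple string check: a precise insertion should contain the exact sequence spanning each homology arm and tag boundary, and a non-edited locus should not. This is a toy sketch on short strings. The function name and the `flank` parameter are hypothetical, and a real analysis would align sequencing reads rather than test for exact substrings.

```python
def junctions_intact(amplicon: str,
                     five_prime_arm: str,
                     tag: str,
                     three_prime_arm: str,
                     flank: int = 10) -> bool:
    """True if both arm/tag junctions appear exactly in the amplicon.

    Each junction is the last `flank` bases upstream of the boundary
    joined to the first `flank` bases downstream of it.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    left_junction = five_prime_arm[-flank:] + tag[:flank]
    right_junction = tag[-flank:] + three_prime_arm[:flank]
    return left_junction in amplicon and right_junction in amplicon


# Toy sequences, for illustration only.
ARM5 = "ACGTACGTAC"
TAG = "GGGCCCGGGCCCAAATTT"
ARM3 = "TTGACCTTGA"
edited = ARM5 + TAG + ARM3   # precise insertion
unedited = ARM5 + ARM3       # locus without the tag
```

  Here `junctions_intact(edited, ARM5, TAG, ARM3)` is true and the same call on `unedited` is false. A single-base deletion at either boundary also makes the check fail.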
  • potential off-target sites for the crRNA sequences are determined using algorithms known in the art (e.g., Cas-OFFinder). To determine the presence of off-target cutting or insertions, these predicted off-target sites and the surrounding genomic sequences can be amplified and sequenced to determine the presence of any mutations or inserted tag sequences. Sequencing can be performed by a number of methods known in the art, e.g., Sanger sequencing and next-generation, high-throughput sequencing.
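  • The idea behind off-target prediction can be shown with a naive mismatch scan: slide the guide sequence along a reference and report windows that differ at no more than a set number of positions. This sketch does not reproduce any published tool. It ignores PAM requirements, the reverse strand, and bulges, and the function name and default threshold are hypothetical.

```python
from typing import List, Tuple


def find_candidate_sites(reference: str,
                         guide: str,
                         max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (start_index, mismatch_count) for each window of the
    reference that differs from the guide at <= max_mismatches positions.
    """
    if not guide:
        raise ValueError("guide must be non-empty")
    hits = []
    for start in range(len(reference) - len(guide) + 1):
        window = reference[start:start + len(guide)]
        # Hamming distance between the window and the guide.
        mismatches = sum(a != b for a, b in zip(window, guide))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

  Sites reported with zero mismatches correspond to the intended target, while the remaining hits are candidates whose surrounding sequence could be amplified and sequenced as described above.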
  • the edited populations of cells can be assessed for the expression of transcription factors, cell surface markers, and other proteins or genes associated with stem cells (e.g., Oct 3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3).
  • Protein expression can be determined by a number of means known in the art including flow cytometry, ELISA, Western blots, immunohistochemistry, or co-immunoprecipitation.
  • Gene expression can be determined by qPCR, microarray, and/or sequencing techniques (e.g., NGS, RNA-Seq, or ChIP-Seq).
  • the edited populations of cells can be assessed for the presence of the CRISPR/Cas9 ribonucleoprotein (RNP) complex and/or the donor polynucleotide.
  • edited stem cells that are determined to be pluripotent according to the methods outlined above may be cryopreserved for later differentiation or use.
  • the invention provides for methods of live-cell imaging in three dimensions using the stably tagged stem cell clones and methods described herein.
  • the present invention provides methods of assaying the differentiation potential of the edited stem cells and stably tagged clones thereof described herein. Such assays typically involve culturing edited stem cells or stably tagged clones thereof in media comprising one or more factors required for differentiation. Factors required for differentiation are referred to herein as “differentiation agents” and will vary according to the desired differentiated cell type.
  • the ability of the edited stem cells or stably tagged clones thereof described herein to differentiate into specialized cells is substantially similar to the ability of un-modified stem cells to differentiate into specialized cells.
  • the edited stem cells and/or stably tagged clones thereof described herein are able to differentiate into substantially the same number of different types of specialized cells, differentiate at substantially the same rate (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more days to differentiate), and produce differentiated cells that are as viable and as functional as those produced by un-modified stem cells.
  • the methods of assaying the differentiation potential of the edited stem cells and stably tagged clones thereof described herein include the addition of one or more test agents to a culture of edited stem cells or stably tagged clones thereof prior to, during, or after the addition of one or more differentiation agents.
  • the edited stem cells or stably tagged clones thereof can then be visualized for changes in cellular morphology associated with the individual structural proteins tagged within each edited stem cell or stably tagged clone thereof.
  • these methods may be used to identify agents that promote differentiation into one or more cell lineages and therefore may be useful as differentiation agents.
  • these methods may be used to identify agents that disrupt or inhibit differentiation.
  • the stably tagged stem cells may be differentiated into any cell type, including but not limited to hematopoietic cells, neurons, astrocytes, dendritic cells, hepatocytes, cardiomyocytes, kidney cells, smooth muscle cells, skeletal muscle cells, epithelial cells, or endothelial cells.
  • the methods described herein can be used to produce edited stem cells in which one or more endogenous genes are tagged, wherein differential expression or altered localization of the tagged gene products provides information regarding a potential disease state or condition.
  • the following genes can be tagged and monitored following differentiation. Mislocalization or misfolding of the protein products of these genes often indicates a disease condition or the potential for a disease condition. Shiny App values for each gene are also provided (internal database simulations and experiments), indicating their low level of expression in wild type human induced PSCs. Cells produced using the methods described herein with such genes tagged can provide a mechanism for examining correction of such errors via pharmacological or other intervention. Many such targets for the editing methods described herein are G-protein-coupled receptors (GPCRs).
  • Exemplary genes include:
  • GPCRs: many targets that are mislocalized in disease states; a very large druggable target class. Below are a few examples. GPCRs also tolerate tagging.
  • Rhodopsin: perturbed localization in retinitis pigmentosa.
  • the present invention provides methods for drug screening to identify candidate therapeutic agents, and methods of screening agents to determine the effects of agents on the stably-tagged stem cell clones described herein and cells derived therefrom produced by the methods of the present invention.
  • the methods may be employed to identify an agent having a desired effect on the cells.
  • the stably-tagged stem cells of the present invention enable changes across multiple cell types to be assayed with the built-in control of the cell types all being derived from the same progenitor clone.
  • methods are provided for determining the effect of agents including small molecules, proteins, nucleic acids, lipids or even physical or mechanical stress (e.g., UV light, temperature shifts, or mechanical shear) by culturing a population of the stably-tagged stem cell clones described herein and cells derived therefrom in the presence and absence of the test agent(s).
  • agents that disrupt, alter, or modulate various key cellular structures and processes including but not limited to cell division, microtubule organization, actin dynamics, vesicle trafficking, cell signaling, DNA replication, calcium regulation, ion channel regulators, and/or statins are assayed by the present methods.
  • the agent exerts a biological effect on the cells, such as increased cell growth or differentiation, increased or reduced expression of one or more genes, or increased or reduced cell death or apoptosis, etc.
  • the stably-tagged stem cell clones used to screen for agents having a particular effect comprise a tagged protein associated with the cellular structure, process or biological activity being examined, such as any of the combinations of genes and structures shown in Tables 3 and 4. Exemplary agents are shown in FIG. 26A.
  • the method provides assaying the cells after the exposure period by any known method, including confocal microscopy in order to determine changes in the content, orientation or cellular composition of the tagged structural protein contained within the given cell population.
  • a comparison can be made between the treated cells and untreated controls.
  • a positive control may also be utilized in such methods.
  • one or more positive control agents with known effects on targeted structures may be applied to differentiated cell cultures derived from stably tagged stem cell clones and imaged, for example by confocal microscopy.
  • the data obtained from these positive control experiments may be used as a training set for data that would allow for the automated assaying of different cellular structures in different cell types based on machine learning.
  • the data obtained from these experiments are used to generate a signature for a test agent.
  • the method of generating a signature for a test agent comprises (a) admixing the test agent with one or more stably tagged stem cell clones; (b) detecting a response in the one or more stem cell clones; (c) detecting a response in a control stem cell; (d) detecting a difference in the response in the one or more stem cell clones from the control stem cell; and (e) generating a data set of the difference in the response.
  • the detected response in the stem cell clones and/or control cells is one or more of cell proliferation, microtubule organization, actin dynamics, vesicle trafficking, cell-surface protein expression, DNA replication, cytokine or chemokine production, changes in gene expression, and/or cell migration.
  • the control cell is a stably tagged stem cell clone that has not been exposed to the test agent or a control agent (e.g., a vehicle control).
  • the control cell is a stably tagged stem cell clone that has been exposed to a control agent (e.g., a vehicle control).
  • these methods are used to determine the toxicity of a test agent and/or to determine the optimal dose of a test agent required to induce or inhibit a particular cell function or cell response.
  • the difference in the response in the one or more stem cell clones from the control stem cell is quantified and used to generate a data set of the difference in the response. This data set can then be used as a training set for an algorithm to predict the effect of a related agent on a particular cellular function.
  • stably tagged stem cell clones derived from diseased patients or stably tagged mutant stem cell clones can be differentiated into one or more differentiated cell types and assayed by the methods described herein to generate a cell-type specific data set related to a particular disease.
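  • Steps (b) through (e) of the signature method can be sketched as a per-feature difference between treated and control measurements. This is one possible reading under stated assumptions: each response is summarized by its mean over replicates, the difference is a simple subtraction, and the feature names in the usage note are placeholders. None of these choices is prescribed here.

```python
from statistics import mean
from typing import Dict, Mapping, Sequence


def response_signature(treated: Mapping[str, Sequence[float]],
                       control: Mapping[str, Sequence[float]]
                       ) -> Dict[str, float]:
    """Return mean(treated) - mean(control) for every response feature
    measured in both conditions.

    Features present in only one condition are skipped, because no
    difference can be computed for them.
    """
    shared = sorted(set(treated) & set(control))
    return {
        feature: mean(treated[feature]) - mean(control[feature])
        for feature in shared
    }
```

  For example, calling the function with replicate measurements for hypothetical features such as "proliferation" and "migration" returns one signed difference per feature. Collecting these dictionaries across agents gives the kind of data set that could later serve as a training set.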
  • the cell proliferation, microtubule organization, actin dynamics, vesicle trafficking, cell-surface protein expression, DNA replication, cytokine or chemokine production, changes in gene expression, and/or cell migration of the differentiated cells can be determined at one or more time points during differentiation and maturation.
  • Data sets derived from such assays can then be used as a training set for one or more disease-specific algorithms that can be applied to a cell sample derived from a patient to determine whether the patient has a disease, the stage of disease, and/or used to monitor the effects of a particular disease treatment.
  • the disease is selected from a disease characterized by aberrant cell growth, wound healing, inflammation, and/or neurodegeneration.
  • methods are provided for live-cell imaging to observe intracellular protein localization, expression intensity, and persistence of expression in the modified stem cells or stably transfected stem cell clones described herein.
  • the expression of one or more detectable tags does not substantially or does not significantly alter the endogenous expression or localization of the tagged protein.
  • the invention provides for methods of live-cell imaging in three dimensions using the stably tagged stem cell clones and the cell culturing and plating and microscopy methods described herein.
  • kits comprising the stably tagged stem cell clones described herein.
  • the kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a tagged endogenous protein.
  • each stably tagged stem cell clone expresses a different tagged endogenous protein.
  • the kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a different tagged endogenous protein.
  • kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins.
  • kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins.
  • each stably tagged stem cell clone expresses a group of tagged endogenous proteins that are different from the tagged endogenous proteins expressed by another stem cell clone in the same composition.
  • Exemplary endogenous proteins that can be tagged in these embodiments are shown in Tables 1 and 2, including but not limited to PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ,
  • kits also allow for building an entire “cell clinic” or reference set that comprises cell types from every major organ system, or those of interest, that allows for the interrogation of likely function of new genes and assaying of cellular toxicity.
  • kits for assessing differentiation agents and/or the effect of compounds or drugs on the differentiation of stem cells comprising one or more stably tagged stem cell clones expressing one or more tagged endogenous proteins. In some embodiments, the present disclosure provides a kit comprising a plurality of stably tagged stem cell clones expressing one or more tagged endogenous proteins. In some embodiments, the cells are provided as an array such that all cellular structures are tagged among a plurality of stably tagged stem cell clones.
  • kits described herein further comprise one or more agents known to elicit stem cell differentiation into one or more cell types.
  • for agents known to elicit stem cell differentiation into one or more cell types, one of skill in the art would understand the appropriate media and agents for differentiation into various cell types.
  • a kit may include stably tagged stem cells and media containing Activin A for cardiomyocyte differentiation.
  • a kit may include stably tagged stem cells and media containing factors described in Methods Mol Biol. 2014; 1210: 131-41 or Biomed Rep. 2017 Apr; 6(4): 367-373 for hepatocyte differentiation.
  • a kit may include stably tagged stem cells and media containing factors described in Methods Mol Biol. 2017;1597: 195-206 or Nat Commun.
  • a kit may include stably tagged stem cells and media containing factors described in Mol Psychiatry. 2017 Apr 18. doi: 10.1038/mp.2017.56 or Scientific Reports volume 7, Article number: 42367 (2017) for neuronal cell differentiation. Additional exemplary factors for producing differentiated cell types from human iPSCs are shown in FIG. 32.
  • the stably tagged stem cells according to this embodiment may be provided in expanded form, for example, on a multi-well plate and ready for assay. Alternatively, the cells may be provided in a form that requires further expansion before plating and assaying.
  • kits comprising one or more differentiated cell types derived from one or more stably tagged stem cell clones.
  • “derived from,” for example, one or more stably tagged stem cell clones refers to cells that are differentiated from the stably tagged stem cell clones.
  • cells that are derived from stably tagged stem cell clones are terminally differentiated cells that are direct progeny of the stably tagged stem cell clones. Therefore, the differentiated cell types, like their stably tagged stem cell clone progenitors, also express tagged (e.g.
  • kits provided herein comprise one or more differentiated cell types.
  • kits provided herein contain differentiated cell types from all three germ layers.
  • kits are provided containing differentiated cells of substantially all major cell types of the body derived from stably tagged stem cell clones.
  • the kits are provided on multi-well plates in assay ready format.
  • the cells are provided in a form that requires thawing, culturing and/or expanding the cells.
  • the differentiated cells derived from stably tagged stem cells are provided in an array such that for each cell type member in the array, a tagged protein member is provided such that every structure being studied is tagged in each cell type being assayed.
  • a cell comprising at least one tagged endogenous, differentially-expressed protein.
  • the methods described herein can be used to produce various cell types, including for example, normal cells, cancer cells, tissue-specific cells, etc.
  • the cells that are produced are stem cells, as described herein.
  • the methods are useful for producing at least one tagged endogenous, differentially-expressed protein.
  • an “endogenous, differentially-expressed protein” refers to a protein that is a wild-type protein, or a protein that comprises one or more naturally-occurring mutations and/or one or more introduced mutations, that is substantially expressed in one cellular state, but is non-substantially expressed in a different cellular state.
  • An endogenous, differentially-expressed protein is non-substantially expressed in a first cellular state when that endogenous, differentially-expressed protein is produced at a level that is less than about 10% of the level of production in a second cellular state.
  • steps should be taken to provide that the endogenous, differentially-expressed protein is not expressed at all when the gene editing described herein is taking place, so as to allow the methods to modify the target gene(s) as required.
  • stem cells have the capacity to differentiate into at least one differentiated cell lineage, and in embodiments, the ability to differentiate into all three germ layers.
  • in a first cellular state (i.e., as an undifferentiated stem cell), the level of an endogenous, differentially-expressed protein would be less than about 10%, suitably less than about 5%, less than about 1%, less than about 0.5%, less than about 0.1%, less than about 0.01%, and suitably, about 0%, of the level of production of the same endogenous, differentially-expressed protein in the second cellular state (i.e., a stem cell that is differentiated into one of the three germ layer cells).
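The threshold definition above can be expressed as a simple ratio test. The sketch below is illustrative only: the function name and the example values are hypothetical, the default cutoff of 0.10 corresponds to the "less than about 10%" criterion, and how the two expression levels are measured is left open.

```python
def is_non_substantially_expressed(level_first_state, level_second_state,
                                   threshold=0.10):
    """True when the first-state level is below `threshold` times the second-state level."""
    if level_second_state <= 0:
        raise ValueError("second-state level must be positive to form a ratio")
    return level_first_state / level_second_state < threshold


# Hypothetical expression levels in arbitrary units.
silent_in_stem_cell = is_non_substantially_expressed(2.0, 100.0)    # ratio 0.02
expressed_in_both = is_non_substantially_expressed(50.0, 100.0)     # ratio 0.5
```

Passing a stricter `threshold` (for example 0.01) models the narrower ranges recited above.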
  • cells that can contain an endogenous, differentially-expressed protein include, for example, cells that may differentially express a protein in transitioning from a normal cell to a cancerous cell, cells transitioning from a normal cell to a diseased cell, cells transitioning from a normal cell to a dying or apoptotic cell, etc.
  • the endogenous, differentially-expressed protein exhibits no expression in a pluripotent stem cell, but is expressed (i.e., at a biologically meaningful level) in a differentiated cell. That is, in embodiments, the endogenous, differentially-expressed protein is only, specifically expressed in a differentiated cell, but is not expressed in a pluripotent stem cell.
  • the methods include providing a first nuclease specific for a target genomic locus of a differentially-expressed protein.
  • the methods further include providing a donor plasmid that comprises a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm.
  • the methods further include introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus. Cells expressing the first selectable marker are then selected.
  • the methods also include introducing into the selected cells a second nuclease capable of excising the selection cassette to generate an endogenous protein tagged with the second selectable marker.
  • the methods suitably produce a cell comprising the at least one tagged endogenous, differentially-expressed protein, such that the tagged endogenous protein is substantially free of a scar sequence.
  • examples of a nuclease specific for a target genomic locus of a particular protein (i.e., a differentially-expressed protein) include zinc-finger nuclease systems, TALEN systems, and, in suitable embodiments, CRISPR/Cas systems.
  • Nucleases specific for a target genomic locus are described throughout.
  • the donor plasmid that is provided includes a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker.
  • selection cassette refers to a polynucleotide sequence that contains one or more genes encoding one or more selectable markers, and also suitably including one or more linkers, spacers or flanking polynucleotide sequences; one or more constitutive regulatory elements; and a pair of excision sites.
  • the one or more linkers, spacers or flanking polynucleotide sequences are sequences that are not found in the genomic sequence of the cell being targeted by the method.
  • a “selectable marker” refers to a gene that encodes a protein that is capable of being detected or observed, or confers a trait to allow preferential selection (whether positive or negative selection), thereby allowing detection, selection, identification or visualization of the cells that include the marker.
  • selectable markers include antibiotic resistance genes (e.g., resistance to ampicillin, chloramphenicol, tetracycline or kanamycin, etc.), counterselectable markers that eliminate or inhibit growth upon selection (e.g., thymidine kinase), as well as detectable tags, as described herein, which include, but are not limited to, FLAG tags, poly-histidine tags (e.g., 6xHis), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent molecules, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins.
  • the donor plasmid also further includes a second selectable marker that is suitably different than the first selectable marker, so as to allow for a two-selection approach to produce the desired cells, as described herein.
  • the donor plasmid also includes a polynucleotide encoding a 5’ homology arm and a 3’ homology arm. As described herein, each of the 5’ and 3’ homology arms is at least about 500 base pairs long. In some embodiments, the homology arm sequences are at least about 1000 base pairs long. In embodiments, each of the 5’ and 3’ homology arm polynucleotide sequences is at least about 90% identical to an endogenous nucleic acid sequence located 5’ or 3’ to a particular endogenous target locus. In some embodiments, each of the homology arm sequences is at least about 95%, 96%, 97%, 98%, 99%, or 100% identical to an endogenous nucleic acid sequence located 5’ or 3’ to a particular endogenous target locus.
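The length and identity criteria for a homology arm can be checked with a short routine. The following is a sketch under simplifying assumptions: the arm and the endogenous sequence are taken to be pre-aligned, ungapped, and of equal length, and the function names and default cutoffs (500 bp, 90%) are illustrative choices drawn from the ranges recited above rather than a prescribed implementation.

```python
def percent_identity(arm, genomic):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(arm) != len(genomic) or not arm:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == g for a, g in zip(arm.upper(), genomic.upper()))
    return 100.0 * matches / len(arm)


def arm_is_suitable(arm, genomic, min_length=500, min_identity=90.0):
    """Check a homology arm against the assumed length and identity cutoffs."""
    return len(arm) >= min_length and percent_identity(arm, genomic) >= min_identity


# Hypothetical 500-bp endogenous sequence and a perfectly matching arm.
genomic = "ACGT" * 125
perfect_arm_ok = arm_is_suitable(genomic, genomic)
```

A gapped alignment would be needed for arms containing insertions or deletions relative to the endogenous sequence; that case is outside this sketch.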
  • the methods of production further include introducing the nuclease and the donor plasmid into a cell, such that the first and second polynucleotides are inserted into the target genomic locus.
  • the nuclease and the donor plasmid can be inserted via various methods of transfection, including lipofection, electroporation (e.g., Neon® transfection system or an Amaxa Nucleofector®), sonication, or nucleofection.
  • the transfection occurs via electroporation, as described herein, suitably utilizing electroporation comprising at least 1 pulse, suitably at least 2 pulses, and more suitably 1 to 5 pulses.
  • the electroporation utilizes a pulse that is at least about 15 ms in length, at a voltage of at least about 1300 V. Additional lengths and voltages for use in electroporation are described herein.
  • cells that express the first selectable marker are then selected.
  • the cells can be selected via various cell sorting methods, including for example FACS. Selecting cells that express this first selectable marker provides a mechanism to ensure that the cells include the donor plasmid.
  • the methods suitably further include introducing into these selected cells a second nuclease that is capable of excising the selection cassette.
  • nucleases that can be used for such a gene editing approach are provided herein.
  • the selection of appropriate nucleases, excision sites, and flanking polynucleotide sequences results in an endogenous, differentially-expressed protein that is substantially free of a scar sequence.
  • substantially free of a scar sequence means that the tagged protein contains less than 34 nucleotides that are the result of the nuclease-facilitated excision, suitably less than 30 nucleotides, less than 20 nucleotides, suitably less than 10 nucleotides, more suitably less than 5 nucleotides that are the result of the nuclease-facilitated excision, suitably 4 nucleotides or less, 3 nucleotides or less, 2 nucleotides or less, 1 nucleotide, and suitably 0 nucleotides, that are residual from the excision.
  • deleting exogenously introduced sequences was accomplished with site-specific recombinases or transposases.
  • a method for producing a stem cell comprising at least one tagged endogenous, differentially-expressed protein.
  • the methods suitably include providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, differentially-expressed protein in a stem cell.
  • a donor plasmid is provided, comprising polynucleotide sequences encoding a first selectable marker.
  • this first selectable marker is a detectable tag, suitably a fluorescent protein.
  • this first detectable tag is a gene encoding
  • the donor plasmid further includes a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker.
  • excision sites are generally on the order of about 5-40 base pairs in length, and suitably include sites that are specific for the nuclease selected to allow for precise removal of the first selectable marker, and suitably the 5’ excision site and the 3’ excision site are nucleic acid sequences that are not found in the target genome.
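The two properties stated for an excision site (a length of about 5-40 base pairs and absence from the target genome) can be screened computationally. The sketch below is a simplified illustration: it uses exact substring search on both strands of a toy genome string, whereas a real screen would use an indexed genome and would likely tolerate mismatches. The function names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def excision_site_ok(site, genome, min_len=5, max_len=40):
    """True if the site length is in range and the site is absent from both genome strands."""
    site, genome = site.upper(), genome.upper()
    if not (min_len <= len(site) <= max_len):
        return False
    return site not in genome and reverse_complement(site) not in genome


# Toy stand-in for a target genome, for illustration only.
toy_genome = "AAAACCCC"
absent_site_ok = excision_site_ok("ACGTAC", toy_genome)
```

Checking the reverse complement matters because a nuclease recognition sequence on the opposite strand would be cut just the same.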
  • further linker or spacer polynucleotides can be included on either side of the 5’ excision site and the 3’ excision site.
  • the donor plasmid further includes a second selectable marker that is different from the first selectable marker, suitably located 3' from the first selectable marker (and the excision cassette).
  • the second selectable marker is suitably a detectable tag, so as to produce a cell that includes a tagged, endogenous differentially-expressed protein, that can readily be detected (e.g., via imaging, cell sorting, etc.), including a fluorescent protein such as GFP.
  • the donor plasmid also suitably includes a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length. Suitable lengths and percent identity to a target endogenous nucleic acid sequence for the 5’ and 3’ homology arms are provided herein.
  • the first ribonucleoprotein (RNP) complex, comprising the first Cas protein, the first CRISPR RNA (crRNA) and the first trans-activating RNA (tracrRNA), is suitably transfected into the stem cell, along with the donor plasmid.
  • Exemplary methods of transfecting the complexes and donor plasmids are described herein.
  • the polynucleotides encoding the selection cassette (i.e., the first selectable marker and the 5’ and 3’ excision sites flanking the first selectable marker) and the second selectable marker are inserted into the target genomic locus.
  • a selection is then carried out to select for stem cells that express the first selectable marker.
  • Suitable selection methods include various cell sorting methods, such as FACS.
  • an additional transfection (e.g., transfection 2 in FIG. 33C) is carried out with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid.
  • This second transfection results in the excision of the first selectable marker, and the generation of an endogenous, differentially-expressed protein that is tagged with the second selectable marker (i.e., a detectably tagged, endogenous, differentially-expressed protein).
  • Stem cells that include this second selectable marker can be selected for by, for example, cell sorting for cells not containing the first selectable marker, that is, for a cell population that is substantially free of the first selectable marker (20% of the cells or less, suitably 1% or less, contain the first selectable marker), for example by FACS sorting for cells without an mCherry detectable tag.
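The "substantially free" criterion for the sorted population can be sketched as a fraction test over per-cell fluorescence values. This is an illustrative sketch only: the gating threshold, the signal values, and the function names are hypothetical, and real FACS gating involves compensation and controls that are not modeled here.

```python
def fraction_first_marker_positive(first_marker_signals, positive_threshold):
    """Fraction of cells whose first-marker signal is at or above the gate."""
    if not first_marker_signals:
        raise ValueError("at least one cell is required")
    positives = sum(1 for s in first_marker_signals if s >= positive_threshold)
    return positives / len(first_marker_signals)


def substantially_free_of_first_marker(first_marker_signals, positive_threshold,
                                       max_fraction=0.20):
    """True when no more than `max_fraction` of cells still carry the first marker."""
    fraction = fraction_first_marker_positive(first_marker_signals, positive_threshold)
    return fraction <= max_fraction


# Hypothetical per-cell red-channel intensities (arbitrary units); one cell is still positive.
signals = [10, 12, 9, 15, 11, 13, 8, 14, 900, 10]
population_ok = substantially_free_of_first_marker(signals, positive_threshold=100)
```

Setting `max_fraction=0.01` models the stricter "1% or less" embodiment.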
  • the resulting stem cells contain at least one tagged endogenous, differentially-expressed protein.
  • the first selectable marker is operably linked to a constitutive regulatory element such that, once it is successfully transfected into the target cell, the first selectable marker is expressed. That is, the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker.
  • “operably linked” means that the constitutive regulatory element is upstream from the selectable marker and capable of causing the first selectable marker to be transcribed.
  • because the regulatory element is a “constitutive” element, it is unregulated and allows for continual transcription of the first selectable marker once transfection has successfully occurred.
  • examples of constitutive regulatory elements include, but are not limited to, a CAGGS promoter (1.6-kb hybrid promoter composed of the CMV immediate-early enhancer, CBA promoter, and CBA intron 1/exon 1), a hPGK promoter, an EF1-α promoter, a ubiquitin promoter (UBC promoter), and an actin promoter.
  • the constitutive regulatory element can be replaced with an inducible promoter, for example a tetracycline-inducible promoter (tet), and the like.
  • the methods described herein allow for modifying the 5’ end with a gene editing method, thus providing a method with more effective/flexible editing in a gene that could be sensitive to a leftover sequence from gene editing.
  • the donor plasmids useful in the methods described herein can further include microhomology containing sequences or linkers, flanking the 5’ and 3’ excision sites.
  • Microhomology-containing sequences are suitably 5-25 base pair sequences that facilitate the ligation of mismatched hanging strands of polynucleotides, removing overhanging nucleotides, and filling in the missing base pairs.
  • the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
  • the microhomology containing sequences are useful to guide in-frame microhomology-mediated end joining repair.
  • the donor plasmid can include a polynucleotide that codes for (and thus the tagged, differentially-expressed protein will include) a linker that links the second selectable marker and the tagged protein.
  • the linker is a protein sequence, including for example, Ser-Gly-Ser-Gly-Ser-Pro-Gly (SEQ ID NO: 288), Ser-Gly-Ser-Gly-Ser-Gly (SEQ ID NO: 289), Ser-Gly-Pro-Gly, or the ACTN2 linker: Val-Asp-Gly-Thr-Ala-Gly-Pro-Gly-Ser-Gly-Pro-Gly-Ser-Ile-Ala-Thr (SEQ ID NO: 290).
  • the 5’ and 3’ excision sites suitably include a Tia1L protospacer, for example an inverted Tia1L protospacer. These protospacers enable nuclease-mediated, for example CRISPR/Cas9-mediated, excision of the selection cassette after the cells that express the first selectable marker have been selected.
  • the 5’ and 3’ excision sites, including the Tia1L target sequence, are absent from the target genome, including the human genome, and can be used to ligate distinct double strand breaks induced by Cas9.
  • the Tia1L sites are oriented in the “PAM-out” orientation such that NHEJ-mediated double strand repair following Cas9 activity results in an in-frame mEGFP fusion with the target gene.
  • the peptide linker sequences incorporated within the Tia1L sites can be designed and oriented such that NHEJ-based repair after excision results in an in-frame coding sequence with 12 bp of residual sequence (for example encoding Ser-Gly-Pro-Gly) that serves as a canonical linker between the mEGFP and the target gene.
  • Use of Tia1L sites suitably provides three base pairs encoding Gly or Ser (depending on orientation), which are useful in protein engineering.
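The in-frame requirement for the residual sequence can be illustrated by translating a 12-bp residue and confirming that it is a whole number of codons. The sketch below is an assumption-laden illustration: the 12-bp sequence `AGCGGCCCCGGC` is a hypothetical example chosen only because it encodes Ser-Gly-Pro-Gly under the standard genetic code, and it is not the residual sequence of the disclosure; the codon table is deliberately partial.

```python
# Partial standard-code table covering only the codons used in this illustration.
CODONS = {
    "AGC": "Ser", "TCC": "Ser",
    "GGC": "Gly", "GGA": "Gly",
    "CCC": "Pro", "CCT": "Pro",
}


def translate_linker(residual):
    """Translate a residual DNA sequence, rejecting lengths that would shift the frame."""
    residual = residual.upper()
    if len(residual) % 3 != 0:
        raise ValueError("residual sequence is not a multiple of 3 bp (frameshift)")
    codons = [residual[i:i + 3] for i in range(0, len(residual), 3)]
    return "-".join(CODONS[c] for c in codons)  # KeyError for codons outside the table


# Hypothetical 12-bp residual sequence, for illustration only.
linker = translate_linker("AGCGGCCCCGGC")
```

A residue of 12 bp keeps the downstream tag in frame because 12 is divisible by 3; a residue of, say, 11 bp would be rejected by the length check.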
  • the first and/or second selectable markers are each at least about 8 amino acids in length, for example at least about 10 amino acids, at least about 20 amino acids, at least about 30 amino acids, at least about 40 amino acids, at least about 50 amino acids, at least about 60 amino acids, at least about 70 amino acids, at least about 80 amino acids, at least about 90 amino acids, or at least about 100 amino acids.
  • exemplary selectable markers for use as the first and/or the second selectable markers include an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
  • both the first and second selectable markers are detectable tags, and suitably are fluorescent proteins, including for example green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, or mTagRFPt.
  • the first selectable marker is mCherry
  • the second selectable marker is GFP.
  • the emission fluorescent signals from these two detectable tags are sufficiently far apart (mCherry at about 600-620 nm; GFP at about 500-520 nm) to be distinguished by fluorescence activated cell sorting (FACS).
  • the nuclease system for both introduction of the donor plasmid, as well as removal of the first selectable marker (selection cassette) is a CRISPR/Cas system.
  • the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1:1:1.
  • the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1:1:1.
  • the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein.
  • the methods provided herein are designed such that, in embodiments, the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid, and the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites.
  • off-target cleavage is less than about 5.0%, more suitably less than about 4.0%, less than about 3.0%, less than about 2.0%, less than about 1.0%, or less than about 0.5%.
  • a double-stranded break is generated at the target genomic locus after the excision of the first selectable marker.
  • This double-stranded break can be repaired by various mechanisms, including for example, homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ).
  • the use of microhomology linkers, including hexa- and tri-nucleotide repeats, allows for double-stranded break repair by MMEJ, with the microhomology linkers acting as a repair template during MMEJ. Use of such sequences biases excision repair outcomes and efficiently deletes the residual sequences remaining from Cas9 cleavage, including any protospacer adjacent motif (PAM) sequences that may have been included, leading to a scarless fusion product.
  • the scar sequence can be repurposed as a peptide linker.
  • a scar sequence is placed in between a gene sequence and the tag, and the scar becomes a linker via non-homologous end joining, i.e., a 4 amino acid linker.
  • the scar can be deleted through microhomology mediated end joining (MMEJ).
  • MMEJ can be used to insert nucleotides for amino acid linkers that are actually desired. For example, a sequence “A” that encodes the linker is placed on both sides of the scar and then MMEJ is utilized.
  • MMEJ involves a deletion event such that one of the “A” sequences is deleted so that only one copy remains. MMEJ removes the scar by cutting out everything between the “A” sequences. The scar may be transiently present episomally, but is not present in the cells.
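The sequence-level outcome of the MMEJ event described above (two copies of “A” flanking a scar collapse to a single copy of “A”) can be modeled as a string operation. This is a conceptual sketch only: it models the end product, not the repair mechanism, and the flanking sequences, the repeat, and the scar shown are hypothetical.

```python
def mmej_collapse(sequence, repeat):
    """Delete one copy of `repeat` and everything between its first two occurrences."""
    first = sequence.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("a second copy of the repeat is required")
    # Keep everything through the first copy, then resume after the second copy.
    return sequence[:first + len(repeat)] + sequence[second + len(repeat):]


# Hypothetical layout: upstream + A + scar + A + downstream.
upstream, a_repeat, scar, downstream = "ATGAAA", "GGCTCC", "TTTTTTTT", "GTGAGC"
before = upstream + a_repeat + scar + a_repeat + downstream
after = mmej_collapse(before, a_repeat)
```

After the collapse, the single remaining “A” sits directly between the upstream and downstream sequences, which is why it can serve as the desired linker.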
  • the cells that are produced using the method are induced pluripotent stem cells (iPSC) derived from a healthy donor, and can be a WTC cell or a WTB cell, as described herein.
  • Cells into which the iPSCs prepared in accordance with the methods herein can differentiate include, for example, a cardiomyocyte, a differentiated kidney cell, or a differentiated fibroblast.
  • the tagged protein that is produced via the methods described herein can be ACTN2, TNNI1, MYL2, MYL7, or TTN. Additional proteins known in the art that are differentially-expressed can also be readily tagged using the methods described herein.
  • the methods described herein can include an additional selection step based on genetic screening to confirm that the second selectable marker has been properly inserted, in the proper position, and with appropriate functionality.
  • Such screening methods can include, for example, use of genetic screening to determine at least two of the following: insertion of the second selectable marker sequence, stable integration of the donor plasmid, and/or relative copy number of the second selectable marker sequence.
  • the genetic screening is performed by droplet digital PCR (ddPCR), tiled-junction PCR, or both.
  • the second selectable marker can be inserted into one or both alleles of the target genomic locus, but the plasmid backbone is not stably integrated.
  • Genetic sequencing to identify clones with successful insertion of the second selectable marker can include amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products, sequencing the tiled-junction amplification products, and comparing the sequence of the tiled-junction amplification products with a reference sequence to confirm precise insertion of the second selectable marker.
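The comparison step at the end of this screening can be sketched as a position-by-position check of a sequenced amplification product against the expected reference. The sketch is illustrative only: it assumes the read and the reference start at the same position and contain no indels needing alignment, and the function names and sequences are hypothetical.

```python
def junction_mismatches(amplicon, reference):
    """Return 0-based positions where the sequenced amplicon differs from the reference."""
    amplicon, reference = amplicon.upper(), reference.upper()
    positions = [i for i, (a, r) in enumerate(zip(amplicon, reference)) if a != r]
    # A length difference is reported as mismatches at the unmatched trailing positions.
    shorter, longer = sorted((len(amplicon), len(reference)))
    positions.extend(range(shorter, longer))
    return positions


def insertion_is_precise(amplicon, reference):
    """True when the amplicon matches the reference at every position."""
    return not junction_mismatches(amplicon, reference)


# Hypothetical short reference spanning a junction, for illustration only.
reference = "ACGTACGT"
clone_is_precise = insertion_is_precise("ACGTACGT", reference)
```

A clone whose amplicon yields a non-empty mismatch list would be flagged for rejection or closer inspection.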
  • cells produced using the methods herein, and in particular stem cells, express at least one protein associated with pluripotency, including for example, one or more of Oct3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3/4.
  • the expression level of the at least one protein associated with pluripotency is comparable to the expression level of the same protein in an unmodified cell or stem cell.
  • the stem cells produced using the methods described herein maintain a differentiation potential that is comparable to an unmodified stem cell, and suitably the morphology, viability, potency, and endogenous cellular function of the stem cells produced by the methods are not substantially changed compared to unmodified stem cells and differentiated cells thereof. That is, the stem cells produced using the methods described herein will function as normal stem cells, even with the inclusion of a tagged, endogenous differentially-expressed protein.
  • the donor plasmid includes polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, a second selectable marker that is different from the first selectable marker, a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
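The listed components of the donor plasmid can be captured in a small record with a consistency check. The following is a sketch for illustration: the class name, field names, and the strict 1000-bp cutoff used for "at least about 1 kb" are assumptions, and the example values (mCherry, GFP, CAGGS) are simply taken from the options named elsewhere in this description.

```python
from dataclasses import dataclass


@dataclass
class DonorPlasmidDesign:
    first_marker: str
    second_marker: str
    constitutive_promoter: str
    has_5p_excision_site: bool
    has_3p_excision_site: bool
    arm_5p_bp: int
    arm_3p_bp: int

    def problems(self, min_arm_bp=1000):
        """Return a list of design issues; an empty list means no issue was found."""
        issues = []
        if self.first_marker == self.second_marker:
            issues.append("first and second selectable markers must differ")
        if not self.constitutive_promoter:
            issues.append("first marker needs a constitutive regulatory element")
        if not (self.has_5p_excision_site and self.has_3p_excision_site):
            issues.append("both excision sites must flank the first marker")
        if min(self.arm_5p_bp, self.arm_3p_bp) < min_arm_bp:
            issues.append("homology arms should each be at least about 1 kb")
        return issues


# Example design using options named in this description.
design = DonorPlasmidDesign("mCherry", "GFP", "CAGGS", True, True, 1000, 1000)
design_issues = design.problems()
```

Such a record makes it straightforward to screen many candidate designs before any cloning is attempted.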
  • the constitutive regulatory element is a
  • the donor plasmid can also further include microhomology containing sequences flanking the 5’ and 3’ excision sites, and suitably the microhomology containing sequences include tri-nucleotide or hexa-nucleotide repeat sequences.
  • the donor plasmid can further include a flexible linker sequence.
  • the polynucleotide sequences encoding the first and second selectable markers are each at least about 20 nucleotides in length, more suitably the first and second selectable markers are each between about 300 nucleotides and about 3000 nucleotides in length, or the polynucleotide sequences encoding the first and second selectable markers can each be greater than about 3000 nucleotides.
  • the first and second selectable markers encoded by the polynucleotides are suitably each at least about 8 amino acids in length, or can be between about 8 and about 100 amino acids in length.
  • the first and/or second selectable marker is suitably an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
  • Suitable first and second selectable markers include detectable tags, such as fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
  • the first selectable marker is mCherry
  • the second selectable marker is GFP.
  • stably tagged cells generated by inserting the donor plasmids described herein into a genomic locus targeted by the 5’ and 3’ homology arms.
  • the donor plasmids and methods described herein are suitably used to prepare tagged proteins that can be imaged, allowing for detection, imaging, tracking and studying of proteins that are silent in an undifferentiated cell (i.e., a stem cell), but differentially-expressed, that is turned on, when the cell differentiates into one or more further cell types.
  • the cells prepared herein can be part of a tissue, including a living tissue. Imaging methods described herein can allow for three-dimensional imaging of the cells and the tagged proteins, allowing for determination of the location of the tagged proteins during various cell stages, etc. Various methods of imaging cells, including 3-D imaging, are described herein.
  • a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, and a second selectable marker that is different from the first selectable marker.
  • the target genomic locus is suitably a locus of a gene encoding a differentially-expressed protein.
  • a cell comprising a CRISPR/Cas9 ribonucleoprotein (RNP) complex and a donor polynucleotide
  • the donor polynucleotide comprising polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, a second selectable marker that is different from the first selectable marker, and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
  • the cells include microhomology containing sequences flanking the 5’ and 3’ excision sites, suitably containing sequences comprising tri-nucleotide or hexa-nucleotide repeat sequences.
  • the 5’ and 3’ excision sites each can comprise a Tia1L protospacer, including an inverted Tia1L protospacer.
  • first and second selectable markers are described herein, including the use of fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
  • cells comprising an endogenous, differentially-expressed protein stably tagged with a selectable marker, suitably wherein the selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
  • the selectable marker is a detectable tag, such as a fluorescent protein, suitably selected from green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
  • the cell is an undifferentiated stem cell, and the differentially-expressed protein is not expressed in the undifferentiated stem cell, but is expressed in a differentiated cell derived from the undifferentiated stem cell.
  • differentiated cells or groups of differentiated cells wherein the differentiated cells or group of differentiated cells are cardiomyocytes, differentiated kidney cells, or differentiated fibroblasts, and include a tagged, differentially-expressed protein produced using the methods and plasmids described herein.
  • kits comprising an array of stem cells comprising at least one tagged endogenous, differentially-expressed protein, suitably produced using the various methods and plasmids described herein.
  • the kits can be used for visualizing one or more proteins during differentiation, or use for selecting differentiated cells, comprising an array of the cells described herein.
  • the visualizing of the one or more proteins is performed by fluorescent microscopy, and the differentiated cells express at least one tagged endogenous protein.
  • Also provided herein are methods of generating a signature for a test agent comprising admixing the test agent with one or more cells produced by the various methods described herein, detecting a response in the one or more cells, detecting a response in a control cell (i.e., a cell that does not include a test agent), detecting a difference in the response in the one or more cells from the control cell, and generating a data set of the difference in the response.
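A minimal sketch of the data set described above, assuming the "response" is a set of numeric readouts measured in treated and control cells. The function name, the readout names, and the example values are hypothetical and are not taken from the source.

```python
from statistics import mean

def response_signature(treated, control):
    """Return, per readout, the mean treated response minus the mean control response.

    `treated` and `control` map a readout name to a list of replicate measurements.
    Only readouts present in both inputs are compared.
    """
    signature = {}
    for readout in sorted(set(treated) & set(control)):
        signature[readout] = mean(treated[readout]) - mean(control[readout])
    return signature

# Hypothetical measurements (arbitrary units), for illustration only.
treated = {"tagged_protein_intensity": [120.0, 130.0, 125.0], "cell_area": [410.0, 390.0]}
control = {"tagged_protein_intensity": [100.0, 100.0, 100.0], "cell_area": [400.0, 400.0]}
signature = response_signature(treated, control)
```

A real signature would likely also record variance or a statistical test per readout; the difference of means is used here only to make the treated-versus-control comparison concrete.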
  • the cells produced by the various methods described herein can also be utilized in various activities, such as, determining a genetic or protein target for a test agent or drug within a cell, determining toxicity of a test agent on the cell, determining the stage of disease in the cell, determining the dose of a test agent or drug for treatment of disease, monitoring disease progression in the cell, and monitoring effects of treatment of a test agent or drug on the cell. Additional uses of the cell include monitoring progression of disease or effect of a test agent on a disease wherein the disease is selected from the group consisting of aberrant cell growth, wound healing, inflammation, immune disorders, genetic disorders, neurodegeneration, and neuromuscular degeneration.
  • a “stimuli-responsive gene” refers to a gene that turns on or is activated in response to an external stimulus, an environmental factor, or an added compound or substance.
  • Examples of stimuli-responsive genes include genes that are turned on or activated in response to stress, heat, light, oxidation, ionizing radiation, metal-induced toxicity, or in response to a foreign compound or drug.
  • the methods comprise: a) providing a first nuclease specific for a target genomic locus of a stimuli-responsive gene; b) providing a donor plasmid comprising: i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; iii. a third polynucleotide encoding a 5’ homology arm; and iv. a fourth polynucleotide encoding a 3’ homology arm; c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus; d) selecting cells expressing the first selectable marker; and e) introducing into the cells of (d) a second nuclease capable of excising the selection cassette to generate an endogenous, stimuli-responsive gene tagged with the second selectable marker; thereby producing the cell comprising the at least one tagged endogenous, stimuli-responsive gene.
  • methods for producing cells containing endogenous tagged genes can be carried out using various gene editing methods, including those based on TALENS, Zinc Finger, CRISPR-Cas, etc.
  • a method for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene comprising: a) providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, stimuli-responsive gene in a cell; b) providing a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is
  • Exemplary stimuli-responsive genes include those that turn on or activate in response to endoplasmic reticulum stress, ionizing radiation stress, heat shock, oxidative stress, metal-induced toxicity, or drug-induced toxicity, as well as other external stimuli. Examples of stimuli-responsive genes that can be tagged using the methods described herein are provided in the chart below. Additional stimuli-responsive genes include those involved in intracellular signaling pathways that are activated in response to stress or toxicity. Also provided are references where additional information regarding each of the genes, including sequence information of the genes for production of 5’ and 3’ homology arms, can be found.
  • Additional stimuli-responsive genes and their sequence information can also be found in the AmiGO 2 from the Gene Ontology (GO) Consortium, which can be used to identify additional genes positively or negatively regulated in response to various biological stimuli (e.g., X-ray, heat, hypoxia, etc.), and can be found at amigo.geneontology.org/amigo/dd_browse.
  • Examples of cells that can be produced with tagged endogenous, stimuli-responsive genes, using the methods described herein, include any mammalian or human primary cells or cell lines, including lung cells, endothelial cells, muscle cells, liver cells, brain cells, nerve cells, immune cells, cartilage cells, cancer cells, etc.
  • a gene involved in sensing or promoting apoptosis in a cell can also be tagged, such that the effect of a stress, compound, etc., on the apoptotic response of the cell can be visually or otherwise tracked prior to the cell actually undergoing apoptosis.
  • the various methods described herein with regard to tagging endogenous genes in stem cells can be extended to producing the tagged cells which contain an endogenous, stimuli-responsive gene, using similar methods, approaches, components, etc.
  • cells produced herein in which an endogenous, stimuli-responsive gene has been tagged can provide various research and clinical advantages.
  • cells can be placed under various stress situations, including heat, cold, radiation, or situations where such stresses may be occurring, to view or otherwise track the response of the cells, as well as potentially determine methods that can intervene to stop or avert the stress response.
  • the methods and cells containing tagged endogenous, stimuli-responsive genes can also be used as drug screening or toxicity assays for potential new chemical compounds.
  • Drugs can be provided to the cells in a controlled environment, suitably in cell culture or in situ, and the response monitored visually (if using fluorescence or other visual tags) or otherwise tracked to determine if the toxicity or stress response(s) of the cells are activated.
  • agents that can counter toxicity causing compounds can also be screened using such cells and methods.
  • the selection cassette of (b) suitably further comprises 5’ and 3’ excision sites flanking the first selectable marker.
  • the cell comprising the at least one tagged endogenous, stimuli-responsive gene is substantially free of the first selectable marker.
  • the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker.
  • the constitutive regulatory element is a CAGGS promoter, a UBC promoter, an EF1-α promoter, an actin promoter, or a hPGK promoter.
  • the donor plasmid of (b) further comprises microhomology containing sequences flanking the 5’ and 3’ excision sites.
  • the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
  • the 5’ and 3’ excision sites each comprise a Tia1L protospacer, including where the Tia1L protospacer is an inverted Tia1L protospacer.
  • the first and/or second selectable markers are each at least about 8 amino acids in length, and in embodiments the first and/or second selectable markers are each at least about 100 amino acids in length.
  • first and/or the second selectable markers are described herein, and suitably can be an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
  • the first and second selectable markers are fluorescent proteins, including those selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
  • the first selectable marker is mCherry
  • the second selectable marker is GFP.
  • the selecting of (d) suitably comprises fluorescence activated cell sorting (FACS).
  • the methods can further comprise (f): selecting cells expressing the second selectable marker.
  • the second selectable marker is a fluorescent protein and the second selection step comprises fluorescence activated cell sorting (FACS).
  • the first nuclease and/or the second nuclease is a Cas nuclease, a
  • the first nuclease and/or the second nuclease is a Cas protein, including Cas9.
  • the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1:1:1.
  • the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1:1:1.
  • the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein.
  • the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid.
  • the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites.
  • the methods provided herein suitably result in off-target cleavage that is less than about 1.0%.
  • a double-stranded break is generated at the target genomic locus after step (c).
  • the double-stranded break is repaired by homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ).
  • the donor plasmid acts as a repair template during MMEJ.
  • PAM protospacer adjacent motif
  • the introducing or transfecting of (c) occurs by electroporation.
  • the electroporation comprises at least 1 pulse (more suitably at least 1-5 pulses, including 2 pulses), and in embodiments the pulse is at least about 15 ms at a voltage of at least about 1300 V.
  • At least about 0.1% of the cells express the first selectable marker after step (c).
  • the second selection step further comprises genetic screening to determine at least two or more of the following: insertion of the second selectable marker sequence; stable integration of the donor plasmid; and/or relative copy number of the second selectable marker sequence.
  • the genetic screening is performed by droplet digital PCR (ddPCR), tiled junction PCR, or both.
  • selecting the clones having an insertion of the second selectable marker comprises selecting clones that have the second selectable marker inserted into one or both alleles of the target genomic locus and do not have stable integration of the plasmid backbone.
  • the methods further comprise sequencing clones having an insertion of the second selectable marker to identify clones that have a precise insertion of the second selectable marker.
  • the clones that have a precise insertion are identified by: amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products; sequencing the tiled-junction amplification products of (a); and comparing the sequence of the tiled-junction amplification products with a reference sequence.
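The comparison step above can be sketched as follows. This is a simplified illustration that assumes the sequenced amplification product and the reference sequence are already aligned and in the same orientation; a real analysis would use a sequence aligner. The function name and the example sequences are hypothetical.

```python
def compare_to_reference(amplicon, reference):
    """Compare a sequenced tiled-junction product with the expected reference sequence.

    Returns the positional mismatches as (index, reference_base, observed_base) and a
    flag that is True only when the two sequences are identical in length and content.
    """
    amplicon = amplicon.upper()
    reference = reference.upper()
    mismatches = [
        (i, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, amplicon))
        if ref_base != obs_base
    ]
    same_length = len(amplicon) == len(reference)
    return {
        "same_length": same_length,
        "mismatches": mismatches,
        "precise": same_length and not mismatches,
    }

# Hypothetical short sequences, for illustration only.
exact = compare_to_reference("ACGTACGT", "ACGTACGT")
edited = compare_to_reference("ACGTTCGT", "ACGTACGT")
```

Under this sketch, a clone would be called precisely edited only when both tiled products (5’ and 3’ junctions) report `precise` as true.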
  • a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a stimuli-responsive gene.
  • the cells further comprise microhomology containing sequences flanking the 5’ and 3’ excision sites, suitably where the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
  • the 5’ and 3’ excision sites each comprise a Tia1L protospacer, including where the Tia1L protospacer is an inverted Tia1L protospacer.
  • the first and/or second selectable markers each comprise about 8 amino acids, and suitably the first and/or second selectable markers each comprise at least about 100 amino acids.
  • the first and/or second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
  • the first and second selectable markers are fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
  • the first selectable marker is mCherry
  • the second selectable marker is GFP.
  • tagged endogenous genes include genes coding for structural proteins, membrane proteins, and various other cellular components.
  • the methods comprise: a) providing a first nuclease specific for a target genomic locus of an endogenous gene; b) providing a donor plasmid comprising: i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; iii. a third polynucleotide encoding a 5’ homology arm; and iv. a fourth polynucleotide encoding a 3’ homology arm; c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus; d) selecting cells expressing the first selectable marker; and e) introducing into the cells of (d) a second nuclease capable of excising the selection cassette to generate an endogenous gene tagged with the second selectable marker; thereby producing the cell comprising the at least one tagged endogenous gene.
  • Example 1: A ribonucleoprotein-based CRISPR/Cas9 system to create Fluorescent
  • the CRISPR/Cas9 system was used to introduce a GFP tag into the genomic loci of various proteins by HDR-mediated incorporation. Exemplary proteins tagged by the methods described herein are shown in Tables 1 and 2 above. Experiments were designed to introduce GFP at the N- or C-terminus along with a short linker using a CRISPR/Cas9 RNP and a donor plasmid encoding the full-length GFP protein (FIG. 1A). The donor plasmid contained homology arms, each about 1 kb in length, on either side of the GFP sequence operably linked to a linker sequence, and a bacterial selection sequence in the backbone.
  • FIG. 1B illustrates a schematic of donor plasmids for N-terminal tagging of LMNB1 and C-terminal tagging of DSP.
  • FIG. 13 shows the predicted genome-wide CRISPR/Cas9 binding sites, categorized according to sequence profile and location with respect to genes. At least two independent crRNA sequences were used in each editing experiment in an effort to maximize editing success and elucidate the potential significance of possible off-target effects in the clonal cell lines generated (FIG. 13A). Predicted alternative CRISPR/Cas9 binding sites were categorized for each crRNA used, and each predicted off-target sequence was categorized according to its sequence profile (the number of mismatches and RNA or DNA bulges it contains relative to the crRNA used in the experiment and their position relative to the PAM) (FIG.
  • Cas-OFFinder was used to discriminate between crRNA sequences with respect to their genome-wide specificity (Bae et al., (2014) Bioinformatics, 30(10): 1473-1475) by identifying all alternative sites genome-wide with ≤2 mismatches/bulges in the non-seed and/or ≤1 mismatch/bulge in the seed region, with an NGG or NAG PAM.
  • the seed and non-seed region of a crRNA binding sequence was defined with respect to its proximity to the PAM sequence. All predicted off-target sites were additionally categorized according to their location with respect to annotated genes (FIG. 13D). Genomic location was defined as follows:
  • exon: inside an exon or within 50 bp of an exon;
  • genic: in an intron (but >50 bp from an exon) or within 200 bp of an annotated gene;
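The location categories listed above can be expressed as a small classifier. This is a sketch under stated assumptions: it considers a single annotated gene at a time, uses inclusive coordinates, and labels anything outside the two listed categories as "other", a placeholder name that does not come from the source. The function names are hypothetical.

```python
def distance_to_interval(pos, start, end):
    """Distance in bp from a position to an inclusive interval (0 if inside it)."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end

def classify_site(pos, gene_start, gene_end, exons, exon_pad=50, gene_pad=200):
    """Classify a predicted off-target position relative to one annotated gene.

    exon:  inside an exon or within `exon_pad` bp of one
    genic: elsewhere in the gene (intron) or within `gene_pad` bp of the gene
    other: placeholder label for every remaining position (assumption)
    """
    if any(distance_to_interval(pos, s, e) <= exon_pad for s, e in exons):
        return "exon"
    if distance_to_interval(pos, gene_start, gene_end) <= gene_pad:
        return "genic"
    return "other"

# Hypothetical gene model: gene spans 1000-5000 with two exons.
exons = [(1000, 1200), (3000, 3200)]
labels = [classify_site(p, 1000, 5000, exons) for p in (1240, 2000, 5150, 6000)]
```

Checking exons before the gene body mirrors the order of the definitions, so a site 40 bp into an intron is still reported as "exon".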
  • crRNAs targeting Cas9 to within 50 bp of the intended GFP integration site were used, with a strong preference for any crRNAs with binding sites within 10 bp.
  • a subset of CRISPR/Cas9 alternative binding sites identified by Cas-OFFinder were selected for sequencing and FIG. 13E shows the breakdown of sequenced off-target sites by genomic location with respect to annotated genes. Numbers above bars represent the number of clones sequenced for each experiment. All 406 sequenced sites were found to be wild type.
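The sequence-profile categorization used for predicted off-target sites (mismatches counted separately in the PAM-proximal seed region and in the non-seed region) can be sketched as below. The sketch handles mismatches only, not RNA or DNA bulges, and it assumes the PAM lies 3’ of the protospacer and that the seed is the 12 PAM-proximal nucleotides; the source does not state the seed length, so `seed_len` is an assumption, and the sequences are hypothetical.

```python
def mismatch_profile(crrna, site, seed_len=12):
    """Count mismatches between a crRNA protospacer and a candidate site.

    Both sequences are written 5'->3' without the PAM and must have equal length.
    The last `seed_len` bases (closest to the 3' PAM) are treated as the seed.
    """
    if len(crrna) != len(site):
        raise ValueError("sequences must have equal length (bulges are not handled)")
    crrna, site = crrna.upper(), site.upper()
    split = len(crrna) - seed_len
    non_seed = sum(a != b for a, b in zip(crrna[:split], site[:split]))
    seed = sum(a != b for a, b in zip(crrna[split:], site[split:]))
    return {"seed": seed, "non_seed": non_seed}

# Hypothetical 20-nt protospacer and a site differing at the first and last base.
profile = mismatch_profile("GACGTTAGCCATGGATCCAA", "TACGTTAGCCATGGATCCAT")
```

A dedicated tool such as Cas-OFFinder performs the genome-wide search itself; this function only illustrates how a single hit is profiled once found.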
  • Donor plasmids were designed for each target locus and contained design features specific to each target and a GFP-encoding nucleic acid sequence (See, e.g. , FIG. 1A and FIG. 1B). Homology arms of about 1 kb in length and corresponding to the endogenous DNA regions located 5’ and 3’ to the target insertion site were designed from the hg38 reference genome and were corrected for known SNPs in WTC11 cells. Unique linkers for each locus were used and were inserted 5’ of the GFP sequence for C-terminal tagging of the endogenous protein or 3’ of the GFP sequence for N-terminal tagging of the endogenous protein.
  • Plasmids were initially created either by In-Fusion assembly of gBlock pieces (IDT) into a pUC19 backbone, or the plasmids were synthesized and cloned into a pUC57 backbone by Genewiz. All plasmids were deposited in the Addgene database. Donor plasmids were diluted to working concentrations of 1 µg/µL in TE. In some experiments, higher concentrations of donor plasmid were used, but lower concentrations (<500 ng/µL) were avoided. Table 6 below illustrates nucleic acid sequences for exemplary plasmid inserts comprising GFP detectable tags, homology arms targeting the indicated genes, and linkers including:
  • (i) 5’ TJP1 homology arm (SEQ ID NO: 8) - mEGFP - linker (SEQ ID NO: 280) - 3’ TJP1 homology arm (SEQ ID NO: 23);
  • Wild type (WT) S. pyogenes Cas9 (spCas9) protein was purchased from UC
  • crRNA CRISPR RNA
  • tracrRNA trans-activating crRNA
  • crRNP CRISPR/Cas9 ribonucleoprotein
  • the crRNA and tracrRNA oligonucleotides were reconstituted to 100 µM in TE at pH 7.5 (catalog #11-01-02-02, IDT).
  • the crRNA and tracrRNA oligonucleotides were then combined in a sterile PCR tube at a final concentration of 40 µM in Duplex Buffer (100 mM potassium acetate; 30 mM HEPES, pH 7.5).
  • the crRNA and tracrRNA mixture was heated to 95 °C for 5 min to generate a crRNA:tracrRNA duplex. After heating, the crRNA:tracrRNA duplex was allowed to cool at room temperature for a minimum of two hours, after which the crRNA:tracrRNA duplex was kept on ice. crRNA:tracrRNA duplexes were then diluted to a working concentration of 10 µM in TE. All dilutions and stocks were kept on ice throughout the protocol. Alternatively, the crRNA:tracrRNA duplexes were stored at -20°C for later use.
  • spCas9 was stored at -80°C and was thawed on ice or at 4°C until no ice pellet was visible, approximately 2-5 min. spCas9 was then diluted to a working concentration of 10 µM in TE in preparation for use. Alternatively, working concentrations of Cas9 protein were stored at -20°C for up to 2 weeks and multiple freeze-thaw cycles were avoided (<3 freeze-thaw cycles recommended).
  • crRNPs were generated by combining the solution of crRNA:tracrRNA duplexes and Cas9 protein in a 1.5 mL Eppendorf tube and gently pipetting up and down three times. A separate crRNP was generated for each reaction to be performed. crRNPs were incubated at room temperature for a minimum of 10 minutes and no longer than 1 hour prior to the addition of the complexes to cells.
  • WTC iPSCs were cultured according to described methods. Briefly, WTC11 iPSCs were cultured in a feeder-free system on tissue culture plates or dishes coated with phenol red-free GFR Matrigel (Corning) diluted 1:30 in DMEM/F12 (Gibco) in mTeSR1 media (StemCell Technologies) supplemented with 1% (v/v) Penicillin-streptomycin (P/S) (Gibco).
  • Cells were not allowed to reach confluency greater than 85% and were passaged every 3-4 days by dissociation into single-cell suspension using StemPro® Accutase® (Gibco). When in single cell suspension, cells were counted using a Vi-CELL® Series Cell Viability Analyzer (Beckman Coulter).
  • mTeSR1 media (400 mL basal media with the provided 100 mL 5X supplement (catalog # 05850, Stem Cell Technologies)) with added 5 mL (1% v/v) Penicillin/Streptomycin (catalog # 15140-122, Gibco) was prepared and sterile filtered with a 0.22 µm filter prior to use.
  • mTeSR1 media was brought to room temperature on the bench top, and was not warmed in a 37°C water bath.
  • mTeSR1 + ROCK inhibitor (Ri) media was prepared by adding 10 mM Ri to mTeSR1 media at a 1:1000 dilution. Accutase was warmed in a 37°C water bath. Previously prepared Matrigel-coated vessels (stored at 4°C) were brought to room temperature.
  • 6-well plates were prepared by aspirating and discarding any excess Matrigel liquid, and adding 4 mL of RT mTeSR1 + Ri media to each well. Plates with media were kept in an incubator at 37°C and 5% CO2 until ready to plate cells after the transfection procedure.
  • 8×10^5 cells were resuspended in 100 µL Neon Buffer R with 2 µg donor plasmid and 2 µg Cas9 protein duplexed with a crRNA:tracrRNA at a 1:1 molar ratio to Cas9, then electroporated with one pulse at 1300 V for 30 ms, and plated onto Matrigel-coated 6-well dishes with mTeSR1 media supplemented with 1% P/S and 10 µM Ri. Transfected cells were cultured as previously described for 3-4 days until the transfected culture had recovered to ~70% confluence. Transfected cells were incubated for at least 24 hours before changing the media to mTeSR1 without Ri. Successfully transfected cells were identified and harvested by FACS for use in downstream applications after reaching a healthy confluency and maturity (approximately 3-4 days) (FIG. 1C).
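As a worked illustration of the 1:1 molar ratio between Cas9 and the crRNA:tracrRNA duplex, the arithmetic can be sketched as follows. The sketch reads the Cas9 mass as 2 µg and the duplex working concentration as 10 µM, and it assumes a Cas9 molecular weight of roughly 160 kDa; that molecular weight is not given in the source and should be replaced with the supplier's value. The function names are hypothetical.

```python
def cas9_pmol(mass_ug, mw_g_per_mol=160_000.0):
    """Convert a Cas9 mass in micrograms to picomoles.

    mw_g_per_mol is an assumed approximate molecular weight, not a value from the source.
    """
    return mass_ug / mw_g_per_mol * 1e6  # ug / (g/mol) = umol; umol * 1e6 = pmol

def duplex_volume_ul(pmol_cas9, duplex_uM=10.0, molar_ratio=1.0):
    """Volume of duplex stock (uL) that supplies `molar_ratio` duplexes per Cas9.

    1 uM equals 1 pmol/uL, so volume = pmol needed / concentration in uM.
    """
    return pmol_cas9 * molar_ratio / duplex_uM

pmol = cas9_pmol(2.0)            # about 12.5 pmol of Cas9 in 2 ug
volume = duplex_volume_ul(pmol)  # about 1.25 uL of a 10 uM duplex stock
```

Keeping the molecular weight and stock concentration as parameters makes it easy to redo the calculation for a different protein lot or dilution.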
  • Fluorescence-activated cell sorting was used to enrich the population of gene edited cells after transfection and to evaluate rates of HDR (FIG. 2A).
  • the cell suspension (0.5-1.0 × 10^6 cells/mL in mTeSR1 + Ri)
  • a range of GFP fluorescent intensity was observed in edited populations (FIG. 2A and FIG. 2B).
  • the GFP intensity determined by FACS correlated with transcription levels of the target protein observed by RNAseq analysis from the WTC parental cell line (RNA-seq analysis shown in FIG. 12).
  • FIG. 2C shows a representative image of the LMNB1 Cr1 FACS-enriched population showing an enrichment of GFP+ cells.
  • An overview of the genetic screening process is shown in FIG. 3, Steps 1 through 3, including droplet digital PCR (ddPCR, FIG. 3, Step 1), tiled junctional PCR assays (FIG. 3, Step 2), and sequencing analysis of inserted amplicons (FIG. 3, Step 3).
  • RPP30 reference gene could be used to analyze all gene edits
  • a droplet digital PCR (ddPCR) assay was used to rapidly interrogate large sets of clones in parallel without having to optimize parameters specifically for each target gene, a significant advantage for our high throughput platform.
  • Assays were designed to measure three DNA sequences common to each experiment: (1) the GFP tag sequence to measure tag incorporation; (2) the ampicillin or kanamycin resistance gene to assess stable integration of the plasmid backbone; and (3) a two-copy genomic reference locus (RPP30) to calculate genomic copy number. These sequences were used to identify clones with a GFP:RPP30 signature of ~0.5 or ~1.0, suggesting monoallelic or biallelic stable integration of the GFP sequence into the host cell genome. Clones with an elevated AmpR/KanR:RPP30 ddPCR signature (>0.1) suggested stable integration of the donor plasmid backbone and were rejected.
  • GFP-tagged clones lacking plasmid backbone integration were identified using ddPCR, with equivalently amplifying primer sets and probes corresponding both to the GFP tag and the donor plasmid backbone.
  • the abundance of the GFP tag sequence was quantified (x-axis in FIG. 3, Step 1) and normalized to a known 2-copy genomic reference gene (RPP30) in order to calculate genomic GFP copy number in the sample.
  • the reference assay for the 2-copy, autosomal gene RPP30 was purchased from Bio-Rad.
  • the assay for mEGFP detection was as follows:
  • the reported final copy number of mEGFP per genome was calculated as the ratio of [(copies/µL mEGFP) - (copies/µL nonintegrated AMP)] / (copies/µL RPP30), where a ratio of 0.5 indicated monoallelic insertion (~1 copy per genome) and a ratio of 1 indicated biallelic insertion (~2 copies/genome).
  • the AMP sequence was used to normalize mEGFP signal only when integration into the genome was ruled out during primary screening.
  • Clones with a GFP copy number of ~1.0 (monoallelic) or ~2.0 (biallelic) and AMP/KAN <0.2 were putatively identified as correctly edited clones. Combining data across all successful editing experiments, 39% of clones were retained as candidates using this assay (FIG. 5A). Clones with a GFP copy number of 0.2-1 were considered possible mosaics of edited and unedited cells and were rejected. Clones with a GFP copy number between ~1 and ~2 were further screened to identify potential biallelic clones from mixed cultures.
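The ddPCR calculation and the screening cut-offs described above can be sketched as a small classifier. Several details are assumptions rather than statements from the source: copies per genome are taken as twice the tag:RPP30 ratio (RPP30 being a two-copy reference), the backbone cut-off of 0.2 is applied on the same per-genome scale, and the `tol` window around 1.0 and 2.0 is a hypothetical parameter because the text only says "~1.0" and "~2.0". Function and label names are illustrative.

```python
def copies_per_genome(target, rpp30, nonintegrated=0.0):
    """Copies per genome from ddPCR concentrations (copies/uL), RPP30 = 2 copies."""
    return 2.0 * (target - nonintegrated) / rpp30

def classify_clone(gfp, backbone, rpp30, tol=0.1, backbone_max=0.2):
    """Assign a screening outcome to one clone from its ddPCR concentrations."""
    if copies_per_genome(backbone, rpp30) >= backbone_max:
        return "rejected_backbone_integration"
    gfp_copies = copies_per_genome(gfp, rpp30)
    if gfp_copies < 0.2:
        return "rejected_unedited"
    if abs(gfp_copies - 1.0) <= tol:
        return "candidate_monoallelic"
    if abs(gfp_copies - 2.0) <= tol:
        return "candidate_biallelic"
    if gfp_copies < 1.0:
        return "rejected_possible_mosaic"
    if gfp_copies < 2.0:
        return "rescreen_possible_biallelic"
    return "rejected_excess_copies"

# Hypothetical concentrations in copies/uL, for illustration only.
calls = [
    classify_clone(gfp=500, backbone=5, rpp30=1000),
    classify_clone(gfp=1000, backbone=5, rpp30=1000),
    classify_clone(gfp=300, backbone=5, rpp30=1000),
    classify_clone(gfp=750, backbone=5, rpp30=1000),
    classify_clone(gfp=500, backbone=200, rpp30=1000),
]
```

Checking the backbone signal first reflects the text's order of concern: a clone with stable backbone integration is rejected regardless of its GFP copy number.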
  • the screening strategy also identified several faulty outcomes in the editing and selection process, including unedited clones co-purified during flow cytometry selection and clones harboring plasmid backbone in the targeted locus. These results demonstrate that adding the ddPCR assay to the genetic screening process enabled selection of successfully edited clones and eliminated unsuccessful or off-target edits from downstream analyses.
  • Primer sequences used in each PCR reaction are shown in FIG. 23. All primers are listed in 5’ to 3’ orientation. PCR was used to amplify the tagged allele in two tiled reactions spanning the left and right homology arms, the mEGFP and linker sequence, and portions of the distal genomic region 5’ of the left homology arm and 3’ of the right homology arm using PrimeStar® (Clontech) PCR reagents and gene-specific primers. Both tiled junctional PCR products were Sanger sequenced bidirectionally with PCR primers when their size was validated as correct by gel electrophoresis and/or Fragment Analyzer (FIG. 5E).
  • FIG. 5D shows the percentage of clones in each experiment with
  • Such cultures typically displayed colonies that were loosely packed with irregular edges and larger, more elongated cells compared to undifferentiated cells, as observed with one PXN clone (a confirmed biallelic edit) (FIG. 10A right-most image).
  • Expression of established pluripotency stem cell markers was also determined, including the transcription factors Oct3/4, Sox2 and Nanog, and cell surface markers SSEA-3 and TRA-1-60 (FIG. 10B, FIG. 10F). High levels of penetrance in the expression of each marker (>86% of cells) were observed in all final clonal lines from the 10 different genome edits, similar to that of the unedited cells (FIG. 10B, FIG. 10F).
  • Candidate clones retain expression of pluripotency markers
  • Assays were performed to ensure that the clones identified to have precise edits retained stem cell properties during the process of gene editing and expansion.
  • the expression of established stem cell markers including the transcription factors Oct3/4, Sox2 and Nanog, cell surface pluripotency markers TRA-1-60 and TRA-1-81, and the pro-differentiation marker SSEA3 was measured by flow cytometry (FIG. 5A). Briefly, cells were dissociated with Accutase as previously described, fixed with CytoFix Fixation Buffer™ (BD Bioscience), and frozen in KnockOut™ Serum Replacement (Gibco) with 10% DMSO.
  • Cells were washed with 2% BSA in DPBS and half of the cells were stained with anti-TRA-1-60 Brilliant Violet™ 510, anti-SSEA-3 AlexaFluor® 647, and anti-SSEA-1 Brilliant Violet™ 421 (all BD Bioscience). The other half of the cells were permeabilized with 0.5% Triton X-100 and 2% BSA in DPBS and stained with anti-Nanog AlexaFluor® 647, anti-Sox2 V450, and anti-Oct-3/4 Brilliant Violet™ 510 (all BD Bioscience).
  • each nuclear marker was expressed well above the commonly used thresholds of >85%+ for stem cell markers and <15%+ for differentiation markers and comparable to the parental WTC line (FIG. 5A and 5B).
  • all clones displayed negligible changes in the mean expression intensity of each nuclear marker.
  • Cell surface pluripotency markers displayed similarly robust expression when analyzed in this manner, albeit with greater variability (FIG. 5A and FIG. 5C). This analysis was conducted for a total of approximately 50 clones and only 10% were rejected due to changes in the expression profile of these markers. Although comparable, there was sufficient variability within each set of candidate clones that the clones could be ranked relative to each other to determine those that were most similar to the WTC parent line.
  • Gene edited candidate clones are capable of cardiomyocyte differentiation
  • Cells were harvested using 0.5% Trypsin-EDTA (Gibco), filtered with a 40 µm cell strainer, fixed with CytoFix Fixation Buffer™, permeabilized with BD Perm/Wash™ buffer, stained with anti-Cardiac Troponin T AlexaFluor® 647 (BD Bioscience) or isotype control, acquired on a FACS Aria Fusion and analyzed using FlowJo software V.10.2.
  • Edited clones are karyotypically stable
  • stem cells and stably tagged stem cell clones and differentiated cells therefrom of the invention can be used for three-dimensional live cell imaging of intracellular proteins.
  • the methods allow for use of the cells for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement or cellular stress in response to a test agent.
  • the resulting endogenously tagged lines allowed for the observation of tagged proteins and corresponding organelles with exceptional clarity due to their endogenous regulation and absence of fixation and staining artifacts. Without exception, distinct localization patterns of the tagged protein were observed when compared to cells transiently transfected with constructs expressing GFP fusion proteins.
  • paxillin was observed in the matrix adhesions formed between substrate contact points and the basal surface of cells, as well as at the dynamic edges of colonies (FIG. 8C).
  • Beta actin localized to the basal surface of colonies both in prominent filaments (stress fibers) and at the periphery of cell protrusions (lamellipodia), as well as in an apical actin band at cell-cell contacts, a feature common in epithelial cells (FIG. 8D).
  • Non-muscle myosin heavy chain IIB had similar localization in actomyosin bundles, including at basal stress fibers and in an apical band (FIG. 8D, 8E).
  • Intensity level was used as a proxy to distinguish between low- and high-level transgene overexpression, though low-level expressing cells were often rare.
  • transfected cells with low EGFP-tubulin transgene expression were comparable to the gene edited alpha tubulin cells (TUBA1B-mEGFP), although the transfected cells contained higher cytosolic signal.
  • Transfected cells with low desmoplakin-EGFP transgene expression revealed a similar pattern to that observed in the DSP-mEGFP gene-edited line, but the transfected cell population also contained other cells, likely expressing the transgene to a greater extent, with high cytosolic signal and increased number and size of desmosome-like puncta.
  • the pipeline was prototyped using a small suite of well-characterized compounds that include brefeldin A, paclitaxel, rapamycin, wortmannin and staurosporine (FIG. 26A).
  • Low-resolution imaging 24x magnification
  • hiPSC colonies were monitored for morphologic changes using transmitted light (FIG. 26B) and an endogenously GFP-tagged structure, such as microtubules (FIG. 26C).
  • FIG. 27 shows representative image planes from z-stacks collected at 120x of the GFP-tagged cell lines with nucleus and cell membrane markers. Cells were treated with the indicated perturbation agent at a pre-selected concentration and time point established in phase I.
  • microtubule stabilizing agent paclitaxel increased microtubule bundle thickness and altered the shape and position of the mitotic spindle during hiPS cell division.
  • paclitaxel also induced aberrant reorganization of the ER in cells undergoing mitosis, while showing minimal effects on the bulk organization of the actin bundles and cell junctions.
  • Other drugs such as staurosporine, a broad kinase inhibitor, had major effects on colony and cell morphology, inducing rearrangements in cell packing and shape. It also induced re-localization of desmosomes, indicating that the cell-cell junctions undergo substantial rearrangement.
  • For drug-induced effects on cell junction reorganization, representative maximum intensity projections of a z-stack along the x-z axis are shown in FIG. 30. From these projections, the mean pixel intensity for the GFP channel along the x-axis, from the top of the image to the bottom, was measured to generate an intensity profile plot. These plots show the redistribution of ZO-1 along the z-axis in the presence of both staurosporine and (S)-nitro-blebbistatin. In the presence of staurosporine, desmosomes relocalized throughout the cell, and the number of DSP-positive plaques increased (FIG. 31).
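The intensity profile described above can be illustrated with a short sketch. It assumes the z-stack is held as nested lists indexed [z][y][x] and that the x-z projection is obtained by taking the maximum along y; the function names and the toy data are hypothetical, and the actual analysis software is not named in the text.

```python
def max_projection_xz(stack):
    """Collapse a z-stack indexed [z][y][x] along y, giving an x-z image [z][x]."""
    # zip(*plane) iterates over columns, i.e. all y values at one x position.
    return [[max(column) for column in zip(*plane)] for plane in stack]


def intensity_profile(xz_image):
    """Mean intensity across x for each z row, ordered from top to bottom."""
    return [sum(row) / len(row) for row in xz_image]


if __name__ == "__main__":
    # Toy stack: 2 z-planes, each 2 (y) by 2 (x) pixels.
    stack = [
        [[0, 2], [4, 0]],
        [[10, 10], [0, 20]],
    ]
    xz = max_projection_xz(stack)
    print(xz)                     # [[4, 2], [10, 20]]
    print(intensity_profile(xz))  # [3.0, 15.0]
```

Comparing such profiles between treated and control wells shows where along z the tagged protein has moved, which is how the plots are used in the text.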
  • the resulting imaging data from each compound per stably tagged stem cell clone or differentiated cell derived therefrom can be compared to the negative controls (untreated and vehicle controls) to determine effect on various criteria including cell and subcellular morphology, localization of tagged structure, and dynamics.
  • the effect of that compound on multiple structures can be assessed within the cell.
  • the intended effect of each compound with the relevant gene edited cell line can be confirmed as described in the assays above.
  • the effect of that compound on all other structures can be assessed using the suite of gene edited iPSC lines to create a unique “fingerprint” or signature for that compound in relation to multiple structures.
  • the data generated with this established set of compounds can be used as an initial training set for assays with compounds with unknown function.
  • These profiles can serve as a reference database that can be used for screening novel and previously uncharacterized compound libraries to identify targets, help guide mechanistic studies, and determine specificity.
  • the combination of using human, diploid, non-transformed cells with live imaging using these gene edited iPSCs can provide a much better platform for performing toxicology screening.
  • these predictive models based on the stem cells and stably tagged stem cell clones and differentiated cells therefrom of the present invention can be used for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement or cellular stress in response to a test agent.
  • Described herein is a multi-step CRISPR/Cas9 gene editing method to create endogenously tagged mEGFP-fusions for transcriptionally silent genes in hiPSCs, allowing visualization of proteins that are only expressed upon differentiation.
  • a donor template was designed containing the fusion tag (mEGFP) and an mCherry selection cassette delivered in tandem to a target locus via HDR (homology directed repair). The mCherry expression was driven by a constitutive promoter and served as a drug-free, excisable selection marker. Following this selection, the mCherry cassette is excised with Cas9, creating a mEGFP-fusion with the target gene.
  • Sequence elements to guide MMEJ were included for scarless excision with linker sequences between the mEGFP tag and the target gene.
  • mEGFP-tagged genes encoding the cardiomyocyte sarcomeric proteins troponin I (TNNI1), alpha-actinin (ACTN2), titin (TTN), myosin light chain 2a (MYL7), and myosin light chain 2v (MYL2) in undifferentiated hiPSCs have been successfully produced.
  • This methodology provides a general strategy for introducing various tags to silent genomic loci in a scar-less manner in hiPSCs.
  • Genome editing has revolutionized cell biology with the ability to precisely edit and engineer genes of interest at their endogenous loci (Doyon, Zeitler et al. 2011, Dambournet D, Hong et al. 2014, Grassart, Cheng et al. 2014, Mahen, Koch et al. 2014, Otsuka, Bui et al. 2016, Roberts, Haupt et al. 2017).
  • Editing human induced pluripotent stem cells (hiPSCs) is particularly powerful for interrogating cellular dynamics in a diploid, non-transformed and relatively stable genomic setting.
  • the ability to differentiate gene edited hiPSCs into multiple lineages makes them an ideal model system for disease modeling and regenerative medicine (Drubin and Hyman 2017).
  • Methods for endogenous fluorescence tagging of select proteins in hiPSCs include the precise addition of a fluorescent tag sequence to the host cell genome, which can be accomplished via HDR (Roberts, Haupt et al. 2017; Dambournet et al. 2014; Koch et al. 2018). Since HDR is an inefficient step in this process, a selection strategy must be used to enrich the rare population of edited cells. This is often accomplished by drug selection or by flow cytometry-based sorting, which relies on successful HDR as well as the expression of the tagged fusion protein. Therefore, this approach cannot be used to enrich for edited cells where the target gene is silent in hiPSCs but expressed upon differentiation to other cell types (i.e., differentially-expressed).
  • The exceptional proliferative capacity of hiPSCs makes them a simple and scalable editing platform with broad downstream applications. Unlike terminally differentiated cells, an edited hiPSC clonal line can be subjected to extensive quality control, expanded as a shared resource, and differentiated into multiple lineages (Roberts et al.,).
  • One strategy for selecting cells edited at a silent locus utilizes HDR-mediated delivery of a selection marker (a drug resistance and/or fluorescent protein) under the control of a constitutive promoter. After selection of the edited cells, this sequence is then removed by recombination, most commonly using the Cre/Lox system. Despite its use, this Cre/Lox recombination event results in a 34 base pair residual loxP “scar,” which can disrupt endogenous sequences important for proper regulation of the targeted gene (Skarnes, Rosen et al. 2011; Yao, Mich et al. 2017; Judge, Perez-Bermejo et al. 2017).
  • a multi-step editing strategy using CRISPR/Cas9 to add an endogenous mEGFP tag to transcriptionally inactive genes in hiPSCs with drug-free selection and a “scarless” fusion product (FIG. 33A - FIG. 33H).
  • a mEGFP tag is delivered via HDR in tandem with an excisable cassette expressing a second fluorescent protein (mCherry) under the control of a constitutive promoter to enable enrichment of edited cells.
  • the selection cassette is excised with Cas9. Also included are repeat-rich sequences in the donor template that guide excision via the MMEJ pathway. As a result, the excision site is deleted and a customizable in-frame linker is introduced between the endogenous coding sequence and the mEGFP tag.
  • a multi-step gene editing strategy is provided to endogenously tag key cardiac sarcomeric proteins with mEGFP (FIG. 33A - FIG. 33H).
  • Gene edited hiPSC lines were prepared for five genes expressed specifically in cardiomyocytes: TNNI1, encoding the myofibril contractile regulator slow skeletal Troponin I; ACTN2, encoding the cardiomyocyte-specific actin regulator alpha actinin 2; TTN, encoding the sarcomere-spanning structural protein titin; and MYL7 and MYL2, which respectively encode myosin motor proteins expressed earlier and in atrial subtypes (MLC2a) and later and in ventricular subtypes (MLC2v) during cardiomyocyte differentiation.
  • The episomally derived and previously characterized WTC line was selected as the parental line (Kreitzer, Salomonis et al. 2013).
  • Population RNA-seq of WTC hiPSCs and WTC-derived cardiomyocytes confirmed that these five genes are transcriptionally silent in pluripotent cells and activated during cardiomyocyte differentiation (Data available at allencell.org).
  • These five sarcomeric proteins provided a range of expression levels in hiPSC-derived cardiomyocytes, known localization patterns in the sarcomere, and unique developmental expression kinetics for testing the effectiveness of the editing approach.
  • a donor template plasmid is provided with several key features to enable the multi-step editing strategy (FIG. 33A).
  • the first feature was a fluorescence (mCherry) selection cassette driven by a constitutive promoter. This selection cassette was adjacent to a second downstream fluorescent tag (mEGFP), intended to ultimately be fused to the C-terminus of the gene of interest.
  • Successful donor sequence incorporation via HDR in hiPSCs transfected with Cas9 and target-specific crRNAs resulted in mCherry expression, which served as a surrogate for editing success at these transcriptionally silent loci and enrichment of putatively edited cells.
  • Inverted Tia1L protospacer sites are included flanking the mCherry donor selection cassette to enable excision (FIG. 33A). These protospacers were included to enable Cas9/CRISPR-mediated excision of the selection cassette after mCherry-expressing cells were initially enriched (FIG. 33A - FIG. 33F).
  • the Tia1L target sequence is absent from the human genome and has been used to ligate distinct double strand breaks induced by Cas9 (Lackner, Carre et al. 2015). Sites were designed in the “PAM-out” orientation such that NHEJ-mediated double strand repair following Cas9 activity would result in an in-frame mEGFP fusion with the target gene.
  • the peptide linker sequences incorporated within the Tia1L sites were designed and oriented such that NHEJ-based repair after excision would result in an in-frame coding sequence with 12 bp of residual sequence (encoding Ser-Gly-Pro-Gly) that served as a canonical linker between the mEGFP and the target gene (FIG. 33A).
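The in-frame requirement for the 12 bp residual sequence can be checked with a short sketch. The nucleotide sequence used here (AGCGGCCCCGGC) is hypothetical: the text gives only the length (12 bp) and the encoded peptide (Ser-Gly-Pro-Gly), so any codon choice that yields that peptide would serve equally well. The function name and the deliberately partial codon table are also assumptions made for illustration.

```python
# Partial codon table: only a few Ser, Gly and Pro codons are listed,
# which is enough for this illustration (not a full genetic code).
CODONS = {
    "AGC": "Ser", "TCT": "Ser",
    "GGC": "Gly", "GGT": "Gly",
    "CCC": "Pro", "CCT": "Pro",
}


def translate_linker(dna):
    """Translate a residual linker sequence, refusing out-of-frame lengths."""
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of 3 bp; the fusion would be out of frame")
    # Codons missing from the partial table raise KeyError.
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


if __name__ == "__main__":
    residual = "AGCGGCCCCGGC"  # HYPOTHETICAL 12 bp sequence
    print(translate_linker(residual))  # ['Ser', 'Gly', 'Pro', 'Gly']
```

A residual sequence whose length is not a multiple of three would shift the reading frame of the downstream mEGFP, which is why the length check comes before translation.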
  • An additional feature of the donor template is the inclusion of microhomology-containing sequences composed of hexa- and tri-nucleotide repeats to encode common peptide linkers in the mEGFP-fusion.
  • Microhomology-mediated end joining (MMEJ) events utilizing these repeat sequences bias excision repair outcomes, efficiently delete the residual sequence remaining from Cas9 cleavage, and lead to a more favorable, predictable, and designable linker sequence.
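A toy string model can show why flanking repeats make the repair outcome predictable. It assumes that, after excision, each broken end carries one copy of the same repeat plus a few bases of residual sequence, and that MMEJ anneals the two repeat copies so that the residual bases and one repeat copy are lost. The sequences, the repeat, and the function name are hypothetical, and real repair outcomes are a distribution rather than a single product.

```python
def mmej_join(left_end, right_end, repeat):
    """Join two broken ends at a shared repeat, keeping a single repeat copy.

    Everything between the two repeat copies (the residual sequence left by
    cleavage) is deleted in this simplified model.
    """
    i = left_end.rfind(repeat)   # repeat copy closest to the break on the left
    j = right_end.find(repeat)   # repeat copy closest to the break on the right
    if i < 0 or j < 0:
        raise ValueError("the repeat must be present on both sides of the break")
    return left_end[:i] + right_end[j:]


if __name__ == "__main__":
    repeat = "GGATCC"                 # HYPOTHETICAL hexanucleotide repeat
    left = "AAACCC" + repeat + "TT"   # upstream sequence, repeat, residual bases
    right = "CA" + repeat + "ATGGTG"  # residual bases, repeat, downstream sequence
    print(mmej_join(left, right, repeat))  # AAACCCGGATCCATGGTG
```

Because the product depends only on the repeat and the sequences outside it, the linker left behind can be designed in advance by choosing the repeat, which is the property the text attributes to the microhomology-containing sequences.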
  • Step 1- HDR-mediated delivery of the mCherry selection cassette and mEGFP to target loci
  • Donor plasmids were introduced into WTC hiPSCs using a described RNP-mediated electroporation protocol (Roberts, Haupt et al. 2017) and the rate of HDR was evaluated as indicated by the fraction of mCherry-expressing cells with flow cytometry.
  • the significant increase in mCherry-expressing cells with gene-specific crRNAs compared to mock control transfections with the plasmid indicated HDR-mediated incorporation of this large donor sequence at all five loci (FIG. 34A - FIG. 34C).
  • Step 2 Excision of the mCherry selection cassette with CRISPR/Cas9 and NHEJ/MMEJ repair
  • the estimated frequency of alleles in each Tia1L excised population that were candidates for tagging was also calculated by dividing this relative difference of mCherry-negative cell abundance by the absolute number of mCherry-negative cells in the Tia1L excised population (FIG. 35C).
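The calculation above can be written out as a one-line formula. The "relative difference" is defined outside this excerpt, so the sketch assumes it is the mCherry-negative abundance in the excised population minus the mCherry-negative abundance in a matched control population. That definition, the function name, and the example numbers are all assumptions and not values from the experiments.

```python
def candidate_fraction(neg_excised, neg_control):
    """Estimate the share of mCherry-negative cells attributable to excision.

    neg_excised -- mCherry-negative abundance in the excised population
    neg_control -- mCherry-negative abundance in the control population
                   (ASSUMED baseline for the 'relative difference')
    Both values must use the same units (for example percent of cells).
    """
    if neg_excised <= 0:
        raise ValueError("the excised population contains no mCherry-negative cells")
    return (neg_excised - neg_control) / neg_excised


if __name__ == "__main__":
    # Hypothetical numbers: 10% negative after excision, 2% negative in control.
    print(candidate_fraction(10.0, 2.0))
```

Under this reading, a value near 1 means almost all mCherry-negative cells arose from excision, while a value near 0 means they are mostly background cells that were already negative.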
  • Step 3 Initial confirmation of mEGFP-tagging within mosaic cardiomyocyte cultures
  • the mCherry-negative populations were sorted, expanded, and evaluated for expression of the mEGFP-fusion upon differentiation into cardiomyocytes (FIG. 35D). All five gene edited populations resulted in robust cardiomyocyte differentiation with high levels (>86%) of expression of the cardiac marker cardiac troponin T (cTnT). A subset of mEGFP-expressing cells (1-15%) within the cardiomyocyte cell populations (cTnT+) was identified in all 5 targeting experiments (FIG. 35D). In contrast, no mEGFP expression was observed in non-cardiomyocyte cells (cTnT-), strongly suggesting that mEGFP expression was specific to cardiomyocytes. The expression and sarcomeric localization of the mEGFP-fusion protein in a subset of cardiomyocytes was also confirmed by microscopy (data not shown).
  • Step 4 Genetic screening for clones with precisely edited mEGFP-tagged alleles
  • 150-200 colonies were isolated from each putatively excised population and screened for precise editing similar to previously described methods (Roberts and Haupt 2017).
  • a sequential genomic screen was performed consisting of a primary multiplexed ddPCR assay to measure the genomic copy number of several key sequences from each clone (mEGFP/Amp/mCherry) followed by junctional PCR and Sanger sequencing as described below.
  • junctional PCR was performed on the mEGFP+/mCherry-/AmpR- clones from the ddPCR assay to confirm precise editing at the appropriate genomic location.
  • PCR primers were designed to amplify the sequence junctions between the mEGFP tag and the genomic sequences 5’ and 3’ of the homology arms.
  • All mEGFP+/mCherry-/AmpR- clones identified from the ddPCR screen underwent editing at the appropriate genomic locus, as judged by the successful amplification of overlapping PCR products on both sides of the tag sequence insertion (data not shown). The extent to which these PCR products matched the anticipated product size was specific to each experiment (FIG. 36C).
  • Step 5 Confirmation and validation of mEGFP-tagging in hiPSC-derived cardiomyocytes
  • cardiomyocytes were also expressing mEGFP (>93% mEGFP+/cTnT+) in all tested clones, suggesting that mEGFP-tagged alleles are expressed during cardiomyocyte differentiation (FIG. 37A - FIG. 37B, FIG. 38). Consistent with varying transcript abundance with bulk RNA-sequencing, the intensity of mEGFP expression varied among the five genes with the lowest levels of expression observed with TTN-mEGFP (FIG. 37A). Abundant and timely expression of ACTN2-mEGFP was observed in multiple clones despite imprecise editing (duplication of donor plasmid elements) at the 3’UTR (FIG. 37A - FIG. 37C).
  • Step 6 Imaging of clonal mEGFP-tagged hiPSC-derived cardiomyocytes
  • In addition to confirming expression of mEGFP by flow cytometry, cardiomyocytes generated from all five gene edited clonal lines were re-plated on glass-bottom plates with PEI/laminin to perform live cell imaging and immunocytochemistry (FIG. 39A - FIG. 39C). Live imaging revealed sarcomeric localization of the mEGFP-tagged fusion proteins with canonical striations localizing to the sarcomere as expected (FIG. 39A). In addition to confirming sarcomeric localization of mEGFP for each tagged protein, expected differences in protein localization for distinct structures were also observed.
  • MYL7 and TNNI1 are expected to localize between z-bands in the myofibril; a thick banding pattern of MYL7-GFP and TNNI1-GFP was observed, with dark lines marking the z- and m-bands.
  • alpha-actinin localizes to the z-line of the sarcomere and titin to the m-line of the myofibril.
  • These proteins both show a thinner banding pattern within the sarcomere, reflecting different localization and function in the sarcomere.
  • Antibodies specific to the targeted proteins co-localized with mEGFP, confirming appropriate localization of the tagged protein in cardiomyocytes (FIG. 39B).
  • each edited hiPSC clonal line was subjected to the same screening and quality control process described previously to ensure genomic (karyotype), stem cell (pluripotency), and cell biological (morphology, growth rate) integrity (Roberts and Haupt 2017). The majority of these clones passed these quality control standards and a subset of these clones were expanded and banked (FIG. 38).
  • Discussion
  • Endogenous fluorescent tagging in hiPSCs has enabled live imaging to study the organization and dynamics of key functional proteins and structures in stem cells and their derivatives.
  • the advent of efficient and accessible gene editing tools like CRISPR/Cas9 has only recently merged endogenous tagging approaches with the differentiation potential of hiPSCs, as demonstrated in a recent study evaluating adhesion in sarcomere assembly using paxillin-mEGFP tagged cardiomyocytes (Chopra, Kutys et al. 2018).
  • Described herein is a unique multi-step CRISPR/Cas9-mediated editing strategy for tagging non-expressed genes in hiPSCs and methodology to mEGFP-tag several genes expressed specifically during cardiomyocyte differentiation.
  • a donor plasmid design enabled detection of HDR at targeted non-expressed loci, enrichment for putatively tagged cells with a constitutively expressed selection cassette, and generation of mEGFP-fusion alleles lacking genomic scars using a Cas9/MMEJ-driven excision strategy.
  • This approach uniquely utilizes CRISPR/Cas9 for both the incorporation and subsequent excision of the selection cassette and provides the added benefits of selection without drugs.
  • HDR rates typically ranged from 0.1-5% for these transcriptionally active loci in hiPSCs. Similar rates were observed (0.4-2%) in the current study using a much larger donor plasmid, suggesting that targeting non-expressed rather than expressed loci with a selection cassette much larger than the tag alone permits similar, tenable HDR rates for silent editing. Testing multiple crRNAs in parallel for each target also ensured editing success. In both studies, >90% edited cells were recovered by flow sorting, highlighting the utility of a drug-free enrichment strategy, especially when HDR rates are low.
  • CRISPR/Cas9 was utilized a second time to excise the mCherry selection cassette from the enriched population of edited cells.
  • the relative efficiency of this step (4-12%) varied by gene and promoter, with the CAGGS promoter (TTN, MYL2 and MYL7) preferable to hPGK (ACTN2, TNNI1). In all cases, the rate of excision was sufficient for robust FACS enrichment and exceeded the rate of HDR observed in the initial delivery step.
  • MMEJ sequences to guide scarless excision with a tunable linker as well as introduction of a canonical peptide linker for when DNA repair is mediated by NHEJ in multiple clones.
  • These features provide flexibility for adding a specific linker based on known properties of the target protein and provide a strategy to precisely engineer various edited outcomes. This can include the addition of an epitope tag, a cleavable peptide, or no intervening sequence as demonstrated for correcting disease mutations in a study (Kim, Matsumoto et al. 2018). Results with these five target genes suggest that the use of a CAGGS-driven selection cassette along with specific microhomology sequences provides a means of introducing scarless edits at silent loci. This represents the first report utilizing MMEJ with CRISPR/Cas9-mediated endogenous tagging to generate scarless edits and tunable linkers at silent loci.
  • hiPSC human induced pluripotent stem cell
  • Donor plasmids were designed uniquely for each target locus. Homology arms 5' and 3' of the desired insertion site were each 1 kb in length and designed using the GRCh38 reference genome. WTC-specific variants (SNPs and INDELs) were identified from publicly available exome data (UCSC Genome Browser) and also internal exome data. In cases where the WTC-specific variant was heterozygous, the reference genome variant was used in the donor plasmid; when the WTC-specific variant was homozygous, the WTC-specific variant was used in the donor plasmid.
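The allele-selection rule for homology arms can be expressed as a small function. This is a sketch of the stated rule only: the function name and the representation of a genotype as a pair of allele strings are assumptions, and no variant-calling step is modeled.

```python
def donor_allele(reference_allele, wtc_genotype):
    """Choose the allele to place in the donor homology arm at one variant site.

    reference_allele -- allele in the reference genome at this position
    wtc_genotype     -- the two alleles observed in the WTC line, e.g. ("A", "G")
    """
    first, second = wtc_genotype
    if first == second:
        # Homozygous in WTC: use the WTC allele, so the arm matches both chromosomes.
        return first
    # Heterozygous in WTC: fall back to the reference allele.
    return reference_allele


if __name__ == "__main__":
    print(donor_allele("A", ("G", "G")))  # G (homozygous WTC-specific variant)
    print(donor_allele("A", ("A", "G")))  # A (heterozygous, reference used)
```

For a homozygous site the chosen allele matches both chromosomes of the parental line, so the homology arm is identical to either target allele at that position.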
  • Linkers for each protein were unique to each target and were included in the donor plasmid with microhomology-containing redundancies, such that after MMEJ, the restored sequence would function as a linker 5' of mEGFP in each C-terminal tagging experiment.
  • Linker sequence was designed to flank Tia1L crRNA binding sites, which in turn flanked the mCherry expression cassette sequence containing either the PGK promoter or the CAGGS promoter. To prevent crRNAs from targeting the donor plasmid sequence, mutations were introduced to disrupt Cas9 recognition or crRNA binding; when possible, these changes did not affect the amino acid sequence.
  • Negative control transfections were performed in all experiments with the crRNA targeting the AAVS1 locus in order to assess the relative rate of random donor cassette incorporation.
  • Cells were cultured for two passages across 7-9 days before analysis, in order to allow mCherry expression from the episomal donor plasmid to decline.
  • Cells were harvested for FACS using Accutase as previously described. The cell suspension (0.5-1.0 × 10⁶ cells/mL in mTeSR1 with ROCK inhibitor) was filtered through a 35 µm mesh filter into polystyrene round bottomed tubes. Cells were sorted using a FACSAriaIII Fusion (BD Biosciences) with a 130 µm nozzle and FACSDiva software (BD Biosciences).
  • FACS-enriched populations of edited cells were seeded at a density of 1 × 10⁴ cells in a 10 cm GFR Matrigel-coated tissue culture plate. After 5-7 days clones were manually picked with a pipette and transferred into individual wells of 96-well GFR Matrigel-coated tissue culture plates with mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor for 1 day. After 3-4 days of normal maintenance with mTeSR1 supplemented with 1% P/S, colonies were dispersed with Accutase and transferred into a fresh GFR Matrigel-coated 96-well plate. After recovery, the plate was divided into daughter plates for ongoing culture, freezing, and gDNA isolation.
  • the following primers were used for the detection of mEGFP (5'-GCCGACAAGCAGAAGAACG-3', 5'-GGGTGTTCTGCTGGTAGTGG-3') and hydrolysis probe (/56-FAM/AGATCCGCC/ZEN/ACAACATCGAGG/3IABkFQ/).
  • This assay was run in duplex with the genomic reference RPP30-HEX.
  • the PCR for detection of the AMP gene used the primers (5'-TTTCCGTGTCGCCCTTATTCC-3', 5'-ATGTAACCCACTCGTGCACCC-3') and hydrolysis probe (/5HEX/TGGGTGAGC/ZEN/AAAAACAGGAAGGC/3IABkFQ/).
  • the PCR for detection of the KAN gene used the primers (5'-AACAGGAATCGAATGCAACCG-3', 5'-TTACTCACCACTGCGATCCC-3') and hydrolysis probe
  • PCR reactions were prepared using the required 2x Supermix for probes with no dUTP (Bio-Rad) with a final concentration of 400 nM for primers and 200 nM for probes, together with 10 units of HindIII and 3 µL of sample (30-90 ng DNA) to a final volume of 25 µL.
  • Each reaction prior to cycling was loaded into a sample well of an 8-well disposable droplet generation cartridge followed by 70 µL of droplet generator oil into the oil well (Bio-Rad). Droplets were then generated using the QX200 droplet generator.
  • the resulting emulsions were then transferred to a 96-well plate, sealed with a pierceable foil seal (Bio-Rad), and run to completion on a Bio-Rad C1000 Touch thermocycler with a Deep Well cycling block.
  • the cycling conditions were: 98°C for 10 min, followed by 40 cycles (98°C for 30 s, 60°C for 20 s, 72°C for 15 s) with a final inactivation at 98 °C for 10 min.
  • droplets were analyzed on the QX200 and data analysis was performed using QuantaSoft software.
  • the AMP or KAN signal was determined to be from residual non-integrated/background plasmid when the ratio of AMP/RPP30 or KAN/RPP30 fell below 0.2 copies/genome, because this was the maximum value of non-integrated plasmid observed at the time point used for screening in control experiments (data not shown).
  • a dilution series was performed using a known plasmid containing both the mEGFP and AMP sequence. 78-5000 copies of plasmid were loaded per well and both mEGFP and AMP primers and probes were multiplexed together to ensure that the value returned corresponded to the copies of plasmid loaded.
  • the ratios of (copies/µL mEGFP)/(copies/µL RPP30) were plotted against (copies/µL AMP)/(copies/µL RPP30) to identify cohorts of clones for ongoing analysis.
  • Clones were evaluated with PCR using primers spanning the junction between the tag sequence and the endogenous genomic sequence distal to the homology arm to determine whether all mEGFP+/mCh-/AmpR- clones from each targeting experiment were targeted at the intended locus (data not shown).
  • PCR was used to amplify the tagged allele in two tiled reactions spanning the left and right homology arms, the mEGFP and linker sequence, and portions of the distal genomic region 5' of the left homology arm and 3' of the right homology arm (FIG. 34A - FIG. 34C) using gene-specific primers.
  • Cycling conditions were as follows (98°C 10 s, 70°C 5 s, 72°C 60 s) x 6 cycles at -2°C/cycle annealing temperature, (98°C 10 s, 54°C 5 s, 72°C 60 s) x 32 cycles, 12°C hold.
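The touchdown program above can be expanded into a per-cycle list of annealing temperatures. The sketch assumes that the first of the six touchdown cycles anneals at 70°C and that each subsequent cycle drops by 2°C; the function name and parameters are hypothetical, and ramp times and the final hold are not modeled.

```python
def touchdown_schedule(start=70, step=-2, touchdown_cycles=6,
                       plateau=54, plateau_cycles=32):
    """Return the annealing temperature (in degrees C) for every PCR cycle."""
    # Touchdown phase: the annealing temperature changes by `step` each cycle.
    touchdown = [start + step * cycle for cycle in range(touchdown_cycles)]
    # Plateau phase: fixed annealing temperature for the remaining cycles.
    return touchdown + [plateau] * plateau_cycles


if __name__ == "__main__":
    schedule = touchdown_schedule()
    print(schedule[:6])   # [70, 68, 66, 64, 62, 60]
    print(len(schedule))  # 38 cycles in total
```

Under these assumptions the touchdown phase ends at 60°C before the program moves to the 54°C annealing step used for the remaining 32 cycles.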
  • PCR was also used to amplify the untagged allele using gene-specific primers. These primers did not selectively amplify the unmodified locus, as was the case for tiled junctional PCR amplification of the tagged allele, but rather amplified both untagged and tagged alleles. PCR was performed with the same Primestar® reagents and cycling conditions as described above. Tracking of insertions and deletions (INDELs) by decomposition (TIDE) analysis was performed manually on the amplification reaction after bidirectional Sanger sequencing in order to determine the sequence of the untagged allele. For all final clones with wild type untagged alleles, the PCR product corresponding to the untagged allele was gel isolated and sequenced to confirm the initial result from TIDE analysis.
  • Cell Plating for Imaging
  • Cells were plated on glass bottom multi-well plates (1.5H glass, Cellvis) coated with phenol red-free GFR Matrigel (Corning) diluted 1:30 in phenol red-free DMEM/F12 (Gibco). Cells were seeded at a density of 2.5x10^3 in 96-well plates and 12.5-18x10^3 on 24-well plates and fixed or imaged 3-4 days later.
  • Cardiomyocyte differentiation was achieved using a small molecule differentiation protocol similar to previously reported methods, with optimizations to small molecule concentration and timing (Lian et al. 2013). Briefly, cells were seeded onto GFR Matrigel-coated 6-well tissue culture plates at a density ranging from 0.15-0.25x10^6 cells per well in mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor, designated as day -3. Cells were grown for three days, with daily mTeSR1 media changes (day -2 and day -1).
  • Cells were resuspended in the same media and a 10 µL aliquot was used to count cardiomyocytes in a hemocytometer (INCYTO C-Chip™). Cells were seeded onto PEI/Laminin coated 24-well glass bottom plates at a density ranging from 0.35-0.5x10^5 cells per well in RPMI media containing B27 with insulin and 10 µM ROCK inhibitor. 24 hours after plating, media was changed to RPMI media containing B27 with insulin. Imaging was performed 5-21 d after plating.
  • Live cell imaging was performed on a Zeiss spinning-disk microscope with a Zeiss 20x/0.8 NA Plan-Apochromat, or 40x/1.2 NA W C-Apochromat Korr UV Vis IR objective, a CSU-X1 Yokogawa spinning-disk head, and Hamamatsu Orca Flash 4.0 camera.
  • Fixed cell imaging was done on a 3i spinning-disk microscope with a Zeiss 20x/0.8 NA Plan-Apochromat, or 63x/1.2 NA W C-Apochromat Korr UV Vis IR objective, a CSU-W1 Yokogawa spinning-disk head, and Hamamatsu Orca Flash 4.0 camera.
  • Microscopes were outfitted with a humidified environmental chamber to maintain cells at 37°C with 5% CO2 during imaging.
  • Two clonal populations (one at passage 8 and one at passage 14) were sequenced from the WTC unedited parental line. After dissociation of cell cultures with Accutase, 2-3x10^6 cells were pelleted, washed once with DPBS, resuspended in 350 µL of Qiagen RLT plus lysis buffer, then flash frozen in liquid nitrogen before storage at -80°C. 101 bp paired end libraries were prepared using an Illumina TruSeq Stranded mRNA Library Prep kit. Libraries were sequenced on an Illumina HiSeq 2500 at a depth of 30 million read pairs (Covance). Adapters were trimmed using Cutadapt (Martin 2017).
  • The CellTiter-Glo reagent was added to the live cells at a 1:4 dilution at each of the time points and luminescence counts were read with a Perkin-Elmer Enspire plate reader.
  • The standard curve plate and 0 h plates were read within two hours of plating. Cell numbers were extrapolated from the linear portion of the standard curve for each experiment and the following equation was used to calculate cell doubling time, where Tf is the final time in hours, Xf is the final cell count, and Xi is the initial cell count: doubling time = Tf x ln(2) / ln(Xf/Xi).
  • Reported doubling time was calculated using counts at time of seeding (0 h) and at 96 hours after seeding. Two independent experiments were performed for each edited cell line. Triplicate counts from each independent experiment were averaged (leaving two data points per edited cell line and three for unedited WTC) and a one-way ANOVA was performed to test if doubling times between cell lines were significantly different.
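The doubling-time calculation described above can be sketched as follows. This is a minimal example, assuming the standard exponential-growth relation, doubling time = Tf x ln(2) / ln(Xf/Xi), with counts taken at 0 h and 96 h. The function names and the example counts are hypothetical, and the one-way ANOVA step is not shown.

```python
import math
from statistics import mean


def doubling_time_hours(t_final_h, x_initial, x_final):
    """Doubling time in hours, assuming exponential growth over t_final_h."""
    if x_initial <= 0 or x_final <= x_initial:
        raise ValueError("requires 0 < x_initial < x_final")
    return t_final_h * math.log(2) / math.log(x_final / x_initial)


def experiment_doubling_time(t_final_h, initial_counts, final_counts):
    """Average replicate counts first, then compute one doubling time
    per independent experiment."""
    return doubling_time_hours(t_final_h, mean(initial_counts), mean(final_counts))


# Hypothetical triplicate counts: a 4-fold increase over 96 h.
dt = experiment_doubling_time(96, [1000, 1100, 900], [4000, 4400, 3600])
print(round(dt, 1))
```

Averaging the triplicate counts before computing the doubling time yields one data point per independent experiment, which matches the way replicates are summarized in the text.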
  • FIG. 33A - FIG. 33H Schematic describing multi-step CRISPR/Cas9 mediated targeting via HDR and subsequent microhomology guided excision of the constitutively expressed selection cassette.
  • The constitutive expression cassette sequences (PGK/CAGGS promoter driving mCherry expression) flanked by TialL crRNA binding sites utilized to release the cassette are not shown.
  • Oppositely oriented PAM sequences and PAM-3 trinucleotide sequences (turquoise font) anticipated from direct NHEJ without MMEJ are shown.
  • The linker sequence for ACTN2 targeting differed significantly from the linker sequence used for targeting of TTN, TNNI1, MYL2 and MYL7. In-frame translations of each linker region are indicated. Residues encoded by the endogenous open reading frame specific to each locus are shown in orange. Amino acid residues designed to comprise peptide linkers between the endogenous reading frame and mEGFP tag, after successful excision, are shown in blue. Invariant amino acid linker residues (P-G-S-G) resulting from translation, after successful excision, of the PAM and PAM-3 sequences from each oppositely oriented TialL crRNA site, are shown in pink. The initial two residues at the N-terminus of mEGFP are shown in green. Nucleotides involved in guiding microhomology-mediated in-frame deletions of the invariant residues are displayed in red font.
  • FIG. 34A - FIG. 34C Fluorescence assisted cell sorting (FACS) experiments to isolate mCherry-expressing cells and establish the efficacy of multi-step editing at transcriptionally silent loci.
  • FIG. 34A Percentages of mCherry-expressing cells isolated after transfection with donor plasmids in conjunction with Cas9/crRNA complexes targeting the intended locus (top and middle row of panels), alongside mock transfections (bottom row). The percentage of mCherry-expressing cells in each transfection is displayed, as indicated. Boxes indicate the thresholds applied to determine whether individual cells were mCherry-expressing within each analysis. The identity of the targeting crRNA is indicated in blue font within each plot.
  • mCherry fluorescence is indicated on the y-axis and forward scatter (FSC) is indicated on the x-axis.
  • FIG. 34B All data from (34A) is displayed in graphical format. Standard deviations are indicated where multiple replicate transfections were performed.
  • FIG. 34C Live imaging and FACS were performed one expansion passage after FACS enrichment in order to validate high fluorescence sorting purity. As an example of these analyses, mCherry fluorescence was imaged in the enriched TNNI1 Cr1 experiment along with Hoechst nuclear dye (indicated in each panel in the merged image). Cells isolated from the TTN Cr1 enrichment experiment were analyzed by FACS and found to be 98.8% pure.
  • FIG. 35A - FIG. 35D FACS-sorting of mCherry-negative cells to measure excision and obtain putatively mEGFP-tagged cells.
  • FIG. 35A mCherry-expressing cells isolated from targeting experiments were transfected after recovery with either TialL or mock RNP. mCherry-negative cells were then collected as putatively excised cells from the TialL transfected condition. The percentages of mCherry-negative cells, according to the displayed gates, were measured and are indicated for both the TialL and mock transfected conditions. mCherry fluorescence is indicated on the y-axis and forward scatter (FSC) is indicated on the x-axis.
  • FIG. 35B mCherry-negative cells isolated after TialL-mediated excision were differentiated into cardiomyocytes and analyzed by flow cytometry for mEGFP expression and expression of the cardiomyocyte marker cTnT.
  • mEGFP expression was observed within the population of cells positive for the cardiomyocyte marker cardiac troponin T (cTnT).
  • The percentage of cTnT+/mEGFP+ (upper right sector within each plot) cells is indicated, and was interpreted as a proxy for estimated tagging efficiency.
  • FIG. 35C Percentages of mCherry-negative cells (as determined by FACS threshold shown in FIG. 35A) from both mock and TialL excision conditions are shown in graphical format. Standard deviation describes variance between replicate conditions.
  • FIG. 35D The percentage of on-target excised cells (blue arrows in FIG. 35C), calculated by subtracting the mean percentage of mCherry-negative cells in the mock excised condition from the mean percentage of mCherry-negative cells in the TialL excised condition, is shown on the left y-axis. This value was used as an estimate for absolute rate of excision.
  • The estimated frequency (right y-axis) of putatively excised alleles in the TialL population was determined by subtracting the percentage of mCherry-negative cells in the mock-excised population from the percentage of mCherry-negative cells in the TialL excised population and dividing this value by the total percentage of mCherry-negative cells in the TialL excised population.
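The two excision estimates described for FIG. 35D can be sketched as a short calculation. This is an illustrative example only: the function name and the input percentages are hypothetical, and the mock percentage is subtracted from the TialL percentage, as in the frequency estimate above.

```python
def excision_estimates(pct_negative_tial, pct_negative_mock):
    """Return (absolute_excision_pct, excised_fraction_of_negative_cells).

    pct_negative_tial: % mCherry-negative cells after TialL RNP transfection
    pct_negative_mock: % mCherry-negative cells after mock transfection
    """
    if pct_negative_tial <= 0:
        raise ValueError("TialL mCherry-negative percentage must be positive")
    # Absolute excision rate: background (mock) subtracted from TialL.
    absolute = pct_negative_tial - pct_negative_mock
    # Share of the mCherry-negative TialL population attributable to excision.
    fraction = absolute / pct_negative_tial
    return absolute, fraction


# Hypothetical values: 40% mCherry-negative with TialL, 10% with mock.
absolute, fraction = excision_estimates(40.0, 10.0)
print(absolute, fraction)
```

The second value indicates how strongly promoter silencing or other background loss of mCherry contributes to the sorted mCherry-negative population.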
  • FIG. 35E Histograms display cell frequencies as a function of mCherry fluorescence from TialL-RNP-excised (blue) and mock-transfected (red) populations of putatively edited cells (mCherry-expressing) from the TNNI1 and ACTN2 targeting experiments.
  • mCherry-negative cells were abundant in the mock condition because hPGK promoter silencing occurred.
  • The diminished frequency of mCherry-expressing cells in TialL RNP-transfected populations (black arrows) was consistently observed and interpreted as evidence that excision had occurred. This contrasted with the MYL7, MYL2 and TTN tagging experiments, where mCherry expression was more stable in non-excised cells and mCherry-negative cells were much rarer in the mock transfected condition.
  • FIG. 36A - FIG. 36F Genetic analysis of precise mEGFP tagging using multi-step targeting and excision in clones.
  • FIG. 36A Clones from each targeting experiment were analyzed according to their normalized genomic copy number of the mEGFP, mCherry and AmpR sequences. Clones were categorized as candidates for further analysis if the mEGFP genomic copy number was consistent with monoallelic or biallelic tagging (copy number of 1 or 2), and the additive copy number of AmpR and mCherry was <0.2.
  • Plots display the normalized mEGFP copy number (x-axis) plotted against the normalized additive mCherry and AmpR copy numbers (y-axis).
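The candidate criteria for FIG. 36A can be sketched as a simple filter on normalized ddPCR copy numbers. This is a hedged illustration: normalization to the RPP30 reference follows the ratio described earlier, but the tolerance window around copy numbers 1 and 2 is an assumed value, and all function and parameter names are hypothetical.

```python
def normalized_copy_number(target_copies_per_ul, rpp30_copies_per_ul):
    """Ratio of target concentration to the RPP30 reference concentration."""
    if rpp30_copies_per_ul <= 0:
        raise ValueError("reference concentration must be positive")
    return target_copies_per_ul / rpp30_copies_per_ul


def is_candidate(megfp, mcherry, ampr, tolerance=0.25, residual_max=0.2):
    """True if mEGFP copy number is near 1 or 2 and the additive
    mCherry + AmpR copy number is below residual_max.

    tolerance is an assumed window; the text only states 'copy number of 1 or 2'.
    """
    tag_ok = any(abs(megfp - n) <= tolerance for n in (1, 2))
    return tag_ok and (mcherry + ampr) < residual_max


# Hypothetical clones: (mEGFP, mCherry, AmpR) normalized copy numbers.
clones = {"A": (1.05, 0.02, 0.03), "B": (1.00, 0.50, 0.00), "C": (1.50, 0.00, 0.00)}
for name, values in clones.items():
    print(name, is_candidate(*values))
```

Clone B would be rejected for retained cassette or backbone sequence, and clone C for an mEGFP copy number inconsistent with either monoallelic or biallelic tagging.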
  • FIG. 36B The percentage of clones validated by ddPCR is displayed in bar graphs.
  • FIG. 36C Percentages of ddPCR validated clones with validated PCR junctions between the mEGFP tag and the surrounding genomic region, distal to the homology arms, are shown.
  • FIG. 36D The percentages of analyzed clones with in-frame sequences of peptide linkers at the excision site are additionally shown, demonstrating a high rate of in-frame excision predicted to generate effectively tagged clones.
  • FIG. 36E The percentage of clones with WT untagged alleles is shown, demonstrating the relatively low impact of unintended NHEJ at the targeted locus.
  • FIG. 37A - FIG. 37C Quantitative assays to evaluate cardiomyocyte differentiation efficiency and mEGFP-tagged allele expression in precisely excised clones.
  • FIG. 37A Selected clones validated by ddPCR analysis, junctional PCR and sequencing of the peptide linker at the excision site, and thus predicted to produce an in-frame endogenous mEGFP fusion protein, were differentiated into cardiomyocytes. mEGFP fluorescence in fixed cells was measured (y axis) and plotted against antibody staining intensity for the cardiomyocyte marker cardiac troponin T (cTnT, x axis).
  • FIG. 37B The percentages of cells positive for the cTnT marker of cardiomyocyte fate across biological replicates and in several independently edited clones validated to contain precisely excised mEGFP-tagged alleles are shown. Error bars are standard deviation. ACTN2 experiments were two replicates.
  • FIG. 37C The percentages of cardiomyocytes (cTnT+) that were additionally expressing mEGFP are shown in experiments with several independently edited clones. Error bars are standard deviation among biological replicates. ACTN2 experiments were two replicates. All other experiments consisted of three replicates.
  • FIG. 38 Quality control criteria to evaluate the robustness of clonal line differentiation, pluripotency and genomic stability. Clones from each experiment are indicated, and were evaluated for karyotypic irregularities with metaphase spreads. FACS analysis after staining for nuclear pluripotency markers is also shown, with minimum values obtained among all trials displayed and number of trials in parentheses. Germ layer marker expression was measured by RT-ddPCR for TNNI1 clone 172. Cardiomyocyte differentiation using the small molecule protocol was performed and the percentage of cTnT+ cells was measured using FACS. Errors indicate standard deviation. The percentages of cTnT+ cells that additionally expressed mEGFP above threshold are also shown. Errors indicate standard deviation.
  • FIG. 39A - FIG. 39C Imaging experiments to evaluate sarcomeric localization of the mEGFP-tagged alleles.
  • FIG. 39A Two independently edited clones from each of the five targeting experiments were differentiated, plated on glass bottom plates and imaged live using spinning disc confocal microscopy. Similar mEGFP-localization to the sarcomere was observed in both clones.
  • FIG. 39B One clone from each experiment was additionally fixed and imaging was performed on both the mEGFP fluorescence (green channel, green boxes for insets) and antibody staining (purple channel, purple boxes for insets) against the targeted protein, to confirm whether absolute overlap of the mEGFP and endogenous stain was observed.
  • FIG. 40A - FIG. 40D Quantitative and imaging assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised MYL2 clones.
  • FIG. 40A Selected MYL2 clones predicted to produce an in-frame endogenous GFP fusion protein, were differentiated into cardiomyocytes and cultured for 13 and 26 days. GFP fluorescence in fixed cells was measured (y axis) and plotted against antibody staining intensity for the cardiomyocyte marker cardiac troponin T (cTnT, x axis).
  • FIG. 40B The percentages of cells expressing the cTnT marker of cardiomyocyte fate across biological replicates and in several independently edited MYL2 clones validated to contain precisely excised GFP-tagged alleles are shown.
  • FIG. 40C The percentages of cardiomyocytes (cTnT+) that were additionally expressing GFP are shown in experiments with several independently edited clones. Error bars are standard deviation among biological replicates. Day 12-13 experiments were two replicates. Day 26 experiments consisted of one replicate.
  • FIG. 40D Two independently edited clones from MYL2 editing experiments were differentiated, plated on glass bottom plates coated with PEI/laminin and imaged live using spinning disc confocal microscopy. At day 20 (left column) and day 28 (middle column) after differentiation GFP expression was assessed, showing an increase in both the number of cells expressing GFP and the intensity of GFP signal. Additionally, GFP localization was observed specifically at sarcomeres for both clones (right column). Scale bars are as indicated.

Abstract

The present invention provides stably tagged cells, including stem cells, and methods for producing such cells comprising one or more tagged, differentially-expressed proteins using a gene editing system. The methods described herein enable the insertion of large fluorescent tags into a plurality of genomic loci to generate stem cells that are phenotypically and functionally similar to the un-modified parent population. Stem cells produced by the methods described herein additionally retain the capacity to self-renew and differentiate into specialized cell types and can be used in assays and visualization of three-dimensional live cell imaging.

Description

STEM CELL LINES CONTAINING ENDOGENOUS, DIFFERENTIALLY-EXPRESSED TAGGED PROTEINS, METHODS OF PRODUCTION, AND USE THEREOF
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims benefit of U.S. Provisional Application No. 62/681,887, filed June 7, 2018, the disclosure of which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
[0002] The present disclosure relates to the fields of stem cell biology, genetics, and genetic engineering. In particular aspects, the present disclosure relates to methods of genetically engineering stem cells to express one or more fluorescently-tagged structural or other proteins that are expressed when the stem cells undergo differentiation, but are otherwise not expressed in a pluripotent state. In further embodiments, the methods described herein allow for the generation of genetically-engineered, fluorescently-tagged stem cells, wherein the endogenous functions of the stem cells remain un-altered (see, e.g., pluripotency and genomic stability). In further embodiments, the methods allow for three-dimensional live cell imaging of intracellular proteins. In further embodiments, the methods allow for use of the cells for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement, or cellular stress in response to a test agent.
BACKGROUND OF THE INVENTION
[0003] Conventional methods of live-cell protein imaging utilize protein fusion constructs, wherein a detectable marker (e.g., a fluorescent protein) is fused to the protein of interest, transduced or transfected into a cell. As such, these systems result essentially in the production of a cell that overexpresses the transduced protein. Although these systems have enabled the probing and analysis of protein localization and cellular dynamics in a wide range of cell types and assays, they fail to allow for the analysis and characterization of a target protein in an un-altered, endogenous state. For example, fusion constructs often result in unpredictable and artificial expression levels of the tagged protein, either as a result of transient expression of transfected constructs, or as a result of copy number variation with transduced constructs. These realities hinder the interpretation of experiments and in turn the study of pathogenesis and drug discovery.
[0004] The limitations of exogenous fusion construct systems are further exacerbated in the context of cells that are difficult to transfect or transduce, such as stem cells. In such cells, variation in the expression level of the construct may be especially problematic, as levels of transduction/transfection efficiency may be particularly low to begin with. Accordingly, there is a need in the art for methods that enable tagging of endogenous proteins such that the endogenous expression levels, function, and localization of the protein remain unaltered. In addition, there is a need in the art to enable tagging of endogenous proteins that are expressed when a stem cell undergoes differentiation, but are otherwise largely unexpressed in the stem cell.
SUMMARY OF THE INVENTION
[0005] In stem cells, and other cells that are particularly difficult to transfect or transduce, engineering of the endogenous genomic sequence to insert a protein tag overcomes the challenges of variable expression and allows for dynamic study of the endogenously-regulated, targeted gene product, including proteins that are differentially-expressed. These systems are enabled by the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR Associated) nuclease system, which allows for the precise targeting of a genomic locus, and with the insertion of one or more selectable markers. In cases where the target locus is expressed only in a differentiated cellular state, this system allows for the tag to be delivered together with selectable markers that are expressed in stem cells.
[0006] CRISPR/Cas9 eliminates many of the challenges associated with genetic engineering and an ever-growing number of studies illuminate the power of this approach. The system is most commonly used in loss-of-function studies, wherein one or more genes are mutated or deleted to generate genetic knock-outs. Less common is the use of the system to introduce exogenous genetic sequences into a target locus. In this instance, homology-directed repair (HDR) mediates the insertion of a repair template into the target locus and can be used to correct an existing mutation in the genomic sequence or to insert exogenous nucleic acid sequences (e.g., a nucleic acid sequence encoding one or more selectable markers). Although HDR has a low error rate, it is an inherently inefficient process, with rates of less than 10% in normal cells. As such, until now it has been difficult to reproduce HDR-mediated protein tagging across multiple targets to enable systematic use of this process in the study of endogenous protein dynamics, particularly in view of the unpredictability of how the introduction of large fluorescent tags may affect endogenous gene function as well as stem cell viability, pluripotency, and chromosomal stability.
[0007] The methods provided herein utilize CRISPR/Cas9-mediated gene editing to introduce multiple selectable markers via HDR into the genomic loci of target proteins, into the genomic safe harbor location, or other locations in the genome. Utilizing a first selection of transfected cells, followed by removal of a first selectable marker, cells can be produced that include a tagged, endogenous, differentially-expressed protein. These methods result in the production of isogenic hiPSC clones expressing detectable endogenously-regulated, differentially-expressed fusion proteins unique to each cell line, and do not substantially modify or alter stem cell pluripotency or function.
[0008] In some embodiments, the present invention provides a method for producing a cell comprising at least one tagged endogenous, differentially-expressed protein. The method suitably comprises providing a first nuclease specific for a target genomic locus of a differentially-expressed protein, providing a donor plasmid comprising: a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm. The method further includes introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus; selecting cells expressing the first selectable marker; and introducing into the selected cells a second nuclease capable of excising the selection cassette to generate an endogenous protein tagged with the second selectable marker, wherein the tagged endogenous protein is substantially free of a scar sequence; thereby producing the cell comprising the at least one tagged endogenous, differentially-expressed protein.
[0009] Also provided herein is a method for producing a stem cell comprising at least one tagged endogenous, differentially-expressed protein. The method suitably comprises providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, differentially-expressed protein in a stem cell, providing a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length. The method further includes transfecting the complex and the donor plasmid into the stem cell such that the polynucleotide sequences encoding the various components are inserted into the target genomic locus; selecting stem cells expressing the first selectable marker; and transfecting the selected stem cells with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous protein tagged with the second selectable marker, thereby producing the stem cell comprising at least one tagged endogenous, differentially-expressed protein.
[0010] In further embodiments, provided herein is a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
[0011] Also provided is a stably tagged cell generated by insertion of the donor plasmids described herein. In further embodiments, the donor plasmids can be used for imaging one or more proteins in one or more cells.
[0012] In still further embodiments, provided herein is a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a gene encoding a differentially-expressed protein.
[0013] Also provided herein is a cell comprising a CRISPR/Cas9 ribonucleoprotein (RNP) complex and a donor polynucleotide, the donor polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
[0014] In further embodiments, provided herein is a cell comprising an endogenous, differentially-expressed protein stably tagged with a selectable marker.
[0015] Also provided is a kit comprising an array of stem cells comprising at least one tagged endogenous, differentially-expressed protein.
[0016] In further embodiments, provided herein is a method of generating a signature for a test agent comprising: admixing the test agent with one or more cells produced by the methods herein; detecting a response in the one or more cells; detecting a response in a control cell; detecting a difference in the response in the one or more cells from the control cell; and generating a data set of the difference in the response.
[0017] Also provided are uses of a cell produced by the methods described herein for determining toxicity of a test agent on the cell, determining the stage of disease in the cell, determining the dose of a test agent or drug for treatment of disease, monitoring disease progression in the cell, and monitoring effects of treatment of a test agent or drug on the cell.
[0018] Also provided herein are uses of a cell produced by the various methods described herein for monitoring progression of disease or effect of a test agent on a disease wherein the disease is selected from the group consisting of aberrant cell growth, wound healing, inflammation, and neurodegeneration.
[0019] In additional embodiments, provided herein are methods for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising: providing a first nuclease specific for a target genomic locus of a stimuli-responsive gene; providing a donor plasmid comprising: a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm; introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus; selecting cells expressing the first selectable marker; and introducing into the selected cells a second nuclease capable of excising the selection cassette to generate an endogenous, stimuli-responsive gene tagged with the second selectable marker; thereby producing the cell comprising the at least one tagged endogenous, stimuli-responsive gene.
[0020] In further embodiments, methods are provided for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising: providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, stimuli-responsive gene in a cell; providing a donor plasmid comprising polynucleotide sequences encoding: a first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; a second selectable marker that is different from the first selectable marker; and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length; transfecting the complex and the donor plasmid into the cell such that the polynucleotide sequences are inserted into the target genomic locus; selecting cells expressing the first selectable marker; and transfecting the cells with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous stimuli-responsive gene tagged with the second selectable marker, thereby producing the cell comprising at least one tagged endogenous, stimuli-responsive gene.
[0021] In still further embodiments, provided herein is a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a stimuli-responsive gene.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1A - FIG. 1D provide schematics of illustrative gene editing and clone selection protocols. FIG. 1A shows a schematic illustrating design features important for genome editing experiments. FIG. 1B illustrates a schematic of donor plasmids for N-terminal tagging of LMNB1 and C-terminal tagging of DSP. FIG. 1C illustrates a schematic depicting the genome editing process. FIG. 1D shows a schematic overview of the clone isolation, genetic screening, and quality control workflow.
[0023] FIG. 2A - FIG. 2D illustrate comparisons of gene editing efficiency. FIG. 2A shows flow cytometry plots displaying GFP intensity (y-axis) 3-4 days after editing. FIG. 2B shows a comparison of genome editing efficiency, as defined by FACS, shown as a percentage of GFP+ cells within the gated cell population in each panel of FIG. 2A. FIG. 2C shows estimated percentage of cells in the FACS-enriched populations expressing GFP, as determined by live microscopy. FIG. 2D shows a representative image of the LMNB1 Cr1 FACS-enriched population showing an enrichment of GFP+ cells. Scale bars are 10 µm.
[0024] FIG. 3 shows a schematic illustrating the sequential process for identifying precisely tagged clones. In step 1 (FIG. 3A), ddPCR was used to identify clones with GFP insertion (normalized genomic GFP copy number ~1 or ~2) and no plasmid integration (normalized genomic plasmid backbone copy number <0.2). A hypothetical example of a typical editing experiment is shown with examples of pass and fail criteria. In step 2 (FIG. 3B), junctional PCR amplification of the tagged allele was used to determine precise on-target GFP insertion. In step 3 (FIG. 3C), the untagged allele of a clone with monoallelic GFP insertion was amplified. The amplicon was then sequenced to ensure that no mutations had been introduced to this allele.
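The step 1 pass and fail criteria can be pictured with a minimal Python sketch. The function name, the example clone values, and the tolerance used to interpret "~1 or ~2" are hypothetical and chosen only for illustration; only the backbone threshold of 0.2 comes from the description above.

```python
def passes_ddpcr_screen(gfp_copies, backbone_copies,
                        tolerance=0.25, backbone_max=0.2):
    """Return True when a clone meets the step 1 ddPCR criteria.

    tolerance is an assumed window around 1 or 2 GFP copies;
    backbone_max is the 0.2 plasmid backbone threshold from the text.
    """
    near_one_or_two = any(abs(gfp_copies - n) <= tolerance for n in (1, 2))
    return near_one_or_two and backbone_copies < backbone_max


# Hypothetical clones: (normalized GFP copies, normalized backbone copies)
clones = {
    "clone_A": (1.02, 0.01),  # monoallelic tag, no backbone
    "clone_B": (1.95, 0.35),  # backbone integration detected
    "clone_C": (0.10, 0.00),  # no GFP insertion
}
results = {name: passes_ddpcr_screen(g, b) for name, (g, b) in clones.items()}
```

Clones that pass this screen would then proceed to junctional PCR (step 2) and untagged allele sequencing (step 3).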
[0025] FIG. 4A - FIG. 4E show results of genetic assays to screen for precise genome editing in clones. FIG. 4A shows ddPCR screening data from five experiments representative of experimental outcome categories. FIG. 4B shows examples of ddPCR screening data from experiments representative of the range of outcomes observed. Each data point represents one clone. FIG. 4C shows the rates of clonal confirmation by junctional tiled PCR following selection by ddPCR. FIG. 4D shows the rates of clonal confirmation by junctional tiled PCR when ddPCR was not used as an initial screening criterion. FIG. 4E shows the rate of clonal confirmation by untagged allele amplification and sequencing.
[0026] FIG. 5A - FIG. 5E show additional results of genetic assays to screen for precise genome editing in clones. FIG. 5A shows percentage of clones confirmed by ddPCR to have incorporated the GFP tag but not the plasmid backbone. FIG. 5B shows percentage of clones confirmed in step 1 that also had correctly sized junctional PCR amplicons. FIG. 5C shows percentage of clones confirmed to have wild type untagged alleles by PCR amplification and Sanger sequencing following steps 1 and 2. FIG. 5D shows the percentage of clones in each experiment with KAN/AMP copy number > 0.2 is displayed on the y-axis. Stacked bars represent 3 observed subcategories of rejected clones. FIG. 5E shows fragment analysis of complete junctional allele amplification.
[0027] FIG. 6A - FIG. 6C show amplification of complete junctional (non-tiled) PCR products to demonstrate presence of the allele anticipated from tiled junctional PCR product data. FIG. 6A shows that junctional PCR primers complementary to sequences flanking the homology arms in the distal genome were used together to co-amplify tagged and untagged alleles. FIG. 6B shows that the assay served to rule out anticipated DNA repair outcomes where tiled junctional PCR data leads to a misleading result because the GFP tag sequence has been duplicated during HDR, as indicated by the schematic. FIG. 6C shows molecular weight markers as indicated (kb).
[0028] FIG. 7 illustrates the morphology of final candidate clones with GFP-tagged PXN.
[0029] FIG. 8A - FIG. 8K show live-cell imaging of final 10 edited clonal lines. Scale bars in all panels are as indicated.
[0030] FIG. 9A - FIG. 9C show cell biological assays to evaluate co-expression of tagged and untagged protein forms and their relative contributions to cellular proteome and structure. FIG. 9A shows comparison of labeled structures in edited cells and unedited WTC parental cells. FIG. 9B shows lysate from ACTB cl. 184 (left), TOMM20 cl. 27 (middle), and LMNB1 cl. 210 (right) are compared to unedited WTC cell lysate by western blot. FIG. 9C shows quantification of the Western blot analyses in FIG. 9B.
[0031] FIG. 10A - FIG. 10F show an assessment of stem cell quality after genome editing. FIG. 10A shows representative phase contrast images depicting cell and colony morphology of the unedited WTC line and several GFP-tagged clones (LMNB1, ACTB, TOMM20, and PXN). FIG. 10B shows representative flow cytometry plots of gene-edited LMNB1 cl. 210 cells and unedited WTC cells immunostained for indicated pluripotency markers (Nanog, Oct3/4, Sox2, SSEA-3, TRA-1-60) and a marker of differentiation (SSEA-1). FIG. 10C shows representative flow cytometry plots of differentiated unedited WTC cells or gene-edited LMNB1 cl. FIG. 10D shows cardiomyocytes differentiated from unedited WTC cells and stained with cardiac Troponin T (cTnT) antibody to label cardiac myofibrils. FIG. 10E shows representative flow cytometry plots showing cTnT expression in unedited WTC control cells and several gene-edited cell lines (LMNB1 cl. 210, ACTB cl. 184, and TOMM20 cl. 27). FIG. 10F shows a quantitative assessment of pluripotency and cardiomyocyte differentiation markers for final clones.
[0032] FIG. 11A - FIG. 11E illustrate results of phenotypic validation of candidate clones.
[0033] FIG. 12 illustrates expression levels of the 12 genes attempted for genome editing in the WTC parental cell line.
[0034] FIG. 13 A - FIG. 13E illustrate predicted genome wide CRISPR/Cas9 alternative binding sites, categorized according to sequence profile and location with respect to genes. FIG. 13A shows predicted alternative CRISPR/Cas9 binding sites (SEQ ID NOs: 174 - 186) categorized for each crRNA used. FIG. 13B shows predicted off-target sequence breakdown based on sequence profile. FIG. 13C shows breakdown of sequenced off-target sites by sequence profile. FIG. 13D shows all predicted off-target sites were additionally categorized according to their location with respect to annotated genes. FIG. 13E shows breakdown of sequenced off-target sites by genomic location with respect to annotated genes.
[0035] FIG. 14A - FIG. 14B illustrate ddPCR screening data. FIG. 14A shows ddPCR screening data for all experiments. FIG. 14B shows a dilution series of the donor plasmid used for the PXN-EGFP tagging experiment was used to confirm equivalent amplification of the AMP and GFP sequences in two-channel ddPCR assays.
[0036] FIG. 15 illustrates comparison of unedited versus edited cells by immunofluorescence.
[0037] FIG. 16 illustrates comparison of GFP tag localization and endogenous protein stain in edited cell lines.
[0038] FIG. 17 shows live cell imaging comparison of transiently transfected cells and genome edited cells. Top panels depict transiently transfected WTC cells and bottom panels depict gene edited clonal lines. Left: WTC transfected with EGFP-tagged alpha tubulin construct compared to the TUBA1B-mEGFP edited cell line. Images are a single apical frame. Middle: WTC transfected with EGFP-tagged desmoplakin construct compared to the DSP-mEGFP edited cell line. Images are maximum intensity projections of apical 4 z-frames. Right: WTC transfected with mCherry-tagged Tom20 construct compared to the TOMM20-mEGFP edited cell line. Images are single basal frames of the cell.
[0039] FIG. 18A - FIG. 18B show Western blot analysis of all 10 edited clonal lines.
[0040] FIG. 19A - FIG. 19B show editing experiments testing the feasibility of biallelic editing of the LMNB1 and TUBA1B loci. FIG. 19A shows that final clones LMNB1-mEGFP and TUBA1B-mEGFP were transfected using the standard editing protocol with a donor cassette targeting the untagged allele of the tagged locus, encoding mTagRFP-T (sequential delivery, top row). FIG. 19B shows that the sorted population from FIG. 19A (indicated by asterisk) revealed similar subcellular localization of GFP and mTagRFP-T signal to the nuclear envelope in the majority of cells, suggesting successful biallelic tagging.
[0041] FIG. 20A - FIG. 20B show live imaging analysis at two culture time points of TUBA1B-mEGFP edited cells and the four final edited clones that displayed a low abundance of tagged protein.
[0042] FIG. 21A - FIG. 21C show Western blot analysis of candidate clones at one culture time point and final clones at two culture time points from editing experiments that displayed a low abundance of tagged protein.
[0043] FIG. 22A - FIG. 22D show flow cytometry analysis of GFP tag expression stability, flow cytometry analysis of cell cycle dynamics, microscopy analysis of mitotic index, and culture growth assays. FIG. 22A shows that endogenous GFP signal in final edited clones was compared in otherwise identical cultures separated by four passages (14 days) of culturing time (indicated). FIG. 22B shows that propidium iodide staining and flow cytometry were used to quantify numbers of cells in G1 (indicated), S phase (indicated) and G2/M phase (indicated) in final edited clones. FIG. 22C shows that DAPI staining of colonies from each of the same five clonal lines was additionally used to quantify the numbers of mitotic cells per colony, as indicated. FIG. 22D shows that ATP quantitation was used as an indirect measure of cell growth.
[0044] FIG. 23 illustrates PCR primers (SEQ ID NOs: 193 - 272) used in experiments. All primers are listed in 5' to 3' orientation.
[0045] FIG. 24A - FIG. 24B illustrate antibodies used in western blot, immunofluorescence, and flow cytometry experiments.
[0046] FIG. 25 illustrates a workflow overview and strategy for building predictive models of the dynamic organization and behavior of cells using image-based 3D data sets of fluorescently tagged structures in human induced pluripotent stem cells (hiPSC).
[0047] FIG. 26A - FIG. 26C illustrate image-based feature extraction: colony growth and fluorescent texture quantification to sort and select drug-induced end point phenotypes.
[0048] FIG. 27 illustrates how high resolution 3D images reveal drug signatures on target and non-target cell structures as well as the morphological spectrum of each structure.
[0049] FIG. 28A - FIG. 28C illustrate fluorescence quantification of 3D images to analyze drug-induced Golgi reorganization.
[0050] FIG. 29A - FIG. 29F illustrate relative fluorescence quantification of 3D images and z-axis intensity profiling to analyze drug-induced cytoskeleton reorganization.
[0051] FIG. 30 illustrates Z-axis intensity profiling of 3D images to analyze drug-induced cell junction reorganization.
[0052] FIG. 31 illustrates Z-axis intensity profiling of 3D images to analyze drug-induced cell junction reorganization.
[0053] FIG. 32 illustrates exemplary factors for producing differentiated cell types from human iPSCs.
[0054] FIG. 33A - FIG. 33H illustrate a two-step CRISPR/Cas9 mediated targeting via HDR and subsequent microhomology guided excision of a constitutively expressed selection cassette, in accordance with embodiments hereof.
[0055] FIG. 34A - FIG. 34C illustrate fluorescence assisted cell sorting (FACS) experiments to isolate mCherry-expressing cells and establish efficacy of two-step editing at transcriptionally silent loci, in accordance with embodiments hereof.
[0056] FIG. 35A - FIG. 35E illustrate FACS-sorting of mCherry-negative cells to measure excision and obtain putatively GFP-tagged cells, in accordance with embodiments hereof.
[0057] FIG. 36A - FIG. 36F illustrate genetic analysis of precise GFP tagging using two-step targeting and excision in clones, in accordance with embodiments hereof.
[0058] FIG. 37A - FIG. 37C illustrate quantitative assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised clones, in accordance with embodiments hereof.
[0059] FIG. 38 provides quality control criteria to evaluate the robustness of clonal line differentiation, pluripotency and genomic stability, in accordance with embodiments hereof.
[0060] FIG. 39A - FIG. 39C illustrate imaging experiments to evaluate sarcomeric localization of the GFP-tagged alleles, in accordance with embodiments hereof.
[0061] FIG. 40A - FIG. 40D illustrate quantitative and imaging assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised MYL2 clones, in accordance with embodiments hereof.
DETAILED DESCRIPTION OF THE INVENTION
[0062] The present invention provides methods for producing stem cells comprising one or more tagged proteins using the CRISPR/Cas9 gene editing system. The methods described herein enable the insertion of fluorescent tags into a target genomic locus or a plurality of target genomic loci to generate stem cells that are phenotypically and functionally similar to the unmodified parent population. Stem cells produced by the methods described herein additionally retain the capacity to self-renew and differentiate into specialized cell types.
[0063] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described. All documents, or portions of documents, cited herein, including but not limited to patents, patent applications, articles, books, and treatises, are hereby expressly incorporated by reference in their entirety for any purpose. In the event that one or more of the incorporated documents or portions of documents define a term that contradicts that term’s definition in the application, the definition that appears in this application controls. However, mention of any reference, article, publication, patent, patent publication, and patent application cited herein is not, and should not be taken as an acknowledgment, or any form of suggestion, that they constitute valid prior art or form part of the common general knowledge in any country in the world.
[0064] In the present description, any concentration range, percentage range, ratio range, or integer range is to be understood to include the value of any integer within the recited range and, when appropriate, fractions thereof (such as one tenth and one hundredth of an integer), unless otherwise indicated. As used in this application, the terms “about” and “approximately” are used as equivalents. Any numerals used in this application with or without about/approximately are meant to cover any normal fluctuations appreciated by one of ordinary skill in the relevant art. In certain embodiments, the term “approximately” or “about” refers to a range of values that fall within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value).
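As an illustration of the arithmetic only, the following Python sketch computes the interval implied by "about" for a chosen percentage. The function name and the optional `upper_limit` argument (a simple way to model the exception for values that cannot exceed a maximum) are hypothetical.

```python
def about_range(reference, percent, upper_limit=None):
    """Return (low, high) bounds lying within `percent` of `reference`
    in either direction. upper_limit optionally caps the high bound,
    e.g. 100 for a quantity expressed as a percentage."""
    delta = abs(reference) * percent / 100.0
    low, high = reference - delta, reference + delta
    if upper_limit is not None:
        high = min(high, upper_limit)
    return (low, high)


# "about 1 kb" (1000 bp) read at the 10% level
kb_bounds = about_range(1000, 10)

# "about 95%" read at the 10% level, capped at 100%
pct_bounds = about_range(95, 10, upper_limit=100)
```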
[0065] It should be understood that the terms "a" and "an" as used herein refer to "one or more" of the enumerated components unless otherwise indicated. The use of the alternative (e.g., "or") should be understood to mean either one, both, or any combination thereof of the alternatives. As used herein, the terms "include" and "comprise" are used synonymously. As used herein, “plurality” may refer to one or more components (e.g., one or more detectable tags).
I. Stem Cells
[0066] In some embodiments, the present invention provides for methods of producing a stem cell comprising at least one tagged endogenous protein. In certain embodiments, the endogenous protein is a wild-type protein, whereas in other embodiments, the endogenous protein comprises one or more naturally-occurring mutations and/or one or more introduced mutations. Examples of mutations include but are not limited to amino acid insertions, deletions and substitutions.
[0067] The term “stem cell,” as used herein, refers to a multipotent, non-specialized cell with the capacity to self-renew and to differentiate into at least one differentiated cell lineage (e.g., potency). The “stemness” of a stem cell includes the characteristics of self-renewal and multipotency. Self-renewal refers to the proliferation of a stem cell to generate one (asymmetric division) or two (symmetric division) daughter cells with developmental potentials that are indistinguishable from those of the mother cell. Self-renewal results in an expanded population of stem cells, each of which maintains an undifferentiated state and the ability to differentiate into specialized cells. Typically, an expanded population of stem cells retains the stemness characteristics of the parent cell.
[0068] Potency refers to the ability of a stem cell to differentiate into at least one type of specialized cell. The greater the number of different specialized cell types a stem cell can differentiate into, the greater its potency. In some embodiments, a stem cell may be a totipotent cell, and able to differentiate into any specialized cell type (e.g., a zygote). In some embodiments, a stem cell may be pluripotent and able to differentiate into cell types of any of the three germ layers (endoderm, mesoderm, or ectoderm) (e.g., an embryonic stem cell or an induced pluripotent stem cell (iPSC)). In some embodiments, the stem cell may be multipotent and have the capacity to differentiate into multiple cell types of a particular cell lineage (e.g., a hematopoietic stem cell). Multipotent stem cells may also be referred to as progenitor cells. In certain embodiments, stem cells may be obtained from a donor, or they may be generated from a non-stem cell. Non-limiting examples of stem cells include embryonic stem cells and adult stem cells. Stem cells include, but are not limited to, mesenchymal stem cells, adipose tissue-derived stem cells, hematopoietic stem cells, and umbilical cord-derived stem cells.
[0069] In some embodiments, the stem cells described herein are human iPSCs. iPSCs are derived from differentiated adult cells and have been modified to express transcription factors and proteins responsible for the induction and/or maintenance of a pluripotent state (e.g., Oct 3/4, Sox family transcription factors, Klf family transcription factors, and Nanog). In some embodiments, the iPSCs described herein are derived from a normal, healthy human donor. In some embodiments, the iPSC is a WTC or a WTB cell line (Kreitzer et al, American Journal of Stem Cells, 2: 119-31, 2013; Miyaoka et al, Nature Methods, 11:291-3, 2013). In some embodiments, the iPSC is derived from a human donor that has been diagnosed with a disease or disorder. For example, in some embodiments the iPSC may be derived from a patient diagnosed with a cardiomyopathy (e.g., arrhythmogenic right ventricular cardiomyopathy, dilated cardiomyopathy, hypertrophic cardiomyopathy, left ventricular non-compaction cardiomyopathy, or restrictive cardiomyopathy), a heritable disease (e.g., deficiency of acyl-CoA dehydrogenase, very long chain (ACADVL), Barth syndrome (BTHS), carnitine-acylcarnitine translocase deficiency (CACTD), congenital disorder of deglycosylation (CDDG), muscular dystrophies (including Emery-Dreifuss muscular dystrophy (EDMD1), autosomal dominant Emery-Dreifuss muscular dystrophy (EDMD2), Duchenne’s muscular dystrophy, and chronic granulomatous disease), Friedreich ataxia 1 (FRDA), glycogen storage disease II, Hurler-Scheie syndrome, isobutyryl-CoA dehydrogenase deficiency, Kearns-Sayre syndrome (KSS), Leigh syndrome, leprechaunism, long chain 3-hydroxyacyl-CoA dehydrogenase deficiency, mitochondrial DNA depletion syndrome 12 (cardiomyopathic type), mucolipidosis IIIa, myoclonus epilepsy associated with ragged-red fibers (MERRF), centronuclear myopathy 1 (CNM1), Prader-Willi syndrome (PWS), adult-onset progeria, propionic acidemia, Vici syndrome (VICIS), or Werner syndrome), or a disease caused by or associated with a chromosomal abnormality (e.g., chromosome 1p36 deletion syndrome, Duchenne’s muscular dystrophy, and Prader-Willi syndrome).
[0070] “Stem cell markers” as used herein are defined as gene products (e.g., protein, RNA, glycans, glycoproteins, etc.) that are specifically or predominantly expressed by stem cells. Cells may be identified as a particular type of stem cell based on their expression of one or more of the stem cell markers using techniques commonly available in the art including, but not limited to, analysis of gene expression signatures of cell populations by microarray, qPCR, RNA-sequencing (RNA-Seq), Next-generation sequencing (NGS), serial analysis of gene expression (SAGE), and/or analysis of protein expression by immunohistochemistry, western blot, and flow cytometry. Stem cell markers may be present in the nucleus (e.g., transcription factors), in the cytosol, and/or on the cell membrane (e.g., cell-surface markers). In some embodiments, a stem cell marker is a gene product that directly and specifically supports the maintenance of stem cell identity and/or stem cell function. In some embodiments, a stem cell marker is a gene that is expressed specifically or predominantly by stem cells but does not necessarily have a specific function in the maintenance of stem cell identity and/or stem cell function. Examples of stem cell markers include, but are not limited to, Oct 3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3.
[0071] In some embodiments, the present invention provides genetically engineered stem cells. Herein, the terms“genetically engineered stem cells” or“modified stem cells” or“edited stem cells” refer to stem cells that comprise one or more genetic modifications, such as one or more tags inserted into a locus of one or more endogenous target genes.“Genetic engineering” refers to the process of manipulating a genomic DNA sequence to mutate or delete one or more nucleic acids of the endogenous sequence or to introduce an exogenous nucleic acid sequence into the genomic locus. The genetically-engineered or modified stem cells described herein comprise a genomic DNA sequence that is altered (e.g, genetically engineered to express a tag) compared to an un-modified stem cell or control stem cell. As used herein, an un-modified or control stem cell refers to a cell or population of cells wherein the genomes have not been experimentally manipulated (e.g, stem cells that have not been genetically engineered to express a tag).
[0072] In some embodiments, the stem cells described herein are derived from a donor (e.g., a healthy donor) and comprise one or more genetic mutations associated with a particular disease or disorder introduced into the iPSC genome. Such embodiments are referred to herein as “mutant stem cells.” Introduction of mutations into an iPSC derived from a healthy donor can mimic the genetic state of a particular disease or disorder, while maintaining the isogenic relationship between the mutant stem cell and the normal iPSC from which it is derived. This allows direct comparisons between the two cell types to be made when assessing the effect of a particular mutation on cellular structure, cellular function, protein localization, protein function, and/or protein expression. For example, mutations may be introduced into the PKD1 and/or PKD2 genes of an iPSC derived from a healthy donor to produce a PC1-mutant stem cell, a PC2-mutant stem cell, or a PC1/PC2-mutant stem cell. These mutant stem cells and the corresponding normal stem cells from which they are derived can then be further engineered to express one or more detectable markers in one or more endogenous target genomic loci. In some embodiments, these cells are assayed according to the methods described herein to determine the effect of a particular mutation on cellular structure, cellular function, protein localization, protein function, and/or protein expression, and can elucidate the role of a protein in different diseases, such as polycystic kidney disease.
[0073] In some embodiments, the present invention provides populations of genetically engineered stem cells that have been modified to express one or more tagged endogenous proteins. Herein, a “population” of cells (e.g., stem cells) refers to any number of cells greater than 1, e.g., at least 1x10^3 cells, at least 1x10^4 cells, at least 1x10^5 cells, at least 1x10^6 cells, at least 1x10^7 cells, at least 1x10^8 cells, at least 1x10^9 cells, or at least 1x10^10 or more cells.
II. Methods of producing genetically-engineered stem cells
[0074] In some embodiments, the present invention provides methods of producing genetically-engineered stem cells comprising at least one tagged endogenous protein. In some embodiments, the method comprises (a) providing a gene-editing system capable of producing double or single stranded DNA breaks at a target endogenous locus; (b) providing a repair template comprising a polynucleotide sequence encoding a detectable tag; (c) introducing the gene-editing system and the repair template into a stem cell such that the polynucleotide sequence encoding the detectable tag is inserted into an endogenous target genomic locus to generate the tagged endogenous protein. In certain embodiments, during step (c), the cells are cultured under conditions that allow insertion of the sequence encoding the detectable tag into the target genomic locus, such as any of those disclosed herein. In particular embodiments, the cells produced in step (c) are cultured under conditions suitable for expression of the tagged endogenous protein. In various embodiments of any of the methods disclosed herein, the stem cell is an iPSC, and the methods further comprise generating the iPSC. In particular embodiments, the iPSCs are generated from cells obtained from a donor, such as a normal, healthy donor or a diseased donor.
[0075] In some embodiments, the methods described herein are used to produce a genetically-engineered stem cell comprising one tagged endogenous protein. In some embodiments, the methods described herein are used to produce a genetically-engineered stem cell comprising two, three, four, five, six, seven, eight, nine, ten, or more tagged endogenous proteins. In some embodiments, the repair template comprises a 5’ homology arm and a 3’ homology arm, each of about 1 kb in length, or each more than 1 kb in length.
A. Gene-editing systems
[0076] Herein, the term “gene-editing system” refers to a protein, nucleic acid, or combination thereof that is capable of modifying a target locus of an endogenous DNA sequence when introduced into a cell. Numerous gene editing systems suitable for use in the methods of the present invention are known in the art including, but not limited to, zinc-finger nuclease systems, TALEN systems, and CRISPR/Cas systems.
[0077] In some embodiments, the gene editing system used in the methods described herein is a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR Associated) nuclease system, which is an engineered nuclease system based on a bacterial system that can be used for mammalian genome engineering. Generally, the system comprises a CRISPR-associated endonuclease (for example, a Cas endonuclease) and a guide RNA (gRNA). The gRNA is comprised of two parts; a crispr-RNA (crRNA) that is specific for a target genomic DNA sequence, and a trans-activating RNA (tracrRNA) that facilitates endonuclease binding to the DNA at the targeted insertion site. In some embodiments, the crRNA and tracrRNA may be present in the same RNA oligonucleotide, referred to as a single guide-RNA (sgRNA). In some embodiments, the crRNA and tracrRNA may be present as separate RNA oligonucleotides. In such embodiments, the gRNA is comprised of a crRNA oligonucleotide and a tracrRNA oligonucleotide that associate to form a crRNA:tracrRNA duplex. As used herein, the term“guide RNA” or“gRNA” refers to the combination of a tracrRNA and a crRNA, present as either an sgRNA or a crRNA:tracrRNA duplex.
[0078] In some embodiments, the CRISPR/Cas systems described herein comprise a Cas protein, a crRNA, and a tracrRNA. In some embodiments, the crRNA and tracrRNA are combined as a duplex RNA molecule to form a gRNA. In some embodiments, the crRNA:tracrRNA duplex is formed in vitro prior to introduction to a cell. In some embodiments, the crRNA and tracrRNA are introduced into a cell as separate RNA molecules and the crRNA:tracrRNA duplex is then formed intracellularly. In some embodiments, polynucleotides encoding the crRNA and tracrRNA are provided. In such embodiments, the polynucleotides encoding the crRNA and tracrRNA are introduced into a cell and the crRNA and tracrRNA molecules are then transcribed intracellularly. In some embodiments, the crRNA and tracrRNA are encoded by a single polynucleotide. In some embodiments, the crRNA and tracrRNA are encoded by separate polynucleotides.
[0079] In some embodiments, a detectable tag is inserted into a target locus of an endogenous gene by Cas-mediated DNA cleavage at or near a target insertion site. As such, the term “target insertion site” refers to a specific location within a target locus, wherein a polynucleotide sequence encoding a detectable tag can be inserted. In some embodiments, a Cas endonuclease is directed to the target insertion site by the sequence specificity of the crRNA portion of the gRNA, which requires the presence of a protospacer adjacent motif (PAM) sequence near the target insertion site. A variety of PAM sequences suitable for use with a particular endonuclease (e.g., a Cas9 endonuclease) are known in the art (see, e.g., Nat Methods. 2013 Nov; 10(11): 1116-1121 and Sci Rep. 2014; 4: 5405). Exemplary PAM sequences suitable for use in the present invention are shown in Table 5. In some embodiments, the target locus comprises a PAM sequence within 50 base pairs of the target insertion site. In some embodiments, the target locus comprises a PAM sequence within 10 base pairs of the target insertion site. The genomic loci that can be targeted by this method are limited only by the relative distance of the PAM sequence to the target insertion site and the presence of a unique 20 base pair sequence to mediate sequence-specific, gRNA-mediated Cas9 binding. In some embodiments, the target insertion site is located at the 5’ terminus of the target locus. In some embodiments, the target insertion site is located at the 3’ end of the target locus. In some embodiments, the target insertion site is located within an intron or an exon of the target locus.
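The distance constraint between a PAM and the target insertion site can be sketched in Python as follows. This is an illustrative sketch only: it assumes the NGG PAM commonly associated with SpCas9, measures distance from the first base of the motif as written on the plus strand (a simplifying assumption), and uses a hypothetical function name and toy sequences.

```python
import re


def ngg_pams_near(seq, site, max_distance=50):
    """List (position, strand) for NGG PAMs whose first plus-strand base
    lies within max_distance bases of the insertion site index."""
    seq = seq.upper()
    hits = []
    # Plus strand: N-G-G read 5'->3'. The lookahead finds overlapping motifs.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        if abs(m.start() - site) <= max_distance:
            hits.append((m.start(), "+"))
    # Minus strand: an NGG PAM appears as C-C-N on the plus strand.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        if abs(m.start() - site) <= max_distance:
            hits.append((m.start(), "-"))
    return sorted(hits)


# Toy sequence with a single AGG motif starting at index 6
candidates = ngg_pams_near("ATATATAGGATATAT", site=8, max_distance=10)
```

A fuller design step would also check that the 20-nucleotide protospacer next to each candidate PAM is unique in the genome, which this sketch does not attempt.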
[0080] The specificity of a gRNA for a target locus is mediated by the crRNA sequence, which comprises a sequence of about 20 nucleotides that are complementary to the DNA sequence at a target locus. In some embodiments, the crRNA sequences used in the methods of the present invention are at least 90% complementary to a DNA sequence of a target locus. In some embodiments, the crRNA sequences used in the methods of the present invention are at least 95%, 96%, 97%, 98%, or 99% complementary to a DNA sequence of a target locus. In some embodiments, the crRNA sequences used in the methods of the present invention are 100% complementary to a DNA sequence of a target locus. In some embodiments, the crRNA sequences described herein are designed to minimize off-target binding using algorithms known in the art (e.g., Cas-OFFinder) to identify target sequences that are unique to a particular target locus or target gene. In some embodiments, the crRNA sequences used in the methods of the present invention are at least 90% identical to one of SEQ ID NOs: 85 - 140. In some embodiments, the crRNA sequences used in the methods of the present invention are at least 95%, 96%, 97%, 98%, or 99% identical to one of SEQ ID NOs: 85 - 140. In some embodiments, the crRNA sequences used in the methods of the present invention are 100% identical to one of SEQ ID NOs: 85 - 140. Exemplary crRNA sequences are shown in Table 5.
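One simple way to picture the percent complementarity thresholds is an ungapped, position-by-position comparison, sketched below in Python. The function name and the example sequences are hypothetical and are not taken from SEQ ID NOs: 85 - 140; the ungapped comparison is an assumption made for illustration.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(crrna, target_strand):
    """Percent of crRNA positions that base-pair with target_strand.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the
    crRNA is reversed before its DNA complement is derived.
    """
    crrna, target_strand = crrna.upper(), target_strand.upper()
    if len(crrna) != len(target_strand):
        raise ValueError("sequences must have equal length")
    perfect = "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(crrna))
    matches = sum(a == b for a, b in zip(perfect, target_strand))
    return 100.0 * matches / len(crrna)


# Hypothetical 20-nt crRNA and a target strand carrying two mismatches
crrna = "GACGUUACGGAUCCAUGCAA"
perfect_target = "TTGCATGGATCCGTAACGTC"
mismatched_target = "AAGCATGGATCCGTAACGTC"
score = percent_complementarity(crrna, mismatched_target)
meets_90_percent = score >= 90.0
```

With a 20-nucleotide crRNA, the 90% threshold corresponds to at most two mismatched positions under this simple scoring.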
[0081] In some embodiments, the endonuclease is a Cas protein. In some embodiments, the endonuclease is a Cas9 protein. In some embodiments, the Cas9 protein is derived from Streptococcus pyogenes (e.g., SpCas9), Staphylococcus aureus (e.g., SaCas9), or Neisseria meningitidis (NmeCas9). In some embodiments, the Cas endonuclease is a Cas9 protein or a Cas9 ortholog and is selected from the group consisting of SpCas9, SpCas9-HF1, SpCas9-HF2, SpCas9-HF3, SpCas9-HF4, SaCas9, FnCpf, FnCas9, eSpCas9, and NmeCas9. In some embodiments, the endonuclease is selected from the group consisting of C2C1, C2C3, Cpf1 (also referred to as Cas12a), Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, and Csf4.
[0082] In some embodiments, the Cas9 is a wildtype (WT) Cas9 protein or ortholog. WT Cas9 comprises two catalytically active domains (HNH and RuvC). Binding of WT Cas9 to DNA based on gRNA specificity results in double-stranded DNA breaks that can be repaired by non-homologous end joining (NHEJ) or homology-directed repair (HDR). In some embodiments, Cas9 is fused to proteins that recruit DNA-damage signaling proteins, exonucleases, or phosphatases to further increase the likelihood or the rate of repair of the target sequence by one repair mechanism or another. In some embodiments, a WT Cas9 is co-expressed with a nucleic acid repair template to facilitate the incorporation of an exogenous nucleic acid sequence by homology-directed repair. In some embodiments, a WT Cas9 is co-expressed with an exogenous nucleic acid sequence encoding a detectable tag to facilitate the incorporation of the nucleic acid encoding the detectable tag into an endogenous target locus by homology-directed repair.
[0083] In some embodiments, the Cas9 is a Cas9 nickase mutant. Cas9 nickase mutants comprise only one catalytically active domain (either the HNH domain or the RuvC domain). The Cas9 nickase mutants retain DNA binding based on gRNA specificity, but are capable of cutting only one strand of DNA, resulting in a single-strand break (e.g., a “nick”). In some embodiments, two complementary Cas9 nickase mutants (e.g., one Cas9 nickase mutant with an inactivated RuvC domain, and one Cas9 nickase mutant with an inactivated HNH domain) are expressed in the same cell with two gRNAs corresponding to two respective target sequences: one target sequence on the sense DNA strand, and one on the antisense DNA strand. This dual-nickase system results in staggered double-stranded breaks and can increase target specificity, as it is unlikely that two off-target nicks will be generated close enough to generate a double-stranded break. In some embodiments, a Cas9 nickase mutant is co-expressed with a nucleic acid repair template to facilitate the incorporation of an exogenous nucleic acid sequence by homology-directed repair. In some embodiments, a Cas9 nickase mutant is co-expressed with an exogenous nucleic acid sequence encoding a detectable tag to facilitate the incorporation of the nucleic acid encoding the detectable tag into an endogenous target locus by homology-directed repair.
B. Repair Templates
[0084] In some embodiments, the components of a gene editing system (e.g., one or more gRNAs and a Cas9 protein, or nucleic acids encoding the same) are introduced into a population of stem cells with a repair template. In some embodiments, the repair template comprises a polynucleotide sequence encoding a detectable tag flanked on both the 5’ and 3’ ends by homology arm polynucleotide sequences. In such embodiments, the homology arm sequences and detectable tag sequences comprised within a repair template facilitate the repair of the Cas9-induced double-stranded DNA breaks at an endogenous target locus by homology-directed repair (HDR). In such embodiments, repair of the double-stranded breaks by HDR results in the insertion of the polynucleotide sequence encoding the detectable tag into the endogenous target locus. In some embodiments, the repair template comprises a nucleic acid sequence that is at least about 90% identical to a sequence selected from SEQ ID NOs: 31 - 84. In some embodiments, the repair template comprises a nucleic acid sequence that is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 31 - 84. In some embodiments, the repair template comprises a nucleic acid sequence that is 100% identical to a sequence selected from SEQ ID NOs: 31 - 84.
1. Homology Arms
[0085] In some embodiments, each of the 5’ and 3’ homology arms is at least about 500 base pairs long. For example, the homology arm sequences may be at least 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000 or more base pairs long. In some embodiments, the homology arm sequences are at least about 1000 base pairs long. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to an endogenous nucleic acid sequence located 5’ to a particular endogenous target locus. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 1 - 15. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 1 - 15. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 1 - 15.
[0086] In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 90% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus. In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus. In some embodiments, the 3’ homology arm polynucleotide sequence is 100% identical to an endogenous nucleic acid sequence located 3’ to a particular endogenous target locus. In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 3’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 16 - 30.
[0087] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to a sequence selected from SEQ ID NOs: 16 - 30. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 1 - 15 and the 3’ homology arm polynucleotide sequence is 100% identical to a sequence selected from SEQ ID NOs: 16 - 30.
[0088] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 16. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 16. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 1 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 16.
[0089] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 17. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 17. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 2 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 17.
[0090] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 18. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 18. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 3 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 18.
[0091] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 19. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 19. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 4 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 19.
[0092] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 20. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 20. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 5 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 20.
[0093] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 21. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 21. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 6 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 21.
[0094] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 22. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 22. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 7 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 22.
[0095] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 23. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 23. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 8 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 23.
[0096] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 24. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 24. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 9 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 24.
[0097] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 25. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 25. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 10 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 25.
[0098] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 26. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 26. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 11 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 26.
[0099] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 27. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 27. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 12 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 27.
[0100] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 28. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 28. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 13 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 28.
[0101] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 29. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 29. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 14 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 29.
[0102] In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is at least about 90% identical to SEQ ID NO: 30. In some embodiments, the 5’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is at least about 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 30. In some embodiments, the 5’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 15 and the 3’ homology arm polynucleotide sequence is 100% identical to SEQ ID NO: 30.
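The arm pairings recited in paragraphs [0088] through [0102] follow a regular pattern: the 5’ arm of SEQ ID NO: n is paired with the 3’ arm of SEQ ID NO: n + 15. The sketch below restates that pattern and the minimum arm length of about 500 base pairs. The names `ARM_PAIRS`, `MIN_ARM_BP`, and `arms_long_enough` are hypothetical, and the placeholder strings in the usage note are not sequences from the sequence listing.

```python
# Minimum homology arm length recited above (about 500 base pairs).
MIN_ARM_BP = 500

# 5' arm SEQ ID NO -> paired 3' arm SEQ ID NO (1 -> 16, 2 -> 17, ..., 15 -> 30).
ARM_PAIRS = {five_prime: five_prime + 15 for five_prime in range(1, 16)}


def arms_long_enough(five_arm: str, three_arm: str, min_bp: int = MIN_ARM_BP) -> bool:
    """Return True when both homology arms meet the minimum length."""
    return len(five_arm) >= min_bp and len(three_arm) >= min_bp
```

For example, `ARM_PAIRS[7]` evaluates to 22, and `arms_long_enough("A" * 499, "C" * 1000)` is false because the 5’ placeholder arm is one base pair short.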
C. Introduction of gene-editing systems
[0103] The components of the gene-editing system (e.g., a CRISPR/Cas system comprising a Cas, tracrRNA, and crRNA) can be intracellularly delivered to a population of cells by any means known in the art. In some embodiments, the Cas component of a CRISPR/Cas gene editing system is provided as a protein. In some embodiments, the Cas protein may be complexed with a crRNA:tracrRNA duplex in vitro to form a CRISPR/Cas RNP (crRNP) complex. In some embodiments, the crRNP complex is introduced to a cell by transfection. In some embodiments, the Cas protein may be introduced to a cell before or after a gRNA is introduced to the cell. In some embodiments, the Cas protein is introduced to a cell by transfection before or after a gRNA is introduced to the cell.
[0104] In some embodiments, a nucleic acid encoding a Cas protein is provided. In some embodiments, the nucleic acid encoding the Cas protein is a DNA nucleic acid and is introduced to the cell by transduction. In some embodiments, the Cas9 and gRNA components of a CRISPR/Cas gene editing system are encoded by a single polynucleotide molecule. In some embodiments, the polynucleotide encoding the Cas protein and gRNA components is comprised in a viral vector and introduced to the cell by viral transduction. In some embodiments, the Cas9 and gRNA components of a CRISPR/Cas gene editing system are encoded by different polynucleotide molecules. In some embodiments, the polynucleotide encoding the Cas protein is comprised in a first viral vector and the polynucleotide encoding the gRNA is comprised in a second viral vector. In some aspects of this embodiment, the first viral vector is introduced to a cell prior to the second viral vector. In some aspects of this embodiment, the second viral vector is introduced to a cell prior to the first viral vector. In such embodiments, integration of the vectors results in sustained expression of the Cas9 and gRNA components. However, sustained expression of Cas9 may lead to increased off-target mutations and cutting in some cell types. Therefore, in some embodiments, an mRNA nucleic acid sequence encoding the Cas protein may be introduced to the population of cells by transfection. In such embodiments, the expression of Cas9 will decrease over time, and may reduce the number of off-target mutations or cutting sites.
[0105] In some embodiments, each of the Cas9, tracrRNA, crRNA, and repair template components is introduced to a cell by transfection alone or in combination (e.g., transfection of a crRNP). Transfection may be performed by any means known in the art, including but not limited to lipofection, electroporation (e.g., Neon® transfection system or an Amaxa Nucleofector®), sonication, or nucleofection. In such embodiments, the gRNA components can be transfected into a population of cells with a plasmid encoding the Cas9 nuclease. In such embodiments, the expression of Cas9 will decrease over time, and may reduce the number of off-target mutations or cutting sites.
D. Detectable Tags
[0106] In some embodiments, the repair templates described herein comprise a polynucleotide sequence encoding a “detectable tag,” “tag,” or “label.” These terms are used interchangeably herein and refer to a protein that is capable of being detected and is linked or fused to a heterologous protein (e.g., an endogenous protein). Herein, the detectable tag serves to identify the presence of the heterologous protein. Insertion of a polynucleotide sequence encoding a detectable tag into an endogenous target locus results in the expression of a tagged version of the endogenous protein. Examples of detectable tags include, but are not limited to, FLAG tags, poly-histidine tags (e.g., 6xHis), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent molecules, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins.
[0107] In some embodiments, the detectable tag is a fluorescent protein such as green fluorescent protein (GFP), blue fluorescent protein, cyan fluorescent protein, yellow fluorescent protein, or red fluorescent protein. In some embodiments, the detectable tag is GFP. Additional examples of detectable tags suitable for use in the present methods and compositions include mCherry, tdTomato, mNeonGreen, eGFP, Emerald, mEGFP (A208K mutation), mKate, and mTagRFPt. In some embodiments, the fluorescent protein is selected from the group consisting of blue/UV proteins (such as TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, and T-Sapphire); cyan proteins (such as ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, and mTFP1); green proteins (such as EGFP, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, and mNeonGreen); yellow proteins (such as EYFP, Citrine, Venus, SYFP2, and TagYFP); orange proteins (such as Monomeric Kusabira-Orange, mKOk, mKO2, mOrange, and mOrange2); red proteins (such as mRaspberry, mCherry, mStrawberry, mTangerine, tdTomato, TagRFP, TagRFP-T, mApple, mRuby, and mRuby2); far-red proteins (such as mPlum, HcRed-Tandem, mKate2, mNeptune, and NirFP); near-infrared proteins (such as TagRFP657, IFP1.4, and iRFP); long Stokes shift proteins (such as mKeima Red, LSS-mKate1, LSS-mKate2, and mBeRFP); photoactivatable proteins (such as PA-GFP, PAmCherry1, and PATagRFP); photoconvertible proteins (such as Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, PS-CFP2, mEos2 (green), mEos2 (red), mEos3.2 (green), mEos3.2 (red), PSmOrange, and PSmOrange); and photoswitchable proteins (such as Dronpa). In some embodiments, the detectable tag can be selected from AmCyan, AsRed, DsRed2, DsRed Express, E2-Crimson, HcRed, ZsGreen, ZsYellow, mCherry, mStrawberry, mOrange, mBanana, mPlum, mRaspberry, tdTomato, DsRed Monomer, and/or AcGFP, all of which are available from Clontech.
[0108] In some embodiments, the polynucleotide sequence encoding the detectable tag is at least about 20 base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is at least 100 base pairs long. For example, the polynucleotide sequence encoding the detectable tag may be about 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 2000, 3000, 4000, 5000 or more base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is at least about 300 base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is at least about 500 base pairs long. In further embodiments, the polynucleotide sequence encoding the detectable tag is about 700 to about 750 base pairs long. For example, the polynucleotide sequence encoding the detectable tag may be about 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 740, or about 750 base pairs long. In some embodiments, the polynucleotide sequence encoding the detectable tag is between 710 and 730 base pairs long. The polynucleotide sequence can encode a full-length detectable tag or a portion or fragment thereof. In some embodiments, the polynucleotide sequence encodes a full-length detectable tag. In some embodiments, insertion of the detectable tag into the target locus does not significantly alter the expression or function of either the endogenous protein or the encoded detectable tag.
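As a rough check on the 710 to 730 base pair range, the coding length of a tag can be estimated at three base pairs per amino acid residue. The sketch below assumes a 238-residue fluorescent protein (the length of wild-type Aequorea GFP) as an illustrative input; the function name is hypothetical, and whether a given insert carries a stop codon or linker depends on the tag orientation and design.

```python
def coding_length_bp(n_residues: int, include_stop: bool = False) -> int:
    """Base pairs needed to encode n_residues, optionally plus one stop codon."""
    return 3 * n_residues + (3 if include_stop else 0)


# Illustrative assumption: a 238-residue fluorescent protein.
gfp_like_bp = coding_length_bp(238)              # 714
gfp_like_bp_with_stop = coding_length_bp(238, include_stop=True)  # 717
```

Both values fall within the 710 to 730 base pair range described above.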
[0109] The insertion of the detectable tag sequence into an endogenous gene results in the production of a tagged endogenous protein. In some embodiments, the tag is directly fused to the endogenous protein. The term “directly fused” refers to two or more amino acid sequences connected to each other (e.g., by peptide bonds) without intervening or extraneous sequences (e.g., two or more amino acid sequences that are not connected by a linker sequence). In some embodiments, the polynucleotide sequence encoding the detectable tag further comprises a linker sequence such that the detectable tag is attached (or linked) to the endogenous protein by a linker sequence. In such embodiments, the attachment may be by covalent or non-covalent linkage. In some embodiments, the attachment is covalent. In some embodiments, the linker sequence is a flexible linker sequence. In some embodiments, the tag is directly fused, or attached by a linker, to the C-terminal or N-terminal end of an endogenous protein. In some embodiments, the linker sequence is selected from the group consisting of sequences shown in Tables 3 and 4.
[0110] In some embodiments, the donor polynucleotide further comprises a polynucleotide sequence encoding a selectable marker that allows for the selection of cells comprising the donor polynucleotide. Selectable markers are known in the art and include antibiotic resistance genes. In some embodiments, the selectable marker is a thymidine kinase gene or an antibiotic resistance gene that confers resistance to gentamycin, ampicillin, and/or kanamycin.
[0111] In some embodiments, the donor polynucleotide is a plasmid, referred to herein as a “donor plasmid.” In some embodiments, the donor plasmid comprises a repair template comprising (i) a 5’ homology arm sequence; (ii) a nucleic acid sequence encoding a detectable tag; and (iii) a 3’ homology arm sequence. In some embodiments, the repair template comprised within the donor plasmid further comprises a linker sequence located at the 5’ end or the 3’ end of the nucleic acid sequence encoding the detectable tag. In some embodiments, the repair template comprised within the donor plasmid further comprises an antibiotic resistance cassette located between the 5’ and 3’ homology arm sequences. In such embodiments, the antibiotic resistance cassette may be located 3’ to the 5’ homology arm sequence and 5’ to the nucleic acid sequence encoding the detectable tag. Alternatively, the antibiotic resistance cassette may be located 5’ to the 3’ homology arm sequence and 3’ to the nucleic acid sequence encoding the detectable tag. In some embodiments, the donor plasmid does not comprise a promoter. In such embodiments, the donor plasmid functions as a vehicle to deliver the tag sequence intracellularly to a cell and does not mediate transcription and/or translation of the tag sequence or any polynucleotide sequence comprised therein.
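The element orderings described for the repair template can be summarized in a short sketch. The function and its argument values are hypothetical. Where a linker and an antibiotic resistance cassette fall on the same side of the tag, the sketch assumes the linker sits immediately adjacent to the tag, which the text above does not specify.

```python
from typing import List, Optional


def donor_layout(
    linker_side: Optional[str] = None,    # None, "5prime", or "3prime" of the tag
    cassette_side: Optional[str] = None,  # None, "upstream", or "downstream" of the tag
) -> List[str]:
    """Return repair-template elements in 5'->3' order.

    Assumption: a linker, when present, is placed directly next to the tag.
    No promoter element is included, matching the promoterless embodiment.
    """
    if linker_side not in (None, "5prime", "3prime"):
        raise ValueError("linker_side must be None, '5prime', or '3prime'")
    if cassette_side not in (None, "upstream", "downstream"):
        raise ValueError("cassette_side must be None, 'upstream', or 'downstream'")

    layout = ["5' homology arm"]
    if cassette_side == "upstream":
        layout.append("antibiotic resistance cassette")
    if linker_side == "5prime":
        layout.append("linker")
    layout.append("detectable tag")
    if linker_side == "3prime":
        layout.append("linker")
    if cassette_side == "downstream":
        layout.append("antibiotic resistance cassette")
    layout.append("3' homology arm")
    return layout
```

Calling `donor_layout()` with no arguments yields the minimal three-element template of items (i) through (iii), and `donor_layout("3prime", "upstream")` places the cassette between the 5’ arm and the tag with the linker following the tag.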
E. Endogenous Target Loci
[0112] In some embodiments, the present invention provides for methods of inserting one or more detectable tags into one or more endogenous target loci. In some embodiments, the target locus is located within an endogenous gene encoding a structural protein or a non-structural protein. Exemplary target genes are shown below in Tables 1 and 2. In some embodiments, the structural protein is selected from paxillin (PXN), tubulin-alpha 1b (TUBA1B), lamin B1 (LMNB1), actinin alpha 1 (ACTN1), translocase of outer mitochondrial membrane 20 (TOMM20), desmoplakin (DSP), Sec61 translocon beta subunit (SEC61B), fibrillarin (FBL), actin beta (ACTB), myosin heavy chain 10 (MYH10), vimentin (VIM), tight junction protein 1 (TJP1, also known as ZO-1), safe harbor locus, CAGGS promoter (AAVS1), microtubule-associated protein 1 light chain 3 beta (MAP1LC3B, also known as LC3), ST6 beta-galactoside alpha-2,6-sialyltransferase 1 (ST6GAL1), lysosomal associated membrane protein 1 (LAMP1), centrin 2 (CETN2), solute carrier family 25 member 17 (SLC25A17), RAB5A, member RAS oncogene family (RAB5A), gap junction protein alpha 1 (also known as connexin 43 (CX43)) (GJA1), mitogen-activated protein kinase 1 (MAPK1), ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 2 (ATP2A2), AKT serine/threonine kinase 1 (AKT1), catenin beta 1 (CTNNB1), nucleophosmin (NPM1), histone cluster 1 H2B family member j (HIST1H2BJ), Histone cluster 1 H2B family member j:2A:CAAX (CAGGS:HIST1H2BJ:2A:CAAX), polycystin 2, transient receptor potential cation channel (PKD2), dystrophin (DMD), desmin (DES), solute carrier family 25 member 17 (SLC25A17, also known as PMP34), Structural maintenance of chromosomes 1A (SMC1A), Nucleoporin 153 (NUP153), CCCTC-binding factor (CTCF), Chromobox 1 (CBX1), POU class 5 homeobox 1 (Oct4), Sex-determining region Y-box 2 (Sox2), and Nanog homeobox (Nanog). In certain embodiments, any of these target loci are tagged with a detectable tag, e.g., a fluorescent tag, such as GFP.
[0113] In some embodiments, the one or more detectable tags are inserted into an endogenous target locus in a gene encoding a structural protein or a non-structural protein, wherein the expression of the gene and/or the encoded protein is associated with a particular cell type or tissue type. For example, in some embodiments, the expression of the gene and/or the encoded protein is associated with cardiomyocytes, hepatocytes, renal cells, epithelial cells, endothelial cells, neurons, or mucosal cells of the gut, lung, or nasal passages. In some embodiments, the expression of the gene and/or the encoded protein is associated with cardiac tissue including, but not limited to, troponin I1, slow skeletal type (TNNI1), actinin alpha 2 (ACTN2), troponin I3, cardiac type (TNNI3), myosin light chain 2 (MYL2), myosin light chain 7 (MYL7), titin (TTN), SMAD family member 2 (SMAD2), SMAD family member 5 (SMAD5), NK2 homeobox 5 (NKX2-5), Mesoderm posterior bHLH transcription factor 1 (MESP1), Mix paired-like homeobox (MIXL1), and ISL LIM homeobox 1 (ISL1).
[0114] In some embodiments, the expression of the gene and/or the encoded protein is associated with liver tissue including, but not limited to, Cytochrome P450 2E1 (CYP2E1), Transferrin (TF), hemopexin (HPX), and albumin (ALB). In some embodiments, the expression of the gene and/or the encoded protein is associated with kidney tissue including, but not limited to, Polycystic kidney disease 1 (PKD1) and Polycystic kidney disease 2 (PKD2). In some embodiments, the expression of the gene and/or the encoded protein is associated with epithelial tissue including, but not limited to, keratin 5 (KRT5) and laminin subunit gamma 2 (LAMC2). Exemplary genes associated with specific tissue and cell types are shown below in Table 2.
Table 1: Illustrative Target Genes and Corresponding Cell Structures
[Table 1 appears as images in the source: imgf000032_0001, imgf000033_0001]
Table 2: Illustrative tissue-type and cell-type associated genes
[Table 2 appears as images in the source: imgf000033_0002, imgf000034_0001]
[0115] In some embodiments, a plurality of detectable labels is inserted into a plurality of target loci. For example, one detectable label is inserted at one endogenous locus and a different detectable label is inserted at a different endogenous locus. In such embodiments, each of the individual detectable labels is selected such that the detection of one does not interfere, or only minimally interferes, with the detection of another. In such embodiments, a unique crRNA is generated for each target locus. In further embodiments, a CRISPR ribonucleoprotein (crRNP), comprising a Cas protein complexed with a crRNA:tracrRNA duplex, is produced for each target locus. In some embodiments, the plurality of nucleic acid sequences encoding the plurality of detectable labels are comprised in a single donor plasmid and are flanked on the 5’ and 3’ ends by homology arms corresponding to genomic sequences within the target locus. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more detectable labels and their corresponding homology arms may be comprised within one donor polynucleotide.
[0116] In some embodiments, the plurality of nucleic acid sequences encoding the plurality of detectable labels and their corresponding homology arms are comprised within at least two different donor plasmids. For example, at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more donor plasmids may be used in the present methods. In some embodiments, a plurality of donor plasmids (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) each comprising one sequence encoding a detectable label and the corresponding homology arms may be used in the present methods. In some embodiments, a plurality of donor plasmids (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) each comprising a plurality of sequences encoding two or more detectable labels (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) and the corresponding homology arms may be used in the present methods. In some embodiments, the plurality of donor plasmids are introduced to a stem cell at the same time. In some embodiments, the plurality of donor plasmids are introduced to a stem cell sequentially.
III. Stably-Tagged Stem Cell Clones
[0117] In some embodiments, the present disclosure provides edited stem cell clones that stably express one or more tagged endogenous proteins. In some embodiments, the stably tagged stem cell clones of the current invention are characterized by (i) mono- or biallelic insertion of a nucleic acid sequence encoding a detectable tag (e.g., GFP) into one or more endogenous proteins (e.g., structural, non-structural, or non-expressed proteins of the stem cell); (ii) pluripotency (e.g., the ability to differentiate into all three germ layers); and (iii) the lack of additional mutations or alterations in the endogenous stem cell genome. Such edited stem cell clones are herein referred to as “stably tagged stem cell clones.”
[0118] The stably tagged stem cell clones described herein phenotypically differ from non-engineered stem cell clones only by the expression of one or more endogenous proteins that have been tagged with a detectable tag and the incorporation of one or more antibiotic resistance cassettes into the one or more tagged endogenous loci. In some embodiments, the stably tagged stem cell clones of the current invention are characterized by (i) mono- or biallelic insertion of a nucleic acid sequence encoding a detectable tag (e.g., GFP) into one or more endogenous proteins (e.g., structural, non-structural, or non-expressed proteins of the stem cell); (ii) pluripotency (e.g., the ability to differentiate into all three germ layers); and (iii) the presence of one or more additional mutations or alterations in the endogenous stem cell genome. Such edited stem cell clones are herein referred to as “stably tagged mutant stem cell clones.” In some embodiments, the stably tagged mutant stem cell clones comprise one or more additional mutations or alterations in the endogenous stem cell genome that are associated with a particular disease or disorder. Thus, the stably tagged mutant stem cell clones described herein phenotypically differ from non-engineered stem cell clones by the expression of one or more endogenous proteins that have been tagged with a detectable tag, the incorporation of one or more antibiotic resistance cassettes into the one or more tagged endogenous loci, and the presence of one or more additional mutations not found in the non-engineered stem cell clones. The stably tagged mutant stem cell clones described herein phenotypically differ from the corresponding stably tagged stem cell clones only by the presence of one or more additional mutations.
[0119] Provided herein are compositions comprising stably tagged stem cell clones made by the methods described herein. In some embodiments, the compositions comprise a stably tagged stem cell clone wherein one endogenous protein is tagged. For example, a composition may comprise a stably tagged stem cell clone expressing a tagged endogenous protein wherein the endogenous protein is one selected from Tables 1 and/or 2 (e.g., one of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2).
[0120] In some embodiments, the compositions described herein comprise a stably tagged stem cell clone wherein at least two endogenous proteins are tagged. For example, a composition may comprise a stably tagged stem cell clone wherein one endogenous locus is tagged with a detectable tag and wherein another endogenous locus is tagged with a different detectable tag. In such embodiments, either of the endogenous loci may be selected from Tables 1 and/or 2. For example, the endogenous proteins may be two or more of those listed in Tables 1 and 2 (e.g., two or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2). In some embodiments, one detectable tag may be inserted into a target locus in TUBA1B and a different detectable tag may be inserted into a target locus in LMNB1. In some embodiments, one detectable tag may be inserted into a target locus in SEC61B and a different detectable tag may be inserted into a target locus in LMNB1. In some embodiments, one detectable tag may be inserted into a target locus in TOMM20 and a different detectable tag may be inserted into a target locus in TUBA1B. In some embodiments, one detectable tag may be inserted into a target locus in SEC61B and a different detectable tag may be inserted into a target locus in TUBA1B. In some embodiments, one detectable tag may be inserted into a target locus in TUBA1B and a different detectable tag may be inserted into a target locus in CETN2. In some embodiments, one detectable tag may be inserted into a target locus in AAVS1 and a different detectable tag may be inserted into a target locus in CAGGS:HIST1H2BJ:2A:CAAX.
[0121] In some embodiments, the compositions described herein comprise a stably tagged stem cell clone wherein at least three endogenous proteins are tagged. For example, a composition may comprise a stably tagged stem cell clone wherein a first endogenous locus is tagged with a first detectable tag, a second endogenous locus is tagged with a second detectable tag, and a third endogenous locus is tagged with a third detectable tag. In such embodiments, any of the endogenous loci may be selected from Tables 1 and/or 2. For example, the endogenous proteins may be three or more of those listed in Tables 1 and 2 (e.g., three or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2).
[0122] In some embodiments, the compositions described herein comprise a stably tagged stem cell clone wherein at least four, five, or more endogenous proteins are tagged. In such embodiments, the endogenous proteins may be four or more of those listed in Tables 1 and 2 (e.g., four, five, or more of PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2).
[0123] In some embodiments, the compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a tagged endogenous protein. In some embodiments, each stably tagged stem cell clone expresses a different tagged endogenous protein. In some embodiments, the compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a different tagged endogenous protein. In some embodiments, the compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the compositions described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins. In some embodiments, the compositions described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins. In some embodiments, each stably tagged stem cell clone expresses a group of tagged endogenous proteins that is different from the tagged endogenous proteins expressed by another stem cell clone in the same composition. Exemplary endogenous proteins that can be tagged in these embodiments are shown in Tables 1 and 2, including but not limited to PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2.
[0124] Exemplary stably tagged stem cell clones that can be produced by the methods and techniques are shown below in Tables 3 and 4. The association of any tag in the table with any structural protein in the table is for illustrative purposes only. In this regard, any tag (or fluorescent protein) in the Table can be associated with any structural gene in the table.
Table 3: Exemplary Embodiments of Stably Tagged Stem Cell Clones
Figure imgf000039_0001
Figure imgf000040_0001
Figure imgf000041_0001
* * variable based on excision
Table 4: Exemplary Embodiments of Stably Dual-Tagged Stem Cell Clones
Figure imgf000041_0002
Figure imgf000042_0001
A. Validation Assays
[0125] In some embodiments, the present invention provides methods for selecting a stem cell that has been modified by the methods described herein to express a tagged endogenous protein. In some embodiments, the insertion of the tag sequence into the endogenous target locus does not result in additional genetic mutations or alterations in the endogenous target locus, or any other heterologous locus in the endogenous genome. In further embodiments, the insertion of the tag sequence into the endogenous target locus does not modify or alter the expression, function, or localization of the endogenous protein. In some embodiments, methods are provided herein for selecting stem cells modified by the methods described herein, wherein the identified stem cells comprise one or more of precise insertion of the nucleic acid sequence encoding a tag; pluripotency; maintained cell viability and function as compared to a non-modified stem cell; maintained levels of expression of the tagged endogenous protein as compared to a non-modified stem cell; maintained protein localization of the tagged endogenous protein as compared to a non-modified stem cell; maintained protein function of the tagged endogenous protein as compared to a non-modified stem cell; maintained expression of stem cell markers as compared to a non-modified stem cell; and/or maintained differentiation potential. In some embodiments, the properties of a selected stem cell are validated by one or more of several downstream assays.
[0126] In some embodiments, a population of edited stem cells (e.g., wherein a crRNP and a donor plasmid have been transfected into the cells) is sorted based on the cells' relative expression of the detectable tag. In some embodiments, cells are sorted by fluorescence activated cell sorting (FACS). Cells that are positive for the inserted tag (e.g., express the tag at levels that are increased compared to a non-edited population) are selected for further analysis. In some embodiments, the selected cells are expanded in a single colony expansion assay to produce individual clones of edited stem cells.
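The selection of tag-positive cells described above can be illustrated with a minimal Python sketch. It is not the sorting procedure of the present disclosure: the function names, the use of per-cell fluorescence intensities as plain numbers, and the choice of the 99th percentile of the non-edited population as the cutoff are all assumptions made for illustration only.

```python
from statistics import quantiles


def gate_threshold(unedited_intensities, percentile=99):
    """Fluorescence cutoff derived from a non-edited control population.

    The 99th-percentile default is an illustrative assumption, not a value
    taken from the disclosure.
    """
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cuts = quantiles(unedited_intensities, n=100, method="inclusive")
    return cuts[percentile - 1]


def select_tag_positive(edited_intensities, threshold):
    """Return the indices of edited cells whose tag signal exceeds the cutoff."""
    return [i for i, value in enumerate(edited_intensities) if value > threshold]


if __name__ == "__main__":
    control = list(range(1, 101))        # hypothetical non-edited intensities
    edited = [50, 150, 99, 500]          # hypothetical edited-cell intensities
    cutoff = gate_threshold(control)
    print(select_tag_positive(edited, cutoff))
```

In practice the gate would be set on the sorter itself; the sketch only makes explicit that "positive" is defined relative to the non-edited population.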
[0127] In some embodiments, edited clones are further analyzed by digital droplet PCR (ddPCR) to identify clones that have an inserted tag sequence and that do not have stable genomic incorporation of the plasmid backbone. In some embodiments, the clones are further analyzed to determine the copy number of the inserted tag sequence. In some embodiments, identified clones have monoallelic or biallelic insertion of the tag sequence.
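As a hedged illustration of the clone screening described above, the following Python sketch classifies a clone from three ddPCR measurements. The normalization to a reference locus assumed to be present at two copies per genome, the tolerance value, and the function name `classify_clone` are assumptions introduced for this example and are not specified in the present disclosure.

```python
def classify_clone(tag_copies, backbone_copies, reference_copies, tolerance=0.2):
    """Classify one clone from ddPCR counts.

    Counts are normalized to a reference locus that is ASSUMED to be present
    at two copies per genome; the tolerance is an illustrative value.
    """
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")

    # Convert raw counts to copies per genome.
    tag_per_genome = 2 * tag_copies / reference_copies
    backbone_per_genome = 2 * backbone_copies / reference_copies

    # Clones with stably integrated plasmid backbone are excluded.
    if backbone_per_genome > tolerance:
        return "rejected: plasmid backbone detected"
    if abs(tag_per_genome - 1) <= tolerance:
        return "monoallelic"
    if abs(tag_per_genome - 2) <= tolerance:
        return "biallelic"
    return "rejected: unexpected tag copy number"


if __name__ == "__main__":
    print(classify_clone(510, 5, 1000))    # about one tag copy per genome
    print(classify_clone(990, 0, 1000))    # about two tag copies per genome
```

The ordering of the checks reflects the text: backbone incorporation is ruled out first, and only then is the tag copy number interpreted as monoallelic or biallelic.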
[0128] In further embodiments, the modified cells are assessed for the functional expression of the one or more detectable tags. For example, live cell imaging may be used to observe localization, expression intensity, and persistence of expression of the tagged endogenous protein in the modified stem cells described herein. In some embodiments, the expression of one or more detectable tags does not substantially or does not significantly alter the endogenous expression, localization, or function of the tagged protein. In some embodiments, the precise insertion of the tag sequence is analyzed by sequencing the edited target locus or a portion thereof. In some embodiments, the junctions between the endogenous genomic sequence and the 5’ and 3’ ends of the tag sequence are amplified. The amplification products derived from the population of edited cells are sequenced and compared with sequences of the corresponding target locus derived from a population of non-edited cells. In some embodiments, potential off-target sites for the crRNA sequences are determined using algorithms known in the art (e.g., Cas-OFFinder). To determine the presence of off-target cutting or insertions, these predicted off-target sites and the surrounding genomic sequences can be amplified and sequenced to determine the presence of any mutations or inserted tag sequences. Sequencing can be performed by a number of methods known in the art, e.g., Sanger sequencing and next-generation, high-throughput sequencing.
In some embodiments, the edited populations of cells can be assessed for the expression of transcription factors, cell surface markers, and other proteins or genes associated with stem cells (e.g., Oct3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3). Protein expression can be determined by a number of means known in the art including flow cytometry, ELISA, Western blots, immunohistochemistry, or co-immunoprecipitation. Gene expression can be determined by qPCR, microarray, and/or sequencing techniques (e.g., NGS, RNA-Seq, or ChIP-Seq). In some embodiments, the edited populations of cells can be assessed for the presence of the CRISPR/Cas9 ribonucleoprotein (RNP) complex and/or the donor polynucleotide. In some embodiments, edited stem cells that are determined to be pluripotent according to the methods outlined above may be cryopreserved for later differentiation or use.
B. Differentiation Assays
[0129] In some embodiments, the invention provides for methods of live-cell imaging in three dimensions using the stably tagged stem cell clones and methods described herein. In some embodiments, the present invention provides methods of assaying the differentiation potential of the edited stem cells and stably tagged clones thereof described herein. Such assays typically involve culturing edited stem cells or stably tagged clones thereof in media comprising one or more factors required for differentiation. Factors required for differentiation are referred to herein as “differentiation agents” and will vary according to the desired differentiated cell type. In some embodiments, the ability of the edited stem cells or stably tagged clones thereof described herein to differentiate into specialized cells is substantially similar to the ability of un-modified stem cells to differentiate into specialized cells. For example, in some embodiments, the edited stem cells and/or stably tagged clones thereof described herein are able to differentiate into substantially the same number of different types of specialized cells, differentiate at substantially the same rate (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more days to differentiate), and produce differentiated cells that are as viable and as functional as those derived from un-modified stem cells.
[0130] In some embodiments, the methods of assaying the differentiation potential of the edited stem cells and stably tagged clones thereof described herein include the addition of one or more test agents to a culture of edited stem cells or stably tagged clones thereof prior to, during, or after the addition of one or more differentiation agents. The edited stem cells or stably tagged clones thereof can then be visualized for changes in cellular morphology associated with the individual structural proteins tagged within each edited stem cell or stably tagged clone thereof. In some embodiments, these methods may be used to identify agents that promote differentiation into one or more cell lineages and therefore may be useful as differentiation agents. In some embodiments, these methods may be used to identify agents that disrupt or inhibit differentiation. In some embodiments, the stably tagged stem cells may be differentiated into any cell type, including but not limited to hematopoietic cells, neurons, astrocytes, dendritic cells, hepatocytes, cardiomyocytes, kidney cells, smooth muscle cells, skeletal muscle cells, epithelial cells, or endothelial cells.
[0131] In additional embodiments, the methods described herein can be used to produce edited stem cells in which one or more endogenous genes are tagged and which, when the tagged genes are differentially expressed or their products change localization, provide information regarding a potential disease state or condition. For example, the following genes can be tagged and monitored following differentiation. Mislocalization or misfolding of the protein products of these genes often indicates a disease condition or the potential for a disease condition. Shiny App values for each gene are also provided (internal database simulations and experiments), indicating their low level of expression in wild type human induced PSCs. Cells produced using the methods described herein with such genes tagged can provide a mechanism for examining correction of such errors via pharmacological or other intervention. Many such targets for the editing methods described herein are G-protein-coupled receptors (GPCRs).
[0132] Exemplary genes include:
• Tau, MAPT; PMID: 27378256; Accumulates in disease-specific manner in neurons in several significant disease states. Shiny App value: 0.42
• NF-kB, NFKB1 and NFKB2; Nuclear localization vital for inflammation and cancer states, regulator of many processes. Shiny App values: 2.8 and 4.9.
• SMAD3; Nuclear localization perturbed in ALS, other neurodegenerative disorders. Shiny App: 6
• FOXO3; Transcription factor with misregulated localization in cancers, acts as tumor suppressor. Shiny App: 4
• EGFR; Receptor with inappropriate localization in cancer states. Shiny App: 1
• GPCRs: many targets that are mislocalized in disease states. Very large druggable target class. Below are a few examples. GPCRs also tolerate tagging.
• PMID: 17878512
• PMID: 23161143
• AQP2, aquaporin. Perturbed localization in kidney models of diabetes and polycystic kidney disease. Shiny App: 0; PMID: 16825342
• AVP, vasopressin. Also perturbed in PKD; Shiny App: 0
• Rhodopsin, RHO. Perturbed localization in retinitis pigmentosa; Shiny App: 0
C. Screening Assays with Stably-Tagged Stem Cells and Cells Derived Therefrom
[0133] In some aspects, the present invention provides methods for drug screening to identify candidate therapeutic agents, and methods of screening agents to determine the effects of agents on the stably-tagged stem cell clones described herein and cells derived therefrom produced by the methods of the present invention. The methods may be employed to identify an agent having a desired effect on the cells. The stably-tagged stem cells of the present invention enable changes across multiple cell types to be assayed with the built-in control of the cell types all being derived from the same progenitor clone.
[0134] In some embodiments, methods are provided for determining the effect of agents including small molecules, proteins, nucleic acids, lipids or even physical or mechanical stress (e.g., UV light, temperature shifts, mechanical shear, etc.) by culturing a population of the stably-tagged stem cell clones described herein and cells derived therefrom in the presence and absence of the test agent(s). In some embodiments, agents that disrupt, alter, or modulate various key cellular structures and processes, including but not limited to cell division, microtubule organization, actin dynamics, vesicle trafficking, cell signaling, DNA replication, calcium regulation, ion channel regulators, and/or statins are assayed by the present methods. In some embodiments, the agent exerts a biological effect on the cells, such as increased cell growth or differentiation, increased or reduced expression of one or more genes, or increased or reduced cell death or apoptosis, etc. In particular embodiments, the stably-tagged stem cell clones used to screen for agents having a particular effect comprise a tagged protein associated with the cellular structure, process or biological activity being examined, such as any of the combinations of genes and structures shown in Tables 3 and 4. Exemplary agents are shown in FIG. 26A.
[0135] In a further embodiment, the method provides assaying the cells after the exposure period by any known method, including confocal microscopy, in order to determine changes in the content, orientation or cellular composition of the tagged structural protein contained within the given cell population. In one embodiment, a comparison can be made between the treated cells and untreated controls. In a further embodiment, a positive control may also be utilized in such methods. In some embodiments, one or more positive control agents with known effects on targeted structures may be applied to differentiated cell cultures derived from stably tagged stem cell clones and imaged, for example by confocal microscopy. The data obtained from these positive control experiments may be used as a training set for data that would allow for the automated assaying of different cellular structures in different cell types based on machine learning.
[0136] In some embodiments, the data obtained from these experiments are used to generate a signature for a test agent. In some embodiments, the method of generating a signature for a test agent comprises (a) admixing the test agent with one or more stably tagged stem cell clones; (b) detecting a response in the one or more stem cell clones; (c) detecting a response in a control stem cell; (d) detecting a difference in the response in the one or more stem cell clones from the control stem cell; and (e) generating a data set of the difference in the response. In some embodiments, the detected response in the stem cell clones and/or control cells is one or more of cell proliferation, microtubule organization, actin dynamics, vesicle trafficking, cell-surface protein expression, DNA replication, cytokine or chemokine production, changes in gene expression, and/or cell migration. In some embodiments, the control cell is a stably tagged stem cell clone that has not been exposed to the test agent or a control agent (e.g., a vehicle control). In some embodiments, the control cell is a stably tagged stem cell clone that has been exposed to a control agent (e.g., a vehicle control). In some embodiments, these methods are used to determine the toxicity of a test agent and/or to determine the optimal dose of a test agent required to induce or inhibit a particular cell function or cell response. In such embodiments, the difference in the response in the one or more stem cell clones from the control stem cell is quantified and used to generate a data set of the difference in the response. This data set can then be used as a training set for an algorithm to predict the effect of a related agent on a particular cellular function.
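Steps (b) through (e) of the signature method above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each response is represented as a list of replicate numeric measurements keyed by a readout name, and the difference is taken as the difference of replicate means. The function name `response_signature` and the readout names are hypothetical and are not part of the present disclosure.

```python
from statistics import mean


def response_signature(treated, control):
    """Per-readout difference between treated clones and control cells.

    treated, control: dicts mapping a readout name (e.g. "cell_migration")
    to a list of replicate measurements. Using the difference of means is an
    illustrative assumption; other difference measures could be substituted.
    """
    signature = {}
    for readout, treated_values in treated.items():
        if readout not in control:
            continue  # only compare readouts measured in both conditions
        signature[readout] = mean(treated_values) - mean(control[readout])
    return signature


if __name__ == "__main__":
    treated = {"cell_proliferation": [0.5, 1.5], "cell_migration": [2.0, 4.0]}
    vehicle = {"cell_proliferation": [2.0, 2.0], "cell_migration": [3.0, 3.0]}
    print(response_signature(treated, vehicle))
```

A collection of such signatures, one per test agent and dose, is the kind of data set that the paragraph above contemplates using as a training set.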
[0137] In some embodiments, stably tagged stem cell clones derived from diseased patients or stably tagged mutant stem cell clones can be differentiated into one or more differentiated cell types and assayed by the methods described herein to generate a cell-type specific data set related to a particular disease. In such embodiments, the cell proliferation, microtubule organization, actin dynamics, vesicle trafficking, cell-surface protein expression, DNA replication, cytokine or chemokine production, changes in gene expression, and/or cell migration of the differentiated cells can be determined at one or more time points during differentiation and maturation. Data sets derived from such assays can then be used as a training set for one or more disease-specific algorithms that can be applied to a cell sample derived from a patient to determine whether the patient has a disease, to determine the stage of disease, and/or to monitor the effects of a particular disease treatment. In some embodiments, the disease is selected from a disease characterized by aberrant cell growth, wound healing, inflammation, and/or neurodegeneration.
[0138] In some embodiments, methods are provided for live-cell imaging to observe intracellular protein localization, expression intensity, and persistence of expression in the modified stem cells or stably transfected stem cell clones described herein. In some embodiments, the expression of one or more detectable tags does not substantially or does not significantly alter the endogenous expression or localization of the tagged protein. In some embodiments, the invention provides for methods of live-cell imaging in three dimensions using the stably tagged stem cell clones and the cell culturing and plating and microscopy methods described herein.
IV. Kits
[0139] In some embodiments, provided herein are kits comprising the stably tagged stem cell clones described herein. In some embodiments, the kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a tagged endogenous protein. In some embodiments, each stably tagged stem cell clone expresses a different tagged endogenous protein. In some embodiments, the kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses a different tagged endogenous protein. In some embodiments, the kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses two or more tagged endogenous proteins. In some embodiments, the kits described herein comprise two or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins. In some embodiments, the kits described herein comprise 2, 3, 4, 5, 6, 7, 8, 9, 10 or more stably tagged stem cell clones, wherein each stably tagged stem cell clone expresses 2, 3, 4, 5, 6, 7, 8, 9, 10 or more tagged endogenous proteins. In some embodiments, each stably tagged stem cell clone expresses a group of tagged endogenous proteins that is different from the tagged endogenous proteins expressed by another stem cell clone in the same composition. Exemplary endogenous proteins that can be tagged in these embodiments are shown in Tables 1 and 2, including but not limited to PXN, TUBA1B, LMNB1, ACTN1, TOMM20, DSP, SEC61B, FBL, ACTB, MYH10, VIM, TJP1 (also known as ZO-1), AAVS1, MAP1LC3B (also known as LC3), ST6GAL1, LAMP1, CETN2, SLC25A17 (also known as PMP34), RAB5A, GJA1 (also known as connexin 43 (CX43)), MAPK1, ATP2A2, AKT1, CTNNB1, NPM1, HIST1H2BJ, CAGGS:HIST1H2BJ:2A:CAAX, PKD2, DMD, DES, SLC25A17 (also known as PMP34), SMC1A, NUP153, CTCF, CBX1, Oct4, Sox2, Nanog, TNNI1, ACTN2, TNNI3, MYL2, MYL7, TTN, SMAD, SMAD5, NKX2-5, MESP1, MIXL1, ISL1, CYP2E1, TF, HPX, ALB, PKD1, PKD2, KRT5, and LAMC2. In some embodiments, the kits also allow for building an entire “cell clinic” or reference set that comprises cell types from every major organ system, or those of interest, that allows for the interrogation of likely function of new genes and assaying of cellular toxicity.
[0140] In some embodiments, the present disclosure provides kits for assessing differentiation agents and/or the effect of compounds or drugs on the differentiation of stem cells. In some embodiments, the present disclosure provides a kit comprising one or more stably tagged stem cell clones expressing one or more tagged endogenous proteins. In some embodiments, the present disclosure provides a kit comprising a plurality of stably tagged stem cell clones expressing one or more tagged endogenous proteins. In some embodiments, the cells are provided as an array such that all cellular structures are tagged among a plurality of stably tagged stem cell clones.
[0141] In some embodiments, the kits described herein further comprise one or more agents known to elicit stem cell differentiation into one or more cell types. One of skill in the art would understand the appropriate media and agents for differentiation into various cell types. For example, a kit may include stably tagged stem cells and media containing Activin A for cardiomyocyte differentiation. Alternatively, a kit may include stably tagged stem cells and media containing factors described in Methods Mol Biol. 2014; 1210: 131-41 or Biomed Rep. 2017 Apr; 6(4): 367-373 for hepatocyte differentiation. Alternatively, a kit may include stably tagged stem cells and media containing factors described in Methods Mol Biol. 2017; 1597: 195-206 or Nat Commun. 2015 Oct 23; 6: 8715 for renal cell differentiation. Alternatively, a kit may include stably tagged stem cells and media containing factors described in Mol Psychiatry. 2017 Apr 18. doi: 10.1038/mp.2017.56 or Scientific Reports volume 7, Article number: 42367 (2017) for neuronal cell differentiation. Additional exemplary factors for producing differentiated cell types from human iPSCs are shown in FIG. 32. The stably tagged stem cells according to this embodiment may be provided in expanded form, for example, on a multi-well plate and ready for assay. Alternatively, the cells may be provided in a form that requires further expansion before plating and assaying.
[0142] In some embodiments, provided herein are kits comprising one or more differentiated cell types derived from one or more stably tagged stem cell clones. As used herein, “derived from,” for example, one or more stably tagged stem cell clones refers to cells that are differentiated from the stably tagged stem cell clones. In some embodiments, cells that are derived from stably tagged stem cell clones are terminally differentiated cells that are direct progeny of the stably tagged stem cell clones. Therefore, the differentiated cell types, like their stably tagged stem cell clone progenitors, also express tagged (e.g., with a detectable marker, such as, for example, GFP and the like) structural or non-structural proteins. In one embodiment, the kits provided herein comprise one or more differentiated cell types. In some embodiments, kits provided herein contain differentiated cell types from all three germ layers. In some embodiments, kits are provided containing differentiated cells of substantially all major cell types of the body derived from stably tagged stem cell clones. In some embodiments, the kits are provided on multi-well plates in assay ready format. In some embodiments, the cells are provided in a form that requires thawing, culturing and/or expanding the cells. In some embodiments, the differentiated cells derived from stably tagged stem cells are provided in an array such that for each cell type member in the array, a tagged protein member is provided such that every structure being studied is tagged in each cell type being assayed.
V. Methods, Cells and Kits For Differentially-Expressed Protein Tagging
[0143] In still further embodiments, provided herein are methods for producing a cell comprising at least one tagged endogenous, differentially-expressed protein. The methods described herein can be used to produce various cell types, including, for example, normal cells, cancer cells, tissue-specific cells, etc. In embodiments, the cells that are produced are stem cells, as described herein.
[0144] In embodiments, the methods are useful for producing at least one tagged endogenous, differentially-expressed protein. As used herein, an “endogenous, differentially-expressed protein” refers to a protein that is a wild-type protein, or a protein that comprises one or more naturally-occurring mutations and/or one or more introduced mutations, that is substantially expressed in one cellular state, but is non-substantially expressed in a different cellular state. An endogenous, differentially-expressed protein is non-substantially expressed in a first cellular state when that endogenous, differentially-expressed protein is produced at a level that is less than about 10% of the level of production in a second cellular state. Importantly, steps should be taken to provide that the endogenous, differentially-expressed protein is not expressed at all when the gene editing described herein is taking place, so as to allow the methods to modify the target gene(s) as required.
[0145] For example, as described herein, stem cells have the capacity to differentiate into at least one differentiated cell lineage, and in embodiments, the ability to differentiate into all three germ layers. Thus, in a first cellular state (i.e., as an undifferentiated stem cell), the level of an endogenous, differentially-expressed protein would be less than about 10%, suitably less than about 5%, less than about 1%, less than about 0.5%, less than about 0.1%, less than about 0.01%, and suitably, about 0%, of the level of production of the same endogenous, differentially-expressed protein in the second cellular state (i.e., a stem cell that is differentiated into one of the three germ layer cells). In further embodiments, cells that can contain an endogenous, differentially-expressed protein include, for example, cells that may differentially express a protein in transitioning from a normal cell to a cancerous cell, cells transitioning from a normal cell to a diseased cell, cells transitioning from a normal cell to a dying or apoptotic cell, etc. As described herein, in embodiments the endogenous, differentially-expressed protein exhibits no expression in a pluripotent stem cell, but is expressed (i.e., at a biologically meaningful level) in a differentiated cell. That is, in embodiments, the endogenous, differentially-expressed protein is specifically expressed only in a differentiated cell, and is not expressed in a pluripotent stem cell.
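The threshold in the definition above can be expressed as a simple ratio check. The following is a minimal sketch, assuming expression levels are available as non-negative numbers in matching (arbitrary) units; the function name and the example values are hypothetical and are not taken from the disclosure.

```python
def is_non_substantially_expressed(level_first_state, level_second_state, threshold=0.10):
    """Return True when expression in the first cellular state is below
    `threshold` (default: 10%) of expression in the second cellular state."""
    if level_first_state < 0 or level_second_state < 0:
        raise ValueError("expression levels must be non-negative")
    if level_second_state == 0:
        # No expression in the second state, so there is no reference level.
        return False
    return level_first_state / level_second_state < threshold


# Hypothetical values: a protein that is nearly silent in the stem cell
# and abundant after differentiation.
print(is_non_substantially_expressed(0.5, 100.0))   # True: 0.5% of the second-state level
print(is_non_substantially_expressed(25.0, 100.0))  # False: 25% of the second-state level
```

Passing a stricter `threshold` (for example 0.01) corresponds to the narrower ranges listed in the paragraph.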
[0146] In embodiments, the methods include providing a first nuclease specific for a target genomic locus of a differentially-expressed protein. The methods further include providing a donor plasmid that comprises a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; a third polynucleotide encoding a 5’ homology arm; and a fourth polynucleotide encoding a 3’ homology arm. The methods further include introducing the first nuclease and the donor plasmid into a cell such that the first and second polynucleotides are inserted into the target genomic locus. Cells expressing the first selectable marker are then selected. The methods also include introducing into the selected cells a second nuclease capable of excising the selection cassette to generate an endogenous protein tagged with the second selectable marker. As described herein, the methods suitably produce a cell comprising the at least one tagged endogenous, differentially-expressed protein, such that the tagged endogenous protein is substantially free of a scar sequence.
[0147] As described herein, various gene editing systems that employ a nuclease specific for a target genomic locus of a particular protein, i.e., a differentially-expressed protein, are readily used to modify a target locus of an endogenous DNA sequence. Examples include zinc-finger nuclease systems, TALEN systems, and in suitable embodiments, CRISPR/Cas systems. Nucleases specific for a target genomic locus are described throughout.
[0148] In embodiments, the donor plasmid that is provided includes a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker. As used herein, the term “selection cassette” refers to a polynucleotide sequence that contains one or more genes encoding one or more selectable markers, and also suitably including one or more linkers, spacers or flanking polynucleotide sequences; one or more constitutive regulatory elements; and a pair of excision sites. Suitably, the one or more linkers, spacers or flanking polynucleotide sequences are sequences that are not found in the genomic sequence of the cell being targeted by the method.
[0149] As used herein, a “selectable marker” refers to a gene that encodes a protein that is capable of being detected or observed, or confers a trait to allow preferential selection (whether positive or negative selection), thereby allowing detection, selection, identification or visualization of the cells that include the marker. Examples of selectable markers include antibiotic resistance genes (e.g., resistance to ampicillin, chloramphenicol, tetracycline or kanamycin, etc.), counterselectable markers that eliminate or inhibit growth upon selection (e.g., thymidine kinase), as well as detectable tags, as described herein, which include but are not limited to, FLAG tags, poly-histidine tags (e.g., 6xHis), SNAP tags, Halo tags, cMyc tags, glutathione-S-transferase tags, avidin, enzymes, fluorescent molecules, luminescent proteins, chemiluminescent proteins, bioluminescent proteins, and phosphorescent proteins.
[0150] The donor plasmid also further includes a second selectable marker that is suitably different than the first selectable marker, so as to allow for a two-selection approach to produce the desired cells, as described herein. The donor plasmid also includes a polynucleotide encoding a 5’ homology arm and a 3’ homology arm. As described herein, each of the 5’ and 3’ homology arms is at least about 500 base pairs long. In some embodiments, the homology arm sequences are at least about 1000 base pairs long. In embodiments, each of the 5’ and 3’ homology arm polynucleotide sequences is at least about 90% identical to an endogenous nucleic acid sequence located 5’ or 3’ to a particular endogenous target locus. In some embodiments, each of the homology arm sequences is at least about 95%, 96%, 97%, 98%, 99%, or 100% identical to an endogenous nucleic acid sequence located 5’ or 3’ to a particular endogenous target locus.
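The length and identity criteria for the homology arms lend themselves to a quick check. The sketch below is illustrative only: it assumes the arm and the endogenous reference have already been aligned to equal length and compares them position by position without gaps (a real workflow would likely use a dedicated alignment tool). The function names and the example sequences are hypothetical.

```python
def percent_identity(arm, reference):
    """Ungapped, position-by-position identity (%) between two equal-length sequences."""
    if not arm or len(arm) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), reference.upper()))
    return 100.0 * matches / len(arm)


def arm_meets_criteria(arm, reference, min_length=500, min_identity=90.0):
    """Check an arm against the minimum length (bp) and minimum identity (%)."""
    return len(arm) >= min_length and percent_identity(arm, reference) >= min_identity


# Hypothetical 1000 bp reference and an arm carrying 50 substitutions (95% identity).
reference = "ACGT" * 250
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
arm = "".join(swap[base] for base in reference[:50]) + reference[50:]

print(percent_identity(arm, reference))                       # 95.0
print(arm_meets_criteria(arm, reference))                     # True
print(arm_meets_criteria(arm, reference, min_identity=99.0))  # False
```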
[0151] In embodiments, the methods of production further include introducing the nuclease and the donor plasmid into a cell, such that the first and second polynucleotides are inserted into the target genomic locus. As described herein, the nuclease and the donor plasmid can be introduced via various methods of transfection, including lipofection, electroporation (e.g., Neon® transfection system or an Amaxa Nucleofector®), sonication, or nucleofection. In embodiments, the transfection occurs via electroporation, as described herein, suitably utilizing at least 1 pulse, suitably at least 2 pulses, and more suitably 1 to 5 pulses. In embodiments, the electroporation utilizes a pulse that is at least about 15 ms in length, at a voltage of at least about 1300 V. Additional lengths and voltages for use in electroporation are described herein.
[0152] In some embodiments, cells that express the first selectable marker are then selected. As described herein, in embodiments wherein the first selectable marker is a detectable marker such as a fluorescent protein, the cells can be selected via various cell sorting methods, including for example FACS. Selecting cells that express this first selectable marker provides a mechanism to ensure that the cells include the donor plasmid.
[0153] The methods suitably further include introducing into these selected cells (i.e., the cells that expressed the first selectable marker) a second nuclease that is capable of excising the selection cassette. Examples of such nucleases that can be used for such a gene editing approach are provided herein. As described herein, the selection of appropriate nucleases, excision sites, and flanking polynucleotide sequences results in an endogenous, differentially-expressed protein that is substantially free of a scar sequence. As used herein, “substantially free of a scar sequence” means that the tagged protein contains less than 34 nucleotides that are the result of the nuclease-facilitated excision, suitably less than 30 nucleotides, less than 20 nucleotides, suitably less than 10 nucleotides, more suitably less than 5 nucleotides that are the result of the nuclease-facilitated excision, suitably 4 nucleotides or less, 3 nucleotides or less, 2 nucleotides or less, 1 nucleotide, and suitably 0 nucleotides, that are residual from the excision. Traditionally, deleting exogenously introduced sequences was accomplished with site-specific recombinases or transposases. However, these enzymes can leave behind a “genomic scar” in the form of a residual 36-34 base pair loxP site or 4 bp transposase integration site. These extraneous sequences are avoided at the N-terminus of target genes, where regulatory sequences are more densely clustered, using the methods described herein. As noted herein, for example, the use of Cre/Lox for excision can in some cases result in a 34 base pair residual loxP “scar,” which can disrupt endogenous sequences important for proper regulation of the targeted gene. The methods provided herein eliminate this scar sequence, providing a “scarless” fusion product.
[0154] In embodiments, provided herein is a method for producing a stem cell comprising at least one tagged endogenous, differentially-expressed protein. As described with regard to FIG. 33A - FIG. 33C, the methods suitably include providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, differentially-expressed protein in a stem cell. A donor plasmid is provided, comprising polynucleotide sequences encoding a first selectable marker. As described herein, in embodiments, this first selectable marker is a detectable tag, suitably a fluorescent protein. As shown in FIG. 33A, in embodiments, this first detectable tag is a gene encoding an mCherry fluorescent protein.
[0155] The donor plasmid further includes a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker. As shown in FIG. 33A, such excision sites are generally on the order of about 5-40 base pairs in length, and suitably include sites that are specific for the nuclease selected to allow for precise removal of the first selectable marker, and suitably the 5’ excision site and the 3’ excision site are nucleic acid sequences that are not found in the target genome. In embodiments, further linker or spacer polynucleotides can be included on either side of the 5’ excision site and the 3’ excision site. The donor plasmid further includes a second selectable marker that is different from the first selectable marker, suitably located 3' from the first selectable marker (and the excision cassette). In embodiments, the second selectable marker is suitably a detectable tag, including a fluorescent protein such as GFP, so as to produce a cell that includes a tagged, endogenous differentially-expressed protein that can readily be detected (e.g., via imaging, cell sorting, etc.). The donor plasmid also suitably includes a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length. Suitable lengths and percent identity to a target, endogenous nucleic acid sequence for the 5’ and 3’ homology arms are provided herein.
[0156] In embodiments, as illustrated in FIG. 33B, the first ribonucleoprotein (RNP) complex, comprising the first Cas protein, the first CRISPR RNA (crRNA) and the first trans-activating RNA (tracrRNA), is suitably transfected into the stem cell, along with the donor plasmid. Exemplary methods of transfecting the complexes and donor plasmids are described herein. As a result of this transfection, the polynucleotides encoding the selection cassette (i.e., the first selectable marker, and the 5’ excision site and 3’ excision site flanking the first selectable marker) and the second selectable marker are inserted into the target genomic locus.
[0157] A selection is then carried out to select for stem cells that express the first selectable marker. Suitable selection methods include various cell sorting methods, such as FACS.
[0158] In embodiments, an additional transfection (e.g., transfection 2 in FIG. 33C) is carried out with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid. This second transfection results in the excision of the first selectable marker, and the generation of an endogenous, differentially-expressed protein that is tagged with the second selectable marker (i.e., a detectably tagged, endogenous, differentially-expressed protein). Stem cells that include this second selectable marker can be selected for by, for example, cell sorting for cells not containing the first selectable marker (e.g., FACS sorting for cells without an mCherry detectable tag), that is, a cell population that is substantially free of the first selectable marker (20% of the cells or less, suitably 1% or less, contain the first selectable marker). The resulting stem cells contain at least one tagged endogenous, differentially-expressed protein.
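The two sorting steps (positive selection for the first selectable marker after the first transfection, then negative selection after excision) can be sketched as simple gating on a per-cell fluorescence value. This is a toy illustration under stated assumptions: the gate value, the cell records, and the function names are hypothetical, and real FACS gating involves compensation and controls not modeled here.

```python
def sort_cells(cells, gate, keep_positive):
    """Keep cells at/above the mCherry gate (positive sort) or below it (negative sort)."""
    return [c for c in cells if (c["mcherry"] >= gate) == keep_positive]


def fraction_positive(cells, gate):
    """Fraction of cells whose mCherry signal is at or above the gate."""
    if not cells:
        return 0.0
    return sum(c["mcherry"] >= gate for c in cells) / len(cells)


GATE = 100.0  # hypothetical gate, arbitrary fluorescence units

# After transfection 1: keep cells that express the first selectable marker.
after_transfection_1 = [
    {"id": 1, "mcherry": 5.0},
    {"id": 2, "mcherry": 900.0},
    {"id": 3, "mcherry": 1200.0},
    {"id": 4, "mcherry": 10.0},
]
selected = sort_cells(after_transfection_1, GATE, keep_positive=True)
print([c["id"] for c in selected])  # [2, 3]

# After transfection 2: keep cells that have lost the first selectable marker.
after_transfection_2 = [
    {"id": 5, "mcherry": 8.0},
    {"id": 6, "mcherry": 950.0},
    {"id": 7, "mcherry": 4.0},
    {"id": 8, "mcherry": 6.0},
]
excised = sort_cells(after_transfection_2, GATE, keep_positive=False)
print([c["id"] for c in excised])                       # [5, 7, 8]
print(fraction_positive(excised, GATE) <= 0.20)         # True: population is substantially free
```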
[0159] In some embodiments, the first selectable marker is operably linked to a constitutive regulatory element such that, once it is successfully transfected into the target cell, the first selectable marker is expressed. That is, the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker. As used herein, “operably linked” means that the constitutive regulatory element is upstream from the selectable marker and capable of causing the first selectable marker to be transcribed. As the regulatory element is a “constitutive” element, it is unregulated, and allows for continual transcription of the first selectable marker once transfection has successfully occurred. Examples of constitutive regulatory elements that can be used in the various methods and plasmids described herein include, but are not limited to, a CAGGS promoter (1.6-kb hybrid promoter composed of the CMV immediate-early enhancer, CBA promoter, and CBA intron 1/exon 1), a hPGK promoter, an EF1-α promoter, a ubiquitin promoter (UBC promoter), and an actin promoter. In other embodiments, the constitutive regulatory element can be replaced with an inducible promoter, for example a tetracycline-inducible promoter (tet), and the like.
[0160] The methods described herein allow for modifying the 5’ end with a gene editing method, thus providing a method with more effective/flexible editing in a gene that could be sensitive to a leftover sequence from gene editing.
[0161] In additional embodiments, the donor plasmids useful in the methods described herein can further include microhomology containing sequences or linkers flanking the 5’ and 3’ excision sites. Microhomology containing sequences are suitably 5-25 base pair sequences that facilitate the ligation of mismatched hanging strands of polynucleotides, the removal of overhanging nucleotides, and the filling in of missing base pairs. In embodiments, the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences. As shown in FIG. 33A, in suitable embodiments, the microhomology containing sequences are useful to guide in-frame microhomology-mediated end joining repair.
[0162] The donor plasmid can include a polynucleotide that codes for a linker that links the second selectable marker and the tagged protein (and thus the tagged, differentially-expressed protein will include the linker). In embodiments, the linker is a protein sequence, including for example, Ser-Gly-Ser-Gly-Ser-Pro-Gly (SEQ ID NO: 288), Ser-Gly-Ser-Gly-Ser-Gly (SEQ ID NO: 289), Ser-Gly-Pro-Gly, or the ACTN2 linker: Val-Asp-Gly-Thr-Ala-Gly-Pro-Gly-Ser-Gly-Pro-Gly-Ser-Ile-Ala-Thr (SEQ ID NO: 290).
[0163] As described herein, in embodiments, and as illustrated in FIG. 33A, the 5’ and 3’ excision sites suitably include a TialL protospacer, for example an inverted TialL protospacer. These protospacers enable nuclease-mediated (for example, Cas9/CRISPR-mediated) excision of the selection cassette after the cells that express the first selectable marker have been selected. As noted herein, the 5’ and 3’ excision sites, including the TialL target sequence, are suitably absent from the target genome, including the human genome, and can be used to ligate distinct double strand breaks induced by Cas9. In embodiments, the TialL sites are oriented in the “PAM-out” orientation such that NHEJ-mediated double strand repair following Cas9 activity results in an in-frame mEGFP fusion with the target gene. The peptide linker sequences incorporated within the TialL sites can be designed and oriented such that NHEJ-based repair after excision results in an in-frame coding sequence with 12 bp of residual sequence (for example encoding Ser-Gly-Pro-Gly) that serves as a canonical linker between the mEGFP and the target gene. Use of TialL sites suitably provides three base pairs encoding Gly or Ser (depending on orientation), which are useful in protein engineering.
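To illustrate why a 12 bp residual sequence can act as an in-frame linker, the sketch below checks that a residual sequence is a multiple of three nucleotides and translates it codon by codon. The 12 bp sequence used here is a hypothetical example chosen to encode Ser-Gly-Pro-Gly; it is not the sequence of the disclosed constructs. The codon table is a subset of the standard genetic code covering only Ser, Gly, and Pro.

```python
# Subset of the standard genetic code (DNA codons) for Ser, Gly, and Pro only.
CODONS = {}
CODONS.update({c: "Ser" for c in ("TCT", "TCC", "TCA", "TCG", "AGT", "AGC")})
CODONS.update({c: "Gly" for c in ("GGT", "GGC", "GGA", "GGG")})
CODONS.update({c: "Pro" for c in ("CCT", "CCC", "CCA", "CCG")})


def translate_linker(dna):
    """Translate an in-frame residual sequence; raise if it would shift the frame."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("residual sequence is not a multiple of 3 bp (frameshift)")
    # A KeyError here means the codon is outside the Ser/Gly/Pro subset above.
    return "-".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


residual = "TCCGGACCCGGG"  # hypothetical 12 bp residual sequence
print(len(residual))               # 12
print(translate_linker(residual))  # Ser-Gly-Pro-Gly
```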
[0164] In embodiments, the first and/or second selectable markers are each at least about 8 amino acids in length, for example at least about 10 amino acids, at least about 20 amino acids, at least about 30 amino acids, at least about 40 amino acids, at least about 50 amino acids, at least about 60 amino acids, at least about 70 amino acids, at least about 80 amino acids, at least about 90 amino acids, or at least about 100 amino acids.
[0165] As described herein, exemplary selectable markers for use as the first and/or the second selectable markers include an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag. In embodiments, both the first and second selectable markers are detectable tags, and suitably are fluorescent proteins, including for example green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, or mTagRFPt. In some embodiments, as shown in FIG. 33A, the first selectable marker is mCherry, and the second selectable marker is GFP. As the fluorescence emission signals from these two detectable tags are sufficiently far apart (mCherry at about 600-620 nm; GFP at about 500-520 nm), this allows for detection and selection of cells that express mCherry (e.g., via fluorescence activated cell sorting (FACS)), followed by detection and/or selection, if desired, of the second selectable marker, GFP, including for use in imaging, as described herein. Selection for the second selectable marker, or selection for the lack of the first selectable marker, indicating the excision of the first selectable marker, can be utilized to select the cells containing the tagged, endogenous differentially-expressed proteins.
[0166] As described herein, suitably the nuclease system for both introduction of the donor plasmid and removal of the first selectable marker (selection cassette) is a CRISPR/Cas system. A detailed discussion of the CRISPR/Cas system is provided herein. In embodiments, the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1:1:1, and the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1:1:1. In suitable embodiments, the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein.
[0167] The methods provided herein are designed such that, in embodiments, the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid, and the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites. Suitably, off-target cleavage is less than about 5.0%, more suitably less than about 4.0%, less than about 3.0%, less than about 2.0%, less than about 1.0%, or less than about 0.5%.
[0168] In embodiments, a double-stranded break is generated at the target genomic locus after the excision of the first selectable marker. This double-stranded break can be repaired by various mechanisms, including for example, homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ). In exemplary embodiments, the use of microhomology linkers, including hexa- and tri-nucleotide repeats, allows for double-stranded break repair by MMEJ, with the microhomology linkers acting as a repair template during MMEJ. Use of such sequences biases excision repair outcomes and efficiently deletes the residual sequences remaining from Cas9 cleavage, including any protospacer adjacent motif (PAM) sequences that may have been included, leading to a scarless fusion product.
[0169] Traditional gene editing with CRISPR can leave a 12 bp scar, i.e., a 3 bp PAM sequence, 3 bp of protospacer left over from one side, 3 bp of protospacer left over from the other side, and a 3 bp PAM sequence. The methods described herein address this issue by multiple approaches. In one embodiment, the scar sequence can be repurposed as a peptide linker. A scar sequence is placed in between a gene sequence and the tag, and the scar becomes a linker via non-homologous end joining, i.e., a 4 amino acid linker. In additional embodiments, the scar can be deleted through microhomology mediated end joining (MMEJ). MMEJ can be used to insert nucleotides for amino acid linkers that are actually desired. For example, a sequence “A” that encodes the linker is placed on both sides of the scar and then MMEJ is utilized. The resulting product is [target gene - A - selectable marker]. MMEJ involves a deletion event such that one of the “A” sequences is deleted so that only one copy remains. MMEJ removes the scar by cutting out everything between the “A” sequences. Although the scar may be transiently present episomally, it is not present in the cells.
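The MMEJ outcome described above (two copies of a linker-encoding sequence “A” flanking the scar collapse into one copy, with everything between them deleted) can be modeled as a string operation. This is a conceptual sketch only: the sequences are short hypothetical placeholders, and the function models the idealized repair product rather than the cellular repair mechanism.

```python
def mmej_collapse(sequence, repeat):
    """Collapse the first two copies of `repeat` into one, deleting what lies between."""
    first = sequence.find(repeat)
    second = sequence.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("sequence must contain two non-overlapping copies of the repeat")
    # Keep everything through the first copy, then resume after the second copy.
    return sequence[:first + len(repeat)] + sequence[second + len(repeat):]


# Hypothetical placeholder sequences (not from the disclosure).
target_gene = "ATGAAATTT"
linker_a = "AGCGGC"        # sequence "A", here encoding Ser-Gly
scar = "CCATTTAAACGG"      # 12 bp placeholder for the residual scar
marker = "GTGAGCAAG"

pre_repair = target_gene + linker_a + scar + linker_a + marker
repaired = mmej_collapse(pre_repair, linker_a)

print(repaired == target_gene + linker_a + marker)  # True: [target gene - A - selectable marker]
print(scar in repaired)                             # False: the scar has been removed
```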
[0170] In embodiments, the cells that are produced using the method are induced pluripotent stem cells (iPSC) derived from a healthy donor, and can be a WTC cell or a WTB cell, as described herein. Cell types into which the iPSCs prepared in accordance with the methods herein can differentiate include, for example, a cardiomyocyte, a differentiated kidney cell, or a differentiated fibroblast.
[0171] In exemplary embodiments, the tagged protein that is produced via the methods described herein can be ACTN2, TNNI1, MYL2, MYL7, or TTN. Additional proteins known in the art that are differentially-expressed can also be readily tagged using the methods described herein.
[0172] In further embodiments, the methods described herein can include an additional selection step based on genetic screening to confirm that the second selectable marker has been properly inserted, in the proper position, and with appropriate functionality. Such screening methods can include, for example, use of genetic screening to determine at least two of the following: insertion of the second selectable marker sequence, stable integration of the donor plasmid, and/or relative copy number of the second selectable marker sequence. In exemplary embodiments, the genetic screening is performed by droplet digital PCR (ddPCR), tiled-junction PCR, or both.
[0173] In embodiments, the second selectable marker can be inserted into one or both alleles of the target genomic locus, while the plasmid backbone is not stably integrated. Genetic sequencing to identify clones with successful insertion of the second selectable marker can include amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products, sequencing the tiled-junction amplification products, and comparing the sequence of the tiled-junction amplification products with a reference sequence to confirm precise insertion of the second selectable marker.
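The comparison step can be illustrated with a toy check that each sequenced tiled-junction product appears verbatim in the expected (reference) sequence of the edited locus. This is a simplified sketch assuming error-free reads and exact matching; the sequences, amplicon names, and function name are hypothetical, and a practical pipeline would use alignment rather than substring matching.

```python
def find_imprecise_junctions(amplicon_reads, reference):
    """Return names of amplicons whose sequence is not found verbatim in the reference."""
    reference = reference.upper()
    return [name for name, seq in amplicon_reads.items() if seq.upper() not in reference]


# Hypothetical expected locus: 5' genomic region + inserted tag + 3' genomic region.
five_prime_genomic = "ACGTACGTAC"
tag = "GGGTTTCCCAAA"
three_prime_genomic = "TTGCATTGCA"
reference = five_prime_genomic + tag + three_prime_genomic

amplicon_reads = {
    "5p_junction": reference[6:16],                               # spans the 5' junction
    "3p_junction": reference[18:28],                              # spans the 3' junction
    "5p_junction_mutant": reference[6:11] + "N" + reference[12:16],  # carries a mismatch
}

print(find_imprecise_junctions(amplicon_reads, reference))  # ['5p_junction_mutant']
```

An empty list would indicate that every sequenced junction matches the reference in this simplified model.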
[0174] In embodiments, cells produced using the methods herein, and in particular stem cells, express at least one protein associated with pluripotency, including for example, one or more of Oct3/4, Sox2, Nanog, Tra-160, Tra-181, and SSEA3/4. Suitably, the expression level of the at least one protein associated with pluripotency is comparable to the expression level of the same protein in an unmodified cell or stem cell. In further embodiments, the stem cells produced using the methods described herein maintain a differentiation potential that is comparable to an unmodified stem cell, and suitably the morphology, viability, potency, and endogenous cellular function of the stem cells produced by the methods are not substantially changed compared to unmodified stem cells and differentiated cells thereof. That is, the stem cells produced using the methods described herein will function as normal stem cells, even with the inclusion of a tagged, endogenous differentially-expressed protein.
[0175] In still further embodiments, provided herein is a donor plasmid for use in the various methods described herein. Suitably, the donor plasmid includes polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, a second selectable marker that is different from the first selectable marker, a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
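The listed donor plasmid elements can be captured in a small data structure with basic consistency checks. The following is a sketch only: the class, field names, element ordering, and example values are assumptions for illustration (the ordering follows the description that the excision sites flank the first selectable marker and that the second selectable marker is suitably located 3' of it), and it does not represent an actual plasmid map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorPlasmid:
    five_prime_arm: str             # 5' homology arm sequence
    five_prime_excision_site: str
    promoter: str                   # constitutive regulatory element, e.g. "CAGGS"
    first_marker: str               # e.g. "mCherry"
    three_prime_excision_site: str
    second_marker: str              # e.g. "GFP"
    three_prime_arm: str            # 3' homology arm sequence

    def validate(self, min_arm_length=1000):
        """Return a list of problems; an empty list means the basic checks pass."""
        problems = []
        if self.first_marker == self.second_marker:
            problems.append("first and second selectable markers must differ")
        arms = (("5' homology arm", self.five_prime_arm),
                ("3' homology arm", self.three_prime_arm))
        for label, arm in arms:
            if len(arm) < min_arm_length:
                problems.append(f"{label} is shorter than {min_arm_length} bp")
        return problems


# Hypothetical example: the 3' arm is too short for the 1 kb criterion.
plasmid = DonorPlasmid(
    five_prime_arm="A" * 1000,
    five_prime_excision_site="excision-site-5p",
    promoter="CAGGS",
    first_marker="mCherry",
    three_prime_excision_site="excision-site-3p",
    second_marker="GFP",
    three_prime_arm="C" * 800,
)
print(plasmid.validate())  # ["3' homology arm is shorter than 1000 bp"]
```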
[0176] As described herein, suitably the constitutive regulatory element is a CAGGS promoter or a hPGK promoter, though other promoters as described herein can also be utilized. The donor plasmid can also further include microhomology containing sequences flanking the 5’ and 3’ excision sites, and suitably the microhomology containing sequences include tri-nucleotide or hexa-nucleotide repeat sequences. In embodiments, the donor plasmid can further include a flexible linker sequence. Suitably, the polynucleotide sequences encoding the first and second selectable markers are each at least about 20 nucleotides in length, more suitably the polynucleotide sequences encoding the first and second selectable markers are each between about 300 nucleotides and about 3000 nucleotides in length, or the polynucleotide sequences encoding the first and second selectable markers can each be greater than about 3000 nucleotides in length. The first and second selectable markers encoded by the polynucleotides are suitably each at least about 8 amino acids in length, or can be between about 8 and about 100 amino acids in length.
[0177] As described throughout, the first and/or second selectable marker is suitably an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag. Suitable first and second selectable markers include detectable tags, such as fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt. As shown in FIG. 33A, in embodiments, the first selectable marker is mCherry, and the second selectable marker is GFP.
[0178] Also provided herein are stably tagged cells generated by inserting the donor plasmids described herein into a genomic locus targeted by the 5’ and 3’ homology arms.
[0179] As described herein, the donor plasmids and methods described herein are suitably used to prepare tagged proteins that can be imaged, allowing for detection, imaging, tracking and studying of proteins that are silent in an undifferentiated cell (i.e., a stem cell), but differentially-expressed, that is, turned on, when the cell differentiates into one or more further cell types. The cells prepared herein can be part of a tissue, including a living tissue. Imaging methods described herein can allow for three-dimensional imaging of the cells and the tagged proteins, allowing for determination of location of the tagged proteins during various cell stages, etc. Various methods of imaging cells, including 3-D imaging, are described herein.
[0180] In additional embodiments, provided herein is a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, and a second selectable marker that is different from the first selectable marker. As described herein, the target genomic locus is suitably a locus of a gene encoding a differentially-expressed protein.
[0181] Also provided is a cell comprising a CRISPR/Cas9 ribonucleoprotein (RNP) complex and a donor polynucleotide, the donor polynucleotide comprising polynucleotide sequences encoding a first selectable marker, a constitutive regulatory element operably linked to the first selectable marker, a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker, a second selectable marker that is different from the first selectable marker, and a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
[0182] As described herein, in embodiments, the cells include microhomology containing sequences flanking the 5’ and 3’ excision sites, suitably containing sequences comprising tri-nucleotide or hexa-nucleotide repeat sequences. The 5’ and 3’ excision sites each can comprise a TialL protospacer, including an inverted TialL protospacer.
[0183] Various lengths and compositions of the first and second selectable markers are described herein, including the use of fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
[0184] Also provided herein are cells comprising an endogenous, differentially-expressed protein stably tagged with a selectable marker, suitably wherein the selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag. In embodiments, the selectable marker is a detectable tag, such as a fluorescent protein, suitably selected from green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
[0185] As described herein, in embodiments the cell is an undifferentiated stem cell, and the differentially-expressed protein is not expressed in the undifferentiated stem cell, but is expressed in a differentiated cell derived from the undifferentiated stem cell. Also provided herein are differentiated cells or groups of differentiated cells, wherein the differentiated cells or group of differentiated cells are cardiomyocytes, differentiated kidney cells, or differentiated fibroblasts, and include a tagged, differentially-expressed protein produced using the methods and plasmids described herein.
[0186] In further embodiments, provided herein are kits comprising an array of stem cells comprising at least one tagged endogenous, differentially-expressed protein, suitably produced using the various methods and plasmids described herein. The kits, which comprise an array of the cells described herein, can be used for visualizing one or more proteins during differentiation, or for selecting differentiated cells. Suitably, the visualizing of the one or more proteins is performed by fluorescent microscopy, and the differentiated cells express at least one tagged endogenous protein.
[0187] Also provided herein are methods of generating a signature for a test agent comprising admixing the test agent with one or more cells produced by the various methods described herein, detecting a response in the one or more cells, detecting a response in a control cell (i.e., a cell that does not include a test agent), detecting a difference in the response in the one or more cells from the control cell, and generating a data set of the difference in the response.
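By way of non-limiting illustration only, the generation of a data set of the difference in response can be sketched as follows. The readout names, the use of a mean over replicate measurements, and the function name `response_signature` are assumptions introduced for this sketch and are not prescribed by the methods described herein.

```python
from statistics import mean

def response_signature(treated, control):
    """Return, for each readout present in both inputs, the mean response of the
    cells admixed with the test agent minus the mean response of control cells."""
    shared = treated.keys() & control.keys()
    return {name: mean(treated[name]) - mean(control[name]) for name in sorted(shared)}

# Hypothetical replicate measurements (arbitrary units).
treated = {"GFP_intensity": [120.0, 130.0], "cell_area": [50.0, 54.0]}
control = {"GFP_intensity": [100.0, 110.0], "cell_area": [51.0, 53.0]}

signature = response_signature(treated, control)
print(signature)
```

In this sketch a positive value indicates a readout that is higher in the presence of the test agent than in the control cell; other difference measures (for example, ratios) could equally be used.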
[0188] The cells produced by the various methods described herein can also be utilized in various activities, such as, determining a genetic or protein target for a test agent or drug within a cell, determining toxicity of a test agent on the cell, determining the stage of disease in the cell, determining the dose of a test agent or drug for treatment of disease, monitoring disease progression in the cell, and monitoring effects of treatment of a test agent or drug on the cell. Additional uses of the cell include monitoring progression of disease or effect of a test agent on a disease wherein the disease is selected from the group consisting of aberrant cell growth, wound healing, inflammation, immune disorders, genetic disorders, neurodegeneration, and neuromuscular degeneration.
IV Tagging of Stimuli-Responsive Genes
[0189] In further embodiments, provided herein are methods for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene. As used herein a “stimuli-responsive gene” refers to a gene that turns on or is activated in response to an external stimulus, an environmental factor, or an added compound or substance. Examples of stimuli-responsive genes include genes that are turned on or activated in response to stress, heat, light, oxidation, ionizing radiation, metal-induced toxicity, or in response to a foreign compound or drug.
[0190] The methods provided herein for tagging an endogenous, stimuli-responsive gene allow for the production of cells in which insertion of the tag can be confirmed without triggering the stimuli-responsive gene (i.e., independent of the activation of the stimuli-responsive gene). Thus, cells can be produced with tagged stimuli-responsive genes, and then later exposed to a stimulus or compound, at which time the stimuli-responsive gene becomes activated, and the tag is expressed and confirmed, either visually or via another mechanism.
[0191] In embodiments, the methods comprise a) providing a first nuclease specific for a target genomic locus of a stimuli-responsive gene; b) providing a donor plasmid comprising: i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; iii. a third polynucleotide encoding a 5’ homology arm; and iv. a fourth polynucleotide encoding a 3’ homology arm; c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus; d) selecting cells expressing the first selectable marker; and e) introducing into the cells of (d) a second nuclease capable of excising the selection cassette to generate an endogenous, stimuli-responsive gene tagged with the second selectable marker; thereby producing the cell comprising the at least one tagged endogenous, stimuli-responsive gene.
[0192] As described herein, methods for producing cells containing endogenous tagged genes can be carried out using various gene editing methods, including those based on TALENs, zinc finger nucleases, CRISPR-Cas, etc. In embodiments, also provided herein is a method for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising: a) providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, stimuli-responsive gene in a cell; b) providing a donor plasmid comprising polynucleotide sequences encoding: i. a first selectable marker; ii. a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; iii. a second selectable marker that is different from the first selectable marker; and iv. a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length; c) transfecting the complex of (a) and the donor plasmid of (b) into the cell such that the polynucleotide sequences encoding (i) to (iii) are inserted into the target genomic locus; d) selecting cells expressing the first selectable marker; and e) transfecting the cells of (d) with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous stimuli-responsive gene tagged with the second selectable marker, thereby producing the cell comprising at least one tagged endogenous, stimuli-responsive gene.
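By way of non-limiting illustration only, the two-stage logic of the method (insertion of a selection cassette together with the tag, followed by excision of the cassette) can be sketched as a manipulation of an ordered list of elements. The relative order of the elements shown, the element labels, and the treatment of excision as removal of everything from the 5’ excision site through the 3’ excision site are simplifying assumptions made for this sketch; they do not describe the actual repair outcome at the nucleotide level.

```python
def excise_cassette(elements, five_site="5' excision site", three_site="3' excision site"):
    """Return the element list with the cassette removed, i.e. without the
    two excision sites and without anything located between them."""
    start = elements.index(five_site)
    end = elements.index(three_site)
    if start > end:
        raise ValueError("5' excision site must precede 3' excision site")
    return elements[:start] + elements[end + 1:]

# Hypothetical layout of the integrated donor sequence (order assumed).
integrated = [
    "5' homology arm",
    "second selectable marker",          # the tag retained at the locus
    "5' excision site",
    "constitutive regulatory element",
    "first selectable marker",           # used only to select edited cells
    "3' excision site",
    "3' homology arm",
]

tagged_locus = excise_cassette(integrated)
print(tagged_locus)
```

The sketch reflects why two different markers are used: the first marker is constitutively expressed and reports insertion even when the target gene is silent, while only the second marker remains at the locus after excision.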
[0193] Exemplary stimuli-responsive genes include those that turn on or activate in response to endoplasmic reticulum stress, ionizing radiation stress, heat shock, oxidative stress, metal-induced toxicity, or drug-induced toxicity, as well as other external stimuli. Examples of stimuli-responsive genes that can be tagged using the methods described herein are provided in the chart below. Additional stimuli-responsive genes include those involved in intracellular signaling pathways that are activated in response to stress or toxicity. Also provided are references where additional information regarding each of the genes, including sequence information of the genes for production of 5’ and 3’ homology arms, can be found.
[Chart of exemplary stimuli-responsive genes and associated references: figure images imgf000064_0001, imgf000065_0001 and imgf000066_0001]
[0194] Additional stimuli-responsive genes and their sequence information can also be found in AmiGO 2 from the Gene Ontology (GO) Consortium, which can be used to identify additional genes positively or negatively regulated in response to various biological stimuli (e.g., X-ray, heat, hypoxia, etc.), and can be found at amigo.geneontology.org/amigo/dd_browse. Additional information on AmiGO 2 can be found in Ashburner et al., “Gene ontology: tool for the unification of biology,” Nat Genet 25(1):25-9 (2000), The Gene Ontology Consortium, “The Gene Ontology Resource: 20 years and still GOing strong,” Nucleic Acids Res 47(D1):D330-D338 (2019) and Carbon et al., “AmiGO: online access to ontology and annotation data,” Bioinformatics 25(2):288-289 (2009), the disclosures of each of which are incorporated by reference herein in their entireties.
[0195] Examples of cells that can be produced with tagged endogenous, stimuli-responsive genes, using the methods described herein, include any mammalian or human primary cells or cell lines, including lung cells, endothelial cells, muscle cells, liver cells, brain cells, nerve cells, immune cells, cartilage cells, cancer cells, etc.
[0196] In additional embodiments, a gene involved in sensing or promoting apoptosis in a cell can also be tagged, such that the effect of a stress, compound, etc., on the apoptotic response of the cell can be visually or otherwise tracked prior to the cell actually undergoing apoptosis.
[0197] The various methods described herein with regard to tagging endogenous genes in stem cells can be extended to producing the tagged cells which contain an endogenous, stimuli-responsive gene, using similar methods, approaches, components, etc.
[0198] The methods and cells produced herein in which an endogenous, stimuli-responsive gene has been tagged can provide various research and clinical advantages. For example, cells can be placed under various stress situations, including heat, cold, radiation, or situations where such stresses may be occurring, to view or otherwise track the response of the cells, as well as potentially determine methods that can intervene to stop or avert the stress response.
[0199] The methods and cells containing tagged endogenous, stimuli-responsive genes can also be used as drug screening or toxicity assays for potential new chemical compounds. Drugs can be provided to the cells in a controlled environment, suitably in cell culture or in situ, and the response monitored visually (if using fluorescence or other visual tags) or otherwise tracked to determine if the toxicity or stress response(s) of the cells are activated. In addition, agents that can counter toxicity-causing compounds can also be screened using such cells and methods.
[0200] As described herein with regard to tagging of differentially expressed genes and proteins, the selection cassette of (b) suitably further comprises 5’ and 3’ excision sites flanking the first selectable marker.
[0201] In embodiments, the cell comprising the at least one tagged endogenous, stimuli-responsive gene is substantially free of the first selectable marker.
[0202] Suitably, the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker. In embodiments, the constitutive regulatory element is a CAGGS promoter, a UBC promoter, an EF1-α promoter, an actin promoter, or a hPGK promoter.
[0203] Suitably, the donor plasmid of (b) further comprises microhomology-containing sequences flanking the 5’ and 3’ excision sites. In embodiments, the microhomology-containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences. Suitably, the 5’ and 3’ excision sites each comprise a Tia1L protospacer, including where the Tia1L protospacer is an inverted Tia1L protospacer.
[0204] In embodiments, the first and/or second selectable markers are each at least about 8 amino acids in length, and in embodiments the first and/or second selectable markers are each at least about 100 amino acids in length.
[0205] Exemplary first and/or second selectable markers are described herein, and suitably can be an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag. In embodiments, the first and second selectable markers are fluorescent proteins, including those selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt. Suitably, the first selectable marker is mCherry, and the second selectable marker is GFP.
[0206] In embodiments where the first selectable marker is a fluorescent protein, the selecting of (d) suitably comprises fluorescence activated cell sorting (FACS). The methods can further comprise (f): selecting cells expressing the second selectable marker. In embodiments, the second selectable marker is a fluorescent protein and the second selection step comprises fluorescence activated cell sorting (FACS).
[0207] Suitably, the first nuclease and/or the second nuclease is a Cas nuclease, a TALEN, or a zinc finger nuclease, and in embodiments, the first nuclease and/or the second nuclease is a Cas protein, including Cas9. In embodiments, the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1:1:1. In additional embodiments, the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1:1:1. Suitably, the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein. In exemplary embodiments, the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid. Suitably, the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites. The methods provided herein suitably result in off-target cleavage that is less than about 1.0%.
[0208] In embodiments, a double-stranded break is generated at the target genomic locus after step (c). Suitably, the double-stranded break is repaired by homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ). In exemplary embodiments, the donor plasmid acts as a repair template during MMEJ. Suitably, protospacer adjacent motif (PAM) sequences are removed from the donor plasmid after insertion into the target genomic locus.
[0209] In exemplary embodiments of the methods described herein, the introducing or transfecting of (c) occurs by electroporation. Suitably, the electroporation comprises at least 1 pulse (more suitably at least 1-5 pulses, including 2 pulses), and in embodiments the pulse is at least about 15 ms at a voltage of at least about 1300 V.
[0210] In suitable embodiments, at least about 0.1% of the cells express the first selectable marker after step (c).
[0211] In exemplary embodiments, the second selection step further comprises genetic screening to determine at least two of the following: insertion of the second selectable marker sequence; stable integration of the donor plasmid; and/or relative copy number of the second selectable marker sequence. In embodiments, the genetic screening is performed by droplet digital PCR (ddPCR), tiled-junction PCR, or both. Suitably, selecting the clones having an insertion of the second selectable marker comprises selecting clones that have the second selectable marker inserted into one or both alleles of the target genomic locus and do not have stable integration of the plasmid backbone. In embodiments, the methods further comprise sequencing clones having an insertion of the second selectable marker to identify clones that have a precise insertion of the second selectable marker. In exemplary embodiments, the clones that have a precise insertion are identified by: (a) amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products; (b) sequencing the tiled-junction amplification products of (a); and (c) comparing the sequence of the tiled-junction amplification products with a reference sequence.
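By way of non-limiting illustration only, the clone selection criteria described above can be sketched as a simple filter. The record fields, the rounding of ddPCR results to integer copy numbers, the assumption of a diploid locus, and the short example sequences are all hypothetical and are introduced solely for this sketch.

```python
from dataclasses import dataclass

@dataclass
class CloneScreen:
    name: str
    tag_copies: int       # rounded ddPCR copy number of the tag (hypothetical field)
    backbone_copies: int  # rounded ddPCR copy number of plasmid backbone (hypothetical field)
    junction_seq: str     # sequence read from the tiled-junction amplification product

def is_candidate(clone):
    """Tag present on one or both alleles and no stably integrated backbone."""
    return clone.tag_copies in (1, 2) and clone.backbone_copies == 0

def is_precise(clone, reference_seq):
    """Junction sequence is identical to the expected reference sequence."""
    return clone.junction_seq.upper() == reference_seq.upper()

reference = "ATGGTGAGCAAG"  # hypothetical expected junction sequence
clones = [
    CloneScreen("c1", 1, 0, "ATGGTGAGCAAG"),  # mono-allelic, clean, precise
    CloneScreen("c2", 2, 1, "ATGGTGAGCAAG"),  # backbone integrated
    CloneScreen("c3", 1, 0, "ATGGTGAGCAG"),   # junction differs from reference
]

kept = [c.name for c in clones if is_candidate(c) and is_precise(c, reference)]
print(kept)
```

An exact string comparison is the strictest possible reading of "precise insertion"; an alignment-based comparison could be substituted where sequencing reads do not span identical coordinates.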
[0212] In further embodiments, provided herein is a cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding: a first selectable marker; a constitutive regulatory element operably linked to the first selectable marker; a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a stimuli-responsive gene.
[0213] As described herein, the cells further comprise microhomology-containing sequences flanking the 5’ and 3’ excision sites, suitably where the microhomology-containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences. Suitably, the 5’ and 3’ excision sites each comprise a Tia1L protospacer, including where the Tia1L protospacer is an inverted Tia1L protospacer.
[0214] As described herein, in embodiments, the first and/or second selectable markers are each at least about 8 amino acids in length, and suitably the first and/or second selectable markers are each at least about 100 amino acids in length. In embodiments, the first and/or second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag. Suitably, the first and second selectable markers are fluorescent proteins, including green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt. In embodiments, the first selectable marker is mCherry, and the second selectable marker is GFP.
[0215] In further embodiments, provided herein are methods for producing a cell comprising at least one tagged endogenous gene. Examples of tagged endogenous genes include genes coding for structural proteins, membrane proteins, and various other cellular components.
[0216] The methods provided herein for tagging an endogenous gene allow for the production of cells in which insertion of the tag can be confirmed without triggering the endogenous gene (i.e., independent of the activation of the endogenous gene). Thus, cells can be produced with tagged endogenous genes, and when the endogenous gene is activated, the tag is expressed and confirmed, either visually or via another mechanism.
[0217] In embodiments, the methods comprise a) providing a first nuclease specific for a target genomic locus of an endogenous gene; b) providing a donor plasmid comprising: i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker; ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker; iii. a third polynucleotide encoding a 5’ homology arm; and iv. a fourth polynucleotide encoding a 3’ homology arm; c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus; d) selecting cells expressing the first selectable marker; and e) introducing into the cells of (d) a second nuclease capable of excising the selection cassette to generate an endogenous gene tagged with the second selectable marker; thereby producing the cell comprising the at least one tagged endogenous gene.
[0218] The various methods described herein for tagging endogenous genes can readily be applied to this general method for endogenous gene tagging.
EXAMPLES
[0219] The following examples are for the purpose of illustrating various exemplary embodiments of the invention and are not meant to limit the scope of the present invention in any fashion. Alterations, modifications, and other changes to the described embodiments which are encompassed within the spirit of the invention as defined by the scope of the claims are specifically contemplated.
Example 1 - A ribonucleoprotein-based CRISPR/Cas9 system to create Fluorescent Protein-tagged hiPSC lines
[0220] The CRISPR/Cas9 system was used to introduce a GFP tag into the genomic loci of various proteins by HDR-mediated incorporation. Exemplary proteins tagged by the methods described herein are shown in Tables 1 and 2 above. Experiments were designed to introduce GFP at the N- or C-terminus along with a short linker using a CRISPR/Cas9 RNP and a donor plasmid encoding the full length GFP protein (FIG. 1A). The donor plasmid contained homology arms, each about 1 kb in length, on either side of the GFP operably linked to a linker sequence, and a bacterial selection sequence in the backbone. The example in the schematic shows successful N-terminal tagging via HDR resulting in the tag and linker being inserted after the endogenous start codon (ATG) in frame with the first exon (FIG. 1A, right panel). FIG. 1B illustrates a schematic of donor plasmids for N-terminal tagging of LMNB1 and C-terminal tagging of DSP.
crRNA Design
[0221] Custom synthetic crRNAs and their corresponding tracrRNAs were ordered from either IDT or Dharmacon. FIG. 13 shows the predicted genome wide CRISPR/Cas9 binding sites, categorized according to sequence profile and location with respect to genes. At least two independent crRNA sequences were used in each editing experiment in an effort to maximize editing success and elucidate the potential significance of possible off-target effects in the clonal cell lines generated (FIG. 13A). Predicted alternative CRISPR/Cas9 binding sites were categorized for each crRNA used and each predicted off-target sequence was categorized according to its sequence profile (the number of mismatches and RNA or DNA bulges it contains relative to the crRNA used in the experiment and their position relative to the PAM) (FIG. 13B and 13C). Cas-OFFinder was used to discriminate between crRNA sequences with respect to their genome-wide specificity (Bae et al., (2014) Bioinformatics, 30(10):1473-1475) by identifying all alternative sites genome-wide with < 2 mismatches/bulges in the non-seed and/or < 1 mismatch/bulge in the seed region, with an NGG or NAG PAM. As indicated in FIG. 13A, the seed and non-seed region of a crRNA binding sequence was defined with respect to its proximity to the PAM sequence. All predicted off-target sites were additionally categorized according to their location with respect to annotated genes (FIG. 13D). Genomic location was defined as follows:
(a) exon: inside exon or within 50 bp of exon;
(b) genic: in intron (but >50 bp from an exon) or within 200 bp of an annotated gene;
(c) non-genic: >200 bp from an annotated gene.
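By way of non-limiting illustration only, the three location categories defined above can be expressed as a small classification function. The representation of a site by two distances (to the nearest exon and to the nearest annotated gene, each taken as 0 when the site lies inside the feature) and the function name `classify_site` are assumptions made for this sketch.

```python
def classify_site(dist_to_exon_bp, dist_to_gene_bp):
    """Classify a predicted off-target site by genomic location.

    dist_to_exon_bp: distance to the nearest exon (0 if inside an exon)
    dist_to_gene_bp: distance to the nearest annotated gene (0 if inside a gene)
    """
    if dist_to_exon_bp <= 50:
        return "exon"        # (a) inside an exon or within 50 bp of one
    if dist_to_gene_bp <= 200:
        return "genic"       # (b) intronic, or within 200 bp of an annotated gene
    return "non-genic"       # (c) more than 200 bp from an annotated gene

# Hypothetical sites given as (distance to exon, distance to gene) in bp.
sites = [(0, 0), (50, 0), (51, 0), (500, 200), (500, 201)]
labels = [classify_site(e, g) for e, g in sites]
print(labels)
```

The exon test is applied first so that a site near an exon is never reported as merely genic, which mirrors the order of the definitions (a) to (c).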
[0222] When possible, crRNAs targeting Cas9 to within 50 bp of the intended GFP integration site were used, with a strong preference for any crRNAs with binding sites within 10 bp. A subset of CRISPR/Cas9 alternative binding sites identified by Cas-OFFinder were selected for sequencing and FIG. 13E shows the breakdown of sequenced off-target sites by genomic location with respect to annotated genes. Numbers above bars represent the number of clones sequenced for each experiment. All 406 sequenced sites were found to be wild type.
[0223] Only crRNAs unique within the human genome were used with one unavoidable exception (TOMM20, where the locus sequence restricted crRNA choice), and crRNAs whose alternative binding sites include mismatches in the “seed” region and are in non-genic regions were prioritized whenever possible. Table 5 below shows exemplary polynucleotide sequences of the crRNA sequences.
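By way of non-limiting illustration only, the distance-based preference for crRNAs can be sketched as a two-tier ranking. The candidate names, the use of a single coordinate per crRNA binding site, and the function name `tier_crrnas` are hypothetical; the sketch considers distance alone and does not model the specificity criteria that were also applied.

```python
def tier_crrnas(candidates, insertion_site, max_dist=50, preferred_dist=10):
    """Group candidate crRNAs by distance (bp) from the intended insertion site.

    candidates: iterable of (name, position) pairs on the same coordinate system
    Returns a dict with 'preferred' (<= preferred_dist) and 'acceptable'
    (<= max_dist) lists, each sorted from nearest to farthest.
    """
    tiers = {"preferred": [], "acceptable": []}
    for name, position in candidates:
        distance = abs(position - insertion_site)
        if distance <= preferred_dist:
            tiers["preferred"].append((distance, name))
        elif distance <= max_dist:
            tiers["acceptable"].append((distance, name))
    return {tier: [name for _, name in sorted(items)] for tier, items in tiers.items()}

# Hypothetical candidates around an insertion site at coordinate 1000.
candidates = [("cr1", 1045), ("cr2", 1004), ("cr3", 880), ("cr4", 992), ("cr5", 1020)]
tiers = tier_crrnas(candidates, insertion_site=1000)
print(tiers)
```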
Table 5: Exemplary crRNA sequences
[Table 5 content: figure images imgf000072_0001, imgf000073_0001 and imgf000074_0001]
Donor plasmid design
[0224] Donor plasmids were designed for each target locus and contained design features specific to each target and a GFP-encoding nucleic acid sequence (See, e.g., FIG. 1A and FIG. 1B). Homology arms of about 1 kb in length and corresponding to the endogenous DNA regions located 5’ and 3’ to the target insertion site were designed from the hg38 reference genome and were corrected for known SNPs in WTC11 cells. Unique linkers for each locus were used and were inserted 5’ of the GFP sequence for C-terminal tagging of the endogenous protein or 3’ of the GFP sequence for N-terminal tagging of the endogenous protein. When necessary, mutations were introduced to the plasmid backbone to prevent crRNA binding and Cas9-mediated cleavage of the plasmid. Plasmids were initially created either by In-Fusion assembly of gBlock pieces (IDT) into a pUC19 backbone, or the plasmids were synthesized and cloned into a pUC57 backbone by Genewiz. All plasmids were deposited in the Addgene database. Donor plasmids were diluted to working concentrations of 1 µg/µL in TE. In some experiments, higher concentrations of donor plasmid were used, but lower concentrations (<500 ng/µL) were avoided. Table 6 below illustrates nucleic acid sequences for exemplary plasmid inserts comprising GFP detectable tags, homology arms targeting the indicated genes, and linkers including:
(a) 5’ paxillin homology arm (SEQ ID NO: 6) - linker (SEQ ID NO: 278) - EGFP - 3’ paxillin homology arm (SEQ ID NO: 21);
(b) 5’ SEC61B homology arm (SEQ ID NO: 7) - mEGFP - linker (SEQ ID NO: 279) - 3’ SEC61B homology arm (SEQ ID NO: 22);
(c) 5’ TOMM20 homology arm (SEQ ID NO: 9) - linker (SEQ ID NO: 281) - mEGFP - 3’ TOMM20 homology arm (SEQ ID NO: 24);
(d) 5’ TUBA1B homology arm (SEQ ID NO: 10) - mEGFP - linker (SEQ ID NO: 282) - 3’ TUBA1B homology arm (SEQ ID NO: 25);
(e) 5’ LMNB1 homology arm (SEQ ID NO: 4) - mEGFP - linker (SEQ ID NO: 276) - 3’ LMNB1 homology arm (SEQ ID NO: 19);
(f) 5’ FBL homology arm (SEQ ID NO: 3) - linker (SEQ ID NO: 275) - mEGFP - 3’ FBL homology arm (SEQ ID NO: 18);
(g) 5’ ACTB homology arm (SEQ ID NO: 1) - mEGFP - linker (SEQ ID NO: 273) - 3’ ACTB homology arm (SEQ ID NO: 16);
(h) 5’ DSP homology arm (SEQ ID NO: 2) - linker (SEQ ID NO: 274) - mEGFP - 3’ DSP homology arm (SEQ ID NO: 17);
(i) 5’ TJP1 homology arm (SEQ ID NO: 8) - mEGFP - linker (SEQ ID NO: 280) - 3’ TJP1 homology arm (SEQ ID NO: 23); and
(j) 5’ MYH10 homology arm (SEQ ID NO: 5) - mEGFP - linker (SEQ ID NO: 277) - 3’ MYH10 homology arm (SEQ ID NO: 20).
[0225] 5’ homology arm sequences are shown in underlined text, linker sequences are shown in italic text, tag sequences are shown in regular text, and 3’ homology arm sequences are shown in bold text. Additional plasmid insert sequences are provided in SEQ ID NOs: 31 - 84.
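By way of non-limiting illustration only, the ordering rule for the linker relative to the tag (linker 5’ of the tag for C-terminal tagging, linker 3’ of the tag for N-terminal tagging) can be sketched as follows. The function name `insert_layout` and the element labels are assumptions made for this sketch; actual inserts are defined by the sequences referenced in Table 6.

```python
def insert_layout(gene, terminus, tag="mEGFP"):
    """Return the 5'-to-3' order of elements in a donor plasmid insert.

    terminus: "N" places the tag before the linker (tag - linker - protein);
              "C" places the linker before the tag (protein - linker - tag).
    """
    if terminus == "N":
        middle = [tag, "linker"]
    elif terminus == "C":
        middle = ["linker", tag]
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    return [f"5' {gene} homology arm", *middle, f"3' {gene} homology arm"]

# Layouts consistent with the N-terminal LMNB1 and C-terminal DSP examples.
lmnb1 = insert_layout("LMNB1", "N")
dsp = insert_layout("DSP", "C")
print(" - ".join(lmnb1))
print(" - ".join(dsp))
```

In both orientations the linker lies between the tag and the endogenous coding sequence, so that the linker separates the fluorescent protein from the tagged protein in the resulting fusion.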
Table 6: Exemplary plasmid insert sequences
[Table 6 content: figure images imgf000075_0001, imgf000076_0001 and imgf000078_0001 to imgf000083_0001]
CRISPR/Cas9 RNP System
[0226] Wild type (WT) S. pyogenes Cas9 (spCas9) protein was purchased from UC Berkeley QB3 Macrolab and was pre-complexed in vitro with synthetic CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) duplexes to generate a CRISPR/Cas9 ribonucleoprotein (crRNP). Briefly, the crRNA and tracrRNA oligonucleotides were reconstituted to 100 µM in TE at pH 7.5 (catalog #11-01-02-02, IDT). The crRNA and tracrRNA oligonucleotides were then combined in a sterile PCR tube at a final concentration of 40 µM in Duplex Buffer (100 mM potassium acetate; 30 mM HEPES, pH 7.5). Using a thermocycler or heat block, the crRNA and tracrRNA mixture was heated to 95 °C for 5 min to generate a crRNA:tracrRNA duplex. After heating, the crRNA:tracrRNA duplex was allowed to cool at room temperature for a minimum of two hours, after which the crRNA:tracrRNA duplex was kept on ice. crRNA:tracrRNA duplexes were then diluted to a working concentration of 10 µM in TE. All dilutions and stocks were kept on ice throughout the protocol. Alternatively, the crRNA:tracrRNA duplexes were stored at -20°C for later use.
[0227] spCas9 was stored at -80°C and was thawed on ice or at 4°C until no ice pellet was visible, approximately 2-5 min. spCas9 was then diluted to a working concentration of 10 µM in TE in preparation for use. Alternatively, working concentrations of Cas9 protein were stored at -20°C for up to 2 weeks and multiple freeze-thaw cycles were avoided (< 3 freeze-thaw cycles recommended).
[0228] crRNPs were generated by combining the solution of crRNA:tracrRNA duplexes and Cas9 protein in a 1.5 mL eppendorf tube and gently pipetting up and down three times. A separate crRNP was generated for each reaction to be performed. crRNPs were incubated at room temperature for a minimum of 10 minutes and no longer than 1 hour prior to the addition of the complexes to cells.
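By way of non-limiting illustration only, the dilution steps above follow the relation C1·V1 = C2·V2 and can be sketched numerically. The concentrations are taken as 100, 40 and 10 in a common unit (read here as µM); the final volumes of 50 and 20 (read here as µL) and the function name `stock_volume` are hypothetical values chosen for this sketch.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock needed so that final_volume has final_conc (C1*V1 = C2*V2)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc

# Step 1: duplex at 40 uM from 100 uM crRNA and tracrRNA stocks, 50 uL total (assumed).
duplex_volume = 50.0
crrna_ul = stock_volume(100.0, 40.0, duplex_volume)
tracrrna_ul = stock_volume(100.0, 40.0, duplex_volume)
duplex_buffer_ul = duplex_volume - crrna_ul - tracrrna_ul

# Step 2: 10 uM working duplex from the 40 uM duplex, 20 uL total (assumed).
working_volume = 20.0
duplex_ul = stock_volume(40.0, 10.0, working_volume)
te_ul = working_volume - duplex_ul

print(crrna_ul, tracrrna_ul, duplex_buffer_ul)
print(duplex_ul, te_ul)
```

Under these assumptions the duplex is prepared from 20 volumes of each oligonucleotide stock plus 10 volumes of Duplex Buffer, and the working dilution is one part duplex to three parts TE.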
Cell Culture and Transfection
[0229] WTC iPSCs were cultured according to described methods. Briefly, WTC11 iPSCs were cultured in a feeder free system on tissue culture plates or dishes coated with phenol red-free GFR Matrigel (Corning) diluted 1:30 in DMEM/F12 (Gibco) in mTeSR1 media (StemCell Technologies) supplemented with 1% (v/v) Penicillin-streptomycin (P/S) (Gibco). Cells were not allowed to reach confluency greater than 85% and were passaged every 3-4 days by dissociation into single-cell suspension using StemPro® Accutase® (Gibco). When in single cell suspension, cells were counted using a Vi-CELL® Series Cell Viability Analyzer (Beckman Coulter). After splitting, cells were re-plated in mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor (RI) (Stemolecule Y-27632, Stemgent) for 24 h. Expired media was replenished daily with fresh mTeSR1 media supplemented with 1% P/S, and cells were maintained at 37°C and 5% CO2.
[0230] Prior to transfection, mTeSR1 media (400 mL basal media with provided 100 mL 5X supplement (catalog # 05850, Stem Cell Technologies) with added 5 mL (1% v/v) Penicillin/Streptomycin (catalog # 15140-122, Gibco)) was prepared and sterile filtered with a 0.22 µm filter prior to use. mTeSR1 media was brought to room temperature on the bench top, and was not warmed in a 37°C water bath. mTeSR1 + ROCK inhibitor (Ri) media was prepared by adding 10 mM Ri to mTeSR1 media at a 1:1000 dilution. Accutase was warmed in a 37°C water bath. Previously prepared Matrigel-coated vessels (stored at 4°C) were brought to room temperature. 6-well plates were prepared by aspirating and discarding any excess Matrigel liquid, and adding 4 mL of RT mTeSR1 + Ri media to each well. Plates with media were kept in an incubator at 37°C and 5% CO2 until ready to plate cells after the transfection procedure.
[0231] Cells were aliquoted in mTeSR1 + Ri into separate 1.5 mL eppendorf tubes. Cells were pelleted by centrifuging in a micro-centrifuge at 211 x g for 3 min at room temperature. Various delivery methods including CrisprMax, GeneJuice, Amaxa and Neon were evaluated before concluding that Neon electroporation resulted in favorable co-introduction of protein, RNA, and plasmid into hiPSCs as measured by transfection of a control reporter plasmid and T7 assays as a read out for Cas9 activity (data not shown). Supernatant was aspirated and discarded and cells were resuspended in Buffer R from the Neon Transfection Kit. 8 × 10^5 cells were resuspended in 100 µL Neon Buffer R with 2 µg donor plasmid and 2 µg Cas9 protein complexed with a crRNA:tracrRNA duplex at a 1:1 molar ratio to Cas9, then electroporated with one pulse at 1300 V for 30 ms, and plated onto Matrigel-coated 6-well dishes with mTeSR1 media supplemented with 1% P/S and 10 µM RI. Transfected cells were cultured as previously described for 3-4 days until the transfected culture had recovered to ~70% confluent. Transfected cells were incubated for at least 24 hours before changing the media to mTeSR1 without Ri. Successfully transfected cells were identified and harvested by FACS sorting for use in downstream applications after reaching a healthy confluency and maturity (approximately 3-4 days) (FIG. 1C).
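By way of non-limiting illustration only, the media quantities above can be checked arithmetically. The calculation assumes that volumes are additive and that a 1:1000 dilution means one part stock per 1000 parts final volume; the function names are hypothetical.

```python
def percent_v_v(additive_ml, other_ml):
    """Volume percent of an additive in the combined (additive) volume."""
    return 100.0 * additive_ml / (additive_ml + other_ml)

def diluted_conc(stock_conc, fold):
    """Concentration after a 1:fold dilution, in the units of stock_conc."""
    return stock_conc / fold

# 5 mL Penicillin/Streptomycin in 400 mL basal media plus 100 mL 5X supplement.
ps_percent = percent_v_v(5.0, 400.0 + 100.0)

# 10 mM ROCK inhibitor stock (10,000 uM) diluted 1:1000.
ri_final_um = diluted_conc(10_000.0, 1000.0)

# Stock needed for one 6-well plate at 4 mL of mTeSR1 + Ri per well.
media_ul = 6 * 4.0 * 1000.0
ri_stock_ul = media_ul / 1000.0

print(round(ps_percent, 2), ri_final_um, ri_stock_ul)
```

Under these assumptions the antibiotic supplement is approximately 1% (v/v), the ROCK inhibitor is at 10 µM in the final media, and about 24 µL of stock is used per 6-well plate.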
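By way of non-limiting illustration only, the amounts used per electroporation can be converted into molar quantities. The molecular weight of about 160 kDa for Cas9 is an assumed round figure that is not stated herein, as are the 10 µM working concentrations used to derive pipetting volumes; the function names are hypothetical.

```python
CAS9_MW_DA = 160_000.0  # assumed approximate molecular weight of spCas9 (g/mol)

def micrograms_to_pmol(mass_ug, mw_da):
    """Convert a protein mass in micrograms to picomoles."""
    return mass_ug * 1e6 / mw_da

def volume_ul(amount_pmol, conc_um):
    """Volume (uL) containing amount_pmol at conc_um (1 uM equals 1 pmol/uL)."""
    return amount_pmol / conc_um

cas9_pmol = micrograms_to_pmol(2.0, CAS9_MW_DA)   # 2 ug Cas9 per reaction
duplex_pmol = cas9_pmol * 1.0                     # 1:1 molar ratio to Cas9
cas9_ul = volume_ul(cas9_pmol, 10.0)              # from 10 uM working stock
duplex_ul = volume_ul(duplex_pmol, 10.0)          # from 10 uM working duplex
cells_per_ml = 8e5 / 0.1                          # 8 x 10^5 cells in 100 uL

print(cas9_pmol, duplex_ul, cells_per_ml)
```

With the assumed molecular weight, 2 µg of Cas9 corresponds to roughly 12.5 pmol, matched by about 1.25 µL of 10 µM duplex, at a cell density of 8 × 10^6 cells/mL in the electroporation buffer.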
Example 2 - Generating clonal lines of GFP-tagged hiPSCs
Enrichment of gene-edited cells
[0232] Fluorescence-activated cell sorting (FACS) was used to enrich the population of gene-edited cells after transfection and to evaluate rates of HDR (FIG. 2A). The cell suspension (0.5-1.0 × 10⁶ cells/mL in mTeSR1 + Ri) was filtered through a 40 µm mesh filter into a polypropylene round-bottom tube. As expected for tagging experiments targeting diverse cellular proteins, a range of GFP fluorescent intensity was observed in edited populations (FIG. 2A and FIG. 2B). The GFP intensity determined by FACS correlated with transcription levels of the target protein observed by RNA-seq analysis from the WTC parental cell line (RNA-seq analysis shown in FIG. 12). The percentage of GFP+ cells above the background defined by untransfected, unedited cells was used as a measure of HDR-mediated knock-in efficiency (FIG. 2B). Successful GFP-tagging was observed with at least one crRNA in 10 of the 12 target loci even when HDR was inefficient (<1%). Of the successful edits, editing efficiency was variable across the genomic loci with the majority of the experiments yielding < 0.1% - 4% GFP+ cells. Sec61b was a notable exception, wherein 20% of the treated cells were GFP+ (FIG. 1D). The observed efficiency at each locus was consistent between experiments. These data indicate that HDR efficiency at a given locus depends significantly on the crRNA used, as in several experiments only one crRNA gave rise to a GFP+ population of cells (FIG. 1D).
[0233] In all gene targeting experiments, flow-based selection resulted in the recovery and enrichment of GFP-tagged clones, even when HDR was inefficient (< 1%). For example, weak GFP signal was observed in some experiments where the target gene transcript was relatively scarce (such as PXN) or where the protein is known to localize to small foci in cells corresponding to cell junctions (DSP) or substrate adhesion sites (PXN). However, enriched populations of cells edited at these loci were obtained despite the low percentages of GFP+ cells after transfection (FIG. 2A). Experiments were also performed to assess HDR efficiency as a function of variable homology arm lengths in the donor plasmid. Among the three loci tested, there was a range of efficiencies with the standard 1 kb homology arms. However, the 1 kb arms flanking the intended protein tag sequence resulted in the best and most reliable efficiency compared to the shorter arms (200 bp or 50 bp) (data not shown).
[0234] After FACS enrichment, more than approximately 70% of the cells were GFP+ even after a period of recovery and scale up post sorting, indicating that flow cytometry is an efficient method for isolation of GFP+ cells. To ensure that the knock-in of GFP to the targeted genomic locus resulted in appropriate localization of the resulting fusion proteins, the cells were analyzed by live fluorescence imaging prior to generating clonal lines. Each population displayed localization of the GFP signal to the anticipated cellular structure (FIG. 2C). FIG. 2D shows a representative image of the LMNB1 Cr1 FACS-enriched population showing an enrichment of GFP+ cells.
[0235] Clonal cell lines were then generated from these edited, enriched cell populations to identify and isolate precisely edited cells. Briefly, cells from the FACS-enriched population were seeded at a density of 10⁴ cells in a 10 cm Matrigel-coated tissue culture plate. After 5-7 days clones were manually picked with a pipette and transferred into individual wells of 96-well Matrigel-coated tissue culture plates and expanded clonally. Greater than 90% of these clones survived colony picking. After 3-4 days, colonies were dispersed with Accutase and transferred into a fresh 96-well plate. After recovery, the plate was divided into plates for ongoing culture or freezing and gDNA isolation. When cells were 60-85% confluent they were dissociated and pelleted in 96-well V-bottom plates for cryopreservation. Cells were then resuspended in 60 µL mTeSR1 supplemented with 1% P/S and 10 µM Ri. Two sister plates were frozen using 30 µL cell suspension per plate, added to 170 µL CryoStor® CS10 (StemCell Technologies) in non-Matrigel-coated 96-well tissue culture plates. Plates were sealed with Parafilm and introduced to the -80°C freezer in a room temperature Styrofoam box. Plates were stored long term at -80°C for up to 8 weeks before thawing. Few clones (< 5% across experiments) spontaneously differentiated after isolation, splitting, and freezing, and a majority of clones could be scaled up for genetic and quality control experiments. A schematic of the overall selection and quality control process is shown in FIG. 1D.
Example 3 - Genetic screening of edited clones
[0236] Genetic screening analyses were performed in order to identify clones in which GFP tagging was performed precisely, without damage to endogenous untagged alleles (if present) and without permanent incorporation of the plasmid donor backbone into the genome. A genetic screening strategy was used to rapidly discriminate between precisely and imprecisely edited clones. Criteria for precise editing were as follows:
(a) Incorporation of the GFP tag in-frame with the targeted exon;
(b) The absence of random or on-target donor plasmid backbone integration; and
(c) No unintended mutations in either allele.
[0237] An overview of the genetic screening process is shown in FIG. 3, Steps 1 through 3, including digital droplet PCR (ddPCR, FIG. 3, Step 1), tiled junctional PCR assays (FIG. 3, Step 2), and sequencing analysis of inserted amplicons (FIG. 3, Step 3).
Digital droplet PCR (ddPCR)
[0238] Because primers and probes for GFP, the donor plasmid backbone, and the RPP30 reference gene could be used to analyze all gene edits, a droplet digital PCR (ddPCR) assay was used to rapidly interrogate large sets of clones in parallel without having to optimize parameters specifically for each target gene, a significant advantage for our high throughput platform. During clonal expansion, a sample of cells was pelleted and total gDNA was extracted using the PureLink Pro 96 Genomic DNA Purification Kit (Life Technologies). ddPCR was performed using the Bio-Rad QX200 Droplet Reader, Droplet Generator, and QuantaSoft software.
[0239] Assays were designed to measure three DNA sequences common to each experiment: (1) the GFP tag sequence to measure tag incorporation; (2) the ampicillin or kanamycin resistance gene to assess stable integration of the plasmid backbone; and (3) a two-copy genomic reference locus (RPP30) to calculate genomic copy number. These sequences were used to identify clones with a GFP:RPP30 signature of ~0.5 or ~1.0, suggesting monoallelic or biallelic stable integration of the GFP sequence into the host cell genome. An elevated AmpR/KanR:RPP30 ddPCR signature (>0.1) suggested stable integration of the donor plasmid backbone, and clones with this signature were rejected.
[0240] First, GFP-tagged clones lacking plasmid backbone integration were identified using ddPCR, with equivalently amplifying primer sets and probes corresponding both to the GFP tag and the donor plasmid backbone. The abundance of the GFP tag sequence was quantified (x-axis in FIG. 3, Step 1) and normalized to a known 2-copy genomic reference gene (RPP30) in order to calculate genomic GFP copy number in the sample. The reference assay for the 2-copy, autosomal gene RPP30 was purchased from Bio-Rad. The assay for mEGFP detection was as follows:
(a) Primers:
(i) 5'-GCCGACAAGCAGAAGAACG-3' (SEQ ID NO: 187)
(ii) 5'-GGGTGTTCTGCTGGTAGTGG-3' (SEQ ID NO: 188)
(b) Probe: /56-FAM/AGATCCGCC/ZEN/ACAACATCGAGG/3IABkFQ/ (SEQ ID NO: 189).
[0241] The copy number of a marker sequence in the donor plasmid (AMP or KAN resistance genes) in each clone (y-axis in FIG. 3, Step 1) was also calculated. The assay for AMP was as follows:
(a) Primers:
(i) 5'-TTTCCGTGTCGCCCTTATTCC-3' (SEQ ID NO: 190)
(ii) 5'-ATGTAACCCACTCGTGCACCC-3' (SEQ ID NO: 191)
(b) Probe: /5HEX/TGGGTGAGC/ZEN/AAAAACAGGAAGGC/3IABkFQ/ (SEQ ID NO: 192)
[0242] The reported final copy number of mEGFP per genome was calculated as the ratio of [(copies/µL mEGFP) - (copies/µL nonintegrated AMP)] / (copies/µL RPP30), where a ratio of 0.5 indicated monoallelic insertion (~1 copy per genome) and a ratio of 1 indicated biallelic insertion (~2 copies per genome). The AMP sequence was used to normalize mEGFP signal only when integration into the genome was ruled out during primary screening. For primary screening, [(copies/µL mEGFP) / (copies/µL RPP30)] was plotted against [(copies/µL AMP) / (copies/µL RPP30)] in order to identify cohorts of clones for ongoing analysis.
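The calculation described above can be illustrated with a minimal sketch. The function names and the example concentrations below are hypothetical and are not taken from the QuantaSoft software or from the experimental data; the sketch only assumes that each assay reports a concentration in copies/µL and that RPP30 is present at two copies per genome, as stated above.

```python
def megfp_to_rpp30_ratio(megfp_conc, rpp30_conc, nonintegrated_amp_conc=0.0):
    """Ratio [(mEGFP) - (nonintegrated AMP)] / (RPP30), all in copies/uL.

    The AMP term is subtracted only when backbone integration has been
    ruled out; pass 0.0 (the default) to skip that correction.
    """
    if rpp30_conc <= 0:
        raise ValueError("RPP30 concentration must be positive")
    return (megfp_conc - nonintegrated_amp_conc) / rpp30_conc


def copies_per_genome(ratio, reference_copies=2):
    """Convert a tag:reference ratio to tag copies per genome.

    RPP30 is a 2-copy autosomal reference, so a ratio of 0.5
    corresponds to about 1 tag copy per genome.
    """
    return ratio * reference_copies


# Illustrative (made-up) concentrations in copies/uL.
ratio = megfp_to_rpp30_ratio(megfp_conc=260.0, rpp30_conc=500.0,
                             nonintegrated_amp_conc=10.0)
print(ratio, copies_per_genome(ratio))
```

With these made-up inputs the ratio is 0.5, which corresponds to the monoallelic case of about one copy per genome.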
[0243] Clones with a GFP copy number of ~1.0 (monoallelic) or ~2.0 (biallelic) and AMP/KAN < 0.2 were putatively identified as correctly edited clones. Combining data across all successful editing experiments, 39% of clones were retained as candidates using this assay (FIG. 5A). Clones with a GFP copy number of 0.2-1 were considered possible mosaics of edited and unedited cells and were rejected. Clones with a GFP copy number between ~1 and ~2 were further screened to identify potential biallelic clones from mixed cultures.
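The primary screening rules in this paragraph can be sketched as a simple decision function. This is an illustration only: the function name, the category labels, and in particular the tolerance used to interpret "~1.0" and "~2.0" (here ±0.2) are assumptions introduced for the example and are not specified in the text.

```python
def classify_clone(gfp_copies, backbone_copies, tol=0.2, backbone_cutoff=0.2):
    """Sort a clone into a primary-screening category.

    gfp_copies      -- GFP copies per genome from ddPCR
    backbone_copies -- AMP/KAN copies per genome from ddPCR
    tol             -- assumed window around 1.0 and 2.0 (hypothetical)
    backbone_cutoff -- AMP/KAN value at or above which a clone is rejected
    """
    if backbone_copies >= backbone_cutoff:
        return "rejected: plasmid backbone"
    if abs(gfp_copies - 1.0) <= tol:
        return "candidate: monoallelic"
    if abs(gfp_copies - 2.0) <= tol:
        return "candidate: biallelic"
    if 1.0 < gfp_copies < 2.0:
        return "rescreen: possible biallelic in mixed culture"
    if 0.2 <= gfp_copies < 1.0:
        return "rejected: possible mosaic"
    return "rejected: outside expected range"


# Made-up example values, for illustration only.
for gfp, backbone in [(1.05, 0.0), (2.0, 0.05), (0.5, 0.0), (1.5, 0.0), (1.0, 0.9)]:
    print(gfp, backbone, classify_clone(gfp, backbone))
```

Checking the backbone signal first mirrors the order described above, in which clones with integrated plasmid are rejected regardless of their GFP copy number.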
[0244] The screening strategy also identified several faulty outcomes in the editing and selection process, including unedited clones co-purified during flow cytometry selection and clones harboring plasmid backbone in the targeted locus. These results demonstrate that the addition of the ddPCR assay to the genetic screening process enabled selection of successfully edited clones and eliminated unsuccessful or off-target edits from downstream analyses.
Tiled-junctional PCR
[0245] Clones whose ddPCR signature indicated the stable presence of GFP in the genome (GFP:RPP30 values ~0.5 or 1) and the absence of plasmid backbone integration (AmpR/KanR:RPP30 < 0.1) were further analyzed by tiled-junctional PCR to determine the presence of the predicted tagged alleles and sequences.
[0246] Primer sequences used in each PCR reaction are shown in FIG. 23. All primers are listed in 5’ to 3’ orientation. PCR was used to amplify the tagged allele in two tiled reactions spanning the left and right homology arms, the mEGFP and linker sequence, and portions of the distal genomic region 5’ of the left homology arm and 3’ of the right homology arm using PrimeStar® (Clontech) PCR reagents and gene-specific primers. Both tiled junctional PCR products were Sanger sequenced bidirectionally with PCR primers when their size was validated as correct by gel electrophoresis and/or Fragment Analyzer (FIG. 5E). This enabled confirmation of GFP tag incorporation without large insertions or deletions in the tagged allele. 90% (n=231) of the overall clones tested in this assay contained expected junctional PCR products after initial confirmation by ddPCR (FIG. 5B). Furthermore, the majority of the clones rejected based on ddPCR signature (e.g., clones with > 0.1 AmpR/KanR:RPP30 ratios) also contained inappropriate junctions. Sanger sequencing of the junctional amplicons from a subset of these clones (n=107) confirmed correct sequences in all cases (data not shown).
Sequencing analysis of inserted amplicons
[0247] The untagged allele (for monoallelic GFP-tagged clones) was amplified and sequenced to ensure that no mutations had been introduced via the NHEJ repair pathway at the binding site of the crRNA used for editing. 77% (n=177) of the clones analyzed from all experiments contained a wild type untagged allele (FIG. 5C) and a subset of these clones was chosen for further analysis in additional quality control assays. A subset of clones confirmed by ddPCR and junctional PCR from each gene edit was selected and analyzed by Sanger sequencing of the amplicon corresponding to the untagged allele in order to rule out unanticipated mutations at the tagged locus (FIG. 3). Clones with mutations caused by NHEJ in the untagged allele were rejected. Among clones with correct junctional product sizes, the correct sequence was confirmed in the overwhelming majority of clones (> 95%). To rule out the possibility of misleading junctional PCR outcomes in the final clones, such as rearrangements and duplications, a single PCR reaction designed to amplify both the tagged and untagged allele across both homology arm junctions was used (FIG. 6A - FIG. 6B). In 9 out of 10 cases, the presence of the expected products for both the tagged and untagged alleles was confirmed (FIG. 6C).
Conclusions
[0248] Clones were frequently rejected due to stable integration of plasmid backbone sequence and these rejected clones were further analyzed. In many cases, clones were derived from FACS-enriched populations in which most cells displayed the correct anticipated subcellular GFP tag localization, but nevertheless harbored the GFP tag and donor plasmid backbone at equivalent copy number. It is possible that non-random HDR-mediated incorporation of both the tag and the donor plasmid backbone at the targeted locus results in this pattern. Such an outcome would result in a tagged protein, but also unintended insertions of exogenous sequence into the locus (Rouet et al., 1994; Hockemeyer et al., 2009). This possibility was evaluated by performing the tiled junctional PCR assay (FIG. 3, Step 2) on clones rejected by ddPCR due to integrated plasmid backbone, in the same manner as clones putatively confirmed by ddPCR.
[0249] FIG. 5D shows the percentage of clones in each experiment with KAN/AMP copy number > 0.2 (y-axis). Stacked bars represent 3 observed subcategories of rejected clones: (i) clones with one correct and one incorrect or missing junction (interpreted as plasmid backbone integration at the targeted locus); (ii) clones in which no junctions are amplified (interpreted to contain random integration of the donor plasmid); and (iii) clones in which both junctions are correct (interpreted to contain duplications of the GFP tag sequence at the targeted locus). A large majority of clones gave rise to at least one junctional PCR amplicon, suggesting that plasmid integration occurs at the target locus. Clones with no amplified junctions, as expected in the case of donor plasmid integration at random genomic locations, were uncommon (4% of failed clones). Much more frequently (51% of failed clones), junctions from rejected clones failed to amplify or were aberrantly large on one side of the tag but intact on the other side (FIG. 5D). 45% of the plasmid-integrated clones rejected by ddPCR (which were 45% of all clones) had correct junctions on both sides of the tag (FIG. 5D, “combined”).
[0250] It is possible that these categories of clones harbor insertions and/or duplications derived from the donor cassette sequence delivered by HDR to non-coding regions flanking the GFP tag at the target locus. The prevalence of clones with this flawed editing outcome may underlie heterogeneity in the GFP signal intensity observed in some experiments. However, the ddPCR results largely correlated with the presence or absence of appropriate junctions (FIG. 5B), which validates the use of ddPCR as an efficient screening assay. Clones deemed acceptable based on ddPCR signature largely overlapped with those with correct tiled PCR junction products (e.g., ZO-1, PXN), suggesting that it may be possible to use tiled junctional PCR as the primary screening method instead of ddPCR; however, this was not the case. Confirmation of clones with amplification of both junctions does not, on its own, exclude the possibility of incorrect repair at the targeted locus (FIG. 5D).
[0251] The relative rates of putative clonal confirmation and rejection in this assay varied widely based both on the locus and the crRNA used (FIG. 5A). For example, TOMM20 editing yielded GFP+ cells from only one crRNA (Cr1), all of which contained integrated plasmid (80/83) and/or faulty junctions (3/83) (FIG. 4B and FIG. 5A - 5B, FIG. 14A, FIG. 6C). In the absence of precise editing at this locus, several TOMM20 clones with evidence of plasmid backbone insertion in the non-coding sequences at the TOMM20 locus were selected for expansion and downstream quality control analysis. The large majority of TUBA1B clones edited with Cr2 contained integrated plasmid, while most clones from Cr1 were unaffected (FIG. 4B). Similarly, the frequency and type of mutations found in the unedited allele were also target and locus specific, with ACTB Cr1 a notable outlier case in which NHEJ-mediated mutations in the untagged allele occurred in all analyzed clones (n=24), unlike ACTB Cr2 (FIG. 5C).
[0252] Putatively confirmed clones were almost exclusively tagged at one allele, while clones with putative biallelic edits with no plasmid incorporation were rare (FIG. 4A and FIG. 4B). Clones with ddPCR signatures consistent with biallelic editing (GFP copy number ~2) were observed at low frequency across all experiments (total n=8) (FIG. 4A, FIG. 14A). Only one clone (PXN Cr2 cl. 53) was confirmed as a biallelic edit with predicted junctional products (data not shown), but was later rejected due to poor morphology (FIG. 10A). Other suspected biallelic clones were rejected due to incorrect junctional products and/or presence of the untagged allele (data not shown), indicating that these clones did not precisely incorporate the GFP tag in both alleles. The frequency of faulty HDR demonstrated by these data underscores the importance of multi-step genomic screening to identify precisely edited clones and confirm monoallelic editing.
[0253] Taken together, confirmation rates of 39% (GFP incorporation with no plasmid), 90% (correct junctions), and 77% (wild type untagged allele) were observed in each of the three screening steps across all gene targeting experiments (FIG. 5A-5C). Thus ~25% of the clones screened in this manner met all three of these precise editing criteria. Donor plasmid integration was the most common category of imprecise editing, affecting 45% of all clones (FIG. 5D). These data suggest that this frequently occurs at the edited locus as a faulty byproduct of the editing process and that screening by junctional PCR alone, without a method to directly detect the plasmid backbone, leads to misidentification of clones with imprecise editing, despite appropriate localization of the tagged protein resulting from the edit (Jasin and Rothstein, 2013; Oceguera-Yanez et al., 2016).
Example 4 - Further genomic and proteomic validation of candidate clones
[0254] The analyses described above resulted in the identification of a refined set of candidate clones, wherein both tagged and untagged alleles were validated for the correct sequence identity. These candidate clones were further validated in a number of lower throughput downstream assays.
[0255] To assess whether the clones that met the above gene editing criteria contained off-target mutations due to non-specific CRISPR/Cas9 activity, several final candidate clones from each experiment were analyzed for mutations at off-target sites predicted by Cas-OFFinder (FIG. 13A) (Bae et al., 2014). Potential off-target sites for each crRNA were prioritized for screening based both on their similarity to the on-target site and their proximity to genic regions. Five sites with the greatest similarity in sequence to the on-target site within the seed region and the protospacer-adjacent motif (PAM) and five sites that were the most similar within genic regions (within 2 kb of an annotated exon) were chosen for analysis. Approximately 200 bp of sequence flanking the predicted off-target site was amplified by PCR and the product was Sanger sequenced. PCR amplification of these regions followed by Sanger sequencing was performed to identify potential mutations in 3-5 final candidate clones for all 10 genome editing experiments (6-12 sequenced sites per clone) across 142 unique sites. Among a total of 406 sequenced loci, no off-target editing events were identified (FIG. 13A). Follow-up exome sequencing of the final clones confirmed the absence of any mutations at predicted genic sites captured at adequate depth (data not shown). However, during this exercise, SNPs were identified that were subsequently confirmed to be present in the WTC parental cell line, indicating the ability of this method to uncover alternative alleles.
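The site prioritization described in this paragraph can be sketched as follows. The record type, field names, and the use of a simple mismatch count as the similarity measure are assumptions made for illustration; they do not reproduce the Cas-OFFinder output format or the exact seed/PAM weighting that was used. Whether sites appearing in both groups were de-duplicated is also not stated and is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffTargetSite:
    site_id: str
    mismatches: int          # fewer mismatches = more similar (simplified)
    bp_to_nearest_exon: int  # distance to the nearest annotated exon


def prioritize_sites(sites, n_similar=5, n_genic=5, genic_window_bp=2000):
    """Pick the most similar sites overall plus the most similar genic sites."""
    ranked = sorted(sites, key=lambda s: s.mismatches)  # stable sort
    top_similar = ranked[:n_similar]
    already_chosen = {s.site_id for s in top_similar}
    genic = [s for s in ranked
             if s.bp_to_nearest_exon <= genic_window_bp
             and s.site_id not in already_chosen]
    return top_similar + genic[:n_genic]


# Made-up candidate sites, for illustration only.
candidates = [
    OffTargetSite("a", 1, 50000),
    OffTargetSite("b", 2, 100),
    OffTargetSite("c", 3, 1500),
    OffTargetSite("d", 4, 90000),
]
print([s.site_id for s in prioritize_sites(candidates, n_similar=2, n_genic=1)])
```

Because a site can rank highly in both groups, the number of distinct sites selected per crRNA can be smaller than the sum of the two group sizes.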
[0256] Western blot analysis was performed on lysates from each candidate clone in order to confirm that the observed shift in molecular weight of the tagged vs. untagged peptide was consistent with the known molecular weight of the linker and GFP tag (FIG. 9B and FIG. 18A). Immunoblotting with antibodies against the endogenous protein yielded products consistent with both the anticipated molecular weight of the tagged and untagged proteins and was further confirmed in all cases using an anti-GFP antibody (FIG. 9B and FIG. 18A). In FIG. 9B, lysates from ACTB cl. 184 (left), TOMM20 cl. 27 (middle), and LMNB1 cl. 210 (right) were compared to unedited WTC cell lysate by western blot. In all cases, blots with antibodies against the respective proteins (beta actin, Tom20, and nuclear lamin B1) are shown in the left blot, and blots with anti-GFP antibodies are shown in the right blot, as indicated. Loading controls were either alpha tubulin or alpha actinin, as indicated.
[0257] Semi-quantitative imaging of the blot was also used to determine the relative abundance of protein products derived from each allele. In all cases, immunoblotting with antibodies against the endogenous protein yielded products consistent with both the anticipated molecular weight of the tagged and untagged peptides. Notably, the appropriate Tom20-GFP fusion protein product was obtained despite our inability to identify a precisely edited clone, suggesting that the additional plasmid backbone sequence did not disrupt the coding sequence of the TOMM20 gene. Antibodies used in these experiments are described in FIG. 24A and FIG. 24B.
[0258] The western blot data were used to quantify the abundance of the GFP-tagged protein copy relative to the total abundance of the targeted protein (FIG. 9C). Relative levels of the tagged/untagged protein varied by experiment, but were highly reproducible. While many clones expressed the tagged protein at ~50% of the total protein in the cell, as expected for monoallelic tagging, others did not (FIG. 9C). In the most extreme example, although the final tagged beta actin clone expressed total levels of beta actin similar to the levels found in unedited cells, only 5% of the detected protein was tagged. This suggested that these cells adapted to any compromised function of the tagged allele while retaining normal viability and behavior.
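As a simple illustration of the quantity reported here, the tagged fraction can be computed from band intensities as tagged / (tagged + untagged). The function name and the intensity values below are hypothetical; the sketch assumes background-subtracted densitometry values for the tagged and untagged bands from the same lane and does not reproduce the actual quantification workflow.

```python
def tagged_fraction(tagged_intensity, untagged_intensity):
    """Fraction of total target protein signal carried by the tagged band."""
    total = tagged_intensity + untagged_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return tagged_intensity / total


# Made-up intensities (arbitrary units).
print(tagged_fraction(50.0, 50.0))  # balanced expression from both alleles
print(tagged_fraction(5.0, 95.0))   # tagged copy strongly under-represented
```

A value near 0.5 matches the expectation for monoallelic tagging with equal expression from both alleles, whereas a value near 0.05 corresponds to the beta actin case described above.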
Biallelic Edits
[0259] The observation that the tagged allele had reduced expression in some experiments coupled with the rarity of biallelic edits in these experiments raised the possibility that the tagged protein copy has reduced function. The tolerance of biallelic tagging (and thus whether the tagged protein has sufficient function) was tested by introducing a spectrally distinct red fluorescent protein tag (mTagRFP-T) into the unedited allele of two different tagged clonal cell lines, LMNB1-mEGFP and TUBA1B-mEGFP (FIG. 19A).
[0260] Putative biallelically edited cells were FACS-isolated, expanded, and imaged to confirm localization of both tags to the nuclear envelope in the enriched population (FIG. 19B). Additional experiments were performed to test whether transfection of two unique donor plasmids (one to deliver mEGFP and another for mTagRFP-T) simultaneously could produce biallelically edited cells in a single step in unedited cells using the RNP methods described above. Both methods produced populations of mTagRFP-T+/GFP+ cells, indicating tolerance of biallelic tagging at this locus despite previously observed reduced expression of the tagged protein (FIG. 19A).
[0261] In contrast to LMNB1, mTagRFP-T+/GFP+ cells could not be recovered after attempted editing of the TUBA1B-mEGFP clonal cell line with the TUBA1B-mTagRFP-T donor plasmid, nor could mTagRFP-T+/GFP+ cells be isolated when both donors were co-delivered to unedited cells, despite the prevalence of both mTagRFP-T+ and GFP+ cells as separate edited populations (FIG. 19A, right panels). These data suggest that genomic loci vary widely in their tolerance for biallelic tagging and that cells may compensate for monoallelic tags by reducing expression of the tagged protein, as observed (FIG. 9C). However, although the ratio of the expression of tagged protein to untagged protein varied by the edited line, the total amount of a protein (tagged plus untagged) in an edited line remained similar to the (untagged) amount in unedited cells (FIG. 9C, FIG. 18A - 18B).
[0262] To assess the possibility of allele-specific loss of expression in clonally derived cultures due to perturbed function of the tagged protein copy, two cultures of the four cell lines displaying unequal tagged/untagged protein copy abundance (and TUBA1B-mEGFP as a control) were maintained for different amounts of time. These two sets of cultures were then imaged. As shown in FIG. 20A and FIG. 20B, no difference in the signal intensity or tag localization was observed in cultures separated by four passages (14 days culture time). Similarly, no significant difference in the relative abundance of the tagged and untagged protein was observed in immunoblotting experiments performed on cultures that differed with respect to length of passage time (FIG. 21A - 21C). Additionally, the ratio of tagged to untagged protein abundance in 4-5 independently edited clonal lines was consistent between the final clone chosen for expansion and alternative, independently generated clones (FIG. 21). Flow cytometry confirmed that GFP-negative cells were indistinguishably scarce in cultures at both passage numbers in each of five experiments and that the overall fluorescence intensity of the GFP-tagged protein was unaltered (FIG. 22A). The consistency in expression across clones and passaging time provided further confidence in the stability of expression.
Example 5 - Phenotypic and functional validation of candidate clones
[0263] Upon validating the expression and localization of the GFP-tagged protein in each of the genome-edited lines, experiments were performed to ensure that each expanded candidate clonal line retained stem cell properties comparable to the unedited WTC cells. Assays included morphology, growth rate, expression of pluripotency markers, and differentiation potential (FIG. 10, FIG. 22D). Undifferentiated stem cell morphology was defined as colonies retaining a smooth, defined edge and growing in an even, homogeneous monolayer (FIG. 10A). Clones with morphology consistent with spontaneous differentiation were rejected (Thomson et al., 1998; Smith, 2001; Brons et al., 2007; Tesar et al., 2007). Such cultures typically displayed colonies that were loosely packed with irregular edges and larger, more elongated cells compared to undifferentiated cells, as observed with one PXN clone (a confirmed biallelic edit) (FIG. 10A, right-most image). Expression of established pluripotency markers was also determined, including the transcription factors Oct3/4, Sox2 and Nanog, and cell surface markers SSEA-3 and TRA-1-60 (FIG. 10B, FIG. 10F). High levels of penetrance in the expression of each marker (>86% of cells) were observed in all final clonal lines from the 10 different genome edits, similar to that of the unedited cells (FIG. 10B, FIG. 10F). Consistent with these results, low penetrance (<9% of cells) of the early differentiation marker SSEA-1 was observed by flow cytometry in both the edited and control WTC cells (FIG. 10B, FIG. 10F). All 39 clones satisfied commonly used guidelines of >85% pluripotency marker expression and <15% cells expressing the differentiation marker SSEA-1 used by various stem cell banks (Baghbaderani et al., 2015).
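The marker guidelines cited in this paragraph (>85% of cells positive for each pluripotency marker and <15% positive for SSEA-1) can be expressed as a short check. The function name, the dictionary layout, and the example percentages are hypothetical and are shown only to make the acceptance rule concrete.

```python
PLURIPOTENCY_MARKERS = ("Oct3/4", "Sox2", "Nanog", "SSEA-3", "TRA-1-60")
DIFFERENTIATION_MARKER = "SSEA-1"


def passes_marker_qc(percent_positive, min_pluripotency=85.0, max_differentiation=15.0):
    """Return True if a clone meets both marker-expression guidelines.

    percent_positive maps marker name -> percent of cells positive by flow.
    """
    pluripotency_ok = all(
        percent_positive[marker] > min_pluripotency
        for marker in PLURIPOTENCY_MARKERS
    )
    differentiation_ok = percent_positive[DIFFERENTIATION_MARKER] < max_differentiation
    return pluripotency_ok and differentiation_ok


# Made-up flow cytometry percentages, for illustration only.
example_clone = {"Oct3/4": 99.0, "Sox2": 98.0, "Nanog": 95.0,
                 "SSEA-3": 90.0, "TRA-1-60": 88.0, "SSEA-1": 4.0}
print(passes_marker_qc(example_clone))
```

A clone fails the check if any single pluripotency marker falls to 85% or below, or if SSEA-1 reaches 15% or more.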
Candidate clones retain expression of pluripotency markers
[0264] Assays were performed to ensure that the clones identified to have precise edits retained stem cell properties during the process of gene editing and expansion. As such, the expression of established stem cell markers, including the transcription factors Oct3/4, Sox2 and Nanog, cell surface pluripotency markers Tra-1-60 and Tra-1-81, and the pro-differentiation marker SSEA3 were measured by flow cytometry (FIG. 5A). Briefly, cells were dissociated with Accutase as previously described, fixed with CytoFix Fixation Buffer™ (BD Bioscience), and frozen in KnockOut™ Serum Replacement (Gibco) with 10% DMSO. Cells were washed with 2% BSA in DPBS and half of the cells were stained with anti-TRA-1-60 Brilliant Violet™ 510, anti-SSEA-3 AlexaFluor® 647, and anti-SSEA-1 Brilliant Violet™ 421 (all BD Bioscience). The other half of the cells were permeabilized with 0.5% Triton X-100 and 2% BSA in DPBS and stained with anti-Nanog AlexaFluor® 647, anti-Sox2 V450, and anti-Oct-3/4 Brilliant Violet™ 510 (all BD Bioscience). Cells were acquired on a FACSAria Fusion (BD Bioscience) equipped with 405, 488, 561, and 637 nm lasers and analyzed using FlowJo software V.10.2 (Treestar, Inc.). Doublets were excluded using forward scatter and side scatter (height versus width), then marker-specific gates were set according to corresponding fluorescence-minus-one (FMO) controls to obtain the percent positive for each marker.
[0265] In all candidate clones tested, each nuclear marker was expressed well above the commonly used thresholds of > 85%+ for stem cell markers and < 15%+ for differentiation markers and comparable to the parental WTC line (FIG. 5A and 5B). When compared to the WTC reference line, all clones displayed negligible changes in the mean expression intensity of each nuclear marker. Cell surface pluripotency markers displayed similarly robust expression when analyzed in this manner, albeit with greater variability (FIG. 5A and FIG. 5C). This analysis was conducted for a total of approximately 50 clones and only 10% were rejected due to changes in the expression profile of these markers. Although the clones were comparable, there was sufficient variability within each set of candidate clones that they could be ranked relative to each other to determine those that were most similar to the WTC parent line.
[0266] In vitro differentiation assays to confirm the pluripotency of the cell lines were performed. Directed germ layer differentiation was compared between unedited cells and the final selected edited clonal line representing each of the 10 targeted structures. Each cell line was differentiated for 5-7 days under defined conditions to mesoderm, endoderm, and ectoderm using differentiation media specific to each lineage. The cells were stained for early markers of germ layer differentiation (Brachyury, Sox17, and Pax6) and analyzed by flow cytometry (FIG. 10C, FIG. 11A - FIG. 11C, FIG. 10F) (Showell et al., 2004; Murry and Keller, 2008; Zhang et al., 2010; Viotti et al., 2014). While the differentiation into each germ layer was variable, all three germ layer markers in the edited clones showed increased expression relative to undifferentiated cells (FIG. 10C). In all edited clones tested, > 91% of cells expressed Brachyury after mesodermal differentiation, > 47% expressed Sox17 after differentiation to endoderm, and > 65% expressed Pax6 upon ectoderm differentiation (FIG. 10C, FIG. 10F). Directed differentiation of edited clones into each germ layer lineage was generally comparable to unedited cells.
Gene edited candidate clones are capable of cardiomyocyte differentiation
[0267] Additional experiments were performed to assess whether each clone could robustly differentiate into cardiomyocytes. Each edited clone’s differentiation potential was assessed by directing it to a cardiomyocyte fate using established protocols using a combination of growth factors and small molecules (Lian et al ., 2015; Palpant el al ., 2015) and evaluated cultures for spontaneous beating (days 6-20) and cardiac Troponin T (cTnT) expression (days 20- 25), in order to evaluate the robustness of cardiomyocyte differentiation. Briefly, cells were seeded onto Matrigel-coated 6-well tissue culture plates at a density ranging from 0.5-2xl06 cells per well in mTeSRl supplemented with 1% P/S, 10 mM RI, and 1 mM CHIR99021 (Cayman Chemical). The following day (designated day 0), directed cardiac differentiation was initiated by treating the cultures with 100 ng/mL ActivinA (R&D) in RPMI media (Invitrogen) containing 1 :60 diluted GFR Matrigel (Corning), and insulin-free B27 supplement (Invitrogen). After 17 hours (day 1), cultures were treated with 10 ng/mL BMP4 (R&D systems) in RPMI media containing 1 pM CHIR99021 and insulin-free B27 supplement. At day 3, cultures were treated with 1 pM XAV 939 (ToCris) in RPMI media supplemented with insulin-free B27 supplement. On day 5, the media was replaced with RPMI media supplemented with insulin-free B27. From day 7 onto about day 20, media was replaced with RPMI media supplemented with B27 with insulin (Invitrogen). Cells were harvested using 0.5% Trypsin-EDTA (Gibco), filtered with a 40 pm cell strainer, fixed with CytoFix Fixation Buffer™, permeabilized with BD Perm/Wash™ buffer, stained with anti- Cardiac Troponin T AlexaFluor® 647 (BD Bioscience) or isotype control, acquired on a FACS Aria Fusion and analyzed using FlowJo software V.10.2.
[0268] Clonal lines generally displayed successful cardiomyocyte differentiation, with cTnT expression and qualitative spontaneous contractility comparable to the parental WTC line (FIG. 10D, 10E, FIG. 10F). Variability was observed both between clones and between differentiation experiments within a given clone. In order to address this variation, the initial density of the cells was varied. Initial beating, homogeneity of beating in the culture, and perceived strength of contraction were used as qualitative markers to rank clones relative to each other. Additionally, Troponin T expression after 20 days in culture was used as a quantitative measurement of the cells’ commitment to cardiomyocyte identity (FIG. 11D and 11E). The total fraction of cells in each culture that was positive for Troponin T varied significantly between experiments, but in all cases >30% Troponin T+ cells were obtained. Data for cell lines with GFP-tagged PXN, TOM20, TUBA1B, LMNB1, and DSP can be found at the Allen Institute for Cell Science’s website under the cell-line catalog section, which is incorporated by reference in its entirety.
[0269] These cardiomyocyte differentiation data, combined with the pluripotency marker expression and germ layer differentiation data, support the conclusion that fusing GFP with these endogenously expressed proteins via monoallelic tagging does not appear to disrupt the pluripotency or differentiation potential of these edited hiPSCs.
[0270] Additional experiments can be performed according to protocols known in the art (e.g., Methods Mol Biol. 2014; 1210: 131-41; Biomed Rep. 2017 Apr; 6(4): 367-373; Methods Mol Biol. 2017; 1597: 195-206; Nat Commun. 2015 Oct 23; 6: 8715; Mol Psychiatry. 2017 Apr 18. doi: 10.1038/mp.2017.56; Scientific Reports volume 7, Article number: 42367 (2017)) and illustrated in FIG. 32 to determine the ability of the clonal cell lines to differentiate into hepatocytes, renal cells, neuronal cells, or other cells.
Edited clones are karyotypically stable
[0271] Establishing clonal hiPSC lines and culturing them long term is known to carry the risk of fixing somatic mutations and/or chromosomal aneuploidies (Weissbein et al., 2014). The possibility exists that the additional stressors inherent to gene editing heighten this risk. To address this concern, karyotype analysis was performed on each candidate clone by Diagnostic Cytogenetics Inc. (DCI, Seattle WA). A minimum of 20 metaphase cells were analyzed per clone. Of the ~50 candidate clones tested, only two instances in which karyotypic abnormalities became fixed in the culture were detected (data not shown). In one instance, a single candidate clone from an experiment was rejected. Further, all clones identified as candidates from the experiment targeting ACTN1 displayed the same aneuploidy event, suggesting that it had become fixed early in the editing process. These data indicate that aneuploidy occurs at a non-negligible rate and that chromosomal abnormalities must be ruled out in each experiment. However, these data also suggest that the rate of aneuploidy is low enough to permit high-throughput editing using these methodologies.
Transcriptome-wide analysis of edited candidate clones
[0272] Transcriptome-wide analysis of two final candidate clones from a number of experiments was performed. This analysis was performed to determine whether hiPSC clones maintained over 10-15 passages and harboring potentially disruptive tags on key cellular proteins demonstrated similar global gene expression patterns to the unedited reference line, or if they had alternatively evolved into globally distinct cell lines in a manner not distinguishable by the above described quality control assays (data not shown).
[0273] In order to further characterize global gene expression changes between each edited clone and the reference line, genes whose expression differs by greater than 2-fold are annotated and compared between experiments and with expression from control cell lines. Cluster analysis on these data sets is also performed to determine the most statistically significant GO term categories among edited clones. RNA-seq analysis is also performed to confirm the absence of detectable mutations in expressed sequences due to potential off-target Cas9 activity in the final clones. These findings are further confirmed by next generation exome sequencing. Analysis of additional clones is also performed.
Phenotypic characterization of GFP-tagged iPSC lines
[0274] These results indicate that the stem cells and stably tagged stem cell clones and differentiated cells therefrom of the invention can be used for three-dimensional live cell imaging of intracellular proteins. In further embodiments, the methods allow for use of the cells for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement or cellular stress in response to a test agent.
[0275] As a final characterization step, live imaging on preferred candidate clones was performed. Cells were maintained with phenol red free mTeSR1 media (STEMCELL Technologies) one day prior to live-cell imaging. Stably tagged stem cell clones can be imaged using spinning disk confocal microscopy. Cells were imaged using spinning disk confocal microscopy at low (10x or 20x) and high (100x) magnification. Microscopes were outfitted with a humidified environmental chamber to maintain cells at 37°C with 5% CO2 during imaging. Healthy, undifferentiated WTC hiPSCs ranged from 5-20 µm in diameter and 10-20 µm in height and grew in tightly packed colonies (FIG. 8A, 8B). The resulting endogenously tagged lines allowed for the observation of tagged proteins and corresponding organelles with exceptional clarity due to their endogenous regulation and absence of fixation and staining artifacts. Without exception, distinct localization patterns of the tagged protein were observed when compared to cells transiently transfected with constructs expressing GFP fusion proteins.
[0276] For example, paxillin was observed in the matrix adhesions formed between substrate contact points and the basal surface of cells, as well as at the dynamic edges of colonies (FIG. 8C). Beta actin localized to the basal surface of colonies both in prominent filaments (stress fibers) and at the periphery of cell protrusions (lamellipodia), as well as in an apical actin band at cell-cell contacts, a feature common in epithelial cells (FIG. 8D). Non-muscle myosin heavy chain IIB had similar localization in actomyosin bundles, including at basal stress fibers and in an apical band (FIG. 8D, 8E). Desmoplakin localized to distinct puncta at apical cell-cell boundaries as expected of desmosomes, which form junctional complexes in epithelial cells (FIG. 8F). Tight junction protein ZO-1 also localized apically to cell-cell contacts where tight junctions are formed (FIG. 8G). These observations suggest the presence of multiple distinct epithelial junction complexes and an overall apical junction zone in edited hiPSC colonies. In addition, alpha tubulin was both diffuse, as unpolymerized tubulin, and localized to microtubules, which exhibited apicobasal polarity in non-dividing cells with many microtubules extending parallel to the z-direction as reported for some epithelial cell types (FIG. 8H) (Musch, 2004; Toya and Takeichi, 2016).
[0277] Sec61 beta localized to endoplasmic reticulum (FIG. 8I), and Tom20 localized to mitochondria (FIG. 8J) and were distributed throughout the cytoplasm, often with greatest density in a cytoplasmic ‘pocket’ near the top of the cell and at lowest density in the central periphery of the cell. The center region of the cell was almost entirely occupied by the nucleus, which was observed outlined by nuclear lamin B1 (FIG. 8B). Fibrillarin was localized to nucleoli within the center of the nuclei (FIG. 8K).
[0278] These observations are consistent with the epithelial nature of tightly packed undifferentiated WTC hiPSCs grown on 2D surfaces. All final candidate clones, spanning 10 editing experiments, exhibited predicted subcellular localization of their tagged proteins (FIG. 8). Taken together, these data demonstrate the ability to identify clonal lines in which genome editing did not interfere with the expected localization of the tagged proteins to their respective structures. Furthermore, live-cell time-lapse imaging demonstrated that proper localization occurred throughout the cell cycle and the presence of the tagged protein did not noticeably interfere with cell behavior.
[0279] The impact of the tag on correct localization of the targeted protein compared to the localization of the native, unedited protein was also assessed. Edited clones were fixed alongside unedited cells and immunocytochemistry or phalloidin staining was performed. In all 10 experiments, no detectable differences in the pattern of antibody labeling between the unedited cells and the edited cell line were observed (FIG. 9A, FIG. 15, and FIG. 16). Within all edited cell lines, localization of the GFP-tagged protein was also compared to the pattern of antibody labeling, which was predicted to label both the GFP-tagged and untagged protein fractions within the same cell. In all cases, this revealed extensive co-localization (FIG. 15 and FIG. 16).
[0280] As endogenously GFP-tagged proteins in live imaging experiments generate more interpretable localization data than that produced in fixed and immunostained cells (Allen Institute for Cell Science, 2017), endogenous localization in edited lines was directly compared to cells transiently transfected with constructs expressing FP-fusion proteins (EGFP or mCherry) (FIG. 17). Although transient transfection, like fixation and immunostaining, is vulnerable to artifacts, cells with low transient transgene expression exhibited similar tag localization to that observed in the gene edited cell lines. In other cases, high transient transgene expression led to artifacts, including high diffuse cytosolic background and aggregation of the tagged protein. Intensity level was used as a proxy to distinguish between low- and high-level transgene overexpression, though low-level expressing cells were often rare. As examples, transfected cells with low EGFP-tubulin transgene expression were comparable to the gene edited alpha tubulin cells (TUBA1B-mEGFP), although the transfected cells contained higher cytosolic signal. Transfected cells with low desmoplakin-EGFP transgene expression revealed a similar pattern to that observed in the DSP-mEGFP gene-edited line, but the transfected cell population also contained other cells, likely expressing the transgene to a greater extent, with high cytosolic signal and increased number and size of desmosome-like puncta. Transfection and overexpression of Tom20 led to cell death and perturbed mitochondrial morphology, while the endogenously tagged cells displayed intact mitochondrial networks with both normal morphology and cell viability. These results highlight the importance of using multiple techniques to validate the localization of tagged proteins in gene edited cell lines. They also demonstrate the advantages of using genome editing to observe cellular structures rather than conventional methods that rely on overexpression, fixation, and antibody staining.
Example 6 - Development of image-based drug-induced protein signatures
[0281] The collection of the gene-edited hiPS cells described herein was used to develop image-based drug-induced protein signatures. Experiments were conducted with 12 known reference compounds that disrupt various key cellular structures and processes including cell division, microtubule organization, actin dynamics, vesicle trafficking, cell signaling, DNA replication, calcium regulation, ion channel regulators, and statins. Agents used in these experiments are shown in FIG. 26A.
[0282] The pipeline was prototyped using a small suite of well-characterized compounds that include brefeldin A, paclitaxel, rapamycin, wortmannin and staurosporine (FIG. 26A). Low-resolution imaging (24x magnification) was used to test a matrix of concentrations and time points for each compound of interest to establish an initial set of conditions for each perturbation. hiPSC colonies were monitored for morphologic changes using transmitted light (FIG. 26B) and an endogenously GFP-tagged structure, such as microtubules (FIG. 26C). After establishing an end point response for several compounds, high-resolution (120x magnification) imaging of multiple cell lines was performed under standardized perturbation parameters, in the presence of dyes to label the nucleus and cell membrane for reference purposes (FIG. 27). FIG. 27 shows representative image planes from z-stacks collected at 120x of the GFP-tagged cell lines with nucleus and cell membrane markers. Cells were treated with the indicated perturbation agent at a pre-selected concentration and time point established in phase I.
[0283] These perturbations showed alterations roughly analogous to those seen in other cell types. For example, the microtubule stabilizing agent paclitaxel increased microtubule bundle thickness and altered the shape and position of the mitotic spindle during hiPS cell division. In addition, paclitaxel also induced aberrant reorganization of the ER in cells undergoing mitosis, while showing minimal effects on the bulk organization of the actin bundles and cell junctions. Other drugs, such as staurosporine, a broad kinase inhibitor, had major effects on colony and cell morphology, inducing rearrangements in cell packing and shape. Staurosporine also induced re-localization of desmosomes, indicating that the cell-cell junctions undergo substantial rearrangement.
[0284] Fluorescence quantification of the 3D images was used to analyze drug-induced Golgi reorganization, cytoskeleton reorganization, and cell junction reorganization. To quantify the relative abundance of each structure of interest (e.g., Golgi as presented above), the pixel intensities of the GFP channel (488 nm) were summed across the entire z-stack. For each experiment, the same threshold was used to exclude background intensity noise across the control (DMSO) and experimental (perturbation agent) groups. The data were plotted by averaging z-stack data from a time interval (30-minute) and compared to the control DMSO data. One-way ANOVA with Dunnett's multiple comparison test was used to compare the different time intervals against the control group.
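The summed-intensity measurement described above can be illustrated with a minimal sketch. It is not the pipeline actually used: the function names, the nested-list image representation, the rule that pixels at or below the threshold are excluded, and all pixel values are assumptions made for illustration only.

```python
def summed_intensity(z_stack, threshold):
    """Sum GFP-channel pixel values above `threshold` over every plane of a z-stack.

    `z_stack` is a list of planes, each plane a list of rows of pixel values.
    Excluding pixels at or below the threshold is an assumed convention.
    """
    total = 0
    for plane in z_stack:
        for row in plane:
            for pixel in row:
                if pixel > threshold:
                    total += pixel
    return total


def relative_abundance(treated_stacks, control_stacks, threshold):
    """Mean summed intensity of treated stacks relative to control (DMSO) stacks.

    The same threshold is applied to both groups, as in the text.
    """
    treated = [summed_intensity(s, threshold) for s in treated_stacks]
    control = [summed_intensity(s, threshold) for s in control_stacks]
    return (sum(treated) / len(treated)) / (sum(control) / len(control))


# Hypothetical 2-plane, 2x2-pixel stacks in arbitrary intensity units.
dmso = [[[10, 200], [150, 5]], [[100, 8], [50, 50]]]
drug = [[[10, 100], [75, 5]], [[50, 8], [25, 25]]]

print(summed_intensity(dmso, threshold=20))            # 550
print(relative_abundance([drug], [dmso], threshold=20))  # 0.5
```

In practice the stacks collected within one 30-minute interval would be passed together as `treated_stacks`, and the statistical comparison against DMSO would be done separately.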
[0285] As shown in FIG. 28, Brefeldin A induced dissociation of the Golgi within 30 minutes (FIG. 28A), while (S)-nitro blebbistatin induced fragmentation of the organelle (FIG. 28B). Additionally, rapamycin induced morphological reorganization of the Golgi (FIG. 28C).
[0286] The relative protein abundances of actin and myosin were also quantified. As shown in FIG. 29, a reorganization and relative decrease in actin (FIG. 29A and 29B) and myosin protein abundance were observed in the presence of (S)-nitro blebbistatin (FIG. 29C). In addition, paclitaxel stabilized the microtubules by enhancing polymerization of tubulin, which was reflected in a trend of increased relative localized fluorescence intensity over time (FIG. 29D and 29F). Further, both staurosporine and (S)-nitro blebbistatin induced reorganization of the myosin through the thickness of the cell (FIG. 29E).
[0287] For drug-induced effects on cell junction reorganization, representative maximum intensity projections of a z-stack along the x-z axis are shown in FIG. 30. From these projections, the mean pixel intensity for the GFP channel along the x-axis, from the top of the image to the bottom, was measured to generate an intensity profile plot. These plots show the redistribution of ZO-1 along the z-axis in the presence of both staurosporine and (S)-nitro-blebbistatin. In the presence of staurosporine, desmosomes relocalized throughout the cell, and the number of DSP-positive plaques increased (FIG. 31). To analyze the change in desmosome number, the 3D objects in a z-stack were counted using the 3D Object Counter tool in Fiji. The images were thresholded by size and minimum pixel intensity such that ~95% of the objects were captured. Data were analyzed by Student’s t-test (** p < 0.01).
[0288] These data demonstrate that image-based 3D data sets of fluorescently tagged structures in human induced pluripotent stem cells (hiPSC), generated by a scalable and reproducible imaging pipeline, identify signature profiles for a range of well-characterized small molecules and can be used to generate a predictive model of the dynamic organization and behavior of cells. These unique data can be used to train predictive models to identify the effects of perturbing target pathways, ascertain “off-target” effects and the mode of action of unknown compounds, and identify likely pathways influenced by mutations. By building complete combinations of image-based observations of many structures/lines in the presence of a large number of standardized biochemical perturbations, a comprehensive database of drug signatures on hiPS cells in their normal, pathological and regenerative (developmental) states can be generated.
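The intensity profile measurement described for the x-z projections can be sketched as follows. This is an illustrative reading of the text, not the analysis code that was used: each row of the projection is taken to be one z position and is averaged across x. The `peak_fraction` summary is a hypothetical addition, not something stated in the source, and the pixel values are invented.

```python
def z_intensity_profile(xz_projection):
    """Mean GFP intensity of each row of an x-z projection, top row first.

    `xz_projection` is a list of rows; each row holds the pixel values along x
    at one z position.
    """
    return [sum(row) / len(row) for row in xz_projection]


def peak_fraction(profile):
    """Fraction of the total profile signal found at the brightest z position.

    Hypothetical summary statistic: values near 1 indicate a sharply localized
    band, lower values indicate redistribution along z.
    """
    total = sum(profile)
    return max(profile) / total if total else 0.0


# Invented projections: a sharp apical band (control) versus a spread signal (treated).
control = [[0, 0, 0, 0], [90, 110, 100, 100], [10, 10, 10, 10], [0, 0, 0, 0]]
treated = [[20, 20, 20, 20], [40, 40, 40, 40], [30, 30, 30, 30], [20, 20, 20, 20]]

for name, projection in (("control", control), ("treated", treated)):
    profile = z_intensity_profile(projection)
    print(name, profile, round(peak_fraction(profile), 2))
```

A flatter profile in the treated condition would correspond to the redistribution along the z-axis described for ZO-1.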
[0289] To generate the predictive model, the resulting imaging data from each compound per stably tagged stem cell clone or differentiated cell derived therefrom can be compared to the negative controls (untreated and vehicle controls) to determine the effect on various criteria including cell and subcellular morphology, localization of the tagged structure, and dynamics. By testing each compound in multiple gene edited iPSC lines (where each line has one structure tagged with GFP), the effect of that compound on multiple structures can be assessed within the cell. First, the intended effect of each compound with the relevant gene edited cell line can be confirmed as described in the assays above. The effect of that compound on all other structures can be assessed using the suite of gene edited iPSC lines to create a unique “fingerprint” or signature for that compound in relation to multiple structures. The data generated with this established set of compounds can be used as an initial training set for assays with compounds with unknown function. These profiles can serve as a reference database that can be used for screening novel and previously uncharacterized compound libraries to identify targets, help guide mechanistic studies, and determine specificity. Additionally, the combination of using human, diploid, non-transformed cells with live imaging using these gene edited iPSCs can provide a much better platform for performing toxicology screening. Further, these predictive models based on the stem cells and stably tagged stem cell clones and differentiated cells therefrom of the present invention can be used for screening, observing cellular dysplasia, disease staging, monitoring disease progression or improvement or cellular stress in response to a test agent.
Example 7 - Multi-step CRISPR/Cas9 gene editing method to create endogenously tagged mEGFP-fusions for transcriptionally silent genes in hiPSCs.
[0290] Described herein is a multi-step CRISPR/Cas9 gene editing method to create endogenously tagged mEGFP-fusions for transcriptionally silent genes in hiPSCs, allowing visualization of proteins that are only expressed upon differentiation. A donor template was designed containing the fusion tag (mEGFP) and an mCherry selection cassette delivered in tandem to a target locus via HDR (homology directed repair). The mCherry expression was driven by a constitutive promoter and served as a drug-free, excisable selection marker. Following this selection, the mCherry cassette is excised with Cas9, creating a mEGFP-fusion with the target gene. Sequence elements to guide MMEJ (microhomology-mediated end joining) were included for scarless excision with linker sequences between the mEGFP tag and the target gene. Using this strategy, mEGFP-tagged genes encoding the cardiomyocyte sarcomeric proteins troponin I (TNNI1), alpha-actinin (ACTN2), titin (TTN), myosin light chain 2a (MYL7), and myosin light chain 2v (MYL2) in undifferentiated hiPSCs have been successfully produced. This methodology provides a general strategy for introducing various tags to silent genomic loci in a scarless manner in hiPSCs.
Introduction
[0291] Genome editing has revolutionized cell biology with the ability to precisely edit and engineer genes of interest at their endogenous loci (Doyon, Zeitler et al. 2011, Dambournet D, Hong et al. 2014, Grassart, Cheng et al. 2014, Mahen, Koch et al. 2014, Otsuka, Bui et al. 2016, Roberts, Haupt et al. 2017). Editing human induced pluripotent stem cells (hiPSCs) is particularly powerful for interrogating cellular dynamics in a diploid, non-transformed and relatively stable genomic setting. Furthermore, the ability to differentiate gene edited hiPSCs into multiple lineages makes them an ideal model system for disease modeling and regenerative medicine (Drubin and Hyman 2017).
[0292] Methods for endogenous fluorescence tagging of select proteins in hiPSCs include the precise addition of a fluorescent tag sequence to the host cell genome, which can be accomplished via HDR (Roberts, Haupt et al. 2017; Dambournet et al. 2014; Koch et al. 2018). Since HDR is an inefficient step in this process, a selection strategy must be used to enrich the rare population of edited cells. This is often accomplished by drug selection or by flow cytometry-based sorting, which relies on successful HDR as well as the expression of the tagged fusion protein. Therefore, this approach cannot be used to enrich for edited cells where the target gene is silent in hiPSCs but expressed upon differentiation to other cell types (i.e., differentially-expressed). The exceptional proliferative capacity of hiPSCs makes them a simple and scalable editing platform with broad downstream applications. Unlike terminally differentiated cells, an edited hiPSC clonal line can be subjected to extensive quality control, expanded as a shared resource, and differentiated into multiple lineages (Roberts et al.,).
[0293] One strategy for selecting cells edited at a silent locus utilizes HDR-mediated delivery of a selection marker (a drug resistance and/or fluorescent protein) under the control of a constitutive promoter. After selection of the edited cells, this sequence is then removed by recombination, most commonly using the Cre/Lox system. Despite its widespread use, this Cre/Lox recombination event results in a 34 base pair residual loxP “scar,” which can disrupt endogenous sequences important for proper regulation of the targeted gene (Skarnes, Rosen et al. 2011, Yao, Mich et al. 2017, Judge, Perez-Bermejo et al. 2017).
[0294] To overcome the limitation associated with recombinases and transposases, provided herein is a multi-step editing strategy using CRISPR/Cas9 to add an endogenous mEGFP tag to transcriptionally inactive genes in hiPSCs with drug-free selection and a “scarless” fusion product (FIG. 33A - FIG. 33H). In the first step, a mEGFP tag is delivered via HDR in tandem with an excisable cassette expressing a second fluorescent protein (mCherry) under the control of a constitutive promoter to enable enrichment of edited cells. In a second step, the selection cassette is excised with Cas9. Also included are repeat rich sequences in the donor template that guide excision via the MMEJ pathway. As a result, the excision site is deleted and a customizable in-frame linker is introduced between the endogenous coding sequence and the mEGFP tag.
[0295] All five cardiac genes, which are silent in hiPSCs, were successfully targeted via HDR and enriched with an excisable, constitutively expressed selection cassette. The resulting hiPSC lines represent key structural elements of the cardiomyocyte sarcomere: the Z-disc (ACTN2), thin filament (TNNI1), thick filament specific to early and later developmental time points (MYL7, MYL2) and the sarcomere spanning protein titin (TTN). Clonal lines were generated and genomic screening was performed to identify monoallelic-mEGFP tags with precisely guided in-frame linker sequences lacking a genomic scar. The gene-edited hiPSC clones from all five gene targeting experiments robustly differentiated into cardiomyocytes and demonstrated proper mEGFP localization to the intended sarcomeric structures.
Results
[0296] A multi-step gene editing strategy is provided to endogenously tag key cardiac sarcomeric proteins with mEGFP (FIG. 33A - FIG. 33H). Gene edited hiPSC lines were prepared for five genes expressed specifically in cardiomyocytes: TNNI1, encoding the myofibril contractile regulator slow skeletal Troponin I1; ACTN2, encoding the cardiomyocyte-specific actin regulator alpha actinin 2; TTN, encoding the sarcomere spanning structural protein titin; and MYL7 and MYL2, which respectively encode myosin motor proteins expressed earlier and in atrial subtypes (MLC2a) and later and in ventricular subtypes (MLC2v) during cardiomyocyte differentiation. The episomally derived and previously characterized WTC line was selected as the parental line (Kreitzer, Salomonis et al. 2013). Population RNA-seq of WTC hiPSCs and WTC-derived cardiomyocytes confirmed that these five genes are transcriptionally silent in pluripotent cells and activated during cardiomyocyte differentiation (data available at allencell.org). These five sarcomeric proteins provided a range of expression levels in hiPSC-derived cardiomyocytes, known localization patterns in the sarcomere, and unique developmental expression kinetics for testing the effectiveness of the editing approach.
Design of the donor template for silent editing and selection
[0297] A donor template plasmid is provided with several key features to enable the multi-step editing strategy (FIG. 33A). The first feature was a fluorescence (mCherry) selection cassette driven by a constitutive promoter. This selection cassette was adjacent to a second downstream fluorescent tag (mEGFP), intended to ultimately be fused to the C-terminus of the gene of interest. Successful donor sequence incorporation via HDR in hiPSCs transfected with Cas9 and target-specific crRNAs resulted in mCherry expression, which served as a surrogate for editing success at these transcriptionally silent loci and enabled enrichment of putatively edited cells.
[0298] Inverted Tia1L protospacer sites are included flanking the mCherry donor selection cassette to enable excision (FIG. 33A). These protospacers were included to enable Cas9/CRISPR-mediated excision of the selection cassette after mCherry-expressing cells were initially enriched (FIG. 33A - FIG. 33F). The Tia1L target sequence is absent from the human genome and has been used to ligate distinct double strand breaks induced by Cas9 (Lackner, Carre et al. 2015). Sites were designed in the “PAM-out” orientation such that NHEJ-mediated double strand repair following Cas9 activity would result in an in-frame mEGFP fusion with the target gene. The peptide linker sequences incorporated within the Tia1L sites were designed and oriented such that NHEJ-based repair after excision would result in an in-frame coding sequence with 12 bp of residual sequence (encoding Ser-Gly-Pro-Gly) that served as a canonical linker between the mEGFP and the target gene (FIG. 33A).
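The in-frame requirement for the residual sequence can be checked with a short sketch. The 12 bp sequence used below is hypothetical: the actual residual sequence is not given here, and `AGCGGCCCGGGC` is simply one of many sequences that would encode Ser-Gly-Pro-Gly under the standard genetic code. The codon table is deliberately restricted to serine, glycine, and proline codons.

```python
# Standard-code codons for the three residues of interest only.
CODON_TABLE = {
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser", "AGT": "Ser", "AGC": "Ser",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
}


def preserves_frame(residual_dna):
    """A residual insertion keeps the downstream tag in frame if its length is a multiple of 3."""
    return len(residual_dna) % 3 == 0


def translate_linker(residual_dna):
    """Translate an in-frame residual sequence; codons outside the small table map to '?'."""
    if not preserves_frame(residual_dna):
        raise ValueError("residual sequence would shift the reading frame")
    codons = [residual_dna[i:i + 3] for i in range(0, len(residual_dna), 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


# Hypothetical 12 bp residual sequence (assumption, not taken from the source).
residual = "AGCGGCCCGGGC"
print(preserves_frame(residual))   # True
print(translate_linker(residual))  # ['Ser', 'Gly', 'Pro', 'Gly']
```

A residual sequence of 11 or 13 bp would fail the frame check, which is the outcome the linker design is intended to avoid.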
[0299] An additional feature of the donor template is the inclusion of microhomology-containing sequences composed of hexa- and tri-nucleotide repeats to encode common peptide linkers in the mEGFP-fusion. Microhomology-mediated end joining (MMEJ) events utilizing these repeat sequences bias excision repair outcomes, efficiently delete the residual sequence remaining from Cas9 cleavage, and lead to a more favorable, predictable, and designable linker sequence.
[0300] Five donor plasmids, one for each of the five genes, were designed with a selection cassette in tandem with mEGFP, with each plasmid containing the locus-specific homology regions for its gene. All plasmids were similar except for the use of two different promoters: the low expressing but more compact hPGK (human phosphoglycerate kinase) promoter for ACTN2 and TNNI1, and the more active, stable and larger CAGGS promoter for TTN, MYL7, and MYL2.
Step 1 - HDR-mediated delivery of the mCherry selection cassette and mEGFP to target loci
[0301] Donor plasmids were introduced into WTC hiPSCs using a described RNP-mediated electroporation protocol (Roberts, Haupt et al. 2017) and the rate of HDR was evaluated as indicated by the fraction of mCherry-expressing cells with flow cytometry. The significant increase in mCherry-expressing cells with gene-specific crRNAs compared to mock control transfections with the plasmid indicated HDR-mediated incorporation of this large donor sequence at all five loci (FIG. 34A - FIG. 34C). Because artificial promoters often display variable responses in pluripotent stem cells, versions of the donor cassette with two different promoters were tested: the low expressing hPGK with the first two genes attempted (TNNI1 and ACTN2) and the higher expressing CAGGS for the subsequent genes tested. Since these experiments were conducted sequentially, each target gene was not tested with both promoters. Despite a larger population of mCherry positive cells in the TNNI1 and ACTN2 experiments (FIG. 34A), the fluorescence intensity was modest as expected with the hPGK promoter (FIG. 34A - FIG. 34B). In comparison, the CAGGS-driven mCherry fluorescence intensity was greater and led to ease of sorting (FIG. 34A, y-axis).
[0302] Two different crRNAs were used for each target because editing success can be highly dependent on the target sequence at a given target locus (Roberts and Haupt et al. 2017). Both crRNAs for all five genes resulted in successful HDR at a rate (0.4-2%) consistent with a previous study (0.2-5%) tagging stem cell expressed genes. In contrast, experiments performed with control non-targeting crRNA or no crRNA showed very few mCherry-expressing cells (FIG. 34A). These control conditions reveal the low rate of random, non-targeted integration of the construct into the genome (FIG. 34A, bottom row, B). Cells identified as mCherry-positive were sorted, expanded and confirmed via FACS and/or imaging to consist of >90% mCherry-positive cells (FIG. 34C for examples, FIG. 35A mock).
Step 2 - Excision of the mCherry selection cassette with CRISPR/Cas9 and NHEJ/MMEJ repair
[0303] The FACS-enriched mCherry-positive putatively edited cell populations were subjected to a second round of editing with RNP complexes that targeted the Tia1L target sequences to excise the selection cassette. Following this transfection, cultures were recovered, expanded over two passages (~7-8 days), and sorted for mCherry-negative cells. The observation of a subset of mCherry-negative cells was indicative of excision, and this population of cells was more prominent in the Tia1L-transfected cells compared to the mock crRNA in all experiments (FIG. 35A - FIG. 35B).
[0304] This strategy was tested with TNNI1 and ACTN2. However, confirming excision in the TNNI1- and ACTN2-targeted cells was challenging due to the low expression of PGK-driven mCherry in the starting population and the gradual silencing prior to excision as observed in the mock-excised cells (FIG. 35E). Nevertheless, the reproducible loss of mCherry expression relative to mock excision controls and subsequent clonal analysis suggested that excision of the selection cassette occurred in a subset of these TNNI1- and ACTN2-edited cell populations (FIG. 35E). In contrast, identifying and sorting the mCherry-negative excised population for TTN, MYL2, and MYL7 was more straightforward due to the significantly higher and stable expression of CAGGS-driven mCherry before excision. This is highlighted by the 2-7 fold increase in mCherry-negative cells compared to the mock controls for TTN, MYL2, and MYL7 (FIG. 35A). The differences between the percentages of mCherry-negative cells observed in Tia1L and mock excision conditions were used to calculate the absolute rate of excision in each of the five experiments, which varied from 4-13% (FIG. 35C). The estimated frequency of alleles in each Tia1L-excised population that were candidates for tagging was also calculated by dividing this difference in mCherry-negative cell abundance by the absolute abundance of mCherry-negative cells in the Tia1L-excised population (FIG. 35C).
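The two quantities described above follow directly from the flow cytometry percentages and can be written out as a small sketch. The function names and the example percentages (12% mCherry-negative after Tia1L excision, 3% after mock excision) are hypothetical values chosen for illustration; they are not measurements from FIG. 35.

```python
def absolute_excision_rate(pct_negative_excised, pct_negative_mock):
    """Excision rate in percentage points: mCherry-negative cells beyond the mock background."""
    return pct_negative_excised - pct_negative_mock


def candidate_allele_fraction(pct_negative_excised, pct_negative_mock):
    """Share of the mCherry-negative population attributable to excision rather than silencing."""
    if pct_negative_excised <= 0:
        raise ValueError("excised population must contain mCherry-negative cells")
    rate = absolute_excision_rate(pct_negative_excised, pct_negative_mock)
    return rate / pct_negative_excised


# Hypothetical percentages of mCherry-negative cells (assumed for illustration).
excised, mock = 12.0, 3.0
print(absolute_excision_rate(excised, mock))     # 9.0
print(candidate_allele_fraction(excised, mock))  # 0.75
```

With these invented numbers, the mock background accounts for a quarter of the mCherry-negative sort gate, so roughly three quarters of sorted cells would be expected to carry an excised cassette.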
Step 3 - Initial confirmation of mEGFP-tagging within mosaic cardiomyocyte cultures
[0305] The mCherry-negative populations were sorted, expanded, and evaluated for expression of the mEGFP-fusion upon differentiation into cardiomyocytes (FIG. 35D). All five gene-edited populations underwent robust cardiomyocyte differentiation, with high levels (>86%) of expression of the cardiac marker cardiac troponin T (cTnT). A subset of mEGFP-expressing cells (1-15%) was identified within the cardiomyocyte populations (cTnT+) in all five targeting experiments (FIG. 35D). In contrast, little mEGFP expression was observed in non-cardiomyocyte cells (cTnT-), strongly suggesting that mEGFP expression was specific to cardiomyocytes. The expression and sarcomeric localization of the mEGFP-fusion protein in a subset of cardiomyocytes was also confirmed by microscopy (data not shown).
Step 4 - Genetic screening for clones with precisely edited mEGFP-tagged alleles
[0306] 150-200 colonies were isolated from each putatively excised population and screened for precise editing similar to previously described methods (Roberts and Haupt 2017). A sequential genomic screen was performed consisting of a primary multiplexed ddPCR assay to measure the genomic copy number of several key sequences from each clone (mEGFP/Amp/mCherry) followed by junctional PCR and Sanger sequencing as described below.
[0307] First, the presence and genomic copy number of the mEGFP tag sequence in each clone were evaluated. The presence of at least one copy of the mEGFP tag was taken as evidence that the homology donor cassette was stably incorporated into the genome. The majority of clones contained mEGFP, and those with one copy, indicative of a monoallelic edit, were selected. Clones with no mEGFP or mosaic mEGFP (a non-integer genomic copy number) were rejected at this step. No biallelic clones were detected for any of the target genes (FIG. 36C). All clones with Amp and/or mCherry ddPCR signal were rejected with the goal of identifying clones that had monoallelic mEGFP, no plasmid backbone incorporation (Amp-) and excision of the mCherry cassette (mCherry-). In some clones, several genomic copies of mCherry and/or Amp were observed, although the frequencies of these events varied by gene (FIG. 36A). It was hypothesized that these ddPCR profiles indicate multiple misediting and sorting outcomes including incorporation of the plasmid backbone sequence into the genome, stable integration at an off-target site during the first editing step, and/or imperfect selection post-excision. Notably, this was rare for MYL2 where a high rate of precise editing was observed (60%) (FIG. 36A - FIG. 36B). Using this multiplexed screen, we identified at least two clones from all five targeting experiments whose ddPCR signatures indicated the stable presence of a single mEGFP tag copy in the genome and the absence of Amp, an indicator of imprecise HDR, and mCherry, an indicator of inefficient excision.
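The screening decisions in this paragraph can be summarized as a small decision function. This is an illustrative sketch only: the inputs are assumed to be copy numbers per genome that have already been normalized to the reference assay, and the tolerance and background cut-offs (`tol`, `background`) are hypothetical parameters, not values taken from the screen.

```python
def classify_clone(megfp, amp, mcherry, background=0.2, tol=0.2):
    """Classify a clone from per-genome ddPCR copy numbers.

    background and tol are assumed, illustrative cut-offs.
    """
    nearest = round(megfp)
    # A copy number far from any integer suggests a mixed (mosaic) clone.
    if abs(megfp - nearest) > tol:
        return "reject: mosaic mEGFP"
    if nearest == 0:
        return "reject: no mEGFP"
    # Backbone (Amp) or residual cassette (mCherry) signal disqualifies the clone.
    if amp >= background or mcherry >= background:
        return "reject: Amp and/or mCherry signal"
    if nearest == 1:
        return "candidate: monoallelic mEGFP"
    return f"not selected: {nearest} mEGFP copies"


# Hypothetical clones: (mEGFP, Amp, mCherry) copies per genome.
for clone in [(1.02, 0.0, 0.05), (0.5, 0.0, 0.0), (1.0, 1.1, 0.0)]:
    print(clone, "->", classify_clone(*clone))
```

Only clones in the "candidate" class would proceed to junctional PCR and sequencing in a workflow of this shape.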
[0308] Junctional PCR was performed on the mEGFP+/mCherry-/AmpR- clones from the ddPCR assay to confirm precise editing at the appropriate genomic location. PCR primers were designed to amplify the sequence junctions between the mEGFP tag and the genomic sequences 5’ and 3’ of the homology arms. All mEGFP+/mCherry-/AmpR- clones identified from the ddPCR screen underwent editing at the appropriate genomic locus, as judged by the successful amplification of overlapping PCR products on both sides of the tag sequence insertion (data not shown). The extent to which these PCR products matched the anticipated product size was specific to each experiment (FIG. 36C). Of these, appropriate PCR junction products were observed for 50% of the 10 TNNI1 clones, 25% of the TTN clones, 27% of the 11 MYL7 clones, and 92% of the 66 MYL2 clones, and were further considered as candidates (FIG. 36C). All 8 ACTN2 clones identified by the ddPCR screen contained an aberrantly large 3’ junction due to a duplication in the homology arm region during HDR (data not shown). Within the set of mEGFP+/mCherry-/AmpR- clones identified from all experiments, all clones that displayed aberrantly large PCR junction products were flawed at either the 5’ or the 3’ junction, but not both, supporting the conclusion that HDR misrepair is a strand-specific DNA repair event.
[0309] Next, Sanger sequencing of the target locus was performed on clones with appropriately sized junctions to test for precise HDR and excision outcomes. The majority of clones confirmed by 5’ junctional PCR contained sequences with predicted NHEJ or MMEJ outcomes following excision (FIG. 36D). In addition to in-frame deletion of the selection cassette, precisely edited clones contained a range of linker-mEGFP fusions made possible by the donor template design. 19% of clones (n=36) contained a SGPG linker consistent with an NHEJ outcome (FIG. 36D, 36F). Other in-frame linkers of differing length arising from 3-21 bp deletions occurred in 64% of clones and were likely a result of MMEJ-driven repair (FIG. 36D, 36F). 17% of clones were rejected due to small out-of-frame deletions in the sequence near the excision site (FIG. 36D, 36F). Finally, all clones with PCR-validated 3’ junctions (n=23) also showed the anticipated sequence identity (data not shown).
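The accept/reject logic for sequenced excision junctions follows from reading-frame arithmetic: a deletion preserves the mEGFP reading frame only when its length is a multiple of three. The sketch below illustrates this under the assumption that each deletion is measured in base pairs relative to the expected in-frame junction; the function name and the example deletion sizes are hypothetical.

```python
from collections import Counter


def classify_junction(deleted_bp):
    """Label a deletion as in-frame or out-of-frame by its length in bp."""
    if deleted_bp < 0:
        raise ValueError("deletion length cannot be negative")
    # Multiples of 3 remove whole codons and keep the downstream frame intact.
    return "in-frame" if deleted_bp % 3 == 0 else "out-of-frame"


# Hypothetical deletion sizes (bp), not data from FIG. 36D.
deletions = [0, 3, 6, 21, 4, 1]
summary = Counter(classify_junction(d) for d in deletions)
print(summary)
```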
[0310] In a final assay, the untagged allele was amplified and sequenced in all clones with correct ddPCR signatures and validated junctions, and in some cases, clones with incorrect junctions. NHEJ damage to the untagged allele was observed in 3/39 clones across all five experiments (FIG. 36E), and these clones were rejected. The low rate of observed NHEJ damage to the untagged alleles within HDR-targeted clones was consistent with our previous report (Roberts, Haupt et al. 2017).
Step 5 - Confirmation and validation of mEGFP-tagging in hiPSC-derived cardiomyocytes
[0311] All clones that satisfied the genomic criteria for precise in-frame mEGFP-fusion to the targeted locus after step 4 were subjected to directed cardiomyocyte differentiation to evaluate mEGFP-tagging of the sarcomeric proteins. All clones demonstrated robust differentiation into cardiomyocytes, with mEGFP expression and spontaneous contractile beating emerging within 12 days of differentiation in all tagged ACTN2, TNNI1, TTN, and MYL7 clones (FIG. 37A - FIG. 37B, FIG. 38). Flow cytometry 12 days after initiating cardiac differentiation revealed that the majority of cells were cardiomyocytes (>78% cTnT+) (FIG. 37A - FIG. 37B, FIG. 38). Furthermore, these cardiomyocytes also expressed mEGFP (>93% mEGFP+/cTnT+) in all tested clones, suggesting that mEGFP-tagged alleles are expressed during cardiomyocyte differentiation (FIG. 37A - FIG. 37B, FIG. 38). Consistent with the varying transcript abundance observed by bulk RNA-sequencing, the intensity of mEGFP expression varied among the five genes, with the lowest levels of expression observed with TTN-mEGFP (FIG. 37A). Abundant and timely expression of ACTN2-mEGFP was observed in multiple clones despite imprecise editing (duplication of donor plasmid elements) at the 3’UTR (FIG. 37A - FIG. 37C). Consistent with the previous observations in the non-clonal differentiation experiments, cTnT-/mEGFP+ cells within the clonal differentiated cells were extremely scarce (<1%) in all clones tested, suggesting that expression of the fusion protein was specific to the cardiomyocyte lineage (FIG. 37A).
Step 6 - Imaging of clonal mEGFP-tagged hiPSC-derived cardiomyocytes
[0312] In addition to confirming expression of mEGFP by flow cytometry, cardiomyocytes generated from all five gene edited clonal lines were re-plated on glass-bottom plates with PEI/laminin to perform live cell imaging and immunocytochemistry (FIG. 39A - FIG. 39C). Live imaging revealed sarcomeric localization of the mEGFP-tagged fusion proteins with canonical striations localizing to the sarcomere as expected (FIG. 39A). In addition to confirming sarcomeric localization of mEGFP for each tagged protein, expected differences in protein localization for distinct structures were also observed. For example, MYL7 and TNNI1 are expected to localize between z-bands in the myofibril; a thick banding pattern of MYL7-GFP and TNNI1-GFP was observed, with dark lines marking the z- and m-bands. In contrast, alpha-actinin localizes to the z-line of the sarcomere and titin to the m-line of the myofibril. These proteins both show a thinner banding pattern within the sarcomere, reflecting different localization and function in the sarcomere. Antibodies specific to the targeted proteins co-localized with mEGFP, confirming appropriate localization of the tagged protein in cardiomyocytes (FIG. 39B). Expression of cTnT in edited cardiomyocytes was confirmed (FIG. 39C). Unedited cells were also fixed and immunolabeled with antibodies specific to each target, and labeling patterns were found to be comparable to that in the mEGFP-tagged cells (data not shown). The vast majority of cTnT+ cells were also mEGFP+, even after 3-4 weeks of differentiation, indicating that expression of the mEGFP-tagged allele in these cardiomyocytes remained stable and clonal over time.
[0313] Upon confirmation of the appropriate expression and localization of the mEGFP-fusion in cardiomyocytes, each edited hiPSC clonal line was subjected to the same screening and quality control process described previously to ensure genomic (karyotype), stem cell (pluripotency), and cell biological (morphology, growth rate) integrity (Roberts and Haupt 2017). The majority of these clones passed these quality control standards and a subset of these clones were expanded and banked (FIG. 38).
Discussion
[0314] Endogenous fluorescent tagging in hiPSCs has enabled live imaging to study the organization and dynamics of key functional proteins and structures in stem cells and their derivatives. The advent of efficient and accessible gene editing tools like CRISPR/Cas9 has only recently merged endogenous tagging approaches with the differentiation potential of hiPSCs, as demonstrated in a recent study evaluating adhesion in sarcomere assembly using paxillin-mEGFP tagged cardiomyocytes (Chopra, Kutys et al. 2018). This more recent approach using pluripotent models follows established in vivo approaches with knock-in mice expressing endogenously tagged MYL2 and TTN that have revealed basic mechanisms of sarcomere dynamics (da Silva Lopes, Pietas et al. 2011, Ishizu, Higo et al. 2017). Partnering insights gained from in vivo studies with hiPSC-derived models can reveal mechanisms important for differentiation, development, and, especially, disease. However, such approaches require the development of systematic and scalable strategies for endogenous tagging, irrespective of transcriptional activity of the target gene in hiPSCs.
[0315] Described herein is a unique multi-step CRISPR/Cas9-mediated editing strategy for tagging non-expressed genes in hiPSCs and methodology to mEGFP-tag several genes expressed specifically during cardiomyocyte differentiation. A donor plasmid design enabled detection of HDR at targeted non-expressed loci, enrichment for putatively tagged cells with a constitutively expressed selection cassette, and generation of mEGFP-fusion alleles lacking genomic scars using a Cas9/MMEJ-driven excision strategy. This approach uniquely utilizes CRISPR/Cas9 for both the incorporation and subsequent excision of the selection cassette and provides the added benefits of selection without drugs. Confirmation of delivery of the selection cassette via HDR, enrichment for mCherry expression, excising the selection cassette, and incorporation of a tunable linker in multiple clones derived from all five edited hiPSC lines indicates the applicability of this method and workflow.
[0316] Previous mEGFP-tagging experiments targeted 10 expressed genes in WTC hiPSCs using a similar HDR delivery protocol, but with donor plasmids designed to deliver only an in-frame mEGFP tag at the N- or C-terminus of the target gene. HDR rates typically ranged from 0.1-5% for these transcriptionally active loci in hiPSCs. Similar rates were observed (0.4-2%) in the current study using a much larger donor plasmid, suggesting that targeting non-expressed rather than expressed loci with a selection cassette much larger than the tag alone permits similar, tenable HDR rates for silent editing. Testing multiple crRNAs in parallel for each target also ensured editing success. In both studies, >90% edited cells were recovered by flow sorting, which highlights the utility of a drug-free enrichment strategy, especially when HDR rates are low. CRISPR/Cas9 was utilized a second time to excise the mCherry selection cassette from the enriched population of edited cells. The relative efficiency of this step (4-12%) varied by gene and promoter, with the CAGGS promoter (TTN, MYL2 and MYL7) preferable to hPGK (ACTN2, TNNI1). In all cases, the rate of excision was sufficient for robust FACS enrichment and exceeded the rate of HDR observed in the initial delivery step.
[0317] Successful gene tagging was achieved with this methodology for all five target genes with a significant number of clones containing mEGFP at the intended locus. Further screening by PCR and Sanger sequencing identified a subset of clones with precise editing as defined by the absence of any mutations or duplications at either the tagged or untagged allele. The frequency of precise editing, as determined by the percentage of clones validated in a ddPCR screen, varied by target locus, much like in a previously reported data set targeting expressed loci. Since the same Tia1L sequences were used for excising the selection cassette in the second step of this method in all five experiments, the variability in editing precision is most likely introduced during the first editing event of HDR, which is influenced by the target locus and crRNA.
[0318] Also provided herein is the use of MMEJ sequences to guide scarless excision with a tunable linker, as well as the introduction of a canonical peptide linker when DNA repair is mediated by NHEJ, as observed in multiple clones. These features provide flexibility for adding a specific linker based on known properties of the target protein and provide a strategy to precisely engineer various edited outcomes. This can include the addition of an epitope tag, a cleavable peptide, or no intervening sequence, as demonstrated for correcting disease mutations in a study (Kim, Matsumoto et al. 2018). Results with these five target genes suggest that the use of a CAGGS-driven selection cassette along with specific microhomology sequences provides a means of introducing scarless edits at silent loci. This represents the first report utilizing MMEJ with CRISPR/Cas9-mediated endogenous tagging to generate scarless edits and tunable linkers at silent loci.
[0319] All precisely edited clonal cell lines demonstrated robust cardiomyocyte differentiation and timely expression of the mEGFP-fusion protein, suggesting endogenous regulation of the tagged protein during cardiac differentiation. Appropriate expression and localization was observed in tagged ACTN2 clones despite imprecise editing at the 3’UTR, suggesting that the compromised UTR does not affect the expression or localization of ACTN2-mEGFP. Tracking the expression and localization of the mEGFP-tagged early and late markers MYL2 and MYL7 will help evaluate the expression and dynamics of the fusion proteins during the differentiation and maturation process, as well as validate the integrity of the tagged alleles. Since the five tagged genes encode important sarcomeric proteins, successful tagging will enable live cell analysis of structural organization during differentiation and development.
[0320] The ability to generate multiple mEGFP-tagged clonal hiPSC lines for several cardiomyocyte-specific genes using this methodology demonstrates the feasibility of this CRISPR/Cas9-mediated, scarless approach for endogenously tagging silent loci in hiPSCs. The methods described here may be broadly applied to tagging silent genes that are important for multiple cell types from all three lineages.
Materials and Methods
Cell Culture
[0321] All work with human induced pluripotent stem cell (hiPSC) lines was approved by internal oversight committees and performed in accordance with applicable NIH, NAS, and ISSCR Guidelines. The WTC hiPSC cell line was generated by Dr. Bruce Conklin (The Gladstone Institutes) and maintained using described methods (Kreitzer, Salomonis et al. 2013). Edited cell lines described in this report can be obtained by visiting the Allen Cell Explorer (www.allencell.org, Allen Institute for Cell Science 2017).
Donor Plasmids, crRNAs and Cas9 Protein
[0322] Donor plasmids were designed uniquely for each target locus. Homology arms 5' and 3' of the desired insertion site were each 1 kb in length and designed using the GRCh38 reference genome. WTC-specific variants (SNPs and INDELs) were identified from publicly available exome data (UCSC Genome Browser) and also internal exome data. In cases where the WTC-specific variant was heterozygous, the reference genome variant was used in the donor plasmid; when the WTC-specific variant was homozygous, the WTC-specific variant was used in the donor plasmid. Linkers for each protein were unique to each target and were included in the donor plasmid with microhomology-containing redundancies, such that after MMEJ, the restored sequence would function as a linker 5' of mEGFP in each C-terminal tagging experiment. Linker sequence was designed to flank Tia1L crRNA binding sites, which in turn flanked the mCherry expression cassette sequence containing either the PGK promoter or the CAGGS promoter. To prevent crRNAs from targeting the donor plasmid sequence, mutations were introduced to disrupt Cas9 recognition or crRNA binding; when possible, these changes did not affect the amino acid sequence. When this was unnecessary, because the crRNA binding sequence spanned the designed genomic insertion site and was thus abrogated, no additional mutations were made in the donor plasmid. The plasmids were synthesized and cloned into a pUC57 backbone and prepared for transfection by Genewiz. Custom synthetic crRNAs and their corresponding tracrRNAs were ordered from either IDT or Dharmacon. Recombinant wild type Streptococcus pyogenes Cas9 protein was purchased from the UC Berkeley QB3 Macrolab. All tagging experiments discussed in the current report used the mEGFP (K206A) sequence. Detailed information relating to editing design can be found at The Allen Cell Explorer (Allen Institute for Cell Science 2017).
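The rule for handling WTC-specific variants in the homology arms can be written as a simple lookup. The following is a minimal sketch of that rule only; the function name, the zygosity labels, and the example alleles are hypothetical and do not come from the actual design pipeline.

```python
def donor_allele(reference_allele, wtc_allele, zygosity):
    """Choose the allele to encode in the donor homology arm.

    zygosity describes the WTC-specific variant at this position.
    """
    if zygosity == "heterozygous":
        # One WTC chromosome already carries the reference allele.
        return reference_allele
    if zygosity == "homozygous":
        # Both WTC chromosomes carry the variant, so match it.
        return wtc_allele
    raise ValueError(f"unknown zygosity: {zygosity!r}")


# Hypothetical single-nucleotide variants.
print(donor_allele("A", "G", "heterozygous"))
print(donor_allele("A", "G", "homozygous"))
```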
Transfection and Enrichment by Fluorescence-Activated Cell Sorting (FACS)
[0323] Cells were dissociated into single-cell suspension using Accutase as previously described (Roberts, Haupt et al. 2017). Transfections were performed using the Neon transfection system (ThermoFisher Scientific). Cas9:crRNA:tracrRNA precomplexed at a 1:1:1 ratio and co-transfected with 2 µg of donor plasmid optimally balanced editing efficiency with cell survival after transfection (data not shown); this platform was used for all editing experiments.
[0324] A cell pellet of 8 x 10⁵ cells was resuspended in 100 µL Neon Buffer R with 2 µg donor plasmid, 2 µg Cas9 protein, and duplexed crRNA:tracrRNA in a 1:1 molar ratio to Cas9. Prior to addition to the cell suspension, the Cas9/crRNA:tracrRNA RNP was precomplexed for a minimum of 10 min at room temperature. Electroporation was with one pulse at 1300 V for 30 ms. Cells were then immediately plated onto GFR Matrigel-coated 6-well dishes with mTeSR1 media supplemented with 1% P/S and 10 µM ROCK inhibitor. Transfected cells were cultured as previously described for 3-4 days until the transfected culture had recovered to ~70% confluence. Negative control transfections were performed in all experiments with the crRNA targeting the AAVS1 locus in order to assess the relative rate of random donor cassette incorporation. Cells were cultured for two passages across 7-9 days before analysis, in order to allow mCherry expression from the episomal donor plasmid to decline.
[0325] Cells were harvested for FACS using Accutase as previously described. The cell suspension (0.5-1.0 x 10⁶ cells/mL in mTeSR1 with ROCK inhibitor) was filtered through a 35 µm mesh filter into polystyrene round bottomed tubes. Cells were sorted using a FACSAriaIII Fusion (BD Biosciences) with a 130 µm nozzle and FACSDiva software (BD Biosciences). Forward scatter and side scatter (height versus width) were used to exclude doublets and the mEGFP+ gate was set using live, untransfected WTC cells such that <0.1% of untransfected cells fell within the gate. Sorted populations were plated into GFR Matrigel-coated 96-well plates (<2 x 10³ cells recovered) or 24-well plates (<1 x 10⁴ cells recovered) for expansion of the whole enriched population before clone isolation. To determine % HDR, data were analyzed using FlowJo V.10.2 (TreeStar, Inc.).
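The gating rule, in which fewer than 0.1% of untransfected control cells may fall inside the positive gate, amounts to choosing a high quantile of the control distribution as the threshold. The sketch below shows one way to compute such a threshold; it is not the FACSDiva or FlowJo procedure, and the function name and the synthetic intensity values are hypothetical.

```python
def gate_threshold(control_intensities, max_fraction=0.001):
    """Return a threshold T such that strictly fewer than max_fraction
    of control events have intensity greater than T."""
    values = sorted(control_intensities)
    n = len(values)
    if n == 0:
        raise ValueError("control sample is empty")
    # Largest number of control events that may exceed the threshold.
    allowed = 0
    while (allowed + 1) / n < max_fraction:
        allowed += 1
    return values[n - 1 - allowed]


def percent_positive(intensities, threshold):
    """Percentage of events with intensity above the threshold."""
    return 100.0 * sum(v > threshold for v in intensities) / len(intensities)


# Synthetic control: 10,000 events with intensities 0..9999 (illustrative only).
control = list(range(10000))
threshold = gate_threshold(control)
print(threshold, percent_positive(control, threshold))
```

Setting the gate on the untransfected population first, then applying the same threshold to each transfected sample, keeps the false-positive rate bounded regardless of how bright the true positives are.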
Clonal Cell Line Generation
[0326] FACS-enriched populations of edited cells were seeded at a density of 1 x 10⁴ cells in a 10 cm GFR Matrigel-coated tissue culture plate. After 5-7 days clones were manually picked with a pipette and transferred into individual wells of 96-well GFR Matrigel-coated tissue culture plates with mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor for 1 day. After 3-4 days of normal maintenance with mTeSR1 supplemented with 1% P/S, colonies were dispersed with Accutase and transferred into a fresh GFR Matrigel-coated 96-well plate. After recovery, the plate was divided into daughter plates for ongoing culture, freezing, and gDNA isolation.
[0327] To cryopreserve clones in a 96-well format, when cells were 60-85% confluent they were dissociated and pelleted in 96-well V-bottom plates. Cells were then resuspended in 60 µL mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor. Two sister plates were frozen using 30 µL cell suspension per plate, added to 170 µL CryoStor® CS10 (Sigma) in non-GFR Matrigel coated 96-well tissue culture plates. Plates were sealed with Parafilm and stored at -80°C.
Genetic Screening with Droplet Digital PCR (ddPCR)
[0328] During clone expansion, >1500 cells were pelleted from a 96-well plate for total gDNA extraction using the PureLink Pro 96 Genomic DNA Purification Kit (ThermoFisher Scientific). ddPCR was performed using the Bio-Rad QX200 Droplet Reader, Droplet Generator, and QuantaSoft software. The reference assay for the 2-copy, autosomal gene RPP30 was purchased from Bio-Rad (assay ID dHsaCP1000485, cat. # 10031243). For primary ddPCR screening the assay consisted of three hydrolysis probe-based PCR amplifications targeted to three different genes: mEGFP (insert), AMP or KAN (backbone), and the genomic reference RPP30. The following primers were used for the detection of mEGFP (5'-GCCGACAAGCAGAAGAACG-3', 5'-GGGTGTTCTGCTGGTAGTGG-3') and hydrolysis probe (/56-FAM/AGATCCGCC/ZEN/ACAACATCGAGG/3IABkFQ/). This assay was run in duplex with the genomic reference RPP30-HEX. The PCR for detection of the AMP gene used the primers (5'-TTTCCGTGTCGCCCTTATTCC-3', 5'-ATGTAACCCACTCGTGCACCC-3') and hydrolysis probe (/5HEX/TGGGTGAGC/ZEN/AAAAACAGGAAGGC/3IABkFQ/). The PCR for detection of the KAN gene used the primers (5'-AACAGGAATCGAATGCAACCG-3', 5'-TTACTCACCACTGCGATCCC-3') and hydrolysis probe (/5HEX/GTGAAAATA/ZEN/TTGTTGATGCGCTGG/3IABkFQ/).
[0329] PCR reactions were prepared using the required 2x Supermix for Probes (No dUTP) (Bio-Rad) with a final concentration of 400 nM for primers and 200 nM for probes, together with 10 units of HindIII and 3 µL of sample (30-90 ng DNA) to a final volume of 25 µL. Each reaction prior to cycling was loaded into a sample well of an 8-well disposable droplet generation cartridge followed by 70 µL of droplet generator oil into the oil well (Bio-Rad). Droplets were then generated using the QX200 droplet generator. The resulting emulsions were then transferred to a 96-well plate, sealed with a pierceable foil seal (Bio-Rad), and run to completion on a Bio-Rad C1000 Touch thermocycler with a Deep Well cycling block. The cycling conditions were: 98°C for 10 min, followed by 40 cycles (98°C for 30 s, 60°C for 20 s, 72°C for 15 s) with a final inactivation at 98°C for 10 min. After PCR, droplets were analyzed on the QX200 and data analysis was performed using QuantaSoft software.
[0330] The AMP or KAN signal was determined to be from residual non-integrated/background plasmid when the ratio of AMP/RPP30 or KAN/RPP30 fell below 0.2 copies/genome, because this was the maximum value of non-integrated plasmid observed at the time point used for screening in control experiments (data not shown). To ensure that no significant amplification bias existed between the mEGFP and AMP amplicons, a dilution series was performed using a known plasmid containing both the mEGFP and AMP sequence. 78-5000 copies of plasmid were loaded per well and both mEGFP and AMP primers and probes were multiplexed together to ensure that the value returned corresponded to the copies of plasmid loaded. For primary screening the ratios of (copies/µL mEGFP)/(copies/µL RPP30) were plotted against (copies/µL AMP)/(copies/µL RPP30) to identify cohorts of clones for ongoing analysis.
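The normalization and background cut-off described in this paragraph can be expressed in a few lines. This is an illustrative sketch only: the function names and the example concentrations are hypothetical, and the ratio is compared to 0.2 directly, as stated in the text, without any further conversion.

```python
def ratio_to_reference(target_copies_per_ul, rpp30_copies_per_ul):
    """Ratio of target concentration to the RPP30 reference concentration."""
    if rpp30_copies_per_ul <= 0:
        raise ValueError("reference concentration must be positive")
    return target_copies_per_ul / rpp30_copies_per_ul


def is_background_plasmid(backbone_ratio, threshold=0.2):
    """True when backbone signal is below the non-integrated plasmid cut-off."""
    return backbone_ratio < threshold


# Hypothetical droplet reader output in copies/µL.
amp_ratio = ratio_to_reference(50.0, 500.0)
print(amp_ratio, is_background_plasmid(amp_ratio))
```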
Genetic Screening with Tiled Junctional PCR
[0331] Clones were evaluated with PCR using primers spanning the junction between the tag sequence and the endogenous genomic sequence distal to the homology arm to determine whether all mEGFP+/mCh-/AmpR- clones from each targeting experiment were targeted at the intended locus (data not shown). PCR was used to amplify the tagged allele in two tiled reactions spanning the left and right homology arms, the mEGFP and linker sequence, and portions of the distal genomic region 5' of the left homology arm and 3' of the right homology arm (FIG. 34A - FIG. 34C) using gene-specific primers. Both tiled junctional PCR products were Sanger sequenced (Genewiz) bidirectionally with PCR primers when their size was validated by gel electrophoresis and/or fragment analysis (Advanced Analytical Technologies, Inc. Fragment Analyzer). In final clones, a single, non-tiled junctional PCR reaction using the gene-specific external 5' and 3' junctional primers was used to amplify both the edited and wild type allele in a single reaction. All PCR reactions described above were prepared using PrimeStar® (Takara) 2x GC buffer, 200 µM dNTPs, 1 unit PrimeStar® HS polymerase, 800 nM primers, 10 ng gDNA in a final volume of 25 µL. Cycling conditions were as follows: (98°C 10 s, 70°C 5 s, 72°C 60 s) x 6 cycles at -2°C/cycle annealing temperature, (98°C 10 s, 54°C 5 s, 72°C 60 s) x 32 cycles, 12°C hold.
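The touchdown cycling conditions can be expanded into a per-cycle annealing schedule. The sketch below assumes that the first touchdown cycle anneals at 70°C and that each subsequent touchdown cycle is 2°C lower; that reading of the protocol, along with the function and parameter names, is an assumption made for illustration.

```python
def touchdown_schedule(start_c=70, step_c=-2, touchdown_cycles=6,
                       final_c=54, final_cycles=32):
    """Return the annealing temperature (°C) for every PCR cycle in order."""
    # Touchdown phase: annealing temperature drops by a fixed step each cycle.
    schedule = [start_c + i * step_c for i in range(touchdown_cycles)]
    # Main phase: constant annealing temperature.
    schedule += [final_c] * final_cycles
    return schedule


schedule = touchdown_schedule()
print(schedule[:6])
print(len(schedule))
```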
Screening for Clones with Wild Type Untagged Allele Sequences
[0332] PCR was also used to amplify the untagged allele using gene-specific primers. These primers did not selectively amplify the unmodified locus, as was the case for tiled junctional PCR amplification of the tagged allele, but rather amplified both untagged and tagged alleles. PCR was performed with the same PrimeStar® reagents and cycling conditions as described above. Tracking of insertions and deletions (INDELs) by decomposition (TIDE) analysis was performed manually on the amplification reaction after bidirectional Sanger sequencing in order to determine the sequence of the untagged allele. For all final clones with wild type untagged alleles, the PCR product corresponding to the untagged allele was gel isolated and sequenced to confirm the initial result from TIDE analysis.
Cell Plating for Imaging
[0333] Cells were plated on glass bottom multi-well plates (1.5H glass, Cellvis) coated with phenol red-free GFR Matrigel (Corning) diluted 1:30 in phenol red-free DMEM/F12 (Gibco). Cells were seeded at a density of 2.5 x 10³ in 96-well plates and 12.5-18 x 10³ on 24-well plates and fixed or imaged 3-4 days later.
Directed Differentiation of hiPSCs to Cardiomyocytes
[0334] Cardiomyocyte differentiation was achieved using a small molecule differentiation protocol similar to previously reported methods, with optimizations to small molecule concentration and timing (Lian et al. 2013). Briefly, cells were seeded onto GFR Matrigel-coated 6-well tissue culture plates at a density ranging from 0.15-0.25 x 10⁶ cells per well in mTeSR1 supplemented with 1% P/S and 10 µM ROCK inhibitor, designated as day -3. Cells were grown for three days, with daily mTeSR1 media changes (day -2 and day -1). The following day (designated day 0), directed cardiac differentiation was initiated by treating the cultures with 7.5 µM CHIR99021 (Cayman Chemical) in RPMI media (Invitrogen) containing insulin-free B27 supplement (Invitrogen). After 48 hours (day 2), cultures were treated with 7.5 µM IWP2 (R&D systems) in RPMI media containing insulin-free B27 supplement. At day 4, cultures were treated with RPMI media supplemented with insulin-free B27 supplement. From day 6 onwards, media was replaced with RPMI media supplemented with B27 with insulin (Invitrogen) every 2 days. Two lines, TNNI1 and ACTN2 populations, were differentiated using previously described methods (Roberts, Haupt et al. 2017).
Cardiomyocyte Plating for Imaging and cTnT Expression Measurement
[0335] To prepare 24-well glass bottom plates for cardiomyocyte plating, plates were treated with 0.5M glacial acetic acid (Fisher Scientific) at room temperature for 20-60 minutes and washed three times with sterile milliQ (MQ) water. Wells were treated with 0.1% PEI (Sigma Aldrich) solution in sterile MQ water overnight at 4°C, then rinsed 2 times with DPBS and one time with sterile MQ water. Finally, wells were treated with 25 µg/mL natural mouse laminin (Thermo Fisher Scientific) and incubated overnight at 4°C. Laminin solution was removed immediately prior to re-plating.
[0336] Cells were harvested using TrypLE Select (10x) (Invitrogen), diluted to 2x with Versene (Invitrogen) and warmed to 37°C. Cells were washed twice with PBS and incubated with 2x TrypLE/Versene for 6-12 min at 37°C. Cells were gently triturated 8-12x, collected, and pelleted in RPMI media (Invitrogen) containing B27 with insulin (Invitrogen), 10 µM ROCK inhibitor, and 200 U/mL DNase I (Millipore Sigma) at 1000 rpm for 3 min at room temperature. Cells were resuspended in the same media and a 10 µL aliquot was used to count cardiomyocytes in a hemocytometer (INCYTO C-Chip™). Cells were seeded onto PEI/Laminin coated 24-well glass bottom plates at a density ranging from 0.35-0.5 x 10⁵ cells per well in RPMI media containing B27 with insulin and 10 µM ROCK inhibitor. 24 hours after plating, media was changed to RPMI media containing B27 with insulin. Imaging was performed 5-21 d after plating.
[0337] To measure cardiac Troponin T (cTnT) expression, cells were pelleted at 1000 rpm for 3 min at room temperature. Cells were then fixed with 4% PFA in DPBS for 10 min at room temperature, washed once with DPBS and then resuspended in 5% FBS in DPBS. Fixed cells were incubated in BD Perm/Wash™ buffer containing anti-cardiac Troponin T AlexaFluor® 647 or equal mass of mIgG1, κ AF647 isotype control (all BD Biosciences) for 30 min at room temperature. After staining, cells were washed with BD Perm/Wash™ buffer, then 5% FBS in DPBS and resuspended in 5% FBS in DPBS with DAPI (2 µg/mL). Cells were acquired on a CytoFLEX S (Beckman Coulter) or FACSAriaIII Fusion (BD Biosciences) and analyzed using FlowJo software V.10.2 (TreeStar, Inc.). Nucleated particles were identified as a sharp, condensed peak on a DAPI histogram and were then gated to exclude doublets as previously described (Roberts, Haupt et al. 2017). The cardiac Troponin T (cTnT) positive gate was set to include 1% of cells in the isotype control sample.
Live-cell imaging
[0338] Live cell imaging was performed on a Zeiss spinning-disk microscope with a Zeiss 20x/0.8 NA Plan-Apochromat, or 40x/1.2 NA W C-Apochromat Korr UV Vis IR objective, a CSU-X1 Yokogawa spinning-disk head, and Hamamatsu Orca Flash 4.0 camera. Fixed cell imaging was done on a 3i spinning-disk microscope with a Zeiss 20x/0.8 NA Plan-Apochromat, or 63x/1.2 NA W C-Apochromat Korr UV Vis IR objective, a CSU-W1 Yokogawa spinning-disk head, and Hamamatsu Orca Flash 4.0 camera. Microscopes were outfitted with a humidified environmental chamber to maintain cells at 37°C with 5% CO2 during imaging.
G-banding Karyotype Analysis
[0339] Karyotype analysis was performed by Diagnostic Cytogenetics Inc. (DCI).
A minimum of 20 metaphase cells were analyzed per clone.
RNA-Seq Analysis
[0340] Two clonal populations (one at passage 8 and one at passage 14) were sequenced from the WTC unedited parental line. After dissociation of cell cultures with Accutase, 2-3 × 10^6 cells were pelleted, washed once with DPBS, resuspended in 350 µL of Qiagen RLT plus lysis buffer, then flash frozen in liquid nitrogen before storage at -80°C. 101 bp paired-end libraries were prepared using an Illumina TruSeq Stranded mRNA Library Prep kit. Libraries were sequenced on an Illumina HiSeq 2500 at a depth of 30 million read pairs (Covance). Adapters were trimmed using Cutadapt (Martin 2017). Reads were mapped to human genome build GRCh38 (GCA_000001405.15) and NCBI annotations 107 using STAR aligner (Dobin, Davis et al. 2013). Gene level and isoform level transcript abundances were estimated using Cufflinks (Trapnell, Williams et al. 2010).
Growth Curve Measurements
[0341] Edited and unedited WTC hiPSCs were grown to ~75-80% confluence in a 10 cm plate and passaged via Accutase detachment on day 0 of growth. 4 × 10^3 cells were then plated in triplicate on four GFR Matrigel-coated 96-well plates (one for each of the terminal time points: 0 h, 48 h, 72 h, and 96 h). A standard curve was also plated in triplicate as a two-fold serial dilution from 2 × 10^5 cells to 98 cells. The ATP-based CellTiter-Glo (Promega) kit was used as an indirect measure of cell growth. Briefly, the CellTiter-Glo reagent was added to the live cells at a 1:4 dilution at each of the time points and luminescence counts were read with a Perkin-Elmer Enspire plate reader. The standard curve plate and 0 h plates were read within two hours of plating. Cell numbers were extrapolated from the linear portion of the standard curve for each experiment and the following equation was used to calculate cell doubling time, where Tf is the final time in hours, Xf is the final cell count, and Xi is the initial cell count:
(Tf * LOG(2)) / (LOG(Xf) - LOG(Xi))
[0342] Reported doubling time was calculated using counts at time of seeding (0 h) and at 96 hours after seeding. Two independent experiments were performed for each edited cell line. Triplicate counts from each independent experiment were averaged (leaving two data points per edited cell line and three for unedited WTC) and a one-way ANOVA was performed to test if doubling times between cell lines were significantly different.
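As a worked illustration of the doubling-time equation, the following minimal sketch computes the doubling time from an initial and a final cell count and generates a two-fold standard curve. The function names and the example counts are hypothetical; the 12-point length of the dilution series is inferred from the stated endpoints (2 × 10^5 down to about 98 cells) and is an assumption.

```python
import math


def doubling_time_hours(t_final_h, x_final, x_initial):
    """Doubling time = Tf * log(2) / (log(Xf) - log(Xi))."""
    if x_initial <= 0 or x_final <= x_initial:
        raise ValueError("counts must be positive and show net growth")
    return (t_final_h * math.log(2)) / (math.log(x_final) - math.log(x_initial))


def two_fold_dilution(top_cells, n_points):
    """Cell numbers for a two-fold serial dilution starting at `top_cells`."""
    return [top_cells / 2 ** i for i in range(n_points)]


# Assumed 12-point series: 2e5 / 2**11 is about 98 cells.
standard_curve = two_fold_dilution(2e5, 12)

# Hypothetical counts: 4,000 cells at 0 h growing to 64,000 cells at 96 h
# corresponds to four doublings, i.e. a 24 h doubling time.
example_td = doubling_time_hours(96, 64000, 4000)
```

Because the equation uses a ratio of logarithms, the base of the logarithm does not affect the result.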
Figure Legends
[0343] FIG. 33A - FIG. 33H. Schematic describing multi-step CRISPR/Cas9 mediated targeting via HDR and subsequent microhomology guided excision of the constitutively expressed selection cassette. Donor construct sequences encoding peptide linkers specific to each of the five tagging experiments, and also containing microhomology guiding repeat sequences for excision, are shown. The constitutive expression cassette sequences (PGK/CAGGS promoter driving mCherry expression) flanked by TialL crRNA binding sites utilized to release the cassette are not shown. Oppositely oriented PAM sequences and PAM-3 trinucleotide sequences (turquoise font) anticipated from direct NHEJ without MMEJ are shown. The linker sequence for ACTN2 targeting differed significantly from the linker sequence used for targeting of TTN, TNNI1, MYL2 and MYL7. In-frame translations of each linker region are indicated. Residues encoded by the endogenous open reading frame specific to each locus are shown in orange. Amino acid residues designed to comprise peptide linkers between the endogenous reading frame and mEGFP tag, after successful excision, are shown in blue. Invariant amino acid linker residues (P-G-S-G) resulting from translation, after successful excision, of the PAM and PAM-3 sequences from each oppositely oriented TialL crRNA site, are shown in pink. The initial two residues at the N-terminus of mEGFP are shown in green. Nucleotides involved in guiding microhomology-mediated in-frame deletions of the invariant residues are displayed in red font.
[0344] FIG. 34A - FIG. 34C. Fluorescence-activated cell sorting (FACS) experiments to isolate mCherry-expressing cells and establish the efficacy of multi-step editing at transcriptionally silent loci. FIG. 34A. Percentages of mCherry-expressing cells isolated after transfection with donor plasmids in conjunction with Cas9/crRNA complexes targeting the intended locus (top and middle row of panels), alongside mock transfections (bottom row). The percentage of mCherry-expressing cells in each transfection is displayed, as indicated. Boxes indicate the thresholds applied to determine whether individual cells were mCherry-expressing within each analysis. The identity of the targeting crRNA is indicated in blue font within each plot. mCherry fluorescence is indicated on the y-axis and forward scatter (FSC) is indicated on the x-axis. FIG. 34B. All data from FIG. 34A are displayed in graphical format. Standard deviations are indicated where multiple replicate transfections were performed. FIG. 34C. Live imaging and FACS were performed one expansion passage after FACS enrichment in order to validate high fluorescence sorting purity. As an example of these analyses, mCherry fluorescence was imaged in the enriched TNNI1 Crl experiment along with Hoechst nuclear dye (indicated in each panel in the merged image). Cells isolated from the TTN Crl enrichment experiment were analyzed by FACS and found to be 98.8% pure.
[0345] FIG. 35A - FIG. 35D. FACS-sorting of mCherry-negative cells to measure excision and obtain putatively mEGFP-tagged cells. FIG. 35A. mCherry-expressing cells isolated from targeting experiments were transfected after recovery with either TialL or mock RNP. mCherry-negative cells were then collected as putatively excised cells from the TialL transfected condition. The percentages of mCherry-negative cells, according to the displayed gates, were measured and are as indicated for both the TialL and mock transfected conditions. mCherry fluorescence is indicated on the y-axis and forward scatter (FSC) is indicated on the x-axis. FIG. 35B. mCherry-negative cells isolated after TialL-mediated excision were differentiated into cardiomyocytes and analyzed by flow cytometry for mEGFP expression and expression of the cardiomyocyte marker cTnT. mEGFP expression was observed within the population of cells positive for the cardiomyocyte marker cardiac troponin T (cTnT). The percentage of cTnT+/mEGFP+ (upper right sector within each plot) cells is indicated, and was interpreted as a proxy for estimated tagging efficiency. FIG. 35C. Percentages of mCherry-negative cells (as determined by FACS threshold shown in FIG. 35A) from both mock and TialL excision conditions are shown in graphical format. Standard deviation describes variance between replicate conditions. P-values from Student’s t-test are shown, demonstrating significantly elevated abundance of mCherry-negative cells when excision was performed with the TialL crRNA. FIG. 35D. The percentage of on-target excised cells (blue arrows in FIG. 35C), calculated by subtracting the mean percentage of mCherry-negative cells in the mock excised condition from the mean percentage of mCherry-negative cells in the TialL excised condition, is shown on the left y-axis. This value was used as an estimate for absolute rate of excision. The estimated frequency (right y-axis) of putatively excised alleles in the TialL population was determined by subtracting the percentage of mCherry-negative cells in the mock-excised population from the percentage of mCherry-negative cells in the TialL excised population and dividing this value by the total percentage of mCherry-negative cells in the TialL excised population.
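The two estimates described for FIG. 35D can be sketched as follows. This is an illustration only: the function name and the replicate percentages are hypothetical, the difference is taken as TialL minus mock so that the result is positive, and replicate means are assumed to be used for both estimates.

```python
from statistics import mean


def excision_estimates(tial_negative_pct, mock_negative_pct):
    """Return (absolute excision rate in percent, estimated fraction of
    mCherry-negative cells in the TialL condition that are excised)."""
    tial_mean = mean(tial_negative_pct)
    mock_mean = mean(mock_negative_pct)
    if tial_mean <= 0:
        raise ValueError("no mCherry-negative cells in the TialL condition")
    absolute_rate = tial_mean - mock_mean   # left y-axis of FIG. 35D
    fraction = absolute_rate / tial_mean    # right y-axis of FIG. 35D
    return absolute_rate, fraction


# Hypothetical replicate percentages of mCherry-negative cells.
absolute_rate, excised_fraction = excision_estimates([30.0, 34.0], [8.0, 8.0])
```

With these example values, 24% of all cells would be counted as excised, and 75% of the mCherry-negative cells in the TialL condition would be attributed to excision rather than to background loss of mCherry expression.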
[0346] FIG. 35E. Histograms display cell frequencies as a function of mCherry fluorescence from TialL-RNP-excised (blue) and mock-transfected (red) populations of putatively edited cells (mCherry-expressing) from the TNNI1 and ACTN2 targeting experiments. mCherry-negative cells were abundant in the mock condition because hPGK promoter silencing occurred. The diminished frequency of mCherry-expressing cells in TialL RNP-transfected populations (black arrows) was consistently observed and interpreted as evidence that excision had occurred. This contrasted with the MYL7, MYL2 and TTN tagging experiments, where mCherry expression was more stable in non-excised cells and mCherry-negative cells were much rarer in the mock transfected condition.
[0347] FIG. 36A - FIG. 36F. Genetic analysis of precise mEGFP tagging using multi-step targeting and excision in clones. FIG. 36A. Clones from each targeting experiment were analyzed according to their normalized genomic copy number of the mEGFP, mCherry and AmpR sequences. Clones were categorized as candidates for further analysis if the mEGFP genomic copy number was consistent with monoallelic or biallelic tagging (copy number of 1 or 2), and the additive copy number of AmpR and mCherry was < 0.2. Plots display the normalized mEGFP copy number (x-axis) plotted against the normalized additive mCherry and AmpR copy numbers (y-axis). Clones considered for further analysis are indicated in each plot by the green boxes. FIG. 36B. The percentage of clones validated by ddPCR is displayed in bar graphs. FIG. 36C. Percentages of ddPCR validated clones with validated PCR junctions between the mEGFP tag and the surrounding genomic region, distal to the homology arms, are shown. FIG. 36D. The percentages of analyzed clones with in-frame sequences of peptide linkers at the excision site are additionally shown, demonstrating a high rate of in-frame excision predicted to generate effectively tagged clones. FIG. 36E. The percentage of clones with WT untagged alleles is shown, demonstrating the relatively low impact of unintended NHEJ at the targeted locus. FIG. 36F. In each experiment, peptide linkers at the site of excision were sequenced and are as shown in the rightmost column, establishing the strong tendency toward in-frame repair after excision of the selectable cassette. All outcomes within analyzed clones are shown, including aberrant insertions and out of frame deletions (indicated by red font). In FIG. 36B - FIG. 36E, the ratio of validated clones to total number of clones tested is shown, in all analyses, in blue font.
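The copy-number criteria described for FIG. 36A can be expressed as a simple filter. The sketch below is illustrative only: the function name is hypothetical, and the tolerance used to decide whether an mEGFP copy number is "consistent with" 1 or 2 is an assumed value that is not stated in the text. Only the < 0.2 limit on the summed mCherry and AmpR copy number comes from the legend.

```python
def classify_clone(megfp_cn, mcherry_cn, ampr_cn,
                   tolerance=0.2, residual_limit=0.2):
    """Classify a clone from normalized ddPCR copy numbers.

    tolerance: assumed allowed deviation of mEGFP copy number from 1 or 2.
    residual_limit: summed mCherry + AmpR copy number must be below this.
    """
    if mcherry_cn + ampr_cn >= residual_limit:
        return "rejected"      # cassette or plasmid backbone still present
    if abs(megfp_cn - 1) <= tolerance:
        return "monoallelic"
    if abs(megfp_cn - 2) <= tolerance:
        return "biallelic"
    return "rejected"          # mEGFP copy number not consistent with 1 or 2


# Hypothetical clones, for illustration only.
calls = [
    classify_clone(1.05, 0.02, 0.01),
    classify_clone(1.90, 0.00, 0.10),
    classify_clone(1.00, 0.15, 0.10),
    classify_clone(1.50, 0.00, 0.00),
]
```

Clones that pass such a filter would still require junctional PCR and linker sequencing, as described for FIG. 36C and FIG. 36D.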
[0348] FIG. 37A - FIG. 37C. Quantitative assays to evaluate cardiomyocyte differentiation efficiency and mEGFP-tagged allele expression in precisely excised clones. FIG. 37A. Selected clones validated by ddPCR analysis, junctional PCR and sequencing of the peptide linker at the excision site, and thus predicted to produce an in-frame endogenous mEGFP fusion protein, were differentiated into cardiomyocytes. mEGFP fluorescence in fixed cells was measured (y axis) and plotted against antibody staining intensity for the cardiomyocyte marker cardiac troponin T (cTnT, x axis). Each clone displayed robust differentiation, with greater than 74% cTnT+ cells, and within the cTnT+ cell population, the majority of cells expressed mEGFP. Percentages of cells gated as positive and negative for both mEGFP fluorescence and cTnT staining are as indicated. TNNI1, ACTN2, TTN and MYL7 cultures were differentiated for 12 days. MYL2 cultures were differentiated for 30 days. FIG. 37B. The percentages of cells positive for the cTnT marker of cardiomyocyte fate across biological replicates and in several independently edited clones validated to contain precisely excised mEGFP-tagged alleles are shown. Error bars are standard deviation. ACTN2 experiments were two replicates. All other experiments consisted of three replicates. FIG. 37C. The percentages of cardiomyocytes (cTnT+) that were additionally expressing mEGFP are shown in experiments with several independently edited clones. Error bars are standard deviation among biological replicates. ACTN2 experiments were two replicates. All other experiments consisted of three replicates.
[0349] FIG. 38. Quality control criteria to evaluate the robustness of clonal line differentiation, pluripotency and genomic stability. Clones from each experiment are indicated, and were evaluated for karyotypic irregularities with metaphase spreads. FACS analysis after staining for nuclear pluripotency markers is also shown, with minimum values obtained among all trials displayed and number of trials in parentheses. Germ layer marker expression was measured by RT-ddPCR for TNNI1 clone 172. Cardiomyocyte differentiation using the small molecule protocol was performed and the percentage of cTnT+ cells was measured using FACS. Errors indicate standard deviation. The percentages of cTnT+ cells that additionally expressed mEGFP above threshold are also shown. Errors indicate standard deviation.
[0350] FIG. 39A - FIG. 39C. Imaging experiments to evaluate sarcomeric localization of the mEGFP-tagged alleles. FIG. 39A. Two independently edited clones from each of the five targeting experiments were differentiated, plated on glass bottom plates and imaged live using spinning disc confocal microscopy. Similar mEGFP localization to the sarcomere was observed in both clones. FIG. 39B. One clone from each experiment was additionally fixed and imaging was performed on both the mEGFP fluorescence (green channel, green boxes for insets) and antibody staining (purple channel, purple boxes for insets) against the targeted protein, to confirm whether absolute overlap of the mEGFP and endogenous stain was observed. FIG. 39C. Cells were additionally stained for the cardiomyocyte differentiation marker cTnT and staining for this marker (purple channel, purple boxes for insets) was compared with mEGFP fluorescence (green channel, green boxes for insets) in order to visualize the extent of differentiation in each culture, and whether mEGFP expression and sarcomeric localization were specific to cells that had acquired a cardiomyocyte identity. All scale bars are 20 microns. All cardiomyocytes were differentiated for ~20 days.
[0351] FIG. 40A - FIG. 40D. Quantitative and imaging assays to evaluate cardiomyocyte differentiation efficiency and GFP-tagged allele expression in precisely excised MYL2 clones. FIG. 40A. Selected MYL2 clones predicted to produce an in-frame endogenous GFP fusion protein were differentiated into cardiomyocytes and cultured for 13 and 26 days. GFP fluorescence in fixed cells was measured (y axis) and plotted against antibody staining intensity for the cardiomyocyte marker cardiac troponin T (cTnT, x axis). Each clone displayed robust differentiation, with greater than 65% cTnT+ cells, and within the cTnT+ cell population, an increasing percentage of cardiomyocytes expressed GFP on Day 26 of differentiation, compared with Day 13. Percentages of cells gated as positive and negative for both GFP fluorescence and cTnT staining are as indicated. Thresholds for cTnT+ cells based on antibody staining are as indicated by purple boxes. Thresholds for GFP expression are as indicated by green boxes. FIG. 40B. The percentages of cells expressing the cTnT marker of cardiomyocyte fate across biological replicates and in several independently edited MYL2 clones validated to contain precisely excised GFP-tagged alleles are shown. Error bars are standard deviation. Day 12-13 experiments were two replicates. Day 26 experiments consisted of one replicate. FIG. 40C. The percentages of cardiomyocytes (cTnT+) that were additionally expressing GFP are shown in experiments with several independently edited clones. Error bars are standard deviation among biological replicates. Day 12-13 experiments were two replicates. Day 26 experiments consisted of one replicate. FIG. 40D. Two independently edited clones from MYL2 editing experiments were differentiated, plated on glass bottom plates coated with PEI/laminin and imaged live using spinning disc confocal microscopy. At day 20 (left column) and day 28 (middle column) after differentiation GFP expression was assessed, showing an increase in both the number of cells expressing GFP and the intensity of GFP signal. Additionally, GFP localization was observed specifically at sarcomeres for both clones (right column). Scale bars are as indicated.
References
[0352] Allen Institute for Cell Science (2017). "Allen Cell Explorer." from http://www.allencell.org/.
[0353] Chopra, A., et al. (2018). "Force Generation via beta-Cardiac Myosin, Titin, and alpha-Actinin Drives Cardiac Sarcomere Assembly from Cell-Matrix Adhesions." Dev Cell 44(1): 87-96 e85.
[0354] Dambournet, D., et al. (2014). "Tagging Endogenous Loci for Live-Cell Fluorescence Imaging and Molecule Counting Using ZFNs, TALENs, and Cas9." Methods in Enzymology Chapter 7(546): ISSN 0076-6879.
[0355] Dobin, A., et al. (2013). "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29(1): 15-21.
[0356] Doyon, J. B., et al. (2011). "Rapid and efficient clathrin-mediated endocytosis revealed in genome-edited mammalian cells." Nat Cell Biol 13(3): 331-337.
[0357] Drubin, D. G. and A. A. Hyman (2017). "Stem cells: the new "model organism"." Mol Biol Cell 28(11): 1409-1411.
[0358] Grassart, A., et al. (2014). "Actin and dynamin2 dynamics and interplay during clathrin-mediated endocytosis." J Cell Biol 205(5): 721-735.
[0359] Judge, L. M., et al. (2017). "A BAG3 chaperone complex maintains cardiomyocyte function during proteotoxic stress." JCI Insight 2(14).
[0360] Kim, S. I., et al. (2018). "Microhomology-assisted scarless genome editing in human iPSCs." Nat Commun 9(1): 939.
[0361] Kreitzer, F. R., et al. (2013). "A robust method to derive functional neural crest cells from human pluripotent stem cells." Am J Stem Cells 2(2): 119-131.
[0362] Lackner, D. H., et al. (2015). "A generic strategy for CRISPR-Cas9-mediated gene tagging." Nat Commun 6: 10237.
[0363] Lin, S., et al. (2014). "Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery." Elife 3: e04766.
[0364] Mahen, R., et al. (2014). "Comparative assessment of fluorescent transgene methods for quantitative imaging in human cells." Mol Biol Cell 25(22): 3610-3618.
[0365] Martin, M. (2017). "Cutadapt Removes Adapter Sequences From High-Throughput Sequencing Reads." EMBnet.journal 17(1): 10-12.
[0366] Otsuka, S., et al. (2016). "Nuclear pore assembly proceeds by an inside-out extrusion of the nuclear envelope." Elife 5.
[0367] Roberts, B., et al. (2017). "Systematic gene tagging using CRISPR/Cas9 in human stem cells to illuminate cell organization." Mol Biol Cell 28(21): 2854-2874.
[0368] Skarnes, W. C., et al. (2011). "A conditional knockout resource for the genome-wide study of mouse gene function." Nature 474(7351): 337-342.
[0369] Trapnell, C., et al. (2010). "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation." Nat Biotechnol 28(5): 511-515.
[0370] UCSC Genome Browser. "WTC Public Exome and Genome." from http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr10%3A121410882%2D121437329&hgsid=585358619_Z6HyzkmflhRh2IzPa4Hh8SsJTwRB.
[0371] Yao, Z., et al. (2017). "A Single-Cell Roadmap of Lineage Bifurcation in Human ESC Models of Embryonic Brain Development." Cell Stem Cell 20(1): 120-134.
[0372] Feng et al, 2017. https://www.biorxiv.org/content/early/2017/07/26/137059
[0373] Han et al, 2017. https://www.biorxiv.org/content/early/2017/08/21/178905
[0374] Koch et al, 2018. https://www.biorxiv.org/content/early/2018/03/07/188847.
[0375] Otsuka et al, 2016. https://www.ncbi.nlm.nih.gov/pubmed/27630123.
[0376] Chopra et al, 2018. https://www.ncbi.nlm.nih.gov/pubmed/29316444.
[0377] Palpant et al, 2015. https://www.ncbi.nlm.nih.gov/pubmed/26153229.
[0378] Lian et al, 2015. https://www.ncbi.nlm.nih.gov/pubmed/26125590.
[0379] Kim et al. https://www.ncbi.nlm.nih.gov/pubmed/29507284.
[0380] All journal articles, patents, patent applications and other references are herein incorporated by reference in their entireties for all purposes.
[0381] From the foregoing, it will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention. Accordingly, the invention is not limited except as by the appended claims.

Claims

1. A method for producing a cell comprising at least one tagged endogenous, differentially-expressed protein, the method comprising:
(a) providing a first nuclease specific for a target genomic locus of a differentially-expressed protein;
(b) providing a donor plasmid comprising:
i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker;
ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker;
iii. a third polynucleotide encoding a 5’ homology arm; and
iv. a fourth polynucleotide encoding a 3’ homology arm;
(c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus;
(d) selecting cells expressing the first selectable marker; and
(e) introducing into the cells of (d): a second nuclease capable of excising the selection cassette to generate an endogenous protein tagged with the second selectable marker, wherein the tagged endogenous protein is substantially free of a scar sequence;
thereby producing the cell comprising the at least one tagged endogenous, differentially-expressed protein.
2. A method for producing a stem cell comprising at least one tagged endogenous, differentially-expressed protein, the method comprising:
(a) providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, differentially-expressed protein in a stem cell;
(b) providing a donor plasmid comprising polynucleotide sequences encoding:
i. a first selectable marker;
ii. a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker;
iii. a second selectable marker that is different from the first selectable marker; and
iv. a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length;
(c) transfecting the complex of (a) and the donor plasmid of (b) into the stem cell such that the polynucleotide sequences encoding (i) to (iii) are inserted into the target genomic locus;
(d) selecting stem cells expressing the first selectable marker; and
(e) transfecting the stem cells of (d) with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous protein tagged with the second selectable marker,
thereby producing the stem cell comprising at least one tagged endogenous, differentially-expressed protein.
3. The method of claim 1, wherein the selection cassette of (b) further comprises 5’ and 3’ excision sites flanking the first selectable marker.
4. The method of claim 1, wherein the cell comprising the at least one tagged endogenous, differentially-expressed protein is substantially free of the first selectable marker.
5. The method of claim 2, wherein the stem cell comprising at least one tagged endogenous, differentially-expressed protein is substantially free of the first selectable marker.
6. The method of any of claims 1 to 5, wherein the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker.
7. The method of claim 6, wherein the constitutive regulatory element is a CAGGS promoter, a UBC promoter, an EF1-α promoter, an actin promoter, or a hPGK promoter.
8. The method of any of claims 2 to 7, wherein the donor plasmid of (b) further comprises microhomology containing sequences flanking the 5’ and 3’ excision sites.
9. The method of claim 8, wherein the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
10. The method of any of claims 2 to 9, wherein the tagged endogenous, differentially-expressed protein is substantially free of a scar sequence.
11. The method of any of claims 1 to 10, wherein the tagged endogenous, differentially-expressed protein comprises a linker that links the second selectable marker and the protein.
12. The method of claim 11, wherein the linker is Ser-Gly-Ser-Gly-Ser-Pro-Gly (SEQ ID NO: 288), Ser-Gly-Ser-Gly-Ser-Gly (SEQ ID NO: 289), Ser-Gly-Pro-Gly, or Val-Asp-Gly-Thr-Ala-Gly-Pro-Gly-Ser-Gly-Pro-Gly-Ser-Ile-Ala-Thr (SEQ ID NO: 290).
13. The method of any of claims 2 to 11, wherein the 5’ and 3’ excision sites each comprise a TialL protospacer.
14. The method of claim 12, wherein the TialL protospacer is an inverted TialL protospacer.
15. The method of any of claims 1 to 14, wherein the first and/or second selectable markers are each at least about 8 amino acids in length.
16. The method of any of claims 1 to 15, wherein the first and/or second selectable markers are each at least about 100 amino acids in length.
17. The method of any of claims 1 to 16, wherein the first and/or the second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
18. The method of claim 17, wherein the first and second selectable markers are fluorescent proteins.
19. The method of claim 18, wherein the first and second selectable markers are independently selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
20. The method of claim 19, wherein the first selectable marker is mCherry, and the second selectable marker is GFP.
21. The method of claim 1, wherein the first selectable marker is a fluorescent protein, and the selecting of (d) comprises fluorescence activated cell sorting (FACS).
22. The method of any of claims 1 to 21, further comprising (f): selecting cells expressing the second selectable marker.
23. The method of claim 22, wherein the second selectable marker is a fluorescent protein and the second selection step comprises fluorescence activated cell sorting (FACS).
24. The method of claim 1, wherein the first nuclease and/or the second nuclease is a Cas nuclease, a TALEN, or a zinc finger nuclease.
25. The method of claim 24, wherein the first nuclease and/or the second nuclease is a Cas protein.
26. The method of claim 2, wherein the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1 : 1 : 1.
27. The method of claim 2 or 26, wherein the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1 : 1 : 1.
28. The method of claim 2 or 25, wherein the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein.
29. The method of claim 2, wherein the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid.
30. The method of claim 2, wherein the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites.
31. The method of claim 29 or 30, wherein the off-target cleavage is less than about 1.0%.
32. The method of any of claims 1 to 31, wherein a double-stranded break is generated at the target genomic locus after step (c).
33. The method of claim 32, wherein the double-stranded break is repaired by homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ).
34. The method of claim 33, wherein the double-stranded break is repaired by MMEJ.
35. The method of claim 34, wherein the donor plasmid acts as a repair template during MMEJ.
36. The method of any of claims 1 to 35, wherein protospacer adjacent motif (PAM) sequences are removed from the donor plasmid after insertion into the target genomic locus.
37. The method of any of claims 1 to 36, wherein the cell is an induced pluripotent stem cell (iPSC) derived from a healthy donor.
38. The method of claim 37, wherein the iPSC is a WTC cell or a WTB cell.
39. The method of claim 1 or 2, wherein the introducing or transfecting of (c) occurs by electroporation.
40. The method of claim 39, wherein the electroporation comprises at least 1 pulse.
41. The method of claim 40, wherein the pulse is at least about 15 ms at a voltage of at least about 1300 V.
42. The method of claim 39, wherein the electroporation comprises at least 1 to 5 pulses.
43. The method of claim 42, wherein the electroporation comprises at least 2 pulses.
44. The method of any of claims 1 to 43, wherein the target genomic locus is a locus of a gene encoding a protein that is not expressed in an undifferentiated stem cell, but is expressed in a differentiated cell.
45. The method of claim 44, wherein the differentiated cell is a cardiomyocyte, a differentiated kidney cell, or a differentiated fibroblast.
46. The method of claim 45, wherein the protein is ACTN2, TNNI1, MYL2, MYL7, or TTN.
47. The method of any of claims 1 to 46, wherein at least about 0.1% of the cells express the first selectable marker after step (c).
48. The method of claim 22, wherein the second selection step further comprises genetic screening to determine at least two or more of the following:
(a) insertion of the second selectable marker sequence;
(b) stable integration of the donor plasmid; and/or
(c) relative copy number of the second selectable marker sequence.
49. The method of claim 48, wherein the genetic screening is performed by droplet digital PCR (ddPCR), tiled-junction PCR, or both.
50. The method of claim 22, wherein selecting clones having an insertion of the second selectable marker comprises selecting clones that have the second selectable marker inserted into one or both alleles of the target genomic locus and do not have stable integration of the plasmid backbone.
51. The method of claim 22, further comprising sequencing clones having an insertion of the second selectable marker to identify clones that have a precise insertion of the second selectable marker.
52. The method of claim 51, wherein the clones that have a precise insertion are identified by:
(a) amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products;
(b) sequencing the tiled-junction amplification products of (a); and
(c) comparing the sequence of the tiled-junction amplification products with a reference sequence.
53. The method of claim 1, wherein the cell comprising the at least one tagged endogenous, differentially-expressed protein expresses at least one protein associated with pluripotency.
54. The method of claim 2, wherein the stem cell comprising the at least one tagged endogenous, differentially-expressed protein expresses at least one protein associated with pluripotency.
55. The method of claim 53 or 54, wherein the protein associated with pluripotency is selected from the group consisting of Oct3/4, Sox2, Nanog, Tra-1-60, Tra-1-81, and SSEA3/4.
56. The method of claim 53 or 54, wherein the expression level of the at least one protein associated with pluripotency is comparable to the expression level of the same protein in an unmodified cell or stem cell.
57. The method of claim 2, wherein the stem cell comprising the at least one tagged endogenous, differentially-expressed protein maintains a differentiation potential that is comparable to an unmodified stem cell.
58. The method of claim 57, wherein the stem cell comprising the at least one tagged endogenous, differentially-expressed protein is capable of differentiating into a mesodermal cell, an endodermal cell, or an ectodermal cell.
59. The method of claim 2, wherein morphology, viability, potency, and endogenous cellular functions of the stem cells comprising the at least one tagged protein and/or differentiated cells derived from the stem cells comprising the at least one tagged protein are not substantially changed compared to unmodified stem cells and differentiated cells thereof.
60. A donor plasmid comprising polynucleotide sequences encoding:
(a) a first selectable marker;
(b) a constitutive regulatory element operably linked to the first selectable marker;
(c) a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker;
(d) a second selectable marker that is different from the first selectable marker; and
(e) a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
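The elements recited in claim 60 can be pictured as a simple record with two consistency checks. The sketch below is illustrative only: the class name, field names, and the hard 1000 bp cutoff standing in for "at least about 1 kb" are assumptions, not part of the claim.

```python
from __future__ import annotations

from dataclasses import dataclass

# "At least about 1 kb"; the exact numeric cutoff is an assumption.
MIN_ARM_BP = 1000


@dataclass(frozen=True)
class DonorPlasmid:
    """Hypothetical description of a donor plasmid design."""

    first_marker: str        # excisable marker, flanked by the 5' and 3' excision sites
    second_marker: str       # tag that remains on the endogenous protein
    promoter: str            # constitutive element driving the first marker
    homology_arm_5p_bp: int
    homology_arm_3p_bp: int

    def problems(self) -> list[str]:
        """List design issues; an empty list means both checks pass."""
        issues = []
        if self.first_marker == self.second_marker:
            issues.append("selectable markers must differ")
        if min(self.homology_arm_5p_bp, self.homology_arm_3p_bp) < MIN_ARM_BP:
            issues.append("homology arms should be at least about 1 kb")
        return issues
```

For example, a design using mCherry as the first marker and GFP as the second, with a CAGGS promoter and arms of 1000 bp and 1200 bp, would report no issues.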
61. The donor plasmid of claim 60, wherein the constitutive regulatory element is a CAGGS promoter, a UBC promoter, an EF1-α promoter, an actin promoter, or a hPGK promoter.
62. The donor plasmid of claim 60 or 61, further comprising microhomology containing sequences flanking the 5’ and 3’ excision sites.
63. The donor plasmid of claim 62, wherein the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
64. The donor plasmid of any of claims 60 to 63, further comprising a flexible linker sequence.
65. The donor plasmid of any of claims 60 to 64, wherein the polynucleotide sequences encoding the first and second selectable markers are each at least about 20 nucleotides in length.
66. The donor plasmid of claim 65, wherein the polynucleotide sequences encoding the first and second selectable markers are each between about 300 nucleotides and about 3000 nucleotides in length.
67. The donor plasmid of claim 65, wherein the polynucleotide sequences encoding the first and second selectable markers are each greater than about 3000 nucleotides.
68. The donor plasmid of any of claims 60 to 65, wherein the first and second selectable markers are each at least about 8 amino acids in length.
69. The donor plasmid of claim 68, wherein the first and second selectable markers are each between about 8 and about 100 amino acids in length.
70. The donor plasmid of any of claims 60 to 69, wherein the first and/or second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
71. The donor plasmid of claim 70, wherein the first and second selectable markers are fluorescent proteins.
72. The donor plasmid of claim 71, wherein the first and second selectable markers are independently selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
73. The donor plasmid of claim 72, wherein the first selectable marker is mCherry, and the second selectable marker is GFP.
74. A stably tagged cell generated by insertion of the donor plasmid of any of claims 60 to 73 into a genomic locus targeted by the 5’ and 3’ homology arms.
75. Use of the donor plasmid of any of claims 60 to 73 for imaging one or more proteins in one or more cells.
76. Use of the donor plasmid of claim 75, wherein the one or more cells are tissue.
77. Use of the donor plasmid of claim 75 or 76, wherein the one or more cells are living.
78. Use of the donor plasmid of any of claims 75 to 77, wherein the one or more proteins is not expressed in a stem cell and expressed in a differentiated cell derived from the stem cell.
79. Use of the donor plasmid of any of claims 75 to 78, wherein the imaging is three-dimensional imaging.
80. A cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding:
(a) a first selectable marker;
(b) a constitutive regulatory element operably linked to the first selectable marker;
(c) a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and
(d) a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a gene encoding a differentially-expressed protein.
81. A cell comprising a CRISPR/Cas9 ribonucleoprotein (RNP) complex and a donor polynucleotide, the donor polynucleotide comprising polynucleotide sequences encoding:
(a) a first selectable marker;
(b) a constitutive regulatory element operably linked to the first selectable marker;
(c) a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker;
(d) a second selectable marker that is different from the first selectable marker, and
(e) a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length.
82. The cell of claim 80 or 81, further comprising microhomology containing sequences flanking the 5’ and 3’ excision sites.
83. The cell of claim 82, wherein the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
84. The cell of any of claims 80 to 83, wherein the 5’ and 3’ excision sites each comprise a Tia1L protospacer.
85. The cell of claim 84, wherein the Tia1L protospacer is an inverted Tia1L protospacer.
86. The cell of any of claims 80 to 85, wherein the first and/or second selectable marker each comprise about 8 amino acids in length.
87. The cell of claim 86, wherein the first and/or second selectable markers each comprise at least about 100 amino acids in length.
88. The cell of any of claims 80 to 87, wherein the first and/or second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
89. The cell of claim 88, wherein the first and second selectable markers are fluorescent proteins.
90. The cell of claim 89, wherein the first and second selectable markers are independently selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
91. The cell of claim 90, wherein the first selectable marker is mCherry, and the second selectable marker is GFP.
92. A cell comprising an endogenous, differentially-expressed protein stably tagged with a selectable marker.
93. The cell of claim 92, wherein the selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
94. The cell of claim 93, wherein the selectable marker is a fluorescent protein.
95. The cell of claim 94, wherein the selectable marker is selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
96. The cell of claim 95, wherein the selectable marker is GFP.
97. The cell of any of claims 80 to 96, wherein the cell is an undifferentiated stem cell.
98. The cell of claim 97, wherein the differentially-expressed protein is not expressed in the undifferentiated stem cell, but is expressed in a differentiated cell derived from the undifferentiated stem cell.
99. A differentiated cell or group of differentiated cells derived from the cell of any of claims 80 to 98.
100. The differentiated cell or group of differentiated cells of claim 99, wherein the differentiated cell or group of differentiated cells are selected from the group consisting of cardiomyocytes, differentiated kidney cells, and differentiated fibroblasts.
101. A kit comprising an array of stem cells comprising at least one tagged endogenous, differentially-expressed protein.
102. The kit of claim 101, wherein the stem cells are made according to the method of any of claims 1 to 59.
103. A kit for visualizing one or more proteins during differentiation or selecting differentiated cells, comprising an array of the cells of any of claims 80 to 98.
104. The kit of claim 103, wherein the visualizing of the one or more proteins is performed by fluorescent microscopy.
105. The kit of claim 103, wherein the differentiated cells express at least one tagged endogenous protein.
106. The kit of claim 103 or 105, wherein the selecting of the differentiated cells is performed by fluorescence activated cell sorting (FACS).
107. A method of generating a signature for a test agent comprising:
(a) admixing the test agent with one or more cells produced by the method of any one of claims 1-59;
(b) detecting a response in the one or more cells;
(c) detecting a response in a control cell;
(d) detecting a difference in the response in the one or more cells from the control cell; and
(e) generating a data set of the difference in the response.
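Steps (b) through (e) amount to measuring responses in treated and control cells and recording the difference. A minimal sketch of that bookkeeping follows; the readout names, the use of a difference of means, and the function name are assumptions made for illustration, and the claim does not prescribe any particular statistic.

```python
from __future__ import annotations

from statistics import mean


def response_signature(
    treated: dict[str, list[float]],
    control: dict[str, list[float]],
) -> dict[str, float]:
    """Per-readout difference between mean treated and mean control response.

    Only readouts measured in both conditions are included in the data set.
    """
    shared = sorted(set(treated) & set(control))
    return {name: mean(treated[name]) - mean(control[name]) for name in shared}
```

The returned dictionary is one possible form of the "data set of the difference in the response"; a readout such as tagged-protein fluorescence intensity would appear as one key.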
108. Use of a cell produced by the method of any one of claims 1-59 in an activity selected from the group consisting of:
(a) determining toxicity of a test agent on the cell;
(b) determining the stage of disease in the cell;
(c) determining the dose of a test agent or drug for treatment of disease;
(d) monitoring disease progression in the cell;
(e) monitoring effects of treatment of a test agent or drug on the cell; and
(f) determining a genetic or protein target for a test agent or drug within a cell.
109. Use of a cell produced by the method of any one of claims 1-59 for monitoring progression of disease or effect of a test agent on a disease wherein the disease is selected from the group consisting of aberrant cell growth, wound healing, inflammation, immune disorders, genetic disorders, neurodegeneration and neuromuscular degeneration.
110. A method for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising:
(a) providing a first nuclease specific for a target genomic locus of a stimuli-responsive gene;
(b) providing a donor plasmid comprising:
i. a first polynucleotide encoding a selection cassette, wherein the selection cassette comprises a first selectable marker;
ii. a second polynucleotide encoding a second selectable marker that is different from the first selectable marker;
iii. a third polynucleotide encoding a 5’ homology arm; and
iv. a fourth polynucleotide encoding a 3’ homology arm;
(c) introducing the first nuclease of (a) and the donor plasmid of (b) into a cell such that the first and second polynucleotides are inserted into the target genomic locus;
(d) selecting cells expressing the first selectable marker; and
(e) introducing into the cells of (d): a second nuclease capable of excising the selection cassette to generate an endogenous, stimuli-responsive gene tagged with the second selectable marker;
thereby producing the cell comprising the at least one tagged endogenous, stimuli-responsive gene.
111. A method for producing a cell comprising at least one tagged endogenous, stimuli-responsive gene, the method comprising:
(a) providing a first ribonucleoprotein (RNP) complex comprising a first Cas protein, a first CRISPR RNA (crRNA) and a first trans-activating RNA (tracrRNA), wherein the first crRNA is specific for a target genomic locus of an endogenous, stimuli-responsive gene in a cell;
(b) providing a donor plasmid comprising polynucleotide sequences encoding:
i. a first selectable marker;
ii. a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker;
iii. a second selectable marker that is different from the first selectable marker; and
iv. a 5’ homology arm and a 3’ homology arm, wherein the 5’ and 3’ homology arms are at least about 1 kb in length;
(c) transfecting the complex of (a) and the donor plasmid of (b) into the cell such that the polynucleotide sequences encoding (i) to (iii) are inserted into the target genomic locus;
(d) selecting cells expressing the first selectable marker; and
(e) transfecting the cells of (d) with a second RNP complex comprising a second Cas protein, a second crRNA, and a second tracrRNA, wherein the second crRNA is specific for the 5’ and 3’ excision sites on the donor plasmid, to generate an endogenous stimuli-responsive gene tagged with the second selectable marker,
thereby producing the cell comprising at least one tagged endogenous, stimuli-responsive gene.
112. The method of claim 110 or claim 111, wherein the selection cassette of (b) further comprises 5’ and 3’ excision sites flanking the first selectable marker.
113. The method of any one of claims 110-112, wherein the cell comprising the at least one tagged endogenous, stimuli-responsive gene is substantially free of the first selectable marker.
114. The method of any of claims 110-113, wherein the polynucleotide encoding the first selectable marker further encodes a constitutive regulatory element operably linked to the first selectable marker.
115. The method of claim 114, wherein the constitutive regulatory element is a CAGGS promoter, a UBC promoter, an EF1-α promoter, an actin promoter, or a hPGK promoter.
116. The method of any of claims 110-115, wherein the donor plasmid of (b) further comprises microhomology containing sequences flanking the 5’ and 3’ excision sites.
117. The method of claim 116, wherein the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
118. The method of any of claims 110-117, wherein the 5’ and 3’ excision sites each comprise a Tia1L protospacer.
119. The method of claim 118, wherein the Tia1L protospacer is an inverted Tia1L protospacer.
120. The method of any of claims 110-119, wherein the first and/or second selectable markers are each at least about 8 amino acids in length.
121. The method of any of claims 110-120, wherein the first and/or second selectable markers are each at least about 100 amino acids in length.
122. The method of any of claims 110-121, wherein the first and/or the second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
123. The method of claim 122, wherein the first and second selectable markers are fluorescent proteins.
124. The method of claim 123, wherein the first and second selectable markers are independently selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
125. The method of claim 124, wherein the first selectable marker is mCherry, and the second selectable marker is GFP.
126. The method of any one of claims 110-125, wherein the first selectable marker is a fluorescent protein, and the selecting of (d) comprises fluorescence activated cell sorting (FACS).
127. The method of any of claims 110-126, further comprising (f): selecting cells expressing the second selectable marker.
128. The method of claim 127, wherein the second selectable marker is a fluorescent protein and the second selection step comprises fluorescence activated cell sorting (FACS).
129. The method of claim 110, wherein the first nuclease and/or the second nuclease is a Cas nuclease, a TALEN, or a zinc finger nuclease.
130. The method of claim 129, wherein the first nuclease and/or the second nuclease is a Cas protein.
131. The method of claim 111, wherein the first RNP comprises the first crRNA, the first tracrRNA, and the first Cas protein complexed at a ratio of 1:1:1.
132. The method of claim 111 or 131, wherein the second RNP comprises the second crRNA, the second tracrRNA, and the second Cas protein complexed at a ratio of 1:1:1.
133. The method of claim 111 or 132, wherein the Cas protein is a wild-type Cas9 protein or a Cas9-nickase protein.
134. The method of claim 111, wherein the first crRNA sequence is selected to minimize off-target cleavage of genomic DNA sequences and/or insertion of the donor plasmid.
135. The method of claim 111, wherein the second crRNA sequence is selected to minimize off-target cleavage of the 5’ and 3’ excision sites.
136. The method of claim 134 or 135, wherein the off-target cleavage is less than about 1.0%.
137. The method of any of claims 110-136, wherein a double-stranded break is generated at the target genomic locus after step (c).
138. The method of claim 137, wherein the double-stranded break is repaired by homology directed repair (HDR), non-homologous end joining (NHEJ), or microhomology-mediated end joining (MMEJ).
139. The method of claim 138, wherein the double-stranded break is repaired by MMEJ.
140. The method of claim 139, wherein the donor plasmid acts as a repair template during MMEJ.
141. The method of any of claims 110-140, wherein protospacer adjacent motif (PAM) sequences are removed from the donor plasmid after insertion into the target genomic locus.
142. The method of any one of claims 110-141, wherein the introducing or transfecting of (c) occurs by electroporation.
143. The method of claim 142, wherein the electroporation comprises at least 1 pulse.
144. The method of claim 143, wherein the pulse is at least about 15 ms at a voltage of at least about 1300 V.
145. The method of claim 144, wherein the electroporation comprises at least 1 to 5 pulses.
146. The method of claim 145, wherein the electroporation comprises at least 2 pulses.
147. The method of any of claims 110-146, wherein at least about 0.1% of the cells express the first selectable marker after step (c).
148. The method of claim 147, wherein the second selection step further comprises genetic screening to determine at least two or more of the following:
(a) insertion of the second selectable marker sequence;
(b) stable integration of the donor plasmid; and/or
(c) relative copy number of the second selectable marker sequence.
149. The method of claim 148, wherein the genetic screening is performed by droplet digital PCR (ddPCR), tiled-junction PCR, or both.
150. The method of claim 148, wherein selecting clones having an insertion of the second selectable marker comprises selecting clones that have the second selectable marker inserted into one or both alleles of the target genomic locus and do not have stable integration of the plasmid backbone.
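The selection logic of claims 148 to 150 (tag present on one or both alleles, no stably integrated plasmid backbone) can be sketched as a small classifier over copy-number estimates. Everything numeric here is an assumption: the inputs are taken to be copies per genome normalized to a reference locus, and the tolerance of 0.25 is a hypothetical cutoff rather than a value from the claims.

```python
def classify_clone(tag_copies: float, backbone_copies: float, tol: float = 0.25) -> str:
    """Classify a clone from normalized copy-number estimates.

    tag_copies:      copies of the second selectable marker per genome
                     (about 1 = one allele tagged, about 2 = both alleles tagged)
    backbone_copies: copies of plasmid backbone per genome (about 0 expected)
    tol:             hypothetical tolerance around each expected value
    """
    if backbone_copies > tol:
        return "reject: plasmid backbone detected"
    if abs(tag_copies - 1.0) <= tol:
        return "monoallelic"
    if abs(tag_copies - 2.0) <= tol:
        return "biallelic"
    return "reject: unexpected tag copy number"
```

Clones classified as monoallelic or biallelic would then be candidates for the junction sequencing described in claims 151 and 152.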
151. The method of claim 148, further comprising sequencing clones having an insertion of the second selectable marker to identify clones that have a precise insertion of the second selectable marker.
152. The method of claim 151, wherein the clones that have a precise insertion are identified by:
(a) amplifying the genomic sequences across the junction between the inserted second selectable marker and the 5’ and 3’ distal genomic regions to generate tiled-junction amplification products;
(b) sequencing the tiled-junction amplification products of (a); and
(c) comparing the sequence of the tiled-junction amplification products with a reference sequence.
153. The method of any one of claims 110-152, wherein the stimuli-responsive gene is a gene that activates in response to endoplasmic reticulum stress, ionizing radiation stress, heat shock, oxidative stress, metal-induced toxicity, or drug-induced toxicity.
154. A cell comprising an exogenous polynucleotide integrated at a target genomic locus, the exogenous polynucleotide comprising polynucleotide sequences encoding:
(a) a first selectable marker;
(b) a constitutive regulatory element operably linked to the first selectable marker;
(c) a 5’ excision site and a 3’ excision site, wherein the 5’ and 3’ excision sites flank the first selectable marker; and
(d) a second selectable marker that is different from the first selectable marker, wherein the target genomic locus is a locus of a stimuli-responsive gene.
155. The cell of claim 153, further comprising microhomology containing sequences flanking the 5’ and 3’ excision sites.
156. The cell of claim 154, wherein the microhomology containing sequences comprise tri-nucleotide or hexa-nucleotide repeat sequences.
157. The cell of any of claims 153-155, wherein the 5’ and 3’ excision sites each comprise a Tia1L protospacer.
158. The cell of claim 156, wherein the Tia1L protospacer is an inverted Tia1L protospacer.
159. The cell of any of claims 153-157, wherein the first and/or second selectable marker each comprise about 8 amino acids in length.
160. The cell of claim 158, wherein the first and/or second selectable markers each comprise at least about 100 amino acids in length.
161. The cell of any of claims 153-159, wherein the first and/or second selectable marker is an antibiotic resistance marker, an auxotrophic marker, a fluorescent protein, a luminescent protein, a photoactivatable protein, a FLAG tag, a SNAP tag, or a Halo tag.
162. The cell of claim 160, wherein the first and second selectable markers are fluorescent proteins.
163. The cell of claim 161, wherein the first and second selectable markers are independently selected from the group consisting of green fluorescent protein (GFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), mCherry, tdTomato, mNeonGreen, and mTagRFPt.
164. The cell of claim 162, wherein the first selectable marker is mCherry, and the second selectable marker is GFP.
165. The cell of any one of claims 154-164, wherein the stimuli-responsive gene is a gene that activates in response to endoplasmic reticulum stress, ionizing radiation stress, heat shock, oxidative stress, metal-induced toxicity, or drug-induced toxicity.
PCT/US2019/035852 2018-06-07 2019-06-06 Stem cell lines containing endogenous, differentially-expressed tagged proteins, methods of production, and use thereof WO2019236893A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862681887P 2018-06-07 2018-06-07
US62/681,887 2018-06-07

Publications (2)

Publication Number Publication Date
WO2019236893A2 (en) 2019-12-12
WO2019236893A3 (en) 2020-01-16

Family

ID=68770630

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/035852 WO2019236893A2 (en) 2018-06-07 2019-06-06 Stem cell lines containing endogenous, differentially-expressed tagged proteins, methods of production, and use thereof

Country Status (1)

Country Link
WO (1) WO2019236893A2 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6241701B1 (en) * 1997-08-01 2001-06-05 Genetronics, Inc. Apparatus for electroporation mediated delivery of drugs and genes
RS64622B1 (en) * 2012-05-25 2023-10-31 Univ California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
RU2716420C2 (en) * 2013-06-17 2020-03-11 Те Брод Инститьют Инк. Delivery and use of systems of crispr-cas, vectors and compositions for targeted action and therapy in liver
WO2015129686A1 (en) * 2014-02-25 2015-09-03 国立研究開発法人 農業生物資源研究所 Plant cell having mutation introduced into target dna, and method for producing same
WO2016113733A1 (en) * 2015-01-13 2016-07-21 Yeda Research And Development Co. Ltd. Vectors, compositions and methods for endogenous epitope tagging of target genes

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113265427A (en) * 2021-06-02 2021-08-17 呈诺再生医学科技(珠海横琴新区)有限公司 iPSC differentiation dynamic monitoring system
CN113265427B (en) * 2021-06-02 2022-09-09 呈诺再生医学科技(珠海横琴新区)有限公司 iPSC differentiation dynamic monitoring system

Also Published As

Publication number Publication date
WO2019236893A3 (en) 2020-01-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19814409; Country of ref document: EP; Kind code of ref document: A2

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19814409; Country of ref document: EP; Kind code of ref document: A2