WO2015040075A1 - Genomic screening methods using rna-guided endonucleases - Google Patents

Publication number
WO2015040075A1
Authority
WO
WIPO (PCT)
Prior art keywords
grna
library
population
cells
genes
Application number
PCT/EP2014/069825
Other languages
French (fr)
Inventor
Kosuke Yusa
Ylong LI
Original Assignee
Genome Research Limited
Priority claimed from GB201316558A external-priority patent/GB201316558D0/en
Priority claimed from GB201321257A external-priority patent/GB201321257D0/en
Application filed by Genome Research Limited filed Critical Genome Research Limited
Publication of WO2015040075A1 publication Critical patent/WO2015040075A1/en


Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1079Screening libraries by altering the phenotype or phenotypic trait of the host
    • C12N15/1075Isolating an individual clone by screening libraries by coupling phenotype to genotype, not provided for in other groups of this subclass

Definitions

  • This invention relates to functional genomic screening and, in particular, methods for genome-wide loss-of-function screening in mammalian cells.
  • Genomic screens have been useful in the elucidation of gene function and the identification of new targets for drug discovery.
  • mouse embryonic stem cells (ESCs)
  • homozygous mutants can be obtained by knocking out both alleles of the relevant gene with two rounds of gene targeting.
  • this process takes at least 2 months and genome-wide mutant libraries cannot be easily generated by this method.
  • RNAi libraries have been used to systematically target large sets of genes in the genome, allowing many genes to be interrogated
  • RNAi libraries are limited by the efficacy of gene silencing. Since RNAi mediated suppression is rarely 100% efficient, not all genes in an RNAi library are knocked down with sufficient efficiency to generate a detectable phenotype. Furthermore, the amount of suppression of different genes in RNAi libraries is not uniform, further limiting the effectiveness of RNAi libraries for functional screening and cells targeted by RNAi often display off target effects.
  • An aspect of the invention provides a method of genomic screening comprising;
  • each cell in the library expressing an RNA-guided endonuclease and a gRNA specific for a target gene, such that the target gene is inactivated in the cell
  • the library expresses gRNA molecules specific for a set of target genes, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library
  • the target genes that are identified are candidate modulators of the test phenotype.
  • the population of cells is selected using a selection that is lethal to cells which do not display the test phenotype. Cells which survive the selection therefore display the test phenotype and form the selected cell population.
  • cells displaying the test phenotype may be isolated and/or separated from other cells in the library to form the selected cell population.
  • Another aspect of the invention provides a method of genomic screening comprising;
  • each cell in the library expressing an RNA-guided endonuclease and a gRNA specific for a target gene, such that the target gene is inactivated in the cell
  • the library expresses gRNA molecules specific for a set of target genes, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library
  • nucleic acid sequences encoding gRNA molecules that are amplified or depleted in the selected cell
  • sequences one or more target genes that are inactivated in the selected cell population are inactivated.
  • each cell in the library comprises a nucleic acid encoding an RNA-guided endonuclease and a nucleic acid encoding a gRNA specific for a target gene stably integrated into the genome of the cell.
  • the nucleic acids expressing the RNA-guided endonuclease and gRNAs may be stably or conditionally expressed in the library of cells in order to inactivate the set of target genes in the library.
  • Methods of the invention may be useful in identifying genes that modulate or are functionally linked to the test phenotype in the mammalian cell.
  • a suitable library of mutant mammalian cells may be produced by a method comprising;
  • each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in the mammalian cells, wherein said population comprises nucleic acid sequences encoding a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within the mammalian cell,
  • the mammalian cells may comprise a nucleic acid encoding a RNA-guided endonuclease integrated into the genome of the cells.
  • the RNA-guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease.
  • Another aspect of the invention provides a diverse population of integrative vectors
  • each integrative vector being capable of integration into the genome of a mammalian cell and comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in the mammalian cell,
  • said population of vectors comprises nucleic acid sequences that encode a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within the
  • the integrative vectors are viral vectors, most preferably lentiviral vectors.
  • Another aspect of the invention provides a library of mutant mammalian cells.
  • each mutant mammalian cell in the library expressing an RNA-guided endonuclease and a guide RNA molecule specific for a target gene, such that the target gene is inactivated in the cell,
  • each mammalian cell in the library comprises a nucleic acid encoding RNA-guided endonuclease and a nucleic acid encoding a gRNA specific for a target gene integrated into the genome thereof.
  • the set of target genes may comprise all of the protein coding genes in the genome of the mammalian cell (i.e. the library is a pan-genomic library) or a subset of the genes in the genome of the mammalian cell. Populations of mammalian cells may be useful in the methods described above .
  • FIG 1 is a schematic of the gRNA cloning vector. BbsI digestion removes the spacer and produces cohesive ends, which allow duplex oligos with compatible overhangs to be ligated in. Underlined, BbsI sites. For the duplex oligos, G at the +1 position is highlighted in red. gRNA sequences are shown as N (top strand) and n (bottom strand).
  • Figure 2 shows the sequences and the genomic positions of gRNAs targeting the Piga gene. The genomic PAM sequences of each gRNA are shown in red.
  • Figure 3 shows a summary of flow cytometry analysis of GPI-anchored protein expression performed 6 days post transfection. Transfection was performed in triplicate. Data are shown as mean ± s.d.
  • Figure 4 is a schematic of the piggyBac vector carrying human EF1α promoter-driven puromycin resistance gene (Puro) and humanized Cas9 (hCas9).
  • the two coding sequences are fused with the T2A self-cleaving peptide.
  • PB piggyBac repeats
  • bpA bovine polyadenylation signal sequence .
  • FIGs 5 and 6 show flow cytometry analyses of GPI-anchored protein expression (Fig 5) and GFP (Fig 6) after transient transfection.
  • the parental wild type ESCs (JM8) and the hCas9-expressing clones were transfected with the indicated plasmid DNAs.
  • the expression of GPI-anchored proteins was analysed 6 days post transfection by FLAER staining (Fig 5).
  • pBluescript was used as a carrier plasmid.
  • the cells were also transfected separately with a GFP plasmid and analysed 2 days post transfection (Fig 6) .
  • Figure 7 is a schematic of the piggyBac vector carrying the gRNA expression cassette and a neomycin resistant gene cassette.
  • U6, human U6 promoter; T7, U6 terminator; PGK, mouse Pgk1 promoter.
  • Figure 8 shows flow cytometry analysis of doubly transgenic mouse ESC lines. This analysis was performed 3 days after the colonies were picked
  • Figure 9 shows histograms showing the distribution of indel sizes at the on-target locus. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
  • Figure 10 shows histograms of the distribution of indel sizes at the off-target locus 120-tm5-21. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
  • Figure 11 shows histograms of the distribution of indel sizes at the off-target locus 120-tm5-2. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
  • Figure 12 shows a schematic of the self-inactivating lentiviral vector that expresses gRNA.
  • the Pgk1 promoter-driven puro-2A-BFP cassette was inserted downstream of the gRNA expression cassette.
  • the sequences shown are the gRNAs targeting Site 3 of the Piga gene with the endogenous sequences (top) or the modified sequences (bottom) . Note that the +1 position of the U6 transcript has been changed to a
  • CMV CMV promoter
  • RU5 5' long terminal repeat lacking the U3 region
  • T7 U6 terminator
  • PGK mouse Pgk1 promoter
  • BFP blue fluorescent protein
  • 2A self-cleavage peptide
  • puro puromycin resistant gene
  • ΔU3RU5 enhancer-deleted 3' LTR.
  • Figure 13 shows the inactivation of the Piga gene by lentiviral delivery of the gRNA expression cassette in hCas9-expressing mouse ESCs (upper) or mouse pancreatic carcinoma cells (lower) .
  • Cells were infected with lentivirus expressing the gRNA targeting site 3 of the Piga gene. Transduced cells were analysed by flow cytometry 6 days post infection. While cells transduced with the empty lentivirus were all FLAER-positive, cells transduced with Piga:gRNA-expressing lentivirus were generally FLAER-negative.
  • Figure 14 shows the inactivation of the Piga gene by lentiviral delivery of the gRNA expression cassette in hCas9-expressing mouse ESCs infected with lentivirus expressing the gRNA targeting the indicated sites of the Piga gene with the endogenous (top) or altered (bottom) sequences. Transduced cells were analysed by flow cytometry 6 days post infection.
  • Figure 16 shows the fractions of reads with indels analysed by deep sequencing. **, no indel was detected.
  • Figure 17 shows flow cytometry analysis of cells transfected with the indicated gRNA-expression vectors.
  • pBluescript was used as a control vector. *, not significant when compared to the control by Student's t-test .
  • Figures 18 and 19 show the percentage of in-frame indels.
  • the y axis shows the number of reads with in-frame indels at the indicated size divided by the total numbers of reads with all indels at Site 1
  • Figure 20 shows a schematic of genetic screens with genome-wide lentiviral gRNA libraries.
  • Figure 21 shows gRNA design statistics (far left) and deep sequencing analyses of the gRNAs in the lentiviral plasmid DNA library (centre left) , ESC library 1 (centre right) and 2 (far right) .
  • Figure 22 shows scatter plots comparing gRNA frequencies in the original lentiviral plasmid DNA and in the ESC library 1 (LH) and 2 (RH).
  • gRNA counts in the ESC libraries have been normalised against the gRNA counts from the lentiviral library.
  • Figure 23 shows fold changes of read counts between the lentiviral plasmid DNA library and the ESC libraries.
  • Pluripotency genes, Nanog and Pou5f1, and essential genes, Rad51 and Brca1, are significantly depleted in the ESC libraries, whereas lineage marker genes are not depleted.
  • the same gRNAs in the two ESC libraries are linked with lines. Mann-Whitney U test was performed by comparing gRNAs of each gene from each library with all gRNAs in the corresponding ESC library .
  • Figures 24 and 25 show genetic screens using the genome-wide gRNA library and genetic validation assays of the novel candidate genes. Genes with multiple gRNA hits in cells resistant to alpha-toxin are shown in Fig 24 and in cells resistant to 6TG in Fig 25. All known genes involved in the GPI-anchor synthesis pathway are shown in Fig 24. Genes marked with an asterisk were chosen for further validation. MMR, mismatch repair.
  • Figures 26 and 27 show the enrichment of gRNA sequences after
  • Figure 28 shows a summary of the guide RNA hits in the 26 GPI-anchor pathway genes identified in Example 2.10.
  • Table 1 shows indel patterns at the on-target sites in the doubly transgenic colonies. Sequences in red and green represent the PAM and gRNA sequences, respectively. Mismatched bases are shown in blue. The sizes of the deletions are shown on the right. Note that colonies 8-3 and 8-9 carry multiple deletions.
  • Table 2 shows a summary of gRNA hits in genome-wide screens.

Detailed Description

  • the methods described herein relate to genomic screens to identify genes and other genomic sequences that modulate, support or are functionally linked to a phenotype of interest.
  • Screening methods described herein employ libraries, collections or pools of mutant mammalian cells in which different members of a set of target genes are selectively inactivated or "knocked out” in different cells in the library.
  • the target gene is inactivated or knocked out in the cells of the library through the expression from integrated genomic nucleic acid sequences of i) an RNA guided endonuclease, such as Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease, and ii) a gRNA molecule that directs the endonuclease to the specific target gene.
  • RNA guided endonuclease and the gRNA molecule together cause double strand cleavage of the genomic DNA in both alleles of the target gene. Repair of these DNA double strand breaks (DSBs) by the mammalian cell leads to the introduction of insertion or deletion (indel) mutations that
  • Targeted gene inactivation using gRNA and RNA guided endonucleases that are stably integrated into a cell genome provides stable
  • endonuclease expression from stably integrated coding sequences is shown herein to be sufficient to mediate targeted DNA double strand breakage and gene inactivation.
  • Different members of the library of mutant mammalian cells express different gRNA molecules specific for different target genes, such that different genes from the set of target genes are inactivated in different mutant mammalian cells in the library.
  • the library of mutant mammalian cells may be used inter alia for loss-of-function screening.
  • the library may be subjected to positive or negative selection for a test phenotype, such as drug resistance, and nucleic acid sequences encoding gRNA molecules may be identified in the selected cell population that displays the phenotype. From these sequences, target genes that mediate the test phenotype may be identified.
  • the amounts or numbers of copies of one or more nucleic acid sequences that encode gRNA molecules may be
  • gRNA encoding nucleic acid sequences in the selected cell population may be identified.
  • Target genes that mediate the phenotype may be identified from these identified gRNA encoding nucleic acids.
  • the library may be subjected to selection for cells having a specific phenotype, such as drug resistance or a cellular response to a stimulus or chemical compound, and the amounts of nucleic acids encoding different gRNA molecules in the population of selected cells (i.e. in cells which display the test phenotype) may be determined relative to a control sample of the library (e.g. a library sample that has not been subjected to selection) .
  • gRNA encoding nucleic acid sequences that are amplified or depleted in the population of selected cells relative to the control sample may be identified and used to identify target genes that mediate the
  • Genes which are targeted by the gRNAs encoded by nucleic acids that are enriched or depleted may be identified from the recognition sequences of the enriched or depleted gRNA-encoding nucleic acids.
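The comparison of gRNA amounts between the selected population and a control sample can be illustrated with a minimal Python sketch. It assumes read counts per gRNA have already been obtained by sequencing; the function names, the pseudocount, the log2 threshold and the example counts are illustrative assumptions and are not taken from the specification.

```python
import math

def log2_fold_changes(selected, control, pseudocount=1.0):
    """Return {gRNA id: log2(selected fraction / control fraction)}.

    `selected` and `control` map gRNA ids to raw read counts.
    A pseudocount (an assumption of this sketch) avoids division by zero.
    """
    guides = set(selected) | set(control)
    sel_total = sum(selected.values()) + pseudocount * len(guides)
    ctl_total = sum(control.values()) + pseudocount * len(guides)
    changes = {}
    for guide in guides:
        sel_frac = (selected.get(guide, 0) + pseudocount) / sel_total
        ctl_frac = (control.get(guide, 0) + pseudocount) / ctl_total
        changes[guide] = math.log2(sel_frac / ctl_frac)
    return changes

def classify(changes, threshold=1.0):
    """Split gRNAs into enriched and depleted lists (threshold is hypothetical)."""
    enriched = sorted(g for g, lfc in changes.items() if lfc >= threshold)
    depleted = sorted(g for g, lfc in changes.items() if lfc <= -threshold)
    return enriched, depleted

# Made-up counts for illustration only.
control = {"Piga_1": 100, "Nanog_1": 100, "Ctrl_1": 100, "Ctrl_2": 100}
selected = {"Piga_1": 300, "Nanog_1": 5, "Ctrl_1": 100, "Ctrl_2": 95}
enriched, depleted = classify(log2_fold_changes(selected, control))
```

Counts are converted to fractions of the total before comparison so that differences in sequencing depth between the two samples do not by themselves appear as enrichment or depletion.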
  • the target genes identified by the methods described herein are associated with or involved in the phenotype that is tested.
  • the identified genes may modulate, mediate or be functionally linked to the selected phenotype in the mammalian cells or may be activators or repressors of a phenotypic cellular response.
  • each cell in the library carries retrievable information which identifies the gene that is inactivated in it.
  • This internal tag allows the screening methods described herein to be performed on a library or pool of mammalian cells in a single step. This avoids the need for large-scale parallel testing of individual clonal cell populations with known deletions (i.e. an array format).
  • both recessive and dominant genes may be identified by the methods described herein.
  • Methods of the invention are performed in vitro.
  • the mammalian cells are preferably isolated cells and may be from any human or non-human mammalian species, preferably human or mouse cells.
  • Suitable mammalian cells for use in the methods described herein include somatic cells, pluripotent cells, somatic stem cells and cancer cells.
  • Mammalian pluripotent cells may include embryonic stem (ES) cells, for example murine ES cells, and non-embryonic stem cells, including foetal and adult somatic stem cells and stem cells derived from non-pluripotent cells, such as induced pluripotent (iPS) cells.
  • the mammalian cells may be from an established cell line, for example, a cancer cell line such as human lung carcinoma cell line A549, or may be obtained from a cell sample from an individual, for example a human or non-human mammal.
  • the mammalian cells are dividing cells. Suitable mammalian cells are well-known in the art.
  • the cells in the mutant mammalian cell library express a RNA guided endonuclease and a gRNA.
  • the expression is stable in the cells.
  • the RNA guided endonuclease forms a complex with the gRNA and cleaves both strands of the target gene at a DNA sequence (termed a recognition region or protospacer) within the target gene that is complementary to the target sequence (or crRNA region) of the gRNA.
  • the nucleic acid encoding the RNA guided endonuclease may for example be stably integrated into the genome of the mammalian cells at a neutral and active site in the genome, such as the mouse Rosa26 locus or its human homologue (Irion et al Nature Biotech 25 (12) 1477-1482), the AAVS1 site (Sadelain, M. et al. 2011 Nat Rev Cancer. (2011) 1;
  • the RNA guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease.
  • Bacteriophage-derived 30-bp DNA fragments are inserted into the CRISPR locus of the host cell and transcribed as CRISPR RNAs (crRNAs) . These form a complex with trans-encoded RNA (tracrRNA) and CRISPR-associated (Cas) proteins, and the complex introduces site-specific cleavage at DNA sites that match the sequence of the crRNAs.
  • the cells in the mutant mammalian cell library express a Cas9 nuclease and a gRNA molecule.
  • the Cas9 nuclease forms a complex with the gRNA molecule and introduces a site-specific DNA double strand break into DNA sequences (termed recognition regions or protospacers) that are complementary to the target sequence (or crRNA region) of the gRNA molecule.
  • Suitable Cas9 nuclease sequences include SEQ ID NO: 1.
  • Suitable Cas9 nucleases for use as described herein may be derived from Streptococcus pyogenes SF370, Streptococcus thermophilus LMD-9 (Cong, L. et al. Science 339, 819-823 (2013)) or other bacterial or archaeal species.
  • the nucleic acid sequence encoding the Cas9 nuclease or other RNA guided endonuclease may be humanized, i.e. codon-optimised for expression in human cells.
  • endonucleases may be produced by conventional synthetic means or obtained from commercial suppliers (System Biosciences, USA; BioCat GmbH, DE) or non-profit repositories (e.g. Addgene, MA USA) .
  • the amino acid sequences of suitable RNA guided endonucleases may be fused to nuclear localisation signals (NLSs) and/or tags.
  • NLSs and/or tags and encoding nucleic acids are well-known in the art.
  • a Cas9 nuclease may be fused at one end or both ends to a nuclear localisation signal and/or a sequence tag, such as a FLAG, HA, Myc, V5 tag.
  • the nucleic acid sequence encoding the Cas9 nuclease or other RNA guided endonuclease may be operably linked to a suitable regulatory element. Suitable regulatory elements are active after stable
  • constitutive promoters such as human elongation factor 1α (EF1α) promoter, CAG promoter, human ubiquitin C promoter, human/mouse PGK promoter, and human/mouse PolII promoter; and conditional promoters, such as the tetracycline response element (TRE) promoter.
  • the nucleic acid sequence encoding the Cas9 nuclease may be contained in an expression vector.
  • Expression vectors suitable for stable integration into the mammalian cell genome and the expression of recombinant proteins are well known in the art.
  • Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate.
  • the vector contains appropriate regulatory sequences to drive the expression of the Cas9 or other RNA guided endonuclease in the mammalian cells.
  • a vector may also comprise sequences, such as origins of replication and selectable markers, which allow for its selection and replication and expression in bacterial hosts such as E. coli as well as in mammalian cells.
  • Vectors suitable for use in expressing RNA guided endonuclease encoding nucleic acids include plasmids and viral vectors e.g. 'phage, or phagemid, and the precise choice of vector will depend on the particular expression system which is employed.
  • the vector is an integrative vector. For further details see, for example,
  • the expression vector comprising the nucleic acid encoding the Cas9 nuclease or other RNA guided endonuclease stably integrates into the genome of the mammalian cells following transfection .
  • Each mammalian cell may contain one copy or multiple copies of the nucleic acid sequence encoding the RNA guided
  • endonuclease may be stably integrated into the genome of the mammalian cells at a neutral and active site in the genome, such as the mouse Rosa26 locus or its human homologue (Irion et al Nature Biotech 25 (12) 1477-1482) or the mouse Col1a1 locus or its human homologue.
  • Mammalian cells may be transfected or infected with suitable
  • the RNA guided endonuclease may be any RNA guided endonuclease.
  • transfected cells that express the RNA guided endonuclease may be selected following transfection, for example using a selectable marker such as antibiotic resistance or fluorescence, such that the transfected cells are isolated from cells that do not express the RNA guided endonuclease.
  • separation of transfected cells that express the RNA guided endonuclease from cells that do not express the RNA guided endonuclease may not be necessary following transfection.
  • the mammalian cells may be transfected with the nucleic acid encoding the RNA guided endonuclease before, at the same time as, or after transfection with the nucleic acid encoding the diverse population of guide RNA (gRNA) molecules.
  • a guide RNA (gRNA) molecule forms a complex with the RNA guided endonuclease that introduces a site-specific DNA double strand break into a DNA sequence (termed a target region or protospacer) within the target gene that is complementary to the recognition sequence (or crRNA region) of the gRNA molecule.
  • a gRNA molecule that directs an RNA guided endonuclease to cleave DNA strands within a target gene may be termed "specific" for the target gene.
  • gRNA molecule specific for a target gene in a cell in combination with an RNA guided endonuclease, such as Cas9 nuclease, leads to the selective inactivation of the target gene in the cell, whilst other genes in the cell are unaffected.
  • a gRNA molecule comprises a recognition sequence (crRNA) and a scaffold sequence (tracrRNA).
  • the recognition sequence of the gRNA is complementary to the sequence of a target region of genomic DNA (also called a protospacer) within a target gene.
  • Suitable target regions may be 15 bp to 25 bp in length, preferably 18 bp to 20 bp, and may be followed by a protospacer-adjacent motif (PAM).
  • PAMs include NGG and NAG, wherein N is any nucleotide.
  • a suitable target region in a target gene may consist of the sequence 5'-N₁₉NGG-3', 5'-N₂₀NGG-3', 5'-N₁₉NAG-3' or 5'-N₂₀NAG-3', where N is any nucleotide. Examples of PAM
  • the target region is located wholly or partially within a coding sequence of the target gene (i.e. an exonic sequence) .
  • Preferred target regions within a target gene are located 100 bp or more downstream from the ATG initiation codon of the target gene but within the first 50% of the exonic sequence of the target gene (i.e. the 50% of the exonic sequence that is adjacent to the initiation codon).
  • a suitable target region may be present in all transcripts of a target gene and only present in a single exon within the mammalian cell genome .
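The positional preferences above can be sketched as a simple scan for candidate target regions. The following Python sketch is illustrative only: it assumes the coding sequence is supplied as a single string beginning at the ATG initiation codon (standing in for the concatenated exonic sequence), searches the forward strand only, and measures the 100 bp offset from the A of the ATG. The function name and parameters are hypothetical.

```python
import re

# 20-nt protospacer followed by an NGG or NAG PAM. The lookahead lets
# overlapping candidate sites all be reported.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT][AG]G))")

def find_target_regions(cds, min_offset=100, max_fraction=0.5):
    """Return (start, protospacer, pam) tuples for forward-strand sites.

    A site is kept only if it starts at least `min_offset` bases after the
    first base of the initiation codon and ends within the first
    `max_fraction` of the sequence.
    """
    cds = cds.upper()
    limit = int(len(cds) * max_fraction)
    hits = []
    for match in SITE.finditer(cds):
        start = match.start()
        end = start + 23  # 20-nt protospacer plus 3-nt PAM
        if start >= min_offset and end <= limit:
            hits.append((start, match.group(1), match.group(2)))
    return hits

# Made-up sequence with a single qualifying site at offset 100.
example = "ATG" + "A" * 97 + "ACGTACGTACGTACGTACGT" + "TGG" + "C" * 200
sites = find_target_regions(example)
```

A fuller design pipeline would also scan the reverse strand and check that each site is present in all transcripts and confined to a single exon; those steps are omitted here.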
  • the first nucleotide of the recognition sequence of the gRNA may be G regardless of the corresponding residue in the protospacer sequence (i.e. the G may be a mismatch with the corresponding residue in the protospacer sequence) .
  • the remainder of the recognition sequence is complementary to the sequence of the target region of genomic DNA.
  • the initial G residue in the recognition sequence may correspond to a complementary C residue in the
  • the recognition sequence of the gRNA may consist of the sequence GN₁₉ and the protospacer of the target gene may have the sequence N₂₀NGG, where N is any nucleotide.
  • the last five nucleotides of the recognition sequence of the gRNA may be devoid of the sequence TTT.
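The two sequence rules above (a G at the first position regardless of the protospacer, and no TTT in the last five nucleotides) can be sketched in Python. This is a minimal illustration under stated assumptions: the recognition sequence is written with DNA letters in the same orientation as the 20-nt protospacer, and the function names are hypothetical.

```python
def make_recognition_sequence(protospacer):
    """Return a 20-nt recognition sequence with the first base set to G.

    The remaining 19 bases are copied from the protospacer, so the G may
    be a mismatch at position 1.
    """
    protospacer = protospacer.upper()
    if len(protospacer) != 20 or set(protospacer) - set("ACGT"):
        raise ValueError("expected a 20-nt sequence of A, C, G and T")
    return "G" + protospacer[1:]

def passes_ttt_filter(recognition):
    """True if the last five nucleotides do not contain TTT."""
    return "TTT" not in recognition[-5:]

guide = make_recognition_sequence("ACGTACGTACGTACGTACGT")
ok = passes_ttt_filter(guide)
```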
  • the nucleotide sequence from position 14 onwards of a target region that is targeted by a gRNA molecule may be unique to the target gene and not found in other genes or exonic sequences within the genome of the mammalian cell.
  • the nucleotide sequence of nucleotides 1 to 13 of the target region targeted by a gRNA molecule (i.e. 5'-N₁-N₁₃-3') may also be unique to the target gene or may be rare within the mammalian cell genome outside the target gene (for example, less than 100, less than 50, less than 10 or less than 5 repeats of the nucleotide sequence in genes or exonic sequences outside the target gene).
  • nucleotide sequences that differ from the nucleotide sequence of positions 1 to 13 of the target region by one nucleotide may also be rare within the mammalian cell genome outside the target gene (for example less than 100, less than 50, less than 10 or less than 5 repeats of the nucleotide sequence in genes or exonic sequences outside the target gene) .
  • Suitable target regions within a target gene may be identified using standard genomic techniques as described herein and used to design the recognition sequences of gRNA molecules to target the gene.
  • guide RNAs may be designed for each CCDS of the target gene .
  • a gRNA may further comprise a scaffold (or tracrRNA) sequence.
  • scaffold sequence may depend on the RNA guided
  • gRNA scaffold sequences for use with specific RNA guided endonucleases, such as Cas9, are well-known in the art.
  • a gRNA scaffold sequence derived from the same species as the RNA guided endonuclease is employed.
  • Suitable gRNA scaffold sequences are well-known in the art and include the sequence of SEQ ID NO: 2.
  • Nucleic acids encoding gRNA molecules as described herein may be readily prepared by the skilled person using publicly available genomic information, the information and references contained herein and techniques known in the art (for example, see Molecular Cloning: a Laboratory Manual: 4th edition, Green et al . , 2012 Cold Spring Harbor Laboratory Press).
  • a diverse population of gRNAs may be produced in which sequence overlap between the members of the population is minimised or avoided.
  • a library of mutant mammalian cells may be generated for use in the methods described herein using a pooled population or library of diverse gRNA molecules.
  • the number of different gRNAs in the diverse population depends on the number of genes in the set of target genes to be inactivated and the number of gRNAs in the population that target each gene.
  • the diverse population may comprise at least 10, at least 100, at least 1000, at least 10000, at least 20000, at least 30000, at least
  • the diverse population of gRNAs may comprise gRNAs with at least 10, at least 100, at least 1000, at least 10000, at least 20000, at least 30000, at least 50000 or at least 100000 different recognition sequences.
  • the diverse population of gRNAs may be specific for a set of target genes in the mammalian cell that consists of at least 10, at least 100, at least 1000, at least 10000, at least 19000 or at least 20000 different genes.
  • the diverse population of guide RNA molecules (gRNAs) is specific for a set of target genes in the mammalian cell that consists of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 98% of the protein coding genes in the mammalian cell.
  • the diverse population of gRNAs may target all of the genes in the genome of the mammalian cell or a subset of the genes in the genome.
  • Suitable subsets of genes that may be targeted include genes involved in specific biological pathways, such as signal transduction and epigenetic regulation, genes that encode specific protein activities, such as kinases and phosphatases.
  • Suitable diverse populations of gRNAs may be designed and synthesised using standard techniques. For example, the exonic coordinates of all the protein coding genes in the mammalian cell genome are publicly available and may be obtained from genomic databases. Protospacer sequences of N20NGG or N20NAG may be extracted and oligonucleotides corresponding to some or all of the extracted sequences may be synthesised using standard techniques. For example, large populations of oligonucleotides may be produced by parallel synthesis using standard techniques or obtained from commercial suppliers (e.g.
  • oligonucleotides may be cloned into integrative gRNA expression vectors adjacent a gRNA scaffold sequence.
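  • As an illustration of the protospacer extraction step described above, the following Python sketch scans a DNA string and its reverse complement for N20NGG and N20NAG sites. The function names and the layout of the returned tuples are assumptions made for this example and are not taken from the specification; a practical design pipeline would start from exon coordinates held in a genomic database.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
# Group 1 is the 20-nt protospacer, group 2 is the PAM (NGG or NAG).
SITE_RE = re.compile(r"(?=([ACGT]{20})([ACGT][AG]G))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq):
    """List (strand, start, protospacer, pam) for each N20NGG or N20NAG site.

    `start` is the 0-based offset on the strand that was scanned
    (a hypothetical convention chosen for this sketch).
    """
    seq = seq.upper()
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in SITE_RE.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

  • Each reported protospacer could then be synthesised as an oligonucleotide and cloned next to the scaffold sequence, as described above.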
  • Each target gene may be targeted by one gRNA in the diverse population or more preferably two or more gRNAs in said diverse population (i.e. two or more different gRNAs in the diverse population may be specific for the same gene).
  • two, three, four, five or more gRNA molecules in said library may be specific for different target regions within the same target gene, preferably three to five. This may be helpful in reducing the risk that the results of the screen are affected by the off target effects of a single gRNA and increasing the probability of successful inactivation of the target gene.
  • integrative vectors may be inactivated in the mammalian cell library.
  • mammalian cells in the library depends on the relative amounts of integrative vector and mammalian cells used for transfection.
  • one target gene may be inactivated in each mammalian cell in the library.
  • two to five target genes may be inactivated in each mammalian cell.
  • a second screening step may be employed to identify which of the inactivated target genes in a cell is
  • the diverse population of integrative vectors may comprise vectors that encode control gRNA molecules.
  • Suitable control gRNAs may be irrelevant to the set of target genes and may for example introduce mutations into non-functional sequences or into irrelevant genes that are not part of the target set of genes. This may be useful as a control and/or for facilitating enrichment when the number of genes in a focussed gene set is relatively small, for example in a secondary screen described above. Since each gRNA molecule of interest represents a smaller fraction in the total population, the presence of irrelevant gRNA molecules allows an increased relative enrichment of nucleic acids encoding gRNA molecules that are involved in the test phenotype.
  • the representation of individual gRNAs within the diverse population may be confirmed by sequencing before transfection or infection.
  • a nucleic acid encoding a gRNA molecule may be contained in an expression cassette.
  • the expression cassette may comprise the nucleic acid sequence encoding the gRNA molecule operably linked to a
  • Suitable regulatory elements include constitutive viral or mammalian regulatory elements, such as the human U6 promoter or the human H1 promoter.
  • a suitable expression cassette may comprise a promoter, nucleic acid encoding the gRNA and a termination signal, e.g. Is.
  • the nucleic acid encoding the gRNA molecule or the expression cassette comprising the nucleic acid may be contained in an integrative vector.
  • Suitable integrative vectors stably integrate into the genome of the mammalian cells after transfection and express the nucleic acid encoding the gRNA molecule. Integration of the vector into the genome of a cell allows the identification of the target genes that are inactivated in the cell through the identification of the gRNA molecule encoded by the integrated vector.
  • Suitable integrative vectors are well known in the art and include viral vectors, for example retroviral vectors, such as MLV, and lentiviruses, such as HIV, SIV and FIV, and transposon vectors such as Sleeping Beauty™, piggyBac™ and Tol2 transposon systems.
  • the integrative vector is a lentiviral vector.
  • An example of a lentiviral gRNA vector suitable for use in the methods described herein is shown in Fig 12.
  • the integrative vector may further comprise one or more selectable markers.
  • Suitable selectable markers include fluorescent proteins, such as Blue Fluorescent Protein (BFP) or Green Fluorescent Protein (GFP), that can be selected by cell-sorting, and antibiotic resistance genes, such as puromycin resistance or neomycin resistance, that can be selected by exposure to the antibiotic.
  • mammalian cells that incorporate the integrated vector in their genome may be selected through expression of the selectable marker.
  • Another aspect of the invention provides a population or library of integrative vectors for transfecting a mammalian cell population, preferably a mammalian cell population expressing an RNA guided endonuclease, as described herein, each said integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene within the mammalian cells, said population encoding a diverse population of guide RNA molecules (gRNAs) specific for a set of target genes within the mammalian cells.
  • the integrative vectors in the diverse population may be isolated or, where appropriate, may be packaged into a suitable viral particle for infection or transfection.
  • the diverse population of integrative vectors may be a pan-genomic library that targets all of the active genes in the mammalian cell (i.e. genes that encode proteins or miRNA).
  • the diverse population may comprise nucleic acid sequences encoding 20000 or more different gRNAs.
  • the diverse population may comprise nucleic acid sequences encoding gRNAs that target at least 20000 different genes in a mammalian cell.
  • nucleic acid sequences in the diverse population may encode 1 to 5 gRNAs that target each active gene in a mammalian cell.
  • the diverse population of integrative vectors may be a sub-genomic library that targets a subset of the active genes in a mammalian cell, for example all the genes encoding a specific enzymatic activity or all the genes encoding members of a specific pathway.
  • the set of target genes may be genes encoding kinases, phosphatases, tyrosine kinases or G protein coupled receptors (GPCRs).
  • the diverse population may comprise nucleic acid sequences encoding gRNAs that target a subset of at least 10, at least 50, at least 100, at least 500 or at least 1000 different genes in a
  • the diverse population may comprise nucleic acid sequences encoding 50 or more, 500 or more, or 1000 or more different gRNAs. Nucleic acid sequences in the diverse population may encode 1 to 5 gRNAs that target each gene in the subset of genes in a mammalian cell.
  • the diverse population of integrative vectors may be stored, e.g. frozen, using conventional techniques, or used in genomic screening. For example, viral plasmid DNAs containing gRNA libraries may be stored at -20°C and lentivirus particles may be stored at -80°C.
  • the gRNA encoding nucleic acid sequences in the diverse population may be analysed before transfection to determine the representation of each gRNA in the diverse population.
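  • One way to summarise the representation of each gRNA in the vector population is sketched below in Python. It assumes that per-gRNA read counts from sequencing the library are already available as a dictionary; the function name, the chosen summary statistics and the 90th/10th percentile ratio are illustrative assumptions and are not prescribed by the specification.

```python
def representation_summary(read_counts, library_ids):
    """Summarise how evenly a gRNA library is represented.

    read_counts: dict mapping gRNA id -> read count (missing ids count as 0).
    library_ids: iterable of every gRNA id that the library was designed with.
    """
    counts = sorted(read_counts.get(gid, 0) for gid in library_ids)
    n = len(counts)
    if n == 0:
        raise ValueError("library_ids must not be empty")
    detected = sum(1 for c in counts if c > 0)
    # Integer index arithmetic avoids floating-point rounding at the percentiles.
    p10 = counts[(n - 1) // 10]
    p90 = counts[9 * (n - 1) // 10]
    return {
        "detected_fraction": detected / n,
        "total_reads": sum(counts),
        "p10": p10,
        "p90": p90,
        # A large ratio suggests a skewed library (illustrative metric only).
        "skew_90_10": p90 / p10 if p10 > 0 else float("inf"),
    }
```

  • A detected fraction close to 1 and a small skew ratio would indicate that most designed gRNAs are present at similar levels before the cells are transfected.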
  • the diverse population of integrative vectors may be used or suitable for use in the methods described herein. Suitable populations of integrative vectors are described in more detail above.
  • the diverse population of integrative vectors may be packaged where necessary and stably transfected into the mammalian cells using standard techniques.
  • the mammalian cells in the library express, preferably stably, both an RNA guided endonuclease, preferably Cas9, and a gRNA molecule from the library of gRNA molecules.
  • mammalian cells that express both the RNA guided endonuclease and the gRNA molecule may be selected, for example using selectable markers, such as antibiotic resistance or fluorescence.
  • Each mammalian cell in the library may express one or multiple gRNAs, such that one or multiple target genes are inactivated in the cell.
  • each mammalian cell in the library only expresses a single gRNA, such that a single target gene is inactivated in the cell.
  • the mammalian cells may be cultured for at least 3 days, preferably at least 6 days, in order for the target genes to be inactivated by the gRNA/RNA guided endonuclease system (e.g. CRISPR/Cas9).
  • repair of these DNA DSBs by the cell introduces mutations at the target region, typically deletions or insertions, that selectively alter the activity, preferably inactivate, the target gene.
  • the mutations may result in the loss of expression of active gene product.
  • the activity of the target gene is altered and preferably abolished, in cells that express a gRNA molecule specific for the target gene but is not affected in cells that do not express a gRNA molecule specific for the target gene i.e. no active gene product is expressed from the target gene in cells that express the gRNA molecule.
  • the gRNAs in the population are specific for different target genes, such that the diverse gRNA population as a whole is specific for a set of target genes.
  • Each of the genes in the set that is targeted by the gRNA population may be inactivated in one or more cells of the mutant mammalian cell library.
  • all of the target genes that are targeted by the diverse gRNA population are inactivated in the mutant mammalian cell library (i.e. the set of target genes corresponds to the set of inactivated genes).
  • inactivation may be less than 100% efficient and inactivation may not occur in some cells that express both an RNA guided endonuclease and a gRNA molecule.
  • fewer target genes may be inactivated in the mutant mammalian cell library than are targeted by the gRNA molecules in the diverse gRNA population (i.e. the set of target genes is larger than the set of inactivated genes).
  • Mutations that inactivate a target gene may include insertions and deletions (indels) of one or more nucleotides.
  • the mutation that is introduced into the target gene leads to a
  • Both alleles of a gene may be inactivated by the stable expression of the gRNA and the RNA guided endonuclease. This allows recessive genes that contribute to the test phenotype to be identified using the methods described herein.
  • Another aspect of the invention provides a library of mutant mammalian cells
  • each cell in the library expressing an RNA guided endonuclease, such as Cas9, and a gRNA specific for a target gene, such that the target gene is inactivated in the cell,
  • the library expresses a diverse population of gRNA molecules that is specific for a set of target genes, such that one or more target genes from the set of target genes is inactivated in each of the cells in the library.
  • each cell in the library comprises a nucleic acid encoding an RNA guided endonuclease, such as Cas9, and a nucleic acid encoding a gRNA molecule specific for a target gene stably integrated into its genome.
  • the library may be isolated and/or purified following production. In other embodiments, the library may not undergo further isolation or purification. For example, non-library cells which do not express both an RNA guided endonuclease and a gRNA specific for a target gene and/or do not have an inactivated target gene may also be present alongside the cells of the library.
  • the library may be used or suitable for use in the methods described herein.
  • the library is a pan-genomic library that targets all of the active genes in the mammalian cell (i.e. genes that encode proteins).
  • the diverse population may comprise 20000 or more gRNAs and may target at least 20000 genes in the cell.
  • Each gene in the mammalian cell may be targeted by 1 to 5 gRNAs in the diverse population, such that an active gene in the cell is inactivated in each cell in the library.
  • the library is a focussed or sub-genomic library that targets a specific subset or panel of genes in the cell, for example genes encoding a specific enzymatic activity or genes encoding members of a specific pathway.
  • Suitable subsets or panels of target genes include genes encoding kinases, phosphatases, tyrosine kinases or G protein coupled receptors (GPCRs).
  • Each gene in the subset or panel may be targeted by 1 to 5 gRNAs in the diverse population, such that a gene from the subset is inactivated in each cell in the library.
  • the library of mammalian cells may be maintained in culture, expanded, stored, for example frozen using conventional techniques, or used in genomic screening.
  • the gRNA sequences in the library may be analysed before genomic screening is performed to determine the representation of each gRNA in the library.
  • the mutant mammalian cell library may be interrogated in order to identify genes that contribute to or are associated with a phenotype of interest.
  • the library may be subjected to a selection for a test phenotype (i.e. a phenotype of interest) to identify a cell population within the library that displays the test phenotype.
  • the selection may comprise subjecting the library to culture conditions that are lethal to cells which do not display the test phenotype. Cells which survive the selective culture conditions display the test phenotype and represent a selected cell population. Further isolation of the selected population of cells may not be required.
  • cells within the library that display the test phenotype may be isolated and/or separated from other cells in the library to form the selected cell population.
  • Suitable selection methods for isolating cells within the library that display the test phenotype are well known in the art and include flow cytometry, immunological methods, such as panning or magnetic beads, cell adhesion, imaging techniques and/or culturing as clonal, or oligoclonal populations, for example in an array format.
  • a test sample of the mutant mammalian cell library may be subjected to selection for the test phenotype.
  • the test sample is preferably representative of the mutant mammalian cell library.
  • the results may be compared with a control sample of the library that has not been subjected to selection.
  • the control sample may be untreated, cultured under non-selective
  • results from the test sample of the library after selection may be compared with results from the test sample of the library before selection.
  • the phenotype of interest may be selected in the library of mutant mammalian cells or the test sample thereof by applying a phenotypic screen.
  • the test phenotype is displayed by members of the mutant mammalian cell library that have an inactivated target gene that is relevant to the phenotype e.g. a gene that activates, represses or otherwise mediates or is involved in the cellular pathways involved in establishing the test phenotype in the cell.
  • the phenotype may be selected by applying selective pressure to the mutant mammalian cell library or a test sample thereof.
  • the library or sample may be cultured under conditions that allow cells that display the phenotype of interest to survive while cells that do not display the phenotype of interest do not survive, or conditions that confer a growth or survival advantage on cells that display the phenotype of interest compared to cells that do not display the phenotype of interest (i.e. the culture conditions are selective for cells which possess the phenotype of interest).
  • phenotype of interest may survive or be enriched in the library or sample thereof during selection or may not survive or may be depleted in the library or sample thereof during selection, relative to cells in which the inactivated gene is not relevant to the phenotype.
  • the phenotype may be selected by identifying and isolating cells in the library or sample that display the phenotype of interest from other cells in the library or sample that do not display the phenotype, for example using cell-sorting (e.g. FACS), cell adhesion, cell cloning or immunological and imaging techniques.
  • the phenotype may be oncogenesis (Ngo, V. N. et al. Nature 441, 106-110 (2006)), cell viability (MacKeigan, J. P. et al. Nature Cell Biol. 7, 591-600 (2005)), cell motility (Collins, C. S. et al. Proc. Natl Acad. Sci. USA 103, 3775-3780 (2006)), proteasome function (Paddison, P. J. et al. Nature 428, 427-431 (2004)), mitotic progression (Moffat, J. et al. Cell 124, 1283-1298 (2006)), host-pathogen interaction (Yeung,
  • the phenotype of interest may be sensitivity or resistance to the selective culture conditions.
  • the phenotype of interest may be sensitivity or resistance to a chemical compound, such as a small molecule inhibitor, and the selection may be applied by exposing the mutant mammalian cells in the library or the test sample thereof to the chemical compound.
  • the methods described herein may be useful in identifying candidate genes that modify or mediate the effect of a chemical compound, such as a small molecule inhibitor or other drug, in a mammalian cell.
  • gRNA encoding nucleic acids that are amplified or depleted in the library or sample thereof following exposure to the chemical compound may be identified as targeting genes that modulate resistance or sensitivity to the compound (e.g. inactivation of the target gene by the gRNA molecule increases resistance or sensitivity to the compound).
  • the methods described here may be useful in identifying candidate genes that are involved in a cellular pathway in a mammalian cell.
  • gRNA encoding nucleic acids that are amplified or depleted in the library or sample thereof following exposure to a chemical compound or other selection may be identified as targeting genes that mediate or are involved in the cellular pathway.
  • the test sample may be exposed to aerolysin (from Aeromonas hydrophila) or alpha-toxin (from Clostridium septicum) to select GPI-anchor synthesis-defective phenotypes and thereby identify genes involved in GPI-anchor
  • 6-thioguanine (6TG) to select DNA mismatch repair-defective phenotypes and thereby identify genes involved in DNA mismatch repair
  • PARP inhibitors, such as olaparib, to identify genes involved in HR dependent DNA DSB repair
  • FIAU (fialuridine)
  • Aerolysin from Aeromonas hydrophila and alpha-toxin from Clostridium septicum are cytolytic pore-forming toxins and use GPI-anchored proteins as their receptors. Although GPI-anchored proteins are essential for development, GPI-deficient cells are viable.
  • Deficiencies in GPI biosynthesis therefore confer resistance to aerolysin and alpha-toxin.
  • 6TG is converted by Hprt into thio-GMP.
  • thio-dGTP is formed and incorporated into genomic DNA during replication, resulting in DNA mispairing.
  • Mismatch repair (MMR) genes recognise the mispairing and induce apoptosis.
  • MMR-deficient cells are not able to recognise the mispairing and are therefore able to survive under 6TG treatment.
  • gRNA molecules that are encoded by nucleic acid sequences whose abundance is altered by the selection (i.e. nucleic acid sequences that are present in greater or lesser amounts in the selected cell population than the unselected library) are specific for genes that modulate or are otherwise involved in the test phenotype.
  • the gene that is targeted by a gRNA molecule encoded by a nucleic acid whose abundance in the cell population is altered by the selection may be identified from the recognition sequence of the gRNA, which is complementary to a target region within the target gene, as described above.
  • a gene that modulates or contributes to the test phenotype may be involved in or be a component of a cellular process or pathway that mediates the test phenotype in the cell. Methods of the invention allow the rapid identification of candidate genes relevant to the test phenotype in the cell i.e. genes that modulate, mediate or are negatively or positively associated with the test phenotype in the mammalian cell.
  • Cells of the mutant cell library in which the inactivated target gene is relevant to the phenotype of interest are amplified or depleted by the selection compared to cells in which the inactivated gene is not relevant to the phenotype.
  • Nucleic acid sequences encoding gRNAs that are specific for genes relevant to the phenotype are therefore enriched or depleted in total genomic DNA isolated from the selected cell population relative to control samples.
  • cells from the selected population may be harvested.
  • cells may also be harvested from the unselected library to produce the control sample.
  • encoding nucleic acids that are amplified or depleted in the selected cell population (i.e. cells displaying the test phenotype) or the control sample may be analysed before or after the selected cell population.
  • the abundances of gRNA encoding nucleic acid sequences in a control sample may be determined, and optionally stored or recorded, and used to identify gRNA encoding nucleic acids that are amplified or depleted in multiple different test samples.
  • Genomic DNA from the test sample and/or control sample may be amplified before sequencing.
  • genomic DNA from the test sample and/or the control sample of cells may be purified before amplification. Suitable methods of DNA purification are well known in the art.
  • the total gRNA encoding nucleic acid in the test sample and/or the control sample may be amplified from the genomic DNA of each sample.
  • Amplification primers may be based on the sequence of the integrative vector or the non-diverse regions of the gRNA-encoding nucleic acids and may be designed using routine primer design techniques.
  • suitable primers for gRNA amplification include the forward primer: CTTGAAAGTATTTCGATTTCTTGG and the reverse primer:
  • the amount or abundance of each gRNA encoding nucleic acid sequence in the test sample may be determined after or at the same time as sequencing. From the amount of each gRNA encoding nucleic acid sequence, genes that are enriched or depleted in the test sample and are therefore involved in the selection phenotype in the cell may be identified.
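  • As a simple illustration of determining the amount of each gRNA encoding sequence from sequencing reads, the Python sketch below tallies reads by exact match of a fixed window against the known library sequences. Exact matching is a simplification adopted for this example; the window position, the window length and all names are assumptions, and an alignment-based approach could be substituted.

```python
from collections import Counter


def count_grna_reads(reads, grna_by_sequence, start=0, length=19):
    """Tally reads per gRNA by exact match of a fixed window.

    reads: iterable of read strings.
    grna_by_sequence: dict mapping gRNA sequence -> gRNA id.
    start, length: window within each read expected to hold the gRNA
    sequence (hypothetical defaults; they depend on the amplicon design).
    Returns (Counter of reads per gRNA id, number of unmatched reads).
    """
    # Start every library member at zero so that dropouts remain visible.
    counts = Counter({gid: 0 for gid in grna_by_sequence.values()})
    unmatched = 0
    for read in reads:
        window = read[start:start + length].upper()
        gid = grna_by_sequence.get(window)
        if gid is None:
            unmatched += 1
        else:
            counts[gid] += 1
    return counts, unmatched
```

  • The resulting per-gRNA counts for a test sample and a control sample can then be compared to find gRNA encoding sequences that are enriched or depleted.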
  • the relative amount or abundance of individual gRNA encoding nucleic acid sequences in the test sample may be determined relative to a control sample.
  • the amount of each gRNA encoding nucleic acid sequence in the control sample may be determined before, at the same time as, or after the amount of each gRNA encoding nucleic acid sequence in the test sample; or may have been previously determined.
  • the number of times an individual gRNA encoding sequence is read (i.e. the read count) may be compared between the test and control samples to determine the relative abundance of the sequence. Suitable methods of determining the relative amounts of gRNA molecules in the samples are well known in the art. For example, mapping reads may be performed with standard software such as the Burrows-Wheeler
  • An increased amount of a gRNA encoding sequence in the test sample, relative to the control sample, is indicative that the gRNA encoding sequence is enriched by the selection.
  • a decreased amount of a gRNA encoding sequence in the test sample, relative to the control sample, is indicative that the gRNA encoding sequence is depleted by the selection.
  • gRNA encoding nucleic acid sequences that are enriched or depleted in the selected cell population compared to the control population encode gRNAs that target genes involved in the test phenotype.
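  • The comparison of test and control read counts can be sketched as a normalised log2 fold change per gRNA, as in the Python example below. The pseudocount, the normalisation to total reads and the threshold of 1.0 (a two-fold change) are assumptions made for illustration; the specification does not fix any particular statistic.

```python
import math


def log2_fold_changes(test_counts, control_counts, pseudocount=1.0):
    """Return {gRNA id: log2(test fraction / control fraction)}.

    Counts are converted to fractions of total reads so that samples with
    different sequencing depth can be compared; the pseudocount keeps
    gRNAs with zero reads finite (assumed default of 1.0).
    """
    grnas = set(test_counts) | set(control_counts)
    test_total = sum(test_counts.values()) + pseudocount * len(grnas)
    control_total = sum(control_counts.values()) + pseudocount * len(grnas)
    lfc = {}
    for gid in grnas:
        test_frac = (test_counts.get(gid, 0) + pseudocount) / test_total
        control_frac = (control_counts.get(gid, 0) + pseudocount) / control_total
        lfc[gid] = math.log2(test_frac / control_frac)
    return lfc


def classify(lfc_value, threshold=1.0):
    """Label a gRNA as enriched, depleted or unchanged (illustrative cut-off)."""
    if lfc_value >= threshold:
        return "enriched"
    if lfc_value <= -threshold:
        return "depleted"
    return "unchanged"
```

  • For example, a gRNA that makes up three times as large a fraction of the reads in the selected population as in the control would have a log2 fold change of about 1.58 and would be labelled as enriched under these assumptions.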
  • a gene that is involved in the test phenotype may be identified from the sequence of a gRNA encoding nucleic acid that is enriched or depleted in the test sample.
  • the amount or abundance of multiple different nucleic acid sequences that encode gRNA molecules that are specific for the same target gene may be determined in the test sample relative to a control sample.
  • the enrichment or depletion of multiple gRNA encoding nucleic acids that are specific for the same target gene in the test sample relative to the control provides strong indication that the target gene is associated with the test phenotype whereas the enrichment or depletion of only one of multiple gRNA encoding nucleic acids that are specific for the same target gene may be indicative that the observed phenotype arises from an off-target effect and the target gene is not associated with the phenotype.
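  • The reasoning above, in which agreement between several gRNAs for the same gene is treated as stronger evidence than a change in a single gRNA, can be sketched in Python as follows. The input is assumed to be a per-gRNA log2 fold change; the threshold, the requirement for two concordant gRNAs and the label strings are hypothetical choices for this example.

```python
from collections import defaultdict


def call_genes(grna_lfc, grna_to_gene, threshold=1.0, min_concordant=2):
    """Classify genes by how many of their gRNAs change in the same direction.

    grna_lfc: dict mapping gRNA id -> log2 fold change (test vs control).
    grna_to_gene: dict mapping gRNA id -> target gene name.
    """
    per_gene = defaultdict(list)
    for gid, value in grna_lfc.items():
        per_gene[grna_to_gene[gid]].append(value)

    calls = {}
    for gene, values in per_gene.items():
        enriched = sum(1 for v in values if v >= threshold)
        depleted = sum(1 for v in values if v <= -threshold)
        concordant = max(enriched, depleted)
        if concordant >= min_concordant:
            calls[gene] = "supported"
        elif concordant == 1 and len(values) > 1:
            # Only one of several gRNAs changed: consistent with an off-target effect.
            calls[gene] = "possible off-target"
        elif concordant == 1:
            calls[gene] = "single gRNA only"
        else:
            calls[gene] = "no change"
    return calls
```

  • Genes labelled as supported under these assumptions would be natural candidates for the further genetic, biochemical or biological testing described below.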
  • the target gene may be further tested, for example by genetic, biochemical or biological analysis, to confirm its activity and/or function. For example, the effect of knocking out genes identified as target genes in a cell may be determined.
  • kits for use in a method of genomic screening as described above comprising;
  • each mutant mammalian cell in the library expressing an RNA guided endonuclease, such as Cas9, and a gRNA specific for a target gene, such that the target gene is inactivated in the cell,
  • each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in a mammalian cell,
  • gRNAs that is specific for a set of target genes within a mammalian cell, and optionally;
  • Suitable libraries, viral vectors and mammalian cells are described above.
  • the kit may include instructions for use in a method of genomic screening as described above.
  • a kit may include one or more other reagents required for the method, such as culture media, buffer solutions, amplification, sequencing and other reagents.
  • the kit may include one or more articles and/or reagents for
  • 1-EF1a-puro2ACas9 was constructed as follows. Firstly, we removed the NotI and the AscI sites from pPB-LR5 (ref. 19) by cloning the MluI-XbaI fragment containing the piggyBac transposon into the MluI-XbaI site of PCR-generated pBluescript, resulting in pPB-LR5.1.
  • CAG promoter: the NheI-ClaI fragment of pPB-CAG.EBNXN (ref. 20)
  • bpA: the PCR-generated ClaI-XhoI fragment
  • the PCR-generated MfeI-PacI fragment containing puro-T2A-GFP was then cloned into the EcoRI-PacI site of pPB-LR5.1-CAG, resulting in pPB-LR5.1-CAGpuro2AGFP.
  • the fragment containing human EF1a promoter was PCR-generated using a BAC clone, RP11-159L14, as a template and cloned into pPB vector together with the GFP fragment, resulting in pPB-EF1a-GFP.
  • the NheI-AscI fragment containing the hEF1a promoter was excised from pPB-EF1a-GFP and cloned into the NheI-AscI site of pPB-LR5.1-CAGpuro2AGFP, resulting in pPB-LR5.1-EF1a-puro2AGFP.
  • the gRNA cloning vector, pU6-gRNA(BbsI), was constructed by cloning a gBlock fragment (IDT) containing the human U6 promoter, the gRNA cloning site and the gRNA scaffold, into the XhoI-BamHI site of pBluescriptII.
  • the lentiviral gRNA expression vector, pKLV-U6gRNA(BbsI)-PGKpuro2ABFP, was constructed as follows. Firstly, a new lentiviral backbone vector, pKLV, was constructed. A vector containing the multicloning site, SpeI-ApaI-MluI-XhoI-AscI-BamHI-NotI-KpnI-EagI-PacI, was generated by PCR using pBluescript as a template, resulting in pBS-MCS-KLV.
  • the modified 3' LTR followed by bpA was synthesized (GeneArt) and cloned into the KpnI-PacI site of pBS-MCS-KLV, resulting in pBS-3LTRbpA.
  • the SpeI-ApaI fragment containing the CMV promoter, the 5' R/U5 region and the packaging signal sequence was excised from FUW-OSK - (Addgene, 20328) and cloned into pBS-3LTRbpA, resulting in pKLV.
  • the PGK-puro2ABFP cassette was constructed as follows. Fragments
  • a BED file containing the exonic coordinates of all protein coding genes on the mouse reference genome GRCm38 was obtained. Overlapping coordinates were merged using BEDtools. The sequences of each genomic interval in the BED file, with an additional 20 nucleotides on both sides of the intervals, were retrieved and used to identify all sequences comprising 5'-GN20GG-3'. To avoid off-target cleavages, only gRNAs that matched stringent conditions were chosen: from position 8, the 5'-N14GG-3' of each gRNA only had a single match to the mouse genome. A total of 325,638 sites were identified.
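  • The uniqueness condition described above can be illustrated with the Python sketch below, which finds 23-nt sites of the form G, 20 nucleotides, GG and keeps those whose last 16 nucleotides (position 8 onward, i.e. N14 followed by GG) occur exactly once in a reference sequence, counting both strands. The brute-force string search is an assumption chosen for readability and would be far too slow for a whole mouse genome, where an indexed search would be needed; all function names are hypothetical.

```python
import re

# Lookahead so that overlapping G-N20-GG sites are all found.
SITE_RE = re.compile(r"(?=(G[ACGT]{20}GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def count_both_strands(genome, kmer):
    """Count matches of kmer on the given strand and its reverse complement."""
    total = count_overlapping(genome, kmer)
    rc = reverse_complement(kmer)
    if rc != kmer:  # avoid double counting palindromic k-mers
        total += count_overlapping(genome, rc)
    return total


def unique_seed_sites(target, genome):
    """Return (start, site) for sites in target whose 16-nt 3' end is unique in genome."""
    target, genome = target.upper(), genome.upper()
    kept = []
    for match in SITE_RE.finditer(target):
        site = match.group(1)   # 23 nt: G + N20 + GG
        seed = site[7:]         # position 8 onward: N14 + GG (16 nt)
        if count_both_strands(genome, seed) == 1:
            kept.append((match.start(), site))
    return kept
```

  • Only the forward strand of the target sequence is scanned here; sites on the opposite strand could be found by passing the reverse complement of the target.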
  • gRNAs that are positioned at least 100 bp away from the translation initiation site and in the first half of coding sequences were collected. Finally, up to 5 gRNAs were chosen for each gene, prioritising gRNAs with fewer predicted off-target sites.
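  • The selection rules above (at least 100 bp from the translation initiation site, within the first half of the coding sequence, and up to 5 gRNAs per gene with fewer predicted off-target sites preferred) can be sketched in Python as follows. The `Candidate` record, its field names and the tie-break by position are assumptions introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    sequence: str
    cds_offset: int    # bp from the translation initiation site, along the CDS
    cds_length: int    # total coding sequence length in bp
    off_targets: int   # number of predicted off-target sites


def select_grnas(candidates, min_offset=100, max_per_gene=5):
    """Pick up to max_per_gene gRNAs per gene from the eligible candidates."""
    eligible = defaultdict(list)
    for cand in candidates:
        far_enough = cand.cds_offset >= min_offset
        in_first_half = cand.cds_offset < cand.cds_length / 2
        if far_enough and in_first_half:
            eligible[cand.gene].append(cand)

    selected = {}
    for gene, cands in eligible.items():
        # Fewer predicted off-targets first; earlier position breaks ties
        # (the tie-break is an assumption, not stated in the text).
        ranked = sorted(cands, key=lambda c: (c.off_targets, c.cds_offset))
        selected[gene] = ranked[:max_per_gene]
    return selected
```

  • Genes with fewer than five eligible sites simply receive fewer gRNAs, which is consistent with "up to 5 gRNAs" in the text.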
  • a 79-mer oligo pool was purchased from CustomArray Inc.
  • the oligo sequences are 5'-
  • N19 indicates each of the 87,897 gRNA sequences.
  • the single-stranded oligos were converted to double-stranded DNA by PCR using Q5 Hot Start High-Fidelity 2X Master Mix (NEB) with 32 fmol of the oligo as template and primers (79mer-U1 and -L1) using the following conditions: 98 °C for 10 sec, 10 cycles of 98 °C for 10 sec, 64 °C for 15 sec and 72 °C for 15 sec, and the final extension, 72 °C for 2 min.
  • Mouse ESCs (JM8) were cultured on mitomycin C-treated MEFs in Knockout DMEM (Invitrogen) supplemented with 15% FBS (PAA), 1% GlutaMax
  • ESCs suspended in 80 µl OPTI-MEM were mixed with 20 µl of the DNA:PLUS:LTX mixture and plated onto a well of a 96-well plate containing feeder cells. These cells were incubated for 1 hour at 37°C. The transfection mixture was then removed and 150 µl of ESC medium were added. The transfected cells were cultured for 6-7 days before relevant functional analysis.
  • transposase expression vector pCMV-mPBase (5 µg) and a transposon vector (100 ng) were electroporated into 1 × 10⁶ ESCs at 230 V and 500 µF using Gene Pulser II (BioRad) and plated onto a 10-cm dish. Two days later, drug selection was initiated. The resulting colonies were picked and further expanded.
  • ESCs and diluted virus were mixed in 100 µl of the ESC medium containing 8 µg ml⁻¹ polybrene (Millipore), incubated for 30 min at 37°C in a well of a round-bottomed 96-well plate, plated onto a well of a feeder-containing 96-well plate and cultured until functional analyses. Transduction volumes were scaled up according to the areas of the culture plates if necessary.
  • transgenic ESC lines with Phusion High-Fidelity polymerase in GC buffer Thermo Scientific
  • PCR products were pooled, purified using QIAquick PCR Purification Kit (Qiagen) and used for Illumina library generation.
  • ESCs were dissociated into single-cell suspension and plated onto gelatin-coated plates at a density of 9 x 10^4 cells cm^-2 in a volume of 220 µl cm^-2 with the indicated concentrations of alpha-toxin.
  • the cells were cultured for 48 h and then the medium was replaced with fresh M15L medium daily until staining with methylene blue or harvesting for downstream analysis.
  • ESCs were dissociated into single-cell suspension and plated onto pSNL feeder plates at a density of 5 x 10^6 cells per 10-cm dish for the MMR screening or 2.5 x 10^4 cells per well of a 12-well plate for comparison of gene inactivation efficiencies between gRNA and shRNA.
  • the medium was replaced with a selective medium
  • Mutant cells (1 x 10^6 cells) were transfected with a mixture of cDNA expression vector (2.25 µg) and pPB-EF1a-GFP (0.25 µg) using Lipofectamine LTX.
  • pBluescript II was used as a negative control.
  • GFP-positive cells were sorted using
  • 1.0 x 10^7 ESCs (JM8-Cas9#5) were infected with the genome-wide gRNA lentiviral library at an MOI of 0.3. Two independent infections were conducted, thus producing two independent ESC libraries. Three days post infection, 2.0 x 10^6 BFP-positive cells were sorted for each of the libraries and cultured for an additional 4 days. For each of the 2 ESC libraries, 6 x 10^6 or 10 x 10^6 mutant ESCs were treated with alpha-toxin (1.0 nM) for 48 h or 6-TG (2 µM) for 5 days, respectively, and further cultured for an additional 5 days. Surviving cells were pooled per library and genomic DNA was extracted and used for PCR templates.
  • PCR products were pooled and purified using QIAquick PCR Purification Kit (Qiagen). Five hundred nanograms of the purified PCR products were ligated with Illumina adaptors 53 using NEBNext DNA Library Prep Master Mix (NEB) according to the manufacturer's protocols.
  • NEB NEBNext DNA Library Prep Master Mix
  • the adaptor-ligated products (1 15-1 of the input material) were used for PCR enrichment 53 with KAPA HiFi HotStart ReadyMix with the following PCR conditions: 98 °C for 30 sec, 7 cycles of 98 °C for 10 sec, 66°C for 15 sec and 72 °C for 20 sec, and the final extension, 72 °C for 5 min.
  • the PCR products were purified with Agencourt AMPure XP beads (Beckman) in a PCR product-to-bead ratio of 1:0.7.
  • the purified libraries were quantified and sequenced on Illumina MiSeq by 250-bp paired-end sequencing. Each read was mapped to a custom reference sequence using BWA-SW 52. Reads containing indels overlapping the ±20-bp region of the predicted cut sites were considered to be the outcome of NHEJ.
  • the cut frequency was calculated by dividing the number of reads with indels by the total number of reads mapped.
  • the region containing the gRNA was amplified using primers (gLibrary-HiSeq_50bp-SE-U1 and -L1) with Q5 Hot Start High-Fidelity 2X Master Mix.
  • primers gLibrary-HiSeq_50bp-SE-U1 and -L1
  • we conducted 10 independent PCR reactions using 15 ng of the whole genome lentiviral plasmid library per reaction and 72 independent PCR reactions using 1 µg of the mouse ESC library per reaction for each of the two ESC libraries. These correspond to 1.7 x 10^10 molecules of the plasmid DNA and 1.1 x 10^7 ESCs in total, respectively.
  • the region containing the gRNA was amplified using 1 µg of genomic DNA (1.5 x 10^5 cells) and primers (gLibrary-MiSeq_150bp-PE-U1 and -L1) with Q5 Hot Start High-Fidelity 2X Master Mix.
  • the PCR products were pooled in each group and purified using QIAquick PCR Purification Kit.
  • Two hundred picograms of the purified PCR products were used for PCR enrichment 53 with KAPA HiFi HotStart ReadyMix with the following conditions: 98 °C for 30 sec, 12 cycles of 98 °C for 10 sec, 66 °C for 15 sec and 72 °C for 20 sec, and the final extension, 72 °C for 5 min.
  • the PCR products were purified with Agencourt AMPure XP beads in a PCR-product-to-bead ratio of 1:0.7.
  • the purified libraries were quantified and sequenced on
  • gRNA sequences were extracted by removing constant regions from each read and these were used to count the number of reads of each gRNA in the library.
  • site 120_5tm-2 represents a weak off-target site in comparison to a strong off-target site like site 120_5tm-21.
  • CCDS protein- coding regions
  • Lentiviral vectors have been successfully utilised in various gene delivery applications including the delivery of small hairpin RNA (shRNA) for RNA interference (RNAi) (Moffat, J. et al. Cell 124, 1283-1298 (2006); Silva, J.M. et al. Nature Genetics 37, 1281-1288 (2005)). We first generated a lentiviral vector carrying the U6-promoter-driven gRNA expression cassette (Fig. 12). In order to directly clone duplex oligonucleotides into the vector, we mutated the existing BbsI sites in the vector backbone.
  • shRNA small hairpin RNA
  • RNAi RNA interference
  • the Cas9-expressing ESCs (clone JM8-Cas9#5) were individually transduced with a virus expressing the Piga site3 gRNA and analysed for GPI-anchored protein expression 6 days post
  • Lentiviral vectors are known to eventually succumb to proviral silencing in ESCs over time. Since these cells are FLAER-negative, the gRNA must have been expressed and Piga must have been inactivated. Because of proviral silencing, however, these cells have slowly become BFP-negative over time. This would explain the presence of the double-negative population.
  • tumour cells were first transduced with a lentivirus carrying hCas9 and then further transduced with a lentivirus carrying the gRNA cassette targeting the Piga gene. It is evident from Figure 13 (lower) that the lentivirally delivered CRISPR/Cas9 system is able to introduce site-specific DSBs in these tumour cells, albeit at a slightly lower knockout frequency than in mouse ESCs.
  • Pigh Site 1 and 2 showed a marked difference.
  • DSBs at the Pigh Site 2 yielded a highly frequent in-frame deletion (12 bp), whereas Site 1 was repaired with a 2-bp deletion (Figs. 18, 19).
  • a gene product from the PighΔ12 allele is functional because the PighΔ12 mutant protein is able to complement the Pigh mutant phenotype.
  • gRNA sequences must not contain the BbsI site. With these criteria, we were able to identify 87,897 gRNAs covering 94.3% of genes with at least 2 gRNAs per gene (Fig. 21 left panel).
  • gRNAs were cloned into the lentiviral vector shown in Figure 12, producing a first-generation mouse genome-wide lentiviral gRNA
  • gRNAs that were present in the lentiviral plasmid library were not as frequently represented in the ESC libraries (Fig. 22).
  • the genes include pluripotency genes, Nanog and Pou5f1, whose
  • Alpha-toxin: Out of the 26 known genes involved in the GPI-anchor biosynthesis pathway, 14 genes that were confirmed to generate a knockout phenotype by the CRISPR-Cas system (Figs. 16-19) had more than one gRNA hit (Figs. 24 and 26). No gRNA was designed for the Pigv gene due to the splice variants that have been predicted for this gene. There are 7 genes for which two independent gRNAs were
  • hCas9-expressing ESCs (clone JM8-Cas9#5) were transduced at a multiplicity of infection of 0.1 with 3 gRNAs for each of the 4 major mismatch repair genes and an empty vector.
  • the transduced ESCs were then treated with 6-thioguanine (6TG) to enrich for mismatch repair-defective mutants.
  • 6TG 6-thioguanine
  • Our data showed clear enrichment of gRNA sequences relevant to the phenotype being screened and depletion of irrelevant gRNA sequences (Figs 25 and 27).
  • For the alpha-toxin screen, genes with at least 2 independent gRNA hits, except Olfr1206, were chosen and 4-5 independent gRNA expression vectors were constructed for each candidate gene.
  • For the MMR screen, 3 genes with at least 2 independent gRNAs were chosen and 4-5 gRNA expression vectors were constructed for each gene.
  • hCas9-expressing ESCs were independently transfected with each of these gRNA expression vectors. Six days post transfection, the cells were treated with the relevant agents and their resistance was analysed.
  • Alpha-toxin: None of the gRNAs could give rise to resistant cells at 1.0 nM alpha-toxin at a level similar to the gRNA targeting Piga;
  • siRNA interference RNA
  • shRNA short hairpin RNA
  • the shRNA can be expressed from PolIII promoters such as human U6 and H1 promoters and this expression cassette can be
  • 91,842 CRISPR guide RNAs targeting 18,071 human protein-coding genes were designed as described above and in Koike-Yusa et al. Nature Biotechnology (2014) 32 267-273. The guide RNAs were then cloned into the lentiviral vector as described above, resulting in the human CRISPR guide RNA library.
  • Validation of the human library was carried out by screening mutant cell libraries for alpha-toxin resistant mutants. We first introduced a Cas9 expression cassette into HT29 human colorectal cancer cell line by lentiviral transduction and established a stable cell line.
  • the Cas9-expressing HT29 was mutagenized by transducing with the lentiviral library in 4 replicates.
  • the transduced cells were cultured for 2 weeks to completely deplete remaining mRNA and proteins of mutated genes and then treated with alpha-toxin.
  • Five days after treatment, surviving cells were harvested, lysed and used for PCR amplification of the region containing guide sequences.
  • the PCR products were then sequenced on the Illumina MiSeq platform and resulting data were analysed.
  • lentiviral vectors can be used to deliver gRNA expression cassettes into mammalian cells. Since gRNA-mediated DSBs can introduce null mutations to target genes, gRNA-based screens are able to overcome one of the major problems of RNAi screens, namely incomplete suppression of gene expression. These led us to generate a genome-wide lentiviral gRNA library, which we used to successfully conduct genetic screens.
  • a key to the success of genome-wide gRNA-based genetic screening is the performance of each gRNA, i.e. cutting efficiency.
  • deletion patterns may be
  • Off-target cleavages by the CRISPR-Cas9 system are expected to be more frequent than those observed with ZFNs and TALENs.
  • the low specificity at the 5' end of the gRNAs may be useful.
  • For efficient transcription from the U6 promoter, the
  • nucleotide at this position needs to be guanine.
  • target sites with GN19NGG have been most commonly used. This, however, limits the number of target site candidates in a given genome.
  • the design of gRNAs therefore need not be restricted to sites with GN19NGG and sites with N19NGG can be used as CRISPR target sites. This new design significantly increases the repertoire of gRNAs available for use.
  • Genome-wide lentiviral gRNA libraries, as tools of genome-wide mutagenesis, hold several advantages over existing mutagenesis methods.
  • creating null mutations by the CRISPR/Cas system could overcome one of the major problems associated with RNAi, namely incomplete suppression of gene expression.
  • various cell types including cancer cells are amenable to gRNA-based genome engineering.
  • a lentiviral genome-wide gRNA library will have wide applicability and represents a promising platform for functional genomics.
  • TCCATTCCCA AGTTCTTTCTCTGCCATGG TGATGCTCTCTTCCACGCCAAG
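The target-site design noted in the excerpts above (accepting N19NGG sites rather than only GN19NGG sites, and rejecting guides that contain a BbsI site) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the inventors' design pipeline: the function names and the toy sequence are hypothetical, only the forward strand is scanned, junctions with vector sequence are not checked, and the positional and off-target filters are omitted.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
N19NGG = re.compile(r"(?=([ACGT]{19})[ACGT]GG)")    # relaxed design
GN19NGG = re.compile(r"(?=G([ACGT]{19})[ACGT]GG)")  # requires a 5' G

BBSI_SITES = ("GAAGAC", "GTCTTC")  # BbsI recognition site, both orientations


def has_bbsi(guide):
    """True if the guide contains a BbsI site and should be excluded."""
    return any(site in guide for site in BBSI_SITES)


def find_guides(seq, pattern):
    """Return (match_start, 19-nt variable region) pairs, forward strand only."""
    seq = seq.upper()
    return [
        (m.start(), m.group(1))
        for m in pattern.finditer(seq)
        if not has_bbsi(m.group(1))
    ]


# Hypothetical toy sequence: a single NGG site that is not preceded by a G.
toy = "C" + "A" * 19 + "TGG"
relaxed = find_guides(toy, N19NGG)   # the site is found
strict = find_guides(toy, GN19NGG)   # the site is missed
```

In this toy case the relaxed pattern recovers a site that the strict pattern rejects, which is the sense in which dropping the 5' G requirement enlarges the pool of candidate gRNAs.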

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

This invention relates to the genomic screening of libraries of mutant mammalian cells in which each cell has a target gene inactivated by expression of an RNA-guided endonuclease and a guide RNA molecule (gRNA) specific for the target gene. The library of mutant cells expresses gRNA molecules specific for a set of target genes and a target gene from the set of target genes is inactivated in each cell in the library. Mutant mammalian cells that display a test phenotype from said library are selected and one or more nucleic acid sequences that encode gRNA molecules identified in the selected cell population. From these nucleic acid sequences, the target genes that mediate the test phenotype may be identified. Screening methods and libraries and vector populations for use in screening methods are provided.

Description

Genomic Screening Methods using RNA-Guided Endonucleases
Field
This invention relates to functional genomic screening and, in particular, methods for genome-wide loss-of-function screening in mammalian cells.
Background
Genomic screens have been useful in the elucidation of gene function and the identification of new targets for drug discovery.
The diploid nature of the mammalian genome has hampered rapid and efficient production of mutant cells and organisms. In mouse embryonic stem cells (ESCs), homozygous mutants can be obtained by knocking out both alleles of the relevant gene with two rounds of gene targeting. However, this process takes at least 2 months and genome-wide mutant libraries cannot be easily generated by this method.
RNAi libraries have been used to systematically target large sets of genes in the genome, allowing many genes to be interrogated simultaneously using high-throughput formats (Sims et al. Genome Biology 2011, 12:R104; Iorns et al. Nature Rev Drug Discov (2007) 6 556-568; Campeau et al. Briefings in Functional Genomics 10 4 215-226).
However, RNAi libraries are limited by the efficacy of gene silencing. Since RNAi-mediated suppression is rarely 100% efficient, not all genes in an RNAi library are knocked down with sufficient efficiency to generate a detectable phenotype. Furthermore, the amount of suppression of different genes in RNAi libraries is not uniform, further limiting the effectiveness of RNAi libraries for functional screening. In addition, cells targeted by RNAi often display off-target effects.
Summary
The present inventors have developed a knock-out approach using RNA-guided endonuclease technology that allows high-throughput genome-wide loss-of-function screening of both dominant and recessive genes in mammalian cells.
An aspect of the invention provides a method of genomic screening comprising;
providing a library of mutant mammalian cells, each cell in the library expressing an RNA-guided endonuclease and a gRNA specific for a target gene, such that the target gene is inactivated in the cell, wherein the library expresses gRNA molecules specific for a set of target genes, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library,
selecting a population of cells from the library that display a test phenotype,
identifying one or more nucleic acid sequences encoding gRNA molecules in the selected cell population, and;
identifying from the one or more identified nucleic acid sequences one or more target genes that are inactivated in the selected cell population.
The target genes that are identified are candidate modulators of the test phenotype. In some embodiments, the population of cells is selected using a selection that is lethal to cells which do not display the test phenotype. Cells which survive the selection therefore display the test phenotype and form the selected cell population. In other embodiments, cells displaying the test phenotype may be isolated and/or separated from other cells in the library to form the selected cell population.
Another aspect of the invention provides a method of genomic screening comprising;
providing a library of mutant mammalian cells, each cell in the library expressing an RNA-guided endonuclease and a gRNA specific for a target gene, such that the target gene is inactivated in the cell, wherein the library expresses gRNA molecules specific for a set of target genes, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library,
selecting a population of cells from the library that display a test phenotype,
determining the amounts of nucleic acid sequences that encode gRNA molecules in the selected cell population relative to a control sample of the library,
identifying one or more nucleic acid sequences encoding gRNA molecules that are amplified or depleted in the selected cell population relative to the control sample, and
identifying from the one or more identified nucleic acid sequences one or more target genes that are inactivated in the selected cell population.
Preferably, each cell in the library comprises a nucleic acid encoding an RNA-guided endonuclease and a nucleic acid encoding a gRNA specific for a target gene stably integrated into the genome of the cell. The nucleic acids expressing the RNA-guided endonuclease and gRNAs may be stably or conditionally expressed in the library of cells in order to inactivate the set of target genes in the library.
Methods of the invention may be useful in identifying genes that modulate or are functionally linked to the test phenotype in the mammalian cell.
A suitable library of mutant mammalian cells may be produced by a method comprising;
providing a population of mammalian cells that express an RNA-guided endonuclease,
providing a population of integrative vectors, each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in the mammalian cells, wherein said population comprises nucleic acid sequences encoding a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within the mammalian cell,
transfecting or infecting the mammalian cells with the population of integrative vectors such that one or more integrative vectors stably integrate into the genome of each of the cells,
thereby producing a library of mutant cells that express an RNA-guided endonuclease and a member of the diverse population of gRNAs, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library. The mammalian cells may comprise a nucleic acid encoding an RNA-guided endonuclease integrated into the genome of the cells.
Preferably, the RNA-guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease.
Another aspect of the invention provides a diverse population of integrative vectors,
each integrative vector being capable of integration into the genome of a mammalian cell and comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in the mammalian cell,
wherein said population of vectors comprises nucleic acid sequences that encode a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within the mammalian cell.
Preferably, the integrative vectors are viral vectors, most preferably lentiviral vectors.
Diverse populations of integrative vectors may be useful in the methods described above.
Another aspect of the invention provides a library of mutant mammalian cells,
each mutant mammalian cell in the library expressing an RNA-guided endonuclease and a guide RNA molecule specific for a target gene, such that the target gene is inactivated in the cell,
wherein the library expresses a diverse population of gRNA molecules that is specific for a set of target genes, such that a gene from the set of target genes is inactivated in each mutant mammalian cell in the library. Preferably, each mammalian cell in the library comprises a nucleic acid encoding an RNA-guided endonuclease and a nucleic acid encoding a gRNA specific for a target gene integrated into the genome thereof. The set of target genes may comprise all of the protein coding genes in the genome of the mammalian cell (i.e. the library is a pan-genomic library) or a subset of the genes in the genome of the mammalian cell. Populations of mammalian cells may be useful in the methods described above.
Brief Description of Figures
Figure 1 is a schematic of the gRNA cloning vector. BbsI digestion removes the spacer and produces cohesive ends, into which duplex oligos with compatible overhangs can be ligated. Underlined, BbsI sites. For the duplex oligos, G at the +1 position is highlighted in red. gRNA sequences are shown as N (top strand) and n (bottom strand).
Figure 2 shows the sequences and the genomic positions of gRNAs targeting the Piga gene. The genomic PAM sequences of each gRNA are shown in red.
Figure 3 shows a summary of flow cytometry analysis of GPI-anchored protein expression performed 6 days post transfection. Transfection was performed in triplicate. Data are shown as mean ± s.d.
Figure 4 is a schematic of the piggyBac vector carrying the human EF1a promoter-driven puromycin resistance gene (Puro) and humanized Cas9 (hCas9). The two coding sequences are fused with the T2A self-cleaving peptide. PB, piggyBac repeats; bpA, bovine polyadenylation signal sequence.
Figures 5 and 6 show flow cytometry analyses of GPI-anchored protein expression (Fig 5) and GFP (Fig 6) after transient transfection. The parental wild type ESCs (JM8) and the hCas9-expressing clones were transfected with the indicated plasmid DNAs. The expression of GPI-anchored proteins was analysed 6 days post transfection by FLAER staining (Fig 5). pBluescript was used as a carrier plasmid. To measure transfection efficiency, the cells were also transfected separately with a GFP plasmid and analysed 2 days post transfection (Fig 6).
Figure 7 is a schematic of the piggyBac vector carrying the gRNA expression cassette and a neomycin resistance gene cassette. U6, human U6 promoter; T7, U6 terminator; PGK, mouse Pgk1 promoter.
Figure 8 shows flow cytometry analysis of doubly transgenic mouse ESC lines. This analysis was performed 3 days after the colonies were picked.
Figure 9 shows histograms showing the distribution of indel sizes at the on-target locus. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
Figure 10 shows histograms of the distribution of indel sizes at the off-target locus 120-tm5-21. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
Figure 11 shows histograms of the distribution of indel sizes at the off-target locus 120-tm5-2. Data for parental ESCs (JM8) and two doubly transgenic lines (5-4 and 8-3) are shown.
Figure 12 shows a schematic of the self-inactivating lentiviral vector that expresses gRNA. The Pgk1 promoter-driven puro-2A-BFP cassette was inserted downstream of the gRNA expression cassette. The sequences shown are the gRNAs targeting Site 3 of the Piga gene with the endogenous sequences (top) or the modified sequences (bottom). Note that the +1 position of the U6 transcript has been changed to a G nucleotide in the modified sequence. CMV, CMV promoter; RU5, 5' long terminal repeat lacking the U3 region; U6, human U6 promoter; T7, U6 terminator; PGK, mouse Pgk1 promoter; BFP, blue fluorescent protein; 2A, self-cleavage peptide; puro, puromycin resistance gene; ΔU3RU5, enhancer-deleted 3' LTR.
Figure 13 shows the inactivation of the Piga gene by lentiviral delivery of the gRNA expression cassette in hCas9-expressing mouse ESCs (upper) or mouse pancreatic carcinoma cells (lower). Cells were infected with lentivirus expressing the gRNA targeting site 3 of the Piga gene. Transduced cells were analysed by flow cytometry 6 days post infection. While cells transduced with the empty lentivirus were all FLAER-positive, cells transduced with Piga gRNA-expressing lentivirus were generally FLAER-negative.
Figure 14 shows the inactivation of the Piga gene by lentiviral delivery of the gRNA expression cassette in hCas9-expressing mouse ESCs infected with lentivirus expressing the gRNA targeting the indicated sites of the Piga gene with the endogenous (top) or altered (bottom) sequences. Transduced cells were analysed by flow cytometry 6 days post infection.
Figure 15 shows a summary of the cytometry analysis shown in Figure 14. Data are shown as mean ± s.d. (n=2). Student's t-test was performed. *p<0.01.
Figures 16 to 19 show analyses of 52 gRNAs targeting the 26 genes involved in the GPI-anchor biosynthesis pathway.
Figure 16 shows the fractions of reads with indels analysed by deep sequencing. **, no indel was detected.
Figure 17 shows flow cytometry analysis of cells transfected with the indicated gRNA-expression vectors. pBluescript was used as a control vector. *, not significant when compared to the control by Student's t-test.
Figures 18 and 19 show the percentage of in-frame indels. The y axis shows the number of reads with in-frame indels at the indicated size divided by the total number of reads with all indels at Site 1 (Figure 18) and at Site 2 (Figure 19). **, no indel was detected. Data are shown as mean ± s.d. (n=2-4).
Figure 20 shows a schematic of genetic screens with genome-wide lentiviral gRNA libraries.
Figure 21 shows gRNA design statistics (far left) and deep sequencing analyses of the gRNAs in the lentiviral plasmid DNA library (centre left), ESC library 1 (centre right) and 2 (far right).
Figure 22 shows scatter plots comparing gRNA frequencies in the original lentiviral plasmid DNA and in the ESC library 1 (LH) and 2 (RH). gRNA counts in the ESC libraries have been normalised against the gRNA counts from the lentiviral library.
Figure 23 shows fold changes of read counts between the lentiviral plasmid DNA library and the ESC libraries. Pluripotency genes, Nanog and Pou5f1, and essential genes, Rad51 and Brca1, are significantly depleted in the ESC libraries, whereas lineage marker genes are not depleted. The same gRNAs in the two ESC libraries are linked with lines. Mann-Whitney U test was performed by comparing gRNAs of each gene from each library with all gRNAs in the corresponding ESC library.
Figures 24 and 25 show genetic screens using the genome-wide gRNA library and genetic validation assays of the novel candidate genes. Genes with multiple gRNA hits in cells resistant to alpha-toxin are shown in Fig 24 and to 6TG in Fig 25. All known genes involved in the GPI-anchor synthesis pathway are shown in Fig 24. Genes marked with an asterisk were chosen for further validation. MMR, mismatch repair.
Figures 26 and 27 show the enrichment of gRNA sequences after treatment with either alpha-toxin (Fig 26) or 6TG (Fig 27). Four independent mutant cell libraries were used in each treatment. *p<0.01, **p<0.001 by t-test.
Figure 28 shows a summary of the guide RNA hits in the 26 GPI-anchor pathway genes identified in Example 2.10.
Table 1 shows indel patterns on the on-target sites in the doubly transgenic colonies. Sequences in red and green represent the PAM and gRNA sequences, respectively. Mismatch bases are shown in blue. The sizes of the deletions are shown on the right. Note that colonies 8-3 and 8-9 carry multiple deletions.
Table 2 shows a summary of gRNA hits in genome-wide screens.
Detailed Description
The methods described herein relate to genomic screens to identify genes and other genomic sequences that modulate, support or are functionally linked to a phenotype of interest.
Screening methods described herein employ libraries, collections or pools of mutant mammalian cells in which different members of a set of target genes are selectively inactivated or "knocked out" in different cells in the library. The target gene is inactivated or knocked out in the cells of the library through the expression from integrated genomic nucleic acid sequences of i) an RNA-guided endonuclease, such as Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease, and ii) a gRNA molecule that directs the endonuclease to the specific target gene. The RNA-guided endonuclease and the gRNA molecule together cause double strand cleavage of the genomic DNA in both alleles of the target gene. Repair of these DNA double strand breaks (DSBs) by the mammalian cell leads to the introduction of insertion or deletion (indel) mutations that inactivate the target gene.
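Whether an indel is likely to inactivate a coding sequence depends in part on whether it shifts the reading frame: the excerpts above report that a 12-bp in-frame deletion at Pigh Site 2 left a functional product, whereas Site 1 was repaired with a 2-bp deletion. A minimal sketch of that classification follows. The function names are hypothetical and the rule is a simplification that looks only at the net length change, ignoring where in the protein the indel falls.

```python
def classify_indel(net_length_change):
    """Classify an indel by its net length change in bp (negative = deletion)."""
    if net_length_change == 0:
        raise ValueError("a net change of 0 bp is not an indel")
    # A change that is a multiple of 3 bp preserves the reading frame.
    return "in-frame" if net_length_change % 3 == 0 else "frameshift"


def in_frame_fraction(net_length_changes):
    """Fraction of indel-bearing reads whose indel preserves the reading frame."""
    if not net_length_changes:
        raise ValueError("no indel-bearing reads supplied")
    calls = [classify_indel(n) for n in net_length_changes]
    return calls.count("in-frame") / len(calls)


# Deletion sizes reported for the two Pigh sites: 12 bp and 2 bp.
site2_call = classify_indel(-12)
site1_call = classify_indel(-2)
```

A guide whose repair products are dominated by in-frame indels may therefore cut efficiently yet still fail to produce a loss-of-function phenotype.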
Targeted gene inactivation using gRNA and RNA-guided endonucleases that are stably integrated into a cell genome provides stable expression over time in the library of cells and has not been reported previously. Furthermore, the amount of gRNA and RNA-guided endonuclease expression from stably integrated coding sequences is shown herein to be sufficient to mediate targeted DNA double strand breakage and gene inactivation.
Different members of the library of mutant mammalian cells express different gRNA molecules specific for different target genes, such that different genes from the set of target genes are inactivated in different mutant mammalian cells in the library.
The library of mutant mammalian cells may be used inter alia for loss-of-function screening. The library may be subjected to positive or negative selection for a test phenotype, such as drug resistance, and nucleic acid sequences encoding gRNA molecules may be identified in the selected cell population that displays the phenotype. From these sequences, target genes that mediate the test phenotype may be identified. Optionally, the amounts or numbers of copies of one or more nucleic acid sequences that encode gRNA molecules may be determined in the selected cell population. The most abundant gRNA encoding nucleic acid sequences in the selected cell population, or the gRNA encoding nucleic acid sequences that are present above a threshold amount in the selected cell population may be identified. Target genes that mediate the phenotype may be identified from these identified gRNA encoding nucleic acids.
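The counting and thresholding step can be sketched as follows: the gRNA sequence is extracted from each read by trimming the constant flanking regions, reads are tallied per gRNA, and genes supported by several gRNAs above a read threshold are reported. This is a hedged illustration only. The function names, the flank arguments and the `guide_to_gene` lookup are assumptions made for the example, and exact string matching stands in for whatever alignment a real analysis would use.

```python
from collections import Counter, defaultdict


def count_guides(reads, left_flank, right_flank, guide_to_gene):
    """Count reads per library gRNA after trimming the constant flanks."""
    counts = Counter()
    for read in reads:
        start = read.find(left_flank)
        if start == -1:
            continue  # left constant region not found
        start += len(left_flank)
        end = read.find(right_flank, start)
        if end == -1:
            continue  # right constant region not found
        guide = read[start:end]
        if guide in guide_to_gene:  # ignore sequences that are not in the library
            counts[guide] += 1
    return counts


def genes_with_hits(counts, guide_to_gene, min_reads, min_guides=2):
    """Genes supported by at least `min_guides` gRNAs at or above `min_reads`."""
    per_gene = defaultdict(list)
    for guide, n in counts.items():
        if n >= min_reads:
            per_gene[guide_to_gene[guide]].append(guide)
    return {
        gene: sorted(guides)
        for gene, guides in per_gene.items()
        if len(guides) >= min_guides
    }
```

The default of two independent gRNAs per gene mirrors the criterion used in the excerpts above for choosing candidate genes; the read threshold is left to the caller because the source does not fix one.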
In some embodiments, the library may be subjected to selection for cells having a specific phenotype, such as drug resistance or a cellular response to a stimulus or chemical compound, and the amounts of nucleic acids encoding different gRNA molecules in the population of selected cells (i.e. in cells which display the test phenotype) may be determined relative to a control sample of the library (e.g. a library sample that has not been subjected to selection). gRNA encoding nucleic acid sequences that are amplified or depleted in the population of selected cells relative to the control sample may be identified and used to identify target genes that mediate the phenotype.
Genes which are targeted by the gRNAs encoded by nucleic acids that are enriched or depleted may be identified from the recognition sequences of the enriched or depleted gRNA-encoding nucleic acids.
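One common way to express "amplified or depleted relative to the control sample" is a log2 ratio of each gRNA's share of reads in the selected population to its share in the control. The sketch below assumes that approach; it is not stated in the source. The function names, the pseudocount and the cutoff of one log2 unit are illustrative choices, and no statistical test is applied.

```python
import math


def log2_fold_changes(selected, control, pseudocount=1.0):
    """log2 ratio of each gRNA's read fraction in selected vs control cells."""
    guides = set(selected) | set(control)
    # The pseudocount keeps gRNAs with zero reads in one sample finite.
    sel_total = sum(selected.values()) + pseudocount * len(guides)
    ctl_total = sum(control.values()) + pseudocount * len(guides)
    result = {}
    for guide in guides:
        sel_frac = (selected.get(guide, 0) + pseudocount) / sel_total
        ctl_frac = (control.get(guide, 0) + pseudocount) / ctl_total
        result[guide] = math.log2(sel_frac / ctl_frac)
    return result


def split_by_change(fold_changes, cutoff=1.0):
    """Return (enriched, depleted) gRNAs at or beyond +/- cutoff log2 units."""
    enriched = sorted(g for g, fc in fold_changes.items() if fc >= cutoff)
    depleted = sorted(g for g, fc in fold_changes.items() if fc <= -cutoff)
    return enriched, depleted
```

The enriched or depleted gRNA sequences can then be mapped back to their target genes through the library's design table.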
The target genes identified by the methods described herein are associated with or involved in the phenotype that is tested. For example, the identified genes may modulate, mediate or be functionally linked to the selected phenotype in the mammalian cells or may be activators or repressors of a phenotypic cellular response.
Because the gRNA-encoding nucleic acid is integrated into the genome of the mammalian cell, each cell in the library carries retrievable information which identifies the gene that is inactivated in it. This internal tag allows the screening methods described herein to be performed on a library or pool of mammalian cells in a single step. This avoids the need for large-scale parallel testing of individual clonal cell populations with known deletions (i.e. an array format).
Because both alleles of the target genes in the library of mutant cells are inactivated by the RNA guided endonuclease, both recessive and dominant genes may be identified by the methods described herein.
Methods of the invention are performed in vitro.
The mammalian cells are preferably isolated cells and may be from any human or non-human mammalian species, preferably human or mouse cells.
Suitable mammalian cells for use in the methods described herein include somatic cells, pluripotent cells, somatic stem cells and cancer cells.
Mammalian pluripotent cells may include embryonic stem (ES) cells, for example murine ES cells, and non-embryonic stem cells, including foetal and adult somatic stem cells and stem cells derived from non- pluripotent cells, such as induced pluripotent (iPS) cells.
The mammalian cells may be from an established cell line, for example, a cancer cell line such as human lung carcinoma cell line A549, or may be obtained from a cell sample from an individual, for example a human or non-human mammal.
Preferably, the mammalian cells are dividing cells. Suitable mammalian cells are well-known in the art.
The cells in the mutant mammalian cell library express an RNA guided endonuclease and a gRNA. Preferably, the expression is stable in the cells. The RNA guided endonuclease forms a complex with the gRNA and cleaves both strands of the target gene at a DNA sequence (termed a recognition region or protospacer) within the target gene that is complementary to the target sequence (or crRNA region) of the gRNA.
The nucleic acid encoding the RNA guided endonuclease may for example be stably integrated into the genome of the mammalian cells at a neutral and active site in the genome, such as the mouse Rosa26 locus or its human homologue (Irion et al Nature Biotech 25 (12) 1477-1482), the AAVS1 site (Sadelain, M. et al. 2011 Nat Rev Cancer. (2011) 1; 12(1):51-8) or the mouse Col1a1 locus (Beard, C. et al. Genesis 44: 23-28) or its human homologue.
In preferred embodiments, the RNA guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease.
The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR associated protein (Cas) system is an adaptive immune mechanism found in bacterial and archaeal species that allows the host to combat pathogens, such as bacteriophages (Barrangou, R. et al. Science 315, 1709-1712 (2007); Marraffini, L.A. & Sontheimer, E.J. Science 322, 1843-1845 (2008); Bhaya, D., Davison, M. & Barrangou, R. Annual review of genetics 45, 273-297 (2011); Garneau, J.E. et al. Nature 468, 67-71 (2010)). Bacteriophage-derived 30-bp DNA fragments are inserted into the CRISPR locus of the host cell and transcribed as CRISPR RNAs (crRNAs). These form a complex with trans-encoded RNA (tracrRNA) and CRISPR-associated (Cas) proteins, and the complex introduces site-specific cleavage at DNA sites that match the sequence of the crRNAs. The use of the CRISPR/Cas system to introduce site-specific DNA DSBs into cultured mammalian cells has been reported in the art (Jinek, M. et al. Science 337, 816-821 (2012); Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Proc. Natl. Acad. Sci. USA 109, E2579-2586 (2012); Cong, L. et al. Science 339, 819-823 (2013); Mali, P. et al. Science 339, 823-826 (2013); Cho, S.W., Kim, S., Kim, J.M. & Kim, J.S. Nature Biotechnology 31, 230-232 (2013); Wang, H. et al. Cell 153, 910-918 (2013)).
Preferably, the cells in the mutant mammalian cell library express a Cas9 nuclease and a gRNA molecule. The Cas9 nuclease forms a complex with the gRNA molecule and introduces a site-specific DNA double strand break into DNA sequences (termed recognition regions or protospacers) that are complementary to the target sequence (or crRNA region) of the gRNA molecule.
The amino acid sequences and encoding nucleic acid sequences of suitable Cas9 nucleases for use as described herein are well-known in the art. Suitable Cas9 nuclease sequences include SEQ ID NO: 1.
Suitable Cas9 nucleases for use as described herein may be derived from Streptococcus pyogenes SF370, Streptococcus thermophilus LMD-9 (Cong, L. et al. Science 339, 819-823 (2013)) or other bacterial or archaeal species.
The nucleic acid sequence encoding the Cas9 nuclease or other RNA guided endonuclease may be humanized, i.e. codon-optimised for expression in human cells.
Nucleic acid encoding Cas9 nucleases and other RNA guided endonucleases may be produced by conventional synthetic means or obtained from commercial suppliers (System Biosciences, USA; BioCat GmbH, DE) or non-profit repositories (e.g. Addgene, MA USA).
In some embodiments, the amino acid sequences of suitable RNA guided endonucleases may be fused to nuclear localisation signals (NLSs) and/or tags. Suitable NLSs and/or tags and encoding nucleic acids are well-known in the art. For example, a Cas9 nuclease may be fused at one end or both ends to a nuclear localisation signal and/or a sequence tag, such as a FLAG, HA, Myc or V5 tag.
The nucleic acid sequence encoding the Cas9 nuclease or other RNA guided endonuclease may be operably linked to a suitable regulatory element. Suitable regulatory elements are active after stable integration into the mammalian genome and include constitutive promoters, such as human elongation factor 1α (EF1α) promoter, CAG promoter, human ubiquitin C promoter, human/mouse PGK promoter, and human/mouse PolII promoter; and conditional promoters, such as the tetracycline response element (TRE) promoter.
The nucleic acid sequence encoding the Cas9 nuclease (i.e. Cas9 nucleic acid) or other RNA guided endonuclease may be contained in an expression vector. Expression vectors suitable for stable integration into the mammalian cell genome and the expression of recombinant proteins are well known in the art.
Suitable vectors can be chosen or constructed, containing appropriate regulatory sequences, including promoter sequences, terminator fragments, polyadenylation sequences, enhancer sequences, marker genes and other sequences as appropriate. Preferably, the vector contains appropriate regulatory sequences to drive the expression of the Cas9 or other RNA guided endonuclease in the mammalian cells.
For ease of manipulation, a vector may also comprise sequences, such as origins of replication and selectable markers, which allow for its selection and replication and expression in bacterial hosts such as E. coli as well as in mammalian cells.
Vectors suitable for use in expressing RNA guided endonuclease encoding nucleic acids include plasmids and viral vectors e.g. phage or phagemid, and the precise choice of vector will depend on the particular expression system which is employed. Preferably, the vector is an integrative vector. For further details see, for example, Molecular Cloning: a Laboratory Manual: 4th edition, Green et al., 2012, Cold Spring Harbor Laboratory Press.
In preferred embodiments, the expression vector comprising the nucleic acid encoding the Cas9 nuclease or other RNA guided endonuclease stably integrates into the genome of the mammalian cells following transfection. Each mammalian cell may contain one copy or multiple copies of the nucleic acid sequence encoding the RNA guided endonuclease.
In some embodiments, the nucleic acid encoding the RNA guided endonuclease may be stably integrated into the genome of the mammalian cells at a neutral and active site in the genome, such as the mouse Rosa26 locus or its human homologue (Irion et al Nature Biotech 25 (12) 1477-1482) or the mouse Col1a1 locus or its human homologue.
Mammalian cells may be transfected or infected with suitable expression vectors for stable integration of the nucleic acid encoding the RNA guided endonuclease using standard recombinant techniques. Following transfection, the RNA guided endonuclease may be constitutively or conditionally expressed in the mammalian cells.
Techniques and protocols for transformation, transfection and gene expression in cell culture are well known in the art (see for example Protocols in Molecular Biology, Second Edition, Ausubel et al. eds. John Wiley & Sons, 1992; Recombinant Gene Expression Protocols Ed RS Tuan (Mar 1997) Humana Press Inc).
In some embodiments, transfected cells that express the RNA guided endonuclease may be selected following transfection, for example using a selectable marker such as antibiotic resistance or fluorescence, such that the transfected cells are isolated from cells that do not express the RNA guided endonuclease.
In other embodiments, separation of transfected cells that express the RNA guided endonuclease from cells that do not express the RNA guided endonuclease may not be necessary following transfection.
The mammalian cells may be transfected with the nucleic acid encoding the RNA guided endonuclease before, at the same time as, or after transfection with the nucleic acid encoding the diverse population of guide RNA (gRNA) molecules.
A guide RNA (gRNA) molecule forms a complex with the RNA guided endonuclease that introduces a site-specific DNA double strand break into a DNA sequence (termed a target region or protospacer) within the target gene that is complementary to the recognition sequence (or crRNA region) of the gRNA molecule. A gRNA molecule that directs an RNA guided endonuclease to cleave DNA strands within a target gene may be termed "specific" for the target gene. Expression of the gRNA molecule specific for a target gene in a cell in combination with an RNA guided endonuclease, such as Cas9 nuclease, leads to the selective inactivation of the target gene in the cell, whilst other genes in the cell are unaffected.
A gRNA molecule comprises a recognition sequence (crRNA) and a scaffold sequence (tracrRNA).
The recognition sequence of the gRNA (crRNA) is complementary to the sequence of a target region of genomic DNA (also called a protospacer) within a target gene. Suitable target regions may be 15 bp to 25 bp in length, preferably 18 bp to 20 bp, and may be followed by a protospacer-adjacent motif (PAM). Suitable PAMs include NGG and NAG, wherein N is any nucleotide. For example, a suitable target region in a target gene may consist of the sequence 5'-N19NGG-3', 5'-N20NGG-3', 5'-N19NAG-3' or 5'-N20NAG-3', where N is any nucleotide. Examples of PAM sequences are shown in Figure 28.
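As a minimal sketch, candidate target regions of the form N20NGG or N20NAG might be located on both strands of a DNA sequence as follows. The function names and the example sequence are hypothetical, and the sketch assumes an unambiguous A/C/G/T input.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_target_regions(seq, spacer_len=20):
    """List (strand, start, protospacer, PAM) for every N{spacer_len} followed by NGG or NAG.

    A zero-width lookahead is used so that overlapping sites are all reported.
    Start positions on the "-" strand are offsets within the reverse complement.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT][AG]G))" % spacer_len)
    hits = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for match in pattern.finditer(s):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits

# Hypothetical 23-nt sequence: a 20-nt protospacer followed by a TGG PAM.
example_hits = find_target_regions("ACGTACGTACGTACGTACGT" + "TGG")
```

Setting `spacer_len=19` in this sketch gives the N19NGG and N19NAG forms of the target region.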
Preferably, the target region is located wholly or partially within a coding sequence of the target gene (i.e. an exonic sequence). Preferred target regions within a target gene are located 100 bp or more downstream from the ATG initiation codon of the target gene but within the first 50% of the exonic sequence of the target gene (i.e. the 50% of the exonic sequence that is adjacent the initiation codon). A suitable target region may be present in all transcripts of a target gene and only present in a single exon within the mammalian cell genome.
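The positional preference may be expressed as a simple filter, sketched below. The coordinate convention (0-based offsets measured from the A of the ATG initiation codon along the exonic sequence) and the requirement that the whole target region lies within the first half are assumptions made for the example.

```python
def in_preferred_window(target_start, target_len, exonic_len,
                        min_offset=100, max_fraction=0.5):
    """Return True if a target region lies in the preferred part of the exonic sequence.

    target_start: 0-based offset of the target region from the initiation codon.
    target_len:   length of the target region in bp.
    exonic_len:   total length of the exonic sequence in bp.
    """
    target_end = target_start + target_len
    return target_start >= min_offset and target_end <= exonic_len * max_fraction

# Hypothetical 1000-bp exonic sequence with 20-bp target regions.
accepted = in_preferred_window(100, 20, 1000)
too_close_to_atg = in_preferred_window(99, 20, 1000)
past_midpoint = in_preferred_window(490, 20, 1000)
```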
In some preferred embodiments, for example when the gRNA is expressed using a human U6 promoter, the first nucleotide of the recognition sequence of the gRNA may be G regardless of the corresponding residue in the protospacer sequence (i.e. the G may be a mismatch with the corresponding residue in the protospacer sequence). The remainder of the recognition sequence is complementary to the sequence of the target region of genomic DNA. The initial G residue in the recognition sequence may correspond to a complementary C residue in the protospacer sequence or may be an additional residue that does not correspond to a complementary C residue in the protospacer sequence (i.e. a mismatch). For example, the recognition sequence of the gRNA may consist of the sequence GN19 and the protospacer of the target gene may have the sequence N20NGG, where N is any nucleotide. In some embodiments, the last five nucleotides of the recognition sequence of the gRNA may be devoid of the sequence TTT.
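A minimal sketch of the GN19 form of the recognition sequence, and of the check for TTT in the last five nucleotides, is given below. It assumes that sequences are written as the DNA that encodes the gRNA, in the same 5' to 3' orientation as the protospacer; the function names are hypothetical.

```python
def recognition_sequence(protospacer):
    """Return G followed by positions 2 to 20 of a 20-nt protospacer (the GN19 form)."""
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt protospacer")
    return "G" + protospacer[1:].upper()

def has_ttt_near_end(recognition, window=5):
    """Return True if TTT occurs within the last `window` nucleotides."""
    return "TTT" in recognition[-window:].upper()

# Hypothetical protospacer whose first residue (A) is replaced by a mismatched G.
guide = recognition_sequence("ACGTACGTACGTACGTACGT")
```

In this sketch a protospacer that already begins with G is returned unchanged, so the initial G is a mismatch only when the first protospacer residue is not G.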
In order to reduce off-target effects, the nucleotide sequence from position 14 onwards of a target region that is targeted by a gRNA molecule (e.g. 5'-N14-20NGG-3' or 5'-N14-20NAG-3') may be unique to the target gene and not found in other genes or exonic sequences within the genome of the mammalian cell. In some embodiments, the nucleotide sequence of nucleotides 1 to 13 of the target region targeted by a gRNA molecule (i.e. 5'-N1-13-3') may also be unique to the target gene or may be rare within the mammalian cell genome outside the target gene (for example, less than 100, less than 50, less than 10 or less than 5 repeats of the nucleotide sequence in genes or exonic sequences outside the target gene). In addition, nucleotide sequences that differ from the nucleotide sequence of positions 1 to 13 of the target region by one nucleotide may also be rare within the mammalian cell genome outside the target gene (for example less than 100, less than 50, less than 10 or less than 5 repeats of the nucleotide sequence in genes or exonic sequences outside the target gene).
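The uniqueness criterion for positions 14 to 20 might be checked along the following lines. This is a sketch only: the function names are hypothetical, the other exonic sequences are assumed to be supplied as plain A/C/G/T strings, and a site is counted when the PAM-proximal sequence is immediately followed by NGG or NAG on either strand.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_seed_sites(protospacer, other_sequences, seed_start=14):
    """Count sites in other sequences matching positions seed_start to 20 plus NGG or NAG."""
    seed = protospacer[seed_start - 1:].upper()  # positions are 1-based in the text
    pattern = re.compile("(?=%s[ACGT][AG]G)" % re.escape(seed))
    total = 0
    for seq in other_sequences:
        forward = seq.upper()
        for strand_seq in (forward, reverse_complement(forward)):
            total += sum(1 for _ in pattern.finditer(strand_seq))
    return total

# Hypothetical protospacer; positions 14 to 20 are CGTACGT.
protospacer = "ACGTACGTACGTACGTACGT"
off_target_hits = count_seed_sites(protospacer, ["TTTTCGTACGTAGGTTTT"])
no_hits = count_seed_sites(protospacer, ["A" * 20])
```

A candidate target region would be retained in this sketch only when the count over all genes or exonic sequences outside the target gene is zero.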
Suitable target regions within a target gene may be identified using standard genomic techniques as described herein and used to design the recognition sequences of gRNA molecules to target the gene.
When multiple consensus coding sequences (CCDS) are assigned to a target gene, guide RNAs may be designed for each CCDS of the target gene.
In addition to a recognition sequence, a gRNA may further comprise a scaffold (or tracrRNA) sequence.
The choice of scaffold sequence may depend on the RNA guided endonuclease being employed and suitable gRNA scaffold sequences for use with specific RNA guided endonucleases, such as Cas9, are well-known in the art. Typically, a gRNA scaffold sequence derived from the same species as the RNA guided endonuclease is employed. Suitable gRNA scaffold sequences are well-known in the art and include the sequence of SEQ ID NO: 2.
Nucleic acids encoding gRNA molecules as described herein may be readily prepared by the skilled person using publicly available genomic information, the information and references contained herein and techniques known in the art (for example, see Molecular Cloning: a Laboratory Manual: 4th edition, Green et al., 2012 Cold Spring Harbor Laboratory Press).
In some embodiments, a diverse population of gRNAs may be produced in which sequence overlap between the members of the population is minimised or avoided.
A library of mutant mammalian cells may be generated for use in the methods described herein using a pooled population or library of diverse gRNA molecules. The number of different gRNAs in the diverse population depends on the number of genes in the set of target genes to be inactivated and the number of gRNAs in the population that target each gene.
The diverse population may comprise at least 10, at least 100, at least 1000, at least 10000, at least 20000, at least 30000, at least 50000, at least 80000 or at least 100000 different gRNAs. For example, the diverse population of gRNAs may comprise gRNAs with at least 10, at least 100, at least 1000, at least 10000, at least 20000, at least 30000, at least 50000 or at least 100000 different recognition sequences.
The diverse population of gRNAs may be specific for a set of target genes in the mammalian cell that consists of at least 10, at least 100, at least 1000, at least 10000, at least 19000 or at least 20000 different genes. The diverse population of guide RNA molecules (gRNAs) is specific for a set of target genes in the mammalian cell that consists of at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or at least 98% of the protein coding genes in the mammalian cell.
The diverse population of gRNAs may target all of the genes in the genome of the mammalian cell or a subset of the genes in the genome.
Suitable subsets of genes that may be targeted include genes involved in specific biological pathways, such as signal transduction and epigenetic regulation, and genes that encode specific protein activities, such as kinases and phosphatases.
Suitable diverse populations of gRNAs may be designed and synthesised using standard techniques. For example, the exonic coordinates of all the protein coding genes in the mammalian cell genome are publicly available and may be obtained from genomic databases. Protospacer sequences of N20NGG or N20NAG may be extracted and oligonucleotides corresponding to some or all of the extracted sequences may be synthesised using standard techniques. For example, large populations of oligonucleotides may be produced by parallel synthesis using standard techniques or obtained from commercial suppliers (e.g. CustomArray Inc, WA, USA). After synthesis, oligonucleotides may be cloned into integrative gRNA expression vectors adjacent a gRNA scaffold sequence.
Each target gene may be targeted by one gRNA in the diverse population or more preferably two or more gRNAs in said diverse population (i.e. two or more different gRNAs in the diverse population may be specific for the same gene). For example, two, three, four, five or more gRNA molecules in said library may be specific for different target regions within the same target gene, preferably three to five. This may be helpful in reducing the risk that the results of the screen are affected by the off-target effects of a single gRNA and increasing the probability of successful inactivation of the target gene.
At least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or most preferably all of the set of target genes that are targeted by the diverse population of integrative vectors may be inactivated in the mammalian cell library.
The number of target genes that are inactivated in individual mammalian cells in the library depends on the relative amounts of integrative vector and mammalian cells used for transfection. In some embodiments, one target gene may be inactivated in each mammalian cell in the library. In other embodiments, two to five target genes may be inactivated in each mammalian cell. Where multiple target genes are inactivated in each cell, a second screening step may be employed to identify which of the inactivated target genes in a cell is responsible for the phenotype identified in the screen.
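The dependence on the relative amounts of vector and cells can be illustrated with a simple model. The sketch below assumes, purely for illustration, that the number of vector integrations per cell follows a Poisson distribution whose mean is the ratio of infectious vector to cells; this model and the function name are assumptions and are not part of the methods described herein.

```python
import math

def fraction_single_among_transduced(mean_integrations):
    """Fraction of vector-carrying cells with exactly one integration under a Poisson model."""
    p_zero = math.exp(-mean_integrations)
    p_one = mean_integrations * math.exp(-mean_integrations)
    return p_one / (1.0 - p_zero)

# Hypothetical vector-to-cell ratios.
low_ratio = fraction_single_among_transduced(0.3)
high_ratio = fraction_single_among_transduced(2.0)
```

Under this assumed model, a lower ratio of vector to cells leaves most vector-carrying cells with a single gRNA, whereas a higher ratio produces more cells in which several target genes may be inactivated.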
In some embodiments, the diverse population of integrative vectors may comprise vectors that encode control gRNA molecules. Suitable control gRNAs may be irrelevant to the set of target genes and may for example introduce mutations into non-functional sequences or into irrelevant genes that are not part of the target set of genes. This may be useful as a control and/or for facilitating enrichment when the number of genes in a focussed gene set is relatively small, for example in a secondary screen as described above. Since each gRNA molecule of interest represents a smaller fraction in the total population, the presence of irrelevant gRNA molecules allows an increased relative enrichment of nucleic acids encoding gRNA molecules that are involved in the test phenotype.
Optionally, the representation of individual gRNAs within the diverse population may be confirmed by sequencing before transfection or infection .
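For example, the representation of individual gRNAs might be summarised from sequencing read counts as sketched below. The function name, the summary fields and the example counts are hypothetical.

```python
def representation_summary(counts, library):
    """Summarise how well each designed gRNA is represented in sequencing counts.

    counts:  mapping of gRNA identifier to read count (absent gRNAs count as zero).
    library: list of all gRNA identifiers designed for the diverse population.
    """
    total = sum(counts.get(grna, 0) for grna in library)
    missing = sorted(grna for grna in library if counts.get(grna, 0) == 0)
    fractions = {}
    if total > 0:
        fractions = {grna: counts.get(grna, 0) / total for grna in library}
    coverage = (len(library) - len(missing)) / len(library)
    return {"coverage": coverage, "missing": missing, "fractions": fractions}

# Hypothetical library of four gRNAs, one of which was not detected.
summary = representation_summary({"g1": 50, "g2": 30, "g3": 20},
                                 ["g1", "g2", "g3", "g4"])
```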
A nucleic acid encoding a gRNA molecule may be contained in an expression cassette. The expression cassette may comprise the nucleic acid sequence encoding the gRNA molecule operably linked to a heterologous regulatory element. Suitable regulatory elements include constitutive viral or mammalian regulatory elements, such as the human U6 promoter or the human H1 promoter. A suitable expression cassette may comprise a promoter, nucleic acid encoding the gRNA and a termination signal eg Is.
The nucleic acid encoding the gRNA molecule or the expression cassette comprising the nucleic acid may be contained in an integrative vector.
Suitable integrative vectors stably integrate into the genome of the mammalian cells after transfection and express the nucleic acid encoding the gRNA molecule. Integration of the vector into the genome of a cell allows the identification of the target genes that are inactivated in the cell through the identification of the gRNA molecule encoded by the integrated vector. Suitable integrative vectors are well known in the art and include viral vectors, for example retroviral vectors, such as MLV and lentiviruses, such as HIV, SIV and FIV, and transposon vectors such as Sleeping Beauty™, piggyBac™ and Tol2 transposon systems. Preferably, the integrative vector is a lentiviral vector. An example of a lentiviral gRNA vector suitable for use in the methods described herein is shown in Fig 12.
The integrative vector may further comprise one or more selectable markers. Suitable selectable markers include fluorescent proteins, such as Blue Fluorescent Protein (BFP) or Green Fluorescent Protein (GFP) , that can be selected by cell-sorting, and antibiotic resistance genes, such as puromycin resistance or neomycin resistance, that can be selected by exposure to the antibiotic.
Following transfection with the diverse population, mammalian cells that incorporate the integrated vector in their genome may be selected through expression of the selectable marker.
Another aspect of the invention provides a population or library of integrative vectors for transfecting a mammalian cell population, preferably a mammalian cell population expressing an RNA guided endonuclease, as described herein, each said integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene within the mammalian cells, said population encoding a diverse population of guide RNA molecules (gRNAs) specific for a set of target genes within the mammalian cells.
The integrative vectors in the diverse population may be isolated or, where appropriate, may be packaged into a suitable viral particle for infection or transfection.
In some embodiments, the diverse population of integrative vectors may be a pan-genomic library that targets all of the active genes in the mammalian cell (i.e. genes that encode proteins or miRNA). The diverse population may comprise nucleic acid sequences encoding 20000 or more different gRNAs. The diverse population may comprise nucleic acid sequences encoding gRNAs that target at least 20000 different genes in a mammalian cell. For example, nucleic acid sequences in the diverse population may encode 1 to 5 gRNAs that target each active gene in a mammalian cell.
In other embodiments, the diverse population of integrative vectors may be a sub-genomic library that targets a subset of the active genes in a mammalian cell, for example all the genes encoding a specific enzymatic activity or all the genes encoding members of a specific pathway. For example, the set of target genes may be genes encoding kinases, phosphatases, tyrosine kinases or G protein coupled receptors (GPCRs). The diverse population may comprise nucleic acid sequences encoding gRNAs that target a subset of at least 10, at least 50, at least 100, at least 500 or at least 1000 different genes in a mammalian cell. The diverse population may comprise nucleic acid sequences encoding 50 or more, 500 or more, or 1000 or more different gRNAs. Nucleic acid sequences in the diverse population may encode 1 to 5 gRNAs that target each gene in the subset of genes in a mammalian cell.
After production, the diverse population of integrative vectors may be stored, e.g. frozen, using conventional techniques, or used in genomic screening. For example, viral plasmid DNAs containing gRNA libraries may be stored at -20°C and lentivirus particles may be stored at -80°C.
Optionally, the gRNA encoding nucleic acid sequences in the diverse population may be analysed before transfection to determine the representation of each gRNA in the diverse population.
The diverse population of integrative vectors may be used or suitable for use in the methods described herein. Suitable populations of integrative vectors are described in more detail above.
Following production, the diverse population of integrative vectors may be packaged where necessary and stably transfected into the mammalian cells using standard techniques. Following transfection, the mammalian cells in the library express, preferably stably, both an RNA guided endonuclease, preferably Cas9, and a gRNA molecule from the library of gRNA molecules. Optionally, mammalian cells that express both the RNA guided endonuclease and the gRNA molecule may be selected, for example using selectable markers, such as antibiotic resistance or fluorescence.
Each mammalian cell in the library may express one or multiple gRNAs, such that one or multiple target genes are inactivated in the cell. Preferably, each mammalian cell in the library only expresses a single gRNA, such that a single target gene is inactivated in the cell.
Following transfection, the mammalian cells may be cultured for at least 3 days, preferably at least 6 days, in order for the target genes to be inactivated by the gRNA/RNA guided endonuclease system (e.g. CRISPR/Cas9) .
Conditional or stable expression of a gRNA specific for a target gene and the RNA guided endonuclease in a mammalian cell generates DNA DSBs at the target region of the gene that is targeted by the gRNA molecule. Repair of these DNA DSBs by the cell introduces mutations at the target region, typically deletions or insertions, that selectively alter the activity of, and preferably inactivate, the target gene. For example, the mutations may result in the loss of expression of active gene product.
The activity of the target gene is altered, and preferably abolished, in cells that express a gRNA molecule specific for the target gene but is not affected in cells that do not express a gRNA molecule specific for the target gene, i.e. no active gene product is expressed from the target gene in cells that express the gRNA molecule.
The gRNAs in the population are specific for different target genes, such that the diverse gRNA population as a whole is specific for a set of target genes. Each of the genes in the set that is targeted by the gRNA population may be inactivated in one or more cells of the mutant mammalian cell library.
In some embodiments, all of the target genes that are targeted by the diverse gRNA population are inactivated in the mutant mammalian cell library (i.e. the set of target genes corresponds to the set of inactivated genes). In other embodiments, inactivation may be less than 100% efficient and inactivation may not occur in some cells that express both an RNA guided endonuclease and a gRNA molecule. For example, fewer target genes may be inactivated in the mutant mammalian cell library than are targeted by the gRNA molecules in the diverse gRNA population (i.e. the set of target genes is larger than the set of inactivated genes). The presence of mammalian cells lacking inactivated target genes alongside the mutant mammalian cells of the library does not affect the genomic screening methods described herein.
Mutations that inactivate a target gene may include insertions and deletions (indels) of one or more nucleotides. Preferably, the mutation that is introduced into the target gene leads to a frameshift, introduction of a premature stop codon or deletion of a critical amino acid residue in the encoded protein.
Both alleles of a gene may be inactivated by the stable expression of the gRNA and the RNA guided endonuclease. This allows recessive genes that contribute to the test phenotype to be identified using the methods described herein.
Another aspect of the invention provides a library of mutant mammalian cells,
each cell in the library expressing an RNA guided endonuclease, such as Cas9, and a gRNA specific for a target gene, such that the target gene is inactivated in the cell,
wherein the library expresses a diverse population of gRNA molecules that is specific for a set of target genes, such that one or more target genes from the set of target genes is inactivated in each of the cells in the library.
Preferably, each cell in the library comprises a nucleic acid encoding an RNA guided endonuclease, such as Cas9, and a nucleic acid encoding a gRNA molecule specific for a target gene stably integrated into its genome.
In some embodiments, the library may be isolated and/or purified following production. In other embodiments, the library may not undergo further isolation or purification. For example, non-library cells which do not express both an RNA guided endonuclease and a gRNA specific for a target gene and/or do not have an inactivated target gene may also be present alongside the cells of the library.
Suitable libraries of mutant mammalian cells are described in more detail above. The library may be used or suitable for use in the methods described herein.
In some preferred embodiments, the library is a pan-genomic library that targets all of the active genes in the mammalian cell (i.e. genes that encode proteins). The diverse population may comprise 20000 or more gRNAs and may target at least 20000 genes in the cell. Each gene in the mammalian cell may be targeted by 1 to 5 gRNAs in the diverse population, such that an active gene in the cell is inactivated in each cell in the library.
In other preferred embodiments, the library is a focussed or sub-genomic library that targets a specific subset or panel of genes in the cell, for example genes encoding a specific enzymatic activity or genes encoding members of a specific pathway. Suitable subsets or panels of target genes include genes encoding kinases, phosphatases, tyrosine kinases or G protein coupled receptors (GPCRs). Each gene in the subset or panel may be targeted by 1 to 5 gRNAs in the diverse population, such that a gene from the subset is inactivated in each cell in the library.
After production, the library of mammalian cells may be maintained in culture, expanded, stored, for example frozen using conventional techniques, or used in genomic screening. Optionally, the gRNA sequences in the library may be analysed before genomic screening is performed to determine the representation of each gRNA in the library.
Following the production of a mammalian cell library in which some or all of the set of target genes is inactivated, the mutant mammalian cell library may be interrogated in order to identify genes that contribute to or are associated with a phenotype of interest.
The library may be subjected to a selection for a test phenotype (i.e. a phenotype of interest) to identify a cell population within the library that displays the test phenotype.
In some embodiments, the selection may comprise subjecting the library to culture conditions that are lethal to cells which do not display the test phenotype. Cells which survive the selective culture conditions therefore display the test phenotype and represent a selected cell population. Further isolation of the selected population of cells may not be required.
In other embodiments, cells within the library that display the test phenotype may be isolated and/or separated from other cells in the library to form the selected cell population. Suitable selection methods for isolating cells within the library that display the test phenotype are well known in the art and include flow cytometry, immunological methods, such as panning or magnetic beads, cell adhesion, imaging techniques and/or culturing as clonal, or oligoclonal populations, for example in an array format.
In some embodiments, a test sample of the mutant mammalian cell library may be subjected to selection for the test phenotype. The test sample is preferably representative of the mutant mammalian cell library. The results may be compared with a control sample of the library that has not been subjected to selection. For example, the control sample may be untreated, cultured under non-selective conditions or treated with vehicle instead of active compound, as appropriate. In some embodiments, results from the test sample of the library after selection may be compared with results from the test sample of the library before selection.
The phenotype of interest may be selected in the library of mutant mammalian cells or the test sample thereof by applying a phenotypic screen. The test phenotype is displayed by members of the mutant mammalian cell library that have an inactivated target gene that is relevant to the phenotype e.g. a gene that activates, represses or otherwise mediates or is involved in the cellular pathways involved in establishing the test phenotype in the cell.
The phenotype may be selected by applying selective pressure to the mutant mammalian cell library or a test sample thereof. For example, the library or sample may be cultured under conditions that allow cells that display the phenotype of interest to survive while cells that do not display the phenotype of interest do not survive, or conditions that confer a growth or survival advantage on cells that display the phenotype of interest compared to cells that do not display the phenotype of interest (i.e. the culture conditions are selective for cells which possess the phenotype of interest) . In other words, cells in which the inactivated gene is relevant to the
phenotype of interest may survive or be enriched in the library or sample thereof during selection or may not survive or may be depleted in the library or sample thereof during selection, relative to cells in which the inactivated gene is not relevant to the phenotype. The phenotype may be selected by identifying and isolating cells in the library or sample that display the phenotype of interest from other cells in the library or sample that do not display the
phenotype, for example using cell-sorting (e.g. FACS) , cell adhesion, cell cloning or immunological and imaging techniques.
The choice of selection depends on the phenotype that is being investigated. Suitable techniques for selection of cells with
particular phenotypes are well known in the art. For example, the phenotype may be oncogenesis (Ngo, V. N. et al. Nature 441, 106-110 (2006)), cell viability (MacKeigan, J. P. et al. Nature Cell Biol. 7, 591-600 (2005)), cell motility (Collins, C. S. et al. Proc. Natl Acad. Sci. USA 103, 3775-3780 (2006)), proteasome function (Paddison, P. J. et al. Nature 428, 427-431 (2004)), mitotic progression (Moffat, J. et al. Cell 124, 1283-1298 (2006)), host-pathogen interaction (Yeung, ML, et al. J. Biol. Chem. 284:19643-73 (2009)) or signal transduction, e.g. resistance to TGF-β-induced apoptosis.
In some embodiments, the phenotype of interest may be sensitivity or resistance to the selective culture conditions. For example, the phenotype of interest may be sensitivity or resistance to a chemical compound, such as a small molecule inhibitor, and the selection may be applied by exposing the mutant mammalian cells in the library or the test sample thereof to the chemical compound.
The methods described herein may be useful in identifying candidate genes that modify or mediate the effect of a chemical compound, such as a small molecule inhibitor or other drug, in a mammalian cell. gRNA encoding nucleic acids that are amplified or depleted in the library or sample thereof following exposure to the chemical compound may be identified as targeting genes that modulate resistance or sensitivity to the compound (e.g. inactivation of the target gene by the gRNA molecule increases resistance or sensitivity to the compound) . The methods described here may be useful in identifying candidate genes that are involved in a cellular pathway in a mammalian cell. gRNA encoding nucleic acids that are amplified or depleted in the library or sample thereof following exposure to a chemical compound or other selection may be identified as targeting genes that mediate or are involved in the cellular pathway. For example, the test sample may be exposed to aerolysin (from Aeromonas hydrophila) or alpha-toxin (from Clostridium septicum) to select GPI-anchor synthesis-defective phenotypes and thereby identify genes involved in GPI-anchor
biosynthesis pathway; 6-thioguanine (6TG) to select DNA mismatch repair-defective phenotypes and thereby identify genes involved in DNA mismatch repair; PARP inhibitors, such as olaparib, to identify genes involved in HR-dependent DNA DSB repair; or fialuridine (FIAU) to identify genes involved in FIAU metabolism.
Aerolysin from Aeromonas hydrophila and alpha-toxin from Clostridium septicum are cytolytic pore-forming toxins and use GPI-anchored proteins as their receptors. Although GPI-anchored proteins are essential for development, GPI-deficient cells are viable.
Deficiencies in GPI biosynthesis therefore confer resistance to aerolysin and alpha-toxin. 6TG is converted by Hprt into thio-GMP. After further modification, thio-dGTP is formed and incorporated into genomic DNA during replication, resulting in DNA mispairing. Mismatch repair (MMR) genes recognise the mispairing and induce apoptosis. In contrast, MMR-deficient cells are not able to recognise the mispairing and are therefore able to survive under 6TG treatment. gRNA molecules that are encoded by nucleic acid sequences whose abundance is altered by the selection (i.e. nucleic acid sequences that are present in greater or lesser amounts in the selected cell population than the unselected library) are specific for genes that modulate or are otherwise involved in the test phenotype. The gene that is targeted by a gRNA molecule encoded by a nucleic acid whose abundance in the cell population is altered by the selection may be identified from the recognition sequence of the gRNA, which is complementary to a target region within the target gene, as described above . A gene that modulates or contributes to the test phenotype may be involved in or be a component of a cellular process or pathway that mediates the test phenotype in the cell. Methods of the invention allow the rapid identification of candidate genes relevant to the test phenotype in the cell i.e. genes that modulate, mediate or are negatively or positively associated with the test phenotype in the mammalian cell.
Target genes may be identified by sequencing the gRNA-encoding nucleic acids integrated into the genomes of the selected population of cells (i.e. the cells of the library or test sample thereof after selection for the test phenotype) . In some embodiments, the abundance of each gRNA-encoding nucleic acid sequence in the selected population of cells may be determined relative to a control sample of the library that has not been subject to selection.
Cells of the mutant cell library in which the inactivated target gene is relevant to the phenotype of interest are amplified or depleted by the selection compared to cells in which the inactivated gene is not relevant to the phenotype. Nucleic acid sequences encoding gRNAs that are specific for genes relevant to the phenotype are therefore enriched or depleted in total genomic DNA isolated from the selected cell population relative to control samples.
After selecting cells in the library which display the test phenotype, cells from the selected population may be harvested. In some
embodiments, cells may also be harvested from the unselected library to produce the control sample.
When comparison of the abundance of gRNA encoding nucleic acids with a control sample is required, the selected cell population and the control sample may be analysed simultaneously to identify gRNA
encoding nucleic acids that are amplified or depleted in the selected cell population (i.e. cells displaying the test phenotype) or the control sample may be analysed before or after the selected cell population. In some embodiments, the abundances of gRNA encoding nucleic acid sequences in a control sample may be determined, and optionally stored or recorded, and used to identify gRNA encoding nucleic acids that are amplified or depleted in multiple different test samples . Genomic DNA from the test sample and/or control sample may be amplified before sequencing. Optionally, genomic DNA from the test sample and/or the control sample of cells may be purified before amplification. Suitable methods of DNA purification are well known in the art.
The total gRNA encoding nucleic acid in the test sample and/or the control sample may be amplified from the genomic DNA of each sample. Amplification primers may be based on the sequence of the integrative vector or the non-diverse regions of the gRNA-encoding nucleic acids and may be designed using routine primer design techniques. For example, suitable primers for gRNA amplification include the forward primer: CTTGAAAGTATTTCGATTTCTTGG and the reverse primer: ACTCGGTGCCACTTTTTCAA.
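As a quick sanity check on the primer pair quoted above, the following minimal sketch reports the length, GC content and reverse complement of each primer. Only the two primer sequences come from the text; the helper names `gc_content` and `reverse_complement` are illustrative assumptions and do not form part of the disclosure.

```python
# Primer sequences as given in the text; the helpers below are illustrative only.
FORWARD = "CTTGAAAGTATTTCGATTTCTTGG"
REVERSE = "ACTCGGTGCCACTTTTTCAA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(primer), round(gc_content(primer), 2), reverse_complement(primer))
```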
Suitable techniques for the amplification of genomic DNA from populations of cells, such as PCR, are well known in the art (Bassik MC et al. Nat Methods (2009) 6:443-445; Quail MA et al. Nat Methods 2008, 5:1005-1010; Schlabach MR, Science 2008, 319:620-624; Zuber J et al. Nat Biotechnol 2011, 29:79-83; Silva JM et al. Science 2008, 319:617-620).
Optionally, the amplification products may be purified and/or modified prior to sequencing. For example, the amplification products may be adapted to be compatible with a sequencing technique or platform. The nature of the adaptation will depend on the sequencing technique or platform. For example, for Solexa-Illumina sequencing, primers may be ligated onto the ends of the amplification products.
gRNA encoding nucleic acids may be sequenced using any convenient high-throughput quantitative sequencing technique or platform, including Solexa-Illumina sequencing (Bentley et al Nature, 456, 53-59 (2008)), ligation-based sequencing (SOLiD™) (KJ McKernan et al Genome Res. (2009) 19: 1527-1541), pyrosequencing (M Ronaghi et al Science (1998) 281 5375 363-365); strobe sequencing (SMRT™) (Eid et al Science (2009) 323 5910 133-138; Korlach et al Methods in Enzymology 472 (2010) 431-455); nanopore sequencing (GridION™ system) (Schneider et al Nature Biotechnology 30, 326-328 (2012)) and semi-conductor array sequencing (Ion Torrent™) (Rothberg et al (2011) Nature 475 348-352).
Suitable protocols, reagents and apparatus for sequencing are well known in the art and are available commercially.
In other embodiments, gRNA encoding nucleic acids may be identified and analysed by microarrays. Suitable microarrays are well known in the art and include the Gene Modulation Array Platform (Ketela et al . BMC Genomics 2011, 12:213) .
The amount or abundance of each gRNA encoding nucleic acid sequence in the test sample may be determined after or at the same time as sequencing. From the amount of each gRNA encoding nucleic acid sequence, genes that are enriched or depleted in the test sample and are therefore involved in the selection phenotype in the cell may be identified.
In some embodiments, the relative amount or abundance of individual gRNA encoding nucleic acid sequences in the test sample may be determined relative to a control sample. The amount of each gRNA encoding nucleic acid sequence in the control sample may be determined before, at the same time as, or after the amount of each gRNA encoding nucleic acid sequence in the test sample; or may have been previously determined. The number of times an individual gRNA encoding sequence is read (i.e. the read count) may be compared between the test and control samples to determine the relative abundance of the sequence. Suitable methods of determining the relative amounts of gRNA molecules in the samples are well known in the art. For example, mapping reads may be performed with standard software such as the Burrows-Wheeler
Alignment tool (BWA) and mapped reads counted using SAMtools (Li et al Bioinformatics (2009) 25 (16): 2078-2079).
An increased amount of a gRNA encoding sequence in the test sample, relative to the control sample, is indicative that the gRNA encoding sequence is enriched by the selection. A decreased amount of a gRNA encoding sequence in the test sample, relative to the control sample, is indicative that the gRNA encoding sequence is depleted by the selection.
gRNA encoding nucleic acid sequences that are enriched or depleted in the selected cell population compared to the control population encode gRNAs that target genes involved in the test phenotype.
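The read-count comparison described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the normalisation to reads per million, the pseudocount and the function name `log2_fold_changes` are all assumptions introduced here. A positive value suggests enrichment of a gRNA encoding sequence in the test sample and a negative value suggests depletion.

```python
import math


def log2_fold_changes(test_counts, control_counts, pseudocount=1.0):
    """Return log2(test / control) per gRNA.

    Counts are first scaled to reads per million so that samples sequenced
    to different depths can be compared (an assumed normalisation). The
    pseudocount avoids division by zero for gRNAs absent from a sample.
    """
    test_total = sum(test_counts.values())
    control_total = sum(control_counts.values())
    if test_total == 0 or control_total == 0:
        raise ValueError("each sample needs at least one read")

    result = {}
    for grna, control_reads in control_counts.items():
        test_rpm = test_counts.get(grna, 0) / test_total * 1e6 + pseudocount
        control_rpm = control_reads / control_total * 1e6 + pseudocount
        result[grna] = math.log2(test_rpm / control_rpm)
    return result


# Hypothetical read counts for four gRNAs.
control = {"gA": 100, "gB": 100, "gC": 100, "gD": 100}
test = {"gA": 400, "gB": 100, "gC": 100, "gD": 0}
for grna, value in sorted(log2_fold_changes(test, control).items()):
    print(grna, round(value, 2))
```

Because the values are relative, a strongly enriched gRNA lowers the apparent abundance of unchanged gRNAs in the same sample; thresholds should be chosen with that in mind.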
A gene that is involved in the test phenotype may be identified from the sequence of a gRNA encoding nucleic acid that is enriched or depleted in the test sample.
The recognition sequence of a gRNA molecule encoded by an enriched or depleted nucleic acid sequence is complementary to and specifically matches a target sequence within the target gene. The target sequence that is targeted by a gRNA molecule may be identified from the gRNA encoding sequence. The target sequence is unique for a single target gene in the mammalian cell. The gRNA encoding nucleic acid within the genome of the mammalian cell is therefore a tag or marker that allows the inactivated target gene in the mammalian cell to be identified. The screening methods described herein therefore allow phenotypic traits (e.g. survival in tests, drug resistance, response to a stimulus, activation of a pathway, expression of a protein or
receptors) to be linked with genetic information. As described above, multiple gRNA molecules in the diverse population (e.g. 3-5) may be specific for each target gene.
In some embodiments, the amount or abundance of multiple different nucleic acid sequences that encode gRNA molecules that are specific for the same target gene may be determined in the test sample relative to a control sample. The enrichment or depletion of multiple gRNA encoding nucleic acids that are specific for the same target gene in the test sample relative to the control provides strong indication that the target gene is associated with the test phenotype whereas the enrichment or depletion of only one of multiple gRNA encoding nucleic acids that are specific for the same target gene may be indicative that the observed phenotype arises from an off-target effect and the target gene is not associated with the phenotype. Following identification of a target gene that is associated or involved in the test phenotype, the target gene may be further tested, for example by genetic, biochemical or biological analysis, to confirm its activity and/or function. For example, the effect of knocking out genes identified as target genes in a cell may be determined.
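The reasoning above, that concordant behaviour of several gRNAs for one gene supports a true association whereas a single outlier may reflect an off-target effect, can be sketched as a simple classification rule. The threshold values, the labels and the function name `classify_genes` are assumptions chosen for illustration; the gene and gRNA names in the example are hypothetical.

```python
from collections import defaultdict


def classify_genes(grna_lfc, grna_to_gene, threshold=1.0, min_concordant=2):
    """Label each gene from the log2 fold changes of its gRNAs.

    A gene is a 'candidate' when at least `min_concordant` of its gRNAs
    change in the same direction by at least `threshold`; a gene with fewer
    such gRNAs, but at least one, is flagged as a 'possible off-target'.
    """
    by_gene = defaultdict(list)
    for grna, lfc in grna_lfc.items():
        by_gene[grna_to_gene[grna]].append(lfc)

    calls = {}
    for gene, values in by_gene.items():
        enriched = sum(v >= threshold for v in values)
        depleted = sum(v <= -threshold for v in values)
        concordant = max(enriched, depleted)
        if concordant >= min_concordant:
            calls[gene] = "candidate"
        elif concordant >= 1:
            calls[gene] = "possible off-target"
        else:
            calls[gene] = "no effect"
    return calls


# Hypothetical example: three gRNAs per gene.
lfc = {"g1": 3.0, "g2": 2.5, "g3": 0.1,
       "g4": 2.2, "g5": 0.0, "g6": -0.2,
       "g7": 0.1, "g8": 0.0, "g9": -0.3}
gene_of = {"g1": "geneA", "g2": "geneA", "g3": "geneA",
           "g4": "geneB", "g5": "geneB", "g6": "geneB",
           "g7": "geneC", "g8": "geneC", "g9": "geneC"}
print(classify_genes(lfc, gene_of))
```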
Another aspect of the invention provides a kit for use in a method of genomic screening as described above, comprising;
a library of mutant mammalian cells, each mutant mammalian cell in the library expressing an RNA guided endonuclease, such as Cas9, and a gRNA specific for a target gene, such that the target gene is inactivated in the cell,
wherein the library expresses a diverse population of gRNA molecules that is specific for a set of target genes, such that a target gene from the set is inactivated in each cell in the library.
Another aspect of the invention provides a kit for use in a method of genomic screening as described above, comprising;
(i) a population of integrative vectors;
each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in a mammalian cell,
wherein said population of vectors comprises nucleic acid sequences that encode a diverse population of guide RNA molecules
(gRNAs) that is specific for a set of target genes within a mammalian cell, and optionally;
(ii) a population of mammalian cells that stably express an RNA guided endonuclease, such as Cas9 nuclease.
Suitable libraries, viral vectors and mammalian cells are described above.
The kit may include instructions for use in a method of genomic screening as described above. A kit may include one or more other reagents required for the method, such as culture media, buffer solutions, amplification, sequencing and other reagents. The kit may include one or more articles and/or reagents for
performance of the method, such as culture vessels, DNA and/or RNA isolation and purification reagents, and sample handling containers (such components generally being sterile).
Other aspects and embodiments of the invention provide the aspects and embodiments described above with mammalian cells replaced by other cell types, for example bacterial cells, or eukaryotic cells, such as plant cells and non-mammalian animal cells, including yeast, fish, insect and nematode cells.
Other aspects and embodiments of the invention provide the aspects and embodiments described above with the term "comprising" replaced by the term "consisting of" and the aspects and embodiments described above with the term "comprising" replaced by the term "consisting
essentially of".
Various further aspects and embodiments of the present invention will be apparent to those skilled in the art in view of the present disclosure.
It is to be understood that the application discloses all combinations of any of the above aspects and embodiments described above with each other, unless the context demands otherwise. Similarly, the
application discloses all combinations of the preferred and/or optional features either singly or together with any of the other aspects, unless the context demands otherwise.
Unless context dictates otherwise, the descriptions and definitions of the features set out above are not limited to any particular aspect or embodiment of the invention and apply equally to all aspects and embodiments which are described. Modifications of the above embodiments, further embodiments and modifications thereof will be apparent to the skilled person on reading this disclosure, and as such these are within the scope of the present invention.
All documents and sequence database entries mentioned in this
specification are incorporated herein by reference in their entirety for all purposes. "and/or" where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example "A and/or B" is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein.
Certain aspects and embodiments of the invention will now be illustrated by way of example and with reference to the figures described above.
Experiments
1. Methods
1.1 Plasmid construction
All PCR-generated fragments were verified by Sanger sequencing.
The humanized Cas9 expression vector (ref. 13) was obtained from Addgene (41815). We modified the vector as follows. Firstly, the AgeI-BstZ17I region in the vector was replaced with a PCR-generated fragment containing the bovine growth hormone polyadenylation signal sequence (bpA). Secondly, the MluI-NcoI region containing the CMV promoter was replaced with the human EF1a promoter, resulting in pEF1a-hCas9.
pPB-LR5.1-EF1a-puro2ACas9 was constructed as follows. Firstly, we removed the NotI and the AscI sites from pPB-LR5 (ref. 19) by cloning the MluI-XbaI fragment containing the piggyBac transposon into the MluI-XbaI site of PCR-generated pBluescript, resulting in pPB-LR5.1. Secondly, the CAG promoter (the NheI-ClaI fragment of pPB-CAG.EBNXN (ref. 20)) and bpA (the PCR-generated ClaI-XhoI fragment) were cloned into the NheI-SalI site of pPB-LR5.1, resulting in pPB-LR5.1-CAG. The PCR-generated MfeI-PacI fragment containing puro-T2A-GFP was then cloned into the EcoRI-PacI site of pPB-LR5.1-CAG, resulting in pPB-LR5.1-CAGpuro2AGFP. Separately, the fragment containing the human EF1a promoter was PCR-generated using a BAC clone, RP11-159L14, as a template and cloned into a pPB vector together with the GFP fragment, resulting in pPB-EF1a-GFP. The NheI-AscI fragment containing the hEF1a promoter was excised from pPB-EF1a-GFP and cloned into the NheI-AscI site of pPB-LR5.1-CAGpuro2AGFP, resulting in pPB-LR5.1-EF1a-puro2AGFP. Finally, the NcoI-NotI fragment containing Cas9 was cloned into the NcoI-NotI site of pPB-LR5.1-EF1a-puro2AGFP, resulting in pPB-LR5.1-EF1a-puro2ACas9.
The gRNA cloning vector, pU6-gRNA(BbsI), was constructed by cloning a gBlock fragment (IDT) containing the human U6 promoter, the gRNA cloning site and the gRNA scaffold, into the XhoI-BamHI site of pBluescriptII.
The lentiviral gRNA expression vector, pKLV-U6gRNA(BbsI)-PGKpuro2ABFP, was constructed as follows. Firstly, a new lentiviral backbone vector, pKLV, was constructed. A vector containing the multicloning site, SpeI-ApaI-MluI-XhoI-AscI-BamHI-NotI-KpnI-EagI-PacI, was generated by PCR using pBluescript as a template, resulting in pBS-MCS-KLV. The modified 3' LTR followed by bpA was synthesized (GeneArt) and cloned into the KpnI-PacI site of pBS-MCS-KLV, resulting in pBS-3LTRbpA. The SpeI-ApaI fragment containing the CMV promoter, the 5' R/U5 region and the packaging signal sequence was excised from FUW-OSKM (Addgene, 20328) and cloned into pBS-3LTRbpA, resulting in pKLV. Secondly, the PGK-puro2ABFP cassette was constructed as follows. Fragments containing PGK-puro2A and 2ABFP were PCR-generated. The BbsI site within the BFP coding sequence (ref. 22) was mutated during this PCR process. The full-length coding sequence of puro-2A-BFP was generated by fusing PGK-puro2A and 2ABFP in the second PCR reaction and then cloned into the BamHI-NotI site of pBluescript, resulting in pPGK-puro2ABFP. Finally, the XhoI-BamHI fragment from pU6-gRNA(BbsI) and the BamHI-NotI fragment of pPGK-puro2ABFP were cloned into the XhoI-NotI site of pKLV, resulting in pKLV-U6gRNA(BbsI)-PGKpuro2ABFP.
Individual gRNA expression vectors were constructed as follows. Top and bottom 26-nt oligonucleotides were mixed at 10 μM each in 10 mM Tris-HCl (pH 8.0) and 5 mM MgCl2 in a total volume of 100 μl. The mixture was incubated at 95 °C for 5 min and cooled to room temperature. The duplex oligos were then cloned into the BbsI site of pU6-gRNA(BbsI) or pKLV-U6gRNA(BbsI)-PGKpuro2ABFP.
cDNA expression vectors were constructed as follows. cDNAs for B4galt7 and Ext2 were PCR-amplified using cDNAs from JM8 mouse ESCs as a PCR template, digested with SalI/BsrGI (B4galt7) or SalI/EcoRI (Ext2), and cloned into the SalI/BsrGI or SalI/EcoRI site of pPB-EF1a-GFP, respectively. The coding sequence of PighA12 was PCR-amplified using two primer pairs (U1 and L1; U2 and L2) and cloned into the SalI/EcoRI site of pPB-EF1a-GFP using Gibson Assembly Master Mix (NEB). A full-length Pigh coding sequence was also PCR-amplified with a primer pair (U1 and L2) and cloned into the SalI/EcoRI site of pPB-EF1a-GFP.
shRNA expression vectors were constructed as follows. Top and bottom oligonucleotides were mixed at 3 μM each in 10 mM Tris-HCl (pH 8.0) and 5 mM MgCl2 in a total volume of 50 μl. The mixture was incubated first at 95 °C for 5 min, then at 70 °C for 10 min. Subsequently, the mixture was cooled to room temperature. The duplex oligonucleotides were then cloned into the BbsI site of pKLV-U6gRNA(BbsI)-PGKpuro2ABFP.
1.2 Genome-wide mouse gRNA design
A BED file containing the exonic coordinates of all protein coding genes on the mouse reference genome GRCm38 was obtained. Overlapping coordinates were merged using BEDtools. The sequences of each genomic interval in the BED file, with an additional 20 nucleotides on both sides of the intervals, were retrieved and used to identify all sequences comprising 5'-GN20GG-3'. To avoid off-target cleavages, only gRNAs that matched stringent conditions were chosen: from position 8, the 5'-N14GG-3' of each gRNA only had a single match to the mouse genome. A total of 325,638 sites were identified.
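The site search described above can be sketched with a regular expression that scans both strands of an exonic sequence for the 23-nt pattern G, 20 arbitrary bases, GG. This is an illustrative sketch only: the function names are assumptions, the genome-wide uniqueness filter on the 5'-N14GG-3' portion is not implemented, and minus-strand positions are reported relative to the reverse-complemented input.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
SITE = re.compile(r"(?=(G[ACGT]{20}GG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq):
    """Return (strand, start, site) for every 5'-G N20 GG-3' match.

    `start` is 0-based and refers to the scanned strand, i.e. for '-' it is
    an offset into the reverse complement of the input sequence.
    """
    seq = seq.upper()
    sites = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in SITE.finditer(scanned):
            sites.append((strand, match.start(), match.group(1)))
    return sites


# Hypothetical 27-nt interval containing one site on the plus strand.
example = "TT" + "G" + "A" * 20 + "GG" + "TT"
print(find_sites(example))
```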
1.3 Genome-wide mouse gRNA design for a lentiviral library
In order to generate a genome-wide lentiviral gRNA library, we designed an additional set of genome-wide gRNAs. A BED file containing the consensus coding sequence (CCDS) on the mouse reference genome GRCm38 (CCDS released on 08/14/2012) was obtained. When genes had multiple CCDS transcripts, we only chose the overlapping regions. The sequences of each genomic interval were retrieved and used to identify all sequences comprising 5'-N19NGG-3'. Filtering was performed as follows: Firstly, sites with more than 1 perfect hit in any of the Ensembl exons were removed. Secondly, off-target sites of each candidate gRNA were examined with the following 2 options: (1) N12NGG without any mismatches and (2) N20NGG with up to 3 mismatches. Thirdly, gRNAs that are positioned at least 100 bp away from the translation initiation site and in the first half of coding sequences were collected. Finally, up to 5 gRNAs were chosen for each gene, prioritising gRNAs with fewer predicted off-target sites. A total of 87,897 gRNA sequences were chosen.
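The last two filtering steps above (position within the coding sequence, then up to 5 gRNAs per gene ranked by predicted off-target count) can be sketched as follows. The `Candidate` record, its field names and `choose_grnas` are hypothetical; the sketch assumes that off-target counts have already been computed upstream and does not reproduce the earlier filtering steps.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    sequence: str
    cds_offset: int    # distance in bp from the translation initiation site
    cds_length: int    # total coding sequence length in bp
    off_targets: int   # number of predicted off-target sites


def choose_grnas(candidates, min_offset=100, max_per_gene=5):
    """Keep gRNAs at least `min_offset` bp from the start codon and within
    the first half of the coding sequence, then pick up to `max_per_gene`
    per gene, preferring those with the fewest predicted off-target sites."""
    by_gene = {}
    for cand in candidates:
        in_first_half = cand.cds_offset <= cand.cds_length / 2
        if cand.cds_offset >= min_offset and in_first_half:
            by_gene.setdefault(cand.gene, []).append(cand)
    return {
        gene: sorted(group, key=lambda c: c.off_targets)[:max_per_gene]
        for gene, group in by_gene.items()
    }


# Hypothetical candidates for one gene with a 2,000-bp coding sequence.
offsets = [50, 150, 200, 300, 400, 500, 600, 1500]
off_target_counts = [0, 3, 1, 0, 2, 5, 4, 0]
candidates = [
    Candidate("geneX", f"s{i}", off, 2000, n)
    for i, (off, n) in enumerate(zip(offsets, off_target_counts))
]
print([c.sequence for c in choose_grnas(candidates)["geneX"]])
```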
1.4 Genome-wide mouse gRNA lentiviral library construction
A 79-mer oligo pool was purchased from CustomArray Inc. The oligo sequences are 5'-GCAGATGGCTCTTTGTCCTAGACATCGAAGACAACACCGN19GTTTTACAGTCTTCTCGTCGC-3', where N19 indicates each of the 87,897 gRNA sequences. The single-stranded oligos were converted to double-stranded DNA by PCR using Q5 Hot Start High-Fidelity 2X Master Mix (NEB) with 32 fmol of the oligo as template and primers (79mer-U1 and -L1) using the following conditions: 98 °C for 10 sec, 10 cycles of 98 °C for 10 sec, 64 °C for 15 sec and 72 °C for 15 sec, and the final extension, 72 °C for 2 min. The PCR products were purified with the Nucleotide Removal kit (Qiagen), digested with BbsI and separated by PAGE. The 26-bp fragment was excised and ligated into the BbsI site of pKLV-U6gRNA(BbsI)-PGKpuro2ABFP.
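The oligo layout given above (a constant 5' flank, a 19-nt variable region and a constant 3' flank) can be checked with a short sketch that assembles one library oligo and confirms its length. The flank sequences are taken from the text; the function name `build_oligo` and the input validation are illustrative assumptions. Both flanks contain a BbsI recognition sequence (GAAGAC, or GTCTTC on the opposite strand), consistent with the BbsI digestion step.

```python
# Constant flanks copied from the 79-mer design in the text.
FIVE_PRIME_FLANK = "GCAGATGGCTCTTTGTCCTAGACATCGAAGACAACACCG"
THREE_PRIME_FLANK = "GTTTTACAGTCTTCTCGTCGC"


def build_oligo(guide19):
    """Assemble a library oligo from a 19-nt gRNA sequence (hypothetical helper)."""
    guide19 = guide19.upper()
    if len(guide19) != 19 or set(guide19) - set("ACGT"):
        raise ValueError("expected a 19-nt sequence of A, C, G and T")
    return FIVE_PRIME_FLANK + guide19 + THREE_PRIME_FLANK


# Hypothetical 19-nt guide sequence used only to check the layout.
oligo = build_oligo("ACGTACGTACGTACGTACG")
print(len(oligo), oligo)
```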
1.5 Cell culture and transfection
Mouse ESCs (JM8) were cultured on mitomycin C-treated MEFs in Knockout DMEM (Invitrogen) supplemented with 15% FBS (PAA), 1% GlutaMax (Invitrogen), 1% nonessential amino acids (Invitrogen), 0.1 mM 2-mercaptoethanol and 1,000 U ml-1 leukemia inhibitory factor (LIF; Millipore). 293FT (Invitrogen) and pancreatic carcinoma cells were cultured in DMEM containing 10% FBS and 1% GlutaMax. Transient reverse transfection of ESCs was carried out using Lipofectamine LTX (Invitrogen) according to the manufacturer's instructions. Briefly, 100 ng of plasmid DNA and 0.1 μl of the PLUS reagent were mixed into 10 μl OPTI-MEM (Invitrogen) and incubated for 5 min at room temperature. 0.3 μl of the LTX reagent was diluted into 10 μl of OPTI-MEM and combined with the DNA:PLUS mixture. This was incubated for 30 min at room temperature. Subsequently, 15,000 ESCs suspended in 80 μl OPTI-MEM were mixed with 20 μl of the DNA:PLUS:LTX mixture and plated onto a well of a 96-well plate containing feeder cells. These cells were incubated for 1 hour at 37 °C. The transfection mixture was then removed and 150 μl of ESC medium was added. The transfected cells were cultured for 6-7 days before relevant functional analysis.
Single-copy transgenesis using the piggyBac transposon system was carried out as described previously. Briefly, the piggyBac transposase expression vector, pCMV-mPBase (5 μg) and a transposon vector (100 ng) were electroporated into 1 x 10^6 ESCs at 230 V and 500 μF using Gene Pulser II (BioRad) and plated onto a 10-cm dish. Two days later, drug selection was initiated. The resulting colonies were picked and further expanded.
1.6 Lentivirus production and transduction
Three μg of a lentiviral vector, 9 μg of ViraPower Lentiviral Packaging Mix (Invitrogen) and 12 μl of the PLUS reagent were added to 3 ml of OPTI-MEM and incubated for 5 min at room temperature. Thirty six μl of the LTX reagent was then added to this mixture and further incubated for 30 min at room temperature. The transfection complex was added to 80% confluent 293FT cells and incubated for 3 hours. The medium was replaced with fresh medium at 24 h post transfection. Viral supernatant was harvested at 48 h post transfection and stored at -80 °C. Transduction of ESCs was performed in suspension as follows: 15,000 ESCs and diluted virus were mixed in 100 μl of the ESC medium containing 8 μg ml-1 polybrene (Millipore), incubated for 30 min at 37 °C in a well of a round-bottomed 96-well plate, plated onto a well of a feeder-containing 96-well plate and cultured until functional analyses. Transduction volumes were scaled up according to the areas of the culture plates if necessary.
1.7 Off-target site prediction and cleavage analysis
Twenty-nt gRNA sequences were mapped to the mouse reference genome (NCBI37) using BWA aln with the following options: -n 5 -o 0 -l 20 -N. Subsequently, the mapped positions that were followed by the PAM sequence were extracted as potential off-target sites. All potential off-target sites with a maximum of 5 mismatches for Site 3 gRNA of the Piga gene were determined. For off-target cleavage analysis, we excluded sites for which specific primers could not be designed because of the presence of repetitive elements. We designed PCR primers using BatchPrimer3 and selected 95 off-target sites. Each locus was individually amplified using genomic DNAs derived from doubly transgenic ESC lines with Phusion High-Fidelity polymerase in GC buffer (Thermo Scientific). PCR products were pooled, purified using QIAquick PCR Purification Kit (Qiagen) and used for Illumina library generation.
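The idea behind the off-target prediction above (positions matching the gRNA with a limited number of mismatches and followed by a PAM) can be illustrated with a naive scan. This sketch is not the BWA-based procedure of the text: it assumes an NGG PAM, checks the forward strand only, allows substitutions but no gaps, and is far too slow for a whole genome. The function names are illustrative.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(guide, sequence, max_mismatches=5):
    """Return (position, mismatches) for forward-strand windows that match
    `guide` with at most `max_mismatches` substitutions and are immediately
    followed by an NGG PAM (assumed PAM definition)."""
    guide = guide.upper()
    sequence = sequence.upper()
    n = len(guide)
    hits = []
    # The last usable start leaves room for the guide plus a 3-nt PAM.
    for start in range(len(sequence) - n - 2):
        pam = sequence[start + n:start + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = hamming(guide, sequence[start:start + n])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical guide and target region.
guide = "GATTACAGGCTTAACCGTAG"
region = "TTT" + guide + "TGG" + "CCC" + "CATTACAGGCTTAACCGTAC" + "AGG"
print(candidate_off_targets(guide, region, max_mismatches=2))
```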
1.8 Illumina library generation and sequence analysis
Five hundred ng of pooled PCR products were ligated with Illumina adaptors using NEBNext DNA Library Prep Master Mix (NEB) according to the manufacturer's protocols. The adaptor-ligated products (1 15"1 of the input material) were used for PCR enrichment with KAPA HiFi HotStart ReadyMix with the following PCR conditions: 98 °C for 30 sec, 7 cycles of 98 °C for 10 sec, 66 °C for 15 sec and 72 °C for 20 sec, and the final extension, 72 °C for 5 min. The PCR products were purified with Agencourt AMPure XP beads (Beckman) in a PCR-product-to-bead ratio of 1:0.7. The purified library was quantified and sequenced on Illumina MiSeq by 250-bp paired-end sequencing. Each read was mapped to a custom reference sequence using BWA-SW. Reads containing indels overlapping the ± 20-bp region of the predicted cut sites were considered to be the outcome of NHEJ. The cut frequency was calculated by dividing the number of reads with indels by the total number of reads mapped.
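The cut-frequency calculation above can be sketched as follows. The input representation (one list of indel intervals per mapped read, in reference coordinates with inclusive ends) and the function names are assumptions made for illustration; extracting indels from the alignments is not shown.

```python
def overlaps_cut_window(indel_start, indel_end, cut_site, flank=20):
    """True if the indel interval overlaps cut_site +/- flank (inclusive)."""
    return indel_start <= cut_site + flank and indel_end >= cut_site - flank


def cut_frequency(reads, cut_site, flank=20):
    """Fraction of mapped reads carrying an indel near the predicted cut site.

    `reads` holds, for each mapped read, a list of (start, end) indel
    intervals; a read without indels is represented by an empty list.
    """
    if not reads:
        raise ValueError("no mapped reads")
    edited = sum(
        any(overlaps_cut_window(start, end, cut_site, flank) for start, end in indels)
        for indels in reads
    )
    return edited / len(reads)


# Hypothetical data: four mapped reads, predicted cut site at position 100.
reads = [[], [(98, 101)], [(300, 305)], [(125, 126), (10, 12)]]
print(cut_frequency(reads, cut_site=100))
```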
1.9 Flow cytometry
Fluorophore (Alexa488)-labeled mutant proaerolysin (FLAER) was purchased (VH bio) and used for cell staining at 25 nM in 1% BSA in PBS for 20 min at room temperature. The stained cells were analyzed on the LSRFortessa instrument (BD). Data were subsequently analyzed using FlowJo.
1.10 Screening for alpha-toxin-resistant or 6TG-resistant mutants
A mini gRNA library was first constructed as follows. Equal amounts of 65 plasmid DNAs were mixed and used to generate lentivirus as described above. The titer of the resulting virus was measured by transducing ESCs and analyzing the number of BFP-positive cells. An ESC library was generated by transducing hCas9-expressing ESCs at an MOI of 0.1. Alpha-toxin resistant cells were obtained by treating 500,000 cells with 300 pM alpha-toxin for 48 h. Similarly, mismatch repair-deficient mutants were obtained by treating 500,000 cells with 2 μM 6TG for 7 days. Genomic DNA was isolated with the DNeasy kit (Qiagen) from un-screened ESCs and enriched mutants. Regions containing gRNA sequences were PCR-amplified and sequenced on Illumina MiSeq by 150-bp paired-end sequencing. Read counts of each gRNA were analyzed and compared between un-screened ESCs and enriched mutants.
1.11 Alpha-toxin treatment
ESCs were dissociated into single-cell suspension and plated onto gelatin-coated plates at a density of 9 x 10^4 cells cm-2 in a volume of 220 μl cm-2 with the indicated concentrations of alpha-toxin. The cells were cultured for 48 h and then the medium was replaced with fresh M15L medium daily until staining with methylene blue or harvesting for downstream analysis.
1.12 6-thioguanine treatment
ESCs were dissociated into single-cell suspension and plated onto pSNL feeder plates at a density of 5 x 10^6 cells per 10-cm dish for the MMR screening or 2.5 x 10^4 cells per well of a 12-well plate for comparison of gene inactivation efficiencies between gRNA and shRNA. On the following day, the medium was replaced with a selective medium containing 2 μM 6-thioguanine (Sigma). The selection was continued for 5 days and the cells were cultured for an additional 5 days without 6-TG.
1.13 cDNA complementation assay
Mutant cells (1 x 10^6 cells) were transfected with a mixture of cDNA expression vector (2.25 μg) and pPB-EF1a-GFP (0.25 μg) using Lipofectamine LTX. As a negative control, pBluescriptII was used. Two days post transfection, GFP-positive cells were sorted using MoFlo XDP (Beckman). Immediately after cell sorting, 5 x 10^4 cells were treated at the indicated concentration of alpha-toxin in a 96-well plate for 48 h. The cells were further cultured in M15L medium until staining.
1.14 Lentivirus production and transduction
Three μg of a lentiviral vector, 9 μg of ViraPower Lentiviral Packaging Mix (Invitrogen) and 12 μl of the PLUS reagent were added to 3 ml of OPTI-MEM and incubated for 5 min at room temperature. Thirty-six μl of the LTX reagent was then added to this mixture and further incubated for 30 min at room temperature. The transfection complex was added to 80% confluent 293FT cells and incubated for 3 hours. The medium was replaced with fresh medium at 24 h post transfection. Viral supernatant was harvested at 48 h post transfection and stored at -80 °C. Transduction of ESCs was performed in suspension as follows: 15,000 ESCs and diluted virus were mixed in 100 μl of the ESC medium containing 8 μg ml^-1 polybrene (Millipore), incubated for 30 min at 37 °C in a well of a round-bottomed 96-well plate, plated onto a well of a feeder-containing 96-well plate and cultured until functional analyses. Transduction volumes were scaled up according to the areas of the culture plates if necessary.
1.15 Generation of genome-wide ESC mutant libraries and screening
1.0 x 10^7 ESCs (JM8-Cas9#5) were infected with the genome-wide gRNA lentiviral library at an MOI of 0.3. Two independent infections were conducted, thus producing two independent ESC libraries. Three days post infection, 2.0 x 10^6 BFP-positive cells were sorted for each of the libraries and cultured for an additional 4 days. For each of the 2 ESC libraries, 6 x 10^6 or 10 x 10^6 mutant ESCs were treated with alpha-toxin (1.0 nM) for 48 h or 6-TG (2 μM) for 5 days, respectively, and further cultured for an additional 5 days. Surviving cells were pooled per library and genomic DNA was extracted and used for PCR templates.
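The rationale for infecting at a low MOI can be illustrated with a simple Poisson model of viral integration. The Poisson assumption and the function name below are illustrative only and are not stated in the text; actual transduction rates may deviate from this idealised model.

```python
import math

def poisson_infection_stats(moi):
    """Return (fraction of cells infected, fraction of infected cells
    carrying exactly one provirus) under a Poisson model with mean `moi`."""
    p_zero = math.exp(-moi)        # cells receiving no virus
    p_one = moi * math.exp(-moi)   # cells receiving exactly one virus
    infected = 1.0 - p_zero
    return infected, p_one / infected

infected, single = poisson_infection_stats(0.3)
print(f"infected: {infected:.1%}, single-copy among infected: {single:.1%}")
```

Under this model, most transduced cells at an MOI of 0.3 would carry a single gRNA cassette, which keeps each mutant attributable to one gRNA.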
1.16 Off-target site prediction and cleavage analysis
Twenty-nucleotide guide sequences were mapped to the mouse reference genome (GRCm38) using BWA aln with the following options: -n 5 -o 0 -l 20 -N (ref. 52). Subsequently, the mapped positions that were followed by the PAM sequences (NGG or NAG) were extracted as potential off-target sites. Potential off-target sites with bulge structures were identified by mapping the 20-bp gRNA sequences to the mouse genome using BWA aln with the following options: -n 3 -o 1 -k 3 -N. For off-target cleavage analysis, we excluded sites for which specific primers could not be designed because of the presence of repetitive elements. We designed PCR primers using Batch Primer3 and selected 95 off-target sites for each of the NGG and NAG PAMs and 41 and 44 off-target sites with bulges+NGG and bulges+NAG, respectively. Each locus was individually amplified using genomic DNA derived from 20 doubly transgenic ESC lines and transiently transfected ESCs (day 4) with Phusion High-Fidelity polymerase in GC buffer (Thermo Scientific). PCR products were pooled and purified using the QIAquick PCR Purification Kit (Qiagen). Five hundred nanograms of the purified PCR products were ligated with Illumina adaptors (ref. 53) using NEBNext DNA Library Prep Master Mix (NEB) according to the manufacturer's protocols. The adaptor-ligated products (1/15 of the input material) were used for PCR enrichment (ref. 53) with KAPA HiFi HotStart ReadyMix with the following PCR conditions: 98 °C for 30 sec, 7 cycles of 98 °C for 10 sec, 66 °C for 15 sec and 72 °C for 20 sec, and a final extension at 72 °C for 5 min. The PCR products were purified with Agencourt AMPure XP beads (Beckman) at a PCR product-to-bead ratio of 1:0.7. The purified libraries were quantified and sequenced on Illumina MiSeq by 250-bp paired-end sequencing. Each read was mapped to a custom reference sequence using BWA-SW (ref. 52). Reads containing indels overlapping the ± 20-bp region of the predicted cut sites were considered to be the outcome of NHEJ. The cut frequency was calculated by dividing the number of reads with indels by the total number of reads mapped.
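The cut-frequency calculation described above can be sketched as follows. The data representation (each mapped read as a list of indel intervals in reference coordinates, with inclusive start and end) and the function name are assumptions made for illustration; the parsing of alignments is not shown.

```python
def cut_frequency(reads, cut_site, window=20):
    """Fraction of mapped reads with an indel overlapping cut_site +/- window.

    reads: one entry per mapped read; each entry is a list of (start, end)
           indel intervals in reference coordinates (inclusive). An insertion
           can be given as (pos, pos). Reads without indels are empty lists.
    """
    if not reads:
        return 0.0
    lo, hi = cut_site - window, cut_site + window
    with_indel = sum(
        1 for indels in reads
        if any(start <= hi and end >= lo for start, end in indels)
    )
    return with_indel / len(reads)

# Two of four reads carry an indel within 20 bp of the cut site at position 100.
example = [[(95, 100)], [], [(300, 305)], [(100, 100)]]
print(cut_frequency(example, cut_site=100))
```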
1.17 Illumina sequencing of gRNAs in the genome-wide library and the enriched mutants
For sequencing of all gRNAs in the genome-wide library, the region containing the gRNA was amplified using primers (gLibrary-HiSeq_50bp-SE-U1 and -L1) with Q5 Hot Start High-Fidelity 2X Master Mix. We conducted 10 independent PCR reactions using 15 ng of the whole genome lentiviral plasmid library per reaction and 72 independent PCR reactions using 1 μg of the mouse ESC library per reaction for each of the two ESC libraries. These correspond to 1.7 x 10^10 molecules of the plasmid DNA and 1.1 x 10^7 ESCs in total, respectively. For sequencing of gRNAs in the enriched mutants, the region containing the gRNA was amplified using 1 μg of genomic DNA (1.5 x 10^5 cells) and primers (gLibrary-MiSeq_150bp-PE-U1 and -L1) with Q5 Hot Start High-Fidelity 2X Master Mix. The PCR products were pooled in each group and purified using the QIAquick PCR Purification Kit. Two hundred picograms of the purified PCR products were used for PCR enrichment (ref. 53) with KAPA HiFi HotStart ReadyMix with the following conditions: 98 °C for 30 sec, 12 cycles of 98 °C for 10 sec, 66 °C for 15 sec and 72 °C for 20 sec, and a final extension at 72 °C for 5 min. The PCR products were purified with Agencourt AMPure XP beads at a PCR product-to-bead ratio of 1:0.7. The purified libraries were quantified and sequenced on Illumina HiSeq2500 by 50-bp single-end sequencing (for the entire libraries) or on Illumina MiSeq by 150-bp paired-end sequencing (for the enriched mutants). gRNA sequences were extracted by removing constant regions from each read and these were used to count the number of reads of each gRNA in the library.
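The extraction and counting step can be sketched as below. The flanking sequences in the usage example are placeholders and not the actual constant regions of the vector; the function names and the exact-match strategy are likewise assumptions for illustration.

```python
def extract_guide(read, left_flank, right_flank):
    """Return the sequence between the two constant regions, or None."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]

def count_guides(reads, left_flank, right_flank, library):
    """Count reads per designed gRNA; reads not matching the library are ignored."""
    counts = dict.fromkeys(library, 0)
    for read in reads:
        guide = extract_guide(read, left_flank, right_flank)
        if guide in counts:
            counts[guide] += 1
    return counts

# Hypothetical flanks and guides, for illustration only.
LEFT, RIGHT = "AAACACC", "GTTTTAGA"
g1, g2 = "GACGTACGTACGTACGTACG", "GTTGCATGCATGCATGCATG"
reads = ["TT" + LEFT + g1 + RIGHT + "GC", LEFT + g1 + RIGHT, "ACGTACGT"]
print(count_guides(reads, LEFT, RIGHT, [g1, g2]))
```

gRNAs that are designed but never observed keep a count of zero, which makes dropouts in the library easy to identify.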
1.18 Gene ontology analysis
We computed an average depletion rate for each gRNA and chose the genes with at least 3 gRNAs with an average depletion rate larger than tenfold for gene ontology analysis. Gene ontology analyses were performed using the DAVID Bioinformatics Resources (NIAID, NIH).
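The gene selection rule above can be sketched as follows. The text does not define the depletion rate precisely; this sketch assumes it is the ratio of a gRNA's normalised abundance in the plasmid library to its abundance in an ESC library, averaged over the ESC libraries. All names are hypothetical.

```python
def depleted_genes(plasmid_freq, esc_freqs, guide_to_gene,
                   min_guides=3, min_fold=10.0):
    """Genes with at least `min_guides` gRNAs depleted more than `min_fold`.

    plasmid_freq:  {guide: normalised abundance in the plasmid library}
    esc_freqs:     list of {guide: normalised abundance}, one per ESC library
    guide_to_gene: {guide: gene symbol}
    """
    if not esc_freqs:
        raise ValueError("at least one ESC library is required")
    hits = {}
    for guide, gene in guide_to_gene.items():
        p = plasmid_freq.get(guide, 0.0)
        if p <= 0:
            continue  # gRNA absent from the plasmid library: cannot assess
        folds = []
        for lib in esc_freqs:
            e = lib.get(guide, 0.0)
            folds.append(p / e if e > 0 else float("inf"))
        if sum(folds) / len(folds) > min_fold:
            hits[gene] = hits.get(gene, 0) + 1
    return sorted(g for g, n in hits.items() if n >= min_guides)
```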
2. Results
2.1 Constitutive expression of Cas9 and gRNA from single-copy transgenes is sufficient for efficient cleavages
We first generated a U6 promoter-based gRNA cloning vector, which allows double-stranded oligonucleotides to be readily cloned (Fig. 1). We then designed 4 gRNAs targeting exon 2 of the Piga gene and cloned the relevant duplex oligonucleotides into the gRNA cloning vector (Fig. 2). Piga is an X-linked gene and encodes a member of the enzyme complex that is responsible for the first step of glycosylphosphatidylinositol (GPI)-anchor synthesis. In addition to the Piga gene, there are 25 genes that are involved in the GPI-anchor synthesis pathway (Takeda, J. et al. Cell 73, 703-711 (1993)). Inactivation of one of these genes, with the exception of Pigg, results in a complete or partial loss of GPI-anchored proteins on the cell surface, which can be easily detected by flow cytometry. Since alpha-toxin from Clostridium septicum uses GPI-anchored proteins as a cellular receptor, cells expressing GPI-anchored proteins are killed by cytolysis in the presence of alpha-toxin. In contrast, GPI-anchor-deficient cells are resistant to the toxin. The mutant cells can therefore be selected by toxin treatment. We transfected mouse embryonic stem cells (ESCs) with a humanized Cas9 (hCas9) expression vector in combination with either a gRNA expression vector targeting Piga or a control vector. Six days after transfection, the transfected cells were stained with fluorescently labelled aerolysin (FLAER), which binds to the GPI moiety of GPI-anchored proteins, and analysed by flow cytometry. All 4 gRNAs were able to give rise to FLAER-negative cells (12-17%; Fig. 3).
Next, we examined whether constitutive expression of the two components of the CRISPR/Cas system is able to introduce efficient site-specific DSBs. We first generated stable ESC lines expressing hCas9 driven by the human elongation factor 1 alpha (EF1a) promoter using single-copy piggyBac transposition (Fig. 4). We then transfected these hCas9-expressing cell lines and control wild-type JM8 ESCs with either a gRNA (site 2 or 3 of the Piga gene) expression vector or a mixture of hCas9 and the gRNA expression vectors. Compared to wild-type ESCs, the hCas9-expressing ESC lines showed higher knockout frequencies (Fig. 5; 13% in wt cells, 25% in the stable ESC lines; p<0.05). This effect was seen despite similar transfection efficiencies in all the cell lines examined (Fig. 6). In addition, over-expression of hCas9 by transient transfection did not appear to have any effect on the knockout frequencies in the hCas9-expressing ESCs. These results provide indication that the expression levels of hCas9 in these stably transfected ESC lines are sufficient to induce gRNA-mediated cleavage.
We next introduced the gRNA expression cassette into two hCas9-expressing ESC lines by single-copy piggyBac transposition (Fig. 7) and analysed G418-resistant colonies for GPI-anchored protein expression. Out of 20 derivative colonies analysed, 18 colonies nearly completely lacked GPI-anchored protein expression, whereas the remaining two colonies (Cas9#5-3 and -7) contained 15% and 98% GPI-positive cells, respectively (Fig. 8). Our results indicate that gRNA expression from a single transgene is sufficient to introduce CRISPR/Cas-mediated DSBs. In addition, since 90% of the colonies examined were completely lacking in GPI-anchored protein expression, it is likely that the cleavage reaction occurs in a short period of time. If CRISPR/Cas-mediated cleavage occurred slowly, most colonies would be expected to contain a mixture of GPI-positive and GPI-negative cells. As a result of such a slow cleavage process, each colony would also be expected to contain a mixture of cells with different patterns of insertions/deletions (indels).
2.2 Deep sequencing analyses of on-target cleavages at a clonal level
To analyse the indel patterns at the on-target site and examine the timing of the cleavages, we carried out deep sequencing analyses of all twenty colonies shown in Fig. 8. We also analysed the frequency of off-target cleavages at 95 potential off-target sites.
Out of the twenty colonies analysed, seventeen showed results consistent with the flow cytometry data (Fig. 8); nearly 100% of reads at the target site of these colonies contained indels. The discrepancy between the two sets of data for the remaining three colonies (Cas9#5-3, 8-3 and 8-9) can be explained by the presence of GPI-positive wild-type cells in each of the three colonies. Expansion of these cells, which was clearly detected in Cas9#5-3, can occur between FACS analysis and DNA extraction, as shown in a competition assay between wild-type and Piga-deficient cells, and will directly result in a higher number of reads derived from the wild-type sequence. It was also evident from the indel patterns that eighteen of the twenty colonies analysed contained single clonal indels (one representative dataset is shown in Fig. 9). Since the Piga gene is X-linked, this provides indication that the cleavage event occurred in the clonal origin of each colony. In particular, the Cas9#5-7 clone had a clonal 6-bp in-frame deletion, which explains the low percentage of FLAER-negative cells observed in Fig. 8. The remaining two colonies, Cas9#8-3 and Cas9#8-9, carried multiple deletions (Fig. 9), indicating that the on-target cleavage occurred at a later stage during colony expansion.
2.3 Deep sequencing analyses of off-target cleavages in cell lines constitutively expressing both Cas9 and gRNA
In addition, we analysed the frequency of off-target cleavages at 275 potential off-target sites. Out of these, 95 sites have up to 5 mismatches between the genome and the Piga Site 2 gRNA and are followed by the NGG PAM. Another 95 sites have up to 5 mismatches but these contain the NAG PAM rather than the NGG PAM. The remaining 85 sites have mismatches and bulges between the sites and the gRNA and are followed by either the NGG or the NAG PAMs.
Out of the 95 potential off-target sites analysed, we identified only 2 loci (120_5tm-2 and 120_5tm-21) that were cleaved, albeit at different frequencies. Site 120_5tm-21, which contains 3 mismatches, was targeted by the gRNA as efficiently as the on-target site (Fig. 10). However, site 120_5tm-2, with 2 mismatches, was targeted at different frequencies in each of the twenty colonies analysed. Since these cell lines constitutively express both hCas9 and the gRNA, cleavages have occurred at various time points during cell expansion, which explains the various cutting frequencies and indel patterns observed (Fig. 11). It is thus evident that site 120_5tm-2 represents a weak off-target site in comparison to a strong off-target site like site 120_5tm-21. Importantly, we examined 19 potential off-target sites in protein-coding regions (CCDS) and found that none showed any off-target cleavages. Thus far, all published off-target cleavage analyses have been performed using transiently transfected samples. To compare off-target cleavages between transient and constitutive expression of the CRISPR-Cas system, we analysed hCas9-expressing ESCs transiently transfected with the Piga Site 2 gRNA expression vector by deep sequencing. The same 2 off-target sites, which were identified in the clones constitutively expressing hCas9 and the gRNA, were also the only 2 identified in the transiently transfected cells. Interestingly, Site 2 of the Piga gene and the off-target site 120_5tm-21 displayed similar cleavage frequencies, whereas the weak off-target site, 120_5tm-2, displayed a lower cutting frequency.
Taken together, our results indicate that, whilst the CRISPR-Cas system does introduce DSBs at selected off-target sites, gRNAs may be used without deleterious consequences. Based on the data presented above, we conclude that stable expression of each component of the CRISPR/Cas9 system is sufficient to activate the CRISPR/Cas9 system in mammalian cells and can be used for targeted gene disruption.
2.4 Lentiviral delivery of gRNA expression cassettes
Lentiviral vectors have been successfully utilised in various gene delivery applications including the delivery of small hairpin RNA (shRNA) for RNA interference (RNAi) (Moffat, J. et al. Cell 124, 1283-1298 (2006); Silva, J.M. et al. Nature genetics 37, 1281-1288 (2005)). We first generated a lentiviral vector carrying the U6-promoter-driven gRNA expression cassette (Fig. 12). In order to directly clone duplex oligonucleotides into the vector, we mutated the existing BbsI sites in the vector backbone. Furthermore, we employed the puromycin resistance gene and a blue fluorescent protein (BFP) to facilitate enrichment of transduced cells by either drug selection or cell sorting (Fig. 12). The Cas9-expressing ESCs (clone JM8-Cas9#5) were individually transduced with a virus expressing the Piga Site 3 gRNA and analysed for GPI-anchored protein expression 6 days post infection. Cells expressing Site 2 gRNA displayed the highest fraction of FLAER-negative cells, whereas Site 1 and 3 gRNAs produced fewer FLAER-negative cells (Figs. 13-15). This is likely due to insufficient transcription of the gRNAs caused by the absence of a G nucleotide at the first position of the gRNA (Fig. 12). Since recent studies have shown that mismatches at the 5' end of gRNAs have no effect on Cas9-mediated cleavage efficiency, we investigated whether replacing the first nucleotide with a G nucleotide would increase transcription of the gRNA from the U6 promoter and thereby increase cleavage efficiency. As shown in Figures 14 and 15, the number of FLAER-negative cells significantly increased when the first nucleotide of the gRNAs (Site 1 and 3 of the Piga gene) was replaced with a G nucleotide. These results indicate that there are substantial benefits to having the G nucleotide at the first position of the gRNAs. Based on this result, we employed N19+NGG sites for the gRNA design for lentiviral libraries. We also tested lentiviral expression of gRNAs targeting 4 of the major mismatch repair genes. Cells defective in these MMR genes as well as Hprt are known to be insensitive to a nucleotide analogue, 6-thioguanine (6TG). We obtained 6TG-resistant cells by expressing gRNAs targeting the MMR genes. We concluded that the lentivirally delivered gRNA expression cassettes are able to produce sufficient gRNAs to introduce DSBs at the target site.
In the experiments using lentiviruses, we observed that a substantial fraction of cells was double negative (FLAER-, BFP-; Fig. 14). Lentiviral vectors are known to eventually succumb to proviral silencing in ESCs over time. Since these cells are FLAER-negative, the gRNA must have been expressed and Piga must have been inactivated. Because of proviral silencing, however, these cells have slowly become BFP-negative over time. This would explain the presence of the double-negative population.
We next examined whether stable expression of both components of the CRISPR/Cas system was able to introduce DSBs in carcinoma cells. The tumour cells were first transduced with a lentivirus carrying hCas9 and then further transduced with a lentivirus carrying the gRNA cassette targeting the Piga gene. It is evident from Figure 13 (lower) that the lentivirally delivered CRISPR/Cas9 system is able to introduce site-specific DSBs in these tumour cells, albeit at a slightly lower knockout frequency when compared to mouse ESCs.
2.5 Successful designing of functional CRISPR gRNAs
In principle, successful site-specific cleavage by lentivirally delivered gRNAs allows genome-wide gRNA libraries to be used as a novel tool to generate mutant libraries. We first tried to establish the versatility of the CRISPR-Cas9 system in ESCs. We designed genome-wide gRNAs, chose 2 gRNAs for each of the 26 known genes involved in the GPI-anchor biosynthesis pathway and tested each gRNA independently for its ability to produce mutant ESCs. We transiently transfected hCas9-expressing ESCs with each gRNA expression vector separately and performed (1) deep-sequencing analyses of the cut sites 4 days post transfection, (2) flow cytometry analyses 6 days post transfection and (3) alpha-toxin treatment of cells 6 days post transfection to identify resistant phenotypes.
Deep sequencing analysis revealed that out of the 52 gRNAs analysed, 50 were able to induce DSBs with an average cutting frequency of 12.7 ± 6.7 % (Fig. 16). From phenotypic analyses, we identified gRNAs targeting 17 genes altogether that could give rise to a GPI-anchor-synthesis-deficient phenotype (Fig. 17). Transfected cells were treated with 1.0 nM alpha-toxin for 48 h and stained with methylene blue after an additional 2 days of culture. Different mutants appeared to have different sensitivities to the toxin (e.g. Pigf, Pign, Pigq and Pigw). This is likely due to some of the genes being inessential for GPI-anchored protein synthesis, which results in the respective mutants being able to synthesise GPI-anchored proteins at a reduced level. In most cases, the fraction of FLAER-negative cells observed (Fig. 17) was comparable to the cutting frequencies obtained (Fig. 16); however, Pigh Site 1 and 2 showed a marked difference. We further analysed the sequence data and found that DSBs at the Pigh Site 2 yielded a highly frequent in-frame deletion (12 bp), whereas Site 1 was repaired with a 2-bp deletion (Figs. 18, 19). A gene product from the PighΔ12 allele is functional because the PighΔ12 mutant protein is able to complement the Pigh mutant phenotype. These results indicate that the frequency of in-frame deletions will directly affect individual gene inactivation efficiency.
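The distinction between in-frame and frameshifting indels can be made explicit with a short sketch. The input format (signed indel length mapped to read count, negative for deletions) and the function names are assumptions for illustration, and the read counts in the example are invented.

```python
def is_in_frame(indel_length):
    """True if a non-zero indel length preserves the reading frame."""
    return indel_length != 0 and indel_length % 3 == 0

def in_frame_fraction(indel_counts):
    """Fraction of indel-containing reads whose indel length is a multiple of 3.

    indel_counts: {signed indel length in bp: number of reads},
                  negative for deletions, positive for insertions.
    """
    total = sum(indel_counts.values())
    if total == 0:
        return 0.0
    in_frame = sum(n for length, n in indel_counts.items() if is_in_frame(length))
    return in_frame / total

# A 12-bp deletion keeps the frame; a 2-bp deletion does not.
print(is_in_frame(-12), is_in_frame(-2))
print(in_frame_fraction({-12: 60, -2: 30, 1: 10}))
```

A site dominated by an in-frame deletion, as at Pigh Site 2, can therefore show a high cutting frequency but a low knockout frequency.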
We next analysed all indel sizes of the 50 sites (Figs. 18, 19) and found that, on average, small (≤9 bp) and large (up to 45 bp) in-frame indel frequencies are 22.3 ± 15.9 and 31.8 ± 16.7 %, respectively. We also found that most cut sites had at least one prominent deletion, which was often associated with 2-4 bp micro-homologies. These micro-homology-mediated repairs were reproducibly observed, indicating that alternative NHEJ is operating in mouse ESCs as reported previously. The vast majority of repairs of CRISPR-Cas9-mediated DSBs were associated with deletion (83.8 ± 8.0 %), although insertions were also observed to occur. The frequency of these mutation signatures appears to be similar to that observed with ZFNs. In general, we concluded that a significant fraction of gRNAs are functional, thereby inactivating the function of targeted genes. This provides indication that genome-wide gRNAs can be used to inactivate various genes for the screening of recessive genes.
2.6 Construction of a mouse genome-wide lentiviral gRNA library
The ease with which functional gRNAs can be designed and our success in inactivating gene functions with lentivirus-delivered gRNAs led us to generate a genome-wide gRNA library. Figure 20 illustrates an experimental scheme of the gRNA library-based genetic screening. We designed a maximum of 5 gRNAs for protein-coding genes in the mouse genome. The design criteria are as follows: 1) Each gRNA should target an exon that is used by all CCDS transcripts of the relevant gene. 2) Each gRNA should be positioned at least 100 bp away from the translation initiation site and in the first half of coding sequences. 3) gRNAs should not have potential off-target sites in any other Ensembl exons. 4) gRNA sequences must not contain the BbsI site. With these criteria, we were able to identify 87,897 gRNAs covering 94.3 % of genes with at least 2 gRNAs per gene (Fig. 21 left panel).
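The sequence-level parts of these criteria can be sketched as a simple filter. Only criteria 2 and 4 and the NGG PAM requirement are covered; criteria 1 and 3 need transcript annotation and a genome-wide search and are not shown. The function name and the way the position is supplied (offset from the translation initiation site within the coding sequence) are assumptions. BbsI recognises GAAGAC, so both that sequence and its reverse complement are checked.

```python
BBSI_SITES = ("GAAGAC", "GTCTTC")  # BbsI recognition site and its reverse complement

def passes_design_filters(guide, pam, cds_offset, cds_length):
    """Apply criteria 2 and 4 plus the NGG PAM requirement to one candidate.

    guide:      protospacer sequence (5' to 3')
    pam:        the 3 nt immediately 3' of the protospacer
    cds_offset: distance (bp) of the target from the translation initiation
                site within the coding sequence (hypothetical convention)
    cds_length: total coding sequence length (bp)
    """
    guide, pam = guide.upper(), pam.upper()
    has_ngg_pam = len(pam) == 3 and pam[1:] == "GG"
    far_from_start = cds_offset >= 100
    in_first_half = cds_offset < cds_length / 2
    no_bbsi_site = not any(site in guide for site in BBSI_SITES)
    return has_ngg_pam and far_from_start and in_first_half and no_bbsi_site
```

Excluding BbsI sites matters here because the oligonucleotides are cloned via BbsI, so an internal site would be cut during library construction.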
These gRNAs were cloned into the lentiviral vector shown in Figure 12, producing a first-generation mouse genome-wide lentiviral gRNA library. To validate the quality of the library, we sequenced 139 clones, which were randomly isolated from the library, by capillary sequencing. We found that 121 clones (87.1%) had gRNA sequences that were designed. The rest of the clones had point mutations (5 clones), 1-bp indels (6 clones) or mutations in the 3' flanking region (7 clones). We also sequenced all the gRNAs in the library using the Illumina HiSeq2500 platform and obtained data at a sequencing depth of 503x. Out of 87,897 gRNAs designed, 87,802 (99.892%) sequences were identified (Fig. 21 centre left panel). Although a small fraction of gRNAs were under- or over-represented, the vast majority of gRNAs (82%) were represented within a tenfold range in the library (Fig. 21 centre left panel).
Next, we generated a lentivirus pool of the designed gRNAs and performed two independent infections of hCas9-expressing ESCs (JM8-Cas9#5). Two days post infection, we analysed BFP expression by flow cytometry and identified 32.5 % and 33.7 % of the cell populations as being BFP-positive. Subsequently, 2 x 10^6 BFP-positive transduced cells were enriched for both libraries by cell sorting. We deep sequenced gRNAs in the ESC libraries at a sequencing depth of 487x and 527x and found that 96.52 % of the gRNAs were present in both of the ESC libraries (Fig. 21 centre right panel, right panel).
We observed that a number of gRNAs that were present in the lentiviral plasmid library were not as frequently represented in the ESC libraries (Fig. 22). There are two possibilities for this: (1) individual gRNA sequences affect lentiviral packaging adversely, which might lead to a reduced viral titre; or (2) the gRNAs inactivate essential genes and thus are depleted in the ESC populations. Our data point to the second possibility. We performed gene ontology (GO) analyses on genes that were depleted in the ESC libraries and found that GO terms in essential biological processes were overrepresented. The genes include the pluripotency genes Nanog and Pou5f1, whose knockouts result in ESC differentiation and stop these cells from proliferating, and Rad51 and Brca1, which are known to be essential for ESC proliferation (Fig. 23). In contrast, lineage specification genes such as T, Pax6, Nkx2-5 and Cdx2 appeared at the same frequency in both the ESC libraries and the lentiviral plasmid libraries (Fig. 23). This indicates that the gRNAs are functional, which should result in the presence of various knockout mutants in our ESC libraries.
2.7 Genetic screens using the genome-wide lentiviral gRNA library
Using these two mutant ESC libraries, we performed two genetic screens to identify genes that modulate susceptibility to (1) alpha-toxin and (2) 6TG. After obtaining resistant cells, proviral regions containing gRNAs were PCR-amplified and sequenced on the Illumina MiSeq platform. In total, we identified gRNA sequences targeting 654 and 276 genes in alpha-toxin- and 6TG-resistant cells, respectively (Table 2). Figures 24 and 25 show genes with 2 or more gRNA hits together with all known genes whose mutations are able to confer resistance on cells in each screening system.
Alpha-toxin: Out of the 26 known genes involved in the GPI-anchor biosynthesis pathway, 14 genes that were confirmed to generate a knockout phenotype by the CRISPR-Cas system (Figs. 16-19) had more than one gRNA hit (Figs. 24 and 26). No gRNA was designed for the Pigv gene due to the splice variants that have been predicted for this gene. There are 7 genes for which two independent gRNAs were identified and 6 genes for which the same gRNA was identified in both ESC libraries (Figs. 24 and 26).
6TG: hCas9-expressing ESCs (clone JM8-Cas9#5) were transduced at a multiplicity of infection of 0.1 with 3 gRNAs for each of the 4 major mismatch repair genes and an empty vector. The transduced ESCs were then treated with 6-thioguanine (6TG) to enrich for mismatch repair-defective mutants. We then PCR-amplified and sequenced the gRNA sequences present in un-screened ESCs and these mutant cells and analysed them for enrichment of gRNA sequences. Our data showed clear enrichment of gRNA sequences relevant to the phenotype being screened and depletion of irrelevant gRNA sequences (Figs. 25 and 27). However, for the alpha-toxin-resistant cells, we were unable to recover all gRNAs targeting genes involved in the GPI-anchor biosynthesis pathway. This is likely due to two factors: not all of the genes tested were essential for GPI-anchor biosynthesis, and some gRNAs failed to induce DSBs at their target sites.
Nonetheless, our results indicate that the use of gRNA libraries is a promising novel approach for recessive genetic screening.
Four major MMR genes were successfully identified with 4-5 gRNA hits per gene (Fig. 25). The Hprt gene, which is required to convert 6TG to a toxic molecule, was also identified. Six genes were identified with 2 gRNA hits each. Taken together, we have not only identified a number of previously known genes for each screen but also isolated genes whose association with the biological agent tested has not been described before.
2.8 Genetic validation of candidate genes
We proceeded to validate selected candidate genes obtained from the 2 screens. For the alpha-toxin screen, genes with at least 2 independent gRNA hits, except Olfr1206, were chosen and 4-5 independent gRNA expression vectors were constructed for each candidate gene. For the MMR screen, 3 genes with at least 2 independent gRNAs were chosen and 4-5 gRNA expression vectors were constructed for each gene. hCas9-expressing ESCs were independently transfected with each of these gRNA expression vectors. Six days post transfection, the cells were treated with the relevant agents and their resistance was analysed.
Alpha-toxin: None of the gRNAs could give rise to resistant cells at 1.0 nM alpha-toxin at a level similar to the gRNA targeting Piga; however, cells transfected with gRNAs targeting 4 genes, B4galt7, 1700016K19Rik, Cstf3 and Ext2, were resistant at lower toxin concentrations (0.50-0.75 nM). Cells transfected with gRNAs for Lypla1 and Trpc2 displayed similar sensitivity to the toxin as the wild-type cells and are therefore likely to be false positives. In spite of the increased resistance to the toxin, flow cytometry analysis revealed that all transfected cells showed normal GPI-anchored protein expression, providing indication that these genes modulate susceptibility to alpha-toxin but are not involved directly in the GPI-anchor biosynthesis pathway. Multiple different gRNAs targeting the same genes displayed a similar phenotype, providing indication that the observed phenotype is a direct consequence of gene inactivation. Nevertheless, to rule out the possibility of off-target cleavages by individual gRNAs, we conducted cDNA complementation analysis on two of the mutants (B4galt7 and Ext2). The mutant cells were transfected with the relevant cDNA expression vector or a control vector, together with a GFP expression vector. The transfected cells expressing GFP were sorted 2 days later and treated with 0.50 nM alpha-toxin. In contrast to cells transfected with the control vector, cells transfected with the relevant cDNA expression vector were able to revert to wild-type sensitivity levels. Overexpression of these genes in wild-type ESCs did not change the susceptibility to 0.50 nM alpha-toxin. These results indicate that the increased resistance was due to the inactivation of the relevant genes by on-target cleavage.
6TG: None of the gRNAs targeting the 3 candidate genes gave rise to resistant cells, providing indication that they are all false positives.
2.9 Comparison between gRNA- and shRNA-based approaches
A large number of RNAi screens have been conducted and successfully identified novel genes and pathways that are involved in given phenotypes. For the purposes of genome-wide screens, short interfering RNA (siRNA)-based approaches require high-throughput phenotyping platforms and are usually labour-intensive and time-consuming (Boutros, M. & Ahringer, J. Nature reviews. Genetics 9, 554-566 (2008)). As an alternative approach, short hairpin RNA (shRNA)-based RNAi has been used (Brummelkamp, T.R. et al. Science 296, 550-553 (2002)). The shRNA can be expressed from Pol III promoters such as the human U6 and H1 promoters and this expression cassette can be delivered to target cells by lentivirus (Stewart, S.A. et al. RNA 9, 493-501 (2003)). Genome-wide lentiviral shRNA libraries have been generated for both human and mouse. Since shRNA-based screens can be performed in a pooled format (Root, D.E. et al. Nature methods 3, 715-719 (2006)), this approach is comparatively easier than the siRNA-based approach. However, RNAi can only reduce mRNA expression and rarely achieves 100 % knockdown. In contrast, the CRISPR-Cas9-based approach is able to generate null mutations. We compared the gene inactivation efficiencies of gRNA and shRNA by observing whether knockout phenotypes were produced in alpha-toxin and 6TG treatment.
We cloned validated shRNA sequences for Piga, Pigx, Msh6, Mlh1, Msh2 and Pms2 into our lentiviral vector for direct comparisons. Apart from the position of the central PPT and the addition of the gRNA scaffold, the configuration of our vector is essentially the same as that of the RNAi consortium vector, pLKO.1. The shRNA expression unit has the U6 terminator sequence. As such, the gRNA scaffold should not interfere with shRNA transcription. We generated gRNA- and shRNA-expressing lentiviruses and transduced hCas9-expressing ESCs. Six days post infection, the cells were subjected to the relevant phenotype assays. All the gRNAs tested could generate resistant cells. However, only one shRNA, targeting Mlh1, could give rise to a comparable number of resistant cells. These results indicate that the CRISPR-Cas9-based approach has considerable advantages over RNAi, especially for biological agents with strong killing effects, such as toxins.
2.10 Generation and Validation of a human CRISPR gRNA library in a Cancer Cell Line
91,842 CRISPR guide RNAs targeting 18,071 human protein-coding genes were designed as described above and in Koike-Yusa et al. Nature Biotechnology 32, 267-273 (2014). The guide RNAs were then cloned into the lentiviral vector as described above, resulting in the human CRISPR guide RNA library.
Validation of the human library was carried out by screening mutant cell libraries for alpha-toxin-resistant mutants. We first introduced a Cas9 expression cassette into the HT29 human colorectal cancer cell line by lentiviral transduction and established a stable cell line. Subsequently, the Cas9-expressing HT29 cells were mutagenized by transduction with the lentiviral library in 4 replicates. The transduced cells were cultured for 2 weeks to allow complete depletion of residual mRNA and protein from the mutated genes and were then treated with alpha-toxin. Five days after treatment, surviving cells were harvested, lysed and used for PCR amplification of the region containing the guide sequences. The PCR products were then sequenced on the Illumina MiSeq platform and the resulting data were analysed.
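The read-out described above amounts to counting how often each guide sequence appears among the sequenced PCR products. The following is a minimal Python sketch of that counting step, not the analysis pipeline actually used. It assumes that each read contains the guide immediately upstream of the start of the gRNA scaffold (SEQ ID NO: 2) and that guides have a fixed length; the function names, the default guide length and the library dictionary are illustrative assumptions.

```python
from collections import Counter

# First bases of the gRNA scaffold (SEQ ID NO: 2), used here as an anchor.
SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGC"

def extract_guide(read, guide_len=20):
    """Return the guide_len bases directly upstream of the scaffold, or None.

    guide_len is an assumed parameter; set it to the guide length of the
    library being analysed.
    """
    pos = read.find(SCAFFOLD_START)
    if pos < guide_len:
        # Scaffold not found (pos == -1) or too little sequence upstream.
        return None
    return read[pos - guide_len:pos]

def count_guides(reads, library, guide_len=20):
    """Count reads per guide, keeping only guides present in the library.

    library maps guide sequence -> gene name (hypothetical structure).
    """
    counts = Counter()
    for read in reads:
        guide = extract_guide(read.upper(), guide_len)
        if guide in library:
            counts[guide] += 1
    return counts
```

Reads without a recognisable scaffold, or with a guide that is not in the library, are simply ignored in this sketch; a real analysis would usually also tolerate sequencing errors.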
Sequence analysis of guide RNAs present in the resistant cells identified a total of 3566 genes. Guide RNA hits in the 26 genes involved in the GPI-anchor synthesis pathway are shown in Figure 28. For essential GPI-anchor pathway genes such as PGAP2, PIGA and PIGS, all 5 guide RNAs designed were identified in all 4 mutant libraries. The other essential genes had multiple guide RNA hits, with more than 2 hits in at least 2 independent libraries. PIGN and PIGW, genes with a weak phenotype, and non-essential genes such as DPM2, MPDU1 and PIGG had 0-2 guide RNA hits in only 1-2 libraries. Genes that are not involved in the GPI-anchor synthesis pathway also had 0-2 guide RNA hits. Overall, all known essential genes were identified with more than 2 guide RNA hits and were clearly distinguishable from non-essential genes and from genes that are not involved in the GPI-anchor biosynthesis pathway. Importantly, the false-negative rate was 0% in this screen.
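The way genes are distinguished above, by the number of different guide RNAs recovered and the number of independent libraries in which they are recovered, can be sketched as a simple counting rule. The Python example below is illustrative only: the function name, the input structures and the default thresholds (at least 2 distinct guides in at least 2 libraries) are assumptions chosen for the sketch, not the exact criteria applied in the screen.

```python
from collections import defaultdict

def call_hits(detected, guide_to_gene, min_guides=2, min_libraries=2):
    """Return genes supported by several guides in several libraries.

    detected      : list with one set of guide IDs per replicate library
                    (guides recovered from the resistant cells).
    guide_to_gene : dict mapping guide ID -> gene name.
    A gene is reported when at least min_guides distinct guides are found
    in at least min_libraries libraries (assumed thresholds).
    """
    libraries_passing = defaultdict(int)
    for guides in detected:
        per_gene = defaultdict(set)
        for guide in guides:
            gene = guide_to_gene.get(guide)
            if gene is not None:  # skip guides that are not in the library
                per_gene[gene].add(guide)
        for gene, found in per_gene.items():
            if len(found) >= min_guides:
                libraries_passing[gene] += 1
    return sorted(g for g, n in libraries_passing.items() if n >= min_libraries)
```

Requiring agreement between different guides and between independent libraries is what separates reproducible hits from the background of genes with sporadic single-guide hits.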
In the present study, we showed that stable expression of two essential components of the type II CRISPR-Cas system, namely Cas9 and gRNA, is sufficient to introduce site-specific DSBs. We also showed that lentiviral vectors can be used to deliver gRNA expression cassettes into mammalian cells. Since gRNA-mediated DSBs can introduce null mutations into target genes, gRNA-based screens are able to overcome one of the major problems of RNAi screens, namely incomplete suppression of gene expression. These findings led us to generate a genome-wide lentiviral gRNA library, which we used to successfully conduct genetic screens. A key to the success of genome-wide gRNA-based genetic screening is the performance of each gRNA, i.e. its cutting efficiency. In this study, we conducted in-depth analyses of 52 gRNAs targeting the 26 genes involved in the GPI-anchor biosynthesis pathway. It is remarkable that 50 out of the 52 gRNAs tested could induce DSBs together with Cas9, albeit with variable cutting frequencies. In a separate study, all 86 gRNAs tested were shown to be functional (Ran et al. Cell 154, 1380-1389 (2013)). Taken together, the high success rate in designing functional gRNAs is consistent across various cell types for the CRISPR-Cas9 system derived from Streptococcus pyogenes.
Deep sequencing analyses also revealed the repair outcomes of the CRISPR-Cas9-induced DSBs in mouse ESCs. We observed frequent micro-homology-mediated end joining at almost every cut site, which biases the mutational outcome. As is evident at Site 1 of Pigk and Site 2 of Pigh, this repair mechanism, depending on the surrounding sequences, produced a much higher frequency of in-frame deletions than expected. This reduces the frequency with which null mutations are generated. In ESCs, deletion patterns may be predictable purely from the target site sequences, which would make it possible to design gRNAs that have a higher chance of generating out-of-frame deletions.
Off-target cleavages by the CRISPR-Cas9 system are expected to be more frequent than those observed with ZFNs and TALENs. We analysed off-target cleavages of the gRNA targeting Site 3 of the Piga gene, which has 55% GC content. Out of 275 potential off-target sites, including sites with bulge-type mismatches and the NAG PAM, we found evidence of only two off-target cleavages. It is clear that off-target cleavages occur through imperfect hybridisation between the gRNA and the genomic DNA. Nevertheless, well-designed gRNAs, with no off-target sites in exons, are suitable for use as genetic tools. More comprehensive analyses will be necessary before firm conclusions can be drawn about the real nature of the off-target cleavages.
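The distinction drawn above between in-frame and out-of-frame deletions comes down to whether the deletion length is a multiple of three. The Python sketch below illustrates that arithmetic on a list of deletion sizes; the function names and the example sizes are hypothetical. It makes the simplifying assumption that only out-of-frame deletions produce null alleles, which ignores in-frame deletions that remove essential residues and deletions that disrupt splicing.

```python
def is_out_of_frame(deletion_size):
    """True when a deletion of this many base pairs shifts the reading frame."""
    return abs(deletion_size) % 3 != 0

def out_of_frame_fraction(deletion_sizes):
    """Fraction of observed deletions expected to shift the reading frame."""
    if not deletion_sizes:
        raise ValueError("no deletions supplied")
    shifted = sum(1 for size in deletion_sizes if is_out_of_frame(size))
    return shifted / len(deletion_sizes)

# Hypothetical deletion sizes (bp) observed at one cut site.
example_sizes = [7, 7, 9, 1, 30, 14, 33]
fraction = out_of_frame_fraction(example_sizes)
```

If deletions were distributed evenly over all lengths, about two thirds would be out of frame; a site where micro-homology favours a 3, 6 or 9 bp deletion falls well below that.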
Paradoxically, the low specificity at the 5' end of the gRNAs may be useful. For efficient transcription from the U6 promoter, the nucleotide at this position needs to be guanine. As such, target sites with GN19NGG have been most commonly used. This, however, limits the number of target site candidates in a given genome. We have shown that a mismatch at this position does not reduce, but actually increases, the cutting frequencies of lentivirally expressed gRNAs. The design of gRNAs therefore need not be restricted to sites with GN19NGG, and sites with N19NGG can be used as CRISPR target sites. This new design significantly increases the repertoire of gRNAs available for use.
We conducted two genetic screens using the genome-wide gRNA libraries and both screens were extremely successful. We identified not only known components that modulate susceptibility to the biological agents tested but also previously unknown genes. In these screens, we used a cut-off of a minimum of 2 different gRNA hits per gene to select candidate genes for further validation. Given that the functional performance of each gRNA is high and the CRISPR-Cas9 system is able to produce null mutations, gRNA-based genetic screening is expected to produce gene hits at a high signal-to-noise ratio, and therefore few false positives. Indeed, we obtained only a few false positives in our genetic screens.
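The effect of relaxing the target-site rule from GN19NGG to N19NGG, discussed above, can be illustrated by scanning a sequence with both patterns. The Python sketch below is a simplified illustration: the function name is hypothetical, only the forward strand is searched, and no specificity or exon filters are applied.

```python
import re

def find_target_sites(sequence, require_5prime_g=True):
    """List (start, protospacer) pairs for sites followed by an NGG PAM.

    require_5prime_g=True  -> GN19NGG sites (20-nt protospacer starting with G)
    require_5prime_g=False -> N19NGG sites (19-nt protospacer, any first base)
    Forward strand only; start is a 0-based position in the sequence.
    """
    protospacer = "G[ACGT]{19}" if require_5prime_g else "[ACGT]{19}"
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile("(?=(" + protospacer + ")[ACGT]GG)")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

For a sequence in which the base 20 nt upstream of the PAM is not G, the strict pattern reports nothing while the relaxed pattern still reports the 19-nt site, which is how the relaxed rule enlarges the set of usable target sites.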
Genome-wide lentiviral gRNA libraries, as tools for genome-wide mutagenesis, hold several advantages over existing mutagenesis methods. In particular, creating null mutations with the CRISPR/Cas system overcomes one of the major problems associated with RNAi, namely incomplete suppression of gene expression. Moreover, various cell types, including cancer cells, are amenable to gRNA-based genome engineering. A lentiviral genome-wide gRNA library will have wide applicability and represents a promising platform for functional genomics.
References
1. Urnov, F.D., Rebar, E.J., Holmes, M.C., Zhang, H.S. & Gregory, P.D. Nature reviews. Genetics 11, 636-646 (2010).
2. Joung, J.K. & Sander, J.D. Nature reviews. Molecular cell biology 14, 49-55 (2013).
3. Reyon, D. et al. Nature biotechnology 30, 460-465 (2012).
4. Schmid-Burgk, J.L., Schmidt, T., Kaiser, V., Honing, K. & Hornung, V. Nature biotechnology 31, 76-81 (2013).
5. Kim, Y. et al. Nature biotechnology 31, 251-258 (2013).
6. Barrangou, R. et al. Science 315, 1709-1712 (2007).
7. Marraffini, L.A. & Sontheimer, E.J. Science 322, 1843-1845 (2008).
8. Bhaya, D., Davison, M. & Barrangou, R. Annual review of genetics 45, 273-297 (2011).
9. Garneau, J.E. et al. Nature 468, 67-71 (2010).
10. Jinek, M. et al. Science 337, 816-821 (2012).
11. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Proc. Natl. Acad. Sci. USA 109, E2579-2586 (2012).
12. Cong, L. et al. Science 339, 819-823 (2013).
13. Mali, P. et al. Science 339, 823-826 (2013).
14. Cho, S.W., Kim, S., Kim, J.M. & Kim, J.S. Nature biotechnology 31, 230-232 (2013).
15. Wang, H. et al. Cell 153, 910-918 (2013).
16. Takeda, J. et al. Cell 73, 703-711 (1993).
17. Moffat, J. et al. Cell 124, 1283-1298 (2006).
18. Silva, J.M. et al. Nature genetics 37, 1281-1288 (2005).
Sequences
SEQ ID NO: 1 humanised Cas9 (Mali et al Science 339 823-826 (2013) gccaccATGGACAAGAAGTACTCCATTGGGCTCGATATCGGCACAAACAGCGTCGGCTGGGCCGTCATTA CGGACGAGTACAAGGTGCCGAGCAAAAAATTCAAAGTTCTGGGCAATACCGATCGCCACAGCATAAAGAA GAACCTCATTGGCGCCCTCCTGTTCGACTCCGGGGAGACGGCCGAAGCCACGCGGCTCAAAAGAACAGCA CGGCGCAGATATACCCGCAGAAAGAATCGGATCTGCTACCTGCAGGAGATCTTTAGTAATGAGATGGCTA AGGTGGATGACTCTTTCTTCCATAGGCTGGAGGAGTCCTTTTTGGTGGAGGAGGATAAAAAGCACGAGCG CCACCCAATCTTTGGCAATATCGTGGACGAGGTGGCGTACCATGAAAAGTACCCAACCATATATCATCTG AGGAAGAAGCTTGTAGACAGTACTGATAAGGCTGACTTGCGGTTGATCTATCTCGCGCTGGCGCATATGA TCAAATTTCGGGGACACTTCCTCATCGAGGGGGACCTGAACCCAGACAACAGCGATGTCGACAAACTCTT TATCCAACTGGTTCAGACTTACAATCAGCTTTTCGAAGAGAACCCGATCAACGCATCCGGAGTTGACGCC AAAGCAATCCTGAGCGCTAGGCTGTCCAAATCCCGGCGGCTCGAAAACCTCATCGCACAGCTCCCTGGGG AGAAGAAGAACGGCCTGTTTGGTAATCTTATCGCCCTGTCACTCGGGCTGACCCCCAACTTTAAATCTAA CTTCGACCTGGCCG AGATGCCAAGCTTCAACTGAGCAAAG CACC ACGATGATGATCTCGACAATCTG CTGGCCCAGATCGGCGACCAGTACGCAGACCTTTTTTTGGCGGCAAAGAACCTGTCAGACGCCATTCTGC TGAGTGATATTCTGCGAGTGAACACGGAGATCACCAAAGCTCCGCTGAGCGCTAGTATGATCAAGCGCTA TGATGAGCACCACCAAGACTTGACTTTGCTGAAGGCCCTTGTCAGACAGCAACTGCCTGAGAAGTACAAG GAAATTTTCTTCGATCAGTCTAAAAATGGCTACGCCGGATACATTGACGGCGGAGCAAGCCAGGAGGAAT TTTACAAATTTATTAAGCCCATCTTGGAAAAAATGGACGGCACCGAGGAGCTGCTGGTAAAGCTTAACAG AGAAGATCTGTTGCGCAAACAGCGCACTTTCGACAATGGAAGCATCCCCCACCAGATTCACCTGGGCGAA CTGCACGCTATCCTCAGGCGGCAAGAGGATTTCTACCCCTTTTTGAAAGATAACAGGGAAAAGATTGAGA AAATCCTCACATTTCGGATACCCTACTATGTAGGCCCCCTCGCCCGGGGAAATTCCAGATTCGCGTGGAT GACTCGCAAATCAGAAGAGACCATCACTCCCTGGAACTTCGAGGAAGTCGTGGATAAGGGGGCCTCTGCC CAGTCCTTCATCGAAAGGATGACTAACTTTGATAAAAATCTGCCTAACGAAAAGGTGCTTCCTAAACACT CTCTGCTGTACGAGTACTTCACAGTTTATAACGAGCTCACCAAGGTCAAATACGTCACAGAAGGGATGAG
AAAGCCAGCATTCCTGTCTGGAGAGCAGAAGAAAGCTATCGTGGACCTCCTCTTCAAGACGAACCGGAAA GTTACCGTGAAACAGCTCAAAGAAGACTATTTCAAAAAGATTGAATGTTTCGACTCTGTTGAAATCAGCG GAGTGGAGGATCGCTTCAACGCATCCCTGGGAACGTATCACGATCTCCTGAAAATCATTAAAGACAAGGA CTTCCTGGACAATGAGGAGAACGAGGACATTCTTGAGGACATTGTCCTCACCCTTACGTTGTTTGAAGAT AGGGAGATGATTGAAGAACGCTTGAAAACTTACGCTCATCTCTTCGACGACAAAGTCATGAAACAGCTCA
AGAGGCGCCGATATACAGGATGGGGGCGGCTGTCAAGAAAACTGATCAATGGGATCCGAGACAAGCAGAG TGGAAAGACAATCCTGGATTTTCTTAAGTCCGATGGATTTGCCAACCGGAACTTCATGCAGTTGATCCAT GATGACTCTCTCACCTTTAAGGAGGACATCCAGAAAGCACAAGTTTCTGGCCAGGGGGACAGTCTTCACG AGCACATCGCTAATCTTGCAGGTAGCCCAGCTATCAAAAAGGGAATACTGCAGACCGTTAAGGTCGTGGA TGAACTCGTCAAAGTAATGGGAAGGCATAAGCCCGAGAATATCGTTATCC-AGATGGCCCGAGAGAACCAA ACTACCCAGAAGGGACAGAAGAACAGTAGGGAAAGGATGAAGAGGATTGAAGAGGGTATAAAAGAACTGG GGTCCCAAATCCTTAAGGAACACCCAGTTGAAAACACCCAGCTTCAGAATGAGAAGCTCTACCTGTACTA CCTGCAGAACGGCAGGGACATGTACGTGGATCAGGAACTGGACATCAATCGGCTCTCCGACTACGACGTG GATCATATCGTGCCCCAGTCTTTTCTCAAAGATGATTCTATTGATAATAAAGTGTTGACAAGATCCGATA AAAATAGAGGGAAGAGTGATAACGTCCCCTCAGAAGAAGTTGTCAAGAAAATGAAAAATTATTGGCGGCA
GCTGCTGAACGCCAAACTGATCACACAACGGAAGTTCGATAATCTGACTAAGGCTGAACGAGGTGGCCTG TCTGAGTTGGATAAAGCCGGCTTCATCAAAAGGCAGCTTGTTGAGACACGCCAGATCACCAAGCACGTGG CCCAAATTCTCGATTCACGCATGAACACCAAGTACGATGAAAATGACAAACTGATTCGAGAGGTGAAAGT TATTACTCTGAAGTCTAAGCTGGTCTCAGATTTCAGAAAGGACTTTCAGTTTTATAAGGTGAGAGAGATC AACAATTACCACCATGCGCATGATGCCTACCTGAATGCAGTGGTAGGCACTGCACTTATCAAAAAATATC
CCAAGCTTGAATCTGAATTTGTTTACGGAGACTATAAAGTGTACGATGTTAGGAAAATGATCGCAAAGTC TGAGCAGGAAATAGGCAAGGCCACCGCTAAGTACTTCTTTTACAGCAATATTATGAATTTTTTCAAGACC GAGATTACACTGC-CCAATGGAGAGATTCGGAAGCGACCAC TATCGAAACAAACGGAGAAACAGGAGAAA TCGTGTGGGACAAGGGTAGGGATTTCGCGACAGTCCGGAAGGTCCTGTCCATGCCGCAGGTGAACATCGT TAAAAAGACCGAAGTACAGACCGGAGGCTTCTCCAAGGAAAGTATCCTCCCGAAAAGGAACAGCGACAAG CTGATCGCACGCAAAAAAGATTGGGACCCCAAGAAATACGGCGGATTCGATTCTCCTACAGTCGCT ACA GTGTACTGGTTGTGGCCAAAGTGGAGAAAGGGAAGTCTAAAAAACTCAAAAGCGTCAAGGAACTGCTGGG CA.TCACAATCATGGA.GCGATCAAGCTTCGAAAAAAA.CCCCATCGACTTTCTCGAGGCGAAAGGATATAAA GAGGTCAAAAAAGACCTCATCATTAAGCTTCCCAAGTACTCTCTCTTTGAGCTTGAAAACGGCCGGAAAC GAATGCTCGCTAGTGCGGGCGAGCTGCAGAAAGGTAACGAGCTGGCACTGCCCTCTAAATACGTTAATTT
CTTGTATCTGGCCAGCCACTATGAAAAGCTCAAAGGGTCTCCCGAAGATAATGAGCAGAAGCAGCTGTTC GTGGAACAACACAAACACTACCTTGATGAGATCATCGAGCAAATAAGCGAATTCTCCAAAAGAGTGATCC TCGCCGACGCTAACCTCGATAAGGTGCTTTCTGCTTACAATAAGCACAGGGATAAGCCCATCAGGGAGCA GGCAGAAAACATTATCCACTTGTTTACTCTGACCAACTTGGGCGCGCCTGCAGCCTTCAAGTACTTCGAC ACCACCATAGACAGAAAGCGGTACACCTCTACAAAGGAGGTCCTGGACGCCACACTGATTCATCAGTCAA TTACGGGGCTCTATGAAACAAGAATCGACCTCTCTCAGCTCGGTGGAGACAGCAGGGCTGACCCCAAGAA GAAGAGGAAGGTGTGA
SEQ ID NO: 2 gRNA scaffold sequence
GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCT
[Sequence alignment of mutant clones against the wild-type target region TCCATTCCCACAGTTCTTTCTCTGCCATGGCCCATGATGCTCTCTTCCACGCCAAG. Each row shows one clone together with the size of its deletion, most commonly Δ7, with others ranging from Δ1 to Δ33.]
Table 1
Total number of gRNA hits       Number of genes with the indicated number of gRNA hits
from both ESC libraries         GPI screen      MMR screen
1                               627             264
2                               13              7
3                               3               0
4                               3               0
5                               0               0
6                               5               1
7                               3               1
8                               0               3
9                               0               0
10                              0               0
Total                           654             276
Table 2

Claims

1. A method of genomic screening comprising;
providing a library of mutant mammalian cells, each cell in the library expressing an RNA-guided endonuclease and a guide RNA molecule (gRNA) specific for a target gene, such that the target gene is inactivated in the cell,
wherein the library expresses gRNA molecules specific for a set of target genes, such that a target gene from the set of target genes is inactivated in each cell in the library,
selecting mutant mammalian cells that display a test phenotype from said library to produce a selected cell population displaying the test phenotype,
identifying one or more nucleic acid sequences in the selected cell population that encode gRNA molecules, and
identifying one or more target genes that are targeted by gRNA molecules encoded by the one or more identified sequences.
2. A method according to claim 1 wherein the library of mutant mammalian cells is produced by a method comprising;
providing a population of mammalian cells that stably express a RNA guided endonuclease,
providing a population of integrative vectors, each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in the mammalian cells,
wherein said population comprises nucleic acid sequences encoding a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within the mammalian cell,
transfecting the mammalian cells with the population of integrative vectors to produce a library of mutant cells that stably express a RNA guided endonuclease and a member of the diverse population of gRNAs,
such that a target gene from the set is inactivated in each cell in the library.
3. A method according to any one of the preceding claims wherein the RNA guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) -associated 9 (Cas9) .
4. A method according to any one of the preceding claims wherein each cell in the library comprises a nucleic acid encoding a RNA guided endonuclease and a nucleic acid encoding a guide RNA molecule (gRNA) specific for a target gene, said nucleic acids being stably integrated into the genome of the cell.
5. A method according to any one of the preceding claims wherein each guide RNA molecule in the population is specific for a genomic target region in an exon of the target gene.
6. A method according to any one of the preceding claims wherein the diverse population comprises at least 100 gRNA molecules.
7. A method according to any one of the preceding claims wherein the diverse population comprises at least 10000 gRNA molecules.
8. A method according to any one of the preceding claims wherein the diverse population comprises at least 80000 gRNA molecules.
9. A method according to any one of the preceding claims wherein the set of target genes comprises at least 10000 genes.
10. A method according to any one of the preceding claims wherein the set of target genes comprises at least 19000 genes.
11. A method according to any one of the preceding claims wherein the set of target genes comprises at least 90% of the protein coding genes in the genome of the mammalian cells.
12. A method according to any one of the preceding claims wherein each gene in the set of target genes is targeted by 2 or more gRNA molecules in the diverse population.
13. A method according to any one of the preceding claims wherein each gene in the set of target genes is targeted by up to 5 gRNA molecules in the diverse population.
14. A method according to any one of the preceding claims wherein each member of the library of mutant mammalian cells comprises a single copy of the nucleic acid encoding the RNA guided endonuclease and a single copy of the nucleic acid encoding the gRNA molecule stably integrated into the genome thereof.
15. A method according to any one of the preceding claims wherein the nucleic acids encoding the gRNA molecules are delivered in lentiviral vectors.
16. A method according to any one of the preceding claims wherein nucleic acids encoding the gRNA molecules are expressed from a human U6 promoter.
17. A method according to claim 16 wherein each gRNA molecule comprises a recognition sequence specific for a target region of a target gene, the first nucleotide of the recognition sequence being G.
18. A method according to any one of the preceding claims wherein the gRNA molecules are produced by a method comprising;
(i) providing nucleic acid molecules encoding a diverse population of guide RNA molecules (gRNAs) specific for a set of target protein-coding genes within a mammalian cell, the nucleic acid molecules encoding 1 to 5 gRNAs that are specific for each protein-coding gene in the set of genes,
wherein each gRNA encoded by the nucleic acid molecules is specific for a target region that is
(a) in an exon that is present in all transcripts of a protein coding gene in the set,
(b) located at least 100 bp downstream of the translation initiation site and within the first half of the coding sequence,
(c) only present in a single exon within the mammalian cell genome; and
(ii) expressing said nucleic acid molecules in the population of mammalian cells.
19. A method according to any one of the preceding claims wherein mutant mammalian cells that display the test phenotype are selected by culturing the mutant mammalian cell library under selective conditions.
20. A method according to claim 19 wherein the selective conditions are lethal to cells that do not display the test phenotype.
21. A method according to any one of claims 1 to 19 wherein mutant mammalian cells that display the test phenotype are selected by identifying mutant mammalian cells in the library that display the test phenotype and isolating the identified cells.
22. A method according to any one of the preceding claims wherein mutant mammalian cells that display the test phenotype are selected by a method comprising exposing the mutant mammalian cells in the library to a chemical compound.
23. A method according to claim 22 wherein the test phenotype is resistance to the chemical compound.
24. A method according to any one of the preceding claims comprising amplifying the gRNA-encoding nucleic acid sequences from the selected cell population.
25. A method according to any one of the preceding claims wherein the one or more nucleic acid sequences encoding gRNA molecules in the selected cell population are identified by sequencing the nucleic acids in the selected cell population that encode gRNA molecules.
26. A method according to any one of the preceding claims wherein the one or more nucleic acid sequences are identified by determining the amounts of one or more nucleic acid sequences encoding gRNA molecules that are present in the selected cell population.
27. A method according to any one of the preceding claims wherein the amounts of the one or more nucleic acid sequences encoding gRNA molecules present in the selected cell population are determined relative to the amounts in a control sample of said library.
28. A method according to claim 27 comprising identifying one or more gRNA encoding nucleic acid sequences that are enriched or depleted in the selected cell population relative to the control sample.
29. A method according to claim 28 wherein the target genes that are targeted by gRNA molecules encoded by the one or more enriched or depleted nucleic acid sequences are identified.
30. A method according to any one of the preceding claims wherein said one or more identified target genes modulate the test phenotype in the mammalian cells.
31. A method according to any one of claims 1 to 30 wherein the mammalian cells are pluripotent cells.
32. A method according to claim 31 wherein the mammalian cells are embryonic stem cells.
33. A method according to any one of claims 1 to 30 wherein the mammalian cells are cancer cells.
34. A method according to any one of claims 1 to 30 wherein the mammalian cells are somatic cells.
35. A method according to any one of claims 1 to 34 wherein the mammalian cells are mouse cells.
36. A method according to any one of claims 1 to 34 wherein the mammalian cells are human cells.
37. A population of integrative vectors,
each integrative vector comprising a nucleic acid sequence encoding a guide RNA molecule (gRNA) that is specific for a target gene in a mammalian cell,
wherein said population of vectors comprises nucleic acid sequences that encode a diverse population of guide RNA molecules (gRNAs) that is specific for a set of target genes within a mammalian cell.
38. A population according to claim 37 wherein the integrative vectors are lentiviral vectors.
39. A population according to claim 37 or 38 wherein each guide RNA molecule in the population is specific for a genomic target region in an exon of the target gene.
40. A population according to any one of claims 37 to 39 wherein the diverse population comprises at least 80000 gRNA molecules.
41. A population according to any one of claims 37 to 40 wherein the set of target genes comprises at least 19000 genes.
42. A population according to any one of claims 37 to 41 wherein the set of target genes comprises at least 90% of the protein coding genes in the genome of the mammalian cell.
43. A population according to any one of claims 37 to 42 wherein the mammalian cell is a mouse cell.
44. A population according to any one of claims 37 to 42 wherein the mammalian cell is a human cell.
45. A population according to any one of claims 37 to 44 wherein the vectors comprise one or more selectable markers.
46. A population according to any one of claims 37 to 45 wherein nucleic acids encoding the gRNA molecules are operably linked to a human U6 promoter.
47. A population according to any one of claims 37 to 46 wherein the gRNA molecules comprise a recognition sequence specific for an exonic target region of the target genes, the first nucleotide of said recognition sequence being guanine (G).
48. A population according to any one of claims 37 to 47 wherein the population is produced by a method comprising;
(i) providing nucleic acid molecules encoding a diverse population of guide RNA molecules (gRNAs) specific for a set of target protein-coding genes within a mammalian cell, the nucleic acid molecules encoding 1 to 5 gRNAs that are specific for each protein-coding gene in the set of genes,
wherein each gRNA encoded by the nucleic acid molecules is specific for a target region that is
(a) in an exon that is present in all transcripts of a protein coding gene in the set,
(b) located at least 100 bp downstream of the translation initiation site and within the first half of the coding sequence,
(c) only present in a single exon within the mammalian cell genome; and
(ii) cloning said nucleic acid molecules into integrative vectors.
49. A library of mutant mammalian cells,
each mutant mammalian cell in the library expressing a RNA guided endonuclease and a gRNA specific for a target gene, such that the target gene is inactivated in the cell,
wherein the library expresses a diverse population of gRNA molecules that is specific for a set of target genes, such that a target gene from the set is inactivated in each cell in the library.
50. A library according to claim 49 wherein each mammalian cell has one copy of a nucleic acid encoding the RNA guided endonuclease and one copy of a nucleic acid encoding the gRNA.
51. A library according to claim 49 or 50 wherein said nucleic acid encoding the gRNA is stably integrated into the genome of the cell.
52. A library according to any one of claims 49 to 51 wherein said nucleic acid encoding the RNA guided endonuclease is stably integrated into the genome of the cell.
53. A library according to any one of claims 49 to 52 transfected with a population of integrative vectors according to any one of claims 39 to 52.
54. A library according to any one of claims 49 to 53 wherein the nucleic acid encoding the gRNA is contained in a lentiviral vector.
55. A library according to any one of claims 49 to 54 wherein the RNA guided endonuclease is Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated 9 (Cas9) nuclease.
56. A library according to any one of claims 49 to 55 wherein the diverse population comprises at least 80000 gRNA molecules.
57. A library according to any one of claims 49 to 56 wherein the set of target genes comprises at least 19000 genes.
58. A library according to any one of claims 49 to 57 wherein the set of target genes comprises at least 90% of the protein coding genes in the genome of the mammalian cell.
59. A library according to any one of claims 49 to 58 wherein the mammalian cells are pluripotent cells.
60. A library according to any one of claims 49 to 59 wherein the mammalian cells are embryonic stem cells.
61. A library according to any one of claims 49 to 58 wherein the mammalian cells are cancer cells.
62. A library according to any one of claims 49 to 58 wherein the mammalian cells are somatic cells.
63. A library according to any one of claims 49 to 62 wherein the mammalian cells are mouse cells.
64. A library according to any one of claims 49 to 62 wherein the mammalian cells are human cells.
65. A library according to any one of claims 49 to 64 produced by a method comprising;
(i) providing nucleic acid molecules encoding a diverse population of guide RNA molecules (gRNAs) specific for a set of target protein-coding genes within a mammalian cell, the nucleic acid molecules encoding 1 to 5 gRNAs that are specific for each protein-coding gene in the set of genes,
wherein each gRNA encoded by the nucleic acid molecules is specific for a target region that is
(a) in an exon that is present in all transcripts of a protein coding gene in the set,
(b) located at least 100 bp downstream of the translation initiation site and within the first half of the coding sequence,
(c) only present in a single exon within the mammalian cell genome; and
(ii) expressing said nucleic acid molecules in a population of mammalian cells that express a RNA guided endonuclease.
66. Use of a population according to any one of claims 37 to 48 or a library according to any one of claims 49 to 65 in a method of genomic screening.
67. Use according to claim 66 in a method according to any one of claims 1 to 36.
68. A kit comprising a population according to any one of claims 37 to 48 or a library according to any one of claims 49 to 65.
PCT/EP2014/069825 2013-09-18 2014-09-17 Genomic screening methods using rna-guided endonucleases WO2015040075A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB1316558.4 2013-09-18
GB201316558A GB201316558D0 (en) 2013-09-18 2013-09-18 Genomic screening methods
GB201321257A GB201321257D0 (en) 2013-12-02 2013-12-02 Genomic screening methods
GB1321257.6 2013-12-02

Publications (1)

Publication Number Publication Date
WO2015040075A1 true WO2015040075A1 (en) 2015-03-26

Family

ID=51688025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/069825 WO2015040075A1 (en) 2013-09-18 2014-09-17 Genomic screening methods using rna-guided endonucleases

Country Status (1)

Country Link
WO (1) WO2015040075A1 (en)

Cited By (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9228207B2 (en) 2013-09-06 2016-01-05 President And Fellows Of Harvard College Switchable gRNAs comprising aptamers
US9322006B2 (en) 2011-07-22 2016-04-26 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US9359599B2 (en) 2013-08-22 2016-06-07 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
WO2016196805A1 (en) * 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
US9526784B2 (en) 2013-09-06 2016-12-27 President And Fellows Of Harvard College Delivery system for functional nucleases
US9833761B2 (en) 2013-08-05 2017-12-05 Twist Bioscience Corporation De novo synthesized gene libraries
US9834791B2 (en) 2013-11-07 2017-12-05 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
US9895673B2 (en) 2015-12-01 2018-02-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
WO2018038772A1 (en) * 2016-08-22 2018-03-01 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US9981239B2 (en) 2015-04-21 2018-05-29 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US20190136211A1 (en) * 2016-04-28 2019-05-09 Industry-Academic Cooperation Foundation Yonsei University Method For In Vivo High-Throughput Evaluating Of RNA-Guided Nuclease Activity
WO2019106522A1 (en) * 2017-11-28 2019-06-06 Novartis Ag Pooled crispr/cas9 screening in primary cells using guide swap technology
US10337001B2 (en) 2014-12-03 2019-07-02 Agilent Technologies, Inc. Guide RNA with chemical modifications
WO2019136169A1 (en) * 2018-01-04 2019-07-11 Arizona Board Of Regents On Behalf Of Arizona State University Versatile amplicon single-cell droplet sequencing-based shotgun screening platform to accelerate functional genomics
JP2019524162A (en) * 2016-08-18 2019-09-05 ザ リージェンツ オブ ザ ユニバーシティ オブ カリフォルニア CRISPR-Cas genome editing with modular AAV delivery system
CN110249049A (en) * 2016-12-29 2019-09-17 法兰克福大学 Method for generating higher-order genome editing libraries
US10417457B2 (en) 2016-09-21 2019-09-17 Twist Bioscience Corporation Nucleic acid based data storage
CN110402305A (en) * 2016-11-30 2019-11-01 中国农业大学 CRISPR library screening method
CN110396523A (en) * 2018-04-23 2019-11-01 中国科学院上海生命科学研究院 Plant site-directed recombination method mediated by repeated segments
CN110506203A (en) * 2017-04-07 2019-11-26 塞奇科学股份有限公司 Systems and methods for detection of genetic structural variation using integrated electrophoretic DNA purification
EP3578658A1 (en) * 2018-06-08 2019-12-11 Johann Wolfgang Goethe-Universität Frankfurt Method for generating a gene editing vector with fixed guide rna pairs
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
CN111304211A (en) * 2020-03-10 2020-06-19 无锡市第五人民医院 RHD-T268A mutant and detection thereof
US10696965B2 (en) 2017-06-12 2020-06-30 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US10767175B2 (en) 2016-06-08 2020-09-08 Agilent Technologies, Inc. High specificity genome editing using chemically modified guide RNAs
CN111748848A (en) * 2019-03-26 2020-10-09 北京大学 Method for identifying functional elements
US10844373B2 (en) 2015-09-18 2020-11-24 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US10894242B2 (en) 2017-10-20 2021-01-19 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US10894959B2 (en) 2017-03-15 2021-01-19 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US10907274B2 (en) 2016-12-16 2021-02-02 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US10936953B2 (en) 2018-01-04 2021-03-02 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US11001829B2 (en) 2014-09-25 2021-05-11 The Broad Institute, Inc. Functional screening with optimized functional CRISPR-Cas systems
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11306309B2 (en) 2015-04-06 2022-04-19 The Board Of Trustees Of The Leland Stanford Junior University Chemically modified guide RNAs for CRISPR/CAS-mediated gene regulation
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11332738B2 (en) 2019-06-21 2022-05-17 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
CN114672549A (en) * 2022-04-22 2022-06-28 厦门大学 Rett syndrome early auxiliary diagnosis kit
US11377676B2 (en) 2017-06-12 2022-07-05 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11407837B2 (en) 2017-09-11 2022-08-09 Twist Bioscience Corporation GPCR binding proteins and synthesis thereof
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11492728B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for antibody optimization
US11492727B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for GLP1 receptor
US11492665B2 (en) 2018-05-18 2022-11-08 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
US11519004B2 (en) 2018-03-19 2022-12-06 Regeneron Pharmaceuticals, Inc. Transcription modulation in animals using CRISPR/Cas systems
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11542495B2 (en) 2015-11-20 2023-01-03 Sage Science, Inc. Preparative electrophoretic method for targeted purification of genomic DNA fragments
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11550939B2 (en) 2017-02-22 2023-01-10 Twist Bioscience Corporation Nucleic acid based data storage using enzymatic bioencryption
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
CN116403647A (en) * 2023-06-08 2023-07-07 上海精翰生物科技有限公司 Bioinformatics detection method for detecting lentivirus integration sites and application thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11884915B2 (en) 2021-09-10 2024-01-30 Agilent Technologies, Inc. Guide RNAs with chemical modification for prime editing
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11897920B2 (en) 2017-08-04 2024-02-13 Peking University Tale RVD specifically recognizing DNA base modified by methylation and application thereof
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Non-Patent Citations (56)

* Cited by examiner, † Cited by third party
Title
"Protocols in Molecular Biology", 1992, JOHN WILEY & SONS
"Recombinant Gene Expression Protocols", March 1997, HUMANA PRESS INC
BARRANGOU, R. ET AL., SCIENCE, vol. 315, 2007, pages 1709 - 1712
BASSIK MC ET AL., NAT METHODS, vol. 6, pages 443 - 445
BEARD, C. ET AL., GENESIS, vol. 44, pages 23 - 28
BENTLEY ET AL., NATURE, vol. 456, 2008, pages 53 - 59
BHAYA, D.; DAVISON, M.; BARRANGOU, R., ANNUAL REVIEW OF GENETICS, vol. 45, 2011, pages 273 - 297
BOUTROS, M.; AHRINGER, J., NATURE REVIEWS. GENETICS, vol. 9, 2008, pages 554 - 566
BRUMMELKAMP, T.R. ET AL., SCIENCE, vol. 296, 2002, pages 550 - 553
CAMPEAU ET AL., BRIEFINGS IN FUNCTIONAL GENOMICS, vol. 10, no. 4, pages 215 - 226
CHO, S.W.; KIM, S.; KIM, J.M.; KIM, J.S., NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 230 - 232
COLLINS, C. S. ET AL., PROC. NATL ACAD. SCI. USA, vol. 103, 2006, pages 3775 - 3780
CONG, L. ET AL., SCIENCE, vol. 339, 2013, pages 819 - 823
EID ET AL., SCIENCE, vol. 323, no. 5910, 2009, pages 133 - 138
GARNEAU, J.E. ET AL., NATURE, vol. 468, 2010, pages 67 - 71
GASIUNAS, G.; BARRANGOU, R.; HORVATH, P.; SIKSNYS, V., PROC. NATL. ACAD. SCI. USA, vol. 109, 2012, pages E2579 - 2586
GREEN ET AL.: "Molecular Cloning: a Laboratory Manual", 2012, COLD SPRING HARBOR LABORATORY PRESS
HIROKO KOIKE-YUSA ET AL: "Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library", NATURE BIOTECHNOLOGY, vol. 32, no. 3, 23 December 2013 (2013-12-23), pages 267 - 273, XP055115706, ISSN: 1087-0156, DOI: 10.1038/nbt.2800 *
IORNS ET AL., NATURE REV DRUG DISCOV, vol. 6, 2007, pages 556 - 568
IRION ET AL., NATURE BIOTECH, vol. 25, no. 12, pages 1477 - 1482
JINEK, M. ET AL., SCIENCE, vol. 337, 2012, pages 816 - 821
JOUNG, J.K.; SANDER, J.D.: "Nature reviews", MOLECULAR CELL BIOLOGY, vol. 14, 2013, pages 49 - 55
KETELA ET AL., BMC GENOMICS, vol. 12, 2011, pages 213
KIM, Y. ET AL., NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 251 - 258
KJ MCKERNAN ET AL., GENOME RES., vol. 19, 2009, pages 1527 - 1541
KORLACH ET AL., METHODS IN ENZYMOLOGY, vol. 472, 2010, pages 431 - 455
KOIKE-YUSA ET AL., NATURE BIOTECHNOLOGY, vol. 32, 2014, pages 267 - 273
LI ET AL., BIOINFORMATICS, vol. 25, no. 16, 2009, pages 2078 - 2079
M RONAGHI ET AL., SCIENCE, vol. 281, no. 5375, 1998, pages 363 - 365
MACKEIGAN, J. P. ET AL., NATURE CELL BIOL., vol. 7, 2005, pages 591 - 600
MALI, P. ET AL., SCIENCE, vol. 339, 2013, pages 823 - 826
MARRAFFINI, L.A.; SONTHEIMER, E.J., SCIENCE, vol. 322, 2008, pages 1843 - 1845
MOFFAT, J. ET AL., CELL, vol. 124, 2006, pages 1283 - 1298
NGO, V. N. ET AL., NATURE, vol. 441, 2006, pages 106 - 110
P. MALI ET AL: "Supplementary Materials for RNA-Guided Human Genome Engineering via Cas9", SCIENCE, vol. 339, no. 6121, 3 January 2013 (2013-01-03), pages 823 - 826, XP055161524, ISSN: 0036-8075, DOI: 10.1126/science.1232033 *
PADDISON, P. J. ET AL., NATURE, vol. 428, 2004, pages 427 - 431
QI LEI S ET AL: "Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression", CELL, CELL PRESS, US, vol. 152, no. 5, 28 February 2013 (2013-02-28), pages 1173 - 1183, XP028987304, ISSN: 0092-8674, DOI: 10.1016/J.CELL.2013.02.022 *
QUAIL MA ET AL., NAT METHODS, vol. 5, 2008, pages 1005 - 1010
RAN ET AL., CELL, vol. 154, 2013, pages 1380 - 1389
REYON, D. ET AL., NATURE BIOTECHNOLOGY, vol. 30, 2012, pages 460 - 465
ROOT, D.E. ET AL., NATURE METHODS, vol. 3, 2006, pages 715 - 719
ROTHBERG ET AL., NATURE, vol. 475, 2011, pages 348 - 352
SADELAIN, M. ET AL., NAT REV CANCER, vol. 12, no. 1, 2011, pages 51 - 8
SCHLABACH MR, SCIENCE, vol. 319, 2008, pages 620 - 624
SCHMID-BURGK, J.L.; SCHMIDT, T.; KAISER, V.; HONING, K.; HORNUNG, V., NATURE BIOTECHNOLOGY, vol. 31, 2013, pages 76 - 81
SCHNEIDER ET AL., NATURE BIOTECHNOLOGY, vol. 30, 2012, pages 326 - 328
SILVA JM, SCIENCE, vol. 319, 2008, pages 617 - 620
SILVA, J.M. ET AL., NATURE GENETICS, vol. 37, 2005, pages 1281 - 1288
SIMS ET AL., GENOME BIOLOGY, vol. 12, 2011, pages R104
STEWART, S.A. ET AL., RNA, vol. 9, 2003, pages 493 - 501
T. WANG ET AL: "Genetic Screens in Human Cells Using the CRISPR-Cas9 System", SCIENCE, vol. 343, no. 6166, 12 December 2013 (2013-12-12), pages 80 - 84, XP055115509, ISSN: 0036-8075, DOI: 10.1126/science.1246981 *
TAKEDA, J. ET AL., CELL, vol. 73, 1993, pages 703 - 711
URNOV, F.D.; REBAR, E.J.; HOLMES, M.C.; ZHANG, H.S.; GREGORY, P.D., NATURE REVIEWS. GENETICS, vol. 11, 2010, pages 636 - 646
WANG, H. ET AL., CELL, vol. 153, 2013, pages 910 - 918
YEUNG, ML ET AL., J. BIOL. CHEM., vol. 284, 2009, pages 19643 - 73
ZUBER J ET AL., NAT BIOTECHNOL, vol. 29, 2011, pages 79 - 83

Cited By (133)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US9322006B2 (en) 2011-07-22 2016-04-26 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US11185837B2 (en) 2013-08-05 2021-11-30 Twist Bioscience Corporation De novo synthesized gene libraries
US11559778B2 (en) 2013-08-05 2023-01-24 Twist Bioscience Corporation De novo synthesized gene libraries
US10773232B2 (en) 2013-08-05 2020-09-15 Twist Bioscience Corporation De novo synthesized gene libraries
US10583415B2 (en) 2013-08-05 2020-03-10 Twist Bioscience Corporation De novo synthesized gene libraries
US11452980B2 (en) 2013-08-05 2022-09-27 Twist Bioscience Corporation De novo synthesized gene libraries
US10384188B2 (en) 2013-08-05 2019-08-20 Twist Bioscience Corporation De novo synthesized gene libraries
US10639609B2 (en) 2013-08-05 2020-05-05 Twist Bioscience Corporation De novo synthesized gene libraries
US9833761B2 (en) 2013-08-05 2017-12-05 Twist Bioscience Corporation De novo synthesized gene libraries
US10272410B2 (en) 2013-08-05 2019-04-30 Twist Bioscience Corporation De novo synthesized gene libraries
US10632445B2 (en) 2013-08-05 2020-04-28 Twist Bioscience Corporation De novo synthesized gene libraries
US10618024B2 (en) 2013-08-05 2020-04-14 Twist Bioscience Corporation De novo synthesized gene libraries
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US9359599B2 (en) 2013-08-22 2016-06-07 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10227581B2 (en) 2013-08-22 2019-03-12 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US9526784B2 (en) 2013-09-06 2016-12-27 President And Fellows Of Harvard College Delivery system for functional nucleases
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US9999671B2 (en) 2013-09-06 2018-06-19 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US9737604B2 (en) 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9228207B2 (en) 2013-09-06 2016-01-05 President And Fellows Of Harvard College Switchable gRNAs comprising aptamers
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
US9340800B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College Extended DNA-sensing GRNAS
US9340799B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College MRNA-sensing switchable gRNAs
US10190137B2 (en) 2013-11-07 2019-01-29 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US10640788B2 (en) 2013-11-07 2020-05-05 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAs
US11390887B2 (en) 2013-11-07 2022-07-19 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US9834791B2 (en) 2013-11-07 2017-12-05 Editas Medicine, Inc. CRISPR-related methods and compositions with governing gRNAS
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11001829B2 (en) 2014-09-25 2021-05-11 The Broad Institute, Inc. Functional screening with optimized functional CRISPR-Cas systems
US10337001B2 (en) 2014-12-03 2019-07-02 Agilent Technologies, Inc. Guide RNA with chemical modifications
US10900034B2 (en) 2014-12-03 2021-01-26 Agilent Technologies, Inc. Guide RNA with chemical modifications
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US11697668B2 (en) 2015-02-04 2023-07-11 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US11535846B2 (en) 2015-04-06 2022-12-27 The Board Of Trustees Of The Leland Stanford Junior University Chemically modified guide RNAS for CRISPR/Cas-mediated gene regulation
US11306309B2 (en) 2015-04-06 2022-04-19 The Board Of Trustees Of The Leland Stanford Junior University Chemically modified guide RNAs for CRISPR/CAS-mediated gene regulation
US11851652B2 (en) 2015-04-06 2023-12-26 The Board Of Trustees Of The Leland Stanford Junior University Compositions comprising chemically modified guide RNAs for CRISPR/Cas-mediated editing of HBB
US9981239B2 (en) 2015-04-21 2018-05-29 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US11691118B2 (en) 2015-04-21 2023-07-04 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10744477B2 (en) 2015-04-21 2020-08-18 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US11279926B2 (en) 2015-06-05 2022-03-22 The Regents Of The University Of California Methods and compositions for generating CRISPR/Cas guide RNAs
WO2016196805A1 (en) * 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
US10844373B2 (en) 2015-09-18 2020-11-24 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11807956B2 (en) 2015-09-18 2023-11-07 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11542495B2 (en) 2015-11-20 2023-01-03 Sage Science, Inc. Preparative electrophoretic method for targeted purification of genomic DNA fragments
US10384189B2 (en) 2015-12-01 2019-08-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US10987648B2 (en) 2015-12-01 2021-04-27 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US9895673B2 (en) 2015-12-01 2018-02-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US20190136211A1 (en) * 2016-04-28 2019-05-09 Industry-Academic Cooperation Foundation Yonsei University Method For In Vivo High-Throughput Evaluating Of RNA-Guided Nuclease Activity
US10767175B2 (en) 2016-06-08 2020-09-08 Agilent Technologies, Inc. High specificity genome editing using chemically modified guide RNAs
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
EP3500667A4 (en) * 2016-08-18 2020-09-02 The Regents of the University of California Crispr-cas genome engineering via a modular aav delivery system
JP2019524162A (en) * 2016-08-18 2019-09-05 ザ リージェンツ オブ ザ ユニバーシティ オブ カリフォルニア CRISPR-Cas genome editing with modular AAV delivery system
US10053688B2 (en) 2016-08-22 2018-08-21 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US10975372B2 (en) 2016-08-22 2021-04-13 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
GB2568444A (en) * 2016-08-22 2019-05-15 Twist Bioscience Corp De novo synthesized nucleic acid libraries
WO2018038772A1 (en) * 2016-08-22 2018-03-01 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US10754994B2 (en) 2016-09-21 2020-08-25 Twist Bioscience Corporation Nucleic acid based data storage
US11562103B2 (en) 2016-09-21 2023-01-24 Twist Bioscience Corporation Nucleic acid based data storage
US11263354B2 (en) 2016-09-21 2022-03-01 Twist Bioscience Corporation Nucleic acid based data storage
US10417457B2 (en) 2016-09-21 2019-09-17 Twist Bioscience Corporation Nucleic acid based data storage
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
CN110402305A (en) * 2016-11-30 2019-11-01 中国农业大学 CRISPR library screening method
CN110402305B (en) * 2016-11-30 2023-07-21 北京复昇生物科技有限公司 CRISPR library screening method
US10907274B2 (en) 2016-12-16 2021-02-02 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
CN110249049A (en) * 2016-12-29 2019-09-17 法兰克福大学 Method for generating higher-order genome editing libraries
US11550939B2 (en) 2017-02-22 2023-01-10 Twist Bioscience Corporation Nucleic acid based data storage using enzymatic bioencryption
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US10894959B2 (en) 2017-03-15 2021-01-19 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
CN110506203A (en) * 2017-04-07 2019-11-26 塞奇科学股份有限公司 Systems and methods for detection of genetic structural variation using integrated electrophoretic DNA purification
US11867661B2 (en) 2017-04-07 2024-01-09 Sage Science, Inc. Systems and methods for detection of genetic structural variation using integrated electrophoretic DNA purification
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11377676B2 (en) 2017-06-12 2022-07-05 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11332740B2 (en) 2017-06-12 2022-05-17 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US10696965B2 (en) 2017-06-12 2020-06-30 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11897920B2 (en) 2017-08-04 2024-02-13 Peking University Tale RVD specifically recognizing DNA base modified by methylation and application thereof
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11407837B2 (en) 2017-09-11 2022-08-09 Twist Bioscience Corporation GPCR binding proteins and synthesis thereof
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US10894242B2 (en) 2017-10-20 2021-01-19 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US11745159B2 (en) 2017-10-20 2023-09-05 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
WO2019106522A1 (en) * 2017-11-28 2019-06-06 Novartis Ag Pooled crispr/cas9 screening in primary cells using guide swap technology
WO2019136169A1 (en) * 2018-01-04 2019-07-11 Arizona Board Of Regents On Behalf Of Arizona State University Versatile amplicon single-cell droplet sequencing-based shotgun screening platform to accelerate functional genomics
US10936953B2 (en) 2018-01-04 2021-03-02 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US20210163926A1 (en) * 2018-01-04 2021-06-03 Arizona Board Of Regents On Behalf Of Arizona State University Versatile amplicon single-cell droplet sequencing-based shotgun screening platform to accelerate functional genomics
US11519004B2 (en) 2018-03-19 2022-12-06 Regeneron Pharmaceuticals, Inc. Transcription modulation in animals using CRISPR/Cas systems
CN110396523B (en) * 2018-04-23 2023-06-09 中国科学院分子植物科学卓越创新中心 Plant site-directed recombination method mediated by repeated segments
CN110396523A (en) * 2018-04-23 2019-11-01 中国科学院上海生命科学研究院 Plant site-directed recombination method mediated by repeated segments
US11492665B2 (en) 2018-05-18 2022-11-08 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11732294B2 (en) 2018-05-18 2023-08-22 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
EP3578658A1 (en) * 2018-06-08 2019-12-11 Johann Wolfgang Goethe-Universität Frankfurt Method for generating a gene editing vector with fixed guide rna pairs
WO2019234258A1 (en) * 2018-06-08 2019-12-12 Johann Wolfgang Goethe-Universität Frankfurt am Main Method for generating a gene editing vector with fixed guide rna pairs
US11492728B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for antibody optimization
US11492727B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for GLP1 receptor
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
CN111748848B (en) * 2019-03-26 2023-04-28 北京大学 Method for identifying functional elements
CN111748848A (en) * 2019-03-26 2020-10-09 北京大学 Method for identifying functional elements
US11332738B2 (en) 2019-06-21 2022-05-17 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
CN111304211A (en) * 2020-03-10 2020-06-19 无锡市第五人民医院 RHD-T268A mutant and detection thereof
CN111304211B (en) * 2020-03-10 2020-12-01 无锡市第五人民医院 RHD-T268A mutant and detection thereof
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US11884915B2 (en) 2021-09-10 2024-01-30 Agilent Technologies, Inc. Guide RNAs with chemical modification for prime editing
CN114672549A (en) * 2022-04-22 2022-06-28 厦门大学 Rett syndrome early auxiliary diagnosis kit
CN116403647B (en) * 2023-06-08 2023-08-15 上海精翰生物科技有限公司 Bioinformatics detection method for detecting lentivirus integration sites and application thereof
CN116403647A (en) * 2023-06-08 2023-07-07 上海精翰生物科技有限公司 Bioinformatics detection method for detecting lentivirus integration sites and application thereof

Similar Documents

Publication Publication Date Title
WO2015040075A1 (en) Genomic screening methods using rna-guided endonucleases
JP7083364B2 (en) Optimized CRISPR-Cas dual nickase system, method and composition for sequence manipulation
US11643669B2 (en) CRISPR mediated recording of cellular events
CN106637421B (en) Construction of double sgRNA library and method for applying double sgRNA library to high-throughput functional screening research
US11905521B2 (en) Methods and systems for targeted gene manipulation
US20180112255A1 (en) Crispr mediated in vivo modeling and genetic screening of tumor growth and metastasis
CN105473773B (en) Genome engineering
EP3011035B1 (en) Assay for quantitative evaluation of target site cleavage by one or more crispr-cas guide sequences
JP6625971B2 (en) Delivery, engineering and optimization of tandem guide systems, methods and compositions for array manipulation
US20180255751A1 (en) Delivery, use and therapeutic applications of the crispr-cas systems and compositions for modeling mutations in leukocytes
US20190271041A1 (en) Epigenetic modification of mammalian genomes using targeted endonucleases
JP2019076097A (en) Rna-guided human genome engineering
WO2016123071A1 (en) Methods of identifying essential protein domains
US11459586B2 (en) Methods for increasing efficiency of nuclease-mediated gene editing in stem cells
US20020150945A1 (en) Methods for making polynucleotide libraries, polynucleotide arrays, and cell libraries for high-throughput genomics analysis
Zhang et al. DEAD-box helicase 18 counteracts PRC2 to safeguard ribosomal DNA in pluripotency regulation
EP3342868B1 (en) Constructs and screening methods
Shah et al. Efficient and versatile CRISPR engineering of human neurons in culture to model neurological disorders
Gayle et al. piggyBac insertional mutagenesis screen identifies a role for nuclear RHOA in human ES cell differentiation
US20230046668A1 (en) Targeted integration in mammalian sequences enhancing gene expression
Shafiq et al. Three rules for epigenetic inheritance of human Polycomb silencing
Seczynska Epigenetic repression of intronless mobile elements by the HUSH complex
Jensen HP1-mediated transcriptional silencing of ERVs and genes in mouse embryonic stem cells
Ciotta Tagging methods as a tool to investigate histone H3 methylation dynamics in mouse embryonic stem cells
de Leon Ley Developing cellular tools to report oncogene-dependent replication stress

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14781824; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14781824; Country of ref document: EP; Kind code of ref document: A1