CN111334531A - High signal-to-noise ratio negative genetic screening method - Google Patents

Publication number
CN111334531A
Authority
CN
China
Prior art keywords
sequence
crispr
rna
guide
site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811546460.9A
Other languages
Chinese (zh)
Inventor
袁鹏飞
王飞
于玲玲
董曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Edigene Beijing Biotechnology Co ltd
Original Assignee
Edigene Beijing Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Edigene Beijing Biotechnology Co ltd
Priority to CN201811546460.9A
Publication of CN111334531A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/80 Vectors containing sites for inducing double-stranded breaks, e.g. meganuclease restriction sites
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2810/00 Vectors comprising a targeting moiety
    • C12N2810/10 Vectors comprising a non-peptidic targeting moiety

Abstract

The present invention relates to improving the signal-to-noise ratio of negative genetic screens by targeting eukaryotic genomic splice sites.

Description

High signal-to-noise ratio negative genetic screening method
Technical Field
The present invention relates to performing negative screens of gene function more efficiently by targeting splice sites in the genome of eukaryotic cells, thereby disrupting the messenger RNA (mRNA) of the targeted gene.
Background
An important objective of genetic analysis is to identify the genes that cause a particular biological phenotype or disease. Hypothesis-driven, or reverse, genetic screening uses known information to test specific genetic variations through a "genotype-to-phenotype" approach. Forward genetic screening, by contrast, is a "phenotype-to-genotype" approach and a powerful tool for discovering and annotating functional genetic elements: the expression of a large number of genes is modified or modulated, cells or organisms displaying a particular phenotype are selected, and the mutations responsible for that phenotype are then identified.
Forward genetic screening was initially performed in model organisms such as yeast, Drosophila, zebrafish, and nematodes, using chemical DNA mutagens followed by isolation of individuals with an abnormal phenotype. Many basic biological mechanisms were discovered through this approach, such as the RAS and Notch signaling pathways and the molecular mechanisms of embryogenesis and development. The main drawback of mutagen-based screening is that the random mutations carried by the selected clones are unknown, and identifying them is time-consuming and laborious. In addition, mutants generated by random mutagenesis are mainly heterozygous, so recessive phenotypes cannot be screened. The development of next-generation sequencing (NGS) has greatly facilitated high-throughput screening and identification of mutants, and viruses or transposons now replace chemical mutagens, inserting known linker sequences that facilitate sequencing analysis.
At present, the CRISPR/Cas9 system is used for large-scale construction of gene knockout libraries, enabling large-scale functional gene screening and related research at the level of the whole human genome. In 2014, the Feng Zhang group constructed a lentivirus-delivered GeCKO (genome-scale CRISPR-Cas9 knockout) library targeting 18,080 genes. Using the library, they found genes that play important roles in the viability of cancer cells and pluripotent stem cells, and screened out several key genes in the process of melanoma development. Meanwhile, the Eric S. Lander group also constructed a lentiviral sgRNA screening library based on the CRISPR/Cas9 system; they confirmed all the key genes in the DNA mismatch repair process and confirmed two genes, TOP2A and CDK6, associated with the action of the antitumor drug etoposide. Also at that time, Nature reported that Peking University had developed a lentiviral sgRNA cell library based on the CRISPR/Cas9 system, establishing a functional gene screening platform and a technical route for analyzing the data by high-throughput sequencing [1-4]. In addition, the Lei S. Qi group constructed CRISPRi and CRISPRa systems and performed genome-wide functional gene screens with each, finding various tumor suppressors, differentiation regulators, and essential genes involved in cell survival and proliferation [5-13].
Genetic screening based on the CRISPR/Cas9 system in mammalian cells generally uses a pooled format, usually a positive or negative screen with cell growth as the phenotype. In a positive screen, strong selective pressure is applied so that only a small number of resistant cells are retained for further analysis; this generally requires adding a drug, toxin, or pathogen. In a negative screen, the aim is to find the genotypes that cause cells to be lost during the selection. Such screens usually use cell survival under a given selection pressure as the readout. The simplest negative screen is prolonged continuous culture, in which the lost cells are those carrying sgRNAs that successfully disrupted their target genes, and those genes are key genes affecting cell proliferation. Genes of interest are identified by comparing the relative abundance of each sgRNA at the final and initial time points of the screen. Because cell loss in a negative screen is relatively mild and many genes drop out, a gene usually has to be extremely sensitive to the live/dead selection (for example, an essential gene) to be detected effectively. The overall signal-to-noise ratio of negative screens is therefore currently low.
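The comparison of sgRNA abundance between the initial and final time points can be sketched as follows. This is a minimal illustration, not the analysis pipeline of the invention: the reads-per-million normalization, the pseudocount, the sgRNA names, the read counts, and the 4-fold depletion cutoff are all assumptions chosen for the example.

```python
import math

def normalize(counts, pseudocount=1.0):
    """Scale raw read counts to reads per million after adding a pseudocount.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for sgRNAs that drop out completely.
    """
    total = sum(counts.values())
    return {g: (c + pseudocount) / total * 1e6 for g, c in counts.items()}

def log2_fold_change(ctrl_counts, exp_counts, pseudocount=1.0):
    """Return log2(Exp / Ctrl) of normalized counts for every sgRNA."""
    ctrl = normalize(ctrl_counts, pseudocount)
    exp = normalize(exp_counts, pseudocount)
    return {g: math.log2(exp[g] / ctrl[g]) for g in ctrl}

# Hypothetical read counts at the initial (Ctrl) and final (Exp) time points.
ctrl = {"sgA": 500, "sgB": 500, "sgC": 500, "sgD": 500}
exp = {"sgA": 50, "sgB": 650, "sgC": 650, "sgD": 650}

lfc = log2_fold_change(ctrl, exp)
# sgRNAs depleted at least 4-fold (log2 fold change <= -2) are candidate hits.
depleted = sorted(g for g, v in lfc.items() if v <= -2.0)
```

In this toy data set only `sgA` passes the cutoff; in a real negative screen most essential-gene sgRNAs show much milder depletion, which is the signal-to-noise problem described above.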
Summary of The Invention
The present invention provides methods for studying the function of genomic regions, and for screening and identifying genes essential for cell survival and growth under specific conditions. These methods rely in part on library screening based on newly developed CRISPR/Cas systems provided herein.
Specifically, the present invention relates to:
1. A CRISPR/Cas guide RNA construct for disrupting RNA splicing in a eukaryotic cell genome, comprising a guide sequence that targets a genomic sequence surrounding an RNA splice site, operably linked to a promoter and a hairpin sequence.
2. The CRISPR/Cas guide RNA construct of item 1, wherein the eukaryotic genome is a human genome.
3. The CRISPR/Cas guide RNA construct of item 1 or 2, wherein the guide sequence is 19-21 nucleotides in length.
4. The CRISPR/Cas guide RNA construct of any of items 1-3, wherein the hairpin sequence is about 40 nucleotides in length and once transcribed can bind to the CRISPR/Cas nuclease.
5. The CRISPR/Cas guide RNA construct of any of items 1-4, wherein the guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the splice donor (SD) or splice acceptor (SA) site of the RNA.
6. The CRISPR/Cas guide RNA construct of item 5, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD or SA site of the RNA.
7. The CRISPR/Cas guide RNA construct of item 6, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the RNA.
8. The CRISPR/Cas guide RNA construct of any of items 1-7, which is a viral vector or plasmid.
9. A library comprising a plurality of CRISPR/Cas guide RNA constructs of any one of items 1-8.
10. A storage liquid comprising the CRISPR/Cas guide RNA construct of any of items 1-8 or the library of item 9.
11. A host cell comprising the CRISPR/Cas guide RNA construct of any of items 1-8.
12. The host cell of item 11, further comprising a CRISPR/Cas nuclease and/or a coding sequence for a CRISPR/Cas nuclease.
13. The host cell of item 11 or 12, further comprising a Cas9 nuclease.
14. The host cell of any one of items 11-13, further comprising a reporter construct integrated into its genome.
15. A population of host cells according to any one of items 11-14.
16. A method, comprising:
introducing into a host cell a CRISPR/Cas guide RNA construct comprising a guide sequence that targets a genomic sequence surrounding an RNA splice site operably linked to a promoter and a guide hairpin sequence,
expressing the guide RNA targeting the genomic sequence in the host cell and introducing exon skipping and/or intron retention in the RNA in the presence of a CRISPR/Cas nuclease and determining the functional profile of the RNA.
17. The method of item 16, wherein the guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the RNA.
18. The method of item 17, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD or SA site of the RNA.
19. The method of item 18, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the RNA.
20. The method of any one of items 16-19, wherein the functional profile comprises a change in cell phenotype and/or an increase or decrease in expression of a coding gene or a non-coding gene.
21. The method of item 20, wherein the coding gene is an exogenous reporter gene or a naturally-occurring coding gene in the genome.
22. The method of any one of items 16-21, wherein the host cell is in a population of host cells and each host cell independently comprises a specific guide RNA construct.
23. The method of item 22, which is a method for screening and identifying genes necessary for the survival and growth of cells under specific conditions.
In one aspect, the methods of the invention use the ability of the CRISPR/Cas system to cleave specific genomic sequences surrounding lncRNA splice sites to induce intron retention or exon skipping in the lncRNA, thereby interfering with or eliminating lncRNA function. The targeted genomic locus is in particular around a splice site of a genomic gene, in particular around a splice site of a genomic gene encoding a long non-coding RNA (lncRNA), in particular within a region spanning -50 bp to +75 bp around an SD or SA site, more preferably -30 bp to +30 bp, most preferably -10 bp to +10 bp. The sequence surrounding the targeted lncRNA splice site is cleaved and then mutated by the non-homologous end joining (NHEJ) repair mechanism of the host cell; the mutation results in exon skipping and/or intron retention and thus substantially eliminates the function of the lncRNA.
As known in the art, CRISPR/Cas system nucleases require a guide RNA to cleave genomic DNA. These guide RNAs consist of: (1) a 19-21 nucleotide spacer sequence (guide sequence) that varies between guide RNAs and targets the CRISPR/Cas system nuclease to a genomic location in a sequence-specific manner, and (2) a hairpin sequence that is constant between guide RNAs and allows the guide RNA to bind to the CRISPR/Cas system nuclease.
The methods herein involve introducing into a host cell a CRISPR/Cas guide RNA construct comprising a guide sequence that targets a genomic sequence surrounding an RNA splice site, operably linked to a promoter and a hairpin sequence, and expressing in the host cell the guide RNA that targets the genomic sequence. In one embodiment, the guide sequence targets genomic sequences within a region spanning -50 bp to +75 bp around the SD or SA site of the RNA, more preferably -30 bp to +30 bp, and most preferably -10 bp to +10 bp.
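The window-based selection of guide sequences can be sketched as below. This is an illustration under stated assumptions rather than the design procedure of the invention: it assumes an SpCas9-style NGG PAM, a 20-nt guide, a cut position 3 bp upstream of the PAM, forward-strand sites only, and that "targeting the window" means the predicted cut position falls inside it. The function name, the example sequence, and the splice site coordinate are hypothetical.

```python
import re

def guides_near_splice_site(seq, site, upstream=50, downstream=75, guide_len=20):
    """List forward-strand guides whose predicted cut lies near a splice site.

    seq        genomic sequence (string of A/C/G/T)
    site       0-based index of the splice donor (SD) or acceptor (SA) site
    upstream   window size 5' of the site, in bp (e.g. 50, 30 or 10)
    downstream window size 3' of the site, in bp (e.g. 75, 30 or 10)
    Returns tuples (guide_start, guide_sequence, cut_offset_from_site).
    """
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so that overlapping candidate sites are all found:
    # guide_len bases followed by an NGG PAM (assumed PAM, see lead-in).
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    for match in pattern.finditer(seq):
        start = match.start()
        cut = start + guide_len - 3  # assumed cut position, 3 bp 5' of the PAM
        offset = cut - site
        if -upstream <= offset <= downstream:
            hits.append((start, match.group(1), offset))
    return hits

# Hypothetical 83-bp sequence with a single NGG site and a splice site at index 40.
example_seq = "A" * 30 + "GTACGTACGTACGTACGTAC" + "TGG" + "A" * 30
example_hits = guides_near_splice_site(example_seq, site=40, upstream=10, downstream=10)
```

A full design would also scan the reverse strand and filter guides for off-target matches; both steps are omitted here to keep the sketch short.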
In some cases, the method further comprises determining a functional profile of the RNA. A change in the expression of a genomic gene (coding or non-coding) or a change in the functional activity of its gene product (the encoded protein) can be used as an indicator of the regulatory function of the lncRNA. Alternatively, the coding sequence of a reporter gene may be inserted into the genome (e.g., in place of the native coding sequence), and changes in its expression or in the functional activity of its gene product may be used as an indicator of the functional profile of the RNA. In some cases, the coding sequence of the reporter gene is fused to the native coding sequence, and the indicator is the expression of the mRNA or of the resulting fusion protein, or the functional activity of the fusion protein.
In a particular aspect, the methods disclosed herein can be used to screen for and identify lncRNAs that are involved in cellular processes other than transcription, including, for example, cell survival, cell division, cell metabolism, apoptosis, the cell cycle, nucleosome assembly, signal transduction, development of multicellular organisms, immune response, cell adhesion, angiogenesis, and the like. In some embodiments, the methods can be used to identify lncRNAs that cause a change in a cellular process selected from the group consisting of: cell survival, cell division, cell metabolism, apoptosis, cell cycle, nucleosome assembly, signal transduction, development of multicellular organisms, immune response, cell adhesion, and angiogenesis. In some embodiments, the methods can be used to identify lncRNAs that cause a change in a cellular phenotype, such as loss of function or gain of function. In some embodiments, the methods can be used to identify lncRNAs that result in a decrease or increase in transcription of a coding gene and/or a non-coding gene. The methods can be used to identify the roles of one or more lncRNAs simultaneously or sequentially, to identify the function of an individual lncRNA, or to identify the function of multiple lncRNAs in different combinations.
For example, a population of cells is transfected with a CRISPR/Cas guide RNA library encoding guide RNAs of different sequences, each targeting a genomic sequence surrounding an lncRNA splice site; the guide RNAs are expressed in the cells and, in the presence of the CRISPR/Cas nuclease, induce exon skipping and/or intron retention in the lncRNAs. The RNA profile and transcriptome of each cell may be analyzed using, for example, but not limited to, single-cell RNA sequencing (RNA-Seq). The analysis reveals the effect of the genomic mutations on the RNA profile, including the type and abundance of RNA molecules. The methods can also be used to identify the nature (e.g., the sequence) of the guide RNA that achieves exon skipping and/or intron retention. Thus, the effect of exon skipping or intron retention can be observed directly across the entire cellular transcriptome in single-cell experiments.
The present invention provides CRISPR/Cas guide RNA constructs comprising a guide sequence that targets a genomic sequence surrounding an RNA splice site operably linked to a promoter and a hairpin sequence.
In some embodiments, the eukaryotic genome may be a human genome, and thus the CRISPR/Cas guide construct may be intended for use in a human cell.
The guide sequence may be 19-21 nucleotides in length. Hairpin sequences may be less than 100 nucleotides, less than 90, 80, 70, 60, 50, 40 or 30 nucleotides in length, for example about 20, 30, 40, 50, 60 nucleotides. In other embodiments, the hairpin sequence may be about 20-60 or 20-40 nucleotides in length. Once transcribed, the hairpin sequence can bind to the CRISPR/Cas nuclease.
The CRISPR/Cas guide construct is DNA in nature and when transcribed produces guide RNA.
The invention also provides a population of cells comprising any of the above host cells. The host cell population may be homogeneous or heterogeneous.
In some embodiments, the cell further comprises a CRISPR/Cas nuclease and/or a coding sequence for a CRISPR/Cas nuclease. In some embodiments, the cell further comprises a coding sequence for Cas9 nuclease and/or Cas9 nuclease.
In some embodiments, the coding sequence for the reporter protein or the fusion protein comprising the reporter protein is integrated into the genome of the host cell.
In some embodiments, the host cell is in a population of host cells, and each host cell independently comprises a specific guide RNA construct.
In some embodiments, each host cell expresses a specific functional guide RNA, and the genomic sequence mutated under the direction of that guide RNA differs from the sequences mutated in other host cells of the population.
The invention also provides a high-throughput method for screening or identifying essential genes in the genome of a eukaryotic cell, comprising introducing into a population of host cells a CRISPR/Cas guide RNA library targeting genomic sequences surrounding RNA splice sites, wherein each host cell in the population independently comprises and expresses a specific guide RNA which, in the presence of a CRISPR/Cas nuclease, leads to cleavage and mutation of the targeted genomic sequence and thereby causes exon skipping and/or intron retention in the RNA.
In some embodiments, the high-throughput method further comprises identifying the effect of the RNA on the phenotype of the cell or on the expression of a coding or non-coding gene. In some embodiments, each host cell expresses a specific guide RNA and is mutated in a different genomic sequence relative to the other host cells in the population. In some embodiments, the coding gene is exogenous or endogenous to the genome of the cell. In some embodiments, the alteration in the phenotype of the cell comprises a loss of function or a gain of function. In some embodiments, the change in expression of the coding gene or non-coding gene is an increase or decrease in transcription of the coding gene or non-coding gene.
The invention also provides a method for interfering with or eliminating RNA function in a eukaryotic cell, comprising introducing into the eukaryotic cell one or more CRISPR/Cas guide RNAs that target one or more polynucleotide sequences surrounding one or more splice sites of the RNA, whereby, in the presence of a Cas protein, the one or more polynucleotide sequences are cleaved, resulting in intron retention and/or exon skipping in the RNA and thus interfering with or eliminating the function of the RNA. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -50 bp to +75 bp around the SD site or SA site of the RNA. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -30 bp to +30 bp around the SD site or SA site of the RNA. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -10 bp to +10 bp around the SD site or SA site of the RNA. In some embodiments, the CRISPR/Cas nuclease is Cas9 or Cpf1. In some embodiments, the introduction into the cell is performed by a delivery system comprising a viral particle, a liposome, electroporation, microinjection, conjugation, a nanoparticle, an exosome, a microbubble, or a gene gun, preferably by a delivery system comprising a lentiviral particle.
Brief Description of Drawings
Figs. 1a-b. a, Genomic sequence features and base specificity of splice sites in humans. The y-axis indicates the probability of each base at each position. b, Schematic representation of intron retention or exon skipping induced by sgRNAs targeting sequences around splice donor (SD) or splice acceptor (SA) sites.
Figs. 2a-b. Correlation between replicate experiments in the sgRNA library screen of essential ribosomal genes. Scatter plots of normalized sgRNA read counts in the splice-targeting library, including day 0 control samples (Ctrl) and day 15 experimental samples (Exp), for the HeLa cell line (a) and the Huh7.5 cell line (b). The Spearman correlation (Spearman corr.) between the two replicates of each sample is also reported.
Fig. 3. Deep sequencing analysis of the CRISPR screen with the sgRNA library targeting ribosomal genes in HeLa and Huh7.5 cell lines. The sgRNA saturation mutagenesis library was designed to target the -50 bp to +75 bp region around the 5' SD site and the -75 bp to +50 bp region around the 3' SA site of 79 ribosomal genes. The pooled plasmid library was transduced by lentivirus into HeLa and Huh7.5 cells expressing Cas9 protein. The depletion of all sgRNAs at each indicated locus was calculated as log2(Exp:Ctrl) of the normalized read counts, and the black bars represent the average fold change of all sgRNAs at each locus. The dashed line indicates the position of the splice site.
Figs. 4a-c. Identification of the sgRNA target regions that produce splice site disruption. a, Normalized proportion of high-efficiency sgRNAs at each locus in HeLa and Huh7.5 cell lines, calculated by dividing the number of sgRNAs with more than 4-fold reduction by the total number of sgRNAs designed at the indicated locus. b, Comparison of high-efficiency sgRNAs targeting introns, 5' SD sites, and exons in HeLa and Huh7.5 cell lines. Each bar represents the percentage of sgRNAs with greater than 2-fold or 4-fold reduction in the different regions. Data are expressed as mean ± s.e.m. c, Comparison of high-efficiency sgRNAs targeting introns, 3' SA sites, and exons in HeLa and Huh7.5 cell lines. Data are expressed as mean ± s.e.m.
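The per-locus quantities described for Figs. 3 and 4a (the average log2 fold change and the fraction of sgRNAs with more than 4-fold reduction at each locus) could be computed along the following lines. This is only a sketch: the record layout, the function name, and the example values are hypothetical and are not data from the screen.

```python
import math
from collections import defaultdict

def summarize_by_locus(records, fold=4.0):
    """Summarize sgRNA depletion per locus.

    records: iterable of (offset_bp, log2_fold_change), where offset_bp is the
             position of the sgRNA target relative to the splice site.
    fold:    reduction cutoff; an sgRNA counts as high-efficiency when its
             log2 fold change is below -log2(fold).
    Returns {offset_bp: {"mean_lfc": float, "fraction_effective": float}}.
    """
    threshold = -math.log2(fold)
    grouped = defaultdict(list)
    for offset, lfc in records:
        grouped[offset].append(lfc)
    summary = {}
    for offset, values in sorted(grouped.items()):
        effective = sum(1 for v in values if v < threshold)
        summary[offset] = {
            "mean_lfc": sum(values) / len(values),
            "fraction_effective": effective / len(values),
        }
    return summary

# Hypothetical log2(Exp:Ctrl) values for sgRNAs at two loci.
records = [(-1, -3.0), (-1, -2.5), (-1, -0.5), (-1, -1.0), (40, -0.2), (40, 0.1)]
locus_summary = summarize_by_locus(records)
```

With these made-up values, half of the sgRNAs at offset -1 pass the 4-fold cutoff and none at offset +40 do, which is the kind of position dependence the figures are described as showing.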
Detailed Description
Definitions
The invention is described on the basis of specific embodiments and with reference to the attached drawings, but it is not limited thereto; the scope of protection is defined by the claims. Any reference signs in the claims shall not be construed as limiting the scope. In the drawings, the size of some elements may be exaggerated and not drawn to scale for illustrative purposes. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or steps. Where an indefinite or definite article is used with a singular noun, e.g., "a," "an," or "the," this includes the plural of that noun unless otherwise indicated.
The following terms and definitions are provided to aid in the understanding of the present invention. Unless specifically defined herein, all terms used herein have the meaning commonly understood by one skilled in the art to which this invention pertains. For definitions and nomenclature of the art, practitioners are referred in particular to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, New York (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999). The definitions provided herein should not be construed to have a scope narrower than that understood by those of skill in the art.
The terms "polynucleotide", "nucleotide sequence", "nucleic acid" and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, which may be deoxyribonucleotides or ribonucleotides, or analogs thereof. A polynucleotide may have any three-dimensional structure and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, a locus, an exon, an intron, messenger RNA (mRNA), long non-coding RNA (lncRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short hairpin RNA (shRNA), microRNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure can be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. The polynucleotide may be further modified after polymerization, such as by conjugation to a labeling component.
In one aspect of the invention, the terms "chimeric RNA," "chimeric guide RNA," "single guide RNA," and "synthetic guide RNA" are used interchangeably and refer to a polynucleotide sequence comprising a guide sequence, a tracr sequence, and a tracr partner sequence. The term "guide sequence" refers to the sequence of about 20 bp within a guide RNA that specifies the target site, and may be used interchangeably with the terms "guide" or "spacer".
As used herein, "expression" refers to the process of transcription of a polynucleotide from a DNA template (e.g., into mRNA or other RNA transcript) and/or the subsequent translation of the transcribed mRNA into a peptide, polypeptide, or protein. The transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of mRNA in eukaryotic cells.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F.M. Ausubel, et al. eds., (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL, and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)) [14-18].
Several aspects of the invention relate to a vector system comprising one or more vectors, or to such vectors themselves. Vectors can be designed for expression of CRISPR transcripts (e.g., nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, CRISPR transcripts can be expressed in bacterial cells such as Escherichia coli, insect cells, yeast cells, or mammalian cells. Suitable host cells are described in detail in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) [19]. Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example, using T7 promoter regulatory sequences and T7 polymerase.
In some embodiments, a mammalian expression vector is used, which is capable of driving expression of one or more sequences in a mammalian cell. Examples of mammalian expression vectors include pCDM8 [20] and pMT2PC [21]. When used in mammalian cells, the control functions of the expression vector are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma virus, adenovirus 2, cytomegalovirus, simian virus 40, and other promoters disclosed herein and known in the art. Other suitable expression systems for use in both prokaryotic and eukaryotic cells are described, for example, in Chapters 16 and 17 of Sambrook, et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989 [14].
In general, "CRISPR system" collectively refers to the transcripts and other elements involved in the expression of, or in directing the activity of, CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, tracr (trans-activating CRISPR) sequences (e.g., tracrRNA or an active partial tracrRNA), tracr partner sequences (encompassing "direct repeats" and tracrRNA-processed partial direct repeats in the context of an endogenous CRISPR system), guide sequences (also referred to as "spacers" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. In some embodiments, the one or more elements of the CRISPR system are derived from a type I, type II or type III CRISPR system.
In the context of forming a CRISPR complex, a "target sequence" refers to a sequence for which a guide sequence is designed to have complementarity, wherein hybridization between the target sequence and the guide sequence promotes formation of the CRISPR complex. Complete complementarity is not necessary provided that there is sufficient complementarity to cause hybridization and promote CRISPR complex formation.
Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (comprising a guide sequence hybridized to a target sequence and complexed with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Without wishing to be bound by theory, the tracr sequence may comprise or consist of all or a portion of a wild-type tracr sequence (e.g., about or greater than about 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65, 70, 75, 80, 85 or more nucleotides of a wild-type tracr sequence) and may also form part of a CRISPR complex, e.g., by hybridizing along at least a portion of the tracr sequence to all or a portion of a tracr partner sequence that is operably linked to the guide sequence.
In some embodiments, the tracr sequence is sufficiently complementary to the tracr partner sequence to hybridize and participate in the formation of a CRISPR complex. As with the target sequence, complete complementarity is not necessary, as long as it is sufficient for function. In some embodiments, the tracr sequence has at least 50%, 60%, 70%, 80%, 90%, 95%, or 99% complementarity along the length of the tracr partner sequence when optimally aligned.
In some embodiments, one or more vectors that drive expression of one or more elements of the CRISPR system are introduced into a host cell such that expression of the CRISPR system elements directs formation of CRISPR complexes at one or more target sites. In another embodiment, the host cell is engineered to stably express Cas9 and/or OCT 1.
In general, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target polynucleotide sequence to hybridize to the target sequence and direct sequence-specific binding of the CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99% or more when optimally aligned using an appropriate alignment algorithm. Optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, Burrows-Wheeler Transform-based algorithms (e.g., the Burrows-Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, the guide sequence is about or greater than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 60, 65, 70, 75 or more nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a CRISPR complex to a target sequence may be evaluated by any suitable assay. For example, components of the CRISPR system (including the guide sequence to be tested) sufficient to form a CRISPR complex can be provided to a host cell having the corresponding target sequence, such as by transfection with a vector encoding the CRISPR sequence components, followed by assessment of preferential cleavage within the target sequence (such as by the Surveyor assay as described herein).
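By way of non-limiting illustration only, the degree of complementarity discussed above can be sketched in Python as follows. The sketch assumes a simplified ungapped, equal-length comparison rather than an optimal alignment by one of the algorithms listed above; the function name `percent_complementarity` and the pairing table are hypothetical and are not part of the described method.

```python
# Watson-Crick partners for DNA bases (hypothetical helper table).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percentage of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3'. Pairing is antiparallel, so the
    target strand is reversed before the position-by-position comparison.
    Assumption: ungapped comparison of equal-length sequences only.
    """
    guide = guide.upper().replace("U", "T")  # treat an RNA guide as DNA letters
    target = target_strand.upper()[::-1]
    if len(guide) != len(target):
        raise ValueError("sequences must have equal length")
    matches = sum(1 for g, t in zip(guide, target) if PAIR.get(g) == t)
    return 100.0 * matches / len(guide)
```

A guide would then be compared against a threshold such as those listed above, e.g. `percent_complementarity(guide, target) >= 90`.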
Likewise, cleavage of a target polynucleotide sequence can be assessed in a test tube by providing the target sequence and the components of a CRISPR complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing the rate of binding or cleavage of the target sequence between the test and control guide sequence reactions. Other assays are possible and would be known to those skilled in the art.
In some embodiments, the CRISPR enzyme is part of a fusion protein comprising one or more heterologous protein domains (e.g., about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more domains in addition to the CRISPR enzyme). The CRISPR enzyme fusion protein can comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that can be fused to a CRISPR enzyme include, but are not limited to, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, RNA cleavage activity and nucleic acid binding activity.
In some aspects, the invention provides methods comprising delivering to a host cell one or more polynucleotides, e.g., one or more constructs, e.g., vectors, one or more transcripts thereof and/or one or more proteins transcribed therefrom, as described herein. The invention can be used as a basic platform for targeted modification of DNA-based genomes. It can interface with any delivery system including, but not limited to, viruses, liposomes, electroporation, microinjection, and conjugation. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced by such cells. In some embodiments, the CRISPR enzyme in combination (and optionally complexed) with the guide sequence is delivered to a cell. Nucleic acids can be introduced into mammalian cells or target tissues using conventional viral and non-viral based gene transfer methods. Such methods can be used to administer nucleic acids encoding CRISPR system components to cells in a culture medium or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., vector transcripts as described herein), naked nucleic acids, and nucleic acids complexed with a delivery vehicle, such as liposomes. Viral vector delivery systems include DNA and RNA viruses, which have an episomal or integrated genome for delivery to cells.
Non-viral delivery methods for nucleic acids include lipofection, nucleofection, microinjection, gene guns, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, and artificial virions.
RNA or DNA virus-based systems for delivering nucleic acids take advantage of the high efficiency with which viruses target specific cells of the body and transport the viral payload to the nucleus.
The term "exon" as used herein refers to any portion of a gene that will encode a portion of the final mature RNA (produced by the gene after intron removal by RNA splicing). The term exon refers to the DNA sequence within a gene as well as the corresponding sequence in an RNA transcript. In RNA splicing, introns are removed and exons are covalently joined to each other as part of the generation of mature messenger RNA.
An "intron" is any nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final RNA product. The term intron refers to the DNA sequence within a gene and the corresponding sequence in an RNA transcript. The sequences that are joined together in the final mature RNA after RNA splicing are exons. Introns are found in the genes of most organisms and of many viruses, and are present in many kinds of genes, including those that produce proteins, ribosomal RNA (rRNA), long non-coding RNA (lncRNA), and transfer RNA (tRNA). When proteins are produced from genes containing introns, RNA splicing occurs as part of the post-transcriptional RNA processing pathway and precedes translation.
The term "splicing" as used herein means the editing of a nascent precursor messenger RNA (pre-mRNA) transcript into mature messenger RNA (mRNA). For most eukaryotic introns, splicing is performed in a series of reactions catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). Spliceosomal introns are usually located within the sequences of eukaryotic protein-coding genes. Within an intron, a donor site (5' end of the intron), a branch site (near the 3' end of the intron), and an acceptor site (3' end of the intron) are essential for splicing. The splice donor (SD) site includes an almost invariant GT sequence at the 5' end of the intron, within a larger, less conserved region. The splice acceptor (SA) site at the 3' end of the intron terminates the intron with an almost invariant AG sequence. Upstream (5' direction) of the AG there is a region rich in pyrimidines (C and T), the polypyrimidine tract. Further upstream of the polypyrimidine tract is the branch point, which includes an adenine nucleotide involved in lariat formation [22,23].
Nuclear pre-mRNA introns are characterized by specific sequences located at the intron-exon boundaries. These sequences are recognized by spliceosomal RNA molecules when the splicing reaction is initiated. The major spliceosome splices introns that contain GT at the 5' splice site and AG at the 3' splice site; this type of splicing is referred to as canonical splicing or the lariat pathway and accounts for more than 99% of splicing. In contrast, when the intron flanking sequences do not follow the GT-AG rule, splicing is said to be non-canonical, which accounts for less than 1% of splicing [24].
Our bioinformatic analysis using the WebLogo3 tool showed that about 99% of the intron regions in the human genome are flanked by GT at the 5' site and AG at the 3' site. This holds for the introns of both coding genes and non-coding RNAs.
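By way of non-limiting illustration only, the GT-AG rule described above can be checked with a short Python sketch. It assumes intron sequences are supplied as DNA strings written 5' to 3'; the function names and the example sequences are hypothetical and do not reproduce the WebLogo3 analysis itself.

```python
def is_canonical_intron(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG (DNA, 5'->3')."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def canonical_fraction(introns) -> float:
    """Fraction of the supplied introns that follow the GT-AG rule."""
    introns = list(introns)
    if not introns:
        return 0.0
    return sum(is_canonical_intron(i) for i in introns) / len(introns)


# Hypothetical toy input: two GT-AG introns and one non-canonical intron.
example = ["GTAAGTCCTTTTCAG", "GTGAGTTTCCCTAG", "GCAAGTCCTTTCAG"]
fraction = canonical_fraction(example)
```

Applied to a genome-wide intron annotation, a fraction close to the roughly 99% stated above would be expected.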
Exon skipping is a form of RNA splicing in which one or more exons are skipped and thus omitted from the final RNA, while intron retention is a form of RNA splicing in which introns remain in the final RNA after splicing.
Splicing is regulated by trans-acting proteins (repressors and activators) and corresponding cis-acting regulatory sites (silencers and enhancers) on the pre-mRNA. However, as part of the complexity of alternative splicing, it should be noted that the effects of splicing factors are often position-dependent. That is, a splicing factor that functions as a splicing activator when bound to an intronic enhancer element may function as a repressor when bound to its splicing element in the context of an exon, and vice versa [25]. The secondary structure of pre-mRNA transcripts also plays a role in regulating splicing, such as by bringing splicing elements together or by masking a sequence that, if unmasked, would function as a binding element for splicing factors [26]. Together, these elements form the "splicing code" that controls how splicing occurs under different cellular conditions [27].
Modification of genes in eukaryotic cells
The methods of the invention involve efficient delivery of sgRNAs targeting splice sites to produce exon skipping and/or intron retention and thereby disrupt genes, including, for example, coding or non-coding genes. For a gene encoding an lncRNA, the method can effectively affect the function of the lncRNA.
To assess the efficacy of splicing targeting in CRISPR screening, we designed a saturated library targeting the splice sites of 79 ribosomal genes, most of which are essential for cell growth in various cell lines. The library contained 5,788 sgRNAs with cleavage sites within -50-bp to +75-bp around each 5' SD (splice donor) site and -50-bp to +75-bp around each 3' SA (splice acceptor) site of the 79 genes. Clearly, sgRNAs that affect the splice site are superior to sgRNAs that target only the exon region, and the closer the sgRNA cleavage site is to the splice site, the better its gene disruption effect, with the peak slightly toward the exon in both the SD and SA cases.
CRISPR/Cas9 action mechanism and library screening principle
The methods of the invention utilize CRISPR/Cas systems. Cas9 is from a microbial type II CRISPR (clustered regularly interspaced short palindromic repeats) system and has been shown to cleave DNA when paired with a single guide RNA (gRNA). The gRNA contains a 17-21 bp sequence that directs Cas9 to a complementary region in the genome, thus allowing the specific generation of a double-strand break (DSB) that is repaired in an error-prone manner by the cellular non-homologous end joining (NHEJ) mechanism. Cas9 primarily cleaves genomic sites at which the gRNA target sequence is followed by a PAM sequence (-NGG). NHEJ-mediated repair of Cas9-induced DSBs introduces a broad range of mutations at the cleavage site, which are typically small (<10 bp) insertions/deletions (indels) but may include larger (>100 bp) indels and single base changes.
The splicing targeting methods of the invention can be used to screen multiple (e.g., thousands of) sequences in a genome, thereby elucidating the function of those sequences. In some embodiments, the splicing targeting methods of the invention involve high-throughput screening of non-coding RNAs using the CRISPR/Cas9 system to identify genes required for survival, proliferation, or drug resistance, among others. In screening, gRNAs targeting tens of thousands of splice sites within the genes of interest are co-delivered as a pool with Cas9 into target cells, e.g., by lentiviral vectors. By identifying gRNAs that are enriched or depleted in cells after selection for a desired phenotype, genes required for that phenotype can be systematically identified.
In the high-throughput CRISPR/Cas9-based approach described above, the gRNA library can be cloned into a lentiviral vector. In this case, it is desirable to use a low multiplicity of infection (MOI) to limit the number of guide RNAs in a single cell, typically to a single guide RNA per cell. Integration of gRNAs into cells is random, which allows pooled screening in which only one gRNA is expressed per cell. Notably, the genome-scale, high-throughput gRNA-based screening that targets splice sites according to the present invention can also be used for other CRISPR-based high-throughput screens of coding genes and regulatory genes.
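By way of non-limiting illustration only, the effect of a low MOI on the number of guide RNAs per cell can be sketched in Python. The sketch assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI; this statistical model and the function names are assumptions made for illustration and are not stated in the present description.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k integrations (Poisson model)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def single_integrant_fraction(moi: float) -> float:
    """Among infected cells (at least one integration), the fraction with exactly one."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    return p1 / (1.0 - p0)


# Under this model, at an MOI of 0.3 about 86% of infected cells carry one gRNA.
fraction_at_0_3 = single_integrant_fraction(0.3)
```

Lowering the MOI raises this fraction further, at the cost of infecting a smaller share of the cell population.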
Guide RNA
As known in the art, CRISPR/Cas system nucleases require a guide RNA to cleave genomic DNA. These guide RNAs consist of: (1) a spacer (guide sequence) of 19-21 nucleotides of variable sequence that targets the CRISPR/Cas system nuclease to a genomic location in a sequence-specific manner, and (2) a hairpin sequence that is invariant between guide RNAs and allows the guide RNA to bind to the CRISPR/Cas system nuclease. In the presence of a CRISPR/Cas nuclease, the guide RNA triggers a CRISPR/Cas-based genome cleavage event in the cell.
A guide sequence is selected or designed based on the intended target sequence. In some embodiments, the target sequence is a sequence surrounding a splice site, e.g., a region of -50-bp to +75-bp around the SD site of a gene encoding an lncRNA within the genome of a cell, preferably a region of -30-bp to +30-bp around the SD site, and most preferably a region of -10-bp to +10-bp around the SD site; or a region of -50-bp to +75-bp around the SA site, preferably a region of -30-bp to +30-bp around the SA site, and most preferably a region of -10-bp to +10-bp around the SA site. Exemplary target sequences include those sequences that are unique in the target genome.
For example, for Streptococcus pyogenes (S. pyogenes) Cas9, a unique target sequence in the genome may include a Cas9 target site of the form M8N12XGG, where N12XGG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. The unique target sequence in the genome may include an S. pyogenes Cas9 target site of the form M9N11XGG, where N11XGG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome.
For Streptococcus thermophilus (S. thermophilus) CRISPR1 Cas9, the unique target sequence in the genome may include a Cas9 target site of the form M8N12XXAGAAW, where N12XXAGAAW (N is A, G, T or C; X may be any nucleotide; and W is A or T) has a single occurrence in the genome. The unique target sequence in the genome may include an S. thermophilus CRISPR1 Cas9 target site of the form M9N11XXAGAAW, where N11XXAGAAW (N is A, G, T or C; X may be any nucleotide; and W is A or T) has a single occurrence in the genome.
For S. pyogenes Cas9, the unique target sequence in the genome may include a target site of the form M8N12XGGXG, where N12XGGXG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. The unique target sequence in the genome may include an S. pyogenes Cas9 target site of the form M9N11XGGXG, where N11XGGXG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. In each of these sequences, "M" may be A, G, T or C and need not be considered when identifying a sequence as unique.
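By way of non-limiting illustration only, the search for target sites of the form M8N12XGG whose N12XGG portion occurs only once can be sketched in Python. The sketch assumes a small DNA string held in memory, scans both strands, and treats two sites as the same occurrence when their 12-nucleotide N12 portions are identical; the function names, the regular expressions and this uniqueness key are hypothetical choices, and a real genome-wide search would require an indexed aligner.

```python
import re
from collections import Counter

# Lookaheads keep overlapping hits; X is any base, followed by the GG of the PAM.
FULL_SITE = re.compile(r"(?=([ACGT]{8})([ACGT]{12})[ACGT]GG)")
SEED_SITE = re.compile(r"(?=([ACGT]{12})[ACGT]GG)")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def unique_sites(genome: str):
    """Return (strand, start, seed) for M8N12XGG sites whose N12 seed,
    when followed by XGG, occurs exactly once across both strands.
    Start positions on the "-" strand refer to the reverse-complemented string.
    """
    genome = genome.upper()
    strands = (("+", genome), ("-", revcomp(genome)))
    counts = Counter()
    for _, seq in strands:
        counts.update(m.group(1) for m in SEED_SITE.finditer(seq))
    hits = []
    for strand, seq in strands:
        for m in FULL_SITE.finditer(seq):
            if counts[m.group(2)] == 1:
                hits.append((strand, m.start(), m.group(2)))
    return hits
```

The leading M bases are matched but, consistent with the text above, are not considered when deciding uniqueness.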
It is to be understood that any hairpin sequence can be used as long as it can be recognized and bound by a CRISPR/Cas nuclease.
Guide RNA constructs
In some embodiments, the invention relates to a guide RNA construct. The guide RNA construct may comprise (1) a guide sequence and (2) a guide RNA hairpin sequence, and optionally (3) a promoter sequence capable of initiating transcription of the guide RNA. A non-limiting example of a guide RNA hairpin sequence is described in Chen et al., Cell. 2013 Dec 19; 155(7): 1479-91. An example of a promoter is the human U6 promoter.
In some embodiments, the present invention relates to a CRISPR/Cas guide construct comprising (1) a guide sequence and (2) a guide RNA hairpin sequence, and optionally (3) a promoter sequence capable of initiating transcription of the guide RNA, wherein the guide sequence targets a sequence surrounding a splice site in the genome of a eukaryotic cell, e.g., the guide sequence targets a region of -50-bp to +75-bp, preferably a region of -30-bp to +30-bp, and most preferably a region of -10-bp to +10-bp surrounding an SD site or SA site of a gene encoding an lncRNA. In some embodiments, the guide sequence targets a splice site of a gene encoding an RNA in the genome of a eukaryotic cell to induce exon skipping and/or intron retention, and thereby disrupt the RNA. In some embodiments, the eukaryotic cell genome is a human genome. In some embodiments, the guide sequence is 19-21 nucleotides in length. In some embodiments, the hairpin sequence is about 40 nucleotides in length and, once transcribed, can bind to the CRISPR/Cas nuclease.
CRISPR/Cas system nucleases
In some embodiments, the CRISPR/Cas nuclease is a type II CRISPR/Cas nuclease. In some embodiments, the CRISPR/Cas nuclease is Cas9 nuclease. In some embodiments, the Cas9 nuclease is Streptococcus pneumoniae, Streptococcus pyogenes, or Streptococcus thermophilus Cas9, and may include a mutated Cas9 derived from these organisms. The nuclease may be a functionally equivalent variant of Cas9. In some embodiments, the CRISPR/Cas nuclease is codon optimized for expression in a eukaryotic cell. In some embodiments, the CRISPR/Cas nuclease directs cleavage of one or both strands at a target sequence position. CRISPR/Cas system nucleases include, but are not limited to, Cas9 and Cpf1.
Reporter genes and proteins, and readouts
In some embodiments, the reporter gene can be integrated into the cell using a CRISPR/Cas mechanism. For example, expression vectors, such as plasmids, comprising a promoter (e.g., the U6 promoter), a guide RNA hairpin sequence, and a guide sequence to target a desired genomic locus into which the reporter construct is integrated can be used. Such expression vectors may be prepared by cloning the guide sequences into an expression construct containing additional elements. A DNA fragment comprising a reporter coding sequence can be prepared and subsequently modified to include homology arms that flank the reporter coding sequence. A guide RNA expression vector, an amplified DNA fragment comprising a sequence encoding a reporter protein, and a CRISPR/Cas nuclease (or nuclease-encoding expression vector) are introduced into a host cell (e.g., via electroporation). The expression vector may further comprise additional selectable markers such as antibiotic resistance markers to enrich for cells successfully infected with the expression vector. Cells expressing the reporter protein may be further selected.
Reporter genes encoding readily assayable proteins are known in the art and include, but are not limited to, Green Fluorescent Protein (GFP), Glutathione S Transferase (GST), horseradish peroxidase (HRP), Chloramphenicol Acetyltransferase (CAT), β-galactosidase, β-glucuronidase, luciferase, HcRed, DsRed, Cyan Fluorescent Protein (CFP), Yellow Fluorescent Protein (YFP), and auto-fluorescent proteins including Blue Fluorescent Protein (BFP), cell surface markers, antibiotic resistance genes such as neo, and the like.
Expression vector
The term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it is linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends or no free ends (e.g., circular); nucleic acid molecules comprising DNA, RNA, or both; and various other polynucleotides known in the art. One type of vector is a "plasmid," which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Some vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell upon introduction into the host cell and are thereby replicated together with the host genome. In addition, some vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors". Expression vectors in recombinant DNA technology often take the form of plasmids.
A recombinant expression vector may comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector comprises one or more regulatory elements, which may be selected on the basis of the host cell used for expression, operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
Host cell
Virtually any eukaryotic cell type can be used as a host cell, so long as it can be cultured in vitro and modified as described herein. Preferably, the host cell is an established cell line. The host cell and cell line may be a human cell or cell line, or a non-human mammalian cell or cell line.
Examples
Materials and methods
1. Cells and reagents
The HeLa cell line from the Z. Jiang laboratory (Peking University) was cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco C11995500BT). The Huh7.5 cell line from the S. Cohen laboratory (Stanford University School of Medicine) was cultured in DMEM (Gibco) supplemented with 1% MEM non-essential amino acids (NEAA, Gibco 1140-). K562 cells from the H. Wu laboratory (Peking University) and GM12878 cells from the Coriell cell bank were cultured in RPMI 1640 medium (Gibco 11875-093). All media were supplemented with 10% fetal bovine serum (FBS, CellMax BL102-02) and 1% penicillin/streptomycin, and cells were cultured at 37 °C in 5% CO2.
2. Reverse transcription PCR (RT-PCR) for testing intron retention or exon skipping
sgRNAs were cloned into a lentiviral expression vector carrying a CMV promoter-driven mCherry marker and then transduced into HeLaOC cells [4] by viral infection at an MOI of <1. 72 hours post infection, mCherry-positive cells were sorted by FACS, and total RNA was extracted from each sample using the RNAprep Pure Cell/Bacteria Kit (TIANGEN DP430). cDNA was synthesized from 2 μg of total RNA using the Quantscript RT Kit (TIANGEN KR103-04), and the RT-PCR reaction was performed using TransTaq HiFi DNA polymerase (TransGen AP131-13).
sgRNA sequences targeting the RPL18 or RPL11 gene:
sgRNA1-RPL18: 5'-GGACCAGCCACTCACCATCC (SEQ ID No. 1)
sgRNA2-RPL18: 5'-AGCTTCATCTTCCGGATCTT (SEQ ID No. 2)
sgRNA3-RPL11: 5'-TCCTTGTGACTACTCACCTT (SEQ ID No. 3)
sgRNA4-RPL11: 5'-AACTCATACTCCCGCACCTG (SEQ ID No. 4)
Primers used for RT-PCR:
1F: 5'-CTGGGTCTTGTCTGTCTGGAA (SEQ ID No. 5);
1R: 5'-CTGGTGTTTACATTCAGCCCC (SEQ ID No. 6);
2F: 5'-GGCCAGAAGAACCAACTCCA (SEQ ID No. 7);
2R: 5'-GACAGTGCCACAGCCCTTAG (SEQ ID No. 8);
3F: 5'-TCAAGATGGCGTGTGGGATT (SEQ ID No. 9);
3R: 5'-GACCAGCAAATGGTGAAGCC (SEQ ID No. 10);
4F: 5'-GATCCTTTGGCATCCGGAGA (SEQ ID No. 11);
4R: 5'-GCTGATTCTGTGTTTGGCCC (SEQ ID No. 12).
3. Construction and screening of an sgRNA library for splicing targeting of essential ribosomal genes
79 ribosomal genes were retrieved from NCBI. We scanned all potential sgRNAs targeting these 79 genes within -50-bp to +75-bp around each 5' SD site and within -75-bp to +50-bp around each 3' SA site. The genes include:
RPL10, RPL10A, RPL11, RPL12, RPL13, RPL13A, RPL14, RPL15, RPL17, RPL18, RPL18A, RPL19, RPL21, RPL22, RPL22L1, RPL23, RPL23A, RPL24, RPL26, RPL26L1, RPL27, and further RPL- and RPS-family genes.
We ensured that all sgRNAs have at least 2 mismatches with any other locus of the human genome. To reflect the natural cleavage efficiency of the sgRNAs in the library, GC content was not considered in the design. A total of 5,788 sgRNAs targeting the 79 ribosomal genes were synthesized using a CustomArray 12K array chip (CustomArray, Inc.). The design of the sgRNAs is illustrated here using the RPL18 gene as an example among the 79 ribosomal genes.
Cell libraries containing these sgRNAs were constructed by lentiviral delivery at an MOI of <0.3 in HeLa and Huh7.5 cells expressing Cas9 [28], with a minimum coverage of 400×. 72 hours after viral infection, the cells were sorted by FACS (BD) for mCherry+ cells. For each library, control cells (2.4 × 10^6) were collected for genomic DNA extraction using the DNeasy Blood and Tissue Kit (QIAGEN 69506), and the experimental cells were cultured continuously for 15 days before genomic DNA extraction. For each replicate, the lentivirally integrated sgRNA coding region was PCR amplified with TransTaq HiFi DNA polymerase (TransGen AP131-13) and further purified with DNA Clean & Concentrator-25 (Zymo Research Corporation D4034) as described previously [4,9]. The resulting library was prepared for high-throughput sequencing analysis (Illumina HiSeq 2500) using the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB E7370L).
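By way of non-limiting illustration only, the relationship between library size, coverage and cell number can be sketched in Python. The sketch assumes that coverage means the average number of transduced cells per sgRNA and that infection follows a Poisson model; both assumptions and the function names are hypothetical and are not taken from the present description.

```python
import math


def cells_for_coverage(n_sgrnas: int, coverage: int) -> int:
    """Transduced cells needed so that each sgRNA is carried by `coverage` cells on average."""
    return n_sgrnas * coverage


def cells_to_infect(n_sgrnas: int, coverage: int, moi: float) -> int:
    """Cells to expose to virus, assuming a fraction 1 - exp(-moi) becomes infected."""
    infected_fraction = 1.0 - math.exp(-moi)
    return math.ceil(cells_for_coverage(n_sgrnas, coverage) / infected_fraction)


# 5,788 sgRNAs at 400x coverage correspond to 2,315,200 transduced cells.
transduced = cells_for_coverage(5788, 400)
exposed = cells_to_infect(5788, 400, 0.3)
```

Under these assumptions, the 400× figure for the 5,788-sgRNA library is of the same order as the 2.4 × 10^6 control cells collected per library.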
4. Design and construction of genome-scale human lncRNA library
lncRNAs were retrieved from the GENCODE dataset V20, which contains 14,470 lncRNAs. In a first filtering step, 2,477 lncRNAs without splice sites were removed from this dataset. For the remaining lncRNAs, all potential 20-nt sgRNAs targeting the -10-bp to +10-bp region around each 5' SD site and 3' SA site were designed. To ensure cleavage efficiency and specificity, we retained only sgRNAs with at least 2 mismatches to other loci in the genome and with GC content between 20% and 80%, and removed sgRNAs containing 4-bp T homopolymers. For optimal coverage, some sgRNAs with 1-bp or 0-bp mismatches to other loci were retained, as long as they did not target any essential gene of the K562 cell line [15] and the total number of such off-target sites was less than 2. Finally, a total of 126,773 sgRNAs targeting 10,996 lncRNAs were synthesized. In this library, we also included 500 sgRNAs that do not target the human genome as negative controls, and 350 sgRNAs targeting 36 essential ribosomal genes as positive controls. Oligonucleotides were synthesized using a CustomArray 90K array chip (CustomArray, Inc.), and library construction was as described above.
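By way of non-limiting illustration only, the sgRNA filters described above (GC content between 20% and 80%, no run of four T nucleotides, and at least 2 mismatches to other loci) can be sketched in Python. The sketch assumes that candidate off-target sequences of equal length are already available as a list; in practice they would come from a genome-wide alignment, and all function names here are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def has_t_homopolymer(seq: str, run: int = 4) -> bool:
    """True if the sequence contains `run` or more consecutive T bases."""
    return "T" * run in seq.upper()


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def passes_filters(sgrna: str, off_targets, min_mismatches: int = 2) -> bool:
    """Apply the GC, T-homopolymer and mismatch filters to one sgRNA."""
    if not 0.20 <= gc_fraction(sgrna) <= 0.80:
        return False
    if has_t_homopolymer(sgrna):
        return False
    return all(hamming(sgrna, other) >= min_mismatches for other in off_targets)
```

The exception for retained sgRNAs with 0-bp or 1-bp mismatches is not modeled here, since it depends on an external list of essential genes.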
5. Computational analysis of screens
Sequencing reads were decoded with in-house scripts. From the fold change of the reads of each sgRNA between the experimental group and the reference group in the deep sequencing results, we plotted the trend of sgRNA abundance after screening. For each site, the log value of the ratio of the day-15 experimental group to the day-0 reference group, taken over all sgRNAs designed for that site, indicates the magnitude of the decrease (dropout) in the sgRNAs present at that site. This magnitude can be defined as the signal-to-noise ratio of the negative screen.
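By way of non-limiting illustration only, the fold-change calculation described above can be sketched in Python. The sketch assumes read counts are held in dictionaries keyed by sgRNA, normalizes by total reads, adds a pseudocount of 1, uses a base-2 logarithm, and summarizes each site by the mean over its sgRNAs; the present description does not specify these choices, and the function names are hypothetical.

```python
import math


def log2_fold_changes(ctrl_counts, exp_counts, pseudocount: float = 1.0):
    """log2 of (experimental / control) read fraction for each sgRNA."""
    ctrl_total = sum(ctrl_counts.values())
    exp_total = sum(exp_counts.values())
    lfc = {}
    for sgrna, ctrl in ctrl_counts.items():
        exp = exp_counts.get(sgrna, 0)
        ctrl_norm = (ctrl + pseudocount) / ctrl_total
        exp_norm = (exp + pseudocount) / exp_total
        lfc[sgrna] = math.log2(exp_norm / ctrl_norm)
    return lfc


def site_dropout(lfc, site_of):
    """Mean log2 fold change of all sgRNAs assigned to each site."""
    per_site = {}
    for sgrna, value in lfc.items():
        per_site.setdefault(site_of[sgrna], []).append(value)
    return {site: sum(vals) / len(vals) for site, vals in per_site.items()}
```

A strongly negative value for a site indicates that its sgRNAs were depleted between day 0 and day 15.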
Results
Consistent with common general knowledge, conserved sequences form the splice sites, and our bioinformatic analysis using the WebLogo3 tool [33] showed that about 99% of the intron regions in the human genome are flanked by GT at the 5' splice donor (SD) site and AG at the 3' splice acceptor (SA) site. Notably, the AG sequence is present mainly as the last two bases of the exon immediately upstream of the SD site (Fig. 1a). To confirm the effectiveness of sgRNAs in generating exon skipping and/or intron retention, we designed sgRNAs that target the SD or SA sites of two ribosomal genes, RPL18 and RPL11, which are essential for cell growth and proliferation. In HeLa cells stably expressing the Cas9 and OCT1 genes [4], sgRNA1-RPL18 targeting the SD site and sgRNA2-RPL18 targeting the SA site generated intron 3 retention and exon 4 skipping, respectively, at the RPL18 locus in the genome, as confirmed by both reverse transcription PCR (RT-PCR) and Sanger sequencing analysis.
The same results were obtained from similar experiments on the RPL11 gene, where sgRNA3-RPL11 and sgRNA4-RPL11 generated intron 2 retention and exon 4 skipping, respectively, at the RPL11 locus. Fig. 1b shows a schematic diagram of intron retention and exon skipping induced by sgRNAs targeting either the splice donor (SD) or splice acceptor (SA) site.
To further assess the efficacy of splicing targeting in CRISPR screens, we designed a saturated library targeting the splice sites of 79 ribosomal genes essential for cell growth in various cell lines [29]. The library contained 5,788 sgRNAs with cleavage sites within -50-bp to +75-bp around each 5' SD site and within -75-bp to +50-bp around each 3' SA site of the 79 genes; see Table 1 for examples of sgRNAs.
Cell libraries containing these sgRNAs were constructed by lentiviral delivery at an MOI (multiplicity of infection) of <0.3 in HeLa cells and Huh7.5 cells expressing Cas9. Screening was performed by culturing the library cells for up to 15 days, and sgRNAs that caused a decrease in cell viability were identified by NGS analysis.
By calculating the fold change of sgRNAs between the 15-day experimental samples (Exp) and the control samples (Ctrl), we ranked all sgRNAs and arranged them according to the distance (in base pairs) between the sgRNA cleavage site and its corresponding SD or SA site. Spearman correlation between the Ctrl and Exp biological replicates in both HeLa and Huh7.5 cells showed that all results were highly reproducible (Fig. 2). To demonstrate the effectiveness of splicing targeting for gene disruption, we combined all data targeting the SD site and all data targeting the SA site and arranged them according to their physical distance from the SD or SA site (Fig. 3 and Fig. 1d). It is evident that sgRNAs affecting splice sites are superior to those targeting exon regions only, in both HeLa and Huh7.5 cells. The closer the cleavage site of the sgRNA is to the splice site, the better its effect on gene disruption, with the peak slightly toward the exon in both the SD and SA cases (Fig. 3, Fig. 1d). In contrast, most sgRNAs targeting introns were rarely depleted during the screening process, indicating that they had little effect on gene disruption and thus on the cell viability that depends on the function of the gene. The only exceptions were sgRNAs targeting the intron region near the SA site [34,35], which includes the branch point followed by the polypyrimidine tract, both known to be involved in RNA splicing.
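By way of non-limiting illustration only, the Spearman correlation used above to compare biological replicates can be sketched in Python as the Pearson correlation of ranks, with tied values given the mean of their ranks. The function names are hypothetical, and the sketch does not handle the degenerate case in which all values of a replicate are identical.

```python
def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman rank correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `x` and `y` would be the per-sgRNA read counts or fold changes of two replicates, listed in the same sgRNA order.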
Since the number of sgRNAs designed for each locus is not equal, for a fair comparison we compared the percentage of high-efficiency sgRNAs (sgRNAs reduced more than 4-fold) per locus. By doing so, we further confirmed that sgRNAs targeting SD and SA sites were greatly superior to those targeting exon regions only (Fig. 4a). To better quantify our results, we classified all sgRNAs into three categories: intron-targeting sgRNAs (the cleavage site of the sgRNA is within the intron and at least 30-bp from the SD or SA site), exon-targeting sgRNAs (the cleavage site of the sgRNA is within the exon and at least 30-bp from the SD or SA site), and splicing-targeting sgRNAs (the cleavage site of the sgRNA is within -10-bp to +10-bp flanking the SD or SA site, where - and + refer to the intron and exon directions, respectively). In both HeLa and Huh7.5 cells, the percentage of sgRNAs that resulted in more than a 2-fold or 4-fold reduction was much higher among splicing-targeting sgRNAs than in the other two classes (Fig. 4b, 4c).
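By way of non-limiting illustration only, the three-way classification and the percentage of high-efficiency sgRNAs described above can be sketched in Python. The sketch assumes that each sgRNA is described by the signed distance of its cleavage site from the nearest SD or SA site and by a flag indicating whether that cleavage site lies in an exon; sgRNAs between 10-bp and 30-bp from the site fall into none of the three categories and are labeled "unclassified" here. The function names and this label are hypothetical.

```python
def classify_sgrna(offset_bp: int, in_exon: bool) -> str:
    """Assign an sgRNA to one of the three categories by cleavage-site position."""
    distance = abs(offset_bp)
    if distance <= 10:
        return "splicing-targeting"
    if distance >= 30:
        return "exon-targeting" if in_exon else "intron-targeting"
    return "unclassified"


def percent_efficient(fold_reductions, threshold: float = 4.0) -> float:
    """Percentage of sgRNAs whose fold reduction exceeds the threshold."""
    values = list(fold_reductions)
    if not values:
        return 0.0
    return 100.0 * sum(v > threshold for v in values) / len(values)
```

Computing `percent_efficient` separately for each category, with thresholds of 2 and 4, gives the kind of comparison summarized above.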
Based on the experimental data, the method described in the present invention has significant advantages in negative CRISPR screening of coding genes. It complements conventional exon-targeting methods and also enables large-scale loss-of-function screening of non-coding genes using single-guide-RNA CRISPR libraries. In addition, the exon skipping or intron retention that results from splice-site disruption provides a convenient means of functionally validating individual non-coding RNAs.
References
1. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84-87 (2014).
2. Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80-84 (2014).
3. Koike-Yusa, H., Li, Y., Tan, E.P., Velasco-Herrera, M. del C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat Biotechnol 32, 267-273 (2014).
4. Zhou, Y. et al. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature 509, 487-491 (2014).
5. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet 23, 5866-5878 (2014).
6. Rinn, J.L. & Chang, H.Y. Genome regulation by long noncoding RNAs. Annu Rev Biochem 81, 145-166 (2012).
7. Quinn, J.J. & Chang, H.Y. Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet 17, 47-62 (2016).
8. Kretz, M. et al. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature 493, 231-235 (2013).
9. Zhu, S. et al. Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR-Cas9 library. Nat Biotechnol 34, 1279-1286 (2016).
10. Guttman, M. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295-300 (2011).
11. Lin, N. et al. An evolutionarily conserved long noncoding RNA TUNA controls pluripotency and neural lineage commitment. Mol Cell 53, 1005-1019 (2014).
12. Liu, S.J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355 (2017).
13. Adamson, B., Smogorzewska, A., Sigoillot, F.D., King, R.W. & Elledge, S.J. A genome-wide homologous recombination screen identifies the RNA-binding protein RBMX as a component of the DNA-damage response. Nat Cell Biol 14, 318-328 (2012).
14. Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989).
15. F.M. Ausubel, et al. eds., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (1987).
16. M.J. MacPherson, B.D. Hames and G.R. Taylor eds., METHODS IN ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL APPROACH (1995).
17. Harlow and Lane, eds., ANTIBODIES, A LABORATORY MANUAL (1988).
18. R.I. Freshney, ed., ANIMAL CELL CULTURE (1987).
19. Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990).
20. Seed, 1987. Nature 329:840 (Seed, B. An LFA-3 cDNA encodes a phospholipid-linked membrane protein homologous to its receptor CD2. Nature (1987) 329:840–842.)
21. Kaufman, et al., 1987. EMBO J. 6:187-195 (Randal J. Kaufman, et al. Translational efficiency of polycistronic mRNAs and their utilization to express heterologous genes in mammalian cells. The EMBO Journal (1987) 6:187-195)
22. Clancy, Suzanne. RNA Splicing: Introns, Exons and Spliceosome. Nature Education 1, 31 (2008).
23. Black, Douglas L. Mechanisms of Alternative Pre-Messenger RNA Splicing. Annual Review of Biochemistry 72: 291–336 (2003).
24. Ng, Bernard; Yang, Fan; et al. Increased noncanonical splicing of autoantigen transcripts provides the structural basis for expression of untolerized epitopes. Journal of Allergy and Clinical Immunology 114: 1463–70 (2004).
25. Lim, KH; Ferraris, L; et al. Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes. Proc. Natl. Acad. Sci. USA 108: 11093–11098 (2011).
26. Warf, MB; Berglund, JA. Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem. Sci. 35: 169-178 (2010).
27. Warf, MB; Berglund, JA. Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem. Sci. 35(3): 169-178 (2010).
28. Ren, Q. et al. A Dual-Reporter System for Real-Time Monitoring and High-throughput CRISPR/Cas9 Library Screening of the Hepatitis C Virus. Scientific Reports 5, 8865 (2015).
29. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096-1101 (2015).
30. Li, B. & Dewey, C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
31. Leng, N. et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29, 1035-1043 (2013).
32. Jiao, X. et al. DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28, 1805-1806 (2012).
33. Crooks, G.E., Hon, G., Chandonia, J.M. & Brenner, S.E. WebLogo: a sequence logo generator. Genome Res 14, 1188-1190 (2004).
34. Matlin, A.J., Clark, F. & Smith, C.W. Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol 6, 386-398 (2005).
35. Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E. & Fairbrother, W.G. Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat Struct Mol Biol 19, 719-721 (2012).
36. Hsu, P.D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827-832 (2013).
37. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res 25, 1147-1157 (2015).
38. Heidari, N. et al. Genome-wide map of regulatory interactions in the human genome. Genome Res 24, 1905-1917 (2014).
39. Muller, R.Y., Hammond, M.C., Rio, D.C. & Lee, Y.J. An Efficient Method for Electroporation of Small Interfering RNAs into ENCODE Project Tier 1 GM12878 and K562 Cell Lines. J Biomol Tech 26, 142-149 (2015).
40. Joung, J. et al. Genome-scale activation screen identifies a lncRNA locus regulating a gene neighbourhood. Nature (2017).
41. Goyal, A. et al. Challenges of CRISPR/Cas9 applications for long non-coding RNA genes. Nucleic Acids Res 45, e12 (2017).

Claims (10)

1. A CRISPR/Cas guide RNA construct for disrupting RNA splicing in a eukaryotic cell genome, comprising a guide sequence that targets a genomic sequence surrounding an RNA splice site, operably linked to a promoter and a guide hairpin sequence.
2. The CRISPR/Cas guide RNA construct of claim 1, wherein said eukaryotic genome is a human genome.
3. The CRISPR/Cas guide RNA construct of claim 1 or 2, wherein said guide sequence is 19-21 nucleotides in length.
4. The CRISPR/Cas guide RNA construct of any of claims 1-3, wherein said hairpin sequence is about 40 nucleotides in length and once transcribed can bind to a CRISPR/Cas nuclease.
5. The CRISPR/Cas guide RNA construct of any of claims 1-4, wherein said guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the RNA.
6. The CRISPR/Cas guide RNA construct of claim 5, wherein said guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD or SA site of the RNA.
7. The CRISPR/Cas guide RNA construct of claim 6, wherein said guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the RNA.
8. The CRISPR/Cas guide RNA construct of any of claims 1-7, which is a viral vector or plasmid.
9. A library comprising a plurality of CRISPR/Cas guide RNA constructs of any of claims 1-8.
10. A storage liquid comprising the CRISPR/Cas guide RNA construct of any of claims 1-8 or the library of claim 9.
CN201811546460.9A 2018-12-18 2018-12-18 High signal-to-noise ratio negative genetic screening method Pending CN111334531A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811546460.9A CN111334531A (en) 2018-12-18 2018-12-18 High signal-to-noise ratio negative genetic screening method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811546460.9A CN111334531A (en) 2018-12-18 2018-12-18 High signal-to-noise ratio negative genetic screening method

Publications (1)

Publication Number Publication Date
CN111334531A true CN111334531A (en) 2020-06-26

Family

ID=71181343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811546460.9A Pending CN111334531A (en) 2018-12-18 2018-12-18 High signal-to-noise ratio negative genetic screening method

Country Status (1)

Country Link
CN (1) CN111334531A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023284735A1 (en) * 2021-07-12 2023-01-19 Edigene Therapeutics (Beijing) Inc. Methods of identifying drug sensitive genes and drug resistant genes in cancer cells

Similar Documents

Publication Publication Date Title
CN110343724B (en) Method for screening and identifying functional lncRNA
Durrant et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome
JP7244885B2 (en) Methods for Screening and Identifying Functional lncRNAs
US9260723B2 (en) RNA-guided human genome engineering
JP2018532419A (en) CRISPR-Cas sgRNA library
AU2019408503B2 (en) Compositions and methods for highly efficient genetic screening using barcoded guide rna constructs
WO2018005691A1 (en) Efficient genetic screening method
Neumayr et al. STARR‐seq and UMI‐STARR‐seq: assessing enhancer activities for genome‐wide‐, high‐, and low‐complexity candidate libraries
WO2017083766A1 (en) High-throughput crispr-based library screening
DAS et al. Full-length cDNAs: more than just reaching the ends
US20220017895A1 (en) Gramc: genome-scale reporter assay method for cis-regulatory modules
US20090111099A1 (en) Promoter Detection and Analysis
US11946163B2 (en) Methods for measuring and improving CRISPR reagent function
Li et al. Chromatin context-dependent regulation and epigenetic manipulation of prime editing
Wu et al. Massively parallel characterization of CRISPR activator efficacy in human induced pluripotent stem cells and neurons
KR20160118987A (en) Pair of sgRNAs for Deletion of LincRNA
CN111334531A (en) High signal-to-noise ratio negative genetic screening method
Mitschka et al. Generation of 3′ UTR knockout cell lines by CRISPR/Cas9-mediated genome editing
Wu Mouse oocytes, a complex single cell transcriptome
US20110071047A1 (en) Promoter detection and analysis
Guay et al. Unbiased genome-scale identification of cis-regulatory modules in the human genome by GRAMc
CN116286991B (en) Whole genome enhancer screening system, screening method and application
WO2023060539A1 (en) Compositions and methods for detecting target cleavage sites of crispr/cas nucleases and dna translocation
WO2023081762A2 (en) Serine recombinases
FACS-Based et al. Check for updates

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200626
