CN110343724A - Method for screening and identifying functional lncRNA
- Publication number
- CN110343724A (application number CN201810284463.3A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- crispr
- rna
- site
- cell
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title abstract description 81
- 108020005198 Long Noncoding RNA Proteins 0.000 title abstract description 59
- 238000012216 screening Methods 0.000 title description 38
- 230000008685 targeting Effects 0.000 claims abstract description 59
- 108091046869 Telomeric non-coding RNA Proteins 0.000 claims description 89
- 108020005004 Guide RNA Proteins 0.000 claims description 83
- 238000010453 CRISPR/Cas method Methods 0.000 claims description 66
- 101710163270 Nuclease Proteins 0.000 claims description 34
- 125000003729 nucleotide group Chemical group 0.000 claims description 27
- 239000002773 nucleotide Substances 0.000 claims description 25
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 18
- 230000002452 interceptive effect Effects 0.000 claims description 11
- 239000013612 plasmid Substances 0.000 claims description 8
- 239000013603 viral vector Substances 0.000 claims description 4
- 239000007788 liquid Substances 0.000 claims description 2
- 238000003860 storage Methods 0.000 claims description 2
- 108090000623 proteins and genes Proteins 0.000 abstract description 147
- 108091027963 non-coding RNA Proteins 0.000 abstract description 11
- 102000042567 non-coding RNA Human genes 0.000 abstract description 11
- 241000206602 Eukaryota Species 0.000 abstract 1
- 210000004027 cell Anatomy 0.000 description 187
- 108091033409 CRISPR Proteins 0.000 description 75
- 108091027544 Subgenomic mRNA Proteins 0.000 description 55
- 238000010354 CRISPR gene editing Methods 0.000 description 41
- 230000014509 gene expression Effects 0.000 description 41
- 230000006870 function Effects 0.000 description 35
- 108020004414 DNA Proteins 0.000 description 32
- 102000040430 polynucleotide Human genes 0.000 description 27
- 108091033319 polynucleotide Proteins 0.000 description 27
- 239000002157 polynucleotide Substances 0.000 description 27
- 102000004169 proteins and genes Human genes 0.000 description 26
- 239000013598 vector Substances 0.000 description 25
- 230000000694 effects Effects 0.000 description 22
- 230000014759 maintenance of location Effects 0.000 description 21
- 230000008859 change Effects 0.000 description 20
- 238000003776 cleavage reaction Methods 0.000 description 20
- 230000007017 scission Effects 0.000 description 20
- 230000004663 cell proliferation Effects 0.000 description 18
- 150000007523 nucleic acids Chemical class 0.000 description 18
- 239000013604 expression vector Substances 0.000 description 17
- 102000039446 nucleic acids Human genes 0.000 description 17
- 108020004707 nucleic acids Proteins 0.000 description 17
- 230000001105 regulatory effect Effects 0.000 description 17
- 108700039887 Essential Genes Proteins 0.000 description 14
- 108020001027 Ribosomal DNA Proteins 0.000 description 14
- 238000012217 deletion Methods 0.000 description 14
- 230000037430 deletion Effects 0.000 description 14
- 108091026890 Coding region Proteins 0.000 description 13
- 230000010261 cell growth Effects 0.000 description 13
- 108020004999 messenger RNA Proteins 0.000 description 12
- 239000013642 negative control Substances 0.000 description 12
- 238000004458 analytical method Methods 0.000 description 11
- 108700008625 Reporter Genes Proteins 0.000 description 10
- 208000015181 infectious disease Diseases 0.000 description 10
- 101001120822 Homo sapiens Putative microRNA 17 host gene protein Proteins 0.000 description 9
- 108091092195 Intron Proteins 0.000 description 9
- 102100026055 Putative microRNA 17 host gene protein Human genes 0.000 description 9
- 238000010200 validation analysis Methods 0.000 description 9
- 108700024394 Exon Proteins 0.000 description 8
- 239000012634 fragment Substances 0.000 description 8
- 230000003612 virological effect Effects 0.000 description 8
- 102100022681 40S ribosomal protein S27 Human genes 0.000 description 7
- 102100025601 60S ribosomal protein L27 Human genes 0.000 description 7
- 101000678466 Homo sapiens 40S ribosomal protein S27 Proteins 0.000 description 7
- 101000719728 Homo sapiens 60S ribosomal protein L27 Proteins 0.000 description 7
- 230000027455 binding Effects 0.000 description 7
- 230000015572 biosynthetic process Effects 0.000 description 7
- 239000003795 chemical substances by application Substances 0.000 description 7
- 108010048367 enhanced green fluorescent protein Proteins 0.000 description 7
- 230000007246 mechanism Effects 0.000 description 7
- 238000013518 transcription Methods 0.000 description 7
- 230000035897 transcription Effects 0.000 description 7
- 102100032411 60S ribosomal protein L18 Human genes 0.000 description 6
- 101001087985 Homo sapiens 60S ribosomal protein L18 Proteins 0.000 description 6
- 101001000998 Homo sapiens Protein phosphatase 1 regulatory subunit 12C Proteins 0.000 description 6
- 102100035620 Protein phosphatase 1 regulatory subunit 12C Human genes 0.000 description 6
- 230000001413 cellular effect Effects 0.000 description 6
- 230000007423 decrease Effects 0.000 description 6
- 230000012010 growth Effects 0.000 description 6
- 210000004962 mammalian cell Anatomy 0.000 description 6
- 238000003757 reverse transcription PCR Methods 0.000 description 6
- 239000004055 small Interfering RNA Substances 0.000 description 6
- 238000003559 RNA-seq method Methods 0.000 description 5
- 241000193996 Streptococcus pyogenes Species 0.000 description 5
- 108700009124 Transcription Initiation Site Proteins 0.000 description 5
- 238000013459 approach Methods 0.000 description 5
- 238000003556 assay Methods 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 5
- 238000013461 design Methods 0.000 description 5
- 108020001507 fusion proteins Proteins 0.000 description 5
- 102000037865 fusion proteins Human genes 0.000 description 5
- 239000002502 liposome Substances 0.000 description 5
- 230000006780 non-homologous end joining Effects 0.000 description 5
- 239000002245 particle Substances 0.000 description 5
- 230000004083 survival effect Effects 0.000 description 5
- 230000014616 translation Effects 0.000 description 5
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 4
- 101001073740 Homo sapiens 60S ribosomal protein L11 Proteins 0.000 description 4
- 101000920979 Homo sapiens Putative ciliary rootlet coiled-coil protein-like 1 protein Proteins 0.000 description 4
- 241000713666 Lentivirus Species 0.000 description 4
- 108091027974 Mature messenger RNA Proteins 0.000 description 4
- 102100032204 Putative ciliary rootlet coiled-coil protein-like 1 protein Human genes 0.000 description 4
- 108091027967 Small hairpin RNA Proteins 0.000 description 4
- 238000000692 Student's t-test Methods 0.000 description 4
- 238000001516 cell proliferation assay Methods 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 4
- 230000021615 conjugation Effects 0.000 description 4
- 238000010276 construction Methods 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 238000004520 electroporation Methods 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 238000001943 fluorescence-activated cell sorting Methods 0.000 description 4
- 230000001404 mediated effect Effects 0.000 description 4
- 239000002679 microRNA Substances 0.000 description 4
- 238000000520 microinjection Methods 0.000 description 4
- 238000010369 molecular cloning Methods 0.000 description 4
- 239000013641 positive control Substances 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 239000000047 product Substances 0.000 description 4
- 238000003259 recombinant expression Methods 0.000 description 4
- 239000000523 sample Substances 0.000 description 4
- 210000001324 spliceosome Anatomy 0.000 description 4
- 238000012353 t test Methods 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 230000004565 tumor cell growth Effects 0.000 description 4
- 238000011144 upstream manufacturing Methods 0.000 description 4
- 102100035916 60S ribosomal protein L11 Human genes 0.000 description 3
- 102100021206 60S ribosomal protein L19 Human genes 0.000 description 3
- 238000010446 CRISPR interference Methods 0.000 description 3
- 101001105789 Homo sapiens 60S ribosomal protein L19 Proteins 0.000 description 3
- 108700011259 MicroRNAs Proteins 0.000 description 3
- 108091028043 Nucleic acid sequence Proteins 0.000 description 3
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 description 3
- 102000015097 RNA Splicing Factors Human genes 0.000 description 3
- 108010039259 RNA Splicing Factors Proteins 0.000 description 3
- 241000194020 Streptococcus thermophilus Species 0.000 description 3
- 241000700605 Viruses Species 0.000 description 3
- 230000001580 bacterial effect Effects 0.000 description 3
- 230000032823 cell division Effects 0.000 description 3
- 230000003833 cell viability Effects 0.000 description 3
- 230000033077 cellular process Effects 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 3
- 230000005782 double-strand break Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 230000005714 functional activity Effects 0.000 description 3
- 210000005260 human cell Anatomy 0.000 description 3
- 238000009396 hybridization Methods 0.000 description 3
- 238000000338 in vitro Methods 0.000 description 3
- 230000000670 limiting effect Effects 0.000 description 3
- 108091091751 miR-17 stem-loop Proteins 0.000 description 3
- 108091069239 miR-17-2 stem-loop Proteins 0.000 description 3
- 108091070501 miRNA Proteins 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000035772 mutation Effects 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 108090000765 processed proteins & peptides Proteins 0.000 description 3
- 230000035755 proliferation Effects 0.000 description 3
- 108020001580 protein domains Proteins 0.000 description 3
- 230000009467 reduction Effects 0.000 description 3
- 230000009711 regulatory function Effects 0.000 description 3
- 125000006850 spacer group Chemical group 0.000 description 3
- CCEKAJIANROZEO-UHFFFAOYSA-N sulfluramid Chemical group CCNS(=O)(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F CCEKAJIANROZEO-UHFFFAOYSA-N 0.000 description 3
- 230000002103 transcriptional effect Effects 0.000 description 3
- 238000013519 translation Methods 0.000 description 3
- 102100024406 60S ribosomal protein L15 Human genes 0.000 description 2
- 102100023247 60S ribosomal protein L23a Human genes 0.000 description 2
- 108010035563 Chloramphenicol O-acetyltransferase Proteins 0.000 description 2
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 2
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 2
- 239000006144 Dulbecco’s modified Eagle's medium Substances 0.000 description 2
- 102000005720 Glutathione transferase Human genes 0.000 description 2
- 108010070675 Glutathione transferase Proteins 0.000 description 2
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 2
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 2
- 101001117935 Homo sapiens 60S ribosomal protein L15 Proteins 0.000 description 2
- 101001115494 Homo sapiens 60S ribosomal protein L23a Proteins 0.000 description 2
- 108010001336 Horseradish Peroxidase Proteins 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 206010028980 Neoplasm Diseases 0.000 description 2
- 108010047956 Nucleosomes Proteins 0.000 description 2
- 108091034117 Oligonucleotide Proteins 0.000 description 2
- 101710084414 POU domain, class 2, transcription factor 1 Proteins 0.000 description 2
- 108091007412 Piwi-interacting RNA Proteins 0.000 description 2
- 108020005067 RNA Splice Sites Proteins 0.000 description 2
- 108020004511 Recombinant DNA Proteins 0.000 description 2
- 108020003224 Small Nucleolar RNA Proteins 0.000 description 2
- 102000042773 Small Nucleolar RNA Human genes 0.000 description 2
- 108020004459 Small interfering RNA Proteins 0.000 description 2
- 108091028113 Trans-activating crRNA Proteins 0.000 description 2
- 108020004566 Transfer RNA Proteins 0.000 description 2
- 208000036142 Viral infection Diseases 0.000 description 2
- 230000009471 action Effects 0.000 description 2
- 230000004075 alteration Effects 0.000 description 2
- 230000033115 angiogenesis Effects 0.000 description 2
- 230000000692 anti-sense effect Effects 0.000 description 2
- 230000006907 apoptotic process Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000003115 biocidal effect Effects 0.000 description 2
- 238000007622 bioinformatic analysis Methods 0.000 description 2
- 108091005948 blue fluorescent proteins Proteins 0.000 description 2
- 201000011510 cancer Diseases 0.000 description 2
- 230000021164 cell adhesion Effects 0.000 description 2
- 230000019522 cellular metabolic process Effects 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 239000002299 complementary DNA Substances 0.000 description 2
- 238000012790 confirmation Methods 0.000 description 2
- 108010082025 cyan fluorescent protein Proteins 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000001819 effect on gene Effects 0.000 description 2
- 210000001808 exosome Anatomy 0.000 description 2
- 239000012091 fetal bovine serum Substances 0.000 description 2
- 238000001914 filtration Methods 0.000 description 2
- 230000037433 frameshift Effects 0.000 description 2
- 230000009368 gene silencing by RNA Effects 0.000 description 2
- 239000005090 green fluorescent protein Substances 0.000 description 2
- 229920001519 homopolymer Polymers 0.000 description 2
- 230000028993 immune response Effects 0.000 description 2
- 230000002401 inhibitory effect Effects 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 230000010354 integration Effects 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 239000002105 nanoparticle Substances 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 210000001623 nucleosome Anatomy 0.000 description 2
- 229920001184 polypeptide Polymers 0.000 description 2
- 230000001124 posttranscriptional effect Effects 0.000 description 2
- 102000004196 processed proteins & peptides Human genes 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 210000001236 prokaryotic cell Anatomy 0.000 description 2
- RXWNCPJZOCPEPQ-NVWDDTSBSA-N puromycin Chemical compound C1=CC(OC)=CC=C1C[C@H](N)C(=O)N[C@H]1[C@@H](O)[C@H](N2C3=NC=NC(=C3N=C2)N(C)C)O[C@@H]1CO RXWNCPJZOCPEPQ-NVWDDTSBSA-N 0.000 description 2
- 230000010076 replication Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 108020004418 ribosomal RNA Proteins 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 229920006395 saturated elastomer Polymers 0.000 description 2
- 238000013341 scale-up Methods 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 230000019491 signal transduction Effects 0.000 description 2
- 239000000344 soap Substances 0.000 description 2
- UCSJYZPVAKXKNQ-HZYVHMACSA-N streptomycin Chemical compound CN[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O[C@H]1O[C@@H]1[C@](C=O)(O)[C@H](C)O[C@H]1O[C@@H]1[C@@H](NC(N)=N)[C@H](O)[C@@H](NC(N)=N)[C@H](O)[C@H]1O UCSJYZPVAKXKNQ-HZYVHMACSA-N 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 210000001519 tissue Anatomy 0.000 description 2
- 238000010361 transduction Methods 0.000 description 2
- 230000026683 transduction Effects 0.000 description 2
- 238000001890 transfection Methods 0.000 description 2
- 238000011311 validation assay Methods 0.000 description 2
- 230000009385 viral infection Effects 0.000 description 2
- 108091005957 yellow fluorescent proteins Proteins 0.000 description 2
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 1
- 102100022406 60S ribosomal protein L10a Human genes 0.000 description 1
- 102100025643 60S ribosomal protein L12 Human genes 0.000 description 1
- 102100024442 60S ribosomal protein L13 Human genes 0.000 description 1
- 102100022289 60S ribosomal protein L13a Human genes 0.000 description 1
- 102100031854 60S ribosomal protein L14 Human genes 0.000 description 1
- 102100023990 60S ribosomal protein L17 Human genes 0.000 description 1
- 102100021690 60S ribosomal protein L18a Human genes 0.000 description 1
- 102100037965 60S ribosomal protein L21 Human genes 0.000 description 1
- 102100037685 60S ribosomal protein L22 Human genes 0.000 description 1
- 102100038008 60S ribosomal protein L22-like 1 Human genes 0.000 description 1
- 102100021308 60S ribosomal protein L23 Human genes 0.000 description 1
- 102100035322 60S ribosomal protein L24 Human genes 0.000 description 1
- 102100028348 60S ribosomal protein L26 Human genes 0.000 description 1
- 102100028439 60S ribosomal protein L26-like 1 Human genes 0.000 description 1
- 102100036116 60S ribosomal protein L35 Human genes 0.000 description 1
- 102100022048 60S ribosomal protein L36 Human genes 0.000 description 1
- 230000007730 Akt signaling Effects 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 102100026189 Beta-galactosidase Human genes 0.000 description 1
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 1
- 101100470627 Caenorhabditis elegans rpl-27 gene Proteins 0.000 description 1
- 108090000994 Catalytic RNA Proteins 0.000 description 1
- 102000053642 Catalytic RNA Human genes 0.000 description 1
- 108091092236 Chimeric RNA Proteins 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 241000701022 Cytomegalovirus Species 0.000 description 1
- 102000053602 DNA Human genes 0.000 description 1
- 230000022963 DNA damage response, signal transduction by p53 class mediator Effects 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 241000450599 DNA viruses Species 0.000 description 1
- 241000702421 Dependoparvovirus Species 0.000 description 1
- 206010059866 Drug resistance Diseases 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 102000004190 Enzymes Human genes 0.000 description 1
- 108090000790 Enzymes Proteins 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 238000012413 Fluorescence activated cell sorting analysis Methods 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- 108010060309 Glucuronidase Proteins 0.000 description 1
- 102000053187 Glucuronidase Human genes 0.000 description 1
- 101001023784 Heteractis crispa GFP-like non-fluorescent chromoprotein Proteins 0.000 description 1
- 241000238631 Hexapoda Species 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000691550 Homo sapiens 39S ribosomal protein L13, mitochondrial Proteins 0.000 description 1
- 101001108634 Homo sapiens 60S ribosomal protein L10 Proteins 0.000 description 1
- 101000755323 Homo sapiens 60S ribosomal protein L10a Proteins 0.000 description 1
- 101000575173 Homo sapiens 60S ribosomal protein L12 Proteins 0.000 description 1
- 101001118201 Homo sapiens 60S ribosomal protein L13 Proteins 0.000 description 1
- 101000681240 Homo sapiens 60S ribosomal protein L13a Proteins 0.000 description 1
- 101000704267 Homo sapiens 60S ribosomal protein L14 Proteins 0.000 description 1
- 101000682512 Homo sapiens 60S ribosomal protein L17 Proteins 0.000 description 1
- 101000752293 Homo sapiens 60S ribosomal protein L18a Proteins 0.000 description 1
- 101000661708 Homo sapiens 60S ribosomal protein L21 Proteins 0.000 description 1
- 101001097555 Homo sapiens 60S ribosomal protein L22 Proteins 0.000 description 1
- 101000661567 Homo sapiens 60S ribosomal protein L22-like 1 Proteins 0.000 description 1
- 101000675833 Homo sapiens 60S ribosomal protein L23 Proteins 0.000 description 1
- 101000660926 Homo sapiens 60S ribosomal protein L24 Proteins 0.000 description 1
- 101001080179 Homo sapiens 60S ribosomal protein L26 Proteins 0.000 description 1
- 101001080152 Homo sapiens 60S ribosomal protein L26-like 1 Proteins 0.000 description 1
- 101000715818 Homo sapiens 60S ribosomal protein L35 Proteins 0.000 description 1
- 101001110263 Homo sapiens 60S ribosomal protein L36 Proteins 0.000 description 1
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 description 1
- 101000711369 Homo sapiens Probable ribosome biogenesis protein RLP24 Proteins 0.000 description 1
- 241000701109 Human adenovirus 2 Species 0.000 description 1
- 108060001084 Luciferase Proteins 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- 241000829100 Macaca mulatta polyomavirus 1 Species 0.000 description 1
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 description 1
- 108091061960 Naked DNA Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 101100470609 Oscheius tipulae rpl-27a gene Proteins 0.000 description 1
- 102100035593 POU domain, class 2, transcription factor 1 Human genes 0.000 description 1
- 229930182555 Penicillin Natural products 0.000 description 1
- JGSARLDLIJGVTE-MBNYWOFBSA-N Penicillin G Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)CC1=CC=CC=C1 JGSARLDLIJGVTE-MBNYWOFBSA-N 0.000 description 1
- 102100039087 Peptidyl-alpha-hydroxyglycine alpha-amidating lyase Human genes 0.000 description 1
- 241001505332 Polyomavirus sp. Species 0.000 description 1
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 1
- 238000012228 RNA interference-mediated gene silencing Methods 0.000 description 1
- 230000007022 RNA scission Effects 0.000 description 1
- 108091030071 RNAI Proteins 0.000 description 1
- 238000011530 RNeasy Mini Kit Methods 0.000 description 1
- 101150067054 RPL11 gene Proteins 0.000 description 1
- 239000012980 RPMI-1640 medium Substances 0.000 description 1
- 108700005075 Regulator Genes Proteins 0.000 description 1
- 102000009661 Repressor Proteins Human genes 0.000 description 1
- 108010034634 Repressor Proteins Proteins 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 101150034081 Rpl18 gene Proteins 0.000 description 1
- 108091081021 Sense strand Proteins 0.000 description 1
- 241000193998 Streptococcus pneumoniae Species 0.000 description 1
- 101100166147 Streptococcus thermophilus cas9 gene Proteins 0.000 description 1
- 241000283907 Tragelaphus oryx Species 0.000 description 1
- 238000001793 Wilcoxon signed-rank test Methods 0.000 description 1
- JLCPHMBAVCMARE-UHFFFAOYSA-N Polymers 0.000 description 1
- ZVQOOHYFBIDMTQ-UHFFFAOYSA-N [methyl(oxido){1-[6-(trifluoromethyl)pyridin-3-yl]ethyl}-lambda(6)-sulfanylidene]cyanamide Chemical compound N#CN=S(C)(=O)C(C)C1=CC=C(C(F)(F)F)N=C1 ZVQOOHYFBIDMTQ-UHFFFAOYSA-N 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N adenyl group Chemical class N1=CN=C2N=CNC2=C1N GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 108010005774 beta-Galactosidase Proteins 0.000 description 1
- 239000000090 biomarker Substances 0.000 description 1
- 210000004369 blood Anatomy 0.000 description 1
- 239000008280 blood Substances 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000022131 cell cycle Effects 0.000 description 1
- 230000030833 cell death Effects 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 230000009918 complex formation Effects 0.000 description 1
- 230000000536 complexating effect Effects 0.000 description 1
- 108091036078 conserved sequence Proteins 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000012258 culturing Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 238000012350 deep sequencing Methods 0.000 description 1
- 238000002716 delivery method Methods 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000003828 downregulation Effects 0.000 description 1
- 241001493065 dsRNA viruses Species 0.000 description 1
- 230000002900 effect on cell Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 239000003797 essential amino acid Substances 0.000 description 1
- 235000020776 essential amino acid Nutrition 0.000 description 1
- 238000010195 expression analysis Methods 0.000 description 1
- 108010021843 fluorescent protein 583 Proteins 0.000 description 1
- 238000010362 genome editing Methods 0.000 description 1
- 230000008826 genomic mutation Effects 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000013537 high throughput screening Methods 0.000 description 1
- 238000012165 high-throughput sequencing Methods 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 150000002632 lipids Chemical class 0.000 description 1
- 238000001638 lipofection Methods 0.000 description 1
- 210000003738 lymphoid progenitor cell Anatomy 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000035800 maturation Effects 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 210000004940 nucleus Anatomy 0.000 description 1
- 230000008212 organismal development Effects 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 229940049954 penicillin Drugs 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 1
- 239000002243 precursor Substances 0.000 description 1
- 238000001243 protein synthesis Methods 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 229950010131 puromycin Drugs 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 238000013515 script Methods 0.000 description 1
- 230000003584 silencer Effects 0.000 description 1
- 238000012174 single-cell RNA sequencing Methods 0.000 description 1
- 241000894007 species Species 0.000 description 1
- 230000009870 specific binding Effects 0.000 description 1
- 229940031000 streptococcus pneumoniae Drugs 0.000 description 1
- 229960005322 streptomycin Drugs 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 230000037426 transcriptional repression Effects 0.000 description 1
- 238000011222 transcriptome analysis Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 239000003981 vehicle Substances 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 210000005253 yeast cell Anatomy 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Molecular Biology (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Biochemistry (AREA)
- Biomedical Technology (AREA)
- Medicinal Chemistry (AREA)
- Microbiology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Plant Pathology (AREA)
- Mycology (AREA)
- Biophysics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to methods for gene interference of long non-coding RNAs (lncRNAs) by targeting splice sites in the genome of eukaryotic cells.
Description
Technical Field
The present invention relates to the screening and identification of functional lncRNAs through gene interference of long non-coding RNAs (lncRNAs) by targeting splice sites in the genome of eukaryotic cells.
Background
As a powerful genome editing tool, the CRISPR-Cas9 system has been used to identify gene function through large-scale screening [1-4]. Even at genome scale, gene interference is mostly achieved by frame-shift mutations generated within exons. Apart from the roughly 2% of the human genome that comprises protein-coding genes, there is much evidence that the large remainder of transcripts are non-coding RNAs [5]. Among these, lncRNAs of more than 200 nucleotides represent most genes without significant protein-coding potential [6-7]. Previous studies showed that the total number of human lncRNAs exceeds the total number of protein-coding genes and that the number continues to climb [8].
lncRNAs regulate gene expression at the transcriptional or post-transcriptional level, acting in cis or in trans, and play key roles in a variety of cellular processes [9]. Although tens of thousands of loci in the human genome have been annotated as encoding long non-coding RNAs (lncRNAs), their functions remain largely unknown, primarily because scalable methods for causing loss of function of such genes are lacking. Because lncRNAs are generally insensitive to reading-frame changes, it is difficult to disrupt their expression with the CRISPR-Cas9 system in the conventional manner, let alone on a large scale. We previously developed a deletion strategy for loss-of-function screening of lncRNAs using pgRNA libraries [9], but it remains difficult to scale up. Although studies have demonstrated that screens based on RNA interference [10,11] or CRISPR [12] are effective for identifying lncRNA function, the RNAi method has potential off-target problems [13], and both approaches are limited by the efficiency of transcriptional knockdown. Thus, there is a need in the art for efficient methods for screening and identifying functional long non-coding RNAs and for interfering with the function of non-coding RNAs on a large scale.
Summary of The Invention
The present invention provides methods for studying the function of genomic regions, and for screening and identifying lncRNA with regulatory function. These methods rely in part on library screening based on newly developed CRISPR/Cas systems provided herein.
Specifically, the present invention relates to:
1. A CRISPR/Cas guide RNA construct for interfering with a long non-coding RNA in a eukaryotic cell genome, comprising a guide sequence targeting a genomic sequence surrounding a long non-coding RNA splice site, operably linked to a promoter and a guide hairpin sequence.
2. The CRISPR/Cas guide RNA construct of item 1, wherein the eukaryotic genome is a human genome.
3. The CRISPR/Cas guide RNA construct of item 1 or 2, wherein the guide sequence is 19-21 nucleotides in length.
4. The CRISPR/Cas guide RNA construct of any of items 1-3, wherein the hairpin sequence is about 40 nucleotides in length and once transcribed can bind to the CRISPR/Cas nuclease.
5. The CRISPR/Cas guide RNA construct of any of items 1-4, wherein the guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the long non-coding RNA.
6. The CRISPR/Cas guide RNA construct of item 5, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD site or SA site of the long non-coding RNA.
7. The CRISPR/Cas guide RNA construct of item 6, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the long non-coding RNA.
8. The CRISPR/Cas guide RNA construct of any of items 1-7, which is a viral vector or plasmid.
9. A library comprising a plurality of CRISPR/Cas guide RNA constructs of any one of items 1-8.
10. A storage liquid comprising the CRISPR/Cas guide RNA construct of any of items 1-8 or the library of item 9.
11. A host cell comprising the CRISPR/Cas guide RNA construct of any of items 1-8.
12. The host cell of item 11, further comprising a CRISPR/Cas nuclease and/or a coding sequence for a CRISPR/Cas nuclease.
13. The host cell of item 11 or 12, further comprising a Cas9 nuclease.
14. The host cell of any one of items 11-13, further comprising a reporter construct integrated into its genome.
15. A population of host cells according to any one of items 11 to 14.
16. A method, comprising:
introducing into a host cell a CRISPR/Cas guide RNA construct comprising a guide sequence targeting a genomic sequence surrounding a long non-coding RNA splice site operably linked to a promoter and a guide hairpin sequence,
expressing the guide RNA targeting the genomic sequence in the host cell and introducing exon skipping and/or intron retention in the long non-coding RNA in the presence of a CRISPR/Cas nuclease and determining the functional profile of the long non-coding RNA.
17. The method of item 16, wherein the guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the long non-coding RNA.
18. The method of item 17, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD site or SA site of the long non-coding RNA.
19. The method of item 18, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the long non-coding RNA.
20. The method of any one of items 15-19, wherein the functional profile comprises a change in cell phenotype and/or an increase or decrease in expression of a coding gene or a non-coding gene.
21. The method of item 20, wherein the coding gene is an exogenous reporter gene or a naturally-occurring coding gene in the genome.
22. The method of any one of items 16-21, wherein the host cell is in a population of host cells and each host cell independently comprises a specific guide RNA construct.
23. The method of item 22, which is a high throughput method for screening or identifying long non-coding RNAs in eukaryotic cell genomes.
24. lncRNA for use in modulating cell growth or proliferation, selected from the group consisting of: XXbac-B135H6.15, RP11-848P1.5, AC005330.2, AP001062.9, AP005135.2, RP11-867G23.4, LINC01049, DGCR5, RP11-509A17.3, CTB-25J19.1, CTD-2517M22.17, CROCCP2, AC016629.8, CTC-490G23.4, RP 016629.8-117D 22.1, AC016629.8, RP 016629.8-251M 1.1, AC016629.8, RP 016629.8-17J 7, RP 016629.8-56N 19.5, TMEM191 016629.8, LL22NC 016629.8-102D 1.18, LINC 10, LL22NC 016629.8-23C 6.13, 203RP 016629.8-83J 21.3, 016629.8-KRA 12.4, ANbacD 62P 016629.8-016629.8, PARP 19.72, TERP 19-016629.8, CTP 016629.8-36464P 016629.8, CTRP 016629.8, and CTRP 36464.
25. A method for interfering with or eliminating the function of a long non-coding RNA in a eukaryotic cell, comprising introducing into a eukaryotic cell one or more CRISPR/Cas guide RNAs that target one or more polynucleotide sequences surrounding one or more splice sites of the long non-coding RNA, whereby the one or more guide RNAs target one or more polynucleotide sequences surrounding one or more splice sites of the long non-coding RNA and cleave the one or more polynucleotide sequences in the presence of a Cas protein, resulting in intron retention and/or exon skipping of the long non-coding RNA and thus interfering with or eliminating the function of the long non-coding RNA.
26. The method of item 25, wherein the guide RNA targets a polynucleotide sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the long non-coding RNA.
27. The method of item 26, wherein the guide RNA targets a polynucleotide sequence within a region spanning -30 bp to +30 bp around the SD or SA site of the long non-coding RNA.
28. The method of item 27, wherein the guide RNA targets a polynucleotide sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the long non-coding RNA.
29. The method of any one of items 25-28, wherein the Cas protein is a Cas9 enzyme.
30. The method of any one of items 25-29, wherein introducing into the cell is accomplished by a delivery system comprising a viral particle, a liposome, electroporation, microinjection, conjugation, a nanoparticle, an exosome, a microbubble, or a gene gun.
31. The method of item 30, wherein introducing into the cell is effected by a delivery system comprising a lentiviral particle.
32. A method of interfering with and identifying gene function by targeted splicing comprising:
introducing into a host cell a CRISPR/Cas guide RNA construct comprising a guide sequence that targets a genomic sequence surrounding a splice site of a gene of interest operably linked to a promoter and a guide hairpin sequence,
expressing the guide RNA targeting the genomic sequence in the host cell and introducing exon skipping and/or intron retention in the gene of interest in the presence of a CRISPR/Cas nuclease and determining the functional profile of the gene of interest.
33. The method of item 32, wherein the gene of interest is a gene having a conserved coding sequence or a non-coding gene.
34. The method of item 33, wherein the guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the SD site or SA site of the gene of interest.
35. The method of item 34, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD site or SA site of the gene of interest.
36. The method of item 35, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD site or SA site of the gene of interest.
37. The method of any one of items 32-36, wherein the functional profile comprises a change in cell phenotype and/or an increase or decrease in expression of a coding gene or a non-coding gene.
38. The method of item 37, wherein the coding gene is an exogenous reporter gene or a naturally-occurring coding gene in the genome.
39. A method of inhibiting tumor cell growth or proliferation by interfering with the function of long non-coding RNA (lncRNA), comprising identifying and disrupting lncRNA essential for tumor cell growth or proliferation using the method of any of items 16-23, thereby inhibiting tumor cell growth or proliferation.
40. The method of item 39, wherein said lncRNA essential for tumor cell growth or proliferation is selected from the group consisting of XXbac-B135H6.15, RP11-848P1.5, AC005330.2, AP001062.9, AP005135.2, RP11-867G23.4, LINC01049, DGCR5, RP11-509A17.3, CTB-25J19.1, CTD-2517M22.17, CROCCP2, AC016629.8, CTC-490G23.4, RP 016629.8-117D 22.1, AC016629.8, RP 016629.8-251M 1.1, AC016629.8, RP 36429J 17.7, RP 016629.8-56N 19.5, TMEM191 016629.8, LL22NC 016629.8-102D 1.20318, LINC 3610, KRRP 22-23C 6.13, 016629.8-3683.3-36544, TERP 12.72, TEEM 191-72, NRP 016629.8, CTP 016629.8-016629.8, PARP 016629.8, CTP 016629.8-016629.8, and CTP 016629.8-016629.8.
In one aspect, the methods of the invention use the ability of the CRISPR/Cas system to cleave specific genomic sequences surrounding an lncRNA splice site, inducing intron retention or exon skipping of the lncRNA and thereby interfering with or eliminating lncRNA function. The targeted genomic loci lie around splice sites of genomic genes, in particular around splice sites of genes encoding long non-coding RNA (lncRNA), and in particular within a region spanning -50 bp to +75 bp around the SD or SA site, more preferably within a region spanning -30 bp to +30 bp, and most preferably within a region spanning -10 bp to +10 bp. The sequence surrounding the targeted lncRNA splice site is cleaved and then mutated by the cellular non-homologous end joining (NHEJ) mechanism of the host cell; the mutation results in exon skipping and/or intron retention and thus substantially eliminates the function of the lncRNA.
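The nested target windows described above can be illustrated with a minimal Python sketch that classifies a cut position by its signed offset from a splice site. The sign convention (negative offsets lie upstream of the site in transcript orientation), the function name `window_tier` and the tier labels are assumptions made for illustration only and are not part of the disclosure.

```python
def window_tier(cut_pos, site_pos, strand="+"):
    """Return the narrowest disclosed window that contains the cut, or None.

    cut_pos and site_pos are genomic coordinates; the offset is taken in
    transcript orientation, so it is mirrored on the minus strand.
    """
    offset = cut_pos - site_pos if strand == "+" else site_pos - cut_pos
    if -10 <= offset <= 10:
        return "-10..+10"   # most preferred window
    if -30 <= offset <= 30:
        return "-30..+30"   # more preferred window
    if -50 <= offset <= 75:
        return "-50..+75"   # broadest window
    return None             # outside every window


if __name__ == "__main__":
    print(window_tier(105, 100))        # offset +5
    print(window_tier(160, 100))        # offset +60
    print(window_tier(40, 100, "-"))    # offset +60 on the minus strand
```

A guide whose predicted cut falls outside all three windows returns `None` and would simply be excluded under these assumptions.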
As known in the art, CRISPR/Cas system nucleases require a guide RNA to cleave genomic DNA. These guide RNAs consist of: (1) a 19-21 nucleotide spacer sequence (guide sequence) of variable sequence that targets the CRISPR/Cas system nuclease to a genomic location in a sequence-specific manner, and (2) a hairpin sequence that is constant among guide RNAs and allows binding of the guide RNA to the CRISPR/Cas system nuclease.
The methods herein involve introducing into a host cell a CRISPR/Cas guide RNA construct comprising a guide sequence targeting a genomic sequence surrounding a long non-coding RNA splice site, operably linked to a promoter and a hairpin sequence, and expressing in the host cell the guide RNA targeting the genomic sequence. In one embodiment, the guide sequence targets genomic sequences within a region spanning -50 bp to +75 bp around the SD or SA site of the long non-coding RNA, more preferably within a region spanning -30 bp to +30 bp around the SD or SA site, and most preferably within a region spanning -10 bp to +10 bp around the SD or SA site.
In some cases, the method further comprises determining a functional profile of the long non-coding RNA. A change in the expression of a genomic gene (coding or non-coding) or a change in the functional activity of its gene product (the encoded protein) can be used as an indicator of the regulatory function of the lncRNA. Alternatively, the coding sequence of a reporter gene may be inserted into the genome (e.g., by replacing the native coding sequence), and changes in its expression or in the functional activity of its gene product may be used as an indicator of the functional profile of the long non-coding RNA. In some cases, the coding sequence of the reporter gene is fused to the native coding sequence, and the indicator is the expression of the mRNA or of the resulting fusion protein, or the functional activity of the fusion protein.
In a particular aspect, the methods disclosed herein can be used to screen for and identify lncRNAs that are involved in cellular processes other than transcription, including, for example, cell survival, cell division, cell metabolism, apoptosis, cell cycle, nucleosome assembly, signal transduction, development of multicellular organisms, immune response, cell adhesion, angiogenesis, and the like. In some embodiments, the method can be used to identify lncRNAs that cause a change in a cellular process selected from the group consisting of: cell survival, cell division, cell metabolism, apoptosis, cell cycle, nucleosome assembly, signal transduction, development of multicellular organisms, immune response, cell adhesion, and angiogenesis. In some embodiments, the methods can be used to identify lncRNAs that cause a change in a cellular phenotype, such as loss of function or gain of function. In some embodiments, the methods can be used to identify lncRNAs that result in a decrease or increase in transcription of a coding gene and/or a non-coding gene. The methods can be used to identify the role of one or more lncRNAs simultaneously or sequentially, or to identify the function of lncRNAs individually or in different combinations.
For example, a population of cells is transfected with a CRISPR/Cas guide RNA library encoding guide RNAs of different sequences, each targeting a genomic sequence surrounding an lncRNA splice site; the guide RNAs are expressed in the cells and, in the presence of the CRISPR/Cas nuclease, induce exon skipping and/or intron retention of the lncRNAs. The RNA profile and transcriptome of each cell may be analyzed using, for example, but not limited to, single-cell RNA sequencing (RNA-Seq) techniques. The analysis reveals the effect of the genomic mutations on the RNA profile, including the type and abundance of RNA molecules. The methods can also be used to identify the nature (e.g., sequence) of the guide RNA that achieves exon skipping and/or intron retention. Thus, the effect of exon skipping or intron retention can be observed directly across the entire cellular transcriptome by experiments in single cells.
The present invention provides CRISPR/Cas guide RNA constructs comprising a guide sequence targeting a genomic sequence surrounding a long non-coding RNA splice site operably linked to a promoter and a hairpin sequence.
In some embodiments, the eukaryotic genome may be a human genome, and thus the CRISPR/Cas guide construct may be intended for use in a human cell.
The guide sequence may be 19-21 nucleotides in length. Hairpin sequences may be less than 100 nucleotides, less than 90, 80, 70, 60, 50, 40 or 30 nucleotides in length, for example about 20, 30, 40, 50, 60 nucleotides. In other embodiments, the hairpin sequence may be about 20-60 or 20-40 nucleotides in length. Once transcribed, the hairpin sequence can bind to the CRISPR/Cas nuclease.
The CRISPR/Cas guide construct is DNA in nature and when transcribed produces guide RNA.
The invention also provides a population of cells comprising any of the above host cells. The host cell population may be homogeneous or heterogeneous.
In some embodiments, the cell further comprises a CRISPR/Cas nuclease and/or a coding sequence for a CRISPR/Cas nuclease. In some embodiments, the cell further comprises a Cas9 nuclease and/or a coding sequence for a Cas9 nuclease.
In some embodiments, the coding sequence for the reporter protein or the fusion protein comprising the reporter protein is integrated into the genome of the host cell.
In some embodiments, the host cell is in a population of host cells, and each host cell independently comprises a specific guide RNA construct.
In some embodiments, each host cell expresses a specific functional guide RNA and undergoes a mutation, directed by that guide RNA, in a genomic sequence different from those mutated in the other host cells of the population.
The invention also provides a high-throughput method for screening or identifying long non-coding RNAs in the genome of eukaryotic cells, comprising introducing a CRISPR/Cas guide RNA library targeting genomic sequences surrounding lncRNA splice sites into a population of host cells, wherein each host cell in the population independently comprises and expresses a specific guide RNA and, in the presence of a CRISPR/Cas nuclease, the targeted genomic sequence is cleaved and mutated, resulting in exon skipping and/or intron retention of the lncRNA.
In some embodiments, the high-throughput method further comprises identifying the effect of the lncRNA on cell phenotype or on the expression of coding or non-coding genes. In some embodiments, each host cell expresses a specific guide RNA and is mutated in a different genomic sequence relative to the other host cells in the population. In some embodiments, the coding gene is exogenous or endogenous to the genome of the cell. In some embodiments, the alteration in the phenotype of the cell comprises a loss of function or a gain of function. In some embodiments, the change in expression of the coding gene or non-coding gene is an increase or decrease in transcription of the coding gene or non-coding gene.
The invention also provides lncrnas screened or identified by the high throughput methods disclosed herein. These lncRNAs include, but are not limited to, XXbac-B135H6.15, RP11-848P1.5, AC005330.2, AP001062.9, AP005135.2, RP11-867G23.4, LINC01049, DGCR5, RP11-509A17.3, CTB-25J19.1, CTD-2517M22.17, CROCCP2, AC016629.8, CTC-490G23.4, RP 016629.8-117D 22.1, AC016629.8, RP 016629.8-251M 1.1, AC016629.8, RP 016629.8-J17.7, RP 72-56N 19.5, TMEM191 016629.8, NC 016629.8-102D 1.18, LINC00410, NC 22NC 016629.8-20323C 6.13, RP 72-83J 21.3, KRRP 016629.8-544A 12.4, AND 016629.8-429P 62, AND 429-19, LINC 0043672, CTP 016629.8-016629.8, CTP 016629.8, and TPRP 36464 can modulate cell proliferation.
The invention also provides a method for interfering with or eliminating the function of a long non-coding RNA in a eukaryotic cell, comprising introducing into the eukaryotic cell one or more CRISPR/Cas guide RNAs that target one or more polynucleotide sequences surrounding one or more splice sites of the long non-coding RNA, whereby the one or more guide RNAs target the one or more polynucleotide sequences and, in the presence of a Cas protein, the one or more polynucleotide sequences are cleaved, resulting in intron retention and/or exon skipping of the long non-coding RNA and thus interfering with or eliminating its function. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -50 bp to +75 bp around the SD or SA site of the long non-coding RNA. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -30 bp to +30 bp around the SD site or SA site of the long non-coding RNA. In some embodiments, the guide RNA targets a polynucleotide sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the long non-coding RNA. In some embodiments, the CRISPR/Cas nuclease is Cas9 or Cpf1. In some embodiments, the introduction into the cell is performed by a delivery system comprising a viral particle, a liposome, electroporation, microinjection, conjugation, a nanoparticle, an exosome, a microbubble, or a gene gun, preferably by a delivery system comprising a lentiviral particle.
Brief Description of Drawings
FIGS. 1a-b. a, Genomic sequence features and base specificity of splice sites in humans. The y-axis indicates the probability of each base at each position. b, Schematic representation of intron retention or exon skipping induced by sgRNAs targeting the region around Splice Donor (SD) or Splice Acceptor (SA) sites.
FIGS. 2a-b. These figures show the correlation between replicate experiments in an sgRNA library screen targeting essential ribosomal genes. Scatter plots of normalized sgRNA read counts in the splice-targeting libraries, including day 0 control samples (Ctrl) and day 15 experimental samples (Exp), for the HeLa cell line (a) and the Huh7.5 cell line (b). The Spearman correlation (Spearman corr.) between the two replicates of each sample is also reported.
FIG. 3. This figure shows the deep sequencing analysis of the CRISPR screen of sgRNA libraries targeting ribosomal genes in HeLa and Huh7.5 cell lines. The sgRNA saturation mutagenesis library was designed to target the -50 bp to +75 bp region around the 5' SD site and the -75 bp to +50 bp region around the 3' SA site of 79 ribosomal genes. The pooled plasmid library was transduced by lentivirus into HeLa and Huh7.5 cells expressing Cas9 protein. The decrease of all sgRNAs at each indicated locus was calculated as log2(Exp:Ctrl) of normalized read counts, and the black bars represent the average fold change of all sgRNAs at each locus. The dashed line indicates the position of the splice site.
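The log2(Exp:Ctrl) calculation on normalized read counts can be sketched as follows. The normalization to reads per million and the pseudocount are assumptions chosen for illustration; the caption does not state which normalization was used, and the function name `log2_fold_changes` is hypothetical.

```python
import math


def log2_fold_changes(ctrl_counts, exp_counts, pseudocount=1.0):
    """Return {sgRNA: log2(Exp:Ctrl)} from raw read counts per sgRNA.

    Counts are scaled to reads per million within each sample (assumed
    normalization); the pseudocount avoids division by zero for dropouts.
    """
    ctrl_total = sum(ctrl_counts.values())
    exp_total = sum(exp_counts.values())
    result = {}
    for sgrna, ctrl in ctrl_counts.items():
        ctrl_norm = ctrl / ctrl_total * 1e6 + pseudocount
        exp_norm = exp_counts.get(sgrna, 0) / exp_total * 1e6 + pseudocount
        result[sgrna] = math.log2(exp_norm / ctrl_norm)
    return result


if __name__ == "__main__":
    ctrl = {"sg1": 500, "sg2": 500}   # day 0 sample (made-up counts)
    exp = {"sg1": 100, "sg2": 900}    # day 15 sample (made-up counts)
    print(log2_fold_changes(ctrl, exp))
```

Negative values indicate sgRNAs depleted in the experimental sample relative to the control.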
FIGS. 4a-c. These figures show the identification of sgRNA target regions that produce splice site disruption. a, Normalized proportion of high-efficiency sgRNAs at each locus in HeLa and Huh7.5 cell lines. Data were calculated by dividing the number of sgRNAs with more than 4-fold reduction by the total number of sgRNAs designed at the indicated loci. b, Comparison of high-efficiency sgRNAs targeting introns, 5' SD sites and exons in HeLa and Huh7.5 cell lines. Each bar represents the percentage of sgRNAs with greater than 2-fold or 4-fold reduction in the different regions. Data are expressed as mean ± s.e.m. c, Comparison of high-efficiency sgRNAs targeting introns, 3' SA sites and exons in HeLa and Huh7.5 cell lines. Data are expressed as mean ± s.e.m.
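The per-locus proportion described for panel a (sgRNAs with more than 4-fold reduction divided by all sgRNAs designed at that locus) can be sketched as below. The input layout, a list of (locus offset, log2 fold change) pairs, and the function name are illustrative assumptions.

```python
import math
from collections import defaultdict


def high_efficiency_fraction(records, fold=4.0):
    """Return {locus: fraction of sgRNAs reduced more than `fold`-fold}.

    records is an iterable of (locus, log2_fold_change) pairs; a reduction of
    more than 4-fold corresponds to a log2 fold change below -2.
    """
    threshold = -math.log2(fold)
    total = defaultdict(int)
    depleted = defaultdict(int)
    for locus, log2fc in records:
        total[locus] += 1
        if log2fc < threshold:
            depleted[locus] += 1
    return {locus: depleted[locus] / total[locus] for locus in total}


if __name__ == "__main__":
    data = [(-1, -3.0), (-1, -1.0), (2, -2.5)]  # made-up values
    print(high_efficiency_fraction(data))
```

Passing `fold=2.0` gives the corresponding proportion for the 2-fold cut-off mentioned for panel b.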
FIGS. 5a-e. This figure illustrates the construction of the CRISPR system and the genome-scale screen used to identify lncRNAs essential for cell growth and proliferation. a, Construction of the CRISPR system. b, Workflow for construction of the splice-targeting sgRNA library, screening, and data analysis. c, Scatter plot of sgRNA fold changes between two independent replicates. d, Distribution of log2(fold change) for non-targeting sgRNAs, sgRNAs targeting essential genes, and sgRNAs targeting lncRNAs. Fold changes in each group were compared with those of non-targeting sgRNAs by t-test. P < 0.001. e, Screening scores of negatively selected lncRNAs in the splice-targeting CRISPR screen. The fold changes of all targeting sgRNAs were compared with those of negative control sgRNAs by the Wilcoxon test, and the resulting P values were further corrected using the null distribution of negative control genes (obtained by randomly sampling negative control sgRNAs). Screening scores were calculated from the mean fold change and the corrected P values (see methods section). The top 10 negatively selected lncRNAs and the essential genes are marked separately.
FIGS. 6a-f. This figure shows the validation of the function of candidate lncRNAs. a-c, Effects of the indicated sgRNAs on cell proliferation in K562 and GM12878 cells, including three control sgRNAs, namely a non-targeting sgRNA, an sgRNA targeting the AAVS1 locus, and an sgRNA targeting a splice site of RPL18 (a gene essential for cell growth) (a), and sgRNAs targeting two negatively selected lncRNAs (b, c). Lentiviral expression vectors, each containing an sgRNA and an EGFP marker driven by the CMV promoter, were transduced into K562 and GM12878 cells, respectively. The percentage of EGFP-positive cells, which indicates sgRNA-infected cells, was measured every 3 days by FACS. The first FACS analysis was performed 3 days after infection (marked as day 0), and the cells were then passaged for 12 days. Cell proliferation was determined for each sample by dividing the percentage of EGFP-positive cells at the indicated time points by the percentage on day 0. Data are presented as the mean and standard deviation of triplicate experiments. Asterisks represent P values compared with sgRNAs targeting AAVS1 at the assay endpoint (day 12), calculated using the t-test and adjusted using the Benjamini-Hochberg method. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant. d, Cell proliferation for the 35 highest-ranked candidate lncRNAs in K562 cells compared with GM12878 cells, using the splice-targeting strategy. The 35 highest-ranked candidate lncRNAs are XXbac-B135H6.15, RP11-848P1.5, AC005330.2, AP001062.9, AP005135.2, RP11-867G23.4, LINC01049, DGCR5, RP11-509A17.3, CTB-25J19.1, CTD-2517M22.17, CROCCP2, AC016629.8, CTC-490G23.4, RP 016629.8-117D 22.1, AC016629.8, RP 016629.8-36M 1.1, AC016629.8, RP 016629.8-429J 17.7, RP 016629.8-56N 19.5, TMEM191 016629.8, LL22NC 016629.8-102D 1.18, LINC00410, LL22 016629.8-23C 6.13, 016629.8-J21.3, KRRP 72-36544, KRRP 12.4, ANNC 016629.8-102D 1.18, LINC 2033672, PARP 016629.8-016629.8, CTP 016629.8-36464, CTP 016629.8-36464.
The threshold was set at 80% of the normalized percentage of sgRNA-infected cells on day 12. The light grey dots indicate lncRNAs that are essential only in K562 cells, and the dark grey dots indicate those that exhibit a growth phenotype in both K562 and GM12878 cells. e, Effect of large-fragment deletion of lncRNA XXbac-B135H6.15 on cell proliferation in K562 cells. Four pairs of gRNAs were designed to delete the promoter and first exon. The pgRNAs were also expressed from scaffolds containing an EGFP marker, and cell proliferation assays (a-c) were performed as shown in Fig. 3. Data are presented as the mean and standard deviation of triplicate experiments. Asterisks represent P values compared with AAVS1_P1 at day 15, calculated using the t-test and adjusted using the Benjamini-Hochberg method. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant. f, Correlation between the knockout effects of the splice-targeting approach and the pgRNA-mediated deletion approach on the highest-ranked lncRNA candidates.
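The proliferation readout and multiple-testing adjustment described for FIG. 6 can be sketched in a few lines of Python. The functions below normalize the EGFP-positive percentage to day 0, apply the 80% endpoint threshold, and adjust a list of P values with the Benjamini-Hochberg procedure. Function names and example numbers are illustrative assumptions, and the t-test that produces the P values is not shown.

```python
def normalize_to_day0(egfp_percentages):
    """Divide each time point's EGFP-positive percentage by the day 0 value."""
    day0 = egfp_percentages[0]
    return [value / day0 for value in egfp_percentages]


def shows_growth_defect(egfp_percentages, threshold=0.8):
    """True if the normalized endpoint value falls below the threshold."""
    return normalize_to_day0(egfp_percentages)[-1] < threshold


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    series = [50.0, 45.0, 30.0]            # made-up EGFP+ percentages
    print(normalize_to_day0(series))
    print(shows_growth_defect(series))
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

The adjusted values can then be mapped to the asterisk levels used in the caption.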
FIGS. 7-12. These figures provide validation of the highest-ranked lncRNAs obtained by the splice-targeting strategy.
FIGS. 13a-b. These figures provide validation of candidate lncRNAs by large-fragment deletion. a, Cell proliferation assay in K562 cells following large-fragment deletion at the AAVS1 locus and of the essential genes RPL19 and RPL23A. Two pairs of gRNAs were designed for the AAVS1 locus, and one pair was designed for each essential gene to delete the promoter and first exon. The design principles of the pgRNAs and the method used to determine the growth effect are the same as described in FIG. 3 and the remaining figures. Data are presented as the mean and standard deviation of triplicate experiments. Asterisks represent P values compared with AAVS1_P1 at day 15, calculated using the t-test and adjusted using the Benjamini-Hochberg method. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS, not significant. b, Effect on cell growth of large-fragment deletion of 5 candidate lncRNAs that were validated by the splice-targeting strategy.
FIG. 14. This figure provides validation of candidate lncRNAs by large-fragment deletion, where 6 candidate lncRNAs were not validated by the splice-targeting strategy in K562 cells.
FIGS. 15a-f. These figures show the functional profiling of the lncRNAs MIR17HG and BMS1P20 in K562 and GM12878 cell lines. a, Expression pattern of the 500 genes showing the highest variation between MIR17HG- and BMS1P20-KO (knockout) cells and their corresponding controls. b, Expression levels of the top 100 essential lncRNA candidates in K562 and GM12878 cells. c, Expression levels of essential genes down-regulated in MIR17HG- and BMS1P20-KO cells compared with wild-type K562 cells. d, Venn diagram showing the down-regulated essential genes shared between MIR17HG- and BMS1P20-KO K562 cells. e, Volcano plot of differential expression in K562 cells compared with GM12878 cells after infection with a splice-targeting sgRNA against BMS1P20. The black and grey dots represent all genes and differentially expressed genes, respectively. f, Gene Ontology (GO) terms and KEGG annotation of genes that are down-regulated (upper panel) and up-regulated (lower panel) in K562 cells.
FIGS. 16a-e. This figure illustrates the RNA-seq profiles of the lncRNA knockouts of MIR17HG and BMS1P20 in K562 and GM12878 cells. a, Pairwise scatter plots of gene expression levels between MIR17HG-KO (knockout), BMS1P20-KO and wild-type K562 cells. b, Pairwise scatter plots of gene expression levels between MIR17HG-KO, BMS1P20-KO and wild-type GM12878 cells. c, Gene Ontology and KEGG annotation of the conserved essential genes showing down-regulation after infection with sgRNAs targeting MIR17HG and BMS1P20 in K562 cells. d, Volcano plot of differential expression between BMS1P20-KO and wild-type K562 cells. e, Volcano plot of differential expression between BMS1P20-KO and wild-type GM12878 cells.
Detailed Description
Definitions
The invention is described on the basis of specific embodiments and with reference to the attached drawings, but the invention is not limited thereto; the scope of protection is defined by the claims. Any reference signs in the claims shall not be construed as limiting the scope. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or steps. Where a singular noun is introduced with an article such as "a," "an," or "the," this includes the plural of that noun unless otherwise indicated.
The following terms or definitions are also provided to aid in the understanding of the present invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art to which this invention pertains. For definitions and terms of the art, practitioners are referred in particular to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, New York (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999). The definitions provided herein should not be construed to have a scope less than that understood by those of skill in the art.
The terms "polynucleotide", "nucleotide sequence", "nucleic acid" and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, which may be deoxyribonucleotides or ribonucleotides, or analogs thereof. The polynucleotide may have any three-dimensional structure and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci, exons, introns, messenger RNA (mRNA), long non-coding RNA (lncRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short hairpin RNA (shRNA), micro RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure can be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. The polynucleotide may be further modified after polymerization, such as by conjugation to a labeling component.
In one aspect of the invention, the terms "chimeric RNA," "chimeric guide RNA," "single guide RNA," and "synthetic guide RNA" are used interchangeably and refer to a polynucleotide sequence comprising a guide sequence, a tracr sequence, and a tracr partner sequence. The term "guide sequence" refers to a sequence of about 20 bp within a guide RNA that specifies a targeting site, and may be used interchangeably with the terms "guide" or "spacer".
As used herein, "expression" refers to the process of transcription of a polynucleotide from a DNA template (e.g., into mRNA or other RNA transcript) and/or the subsequent translation of the transcribed mRNA into a peptide, polypeptide, or protein. The transcripts and encoded polypeptides may be collectively referred to as "gene products". If the polynucleotide is derived from genomic DNA, expression may include splicing of mRNA in eukaryotic cells.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds., (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R. I. Freshney, ed. (1987)) [14-18].
Several aspects of the invention relate to a vector system comprising one or more vectors, or to such vectors themselves. Vectors can be designed for expression of CRISPR transcripts (e.g., nucleic acid transcripts, proteins, or enzymes) in prokaryotic or eukaryotic cells. For example, CRISPR transcripts can be expressed in bacterial cells such as E. coli, insect cells, yeast cells, or mammalian cells. Suitable host cells are described in detail in Goeddel, GENE EXPRESSION TECHNOLOGY: METHODS IN ENZYMOLOGY 185, Academic Press, San Diego, Calif. (1990) [19]. Alternatively, the recombinant expression vector may be transcribed and translated in vitro, for example, using T7 promoter regulatory sequences and T7 polymerase.
In some embodiments, a mammalian expression vector is used, which is capable of driving expression of one or more sequences in a mammalian cell. Examples of mammalian expression vectors include pCDM8 [20] and pMT2PC [21]. When used in mammalian cells, the control functions of the expression vector are typically provided by one or more regulatory elements. For example, commonly used promoters are derived from polyoma virus, adenovirus 2, cytomegalovirus, simian virus 40, and other promoters disclosed herein and known in the art. Other suitable expression systems for use in both prokaryotic and eukaryotic cells are described, for example, in Chapters 16 and 17 of Sambrook, et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989 [14].
In general, "CRISPR system" collectively refers to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated ("Cas") genes, including sequences encoding a Cas gene, tracr (trans-activating CRISPR) sequences (e.g., tracrRNA or an active partial tracrRNA), tracr partner sequences (encompassing a "direct repeat" and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), guide sequences (also referred to as "spacers" in the context of an endogenous CRISPR system), or other sequences and transcripts from a CRISPR locus. In some embodiments, one or more elements of the CRISPR system are derived from a type I, type II or type III CRISPR system.
In the context of forming a CRISPR complex, a "target sequence" refers to a sequence for which a guide sequence is designed to have complementarity, wherein hybridization between the target sequence and the guide sequence promotes formation of the CRISPR complex. Complete complementarity is not necessary provided that there is sufficient complementarity to cause hybridization and promote CRISPR complex formation.
Typically, in the context of an endogenous CRISPR system, formation of a CRISPR complex (including hybridization of a guide sequence to a target sequence and complexing with one or more Cas proteins) results in cleavage of one or both strands in or near (e.g., within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Without wishing to be bound by theory, the tracr sequence may comprise or consist of all or a portion of a wild-type tracr sequence (e.g., about or greater than about 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65, 70, 75, 80, 85 or more nucleotides of a wild-type tracr sequence) and may also form part of a CRISPR complex, e.g., by hybridizing along at least a portion of the tracr sequence to all or a portion of a tracr partner sequence that is operably linked to the guide sequence.
In some embodiments, the tracr sequence is sufficiently complementary to the tracr partner sequence to hybridize and participate in the formation of a CRISPR complex. As with the target sequence, complete complementarity is not necessary, as long as it is sufficient for function. In some embodiments, the tracr sequence has at least 50%, 60%, 70%, 80%, 90%, 95%, or 99% complementarity along the length of the tracr partner sequence when optimally aligned.
In some embodiments, one or more vectors that drive expression of one or more elements of the CRISPR system are introduced into a host cell such that expression of the CRISPR system elements directs formation of CRISPR complexes at one or more target sites. In another embodiment, the host cell is engineered to stably express Cas9 and/or OCT1.
In general, a guide sequence is any polynucleotide sequence that has sufficient complementarity to a target polynucleotide sequence to hybridize to the target sequence and direct sequence-specific binding of the CRISPR complex to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is about or greater than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99% or more when optimally aligned using an appropriate alignment algorithm. Optimal alignment may be determined using any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, Burrows-Wheeler Transform-based algorithms (e.g., Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, the guide sequence may be about or greater than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 60, 65, 70, 75 or more nucleotides in length. In other embodiments, the guide sequence may be shorter than about any of these lengths. The ability of a guide sequence to direct sequence-specific binding of the CRISPR complex to a target sequence may be assessed by any suitable assay method. For example, components of the CRISPR system (including the guide sequence to be tested) sufficient to form a CRISPR complex can be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the CRISPR sequence, followed by assessment of preferential cleavage within the target sequence (such as by the Surveyor assay as described herein).
Likewise, cleavage of a target polynucleotide sequence can be assessed in a test tube by providing the target sequence and the components of a CRISPR complex (comprising either the guide sequence to be tested or a control guide sequence different from the test guide sequence), and comparing the rate of binding or cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and would be known to those skilled in the art.
In some embodiments, the CRISPR enzyme is part of a fusion protein comprising one or more heterologous protein domains (e.g., about or greater than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more domains in addition to the CRISPR enzyme). The CRISPR enzyme fusion protein can comprise any additional protein sequence, and optionally a linker sequence between any two domains. Examples of protein domains that can be fused to a CRISPR enzyme include, but are not limited to, epitope tags, reporter gene sequences, and protein domains having one or more of the following activities: methylase activity, demethylase activity, transcriptional activation activity, transcriptional repression activity, transcriptional release factor activity, RNA cleavage activity and nucleic acid binding activity.
In some aspects, the invention provides methods comprising delivering to a host cell one or more polynucleotides, e.g., one or more constructs, e.g., vectors, one or more transcripts thereof and/or one or more proteins transcribed therefrom, as described herein. The invention can be used as a basic platform for targeted modification of DNA-based genomes. It can interface with any delivery system including, but not limited to, viruses, liposomes, electroporation, microinjection, and conjugation. In some aspects, the invention further provides cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced by such cells. In some embodiments, the CRISPR enzyme in combination (and optionally complexed) with the guide sequence is delivered to a cell. Nucleic acids can be introduced into mammalian cells or target tissues using conventional viral and non-viral based gene transfer methods. Such methods can be used to administer nucleic acids encoding CRISPR system components to cells in a culture medium or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g., vector transcripts as described herein), naked nucleic acids, and nucleic acids complexed with a delivery vehicle, such as liposomes. Viral vector delivery systems include DNA and RNA viruses, which have an episomal or integrated genome for delivery to cells.
Non-viral delivery methods of nucleic acids include lipofection, nucleofection, microinjection, gene guns, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, and artificial virions.
The use of RNA or DNA virus-based systems for delivering nucleic acids takes advantage of highly efficient processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus.
In a preferred embodiment, the targets of the invention comprise long non-coding RNAs (lncRNAs), a class of long transcribed RNA molecules, e.g., RNA molecules of more than 200 nucleotides in length. Their size distinguishes lncRNAs from small regulatory RNAs such as microRNAs (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), short hairpin RNAs (shRNAs) and other short RNAs. lncRNAs can function in a sequence-specific manner by binding to DNA or RNA, or by binding to proteins. In contrast to miRNAs, lncRNAs do not appear to act through a single common mode of action, but can regulate gene expression and protein synthesis in a variety of ways.
lncRNAs can be classified into the following locus biotypes based on their position relative to protein-coding genes: intergenic lncRNAs, which are transcribed from intergenic regions on either strand; intronic lncRNAs, which are transcribed entirely from introns of protein-coding genes; sense lncRNAs, which are transcribed from the sense strand of a protein-coding gene and either contain exons of the protein-coding gene, overlap part of the protein-coding gene, or cover the entire sequence of the protein-coding gene through an intron; and antisense lncRNAs, which are transcribed from the antisense strand of a protein-coding gene and either overlap its exonic or intronic regions or cover the entire protein-coding sequence through an intron. Recent human transcriptome analyses have shown that protein-coding sequences account for only a small fraction of genomic transcripts. Most transcripts of the human genome are non-coding RNAs.
The term "lncRNA" is used in its broadest sense to refer to a target of the invention and includes "lncRNA genes" and the "lncRNA transcripts" that they produce.
The term "exon" as used herein refers to any portion of a gene that will encode a portion of the final mature RNA (produced by the gene after intron removal by RNA splicing). The term exon refers to the DNA sequence within a gene as well as the corresponding sequence in an RNA transcript. In RNA splicing, introns are removed and exons are covalently joined to each other as part of the generation of mature messenger RNA.
An "intron" is any nucleotide sequence within a gene that is removed by RNA splicing during maturation of the final RNA product. The term intron refers to both the DNA sequence within a gene and the corresponding sequence in an RNA transcript. The sequences that are joined together in the final mature RNA after RNA splicing are exons. Introns are found in the genes of most organisms and of many viruses, and are present in a wide range of genes, including those that produce proteins, ribosomal RNA (rRNA), long non-coding RNA (lncRNA), and transfer RNA (tRNA). When proteins are produced from intron-containing genes, RNA splicing occurs as part of the post-transcriptional RNA processing pathway and precedes translation.
As used herein, the term "splicing" refers to the editing of nascent precursor messenger RNA (pre-mRNA) transcripts into mature messenger RNA (mRNA). For most eukaryotic introns, splicing is carried out in a series of reactions catalyzed by the spliceosome, a complex of small nuclear ribonucleoproteins (snRNPs). Spliceosomal introns are usually located within the sequences of eukaryotic protein-coding genes. Within an intron, a donor site (5' end of the intron), a branch site (near the 3' end of the intron), and an acceptor site (3' end of the intron) are required for splicing. The splice donor (SD) site includes an almost invariant GT sequence at the 5' end of the intron, within a larger, less highly conserved region. The splice acceptor (SA) site at the 3' end of the intron terminates the intron with an almost invariant AG sequence. Upstream (in the 5' direction) of the AG is a region rich in pyrimidines (C and T), the polypyrimidine tract. Further upstream of the polypyrimidine tract is the branch point, which includes an adenine nucleotide involved in lariat formation [22,23].
Nuclear pre-mRNA introns are characterized by specific intron sequences located at the boundaries between introns and exons. These sequences are recognized by spliceosomal RNA molecules when the splicing reaction is initiated. The major spliceosome splices introns containing GT at the 5' splice site and AG at the 3' splice site. This type of splicing is referred to as canonical splicing or as the lariat pathway, and it accounts for more than 99% of splicing. In contrast, when the intron flanking sequences do not follow the GT-AG rule, the splicing is termed non-canonical; it accounts for less than 1% of splicing [24].
Our bioinformatic analysis using the WebLogo 3 tool showed that about 99% of the intron regions in the human genome are flanked by GT at the 5' site and AG at the 3' site. This holds for the introns of both coding genes and non-coding RNAs.
Exon skipping is a form of RNA splicing in which one or more exons are skipped and thus omitted from the final RNA, while intron retention is a form of RNA splicing in which introns remain in the final RNA after splicing.
Splicing is regulated by trans-acting proteins (repressors and activators) and the corresponding cis-acting regulatory sites (silencers and enhancers) on the pre-mRNA. However, as part of the complexity of alternative splicing, it should be noted that the effects of splicing factors are often position-dependent. That is, a splicing factor that functions as a splicing activator when bound to an intronic enhancer element may function as a repressor when bound to its splicing element in the context of an exon, and vice versa [25]. The secondary structure of pre-mRNA transcripts also plays a role in regulating splicing, such as by bringing splicing elements together or by masking a sequence that would otherwise serve as a binding element for a splicing factor [26]. Together, these elements form the "splicing code" that controls how splicing occurs under different cellular conditions [27].
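The GT-AG rule described above can be checked per intron with a few lines of Python. This is a minimal sketch, not the analysis actually performed with WebLogo 3; the function names `is_canonical_intron` and `canonical_fraction` are hypothetical, and the input is assumed to be intron DNA sequences written 5' to 3' on the transcribed (sense) strand.

```python
def is_canonical_intron(intron: str) -> bool:
    """Return True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def canonical_fraction(introns) -> float:
    """Fraction of introns in an iterable that follow the GT-AG rule."""
    introns = list(introns)
    if not introns:
        raise ValueError("no introns supplied")
    return sum(is_canonical_intron(i) for i in introns) / len(introns)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real annotated introns.
    examples = ["GTAAGTCTTTTCTTTCAG", "ATATCCTTTTCCTTAC", "gtgagtccctctcag"]
    print(canonical_fraction(examples))
```

Applied to every annotated intron of a genome, the same fraction would be expected to approach the roughly 99% figure stated above, although that depends on the annotation used.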
Modification of genes in eukaryotic cells
The methods of the invention involve efficient delivery of sgRNAs targeting splice sites to produce exon skipping and/or intron retention and thereby disrupt genes, including, for example, coding or non-coding genes. For a gene encoding an lncRNA, the method can effectively disrupt the function of the lncRNA.
To assess the efficacy of splicing targeting in CRISPR screening, we designed a saturated library of splice sites targeting 79 ribosomal genes, most of which are essential for cell growth in various cell lines. The library contained 5,788 sgRNAs with cleavage sites within -50 bp to +75 bp around each 5' SD (splice donor) site and -50 bp to +75 bp around each 3' SA (splice acceptor) site of the 79 genes. Clearly, sgRNAs that affect the splice site are superior to sgRNAs that target only the exon region, and the closer the sgRNA cleavage site is to the splice site, the better its gene disruption effect, with the peak shifted slightly toward the exon for both the SD and SA sites.
CRISPR/Cas9 action mechanism and library screening principle
The methods of the invention utilize CRISPR/Cas systems. Cas9 is derived from a microbial type II CRISPR (clustered regularly interspaced short palindromic repeats) system and has been shown to cleave DNA when paired with a single guide RNA (gRNA). The gRNA contains a 17-21 bp sequence that directs Cas9 to a complementary region in the genome, thus allowing the site-specific generation of a double-strand break (DSB), which is repaired in an error-prone manner by the cellular non-homologous end joining (NHEJ) mechanism. Cas9 primarily cleaves genomic sites at which the gRNA target sequence is followed by a PAM sequence (-NGG). NHEJ-mediated repair of Cas9-induced DSBs introduces a wide range of mutations originating at the cleavage site, which are typically small (<10 bp) insertions/deletions (indels) but may include larger (>100 bp) indels and single-base changes.
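The requirement that a target be followed by an NGG PAM can be illustrated with a short scan of both DNA strands. This is a minimal sketch under stated assumptions: the function name `find_ngg_sites` is hypothetical, the default guide length of 20 is one value within the 17-21 bp range given above, and the cut position is placed 3 bp upstream of the PAM, which is the position commonly reported for Streptococcus pyogenes Cas9 but is not stated in this description.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, guide_len: int = 20):
    """List (strand, guide, cut) for every guide_len-mer followed by an NGG PAM.

    `cut` is a boundary index in forward-strand coordinates: the break is
    assumed to lie between bases cut-1 and cut (3 bp upstream of the PAM).
    """
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping candidate sites are all reported.
        pattern = r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len
        for m in re.finditer(pattern, s):
            cut = m.start() + guide_len - 3
            if strand == "-":
                cut = n - cut  # map the boundary back to forward coordinates
            sites.append((strand, m.group(1), cut))
    return sites


if __name__ == "__main__":
    for site in find_ngg_sites("A" * 20 + "TGG"):
        print(site)
```

The returned cut coordinate can then be compared with a splice-site coordinate to decide whether a candidate guide falls inside a chosen window around the SD or SA site.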
The splicing targeting methods of the invention can be used to screen multiple (e.g., thousands of) sequences in a genome, thereby elucidating the function of those sequences. In some embodiments, the splicing targeting methods of the invention involve high-throughput screening of non-coding RNAs using the CRISPR/Cas9 system to identify genes required for survival, proliferation, or drug resistance, among others. In such a screen, gRNAs targeting tens of thousands of splice sites within the genes of interest are delivered as a pool, together with Cas9, into target cells, e.g., by lentiviral vectors. By identifying gRNAs that are enriched or depleted in cells after selection for a desired phenotype, genes required for that phenotype can be systematically identified.
In the high-throughput CRISPR/Cas9 approach described above, the gRNA library can be cloned into a lentiviral vector. In this case, it is desirable to use a low multiplicity of infection (MOI) to limit the number of guide RNAs in a single cell, typically to a single guide RNA per cell. Integration of gRNAs in each cell is random, allowing for pooled screening in which only one gRNA is expressed per cell. Notably, the high-throughput, genome-scale gRNA screening that targets splice sites according to the present invention can also be used in other CRISPR-based high-throughput screens of coding and regulatory genes.
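Identifying enriched or depleted gRNAs comes down to comparing each gRNA's read count before and after selection. The sketch below shows one generic way to do this with a normalized log2 fold change. It is not the scoring method of the invention, which is not specified in this passage; the function name `log2_fold_changes`, the reads-per-million normalization, and the pseudocount of 1 are illustrative assumptions.

```python
import math


def log2_fold_changes(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Per-gRNA log2(after / before) on reads-per-million normalized counts.

    Negative values indicate depletion after selection, positive values enrichment.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both samples need at least one read")
    result = {}
    for grna, count in before.items():
        rpm_before = count / total_before * 1e6 + pseudocount
        rpm_after = after.get(grna, 0) / total_after * 1e6 + pseudocount
        result[grna] = math.log2(rpm_after / rpm_before)
    return result


if __name__ == "__main__":
    # Toy counts for illustration only.
    day0 = {"sgRNA_1": 100, "sgRNA_2": 100}
    day15 = {"sgRNA_1": 10, "sgRNA_2": 190}
    for name, lfc in log2_fold_changes(day0, day15).items():
        print(name, round(lfc, 2))
```

In a growth screen, gRNAs with strongly negative values would point to targets whose disruption impairs proliferation.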
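The effect of lowering the MOI can be estimated with a simple model. The sketch below assumes that the number of viral integrations per cell follows a Poisson distribution with mean equal to the MOI. This is a common modeling assumption and is not stated in this description; the function name `fraction_single_integration` and the example MOI values are illustrative only.

```python
import math


def fraction_single_integration(moi: float) -> float:
    """Among infected cells, the expected fraction carrying exactly one integration.

    Assumes integrations per cell ~ Poisson(moi):
    P(k = 1 | k >= 1) = moi * exp(-moi) / (1 - exp(-moi)).
    """
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p_zero = math.exp(-moi)
    p_one = moi * p_zero
    return p_one / (1.0 - p_zero)


if __name__ == "__main__":
    for moi in (0.1, 0.3, 1.0, 3.0):
        print(moi, round(fraction_single_integration(moi), 3))
```

Under this model the fraction of infected cells with a single guide RNA rises toward 1 as the MOI falls, at the cost of a smaller fraction of cells being infected at all.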
Guide RNA
As known in the art, CRISPR/Cas system nucleases require a guide RNA to cleave genomic DNA. These guide RNAs consist of: (1) a spacer (guide sequence) of 19-21 nucleotides with a variable sequence, which targets the CRISPR/Cas system nuclease to a genomic location in a sequence-specific manner, and (2) a hairpin sequence that is constant among guide RNAs and allows binding of the guide RNA to the CRISPR/Cas system nuclease. In the presence of a CRISPR/Cas nuclease, the guide RNA triggers a CRISPR/Cas-based genome cleavage event in the cell.
A guide sequence is selected or designed based on the intended target sequence. In some embodiments, the target sequence is a sequence surrounding a splice site, e.g., a region from -50 bp to +75 bp around the SD site of a gene encoding an lncRNA within the genome of a cell, preferably a region from -30 bp to +30 bp around the SD site, and most preferably a region from -10 bp to +10 bp around the SD site; or a region from -50 bp to +75 bp around the SA site, preferably a region from -30 bp to +30 bp around the SA site, and most preferably a region from -10 bp to +10 bp around the SA site. Exemplary target sequences include those sequences that are unique in the target genome.
For example, for Streptococcus pyogenes (S. pyogenes) Cas9, a unique target sequence in the genome may include a Cas9 target site of the form M8N12XGG, where N12XGG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. The unique target sequence in the genome may include an S. pyogenes Cas9 target site of the form M9N11XGG, where N11XGG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome.
For Streptococcus thermophilus (S. thermophilus) CRISPR1 Cas9, the unique target sequence in the genome may include a Cas9 target site of the form M8N12XXAGAAW, where N12XXAGAAW (N is A, G, T or C; X may be any nucleotide; and W is A or T) has a single occurrence in the genome. The unique target sequence in the genome may include an S. thermophilus CRISPR1 Cas9 target site of the form M9N11XXAGAAW, where N11XXAGAAW (N is A, G, T or C; X may be any nucleotide; and W is A or T) has a single occurrence in the genome.
For S. pyogenes Cas9, the unique target sequence in the genome may include a target site of the form M8N12XGGXG, where N12XGGXG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. The unique target sequence in the genome may include an S. pyogenes Cas9 target site of the form M9N11XGGXG, where N11XGGXG (N is A, G, T or C; and X may be any nucleotide) has a single occurrence in the genome. In each of these sequences, "M" may be A, G, T or C and need not be considered when determining whether a sequence is unique.
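The three nested windows around a splice site can be expressed as a simple classifier. This is a sketch only: the function name `window_tier` and the tier labels are hypothetical, and the sign convention (which side of the splice site counts as negative) is an assumption, because it is not defined in this passage.

```python
from typing import Optional


def window_tier(offset_bp: int) -> Optional[str]:
    """Classify a signed offset (bp) between the splice site and the target position.

    Returns the narrowest window that contains the offset, or None if the
    offset lies outside the -50 bp to +75 bp region.
    """
    if -10 <= offset_bp <= 10:
        return "most preferred"   # -10 bp to +10 bp
    if -30 <= offset_bp <= 30:
        return "preferred"        # -30 bp to +30 bp
    if -50 <= offset_bp <= 75:
        return "allowed"          # -50 bp to +75 bp
    return None


if __name__ == "__main__":
    for offset in (0, -25, 60, -60):
        print(offset, window_tier(offset))
```

The same function applies to SD and SA sites alike, since the description gives identical windows for both.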
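The uniqueness criterion for the N12XGG form can be sketched as a count of seed-plus-PAM occurrences on both strands. The function name `count_seed_pam_occurrences` is hypothetical, and scanning a single in-memory string stands in for a search of a whole genome, which in practice would use an indexed aligner.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_seed_pam_occurrences(genome: str, seed: str) -> int:
    """Count occurrences of seed + XGG (X = any base) on both strands.

    For the M8N12XGG form, `seed` is the 12 PAM-proximal nucleotides (N12);
    the 8 distal "M" positions are ignored, as described in the text.
    """
    forward = genome.upper()
    reverse = reverse_complement(forward)
    # A lookahead is used so that overlapping occurrences are all counted.
    pattern = re.compile(r"(?=%s[ACGT]GG)" % re.escape(seed.upper()))
    return len(pattern.findall(forward)) + len(pattern.findall(reverse))


def is_unique_target(genome: str, seed: str) -> bool:
    """True if seed + XGG has a single occurrence in the genome."""
    return count_seed_pam_occurrences(genome, seed) == 1


if __name__ == "__main__":
    toy_genome = "CC" + "GATTACAGATTA" + "TGG" + "AAAA"  # illustrative sequence only
    print(is_unique_target(toy_genome, "GATTACAGATTA"))
```

The other forms listed above (N11XGG, N12XXAGAAW, N12XGGXG and so on) differ only in the seed length and in the PAM part of the pattern.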
It is to be understood that any hairpin sequence can be used as long as it can be recognized and bound by a CRISPR/Cas nuclease.
Guide RNA constructs
In some embodiments, the invention relates to a guide RNA construct. The guide RNA construct may comprise (1) a guide sequence and (2) a guide RNA hairpin sequence, and optionally (3) a promoter sequence capable of initiating transcription of the guide RNA. A non-limiting example of a guide RNA hairpin sequence is described in Chen et al., Cell. 2013 Dec 19; 155(7): 1479-91. An example of a promoter is the human U6 promoter.
In some embodiments, the present invention relates to a CRISPR/Cas guide construct comprising (1) a guide sequence and (2) a guide RNA hairpin sequence, and optionally (3) a promoter sequence capable of initiating transcription of the guide RNA, wherein the guide sequence targets a sequence surrounding a splice site in the genome of a eukaryotic cell, e.g., the guide sequence targets a region from -50 bp to +75 bp, preferably a region from -30 bp to +30 bp, and most preferably a region from -10 bp to +10 bp, surrounding an SD site or SA site of a gene encoding an lncRNA. In some embodiments, the guide sequence targets a splice site of a gene encoding a long non-coding RNA in the genome of a eukaryotic cell to induce exon skipping and/or intron retention, and thus disrupt the long non-coding RNA. In some embodiments, the eukaryotic cell genome is a human genome. In some embodiments, the guide sequence is 19-21 nucleotides in length. In some embodiments, the hairpin sequence is about 40 nucleotides in length and, once transcribed, can bind to the CRISPR/Cas nuclease.
CRISPR/Cas system nucleases
In some embodiments, the CRISPR/Cas nuclease is a type II CRISPR/Cas nuclease. In some embodiments, the CRISPR/Cas nuclease is a Cas9 nuclease. In some embodiments, the Cas9 nuclease is a Streptococcus pneumoniae, Streptococcus pyogenes, or Streptococcus thermophilus Cas9, and may include a mutated Cas9 derived from these organisms. The nuclease may be a functionally equivalent variant of Cas9. In some embodiments, the CRISPR/Cas nuclease is codon optimized for expression in a eukaryotic cell. In some embodiments, the CRISPR/Cas nuclease directs cleavage of one or both strands at the position of a target sequence. CRISPR/Cas system nucleases include, but are not limited to, Cas9 and Cpf1.
Reporter genes and proteins, and readouts
In some embodiments, the reporter gene can be integrated into the cell using a CRISPR/Cas mechanism. For example, expression vectors, such as plasmids, comprising a promoter (e.g., the U6 promoter), a guide RNA hairpin sequence, and a guide sequence to target a desired genomic locus into which the reporter construct is integrated can be used. Such expression vectors may be prepared by cloning the guide sequences into an expression construct containing additional elements. A DNA fragment comprising a reporter coding sequence can be prepared and subsequently modified to include homology arms that flank the reporter coding sequence. A guide RNA expression vector, an amplified DNA fragment comprising a sequence encoding a reporter protein, and a CRISPR/Cas nuclease (or nuclease-encoding expression vector) are introduced into a host cell (e.g., via electroporation). The expression vector may further comprise additional selectable markers such as antibiotic resistance markers to enrich for cells successfully infected with the expression vector. Cells expressing the reporter protein may be further selected.
The reporter gene is used to identify potentially transfected cells and to evaluate the function of regulatory sequences. In general, a reporter gene is a gene that is not endogenous or native to the host cell and encodes a protein that can be readily assayed. Reporter genes encoding easily assayable proteins are known in the art and include, but are not limited to, Green Fluorescent Protein (GFP), Glutathione S-Transferase (GST), horseradish peroxidase (HRP), Chloramphenicol Acetyltransferase (CAT), β-galactosidase, β-glucuronidase, luciferase, HcRed, DsRed, Cyan Fluorescent Protein (CFP), Yellow Fluorescent Protein (YFP), and autofluorescent proteins including Blue Fluorescent Protein (BFP), cell surface markers, antibiotic resistance genes such as neo, and the like.
Expression vector
The term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it is linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends or no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and various other polynucleotides known in the art. One type of vector is a "plasmid," which refers to a circular double-stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Some vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of the host cell upon introduction into the host cell, and are thereby replicated together with the host genome. In addition, some vectors are capable of directing the expression of genes to which they are operably linked. Such vectors are referred to herein as "expression vectors". Expression vectors in recombinant DNA technology often take the form of plasmids.
A recombinant expression vector may comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vector comprises one or more regulatory elements, which may be selected on the basis of the host cell used for expression, operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
Host cell
Virtually any eukaryotic cell type can be used as a host cell, so long as it can be cultured in vitro and modified as described herein. Preferably, the host cell is a pre-established cell line. The host cell or cell line may be a human cell or cell line, or a non-human mammalian cell or cell line.
Examples
Materials and methods
1. Cells and reagents
The HeLa cell line, from the Z. Jiang laboratory (Peking University), was cultured in Dulbecco's modified Eagle's medium (DMEM, Gibco C11995500BT). The Huh7.5 cell line, from the S. Cohen laboratory (Stanford University School of Medicine), was cultured in DMEM (Gibco) supplemented with 1% MEM non-essential amino acids (NEAA, Gibco 1140-). K562 cells, from the H. Wu laboratory (Peking University), and GM12878 cells, from the Coriell cell bank, were cultured in RPMI 1640 medium (Gibco 11875-093). All media were supplemented with 10% fetal bovine serum (FBS, CellMax BL102-02) and 1% penicillin/streptomycin, and all cells were cultured at 37 °C in 5% CO2.
2. Reverse transcription PCR (RT-PCR) for testing intron retention or exon skipping
sgRNAs were cloned into a lentiviral expression vector carrying a CMV promoter-driven mCherry marker and then transduced by viral infection, at an MOI of <1, into HeLaOC cells (HeLa cells stably expressing Cas9 and OCT1) [4]. At 72 hours post infection, mCherry-positive cells were sorted by FACS, and total RNA was extracted from each sample using the RNAprep Pure Cell/Bacteria Kit (TIANGEN DP430). cDNA was synthesized from 2 μg of total RNA using the Quantscript RT Kit (TIANGEN KR103-04), and RT-PCR was performed using TransTaq HiFi DNA polymerase (TransGen AP131-13).
sgRNA sequences targeting either RPL18 or RPL11 genes:
sgRNA1RPL18:5’-GGACCAGCCACTCACCATCC(SEQ ID No.1)
sgRNA2RPL18:5’-AGCTTCATCTTCCGGATCTT(SEQ ID No.2)
sgRNA3RPL11:5’-TCCTTGTGACTACTCACCTT(SEQ ID No.3)
sgRNA4RPL11:5’-AACTCATACTCCCGCACCTG(SEQ ID No.4)
primers used for RT-PCR:
1F:5’-CTGGGTCTTGTCTGTCTGGAA(SEQ ID No.5);
1R:5’-CTGGTGTTTACATTCAGCCCC(SEQ ID No.6);
2F:5’-GGCCAGAAGAACCAACTCCA(SEQ ID No.7);
2R:5’-GACAGTGCCACAGCCCTTAG(SEQ ID No.8);
3F:5’-TCAAGATGGCGTGTGGGATT(SEQ ID No.9);
3R:5’-GACCAGCAAATGGTGAAGCC(SEQ ID No.10);
4F:5’-GATCCTTTGGCATCCGGAGA(SEQ ID No.11);
4R:5’-GCTGATTCTGTGTTTGGCCC(SEQ ID No.12).
3. Construction and screening of a splicing-targeting sgRNA library for essential ribosomal genes
Seventy-nine ribosomal genes were retrieved from NCBI. For these 79 genes, we scanned all potential sgRNAs within -50 bp to +75 bp around each 5' SD site and within -75 bp to +50 bp around each 3' SA site. The genes include:
RPL10, RPL10A, RPL11, RPL12, RPL13, RPL13A, RPL14, RPL15, RPL17, RPL18, RPL18A, RPL19, RPL21, RPL22, RPL22L1, RPL23, RPL23A, RPL24, RPL26, RPL26L1, RPL27, RPL27 27, RPL35 27, RPL36 27, RPL27, RPS27, RPL27, RPS27, RPL27, RPS27, RPL27, RPS27, RPL27, RPS27, RPS27, RPS27, RPL 27.
We ensured that all sgRNAs have at least 2 mismatches to any other locus in the human genome. So that the library would reflect the native cleavage efficiency of the sgRNAs, GC content was not considered in the design. A total of 5,788 sgRNAs targeting the 79 ribosomal genes were synthesized using a CustomArray 12K array chip (CustomArray, Inc.). The sgRNA design is illustrated here with RPL18, one of the 79 ribosomal genes.
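The scanning step can be sketched as follows. This is a minimal illustration only, assuming SpCas9 with an NGG PAM and a cut 3 bp upstream of the PAM, and considering only the forward strand; the function name `candidate_sgrnas`, the 0-based `site` coordinate and the toy sequence are hypothetical and do not come from the source. The reported offset is in forward-strand coordinates, and the mismatch check against the rest of the genome is not implemented here.

```python
import re


def candidate_sgrnas(seq, site, upstream, downstream):
    """List forward-strand 20-nt protospacers whose cut lies near `site`.

    seq        -- genomic sequence (string of A/C/G/T)
    site       -- 0-based coordinate of the splice site within seq (assumed)
    upstream   -- how many bp before `site` the cut may lie
    downstream -- how many bp after `site` the cut may lie
    Returns a list of (protospacer, cut_offset) tuples.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping protospacers are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        start = match.start()
        # Assumed SpCas9 behaviour: the cut falls 3 bp 5' of the PAM,
        # i.e. between protospacer bases 17 and 18.
        cut = start + 17
        offset = cut - site
        if -upstream <= offset <= downstream:
            hits.append((match.group(1), offset))
    return hits


# Toy example: one protospacer followed by a TGG PAM.
toy = "A" * 10 + "ACGTACGTACGTACGTACGT" + "TGG" + "A" * 10
print(candidate_sgrnas(toy, site=27, upstream=50, downstream=75))
```

A full design would also scan the reverse complement and convert the offset into the intron/exon orientation of each splice site.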
Cell libraries containing these sgRNAs were constructed by lentiviral delivery at an MOI of <0.3 into HeLa and Huh7.5 cells expressing Cas9 [28], with a minimum coverage of 400×. At 72 hours after viral infection, mCherry+ cells were sorted by FACS (BD). Control cells (2.4 × 10^6) from each library were collected for genomic DNA extraction using the DNeasy Blood and Tissue Kit (QIAGEN 69506), and the experimental cells were cultured continuously for 15 days before genomic DNA extraction. For each replicate, the lentivirally integrated sgRNA-coding region was PCR-amplified with TransTaq HiFi DNA polymerase (TransGen AP131-13) and purified with DNA Clean & Concentrator-25 (Zymo Research Corporation D4034) as described previously [4,9]. The resulting libraries were prepared for high-throughput sequencing (Illumina HiSeq 2500) using the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB E7370L).
4. Design and construction of genome-scale human lncRNA library
lncRNAs were retrieved from the GENCODE dataset V20, which contains 14,470 lncRNAs. In a first filtering step, 2,477 lncRNAs without splice sites were removed from this dataset. For the remaining lncRNAs, we designed all potential 20-nt sgRNAs targeting the -10 bp to +10 bp region around each 5' SD site and 3' SA site. To ensure cleavage efficiency and specificity, we retained only sgRNAs with at least 2 mismatches to any other locus in the genome and a GC content between 20% and 80%, and removed sgRNAs containing a 4-bp homopolymer of T nucleotides. To optimize coverage, some sgRNAs with 1 or 0 mismatches to other loci were retained, provided that they did not target any essential gene of the K562 cell line [15] and that the total number of off-target sites was less than 2. In total, 126,773 sgRNAs targeting 10,996 lncRNAs were synthesized. The library also included 500 sgRNAs that do not target the human genome as negative controls and 350 sgRNAs targeting 36 essential ribosomal genes as positive controls. Oligonucleotides were synthesized using a CustomArray 90K array chip (CustomArray, Inc.), and the library was constructed as described above.
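The sequence-level part of these filters (GC content and T homopolymers) can be sketched as below. This is an illustrative sketch rather than the authors' script; the function name `passes_sequence_filters` is hypothetical, and the genome-wide mismatch and essential-gene checks described above are deliberately left out because they require an alignment step.

```python
def passes_sequence_filters(sgrna, gc_min=0.20, gc_max=0.80):
    """Return True if a guide passes the GC-content and TTTT filters.

    Thresholds follow the text: GC content between 20% and 80%,
    and no run of four consecutive T nucleotides.
    """
    s = sgrna.upper()
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    has_t_homopolymer = "TTTT" in s
    return gc_min <= gc_fraction <= gc_max and not has_t_homopolymer


# SEQ ID No.1 from this document (65% GC, no TTTT run).
print(passes_sequence_filters("GGACCAGCCACTCACCATCC"))
```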
5. Genome-scale lncRNA screening
A total of 5 × 10^8 K562 cells were plated in 175 cm² flasks (Corning 431080), in two replicates. Cells were infected with the sgRNA library lentivirus at an MOI of less than 0.3 (1000× coverage) for 24 hours. At 48 hours post infection, the library cells were treated with puromycin (3 μg/ml; Solarbio P8230) for 2 days. For each replicate, a total of 1.3 × 10^8 cells were collected for genomic DNA extraction as the day 0 control sample. Thirty days after viral infection, 1.3 × 10^8 experimental cells were collected for genomic DNA extraction and NGS analysis [4,9].
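The cell numbers above are consistent with a simple coverage estimate, sketched here. The calculation assumes a Poisson model of lentiviral infection, in which the fraction of cells receiving at least one virus at a given MOI is 1 - exp(-MOI); this model and the helper name `cells_to_plate` are assumptions for illustration, not statements from the source.

```python
import math


def cells_to_plate(n_sgrnas, coverage, moi):
    """Estimate how many cells must be infected to reach a target coverage.

    Assumes Poisson-distributed infection: P(cell infected) = 1 - exp(-moi).
    """
    infected_fraction = 1.0 - math.exp(-moi)
    infected_cells_needed = n_sgrnas * coverage
    return infected_cells_needed / infected_fraction


# Library size from the text: 126,773 lncRNA-targeting sgRNAs,
# plus 500 negative and 350 positive control sgRNAs.
library_size = 126_773 + 500 + 350

# Infected cells required for 1000x coverage.
print(f"{library_size * 1000:.2e}")
# Cells to plate at the upper-bound MOI of 0.3.
print(f"{cells_to_plate(library_size, 1000, 0.3):.2e}")
```

Under these assumptions, 1000× coverage corresponds to roughly 1.3 × 10^8 infected cells, and the number of cells to plate at an MOI of 0.3 is of the same order as the 5 × 10^8 cells used; a lower MOI would require more cells.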
6. Computational analysis of screens
Sequencing reads were mapped to the hg38 reference genome and decoded with custom scripts. sgRNA counts from the two replicates were quantile-normalized, and the mean counts and the fold changes between the experimental and control groups were then calculated. One thousand negative control genes were generated by randomly sampling, with replacement, 10 negative control sgRNAs per gene. Noisy sgRNAs were then filtered out according to the following criterion: an sgRNA was considered noise if its fold change was lower than the mean fold change of the positive control sgRNAs in one replicate but higher than the mean fold change of the negative control sgRNAs in the other replicate. For each lncRNA after noise filtering, we compared the fold changes of its sgRNAs with those of the negative controls using the Wilcoxon test, and corrected the p-value using the empirical distribution generated from the negative control genes to reduce the false-positive rate. We finally defined the screening score as

$$\text{screening score} = \operatorname{scale}\left(-\log_{10}(\text{adjusted } p\text{-value})\right) + \left|\operatorname{scale}\left(\log_{2}(\text{sgRNA fold change})\right)\right|.$$

lncRNAs with a screening score higher than 2 were designated essential lncRNAs.
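The noise criterion and the screening score can be sketched as follows. The text does not define "scale"; this sketch assumes it means standardization to zero mean and unit sample standard deviation across all lncRNAs (as the R function of that name does by default), and it expresses fold changes as log2 values. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def scale(values):
    """Standardize values to zero mean and unit sample SD (assumed meaning)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def is_noise(fc_rep1, fc_rep2, pos_mean_fc, neg_mean_fc):
    """Flag an sgRNA whose two replicates disagree.

    Noise: below the positive-control mean in one replicate while
    above the negative-control mean in the other replicate.
    """
    return ((fc_rep1 < pos_mean_fc and fc_rep2 > neg_mean_fc)
            or (fc_rep2 < pos_mean_fc and fc_rep1 > neg_mean_fc))


def screening_scores(adjusted_p, log2_fold_change):
    """Combine scaled -log10(adjusted p) with the absolute scaled log2 FC."""
    p_term = scale([-math.log10(p) for p in adjusted_p])
    fc_term = scale(log2_fold_change)
    return [p + abs(f) for p, f in zip(p_term, fc_term)]


# Toy values for three hypothetical lncRNAs.
scores = screening_scores([0.001, 0.01, 0.1], [-3.0, -2.0, -1.0])
essential = [s > 2 for s in scores]
print(scores, essential)
```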
7. Validation of candidate lncRNAs
For validation with the splicing-targeting strategy, the two top-ranked sgRNAs with at least 2 mismatches to any other locus in the genome were selected from the library. For the pgRNA deletion strategy, pgRNAs were designed to delete the promoter and first exon of each lncRNA. We designed gRNA pairs according to the following principles: (1) one sgRNA targets a region 2.5-3.5 kb upstream of the transcription start site (TSS) and the other targets a region 0.2-1.5 kb downstream of the TSS; and (2) overlap with the exons or promoters of any other coding or non-coding gene is avoided. For each sgRNA of a pgRNA pair, we further ensured that (1) the GC content is 45%-70%, (2) the sgRNA does not include a 4-bp homopolymer, and (3) the sgRNA contains more than 2 mismatches to any other locus in the human genome. We also included some sgRNAs with 2 mismatches to other loci but fewer than 2 off-target sites.
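The positional and sequence rules for a pgRNA pair can be checked with a short sketch such as the one below. Positions are given in bp relative to the TSS, negative meaning upstream; this coordinate convention and the names `guide_ok` and `pair_ok` are assumptions made for illustration. The checks for overlap with neighboring genes and for genome-wide mismatches are not covered.

```python
import re


def guide_ok(seq):
    """GC content of 45%-70% and no run of four identical bases."""
    s = seq.upper()
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    has_homopolymer = re.search(r"A{4}|C{4}|G{4}|T{4}", s) is not None
    return 0.45 <= gc_fraction <= 0.70 and not has_homopolymer


def pair_ok(up_pos, up_seq, down_pos, down_seq):
    """Check one pgRNA pair against the design rules in the text.

    up_pos / down_pos are cut positions in bp relative to the TSS
    (negative = upstream), an assumed convention.
    """
    upstream_in_window = -3500 <= up_pos <= -2500    # 2.5-3.5 kb upstream
    downstream_in_window = 200 <= down_pos <= 1500   # 0.2-1.5 kb downstream
    return (upstream_in_window and downstream_in_window
            and guide_ok(up_seq) and guide_ok(down_seq))


# Placeholder sequences used purely to exercise the checks.
print(pair_ok(-3000, "GGACCAGCCACTCACCATCC", 800, "AACTCATACTCCCGCACCTG"))
```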
All sgRNAs or pgRNAs targeting the lncRNAs selected for validation were individually cloned into a lentiviral vector carrying a CMV promoter-driven EGFP marker. After viral packaging, the sgRNA or pgRNA lentiviruses were transduced into K562 or GM12878 cells at an MOI of <1.0. Cell proliferation assays were performed as described previously [9].
8. RNA sequencing and data analysis
For each of the lncRNAs MIR17HG and BMS1P20, two sgRNAs targeting its splice sites were individually cloned into a lentiviral vector carrying an EGFP marker. The sgRNAs were delivered into K562 or GM12878 cells by lentiviral infection (MOI < 1). Five days post infection, 2 × 10^6 EGFP-positive K562 or GM12878 cells were sorted by FACS. Total RNA was extracted from each sample using the RNeasy Mini Kit (QIAGEN 79254), and RNA-seq libraries were prepared with the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB E7490S), the NEBNext RNA First Strand Synthesis Module (NEB E7525S), the NEBNext RNA Second Strand Synthesis Module (NEB E6111S), and the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB E7370L). All samples were sequenced on the Illumina HiSeq X Ten platform (Genetron Health). Deep-sequencing reads were mapped to the hg38 reference genome, and gene expression was quantified with RSEM v1.2.25 [30]. Differential expression analysis was performed with EBSeq v1.10.0 [31], and differentially expressed genes were selected with an adjusted P value < 0.05 and an absolute log2(fold change) > 3. Gene Ontology and KEGG analyses were performed with DAVID 6.8 [32].
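The selection thresholds for differentially expressed genes can be expressed compactly, as in the following sketch. It only illustrates the thresholding step applied to a table of results that is assumed to have been produced already (for example by EBSeq); the function name and the gene labels are hypothetical.

```python
def select_differential_genes(results, max_adj_p=0.05, min_abs_log2_fc=3.0):
    """Keep genes with adjusted P < 0.05 and |log2(fold change)| > 3.

    results -- iterable of (gene, adjusted_p, log2_fold_change) tuples
    """
    return [gene for gene, adj_p, log2_fc in results
            if adj_p < max_adj_p and abs(log2_fc) > min_abs_log2_fc]


# Hypothetical rows: only geneA and geneD meet both thresholds.
table = [
    ("geneA", 0.001, -4.2),
    ("geneB", 0.20, -5.0),   # fails the P-value threshold
    ("geneC", 0.01, 1.5),    # fails the fold-change threshold
    ("geneD", 0.04, 3.5),
]
print(select_differential_genes(table))
```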
Results
Consistent with the common knowledge that splice sites are formed by conserved sequences, our bioinformatic analysis using the WebLogo3 tool [33] showed that about 99% of introns in the human genome are flanked by GT at the 5' splice donor (SD) site and AG at the 3' splice acceptor (SA) site. Notably, AG is also the predominant sequence of the last two bases of the exon immediately upstream of the SD site (Fig. 1a). To confirm that sgRNAs can effectively generate exon skipping and/or intron retention, we designed sgRNAs targeting the SD or SA sites of two ribosomal genes, RPL18 and RPL11, which are essential for cell growth and proliferation. In HeLa cells stably expressing the Cas9 and OCT1 genes [4], sgRNA1RPL18, which targets the SD site, and sgRNA2RPL18, which targets the SA site, generated intron 3 retention and exon 4 skipping, respectively, at the RPL18 locus, as confirmed by both reverse transcription PCR (RT-PCR) and Sanger sequencing.
Similar results were obtained for the RPL11 gene, where sgRNA3RPL11 and sgRNA4RPL11 generated intron 2 retention and exon 4 skipping, respectively, at the RPL11 locus. Fig. 1b shows a schematic of the intron retention and exon skipping induced by sgRNAs targeting the splice donor (SD) or splice acceptor (SA) site.
To further assess the efficacy of splice-site targeting in CRISPR screens, we designed a saturation library targeting the splice sites of 79 ribosomal genes that are essential for cell growth in various cell lines [29]. The library contained 5,788 sgRNAs whose cleavage sites lie within -50 bp to +75 bp around each 5' SD site and within -75 bp to +50 bp around each 3' SA site of the 79 genes; examples of the sgRNAs are shown in Table 1.
Cell libraries containing these sgRNAs were constructed by lentiviral delivery at an MOI (multiplicity of infection) of <0.3 in HeLa and Huh7.5 cells expressing Cas9. The library cells were cultured for 15 days, and sgRNAs that reduced cell viability were identified by NGS analysis.
By calculating the fold change of each sgRNA between the 15-day experimental samples (Exp) and the control samples (Ctrl), we ranked all sgRNAs and aligned them according to the distance (in base pairs) between the sgRNA cleavage site and the corresponding SD or SA site. Spearman correlations between biological replicates of the Ctrl and Exp samples in both HeLa and Huh7.5 cells showed that the results were highly reproducible (Fig. 2). To assess the effectiveness of splice-site targeting for gene disruption, we pooled the data for all SD sites and for all SA sites and ordered them by physical distance from the SD or SA site (Fig. 3). In both HeLa and Huh7.5 cells, sgRNAs affecting splice sites clearly outperformed those targeting only exonic regions. The closer the sgRNA cleavage site was to the splice site, the stronger its effect on gene disruption, with the peak shifted slightly toward the exon for both SD and SA sites (Fig. 3). In contrast, most intron-targeting sgRNAs were rarely depleted during the screen, indicating that they had little effect on gene function and therefore on cell viability. The only exceptions were sgRNAs targeting the intronic region near the SA site [34,35], which contains the branch point followed by a polypyrimidine tract, both known to be involved in RNA splicing.
Because the number of sgRNAs designed differs from locus to locus, we compared the percentage of highly efficient sgRNAs (sgRNAs depleted more than 4-fold) at each locus to allow a fair comparison. This analysis further confirmed that sgRNAs targeting SD and SA sites greatly outperformed those targeting only exonic regions (Fig. 4a). To better quantify our results, we classified all sgRNAs into three categories: intron-targeting sgRNAs (the cleavage site is within an intron and at least 30 bp from the SD or SA site), exon-targeting sgRNAs (the cleavage site is within an exon and at least 30 bp from the SD or SA site), and splicing-targeting sgRNAs (the cleavage site is within -10 bp to +10 bp of the SD or SA site, where - and + denote the intron and exon directions, respectively). In both HeLa and Huh7.5 cells, the percentage of sgRNAs depleted more than 2-fold or more than 4-fold was much higher among splicing-targeting sgRNAs than in the other two categories (Fig. 4b, 4c).
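The three categories and the per-category percentage of depleted sgRNAs can be sketched as below. The sketch assumes each sgRNA is described by a signed distance from its cleavage site to the nearest SD or SA site (negative toward the intron, positive toward the exon, as in the text) and by a log2 fold change; sgRNAs cutting 11-29 bp from a splice site fall into none of the three categories and are skipped. The function names and toy records are hypothetical.

```python
import math
from collections import defaultdict


def classify_sgrna(offset):
    """Assign a category from the signed distance to the splice site."""
    if -10 <= offset <= 10:
        return "splicing"
    if offset <= -30:
        return "intron"
    if offset >= 30:
        return "exon"
    return None  # 11-29 bp away: not covered by the three categories


def percent_depleted(records, fold=4.0):
    """Percentage of sgRNAs per category depleted more than `fold`-fold.

    records -- iterable of (offset, log2_fold_change) tuples
    """
    threshold = -math.log2(fold)
    total, hits = defaultdict(int), defaultdict(int)
    for offset, log2_fc in records:
        category = classify_sgrna(offset)
        if category is None:
            continue
        total[category] += 1
        if log2_fc < threshold:
            hits[category] += 1
    return {c: 100.0 * hits[c] / total[c] for c in total}


toy = [(0, -3.0), (5, -1.0), (-40, -0.1), (45, -2.5), (20, -5.0)]
print(percent_depleted(toy, fold=4.0))
```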
Based on the above results, we reasoned that this strategy should be generally applicable to both coding genes and non-coding RNAs, since RNA splicing is a highly conserved mechanism in both. Given that targeting splice sites would likely disrupt lncRNA function in human cells through exon skipping and/or intron retention, we designed and constructed a splicing-targeting sgRNA library to establish genome-scale functional screening of lncRNAs. From the 14,470 lncRNAs retrieved from the GENCODE database V20, we first filtered out 2,477 that lack splice sites. We also followed several other rules: all sgRNA cleavage sites lie within -10 bp to +10 bp around a splice site, and the sgRNAs are predicted to have high cleavage activity [29,36,37] and no off-target sites in any known essential gene [15] (see Methods). The final library comprised 126,773 sgRNAs targeting 10,996 unique lncRNAs. Together with 500 non-targeting control sgRNAs and 350 sgRNAs targeting essential ribosomal genes, these were used to construct a cell library in engineered K562 cells stably expressing the Cas9 protein (Fig. 5a and Fig. 2a). The cell library was prepared by lentiviral transduction at a low MOI of <0.3. After infection, we cultured the library cells continuously for 30 days to screen for lncRNAs that affect cell growth and proliferation. The sgRNAs were then decoded by NGS analysis [4,9] (Fig. 5b).
After 30 days of culture, sgRNAs targeting lncRNAs and those targeting essential genes were both depleted relative to the non-targeting sgRNAs (Fig. 5c, 5d; Fig. 2b), indicating their effects on cell survival or proliferation. For each lncRNA, we calculated the fold changes of its sgRNAs and obtained a P value by comparing them with the non-targeting sgRNAs using the Wilcoxon test. We randomly sampled the non-targeting sgRNAs to generate "negative control genes" and used their distribution to correct the P values of the lncRNAs. For each lncRNA, a screening score was calculated by combining the mean fold change and the corrected P value (see Methods). Using a screening score threshold of 2, we selected a total of 243 candidate lncRNAs whose disruption led to growth inhibition or death of K562 cells (Fig. 5e). Based on the screening scores, all 36 essential genes were significantly enriched in the ranked list of negatively selected genes, indicating the reliability of the screening protocol and the data analysis method.
From the negatively selected lncRNAs whose sgRNAs were consistently depleted in both replicates, we selected 35 top-ranked lncRNA genes for further validation. For each candidate, we cloned the two top-ranked sgRNAs from the library screen into a lentiviral backbone carrying an EGFP selection marker. A non-targeting sgRNA and an sgRNA targeting the non-functional adeno-associated virus integration site 1 (AAVS1) locus were chosen as negative controls, and an sgRNA targeting the ribosomal gene RPL18 was included as a positive control (Fig. 6a; Fig. 3). Each sgRNA was transduced into K562 cells, and cell proliferation was quantified from the change in the percentage of EGFP-positive cells. To further examine differences in lncRNA function between cancer and normal cells, we included in the validation the lymphoblastoid cell line GM12878, which has a relatively normal karyotype and, like K562, is a Tier 1 ENCODE cell line [24,25]. Notably, all sgRNAs targeting the 35 top-ranked lncRNA loci effectively reduced the proliferation of K562 cells (Fig. 6b, c; Fig. 3; and Fig. 7-12). Of these, 18 lncRNAs also appeared to be essential for GM12878 growth (Fig. 6b and Fig. 7-10; Fig. 3), whereas 6 lncRNAs showed a weak but detectable effect (Fig. 10) and 11 lncRNAs showed no detectable effect (Fig. 6c and Fig. 11-12; Fig. 3) on the viability of GM12878 cells. These results indicate cell-type specificity. In summary, about half of the lncRNAs essential for K562 had no significant effect on the growth of GM12878 cells, suggesting that they may serve as unique cancer cell biomarkers with therapeutic potential (Fig. 6d; Fig. 3).
To further corroborate our validation assay and screening strategy, both of which rely on splicing interference, we used the pgRNA-mediated deletion method [9] to study independently the effects of lncRNAs sampled from our screen. We selected 6 lncRNAs from the 35 validated candidates, together with another 6 top-ranked candidates that had not been included in the validation because their top-ranked splicing-targeting sgRNAs had some off-target potential. For each of these 12 lncRNAs, four pgRNA pairs were designed to delete the promoter and first exon (see Methods). The AAVS1 locus and the ribosomal genes RPL19 and RPL23A were chosen as the negative and positive controls for pgRNA targeting, respectively (Fig. 13a). In cell proliferation assays, the 6 lncRNAs from the 35 validated candidates showed phenotypes that reproduced those obtained with the splicing-targeting strategy (Fig. 6e and Fig. 13b; Fig. 3). The validation results from splicing targeting correlated well with those from the deletion strategy (correlation coefficient 0.93, P = 0.002) (Fig. 6f; Fig. 3), indicating that splicing targeting is a reliable and robust approach for lncRNA gene disruption. We also confirmed that the other 6 candidate lncRNAs were important for the growth of K562 cells (Fig. 14). Thus, all 41 lncRNAs were confirmed to be crucial for the growth and proliferation of K562 cells.
To better understand the mechanisms underlying these different phenotypes in K562 and GM12878 cells, we further examined the functions of the lncRNA MIR17HG, which is essential for both cell lines (Fig. 6b; Fig. 3), and of BMS1P20, which is essential for the survival of K562 but not of GM12878 cells (Fig. 6c; Fig. 3). We performed RNA-seq analysis of both K562 and GM12878 cells with or without knockout of MIR17HG or BMS1P20. Each lncRNA was disrupted with two sgRNAs targeting its splice site, whose effectiveness had been confirmed in the validation assay (Fig. 6b, c; Fig. 3). We evaluated the expression levels of the 500 genes with the greatest variation between control and sgRNA-targeted samples and observed different expression patterns after knockout of the two lncRNAs (Fig. 15a; Fig. 4a). For both lncRNAs in each cell line, the two sgRNAs targeting the same splice site produced similar changes in expression pattern (Fig. 16a, b). The overall expression level of the top 100 essential lncRNAs identified in K562 cells was higher in wild-type K562 cells than in GM12878 cells (P = 0.03, Fig. 15b; Fig. 4b).
In the K562 cell line, altering the splicing pattern of MIR17HG down-regulated 179 known essential genes that affect cell growth and proliferation [15] (P = 0.01, Fig. 15c; Fig. 4c), and disruption of BMS1P20 down-regulated 178 known essential genes [15] (P = 0.05, Fig. 15c; Fig. 4c), suggesting a possible mechanism by which these two lncRNAs affect K562 cell growth. Unexpectedly, MIR17HG and BMS1P20 affected 140 common essential genes in K562 cells (Fig. 15d), although they play different roles in GM12878 cells. These shared genes are enriched in several essential pathways, such as those regulating translation initiation, cell division, and DNA repair (Fig. 16c). Disruption of BMS1P20 up- or down-regulated the expression of a set of coding genes in both K562 and GM12878 cells compared with control cells (Fig. 16d-e). We further investigated the genes that were differentially expressed in K562 relative to GM12878 after knockout of this lncRNA (Fig. 15e). The genes down-regulated in K562 are enriched in processes that can affect cell growth and proliferation, such as the p53 signaling pathway and the PI3K-Akt signaling pathway (Fig. 15f). Some genes were also up-regulated (Fig. 15f), and all of these differentially expressed genes were associated with the different effects of BMS1P20 knockout on cell growth in the two cell lines.
In summary, disruption of both protein-coding genes and lncRNAs can be substantially enhanced by targeting splice sites. For protein-coding genes, splice-site targeting provides an additional route to gene disruption beyond the generation of frameshift mutations. This feature is indispensable for knocking out, with single sgRNAs, non-coding RNAs that are insensitive to reading-frame changes. Furthermore, the strategy of disrupting splice sites may be particularly effective when it is difficult to design suitable sgRNAs against genes with conserved coding sequences.
The CRISPR-Cas9 system has been used for large-scale identification of functional lncRNAs via two strategies: deletion with paired gRNAs (pgRNAs) [9] and CRISPRi [12]. Although the CRISPRi strategy is technically easier to scale up than pgRNA-mediated genomic deletion, CRISPRi and CRISPRa methods typically act over a window of about 1 kb around the targeted transcription start site (TSS) [12,26], so these methods risk inadvertently affecting the expression of neighboring genes at almost 60% of lncRNA loci [27]. The splicing-targeting strategy cuts with a single guide RNA, can effectively avoid most overlapping regions, and is much less likely to affect neighboring genes, thereby reducing the false-positive rate. In addition, because CRISPRi only reduces gene expression levels rather than completely knocking out the target locus, it leaves room for false-positive results.
Based on the experimental data, the novel method described in the present invention has significant advantages in negative CRISPR screening of coding genes, where it complements conventional exon-targeting methods, and it also allows large-scale loss-of-function screening of non-coding genes using single-guide-RNA CRISPR libraries. In addition, exon skipping or intron retention resulting from splice-site disruption provides a convenient method for functional validation of individual non-coding RNAs.
References
1.Shalem,O.et al.Genome-scale CRISPR-Cas9 knockout screening in human cells.Science 343,84-87(2014).
2.Wang,T.,Wei,J.J.,Sabatini,D.M.&Lander,E.S.Genetic screens in human cells using the CRISPR-Cas9 system.Science 343,80-84(2014).
3.Koike-Yusa,H.,Li,Y.,Tan,E.P.,Velasco-Herrera Mdel,C.&Yusa,K.Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat Biotechnol 32,267-273(2014).
4.Zhou,Y.et al.High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.Nature 509,487-491(2014).
5.Ezkurdia,I.et al.Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes.Hum Mol Genet 23,5866-5878(2014).
6.Rinn,J.L.&Chang,H.Y.Genome regulation by long noncoding RNAs.Annu Rev Biochem 81,145-166(2012).
7.Quinn,J.J.&Chang,H.Y.Unique features of long non-coding RNA biogenesis and function.Nat Rev Genet 17,47-62(2016).
8.Kretz,M.et al.Control of somatic tissue differentiation by the long non-coding RNA TINCR.Nature 493,231-235(2013).
9.Zhu,S.et al.Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR-Cas9 library.Nat Biotechnol 34,1279-1286(2016).
10.Guttman,M.et al.lincRNAs act in the circuitry controlling pluripotency and differentiation.Nature 477,295-300(2011).
11.Lin,N.et al.An evolutionarily conserved long noncoding RNA TUNA controls pluripotency and neural lineage commitment.Mol Cell 53,1005-1019(2014).
12.Liu,S.J.et al.CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells.Science 355(2017).
13.Adamson,B.,Smogorzewska,A.,Sigoillot,F.D.,King,R.W.&Elledge,S.J.A genome-wide homologous recombination screen identifies the RNA-binding protein RBMX as a component of the DNA-damage response.Nat Cell Biol 14,318-328(2012).
14.Sambrook,Fritsch and Maniatis,MOLECULAR CLONING:A LABORATORY MANUAL,2nd edition(1989).
15.F.M.Ausubel,et al.eds.,CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (1987).
16.M.J.MacPherson,B.D.Hames and G.R.Taylor eds.,METHODS IN ENZYMOLOGY (Academic Press,Inc.):PCR 2:A PRACTICAL APPROACH(1995).
17.Harlow and Lane,eds.ANTIBODIES,A LABORATORY MANUAL,(1988).
18.R.I.Freshney,ed.,ANIMAL CELL CULTURE(1987).
19.Goeddel,GENE EXPRESSION TECHNOLOGY:METHODS IN ENZYMOLOGY 185, Academic Press,San Diego,Calif.(1990).
20.Seed,1987.Nature 329:840(Seed,B.An LFA-3 cDNA encodes a phospholipid-linked membrane protein homologous to its receptor CD2.Nature(1987)329:840–842.)
21.Kaufman,et al.,1987.EMBO J.6:187-195(Randal J.Kaufman,et al.Translational efficiency of polycistronic mRNAs and their utilization to express heterologous genes in mammalian cells.The EMBO Journal(1987)6:187-195)
22.Clancy,Suzanne.RNA Splicing:Introns and exons and Spliceosome.Nature Education.1, 31(2008).
23.Black,Douglas L.Mechanisms of Alternative Pre-Messenger RNA Splicing.Annual Review of Biochemistry.72:291–336(2003).
24.Ng,Bernard;Yang,Fan;et al.Increased noncanonical splicing of autoantigen transcripts provides the structural basis for expression of untolerized epitopes.Journal of Allergy and Clinical Immunology.114:1463–70(2004).
25.Lim,KH;Ferraris,L;et al.Using positional distribution to identify splicing elements and predict pre-mRNA processing defects in human genes.Proc.Natl.Acad.Sci.USA.108:11093–11098(2011).
26.Warf,MB;Berglund,JA.Role of RNA structure in regulating pre-mRNA splicing.Trends Biochem.Sci.35:169–178(2010).
27.Warf,MB;Berglund,JA.Role of RNA structure in regulating pre-mRNA splicing.Trends Biochem.Sci.35(3):169–178(2010).
28.Ren,Q.et al.A Dual-Reporter System for Real-Time Monitoring and High-throughput CRISPR/Cas9 Library Screening of the Hepatitis C Virus.Scientific reports 5,8865(2015).
29.Wang,T.et al.Identification and characterization of essential genes in the human genome. Science 350,1096-1101(2015).
30.Li,B.&Dewey,C.N.RSEM:accurate transcript quantification from RNA-Seq data with or without a reference genome.BMC bioinformatics 12,323(2011).
31.Leng,N.et al.EBSeq:an empirical Bayes hierarchical model for inference in RNA-seq experiments.Bioinformatics 29,1035-1043(2013).
32.Jiao,X.et al.DAVID-WS:a stateful web service to facilitate gene/protein list analysis. Bioinformatics 28,1805-1806(2012).
33.Crooks,G.E.,Hon,G.,Chandonia,J.M.&Brenner,S.E.WebLogo:a sequence logo generator.Genome Res 14,1188-1190(2004).
34.Matlin,A.J.,Clark,F.&Smith,C.W.Understanding alternative splicing:towards a cellular code.Nat Rev Mol Cell Biol 6,386-398(2005).
35.Taggart,A.J.,DeSimone,A.M.,Shih,J.S.,Filloux,M.E.&Fairbrother,W.G. Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo.Nat Struct Mol Biol 19,719-721(2012).
36.Hsu,P.D.et al.DNA targeting specificity of RNA-guided Cas9 nucleases.Nat Biotechnol 31,827-832(2013).
37.Xu,H.et al.Sequence determinants of improved CRISPR sgRNA design.Genome Res 25, 1147-1157(2015).
38.Heidari,N.et al.Genome-wide map of regulatory interactions in the human genome. Genome Res 24,1905-1917(2014).
39.Muller,R.Y.,Hammond,M.C.,Rio,D.C.&Lee,Y.J.An Efficient Method for Electroporation of Small Interfering RNAs into ENCODE Project Tier 1 GM12878 and K562 Cell Lines.J Biomol Tech 26,142-149(2015).
40.Joung,J.et al.Genome-scale activation screen identifies a lncRNA locus regulating a gene neighbourhood.Nature(2017).
41.Goyal,A.et al.Challenges of CRISPR/Cas9 applications for long non-coding RNA genes. Nucleic Acids Res 45,e12(2017).
Sequence listing
<110> Peking University
Boya Zhenning (Beijing) Biotech Co., Ltd
<120> method for screening and identifying functional lncRNA
<130> PA00044
<141> 2018-04-02
<160> 12
<170> SIPOSequenceListing 1.0
<210> 1
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
ggaccagcca ctcaccatcc 20
<210> 2
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
agcttcatct tccggatctt 20
<210> 3
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
tccttgtgac tactcacctt 20
<210> 4
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
aactcatact cccgcacctg 20
<210> 5
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
ctgggtcttg tctgtctgga a 21
<210> 6
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
ctggtgttta cattcagccc c 21
<210> 7
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
ggccagaaga accaactcca 20
<210> 8
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
gacagtgcca cagcccttag 20
<210> 9
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
tcaagatggc gtgtgggatt 20
<210> 10
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
gaccagcaaa tggtgaagcc 20
<210> 11
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
gatcctttgg catccggaga 20
<210> 12
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
gctgattctg tgtttggccc 20
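Each entry in the listing above declares its length under <211> and repeats the residue count at the end of the sequence line. As a minimal sketch of how that consistency could be checked, the following Python snippet parses two entries copied from the listing (SEQ ID NO: 1 and 5) and compares the declared and observed lengths. The function names `parse_listing` and `length_mismatches` are hypothetical helpers introduced for illustration only, and the parser assumes each sequence fits on the single line after its <400> tag, as is the case for all twelve entries here.

```python
import re

# Two entries copied verbatim from the listing above (SEQ ID NO: 1 and 5).
LISTING = """\
<210> 1
<211> 20
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
ggaccagcca ctcaccatcc 20
<210> 5
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
ctgggtcttg tctgtctgga a 21
"""


def parse_listing(text):
    """Return {seq_id: (declared_length, residues)} for each <210> entry.

    Assumption: every sequence fits on the one line following its <400> tag.
    """
    entries = {}
    seq_id = declared = None
    lines = iter(text.splitlines())
    for line in lines:
        match = re.match(r"<(\d{3})>\s*(.*)", line.strip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "210":
            seq_id = int(value)
        elif tag == "211":
            declared = int(value)
        elif tag == "400":
            # Drop spacing and the trailing residue count, keep only the bases.
            residues = re.sub(r"[\s\d]", "", next(lines))
            entries[seq_id] = (declared, residues)
    return entries


def length_mismatches(entries):
    """List SEQ ID numbers whose residue count differs from the <211> value."""
    return [sid for sid, (declared, residues) in entries.items()
            if declared != len(residues)]


entries = parse_listing(LISTING)
print(entries)
print(length_mismatches(entries))
```

An empty list from `length_mismatches` means that every parsed entry carries as many residues as its <211> field declares.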
Claims (10)
1. A CRISPR/Cas guide RNA construct for interfering with a long non-coding RNA in a eukaryotic cell genome, comprising a guide sequence that targets a genomic sequence surrounding a splice site of the long non-coding RNA, wherein the guide sequence is operably linked to a promoter and a guide hairpin sequence.
2. The CRISPR/Cas guide RNA construct of claim 1, wherein said eukaryotic cell genome is a human genome.
3. The CRISPR/Cas guide RNA construct of claim 1 or 2, wherein said guide sequence is 19-21 nucleotides in length.
4. The CRISPR/Cas guide RNA construct of any of claims 1-3, wherein said guide hairpin sequence is about 40 nucleotides in length and, once transcribed, can bind to a CRISPR/Cas nuclease.
5. The CRISPR/Cas guide RNA construct of any of claims 1-4, wherein said guide sequence targets a genomic sequence within a region spanning -50 bp to +75 bp around the splice donor (SD) or splice acceptor (SA) site of the long non-coding RNA.
6. The CRISPR/Cas guide RNA construct of claim 5, wherein the guide sequence targets a genomic sequence within a region spanning -30 bp to +30 bp around the SD or SA site of the long non-coding RNA.
7. The CRISPR/Cas guide RNA construct of claim 6, wherein the guide sequence targets a genomic sequence within a region spanning -10 bp to +10 bp around the SD or SA site of the long non-coding RNA.
8. The CRISPR/Cas guide RNA construct of any of claims 1-7, which is a viral vector or plasmid.
9. A library comprising a plurality of CRISPR/Cas guide RNA constructs of any of claims 1-8.
10. A storage liquid comprising the CRISPR/Cas guide RNA construct of any of claims 1-8 or the library of claim 9.
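The numeric limits in claims 3 and 5-7 can be illustrated with a short Python sketch. It is not part of the claimed subject matter. It assumes that a guide's position is summarized as a single signed offset in bp relative to the SD or SA site, with negative values upstream; the claims do not define such a convention. The names `CLAIM_WINDOWS`, `guide_length_ok` and `narrowest_claim_window` are hypothetical, and the example guide is a made-up 20-nt string, not one of the listed sequences.

```python
# Windows recited in claims 5-7, as (lower, upper) offsets in bp around the
# SD or SA site. Assumption: a single signed offset per guide, negative values
# upstream of the site.
CLAIM_WINDOWS = {5: (-50, 75), 6: (-30, 30), 7: (-10, 10)}


def guide_length_ok(guide):
    """Claim 3: the guide sequence is 19-21 nucleotides in length."""
    return 19 <= len(guide) <= 21


def narrowest_claim_window(offset):
    """Return the highest-numbered claim (5, 6 or 7) whose window contains
    the offset, or None if the offset lies outside the claim 5 window.

    The windows are nested, so the highest matching claim is the narrowest.
    """
    matching = [claim for claim, (lower, upper) in CLAIM_WINDOWS.items()
                if lower <= offset <= upper]
    return max(matching) if matching else None


example_guide = "acgt" * 5  # hypothetical 20-nt guide, for illustration only
print(guide_length_ok(example_guide))
for offset in (5, -20, 60, 100):
    print(offset, narrowest_claim_window(offset))
```

Because each dependent claim narrows the window of the claim it depends on, an offset that satisfies claim 7 also satisfies claims 6 and 5, which is why the sketch reports only the narrowest match.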
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810284463.3A CN110343724B (en) | 2018-04-02 | 2018-04-02 | Method for screening and identifying functional lncRNA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110343724A (en) | 2019-10-18 |
CN110343724B (en) | 2021-10-12 |
Family
ID=68173534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810284463.3A Active CN110343724B (en) | 2018-04-02 | 2018-04-02 | Method for screening and identifying functional lncRNA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110343724B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111088357A (en) * | 2019-12-31 | 2020-05-01 | 深圳大学 | Tumor marker for ESCC and application thereof |
CN112704737A (en) * | 2021-01-15 | 2021-04-27 | 青岛市第九人民医院 | A coronary artery endothelial cell angiogenesis promoter |
CN113327645A (en) * | 2021-04-15 | 2021-08-31 | 四川大学华西医院 | Long non-coding RNA and application thereof in diagnosis and treatment of bile duct cancer |
CN113539360A (en) * | 2021-07-21 | 2021-10-22 | 西北工业大学 | lncRNA characteristic recognition method based on correlation optimization and immune enrichment |
CN114807126A (en) * | 2021-01-22 | 2022-07-29 | 清华大学深圳国际研究生院 | Method for silencing expression of long non-coding RNA and application thereof |
WO2023284735A1 (en) * | 2021-07-12 | 2023-01-19 | Edigene Therapeutics (Beijing) Inc. | Methods of identifying drug sensitive genes and drug resistant genes in cancer cells |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016142719A1 (en) * | 2015-03-12 | 2016-09-15 | Genome Research Limited | Biallelic genetic modification |
CN107849581A (en) * | 2015-05-19 | 2018-03-27 | Kws种子欧洲股份公司 | Method and construct for the specific nucleic acid editor in plant |
Non-Patent Citations (5)
Title |
---|
CHENGZU LONG: "Correction of diverse muscular dystrophy mutations in human engineered heart muscle by single-site genome editing", 《SCIENCE ADVANCES》 * |
SHIN SY: "DGCR5,lncRNA", 《GENBANK》 * |
SHIYOU ZHU: "Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR–Cas9 library", 《NATURE BIOTECHNOLOGY》 * |
梁成伟等: "《生物化学》", 31 December 2017 * |
赵武玲: "《分子生物学》", 31 August 2010 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111088357A (en) * | 2019-12-31 | 2020-05-01 | 深圳大学 | Tumor marker for ESCC and application thereof |
CN112704737A (en) * | 2021-01-15 | 2021-04-27 | 青岛市第九人民医院 | A coronary artery endothelial cell angiogenesis promoter |
CN113768947A (en) * | 2021-01-15 | 2021-12-10 | 青岛市第九人民医院 | Application of gene inhibitor in preparation of medicine for treating ischemic heart disease |
CN114807126A (en) * | 2021-01-22 | 2022-07-29 | 清华大学深圳国际研究生院 | Method for silencing expression of long non-coding RNA and application thereof |
CN113327645A (en) * | 2021-04-15 | 2021-08-31 | 四川大学华西医院 | Long non-coding RNA and application thereof in diagnosis and treatment of bile duct cancer |
WO2023284735A1 (en) * | 2021-07-12 | 2023-01-19 | Edigene Therapeutics (Beijing) Inc. | Methods of identifying drug sensitive genes and drug resistant genes in cancer cells |
CN113539360A (en) * | 2021-07-21 | 2021-10-22 | 西北工业大学 | lncRNA characteristic recognition method based on correlation optimization and immune enrichment |
Also Published As
Publication number | Publication date |
---|---|
CN110343724B (en) | 2021-10-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110343724B (en) | Method for screening and identifying functional lncRNA | |
Giuliano et al. | Generating single cell–derived knockout clones in mammalian cells with CRISPR/Cas9 | |
CN106637421B (en) | Construction of double sgRNA library and method for applying double sgRNA library to high-throughput functional screening research | |
Ishizu et al. | Somatic primary piRNA biogenesis driven by cis-acting RNA elements and trans-acting Yb | |
AU2019408503B2 (en) | Compositions and methods for highly efficient genetic screening using barcoded guide rna constructs | |
Lu et al. | Transcriptome-wide investigation of circular RNAs in rice | |
JP7244885B2 (en) | Methods for Screening and Identifying Functional lncRNAs | |
US11667904B2 (en) | CRISPR-associated systems and components | |
JP2018532419A (en) | CRISPR-Cas sgRNA library | |
Zhao et al. | CRISPR–Cas9-mediated functional dissection of 3′-UTRs | |
WO2016205745A2 (en) | Cell sorting | |
EP2479278A1 (en) | Method for the construction of specific promoters | |
Lemp et al. | Cryptic transcripts from a ubiquitous plasmid origin of replication confound tests for cis-regulatory function | |
CN111349654A (en) | Compositions and methods for efficient gene screening using tagged guide RNA constructs | |
Wu et al. | Massively parallel characterization of CRISPR activator efficacy in human induced pluripotent stem cells and neurons | |
AU2022381188A1 (en) | Serine recombinases | |
US11946163B2 (en) | Methods for measuring and improving CRISPR reagent function | |
Estep et al. | Immunoblot screening of CRISPR/Cas9-mediated gene knockouts without selection | |
Zhu et al. | RNA circuits and RNA-binding proteins in T cells | |
Mitschka et al. | Generation of 3′ UTR knockout cell lines by CRISPR/Cas9-mediated genome editing | |
CN111334531A (en) | High signal-to-noise ratio negative genetic screening method | |
CN113151265A (en) | Method for inhibiting expression of lncRNA in nucleus based on CRISPR-dCas9 system | |
WO2024215712A2 (en) | Methods for identifying epigenetic factors influencing gene editing and modulating gene editing outcome via epigenetic modulation | |
Guay et al. | Unbiased genome-scale identification of cis-regulatory modules in the human genome by GRAMc | |
Liu et al. | Multiplexed pooled library screening with Cpf1 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||