WO2023102610A1 - Methods and compositions for multiplexing cell analysis
- Publication number
- WO2023102610A1 (PCT/AU2022/051476)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cells
- seq
- progeny
- populations
- cell
- Prior art date
- 238000004458 analytical method Methods 0.000 title claims abstract description 44
- 238000000034 method Methods 0.000 title claims description 116
- 239000000203 mixture Substances 0.000 title description 8
- 210000004027 cell Anatomy 0.000 claims description 526
- 230000002068 genetic effect Effects 0.000 claims description 102
- 210000004263 induced pluripotent stem cell Anatomy 0.000 claims description 99
- 238000012163 sequencing technique Methods 0.000 claims description 67
- 230000014509 gene expression Effects 0.000 claims description 50
- 238000013507 mapping Methods 0.000 claims description 32
- 238000003559 RNA-seq method Methods 0.000 claims description 28
- 108091034117 Oligonucleotide Proteins 0.000 claims description 24
- 230000003094 perturbing effect Effects 0.000 claims description 20
- 239000002299 complementary DNA Substances 0.000 claims description 16
- 150000007523 nucleic acids Chemical class 0.000 claims description 15
- 108091033409 CRISPR Proteins 0.000 claims description 13
- 230000003321 amplification Effects 0.000 claims description 12
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 12
- 239000003795 chemical substances by application Substances 0.000 claims description 11
- 102000039446 nucleic acids Human genes 0.000 claims description 11
- 108020004707 nucleic acids Proteins 0.000 claims description 11
- 239000013598 vector Substances 0.000 claims description 10
- 241000702421 Dependoparvovirus Species 0.000 claims description 7
- 238000010362 genome editing Methods 0.000 claims description 7
- 230000010354 integration Effects 0.000 claims description 7
- 230000001404 mediated effect Effects 0.000 claims description 6
- 210000002220 organoid Anatomy 0.000 claims description 6
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 claims description 5
- 108091030071 RNAI Proteins 0.000 claims description 4
- 238000012258 culturing Methods 0.000 claims description 4
- 230000009368 gene silencing by RNA Effects 0.000 claims description 4
- 150000003384 small molecules Chemical group 0.000 claims description 4
- 238000011176 pooling Methods 0.000 claims description 3
- 239000000074 antisense oligonucleotide Substances 0.000 claims description 2
- 238000012230 antisense oligonucleotides Methods 0.000 claims description 2
- 238000007878 drug screening assay Methods 0.000 claims description 2
- 229920001184 polypeptide Polymers 0.000 claims description 2
- 102000004196 processed proteins & peptides Human genes 0.000 claims description 2
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 2
- 238000010354 CRISPR gene editing Methods 0.000 claims 2
- 108090000623 proteins and genes Proteins 0.000 description 54
- 239000000523 sample Substances 0.000 description 51
- 230000004069 differentiation Effects 0.000 description 25
- 210000000130 stem cell Anatomy 0.000 description 25
- 108020004414 DNA Proteins 0.000 description 24
- 239000003550 marker Substances 0.000 description 24
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 22
- 238000006243 chemical reaction Methods 0.000 description 16
- 108020005004 Guide RNA Proteins 0.000 description 14
- 239000013612 plasmid Substances 0.000 description 14
- 102000004169 proteins and genes Human genes 0.000 description 14
- 238000013459 approach Methods 0.000 description 13
- 238000002474 experimental method Methods 0.000 description 13
- 125000003729 nucleotide group Chemical group 0.000 description 13
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 12
- 238000001514 detection method Methods 0.000 description 12
- 239000002773 nucleotide Substances 0.000 description 12
- 238000012174 single-cell RNA sequencing Methods 0.000 description 12
- 230000024245 cell differentiation Effects 0.000 description 11
- 239000000047 product Substances 0.000 description 10
- VYZAMTAEIAYCRO-UHFFFAOYSA-N Chromium Chemical compound [Cr] VYZAMTAEIAYCRO-UHFFFAOYSA-N 0.000 description 9
- 108091027544 Subgenomic mRNA Proteins 0.000 description 9
- 239000003153 chemical reaction reagent Substances 0.000 description 9
- 229910052804 chromium Inorganic materials 0.000 description 9
- 239000011651 chromium Substances 0.000 description 9
- 238000001914 filtration Methods 0.000 description 9
- 238000005516 engineering process Methods 0.000 description 7
- 238000002372 labelling Methods 0.000 description 7
- 238000002360 preparation method Methods 0.000 description 7
- CIWBSHSKHKDKBQ-JLAZNSOCSA-N Ascorbic acid Chemical compound OC[C@H](O)[C@H]1OC(=O)C(O)=C1O CIWBSHSKHKDKBQ-JLAZNSOCSA-N 0.000 description 6
- 108010077544 Chromatin Proteins 0.000 description 6
- 238000012168 Perturb-seq Methods 0.000 description 6
- 230000001413 cellular effect Effects 0.000 description 6
- 210000003483 chromatin Anatomy 0.000 description 6
- 238000013461 design Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 6
- 239000000463 material Substances 0.000 description 6
- 238000003908 quality control method Methods 0.000 description 6
- 230000009467 reduction Effects 0.000 description 6
- 238000012216 screening Methods 0.000 description 6
- 230000002103 transcriptional effect Effects 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 5
- 230000001973 epigenetic effect Effects 0.000 description 5
- 230000002438 mitochondrial effect Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 230000004044 response Effects 0.000 description 5
- 230000019491 signal transduction Effects 0.000 description 5
- 230000011664 signaling Effects 0.000 description 5
- 230000009897 systematic effect Effects 0.000 description 5
- 238000013518 transcription Methods 0.000 description 5
- 230000035897 transcription Effects 0.000 description 5
- 102000004127 Cytokines Human genes 0.000 description 4
- 108090000695 Cytokines Proteins 0.000 description 4
- 239000006146 Roswell Park Memorial Institute medium Substances 0.000 description 4
- 238000003556 assay Methods 0.000 description 4
- 230000027455 binding Effects 0.000 description 4
- 239000003124 biologic agent Substances 0.000 description 4
- 230000000295 complement effect Effects 0.000 description 4
- 230000007423 decrease Effects 0.000 description 4
- 238000012217 deletion Methods 0.000 description 4
- 230000037430 deletion Effects 0.000 description 4
- 238000011161 development Methods 0.000 description 4
- 230000018109 developmental process Effects 0.000 description 4
- 210000001671 embryonic stem cell Anatomy 0.000 description 4
- 210000001900 endoderm Anatomy 0.000 description 4
- 238000013401 experimental design Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 239000001963 growth medium Substances 0.000 description 4
- 108020004999 messenger RNA Proteins 0.000 description 4
- 239000013642 negative control Substances 0.000 description 4
- 230000037361 pathway Effects 0.000 description 4
- 230000001105 regulatory effect Effects 0.000 description 4
- 238000010839 reverse transcription Methods 0.000 description 4
- 230000008685 targeting Effects 0.000 description 4
- 238000011282 treatment Methods 0.000 description 4
- 102000012410 DNA Ligases Human genes 0.000 description 3
- 108010061982 DNA Ligases Proteins 0.000 description 3
- 230000004568 DNA-binding Effects 0.000 description 3
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 3
- 101001062347 Homo sapiens Hepatocyte nuclear factor 3-beta Proteins 0.000 description 3
- 108020005196 Mitochondrial DNA Proteins 0.000 description 3
- 241000699666 Mus <mouse, genus> Species 0.000 description 3
- 108091028043 Nucleic acid sequence Proteins 0.000 description 3
- 108091027967 Small hairpin RNA Proteins 0.000 description 3
- 102000013814 Wnt Human genes 0.000 description 3
- 108050003627 Wnt Proteins 0.000 description 3
- 230000004913 activation Effects 0.000 description 3
- 229960005070 ascorbic acid Drugs 0.000 description 3
- 235000010323 ascorbic acid Nutrition 0.000 description 3
- 239000011668 ascorbic acid Substances 0.000 description 3
- 239000000872 buffer Substances 0.000 description 3
- 239000013043 chemical agent Substances 0.000 description 3
- 201000010099 disease Diseases 0.000 description 3
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 3
- 239000003102 growth factor Substances 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 238000011534 incubation Methods 0.000 description 3
- 239000003112 inhibitor Substances 0.000 description 3
- 238000003780 insertion Methods 0.000 description 3
- 230000037431 insertion Effects 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 210000003716 mesoderm Anatomy 0.000 description 3
- 230000036961 partial effect Effects 0.000 description 3
- 238000012545 processing Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 239000004055 small Interfering RNA Substances 0.000 description 3
- 238000012085 transcriptional profiling Methods 0.000 description 3
- 238000001890 transfection Methods 0.000 description 3
- 239000013603 viral vector Substances 0.000 description 3
- GUAHPAJOXVYFON-ZETCQYMHSA-N (8S)-8-amino-7-oxononanoic acid zwitterion Chemical compound C[C@H](N)C(=O)CCCCCC(O)=O GUAHPAJOXVYFON-ZETCQYMHSA-N 0.000 description 2
- 108091093088 Amplicon Proteins 0.000 description 2
- IJGRMHOSHXDMSA-UHFFFAOYSA-N Atomic nitrogen Chemical compound N#N IJGRMHOSHXDMSA-UHFFFAOYSA-N 0.000 description 2
- 102100035875 C-C chemokine receptor type 5 Human genes 0.000 description 2
- 102000053602 DNA Human genes 0.000 description 2
- 238000001712 DNA sequencing Methods 0.000 description 2
- 241000116650 Dipseudopsis nebulosa Species 0.000 description 2
- 101100310856 Drosophila melanogaster spri gene Proteins 0.000 description 2
- 238000012413 Fluorescence activated cell sorting analysis Methods 0.000 description 2
- 102100029284 Hepatocyte nuclear factor 3-beta Human genes 0.000 description 2
- 241000282412 Homo Species 0.000 description 2
- 101000687905 Homo sapiens Transcription factor SOX-2 Proteins 0.000 description 2
- 101000844802 Lacticaseibacillus rhamnosus Teichoic acid D-alanyltransferase Proteins 0.000 description 2
- 241000713666 Lentivirus Species 0.000 description 2
- 101710163270 Nuclease Proteins 0.000 description 2
- 108010047956 Nucleosomes Proteins 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 108091023040 Transcription factor Proteins 0.000 description 2
- 102000040945 Transcription factor Human genes 0.000 description 2
- 102100024270 Transcription factor SOX-2 Human genes 0.000 description 2
- 102000005789 Vascular Endothelial Growth Factors Human genes 0.000 description 2
- 108010019530 Vascular Endothelial Growth Factors Proteins 0.000 description 2
- 238000001790 Welch's t-test Methods 0.000 description 2
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 2
- 101150063416 add gene Proteins 0.000 description 2
- 239000000556 agonist Substances 0.000 description 2
- 230000004075 alteration Effects 0.000 description 2
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 description 2
- 229960000723 ampicillin Drugs 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 239000011230 binding agent Substances 0.000 description 2
- 230000000747 cardiac effect Effects 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 238000000205 computational method Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 230000032459 dedifferentiation Effects 0.000 description 2
- 238000002224 dissection Methods 0.000 description 2
- 239000000975 dye Substances 0.000 description 2
- CTSPAMFJBXKSOY-UHFFFAOYSA-N ellipticine Chemical compound N1=CC=C2C(C)=C(NC=3C4=CC=CC=3)C4=C(C)C2=C1 CTSPAMFJBXKSOY-UHFFFAOYSA-N 0.000 description 2
- 230000007717 exclusion Effects 0.000 description 2
- 238000010195 expression analysis Methods 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 210000004602 germ cell Anatomy 0.000 description 2
- 210000001654 germ layer Anatomy 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- NOESYZHRGYRDHS-UHFFFAOYSA-N insulin Chemical compound N1C(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(NC(=O)CN)C(C)CC)CSSCC(C(NC(CO)C(=O)NC(CC(C)C)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CCC(N)=O)C(=O)NC(CC(C)C)C(=O)NC(CCC(O)=O)C(=O)NC(CC(N)=O)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CSSCC(NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2C=CC(O)=CC=2)NC(=O)C(CC(C)C)NC(=O)C(C)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2NC=NC=2)NC(=O)C(CO)NC(=O)CNC2=O)C(=O)NCC(=O)NC(CCC(O)=O)C(=O)NC(CCCNC(N)=N)C(=O)NCC(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC(O)=CC=3)C(=O)NC(C(C)O)C(=O)N3C(CCC3)C(=O)NC(CCCCN)C(=O)NC(C)C(O)=O)C(=O)NC(CC(N)=O)C(O)=O)=O)NC(=O)C(C(C)CC)NC(=O)C(CO)NC(=O)C(C(C)O)NC(=O)C1CSSCC2NC(=O)C(CC(C)C)NC(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(N)CC=1C=CC=CC=1)C(C)C)CC1=CN=CN1 NOESYZHRGYRDHS-UHFFFAOYSA-N 0.000 description 2
- 210000003734 kidney Anatomy 0.000 description 2
- 210000004962 mammalian cell Anatomy 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 210000001623 nucleosome Anatomy 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 125000001805 pentosyl group Chemical group 0.000 description 2
- 210000001778 pluripotent stem cell Anatomy 0.000 description 2
- 108091033319 polynucleotide Proteins 0.000 description 2
- 102000040430 polynucleotide Human genes 0.000 description 2
- 239000002157 polynucleotide Substances 0.000 description 2
- 230000004481 post-translational protein modification Effects 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000002062 proliferating effect Effects 0.000 description 2
- XJMOSONTPMZWPB-UHFFFAOYSA-M propidium iodide Chemical compound [I-].[I-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CCC[N+](C)(CC)CC)=C1C1=CC=CC=C1 XJMOSONTPMZWPB-UHFFFAOYSA-M 0.000 description 2
- 230000009145 protein modification Effects 0.000 description 2
- RXWNCPJZOCPEPQ-NVWDDTSBSA-N puromycin Chemical compound C1=CC(OC)=CC=C1C[C@H](N)C(=O)N[C@H]1[C@@H](O)[C@H](N2C3=NC=NC(=C3N=C2)N(C)C)O[C@@H]1CO RXWNCPJZOCPEPQ-NVWDDTSBSA-N 0.000 description 2
- 238000012372 quality testing Methods 0.000 description 2
- 239000013643 reference control Substances 0.000 description 2
- 230000002441 reversible effect Effects 0.000 description 2
- 238000007480 sanger sequencing Methods 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 239000002356 single layer Substances 0.000 description 2
- 210000001082 somatic cell Anatomy 0.000 description 2
- 238000003860 storage Methods 0.000 description 2
- 239000013589 supplement Substances 0.000 description 2
- 238000001847 surface plasmon resonance imaging Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 238000009424 underpinning Methods 0.000 description 2
- 210000002438 upper gastrointestinal tract Anatomy 0.000 description 2
- 238000012800 visualization Methods 0.000 description 2
- WHTVZRBIWZFKQO-AWEZNQCLSA-N (S)-chloroquine Chemical compound ClC1=CC=C2C(N[C@@H](C)CCCN(CC)CC)=CC=NC2=C1 WHTVZRBIWZFKQO-AWEZNQCLSA-N 0.000 description 1
- KKAJSJJFBSOMGS-UHFFFAOYSA-N 3,6-diamino-10-methylacridinium chloride Chemical compound [Cl-].C1=C(N)C=C2[N+](C)=C(C=C(N)C=C3)C3=CC2=C1 KKAJSJJFBSOMGS-UHFFFAOYSA-N 0.000 description 1
- FWBHETKCLVMNFS-UHFFFAOYSA-N 4',6-Diamino-2-phenylindol Chemical compound C1=CC(C(=N)N)=CC=C1C1=CC2=CC=C(C(N)=N)C=C2N1 FWBHETKCLVMNFS-UHFFFAOYSA-N 0.000 description 1
- STQGQHZAVUOBTE-UHFFFAOYSA-N 7-Cyan-hept-2t-en-4,6-diinsaeure Natural products C1=2C(O)=C3C(=O)C=4C(OC)=CC=CC=4C(=O)C3=C(O)C=2CC(O)(C(C)=O)CC1OC1CC(N)C(O)C(C)O1 STQGQHZAVUOBTE-UHFFFAOYSA-N 0.000 description 1
- 229920001817 Agar Polymers 0.000 description 1
- 108700028369 Alleles Proteins 0.000 description 1
- 239000012583 B-27 Supplement Substances 0.000 description 1
- 102100024522 Bladder cancer-associated protein Human genes 0.000 description 1
- 101150110835 Blcap gene Proteins 0.000 description 1
- 102000004219 Brain-derived neurotrophic factor Human genes 0.000 description 1
- 108090000715 Brain-derived neurotrophic factor Proteins 0.000 description 1
- 101710149870 C-C chemokine receptor type 5 Proteins 0.000 description 1
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 1
- AQGNHMOJWBZFQQ-UHFFFAOYSA-N CT 99021 Chemical compound CC1=CNC(C=2C(=NC(NCCNC=3N=CC(=CC=3)C#N)=NC=2)C=2C(=CC(Cl)=CC=2)Cl)=N1 AQGNHMOJWBZFQQ-UHFFFAOYSA-N 0.000 description 1
- 102100029761 Cadherin-5 Human genes 0.000 description 1
- 206010008805 Chromosomal abnormalities Diseases 0.000 description 1
- 208000031404 Chromosome Aberrations Diseases 0.000 description 1
- 208000005443 Circulating Neoplastic Cells Diseases 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 239000012625 DNA intercalator Substances 0.000 description 1
- 230000008836 DNA modification Effects 0.000 description 1
- 230000007018 DNA scission Effects 0.000 description 1
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 1
- 108700020911 DNA-Binding Proteins Proteins 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- WEAHRLBPCANXCN-UHFFFAOYSA-N Daunomycin Natural products CCC1(O)CC(OC2CC(N)C(O)C(C)O2)c3cc4C(=O)c5c(OC)cccc5C(=O)c4c(O)c3C1 WEAHRLBPCANXCN-UHFFFAOYSA-N 0.000 description 1
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 1
- 102000004190 Enzymes Human genes 0.000 description 1
- 108090000790 Enzymes Proteins 0.000 description 1
- 241000160765 Erebia ligea Species 0.000 description 1
- QTANTQQOYSUMLC-UHFFFAOYSA-O Ethidium cation Chemical compound C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 QTANTQQOYSUMLC-UHFFFAOYSA-O 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 101150021185 FGF gene Proteins 0.000 description 1
- ZIXGXMMUKPLXBB-UHFFFAOYSA-N Guatambuinine Natural products N1C2=CC=CC=C2C2=C1C(C)=C1C=CN=C(C)C1=C2 ZIXGXMMUKPLXBB-UHFFFAOYSA-N 0.000 description 1
- 102100024208 Homeobox protein MIXL1 Human genes 0.000 description 1
- 101000946926 Homo sapiens C-C chemokine receptor type 5 Proteins 0.000 description 1
- 101000794587 Homo sapiens Cadherin-5 Proteins 0.000 description 1
- 101001052462 Homo sapiens Homeobox protein MIXL1 Proteins 0.000 description 1
- 101000958741 Homo sapiens Myosin-6 Proteins 0.000 description 1
- 101000613490 Homo sapiens Paired box protein Pax-3 Proteins 0.000 description 1
- 101001069727 Homo sapiens Paired mesoderm homeobox protein 1 Proteins 0.000 description 1
- 101000819111 Homo sapiens Trans-acting T-cell-specific transcription factor GATA-3 Proteins 0.000 description 1
- 241000713772 Human immunodeficiency virus 1 Species 0.000 description 1
- 102000004877 Insulin Human genes 0.000 description 1
- 108090001061 Insulin Proteins 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 239000006137 Luria-Bertani broth Substances 0.000 description 1
- 102100025169 Max-binding protein MNT Human genes 0.000 description 1
- 102000018697 Membrane Proteins Human genes 0.000 description 1
- 108010052285 Membrane Proteins Proteins 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 101100113998 Mus musculus Cnbd2 gene Proteins 0.000 description 1
- 241000699670 Mus sp. Species 0.000 description 1
- 241000204031 Mycoplasma Species 0.000 description 1
- 102100038319 Myosin-6 Human genes 0.000 description 1
- 241001176820 Nebulosa Species 0.000 description 1
- 206010028980 Neoplasm Diseases 0.000 description 1
- 108091092724 Noncoding DNA Proteins 0.000 description 1
- 101100493740 Oryza sativa subsp. japonica BC10 gene Proteins 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 101150102573 PCR1 gene Proteins 0.000 description 1
- 102100040891 Paired box protein Pax-3 Human genes 0.000 description 1
- 102100033786 Paired mesoderm homeobox protein 1 Human genes 0.000 description 1
- WDVSHHCDHLJJJR-UHFFFAOYSA-N Proflavine Chemical compound C1=CC(N)=CC2=NC3=CC(N)=CC=C3C=C21 WDVSHHCDHLJJJR-UHFFFAOYSA-N 0.000 description 1
- 230000026279 RNA modification Effects 0.000 description 1
- 239000013614 RNA sample Substances 0.000 description 1
- 230000004570 RNA-binding Effects 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 238000011529 RT qPCR Methods 0.000 description 1
- 101100247004 Rattus norvegicus Qsox1 gene Proteins 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 108020001027 Ribosomal DNA Proteins 0.000 description 1
- KJTLSVCANCCWHF-UHFFFAOYSA-N Ruthenium Chemical compound [Ru] KJTLSVCANCCWHF-UHFFFAOYSA-N 0.000 description 1
- SUYXJDLXGFPMCQ-INIZCTEOSA-N SJ000287331 Natural products CC1=c2cnccc2=C(C)C2=Nc3ccccc3[C@H]12 SUYXJDLXGFPMCQ-INIZCTEOSA-N 0.000 description 1
- 101100166144 Staphylococcus aureus cas9 gene Proteins 0.000 description 1
- 235000019892 Stellar Nutrition 0.000 description 1
- 206010043276 Teratoma Diseases 0.000 description 1
- 108010012306 Tn5 transposase Proteins 0.000 description 1
- 102100021386 Trans-acting T-cell-specific transcription factor GATA-3 Human genes 0.000 description 1
- 108010020764 Transposases Proteins 0.000 description 1
- 102000008579 Transposases Human genes 0.000 description 1
- GLNADSQYFUSGOU-GPTZEZBUSA-J Trypan blue Chemical compound [Na+].[Na+].[Na+].[Na+].C1=C(S([O-])(=O)=O)C=C2C=C(S([O-])(=O)=O)C(/N=N/C3=CC=C(C=C3C)C=3C=C(C(=CC=3)\N=N\C=3C(=CC4=CC(=CC(N)=C4C=3O)S([O-])(=O)=O)S([O-])(=O)=O)C)=C(O)C2=C1N GLNADSQYFUSGOU-GPTZEZBUSA-J 0.000 description 1
- VGQOVCHZGQWAOI-UHFFFAOYSA-N UNPD55612 Natural products N1C(O)C2CC(C=CC(N)=O)=CN2C(=O)C2=CC=C(C)C(O)=C12 VGQOVCHZGQWAOI-UHFFFAOYSA-N 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 102100035140 Vitronectin Human genes 0.000 description 1
- 108010031318 Vitronectin Proteins 0.000 description 1
- DPKHZNPWBDQZCN-UHFFFAOYSA-N acridine orange free base Chemical compound C1=CC(N(C)C)=CC2=NC3=CC(N(C)C)=CC=C3C=C21 DPKHZNPWBDQZCN-UHFFFAOYSA-N 0.000 description 1
- 150000001251 acridines Chemical class 0.000 description 1
- 229940023020 acriflavine Drugs 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 210000005006 adaptive immune system Anatomy 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 230000000996 additive effect Effects 0.000 description 1
- 239000008272 agar Substances 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 125000003275 alpha amino acid group Chemical group 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 239000005557 antagonist Substances 0.000 description 1
- VGQOVCHZGQWAOI-HYUHUPJXSA-N anthramycin Chemical compound N1[C@@H](O)[C@@H]2CC(\C=C\C(N)=O)=CN2C(=O)C2=CC=C(C)C(O)=C12 VGQOVCHZGQWAOI-HYUHUPJXSA-N 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- DZBUGLKDJFMEHC-UHFFFAOYSA-N benzoquinolinylidene Natural products C1=CC=CC2=CC3=CC=CC=C3N=C21 DZBUGLKDJFMEHC-UHFFFAOYSA-N 0.000 description 1
- 102000023732 binding proteins Human genes 0.000 description 1
- 108091008324 binding proteins Proteins 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 201000011510 cancer Diseases 0.000 description 1
- 125000003178 carboxy group Chemical group [H]OC(*)=O 0.000 description 1
- 210000004413 cardiac myocyte Anatomy 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 239000006143 cell culture medium Substances 0.000 description 1
- 239000012578 cell culture reagent Substances 0.000 description 1
- 210000000170 cell membrane Anatomy 0.000 description 1
- 238000002659 cell therapy Methods 0.000 description 1
- 230000003833 cell viability Effects 0.000 description 1
- 230000023549 cell-cell signaling Effects 0.000 description 1
- 108091092356 cellular DNA Proteins 0.000 description 1
- 230000008668 cellular reprogramming Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000012512 characterization method Methods 0.000 description 1
- 229960003677 chloroquine Drugs 0.000 description 1
- WHTVZRBIWZFKQO-UHFFFAOYSA-N chloroquine Natural products ClC1=CC=C2C(NC(C)CCCN(CC)CC)=CC=NC2=C1 WHTVZRBIWZFKQO-UHFFFAOYSA-N 0.000 description 1
- ZYVSOIYQKUDENJ-WKSBCEQHSA-N chromomycin A3 Chemical compound O([C@@H]1C[C@@H](O[C@H](C)[C@@H]1OC(C)=O)OC=1C=C2C=C3C[C@H]([C@@H](C(=O)C3=C(O)C2=C(O)C=1C)O[C@@H]1O[C@H](C)[C@@H](O)[C@H](O[C@@H]2O[C@H](C)[C@@H](O)[C@H](O[C@@H]3O[C@@H](C)[C@H](OC(C)=O)[C@@](C)(O)C3)C2)C1)[C@H](OC)C(=O)[C@@H](O)[C@@H](C)O)[C@@H]1C[C@@H](O)[C@@H](OC)[C@@H](C)O1 ZYVSOIYQKUDENJ-WKSBCEQHSA-N 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 238000003501 co-culture Methods 0.000 description 1
- 230000004186 co-expression Effects 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 230000000875 corresponding effect Effects 0.000 description 1
- 230000009089 cytolysis Effects 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- STQGQHZAVUOBTE-VGBVRHCVSA-N daunorubicin Chemical compound O([C@H]1C[C@@](O)(CC=2C(O)=C3C(=O)C=4C=CC=C(C=4C(=O)C3=C(O)C=21)OC)C(C)=O)[C@H]1C[C@H](N)[C@H](O)[C@H](C)O1 STQGQHZAVUOBTE-VGBVRHCVSA-N 0.000 description 1
- CFCUWKMKBJTWLW-UHFFFAOYSA-N deoliosyl-3C-alpha-L-digitoxosyl-MTM Natural products CC=1C(O)=C2C(O)=C3C(=O)C(OC4OC(C)C(O)C(OC5OC(C)C(O)C(OC6OC(C)C(O)C(C)(O)C6)C5)C4)C(C(OC)C(=O)C(O)C(C)O)CC3=CC2=CC=1OC(OC(C)C1O)CC1OC1CC(O)C(O)C(C)O1 CFCUWKMKBJTWLW-UHFFFAOYSA-N 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- 238000012161 digital transcriptional profiling Methods 0.000 description 1
- 230000003467 diminishing effect Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 238000004520 electroporation Methods 0.000 description 1
- 239000000839 emulsion Substances 0.000 description 1
- 210000004039 endoderm cell Anatomy 0.000 description 1
- 210000003038 endothelium Anatomy 0.000 description 1
- 230000006718 epigenetic regulation Effects 0.000 description 1
- ZMMJGEGLRURXTF-UHFFFAOYSA-N ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 1
- 229960005542 ethidium bromide Drugs 0.000 description 1
- 238000000684 flow cytometry Methods 0.000 description 1
- -1 fluorcoumanin Chemical compound 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 239000012595 freezing medium Substances 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 210000001647 gastrula Anatomy 0.000 description 1
- 238000011223 gene expression profiling Methods 0.000 description 1
- 238000003197 gene knockdown Methods 0.000 description 1
- 238000003209 gene knockout Methods 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 230000004547 gene signature Effects 0.000 description 1
- 230000030279 gene silencing Effects 0.000 description 1
- 230000005017 genetic modification Effects 0.000 description 1
- 230000007614 genetic variation Effects 0.000 description 1
- 235000013617 genetically modified food Nutrition 0.000 description 1
- 238000011331 genomic analysis Methods 0.000 description 1
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 1
- 239000010931 gold Substances 0.000 description 1
- 229910052737 gold Inorganic materials 0.000 description 1
- 238000003306 harvesting Methods 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 238000010191 image analysis Methods 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 230000002401 inhibitory effect Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 229940125396 insulin Drugs 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000008611 intercellular interaction Effects 0.000 description 1
- 230000003834 intracellular effect Effects 0.000 description 1
- 238000011005 laboratory method Methods 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000000691 measurement method Methods 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- CFCUWKMKBJTWLW-BKHRDMLASA-N mithramycin Chemical compound O([C@@H]1C[C@@H](O[C@H](C)[C@H]1O)OC=1C=C2C=C3C[C@H]([C@@H](C(=O)C3=C(O)C2=C(O)C=1C)O[C@@H]1O[C@H](C)[C@@H](O)[C@H](O[C@@H]2O[C@H](C)[C@H](O)[C@H](O[C@@H]3O[C@H](C)[C@@H](O)[C@@](C)(O)C3)C2)C1)[C@H](OC)C(=O)[C@@H](O)[C@@H](C)O)[C@H]1C[C@@H](O)[C@H](O)[C@@H](C)O1 CFCUWKMKBJTWLW-BKHRDMLASA-N 0.000 description 1
- 108091064355 mitochondrial RNA Proteins 0.000 description 1
- 238000010172 mouse model Methods 0.000 description 1
- 210000002894 multi-fate stem cell Anatomy 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- UPBAOYRENQEPJO-UHFFFAOYSA-N n-[5-[[5-[(3-amino-3-iminopropyl)carbamoyl]-1-methylpyrrol-3-yl]carbamoyl]-1-methylpyrrol-3-yl]-4-formamido-1-methylpyrrole-2-carboxamide Chemical compound CN1C=C(NC=O)C=C1C(=O)NC1=CN(C)C(C(=O)NC2=CN(C)C(C(=O)NCCC(N)=N)=C2)=C1 UPBAOYRENQEPJO-UHFFFAOYSA-N 0.000 description 1
- 229910052754 neon Inorganic materials 0.000 description 1
- GKAOGPIIYCISHV-UHFFFAOYSA-N neon atom Chemical compound [Ne] GKAOGPIIYCISHV-UHFFFAOYSA-N 0.000 description 1
- 238000007481 next generation sequencing Methods 0.000 description 1
- 229910052757 nitrogen Inorganic materials 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 210000004789 organ system Anatomy 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 230000007170 pathology Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- 150000004713 phosphodiesters Chemical group 0.000 description 1
- 230000035479 physiological effects, processes and functions Effects 0.000 description 1
- 229960003171 plicamycin Drugs 0.000 description 1
- 230000008488 polyadenylation Effects 0.000 description 1
- 125000004424 polypyridyl Polymers 0.000 description 1
- 239000013641 positive control Substances 0.000 description 1
- 230000029279 positive regulation of transcription, DNA-dependent Effects 0.000 description 1
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 1
- 230000037452 priming Effects 0.000 description 1
- 229960000286 proflavine Drugs 0.000 description 1
- 238000001742 protein purification Methods 0.000 description 1
- 238000000734 protein sequencing Methods 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 239000012264 purified product Substances 0.000 description 1
- 229950010131 puromycin Drugs 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000004445 quantitative analysis Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000037425 regulation of transcription Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 210000003705 ribosome Anatomy 0.000 description 1
- 239000011435 rock Substances 0.000 description 1
- 229910052707 ruthenium Inorganic materials 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 210000004927 skin cell Anatomy 0.000 description 1
- 230000000392 somatic effect Effects 0.000 description 1
- 238000011895 specific detection Methods 0.000 description 1
- 108010042747 stallimycin Proteins 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 230000000153 supplemental effect Effects 0.000 description 1
- 235000012976 tarts Nutrition 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 210000001519 tissue Anatomy 0.000 description 1
- 108091006107 transcriptional repressors Proteins 0.000 description 1
- 238000011222 transcriptome analysis Methods 0.000 description 1
- 230000002463 transducing effect Effects 0.000 description 1
- 238000010361 transduction Methods 0.000 description 1
- 230000026683 transduction Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000032258 transport Effects 0.000 description 1
- 230000017105 transposition Effects 0.000 description 1
- 238000002604 ultrasonography Methods 0.000 description 1
- 230000004906 unfolded protein response Effects 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 241001430294 unidentified retrovirus Species 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
- 108700026220 vif Genes Proteins 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1065—Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/65—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression using markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N5/00—Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
- C12N5/06—Animal cells or tissues; Human cells or tissues
- C12N5/0602—Vertebrate cells
- C12N5/0696—Artificially induced pluripotent stem cells, e.g. iPS
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B20/00—Methods specially adapted for identifying library members
- C40B20/04—Identifying library members by means of a tag, label, or other readable or detectable entity associated with the library members, e.g. decoding processes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2501/00—Active agents used in cell culture processes, e.g. differentation
- C12N2501/65—MicroRNA
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2506/00—Differentiation of animal cells from one lineage to another; Differentiation of pluripotent cells
- C12N2506/45—Differentiation of animal cells from one lineage to another; Differentiation of pluripotent cells from artificially induced pluripotent stem cells
Definitions
- the present invention relates to the use of a population of cells into which an engineered barcode unique to the cells has been stably integrated, and to the use of such cells in combination with one or more other populations of cells, each having a stably integrated engineered barcode unique to that population, for multiplexed analysis.
- genotype-based multiplexing pools genetically distinct cells so that their known (Demuxlet, MIX-seq) or inferred (scSplit, Vireo) genotypes enable demultiplexing by reference to intrinsic genetic variation across input samples.
- barcode-based multiplexing involves associating a sample-specific oligonucleotide barcode with the cells by attaching a barcode to the cell membrane or a membrane protein (Cell hashing, ClickTags, MULTI-Seq), by introducing barcodes into the cells (Transient barcoding, barRNA-seq, scifi-seq, sci-plex), or by introducing barcodes into their genomes (CellTag).
- Barcoding-based multiplexing requires barcode sequencing alongside the transcriptome with expressed barcodes for each cell used to identify its sample of origin. Downstream computational approaches then distinguish true positive barcode expression signals from background noise arising from low quality cell barcodes and ambient transcripts present during cell capture. Current strategies estimate the background count distribution and determine
- CellTag is an alternative approach for barcoding cells.
- an unknown number of barcode copies is integrated randomly into the genome, which can alter or silence gene expression depending on the integration location, and the lack of reliable barcode expression levels makes false negative barcode detection more likely.
- the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- iPSC induced pluripotent stem cell
- the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using said genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is integrated into the genome of the cells of each population at a targeted location.
- the disclosed embodiments also provide a computer program product including a non-transitory computer readable medium on which are provided program instructions for performing the recited operations and other computational operations of the methods described herein.
- Some embodiments provide a system for multiplexed single cell analysis in a sample using the isogenic barcoded iPSC populations described herein.
- the system includes a sequencer for receiving nucleic acids from the test sample and providing nucleic acid sequence information from the sample; a processor; and one or more computer-readable storage media having stored thereon instructions for execution on the processor to: map a read of sequence information from a pooled sample (e.g. a sequence library) to a single cell; de-multiplex the sequence information using the genetic barcodes; and map said single cell to an originating cell population or progeny thereof in the test sample.
- a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
- a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- step of generating a multiplexed single cell RNA-seq library comprises: creating one or more gene expression libraries; creating one or more separate barcode libraries via creation of purified cDNA and amplification of regions of said cDNA comprising said genetic barcode; and pooling said gene expression library and said barcode library.
- step of generating a multiplexed single cell RNA-seq library comprises creating a gene expression library via creation of purified cDNA and amplification of said cDNA but does not include the generation of a separate barcode library.
- a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations of cells or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
- a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- AGCCCTGAGTCAGTA (SEQ ID NO: 4)
- CAAATTCAAGGCGAT (SEQ ID NO: 5)
- TTATTATGTTCTAGC (SEQ ID NO: 17)
- AATCTCTGAAACGAA (SEQ ID NO: 18)
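The barcode sequences listed above can be compared against a sequenced fragment to identify the originating population. The following is a minimal sketch, not the method of the disclosure: it assumes matching by Hamming distance with a tolerance of one mismatch, and the names `BARCODES`, `hamming` and `match_barcode` are hypothetical.

```python
from typing import Optional

# Barcode sequences as listed above, keyed by SEQ ID NO.
BARCODES = {
    4: "AGCCCTGAGTCAGTA",
    5: "CAAATTCAAGGCGAT",
    17: "TTATTATGTTCTAGC",
    18: "AATCTCTGAAACGAA",
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(fragment: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the SEQ ID NO of the single barcode within the mismatch
    tolerance (an assumed value), or None if zero or several barcodes match."""
    fragment = fragment.upper()
    hits = [
        seq_id
        for seq_id, barcode in BARCODES.items()
        if len(barcode) == len(fragment)
        and hamming(fragment, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None
```

Returning `None` for ambiguous or unmatched fragments keeps uncertain calls out of the downstream assignment rather than forcing a best guess.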
- manipulating one or more of the populations of cells or progeny thereof comprises altering the culture conditions of, or genetically perturbing the cells of the one or more populations or progeny thereof.
- altering the culture conditions comprises contacting the cells or progeny thereof with an agent of interest, contacting the cells or progeny thereof with another cell, co-culturing the cells or progeny thereof with another cell, or co-culturing the cells and/or progeny thereof in an organoid.
- agent of interest is a small molecule, a polypeptide, an antibody, a nucleic acid molecule, an RNAi, a vector comprising a nucleic acid molecule, an antisense oligonucleotide, or a gene editing system (e.g. CRISPR/Cas9).
- composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
- Figure 1 shows CRISPR editing and quality control of engineered barcode in isogenic iPSCs.
- Example QC steps: FACS analysis of GFP (left) and SSEA4 (right) (b), image analysis for morphology (c), and G-band karyotyping (d), used to ensure the quality of barcode-engineered iPSCs.
- Single cell RNA-seq of barcoded iPSCs in the pluripotent state demonstrates that all barcoded cell lines show similar transcriptional profiles based on dimensionality reduction using UMAP visualisation (e), and for each barcoded line, analysis of reads per cell (f) and expression of the pluripotency markers SOX2 and OCT4 (g).
- External hashing antibodies were used on four barcoded cell lines as secondary validation for accuracy of barcode calling from single cell RNA-seq data showing high fidelity and confidence of computational assignment of external and internal barcodes.
- Figure 2 shows quality control analysis of barcoded iPSCs. a-c. Each cell line was analysed by G-band karyotyping (a) and FACS analysis for the pluripotency marker SSEA4 (b) and purity of the GFP expression transcribed from the barcode cassette engineered into the AAVS1 locus (c).
- FIG. 3 shows the experimental design for multiplexed single cell analysis of mesendoderm differentiation
- a Timeline of the general differentiation protocol from hiPSCs to committed mesendoderm cell types.
- hiPSC human induced pluripotent stem cell
- GLS germ layer specification
- PC progenitor cell
- cCT committed cell types
- Experimental approaches comprise a high resolution time course capturing cells every 24 hours between day 2 and day 9 of differentiation (left) as well as a perturbation strategy of Wnt and BMP signalling pathways during the progenitor cell stage between day 3 and day 5 (right), capturing cells prior to perturbation (day 2), immediately after (day 5) and at the committed cell stage (day 9).
- Figure 4 shows a high resolution time course of mesendoderm differentiation. a. Uniform manifold approximation and projection (UMAP) plot showing all cells of the time course (13,682 cells). Cells are coloured by their cluster annotation and numbered according to the legend in figure 4C. b. UMAP plot showing cells coloured by time point. c. Fraction of clusters per time point, displayed as absolute numbers (top) and proportions (bottom). d. Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. RNA velocity results coloured by cluster identity and time point.
- UMAP Uniform manifold approximation and projection
- Figure 5 shows signalling pathway perturbations during mesendoderm differentiation
- b UMAP plot showing cells coloured by time point, sequencing library and condition
- c Fraction of clusters per time point, displayed as proportions
- d Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. Normalized stacked bar plots displaying contributions of cells from different conditions to each cluster.
- FIG. 6 shows mitochondrial genes measured using different barcoding strategies. Samples shown are barcoded by Cell hashing: seu_bc0Xav, seu_bc5Xav, seu_bcDox vs. engineered barcoding (separately amplified barcoding library): fit, seu2, seu3 vs. engineered barcoding (barcoding reads just from transcriptome library): lib1, lib2, lib3. Data show that mitochondrial reads, which are a measure of cell stress, are significantly higher in cells that use cell hashing (P < 0.01, two-sample Welch's t-test).
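The two-sample Welch's t-test named in this caption compares two group means without assuming equal variances. As a minimal sketch using only the Python standard library (the function name `welch_t` is hypothetical, and the input values in the usage note are illustrative, not data from the disclosure), the statistic and the Welch-Satterthwaite degrees of freedom can be computed as:

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```

For example, `welch_t([1, 2, 3], [2, 4, 6])` gives t ≈ -1.549 with 50/17 ≈ 2.94 degrees of freedom. Converting the statistic to a P value needs the Student's t distribution, which the standard library does not provide; a statistics package would normally be used for that step.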
- Figure 7 shows barcode classification of singlet (individual cells), negative (no barcode detected), and doublet (multiple barcodes detected) measured using different barcoding strategies.
- Samples shown are barcoded by Cell hashing: seu_bc0Xav, seu_bc5Xav, seu_bcDox vs. engineered barcoding (separately amplified barcoding library): fit, seu2, seu3 vs. engineered barcoding (barcoding reads just from transcriptome library): lib1, lib2, lib3.
- Data show no significant difference in singlet detection efficiency between barcoding or library sequencing methods based on two-sample Welch’s t-test.
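The singlet, negative and doublet classes described for Figure 7 can be illustrated with a simple per-cell rule. This is a sketch only: it assumes a barcode counts as detected when its read count reaches a fixed cutoff, whereas real pipelines typically model the background count distribution. The function name `classify_cell` and the default `min_count` are hypothetical.

```python
from typing import Dict


def classify_cell(barcode_counts: Dict[str, int], min_count: int = 5) -> str:
    """Classify one cell from its per-barcode read counts.

    A barcode is treated as detected when its count is at least
    min_count (an assumed, illustrative cutoff).
    """
    detected = [bc for bc, n in barcode_counts.items() if n >= min_count]
    if not detected:
        return "negative"   # no barcode detected
    if len(detected) == 1:
        return "singlet"    # exactly one barcode detected
    return "doublet"        # multiple barcodes detected
```

For instance, a cell with counts `{"BC1": 40, "BC2": 1}` is called a singlet under the default cutoff, while `{"BC1": 40, "BC2": 22}` is called a doublet.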
- Figure 8 shows 0XAV QC data.
- a-d Thresholds used for cell filtering based on transcriptome metrics: (a) library size, (b) number of reads mapped to genes, (c) percentage of reads mapped to mitochondrial genes, and (d) percentage of reads mapped to ribosomal genes, e.
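Cell filtering on the four transcriptome metrics named in this caption can be sketched as a set of threshold checks. The class names, field names and the direction of each cutoff (minimum for library size and gene-mapped reads, maximum for mitochondrial and ribosomal percentages) are assumptions for illustration; the thresholds actually used are those shown in the figure and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellMetrics:
    library_size: int      # (a) total reads for the cell
    reads_in_genes: int    # (b) reads mapped to genes
    pct_mito: float        # (c) % reads mapped to mitochondrial genes
    pct_ribo: float        # (d) % reads mapped to ribosomal genes


@dataclass(frozen=True)
class Thresholds:
    min_library_size: int
    min_reads_in_genes: int
    max_pct_mito: float
    max_pct_ribo: float


def passes_qc(cell: CellMetrics, t: Thresholds) -> bool:
    """Keep a cell only if every metric is within its threshold."""
    return (
        cell.library_size >= t.min_library_size
        and cell.reads_in_genes >= t.min_reads_in_genes
        and cell.pct_mito <= t.max_pct_mito
        and cell.pct_ribo <= t.max_pct_ribo
    )
```

Holding the cutoffs in a separate `Thresholds` object makes it straightforward to apply different values per library, since suitable cutoffs generally depend on the dataset.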
- kidney organoid optionally includes one or more kidney organoids.
- nucleic acids are written left to right in 5’ to 3’ orientation and amino acid sequences are written left to right in amino to carboxy orientation, respectively.
- a gene is a locus (or region) of DNA which is made up of nucleotides and is the molecular unit of heredity.
- the terms “polynucleotide”, “oligonucleotide”, “nucleic acid” and “nucleic acid molecules” are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3’ position of the pentose of one nucleotide is joined by a phosphodiester group to the 5’ position of the pentose of the next.
- the nucleotides include sequences of any form of nucleic acid, including, but not limited to, RNA and DNA molecules. The term includes, without limitation, single- and double-stranded polynucleotides.
- the term “read” refers to a sequence obtained from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in A, T, C, or G) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample.
- a read is a nucleic acid sequence of sufficient length (e.g., at least about 25 bp) that can be used to identify a larger sequence or region, e.g., that can be aligned and specifically assigned to a chromosome or genomic region or gene.
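The two properties of a read given above, a symbolic representation in A, T, C or G and a sufficient length of, for example, at least about 25 bp, can be checked with a short helper. This is an illustrative sketch; the function name `is_usable_read` and the treatment of any other character as disqualifying are assumptions.

```python
VALID_BASES = frozenset("ACGT")


def is_usable_read(read: str, min_length: int = 25) -> bool:
    """Return True if the read contains only A, C, G or T and is at least
    min_length bases long (25 follows the example length given above)."""
    read = read.upper()
    return len(read) >= min_length and set(read) <= VALID_BASES
```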
- genomic read is used in reference to a read of any segments in the entire genome of an individual.
- iPSCs Induced pluripotent stem cells
- iPS cells have been derived using modifications of an approach originally discovered in 2006 (Yamanaka, S. et al., Cell Stem Cell, 1:39-49 (2007)). For example, in one instance, to create iPS cells, scientists started with skin cells that were then modified by a standard laboratory technique using retroviruses to insert genes into the cellular DNA.
- the inserted genes were Oct4, Sox2, Klf4, and c-Myc, known to act together as natural regulators to keep cells in an embryonic stem cell-like state. These cells have been described in the literature. See, for example, Wernig et al., PNAS, 105:5856-5861 (2008); Jaenisch et al., Cell, 132:567-582 (2008); Hanna et al., Cell, 133:250-264 (2008); and Brambrink et al., Cell Stem Cell, 2:151-159 (2008). It is also possible that such cells can be created by specific culture conditions (exposure to specific agents), and they may also be created from a variety of different starting cell types. These references are all incorporated by reference for teaching iPSCs and methods for producing them.
- iPSCs have many characteristic features of embryonic stem cells. For example, they have the ability to create chimeras with germ line transmission and tetraploid complementation, and they can also form teratomas containing various cell types from the three embryonic germ layers. On the other hand, they may not be identical, as some reports demonstrate. See, for example, Chin et al., Cell Stem Cell 5:111-123 (2009), showing that induced pluripotent stem cells and embryonic stem cells can be distinguished by gene expression signatures.
- Cells such as iPSCs or their progeny (including differentiated progeny) as disclosed herein may in the context of the present specification be said to “express” or “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.
- Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes.
- a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell.
- the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker).
- a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a reference cell (e.g. negative control cell) or than an average signal generated for the marker by a population of reference or negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher.
- a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of reference or negative control cells.
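By way of non-limiting illustration, the two criteria above (a fold change over the average negative control signal, and a number of standard deviations above that average) can be sketched as follows. The function name, the example signal values and the use of the sample standard deviation are assumptions made for this sketch; the default thresholds of 1.5-fold and 3.0 standard deviations are the minimum values stated above.

```python
from statistics import mean, stdev


def is_marker_positive(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Return (fold_positive, sd_positive) for one cell's marker signal.

    negative_controls holds the signals measured for reference cells known
    not to express the marker (hypothetical values in this sketch).
    """
    ref_mean = mean(negative_controls)
    ref_sd = stdev(negative_controls)  # sample standard deviation (assumption)
    fold_positive = ref_mean > 0 and signal / ref_mean >= min_fold
    sd_positive = ref_sd > 0 and (signal - ref_mean) / ref_sd >= min_sd
    return fold_positive, sd_positive


# Hypothetical negative control signals with a mean of 10.0
negative = [10.0, 12.0, 9.0, 11.0, 8.0]
print(is_marker_positive(40.0, negative))
print(is_marker_positive(11.0, negative))
```

Either criterion may be applied alone or both together, depending on the measurement used.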
- Human pluripotent stem cells can self-renew and have the potential to differentiate into theoretically any cell type of the body in response to developmental signalling cues that guide cell differentiation decisions.
- iPSCs induced pluripotent stem cells
- the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
- the present invention provides a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
- Embodiments disclosed herein also relate to progeny of such cells (e.g. progeny of iPSCs or other stem or progenitor cells), including differentiated progeny or a population of cells obtained from one or more of the populations of barcoded cells (e.g. iPSCs).
- the term “differentiated” or “differentiation” as used with respect to cells in a differentiating cell system refers to the process by which cells differentiate from one cell type (e.g., a multipotent, totipotent or pluripotent differentiable cell) to another cell type (such as a target differentiated cell).
- the “cell differentiation” refers to a specialization process or a pathway by which a less specialized cell (e.
- dedifferentiation refers to a process wherein a more specialized cell having a more distinct form and function, and/or limited self-renewal and/or proliferative capacity becomes less specialized and acquires a greater self-renewal and/or proliferative capacity or differentiation capacity (e.g. multipotent, pluripotent etc.).
- An induced Pluripotent Stem Cell (iPSC) is an example of a dedifferentiated cell. Accordingly, dedifferentiation can refer to a process of cellular reprogramming.
- the isogenic populations of cells are cell lines derived from a single source wherein each cell line comprises a genetic barcode unique to that cell line.
- the barcoded iPSC populations or cell lines are lineage committed or differentiated, whereby the iPSCs are differentiated to multipotent stem or progenitor cells or to cells with a more specialized or differentiated phenotype in vitro under conditions that permit the cells to obtain said phenotype.
- barcode refers to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid, or as an identifier of the source of an associated molecule, such as a cell-of-origin.
- a barcode may also refer to any unique, non-naturally occurring, nucleic acid sequence that may be used to identify the originating source of a nucleic acid fragment.
- the barcode sequence provides a high-quality individual read of a barcode associated with a single cell/clone, such that multiple cells or clones can be sequenced and analysed together.
- amplified sequences from single cells can be sequenced together and resolved based on the barcode associated with each cell.
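As a non-limiting sketch of how pooled reads may be resolved by barcode, the following assigns a sequenced barcode region to the closest known barcode. It uses three of the 15 nucleotide barcodes listed herein; the function names and the tolerance of two mismatches are assumptions chosen for illustration and are not taken from the specification.

```python
# Three of the 15 nt barcodes listed in this specification
BARCODES = {
    "BC01": "GTGCCGACCAGTATC",
    "BC02": "ACCACCTGACGCAAA",
    "BC03": "ACGGCCCTATTTAAG",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read, barcodes=BARCODES, max_mismatches=2):
    """Return the name of the unique closest barcode, or None if ambiguous.

    max_mismatches is an assumed tolerance for sequencing errors.
    """
    hits = sorted(
        (hamming(read, seq), name)
        for name, seq in barcodes.items()
        if len(seq) == len(read)
    )
    if not hits or hits[0][0] > max_mismatches:
        return None  # nothing close enough
    if len(hits) > 1 and hits[1][0] == hits[0][0]:
        return None  # tie between two barcodes
    return hits[0][1]


print(assign_barcode("GTGCCGACCAGTATC"))  # exact match to BC01
print(assign_barcode("GTGCCGACCAGTATA"))  # one mismatch relative to BC01
print(assign_barcode("AAAAAAAAAAAAAAA"))  # no barcode within tolerance
```

Reads that cannot be assigned unambiguously are returned as None and may simply be discarded.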
- the sample barcode oligonucleotides comprise a PCR handle compatible with single cell sequencing methods as described herein (e.g., Drop-seq, InDrop, 10X Genomics).
- the PCR-amplification handle in the sample barcode oligonucleotides can be changed depending on which sequence read is used for RNA readout (e.g. Drop-seq uses Read2, 10X v1 uses Read1).
- a Read2 sequence is used as a PCR handle to generate barcode-containing amplicons compatible with Chromium scRNA library preparation.
- the sample barcode oligonucleotides may be RNA or DNA.
- sample barcode oligonucleotides may incorporate any modified nucleotides known in the art.
- the sample barcode oligonucleotides include a nucleotide barcode sequence of from about 5 - about 20 nucleotides. In certain embodiments, the sample barcode oligonucleotides include a 15 nucleotide barcode sequence.
- the barcode is selected from one or more of the following: BC01 - GTGCCGACCAGTATC (SEQ ID NO: 1); BC02 - ACCACCTGACGCAAA (SEQ ID NO: 2); BC03 - ACGGCCCTATTTAAG (SEQ ID NO: 3); BC04 - AGCCCTGAGTCAGTA (SEQ ID NO: 4); BC05 - CAAATTCAAGGCGAT (SEQ ID NO: 5); BC06 - AATCTTGTATAAGTA (SEQ ID NO: 6); BC07 - CGTCACATTTGAGTC (SEQ ID NO: 7); BC08 - GGACCTTCTTACGAC (SEQ ID NO: 8); BC09 - TACCAATTGTACGCT (SEQ ID NO: 9); BC10 - CGCTAATGTCCGTTT (SEQ ID NO: 10); BC11 - ACCCTACGGTGGTTC (SEQ ID NO: 11); BC12 - TGTCCAAGCTGCAAT (SEQ ID NO: 12); BC13
- TTATTATGTTCTAGC (SEQ ID NO: 17)
- BC18 - AATCTCTGAAACGAA
- the sample barcode oligonucleotides are compatible with oligo dT-based RNA-sequencing library preparations so that they can be captured and sequenced together with mRNAs.
- the sample barcode oligonucleotide includes a poly A tail.
- a poly T oligo is used to capture mRNA and polyadenylated sample barcode oligonucleotides and prime a reverse transcription reaction to obtain cDNA molecules.
- Commonly used reverse transcriptases have DNA-dependent DNA polymerase activity. This activity allows DNA sample barcoding oligonucleotides to be copied into cDNA during reverse transcription.
- the sample barcode oligonucleotides comprise a PCR handle for amplification and next-generation sequencing library preparation, a barcode sequence specific for each sample, and a polyA stretch at the 3’ end designed to anneal to polyT stretches on primers used to initiate reverse transcription.
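The three-part layout described above (PCR handle, sample-specific barcode, 3’ polyA stretch) can be illustrated with a short sketch. The handle sequence and the polyA length below are hypothetical placeholders and are not sequences from this specification; only the 15 nucleotide barcode is taken from the list herein.

```python
PCR_HANDLE = "ACGTACGTACGTACGTACGT"  # hypothetical placeholder, not a real handle
POLY_A_LENGTH = 30                   # assumed length of the 3' polyA stretch


def build_oligo(barcode, handle=PCR_HANDLE, poly_a=POLY_A_LENGTH):
    """Concatenate handle + barcode + polyA, written 5' to 3'."""
    if not barcode or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be a non-empty A/C/G/T string")
    return handle + barcode + "A" * poly_a


def extract_barcode(oligo, handle=PCR_HANDLE, barcode_length=15):
    """Recover the barcode from an oligo with the layout above, else None."""
    if not oligo.startswith(handle):
        return None
    start = len(handle)
    barcode = oligo[start:start + barcode_length]
    tail = oligo[start + barcode_length:]
    # The remainder after the barcode must be a pure polyA stretch
    if len(barcode) != barcode_length or not tail or set(tail) != {"A"}:
        return None
    return barcode


oligo = build_oligo("ACCACCTGACGCAAA")  # BC02 from the list herein
print(extract_barcode(oligo))
```

The barcode is read at a fixed offset rather than by trimming trailing adenines, because a barcode such as BC02 itself ends in adenines.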
- the sample barcode oligonucleotide comprises a UMI.
- random priming may be used for reverse transcription.
- said genetic barcode is associated with a detectable label, such as a fluorescent label (e.g. GFP), which enables visualisation of the presence of the barcoded cells.
- the genetic barcode is incorporated into a genomic safe harbor locus (GSH) or a site in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements function predictably, are expressed ubiquitously and do not cause alterations of the host genome posing a risk to the host cell.
- GSH genomic safe harbor locus
- the barcode is stably integrated into a GSH selected from: the adeno-associated virus site 1 (AAVS1), the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 coreceptor; and the Rosa26 locus or the human ortholog of the mouse Rosa26 locus.
- the genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1) locus.
- the genetic barcodes may be targeted to be stably integrated into the genome of a cell through the use of any appropriate gene-editing tools known to the person skilled in the art.
- the barcodes are targeted to a specific locus using a programmable nuclease such as a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN) or a clustered regularly interspaced short palindromic repeat (CRISPR)-Cas-associated nuclease.
- ZFN zinc-finger nuclease
- TALEN transcription activator-like effector nucleases
- CRISPR clustered regularly interspaced short palindromic repeat
- the barcodes are stably integrated into the cell genome using CRISPR/Cas9-mediated gene editing.
- a cell or population of cells according to the invention may comprise more than one genetic barcode, wherein at least one genetic barcode is unique to the cell or population of cells.
- a cell or population of cells according to the invention may comprise a combination of genetic barcodes, wherein the combination of genetic barcodes is unique to the cell or population of cells.
- the barcodes are separated by one or more nucleotides.
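Where a combination of genetic barcodes identifies a population, the number of populations that can be distinguished grows with the number of available barcodes. The following sketch assigns a unique unordered pair of barcodes to each population; the function name, the population names and the choice of pairs (rather than larger or ordered combinations) are assumptions made for illustration.

```python
from itertools import combinations


def assign_barcode_pairs(barcode_names, populations):
    """Give each population its own unordered pair of barcode names."""
    pairs = combinations(sorted(barcode_names), 2)
    assignment = {}
    for population in populations:
        try:
            assignment[population] = next(pairs)
        except StopIteration:
            raise ValueError("not enough barcode pairs for the populations") from None
    return assignment


names = ["BC01", "BC02", "BC03", "BC04"]
lines = ["line_%d" % i for i in range(1, 7)]  # hypothetical population names
for population, pair in assign_barcode_pairs(names, lines).items():
    print(population, pair)
```

Four barcodes yield six unordered pairs; by the same counting, 18 barcodes yield 153 unordered pairs.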
- the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
- the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into a targeted location of the genome of the cells of each population, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- the present invention relates to methods of measuring or determining or inferring transcriptional level or even protein level changes, e.g., massively parallel measuring or determining or inferring of RNA levels in a single cell or a cellular network in response to at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters involving sequencing DNA of a perturbed cell or cells, whereby transcriptional level and optionally protein level effects may be determined in the single cell in response to the at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters.
- the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
- the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
- embodiments of the invention may involve a method of inferring or determining or measuring genetic information, including RNA levels, in a single cell from a cellular network, e.g., massively parallel inferring or determining or measuring of RNA levels in a single cell or a cellular network in response to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
- perturbation parameter(s) comprising optionally so manipulating or perturbing the cell or the cells or each cell of a cellular network with the perturbation parameter(s) and sequencing of the perturbed cell(s), whereby RNA level(s) and optionally protein level(s) is / are determined in the cell(s) in response to the perturbation parameter(s).
- Seurat version 3 as described by Stuart et al., 2019 Cell (https://doi.org/10.1016/j.cell.2019.05.031), incorporated herein by reference, is used for the mapping and demultiplexing of reads from the sequencing libraries (e.g. HTO demultiplexing, doublet calling based on HTO reads, generating quality control metrics).
- the scds R package (Bais & Kostka 2020 Bioinformatics, 36(4): 1150-1158, https://doi.org/10.1093/bioinformatics/btz698), incorporated herein by reference, containing three different algorithms to call doublets based on sequence reads is also used.
- different samples of single cells are multiplexed to generate a multiplexed single cell sequencing library.
- the samples may be from different perturbations, different time points in an experiment, from different samples treated under different conditions in an experiment, or from different experiments (e.g., replicates).
- the sequencing library is sequenced and demultiplexed in silico.
- the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al.
- the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi: 10.1038/nprot.2014.006).
- the invention involves high-throughput single-cell RNA-seq.
- Macosko et al. 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International patent application number PCT/US2015/049178, published as WO2016/040476 on March 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International patent application number PCT/US2016/027734, published as WO2016168584A1 on October 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat.
- tagmentation is used to introduce adaptor sequences to genomic DNA in regions of accessible chromatin (e.g., between individual nucleosomes) (see, e.g., US20160208323A1; US20160060691A1; WO2017156336A1; J. D. Buenrostro et al., Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); and Cusanovich, D. A., Daza, R., Adey, A., Pliner, H., Christiansen, L., Gunderson, K. L., Steemers, F. J., Trapnell, C. & Shendure, J.
- a hyperactive Tn5 transposase loaded in vitro with adapters for high-throughput DNA sequencing can simultaneously fragment and tag a genome with sequencing adapters.
- the adapters are compatible with the methods described herein.
- a handle is attached to the adapters, such that the tagmented DNA acts as an artificial mRNA (e.g., poly A tail) and can be captured by a cell of origin barcode poly dT capture sequence.
- the sample barcode oligonucleotides are adapted for tagmentation with the adapters used in the first step of generating cell of origin barcodes.
- samples for use in droplet based single cell sequencing as described herein are multiplexed.
- Cells belonging to different cell populations (e.g. iPSC populations) each carry a unique genetic barcode incorporated into their genomes as described herein.
- the single cells from multiple populations may then be loaded into a microfluidic device.
- the labeled cells are encapsulated with reagents and “cell of origin” barcode or UMI containing beads in emulsion droplets.
- the genetic barcode incorporated into the cell's genome may then be released from the cell in the droplet (e.g., by lysis of the cell in the droplet) and processed to generate a cDNA molecule comprising the genetic barcode unique to the population of cells from which the cell was derived and also a “cell-of-origin” barcode or UMI particular to that cell.
- the sequencing data can then be demultiplexed to determine the cell of origin and the population (e.g. cell line) of origin and therefore the associated condition/perturbation.
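The demultiplexing step can be sketched in simplified form as follows. This is an illustrative stand-in only and is not the Seurat procedure referred to herein: each record is assumed to have already been reduced to a cell-of-origin barcode, a UMI and the name of the genetic barcode detected, and the thresholds `min_umis` and `min_fraction` are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex(records, min_umis=2, min_fraction=0.8):
    """Map each cell barcode to a population, 'doublet' or 'unassigned'.

    records: iterable of (cell_barcode, umi, genetic_barcode_name).
    """
    unique = set(records)  # collapse PCR duplicates of the same molecule
    counts = defaultdict(Counter)
    for cell, _umi, population in unique:
        counts[cell][population] += 1

    calls = {}
    for cell, per_population in counts.items():
        total = sum(per_population.values())
        population, top = per_population.most_common(1)[0]
        if total < min_umis:
            calls[cell] = "unassigned"  # too little evidence
        elif top / total >= min_fraction:
            calls[cell] = population    # one barcode dominates
        else:
            calls[cell] = "doublet"     # mixed barcodes in one droplet
    return calls


records = [
    ("cellA", "u1", "BC01"), ("cellA", "u1", "BC01"),
    ("cellA", "u2", "BC01"), ("cellA", "u3", "BC01"),
    ("cellB", "u1", "BC01"), ("cellB", "u2", "BC01"),
    ("cellB", "u3", "BC02"), ("cellB", "u4", "BC02"),
    ("cellC", "u1", "BC03"),
]
print(demultiplex(records))
```

Once each cell is mapped to a population, the condition or perturbation applied to that population follows from the experimental design.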
- Detection of the gene expression level can be conducted in real time in an amplification assay.
- the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules is typically proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art.
- DNA-binding dyes suitable for this application include SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and the like.
- probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan® probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art and are taught in U.S. Patent No. 5,210,015.
- Multiplexed Perturbation studies
- a method utilizing the cells (e.g. iPSCs) described herein comprises (1) imparting or introducing single-order or combinatorial perturbations to a population of cells, (2) measuring genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells and (3) assigning a perturbation(s) to the single cells.
- a perturbation may be linked to a phenotypic change and preferably changes in gene or protein expression.
- measured differences that are relevant to the perturbations are determined by applying a model accounting for co-variates to the measured differences.
- the model may include the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation.
- the measuring of phenotypic differences and assigning a perturbation to a single cell is determined by performing single cell RNA sequencing (RNA-seq).
- the single cell RNA-seq is performed by any method as described herein (e.g., Drop-seq, InDrop, 10X genomics).
- manipulating or perturbing the cell(s) involves altering the culture conditions so as to contact the cell(s) with one or more agents (e.g. another cell, secretions from another cell e.g. a co-culture, cytokine, growth factor, signaling pathway agonist or antagonist, small molecule, antibody, etc.)
- Methods and tools for genome-scale screening of perturbations in single cells using CRISPR-Cas9 have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.
- unique barcodes are used to perform Perturb-seq.
- a guide RNA is detected by RNA-seq using a transcript expressed from a vector encoding the guide RNA.
- the transcript may include a unique barcode specific to the guide RNA.
- the transcript may include the guide RNA sequence (see, e.g., Fig. 16, CROP-seq, Datlinger, et al., 2017).
- a guide RNA and guide RNA barcode is expressed from the same vector and the barcode may be detected by RNA-seq.
- a perturbation may be assigned to a single cell by detection of a guide RNA barcode in the cell.
- a cell barcode is added to the RNA in single cells, such that the RNA may be assigned to a single cell. Generating cell barcodes is described herein for single cell sequencing methods.
- a Unique Molecular Identifier (UMI) is added to each individual transcript and protein capture oligonucleotide.
- the UMI allows for determining the capture rate of measured signals, or preferably the binding events or the number of transcripts captured.
- the data is more significant if the signal observed is derived from more than one protein binding event or transcript.
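The role of the UMI in counting captured molecules can be illustrated as follows. Reads sharing the same cell barcode, feature and UMI are treated as copies of one captured molecule, and a signal is retained only when it is supported by more than one distinct UMI. The function names, the feature names and the threshold of two UMIs are assumptions for this sketch.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell_barcode, feature).

    reads: iterable of (cell_barcode, feature, umi), where feature is a
    gene or a guide RNA barcode name.
    """
    umis = defaultdict(set)
    for cell, feature, umi in reads:
        umis[(cell, feature)].add(umi)
    return {key: len(values) for key, values in umis.items()}


def supported_signals(reads, min_umis=2):
    """Keep only signals derived from at least min_umis captured molecules."""
    return {key: n for key, n in count_umis(reads).items() if n >= min_umis}


reads = [
    ("cell1", "guide_A", "AAT"), ("cell1", "guide_A", "AAT"),  # PCR duplicate
    ("cell1", "guide_A", "CGG"),
    ("cell1", "guide_B", "TTC"),
    ("cell2", "guide_B", "GCA"), ("cell2", "guide_B", "TGA"),
]
print(count_umis(reads))
print(supported_signals(reads))
```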
- Perturb-seq is performed using a guide RNA barcode expressed as a polyadenylated transcript, a cell barcode, and a UMI.
- a CRISPR system is used to create an INDEL at a target gene.
- epigenetic screening is performed by applying CRISPRa/i/x technology (see, e.g., Konermann et al. “Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex” Nature. 2014 Dec 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013).
- a CRISPR system may be used to activate gene transcription.
- a nuclease-dead RNA-guided DNA binding domain, dCas9, tethered to transcriptional repressor domains that promote epigenetic silencing (e.g., KRAB) may be used for "CRISPRi" that represses transcription.
- dCas9 as an activator (CRISPRa)
- a guide RNA is engineered to carry RNA binding motifs (e.g., MS2) that recruit effector domains fused to RNA-motif binding proteins, increasing transcription.
- CRISPR/Cas9 may be used to perturb protein-coding genes or non-protein-coding DNA.
- CRISPR/Cas9 may be used to knockout protein-coding genes by frameshifts, point mutations, inserts, or deletions.
- An extensive toolbox may be used for efficient and specific CRISPR/Cas9 mediated knockout as described herein, including a double-nicking CRISPR to efficiently modify both alleles of a target gene or multiple target loci and a smaller Cas9 protein for delivery on smaller vectors (Ran, F.A., et al., In vivo genome editing using Staphylococcus aureus Cas9. Nature. 520, 186-191 (2015)).
- a genome-wide sgRNA mouse library (~10 sgRNAs/gene) may also be used in a mouse that expresses a Cas9 protein (see, e.g., WO2014204727A1).
- perturbation is by deletion of regulatory elements.
- Non-coding elements may be targeted by using pairs of guide RNAs to delete regions of a defined size, and by tiling deletions covering sets of regions in pools.
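A tiled deletion design of the kind described above can be sketched as a set of overlapping windows, where the two coordinates of each window are the intended cut sites for one pair of guide RNAs. The coordinates, window size and step below are hypothetical, and the selection of actual guide sequences near each cut site is not modelled.

```python
def tile_deletions(region_start, region_end, deletion_size, step):
    """Return (left_cut, right_cut) windows of fixed size tiling a region."""
    if deletion_size <= 0 or step <= 0:
        raise ValueError("deletion_size and step must be positive")
    windows = []
    left = region_start
    while left + deletion_size <= region_end:
        windows.append((left, left + deletion_size))
        left += step
    return windows


# Hypothetical 1 kb region tiled with 400 bp deletions every 200 bp
for left_cut, right_cut in tile_deletions(1000, 2000, 400, 200):
    print(left_cut, right_cut)
```

Choosing a step smaller than the deletion size makes neighbouring deletions overlap, so that each position in the region is removed by more than one guide pair.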
- RNAi may be shRNAs targeting genes.
- the shRNAs may be delivered by any methods known in the art.
- the shRNAs may be delivered by a viral vector.
- the viral vector may be a lentivirus, adenovirus, or adeno-associated virus (AAV).
- whole genome screens can be used for understanding the phenotypic readout of perturbing potential target genes.
- perturbations target expressed genes as defined by a gene signature using a focused sgRNA library. Libraries may be focused on expressed genes in specific networks or pathways.
- regulatory drivers are perturbed.
- systematic perturbation of key genes that regulate mesendodermal differentiation may be performed in a high-throughput fashion.
- Gene expression profiling data can be used to define the target of interest and perform follow-up single-cell and population RNA-seq analysis.
- the present invention provides for a method of reconstructing a cellular network, comprising introducing at least 1, 2, 3, 4 or more single-order or combinatorial perturbations to a plurality of cells in a population of cells, wherein each cell in the plurality of the cells receives at least 1 perturbation; measuring comprising: detecting genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells compared to one or more cells that did not receive any perturbation, and detecting the perturbation(s) in single cells; and determining measured differences relevant to the perturbations by applying a model accounting for co-variates to the measured differences, whereby intercellular and/or intracellular networks or circuits are inferred.
- the measuring in single cells may comprise single cell sequencing.
- the single cell sequencing may comprise unique molecular identifiers (UMI), whereby the capture rate of the measured signals, such as transcript copy number or probe binding events, in a single cell is determined.
- UMI unique molecular identifiers
- the model may comprise accounting for the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation.
- the measuring may comprise detecting the transcriptome of each of the single cells.
- the perturbation(s) may comprise one or more genetic perturbation(s).
- the perturbation(s) may comprise one or more epigenetic or epigenomic perturbation(s).
- At least one perturbation may be introduced with RNAi- or a CRISPR-Cas system.
- At least one perturbation may be introduced via a chemical agent, biological agent, an intercellular spatial relationship between two or more cells, an increase or decrease of oxygen concentration, an increase or decrease of temperature, addition or subtraction of energy, electromagnetic energy, or ultrasound.
- the measuring or measured differences may comprise measuring or measured differences of DNA, RNA, protein or post translational modification; or measuring or measured differences of protein or post translational modification correlated to RNA and/or DNA level(s).
- the perturbing or perturbation(s) may comprise genetic perturbing.
- the perturbing or perturbation(s) may comprise single-order perturbations.
- the perturbing or perturbation(s) may comprise combinatorial perturbations.
- the perturbing or perturbation(s) may comprise gene knock-down, gene knock-out, gene activation, gene insertion, or regulatory element deletion.
- the perturbing or perturbation(s) may comprise genome-wide perturbation.
- the perturbing or perturbation(s) may comprise performing CRISPR-Cas-based perturbation.
- the perturbing or perturbation(s) may comprise performing pooled single or combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs.
- the perturbations may be of a selected group of targets based on similar pathways or network of targets.
- the perturbing or perturbation(s) may comprise performing pooled combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs.
- Each sgRNA may be associated with a unique perturbation barcode.
- Each sgRNA may be co-delivered with a reporter mRNA comprising the unique perturbation barcode (or sgRNA perturbation barcode).
- the perturbing or perturbation(s) may comprise subjecting the cell to an increase or decrease in temperature.
- the perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent.
- the perturbing or perturbation(s) may comprise subjecting the cell to a biological agent.
- the biological agent may be a growth factor or cytokine or antibody.
- the perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent, biological agent and/or temperature increase or decrease across a gradient.
- the cell may be in a microfluidic system.
- the cell may be in a droplet.
- the population of cells may be sequenced by using microfluidics to partition each individual cell into a droplet containing a unique barcode, thus allowing a cell barcode to be introduced.
- the perturbing or perturbation(s) may comprise transforming or transducing the cell or a population that includes and from which the cell is isolated with one or more genomic sequence-perturbation constructs that perturb a genomic sequence in the cell.
- the sequence-perturbation construct may be a viral vector, preferably a lentivirus vector.
- the perturbing or perturbation(s) may comprise multiplex transformation or transduction with a plurality of genomic sequence-perturbation constructs.
- the present invention provides a method for multiplexed cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed bulk RNA-sequencing library from said plurality of said populations or progeny thereof and sequencing said library; and deconvolving said library using the genetic barcodes.
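For the bulk method above, deconvolution can be illustrated in simplified form by estimating the share of each population from the barcode-containing reads. The sketch assumes that the barcode region has already been extracted from each read and uses exact matching only; the function name and the example reads are hypothetical.

```python
from collections import Counter


def deconvolve_bulk(barcode_reads, barcodes):
    """Return the fraction of matched reads carrying each genetic barcode.

    barcode_reads: sequences of the barcode region, one per read.
    barcodes: mapping of population or barcode name to barcode sequence.
    """
    lookup = {seq: name for name, seq in barcodes.items()}
    counts = Counter(lookup[read] for read in barcode_reads if read in lookup)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in barcodes}


barcodes = {"BC01": "GTGCCGACCAGTATC", "BC02": "ACCACCTGACGCAAA"}
reads = ["GTGCCGACCAGTATC"] * 3 + ["ACCACCTGACGCAAA"] + ["NNNNNNNNNNNNNNN"]
print(deconvolve_bulk(reads, barcodes))
```

Reads whose barcode region matches no known barcode are ignored when the fractions are computed.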
- the present invention provides a kit for multiplexed single cell analysis of cells (e.g. iPSCs) and optionally their progeny, comprising one or more cell populations, such as one or more iPSC populations, described herein.
- kit refers to any delivery system for delivering materials.
- a kit may refer to a combination of materials for handling stem cells. Such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., compounds, proteins, detection agents (such as probes or antibodies), plasmids, vectors, etc.) in the appropriate containers (such as tubes, etc.) and/or supporting materials (e.g., buffers, reagents, culture media, written instructions for performing cell differentiation, etc.) from one location to another.
- kits include one or more enclosures (e.g., boxes, or bags, and the like) containing the relevant reaction reagents (such as culture media, oligonucleotides, enzymes, inhibitors, etc.), growth factors and cytokines (e.g. VEGF, BDNF, FGF, etc.) and/or supporting materials.
- the kit comprises cells, such as iPSCs, together with cell culture reagents as described herein, including within the examples below, for creation of barcoded cells.
- the kit further comprises one or more reagents for differentiating the cells to a selected phenotype and optionally reagents for the generation of a multiplexed single cell sequencing library.
- kits of the present invention may further comprise one or more of the following: a culture medium, at least one cell culture medium supplement, an agent for inhibiting or increasing expression of one or more gene products, and at least one agent for detecting expression of a marker of differentiation.
- Barcoding design: 10,000 candidate 15 bp barcodes were generated using a 25% probability for each of the four nucleotides A, C, T and G at every position. Barcodes containing runs of 4 or more identical nucleotides, or starting or ending with a stop codon, were excluded. All 18 selected barcodes were tested to ensure a minimum Hamming distance of 5 nt (Table 1).
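The design procedure described above can be illustrated with a short Python sketch. It is not the inventors' script: the greedy selection order, the fixed random seed, the reading of "runs" as homopolymer runs, and all function names are assumptions made for illustration. The stop codons tested are the standard DNA-sense TAA, TAG and TGA.

```python
import random
from itertools import combinations

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_long_run(seq, max_run=3):
    """Return True if seq contains more than max_run identical bases in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def passes_filters(seq):
    """Exclude homopolymer runs of 4+ and stop codons at either end."""
    return not (
        has_long_run(seq)
        or seq[:3] in STOP_CODONS
        or seq[-3:] in STOP_CODONS
    )


def design_barcodes(n_candidates=10_000, length=15, n_select=18, min_dist=5, seed=0):
    """Draw random candidates (25% per base) and keep a mutually distant subset."""
    rng = random.Random(seed)
    candidates = [
        "".join(rng.choice(BASES) for _ in range(length))
        for _ in range(n_candidates)
    ]
    selected = []
    for seq in candidates:
        if not passes_filters(seq):
            continue
        # Greedy step (assumption): keep a candidate only if it is at least
        # min_dist away from every barcode already kept.
        if all(hamming(seq, kept) >= min_dist for kept in selected):
            selected.append(seq)
        if len(selected) == n_select:
            break
    return selected


barcodes = design_barcodes()
```

A minimum Hamming distance of 5 means that up to two sequencing errors in a barcode read can still be corrected unambiguously to the nearest whitelist entry.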
- AAVS1-CAG-hrGFP (Addgene# 52344) was used as the plasmid backbone. It contains hrGFP under the control of the CAG promoter, and AAVS1 homology arms to allow integration of the linearized plasmid into the genome when paired with the CRISPR system using well-described guide RNAs.
- the barcode cassette was introduced between the EcoRV and MluI sites of the plasmid, between hrGFP and the poly-adenylation site, to enable expression as part of the hrGFP transcriptional unit.
- pAAVS1-CAG-hrGFP (Addgene# 52344) was digested by incubation with EcoRV-HF (New England BioLabs; NEB) followed by addition of MluI-HF (NEB) and further incubation. Successful digestion was confirmed by running a small amount on an agarose gel, and the remainder was purified using the QIAQuick PCR Purification Kit (QIAGEN).
- Top and bottom strands of barcode oligos were annealed by mixing 1 µL each of 100 µM oligo in 1x T4 DNA ligase buffer in a volume of 10 µL, heating to 94°C for 2 min, then allowing to cool to 25°C at a rate of 1°C/s. Annealed oligos were further diluted 1 in 10 with nuclease-free water.
- WTC wt iPSCs were maintained as previously described (ref Friedman et al). Briefly, cells were cultured on Vitronectin XF (Stem Cell Technologies, Cat# 07180) coated plates in mTeSR media with supplement (Stem Cell Technologies, Cat# 05850) at 37°C with 5% CO2. For gene editing, cells were grown to about 50-80% confluency and dissociated using 1X TrypLE, and 100-200K cells were used for each 10 µL reaction of the Neon Transfection System.
- the transfection mixture included 0.5 µg of Barcode plasmid DNA, 20 pmol AAVS1-targeting sgRNA (protospacer sequence: atcctgtccctagtggcccc (SEQ ID NO: 19), chemically synthesized by Agilent Technologies) and 20 pmol spCas9 protein (IDT).
- Genomic DNA from all cell lines was extracted using QuickExtract DNA Extraction Solution (Epicentre). Correct targeting of the donor construct at the AAVS1 locus was confirmed by junction PCR using the following primer pair: AAVS1 F1: 5'-ggttcggcttctggcgtgacc-3' (SEQ ID NO: 20), AAVS1 R1: 5'-tcaagagtcacccagagacagtgac-3' (SEQ ID NO: 21). The PCR product was then sent for Sanger sequencing using a universal sequencing primer to validate correct barcode insertion in each cell line.
- Flow cytometry was performed on live cells for endogenous GFP expression and after labelling for the pluripotency marker SSEA3 (BectonDickinson, Cat# 562706) and corresponding isotype control.
- Cells were analyzed using a BD FACSCANTO II (BectonDickinson, San Jose, CA) with FACSDiva software (BD Biosciences). Data analysis was performed using FlowJo (Tree Star, Ashland, Oregon).
- RNA-seq libraries were generated using the 10X Genomics Chromium 3' Gene Expression (v2) protocol, with minor modifications to the workflow, outlined by Stoeckius and Smibert (https://citeseq.files.wordpress.com/2019/02/cell_hashing_protocol_190213.pdf), to capture the fraction of droplets containing the HTO-derived cDNA (~180 bp).
- HTO additive primers and Illumina TruSeq DNA D7xx_s primer (containing i7 index) were ordered from IDT, and used according to the cell hashing protocol. Hashtag libraries were quantified using the Agilent Bioanalyzer.
- Sequencing was performed using the Illumina Nextseq instrument using the Nextseq High Output 150-cycle kit and the gene expression and HTO libraries were pooled on a single flowcell using a ratio of 90:10.
- the 10x Genomics sample index used was SI-GA-D11, and the flowcell ID containing the raw data was 190114_NS500239_0333_AHHTLFBGX9.
- the standard 10X Genomics v2 3' gene expression library was processed using the 10X Genomics cellranger pipeline to derive gene expression count matrices.
- HTO-tagged cells were identified and extracted from the fastq files using CITE-seq-Count with default parameters (https://hoohm.github.io/CITE-seq-Count/), to generate a count matrix of cells and their respective HTO expression values. This allowed the pooled hashtagged cells to be uniquely identified and deconvoluted.
- barcoded cells were identified by the expression of a barcode from the whitelist; the whitelisted barcodes were included in the transcriptome reference with unique identifiers (e.g. 'bc01').
- the inventors performed single cell RNA sequencing on a pooled sample from all 18 barcoded iPS cell lines and found similar expression levels of barcode transcripts in all cell lines. Labelling of four different cell lines with cell hashing antibodies yielded a strong correlation of these external barcodes with their internal counterparts, validating the accurate detection of barcoding transcripts. Furthermore, we found comparable expression levels of pluripotency markers in all cell lines, and the dimensionality reduction visualisation displays an even distribution of cell lines without any effect on clustering (Figure 2).
- Example 2 Multiplexing of signalling perturbations during mesendoderm differentiation.
- 18 barcoded iPS cell lines ("WTC BC01-BC18") were generated as outlined in Example 1.
- BC01-BC18 were cultured in parallel and multiple temporally staggered set-ups of mesendoderm directed differentiation using a monolayer platform were performed as follows. Differentiation was induced on day 0 by changing the culture media to RPMI (ThermoFisher, Cat# 11875119) containing 3 µM CHIR99021 (Stem Cell Technologies, Cat# 72054), 500 mg/mL BSA (Sigma Aldrich, Cat# A9418), and 213 mg/mL ascorbic acid (Sigma Aldrich, Cat# A8960).
- the media was replaced with RPMI containing 500 mg/mL BSA, 213 mg/mL ascorbic acid and one of the signalling molecules listed in Table 2 below.
- the media was exchanged for RPMI containing 500 mg/mL BSA, and 213 mg/mL ascorbic acid without supplemental cytokines.
- the cultures were fed with RPMI containing 1x B27 supplement plus insulin (Life Technologies Australia, Cat# 17504001).
- 5 µL of full-length amplified cDNA was used to generate a barcoding library for each sample pool.
- a first round of PCR was performed to specifically amplify cDNA regions containing the barcode cassette, and append partial P5 and P7 sequencing adaptors.
- Each reaction contained 1x KAPA HiFi HotStart ReadyMix and 300 nM each of the barcode_amp_F and barcode_amp_R primers in a final volume of 50 µL.
- a 2-step PCR protocol was performed with annealing/extension at 71°C for 30 s, for six cycles. After a 1.2X SPRI clean-up to remove primers, a second round of PCR was performed with the entire volume of purified product from PCR1.
- Each reaction contained 1x KAPA HiFi HotStart ReadyMix, 500 nM SI-PCR primer (identical sequence to the primer in the Chromium kit), and 5 µL of a unique i7 indexed R primer from the Chromium i7 Multiplex Kit (10x Genomics), in a total volume of 50 µL.
- PCR was performed as for the SI-PCR protocol in the gene expression library construction workflow. Eight indexing PCR cycles were performed, for a total of 14 cycles over two rounds of PCR.
- Final barcoding libraries were purified using 1X SPRI beads, and fragment size and library concentration were verified along with gene expression libraries using a BioAnalyzer DNA High Sensitivity Kit (Agilent). Final gene expression libraries were 62-82 nM, with average size 457-494 bp. Barcoding libraries were 30-38 nM, with average size 358-362 bp.
- a single pool was prepared from the three gene expression and three barcoding libraries for sequencing.
- the samples were pooled equimolar within each library type, and combined so that the gene expression libraries together made up 90% of the pool, and the barcoding libraries 10%.
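The pooling arithmetic described above can be sketched in Python as follows. The library names and individual concentrations are hypothetical placeholders chosen within the reported ranges (62-82 nM and 30-38 nM), and the total amount of 100 fmol is an arbitrary assumption; only the equimolar-within-type rule and the 90:10 split come from the text. The calculation relies on the identity 1 nM = 1 fmol/µL.

```python
def pooling_volumes(conc_nM, library_type, type_fraction, total_fmol=100.0):
    """Return the volume (µL) of each library needed for the pool.

    conc_nM:       library name -> measured concentration in nM (= fmol/µL)
    library_type:  library name -> type label, e.g. "GEX" or "BC"
    type_fraction: type label -> molar fraction of the final pool
    total_fmol:    total molar amount of the pool (arbitrary scale)
    """
    # Count the libraries of each type so that each type's share of the
    # pool can be split equally (equimolar within type).
    n_per_type = {}
    for lib_type in library_type.values():
        n_per_type[lib_type] = n_per_type.get(lib_type, 0) + 1

    volumes_uL = {}
    for name, conc in conc_nM.items():
        lib_type = library_type[name]
        fmol = total_fmol * type_fraction[lib_type] / n_per_type[lib_type]
        volumes_uL[name] = fmol / conc  # fmol divided by fmol/µL gives µL
    return volumes_uL


# Hypothetical concentrations, for illustration only.
conc_nM = {
    "GEX_1": 62.0, "GEX_2": 70.0, "GEX_3": 82.0,
    "BC_1": 30.0, "BC_2": 34.0, "BC_3": 38.0,
}
library_type = {name: name.split("_")[0] for name in conc_nM}
volumes_uL = pooling_volumes(conc_nM, library_type, {"GEX": 0.9, "BC": 0.1})
```

With these inputs each gene expression library contributes 30 fmol and each barcoding library about 3.3 fmol, so the more concentrated libraries are added in proportionally smaller volumes.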
- Sequencing was performed using the Illumina NovaSeq 6000 instrument using an S4 Reagent Kit v1.5 (200 cycles).
- Gene expression count matrices were derived using the standard 10X Cell Ranger pipeline.
- sample barcodes were assigned to the remaining cells based on the barcode with the highest expression in each cell.
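A minimal sketch of this highest-expression assignment rule is given below. The input layout (a dictionary of per-cell barcode counts), the function name and the example counts are assumptions for illustration; the source does not specify how ties or cells with no barcode reads were handled, so here such cells are left unassigned or resolved to the first barcode encountered.

```python
def assign_sample_barcodes(counts):
    """Assign each cell to the barcode with the highest count.

    counts: cell id -> {barcode id -> read or UMI count}
    Returns cell id -> barcode id, or None when no barcode was detected.
    """
    assignments = {}
    for cell, bc_counts in counts.items():
        if not bc_counts or max(bc_counts.values()) == 0:
            assignments[cell] = None  # no barcode signal in this cell
            continue
        # max() returns the first barcode reaching the maximum, so ties are
        # resolved by insertion order (an arbitrary choice in this sketch).
        assignments[cell] = max(bc_counts, key=bc_counts.get)
    return assignments


# Hypothetical counts for three cells.
example_counts = {
    "cell_A": {"bc01": 42, "bc02": 1, "bc03": 0},
    "cell_B": {"bc01": 0, "bc02": 3, "bc03": 57},
    "cell_C": {"bc01": 0, "bc02": 0, "bc03": 0},
}
assignments = assign_sample_barcodes(example_counts)
```

Because this rule always returns the top barcode, it cannot by itself flag multiplets; in practice it would typically be combined with a separate filtering step.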
- Downstream analysis of single-cell data: Normalisation, UMAP dimensionality reduction, and clustering of the data were done following the standard Seurat pipeline. The clustering resolution used was 0.2, and we assigned cell type labels by interrogating marker gene expression in each cluster. To visualise marker gene expression in the UMAP plots, we used the R package Nebulosa (v0.99.92), which represents gene expression using kernel density estimation to account for overplotting and noise from expression drop-out.
- RNA velocity estimation: we used velocyto (v0.17.15) to count spliced and unspliced transcripts from the 10x cellranger output using the 'run10x' command. The resulting count matrices were then input into the scVelo (v0.2.1) pipeline for pre-processing, stochastic RNA velocity estimation and embedding onto the UMAP coordinates generated from the Seurat pipeline.
- the inventors adapted a well-established monolayer-based cardiac differentiation protocol, which firstly guides cells towards mesendoderm lineages by small molecule activation of WNT signalling (Figure 3a). Based on this protocol, we either captured cells every 24 hours from day 2 to day 9 to generate a high resolution time course of cell states during differentiation or perturbed known developmental signalling pathways between day 3 and day 5 and sampled cells on days 2, 5 and 9 ( Figure 3b). For single cell RNA sequencing, samples were multiplexed using two different approaches.
- RNA velocity was applied to our time course dataset (Figure 4f). Firstly, this analysis verified the initial separation on day 2 of clusters 6 and 7 into a mesendoderm population and definitive endoderm. It also showed a clear direction of transcription kinetics in cluster 8 that overlaps with the time points of cell capture. Most importantly, it highlighted the complexity of transient progenitor cell states in both endodermal and mesodermal lineages.
- Barcoded cell lines have the potential for utility in diverse multiplexing endpoints, including those not developed, in which analysis of unique barcodes embedded in the genomic DNA or RNA of input cell types provides a way of multiplexing endpoints for scalable analysis of iPSCs or differentiated cell types.
- Example 3 Comparative Example Cell Hashing libraries vs Internal Barcode libraries
- RNA-seq libraries were created as described in Example 1 (i.e. using barcoded iPSC cell lines) resulting in three sets of data comprising three reactions each:
- Dataset 1 Cell hashing library: seu_bc0Xav, seu_bc5Xav, seu_bcDox
- Dataset 2 Internal barcoding library with separately amplified barcoding reads: se, seu2, seu3;
- Dataset 3 Internal barcoding where barcoding reads were obtained solely from the transcriptome library: lib1, lib2, lib3
- the internal barcoding libraries (datasets 2 and 3) have the same transcriptomes, but represent two different methods of detecting the barcoding reads.
- the first dataset has very consistent cell numbers (19997, 19995, 19991), whereas the second and third datasets were more overloaded and initially had more cells (24721, 24828, 26583).
- datasets 1 and 2 have two libraries generated: 1 for the transcriptome and 1 for the barcodes.
- For Dataset 3, the internal barcodes were read directly from the transcriptome library (no separate library was generated for the barcodes).
- the inventors have surprisingly found that the internal barcodes can be picked up with equal efficiency to sequencing the barcodes separately (Figure 7). Indeed, there are slightly more singlet cells in the internal barcoding dataset.
Abstract
The present invention relates to the use of a population of cells into which an engineered barcode unique to the cells has been stably integrated, and to the use of such cells for multiplexed analysis in combination with one or more other populations of cells, into each of which an engineered barcode unique to that population has been stably integrated.
Description
METHODS AND COMPOSITIONS FOR MULTIPLEXING CELL ANALYSIS
Field
[0001] The present invention relates to the use of a population of cells into which an engineered barcode unique to the cells has been stably integrated, and to the use of such cells for multiplexed analysis in combination with one or more other populations of cells, into each of which an engineered barcode unique to that population has been stably integrated.
Background
[0002] Recent development of single-cell RNA sequencing (Tang et al., 2009, Picelli et al., 2013, Hashimshony et al., 2012, Macosko et al., 2015, Klein et al., 2015, Zheng et al., 2017) has enabled transcriptomic profiling at an unprecedented resolution and scale. To overcome the high cost of processing scRNA-seq samples and technical variation between sample processing runs, new strategies in sample multiplexing have emerged to efficiently design experiments to maximise the volume and quality of data derived by single cell studies. These capabilities are ideally suited for scaling stem cell differentiation perturbation assays to reveal the underpinning biology of cell differentiation decisions. They provide versatile approaches for labelling samples using internal or external barcoding coupled with computational strategies for demultiplexing data for downstream analysis.
[0003] One approach uses multiplexing of genetically distinct cells so that their known (Demuxlet, MIX-seq), or inferred (scSplit, Vivero) genotypes enable demultiplexing by reference to intrinsic genetic variation across input samples. For experiments requiring isogenic samples, multiplexing involves associating a sample-specific oligonucleotide barcode to the cells by attaching a barcode to the cell membrane or a membrane protein (Cell hashing, ClickTags, MULTI-Seq), introducing barcodes into the cells (Transient barcoding, barRNA-seq, scifi-seq, sci-plex), or into their genomes (CellTag).
[0004] Barcoding-based multiplexing requires barcode sequencing alongside the transcriptome with expressed barcodes for each cell used to identify its sample of origin. Downstream computational approaches then distinguish true positive barcode expression signals from background noise arising from low quality cell barcodes and ambient transcripts present during cell capture. Current strategies estimate the background count distribution and determine
whether the expression of a given barcode in each cell is statistically different from the background.
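The general idea of background-based barcode calling can be sketched as follows. This is a deliberately simplified illustration and not any of the published demultiplexing methods: a robust threshold (median plus a multiple of the median absolute deviation) stands in for a formal statistical test, the background for each barcode is estimated from cells in which a different barcode is the strongest, and the function names, the multiplier `k`, the one-count floor on the deviation and the example counts are all assumptions.

```python
from statistics import median


def background_threshold(values, k=5.0):
    """Robust upper bound on background counts: median + k * MAD.

    The MAD is floored at 1 count so that an all-equal background does
    not collapse the threshold onto the median (an assumption).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med + k * max(mad, 1.0)


def call_cells(counts, k=5.0):
    """Classify each cell as negative, singlet or multiplet.

    counts: cell id -> {barcode id -> count}
    Returns cell id -> (label, list of barcodes above background).
    """
    barcodes = sorted({bc for c in counts.values() for bc in c})
    # Strongest barcode in each cell (ties go to the first in sorted order).
    top = {
        cell: max(barcodes, key=lambda bc: c.get(bc, 0))
        for cell, c in counts.items()
    }
    thresholds = {}
    for bc in barcodes:
        # Background for bc: its counts in cells dominated by another barcode.
        background = [c.get(bc, 0) for cell, c in counts.items() if top[cell] != bc]
        thresholds[bc] = background_threshold(background, k) if background else 0.0

    calls = {}
    for cell, c in counts.items():
        positive = [bc for bc in barcodes if c.get(bc, 0) > thresholds[bc]]
        if not positive:
            calls[cell] = ("negative", positive)
        elif len(positive) == 1:
            calls[cell] = ("singlet", positive)
        else:
            calls[cell] = ("multiplet", positive)
    return calls


# Hypothetical counts: four clean cells, one multiplet, one empty droplet.
example = {
    "c1": {"bc01": 100, "bc02": 1},
    "c2": {"bc01": 90, "bc02": 0},
    "c3": {"bc01": 2, "bc02": 80},
    "c4": {"bc01": 1, "bc02": 120},
    "c5": {"bc01": 95, "bc02": 110},
    "c6": {"bc01": 1, "bc02": 1},
}
calls = call_cells(example)
```

Using the median and MAD rather than the mean and standard deviation keeps a single multiplet in the background set (such as `c5` here) from inflating the threshold.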
[0005] All sample multiplexing methods provide a way to identify multiplet cell barcodes beyond the transcriptome-based metrics such as library size and marker gene co-expression. However, only the combinatorial barcoding methods (sci-fi-seq, sci-plex) can rescue multiplets because sample labelling is done on a transcript level. As such, these methods are uniquely permissive to overloading single-cell capture machinery to increase the number of cells processed in one scRNA-seq experiment, and are thus also more robust to the diminishing returns with more experimental samples.
[0006] External barcoding strategies are the most common but involve an additional processing stage to administer cell barcodes after harvesting. This stage exposes the cells to stressors that could impact their transcriptomic readout and often requires expensive single-use reagents. To avoid these disadvantages, the cells would need to have their sample barcode prior to the experiment, such as genome-embedded barcodes (Demuxlet, CellTag). These ‘internal’ barcoding strategies also allow for more complex experimental designs as cells of different samples can be co-cultured in a dish or organoid, whilst remaining identifiable after pooling and sequencing. However, approaches such as Demuxlet involve the use of cells with different genetic backgrounds to demultiplex which inherently limits their utility to population statistical genetics studies where genetic heterogeneity is desirable. CellTag is an alternative approach for barcoding cells. However, in such methods an unknown copy number of barcodes are integrated into the genome randomly which can alter or silence gene expression based on integration location and the lack of reliable expression levels of barcodes makes detection more likely to result in false negative barcode detection.
[0007] The field of developmental biology has been quick to adopt, and even drive, multiplexing approaches for scalable analysis of cell lineage decisions in vivo. As such, atlases of developing organs from humans and mice have made a major impact. However, the use of barcoded single cell studies to evaluate stem cell differentiation has not kept pace. There is currently a need for improved sample barcoding and multiplexed experimental design to enable significant data generation describing the molecular basis of cell differentiation and to drive translation of stem cell biology.
Summary of Invention
[0008] To enable efficient multiplexing of single cell data from iPSCs, we generated 18 isogenic iPSCs with genetically encoded barcodes. This tool overcomes numerous limitations of published or commercially available barcoding methods. In particular, it does not require expensive single-use reagents for barcoding cells, nor the extended labelling protocols that can compromise the quality of the samples submitted through the cell capture pipeline. While externally applied labelling is limited by the number of unique features (e.g. antigens) that label all cell types, internal barcoding is limited only by the number of possible 15 base pair barcodes, which is 4^15, or approximately 1.07 billion, unique barcode options. While many barcoding approaches must be applied at the cell capture stage, internal barcoding provides a unique platform for mixing cells with different barcodes to study cell-cell heterogeneity and interactions, including in organoid models. Lastly, engineering barcodes into iPSCs provides a unique cell type in which to multiplex analysis of the diverse cell types that can be derived by directed differentiation from organ systems across the embryonic and extra-embryonic lineages.
[0009] In one aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
[00010] In one aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations;
manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using said genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
[00011] In one aspect the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is integrated into the genome of the cells of each population at a targeted location.
[00012] The disclosed embodiments also provide a computer program product including a non-transitory computer readable medium on which is provided program instructions for performing the recited operations and other computational operations of the methods described herein.
[00013] Some embodiments provide a system for multiplexed single cell analysis in a sample using the isogenic barcoded iPSC populations described herein. The system includes a sequencer for receiving nucleic acids from the test sample and providing nucleic acid sequence information from the sample; a processor; and one or more computer-readable storage media having stored thereon instructions for execution on the processor to map a read of sequence information from a pooled sample (e.g. a sequence library) to a single cell, de-multiplex the sequence information using the genetic barcodes, and map said single cell to an originating cell population or progeny thereof in the test sample.
[00014] Numbered statements of the invention are as follows:
1. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library;
mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
2. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
3. The method of statement 1 or 2 wherein the step of generating a multiplexed single cell RNA-seq library comprises: creating one or more gene expression libraries; creating one or more separate barcode libraries via creation of purified cDNA and amplification of regions of said cDNA comprising said genetic barcode; and pooling said gene expression library and said barcode library.
4. The method of statement 3, wherein more than one gene expression library and more than one barcode library are created and pooled.
5. The method of statement 3 or 4, wherein the one or more gene expression libraries comprise approximately 90% of the pool and the one or more barcode libraries comprise approximately 10% of the pool.
6. The method of any one of the preceding statements, wherein the step of generating a multiplexed single cell RNA-seq library comprises creating a gene expression library via creation of purified cDNA and amplification of said cDNA but does not include the generation of a separate barcode library.
7. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations of cells or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
8. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
9. The method of any one of the preceding statements, wherein providing a plurality of isogenic populations of cells or iPSCs comprises incorporating said genetic barcode into said targeted location via CRISPR/Cas9-mediated integration.
10. The method of any one of the preceding statements, wherein said genetic barcode is incorporated into a genomic safe harbor locus.
11. The method of statement 10 wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).
12. The method of any one of the preceding statements wherein said genetic barcode is fluorescently labelled.
13. The method of any one of the preceding statements wherein said genetic barcode is from 5 - 20 bp.
14. The method of statement 13 wherein said genetic barcode is 15 bp.
15. The method of statement 13, wherein said genetic barcode is selected from the group consisting of:
GTGCCGACCAGTATC (SEQ ID NO: 1);
ACCACCTGACGCAAA (SEQ ID NO: 2);
ACGGCCCTATTTAAG (SEQ ID NO: 3);
AGCCCTGAGTCAGTA (SEQ ID NO: 4);
CAAATTCAAGGCGAT (SEQ ID NO: 5);
AATCTTGTATAAGTA (SEQ ID NO: 6);
CGTCACATTTGAGTC (SEQ ID NO: 7);
GGACCTTCTTACGAC (SEQ ID NO: 8);
TACCAATTGTACGCT (SEQ ID NO: 9);
CGCTAATGTCCGTTT (SEQ ID NO: 10);
ACCCTACGGTGGTTC (SEQ ID NO: 11);
TGTCCAAGCTGCAAT (SEQ ID NO: 12);
GTGTATTTAAAGCCG (SEQ ID NO: 13);
ACACCCGTATGTCAC (SEQ ID NO: 14);
TCTTTCGATGGCGGT (SEQ ID NO: 15);
GAGCACCCGCGTATT (SEQ ID NO: 16);
TTATTATGTTCTAGC (SEQ ID NO: 17); and
AATCTCTGAAACGAA (SEQ ID NO: 18).
16. The method of any one of the preceding statements, wherein prior to generating said multiplexed RNA-seq library, one or more of the populations of cells and/or progeny thereof are mixed together with one or more other populations of cells and/or progeny thereof.
17. The method of any one of the preceding statements, wherein manipulating one or more of the populations of cells or iPSCs or progeny thereof comprises contacting the cells with an agent of interest which results in a biologically measurable perturbation to a cell.
18. The method of any one of the preceding statements, wherein manipulating one or more of the populations of cells or progeny thereof comprises altering the culture conditions of, or genetically perturbing the cells of the one or more populations or progeny thereof.
19. The method of statement 18, wherein altering the culture conditions comprises contacting the cells or progeny thereof with an agent of interest, contacting the cells or progeny thereof with another cell, co-culturing the cells or progeny thereof with another cell, or co-culturing the cells and/or progeny thereof in an organoid.
20. The method of statement 17, 18 or 19, wherein the agent of interest is a small molecule, a polypeptide, an antibody, a nucleic acid molecule, an RNAi, a vector comprising a nucleic acid molecule, an antisense oligonucleotide, or a gene editing system (e.g. CRISPR/Cas9).
21. The method of any one of statements 17 to 20 when used in a high-throughput drug screening assay.
22. The method of any one of the preceding statements wherein said progeny are differentiated progeny.
23. A plurality of isogenic populations of cells, or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
24. The plurality of isogenic populations of cells, or progeny thereof, of statement 23, wherein said genetic barcode is incorporated into a genomic safe harbor locus.
25. The plurality of isogenic populations of cells, or progeny thereof, of statement 24, wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).
26. The plurality of isogenic populations of cells, or progeny thereof, of any one of statements 23 - 25, wherein said genetic barcode is fluorescently labelled.
27. The plurality of isogenic populations of cells, or progeny thereof, of any one of statements 20 - 22, wherein said genetic barcode is from 10 - 20 bp.
28. The plurality of isogenic populations of cells, or progeny thereof, of statement 24, wherein said genetic barcode is 15 bp.
29. The plurality of isogenic populations of cells, or progeny thereof, of statement 25, wherein said genetic barcode is selected from the group consisting of:
GTGCCGACCAGTATC (SEQ ID NO: 1);
ACCACCTGACGCAAA (SEQ ID NO: 2);
ACGGCCCTATTTAAG (SEQ ID NO: 3);
AGCCCTGAGTCAGTA (SEQ ID NO: 4);
CAAATTCAAGGCGAT (SEQ ID NO: 5);
AATCTTGTATAAGTA (SEQ ID NO: 6);
CGTCACATTTGAGTC (SEQ ID NO: 7);
GGACCTTCTTACGAC (SEQ ID NO: 8);
TACCAATTGTACGCT (SEQ ID NO: 9);
CGCTAATGTCCGTTT (SEQ ID NO: 10);
ACCCTACGGTGGTTC (SEQ ID NO: 11);
TGTCCAAGCTGCAAT (SEQ ID NO: 12);
GTGTATTTAAAGCCG (SEQ ID NO: 13);
ACACCCGTATGTCAC (SEQ ID NO: 14);
TCTTTCGATGGCGGT (SEQ ID NO: 15);
GAGCACCCGCGTATT (SEQ ID NO: 16);
TTATTATGTTCTAGC (SEQ ID NO: 17); and
AATCTCTGAAACGAA (SEQ ID NO: 18).
30. The plurality of isogenic populations of cells of any one of statements 23 - 29, wherein the cells are iPSCs or progeny thereof.
31. The progeny of cells of any one of statements 23 - 30, wherein the progeny are differentiated progeny.
[00015] Any example or embodiment herein shall be taken to apply mutatis mutandis to any other example or embodiment unless specifically stated otherwise.
[00016] The present disclosure is not to be limited in scope by the specific examples described herein, which are intended for the purpose of exemplification only. Functionally-equivalent methods and systems are clearly within the scope of the disclosure, as described herein.
[00017] Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.
[00018] The disclosure is hereinafter described by way of the following non-limiting Examples and with reference to the accompanying drawings. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts described herein are applicable to genomes from other animals. These and other objects and features of the present disclosure will become more fully apparent from the following description and appended claims, or may be learned by the practice of the disclosure as set forth hereinafter.
Brief Description of Drawings
[00019] Figure 1 shows CRISPR editing and quality control of engineered barcodes in isogenic iPSCs. a. Plasmid used for designing barcodes into the AAVS1-CAG-GFP targeting cassette. b-d. Example QC steps for FACS analysis of GFP (left) and SSEA4 (right) (b), image analysis for morphology (c) and G-band karyotyping (d) used to ensure quality of barcode engineered iPSCs. e-g. Single cell RNA-seq of barcoded iPSCs in the pluripotent state demonstrates that all barcoded cell lines show similar transcriptional profiles based on dimensionality reduction using UMAP visualisation (e), and for each barcoded line, analysis of reads per cell (f) and expression of the pluripotency markers SOX2 and OCT4 (g). h. External hashing antibodies were used on four barcoded cell lines as secondary validation for accuracy of barcode calling from single cell RNA-seq data, showing high fidelity and confidence of computational assignment of external and internal barcodes.
[00020] Figure 2 shows quality control analysis of barcoded iPSCs. a-c. Each cell line was analysed by G-band karyotyping (a), FACS analysis for the pluripotency marker SSEA4 (b) and purity of the GFP expression transcribed from the barcode cassette engineered into the AAVS1 locus (c).
[00021] Figure 3 shows the experimental design for multiplexed single cell analysis of mesendoderm differentiation. a. Timeline of the general differentiation protocol from hiPSCs to committed mesendoderm cell types. hiPSC, human induced pluripotent stem cell; GLS, germ layer specification; PC, progenitor cell; cCT, committed cell types. b. Experimental approaches comprise a high resolution time course capturing cells every 24 hours between day 2 and day 9 of differentiation (left) as well as a perturbation strategy of Wnt and BMP signalling pathways during the progenitor cell stage between day 3 and day 5 (right), capturing cells prior to perturbation (day 2), immediately after (day 5) and at the committed cell stage (day 9). c. Different barcoding methods have been used for multiplexing sc-RNA-sequencing experiments. Samples from the time course were labelled with commercially available hashtag antibodies (TotalSeq™-A). For the signalling perturbations, 18 barcoded iPS cell lines were generated using CRISPR/Cas9 and each experimental condition was carried out using two of these cell lines as biological duplicates. d. sc-RNA-sequencing for both experiments was performed using the Chromium 10X platform and hashtag or expressed cell barcodes were used for demultiplexing. e. Summary of important features for multiplexing.
[00022] Figure 4 shows a high resolution time course of mesendoderm differentiation. a. Uniform manifold approximation and projection (UMAP) plot showing all cells of the time course (13,682 cells). Cells are coloured by their cluster annotation and numbered according to the legend in figure 4C. b. UMAP plot showing cells coloured by time point. c. Fraction of clusters per time point, displayed as absolute numbers (top) and proportions (bottom). d. Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. RNA velocity results coloured by cluster identity and time point.
[00023] Figure 5 shows signalling pathway perturbations during mesendoderm differentiation. a. Uniform manifold approximation and projection (UMAP) plot showing all cells in the dataset (48,526 cells). Cells are coloured by their cluster annotation and numbered according to the legend in figure 3C. b. UMAP plot showing cells coloured by time point, sequencing library and condition. c. Fraction of clusters per time point, displayed as proportions. d. Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. Normalized stacked bar plots displaying contributions of cells from different conditions to each cluster.
[00024] Figure 6 shows mitochondrial genes measured using different barcoding strategies. Samples shown are barcoded by cell hashing (seu_bc0Xav, seu_bc5Xav, seu_bcDox), engineered barcoding with a separately amplified barcoding library (seu1, seu2, seu3), or engineered barcoding with barcoding reads taken just from the transcriptome library (lib1, lib2, lib3). Data show that mitochondrial reads, which are a measure of cell stress, are significantly higher in cells that use cell hashing (P < 0.01, two sample Welch's t-test).
[00025] Figure 7 shows barcode classification of singlet (individual cells), negative (no barcode detected), and doublet (multiple barcodes detected) measured using different barcoding strategies. Samples shown are barcoded by cell hashing (seu_bc0Xav, seu_bc5Xav, seu_bcDox), engineered barcoding with a separately amplified barcoding library (seu1, seu2, seu3), or engineered barcoding with barcoding reads taken just from the transcriptome library (lib1, lib2, lib3). Data show no significant difference in singlet detection efficiency between barcoding or library sequencing methods based on a two sample Welch’s t-test.
[00026] Figure 8 shows 0XAV QC data. a-d. Thresholds used for cell filtering based on transcriptome metrics: (a) library size, (b) number of reads mapped to genes, (c) percentage of reads mapped to mitochondrial genes, and (d) percentage of reads mapped to ribosomal genes. e. UMAP plots before (left) and after (right) cell filtering based on transcriptome metrics. Points in the pre-filtering plot are coloured by the number of algorithms that call each cell a doublet. Doublets labelled in the post-filtering plot are those that were called doublets by 3 or more algorithms. f. Distribution of HTO calls for each barcode (left, A2052-A2051) and summarised (right) as determined from the Seurat algorithm, post transcriptome-based filtering.
Description of Embodiments
[00027] Unless otherwise indicated, the practice of the method and system disclosed herein involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., molecular biology, cell culture, stem cell differentiation, cell therapy, genetic modification, disease modelling, biochemistry, physiology, and clinical studies).
[00028] Unless otherwise indicated, the molecular and statistical techniques utilized in the present disclosure are standard procedures, well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as, J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984), J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbour Laboratory Press (1989), T.A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991), D.M. Glover and B.D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996), and F.M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates until present), Ed Harlow and David Lane (editors) Antibodies: A Laboratory Manual, Cold Spring Harbour Laboratory, (1988), J.E. Coligan et al. (editors) Current Protocols in Immunology, John Wiley & Sons (including all updates until present), Robert Lanza (editor) Handbook of Stem Cells, Volume 1, Embryonic Stem Cells (Elsevier).
[00029] As used in this specification and the appended claims, terms in the singular and the singular forms "a," "an" and "the," for example, optionally include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a kidney organoid" optionally includes one or more kidney organoids.
[00030] As used herein, the term “about”, unless stated to the contrary, refers to +/- 10%, more preferably +/- 5%, more preferably +/- 1%, of the designated value.
[00031] The term “and/or”, e.g., “X and/or Y” shall be understood to mean either “X and Y” or “X or Y” and shall be taken to provide explicit support for both meanings or for either meaning.
[00032] Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.
[00033] Numeric ranges are inclusive of the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.
[00034] The headings provided herein are not intended to limit the disclosure.
[00035] Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.
[00036] Unless otherwise indicated, nucleic acids are written left to right in 5’ to 3’ orientation and amino acid sequences are written left to right in amino to carboxy orientation, respectively.
[00037] A gene is a locus (or region) of DNA which is made up of nucleotides and is the molecular unit of heredity.
[00038] The terms “polynucleotide”, “oligonucleotide”, “nucleic acid” and “nucleic acid molecules” are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3’ position of the pentose of one nucleotide is joined by a phosphodiester group to the 5’ position of the pentose of the next. The nucleotides include sequences of any form of nucleic acid, including, but not limited to, RNA and DNA molecules. The terms include, without limitation, single- and double-stranded polynucleotides.
[00039] The term “read” refers to a sequence obtained from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in A, T, C, or G) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a nucleic acid sequence of sufficient length (e.g., at least about 25 bp) that can be used to identify a larger sequence or region, e.g., that can be aligned and specifically assigned to a chromosome or genomic region or gene.
[00040] The term “genomic read” is used in reference to a read of any segments in the entire genome of an individual.
[00041] “Induced pluripotent stem cells (iPSCs) or (iPS cells)” is a designation that pertains to somatic cells that have been reprogrammed or “de-differentiated”, for example, by introducing exogenous genes that confer on the somatic cell a less differentiated phenotype. These cells can then be induced to differentiate into less differentiated progeny. iPS cells have been derived using modifications of an approach originally discovered in 2006 (Yamanaka, S. et al., Cell Stem Cell, 1:39-49 (2007)). For example, in one instance, to create iPS cells, scientists started with skin cells that were then modified by a standard laboratory technique using retroviruses to insert genes into the cellular DNA. In one instance, the inserted genes were Oct4, Sox2, Klf4, and c-myc, known to act together as natural regulators to keep cells in an embryonic stem cell-like state. These cells have been described in the literature. See, for example, Wernig et al., PNAS, 105:5856-5861 (2008); Jaenisch et al., Cell, 132:567-582 (2008); Hanna et al., Cell, 133:250-264 (2008); and Brambrink et al., Cell Stem Cell, 2:151-159 (2008). These references are all incorporated by reference for teaching iPSCs and methods for producing them. It is also possible that such cells can be created by specific culture conditions (exposure to specific agents), and they may also be created from a variety of different starting cell types.
[00042] iPSCs have many characteristic features of embryonic stem cells. For example, they have the ability to create chimeras with germ line transmission and tetraploid complementation and they can also form teratomas containing various cell types from the three embryonic germ layers. On the other hand, they may not be identical, as some reports demonstrate. See, for example, Chin et al., Cell Stem Cell 5:111-123 (2009), showing that induced pluripotent stem cells and embryonic stem cells can be distinguished by gene expression signatures.
[00043] Cells such as iPSCs or their progeny (including differentiated progeny) as disclosed herein may in the context of the present specification be said to “express” or “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.
[00044] Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes. By means of additional guidance, when a cell is said to be positive for or to express or comprise expression of a given marker, such as a given gene or gene product, a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell. Suitably, the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker). Where the measurement method allows for a quantitative assessment of the marker, a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a reference cell (e.g. negative control cell) or than an average signal generated for the marker by a population of reference or negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher. Further, a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of reference or negative control cells.
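By way of non-limiting illustration only, the two quantitative criteria described above (a fold-change threshold and a standard-deviation threshold relative to negative control cells) could be sketched as follows. The function name `is_marker_positive`, its parameters and the control signal values are hypothetical and are not taken from the Examples; the sketch assumes that at least two negative control measurements are available and that their mean is positive.

```python
from statistics import mean, stdev


def is_marker_positive(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Evaluate one cell's marker signal against negative control cells.

    Returns a tuple (fold_ok, sd_ok):
      fold_ok - signal is at least `min_fold` times the mean control signal
      sd_ok   - signal is at least `min_sd` sample standard deviations
                above the mean control signal
    """
    ctrl_mean = mean(negative_controls)
    ctrl_sd = stdev(negative_controls)
    fold_ok = signal >= min_fold * ctrl_mean
    sd_ok = signal >= ctrl_mean + min_sd * ctrl_sd
    return fold_ok, sd_ok


# Hypothetical control signals in arbitrary units (illustrative, not measured data).
controls = [10.0, 12.0, 11.0, 9.0, 13.0]
print(is_marker_positive(40.0, controls))
print(is_marker_positive(12.0, controls))
```

Either criterion, or both, may be applied depending on the measurement method; the default thresholds shown are the lowest values recited above and can be raised (e.g. to 2-fold or 3.5 standard deviations).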
[00045] A reference herein to a patent document or other matter which is given as prior art is not to be taken as an admission that that document or matter was known or that the information it contains was part of the common general knowledge as at the priority date of any of the claims.
Barcoded Cells
[00046] Human pluripotent stem cells (hPSCs) can self-renew and have the potential to differentiate into theoretically any cell type of the body in response to developmental signalling cues that guide cell differentiation decisions. With capabilities in deriving induced pluripotent stem cells (iPSCs) and advances in generating diverse functional cell types of the body through directed differentiation protocols, greater understanding and facility in deriving pure, well-defined, and functional cell types is needed.
[00047] The development of single-cell RNA sequencing has enabled transcriptomic profiling at an unprecedented resolution and scale. New strategies in sample multiplexing have emerged to efficiently design experiments to maximise the volume and quality of data derived by single cell studies that are ideally suited for scaling stem cell differentiation perturbation assays to reveal the underpinning biology of cell differentiation decisions and elucidate mechanisms of development, model diseases, discover drugs, and regenerate organs.
[00048] It would be desirable to provide tools and methods for the systematic analysis of iPSC biology. It would also be desirable to provide tools and methods for the systematic analysis of other cell types (e.g. stem and progenitor cells, cell lines), and other biological tissues (e.g. in vivo murine models and other model organisms).
[00049] In one aspect, the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
[00050] In another aspect, the present invention provides a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
[00051] Embodiments disclosed herein also relate to progeny of such cells (e.g. progeny of iPSCs or other stem or progenitor cells), including differentiated progeny or a population of cells obtained from one or more of the populations of barcoded cells (e.g. iPSCs). As used herein, the term “differentiated” or “differentiation” as used with respect to cells in a differentiating cell system refers to the process by which cells differentiate from one cell type (e.g., a multipotent, totipotent or pluripotent differentiable cell) to another cell type (such as a target differentiated cell). Accordingly, “cell differentiation” refers to a specialization process or a pathway by which a less specialized cell (e.g. stem cell) develops or matures to possess a more distinct form and function (i.e. becomes more specialized).
[00052] As used herein, the term “dedifferentiation” or “dedifferentiated” as used with respect to cells, refers to a process wherein a more specialized cell having a more distinct form and function, and/or limited self-renewal and/or proliferative capacity becomes less specialized and acquires a greater self-renewal and/or proliferative capacity or differentiation capacity (e.g. multipotent, pluripotent etc.). An induced Pluripotent Stem Cell (iPSC) is an example of a dedifferentiated cell. Accordingly, dedifferentiation can refer to a process of cellular reprogramming.
[00053] In embodiments of the invention the isogenic populations of cells (e.g. iPSCs) are cell lines derived from a single source wherein each cell line comprises a genetic barcode unique to that cell line.
[00054] In embodiments of the invention the barcoded iPSC populations or cell lines are lineage committed or differentiated, whereby the iPSCs are differentiated to multipotent stem or progenitor cells or to cells with a more specialized or differentiated phenotype in vitro under conditions that permit the cells to obtain said phenotype.
[00055] The terms “barcode”, “genetic barcode” and “barcode oligonucleotide” as used herein refer to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid, or as an identifier of the source of an associated molecule, such as a cell-of-origin. A barcode may also refer to any unique, non-naturally occurring, nucleic acid sequence that may be used to identify the originating source of a nucleic acid fragment. Although it is not necessary to understand the mechanism of an invention, it is believed that the barcode sequence provides a high-quality individual read of a barcode associated with a single cell/clone, such that multiple cells or clones can be sequenced and analysed together. Not being bound by a theory, amplified sequences from single cells can be sequenced together and resolved based on the barcode associated with each cell.
[00056] In certain embodiments, the sample barcode oligonucleotides comprise a PCR handle compatible with single cell sequencing methods as described herein (e.g., Drop-seq, InDrop, 10X Genomics). The PCR-amplification handle in the sample barcode oligonucleotides can be changed depending on which sequence read is used for RNA readout (e.g. Drop-seq uses Read2, 10X v1 uses Read1). In certain embodiments a Read2 sequence is used as a PCR handle to generate barcode-containing amplicons compatible with Chromium scRNA library preparation. The sample barcode oligonucleotides may be RNA or DNA. The sample barcode oligonucleotides may incorporate any modified nucleotides known in the art. In certain embodiments, the sample barcode oligonucleotides include a nucleotide barcode sequence of from about 5 to about 20 nucleotides. In certain embodiments, the sample barcode oligonucleotides include a 15 nucleotide barcode sequence. In a preferred embodiment the barcode is selected from one or more of the following: BC01 - GTGCCGACCAGTATC (SEQ ID NO: 1); BC02 - ACCACCTGACGCAAA (SEQ ID NO: 2); BC03 - ACGGCCCTATTTAAG (SEQ ID NO: 3); BC04 - AGCCCTGAGTCAGTA (SEQ ID NO: 4); BC05 - CAAATTCAAGGCGAT (SEQ ID NO: 5); BC06 - AATCTTGTATAAGTA (SEQ ID NO: 6); BC07 - CGTCACATTTGAGTC (SEQ ID NO: 7); BC08 - GGACCTTCTTACGAC (SEQ ID NO: 8); BC09 - TACCAATTGTACGCT (SEQ ID NO: 9); BC10 - CGCTAATGTCCGTTT (SEQ ID NO: 10); BC11 - ACCCTACGGTGGTTC (SEQ ID NO: 11); BC12 - TGTCCAAGCTGCAAT (SEQ ID NO: 12); BC13 - GTGTATTTAAAGCCG (SEQ ID NO: 13); BC14 - ACACCCGTATGTCAC (SEQ ID NO: 14); BC15 - TCTTTCGATGGCGGT (SEQ ID NO: 15); BC16 - GAGCACCCGCGTATT (SEQ ID NO: 16); BC17 - TTATTATGTTCTAGC (SEQ ID NO: 17); and BC18 - AATCTCTGAAACGAA (SEQ ID NO: 18).
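For illustration only, the eighteen barcodes listed above may be held in a simple lookup table and checked for the properties relied upon for demultiplexing (uniform 15 nucleotide length, a DNA alphabet, and uniqueness). The helper names `hamming` and `check_barcodes` are hypothetical; the pairwise Hamming distance is an additional, assumed quality metric that is not recited in this specification.

```python
from itertools import combinations

# Barcode sequences BC01-BC18 (SEQ ID NOs: 1-18) as listed above.
BARCODES = {
    "BC01": "GTGCCGACCAGTATC", "BC02": "ACCACCTGACGCAAA",
    "BC03": "ACGGCCCTATTTAAG", "BC04": "AGCCCTGAGTCAGTA",
    "BC05": "CAAATTCAAGGCGAT", "BC06": "AATCTTGTATAAGTA",
    "BC07": "CGTCACATTTGAGTC", "BC08": "GGACCTTCTTACGAC",
    "BC09": "TACCAATTGTACGCT", "BC10": "CGCTAATGTCCGTTT",
    "BC11": "ACCCTACGGTGGTTC", "BC12": "TGTCCAAGCTGCAAT",
    "BC13": "GTGTATTTAAAGCCG", "BC14": "ACACCCGTATGTCAC",
    "BC15": "TCTTTCGATGGCGGT", "BC16": "GAGCACCCGCGTATT",
    "BC17": "TTATTATGTTCTAGC", "BC18": "AATCTCTGAAACGAA",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(barcodes, length=15):
    """Validate the barcode set and return its minimum pairwise Hamming distance."""
    seqs = list(barcodes.values())
    if any(len(s) != length for s in seqs):
        raise ValueError("all barcodes must have the same length")
    if any(set(s) - set("ACGT") for s in seqs):
        raise ValueError("barcodes must contain only A, C, G and T")
    if len(set(seqs)) != len(seqs):
        raise ValueError("barcodes must be unique")
    return min(hamming(a, b) for a, b in combinations(seqs, 2))


min_distance = check_barcodes(BARCODES)
print(len(BARCODES), "barcodes; minimum pairwise Hamming distance:", min_distance)
```

A larger minimum distance would, in principle, make barcode assignment more tolerant of sequencing errors; any tolerance actually applied is a matter of experimental design.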
[00057] In certain embodiments, the sample barcode oligonucleotides are compatible with oligo dT-based RNA-sequencing library preparations so that they can be captured and sequenced together with mRNAs. In certain embodiments, the sample barcode oligonucleotide includes a poly A tail. In certain embodiments, a poly T oligo is used to capture mRNA and polyadenylated sample barcode oligonucleotides and prime a reverse transcription reaction to obtain cDNA molecules. Commonly used reverse transcriptases have DNA-dependent DNA polymerase activity. This activity allows DNA sample barcoding oligonucleotides to be copied into cDNA during reverse transcription. In certain embodiments, the sample barcode oligonucleotides comprise a PCR handle for amplification and next-generation sequencing library preparation, a barcode sequence specific for each sample, and a polyA stretch at the 3’ end designed to anneal to polyT stretches on primers used to initiate reverse transcription. In certain embodiments, the sample barcode oligonucleotide comprises a UMI. In certain embodiments, random priming may be used for reverse transcription.
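As a minimal sketch of the oligonucleotide layout described above (PCR handle, sample-specific barcode, then a polyA stretch at the 3’ end), the following assembles such a sequence and recovers the barcode from it. The handle sequence `PLACEHOLDER_HANDLE` and the polyA length of 30 are hypothetical placeholders and are not sequences or values disclosed herein; the function names are likewise illustrative.

```python
# Hypothetical placeholder; NOT an actual Read1/Read2 handle sequence.
PLACEHOLDER_HANDLE = "ACGTACGTACGTACGTACGT"


def build_sample_oligo(handle, barcode, polya_len=30):
    """Concatenate 5'-handle, barcode and a 3' polyA stretch (5' to 3')."""
    return handle + barcode + "A" * polya_len


def extract_barcode(oligo, handle, barcode_len=15):
    """Return the barcode if the oligo has the handle/barcode/polyA layout, else None."""
    if not oligo.startswith(handle):
        return None
    start = len(handle)
    barcode = oligo[start:start + barcode_len]
    tail = oligo[start + barcode_len:]
    # The remainder after the barcode must be a non-empty run of A only.
    if len(barcode) != barcode_len or not tail or set(tail) != {"A"}:
        return None
    return barcode


oligo = build_sample_oligo(PLACEHOLDER_HANDLE, "GTGCCGACCAGTATC")  # BC01
print(oligo)
print(extract_barcode(oligo, PLACEHOLDER_HANDLE))
```

Because the barcode is read at a fixed offset and fixed length, barcodes that themselves end in one or more A residues (e.g. BC02) are still recovered unambiguously in this sketch.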
[00058] In certain embodiments said genetic barcode is associated with a detectable label, such as a fluorescent label (e.g. GFP), which enables visualisation of the presence of the barcoded cells.
[00059] In an embodiment of the invention the genetic barcode is incorporated into a genomic safe harbor locus (GSH), being a site in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements function predictably, are expressed ubiquitously and do not cause alterations of the host genome posing a risk to the host cell. Various GSH sites have been described previously and will be known to the person skilled in the art. In embodiments the barcode is stably integrated into a GSH selected from: the adeno-associated virus site 1 (AAVS1); the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 coreceptor; and the Rosa26 locus or the human ortholog of the mouse Rosa26 locus. In a preferred embodiment, the genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1) locus.
[00060] The genetic barcodes may be targeted to be stably integrated into the genome of a cell through the use of any appropriate gene-editing tools known to the person skilled in the art. In certain embodiments the barcodes are targeted to a specific locus using a programmable nuclease such as a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN) or a clustered regularly interspaced short palindromic repeat (CRISPR)-Cas-associated nuclease. In a preferred embodiment the barcodes are stably integrated into the cell genome using CRISPR/Cas9-mediated gene editing.
[00061] In certain embodiments, a cell or population of cells according to the invention may comprise more than one genetic barcode, wherein at least one genetic barcode is unique to the cell or population of cells. In another embodiment, a cell or population of cells according to the invention may comprise a combination of genetic barcodes, wherein the combination of genetic barcodes is unique to the cell or population of cells. In one embodiment, the barcodes are separated by one or more nucleotides.
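To illustrate how a combination of genetic barcodes can be unique to a population, the following sketch enumerates unordered pairs drawn from eighteen barcode identifiers and assigns one pair to each population. The identifiers, the population names and the choice of pairs (rather than larger combinations) are assumptions made purely for illustration.

```python
from itertools import combinations
from math import comb


def assign_combinations(population_names, barcode_ids, k=2):
    """Assign each population a distinct k-barcode combination.

    Raises ValueError if there are more populations than combinations.
    """
    if len(population_names) > comb(len(barcode_ids), k):
        raise ValueError("not enough unique barcode combinations")
    combos = combinations(barcode_ids, k)
    return {name: next(combos) for name in population_names}


barcode_ids = [f"BC{i:02d}" for i in range(1, 19)]      # BC01 ... BC18
populations = [f"line_{i}" for i in range(1, 25)]       # 24 hypothetical lines

assignment = assign_combinations(populations, barcode_ids)
print(comb(len(barcode_ids), 2))   # number of distinct pairs available
print(assignment["line_1"], assignment["line_24"])
```

With eighteen barcodes there are 153 distinct unordered pairs, so pairing would in principle allow more populations to be distinguished than using each barcode singly.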
Methods of the Invention
[00062] According to another aspect, the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
[00063] According to another aspect, the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into a targeted location of the genome of the cells of each population, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
[00064] The present invention relates to methods of measuring or determining or inferring transcriptional level or even protein level changes, e.g., massively parallel measuring or determining or inferring of RNA levels in a single cell or a cellular network in response to at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters involving sequencing DNA of a perturbed cell or cells, whereby transcriptional level and optionally protein level effects may be determined in the single cell in response to the at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters.
[00065] According to another aspect, the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
[00066] In another aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
[00067] Accordingly, embodiments of the invention may involve a method of inferring or determining or measuring genetic information, including RNA levels, in a single cell from a cellular network, e.g., massively parallel inferring or determining or measuring of RNA levels in a single cell or a cellular network in response to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 or massively parallel “manipulation(s)”, e.g. perturbation parameter(s), comprising optionally so manipulating or perturbing the cell or the cells or each cell of a cellular network with the perturbation parameter(s) and sequencing of the perturbed cell(s), whereby RNA level(s) and optionally protein level(s) is/are determined in the cell(s) in response to the perturbation parameter(s).
[00068] Computational methods for the mapping of sequencing reads from a sequencing library to a single cell will be well-known to the skilled addressee and can be performed, for example, using any available sequence mapping software. Similarly, computational methods for demultiplexing barcodes, enabling sequence information to be mapped to specific cells and specific cells to their starting population, may be performed using methods that are known to the skilled addressee and using any available software. Demultiplexing may be performed using more than one method. Sequence mapping and demultiplexing methods useful in the methods of the present invention are described in more detail in the references cited in the following section (incorporated herein by reference) and in the Examples of the invention described herein. In certain embodiments Seurat version 3 as described by Stuart et al., 2019 Cell (https://doi.org/10.1016/j.cell.2019.05.031), incorporated herein by reference, is used for the mapping and demultiplexing of reads from the sequencing libraries (e.g., HTO demultiplexing, doublet calling based on HTO reads, generating quality control metrics). In other embodiments, the scds R package (Bais & Kostka 2020 Bioinformatics, 36(4): 1150-1158, https://doi.org/10.1093/bioinformatics/btz698), incorporated herein by reference, containing three different algorithms to call doublets based on sequence reads, is also used.
[00069] Multiplexing single cell RNA-sequencing using internal barcodes
[00070] In certain embodiments, different samples of single cells are multiplexed to generate a multiplexed single cell sequencing library. The samples may be from different perturbations, different time points in an experiment, from different samples treated under different conditions in an experiment, or from different experiments (e.g., replicates). In certain embodiments, the sequencing library is sequenced and demultiplexed in silico.
[00071] Recent developments in single-cell RNA sequencing, including droplet-based (see, e.g., Macosko, et al., Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 161(5): 1202-1214, 2015; and Dixit, et al., Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell, 167(7): 1853-1866, 2016) and combinatorial split-pool methods (see, e.g., Vitak, et al., Sequencing thousands of single-cell genomes with combinatorial indexing. Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; and Rosenberg et al., Scaling single cell transcriptomics through split pool barcoding. bioRxiv preprint first posted online Feb. 2, 2017, doi:dx.doi.org/10.1101/105163), have enabled transcriptomic profiling at an unprecedented resolution and scale.
[00072] In certain embodiments, the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377-382, (2009); Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotechnology 30, 777-782, (2012); and Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports, Volume 2, Issue 3, p666-673, 2012).
[00073] In certain embodiments, the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi: 10.1038/nprot.2014.006).
[00074] In certain embodiments, the invention involves high-throughput single-cell RNA-seq. In this regard reference is made to Macosko et al., 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International patent application number PCT/US2015/049178, published as WO2016/040476 on March 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International patent application number PCT/US2016/027734, published as WO2016168584A1 on October 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat. Commun. 8, 14049 doi: 10.1038/ncomms14049; International patent publication number WO2014210353A2; Zilionis, et al., 2017, “Single-cell barcoding and sequencing using droplet microfluidics” Nat Protoc. Jan;12(1):44-73; Cao et al., 2017, “Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/104844; Vitak, et al., “Sequencing thousands of single-cell genomes with combinatorial indexing” Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; and Gierahn et al., “Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput” Nature Methods 14, 395-398 (2017), all the contents and disclosure of each of which are herein incorporated by reference in their entirety.
[00075] In certain embodiments, tagmentation is used to introduce adaptor sequences to genomic DNA in regions of accessible chromatin (e.g., between individual nucleosomes) (see, e.g., US20160208323A1; US20160060691A1; WO2017156336A1; J. D. Buenrostro et al., Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); and Cusanovich, D. A., Daza, R., Adey, A., Pliner, H., Christiansen, L., Gunderson, K. L., Steemers, F. J., Trapnell, C. & Shendure, J. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015 May 22;348(6237):910-4. doi: 10.1126/science.aab1601. Epub 2015 May 7). The term “tagmentation” refers to a step in the Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) as described. (See, Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., Greenleaf, W. J., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods 2013; 10 (12): 1213-1218). Specifically, a hyperactive Tn5 transposase loaded in vitro with adapters for high-throughput DNA sequencing can simultaneously fragment and tag a genome with sequencing adapters. In one embodiment the adapters are compatible with the methods described herein.
[00076] The multiplexing strategy described herein is also applicable to single-cell profiling of chromatin accessibility (see, e.g., Cusanovich, et al., 2015; and www.10xgenomics.com/solutions/single-cell-atac/). In certain embodiments, a handle is attached to the adapters, such that the tagmented DNA acts as an artificial mRNA (e.g., poly A tail) and can be captured by a cell of origin barcode poly dT capture sequence. In certain embodiments, the sample barcode oligonucleotides are adapted for tagmentation with the adapters used in the first step of generating cell of origin barcodes.
[00077] In one exemplary embodiment, samples for use in droplet based single cell sequencing as described herein are multiplexed. Cells belonging to different cell populations (e.g. iPSC populations) are labeled with a unique genetic barcode incorporated into their genomes as described herein. The single cells from multiple populations may then be loaded into a microfluidic device. The labeled cells are encapsulated with reagents and “cell-of-origin” barcode or UMI containing beads in emulsion droplets. The genetic barcode incorporated into the cell's genome may then be released from the cell in the droplet (e.g., by lysis of the cell in the droplet) and processed to generate a cDNA molecule comprising the genetic barcode unique to the population of cells from which the cell was derived and also a “cell-of-origin” barcode or UMI particular to that cell. The sequencing data can then be demultiplexed to determine the cell of origin and the population (e.g. cell line) of origin and therefore the associated condition/perturbation.
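The demultiplexing step described in the preceding paragraph can be illustrated with a minimal Python sketch. It is not the pipeline used in the Examples; the function names, the two 15 bp barcodes, the mismatch tolerance of 2 and the 0.8 purity threshold are all hypothetical values chosen for illustration. The sketch assumes each read has already been parsed into a cell-of-origin barcode, a UMI and the genetic barcode sequence.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def match_barcode(seq, known, max_mismatch=2):
    """Return the population whose barcode is within max_mismatch of seq, else None.

    With barcodes at least 5 nt apart, a tolerance of 2 cannot match two populations.
    """
    for population, barcode in known.items():
        if len(seq) == len(barcode) and hamming(seq, barcode) <= max_mismatch:
            return population
    return None

def demultiplex(reads, known, min_fraction=0.8):
    """Assign each cell barcode to a population from (cell, umi, sequence) reads."""
    molecules = defaultdict(set)
    for cell, umi, seq in reads:
        population = match_barcode(seq, known)
        if population is not None:
            molecules[cell].add((umi, population))  # collapse PCR duplicates by UMI
    calls = {}
    for cell, pairs in molecules.items():
        counts = Counter(population for _, population in pairs)
        top, n = counts.most_common(1)[0]
        # Cells with mixed barcodes (possible doublets) are flagged, not assigned.
        calls[cell] = top if n / sum(counts.values()) >= min_fraction else "ambiguous"
    return calls

# Hypothetical barcodes and reads, for illustration only.
known = {"line_A": "ACGTACGTACGTACG", "line_B": "TTGCATTGCAAGGCA"}
reads = [
    ("CELL1", "U1", "ACGTACGTACGTACG"),
    ("CELL1", "U1", "ACGTACGTACGTACG"),  # PCR duplicate of the read above
    ("CELL1", "U2", "ACGTACGTACGTACC"),  # one sequencing error
    ("CELL2", "U1", "TTGCATTGCAAGGCA"),
    ("CELL3", "U1", "ACGTACGTACGTACG"),
    ("CELL3", "U2", "TTGCATTGCAAGGCA"),  # second population in the same droplet
]
calls = demultiplex(reads, known)
```

In this toy input, CELL1 and CELL2 are assigned to a single population each, while CELL3 carries both barcodes in equal proportion and is flagged rather than assigned.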
[00078] In certain embodiments, quantitative real time PCR can be utilized. Detection of the gene expression level can be conducted in real time in an amplification assay. In one aspect, the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules is typically proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art. DNA-binding dyes suitable for this application include SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and the like.
[00079] In another aspect, other fluorescent labels such as sequence specific probes can be employed in the amplification reaction to facilitate the detection and quantification of the amplified products. Probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan® probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art and are taught in U.S. Patent No. 5,210,015.
[00080] Multiplexed Perturbation studies
[00081] Methods and tools for genome-scale screening of perturbations in single cells are known to the skilled person. Methods and tools allow reconstructing of a cellular network or, for example, the differentiation trajectory of a cell or population of cells. In one embodiment, a method utilizing the cells (e.g. iPSCs) described herein comprises (1) imparting or introducing single-order or combinatorial perturbations to a population of cells, (2) measuring genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells and (3) assigning a perturbation(s) to the single cells. A perturbation may be linked to a phenotypic change and preferably changes in gene or protein expression. In preferred embodiments, measured differences that are relevant to the perturbations are determined by applying a model accounting for co-variates to the measured differences. The model may include the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation. In certain embodiments, the measuring of phenotypic differences and assigning a perturbation to a single cell is determined by performing single cell RNA sequencing (RNA-seq). In preferred embodiments, the single cell RNA-seq is performed by any method as described herein (e.g., Drop-seq, InDrop, 10X genomics). In certain embodiments, manipulating or perturbing the cell(s) involves altering the culture conditions so as to contact the cell(s) with one or more agents (e.g. another cell, secretions from another cell e.g. a co-culture, cytokine, growth factor, signaling pathway agonist or antagonist, small molecule, antibody, etc.). In other embodiments methods of performing genome-wide CRISPR-mediated perturbation screens are provided.
[00082] Methods and tools for genome-scale screening of perturbations in single cells using CRISPR-Cas9 have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.1101/262121; Datlinger, et al., 2017, Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. Vol. 14 No.3 DOI: 10.1038/nmeth.4177; Hill et al., On the design of CRISPR-based single cell molecular screens, Nat Methods. 2018 Apr; 15(4): 271-274; and International publication serial number WO/2017/075294).
[00083] In certain embodiments, unique barcodes are used to perform Perturb-seq. In certain embodiments, a guide RNA is detected by RNA-seq using a transcript expressed from a vector encoding the guide RNA. The transcript may include a unique barcode specific to the guide RNA. The transcript may include the guide RNA sequence (see, e.g., Fig. 16, CROP-seq, Datlinger, et al., 2017). In certain embodiments, a guide RNA and guide RNA barcode is expressed from the same vector and the barcode may be detected by RNA-seq. Not being bound by a theory, detection of a guide RNA barcode is more reliable than detecting a guide RNA sequence, reduces the chance of false guide RNA assignment and reduces the sequencing cost associated with executing these screens. Thus, a perturbation may be assigned to a single cell by detection of a guide RNA barcode in the cell. In certain embodiments, a cell barcode is added to the RNA in single cells, such that the RNA may be assigned to a single cell. Generating cell barcodes is described herein for single cell sequencing methods. In certain embodiments, a Unique Molecular Identifier (UMI) is added to each individual transcript and protein capture oligonucleotide. Not being bound by a theory, the UMI allows for determining the capture rate of measured signals, or preferably the binding events or the number of transcripts captured. Not being bound by a theory, the data is more significant if the signal observed is derived from more than one protein binding event or transcript. In preferred embodiments, Perturb-seq is performed using a guide RNA barcode expressed as a polyadenylated transcript, a cell barcode, and a UMI.
[00084] In certain embodiments, a CRISPR system is used to create an INDEL at a target gene. In other embodiments, epigenetic screening is performed by applying CRISPRa/i/x technology (see, e.g., Konermann et al. “Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex” Nature. 2014 Dec 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013). "Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression". Cell. 152 (5): 1173-83; Gilbert, L. A., et al., (2013). "CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes". Cell. 154 (2): 442-51; Komor et al., 2016, Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage, Nature 533, 420-424; Nishida et al., 2016, Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems, Science 353(6305); Yang et al., 2016, Engineering and optimising deaminase fusions for genome editing, Nat Commun. 7: 13330; Hess et al., 2016, Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells, Nature Methods 13, 1036-1042; and Ma et al., 2016, Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells, Nature Methods 13, 1029-1035). Numerous genetic variants associated with disease phenotypes are found in non-coding regions of the genome, and frequently coincide with transcription factor (TF) binding sites and non-coding RNA genes. Not being bound by a theory, CRISPRa/i/x approaches may be used to achieve a more thorough and precise understanding of the implication of epigenetic regulation. In one embodiment, a CRISPR system may be used to activate gene transcription. A nuclease-dead RNA-guided DNA binding domain, dCas9, tethered to transcriptional repressor domains that promote epigenetic silencing (e.g., KRAB) may be used for "CRISPRi" that represses transcription. To use dCas9 as an activator (CRISPRa), a guide RNA is engineered to carry RNA binding motifs (e.g., MS2) that recruit effector domains fused to RNA-motif binding proteins, increasing transcription.
[00085] In one embodiment, CRISPR/Cas9 may be used to perturb protein-coding genes or non-protein-coding DNA. CRISPR/Cas9 may be used to knockout protein-coding genes by frameshifts, point mutations, inserts, or deletions. An extensive toolbox may be used for efficient and specific CRISPR/Cas9 mediated knockout as described herein, including a double-nicking CRISPR to efficiently modify both alleles of a target gene or multiple target loci and a smaller Cas9 protein for delivery on smaller vectors (Ran, F.A., et al., In vivo genome editing using Staphylococcus aureus Cas9. Nature. 520, 186-191 (2015)). A genome-wide sgRNA mouse library (~10 sgRNAs/gene) may also be used in a mouse that expresses a Cas9 protein (see, e.g., WO2014204727A1).
[00086] In one embodiment, perturbation is by deletion of regulatory elements. Non-coding elements may be targeted by using pairs of guide RNAs to delete regions of a defined size, and by tiling deletions covering sets of regions in pools.
[00087] In one embodiment, perturbation of genes is by RNAi. The RNAi may be shRNAs targeting genes. The shRNAs may be delivered by any methods known in the art. In one embodiment, the shRNAs may be delivered by a viral vector. The viral vector may be a lentivirus, adenovirus, or adeno associated virus (AAV).
[00088] In certain embodiments, whole genome screens can be used for understanding the phenotypic readout of perturbing potential target genes. In preferred embodiments, perturbations target expressed genes as defined by a gene signature using a focused sgRNA library. Libraries may be focused on expressed genes in specific networks or pathways. In other preferred embodiments, regulatory drivers are perturbed. In certain embodiments, systematic perturbation of key genes that regulate mesendodermal differentiation may be performed in a high-throughput fashion. Gene expression profiling data can be used to define the target of interest and perform follow-up single-cell and population RNA-seq analysis.
[00089] In one aspect, the present invention provides for a method of reconstructing a cellular network, comprising introducing at least 1, 2, 3, 4 or more single-order or combinatorial perturbations to a plurality of cells in a population of cells, wherein each cell in the plurality of the cells receives at least 1 perturbation; measuring comprising: detecting genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells compared to one or more cells that did not receive any perturbation, and detecting the perturbation(s) in single cells; and determining measured differences relevant to the perturbations by applying a model accounting for co-variates to the measured differences, whereby intercellular and/or intracellular networks or circuits are inferred. The measuring in single cells may comprise single cell sequencing. The single cell sequencing may comprise unique molecular identifiers (UMI), whereby the capture rate of the measured signals, such as transcript copy number or probe binding events, in a single cell is determined. The model may comprise accounting for the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation.
[00090] The measuring may comprise detecting the transcriptome of each of the single cells. The perturbation(s) may comprise one or more genetic perturbation(s). The perturbation(s) may comprise one or more epigenetic or epigenomic perturbation(s). At least one perturbation may be introduced with an RNAi or a CRISPR-Cas system. At least one perturbation may be introduced via a chemical agent, biological agent, an intercellular spatial relationship between two or more cells, an increase or decrease of oxygen concentration, an increase or decrease of temperature, addition or subtraction of energy, electromagnetic energy, or ultrasound.
[00091] The measuring or measured differences may comprise measuring or measured differences of DNA, RNA, protein or post translational modification; or measuring or measured differences of protein or post translational modification correlated to RNA and/or DNA level(s).
[00092] The perturbing or perturbation(s) may comprise genetic perturbing. The perturbing or perturbation(s) may comprise single-order perturbations. The perturbing or perturbation(s) may comprise combinatorial perturbations. The perturbing or perturbation(s) may comprise gene knock-down, gene knock-out, gene activation, gene insertion, or regulatory element deletion. The perturbing or perturbation(s) may comprise genome-wide perturbation. The perturbing or perturbation(s) may comprise performing CRISPR-Cas-based perturbation. The perturbing or perturbation(s) may comprise performing pooled single or combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs. The perturbations may be of a selected group of targets based on similar pathways or network of targets.
[00093] The perturbing or perturbation(s) may comprise performing pooled combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs. Each sgRNA may be associated with a unique perturbation barcode. Each sgRNA may be co-delivered with a reporter mRNA comprising the unique perturbation barcode (or sgRNA perturbation barcode).
[00094] The perturbing or perturbation(s) may comprise subjecting the cell to an increase or decrease in temperature. The perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent. The perturbing or perturbation(s) may comprise subjecting the cell to a biological agent. The biological agent may be a growth factor or cytokine or antibody. The perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent, biological agent and/or temperature increase or decrease across a gradient.
[00095] The cell may be in a microfluidic system. The cell may be in a droplet. The population of cells may be sequenced by using microfluidics to partition each individual cell into a droplet containing a unique barcode, thus allowing a cell barcode to be introduced.
[00096] The perturbing or perturbation(s) may comprise transforming or transducing the cell or a population that includes and from which the cell is isolated with one or more genomic sequence-perturbation constructs that perturb a genomic sequence in the cell. The sequence-perturbation construct may be a viral vector, preferably a lentivirus vector. The perturbing or perturbation(s) may comprise multiplex transformation or transduction with a plurality of genomic sequence-perturbation constructs.
[00097] The skilled addressee will readily appreciate that the foregoing methods involving genetically barcoded cells according to the present invention may also be readily applied in the context of bulk RNA-seq analysis involving deconvolution of multiplexed bulk RNA samples. Accordingly, in another embodiment, the present invention provides a method for multiplexed cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed bulk RNA-sequencing library from said plurality of said populations or progeny thereof and sequencing said library; and deconvolving said library using the genetic barcodes.
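One simple reading of deconvolving a bulk library with the genetic barcodes is to estimate the relative representation of each barcoded population from the barcode-containing reads. The Python sketch below illustrates that reading only; it is an assumption, not a method stated in this specification, and the function name, the barcodes and the exact-match rule are hypothetical.

```python
from collections import Counter

def barcode_fractions(barcode_reads, known):
    """Fraction of barcode-bearing reads attributable to each population.

    barcode_reads: barcode sequences extracted from a bulk library.
    known: mapping of population name to its genetic barcode.
    Only exact matches are counted; unmatched reads are ignored.
    """
    lookup = {barcode: population for population, barcode in known.items()}
    counts = Counter(lookup[seq] for seq in barcode_reads if seq in lookup)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {population: counts[population] / total for population in known}

# Hypothetical barcodes and reads, for illustration only.
known = {"line_A": "ACGTACGTACGTACG", "line_B": "TTGCATTGCAAGGCA"}
reads = ["ACGTACGTACGTACG"] * 3 + ["TTGCATTGCAAGGCA", "GGGGCCCCAAAATTT"]
fractions = barcode_fractions(reads, known)
```

Here three of the four matched reads carry the first barcode, so the estimated fractions are 0.75 and 0.25; the unmatched read does not contribute.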
Kits
[00098] The present invention provides a kit for multiplexed single cell analysis of cells (e.g. iPSCs) and optionally their progeny, comprising one or more cell populations, such as one or more iPSC populations, described herein.
[00099] As used herein, the term “kit” refers to any delivery system for delivering materials. In the context of cell differentiation, a kit may refer to a combination of materials for handling stem cells. Such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., compounds, proteins, detection agents (such as probes or antibodies), plasmids, vectors, etc.) in the appropriate containers (such as tubes, etc.) and/or supporting materials (e.g., buffers, reagents, culture media, written instructions for performing cell differentiation, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes, or bags, and the like) containing the relevant reaction reagents (such as culture media, oligonucleotides, enzymes, inhibitors, etc., and growth factors and cytokines (e.g. VEGF, BDNF, FGF, etc.)) and/or supporting materials.
[000100] In another embodiment the kit comprises cells, such as iPSCs, together with cell culture reagents as described herein, including within the examples below, for creation of barcoded cells. In another embodiment, the kit further comprises one or more reagents for differentiating the cells to a selected phenotype and optionally reagents for the generation of a multiplexed single cell sequencing library.
[000101] In another embodiment the kit further comprises instructions for the preparation of barcoded cells, including barcoded iPSCs, as described herein.
[000102] In addition to inhibitors and agonists, the kits of the present invention may further comprise one or more of the following: a culture medium, at least one cell culture medium supplement, an agent for inhibiting or increasing expression of one or more gene products, and at least one agent for detecting expression of a marker of differentiation.
Examples
Example 1 - Generation of a novel cell barcoding system
[000103] To allow for an opportunity to interrogate novel aspects of cell-cell interactions and signalling in a single cell context, cells were engineered to produce their own internal barcode through stable incorporation of a specifically designed expression cassette into a transcriptionally active region of the genome.
[000104] Materials and methods
[000105] Barcoding design. 10,000 15 bp barcodes were generated using a 25% probability for the presence of each of the four nucleotides A, C, T and G. Barcodes containing runs of 4 or more nucleotides, or starting or ending with a stop codon, were excluded. All 18 selected barcodes were tested to ensure a minimum Hamming distance of 5 nt (Table 1).
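The barcode design procedure in the preceding paragraph can be sketched in Python as follows. This is an illustrative reconstruction, not the code actually used. It assumes that "runs of 4 or more nucleotides" means homopolymer runs, that stop codons are TAA, TAG and TGA, and that the 18 barcodes are picked greedily from the filtered pool. The selection strategy, the random seed and all function names are hypothetical.

```python
import random
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def random_barcode(rng, length=15):
    """Draw each position uniformly (25% per nucleotide) from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def has_homopolymer(seq, run=4):
    """True if seq contains a run of `run` or more identical nucleotides."""
    return any(base * run in seq for base in "ACGT")

def has_terminal_stop(seq):
    """True if seq starts or ends with a stop codon."""
    return seq[:3] in STOP_CODONS or seq[-3:] in STOP_CODONS

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def select_barcodes(candidates, n_select=18, min_dist=5):
    """Greedily keep candidates at least min_dist away from all kept barcodes."""
    chosen = []
    for bc in candidates:
        if all(hamming(bc, other) >= min_dist for other in chosen):
            chosen.append(bc)
            if len(chosen) == n_select:
                break
    return chosen

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pool = [random_barcode(rng) for _ in range(10_000)]
passed = [bc for bc in pool if not has_homopolymer(bc) and not has_terminal_stop(bc)]
barcodes = select_barcodes(passed)
min_pairwise = min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

A minimum Hamming distance of 5 nt means that up to two sequencing errors in a barcode read can still be resolved to a single barcode, which is a common reason for imposing such a constraint.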
[000106] Barcode cassette design. The barcodes were introduced into the cells as part of a barcode cassette, which also incorporated the reverse complement of a partial Chromium Read2 adaptor sequence (truncated slightly to allow an oligo length of <= 60 bp). This Read2 sequence is used as a PCR handle to generate barcode-containing amplicons compatible with Chromium scRNA library preparation. Additionally, restriction enzyme recognition sequences were added or regenerated to enable easy transfer of the cassette between different vectors. The exact structure of the cassette is as follows: EcoRV site 3’ 3bp - MluI site - partial 10x Read2 adaptor reverse complement - 15bp barcode - MluI complementary sequence. Barcoding cassettes were ordered as complementary single-stranded oligos, which could be annealed and ligated into a digested plasmid backbone.
[000107] Vector design. AAVS1-CAG-hrGFP (Addgene# 52344) was used as the plasmid backbone. It contains hrGFP under the control of the CAG promoter, and AAVS1 homology arms to allow integration of the linearized plasmid into the genome when paired with the CRISPR system using well-described guide RNAs.
[000108] The barcode cassette was introduced between the EcoRV and MluI sites of the plasmid between hrGFP and the poly-adenylation site, to enable expression as part of the hrGFP transcriptional unit.
[000109] Generation of barcoding plasmids. pAAVS1-CAG-hrGFP (Addgene# 52344) was digested by incubation with EcoRV-HF (New England BioLabs; NEB) followed by addition of MluI-HF (NEB) and further incubation. Successful digestion was confirmed by running a small amount on an agarose gel, and the remainder was purified using QIAQuick PCR Purification Kit (QIAGEN).
[000110] Top and bottom strands of barcode oligos were annealed by mixing 1 µL each of 100 µM oligo in 1x T4 DNA ligase buffer in a volume of 10 µL, heating to 94°C for 2 min, then allowing to cool to 25°C at a rate of 1°C/s. Annealed oligos were further diluted 1 in 10 with nuclease-free water.
[000111] Ligation of barcode cassettes to vector was performed by combining 100 ng digested plasmid, 4 µL diluted annealed oligos, and 1 µL T4 DNA ligase (NEB; 400 U/µL) in 10 µL total volume with 1x T4 DNA ligase buffer. Reaction was incubated at 16°C for 16 hours. Separate reactions were performed for each barcode oligo.
[000112] 3 µL of ligation reactions were added to 20 µL Stellar competent cells (Takara) and heat shock transformation performed at 42°C for 1 min in 1.5 mL tubes. 350 µL SOC media was added for recovery, with shaking at 37°C for 1 hour. 100 µL was spread onto selective ampicillin-containing agar plates, which were incubated overnight at 37°C.
[000113] 3 colonies were picked from each plate for screening with colony PCR for expected insert in 10 µL reactions using MangoTaq (Bioline).
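The amounts implied by the annealing and ligation steps can be checked with a short calculation, sketched here in Python. It takes the oligo stock as 100 µM and all volumes in µL. The vector size is not stated in the text, so the 10,000 bp used below is a purely hypothetical placeholder, and 650 g/mol per base pair is the usual approximation for double-stranded DNA.

```python
def annealed_insert_pmol(stock_uM=100.0, oligo_uL=1.0, anneal_uL=10.0,
                         dilution=10.0, used_uL=4.0):
    """Picomoles of annealed duplex added to one ligation reaction."""
    anneal_uM = stock_uM * oligo_uL / anneal_uL  # each strand in the annealing mix
    diluted_uM = anneal_uM / dilution            # after the 1 in 10 dilution
    return diluted_uM * used_uL                  # µM x µL = pmol

def vector_pmol(mass_ng, size_bp, g_per_mol_per_bp=650.0):
    """Picomoles of double-stranded DNA for a given mass and length."""
    return mass_ng * 1000.0 / (size_bp * g_per_mol_per_bp)

insert = annealed_insert_pmol()
vector = vector_pmol(mass_ng=100.0, size_bp=10_000)  # size_bp is a hypothetical value
ratio = insert / vector
```

Under these assumptions the reaction contains about 4 pmol of annealed insert, a large molar excess over the digested vector whatever the exact plasmid size.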
[000114] Colonies showing successful amplification of insert were grown overnight in 5 mL LB broth containing ampicillin, and plasmid purified using QIAGEN Plasmid Miniprep kit.
[000115] 100% sequence identity was confirmed across the barcode insert by Sanger sequencing, performed by the Australian Genome Research Facility (AGRF).
[000116] 50 mL cultures were grown from glycerol stocks of plasmids with confirmed barcode sequence insert, and plasmid was purified using Nucleobond Xtra Midi Kit (Macherey-Nagel) to give endotoxin-free, high concentration stock for transfection.
[000117] Stable cell generation. All human pluripotent stem cell studies were carried out in accordance with consent from the University of Queensland’s Institutional Human Research Ethics approval (HREC#: 2015001434).
[000118] WTC wt iPSCs were maintained as previously described (ref Friedman et al). Briefly, cells were cultured on Vitronectin XF (Stem Cell Technologies, Cat# 07180) coated plates in mTeSR media with supplement (Stem Cell Technologies, Cat# 05850) at 37°C with 5% CO2.
[000119] For gene editing, cells were grown to about 50-80% confluency and dissociated using 1X TrypLE, and 100-200K cells were used for each 10 µL reaction of the Neon Transfection System. The transfection mixture included 0.5 µg of barcode plasmid DNA, 20 pmol AAVS1-targeting sgRNA (protospacer sequence: atcctgtccctagtggcccc (SEQ ID NO: 19), chemically synthesized by Agilent Technologies) and 20 pmol spCas9 protein (IDT). After electroporation with 1 pulse of 1300 V for 30 ms, cells were seeded in mTeSR with ROCK Inhibitor (Y-27632) and CloneR (STEMCELL Technologies). Selection was performed with 1 µg/mL puromycin, and purified cell lines were frozen down in CryoStor CS10 Cell Freezing Medium and stored in liquid nitrogen.
[000120] Quality control of cell lines. All cell lines underwent quality testing for correct genetic insertion, selection efficiency, pluripotency, chromosomal abnormalities and mycoplasma contamination.
[000121] Genomic DNA from all cell lines was extracted using QuickExtract DNA Extraction Solution (Epicentre). Correct targeting of the donor construct at the AAVS1 locus was confirmed by junction PCR using the following primer pair: AAVS1 F1: 5’-ggttcggcttctggcgtgtgacc-3’ (SEQ ID NO: 20), AAVS1 R1: 5’-tcaagagtcacccagagacagtgac-3’ (SEQ ID NO: 21). The PCR product was then sent for Sanger sequencing using a universal sequencing primer to validate correct barcode insertion in each cell line.
[000122] Flow cytometry was performed on live cells for endogenous GFP expression and after labelling for the pluripotency marker SSEA3 (Becton Dickinson, Cat# 562706) and corresponding isotype control. Cells were analyzed using a BD FACSCANTO II (Becton Dickinson, San Jose, CA) with FACSDiva software (BD Biosciences). Data analysis was performed using FlowJo (Tree Star, Ashland, Oregon).
[000123] Karyotyping was carried out as a professional service by Sullivan Nicolaides Pathology. iPSCs were grown in a 25 cm2 flask to about 70-80% confluency and sent for analysis. 15 cells were examined per culture and three exemplary karyotypes were provided as results.
[000124] Proof of principle single cell RNA-sequencing. All barcoded iPS cell lines were cultured in parallel as described above and dissociated using 0.5% EDTA, and 600K cells from each cell line were combined. Prior to this, four cell lines were additionally labelled with different TotalSeq-A cell hashing antibodies (Stoeckius et al, 2018; Genome Biology) according to the manufacturer’s protocol. The combined sample was transferred into 2% BSA (Sigma Aldrich, Cat#A9418) in PBS and stained with Propidium Iodide, and 500K viable cells were sorted using a BD Influx™ Cell Sorter (Becton Dickinson, San Jose, CA) with FACSDiva software (BD Biosciences).
[000125] Single cell RNA-seq libraries were generated using the 10X Genomics Chromium 3' Gene Expression (v2) protocol, with minor modifications to the workflow, outlined by Stoeckius and Smibert (https://citeseq.files.wordpress.com/2019/02/cell_hashing_protocol_190213.pdf) to capture the fraction of droplets containing the HTO-derived cDNA (<180bp).
[000126] HTO additive primers and Illumina TruSeq DNA D7xx_s primer (containing i7 index) were ordered from IDT, and used according to the cell hashing protocol. Hashtag libraries were quantified using the Agilent Bioanalyzer.
[000127] Sequencing was performed on the Illumina NextSeq instrument using the NextSeq High Output 150-cycle kit, and the gene expression and HTO libraries were pooled on a single flowcell at a ratio of 90:10. The 10x Genomics sample index used was SI-GA-D11, and the flowcell ID containing the raw data was 190114_NS500239_0333_AHHTLFBGX9.
[000128] The standard 10X Genomics v2 3' gene expression library was processed using the 10X Genomics cellranger pipeline to derive gene expression count matrices. HTO-tagged cells were identified and extracted from the fastq files using CITE-seq-Count with default parameters (https://hoohm.github.io/CITE-seq-Count/), to generate a count matrix of cells and their respective HTO expression values. This allowed the pooled hashtagged cells to be uniquely identified and deconvoluted.
[000129] From the 3' gene expression data, barcoded cells were identified by the expression of a barcode from the whitelist; the whitelist barcodes were included in the transcriptome reference with unique identifiers (e.g. 'bc01').
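As a rough illustration of how a barcode whitelist with identifiers such as 'bc01' might be represented, the Python sketch below builds FASTA-style entries from the 18 barcode sequences listed in the claims (SEQ ID NO: 1 to 18) and looks up a barcode in a read by exact match. The pairing of 'bc01' with SEQ ID NO: 1 (and so on), the function names, and the exact-match strategy are assumptions made for illustration; the reference entries actually used may have included flanking cassette sequence or the opposite strand.

```python
# Illustrative sketch: whitelist of 15 bp barcodes and exact-match lookup in a read.
from typing import Optional

BARCODES = [
    "GTGCCGACCAGTATC", "ACCACCTGACGCAAA", "ACGGCCCTATTTAAG",
    "AGCCCTGAGTCAGTA", "CAAATTCAAGGCGAT", "AATCTTGTATAAGTA",
    "CGTCACATTTGAGTC", "GGACCTTCTTACGAC", "TACCAATTGTACGCT",
    "CGCTAATGTCCGTTT", "ACCCTACGGTGGTTC", "TGTCCAAGCTGCAAT",
    "GTGTATTTAAAGCCG", "ACACCCGTATGTCAC", "TCTTTCGATGGCGGT",
    "GAGCACCCGCGTATT", "TTATTATGTTCTAGC", "AATCTCTGAAACGAA",
]
BARCODE_LENGTH = 15

# Assumed naming: bc01 <-> SEQ ID NO: 1, ..., bc18 <-> SEQ ID NO: 18.
WHITELIST = {f"bc{i:02d}": seq for i, seq in enumerate(BARCODES, start=1)}
LOOKUP = {seq: name for name, seq in WHITELIST.items()}


def whitelist_fasta() -> str:
    """Return the whitelist as FASTA text, one record per barcode."""
    return "".join(f">{name}\n{seq}\n" for name, seq in WHITELIST.items())


def find_barcode(read: str) -> Optional[str]:
    """Return the identifier of the first whitelist barcode found in the read, else None."""
    read = read.upper()
    for start in range(len(read) - BARCODE_LENGTH + 1):
        name = LOOKUP.get(read[start:start + BARCODE_LENGTH])
        if name is not None:
            return name
    return None
```

Exact matching is the simplest choice; a production pipeline would typically tolerate sequencing errors, for example by allowing one mismatch.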
[000130] Results
Plasmid cloning and cell engineering of 18 barcoded WTC-11 iPSC lines were performed as described in the methods above and outlined in Figure 1a. All cell lines underwent rigorous quality testing in terms of genomic aberrations, pluripotency and stable integration (Figure 2).
[000131] The inventors performed single cell RNA-sequencing on a pooled sample from all 18 barcoded iPS cell lines and found similar expression levels of barcode transcripts in all cell lines. Labelling of four different cell lines with cell hashing antibodies yielded a strong correlation of these external barcodes with their internal counterparts, validating the accurate detection of barcoding transcripts. Furthermore, we found comparable expression levels of pluripotency markers in all cell lines, and the dimensionality reduction visualisation displays an even distribution of cell lines without any effect on clustering (Figure 2).
Example 2 - Multiplexing of signalling perturbations during mesendoderm differentiation.
[000132] The inventors next tested the utility of the barcoded cell lines for studying mesendodermal differentiation.
[000133] Materials and methods
[000134] 18 barcoded iPS cell lines (“WTC BC01-BC18”) were generated as outlined in Example 1. BC01-BC18 were cultured in parallel and multiple temporally staggered set-ups of mesendoderm directed differentiation using a monolayer platform were performed as follows. Differentiation was induced on day 0 by changing the culture media to RPMI (ThermoFisher, Cat# 11875119) containing 3 µM CHIR99021 (Stem Cell Technologies, Cat# 72054), 500 µg/mL BSA (Sigma Aldrich, Cat# A9418), and 213 µg/mL ascorbic acid (Sigma Aldrich, Cat# A8960). On day 3, the media was replaced with RPMI containing 500 µg/mL BSA, 213 µg/mL ascorbic acid and one of the signalling molecules listed in Table 2 below. On day 5, the media was exchanged for RPMI containing 500 µg/mL BSA and 213 µg/mL ascorbic acid without supplemental cytokines. On day 7, the cultures were fed with RPMI containing 1x B27 supplement plus insulin (Life Technologies Australia, Cat# 17504001).
[000135] Cell lines were divided into 2 batches (BC01-BC09 and BC10-BC18) to allow for capture of biological duplicates for all 9 conditions. Cells were collected for sc-RNA-sequencing on day 2 as a reference, prior to any perturbations, as well as on days 5 and 9 of differentiation. In total, 3 sequencing libraries were generated, with each library consisting of all 18 cell lines. A combination of 2 different timepoints from the two batches was pooled in each library to allow for easy detection and removal of potential batch effects during downstream analysis (Table 2).
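The pooling logic described above can be sketched as a lookup table from (library, barcode) to (batch, day, condition). The specific assignment of time points to libraries below is hypothetical, since Table 2 is not reproduced in this section, and the condition numbers 1 to 9 are placeholders for the listed signalling molecules; on day 2 no perturbation has yet been applied, so the condition is nominal. The sketch only demonstrates that 18 barcodes, 3 time points and 3 libraries resolve 54 samples with two different time points per library.

```python
# Illustrative sketch: a hypothetical pooling plan resolving 54 samples in 3 libraries.
from collections import Counter

BATCHES = {
    "A": [f"BC{i:02d}" for i in range(1, 10)],    # BC01-BC09
    "B": [f"BC{i:02d}" for i in range(10, 19)],   # BC10-BC18
}

# Hypothetical plan: which collection day of each batch goes into each library.
LIBRARY_PLAN = {
    1: {"A": 2, "B": 5},
    2: {"A": 5, "B": 9},
    3: {"A": 9, "B": 2},
}


def build_design() -> dict:
    """Map (library, barcode) -> sample annotation."""
    design = {}
    for library, plan in LIBRARY_PLAN.items():
        for batch, day in plan.items():
            for condition, barcode in enumerate(BATCHES[batch], start=1):
                design[(library, barcode)] = {
                    "batch": batch,
                    "day": day,
                    "condition": condition,  # placeholder for a Table 2 treatment
                }
    return design


design = build_design()
replicates = Counter((s["condition"], s["day"]) for s in design.values())
```

Because each library mixes two time points from different batches, a library-specific effect can be separated from a time-point effect during analysis.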
[000136] Perturbation experiment single cell library preparation and sequencing. Sample pools were assessed for quality using a hemocytometer with Trypan Blue exclusion. Cell viability ranged from 82-88%; cell concentration was between 1.3E+06 and 2.1.9E+06 cells/mL. Chromium Single Cell 3’ v3 (10x Genomics) reactions were performed for each sample according to the manufacturer’s protocol, targeting 20,000 cells per reaction. 11 cycles of cDNA amplification were performed in a C1000 Touch thermocycler with Deep Well Reaction Module (Bio-Rad). After clean-up of full-length amplified cDNA, 10 µL was used for construction of the gene expression library according to the manufacturer’s protocol, with 11 indexing PCR cycles. Additionally, 5 µL of full-length amplified cDNA was used to generate a barcoding library for each sample pool. Briefly, a first round of PCR was performed to specifically amplify cDNA regions containing the barcode cassette, and append partial P5 and P7 sequencing adaptors. Each reaction contained 1x KAPA HiFi HotStart ReadyMix and 300 nM each of the barcode_amp_F and barcode_amp_R primers in a final volume of 50 µL. A 2-step PCR protocol was performed with annealing/extension at 71°C for 30 s, for six cycles. After a 1.2X SPRI clean-up to remove primers, a second round of PCR was performed with the entire volume of purified product from PCR1. Each reaction contained 1x KAPA HiFi HotStart ReadyMix, 500 nM SI-PCR primer (identical sequence to the primer in the Chromium kit), and 5 µL of a unique i7 indexed R primer from the Chromium i7 Multiplex Kit (10x Genomics), in a total volume of 50 µL. PCR was performed as for the SI-PCR protocol in the gene expression library construction workflow. Eight indexing PCR cycles were performed, for a total of 14 cycles over two rounds of PCR. Final barcoding libraries were purified using 1X SPRI beads, and fragment size and library concentration were verified along with gene expression libraries using a BioAnalyzer DNA High Sensitivity Kit (Agilent). Final gene expression libraries were 62-82 nM, with average size 457-494 bp. Barcoding libraries were 30-38 nM, with average size 358-362 bp.
[000137] A single pool was prepared from the three gene expression and three barcoding libraries for sequencing. The samples were pooled equimolar within each library type, and combined so that the gene expression libraries together made up 90% of the pool, and the barcoding libraries 10%. Sequencing was performed using the Illumina NovaSeq 6000 instrument using an S4 Reagent Kit v1.5 (200 cycles). Gene expression count matrices were derived using the standard 10X Genomics cellranger pipeline.
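The pooling rule described above (equimolar within each library type, 90% gene expression and 10% barcoding overall) can be expressed as a short calculation. The sketch below is illustrative: the function name, the total pool amount and the example concentrations are hypothetical values, the latter chosen within the reported 62-82 nM and 30-38 nM ranges.

```python
# Illustrative sketch: volumes for a 90:10 pool that is equimolar within each library type.

def pooling_volumes(gex_nm: dict, barcode_nm: dict,
                    total_fmol: float = 1000.0, gex_fraction: float = 0.9) -> dict:
    """Return the volume in µL to take from each library.

    Concentrations are in nM, which is numerically equal to fmol/µL,
    so volume (µL) = amount (fmol) / concentration (nM).
    """
    volumes = {}
    for group, share in ((gex_nm, gex_fraction), (barcode_nm, 1.0 - gex_fraction)):
        fmol_each = total_fmol * share / len(group)  # equimolar within the group
        for name, conc_nm in group.items():
            volumes[name] = fmol_each / conc_nm
    return volumes


# Hypothetical concentrations within the reported ranges.
example_gex = {"gex_lib1": 62.0, "gex_lib2": 70.0, "gex_lib3": 82.0}
example_barcode = {"bc_lib1": 30.0, "bc_lib2": 34.0, "bc_lib3": 38.0}
example_volumes = pooling_volumes(example_gex, example_barcode)
```

More concentrated libraries contribute a smaller volume, so that every library of a given type contributes the same number of molecules to the pool.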
[000138] Demultiplexing and quality control. Barcoding reads used to assign sample barcodes to each cell were from the separately amplified and sequenced barcode libraries, and cell barcodes without both transcriptome and sample barcoding reads were removed. For each barcode sequencing library, the ‘HTODemux’ function in the Seurat R package (v3.0) was used to determine the dominant sample barcode for each cell and annotate negative and doublet cells based on their sample barcode reads alone. Three transcriptome-based doublet detection methods in the scds R package (v1.2.0) were used to further assign doublet annotations to each cell, and cells labelled as doublets by at least three methods were removed. Transcriptome-based cell filtering as part of the Seurat pipeline removed cells with fewer than 2000 or greater than 7500 detected genes; fewer than 5000 or greater than 50,000 total read counts; or mitochondrial reads accounting for greater than 25% of total reads.
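The filtering thresholds stated above can be summarised as a single predicate. The Python sketch below is not the Seurat or scds implementation; it simply restates the thresholds, with a hypothetical `CellQC` record holding per-cell metrics and a count of how many methods flagged the cell as a doublet.

```python
# Illustrative sketch: the stated QC thresholds as a per-cell predicate.
from dataclasses import dataclass


@dataclass
class CellQC:
    n_genes: int                    # detected genes
    n_counts: int                   # total read counts
    pct_mito: float                 # mitochondrial reads as a percentage of total reads
    doublet_votes: int              # number of methods calling the cell a doublet
    has_barcode_reads: bool = True  # present in both transcriptome and barcode data


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives all filters described in the text."""
    return (
        cell.has_barcode_reads
        and cell.doublet_votes < 3             # removed if flagged by at least three methods
        and 2000 <= cell.n_genes <= 7500
        and 5000 <= cell.n_counts <= 50_000
        and cell.pct_mito <= 25.0
    )


cells = [
    CellQC(n_genes=3500, n_counts=12_000, pct_mito=8.0, doublet_votes=0),
    CellQC(n_genes=1500, n_counts=12_000, pct_mito=8.0, doublet_votes=0),
    CellQC(n_genes=3500, n_counts=12_000, pct_mito=8.0, doublet_votes=3),
]
kept = [c for c in cells if passes_qc(c)]
```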
[000139] Following filtering, sample barcodes were assigned to the remaining cells based on the barcode with the highest expression in each cell.
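Assigning each cell to the barcode with the highest expression amounts to an arg-max over that cell's barcode counts. The sketch below is a minimal stand-in, assuming per-cell counts are available as a dictionary keyed by barcode identifier; the function name is hypothetical, and ties are resolved in favour of the first barcode encountered, which is an arbitrary choice.

```python
# Illustrative sketch: assign a cell to its most highly expressed sample barcode.
from typing import Optional


def assign_sample_barcode(barcode_counts: dict) -> Optional[str]:
    """Return the barcode with the highest count, or None if no barcode reads exist."""
    if not barcode_counts or max(barcode_counts.values()) == 0:
        return None
    return max(barcode_counts, key=barcode_counts.get)


assignment = assign_sample_barcode({"bc01": 2, "bc07": 154, "bc12": 5})
```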
[000140] Downstream analysis of single-cell data. Normalisation, UMAP dimensionality reduction, and clustering of the data were done following the standard Seurat pipeline. The clustering resolution used was 0.2, and we assigned cell type labels by interrogating marker gene expression in each cluster. To visualise marker gene expression in the UMAP plots, we used the R package Nebulosa (v0.99.92), which represents gene expression using kernel density estimation to account for overplotting and noise from expression drop-out.
[000141] For RNA velocity estimation, we used velocyto (v0.17.15) to count spliced and unspliced transcripts from the 10x cellranger output using the ‘run10x’ command. The resulting count matrices were then input into the scVelo (v0.2.1) pipeline for pre-processing, stochastic RNA velocity estimation and embedding onto the UMAP coordinates generated from the Seurat pipeline.
[000142] Results
[000143] The inventors adapted a well-established monolayer-based cardiac differentiation protocol, which firstly guides cells towards mesendoderm lineages by small molecule activation of WNT signalling (Figure 3a). Based on this protocol, we either captured cells every 24 hours from day 2 to day 9 to generate a high resolution time course of cell states during differentiation or perturbed known developmental signalling pathways between day 3 and day 5 and sampled cells on days 2, 5 and 9 (Figure 3b). For single cell RNA sequencing, samples were multiplexed using two different approaches. While the 8 time course samples were labelled with commercially available cell hashing antibodies, genetically engineered barcode iPS cell lines produced according to the methods outlined in Example 1 were used to combine a total of 54 sequencing reactions into only 3 libraries (Figure 3c, 3e). Both experiments were sequenced using the 10X Chromium platform and samples were demultiplexed according to their barcoding method (Figure 3d).
[000144] Temporal dissection of mesendoderm differentiation. Demultiplexing, doublet calling and preprocessing of the sequencing data were performed using 4 different doublet detection algorithms, and low quality cells were removed from the dataset based on UMI and feature count, as well as percentage of mitochondrial and ribosomal RNA content (Figure 8). After filtering, 13,682 cells were subjected to normalization, UMAP dimensionality reduction, and unsupervised clustering following the standard pipeline of Seurat, which identified 10 distinct clusters of both endodermal and mesodermal cell lineages (Figure 4a). The inventors analyzed cell populations based on their day of appearance within the time course (Figure 4b-c) alongside known marker gene expression to identify transcriptional phenotypes of subpopulations (Figure 4d-e). On days 2 and 3 of differentiation, cells divided into either FOXA2-positive definitive endoderm (cluster 7) or MIXL1-expressing mesendoderm (cluster 6). Following on, we found 3 separate groups of cells spanning from day 4 through to day 9, one of which was a small population of CDH5-positive endocardial endothelium (cluster 8). The other two groups comprised multiple clusters of endodermal (clusters 0, 1 and 3, all expressing FOXA2) or mesodermal identities (clusters 2, 4, 5 and 9). The endoderm cells on day 4 consisted entirely of primitive gut endoderm (cluster 3) highly expressing the transcription factors FOXA2 and GATA3. SOX2-positive anterior foregut cells arose on day 5, declining in cell numbers towards later stages of differentiation, whereas TTR-positive posterior foregut was first detected on day 6, persisting until day 9. The mesodermal cells on day 4 were all HAND1-positive cardiac progenitor cells belonging to cluster 2, whilst on days 5 to 8 multiple populations existed in parallel. We detected paraxial and lateral plate mesoderm (PAX3+ and PRRX1+, cluster 4) from day 5 to day 7, a small population of NOG+/T+ axial mesoderm (cluster 9) on days 6 and 7, as well as a MYH6+ cardiomyocyte population (cluster 5) from as early as day 5 onwards (Figure 4c-e).
[000145] To predict future states of individual cells, identify branching points of lineages and further examine transcription kinetics, we applied RNA velocity to our time course dataset (Figure 4f). Firstly, this analysis verified the initial separation on day 2 of clusters 6 and 7 into a mesendoderm population and definitive endoderm. It also showed a clear direction of transcription kinetics in cluster 8 that overlaps with the time points of cell capture. Most importantly, it highlighted the complexity of transient progenitor cell states in both endodermal and mesodermal lineages.
[000146] Taken together, these data show iPSC differentiation into committed endodermal and mesodermal cell types via multiple different progenitor cell populations in a highly dynamic process.
[000147] Using a directed differentiation protocol for derivation of mesendodermal cell types (Figure 3a), we designed an experiment for systematic activation or inhibition of known developmental signalling pathways, including the Wnt, BMP, and VEGF pathways (Figure 3b). Barcoded cells were split into two groups to enable analysis of biological duplicates for each time point and condition. Cells were captured prior to perturbation at day 2, then at day 5 after the perturbation treatment (days 3-5). Lastly, committed cell types were captured on day 9 of differentiation. A total of 54 experimental samples were mixed based on unique barcode combinations into 3 multiplexed samples and submitted for library preparation and sequencing. Computational demultiplexing of barcodes enabled each transcript to be mapped to a specific cell and each cell to be mapped to a specific experimental condition and time point.
[000148] After data quality control, we successfully captured 48,526 cells, which are represented using dimensionality reduction methods (Figure 5a). Analysis of cells by time point, sequencing library, and treatment condition demonstrates effective analysis of multiplexed data consistent with the original experimental design (Figure 5b-c). Marker genes were used to identify the diverse cell types captured by the multiplexed data (Figure 5d-e). Quantitative analysis of all input treatment conditions was used to identify how different treatments contribute to various cell types (Figure 5f).
[000149] Taken together, these data demonstrate the success of deriving engineered barcode iPSCs for multiplexed single cell analysis. Barcoded cell lines have the potential for utility in diverse multiplexing endpoints, including those not developed, in which analysis of unique barcodes embedded in the genomic DNA or RNA of input cell types provides a way of multiplexing endpoints for scalable analysis of iPSCs or differentiated cell types.
Example 3 - Comparative Example Cell Hashing libraries vs Internal Barcode libraries
[000150] In this example, 3 RNA-seq libraries were created as described in Example 1 (i.e. using barcoded iPSC cell lines), resulting in three sets of data comprising three reactions each:
Dataset 1: Cell hashing library: seu_bc0Xav, seu_bc5Xav, seu_bcDox;
Dataset 2: Internal barcoding library with separately amplified barcoding reads: seul, seu2, seu3; and
Dataset 3: Internal barcoding where barcoding reads were obtained solely from the transcriptome library: libl, lib2, lib3
The internal barcoding libraries (datasets 2 and 3) have the same transcriptomes, but represent two different methods of detecting the barcoding reads.
[000151] The first dataset has very consistent cell numbers (19997, 19995, 19991), whereas the second and third datasets were more overloaded and initially had more cells (24721, 24828, 26583).
[000152] Since the barcoding reads were sequenced separately to the transcriptome in dataset 2, the cell barcodes that were not picked up in both the libraries were excluded, resulting in fewer cells in this dataset (19656, 19708, 19349).
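The exclusion step described above is a set intersection of the cell barcodes observed in the transcriptome library and in the separately sequenced barcode library. The sketch below illustrates this with hypothetical function names and toy cell barcodes; the retained fractions are computed from the cell numbers reported in this example, on the assumption that the before and after counts are listed in the same library order.

```python
# Illustrative sketch: keep only cell barcodes seen in both libraries.

def shared_cell_barcodes(transcriptome_cells, barcode_library_cells) -> set:
    """Cell barcodes present in both the transcriptome and the barcode library."""
    return set(transcriptome_cells) & set(barcode_library_cells)


def retained_fraction(n_before: int, n_after: int) -> float:
    """Fraction of cells kept after the intersection."""
    return n_after / n_before


# Toy example with made-up 10x-style cell barcodes.
toy_shared = shared_cell_barcodes(
    ["AAACCTGA", "AAACGGGT", "AAAGATGC"],
    ["AAACGGGT", "AAAGATGC", "AACTCAGG"],
)

# Reported cell numbers before and after the intersection (assumed to be in matching order).
before = [24721, 24828, 26583]
after = [19656, 19708, 19349]
fractions = [retained_fraction(b, a) for b, a in zip(before, after)]
```

Under that ordering assumption, roughly 73% to 80% of cells in dataset 2 were retained after requiring reads in both libraries.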
[000153] The number of genes, library size, and percentages of mitochondrial and ribosomal reads were compared as measures of transcriptome quality. As a measure of the efficiency of barcoding using the cell hashing method and the internal barcoding method described herein, the percentage of cells assigned singlet (confident) barcodes and the percentage of cells assigned no barcode were compared, with transcriptome quality filtering applied to exclude cells with a high number of genes or an outlier library size.
[000154] Results
[000155] Comparisons of transcriptomes of cells prepared using the cell hashing method and those prepared using the internal barcoding strategy described herein demonstrate that the percentage of mitochondrial reads is significantly higher in cells prepared using cell hashing methods (Figure 6): percent mitochondrial reads: dataset 1 vs. dataset 2: p = 0.01226; dataset 1 vs. dataset 3: p = 0.007659; dataset 2 vs. dataset 3: p = 0.5867.
[000156] These data highlight that cells prepared using the internal barcoding methods described herein are not as stressed at the time of cell capture and yield transcriptomes of higher quality. Without wishing to be bound by theory, this is likely due to the longer incubation and handling times of the hashing method, which requires the addition of hashing antibodies.
[000157] As detailed above, datasets 1 and 2 have two libraries generated: 1 for the transcriptome and 1 for the barcodes. Dataset 3 was generated by detecting the internal barcodes in the transcriptome library alone (no separate library for the barcodes). The inventors have surprisingly found that the internal barcodes can be picked up with equal efficiency to sequencing the barcodes separately (Figure 7). Indeed, there are slightly more singlet cells in the internal barcoding dataset. These findings have significant implications for the reduction of cost and complexity of workflows, since an operator utilizing the internally barcoded cells and methods described herein need only sequence an RNA library and parse genes back to cells and cells back to experimental conditions in 1 library.
Claims
1. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
2. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
3. The method of claim 1 or 2 wherein the step of generating a multiplexed single cell RNA-seq library comprises: creating one or more gene expression libraries; creating one or more separate barcode libraries via creation of purified cDNA and amplification of regions of said cDNA comprising said genetic barcode; and pooling said gene expression library and said barcode library.
4. The method of claim 3, wherein more than one gene expression library and more than one barcode library are created and pooled.
5. The method of claim 3 or 4, wherein the one or more gene expression libraries comprise approximately 90% of the pool and the one or more barcode libraries comprise approximately 10% of the pool.
6. The method of any one of the preceding claims, wherein the step of generating a multiplexed single cell RNA-seq library comprises creating a gene expression library via creation of purified cDNA and amplification of said cDNA but does not include the generation of a separate barcode library.
7. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations of cells or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.
8. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.
9. The method of any one of the preceding claims, wherein providing a plurality of isogenic populations of cells or iPSCs comprises incorporating said genetic barcode into said targeted location via CRISPR/Cas9-mediated integration.
10. The method of any one of the preceding claims, wherein said genetic barcode is incorporated into a genomic safe harbor locus.
11. The method of claim 10 wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).
12. The method of any one of the preceding claims wherein said genetic barcode is fluorescently labelled.
13. The method of any one of the preceding claims wherein said genetic barcode is from 5 - 20 bp.
14. The method of claim 13 wherein said genetic barcode is 15 bp.
15. The method of claim 13, wherein said genetic barcode is selected from the group consisting of:
GTGCCGACCAGTATC (SEQ ID NO: 1);
ACCACCTGACGCAAA (SEQ ID NO: 2);
ACGGCCCTATTTAAG (SEQ ID NO: 3);
AGCCCTGAGTCAGTA (SEQ ID NO: 4);
CAAATTCAAGGCGAT (SEQ ID NO: 5);
AATCTTGTATAAGTA (SEQ ID NO: 6);
CGTCACATTTGAGTC (SEQ ID NO: 7);
GGACCTTCTTACGAC (SEQ ID NO: 8);
TACCAATTGTACGCT (SEQ ID NO: 9);
CGCTAATGTCCGTTT (SEQ ID NO: 10);
ACCCTACGGTGGTTC (SEQ ID NO: 11);
TGTCCAAGCTGCAAT (SEQ ID NO: 12);
GTGTATTTAAAGCCG (SEQ ID NO: 13);
ACACCCGTATGTCAC (SEQ ID NO: 14);
TCTTTCGATGGCGGT (SEQ ID NO: 15);
GAGCACCCGCGTATT (SEQ ID NO: 16);
TTATTATGTTCTAGC (SEQ ID NO: 17); and
AATCTCTGAAACGAA (SEQ ID NO: 18).
16. The method of any one of the preceding claims, wherein prior to generating said multiplexed RNA-seq library, one or more of the populations of cells and/or progeny thereof are mixed together with one or more other populations of cells and/or progeny thereof.
17. The method of any one of the preceding claims, wherein manipulating one or more of the populations of cells or iPSCs or progeny thereof comprises contacting the cells with an agent of interest which results in a biologically measurable perturbation to a cell.
18. The method of any one of the preceding claims, wherein manipulating one or more of the populations of cells or progeny thereof comprises altering the culture conditions of, or genetically perturbing the cells of the one or more populations or progeny thereof.
19. The method of claim 18, wherein altering the culture conditions comprises contacting the cells or progeny thereof with an agent of interest, contacting the cells or progeny thereof with another cell, co-culturing the cells or progeny thereof with another cell, or co-culturing the cells and/or progeny thereof in an organoid.
20. The method of claim 17, 18 or 19, wherein the agent of interest is a small molecule, a polypeptide, an antibody, a nucleic acid molecule, an RNAi, a vector comprising a nucleic acid molecule, an antisense oligonucleotide, or a gene editing system (e.g. CRISPR/Cas9).
21. The method of any one of claims 17 to 20 when used in a high-throughput drug screening assay.
22. The method of any one of the preceding claims wherein said progeny are differentiated progeny.
23. A plurality of isogenic populations of cells, or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.
24. The plurality of isogenic populations of cells, or progeny thereof, of claim 23, wherein said genetic barcode is incorporated into a genomic safe harbor locus.
25. The plurality of isogenic populations of cells, or progeny thereof, of claim 24, wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).
26. The plurality of isogenic populations of cells, or progeny thereof, of any one of claims 23 - 25, wherein said genetic barcode is fluorescently labelled.
27. The plurality of isogenic populations of cells, or progeny thereof, of any one of claims 20 - 22, wherein said genetic barcode is from 10 - 20 bp.
28. The plurality of isogenic populations of cells, or progeny thereof, of claim 24, wherein said genetic barcode is 15 bp.
29. The plurality of isogenic populations of cells, or progeny thereof, of claim 25, wherein said genetic barcode is selected from the group consisting of:
GTGCCGACCAGTATC (SEQ ID NO: 1);
ACCACCTGACGCAAA (SEQ ID NO: 2);
ACGGCCCTATTTAAG (SEQ ID NO: 3);
AGCCCTGAGTCAGTA (SEQ ID NO: 4);
CAAATTCAAGGCGAT (SEQ ID NO: 5);
AATCTTGTATAAGTA (SEQ ID NO: 6);
CGTCACATTTGAGTC (SEQ ID NO: 7);
GGACCTTCTTACGAC (SEQ ID NO: 8);
TACCAATTGTACGCT (SEQ ID NO: 9);
CGCTAATGTCCGTTT (SEQ ID NO: 10);
ACCCTACGGTGGTTC (SEQ ID NO: 11);
TGTCCAAGCTGCAAT (SEQ ID NO: 12);
GTGTATTTAAAGCCG (SEQ ID NO: 13);
ACACCCGTATGTCAC (SEQ ID NO: 14);
TCTTTCGATGGCGGT (SEQ ID NO: 15);
GAGCACCCGCGTATT (SEQ ID NO: 16);
TTATTATGTTCTAGC (SEQ ID NO: 17); and
AATCTCTGAAACGAA (SEQ ID NO: 18).
30. The plurality of isogenic populations of cells of any one of claims 23 - 29, wherein the cells are iPSCs or progeny thereof.
31. The progeny of cells of any one of claims 23 - 30, wherein the progeny are differentiated progeny.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021903971 | 2021-12-08 | ||
AU2021903971A AU2021903971A0 (en) | 2021-12-08 | Methods and compositions for multiplexing cell analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023102610A1 (en) | 2023-06-15 |
Family
ID=86729303
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/AU2022/051476 WO2023102610A1 (en) | 2021-12-08 | 2022-12-08 | Methods and compositions for multiplexing cell analysis |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023102610A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001053532A2 (en) * | 2000-01-20 | 2001-07-26 | Rosetta Inpharmatics, Inc. | Barcoded synthetic lethal screening to identify drug targets |
WO2020072531A1 (en) * | 2018-10-02 | 2020-04-09 | The Board Of Trustees Of The Leland Stanford Junior University | Compositions and methods for multiplexed quantitative analysis of cell lineages |
Non-Patent Citations (9)
Title |
---|
BOWLING SARAH; SRITHARAN DULUXAN; OSORIO FERNANDO G.; NGUYEN MAXIMILIAN; CHEUNG PRISCILLA; RODRIGUEZ-FRATICELLI ALEJO; PATEL SACHI: "An Engineered CRISPR-Cas9 Mouse Line for Simultaneous Readout of Lineage Histories and Gene Expression Profiles in Single Cells", CELL, ELSEVIER, AMSTERDAM NL, vol. 181, no. 6, 14 May 2020 (2020-05-14), Amsterdam NL , pages 1410, XP086181116, ISSN: 0092-8674, DOI: 10.1016/j.cell.2020.04.048 * |
EL-NACHEF DANNY, SHI KEVIN, BEUSSMAN KEVIN M., MARTINEZ REFUGIO, REGIER MARY C., EVERETT GUY W., MURRY CHARLES E., STEVENS KELLY R: "A Rainbow Reporter Tracks Single Cells and Reveals Heterogeneous Cellular Dynamics among Pluripotent Stem Cells and Their Differentiated Derivatives", STEM CELL REPORTS, CELL PRESS, UNITED STATES, vol. 15, no. 1, 1 July 2020 (2020-07-01), United States , pages 226 - 241, XP093073525, ISSN: 2213-6711, DOI: 10.1016/j.stemcr.2020.06.005 * |
PEI WEIKE, FEYERABEND THORSTEN B., RÖSSLER JENS, WANG XI, POSTRACH DANIEL, BUSCH KATRIN, RODE IMMANUEL, KLAPPROTH KAY, DIETLEIN NI: "Polylox barcoding reveals haematopoietic stem cell fates realized in vivo", NATURE, NATURE PUBLISHING GROUP UK, LONDON, vol. 548, no. 7668, 24 August 2017 (2017-08-24), London, pages 456 - 460, XP093073520, ISSN: 0028-0836, DOI: 10.1038/nature23653 * |
PORTER SN ET AL.: "Lentiviral and targeted cellular barcoding reveals ongoing clonal dynamics of cell lines in vitro and in vivo", GENOME BIOLOGY, vol. 15, no. 5, May 2014 (2014-05-01), pages R75, XP021191459, DOI: 10.1186/gb-2014-15-5-r75 * |
SHARMA RAJIV, DEVER DANIEL P., LEE CIARAN M., AZIZI ARMON, PAN YIDAN, CAMARENA JOAB, KÖHNKE THOMAS, BAO GANG, PORTEUS MATTHEW H., : "The TRACE-Seq method tracks recombination alleles and identifies clonal reconstitution dynamics of gene targeted human hematopoietic stem cells", NATURE COMMUNICATIONS, vol. 12, no. 1, XP093073518, DOI: 10.1038/s41467-020-20792-y * |
SHEN SOPHIE, WERNER TESSA, SUN YULIANGZI, SHIM WOO JUN, LUKOWSKI SAMUEL, ANDERSEN STACEY, CHIU HAN SHENG, XIA DI, PHAM DUY, SU ZEZ: "An integrated cell barcoding and computational analysis pipeline for scalable analysis of differentiation at single-cell resolution", BIORXIV, 27 October 2022 (2022-10-27), XP093073530, [retrieved on 20230814], DOI: 10.1101/2022.10.12.511862 * |
SHOEMAKER DD ET AL.: "Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy", NATURE GENETICS, vol. 14, no. 4, December 1996 (1996-12-01), pages 450 - 6, XP002043431, DOI: 10.1038/ng1296-450 * |
WINCOTT CEIRE J., SRITHARAN GAYATHRI, BENNS HENRY J., BUNYAN MONIQUE, ALVES EDUARDO, FRICKEL EVA M., EWALD SARAH E., CHILD MATTHE: "The host brain is permissive to colonization by Toxoplasma gondii", BIORXIV, 7 August 2020 (2020-08-07), XP093073523, Retrieved from the Internet <URL:https://www.biorxiv.org/content/10.1101/2020.08.06.239822v1.full.pdf> [retrieved on 20230814], DOI: 10.1101/2020.08.06.239822 * |
WINZELER EA ET AL.: "Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis", SCIENCE, vol. 285, no. 5429, 6 August 1999 (1999-08-06), pages 901 - 6, XP001025980, DOI: 10.1126/science.285.5429.901 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22902523; Country of ref document: EP; Kind code of ref document: A1 |