WO2024091801A2 - Methods and compositions for inducing cell differentiation - Google Patents
- Publication number: WO2024091801A2 (PCT/US2023/076715)
- Authority: WO, WIPO (PCT)
- Prior art keywords: psc, pscs, reading frame, open reading, cell
Links
- 238000000034 method Methods 0.000 title claims abstract description 335
- 239000000203 mixture Substances 0.000 title claims abstract description 36
- 230000001939 inductive effect Effects 0.000 title claims description 155
- 230000024245 cell differentiation Effects 0.000 title description 2
- 210000004027 cell Anatomy 0.000 claims abstract description 421
- 210000001778 pluripotent stem cell Anatomy 0.000 claims description 1011
- 102000040430 polynucleotide Human genes 0.000 claims description 454
- 108091033319 polynucleotide Proteins 0.000 claims description 454
- 239000002157 polynucleotide Substances 0.000 claims description 454
- 108700026244 Open Reading Frames Proteins 0.000 claims description 437
- 108090000623 proteins and genes Proteins 0.000 claims description 293
- 102000004169 proteins and genes Human genes 0.000 claims description 276
- 101000759565 Homo sapiens Zinc finger and BTB domain-containing protein 1 Proteins 0.000 claims description 183
- 102100023253 Zinc finger and BTB domain-containing protein 1 Human genes 0.000 claims description 183
- -1 RELA Proteins 0.000 claims description 149
- 102100022446 Transcription factor Sp4 Human genes 0.000 claims description 69
- 108010079362 Core Binding Factor Alpha 3 Subunit Proteins 0.000 claims description 68
- 102100025369 Runt-related transcription factor 3 Human genes 0.000 claims description 68
- 241000282414 Homo sapiens Species 0.000 claims description 65
- 102100039562 ETS translocation variant 3 Human genes 0.000 claims description 62
- 102100035237 GA-binding protein alpha chain Human genes 0.000 claims description 62
- 101000813726 Homo sapiens ETS translocation variant 3 Proteins 0.000 claims description 62
- 101001022105 Homo sapiens GA-binding protein alpha chain Proteins 0.000 claims description 62
- 101000909637 Homo sapiens Transcription factor COE1 Proteins 0.000 claims description 62
- 102100024207 Transcription factor COE1 Human genes 0.000 claims description 62
- 101000876829 Homo sapiens Protein C-ets-1 Proteins 0.000 claims description 61
- 102100020684 Krueppel-like factor 9 Human genes 0.000 claims description 61
- 102100035251 Protein C-ets-1 Human genes 0.000 claims description 61
- 101000979342 Homo sapiens Nuclear factor NF-kappa-B p105 subunit Proteins 0.000 claims description 60
- 102100023050 Nuclear factor NF-kappa-B p105 subunit Human genes 0.000 claims description 60
- 101000756787 Homo sapiens Transcription factor RFX3 Proteins 0.000 claims description 58
- 101000597045 Homo sapiens Transcriptional enhancer factor TEF-3 Proteins 0.000 claims description 58
- 102100022821 Transcription factor RFX3 Human genes 0.000 claims description 58
- 102100035148 Transcriptional enhancer factor TEF-3 Human genes 0.000 claims description 58
- 101000651211 Homo sapiens Transcription factor PU.1 Proteins 0.000 claims description 57
- 101001010792 Homo sapiens Transcriptional regulator ERG Proteins 0.000 claims description 57
- 102000004265 STAT2 Transcription Factor Human genes 0.000 claims description 57
- 108010081691 STAT2 Transcription Factor Proteins 0.000 claims description 57
- 102100027654 Transcription factor PU.1 Human genes 0.000 claims description 57
- 101000931462 Homo sapiens Protein FosB Proteins 0.000 claims description 55
- 102100020847 Protein FosB Human genes 0.000 claims description 55
- 238000012258 culturing Methods 0.000 claims description 49
- 230000001105 regulatory effect Effects 0.000 claims description 43
- 210000001130 astrocyte Anatomy 0.000 claims description 36
- 231100000433 cytotoxic Toxicity 0.000 claims description 34
- 230000001472 cytotoxic effect Effects 0.000 claims description 34
- 239000001963 growth medium Substances 0.000 claims description 31
- 210000003494 hepatocyte Anatomy 0.000 claims description 26
- 210000003719 b-lymphocyte Anatomy 0.000 claims description 23
- 210000003289 regulatory T cell Anatomy 0.000 claims description 23
- 210000001151 cytotoxic T lymphocyte Anatomy 0.000 claims description 20
- 102100039196 CX3C chemokine receptor 1 Human genes 0.000 claims description 11
- 101001077417 Gallus gallus Potassium voltage-gated channel subfamily H member 6 Proteins 0.000 claims description 10
- 230000002025 microglial effect Effects 0.000 claims description 10
- 102100022807 Potassium voltage-gated channel subfamily H member 2 Human genes 0.000 claims description 9
- 102100031702 Endoplasmic reticulum membrane sensor NFE2L1 Human genes 0.000 claims 11
- 101000588298 Homo sapiens Endoplasmic reticulum membrane sensor NFE2L1 Proteins 0.000 claims 11
- 101000577547 Homo sapiens Nuclear respiratory factor 1 Proteins 0.000 claims 11
- 102100022047 Hepatocyte nuclear factor 4-gamma Human genes 0.000 claims 6
- 101001045749 Homo sapiens Hepatocyte nuclear factor 4-gamma Proteins 0.000 claims 6
- 102100023226 Early growth response protein 1 Human genes 0.000 claims 5
- 101000823089 Equus caballus Alpha-1-antiproteinase 1 Proteins 0.000 claims 5
- 102100030334 Friend leukemia integration 1 transcription factor Human genes 0.000 claims 5
- 101001049697 Homo sapiens Early growth response protein 1 Proteins 0.000 claims 5
- 101001062996 Homo sapiens Friend leukemia integration 1 transcription factor Proteins 0.000 claims 5
- 101001139112 Homo sapiens Krueppel-like factor 9 Proteins 0.000 claims 5
- 101000893493 Homo sapiens Protein flightless-1 homolog Proteins 0.000 claims 5
- 102100022054 Hepatocyte nuclear factor 4-alpha Human genes 0.000 claims 2
- 101001045740 Homo sapiens Hepatocyte nuclear factor 4-alpha Proteins 0.000 claims 2
- 101150075175 Asgr1 gene Proteins 0.000 claims 1
- 102100024222 B-lymphocyte antigen CD19 Human genes 0.000 claims 1
- 102100027207 CD27 antigen Human genes 0.000 claims 1
- 102100032912 CD44 antigen Human genes 0.000 claims 1
- 108090000835 CX3C Chemokine Receptor 1 Proteins 0.000 claims 1
- 101000980825 Homo sapiens B-lymphocyte antigen CD19 Proteins 0.000 claims 1
- 101000914511 Homo sapiens CD27 antigen Proteins 0.000 claims 1
- 101000868273 Homo sapiens CD44 antigen Proteins 0.000 claims 1
- 101001046686 Homo sapiens Integrin alpha-M Proteins 0.000 claims 1
- 101001057504 Homo sapiens Interferon-stimulated gene 20 kDa protein Proteins 0.000 claims 1
- 101001055144 Homo sapiens Interleukin-2 receptor subunit alpha Proteins 0.000 claims 1
- 102100022338 Integrin alpha-M Human genes 0.000 claims 1
- 102100027268 Interferon-stimulated gene 20 kDa protein Human genes 0.000 claims 1
- 108091023040 Transcription factor Proteins 0.000 abstract description 220
- 102000040945 Transcription factor Human genes 0.000 abstract description 219
- 210000004263 induced pluripotent stem cell Anatomy 0.000 abstract description 70
- 102000000524 Nuclear Respiratory Factor 1 Human genes 0.000 description 118
- 108010016592 Nuclear Respiratory Factor 1 Proteins 0.000 description 118
- 230000004069 differentiation Effects 0.000 description 115
- 102000006752 Hepatocyte Nuclear Factor 4 Human genes 0.000 description 89
- 108010086524 Hepatocyte Nuclear Factor 4 Proteins 0.000 description 89
- 102000039446 nucleic acids Human genes 0.000 description 63
- 108020004707 nucleic acids Proteins 0.000 description 63
- 150000007523 nucleic acids Chemical class 0.000 description 63
- 108010012451 Sp4 Transcription Factor Proteins 0.000 description 62
- 102100030768 ETS domain-containing transcription factor ERF Human genes 0.000 description 60
- 101000938776 Homo sapiens ETS domain-containing transcription factor ERF Proteins 0.000 description 60
- 101710116942 Krueppel-like factor 9 Proteins 0.000 description 56
- 230000006698 induction Effects 0.000 description 55
- 102000048854 Friend leukemia integration 1 transcription factor Human genes 0.000 description 52
- 108010080989 Proto-Oncogene Protein c-fli-1 Proteins 0.000 description 52
- 102100029983 Transcriptional regulator ERG Human genes 0.000 description 52
- 210000000130 stem cell Anatomy 0.000 description 52
- 238000013179 statistical model Methods 0.000 description 48
- 230000014509 gene expression Effects 0.000 description 40
- 108010077544 Chromatin Proteins 0.000 description 35
- 210000003483 chromatin Anatomy 0.000 description 35
- 238000009826 distribution Methods 0.000 description 27
- 150000001413 amino acids Chemical group 0.000 description 25
- 238000007619 statistical method Methods 0.000 description 18
- SGKRLCUYIXIAHR-AKNGSSGZSA-N (4s,4ar,5s,5ar,6r,12ar)-4-(dimethylamino)-1,5,10,11,12a-pentahydroxy-6-methyl-3,12-dioxo-4a,5,5a,6-tetrahydro-4h-tetracene-2-carboxamide Chemical compound C1=CC=C2[C@H](C)[C@@H]([C@H](O)[C@@H]3[C@](C(O)=C(C(N)=O)C(=O)[C@H]3N(C)C)(O)C3=O)C3=C(O)C2=C1O SGKRLCUYIXIAHR-AKNGSSGZSA-N 0.000 description 17
- 229960003722 doxycycline Drugs 0.000 description 17
- 230000002018 overexpression Effects 0.000 description 17
- 239000003795 chemical substances by application Substances 0.000 description 15
- 230000002596 correlated effect Effects 0.000 description 14
- 210000001671 embryonic stem cell Anatomy 0.000 description 13
- 239000003112 inhibitor Substances 0.000 description 13
- 101710200897 Asialoglycoprotein receptor 1 Proteins 0.000 description 11
- 101000746022 Homo sapiens CX3C chemokine receptor 1 Proteins 0.000 description 11
- 102000004535 Tankyrases Human genes 0.000 description 11
- 108010017601 Tankyrases Proteins 0.000 description 11
- 238000013459 approach Methods 0.000 description 11
- 230000027455 binding Effects 0.000 description 11
- 238000012216 screening Methods 0.000 description 11
- 239000004017 serum-free culture medium Substances 0.000 description 11
- 150000003384 small molecules Chemical class 0.000 description 11
- 102100026292 Asialoglycoprotein receptor 1 Human genes 0.000 description 10
- 238000007477 logistic regression Methods 0.000 description 10
- 210000000274 microglia Anatomy 0.000 description 10
- 230000004913 activation Effects 0.000 description 9
- 230000003115 biocidal effect Effects 0.000 description 9
- 108091008053 gene clusters Proteins 0.000 description 9
- 239000003102 growth factor Substances 0.000 description 9
- 108020004414 DNA Proteins 0.000 description 8
- 210000001744 T-lymphocyte Anatomy 0.000 description 8
- 239000000090 biomarker Substances 0.000 description 8
- 230000000694 effects Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 238000010801 machine learning Methods 0.000 description 8
- 210000003716 mesoderm Anatomy 0.000 description 8
- 238000003657 Likelihood-ratio test Methods 0.000 description 7
- 238000006243 chemical reaction Methods 0.000 description 7
- 239000003550 marker Substances 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 229930101283 tetracycline Natural products 0.000 description 7
- 230000001225 therapeutic effect Effects 0.000 description 7
- 239000013598 vector Substances 0.000 description 7
- 108091007911 GSKs Proteins 0.000 description 6
- 102000004103 Glycogen Synthase Kinases Human genes 0.000 description 6
- 239000004098 Tetracycline Substances 0.000 description 6
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 6
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 6
- 210000001654 germ layer Anatomy 0.000 description 6
- 229960002180 tetracycline Drugs 0.000 description 6
- 235000019364 tetracycline Nutrition 0.000 description 6
- 150000003522 tetracyclines Chemical class 0.000 description 6
- VBEQCZHXXJYVRD-GACYYNSASA-N uroanthelone Chemical compound C([C@@H](C(=O)N[C@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CS)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O)C(C)C)[C@@H](C)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](NC(=O)[C@H](CC=1NC=NC=1)NC(=O)[C@H](CCSC)NC(=O)[C@H](CS)NC(=O)[C@@H](NC(=O)CNC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CS)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)CNC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CO)NC(=O)[C@H](CO)NC(=O)[C@H]1N(CCC1)C(=O)[C@H](CS)NC(=O)CNC(=O)[C@H]1N(CCC1)C(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)C(C)C)[C@@H](C)CC)C1=CC=C(O)C=C1 VBEQCZHXXJYVRD-GACYYNSASA-N 0.000 description 6
- 239000013603 viral vector Substances 0.000 description 6
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 5
- 238000001353 Chip-sequencing Methods 0.000 description 5
- 108020004635 Complementary DNA Proteins 0.000 description 5
- 102000004887 Transforming Growth Factor beta Human genes 0.000 description 5
- 108090001012 Transforming Growth Factor beta Proteins 0.000 description 5
- KLGQSVMIPOVQAX-UHFFFAOYSA-N XAV939 Chemical compound N=1C=2CCSCC=2C(O)=NC=1C1=CC=C(C(F)(F)F)C=C1 KLGQSVMIPOVQAX-UHFFFAOYSA-N 0.000 description 5
- 239000003242 anti bacterial agent Substances 0.000 description 5
- 238000003556 assay Methods 0.000 description 5
- 238000010804 cDNA synthesis Methods 0.000 description 5
- 239000002299 complementary DNA Substances 0.000 description 5
- 201000010099 disease Diseases 0.000 description 5
- 239000003814 drug Substances 0.000 description 5
- 210000003981 ectoderm Anatomy 0.000 description 5
- 210000001900 endoderm Anatomy 0.000 description 5
- 239000012583 B-27 Supplement Substances 0.000 description 4
- 210000001266 CD8-positive T-lymphocyte Anatomy 0.000 description 4
- ZEOWTGPWHLSLOG-UHFFFAOYSA-N Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F Chemical compound Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F ZEOWTGPWHLSLOG-UHFFFAOYSA-N 0.000 description 4
- 102400001368 Epidermal growth factor Human genes 0.000 description 4
- 101800003838 Epidermal growth factor Proteins 0.000 description 4
- 102100023593 Fibroblast growth factor receptor 1 Human genes 0.000 description 4
- 101710182386 Fibroblast growth factor receptor 1 Proteins 0.000 description 4
- 101001047090 Homo sapiens Potassium voltage-gated channel subfamily H member 2 Proteins 0.000 description 4
- 108010023082 activin A Proteins 0.000 description 4
- 238000011161 development Methods 0.000 description 4
- 230000018109 developmental process Effects 0.000 description 4
- 229940079593 drug Drugs 0.000 description 4
- 230000004064 dysfunction Effects 0.000 description 4
- 229940116977 epidermal growth factor Drugs 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 210000001519 tissue Anatomy 0.000 description 4
- 238000013518 transcription Methods 0.000 description 4
- 230000035897 transcription Effects 0.000 description 4
- 230000002103 transcriptional effect Effects 0.000 description 4
- 238000003151 transfection method Methods 0.000 description 4
- HJCMDXDYPOUFDY-WHFBIAKZSA-N Ala-Gln Chemical compound C[C@H](N)C(=O)N[C@H](C(O)=O)CCC(N)=O HJCMDXDYPOUFDY-WHFBIAKZSA-N 0.000 description 3
- AQGNHMOJWBZFQQ-UHFFFAOYSA-N CT 99021 Chemical compound CC1=CNC(C=2C(=NC(NCCNC=3N=CC(=CC=3)C#N)=NC=2)C=2C(=CC(Cl)=CC=2)Cl)=N1 AQGNHMOJWBZFQQ-UHFFFAOYSA-N 0.000 description 3
- 108010037362 Extracellular Matrix Proteins Proteins 0.000 description 3
- 102000010834 Extracellular Matrix Proteins Human genes 0.000 description 3
- 229940125830 FGFR1 inhibitor Drugs 0.000 description 3
- 229940125832 FGFR3 inhibitor Drugs 0.000 description 3
- 102000053171 Glial Fibrillary Acidic Human genes 0.000 description 3
- 101710193519 Glial fibrillary acidic protein Proteins 0.000 description 3
- 241000699666 Mus <mouse, genus> Species 0.000 description 3
- 238000003559 RNA-seq method Methods 0.000 description 3
- 102000008579 Transposases Human genes 0.000 description 3
- 108010020764 Transposases Proteins 0.000 description 3
- 102000013814 Wnt Human genes 0.000 description 3
- 108050003627 Wnt Proteins 0.000 description 3
- 229960002648 alanylglutamine Drugs 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 3
- 210000000988 bone and bone Anatomy 0.000 description 3
- 230000000875 corresponding effect Effects 0.000 description 3
- 238000004520 electroporation Methods 0.000 description 3
- 210000005046 glial fibrillary acidic protein Anatomy 0.000 description 3
- 238000000126 in silico method Methods 0.000 description 3
- 230000002401 inhibitory effect Effects 0.000 description 3
- 230000010354 integration Effects 0.000 description 3
- 238000012417 linear regression Methods 0.000 description 3
- 238000004519 manufacturing process Methods 0.000 description 3
- 239000002609 medium Substances 0.000 description 3
- 230000000921 morphogenic effect Effects 0.000 description 3
- 230000001537 neural effect Effects 0.000 description 3
- 238000007481 next generation sequencing Methods 0.000 description 3
- 239000013612 plasmid Substances 0.000 description 3
- 229920001184 polypeptide Polymers 0.000 description 3
- 238000002360 preparation method Methods 0.000 description 3
- 102000004196 processed proteins & peptides Human genes 0.000 description 3
- 108090000765 processed proteins & peptides Proteins 0.000 description 3
- 230000000306 recurrent effect Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 239000007787 solid Substances 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 239000013589 supplement Substances 0.000 description 3
- 230000001629 suppression Effects 0.000 description 3
- ZRKFYGHZFMAOKI-QMGMOQQFSA-N tgfbeta Chemical compound C([C@H](NC(=O)[C@H](C(C)C)NC(=O)CNC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CC(C)C)NC(=O)CNC(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CCSC)C(C)C)[C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N1[C@@H](CCC1)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O)C1=CC=C(O)C=C1 ZRKFYGHZFMAOKI-QMGMOQQFSA-N 0.000 description 3
- 210000001325 yolk sac Anatomy 0.000 description 3
- FNQJDLTXOVEEFB-UHFFFAOYSA-N 1,2,3-benzothiadiazole Chemical compound C1=CC=C2SN=NC2=C1 FNQJDLTXOVEEFB-UHFFFAOYSA-N 0.000 description 2
- FPIPGXGPPPQFEQ-UHFFFAOYSA-N 13-cis retinol Natural products OCC=C(C)C=CC=C(C)C=CC1=C(C)CCCC1(C)C FPIPGXGPPPQFEQ-UHFFFAOYSA-N 0.000 description 2
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 2
- LRFVTYWOQMYALW-UHFFFAOYSA-N 9H-xanthine Chemical compound O=C1NC(=O)NC2=C1NC=N2 LRFVTYWOQMYALW-UHFFFAOYSA-N 0.000 description 2
- 239000013607 AAV vector Substances 0.000 description 2
- RZVAJINKPMORJF-UHFFFAOYSA-N Acetaminophen Chemical compound CC(=O)NC1=CC=C(O)C=C1 RZVAJINKPMORJF-UHFFFAOYSA-N 0.000 description 2
- 239000005964 Acibenzolar-S-methyl Substances 0.000 description 2
- 102000012002 Aquaporin 4 Human genes 0.000 description 2
- 108010036280 Aquaporin 4 Proteins 0.000 description 2
- 108020004705 Codon Proteins 0.000 description 2
- 102000012410 DNA Ligases Human genes 0.000 description 2
- 108010061982 DNA Ligases Proteins 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- 108090000379 Fibroblast growth factor 2 Proteins 0.000 description 2
- 102100024785 Fibroblast growth factor 2 Human genes 0.000 description 2
- 238000012413 Fluorescence activated cell sorting analysis Methods 0.000 description 2
- 108700011259 MicroRNAs Proteins 0.000 description 2
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 2
- FPIPGXGPPPQFEQ-BOOMUCAASA-N Vitamin A Natural products OC/C=C(/C)\C=C\C=C(\C)/C=C/C1=C(C)CCCC1(C)C FPIPGXGPPPQFEQ-BOOMUCAASA-N 0.000 description 2
- 239000013543 active substance Substances 0.000 description 2
- FPIPGXGPPPQFEQ-OVSJKPMPSA-N all-trans-retinol Chemical compound OC\C=C(/C)\C=C\C=C(/C)\C=C\C1=C(C)CCCC1(C)C FPIPGXGPPPQFEQ-OVSJKPMPSA-N 0.000 description 2
- 210000004436 artificial bacterial chromosome Anatomy 0.000 description 2
- 230000003140 astrocytic effect Effects 0.000 description 2
- 210000002469 basement membrane Anatomy 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000004556 brain Anatomy 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 238000009795 derivation Methods 0.000 description 2
- 230000009274 differential gene expression Effects 0.000 description 2
- 210000002257 embryonic structure Anatomy 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 210000002919 epithelial cell Anatomy 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 210000002744 extracellular matrix Anatomy 0.000 description 2
- 238000001943 fluorescence-activated cell sorting Methods 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- BRZYSWJRSDMWLG-CAXSIQPQSA-N geneticin Chemical compound O1C[C@@](O)(C)[C@H](NC)[C@@H](O)[C@H]1O[C@@H]1[C@@H](O)[C@H](O[C@@H]2[C@@H]([C@@H](O)[C@H](O)[C@@H](C(C)O)O2)N)[C@@H](N)C[C@H]1N BRZYSWJRSDMWLG-CAXSIQPQSA-N 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 210000005260 human cell Anatomy 0.000 description 2
- FDGQSTZJBFJUBT-UHFFFAOYSA-N hypoxanthine Chemical compound O=C1NC=NC2=C1NC=N2 FDGQSTZJBFJUBT-UHFFFAOYSA-N 0.000 description 2
- 210000002865 immune cell Anatomy 0.000 description 2
- 210000000987 immune system Anatomy 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 238000001727 in vivo Methods 0.000 description 2
- 210000002602 induced regulatory T cell Anatomy 0.000 description 2
- 238000007913 intrathecal administration Methods 0.000 description 2
- 238000001990 intravenous administration Methods 0.000 description 2
- 238000007914 intraventricular administration Methods 0.000 description 2
- DRAVOWXCEBXPTN-UHFFFAOYSA-N isoguanine Chemical compound NC1=NC(=O)NC2=C1NC=N2 DRAVOWXCEBXPTN-UHFFFAOYSA-N 0.000 description 2
- 210000002540 macrophage Anatomy 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 108020004999 messenger RNA Proteins 0.000 description 2
- 239000002679 microRNA Substances 0.000 description 2
- 210000004498 neuroglial cell Anatomy 0.000 description 2
- 108010008217 nidogen Proteins 0.000 description 2
- 239000002773 nucleotide Substances 0.000 description 2
- 125000003729 nucleotide group Chemical group 0.000 description 2
- 244000052769 pathogen Species 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 239000000546 pharmaceutical excipient Substances 0.000 description 2
- 230000003389 potentiating effect Effects 0.000 description 2
- RXWNCPJZOCPEPQ-NVWDDTSBSA-N puromycin Chemical compound C1=CC(OC)=CC=C1C[C@H](N)C(=O)N[C@H]1[C@@H](O)[C@H](N2C3=NC=NC(=C3N=C2)N(C)C)O[C@@H]1CO RXWNCPJZOCPEPQ-NVWDDTSBSA-N 0.000 description 2
- 238000007670 refining Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 239000011435 rock Substances 0.000 description 2
- YGSDEFSMJLZEOE-UHFFFAOYSA-N salicylic acid Chemical compound OC(=O)C1=CC=CC=C1O YGSDEFSMJLZEOE-UHFFFAOYSA-N 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000010200 validation analysis Methods 0.000 description 2
- 235000019155 vitamin A Nutrition 0.000 description 2
- 239000011719 vitamin A Substances 0.000 description 2
- 229940045997 vitamin a Drugs 0.000 description 2
- DIGQNXIGRZPYDK-WKSCXVIASA-N (2R)-6-amino-2-[[2-[[(2S)-2-[[2-[[(2R)-2-[[(2S)-2-[[(2R,3S)-2-[[2-[[(2S)-2-[[2-[[(2S)-2-[[(2S)-2-[[(2R)-2-[[(2S,3S)-2-[[(2R)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[2-[[(2S)-2-[[(2R)-2-[[2-[[2-[[2-[(2-amino-1-hydroxyethylidene)amino]-3-carboxy-1-hydroxypropylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1-hydroxyethylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1,3-dihydroxypropylidene]amino]-1-hydroxyethylidene]amino]-1-hydroxypropylidene]amino]-1,3-dihydroxypropylidene]amino]-1,3-dihydroxypropylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1,3-dihydroxybutylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1-hydroxypropylidene]amino]-1,3-dihydroxypropylidene]amino]-1-hydroxyethylidene]amino]-1,5-dihydroxy-5-iminopentylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1,3-dihydroxybutylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1,3-dihydroxypropylidene]amino]-1-hydroxyethylidene]amino]-1-hydroxy-3-sulfanylpropylidene]amino]-1-hydroxyethylidene]amino]hexanoic acid Chemical compound C[C@@H]([C@@H](C(=N[C@@H](CS)C(=N[C@@H](C)C(=N[C@@H](CO)C(=NCC(=N[C@@H](CCC(=N)O)C(=NC(CS)C(=N[C@H]([C@H](C)O)C(=N[C@H](CS)C(=N[C@H](CO)C(=NCC(=N[C@H](CS)C(=NCC(=N[C@H](CCCCN)C(=O)O)O)O)O)O)O)O)O)O)O)O)O)O)O)N=C([C@H](CS)N=C([C@H](CO)N=C([C@H](CO)N=C([C@H](C)N=C(CN=C([C@H](CO)N=C([C@H](CS)N=C(CN=C(C(CS)N=C(C(CC(=O)O)N=C(CN)O)O)O)O)O)O)O)O)O)O)O)O DIGQNXIGRZPYDK-WKSCXVIASA-N 0.000 description 1
- XQCZBXHVTFVIFE-UHFFFAOYSA-N 2-amino-4-hydroxypyrimidine Chemical compound NC1=NC=CC(O)=N1 XQCZBXHVTFVIFE-UHFFFAOYSA-N 0.000 description 1
- 238000012605 2D cell culture Methods 0.000 description 1
- 238000012604 3D cell culture Methods 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 108060000903 Beta-catenin Proteins 0.000 description 1
- 102000015735 Beta-catenin Human genes 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- 108010035532 Collagen Proteins 0.000 description 1
- 102000008186 Collagen Human genes 0.000 description 1
- 241000701022 Cytomegalovirus Species 0.000 description 1
- 230000004543 DNA replication Effects 0.000 description 1
- 230000004568 DNA-binding Effects 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 108010092160 Dactinomycin Proteins 0.000 description 1
- 241000702421 Dependoparvovirus Species 0.000 description 1
- 108010016626 Dipeptides Proteins 0.000 description 1
- 239000006144 Dulbecco’s modified Eagle's medium Substances 0.000 description 1
- 108010051542 Early Growth Response Protein 1 Proteins 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- YQYJSBFKSSDGFO-UHFFFAOYSA-N Epihygromycin Natural products OC1C(O)C(C(=O)C)OC1OC(C(=C1)O)=CC=C1C=C(C)C(=O)NC1C(O)C(O)C2OCOC2C1O YQYJSBFKSSDGFO-UHFFFAOYSA-N 0.000 description 1
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 1
- VGGSQFUCUMXWEO-UHFFFAOYSA-N Ethene Chemical compound C=C VGGSQFUCUMXWEO-UHFFFAOYSA-N 0.000 description 1
- 239000005977 Ethylene Substances 0.000 description 1
- 102100027842 Fibroblast growth factor receptor 3 Human genes 0.000 description 1
- 101710182396 Fibroblast growth factor receptor 3 Proteins 0.000 description 1
- 108060006662 GSK3 Proteins 0.000 description 1
- 102000001267 GSK3 Human genes 0.000 description 1
- 208000034826 Genetic Predisposition to Disease Diseases 0.000 description 1
- 102000019058 Glycogen Synthase Kinase 3 beta Human genes 0.000 description 1
- 108010051975 Glycogen Synthase Kinase 3 beta Proteins 0.000 description 1
- 102000003886 Glycoproteins Human genes 0.000 description 1
- 108090000288 Glycoproteins Proteins 0.000 description 1
- 101150113453 Gsk3a gene Proteins 0.000 description 1
- 102000008055 Heparan Sulfate Proteoglycans Human genes 0.000 description 1
- 229920002971 Heparan sulfate Polymers 0.000 description 1
- 102300052097 Hepatocyte nuclear factor 4-alpha isoform HNF4-Alpha-2 Human genes 0.000 description 1
- 206010019851 Hepatotoxicity Diseases 0.000 description 1
- 208000007514 Herpes zoster Diseases 0.000 description 1
- 101000882584 Homo sapiens Estrogen receptor Proteins 0.000 description 1
- 101600072310 Homo sapiens Hepatocyte nuclear factor 4-alpha (isoform HNF4-Alpha-2) Proteins 0.000 description 1
- 101000692455 Homo sapiens Platelet-derived growth factor receptor beta Proteins 0.000 description 1
- 101001059454 Homo sapiens Serine/threonine-protein kinase MARK2 Proteins 0.000 description 1
- 101000713575 Homo sapiens Tubulin beta-3 chain Proteins 0.000 description 1
- 101000851007 Homo sapiens Vascular endothelial growth factor receptor 2 Proteins 0.000 description 1
- 241000701041 Human betaherpesvirus 7 Species 0.000 description 1
- 241000701044 Human gammaherpesvirus 4 Species 0.000 description 1
- 241001502974 Human gammaherpesvirus 8 Species 0.000 description 1
- 241000702617 Human parvovirus B19 Species 0.000 description 1
- 241000829111 Human polyomavirus 1 Species 0.000 description 1
- UGQMRVRMYYASKQ-UHFFFAOYSA-N Hypoxanthine nucleoside Natural products OC1C(O)C(CO)OC1N1C(NC=NC2=O)=C2N=C1 UGQMRVRMYYASKQ-UHFFFAOYSA-N 0.000 description 1
- 101150026109 INSR gene Proteins 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100021244 Integral membrane protein GPR180 Human genes 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 241000701460 JC polyomavirus Species 0.000 description 1
- WHUUTDBJXJRKMK-VKHMYHEASA-N L-glutamic acid Chemical compound OC(=O)[C@@H](N)CCC(O)=O WHUUTDBJXJRKMK-VKHMYHEASA-N 0.000 description 1
- ZDXPYRJPNDTMRX-VKHMYHEASA-N L-glutamine Chemical compound OC(=O)[C@@H](N)CCC(N)=O ZDXPYRJPNDTMRX-VKHMYHEASA-N 0.000 description 1
- 229930182816 L-glutamine Natural products 0.000 description 1
- 108010085895 Laminin Proteins 0.000 description 1
- 102000003792 Metallothionein Human genes 0.000 description 1
- 108090000157 Metallothionein Proteins 0.000 description 1
- 108090000744 Mitogen-Activated Protein Kinase Kinases Proteins 0.000 description 1
- 241000700627 Monkeypox virus Species 0.000 description 1
- 102100037369 Nidogen-1 Human genes 0.000 description 1
- 102100026547 Platelet-derived growth factor receptor beta Human genes 0.000 description 1
- 108090000315 Protein Kinase C Proteins 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 101001023863 Rattus norvegicus Glucocorticoid receptor Proteins 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 108010034634 Repressor Proteins Proteins 0.000 description 1
- 102000009661 Repressor Proteins Human genes 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 239000006146 Roswell Park Memorial Institute medium Substances 0.000 description 1
- 108060006706 SRC Proteins 0.000 description 1
- 102000001332 SRC Human genes 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 206010039491 Sarcoma Diseases 0.000 description 1
- 102100028904 Serine/threonine-protein kinase MARK2 Human genes 0.000 description 1
- 241000700584 Simplexvirus Species 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- 108090000054 Syndecan-2 Proteins 0.000 description 1
- 108700009124 Transcription Initiation Site Proteins 0.000 description 1
- 102100036790 Tubulin beta-3 chain Human genes 0.000 description 1
- 241000700647 Variola virus Species 0.000 description 1
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 108010084455 Zeocin Proteins 0.000 description 1
- 229930183665 actinomycin Natural products 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 150000005005 aminopyrimidines Chemical class 0.000 description 1
- 229960000723 ampicillin Drugs 0.000 description 1
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 229940088710 antibiotic agent Drugs 0.000 description 1
- 210000001106 artificial yeast chromosome Anatomy 0.000 description 1
- 230000000712 assembly Effects 0.000 description 1
- 238000000429 assembly Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 229930189065 blasticidin Natural products 0.000 description 1
- 210000000601 blood cell Anatomy 0.000 description 1
- 210000001185 bone marrow Anatomy 0.000 description 1
- 229960003669 carbenicillin Drugs 0.000 description 1
- FPPNZSSZRUTDAP-UWFZAAFLSA-N carbenicillin Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)C(C(O)=O)C1=CC=CC=C1 FPPNZSSZRUTDAP-UWFZAAFLSA-N 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 210000000170 cell membrane Anatomy 0.000 description 1
- 239000002458 cell surface marker Substances 0.000 description 1
- 210000003169 central nervous system Anatomy 0.000 description 1
- 239000013043 chemical agent Substances 0.000 description 1
- 235000015111 chews Nutrition 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 229920001436 collagen Polymers 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000010205 computational analysis Methods 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001784 detoxification Methods 0.000 description 1
- 208000035475 disorder Diseases 0.000 description 1
- 108010057988 ecdysone receptor Proteins 0.000 description 1
- 235000013601 eggs Nutrition 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 102000052116 epidermal growth factor receptor activity proteins Human genes 0.000 description 1
- 108700015053 epidermal growth factor receptor activity proteins Proteins 0.000 description 1
- 230000001973 epigenetic effect Effects 0.000 description 1
- 239000003797 essential amino acid Substances 0.000 description 1
- 235000020776 essential amino acid Nutrition 0.000 description 1
- 108020001507 fusion proteins Proteins 0.000 description 1
- 102000037865 fusion proteins Human genes 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- 229930195712 glutamate Natural products 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 208000002672 hepatitis B Diseases 0.000 description 1
- 231100000304 hepatotoxicity Toxicity 0.000 description 1
- 230000007686 hepatotoxicity Effects 0.000 description 1
- 108010051779 histone H3 trimethyl Lys4 Proteins 0.000 description 1
- 230000013632 homeostatic process Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000028993 immune response Effects 0.000 description 1
- 238000011534 incubation Methods 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 230000015788 innate immune response Effects 0.000 description 1
- 210000005007 innate immune system Anatomy 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 210000003734 kidney Anatomy 0.000 description 1
- 210000000265 leukocyte Anatomy 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 238000013332 literature search Methods 0.000 description 1
- 210000004185 liver Anatomy 0.000 description 1
- 210000001161 mammalian embryo Anatomy 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 108010082117 matrigel Proteins 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000004060 metabolic process Effects 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 229910021645 metal ion Inorganic materials 0.000 description 1
- HPNSFSBZBAHARI-UHFFFAOYSA-N micophenolic acid Natural products OC1=C(CC=C(C)CCC(O)=O)C(OC)=C(C)C2=C1C(=O)OC2 HPNSFSBZBAHARI-UHFFFAOYSA-N 0.000 description 1
- 244000005700 microbiome Species 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 229960000951 mycophenolic acid Drugs 0.000 description 1
- HPNSFSBZBAHARI-RUDMXATFSA-N mycophenolic acid Chemical compound OC1=C(C\C=C(/C)CCC(O)=O)C(OC)=C(C)C2=C1C(=O)OC2 HPNSFSBZBAHARI-RUDMXATFSA-N 0.000 description 1
- NFVJNJQRWPQVOA-UHFFFAOYSA-N n-[2-chloro-5-(trifluoromethyl)phenyl]-2-[3-(4-ethyl-5-ethylsulfanyl-1,2,4-triazol-3-yl)piperidin-1-yl]acetamide Chemical compound CCN1C(SCC)=NN=C1C1CN(CC(=O)NC=2C(=CC=C(C=2)C(F)(F)F)Cl)CCC1 NFVJNJQRWPQVOA-UHFFFAOYSA-N 0.000 description 1
- YOHYSYJDKVYCJI-UHFFFAOYSA-N n-[3-[[6-[3-(trifluoromethyl)anilino]pyrimidin-4-yl]amino]phenyl]cyclopropanecarboxamide Chemical compound FC(F)(F)C1=CC=CC(NC=2N=CN=C(NC=3C=C(NC(=O)C4CC4)C=CC=3)C=2)=C1 YOHYSYJDKVYCJI-UHFFFAOYSA-N 0.000 description 1
- VRBKIVRKKCLPHA-UHFFFAOYSA-N nefazodone Chemical compound O=C1N(CCOC=2C=CC=CC=2)C(CC)=NN1CCCN(CC1)CCN1C1=CC=CC(Cl)=C1 VRBKIVRKKCLPHA-UHFFFAOYSA-N 0.000 description 1
- 229960001800 nefazodone Drugs 0.000 description 1
- 230000001703 neuroimmune Effects 0.000 description 1
- 244000309711 non-enveloped viruses Species 0.000 description 1
- 230000030648 nucleus localization Effects 0.000 description 1
- 210000000963 osteoblast Anatomy 0.000 description 1
- 210000004681 ovum Anatomy 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- FJKROLUGYXJWQN-UHFFFAOYSA-N papa-hydroxy-benzoic acid Natural products OC(=O)C1=CC=C(O)C=C1 FJKROLUGYXJWQN-UHFFFAOYSA-N 0.000 description 1
- 229960005489 paracetamol Drugs 0.000 description 1
- 230000008186 parthenogenesis Effects 0.000 description 1
- 230000008506 pathogenesis Effects 0.000 description 1
- CWCMIVBLVUHDHK-ZSNHEYEWSA-N phleomycin D1 Chemical compound N([C@H](C(=O)N[C@H](C)[C@@H](O)[C@H](C)C(=O)N[C@@H]([C@H](O)C)C(=O)NCCC=1SC[C@@H](N=1)C=1SC=C(N=1)C(=O)NCCCCNC(N)=N)[C@@H](O[C@H]1[C@H]([C@@H](O)[C@H](O)[C@H](CO)O1)O[C@@H]1[C@H]([C@@H](OC(N)=O)[C@H](O)[C@@H](CO)O1)O)C=1N=CNC=1)C(=O)C1=NC([C@H](CC(N)=O)NC[C@H](N)C(N)=O)=NC(N)=C1C CWCMIVBLVUHDHK-ZSNHEYEWSA-N 0.000 description 1
- 150000004713 phosphodiesters Chemical group 0.000 description 1
- 210000002826 placenta Anatomy 0.000 description 1
- 239000011148 porous material Substances 0.000 description 1
- 210000004986 primary T-cell Anatomy 0.000 description 1
- 238000009516 primary packaging Methods 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 230000002062 proliferating effect Effects 0.000 description 1
- 238000002731 protein assay Methods 0.000 description 1
- 238000001243 protein synthesis Methods 0.000 description 1
- 229950010131 puromycin Drugs 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 108700005467 recombinant KCB-1 Proteins 0.000 description 1
- 210000002707 regulatory b cell Anatomy 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 150000004492 retinoid derivatives Chemical class 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 229960004889 salicylic acid Drugs 0.000 description 1
- 230000003248 secreting effect Effects 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 210000004927 skin cell Anatomy 0.000 description 1
- 238000010374 somatic cell nuclear transfer Methods 0.000 description 1
- 230000010473 stable expression Effects 0.000 description 1
- 150000003431 steroids Chemical class 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 230000004083 survival effect Effects 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 208000024891 symptom Diseases 0.000 description 1
- 210000000225 synapse Anatomy 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 101150024821 tetO gene Proteins 0.000 description 1
- 101150061166 tetR gene Proteins 0.000 description 1
- OFVLGDICTFRJMM-WESIUVDSSA-N tetracycline Chemical compound C1=CC=C2[C@](O)(C)[C@H]3C[C@H]4[C@H](N(C)C)C(O)=C(C(N)=O)C(=O)[C@@]4(O)C(O)=C3C(=O)C2=C1O OFVLGDICTFRJMM-WESIUVDSSA-N 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 210000001685 thyroid gland Anatomy 0.000 description 1
- 231100000765 toxin Toxicity 0.000 description 1
- 239000003053 toxin Substances 0.000 description 1
- 108700012359 toxins Proteins 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 238000010361 transduction Methods 0.000 description 1
- 230000026683 transduction Effects 0.000 description 1
- 238000001890 transfection Methods 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 230000017105 transposition Effects 0.000 description 1
- GXPHKUHSUJUWKP-UHFFFAOYSA-N troglitazone Chemical compound C1CC=2C(C)=C(O)C(C)=C(C)C=2OC1(C)COC(C=C1)=CC=C1CC1SC(=O)NC1=O GXPHKUHSUJUWKP-UHFFFAOYSA-N 0.000 description 1
- 229960001641 troglitazone Drugs 0.000 description 1
- GXPHKUHSUJUWKP-NTKDMRAZSA-N troglitazone Natural products C([C@@]1(OC=2C(C)=C(C(=C(C)C=2CC1)O)C)C)OC(C=C1)=CC=C1C[C@H]1SC(=O)NC1=O GXPHKUHSUJUWKP-NTKDMRAZSA-N 0.000 description 1
- 230000024275 uncoating of virus Effects 0.000 description 1
- 241000701161 unidentified adenovirus Species 0.000 description 1
- 241001430294 unidentified retrovirus Species 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- 230000035899 viability Effects 0.000 description 1
- 238000003142 viral transduction method Methods 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 229940075420 xanthine Drugs 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N5/00—Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor
- C12N5/06—Animal cells or tissues; Human cells or tissues
- C12N5/0602—Vertebrate cells
- C12N5/0696—Artificially induced pluripotent stem cells, e.g. iPS
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2501/00—Active agents used in cell culture processes, e.g. differentation
- C12N2501/60—Transcription factors
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2510/00—Genetically modified cells
Definitions
- the present disclosure relates, at least in part, to methods and compositions for generating astrocyte-like cells (iAstIIs), cytotoxic T-cell-like cells (iCytoTs), hepatocyte-like cells (iHeps), regulatory T-cell-like cells (iTRegs), B cell-like cells (iBCells), and/or microglia-like cells (iMicroglia) from pluripotent stem cells.
- astrocyte-like cells iAstIIs
- iCytoTs cytotoxic T-cell-like cells
- iHeps hepatocyte-like cells
- iTRegs regulatory T-cell-like cells
- iBCells B cell-like cells
- iMicroglia microglia-like cells
- the present disclosure also relates, at least in part, to identifying transcription factors that improve differentiation efficiency of pluripotent stem cells into astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding ERG, EGR1, FLI1, FOSB, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding ERG.
- the engineered polynucleotide comprises an open reading frame encoding EGR1.
- the engineered polynucleotide comprises an open reading frame encoding FLI1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding FOSB. In some embodiments, the PSC expresses or overexpresses ERG, EGR1, FLI1, FOSB, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from ERG, EGR1, FLI1, and FOSB, wherein the protein is overexpressed.
- PSC pluripotent stem cell
- the PSC expresses or overexpresses: ERG, EGR1, FLI1, FOSB, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB.
- aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein.
- the population comprises at least 2500/cm^2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERG.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EGR1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the population of PSCs is cultured for no more than 6 days.
- the astrocyte-like cells are CD44+ and A2B5+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding ZBTB1.
- the engineered polynucleotide comprises an open reading frame encoding RUNX3.
- the engineered polynucleotide comprises an open reading frame encoding RELA.
- the engineered polynucleotide comprises an open reading frame encoding NRF1.
- the engineered polynucleotide comprises an open reading frame encoding ERF. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding SP4. In some embodiments, the PSC expresses or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter.
- a pluripotent stem cell comprising: a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, wherein the protein is overexpressed.
- the PSC expresses or overexpresses: ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- Aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm^2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RUNX3.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the population of PSCs is cultured for no more than 6 days.
- the cytotoxic T-cell-like cells are CD3+ and CD8+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding HNF4G, TEAD4, RFX3, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding HNF4G.
- the engineered polynucleotide comprises an open reading frame encoding TEAD4.
- the engineered polynucleotide comprises an open reading frame encoding RFX3.
- the engineered polynucleotide comprises an open reading frame encoding HNF4A.
- the PSC expresses or overexpresses HNF4G, TEAD4, RFX3, or any combination thereof. In some embodiments, the PSC further expresses or overexpresses HNF4A.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter.
- the PSC expresses or overexpresses: HNF4G, TEAD4, RFX3, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, HNF4A, TEAD4, and RFX3.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, TEAD4, and RFX3.
- aspects of the present disclosure relate to a composition comprising: a population comprising any one of the PSCs described herein.
- the population comprises at least 2500/cm^2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding HNF4G.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, the PSCs of the expanded population further comprise an engineered polynucleotide comprising an open reading frame encoding HNF4A. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the population of PSCs is cultured for no more than 6 days.
- the hepatocyte-like cells are CD184+ and ASGPR1+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding ETS1.
- the engineered polynucleotide comprises an open reading frame encoding ETV3.
- the engineered polynucleotide comprises an open reading frame encoding GABPA.
- the engineered polynucleotide comprises an open reading frame encoding KLF9.
- the engineered polynucleotide comprises an open reading frame encoding NFKB1.
- the PSC expresses or overexpresses ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, wherein the protein is overexpressed.
- the PSC expresses or overexpresses: ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
- aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein.
- the population comprises at least 2500/cm^2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells.
- the engineered polynucleotide comprises an open reading frame encoding ETS1.
- the engineered polynucleotide comprises an open reading frame encoding ETV3. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding GABPA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding KLF9. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NFKB1. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the population of PSCs is cultured for no more than 6 days.
- the regulatory T-cell-like cells are CD3+ and CD25+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding EBF1.
- the engineered polynucleotide comprises an open reading frame encoding ZBTB1.
- the engineered polynucleotide comprises an open reading frame encoding RELA.
- the engineered polynucleotide comprises an open reading frame encoding NRF1.
- the engineered polynucleotide comprises an open reading frame encoding REL.
- the PSC expresses or overexpresses EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, wherein the protein is overexpressed.
- the PSC expresses or overexpresses: EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL.
- aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein.
- the population comprises at least 2500/cm^2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EBF1.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the population of PSCs is cultured for no more than 6 days.
- the B cell-like cells are CD19+ and CD27+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
- the engineered polynucleotide comprises an open reading frame encoding SPI1.
- the engineered polynucleotide comprises an open reading frame encoding ZBTB1.
- the engineered polynucleotide comprises an open reading frame encoding RELA.
- the engineered polynucleotide comprises an open reading frame encoding STAT2.
- the PSC expresses or overexpresses SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter.
- a pluripotent stem cell comprising: a protein selected from SPI1, ZBTB1, RELA, and STAT2, wherein the protein is overexpressed.
- the PSC expresses or overexpresses: SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
- the PSC is a human PSC.
- the PSC is an induced PSC (iPSC).
- the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2.
- the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2.
- a composition comprising: a population comprising any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC.
- aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SPI1.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1x10^2-1x10^7 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the microglia-like cells are CD11b+ and CX3CR1+.
- aspects of the present disclosure relate to a method, comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors comprises transcription factors that are in both the first pool and the second pool; (iv) using a second statistical method to determine the statistical significance of the transcription factors in the third pool of transcription factors; and (v) repeating steps (i)-(iv) one or more times to iteratively refine the third pool of transcription factors.
- the epigenetics data provides information related to whether genomic chromatin is open or closed. In some embodiments, the epigenetics data is produced by DNase-seq, ATAC-seq, or ChIP-seq. In some embodiments, the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type. In some embodiments, the transcriptomic data is produced by RNA-seq. In some embodiments, the first statistical method is a linear regression algorithm. In some embodiments, the first statistical method is a logistic regression algorithm. In some embodiments, the first statistical method is an L1-regularized logistic regression model (LASSO).
- the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type.
- the second statistical method is a log-likelihood ratio test.
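As a rough illustration of a log-likelihood ratio test of the kind named above, the sketch below compares a null model with a nested alternative that adds one parameter (for example, one transcription factor motif). The function name, the one-degree-of-freedom assumption, and the example log-likelihood values are hypothetical; the disclosure does not specify how the test is implemented.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return (statistic, p_value) for two nested models differing by one parameter.

    Assumption of this sketch: under the null hypothesis the statistic follows
    a chi-square distribution with 1 degree of freedom.
    """
    # The statistic is twice the gain in log-likelihood; clamp at zero so that
    # numerical noise cannot produce a negative value.
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for a model without and with one TF motif.
stat, p_value = likelihood_ratio_test(loglik_null=-120.0, loglik_alt=-112.5)
print(f"statistic={stat:.1f}, p={p_value:.2e}")
```

A small p-value here would suggest that adding the motif improves the fit beyond what chance alone would explain; what threshold counts as significant is a separate choice.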
- the method further comprises transfecting transcription factors of the third pool into a stem cell.
- the method further comprises inducing differentiation of the stem cell into the target cell type.
- the method further comprises analyzing the target cell type to identify additional transcription factors associated with the target cell type.
- the method further comprises using data from the target cell type to further refine the previous steps.
- the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell.
- differentiation of stem cells using one or more of the transcription factors in the third pool results in production of the target cell type in no more than 6 days.
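The pooling logic of steps (i) to (iii) can be sketched as set operations. This is a minimal sketch under stated assumptions: a simple fold-change cutoff stands in for the statistical filtering described in the method (it is not the regression model named above), and the function name, the `min_fold` parameter, and all expression values are hypothetical.

```python
def build_tf_pools(accessible_tfs, target_expr, non_target_expr, min_fold=2.0):
    """Return (first_pool, second_pool, third_pool) of transcription factors.

    first_pool : TFs whose motifs fall in accessible chromatin (step i).
    second_pool: TFs expressed more highly in the target cell type than in a
                 non-target cell type (step ii); a fold-change cutoff is an
                 assumption standing in for a statistical filter.
    third_pool : TFs present in both pools (step iii).
    """
    first_pool = set(accessible_tfs)
    second_pool = {
        tf
        for tf, level in target_expr.items()
        if level > 0 and level >= min_fold * non_target_expr.get(tf, 0.0)
    }
    third_pool = first_pool & second_pool
    return first_pool, second_pool, third_pool


# Hypothetical inputs for illustration only.
accessible = ["SPI1", "RELA", "STAT2", "GATA1"]
target = {"SPI1": 50.0, "RELA": 30.0, "STAT2": 5.0, "SOX2": 40.0}
non_target = {"SPI1": 5.0, "RELA": 10.0, "STAT2": 5.0, "SOX2": 2.0}

_, _, third = build_tf_pools(accessible, target, non_target)
print(sorted(third))
```

In an iterative workflow, the third pool from one round could be fed back as the starting list for the next round once new screening data are available.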
- aspects of the present disclosure relate to a method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information.
- the at least one statistical model correlates chromatin accessibility data and transcriptomics data to make initial predictions relating the one or more input transcription factors to differentiation efficiency of the cell having the one or more input transcription factors.
- the at least one statistical model distinguishes open chromatin data from background data.
- the open chromatin data is associated with the target cell type.
- the method further comprises identifying an initial set of transcription factor motifs positively correlated with the open chromatin data by using a statistical coefficient trained to distinguish the open chromatin data from the background data.
- the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model. In some embodiments, the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution, and the differentiation efficiency information corresponds to the mode of the multimodal distribution with the highest value.
- the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value.
- the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model.
- the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors.
- generating the transcription factor pool further comprises: generating an initial pool of transcription factors; using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool.
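The selection step described above can be sketched as follows, assuming differentiation efficiencies are expressed in percent. The histogram-based peak finder, the bin width, the window half-width, and the dictionary of predicted values (which stands in for the output of the statistical model) are all hypothetical choices made for illustration.

```python
from collections import Counter


def highest_mode(efficiencies, bin_width=10.0, min_count=2):
    """Return the centre of the highest-valued peak in a simple histogram.

    A bin counts as a peak when it holds at least `min_count` values and more
    values than either neighbouring bin (an assumption of this sketch).
    """
    if not efficiencies:
        raise ValueError("no efficiency values supplied")
    counts = Counter(int(e // bin_width) for e in efficiencies)
    peaks = [
        b
        for b, c in counts.items()
        if c >= min_count and c > counts.get(b - 1, 0) and c > counts.get(b + 1, 0)
    ]
    if not peaks:
        # Fall back to the most populated bin when no clear peak exists.
        peaks = [counts.most_common(1)[0][0]]
    return (max(peaks) + 0.5) * bin_width


def select_pool(predicted, mode, half_width):
    """Keep TFs whose predicted efficiency lies within a window around the mode."""
    return sorted(tf for tf, eff in predicted.items() if abs(eff - mode) <= half_width)


# Hypothetical training efficiencies (percent) with a low and a high mode.
training = [2, 3, 4, 5, 6, 12, 31, 33, 35, 36, 38, 55]
# Hypothetical model predictions for an initial pool of transcription factors.
predicted = {"TF_A": 4.0, "TF_B": 33.0, "TF_C": 41.0, "TF_D": 18.0}

mode = highest_mode(training)
print(mode, select_pool(predicted, mode, half_width=10.0))
```

A hard window is used here for simplicity; weighting candidates by a Gaussian centered at the mode would be one alternative consistent with the embodiments above.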
- the at least one statistical model comprises at least one regression model. In some embodiments, the at least one statistical model comprises at least one neural network. In some embodiments, the at least one statistical model has a recurrent neural network architecture. In some embodiments, the at least one statistical model comprises a L1-regularized logistic regression model (LASSO). In some embodiments, the at least one statistical model comprises a log-likelihood ratio test.
- aspects of the present disclosure relate to a system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information.
- the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell.
- FIGs.1A-1E show an overview of the machine-guided experimental workflow with the human TFome to identify and optimize transcription factor (TF) conversion combinations.
- FIG.1A CellCartographer considers only TFs with known binding motifs that are found to be highly specific to target cell identity.
- FIG.1B The CellCartographer workflow uses epigenetics and transcriptomics NGS data to determine TF pools for screening with the TFome (dashed box). Iterative rounds of screening are refined with ML and engineered induced pluripotent stem cell (iPSC) lines with sufficient differentiation undergo clonal isolation to isolate high-efficiency clones.
- FIG.1C TF-binding motifs and chromatin accessibility data are used to train a classifier model to determine TFs that are associated with the cell types of interest and then filtered with RNAseq data to get a finalized sub-library of TFs for pooled screening.
- FIG.1D iPSCs are nucleofected with TF-cassette pools that are integrated randomly into the genome where any one cell may receive some combination of these factors in either multiple copies or not at all.
- FIG.1E In silico validation of screening lists: for four cell types with previously validated TF-overexpression differentiation factors, our model accurately re-identifies these factors (shaded) in the top TFs that would be put into a screen.
- FIGs.2A-2D show computational analysis of 34 cell types with CellCartographer.
- FIG.2A Multidimensional scaling of the similarity in gene expression between different cell types.
- FIG.2B Multidimensional scaling of the similarity in TFs correlated with open chromatin.
- FIG.2C Motifs correlated and anti-correlated with open chromatin vary across 34 cell types analyzed.
- FIG.2D Highly ranked motifs correlated with open chromatin for cell types derived from yolk sac (microglia), endoderm (hepatocyte), mesoderm (B cell, T-cell, regulatory T-cell), and ectoderm (astrocyte).
- FIGs.3A-3F show primary pooled screens for cell types originating from each germ layer.
- FIG.3A Type II Astrocytes (ectoderm).
- FIG.3B Microglia (yolk sac).
- FIG.3C CD8-positive T-cells (mesoderm).
- FIG.3D B cells (mesoderm).
- FIG.3E Regulatory T-cells (mesoderm).
- FIG.3F Hepatocytes (endoderm).
- FIGs.4A-4D For each cell type, we show percent double-positive for fluorescence-activated cell sorting (FACS) analysis of canonical markers for non-clonal and mono-clonal (dashed box) cell lines, and an iPSC + media control (solid box) differentiated for six days in cell-type-specific media + DOX for (FIG.4A) Type II Astrocytes (iAstIIs), (FIG.4B) Cytotoxic T-cells (iCytoTs), (FIG.4C) Hepatocytes (iHeps), and (FIG.4D) Regulatory T-cells (iTRegs).
- FIG.4E Differential gene expression (quantified by Z-score) for all genes for two replicates of each differentiated cell type in both media conditions.
- FIG.4F Principal component analysis of all genes for each cell type in each media condition and a primary cell control.
- FIG.4G Differential gene expression (quantified by Reads Per Kilobase per Million mapped reads (RPKM)) for key marker genes across target cell types and iPSCs.
- FIG. 4H Metascape analysis of gene enrichment of high-efficiency clones for genes that were upregulated in these lines and differentiation conditions compared to iPSCs. Analysis of select highly significant GO Terms from TOP 50 for each differentiated cell type and condition is shown (-log10(P) ⁇ 3).
- FIGs.5A-5L show functional validation of iAstIIs, iHeps, iCytoTs, and iTRegs.
- FIGs.5A-5C Stimulation of Type II astrocytes over 10 min with small molecules: (FIG.5A) 100 µM ATP, (FIG.5B) 100 µM glutamate, and (FIG.5C) 30 mM KCl.
- (LEFT) Relative fluorescence of six individual astrocytes. Astrocyte cell population shown before (TOP) and after (BOTTOM) addition of small molecule.
- FIG.5D BF image of induced iHeps prior to hepatotoxicity testing.
- FIG. 5E Nefazodone
- FIG.5F Acetaminophen
- FIG.5G Troglitazone for 24h and assayed for percent viability (survival rate normalized to each cell type without toxins applied).
- FIG.5H Brightfield imaging of T-cell populations (LEFT to RIGHT): Primary CD8 T-cells, iTRegs, Primary CD8 T-cells + activation beads, iCytoTs + activation beads.
- FIG.5I Suppression assay for iTRegs co-cultured with activated primary CD8 T-cells.
- FIG.5J Calculated percent suppression with titrated dosing of iTRegs in suppression assay. Primary T-cells have been shown to suppress in the range of 20/30/40% respectively.
- FIG. 5K Activation assay for iCytoTs.
- FIG.5L Percent of proliferating primary CD8 T-cells and iCD8 cells post-activation.
- FIGs.6A-6B show cell line differentiation for double-positive surface markers after 6 days of differentiation in cell-type-specific growth medium with doxycycline (DOX).
- For B cells (FIG.6A) and microglia (FIG.6B), percent double-positive for FACS analysis of canonical markers for iPSC only (solid box), non-clonal and mono-clonal (dashed box) cell lines is shown.
- FIGs.7A-7F show cell line differentiation for either one or both cell surface markers after 6 days of differentiation in cell-type-specific growth medium with DOX.
- For Type II astrocytes (FIG.7A), CD8-positive T-cells (FIG.7B), microglia (FIG.7C), regulatory T-cells (FIG.7D), hepatocytes (FIG.7E), and B cells (mesoderm) (FIG.7F), percent differentiated for FACS analysis of canonical markers for non-clonal, mono-clonal (dashed box) cell lines, and iPSC + media control (solid box) is shown.
- DETAILED DESCRIPTION Cell types generated by differentiation of stem cells have the potential to accelerate therapeutic discoveries for a variety of diseases.
- the present disclosure relates, at least in part, to methods and compositions for generating astrocyte-like cells (iAstIIs), cytotoxic T-cell-like cells (iCytoTs), hepatocyte-like cells (iHeps), regulatory T-cell-like cells (iTRegs), B cell-like cells (iBCells), and/or microglia-like cells (iMicroglia) from pluripotent stem cells (e.g., induced pluripotent stem cells (iPSCs)).
- the present disclosure also relates, at least in part, to methods for identifying transcription factors that improve differentiation efficiency of pluripotent stem cells into astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells.
- Astrocyte-Like Cells Some aspects of the present disclosure provide astrocyte-like cells and methods of producing such cells. Astrocytes are specialized glial cells in the brain that regulate neuronal synapses and play an important role in the neuroimmune system.
- An astrocyte-like cell is a cell that exhibits phenotypic characteristics of astrocytes.
- an astrocyte-like cell may express one or more biomarkers expressed by an astrocyte or exhibit one or more functions exhibited by an astrocyte.
- Astrocytes are broadly classified as Type I astrocytes and Type II astrocytes.
- the astrocyte-like cells of the present disclosure are Type I astrocytes or exhibit phenotypic characteristics of Type I astrocytes. Phenotypic characteristics of Type I astrocytes include, for example, a protoplasmic presentation with short astrocytic processes.
- the astrocyte-like cells of the present disclosure are Type II astrocytes or exhibit phenotypic characteristics of Type II astrocytes.
- Phenotypic characteristics of Type II astrocytes include, for example, a fibrous presentation with long astrocytic processes.
- Native astrocytes typically express the gene Cluster of Differentiation 44 (CD44) and the gene A2B5, two putative marker genes for astrocytes.
- an astrocyte-like cell expresses CD44 (i.e., is CD44-positive (CD44+)).
- an astrocyte-like cell expresses A2B5 (i.e., is A2B5+).
- the astrocyte-like cells produced by the methods provided herein are CD44+/A2B5+ astrocyte-like cells.
- biomarkers of astrocyte identity include high levels of glial fibrillary acidic protein (GFAP) (e.g., ≥80% of cells of an iPSC-derived population express GFAP) and low levels of neuronal class III beta-tubulin (TUJ1) (e.g., ≤15% of cells of an iPSC-derived population express TUJ1).
- Additional biomarkers of astrocyte identity include high levels of expression of aquaporin-4 (AQP4) (see, e.g., Jurga AM , Paleczna M, Kadluczka J, Kuter KZ.
- Cytotoxic T-cell-Like Cells Some aspects of the present disclosure provide cytotoxic T-cell-like cells and methods of producing such cells. Cytotoxic T-cells are a type of immune cell associated with the adaptive immune system. Cytotoxic T-cells are T lymphocytes that kill foreign cells and pathogens in the body. A cytotoxic T-cell-like cell is a cell that exhibits phenotypic characteristics of cytotoxic T-cells. For example, a cytotoxic T-cell-like cell may express one or more biomarkers expressed by a cytotoxic T-cell or exhibit one or more functions exhibited by a cytotoxic T-cell.
- Cytotoxic T-cells that develop in the body typically express the gene Cluster of Differentiation 3 (CD3) and the gene Cluster of Differentiation 8 (CD8), two putative marker genes for cytotoxic T-cells.
- a cytotoxic T-cell-like cell expresses CD3 (i.e., is CD3-positive (CD3+)).
- a cytotoxic T-cell-like cell expresses CD8 (i.e., is CD8+).
- the cytotoxic T-cell-like cells produced herein are CD3+/CD8+ cytotoxic T-cell-like cells (i.e., cells that express the CD3 protein and the CD8 protein).
- Hepatocyte-Like Cells Some aspects of the present disclosure provide hepatocyte-like cells and methods of producing such cells.
- Hepatocytes are specialized epithelial cells in the liver that play an important role in metabolism, detoxification, and protein synthesis. Hepatocytes also participate in the innate immune response by secreting immune proteins in response to invading cells, pathogens, and microorganisms.
- a hepatocyte-like cell is a cell that exhibits phenotypic characteristics of hepatocytes.
- a hepatocyte-like cell may express one or more biomarkers expressed by a hepatocyte or exhibit one or more functions exhibited by a hepatocyte.
- Hepatocytes produced in the body express the gene Cluster of Differentiation 184 (CD184) and the gene Asialoglycoprotein Receptor 1 (ASGPR1), two putative marker genes for hepatocytes.
- a hepatocyte-like cell expresses CD184 (i.e., is CD184-positive (CD184+)).
- a hepatocyte-like cell expresses ASGPR1 (i.e., is ASGPR1+).
- the hepatocyte-like cells produced by the methods provided herein are CD184+/ASGPR1+ hepatocyte-like cells (i.e., cells that express the CD184 protein and the ASGPR1 protein).
- Regulatory T-cells are a specialized type of T-cell that acts to suppress the immune response and maintain homeostasis in the body.
- a regulatory T-cell-like cell is a cell that exhibits phenotypic characteristics of regulatory T-cells.
- a regulatory T-cell-like cell may express one or more biomarkers expressed by a regulatory T-cell or exhibit one or more functions exhibited by a regulatory T-cell.
- Regulatory T-cells produced in the body express the gene Cluster of Differentiation 3 (CD3) and the gene Cluster of Differentiation 25 (CD25), two putative marker genes for regulatory T-cells.
- a regulatory T-cell-like cell expresses CD3 (i.e., is CD3-positive (CD3+)). In some embodiments, a regulatory T-cell-like cell expresses CD25 (i.e., is CD25+).
- the regulatory T-cell-like cells produced by the methods provided herein are CD3+/CD25+ regulatory T-cell-like cells (i.e., cells that express the CD3 protein and the CD25 protein).
- B Cell-Like Cells Some aspects of the present disclosure provide B cell-like cells and methods of producing such cells. B cells are a specialized type of white blood cells that make antibodies. B cells are a part of the immune system and develop from stem cells in the bone marrow.
- a B cell-like cell is a cell that exhibits phenotypic characteristics of B cells.
- a B cell-like cell may express one or more biomarkers expressed by a B cell or exhibit one or more functions exhibited by a B cell.
- B cells produced in the body express the gene Cluster of Differentiation 19 (CD19) and the gene Cluster of Differentiation 27 (CD27), two putative marker genes for B cells.
- a B cell-like cell expresses CD19 (i.e., is CD19-positive (CD19+)).
- a B cell-like cell expresses CD27 (i.e., is CD27+).
- the B cell-like cells produced by the methods provided herein are CD19+/CD27+ B cell-like cells (i.e., cells that express the CD19 protein and the CD27 protein).
- Microglia-Like Cells Some aspects of the present disclosure provide microglia-like cells and methods of producing such cells. Microglia are a specialized type of glial cell that functions as a macrophage in the central nervous system. A microglia-like cell is a cell that exhibits phenotypic characteristics of microglia. For example, a microglia-like cell may express one or more biomarkers expressed by a microglial cell or exhibit one or more functions exhibited by a microglial cell.
- Microglia produced in the body express the gene Cluster of Differentiation 11b (CD11b) and the gene C-X3-C Motif Chemokine Receptor 1 (CX3CR1), two putative marker genes for microglia.
- a microglia-like cell expresses CD11b (i.e., is CD11b-positive (CD11b+)).
- a microglia-like cell expresses CX3CR1 (i.e., is CX3CR1+).
- the microglia-like cells produced by the methods provided herein are CD11b+/CX3CR1+ microglia-like cells (i.e., cells that express the CD11b protein and the CX3CR1 protein).
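Double-positive populations such as CD11b+/CX3CR1+ (or the other marker pairs named above) are typically reported as a percentage of analyzed cells. The sketch below shows one way such a percentage could be computed from per-cell intensities; the function name, the fixed gating thresholds, and the example intensity values are hypothetical and do not reflect the gating used in the disclosure.

```python
def percent_double_positive(events, threshold_a, threshold_b):
    """Return the percentage of events positive for both markers.

    `events` is a list of (marker_a_intensity, marker_b_intensity) pairs, one
    per cell. A cell is called positive for a marker when its intensity
    exceeds the corresponding threshold (an assumption of this sketch).
    """
    if not events:
        return 0.0
    hits = sum(1 for a, b in events if a > threshold_a and b > threshold_b)
    return 100.0 * hits / len(events)


# Hypothetical intensities for four cells: (marker A, marker B).
cells = [(1200, 900), (150, 80), (1000, 50), (2000, 1500)]
print(percent_double_positive(cells, threshold_a=500, threshold_b=500))
```

In practice, thresholds would usually be set from unstained or control samples rather than fixed in advance.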
- Pluripotent Stem Cells The astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and microglia-like cells provided herein are differentiated from pluripotent stem cells.
- Pluripotent stem cells are cells that have the capacity to self-renew by dividing, and to develop into the three primary germ cell layers of the early embryo (i.e., ectoderm, endoderm, and mesoderm), and therefore into all cells of the adult body, but not extra-embryonic tissues such as the placenta.
- pluripotent stem cells include induced pluripotent stem cells (iPSCs), “true” embryonic stem cells (ESCs) derived from embryos, embryonic stem cells made by somatic cell nuclear transfer (ntESCs), and embryonic stem cells from unfertilized eggs (parthenogenesis embryonic stem cells, or pESCs).
- a pluripotent cell is a human pluripotent cell.
- a pluripotent stem cell is an embryonic stem cell, such as a human embryonic stem cell.
- Embryonic stem cell is a general term for pluripotent stem cells that are made using embryos or eggs, rather than for cells genetically reprogrammed from the body.
- ESCs encompass true ESCs, ntESCs, and pESCs.
- a pluripotent stem cell is an induced pluripotent stem cell, such as a human induced pluripotent stem cell.
- iPSCs may be derived from skin or blood cells that have been reprogrammed back into an embryonic-like pluripotent state that enables the development of an unlimited source of any type of human cell. See, e.g., Ye L et al. Curr Cardiol Rev.2013 Feb; 9(1): 63–72, incorporated herein by reference.
- the PSCs provided herein are engineered to differentiate into a particular cell type of interest by overexpressing one or more proteins (e.g., transcription factors) in the PSCs.
- a transcription factor is a protein that controls the rate of transcription.
- a protein is expressed in a PSC at a level that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 50%, or at least 100% higher than a control level, for example, an endogenous (baseline) level.
- a cell “expresses” a particular protein if the level of the protein in the cell is detectable (e.g., using a known protein assay).
- a cell “overexpresses” a particular protein (e.g., engineered polynucleotide encoding the protein) if the level of the protein is higher than (e.g., at least 5%, at least 10%, or at least 20% higher than) the level of the protein expressed from an endogenous, naturally-occurring polynucleotide encoding the protein.
- a control level of protein expression is an endogenous (baseline) level of expression of that same protein, for example, in a naturally-occurring pluripotent stem cell.
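The percentage thresholds above compare a measured protein level against a control (baseline) level. A minimal sketch of that comparison follows; the function names and the example values are hypothetical, and the measurement units are arbitrary as long as both levels share them.

```python
def percent_above_control(level, control_level):
    """Return how far `level` exceeds `control_level`, as a percentage."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (level - control_level) / control_level


def is_overexpressed(level, control_level, min_percent=5.0):
    """True when the level is at least `min_percent` percent above control."""
    return percent_above_control(level, control_level) >= min_percent


# Hypothetical measurements in arbitrary units.
print(percent_above_control(150, 100))             # 50% above baseline
print(is_overexpressed(150, 100, min_percent=20))  # meets a 20% threshold
print(is_overexpressed(104, 100))                  # below the 5% threshold
```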
- protein encompasses full length functional proteins as well as full-length or truncated functional variants of a protein, unless stated otherwise.
- a variant protein encompasses full length functional transcription factors as well as full-length or truncated functional variants of the transcription factors, unless stated otherwise.
- a variant protein may comprise an amino acid sequence that has, for example, at least 80% identity to the amino acid sequence of a corresponding wild-type or reference protein and still exhibits the same function(s) as the corresponding wild-type or reference protein.
- a variant protein has at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to a wild-type or reference protein.
- a global alignment is used to determine the percent identity between two proteins (e.g., the alignment spanning the entire length of both proteins).
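Percent identity from a global alignment can be illustrated with a small Needleman-Wunsch implementation. This is a sketch under stated assumptions: the scoring scheme (match +1, mismatch -1, linear gap -1) and the choice to divide identical positions by the full alignment length are illustrative, and the short peptide strings are made up. The disclosure does not prescribe a particular algorithm or scoring matrix.

```python
def global_percent_identity(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Percent identity over a Needleman-Wunsch global alignment."""
    n, m = len(seq_a), len(seq_b)
    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner, counting aligned columns.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        length += 1
        if i > 0 and j > 0:
            same = seq_a[i - 1] == seq_b[j - 1]
            step = match if same else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                identical += 1 if same else 0
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in seq_b
        else:
            j -= 1  # gap in seq_a
    return 100.0 * identical / length if length else 0.0


# Made-up peptide strings differing at one position.
print(global_percent_identity("MKTAYIAK", "MKTAYIAR"))
```

Because the alignment spans both sequences end to end, a truncated variant is penalized for its missing residues.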
- a local alignment may be used to determine the percent identity between regions of similarity between the two proteins (e.g., the alignment spanning the entire length of the truncated proteins but not the entire length of the wild-type or reference protein). Differentiation is the process by which an uncommitted cell or a partially committed cell commits to a specialized cell fate.
- aspects of the present disclosure relate to the differentiation of uncommitted pluripotent stem cells (e.g., induced pluripotent stem cells) into one or more cell fates selected from, for example, astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and microglia-like cells.
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from ETS Transcription Factor ERG (ERG), Early Growth Response 1 (EGR1), friend leukemia integration 1 transcription factor (FLI1), and FBJ murine osteosarcoma viral oncogene homolog B (FOSB), wherein the protein is overexpressed.
- the PSC comprises: one or more proteins selected from ERG, EGR1, FLI1, and FOSB.
- the PSC comprises: two or more proteins selected from ERG, EGR1, FLI1, and FOSB.
- the PSC comprises: three or more proteins selected from ERG, EGR1, FLI1, and FOSB.
- the PSC comprises: one protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: two proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: three proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises and/or overexpresses ERG. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses EGR1. In some embodiments, the PSC comprises and/or overexpresses FLI1.
- the PSC comprises and/or overexpresses FOSB. In some embodiments, the PSC comprises and/or overexpresses ERG and EGR1; ERG and FLI1; ERG and FOSB; EGR1 and FLI1; EGR1 and FOSB; or FLI1 and FOSB. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises and/or overexpresses ERG, EGR1, FLI1, and FOSB.
- a PSC in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding FLI1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding FOSB.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding EGR1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding FOSB.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1 and an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1 and an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding FLI1 and an engineered polynucleotide comprising an open reading frame encoding FOSB.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG, an engineered polynucleotide comprising an open reading frame encoding EGR1, an engineered polynucleotide comprising an open reading frame encoding FLI1, and an engineered polynucleotide comprising an open reading frame encoding FOSB.
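The pairwise combinations recited above can be enumerated mechanically. The following is a minimal sketch, assuming only the four factor names from the text; the names `FACTORS` and `factor_combinations` are illustrative and do not come from the disclosure.

```python
from itertools import combinations

# Factor names taken from the text; variable and function names are illustrative.
FACTORS = ["ERG", "EGR1", "FLI1", "FOSB"]


def factor_combinations(factors, size):
    """Return every unordered combination of `size` distinct factors."""
    return list(combinations(factors, size))


# Enumerate and print every unordered pair of factors.
pairs = factor_combinations(FACTORS, 2)
for first, second in pairs:
    print(f"{first} and {second}")
```

Calling `factor_combinations(FACTORS, 3)` would enumerate the three-factor sets in the same way.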
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), Runt-related transcription factor 3 (RUNX3), REL-associated protein (RELA), Nuclear respiratory factor 1 (NRF1), ETS2 Repressor Factor (ERF), and Sp4 transcription factor (SP4), wherein the protein is overexpressed.
- ZBTB1 Zinc finger and BTB domain containing 1
- RUNX3 Runt-related transcription factor 3
- RELA REL-associated protein
- NRF1 Nuclear respiratory factor 1
- SP4 Sp4 transcription factor
- the PSC comprises: one or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: two or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: three or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: four or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: five or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises: one protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: four proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: five proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises: ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises and/or overexpresses ZBTB1.
- overexpression refers to an expression level above the expression level in a control cell.
- the PSC comprises and/or overexpresses RUNX3.
- the PSC comprises and/or overexpresses RELA.
- the PSC comprises and/or overexpresses NRF1.
- the PSC comprises and/or overexpresses ERF.
- the PSC comprises and/or overexpresses SP4.
- the PSC comprises and/or overexpresses ZBTB1 and RUNX3; ZBTB1 and RELA; ZBTB1 and NRF1; ZBTB1 and SP4; RUNX3 and RELA; RUNX3 and NRF1; RUNX3 and ERF; RUNX3 and SP4; RELA and NRF1; RELA and ERF; RELA and SP4; NRF1 and ERF; NRF1 and SP4; or ERF and SP4.
- the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises and/or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
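Overexpression is described above as an expression level above the level in a control cell. The following is a minimal sketch of that comparison, assuming expression has already been measured and normalized by some assay; the function names and the numeric values are hypothetical and are not data from the disclosure.

```python
def is_overexpressed(sample_level, control_level):
    """Return True when the sample level exceeds the control-cell level."""
    if sample_level < 0 or control_level < 0:
        raise ValueError("expression levels must be non-negative")
    return sample_level > control_level


def fold_change(sample_level, control_level):
    """Return the sample level divided by the control level."""
    if control_level <= 0:
        raise ValueError("control level must be positive to compute a ratio")
    return sample_level / control_level


# Hypothetical normalized expression values, for illustration only.
control = {"ZBTB1": 3.0, "RUNX3": 5.0}
engineered = {"ZBTB1": 12.0, "RUNX3": 5.0}

for gene in control:
    print(gene, is_overexpressed(engineered[gene], control[gene]))
```

The strict inequality mirrors the wording "above the expression level in a control cell"; any minimum fold-change threshold would be an additional assumption.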
- a PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RUNX3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding ERF.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding SP4.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding RELA.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding NRF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding NRF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding ERF.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding SP4.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERF and an engineered polynucleotide comprising an open reading frame encoding SP4.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding RUNX3, an engineered polynucleotide comprising an open reading frame encoding RELA, an engineered polynucleotide comprising an open reading frame encoding NRF1, an engineered polynucleotide comprising an open reading frame encoding ERF, and an engineered polynucleotide comprising an open reading frame encoding SP4.
- a PSC e.g., iPSC, such as a human iPSC
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Hepatocyte Nuclear Factor 4 Alpha (HNF4A), Hepatocyte Nuclear Factor 4 Gamma (HNF4G), TEA Domain Transcription Factor 4 (TEAD4), and Regulatory Factor X3 (RFX3), wherein the protein is overexpressed.
- HNF4A Hepatocyte Nuclear Factor 4 Alpha
- HNF4G Hepatocyte Nuclear Factor 4 Gamma
- TEAD4 TEA Domain Transcription Factor 4
- RFX3 Regulatory Factor X3
- the PSC comprises: one or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: two or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: three or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: one protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: two proteins selected from HNF4A, HNF4G, TEAD4, and RFX3.
- the PSC comprises: three proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises and/or overexpresses HNF4A. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses HNF4G. In some embodiments, the PSC comprises and/or overexpresses TEAD4. In some embodiments, the PSC comprises and/or overexpresses RFX3.
- the PSC comprises and/or overexpresses HNF4A and HNF4G; HNF4A and TEAD4; HNF4A and RFX3; HNF4G and TEAD4; HNF4G and RFX3; or TEAD4 and RFX3. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises and/or overexpresses HNF4A, HNF4G, TEAD4, and RFX3.
- a PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding TEAD4.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RFX3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding HNF4G. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding RFX3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G and an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G and an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding TEAD4 and an engineered polynucleotide comprising an open reading frame encoding RFX3.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A, an engineered polynucleotide comprising an open reading frame encoding HNF4G, an engineered polynucleotide comprising an open reading frame encoding TEAD4, and an engineered polynucleotide comprising an open reading frame encoding RFX3.
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from ETS Proto-Oncogene 1, Transcription Factor (ETS1), ETS Variant Transcription Factor 3 (ETV3), GA Binding Protein Transcription Factor Subunit Alpha (GABPA), Krueppel-like factor 9 (KLF9), and Nuclear Factor Kappa B Subunit 1 (NFKB1), wherein the protein is overexpressed.
- ETS1 ETS Proto-Oncogene 1
- ETV3 ETS Variant Transcription Factor 3
- the PSC comprises: one or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: two or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: three or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: four or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: one protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
- the PSC comprises: two proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: three proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: four proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises and/or overexpresses ETS1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses ETV3.
- the PSC comprises and/or overexpresses GABPA. In some embodiments, the PSC comprises and/or overexpresses KLF9. In some embodiments, the PSC comprises and/or overexpresses NFKB1. In some embodiments, the PSC comprises and/or overexpresses ETS1 and ETV3; ETS1 and GABPA; ETS1 and KLF9; ETV3 and GABPA; ETV3 and KLF9; ETV3 and NFKB1; GABPA and KLF9; GABPA and NFKB1; or KLF9 and NFKB1. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises and/or overexpresses ETS1, ETV3, GABPA, KLF9, and NFKB1.
- a PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding KLF9.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding ETV3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding GABPA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding KLF9.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding GABPA.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding KLF9.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA and an engineered polynucleotide comprising an open reading frame encoding KLF9.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding KLF9 and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1, an engineered polynucleotide comprising an open reading frame encoding ETV3, an engineered polynucleotide comprising an open reading frame encoding GABPA, an engineered polynucleotide comprising an open reading frame encoding KLF9, and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
- a PSC e.g., iPSC, such as a human iPSC
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), EBF Transcription Factor 1 (EBF1), REL-associated protein (RELA), Nuclear respiratory factor 1 (NRF1), and REL proto-oncogene, NF-kB subunit (REL), wherein the protein is overexpressed.
- ZBTB1 Zinc finger and BTB domain containing 1
- EBF1 EBF Transcription Factor 1
- RELA REL-associated protein
- NRF1 Nuclear respiratory factor 1
- REL REL proto-oncogene, NF-kB subunit
- the PSC comprises: one or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- the PSC comprises: two or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- the PSC comprises: three or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: four or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: one protein selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: four proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- the PSC comprises: ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises and/or overexpresses ZBTB1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses EBF1. In some embodiments, the PSC comprises and/or overexpresses RELA. In some embodiments, the PSC comprises and/or overexpresses NRF1. In some embodiments, the PSC comprises and/or overexpresses REL.
- the PSC comprises and/or overexpresses ZBTB1 and EBF1; ZBTB1 and RELA; ZBTB1 and NRF1; EBF1 and RELA; EBF1 and NRF1; EBF1 and REL; RELA and NRF1; RELA and REL; or NRF1 and REL.
- the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- the PSC comprises and/or overexpresses ZBTB1, EBF1, RELA, NRF1, and REL.
- a PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding EBF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding NRF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding NRF1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding REL.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding REL.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding EBF1, an engineered polynucleotide comprising an open reading frame encoding RELA, an engineered polynucleotide comprising an open reading frame encoding NRF1, and an engineered polynucleotide comprising an open reading frame encoding REL.
- iPSC such as a human iPSC
- the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), Spi-1 Proto-Oncogene (SPI1), REL-associated protein (RELA), and Signal Transducer And Activator Of Transcription 2 (STAT2), wherein the protein is overexpressed.
- ZBTB1 Zinc finger and BTB domain containing 1
- SPI1 Spi-1 Proto-Oncogene
- RELA REL-associated protein
- STAT2 Signal Transducer And Activator Of Transcription 2
- the PSC comprises: one or more proteins selected from ZBTB1, SPI1, RELA, and STAT2.
- the PSC comprises: two or more proteins selected from ZBTB1, SPI1, RELA, and STAT2.
- the PSC comprises: three or more proteins selected from ZBTB1, SPI1, RELA, and STAT2.
- the PSC comprises: one protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses SPI1. In some embodiments, the PSC comprises and/or overexpresses RELA.
- the PSC comprises and/or overexpresses STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1 and SPI1; ZBTB1 and RELA; ZBTB1 and STAT2; SPI1 and RELA; SPI1 and STAT2; or RELA and STAT2. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1, SPI1, RELA, and STAT2.
- a PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, SPI1, RELA, and STAT2.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding STAT2.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding SPI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding STAT2.
- a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1 and an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding STAT2.
- the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, SPI1, RELA, and STAT2.
- the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding SPI1, an engineered polynucleotide comprising an open reading frame encoding RELA, and an engineered polynucleotide comprising an open reading frame encoding STAT2.
- Engineered Transcription Factors - Polynucleotides and Polypeptides
- PSCs of the present disclosure, in some embodiments, comprise engineered polynucleotides.
- An engineered polynucleotide is a nucleic acid (e.g., at least two nucleotides covalently linked together, and in some instances, containing phosphodiester bonds, referred to as a phosphodiester backbone) that does not occur in nature.
- Engineered polynucleotides include recombinant nucleic acids and synthetic nucleic acids.
- a recombinant nucleic acid is a molecule that is constructed by joining nucleic acids (e.g., isolated nucleic acids, synthetic nucleic acids or a combination thereof) from two different organisms (e.g., human and mouse).
- a synthetic nucleic acid is a molecule that is amplified or chemically, or by other means, synthesized.
- a synthetic nucleic acid includes those that are chemically modified, or otherwise modified, but can base pair with (bind to) naturally occurring nucleic acid molecules.
- Recombinant and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing.
- An engineered polynucleotide may comprise DNA (e.g., genomic DNA, cDNA or a combination of genomic DNA and cDNA), RNA or a hybrid molecule, for example, where the nucleic acid contains any combination of deoxyribonucleotides and ribonucleotides (e.g., artificial or natural), and any combination of two or more bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine.
- a polynucleotide is a complementary DNA (cDNA).
- cDNA is synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by reverse transcriptase.
- mRNA messenger RNA
- miRNA microRNA
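Because reverse transcriptase copies an RNA template into a complementary DNA strand, the first-strand cDNA sequence is the reverse complement of the template, written with T in place of U. The following is a minimal sketch of that sequence relationship only, not of the enzymatic reaction; the function name and the example template are illustrative assumptions.

```python
# Map each RNA base to the DNA base that pairs with it.
RNA_TO_CDNA = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') complementary to an mRNA (5'->3')."""
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("mRNA may contain only A, C, G and U")
    # Complement every base, then reverse so the result reads 5'->3'.
    return mrna.translate(RNA_TO_CDNA)[::-1]


# Made-up nine-nucleotide template, for illustration only.
print(first_strand_cdna("AUGGCCUAA"))  # prints TTAGGCCAT
```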
- Engineered polynucleotides of the present disclosure may be produced using standard molecular biology methods (see, e.g., Green and Sambrook, Molecular Cloning, A Laboratory Manual, 2012, Cold Spring Harbor Press).
- nucleic acids are produced using GIBSON ASSEMBLY® Cloning (see, e.g., Gibson, D.G. et al. Nature Methods, 343–345, 2009; and Gibson, D.G. et al.
- GIBSON ASSEMBLY® typically uses three enzymatic activities in a single-tube reaction: 5' exonuclease, the 3' extension activity of a DNA polymerase and DNA ligase activity.
- the 5' exonuclease activity chews back the 5' end sequences and exposes the complementary sequence for annealing.
- the polymerase activity then fills in the gaps on the annealed domains.
- a DNA ligase then seals the nick and covalently links the DNA fragments together.
- the overlapping sequence of adjoining fragments is much longer than that used in Golden Gate Assembly, and therefore results in a higher percentage of correct assemblies.
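The outcome of the chew-back, anneal, fill-in, and ligate steps can be modeled at the sequence level: adjoining fragments that share an identical terminal overlap are merged so that the overlap appears once in the product. The following is a simplified in-silico sketch, not a model of the enzymology; the function names, the 15-nucleotide default minimum overlap, and the example sequences are assumptions chosen for illustration.

```python
def join_by_overlap(left, right, min_overlap=15):
    """Join two 5'->3' fragments whose adjoining ends share an identical overlap."""
    max_len = min(len(left), len(right))
    # Prefer the longest shared end; stop at the minimum allowed overlap.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no terminal overlap of the required length")


def assemble(fragments, min_overlap=15):
    """Assemble ordered fragments by merging each adjoining pair in turn."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = join_by_overlap(product, fragment, min_overlap)
    return product


# Made-up sequences: a 16-nt shared end joins the two fragments.
overlap = "GATTACAGATTACAGG"
frag_a = "ATGCCCAAA" + overlap
frag_b = overlap + "TTTGGGTAA"
print(assemble([frag_a, frag_b]))
```

Fragments that lack a sufficiently long shared end raise an error rather than being joined, which loosely reflects why longer overlaps favor correct assemblies.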
- an engineered polynucleotide comprises a promoter operably linked to an open reading frame.
- a promoter is a nucleotide sequence to which RNA polymerase binds to initiate transcription (e.g., ATG). Promoters are typically located directly upstream from (at the 5' end of) a transcription initiation site.
- a promoter is a heterologous promoter. A heterologous promoter is not naturally associated with the open reading frame to which it is operably linked.
- a promoter is an inducible promoter.
- An inducible promoter may be regulated in vivo by a chemical agent, temperature, or light, for example.
- Inducible promoters enable, for example, temporal and/or spatial control of gene expression.
- Inducible promoters for use in accordance with the present disclosure include any inducible promoter described herein or known to one of ordinary skill in the art.
- inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast
- the inducible promoter is a tetracycline-inducible promoter. In some embodiments, the inducible promoter is a doxycycline-inducible promoter. In other embodiments, a promoter is a constitutive promoter (active in vivo, unregulated).
- An open reading frame is a continuous stretch of codons that begins with a start codon (e.g., ATG), ends with a stop codon (e.g., TAA, TAG, or TGA), and encodes a polypeptide, for example, a protein.
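The definition above translates directly into a scan for a start codon followed, in the same reading frame, by a stop codon. The following is a minimal sketch that searches a single DNA strand in the 5' to 3' direction; the function name and the example sequence are illustrative assumptions, and the reverse strand is not considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna):
    """Return the first ATG...stop open reading frame on this strand, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Step codon by codon, staying in the frame set by the start codon.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return dna[start:pos + 3]
        # No in-frame stop codon: try the next ATG downstream.
        start = dna.find("ATG", start + 1)
    return None


# Made-up sequence; the out-of-frame TGA is skipped and the in-frame TAG ends the ORF.
print(find_first_orf("CCATGGCTGAATTTTAGGC"))  # prints ATGGCTGAATTTTAG
```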
- An open reading frame is operably linked to a promoter if that promoter regulates transcription of the open reading frame.
- Vectors used for delivery of an engineered polynucleotide include minicircles, plasmids, bacterial artificial chromosomes (BACs), and yeast artificial chromosomes.
- Transposon-based systems, such as the piggyBac™ system (e.g., Chen et al. Nature Communications. 2020; 11(1): 3446), are also contemplated herein.
- an engineered polynucleotide comprises an open reading frame encoding ERG (e.g., UniprotKB Accession No. P11308).
- the protein comprises the sequence of: MASTIKEALSVVSEDQSLFECAYGTPHLAKTEMTASSSSDYGQTSKMSPRVPQQDWLSQPPARVTIKMECNPSQV NGSRNSPDECSVAKGGKMVGSPDTVGMNYGSYMEEKHMPPPNMTTNERRVIVPADPTLWSTDHVRQWLEWAVKEY GLPDVNILLFQNIDGKELCKMTKDDFQRLTPSYNADILLSHLHYLRETPLPHLTSDDVDKALQNSPRLMHARNTG GAAFIFPNTSVYPEATQRITTRPDLPYEPPRRSAWTGHGHPTPQSKAAQPSPSTVPKTEDQRPQLDPYQILGPTS SRLANPGSGQIQLWQFLLELLSDSSNSSCITWEGTNGEFKMTDPDEVARRWGERKSKPNMNYDKLSRALRYYYDK NIMTKVHGKRYAYKFDFHGIAQALQPHPPESSLYKYPSDLPYMGS
- an engineered polynucleotide comprises an open reading frame encoding EGR1 (e.g., UniprotKB Accession No. P18146).
- the protein comprises the sequence of: MTTLKEAVTFKDVAVVFTEEELRLLDLAQRKLYREVMLENFRNLLSVGHQSLHRDTFHFLKEEKFWMMETATQRE GNLGGKIQMEMETVSESGTHEGLFSHQTWEQISSDLTRFQDSMVNSFQFSKQDDMPCQVDAGLSIIHVRQKPSEG RTCKKSFSDVSVLDLHQQLQSREKSHTCDECGKSFCYSSALRIHQRVHMGEKLYNCDVCGKEFNQSSHLQIHQRI HTGEKPFKCEQCGKGFSRRSGLYVHRKLHTGVKPHICEKCGKAFIHDSQLQEHQRIHTGEKPFKCDICCKSFRSR ANLNRHSMVHMREKPFRCDTCGKSFGLKSALNSHRMVHTGE
- an engineered polynucleotide comprises an open reading frame encoding FLI1 (e.g., UniprotKB Accession No. Q01543).
- the protein comprises the sequence of: MDGTIKEALSVVSDDQSLFDSAYGAAAHLPKADMTASGSPDYGQPHKINPLPPQQEWINQPVRVNVKREYDHMNG SRESPVDCSVSKCSKLVGGGESNPMNYNSYMDEKNGPPPPNMTTNERRVIVPADPTLWTQEHVRQWLEWAIKEYS LMEIDTSFFQNMDGKELCKMNKEDFLRATTLYNTEVLLSHLSYLRESSLLAYNTTSHTDQSSRLSVKEDPSYDSV RRGAWGNNMNSGLNKSPPLGGAQTISKNTEQRPQPDPYQILGPTSSRLANPGSGQIQLWQFLLELLSDSANASCI TWEGTNGEFKMTDPDEVARRWGERKSKPNMNYDKLSRALRYYYY
- an engineered polynucleotide comprises an open reading frame encoding FOSB (e.g., UniprotKB Accession No. P53539).
- the protein comprises the sequence of: MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTAITTSQDLQWLVQPTL ISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGSGGPSTSGTTSGPGPARPARARPRRPREETL TPEEEEKRRVRRERNKLAAAKCRNRRRELTDRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIP YEEGPGPGPLAEVRDLPGSAPAKEDGFSWLLPPPPPPPLPFQTSQDAPPNLTASLFTHSEVQVLGDPFPVVNPSY TSSFVLTCPEVSAFAGAQRTSGSDQPSDPLNSPSLLALWIHPAFLY (
- an engineered polynucleotide comprises an open reading frame encoding ZBTB1 (e.g., UniprotKB Accession No. Q9Y2K1).
- the protein comprises the sequence of: MAKPSHSSYVLQQLNNQREWGFLCDCCIAIDDIYFQAHKAVLAACSSYFRMFFMNHQHSTAQLNLSNMKISAECF DLILQFMYLGKIMTAPSSFEQFKVAMNYLQLYNVPDCLEDIQDADCSSSKCSSSASSKQNSKMIFGVRMYEDTVA RNGNEANRWCAEPSSTVNTPHNREADEESLQLGNFPEPLFDVCKKSSVSKLSTPKERVSRRFGRSFTCDSCGFGF SCEKLLDEHVLTCTNRHLYQNTRSYHRIVDIRDGKDSNIKAEFGEKDSSKTFSAQTDKYRGDTSQAADDSASTTG SRKSSTVESEIASEEKSRAAERKRIIIKMEPEDIPTDELK
- an engineered polynucleotide comprises an open reading frame encoding RUNX3 (e.g., UniprotKB Accession No. Q13761).
- the protein comprises the sequence of: MASNSIFDSFPTYSPTFIRDPSTSRRFTPPSPAFPCGGGGGKMGENSGALSAQAAVGPGGRARPEVRSMVDVLAD HAGELVRTDSPNFLCSVLPSHWRCNKTLPVAFKVVALGDVPDGTVVTVMAGNDENYSAELRNASAVMKNQVARFN DLRFVGRSGRGKSFTLTITVFTNPTQVATYHRAIKVTVDGPREPRRHRQKLEDQTKPFPDRFGDLERLRMRVTPS TPSPRGSLSTTSHFSSQPQTPIQGTSELNPFSDPRQFDRSFPTLPTLTESRFPDPRMHYPGAMSAAFPYSATPSG TSISSLSVAGMPATSRFHHTYLPPPYPGAPQNQSGPFQANPSPYHLYYGTSSGSYQFSMVAGSSS
- an engineered polynucleotide comprises an open reading frame encoding RELA (e.g., UniprotKB Accession No. Q04206).
- the protein comprises the sequence of: MDELFPLIFPAEPAQASGPYVEIIEQPKQRGMRFRYKCEGRSAGSIPGERSTDTTKTHPTIKINGYTGPGTVRIS LVTKDPPHRPHPHELVGKDCRDGFYEAELCPDRCIHSFQNLGIQCVKKRDLEQAISQRIQTNNNPFQVPIEEQRG DYDLNAVRLCFQVTVRDPSGRPLRLPPVLSHPIFDNRAPNTAELKICRVNRNSGSCLGGDEIFLLCDKVQKEDIE VCPQASTPALSLYVIPEHHQL* (SEQ ID NO: 7)
- the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 7.
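- By way of non-limiting illustration, a percent-identity threshold of the kind recited above can be evaluated once two amino acid sequences have been aligned. The sketch below is an assumption-laden example: the disclosure does not specify an alignment algorithm or the denominator used, so the code assumes the sequences are already aligned to equal length with `-` marking gaps, and divides matches by the number of alignment columns. The names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both inputs are the same length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only when both residues are identical.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the convention used should be stated alongside any reported identity.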
- an engineered polynucleotide comprises an open reading frame encoding NRF1 (e.g., UniprotKB Accession No. Q16656).
- the protein comprises the sequence of: MEEHGVTQTEHMATIEAHAVAQQVQQVHVATYTEHSMLSADEDSPSSPEDTSYDDSDILNSTAADEVTAHLAAAG PVGMAAAAAVATGKKRKRPHVFESNPSIRKRQQTRLLRKLRATLDEYTTRVGQQAIVLCISPSKPNPVFKVFGAA PLENVVRKYKSMILEDLESALAEHAPAPQEVNSELPPLTIDGIPVSVDKMTQAQLRAFIPEMLKYSTGRGKPGWG KESCKPIWWPEDIPWANVRSDVRTEEQKQRVSWTQALRTIVKNCYKQHGREDLLYAFEDQQTQTQATATHSIAHL VPSQTVVQTFSNPDGTVSLIQVGTGATVATLADASELPTTV
- an engineered polynucleotide comprises an open reading frame encoding ERF (e.g., UniprotKB Accession No. P50548).
- the protein comprises the sequence of: MKTPADTGFAFPDWAYKPESSPGSRQIQLWHFILELLRKEEYQGVIAWQGDYGEFVIKDPDEVARLWGVRKCKPQ MNYDKLSRALRYYYNKRILHKTKGKRFTYKFNFNKLVLVNYPFIDVGLAGGAVPQSAPPVPSGGSHFRFPPSTPS EVLSPTEDPRSPPACSSSSSSLFSAVVARRLGRGSVSDCSDGTSELEEPLGEDPRARPPGPPDLGAFRGPPLARL PHDPGVFRVYPRPRGGPEPLSPFPVSPLAGPGSLLPPQLSPALPMTPTHLAYTPSPTLSPMYPSGGGGPSGSGGG SHFSFSPEDMKRYLQAHTQSVYNYHLSPRAFLHYPGLVVPQPQRPDKCPLPPMAPET
- an engineered polynucleotide comprises an open reading frame encoding SP4 (e.g., UniprotKB Accession No. Q02446).
- the protein comprises the sequence of: MSDQKKEEEEEAAAAAAMATEGGKTSEPENNNKKPKTSGSQDSQPSPLALLAATCSKIGTPGENQATGQQQIIID PSQGLVQLQNQPQQLELVTTQLAGNAWQLVASTPPASKENNVSQPASSSSSSSSSNNGSASPTKTKSGNSSTPGQ FQVIQVQNPSGSVQYQVIPQLQTVEGQQIQINPTSSSSLQDLQGQIQLISAGNNQAILTAANRTASGNILAQNLA NQTVPVQIRPGVSIPLQLQTLPGTQAQVVTTLPINIGGVTLALPVINNVAAGGGTGQVGQPAATADSGTSNGNQL VSTPTNTTTSASTMPESPSSSTTCTTTASTSLTSSDTLVSSADTGQ
- an engineered polynucleotide comprises an open reading frame encoding HNF4A (e.g., UniprotKB Accession No. P41235).
- the protein comprises the sequence of: MRLSKTLVDMDMADYSAALDPAYTTLEFENVQVLTMGNDTSPSEGTNLNAPNSLGVSALCAICGDRATGKHYGAS SCDGCKGFFRRSVRKNHMYSCRFSRQCVVDKDKRNQCRYCRLKKCFRAGMKKEAVQNERDRISTRRSSYEDSSLP SINALLQAEVLSRQITSPVSGINGDIRAKKIASIADVCESMKEQLLVLVEWAKYIPAFCELPLDDQVALLRAHAG EHLLLGATKRSMVFKDVLLLGNDYIVPRHCPELAEMSRVSIRILDELVLPFQELQIDDNEYAYLKAIIFFDPDAK GLSDPGKIKRLRSQVQVSLEDYINDRQYDSRGRFGELLLLLPTLQSITWQ
- an engineered polynucleotide comprises an open reading frame encoding HNF4G (e.g., UniprotKB Accession No. Q14541).
- the protein comprises the sequence of: MDMANYSEVLDPTYTTLEFETMQILYNSSDSSAPETSMNTTDNGVNCLCAICGDRATGKHYGASSCDGCKGFFRR SIRKSHVYSCRFSRQCVVDKDKRNQCRYCRLRKCFRAGMKKEAVQNERDRISTRRSTFDGSNIPSINTLAQAEVR SRQISVSSPGSSTDINVKKIASIGDVCESMKQQLLVLVEWAKYIPAFCELPLDDQVALLRAHAGEHLLLGATKRS MMYKDILLLGNNYVIHRNSCEVEISRVANRVLDELVRPFQEIQIDDNEYACLKAIVFFDPDAKGLSDPVKIKNMR FQVQIGLEDYINDRQYDSRGRFGELLLLLPTLQSITWQMIEQIQF
- an engineered polynucleotide comprises an open reading frame encoding TEAD4 (e.g., UniprotKB Accession No. Q15561).
- the protein comprises the sequence of: MYGRNELIARYIKLRTGKTRTRKQVSSHIQVLARRKAREIQAKLKDQAAKDKALQSMAAMSSAQIISATAFHSSM ALARGPGRPAVSGFWQGALPGQAGTSHDVKPFSQQTYAVQPPLPLPGFESPAGPAPSPSAPPAPPWQGRSVASSK LWMLEFSAFLEQQQDPDTYNKHLFVHIGQSSPSYSDPYLEAVDIRQIYDKFPEKKGGLKDLFERGPSNAFFLVKF WADLNTNIEDEGSSFYGVSSQYESPENMIITCSTKVCSFGKQVVEKVETEYARYENGHYSYRIHRSPLCEYMINF IHKLKHLPEKYMMNSVLENFTILQVVTNRDTQETLLCIAY
- an engineered polynucleotide comprises an open reading frame encoding RFX3 (e.g., UniprotKB Accession No. P48380).
- the protein comprises the sequence of: MQTSETGSDTGSTVTLQTSVASQAAVPTQVVQQVPVQQQVQVQTVQQVQHVYPAQVQYVEGSDTVYTNGAIRTT TYPYTETQMYSQNTGGNYFDTQGSSAQVTTVVSSHSMVGTGGIQMGVTGGQLISSSGGTYLIGNSMENSGHSVTH TTRASPATIEMAIETLQKSDGLSTHRSSLLNSHLQWLLDNYETAEGVSLPRSTLYNHYLRHCQEHKLDPVNAASF GKLIRSIFMGLRTRRLGTRGNSKYHYYGIRVKPDSPLNRLQEDMQYMAMRQQPMQQKQRYKPMQKVDGVADGFTG SGQQTGTSVGQTVI
- an engineered polynucleotide comprises an open reading frame encoding ETS1 (e.g., UniprotKB Accession No. P14921).
- the protein comprises the sequence of: MKAAVDLKPTLTIIKTEKVDLELFPSPDMECADVPLLTPSSKEMMSQALKATFSGFTKEQQRLGIPKDPRQWTET HVRDWVMWAVNEFSLKGVDFQKFCMNGAALCALGKDCFLELAPDFVGDILWEHLEILQKEDVKPYQVNGVNPAYP ESRYTSDYFISYGIEHAQCVPPSEFSEPSFITESYQTLHPISSEELLSLKYENDYPSVILRDPLQTDTLQNDYFA IKQEVVTPDNMCMGRTSRGKLGGQDSFESIESYDSCGQEMGKEEKQT* (SEQ ID NO: 15)
- the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 15.
- an engineered polynucleotide comprises an open reading frame encoding ETV3 (e.g., UniprotKB Accession No. P41162).
- the protein comprises the sequence of: MKAGCSIVEKPEGGGGYQFPDWAYKTESSPGSRQIQLWHFILELLQKEEFRHVIAWQQGEYGEFVIKDPDEVARL WGRRKCKPQMNYDKLSRALRYYYNKRILHKTKGKRFTYKFNFNKLVMPNYPFINIRSSGKIQTLLVGN (SEQ ID NO: 16)
- the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 16.
- an engineered polynucleotide comprises an open reading frame encoding GABPA (e.g., UniprotKB Accession No. Q06546).
- the protein comprises the sequence of: MTKREAEELIEIEIDGTEKAECTEESIVEQTYAPAECVSQAIDINEPIGNLKKLLEPRLQCSLDAHEICLQDIQL DPERSLFDQGVKTDGTVQLSVQVISYQGIEPKLNILEIVKPADTVEVVIDPDAHHAESEAHLVEEAQVITLDGTK HITTISDETSEQVTRWAAALEGYRKEQERLGIPYDPIQWSTDQVLHWVVWVMKEFSMTDIDLTTLNISGRELCSL NQEDFFQRVPRGEILWSHLELLRKYVLASQEQQMNEIVTIDQPVQIIPASVQSATPTTIKVINSSAKAAKVQRAP RISGEDRSSPGNRTGNNGQIQLWQFLLELLTDKDARDCIS
- an engineered polynucleotide comprises an open reading frame encoding KLF9 (e.g., UniprotKB Accession No. Q13886).
- the protein comprises the sequence of: MSAAAYMDFVAAQCLVSISNRAAVPEHGVAPDAERLRLPEREVTKEHGDPGDTWKDYCTLVTIAKSLLDLNKYRP IQTPSVCSDSLESPDEDMGSDSDVTTESGSSPSHSPEERQDPGSAPSPLSLLHPGVAAKGKHASEKRHKCPYSGC GKVYGKSSHLKAHYRVHTGERPFPCTWPDCLKKFSRSDELTRHYRTHTGEKQFRCPLCEKRFMRSDHLTKHARRH TEFHPSMIKRSKKALANALL (SEQ ID NO: 18)
- the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 18.
- an engineered polynucleotide comprises an open reading frame encoding NFKB1 (e.g., UniprotKB Accession No. P19838).
- the protein comprises the sequence of: MAEDDPYLGRPEQMFHLDPSLTHTIFNPEVFQPQMALPTADGPYLQILEQPKQRGFRFRYVCEGPSHGGLPGASS EKNKKSYPQVKICNYVGPAKVIVQLVTNGKNIHLHAHSLVGKHCEDGICTVTAGPKDMVVGFANLGILHVTKKKV FETLEARMTEACIRGYNPGLLVHPDLAYLQAEGGGDRQLGDREKELIRQAALQQTKEMDLSVVRLMFTAFLPDST GSFTRRLEPVVSDAIYDSKAPNASNLKIVRMDRTAGCVTGGEEIYLLCDKVQKDDIQIRFYEEEENGGVWEGFGD FSPTDVHRQFAIVFKTPKYKDINITKPASVFVQL
- an engineered polynucleotide comprises an open reading frame encoding EBF1 (e.g., UniprotKB Accession No. Q9UH73).
- the protein comprises the sequence of: MFGIQESIQRSGSSMKEEPLGSGMNAVRTWMQGAGVLDANTAAQSGVGLARAHFEKQPPSNLRKSNFFHFVLALY DRQGQPVEIERTAFVGFVEKEKEANSEKTNNGIHYRLQLLYSNGIRTEQDFYVRLIDSMTKQAIVYEGQDKNPEM CRVLLTHEIMCSRCCDKKSCGNRNETPSDPVIIDRFFLKFFLKCNQNCLKNAGNPRDMRRFQVVVSTTVNVDGHV LAVSDNMFVHNNSKHGRRARRLDPSEGTPSYLEHATPCIKAISPSEGWTTGGATVIIIGDNFFDGLQVIFGTMLV WSELITPHAIRVQTPPRHIPGVVEVTLSYKSKQFCKGTPGRFIYTALNE
- an engineered polynucleotide comprises an open reading frame encoding REL (e.g., UniprotKB Accession No. Q04864).
- the protein comprises the sequence of: MASGAYNPYIEIIEQPRQRGMRFRYKCEGRSAGSIPGEHSTDNNRTYPSIQIMNYYGKGKVRITLVTKNDPYKPH PHDLVGKDCRDGYYEAEFGQERRPLFFQNLGIRCVKKKEVKEAIITRIKAGINPFNVPEKQLNDIEDCDLNVVRL CFQVFLPDEHGNLTTALPPVVSNPIYDNRAPNTAELRICRVNKNCGSVRGGDEIFLLCDKVQKDDIEVRFVLNDW EAKGIFSQADVHRQVAIVFKTPPYCKAITEPVTVKMQLRRPSDQEVSESMDFRYLPDEKDTYGNKAKKQKTTLLF QKLCQDHVNFPERPRPGLLGSIGEGRYFKKEPNLFSHDAVV
- an engineered polynucleotide comprises an open reading frame encoding SPI1 (e.g., UniprotKB Accession No. P17947).
- the protein comprises the sequence of: MLQACKMEGFPLVPPQPSEDLVPYDTDLYQRQTHEYYPYLSSDGESHSDHYWDFHPHHVHSEFESFAENNFTELQ SVQPPQLQQLYRHMELEQMHVLDTPMVPPHPSLGHQVSYLPRMCLQYPSLSPAQPSSDEEEGERQSPPLEVSDGE ADGLEPGPGLLPGETGSKKKIRLYQFLLDLLRSGDMKDSIWWVDKDKGTFQFSSKHKEALAHRWGIQKGNRKKMT YQKMARALRNYGKTGEVKKVKKKLTYQFSGEVLGRGGLAERRHPPH* (SEQ ID NO: 22)
- the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 22.
- an engineered polynucleotide comprises an open reading frame encoding STAT2 (e.g., UniprotKB Accession No. P52630).
- the protein comprises the sequence of: MAQWEMLQNLDSPFQDQLHQLYSHSLLPVDIRQYLAVWIEDQNWQEAALGSDDSKATMLFFHFLDQLNYECGRCS QDPESLLLQHNLRKFCRDIQPFSQDPTQLAEMIFNLLLEEKRILIQAQRAQLEQGEPVLETPVESQQHEIESRIL DLRAMMEKLVKSISQLKDQQDVFCFRYKIQAKGKTPSLDPHQTKEQKILQETLNELDKRRKEVLDASKALLGRLT TLIELLLPKLEEWKAQQQKACIRAPIDHGLEQLETWFTAGAKLLFHLRQLLKELKGLSCLVSYQDDPLTKGVDLR NAQVTELLQRLLHRAFVVETQPCMPQTP
- a PSC comprises 1-20 copies of an engineered polynucleotide.
- a PSC may comprise 1-15, 1-10, 2-15, 2-10, 5-20, 5-15, or 5-10 copies of an engineered polynucleotide.
- a PSC comprises 8-10 copies of an engineered polynucleotide.
- a PSC comprises fewer than 25 copies of an engineered polynucleotide.
- a PSC may comprise fewer than 20, fewer than 15, or fewer than 10 copies of an engineered polynucleotide.
- Methods of Producing Cells from Stem Cells
- Some aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD44+ and A2B5+ astrocyte-like cells from stem cells (e.g., iPSCs) in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing astrocyte-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells.
- the method comprises expressing ERG in PSCs of the expanded population.
- the method comprises expressing EGR1 in PSCs of the expanded population.
- the method comprises expressing FLI1 in PSCs of the expanded population.
- the method comprises expressing FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and EGR1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and FLI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing EGR1 and FLI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing EGR1 and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing FLI1 and FOSB in PSCs of the expanded population.
- the method comprises expressing any preceding combination and a (at least one) protein selected from ERG, EGR1, FLI1, and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG, EGR1, FLI1, and FOSB in PSCs of the expanded population.
- Other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD3+ and CD8+ cytotoxic T-cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing cytotoxic T-cell-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells.
- the method comprises expressing ZBTB1 in PSCs of the expanded population.
- the method comprises expressing RUNX3 in PSCs of the expanded population.
- the method comprises expressing RELA in PSCs of the expanded population.
- the method comprises expressing NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RUNX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and ERF in PSCs of the expanded population.
- the method comprises expressing ZBTB1 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and ERF in PSCs of the expanded population.
- the method comprises expressing RELA and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERF and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 in PSCs of the expanded population.
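- By way of non-limiting illustration, the single-factor, pairwise, and higher-order embodiments recited above for the cytotoxic T-cell-like factor panel can be enumerated programmatically. The following minimal sketch uses the Python standard library; the names `CYTOTOXIC_T_FACTORS` and `factor_combinations` are hypothetical and chosen only for this example.

```python
from itertools import combinations

# Factor panel recited above for cytotoxic T-cell-like cells.
CYTOTOXIC_T_FACTORS = ["ZBTB1", "RUNX3", "RELA", "NRF1", "ERF", "SP4"]


def factor_combinations(factors, min_size=1):
    """Yield every subset of `factors` containing at least `min_size` members."""
    for size in range(min_size, len(factors) + 1):
        yield from combinations(factors, size)


# The pairwise embodiments, in the same order as they are listed in the text.
pairs = list(combinations(CYTOTOXIC_T_FACTORS, 2))

# Every non-empty combination, from single factors up to the full panel.
all_sets = list(factor_combinations(CYTOTOXIC_T_FACTORS))

if __name__ == "__main__":
    for pair in pairs:
        print(" and ".join(pair))
```

The same helper can be applied to any of the other factor panels by passing a different list.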
- Yet other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD184+ and ASGPR1+ hepatocyte-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing hepatocyte-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4A, HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells.
- the method comprises expressing HNF4A in PSCs of the expanded population.
- the method comprises expressing HNF4G in PSCs of the expanded population.
- the method comprises expressing TEAD4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and HNF4G in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and TEAD4 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4G and TEAD4 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4G and RFX3 in PSCs of the expanded population.
- the method comprises expressing TEAD4 and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from HNF4A, HNF4G, TEAD4, and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A, HNF4G, TEAD4, and RFX3 in PSCs of the expanded population.
- Still other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD3+ and CD25+ regulatory T-cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing regulatory T-cell-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells.
- the method comprises expressing ETS1 in PSCs of the expanded population.
- the method comprises expressing ETV3 in PSCs of the expanded population.
- the method comprises expressing GABPA in PSCs of the expanded population.
- the method comprises expressing KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and ETV3 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and GABPA in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 and GABPA in PSCs of the expanded population.
- the method comprises expressing ETV3 and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing GABPA and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing GABPA and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing KLF9 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 in PSCs of the expanded population.
- the method comprises expressing ETS1, ETV3, GABPA, KLF9, and NFKB1 in PSCs of the expanded population.
- Further aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD19+ and CD27+ B cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing B cell-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, EBF1, RELA, NRF1, and REL to produce B cell-like cells.
- the method comprises expressing ZBTB1 in PSCs of the expanded population.
- the method comprises expressing EBF1 in PSCs of the expanded population.
- the method comprises expressing RELA in PSCs of the expanded population.
- the method comprises expressing NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing REL in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and EBF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing EBF1 and RELA in PSCs of the expanded population.
- the method comprises expressing EBF1 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing EBF1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL in PSCs of the expanded population.
- the method comprises expressing ZBTB1, EBF1, RELA, NRF1, and REL in PSCs of the expanded population.
- Additional aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD11b+ and CX3CR1+ microglia-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days).
- the methods of producing microglia-like cells comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, SPI1, RELA, and STAT2 to produce microglia-like cells.
- the method comprises expressing ZBTB1 in PSCs of the expanded population.
- the method comprises expressing SPI1 in PSCs of the expanded population.
- the method comprises expressing RELA in PSCs of the expanded population.
- the method comprises expressing STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and SPI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing SPI1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing SPI1 and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and STAT2 in PSCs of the expanded population.
- the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, SPI1, RELA, and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1, SPI1, RELA, and STAT2 in PSCs of the expanded population.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter, non-limiting examples of which are provided elsewhere herein.
- the population (a starting population) comprises, in some embodiments, about 1x10^2-1x10^10, about 1x10^2-1x10^9, about 1x10^2-1x10^8, or about 1x10^2-1x10^7 PSCs. In some embodiments, the population comprises about 1x10^3-1x10^8 or about 1x10^3-1x10^7 PSCs. In some embodiments, the population comprises about 1x10^4-1x10^7 or about 1x10^5-1x10^6 PSCs.
- the population comprises about 1x10^1 PSCs, about 1x10^2 PSCs, about 1x10^3 PSCs, about 1x10^4 PSCs, about 1x10^5 PSCs, about 1x10^6 PSCs, about 1x10^7 PSCs, about 1x10^8 PSCs, about 1x10^9 PSCs, or about 1x10^10 PSCs.
- the population of PSCs is cultured for about 1 day to about 10 days.
- the population of PSCs is cultured for no more than 10 days.
- the population of PSCs may be cultured for no more than 9 days, no more than 8 days, no more than 7 days, no more than 6 days, or no more than 5 days.
- the population of PSCs is cultured for no more than 6 days. In some embodiments, the population of PSCs is cultured for about 2 to about 6 days, about 2 to about 5 days, about 2 to about 4 days, about 3 to about 6 days, about 3 to about 5 days, or about 3 to about 4 days. In some embodiments, the population of PSCs is cultured for about 2 days, about 3 days, about 4 days, about 5 days, or about 6 days.
- a differentiated cell type is produced from a PSC (or a population of PSCs) within 10 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein (e.g., selected from (a) ERG, EGR1, FLI1 and FOSB to produce astrocyte-like cells; (b) ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells; (c) HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells; (d) ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells; (e) EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells; or (f) SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells).
- a differentiated cell type is produced from a PSC (or a population of PSCs) within 9 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 8 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein.
- a differentiated cell type is produced from a PSC (or a population of PSCs) within 7 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 6 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein.
- a differentiated cell type is produced from a PSC (or a population of PSCs) within 5 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 4 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein.
- a differentiated cell type is produced from a PSC (or a population of PSCs) within 3 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 2 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce astrocyte-like cells (e.g., CD44+/A2B5+ astrocyte-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce cytotoxic T-cell-like cells (e.g., CD3+/CD8+ cytotoxic T-cell-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce hepatocyte-like cells (e.g., CD184+/ASGPR1+ hepatocyte-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce regulatory T-cell-like cells (e.g., CD3+/CD25+ regulatory T-cell-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from EBF1, ZBTB1, RELA, NRF1, and REL; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce B cell-like cells (e.g., CD19+/CD27+ B cell-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from SPI1, ZBTB1, RELA, and STAT2; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce microglia-like cells (e.g., CD11b+/CX3CR1+ microglia-like cells).
- the series of induction media comprises a first, a second, a third, and a fourth induction media.
- the PSCs are cultured in feeder-free, serum-free culture media for about 6 to about 24 hours.
- the PSCs may be cultured in feeder-free, serum-free culture media for about 6 to about 12 hours.
- the PSCs are cultured in feeder-free, serum-free culture media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, or about 24 hours.
- the expanded population of PSCs comprises at least 5x10³ PSCs.
- the expanded population (e.g., at the time of induction) may comprise at least 1x10⁴, at least 1x10⁵, at least 1x10⁶, or at least 1x10⁷ PSCs.
- the expanded population of PSCs comprises about 5x10³ PSCs to about 1x10⁷ PSCs.
- PSCs of the expanded population are cultured at a density of about 2,000 cells/cm² to about 3,000 cells/cm².
- PSCs of the expanded population are cultured at a density of about 500 to about 10000 PSCs/cm².
- the PSCs of the expanded population are cultured at a density of about 1000 to about 9500 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 1500 to about 9000 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 2000 to about 8500 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 2500 to about 8000 PSCs/cm².
- PSCs of the expanded population are cultured at a density of about 3000 to about 7500 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 3500 to about 7000 PSCs/cm². In some embodiments, the population comprises 4000 to 6500 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 4500 to about 6000 PSCs/cm². In some embodiments, PSCs of the expanded population are cultured at a density of about 5000 to about 5500 PSCs/cm².
- PSCs of the expanded population are cultured at a density of at least 500, at least 1000, at least 1500, at least 2000, at least 2500, at least 3000, at least 3500, at least 4000, at least 4500, at least 5000, at least 5500, at least 6000, at least 6500, at least 7000, at least 7500, at least 8000, at least 8500, at least 9000, at least 9500, or at least 10000 PSCs/cm².
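- The seeding densities above translate into a cell count once a growth area is fixed. The following is a minimal sketch of that arithmetic, not a protocol from the disclosure: the function name `cells_for_vessel` and the 10 cm² growth area are illustrative assumptions, and only the 2,000 to 3,000 cells/cm² range is taken from the text.

```python
import math


def cells_for_vessel(density_per_cm2: float, area_cm2: float) -> int:
    """Cells to seed = density (cells/cm^2) x growth area (cm^2), rounded up."""
    if density_per_cm2 <= 0 or area_cm2 <= 0:
        raise ValueError("density and area must be positive")
    return math.ceil(density_per_cm2 * area_cm2)


# Hypothetical growth area, chosen only to make the arithmetic concrete.
AREA_CM2 = 10.0

# Density range quoted in the text: about 2,000 to about 3,000 cells/cm^2.
low = cells_for_vessel(2000, AREA_CM2)
high = cells_for_vessel(3000, AREA_CM2)
print(f"seed between {low} and {high} cells")
```

- The same function can be applied to any of the other recited densities; the vessel area would need to be replaced with the actual growth area in use.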
- PSCs of the expanded population are cultured for no longer than 8 days, no longer than 7 days, no longer than 6 days, no longer than 5 days, or no longer than 4 days.
- PSCs of the expanded population may be cultured for about 2 to about 8 days, about 2 to about 7 days, about 2 to about 6 days, about 2 to about 5 days, about 2 to about 4 days, about 3 to about 8 days, about 3 to about 7 days, about 3 to about 6 days, about 3 to about 5 days, or about 3 to about 4 days.
- PSCs of the expanded population are cultured for about 2 days, about 3 days, about 4 days, about 5 days, about 6 days, about 7 days, or about 8 days.
- PSCs of the expanded population are cultured in a first induction media for about 6 to about 36 hours.
- the PSCs may be cultured in a first induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, about 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, about 18 to about 36 hours, or about 18 to about 24 hours.
- the PSCs are cultured in a first induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours.
- PSCs of the expanded population are cultured in a second induction media for about 6 to about 36 hours.
- the PSCs may be cultured in a second induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, about 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, about 18 to about 36 hours, or about 18 to about 24 hours.
- the PSCs are cultured in a second induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours.
- PSCs of the expanded population are cultured in a third induction media for about 6 to about 36 hours.
- the PSCs may be cultured in a third induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, about 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, about 18 to about 36 hours, or about 18 to about 24 hours.
- the PSCs are cultured in a third induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours.
- PSCs of the expanded population are cultured in a fourth induction media for about 6 to about 36 hours.
- the PSCs may be cultured in a fourth induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, about 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, about 18 to about 36 hours, or about 18 to about 24 hours.
- the PSCs are cultured in a fourth induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours.
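- The four induction stages each run for about 6 to about 36 hours, and the total culture time is bounded in days. The sketch below shows one way to check a planned schedule against those bounds. The function names and the example 24-hour stages are assumptions for illustration, and the recited ranges are approximate ("about"), so the hard cut-offs here are a simplification.

```python
# Per-stage window quoted in the text: about 6 to about 36 hours.
STAGE_RANGE_H = (6, 36)


def total_days(stage_hours):
    """Sum the stage durations (hours) and convert to days."""
    low, high = STAGE_RANGE_H
    for hours in stage_hours:
        if not low <= hours <= high:
            raise ValueError(f"stage of {hours} h is outside {low}-{high} h")
    return sum(stage_hours) / 24


def within_limit(stage_hours, max_days=8):
    """True if the whole schedule fits within max_days (8 is the largest cap recited)."""
    return total_days(stage_hours) <= max_days


# Hypothetical schedule: four stages of 24 hours each.
schedule = [24, 24, 24, 24]
print(total_days(schedule), within_limit(schedule, max_days=4))
```

- Because each stage is capped at about 36 hours, four stages can total at most about 6 days, which is consistent with the "no longer than 8 days" embodiment.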
- PSCs are incubated for at least 6 hours. In some embodiments, after incubation, the media is removed from the plate and the plate is washed with DMEM/F12.
- Some aspects provide a method of producing astrocyte-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ERG, EGR1, FLI1 and FOSB; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce astrocyte-like cells (e.g., CD44+/A2B5+ astrocyte-like cells).
- the method of producing astrocyte-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ERG, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding EGR1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding FLI1, and (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding FOSB.
- Some aspects provide a method of producing cytotoxic T-cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce cytotoxic T-cell-like cells (e.g., CD3+/CD8+ cytotoxic T-cell-like cells).
- the method of producing cytotoxic T-cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RUNX3, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RELA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NRF1, (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ERF, and (vi) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding SP4.
- Some aspects provide a method of producing hepatocyte-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from HNF4G, TEAD4, and RFX3; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce hepatocyte-like cells (e.g., CD184+/ASGPR1+ hepatocyte-like cells).
- the method of producing hepatocyte-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding HNF4G, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding TEAD4, and (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RFX3.
- Some aspects provide a method of producing regulatory T-cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ETS1, ETV3, GABPA, KLF9, and NFKB1; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce regulatory T-cell-like cells (e.g., CD3+/CD25+ regulatory T-cell-like cells).
- the method of producing regulatory T-cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ETS1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ETV3, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding GABPA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding KLF9, and (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NFKB1.
- Some aspects provide a method of producing B cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from EBF1, ZBTB1, RELA, NRF1, and REL; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce B cell-like cells (e.g., CD19+/CD27+ B cell-like cells).
- the method of producing B cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding EBF1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RELA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NRF1, and (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding REL.
- Some aspects provide a method of producing microglia-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from SPI1, ZBTB1, RELA, and STAT2; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce microglia-like cells (e.g., CD11b+/CX3CR1+ microglia-like cells).
- the method of producing microglia-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding SPI1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RELA, and (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding STAT2.
- the engineered nucleic acid(s) is/are integrated into the genome of the human stem cells.
- the human stem cells are pluripotent stem cells (PSCs), for example, induced pluripotent stem cells (iPSCs).
- the inducing step comprises delivering a chemical inducing agent, such as doxycycline or tetracycline, that activates the inducible promoter (e.g., a doxycycline-inducible promoter or a tetracycline-inducible promoter).
- the human stem cells are first expanded in a feeder-free, serum-free culture media, prior to delivery of the engineered nucleic acid(s).
- the engineered polynucleotide of the present disclosure may be delivered to a PSC using any one or more transfection methods, including chemical transfection methods, viral transduction methods, and electroporation.
- an engineered polynucleotide is delivered on a vector.
- a vector is any vehicle, for example, a virus or a plasmid, that is used to transfer a desired polynucleotide into a host cell, such as a PSC.
- the vector is a viral vector.
- a viral vector is not a naturally occurring viral vector.
- the viral vector may be from adeno-associated virus (AAV), adenovirus, herpes simplex virus, lentivirus, retrovirus, varicella, variola virus, hepatitis B, cytomegalovirus, JC polyomavirus, BK polyomavirus, monkeypox virus, Herpes Zoster, Epstein-Barr virus, human herpes virus 7, Kaposi's sarcoma-associated herpesvirus, or human parvovirus B19.
- AAV is a small, non-enveloped virus that packages a single-stranded linear DNA genome that is approximately 5 kb long and has been adapted for use as a gene transfer vehicle (Samulski, RJ et al., Annu Rev Virol. 2014;1(1):427-51).
- the coding regions of AAV are flanked by inverted terminal repeats (ITRs), which act as the origins for DNA replication and serve as the primary packaging signal (McLaughlin, SK et al. Virol. 1988;62(6):1963-73; Hauswirth, WW et al. 1977;78(2):488-99).
- Both positive and negative strands are packaged into virions equally well and are capable of infection (Zhong, L et al. Mol Ther. 2008;16(2):290-5; Zhou, X et al. Mol Ther. 2008;16(3):494-9; Samulski, RJ et al. Virol. 1987;61(10):3096-101).
- a small deletion in one of the two ITRs allows packaging of self-complementary vectors, in which the genome self-anneals after viral uncoating. This results in more efficient transduction of cells but reduces the coding capacity by half (McCarty, DM et al. Mol Ther. 2008;16(10):1648-56; McCarty, DM et al.
- a polynucleotide is delivered to a cell using a transposon/transposase system.
- the piggyBac™ transposon system may be used.
- a piggyBac™ transposon is a mobile genetic element that efficiently transposes between vectors and chromosomes via a “cut and paste” mechanism (Woodard et al. 2015).
- the piggyBac™ transposase recognizes transposon-specific inverted terminal repeat sequences (ITRs) located on both ends of the transposon vector and efficiently moves the contents from the original sites and integrates them into TTAA chromosomal sites.
- the piggyBac™ transposon system facilitates efficient integration of a polynucleotide into a cell genome.
- the method further comprises delivering to a PSC a transposon comprising an engineered polynucleotide and also delivering a transposase.
- an engineered polynucleotide is delivered to a cell using electroporation. Electroporation is a physical transfection method that uses an electrical pulse to create temporary pores in cell membranes through which the engineered polynucleotide can pass into cells. See, e.g., Chicaybam L et al. Front. Bioeng. Biotechnol., 23 January 2017.
- an engineered polynucleotide may further comprise an antibiotic resistance gene to confer resistance to an antibiotic used in an antibiotic drug selection process.
- a ‘pure’ population of cells comprising an integrated engineered polynucleotide may be obtained.
- a population of cells comprising an integrated engineered polynucleotide is selected using antibiotic drug selection.
- Antibiotic drug selection is the process of treating a population of cells with an antibiotic so that only cells that are capable of surviving in the presence of said antibiotic will remain in the population.
- Non-limiting examples of antibiotics that may be used for antibiotic drug selection include: puromycin, blasticidin, geneticin, hygromycin, mycophenolic acid, zeocin, carbenicillin, kanamycin, ampicillin, and actinomycin.
- Culture media may comprise, for example, a solubilized basement membrane preparation extracted from the Engelbreth-Holm-Swarm (EHS) mouse sarcoma (e.g., Corning® Matrigel® Matrix) (coated at 75 to 150 µl per cm² of lot-based diluted suspension).
- the solubilized basement membrane preparation comprises one or more extracellular matrix (ECM) protein and one or more growth factor.
- the ECM proteins may be selected from Laminin, Collagen IV, heparan sulfate proteoglycans, and entactin/nidogen.
- culture media further comprises one or more growth factor, for example, selected from recombinant human basic fibroblast growth factor (rh bFGF) (e.g., 80 ng/ml to 120 ng/ml) and recombinant human transforming growth factor β (rh TGFβ) (e.g., 20 to 25 pM).
- culture media further comprises rh bFGF and rh TGFβ.
- culture media comprises mTeSR™ media (STEMCELL Technologies).
- a first induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (e.g., 90X to 110X), L-alanyl-L-glutamine (e.g., 1.8 mM to 2.2 mM), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), Activin A (e.g., 50 ng/ml to 150 ng/ml), a glycogen synthase kinase (GSK) 3 inhibitor (e.g., 2.8 µM to 3.2 µM), a selective FGFR1 and FGFR3 inhibitor (e.g., 90 nM to 110 nM), and a small molecule ROCK inhibitor (e.g., 8 µM to 12 µM).
- a first induction media comprises B-27, L-alanyl-L-glutamine, an inducing agent (e.g., doxycycline), Activin A, a glycogen synthase kinase (GSK) 3 inhibitor, and a selective FGFR1 and FGFR3 inhibitor.
- the first induction media may comprise aRB27 Media, doxycycline, Activin A, CHIR99021, and PD 173074.
- the second induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (e.g., 90X to 110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (TNKS) (e.g., 0.9 µM to 1.1 µM), and a human bone morphogenic protein 4 (hBMP4) (e.g., 20 ng/ml to 250 ng/ml).
- the second induction media comprises B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase (TNKS), and a human bone morphogenic protein 4 (hBMP4).
- the second induction media may comprise aRB27 Media, doxycycline, XAV939, and human bone morphogenic protein 4 (hBMP4).
- the third induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), stem cell factor (SCF) (e.g., 25 ng/ml to 200 ng/ml), and epidermal growth factor (EGF) (e.g., 25 ng/ml to 100 ng/ml).
- the third induction media comprises B-27 Supplement (e.g., 90X to 110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), stem cell factor (SCF) (e.g., 25 ng/ml to 200 ng/ml), and epidermal growth factor (EGF) (e.g., 25 ng/ml to 100 ng/ml).
- the third induction media may comprise aRB27 Media, doxycycline, XAV939, SCF, and EGF.
- the fourth induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (e.g., 90X to 110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), hBMP4 (e.g., 20 ng/ml to 250 ng/ml), SCF (e.g., 25 ng/ml to 200 ng/ml), and EGF (e.g., 25 ng/ml to 100 ng/ml).
- the fourth induction media comprises B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase, hBMP4, SCF, and EGF.
- the fourth induction media may comprise aRB27 Media, doxycycline, XAV939, hBMP4, SCF, and EGF.
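- The example concentration ranges recited for the four induction media can be captured as a small lookup table and used to flag a planned recipe that falls outside them. This is only a sketch under stated assumptions: the dictionary key names, the `out_of_range` helper, and the example recipe are hypothetical, and only the numeric ranges are taken from the text. The ranges are "e.g." values, so a flagged component is not necessarily outside the scope of the disclosure.

```python
# Example ranges quoted in the text, as (low, high). Key names are illustrative;
# the unit is encoded in each key (ng/ml, uM = micromolar, nM = nanomolar).
RANGES = {
    "first": {
        "doxycycline_ng_ml": (50, 2000),
        "activin_a_ng_ml": (50, 150),
        "gsk3_inhibitor_uM": (2.8, 3.2),
        "fgfr1_fgfr3_inhibitor_nM": (90, 110),
        "rock_inhibitor_uM": (8, 12),
    },
    "second": {
        "doxycycline_ng_ml": (50, 2000),
        "tankyrase_inhibitor_uM": (0.9, 1.1),
        "hbmp4_ng_ml": (20, 250),
    },
    "third": {
        "doxycycline_ng_ml": (50, 2000),
        "tankyrase_inhibitor_uM": (0.9, 1.1),
        "scf_ng_ml": (25, 200),
        "egf_ng_ml": (25, 100),
    },
    "fourth": {
        "doxycycline_ng_ml": (50, 2000),
        "tankyrase_inhibitor_uM": (0.9, 1.1),
        "hbmp4_ng_ml": (20, 250),
        "scf_ng_ml": (25, 200),
        "egf_ng_ml": (25, 100),
    },
}


def out_of_range(medium, recipe):
    """Return a list of messages for components outside the quoted ranges."""
    ranges = RANGES[medium]
    problems = []
    for name, value in recipe.items():
        if name not in ranges:
            problems.append(f"{name}: no range listed for the {medium} medium")
            continue
        low, high = ranges[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}-{high}")
    return problems


# Hypothetical recipe for the second induction medium.
recipe = {"doxycycline_ng_ml": 1000, "tankyrase_inhibitor_uM": 1.0, "hbmp4_ng_ml": 300}
print(out_of_range("second", recipe))
```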
- the ‘aRB27 Media’ used herein comprises Advanced RPMI, B-27™ Supplement, minus vitamin A (Thermo Fisher) or plus vitamin A, GlutaMAX™ Supplement (Thermo Fisher), non-essential amino acids (NEAA), Primocin® (a broad-spectrum antibiotic), and Y-27632 (a small molecule ROCK inhibitor).
- GlutaMAX™ Supplement comprises L-alanyl-L-glutamine, which is a dipeptide substitute for L-glutamine.
- Activin-A is a dimeric glycoprotein, which belongs to the transforming growth factor-β (TGF-β) family.
- GSK3 is a serine/threonine kinase that is a key inhibitor of the WNT pathway; therefore, the GSK3 inhibitor CHIR99021 functions as a WNT activator.
- PD 173074 is a selective FGFR1 and FGFR3 inhibitor (IC50 values are ~5 nM, ~21.5 nM, ~100 nM, ~17600 nM and ~19800 nM for FGFR3, FGFR1, VEGFR2, PDGFR and c-Src respectively, and > 50000 nM for EGFR, InsR, MEK and PKC).
- Compositions and methods of use: compositions comprising the astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells produced herein.
- the compositions further comprise a pharmaceutically-acceptable excipient.
- the compositions, in some embodiments, are cryopreserved.
- compositions may be administered to a subject, such as a human subject, using any suitable route of administration.
- Suitable routes of administration include, for example, parenteral routes such as intravenous, intrathecal, parenchymal, or intraventricular routes.
- Suitable routes of administration include, for example, parenteral routes such as intravenous, intrathecal, parenchymal, or intraventricular injection.
- a subject is a human subject.
- the subject may have a disease, disorder, or symptoms of a disease associated with astrocyte dysfunction, cytotoxic T-cell dysfunction, hepatocyte dysfunction, regulatory T-cell dysfunction, B cell dysfunction, or microglia dysfunction.
- the compositions may be administered to a subject in a therapeutically effective amount.
- A therapeutically effective amount refers to the amount of cell material required to confer therapeutic effect on a subject, either alone or in combination with at least one other active agent. Effective amounts vary, as recognized by those skilled in the art, depending on the route of administration, excipient usage, and co-usage with other active agents.
- the quantity to be administered depends on the subject to be treated, including, for example, the strength of an individual’s immune system or genetic predispositions. Suitable dosage ranges are readily determinable by one skilled in the art and may be on the order of micrograms of the polypeptide of this disclosure.
- the dosage of the preparations disclosed herein may depend on the route of administration and varies according to the size of the subject.
- aspects of the present disclosure relate to a method for identifying transcription factors that are able to differentiate a stem cell into a target cell type (e.g., an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell), comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors;
- the epigenetics data provides information related to whether genomic chromatin is open or closed. In some embodiments, the epigenetics data is produced by DNase-seq, ATAC-seq, or ChIP-seq. In some embodiments, the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type. In some embodiments, the transcriptomic data is produced by RNA-seq. In some embodiments, the first statistical method is a linear regression algorithm. In some embodiments, the first statistical method is a logistic regression algorithm. In some embodiments, the first statistical method is an L1-regularized logistic regression model (LASSO).
- the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type.
- the second statistical method is a log-likelihood ratio test.
- the method further comprises transfecting transcription factors of the third pool into a stem cell.
- the method further comprises inducing differentiation of the stem cell into the target cell type.
- the method further comprises analyzing the target cell type to identify additional transcription factors associated with the target cell type.
- the method further comprises using data from the target cell type to further refine the steps of the method.
- the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell.
- differentiation of stem cells using one or more of the transcription factors in the third pool results in production of the target cell type in no more than 6 days.
- aspects of the present disclosure relate to a method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information.
- the at least one statistical model correlates chromatin accessibility data and transcriptomics data to make initial predictions relating the one or more input transcription factors to differentiation efficiency of the cell having the one or more input transcription factors.
- the at least one statistical model distinguishes open chromatin data from background data.
- the open chromatin data is associated with the target cell type.
- the method further comprises identifying an initial set of transcription factor motifs positively correlated with the open chromatin data by using a statistical coefficient trained to distinguish the open chromatin data from the background data.
- the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model.
- the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution with modes, and the differentiation efficiency information corresponds to a mode of the multimodal distribution with the highest value.
- the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value.
- the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model.
- the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors.
- generating the transcription factor pool further comprises: generating an initial pool of transcription factors; using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool.
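The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the predicted efficiencies stand in for the output of the statistical model, the TF names are hypothetical, and the rule "keep every TF whose predicted efficiency is no lower than one standard deviation below the highest mode" is one possible reading of "in accordance with the differentiation efficiency information", not the method defined by the disclosure.

```python
def select_pool(predicted_efficiency, mode_center, mode_sd, n_sd=1.0):
    """Keep TFs whose predicted efficiency lies in the high-efficiency component.

    predicted_efficiency: dict of hypothetical TF name -> model-predicted efficiency (0..1)
    mode_center, mode_sd: centre and spread of the highest mode of the training
                          distribution (assumed to be known already)
    n_sd: how many standard deviations below the mode are still accepted (assumption)
    """
    lower_bound = mode_center - n_sd * mode_sd
    return sorted(tf for tf, eff in predicted_efficiency.items() if eff >= lower_bound)


# Hypothetical predictions for an initial pool of four TFs.
initial_pool_predictions = {"TF_A": 0.62, "TF_B": 0.15, "TF_C": 0.48, "TF_D": 0.55}

screening_pool = select_pool(initial_pool_predictions, mode_center=0.6, mode_sd=0.1)
print(screening_pool)
```

TFs far below the highest mode (here TF_B and TF_C) are dropped, while TFs above the mode are kept because they are predicted to perform at least as well as the mode.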
- the at least one statistical model comprises at least one regression model.
- the at least one statistical model comprises at least one neural network.
- the at least one statistical model has a recurrent neural network architecture.
- the at least one statistical model comprises an L1-regularized logistic regression model (LASSO).
- the at least one statistical model comprises a log-likelihood ratio test.
- aspects of the present disclosure relate to a system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information.
- the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell.
- PSC pluripotent stem cell
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding FLI1. 5. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding FOSB. 6. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ERG, EGR1, FLI1, FOSB, or any combination thereof. 7. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 8. The PSC of paragraph 7, wherein the heterologous promoter is an inducible promoter. 9.
- a pluripotent stem cell comprising: a protein selected from ERG, EGR1, FLI1, and FOSB, wherein the protein is overexpressed.
- the PSC of any one of the preceding paragraphs comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population of the PSC of any one of the preceding paragraphs.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells.
- PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERG.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EGR1.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FLI1.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1×10^2 to 1×10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the population of PSCs is cultured for about 3-6 days.
- the method of paragraph 28, wherein the population of PSCs is cultured for no more than 6 days.
- the astrocyte-like cells are CD44+ and A2B5+.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
- the PSC of paragraph 31 comprising the engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- the PSC of paragraph 31 or 32 comprising the engineered polynucleotide comprising an open reading frame encoding RUNX3.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding RELA.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding NRF1. 36. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding ERF. 37. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding SP4. 38. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. 39.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population of the PSC of any one of the preceding paragraphs.
- 48. The composition of paragraph B17, wherein the population comprises at least 2500/cm^2 of the PSC.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells.
- PSCs pluripotent stem cells
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding HNF4G, TEAD4, RFX3, or any combination thereof.
- the PSC of paragraph 65 comprising the engineered polynucleotide comprising an open reading frame encoding HNF4G.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding TEAD4.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding RFX3.
- the PSC of any one of the preceding paragraphs wherein the PSC expresses or overexpresses HNF4G, TEAD4, RFX3, or any combination thereof. 71. The PSC of paragraph 70, wherein the PSC further expresses or overexpresses HNF4A. 72. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 73. The PSC of paragraph 72, wherein the heterologous promoter is an inducible promoter. 74. A pluripotent stem cell (PSC) comprising: a protein selected from HNF4G, TEAD4, and RFX3, wherein the protein is overexpressed. 75.
- the PSC of paragraph 74 wherein the PSC expresses or overexpresses: HNF4G, TEAD4, RFX3, or any combination thereof.
- 76. The PSC of any one of paragraphs 65-75, wherein the PSC is a human PSC.
- iPSC induced PSC
- 78. The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, HNF4A, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population comprising the PSC of any one of the preceding paragraphs.
- the composition of paragraph 80, wherein the population comprises at least 2500/cm^2 of the PSC.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells.
- the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RFX3.
- the PSCs of the expanded population further comprise an engineered polynucleotide comprising an open reading frame encoding HNF4A.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1×10^2 to 1×10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the method of paragraph 92, wherein the population of PSCs is cultured for about 3-6 days.
- the method of paragraph 93, wherein the population of PSCs is cultured for no more than 6 days.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
- the PSC of paragraph 96 comprising the engineered polynucleotide comprising an open reading frame encoding ETS1.
- the PSC of paragraph 96 or 97 comprising the engineered polynucleotide comprising an open reading frame encoding ETV3.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding GABPA. 100. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding KLF9. 101. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding NFKB1. 102. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. 103.
- a pluripotent stem cell (PSC) comprising: a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, wherein the protein is overexpressed.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population of the PSC of any one of the preceding paragraphs. 112. The composition of paragraph 111, wherein the population comprises at least 2500/cm^2 of the PSC.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1×10^2 to 1×10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the method of paragraph 124, wherein the population of PSCs is cultured for about 3-6 days.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding RELA. 132. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding NRF1. 133. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding REL. 134. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof. 135.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population of the PSC of any one of the preceding paragraphs. 144. The composition of paragraph 143, wherein the population comprises at least 2500/cm^2 of the PSC.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1×10^2 to 1×10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the method of paragraph 156, wherein the population of PSCs is cultured for about 3-6 days.
- a pluripotent stem cell comprising: an engineered polynucleotide comprising an open reading frame encoding SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
- the PSC of paragraph 160 comprising the engineered polynucleotide comprising an open reading frame encoding SPI1.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding ZBTB1.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding RELA.
- the PSC of any one of the preceding paragraphs comprising the engineered polynucleotide comprising an open reading frame encoding STAT2.
- the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter.
- a pluripotent stem cell comprising: a protein selected from SPI1, ZBTB1, RELA, and STAT2, wherein the protein is overexpressed.
- the PSC of any one of the preceding paragraphs comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 173.
- the PSC of any one of the preceding paragraphs comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
- a composition comprising: a population comprising the PSC of any one of the preceding paragraphs.
- a method comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells.
- PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SPI1.
- the heterologous promoter is an inducible promoter.
- the inducible promoter is a chemically-inducible promoter.
- the chemically-inducible promoter is a doxycycline-inducible promoter.
- the population comprises 1×10^2 to 1×10^7 PSCs.
- the population of PSCs is cultured for at least 1 day.
- the method of paragraph 186, wherein the population of PSCs is cultured for about 3-6 days.
- a method comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors comprises transcription factors that are in both the first pool and the second pool; (iv) using a second statistical method to determine the statistical significance of the transcription factors in the third pool of transcription factors; and (v) repeating steps (i)-(iv) one or more times to iteratively refine the third pool of transcription factors.
- the epigenetics data provides information related to whether genomic chromatin is open or closed.
- the epigenetics data is produced by DNase-seq, ATAC-seq, or ChIP-seq.
- the method of any one of the preceding paragraphs, wherein the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type.
- the method of any one of the preceding paragraphs, wherein the transcriptomic data is produced by RNA-seq.
- the first statistical method is a linear regression algorithm.
- the first statistical method is a logistic regression algorithm.
- the first statistical method is an L1-regularized logistic regression model (LASSO).
- the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type.
- the second statistical method is a log-likelihood ratio test.
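A log-likelihood ratio test of this kind can be sketched as follows. The source names the test but does not specify what is being compared, so the setup here is an assumption made purely for illustration: a TF motif is counted in open-chromatin regions and in background regions, the null model gives both sets one shared occurrence rate, and the alternative gives each set its own rate (one extra parameter, hence one degree of freedom).

```python
import math


def binomial_log_likelihood(hits, total, rate):
    """Log-likelihood of `hits` motif-containing regions out of `total` at a given rate."""
    ll = 0.0
    if hits > 0:
        ll += hits * math.log(rate)
    if total - hits > 0:
        ll += (total - hits) * math.log(1.0 - rate)
    return ll


def likelihood_ratio_test(ll_null, ll_alt):
    """Return (statistic, p-value) for nested models differing by one parameter."""
    statistic = max(2.0 * (ll_alt - ll_null), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: motif found in 60 of 100 open regions and 30 of 100 background regions.
open_hits, open_total = 60, 100
bg_hits, bg_total = 30, 100

shared_rate = (open_hits + bg_hits) / (open_total + bg_total)
ll_null = (binomial_log_likelihood(open_hits, open_total, shared_rate)
           + binomial_log_likelihood(bg_hits, bg_total, shared_rate))
ll_alt = (binomial_log_likelihood(open_hits, open_total, open_hits / open_total)
          + binomial_log_likelihood(bg_hits, bg_total, bg_hits / bg_total))

statistic, p_value = likelihood_ratio_test(ll_null, ll_alt)
print(f"LRT statistic = {statistic:.2f}, p = {p_value:.2e}")
```

A small p-value indicates that the motif occurs at different rates in open and background regions; a TF whose motif fails this test would be a candidate for removal from the third pool.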
- the method of paragraph 200 further comprising inducing differentiation of the stem cell into the target cell type.
- 202. The method of paragraph 201, further comprising analyzing the target cell type to identify additional transcription factors associated with the target cell type.
- 203. The method of any one of the preceding paragraphs, further comprising using data from the target cell type to further refine the steps of paragraph 190.
- 204. The method of any one of the preceding paragraphs, wherein the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell.
- 205. The method of any one of the preceding paragraphs, wherein the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell.
- a method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information.
- the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model.
- the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution with modes, and the differentiation efficiency information corresponds to a mode of the multimodal distribution with the highest value.
- 213. The method of any one of the preceding paragraphs, wherein the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value.
- the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model. 215. The method of any one of the preceding paragraphs, wherein the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors. 216. The method of any one of the preceding paragraphs, wherein generating the transcription factor pool further comprises: generating an initial pool of transcription factors; using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool. 217.
- the at least one statistical model comprises at least one regression model.
- the method of any one of the preceding paragraphs, wherein the at least one statistical model comprises at least one neural network.
- the method of any one of the preceding paragraphs, wherein the at least one statistical model comprises an L1-regularized logistic regression model (LASSO).
- the at least one statistical model comprises a log-likelihood ratio test.
- a system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information.
- the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell.
- Stem cells are the progenitor cells of all differentiating multi-cellular organisms. In principle, it is possible to differentiate these cells into any other type of cell, which can then be used for many different possible therapeutic or diagnostic applications.
- iPSCs induced pluripotent stem cells
- iPSC-derived cytotoxic T-cells iCytoT
- regulatory T-cells iTreg
- type II astrocytes iAstII
- iHep hepatocytes
- Example 1 Machine learning for determining TF sub-libraries. It is not known exactly how many human cell types exist, but current estimates put the number in the hundreds, all originating from a single ‘totipotent’ embryonic stem cell. Since the creation of induced pluripotent stem cells (iPSCs), scientists have been trying to recreate differentiation of iPSCs into all of these other types of cells and combine them into tissues or tissue-like structures (a.k.a. ‘cell-fate engineering’). This goal seems feasible given that it has been generally accepted that iPSCs are functionally identical to embryonic stem cells (ESCs).
- ESCs embryonic stem cells
- Because TF-based approaches directly manipulate the epigenetic landscape of individual cells, they have proved to address these three issues to a great extent. While TF-based approaches have been fruitful, identifying the correct TFs for fast, efficient, and robust [37] cell conversion remains a challenging problem. There are two general ways to go about this research process: (1) conduct an exhaustive literature search for potentially relevant transcription factors for a desired cell type and identify successful combinations via trial and error, or (2) use computational tools to predict TFs. While iPSCs were created through a systematic version of the former, this process does not scale: it is very laborious, requires deep expertise in the cell types being converted, and can only account for previously studied TFs associated with specific cell types.
- While ML pipelines [49] have yielded impressive results in various areas of molecular biology to date, no tools currently use ML to generate screens for cell-fate engineering.
- ML machine-learning
- CellCartographer uses next-generation-sequencing-based readouts of chromatin accessibility (e.g., DNase-seq, ATAC-seq, ChIP-seq) and transcription (RNA-seq) to predict TFs correlated with cell-type identity.
- the CellCartographer model leverages chromatin accessibility data to make initial predictions of TFs for differentiating towards a given cell type. After initial TF predictions are made, TF transcript levels are used to exclude TFs that are not expressed.
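The expression-based exclusion step can be sketched as a simple filter. The TF names, the use of TPM as the expression unit, and the cutoff of 1.0 are assumptions for illustration only; the source states only that TFs that are not expressed are excluded.

```python
def filter_expressed(predicted_tfs, expression_tpm, min_tpm=1.0):
    """Drop predicted TFs whose transcript level falls below `min_tpm`.

    predicted_tfs: TFs predicted from chromatin accessibility (order is preserved)
    expression_tpm: dict of TF name -> transcript level in the target cell type
    min_tpm: hypothetical cutoff for calling a TF "expressed"
    """
    return [tf for tf in predicted_tfs if expression_tpm.get(tf, 0.0) >= min_tpm]


# Hypothetical accessibility-based predictions and RNA-seq levels.
predicted = ["TF_A", "TF_B", "TF_C"]
tpm = {"TF_A": 35.2, "TF_B": 0.0, "TF_C": 4.1}

kept = filter_expressed(predicted, tpm)
print(kept)
```

TFs missing from the expression table are treated as not expressed, which is a conservative choice for this sketch.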
- the CellCartographer pipeline can leverage a variety of assays for chromatin accessibility and transcriptomics to predict a set of TFs for a target cell type, which can then be tested in a pooled screen (FIG. 1B).
- RNA-seq assays including ribo-depleted, total RNA, or polyA RNA-seq. The number of TFs in the TFome (1732) with characterized binding sites (891) yields 2^891 possible outcomes (FIG. 1A).
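To give a sense of scale, the short calculation below assumes that each of the 891 TFs with characterized binding sites is simply either included in or left out of a combination, which is what gives 2 to the power of 891 possibilities.

```python
# Each TF is either in or out of a combination, so the count is 2**891.
N_TFS_WITH_BINDING_SITES = 891  # figure quoted in the text

n_combinations = 2 ** N_TFS_WITH_BINDING_SITES
n_digits = len(str(n_combinations))

print(f"2^{N_TFS_WITH_BINDING_SITES} has {n_digits} decimal digits")
```

A number with this many digits is far beyond what can be tested one combination at a time, which is why the search space has to be narrowed computationally before screening.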
- Each TF cassette is integrated randomly within each cell from zero to n times, allowing us to explore a large parameter space of DNA integration location and resulting expression amounts of each TF in combination.
- To generate TF predictions for each cell type, we begin by training a logistic regression classifier model to distinguish between open chromatin regions and a set of background genomic loci using the known DNA TF binding motifs drawn from the JASPAR database (FIG. 1C).
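The classifier step can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption: the motif names are hypothetical, the feature matrix is a toy motif presence/absence table rather than real JASPAR scan output, and the optimizer (proximal gradient descent with soft-thresholding) is a generic way to fit an L1-regularized logistic regression, not necessarily how CellCartographer is implemented.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Fit weights w and intercept b; only w is L1-penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: rows are genomic regions, columns are hypothetical motifs.
motifs = ["MOTIF_A", "MOTIF_B", "MOTIF_C"]
X = [
    [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0],  # open chromatin regions
    [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 1],  # background loci
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = open chromatin, 0 = background

weights, intercept = fit_l1_logistic(X, y)
positive_motifs = [m for m, wj in zip(motifs, weights) if wj > 0]
print(dict(zip(motifs, weights)), positive_motifs)
```

Motifs with positive coefficients are those associated with open chromatin in the target cell type; the L1 penalty pushes uninformative motifs towards a coefficient of zero, which keeps the candidate list short.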
- Example 2 Primary pooled TF screens for differentiation
- Mesoderm T-cells (subtypes cytotoxic, delta-gamma, and regulatory), B cells, macrophages, epithelial cells (subtypes kidney, bronchial, and mammary), and osteoblasts
- Endoderm hepatocytes
- Ectoderm type II astrocytes
- Yolk Sac microglia
- Example 3 Iterative pooled TF screening and clonal isolation. Using the barcode frequencies, we calculated three refined TF pools for each cell type: all TFs that appear in sequencing, TFs that appear more often than average, and TFs that appear at least one standard deviation more often than average (FIGs. 4A-D). Using the refined TF pools, we performed a second round of differentiation.
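The three refined pools can be derived from barcode counts as in the sketch below. The TF names and counts are hypothetical, and two details are assumptions not stated in the text: the average and standard deviation are taken over detected TFs only, and the population standard deviation is used.

```python
from statistics import mean, pstdev


def refined_pools(barcode_counts):
    """Split TFs into three nested pools based on barcode frequency."""
    detected = {tf: count for tf, count in barcode_counts.items() if count > 0}
    average = mean(detected.values())
    spread = pstdev(detected.values())
    return {
        "all_detected": sorted(detected),
        "above_average": sorted(tf for tf, c in detected.items() if c > average),
        "one_sd_above": sorted(tf for tf, c in detected.items() if c >= average + spread),
    }


# Hypothetical barcode read counts from sorted, differentiated cells.
counts = {"TF_A": 120, "TF_B": 60, "TF_C": 10, "TF_D": 0, "TF_E": 10}

pools = refined_pools(counts)
for name, members in pools.items():
    print(name, members)
```

Each pool is a subset of the previous one, so the second screening round can compare a permissive pool against progressively stricter ones.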
- iPSCs were nucleofected as before, but we selected and stabilized the cell lines before screening differentiation in different settings. Specifically, given the stability of the constructed cell lines (e.g., less cell death), we opted to test them for only six days, and also decided to test their performance in target-cell-type growth medium in addition to stem cell medium (data not shown). In this round, we found broad improvement in differentiation percentage across all six cell types (FIGs. 4A-D). While B cells already had a comparatively high differentiation percentage in the primary screening round (17.6%), it improved to an average of greater than 50%.
- iTregs germ layer - regulatory T-cells
- iCD8s cytotoxic T-cells
- iAstIIs type II astrocytes
- iHeps hepatocytes
- For iHeps, we validated the morphology (FIG.5D) and compared their viability with that of primary hepatocytes and undifferentiated cells after exposure to hepatotoxins for 24 hours. We observed that our iHeps had highly similar viability to primary hepatocytes after exposure to Nefazodone (FIG.5E), Acetaminophen (FIG.5F), and Troglitazone (FIG.5G), and demonstrated significantly higher viability than undifferentiated iPSCs. iTRegs were validated by demonstrating that the cells inhibited the expansion of responder T-cells.
- iTRegs had approximately the same size and morphology as primary cytotoxic responder cells (FIG.5H). While size and shape were generally consistent for both iTRegs and iCytoTs, the primary responder T-cells took on an elongated shape when stimulated, whereas our iCytoTs did not clearly show this morphological change in response to stimulus.
- Responder T-cells were stimulated to activate with IL-2 and CD3+CD28+ beads for three days. After this activation step, responder T-cells were labeled with a fluorescent dye and co-cultured with iTRegs in variable quantities.
- CellCartographer has minimal requirements for producing useful TF pools: it does not require re-training large models for additional cell types, which can prove useful for engineering cell lines for differentiation into exotic cell types with little data available. Furthermore, we were able to successfully engineer iTRegs using TFs determined from Mus musculus data, since that was the only epigenetic NGS data available for this cell type, meaning that calculations of factors can work cross-species. Finally, the pooled screening philosophy of CellCartographer allows biologists to explore and debug many experimental variables that are generally invisible to software tools, namely synthetic DNA genomic integration location, copy count, and cell culture conditions. Pooled screening and paired ML analysis allow us to screen out these issues.
- PGP1 iPS cells were expanded and nucleofected with P3 Primary Cell 4D Nucleofection kits with pulse code CB150 using 2 µg of total DNA for 800,000 cells (1.6 µg TF pool/0.4 µg SPB) [Lonza]. Cells were plated onto Matrigel-coated plates [Corning] with ROCK-inhibitor [Millipore] and selected with puromycin [Sigma].
- Stable cell lines were expanded over several passages using TrypLE [Gibco] in mTeSR1 [StemCell Technologies] and frozen in mFreSR [StemCell Technologies].
- Cells were differentiated with 2ng/mL doxycycline [Sigma] at variable conditions as described (data not shown) in either mTeSR [StemCell Technologies], RPMI-1640 (microglia) [Gibco], Williams’ E Medium (hepatocytes) [Gibco], ImmunoCult-XF T-cell Expansion Media (T-cells) [StemCell Technologies], LGM-3 (B cells) [Lonza], or BrainPhys Media (astrocytes) [StemCell Technologies].
- Flow Cytometry and Cell Sorting Cells were digested in TrypLE [Gibco] and resuspended in growth media before staining with cell surface markers.
- The following antibodies were used for analysis and cell sorting: [Microglia: CD11b-FITC, CX3CR1-PE]; [CD8-positive T-cells: CD3-PerCP-Cy5.5, CD8-FITC]; [T-Regulatory cells: CD3-PerCP-Cy5.5, CD4-PE-Cy7, FOXP3-PE, CD127-V450]; [B cells: CD19-PE-Cy7, CD27-FITC]; [Hepatocytes: ASGPR1-PE, CD184-APC]; [Astrocytes: CD44-FITC, A2B5-PE].
- RNA sequencing Cells were either collected from FACS (primary screens) or collected directly from culture (refined screens and stable cell line characterization) and were lysed in TRIzol [Invitrogen]. RNA was purified with Direct-zol RNA MicroPrep and RNA MiniPrep kits [Zymo].
- Glass bottom dishes [Ibidi 81158] were coated in poly-D-lysine (0.1 mg/mL) for 2 hours at room temperature, washed twice in PBS [Gibco], and coated overnight in fibronectin (10 µg/mL) [Thermo] at 37 °C.
- Differentiated astrocytes were digested in TrypLE [Gibco] for 7-10 minutes, and 40,000-50,000 cells were transferred to coated dishes and maintained for 2 days before stimulation and imaging. Prior to stimulation and imaging, the astrocytes were stained with Fluo-4 (1 µg/mL) [FluoroPure] in BrainPhys without phenol red [StemCell] and incubated in the dark for at least 25 minutes at 37 °C.
- iHeps were differentiated as described (data not shown) and then transferred to 96-well plates pre-coated with Matrigel [Corning] and treated with hepatotoxins as previously described. Briefly, after differentiation, 25,000 iHeps, undifferentiated iPSCs, and plateable primary human hepatocytes [ZenBio] were plated in each well and incubated overnight at 37 °C.
- Hepatocyte Medium E (Williams’ E Medium [Gibco], Maintenance Cocktail B [Gibco], and 0.1 µM Dexamethasone [Gibco]) for one day.
- Media was exchanged and supplemented with hepatotoxins (Acetaminophen at [3.125, 6.25, 12.5, 25, 50, 100] mM [Spectrum], Nefazodone at [1, 3, 10, 30, 100, 300] µM [Sigma], and Troglitazone at [1, 3, 10, 30, 100, 300] µM [Sigma]).
- Cytotoxic T-cell activation assays Primary cytotoxic T-cells (Human Peripheral Blood CD4+CD45RA+ T Cells) [StemCell] and iCytoTs were cultured and activated in the same manner. Briefly, cells were incubated in ImmunoCult-XF T Cell Expansion Medium [StemCell] + IL-2 [R&D Systems] with DYNAL Dynabeads Human T-Activator CD3/CD28 for T Cell Expansion and Activation [Fisher] for 3 days.
- iTRegs were differentiated in ImmunoCult-XF T Cell Expansion Medium [StemCell] + IL-2 [R&D Systems] for 4 days and then moved into co-culture with activated, CellTrace Violet [Fisher] stained cytotoxic T-cells and grown at 37 °C for 11 days, with media changes every 2-3 days. Finally, cells were analyzed via flow cytometry. The percentage of suppression was determined as 100 x [1 - (% of proliferating cells with iTRegs) / (% of proliferating cells without iTRegs)] after applying gates for proliferating vs. non-proliferating cells and subtracting auto-fluorescence resulting from unstained iTRegs.
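The suppression formula above can be written as a small helper. This is a minimal sketch, not the analysis code used in the disclosure: the function name `percent_suppression` and the example percentages are illustrative assumptions, and the inputs are assumed to be already gated and auto-fluorescence-corrected.

```python
def percent_suppression(pct_prolif_with_itregs, pct_prolif_without_itregs):
    """Percent suppression = 100 x [1 - (with iTRegs) / (without iTRegs)].

    Both arguments are percentages of proliferating responder cells,
    assumed to be gated and background-corrected upstream.
    """
    if pct_prolif_without_itregs <= 0:
        raise ValueError("control proliferation must be greater than zero")
    return 100.0 * (1.0 - pct_prolif_with_itregs / pct_prolif_without_itregs)


# Hypothetical values for illustration only (not measured data).
example = percent_suppression(20.0, 80.0)
print(example)
```

A value of zero means the co-culture proliferated as much as the control; negative values would indicate more proliferation with iTRegs than without.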
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- Organic Chemistry (AREA)
- Chemical & Material Sciences (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Developmental Biology & Embryology (AREA)
- Microbiology (AREA)
- Transplantation (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Cell Biology (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
Provided herein are methods and compositions for differentiating induced pluripotent stem cells into one or more cell types by overexpressing one or more transcription factors.
Description
METHODS AND COMPOSITIONS FOR INDUCING CELL DIFFERENTIATION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/415,729, filed October 13, 2022, which is hereby incorporated by reference in its entirety.
GOVERNMENT LICENSE RIGHTS
This invention was made with government support under W911NF-17-2-0089 awarded by U.S. Army Research Office and under DK091183 awarded by National Institutes of Health. The government has certain rights in the invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (H049870763WO00-SEQ-KVC.xml; Size: 31,902 bytes; and Date of Creation: October 10, 2023) are herein incorporated by reference in their entirety.
BACKGROUND
Stem cell and cell-fate engineering are promising research areas that may accelerate the development of therapeutics for a number of diseases. However, current approaches to stem cell and cell-fate engineering are laborious and costly. Efficient methods for differentiating stem cells into specific cell types that may be used for therapeutic disease intervention remain elusive.
SUMMARY
The present disclosure relates, at least in part, to methods and compositions for generating astrocyte-like cells (iAstIIs), cytotoxic T-cell-like cells (iCytoTs), hepatocyte-like cells (iHeps), regulatory T-cell-like cells (iTRegs), B cell-like cells (iBCells), and/or microglia-like cells (iMicroglia) from pluripotent stem cells. The present disclosure also relates, at least in part, to methods for identifying transcription factors that improve differentiation efficiency of pluripotent stem cells into astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ERG, EGR1,
FLI1, FOSB, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ERG. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding EGR1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding FLI1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding FOSB. In some embodiments, the PSC expresses or overexpresses ERG, EGR1, FLI1, FOSB, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from ERG, EGR1, FLI1, and FOSB, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: ERG, EGR1, FLI1, FOSB, or any combination thereof. In some embodiments, the PSC is a human PSC. In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB. Aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells. 
In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERG. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EGR1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, the PSCs of the
expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter. In some embodiments, the population comprises 1x10^2-1x10^7 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the astrocyte-like cells are CD44+ and A2B5+. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ZBTB1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding RUNX3. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding RELA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding NRF1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ERF. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding SP4. In some embodiments, the PSC expresses or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter.
Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. In some embodiments, the PSC is a human PSC. In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises 8-
10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. Aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RUNX3. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter. In some embodiments, the population comprises 1x102 -1x107 PSCs. 
In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the cytotoxic T-cell-like cells are CD3+ and CD8+. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding HNF4G, TEAD4, RFX3, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding HNF4G. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding TEAD4. In some embodiments,
the engineered polynucleotide comprises an open reading frame encoding RFX3. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding HNF4A. In some embodiments, the PSC expresses or overexpresses HNF4G, TEAD4, RFX3, or any combination thereof. In some embodiments, the PSC further expresses or overexpresses HNF4A. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from HNF4G, TEAD4, and RFX3, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: HNF4G, TEAD4, RFX3, or any combination thereof. In some embodiments, the PSC is a human PSC. In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, HNF4A, TEAD4, and RFX3. In some embodiments, the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, TEAD4, and RFX3. Aspects of the present disclosure relate to a composition comprising: a population comprising any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding HNF4G. 
In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, the PSCs of the expanded population further comprise an engineered polynucleotide comprising an open reading frame encoding HNF4A. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous
promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter. In some embodiments, the population comprises 1x10^2-1x10^7 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the hepatocyte-like cells are CD184+ and ASGPR1+. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ETS1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ETV3. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding GABPA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding KLF9. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding NFKB1. In some embodiments, the PSC expresses or overexpresses ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. In some embodiments, the PSC is a human PSC.
In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1.
Aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ETS1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ETV3. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding GABPA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding KLF9. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NFKB1. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter. In some embodiments, the population comprises 1x102 -1x107 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the regulatory T-cell-like cells are CD3+ and CD25+. 
Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding EBF1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ZBTB1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding RELA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding NRF1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding REL. In some embodiments, the PSC expresses or overexpresses EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof. In some embodiments, the PSC is a human PSC. In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL. In some embodiments, the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL. Aspects of the present disclosure relate to a composition comprising: a population of any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EBF1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. 
In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter.
In some embodiments, the population comprises 1x10^2-1x10^7 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the B cell-like cells are CD19+ and CD27+. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding SPI1, ZBTB1, RELA, STAT2, or any combination thereof. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding SPI1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding ZBTB1. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding RELA. In some embodiments, the engineered polynucleotide comprises an open reading frame encoding STAT2. In some embodiments, the PSC expresses or overexpresses SPI1, ZBTB1, RELA, STAT2, or any combination thereof. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. Aspects of the present disclosure relate to a pluripotent stem cell (PSC) comprising: a protein selected from SPI1, ZBTB1, RELA, and STAT2, wherein the protein is overexpressed. In some embodiments, the PSC expresses or overexpresses: SPI1, ZBTB1, RELA, STAT2, or any combination thereof. In some embodiments, the PSC is a human PSC. In some embodiments, the PSC is an induced PSC (iPSC). In some embodiments, the PSC comprises 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2.
In some embodiments, the PSC comprises 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2. Aspects of the present disclosure relate to a composition comprising: a population comprising any one of the PSCs described herein. In some embodiments, the population comprises at least 2500/cm2 of the PSC. Aspects of the present disclosure relate to a method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of
PSCs; and expressing in PSCs of the expanded population a protein selected from SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SPI1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter. In some embodiments, the inducible promoter is a chemically-inducible promoter. In some embodiments, the chemically-inducible promoter is a doxycycline-inducible promoter. In some embodiments, the population comprises 1x10^2-1x10^7 PSCs. In some embodiments, the population of PSCs is cultured for at least 1 day. In some embodiments, the population of PSCs is cultured for about 3-6 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the microglia-like cells are CD11b+ and CX3CR1+.
Aspects of the present disclosure relate to a method, comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors comprises transcription factors that are in both the first pool and the second pool; (iv) using a second statistical method to determine the statistical significance of the transcription factors in the third pool of transcription factors; and (v) repeating steps (i)-(iv) one or more times to iteratively refine the third pool of transcription factors.
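Steps (i) through (iii) amount to filtering candidates by accessibility, filtering by expression, and keeping what survives both. The following is a minimal sketch of that pool logic only; the function names, the placeholder TF labels (`TF_A` and so on), and the expression numbers are hypothetical, and the statistical filtering and significance testing of steps (iii) and (iv) are not modeled here.

```python
def select_expressed(candidates, target_expr, non_target_expr):
    """Keep TFs with higher expression in the target than the non-target cell type."""
    return {
        tf for tf in candidates
        if target_expr.get(tf, 0.0) > non_target_expr.get(tf, 0.0)
    }


def intersect_pools(first_pool, second_pool):
    """Third pool: TFs present in both the first and the second pool."""
    return sorted(set(first_pool) & set(second_pool))


# Placeholder inputs for illustration only (not data from the disclosure).
first_pool = {"TF_A", "TF_B", "TF_C"}  # TFs with motifs in accessible sites
target_expr = {"TF_A": 12.0, "TF_B": 1.5, "TF_C": 8.0, "TF_D": 30.0}
non_target_expr = {"TF_A": 2.0, "TF_B": 4.0, "TF_C": 3.0, "TF_D": 1.0}

second_pool = select_expressed(target_expr.keys(), target_expr, non_target_expr)
third_pool = intersect_pools(first_pool, second_pool)
print(third_pool)
```

In this toy input, `TF_D` is highly expressed but has no accessible binding site, and `TF_B` has an accessible site but is not enriched in the target cell type, so neither reaches the third pool.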
In some embodiments, the epigenetics data provides information related to whether genomic chromatin is open or closed. In some embodiments, the epigenetics data is produced by DNAse-seq, ATAC-seq, or ChIP-seq. In some embodiments, the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type. In some embodiments, the transcriptomic data is produced by RNA-seq. In some embodiments, the first statistical method is linear regression algorithm. In some embodiments, the first statistical method is a logistic regression algorithm. In some embodiments, the first statistical method is a L1-regularized logistic regression model (LASSO). In some embodiments, the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type. In some embodiments, the second statistical method is a log-likelihood ratio test. In some embodiments, the method further comprises transfecting transcription factors of the third pool into a stem cell. In some embodiments, the method further comprises inducing differentiation of the stem cell into the target cell type. In some embodiments, the method further comprises analyzing the target cell type to identify additional transcription factors associated with the target cell type. In some embodiments, the method further comprises using data from the target cell type to further refine the previous steps. In some embodiments, the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell. In some embodiments, differentiation of stem cells using one or more of the transcription factors in the third pool results in production of the target cell type in no more than 6 days. 
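As an illustration of the log-likelihood ratio test named above, the sketch below compares a full model (including one TF's motif) with a reduced model (without it). It assumes the two models are nested and differ by exactly one parameter, so the test statistic is compared against a chi-square distribution with one degree of freedom; the function name and the example log-likelihoods are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For one degree of freedom, the chi-square
    survival function equals erfc(sqrt(statistic / 2)).
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for illustration only.
stat, p = likelihood_ratio_test(-100.0, -102.0)
print(stat, p)
```

A small p-value suggests that including the TF's motif improves the fit more than chance alone would explain. Models that differ by more than one parameter would need the general chi-square survival function instead.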
Aspects of the present disclosure relate to a method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information.
In some embodiments, the at least one statistical model correlates chromatin accessibility data and transcriptomics data to make initial predictions relating the one or more input transcription factors to differentiation efficiency of the cell having the one or more input transcription factors. In some embodiments, the at least one statistical model distinguishes open chromatin data from background data. In some embodiments, the open chromatin data is associated with the target cell type. In some embodiments, the method further comprises identifying an initial set of transcription factor motifs positively correlated with the open chromatin data by using a statistical coefficient trained to distinguish the open chromatin data from the background data. In some embodiments, the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model. In some embodiments, the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution with modes, and the differentiation efficiency information corresponds to a mode of the multimodal distribution with the highest value. In some embodiments, the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value. In some embodiments, the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model. In some embodiments, the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors.
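By way of non-limiting illustration, selecting transcription factors whose predicted differentiation efficiency lies near the highest-valued mode of a multimodal training distribution may be sketched as follows. This is a minimal sketch under stated assumptions: modes are estimated with a simple fixed-width histogram rather than a fitted Gaussian component, and the function names, the parameters `bin_width`, `min_count`, and `half_width`, and all numeric values are hypothetical.

```python
from typing import Dict, List


def histogram_modes(values: List[float], bin_width: float, min_count: int = 2) -> List[float]:
    """Return the centres of histogram bins that are local maxima, in ascending order."""
    counts: Dict[int, int] = {}
    for value in values:
        index = int(value // bin_width)
        counts[index] = counts.get(index, 0) + 1
    modes = []
    for index, count in counts.items():
        left = counts.get(index - 1, 0)
        right = counts.get(index + 1, 0)
        # A bin is a mode if it is populated enough and is not exceeded by its neighbours.
        if count >= min_count and count > left and count >= right:
            modes.append((index + 0.5) * bin_width)
    return sorted(modes)


def select_near_highest_mode(predicted: Dict[str, float],
                             training_values: List[float],
                             bin_width: float,
                             half_width: float) -> List[str]:
    """Keep TFs whose predicted efficiency is within `half_width` of the highest mode."""
    modes = histogram_modes(training_values, bin_width)
    if not modes:
        return []
    highest_mode = modes[-1]
    return sorted(tf for tf, efficiency in predicted.items()
                  if abs(efficiency - highest_mode) <= half_width)


# Hypothetical differentiation efficiencies (percent) used to train a model.
training = [2, 4, 5, 6, 8, 12, 33, 35, 36, 38, 61, 64, 65, 66, 68, 72]
# Hypothetical model predictions for placeholder factors.
predictions = {"TF_A": 70.0, "TF_B": 58.0, "TF_C": 34.0, "TF_D": 6.0, "TF_E": 80.0}

print(histogram_modes(training, bin_width=10.0))
print(select_near_highest_mode(predictions, training, bin_width=10.0, half_width=10.0))
```

A Gaussian window centered at the mode, as recited in some embodiments, could replace the hard `half_width` cutoff used in this sketch.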
In some embodiments, generating the transcription factor pool further comprises: generating an initial pool of transcription factors; using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool. In some embodiments, the at least one statistical model comprises at least one regression model. In some embodiments, the at least one statistical model comprises at least one neural network. In some embodiments, the at least one statistical model has a recurrent neural network architecture. In some embodiments, the at least one statistical model
comprises an L1-regularized logistic regression model (LASSO). In some embodiments, the at least one statistical model comprises a log-likelihood ratio test.
Aspects of the present disclosure relate to a system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information. In some embodiments, the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell.
The details of one or more embodiments of the invention are set forth in the description below. Other features or advantages of the present invention will be apparent from the following drawings and detailed description of several embodiments, and also from the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing.
In the drawings: FIGs.1A-1E show an overview of the machine-guided experimental workflow with the human TFome to identify and optimize transcription factor (TF) conversion combinations. (FIG.1A) CellCartographer considers only TFs with known binding motifs that are found to be highly specific to target cell identity. (FIG.1B) The CellCartographer workflow uses epigenetics and transcriptomics NGS data to determine TF pools for screening with the TFome (dashed box). Iterative rounds of screening are refined with ML and engineered induced pluripotent stem cell (iPSC) lines with sufficient differentiation
undergo clonal isolation to isolate high-efficiency clones. (FIG.1C) TF-binding motifs and chromatin accessibility data are used to train a classifier model to determine TFs that are associated with the cell types of interest and then filtered with RNA-seq data to get a finalized sub-library of TFs for pooled screening. (FIG.1D) iPSCs are nucleofected with TF-cassette pools that are integrated randomly into the genome where any one cell may receive some combination of these factors in either multiple copies or not at all. (FIG.1E) In silico validation of screening lists: for four cell types with previously validated TF-overexpression differentiation factors, our model accurately re-identifies these factors (shaded) in the top TFs that would be put into a screen.
FIGs.2A-2D show computational analysis of 34 cell types with CellCartographer. (FIG.2A) Multidimensional scaling of the similarity in gene expression between different cell types. (FIG.2B) Multidimensional scaling of the similarity in TFs correlated with open chromatin. (FIG.2C) Motifs correlated and anti-correlated with open chromatin vary across 34 cell types analyzed. (FIG.2D) Highly ranked motifs correlated with open chromatin for cell types derived from yolk sac (microglia), endoderm (hepatocyte), mesoderm (B cell, T-cell, regulatory T-cell), and ectoderm (astrocyte).
FIGs.3A-3F show primary pooled screens for cell types originating from each germ layer. For each cell type, a negative antibody stain for iPSCs without TFs (LEFT), the cell population with induced TFs (MIDDLE), and the barcoded TF appearance frequency in the transcriptome of double-positive cell populations (RIGHT) are shown. (FIG.3A) Type II Astrocytes (ectoderm). (FIG.3B) Microglia (yolk sac). (FIG.3C) CD8-positive T-cells (mesoderm). (FIG.3D) B cells (mesoderm). (FIG.3E) Regulatory T-cells (mesoderm). (FIG.3F) Hepatocytes (endoderm).
FIGs.4A-4H show iteratively engineered poly-clonal and mono-clonal cell lines.
(FIGs.4A-4D) For each cell type, we show percent double-positive for fluorescence-activated cell sorting (FACS) analysis of canonical markers for non-clonal and mono-clonal (dashed box) cell lines, and an iPSC + media control (solid box) differentiated for six days in cell-type-specific media + DOX for (FIG.4A) Type II Astrocytes (iAstIIs), (FIG.4B) Cytotoxic T-cells (iCytoTs), (FIG.4C) Hepatocytes (iHeps), and (FIG.4D) Regulatory T-cells (iTRegs). (FIG.4E) Differential gene expression (quantified by Z-score) for all genes for two replicates of each differentiated cell type in both media conditions. (FIG.4F) Principal component analysis of all genes for each cell type in each media condition and a primary cell control. (FIG.4G) Differential gene expression (quantified by Reads Per Kilobase Million (RPKM)) for key marker genes across target cell types and iPSCs. (FIG.
4H) Metascape analysis of gene enrichment of high-efficiency clones for genes that were upregulated in these lines and differentiation conditions compared to iPSCs. Analysis of select highly significant GO Terms from TOP 50 for each differentiated cell type and condition is shown (-log10(P) ≥ 3).
FIGs.5A-5L show functional validation of iAstIIs, iHeps, iCytoTs, and iTRegs. (FIGs.5A-5C) Stimulation of Type II astrocytes over 10 min with the small molecules (FIG.5A) 100µM ATP, (FIG.5B) 100µM glutamate, and (FIG.5C) 30mM KCl. (LEFT) Relative fluorescence of six individual astrocytes. Astrocyte cell population shown before (TOP) and after (BOTTOM) addition of small molecule. (FIG.5D) Brightfield (BF) image of induced iHeps prior to hepatotoxicity testing. iHeps, primary hepatocytes, and iPSCs titrated with (FIG.5E) Nefazodone, (FIG.5F) Acetaminophen, and (FIG.5G) Troglitazone for 24h and assayed for percent viability (survival rate normalized to each cell type without toxins applied). (FIG.5H) Brightfield imaging of T-cell populations (LEFT to RIGHT): Primary CD8 T-cells, iTRegs, Primary CD8 T-cells + activation beads, iCytoTs + activation beads. (FIG.5I) Suppression assay for iTRegs co-cultured with activated primary CD8 T-cells. (FIG.5J) Calculated percent suppression with titrated dosing of iTRegs in suppression assay. Primary T-cells have been shown to suppress in the range of 20/30/40% respectively. (FIG.5K) Activation assay for iCytoTs. (FIG.5L) Percent of proliferating primary CD8 T-cells and iCD8 cells post-activation.
FIGs.6A-6B show cell line differentiation for double-positive surface markers after 6 days of differentiation in cell-type-specific growth medium with doxycycline (DOX). For B cells (FIG.6A) and microglia (FIG.6B), percent double-positive for FACS analysis of canonical markers for iPSC only (solid box), non-clonal and mono-clonal (dashed box) cell lines is shown.
FIGs.7A-7F show cell line differentiation for either one or both cell surface markers after 6 days of differentiation in cell-type-specific growth medium with DOX. For Type II astrocytes (FIG.7A), CD8-positive T-cells (FIG.7B), microglia (FIG.7C), regulatory T-cells (FIG.7D), hepatocytes (FIG.7E), and B cells (mesoderm) (FIG.7F), percent differentiated for FACS analysis of canonical markers for non-clonal, mono-clonal (dashed box) cell lines, and iPSC + media control (solid box) is shown.
DETAILED DESCRIPTION
Cell types generated by differentiation of stem cells have the potential to accelerate therapeutic discoveries for a variety of diseases. The present disclosure relates, at least in
part, to methods and compositions for generating astrocyte-like cells (iAstIIs), cytotoxic T-cell-like cells (iCytoTs), hepatocyte-like cells (iHeps), regulatory T-cell-like cells (iTRegs), B cell-like cells (iBCells), and/or microglia-like cells (iMicroglia) from pluripotent stem cells (e.g., induced pluripotent stem cells (iPSCs)). The present disclosure also relates, at least in part, to methods for identifying transcription factors that improve differentiation efficiency of pluripotent stem cells into astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells.
Astrocyte-Like Cells
Some aspects of the present disclosure provide astrocyte-like cells and methods of producing such cells. Astrocytes are specialized glial cells in the brain that regulate neuronal synapses and play an important role in the neuroimmune system. An astrocyte-like cell is a cell that exhibits phenotypic characteristics of astrocytes. For example, an astrocyte-like cell may express one or more biomarkers expressed by an astrocyte or exhibit one or more functions exhibited by an astrocyte. Astrocytes are broadly classified as Type I astrocytes and Type II astrocytes. In some embodiments, the astrocyte-like cells of the present disclosure are Type I astrocytes or exhibit phenotypic characteristics of Type I astrocytes. Phenotypic characteristics of Type I astrocytes include, for example, a protoplasmic presentation with short astrocytic processes. In some embodiments, the astrocyte-like cells of the present disclosure are Type II astrocytes or exhibit phenotypic characteristics of Type II astrocytes. Phenotypic characteristics of Type II astrocytes include, for example, a fibrous presentation with long astrocytic processes. Native astrocytes typically express the gene Cluster of Differentiation 44 (CD44) and the gene A2B5, two putative marker genes for astrocytes.
In some embodiments, an astrocyte-like cell expresses CD44 (i.e., is CD44-positive (CD44+)). In some embodiments, an astrocyte-like cell expresses A2B5 (i.e., is A2B5+). Thus, in some embodiments, the astrocyte-like cells produced by the methods provided herein are CD44+/A2B5+ astrocyte-like cells (i.e., cells that express the CD44 protein and the A2B5 protein). Other biomarkers of astrocyte identity include high levels of glial fibrillary acidic protein (GFAP) (e.g., ≥80% of cells of an iPSC-derived population express GFAP) and low levels of neuronal class III beta-tubulin (TUJ1) (e.g., ≤15% of cells of an iPSC-derived population express TUJ1). Additional biomarkers of astrocyte identity include high levels of expression of aquaporin-4 (AQP4) (see, e.g., Jurga AM, Paleczna M, Kadluczka J, Kuter
KZ. Beyond the GFAP-Astrocyte Protein Markers in the Brain. Biomolecules. 2021; 11(9):1361.).
Cytotoxic T-cell-Like Cells
Some aspects of the present disclosure provide cytotoxic T-cell-like cells and methods of producing such cells. Cytotoxic T-cells are a type of immune cell associated with the adaptive immune system. Cytotoxic T-cells are T lymphocytes that kill foreign cells and pathogens in the body. A cytotoxic T-cell-like cell is a cell that exhibits phenotypic characteristics of cytotoxic T-cells. For example, a cytotoxic T-cell-like cell may express one or more biomarkers expressed by a cytotoxic T-cell or exhibit one or more functions exhibited by a cytotoxic T-cell. Cytotoxic T-cells that develop in the body typically express the gene Cluster of Differentiation 3 (CD3) and the gene Cluster of Differentiation 8 (CD8), two putative marker genes for cytotoxic T-cells. In some embodiments, a cytotoxic T-cell-like cell expresses CD3 (i.e., is CD3-positive (CD3+)). In some embodiments, a cytotoxic T-cell-like cell expresses CD8 (i.e., is CD8+). Thus, in some embodiments, the cytotoxic T-cell-like cells produced herein are CD3+/CD8+ cytotoxic T-cell-like cells (i.e., cells that express the CD3 protein and the CD8 protein).
Hepatocyte-Like Cells
Some aspects of the present disclosure provide hepatocyte-like cells and methods of producing such cells. Hepatocytes are specialized epithelial cells in the liver that play an important role in metabolism, detoxification, and protein synthesis. Hepatocytes also participate in the innate immune response by secreting immune proteins in response to invading cells, pathogens, and microorganisms. A hepatocyte-like cell is a cell that exhibits phenotypic characteristics of hepatocytes. For example, a hepatocyte-like cell may express one or more biomarkers expressed by a hepatocyte or exhibit one or more functions exhibited by a hepatocyte.
Hepatocytes produced in the body express the gene Cluster of Differentiation 184 (CD184) and the gene Asialoglycoprotein Receptor 1 (ASGPR1), two putative marker genes for hepatocytes. In some embodiments, a hepatocyte-like cell expresses CD184 (i.e., is CD184-positive (CD184+)). In some embodiments, a hepatocyte-like cell expresses ASGPR1 (i.e., is ASGPR1+). Thus, in some embodiments, the hepatocyte-like cells produced by the
methods provided herein are CD184+/ASGPR1+ hepatocyte-like cells (i.e., cells that express the CD184 protein and the ASGPR1 protein).
Regulatory T-cell-Like Cells
Some aspects of the present disclosure provide regulatory T-cell-like cells and methods of producing such cells. Regulatory T-cells are a specialized type of T-cell that acts to suppress the immune response and maintain homeostasis in the body. A regulatory T-cell-like cell is a cell that exhibits phenotypic characteristics of regulatory T-cells. For example, a regulatory T-cell-like cell may express one or more biomarkers expressed by a regulatory T-cell or exhibit one or more functions exhibited by a regulatory T-cell. Regulatory T-cells produced in the body express the gene Cluster of Differentiation 3 (CD3) and the gene Cluster of Differentiation 25 (CD25), two putative marker genes for regulatory T-cells. In some embodiments, a regulatory T-cell-like cell expresses CD3 (i.e., is CD3-positive (CD3+)). In some embodiments, a regulatory T-cell-like cell expresses CD25 (i.e., is CD25+). Thus, in some embodiments, the regulatory T-cell-like cells produced by the methods provided herein are CD3+/CD25+ regulatory T-cell-like cells (i.e., cells that express the CD3 protein and the CD25 protein).
B Cell-Like Cells
Some aspects of the present disclosure provide B cell-like cells and methods of producing such cells. B cells are a specialized type of white blood cell that makes antibodies. B cells are a part of the immune system and develop from stem cells in the bone marrow. A B cell-like cell is a cell that exhibits phenotypic characteristics of B cells. For example, a B cell-like cell may express one or more biomarkers expressed by a B cell or exhibit one or more functions exhibited by a B cell. B cells produced in the body express the gene Cluster of Differentiation 19 (CD19) and the gene Cluster of Differentiation 27 (CD27), two putative marker genes for B cells.
In some embodiments, a B cell-like cell expresses CD19 (i.e., is CD19-positive (CD19+)). In some embodiments, a B cell-like cell expresses CD27 (i.e., is CD27+). Thus, in some embodiments, the B cell-like cells produced by the methods provided herein are CD19+/CD27+ B cell-like cells (i.e., cells that express the CD19 protein and the CD27 protein). Microglia-Like Cells
Some aspects of the present disclosure provide microglia-like cells and methods of producing such cells. Microglia are specialized glial cells that function as macrophages in the central nervous system. A microglia-like cell is a cell that exhibits phenotypic characteristics of microglia. For example, a microglia-like cell may express one or more biomarkers expressed by a microglial cell or exhibit one or more functions exhibited by a microglial cell. Microglia produced in the body express the gene Cluster of Differentiation 11b (CD11b) and the gene C-X3-C Motif Chemokine Receptor 1 (CX3CR1), two putative marker genes for microglia. In some embodiments, a microglia-like cell expresses CD11b (i.e., is CD11b-positive (CD11b+)). In some embodiments, a microglia-like cell expresses CX3CR1 (i.e., is CX3CR1+). Thus, in some embodiments, the microglia-like cells produced by the methods provided herein are CD11b+/CX3CR1+ microglia-like cells (i.e., cells that express the CD11b protein and the CX3CR1 protein).
Pluripotent Stem Cells
The astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and microglia-like cells provided herein are differentiated from pluripotent stem cells. Pluripotent stem cells are cells that have the capacity to self-renew by dividing, and to develop into the three primary germ cell layers of the early embryo (e.g., ectoderm, endoderm, and mesoderm), and therefore into all cells of the adult body, but not extra-embryonic tissues such as the placenta. Non-limiting examples of pluripotent stem cells include induced pluripotent stem cells (iPSCs), “true” embryonic stem cells (ESCs) derived from embryos, embryonic stem cells made by somatic cell nuclear transfer (ntESCs), and embryonic stem cells from unfertilized eggs (parthenogenesis embryonic stem cells, or pESCs). In some embodiments, a pluripotent stem cell is a human pluripotent stem cell.
In some embodiments, a pluripotent stem cell is an embryonic stem cell, such as a human embryonic stem cell. “Embryonic stem cell” is a general term for pluripotent stem cells that are made using embryos or eggs, rather than for cells genetically reprogrammed from the body. As used herein, “ESCs” encompass true ESCs, ntESCs, and pESCs. In other embodiments, a pluripotent stem cell is an induced pluripotent stem cell, such as a human induced pluripotent stem cell. iPSCs may be derived from skin or blood cells that have been reprogrammed back into an embryonic-like pluripotent state that enables the
development of an unlimited source of any type of human cell. See, e.g., Ye L et al. Curr Cardiol Rev. 2013 Feb; 9(1): 63–72, incorporated herein by reference.
The PSCs provided herein are engineered to differentiate into a particular cell type of interest by overexpressing one or more proteins (e.g., transcription factors) in the PSCs. A transcription factor is a protein that controls the rate of transcription. In some embodiments, a protein is expressed in a PSC at a level that is at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 50%, or at least 100% higher than a control level, for example, an endogenous (baseline) level. A cell “expresses” a particular protein if the level of the protein in the cell is detectable (e.g., using a known protein assay). A cell “overexpresses” a particular protein (e.g., from an engineered polynucleotide encoding the protein) if the level of the protein is higher than (e.g., at least 5%, at least 10%, or at least 20% higher than) the level of the protein expressed from an endogenous, naturally-occurring polynucleotide encoding the protein. In some embodiments, a control level of protein expression is an endogenous (baseline) level of expression of that same protein, for example, in a naturally-occurring pluripotent stem cell.
The term “protein” encompasses full length functional proteins as well as full-length or truncated functional variants of a protein, unless stated otherwise. Thus, the term “protein” encompasses full length functional transcription factors as well as full-length or truncated functional variants of the transcription factors, unless stated otherwise. A variant protein may comprise an amino acid sequence that has, for example, at least 80% identity to the amino acid sequence of a corresponding wild-type or reference protein and still exhibits the same function(s) as the corresponding wild-type or reference protein.
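By way of non-limiting illustration, the overexpression criterion described above (a protein level at least a stated percentage higher than a control level) may be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the protein levels are arbitrary units from any suitable assay, and the list of thresholds simply mirrors the percentages recited above.

```python
from typing import List

# Percent thresholds recited in the text above.
RECITED_THRESHOLDS = (5.0, 10.0, 15.0, 20.0, 25.0, 50.0, 100.0)


def percent_increase(level: float, control_level: float) -> float:
    """Percent by which `level` exceeds `control_level` (negative if it is lower)."""
    if control_level <= 0:
        raise ValueError("control_level must be positive")
    return 100.0 * (level - control_level) / control_level


def is_overexpressed(level: float, control_level: float, min_percent: float = 5.0) -> bool:
    """True if `level` is at least `min_percent` higher than the control level."""
    return percent_increase(level, control_level) >= min_percent


def thresholds_met(level: float, control_level: float) -> List[float]:
    """Return the recited thresholds that the measured level satisfies."""
    increase = percent_increase(level, control_level)
    return [t for t in RECITED_THRESHOLDS if increase >= t]


# Hypothetical measurement: 130 units in an engineered PSC versus 100 units in a control PSC.
print(is_overexpressed(130, 100, min_percent=20.0))
print(thresholds_met(130, 100))
```

The control level in this sketch corresponds to the endogenous (baseline) level discussed above, for example the level measured in a naturally-occurring pluripotent stem cell.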
In some embodiments, a variant protein has at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to a wild-type or reference protein. In some embodiments a global alignment is used to determine the percent identity between two proteins (e.g., the alignment spanning the entire length of both proteins). In other embodiments, such as those involving truncated variants, a local alignment may be used to determine the percent identity between regions of similarity between the two proteins (e.g., the alignment spanning the entire length of the truncated proteins but not the entire length of the wild-type or reference protein).
Differentiation is the process by which an uncommitted cell or a partially committed cell commits to a specialized cell fate. Aspects of the present disclosure relate to the differentiation of uncommitted pluripotent stem cells (e.g., induced pluripotent stem cells) into one or more cell fates selected from, for example, astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and microglia-like cells.
Differentiation of Astrocyte-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into an astrocyte-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from ETS Transcription Factor ERG (ERG), Early Growth Response 1 (EGR1), friend leukemia integration 1 transcription factor (FLI1), and FBJ murine osteosarcoma viral oncogene homolog B (FOSB), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: two or more proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: three or more proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: one protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: two proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: three proteins selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises: ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises and/or overexpresses ERG. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses EGR1. In some embodiments, the PSC comprises and/or overexpresses FLI1. In some embodiments, the PSC comprises and/or overexpresses FOSB. In some embodiments, the PSC comprises and/or overexpresses ERG and EGR1; ERG and FLI1; ERG and FOSB; EGR1 and FLI1; EGR1 and FOSB; or FLI1 and FOSB. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ERG, EGR1, FLI1, and FOSB.
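By way of non-limiting illustration, the combinations of the four astrocyte-associated proteins enumerated above may be generated programmatically. This is a minimal sketch; the helper names `factor_combinations` and `all_nonempty_combinations` are hypothetical, and only the four protein names are taken from the text.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

ASTROCYTE_FACTORS = ("ERG", "EGR1", "FLI1", "FOSB")


def factor_combinations(factors: Sequence[str], size: int) -> List[Tuple[str, ...]]:
    """All unordered combinations of `size` distinct factors, in listing order."""
    return list(combinations(factors, size))


def all_nonempty_combinations(factors: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every combination of one up to all of the factors."""
    result: List[Tuple[str, ...]] = []
    for size in range(1, len(factors) + 1):
        result.extend(factor_combinations(factors, size))
    return result


pairs = factor_combinations(ASTROCYTE_FACTORS, 2)
for pair in pairs:
    print(" and ".join(pair))
print(len(all_nonempty_combinations(ASTROCYTE_FACTORS)))
```

For four proteins, this yields six pairs, four triples, and fifteen non-empty combinations in total, matching the one-, two-, three-, and four-protein embodiments recited above.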
In some embodiments, the PSC comprises and/or overexpresses ERG, EGR1, FLI1, and FOSB. A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding
FLI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding EGR1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG and an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1 and an engineered polynucleotide comprising an open reading frame encoding FLI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EGR1 and an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding FLI1 and an engineered polynucleotide comprising an open reading frame encoding FOSB. In some embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERG, an engineered polynucleotide comprising an open reading frame encoding EGR1, an engineered polynucleotide comprising an open reading frame encoding FLI1, and an engineered polynucleotide comprising an open reading frame encoding FOSB.
Differentiation of Cytotoxic T Cell-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into a cytotoxic T-cell-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), Runt-related transcription factor 3 (RUNX3), REL-associated protein (RELA), Nuclear respiratory factor 1 (NRF1), ETS2 Repressor Factor (ERF), and Sp4 transcription factor (SP4), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: two or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: three or more proteins selected from ZBTB1, RUNX3,
RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: four or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: five or more proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: one protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: four proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: five proteins selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises: ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises and/or overexpresses ZBTB1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses RUNX3. In some embodiments, the PSC comprises and/or overexpresses RELA. In some embodiments, the PSC comprises and/or overexpresses NRF1. In some embodiments, the PSC comprises and/or overexpresses ERF. In some embodiments, the PSC comprises and/or overexpresses SP4. In some embodiments, the PSC comprises and/or overexpresses ZBTB1 and RUNX3; ZBTB1 and RELA; ZBTB1 and NRF1; ZBTB1 and ERF; ZBTB1 and SP4; RUNX3 and RELA; RUNX3 and NRF1; RUNX3 and ERF; RUNX3 and SP4; RELA and NRF1; RELA and ERF; RELA and SP4; NRF1 and ERF; NRF1 and SP4; or ERF and SP4. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises and/or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4.
A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an
open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RUNX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RUNX3 and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding NRF1. 
In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding ERF. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding SP4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ERF and an engineered polynucleotide comprising an open reading frame encoding SP4. In some
embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding RUNX3, an engineered polynucleotide comprising an open reading frame encoding RELA, an engineered polynucleotide comprising an open reading frame encoding NRF1, an engineered polynucleotide comprising an open reading frame encoding ERF, and an engineered polynucleotide comprising an open reading frame encoding SP4.
Differentiation of Hepatocyte-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into a hepatocyte-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Hepatocyte Nuclear Factor 4 Alpha (HNF4A), Hepatocyte Nuclear Factor 4 Gamma (HNF4G), TEA Domain Transcription Factor 4 (TEAD4), and Regulatory Factor X3 (RFX3), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: two or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: three or more proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: one protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: two proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: three proteins selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises: HNF4A, HNF4G, TEAD4, and RFX3.
In some embodiments, the PSC comprises and/or overexpresses HNF4A. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses HNF4G. In some embodiments, the PSC comprises and/or overexpresses TEAD4. In some embodiments, the PSC comprises and/or overexpresses RFX3. In some embodiments, the PSC comprises and/or overexpresses HNF4A and HNF4G; HNF4A and TEAD4; HNF4A and RFX3; HNF4G and TEAD4; HNF4G and RFX3; or TEAD4 and RFX3. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises and/or overexpresses HNF4A, HNF4G, TEAD4, and RFX3.
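The pairwise combinations recited above can be enumerated mechanically. The following is a minimal illustrative sketch, not part of the disclosed methods: the factor names come from the text, while the function name `factor_subsets` and the variable names are hypothetical.

```python
from itertools import combinations

# Factor names are taken from the text; all other names are illustrative.
HEPATOCYTE_FACTORS = ("HNF4A", "HNF4G", "TEAD4", "RFX3")


def factor_subsets(factors, size):
    """Return every unordered subset of `factors` with exactly `size` members."""
    return [tuple(subset) for subset in combinations(factors, size)]


# Enumerate the two-factor combinations in the order the factors are listed.
pairs = factor_subsets(HEPATOCYTE_FACTORS, 2)
for first, second in pairs:
    print(f"{first} and {second}")
```

Calling `factor_subsets(HEPATOCYTE_FACTORS, 3)` gives the three-factor combinations in the same way.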
A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding HNF4G. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A and an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G and an engineered polynucleotide comprising an open reading frame encoding TEAD4. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4G and an engineered polynucleotide comprising an open reading frame encoding RFX3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding TEAD4 and an engineered polynucleotide comprising an open reading frame encoding RFX3. 
In some embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding HNF4A, an engineered polynucleotide comprising an open reading frame encoding HNF4G, an engineered polynucleotide comprising an open reading frame encoding TEAD4, and an engineered polynucleotide comprising an open reading frame encoding RFX3.
Differentiation of Regulatory T-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into a regulatory T-cell-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from
ETS Proto-Oncogene 1, Transcription Factor (ETS1), ETS Variant Transcription Factor 3 (ETV3), GA Binding Protein Transcription Factor Subunit Alpha (GABPA), Krueppel-like factor 9 (KLF9), and Nuclear Factor Kappa B Subunit 1 (NFKB1), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: two or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: three or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: four or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: one protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: two proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: three proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: four proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises: ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises and/or overexpresses ETS1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses ETV3. In some embodiments, the PSC comprises and/or overexpresses GABPA. In some embodiments, the PSC comprises and/or overexpresses KLF9. In some embodiments, the PSC comprises and/or overexpresses NFKB1. 
In some embodiments, the PSC comprises and/or overexpresses ETS1 and ETV3; ETS1 and GABPA; ETS1 and KLF9; ETV3 and GABPA; ETV3 and KLF9; ETV3 and NFKB1; GABPA and KLF9; GABPA and NFKB1; or KLF9 and NFKB1. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises and/or overexpresses ETS1, ETV3, GABPA, KLF9, and NFKB1. A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding KLF9. In some embodiments, a
PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding ETV3. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding GABPA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding KLF9. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1 and an engineered polynucleotide comprising an open reading frame encoding NFKB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding GABPA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding KLF9. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETV3 and an engineered polynucleotide comprising an open reading frame encoding NFKB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA and an engineered polynucleotide comprising an open reading frame encoding KLF9. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding GABPA and an engineered polynucleotide comprising an open reading frame encoding NFKB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding KLF9 and an engineered polynucleotide comprising an open reading frame encoding NFKB1. 
In some embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ETS1, an engineered polynucleotide comprising an open reading frame encoding ETV3, an engineered polynucleotide comprising an open reading frame encoding GABPA, an engineered polynucleotide comprising an open reading frame encoding KLF9, and an engineered polynucleotide comprising an open reading frame encoding NFKB1.
Differentiation of B Cell-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into a B cell-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), EBF Transcription Factor 1 (EBF1), REL-associated protein (RELA), Nuclear respiratory factor 1 (NRF1), and REL-associated protein (REL), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: two or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: three or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: four or more proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: one protein selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: four proteins selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises: ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises and/or overexpresses ZBTB1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses EBF1. In some embodiments, the PSC comprises and/or overexpresses RELA. In some embodiments, the PSC comprises and/or overexpresses NRF1. In some embodiments, the PSC comprises and/or overexpresses REL. 
In some embodiments, the PSC comprises and/or overexpresses ZBTB1 and EBF1; ZBTB1 and RELA; ZBTB1 and NRF1; EBF1 and RELA; EBF1 and NRF1; EBF1 and REL; RELA and NRF1; RELA and REL; or NRF1 and REL. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises and/or overexpresses ZBTB1, EBF1, RELA, NRF1, and REL. A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading
frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding EBF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding EBF1 and an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding NRF1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding REL. 
In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding NRF1 and an engineered polynucleotide comprising an open reading frame encoding REL. In some embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding EBF1, an engineered polynucleotide comprising an open reading frame encoding RELA, an engineered polynucleotide comprising an open reading frame encoding NRF1, and an engineered polynucleotide comprising an open reading frame encoding REL.
Differentiation of Microglia-Like Cells
Some aspects of the present disclosure provide a PSC (e.g., iPSC, such as a human iPSC) that is engineered to differentiate into a microglia-like cell. In some embodiments, the PSC comprises: a (one or more, e.g., 1, 2, 3, or 4) protein selected from Zinc finger and BTB domain containing 1 (ZBTB1), Spi-1 Proto-Oncogene (SPI1), REL-associated protein (RELA), and Signal Transducer And Activator Of Transcription 2 (STAT2), wherein the protein is overexpressed. In some embodiments, the PSC comprises: one or more proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: two or more proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: three or more proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: one protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: two proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: three proteins selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises: ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1. In some embodiments, overexpression refers to an expression level above the expression level in a control cell. In some embodiments, the PSC comprises and/or overexpresses SPI1. In some embodiments, the PSC comprises and/or overexpresses RELA. In some embodiments, the PSC comprises and/or overexpresses STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1 and SPI1; ZBTB1 and RELA; ZBTB1 and STAT2; SPI1 and RELA; SPI1 and STAT2; or RELA and STAT2. In some embodiments, the PSC comprises and/or overexpresses any preceding combination and one (at least one) additional protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises and/or overexpresses ZBTB1, SPI1, RELA, and STAT2. 
A PSC, in some embodiments, comprises an engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding SPI1. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open
reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1 and an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1 and an engineered polynucleotide comprising an open reading frame encoding RELA. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding SPI1 and an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, a PSC comprises an engineered polynucleotide comprising an open reading frame encoding RELA and an engineered polynucleotide comprising an open reading frame encoding STAT2. In some embodiments, the PSC comprises any preceding combination of engineered polynucleotides and one (at least one) additional engineered polynucleotide comprising an open reading frame encoding a (one or more) protein selected from ZBTB1, SPI1, RELA, and STAT2. In some embodiments, the PSC comprises an engineered polynucleotide comprising an open reading frame encoding ZBTB1, an engineered polynucleotide comprising an open reading frame encoding SPI1, an engineered polynucleotide comprising an open reading frame encoding RELA, and an engineered polynucleotide comprising an open reading frame encoding STAT2.
Engineered Transcription Factors - Polynucleotides and Polypeptides
The pluripotent stem cells of the present disclosure, in some embodiments, comprise engineered polynucleotides. An engineered polynucleotide is a nucleic acid (e.g., at least two nucleotides covalently linked together, and in some instances, containing phosphodiester bonds, referred to as a phosphodiester backbone) that does not occur in nature. 
Engineered polynucleotides include recombinant nucleic acids and synthetic nucleic acids. A recombinant nucleic acid is a molecule that is constructed by joining nucleic acids (e.g., isolated nucleic acids, synthetic nucleic acids or a combination thereof) from two different organisms (e.g., human and mouse). A synthetic nucleic acid is a molecule that is amplified or chemically, or by other means, synthesized. A synthetic nucleic acid includes those that are chemically modified, or otherwise modified, but can base pair with (bind to) naturally occurring nucleic acid molecules. Recombinant and synthetic nucleic acids also include those molecules that result from the replication of either of the foregoing.
An engineered polynucleotide may comprise DNA (e.g., genomic DNA, cDNA or a combination of genomic DNA and cDNA), RNA or a hybrid molecule, for example, where the nucleic acid contains any combination of deoxyribonucleotides and ribonucleotides (e.g., artificial or natural), and any combination of two or more bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. In some embodiments, a polynucleotide is a complementary DNA (cDNA). cDNA is synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by reverse transcriptase. Engineered polynucleotides of the present disclosure may be produced using standard molecular biology methods (see, e.g., Green and Sambrook, Molecular Cloning, A Laboratory Manual, 2012, Cold Spring Harbor Press). In some embodiments, nucleic acids are produced using GIBSON ASSEMBLY® Cloning (see, e.g., Gibson, D.G. et al. Nature Methods, 343–345, 2009; and Gibson, D.G. et al. Nature Methods, 901–903, 2010, each of which is incorporated by reference herein). GIBSON ASSEMBLY® typically uses three enzymatic activities in a single-tube reaction: 5´ exonuclease, the 3´ extension activity of a DNA polymerase and DNA ligase activity. The 5´ exonuclease activity chews back the 5´ end sequences and exposes the complementary sequence for annealing. The polymerase activity then fills in the gaps on the annealed domains. A DNA ligase then seals the nick and covalently links the DNA fragments together. The overlapping sequence of adjoining fragments is much longer than those used in Golden Gate Assembly, and therefore results in a higher percentage of correct assemblies. Other methods of producing engineered polynucleotides may be used in accordance with the present disclosure. In some embodiments, an engineered polynucleotide comprises a promoter operably linked to an open reading frame. 
A promoter is a nucleotide sequence to which RNA polymerase binds to initiate transcription (e.g., ATG). Promoters are typically located directly upstream from (at the 5' end of) a transcription initiation site. In some embodiments, a promoter is a heterologous promoter. A heterologous promoter is not naturally associated with the open reading frame to which it is operably linked. In some embodiments, a promoter is an inducible promoter. An inducible promoter may be regulated in vivo by a chemical agent, temperature, or light, for example. Inducible promoters enable, for example, temporal and/or spatial control of gene expression. Inducible promoters for use in accordance with the present disclosure include any inducible promoter described herein or known to one of ordinary skill in the art. Examples of inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse and human), pathogenesis-regulated promoters (e.g., induced by salicylic acid, ethylene or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light responsive promoters from plant cells). In some embodiments, the inducible promoter is a tetracycline-inducible promoter. In some embodiments, the inducible promoter is a doxycycline-inducible promoter. In other embodiments, a promoter is a constitutive promoter (active in vivo, unregulated).
An open reading frame is a continuous stretch of codons that begins with a start codon (e.g., ATG), ends with a stop codon (e.g., TAA, TAG, or TGA), and encodes a polypeptide, for example, a protein. An open reading frame is operably linked to a promoter if that promoter regulates transcription of the open reading frame. Vectors used for delivery of an engineered polynucleotide include minicircles, plasmids, bacterial artificial chromosomes (BACs), and yeast artificial chromosomes. Transposon-based systems, such as the piggyBac™ system (e.g., Chen et al. Nature Communications. 2020; 11(1): 3446), are also contemplated herein. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding ERG (e.g., UniprotKB Accession No. P11308). 
In some embodiments, the protein comprises the sequence of: MASTIKEALSVVSEDQSLFECAYGTPHLAKTEMTASSSSDYGQTSKMSPRVPQQDWLSQPPARVTIKMECNPSQV NGSRNSPDECSVAKGGKMVGSPDTVGMNYGSYMEEKHMPPPNMTTNERRVIVPADPTLWSTDHVRQWLEWAVKEY GLPDVNILLFQNIDGKELCKMTKDDFQRLTPSYNADILLSHLHYLRETPLPHLTSDDVDKALQNSPRLMHARNTG GAAFIFPNTSVYPEATQRITTRPDLPYEPPRRSAWTGHGHPTPQSKAAQPSPSTVPKTEDQRPQLDPYQILGPTS SRLANPGSGQIQLWQFLLELLSDSSNSSCITWEGTNGEFKMTDPDEVARRWGERKSKPNMNYDKLSRALRYYYDK NIMTKVHGKRYAYKFDFHGIAQALQPHPPESSLYKYPSDLPYMGSYHAHPQKMNFVAPHPPALPVTSSSFFAAPN PYWNSPTGGIYPNTRLPTSHMPSHLGTYY* (SEQ ID NO: 1) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 1.
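As an illustration of how a percent-identity threshold such as "at least 80%" might be checked, the sketch below compares two sequences that have already been aligned to equal length, with `-` marking gaps. The text does not specify an alignment method or an identity formula, so the convention used here (matching positions divided by alignment length) is an assumption, and the function names and the variant sequence are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = identical non-gap columns / total alignment columns.
    Other conventions (e.g., dividing by the shorter sequence length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# First ten residues of SEQ ID NO: 1 versus a hypothetical single-residue variant.
reference = "MASTIKEALS"
variant = "MASTIKEALT"
print(percent_identity(reference, variant))
print(meets_identity_threshold(reference, variant, 80.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the final counting step.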
In some embodiments, an engineered polynucleotide comprises an open reading frame encoding EGR1 (e.g., UniprotKB Accession No. P18146). In some embodiments, the protein comprises the sequence of: MTTLKEAVTFKDVAVVFTEEELRLLDLAQRKLYREVMLENFRNLLSVGHQSLHRDTFHFLKEEKFWMMETATQRE GNLGGKIQMEMETVSESGTHEGLFSHQTWEQISSDLTRFQDSMVNSFQFSKQDDMPCQVDAGLSIIHVRQKPSEG RTCKKSFSDVSVLDLHQQLQSREKSHTCDECGKSFCYSSALRIHQRVHMGEKLYNCDVCGKEFNQSSHLQIHQRI HTGEKPFKCEQCGKGFSRRSGLYVHRKLHTGVKPHICEKCGKAFIHDSQLQEHQRIHTGEKPFKCDICCKSFRSR ANLNRHSMVHMREKPFRCDTCGKSFGLKSALNSHRMVHTGEKRYKCEECGKRFIYRQDLYKHQIDHTGEKPYNCK ECGKSFRWASGLSRHVRVHSGETTFKCEECGKGFYTNSQRYSHQRAHSGEKPYRCEECGKGYKRRLDLDFHQRVH RGEKPYNCKECGKSFGWASCLLNHQRIHSGEKPFKCEECGKRFTQNSQLYTHRRVHSGEKPFKCEECGKRFTQNS QLYSHRRVHTGVKPYKCEECGKGFNSKFNLDMHQRVHTGERPYNCKECGKSFSRASSILNHKRLHGDEKPFKCEE CGKRFTENSQLHSHQRVHTGEKPYKCEKCGKSFRWASTHLTHQRLHSREKLLQCEDCGKSIVHSSCLKDQQRDQS GEKTSKCEDCGKRYKRRLNLDTLLSLFLNDT* (SEQ ID NO: 2) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 2. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding FLI1 (e.g., UniprotKB Accession No. Q01543). In some embodiments, the protein comprises the sequence of: MDGTIKEALSVVSDDQSLFDSAYGAAAHLPKADMTASGSPDYGQPHKINPLPPQQEWINQPVRVNVKREYDHMNG SRESPVDCSVSKCSKLVGGGESNPMNYNSYMDEKNGPPPPNMTTNERRVIVPADPTLWTQEHVRQWLEWAIKEYS LMEIDTSFFQNMDGKELCKMNKEDFLRATTLYNTEVLLSHLSYLRESSLLAYNTTSHTDQSSRLSVKEDPSYDSV RRGAWGNNMNSGLNKSPPLGGAQTISKNTEQRPQPDPYQILGPTSSRLANPGSGQIQLWQFLLELLSDSANASCI TWEGTNGEFKMTDPDEVARRWGERKSKPNMNYDKLSRALRYYYDKNIMTKVHGKRYAYKFDFHGIAQALQPHPTE SSMYKYPSDISYMPSYHAHQQKVNFVPPHPSSMPVTSSSFFGAASQYWTSPTGGIYPNPNVPRHPNTHVPSHLGS YY* (SEQ ID NO: 3) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 3. 
In some embodiments, an engineered polynucleotide comprises an open reading frame encoding FOSB (e.g., UniprotKB Accession No. P53539). In some embodiments, the protein comprises the sequence of: MFQAFPGDYDSGSRCSSSPSAESQYLSSVDSFGSPPTAAASQECAGLGEMPGSFVPTVTAITTSQDLQWLVQPTL ISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGSGGPSTSGTTSGPGPARPARARPRRPREETL TPEEEEKRRVRRERNKLAAAKCRNRRRELTDRLQAETDQLEEEKAELESEIAELQKEKERLEFVLVAHKPGCKIP YEEGPGPGPLAEVRDLPGSAPAKEDGFSWLLPPPPPPPLPFQTSQDAPPNLTASLFTHSEVQVLGDPFPVVNPSY TSSFVLTCPEVSAFAGAQRTSGSDQPSDPLNSPSLLALWIHPAFLY (SEQ ID NO: 4) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 4. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding ZBTB1 (e.g., UniprotKB Accession No. Q9Y2K1). In some embodiments, the protein comprises the sequence of: MAKPSHSSYVLQQLNNQREWGFLCDCCIAIDDIYFQAHKAVLAACSSYFRMFFMNHQHSTAQLNLSNMKISAECF DLILQFMYLGKIMTAPSSFEQFKVAMNYLQLYNVPDCLEDIQDADCSSSKCSSSASSKQNSKMIFGVRMYEDTVA RNGNEANRWCAEPSSTVNTPHNREADEESLQLGNFPEPLFDVCKKSSVSKLSTPKERVSRRFGRSFTCDSCGFGF SCEKLLDEHVLTCTNRHLYQNTRSYHRIVDIRDGKDSNIKAEFGEKDSSKTFSAQTDKYRGDTSQAADDSASTTG
SRKSSTVESEIASEEKSRAAERKRIIIKMEPEDIPTDELKDFNIIKVTDKDCNESTDNDELEDEPEEPFYRYYVE EDVSIKKSGRKTLKPRMSVSADERGGLENMRPPNNSSPVQEDAENASCELCGLTITEEDLSSHYLAKHIENICAC GKCGQILVKGRQLQEHAQRCGEPQDLTMNGLGNTEEKMDLEENPDEQSEIRDMFVEMLDDFRDNHYQINSIQKKQ LFKHSACPFRCPNCGQRFETENLVVEHMSSCLDQDMFKSAIMEENERDHRRKHFCNLCGKGFYQRCHLREHYTVH TKEKQFVCQTCGKQFLRERQLRLHNDMHKGMARYVCSICDQGNFRKHDHVRHMISHLSAGETICQVCFQIFPNNE QLEQHMDVHLYTCGICGAKFNLRKDMRSHYNAKHLKRTL (SEQ ID NO: 5) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 5. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding RUNX3 (e.g., UniprotKB Accession No. Q13761). In some embodiments, the protein comprises the sequence of: MASNSIFDSFPTYSPTFIRDPSTSRRFTPPSPAFPCGGGGGKMGENSGALSAQAAVGPGGRARPEVRSMVDVLAD HAGELVRTDSPNFLCSVLPSHWRCNKTLPVAFKVVALGDVPDGTVVTVMAGNDENYSAELRNASAVMKNQVARFN DLRFVGRSGRGKSFTLTITVFTNPTQVATYHRAIKVTVDGPREPRRHRQKLEDQTKPFPDRFGDLERLRMRVTPS TPSPRGSLSTTSHFSSQPQTPIQGTSELNPFSDPRQFDRSFPTLPTLTESRFPDPRMHYPGAMSAAFPYSATPSG TSISSLSVAGMPATSRFHHTYLPPPYPGAPQNQSGPFQANPSPYHLYYGTSSGSYQFSMVAGSSSGGDRSPTRML ASCTSSAASVAAGNLMNPSLGGQSDGVEADGSHSNSPTALSTPGRMDEAVWRPY* (SEQ ID NO: 6) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 6. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding RELA (e.g., UniprotKB Accession No. Q04206). 
In some embodiments, the protein comprises the sequence of: MDELFPLIFPAEPAQASGPYVEIIEQPKQRGMRFRYKCEGRSAGSIPGERSTDTTKTHPTIKINGYTGPGTVRIS LVTKDPPHRPHPHELVGKDCRDGFYEAELCPDRCIHSFQNLGIQCVKKRDLEQAISQRIQTNNNPFQVPIEEQRG DYDLNAVRLCFQVTVRDPSGRPLRLPPVLSHPIFDNRAPNTAELKICRVNRNSGSCLGGDEIFLLCDKVQKEDIE VCPQASTPALSLYVIPEHHQL* (SEQ ID NO: 7) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 7. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding NRF1 (e.g., UniprotKB Accession No. Q16656). In some embodiments, the protein comprises the sequence of: MEEHGVTQTEHMATIEAHAVAQQVQQVHVATYTEHSMLSADEDSPSSPEDTSYDDSDILNSTAADEVTAHLAAAG PVGMAAAAAVATGKKRKRPHVFESNPSIRKRQQTRLLRKLRATLDEYTTRVGQQAIVLCISPSKPNPVFKVFGAA PLENVVRKYKSMILEDLESALAEHAPAPQEVNSELPPLTIDGIPVSVDKMTQAQLRAFIPEMLKYSTGRGKPGWG KESCKPIWWPEDIPWANVRSDVRTEEQKQRVSWTQALRTIVKNCYKQHGREDLLYAFEDQQTQTQATATHSIAHL VPSQTVVQTFSNPDGTVSLIQVGTGATVATLADASELPTTVTVAQVNYSAVADGEVEQNWATLQGGEMTIQTTQA SEATQAVASLAEAAVAASQEMQQGATVTMALNSEAAAHAVATLAEATLQGGGQIVLSGETAAAVGALTGVQDANG LFMADRAGRKWILTDKATGLVQIPVSMYQTVVTSLAQGNGPVQVAMAPVTTRISDSAVTMDGQAVEVVTLEQ* (SEQ ID NO: 8) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 8.
In some embodiments, an engineered polynucleotide comprises an open reading frame encoding ERF (e.g., UniprotKB Accession No. P50548). In some embodiments, the protein comprises the sequence of: MKTPADTGFAFPDWAYKPESSPGSRQIQLWHFILELLRKEEYQGVIAWQGDYGEFVIKDPDEVARLWGVRKCKPQ MNYDKLSRALRYYYNKRILHKTKGKRFTYKFNFNKLVLVNYPFIDVGLAGGAVPQSAPPVPSGGSHFRFPPSTPS EVLSPTEDPRSPPACSSSSSSLFSAVVARRLGRGSVSDCSDGTSELEEPLGEDPRARPPGPPDLGAFRGPPLARL PHDPGVFRVYPRPRGGPEPLSPFPVSPLAGPGSLLPPQLSPALPMTPTHLAYTPSPTLSPMYPSGGGGPSGSGGG SHFSFSPEDMKRYLQAHTQSVYNYHLSPRAFLHYPGLVVPQPQRPDKCPLPPMAPETPPVPSSASSSSSSSSSPF KFKLQPPPLGRRQRAAGEKAVAGADKSGGSAGGLAEGAGALAPPPPPPQIKVEPISEGESEEVEVTDISDEDEED GEVFKTPRAPPAPPKPEPGEAPGASQCMPLKLRFKRRWSEDCRLEGGGGPAGGFEDEGEDKKVRGEGPGEAGGPL TPRRVSSDLQHATAQLSLEHRDS* (SEQ ID NO: 9) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 9. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding SP4 (e.g., UniprotKB Accession No. Q02446). 
In some embodiments, the protein comprises the sequence of: MSDQKKEEEEEAAAAAAMATEGGKTSEPENNNKKPKTSGSQDSQPSPLALLAATCSKIGTPGENQATGQQQIIID PSQGLVQLQNQPQQLELVTTQLAGNAWQLVASTPPASKENNVSQPASSSSSSSSSNNGSASPTKTKSGNSSTPGQ FQVIQVQNPSGSVQYQVIPQLQTVEGQQIQINPTSSSSLQDLQGQIQLISAGNNQAILTAANRTASGNILAQNLA NQTVPVQIRPGVSIPLQLQTLPGTQAQVVTTLPINIGGVTLALPVINNVAAGGGTGQVGQPAATADSGTSNGNQL VSTPTNTTTSASTMPESPSSSTTCTTTASTSLTSSDTLVSSADTGQYASTSASSSERTIEESQTPAATESEAQSS SQLQPNGMQNAQDQSNSLQQVQIVGQPILQQIQIQQPQQQIIQAIPPQSFQLQSGQTIQTIQQQPLQNVQLQAVN PTQVLIRAPTLTPSGQISWQTVQVQNIQSLSNLQVQNAGLSQQLTITPVSSSGGTTLAQIAPVAVAGAPITLNTA QLASVPNLQTVSVANLGAAGVQVQGVPVTITSVAGQQQGQDGVKVQQATIAPVTVAVGGIANATIGAVSPDQLTQ VHLQQGQQTSDQEVQPGKRLRRVACSCPNCREGEGRGSNEPGKKKQHICHIEGCGKVYGKTSHLRAHLRWHTGER PFICNWMFCGKRFTRSDELQRHRRTHTGEKRFECPECSKRFMRSDHLSKHVKTHQNKKGGGTALAIVTSGELDSS VTEVLGSPRIVTVAAISQDSNPATPNVSTNMEEF* (SEQ ID NO: 10) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 10. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding HNF4A (e.g., UniprotKB Accession No. P41235). In some embodiments, the protein comprises the sequence of: MRLSKTLVDMDMADYSAALDPAYTTLEFENVQVLTMGNDTSPSEGTNLNAPNSLGVSALCAICGDRATGKHYGAS SCDGCKGFFRRSVRKNHMYSCRFSRQCVVDKDKRNQCRYCRLKKCFRAGMKKEAVQNERDRISTRRSSYEDSSLP SINALLQAEVLSRQITSPVSGINGDIRAKKIASIADVCESMKEQLLVLVEWAKYIPAFCELPLDDQVALLRAHAG EHLLLGATKRSMVFKDVLLLGNDYIVPRHCPELAEMSRVSIRILDELVLPFQELQIDDNEYAYLKAIIFFDPDAK GLSDPGKIKRLRSQVQVSLEDYINDRQYDSRGRFGELLLLLPTLQSITWQMIEQIQFIKLFGMAKIDNLLQEMLL GGPCQAQEGRGWSGDSPGDRPHTVSSPLSSLASPLCRFGQVA* (SEQ ID NO: 11) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 11. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding HNF4G (e.g., UniprotKB Accession No. Q14541). 
In some embodiments, the protein comprises the sequence of:
MDMANYSEVLDPTYTTLEFETMQILYNSSDSSAPETSMNTTDNGVNCLCAICGDRATGKHYGASSCDGCKGFFRR SIRKSHVYSCRFSRQCVVDKDKRNQCRYCRLRKCFRAGMKKEAVQNERDRISTRRSTFDGSNIPSINTLAQAEVR SRQISVSSPGSSTDINVKKIASIGDVCESMKQQLLVLVEWAKYIPAFCELPLDDQVALLRAHAGEHLLLGATKRS MMYKDILLLGNNYVIHRNSCEVEISRVANRVLDELVRPFQEIQIDDNEYACLKAIVFFDPDAKGLSDPVKIKNMR FQVQIGLEDYINDRQYDSRGRFGELLLLLPTLQSITWQMIEQIQFVKLFGMVKIDNLLQEMLLGGASNDGSHLHH PMHPHLSQDPLTGQTILLGPMSTLVHADQISTPETPLPSPPQGSGQEQYKIAANQASVISHQHLSKQKQL* (SEQ ID NO: 12) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 12. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding TEAD4 (e.g., UniprotKB Accession No. Q15561). In some embodiments, the protein comprises the sequence of: MYGRNELIARYIKLRTGKTRTRKQVSSHIQVLARRKAREIQAKLKDQAAKDKALQSMAAMSSAQIISATAFHSSM ALARGPGRPAVSGFWQGALPGQAGTSHDVKPFSQQTYAVQPPLPLPGFESPAGPAPSPSAPPAPPWQGRSVASSK LWMLEFSAFLEQQQDPDTYNKHLFVHIGQSSPSYSDPYLEAVDIRQIYDKFPEKKGGLKDLFERGPSNAFFLVKF WADLNTNIEDEGSSFYGVSSQYESPENMIITCSTKVCSFGKQVVEKVETEYARYENGHYSYRIHRSPLCEYMINF IHKLKHLPEKYMMNSVLENFTILQVVTNRDTQETLLCIAYVFEVSASEHGAQHHIYRLVKE* (SEQ ID NO: 13) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 13. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding RFX3 (e.g., UniprotKB Accession No. P48380). 
In some embodiments, the protein comprises the sequence of: MQTSETGSDTGSTVTLQTSVASQAAVPTQVVQQVPVQQQVQQVQTVQQVQHVYPAQVQYVEGSDTVYTNGAIRTT TYPYTETQMYSQNTGGNYFDTQGSSAQVTTVVSSHSMVGTGGIQMGVTGGQLISSSGGTYLIGNSMENSGHSVTH TTRASPATIEMAIETLQKSDGLSTHRSSLLNSHLQWLLDNYETAEGVSLPRSTLYNHYLRHCQEHKLDPVNAASF GKLIRSIFMGLRTRRLGTRGNSKYHYYGIRVKPDSPLNRLQEDMQYMAMRQQPMQQKQRYKPMQKVDGVADGFTG SGQQTGTSVGQTVIAQSQHHQQFLDASRALPEFGEVEISSLPDGTTFEDIKSLQSLYREHCEAILDVVVNLQFSL IEKLWQTFWRYSPSTPTDGTTITESRSESTSFPIHFHG* (SEQ ID NO: 14) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 14. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding ETS1 (e.g., UniprotKB Accession No. P14921). In some embodiments, the protein comprises the sequence of: MKAAVDLKPTLTIIKTEKVDLELFPSPDMECADVPLLTPSSKEMMSQALKATFSGFTKEQQRLGIPKDPRQWTET HVRDWVMWAVNEFSLKGVDFQKFCMNGAALCALGKDCFLELAPDFVGDILWEHLEILQKEDVKPYQVNGVNPAYP ESRYTSDYFISYGIEHAQCVPPSEFSEPSFITESYQTLHPISSEELLSLKYENDYPSVILRDPLQTDTLQNDYFA IKQEVVTPDNMCMGRTSRGKLGGQDSFESIESYDSCGQEMGKEEKQT* (SEQ ID NO: 15) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 15.
In some embodiments, an engineered polynucleotide comprises an open reading frame encoding ETV3 (e.g., UniprotKB Accession No. P41162). In some embodiments, the protein comprises the sequence of: MKAGCSIVEKPEGGGGYQFPDWAYKTESSPGSRQIQLWHFILELLQKEEFRHVIAWQQGEYGEFVIKDPDEVARL WGRRKCKPQMNYDKLSRALRYYYNKRILHKTKGKRFTYKFNFNKLVMPNYPFINIRSSGKIQTLLVGN (SEQ ID NO: 16) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 16. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding GABPA (e.g., UniprotKB Accession No. Q06546). In some embodiments, the protein comprises the sequence of: MTKREAEELIEIEIDGTEKAECTEESIVEQTYAPAECVSQAIDINEPIGNLKKLLEPRLQCSLDAHEICLQDIQL DPERSLFDQGVKTDGTVQLSVQVISYQGIEPKLNILEIVKPADTVEVVIDPDAHHAESEAHLVEEAQVITLDGTK HITTISDETSEQVTRWAAALEGYRKEQERLGIPYDPIQWSTDQVLHWVVWVMKEFSMTDIDLTTLNISGRELCSL NQEDFFQRVPRGEILWSHLELLRKYVLASQEQQMNEIVTIDQPVQIIPASVQSATPTTIKVINSSAKAAKVQRAP RISGEDRSSPGNRTGNNGQIQLWQFLLELLTDKDARDCISWVGDEGEFKLNQPELVAQKWGQRKNKPTMNYEKLS RALRYYYDGDMICKVQGKRFVYKFVCDLKTLIGYSAAELNRLVTECEQKKLAKMQLHGIAQPVTAVALSTASLQT EKDNL (SEQ ID NO: 17) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 17. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding KLF9 (e.g., UniprotKB Accession No. Q13886). 
In some embodiments, the protein comprises the sequence of: MSAAAYMDFVAAQCLVSISNRAAVPEHGVAPDAERLRLPEREVTKEHGDPGDTWKDYCTLVTIAKSLLDLNKYRP IQTPSVCSDSLESPDEDMGSDSDVTTESGSSPSHSPEERQDPGSAPSPLSLLHPGVAAKGKHASEKRHKCPYSGC GKVYGKSSHLKAHYRVHTGERPFPCTWPDCLKKFSRSDELTRHYRTHTGEKQFRCPLCEKRFMRSDHLTKHARRH TEFHPSMIKRSKKALANALL (SEQ ID NO: 18) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 18. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding NFKB1 (e.g., UniprotKB Accession No. P19838). In some embodiments, the protein comprises the sequence of: MAEDDPYLGRPEQMFHLDPSLTHTIFNPEVFQPQMALPTADGPYLQILEQPKQRGFRFRYVCEGPSHGGLPGASS EKNKKSYPQVKICNYVGPAKVIVQLVTNGKNIHLHAHSLVGKHCEDGICTVTAGPKDMVVGFANLGILHVTKKKV FETLEARMTEACIRGYNPGLLVHPDLAYLQAEGGGDRQLGDREKELIRQAALQQTKEMDLSVVRLMFTAFLPDST GSFTRRLEPVVSDAIYDSKAPNASNLKIVRMDRTAGCVTGGEEIYLLCDKVQKDDIQIRFYEEEENGGVWEGFGD FSPTDVHRQFAIVFKTPKYKDINITKPASVFVQLRRKSDLETSEPKPFLYYPEIKDKEEVQRKRQKLMPNFSDSF GGGSGAGAGGGGMFGSGGGGGGTGSTGPGYSFPHYGFPTYGGITFHPGTTKSNAGMKHGTMDTESKKDPEGCDKS DDKNTVNLFGKVIETTEQDQEPSEATVGNGEVTLTYATGTKEESAGVQDNLFLEKAMQLAKRHANALFDYAVTGD VKMLLAVQRHLTAVQDENGDSVLHLAIIHLHSQLVRDLLEVTSGLISDDIINMRNDLYQTPLHLAVITKQEDVVE DLLRAGADLSLLDRLGNSVLHLAAKEGHDKVLSILLKHKKAALLLDHPNGDGLNAIHLAMMSNSLPCLLLLVAAG ADVNAQEQKSGRTALHLAVEHDNISLAGCLLLEGDAHVDSTTYDGTTPLHIAAGRGSTRLAALLKAAGADPLVEN FEPLYDLDDSWENAGEDEGVVPGTTPLDMATSWQVFDILNGKPYEPEFTSDDLLAQGDMKQLAEDVKLQLYKLLE IPDPDKNWATLAQKLGLGILNNAFRLSPAPSKTLMDNYEVSGGTVRELVEALRQMGYTEAIEVIQAASSPVKTTS
QAHSLPLSPASTRQQIDELRDSDSVCDSGVETSFRKLSFTESLTSGASLLTLNKMPHDYGQEGPLEGKI* (SEQ ID NO: 19) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 19. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding EBF1 (e.g., UniprotKB Accession No. Q9UH73). In some embodiments, the protein comprises the sequence of: MFGIQESIQRSGSSMKEEPLGSGMNAVRTWMQGAGVLDANTAAQSGVGLARAHFEKQPPSNLRKSNFFHFVLALY DRQGQPVEIERTAFVGFVEKEKEANSEKTNNGIHYRLQLLYSNGIRTEQDFYVRLIDSMTKQAIVYEGQDKNPEM CRVLLTHEIMCSRCCDKKSCGNRNETPSDPVIIDRFFLKFFLKCNQNCLKNAGNPRDMRRFQVVVSTTVNVDGHV LAVSDNMFVHNNSKHGRRARRLDPSEGTPSYLEHATPCIKAISPSEGWTTGGATVIIIGDNFFDGLQVIFGTMLV WSELITPHAIRVQTPPRHIPGVVEVTLSYKSKQFCKGTPGRFIYTALNEPTIDYGFQRLQKVIPRHPGDPERLPK EVILKRAADLVEALYGMPHNNQEIILKRAADIAEALYSVPRNHNQLPALANTSVHAGMMGVNSFSGQLAVNVSEA SQATNQGFTRNSSSVSPHGYVPSTTPQQTNYNSVTTSMNGYGSAAMSNLGGSPTFLNGSAANSPYAIVPSSPTMA SSTSLPSNCSSSSGIFSFSPANMVSAVKQKSAFAPVVRPQTSPPPTCTSTNGNSLQAISGMIVPPM* (SEQ ID NO: 20) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 20. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding REL (e.g., UniprotKB Accession No. Q04864). 
In some embodiments, the protein comprises the sequence of: MASGAYNPYIEIIEQPRQRGMRFRYKCEGRSAGSIPGEHSTDNNRTYPSIQIMNYYGKGKVRITLVTKNDPYKPH PHDLVGKDCRDGYYEAEFGQERRPLFFQNLGIRCVKKKEVKEAIITRIKAGINPFNVPEKQLNDIEDCDLNVVRL CFQVFLPDEHGNLTTALPPVVSNPIYDNRAPNTAELRICRVNKNCGSVRGGDEIFLLCDKVQKDDIEVRFVLNDW EAKGIFSQADVHRQVAIVFKTPPYCKAITEPVTVKMQLRRPSDQEVSESMDFRYLPDEKDTYGNKAKKQKTTLLF QKLCQDHVNFPERPRPGLLGSIGEGRYFKKEPNLFSHDAVVREMPTGVSSQAESYYPSPGPISSGLSHHASMAPL PSSSWSSVAHPTPRSGNTNPLSSFSTRTLPSNSQGIPPFLRIPVGNDLNASNACIYNNADDIVGMEASSMPSADL YGISDPNMLSNCSVNMMTTSSDSMGETDNPRLLSMNLENPSCNSVLDPRDLRQLHQMSSSSMSAGANSNTTVFVS QSDAFEGSDFSCADNSMINESGPSNSTNPNSHGFVQDSQYSGIGSMQNEQLSDSFPYEFFQV* (SEQ ID NO: 21) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 21. In some embodiments, an engineered polynucleotide comprises an open reading frame encoding SPI1 (e.g., UniprotKB Accession No. P17947). In some embodiments, the protein comprises the sequence of: MLQACKMEGFPLVPPQPSEDLVPYDTDLYQRQTHEYYPYLSSDGESHSDHYWDFHPHHVHSEFESFAENNFTELQ SVQPPQLQQLYRHMELEQMHVLDTPMVPPHPSLGHQVSYLPRMCLQYPSLSPAQPSSDEEEGERQSPPLEVSDGE ADGLEPGPGLLPGETGSKKKIRLYQFLLDLLRSGDMKDSIWWVDKDKGTFQFSSKHKEALAHRWGIQKGNRKKMT YQKMARALRNYGKTGEVKKVKKKLTYQFSGEVLGRGGLAERRHPPH* (SEQ ID NO: 22) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 22.
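The percent-identity thresholds recited above (at least 80%, 85%, 90%, or 95%) can be illustrated with a short sketch. The disclosure does not specify an alignment algorithm or scoring scheme here, so everything below is an assumption made for illustration: a plain Needleman-Wunsch global alignment with unit match, mismatch, and gap scores, and identity defined as identical aligned positions divided by alignment length. The function names `global_align`, `percent_identity`, and `thresholds_met` are hypothetical, and the variant peptide is invented; only the ten-residue fragment is taken from the start of SEQ ID NO: 22.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings.

    The scoring values are illustrative assumptions, not values from the source.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned residues divided by alignment length, as a percentage."""
    al_a, al_b = global_align(a, b)
    if not al_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


def thresholds_met(identity, thresholds=(80, 85, 90, 95)):
    """Return the recited identity thresholds that `identity` satisfies."""
    return [t for t in thresholds if identity >= t]


reference = "MLQACKMEGF"   # first ten residues of SEQ ID NO: 22
variant = "MLQACKMDGF"     # hypothetical variant with one substitution
identity = percent_identity(reference, variant)
print(identity, thresholds_met(identity))
```

In practice, identity is usually computed over full-length sequences with an established alignment tool and a substitution matrix; the sketch only shows how a computed identity value maps onto the recited thresholds.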
In some embodiments, an engineered polynucleotide comprises an open reading frame encoding STAT2 (e.g., UniprotKB Accession No. P52630). In some embodiments, the protein comprises the sequence of: MAQWEMLQNLDSPFQDQLHQLYSHSLLPVDIRQYLAVWIEDQNWQEAALGSDDSKATMLFFHFLDQLNYECGRCS QDPESLLLQHNLRKFCRDIQPFSQDPTQLAEMIFNLLLEEKRILIQAQRAQLEQGEPVLETPVESQQHEIESRIL DLRAMMEKLVKSISQLKDQQDVFCFRYKIQAKGKTPSLDPHQTKEQKILQETLNELDKRRKEVLDASKALLGRLT TLIELLLPKLEEWKAQQQKACIRAPIDHGLEQLETWFTAGAKLLFHLRQLLKELKGLSCLVSYQDDPLTKGVDLR NAQVTELLQRLLHRAFVVETQPCMPQTPHRPLILKTGSKFTVRTRLLVRLQEGNESLTVEVSIDRNPPQLQGFRK FNILTSNQKTLTPEKGQSQGLIWDFGYLTLVEQRSGGSGKGSNKGPLGVTEELHIISFTVKYTYQGLKQELKTDT LPVVIISNMNQLSIAWASVLWFNLLSPNLQNQQFFSNPPKAPWSLLGPALSWQFSSYVGRGLNSDQLSMLRNKLF GQNCRTEDPLLSWADFTKRESPPGKLPFWTWLDKILELVHDHLKDLWNDGRIMGFVSRSQERRLLKKTMSGTFLL RFSESSEGGITCSWVEHQDDDKVLIYSVQPYTKEVLQSLPLTEIIRHYQLLTEENIPENPLRFLYPRIPRDEAFG CYYQEKVNLQERRKYLKHRLIVVSNRQVDELQQPLELKPEPELESLELELGLVPEPELSLDLEPLLKAGLDLGPE LESVLESTLEPVIEPTLCMVSQTVPEPDQGPVSQPVPEPDLPCDLRHLNTEPMEIFRNCVKIEEIMPNGDPLLAG QNTVDEVYVSRPSHFYTDGPLMPSDF* (SEQ ID NO: 23) In some embodiments, the protein comprises a sequence that has at least 80%, at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 23. The number of copies of an engineered polynucleotide delivered to a PSC may vary. In some embodiments, a PSC comprises 1-20 copies of an engineered polynucleotide. For example, a PSC may comprise 1-15, 1-10, 2-15, 2-10, 5-20, 5-15, or 5-10 copies of an engineered polynucleotide. In some embodiments, a PSC comprises 8-10 copies of an engineered polynucleotide. In some embodiments, a PSC comprises fewer than 25 copies of an engineered polynucleotide. For example, a PSC may comprise fewer than 20, fewer than 15, or fewer than 10 copies of an engineered polynucleotide. Greater than 20 copies are also contemplated herein. 
Methods of Producing Cells from Stem Cells
Some aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD44+ and A2B5+ astrocyte-like cells from stem cells (e.g., iPSCs) in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing astrocyte-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells. In some embodiments, the method comprises expressing ERG in PSCs of the expanded population. In some embodiments, the method comprises expressing EGR1 in PSCs of the expanded population. In some embodiments, the method comprises expressing FLI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and EGR1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and FLI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing EGR1 and FLI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing EGR1 and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing FLI1 and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ERG, EGR1, FLI1, and FOSB in PSCs of the expanded population. In some embodiments, the method comprises expressing ERG, EGR1, FLI1, and FOSB in PSCs of the expanded population. Other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD3+ and CD8+ cytotoxic T-cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing cytotoxic T-cell-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells. In some embodiments, the method comprises expressing ZBTB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA in PSCs of the expanded population. 
In some embodiments, the method comprises expressing NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RUNX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and RELA in PSCs of the expanded
population. In some embodiments, the method comprises expressing RUNX3 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing RUNX3 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and ERF in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ERF and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 in PSCs of the expanded population. Yet other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD184+ and ASGPR1+ hepatocyte-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing hepatocyte-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4A, HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells. 
In some embodiments, the method comprises expressing HNF4A in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4G in PSCs of the expanded population. In some embodiments, the method comprises expressing TEAD4 in PSCs of the expanded population. In some embodiments, the method comprises expressing RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and HNF4G in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and TEAD4 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4G and TEAD4 in PSCs of the expanded population. In some embodiments,
the method comprises expressing HNF4G and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing TEAD4 and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from HNF4A, HNF4G, TEAD4, and RFX3 in PSCs of the expanded population. In some embodiments, the method comprises expressing HNF4A, HNF4G, TEAD4, and RFX3 in PSCs of the expanded population. Still other aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD3+ and CD25+ regulatory T-cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing regulatory T-cell-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells. In some embodiments, the method comprises expressing ETS1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 in PSCs of the expanded population. In some embodiments, the method comprises expressing GABPA in PSCs of the expanded population. In some embodiments, the method comprises expressing KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and ETV3 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and GABPA in PSCs of the expanded population. 
In some embodiments, the method comprises expressing ETS1 and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 and GABPA in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETV3 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing GABPA and KLF9 in PSCs of the expanded population. In some embodiments, the method comprises expressing GABPA and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing KLF9 and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ETS1,
ETV3, GABPA, KLF9, and NFKB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ETS1, ETV3, GABPA, KLF9, and NFKB1 in PSCs of the expanded population. Further aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD19+ and CD27+ B cell-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing B cell-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, EBF1, RELA, NRF1, and REL to produce B cell-like cells. In some embodiments, the method comprises expressing ZBTB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing EBF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing REL in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and EBF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing EBF1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing EBF1 and NRF1 in PSCs of the expanded population. 
In some embodiments, the method comprises expressing EBF1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and NRF1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing NRF1 and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, EBF1, RELA, NRF1, and REL in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1, EBF1, RELA, NRF1, and REL in PSCs of the expanded population.
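The single-factor, pairwise, and full-set embodiments listed above follow a regular pattern: every non-empty subset of the recited transcription factors. A minimal sketch of that enumeration for the B cell-like factor set is given below. The constant `B_CELL_FACTORS` and the helper `factor_combinations` are hypothetical names introduced only for illustration; the factor names themselves come from the text.

```python
from itertools import combinations

# Transcription factors recited for producing B cell-like cells.
B_CELL_FACTORS = ["ZBTB1", "EBF1", "RELA", "NRF1", "REL"]


def factor_combinations(factors, min_size=1):
    """Yield every subset of `factors` that has at least `min_size` members."""
    for size in range(min_size, len(factors) + 1):
        yield from combinations(factors, size)


# Pairwise combinations, in the same order as they are listed in the text.
pairs = list(combinations(B_CELL_FACTORS, 2))
for first, second in pairs:
    print(f"{first} and {second}")

# All non-empty subsets: single factors, pairs, triples, and so on up to the full set.
all_sets = list(factor_combinations(B_CELL_FACTORS))
print(len(pairs), len(all_sets))
```

The same helper applies to the other factor sets (for example, the astrocyte-like or microglia-like sets) by passing a different list.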
Additional aspects of the present disclosure relate to a method of using direct transcription factor overexpression in conjunction with growth factor culturing to induce CD11b+ and CX3CR1+ microglia-like cells from stem cells in fewer than 6 days (e.g., about 3 to about 5 days, e.g., about 3, about 4, or about 5 days). The methods of producing microglia-like cells provided herein, in some aspects, comprise culturing, in culture media, a population of pluripotent stem cells (PSCs) (e.g., iPSCs, such as human iPSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, SPI1, RELA, and STAT2 to produce microglia-like cells. In some embodiments, the method comprises expressing ZBTB1 in PSCs of the expanded population. In some embodiments, the method comprises expressing SPI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and SPI1 in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing ZBTB1 and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing SPI1 and RELA in PSCs of the expanded population. In some embodiments, the method comprises expressing SPI1 and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing RELA and STAT2 in PSCs of the expanded population. In some embodiments, the method comprises expressing any preceding combination and a (at least one) protein selected from ZBTB1, SPI1, RELA, and STAT2 in PSCs of the expanded population. 
In some embodiments, the method comprises expressing ZBTB1, SPI1, RELA, and STAT2 in PSCs of the expanded population. In some embodiments, the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. In some embodiments, the heterologous promoter is an inducible promoter, non-limiting examples of which are provided elsewhere herein. The starting population comprises, in some embodiments, about 1x10² - 1x10¹⁰, about 1x10² - 1x10⁹, about 1x10² - 1x10⁸, or about 1x10² - 1x10⁷ PSCs. In some embodiments, the population comprises about 1x10³ - 1x10⁸ or about 1x10³ - 1x10⁷ PSCs. In some embodiments, the population comprises about 1x10⁴ - 1x10⁷ or about 1x10⁵ - 1x10⁶ PSCs. In some embodiments, the population comprises about 1x10¹ PSCs, about 1x10² PSCs,
about 1x10³ PSCs, about 1x10⁴ PSCs, about 1x10⁵ PSCs, about 1x10⁶ PSCs, about 1x10⁷ PSCs, about 1x10⁸ PSCs, about 1x10⁹ PSCs, or about 1x10¹⁰ PSCs. In some embodiments, the population of PSCs is cultured for about 1 day to about 10 days. In some embodiments, the population of PSCs is cultured for no more than 10 days. For example, the population of PSCs may be cultured for no more than 9 days, no more than 8 days, no more than 7 days, no more than 6 days, or no more than 5 days. In some embodiments, the population of PSCs is cultured for no more than 6 days. In some embodiments, the population of PSCs is cultured for about 2 to about 6 days, about 2 to about 5 days, about 2 to about 4 days, about 3 to about 6 days, about 3 to about 5 days, or about 3 to about 4 days. In some embodiments, the population of PSCs is cultured for about 2 days, about 3 days, about 4 days, about 5 days, or about 6 days. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 10 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein (e.g., selected from (a) ERG, EGR1, FLI1 and FOSB to produce astrocyte-like cells; (b) ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells; (c) HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells; (d) ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells; (e) EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells; or (f) SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells). In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 9 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. 
In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 8 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 7 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 6 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 5 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 4 days of expressing (e.g.,
inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 3 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. In some embodiments, a differentiated cell type is produced from a PSC (or a population of PSCs) within 2 days of expressing (e.g., inducing expression of) an engineered (exogenous) nucleic acid encoding a transcription factor provided herein. Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ERG, EGR1, FLI1, and FOSB; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce astrocyte-like cells (e.g.,CD44+/A2B5+ astrocyte-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce cytotoxic T-cell-like cells (e.g., CD3+/CD8+ cytotoxic T-cell-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. 
Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from HNF4A, HNF4G, TEAD4, and RFX3; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce hepatocyte-like cells (e.g., CD184+/ASGPR1+ hepatocyte-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from ETS1, ETV3, GABPA, KLF9,
and NFKB1; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce regulatory T-cell-like cells (e.g., CD3+/CD25+ regulatory T-cell-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from EBF1, ZBTB1, RELA, NRF1, and REL; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce B cell-like cells (e.g., CD19+/CD27+ B cell-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. Some methods of the present disclosure comprise (a) delivering to PSCs an engineered polynucleotide comprising an inducible promoter operably linked to an open reading frame encoding a (one or more) protein selected from SPI1, ZBTB1, RELA, and STAT2; (b) culturing the PSCs in feeder-free, serum-free culture media to produce an expanded population of PSCs; and (c) culturing PSCs of the expanded population in a series of induction media comprising an inducing agent to produce microglia-like cells (e.g., CD11b+/CX3CR1+ microglia-like cells). In some embodiments, the series of induction media comprises a first, a second, a third, and a fourth induction media. In some embodiments, the PSCs are cultured in feeder-free, serum-free culture media for about 6 to about 24 hours. For example, the PSC may be cultured in feeder-free, serum- free culture media for about, 6 to about 12 hours. 
In some embodiments, the PSCs are cultured in feeder-free, serum-free culture media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, or about 24 hours. In some embodiments, the expanded population of PSCs comprises at least 5x10³ PSCs. For example, the expanded population (e.g., at the time of induction) may comprise at least 1x10⁴, at least 1x10⁵, at least 1x10⁶, or at least 1x10⁷ PSCs. In some embodiments, the expanded population of PSCs comprises about 5x10³ PSCs to about 1x10⁷ PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 2,000 cells/cm² to about 3,000 cells/cm². In some embodiments, PSCs of the expanded
population are cultured at a density of about 500/cm² - 10000/cm² PSCs. In some embodiments, the PSCs of the expanded population are cultured at a density of about 1000/cm² - 9500/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 1500/cm² - 9000/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 2000/cm² - 8500/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 2500/cm² - 8000/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 3000/cm² - 7500/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 3500/cm² - 7000/cm² PSCs. In some embodiments, the population comprises 4000/cm² - 6500/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 4500/cm² - 6000/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of about 5000/cm² - 5500/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured at a density of at least 500/cm² PSCs, at least 1000/cm² PSCs, at least 1500/cm² PSCs, at least 2000/cm² PSCs, at least 2500/cm² PSCs, at least 3000/cm² PSCs, at least 3500/cm² PSCs, at least 4000/cm² PSCs, at least 4500/cm² PSCs, at least 5000/cm² PSCs, at least 5500/cm² PSCs, at least 6000/cm² PSCs, at least 6500/cm² PSCs, at least 7000/cm² PSCs, at least 7500/cm² PSCs, at least 8000/cm² PSCs, at least 8500/cm² PSCs, at least 9000/cm² PSCs, at least 9500/cm² PSCs, or at least 10000/cm² PSCs. In some embodiments, PSCs of the expanded population are cultured for no longer than 8 days, no longer than 7 days, no longer than 6 days, no longer than 5 days, or no longer than 4 days. 
For example, PSCs of the expanded population may be cultured for about 2 to about 8 days, about 2 to about 7 days, about 2 to about 6 days, about 2 to about 5 days, about 2 to about 4 days, about 3 to about 8 days, about 3 to about 7 days, about 3 to about 6 days, about 3 to about 5 days, or about 3 to about 4 days. In some embodiments, PSCs of the expanded population are cultured for about 2 days, about 3 days, about 4 days, about 5 days, about 6 days, about 7 days, or about 8 days. In some embodiments, PSCs of the expanded population are cultured in a first induction media for about 6 to about 36 hours. For example, the PSC may be cultured in a first induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, 18 to about 36 hours, or about 18 to about 24 hours. In some embodiments, the PSCs are cultured in a first induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about
15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours. In some embodiments, PSCs of the expanded population are cultured in a second induction media for about 6 to about 36 hours. For example, the PSC may be cultured in a second induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, 18 to about 36 hours, or about 18 to about 24 hours. In some embodiments, the PSCs are cultured in a second induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours. In some embodiments, PSCs of the expanded population are cultured in a third induction media for about 6 to about 36 hours. For example, the PSC may be cultured in a third induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, 18 to about 36 hours, or about 18 to about 24 hours. 
In some embodiments, the PSCs are cultured in a third induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours. In some embodiments, PSCs of the expanded population are cultured in a fourth induction media for about 6 to about 36 hours. For example, the PSC may be cultured in a fourth induction media for about 6 to about 24 hours, about 6 to about 18 hours, about 6 to about 12 hours, 12 to about 36 hours, about 12 to about 24 hours, about 12 to about 18 hours, 18 to about 36 hours, or about 18 to about 24 hours. In some embodiments, the PSCs are cultured in a fourth induction media for about 6 hours, about 7 hours, about 8 hours, about 9 hours, about 10 hours, about 11 hours, about 12 hours, about 13 hours, about 14 hours, about 15 hours, about 16 hours, about 17 hours, about 18 hours, about 19 hours, about 20 hours, about 21 hours, about 22 hours, about 23 hours, about 24 hours, about 25 hours, about 26 hours, about 27 hours, about 28 hours, about 29 hours, or about 30 hours.
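The staged timing described above can be checked programmatically. The following is a minimal sketch, assuming the per-stage range of about 6 to about 36 hours and the overall limit of no longer than 8 days recited in the text; the `Stage` class, the `validate_schedule` function, and the 24-hour example durations are hypothetical and are not a protocol taken from the disclosure.

```python
from dataclasses import dataclass

# Per-stage bounds recited in the text: about 6 to about 36 hours per induction media.
STAGE_MIN_HOURS = 6
STAGE_MAX_HOURS = 36


@dataclass(frozen=True)
class Stage:
    name: str
    hours: float


def validate_schedule(stages, max_total_days=8):
    """Return a list of problems; an empty list means the schedule fits the recited ranges."""
    problems = []
    for stage in stages:
        if not STAGE_MIN_HOURS <= stage.hours <= STAGE_MAX_HOURS:
            problems.append(
                f"{stage.name}: {stage.hours} h is outside "
                f"{STAGE_MIN_HOURS}-{STAGE_MAX_HOURS} h"
            )
    total_hours = sum(stage.hours for stage in stages)
    if total_hours > max_total_days * 24:
        problems.append(f"total {total_hours} h exceeds {max_total_days} days")
    return problems


# Hypothetical example: four induction media, 24 hours each (4 days in total).
example = [
    Stage("first induction media", 24),
    Stage("second induction media", 24),
    Stage("third induction media", 24),
    Stage("fourth induction media", 24),
]
print(validate_schedule(example))
```

An empty list indicates that every stage, and the schedule as a whole, falls within the stated ranges; the total-duration limit can be tightened (e.g., `max_total_days=4`) to reflect the shorter embodiments.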
In some embodiments, PSCs are incubated for at least 6 hours. In some embodiments, after incubation, the media is removed from the plate and the plate is washed with DMEM/F12. Some aspects provide a method of producing astrocyte-like cells , comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ERG, EGR1, FLI1 and FOSB; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce astrocyte-like cells (e.g.,CD44+/A2B5+ astrocyte-like cells). In some embodiments, the method of producing astrocyte-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ERG, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding EGR1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding FLI1, and (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding FOSB. 
Some aspects provide a method of producing cytotoxic T-cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce cytotoxic T-cell-like cells (e.g., CD3+/CD8+ cytotoxic T-cell-like cells). In some embodiments, the method of producing cytotoxic T-cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RUNX3, (iii) an engineered nucleic acid
comprising an inducible promoter operably linked to an open reading frame encoding RELA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NRF1, (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ERF, and (vi) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding SP4. Some aspects provide a method of producing hepatocyte-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from HNF4G, TEAD4, and RFX3; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce hepatocyte-like cells (e.g., CD184+/ASGPR1+ hepatocyte-like cells). In some embodiments, the method of producing hepatocyte-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding HNF4G, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding TEAD4, and (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RFX3. 
Some aspects provide a method of producing regulatory T-cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from ETS1, ETV3, GABPA, KLF9, and NFKB1; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce regulatory T-cell-like cells (e.g., CD3+/CD25+ regulatory T-cell-like cells). In some embodiments, the method of producing regulatory T-cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ETS1, (ii) an engineered nucleic acid comprising an inducible promoter operably
linked to an open reading frame encoding ETV3, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding GABPA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding KLF9, and (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NFKB1. Some aspects provide a method of producing B cell-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from EBF1, ZBTB1, RELA, NRF1, and REL; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce B cell-like cells (e.g., CD19+/CD27+ B cell-like cells). In some embodiments, the method of producing B cell-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding EBF1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RELA, (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding NRF1, and (v) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding REL. 
Some aspects provide a method of producing microglia-like cells, comprising: (a) delivering 1-20 copies, for example, 8-10 copies, of an engineered nucleic acid to a population of human stem cells, wherein the engineered nucleic acid comprises an inducible promoter operably linked to an open reading frame encoding a transcription factor selected from SPI1, ZBTB1, RELA, and STAT2; (b) inducing activation of the inducible promoter; and (c) culturing the population of human stem cells for no more than 10 days, preferably no more than 7 days, in a series of induction media (e.g., a first, second, third and/or fourth induction media as described herein) to produce microglia-like cells (e.g., CD11b+/CX3CR1+ microglia-like cells). In some embodiments, the method of producing microglia-like cells comprises delivering to the human stem cells 1-20 copies of each of the following: (i) an engineered
nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding SPI1, (ii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding ZBTB1, (iii) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding RELA, and (iv) an engineered nucleic acid comprising an inducible promoter operably linked to an open reading frame encoding STAT2. In some embodiments, the engineered nucleic acid(s) is/are integrated into the genome of the human stem cells. In some embodiments, the human stem cells are pluripotent stem cells (PSCs), for example, induced pluripotent stem cells (iPSCs). In some embodiments, the inducing step comprises delivering a chemical inducing agent, such as doxycycline or tetracycline, that activates the inducible promoter (e.g., a doxycycline-inducible promoter or a tetracycline-inducible promoter). In some embodiments, the human stem cells are first expanded in a feeder-free, serum-free culture media, prior to delivery of the engineered nucleic acid(s).
Transfection Methods
The engineered polynucleotide of the present disclosure may be delivered to a PSC using any one or more transfection methods, including chemical transfection methods, viral transduction methods, and electroporation. In some embodiments, an engineered polynucleotide is delivered on a vector. A vector is any vehicle, for example, a virus or a plasmid, that is used to transfer a desired polynucleotide into a host cell, such as a PSC. In some embodiments, the vector is a viral vector. In some embodiments, a viral vector is not a naturally occurring viral vector. 
The viral vector may be from adeno-associated virus (AAV), adenovirus, herpes simplex virus, lentivirus, retrovirus, varicella, variola virus, hepatitis B, cytomegalovirus, JC polyomavirus, BK polyomavirus, monkeypox virus, Herpes Zoster, Epstein-Barr virus, human herpes virus 7, Kaposi's sarcoma-associated herpesvirus, or human parvovirus B19. Other viral vectors are encompassed by the present disclosure. In some embodiments, a viral vector is an AAV vector. AAV is a small, non-enveloped virus that packages a single-stranded linear DNA genome that is approximately 5 kb long and has been adapted for use as a gene transfer vehicle (Samulski, RJ et al., Annu Rev Virol. 2014;1(1):427-51). The coding regions of AAV are flanked by inverted terminal repeats (ITRs), which act as the origins for DNA replication and serve as the primary
packaging signal (McLaughlin, SK et al. Virol.1988;62(6): 1963-73; Hauswirth, WW et al. 1977;78(2):488-99). Thus, an AAV vector typically includes ITR sequences. Both positive and negative strands are packaged into virions equally well and capable of infection (Zhong, L et al. Mol Ther.2008;16(2):290-5; Zhou, X et al. Mol Ther.2008;16(3):494- 9; Samulski, RJ et al. Virol.1987;61(10):3096-101). In addition, a small deletion in one of the two ITRs allows packaging of self-complementary vectors, in which the genome self-anneals after viral uncoating. This results in more efficient transduction of cells but reduces the coding capacity by half (McCarty, DM et al. Mol Ther.2008;16(10): 1648-56; McCarty, DM et al. Gene Ther.2001;8(16): 1248-54). In some embodiments, a polynucleotide is delivered to a cell using a transposon/transposase system. For example, the piggyBac™ transposon system may be used. A piggyBac™ transposon is a mobile genetic element that efficiently transposes between vectors and chromosomes via a “cut and paste” mechanism (Woodard et al.2015). During transposition, the piggyBac™ transposase recognizes transposon-specific inverted terminal repeat sequences (ITRs) located on both ends of the transposon vector and efficiently moves the contents from the original sites and integrates them into TTAA chromosomal sites. The piggyBac™ transposon system facilitates efficient integration of a polynucleotide into a cell genome. Thus, in some embodiments, the method further comprises delivering to a PSC a transposon comprising an engineered polynucleotide and also delivering a transposase. In some embodiments, an engineered polynucleotide is delivered to a cell using electroporation. Electroporation is a physical transfection method that uses an electrical pulse to create temporary pores in cell membranes through which the engineered polynucleotide can pass into cells. See, e.g., Chicaybam L et al. Front. Bioeng. Biotechnol., 23 January 2017. 
Following transfection, the engineered polynucleotides may be integrated into the genome of a PSC. In some embodiments, an engineered polynucleotide may further comprise an antibiotic resistance gene to confer resistance to an antibiotic used in an antibiotic drug selection process. In this way, a ‘pure’ population of cells comprising an integrated engineered polynucleotide may be obtained. In some embodiments, a population of cells comprising an integrated engineered polynucleotide are selected using antibiotic drug selection. Antibiotic drug selection is the process of treating a population of cells with an antibiotic so that only cells that are capable of surviving in the presence of said antibiotic will remain in the population. Non-limiting examples of antibiotics that may be used for antibiotic
drug selection include: puromycin, blasticidin, geneticin, hygromycin, mycophenolic acid, zeocin, carbenicillin, kanamycin, ampicillin, and actinomycin.
Culture Media
The methods provided herein, in some embodiments, comprise culturing PSCs in a feeder-free, serum-free culture media. Culture media may comprise, for example, a solubilized basement membrane preparation extracted from the Engelbreth-Holm-Swarm (EHS) mouse sarcoma (e.g., Corning® Matrigel® Matrix) (coated at 75 to 150 µl per cm² of lot-based diluted suspension). In some embodiments, the solubilized basement membrane preparation comprises one or more extracellular matrix (ECM) proteins and one or more growth factors. For example, the ECM proteins may be selected from Laminin, Collagen IV, heparan sulfate proteoglycans, and entactin/nidogen. In some embodiments, culture media further comprises one or more growth factors, for example, selected from recombinant human basic fibroblast growth factor (rh bFGF) (e.g., 80 ng/ml to 120 ng/ml) and recombinant human transforming growth factor β (rh TGFβ) (e.g., 20 to 25 pM). In some embodiments, culture media further comprises rh bFGF and rh TGFβ. In some embodiments, culture media comprises mTeSR™ media (STEMCELL Technologies). In some embodiments, a first induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (e.g., 90X to 110X), L-alanyl-L-glutamine (e.g., 1.8 mM to 2.2 mM), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), Activin A (e.g., 50 ng/ml to 150 ng/ml), a glycogen synthase kinase (GSK) 3 inhibitor (e.g., 2.8 µM to 3.2 µM), a selective FGFR1 and FGFR3 inhibitor (e.g., 90 nM to 110 nM), and a small molecule ROCK inhibitor (e.g., 8 µM to 12 µM). In some embodiments, a first induction media comprises B-27, L-alanyl-L-glutamine, an inducing agent (e.g., doxycycline), Activin A, a glycogen synthase kinase (GSK) 3 inhibitor, and a selective FGFR1 and FGFR3 inhibitor. 
For example, the first induction media may comprise aRB27 Media, doxycycline, Activin A, CHIR99021, and PD 173074. In some embodiments, the second induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (e.g., 90X to 110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (TNKS) (e.g., 0.9 µM to 1.1 µM), and a human bone morphogenic protein 4 (hBMP4) (e.g., 20 ng/ml to 250 ng/ml). In some embodiments, the second induction media comprises B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase (TNKS), and a human bone
morphogenic protein 4 (hBMP4). For example, the second induction media may comprise aRB27 Media, doxycycline, XAV939, and human bone morphogenic protein 4 (hBMP4). In some embodiments, the third induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), stem cell factor (SCF) (e.g., 25 ng/ml to 200 ng/ml), and epidermal growth factor (EGF) (e.g., 25 ng/ml to 100 ng/ml). In some embodiments, the third induction media comprises B-27 Supplement (e.g., 90X to 110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), stem cell factor (SCF) (e.g., 25ng/ml to 200ng/ml), and epidermal growth factor (EGF) (e.g., 25 ng/ml to 100 ng/ml). For example, the third induction media may comprise aRB27 Media, doxycycline, XAV939, SCF, and EGF. In some embodiments, the fourth induction media comprises one or more of (e.g., 2, 3, 4, or more of) B-27 Supplement (90-110X), an inducing agent (e.g., doxycycline (e.g., 50 ng/ml to 2000 ng/ml)), a small molecule inhibitor of tankyrase (e.g., 0.9 µM to 1.1 µM), hBMP4 (e.g., 20 ng/ml to 250 ng/ml), SCF (e.g., 25 ng/ml to 200 ng/ml), and EGF (e.g., 25 ng/ml to 100 ng/ml). In some embodiments, the fourth induction media comprises B-27, an inducing agent (e.g., doxycycline), a small molecule inhibitor of tankyrase, hBMP4, SCF, and EGF. For example, the fourth induction media may comprise aRB27 Media, doxycycline, XAV939, hBMP4, SCF, and EGF. The ‘aRB27 Media’ used herein comprises Advanced RPMI, B-27™ Supplement, minus vitamin A (Thermo Fisher) or plus vitamin A, GlutaMAX™ Supplement (Thermo Fisher), non-essential amino acids (NEAA), Primocin® (a broad-spectrum antibiotic), and Y- 27632 (a small molecule ROCK inhibitor). GlutaMAX™ Supplement comprises L-alanyl-L-glutamine, which is a dipeptide substitute for L-glutamine. 
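The concentration ranges recited above lend themselves to a simple range check. The following is a minimal sketch, assuming only the example ranges given in the text for the fourth induction media (doxycycline, tankyrase inhibitor, hBMP4, SCF, and EGF); the dictionary keys, the `out_of_range` function, and the example recipe values are hypothetical and do not represent a formulation disclosed herein.

```python
# Example ranges recited in the text for the fourth induction media.
# Keys are hypothetical labels; units are encoded in the key names.
FOURTH_INDUCTION_RANGES = {
    "doxycycline_ng_per_ml": (50, 2000),
    "tankyrase_inhibitor_uM": (0.9, 1.1),
    "hBMP4_ng_per_ml": (20, 250),
    "SCF_ng_per_ml": (25, 200),
    "EGF_ng_per_ml": (25, 100),
}


def out_of_range(recipe, ranges=FOURTH_INDUCTION_RANGES):
    """Return {component: value} for components that are missing or outside their range."""
    flagged = {}
    for component, (low, high) in ranges.items():
        value = recipe.get(component)
        if value is None or not low <= value <= high:
            flagged[component] = value
    return flagged


# Hypothetical recipe chosen only to exercise the check.
example_recipe = {
    "doxycycline_ng_per_ml": 1000,
    "tankyrase_inhibitor_uM": 1.0,
    "hBMP4_ng_per_ml": 50,
    "SCF_ng_per_ml": 100,
    "EGF_ng_per_ml": 50,
}
print(out_of_range(example_recipe))
```

An empty dictionary indicates that every listed component falls within the recited example range; analogous dictionaries could be written for the first, second, and third induction media using the ranges given for each.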
Activin A is a dimeric glycoprotein that belongs to the transforming growth factor-β (TGF-β) family. CHIR99021 is an aminopyrimidine derivative that is an extremely potent glycogen synthase kinase (GSK) 3 inhibitor, inhibiting both GSK3β (IC₅₀ = 6.7 nM) and GSK3α (IC₅₀ = 10 nM). GSK3 is a serine/threonine kinase that is a key inhibitor of the WNT pathway; therefore, CHIR99021 functions as a WNT activator. PD 173074 is a selective FGFR1 and FGFR3 inhibitor (IC₅₀ values are ~5 nM, ~21.5 nM, ~100 nM, ~17600 nM, and ~19800 nM for FGFR3, FGFR1, VEGFR2, PDGFR, and c-Src, respectively, and > 50000 nM for EGFR, InsR, MEK, and PKC).
XAV939 is a potent, small molecule inhibitor of tankyrase (TNKS) 1 and 2 (IC₅₀ = 11 and 4 nM, respectively) (Huang et al.). By inhibiting TNKS activity, XAV939 increases the protein levels of the axin-GSK3β complex and promotes the degradation of β-catenin in SW480 cells (Huang et al.), thereby inhibiting WNT pathway downstream actions.

Therapeutic Compositions and Method of Use

The present disclosure provides, in some embodiments, therapeutic compositions comprising the astrocyte-like cells, cytotoxic T-cell-like cells, hepatocyte-like cells, regulatory T-cell-like cells, B cell-like cells, and/or microglia-like cells produced herein. In some embodiments, the compositions further comprise a pharmaceutically-acceptable excipient. The compositions, in some embodiments, are cryopreserved. Such compositions may be administered to a subject, such as a human subject, using any suitable route of administration. Suitable routes of administration include, for example, parenteral routes such as intravenous, intrathecal, parenchymal, or intraventricular injection. In some embodiments, a subject is a human subject. The subject may have a disease, disorder, or symptoms of a disease associated with astrocyte dysfunction, cytotoxic T-cell dysfunction, hepatocyte dysfunction, regulatory T-cell dysfunction, B cell dysfunction, or microglia dysfunction. The compositions may be administered to a subject in a therapeutically effective amount. The term “therapeutically effective amount” refers to the amount of cell material required to confer therapeutic effect on a subject, either alone or in combination with at least one other active agent. Effective amounts vary, as recognized by those skilled in the art, depending on the route of administration, excipient usage, and co-usage with other active agents.
The quantity to be administered depends on the subject to be treated, including, for example, the strength of an individual’s immune system or genetic predispositions. Suitable dosage ranges are readily determinable by one skilled in the art and may be on the order of micrograms of the polypeptide of this disclosure. The dosage of the preparations disclosed herein may depend on the route of administration and varies according to the size of the subject. It is believed that one skilled in the art can, based on the above description, utilize the present invention to its fullest extent. The following specific embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder of the disclosure in
any way whatsoever. All publications cited in the present application are incorporated by reference for the purposes or subject matter referenced in this disclosure.

Methods of Identifying Transcription Factors for Differentiation of Stem Cells into Target Cell Types

Aspects of the present disclosure relate to a method for identifying transcription factors that are able to differentiate a stem cell into a target cell type (e.g., an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell), comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors; (iv) using a second statistical method to determine the statistical significance of the transcription factors in the third pool of transcription factors; and (v) repeating steps (i)-(iv) one or more times to iteratively refine the third pool of transcription factors. In some embodiments, the epigenetics data provides information related to whether genomic chromatin is open or closed. In some embodiments, the epigenetics data is produced by DNase-seq, ATAC-seq, or ChIP-seq. In some embodiments, the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type.
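The pooling logic of steps (i) through (iii) can be sketched in a few lines. This is a deliberately simplified illustration, not the disclosed method: the first statistical method of step (iii) is replaced here by a plain set intersection, the function names and the `min_sites` threshold are assumptions, and the transcription factor names and numbers are made-up placeholders.

```python
from typing import Dict, Set


def first_pool(motif_hits_in_open_sites: Dict[str, int], min_sites: int = 1) -> Set[str]:
    """Step (i): factors whose motif occurs in at least `min_sites` accessible sites."""
    return {tf for tf, n in motif_hits_in_open_sites.items() if n >= min_sites}


def second_pool(target_expr: Dict[str, float],
                nontarget_expr: Dict[str, float],
                candidates: Set[str]) -> Set[str]:
    """Step (ii): candidates with more transcripts in the target cell type
    than in the non-target cell type."""
    return {tf for tf in candidates
            if target_expr.get(tf, 0.0) > nontarget_expr.get(tf, 0.0)}


def third_pool(pool1: Set[str], pool2: Set[str]) -> Set[str]:
    """Step (iii), simplified: keep factors present in both pools."""
    return pool1 & pool2


# Placeholder inputs (not real measurements).
hits = {"TF_A": 12, "TF_B": 0, "TF_C": 5}        # motif counts in open chromatin
target = {"TF_A": 8.0, "TF_C": 1.0}              # expression in target cell type
nontarget = {"TF_A": 2.0, "TF_C": 3.0}           # expression in non-target cell type

p1 = first_pool(hits)
p2 = second_pool(target, nontarget, p1)
print(sorted(third_pool(p1, p2)))
```

In this toy input, TF_B has no motif hits in accessible sites and TF_C is expressed at a lower level in the target cell type, so only TF_A survives both filters.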
In some embodiments, the transcriptomic data is produced by RNA-seq. In some embodiments, the first statistical method is a linear regression algorithm. In some embodiments, the first statistical method is a logistic regression algorithm. In some embodiments, the first statistical method is an L1-regularized logistic regression model (LASSO). In some embodiments, the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type. In some embodiments, the second statistical method is a log-likelihood ratio test. In some embodiments, the method further comprises transfecting transcription factors of the third pool into a stem cell. In some embodiments, the method further comprises inducing differentiation of the stem cell into the target cell type. In some embodiments, the
method further comprises analyzing the target cell type to identify additional transcription factors associated with the target cell type. In some embodiments, the method further comprises using data from the target cell type to further refine the steps of the method. In some embodiments, the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell. In some embodiments, differentiation of stem cells using one or more of the transcription factors in the third pool results in production of the target cell type in no more than 6 days.

Aspects of the present disclosure relate to a method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information. In some embodiments, the at least one statistical model correlates chromatin accessibility data and transcriptomics data to make initial predictions relating the one or more input transcription factors to differentiation efficiency of the cell having the one or more input transcription factors. In some embodiments, the at least one statistical model distinguishes open chromatin data from background data. In some embodiments, the open chromatin data is associated with the target cell type.
In some embodiments, the method further comprises identifying an initial set of transcription factor motifs positively correlated with the open chromatin data by using a statistical coefficient trained to distinguish the open chromatin data from the background data. In some embodiments, the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model. In some embodiments, the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution with modes, and the differentiation efficiency information corresponds to a mode of the multimodal distribution with the highest value.
In some embodiments, the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value. In some embodiments, the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model. In some embodiments, the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors. In some embodiments, generating the transcription factor pool further comprises: generating an initial pool of transcription factors; using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; and selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool. In some embodiments, the at least one statistical model comprises at least one regression model. In some embodiments, the at least one statistical model comprises at least one neural network. In some embodiments, the at least one statistical model has a recurrent neural network architecture. In some embodiments, the at least one statistical model comprises an L1-regularized logistic regression model (LASSO). In some embodiments, the at least one statistical model comprises a log-likelihood ratio test.
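To make the two named statistical tools concrete, the following sketch fits an L1-regularized logistic regression that separates "open chromatin" rows from "background" rows using motif features, and then applies a log-likelihood ratio test to one motif. It is an illustration under stated assumptions rather than the disclosed implementation: the solver (proximal gradient descent with soft-thresholding), the penalty strength, the step size, and the toy data are all chosen for the example, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                    lr: float = 0.5, n_iter: int = 2000) -> Tuple[List[float], float]:
    """Proximal gradient descent on mean log-loss + lam * ||w||_1 (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        grad_b = sum(err) / n
        grad_w = [sum(e * row[j] for e, row in zip(err, X)) / n for j in range(d)]
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def log_likelihood(X, y, w, b) -> float:
    ll = 0.0
    for row, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def likelihood_ratio_test(ll_full: float, ll_reduced: float) -> Tuple[float, float]:
    """Return (statistic, p-value) for one dropped feature (chi-square, 1 df)."""
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Toy data (made up): each row is a genomic region, label 1 = open, 0 = background.
# Column 0 is a motif enriched in open regions; column 1 is balanced noise.
X, y = [], []
for x1, label, count in [(1, 1, 6), (1, 0, 2), (0, 0, 6), (0, 1, 2)]:
    for k in range(count):
        X.append([float(x1), 1.0 if k % 2 == 0 else -1.0])
        y.append(label)

w_l1, b_l1 = fit_l1_logistic(X, y, lam=0.05)       # sparse fit
w_full, b_full = fit_l1_logistic(X, y, lam=0.0)    # unpenalized full model
X_reduced = [[row[1]] for row in X]                # drop the motif under test
w_red, b_red = fit_l1_logistic(X_reduced, y, lam=0.0)
stat, p_value = likelihood_ratio_test(log_likelihood(X, y, w_full, b_full),
                                      log_likelihood(X_reduced, y, w_red, b_red))
print(w_l1, stat, p_value)
```

In this toy setting the L1 penalty leaves the noise column at exactly zero while the enriched motif keeps a positive coefficient, which mirrors the idea of identifying motifs positively correlated with open chromatin. A production analysis would more plausibly rely on an established statistics library than on a hand-written solver.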
Aspects of the present disclosure relate to a system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information.
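One way to picture the use of "a mode of a distribution" as the differentiation efficiency information is shown below. The sketch is an assumption-laden illustration: it locates modes with a simple histogram, takes the mode with the highest efficiency value, and keeps candidates whose predicted efficiency lies within `k` standard deviations of a Gaussian centered at that mode. The bin width, `min_count`, `sigma`, `k`, the function names, and all values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def highest_mode(values: Sequence[float], bin_width: float, min_count: int = 2) -> float:
    """Return the center of the histogram mode with the highest efficiency value.

    A bin counts as a mode if it holds at least `min_count` values, more values
    than its left neighbor, and at least as many as its right neighbor.
    """
    counts = Counter(int(v // bin_width) for v in values)
    modes = [b for b, c in counts.items()
             if c >= min_count
             and c > counts.get(b - 1, 0)
             and c >= counts.get(b + 1, 0)]
    if not modes:
        raise ValueError("no mode found; try a wider bin or a lower min_count")
    return (max(modes) + 0.5) * bin_width


def select_pool(predicted: Dict[str, float], mode: float,
                sigma: float, k: float = 2.0) -> List[str]:
    """Keep factors whose predicted efficiency is within k*sigma of the mode."""
    return sorted(tf for tf, eff in predicted.items() if abs(eff - mode) <= k * sigma)


# Made-up training efficiencies (percent) with a low and a high cluster.
measured = [3, 5, 6, 8, 12, 14, 55, 61, 63, 64, 68, 72]
# Made-up model predictions for placeholder candidates.
predicted = {"TF_A": 70.0, "TF_B": 10.0, "TF_C": 58.0}

mode = highest_mode(measured, bin_width=10)
pool = select_pool(predicted, mode, sigma=5.0)
print(mode, pool)
```

With these placeholder numbers the histogram has modes in the 0-10 and 60-70 bins, the higher one is selected, and the low-efficiency candidate is excluded from the pool.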
In some embodiments, the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell. Additional Embodiments Additional embodiments of the disclosure are provided in the following numbered paragraphs: 1. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ERG, EGR1, FLI1, FOSB, or any combination thereof. 2. The PSC of paragraph 1, comprising the engineered polynucleotide comprising an open reading frame encoding ERG. 3. The PSC of paragraph 1 or 2, comprising the engineered polynucleotide comprising an open reading frame encoding EGR1. 4. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding FLI1. 5. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding FOSB. 6. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ERG, EGR1, FLI1, FOSB, or any combination thereof. 7. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 8. The PSC of paragraph 7, wherein the heterologous promoter is an inducible promoter. 9. A pluripotent stem cell (PSC) comprising: a protein selected from ERG, EGR1, FLI1, and FOSB, wherein the protein is overexpressed. 10. The PSC of paragraph 9, wherein the PSC expresses or overexpresses: ERG, EGR1, FLI1, FOSB, or any combination thereof. 11. The PSC of any one of paragraphs 1-10, wherein the PSC is a human PSC. 12. The PSC of any one of paragraphs 1-11, wherein the PSC is an induced PSC (iPSC). 13. 
The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 14. The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected
from ERG, EGR1, FLI1, and FOSB, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 15. A composition comprising: a population of the PSC of any one of the preceding paragraphs. 16. The composition of paragraph 15, wherein the population comprises at least 2500/cm2 of the PSC. 17. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells. 18. The method of paragraph 17, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERG. 19. The method of paragraph 17 or 18, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EGR1. 20. The method of any one of paragraphs 17-19, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FLI1. 21. The method of any one of paragraphs 17-20, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding FOSB. 22. The method of any one of paragraphs 17-21, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 23. The method of paragraph 22, wherein the heterologous promoter is an inducible promoter. 24. The method of paragraph 23, wherein the inducible promoter is a chemically-inducible promoter. 25. The method of paragraph 24, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 26. The method of any one of the preceding paragraphs, wherein the population comprises 1x10² to 1x10⁷ PSCs. 27. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day.
28. The method of paragraph 27, wherein the population of PSCs is cultured for about 3-6 days. 29. The method of paragraph 28, wherein the population of PSCs is cultured for no more than 6 days. 30. The method of any one of the preceding paragraphs, wherein the astrocyte-like cells are CD44+ and A2B5+. 31. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. 32. The PSC of paragraph 31, comprising the engineered polynucleotide comprising an open reading frame encoding ZBTB1. 33. The PSC of paragraph 31 or 32, comprising the engineered polynucleotide comprising an open reading frame encoding RUNX3. 34. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding RELA. 35. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding NRF1. 36. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding ERF. 37. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding SP4. 38. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. 39. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 40. The PSC of paragraph 39, wherein the heterologous promoter is an inducible promoter. 41. A pluripotent stem cell (PSC) comprising: a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, wherein the protein is overexpressed. 42. The PSC of paragraph 41, wherein the PSC expresses or overexpresses: ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof. 43.
The PSC of any one of paragraphs 31-42, wherein the PSC is a human PSC. 44. The PSC of any one of paragraphs 31-43, wherein the PSC is an induced PSC (iPSC).
45. The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 46. The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 47. A composition comprising: a population of the PSC of any one of the preceding paragraphs. 48. The composition of paragraph 47, wherein the population comprises at least 2500/cm2 of the PSC. 49. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells. 50. The method of paragraph 49, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. 51. The method of paragraph 49 or 50, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RUNX3. 52. The method of any one of paragraphs 49-51, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. 53. The method of any one of paragraphs 49-52, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. 54. The method of any one of paragraphs 49-53, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ERF. 55.
The method of any one of paragraphs 49-54, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SP4.
56. The method of any one of paragraphs 49-55, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 57. The method of paragraph 56, wherein the heterologous promoter is an inducible promoter. 58. The method of paragraph 57, wherein the inducible promoter is a chemically-inducible promoter. 59. The method of paragraph 58, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 60. The method of any one of the preceding paragraphs, wherein the population comprises 1x10² to 1x10⁷ PSCs. 61. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day. 62. The method of paragraph 61, wherein the population of PSCs is cultured for about 3-6 days. 63. The method of paragraph 62, wherein the population of PSCs is cultured for no more than 6 days. 64. The method of any one of the preceding paragraphs, wherein the cytotoxic T-cell-like cells are CD3+ and CD8+. 65. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding HNF4G, TEAD4, RFX3, or any combination thereof. 66. The PSC of paragraph 65, comprising the engineered polynucleotide comprising an open reading frame encoding HNF4G. 67. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding TEAD4. 68. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding RFX3. 69. The PSC of any one of the preceding paragraphs, further comprising an engineered polynucleotide comprising an open reading frame encoding HNF4A. 70. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses HNF4G, TEAD4, RFX3, or any combination thereof. 71. The PSC of paragraph 70, wherein the PSC further expresses or overexpresses HNF4A.
72. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 73. The PSC of paragraph 72, wherein the heterologous promoter is an inducible promoter. 74. A pluripotent stem cell (PSC) comprising: a protein selected from HNF4G, TEAD4, and RFX3, wherein the protein is overexpressed. 75. The PSC of paragraph 74, wherein the PSC expresses or overexpresses: HNF4G, TEAD4, RFX3, or any combination thereof. 76. The PSC of any one of paragraphs 65-75, wherein the PSC is a human PSC. 77. The PSC of any one of paragraphs 65-76, wherein the PSC is an induced PSC (iPSC). 78. The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, HNF4A, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 79. The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 80. A composition comprising: a population comprising the PSC of any one of the preceding paragraphs. 81. The composition of paragraph 80, wherein the population comprises at least 2500/cm2 of the PSC. 82. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells. 83. The method of paragraph 82, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding HNF4G. 84. 
The method of any one of paragraphs 82-83, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding TEAD4.
85. The method of any one of paragraphs 82-84, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RFX3. 86. The method of any one of the preceding paragraphs, wherein the PSCs of the expanded population further comprise an engineered polynucleotide comprising an open reading frame encoding HNF4A. 87. The method of any one of paragraphs 82-86, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 88. The method of paragraph 87, wherein the heterologous promoter is an inducible promoter. 89. The method of paragraph 88, wherein the inducible promoter is a chemically-inducible promoter. 90. The method of paragraph 89, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 91. The method of any one of the preceding paragraphs, wherein the population comprises 1x10² to 1x10⁷ PSCs. 92. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day. 93. The method of paragraph 92, wherein the population of PSCs is cultured for about 3-6 days. 94. The method of paragraph 93, wherein the population of PSCs is cultured for no more than 6 days. 95. The method of any one of the preceding paragraphs, wherein the hepatocyte-like cells are CD184+ and ASGPR1+. 96. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. 97. The PSC of paragraph 96, comprising the engineered polynucleotide comprising an open reading frame encoding ETS1. 98. The PSC of paragraph 96 or 97, comprising the engineered polynucleotide comprising an open reading frame encoding ETV3. 99. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding GABPA.
100. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding KLF9. 101. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding NFKB1. 102. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. 103. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 104. The PSC of paragraph 103, wherein the heterologous promoter is an inducible promoter. 105. A pluripotent stem cell (PSC) comprising: a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, wherein the protein is overexpressed. 106. The PSC of paragraph 105, wherein the PSC expresses or overexpresses: ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof. 107. The PSC of any one of paragraphs 96-106, wherein the PSC is a human PSC. 108. The PSC of any one of paragraphs 96-107, wherein the PSC is an induced PSC (iPSC). 109. The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 110. The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 111. A composition comprising: a population of the PSC of any one of the preceding paragraphs. 112. The composition of paragraph 111, wherein the population comprises at least 2500/cm2 of the PSC. 113. 
A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells.
114. The method of paragraph 113, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ETS1. 115. The method of paragraph 113 or 114, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ETV3. 116. The method of any one of paragraphs 113-115, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding GABPA. 117. The method of any one of paragraphs 113-116, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding KLF9. 118. The method of any one of paragraphs 113-117, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NFKB1. 119. The method of any one of paragraphs 113-118, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 120. The method of paragraph 119, wherein the heterologous promoter is an inducible promoter. 121. The method of paragraph 120, wherein the inducible promoter is a chemically-inducible promoter. 122. The method of paragraph 121, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 123. The method of any one of the preceding paragraphs, wherein the population comprises 1x10² to 1x10⁷ PSCs. 124. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day. 125. The method of paragraph 124, wherein the population of PSCs is cultured for about 3-6 days. 126. The method of paragraph 125, wherein the population of PSCs is cultured for no more than 6 days. 127. The method of any one of the preceding paragraphs, wherein the regulatory T-cell-like cells are CD3+ and CD25+. 128.
A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
129. The PSC of paragraph 128, comprising the engineered polynucleotide comprising an open reading frame encoding EBF1. 130. The PSC of paragraph 128 or 129, comprising the engineered polynucleotide comprising an open reading frame encoding ZBTB1. 131. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding RELA. 132. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding NRF1. 133. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding REL. 134. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof. 135. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 136. The PSC of paragraph 135, wherein the heterologous promoter is an inducible promoter. 137. A pluripotent stem cell (PSC) comprising: a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, wherein the protein is overexpressed. 138. The PSC of paragraph 137, wherein the PSC expresses or overexpresses: EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof. 139. The PSC of any one of paragraphs 128-138, wherein the PSC is a human PSC. 140. The PSC of any one of paragraphs 128-139, wherein the PSC is an induced PSC (iPSC). 141. The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 142. 
The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 143. A composition comprising: a population of the PSC of any one of the preceding paragraphs.
144. The composition of paragraph 143, wherein the population comprises at least 2500/cm^2 of the PSC. 145. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells. 146. The method of paragraph 145, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding EBF1. 147. The method of paragraph 145 or 146, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. 148. The method of any one of paragraphs 145-147, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. 149. The method of any one of paragraphs 145-148, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding NRF1. 150. The method of any one of paragraphs 145-149, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding REL. 151. The method of any one of paragraphs 145-150, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 152. The method of paragraph 151, wherein the heterologous promoter is an inducible promoter. 153. The method of paragraph 152, wherein the inducible promoter is a chemically-inducible promoter. 154. The method of paragraph 153, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 155. The method of any one of the preceding paragraphs, wherein the population comprises 1x10^2-1x10^7 PSCs. 156. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day.
157. The method of paragraph 156, wherein the population of PSCs is cultured for about 3- 6 days, 158. The method of paragraph 157, wherein the population of PSCs is cultured for no more than 6 days. 159. The method of any one of the preceding paragraphs, wherein the B cell-like cells are CD19+ and CD27+. 160. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding SPI1, ZBTB1, RELA, STAT2, or any combination thereof. 161. The PSC of paragraph 160, comprising the engineered polynucleotide comprising an open reading frame encoding SPI1. 162. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding ZBTB1. 163. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding RELA. 164. The PSC of any one of the preceding paragraphs, comprising the engineered polynucleotide comprising an open reading frame encoding STAT2. 165. The PSC of any one of the preceding paragraphs, wherein the PSC expresses or overexpresses SPI1, ZBTB1, RELA, STAT2, or any combination thereof. 166. The PSC of any one of the preceding paragraphs, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 167. The PSC of paragraph 166, wherein the heterologous promoter is an inducible promoter. 168. A pluripotent stem cell (PSC) comprising: a protein selected from SPI1, ZBTB1, RELA, and STAT2, wherein the protein is overexpressed. 169. The PSC of paragraph 168, wherein the PSC expresses or overexpresses: SPI1, ZBTB1, RELA, STAT2, or any combination thereof. 170. The PSC of any one of paragraphs 160-169, wherein the PSC is a human PSC. 171. The PSC of any one of paragraphs 160-170, wherein the PSC is an induced PSC (iPSC). 172. 
The PSC of any one of the preceding paragraphs, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
173. The PSC of any one of the preceding paragraphs, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC. 174. A composition comprising: a population comprising the PSC of any one of the preceding paragraphs. 175. The composition of paragraph 174, wherein the population comprises at least 2500/cm2 of the PSC. 176. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells. 177. The method of paragraph 176, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SPI1. 178. The method of any one of paragraphs 176-177, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding ZBTB1. 179. The method of any one of paragraphs 176-178, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding RELA. 180. The method of any one of the preceding paragraphs, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding STAT2. 181. The method of any one of paragraphs 176-180, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter. 182. The method of paragraph 181, wherein the heterologous promoter is an inducible promoter. 183. The method of paragraph 182, wherein the inducible promoter is a chemically- inducible promoter. 184. The method of paragraph 183, wherein the chemically-inducible promoter is a doxycycline-inducible promoter. 185. 
The method of any one of the preceding paragraphs, wherein the population comprises 1x10^2-1x10^7 PSCs.
186. The method of any one of the preceding paragraphs, wherein the population of PSCs is cultured for at least 1 day. 187. The method of paragraph 186, wherein the population of PSCs is cultured for about 3- 6 days, 188. The method of paragraph 187, wherein the population of PSCs is cultured for no more than 6 days. 189. The method of any one of the preceding paragraphs, wherein the microglia-like cells are CD11b+ and CX3CR1+. 190. A method, comprising: (i) analyzing epigenetics data for a target cell type to identify genomic sites that are available for binding of a transcription factor and generating a first pool of transcription factors; (ii) analyzing transcriptomic data for the target cell type to identify expression levels of the transcription factors associated with the genomic sites that are available for binding identified in step (i) and generating a second pool of transcription factors; (iii) using a first statistical method to filter background data and identify transcription factors that are present in the first pool of transcription factors and the second pool of transcription factors and generating a third pool of transcription factors, wherein the third pool of transcription factors comprises transcription factors that are in both the first pool and the second pool; (iv) using a second statistical method to determine the statistical significance of the transcription factors in the third pool of transcription factors; and (v) repeating steps (i)-(iv) one or more times to iteratively refine the third pool of transcription factors. 191. The method of paragraph 190, wherein the epigenetics data provides information related to whether genomic chromatin is open or closed. 192. The method of paragraph 190 or paragraph 191, wherein the epigenetics data is produced by DNAse-seq, ATAC-seq, or ChIP-seq. 193. 
The method of any one of the preceding paragraphs, wherein the transcriptomic data provides information related to whether there are more transcripts of the transcription factor in the target cell type than in a non-target cell type. 194. The method of any one of the preceding paragraphs, wherein the transcriptomic data is produced by RNA-seq.
195. The method of any one of the preceding paragraphs, wherein the first statistical method is a linear regression algorithm. 196. The method of any one of the preceding paragraphs, wherein the first statistical method is a logistic regression algorithm. 197. The method of any one of the preceding paragraphs, wherein the first statistical method is a L1-regularized logistic regression model (LASSO). 198. The method of any one of the preceding paragraphs, wherein the background data is associated with transcription factors that are not expressed in the target cell type at a higher expression level than in the non-target cell type. 199. The method of any one of the preceding paragraphs, wherein the second statistical method is a log-likelihood ratio test. 200. The method of any one of the preceding paragraphs, further comprising transfecting transcription factors of the third pool into a stem cell. 201. The method of paragraph 200, further comprising inducing differentiation of the stem cell into the target cell type. 202. The method of paragraph 201, further comprising analyzing the target cell type to identify additional transcription factors associated with the target cell type. 203. The method of any one of the preceding paragraphs, further comprising using data from the target cell type to further refine the steps of paragraph 190. 204. The method of any one of the preceding paragraphs, wherein the target cell type is an astrocyte, a cytotoxic T-cell, a hepatocyte, a regulatory T-cell, a B cell, or a microglial cell. 205. The method of any one of the preceding paragraphs, wherein differentiation of stem cells using one or more of the transcription factors in the third pool results in production of the target cell type in no more than 6 days. 206. 
A method for generating a transcription factor screening pool comprising: using at least one computer hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors; obtaining differentiation efficiency information for the one or more input transcription factors; generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors that are
predicted to differentiate the cell into a target cell type in accordance with the differentiation efficiency information. 207. The method of paragraph 206, wherein the at least one statistical model correlates chromatin accessibility data and transcriptomics data to make initial predictions relating the one or more input transcription factors to differentiation efficiency of the cell having the one or more input transcription factors. 208. The method of paragraph 206 or paragraph 207, wherein the at least one statistical model distinguishes open chromatin data from background data. 209. The method of paragraph 208, wherein the open chromatin data is associated with the target cell type. 210. The method of any one of the preceding paragraphs, further comprising identifying an initial set of transcription factor motifs positively correlated with the open chromatin data by using a statistical coefficient trained to distinguish the open chromatin data from the background data. 211. The method of any one of the preceding paragraphs, wherein the differentiation efficiency information corresponds to a mode of a distribution of differentiation efficiency data used to train the at least one statistical model. 212. The method of any one of the preceding paragraphs, wherein the at least one statistical model was trained using measured differentiation efficiency values having a multimodal distribution with modes, and the differentiation efficiency information corresponds to a mode of the multimodal distribution with the highest value. 213. The method of any one of the preceding paragraphs, wherein the transcription factors of the transcription factor pool have predicted differentiation efficiency within a distribution centered at the mode of the multimodal distribution with the highest value. 214. 
The method of any one of the preceding paragraphs, wherein the differentiation efficiency information corresponds to a Gaussian distribution centered at a mode of a distribution for differentiation efficiency data used to train the at least one statistical model. 215. The method of any one of the preceding paragraphs, wherein the differentiation efficiency information corresponds to a high differentiation efficiency component of a distribution of differentiation efficiency values for transcription factors. 216. The method of any one of the preceding paragraphs, wherein generating the transcription factor pool further comprises: generating an initial pool of transcription factors;
using transcription factors in the initial pool as input to the at least one statistical model to obtain values for differentiation efficiency; selecting, based on the values for differentiation efficiency and the differentiation efficiency information, one or more of the transcription factors in the initial pool to include in the transcription factor pool. 217. The method of any one of the preceding paragraphs, wherein the at least one statistical model comprises at least one regression model. 218. The method of any one of the preceding paragraphs, wherein the at least one statistical model comprises at least one neural network. 219. The method of any one of the preceding paragraphs, wherein the at least one statistical model has a recurrent neural network architecture. 220. The method of any one of the preceding paragraphs, wherein the at least one statistical model comprises a L1-regularized logistic regression model (LASSO). 221. The method of any one of the preceding paragraphs, wherein the at least one statistical model comprises a log-likelihood ratio test. 222. 
A system comprising: at least one hardware processor; and at least one non-transitory computer-readable storage medium storing processor- executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform: accessing at least one statistical model relating one or more input transcription factors to differentiation efficiency of a cell having the one or more input transcription factors;obtaining differentiation efficiency information for transcription factors, wherein the differentiation efficiency information corresponds to a mode of a distribution for differentiation efficiency data used to train the at least one statistical model; and generating, using the at least one statistical model and the differentiation efficiency information, a transcription factor pool having transcription factors with predicted differentiation efficiency in accordance with the differentiation efficiency information. 223. The method of any one of the preceding paragraphs, wherein the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell. EXAMPLES
Stem cells are the progenitor cells of all differentiating multi-cellular organisms. In principle, it is possible to differentiate these cells into any other type of cell, which can then be used for many different possible therapeutic or diagnostic applications. The creation of induced pluripotent stem cells (iPSCs) has enabled scientists to explore the derivation of many types of cells. While there are diverse general approaches for cell-fate engineering, one of the fastest and most efficient approaches is transcription factor (TF) over-expression. Over-expression of specific combinations of TFs is often a reliable method to differentiate stem cells, but since there are at least 1732 transcription factors in the human genome, selecting the right combination to differentiate iPSCs directly into other cell types is a difficult task. Here we describe a machine-learning (ML) pipeline, called CellCartographer, for using chromatin accessibility data to design multiplex TF pooled screens for cell type conversions. We then describe a barcoded bulk RNA-seq method for refining sets of TFs using iterative NGS experiments. We validate this method by differentiating iPSCs into twelve diverse cell types at low efficiency in primary screens and then iteratively refining our differentiation strategy to achieve high-efficiency differentiation for six of these cell types, originating from all germ layers, in ≤ 6 days. Finally, we functionally characterized engineered iPSC-derived cytotoxic T-cells (iCytoT), regulatory T-cells (iTreg), type II astrocytes (iAstII), and hepatocytes (iHep) to validate fast, robust, and functionally accurate differentiation of stem cells into cell types potentially useful for downstream therapeutic and diagnostic pipelines. 
Example 1 - Machine learning for determining TF sub-libraries
It is not known exactly how many human cell types exist, but current estimates put the number in the hundreds, all originating from a single ‘totipotent’ embryonic stem cell. Since the creation of induced pluripotent stem cells (iPSCs), scientists have been trying to recreate differentiation of iPSCs into all of these other types of cells and combine them into tissues or tissue-like structures (a.k.a. ‘cell-fate engineering’). This goal seems feasible given that it has been generally accepted that iPSCs are functionally identical to embryonic stem cells (ESCs). To perform cell-fate engineering, a litany of approaches has been employed that fall into three general categories: (1) application of growth factors into media in either 2D or 3D cell culture, (2) modifications to cell matrix and plate surface conditions, and (3) over-expression of transcription factors (TFs). Generally speaking, the first two categories of approaches have been effective in differentiating many different cell types simultaneously; this makes sense
because the general idea is to recapitulate aspects of natural development in vitro, where many cell types would differentiate in unison with each other. The drawbacks of these first two approaches are threefold: first, these protocols typically take a long time (often many weeks); second, the efficiency in converting to a single type of cell is often poor; and third, reproducibility across these experiments remains a large challenge. Because TF-based approaches directly manipulate the epigenetic landscape of individual cells, they have proved to address these three issues to a great extent. While TF-based approaches have been fruitful, the task of identifying the correct TFs for a fast, efficient, and robust cell conversion remains a challenging problem. There are two general ways to go about this research process: (1) to perform an exhaustive literature search for potentially relevant transcription factors for a desired cell type and identify successful combinations via trial-and-error, or (2) to use computational tools to predict TFs. While iPSCs were created through a systematic version of the former, this process does not scale: it is very laborious, requires deep expertise in the cell types being converted, and can only account for previously studied TFs associated with specific cell types. The latter approach has been successful in recent years and can be used as a more general approach in minimizing time required to identify effective conversion factors. While these tools have demonstrated some predictive power, they have key limitations: (1) they cannot account for experimental details such as DNA copy count, clonality (i.e., polyclonal vs. 
monoclonal cell lines), expression method, or cell culture conditions; (2) they generally only provide a single combination of TFs for a cell-type conversion that cannot be iteratively revised; (3) most rely on gene expression data exclusively, as opposed to other relevant data types like epigenetics; and (4) it is not easy to employ the tools for new or rare cell types with very limited data. Moreover, while machine-learning (ML) pipelines have yielded impressive results in various areas of molecular biology to date, currently no tools use ML to generate screens for cell-fate engineering. To address these gaps, we built an epigenetics-based, ML-driven pooled screening tool for engineering cell fate, called CellCartographer. CellCartographer uses next-generation-sequencing-based readouts of chromatin accessibility (e.g., DNase-seq, ATAC-seq, ChIP-seq) and transcription (RNA-seq) to predict TFs correlated with cell-type identity. Using the predictions made by CellCartographer, we can define multiplex pooled screens of TFs for over-expression, which allows us to explore many experimental variables such as variable stable expression quantities, genomic integration copy count and location, and culture conditions with the option to add more nuance depending on experimental
conditions. CellCartographer gives outputs agnostic to starting cell type because it has been demonstrated that the same (or similar) TF set can be used to differentiate cells from a variety of originating cell types and because the iterative engineering process from this starting in silico screen should be able to accommodate these differences. We demonstrate how the CellCartographer predictions are sufficient for differentiating small sub-populations of cell-surface marker-positive cells for twelve target cell-type samples from all three germ layers. Then, we show how we can use bulk RNA sequencing to refine the original TF predictions and zoom in on minimal TF combo sets to differentiate stem cells for six cell types from all three germ layers. Once a sufficiently high percentage of polyclonal cell line differentiation was achieved, we showed that isolating clones from these populations results in the creation of high-performance clonal lines. Finally, we functionally characterized robust clonal lines of differentiation-inducible iPSC lines for each of the three germ layers (regulatory T-cells (iTreg) and cytotoxic T-cells (iCytoT) for mesoderm, hepatocytes (iHep) for endoderm, and type II astrocytes (iAstII) for ectoderm) to validate that the cells are functionally accurate in vitro and molecularly accurate. We were able to differentiate four cell types using novel combinations of TFs in as little as 6 days. Importantly, our derivation of iTregs and iCytoTs may considerably accelerate the investigation of T-cell biology. As many TFs are controlled for activity by nuclear localization as well as expression, the prevalence of TF motifs at accessible chromatin regions is a better indicator of TF activity and importance for cell identity than abundance of RNA expression. And so, the CellCartographer model leverages chromatin accessibility data to make initial predictions of TFs for differentiating towards a given cell type. 
After initial TF predictions are made, TF transcript levels are used to exclude TFs that are not expressed. The CellCartographer pipeline can leverage a variety of assays for chromatin accessibility and transcriptomics to predict a set of TFs for a target cell type, which can then be tested in a pooled screen (FIG. 1B). To broaden the functionality of CellCartographer, input data can be either manually uploaded or automatically queried and downloaded from the ENCODE database or GEO. Valid data types for chromatin accessibility include DNase-seq, ATAC-seq, and ChIP-seq (H3K27Ac, H3K4me1, and H3K4me3). For transcriptomics data, CellCartographer accepts most RNA-seq assays, including ribo-depleted, total RNA, or polyA RNA-seq. Of the 1732 TFs in the TFome, the 891 with characterized binding sites yield 2^891 possible outcomes (FIG.1A). In a full library screen with 10^6 starting cells, observing a correct combination of TFs that differentiates a target cell type would be unlikely (on the order of 1 in 10^167). And so, we reasoned that the
number of starting cells and the number of possible combinations formed from the set of transfected TFs should be similar (i.e., 2^(n_TFs) ≈ m_cells). In our case, we nucleofected 10^6 cells per experiment and the screening pools contained approximately 16 plasmids, each containing a transcription factor driven by a doxycycline-inducible promoter (FIG.1D). Each TF cassette is integrated randomly within each cell from zero to n times, allowing us to explore a large parameter space of DNA integration location and resulting expression amounts of each TF in combination. In order to make TF predictions for each cell type, we begin by training a logistic regression classifier model to distinguish between open chromatin regions and a set of background genomic loci using the known DNA TF binding motifs drawn from the JASPAR database (FIG.1C). By using a non-redundant set of motifs we can mitigate the effect of multicollinearity on our classifier model. By examining the sign of the model coefficients, we can determine whether the presence of a motif is negatively or positively correlated with open chromatin. From the set of TF motifs positively correlated with chromatin accessibility, we determine the most significant ones using the likelihood ratio test, which is an in silico analog of a mutagenesis experiment: the performance of the model is compared to a perturbed model that has been blinded to the presence of a given motif. We select the most significant motifs that are positively correlated with binding and the top 16 corresponding genes that are expressed. Finally, we exclude constitutively active TFs before screening the selected TFs (data not shown). Using publicly available DNase-seq data from ENCODE, we applied our approach on several cell types with simple known combinations of one to two lineage-determining TFs and confirmed that these TFs appear in the top TFs predicted by CellCartographer (FIG.1E). 
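As a non-limiting illustration of the motif-selection step described above, the following minimal Python sketch fits a logistic regression classifier on a toy motif-presence matrix and scores each motif with a likelihood ratio test against a model that has been blinded to that motif. The motif names, the toy data, the plain gradient-ascent fit without regularization, and the 0.01 significance cutoff are all assumptions made for illustration; they are not taken from CellCartographer itself.

```python
import math

def predict_logit(w, x):
    # w[0] is the intercept; the remaining weights pair with motif features.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, iters=1000, lr=1.0):
    """Fit an unregularized logistic regression by batch gradient ascent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(iters):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - 1.0 / (1.0 + math.exp(-predict_logit(w, xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-predict_logit(w, xi)))
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

def likelihood_ratio_tests(X, y, names):
    """Compare the full model with a model blinded to each motif in turn."""
    w_full = fit_logistic(X, y)
    ll_full = log_likelihood(w_full, X, y)
    out = {}
    for j, name in enumerate(names):
        X_red = [row[:j] + row[j + 1:] for row in X]  # drop motif j
        ll_red = log_likelihood(fit_logistic(X_red, y), X_red, y)
        stat = max(0.0, 2.0 * (ll_full - ll_red))
        # Survival function of a chi-square distribution with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2.0))
        out[name] = {"coef": w_full[j + 1], "stat": stat, "p": p_value}
    return out

# Hypothetical toy data: rows are genomic regions, columns are motif presence (0/1).
names = ["motif_A", "motif_B", "motif_C"]
open_regions = [[int(i < 80), i % 2, int(i % 5 == 0)] for i in range(100)]
background = [[int(i < 20), i % 2, int(i % 5 != 0)] for i in range(100)]
X = open_regions + background
y = [1] * 100 + [0] * 100  # 1 = open chromatin, 0 = background locus

results = likelihood_ratio_tests(X, y, names)
# Keep motifs positively correlated with open chromatin and significant (assumed cutoff).
selected = sorted(
    (m for m in names if results[m]["coef"] > 0 and results[m]["p"] < 0.01),
    key=lambda m: results[m]["p"],
)
print(selected)
```

In this toy setting, motif_A is enriched in open regions, motif_C is enriched in background loci, and motif_B is uninformative, so only motif_A passes both the sign filter and the significance filter. An L1-regularized model, as mentioned elsewhere in this disclosure, could be substituted for the unregularized fit used here.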
To computationally validate our model on a larger scale, we applied CellCartographer to 34 primary cell types and 29 tissue types. We found that the TF DNA-binding motifs strongly correlated with chromatin accessibility behaved differently in each cell type and tissue type (FIG.2C). Given that related cell types have similar transcriptional profiles (FIG.2A), we reasoned that they may also have similar TFs correlated with open chromatin that drive transcriptional profiles. To visualize the similarity between transcriptional profiles, we calculated the pairwise Pearson correlation between the gene expression values of each cell type (log RPKM values) and used multidimensional scaling to embed each cell type in a way that respects the pairwise similarity between cell types; using the Spearman correlation model coefficient for each TF, we can also visualize the similarity of TF motifs correlated with open chromatin. We observe that cell types that group together
when considering similarities in transcriptional profiles such as adaptive immune cells (e.g., B cells and T-cells) and progenitor cell types (H1-hESC, GM23338, and neural stem progenitor) tend to look similar from the perspective of TF motifs correlated with open chromatin (FIG.2B).
Example 2 - Primary pooled TF screens for differentiation
To demonstrate that our pooled screening method could be generally applied to any cell type of interest, we identified cell types from each human germ layer and screened TF combinations to identify populations of cells that were positive for canonical markers. Specifically, we generated TF pools for the following: mesoderm (T-cells [subtypes cytotoxic, delta-gamma, and regulatory], B cells, macrophages, epithelial cells [subtypes kidney, bronchial, and mammary], and osteoblasts); endoderm (hepatocytes); ectoderm (type II astrocytes); and yolk sac (microglia). For each cell type, we designed two TF pools using CellCartographer: one pool containing TFs with expression level ≥ 1 RPKM and another containing TFs with expression level ≥ 4 RPKM (data not shown). We then prepared mixed DNA pools of equal concentration of each TF and nucleofected and screened iPSCs (FIG.1D). We found that the percentage of cells appearing positive ranged from 0.05% (regulatory T-cells) to 17.64% (B cells), although in almost all cases the positive population was <1% (FIG.3). Thus, it appeared that all samples yielded at least a small population of differentiated cells that can be sequenced to determine which TFs from the TFome were present. From this set of diverse screened cell types, we decided to iteratively refine a set of six that had high clinical relevance: cytotoxic T-cells, regulatory T-cells, B cells, hepatocytes, type II astrocytes, and microglia. 
A comparison of the top motifs positively correlated with open chromatin for these six cell types is shown in FIG.2D; the screening pools for each of these six cell types are not shown. It should be noted that at this step, the selection of specific surface markers biases the downstream analysis and refinement. For example, although TF pools for astrocytes were determined from data based on generic astrocytes (type I or type II), our selection of A2B5 as a surface marker in combination with CD44 selected specifically for type II astrocytes. In the case of the epithelial sub-types, there was some uncertainty of the ideal cell surface markers to use since CD24 was unexpectedly present in the stem cells and stem cells are partially epithelial in quality and express EpCAM to a slightly lesser degree than differentiated epithelial types. Nonetheless, from the pooled screens we were able to sort at least 1000 double-positive cells from each large population for
bulk RNA-sequencing. We lysed the sorted cells, prepared sequencing libraries, and amplified the barcoded regions of the TFome cassettes to tell us the relative abundance of TFome cassettes in the double-positive cells (FIGs.3A-3E). We found that the distributions for each cell type had some variability, but that in general, each cell type had TFs that were represented in the positive population more than others. In fact, only one of the six cell types (cytotoxic T-cells) had all TFs show up in sequencing at least once.
Example 3 - Iterative pooled TF screening and clonal isolation
Using the barcode frequencies, we calculated three refined TF pools for each cell type: all TFs that appear in sequencing, TFs that appear more often than average, and TFs that appear at least one standard deviation above average (FIGs.4A-D). Using the refined TF pools, we performed a second round of differentiation. Given that this round of screening generally limited TF pools to <5 TFs per pool, we built stable cell lines for additional testing and refinement. iPSCs were nucleofected as before, but we selected and stabilized the cell lines before screening differentiation in different settings. Specifically, given the stability of the constructed cell lines (e.g., less cell death), we opted to test them for only six days, and also decided to test their performance in target-cell-type growth medium in addition to stem cell medium (data not shown). In this round, we found broad improvement in differentiation percentage across all six cell types (FIGs.4A-D). While B cells already had a considerably high differentiation percentage in the primary screening round (17.6%), it improved to an average of greater than 50%. For the other five cell types, the refined lines appeared to improve in differentiation percentage dramatically compared to the populations seen in the primary screen. 
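The three refined pools described at the start of Example 3 can be sketched as simple thresholds on barcode counts. The following is a minimal illustration only: the TF names and read counts are hypothetical placeholders, and the choices to compute the average over every TF in the screening pool (including undetected ones) and to use the population standard deviation are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev

def refine_pools(barcode_counts):
    """Derive three nested TF pools from barcode read counts.

    barcode_counts maps a TF name to the number of barcode reads
    recovered from sorted double-positive cells.
    """
    counts = list(barcode_counts.values())
    avg = mean(counts)   # assumed: average over all TFs in the pool
    sd = pstdev(counts)  # assumed: population standard deviation
    ranked = sorted(barcode_counts, key=barcode_counts.get, reverse=True)
    return {
        # Pool 1: every TF that appears in sequencing at least once.
        "detected": [tf for tf in ranked if barcode_counts[tf] > 0],
        # Pool 2: TFs that appear more often than average.
        "above_mean": [tf for tf in ranked if barcode_counts[tf] > avg],
        # Pool 3: TFs at least one standard deviation above average.
        "above_one_sd": [tf for tf in ranked if barcode_counts[tf] >= avg + sd],
    }

# Hypothetical barcode counts for a six-TF screening pool.
counts = {"TF_a": 90, "TF_b": 50, "TF_c": 28, "TF_d": 12, "TF_e": 0, "TF_f": 0}
pools = refine_pools(counts)
for name, members in pools.items():
    print(name, members)
```

With these toy counts the average is 30 reads, so the pools shrink from four TFs (detected) to two (above average) to one (at least one standard deviation above average), mirroring the progressively stricter pools tested in the second round of differentiation.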
However, since these populations have mixed identity, it is likely that many of these cells were still only partially differentiated. When we examined the number of cells that were positive for just one (or both) markers, all cell types improved differentiation rates compared to the primary screens (data not shown). When we examined differentiation percentage (both partial and total) in target-cell-type growth media, we saw even more near-complete differentiation of these cell lines (data not shown). While it was clear that the growth medium is a contributor to differentiation efficiency, the TFs were the major driver of differentiation for all cell types. Given that our cell lines were clearly making progress towards robust differentiation, but in a limited capacity, we reasoned that many micro-scale experimental details could be to blame: for example, perhaps cell-cell communication from non-differentiating
cells in the population was the issue, or perhaps the number of integrated TF cassettes and their genomic locations were very important. Since we use piggyBac transposase, which integrates variable copies of TF over-expression cassettes in random genomic locations, we hypothesized that some cells in the cell line population were holding back the rest of the population, and that isolating monoclonal cell lines could improve our differentiation efficiency. We therefore sorted random single cells from the population to form monoclonal lines and characterized them. For CD8 T-cells, microglia, astrocytes, and hepatocytes, this solved the problem: several clones of each dramatically outperformed the mixed population in differentiation efficiency in all of the aforementioned differentiation conditions (FIGs.4A-D). After differentiation of high-performance clones, we performed RNA-sequencing to validate that our clones were generally reflective of target cell types at a molecular level in addition to surface markers. We found that across all genes, our differentiated cells clustered well by cell type in both media conditions (FIG.4E). Specifically, it was important to see that the molecular characteristics of the T-cell subtypes were in general agreement and were significantly different from all other types. As expected, since these cell types were all from different germ layers (except the T-cell subtypes), the expression profiles were dramatically different across differentiated cell types. This was further reflected in principal component analysis (FIG.4F): we observed that our differentiated cell types generally clustered very tightly across both media conditions and that they clustered somewhat well with primary cell types. The clustering of cell types across variable media reinforces that TF over-expression is a more dominant factor than the different media conditions.
Next, when we zoom in on key canonical markers for our differentiated cells, they once again cluster as expected and generally show upregulation of expected markers (FIG.4G). In the case of iAstIIs and iTRegs, there were some marked differences in key factors across media conditions, suggesting that media formulation may play a key role in the final condition and function of these cells. Finally, when we analyze the complete sets of significantly up-regulated genes (P < 0.1) for our high-efficiency clonal lines compared to iPSCs with Metascape, we see enrichment of GO terms that is supportive of cell-type specific features (FIG.4H).
Example 4 - Functional characterization of differentiated cells
Finally, after refinement of our differentiating cell lines and molecular validation of their identities, we wanted to validate that the cells also functionally perform their intended
function for down-stream clinical applications. To this end, we opted to focus on at least one cell type from each germ layer: regulatory T-cells (iTRegs), cytotoxic T-cells (iCD8s), type II astrocytes (iAstIIs) and hepatocytes (iHeps). To functionally characterize these cell types, we performed in vitro assays based on biological function (FIGs.5A-5L). For the iAstIIs, we validated that the morphology was correct and that they were stimulated as expected by certain standard small molecules (FIGs.5A-C). We observed that at standard concentrations of small molecules of three classes (glutamate, a neurotransmitter; ATP, a nucleotide; and KCl, an ionic stimulus), many plated astrocytes were stimulated. We observed strong increases of relative Fluo-4 fluorescence immediately after induction for individual astrocytes that were both inactive before stimulation and active at times before stimulation. Furthermore, while glutamate and KCl should stimulate both astrocytes and other neuronal types, only astrocytes are stimulated by ATP, confirming that the cells we assayed had both correct astrocyte morphology and exclusive functionality. For the iHeps, we validated the morphology (FIG.5D) and compared their viability with that of primary hepatocytes and undifferentiated cells when exposed to hepatotoxins for 24 hours. We observed that our iHeps had highly similar viability to primary hepatocytes after being exposed to Nefazodone (FIG.5E), Acetaminophen (FIG.5F), and Troglitazone (FIG.5G), and demonstrated significantly higher viability compared to undifferentiated iPSCs. iTRegs were validated by demonstrating that the cells inhibited the expansion of responder T-cells. Before this step, we confirmed that our iTRegs had size and morphology approximately the same as primary cytotoxic responder cells (FIG.5H).
While the size and shape were generally consistent for both iTRegs and iCytoTs, the primary responder T-cells took on an elongated shape when stimulated, while our iCytoTs did not clearly show this morphological change in response to stimulus. Responder T-cells were stimulated to activate with IL-2 and CD3+CD28+ beads for three days. After this activation step, responder T-cells were labeled with a fluorescent dye and co-cultured with iTRegs in variable quantities. After 11 days, fluorescence was recorded to validate that the addition of more iTRegs resulted in reduced responder T-cell proliferation (FIGs.5I, 5J). We observed some reduction in responder T-cell proliferation as we increased the number of iTRegs, albeit a modest one compared to prior results with primary regulatory T-cells. Finally, to validate the iCytoTs, we activated them with the same bead-based method used in the regulatory T-cell assay and examined their morphology and interaction with the activator beads (FIG.5H) and then recorded proliferation. We found that as with the iTRegs, the proliferation was modest, but
increased with the number of days the iCytoTs were induced from stem cells prior to the initiation of the assay (FIGs.5K, 5L). In summary, we have described how the CellCartographer tool and pipeline can guide and refine cell-fate engineering with machine learning and synthetic TF-cassettes from the human TFome. We demonstrated that the primary TF pools for differentiating iPSCs into a diverse set of cell types yield a small population of positive cells for each of the tested types. We then went on to focus on six cell types spanning the germ layers to show how we can use NGS data from partially engineered cell lines with CellCartographer to engineer high-efficiency differentiation-inducible cell lines. Finally, we isolated high-performance clones for four cell types and functionally characterized at least one cell type from each germ layer to validate that our engineered cell lines were functionally accurate in vitro. While CellCartographer is not the first software to identify TFs for cell-fate engineering, it presents an advance in three main areas from a software perspective. First, it leverages machine learning to make TF predictions using epigenetics data and enables an iterative pipeline for refining engineered cell lines. We hypothesize that as sequencing technologies continue to improve and more data is generated, CellCartographer’s predictions should only improve. Second, CellCartographer has a very minimal requirement for producing useful TF pools: it does not require re-training large models for additional cell types, which can prove useful for engineering cell lines for differentiation into exotic cell types with little data available. Furthermore, we were able to successfully engineer iTRegs using TFs determined from Mus musculus data since that was the only epigenetic NGS data available for this cell type, meaning calculations of factors can work cross-species.
Finally, the pooled screening philosophy of CellCartographer allows biologists to explore and debug many experimental variables that are generally invisible to software tools, namely synthetic DNA genomic integration location, copy count, and cell culture conditions. Pooled screening and paired ML analysis allow us to screen out these issues. Furthermore, while we use the starting predictions from CellCartographer to iteratively refine our cell lines in this study, all of the down-stream tools are compatible with starting predictions from other tools (e.g., another tool could provide the starting prediction and CellCartographer and the TFome can still be used downstream), meaning CellCartographer can be used to complement other existing tools. This work also represents a major advance in terms of identifying four robust TF combinations for differentiation into high-value cell types relevant to therapeutics. At this time, aside from hepatocytes, there are no experimentally established TF combinations for
directly differentiating stem cells into type II astrocytes, regulatory T-cells, or cytotoxic T-cells. Furthermore, we demonstrate that this differentiation can be driven in stem cell media in six days or less, meaning that the TF combinations are fast, robust, and solely to credit for the differentiation in these examples. Finally, by performing additional optimizations with specialized media conditions and performing functional assays on iAstIIs, iHeps, iTRegs, and iCytoTs, we show that this strategy should be robust in ultimately obtaining functional clonal cell lines of theoretically any type that can differentiate rapidly, efficiently, and robustly from iPSCs. While the functional qualities of the iAstIIs and iHeps were more dramatic and complete, the function and viability of the induced T-cells are likely very sensitive to media conditions and could be further improved with additional optimization of growth conditions starting from the stem cell state. In conclusion, we believe that CellCartographer provides a clear benefit to the field of stem cell biology and cell-line engineering. While we have already generated interesting inducibly-differentiating iPSC lines, we strongly believe that this tool can be applied immediately to aid the engineering of other stem cell lines for any number of therapeutic, diagnostic, or other commercial applications.
Example Methods
DNase-seq and ATAC-seq analysis
Adapters from sequencing reads were trimmed with Homer, using the command: homerTools trim -len 40. Following adapter trimming, reads were aligned using Bowtie2 (with default parameters) and then converted into a Homer tag directory. We called open chromatin regions (peaks) with the Homer findPeaks command using the following parameters: -C 0 -L 0 -fdr 0.9. We then used IDR to identify high confidence open chromatin regions.
Prediction of transcription factors for cell fate engineering
For the set of open chromatin regions for each cell type, we sample from the genome an equivalent number of background peaks with matching GC content and size. Using a set of non-redundant DNA motifs, which specify the frequency of each nucleotide at each position in the motif, and a background frequency (0.25 at each position), we can calculate a log odds score that indicates how well a sequence matches a motif. For each open chromatin region and background locus, we calculate the highest log odds score for each motif. We standardize the motif scores such that the mean score value is 0 and the variance is 1. Then we train an L1-regularized logistic regression model (LASSO) to discriminate between open
chromatin regions and background sites. We assess the importance of each motif using a log-likelihood ratio test in which we compare the performance of a perturbed model, where a motif has been masked during the model training procedure, with the full model that has observed all motifs. We convert the difference in likelihoods given by the two models to p-values using the chi-squared test. Model coefficients and p-values reported are the average across 5 cross-validation splits. Data processing, model training, and statistical analysis were performed using Python and the following packages: pandas, scipy, sklearn, biopython.
Transcriptomics analysis
Adapters from sequencing reads were trimmed with Homer, using the command: homerTools trim -len 40. Following adapter trimming, reads were aligned using Bowtie2 (with default parameters) and then converted into a Homer tag directory. We used the Homer analyzeRepeats command to quantify gene expression as RPKM values. Raw read counts at each gene were used as input to DESeq2 for identifying differentially expressed genes.
Cloning of transcription factors
Transcription factors were cloned into puromycin-resistant cassettes with flanking piggyBac transposon [SystemsBio] genomic integration regions under the control of the mammalian DOX-inducible promoter pTRET. Plasmids for each transcription factor are members of the ‘Human TFome’ library deposited on Addgene.
Creation of cell lines and cell culture
All differentiating cell lines and differentiation screens used PGP1 fibroblasts reprogrammed with the Sendai-reprogramming-factor virus. PGP1 iPS cells were expanded and nucleofected with P3 Primary cell 4D Nucleofection kits with pulse code CB150 using 2 µg of total DNA for 800,000 cells (1.6 µg TF pool/0.4 µg SPB) [Lonza]. Cells were plated onto Matrigel-coated plates [Corning] with ROCK-inhibitor [Millipore] and selected with puromycin [Sigma].
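The motif scoring, standardization, and likelihood-ratio steps described above under "Prediction of transcription factors for cell fate engineering" can be illustrated with the following minimal Python sketch. It is not the code used in this work, which relied on pandas, scipy, sklearn, and biopython, and it does not train the logistic regression model. The function names, the pseudocount, forward-strand-only scanning, and the use of one degree of freedom for the chi-squared test (one masked motif) are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

BACKGROUND = 0.25  # uniform background frequency stated in the text


def max_log_odds(sequence, motif, pseudocount=1e-3):
    """Highest log-odds score of `motif` over all windows of `sequence`.

    `motif` is a list with one dict per position mapping A/C/G/T to a
    frequency. The pseudocount (an assumption) avoids log(0). Only the
    forward strand is scanned here. Returns -inf if the sequence is
    shorter than the motif.
    """
    width = len(motif)
    best = -math.inf
    for start in range(len(sequence) - width + 1):
        score = 0.0
        for offset, column in enumerate(motif):
            freq = column.get(sequence[start + offset], 0.0) + pseudocount
            score += math.log2(freq / BACKGROUND)
        best = max(best, score)
    return best


def standardize(scores):
    """Rescale scores to mean 0 and variance 1 (population variance)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def lrt_p_value(loglik_full, loglik_masked):
    """Likelihood-ratio test p-value, assuming one degree of freedom.

    For one degree of freedom the chi-squared survival function is
    erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_masked))
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical two-position motif that prefers the dinucleotide "AC".
example_motif = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
]
score_match = max_log_odds("GGACGG", example_motif)
score_no_match = max_log_odds("GGGGGG", example_motif)
z_scores = standardize([1.0, 2.0, 3.0])
```

In the workflow described in the text, the standardized per-motif scores for open chromatin regions and background sites would form the feature matrix for the L1-regularized logistic regression, and the p-value conversion would be applied to the log-likelihoods of the full and motif-masked models.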
Stable cell lines were expanded over several passages using TrypLE [Gibco] in mTeSR1 [StemCell Technologies] and frozen in mFreSR [StemCell Technologies]. Cells were differentiated with 2 ng/mL doxycycline [Sigma] under variable conditions (data not shown) in either mTeSR [StemCell Technologies], RPMI-1640 (microglia) [Gibco], Williams’ E Medium (hepatocytes) [Gibco], ImmunoCult-XF T-cell Expansion Media (T-cells) [StemCell Technologies], LGM-3 (B cells) [Lonza], or BrainPhys Media (Astrocytes) [StemCell Technologies].
Flow Cytometry and Cell Sorting
Cells were digested in TrypLE [Gibco] and resuspended in growth media before staining with cell surface markers. The following antibodies were used for analysis and cell
sorting: [Microglia: CD11b-FITC, CX3CR1-PE]; [CD8-positive T-cells: CD3-PerCP-Cy5.5, CD8-FITC]; [T-Regulatory cells: CD3-PerCP-Cy5.5, CD4-PE-Cy7, FOXP3-PE, CD127-V450]; [B cells: CD19-PE-Cy7, CD27-FITC]; [Hepatocytes: ASGPR1-PE, CD184-APC]; [Astrocytes: CD44-FITC, A2B5-PE]. Cells were sorted and collected on a Sony SH800 FACS for primary screens. For characterization of stable cell lines, cells were stained and analyzed on a BD LSR Fortessa Analyzer flow cytometer. The gating strategy is not shown (data not shown).
RNA sequencing
Cells were either collected from FACS (primary screens) or collected directly from culture (refined screens and stable cell line characterization) and were lysed in TRIzol [Invitrogen]. RNA was purified with Direct-zol RNA MicroPrep and RNA MiniPrep kits [Zymo]. Library prep was performed using a SMARTer-seq v2 NGS library prep kit [Takara] (primary screens) and NEBNext Ultra II RNA Kits [NEB] (refined screens and stable cell line characterization). Barcodes were amplified from the prepped cDNA using two alternative primer pairs (data not shown). Amplicons were sequenced with a MiSeq kit [Illumina] using Illumina TruSeq indexes.
Astrocyte stimulation assays
iAstIIs were differentiated as described (data not shown) and then transferred to imaging dishes for stimulation as previously described. Briefly, glass bottom dishes [Ibidi 81158] were coated in Poly-d-lysine (0.1 mg/mL) for 2 hours at room temperature, washed twice in PBS [Gibco], and coated overnight in fibronectin (10 µg/mL) [Thermo] at 37°C. Differentiated astrocytes were digested in TrypLE [Gibco] for 7-10 minutes, and 40,000-50,000 cells were transferred to coated dishes and maintained for 2 days before stimulation and imaging. Prior to stimulation and imaging the astrocytes were stained with Fluo-4 (1 µg/mL) [FluoroPure] in BrainPhys without phenol red [StemCell] and incubated in the dark for at least 25 minutes at 37°C.
Cells were then washed with fresh media three times and transferred immediately to a Zeiss Axio 3 Inverted Microscope with CO2 (5%) and temperature control (37°C). After staging, basal activity was measured for at least 2 minutes, after which small molecule stimuli were applied.
Hepatocyte hepatotoxicity assays
iHeps were differentiated as described (data not shown) and then transferred to 96-well plates pre-coated with Matrigel [Corning] and treated with hepatotoxins as previously described. Briefly, after differentiation, 25,000 iHeps, undifferentiated iPSCs, and plateable primary human hepatocytes [ZenBio] were plated in each well and incubated overnight at
37°C. The next day, media was changed to Hepatocyte Medium E (William’s E Medium [Gibco], Maintenance Cocktail B [Gibco], and 0.1 µM Dexamethasone [Gibco]) for one day. The following day, media was exchanged and supplemented with hepatotoxins (Acetaminophen at [3.125,6.25,12.5,25,50,100] mM [Spectrum], Nefazodone at [1,3,10,30,100,300] µM [Sigma], and Troglitazone at [1,3,10,30,100,300] µM [Sigma]). Cells were incubated again at 37°C for 24 hours, and viability was measured with CellTiter-Glo Luminescent Cell Viability Assay [Promega].
Cytotoxic T-cell activation assays
Primary cytotoxic T-cells (Human Peripheral Blood CD4+CD45RA+ T Cells) [StemCell] and iCytoTs were cultured and activated in the same manner. Briefly, cells were incubated in ImmunoCult-XF T Cell Expansion Medium [StemCell] + IL-2 [R&D Systems] with DYNAL Dynabeads Human T-Activator CD3/CD28 for T Cell Expansion and Activation [Fisher] for 3 days. After this incubation, the cells were stained with CellTrace Violet [Fisher] and moved into new wells at the concentration of 1M cells/well with fresh media (as above) and grown at 37°C for 11 days, changing media every 2-3 days. Finally, cells were analyzed via flow cytometry. Percent activated was determined by gating cells that had diminished fluorescence after proliferation.
Regulatory T-cell proliferation suppression assays
iTRegs were co-cultured with activated primary cytotoxic T-cells in variable quantities. Briefly, iTRegs were differentiated in ImmunoCult-XF T Cell Expansion Medium [StemCell] + IL-2 [R&D Systems] for 4 days and then moved into co-culture with activated and CellTrace Violet [Fisher] stained cytotoxic T-cells and grown at 37°C for 11 days, changing media every 2-3 days. Finally, cells were analyzed via flow cytometry. The percentage of suppression was determined as 100 × [1 - (% of proliferating cells with iTRegs) / (% of proliferating cells without iTRegs)] after applying gates for proliferating v.
non-proliferating cells and subtracting auto-fluorescence resulting from unstained iTRegs.
All references, patents and patent applications disclosed herein are incorporated by reference with respect to the subject matter for which each is cited, which in some cases may encompass the entirety of the document.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts
of the method is not necessarily limited to the order in which the steps or acts of the method are recited. In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03. The terms “about” and “substantially” preceding a numerical value mean ±10% of the recited numerical value. Where a range of values is provided, each value between and including the upper and lower ends of the range are specifically contemplated and described herein.
Claims
CLAIMS
What is claimed is:
1. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ERG, EGR1, FLI1, FOSB, or any combination thereof.
2. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses ERG, EGR1, FLI1, FOSB, or any combination thereof.
3. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
4. A pluripotent stem cell (PSC) comprising: a protein selected from ERG, EGR1, FLI1, and FOSB, wherein the protein is overexpressed.
5. The PSC of any one of the preceding claims, wherein the PSC is a human PSC, optionally an induced PSC (iPSC).
6. The PSC of any one of the preceding claims, comprising 1-20 copies, optionally 8-10 copies, of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ERG, EGR1, FLI1, and FOSB, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
7. A composition comprising: a population of the PSC of any one of the preceding claims, optionally, wherein the population comprises at least 2500/cm² of the PSC.
8. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ERG, EGR1, FLI1, and FOSB to produce astrocyte-like cells.
9. The method of claim 8, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally an inducible promoter, preferably a chemically-inducible promoter such as a doxycycline-inducible promoter.
10. The method of claim 8 or 9, wherein the population comprises 1×10²-1×10⁷ PSCs.
11. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
12. The method of any one of the preceding claims, wherein the astrocyte-like cells are CD44+ and A2B5+.
13. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
14. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
15. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
16. A pluripotent stem cell (PSC) comprising: a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, wherein the protein is overexpressed.
17. The PSC of claim 41, wherein the PSC expresses or overexpresses: ZBTB1, RUNX3, RELA, NRF1, ERF, SP4, or any combination thereof.
18. The PSC of any one of claims 13-17, wherein the PSC is a human PSC, optionally wherein the PSC is an induced PSC (iPSC).
19. The PSC of any one of the preceding claims, comprising 1-20 copies, optionally 8-10 copies, of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
20. A composition comprising: a population of the PSC of any one of the preceding claims, wherein the population comprises at least 2500/cm² of the PSC.
21. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ZBTB1, RUNX3, RELA, NRF1, ERF, and SP4 to produce cytotoxic T-cell-like cells.
22. The method of claim 21, wherein the PSCs of the expanded population comprise an engineered polynucleotide comprising an open reading frame encoding SP4.
23. The method of claim 21 or 22, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter, further optionally wherein the inducible promoter is a chemically-inducible promoter, further optionally wherein the chemically-inducible promoter is a doxycycline-inducible promoter.
24. The method of any one of the preceding claims, wherein the population comprises 1×10²-1×10⁷ PSCs.
25. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
26. The method of any one of the preceding claims, wherein the cytotoxic T-cell-like cells are CD3+ and CD8+.
27. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding HNF4G, TEAD4, RFX3, or any combination thereof.
28. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses HNF4G, TEAD4, RFX3, or any combination thereof.
29. The PSC of claim 28, wherein the PSC further expresses or overexpresses HNF4A.
30. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
31. A pluripotent stem cell (PSC) comprising: one or more proteins selected from HNF4G, TEAD4, and RFX3, wherein the one or more proteins or any combination of the one or more proteins is overexpressed.
32. The PSC of any one of claims 27-31, wherein the PSC is a human PSC, optionally wherein the PSC is an induced PSC (iPSC).
33. The PSC of any one of the preceding claims, comprising 1-20 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, HNF4A, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
34. The PSC of any one of the preceding claims, comprising 8-10 copies of the engineered polynucleotide comprising the open reading frame encoding the protein selected from HNF4G, TEAD4, and RFX3, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
35. A composition comprising: a population comprising the PSC of any one of the preceding claims.
36. The composition of claim 35, wherein the population comprises at least 2500/cm² of the PSC.
37. A method, comprising:
culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from HNF4G, TEAD4, and RFX3 to produce hepatocyte-like cells.
38. The method of claim 37, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter, further optionally wherein the inducible promoter is a chemically-inducible promoter, further optionally wherein the chemically-inducible promoter is a doxycycline-inducible promoter.
39. The method of any one of the preceding claims, wherein the population comprises 1×10²-1×10⁷ PSCs.
40. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
41. The method of any one of the preceding claims, wherein the hepatocyte-like cells are CD184+ and ASGPR1+.
42. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
43. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses ETS1, ETV3, GABPA, KLF9, NFKB1, or any combination thereof.
44. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
45. A pluripotent stem cell (PSC) comprising: one or more proteins selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, wherein the one or more proteins or any combination of the one or more proteins is overexpressed.
46. The PSC of any one of claims 42-45, wherein the PSC is a human PSC, optionally wherein the PSC is an induced PSC (iPSC).
47. The PSC of any one of the preceding claims, comprising 1-20 copies, optionally 8-10 copies, of the engineered polynucleotide comprising the open reading frame encoding the protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
48. A composition comprising: a population of the PSC of any one of the preceding claims, wherein the population comprises at least 2500/cm² of the PSC.
49. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from ETS1, ETV3, GABPA, KLF9, and NFKB1 to produce regulatory T-cell-like cells.
50. The method of claim 49, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter, further optionally wherein the inducible promoter is a chemically-inducible promoter, further optionally wherein the chemically-inducible promoter is a doxycycline-inducible promoter.
51. The method of any one of the preceding claims, wherein the population comprises 1×10²-1×10⁷ PSCs.
52. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
53. The method of any one of the preceding claims, wherein the regulatory T-cell-like cells are CD3+ and CD25+.
54. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
55. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses EBF1, ZBTB1, RELA, NRF1, REL, or any combination thereof.
56. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
57. A pluripotent stem cell (PSC) comprising: one or more proteins selected from EBF1, ZBTB1, RELA, NRF1, and REL, wherein the one or more proteins or any combination of the one or more proteins is overexpressed.
58. The PSC of any one of claims 54-57, wherein the PSC is a human PSC, optionally wherein the PSC is an induced PSC (iPSC).
59. The PSC of any one of the preceding claims, comprising 1-20 copies, optionally 8-10 copies, of the engineered polynucleotide comprising the open reading frame encoding the protein selected from EBF1, ZBTB1, RELA, NRF1, and REL, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
60. A composition comprising: a population of the PSC of any one of the preceding claims, wherein the population comprises at least 2500/cm² of the PSC.
61. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from EBF1, ZBTB1, RELA, NRF1, and REL to produce B cell-like cells.
62. The method of claim 61, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter, further optionally wherein the inducible promoter is a chemically-inducible promoter, further optionally wherein the chemically-inducible promoter is a doxycycline-inducible promoter.
63. The method of any one of the preceding claims, wherein the population comprises 1x10²-1x10⁷ PSCs.
64. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
65. The method of any one of the preceding claims, wherein the B cell-like cells are CD19+ and CD27+.
66. A pluripotent stem cell (PSC) comprising: an engineered polynucleotide comprising an open reading frame encoding SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
67. The PSC of any one of the preceding claims, wherein the PSC expresses or overexpresses SPI1, ZBTB1, RELA, STAT2, or any combination thereof.
68. The PSC of any one of the preceding claims, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter.
69. A pluripotent stem cell (PSC) comprising: one or more proteins selected from SPI1, ZBTB1, RELA, and STAT2, wherein the one or more proteins or any combination of the one or more proteins is overexpressed.
70. The PSC of any one of claims 66-69, wherein the PSC is a human PSC, optionally wherein the PSC is an induced PSC (iPSC).
71. The PSC of any one of the preceding claims, comprising 1-20 copies, optionally 8-10 copies, of the engineered polynucleotide comprising the open reading frame encoding the protein selected from SPI1, ZBTB1, RELA, and STAT2, optionally wherein the engineered polynucleotide is integrated into the genome of the PSC.
72. A composition comprising: a population comprising the PSC of any one of the preceding claims, wherein the population comprises at least 2500/cm² of the PSC.
73. A method, comprising: culturing, in culture media, a population of pluripotent stem cells (PSCs) to produce an expanded population of PSCs; and expressing in PSCs of the expanded population a protein selected from SPI1, ZBTB1, RELA, and STAT2 to produce microglia-like cells.
74. The method of claim 73, wherein the open reading frame of the engineered polynucleotide is operably linked to a heterologous promoter, optionally wherein the heterologous promoter is an inducible promoter, further optionally wherein the inducible promoter is a chemically-inducible promoter, further optionally wherein the chemically-inducible promoter is a doxycycline-inducible promoter.
75. The method of any one of the preceding claims, wherein the population comprises 1x10²-1x10⁷ PSCs.
76. The method of any one of the preceding claims, wherein the population of PSCs is cultured for at least 1 day, about 3-6 days, or no more than 6 days.
77. The method of any one of the preceding claims, wherein the microglia-like cells are CD11b+ and CX3CR1+.
78. The method of any one of the preceding claims, wherein the target cell type is a Type II astrocyte, cytotoxic T-cell, regulatory T-cell, hepatocyte, B cell, or microglial cell.
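As a rough illustration of the numeric limits recited in the composition and method claims above (a density of at least 2500 PSCs per cm2 and a population of 1x10^2 to 1x10^7 PSCs), the following Python sketch computes the smallest cell count that meets the density floor for a given culture surface and checks whether a count falls in the recited population range. The function names and the 10 cm2 example surface are assumptions made for illustration only; they do not come from the claims or the specification.

```python
import math

# Values taken from the claims; everything else here is a hypothetical helper.
MIN_DENSITY_PER_CM2 = 2500        # "at least 2500/cm2 of the PSC"
POPULATION_RANGE = (1e2, 1e7)     # "1x10^2 - 1x10^7 PSCs"


def min_cells_for_area(area_cm2: float,
                       density_per_cm2: float = MIN_DENSITY_PER_CM2) -> int:
    """Smallest whole number of cells that reaches the density floor."""
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    return math.ceil(area_cm2 * density_per_cm2)


def within_population_range(n_cells: float) -> bool:
    """True if n_cells lies inside the recited population range (inclusive)."""
    low, high = POPULATION_RANGE
    return low <= n_cells <= high


if __name__ == "__main__":
    # Hypothetical 10 cm2 culture surface (an assumption, not from the source).
    needed = min_cells_for_area(10.0)
    print(needed, within_population_range(needed))
```

For the assumed 10 cm2 surface, the density floor corresponds to 25,000 cells, which also falls inside the recited population range.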
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263415729P | 2022-10-13 | 2022-10-13 | |
US63/415,729 | 2022-10-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024091801A2 (en) | 2024-05-02 |
Family ID=90832153
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/076715 WO2024091801A2 (en) | 2022-10-13 | 2023-10-12 | Methods and compositions for inducing cell differentiation |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024091801A2 (en) |
- 2023-10-12: WO application PCT/US2023/076715 (WO2024091801A2), status unknown
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7022878B2 (en) | Cell reprogramming | |
Nefzger et al. | Cell type of origin dictates the route to pluripotency | |
EP1734112B1 (en) | Method of proliferating pluripotent stem cell | |
EP2673358B1 (en) | Hematopoietic precursor cell production by programming | |
US20190264223A1 (en) | Novel method | |
KR20190018709A (en) | New and Efficient Methods for Reprogramming Blood into Induced Allogenic Stem Cells | |
JP2020536551A (en) | Cell reprogramming using a transient and transient plasmid vector expression system | |
IL293946A (en) | Engineered cells for therapy | |
Wind et al. | Defining the signalling determinants of a posterior ventral spinal cord identity in human neuromesodermal progenitor derivatives | |
CN113767167A (en) | In vivo generation of functional and patient-specific thymus tissue from induced pluripotent stem cells | |
DK2681310T3 (en) | HAPLOID EMBRYONIC MAMMAL STEM CELLS | |
WO2023192934A2 (en) | Methods and compositions for producing granulosa-like cells | |
Qiu et al. | Single-cell RNA sequencing of neural stem cells derived from human trisomic iPSCs reveals the abnormalities during neural differentiation of Down syndrome | |
WO2024091801A2 (en) | Methods and compositions for inducing cell differentiation | |
CA3234404A1 (en) | Treatment with genetically modified cells, and genetically modified cells per se, with increased competitive advantage and/or decreased competitive disadvantage | |
Zhong et al. | Effect of increased HoxB4 on human megakaryocytic development | |
Guyonneau-Harmand et al. | Transgene-free hematopoietic stem and progenitor cells from human induced pluripotent stem cells | |
WO2023242398A1 (en) | Process for obtaining functional lymphocytes cells | |
US11920159B2 (en) | Methods for expanding hematopoietic stem cells using revitalized mesenchymal stem cells | |
Hota et al. | Chromatin remodeler Brahma safeguards canalization in cardiac mesoderm differentiation | |
Valcourt et al. | Changing the Waddington landscape to control mesendoderm competence | |
Fahmy | Neural Crest Stem Cells Are the Elite Cell Type in Cellular Reprogramming | |
Creamer | The role of CDX4 during patterning of definitive hemogenic mesoderm | |
Teske et al. | Targeted CRISPR-Cas9 screening identifies transcription factor network controlling murine haemato-endothelial fate commitment | |
Sun | Investigating the Metabolic Landscape of T Cell Development from Hematopoietic Stem Cells In Vivo and In Vitro |