WO2023102610A1

WO2023102610A1 - Methods and compositions for multiplexing cell analysis

Info

Publication number: WO2023102610A1
Application number: PCT/AU2022/051476
Authority: WO
Inventors: Nathan Palpant; Stacey Anderson; Tessa WERNER; Samuel LUKOWSKI; Sophie SHEN
Original assignee: The University Of Queensland
Priority date: 2021-12-08
Filing date: 2022-12-08
Publication date: 2023-06-15

Abstract

The present invention relates to the use of a population of cells into which an engineered barcode unique to the cells has been stably integrated, and to the use of such cells in combination with one or more other populations of cells, into each of which an engineered barcode unique to that population has been stably integrated, for multiplexed analysis.

Description

METHODS AND COMPOSITIONS FOR MULTIPLEXING CELL ANALYSIS

Field

[0001] The present invention relates to the use of a population of cells into which an engineered barcode unique to the cells has been stably integrated, and to the use of such cells in combination with one or more other populations of cells, into each of which an engineered barcode unique to that population has been stably integrated, for multiplexed analysis.

Background

[0002] Recent development of single-cell RNA sequencing (Tang et al., 2009, Picelli et al., 2013, Hashimshony et al., 2012, Macosko et al., 2015, Klein et al., 2015, Zheng et al., 2017) has enabled transcriptomic profiling at an unprecedented resolution and scale. To overcome the high cost of processing scRNA-seq samples and technical variation between sample processing runs, new strategies in sample multiplexing have emerged to efficiently design experiments to maximise the volume and quality of data derived by single cell studies. These capabilities are ideally suited for scaling stem cell differentiation perturbation assays to reveal the underpinning biology of cell differentiation decisions. They provide versatile approaches for labelling samples using internal or external barcoding coupled with computational strategies for demultiplexing data for downstream analysis.

[0003] One approach uses multiplexing of genetically distinct cells so that their known (Demuxlet, MIX-seq) or inferred (scSplit, Vireo) genotypes enable demultiplexing by reference to intrinsic genetic variation across input samples. For experiments requiring isogenic samples, multiplexing involves associating a sample-specific oligonucleotide barcode with the cells by attaching a barcode to the cell membrane or a membrane protein (Cell hashing, ClickTags, MULTI-Seq), introducing barcodes into the cells (Transient barcoding, barRNA-seq, scifi-seq, sci-plex), or into their genomes (CellTag).

[0004] Barcoding-based multiplexing requires barcode sequencing alongside the transcriptome with expressed barcodes for each cell used to identify its sample of origin. Downstream computational approaches then distinguish true positive barcode expression signals from background noise arising from low quality cell barcodes and ambient transcripts present during cell capture. Current strategies estimate the background count distribution and determine whether the expression of a given barcode in each cell is statistically different from the background.
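The background-versus-signal decision described above can be sketched as follows. This is a simplified illustration and not any published demultiplexing algorithm: it assumes the background for each barcode can be estimated from cells in which a different barcode dominates, and it uses a median plus scaled median-absolute-deviation cutoff as a stand-in for a formal statistical test. The function name, the parameters `k` and `min_spread`, and the toy counts are all hypothetical.

```python
from statistics import median


def classify_cells(counts, k=5.0, min_spread=1.0):
    """Label each cell as singlet, doublet or negative from barcode counts.

    counts: dict of cell id -> dict of sample barcode -> count.
    Assumption: every cell has an entry (possibly 0) for every barcode.
    """
    barcodes = sorted({b for cell in counts.values() for b in cell})
    thresholds = {}
    for b in barcodes:
        # Background for barcode b: its counts in cells where another
        # barcode is the most abundant one.
        background = [cell[b] for cell in counts.values()
                      if max(cell, key=cell.get) != b]
        if not background:
            thresholds[b] = k * min_spread
            continue
        centre = median(background)
        mad = median(abs(x - centre) for x in background)
        # Robust cutoff: median + k * MAD, with a floor on the spread.
        thresholds[b] = centre + k * max(mad, min_spread)

    labels = {}
    for cell_id, cell in counts.items():
        positive = [b for b in barcodes if cell[b] > thresholds[b]]
        if len(positive) == 1:
            labels[cell_id] = ("singlet", positive[0])
        elif positive:
            labels[cell_id] = ("doublet", tuple(positive))
        else:
            labels[cell_id] = ("negative", None)
    return labels


# Toy counts, invented purely for illustration.
toy_counts = {
    "c1": {"BC1": 50, "BC2": 1, "BC3": 0},
    "c2": {"BC1": 0, "BC2": 40, "BC3": 2},
    "c3": {"BC1": 1, "BC2": 0, "BC3": 60},
    "c4": {"BC1": 45, "BC2": 38, "BC3": 1},
    "c5": {"BC1": 1, "BC2": 0, "BC3": 1},
    "c6": {"BC1": 2, "BC2": 1, "BC3": 55},
}
labels = classify_cells(toy_counts)
for cell_id, label in labels.items():
    print(cell_id, label)
```

In this toy input, cell `c4` exceeds the cutoff for two barcodes and is labelled a doublet, while `c5` exceeds none and is labelled negative. A median-based cutoff is used here because a doublet inflates a mean-based background estimate.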

[0005] All sample multiplexing methods provide a way to identify multiplet cell barcodes beyond the transcriptome-based metrics such as library size and marker gene co-expression. However, only the combinatorial barcoding methods (sci-fi-seq, sci-plex) can rescue multiplets because sample labelling is done on a transcript level. As such, these methods are uniquely permissive to overloading single-cell capture machinery to increase the number of cells processed in one scRNA-seq experiment, and are thus also more robust to the diminishing returns with more experimental samples.

[0006] External barcoding strategies are the most common but involve an additional processing stage to administer cell barcodes after harvesting. This stage exposes the cells to stressors that could impact their transcriptomic readout and often requires expensive single-use reagents. To avoid these disadvantages, the cells would need to have their sample barcode prior to the experiment, such as genome-embedded barcodes (Demuxlet, CellTag). These ‘internal’ barcoding strategies also allow for more complex experimental designs as cells of different samples can be co-cultured in a dish or organoid, whilst remaining identifiable after pooling and sequencing. However, approaches such as Demuxlet involve the use of cells with different genetic backgrounds to demultiplex which inherently limits their utility to population statistical genetics studies where genetic heterogeneity is desirable. CellTag is an alternative approach for barcoding cells. However, in such methods an unknown copy number of barcodes are integrated into the genome randomly which can alter or silence gene expression based on integration location and the lack of reliable expression levels of barcodes makes detection more likely to result in false negative barcode detection.

[0007] The field of developmental biology has been quick to adopt, and even drive, multiplexing approaches for scalable analysis of cell lineage decisions in vivo. As such, atlases of developing organs from humans and mice have made a major impact. However, the use of barcoded single cell studies to evaluate stem cell differentiation has not kept pace. There is currently a need for improved sample barcoding and multiplexed experimental design to enable significant data generation describing the molecular basis of cell differentiation and to drive translation of stem cell biology.

Summary of Invention

[0008] To enable efficient multiplexing of single cell data from iPSCs, we generated 18 isogenic iPSC lines with genetically encoded barcodes. This tool overcomes numerous limitations of published or commercially available barcoding methods. In particular, it does not require expensive single-use reagents for barcoding cells, nor the extended labelling protocols that can compromise the quality of the samples submitted through the cell capture pipeline. While externally applied labels are limited by the number of unique features (e.g. antigens) that label all cell types, internal barcoding is limited only by the number of combinations of the 15 base pair barcodes, which reaches 100 trillion unique barcode options. While many barcoding approaches must be applied at the cell capture stage, internal barcoding provides a unique platform for mixing cells with different barcodes to study cell-cell heterogeneity and interactions, including in organoid models. Lastly, engineering barcodes into iPSCs provides a unique cell type in which to multiplex analysis of the diverse cell types that can be derived by directed differentiation from organ systems across the embryonic and extra-embryonic lineages.
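The size of the barcode sequence space can be illustrated with a short calculation. The sketch below assumes the simplest counting model, namely four possible bases (A, C, G, T) at each position and no excluded sequences; the function name is hypothetical. Under that assumption a 15 bp barcode has 4^15 possible sequences, a little over one billion, so the larger figure quoted above may rest on counting assumptions that are not spelled out in the text.

```python
def barcode_space(length, alphabet_size=4):
    """Number of distinct sequences of a given length over the alphabet."""
    return alphabet_size ** length


# Sequence space for a few barcode lengths, including the 15 bp used here.
for n in (5, 10, 15, 20):
    print(f"{n} bp: {barcode_space(n):,} possible barcodes")

n_15bp = barcode_space(15)
```

Even under this conservative count, the sequence space exceeds the 18 barcodes actually used by many orders of magnitude.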

[0009] In one aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

[00010] In one aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using said genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

[00011] In one aspect the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is integrated into the genome of the cells of each population at a targeted location.

[00012] The disclosed embodiments also provide a computer program product including a non-transitory computer readable medium on which is provided program instructions for performing the recited operations and other computational operations of the methods described herein.

[00013] Some embodiments provide a system for multiplexed single cell analysis in a sample using the isogenic barcoded iPSC populations described herein. The system includes a sequencer for receiving nucleic acids from the test sample and providing nucleic acid sequence information from the sample; a processor; and one or more computer-readable storage media having stored thereon instructions for execution on the processor to map a read of sequence information from a pooled sample (e.g. a sequence library) to a single cell, de-multiplex the sequence information using the genetic barcodes, and map said single cell to an originating cell population or progeny thereof in the test sample.
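The three operations recited for the processor (read to single cell, de-multiplexing with the genetic barcodes, single cell to originating population) can be sketched as below. This is a minimal illustration under stated assumptions: each read already carries an identifier for its cell of origin, the genetic barcode appears in the read as an exact, error-free substring, and a cell is assigned to the population whose barcode it shows most often. The three barcode sequences are SEQ ID NOs: 1 to 3 from the numbered statements; the population labels, cell identifiers, flanking bases and function name are hypothetical.

```python
from collections import Counter, defaultdict

# Genetic barcodes (SEQ ID NOs: 1-3) mapped to hypothetical population labels.
GENETIC_BARCODES = {
    "GTGCCGACCAGTATC": "population_1",
    "ACCACCTGACGCAAA": "population_2",
    "ACGGCCCTATTTAAG": "population_3",
}


def demultiplex(reads):
    """Assign each cell to a population from its barcode-containing reads.

    reads: iterable of (cell_id, read_sequence) pairs.
    Returns a dict of cell_id -> population label, or None if no
    genetic barcode was seen in any read from that cell.
    """
    tallies = defaultdict(Counter)
    for cell_id, sequence in reads:
        tallies[cell_id]  # register the cell even if no barcode is found
        for barcode, population in GENETIC_BARCODES.items():
            if barcode in sequence:  # exact match only (assumption)
                tallies[cell_id][population] += 1

    assignments = {}
    for cell_id, tally in tallies.items():
        assignments[cell_id] = tally.most_common(1)[0][0] if tally else None
    return assignments


# Toy reads: the flanking bases are invented for illustration.
toy_reads = [
    ("cell_A", "TTGTGCCGACCAGTATCAA"),
    ("cell_A", "GGGTGCCGACCAGTATCTT"),
    ("cell_B", "CCACCACCTGACGCAAAGG"),
    ("cell_C", "ACGTACGTACGTACGTACG"),
]
assignments = demultiplex(toy_reads)
print(assignments)
```

A practical pipeline would additionally tolerate sequencing errors and apply a background threshold before accepting an assignment; both are omitted here for brevity.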

[00014] Numbered statements of the invention are as follows:

1. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.

2. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

3. The method of statement 1 or 2 wherein the step of generating a multiplexed single cell RNA-seq library comprises: creating one or more gene expression libraries; creating one or more separate barcode libraries via creation of purified cDNA and amplification of regions of said cDNA comprising said genetic barcode; and pooling said gene expression library and said barcode library.

4. The method of statement 3, wherein more than one gene expression library and more than one barcode library are created and pooled.

5. The method of statement 3 or 4, wherein the one or more gene expression libraries comprise approximately 90% of the pool and the one or more barcode libraries comprise approximately 10% of the pool.

6. The method of any one of the preceding statements, wherein the step of generating a multiplexed single cell RNA-seq library comprises creating a gene expression library via creation of purified cDNA and amplification of said cDNA but does not include the generation of a separate barcode library.

7. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations of cells or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.

8. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

9. The method of any one of the preceding statements, wherein providing a plurality of isogenic populations of cells or iPSCs comprises incorporating said genetic barcode into said targeted location via CRISPR/Cas9-mediated integration.

10. The method of any one of the preceding statements, wherein said genetic barcode is incorporated into a genomic safe harbor locus.

11. The method of statement 10 wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).

12. The method of any one of the preceding statements wherein said genetic barcode is fluorescently labelled.

13. The method of any one of the preceding statements wherein said genetic barcode is from 5 - 20 bp.

14. The method of statement 13 wherein said genetic barcode is 15 bp.

15. The method of statement 13, wherein said genetic barcode is selected from the group consisting of:

GTGCCGACCAGTATC (SEQ ID NO: 1);

ACCACCTGACGCAAA (SEQ ID NO: 2);

ACGGCCCTATTTAAG (SEQ ID NO: 3);

AGCCCTGAGTCAGTA (SEQ ID NO: 4);

CAAATTCAAGGCGAT (SEQ ID NO: 5);

AATCTTGTATAAGTA (SEQ ID NO: 6);

CGTCACATTTGAGTC (SEQ ID NO: 7);

GGACCTTCTTACGAC (SEQ ID NO: 8);

TACCAATTGTACGCT (SEQ ID NO: 9);

CGCTAATGTCCGTTT (SEQ ID NO: 10);

ACCCTACGGTGGTTC (SEQ ID NO: 11);

TGTCCAAGCTGCAAT (SEQ ID NO: 12);

GTGTATTTAAAGCCG (SEQ ID NO: 13);

ACACCCGTATGTCAC (SEQ ID NO: 14);

TCTTTCGATGGCGGT (SEQ ID NO: 15);

GAGCACCCGCGTATT (SEQ ID NO: 16);

TTATTATGTTCTAGC (SEQ ID NO: 17); and

AATCTCTGAAACGAA (SEQ ID NO: 18).

16. The method of any one of the preceding statements, wherein prior to generating said multiplexed RNA-seq library, one or more of the populations of cells and/or progeny thereof are mixed together with one or more other populations of cells and/or progeny thereof.

17. The method of any one of the preceding statements, wherein manipulating one or more of the populations of cells or iPSCs or progeny thereof comprises contacting the cells with an agent of interest which results in a biologically measurable perturbation to a cell.

18. The method of any one of the preceding statements, wherein manipulating one or more of the populations of cells or progeny thereof comprises altering the culture conditions of, or genetically perturbing the cells of the one or more populations or progeny thereof.

19. The method of statement 18, wherein altering the culture conditions comprises contacting the cells or progeny thereof with an agent of interest, contacting the cells or progeny thereof with another cell, co-culturing the cells or progeny thereof with another cell, or co-culturing the cells and/or progeny thereof in an organoid.

20. The method of statement 17, 18 or 19, wherein the agent of interest is a small molecule, a polypeptide, an antibody, a nucleic acid molecule, an RNAi, a vector comprising a nucleic acid molecule, an antisense oligonucleotide, or a gene editing system (e.g. CRISPR/Cas9).

21. The method of any one of statements 17 to 20 when used in a high-throughput drug screening assay.

22. The method of any one of the preceding statements wherein said progeny are differentiated progeny.

23. A plurality of isogenic populations of cells, or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.

24. The plurality of isogenic populations of cells, or progeny thereof, of statement 23, wherein said genetic barcode is incorporated into a genomic safe harbor locus.

25. The plurality of isogenic populations of cells, or progeny thereof, of statement 24, wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).

26. The plurality of isogenic populations of cells, or progeny thereof, of any one of statements 23 - 25, wherein said genetic barcode is fluorescently labelled.

27. The plurality of isogenic populations of cells, or progeny thereof, of any one of statements 20 - 22, wherein said genetic barcode is from 10 - 20 bp.

28. The plurality of isogenic populations of cells, or progeny thereof, of statement 24, wherein said genetic barcode is 15 bp.

29. The plurality of isogenic populations of cells, or progeny thereof, of statement 25, wherein said genetic barcode is selected from the group consisting of:

GTGCCGACCAGTATC (SEQ ID NO: 1);

ACCACCTGACGCAAA (SEQ ID NO: 2);

ACGGCCCTATTTAAG (SEQ ID NO: 3);

AGCCCTGAGTCAGTA (SEQ ID NO: 4);

CAAATTCAAGGCGAT (SEQ ID NO: 5);

AATCTTGTATAAGTA (SEQ ID NO: 6);

CGTCACATTTGAGTC (SEQ ID NO: 7);

GGACCTTCTTACGAC (SEQ ID NO: 8);

TACCAATTGTACGCT (SEQ ID NO: 9);

CGCTAATGTCCGTTT (SEQ ID NO: 10);

ACCCTACGGTGGTTC (SEQ ID NO: 11);

TGTCCAAGCTGCAAT (SEQ ID NO: 12);

GTGTATTTAAAGCCG (SEQ ID NO: 13);

ACACCCGTATGTCAC (SEQ ID NO: 14);

TCTTTCGATGGCGGT (SEQ ID NO: 15);

GAGCACCCGCGTATT (SEQ ID NO: 16);

TTATTATGTTCTAGC (SEQ ID NO: 17); and

AATCTCTGAAACGAA (SEQ ID NO: 18).

30. The plurality of isogenic populations of cells of any one of statements 23 - 29, wherein the cells are iPSCs or progeny thereof.

31. The progeny of cells of any one of statements 23 - 30, wherein the progeny are differentiated progeny.

[00015] Any example or embodiment herein shall be taken to apply mutatis mutandis to any other example or embodiment unless specifically stated otherwise.
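The 15 bp barcodes of SEQ ID NOs: 1 to 18 recited in the numbered statements above can be checked for length, uniqueness and mutual distinguishability. The sketch below uses the pairwise Hamming distance (number of mismatched positions) as one common measure of how easily two barcodes could be confused after a sequencing error; the text does not state that this metric was used in designing the barcodes, and the function and variable names are hypothetical.

```python
from itertools import combinations

# SEQ ID NOs: 1-18, in the order listed in the numbered statements.
BARCODES = [
    "GTGCCGACCAGTATC", "ACCACCTGACGCAAA", "ACGGCCCTATTTAAG",
    "AGCCCTGAGTCAGTA", "CAAATTCAAGGCGAT", "AATCTTGTATAAGTA",
    "CGTCACATTTGAGTC", "GGACCTTCTTACGAC", "TACCAATTGTACGCT",
    "CGCTAATGTCCGTTT", "ACCCTACGGTGGTTC", "TGTCCAAGCTGCAAT",
    "GTGTATTTAAAGCCG", "ACACCCGTATGTCAC", "TCTTTCGATGGCGGT",
    "GAGCACCCGCGTATT", "TTATTATGTTCTAGC", "AATCTCTGAAACGAA",
]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Basic sanity checks: 18 unique barcodes, each 15 bp long.
all_15bp = all(len(bc) == 15 for bc in BARCODES)
all_unique = len(set(BARCODES)) == len(BARCODES)

# Closest pair of barcodes and the distance between them.
closest_pair = min(combinations(BARCODES, 2), key=lambda p: hamming(*p))
min_distance = hamming(*closest_pair)

print("all 15 bp:", all_15bp, "| all unique:", all_unique)
print("closest pair:", closest_pair, "distance:", min_distance)
```

A larger minimum distance means more sequencing errors are needed before one barcode is misread as another.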

[00016] The present disclosure is not to be limited in scope by the specific examples described herein, which are intended for the purpose of exemplification only. Functionally-equivalent methods and systems are clearly within the scope of the disclosure, as described herein.

[00017] Throughout this specification, unless specifically stated otherwise or the context requires otherwise, reference to a single step, composition of matter, group of steps or group of compositions of matter shall be taken to encompass one and a plurality (i.e. one or more) of those steps, compositions of matter, groups of steps or group of compositions of matter.

[00018] The disclosure is hereinafter described by way of the following non-limiting Examples and with reference to the accompanying drawings. Although the examples herein concern humans and the language is primarily directed to human concerns, the concepts described herein are applicable to genomes from other animals. These and other objects and features of the present disclosure will become more fully apparent from the following description and appended claims, or may be learned by the practice of the disclosure as set forth hereinafter.

Brief Description of Drawings

[00019] Figure 1 shows CRISPR editing and quality control of engineered barcode in isogenic iPSCs. a. Plasmid used for designing barcodes into the AAVS1-CAG-GFP targeting cassette. b-d. Example QC steps for FACS analysis of GFP (left) and SSEA4 (right) (b), image analysis for morphology (c) and G-band karyotyping (d) used to ensure quality of barcode engineered iPSCs. e-g. Single cell RNA-seq of barcoded iPSCs in the pluripotent state demonstrates that all barcoded cell lines show similar transcriptional profiles based on dimensionality reduction using UMAP visualisation (e), and for each barcoded line, analysis of reads per cell (f) and expression of the pluripotency markers SOX2 and OCT4 (g). h. External hashing antibodies were used on four barcoded cell lines as secondary validation for accuracy of barcode calling from single cell RNA-seq data showing high fidelity and confidence of computational assignment of external and internal barcodes.

[00020] Figure 2 shows quality control analysis of barcoded iPSCs. a-c. Each cell line was analysed by G-band karyotyping (a) and FACS analysis for the pluripotency marker SSEA4 (b) and purity of the GFP expression transcribed from the barcode cassette engineered into the AAVS1 locus (c).

[00021] Figure 3 shows the experimental design for multiplexed single cell analysis of mesendoderm differentiation. a. Timeline of the general differentiation protocol from hiPSCs to committed mesendoderm cell types. hiPSC, human induced pluripotent stem cell; GLS, germ layer specification; PC, progenitor cell; cCT, committed cell types. b. Experimental approaches comprise a high resolution time course capturing cells every 24 hours between day 2 and day 9 of differentiation (left) as well as a perturbation strategy of Wnt and BMP signalling pathways during the progenitor cell stage between day 3 and day 5 (right), capturing cells prior to perturbation (day 2), immediately after (day 5) and at the committed cell stage (day 9). c. Different barcoding methods have been used for multiplexing sc-RNA-sequencing experiments. Samples from the time course were labelled with commercially available hashtag antibodies (TotalSeq™-A). For the signalling perturbations, 18 barcoded iPS cell lines were generated using CRISPR/Cas9 and each experimental condition was carried out using two of these cell lines as biological duplicates. d. sc-RNA-sequencing for both experiments was performed using the Chromium 10X platform and hashtag or expressed cell barcodes were used for demultiplexing. e. Summary of important features for multiplexing.

[00022] Figure 4 shows a high resolution time course of mesendoderm differentiation. a. Uniform manifold approximation and projection (UMAP) plot showing all cells of the time course (13,682 cells). Cells are coloured by their cluster annotation and numbered according to the legend in figure 4C. b. UMAP plot showing cells coloured by time point. c. Fraction of clusters per time point, displayed as absolute numbers (top) and proportions (bottom). d. Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. RNA velocity results coloured by cluster identity and time point.

[00023] Figure 5 shows signalling pathway perturbations during mesendoderm differentiation. a. Uniform manifold approximation and projection (UMAP) plot showing all cells in the dataset (48,526 cells). Cells are coloured by their cluster annotation and numbered according to the legend in figure 3C. b. UMAP plot showing cells coloured by time point, sequencing library and condition. c. Fraction of clusters per time point, displayed as proportions. d. Nebulosa plots of specific marker genes demarcating cluster identities. e. Dot plot of marker genes from both datasets. f. Normalized stacked bar plots displaying contributions of cells from different conditions to each cluster.

[00024] Figure 6 shows mitochondrial genes measured using different barcoding strategies. Samples shown are barcoded by Cell hashing: seu_bc0Xav, seu_bc5Xav, seu_bcDox vs. engineered barcoding (separately amplified barcoding library): seu1, seu2, seu3 vs. engineered barcoding (barcoding reads just from transcriptome library): lib1, lib2, lib3. Data show that mitochondrial reads, which are a measure of cell stress, are significantly higher in cells that use cell hashing (P < 0.01, two sample Welch's t-test).

[00025] Figure 7 shows barcode classification of singlet (individual cells), negative (no barcode detected), and doublet (multiple barcodes detected) measured using different barcoding strategies. Samples shown are barcoded by Cell hashing: seu_bc0Xav, seu_bc5Xav, seu_bcDox vs. engineered barcoding (separately amplified barcoding library): seu1, seu2, seu3 vs. engineered barcoding (barcoding reads just from transcriptome library): lib1, lib2, lib3. Data show no significant difference in singlet detection efficiency using either barcoding or library sequencing method based on two sample Welch’s t-test.
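The two sample Welch's t-test referred to for Figures 6 and 7 compares two group means without assuming equal variances. A minimal sketch of the test statistic and its Welch-Satterthwaite degrees of freedom follows. The input values are placeholders chosen only to make the arithmetic easy to follow and are not taken from the figures; the function name is hypothetical. Converting the statistic to a P value requires the t-distribution, which is available, for example, through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Placeholder measurements, NOT data from the figures.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {dof:.4f}")
```

Because the degrees of freedom are estimated from the two sample variances, they are generally not an integer.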

[00026] Figure 8 shows 0XAV QC data. a-d. Thresholds used for cell filtering based on transcriptome metrics: (a) library size, (b) number of reads mapped to genes, (c) percentage of reads mapped to mitochondrial genes, and (d) percentage of reads mapped to ribosomal genes. e. UMAP plots before (left) and after (right) cell filtering based on transcriptome metrics. Points in the pre-filtering plot are coloured by the number of algorithms that call each cell a doublet. Doublets labelled in the post-filtering plot are those that were called doublets by 3 or more algorithms. f. Distribution of HTO calls for each barcode (left, A2052-A2051) and summarised (right) as determined from the Seurat algorithm, post transcriptome-based filtering.
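The consensus rule mentioned for Figure 8e, in which a cell is labelled a doublet when 3 or more algorithms call it one, amounts to a simple vote count. The sketch below assumes each algorithm's output is available as a set of cell identifiers it flags; the algorithm names, cell identifiers and function name are hypothetical placeholders, since the text does not name the algorithms used.

```python
def consensus_doublets(calls, min_votes=3):
    """Return the cells flagged as doublets by at least min_votes algorithms.

    calls: dict of algorithm name -> set of cell ids it calls doublets.
    """
    votes = {}
    for flagged in calls.values():
        for cell_id in flagged:
            votes[cell_id] = votes.get(cell_id, 0) + 1
    return {cell_id for cell_id, n in votes.items() if n >= min_votes}


# Hypothetical calls from four unnamed doublet-detection algorithms.
toy_calls = {
    "algorithm_1": {"c1", "c2", "c3"},
    "algorithm_2": {"c1", "c3"},
    "algorithm_3": {"c1", "c4"},
    "algorithm_4": {"c3", "c4"},
}
doublets = consensus_doublets(toy_calls)
print(sorted(doublets))
```

Requiring agreement between several algorithms trades some sensitivity for fewer false doublet calls; the vote count per cell is also what the pre-filtering plot uses for colouring.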

Description of Embodiments

[00027] Unless otherwise indicated, the practice of the method and system disclosed herein involves conventional techniques and apparatus commonly used in molecular biology, microbiology, protein purification, protein engineering, protein and DNA sequencing, and recombinant DNA fields, which are within the skill of the art. Unless specifically defined otherwise, all technical and scientific terms used herein shall be taken to have the same meaning as commonly understood by one of ordinary skill in the art (e.g., molecular biology, cell culture, stem cell differentiation, cell therapy, genetic modification, disease modelling, biochemistry, physiology, and clinical studies).

[00028] Unless otherwise indicated, the molecular and statistical techniques utilized in the present disclosure are standard procedures, well known to those skilled in the art. Such techniques are described and explained throughout the literature in sources such as, J. Perbal, A Practical Guide to Molecular Cloning, John Wiley and Sons (1984), J. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbour Laboratory Press (1989), T.A. Brown (editor), Essential Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991), D.M. Glover and B.D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4, IRL Press (1995 and 1996), and F.M. Ausubel et al. (editors), Current Protocols in Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including all updates until present), Ed Harlow and David Lane (editors) Antibodies: A Laboratory Manual, Cold Spring Harbour Laboratory, (1988), J.E. Coligan et al. (editors) Current Protocols in Immunology, John Wiley & Sons (including all updates until present), Robert Lanza (editor) Handbook of Stem Cells, Volume 1, Embryonic Stem Cells (Elsevier).

[00029] As used in this specification and the appended claims, terms in the singular and the singular forms "a," "an" and "the," for example, optionally include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a kidney organoid" optionally includes one or more kidney organoids.

[00030] As used herein, the term “about”, unless stated to the contrary, refers to +/- 10%, more preferably +/- 5%, more preferably +/- 1%, of the designated value.
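The tolerance implied by this definition can be made concrete with a short check. The sketch below treats "about" as a relative tolerance around the designated value, with the +/- 10% band as the default and the narrower +/- 5% and +/- 1% bands selectable; the function name and the example values are hypothetical.

```python
def is_about(value, designated, tolerance=0.10):
    """True if value lies within +/- tolerance (as a fraction) of designated."""
    return abs(value - designated) <= tolerance * abs(designated)


# "About 15 bp" under the three tolerance bands given in the definition.
for tol in (0.10, 0.05, 0.01):
    low, high = 15 * (1 - tol), 15 * (1 + tol)
    print(f"+/- {tol:.0%}: {low:.2f} to {high:.2f}")

within_default = is_about(16, 15)        # inside the +/- 10% band
outside_default = is_about(17, 15)       # outside the +/- 10% band
within_strict = is_about(15.1, 15, 0.01)  # inside the +/- 1% band
```

For a designated value of 15, the default band therefore spans 13.5 to 16.5.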

[00031] The term “and/or”, e.g., “X and/or Y” shall be understood to mean either “X and Y” or “X or Y” and shall be taken to provide explicit support for both meanings or for either meaning.

[00032] Throughout this specification the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element, integer or step, or group of elements, integers or steps, but not the exclusion of any other element, integer or step, or group of elements, integers or steps.

[00033] Numeric ranges are inclusive of the numbers defining the range. It is intended that every maximum numerical limitation given throughout this specification includes every lower numerical limitation, as if such lower numerical limitations were expressly written herein. Every minimum numerical limitation given throughout this specification will include every higher numerical limitation, as if such higher numerical limitations were expressly written herein. Every numerical range given throughout this specification will include every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein.

[00034] The headings provided herein are not intended to limit the disclosure.

[00035] Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.

[00036] Unless otherwise indicated, nucleic acids are written left to right in 5’ to 3’ orientation and amino acid sequences are written left to right in amino to carboxy orientation, respectively.

[00037] A gene is a locus (or region) of DNA which is made up of nucleotides and is the molecular unit of heredity.

[00038] The terms “polynucleotide”, “oligonucleotide”, “nucleic acid” and “nucleic acid molecules” are used interchangeably and refer to a covalently linked sequence of nucleotides (i.e., ribonucleotides for RNA and deoxyribonucleotides for DNA) in which the 3’ position of the pentose of one nucleotide is joined by a phosphodiester group to the 5’ position of the pentose of the next. The nucleotides include sequences of any form of nucleic acid, including, but not limited to RNA and DNA molecules. The terms include, without limitation, single- and double-stranded polynucleotides.

[00039] The term “read” refers to a sequence obtained from a portion of a nucleic acid sample. Typically, though not necessarily, a read represents a short sequence of contiguous base pairs in the sample. The read may be represented symbolically by the base pair sequence (in A, T, C, or G) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning the sample. In some cases, a read is a nucleic acid sequence of sufficient length (e.g., at least about 25 bp) that can be used to identify a larger sequence or region, e.g., that can be aligned and specifically assigned to a chromosome or genomic region or gene.

[00040] The term “genomic read” is used in reference to a read of any segments in the entire genome of an individual.

[00041] “Induced pluripotent stem cells (iPSCs) or (iPS cells)” is a designation that pertains to somatic cells that have been reprogrammed or “de-differentiated”, for example, by introducing exogenous genes that confer on the somatic cell a less differentiated phenotype. These cells can then be induced to differentiate into less differentiated progeny. iPS cells have been derived using modifications of an approach originally discovered in 2006 (Yamanaka, S. et al., Cell Stem Cell, 1:39-49 (2007)). For example, in one instance, to create iPS cells, scientists started with skin cells that were then modified by a standard laboratory technique using retroviruses to insert genes into the cellular DNA. In one instance, the inserted genes were Oct4, Sox2, Klf4, and c-myc, known to act together as natural regulators to keep cells in an embryonic stem cell-like state. These cells have been described in the literature. See, for example, Wernig et al., PNAS, 105:5856-5861 (2008); Jaenisch et al., Cell, 132:567-582 (2008); Hanna et al., Cell, 133:250-264 (2008); and Brambrink et al., Cell Stem Cell, 2:151-159 (2008). It is also possible that such cells can be created by specific culture conditions (exposure to specific agents), and they may also be created from a variety of different starting cell types. These references are all incorporated by reference for teaching iPSCs and methods for producing them.

[00042] iPSCs have many characteristic features of embryonic stem cells. For example, they have the ability to create chimeras with germ line transmission and tetraploid complementation and they can also form teratomas containing various cell types from the three embryonic germ layers. On the other hand, they may not be identical as some reports demonstrate. See, for example, Chin et al., Cell Stem Cell 5:111-123 (2009) showing that induced pluripotent stem cells and embryonic stem cells can be distinguished by gene expression signatures.

[00043] Cells such as iPSCs or their progeny (including differentiated progeny) as disclosed herein may in the context of the present specification be said to “express” or “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.

[00044] Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes. By means of additional guidance, when a cell is said to be positive for or to express or comprise expression of a given marker, such as a given gene or gene product, a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell. Suitably, the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker). Where the measurement method allows for a quantitative assessment of the marker, a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a reference cell (e.g. negative control cell) or than an average signal generated for the marker by a population of reference or negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher. Further, a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of reference or negative control cells.
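By way of illustration only, the two quantitative criteria described above (a fold-change threshold and a standard-deviation threshold relative to negative control cells) can be sketched as follows. The function name, the argument names and the example values are hypothetical and are not part of the disclosure; "at least 1.5-fold higher" is interpreted here as a signal of at least 1.5 times the reference mean, which is an assumption.

```python
from statistics import mean, stdev

def is_marker_positive(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Evaluate one cell's marker signal against negative control cells.

    Returns a pair (fold_ok, sd_ok):
      fold_ok -- signal is at least `min_fold` times the control mean
      sd_ok   -- signal is at least `min_sd` standard deviations above the control mean
    The thresholds default to the lowest values named in the text.
    """
    ref_mean = mean(negative_controls)
    ref_sd = stdev(negative_controls)  # sample standard deviation
    fold_ok = ref_mean > 0 and signal >= min_fold * ref_mean
    sd_ok = ref_sd > 0 and signal >= ref_mean + min_sd * ref_sd
    return fold_ok, sd_ok

# Hypothetical measurements from a population of negative control cells
controls = [10, 12, 11, 9, 8]
print(is_marker_positive(20, controls))
print(is_marker_positive(12, controls))
```

Either criterion, or both, could be required depending on the measurement method; the sketch reports them separately so that the choice is left to the user.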

[00045] A reference herein to a patent document or other matter which is given as prior art is not to be taken as an admission that that document or matter was known or that the information it contains was part of the common general knowledge as at the priority date of any of the claims.

Barcoded Cells

[00046] Human pluripotent stem cells (hPSCs) can self-renew and have the potential to differentiate into theoretically any cell type of the body in response to developmental signalling cues that guide cell differentiation decisions. With capabilities in deriving induced pluripotent stem cells (iPSCs) and advances in generating diverse functional cell types of the body through directed differentiation protocols, greater understanding and facility in deriving pure, well-defined, and functional cell types is needed.

[00047] The development of single-cell RNA sequencing has enabled transcriptomic profiling at an unprecedented resolution and scale. New strategies in sample multiplexing have emerged to efficiently design experiments to maximise the volume and quality of data derived by single cell studies that are ideally suited for scaling stem cell differentiation perturbation assays to reveal the underpinning biology of cell differentiation decisions and elucidate mechanisms of development, model diseases, discover drugs, and regenerate organs.

[00048] It would be desirable to provide tools and methods for the systematic analysis of iPSC biology. It would also be desirable to provide tools and methods for the systematic analysis of other cell types (e.g. stem and progenitor cells, cell lines), and other biological tissues (e.g. in vivo murine models and other model organisms).

[00049] In one aspect, the present invention provides a plurality of isogenic populations of iPSCs or progeny thereof, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.

[00050] In another aspect, the present invention provides a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location.

[00051] Embodiments disclosed herein also relate to progeny of such cells (e.g. progeny of iPSCs or other stem or progenitor cells), including differentiated progeny or a population of cells obtained from one or more of the populations of barcoded cells (e.g. iPSCs). As used herein, the term “differentiated” or “differentiation” as used with respect to cells in a differentiating cell system refers to the process by which cells differentiate from one cell type (e.g., a multipotent, totipotent or pluripotent differentiable cell) to another cell type (such as a target differentiated cell). Accordingly, “cell differentiation” refers to a specialization process or a pathway by which a less specialized cell (e.g. stem cell) develops or matures to possess a more distinct form and function (i.e. more specialized).

[00052] As used herein, the term “dedifferentiation” or “dedifferentiated” as used with respect to cells, refers to a process wherein a more specialized cell having a more distinct form and function, and/or limited self-renewal and/or proliferative capacity becomes less specialized and acquires a greater self-renewal and/or proliferative capacity or differentiation capacity (e.g. multipotent, pluripotent etc.). An induced Pluripotent Stem Cell (iPSC) is an example of a dedifferentiated cell. Accordingly, dedifferentiation can refer to a process of cellular reprogramming.

[00053] In embodiments of the invention the isogenic populations of cells (e.g. iPSCs) are cell lines derived from a single source wherein each cell line comprises a genetic barcode unique to that cell line.

[00054] In embodiments of the invention the barcoded iPSC populations or cell lines are lineage committed or differentiated, whereby the iPSCs are differentiated to multipotent stem or progenitor cells or to cells with a more specialized or differentiated phenotype in vitro under conditions that permit the cells to obtain said phenotype.

[00055] The terms “barcode”, “genetic barcode” and “barcode oligonucleotide” as used herein refer to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid, or as an identifier of the source of an associated molecule, such as a cell-of-origin. A barcode may also refer to any unique, non-naturally occurring, nucleic acid sequence that may be used to identify the originating source of a nucleic acid fragment. Although it is not necessary to understand the mechanism of an invention, it is believed that the barcode sequence provides a high-quality individual read of a barcode associated with a single cell/clone, such that multiple cells or clones can be sequenced and analysed together. Not being bound by a theory, amplified sequences from single cells can be sequenced together and resolved based on the barcode associated with each cell.

[00056] In certain embodiments, the sample barcode oligonucleotides comprise a PCR handle compatible with single cell sequencing methods as described herein (e.g., Drop-seq, InDrop, 10X Genomics). Depending on the application, the PCR-amplification handle in the sample barcode oligonucleotides can be changed depending on which sequence read is used for RNA readout (e.g. Drop-seq uses Read2, 10X v1 uses Read1). In certain embodiments a Read2 sequence is used as a PCR handle to generate barcode-containing amplicons compatible with Chromium scRNA library preparation. The sample barcode oligonucleotides may be RNA or DNA. The sample barcode oligonucleotides may incorporate any modified nucleotides known in the art. In certain embodiments, the sample barcode oligonucleotides include a nucleotide barcode sequence of from about 5 to about 20 nucleotides. In certain embodiments, the sample barcode oligonucleotides include a 15 nucleotide barcode sequence. In a preferred embodiment the barcode is selected from one or more of the following: BC01 - GTGCCGACCAGTATC (SEQ ID NO: 1); BC02 - ACCACCTGACGCAAA (SEQ ID NO: 2); BC03 - ACGGCCCTATTTAAG (SEQ ID NO: 3); BC04 - AGCCCTGAGTCAGTA (SEQ ID NO: 4); BC05 - CAAATTCAAGGCGAT (SEQ ID NO: 5); BC06 - AATCTTGTATAAGTA (SEQ ID NO: 6); BC07 - CGTCACATTTGAGTC (SEQ ID NO: 7); BC08 - GGACCTTCTTACGAC (SEQ ID NO: 8); BC09 - TACCAATTGTACGCT (SEQ ID NO: 9); BC10 - CGCTAATGTCCGTTT (SEQ ID NO: 10); BC11 - ACCCTACGGTGGTTC (SEQ ID NO: 11); BC12 - TGTCCAAGCTGCAAT (SEQ ID NO: 12); BC13 - GTGTATTTAAAGCCG (SEQ ID NO: 13); BC14 - ACACCCGTATGTCAC (SEQ ID NO: 14); BC15 - TCTTTCGATGGCGGT (SEQ ID NO: 15); BC16 - GAGCACCCGCGTATT (SEQ ID NO: 16); BC17 - TTATTATGTTCTAGC (SEQ ID NO: 17); and BC18 - AATCTCTGAAACGAA (SEQ ID NO: 18).
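As a non-limiting illustration, the eighteen 15-nucleotide barcodes listed above can be held in a lookup table and an observed sequence matched to its nearest barcode by Hamming distance. The function names and the mismatch tolerance are hypothetical choices made for this sketch; the specification does not prescribe a matching rule.

```python
from itertools import combinations

# Barcode sequences BC01 to BC18 (SEQ ID NO: 1 to 18) as listed above
BARCODES = {
    "BC01": "GTGCCGACCAGTATC", "BC02": "ACCACCTGACGCAAA", "BC03": "ACGGCCCTATTTAAG",
    "BC04": "AGCCCTGAGTCAGTA", "BC05": "CAAATTCAAGGCGAT", "BC06": "AATCTTGTATAAGTA",
    "BC07": "CGTCACATTTGAGTC", "BC08": "GGACCTTCTTACGAC", "BC09": "TACCAATTGTACGCT",
    "BC10": "CGCTAATGTCCGTTT", "BC11": "ACCCTACGGTGGTTC", "BC12": "TGTCCAAGCTGCAAT",
    "BC13": "GTGTATTTAAAGCCG", "BC14": "ACACCCGTATGTCAC", "BC15": "TCTTTCGATGGCGGT",
    "BC16": "GAGCACCCGCGTATT", "BC17": "TTATTATGTTCTAGC", "BC18": "AATCTCTGAAACGAA",
}

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def match_barcode(seq, barcodes=BARCODES, max_mismatches=1):
    """Return the name of the uniquely closest barcode, or None.

    None is returned when the closest barcode exceeds `max_mismatches`
    (a hypothetical tolerance) or when two barcodes tie for closest.
    """
    hits = [(hamming(seq, bc), name) for name, bc in barcodes.items()]
    best = min(d for d, _ in hits)
    if best > max_mismatches:
        return None
    winners = [name for d, name in hits if d == best]
    return winners[0] if len(winners) == 1 else None

# Smallest pairwise distance in the set bounds how many errors can be tolerated
min_distance = min(hamming(a, b) for a, b in combinations(BARCODES.values(), 2))
print(match_barcode("GTGCCGACCAGTATC"), min_distance)
```

Returning None on a tie, rather than picking one candidate, avoids assigning a cell to the wrong population when a read is ambiguous.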

[00057] In certain embodiments, the sample barcode oligonucleotides are compatible with oligo dT-based RNA-sequencing library preparations so that they can be captured and sequenced together with mRNAs. In certain embodiments, the sample barcode oligonucleotide includes a poly A tail. In certain embodiments, a poly T oligo is used to capture mRNA and polyadenylated sample barcode oligonucleotides and prime a reverse transcription reaction to obtain cDNA molecules. Commonly used reverse transcriptases have DNA-dependent DNA polymerase activity. This activity allows DNA sample barcoding oligonucleotides to be copied into cDNA during reverse transcription. In certain embodiments, the sample barcode oligonucleotides comprise a PCR handle for amplification and next-generation sequencing library preparation, a barcode sequence specific for each sample, and a polyA stretch at the 3’ end designed to anneal to polyT stretches on primers used to initiate reverse transcription. In certain embodiments, the sample barcode oligonucleotide comprises a UMI. In certain embodiments, random priming may be used for reverse transcription.

[00058] In certain embodiments said genetic barcode is associated with a detectable label, such as a fluorescent label (e.g. GFP), which enables visualisation of the presence of the barcoded cells.

[00059] In an embodiment of the invention the genetic barcode is incorporated into a genomic safe harbor (GSH) locus or a site in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements function predictably, are expressed ubiquitously and do not cause alterations of the host genome posing a risk to the host cell. Various GSH sites have been described previously and will be known to the person skilled in the art. In embodiments the barcode is stably integrated into a GSH selected from: the adeno-associated virus site 1 (AAVS1); the chemokine (C-C motif) receptor 5 (CCR5) gene, a chemokine receptor gene known as an HIV-1 coreceptor; and the Rosa26 locus or the human ortholog of the mouse Rosa26 locus. In a preferred embodiment, the genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1) locus.

[00060] The genetic barcodes may be targeted to be stably integrated into the genome of a cell through the use of any appropriate gene-editing tools known to the person skilled in the art. In certain embodiments the barcodes are targeted to a specific locus using a programmable nuclease such as a zinc-finger nuclease (ZFN), a transcription activator-like effector nuclease (TALEN) or a clustered regularly interspaced short palindromic repeat (CRISPR)-Cas-associated nuclease. In a preferred embodiment the barcodes are stably integrated into the cell genome using CRISPR/Cas9-mediated gene editing.

[00061] In certain embodiments, a cell or population of cells according to the invention may comprise more than one genetic barcode, wherein at least one genetic barcode is unique to the cell or population of cells. In another embodiment, a cell or population of cells according to the invention may comprise a combination of genetic barcodes, wherein the combination of genetic barcodes is unique to the cell or population of cells. In one embodiment, the barcodes are separated by one or more nucleotides.

Methods of the Invention

[00062] According to another aspect, the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.

[00063] According to another aspect, the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into a targeted location of the genome of the cells of each population, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

[00064] The present invention relates to methods of measuring or determining or inferring transcriptional level or even protein level changes, e.g., massively parallel measuring or determining or inferring of RNA levels in a single cell or a cellular network in response to at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters involving sequencing DNA of a perturbed cell or cells, whereby transcriptional level and optionally protein level effects may be determined in the single cell in response to the at least one perturbation parameter or advantageously a plurality of perturbation parameters or massively parallel perturbation parameters.

[00065] According to another aspect, the present invention provides a method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.

[00066] In another aspect the present invention provides a method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell RNA-seq library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a transcript from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

[00067] Accordingly, embodiments of the invention may involve a method of inferring or determining or measuring genetic information, including RNA levels, in a single cell from a cellular network, e.g., massively parallel inferring or determining or measuring of RNA levels in a single cell or a cellular network in response to at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 or 100 or massively parallel “manipulation(s)”, e.g. perturbation parameter(s), comprising optionally so manipulating or perturbing the cell or the cells or each cell of a cellular network with the perturbation parameter(s) and sequencing of the perturbed cell(s), whereby RNA level(s) and optionally protein level(s) is / are determined in the cell(s) in response to the perturbation parameter(s).

[00068] Computational methods for the mapping of sequencing reads from a sequencing library to a single cell will be well-known to the skilled addressee and can be performed, for example, using any available sequence mapping software. Similarly, computational methods for demultiplexing barcodes enabling sequence information to be mapped to specific cells and specific cells to their starting population may be performed using methods that are known to the skilled addressee and using any available software. Demultiplexing may be performed using more than one method. Sequence mapping and demultiplexing methods useful in the methods of the present invention are described in more detail in the references cited in the following section (incorporated herein by reference) and in the Examples of the invention described herein. In certain embodiments Seurat version 3 as described by Stuart et al., 2019 Cell (https://doi.org/10.1016/j.cell.2019.05.031), incorporated herein by reference, is used for the mapping and demultiplexing of reads from the sequencing libraries (e.g. HTO demultiplexing, doublet calling based on HTO reads, generating quality control metrics). In other embodiments, the scds R package (Bais & Kostka 2020 Bioinformatics, 36(4): 1150-1158, https://doi.org/10.1093/bioinformatics/btz698), incorporated herein by reference, containing three different algorithms to call doublets based on sequence reads is also used.
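For illustration only, demultiplexing a cell from its per-barcode counts can be sketched with a simple threshold heuristic. This is not the algorithm used by Seurat or by the scds package; the function name, the minimum count and the ratio threshold are all hypothetical values chosen for the sketch.

```python
def classify_cell(counts, min_count=3, min_ratio=5.0):
    """Classify one cell from its genetic-barcode counts.

    counts: mapping of barcode name -> number of barcode reads (or UMIs) in the cell.
    Returns ("singlet", barcode), ("doublet", None) or ("negative", None).
    A cell is called a doublet when a second barcode is also well supported
    and the top barcode does not dominate it by `min_ratio`.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] < min_count:
        return ("negative", None)
    top_name, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    if second >= min_count and top < min_ratio * second:
        return ("doublet", None)
    return ("singlet", top_name)

# Hypothetical per-cell barcode counts
print(classify_cell({"BC01": 40, "BC02": 1}))
print(classify_cell({"BC01": 20, "BC02": 15}))
print(classify_cell({"BC03": 1}))
```

In practice the thresholds would be tuned to the observed count distributions, and the result could be cross-checked against an independent doublet caller, consistent with demultiplexing being performed using more than one method.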

[00069] Multiplexing single cell RNA-sequencing using internal barcodes

[00070] In certain embodiments, different samples of single cells are multiplexed to generate a multiplexed single cell sequencing library. The samples may be from different perturbations, different time points in an experiment, from different samples treated under different conditions in an experiment, or from different experiments (e.g., replicates). In certain embodiments, the sequencing library is sequenced and demultiplexed in silico.

[00071] Recent development of single-cell RNA sequencing including droplet-based (see, e.g., Macosko, et al., Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 161(5): 1202-1214, 2015; and Dixit, et al., Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell, 167(7): 1853-1866, 2016) and combinatorial split-pool methods (see, e.g., Vitak, et al., Sequencing thousands of single-cell genomes with combinatorial indexing. Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; and Rosenberg et al., Scaling single cell transcriptomics through split pool barcoding. bioRxiv preprint first posted online Feb. 2, 2017, doi:dx.doi.org/10.1101/105163) have enabled transcriptomic profiling at an unprecedented resolution and scale.

[00072] In certain embodiments, the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377-382, (2009); Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotechnology 30, 777-782, (2012); and Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports, Volume 2, Issue 3, p666-673, 2012).

[00073] In certain embodiments, the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi: 10.1038/nprot.2014.006).

[00074] In certain embodiments, the invention involves high-throughput single-cell RNA-seq. In this regard reference is made to Macosko et al., 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International patent application number PCT/US2015/049178, published as WO2016/040476 on March 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International patent application number PCT/US2016/027734, published as WO2016168584A1 on October 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat. Commun. 8, 14049 doi: 10.1038/ncomms14049; International patent publication number WO2014210353A2; Zilionis, et al., 2017, “Single-cell barcoding and sequencing using droplet microfluidics” Nat Protoc. Jan;12(1):44-73; Cao et al., 2017, “Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/104844; Vitak, et al., “Sequencing thousands of single-cell genomes with combinatorial indexing” Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; and Gierahn et al., “Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput” Nature Methods 14, 395-398 (2017), all the contents and disclosure of each of which are herein incorporated by reference in their entirety.

[00075] In certain embodiments, tagmentation is used to introduce adaptor sequences to genomic DNA in regions of accessible chromatin (e.g., between individual nucleosomes) (see, e.g., US20160208323A1; US20160060691A1; WO2017156336A1; J. D. Buenrostro et al., Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); and Cusanovich, D. A., Daza, R., Adey, A., Pliner, H., Christiansen, L., Gunderson, K. L., Steemers, F. J., Trapnell, C. & Shendure, J. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015 May 22;348(6237):910-4. doi: 10.1126/science.aab1601. Epub 2015 May 7). The term “tagmentation” refers to a step in the Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) as described. (See, Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., Greenleaf, W. J., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods 2013; 10 (12): 1213-1218). Specifically, a hyperactive Tn5 transposase loaded in vitro with adapters for high-throughput DNA sequencing can simultaneously fragment and tag a genome with sequencing adapters. In one embodiment the adapters are compatible with the methods described herein.

[00076] The multiplexing strategy described herein is also applicable to single-cell profiling of chromatin accessibility (see, e.g., Cusanovich, et al., 2015; and www.10xgenomics.com/solutions/single-cell-atac/). In certain embodiments, a handle is attached to the adapters, such that the tagmented DNA acts as an artificial mRNA (e.g., poly A tail) and can be captured by a cell of origin barcode poly dT capture sequence. In certain embodiments, the sample barcode oligonucleotides are adapted for tagmentation with the adapters used in the first step of generating cell of origin barcodes.

[00077] In one exemplary embodiment, samples for use in droplet based single cell sequencing as described herein are multiplexed. Cells belonging to different cell populations (e.g. iPSC populations) are labeled with a unique genetic barcode incorporated into their genomes as described herein. The single cells from multiple populations may then be loaded into a microfluidic device. The labeled cells are encapsulated with reagents and “cell of origin” barcode or UMI containing beads in emulsion droplets. The genetic barcode incorporated into the cell's genome may then be released from the cell in the droplet (e.g., by lysis of the cell in the droplet) and processed to generate a cDNA molecule comprising the genetic barcode unique to the population of cells from which the cell was derived and also a “cell-of-origin” barcode or UMI particular to that cell. The sequencing data can then be demultiplexed to determine the cell of origin and the population (e.g. cell line) of origin and therefore the associated condition/perturbation.
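The demultiplexing step of the exemplary droplet workflow can be sketched, purely as an illustration, as collapsing duplicate reads by UMI and then looking up the condition linked to each cell's genetic barcode. The record layout (cell-of-origin barcode, UMI, genetic barcode), the function names and the barcode-to-condition table are assumptions made for this sketch and are not defined in the specification.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell, genetic barcode).

    reads: iterable of (cell_barcode, umi, genetic_barcode) tuples.
    Reads sharing all three fields are treated as duplicates of one molecule.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for cell, umi, genetic_bc in reads:
        key = (cell, umi, genetic_bc)
        if key in seen:
            continue  # amplification duplicate of an already counted molecule
        seen.add(key)
        counts[cell][genetic_bc] += 1
    return {cell: dict(per_bc) for cell, per_bc in counts.items()}

def assign_conditions(counts, condition_of):
    """Map each cell to (population barcode, condition) using its top barcode.

    Ties are resolved by first occurrence; a stricter rule could reject them.
    """
    assigned = {}
    for cell, per_bc in counts.items():
        top = max(per_bc, key=per_bc.get)
        assigned[cell] = (top, condition_of.get(top))
    return assigned

# Hypothetical reads and a hypothetical barcode-to-condition table
reads = [
    ("CELL_A", "UMI1", "BC01"), ("CELL_A", "UMI1", "BC01"),
    ("CELL_A", "UMI2", "BC01"), ("CELL_B", "UMI3", "BC02"),
]
conditions = {"BC01": "control", "BC02": "treated"}
print(assign_conditions(count_umis(reads), conditions))
```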

[00078] In certain embodiments, quantitative real time PCR can be utilized. Detection of the gene expression level can be conducted in real time in an amplification assay. In one aspect, the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules is typically proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art. DNA-binding dyes suitable for this application include SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and the like.

[00079] In another aspect, other fluorescent labels such as sequence specific probes can be employed in the amplification reaction to facilitate the detection and quantification of the amplified products. Probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan® probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art and are taught in U.S. Patent No. 5,210,015.

[00080] Multiplexed Perturbation studies

[00081] Methods and tools for genome-scale screening of perturbations in single cells are known to the skilled person. Such methods and tools allow reconstruction of a cellular network or, for example, the differentiation trajectory of a cell or population of cells. In one embodiment, a method utilizing the cells (e.g. iPSCs) described herein comprises (1) imparting or introducing single-order or combinatorial perturbations to a population of cells, (2) measuring genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells and (3) assigning a perturbation(s) to the single cells. A perturbation may be linked to a phenotypic change and preferably changes in gene or protein expression. In preferred embodiments, measured differences that are relevant to the perturbations are determined by applying a model accounting for co-variates to the measured differences. The model may include the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation. In certain embodiments, the measuring of phenotypic differences and assigning a perturbation to a single cell is determined by performing single cell RNA sequencing (RNA-seq). In preferred embodiments, the single cell RNA-seq is performed by any method as described herein (e.g., Drop-seq, InDrop, 10X genomics). In certain embodiments, manipulating or perturbing the cell(s) involves altering the culture conditions so as to contact the cell(s) with one or more agents (e.g. another cell, secretions from another cell e.g. a co-culture, cytokine, growth factor, signaling pathway agonist or antagonist, small molecule, antibody, etc.). In other embodiments methods of performing genome-wide CRISPR-mediated perturbation screens are provided.

[00082] Methods and tools for genome-scale screening of perturbations in single cells using CRISPR-Cas9 have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.1101/262121; Datlinger, et al., 2017, Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. Vol. 14 No.3 DOI: 10.1038/nmeth.4177; Hill et al., On the design of CRISPR-based single cell molecular screens, Nat Methods. 2018 Apr; 15(4): 271-274; and International publication serial number WO/2017/075294).

[00083] In certain embodiments, unique barcodes are used to perform Perturb-seq. In certain embodiments, a guide RNA is detected by RNA-seq using a transcript expressed from a vector encoding the guide RNA. The transcript may include a unique barcode specific to the guide RNA. The transcript may include the guide RNA sequence (see, e.g., Fig. 16, CROP-seq, Datlinger, et al., 2017). In certain embodiments, a guide RNA and a guide RNA barcode are expressed from the same vector and the barcode may be detected by RNA-seq. Not being bound by a theory, detection of a guide RNA barcode is more reliable than detecting a guide RNA sequence, reduces the chance of false guide RNA assignment and reduces the sequencing cost associated with executing these screens. Thus, a perturbation may be assigned to a single cell by detection of a guide RNA barcode in the cell. In certain embodiments, a cell barcode is added to the RNA in single cells, such that the RNA may be assigned to a single cell. Generating cell barcodes is described herein for single cell sequencing methods. In certain embodiments, a Unique Molecular Identifier (UMI) is added to each individual transcript and protein capture oligonucleotide. Not being bound by a theory, the UMI allows for determining the capture rate of measured signals, or preferably the binding events or the number of transcripts captured. Not being bound by a theory, the data are more significant if the signal observed is derived from more than one protein binding event or transcript. In preferred embodiments, Perturb-seq is performed using a guide RNA barcode expressed as a polyadenylated transcript, a cell barcode, and a UMI.
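By way of non-limiting illustration, the combined use of a cell barcode, a UMI and a guide RNA barcode may be sketched as follows. The function name, the input format (one tuple per sequencing read), the two-UMI threshold and the tie-handling rule are assumptions made for this example only; they are not taken from the cited Perturb-seq publications.

```python
from collections import defaultdict

def assign_perturbations(reads, min_umis=2):
    """Assign one guide barcode per cell from (cell_barcode, umi, guide_barcode) reads.

    min_umis is an illustrative threshold: a call is made only when the guide
    barcode is supported by more than one captured transcript.
    """
    # Collapse PCR duplicates: each (cell, guide, UMI) combination counts once.
    umis = defaultdict(set)
    for cell, umi, guide in reads:
        umis[(cell, guide)].add(umi)

    # Tally distinct UMIs per guide barcode within each cell.
    per_cell = defaultdict(dict)
    for (cell, guide), seen in umis.items():
        per_cell[cell][guide] = len(seen)

    calls = {}
    for cell, counts in per_cell.items():
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        top_guide, top_n = ranked[0]
        # Assumption: a tie between the two best guides is left unassigned.
        ambiguous = len(ranked) > 1 and ranked[1][1] == top_n
        calls[cell] = top_guide if top_n >= min_umis and not ambiguous else None
    return calls

# Hypothetical reads; identifiers are placeholders, not real sequences.
reads = [
    ("CELL_A", "UMI1", "g1"),
    ("CELL_A", "UMI1", "g1"),  # PCR duplicate of the read above
    ("CELL_A", "UMI2", "g1"),
    ("CELL_B", "UMI3", "g2"),  # supported by a single UMI only
]
calls = assign_perturbations(reads)
```

In this sketch CELL_A receives guide g1 because two distinct UMIs support it, whereas CELL_B is left unassigned because its guide barcode is supported by a single transcript.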

[00084] In certain embodiments, a CRISPR system is used to create an INDEL at a target gene. In other embodiments, epigenetic screening is performed by applying CRISPRa/i/x technology (see, e.g., Konermann et al. “Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex” Nature. 2014 Dec 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013). "Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression". Cell. 152 (5): 1173-83; Gilbert, L. A., et al., (2013). "CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes". Cell. 154 (2): 442-51; Komor et al., 2016, Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage, Nature 533, 420-424; Nishida et al., 2016, Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems, Science 353(6305); Yang et al., 2016, Engineering and optimising deaminase fusions for genome editing, Nat Commun. 7: 13330; Hess et al., 2016, Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells, Nature Methods 13, 1036-1042; and Ma et al., 2016, Targeted AID-mediated mutagenesis (TAM) enables efficient genomic diversification in mammalian cells, Nature Methods 13, 1029-1035). Numerous genetic variants associated with disease phenotypes are found in non-coding regions of the genome, and frequently coincide with transcription factor (TF) binding sites and non-coding RNA genes. Not being bound by a theory, CRISPRa/i/x approaches may be used to achieve a more thorough and precise understanding of the implication of epigenetic regulation. In one embodiment, a CRISPR system may be used to activate gene transcription. A nuclease-dead RNA-guided DNA binding domain, dCas9, tethered to transcriptional repressor domains that promote epigenetic silencing (e.g., KRAB) may be used for "CRISPRi", which represses transcription. To use dCas9 as an activator (CRISPRa), a guide RNA is engineered to carry RNA binding motifs (e.g., MS2) that recruit effector domains fused to RNA-motif binding proteins, increasing transcription.

[00085] In one embodiment, CRISPR/Cas9 may be used to perturb protein-coding genes or non-protein-coding DNA. CRISPR/Cas9 may be used to knockout protein-coding genes by frameshifts, point mutations, inserts, or deletions. An extensive toolbox may be used for efficient and specific CRISPR/Cas9 mediated knockout as described herein, including a double-nicking CRISPR to efficiently modify both alleles of a target gene or multiple target loci and a smaller Cas9 protein for delivery on smaller vectors (Ran, F.A., et al., In vivo genome editing using Staphylococcus aureus Cas9. Nature. 520, 186-191 (2015)). A genome-wide sgRNA mouse library (~10 sgRNAs/gene) may also be used in a mouse that expresses a Cas9 protein (see, e.g., WO2014204727A1).

[00086] In one embodiment, perturbation is by deletion of regulatory elements. Non-coding elements may be targeted by using pairs of guide RNAs to delete regions of a defined size, and by tiling deletions covering sets of regions in pools.

[00087] In one embodiment, perturbation of genes is by RNAi. The RNAi may be shRNAs targeting genes. The shRNAs may be delivered by any method known in the art. In one embodiment, the shRNAs may be delivered by a viral vector. The viral vector may be a lentivirus, adenovirus, or adeno-associated virus (AAV).

[00088] In certain embodiments, whole genome screens can be used for understanding the phenotypic readout of perturbing potential target genes. In preferred embodiments, perturbations target expressed genes as defined by a gene signature using a focused sgRNA library. Libraries may be focused on expressed genes in specific networks or pathways. In other preferred embodiments, regulatory drivers are perturbed. In certain embodiments, systematic perturbation of key genes that regulate mesendodermal differentiation may be performed in a high-throughput fashion. Gene expression profiling data can be used to define the target of interest and perform follow-up single-cell and population RNA-seq analysis.

[00089] In one aspect, the present invention provides for a method of reconstructing a cellular network, comprising introducing at least 1, 2, 3, 4 or more single-order or combinatorial perturbations to a plurality of cells in a population of cells, wherein each cell in the plurality of the cells receives at least 1 perturbation; measuring comprising: detecting genomic, genetic, proteomic, epigenetic and/or phenotypic differences in single cells compared to one or more cells that did not receive any perturbation, and detecting the perturbation(s) in single cells; and determining measured differences relevant to the perturbations by applying a model accounting for co-variates to the measured differences, whereby intercellular and/or intracellular networks or circuits are inferred. The measuring in single cells may comprise single cell sequencing. The single cell sequencing may comprise unique molecular identifiers (UMI), whereby the capture rate of the measured signals, such as transcript copy number or probe binding events, in a single cell is determined. The model may comprise accounting for the capture rate of measured signals, whether the perturbation actually perturbed the cell (phenotypic impact), the presence of subpopulations of either different cells or cell states, and/or analysis of matched cells without any perturbation.

[00090] The measuring may comprise detecting the transcriptome of each of the single cells. The perturbation(s) may comprise one or more genetic perturbation(s). The perturbation(s) may comprise one or more epigenetic or epigenomic perturbation(s). At least one perturbation may be introduced with RNAi- or a CRISPR-Cas system. At least one perturbation may be introduced via a chemical agent, biological agent, an intercellular spatial relationship between two or more cells, an increase or decrease of oxygen concentration, an increase or decrease of temperature, addition or subtraction of energy, electromagnetic energy, or ultrasound.

[00091] The measuring or measured differences may comprise measuring or measured differences of DNA, RNA, protein or post translational modification; or measuring or measured differences of protein or post translational modification correlated to RNA and/or DNA level(s).

[00092] The perturbing or perturbation(s) may comprise genetic perturbing. The perturbing or perturbation(s) may comprise single-order perturbations. The perturbing or perturbation(s) may comprise combinatorial perturbations. The perturbing or perturbation(s) may comprise gene knock-down, gene knock-out, gene activation, gene insertion, or regulatory element deletion. The perturbing or perturbation(s) may comprise genome-wide perturbation. The perturbing or perturbation(s) may comprise performing CRISPR-Cas-based perturbation. The perturbing or perturbation(s) may comprise performing pooled single or combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs. The perturbations may be of a selected group of targets based on similar pathways or network of targets.

[00093] The perturbing or perturbation(s) may comprise performing pooled combinatorial CRISPR-Cas-based perturbation with a genome-wide library of sgRNAs. Each sgRNA may be associated with a unique perturbation barcode. Each sgRNA may be co-delivered with a reporter mRNA comprising the unique perturbation barcode (or sgRNA perturbation barcode).

[00094] The perturbing or perturbation(s) may comprise subjecting the cell to an increase or decrease in temperature. The perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent. The perturbing or perturbation(s) may comprise subjecting the cell to a biological agent. The biological agent may be a growth factor or cytokine or antibody. The perturbing or perturbation(s) may comprise subjecting the cell to a chemical agent, biological agent and/or temperature increase or decrease across a gradient.

[00095] The cell may be in a microfluidic system. The cell may be in a droplet. The population of cells may be sequenced by using microfluidics to partition each individual cell into a droplet containing a unique barcode, thus allowing a cell barcode to be introduced.

[00096] The perturbing or perturbation(s) may comprise transforming or transducing the cell or a population that includes and from which the cell is isolated with one or more genomic sequence-perturbation constructs that perturb a genomic sequence in the cell. The sequence-perturbation construct may be a viral vector, preferably a lentivirus vector. The perturbing or perturbation(s) may comprise multiplex transformation or transduction with a plurality of genomic sequence-perturbation constructs.

[00097] The skilled addressee will readily appreciate that the foregoing methods involving genetically barcoded cells according to the present invention may also be readily applied in the context of bulk RNA-seq analysis involving deconvolution of multiplexed bulk RNA samples. Accordingly, in another embodiment, the present invention provides a method for multiplexed cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations or progeny thereof; generating a multiplexed bulk RNA-sequencing library from said plurality of said populations or progeny thereof and sequencing said library; and deconvolving said library using the genetic barcodes.

Kits

[00098] The present invention provides a kit for multiplexed single cell analysis of cells (e.g. iPSCs) and optionally their progeny, comprising one or more cell populations, such as one or more iPSC populations, described herein.

[00099] As used herein, the term “kit” refers to any delivery system for delivering materials. In the context of cell differentiation, a kit may refer to a combination of materials for handling stem cells; such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., compounds, proteins, detection agents (such as probes or antibodies), plasmids, vectors, etc.) in the appropriate containers (such as tubes, etc.) and/or supporting materials (e.g., buffers, reagents, culture media, written instructions for performing cell differentiation, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes, or bags, and the like) containing the relevant reaction reagents (such as culture media, oligonucleotides, enzymes, inhibitors, etc.), growth factors and cytokines (e.g. VEGF, BDNF, FGF, etc.) and/or supporting materials.

[000100] In another embodiment the kit comprises cells, such as iPSCs, together with cell culture reagents as described herein, including within the examples below, for creation of barcoded cells. In another embodiment, the kit further comprises one or more reagents for differentiating the cells to a selected phenotype and optionally reagents for the generation of a multiplexed single cell sequencing library.

[000101] In another embodiment the kit further comprises instructions for the preparation of barcoded cells, including barcoded iPSCs, as described herein.

[000102] In addition to inhibitors and agonists, the kits of the present invention may further comprise one or more of the following: a culture medium, at least one cell culture medium supplement, an agent for inhibiting or increasing expression of one or more gene products, and at least one agent for detecting expression of a marker of differentiation.

Examples

Example 1 - Generation of a novel cell barcoding system

[000103] To allow for an opportunity to interrogate novel aspects of cell-cell interactions and signalling in a single cell context, cells were engineered to produce their own internal barcode through stable incorporation of a specifically designed expression cassette into a transcriptionally active region of the genome.

[000104] Materials and methods

[000105] Barcoding design. 10,000 15 bp barcodes were generated using a 25% probability for the presence of each of the four nucleotides A, C, T and G. Barcodes containing runs of 4 or more nucleotides, or starting or ending with a stop codon, were excluded. All 18 selected barcodes were tested to ensure a minimum Hamming distance of 5 nt (Table 1).
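By way of non-limiting illustration, the barcode design criteria above may be sketched in Python. This is not the inventors' script. The sketch assumes that "runs of 4 or more nucleotides" means runs of identical nucleotides, that the stop codons are TAA, TAG and TGA, and that barcodes are accepted greedily in the order generated; the function names and the random seed are likewise hypothetical, and the output does not reproduce Table 1.

```python
import random
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def has_homopolymer(seq, run=4):
    """True if seq contains `run` or more identical consecutive nucleotides."""
    return any(base * run in seq for base in "ACGT")

def candidate_ok(seq):
    """Apply the exclusion rules: homopolymer runs and terminal stop codons."""
    return (not has_homopolymer(seq)
            and seq[:3] not in STOP_CODONS
            and seq[-3:] not in STOP_CODONS)

def design_barcodes(n_wanted=18, n_candidates=10_000, length=15, min_dist=5, seed=0):
    rng = random.Random(seed)
    selected = []
    for _ in range(n_candidates):
        # Each nucleotide is drawn with 25% probability.
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if not candidate_ok(seq):
            continue
        # Assumption: accept a candidate only if it is far enough from all accepted ones.
        if all(hamming(seq, s) >= min_dist for s in selected):
            selected.append(seq)
            if len(selected) == n_wanted:
                break
    return selected

barcodes = design_barcodes()
min_pairwise = min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

A minimum Hamming distance of 5 nt means that a read carrying up to two sequencing errors remains closer to its true barcode than to any other barcode in the set.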

Table 1 - Barcode Sequences

[000106] Barcode cassette design. The barcodes were introduced into the cells as a part of a barcode cassette, which also incorporated the reverse complement of a partial Chromium Read2 adaptor sequence (truncated slightly to allow oligo length of ≤ 60 bp). This Read2 sequence is used as a PCR handle to generate barcode-containing amplicons compatible with Chromium scRNA library preparation. Additionally, restriction enzyme recognition sequences were added or regenerated to enable easy transfer of the cassette between different vectors. The exact structure of the cassette is as follows: EcoRV site 3’ 3bp - MluI site - partial 10x Read2 adaptor reverse complement - 15bp barcode - MluI complementary sequence. Barcoding cassettes were ordered as complementary single-stranded oligos, which could be annealed and ligated into a digested plasmid backbone.

[000107] Vector design. AAVS1-CAG-hrGFP (Addgene# 52344) was used as the plasmid backbone. It contains hrGFP under the control of the CAG promoter, and AAVS1 homology arms to allow integration of the linearized plasmid into the genome when paired with the CRISPR system using well-described guide RNAs.

[000108] The barcode cassette was introduced between the EcoRV and MluI sites of the plasmid between hrGFP and the poly-adenylation site, to enable expression as part of the hrGFP transcriptional unit.

[000109] Generation of barcoding plasmids. pAAVS1-CAG-hrGFP (Addgene# 52344) was digested by incubation with EcoRV-HF (New England BioLabs; NEB) followed by addition of MluI-HF (NEB) and further incubation. Successful digestion was confirmed by running a small amount on an agarose gel, and the remainder was purified using QIAQuick PCR Purification Kit (QIAGEN).

[000110] Top and bottom strands of barcode oligos were annealed by mixing 1 µL each of 100 µM oligo in 1x T4 DNA ligase buffer in a volume of 10 µL, heating to 94°C for 2 min, then allowing to cool to 25°C at a rate of 1°C/s. Annealed oligos were further diluted 1 in 10 with nuclease-free water.

[000111] Ligation of barcode cassettes to vector was performed by combining 100 ng digested plasmid, 4 µL diluted annealed oligos, and 1 µL T4 DNA ligase (NEB; 400 U/µL) in 10 µL total volume with 1x T4 DNA ligase buffer. The reaction was incubated at 16°C for 16 hours. Separate reactions were performed for each barcode oligo.

[000112] 3 µL of ligation reactions were added to 20 µL Stellar competent cells (Takara) and heat shock transformation performed at 42 °C for 1 min in 1.5 mL tubes. 350 µL SOC media was added for recovery, with shaking at 37°C for 1 hour. 100 µL was spread onto selective ampicillin-containing agar plates, which were incubated overnight at 37°C.

[000113] 3 colonies were picked from each plate for screening with colony PCR for expected insert in 10 µL reactions using MangoTaq (Bioline).

[000114] Colonies showing successful amplification of insert were grown overnight in 5 mL LB broth containing ampicillin, and plasmid purified using QIAGEN Plasmid Miniprep kit.

[000115] 100% sequence identity was confirmed across the barcode insert by Sanger sequencing, performed by the Australian Genome Research Facility (AGRF).

[000116] 50 mL cultures were grown from glycerol stocks of plasmids with confirmed barcode sequence insert, and plasmid was purified using Nucleobond Xtra Midi Kit (Macherey-Nagel) to give endotoxin-free, high concentration stock for transfection.

[000117] Stable cell generation. All human pluripotent stem cell studies were carried out in accordance with consent from the University of Queensland’s Institutional Human Research Ethics approval (HREC#: 2015001434).

[000118] WTC wt iPSCs were maintained as previously described (ref Friedman et al). Briefly, cells were cultured on Vitronectin XF (Stem Cell Technologies, Cat# 07180) coated plates in mTeSR media with supplement (Stem Cell Technologies, Cat# 05850) at 37°C with 5% CO2.

[000119] For gene editing, cells were grown to about 50-80% confluency, dissociated using 1X TrypLE and 100-200K cells were used for each 10 µl reaction of the Neon Transfection System. The transfection mixture included 0.5 µg of Barcode plasmid DNA, 20 pmol AAVS1-targeting sgRNA (protospacer sequence: atcctgtccctagtggcccc (SEQ ID NO: 19), chemically synthesized by Agilent technology) and 20 pmol spCas9 protein (IDT). After electroporation with 1 pulse of 1300 V for 30 ms, cells were seeded in mTeSR with ROCK Inhibitor (Y-27632) and CloneR (STEMCELL Technologies). Selection was performed with 1 µg/ml puromycin and purified cell lines were frozen down in CryoStor CS10 Cell Freezing Medium and stored in liquid nitrogen.

[000120] Quality control of cell lines. All cell lines underwent quality testing for correct genetic insertion, selection efficiency, pluripotency, chromosomal abnormalities and mycoplasma contamination.

[000121] Genomic DNA from all cell lines was extracted using QuickExtract DNA Extraction Solution (Epicentre). Correct targeting of donor construct at the AAVS1 locus was confirmed by junction PCR using the following primer pair: AAVS1 F1: 5’-ggttcggcttctggcgtgtgacc-3’ (SEQ ID NO: 20), AAVS1 R1: 5’-tcaagagtcacccagagacagtgac-3’ (SEQ ID NO: 21). The PCR product was then sent for Sanger sequencing using a universal sequencing primer to validate correct barcode insertion in each cell line.

[000122] Flow cytometry was performed on live cells for endogenous GFP expression and after labelling for the pluripotency marker SSEA3 (BectonDickinson, Cat# 562706) and corresponding isotype control. Cells were analyzed using a BD FACSCANTO II (BectonDickinson, San Jose, CA) with FACSDiva software (BD Biosciences). Data analysis was performed using FlowJo (Tree Star, Ashland, Oregon).

[000123] Karyotyping was carried out as a professional service by Sullivan Nicolaides Pathology. iPSCs were grown in a 25 cm² flask to about 70-80% confluency and sent for analysis. 15 cells were examined per culture and three exemplary karyotypes were provided as results.

[000124] Proof of principle single cell RNA-sequencing. All barcoded iPS cell lines were cultured in parallel as described above, dissociated using 0.5% EDTA and 600K cells from each cell line were combined. Prior to this, four cell lines were additionally labelled with different TotalSeq-A cell hashing antibodies (Stoeckius et al, 2018; Genome Biology) according to the manufacturer’s protocol. The combined sample was transferred into 2% BSA (Sigma Aldrich, Cat#A9418) in PBS, stained with Propidium Iodide and 500K viable cells were sorted using a BD Influx™ Cell Sorter (BectonDickinson, San Jose, CA) with FACSDiva software (BD Biosciences).

[000125] Single cell RNA-seq libraries were generated using the 10X Genomics Chromium 3' Gene Expression (v2) protocol, with minor modifications to the workflow, outlined by Stoeckius and Smibert (https://citeseq.files.wordpress.com/2019/02/cell_hashing_protocol_190213.pdf) to capture the fraction of droplets containing the HTO-derived cDNA (<180bp).

[000126] HTO additive primers and Illumina TruSeq DNA D7xx_s primer (containing i7 index) were ordered from IDT, and used according to the cell hashing protocol. Hashtag libraries were quantified using the Agilent Bioanalyzer.

[000127] Sequencing was performed using the Illumina Nextseq instrument using the Nextseq High Output 150-cycle kit and the gene expression and HTO libraries were pooled on a single flowcell using a ratio of 90:10. The 10x Genomics sample index used was SI-GA-D11, and the flowcell ID containing the raw data was 190114_NS500239_0333_AHHTLFBGX9.

[000128] The standard 10X Genomics v2 3' gene expression library was processed using the 10X Genomics cellranger pipeline to derive gene expression count matrices. HTO-tagged cells were identified and extracted from the fastq files using CITE-seq-Count with default parameters (https://hoohm.github.io/CITE-seq-Count/), to generate a count matrix of cells and their respective HTO expression values. This allowed the pooled hashtagged cells to be uniquely identified and deconvoluted.

[000129] From the 3' gene expression data, barcoded cells were identified by the expression of a barcode from the whitelist, which were included in the transcriptome reference with unique identifiers (e.g. 'bc01').
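By way of non-limiting illustration, identifying barcoded cells from a gene expression count matrix may be sketched as follows. The data layout (a dictionary of per-cell feature counts), the function name and the example counts are hypothetical. Selecting the barcode with the highest count follows the rule stated for the perturbation experiment in Example 2 and is assumed here for the proof-of-principle data.

```python
def call_barcodes(counts, whitelist):
    """Return {cell: barcode id or None} from {cell: {feature: count}}.

    Only features on the barcode whitelist are considered; all other
    features (ordinary genes) are ignored.
    """
    whitelist = set(whitelist)
    calls = {}
    for cell, features in counts.items():
        detected = {f: n for f, n in features.items() if f in whitelist and n > 0}
        if not detected:
            calls[cell] = None  # no barcode transcript detected
            continue
        # Assumption: the most highly expressed barcode is taken as the call.
        # Ties resolve to the first barcode encountered.
        calls[cell] = max(detected, key=detected.get)
    return calls

# Barcode identifiers as they would appear in the transcriptome reference.
whitelist = [f"bc{i:02d}" for i in range(1, 19)]

# Hypothetical counts for three cells.
counts = {
    "cell1": {"POU5F1": 40, "bc01": 12, "bc07": 1},
    "cell2": {"POU5F1": 35},
    "cell3": {"bc02": 5},
}
calls = call_barcodes(counts, whitelist)
```

Because the barcodes are carried as ordinary features in the reference, no separate alignment step is needed in this sketch; the barcode call is read directly from the count matrix.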

[000130] Results. Plasmid cloning and cell engineering of 18 barcoded WTC-11 iPSC lines were performed as described in the methods above and outlined in Figure 1a. All cell lines underwent rigorous quality testing in terms of genomic aberrations, pluripotency and stable integration (Figure 2).

[000131] The inventors performed single cell-RNA sequencing on a pooled sample from all 18 barcoded iPS cell lines and found similar expression levels of barcode transcripts in all cell lines. Labelling of four different cell lines with cell hashing antibodies yielded a strong correlation of these external barcodes with their internal counterparts, validating the accurate detection of barcoding transcripts. Furthermore, we found comparable expression levels of pluripotency markers in all cell lines, and the dimensionality reduction visualisation displays an even distribution of cell lines without any effect on clustering (Figure 2).

Example 2 - Multiplexing of signalling perturbations during mesendoderm differentiation.

[000132] The inventors next tested the utility of the barcoded cell lines for studying mesendodermal differentiation.

[000133] Materials and methods

[000134] 18 barcoded iPS cell lines (“WTC BC01-BC18”) were generated as outlined in Example 1. BC01-BC18 were cultured in parallel and multiple temporally staggered set-ups of mesendoderm directed differentiation using a monolayer platform were performed as follows. Differentiation was induced on day 0 by changing the culture media to RPMI (ThermoFisher, Cat# 11875119) containing 3 µM CHIR99021 (Stem Cell Technologies, Cat# 72054), 500 mg/mL BSA (Sigma Aldrich, Cat# A9418), and 213 mg/mL ascorbic acid (Sigma Aldrich, Cat# A8960). On day 3, the media was replaced with RPMI containing 500 mg/mL BSA, 213 mg/mL ascorbic acid and one of the signalling molecules listed in Table 2 below. On day 5, the media was exchanged for RPMI containing 500 mg/mL BSA, and 213 mg/mL ascorbic acid without supplemental cytokines. On day 7, the cultures were fed with RPMI containing 1x B27 supplement plus insulin (Life Technologies Australia, Cat# 17504001).

[000135] Cell lines were divided into 2 batches (BC01-BC09 and BC10-BC18) to allow for capture of biological duplicates for all 9 conditions. Cells were collected for sc-RNA-sequencing on day 2 as a reference, prior to any perturbations, as well as on days 5 and 9 of differentiation. In total, 3 sequencing libraries were generated, with each library consisting of all 18 cell lines. A combination of 2 different timepoints from the two batches was pooled in each library to allow for easy detection and removal of potential batch effects during downstream analysis (Table 2).
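By way of non-limiting illustration, the multiplexing arithmetic may be sketched as follows. The rotation of timepoints across libraries shown here is hypothetical and is not the layout of Table 2; it is one arrangement that satisfies the stated constraints (three libraries, each containing all 18 cell lines, with the two batches contributing different timepoints to each library).

```python
# Barcoded lines split into the two batches described in the text.
batches = {
    "batch1": [f"BC{i:02d}" for i in range(1, 10)],    # BC01-BC09
    "batch2": [f"BC{i:02d}" for i in range(10, 19)],   # BC10-BC18
}

# HYPOTHETICAL rotation of timepoints (not taken from Table 2):
# within each library the two batches come from different days.
libraries = {
    "library1": {"batch1": "day2", "batch2": "day5"},
    "library2": {"batch1": "day5", "batch2": "day9"},
    "library3": {"batch1": "day9", "batch2": "day2"},
}

# One entry per (library, cell line, timepoint) sample.
samples = [
    (library, line, day)
    for library, plan in libraries.items()
    for batch, day in plan.items()
    for line in batches[batch]
]

# Cell lines present in each library.
lines_per_library = {
    library: {line for lib, line, _ in samples if lib == library}
    for library in libraries
}
```

Under this arrangement 18 cell lines at 3 timepoints give 54 samples in 3 libraries, and because each cell line appears exactly once per library the genetic barcode alone identifies both the condition and the timepoint of every cell.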

Table 2 - Multiplexing summary of conditions, timepoints and sequencing libraries.

[000136] Perturbation experiment single cell library preparation and sequencing. Sample pools were assessed for quality using a hemocytometer with Trypan Blue exclusion. Cell viability ranged from 82-88%; cell concentration was between 1.3E+06 and 2.1.9E+06 cells/mL. Chromium Single Cell 3’ v3 (10x Genomics) reactions were performed for each sample according to manufacturer’s protocol, targeting 20,000 cells per reaction. 11 cycles of cDNA amplification were performed in a C1000 Touch thermocycler with Deep Well Reaction Module (Bio-Rad). After clean-up of full-length amplified cDNA, 10 µL was used for construction of the gene expression library according to manufacturer’s protocol, with 11 indexing PCR cycles. Additionally, 5 µL of full-length amplified cDNA was used to generate a barcoding library for each sample pool. Briefly, a first round of PCR was performed to specifically amplify cDNA regions containing the barcode cassette, and append partial P5 and P7 sequencing adaptors. Each reaction contained 1x KAPA HiFi HotStart ReadyMix and 300 nM each barcode_amp_F and barcode_amp_R primers in a final volume of 50 µL. A 2-step PCR protocol was performed with annealing/extension at 71 °C for 30 s, for six cycles. After a 1.2X SPRI clean-up to remove primers, a second round of PCR was performed with the entire volume of purified product from PCR1. Each reaction contained 1x KAPA HiFi HotStart ReadyMix, 500 nM SI-PCR primer (identical sequence to primer in the Chromium kit), and 5 µL of a unique i7 indexed R primer from Chromium i7 Multiplex Kit (10x Genomics), in a total volume of 50 µL. PCR was performed as for the SI-PCR protocol in the gene expression library construction workflow. Eight indexing PCR cycles were performed, for a total of 14 cycles over two rounds of PCR. Final barcoding libraries were purified using 1X SPRI beads, and fragment size and library concentration verified along with gene expression libraries using a BioAnalyzer DNA High Sensitivity Kit (Agilent). Final gene expression libraries were 62-82 nM, with average size 457-494 bp. Barcoding libraries were 30-38 nM, with average size 358-362 bp.

[000137] A single pool was prepared from the three gene expression and three barcoding libraries for sequencing. The samples were pooled equimolar within each library type, and combined so that the gene expression libraries together made up 90% of the pool, and the barcoding libraries 10%. Sequencing was performed using the Illumina NovaSeq 6000 instrument using a S4 Reagent Kit v1.5 (200 cycles). Gene expression count matrices were derived using the standard 10X Cell Ranger pipeline.
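By way of non-limiting illustration, the pooling volumes implied by an equimolar, 90%/10% pool may be calculated as follows. The individual library concentrations and the 1000 fmol pool size are hypothetical values chosen within the reported ranges; the function name is likewise an assumption. The calculation uses the identity 1 nM = 1 fmol/µL.

```python
def pooling_volumes(conc_nm, pool_share, total_fmol):
    """Volume (µL) of each library so that all libraries in conc_nm are
    equimolar and together contribute pool_share of total_fmol."""
    per_library_fmol = pool_share * total_fmol / len(conc_nm)
    # 1 nM equals 1 fmol/µL, so volume in µL = fmol / nM.
    return {name: per_library_fmol / conc for name, conc in conc_nm.items()}

# HYPOTHETICAL concentrations (nM) within the reported 62-82 and 30-38 nM ranges.
gex_conc = {"GEX_lib1": 62.0, "GEX_lib2": 70.0, "GEX_lib3": 82.0}
bc_conc = {"BC_lib1": 30.0, "BC_lib2": 34.0, "BC_lib3": 38.0}

TOTAL_FMOL = 1000.0  # hypothetical pool size

gex_vol = pooling_volumes(gex_conc, 0.90, TOTAL_FMOL)  # gene expression: 90%
bc_vol = pooling_volumes(bc_conc, 0.10, TOTAL_FMOL)    # barcoding: 10%

# Check the molar amounts actually contributed by each library type.
gex_fmol = sum(gex_vol[n] * gex_conc[n] for n in gex_conc)
bc_fmol = sum(bc_vol[n] * bc_conc[n] for n in bc_conc)
```

With these inputs each gene expression library contributes 300 fmol and each barcoding library about 33.3 fmol, so the more concentrated libraries are added in proportionally smaller volumes.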

[000138] Demultiplexing and quality control. Barcoding reads used to assign sample barcodes to each cell were from the separately amplified and sequenced barcode libraries, and cell barcodes without both transcriptome and sample barcoding reads were removed. For each barcode sequencing library, the ‘HTODemux’ function in the Seurat R package (v3.0) was used to determine the dominant sample barcode for each cell and annotate negative and doublet cells based on their sample barcode reads alone. Three transcriptome-based doublet detection methods in the scds R package (v 1.2.0) were used to further assign doublet annotations to each cell, and cells labelled as doublets by at least three methods were removed. Transcriptome-based cell filtering as part of the Seurat pipeline removed cells with fewer than 2000 or greater than 7500 detected genes; fewer than 5000 or greater than 50,000 total read counts; or mitochondrial reads accounting for greater than 25% of total reads.
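By way of non-limiting illustration, the cell filtering thresholds above may be expressed as a single predicate. The analysis itself was performed in R with Seurat and scds; this Python sketch only restates the thresholds. The function and argument names are hypothetical, and the doublet vote is assumed to count how many of the doublet annotations (the barcode-based call and the three transcriptome-based calls) flag the cell.

```python
def keep_cell(n_genes, n_counts, pct_mito, doublet_votes):
    """Return True if a cell passes the stated quality-control thresholds."""
    if doublet_votes >= 3:                # flagged as doublet by at least three methods
        return False
    if not 2000 <= n_genes <= 7500:       # detected genes outside 2000-7500
        return False
    if not 5000 <= n_counts <= 50_000:    # total read counts outside 5000-50,000
        return False
    if pct_mito > 25:                     # mitochondrial reads above 25% of total
        return False
    return True

# Hypothetical cells: (detected genes, total counts, % mitochondrial, doublet votes).
cells = {
    "good_cell": (3000, 20_000, 5.0, 0),
    "low_complexity": (1500, 20_000, 5.0, 0),
    "high_mito": (3000, 20_000, 30.0, 0),
    "likely_doublet": (3000, 20_000, 5.0, 3),
}
kept = [name for name, metrics in cells.items() if keep_cell(*metrics)]
```

Cells lying exactly on a threshold (for example 2000 detected genes or 25% mitochondrial reads) are retained in this sketch, since the text removes only cells strictly below or above the limits.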

[000139] Following filtering, sample barcodes were assigned to the remaining cells based on the barcode with the highest expression in each cell.

[000140] Downstream analysis of single-cell data. Normalisation, UMAP dimensionality reduction, and clustering of the data were done following the standard Seurat pipeline. The clustering resolution used was 0.2, and we assigned cell type labels by interrogating marker gene expression in each cluster. To visualise marker gene expression in the UMAP plots, we used the R package Nebulosa (v0.99.92), which represents gene expression using kernel density estimation to account for overplotting and noise from expression drop-out.

[000141] For RNA velocity estimation, we used velocyto (v0.17.15) to count spliced and unspliced transcripts from the 10x cellranger output using the ‘run10x’ command. The resulting count matrices were then input into the scVelo (v0.2.1) pipeline for pre-processing, stochastic RNA velocity estimation and embedding onto the UMAP coordinates generated from the Seurat pipeline.

[000142] Results

[000143] The inventors adapted a well-established monolayer-based cardiac differentiation protocol, which firstly guides cells towards mesendoderm lineages by small molecule activation of WNT signalling (Figure 3a). Based on this protocol, we either captured cells every 24 hours from day 2 to day 9 to generate a high resolution time course of cell states during differentiation or perturbed known developmental signalling pathways between day 3 and day 5 and sampled cells on days 2, 5 and 9 (Figure 3b). For single cell RNA sequencing, samples were multiplexed using two different approaches. While the 8 time course samples were labelled with commercially available cell hashing antibodies, genetically engineered barcode iPS cell lines produced according to the methods outlined in Example 1 were used to combine a total of 54 sequencing reactions into only 3 libraries (Figure 3c, 3e). Both experiments were sequenced using the 10X Chromium platform and samples were demultiplexed according to their barcoding method (Figure 3d).

[000144] Temporal dissection of mesendoderm differentiation. Demultiplexing, doublet calling and preprocessing of the sequencing data were performed using 4 different doublet detection algorithms, and low quality cells were removed from the dataset based on UMI and feature counts, as well as the percentages of mitochondrial and ribosomal RNA content (Figure 8). After filtering, 13,682 cells were subjected to normalization, UMAP dimensionality reduction, and unsupervised clustering following the standard pipeline of Seurat, which identified 10 distinct clusters of both endodermal and mesodermal cell lineages (Figure 4a). The inventors analyzed cell populations based on their day of appearance within the time course (Figure 4b-c) alongside known marker gene expression to identify transcriptional phenotypes of subpopulations (Figure 4d-e). On days 2 and 3 of differentiation, cells divided into either FOXA2-positive definitive endoderm (cluster 7) or MIXL1-expressing mesendoderm (cluster 6). Subsequently, we found 3 separate groups of cells spanning day 4 through to day 9, one of which was a small population of CDH5-positive endocardial endothelium (cluster 8). The other two groups each comprised multiple clusters, of either endodermal identity (clusters 0, 1 and 3, all expressing FOXA2) or mesodermal identity (clusters 2, 4, 5 and 9). The endoderm cells on day 4 consisted entirely of primitive gut endoderm (cluster 3) highly expressing the transcription factors FOXA2 and GATA3. SOX2-positive anterior foregut cells arose on day 5, declining in cell numbers towards later stages of differentiation, whereas TTR-positive posterior foregut was first detected on day 6 and persisted until day 9. The mesodermal cells on day 4 were all HAND1-positive cardiac progenitor cells belonging to cluster 2, whilst on days 5 to 8 multiple populations existed in parallel. We detected paraxial and lateral plate mesoderm (PAX3⁺ and PRRX1⁺, cluster 4) from day 5 to day 7, a small population of NOG⁺/T⁺ axial mesoderm (cluster 9) on days 6 and 7, as well as a MYH6⁺ cardiomyocyte population (cluster 5) from as early as day 5 onwards (Figure 4c-e).

[000145] To predict future states of individual cells, identify branching points of lineages and further examine transcription kinetics, we applied RNA velocity to our time course dataset (Figure 4f). Firstly, this analysis verified the initial separation on day 2 of clusters 6 and 7 into a mesendoderm population and definitive endoderm. It also showed a clear direction of transcription kinetics in cluster 8 that overlaps with the time points of cell capture. Most importantly, it highlighted the complexity of transient progenitor cell states in both endodermal and mesodermal lineages.

[000146] Taken together, these data show iPSC differentiation into committed endodermal and mesodermal cell types via multiple different progenitor cell populations in a highly dynamic process.

[000147] Using a directed differentiation protocol for derivation of mesendodermal cell types (Figure 3a), we designed an experiment for systematic activation or inhibition of known developmental signalling pathways, namely the Wnt, BMP, and VEGF pathways (Figure 3b). Barcoded cells were split into two groups to enable analysis of biological duplicates for each time point and condition. Cells were captured at day 2, prior to perturbation, and then at day 5, after the treatment perturbation (days 3-5). Lastly, committed cell types were captured on day 9 of differentiation. A total of 54 experimental samples were mixed based on unique barcode combinations into 3 multiplexed samples and submitted for library preparation and sequencing. Computational demultiplexing of barcodes enabled each transcript to be mapped to a specific cell and each cell to be mapped to a specific experimental condition and time point.
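By way of illustration only, the mapping of each cell back to its experimental sample can be sketched as a lookup keyed on the sequencing library and the genetic barcode. The sketch assumes that the 54 samples are spread evenly over the 3 libraries, so that 18 distinct barcodes are used within each library; the library names, barcode labels and sample labels are hypothetical placeholders and are not taken from the experiment.

```python
from itertools import product


def build_sample_sheet(libraries, barcodes, samples):
    """Pair every (library, barcode) combination with one sample label."""
    keys = list(product(libraries, barcodes))
    if len(keys) != len(samples):
        raise ValueError("need exactly one sample per (library, barcode) pair")
    return dict(zip(keys, samples))


def demultiplex(cells, sheet):
    """Map each cell to its sample.

    cells: iterable of (cell_id, library, barcode) tuples, where the
    barcode has already been assigned to the cell.
    """
    return {cell_id: sheet[(lib, bc)] for cell_id, lib, bc in cells}


# Hypothetical labels: 3 libraries x 18 barcodes = 54 samples.
libraries = ["lib1", "lib2", "lib3"]
barcodes = [f"BC{i:02d}" for i in range(1, 19)]
samples = [f"sample_{i:02d}" for i in range(1, 55)]

sheet = build_sample_sheet(libraries, barcodes, samples)
assigned = demultiplex([("cell_a", "lib2", "BC01")], sheet)
print(len(sheet), assigned)
```

Because the same barcode can be reused across libraries, the library identity is part of the key; a barcode alone identifies a sample only within one multiplexed library.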

[000148] After data quality control, we successfully captured 48,526 cells, which are represented using dimensionality reduction methods (Figure 5a). Analysis of cells by time point, sequencing library, and treatment condition demonstrates effective analysis of multiplexed data consistent with the original experimental design (Figure 5b-c). Marker genes are used to identify the diverse cell types captured by the multiplexed data (Figure 5d-e). Quantitative analysis of all input treatment conditions is used to identify how different treatments contribute to various cell types (Figure 5f).

[000149] Taken together, these data demonstrate the success of deriving engineered barcode iPSCs for multiplexed single cell analysis. Barcoded cell lines have the potential for utility in diverse multiplexing endpoints, including those not yet developed, in which analysis of unique barcodes embedded in the genomic DNA or RNA of input cell types provides a way of multiplexing endpoints for scalable analysis of iPSCs or differentiated cell types.

Example 3 - Comparative Example Cell Hashing libraries vs Internal Barcode libraries

[000150] In this example 3 RNA-seq libraries were created as described in Example 1 (i.e. using barcoded iPSC cell lines) resulting in three sets of data comprising three reactions each:

Dataset 1: Cell hashing library: seu_bc0Xav, seu_bc5Xav, seu_bcDox

Dataset 2: Internal barcoding library with separately amplified barcoding reads: seu1, seu2, seu3; and

Dataset 3: Internal barcoding where barcoding reads were obtained solely from the transcriptome library: lib1, lib2, lib3

The internal barcoding libraries (datasets 2 and 3) have the same transcriptomes, but represent two different methods of detecting the barcoding reads.

[000151] The first dataset has very consistent cell numbers (19997, 19995, 19991), whereas the second and third datasets were more overloaded and initially had more cells (24721, 24828, 26583).

[000152] Since the barcoding reads were sequenced separately to the transcriptome in dataset 2, the cell barcodes that were not picked up in both the libraries were excluded, resulting in fewer cells in this dataset (19656, 19708, 19349).
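By way of illustration only, the exclusion of cell barcodes that were not detected in both the transcriptome library and the separately sequenced barcode library amounts to taking an intersection. The following is a minimal sketch; the function name and the example cell barcodes are hypothetical.

```python
def shared_cell_barcodes(transcriptome_cells, barcode_cells):
    """Keep the cell barcodes that were detected in both libraries.

    The order of the transcriptome library is preserved so that the
    result can be used directly to subset the expression matrix.
    """
    detected = set(barcode_cells)
    return [cb for cb in transcriptome_cells if cb in detected]


transcriptome_cells = ["AAAC", "AAAG", "AACT"]   # hypothetical cell barcodes
barcode_cells = ["AACT", "AAAC", "TTTT"]         # hypothetical cell barcodes
kept = shared_cell_barcodes(transcriptome_cells, barcode_cells)
print(kept)
```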

[000153] The number of genes, library size, and percentages of mitochondrial and ribosomal reads were compared as measures of transcriptome quality. As a measure of the efficiency of barcoding using the cell hashing method and the internal barcoding method described herein, the % of cells assigned singlet (confident) barcodes and the % of cells assigned no barcode were compared, with filtering for transcriptome quality wherein cells with a high number of genes and outlier library size are excluded.
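By way of illustration only, the two barcoding efficiency measures (the % of cells assigned a singlet barcode and the % of cells assigned no barcode) can be computed from per-cell barcode calls as sketched below. The call labels "singlet", "doublet" and "negative" are assumptions made for the example; the text names only the singlet and no-barcode categories.

```python
from collections import Counter


def barcode_call_rates(calls):
    """Return the percentage of cells in each barcode call category.

    calls: one label per cell. The labels used here ("singlet",
    "doublet", "negative") are hypothetical; "negative" stands for
    cells to which no barcode was assigned.
    """
    if not calls:
        raise ValueError("no cells to summarise")
    counts = Counter(calls)
    total = len(calls)
    return {
        label: 100.0 * counts[label] / total
        for label in ("singlet", "doublet", "negative")
    }


example_calls = ["singlet"] * 8 + ["doublet"] + ["negative"]
rates = barcode_call_rates(example_calls)
print(rates)
```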

[000154] Results

[000155] Comparisons of transcriptomes of cells prepared using the cell hashing method and those prepared using the internal barcoding strategy described herein demonstrate that the percentage of mitochondrial reads is significantly higher in cells prepared using cell hashing methods (Figure 6). Percent mitochondrial reads: dataset 1 vs. dataset 2: p = 0.01226; dataset 1 vs. dataset 3: p = 0.007659; dataset 2 vs. dataset 3: p = 0.5867.

[000156] These data highlight that cells prepared using the internal barcoding methods described herein are not as stressed at the time of cell capture and yield transcriptomes of higher quality. Without wishing to be bound by theory, this is likely due to the longer incubation and handling times required by the hashing method, which requires the addition of hashing antibodies.

[000157] As detailed above, datasets 1 and 2 each have two libraries generated: one for the transcriptome and one for the barcodes. Dataset 3 was generated by sequencing the internal barcodes as part of the transcriptome library only (no separate library for the barcodes). The inventors have surprisingly found that the internal barcodes can be picked up from the transcriptome library with efficiency equal to that obtained by sequencing the barcodes separately (Figure 7). Indeed, there are slightly more singlet cells in the internal barcoding dataset. These findings have significant implications for the reduction of cost and complexity of workflows, since an operator utilizing the internally barcoded cells and methods described herein need only sequence a single RNA library to parse genes back to cells and cells back to experimental conditions.

Claims

CLAIMS

3. The method of claim 1 or 2 wherein the step of generating a multiplexed single cell RNA-seq library comprises: creating one or more gene expression libraries; creating one or more separate barcode libraries via creation of purified cDNA and amplification of regions of said cDNA comprising said genetic barcode; and pooling said gene expression library and said barcode library.

4. The method of claim 3, wherein more than one gene expression library and more than one barcode library are created and pooled.

5. The method of claim 3 or 4, wherein the one or more gene expression libraries comprise approximately 90% of the pool and the one or more barcode libraries comprise approximately 10% of the pool.

6. The method of any one of the preceding claims, wherein the step of generating a multiplexed single cell RNA-seq library comprises creating a gene expression library via creation of purified cDNA and amplification of said cDNA but does not include the generation of a separate barcode library.

7. A method for multiplexed single cell analysis comprising: providing a plurality of isogenic populations of cells, wherein the cells of each of said populations comprise a genetic barcode unique to each population, wherein said genetic barcode is stably integrated into the genome of the cells of each population at a targeted location; manipulating one or more of said populations of cells or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said populations of cells or progeny thereof.

8. A method for multiplexed single cell analysis of an induced pluripotent stem cell (iPSC) or progeny thereof comprising: providing a plurality of isogenic iPSC populations into which a genetic barcode unique to each population is stably integrated into the genome of the cells of each population at a targeted location, and/or progeny of said iPSC populations; manipulating one or more of said iPSC populations or progeny thereof; generating a multiplexed single cell sequencing library from said plurality of said iPSC populations or progeny thereof and sequencing said library; mapping a read of sequence information from said library to a single cell; de-multiplexing said library using the genetic barcodes; and mapping said single cell to one of said iPSC populations or progeny thereof.

9. The method of any one of the preceding claims, wherein providing a plurality of isogenic populations of cells or iPSCs comprises incorporating said genetic barcode into said targeted location via CRISPR/Cas9-mediated integration.

10. The method of any one of the preceding claims, wherein said genetic barcode is incorporated into a genomic safe harbor locus.

11. The method of claim 10 wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).

12. The method of any one of the preceding claims wherein said genetic barcode is fluorescently labelled.

13. The method of any one of the preceding claims wherein said genetic barcode is from 5 - 20 bp.

14. The method of claim 13 wherein said genetic barcode is 15 bp.

15. The method of claim 13, wherein said genetic barcode is selected from the group consisting of:

GTGCCGACCAGTATC (SEQ ID NO: 1);

ACCACCTGACGCAAA (SEQ ID NO: 2);

ACGGCCCTATTTAAG (SEQ ID NO: 3);

AGCCCTGAGTCAGTA (SEQ ID NO: 4);

CAAATTCAAGGCGAT (SEQ ID NO: 5);

AATCTTGTATAAGTA (SEQ ID NO: 6);

CGTCACATTTGAGTC (SEQ ID NO: 7);

GGACCTTCTTACGAC (SEQ ID NO: 8);

TACCAATTGTACGCT (SEQ ID NO: 9);

CGCTAATGTCCGTTT (SEQ ID NO: 10);

ACCCTACGGTGGTTC (SEQ ID NO: 11);

TGTCCAAGCTGCAAT (SEQ ID NO: 12);

GTGTATTTAAAGCCG (SEQ ID NO: 13);

ACACCCGTATGTCAC (SEQ ID NO: 14);

TCTTTCGATGGCGGT (SEQ ID NO: 15);

GAGCACCCGCGTATT (SEQ ID NO: 16);

TTATTATGTTCTAGC (SEQ ID NO: 17); and

AATCTCTGAAACGAA (SEQ ID NO: 18).

16. The method of any one of the preceding claims, wherein prior to generating said multiplexed RNA-seq library, one or more of the populations of cells and/or progeny thereof are mixed together with one or more other populations of cells and/or progeny thereof.

17. The method of any one of the preceding claims, wherein manipulating one or more of the populations of cells or iPSCs or progeny thereof comprises contacting the cells with an agent of interest which results in a biologically measurable perturbation to a cell.

18. The method of any one of the preceding claims, wherein manipulating one or more of the populations of cells or progeny thereof comprises altering the culture conditions of, or genetically perturbing the cells of the one or more populations or progeny thereof.

19. The method of claim 18, wherein altering the culture conditions comprises contacting the cells or progeny thereof with an agent of interest, contacting the cells or progeny thereof with another cell, co-culturing the cells or progeny thereof with another cell, or co-culturing the cells and/or progeny thereof in an organoid.

20. The method of claim 17, 18 or 19, wherein the agent of interest is a small molecule, a polypeptide, an antibody, a nucleic acid molecule, an RNAi, a vector comprising a nucleic acid molecule, an antisense oligonucleotide, or a gene editing system (e.g. CRISPR/Cas9).

21. The method of any one of claims 17 to 20 when used in a high-throughput drug screening assay.

22. The method of any one of the preceding claims wherein said progeny are differentiated progeny.

24. The plurality of isogenic populations of cells, or progeny thereof, of claim 23, wherein said genetic barcode is incorporated into a genomic safe harbor locus.

25. The plurality of isogenic populations of cells, or progeny thereof, of claim 24, wherein said genetic barcode is incorporated into the adeno-associated virus site 1 (AAVS1).

26. The plurality of isogenic populations of cells, or progeny thereof, of any one of claims 23 - 25, wherein said genetic barcode is fluorescently labelled.

27. The plurality of isogenic populations of cells, or progeny thereof, of any one of claims 20 - 22, wherein said genetic barcode is from 10 - 20 bp.

28. The plurality of isogenic populations of cells, or progeny thereof, of claim 24, wherein said genetic barcode is 15 bp.

29. The plurality of isogenic populations of cells, or progeny thereof, of claim 25, wherein said genetic barcode is selected from the group consisting of:

GTGCCGACCAGTATC (SEQ ID NO: 1);

ACCACCTGACGCAAA (SEQ ID NO: 2);

ACGGCCCTATTTAAG (SEQ ID NO: 3);

AGCCCTGAGTCAGTA (SEQ ID NO: 4);

CAAATTCAAGGCGAT (SEQ ID NO: 5);

AATCTTGTATAAGTA (SEQ ID NO: 6);

CGTCACATTTGAGTC (SEQ ID NO: 7);

GGACCTTCTTACGAC (SEQ ID NO: 8);

TACCAATTGTACGCT (SEQ ID NO: 9);

CGCTAATGTCCGTTT (SEQ ID NO: 10);

ACCCTACGGTGGTTC (SEQ ID NO: 11);

TGTCCAAGCTGCAAT (SEQ ID NO: 12);

GTGTATTTAAAGCCG (SEQ ID NO: 13);

ACACCCGTATGTCAC (SEQ ID NO: 14);

TCTTTCGATGGCGGT (SEQ ID NO: 15);

GAGCACCCGCGTATT (SEQ ID NO: 16);

TTATTATGTTCTAGC (SEQ ID NO: 17); and

AATCTCTGAAACGAA (SEQ ID NO: 18).

30. The plurality of isogenic populations of cells of any one of claims 23 - 29, wherein the cells are iPSCs or progeny thereof.

31. The progeny of cells of any one of claims 23 - 30, wherein the progeny are differentiated progeny.