WO2016061318A1 - Smart reporter cells and methods of making and using same - Google Patents

Smart reporter cells and methods of making and using same

Info

Publication number
WO2016061318A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
cells
agents
marker
library
Application number
PCT/US2015/055681
Other languages
French (fr)
Inventor
Steven J. Altschuler
Lani Wu
Original Assignee
Altschuler Steven J
Lani Wu
Application filed by Altschuler Steven J, Lani Wu
Publication of WO2016061318A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/58 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances
    • G01N33/582 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving labelled substances with fluorescent label
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5008 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
    • G01N33/502 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics for testing non-proliferative effects
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/43504 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans from invertebrates
    • G01N2333/43595 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans from invertebrates from coelenteratae, e.g. medusae

Definitions

  • An appropriate resolution (pixels per image) of the digitized image must be selected, whether the images are originally acquired by digital means or are scanned from conventional micrographs.
  • resolution is typically selected so that features of interest (e.g., whole cells, nuclei, or centromeres) comprise a sufficient number of pixels that their morphological characteristics (e.g., average diameter, area, perimeter, shape factor) may be determined with a sufficient accuracy at the selected magnification, while not exceeding available computing power and/or data storage.
  • a camera with very fine resolution i.e., a large number of pixels per imaged frame
  • a higher magnification may be used. In such cases, more image frames may be acquired for each specimen in order to image a statistically significant number of cells.
  • 2-deoxyglucose uptake and the intracellular localization (plasma membrane, microsomal, and cytosolic fractions) of GLUT4 may be measured in C2C12 myocytes/myotubes and 3T3-L1 adipocytes. Glycogen concentrations in C2C12 myocytes/myotubes may also be measured.
  • gluconeogenic gene expression mRNA levels of gluconeogenic genes, including PEPCK, G6Pase, and PGC1α, in AML12 cells may be measured by real-time quantitative PCR. With each different collection of readouts, one can apply the invention to identify a smart reporter that can be used as previously described, for example to predict compound mechanisms from among a plurality of distinct modes of action.
  • the invention also provides a system for carrying out the inventive methods.
  • the system may include some or all of the hardware and software necessary to practice the inventive technology.
  • the system may include microscopes, microprocessors, data storage devices, robots, fluid handling devices, plate reader, automatic pipetters, software, printers, plotters, displays, etc.
  • the system may include a microscope able to acquire images at various magnifications and/or resolutions, a microprocessor, and software for carrying out the image analysis and the statistical analysis of the raw data derived from the images.
  • the system includes the hardware and/or software necessary to calculate statistics, such as but not limited to Kolmogorov-Smirnov statistics.
  • the system includes the hardware and/or software necessary to perform clustering analysis.
  • CD-tagging To monitor cellular responses to compounds, Central Dogma (CD)-tagging was used to build live cell reporters. CD-tagged proteins serve as reliable biomarkers of cellular responses to compounds. CD-tagging permits monitoring of time-dependent drug response while avoiding costly antibodies and toxic chemical labeling compounds.
  • a pSeg-tagged parent A549 clone was selected expressing H2B-CFP fusion protein in the nucleus and mCherry in the cytoplasm (FIG. 2, left) for automatic segmentation of cell boundary.
  • a multistep pipeline was implemented as follows: (1) generate the CD-tagging retrovirus in a retrovirus packaging cell line; (2) infect CD-tagging retrovirus into A549 pSeg clones; (3) use FACS to sort single CD-tagged clones into 384-well plates; (4) confirm positively fluorescent clones by microscope; and (5) identify CD-tagged genes by 3' RACE or 5' RACE in HT (96 well) format, and optionally, compare localization of expressed proteins to published reports.
  • FIG.3 shows the distribution of the 152 reporter cell lines by functional pathway.
  • Images obtained in the previous step are analyzed via an image-analysis pipeline as shown in FIG. 5.
  • KS non-parametric Kolmogorov-Smirnov
  • a collection of triply-labeled live-cell reporter cell lines was established (Fig. 14a).
  • pSeg Central Dogma
  • CD-tagged proteins typically express at endogenous levels and have preserved functionality, though for our profiling purposes they are only required to serve as reliable biomarkers of cellular responses to compounds.
  • A549 non-small cell lung cancer cell line was used as it is amenable both to transfection assays (high rates of transfection) and imaging studies (cells do not tend to clump and can be more easily identified by computer).
  • a pSeg-tagged parent A549 clone expressing the nuclear and cellular reporters was selected. (Stable pSeg-tagged A549 clones were grown for tens of passages without signs of reduced expression.) With this clone, ~600 triply-labeled A549 reporter clones were generated (Fig. 15a). The CD-tagged genes were then identified by 3' RACE or 5' RACE. From this large collection, 93 reporters were selected that were tagged for distinct proteins, were placed in diverse GO-annotated functional pathways, and had detectable YFP levels by microscopy.
  • KS Kolmogorov-Smirnov
  • the XRCC5 reporter was used to perform a large-scale phenotypic screen of small-molecule compound libraries. These libraries included: the NCI approved oncology drug set IV (101 compounds), the NCI diversity set IV (1596 compounds), the NCI natural product set III (117 compounds), the Prestwick FDA approved drug set (1100 compounds), and the UTSW 8K set (8,000 compounds). All compound libraries were assayed at three different concentrations, except the Prestwick and UTSW sets, which were assayed at a single concentration due to their large sizes. Additionally, a "reference" drug set was included, which expanded the previous "test" set to 10 drug classes affecting diverse biological processes (Fig. 17).
  • a multi-step strategy to identify "hits" was taken.
  • feature space was transformed and its dimensionality reduced to maximally separate reference drug classes from one another.
  • Linear discriminant analysis (LDA) was applied to the collection of reference drug profiles to identify an "optimal" transform that increased separation of profiles across drug classes while decreasing separation of profiles within each class.
  • LDA Linear discriminant analysis
  • our unknown compounds were assigned to the nearest reference drug class.
  • a nearest-neighbor approach was applied to the LDA-transformed space to assign each unknown compound to the category of its nearest reference drug.
  • confidence scores were calculated, ranging from 0 (low) to 1 (high), for each prediction.
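
The nearest-neighbor assignment described in the last three items can be sketched as follows. This is a minimal illustration, not the applicants' implementation: it assumes the profiles have already been projected into the LDA-transformed space, the function name `nearest_class` and the class labels are hypothetical, and the confidence formula is an assumption chosen only because it ranges from 0 to 1 (the source does not state how its confidence scores were computed).

```python
import math

def nearest_class(profile, references):
    """Assign a profile to the class of its nearest reference profile.

    references: list of (class_label, vector) pairs, assumed to be already
    projected into the LDA-transformed space.
    Returns (label, confidence) with confidence in [0, 1].
    """
    # Distance from the query to every reference profile, nearest first.
    ranked = sorted((math.dist(profile, vec), label) for label, vec in references)
    d_best, label = ranked[0]
    # Distance to the nearest reference that belongs to a different class.
    d_other = next((d for d, lab in ranked if lab != label), math.inf)
    # Hypothetical score (an assumption, not from the source): 0 when two
    # classes are equally close, 1 when the query coincides with a reference
    # of the predicted class.
    if d_other == 0:
        confidence = 0.0
    else:
        confidence = 1.0 - d_best / d_other
    return label, confidence

# Hypothetical reference set with two drug classes in a 2-D transformed space.
refs = [("class_a", (0.0, 0.0)), ("class_a", (1.0, 0.0)), ("class_b", (10.0, 0.0))]
label, confidence = nearest_class((2.0, 0.0), refs)
```

A ratio of distances is used here so that a compound lying midway between two classes receives a low score; other choices, such as voting among several neighbors, would fit the description equally well.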

Abstract

The present invention provides cell reporters and methods of using the same for high throughput screens of libraries of candidate agents. The cells and methods also provide for ability to predict similar mechanisms of actions of agents without a priori knowledge of endogenous biomarkers or signaling pathways.

Description

PATENT APPLICATION
FOR
SMART REPORTER CELLS AND METHODS OF MAKING AND USING SAME
BY
STEVEN J. ALTSCHULER
AND
LANI WU
CROSS-REFERENCE
This application claims the benefit of U.S. Provisional Application No. 62/064,923 filed October 16, 2014, which application is incorporated herein by reference.
BACKGROUND
Despite the modern explosion of information regarding the molecular pathways that become dysregulated in disease, there have been surprisingly few new drug approvals. The number of people with diseases such as obesity and type-2 diabetes is staggering; yet, most of the front-line therapies, such as metformin, sulfonylureas and thiazolidinediones, can have serious side effects and have been on the market for decades with little or no improvement. Likewise, cancer is the leading cause of death in developed countries and the second leading cause of death in developing countries; yet, clinicians lack a diverse arsenal of drugs needed to treat subpopulations of patients more effectively, to reduce side effects, and to offer second-line treatment when drug resistance emerges.
The wealth of new mechanistic information, however, has not translated into new and better drugs. Current drug-discovery approaches typically focus on single molecular targets rather than diseases as a whole. Yet, diseases are complex, affecting multiple pathways and/or tissue types. In fact, many highly successful drugs were discovered because of their biological effects and without knowing their molecular targets or mechanisms of action. For example, the blockbuster drug metformin has complex systems-level effects, and acts through multiple pathways and tissue types rather than a single pathway.
SUMMARY
The present disclosure provides a cell comprising at least one heterologous marker, wherein the marker responds in a first and second characteristic manner in response to a first and second agent and said respective characteristic manners are indicative of the mechanism of action of said agents based on the characteristic response of the marker to a plurality of agents having known mechanisms of action.
In another embodiment the principles of the present disclosure are directed to a cell comprising at least one heterologous marker, wherein the response of the marker induced by an agent is capable of predicting a mechanism of action of the agent in view of its similarity to the response of the marker to a plurality of agents having different mechanisms of action.
In another embodiment, the principles of the present disclosure are directed to a cell comprising at least one heterologous marker, wherein the cell is capable of predicting the mechanism of action of an agent by the agent's ability to induce a phenotype in the cell that is similar to the phenotype induced by known agents.
In another embodiment, the principles of the present disclosure are directed to a cell comprising at least one heterologous marker, wherein the cell is capable of predicting the mechanism of action of a plurality of agents by the agents' ability to induce a phenotype in the cell that is similar to the phenotype induced by known agents.
In another embodiment, the principles of the present disclosure are directed to a library of smart reporter cells, each capable of predicting a bioactivity of a plurality of agents by the agents' ability to induce a phenotype of the cells that is similar to the phenotype induced by known agents.
In addition, the principles of the present disclosure provide for the above described library, wherein each of the cells comprises a heterologous marker. In another embodiment, the principles of the present disclosure are directed to a method of predicting mechanism of action of an agent comprising contacting a library of cells as described above with an agent, monitoring the marker, comparing the marker phenotype to marker phenotypes induced by a plurality of agents of known mechanism and predicting the mechanism of action of the agent based on a similarity in marker response between the response induced by the agent and at least one of the responses induced by at least one agent of known mechanism.
In another embodiment, the principles of the present disclosure are directed to a method of making a smart reporter cell comprising inserting a heterologous marker into the genome of a plurality of cells and propagating clones of the cells to form a first cell library, contacting the first cell library with a first and second agent with first and second mechanisms of action, respectively, monitoring the activity of the marker in response to the first and second agents, contacting the first cell library with a third and fourth agent with first and second mechanisms of action, respectively, monitoring the activity of the marker in response to the third and fourth agents and selecting cells from the first library that respond substantially the same to the first and third agents and to the second and fourth agents. In general, the responses to agents with different mechanisms of action are substantially different such that the mechanisms of action can be distinguished.
In another embodiment, the principles of the present disclosure are directed to a method of making a library of tissue specific smart reporter cells comprising inserting a heterologous marker into the genome of stem cells and propagating clones of the cells to form a first stem cell library, contacting the first stem cell library with a first and second agent with first and second mechanisms of action, respectively, monitoring the activity of said marker in response to the first and second agents, contacting the first stem cell library with a third and fourth agent with first and second mechanisms of action, respectively, monitoring the activity of the marker in response to the third and fourth agents, selecting stem cells from the first library that respond substantially the same to the first and third agents and the second and fourth agents and differentiating the selected stem cells into different cell types.
In another embodiment, the principles of the present disclosure are directed to a method of making a library of tissue specific smart reporter cells comprising inserting a heterologous marker into the genome of stem cells and propagating clones of the cells to form a first stem cell library, differentiating cells in the first stem cell library into different cell types to form a differentiated cell library, contacting the differentiated cell library with a first and second agent with first and second mechanisms of action, respectively, monitoring the activity of said marker in response to said first and second agents, contacting said differentiated cell library with a third and fourth agent with first and second mechanisms of action, respectively, monitoring the activity of said marker in response to the third and fourth agents, selecting cells from the differentiated cell library that respond substantially the same to the first and third agents and the second and fourth agents.
In another embodiment, the principles of the present disclosure are directed to a method of screening for bioactive agents that target a particular cell type, said method comprising contacting said tissue specific libraries as described herein with a first agent and monitoring the effects of the agent on the markers in each of the differentiated cell types.
In another embodiment, the principles of the present disclosure are directed to a database comprising the detected results of marker activity in response to a plurality of agents obtained from the smart reporter cells as described herein.
DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an overview of one embodiment of the invention described herein, showing phenotypic profiles of compounds visualized as points in low-dimensional scatter plots;
FIG. 2 shows plasmid maps and sample images for pSeg (left: nucleus/blue; cytosol/red) and pCDtag (right: tagged protein/yellow) in A549 cells. SA/SD: splice acceptor/donor. Scale bar 10 μm;
FIG. 3 provides an overview of protein function in a reporter library according to an embodiment of the invention disclosed herein;
FIG. 4 shows responses of 6 reporters (rows with gene names) to drugs (columns) of different drug classes. YFP channel is shown. Scale bar 10 μm;
FIGS. 5A-C provide an overview of how (A) compounds are transformed to (C) profiles (blue triangles). B. Reporter clone responses to a compound are measured and summarized as a multidimensional profile. C. Compound profiles can be visualized as low-dimensional points via Principal Component Analysis (PCA) or Multi-Dimensional Scaling (MDS) plots;
FIGS. 6A-B show how profiles distinguish drug classes according to one embodiment of the invention disclosed herein. A. Heat maps for visualizing profiles: y-axis: 89 reporters x about 80 features/reporter; x-axis: 9 time points spaced 6 hrs apart; red/green are +/- feature distribution shifts vs. control (as in FIG. 5). Feature orders in all plots are the same. B. Profiles from 4 replicate plates group together (compounds are labeled as in A);
FIGS. 7A-C show another embodiment of the invention. A. MDS projection of profiles from compounds in the NCI Challenge set. Black squares are unknowns. Guilt-by-association predictions of drug class are made for compounds "A2" (NSC693322), "A3" (NSC637578), and "C2" (NSC640499). Ellipses suggest functional groupings. B-C. Experimental validation for three predicted DNA-replication inhibitors. Candidate drugs show: B) increased DNA-damage (indicated by p-P53 & p-H2A.X staining) and C) cell-cycle arrest in G1, similar to camptothecin;
FIG. 8 shows data regarding the relationship between reporters and total discriminatory power (DPTotal). (Top) Profiles from a single reporter and time point have a wide range of DPTotal. (Bottom) Relationship between number of reporters, time points and DPTotal. Total treatment time: 48 hrs;
FIG. 9 is a scatterplot showing that the most informative reporters are largely the same whether they were trained on 4 or 5 of our test drug classes;
FIG. 10 provides an overview of development and selection of informative reporter cell lines for profiling large chemical libraries;
FIG. 11 shows photographs of eight different lung cancer cell lines tagged with the pSeg labeling construct according to an embodiment of the invention disclosed herein;
FIG. 12 provides an overview of discovery of new compound leads through profiling medium-sized chemical libraries;
FIGS. 13A-B illustrate plate layouts for positive (+) and negative (-) controls. FIG. 13A shows controls positioned in interior plate wells to avoid edge effects. FIG. 13B shows controls undesirably present in outside wells.
FIG. 14 provides an overview of one embodiment of the approach, including: (a) tagging cells with live-cell reporters of cytosolic and nuclear area and protein expression; (b) sample images of cellular responses across different reporters (rows) to different categories of drug perturbations (columns); (c) time traces of reporter cell lines to drug perturbations (circle at end of line is 48 hr time point).
FIG. 15 illustrates the diverse range of abilities of different reporters to discriminate compounds by mechanisms of action. In this embodiment, the reporter at the very left is the "smartest" reporter among the collection of reporter cell lines. At the bottom are shown three sample multi-dimensional scaling representations of profiles from three different reporters; the "smarter" the reporter, the more profiles from drugs of similar mechanism (same colored points) will group and drugs of different mechanism (different colored points) will separate.
FIG. 16 illustrates the results of a phenotypic screen using the "smart" reporter chosen by our methods in FIG. 15. (a) Approximately 10,000 compounds were screened using the "smart" reporter for activities in 10 different mechanistic drug classes, (b) Compounds with activity in one of the 10 different classes, or activity away from the cloud of negative control DMSO profiles, were deemed "hits." These 431 hits were rescreened with the smart reporter. In this embodiment, compounds that gave the same prediction of mechanism were dubbed "reproducible hits."
FIG. 17 illustrates that secondary studies validate that the predictions made by our "smart" reporter in FIG. 16 are accurate across diverse mechanisms of action. Top: False discovery rates (FDR; y-axes) were calculated for 38 reference drugs (FDRRef, solid line) or 175 reproducible compound "hits" (FDRRH, dashed line) at different thresholds for readouts selected in each validation assay (x-axes; Online Methods). Vertical dark gold dashed lines: readout thresholds at FDRRef = 0.1 (horizontal dark gold dashed line). Middle: Readout values (x-axes) of DMSO, reference drugs (at five, 5-fold serial dilutions), and 175 reproducible hits (RH) are shown for each validation experiment. Reference drugs are grouped according to MOA; each point represents a drug-concentration instance, and different concentrations of the same drug are connected by grey lines. Circle size reflects the concentration (larger size indicates higher concentration). Reproducible hits (RHs) that were predicted to belong to the MOA being validated are highlighted with corresponding colors. Bottom: representative images of cells treated with DMSO (-), positive control reference drugs (+), and RHs (?) indicated by black arrows in middle panel. Scale bar = 10 μm.
FIG. 18 illustrates that a "smart" reporter can identify novel compound groupings. Pairwise distances between profiles among reference drugs and compound hits from our secondary phenotypic screen were computed. Clusters were identified by hierarchical clustering (right, red vertical line is the threshold for determining clusters, see Online Methods). Two clusters were found to mainly consist of previously annotated compounds: Glucocorticoid (purple) and Dihydrofolate reductase (DHFR) inhibitors (cyan); neither class was used in the selection of the smart reporter.
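
The clustering step in FIG. 18 (pairwise distances between profiles, then a threshold that determines clusters) can be sketched as below. This is a hedged illustration under stated assumptions: the source does not name the linkage or distance metric, so the sketch assumes Euclidean distance and single linkage, for which cutting the dendrogram at a threshold is equivalent to joining every pair of profiles no farther apart than that threshold. The function name `threshold_clusters` and the compound names are hypothetical.

```python
import math

def threshold_clusters(profiles, threshold):
    """Group profiles by single linkage, cut at the given distance threshold.

    profiles: dict mapping compound name -> feature vector.
    Returns a sorted list of clusters, each a sorted list of compound names.
    """
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, halving paths.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge every pair of profiles no farther apart than the threshold.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

# Hypothetical one-feature profiles for four compounds.
example = {"cmpd1": (0.0,), "cmpd2": (1.0,), "cmpd3": (2.0,), "cmpd4": (10.0,)}
groups = threshold_clusters(example, threshold=1.5)
```

With single linkage, compounds can join a cluster through a chain of near neighbors; average or complete linkage would give tighter groups and may be closer to what was actually used.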
DETAILED DESCRIPTION
A pathway- and tissue-focused drug discovery approach is needed that would dramatically increase the odds that lead compounds become successful drugs. A key challenge, however, is to identify small numbers of high-quality compound leads with desired systems-level effects before either testing in animal models that are otherwise prohibitively expensive for large-scale, systematic screens, or by using approaches such as transcriptomics or proteomics which are far too expensive and time consuming to be scaled to libraries with hundreds of thousands or millions of compounds. Accordingly, the principles of the present disclosure provide reagents and methods of making and using such reagents in screens capable of predicting mechanism of action of bioactive agents.
In one embodiment, therefore, the present disclosure provides cells comprising at least one heterologous marker capable of consistently producing a characteristic phenotype in the presence of a plurality of agents, also referred to herein as bioactive agents, that impinge on or otherwise affect a particular cellular pathway. By cellular pathway is meant a signaling pathway, metabolic pathway or other regulated biochemical pathway in a cell.
In one embodiment, the biomarker is any detectable marker and is transfected, infected or otherwise inserted into the genome of a host cell. By host cell is meant any cell, including cultured cells or primary cells. In some embodiments the cell is a cancer cell, a tissue-specific cell, a stem cell, a bacterium, a fungus, and the like. The cell samples which can be analyzed using the inventive method can be derived from any source. The cells may be derived from any species of animal, plant, bacteria, fungus, microorganism, or single-celled organism. Examples of sources include E. coli, Saccharomyces cerevisiae, S. pombe, Candida albicans, C. elegans, Arabidopsis thaliana, rats, mice, pigs, dogs, and humans. In certain embodiments in which chemical compounds are being screened for biological activity in humans, the cells are of mammalian origin, preferably of primate origin and even more preferably of human origin. In certain embodiments, the cells are cell lines derived from metabolic syndrome-relevant tissue types, including adipose, hepatic, pancreatic and muscle tissues. In certain embodiments, the cells are from experimental cell lines that perform reproducibly under various experimental conditions. Examples of such cell lines include various bacterial and yeast cell lines, HeLa cells, COS cells, NCI 60 cells, CHO cells, 3T3-L1 fibroblasts, C2C12 muscle, MIN6 pancreatic, and AML12 liver cell lines. In certain embodiments, the cells are cell lines derived from cancer-relevant tissues, such as lung, breast, colon, skin, pancreatic, hepatic, bladder, and brain tissues. In certain embodiments, the lung cancer-relevant cell line used for phenotypic profiling is selected from the group consisting of A549, CALU-1, H522, HCC2279, H460, PC9, H661, HCC2429, and NCI-H23. In certain embodiments, the cells may be derived from known cell lines, cultures, or tissue/cell samples from surgical, pathological, or biopsy specimens.
If the cells being analyzed are part of a specimen, the cells may be an integral part of an organ or tissue and therefore be surrounded by connective tissue, extracellular matrix, support cells such as fibroblasts, blood cells, etc., blood vessels, lymphatics, etc.
The cells used in the sample may be wild-type cells or may have been altered. The genome of the cells may have been altered using techniques known in the art to enhance the expression of a gene, decrease the expression of a gene, delete a gene, modify or "edit" a gene, etc. In certain embodiments, the genome of the cells is modified to express nucleotide sequences that improve visualization of the cytoplasm and nucleus. That is, cells may be infected or transfected with an exogenous marker by methods known by one of skill in the art.
The ability to use stem cells provides the opportunity to insert a marker into the genome of a stem cell, select smart reporter stem cells as described herein, and then differentiate the smart stem cells into different cell or tissue types using methods known in the art. Alternatively, beginning with a stem cell or library of stem cells, these cells can be differentiated into different cell or tissue types as is known in the art and then the differentiated cells are infected/transfected with a marker gene. The transformed and differentiated cells are then screened to identify the smart differentiated cells as described herein. As is appreciated in the art, stem cells are derived by a variety of methods and from a variety of tissues and can be any type of stem cell, from an embryonic stem cell to a patient-derived induced pluripotent stem cell (iPS).
In some embodiments, the exogenous markers are randomly inserted into the genome of the recipient cell. In some embodiments the random insertion is accomplished by methods such as, but not limited to "central dogma" (CD) tagging. In alternative embodiments, the markers are inserted into the genome of the recipient cell in a targeted fashion. Such site directed integration is accomplished by using systems such as, but not limited to the CRISPR/Cas system as described in U.S. Patent 8,795,965, which is incorporated herein by reference.
In some embodiments, Central Dogma (CD)-tagging, a genomic-scale approach for fluorescently labeling full-length proteins, may be used to build live cell reporters. Such methods are generally known in the art, for example, as described in Jarvik et al. (Biotechniques, 1996 May; 20(5):896-904). CD-tagged proteins serve as reliable biomarkers of cellular responses to compounds and permit monitoring of time-dependent drug response while avoiding costly antibodies and toxic chemical labeling compounds.
In some embodiments described herein exogenous markers are incorporated into host cells. Markers are numerous and are well known to those of ordinary skill in the art. In some embodiments, green fluorescent protein, yellow fluorescent protein, variants thereof and the like find use in the invention. Other markers find use in staining the various organelles of the cell being examined. Such markers include, but are not limited to proteins that target mitochondria, lipid droplets, peroxisome, golgi, nucleus, cilia and so on.
In addition, in some embodiments, other dyes or stains are incorporated into cells that may or may not have an exogenous marker incorporated into their genome. Cells stained in this manner are capable of being examined by any of the methods described herein. In certain embodiments, to automatically analyze cellular responses to large-scale compound libraries, a library of reporter cell lines is established by using at least two cell-labeling technologies that include live cell labeling of the cytoplasm and nucleus, and live-cell labeling of biological pathways. To extract single-cell features, individual cellular and nuclear regions are labeled. To identify these regions automatically and reliably, cells are infected with retrovirus carrying a pSeg construct comprising nuclear and cytosolic fluorescent labels used subsequently for automated cellular segmentation. In some embodiments the cells are fixed for immunofluorescence or other assays as are known in the art. That is, given any reference set of bioactive agents, cell line and collection of antibodies or chemical probes, the cells can be screened to identify the best combination of readouts for classifying the bioactive agents. Fixed cell assays using antibodies provide considerable diversity to investigate pathways because there are so many established reagents. Accordingly, in one embodiment live cells are initially screened against a bioactive agent or library of bioactive agents. Upon identification of relevant bioactive agents, the screen can be repeated using cells that are fixed and investigated with one or more antibodies. Accordingly, in some embodiments any labeled cell or any collection of cell lines may be used according to the principles described herein in which the cell or cells are examined with a set of reference bioactive agents to monitor how the cell or cells reproducibly responds to the bioactive agent. 
The disclosure provides for methods and compositions that allow for identifying a reporter cell or cells that more accurately predict response to at least a first and second or third or more compound. That is, as described herein, some cells do not respond to multiple compounds, e.g., bioactive agents, in a manner that is consistently predictive of a particular signaling or other biochemical pathway, while some cells or collections of cells respond to multiple bioactive agents in a manner that is predictive or becomes predictive for a particular bioactive agent or set of bioactive agents. These predictive cells therefore become reporter cells to screen unknown bioactive agents as described herein.
Once prepared, the cells can be screened to identify those cells that reproducibly produce a characteristic phenotype in the presence of a plurality of agents that modulate a particular cellular pathway. In this embodiment, a library of cells transformed with the marker of interest is contacted with different agents and a plurality of cellular phenotypes are monitored and/or recorded. Monitoring can continue from at least 1, 2, 5, 10, 15 minutes or more to at least 5, 10, 15, 24, 48 hours or more. Phenotypes include but are not limited to changed expression of any of the markers described herein, cell morphology and the like. Phenotypes may be monitored by any method capable of detecting changes in a cellular phenotype and include but are not limited to imaging, detecting changes in the electrophysiological state of a cell, mRNA levels, protein levels and the like. Using microscopy, for instance, intensity, localization, texture, and other statistical properties of biomarker expression in single cells may be monitored. These features can be detected by methods known in the art and include imaging by microscopy, patch clamp analysis, RNASEQ, sequencing, Northern blotting, Western blotting and the like. In some embodiments, as each new method of monitoring is established or used, cells are re-screened as described herein to identify the "smartest" clone that provides the most reproducible and predictable results for the screening method.
The present disclosure demonstrates that not all marker integrations will result in cells that reproducibly produce a characteristic phenotype in the presence of a plurality of agents that modulate a particular cellular pathway. As shown in FIG. 15, following integration of a marker into a plurality of different cells, only a fraction of cells reproducibly produce a characteristic phenotype in the presence of a plurality of agents. These cells are referred to herein as smart reporter cells. Therefore, as described herein, smart reporter cells are those that produce a given phenotype when exposed to at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more different agents, wherein those agents modulate a particular cellular pathway. In addition, these smart reporter cells reproducibly respond to multiple agents that affect or modulate different cellular pathways. That is, the smart reporter cells described herein provide for reproducible screening of agents that modulate at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more different cellular pathways. This allows, for the first time, a single reporter cell type to be screened against agents and, based on the detected phenotype of the cell, allows prediction of the affected cellular pathway without any a priori knowledge of the nature of the biomarker. That is, the screen described herein provides for the first time an unbiased analysis of cellular phenotypes and pathways affected by a bioactive agent and does not require prior knowledge of the cellular pathway or markers associated with a particular phenotype.
These phenotypic aspects, summarized succinctly as "phenotypic profiles", of the cell may be quantified in certain embodiments. This data can then be analyzed later to derive various categories, correlations, or trends among different populations of cells that may have been treated in different ways (e.g., different drugs, different agents, different concentrations, different RNAi's, different time points). The phenotypic aspects of the cells in a population may be quantitated and statistically analyzed, and this data may be compared to data from a control set of cells or cells subjected to different conditions. The data can then be clustered to find cells of similar phenotypes in order to find compounds of a known activity or mechanism of action.
Accordingly, rather than identifying biomarkers that are specific to a single target or pathway, an objective approach as shown in FIG. 1 was developed for selecting smart reporter cell lines. In addition, the smart reporters can be used in a two- (or multi-)tiered screening approach: a small number of smart reporters can be used to profile large libraries at low resolution to identify candidate hits simultaneously across multiple drug classes, followed by a larger number of informative reporters that are used to profile the initial hits at higher resolution to generate high-confidence, high-quality drug candidate leads. Rather than screening for compounds that affect a single disease target, the method may begin with existing drugs that have proven efficacy in humans or preclinical animal models and search for new compounds that have similar systems-level responses. In a preferable application the methods identify new compounds having activity profiles across multiple disease-derived cell lines, pathways, and/or time points that are similar to existing drugs. These new lead compounds preferably exhibit distinct chemical structures and operate through different mechanisms.
Referring further to FIG. 1, as most compounds in large chemical libraries are not expected to be relevant, an initial screen of a chemical library is performed at low resolution using one or a small number of smart reporter cell lines (left panel). Using "guilt-by-association", candidate compounds are identified and grouped with other compounds having similar profiles to proven drugs (e.g., AMP kinase activators, peroxisome proliferator-activated receptor (PPARγ) agonists, FGFs, sulfonylureas, and insulin) (middle panel). The use of smart reporters allows candidate compounds to be identified and grouped into multiple drug classes from a single screen. Then, for the candidate hit set of each drug class, another round of screening may be performed at higher resolution using greater numbers of smart reporters to more deeply profile compound effects and thereby reduce false positives from the initial screen (right panel). This method also has the potential to reveal new, druggable pathways for targeting diseases such as cancer (e.g., FIG. 1, cluster of uncircled black dots in middle panel). Accordingly, the present disclosure provides methods for identifying smart reporter cell lines that predict mechanism of action of compounds. In addition, the present disclosure provides for methods of screening for molecules having similar mechanisms of action as other molecules.
In the embodiment illustrated in FIG. 2, a pSeg-tagged parent A549 clone was selected expressing H2B-CFP fusion protein in the nucleus and mCherry in the cytoplasm (FIG. 2, left) for automatic segmentation of cell boundary. A smart reporter cell was generated as follows: (1) generate the CD-tagging retrovirus in a retrovirus packaging cell line; (2) infect CD-tagging retrovirus into A549 pSeg clones; (3) use FACS to sort single CD-tagged clones into 384-well plates; (4) confirm positively fluorescent clones by microscope; and (5) identify CD-tagged genes by 3' RACE or 5' RACE in HT (96 well) format and optionally compare localization of expressed proteins to published reports. According to this method a library of at least 150 reporter cell lines may be built from CD- and pSeg-tagged A549 clones, each containing a tagged protein from diverse disease-related functional pathways (FIG. 3).
To assess whether the reporter cell lines were informative, 89 reporter cell lines were randomly selected from the library. Each reporter was treated with 11 different conditions (i.e., 1 control + 2 compounds x 5 drug classes, as shown in FIG. 6A). After treatment, cells were imaged every 6h for 2 days (4 images per well at 10X magnification). Visual inspection of the images revealed that different tagged proteins displayed diverse cellular responses to the panel of chemical compound perturbations (e.g., FIG. 4). As used herein, "perturbation" includes, by way of example, changes in cell morphology or markers expressed by a cell. Clone-to-clone variation also contributed to observed differences. An objective computational approach is next used to take advantage of the observed differences to identify reporters that most accurately predict mechanism of compound action.
Reporter cell lines are profiled. Images obtained in the previous step are analyzed via an image-analysis pipeline as exemplified in FIG. 5. First, images are flat-field corrected and background subtracted. Second, individual cellular regions are identified by applying a watershed-based algorithm to the pSeg channels described herein. Third, morphology- and intensity-related features are extracted from each cell. In some embodiments at least 10, 20, 30, 40, 50, 60, 70, 80 or more features are extracted and analyzed. Exemplary features include but are not limited to average or total cytosolic or nuclear intensity, Zernike moment intensity features, Haralick texture intensity features, cell size, cell ellipticity, etc.
Next, the distribution for each feature is compared between treated and control cell populations (FIG. 5B, blue vs. black curves). Any of a variety of analyses find use in the comparison. In some embodiments, a non-parametric Kolmogorov-Smirnov (KS) statistic is computed to determine the direction and magnitude of the population shift due to treatment (FIG. 5B, green vs. red vertical lines). The Kolmogorov-Smirnov statistic (Chakravarti, Laha, and Roy, (1967) Handbook of Methods of Applied Statistics, Volume I, John Wiley and Sons, pp. 392-394) is used to decide if a sample comes from a population with a specific distribution. The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points Y1, Y2, . . . , YN, the ECDF is defined as $E_N = n(i)/N$, where n(i) is the number of points less than Yi and the Yi are ordered from smallest to largest value. This is a step function that increases by 1/N at the value of each ordered data point. An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness-of-fit test depends on an adequate sample size for the approximations to be valid).
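By way of illustration only, the empirical distribution function described above may be sketched in Python as follows. The function name `ecdf` and the sample values are hypothetical and are not part of the disclosure; the sketch assumes the common convention of counting points less than or equal to x, so that the step of 1/N occurs at each ordered data point.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a step function E(x) = (number of points <= x) / N.

    Assumption: ties are counted with "<=", so the function rises by 1/N
    exactly at each ordered data point.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def evaluate(x):
        # bisect_right returns how many ordered values are <= x
        return bisect_right(ordered, x) / n

    return evaluate


# Hypothetical feature values (e.g., nuclear areas) for four cells
nuclear_area_ecdf = ecdf([3.0, 1.0, 2.0, 4.0])
```

Evaluating `nuclear_area_ecdf` at any value returns the fraction of cells whose feature value does not exceed it.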
To summarize overall cellular responses, statistics, such as the K-S statistics, are collected across features, time points, and/or reporter cell lines (FIG. 5B-C). Some profiling approaches, such as those based on the K-S statistics, are particularly suited for profiling large compound libraries, as responses may be compared to controls on the same plate, and thus plate- and batch-level variation are minimized, and compound profiles may be built incrementally from different reporters and time points.
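As a non-limiting sketch, a compound profile may be assembled by concatenating per-feature statistics in a fixed order of reporter, time point, and feature. The function name `build_profile`, the dictionary layout, and all values shown are hypothetical and serve only to illustrate how a profile can be extended incrementally as further reporters or time points become available.

```python
def build_profile(ks_values, reporters, time_points, features):
    """Concatenate KS statistics into one vector.

    ks_values maps (reporter, time_point, feature) -> KS statistic.
    The order of the output is fixed by the three argument lists, so
    profiles of different compounds are comparable component by component.
    """
    profile = []
    for reporter in reporters:
        for time_point in time_points:
            for feature in features:
                profile.append(ks_values[(reporter, time_point, feature)])
    return profile


# Hypothetical KS statistics for one compound: one reporter, two time points
example_ks = {
    ("reporter_1", 24, "nuclear_area"): 0.40,
    ("reporter_1", 24, "cytosolic_intensity"): -0.10,
    ("reporter_1", 48, "nuclear_area"): 0.55,
    ("reporter_1", 48, "cytosolic_intensity"): -0.30,
}
example_profile = build_profile(
    example_ks, ["reporter_1"], [24, 48], ["nuclear_area", "cytosolic_intensity"]
)
```

Adding a second reporter only requires new dictionary entries and a longer `reporters` list; the components already computed are unchanged.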
Smart reporter cell lines are identified by assessing whether the combined collection of reporters could distinguish different drug classes. Profiles are computed from the compound-treatment dataset of the randomly-selected cell lines. As shown in FIG. 6A, heat map visualization reveals striking similarity of profiles for compounds from similar drug classes and, likewise, dissimilar profiles for compounds of dissimilar classes. Thus, profiles from live-cell readouts are informative signatures of drug mechanism, and, validating the efficacy of the method, replicate profiles, which are derived from separate experiments, group together (FIG. 6B).
In certain embodiments, reporter cell lines are ranked by how well their profiles separate different drug classes. The best cell lines from each tissue type are referred to as smart reporter cells, noted above.
A diverse set of proven lead compounds is identified to serve as a starting point for further development into clinically useful drugs. Proven compounds from various drug classes are selected. For cancer, for instance, the classes may comprise by way of example alkylating agents, anti-metabolites, anti-mitotics, topoisomerase inhibitors, cytotoxic antibiotics, proteasome inhibitors, chromatin modifiers, growth factor signaling inhibitors, anti-HSP90, and ER stress inducers.
The cells may be treated with various chemical agents (e.g., small molecules, pharmaceutical agents, chemical compounds, biological molecules, proteins, polynucleotides, anti-sense agents such as RNAi, etc.) known to have a specific biological effect. For example, these chemical agents can include dimethylsulfoxide (DMSO), cisplatin, carboplatin, cyclophosphamide, methotrexate, pemetrexed, gemcitabine, vincristine, paclitaxel, PLK-1 inhibitors, Aurora B inhibitors, topoisomerase inhibitors, docetaxel, SAHA, bortezomib, romidepsin, camptothecin, NSC693322, NSC637578, NSC640499, erlotinib, gefitinib, tanespimycin, cytochalasin D, jasplakinolide, latrunculin B, 105D, colchicine, griseofulvin, podophyllotoxin, taxol, vinblastine, actinomycin D, staurosporine, irinotecan, topotecan, camptothecin, actinomycin, bleomycin, doxorubicin, etoposide, anisomycin, emetine, puromycin, tunicamycin, Brefeldin A, thiazolidinediones (e.g., rosiglitazone, pioglitazone, and troglitazone), anisomycin, mevinolin, wortmannin, trichostatin, ibuprofen, indomethacin, sulindac sulfate, alsterpaullone, indirubin monoxime, olomoucine, purvalanol A, glibenclamide, gliclazide, glipizide, cycloheximide, or nocodazole. Any combination of genetic and/or chemical alterations may also be used. For example, the cells may be genetically engineered to stop the cells in the cell cycle, and then chemical compounds from a library of compounds may be added to the genetically altered cells to identify compounds which patch the genetic defect.
As discussed above, the cell samples may be provided as arrays of cells, with each element of the array representing a separate experiment in which the cells have been subjected to different conditions. For example, each well of a multi-well plate may be treated with a different test agent, different concentration, different temperature, or different time point to determine its effect on the cells. The cells may be treated with an agent in concentrations ranging from 0.1 pM up to 100 mM; preferably, 1 pM to 0.1 mM; more preferably 10 pM to 0.01 mM. In certain embodiments, the array of cells has at least one element containing cells which are untreated and therefore serve as a control. In certain embodiments, several elements of the array may serve as a control to enhance reliability and reproducibility. The cells may optionally be fixed and stained before images of the cells are acquired. In other embodiments, images of the cells may be obtained while the cells are alive. This allows the cells to be analyzed at later time points, or the cells may be further treated with agents. The cells to be analyzed using the inventive method may be imaged to obtain the raw data that will be analyzed to determine the phenotypic characteristics of the cells. The number of cells to be imaged may range from a single cell to fewer than 100 cells to fewer than 500 cells to over a thousand cells. In certain embodiments, the number of cells in a field to be imaged ranges from 100 to 200 cells, preferably approximately 200 cells. In certain embodiments, images with fewer than 10 cells are discarded. In other embodiments, images with fewer than 50 cells are discarded. Multiple images of the cells may be taken at different wavelengths to assess staining with different fluorescent dyes. Multiple images may also be taken in each well in order to reduce noise and increase reproducibility in the experiments. For example, five to ten images may be acquired in each well at different non-overlapping regions.
The cells can be imaged using any method known in the art of light or fluorescence microscopy.
Images may be obtained digitally using a digital image capture device such as a CCD camera or the equivalent, or they may be obtained conventionally using standard film technology and then digitized from the film (e.g., using a scanner). In either case, the camera may be connected to a microscope. In a preferred embodiment, the images are acquired digitally by a CCD camera directly mounted to a microscope, thereby eliminating the additional step of digitizing an analog image.
The magnification chosen to image the cells may range from very low magnification (5x) to very high magnification (5000x). In certain embodiments, the magnification is 10x, 20x, 50x, 100x, 200x, 500x, or 1000x. As would be appreciated by one of skill in this art, the magnification would depend on various factors including the number of samples to be imaged, the number of cells per sample, and the aspects of the cells to be analyzed. For example, analysis for cell shape and morphology would typically require less magnification than imaging subcellular organelles such as the nucleus and centrosomes. In certain embodiments, the cells may be imaged at multiple magnifications in order to better assess several different aspects of the cells. In other embodiments, a magnification is chosen as a compromise between various competing factors so that the cells are only imaged once.
An appropriate resolution (pixels per image) of the digitized image must be selected, whether the images are originally acquired by digital means or are scanned from conventional micrographs. As will be understood by those of ordinary skill in the art, resolution is typically selected so that features of interest (e.g., whole cells, nuclei, or centromeres) comprise a sufficient number of pixels that their morphological characteristics (e.g., average diameter, area, perimeter, shape factor) may be determined with a sufficient accuracy at the selected magnification, while not exceeding available computing power and/or data storage. If a camera with very fine resolution (i.e., a large number of pixels per imaged frame) is not available, a higher magnification may be used. In such cases, more image frames may be acquired for each specimen in order to image a statistically significant number of cells.
In certain embodiments, the images are acquired using a digital camera mounted on a standard laboratory microscope. The images may then be stored and analyzed later by a computer, or they can be analyzed as they are acquired. Images may be stored in any appropriate file format, including lossy formats such as .jpg and .gif or lossless formats such as .tiff and .bmp. Alternatively, only analysis results may be stored.
Cell features may be identified using standard thresholding and edge detection techniques. Such techniques are described, for example, in U.S. Pat. No. 5,428,690 to Bacus et al, U.S. Pat. No. 5,548,661 to Price et al, and U.S. Pat. No. 5,848,177 to Bauer et al, all of which are incorporated by reference herein. Once the cell features have been identified by one of these methods, quantitative morphological data about each feature may be collected, such as area, perimeter, shape factor (commonly defined as the ratio $4\pi \cdot \mathrm{Area}/\mathrm{Perimeter}^2$), aspect ratio, and gray level statistics (such as the average gray level and the standard deviation in the gray level for a particular feature). In an alternative embodiment, identification of individual cells is avoided by making use of image-based features. The phenotypes of image properties (rather than individual cells) could be extracted and compared across conditions. One approach to do this is to grid the image into blocks of pixels (e.g., 8x8), then extract intensity properties of the blocks from images taken of cells in each different drug condition. A tool for doing this is PhenoRipper, which the inventors developed. An application can include, but is not limited to, neuroscience, for which identifying individual neurons is very difficult for standard approaches. Thus, it is possible to select the smart clone based on how smart it is for predicting drug mechanisms using PhenoRipper image blocks. After image (instead of cell) features have been extracted, the pipeline continues in the same way as described herein. Once the images have been analyzed for the specific cell characteristics and the characteristics have been quantified, any statistical methods known in the art can be used to determine the differences between two sets of data. In certain embodiments, a distribution of cells with a certain characteristic from a particular experiment may be used in statistically analyzing the characteristic.
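For illustration, the shape factor referred to above (4π times area divided by the square of the perimeter) may be computed as in the following Python sketch. The function name `shape_factor` and the example dimensions are hypothetical; the area and perimeter are assumed to have already been measured for a segmented feature in consistent units.

```python
from math import pi


def shape_factor(area, perimeter):
    """Return 4*pi*area / perimeter**2 for one segmented feature.

    The value is 1 for a perfect circle and smaller for elongated or
    irregular outlines.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * pi * area / perimeter ** 2


# Hypothetical measurements: a circular nucleus of radius 5 pixels
# and a square region with a side of 2 pixels
circle_value = shape_factor(area=pi * 5 ** 2, perimeter=2 * pi * 5)
square_value = shape_factor(area=4.0, perimeter=8.0)
```

Because the ratio is dimensionless, the same function applies whether the measurements are expressed in pixels or in calibrated units.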
In certain embodiments, a set of experimental data involving a specific drug, at a particular concentration, and at a certain time point will be compared to a set of control data where no drug has been added. In other embodiments, experimental data with a first agent may be compared to experimental data with a second agent; or one concentration versus another concentration; or one time point versus another. In other embodiments, statistical analysis may be performed on more than two sets of data resulting in a 3-way, 4-way, 5-way, or multi-way analysis.
In certain embodiments, distributions are obtained for each set of data collected. In certain embodiments, it is convenient to represent with a single number each population of descriptor values in a given experimental well. Some of the characteristics desired in such a reduced measure include: (1) it must cope with non-normal distributions of descriptor values (e.g., bimodal distributions); (2) it must account for the fact that different descriptors have different levels of biological variability and experimental noise; (3) it must convert different types of measurement into a common unit for comparison; (4) it must be insensitive to descriptor parameterization; and (5) it must be insensitive to the precise quantitative relationship between antibody staining intensity and total amount of target per cell. Preferably, the reduced measure will have at least one of the desired characteristics.
In certain embodiments, two distributions may be compared by comparing the heights of the two distributions, the widths of the two distributions (e.g., the width at the base, the width at half-height), cumulative distribution functions of the two distributions, etc. In comparing the cumulative distribution functions, one can determine the maximum distance or displacement between the two curves (i.e., the Kolmogorov-Smirnov statistic), the integration or area between the two curves, the maximum height difference between the two curves, the intersection of the two curves, etc.
In certain embodiments, two sets of distribution data are compared using Kolmogorov-Smirnov statistics. Distributions of each data set are determined, and empirical cumulative distribution functions are calculated. The cumulative distribution functions from each of the sets of data being compared are analyzed to determine the maximum displacement between the two cumulative distribution functions. That is, the function KS(f,g) computes f-g at the point where |f-g| reaches its maximum. Note that KS(f,g) = -KS(g,f). The maximum displacement is a signed statistic known as the Kolmogorov-Smirnov statistic (KS statistic). In certain preferred embodiments, one set of data is experimental (e.g., cells treated with a particular compound) and the other is a control (e.g., cells left untreated). The resulting KS statistics from multiple experiments can then be assigned a color and plotted in an array so that the KS statistics from many different experiments can be visually assessed.
As an example of computing a KS statistic, let f and g be the cumulative distribution functions for the sizes of nuclear areas in cell populations in two wells, where f represents cells from an untreated well and g represents cells from a treated well. If the average nuclear area were to increase in the treated well, then g would shift to the right (FIG. 5). This would result in KS(f,g) being positive. If the nuclear size instead decreased in the treated well, then KS(f,g) would become negative.
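The signed statistic of this example may be sketched in Python as follows, by way of illustration only. The function name `signed_ks` and the nuclear-area values are hypothetical. The sketch evaluates both empirical cumulative distribution functions at every observed value and returns f - g at the point where the absolute difference is largest; if a positive and a negative difference tie in magnitude, the one encountered first is kept, which is an arbitrary choice of this sketch.

```python
from bisect import bisect_right


def signed_ks(untreated, treated):
    """Return f - g where |f - g| is largest.

    f and g are the empirical cumulative distribution functions of the
    untreated and treated samples, respectively.
    """
    f_sorted, g_sorted = sorted(untreated), sorted(treated)
    if not f_sorted or not g_sorted:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    # The step functions only change at observed values, so it is
    # sufficient to compare them at those points.
    for x in sorted(set(f_sorted) | set(g_sorted)):
        f = bisect_right(f_sorted, x) / len(f_sorted)
        g = bisect_right(g_sorted, x) / len(g_sorted)
        if abs(f - g) > abs(best):
            best = f - g
    return best


# Hypothetical nuclear areas: the treated population is shifted to larger values
untreated_areas = [100.0, 110.0, 120.0, 130.0]
treated_areas = [120.0, 130.0, 140.0, 150.0]
shift = signed_ks(untreated_areas, treated_areas)  # positive: areas increased
```

Swapping the two arguments reverses the sign, consistent with the antisymmetry of the statistic.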
In certain embodiments, two sets of distribution data are compared using Kolmogorov-Smirnov statistics modified to provide another quantitative approach to assess how well different drug classes could be distinguished by their profiles. For each compound c in a given drug class M(c), the average squared distance to other drugs within or between classes was computed by: $d_w(c) = \mathrm{mean}_{j \in M(c)} |P_c - P_j|^2$ or $d_b(c) = \mathrm{mean}_{j \notin M(c)} |P_c - P_j|^2$, respectively. Here, $P_c$ and $P_j$ denote profiles for compounds c and j and | · | denotes Euclidean distance. The standard Silhouette index, $S(c) = \frac{d_b(c) - d_w(c)}{\max\{d_b(c),\, d_w(c)\}}$, was used to assess the average ability of profiles to distinguish class M from all other classes: $DP(M) = \mathrm{mean}_{c \in M} S(c)$. Finally, this score was averaged over all classes, $DP_{Total} = \mathrm{mean}_{M} DP(M)$, to obtain an overall assessment of discriminative power. This score is referred to as the total discriminative power (DPTotal), where a score of 0 indicates random mixing, and 1 indicates perfect separation of compound classes.
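A minimal Python sketch of the total discriminative power described above is given below for illustration. The function name `discriminative_power`, the compound names, and the two-component profiles are hypothetical. The sketch assumes squared Euclidean distances, the standard Silhouette normalization by the larger of the two average distances, and at least two compounds per class so that the within-class average is defined.

```python
from math import dist
from statistics import mean


def discriminative_power(profiles, classes):
    """Return (DP_total, DP per class).

    profiles maps compound -> profile vector; classes maps compound -> class.
    """
    silhouettes = {}
    for c, p_c in profiles.items():
        # Average squared distance to the other compounds of the same class
        within = [dist(p_c, p_j) ** 2 for j, p_j in profiles.items()
                  if j != c and classes[j] == classes[c]]
        # Average squared distance to the compounds of all other classes
        between = [dist(p_c, p_j) ** 2 for j, p_j in profiles.items()
                   if classes[j] != classes[c]]
        d_w, d_b = mean(within), mean(between)
        largest = max(d_b, d_w)
        s_c = (d_b - d_w) / largest if largest > 0 else 0.0
        silhouettes.setdefault(classes[c], []).append(s_c)
    dp_by_class = {m: mean(values) for m, values in silhouettes.items()}
    return mean(list(dp_by_class.values())), dp_by_class


# Hypothetical profiles: two well-separated classes of two compounds each
example_profiles = {"A1": (0.0, 0.0), "A2": (0.0, 1.0),
                    "B1": (10.0, 0.0), "B2": (10.0, 1.0)}
example_classes = {"A1": "class_A", "A2": "class_A",
                   "B1": "class_B", "B2": "class_B"}
dp_total, dp_per_class = discriminative_power(example_profiles, example_classes)
```

In this toy data set the classes are far apart relative to their internal spread, so the resulting score is close to 1.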
In certain embodiments, discriminative powers of different reporters are quantified. The discriminative power of profiles generated from individual reporters at a single time point (e.g., 48 hours) is analyzed. As illustrated in FIG. 8 (top histogram), the value of DPTotal may vary dramatically across the reporters. Using this measure, reporters may be identified that have a high ability to discriminate among drug classes based on a single time-point profile. Profiles, even those built from a single reporter and single time point, are therefore highly useful to identify candidate lists of compounds enriched for specified drug classes.
As would be appreciated by one of skill in this art, the reproducibility of these statistical calculations may be improved by analyzing a greater number of cells, for example, using replicates. In other embodiments, high and low values of a vector component may be dropped in calculating a replicate average to increase reproducibility.
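By way of example, dropping the high and low values of each vector component before averaging replicates may be sketched in Python as follows. The function name `trimmed_replicate_average` and the replicate values are hypothetical; the sketch assumes that exactly one highest and one lowest value are dropped per component, which requires at least three replicates.

```python
def trimmed_replicate_average(replicates):
    """Average replicate profiles component by component after removing
    the single highest and single lowest value of each component."""
    if len(replicates) < 3:
        raise ValueError("at least three replicates are needed to drop the extremes")
    averaged = []
    # zip(*replicates) groups the values of one component across replicates
    for values in zip(*replicates):
        kept = sorted(values)[1:-1]
        averaged.append(sum(kept) / len(kept))
    return averaged


# Hypothetical replicate profiles, each with two components
example_replicates = [
    [0.1, 0.5],
    [0.2, 0.4],
    [0.9, 0.6],   # first component is an outlier
    [0.3, -0.9],  # second component is an outlier
]
robust_profile = trimmed_replicate_average(example_replicates)
```

The outlying values in the third and fourth replicates do not contribute to the averaged profile.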
Clustering algorithms can then be used to cluster data sets (e.g., compounds, descriptors) that are similar. In certain embodiments, standard hierarchical clustering algorithms are used. For example, clustering can be used to identify replicates of a compound within a set of data. Also, clustering can be used to cluster data from a compound with a known activity to data from a compound with a similar mechanism of action. In this way, the inventive system may be used to identify the mechanism of action of a new compound that is relatively nearby in phenotype space.
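As one illustrative possibility, a simple average-linkage agglomerative clustering of compound profiles may be sketched in Python as follows. The function name `average_linkage_clusters`, the compound labels, and the profile values are hypothetical, and the choice of average linkage with Euclidean distance is an assumption of this sketch rather than a requirement of the disclosure.

```python
from itertools import combinations
from math import dist


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles maps name -> profile vector; the distance between two
    clusters is the mean Euclidean distance over all cross pairs.
    """
    clusters = [[name] for name in profiles]
    if not 1 <= n_clusters <= len(clusters):
        raise ValueError("n_clusters must be between 1 and the number of profiles")

    def linkage(a, b):
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations yields index pairs with i < j
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical profiles: two replicates of one compound, a reference
# compound of known mechanism, and an uncharacterized compound
example_profiles = {
    "compound_A_rep1": (0.9, 0.1),
    "compound_A_rep2": (0.8, 0.2),
    "reference_B": (-0.7, 0.5),
    "unknown_X": (-0.6, 0.7),
}
example_clusters = average_linkage_clusters(example_profiles, n_clusters=2)
```

Here the replicates group together and the uncharacterized compound falls in the same cluster as the reference compound, suggesting a similar mechanism of action.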
Clustering can also be used to better refine the cellular characteristics (descriptors) being evaluated. For example, clustering can be used to determine which descriptors can provide information that is independent or non-overlapping, or new correlations between descriptors.
Morphological analysis or phenotypic profiling of cells can be used in a wide variety of applications, for example, histology, pathology, drug screening, drug development, drug susceptibility screens, etc. In certain embodiments, chemical compounds are contacted with the cells, and the cells are imaged after a certain time period. In certain embodiments, different concentrations of the chemical compound dissolved in a suitable solvent such as medium, water, DMF, or DMSO are used. The cells are then imaged, and the data gathered from the images is analyzed to determine trends among different compounds or different descriptors.
In one embodiment, phenotypic profiling is used in drug discovery. First, a set of chemical compounds or drugs with known biological activity or mechanism of action, known as the training set, are contacted with cells at various concentrations and statistical data on various descriptors is gathered and analyzed. Trends are then established for certain compounds with known modes of action. For example, compounds that affect protein synthesis may affect certain descriptors while compounds that affect tubulin polymerization may affect other descriptors. After these trends have been established, a set of chemical compounds of unknown activities (e.g., a newly synthesized combinatorial library) may be contacted with the same cells to look for the effect of each of the compounds on the phenotypic profile of the cells. Clustering analysis comparing the training set of compounds to the new set of experimental compounds is then used to determine which compounds of unknown mechanisms of actions may have activities similar to compounds in the training set. Therefore, compounds more likely to have a desired activity can be quickly selected using phenotypic profiling.
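The comparison of an uncharacterized compound to a training set may be illustrated, in simplified form, by the following Python sketch. Instead of a full clustering analysis, the sketch substitutes a plain nearest-neighbor rule: the unknown compound is assigned the class of the training compound with the closest profile. The function name `predict_mechanism` and the profile values are hypothetical.

```python
from math import dist


def predict_mechanism(unknown_profile, training_profiles, training_classes):
    """Return (class, name) of the training compound whose profile is
    closest, in Euclidean distance, to the unknown profile."""
    if not training_profiles:
        raise ValueError("the training set must not be empty")
    nearest = min(training_profiles,
                  key=lambda name: dist(unknown_profile, training_profiles[name]))
    return training_classes[nearest], nearest


# Hypothetical training profiles for two compounds of known mechanism
training_profiles = {
    "paclitaxel": (0.8, -0.2),
    "bortezomib": (-0.5, 0.6),
}
training_classes = {
    "paclitaxel": "anti-mitotic",
    "bortezomib": "proteasome inhibitor",
}
predicted_class, closest_compound = predict_mechanism(
    (0.7, -0.1), training_profiles, training_classes
)
```

A prediction of this kind would only nominate a candidate mechanism; confirmation by secondary assays remains necessary.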
Candidate hits may next be tested for potential activity as leads or candidate drugs. After candidate compounds are screened using disease- or condition-relevant cell-based assays, they are profiled and compared to other compounds in the same drug class that potentially share similar overall profiles in the same assays, and any functionally- or therapeutically-relevant differences are noted. The candidate compounds may be further screened in secondary assays such as intracellular signaling pathway analysis, cell differentiation, cell growth, cell uptake of relevant molecules, and gene expression.
By way of example, candidate anti-cancer chemotherapeutics that have passed initial phenotypic profile screens may undergo further testing depending upon whether they are DNA damaging agents, mitosis inhibitors, chromatin-modifying agents, proteasome inhibitors, growth factor signaling inhibitors, ER stress inducers, or HSP-90 protein folding inhibitors. For DNA damaging agents, candidate compounds may undergo cell cycle analysis to evaluate G1 or S/G2 arrest, and immunofluorescence (IF) assays with a DNA damage pathway marker (e.g., phosphorylated p53 or H2A.X). For mitosis inhibitors, candidate compounds may be examined by IF assays with a mitotic marker such as phosphorylated histone H3, or for disruption of microtubule dynamics by performing tubulin polymerization assays. For hits targeting chromatin, candidate compounds may be subjected to IF assays with a panel of histone epigenetic markers and locus de-repression (LDR) assays (e.g., with murine mammary carcinoma cell line C127 labeled with GFP reporter). For proteasome inhibitor hits, accumulation of a fluorescent protein conjugated to a degron, which enables fast degradation of the fusion protein, may be measured in response to candidate compounds. For growth factor signaling inhibitors, IF assays may be performed to confirm reduction of phosphorylated pathway targets. For ER stress inducers, IF assays with ER markers and cell membrane proteins may be performed to detect changes of ER morphology and reduction of protein secretion through the ER. Inhibitors of the HSP-90 protein folding response may be validated in a rabbit reticulocyte lysate luciferase refolding assay.
By way of example, candidate anti-metabolic syndrome therapeutics that have passed initial phenotypic profile screens may undergo further testing. In certain embodiments, the top 50-100 candidate hits in each drug class using metabolic syndrome relevant cell-based assays may be profiled. Assays include but are not necessarily limited to intracellular signaling pathway analysis, glucose uptake and glucose transporter 4 (GLUT4) translocation, gluconeogenic gene expression, adipocyte differentiation and lipogenesis, insulin secretion, and primary cell assays.
Regarding intracellular signaling pathway analysis, in certain aspects activation of intracellular signaling pathways in AML12, C2C12 myocytes and/or myotubes, differentiated 3T3-L1 adipocytes and MIN6 pancreatic beta-cells may be tested, as well as the phosphorylation status of proteins including Akt, AMPKα, GSK3α/β, mTOR, and S6 ribosomal protein.
Regarding glucose uptake and glucose transporter 4 (GLUT4) translocation, in certain aspects 2-deoxyglucose uptake and the intracellular localization (plasma membrane, microsomal, and cytosolic fractions) of GLUT4 may be measured in C2C12 myocytes/myotubes and 3T3-L1 adipocytes. Glycogen concentrations in C2C12 myocytes/myotubes may also be measured.
Regarding gluconeogenic gene expression, mRNA levels of gluconeogenic genes, including PEPCK, G6Pase, and PGC1α, in AML12 cells may be measured by real-time quantitative PCR. With each different collection of readouts, one can apply the invention to identify a smart reporter that can be used as previously described, for example to predict compound mechanisms from among a plurality of distinct modes of action.
Regarding adipocyte differentiation and lipogenesis, the effect of compounds on the differentiation rate of 3T3-L1 cells may be determined by measuring neutral lipid accumulation via Oil Red O staining and gene expression (e.g., FABP4, PPARγ) by real-time quantitative PCR at various time points in the differentiation process. The effect of compounds on neutral lipid accumulation and lipogenic gene expression (e.g., PPARγ, fatty acid synthase, SREBP1c) in 3T3-L1 adipocytes that have been differentiated prior to treatment may also be determined. Likewise, the compound effects on neutral lipid accumulation and lipogenic gene expression in AML12 cells may also be determined.
Insulin secretion may be determined by measuring insulin content and glucose-dependent insulin secretion as known in the art.
Compounds identified with the greatest efficacy from any or all of the above assays or different assays may be validated in human primary muscle, adipose, pancreas, and/or hepatocyte cells.
The invention also provides a system for carrying out the inventive methods. The system may include some or all of the hardware and software necessary to practice the inventive technology. The system may include microscopes, microprocessors, data storage devices, robots, fluid handling devices, plate readers, automatic pipetters, software, printers, plotters, displays, etc. In certain embodiments, the system may include a microscope able to acquire images at various magnifications and/or resolutions, a microprocessor, and software for carrying out the image analysis and the statistical analysis of the raw data derived from the images. In certain embodiments, the system includes the hardware and/or software necessary to calculate statistics, such as but not limited to Kolmogorov-Smirnov statistics. In certain embodiments, the system includes the hardware and/or software necessary to perform clustering analysis. In certain embodiments, a low magnification is useful where many cells are to be analyzed. In other embodiments, a high magnification is useful when analyzing for a characteristic only visible at high power. In addition to magnification, the resolution of the image may be varied depending on the analysis to be performed. In certain embodiments, a low resolution image is preferred for carrying out the automated analysis. In certain embodiments, the system does not include the microscopy equipment needed to acquire the images. Instead, the raw data is analyzed by a system with a microprocessor running the necessary software for performing the desired analysis. For example, the system may run the necessary software for calculating K-S statistics or other statistics. The system may also include the necessary software for performing the clustering of compounds or descriptors. The system may also include a storage device for storing the images and/or data for future recall if need be.
In addition, one aspect of the disclosure includes a database of cellular responses and the smart reporter from which it is generated. That is, smart reporters are selected based on their ability to reproducibly cluster similar molecules based on multiple phenotypic readouts. A database of readouts and results induced by different agents can therefore be generated for each smart reporter. A benefit of this is that for each experiment performed on a particular reporter cell with a different agent, new data can simply be added to the database of responses. This simplifies the complexity of subsequent experiments and allows for the possibility of identifying clusters of molecules not identified in a smaller subset of molecules.
It should be understood that this invention is not limited to the particular methodology, protocols, and reagents, etc., described herein and as such may vary. The terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention, which is defined solely by the claims. As used herein and in the claims, the singular forms include the plural reference and vice versa unless the context clearly indicates otherwise. Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term "about."
All patents and other publications identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as those commonly understood to one of ordinary skill in the art to which this invention pertains. Although any known methods, devices, and materials may be used in the practice or testing of the invention, the methods, devices, and materials in this regard are described herein.
EXAMPLES
Example 1
Building libraries of reporter cell lines
Live-cell approach for labeling the cytoplasm and nucleus
To automatically analyze cellular responses to large-scale compound libraries, a library of reporter cell lines was established by using at least two cell-labeling technologies that include live cell labeling of the cytoplasm and nucleus, and live-cell labeling of biological pathways. The two-color labeling strategy provides unambiguous identification of cellular and nuclear regions and information about their morphology.
To extract single-cell features, individual cellular and nuclear regions were labeled. To identify these regions automatically and reliably, cells were infected with retrovirus carrying a pSeg construct comprising nuclear and cytosolic fluorescent labels used subsequently for automated cellular segmentation. In the illustrative embodiment shown in FIG. 2A (top), the pSeg construct consists of (1) pMYs retroviral backbone controlling genomic integration, (2) mCherry fluorescent protein (RFP) expressed in the whole cell, (3) Histone H2B fused to cyan fluorescent protein (CFP) expressed in the nucleus, and (4) selection cassette. As exemplified in FIG. 2 (bottom), A549, a non-small cell lung carcinoma cell line, was infected with this pSeg construct. Stable pSeg-tagged A549 clones were grown for tens of passages without signs of reduced expression. Similarly, as shown in FIG. 11, CALU-1, H522, HCC2279, H460, PC9, H661 and HCC2429 cell lines were stably infected with pSeg, therefore demonstrating that diverse lung cancer cell lines were stably infected with the pSeg construct.
Live-cell approach for labeling biological pathways
To monitor cellular responses to compounds, Central Dogma (CD)-tagging was used to build live cell reporters. CD-tagged proteins serve as reliable biomarkers of cellular responses to compounds. CD-tagging permits monitoring of time-dependent drug response while avoiding costly antibodies and toxic chemical labeling compounds.
A pSeg-tagged parent A549 clone was selected expressing H2B-CFP fusion protein in the nucleus and mCherry in the cytoplasm (FIG. 2, left) for automatic segmentation of cell boundary. A multistep pipeline was implemented as follows: (1) generate the CD-tagging retrovirus in a retrovirus packaging cell line; (2) infect CD-tagging retrovirus into A549 pSeg clones; (3) use FACS to sort single CD-tagged clones into 384-well plates; (4) confirm positively fluorescent clones by microscope; and (5) identify CD-tagged genes by 3' RACE or 5' RACE in HT (96 well) format, and optionally, compare localization of expressed proteins to published reports. Using this method, a library of 152 reporter cell lines was built from CD- and pSeg-tagged A549 clones, each containing a tagged protein from diverse cancer-related functional pathways. FIG. 3 shows the distribution of the 152 reporter cell lines by functional pathway.
Example 2
Perturbing and imaging reporter cell lines
To assess whether the reporter cell lines were informative, 89 reporter cell lines were randomly selected from the library. Each reporter was treated with 11 different conditions (i.e. 1 control + 2 compounds x 5 drug classes), as shown in FIG. 6A. After treatment, cells were imaged every 6 hours for 2 days (4 images per well at 10X magnification). Visual inspection of the images revealed that different tagged proteins displayed diverse cellular responses to the panel of chemical compound perturbations (e.g., FIG. 4). Clone-to-clone variation also contributed to observed differences. An objective computational approach is next used to take advantage of the observed differences to identify reporters that most accurately predict mechanism of compound action.
Example 3
Profiling reporter cell lines
Images obtained in the previous step are analyzed via an image-analysis pipeline as shown in FIG. 5. First, images were flat-field corrected and background subtracted. Second, individual cellular regions were identified by applying a watershed-based algorithm to the pSeg channels described herein. Third, morphology- and intensity-related features were extracted from each cell (preferably at least 80 features). Fourth, the distribution for each feature was compared between treated and control cell populations (FIG. 5B, blue vs. black curves). For each reporter, a non-parametric Kolmogorov-Smirnov (KS) statistic was computed to determine the direction and magnitude of the population shift due to treatment (FIG. 5B, green vs. red vertical lines). To summarize overall cellular responses, KS statistics were collected across features, time points, and/or reporter cell lines (FIG. 5B-C). The profiling approach was particularly suited for profiling large compound libraries, as responses may be compared to controls on the same plate, and thus plate- and batch-level variation were minimized, and compound profiles may be built incrementally from different reporters and time points.
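By way of illustration only, the fourth step above (comparing treated and control feature distributions with a signed KS statistic) can be sketched as follows. This is a minimal sketch and not the implementation used in the examples; the function names `signed_ks` and `profile`, the dictionary layout of the inputs, and the sign convention (positive when treated values shift upward) are assumptions made for this sketch.

```python
from bisect import bisect_right


def signed_ks(treated, control):
    """Signed two-sample KS statistic for one feature.

    Returns the largest gap between the two empirical CDFs, keeping the
    sign of (control CDF - treated CDF). The sign convention is an
    assumption of this sketch: positive means treated values shifted up.
    """
    t_sorted, c_sorted = sorted(treated), sorted(control)
    best = 0.0
    for x in sorted(set(t_sorted) | set(c_sorted)):
        cdf_t = bisect_right(t_sorted, x) / len(t_sorted)
        cdf_c = bisect_right(c_sorted, x) / len(c_sorted)
        gap = cdf_c - cdf_t
        if abs(gap) > abs(best):
            best = gap
    return best


def profile(treated_features, control_features):
    """Collect signed KS scores across features into one profile vector.

    Both arguments are hypothetical dicts mapping a feature name to the
    list of per-cell values measured for that feature.
    """
    return [signed_ks(treated_features[name], control_features[name])
            for name in sorted(control_features)]
```

Profiles for additional time points or reporters may be built the same way and concatenated, which matches the incremental construction described above.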
Example 4
Identifying maximally-informative reporter cell lines for predicting drug classes
Maximally informative reporter cell lines are identified by assessing whether the combined collection of reporters could distinguish different drug classes. Profiles are computed from the compound-treatment dataset of the randomly-selected cell lines. As shown in FIG. 6A, heat map visualization reveals striking similarity of profiles for compounds from similar drug classes and, likewise, dissimilar profiles for compounds of dissimilar classes. Thus, profiles from live-cell readouts are informative signatures of drug mechanism. Validating the efficacy of the method, replicate profiles, which are derived from separate experiments, group together (FIG. 6B).
Example 5
Testing the predictive power of a small number of reporters
To assess whether phenotypic profiles of novel compounds that are similar to compounds of known mechanism of action could be used for predicting drug classes, responses of six (6) randomly chosen reporter cell lines were assessed with a subset of the NCI Challenge Drug Set. Three (3) compounds having profiles similar to those of DNA inhibitors (FIG. 7A, red ellipse) were selected. Subsequent validation assays of DNA damage and cell cycle arrest suggested that predictions were functionally meaningful (FIG. 7B-C), as PubChem lists these drugs as affecting DNA in yeast screens. Thus, these results show that similarity of profiles to known drugs can accurately predict drug class for novel compounds.
Example 6
Quantifying drug class discrimination
A quantitative approach was developed to assess how well different drug classes could be distinguished by their profiles. For each compound c in a given drug class M(c), the average squared distance to other drugs within or between classes was computed by $d_w(c) = \mathrm{mean}_{j \in M(c)} |P_c - P_j|^2$ or $d_b(c) = \mathrm{mean}_{j \notin M(c)} |P_c - P_j|^2$, respectively. Here, $P_c$ and $P_j$ denote profiles for compounds c and j, and $|\cdot|$ denotes Euclidean distance. The standard Silhouette index, $S(c) = \frac{d_b(c) - d_w(c)}{\max(d_b(c),\, d_w(c))}$, was used to assess the average ability of profiles to distinguish class M from all other classes: $DP(M) = \mathrm{mean}_{c \in M} S(c)$. Finally, this score was averaged over all classes, $DP_{\mathrm{Total}} = \mathrm{mean}_M DP(M)$, to obtain an overall assessment of discriminative power. This score is referred to as the total Discriminative Power (DPTotal), where a score of 0 indicates random mixing, and 1 indicates perfect separation of compound classes.
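By way of illustration only, the score described in this example may be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `dp_total` and the list-based inputs are hypothetical, every class is assumed to contain at least two compounds, and the profiles are assumed not to be all identical (otherwise a division by zero would occur).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two profile vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def dp_total(profiles, labels):
    """Total discriminative power for a set of compound profiles.

    profiles: list of profile vectors, one per compound.
    labels:   drug class of each compound, in the same order.
    """
    n = len(profiles)
    silhouettes_by_class = {}
    for c in range(n):
        # Average squared distance to the other compounds of the same class.
        d_w = mean(sq_dist(profiles[c], profiles[j]) for j in range(n)
                   if j != c and labels[j] == labels[c])
        # Average squared distance to all compounds of other classes.
        d_b = mean(sq_dist(profiles[c], profiles[j]) for j in range(n)
                   if labels[j] != labels[c])
        s = (d_b - d_w) / max(d_b, d_w)
        silhouettes_by_class.setdefault(labels[c], []).append(s)
    # Average within each class first, then across classes.
    return mean(mean(s_list) for s_list in silhouettes_by_class.values())
```

Well-separated classes give a value close to 1, while classes whose members are interleaved give values near or below 0.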
Example 7
Quantifying discriminative powers of different reporters
Discriminative powers of different reporters were quantified. The discriminative power of profiles generated from individual reporters at a single time point (48 hr) was analyzed. As shown in FIG. 8 (top histogram), the value of DPTotal varied dramatically across the reporters. Using this measure, reporters were identified that had a high ability to discriminate among drug classes based on a single time-point profile. Profiles, even those built from a single reporter and single time point, are therefore highly useful to identify candidate lists of compounds enriched for specified drug classes.
Example 8
Strategy for identifying maximally informative reporters.
Some reporters exhibit poor ability to discriminate among the different drug classes (FIG. 8, top, reporters with low DPTotal scores) while others are remarkably effective (high DPTotal scores). A "greedy search" strategy may be used, that is, starting with the single most informative reporter for separating all drug classes and then sequentially adding the reporter that gives the highest DPTotal value for the expanded reporter set. As shown in FIG. 8, DPTotal improves from one reporter to reach a maximum around 2-3 reporters, then decreases with additional numbers of reporters (FIG. 8, bottom). Further, profiles based on data imaged every 24 hours provide similar performance to profiles built on higher temporal sampling of every 12 hours or 6 hours (FIG. 8, bottom).
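By way of illustration only, the greedy search described above may be sketched as a forward selection over reporters. The function name `greedy_select` and the caller-supplied `score` callable (for example, a function that builds merged profiles for a reporter set and returns its total discriminative power) are assumptions of this sketch and are not taken from the examples.

```python
def greedy_select(reporters, score, max_size=None):
    """Greedy forward selection of reporters.

    reporters: iterable of reporter identifiers.
    score:     hypothetical callable taking a list of reporters and
               returning the discriminative power of that set.
    Returns a list of (selected reporters, score) pairs, one per step,
    so that the set size with the best score can be read off afterwards.
    """
    remaining = list(reporters)
    chosen, path = [], []
    limit = max_size or len(remaining)
    while remaining and len(chosen) < limit:
        # Add the reporter that gives the best score for the expanded set.
        best = max(remaining, key=lambda r: score(chosen + [r]))
        chosen = chosen + [best]
        remaining.remove(best)
        path.append((list(chosen), score(chosen)))
    return path
```

Because the score may rise and then fall as reporters are added, the preferred set is taken as the step with the highest score, e.g. `max(path, key=lambda step: step[1])`.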
Example 9
Testing ability to identify novel drug classes
Finally, the ability to take reporters deemed "informative" based on a training set of drug classes and discriminate a new drug class was tested. To do this, a leave-one-class-out strategy was used. In particular, for each reporter the average of DPTotal on all possible subsets of four drug classes (used above) was compared with DPTotal computed on all five classes (FIG. 9). It was found that the most informative clones based on the smaller training set of 4 drug classes were still highly discriminative when adding in a new drug class. These results suggest minimal bias in using informative clones in later steps to identify novel compound classes.
Example 10
Construction of reporter cell line library for live-cell drug screening
To enable high-content profiling of large-scale compound libraries, a collection of triply-labeled live-cell reporter cell lines was established (Fig. 14a). First, cells were labeled for readouts for the whole cell (marked with mCherry fluorescent protein (RFP)) and the nucleus (marked with Histone H2B fused to cyan fluorescent protein (CFP)). These two markers (referred to as pSeg) facilitated automatic and accurate identification of cellular and nuclear regions and information about their morphology by computer. Second, Central Dogma (CD)-tagging, a well-established genomic-scale approach for randomly labeling different full-length proteins (marked by inserting yellow fluorescent protein (YFP) as an extra exon) was used. CD-tagged proteins typically express at endogenous levels and have preserved functionality, though for our profiling purposes they are only required to serve as reliable biomarkers of cellular responses to compounds.
To build the reporters, the well-studied A549 non-small cell lung cancer cell line was used as it is amenable both to transfection assays (high rates of transfection) and imaging studies (cells do not tend to clump and can be more easily identified by computer). A pSeg-tagged parent A549 clone expressing the nuclear and cellular reporters was selected. (Stable pSeg-tagged A549 clones were grown for tens of passages without signs of reduced expression.) With this clone, ~600 triply-labeled A549 reporter clones were generated (Fig. 15a). The identities of the CD-tagged genes were then determined by 3' RACE or 5' RACE. From this large collection, 93 reporters were selected that were tagged for distinct proteins, were placed in diverse GO-annotated functional pathways, and had detectable YFP levels by microscopy.
As an initial examination of our reporter library, six reporters representing diverse functional pathways and spatial localization patterns were selected (Fig. 15b). These reporters were treated for 48 hours with six compounds that targeted pathways related to our reporters. Microscopy images revealed that each reporter displayed diverse responses (cell morphological and protein localization) from drug to drug (Fig. 15b). Previous studies with fixed-cell assays demonstrated that quantitative measurements of cellular responses can be used to accurately group compounds by their mechanisms of action (MOA).
Example 11
Computation of reporter response profiles
Phenotypic profiles effectively transform compounds into vectors whose entries summarize the responses of cells in a population to the perturbation. This transformation occurred in three main steps. First, images of perturbed cells were transformed into collections of feature distributions: ~200 features of morphology (e.g. shape of the nuclear and cellular domains) and protein expression (e.g. intensity, localization and texture properties of the tagged protein) were measured for each cell. Second, feature distributions for each condition were transformed into numerical scores: for each feature, differences in cumulative distribution functions (CDF) between perturbed and unperturbed conditions were summarized by a Kolmogorov-Smirnov (KS) statistic. Third, scores were transformed into phenotypic profile vectors: for each perturbation, KS scores were combined across features to form a profile vector.
These resulting profile vectors succinctly summarized the effects of a compound, and could be further extended to include multiple time points, compound concentrations or even responses from multiple reporter cell lines.
Whether compounds with similar (or dissimilar) MOAs would produce relatively similar (or dissimilar) profiles was investigated. Six selected reporters were treated (Fig. 14b) with a small panel of "test" drugs (31 conditions = 5 compounds in each of 6 different drug classes and DMSO control) and cellular responses were imaged every 12 hours for 48 hours. 100 DMSO profiles were generated by randomly selecting cells in control conditions. Heat map representations of phenotypic profiles, built by merging data across our six reporters, revealed striking similarity of profiles for compounds from similar drug classes and, likewise, dissimilar profiles for compounds of dissimilar classes. Further, profiles were quite similar across replicate experiments. Thus, as with profiles built from fixed-cell assays and antibody readouts, profiles based on live-cell readouts were observed to produce informative signatures of drug mechanism.
To better visualize time-varying changes, our collection of profiles at each time point was projected into three dimensions. The resulting time traces showed the unperturbed cells remaining in a tight "ball" (Fig. 14c, gray curves). In contrast, time traces for different classes of drugs moved in different directions away from the DMSO "origin", with different members of each class moving in similar directions (Fig. 14c). The divergence of these time traces from one another also suggested that time points greater than 24 hours were sufficient for discriminating among drug classes. Thus, results suggested that phenotypic profiles from reporter cell lines can be used to predict compound MOA.
Example 12
Identification of "smart" drug-response reporter
To economize large-scale screens, it was next investigated to what degree a single reporter and time point could be used to accurately discriminate among our different drug classes. This time all 93 reporters were treated with our panel of "test" drugs, cellular responses were imaged, and phenotypic profiles were then computed for each of the reporters individually using only the final 48 hour time point. As only a single time point was used, each drug profile was a single point. To assess accuracy, six "test" drug profiles were randomly removed (one from each of the six categories), the centroid of the remaining four drug profiles in each category was computed, and the six test profiles were assigned to their nearest centroids (Online Methods). Prediction accuracy was determined by repeating this process 100 times and averaging the results across two duplicates of the experiment.
It was found that prediction accuracy varied dramatically from clone to clone ("random" guessing of 1 DMSO or 6 drug categories is 14%; Fig. 16). As might be expected, the parent A549 reporter (RiHC), labeled only with cellular region markers, displayed the lowest prediction accuracy (60%). However, this accuracy is already more than four times better than random guessing, which confirmed recent results that morphology carries considerable information for predicting MOA. Nevertheless, results also confirmed the intuition that additional information from tagged proteins would improve MOA predictions. The reporter labeled for XRCC5, a nuclear-localized protein that functions in DNA double strand break repair, displayed the highest prediction accuracy (94%). Thus, the approach of building a large and diverse collection of reporter cell lines and applying an objective test to identify the one best able to classify drugs by MOA yielded XRCC5 as the "smartest" reporter.
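By way of illustration only, the hold-out test described above (remove one profile per category, compute centroids from the rest, assign each removed profile to its nearest centroid, and repeat) may be sketched as follows. The function name `holdout_accuracy`, the dictionary input, and the fixed random seed are assumptions of this sketch; each category is assumed to hold at least two profiles.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def holdout_accuracy(profiles_by_class, rounds=100, seed=0):
    """Estimate nearest-centroid prediction accuracy for one reporter.

    profiles_by_class: hypothetical dict mapping a category name to the
    list of profile vectors of the compounds in that category.
    """
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(rounds):
        held_out, centroids = {}, {}
        for name, vectors in profiles_by_class.items():
            i = rng.randrange(len(vectors))
            held_out[name] = vectors[i]
            # Centroid of the profiles left after removing the test profile.
            centroids[name] = centroid(vectors[:i] + vectors[i + 1:])
        for true_name, vec in held_out.items():
            predicted = min(centroids,
                            key=lambda k: sq_dist(vec, centroids[k]))
            correct += predicted == true_name
            total += 1
    return correct / total
```

Running this once per reporter and ranking reporters by the returned accuracy is one way to arrive at a single most predictive reporter.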
Example 13
Identification of multi-category hits using "smart" reporter
The XRCC5 reporter was used to perform a large-scale phenotypic screen of small-molecule compound libraries. These libraries included: the NCI approved oncology drug set IV (101 compounds), the NCI diversity set IV (1596 compounds), the NCI natural product set III (117 compounds), the Prestwick FDA approved drug set (1100 compounds), and the UTSW 8K set (8,000 compounds). All compound libraries were assayed at three different concentrations, except the Prestwick and UTSW sets, which were assayed at a single concentration due to their large sizes. Additionally, a "reference" drug set was included, which expanded the previous "test" set to 10 drug classes affecting diverse biological processes (Fig. 17). All 38 reference drugs were used at eight 5-fold dilutions. Finally, to increase chances of identifying drug effects at different time scales, cells were imaged at both 24 and 48 hours. In total, profiles were built from ~62,000 3-channel images of ~20,000 conditions (derived from 10,914 unknown compounds, reference drugs and controls), ~60,000,000 identified cellular regions, and ~230 features per cell, yielding a total of ~1.4×10^10 data points. Final compound profiles were built by merging data across these two time points. As before, these compound profiles can be viewed as points in a high-dimensional feature space.
A multi-step strategy to identify "hits" was taken. First, feature space was transformed and its dimensionality reduced to maximally separate reference drug classes from one another. Linear discriminant analysis (LDA) was applied to the collection of reference drug profiles to identify an "optimal" transform that increased separation of profiles across drug classes while decreasing separation of profiles within each class. Second, our unknown compounds were assigned to the nearest reference drug class. A nearest-neighbor approach was applied to the LDA-transformed space to assign each unknown compound to the category of its nearest reference drug. Third, confidence scores were calculated, ranging from 0 (low) to 1 (high), for each prediction. Scores were estimated based on the collection of intra- and inter-drug category distances among profiles in our reference drug set, and compounds were re-annotated as "unclassified" if their predictions had low confidence scores (a threshold score of 0.1 was heuristically chosen based on calibration with the NCI approved oncology drugs of known mechanism). Finally, "hit" compounds were identified. Hits were defined as compounds not annotated by the control category "DMSO"; that is, hits are "bioactive", but may not necessarily be near to known drug categories. Taken together, this approach made it possible to predict which compounds have interesting activity away from "DMSO", to predict whether they have known or novel MOAs, and to prioritize compounds for validation based on confidence scores. Using this strategy, 431 "hit" compounds were identified from our diverse compound libraries (Fig. 17). These hit compounds were rescreened to test for reproducibility of our MOA predictions. Of these original hits, ~40% (175/431) had the same predictions as in the first screen (Fig. 4b, middle pie chart).
This process provided us with 175 reproducible hits (RHs) across 6 of our 10 reference MOA categories: 49 DNA inhibitors, 45 MT (microtubule) inhibitors, 5 mTOR inhibitors, 4 proteasome inhibitors, 2 HDAC inhibitors, 1 Hsp90 inhibitor, and 69 unclassified compounds (Fig. 17b, bottom pie chart).
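By way of illustration only, the assignment and hit-calling steps of this example may be sketched as follows. This is a minimal sketch under stated assumptions: profiles are assumed to have already been transformed (e.g. by LDA, which is not reproduced here), and the confidence scores are taken as precomputed inputs because the example does not give their formula. The function names `nearest_category` and `annotate` and the data layout are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_category(profile, references):
    """Category of the nearest reference drug.

    references: list of (profile vector, category) pairs.
    """
    return min(references, key=lambda ref: sq_dist(profile, ref[0]))[1]


def annotate(unknowns, references, confidence, threshold=0.1):
    """Assign categories to unknown compounds and list the hits.

    unknowns:   dict of compound name -> transformed profile vector.
    confidence: dict of compound name -> score in [0, 1], assumed to be
                computed elsewhere from reference-set distances.
    """
    calls = {}
    for name, profile in unknowns.items():
        category = nearest_category(profile, references)
        # Low-confidence predictions are re-annotated as unclassified.
        if confidence[name] < threshold:
            category = "unclassified"
        calls[name] = category
    # Hits are compounds not annotated with the control category.
    hits = sorted(name for name, cat in calls.items() if cat != "DMSO")
    return calls, hits
```

Note that, on this literal reading, a low-confidence prediction near the control would also be re-annotated as "unclassified" and therefore counted as a hit; whether that matches the screen as run is not stated in the example.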
Example 14
Validation of primary hits by secondary screening
Next the accuracy of the predicted hits was investigated, beginning with the two smallest predicted categories: Hsp90 and HDACs. Identified hits in these two categories both had literature support. In the Hsp90 category, our compound NSC330500 (macbecin II) was shown previously to be an Hsp90 inhibitor. In the HDAC category, compounds NSC701852 (from the NCI oncology drug set) and Vorinostat (from the Prestwick library) were different names for the same compound (SAHA), which was also shown previously to be an HDAC inhibitor.
For the remaining four categories, secondary validation was performed (Fig. 17). All reference drugs and 175 reproducible hits were used so that one could both calibrate readout thresholds using reference drugs and estimate false discovery rate of the predictions. Specifically, for each validation screen, a readout threshold was chosen so that 90% of the reference drugs above the threshold belong to the category to be validated (FDRRef = 0.1, Fig. 5 top). Then the false discovery rate of our reproducible hit predictions (FDRRH) was calculated as the percentage of predicted compounds that failed to pass the readout threshold.
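By way of illustration only, the threshold calibration and false discovery rate calculation described above may be sketched as follows. The function names `calibrate_threshold` and `hit_fdr` are hypothetical. The sketch assumes that a higher readout indicates activity and picks the lowest threshold meeting the target; a readout that decreases with activity (such as a phosphorylation level that drops) would need the comparisons reversed.

```python
def calibrate_threshold(readouts, in_category, target_fdr=0.1):
    """Lowest threshold at which reference drugs meet the target FDR.

    readouts:    readout value for each reference drug.
    in_category: True if that reference drug belongs to the category
                 being validated, in the same order as readouts.
    Returns None if no candidate threshold reaches the target.
    """
    for threshold in sorted(set(readouts)):
        passing = [flag for value, flag in zip(readouts, in_category)
                   if value >= threshold]
        fdr = passing.count(False) / len(passing)
        if fdr <= target_fdr:
            return threshold
    return None


def hit_fdr(hit_readouts, threshold):
    """Fraction of predicted compounds that fail the readout threshold."""
    failed = sum(1 for value in hit_readouts if value < threshold)
    return failed / len(hit_readouts)
```

For instance, 12 failures among 49 predicted compounds gives `hit_fdr` of about 0.24.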
To test DNA damaging activity, immunofluorescence (IF) staining experiments were carried out to detect the level of phospho-H2AX, whose level increases rapidly in response to DNA damaging agents. The DNA damaging ability of each reference drug/compound was measured by the median of averaged phospho-H2AX intensities in nucleus regions. In Fig. 18, at the threshold of 60.74 (FDRRef = 0.1), 76% (37/49) of the predicted DNA compounds (Fig. 17, blue dots) passed the validation (24% failed, FDRRH = 0.24).
To test microtubule perturbing activity, live cell imaging with TUBA1C CD-tagged reporter was carried out to examine mitotic arrest. Mitotic index (the percentage of cells undergoing mitosis) was calculated based on cell morphology and tubulin intensity. At the threshold of 0.06 (FDRRef = 0.1), 96% (43/45) of our predicted MT compounds (Fig. 17, yellow dots) passed the validation (4% failed, FDRRH = 0.04).
To test proteasome inhibiting activity, a ubiquitin-fused R-GFP clone of HeLa cells was used. Under normal conditions, R-GFP would be degraded through the ubiquitin-proteasome system by the N-end rule pathway, and show no fluorescent intensity. In contrast, proteasome inhibitors cause an increase in R-GFP signal (Fig. 18, drug PS-341 at bottom). At the threshold of 166 (FDRRef = 0.1), all our 4 predicted proteasome hits were validated (Fig. 17, red dots). Among those, two (Carfilzomib and Bortezomib) were identified previously in the literature. The other two (NSC26113 and NSC33570) were not previously known as proteasome inhibitors and are new predictions.
To test mTOR inhibiting ability, the level of the phosphorylated ribosomal protein S6 was measured by immunostaining. Under normal conditions, mTOR constitutively phosphorylates the ribosomal protein kinase (S6K), which in turn phosphorylates S6. Therefore, mTOR inhibitors will decrease the level of phosphorylated S6. However, the Hsp90 reference drugs also decreased the phosphorylation level of S6 (Fig. 18), which was consistent with previous literature. Given that both mTOR and Hsp90 inhibitors reduce the level of phosphorylated S6, these two categories were combined when calculating FDRRef for reference drugs. At the threshold of -4.7 (FDRRef = 0.1), the predicted Hsp90 inhibitor (macbecin II; Fig. 17, magenta dots) and three predicted mTOR inhibitors (Fig. 17, green dots) passed the threshold. Two of the three validated mTOR compounds were previously known in the literature as rapamycin and everolimus; the other one (NSC 176324) is a new prediction. Taken together, these results suggest that the predictions of compound MOA, derived from a single-pass screen, had high accuracy across diverse functional categories.
Example 15
Identification of hits in novel MOA classes
The approach can also group compounds with novel MOAs that were not included in the reference drugs. Profiles of the compounds that showed activity (away from DMSO) in the secondary phenotypic screen of the identified 431 hits and reference drugs were collected. These profiles were grouped using hierarchical clustering. Clusters were defined so that the average distance among members was in the bottom 5% of the distribution of all pairwise distances among compounds (excluding DMSO).
A number of interpretable clusters emerged from the data (Fig. 18). Most of these consisted of reference drugs with the same MOA along with the previously described high-confidence predicted hits. Interestingly, some of the remaining clusters consisted of hit compounds whose MOAs were not included in our reference drugs. In one case, a cluster of compounds emerged that consisted of glucocorticoid steroids such as Betamethasone, Flunisolide, and Halcinonide. In another case, NSC740 (Methotrexate, NCI oncology drug set), NSC382035 (methylbenzoprim, NCI diversity set), and Amethopterin (R,S)' (Prestwick library) were grouped together (Fig. 18, inset). Previous studies showed that these compounds are dihydrofolate reductase inhibitors. These results demonstrate that the approach can identify groupings of drugs that work through known MOAs not included in the selection of the smart reporters, and even identify groupings of compounds that work through novel MOAs.
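By way of illustration only, the clustering criterion described above may be sketched as a simple agglomerative procedure. This is a sketch under stated assumptions rather than the clustering actually used: clusters are merged greedily while the average pairwise distance among the members of the merged cluster stays at or below a cutoff taken from the bottom fraction of all pairwise distances. The function names and the exact percentile indexing are hypothetical, and the caller is assumed to have removed DMSO profiles beforehand.

```python
from itertools import combinations
from math import dist


def mean_pairwise(points):
    """Average Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def percentile_cutoff(points, fraction=0.05):
    """Distance below which roughly `fraction` of all pairs fall."""
    distances = sorted(dist(p, q) for p, q in combinations(points, 2))
    index = max(0, int(fraction * len(distances)) - 1)
    return distances[index]


def cluster(profiles, fraction=0.05):
    """Group compounds whose average within-cluster distance is small.

    profiles: dict of compound name -> profile vector.
    Returns the clusters (sorted lists of names) with two or more members.
    """
    cutoff = percentile_cutoff(list(profiles.values()), fraction)
    clusters = [[name] for name in profiles]

    def merged_spread(pair):
        return mean_pairwise([profiles[n] for n in pair[0] + pair[1]])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=merged_spread)
        if merged_spread((a, b)) > cutoff:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return [sorted(c) for c in clusters if len(c) > 1]
```

The pairwise search is quadratic in the number of clusters at every step, so this form is only suitable for small collections of profiles.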

Claims

CLAIMS We claim:
1. A cell comprising at least one heterologous marker, wherein said marker responds in a first and second characteristic manner in response to a first and second agent and said respective characteristic manners are indicative of the mechanism of action of said agents based on the characteristic response of said marker to a plurality of agents having known mechanisms of action.
2. A cell comprising at least one heterologous marker, wherein the response of said marker induced by an agent is capable of predicting a mechanism of action of said agent upon comparison of said response to the responses of said marker to a plurality of agents having different mechanisms of action.
3. A cell comprising at least one heterologous marker, wherein said cell is capable of predicting the mechanism of action of an agent by the agent's ability to induce a phenotype in said cell that is similar to the phenotype induced by known agents.
4. The cell of claim 1-3, wherein the cell is capable of predicting similarities in mechanism of action of a plurality of agents.
5. The cell of claim 4, wherein the mechanism of action of at least some of said agents is unknown.
6. The cell of claim 1-3, wherein said marker is selected from GFP and YFP.
7. A library of smart reporter cells, each capable of predicting a bioactivity of a plurality of agents by the agents' ability to induce a phenotype of said cells that is similar to the phenotype induced by known agents.
8. The library of claim 7, wherein each of said cells comprises a heterologous marker.
9. The library of claim 7, wherein the library is derived from a single cell type.
10. The library of claim 8, wherein at least a first and second cell of said library comprise a heterologous marker at a different locus within the genome of said cell.
11. The library of claim 7, wherein the library is derived from at least a first and second cell type.
12. The library of claim 11, wherein at least a first and second cell of said library comprises a heterologous marker at a different locus within the genome of said cells.
13. A method of predicting mechanism of action of an agent comprising:
a. contacting a library of cells of claim 1-3 with an agent;
b. monitoring said marker;
c. comparing the marker phenotype to marker phenotypes induced by a plurality of agents of known mechanism; and
d. predicting the mechanism of action of said agent based on a similarity in marker response between the response induced by said agent and at least one of said responses induced by at least one agent of known mechanism.
14. The method of claim 13, wherein said plurality of agents comprises at least 5 agents of known mechanisms.
15. The method of claim 13, wherein said marker is monitored by imaging.
16. The method of claim 13, wherein said marker is monitored for at least 5 minutes.
17. A method of making a smart reporter cell comprising:
a. inserting a heterologous marker into the genome of a plurality of cells and propagating clones of said cells to form a first cell library;
b. contacting said first cell library with a first and second agent with first and second mechanisms of action, respectively;
c. monitoring the activity of said marker in response to said first and second agents;
d. contacting said first cell library with a third and fourth agent with first and second mechanisms of action, respectively;
e. monitoring the activity of said marker in response to said third and fourth agents; and
f. selecting cells from said first library that respond substantially the same to said first and third agents and said second and fourth agents.
18. A method of making a library of tissue specific smart reporter cells comprising:
a. inserting a heterologous marker into the genome of stem cells and propagating clones of said cells to form a first stem cell library;
b. contacting said first stem cell library with a first and second agent with first and second mechanisms of action, respectively;
c. monitoring the activity of said marker in response to said first and second agents;
d. contacting said first stem cell library with a third and fourth agent with first and second mechanisms of action, respectively;
e. monitoring the activity of said marker in response to said third and fourth agents;
f. selecting stem cells from said first library that respond substantially the same to said first and third agents and said second and fourth agents; and
g. differentiating said selected stem cells into different cell types.
19. A method of making a library of tissue specific smart reporter cells comprising:
a. inserting a heterologous marker into the genome of stem cells and propagating clones of said cells to form a first stem cell library;
b. differentiating cells in said first stem cell library into different cell types to form a differentiated cell library;
c. contacting said differentiated cell library with a first and second agent with first and second mechanisms of action, respectively;
d. monitoring the activity of said marker in response to said first and second agents;
e. contacting said differentiated cell library with a third and fourth agent with first and second mechanisms of action, respectively;
f. monitoring the activity of said marker in response to said third and fourth agents;
g. selecting cells from said differentiated cell library that respond substantially the same to said first and third agents and said second and fourth agents.
20. A method of screening for bioactive agents that target a particular cell type, said method comprising contacting said tissue specific libraries of claims 18 or 19 with a first agent and monitoring the effects of said agent on the markers in each of said differentiated cell types.
21. A database comprising the detected results of marker activity in response to a plurality of agents obtained from the cells of claims 1, 2 or 3.
PCT/US2015/055681 2014-10-16 2015-10-15 Smart reporter cells and methods of making and using same WO2016061318A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462064923P 2014-10-16 2014-10-16
US62/064,923 2014-10-16

Publications (1)

Publication Number Publication Date
WO2016061318A1 true WO2016061318A1 (en) 2016-04-21

Family

ID=55747320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/055681 WO2016061318A1 (en) 2014-10-16 2015-10-15 Smart reporter cells and methods of making and using same

Country Status (1)

Country Link
WO (1) WO2016061318A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3938777A4 (en) * 2019-03-15 2022-11-23 Recursion Pharmaceuticals, Inc. Process control in cell based assays

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002052039A2 (en) * 2000-12-27 2002-07-04 Geneka Biotechnology Inc. Methods for selecting and producing pharmaceutical compounds using a library responsive to transcription factors
US20050196743A1 (en) * 1999-09-10 2005-09-08 Cadus Technologies, Inc. Cell surface proteins and use thereof as indicators of activation of cellular signal transduction pathways
US20060292695A1 (en) * 2005-06-22 2006-12-28 Roslin Institute Methods and kits for drug screening and toxicity testing using promoter-reporter cells derived from embryonic stem cells
WO2009002565A1 (en) * 2007-06-26 2008-12-31 Cellumen, Inc. Method for predicting biological systems responses in hepatocytes
US8460876B2 (en) * 2002-01-16 2013-06-11 Affymetrix, Inc. Screening methods involving the detection of short-lived proteins

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15851087; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15851087; Country of ref document: EP; Kind code of ref document: A1)