WO2009146617A1

WO2009146617A1 - Systematic identification of cis-silenced genes by trans-complementation

Info

Publication number: WO2009146617A1
Application number: PCT/CN2009/071142
Authority: WO
Inventors: T. Lahn Bruce
Original assignee: Cyagen Biosciences (Guangzhou) Inc.
Priority date: 2008-06-06
Filing date: 2009-04-02
Publication date: 2009-12-10
Also published as: CN101743328A; CN101743328B; CN102559899B; CN102559899A

Abstract

Methods to identify cis-silenced (occluded) genes through fusion of dissimilar cells (e.g., responder and reprogrammer) are disclosed. By trans-complementation mechanisms and differential expression profiling in the fused and unfused cells, the biallelically cis-silenced genes in a responder cell type are identified. Use of the trans-complementation assay to systematically identify and characterize occluded genes has wide-ranging applications in studies of health and disease, including cancer therapeutics.

Description

SYSTEMATIC IDENTIFICATION OF CIS-SILENCED GENES BY TRANS-COMPLEMENTATION

BACKGROUND

[0001] A trans-complementation assay allows the systematic identification of biallelically occluded genes.

[0002] Multicellular life is defined by the presence, within a single organism, of a wide array of cell types bearing the same genome but with disparate physiological functions. This is typically achieved through the progressive differentiation and diversification of multipotent stem cells into functionally specialized cells. As a general rule, differentiated cell types can stably maintain their phenotypic identities despite fluctuations in extracellular environment and intracellular regulatory networks. How cell type identity is maintained at the molecular level is a central but poorly understood question in biology. One possible explanation is that the phenotypic identity of differentiated cells is maintained via the stable silencing of lineage-inappropriate genes - i.e., genes promoting alternative lineages whose aberrant expression would lead to the manifestation of incorrect cellular phenotypes.

[0003] This explanation is in line with the increasing recognition that the transcriptional output of a gene is the combined product of two distinct inputs. The first is the trans-acting milieu of the cell, defined as all the diffusible factors that collectively impinge on gene regulatory sequences to promote or repress expression. The second is the cis-acting chromatin state of the gene itself, defined as the full complement of chromatin marks at the locus, marks such as DNA methylation, histone modifications, and the binding of chromatin remodeling factors, which in combination determine how the locus responds to its milieu. Particular chromatin marks such as DNA methylation and histone hypoacetylation are enriched at silent loci of the genome. In most cases, however, the exact contribution of these chromatin marks to the silent state cannot be teased apart from the contribution of the milieu. This is because it is difficult to know whether chromatin marks at silent loci are the cause or consequence of silencing, or to what extent the silent status of a gene and its associated chromatin marks are reversible when cellular milieu changes. As such, whether gene silencing by chromatin-based cis mechanisms plays a key role in maintaining cell type identity remains to be resolved.

[0004] Monoallelic silencing such as X inactivation and imprinting is a clear exception to the above ambiguity. Here, it can be unequivocally ascertained that silencing is due to cis-acting chromatin mechanisms in a manner independent of milieu. The hallmark of monoallelic silencing is the differential expression of two copies of a gene - one silent and one active - in the same cell. The active copy serves as a positive control, attesting to the presence of a milieu that is conducive to the expression of the gene. In this context, the silent copy, which is bathed in the same milieu, must have been blocked from the milieu's action by the cis-effect of its chromatin state. Thus, at least in the case of monoallelic silencing, the transcriptional competency of a gene can be defined as existing in either of two states. One is the 'competent' state whereby a gene is capable of responding to the milieu of the cell, such that it is active if appropriate transcription activators are present, but silent if activators are absent or repressors are present. The other can be called the 'occluded' state whereby a gene is no longer capable of responding to the cell's milieu and remains silent even in the presence of a transcriptionally conducive milieu.

[0005] During development, some genes might become biallelically occluded by mechanisms similar to monoallelic silencing. This process could play an essential role in maintaining the phenotypic identities of cells. A key test of this model is the identification of biallelically occluded genes. However, the lack of a positive control - the equivalent of the active copies for monoallelically silenced genes - poses a technical challenge in ascertaining the presence of biallelically occluded genes. Without such a control, it is not possible to definitively differentiate whether a silent gene is in the occluded state or whether it is competent but not expressed simply due to the lack of a conducive milieu. Furthermore, biochemical modifications of chromatin, which may regulate gene expression in cis, are immensely complex (for example, there are over 100 known chromatin marks), thus limiting the use of a 'bottom up' approach to differentiate cis- versus trans-regulation.

SUMMARY

[0006] A trans-complementation assay allows the systematic identification of biallelically occluded genes. A method described herein is to fuse at least two disparate (dissimilar) cell types, and search in the fused cells for genes silent in one genome but active (expressed) in the other. "Disparate" or "dissimilar" includes cells from different species and different cell types from the same species. For example, a reprogrammer cell can provide a transcriptional milieu that is different from the responder cell's milieu. The extent of dissimilarity or disparateness may vary and may depend on the cell types used. Cells that are particularly well suited for this assay are those that are easy to grow, since increasing the number of cells used increases the sensitivity of the assay. Similar to monoallelic silencing, the active copies of genes serve as a positive control, compared to which the occluded state of the silent copies can be ascertained. A responder cell's occluded state is in relation to the corresponding reprogrammer cell. Multiple different reprogrammer cells can be simultaneously fused with a responder cell to identify a plurality of occluded genes in the responder cell type.

[0007] An artificial cell-type liposome (or any other cell-fusing component) with a transcriptional milieu can also be used. This liposome may encompass the nucleic acids encoding one or more genes that need to be assayed under the control of their native transcriptional control elements, together with the trans-factors necessary for their expression. A synthetic transcriptional milieu can also be used to identify the cis-silenced genes.

[0008] A method of identifying cis-silenced (occluded) genes by trans-complementation includes:

(a) selecting first and second cell types, wherein the cell types are genetically dissimilar;

(b) fusing the first and the second cell types to generate one or more fused cells, wherein the cell types are distinctly labeled;

(c) performing gene expression analysis for a plurality of genes expressed in the fused cells; and

(d) identifying cis-silenced (occluded) genes in the first cell type by comparing the gene expression before and after the fusion, wherein a cis-silenced gene is expressed from the genome of the second cell type before and after fusion and not expressed from the genome of the first cell type before and after fusion.
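The comparison in step (d) can be sketched as a simple decision rule. The following is a minimal illustration only, assuming that each gene has already been reduced to four present/absent calls; the `ExpressionCalls` layout, the function name and the category labels are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCalls:
    """Present/absent calls for one gene (hypothetical data layout)."""
    responder_before: bool     # first cell type genome, unfused cells
    responder_after: bool      # first cell type genome, fused cells
    reprogrammer_before: bool  # second cell type genome, unfused cells
    reprogrammer_after: bool   # second cell type genome, fused cells


def classify_responder_gene(calls: ExpressionCalls) -> str:
    """Classify the responder copy of a gene from before/after fusion calls."""
    if calls.responder_before:
        # Already active in the responder, hence competent by definition.
        return "expressed"
    if not (calls.reprogrammer_before and calls.reprogrammer_after):
        # No active reprogrammer copy to act as the positive control.
        return "uninformative"
    if calls.responder_after:
        # The silent copy responded to the shared milieu of the fused cell.
        return "transactivated"
    # Silent before and after fusion despite a conducive milieu.
    return "occluded"


if __name__ == "__main__":
    print(classify_responder_gene(ExpressionCalls(False, False, True, True)))
```

Genes whose reprogrammer copy is not active both before and after fusion are reported as uninformative rather than occluded, because the positive control that the assay relies on is missing in that case.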

[0009] The first cell type may be designated as a responder cell type and the second cell type may be a reprogrammer cell type. The reprogrammer cell may be devoid of a nucleus.

[00010] The first cell type may be a cancer cell type.

[00011] The cell types may be from different tissue types or different species. Multiple cell types are also within the scope of the disclosure.

[00012] An embodiment of dissimilar cell types includes mSMM from C3H mice and mDF from BG mice.

[00013] An occludome of a cell type is defined as an index of genes that are cis-silenced, wherein the cis-silenced genes are not capable of being activated by an appropriate trans-activating factor from a dissimilar cell type.

[00014] A method for determining the relative contributions of cis versus trans mechanisms to regulation of genes encoded in a target genome, the method comprising:

(a) fusing at least two disparate cell types in vitro;

(b) searching for the target genes differentially expressed between the genomes of the fused cells and the unfused cells, wherein cis-blocked genes cannot respond to trans signals introduced through the shared milieu of the fused cells; and

(c) comparing cis with trans genes.

[00015] A method for determining the relative contribution of cis versus trans mechanisms of regulation of a target gene, the method comprising:

(a) fusing at least two disparate cell types in vitro;

(b) comparing the expression level of the target gene from the genomes of the disparate cell types in unfused and fused cells, wherein the relative contribution correlates with the relative responsiveness of the target gene to trans signals introduced through the shared milieu of the fused cell.

[00016] A method to identify occluded genes includes:

(a) fusing cells from different species, wherein the cells from one of the species are "responders" and the other "reprogrammers";

(b) labeling the responders with different labels than the reprogrammers;

(c) detecting fused cells by the presence of dual labels;

(d) interrogating fused cells to determine expression patterns; and

(e) identifying genes that are silent in the responder genomes but active in the reprogrammer genomes.

[00017] In certain embodiments, expression patterns are determined by microarray analysis or by RT-PCR. In a particular embodiment, expression patterns are determined by microarray analysis followed by RT-PCR.
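Step (c) of the method of paragraph [00016], detecting fused cells by the presence of dual labels, can be illustrated with a simple two-channel gate. This is a sketch under stated assumptions: each cell is represented by a pair of fluorescence intensities, and the thresholds `red_min` and `green_min` are hypothetical values that would in practice be set from unfused, singly labeled controls.

```python
from typing import Iterable, List, Tuple

# One measured cell: (red intensity, green intensity). Hypothetical layout.
Event = Tuple[float, float]


def gate_dual_labeled(events: Iterable[Event],
                      red_min: float,
                      green_min: float) -> List[Event]:
    """Keep only events that exceed both label thresholds.

    Singly labeled cells (unfused responders or unfused reprogrammers)
    fail one of the two tests and are discarded.
    """
    return [(red, green) for red, green in events
            if red >= red_min and green >= green_min]


if __name__ == "__main__":
    cells = [(900.0, 50.0), (40.0, 800.0), (700.0, 650.0)]
    print(gate_dual_labeled(cells, red_min=500.0, green_min=500.0))
```

A dual-positive gate of this kind enriches for fused cells but does not by itself prove fusion, so microscopic confirmation of multinucleated cells remains a useful check.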

[00018] Embodiments of species include mouse and human.

[00019] Embodiments of the responder cells include human lung fibroblasts and embodiments of the reprogrammer cells include mouse skeletal muscle myoblasts.

[00020] Using the methods described herein with human lung fibroblast responder cells and mouse skeletal muscle myoblast reprogrammer cells, approximately 300 candidate genes were defined by microarray analysis and 52 genes were confirmed as occluded in subsequent validations. A subset of these occluded genes were tested in mouse/mouse, human/human and chimp/human fusions and occluded status was confirmed in most cases, suggesting that occlusion is not an artifactual result of mouse:human hybrids. The occluded state was shown to be stable following long term culture and nuclear fusion of hybrid cells.

[00021] A subset of muscle-related occluded genes were shown to be occluded in fusions of mSMM with other human cell types: HeLa, mesenchymal stem cells and keratinocytes. Additionally, a selective approach was applied to examine gene occlusion in fusions of hLF with mouse hepatocytes and, separately, with a mouse neuroblastoma cell line. Two hepatocyte-expressed, hLF-occluded genes were found to be occluded in the hepatocyte fusions, as was a single neuroblastoma-expressed, hLF-occluded gene in the neuroblastoma fusions, suggesting that occluded status is generally maintained in response to different potential reprogrammer cells.

[00022] The molecular basis of occlusion was also analyzed, e.g., DNA methylation. Bisulfite sequencing of a subset of occluded and non-occluded genes was used to identify potential methylation differences in hLF versus mSMM. A small number of occluded genes showed differential methylation close to the transcription start site. Treatment of hLF with the demethylating agent 5-azacytidine did not significantly change expression of occluded genes, but following cell fusion, low level activation of a number of these genes was observed, suggesting that occluded status had been at least partially eroded (notably many of these genes did not show differential methylation patterns).

[00023] The effect of the HDAC inhibitor TSA was also investigated in similar experiments but was found to be minimal.

[00024] The stability of the occluded state in response to variation in physiological conditions (nutrient, hypoxia and the like) was tested. Occluded status was found to be highly stable relative to trans-activatable control genes.

[00025] This indicates that chromatin marks mask some genes, i.e. occluded genes, from the transcription milieu in a given cell type whilst other genes are apparently more open to trans-acting factors. It is often key lineage determinants that are subject to chromatin modifications that mask them from lineage-inappropriate activation.

BRIEF DESCRIPTION OF THE DRAWINGS

[00026] FIG. 1. Identification of occluded genes: (A) Expression analysis of the hLF-mSMM fusion by RT-PCR; four classes of genes are shown: occluded hLF genes, transactivated hLF genes, extinguished mSMM genes, and occluded mSMM genes; for each gene, four RT-PCR results are shown: the two on the top target the mSMM ortholog before and after fusion, while the two on the bottom target the hLF ortholog before and after fusion; known muscle-related genes are indicated by '+' above the gene name; (B) summary of whether genes are occluded (denoted by O) or transactivated (T) in hLF, mDF, and cDF; the presence or absence of CpG island in each gene is indicated, along with whether transcription start site (TSS) is differentially methylated between hLF and hSMM; also indicated is whether each gene is the target of Polycomb binding in human embryonic stem cells. hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts; mDF: mouse dermal fibroblasts; cDF: chimpanzee dermal fibroblasts; hSMM: human skeletal muscle myoblasts.

[00027] FIG. 2. Expression analysis of the mDF-mSMM fusion by RT-PCR and sequencing of genes found to be occluded or transactivated in hLF; (A) RT-PCR performed with primers common to mDF and mSMM, showing expression in mSMM and fused cells but not in mDF; (B) sequencing of RT-PCR products from fused cells (last row of chromatograms); eleven of the 14 genes are occluded in mDF, as only the mSMM allele is expressed in fused cells; in contrast, Chrnd, Myog and MyH are transactivated, as both mDF and mSMM alleles are expressed; the first two rows of chromatograms are sequences of either mSMM or mDF alone, showing different alleles between these two cell types. Arrows in chromatograms indicate sites that are polymorphic between mDF and mSMM. mDF: mouse dermal fibroblasts; mSMM: mouse skeletal muscle myoblasts; hLF: human lung fibroblasts.

[00028] FIG. 3. The expression status of occluded genes versus control genes under various culture conditions; in addition to the normal culture condition, 5 additional conditions mimicking physiological variation were used, including low nutrient, hypoxia, hypothermia, hyperthermia, and interferon-γ treatment; control genes were selected on the basis of being silent under the normal condition: (A) stable silencing of occluded genes in hLF under various conditions; (B) activation of some of the control genes under culture conditions mimicking physiological variation; genes activated under one or more conditions are indicated by '*'.

[00029] FIG. 4. Bisulfite sequencing analysis of the occluded gene Myf5 (A) and the transactivated gene Acta1 (B); in the schema of gene structure, exons are shown in solid bars with thick bars indicating coding regions and thin bars indicating untranslated regions; bioinformatically identified CpG islands are indicated; in the conservation graph, the height of peaks reflects the degree of cross-species conservation; individual amplicons in bisulfite sequencing and their corresponding genomic regions are indicated by brackets; within each block of bisulfite sequencing data, columns correspond to CpG sites while rows correspond to sequenced clones; solid circles indicate methylated CpG; dots indicate unmethylated CpG.

[00030] FIG. 5. Effect of AdC (A) and TSA (B) treatment on occluded genes in hLF: RT-PCR analysis of gene expression was performed on drug-treated hLF without fusion and drug-treated hLF fused to mSMM. hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts.

[00031] FIG. 6. ChIP analysis of 16 chromatin marks in hLF; genes targeted by the analysis can be divided into silent and expressed categories, with the silent category further divided into occluded and transactivated groups: (A) PCR quantitation of ChIP for individual genes; each bar represents a region interrogated by a PCR amplicon; the height of each bar represents fold-enrichment, relative to input, of each gene; P-values are calculated from these data using the t-test, and indicate the statistical significance that two groups of genes are distinct for the chromatin mark surveyed; error bars are based on multiple replicates of the experiment; (B) principal component analysis of ChIP data across the 16 chromatin marks. NS: not significant; hLF: human lung fibroblasts.
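The group comparison described for FIG. 6A can be illustrated by a two-sample t statistic computed on fold-enrichment values. The sketch below is an assumption-laden illustration: the disclosure does not state which variant of the test was used, so the unequal-variance (Welch) form is chosen here, and the function name and the input values are hypothetical. Only the statistic and its degrees of freedom are returned; a P-value would be read from the t distribution with a statistics package.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(group_a: Sequence[float],
            group_b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for two groups of
    fold-enrichment values, using the unequal-variance (Welch) form."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error of each group mean.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up fold-enrichment values, for illustration only.
    occluded = [1.0, 2.0, 3.0]
    transactivated = [4.0, 5.0, 6.0]
    print(welch_t(occluded, transactivated))
```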

[00032] FIG. 7. Different perspectives of the genome regulation: (A) proposed 'occludome' perspective that considers the genome as including the occluded portion and the competent portion; (B) traditional 'transcriptome' perspective that considers the genome as comprising the expressed portion and the silent portion; the occludome may provide a molecularly more fundamental and physiologically more stable definition of cell type than the transcriptome.

[00033] FIG. 8. Fusion between hLF and mSMM: (A) purification of fused cells by FACS; prior to fusion, red and green fluorescent dyes were used to pre-label hLF and mSMM, respectively; after chemically induced fusion, hLF-mSMM fusion cells were purified based on dual fluorescence. Left panel is the cytometric plot of a control experiment involving the co-culture of hLF and mSMM without chemically induced fusion, which shows two distinct populations of cells. Right panel is the fusion experiment containing a third, dual-fluorescent population that includes fused cells; (B) light microscopy examination of FACS-purified cells; virtually all cells have dual fluorescence and contain two or more nuclei per cell; (C) the ratio of hLF versus mSMM nuclei in fused cells; the majority of cells have equal numbers of hLF versus mSMM nuclei (either 1:1 or 2:2); hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts.

[00034] FIG. 9. Time-course analysis of the hLF-mSMM fusion; human and mouse gene expression in fused cells was analyzed by RT-PCR at 1, 2, 3, 4, 8 or 16 days post fusion; the occluded state of the hLF genes is well maintained throughout these different time points; hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts.

[00035] FIG. 10. Time-course analysis of the mDF-mSMM fusion; expression of Myf5, Carl, Cxcr4 and Chrnd in fused cells was analyzed at 2, 4 or 8 days post fusion by sequencing of RT-PCR products from the respective genes; arrows indicate sites that are polymorphic between mDF and mSMM; Myf5, Carl and Cxcr4 in mDF remain occluded throughout, whereas Chrnd is clearly transactivated by day 4; mDF: mouse dermal fibroblasts; mSMM: mouse skeletal muscle myoblasts.

[00036] FIG. 11. Expression analysis of the cDF-hSMM fusion by RT-PCR and sequencing; of the 24 occluded and 10 transactivated hLF genes, 12 and 8, respectively, are informative in the cDF-hSMM fusion in that they contain exonic substitutions between cDF and hSMM, and are also differentially expressed between these two cell types: (A) RT-PCR on the 20 informative genes, performed with primers common to cDF and hSMM, showing expression in hSMM and fused cells but not in cDF; (B) sequences of RT-PCR products from fused cells (last row of chromatograms), indicating that a gene is either occluded in cDF, as only the hSMM allele is expressed in fused cells, or it is transactivated, as both cDF and hSMM alleles are expressed; the first two rows of chromatograms are sequences of either hSMM or cDF alone, showing different alleles between these two cell types. Arrows in chromatograms indicate nucleotide substitution sites between cDF and hSMM. cDF: chimpanzee dermal fibroblasts; hSMM: human skeletal muscle myoblasts; hLF: human lung fibroblasts.

[00037] FIG. 12. Evidence that cells in the mDF-mSMM fusion have undergone DNA synthesis and nuclear merger: (A) enlarged nuclear diameter (by about 40%) of mononucleated cells 4 days after fusion and FACS purification; these mononucleated cells represent the great majority of cells at this stage; (B) increase of BrdU-labeled cells over time indicating DNA synthesis; (C) immunofluorescence staining of BrdU incorporation 4 days after fusion; (D) presence of both mDF and mSMM genomes in the single nucleus of mononucleated cells 4 days after fusion; CldU and IdU were used to label mDF and mSMM genomes, respectively, prior to fusion, and visualized by immunofluorescence staining in fused cells; mDF: mouse dermal fibroblasts; mSMM: mouse skeletal muscle myoblasts. Error bars in A and B represent standard errors of the mean.

[00038] FIG. 13. Identification of occluded genes in diverse cell types; three non-muscle cell types, hMSC, hKe and HeLa, were fused with mSMM, with results shown in panels A, B, and C, respectively; the same 24 occluded hLF genes identified in the hLF-mSMM fusion were examined in these fusions; known muscle-related genes are indicated by '+' above the gene names; hLF: human lung fibroblasts; hMSC: human mesenchymal stem cells; hKe: human keratinocytes; mSMM: mouse skeletal muscle myoblasts.

[00039] FIG. 14. Bisulfite sequencing analysis of occluded hLF genes Myod1 (A), Cacng1 (B), Rapsn (C), and Tnnil (D), and transactivated hLF genes Myog (E), Ckm (F), Tnnil (G), and Tnnc1 (H); comparison is made between hLF and hSMM; this figure follows the same convention as FIG. 4. hLF: human lung fibroblasts; hSMM: human skeletal muscle myoblasts.

[00040] FIG. 15. Bisulfite sequencing analysis of transcription start site (TSS) of 18 occluded and 5 transactivated hLF genes; comparison is made between hLF and hSMM; the TSS of Ncam1 is covered in 2 amplicons; Cxcr4 and MyH each have two distinct TSS (indicated as a & b), which are analyzed separately; genes indicated by '*' are not expressed in hSMM, and it is therefore not known if they are occluded or competent in hSMM; all the other genes are expressed (and therefore competent) in hSMM; hLF: human lung fibroblasts; hSMM: human skeletal muscle myoblasts.

[00041] FIG. 16. Occluded hLF genes identified in hLF-mOst fusion; for each gene, four RT-PCR results are shown: the two on the top target the mouse ortholog in mOst before and after fusion, while the two on the bottom target the human ortholog in hLF before and after fusion; genes found to also be occluded in the hLF-mSMM fusion are indicated; genes found to be the target of Polycomb binding in human embryonic stem cells are indicated by 'Pc'; hLF: human lung fibroblasts; mOst: mouse osteoblasts; mSMM: mouse skeletal muscle myoblasts.

[00042] FIG. 17. Expression analysis of the hLF-mHe (A) and hLF-mNeu (B) fusions by RT-PCR; for each gene, four RT-PCR results are shown: the two on the top target the mouse ortholog in either mHe (A) or mNeu (B) before and after fusion, while the two on the bottom target the human ortholog in hLF before and after fusion; hLF: human lung fibroblasts; mHe: mouse hepatocytes; mNeu: mouse neuroblastoma.

[00043] FIG. 18. Robustness of the occluded state in a responder cell type to the use of different reprogrammer cell types; of the 24 occluded hLF genes identified in the hLF-mSMM fusion, Spockl and Mnda are also expressed in mHe, while Gnao1 is also expressed in mNeu, allowing the opportunity to examine whether these genes are occluded in hLF when it is fused to different reprogrammer cell types: (A) Spock.2 and Mnda in hLF are also occluded in the context of the hLF-mHe fusion; (B) Gnao1 in hLF is also occluded in the context of the hLF-mNeu fusion; the other genes cannot be ascertained because they are not expressed in either mHe or mNeu. hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts; mHe: mouse hepatocytes; mNeu: mouse neuroblastoma.

[00044] FIG. 19. Robustness of gene transactivation in a responder cell type to the use of different reprogrammer cell types. Of the 10 transactivated hLF genes identified in the hLF-mSMM fusion, Mfap5 is also expressed in mOst, while Acta1, MyH and Tnnil are also expressed in mHe, allowing the opportunity to examine whether these genes are transactivated in hLF when it is fused to different reprogrammer cell types: (A) Mfap5 in hLF is also transactivated in the context of the hLF-mOst fusion; (B) Acta1, MyH and Tnnil in hLF are also transactivated in the context of the hLF-mHe fusion; the other genes cannot be ascertained because they are not expressed in either mOst or mHe. hLF: human lung fibroblasts; mSMM: mouse skeletal muscle myoblasts; mOst: mouse osteoblasts; mHe: mouse hepatocytes.

DETAILED DESCRIPTION

[00045] A gene's transcriptional output is the combined product of two inputs: diffusible factors in the cellular milieu acting in trans, and chromatin state acting in cis. Dissecting the relative contribution of cis versus trans mechanisms to gene regulation is referred to as trans-complementation. This can be accomplished by fusing at least two disparate cell types and searching for genes differentially expressed between the two genomes of fused cells. Any differential expression can be causally attributed to cis mechanisms because the two genomes of fused cells share a single homogenized milieu in trans. A state of transcriptional competency was uncovered termed 'occluded', whereby affected genes are silenced by cis-acting mechanisms in a manner that blocks them from responding to the trans-acting milieu of the cell. Occluded genes were identified in a variety of cell types. Occluded genes in a given cell type tend to include master triggers of alternative cell fates. These master triggers are transcriptional regulators that drive the activation of lineage specific programs, the occlusion of which would safeguard cell type identity. Furthermore, the occluded state is maintained during cell division and is extraordinarily stable under a wide range of physiological conditions. Chromatin analysis suggests that the occlusion of some genes may involve DNA methylation, and that occluded genes are enriched for HP1-α binding. Together, these results support the concept that occlusion of lineage-inappropriate genes is a key mechanism of cell fate restriction. The systematic description of occluded genes by methods disclosed herein offers a novel molecular definition of cell type, and provides a hitherto unavailable functional readout of chromatin state across the genome.

[00046] Cell fusion has been used in the past to investigate gene regulation, generally focusing on the transactivation and extinction of tissue-specific genes in fused cells that indicate the presence of trans-acting transcriptional activators or repressors. An important utility of cell fusion is implemented in the trans-complementation assay. By fusing disparate cell types and searching for genes differentially expressed between the two genomes of the fused cells, the assay can dissect out the contribution of cis-acting mechanisms to gene silencing apart from the contribution of the trans-acting milieu. As one skilled in the art will appreciate, any suitable method for evaluating differential expression may be used according to the methods of the invention including, but not limited to, microarray analysis, real time PCR, Northern analysis and sequencing of cDNA produced by semi-quantitative reverse transcriptase PCR (RT-PCR). Species-specific microarray analysis is particularly useful for the global analysis of differentially expressed genes following fusion of cells originating from different species. Using this assay, a class of genes was identified existing in what is termed the occluded state, defined as a state of transcriptional competency whereby a gene remains silent even in the presence of a transcriptionally conducive milieu. The occluded state is maintained during cell division and is highly stable under a wide range of physiological conditions.

[00047] Monoallelic silencing such as X inactivation and imprinting clearly fits the definition of the occluded state. Biallelic occlusion also occurs as a widespread biological phenomenon, affecting many genes in diverse cell types. Indeed, monoallelic silencing can be viewed as a special case of gene occlusion, and it is plausible that biallelic occlusion is the ancestral state from which monoallelic silencing evolved. Biallelic occlusion may be key to defining and safeguarding the phenotypic identities of cells by stably shutting down lineage-inappropriate genes that might otherwise become active.

[00048] An 'occludome' perspective of the genome regulation means considering the genome of a cell type as including two portions, one being all the occluded genes and the other all the competent (actively expressed) genes (FIG. 7A). Actively expressed genes in a cell type are all competent, but silent genes can be either competent or occluded. This is a different conceptual framework for understanding genome regulation from the current 'transcriptome' perspective of the genome whereby genes are considered either expressed or silent (compare FIGS. 7A with 7B).

[00049] To systematically map all the occluded genes in a cell type, the cell is fused with a wide variety of other cell types that collectively express the entire genome. Such an occludome map provides a definition of the cell type that is physiologically more consistent - and perhaps molecularly more fundamental - than the rather labile transcriptome. By comparing occludome maps between cell types of different lineages, between stem cells and differentiated cells of the same lineage, between young and old cells, between normal and pathological cells (such as cancer), and between cells from different species, wide-ranging insights into fundamental mechanisms of development, aging, disease processes, and evolution are contemplated. Furthermore, for a given cell type, comparisons are made between the occludome map and genome-wide maps of chromatin marks such as DNA methylation and histone modifications. Such comparisons may reveal the biochemical underpinnings of the occluded state, and more importantly, provide a hitherto unavailable functional readout of the complex chromatin code superimposed on the genetic code.
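The assembly of an occludome map from fusions with several reprogrammer cell types can be sketched as a merge of per-fusion calls. This is a minimal illustration, assuming that each fusion has already been reduced to one status per gene; the status strings, the function name and the placeholder gene names are hypothetical. Genes called occluded in one fusion but transactivated in another are kept in a separate group, because occlusion is defined only in reference to a particular milieu.

```python
from typing import Dict, Set

# Per-fusion calls for one responder cell type:
# gene -> "occluded" | "transactivated" | "uninformative" (hypothetical labels)
FusionCalls = Dict[str, str]


def build_occludome(fusions: Dict[str, FusionCalls]) -> Dict[str, Set[str]]:
    """Merge calls from several reprogrammer fusions into one map."""
    occluded: Set[str] = set()
    activated: Set[str] = set()
    seen: Set[str] = set()
    for calls in fusions.values():
        for gene, status in calls.items():
            seen.add(gene)
            if status == "occluded":
                occluded.add(gene)
            elif status == "transactivated":
                activated.add(gene)
    return {
        "occluded": occluded - activated,          # occluded wherever testable
        "competent": activated - occluded,         # activated wherever testable
        "milieu_dependent": occluded & activated,  # outcome differs by fusion
        "unresolved": seen - occluded - activated, # never informative
    }


if __name__ == "__main__":
    fusions = {
        "reprogrammer_1": {"geneA": "occluded", "geneB": "transactivated",
                           "geneC": "uninformative"},
        "reprogrammer_2": {"geneA": "occluded", "geneB": "occluded",
                           "geneD": "uninformative"},
    }
    print(build_occludome(fusions))
```

Two maps built this way for different cell types can then be compared with ordinary set operations, for example the difference of their `"occluded"` sets.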

[00050] A binary on/off view of gene occlusion is presented. However, it is plausible that occlusion can sometimes lead to partial silencing of some genes, in which case a gene may show a quantitative expression difference between the two genomes of fused cells rather than a qualitative on/off difference. The trans-complementation assay is used to reveal both full and partial occlusion as long as a gene displays differential expression between the genomes of the fused cells (assuming that confounding factors such as interspecies incompatibility are ruled out).
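The distinction between full and partial occlusion can be illustrated by the share of expression in fused cells that comes from the responder genome. The sketch below rests on assumptions not stated in the disclosure: the two signals are taken to be comparable genome-specific measurements from the same fused cells, the two genomes are taken to be present in equal numbers, and the cut-offs `full_max` and `partial_max` are arbitrary illustrative values rather than validated thresholds.

```python
def responder_fraction(responder_signal: float,
                       reprogrammer_signal: float) -> float:
    """Share of a gene's expression in fused cells that comes from the
    responder genome (0.0 = responder copy silent, 0.5 = equal output)."""
    if responder_signal < 0 or reprogrammer_signal < 0:
        raise ValueError("signals must be non-negative")
    total = responder_signal + reprogrammer_signal
    if total == 0:
        raise ValueError("gene is not expressed from either genome")
    return responder_signal / total


def call_occlusion(fraction: float,
                   full_max: float = 0.05,
                   partial_max: float = 0.35) -> str:
    """Map a responder fraction to a call; cut-offs are illustrative only."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction <= full_max:
        return "fully occluded"
    if fraction <= partial_max:
        return "partially occluded"
    return "transactivated"


if __name__ == "__main__":
    print(call_occlusion(responder_fraction(20.0, 80.0)))
```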

[00051] The definition of the occluded state requires that a gene is silent (or nearly silent) even in the presence of a transcriptionally conducive milieu. However, this definition is only in reference to a particular milieu. A gene may be occluded in one milieu but active in another milieu. This could happen if transcription factors in the first milieu are blocked by repressive chromatin marks present in certain cis-regulatory sequences of a gene, but transcription factors in the second milieu, distinct from the first, are able to drive expression by recognizing a different set of cis-regulatory sequences of the gene not affected by repressive chromatin. Alternatively, factors in the second milieu, unlike those in the first, can recognize their target sequences even in the presence of repressive chromatin. Some milieus might have the ability to 'deocclude' genes - i.e., erase the chromatin marks responsible for the occluded state. Such erasure could affect individual genes or the whole genome, and could be an active process or a passive one.

[00052] Reprogramming of somatic cells by nuclear transfer into oocytes or by fusion with embryonic stem cells (ESC) or embryonic germ cells (EGC) demonstrated the ability of these cell types to erase most, if not all, of the chromatin marks in somatic cells established during development. Such ability may arise from just a few genes whose ectopic expression can reprogram fibroblasts into pluripotent, ESC-like cells called induced pluripotent stem (iPS) cells. Shortly after the blastocyst stage (from which ESC are derived), cells may lose their ability to deocclude the genome, perhaps by occluding the very genes that are responsible for genome-wide deocclusion in the first place. The progressive differentiation of cells in subsequent developmental stages is accompanied by the irreversible or nearly irreversible occlusion of an increasing number of genes, with distinct sets of genes becoming occluded in different lineages.

[00053] The occlusion of lineage-inappropriate genes serves to safeguard the phenotypic stability of the myriad cell types in multicellular organisms against noise in both extracellular environment and intracellular regulatory networks. Furthermore, that different cell types are characterized by different occludomes might also explain why the same signaling pathway often triggers the activation of different sets of genes in different cell types - a frequent phenomenon during the development of multicellular organisms. The ability of the same transcription factors to play different roles in different cell types allows increased cell type complexity in multicellular organisms without concomitant increases in genome size/complexity. Thus, the evolution of some form of gene occlusion might have been a prerequisite for the evolution of multicellularity.

[00054] The occluded state could be quite stable in order to confer and maintain cell identity over the entire ontogeny of the organism (the germline being an exception where the occluded state is either never fully established for most genes or is erased during gametogenesis). For some genes, the occluded state might be essentially irreversible in somatic cells under normal conditions (as is the case for X-inactivated and imprinted genes). Nevertheless, some occluded genes might become deoccluded in certain somatic cell types by deliberate mechanisms, which could contribute to the dedifferentiation/transdifferentiation of cells during tissue regeneration, especially in species capable of regenerating entire body parts after injury. On rare occasions, the competent/occluded status of genes could also change in a stochastic, unregulated manner, which might contribute to aging and disease processes such as cancer. The use of the trans-complementation assay to systematically identify and characterize occluded genes should therefore have wide-ranging applications in studies of health and disease.

[00055] Identification of occluded genes via interspecies cell fusion

[00056] To identify occluded genes within specific cell types, an interspecies cell fusion strategy was employed. One of the cell types being fused is referred to as the "responder" and the other the "reprogrammer." A goal is to identify occluded genes in the responder, which are defined operationally as genes silent in the responder genome of fused cells, but active in the reprogrammer genome of the same fused cells (Table 1). Human lung fibroblasts (hereafter abbreviated hLF) were used as the responder, and mouse skeletal muscle myoblasts (mSMM) as the reprogrammer. By using cells from different species, sequence divergence between orthologs can be exploited to distinguish whether a transcript in fused cells is produced from the reprogrammer genome or the responder genome.

[00057] In an embodiment, two cell populations were labeled with different fluorescent dyes and fused by polyethylene glycol. Dual fluorescent cells, which represent a small fraction of the total, were isolated by fluorescence activated cell sorting (FACS) (FIG. 8A). Microscopy confirmed that FACS-isolated cells were predominantly (>98%) fusions between hLF and mSMM, because they contained multiple nuclei of two distinct morphologies (hLF nuclei are larger and have weaker DAPI staining relative to mSMM) (FIG. 8B). For a subset of experiments, cells of heterotypic fusion (i.e., fusion between hLF and mSMM) were further enriched by antibiotics that eliminated unfused cells or cells of homotypic fusion (see Materials and Methods). Of the fused cells, more than 70% showed equal numbers of hLF versus mSMM nuclei, the great majority of which possessed one hLF and one mSMM nucleus while the rest contained two hLF and two mSMM nuclei. Less than 30% of cells showed unequal numbers of hLF and mSMM nuclei, the majority of which had an overrepresentation of mSMM nuclei (FIG. 8C). Cells were cultured for varying periods of time to allow for the resetting of gene expression in the new milieu. Regardless of culture period and medium formulation, fused cells remained as multinucleated heterokaryons, indicating that they had lost the ability to divide after fusion. Gene expression patterns became stabilized within 3 days of fusion. Day 4 post fusion was a target for analysis of gene expression.

[00058] To interrogate gene expression in hLF and mSMM before and after fusion, human and mouse Affymetrix microarrays were used. Although there is significant sequence divergence between human and mouse genomes (average 16% in coding regions), a human transcript in fused cells may still hybridize to orthologous probes on the mouse arrays and vice versa, given that the arrays are not designed for species-specific hybridization. To examine whether cross-species hybridization was a problem, cRNA from each cell type was hybridized to both the human and the mouse arrays. When cRNA from the correct species was hybridized to the arrays, about 45% of all the genes were called "present." By contrast, cRNA from the wrong species only led to about 10% of the genes being called "present." This shows that the arrays have sufficient species specificity to interrogate expression of a considerable fraction of genes in fused cells.

[00059] Four sets of array data were generated: hLF on human arrays, mSMM on mouse arrays, fused cells on human arrays, and fused cells on mouse arrays. To ensure robustness of the analysis, a list of genes shown by the array data to be active in mSMM but silent in hLF prior to fusion was selected. If, in the fused cells, these genes remain active in the mSMM genome and silent in the hLF genome, they would be placed in a candidate list of occluded hLF genes.
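The selection logic in this paragraph amounts to a filter over four sets of present/absent calls. The following is a minimal sketch of that filter, not the analysis pipeline actually used: the function name, the dictionary layout, and the gene labels (geneA, geneB, geneC) are hypothetical, and the boolean values stand in for array "present" calls.

```python
# Hypothetical present/absent calls keyed by ortholog name (True = called "present").
# Gene labels and values are illustrative only and do not come from the source data.

def candidate_occluded(resp_before, repr_before, resp_fused, repr_fused):
    """Return genes that are silent in the responder and active in the
    reprogrammer both before fusion and within the fused cells."""
    shared = set(resp_before) & set(repr_before) & set(resp_fused) & set(repr_fused)
    return sorted(
        gene for gene in shared
        if not resp_before[gene] and repr_before[gene]    # pre-fusion filter
        and not resp_fused[gene] and repr_fused[gene]     # pattern kept after fusion
    )

hlf_alone   = {"geneA": False, "geneB": False, "geneC": True}   # hLF on human arrays
msmm_alone  = {"geneA": True,  "geneB": True,  "geneC": True}   # mSMM on mouse arrays
fused_human = {"geneA": False, "geneB": True,  "geneC": True}   # fused cells, human arrays
fused_mouse = {"geneA": True,  "geneB": True,  "geneC": True}   # fused cells, mouse arrays

print(candidate_occluded(hlf_alone, msmm_alone, fused_human, fused_mouse))
```

In this toy input, geneB switches on in the human genome after fusion and so would be a transactivation candidate rather than an occlusion candidate. Any list produced this way would still require RT-PCR validation, since absence calls on arrays are unreliable.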

[00060] This analysis generated a candidate list of 279 putatively occluded hLF genes, all of which were subject to RT-PCR validation. For each gene, mouse-specific and human-specific RT-PCR primers were designed and confirmed to allow mouse and human gene expression in fused cells to be interrogated independently. Consistent with previous Affymetrix microarray studies, RT-PCR analysis showed that absence calls in the array data are much less reliable than presence calls. As a result, a large number of the candidate occluded hLF genes from the array data were shown by RT-PCR to be expressed at appreciable levels in hLF both before and after fusion. Winnowing out these and other false leads, 24 genes were confirmed by RT-PCR to exhibit expression patterns consistent with their occluded status in hLF (FIG. 1A and Table 2). Of these, 9 have known muscle-related functions (indicated in FIG. 1A). PCR on genomic DNA of fused cells using human-specific primers successfully amplified the hLF copies of all these genes, indicating that their lack of expression in fused cells is not due to the absence of hLF chromosomes. Indeed, it is unlikely that chromosome loss should occur in mitotically arrested heterokaryons. Applying the criteria in Table 1, a candidate list of 1040 putatively transactivated hLF genes was obtained. A subset of 202 genes was selected for RT-PCR validation. Many genes failed validation because RT-PCR detected appreciable levels of expression in hLF both before and after fusion. For many of these, RT-PCR did show increased expression after fusion, but they were not considered transactivated genes under the stringent definition of transactivation disclosed herein. This led to the identification of 10 transactivated hLF genes, of which 7 have known muscle-related functions (FIG. 1A and Table X). For 3 of the genes (Ckm, Acta1 and Myl1), their transactivation is consistent with previous reports.
For transactivated genes that showed significantly less amplification of the hLF transcripts than the mSMM transcripts in fused cells (such as Myog, Acta1, and Rap1ga1), additional sets of primers confirmed that the differences in amplification reflected actual gene expression differences between the human and mouse orthologs, rather than differences in PCR efficiencies. In principle, the observed differences in gene expression between human and mouse transcripts in fused cells could be due to at least three possibilities. First, these genes could be partially occluded such that they turn on in response to the introduction of a conducive milieu, but not to the full extent possible. Second, the hLF cell population may be heterogeneous, with the genes in question being occluded in some but not all the cells. A third possibility is that incompatibility between mouse transcription factors made from the mSMM genome and human cis-regulatory sequences in the hLF genome results in only partial activation of these genes.

[00062] Although occluded and transactivated genes are both silent in hLF prior to fusion, they clearly exist in two distinct states of transcriptional competency. Occluded genes do not become active even in the presence of a transcriptionally conducive milieu. In contrast, transactivated genes exist in a competent (though inactive) state that can turn on in response to the introduction of trans-acting factors in the milieu.

[00063] Ad hoc RT-PCR analysis also uncovered 4 extinguished mSMM genes and 6 occluded mSMM genes (FIG. 1A). Extinction could result either from the introduction of transcriptional repressors or from the dilution or disappearance of transcriptional activators upon fusion. For extinguished mSMM genes, it was not possible to determine if their orthologs in hLF are occluded or not. The presence of occluded mSMM genes indicates that a given cell fusion experiment can be used to identify occluded genes in both fusion partners.

[00064] The identification of occluded hLF genes was carried out in a systematic and unbiased fashion, in the sense that all the candidate occluded genes based on the array data were subject to RT-PCR validation. The final tally of 24 occluded hLF genes therefore likely represents a considerable fraction of all occluded hLF genes in the context of the hLF-mSMM trans-complementation experiment. By contrast, the transactivated hLF genes, extinguished mSMM genes, and occluded mSMM genes were uncovered by less systematic means.

[00065] The specification of the myogenic lineage is controlled by four transcription factors, Myod1, Myf5, Myog, and Myf6. Of these myogenic master triggers, Myod1 and Myf5 are occluded in hLF, Myog is transactivated (and therefore competent) in hLF, and Myf6 is extinguished in mSMM (and therefore may be either occluded or competent in hLF). Myod1 and Myf5 are known to be upstream of Myog and Myf6 in driving myogenic programs, and they also engage in positive auto-regulation and positive cross-regulation. Given such a regulatory circuit, should Myod1 and Myf5 not undergo occlusion in non-muscle cells, any low-level expression of these genes caused by cellular noise is likely to become amplified through a feedback loop, which in turn could trigger the erroneous manifestation of muscle phenotype in non-muscle cells. The fact that Myod1 and Myf5 are occluded in hLF (and in other non-muscle cell types as shown herein) therefore supports the model that the occlusion of key lineage-inappropriate genes serves to safeguard cell type identity against aberrant transdifferentiation.

[00066] Stability of occluded state in fused cells over long culture periods

[00067] To investigate how the resetting of gene expression in fused cells is influenced by culture time, cells were incubated for 1, 2, 3, 4, 8 or 16 days after fusion. RT-PCR was used to examine the expression of genes listed in FIG. 1A. This showed that the resetting of gene expression occurred mostly within the first 3 days of fusion, with expression patterns becoming stabilized after that. Occluded genes remained silent regardless of post-fusion incubation time (FIG. 9), demonstrating the temporal stability of the occluded state in fused cells. This temporal stability is further corroborated by experiments involving the fusion of other cell types.

[00068] Observation of gene occlusion not due to interspecies incompatibility

[00069] Although numerous transgene experiments using human promoters/enhancers to drive reporters in mice have shown conserved expression patterns between the human transgenes and endogenous mouse genes, what appears to be the occlusion of certain hLF genes may actually be the result of interspecies incompatibility - i.e., the failure of mouse transcription factors produced from the mSMM genome to recognize the corresponding human cis-regulatory sequences in the hLF genome. In order to address this possibility, two cell types that are both of mouse origin but from different strains were fused. One of the two cell types was mSMM, which is of C3H strain background. The other was mouse dermal fibroblasts (mDF) of B6 strain background. Sequence polymorphisms between the B6 and C3H mouse strains were exploited to determine the origin of transcripts in fused cells.

[00070] Among the 24 occluded and 10 transactivated hLF genes, 11 and 3, respectively, were found to be informative in the mDF-mSMM fusion, meaning that they bear exonic polymorphisms between the two strains based on resequencing data, and are expressed in mSMM but not mDF based on RT-PCR data. For each of these genes, RT-PCR primers were designed to flank an inter-strain polymorphic site. The relative abundance of mDF (B6 strain) versus mSMM (C3H strain) transcripts of the gene in mDF-mSMM fusion cells was then assessed by sequencing the RT-PCR product. This analysis showed that, of the 11 informative genes occluded in hLF, all but one are also occluded in mDF based on their exclusive expression from the mSMM allele in fused cells, including the myogenic master triggers Myod1 and Myf5 (FIG. 2; data also summarized in FIG. 1B). The single exception is Chrnd, which is expressed at roughly equal levels from both mSMM and mDF alleles, indicating transactivation. Of the 3 informative genes transactivated in hLF, 2 were also found to be transactivated in mDF and one was occluded in mDF (FIG. 2; also summarized in FIG. 1B). Similar to the hLF-mSMM fusion described above, genes found to be occluded in mDF in the mDF-mSMM fusion experiment remained silent in fused cells independent of culture time (FIG. 10). Thus, among the informative genes, those occluded in hLF are almost all occluded in mDF, and those transactivated in hLF are mostly transactivated in mDF. These results offer strong evidence that interspecies incompatibility played a negligible role in the identification of occluded genes in the hLF-mSMM fusion, although incompatibility might have affected a small number of genes. The fact that Chrnd appears occluded in hLF, but transactivated in mDF, suggests the possibility that the observed occlusion of this gene in hLF might be an artifact of interspecies incompatibility in the hLF-mSMM fusion.
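The allele-based calls described in this paragraph can be thought of as a decision rule on the share of transcript signal coming from the responder allele. The following is a minimal sketch of such a rule, not the procedure used in the experiments: the function name, the numeric cutoffs, and the idea of feeding it per-allele signal values (for example, read counts or chromatogram peak heights at the polymorphic site) are all assumptions made for illustration.

```python
def call_responder_state(responder_signal, reprogrammer_signal,
                         occluded_max=0.05, balanced_min=0.30, min_total=20):
    """Classify a gene in fused cells from per-allele transcript signal.

    All three cutoffs are hypothetical placeholders, not values from the source.
    """
    total = responder_signal + reprogrammer_signal
    if total < min_total:
        return "uninformative"      # too little signal to call either way
    responder_fraction = responder_signal / total
    if responder_fraction <= occluded_max:
        return "occluded"           # expression essentially only from the reprogrammer allele
    if responder_fraction >= balanced_min:
        return "transactivated"     # both alleles contribute substantially
    return "intermediate"           # might reflect partial occlusion or a mixed population

print(call_responder_state(0, 200))     # exclusive reprogrammer-allele expression
print(call_responder_state(95, 105))    # roughly equal expression from both alleles
print(call_responder_state(30, 170))    # responder allele expressed at a reduced level
```

A call of "occluded" from transcript data alone would still need the genomic DNA control, because loss of the responder chromosome would produce the same pattern.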

[00071] Conservation of occluded state across species

[00072] Comparison between hLF and mDF suggests that the set of genes subject to occlusion in a given cell type - fibroblasts in this case - is conserved between divergent species. To further investigate this conservation, chimpanzee dermal fibroblasts (cDF) were fused with human skeletal muscle myoblasts (hSMM) in order to examine whether genes occluded in hLF are also occluded in cDF. The human-chimpanzee genome divergence is about 1/30 of that between human and mouse, and is in fact less than the polymorphism levels within many species. Interspecies incompatibility should therefore not be a significant issue in this case.

[00073] Of the 24 occluded and 10 transactivated hLF genes, 12 and 8, respectively, were found to be informative in the cDF-hSMM fusion. For these genes, RT-PCR was performed on cDF-hSMM fusion cells using primers common to both species but flanking human-chimpanzee nucleotide substitutions. Sequencing of the RT-PCR products revealed that of the 12 informative genes occluded in hLF, all are occluded in cDF (FIG. 11; summarized in FIG. 1B). The occluded cDF genes include Chrnd, which is transactivated in mDF, supporting that the occluded status of this gene in hLF is real. Of the 8 informative genes transactivated in hLF, 6 are transactivated in cDF while the other 2 are occluded in cDF (FIG. 11; summarized in FIG. 1B).

[00074] Thus, the occluded or transactivated state of genes in hLF is closely recapitulated in both mDF and cDF, indicating that the set of genes subject to occlusion in a given cell type is strongly conserved across species. Such conservation argues that the occlusion of lineage-inappropriate genes is a highly regulated process with important biological functions.

[00075] Effect of DNA synthesis and nuclear merger on the occluded state

[00076] In the mDF-mSMM fusion experiment, even though the majority of cells were heterokaryons immediately after fusion and FACS purification, most cells became mononucleated after a few days of culture. Furthermore, the average nuclear diameter of these mononucleated cells is about 40% larger than that of either mDF or mSMM alone (FIG. 12). This could be due to the formation of a single nucleus from the multiple nuclei in a given fused cell (i.e., nuclear merger). The most likely way for multiple nuclei of a heterokaryon to merge is the breakdown and reassembly of the nuclear envelope as the cell undergoes mitosis. For this to occur, cells in the mDF-mSMM fusion must be capable of DNA synthesis and mitosis. By monitoring the incorporation of the thymidine analog 5-bromo-2'-deoxyuridine (BrdU), it was found that the majority of fused mDF-mSMM cells underwent de novo DNA synthesis a few days after fusion (FIG. 12). To further confirm that the single nucleus present in each of the mononucleated cells indeed contains both mDF and mSMM genomes, mDF and mSMM DNA, prior to fusion, were labeled with the thymidine analogs 5-chloro-2'-deoxyuridine (CldU) and 5-iodo-2'-deoxyuridine (IdU), respectively. Four days after fusion and FACS purification, cells were co-immunostained for CldU and IdU. For the great majority of mononucleated cells, the nuclei were found to be double positive for both CldU and IdU, consistent with the merger of the mDF and mSMM nuclei (FIG. 12).

[00077] One complicating factor in identifying occluded genes in fused cells that have undergone mitosis is the possibility of chromosome loss. If some chromosomes are preferentially lost, they would be underrepresented in fused cells and the genes they carry could appear occluded. PCR was performed on genomic DNA of the fused cells, amplifying across the same polymorphic sites as those interrogated by RT-PCR. Sequencing of PCR products indicated the presence of both alleles at comparable levels for all genes investigated, which are physically scattered across the genome. The allele-specific expression seen in FIG. 2 is therefore not the result of chromosome loss. These data also argue that the DNA synthesis observed in the fused mDF-mSMM cells is likely contributed by the replication of both the mDF and mSMM genomes, because if only one of the two genomes had undergone replication, the alleles of the replicating genome should be consistently overrepresented in the genomic PCR product over the alleles of the non-replicating genome, which is not the case.

[00078] Cells in the mDF-mSMM fusion can undergo division, whereas cells in the other fusion experiments remain largely as mitotically arrested heterokaryons. This notwithstanding, the fact that occluded genes can be uncovered even after heterokaryons have undergone division argues that the occluded state is robust to DNA replication, nuclear merger, and changes in the cell cycle state.

[00079] Occlusion of muscle-related genes in diverse non-muscle cell types

[00080] If the occlusion of muscle-related genes, especially Myod1 and Myf5, indeed serves to safeguard hLF against the accidental activation of myogenic programs, then similar sets of muscle-related genes are likely to be occluded in other cell types of non-myogenic lineages. To test this possibility, mSMM were fused with non-muscle cell types of diverse lineages, and RT-PCR was performed to examine whether the 24 genes occluded in hLF are also occluded in these other cell types. The non-muscle cells used included human mesenchymal stem cells (hMSC), human keratinocytes (hKe), and the human cervical cancer cell line HeLa. These cells provide a broad representation of both stem cells and differentiated cells, both normal cells and transformed cells, and cells derived from different germ layers.

[00081] Of the 9 known muscle-related genes occluded in hLF, the majority are also occluded in all these additional non-muscle cell types, including the myogenic master regulators Myod1 and Myf5 (FIG. 13). Of the remaining 15 occluded hLF genes not known to be muscle-related, most were either expressed prior to fusion or were transactivated upon fusion in at least one of the non-muscle cell types interrogated. These results support the model that the occlusion of lineage-inappropriate genes, especially key master triggers of alternative lineages, contributes to the safeguarding of cell type identity.

[00082] Stability of the occluded state under varying physiological conditions

[00083] If the occluded state is indeed critical in safeguarding cell type identity, then it should be stable under a variety of physiological conditions. To investigate the stability of the occluded state, hLF was subjected to a variety of culture conditions mimicking various types of physiological stress, including low nutrient, hypoxia, hypothermia, and hyperthermia. Interferon-γ treatment was also included, which is known to have a dramatic effect on the expression of many genes in a variety of cell types including fibroblasts. The expression patterns of the 24 occluded genes were examined under these culture conditions. All of them remained silent regardless of condition (FIG. 3A). As a control, a set of 61 genes was identified that are silent in hLF under the normal culture condition based on microarray data and RT-PCR validation. Their expression patterns were examined under the alternative culture conditions. A total of 24 of the 61 genes (39%) became active in at least one of the conditions (FIG. 3B), which is statistically highly distinct from the behavior of zero activation among the 24 occluded genes (p < 0.00007 by Fisher's exact test).
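The Fisher's exact test quoted above can be reproduced from the counts given in the paragraph. The following is a minimal sketch under the assumption that the test was one-sided; the source does not say which variant was used, and the function name is hypothetical. With 0 of 24 occluded genes and 24 of 61 control genes activated, the one-sided p-value is the hypergeometric probability that none of the 24 activated genes (out of 85 in total) falls among the 24 occluded genes.

```python
from math import comb

def prob_no_overlap(n_total, n_activated, n_occluded):
    """Hypergeometric probability that a group of n_occluded genes, drawn from
    n_total genes of which n_activated became active, contains no activated gene.
    Because zero is the smallest possible count, this equals the one-sided
    Fisher's exact p-value for the observed 2x2 table."""
    return comb(n_total - n_activated, n_occluded) / comb(n_total, n_occluded)

# Counts from the paragraph: 24 occluded genes (0 activated) plus
# 61 control genes (24 activated) gives 85 genes and 24 activations.
p_one_sided = prob_no_overlap(n_total=85, n_activated=24, n_occluded=24)
print(f"one-sided p = {p_one_sided:.2e}")
```

If the one-sided reading is right, the printed value should fall just under the reported bound of 0.00007; a two-sided test would give a somewhat larger value.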

[00084] These results demonstrate the extraordinary stability of the occluded state under variable physiological conditions, which stands in sharp contrast to the transcriptional lability of other genes in the genome. Researchers have often resorted to genome-wide gene expression patterns (i.e., the transcriptome) as a means of defining cell type identity. However, one cell type has the potential to display considerably different gene expression patterns under different physiological conditions, making the transcriptome too labile to provide a consistent definition of cell type. Results described herein argue that genome-wide gene occlusion patterns (i.e., the "occludome") could provide a much more consistent definition of cell type than the physiologically labile transcriptome.

[00085] Probing the biochemical basis of the occluded state

[00086] The biochemical basis of the occluded state was investigated, focusing on chromatin modifications. DNA methylation in promoters and enhancers, which is frequently associated with gene silencing, was examined. Bioinformatic surveys identified CpG islands in 18 of the 24 occluded and 4 of the 10 transactivated hLF genes (summarized in FIG. 1B). Occluded genes thus appear enriched for CpG islands, though this is only marginally significant (p < 0.06 by Fisher's exact test). Extensive bisulfite sequencing was performed to analyze methylation patterns of 5 occluded and 5 transactivated hLF genes (3 of the occluded and 2 of the transactivated genes analyzed here contain CpG islands). Analysis was carried out on two cell types: hLF and hSMM. RT-PCR confirmed that all 10 genes are expressed in hSMM, indicating their competent state in these cells. Given that most of the genes are too big for bisulfite sequencing in their entirety, putative cis-regulatory regions were identified by cross-species sequence conservation. Also included in the analysis are regions surrounding transcription start sites (TSS) and experimentally validated enhancer elements irrespective of conservation.

[00087] FIG. 4 shows representative results of the methylation analysis for one occluded gene (Myf5) and one transactivated gene (Acta1). Results for the remaining 8 genes are presented in FIG. 14. In 3 of the 5 occluded genes (Myf5, Cacng1 and Rapsn), strong differential methylation was observed between hLF and hSMM, with at least a subset of the regions sampled having much higher levels of methylation in hLF than hSMM. Of the remaining 2 occluded genes, Myod1 showed mild differential methylation in an enhancer far upstream of TSS, and Tnni1 did not show differential methylation between hLF and hSMM in any of the regions sampled. In contrast to the occluded genes, none of the transactivated genes showed discernible differential methylation between hLF and hSMM. For Myf5 and Cacng1, at least some of the methylated regions in hLF fall within CpG islands. Although CpG islands are generally assumed to be unmethylated, there are clear exceptions such as many X-inactivated genes, some imprinted genes, and genes abnormally silent in cancer cells. Furthermore, normal CpG methylation can occasionally be found in non-imprinted, non-X-inactivated genes, often in a tissue-specific manner. The methylation within CpG islands of some occluded genes may therefore represent another example of such exceptions.

[00088] For the 3 occluded hLF genes that showed robust differential methylation between hLF and hSMM, the vicinity of TSS is invariably a part of the differentially methylated regions. Bisulfite sequencing was used to examine the methylation status of TSS for the remaining 23 genes (Ly75 is not included because it is technically refractory to bisulfite sequencing). These remaining genes showed little or no TSS differential methylation between hLF and hSMM, regardless of whether they are occluded in hLF or not (FIG. 15; data of TSS methylation analysis also summarized in FIG. 1B).

[00089] For some genes, the occluded state is characterized by increased methylation relative to the competent state, especially around TSS. However, many occluded genes do not show appreciable differential methylation in TSS between occluded state in hLF and competent state in hSMM, suggesting that either methylation is not involved in conferring the occluded state to these genes, or if it is involved, it does so by acting in regions other than TSS. Data herein are in line with the latter possibility for at least some genes.

[00090] To further examine whether DNA methylation contributes causally to the occluded state, hLF was treated with the demethylating drug 5-aza-2'-deoxycytidine (AdC) prior to cell fusion. The treatment itself did not turn on the occluded hLF genes except for Msln, which was activated very slightly by the drug. Upon fusion with mSMM, however, about half of the occluded hLF genes showed variable levels of transactivation (FIG. 5A). Yet, for the majority of these, the expression levels of the hLF copies were substantially lower than those of the mSMM copies. This could reflect either heterogeneous response of cells to drug treatment or the fact that for some of the occluded genes, demethylation only leads to partial erasure of the occluded state. It is noteworthy that AdC treatment can alter the occluded state of genes not showing appreciable differential TSS methylation between hLF and hSMM. This suggests that DNA methylation plays a role in maintaining the occluded state of these genes, but that it does so by affecting regulatory regions outside of the immediate vicinity of TSS. It is also possible that AdC alters the occluded state of some of these genes not by DNA demethylation but rather by the drug's other unknown effects on chromatin state. These results, together with the bisulfite sequencing data, argue that whereas DNA methylation is probably a causal factor contributing to the occlusion of at least some genes, there are likely other mechanisms that also contribute to the occluded state.

[00091] Besides DNA methylation, many chromatin marks have been found to be either overrepresented or underrepresented at either silent loci of the genome or regions believed to be heterochromatic (e.g., the inactive X and the centromere). These marks are therefore good candidates in the search for biochemical mechanisms underlying the occluded state. 16 such marks were examined by chromatin immunoprecipitation (ChIP) followed by PCR. These included 7 histone modifications (H3K9Ac, H3K4me3, H3K9me2, H3K9me3, H3K27me3, H4K20me1, and H4K20me3), 5 histone variants (H2A.X, macroH2A, H2A.Z, H2A.Bbd, and CENPA), and 4 chromatin-binding proteins (Polycomb repressive complex proteins SUZ12 and EZH2, as well as HP1-α and HP1-γ, which are mammalian homologs of Drosophila heterochromatin protein-1). These marks were examined for three classes of genes in hLF: 24 occluded and 10 transactivated genes as depicted in FIG. 1A, and a set of 17 actively expressed genes randomly selected from the microarray data and validated by RT-PCR (Table 3). For all these genes, focus was on the vicinity of TSS for chromatin analysis because it is the predominant site of differential chromatin modification in association with gene activity. For a few genes, validated enhancers were also included in the analysis.

[00092] For the great majority of these marks, there are significant differences between silent genes (including both occluded and transactivated genes) and expressed genes (FIG. 6A). Specifically, three marks, H3K9Ac, H3K4me3 and H2A.Z, are significantly enriched in expressed genes relative to silent genes, with the enrichment being most notable for H3K4me3. In contrast, 11 marks, H3K9me2, H3K9me3, H3K27me3, H4K20me1, H4K20me3, H2A.X, macroH2A, CENPA, SUZ12, EZH2 and HP1-α, show the opposite trend - i.e., they are significantly enriched in silent relative to expressed genes, with the most notable enrichment seen in H3K9me2, H3K9me3, H3K27me3 and H4K20me3. Two marks, H2A.Bbd and HP1-γ, did not show any significant difference between silent and expressed genes. In the comparison between occluded and transactivated genes, however, only one mark, HP1-α, showed a significant difference between the two types of genes, being enriched in occluded genes relative to transactivated genes.

[00093] To produce a visually more intuitive representation of the separation in chromatin signatures among the genes, principal component analysis was used to reduce the 16-dimensional data from the 16 marks to two dimensions. As expected, occluded genes and transactivated genes clustered closely with each other whereas expressed genes clustered separately (FIG. 6B). These data indicate that, of the chromatin marks surveyed, silent genes (including both occluded and transactivated genes) and expressed genes are highly distinct from each other, whereas occluded genes and transactivated genes are rather similar. Nevertheless, occluded genes do appear enriched for the binding of HP1-α as compared to transactivated genes, suggesting that HP1-α might be involved mechanistically in the occluded state.

[00094] The ChIP data failed to establish any difference in histone acetylation between occluded and transactivated states. To further probe whether histone acetylation might be involved in gene occlusion, hLF cells were treated with the histone deacetylase inhibitor trichostatin A (TSA) prior to cell fusion. The treatment itself did not have any long-term impact on the silent state of the occluded genes. After fusion, only two genes in the hLF genome, Myod1 and Rcan2, showed very weak transactivation (FIG. 5B). These results suggest that, consistent with the ChIP data, the level of histone acetylation is not a major causal agent in conferring the occluded state, even though histone hypoacetylation is robustly associated with the lack of expression.

[00095] Identifying additional occluded genes in human lung fibroblasts

[00096] Although the search for occluded hLF genes in the hLF-mSMM fusion experiment was performed in an unbiased, systematic manner, only genes expressed in mSMM were accessible to the investigation. Indeed, a substantial fraction of occluded hLF genes identified in the hLF-mSMM fusion are muscle-related genes. To probe the occlusion status of a more diverse set of genes in hLF, hLF was fused with additional cell types, including mouse osteoblasts (mOst), mouse hepatocytes (mHe), and a mouse neuroblastoma line (mNeu) of neuronal precursor origin. Together with mSMM, these cell types represent all three embryonic germ layers.

[00097] For the fusion between hLF and mOst, the systematic approach was repeated as for the hLF-mSMM fusion - i.e., microarray analysis to detect candidate occluded genes followed by RT-PCR validation. Also included in the RT-PCR analysis were the 24 occluded hLF genes identified in the hLF-mSMM fusion. This led to the identification of 36 mOst-expressed genes that are occluded in hLF, 9 of which are also shown to be occluded in the hLF-mSMM fusion (FIG. 16). Among these occluded genes, of particular note is Runx2, a transcription factor that functions as a master trigger of osteoblast differentiation. This is analogous to the occlusion of myogenic master regulators Myod1 and Myf5 in hLF as revealed by hLF-mSMM fusion.

[00098] The hLF-mSMM fusion and the hLF-mOst fusion together identified 51 occluded genes in hLF. Eleven of these genes have been shown previously to be targeted by Polycomb Repressive Complex 2 (PRC2) in human embryonic stem cells (these Polycomb target genes are indicated in FIG. 1B and FIG. 16). This number is significantly higher than random expectation (p < 0.003), as only 1896 out of 22500 genes surveyed in the previous study are targets of Polycomb binding. This observation raises the possibility of a mechanistic link between Polycomb binding of certain genes in pluripotent ES cells and the later occlusion of these genes as ES cells differentiate into specialized cell types. The list of occluded genes was compared with the list of bivalent genes and the list of promoter-methylated genes (containing a CpG island or not) identified previously in mouse ES cells, but no meaningful relationship was found.

[00099] For the hLF-mHe and hLF-mNeu fusions, instead of using the systematic approach, which is time-consuming and laborious, a more focused (and therefore non-systematic) approach was used. A random set of known hepatic genes was selected and their expression in mHe and hLF was examined by RT-PCR. Genes found to be active in mHe but silent in hLF were tested further by RT-PCR for their expression patterns in the hLF-mHe fusion. This led to the identification of 6 occluded and 3 transactivated genes in hLF (FIG. 17). Similarly, a set of neuronal genes was selected and tested in the hLF-mNeu fusion, which identified 3 occluded and 8 transactivated genes in hLF (FIG. 17).
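The Polycomb enrichment figure quoted above (11 PRC2 targets among 51 occluded genes, against 1896 targets among 22500 surveyed genes) can be checked with a one-sided overrepresentation test. The source does not state which test produced p < 0.003, so the hypergeometric tail below is only an assumed choice, and the function name is hypothetical. The sketch also reproduces the count of 51 from the two fusions (24 + 36, with 9 genes shared).

```python
from math import comb

def hypergeom_tail(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Assumed test choice: the source reports p < 0.003 without naming the test.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

# Figures taken from the text.
occluded_total = 24 + 36 - 9                 # two fusions, 9 genes in common
expected = occluded_total * 1896 / 22500     # PRC2 targets expected by chance
p_value = hypergeom_tail(11, 22500, 1896, occluded_total)

print(occluded_total, round(expected, 2), p_value)
```

Under these assumptions the chance expectation is roughly 4 Polycomb targets, so observing 11 is a clear excess; the exact p-value depends on the test actually used.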

[000100] Robustness of gene occlusion or transactivation to the use of different reprogrammers

[000101] Of the 24 occluded hLF genes identified in the hLF-mSMM fusion, 9 are expressed in mOst, 2 are expressed in mHe, and one is expressed in mNeu (the other genes are silent in these cell types). Whether these genes would also be shown to be occluded in hLF when hLF is fused to cell types other than mSMM was investigated. The 9 mOst-expressed genes are indeed all shown by the hLF-mOst fusion to be occluded in hLF (FIG. 16). Similarly, the 2 mHe-expressed genes remained occluded in hLF in the context of the hLF-mHe fusion, while the one mNeu-expressed gene remained occluded in hLF in the context of the hLF-mNeu fusion (FIG. 18).

[000102] Of the 10 transactivated hLF genes identified in the hLF-mSMM fusion, one is expressed in mOst while 3 are expressed in mHe. Whether these genes in hLF would also show transactivation when hLF is fused to cell types other than mSMM was examined. This is indeed what was observed (FIG. 19). Thus, gene occlusion or transactivation in a given responder cell type is robust to the use of different reprogrammer cells as fusion partners, though there may be exceptions.

MATERIALS AND METHODS

[000103] Cell fusion

[000104] Mouse skeletal muscle myoblasts (mSMM), osteoblasts (mOst), hepatocytes (mHe) and neuroblastoma (mNeu), and human lung fibroblasts (hLF) have been described previously, and are known by their common names as C2C12, MC3T3-E1 Subclone 4, AML12, Neuro-2a and MRC-5, respectively (Blau et al., 1985; Klebe and Ruddle, 1969; Wang et al., 1999; Wu et al., 1994; Yaffe and Saxel, 1977). C2C12 (CRL-1772), MC3T3-E1 Subclone 4 (CRL-2593), MRC-5 (CCL-171), HeLa (CCL-2), and human keratinocytes (hKe; CRL-2404) were obtained from ATCC; human mesenchymal stem cells (hMSC) were derived as described (Zhang et al., 2007); chimpanzee dermal fibroblasts (cDF; S006007) were obtained from Coriell Institute for Medical Research; and human skeletal muscle myoblasts (hSMM; CC-2580T25) were obtained from Cambrex. Cell culture conditions followed published or vendor-supplied protocols.

[000105] Neomycin-resistant mSMM cells were generated by transfection with the pEGFP-N1 plasmid (Clontech) and selection in 800 μg/ml G418. EGFP fluorescence varied within this cell population, but was negligible compared to dye fluorescence used for cell sorting. Puromycin-resistant hLF cells were generated using the pBabe-puro retroviral vector from Addgene (#1764) (Morgenstern and Land, 1990). The vector was transfected into ProPakA.6 packaging cells (ATCC). 48 hours after transfection, the viral supernatant was filtered and added to the cells in the presence of 8 μg/ml polybrene, and 24 hours later cells were selected using 2 μg/ml puromycin.

[000106] One day before fusion, cells were labeled with 30 μM CMTMR or 10 μM CMFDA Celltracker dye (Invitrogen) for 30 min at 37°C in culture medium. Subsequently, the cells were incubated in basal medium for one hour, and washed twice with PBS. After staining, mSMM cells were kept in low-serum medium composed of DMEM supplemented with 2% horse serum. Cell fusion was performed with polyethylene glycol (MW 1500) as described (Davidson et al., 1976). Briefly, one of the two cell populations was plated on 10-cm tissue culture dishes and the other cell population was overlaid. After attachment, cells were treated with warm PEG for 1 min, and then washed three times with warm basal medium. The fused cells were incubated for two hours in low-serum medium until cell sorting. Fused cells were purified to >98% purity by fluorescence activated cell sorting (FACS) with gating for dual fluorescence. After FACS, purified fused cells were maintained in low-serum medium until RNA extraction. The unfused mSMM used as control in expression studies were kept in low-serum medium for the same period as fused cells. For experiments in which antibiotic-resistant cells were fused, 4-8 μg/ml puromycin and 400-800 μg/ml G418 were added to the medium one day after fusion. The mOst cell line used in the fusion, MC3T3-E1 Subclone 4, is a preosteoblast line, which is normally maintained in a proliferative state in ascorbic acid-free medium as previously described (Franceschi and Iyer, 1992). Prior to fusion, cells were cultured for 7 days in the presence of ascorbic acid as described (Xiao et al., 1997).

[000107] Microarray analysis

[000108] Total RNA was purified from hLF, mSMM, and fused cells using TRIZOL reagent (Invitrogen) according to the vendor's protocol, and used for microarray probe synthesis following standard Affymetrix protocols. Double-stranded cDNA samples generated using the GeneChip One-Cycle cDNA Synthesis kit with first-strand synthesis using oligo(dT) primers (Affymetrix) were used to synthesize biotin-labeled cRNA using the GeneChip IVT Labeling kit (Affymetrix), and the labeled cRNA samples were then fragmented using the GeneChip Sample Cleanup Module (Affymetrix). Hybridization, labeling and scanning were all performed by the Protein and Nucleic Acid (PAN) facility at Stanford University. The labeled cRNA sample from each cell type was hybridized to both mouse MG U74Av2 and human HG U133A GeneChips (Affymetrix) with replicates to assess gene expression and cross-hybridization between species. Probe-level analyses of the images from scanning of chips were performed using Affymetrix GeneChip Operating Software (GCOS). Similar hybridization procedures were carried out for the hLF-mOst fusion.

[000109] Detection p-value thresholds were set to assign 'present' (p < 0.05), 'marginal' (0.05 < p < 0.49), or 'absent' (p > 0.49) decision calls for each gene by MAS 5.0 criteria using GCOS. Gene lists were filtered on absolute decision calls using GeneSpring (Silicon Genetics) to obtain a candidate list of occluded genes. Occluded genes were filtered based on the following criteria: 'absent' calls in all of the replicates with hLF hybridized to human chip, 'absent' calls in all of the replicates with fused cells hybridized to human chip, 'present' or 'marginal' calls in at least one of the replicates with mSMM hybridized to mouse chip, and 'present' or 'marginal' calls in at least one of the replicates with fused cells hybridized to mouse chip. Filtering of transactivated genes was performed by 1) comparing genes based on absolute decision calls using GeneSpring with criteria of 'absent' calls in all of the replicates with hLF hybridized to human chip and 'present' or 'marginal' calls in at least one of the replicates with fused cells hybridized to human chip, 2) selecting genes showing differential expression between the two cell types based on signal intensity after normalization by RMA using RMAexpress (http://rmaexpress.bmbolstad.com), or 3) comparing the data from the two cell types by model-based expression index analysis using dChip (http://biosun1.harvard.edu/complab/dchip).
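The call-based filtering rules above can be sketched as follows. This is an illustration only, not the GCOS or GeneSpring implementation: all function names are hypothetical, each argument is assumed to be the list of replicate detection p-values for one gene (or its cross-species ortholog), and the handling of p-values exactly at 0.05 or 0.49 is an assumption because the text leaves those boundaries open.

```python
def decision_call(p):
    """Map a detection p-value to a call using the thresholds quoted in the text.

    Assumption: values exactly equal to 0.05 or 0.49 are treated as 'marginal'.
    """
    if p < 0.05:
        return "present"
    if p <= 0.49:
        return "marginal"
    return "absent"

def all_absent(p_values):
    """True when every replicate is called 'absent'."""
    return all(decision_call(p) == "absent" for p in p_values)

def any_detected(p_values):
    """True when at least one replicate is called 'present' or 'marginal'."""
    return any(decision_call(p) != "absent" for p in p_values)

def is_occluded_candidate(hlf_human, fused_human, msmm_mouse, fused_mouse):
    """Silent on the human chip before and after fusion, detected on the mouse chip."""
    return (all_absent(hlf_human) and all_absent(fused_human)
            and any_detected(msmm_mouse) and any_detected(fused_mouse))

def is_transactivated_candidate(hlf_human, fused_human):
    """Criterion 1 for transactivation: silent in hLF, detected after fusion."""
    return all_absent(hlf_human) and any_detected(fused_human)

# Example with made-up p-values for two replicates per sample.
print(is_occluded_candidate([0.9, 0.7], [0.8, 0.6], [0.01, 0.6], [0.3, 0.9]))
```

Candidates from such a filter would still need RT-PCR validation, as described for the hLF-mSMM and hLF-mOst fusions.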

[000110] RT-PCR and sequencing

[000111] RNA (up to 2 μg) was used to generate cDNA using M-MLV reverse transcriptase and random primers (Invitrogen), or using the SuperScript III First-Strand Synthesis System with random primers for RT-PCR (Invitrogen) following the vendor's protocol. Semi-quantitative PCR was carried out with variable template concentrations and PCR cycles to obtain linear range amplification of each gene. For the human-chimpanzee fusion experiment, primers were selected by identifying non-polymorphic primer sequences flanking intron-spanning amplicons that contain at least one single-nucleotide substitution between the two species based on genomic sequence alignment. For the mouse-mouse fusion experiment, amplicons containing at least one polymorphism between the two mouse strains were identified by sequencing randomly chosen intron-spanning amplicons. Sequences of PCR primers and detailed conditions for RT-PCR are available from the inventors upon request. All DNA sequence analysis was performed with the ABI 3730 DNA Analyzer using the ABI BigDye Terminator (Applied Biosystems).

[000112] Analysis of DNA synthesis and nuclear merger

[000113] For analysis of nuclear merger, unfused cells were labeled with 10 μM 5-iodo-2'-deoxyuridine (IdU) or 5-chloro-2'-deoxyuridine (CldU) in the media for 72 hours prior to fusion, and fused cells were stained specifically with mouse monoclonal anti-IdU (Becton-Dickinson, #347580; 1:500 dilution) and rat monoclonal anti-CldU antibodies (Accurate, #OBT0030; 1:250) based on a published protocol (Vega and Peterson, 2005). These two antibodies do not cross-react when used for double-staining IdU and CldU, but both recognize 5-bromo-2'-deoxyuridine (BrdU). Secondary antibodies were Oregon Green labeled goat anti-mouse (Invitrogen; 1:1000) and Cy3 labeled mouse anti-rat (Jackson Immunoresearch; 1:300). For analysis of DNA synthesis, BrdU was administered at 10 μM in the media immediately following cell fusion for 72 hours, and the cells were stained with anti-BrdU antibody (Accurate, #OBT0030; 1:250) at a later time point. Because the incorporation of halogenated nucleotides into DNA could affect gene expression, the fusion experiment involving labeling with halogenated nucleotides was done separately from the fusion experiment for ascertaining the expression status of genes.

[000114] Chromatin analysis

[000115] Genomic regions targeted for bisulfite sequencing were chosen based on cross-species conservation as defined by the UCSC Genome Browser (Placental Mammal Conserved Elements by 28-way Multiz Alignment) (Kuhn et al., 2007). DNA methylation analysis was performed by bisulfite mutagenesis sequencing as described (Vallender and Lahn, 2006). Approximately 20 clones were sequenced for each region of interest and sequence files were analyzed using BiQ Analyzer software (Bock et al., 2005). Primers used in bisulfite sequencing are available from the inventors upon request. The analysis of Ly75's TSS was abandoned after 6 primer sets failed to amplify the region.

[000116] Chromatin immunoprecipitation (ChIP) was performed essentially as described previously (Li et al., 2003), with the following modifications. Samples were sonicated in 7 ml aliquots for 9 cycles of 20 seconds at 33% power using a Fisher Sonic Dismembrator Model 500 with a 0.5 inch flat tip horn. This amount of sonication yielded an average DNA fragment size of 300-700 bp. For immunoprecipitation from 7×10⁶ cells, 40 μl of a 1:1 mixture of protein A and protein G Dynabeads (Invitrogen) were coupled to 10 μg of antibody, and then incubated with sonicated chromatin samples overnight. Immunoprecipitated chromatin was eluted from beads in 150 μl elution buffer (50 mM Tris pH 8, 10 mM EDTA, 1% SDS), digested with proteinase K (Roche), and DNA was purified using the GenCatch PCR Cleanup Kit (Epoch Biolabs). Semiquantitative PCR was performed with template concentration and PCR cycle number tailored to each amplicon to obtain linear range amplification. PCR products were resolved on agarose gel and visualized by ethidium bromide staining. Densitometry analysis of background-subtracted images of PCR bands was performed using the Gel Analyzer module of ImageJ 1.37v (National Institutes of Health, http://rsb.info.nih.gov/ij). Measured PCR band intensities were normalized to IP input controls. The value of each data point was calculated as the average of at least three independent replicates. Sequences of PCR primers and detailed PCR conditions are available from the inventors upon request. Antibodies used for ChIP were as follows: Anti-H3 (ab1791), H3K9Ac (ab4441), H3K4me3 (ab8580), H3K9me2 (ab1220), H3K9me3 (ab8898), H3K27me3 (ab6002), H4K20me1 (ab9051), H4K20me3 (ab9053), H2A.X (ab11175), macroH2A.1 (ab37264), H2A.Z (ab4174), H2A.Bbd (ab4175), and HP1-gamma (ab50365) were purchased from Abcam. Anti-CENP-A (sc-22787) was from Santa Cruz Biotechnology. Anti-SUZ12 (04-046) and HP1-alpha (05-689) were from Millipore. Anti-EZH2 (36-6300) was from Invitrogen.
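The ChIP quantification described above (band intensities normalized to input, then averaged over at least three replicates) can be sketched as below. The function name and the example intensities are hypothetical, and normalizing each replicate before averaging is an assumed order of operations that the text does not specify.

```python
from statistics import mean

def normalized_enrichment(replicates):
    """Average IP/input ratio over independent replicates.

    `replicates` is a list of (ip_intensity, input_intensity) pairs of
    background-subtracted band intensities. Assumption: each replicate is
    normalized to its own input before averaging.
    """
    if len(replicates) < 3:
        raise ValueError("at least three independent replicates are expected")
    ratios = [ip / inp for ip, inp in replicates]
    return mean(ratios)

# Made-up intensities for one mark at one amplicon.
print(normalized_enrichment([(50.0, 100.0), (30.0, 100.0), (40.0, 100.0)]))
```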
For drug inhibition of DNA methylation, cells were plated at 20-25% confluence and treated with 10 μM 5-aza-2'-deoxycytidine (AdC) until the cells had undergone two population doublings. Cell fusion was then carried out and fused cells were incubated for 4 more days without AdC. For drug inhibition of histone deacetylation, cells were treated with 1 μM trichostatin A for 24 hours until just before fusion. For both AdC and trichostatin A treatment, control cells not subjected to fusion were exposed to the same temporal course of drug treatment.

[000118] Analysis of gene expression under physiological alterations

[000119] Cells were cultured under either the normal condition (10% fetal calf serum at 37°C) or one of the conditions mimicking physiological alterations, including low nutrient (0.1% serum), hypoxia (380 μM of the hypoxia mimetic deferoxamine), hypothermia (33°C), hyperthermia (41°C), and interferon-γ treatment (100 ng/ml; Cell Sciences). Cells were maintained under each condition for 3 days, followed by RT-PCR analysis of selected genes as described herein.

Table 1. Expression patterns of occluded, transactivated, and extinguished genes

Reprogrammer, before fusion | Reprogrammer, after fusion | Responder, before fusion | Responder, after fusion | Conclusion
Active | Active | Silent | Silent | Gene in responder occluded
Active | Active | Silent | Active | Gene in responder transactivated and hence competent
Active | Silent | Silent | Silent | Gene in reprogrammer extinguished
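The decision logic of Table 1 can be written as a small lookup, shown here as a minimal sketch. The function and constant names are hypothetical, and expression states are simplified to booleans (True for active, False for silent); patterns that Table 1 does not list are reported as unclassified rather than guessed.

```python
# Keys: (reprogrammer before, reprogrammer after, responder before, responder after)
TABLE_1 = {
    (True, True, False, False): "gene in responder occluded",
    (True, True, False, True): "gene in responder transactivated and hence competent",
    (True, False, False, False): "gene in reprogrammer extinguished",
}

def classify(reprog_before, reprog_after, resp_before, resp_after):
    """Return the Table 1 conclusion for one gene, or 'unclassified'."""
    pattern = (reprog_before, reprog_after, resp_before, resp_after)
    return TABLE_1.get(pattern, "unclassified")

# A gene active in the reprogrammer throughout and silent in the responder throughout.
print(classify(True, True, False, False))
```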

Table 2. Occluded and transactivated hLF genes identified in the hLF-mSMM fusion.

Note: All PCR amplicons target TSS unless indicated otherwise as follows. En: upstream enhancer; Int: intronic enhancer; Up: immediate upstream of TSS; TSS#: one of multiple alternative TSSs.

Claims

CLAIM:

1. A method of identifying cis-silenced genes (occluded) by trans-complementation, the method comprising:

(d) identifying cis-silenced genes (occluded) in the first cell type by comparing the gene expression before and after the fusion, wherein a cis-silent gene is expressed from the genome of the second cell type before and after fusion and not expressed from the genome of the first cell type before and after fusion.

2. The method of claim 1, wherein the first cell type is designated as a responder cell type and the second cell type is a reprogrammer cell type.

3. The method of claim 1, wherein the first cell type is a cancer cell type.

4. The method of claim 2, wherein the reprogrammer cell is devoid of a nucleus.

5. The method of claim 1, wherein the cell types belong to different tissue types.

6. The method of claim 1, wherein the cell types belong to different species.

7. The method of claim 1, wherein the dissimilar cell types are mSMM from C3H mice and mDF from B6 mice.

8. A method for determining the relative contribution of cis versus trans mechanisms of regulation of a target gene, the method comprising:

(a) fusing at least two disparate cell types in vitro;

9. An occludome of a cell type comprising an index of genes that is cis-silenced, wherein the cis-silenced genes are not capable of being activated by an appropriate trans-activating factor from a dissimilar cell type.

10. A method for determining the relative contributions of cis versus trans mechanisms to regulation of a target gene, the method comprising:

(a) fusing at least two disparate cell types in vitro;

(b) searching for the target genes differently expressed between the genomes of the fused cells and the unfused cells, wherein cis blocked genes cannot respond to trans signals introduced through the shared milieu of the fused cells; and (c) comparing cis with trans gene.

11. A method to identify occluded genes, the method comprising:

(a) fusing cells from different species, wherein one of the species is a "responder" and the others "reprogrammers";

(b) labeling the responders with different labels than the reprogrammers;

(c) detecting fused cells by the presence of dual labels;

(d) interrogating fused cells to determine expression patterns; and

(e) identifying cells that are silent in the responder genomes but active in the reprogrammer genomes.

12. The method of claim 11 wherein the species are mouse and human.

13. The method of claim 11 wherein the responder cells are human lung fibroblasts and the reprogrammer cells are mouse skeletal muscle myoblasts.

14. The method of claim 11, wherein the expression patterns are determined by microarray analysis.

15. The method of claim 11, wherein the expression patterns are determined by RT-PCR.