US20210343368A1 - Method for Engineering Synthetic Cis-Regulatory DNA - Google Patents

Publication number
US20210343368A1
Authority
US
United States
Prior art keywords
cell
expression
genomic
regions
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/273,821
Other languages
English (en)
Inventor
Gaetano Gargiulo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Max Delbrueck Centrum Fuer Molekulare Medizin In Der Helmholtz
Max Delbrueck Centrum fuer Molekulare in der Helmholtz Gemeinschaft
Original Assignee
Max Delbrueck Centrum Fuer Molekulare Medizin In Der Helmholtz
Max Delbrueck Centrum fuer Molekulare in der Helmholtz Gemeinschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Max Delbrueck Centrum Fuer Molekulare Medizin In Der Helmholtz, Max Delbrueck Centrum fuer Molekulare in der Helmholtz Gemeinschaft
Assigned to MAX-DELBRÜCK-CENTRUM FÜR MOLEKULARE MEDIZIN IN DER HELMHOLTZ. Assignment of assignors interest (see document for details). Assignors: GARGIULO, GAETANO

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/64: General methods for preparing the vector, for introducing it into the cell or for selecting the vector-containing host
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6897: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids involving reporter genes operably linked to promoters
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the invention relates to methods for generating cell-type specific expression cassettes and reporter vectors, as well as nucleic acid constructs that can be generated by such methods.
  • the cell-type specific expression cassettes and reporter vectors are characterized by synthetic cis-regulatory DNA, also termed synthetic locus control regions (sLCRs). sLCRs allow for cell-type specific expression of reporter or effector genes.
  • the invention further relates to various uses of the reporter vectors, including the determination of a property of a cell, preferably a cell type, state or fate transition, in gene and viral therapy, drug discovery or validation.
  • Expression cassettes and reporter vectors have a wide range of applications in basic research, drug screening, diagnosis or gene therapy.
  • lineage tracing of Fbx15 expression led to the discovery of defined factors capable of reprogramming fibroblasts into pluripotent cells 49, and lineage tracing of Lgr5 expression enabled the identification of bona fide colon and small intestine stem cells 2, with Lgr5 later shown to mark several other adult tissue stem cells 3.
  • the parallel development of sophisticated reporter strategies allows for single-cell resolution in analyzing multiple lineages.
  • reporter strategies based on adult stem cell biology can simultaneously inform on the origin of a tissue and its aberrant homeostasis 6 7 8.
  • Genetic reporters reflecting well characterized pathways can lead to a deeper understanding of complex signaling dichotomy such as transforming growth factor counteracting bone morphogenetic protein (BMP) signaling during hair follicle homeostasis 9 .
  • BMP bone morphogenetic protein
  • presently employed approaches for genetic tracing vectors rely on the use of cell-type, pathway specific or synthetic promoters or enhancers that are coupled to a reporter gene or a functional effector.
  • cell-type-specific promoters are based on placing the reporter gene or functional effector after the minimal promoter of a signature gene of the cell type of interest. This allows for the specific transcriptional activation of a given reporter or effector, as mediated by the promoter of the given gene.
  • Cell-type-specific vectors offer the possibility to use one given gene as a proxy of a cell state or developmental stage.
  • One example is the use of the Nestin promoter in order to mark neural progenitor cells. This approach is widely used and allows researchers to direct the activation of specific reporters or effectors in undifferentiated cells.
  • BMP response element specific for nuclear activity of SMAD1/5/8, which portrays the activation of the BMP pathway. While the BMP response element (BRE) reliably portrays the canonical pathway activation, it misses non-canonical activation and provides a reporter system which is insufficiently sensitive to feedback loops.
  • Limitations of using pathway-specific promoters include the need to rely on the assumption that the minimal set of regulatory elements used is sufficient to inform on the pathway activation. Furthermore, a priori knowledge of such regulatory elements, and their extensive characterization and isolation from their natural context, are necessary, which hampers their application to complex and less characterized cell types.
  • WO2001/49868 A1 discloses a cancer-specific gene expression vector comprising a promoter with a binding site (EF2bs) for the E2F transcription factor expressed in cancerous genes as well as additional binding sites for further transcription factors (e.g. SP1, AP1, NF1 or C/EFB).
  • This approach still however relies on a priori knowledge of TF binding sites (e.g. EF2bs) previously identified as being relevant in specific types of cancer.
  • WO 2015/110449 A1 discloses a computational method for identifying cardiac and skeletal muscle specific regulatory elements with an enrichment of transcription factor binding sites (TFBS), wherein different regulatory regions (CSk-SH1-6; Sk-SH1) of a length of 300-500 bp are disclosed that each contain multiple (3-10) conserved TFBS.
  • TFBS transcription factor binding sites
  • WO 2008/107725 A1 discloses a computational method for identifying transcription factor regulatory elements (TFREs) active in a cell of interest, wherein the TFREs have a length of at least 6 to 100 bp, wherein 6 or more TFREs may be combined in a promoter element of an expression vector.
  • This technology, however, employs the fusion of the same pre-selected minimal promoter with additional TFREs identified under any given conditions, i.e. the supervised merging of cis-elements with known function.
  • Guo et al. (Trends in Mol. Medicine, 14:410-418) review several viral vectors as well as transcriptional regulatory elements.
  • Gargiulo et al. (Mechanisms of Development, 35:193-203) disclose the identification of cis-acting elements for a cell-specific expression of a vitelline membrane protein gene 32 (VMPE) in the follicular epithelium of Drosophila, wherein the expression vectors comprise different segments of the regulatory genomic regions.
  • VMPE vitelline membrane protein gene 32
  • the technical problem underlying the present invention is to provide alternative and/or improved means for the generation of genetic tracing cassettes or vectors based on synthetic cis-regulatory DNA that allow for a cell-type or developmental stage specific expression of reporter genes or functional effectors.
  • the invention therefore relates to a method for generating a cell-type specific expression cassette, comprising the steps of:
  • the method allows for the generation of expression cassettes, which when introduced into a cell of interest yield expression of the reporter or effector gene in a manner highly specific to the particular entity or state, such as a cell type or state, which the reporter has been designed to depict, without the need of prior knowledge on the regulation of the gene expression in said entity or state of interest.
  • the method and constructs of the present invention are based on non-biased de novo approaches for decoding and reconstructing regulatory information for any given cell-type/state.
  • the invention represents an entirely novel approach based essentially on the clustering of cell-type/state specific TFBSs at cell-type/state specific signature genes.
  • the invention is also characterized by the advantages of employing a quantitative and/or statistical enrichment of relevant TFBS for any given cell-type/state.
  • the method essentially employs a systems biology approach to generate an expression cassette by identifying a set of endogenously occurring cis-regulatory elements from a given transcriptional signature of the cell type of interest and placing these cis-regulatory elements before a reporter or effector gene.
  • This approach is independent of pre-conceived information on particular characteristics of the cell type of interest, thereby allowing standardized, unbiased and straightforward production of reporter constructs for any given cell type.
  • the method identifies genomic sub-regions that comprise transcription factor binding sites characteristic for the cell type and assembles them into a set of genomic sub-regions that comprises a relevant portion of transcriptional regulatory sequence information within the cell type of interest.
  • the set of genomic sub-regions may also be referred to as a “synthetic cis-regulatory DNA”, “synthetic regulatory region” or “synthetic locus control region (sLCR)”.
  • When the expression cassette is introduced into a cell of the cell type of interest, expression of the reporter or effector gene will occur, since in said cell type the transcription factors corresponding to the characteristic transcription factor binding sites are present and initiate expression of the reporter or effector gene.
  • the level of expression is thus related to the particular cell type.
  • Each cell type will essentially yield a different set of genes according to the signature gene set and each cell type will show differing levels of reporter expression depending on the transcription factors present and the combination of regulatory regions assembled in the sLCR.
  • the method is not limited to certain cell types, but may be applied to virtually any cell type and even distinguish cell state or fate transition within a certain cell type. To this end no a priori knowledge of gene regulation in the cell type of interest is needed.
  • the method only relies on the provision of a gene expression profile and genomic sequence data for a given cell type, which can be obtained using standard biomolecular techniques or consulting public databases.
  • the gene expression profile reflects the levels of gene expression within a cell type of interest.
  • RNA-seq or other sequencing or microarray-based techniques can be used to quantify the levels of RNA transcripts within the cell type of interest.
  • the gene expression profile may also be potentially deduced using proteomics, e.g. by quantifying the expressed proteins or peptides present in the cell type of interest, which can be squared to the gene expression profile.
  • signature genes are selected that are characteristic for the cell type, cell state or entity of interest.
  • the selection of the signature genes can be adapted to the desired application.
  • signature genes may be selected according to their gene expression level, by ranking the genes of the cell type of interest according to their gene expression level and selecting genes that are above or below a certain threshold or selecting a predetermined number of highest or lowest expressed genes. For such a selection of signature genes the absolute expression levels of the genes of the cell type of interest serve as a reference. The resulting expression cassette may thereby faithfully report on the presence of the cell type of interest in various assays, independent of the cells to be probed.
  • the differentially regulated signature genes are selected by identifying genes that are up- or down-regulated compared to the expression levels in the reference cell type.
  • a gene expression profile of the cell type of interest and a reference cell type is provided.
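  • As an illustrative sketch only, the selection of differentially regulated signature genes described above could be expressed as a log2 fold-change filter between the two expression profiles. The function name, the pseudocount, the threshold of 1.0 and the gene names are assumptions made for this example and are not taken from the source.

```python
import math

def select_differential_genes(interest, reference, min_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes up- or down-regulated vs. the reference.

    Both arguments map gene names to expression levels. The pseudocount and
    the threshold are illustrative assumptions, not values from the source.
    """
    selected = {}
    for gene in interest.keys() & reference.keys():
        ratio = (interest[gene] + pseudocount) / (reference[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_log2_fc:
            selected[gene] = log2_fc
    return selected

# Hypothetical expression values for a cell type of interest and a reference cell type.
interest = {"GENE_A": 63.0, "GENE_B": 7.0, "GENE_C": 15.0}
reference = {"GENE_A": 7.0, "GENE_B": 7.0, "GENE_C": 63.0}
signature = select_differential_genes(interest, reference)
```

  • In this toy input GENE_A is up-regulated and GENE_C is down-regulated relative to the reference, so both are retained, while the unchanged GENE_B is dropped.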
  • the method may rely upon publicly accessible annotated databases such as ENCODE, mENCODE (the mouse version of the ENCODE project), JASPAR, Ensembl, Entrez Gene, GenBank etc.
  • transcription factors are identifiable by a skilled person through annotations of function in commonly available databases.
  • the target sequences, i.e. transcription factor binding sites, for each transcription factor are typically known to a skilled person and/or are obtainable using appropriately annotated databases such as those described above.
  • the method is directed towards the use of transcription factors for which their binding sites (in the form of DNA sequences or sequence motifs) are already known and/or preferably annotated in public databases.
  • the set of selected genes is used to determine a set of genomic regions from the genomic sequence data of the cell type of interest, wherein each genomic region comprises a sequence encoding a signature gene and additional genomic sequence adjacent to (preferably immediately flanking) the sequence encoding said signature gene.
  • This genomic sequence, e.g. non-coding reference DNA (although cis-regulatory elements may be present in coding regions), is intended to encompass regulatory sequences, which can be positioned upstream of, downstream of, or within coding regions, most often but not exclusively in close proximity to a transcriptional start site.
  • the size of the additional genomic sequence adjacent to the signature gene may vary as the method is advantageously not overly sensitive to the presence of extra portions of additional genomic sequence.
  • the additional genomic sequence should be large enough to encompass cis-regulatory elements (in particular transcription factor binding sites, or enhancers or silencers) that regulate the expression of the signature gene. It is known that such cis-regulatory elements may be in close structural proximity to the coding region but, given the 3D structural distribution of the genome in the nucleus, may be located at a significant distance in terms of the linear genome sequence.
  • the regulatory genomic sequence is chosen based upon the folded three-dimensional state of the DNA within chromatin in the cell type by using topological associating domains as boundaries.
  • the method assumes cell-type specific non-coding CTCF binding sites as proxy for topological associating domains.
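  • A minimal sketch of how a genomic region could be delimited by the nearest CTCF binding sites flanking a signature gene, as a proxy for a topological associating domain. The coordinates, the function name and the fallback to the gene boundary when no flanking site exists are assumptions for illustration; sites lying within the gene are ignored here.

```python
import bisect

def region_between_ctcf_sites(gene_start, gene_end, ctcf_positions):
    """Return (start, end) spanning the gene out to the nearest flanking CTCF sites.

    If no CTCF site exists on one side, the gene boundary itself is used
    (an assumption of this sketch).
    """
    sites = sorted(ctcf_positions)
    i = bisect.bisect_right(sites, gene_start)  # count of sites at or left of the gene start
    j = bisect.bisect_left(sites, gene_end)     # index of first site at or right of the gene end
    left = sites[i - 1] if i > 0 else gene_start
    right = sites[j] if j < len(sites) else gene_end
    return left, right

# Hypothetical coordinates on a single chromosome.
region = region_between_ctcf_sites(5000, 8000, [1000, 4000, 6000, 12000, 20000])
```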
  • the method searches for multiple genomic sub-regions of similar or comparable size (e.g. equal size) that comprise one or more, preferably several, binding sites for the transcription factors that are encoded by the signature genes. All of the genomic sub-regions identified in step f) of the method thus comprise a DNA binding site for a transcription factor that is characteristically expressed in the cell type of interest.
  • When the genomic sub-regions are assembled into an sLCR and said sLCR is introduced into the cell of interest, the characteristically expressed signature transcription factors may bind to said sLCR and regulate the expression of a downstream reporter or effector gene.
  • genomic sub-regions larger than the ones composing the sLCR are identified, which are redundant in terms of the binding sites for the characteristic transcription factors.
  • An assembly of a limited number of all identified genomic sub-regions is sufficient to represent the overall regulatory complexity and including all elements would not result in increased specificity but rather in unnecessarily large expression cassettes.
  • the method therefore further encompasses a step to select a minimal set of genomic sub-regions comprising transcription factor binding sites for a predetermined percentage of all transcription factors encoded by the selected signature genes.
  • the number of transcription factors encoded by the selected signature genes does not necessarily equal the number of transcription factor binding sites. In some embodiments, not all the transcription factors may have known binding sites, or multiple transcription factor binding site matrices may be associated with some transcription factors.
  • the method then preferably ranks the genomic sub-regions according to the number of transcription factor binding sites, in addition to the diversity of the transcription factor binding sites. For instance, the highest ranked genomic sub-region may contain 35 transcription factor binding sites for the transcription factors of step d), wherein 3 of these binding sites are each represented 5 times in the same genomic sub-region, while the remaining 20 binding sites are present only once. This highest ranked genomic sub-region would then comprise 23 different (unique) transcription factor binding sites, which represent binding sites for 23 transcription factors of the signature genes. Taking a total of 100 characteristic transcription factors in step d), as the percentage implies, this highest ranked genomic sub-region would thus cover 23% of them.
  • a second (and potentially third) genomic sub-region(s) would be searched for that encompasses preferably transcription factor binding sites not yet contained within the 23 binding sites of the first genomic sub-region, and so on, such that the further genomic sub-region(s) would comprise at least 7 binding sites for transcription factors not already covered by the first, most highly ranked, genomic sub-region.
  • a minimal set of 2-10 genomic sub-regions will comprise transcription factor binding sites that are binding targets for at least 50% of the transcription factors encoded by the signature genes.
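  • The arithmetic of the worked example above can be checked with a short sketch. The motif identifiers are placeholders, the total of 100 signature transcription factors is the value implied by the 23% figure, and reading the "at least 7" further binding sites as the shortfall to a 30% coverage target is an assumption of this example.

```python
from collections import Counter

# Placeholder motif names: 3 motifs occur 5 times each, 20 motifs occur once.
hits = [f"TF{i}" for i in range(3) for _ in range(5)] + [f"TF{i}" for i in range(3, 23)]
counts = Counter(hits)

total_sites = sum(counts.values())   # all binding sites in the sub-region
unique_sites = len(counts)           # distinct transcription factors represented

n_signature_tfs = 100                # implied by "23 unique sites = 23%"
target_coverage = 0.30               # assumption: lowest threshold named in the text

coverage = unique_sites / n_signature_tfs
still_needed = round(target_coverage * n_signature_tfs) - unique_sites
```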
  • the minimal set of genomic sub-regions acts as a synthetic cis-regulatory DNA to which the characteristic transcription factors can bind.
  • the minimal set of genomic sub-regions selected in step g) of the method is therefore referred to herein as a synthetic locus control region (sLCR).
  • the cassette therefore comprises a regulatory region (sLCR) enriched for regulatory sequences that are bound by transcription factors that are e.g. expressed or highly expressed in the cell type of interest. This regulatory region is therefore unique/tailored to this particular cell type and leads to an expression level of the reporter gene unique to this cell type.
  • the predetermined percentage of coverage of transcription factors can be regarded as a “percentage of regulatory information” that is covered by the minimal set of genomic sub-regions. Theoretically, the higher the amount of regulatory information covered, the more specific the expression of the reporter or effector gene will be to the cell type. However, advantageously, a percentage covering at least 30% of regulatory information, preferably at least 40% or 50% yields excellent results in terms of a cell-type specific expression profile, as gauged by experimental validation.
  • a cell-type specific expression cassette is generated by assembling the minimal set of genomic sub-regions selected in step g) with a reporter or effector such that they are operably coupled, i.e. that the genomic sub-regions comprising the transcription factor binding sites as cis-regulatory elements are configured to regulate the expression of the reporter or effector gene.
  • the high coverage of regulatory information by means of the assembled genomic sub-regions without the need of prior information opens a vast potential of application for the methods and constructs described herein.
  • the expression cassettes as a part of a reporter vector, may be exploited in vitro and in vivo as a reporter for intrinsic cell states, for adaptive responses to external signaling or chemical inputs, cell fate transitions, reprogramming, forward and chemical genetic screenings.
  • the vectors can be used to deplete cell-type, developmental-stage or disease-specific populations in gene therapy or other genetic modification settings.
  • sLCRs may drive the tumor-specific expression of structural components of an oncolytic virus and/or co-stimulatory molecules aiming at increasing the specificity and effectiveness of an oncolytic therapy.
  • the method is characterized in that the gene expression profile comprises expression levels of genes in the cell type of interest, and
  • the second alternative allows for the selection of signature genes based upon a comparison of the expression level of the genes of said cell type as derivable from the gene expression profile.
  • Such an embodiment is particularly well suited for the generation of expression cassettes that will represent the cell type of interest in different experimental settings. To this end, the selection of genes that are upregulated 3- to 10-fold or more relative to the average expression level has yielded excellent results.
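  • As a hedged illustration of selecting signature genes relative to the average expression level within a single profile, the following sketch keeps genes expressed at least 3-fold above the mean. Using the arithmetic mean as "the average", the function name and the toy values are assumptions of this example.

```python
from statistics import mean

def select_highly_expressed(profile, min_fold=3.0):
    """Return the sorted gene names expressed at least min_fold times the mean level."""
    average = mean(profile.values())
    return sorted(gene for gene, level in profile.items() if level >= min_fold * average)

# Hypothetical expression profile of the cell type of interest.
profile = {"GENE_A": 90.0, "GENE_B": 5.0, "GENE_C": 3.0, "GENE_D": 2.0}
highly_expressed = select_highly_expressed(profile)
```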
  • the first alternative allows for tailoring of the expression cassette to distinguish a cell type of interest compared to a reference cell type.
  • the cell type of interest may be a certain tumor cell
  • the reference cell type refers to a normal cell of the tissue type typically invaded by the tumor, or by the cell type from which the tumor cell originated.
  • the reference cell type may however also refer to the same cell type, but in a different cell state or before or after a fate transition.
  • the gene expression profile of the cell type of interest may refer to the gene expression profile of a cancer cell in a mesenchymal state after an epithelial-to-mesenchymal transition (EMT), whereas the gene expression profile of the reference cell type may refer to the gene expression profile of the same type of cancer cell, but in its epithelial state, i.e. before epithelial-to-mesenchymal transition (EMT).
  • EMT epithelial-to-mesenchymal transition
  • the expression cassette will be able to distinguish cells that have undergone EMT from those that did not.
  • Expression cassettes derivable by selecting the signature genes based upon a relative regulation in comparison to reference cell types are characterized by particularly high specificity allowing for a distinction of the reference cell type from the cell type of interest without the need of any additional marker.
  • the method is characterized in that the predetermined percentage of transcription factors covered is 30% or more, preferably 40% or more, most preferably 50% or more.
  • the method is characterized in that the genomic regions determined in e) correspond to genomic sequences of topological associating domains that contain the differentially regulated gene, wherein preferably a topological associating domain corresponds to a genomic sequence between two CTCF-binding sites.
  • an optimal coverage of the potential cis-regulatory elements governing the transcription of said signature genes can be achieved.
  • Within a topological associating domain, DNA sequences physically interact with each other more frequently than with sequences outside the domain, thereby forming three-dimensional chromosome structures accessible to the transcriptional machinery.
  • Particularly good results could be achieved by selecting the genomic sequence between two CTCF-binding sites.
  • Such an embodiment yields an optimal balance between computational resources, specificity of the non-coding cis-regulatory DNA to the genes it is most likely regulating, and the size of the flanking DNA needed to cover the characteristic transcription factor binding sites.
  • the identification of genomic sub-regions of comparable, e.g. equal, size in step f) is performed by a sliding window algorithm of the genomic regions determined in e), wherein preferably the window has a length of 500 bp to 5000 bp, preferably 700 bp to 2000 bp, more preferably 800 bp to 1200 bp, most preferably 1000 bp and the sliding step has a length of 100 bp to 1000 bp, preferably 120 bp to 300 bp, more preferably 130 bp to 170 bp, most preferably 150 bp.
  • the sliding window is fixed to 1000 bp in size sliding by 150 bp steps, although the genomic sub-regions size resulting out of the scanning may vary in size because it depends on the statistical score and distribution of the TFBS.
  • the sliding window algorithm calculates the statistical enrichment of the transcription factor binding site motifs from a relevant database (e.g. JASPAR), restricted to the transcription factor binding sites corresponding to the transcription factors identified in step d).
  • a list of significant enrichment of characteristic transcription factor binding sites within specific regions is generated and used to identify genomic sub-regions of comparable, preferably equal, size that comprise at least one transcription factor binding site for at least one characteristic transcription factor encoded by a signature gene.
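  • The sliding window scan described above can be sketched as follows, using the most preferred window of 1000 bp and step of 150 bp. This sketch only counts motif hits per window; it does not compute the statistical enrichment mentioned in the text. Half-open coordinates, the function names and the hit positions are assumptions of this example.

```python
def sliding_windows(region_start, region_end, window=1000, step=150):
    """Yield half-open (start, end) windows that lie entirely inside the region."""
    start = region_start
    while start + window <= region_end:
        yield start, start + window
        start += step

def count_hits_per_window(region_start, region_end, hit_positions, window=1000, step=150):
    """Return [((start, end), hit_count), ...] for every window in the region."""
    return [
        ((s, e), sum(1 for p in hit_positions if s <= p < e))
        for s, e in sliding_windows(region_start, region_end, window, step)
    ]

# Hypothetical motif hit positions within a 1300 bp genomic region.
scan = count_hits_per_window(0, 1300, [100, 200, 1200])
```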
  • tens (10 to 200, preferably between 20 and 180) of TFBS are comprised within genomic sub-regions of comparable size.
  • the multiple genomic sub-regions of comparable and limited size, preferably equal size, within the set of genomic regions determined in e) (according to step f), are typically the same size but may vary. Comparable in this context refers to multiple genomic sub-regions that exhibit preferably any window size of 500 bp to 5000 bp.
  • the genomic sub-regions have a length of 100 bp to 1000 bp, preferably 120 bp to 300 bp, more preferably 130 bp to 170 bp, most preferably 150 bp. If a sliding window algorithm is used, the length of the genomic sub-regions will preferably correlate with the sliding step. In other embodiments, the sliding window approach may use any given step size, from 1 bp up to those step sizes indicated for the window sizes above. The preferred lengths have been determined by applying the method to different cell types and assay systems and reflect the optimal results in terms of expression specificity and total size of the expression cassette.
  • the method is characterized in that the selection of a set of genomic sub-regions in g) is performed by calculating for each genomic sub-region identified in f):
  • the preferred method will rank the genomic sub-regions based upon the highest number of TFBS and the best diversity score. As an example of a number one ranking, in the genomic locus chr10:6019558-6019708 there are 20 TFBS that said method associated with a mesenchymal GBM state, with some repeated 2 to 6 times. Once the best ranked genomic sub-region is determined, one may calculate the second best among all the remaining genomic sub-regions, wherein TFBS present in the first genomic sub-region are excluded from the ranking. By iteration one may calculate how many different genomic sub-regions are required to cover the entire set of transcription factor binding sites or a predetermined percentage thereof.
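  • The iterative ranking described above resembles a greedy coverage procedure, which can be sketched as below. This is an illustration under stated assumptions rather than the claimed scoring: sub-regions are ranked first by the number of not-yet-covered transcription factors and then by total site count, ties are broken alphabetically, and all names and inputs are hypothetical.

```python
def select_minimal_set(subregions, signature_tfs, target_fraction=0.5):
    """Greedily pick sub-regions until the target fraction of signature TFs is covered.

    subregions maps a sub-region name to the list of TF names whose binding
    sites it contains (repeats allowed). Returns (chosen names, coverage).
    """
    signature_tfs = set(signature_tfs)
    if not signature_tfs:
        return [], 0.0
    needed = target_fraction * len(signature_tfs)
    covered, chosen = set(), []
    remaining = dict(subregions)

    def score(name):
        # Only sites for signature TFs that are not yet covered count.
        sites = [tf for tf in remaining[name] if tf in signature_tfs and tf not in covered]
        return (len(set(sites)), len(sites))

    while len(covered) < needed and remaining:
        best = max(sorted(remaining), key=score)
        if score(best)[0] == 0:
            break  # no remaining sub-region adds a new transcription factor
        covered |= set(remaining.pop(best)) & signature_tfs
        chosen.append(best)
    return chosen, len(covered) / len(signature_tfs)

# Hypothetical sub-regions and signature transcription factors.
subregions = {"r1": ["A", "A", "B", "C"], "r2": ["C", "D"], "r3": ["A", "B"]}
chosen, coverage = select_minimal_set(subregions, "ABCDEF", target_fraction=0.6)
```

  • Here r1 is picked first because it covers three transcription factors; r3 then adds nothing new, so r2 is picked second for its single new factor D.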
  • the method is characterized in that the configuration of genomic sub-regions in h) is such that genomic sub-regions comprising a transcription start site are assembled adjacent and upstream of the sequence encoding the reporter gene and the genomic sub-regions not comprising a transcription start site are preferably assembled further upstream from the closest transcription start site.
  • the method may annotate all the genomic sub-region elements (e.g. 150 bp elements) that contain a natural transcription start site and those which do not, and the ranking will start from the transcription start site-containing genomic sub-regions. After the best ranked genomic sub-region containing a transcription start site is chosen, the ranking of additional genomic sub-regions may be performed independent of whether those genomic sub-regions contain a transcription start site or not.
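  • The configuration described above, with transcription start site (TSS)-containing sub-regions placed adjacent to the reporter and the remaining sub-regions further upstream, can be sketched as follows. The function name `arrange_upstream_of_reporter` is hypothetical, and keeping the ranking order within each group is an assumption made for illustration.

```python
from typing import Iterable, List, Set


def arrange_upstream_of_reporter(ranked: Iterable[str], has_tss: Set[str]) -> List[str]:
    """Return sub-regions in 5'->3' order, ending just before the reporter.

    Sub-regions without a TSS come first (distal); TSS-containing
    sub-regions come last, directly upstream of the reporter gene.
    """
    ranked = list(ranked)
    distal = [name for name in ranked if name not in has_tss]
    proximal = [name for name in ranked if name in has_tss]
    return distal + proximal


# Hypothetical ranking in which only sub-region "C" contains a natural TSS.
order = arrange_upstream_of_reporter(["A", "C", "B"], has_tss={"C"})
```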
  • the term “generating a cell-type specific expression cassette” relates to the design and physical production of a nucleic acid molecule.
  • the term “generating a cell-type specific expression cassette” relates to the design of a cell-type specific expression cassette without physically producing the corresponding nucleic acid molecule, for example the method may be a computer-implemented method or may comprise one or more computer-implemented steps in the method.
  • the method is or comprises computer-implemented elements and produces, as the output of the method, an in silico design, product, simulation and/or computer representation of said construct.
  • the “generating” of a cassette or construct may therefore in some embodiments occur in the computer, i.e. in computer software, for example the output may be a nucleic acid sequence or nucleic acid sequence information, i.e. in computer readable format.
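  • One possible computer readable output is a FASTA record. The text does not prescribe a file format, so the choice of FASTA, the function name `to_fasta` and the record name are assumptions made for this sketch.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a nucleotide sequence as a single FASTA record."""
    sequence = sequence.upper()
    # Break the sequence into lines of at most `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical 80 nt design written as a two-line FASTA body.
record = to_fasta("sLCR_demo", "acgt" * 20)
```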
  • the method of the present invention may also relate to a computer programme product, such as a software product.
  • the software may be configured for execution on common computing devices and is configured for carrying out one or more of the steps a) to h) of the method described herein.
  • the computer programme product of the present invention therefore also encompasses and directly relates to the features as described for the method provided herein. Further details on preferred computer-based approaches are provided in the examples and relevant references as described herein. If the method is carried out in a computer programme, for example by way of simulation or computer design of an inventive cassette, the sequence may, in some embodiments, be subsequently synthesized by methods known to a skilled person in a laboratory and utilized in whichever in vitro or in vivo application is desired.
  • the invention also relates to a system for carrying out the method described herein, comprising one or more computing devices, data storage devices and/or software as system components, wherein said components may be preferably connected in close proximity to one another or via a data connection, for example over the internet, and are configured to interact with one or more of said components and/or to carry out the method described herein.
  • the system may comprise computing devices, data storage devices and/or appropriate software, for example individual software modules, which interact with each other to carry out the method as described herein.
  • Step a) regarding providing a gene expression profile of a cell type of interest may be computer implemented, i.e. the information for a gene expression profile of a cell type of interest is preferably presented in a computer readable format, configured for processing in the further steps of the method.
  • Step b) regarding providing genomic sequence data of said cell type of interest may be computer implemented, i.e. the information for genomic sequence data is preferably presented in a computer readable format, configured for processing in the further steps of the method.
  • Step c) regarding selecting a set of signature genes from the gene expression profile, wherein said signature genes are (i) differentially regulated compared to a reference cell type or (ii) selected according to a gene expression level, is preferably computer-implemented.
  • genes and their expression profiles are represented as information in a format configured for processing by a computing device, such that a particular group of genes can be selected based on this information. This step may be automated or performed manually, depending on the selection characteristics employed/needed or skills of the user.
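  • A minimal sketch of an automated selection for step c) is given below. It assumes expression values held in plain dictionaries, a log2 fold-change threshold against the reference cell type, and a pseudocount to avoid division by zero; these parameters, the restriction to upregulated genes, the function name `select_signature_genes` and the example values are all illustrative assumptions.

```python
import math
from typing import Dict, List


def select_signature_genes(profile: Dict[str, float],
                           reference: Dict[str, float],
                           min_log2fc: float = 1.0,
                           pseudocount: float = 1.0) -> List[str]:
    """Return genes upregulated in `profile` relative to `reference`."""
    selected = []
    for gene, value in profile.items():
        ref_value = reference.get(gene, 0.0)
        log2fc = math.log2((value + pseudocount) / (ref_value + pseudocount))
        if log2fc >= min_log2fc:
            selected.append(gene)
    return sorted(selected)


# Hypothetical normalized expression values for two cell types.
cell_of_interest = {"CD44": 99.0, "OLIG2": 3.0, "ACTB": 499.0}
reference_cell = {"CD44": 9.0, "OLIG2": 63.0, "ACTB": 499.0}
signature = select_signature_genes(cell_of_interest, reference_cell)
```

  • Selection according to an absolute expression level, option (ii) of step c), would replace the fold-change test with a simple threshold on `value`.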
  • Step d), regarding identifying genes encoding a transcription factor within the set of signature genes selected in c), is preferably carried out in a computer implemented method, whereby the genes are annotated with function, such that a transcription factor function can be (optionally) automatically interrogated in any one or more of the identified signature genes.
  • Appropriate databases may be employed, as mentioned by way of example herein.
  • Step e) regarding determining a set of genomic regions from the genomic sequence data, wherein each genomic region comprises a sequence encoding a signature gene identified in c) and additional genomic sequence adjacent to the sequence encoding said signature gene, is preferably carried out in a computer implemented method. Assessing and selecting genomic sequence adjacent to genes of interest can be carried out by a skilled person based on genomic sequence, i.e. as available from databases, either by using automatic selection criteria, or by manually assessing and selecting adjacent sequence.
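  • An automatic selection criterion for step e) could be as simple as extending each signature gene by a fixed flank on both sides. The sketch below assumes half-open coordinates and a symmetric flank; the `Interval` type, the function name `extend_region`, the flank size and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def extend_region(gene: Interval, flank: int, chrom_length: int) -> Interval:
    """Add `flank` bp of adjacent sequence on both sides of a gene,
    clamped to the chromosome boundaries."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return Interval(gene.chrom,
                    max(0, gene.start - flank),
                    min(chrom_length, gene.end + flank))


# Hypothetical gene body and chromosome length.
gene = Interval("chr10", 6_000_000, 6_020_000)
region = extend_region(gene, flank=50_000, chrom_length=10_000_000)
```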
  • Step f) regarding identifying multiple genomic sub-regions of equal size within the set of genomic regions determined in e), wherein said genomic sub-regions comprise one or more binding sites for one or more of the transcription factors identified in d), is preferably carried out using computer implemented methods.
  • the identification of binding sites for one or more of the transcription factors can be carried out using methods established in the art, for example any given sequence is searched and/or interrogated for the presence of known binding sites, defined by particular sequences or sequence motifs. Software configured for screening sequences for the presence of such known sequences is available to a skilled person.
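  • Searching a sequence for known binding sites defined by sequence motifs can be sketched with a consensus-based scan. This is only one simple option and an assumption of this example; position weight matrices are a common alternative. The function names, the example sequences and the choice of IUPAC consensus strings are illustrative.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(sequence: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every match of an IUPAC consensus."""
    sequence = sequence.upper()
    consensus = consensus.upper()
    hits = set()
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")
        for match in pattern.finditer(sequence):
            hits.add((match.start(), strand))
    return sorted(hits)


# Illustrative scan for a GGAA core on both strands of a toy sequence.
ets_like = find_sites("ACGGAAGTTTCCA", "GGAA")
# A palindromic consensus matches at the same position on both strands.
ap1_like = find_sites("TTGACTCATT", "TGASTCA")
```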
  • Step g) regarding selecting a minimal set of genomic sub-regions, preferably between 2 and 10, from those determined in f), wherein the set of genomic sub-regions is selected to comprise transcription factor binding sites for a predetermined percentage of all transcription factors identified in d), is preferably carried out using an (optionally) automated computer algorithm. Details on the determination of genomic sub-regions are provided above. Multiple options are available for software solutions suitable for selecting the desired genomic sub-regions, or the selection can be carried out manually by the skilled user assessing the various sub-regions and compiling them to comprise binding sites for a certain percentage of the relevant transcription factors identified in step d).
  • Software can be designed and/or configured by a skilled person using established programming, coding, and bioinformatic techniques to assess genomic sub-regions for the presence of transcription factor binding sites, comparison of these binding sites to the transcription factors identified as signature genes, and selecting a compilation of genomic sub-regions to cover a predetermined percentage of the relevant transcription factors.
  • In step h) of the method, a cell-type specific expression cassette, comprising the set of genomic sub-regions selected in step g) operably coupled with a reporter or effector gene, is generated.
  • said “generating” may relate to the computer implemented production of nucleic acid sequence information in computer readable form and/or to the synthesis of a physical nucleic acid molecule based on and/or comprising said sequence.
  • the invention therefore further relates to a method for designing and/or manufacturing a nucleic acid molecule that corresponds to, comprises or is based on the product DNA sequence information obtained from steps a) to g).
  • the method preferably comprises carrying out the method described herein and subsequently synthesizing, cloning and/or isolating said nucleic acid molecule.
  • generating a cassette may in such embodiments comprise any relevant molecular biological or chemical technique for cloning, mutation, recombination, PCR amplification and/or synthesis used in generating a nucleic acid molecule.
  • the cassette is synthesized using de novo nucleic acid synthesis based on the information obtained by the method of the invention.
  • the invention relates to a cell-type specific reporter vector including an expression cassette generated by a method as described herein.
  • the invention relates to a cell-type specific reporter vector, comprising a synthetic regulatory region comprising 2 to 10 genomic sub-regions of 100 bp to 1000 bp, positioned adjacently, without a linker or with a linker sequence of or less than 100 bp positioned between said sub-regions, wherein said sub-regions originate from separate (non-adjacent) locations in the same genome of a cell type of interest, wherein the sub-regions cumulatively comprise binding sites for at least 5, preferably at least 10, most preferably at least 20 transcription factors, and
  • genomic sub-regions are operably coupled with a reporter or effector gene to regulate the expression of said reporter or effector gene.
  • genomic sub-regions are selected by a method according to the steps a) to g) as described herein.
  • a person skilled in the art will appreciate that preferred embodiments disclosed for the method equally apply to the cell-type specific reporter vector described herein.
  • the method of the invention leads to structural features of the vector, unique in this field.
  • a preferred embodiment of the invention relates to the construct design, where transcription factor binding sites from genomic subregions have a length of 100 to 1500 bp or 100 to 1250 bp, preferably 100 to 1000 bp, more preferably 120 bp to 300 bp, more preferably 130 bp to 170 bp, most preferably essentially 150 bp, combined with the origin of the genomic subregions from non-adjacent regions of the same genome.
  • the constructs of the invention are defined by a novel de novo and non-biased construction, by pulling together distinct/separated but highly relevant regulatory regions, that reflect the relevant size of regulatory information, in particular for sizes of preferably 120 bp to 300 bp, more preferably 130 bp to 170 bp, most preferably 150 bp, which approximate the size of a histone particle upon which DNA is wrapped.
  • a preferred embodiment of the invention relates to the construct design, where 5 or more transcription factor binding sites are used, i.e. the higher numbers of TFBSs reflect a novel de novo and non-biased construction, by pulling together sufficient numbers of TFBSs to cover a large regulatory portion of relevant TFs in any given cell type/state.
  • the genomic sub-regions are characterized in that they originate from separate locations in the same genome of a cell type and cumulatively comprise binding sites for at least 5, preferably at least 10, most preferably at least 20, or more, transcription factors.
  • the 2-10 (i.e. 2, 3, 4, 5, 6, 7, 8, 9 or 10) genomic sub-regions are compiled to form a sLCR comprising at least 5, 10, 15, 20, 25, 30, 35, 40, or more, transcription factor binding sites.
  • the genomic sub-regions cover binding sites for a large number of transcription factors, typically sufficient to cover the regulatory information of a cell type of interest.
  • the binding sites for the transcription factors refer to transcription factors that are characteristically expressed in the cell type of interest.
  • positioning the genomic sub-regions adjacently without a linker or with a linker sequence of less than 100 bp ensures a compact design of the reporter vector and an efficient transduction without compromising on the amount of regulatory information.
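  • The structural constraints stated for the synthetic regulatory region (2 to 10 sub-regions of 100 bp to 1000 bp each, joined directly or with a linker of at most 100 bp) can be checked while assembling the sequence in silico. The function name `assemble_regulatory_region` and the toy sequences are hypothetical; the sketch assumes the sub-regions are already given in their final 5' to 3' order.

```python
from typing import Sequence


def assemble_regulatory_region(subregions: Sequence[str], linker: str = "") -> str:
    """Concatenate sub-region sequences, enforcing the stated size limits."""
    if not 2 <= len(subregions) <= 10:
        raise ValueError("expected 2 to 10 genomic sub-regions")
    if len(linker) > 100:
        raise ValueError("linker must be 100 bp or shorter")
    for seq in subregions:
        if not 100 <= len(seq) <= 1000:
            raise ValueError("each sub-region must be 100 bp to 1000 bp long")
    return linker.join(subregions)


# Two toy 150 bp sub-regions joined with a 2 bp linker.
region = assemble_regulatory_region(["A" * 150, "C" * 150], linker="GG")
```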
  • each of the genomic sub-regions has a length of 120 bp to 300 bp, more preferably 130 bp to 170 bp, most preferably 150 bp.
  • Such lengths of the genomic sub-regions optimally cover the relevant transcription factor binding sites enriched with statistical significance over the background genomic regions.
  • the optimal size of 150 bp may be due to the fact that around 146 base pairs (bp) of genomic DNA are wrapped around each histone core particle, preventing access to transcription factors.
  • nucleosome free regions (NFRs), which are usually associated with active cis-regulatory DNA, arise when unwrapping of the DNA enables accessibility for transcription factors, and are therefore minimally 146 bp.
  • the average size of cis-regulatory DNA is generally inferred from the average size of NFRs, otherwise referred to as DNase I hypersensitive sites, which is about 1000 bp and usually contains a clustering of relevant transcription factor binding sites on these length scales.
  • the vector is characterized in that the genomic sub-region adjacent to the reporter or effector gene comprises a transcription start site. This ensures that the effector and reporter are in frame and may positively be regulated by the upstream synthetic regulatory region.
  • the unique design of the invention described herein has the advantage that a variety of reporter or effector genes can be coupled to the synthetic regulatory region comprising the genomic sub-regions depending on the desired application.
  • the vector is characterized in that the reporter or effector gene encodes a protein selected from a group comprising a fluorescent protein, a suicide gene, a luciferase, a β-galactosidase, a chloramphenicol acetyltransferase, a surface receptor, a protein tag, including but not limited to 6×His tag, V5 tag, GFP tag, a self-processing ribozyme cassette, a mevalonate kinase and derivatives thereof, a biotin ligase and derivatives thereof including but not limited to BirA, an engineered peroxidase and derivatives thereof including but not limited to APEX2, an endonuclease or site-specific recombinase and derivatives thereof, including but not limited to restriction enzymes, Cre, Flp, Tn5, SpCas9, SaCas9, TALENs, a gene correcting a monogenic protein tag, including but not limited
  • Fluorescent proteins may be particularly useful for any kind of optical measurement of a signal indicative of the expression of the reporter gene. To this end the method may profit from using state-of-the-art microscopic and/or fluorescence-activated cell sorting devices and quantification techniques.
  • the invention can be readily employed using different kinds of vector systems and easily adapted to the cells of interest.
  • the vector is a viral vector, preferably a lentiviral or Adeno-associated viral vector.
  • the vector comprises a nucleic acid sequence according to SEQ ID NO 1-6 or a nucleic acid sequence with an identity of at least 80%, preferably of at least 90%, to any one of SEQ ID NO 1-6.
  • the invention allows for the provision of cell-type specific vector constructs that mediate a reliable expression of desired reporter or effector genes in the cell type of interest without the need for prior knowledge.
  • the vector constructs allow for a variety of different applications ranging from basic research to clinical studies or therapeutic strategies.
  • the vector constructs can be used for the identification of a cell type or the determination of an intrinsic cell state or developmental state of cells.
  • the vectors also make it possible to study how cells react to external signals or chemicals.
  • the vectors can be used in diagnostics, for example to determine the state or type of a cancer, e.g. whether an epithelial or mesenchymal glioblastoma is present and thereby allow for more effective therapeutic guidance.
  • the vectors may also be employed as pharmaceutical agents themselves for instance in gene therapeutic approaches.
  • the invention relates to the use of a vector for transforming a cell and/or determining a property of a cell, preferably a cell type, state or fate transition, for gene and viral therapy, drug discovery or validation.
  • the invention relates to a method for determining a property of a cell, preferably a cell type, state or fate transition, comprising the steps of
  • the reporter or effector gene may be a fluorescent protein, in which case microscopic devices may be used to quantitatively assess the fluorescent signal and thereby the expression of the reporter or effector gene in the cells probed.
  • the invention relates to a method for determining an intrinsic cell state, comprising the steps of
  • the invention relates to a method for determining cell fate transitions, comprising the steps of
  • the invention relates to a method for determining cell fate reprogramming factors, comprising the steps of
  • the invention relates to a method for determining the minimal requirements for in vitro cellular propagation of an intended phenotype, comprising the steps of
  • the invention relates to a method for a targeted correction of diseased cells, comprising the steps of
  • the invention relates to a method for oncolytic viral therapy, comprising the steps of:
  • a further embodiment of the invention relates to using DNA methylation and/or ATAC-seq profiles as an input for signature gene discovery.
  • ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a technique used to assess genome-wide chromatin accessibility by probing open chromatin with hyperactive mutant Tn5 transposase that inserts sequencing adapters into open regions of the genome.
  • the mutant Tn5 transposase excises any sufficiently long DNA in a process called tagmentation, whereby the simultaneous fragmentation and tagging of DNA is performed by Tn5 transposase pre-loaded with sequencing adaptors.
  • the tagged DNA fragments are then purified, amplified by PCR and sent for sequencing. Sequencing reads can then be used to infer regions of increased accessibility as well as to map regions of transcription-factor binding sites and nucleosome positions.
  • chromatin accessibility of several classes of cis-regulatory elements is a predictive marker of in vivo DNA binding by transcription factors.
  • the repertoire of all accessible sites in chromatin is the strongest predictor of cell identity.
  • chromatin accessibility is the strongest predictor of cancer type similarity and can be used to identify subtype identities within the common dimensional space of individual cancer types.
  • ATAC-seq can be performed on cells sorted according to expression levels of the reporter constructs described herein. Differential analysis of chromatin accessibility can therefore uncover many genes undergoing remodeling.
  • a further embodiment of the invention relates to target discovery and validation for drug targets in the area of stress responses (e.g. killing cells with high ER stress or inflammatory signaling) and senolytics (e.g. killing senescent cells).
  • a sLCR can be generated for a cell type/state with high ER stress, or inflammatory signaling, or undergoing senescence. Such a reporter can therefore be used to measure whether any given drug candidate, i.e. applied during a screen, leads to a change in the cell state.
  • a further embodiment of the invention relates to target discovery and validation for drug targets in the area of cell identity/fate changes.
  • specific regulatory profiles can be identified for any given cell identity, or for states before and after identity or fate changes, and reporter constructs effectively generated.
  • sLCRs can be generated for cell types before and after identity change. Such reporters can therefore be used to measure whether any given drug candidate, i.e. applied during a screen, leads to a change in the cell state.
  • a further embodiment of the invention relates to target discovery and validation for synthetic peptides, using the methods and constructs described herein.
  • a further embodiment of the invention relates to target discovery and validation for therapeutic exosomes and anti-sense oligonucleotides, using the methods and constructs described herein.
  • a further embodiment of the invention relates to discovery of therapeutic potential of drug candidates in immunotherapy, including but not limited to, the role for innate immune cells in therapeutic response and resistance, and the use of sLCRs to engineer therapeutic adaptive immune cells (T-cells, NK) to resist exhaustion and maintain target specificity.
  • sLCRs can be generated as a readout for immune cell activity and/or target specificity, and candidate molecules can be tested and changes in sLCR readout measured in order to assess if immune cells (T-cells, NK) can resist exhaustion when enhanced/treated with a candidate compound.
  • the invention relates to a computer-implemented method for determining the sequence of a synthetic locus control region (sLCR), comprising the steps a) to g) of the method as described herein.
  • the invention therefore also relates to computer software products capable of and adapted to carrying out the method steps a) through g) as described herein, as well as a computer program for use in the methods described herein comprising instructions which, when the program is executed by a computer, cause the computer to carry out steps a) to g) of the method described herein.
  • the present invention is directed to a method for generating cell-type specific expression cassettes, cell-type specific vectors using such an expression cassette, as well as applications of such vectors.
  • an expression cassette refers to a nucleic acid construct comprising nucleic acid elements sufficient for the expression of a gene product.
  • the expression cassette also encompasses an electronic representation of an expression cassette, as described herein.
  • an expression cassette comprises a nucleic acid (sequence) encoding as a gene product a reporter gene or a functional effector operatively linked to the selected genomic sub-regions comprising transcriptional binding sites that act as regulatory elements for the expression of the gene product.
  • synthetic cis-regulatory DNA refers to an arrangement of multiple genomic sub-regions that comprise validated and/or potential (putative/predicted) cis-regulatory sequences arranged adjacently (with or without a spacer) in a non-naturally occurring order (i.e. not occurring in that order or arrangement in a naturally occurring genome).
  • cis regulatory sequences are transcription factor binding sites (TFBS), promoters, enhancers, silencers, or other regulatory sequence capable of acting in cis on the expression of a coding region.
  • regulatory regions when arranged into a synthetic regulatory region, are typically characteristic for a cell type.
  • the method described herein preferably assembles these regulatory regions into a set of genomic sub-regions that comprises a relevant portion of transcriptional regulatory sequence information within the cell type of interest.
  • reporter vector refers to a nucleic acid construct comprising an expression cassette and further nucleic acid elements that allow for introducing the expression cassette into cells either in vitro or in vivo.
  • reporter vector can have one or more restriction endonuclease recognition sites (whether type I, II or IIs) at which the sequences can be cut in a determinable fashion without loss of an essential biological function of the vector, and into which a nucleic acid fragment can be spliced or inserted in order to bring about its replication and cloning.
  • Vectors can also comprise one or more recombination sites that permit exchange of nucleic acid sequences between two nucleic acid molecules.
  • Vectors can further provide primer sites, e.g., for PCR, transcriptional and/or translational initiation and/or regulation sites, recombinational signals, replicons, selectable markers, etc.
  • a vector can further contain one or more selectable markers suitable for use in the identification of cells transformed with the vector.
  • Vectors known in the art and those commercially available (and variants or derivatives thereof) can be used with the expression cassettes described herein.
  • Such vectors can be obtained from, for example, Vector Laboratories Inc., Invitrogen, Promega, Novagen, NEB, Clontech, Boehringer Mannheim, Pharmacia, EpiCenter, OriGenes Technologies Inc., Stratagene, PerkinElmer, Pharmingen, and Research Genetics, or can be freely distributed among scientists through Addgene.
  • viral vector refers to a nucleic acid vector construct that includes at least one element of viral origin, has the capacity to be packaged into a viral vector particle, and encodes at least one exogenous nucleic acid.
  • the vector and/or particle can be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.
  • virion is used to refer to a single infective viral particle.
  • “Viral vector”, “viral vector particle” and “viral particle” also refer to a complete virus particle with its DNA or RNA core and protein coat as it exists outside the cell.
  • transfection refers preferably to the delivery of DNA into eukaryotic (e.g., mammalian) cells.
  • transformation refers preferably to delivery of DNA into prokaryotic (e.g., E. coli ) cells.
  • transduction refers preferably to infecting cells with viral particles.
  • the nucleic acid molecule can be stably integrated into the genome, as generally known in the art.
  • the terms transfection, transformation and transduction may however be used interchangeably herein and refer to the process of introducing a vector comprising an expression cassette into a cell.
  • cell-type specific relates to the specificity of the expression of a reporter or effector gene, when an expression cassette as described herein is introduced into a cell of interest, in comparison to other cells (e.g. reference cells).
  • the term cell-type specific encompasses an expression (level) specific to the cell type of the cell of interest as well as its cell state or fate.
  • the term cell-type specific expression cassette or vector therefore also encompasses cell-state specific as well as cell-fate specific expression cassettes or vectors.
  • reporter refers to gene products, encoded by a nucleic acid comprised in an expression construct as provided herein, that can be detected by an assay or method known in the art, thus “reporting” expression of the construct and/or “effecting” the state or fate of the cell they are expressed in.
  • Reporters and effectors and nucleic acid sequences encoding reporters are well known in the art. Reporters or effectors include, for example, fluorescent proteins, such as green fluorescent protein (GFP), blue fluorescent protein (BFP), yellow fluorescent protein (YFP), red fluorescent protein (RFP), enhanced fluorescent protein derivatives (e.g. eGFP, eYFP, eRFP, mVenus, mCherry, etc.), and enzymes (e.g. enzymes catalyzing a reaction yielding a detectable product, such as luciferases, beta-glucuronidases, chloramphenicol acetyltransferases, aminoglycoside phosphotransferases, aminocyclitol phosphotransferases, or puromycin N-acetyl-transferases).
  • Appropriate reporters or effectors will be apparent to those of skill in the art.
  • Preferred proteins are selected from a group comprising a fluorescent protein, a suicide gene including but not limited to thymidine kinase, a luciferase, a β-galactosidase, a chloramphenicol acetyltransferase, a surface receptor, a protein tag, including but not limited to 6×His tag, V5 tag, GFP tag, a self-processing ribozyme cassette, a mevalonate kinase and derivatives thereof, a biotin ligase and derivatives thereof including but not limited to BirA, an engineered peroxidase and derivatives thereof including but not limited to APEX2, an endonuclease or site-specific recombinase and derivatives thereof, including but not limited to restriction enzymes, Cre, Flp, Tn5, SpCas9, SaCas9, TALENs, a gene correcting a monogenic disease, a tumour-
  • gene means essentially the coding nucleic acid sequence which is transcribed (DNA) and translated (mRNA) into a polypeptide in vitro or in vivo when operably linked to appropriate regulatory sequences.
  • the gene may or may not include regions preceding and following the coding region, e.g. 5′ untranslated (5′UTR) or “leader” sequences and 3′ UTR or “trailer” sequences, as well as intervening sequences (introns) between individual coding segments (exons).
  • Gene expression refers to the absolute or relative levels of expression and/or pattern of expression of a gene.
  • the expression of a gene may be measured at the level of DNA, cDNA, RNA, mRNA, proteins or combinations thereof. Gene expression may also be inferred from protein expression.
  • Gene expression profile refers to the levels of expression of multiple different genes measured for a cell type of interest. Gene expression profiles may be measured in a sample, such as samples comprising a variety of cell types, different tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or serum) by various methods including but not limited to RNA-SEQ by massively parallel signature sequencing (MPSS), Serial Analysis of Gene Expression (SAGE) technology, microarray technologies, microfluidic technologies, in situ hybridization methods, quantitative and semi-quantitative RT-PCR techniques or mass-spectrometry.
  • detecting expression means determining the quantity or presence of an RNA transcript or its expression product, e.g. at the protein level.
  • expression level refers to the normalized level of a gene product, e.g. the normalized value determined for the RNA expression level of a gene or for the polypeptide expression level of a gene.
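  • As an illustration of a normalized expression level, the sketch below scales raw read counts to counts per million (CPM). The text does not specify a normalization scheme, so CPM, the function name `counts_per_million` and the example counts are assumptions.

```python
from typing import Dict


def counts_per_million(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty library")
    return {gene: count * 1e6 / total for gene, count in raw_counts.items()}


# Hypothetical library with 100 reads in total.
cpm = counts_per_million({"GENE_A": 30, "GENE_B": 70})
```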
  • gene product or “expression product” are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA, and the polypeptide translation products of such RNA transcripts.
  • a gene product can be, for example, an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, a polypeptide, a post-translationally modified polypeptide, a splice variant polypeptide, etc.
  • RNA transcript refers to the RNA transcription products of a gene, including, for example, mRNA, an unspliced RNA, a splice variant mRNA, a microRNA, and a fragmented RNA.
  • Methods for detecting expression of the genes of the invention include methods based on hybridization analysis of polynucleotides, methods based on sequencing of polynucleotides, immunohistochemistry methods, and proteomics-based methods.
  • the methods generally detect expression products (e.g., mRNA) of the genes.
  • RNA Ribonucleic acid
  • the starting material is typically total RNA isolated from a biological sample, such as the cell type of interest, and a reference cell type, respectively.
  • RNA isolation can be performed using a purification kit, a buffer set and protease from commercial manufacturers, such as Qiagen (Valencia, Calif.), according to the manufacturer's instructions.
  • Isolated RNA can be used in hybridization or amplification assays that include, but are not limited to, PCR analyses and probe arrays.
  • One method for the detection of RNA levels involves contacting the isolated RNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected.
  • the nucleic acid probe can be, for example, a full-length cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 60, 100, 250, or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to an intrinsic gene of the present invention, or any derivative DNA or RNA.
  • Hybridization of an mRNA with the probe indicates that the intrinsic gene in question is being expressed.
  • An alternative method for determining the level of gene expression in a cell type of interest involves the process of nucleic acid amplification, for example, by RT-PCR (U.S. Pat. No. 4,683,202), ligase chain reaction (Barany, Proc. Natl. Acad. Sci. USA 88:189-93, 1991), self sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA 87:1874-78, 1990), transcriptional amplification system (Kwoh et al., Proc. Natl. Acad. Sci.
  • gene expression may be assessed by quantitative RT-PCR.
  • In PCR, a target polynucleotide sequence is amplified by reaction with at least one oligonucleotide primer or pair of oligonucleotide primers.
  • the primer(s) hybridize to a complementary region of the target nucleic acid and a DNA polymerase extends the primer(s) to amplify the target sequence.
  • a nucleic acid fragment of one size dominates the reaction products (the target polynucleotide sequence which is the amplification product).
  • the amplification cycle is repeated to increase the concentration of the single target polynucleotide sequence.
  • the reaction can be performed in any thermocycler commonly used for PCR. However, preferred are cyclers with real-time fluorescence measurement capabilities.
  • Quantitative PCR (also referred to as real-time PCR) is preferred under some circumstances because it provides not only a quantitative measurement, but also reduced time and contamination.
  • quantitative PCR or “real time QPCR” refers to the direct monitoring of the progress of PCR amplification as it is occurring without the need for repeated sampling of the reaction products.
  • the reaction products may be monitored via a signaling mechanism (e.g., fluorescence) as they are generated and are tracked after the signal rises above a background level but before the reaction reaches a plateau.
  • the number of cycles required to achieve a detectable or “threshold” level of fluorescence depends directly on the concentration of amplifiable targets at the beginning of the PCR process (the higher the starting concentration, the fewer cycles are required), enabling a measure of signal intensity to provide a measure of the amount of target nucleic acid in a sample in real time.
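  • As an illustration of how threshold cycle values can be turned into a normalized expression level, the following minimal sketch applies the widely used comparative Ct (2^-ΔΔCt) calculation. This calculation is not prescribed by the present text; the function name, the reference-gene normalization and the example Ct values are assumptions chosen for illustration, and the default assumes that the amount of product doubles in every cycle.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control,
                        efficiency=2.0):
    """Fold change of a target gene by the comparative Ct method.

    Each Ct is the cycle at which fluorescence crosses the threshold.
    `efficiency` is the per-cycle amplification factor (2.0 = perfect doubling).
    """
    # Normalize the target gene to a reference gene within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the sample of interest with the control sample.
    delta_delta = delta_sample - delta_control
    # A lower Ct means more starting template, hence the negative exponent.
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier
# in the cell type of interest than in the reference cell type.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)
```

  • With these hypothetical values the target is reported as eight-fold more abundant in the cell type of interest; real assays would typically also verify the amplification efficiency of each primer pair.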
  • microarrays may be used for gene expression profiling.
  • microarray means an ordered arrangement of hybridizable array elements, such as, for example, polynucleotide probes, on a substrate.
  • probe refers to any molecule that is capable of selectively binding to a specifically intended target biomolecule, for example, a nucleotide transcript or a protein encoded by or corresponding to an intrinsic gene. Probes can be synthesized by one of skill in the art, or derived from appropriate biological preparations. Probes may be specifically designed to be labeled. Examples of molecules that can be utilized as probes include, but are not limited to, RNA, DNA, proteins, antibodies, and organic molecules.
  • DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See, for example, U.S. Pat. Nos. 6,040,138, 5,800,992 and 6,020,135, 6,033,860, and 6,344,316. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
  • Serial analysis of gene expression is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript.
  • a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
  • many transcripts are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously.
  • the expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g. Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997).
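  • The tag-counting principle described above can be sketched as follows. This is a deliberately simplified illustration, not the published SAGE protocol: it assumes that every tag directly follows a fixed anchor sequence in the concatenated molecule, and the anchor, the tag length, the tag-to-gene table and the example sequence are all hypothetical.

```python
from collections import Counter


def count_tags(concatemer, anchor="CATG", tag_length=10):
    """Split a serial molecule at anchor sites and count the tags that follow."""
    tags = []
    pos = concatemer.find(anchor)
    while pos != -1:
        start = pos + len(anchor)
        tag = concatemer[start:start + tag_length]
        if len(tag) == tag_length:  # ignore a truncated tag at the very end
            tags.append(tag)
        # Resume the search after the tag so anchors inside a tag are skipped.
        pos = concatemer.find(anchor, start + tag_length)
    return Counter(tags)


# Hypothetical lookup table from tag sequence to gene name.
tag_to_gene = {"AAAAAAAAAA": "GENE_A", "CCCCCCCCCC": "GENE_B"}

concatemer = "CATG" + "AAAAAAAAAA" + "CATG" + "CCCCCCCCCC" + "CATG" + "AAAAAAAAAA"
counts = count_tags(concatemer)

# Sum tag counts per gene; unknown tags are collected under "unassigned".
abundance = Counter()
for tag, n in counts.items():
    abundance[tag_to_gene.get(tag, "unassigned")] += n
print(dict(abundance))
```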
  • Nucleic acid sequencing technologies are suitable methods for analysis of gene expression. The principle underlying these methods is that the number of times a cDNA sequence is detected in a sample is directly related to the relative expression of the mRNA corresponding to that sequence.
  • DGE Digital Gene Expression
  • Next generation sequencing typically allows much higher throughput than the traditional Sanger approach. See Schuster, Next-generation sequencing transforms today's biology, Nature Methods 5:16-18 (2008); Metzker, Sequencing technologies - the next generation, Nat Rev Genet. 11(1):31-46 (2010).
  • These platforms can allow sequencing of clonally expanded or non-amplified single molecules of nucleic acid fragments.
  • Certain platforms involve, for example, sequencing by ligation of dye-modified probes (including cyclic ligation and cleavage), pyrosequencing, and single-molecule sequencing. Nucleotide sequence species, amplification nucleic acid species and detectable products generated therefrom can be analyzed by such sequence analysis platforms.
  • Next-generation sequencing can be used in the methods of the invention, e.g. to determine the gene expression profile or the genomic sequence data of the cell type of interest.
  • RNA Sequencing uses massively parallel sequencing to allow, for example, transcriptome analyses of genomes at typically a far higher resolution than is available with Sanger sequencing- and microarray-based methods.
  • Complementary DNAs (cDNAs) generated from the RNA of interest are directly sequenced using next-generation sequencing technologies.
  • RNA-Seq has been used successfully to precisely quantify transcript levels, confirm or revise previously annotated 5′ and 3′ ends of genes, and map exon/intron boundaries (Eminaga et al., 2013. Quantification of microRNA Expression with Next-Generation Sequencing. Current Protocols in Molecular Biology. 103:4.17.1-4.17.14).
  • sequencing thus refers to any technique known in the art that allows the identification of consecutive nucleotides of at least part of a nucleic acid.
  • Exemplary sequencing techniques include Illumina™ sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, massively parallel signature sequencing (MPSS), RNA-seq (also known as whole transcriptome sequencing), sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, emulsion PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, Illumina
  • proteome is defined herein as the totality of the proteins present in a cell type at a certain point of time.
  • Proteomics includes, among other things, study of the global changes of protein expression in a sample (also referred to as “expression proteomics”). Proteomics typically includes the following steps: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g. by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics.
  • genome generally refers to the complete set of genetic information in the form of one or more nucleic acid sequences, including text or in silico representations thereof.
  • a genome may include either DNA or RNA, depending upon its organism of origin. Most organisms have DNA genomes while some viruses have RNA genomes.
  • the term “genome” need not comprise the complete set of genetic information. The term may also refer to at least a majority portion of a genome such as at least 50% to 100% of an entire genome or any whole or fractional percentage therebetween.
  • genomic sequence data refers to data, including text or in silico representations thereof, on a genome, wherein the genomic sequence data may also relate to a portion of a genome, preferably the majority of the genome, such as at least 50% to 100% of an entire genome or any whole or fractional percentage therebetween.
  • Providing genomic sequence data may include the actual sequencing of the genome of a cell type of interest or the reliance upon publicly available databases on genome sequence data, such as the annotated Genome Sequence DataBase (GSDB), operated by the National Center for Genome Resources (NCGR).
  • GSDB annotated Genome Sequence DataBase
  • NCGR National Center for Genome Resources
  • genomic sequence data for a large number of species are publicly available through the UCSC Genome Browser created by the UCSC Genome Browser Group of UC Santa Cruz (CA, USA).
  • genomic region generally refers to a region of a genome. Typically, a genomic region refers to a continuous nucleic acid sequence stretch of the genome of the cell type of interest comprising at least one gene.
  • genomic sub-region refers to a portion of a genomic region that is identified as described herein to comprise one or more binding sites for one or more of the transcription factors that have been identified as signature genes based upon the gene expression profile(s).
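  • The idea of a genomic sub-region defined by transcription factor binding sites can be illustrated with the following minimal sketch. It assumes, purely for illustration, that binding sites are found by exact string matching against short consensus motifs and that sub-regions are fixed, non-overlapping windows; the motif table, the window size and the example sequence are hypothetical, and practical implementations would more likely rely on position weight matrices or experimental binding data.

```python
def find_sites(sequence, motifs):
    """Return sorted (position, tf_name) hits for exact motif matches."""
    hits = []
    for tf_name, motif in motifs.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, tf_name))
            start = sequence.find(motif, start + 1)
    return sorted(hits)


def sub_regions(sequence, motifs, window=20, min_sites=2):
    """Return (start, end, tf_names) for windows with at least min_sites hits.

    A hit is assigned to the window that contains its first base.
    """
    hits = find_sites(sequence, motifs)
    regions = []
    for start in range(0, len(sequence), window):
        end = min(start + window, len(sequence))
        names = [tf for pos, tf in hits if start <= pos < end]
        if len(names) >= min_sites:
            regions.append((start, end, sorted(set(names))))
    return regions


# Hypothetical motifs for two signature transcription factors.
motifs = {"TF_A": "TGACTCA", "TF_B": "GGGCGG"}
# Hypothetical 50 bp genomic region with both sites in the first 20 bp.
region = "AAAAA" + "TGACTCA" + "AA" + "GGGCGG" + "A" * 30
regions = sub_regions(region, motifs)
print(regions)
```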
  • nucleic acid refers to any nucleic acid molecule, including, without limitation, DNA, RNA and hybrids or modified variants and polymers (“polynucleotides”) thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid molecule/polynucleotide also implicitly encompasses conservatively modified variants thereof (e.g. degenerate codon substitutions) and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994)). Nucleotides are indicated by their bases by the following standard abbreviations: adenine (A), cytosine (C), thymine (T), and guanine (G).
  • A adenine
  • C cytosine
  • T thymine
  • G guanine
  • exogenous nucleic acid or “exogenous genetic element” relates to any nucleic acid introduced into the cell, which is not a component of the cell's “original” or “natural” genome. Exogenous nucleic acids may be integrated or non-integrated, or relate to stably transfected nucleic acids.
  • “Functional variants” or “functional analogs” preferably refers to a nucleic acid or protein having a nucleotide sequence or amino acid sequence, respectively, that is “identical,” “essentially identical,” “substantially identical,” “homologous” or “similar” to a reference sequence which can, by way of non-limiting example, be the sequence of an isolated nucleic acid or protein, or a consensus sequence derived by comparison of two or more related nucleic acids or proteins, or a group of isoforms of a given nucleic acid or protein.
  • Non-limiting examples of types of isoforms include isoforms of differing molecular weight that result from, e.g., alternate RNA splicing or proteolytic cleavage; and isoforms having different post-translational modifications, such as glycosylation; and the like.
  • variants refers to a nucleic acid or polypeptide differing from a reference nucleic acid or polypeptide, but retaining essential properties thereof. Generally, variants are overall closely similar, and, in many regions, identical to the reference nucleic acid or polypeptide. Thus “variant” forms of a transcription factor are overall closely similar, and capable of binding DNA and activating gene transcription.
  • the term “sense strand” refers to the DNA strand of a gene that is translated or translatable into protein.
  • the “sense strand” is located at the 5′ end downstream of the promoter, with the first codon of the protein being proximal to the promoter and the last codon distal from the promoter. The opposite strand is referred to as the “anti-sense” strand.
  • operably linked means that the regulatory elements in the nucleic acid construct are configured to enable functional coupling between the regulatory element and gene, leading to expression of the gene, i.e. the regulatory element is preferably in-frame with a nucleic acid coding for a protein or peptide.
  • signature genes relates to genes, selected from the genes of the cell type of interest, that are characteristic for the expression profile of said cell type of interest. Differentially regulated signature genes may be e.g. selected by identifying genes that are up- or down-regulated compared to the expression levels in the reference cell type, or by ranking the gene expression level for the cell type of interest and selecting signature genes based upon a threshold level or predetermined number of genes (e.g. most highly or most lowly expressed).
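  • The two selection strategies named above (comparison with a reference cell type, and ranking within the cell type of interest) can be sketched as follows. The log2 fold-change cut-off, the pseudocount, the function names and the expression values are assumptions made for illustration only and are not taken from the present text.

```python
import math


def signature_by_fold_change(interest, reference, min_log2_fc=1.0, pseudocount=1.0):
    """Select genes up- or down-regulated versus the reference cell type.

    Returns {gene: log2 fold change} for genes whose absolute log2 fold
    change is at least min_log2_fc. The pseudocount avoids division by zero.
    """
    selected = {}
    for gene, level in interest.items():
        ref_level = reference.get(gene, 0.0)
        log2_fc = math.log2((level + pseudocount) / (ref_level + pseudocount))
        if abs(log2_fc) >= min_log2_fc:
            selected[gene] = log2_fc
    return selected


def signature_by_rank(interest, top_n=2):
    """Select a predetermined number of the most highly expressed genes."""
    ranked = sorted(interest, key=interest.get, reverse=True)
    return ranked[:top_n]


# Hypothetical normalized expression levels.
interest = {"GENE_A": 99.0, "GENE_B": 9.0, "GENE_C": 0.0, "GENE_D": 31.0}
reference = {"GENE_A": 24.0, "GENE_B": 9.0, "GENE_C": 7.0, "GENE_D": 31.0}

differential = signature_by_fold_change(interest, reference)
top_ranked = signature_by_rank(interest, top_n=2)
print(differential, top_ranked)
```

  • In this hypothetical example GENE_A is selected as up-regulated and GENE_C as down-regulated, while the ranking approach selects GENE_A and GENE_D regardless of the reference.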
  • transcription factor refers to a protein that binds to specific DNA sequences and thereby controls the transfer (or transcription) of genetic information from DNA to mRNA.
  • the function of transcription factors is primarily to regulate the expression of genes. Transcription factors may function alone or in combination with further proteins in a complex, by promoting (as an activator), or blocking (as a repressor) the recruitment of RNA polymerase to specific genes. Transcription factors contain at least one DNA-binding domain, which attaches to a specific sequence of DNA (“binding sites”) typically adjacent to the genes that they regulate.
  • the term “microscopic device” relates to a device that comprises means for microscopic analysis of cells.
  • Microscopic analysis can be carried out, without limitation, by a light microscope, binocular stereoscopic microscope, bright field microscope, polarizing microscope, phase contrast microscope, differential interference contrast microscope, automatic microscope, fluorescence microscope, confocal microscope, total internal reflection fluorescence microscope, laser microscope (laser scanning confocal microscope), multiphoton excitation microscope, structured illumination microscope, transmission electron microscope (TEM), scanning electron microscope (SEM), atomic force microscope (AFM), scanning near-field optical microscope (SNOM), X-ray microscope, ultrasonic microscope.
  • Microscopic devices can additionally comprise a camera and/or detector for recording pictures of cells, for example, and a computer system for controlling the microscopic device.
  • the presence and/or intensity of a signal produced by reporter gene can be determined by means of a microscopic device, but also by other devices that can detect signals generated by reporter genes without limitation, such as flow cytometers, luminometers, spectrometers, photometers, or colorimeters.
  • topological associating domains preferably refers to self-interacting genomic regions, meaning that DNA sequences within a topological associating domain physically interact with each other more frequently than with sequences outside the topological associating domain, thereby forming three-dimensional chromosome structures.
  • Topological associating domains can range in size from thousands to millions of DNA bases.
  • a number of proteins are known to be associated with topological associating domain formation, including the protein CTCF and the protein complex cohesin.
  • the topological associating domain refers to a genomic sequence between two CTCF or cohesin binding sites.
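  • Under the definition above, the domain containing a gene of interest can be located from a list of boundary positions. The following minimal sketch assumes, as a simplification not stated in the text, that the relevant domain is delimited by the two adjacent binding sites flanking the gene on a single chromosome; the coordinates and the function name are hypothetical.

```python
from bisect import bisect_right


def enclosing_domain(boundary_positions, gene_start, gene_end):
    """Return the (left, right) binding sites flanking a gene, or None.

    boundary_positions: CTCF or cohesin binding site coordinates on one
    chromosome. None is returned when the gene is not enclosed by two
    sites or when it extends across a boundary.
    """
    sites = sorted(boundary_positions)
    i = bisect_right(sites, gene_start)  # number of sites at or before gene_start
    if i == 0 or i == len(sites):
        return None
    left, right = sites[i - 1], sites[i]
    if gene_end > right:
        return None  # the gene spans a boundary
    return left, right


# Hypothetical binding site coordinates and gene position.
ctcf_sites = [1000, 50000, 120000, 300000]
domain = enclosing_domain(ctcf_sites, gene_start=60000, gene_end=75000)
print(domain)
```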
  • the term “generating a cell-type specific expression cassette” relates in some embodiments to the design of a cell-type specific expression cassette without physically producing the corresponding nucleic acid molecule, for example the method may be a computer-implemented method or may comprise one or more computer-implemented steps in the method.
  • the term “generating a cell-type specific expression cassette” relates in some embodiments to the design and physical production of a nucleic acid molecule, preferably by de novo synthesis of the nucleic acid molecule.
  • Artificial gene synthesis (or de novo synthesis) is a preferred method of generating a cassette of the present invention and relates to methods used in synthetic biology to create any given nucleic acid sequence.
  • artificial synthesis differs from molecular cloning and polymerase chain reaction (PCR) in that the user does not have to begin with pre-existing DNA sequences. Therefore, it is possible to make a completely synthetic double-stranded DNA molecule with no major limits on either nucleotide sequence or size.
  • Gene synthesis approaches may be based on a combination of organic chemistry and molecular biological techniques and entire genes may be synthesized “de novo”, without the need for precursor template DNA.
  • the method has been used to generate functional bacterial chromosomes containing approximately one million base pairs.
  • Gene synthesis has become an important tool in many fields of recombinant DNA technology including heterologous gene expression, vaccine development, gene therapy, vector construction and various forms of molecular engineering.
  • the synthesis of nucleic acid sequences is often more economical than classical cloning and mutagenesis procedures. Multiple techniques are well-established and known to a skilled person.
  • gene therapy preferably refers to the transfer of DNA into a subject in order to treat a disease.
  • the person skilled in the art knows strategies to perform gene therapy using gene therapy vectors.
  • Such gene therapy vectors are optimized to deliver foreign DNA into the host cells of the subject.
  • the gene therapy vectors may be a viral vector. Viruses have naturally developed strategies to incorporate DNA into the genome of host cells and may therefore be advantageously used.
  • Preferred viral gene therapy vectors may include but are not limited to retroviral vectors such as Moloney murine leukemia virus (MMLV), adenoviral vectors, lentiviral vectors, adeno-associated viral (AAV) vectors, pox virus vectors, herpes simplex virus vectors or human immunodeficiency virus vectors (HIV-1).
  • non-viral vectors may be preferably used for the gene therapy such as plasmid DNA expression vectors driven by eukaryotic promoters or plasmid DNA sequence containing homology to the host genome in order to directly integrate the expression cassette at preferred locations in the genome of interest. DNA transfer may also be carried out using liposomes or
  • preferred gene therapy vectors may also refer to methods to transfer the DNA, such as electroporation or direct injection of nucleic acids into the subject.
  • the person skilled in the art knows how to choose preferred gene therapy vectors according to the needs of the application, as well as how to implement nucleic acid constructs such as the expression cassettes described herein in the gene therapy vector (P. Seth et al. 2005; N. Koostra et al. 2009; W. Walther et al. 2000; Waehler et al. 2007).
  • the method, system, or other computer implemented aspects of the invention may in some embodiments comprise and/or employ one or more conventional computing devices having a processor, an input device such as a keyboard or mouse, memory such as a hard drive and volatile or nonvolatile memory, and computer code (software) for the functioning of the invention.
  • the system may comprise one or more conventional computing devices that are pre-loaded with the required computer code or software, or it may comprise custom-designed software and/or hardware.
  • the system may comprise multiple computing devices which perform the steps of the invention.
  • a plurality of clients such as desktop, laptop, or tablet computers can be connected to a server such that, for example, multiple users can provide data or perform calculations at different steps of the method.
  • the computer system may also be networked with other computers or necessary databases, such as genomic databases, over a local area network (LAN) connection or via an Internet connection.
  • the system may also comprise a backup system which retains a copy of the data obtained by the invention.
  • the data connections necessary between the various steps of the method may be conducted or configured via any suitable means for data transmission, such as over a local area network (LAN) connection or via an Internet connection, either wired or wireless.
  • a client or user computer can have its own processor, input means such as a keyboard, mouse, or touchscreen, and memory, or it may be a terminal which does not have its own independent processing capabilities, but relies on the computational resources of another computer, such as a server, to which it is connected or networked.
  • a client system can contain the necessary computer code to assume control of the system if such a need arises.
  • the client system is a tablet or laptop.
  • the components of the computer system for carrying out the method may be conventional, although the system may be custom-configured for each particular implementation.
  • the computer implemented method steps or system may run on any particular architecture, for example, personal/microcomputer, minicomputer, or mainframe systems. Exemplary operating systems include Apple Mac OS X and iOS, Microsoft Windows, and UNIX/Linux; SPARC, POWER and Itanium-based systems; and z/Architecture.
  • the computer code to perform the invention may be written in any programming language or model-based development environment, such as but not limited to C/C++, C#, Objective-C, Java, Basic/VisualBasic, MATLAB, R, Simulink, StateFlow, LabVIEW, or assembler.
  • the computer code may comprise subroutines which are written in a proprietary computer language which is specific to the manufacturer of a circuit board, controller, or other computer hardware component used in conjunction with the invention.
  • the information processed and/or produced by the method can employ any kind of file format which is used in the industry.
  • the digital representations can be stored in a proprietary format, DXF format, XML format, or other format for use by the invention.
  • Any suitable computer readable medium may be utilized.
  • the computer-usable or computer-readable medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
  • the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, cloud storage or a magnetic storage device.
  • SEQ ID NO 1: MGT#1, subtype-specific sLCR for mesenchymal glioblastoma (MES-GBM) generated using the method described herein (see Example 1). MGT#1 is validated for use in several other solid tumors as reported for Epithelial/Mesenchymal fate transition, including lung and breast cancer.
  • SEQ ID NO 1 sequence: ATATTTATTTTTAGGACCAGAAAGTTAAAGTGAATTGGATTTGATC CATTTTCTGAAAGGCTGGCAAGAATTCTTGACATTGCACAGGAATT TCCATGTCAGCATGTTCTCACATGTATGATCTAATTTAGAGATTAT TTTGGGGGGCGGGGGTTGAGGAAATGGCATGACTCAGAGTTTAAAA GCCCCAAATCTTAGCTGTGCCTGTGTAGCTTTACCACATAACCCAT TGATAACTTAGTTGTGCAACCATCACCACCATCTGTTTTCAGAACT CTTTTCATTTTGCGAAACTGAAACCCGTTAAGCACTGATTTCCCAC TCTCCCTCCTCCCAGCCCATAGCAAACCACCATCCCACCAGCACTT TCATTTCGCAAATGGCAAAACTGAAGCCGATATTGTGGTTGTGACT TATCCCAAAGTAATATACACATAAACCTCTATGGATGAGGAAAAAG ACAGAGGGAAACTAAAAATTCAAAAGAACAAATTTGACTCACAGAT TTGCTGACTCATAGTTGTGACACTTCCTGGCTCAGGAAGTTGAATT
  • Functionally analogous sequences refer preferably to the ability of the synthetic regulatory regions to promote transcription of an operably coupled reporter or effector gene in a cell type of interest.
  • Functionally analogous sequences refer preferably to the ability of the synthetic regulatory regions to promote transcription of viral essential genes and/or effector genes such as co-stimulatory molecules (e.g. cytokines/chemokines) in the diseased target cell of interest and not in non-diseased cells.
  • FIG. 1 Generation and validation of Synthetic Locus Control Regions (sLCRs)
  • FIG. 2 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCRs.
  • FIG. 3 GBM subtyping and Reprogramming using sLCRs.
  • FIG. 4 Tissue-independent Epithelial-Mesenchymal homeostasis revealed by sLCRs.
  • FIG. 5 Heterogeneous Mesenchymal trans-differentiation revealed by sLCRs in vivo.
  • FIG. 6 Selection of MES-GBM subtype-specific genes.
  • FIG. 7 Automated Synthetic Locus Control Regions (sLCR) generation.
  • FIG. 8 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCR.
  • FIG. 9 Transcription Factors binding to MGT #1 cis-regulatory DNA.
  • FIG. 10 Homeostatic maintenance of MGT #1 expression in breast cancer cells.
  • FIG. 11 MGT #1 reflects single and combinatorial contribution for TGFB and GSK126 to EMT.
  • FIG. 12 MGT #1 enables screening for cell fate transitions driven by external signaling and/or chemical perturbations.
  • FIG. 13 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCR—expanded.
  • FIG. 14 Heterogeneous Mesenchymal trans-differentiation revealed by sLCRs in vivo—expanded.
  • FIG. 15 sLCRs facilitate the discovery of therapeutic implications for non-cell autonomous crosstalk between tumor and immune cells.
  • FIG. 16 Extended characterization of Synthetic Locus Control Regions (sLCR).
  • FIG. 17 Further examples of adaptive responses revealed by sLCR
  • FIG. 18 The MES-GBM state induction measured by sLCRs in GICs is specific and reversible.
  • FIG. 19 MES-sLCRs to dissect the role for ionizing radiation and NFkB signaling in MES-GBM.
  • FIG. 20 Further evidence in support of sLCRs use in Phenotypic CRISPR/Cas9 forward genetic screens
  • FIG. 21 Further evidence in support of hMG cells to induce MGT #1 expression in hGIC and differential sensitivity to therapeutics and hMG cells
  • FIG. 22 Further evidence in support of sLCRs use in Phenotypic CRISPRi screens.
  • FIG. 1 Generation and validation of Synthetic Locus Control Regions (sLCR).
  • sLCR Synthetic Locus Control Regions
  • DRGs differentially regulated genes
  • GICs Glioma-initiating-cells
  • FIG. 2 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCR.
  • TNFa is the leading signal contributing to the Mesenchymal GBM phenotype.
  • TNFa was identified as the top activator of two independently designed MES-GBM reporters (MGT #1 and MGT #2) in MES-hGICs by adaptive response screening using the indicated cytokines for up to 48 hours. Data are normalized to control.
  • MES-hGICs express higher basal levels of MGT #1 compared to PN-hGICs.
  • Cooperation between IL-6 and microglia cells in MGT #1 induction. Live cell imaging of MGT #1 expression in MES-hGICs upon the indicated treatments.
  • d-e Differential MGT #1 activation informs on differential adaptive responses to TNFa. Expression changes for genes regulated by TNFa in either MES-hGICs or PN-hGICs measured by RNA-seq and hierarchical sample clustering.
  • TNFa Tumor Necrosis Factor alpha
  • MES Mesenchymal
  • PN Proneural
  • CL Classical
  • MGT #1-2 MES genetic tracing #1-2.
  • FBS fetal bovine serum
  • CBD Cannabidiol.
  • IRR Ionizing Radiation
  • FIG. 3 GBM subtyping and Reprogramming using sLCR.
  • T98 cells were transduced with either a Proneural sLCR or Mesenchymal sLCR driving mCherry as reporter and transfected with the indicated master regulators of PN subtype identity 50 or empty transfected.
  • Representative micrograph of T98 cells (left) and FACS plot (right) showing higher intrinsic and TF-induced expression of the PNGT #2 but not the MGT #2 reporter in T98 cells; scale bar, 100 μm.
  • FIG. 4 Tissue-independent Epithelial-Mesenchymal homeostasis revealed by sLCR.
  • MGT #1 reveals adaptive responses to chemicals/morphogens in lung cancer cells. Left, representative MGT #1 expression in A549 cells seeded in 96-well and propagated for the indicated time.
  • CRISPRi and MGT #1 reveal mechanistic regulators of lung cancer EMT. Schematic diagram depicting the screen. Dox, doxycycline. d) Immunoblotting of a representative intermediate time-point of the CRISPRi screening. MGT #1 fluorescence micrograph was taken before lysis. e) FACS sorting gating strategy for purification of MGT #1 high and low populations.
  • FIG. 5 Heterogeneous Mesenchymal transdifferentiation revealed by sLCRs in vivo.
  • a) Representative coronal forebrain images of MES-hGICs; MGT #1-mVenus dim xenografts in NSG mice (n 10) at humane end point. Left, HE staining; right, progressive insets showing magnification of GFP, Tubulin and DAPI counterstained tissue. Note the invasive glioma front being homogeneously MGT #1-mVenus high.
  • FIG. 6 Selection of MES-GBM subtype-specific genes.
  • GSCs glioma-stem-like cells
  • FIG. 7 Automated Synthetic Locus Control Regions (sLCR) generation.
  • sLCR Automated Synthetic Locus Control Regions
  • sLCR generation involves assembly of n CREs from the closest to a natural TSS to the farthest distal-CREs, up to >50% of the TFBS diversity (MES-GBM in the example).
  • A denotes sLCRs generated by an automated algorithm.
  • FIG. 8 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCR. Representative live cell imaging of MGT #1 expression in GICs from FIG. 2 a.
  • FIG. 9 Transcription Factors binding to MGT #1 cis-regulatory DNA. a) Above, schematic representation of MGT #1 sLCRs. Below, a list of TFs for which ChIP-seq signal can be observed in the ENCODE public database in any of the cell lines used.
  • FIG. 10 Homeostatic maintenance of MGT #1 expression in breast cancer cells.
  • MGT #1 statically reflects a cell state, or MGT #1 dynamically reflects cell homeostasis and in vitro homeostatic regulation is reestablished after perturbation (i.e. FACS purification of an MGT #1 dim population).
  • the green dashed circles highlight results in FIG. 4 a in which MCF7 and MDA-231 are shown to have intrinsic low or high MGT #1 expression, respectively, owing to their cell identity.
  • MCF7 and MDA231 were FACS sorted based on the best comparable MGT #1 intensity and propagated in vitro before FACS analysis shown in 4a.
  • FIG. 11 MGT #1 reflects single and combinatorial contributions of TGFB and GSK126 to EMT.
  • FIG. 12 MGT #1 enables screening for cell fate transition driven by external signaling and/or chemical perturbations. Shown is the Principal Component Analysis (PCA) of the data obtained from the screen. The two components PC1 and PC2 explain the largest variation in the experiment.
  • PCA Principal Component Analysis
  • Hierarchical clustering was carried out on normalized fluorescence data from A549-MGT #1 cells. Cells were propagated and bottom-reading fluorescence was scanned using a SPARK 20M TECAN plate reader. Clustering used Pearson correlation.
  • FIG. 13 Intrinsic and Adaptive responses in MES- and PN-GICs revealed by sLCR.
  • bubble size shows the magnitude of the change for each treatment over control (log2 fold change), with bubble color indicating the sign of the change (red or orange for enrichment, light blue for depletion).
  • FACS validation of the phenotypic screening. Surface expression of CD133 and PNGT #2 were endogenous markers of cell identity. Note higher MES-hGICs MGT #1 expression compared to PN-hGICs.
  • Padj is indicated for representative comparisons and denotes results from overall 2-way ANOVA and Dunnett's multiple comparisons.
  • IR Ionizing Radiation.
  • TMZ Temozolomide.
  • FIG. 14 Heterogeneous Mesenchymal trans-differentiation revealed by sLCRs in vivo.
  • Each dot represents a given sample or the merge of all technical replicates, when available.
  • the analysis includes top principal components for the 250,000 most variable peaks across all samples.
  • Grey dots are all TCGA cancer types but GBMs/LGGs, which are colored along with the glioma stem cells from (Park et al., 2017, Cell Stem Cell 21, 209-224 Aug. 3, 2017) and GICs from this study.
  • the circle denotes the dimension occupied by the primary GBM/LGG and GICs/GSCs.
  • FIG. 15 sLCRs facilitate the discovery of therapeutic implications for non-cell autonomous crosstalk between tumor and immune cells.
  • a) Bright-field view and IF of representative MES-hGICs with the indicated reporters propagated as spheroid or organoids with immortalized human Microglia (hMG; upper and lower panels, respectively). Scale bar, 50 µm.
  • FIG. 16 Extended characterization of Synthetic Locus Control Regions (sLCR). Single-molecule RNA FISH quantification of MGT #1- and PGK-driven gene expression. Arrowheads/yellow denote cytoplasmic colocalization.
  • FIG. 17 Further examples of adaptive responses revealed by sLCR. Representative MGT #1 activation upon the indicated stimuli.
  • FIG. 18 The MES-GBM state induction measured by sLCRs in GICs is specific and reversible. a-b) Bar plot showing the individual response to the indicated factors/sLCRs after forty-eight hours of induction. c-d) Line plot showing the longitudinal expression of the indicated factors/sLCRs.
  • FIG. 19 MES-sLCRs to dissect the role for ionizing radiation and NFkB signaling in MES-GBM.
  • FIG. 20 Further evidence in support of sLCRs use in Phenotypic CRISPR/Cas9 forward genetic screens.
  • b) Box plot showing data quality assessment by comparing the distribution of highly-informative essential and all non-essential or non-targeting gRNAs in the unsorted screen conditions (P value Student's t-test).
  • Naive MES-hGICs carrying the Brunello library were FACS sorted in MGT #1 high and MGT #1 low and gRNAs normalized to the largest dataset and Log 2 converted (see methods). The indicated gRNAs are depleted compared to MGT #1 high fraction.
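  • The normalization step mentioned above (gRNA counts scaled to the largest dataset, then log2 converted) can be sketched as follows. This is a minimal illustration only: the function name `normalize_to_largest`, the pseudocount of 1 and the toy counts are assumptions for the example and are not taken from the methods.

```python
import math

def normalize_to_largest(samples, pseudocount=1.0):
    """Scale every sample to the read depth of the largest sample, then log2 transform.

    samples: {sample_name: {grna_id: raw_count}}
    The pseudocount (an assumption of this sketch) avoids log2(0).
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    if min(totals.values()) == 0:
        raise ValueError("every sample needs at least one read")
    target = max(totals.values())  # depth of the largest dataset
    return {
        name: {
            grna: math.log2(count * target / totals[name] + pseudocount)
            for grna, count in counts.items()
        }
        for name, counts in samples.items()
    }

# Toy counts for two sorted fractions (hypothetical values).
samples = {
    "MGT1_high": {"g1": 100, "g2": 100},
    "MGT1_low": {"g1": 10, "g2": 90},
}
norm = normalize_to_largest(samples)
# Negative difference: g1 is depleted in the low fraction relative to the high fraction.
log2_diff_g1 = norm["MGT1_low"]["g1"] - norm["MGT1_high"]["g1"]
```

  • Scaling to the largest dataset, rather than to a fixed constant, keeps the deepest sample's counts unchanged and only rescales the shallower ones.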
  • IPA Ingenuity Pathway Analysis (IPA) Top 25 Toxicity categories of all hits from the CRISPR/Cas9 KO screen (FC ⁇ 1.5; padj ⁇ 0.05). Only “positive regulators” are beyond the statistical cut-off. In bold, categories associated with retinoic receptors signaling.
  • IPA Upstream Regulator Analysis of all hits from the CRISPR/Cas9 KO screen (FC ⁇ 1.5; padj ⁇ 0.05). Positive and negative regulators of MES-GBM phenotype are colored in aqua and red, respectively. Grey denotes significant categories without directional enrichment.
  • FIG. 21 Further evidence in support of hMG cells to induce MGT #1 expression in hGIC and differential sensitivity to therapeutics and hMG cells a) Extended schematic depiction of the co-culture experiment in FIG. 4 ; For detailed media composition see Methods. b) FACS profiles of MES- or PN-hGICs-MGT #1 high alone or co-cultured with human microglia (hMG) or human CD34+-derived Myeloid-derived suppressor cells (MDSCs). c) Principal component analysis of the indicated RNA-seq profiles. Distances were calculated based on the average expression level of selected human MG markers obtained from Gosselin et al 2017.
  • FIG. 22 Further evidence in support of sLCRs use in Phenotypic CRISPRi screens.
  • a) Cumulative distribution plot for all the samples in the kinome screen (n = 42), including technical replicates and biological conditions: plasmid library, A549-H1944 input, and A549-H1944+GSK126 high, med, low as controls; and A549-H1944+GSK126+dox high, med, low and A549-H1944+dox high, med, low as the screens for GSK126-driven EMT and homeostatic EMT, respectively.
  • Regions corresponding to DRGs were retrieved from the UCSC genome browser (hg19; Refseq table downloaded on Oct. 5, 2012) and scanned with windows of 150 bp and 50 bp steps (hereafter referred to as cis-units).
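  • As an illustration of the tiling described above, the following minimal sketch enumerates 150 bp windows at 50 bp steps over a region. The function name `cis_unit_windows`, the half-open coordinate convention and the choice to drop a trailing partial window are assumptions made for this example, not details of the original script.

```python
def cis_unit_windows(start, end, width=150, step=50):
    """Yield (window_start, window_end) half-open intervals tiling [start, end).

    Windows that would extend past `end` are dropped (an assumption of this sketch).
    """
    pos = start
    while pos + width <= end:
        yield (pos, pos + width)
        pos += step

# Toy region of 400 bp (hypothetical coordinates).
windows = list(cis_unit_windows(1000, 1400))
```

  • With a 50 bp step and a 150 bp width, each base in the interior of the region is covered by three overlapping cis-units.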
  • the scanned area surrounding each signature gene was delimited by two distal CTCF sites, positioned >10 kb away from the TSS or TES.
  • Subtype-specific PWMs were mapped to the genomic regions using FIMO. PWMs best significantly over-represented regions (adj. p.value ⁇ 0.01; multiple backgrounds). For each window, whenever multiple matches for the same PWM were identified, the p-value of the best match was considered as a proxy for the affinity of that TF over that region.
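  • The rule that the best match per PWM serves as the affinity proxy for a window could be sketched as below. The input layout (window id, PWM name, p-value triples), the function name `best_pvalue_per_pwm` and the example TF names are hypothetical; parsing of the actual FIMO output is not shown.

```python
from collections import defaultdict

def best_pvalue_per_pwm(hits):
    """Keep, for each window and PWM, only the smallest (best) match p-value.

    hits: iterable of (window_id, pwm_name, p_value) tuples.
    Returns {window_id: {pwm_name: best_p_value}}.
    """
    best = defaultdict(dict)
    for window_id, pwm, p_value in hits:
        previous = best[window_id].get(pwm)
        if previous is None or p_value < previous:
            best[window_id][pwm] = p_value
    return {window_id: dict(pwms) for window_id, pwms in best.items()}

# Toy matches (hypothetical values): two hits for the same PWM in window w1.
hits = [
    ("w1", "STAT3", 1e-4),
    ("w1", "STAT3", 1e-6),
    ("w1", "CEBPB", 1e-5),
    ("w2", "STAT3", 1e-3),
]
affinity = best_pvalue_per_pwm(hits)
```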
  • TFBS pairwise correlation heatmaps in FIG. 1 a used the top 500 regions in terms of the score defined above.
  • Genomic coordinates vs TFBS correlation heatmaps, including the representative one in FIG. 1 a, were generated with the top 100 scoring regions.
  • current implementations of the method involve focusing on a validated Glioma-intrinsic signature 20 .
  • the first sLCRs were designed with manual selection of the top scoring cis-units based on PWM score and diversity. Also, the selection of the TSS-containing region was done manually.
  • the automated sLCR generation is written in Python (URL GitHub/GitLab). The script takes as input a list of TFs, PWM, and the phenotype gene signature.
  • the selection of the best cis-units for any given phenotype is made by an algorithm based on defined selection rules.
  • the algorithm first generates the ranking and the selection of the best cis-unit by applying the following formula: [Sum of scores −log10(p-value) * diversity (number of different TFBS)]. Iteratively, it removes the TFBS included in the selected cis-units.
  • the algorithm ranks cis-units also based on 5′ CAGE data. The ranked list is the output of the algorithm.
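  • One possible reading of the ranking rule above is a greedy loop: score each cis-unit as the sum of −log10(p-value) over its TFBS multiplied by the number of distinct TFBS, pick the best unit, remove its TFBS from the remaining units, and repeat. The sketch below follows that reading only. The names `score` and `select_cis_units`, the data layout and the toy values are hypothetical, and the 5′ CAGE-based ranking is not modeled.

```python
import math

def score(unit_hits):
    """Sum of -log10(p) over the unit's TFBS, multiplied by the number of distinct TFBS."""
    if not unit_hits:
        return 0.0
    return sum(-math.log10(p) for p in unit_hits.values()) * len(unit_hits)

def select_cis_units(units, n_units):
    """Greedily pick cis-units; after each pick, drop the covered TFBS elsewhere.

    units: {unit_name: {tf_name: best_p_value}}
    """
    remaining = {name: dict(hits) for name, hits in units.items()}
    selected = []
    while remaining and len(selected) < n_units:
        best = max(remaining, key=lambda name: score(remaining[name]))
        if score(remaining[best]) == 0:
            break  # nothing informative left to add
        covered = set(remaining.pop(best))
        selected.append(best)
        for hits in remaining.values():
            for tf in covered:
                hits.pop(tf, None)
    return selected

# Toy cis-units (hypothetical TF names and p-values).
units = {
    "unit_A": {"TF1": 1e-6, "TF2": 1e-6},
    "unit_B": {"TF1": 1e-7, "TF3": 1e-4},
    "unit_C": {"TF3": 1e-5, "TF4": 1e-5},
}
ranking = select_cis_units(units, n_units=2)
```

  • In this toy case unit_B scores second on the first pass, but once TF1 is removed (it is already contributed by unit_A) unit_C becomes the better complement, which is the diversity-driven behavior the iterative removal is meant to produce.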
  • the automated procedure returned overlapping results with the manual selection ( FIG. 7 ). Heatmaps in FIG. 1 a - b were generated using heatmap.2 function from gplots R package.
  • RNA-seq generation. RNA was extracted using Trizol (Invitrogen), precipitated using Isopropanol and purified using RNAClean XP beads. RNA-seq libraries generated for this study were constructed using the TruSeq Stranded Total RNA library prep kit. A beads-based approach was used for rRNA depletion (Ribo-Zero Gold; Illumina) and PCR amplification was performed as per the manufacturer's protocol. Final libraries were analyzed on Bioanalyzer or TapeStation and barcoded libraries were pooled and sequenced on an Illumina HiSeq2500 or HiSeq4000 platform with either single-read 51 bp or paired-end 100-base protocols.
  • Illumina adaptors were trimmed from the raw reads with Cutadapt, and trimmed reads were aligned to the human genome (Hg19 or Hg38) with TopHat. HTSeq was used to assess the number of uniquely assigned reads for each gene; expression values were then normalized to 10^7 total reads and log2 transformed to obtain counts per million (CPM).
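  • The depth normalization and log2 transformation described above might look like the following sketch. The scale of 10^7 follows the text as written (conventional CPM uses 10^6, so the scale is left as a parameter); the pseudocount of 1, the function name `normalize_counts` and the toy counts are assumptions of this example.

```python
import math

def normalize_counts(counts, scale=1e7, pseudocount=1.0):
    """Scale per-gene read counts to a fixed library size and log2 transform.

    counts: {gene: uniquely assigned read count}
    The pseudocount (an assumption) keeps zero-count genes at log2(1) = 0.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no assigned reads")
    return {
        gene: math.log2(count * scale / total + pseudocount)
        for gene, count in counts.items()
    }

# Toy library of 2,000 assigned reads (hypothetical values).
counts = {"geneA": 500, "geneB": 1500, "geneC": 0}
normalized = normalize_counts(counts)
```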
  • FIG. 15 e data were analyzed using SeqMonk and reads were normalized by the standard analysis pipeline, applying DNA contamination correction and generating raw counts to perform DESeq2 differential analysis. The same pipeline with log transformation was used for visualization. Significance was determined using standard SeqMonk settings: p<0.05 after Benjamini and Hochberg correction with the application of independent intensity filtering. Quantitation was done as above. NFKB-related genes in MG vs GICs and TNFa vs GICs were determined using IPA, MES GBM signatures were obtained from the respective publications and plots were generated using Venny.
  • FIG. 15 e interaction map was generated using the function Ingenuity upstream regulator from IPA for the comparison MGT #1 High TNFa vs MGT #1 High C20MG co-culture.
  • ATAC-seq. ATAC-seq on FACS sorted populations was performed on 20-50,000 cells from the in vivo experiment, and 50-100,000 from the in vitro experiment. Cells were centrifuged in PBS and the pellet was gently resuspended in 50 µl of master mix (25 µL 2× TD buffer, 2.5 µL transposase and 22.5 µL nuclease-free water; Nextera DNA Library Prep, Illumina), then incubated for 60 min at 37° C. with moderate shaking (500-800 rpm).
  • Transposition was stopped by 5 µl of Proteinase K and 50 µl of AL buffer (Qiagen), incubated at 56° C. for 10 min, and DNA was purified using 1.8× vol/vol AMPure XP beads and eluted in 18 µl.
  • the optimal number of PCR cycles for library amplification was determined for each sample using 2 µl of template followed by qPCR amplification using heat-activated KAPA HiFi polymerase and 1× EvaGreen. Final amplification was performed in 50 µl qPCR volume and 8-12 µl of template DNA. Primers were previously described (Buenrostro et al. 201).
  • ATAC-seq analysis. Adapters were removed from reads using trim-galore v0.6.2-nextera, and reads were then mapped using bowtie2 v2.3.5 (reference) with default parameters. ATAC-seq analysis was performed using SeqMonk, by using as probes TSS ±5 kb final annotation on ENSEMBL mRNAs (2019 assembly). Counts were normalized using the Read Count Quantitation function, and reads were corrected for total count only in probes per million reads, log transformed and further transformed by size factor normalization. Integration of sLCR ATAC-seq and TCGA ATAC-seq of FIG. 14 c was generated according to established protocols.
  • the sLCRs were synthesized initially at IDT and later at GenScript.
  • MGT #1-mVenus was cloned in the PacI-BsrGI fragment of the Mammalian Expression, Lentiviral FUGW (gift from David Baltimore; Addgene #14883). Additional modifications, such as swapping of mVenus to mCherry, or MGT #1 with all other sLCR, used either restriction enzyme digestion or Gibson cloning.
  • the sLCR vectors are a 3rd-generation lentiviral system and have been used together with pCMV-G (Addgene #8454), pRSV-REV (Addgene #12253) and pMDLG/pRRE (Addgene #12251). Sall2 (ccsbBroad304_11117) and Pou3f2 (ccsbBroad304_14774) were obtained from the CCSB-Broad Lentiviral Expression Library.
  • PN-hGICs were generated by our lab and will be described elsewhere. Briefly, PN-hGICs were generated by transforming human NPC by means of pLenti6.2/V5-IDH1-R132H, TP53R173H and TP53R273H (point mutations introduced into TP53 ccsbBroad304_07088 from the CCSB-Broad Lentiviral Expression Library), and pRS-Puro-sh-PTEN(#1).
  • MES-hGICs were generated by transforming human NPC with pRSPURO-sh-PTEN(#1), pLKO.1-sh-TP53 (TRCN0000003754) and pRS-shNF1. For these lines, thorough genetic, transcriptional and epigenetic characterization has been performed, as well as in vivo tumor formation and phenotypic mimicking ability. In vitro, GICs were propagated as described 76 with one modification.
  • PDGF-AA (20 ng/ml; R&D) is also supplemented to RHB-A (Takara). This medium composition will be referred to as RHB-A complete.
  • hGICs were cultured at 37° C. in a 5% CO2, 3% O2 and 95% humidity incubator.
  • T98G and U87MG were propagated in EMEM medium.
  • T98G were switched to RHB-A supplemented with EGF (20 ng ml-1), bFGF (20 ng ml-1), heparin (1 µg ml-1) and 5% penicillin and streptomycin and propagated first on standard tissue culture-treated plastic, then in ultra-low binding plastic (CORNING).
  • the MCF7, MDA-231, A549 and H1944 cell lines (kindly provided by the Rene Bernards lab, NKI) were cultured in RPMI medium. All cell lines were supplemented with 10% FBS, and 5% penicillin and streptomycin at 37° C. in a 5% CO2-95% air incubator.
  • Immortalized primary human Microglia C20 were cultured in RHB-A medium (Takara) supplemented with 1% FBS, 2.5 mM Glutamine (Thermofisher; 35050038), 1 µM Dexamethasone (Sigma; D1756) and 1% penicillin and streptomycin at 37° C. in a 5% CO2, 19% O2 and 95% humidity incubator.
  • Donor-derived CD34 cells were propagated in SFEM II (StemCell), SCF, FLT3-L, TPO, IL6 (all 100 ng/ml; easyexperiments.com), UM171 (Selleck, 0.035 µM), SR1 (Selleck, 0.75 µM), 19-deoxy-9-methylene-16,16-dimethyl PGE2 (Cayman, 10 µM).
  • Genome-wide CRISPR Knock-out in vitro screen. For the genome-wide pooled CRISPR Knock-out screen, we utilized the Brunello library consisting of 77,441 sgRNAs targeting 19,114 genes (average of 4 sgRNAs per gene) and 1000 non-targeting controls. To achieve a library representation over 100×, we transduced a total of 16×10^6 MES-hGICs-MGT #1 low cells at a MOI of ⁇ 0.5 and amplified the cells for 10 days prior to introducing the treatment. At day 10, the cells were either treated with TNFa (10 ng/ml) and FBS (0.5%); Temozolomide (50 µM) and Irradiation (20 Gy) or left untreated.
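  • The stated library representation can be checked with simple arithmetic. The sketch below takes the MOI value of 0.5 at face value and approximates the number of integration events as cells × MOI; both the approximation and the function name `library_coverage` are assumptions of this example rather than the calculation used by the authors.

```python
def library_coverage(cells, moi, n_sgrnas):
    """Expected integration events per sgRNA, approximated as cells * MOI / library size."""
    return cells * moi / n_sgrnas

# Values from the text: 16 x 10^6 cells, MOI taken as 0.5, 77,441 sgRNAs (Brunello).
coverage = library_coverage(cells=16e6, moi=0.5, n_sgrnas=77_441)
```

  • Under these assumptions the result is roughly 103 integration events per sgRNA, in line with the stated target of over 100×.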
  • gDNA extraction. Before the gDNA extraction, we performed FACS sorting of each condition, collecting the MES-hGICs-MGT #1 low, MES-hGICs-MGT #1 high and the unsorted populations.
  • the genomic DNA was extracted by lysing the cell pellets for 10′ at 56° C. in AL buffer (Qiagen), supplemented with Proteinase K (Invitrogen) and RNAse A (Thermo Scientific), subsequently purified with AMPure beads and eluted in EB buffer (Qiagen).
  • NGS libraries were constructed in a two-step PCR setup, where PCR1 was used to amplify the sgRNA scaffold and insert a stagger sequence to increase library complexity across the flow cell, while PCR2 introduced Illumina-compatible adaptors with unique P7 barcodes, allowing sample multiplexing.
  • 5 µg of each gDNA sample were divided over 5 parallel reactions, that were subsequently pooled together and purified using AMPure beads.
  • the optimal cycle numbers for PCR2 were determined for 1 µl of each PCR1 individually by conducting a qPCR amplification using KAPA HiFi HotStart Ready Mix (Roche) and 1× EvaGreen (Biotium).
  • PCR1 and PCR2 were performed using KAPA HiFi HotStart Ready Mix. Primers are available upon request. Quality control of the final libraries was performed using the Qubit dsDNA HS kit (Invitrogen) for quantification and TapeStation High Sensitivity D1000 ScreenTapes (Agilent) for determination of PCR fragment size.
  • the barcoded libraries were pooled together in equal molarities and sequenced on an Illumina NextSeq500 using the 75 cycles V2 chemistry (1×75 nt single read mode).
  • Transwell co-culture. Co-cultures of hGICs and immortalized primary human Microglia C20 were set up using hydrophilic PTFE 6-well cell culture inserts with a pore size of 0.4 µm (Merck). Human Microglia were seeded at 1.5×10^5 cells/well for 24 h on 6-well plates in respective medium. Medium was aspirated and cells were washed once with PBS before 1 ml of RHB-A complete medium was added. Transwell inserts were placed into plates and 5×10^5 single hGICs in a total volume of 1 ml of RHB-A complete medium were plated on the insert surface. hGICs and C20 human Microglia were harvested after 48 h of co-culture for further analysis.
  • Transfection-Transduction. Transfection and transduction were previously described in detail. Briefly, 12 µg of DNA mix (lentivector, pCMV-G, pRSV-REV, pMDLG/pRRE) was incubated with the FuGENE-DMEM/F12 mix for 15 min at RT and added to the antibiotic-free medium covering the 293T cells, and the first tap of viral supernatant was collected at 40 h after transfection. Titer was assessed using Lenti-X p24 Rapid Titer Kit (Takara) according to the manufacturer's instructions. We applied viral particles to target cells in the appropriate complete medium supplemented with 2.5 µg/ml protamine sulfate. After 12-14 h of incubation with the viral supernatant, the medium was refreshed with the appropriate complete medium.
  • Immunohistochemistry. Tissues or tumorspheres were fixed in 4% PFA for 20′. Following fixation, dehydration was performed with increasing EtOH from 70% to 100%, Xylene and overnight Paraffin incubation. Paraffin-embedded samples (PES) were cut using a HM 355S microtome (Thermo Scientific). Hematoxylin/Eosin (HE) staining was performed according to standard protocols and slide images were acquired with an automated microscope (Keyence).
  • Permeabilization was performed with 0.25% Triton in PBS and, when appropriate, endogenous peroxidases were blocked with 3% H2O2 in water. Typically, we performed blocking with 5% normal goat serum (NGS).
  • Primary antibodies were: anti-GFP (Anti-GFP ab6556, 1:000), anti-MED1 (Abcam ab64965 1:500), anti-Tubulin (BD T5168, 1:2000), and secondary antibodies were: A31573, A11055 and A31571 Alexa Fluor 647, A21206 Alexa Fluor 488, A31570 Alexa Fluor 555.
  • RNA FISH and dual FISH-IF. Cells were permeabilized in 70% ethanol (RNA FISH only) or with 0.5% Triton X-100 (for dual IF-RNA FISH), washed in RNase-free PBS (1×; Life Technologies, AM9932), and fixed with 10% Deionized Formamide (EMD Millipore, S4117) in 20% Stellaris RNA FISH Wash Buffer A (Biosearch Technologies, Inc., SMF-WA1-60) and RNase-free PBS, for 5 min at RT.
  • IgK-MGT #1-mVenus and H2B-CFP were probed using SMF-1084-5 CAL Fluor® Red 635 and SMF-1063-5 Quasar® 570 custom Stellaris® FISH Probes (oligo sequence available upon request) in 10% Deionized Formamide, 90% Stellaris RNA FISH Hybridization Buffer (Biosearch Technologies, SMF-HB1-10) at 31.5 µM in 100 µL transferred to the coverglass, and hybridized at 37° C. in the dark. After O/N incubation, slides were washed with RNase-free PBS for 5 min (3×). If primary/secondary staining occurred, it was as described above.
  • Phenotypic screening. Tumor cells were propagated as described above until the screening. Then we seeded 15′000 cells/50 µl/well in 384-well plates (Corning), in Gibco FluoroBrite DMEM medium supplemented with the appropriate growth factors. Cells were dispensed as 50 µl suspension into each well using the SPARK20M Injector system (50 µl injection volume; 100 µl/s injection speed). Non-adherent cells (e.g. GICs) were further centrifuged at 1500 rpm for 1 h 30 min at 37° C. Bottom-reading fluorescence was scanned using a SPARK 20M TECAN plate reader at 37° C.
  • DMSO-soluble compounds such as GSK126, were robotically aliquoted using a D300e, whereas cytokines were robotically aliquoted to each well using an Andrew pipetting robot (AndrewAlliance), using the following concentrations:
  • Drug dose-response screening. Transduced hGICs from transwell co-culture experiments were harvested into single cell suspension and sorted into mVenus high and low populations using a BD FACSAria III. Cells were counted and 7000 cells/50 µl/well were seeded onto 384-well black-walled plates in RHB-A complete medium using the SPARK20M Injector system (50 µl injection volume; 100 µl/s injection speed). Drugs were typically dissolved as a 10 mM stock in DMSO and dispensed using the D300e compound printer (TECAN) for targeted dose-response with plate randomization and DMSO normalization.
  • Irradiation of hGICs Irradiation was delivered using the XenX irradiator platform (XStrahl Life Sciences), equipped with a 225 kV X-ray tube for targeted irradiation.
  • hGICs cultured in either 6-well plates or 96-well plates were placed in the focal plane of the beamline and exposed to irradiation for a specific time, depending on the target dosage, as calculated with an internal calculation software.
  • Matrigel organoids. To generate organoids with co-culture of C20 human Microglia and hGICs, growth-factor reduced and phenol-red free Matrigel (BD; 734-1101) droplets were used as an extracellular matrix support. Target cells were harvested and single cell suspensions with 1.5×10^5 of C20 human Microglia and 3.5×10^5 of hGICs in a volume of 500 µl were prepared. Using pre-cooled consumables and pipette tips, 30 µl of Matrigel, thawed on ice, was added to each well of cold 60-well Minitrays (Thermofisher; 439225).
  • RT-qPCR. cDNA was generated using SuperScript™ VILO™ Master Mix; RNA (0.5-2.5 µg) in 20 µL was incubated at 25° C. for 10′, at 42° C. for 60′ and at 85° C. for 5′. RT-qPCR was performed with 10 ng cDNA/well in a 384-well ViiA™ 7 System using 1× PowerUp SYBR Green Master Mix (Applied Biosystems), in 10 µl/well. Primers are available upon request.
  • Tissue dissection and Cell surface staining. Brain tumor dissection was previously described in detail 77 . Briefly, the tissue was dissected with a scalpel and digested in Accutase/DNase I (947 µl Accutase, 50 µl DNase I Buffer, 3 µl DNase I) at 37° C. until needed. The suspension was filtered through a 120 µm cell strainer first and then a 40 µm cell strainer before RBC lysis (NH4Cl, 155 mM; KHCO3, 10 mM; EDTA, pH 7.4, 0.1 mM). After washing in cold PBS, viability and cell count were assessed automatically with 0.4% Trypan Blue staining using a TECAN SPARK20M.
  • FACS sorting Transduced hGICs were harvested into single cell suspensions and resuspended into cold RHB-A complete and filtered into FACS tubes. Sorting was conducted using BD FACSAria III or Fusion. The appropriate laser-filter combinations were chosen depending on the fluorophores being sorted for. Typically, to remove dead cells, events were first gated on the basis of shape and granularity (FSC-A vs. SSC-A) and doublets were excluded (FSC-A vs. FSC-H). Positive gates were established on PGK-driven and constitutively expressed H2B-CFP as sorting reporter, to sort for populations with low to medium intensity of sLCR-dependent fluorophore expression.
  • Immunoblot. Cell pellets were lysed in RIPA buffer (20 mM Tris-HCl pH7.5, 150 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% NP-40) supplemented with a 1× Protease inhibitor cocktail (Roche), 10 mM NaPPi, 10 mM NaF, and 1 mM Sodium orthovanadate. The lysates were sonicated if necessary, and electrophoresis was performed using NuPAGE Bis-Tris precast gels (Life Technologies) in NuPAGE MOPS SDS Running Buffer (50 mM MOPS, 50 mM Tris Base, 0.1% SDS, 1 mM EDTA).
  • Protein was transferred onto Nitrocellulose membranes in transfer buffer (25 mM Tris-HCl pH 7.5, 192 mM Glycine, 20% Methanol) at 120 mA for 1 h. Protein transfer was assessed through staining with Ponceau Red for 5 min, following two washes with TBS-T. Blocking of membranes was done for 1 h at room temperature with 5% BSA in PBS. Dilutions of primary antibodies were prepared in PBS+5% BSA and membranes were incubated over night at 4° C. Following three washes for 5 min with TBS-T, dilutions of appropriate HRP-coupled secondary antibodies were prepared in PBS+5% BSA and membranes were incubated for 45 min at room temperature.
  • ECL detection reagent Sigma; RPN2209
  • membranes were exposed to ECL Hyperfilms (Sigma; GE28-9068-37) to detect chemiluminescent signals.
  • Target | Product code | Manufacturer
      GFP | Ab6556 | Abcam
      Vinculin | (not given) | (not given)
      p-Stat3 y705 | 9145L | Cell Signaling
      Stat3 | sc-482x | Santa Cruz
      p-NFKB p65 | 3033P | Cell Signaling
      NFKB p65 | 86299 | Abcam
      p-p38 t180 d3f9 | 45115 | Cell Signaling
      p-p38 | 9211s | Millipore
      Nestin | 611658 | BD Biosciences
      p-yH2AX Ser 139 | 05-636 | Millipore
      K27me3 | 07-449 | Millipore
      H3 total | 1791 | Abcam
      E-Cadherin | 31950 | Cell Signaling
      Vimentin | 5741s | Cell Signaling
      Goat Anti-Mouse IgG (H+L) - HRP | 626520 | Invitrogen
      Goat Anti-Rabbit IgG (H+L) - HRP | G21234 | Invitrogen
  • IncuCyte automated longitudinal imaging was performed in 96-well black-walled plates (Greiner). 300,000 cells per plate were seeded to reach optimal confluence at the end of the experiment. GSK126 was aliquoted using a D300e, whereas TGFB1+2 were manually aliquoted to each well. Both were refreshed every second day. The last timepoint was independently verified using a plate reader (BMG Clariostar).
  • CRISPRi screen. For the CRISPRi screens, A549-MGT #1 ±GSK126 ±Dox cells were sorted on an Astrios Moflo. We aimed at a library representation of 1000× (>6 million cells) in the 10% of the lowest (dim) and 10% of the highest (bright) cells within each population. The mid population was also sorted and included in the screen analysis, as control. Cells were lysed for 10′ at 56° C. in AL+Proteinase K buffer (Qiagen), and DNA was then extracted using AMPure beads (Agencourt) with RNAse A treatment. PCR amplification and barcode-tagging of the CRISPRi libraries was done essentially as described, including PCR buffer composition 77 .
  • PCR1 For each sample, in PCR1, we used 20 ug of DNA divided over 10 parallel reactions, including from input controls, whereas the plasmid library needed 0.1 ng of DNA in PCR1. Parallel PCR1 reactions were mixed together and 5 ul were used as template for PCR2. We used Phusion Polymerase (NEB), GC buffer and 3% DMSO in both PCR1 and PCR2. Primers are available upon request.
  • CRISPR/Cas9 KO. A549-MGT #1 were knocked-out for CNKSR2 and ARID1A using a Cas9 RNP Synthego kit following instructions. Electroporation was performed using a BioRad XCell in PBS and using the standard pulse for A549 cells. Optimal gRNAs from the kit were first assessed using T7E1 as well as TIDE calculation (https://tide.nki.nl/). After that, we performed bulk assessment of MGT #1 fluorescence using flow cytometry as well as low confluence plating and manual clone picking.
  • mice All mouse studies were conducted in accordance with a protocol approved by the Institutional Animal Care and Use Committee and in agreement with regulations by the European Union. Orthotopic glioma xenograft studies were conducted as previously described 76 with modifications.
  • NOD-SCID-IL2Rg-/- (NSG) mice were purchased from The Jackson Laboratory and maintained in specific-pathogen-free (SPF) conditions. We used male and female mice between 7-12 weeks of age.
  • Gene knock-outs were performed using Synthego Gene Knockout Kits.
  • the sgRNAs were dissolved in nuclease-free 1× TE buffer to a stock concentration of 30 µM.
  • RNP complexes were formed by mixing the Cas9 nuclease-gRNAs in a ratio of 6:1.
  • Each RNP complex was electroporated into 250K A549-MGT #1 in 2 mm cuvettes in 1× PBS using the Biorad GenePulser xCell (150 volts, 10 ms). After electroporation the cells were cultured in RPMI supplemented with 10% Fetal Bovine Serum and 1% of penicillin/streptomycin.
  • gDNA was extracted using the Invisorb spin tissue isolation kit (Stratec), eluted in 50 µl of elution buffer and PCR was performed on target genes of interest using 800 to 1200 bp products centered around the gRNA target loci (primers available upon request). Knock-out efficiency was calculated using TIDE (NKI) and T7E1 assays. Individual clones were established or bulk KO cells were directly assayed by FACS using a BD LSRFortessa and FlowJo program.
  • Example 1 Design of Expression Cassettes Comprising Subtype Specific Synthetic Locus Control Regions (sLCR) for Glioblastoma Multiforme (GBM) Tumor Cells
  • GBM subtype identities and fate changes may hold therapeutic potential.
  • a predominant subtype and tumor cells with different subtype identities may coexist 17,18 .
  • tumors can change the dominant expression profile upon recurrence 19,20 .
  • subtype-specific GBM genes would substantially comprise the regulatory activity required to specify the subtype identity (i.e. cis-regulatory elements).
  • TFs transcription factor genes
  • GBM GBM stem-like cells
  • such synthetic Locus Control Regions should ideally comprise the minimal set of cis-units with the highest number (i) and diversity (ii).
  • at least one cis-unit composing one sLCR would also include a natural transcriptional start site (TSS), and would be placed immediately upstream of the reporter element ( FIG. 1 a ).
  • Example 2 Genetic Tracing of Mesenchymal Fate in Human Glioma-Initiating Cells Using Lentiviral Vectors Comprising MGT #1 as sLCR
  • a typical lentiviral vector carrying a sLCR drives the subtype-expression of fluorescent reporters mVenus or mCherry.
  • mVenus is driven to the plasma membrane (by Igk leader and platelet-derived growth factor receptor (PDGFR) transmembrane sequences tagging; FIG. 1 c ) and the mCherry is shuttled to the nucleus through a NLS.
  • lentiviral particles were produced in HEK293T cells with the MGT #1-mVenus sLCR, and the viral particles were used to infect human Glioma-initiating cells with a MES genotype (MES-hGICs).
  • MES-hGICs were transduced with MGT #1 lentiviral particles.
  • PN-hGICs bear a combination of IDH1 and TP53 point mutations, which is only found in PN GBM, whereas MES-hGICs have triple knockdown of TP53, PTEN and NF1, featuring a MES GBM background.
  • we observed a minor but measurable increase in basal fluorescence in MES-hGICs, suggesting that MGT #1 reflects a higher basal intrinsic signaling in these cells ( FIG. 1 e ).
  • TNFα signaling is considered a prominent MES-GBM signaling pathway, and can induce a PN-to-MES transition 20 .
  • whether MGT #1 faithfully reproduces MES GBM signaling was tested by exposing both MES-hGICs-MGT #1 low and PN-hGICs-MGT #1 low cells to TNFα.
  • TNFα induced a fluorescence increase in both cell types as compared to each parental control.
  • MGT #1 informs on the differential response to external signaling between tumor cells with different genotypes.
  • MGT #1 is solid ground for a screening framework to identify relevant signaling for supporting MES-hGICs-MGT #1 low and PN-hGICs-MGT #1 low cells' growth and subtype identity.
  • Example 3 Use of MGT #1 and MGT #2 sLCRs as a Readout for Investigating Intrinsic and Adaptive Responses in GICs
  • a second independent reporter showed consistent results ( FIG. 2 a ), which supports our ability to generate a functional sLCR starting from a gene expression profile.
  • both MGT #1 and MGT #2 reporters indicated that FBS is capable of inducing a Mesenchymal differentiation, which, unlike in the case of TNFα, was accompanied by differentiation of the GICs as gauged by visual inspection and flow cytometry (data not shown).
  • This finding may be only in part explained by the presence of TGFB1, which is indeed a known component of FBS.
  • TGFB1 is a Mesenchymal inducer but does not strongly induce MGT #1, nor does it promote differentiation when used as a purified cytokine within the same timeframe ( FIG. 2 a ).
  • this observation on FBS is highly consistent with the TCGA report that the MES GBM signature cannot be found in any of the mouse brain cells but only in FBS-cultured astroglial cells 16 .
  • TNFα tumor microenvironment
  • GAMs glioblastoma-associated microglia/monocytes
  • hGAMs were insufficient to drive MGT #1 expression in MES-hGICs, either unstimulated or when exposed to the TLR4 endogenous ligand Tenascin-C (TNC 32 ), which is another GSC-derived pro-inflammatory factor 33 .
  • TNFα drove MGT #1 induction in MES-hGICs regardless of the presence of hGAMs ( FIG. 2 b ).
  • our data uncover a potential cellular cross-talk in the GBM TME revolving around IL6 signaling and leading to the MES GBM specification.
  • MGT #1 informs on the impact of active signaling (e.g. TNFα) and reflects similar cell fate transitions even when preexisting context-dependent differences are in place (e.g. a Mesenchymal signaling amplification or transition).
  • an active signaling e.g. TNFα
  • TNFα drives PN-hGICs to a state that is closer to MES-hGICs in their naïve state ( FIG. 2 c - d - e - f ).
  • Example 4 Use of MGT #1 to Functionally Test Whether Environmental Insults (e.g. Ionizing Radiations) May Induce Mesenchymal Transdifferentiation in GBM Cell Autonomous Manner
  • Environmental Insults e.g. Ionizing Radiations
  • both MES-hGICs-MGT #1 low and PN-hGICs-MGT #1 low cells showed an augmented Mesenchymal differentiation in combination with TNFα, indicating that TNF signaling and IR cooperatively induce this cell fate specification.
  • these data support the conclusion that sublethal IR cooperates with other mechanisms to drive a Mesenchymal transition in GBM.
  • the data also support the speculation that NFKB activation is augmented as a result of non-canonical signaling caused by genotoxic stress 36 .
  • the Proneural GBM is thought to represent the common GBM ancestor subtype and also to reflect an oligodendrocytic cell-of-origin 26,37 .
  • Previous studies revealed that longstanding propagation in FBS affects the phenotypic identity of individual cell lines 25,16 .
  • Example 6 Dissecting Epithelial-to-Mesenchymal Transition in Breast and Lung Cancer Cells Using MGT #1
  • the Mesenchymal transdifferentiation is a physiologic process hijacked by multiple tumors of epithelial origin 39 .
  • MGT #1 was transduced into well-characterized Epithelial and Mesenchymal breast cancer cells.
  • Tumor subtypes are genetically engraved in breast cancer cells 40 .
  • epithelial MCF7 cells showed lower MGT #1 expression compared to MDA-231 cells, which are believed to have undergone EMT ( FIG. 10 a - b ).
  • MGT #1 expression reflects the actual breast cancer subtype identity.
  • Ezh2 inhibition can support Kras-driven EMT in several mouse and human lung cancer cells 41 .
  • Example 6 MGT #1 as a Genetic Tracing Reporter for Tumor Homeostates In Vivo
  • CD133, which is routinely used to label tumor-propagating cells in patient-derived xenografts, showed a similar switch from an overall CD133-high population in vitro to a low or negative state.
  • CD133 expressing cells included a comparable fraction of MGT #1 expressing and non-expressing cells, thereby supporting the ability of MGT #1 to depict functional heterogeneity ( FIG. 5 e ).
  • sLCRs are designed to mimic endogenous CREs such as the alpha-globin LCR, which shows position-independent, cell-type- and developmental-stage-specific expression and engages transcription factories. These elements are often defined as super-enhancers and condense into coactivator puncta.
  • PNGT #1-2 Proneural
  • MGT #1-2 Mesenchymal
  • sLCRs lentiviral particles into spontaneously immortalized human neural progenitor cells that acquired high copy number of PDGFRA, c-Myc and CDK4.
  • we further engineered hGICs to be depleted of PTEN and either bear IDH1 R132 and TP53 R273H point mutations or be further depleted of TP53 and NF1, thereby generating PN-hGICs and MES-hGICs, respectively.
  • PNGT #1-2 showed strong expression in both cell types, whereas MGT #1-2 displayed an overall low expression in both genotypes, underscoring the design specificity towards different regulatory networks.
  • MGT #1 had higher basal expression in MES-hGICs compared to PN-hGICs, indicating a genotype-specific response ( FIG. 1 h ).
  • we next performed a phenotypic screening in MES-hGICs-MGT #1 low and PN-hGICs-MGT #1 low cells.
  • NBE-propagated hGICs were stimulated with selected factors (cytokines, growth factors, compounds) and FACS analyzed 48 hours after stimulation ( FIG. 13 b ).
  • sLCRs revealed shared and private responses in MES- and PN-hGICs-MGT #1 low and highlighted TNFα signaling, as well as human serum or FBS and Activin A, as MES-GBM regulators.
  • FACS-sorted PN-hGICs-MGT #1 low cells bearing levels of MGT #1 expression comparable to those of MES-hGICs-MGT #1 low still failed to reach a similar response to TNFα ( FIGS. 2 g and 8 and 13 a ). Consistently, despite being propagated under the same signaling conditions, MES-hGICs-MGT #1 low and PN-hGICs-MGT #1 low cells showed differences in endogenous expression and activation of selected signaling pathways ( FIG. 2 ). TNFα stimulation induced phosphorylation of NFkB-p65, STAT3 and p38-MAPK in both cell types but this resulted in a markedly different gene expression output ( FIG. 2 d ).
  • pro-differentiation signaling i.e. Human serum or FBS
  • FBS pro-differentiation signaling
  • washout experiments suggest that the MES-GBM state is reversible within the timeframe of few days ( FIG. 18 ), indicating that the MES GBM state may be acquired and reversed.
  • MES-hGICs-MGT #1 responded to short-term TNFα stimulation (4 hours) with higher upregulation of both MGT #1 and MES-GBM endogenous markers compared to TNFα alone ( FIG. 13 f ), indicating that pre-treatment sensitized these cells to MES GBM program activation.
  • sLCRs provide a phenotypic layer of pharmacogenomic information over previous large studies based on fitness alone.
  • Chromatin accessibility is the strongest predictor of cancer type similarity and can be used to identify subtype identities within the common dimensional space of individual cancer types.
  • ATAC-seq was performed on MES-hGICs-MGT #1 high cells in vitro and in vivo.
  • Differential analysis of chromatin accessibility uncovered many genes undergoing remodeling, notably at WWTR1 (TAZ), a driver of the PN-to-MES transition, and at several TNF receptor gene loci, indicating that genetic tracing for remodeling events that exclusively occur in a physiologically relevant tumor microenvironment ( FIG.
  • Example 10 sLCRs Facilitate the Discovery of Therapeutic Implications for Non-Cell Autonomous Crosstalk Between Tumor and Immune Cells
  • IDH1-wild type GBM infiltration by Glioblastoma-associated microglia/monocytes (GAMs) was recently correlated with NF1 deficiency and a MES-GBM subtype identity, but whether there is a causal relationship between GAMs and MES-GBM remains unresolved.
  • GAMs Glioblastoma-associated microglia/monocytes
  • inflammatory mediators derived from the adaptive immune system IFNgamma and IL-2, and stroma-derived IL-6 did not trigger direct MGT #1 activation to a comparable extent ( FIG. 17 ), collectively providing experimental insights into the cascade of events leading to a MES-GBM state in vivo.
  • EMT has been linked to resistance to chemotherapy but also offers therapeutic opportunities.
  • DNA damage stress is the main therapeutic component of the standard of care in GBM, otherwise referred to as the Stupp protocol.
  • a TNF-NFkB signature in GBM was previously linked to the mesenchymal state and radio-resistance in a large cohort of patients and PDX models.
  • sLCRs ability to identify a MES homeostate in order to explore the therapeutic implications of the microglia-driven GBM state
  • MES-hGICs-MGT #1 high cells retained a similar sensitivity profile to targeted agents such as BAY11-7085 (IκB) and WP1066 (STAT3; FIGS. 15 h and 21 ).
  • the altered chemosensitivity profile of the MES-hGICs-MGT #1 high cells is consistent with the gene expression changes driven by hMG cells, including impaired expression of the DNA damage gene signature in MES-hGICs-MGT #1 high cells and a cell cycle profile shift, together with over-expression of patient-derived MES-GBM and cholesterol biosynthesis signatures ( FIG. 21 ). Similar results were obtained with a Proneural genotype, indicating that hMG cells can divert hGICs into two functionally and therapeutically distinct states and supporting the use of sLCRs in target discovery platforms to integrate complex responses associated with tumor heterogeneity.
  • sLCR genotype-to-molecular and cellular phenotype transitions in vitro and in vivo.
  • sLCR may be used in characterizing molecular mechanisms linking biological, chemical and environmental stimuli to cell fate transitions, including through chemical and forward genetic screens.
  • GBM subtypes have been consistent across expression platforms (microarrays, RNA-seq), readouts (gene expression, DNA methylation) and patient populations (Western and Chinese). Despite such an extensive effort, the significance of GBM subtypes remains elusive when it comes to their origin, location or spatiotemporal evolution.
  • This technology enables transforming cellular and molecular profiling into phenotypic maps, which may fulfill the experimental needs associated with the continuous mapping of cellular and molecular features in health and disease, including at single-cell level.
  • sLCR improve in vivo phenotypic assays that still represent obligatory steps towards the full understanding of complex cellular and molecular mechanisms at organismal level. As such, it offers significant ex vivo opportunities.
  • the Proneural and Mesenchymal GBM programs rely on the activity of specific transcription factors.
  • acute inflammatory and pro-differentiation stimuli e.g. TNF signaling as well as bovine or human serum.
  • MES trans-differentiation measured by sLCRs can occur along with differentially impacting cell morphology.
  • Our experiments link MES-sLCRs readout in GBM cells, feed-forward responses to pro-inflammatory microenvironment, resistance to sub-lethal doses of genotoxic stress and expression of migration-associated markers such as CD44, all of which represent the hallmarks of progression in human cancer, including in GBM at single cell levels.
  • These features appear to be engraved in tissue homeostasis, as inferred by clustered cellular expression pattern (‘homeostases’) and heterogeneity in tumor models in vivo and ex vivo.
  • GAMs are believed to constitute the source for TNFα in both glioma mouse models and human tumors.
  • Our results provide experimental support to the clinical association between the MES-GBM subtype and specific immune landscapes and uncover TNFα-independent routes to MES GBM.
  • the GAM-driven MES-GBM state herewith identified shows an extent of overlap with patients' signatures, which is comparable to that of individual patients' signature themselves.
  • sLCR were shown to be of use in characterizing molecular mechanisms by linking biological, chemical and environmental stimuli to cell fate transitions, including through chemical and genetic screens.
  • Previous attempts to generate synthetic reporters using massively parallel sequencing or mixed models revealed the potential use of this approach and the limitations associated with limited control over the design.
  • Our method substantially addresses this problem and represents a base for future development, ranging from the linear improvement of basic design components (e.g. using curated resources of TFBS and cis-elements) to the systematic generation and validation of large numbers of sLCRs followed by machine learning of successful features.
  • robust cell-type- or state-specificity and granularity may be extended by combining sLCR with DNA barcoding.
  • Tunable operations may be achieved by coupling sLCRs transcriptional inputs with synthetic effector proteins enabling Boolean logic outputs.
  • genetic tracing by sLCRs is scalable and can be extended to virtually any given system, whether ex vivo or in vivo to dissect cell intrinsic and non-cell autonomous mechanisms controlling normal and diseased homeostasis.
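
The design principle stated above (a minimal set of cis-units with the highest number and diversity) can be illustrated with a small greedy selection. This is only a sketch under stated assumptions, not the disclosed method: it assumes that number and diversity refer to transcription factor binding sites (TFBS) predicted in each cis-unit, and the function name, input format, unit names and TF names are hypothetical. The requirement that one cis-unit carries a natural TSS is not modeled.

```python
def select_cis_units(candidates, max_units):
    """Greedy sketch for picking a small, diverse set of cis-units.

    candidates: dict mapping a cis-unit name to a list of TF names, one
                entry per predicted binding site (hypothetical format).
    max_units:  upper bound on the number of cis-units to keep.

    At each step the unit adding the most not-yet-covered TFs (diversity)
    is taken; ties are broken by total site count (number), then by name
    so that the result is deterministic.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)

    while remaining and len(chosen) < max_units:
        def rank(name):
            sites = remaining[name]
            new_tfs = len(set(sites) - covered)
            # Negative values so that min() prefers larger counts.
            return (-new_tfs, -len(sites), name)

        best = min(remaining, key=rank)
        if not set(remaining[best]) - covered:
            break  # no unit adds a new TF; stop to keep the set minimal
        chosen.append(best)
        covered |= set(remaining.pop(best))

    return chosen, covered


# Toy input; TF assignments are illustrative only, not measured data.
toy_units = {
    "unit_A": ["STAT3", "STAT3", "CEBPB", "RELA"],
    "unit_B": ["STAT3", "CEBPB"],
    "unit_C": ["FOSL2", "RUNX1"],
    "unit_D": ["RELA"],
}

chosen, covered = select_cis_units(toy_units, max_units=3)
print(chosen, sorted(covered))
```

The greedy rule is a common heuristic for coverage problems and does not guarantee the smallest possible set; an exact search could be substituted for small candidate pools.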

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • Plant Pathology (AREA)
  • Cell Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Medicines Containing Material From Animals Or Micro-Organisms (AREA)
US17/273,821 2018-09-05 2019-09-05 Method for Engineering Synthetic Cis-Regulatory DNA Pending US20210343368A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP18192715.3 2018-09-05
EP18192715 2018-09-05
PCT/EP2019/073711 WO2020049106A1 (en) 2018-09-05 2019-09-05 A method for engineering synthetic cis-regulatory dna

Publications (1)

Publication Number Publication Date
US20210343368A1 true US20210343368A1 (en) 2021-11-04

Family

ID=63667685

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/273,821 Pending US20210343368A1 (en) 2018-09-05 2019-09-05 Method for Engineering Synthetic Cis-Regulatory DNA

Country Status (6)

Country Link
US (1) US20210343368A1 (zh)
EP (1) EP3847261A1 (zh)
JP (1) JP2021534807A (zh)
CN (1) CN113166767A (zh)
CA (1) CA3111045A1 (zh)
WO (1) WO2020049106A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114410621A (zh) * 2021-12-31 2022-04-29 吉林大学第一医院 一种简便快速的高通量基因组裸dna提取方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240209392A1 (en) * 2021-04-26 2024-06-27 The Regents Of The University Of California High-throughput expression-linked promoter selection in eukaryotic cells

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US6040138A (en) 1995-09-15 2000-03-21 Affymetrix, Inc. Expression monitoring by hybridization to high density oligonucleotide arrays
US5800992A (en) 1989-06-07 1998-09-01 Fodor; Stephen P.A. Method of detecting nucleic acids
US5854033A (en) 1995-11-21 1998-12-29 Yale University Rolling circle replication reporter systems
EP0880598A4 (en) 1996-01-23 2005-02-23 Affymetrix Inc RAPID EVALUATION OF NUCLEIC ACID ABUNDANCE DIFFERENCE, WITH A HIGH-DENSITY OLIGONUCLEOTIDE SYSTEM
AU1287799A (en) 1997-10-31 1999-05-24 Affymetrix, Inc. Expression profiles in adult and fetal organs
US6020135A (en) 1998-03-27 2000-02-01 Affymetrix, Inc. P53-regulated genes
WO2001049868A1 (en) * 1999-12-31 2001-07-12 Korea Research Institute Of Bioscience And Biotechnology Cancer cell-specific gene expression system
JP2003203078A (ja) * 2001-10-19 2003-07-18 Mitsubishi Electric Corp 生理機能解析方法及びシステム
EP2160463A1 (en) * 2007-03-05 2010-03-10 Regulon S.A. A method for the construction of cancer-specific promoters using functional genomics
EP2479278A1 (en) * 2011-01-25 2012-07-25 Synpromics Ltd. Method for the construction of specific promoters
KR102235603B1 (ko) * 2013-02-01 2021-04-05 셀렉시스 에스. 에이. 향상된 이식유전자 발현 및 가공
GB201320351D0 (en) * 2013-11-18 2014-01-01 Erasmus Universiteit Medisch Ct Method
PL3097197T3 (pl) * 2014-01-21 2021-06-28 Vrije Universiteit Brussel Mięśniowo-specyficzne elementy regulatorowe kwasów nukleinowych oraz sposoby i ich zastosowanie
CN106546674A (zh) * 2016-10-20 2017-03-29 北京蛋白质组研究中心 用少量样品快速富集、分离、鉴定和定量内源转录因子及其复合物的方法及其专用装置


Also Published As

Publication number Publication date
JP2021534807A (ja) 2021-12-16
EP3847261A1 (en) 2021-07-14
CN113166767A (zh) 2021-07-23
CA3111045A1 (en) 2020-03-12
WO2020049106A1 (en) 2020-03-12

Similar Documents

Publication Publication Date Title
Mueller et al. Evolutionary routes and KRAS dosage define pancreatic cancer phenotypes
Yekelchyk et al. Mono-and multi-nucleated ventricular cardiomyocytes constitute a transcriptionally homogenous cell population
Siira et al. Concerted regulation of mitochondrial and nuclear non-coding RNAs by a dual-targeted RNase Z
Schrader et al. Actionable perturbations of damage responses by TCL1/ATM and epigenetic lesions form the basis of T-PLL
Tu et al. TNF-α-producing macrophages determine subtype identity and prognosis via AP1 enhancer reprogramming in pancreatic cancer
Schmitt et al. Phenotypic mapping of pathologic cross-talk between glioblastoma and innate immune cells by synthetic genetic tracing
Stricker et al. Robust stratification of breast cancer subtypes using differential patterns of transcript isoform expression
Treveil et al. Regulatory network analysis of Paneth cell and goblet cell enriched gut organoids using transcriptomics approaches
Agrawal Singh et al. PLZF targets developmental enhancers for activation during osteogenic differentiation of human mesenchymal stem cells
Nicolae et al. NFκB regulates p21 expression and controls DNA damage-induced leukemic differentiation
Hsu et al. METTL4-mediated nuclear N6-deoxyadenosine methylation promotes metastasis through activating multiple metastasis-inducing targets
Weiterer et al. Distinct IL‐1α‐responsive enhancers promote acute and coordinated changes in chromatin topology in a hierarchical manner
GB2549763A (en) Biomarkers for early diagnosis of ovarian cancer
Parreno et al. Transient loss of Polycomb components induces an epigenetic cancer fate
Lazure et al. Transcriptional reprogramming of skeletal muscle stem cells by the niche environment
Li et al. RNA splicing of the BHC80 gene contributes to neuroendocrine prostate cancer progression
Chen et al. Single-cell transcriptome analysis reveals six subpopulations reflecting distinct cellular fates in senescent mouse embryonic fibroblasts
US20210343368A1 (en) Method for Engineering Synthetic Cis-Regulatory DNA
Chen et al. LncRNA BC promotes lung adenocarcinoma progression by modulating IMPAD1 alternative splicing
Hai et al. A connectivity signature for glioblastoma
Ong et al. Requirement for TP73 and genetic alterations originating from its intragenic super-enhancer in adult T-cell leukemia
Bak et al. Ploidy-stratified single cardiomyocyte transcriptomics map Zinc Finger E-Box Binding Homeobox 1 to underly cardiomyocyte proliferation before birth
Lei et al. Noncoding SNP at rs1663689 represses ADGRG6 via interchromosomal interaction and reduces lung cancer progression
Castillo et al. Gene expression profile and signaling pathways in MCF-7 breast cancer cells mediated by acyl-coa synthetase 4 overexpression
Shen et al. FCGBP is a promising prognostic biomarker and correlates with immunotherapy efficacy in oral squamous cell carcinoma

Legal Events

Date Code Title Description
AS Assignment

Owner name: MAX-DELBRUECK-CENTRUM FUER MOLEKULARE MEDIZIN IN DER HELMHOLTZ, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GARGIULO, GAETANO;REEL/FRAME:057114/0447

Effective date: 20210723

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED