WO2016156469A1 - Genome architecture mapping on chromatin - Google Patents

Genome architecture mapping on chromatin

Info

Publication number
WO2016156469A1
Authority
WO
WIPO (PCT)
Prior art keywords
loci
gam
dna
segregation
chromatin
Prior art date
Application number
PCT/EP2016/057025
Other languages
English (en)
Inventor
Ana Pombo
Paul Dear
Miguel BRANCO
Original Assignee
Max-Delbrück-Centrum für Molekulare Medizin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Max-Delbrück-Centrum für Molekulare Medizin
Publication of WO2016156469A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6816: Hybridisation assays characterised by the detection means
    • C12Q1/6827: Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Definitions

  • the present invention relates to the field of analysis of the three-dimensional structure of the genome, i.e., for genome architecture mapping on chromatin (GAM-ch).
  • the invention provides a method of determining interaction of a plurality of nucleic acid loci in a compartment comprising nucleic acids, such as the cell nucleus, comprising separating nucleic acids from each other depending on their interaction in the compartment by crosslinking nucleic acids with each other directly or indirectly, fragmenting the nucleic acids of the compartment to obtain fragments and/or cross-linked complexes of fragments, and dividing the fragmented nucleic acids to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus; determining the presence or absence of the plurality of loci in said fractions; and determining the co-segregation of said plurality of loci in the fractions.
  • Co-segregation may then be analysed with statistical methods to determine interactions.
  • The method can be used, e.g., for identifying the frequency of interactions across a cell population between a plurality of loci; mapping loci and/or genome architecture, e.g., in the nucleus, an organelle, a microorganism or a virus; identifying regulatory regions (enhancers) directing expression of a specific gene through spatial contacts; identifying the spatial contacts between loci that depend on their co-association with specific protein(s) or RNA; and/or diagnosing a disease associated with a disturbed co-segregation of loci.
  • Chromatin immunoprecipitation (ChIP) can be combined with the method of the invention.
  • Information about the three-dimensional structure of chromatin is also of high interest, in particular, to discover contacts between regulatory regions (e.g. enhancers) and gene promoters which may be disrupted in disease due to genetic mutations in the non-coding part of the genome (e.g. Uslu V.V. et al. 2014 Long-range enhancers regulating Myc expression are required for normal facial morphogenesis. Nature Genetics 46: 753).
  • Studying the structural properties and spatial organization of chromosomes is important for the understanding and evaluation of the regulation of gene expression, DNA replication and repair, and recombination.
  • the folding of chromosomes and their contacts has important implications for disease mechanisms and elucidation of targets for therapeutic approaches, e.g., in cancer or congenital diseases.
  • Chromatin exists in interacting and non-interacting states. Interacting states have different properties depending on the characteristics of the genomic sites, or binding sites, involved in the interactions, namely (a) their number, distance and distribution, (b) their specificity and affinity for binders, and (c) the concentration and specificity of binders. Chromatin interactions can also involve different numbers of loci associating simultaneously (multiplicity of interaction).
  • Fluorescence in situ hybridization uses microscopy to directly measure spatial distances between genomic loci, but it can only be applied to the study of a small number of genomic regions at a time in the same nucleus (e.g., Pombo A. 2003. Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6). It is theoretically possible to re-probe the same cells or tissue sections with different sets of probes, but there are concerns that repeated re-probing causes structural artefacts, e.g. due to DNA denaturation necessary to dissociate subsequent sets of probes, that e.g. induce artificial aggregation (contacts) of loci (i.e.
  • RNA-FISH is a milder FISH approach that does not involve DNA denaturation but that can only be used to determine the nuclear position of actively transcribed genes (not silent genes). Samples from cells in the interphase stage of the cell cycle, where functional chromatin contacts are most often mapped, can be re-probed for RNA-FISH only about three times, although the preservation of structure has not been measured in detail.
  • the number of probes which can be simultaneously applied in either DNA- or RNA-FISH is limited by distinguishable fluorescent markers, e.g. 181 barcodes can in principle be obtained by combining five colours, four colour ratios and two different levels of intensity (Pombo A. 2003. Cellular genomics: which genes are transcribed when and where? Trends Biochem. Sci. 28, 6).
  • this approach fails when the loci analysed are so close in space that the combination of fluorochromes in one probe is not distinguishable from the combination in another, and is therefore not amenable to the identification of loci that are spatially proximal at very short distances.
  • FISH can only be applied to analyse interactions of known loci of interest, and not to discover e.g. the presence of an exogenous DNA sequence in an interaction with the host's DNA.
  • the approach fails e.g. in the detection of endogenous or exogenous DNA sequences, unless they are known a priori, e.g. viral subtype integration positions and the exact sequences of exogenous DNA.
  • FISH is also confounded by a priori assumptions of linear genome organisation, which are not acceptable to study chromatin positioning features, e.g. chromatin contacts, when e.g. the influence of natural variation in genomic sequence in organism populations is of interest, e.g. in studying human samples, due to the fact that FISH does not inherently detect sequence variations such as copy number variations, or genomic rearrangements, without a priori probe design or a priori whole genome sequencing of the sample followed by probe design.
  • 3C-based methods generally start with chemical crosslinking of proteins that mediate genomic contacts. After chromatin extraction, pieces of DNA bound by the crosslinked proteins and RNAs are treated with a restriction enzyme for fragmentation. Addition of a ligase then connects (ligates) two pieces of DNA.
  • 3C uses different methods of detecting such ligation events: a popular one is paired-end sequencing (Hi-C, 4C-seq, ChIA-PET), and in one embodiment the DNA bound by a specific protein (or molecule) is purified before the ligation step.
  • the present inventors addressed the problem of providing an improved method for determining the interaction of nucleic acids, which avoids bias based on ligation of fragmented nucleic acids for detection of nucleic acids interactions, and which allows for simultaneous analysis of several high multiplicity interactions (each involving more than two loci), in particular, more than two interactions.
  • In one embodiment, the method allows for simultaneous analysis of substantially all nucleic acid interactions in the genome; in another, the method allows for simultaneous analysis of all nucleic acid interactions of fragments bound by a given molecule of interest, such as a protein or RNA.
  • This problem is solved by the method of the invention, as described below and in the claims. This method is designated Genome Architecture Mapping on Chromatin (GAM-ch).
  • the present invention provides a method of determining interaction of a plurality of nucleic acid loci in a compartment comprising nucleic acids, comprising steps of
  • (a) separating nucleic acids from each other depending on their interaction in the compartment by (i) crosslinking nucleic acids with each other directly or indirectly, (ii) fragmenting the nucleic acids of the compartment to obtain fragments and/or cross-linked complexes of fragments, e.g. by the use of sonication, mechanical shearing or restriction enzyme digestion, and (iii) dividing the fragmented nucleic acids to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus (e.g. about 0.5 copies or one copy in every other fraction), wherein steps (i) and (ii) can be carried out simultaneously or in any order;
  • fragments bound by a given molecule of interest are selected, e.g. by chromatin immunoprecipitation (ChIP), as described in more detail below.
  • a locus is the specific location of a gene, DNA sequence, or position on a chromosome (Wikipedia). Each chromosome carries many genes; the number of protein coding genes in the haploid human genome is estimated to be 20,000-25,000, on the 23 different chromosomes; there are as many transcription units which produce RNA species that do not encode for proteins. A variant of the similar DNA sequence located at a given locus is called an allele.
  • the nucleic acid may be DNA or RNA or a combination of both, e.g., if interactions between genes being actively transcribed and other genomic regions are to be analysed. Usually, the method of the invention is used to analyse co-segregation of DNA.
  • the co-segregation of loci may be analysed in any compartment comprising nucleic acids, such as the nucleus of a eukaryotic cell, a mitochondrion, a chloroplast, a prokaryotic cell or a virus.
  • Co-segregation of nucleic acid loci, in particular DNA loci, in the nucleus of a eukaryotic cell may be analysed.
  • The method of the invention thus constitutes a solution for analysing locus proximity or interaction in the nucleus by measuring the frequency of co-segregation of loci in cross-linked DNA complexes extracted from nuclei.
  • the cell or particle from which the compartment is derived may be a virus, a bacterium, a protozoan, a plant cell, a fungal cell or an animal cell, e.g., a mammalian cell, such as a cell from a patient (preferably, a human patient) having a disease or a disorder, or being diagnosed for a disorder, or a healthy subject.
  • the cell may be a tumor cell or a stem cell, such as an induced pluripotent stem cell generated, e.g., through reprogramming of human tissues.
  • Such cells can advantageously be used to apply GAM-ch to study human developmental disorders or congenital disease.
  • If the cell is an embryonic stem cell, it is preferably not generated in a method involving destruction of a human embryo. A plurality of cells/compartments or single cells may be analysed with the method of the invention.
  • The mammal preferably is a human, but it may also be of interest to investigate and, optionally, compare the genomic architecture of other organisms, such as E. coli, yeast, A. thaliana, C. elegans, X. laevis, D. rerio, D. melanogaster, mouse, rat or primate, or possibly parasitic interactions, e.g. the proximity of parasitic nucleic acids relative to the host genome, such as the chromatin contacts a virus (e.g. HIV, HSV) makes with the host DNA, or of an artificially inserted nucleic acid (e.g. in the context of gene therapy).
  • Cells can be derived from cell culture or analysed ex vivo from a specific tissue from a living organism or a dead organism, i.e., post-mortem, or from a whole experimental organism (e.g. a whole D. melanogaster embryo or C. elegans embryo), or from a mixture of microorganisms.
  • Cells used in the analysis can be selected, e.g., by synchronizing the cells in a particular stage of the cell cycle, or by sorting the cells, e.g. by fluorescence activated cell sorting, to capture a particular cell type expressing a specific marker. The marker can be detected, e.g., using an antibody specific for a protein uniquely expressed in the cell type or cell stage of interest, by in situ hybridization with a nucleic acid probe that detects a specific RNA (e.g. mRNA) expressed specifically in the cell type of interest, or by a fluorescent marker such as GFP showing expression of a specific gene or characteristic of a specific stage.
  • For example, a GFP transgene under the control of the promoter of the Pitx3 transcription factor can be used to mark dopamine-expressing neurons (Maxwell S. et al. 2005. Pitx3 regulates tyrosine hydroxylase expression in the substantia nigra and identifies a subgroup of mesencephalic dopaminergic progenitor neurons during mouse development. Dev. Biol. 282 (2): 467-479).
  • Cells can be pre-treated with an agent, e.g., to test the effect of drugs on co-segregation or positioning of loci, or be studied during the lifetime of an organism to understand development, ageing and degeneration.
  • a suspension of single cells is prepared before step (a), depending on the species and type of tissue, e.g., a single cell suspension of mammalian solid tissues may be prepared.
  • Preparation of a single cell suspension may be carried out by any procedure that is also compatible with 3C technologies. Detailed descriptions of several single cell preparations compatible with the production of a chromatin sample that preserves crosslinked chromatin contacts can be found in e.g. Hagege H. et al. 2007. Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nature Protocols 2, 1722.
  • the preparation of a single cell suspension may start by tissue dissection, followed by treatment with collagenase, or, for soft tissues (e.g. mouse thymus or fetal liver), by passage of tissue through a cell strainer (e.g. 40 micrometer mesh), or in the case of cells grown in in vitro culture or microorganism cultures, through centrifugation of the culture at appropriate force for the cell type, followed by resuspension at appropriate strength to yield a single cell suspension with minimal cell damage or death.
  • Application to post-mortem samples is also possible using published protocols or developments thereafter (Mitchell A.C. et al. 2014. The genome in three dimensions: a new frontier in human brain disease. Biol. Psychiatry 75, 961).
  • The separation of nucleic acids from each other in step (a) is carried out by (i) crosslinking nucleic acids with each other directly or indirectly, i.e., DNA and/or RNA may be cross-linked directly or through proteins interacting with the nucleic acid, using e.g. chemical crosslinking agents such as formaldehyde, (ii) fragmenting the nucleic acids of the compartment to obtain fragments and/or complexes of cross-linked fragments of nucleic acids, e.g. by sonication, and (iii) dividing the nucleic acids into fractions to obtain a collection of fractions each containing a plurality of fragments and/or complexes of cross-linked fragments, such that every fraction contains, on average, less than one copy of every locus.
  • Nuclei, cells, tissues or whole organisms are treated with a crosslinking agent, e.g. a chemical crosslinking agent in step (a) (i).
  • the crosslinking agent induces linkage of proteins with each other and between nucleic acids (DNA and/or RNA) and proteins.
  • the method of the invention is compatible with cross-linking conditions that are also compatible with current 3C-based methods.
  • the crosslinking agent comprises formaldehyde or another crosslinking agent compatible with DNA extraction.
  • Formaldehyde will preferably be used at a concentration of 0.5-4%, preferably about 1%-2% (all w/w), e.g., in a buffered solution such as PBS at pH 7.0-8.0, or by adding a concentrated solution of the cross-linking agent directly to the cell medium, for 5-120 min, preferably 10-20 min.
  • Alternative cross-linkers are, e.g., disuccinimidylglutarate, dithiobis-succinimidyl propionate, glutaraldehyde.
  • Crosslinking may also be performed by UV radiation.
  • fixed nuclei or cells can be pelleted and stored frozen, e.g., at -20°C, or -70°C or -80°C, e.g. in 1% formaldehyde.
  • Steps (i) and (ii) may be carried out at the same time or in any order.
  • crosslinking is performed as soon as possible to maintain the structure of chromatin intact as well as possible, i.e., it is usually performed first.
  • Step (a) of the method may further comprise, e.g., permeabilisation of cells by a lysis buffer and/or freezing.
  • the crosslinking can, e.g., be done directly on cells and then followed by permeabilisation, e.g., lysis with a suitable lysis buffer, and/or, freezing, and then fragmentation, e.g., by restriction (see Hagege et al. 2007).
  • crosslinking and permeabilising can be performed at the same time.
  • The fragmenting in step (a)(ii) can be carried out by any method, preferably one that leads to fragments of homogeneous length or to randomly and evenly spaced breaks in the nucleic acids.
  • fragmentation can be done by ultrasound, by mechanical shearing, by Dounce homogenisation, vortexing with glass beads, or by restriction digest, or a combination of two or more of these methods.
  • Physical methods such as ultrasound or shearing can be adapted to yield fragments or complexes of fragments of a desired fragment size, which may vary depending on the tissue and/or cell analysed.
  • The preferred average fragment size depends on the resolution at which chromatin interactions are to be mapped (which in turn depends on the organism and on the aims) and is about 100 bp-5 Mbp, preferably 200 bp-500 kbp or 1 kbp-5 kbp.
  • The average chromatin loop size is about 100 kbp.
  • Promoter contacts with regulatory regions are often local, below 50 kbp, so an appropriate resolution needs to be chosen.
  • Dounce homogenisation can be performed using e.g. 100 mg tissue in (a) 2 mL 1X PBS (phosphate buffered saline) or another suitable buffer, and (b) 200 µL protease inhibitor (Mitchell A.C. et al. 2014. The genome in three dimensions: a new frontier in human brain disease. Biol. Psychiatry 75, 961).
  • vitrification i.e. rapid freezing
  • chemical crosslinking agents e.g. formaldehyde
  • Although restriction digestion may be considered to introduce some bias into the formation of fragments, it may be acceptable if this bias is taken into account in the analysis of results.
  • frequently cutting restriction enzymes may be used, or a combination of enzymes recognizing different restriction sites e.g., two, three or four different restriction enzymes, may be used.
  • a restriction digest with the enzymes HindIII, NcoI, EcoRI or BglII (6-base cutters) or DpnII or NlaIII (4-base cutters) may be carried out e.g. for 60 min, or overnight at 37°C, and will provide different fragment sizes depending on the genomic distribution of the restriction sequence.
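The dependence of fragment size on the distribution of restriction sites can be illustrated with a small in-silico digest. This is only a sketch: the helper names `fragment_lengths` and `expected_fragment_length` are hypothetical, the cut is placed at the first base of each recognition site rather than at the enzyme's true cut position, and the expected spacing assumes a random sequence with equal base frequencies, which real genomes do not have.

```python
# Recognition sequences of the enzymes named in the text.
SITES = {
    "HindIII": "AAGCTT",
    "NcoI": "CCATGG",
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "DpnII": "GATC",
    "NlaIII": "CATG",
}


def fragment_lengths(sequence, enzymes):
    """Return fragment lengths after cutting a linear sequence.

    Simplification: each cut is placed at the first base of the
    recognition site, not at the enzyme's real cut position.
    """
    seq = sequence.upper()
    cuts = set()
    for name in enzymes:
        site = SITES[name]
        pos = seq.find(site)
        while pos != -1:
            if pos > 0:  # a cut at position 0 would create an empty fragment
                cuts.add(pos)
            pos = seq.find(site, pos + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def expected_fragment_length(site_length):
    """Mean site spacing in random sequence with equal base frequencies."""
    return 4 ** site_length


if __name__ == "__main__":
    toy = "ACGTGATCACGTACGAATTCAA"  # toy sequence, not real genomic data
    print(fragment_lengths(toy, ["DpnII"]))
    print(fragment_lengths(toy, ["DpnII", "EcoRI"]))
    print(expected_fragment_length(4), expected_fragment_length(6))
```

Under these assumptions a 4-base cutter cuts roughly every 256 bp and a 6-base cutter roughly every 4096 bp, which is why combining enzymes or choosing frequent cutters shifts the fragment size distribution downwards.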
  • step (a) (iii) can be preceded by an additional step (a) (iii.0) comprising selection of fragments/complexes of fragments that are bound by a given molecule of interest, in particular a given protein, a given protein post-translational modification, a given RNA (if fragments are DNA) or a given DNA (if fragments are RNA), or a chemical modification of DNA (e.g. DNA methylation) or RNA, or a given protein/nucleic acid complex, or, after targeting a locus with Cas9 complex with guide RNAs.
  • the given molecule of interest is a protein that is bound to chromatin at the time that chromatin forms contacts.
  • Said selection may be carried out by an affinity-based method such as affinity precipitation, e.g. by performing a chromatin immunoprecipitation or pull down using antibodies or other affinity molecules (e.g. aptamer), followed by dividing/aliquoting e.g. the 'beads' used for pull down.
  • While affinity precipitation with antibodies is preferred, other affinity-based selection methods can also be employed, e.g. biotin binding to avidin or derivatives such as streptavidin, e.g. after labelling of chromatin using in vivo biotinylation, or after incorporation of biotin into specific nucleic acid sequences, e.g. by in situ incorporation of Biotin-UTP or Biotin-dUTP into nascent RNA or nascent DNA, respectively.
  • Specific nucleic acids may also be selected by use of hybridizing nucleic acids for selection, e.g., by affinity precipitation. Affinity precipitation can be substituted for by passage over columns comprising a ligand specific for the molecule of interest.
  • Chromatin Immunoprecipitation can be employed (e.g., Collas, 2010. The current state of chromatin immunoprecipitation. Molecular Biotechnology 45(1):87-100; Stock et al. 2007; Brookes et al. 2012). Suitable conditions for specific interaction with the molecule of interest are employed, e.g., conditions for stringent hybridization. Methods disclosed in WO 2014/14152397 A2 may be employed.
  • In step (a)(iii), the nucleic acids in the preparation resulting from the previous steps, e.g., directly from step (a)(ii) or from step (a)(iii.0), are divided (or aliquoted) into fractions to obtain a collection of fractions such that every fraction contains, on average, less than one copy of every locus (e.g. 0.0001-0.9, 0.01-0.7, 0.1-0.6, 0.4-0.5, preferably, about 0.5 copies, i.e. one copy in every other fraction).
  • one locus is seen in every other fraction (i.e. in 50% of the fractions), or in 40% or 30% or 10% or 5% of fractions.
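The relationship between the dilution and how often a locus is seen can be sketched numerically. The sketch below assumes that copies of a locus are distributed independently across fractions, so that the number of copies per fraction is approximately Poisson distributed; this model and the function names are assumptions made for illustration and are not stated in the text.

```python
import math


def mean_copies_per_fraction(total_copies, n_fractions):
    """Average number of copies of one locus that land in each fraction."""
    return total_copies / n_fractions


def presence_probability(mean_copies):
    """Probability that a fraction holds at least one copy (Poisson model)."""
    return 1.0 - math.exp(-mean_copies)


def mean_copies_for_presence(target_presence):
    """Mean copies per fraction needed to see a locus in a given share of fractions."""
    return -math.log(1.0 - target_presence)


if __name__ == "__main__":
    lam = mean_copies_per_fraction(total_copies=96, n_fractions=192)
    print(lam, presence_probability(lam))
    print(mean_copies_for_presence(0.5))
```

Under this assumed model, a mean of 0.5 copies per fraction corresponds to a locus being detected in somewhat fewer than half of the fractions, so the mean copy number and the share of positive fractions coincide only approximately, and more closely at low copy numbers.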
  • the number of fractions depends on the approximate number of loci and the genomic resolution at which the assay will be carried out (i.e. it depends on the total genome length of the organism under study and the length of the loci for which contacts are measured, in other words on the resolution).
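How the chosen resolution sets the size of the problem can be shown with a short calculation. This is a minimal sketch with hypothetical function names; the genome length and locus lengths in the example are placeholder values, not figures from the text.

```python
def number_of_loci(genome_length_bp, locus_length_bp):
    """Number of equal-size loci (windows) needed to tile the genome."""
    return -(-genome_length_bp // locus_length_bp)  # integer ceiling division


def number_of_locus_pairs(n_loci):
    """Number of distinct pairs of loci whose co-segregation can be scored."""
    return n_loci * (n_loci - 1) // 2


if __name__ == "__main__":
    genome = 3_000_000_000  # placeholder genome length in bp
    for locus_length in (1_000_000, 100_000):
        n = number_of_loci(genome, locus_length)
        print(locus_length, n, number_of_locus_pairs(n))
```

Because the number of locus pairs grows with the square of the number of loci, a tenfold finer resolution means roughly a hundredfold more pairs to score, which is one reason the number of fractions has to be matched to the resolution.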
  • the nucleic acids are separated into many fractions.
  • The number of fractions depends on whether only pairwise or also multiple contacts between loci are to be found, and on whether only the most frequent contacts (interactions) (e.g. frequency above 50% across the cell population) or also the least frequent contacts (e.g. 5%) are to be identified.
  • step (a) (iii.0) is used to reduce the complexity of the sample. If step (a) (iii.0) is used, analysis of about 180 fractions (or more) already provides meaningful results.
  • The nucleic acid (often DNA) content of the fractions should be homogenous for the whole analysis, but non-homogenous fractions (e.g. fractions that have excessive DNA content) may be excluded a posteriori once nucleic acid content is mapped; e.g., if using fractions that are supposed to contain approximately 30% of genomic DNA coverage on average, any tubes that contain more than 40% or less than 20% coverage can be excluded, or analysed separately, upon DNA detection.
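The a posteriori exclusion of non-homogenous fractions can be sketched as a simple filter on a presence/absence table. The table layout (one row per fraction, one 0/1 entry per locus) and the function names are assumptions for illustration; the 20% and 40% thresholds are the example values given in the text.

```python
def coverage(row):
    """Share of loci detected in one fraction (row of 0/1 flags)."""
    return sum(row) / len(row)


def split_by_coverage(table, low=0.20, high=0.40):
    """Return (kept, excluded) fraction indices.

    A fraction is excluded if it covers less than `low` or more than
    `high` of the loci; excluded fractions can be analysed separately.
    """
    kept, excluded = [], []
    for index, row in enumerate(table):
        if low <= coverage(row) <= high:
            kept.append(index)
        else:
            excluded.append(index)
    return kept, excluded


if __name__ == "__main__":
    # Toy table: 4 fractions x 10 loci (made-up values).
    table = [
        [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # 30% coverage
        [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # 50% coverage
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 10% coverage
        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 40% coverage
    ]
    print(split_by_coverage(table))
```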
  • These fractions may be obtained from a plurality of cells (or nucleic acid containing cellular compartments) or from single cells.
  • the separation into fractions is preferably done after homogenous division of the fragments and/or cross-linked complexes of fragments.
  • Some fractions will, statistically, contain one or more copies of all possible loci that cover the given genome. This may occur in different situations. Firstly, it may occur when the preparation of fractions of the compartment leads to fractions with very heterogeneous content in terms of number of fragments (e.g. a large chunk of chromatin; Gavrilov A. A. et al. 2013. Disclosure of a structural milieu for the proximity ligation reveals the elusive nature of an active chromatin hub. Nucleic Acids Res. 41, 3563-75). This is an artefact, which can be detected and disregarded in the analysis of the method of the invention. Furthermore, this may happen when the two alleles in a cell interact so closely that they appear in the same fraction. When loci are identified by sequencing, this is not a problem, as it can be measured based on sequence differences due to SNP variation between alleles.
  • the presence or absence of the plurality of loci may be determined by e.g., polymerase-chain reaction (PCR), or preferably, by sequencing, preferably, by next generation sequencing and eventually by the developing single molecule sequencing techniques.
  • the nucleic acids of loci in the fraction are sequenced substantially or completely. This is of particular interest if the method is carried out to detect possible interactions between different loci in a research setting, and a "normal" co-segregation pattern has not yet been established for the cell type of interest in the physiological conditions used.
  • The method of the invention may thus be used to analyse spatial proximity (and, consequently, interactions) of unknown and/or unspecified loci, or of transgenic loci inserted in the genome (e.g. in gene therapy) to study their effects on chromatin contacts.
  • the method can be used to detect specific (and new) species, as the DNA in cells of each species crosslinks with DNA from each species, and is more often found co-segregated.
  • Nucleic acids such as DNA may be analysed by crosslinking, nuclear fractionation (optional), fragmentation (i.e. chromatin preparation or preparation of nucleic acid complexes), dilution and separation into fractions or sub-samples, followed by amplification using single-cell whole genome amplification (WGA; Baslan, T. et al. 2012. Genome-wide copy number analysis of single cells. Nat. Protoc. 7: 1024) (Fig. 4A).
  • WGA-amplified DNA may be sequenced, e.g., using Illumina HiSeq technology. Visual inspection of tracks from single fractions shows that each contains a different complement of sub-chromosomal regions of expected size (Fig. 6, Fig. 14), as expected from sequencing a sub-cellular fraction of chromatin containing fragment lengths of a given genomic length.
  • each fraction contains only a restricted subset of sequences from each chromosome (Fig. 15B).
  • presence or absence of a specific interaction has previously been investigated, so the interacting loci of interest are already known.
  • a significant difference in the frequency with which two loci interact may have been found between different patient groups (e.g., healthy subjects and subjects having a disease, such as a tumor or a congenital disease).
  • presence or absence of the two (or more) loci of interest can also be determined by specific PCR, or by otherwise specifically checking for their presence, e.g., by Southern blot or by Illumina HiSeq technology, after selection of nucleic acids covering locus of interest, e.g.
  • GAM-ch thus preferably combines single copy locus fractionation of a crosslinked chromatin preparation with DNA detection (e.g. by whole genome amplification and next generation sequencing).
  • Because chromatin is crosslinked, loci that are closer to each other in the nuclear space (but not necessarily on the linear genome) are found together in the single molecule fraction more frequently than distant loci (i.e. they co-segregate more frequently, Fig. 2).
  • the frequency of contacts between genomic loci can then be inferred by scoring the presence or absence of loci among a number of aliquots containing a sub-genome sample of fragments (Fig. 2).
  • the resulting table can be used to compute the co-segregation frequency of each locus against every other locus to create a matrix of inferred contact frequencies between loci. Therefore, GAM allows for the calculation of chromatin contacts genome wide without the need for end-to-end ligation between the interacting fragments.
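The step from a presence/absence table to a co-segregation matrix can be sketched as follows. The table layout (one row per fraction, one 0/1 entry per locus) and the function name are assumptions for illustration, and the values are made up.

```python
def cosegregation_matrix(table):
    """Share of fractions in which each pair of loci is found together.

    `table[f][i]` is 1 if locus i was detected in fraction f, else 0.
    The diagonal holds the detection frequency of each single locus.
    """
    n_fractions = len(table)
    n_loci = len(table[0])
    matrix = [[0.0] * n_loci for _ in range(n_loci)]
    for i in range(n_loci):
        for j in range(n_loci):
            together = sum(1 for row in table if row[i] and row[j])
            matrix[i][j] = together / n_fractions
    return matrix


if __name__ == "__main__":
    # Toy table: 4 fractions x 3 loci.
    table = [
        [1, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1],
    ]
    for line in cosegregation_matrix(table):
        print(line)
```

In this toy table, loci 0 and 1 are found together in half of the fractions, whereas loci 1 and 2 are never found together.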
  • Co-segregation may be analysed with a statistical method to determine chromatin contacts. Close spatial proximity can be a sign for specific interaction of loci. Specific interaction of loci may thus also be determined by analysing co-segregation with a statistical method.
  • Statistical methods used in the method of the invention may be, e.g., inferential statistic methods.
  • Statistical methods used in the examples may also be used in the method of the invention to analyse samples of different origin and/or for different loci of interest, e.g., as mentioned herein.
  • the loci are determined to interact specifically, when they co-segregate at a frequency higher than expected from their linear genomic distance on a chromosome. If all possible pairs of loci in the genome at a given genomic (linear) distance are considered, pairs of loci that do NOT interact will be found distributed around an average frequency of chromatin contacts (i.e., co-segregation across the collection of fractions) that depends on the genomic distance between the two loci and the degree of chromatin compaction.
  • the term "contact" is used herein to describe co-segregation across the collection of fractions, i.e., a quantitative measure of interaction. For example, loci that do not interact are considered to have a contact value of zero.
  • interacting pairs will have higher frequencies of chromatin contacts (i.e., co-segregation in the fractions) than the average for that genomic distance; their contact frequency depends on their physical distance in the nucleus of that particular cell type.
  • More complex arguments can also be considered, but an interaction can be most simply defined as a deviation from the random (three-dimensional) arrangement of the chromatin fibre taking into consideration any additional contributing factor(s) to a non-random behaviour.
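One simple way to express "higher than the average for that genomic distance" is an observed/expected ratio. The sketch below is an assumption about how such a comparison could be set up, not the statistical method of the source: windows on one chromosome are indexed by integers, the expected value is the plain mean over all pairs at the same separation, and the function name is hypothetical.

```python
from collections import defaultdict


def observed_over_expected(cosegregation):
    """cosegregation: dict mapping (i, j) with i < j -> co-segregation frequency.

    Returns the same keys mapped to observed / mean(observed at that distance).
    """
    by_distance = defaultdict(list)
    for (i, j), freq in cosegregation.items():
        by_distance[j - i].append(freq)
    expected = {d: sum(v) / len(v) for d, v in by_distance.items()}
    ratios = {}
    for (i, j), freq in cosegregation.items():
        exp = expected[j - i]
        ratios[(i, j)] = freq / exp if exp > 0 else 0.0
    return ratios


# Toy values (hypothetical): three pairs two windows apart, two pairs one window apart.
coseg = {(0, 2): 0.10, (1, 3): 0.10, (2, 4): 0.40, (0, 1): 0.30, (1, 2): 0.30}
ratios = observed_over_expected(coseg)
print(ratios[(2, 4)])  # pair standing out at its distance
```

A ratio clearly above 1 marks a candidate interaction; a real analysis would add a significance test and account for chromatin compaction, as the text notes.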
  • GAM-ch measures the frequency with which two loci co-segregate in the same fraction, and can measure the co-segregation of all genomic loci simultaneously, producing quantitative information that is amenable to (a) the identification of genomic coordinates that more frequently interact with other genomic regions, but also (b) to a wide-range of mathematical treatments that calculate the probability of loci interacting above some random (expected) behaviour.
  • a plurality of loci means two or more loci, optionally, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, at least ten, at least eleven, at least 12, at least 13, at least 15, at least 20, at least 30, at least 40, at least 50, at least 75, at least 100, at least 200, at least 500 or at least 1000 loci and up to several million or billion loci, which are analysed simultaneously. For example, allele-specific analysis of a human cell at 5 kb resolution requires simultaneous analysis of 1.3 million loci.
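The figure of 1.3 million loci can be checked with a short calculation. It assumes a haploid human genome of roughly 3.2 Gb, a value not stated in the source, and counts each 5 kb window twice for the two alleles.

```python
HAPLOID_GENOME_BP = 3_200_000_000  # assumption: approximate haploid human genome size
WINDOW_BP = 5_000                  # 5 kb resolution, as in the text
ALLELES = 2                        # allele-specific analysis of a diploid cell

windows_per_haploid_genome = HAPLOID_GENOME_BP // WINDOW_BP
loci = windows_per_haploid_genome * ALLELES
print(windows_per_haploid_genome, loci)
```

Under this assumption the count is 640,000 windows per haploid genome and 1.28 million allele-specific loci, which rounds to the 1.3 million quoted.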
  • substantially all loci or all loci in a compartment are analysed with the method of the invention, e.g., by sequencing substantially all nucleic acids, preferably, all DNA, in the compartment.
  • the loci to be analysed may be determined in a biased way (e.g. by choosing to analyse all 23000 protein coding genes in a human cell, or all gene promoters or all non-coding regulatory regions, or all enhancers), or in an unbiased way, e.g. by dividing the genome into windows of a certain size, e.g., windows of 100 bp to 10 Mbp, preferably, 1 kbp to 1 Mbp, 5 kbp-50 kbp, or 10 kbp-30 kbp windows.
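The unbiased option, dividing the genome into fixed-size windows, could be sketched as follows. The chromosome names and sizes are toy values, and `make_windows` is a hypothetical helper rather than anything named in the source.

```python
def make_windows(chrom_sizes, window_bp):
    """Split each chromosome into consecutive half-open windows [start, end)."""
    windows = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, window_bp):
            # The last window of a chromosome may be shorter than window_bp.
            windows.append((chrom, start, min(start + window_bp, size)))
    return windows


toy_sizes = {"chrA": 95_000, "chrB": 30_000}  # hypothetical chromosome lengths
windows = make_windows(toy_sizes, 30_000)     # 30 kbp windows
for w in windows:
    print(w)
```

A biased design would instead start from a list of annotated features (for example promoters or enhancers) and use their coordinates as the windows.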
  • the method of the invention can be applied in a way which does not distinguish between different alleles (e.g. the two homologous copies of a gene present in a normal human cell), or, alternatively, it can be used to distinguish the two (or more, in the case of e.g. polyploid amphibian cells) alleles of a locus in the same cell.
  • the method of the invention allows for the detection of multiple co-segregating loci, in particular, more than two co-segregating loci, preferably, more than three, more than four, more than 8, or more than 20, co-segregating loci.
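Because each aliquot records presence or absence of every locus, co-segregation of more than two loci can be scored directly. The following is a minimal sketch with hypothetical locus names and toy data; it only counts joint occurrences and makes no claim about statistical significance.

```python
def joint_cosegregation(table, loci):
    """Fraction of aliquots in which every locus in `loci` was detected."""
    n_fractions = len(next(iter(table.values())))
    hits = sum(
        all(table[locus][i] for locus in loci) for i in range(n_fractions)
    )
    return hits / n_fractions


# Toy presence/absence table (hypothetical), eight aliquots.
table = {
    "enhancer1": [1, 0, 1, 1, 0, 0, 1, 0],
    "enhancer2": [1, 0, 1, 0, 0, 0, 1, 0],
    "promoter":  [1, 0, 1, 1, 0, 1, 1, 0],
    "control":   [0, 1, 0, 0, 1, 0, 0, 1],
}
triple = joint_cosegregation(table, ["enhancer1", "enhancer2", "promoter"])
with_control = joint_cosegregation(table, ["enhancer1", "promoter", "control"])
print(triple, with_control)
```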
  • identification of multiple interactions using 3C-based methods has been attempted and shown to be both inefficient and highly biased (Sexton et al, 2012, Cell 148:458-72).
  • There is mathematical evidence showing that these experimental limitations of 3C-based methods will remain insurmountable, irrespective of incremental improvements (O'Sullivan J.M. et al., 2013, Nucleus 4:390-8).
  • restriction sites are not randomly distributed in the genome, leading to a bias in detection.
  • the efficiency of ligation is affected by the different length of DNA fragments, which adds further bias to 3C-based results.
  • the method of the invention is preferably not or not substantially affected by these biases.
  • step (b) no ligation occurs between nucleic acids originally present in the compartment, in particular, no ligation has to be performed prior to step (b).
  • ligation e.g., with external linkers is possible in the context of detection of the presence or absence of nucleic acid loci, e.g., for amplification or sequencing.
  • the avoidance of ligation of nucleic acids derived from the compartment with each other overcomes the structural bias of 3C-based methods.
  • GAM-ch is unique compared with competing technologies, as it can detect the multiplicity of loci interacting simultaneously, where there are more than three loci interacting at once (such detection being impossible or inefficient by ligation-based 3C methods), and it can also detect all loci present in the compartment and their copy number, irrespective of whether they are found to participate in an interaction, which allows important corrections to be made in the contact maps. It is also one of the advantages of the method of the invention that it can be used to identify spatial proximity of loci which were not known before the method was carried out, i.e., interactions can be identified between newly discovered or non-defined loci. The present invention also provides the use of the method of the invention for
  • the method of the invention may be used to determine specific interactions, and is capable of differentiating leading interactions from bystander interactions;
  • (b) mapping loci and/or genome architecture in the compartment.
  • a map, in particular a matrix, can be drawn up for specific loci or the chromosomal architecture based on the co-segregation frequencies determined;
  • Chromosomal insertion of a nucleic acid due to gene therapy or other genetic engineering approaches may affect genome architecture, e.g., it may enhance or prevent interaction of regulatory regions with specific promoters and thus affect transcription of "unrelated" genes.
  • the expression pattern of the introduced nucleic acid may itself depend on, or be disrupted by, its interactions with endogenous regulatory regions.
  • the method of the invention allows for assessment of the effects of gene therapy or genetic engineering on the level of interaction between different loci;
  • mapping chromosomal rearrangements e.g., in cancer, including in specific sub-tissue cell populations, e.g. to study clonal evolution of rearrangements;
  • identifying a species in a mixture of species, e.g., identifying a potentially novel microorganism species in a mixture of species
  • the method of the invention may be used in identification of species in microbial communities, e.g. as described for Hi-C in Burton et al. (2014, G3 4, 1339-1346).
  • step (a)(ii) specifically mapping contacts mediated by a defined factor (or molecule of interest, e.g., protein, RNA, DNA and/or their modifications), e.g., by extracting said factor and associated complexes of fragments after step (a)(ii) is carried out, e.g., by immunoprecipitation of the defined protein and associated complexes of fragments (step (a) (iii.O)).
  • Option (1) may be of specific interest, as it reduces the complexity of the sample.
  • the present invention thus also provides a method of diagnosing a disease associated with a disturbed co-segregation of loci in a patient, comprising, in a sample taken from said patient, analysing co-segregation of a plurality of loci in the patient, and comparing said co-segregation with co-segregation of said loci in a subject already diagnosed with said disease, wherein the co-segregation is preferably also compared with co-segregation in a healthy subject.
  • co-segregation of loci may be compared between specific sub-groups of cells, which may be derived from the same patient, e.g., tumor cells and normal tissue.
  • Co-segregation can also be analysed in different cell types upon derivation of pluripotent stem cells from the patient, or model organism, and their experimental differentiation into specific cell types through laboratory culture in appropriate conditions, e.g. in the presence of the appropriate factors, in the suited container, at the appropriate temperature, e.g. 37°C for human samples.
  • "a" is meant to refer to "at least one", if not specifically mentioned otherwise.
  • the present invention may be used to investigate a disturbed co-segregation of loci in a patient, i.e., chromatin misfolding, it may also inform or guide the treatment of patients having a disease associated with chromatin misfolding, as such patients may, after diagnosis with a method of the invention, be treated to correct chromatin misfolding (Deng W., Blobel G., 2014, Curr Op Genet Dev. 25: 1-7). The present invention may then be used to monitor the effects of such treatments on chromatin misfolding.
  • Fig. 1 Limitations of current 3C-based methods due to dependency on ligation of DNA ends for capturing contacts between nucleic acids.
  • In 3C-based methods, the presence of multiple loci in a single interaction may dilute the measured ligation frequency between any two loci that are members of the interaction.
  • In GAM-ch, the measured interaction is not affected by multiplicity.
  • Fig. 2 Outline of the GAM-ch method. Chromatin is prepared from mildly-fixed cells and randomly fragmented (1). Crosslinked chromatin is divided (aliquoted) across tubes to have ⁇ 1 haploid genome equivalent per tube (2). The DNA content of each tube is determined to assess the co-segregation of genomic sequences across tubes (3). Co-segregation of genomic sequences reflects chromatin contacts of genomic sequences in the cell nucleus dependent on protein-protein and protein-RNA bridged interactions and is used to measure long-range chromatin interactions.
  • A Schematic presentation of the mouse β-globin gene cluster (adapted from Tolhuis, B. et al. (2002). Looping and interaction between hypersensitive sites in the active beta-globin locus. Molecular Cell 10, 1453). Arrows and circles depict the individual hypersensitive sites.
  • the β-globin genes are indicated by triangles, with active genes (βmaj and βmin) in grey and inactive genes (εy and βh1) in black.
  • the olfactory receptor (OR) genes are indicated by white boxes, of which some were shown to interact with the β-globin gene cluster. Grey boxes also indicate other gene loci (3' olfactory receptor genes, Uros and Eraf), which were shown to interact with the β-globin gene cluster in embryonic liver tissue.
  • LCR Locus Control Region.
  • B A hypothetical 3D model of the active chromatin hub (ACH) based on population-based 3C data from Tolhuis et al. (2002). Neither the size of the ACH nor the actual position of the elements relative to each other is to scale. Hypersensitive sites and active genes of the locus form a hub of hyper-accessible chromatin (ACH). The inactive regions of the locus, having a more compact chromatin structure, are indicated in grey, with the inactive εy and βh1 genes in lighter grey. The olfactory genes are not shown. The interactions in the ACH would be dynamic in nature, in particular with the active genes (βmaj and βmin), which are alternately transcribed.
  • Crosslinking frequency with value 1 arbitrarily corresponds to the crosslinking frequency between two neighbouring control fragments within the Calreticulin (CALR) gene locus, which is expressed at similar levels in the two tissues.
  • a schematic illustration of the mouse β-globin gene cluster is depicted; the grey shading represents the position and size of fragments generated by HindIII restriction.
  • the quality of the chromatin preparation was validated by 3C at four regions of the murine β-globin gene cluster in fetal liver and brain cells. Fetal liver and brain cells from E14.5 mouse embryos were fixed (5 or 10 min) in 2% formaldehyde, digested with HindIII and ligated under highly diluted conditions. Ligation products were quantified by qPCR using the 3' end hypersensitive site (3'HS1) as bait. Means and SEM are shown. The black vertical line indicates the position and size of the 3C-bait fragment containing 3'HS1.
  • Crosslinking frequency with a value of 1 arbitrarily corresponds to the crosslinking frequency between two neighbouring control fragments (with analyzed restriction sites being 8.3 kb apart) within the Ercc3 gene locus (on chromosome 18), which is expressed at similar levels in fetal liver and brain. Black bars indicate the position of primer pairs used for 3C.
  • PK proteinase K
  • GAM-ch samples marked with an asterisk were used for library preparation.
  • WGA-amplified DNA was fragmented to ~400 bp using Covaris, and amplified using the Illumina library mate-pair kit. DNA fragments were excised (350-650 bp for ~0.2 and ~10 genomes, 200-650 bp for ~0.7 genomes), quantified and sequenced.
  • GAM-ch is also designated xGAM.
  • Fig. 6 Mapping of GAM-ch-seq datasets corresponding to ~0.2 and ~10 genomes in comparison with linear DNA.
  • Gaps are defined as regions which are not covered by reads.
  • the sequencing depth is calculated by dividing the genome into identical windows and counting the number of nucleotides covered by reads, which fall into each window.
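The two quantities defined above, gap size and sequencing depth per window, can be sketched as follows. The sketch assumes reads on one chromosome are given as half-open (start, end) intervals and interprets "nucleotides covered by reads" as the summed read nucleotides falling in each window; both the interpretation and the function names are assumptions.

```python
def merge_intervals(reads):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(reads):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def gap_sizes(reads):
    """Lengths of uncovered stretches between adjacent covered regions."""
    covered = merge_intervals(reads)
    return [nxt[0] - cur[1] for cur, nxt in zip(covered, covered[1:])]


def depth_per_window(reads, window_bp, n_windows):
    """Sum of read nucleotides per window; reads spanning a boundary are split."""
    depth = [0] * n_windows
    for start, end in reads:
        pos = start
        while pos < end:
            w = pos // window_bp
            chunk = min(end, (w + 1) * window_bp) - pos
            if w < n_windows:
                depth[w] += chunk
            pos += chunk
    return depth


# Toy reads (hypothetical): two overlapping 72 nt reads, one read crossing a
# window boundary, and one 36 nt read.
reads = [(100, 172), (150, 222), (900, 1036), (2500, 2536)]
print(gap_sizes(reads))
print(depth_per_window(reads, 1000, 3))
```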
  • Fig. 8 Gap-size (A) and sequencing depth (B) distributions for 10 ng of linear DNA and GAM-ch (xGAM) samples at ~0.2 and ~10 genomes.
  • X axes represent the gap-sizes and sequencing depth at 1 kb windows (bp) in log10 scale.
  • Y axes represent Kernel probability densities.
  • Graphs are plotted using the density function in R.
  • Fig. 9 Thresholds from Gaussian fitting to GAM-ch fractions with ⁇ 0.2 genomes.
  • the threshold is defined as the number of reads for which the height of the Gaussian fit ( ⁇ in dotted thick line) equals the height of the entire sequencing depth distribution (Ay in thin grey line).
  • X-axes represent the sequencing depth at 1 kb windows in the log10 scale.
  • Y-axes represent the Kernel probability densities.
  • Fig. 10 Number of "positive windows" detected by randomly sampling the original datasets of ~0.2 genomes (10 to 100%, 12 pM).
  • Erosion of reads from the GAM-ch ~0.2 genome dataset shows only a mild change in detected "positive windows" when randomly sampling ~60% of reads. Information is markedly lost when ⁇ 30% of reads are considered.
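A read-erosion experiment of this kind can be sketched as random down-sampling followed by re-counting positive windows. The read positions, the 4 kb window size, the minimum of 5 reads per window and the function names below are all illustrative assumptions; the source derives its threshold from the analysis in Fig. 9.

```python
import random


def positive_windows(read_positions, window_bp, min_reads):
    """Windows containing at least `min_reads` read start positions."""
    counts = {}
    for pos in read_positions:
        w = pos // window_bp
        counts[w] = counts.get(w, 0) + 1
    return {w for w, c in counts.items() if c >= min_reads}


def erode(read_positions, fraction, seed=0):
    """Randomly keep the given fraction of reads (without replacement)."""
    rng = random.Random(seed)
    k = round(len(read_positions) * fraction)
    return rng.sample(read_positions, k)


# Toy dataset (hypothetical): five 4 kb windows with 20 reads each.
reads = [w * 4000 + i * 10 for w in range(5) for i in range(20)]
full = positive_windows(reads, 4000, 5)

for fraction in (1.0, 0.6, 0.3, 0.1):
    kept = erode(reads, fraction)
    print(fraction, len(positive_windows(kept, 4000, 5)))
```

Plotting the number of positive windows against the retained fraction gives a saturation curve; a plateau at high fractions suggests that sequencing depth is not limiting.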
  • the threshold used here for the detection of 4 kb windows is based on the residual analysis in Fig. 9.
  • Fig. 11 Outline of the GAM-ch method in combination with immunoprecipitation of chromatin bound by, e.g., RNA polymerase II.
  • Chromatin crosslinking and fragmentation e.g., chromatin is prepared from mildly fixed cells and randomly fragmented, e.g. by sonication (1).
  • fragment enrichment e.g., by immunoprecipitation of a specific chromatin-bound protein such as RNA polymerase II (2).
  • Division of the fragmented nucleic acids to obtain a collection of fractions (every fraction contains ⁇ 1 copy of every locus, typically ⁇ 0.5 copies).
  • crosslinked chromatin is either directly divided (aliquoted) across tubes to have ⁇ 1 haploid genome equivalent per tube (3a), or (optionally) first enriched for chromatin occupied by a given protein (or other bound molecule of interest), e.g. by chromatin immunoprecipitation (3b).
  • Extract and detect nucleic acids, e.g., the DNA content of each tube is extracted and identified to assess the co-segregation of genomic sequences across tubes (4). Co-segregation of genomic sequences reflects chromatin contacts of genomic sequences in the cell nucleus dependent on protein-protein and protein-RNA bridged interactions and is used to measure long-range chromatin interactions. Boxes: Enhancers, thick black line: active gene, medium thick line: inactive gene, arrows: promoters.
  • RNA polymerase II occupies active gene promoters, coding regions and enhancers.
  • RNAPII-S5p ChIP-seq signal at promoters, also called transcription start sites (TSS)
  • TSS transcription start sites
  • TES transcription end/termination sites
  • Transcriptionally silent genes are not occupied by RNAPII-S5p.
  • the average occupancy profiles are represented at ±5 kb windows centered at the transcription start site (TSS) or transcription end site (TES). All mouse genes were ranked by their expression levels determined by mRNA-seq in mouse ESCs (Brookes et al. 2012); the top 25% of genes were then selected as the most actively transcribed genes and the bottom 25% as the most transcriptionally silent.
  • RNA polymerase II co-associates with enhancers.
  • RNAPII-S5p is present at enhancers defined in murine ESCs according to Whyte et al. (2013). The background level of ChIP signal was determined by a control ChIP experiment using a non-specific antibody against the plant steroid digoxigenin.
  • RNAPII-S5p occupancy determined by ChIP combined with quantitative PCR at active, Polycomb-repressed and inactive genes. Quantitative PCR confirms the expected enrichment of RNAPII-S5p at active (Oct4) and Polycomb-repressed (Nkx2.2, HoxA7) genes, and its absence at an inactive (Myf5) gene (Stock et al. 2007). Background levels (mean enrichment after ChIP with a non-specific antibody against the plant steroid digoxigenin) at promoter and coding regions are shown in black bars. Means and standard deviations from three biological replicates are shown.
  • ChIP-enriched positive windows for different starting amounts of chromatin immunoprecipitated DNA.
  • the percentage of positive windows for GAM-chIP dataset is higher for GAM-chIP samples with larger amounts of input DNA.
  • ChIP-enriched positive windows were determined by the number of reads in each 5 kb window from published ChIP-seq RNAPII-S5p data obtained in mESC (Brookes et al. 2012). The top 2% of 5 kb windows were taken as the genomic windows most enriched for RNAPII-S5p.
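Selecting the most enriched windows by rank could look like the sketch below. The read counts are toy values and `top_percent_windows` is a hypothetical helper; how ties at the cut-off were handled in the source is not stated, so this version simply keeps the first windows in sorted order.

```python
def top_percent_windows(read_counts, percent=2):
    """Return the window ids with the highest read counts (top `percent` %)."""
    n_top = max(1, len(read_counts) * percent // 100)
    ranked = sorted(read_counts, key=lambda w: read_counts[w], reverse=True)
    return set(ranked[:n_top])


# Toy counts (hypothetical): 100 windows, window id w has w reads.
toy_counts = {w: w for w in range(100)}
enriched = top_percent_windows(toy_counts)
print(sorted(enriched))
```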
  • Fig. 14 GAM-chIP raw data and detection of positive genomic windows.
  • GAM-chIP profiles of raw sequencing data across two genes show that more positive windows are detected across an actively transcribed gene than an inactive gene. Represented tracks from top to bottom: 1 - RNAPII-S5p ChIP-seq in mESC; 2 - cumulative window detection frequency across 182 GAM-chIP datasets; 3-7 - raw sequencing data for five randomly chosen GAM-chIP datasets together with representation of positive windows defined by fitting binomial distributions (black horizontal bars) or by the JAMM peak-finder approach (striped horizontal bars); 8 - raw sequencing data for a control sample containing no chromatin immunoprecipitated material (water control). Images were obtained from the UCSC Genome Browser using mean as the windowing function. A schematic representation of the genes present in the selected regions is shown underneath.
  • Fig. 15 Quality controls of GAM-chIP dataset.
  • Each GAM-chIP sample contains only a restricted subset of sequences from each chromosome. Each mouse chromosome was divided into 5 kb windows, and the percentage of positive 5kb windows was plotted for each chromosome and for each GAM-chIP sample. No GAM-chIP sample contains more than 12% of any given chromosome, and all chromosomes are comparable in coverage except for chromosome X, which is present in only a single copy (whereas autosomal chromosomes are present in two copies), as expected in the male ESC line used.
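The per-chromosome quality control described above amounts to a simple percentage per sample. The sketch below uses toy numbers and a hypothetical function name; it only illustrates the bookkeeping, including the expectation that a single-copy chromosome X is detected about half as often as an autosome.

```python
def percent_positive(positive, windows_per_chrom):
    """Percentage of windows detected per chromosome in one sample.

    positive: dict chrom -> set of positive window indices.
    windows_per_chrom: dict chrom -> total number of windows on that chromosome.
    """
    return {
        chrom: 100.0 * len(positive.get(chrom, ())) / total
        for chrom, total in windows_per_chrom.items()
    }


# Toy sample (hypothetical): an autosome and a single-copy chromosome X.
windows_per_chrom = {"chr1": 200, "chrX": 100}
positive = {"chr1": set(range(20)), "chrX": set(range(5))}
coverage = percent_positive(positive, windows_per_chrom)
print(coverage)
```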
  • RNAPII-S5p occupancy in ChIP-seq datasets (from published data; Brookes et al. 2012).
  • the TSS-overlapping 5 kb windows with the least binding of RNAPII-S5p are detected in 4.4% of GAM-chIP samples on average, whereas those with the most abundant binding are detected in an average of 12.5% of GAM-chIP samples.
  • the percentage of 5 kb positive windows overlapping transcriptionally active genes and enhancers is higher than the percentage of 5 kb positive windows overlapping transcriptionally silent genes.
  • the percentage of positive windows is shown for gene body (gene), promoters (transcription start site, TSS) and transcription end site (TES).
  • the sets of most actively transcribed and of most silent genes were chosen based on their expression levels, as determined by mRNA-seq in a published dataset (Brookes et al. 2012). Positive 5 kb windows overlap gene promoters with high RNAPII-S5p levels (as determined from a published ChIP-seq dataset; Brookes et al. 2012) more often than gene promoters with low RNAPII-S5p levels.
  • Fig. 16 Co-segregation of genomic windows within actively transcribed genes in GAM-chIP samples. GAM-chIP samples containing multiple positive windows from the same actively transcribed gene occur more frequently than GAM-chIP samples containing multiple positive windows from the same silent genes and more often than would be expected by chance, confirming that chromatin contacts can be formed within actively transcribed genes during transcription (as schematized in Fig. 11).
  • Fig. 17 Co-segregation of genic regions of actively transcribed genes coincides with preferential co-segregation of nearby enhancers in GAM-chIP samples.
  • For active (but not for silent) genes, the nearest enhancer was more frequently observed in the GAM-chIP samples with the highest number of positively detected intragenic windows.
  • co-segregation of nearby enhancers in the same GAM-chIP samples as actively transcribed genes is indicative of a chromatin interaction between the enhancer and gene during transcription.
  • HAPPY Mapping is based on the co-segregation and detection of nearby DNA markers in the genome and uses limiting dilutions of fragmented DNA to single molecule contents.
  • LOD logarithm of the odds
  • GAM-ch applies the basic principle of HAPPY Mapping to a different purpose: instead of measuring linear genomic distances, it measures long-range chromatin interactions between any genomic regions within the three-dimensional cell.
  • Cells are first treated with a crosslinking agent which, for example, chemically crosslinks proximal genomic regions in the same or different chromosomes, before chromatin fractionation.
  • GAM-ch detects chromatin proximity but does not require ligation of crosslinked DNA fragments.
  • In GAM-ch, chromatin preparations similar to 3C are prepared and diluted as for HAPPY Mapping, before quantification of co-segregation frequency; genomic regions that are bridged by proteins and crosslinked during the chromatin preparation will co-segregate more frequently than genomic regions that do not interact (Fig. 2).
  • GAM-ch can provide single allele information about multiplicity of interactions, i.e. multiple genomic regions interacting at the same time with a given allele.
  • In 3C, a given DNA fragment in a high multiplicity chromatin interaction can only ligate with one or two (at high restriction and ligation efficiency) other DNA fragments.
  • This limitation of 3C makes it difficult to distinguish, for example, between a low-frequency chromatin interaction involving only two fragments and an interaction that involves many genomic partners at high frequency across the cell population (Fig. 1).
  • the same 3C signal, e.g. a measured contact of 50%, can be due to an interaction that occurs for half the alleles in the cell population if the multiplicity of interaction is only two (or possibly three), or be due to an interaction that occurs in all alleles (real contact frequency is 100%) but is underestimated to only 50% due to competition with other bound DNA fragments that co-bind at high multiplicity, thereby diluting the probability of ligation between any single fragment with all others.
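The ambiguity can be made concrete with a deliberately simplified toy model, which is an assumption and not a model given in the source: each fragment in a complex ligates to exactly one partner, chosen uniformly among the other members, so the pairwise signal is the true interaction frequency divided by (multiplicity - 1).

```python
def measured_pairwise_contact(true_frequency, multiplicity):
    """Toy model: pairwise ligation signal for one pair within a complex.

    true_frequency: fraction of alleles in which the complex exists.
    multiplicity: number of fragments in the complex (>= 2).
    Assumes a single ligation partner per fragment, chosen uniformly.
    """
    if multiplicity < 2:
        raise ValueError("an interaction needs at least two fragments")
    return true_frequency / (multiplicity - 1)


pairwise_half_of_alleles = measured_pairwise_contact(0.5, 2)
hub_in_all_alleles = measured_pairwise_contact(1.0, 3)
print(pairwise_half_of_alleles, hub_in_all_alleles)
```

Under this toy model, a two-fragment interaction present in half the alleles and a three-fragment hub present in every allele yield the same pairwise signal, which is the ambiguity the text describes.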
  • each GAM fraction was subjected first to WGA fragmentation, primer ligation and PCR amplification. WGA-amplified GAM-ch samples were then further amplified using the Illumina library preparation, which adds new sets of primers at each end of the DNA fragments. GAM-ch-seq samples were sequenced using the Illumina sequencing platform (Table 1). As recent 3C-based genome-wide mapping approaches use HindIII digestion, instead of sonication, this approach was also adopted here. Validation of HindIII-digested chromatin preparations was performed by 3C analyses (Fig. 3D). Linear DNA was used in parallel to test the effects of WGA and high-throughput sequencing on sequence representation, and as a positive control.
  • GAM-ch samples were prepared for Illumina sequencing as described for 3C and validated by 3C-qPCR using published primer sequences (Fig. 3D). Nuclei from fetal liver cells, fixed for 5 min, were extracted, counted using a haemocytometer, subjected to digestion with HindIII (digestion efficiency of ~77%), and aliquots of ~100 genomes/µL were prepared and frozen for further use. Different genome numbers of 3C-like chromatin were first subjected to WGA fragmentation (1 h at 50°C with PK and 4 min at 99°C) and amplification (~0.2, ~0.7 and ~10 genomes/tube; Fig. 4B). Linear human DNA (2 ng; provided with the WGA kit) was used as a positive control for the WGA reaction.
  • Fragment sizes of crosslinked chromatin range from ~0.3 to 2 kb, whereas linear DNA is less fragmented upon WGA, probably due to lower-sized DNA fragments present in HindIII-digested chromatin (average distance between HindIII restriction sites is ~4 kb in the mouse genome).
  • GAM-ch samples of ~0.2 genomes did not show visible products on ethidium bromide gels after WGA amplification (Fig. 4B), but yielded visible products upon preparation of the sequencing library (Fig. 6).
  • GAM-ch samples were subjected to Illumina library preparation and DNA fragments were size-selected (350-650 bp for ~0.2 and ~10 genomes, 200-650 bp for ~0.7 genomes) and sequenced. Since the ~0.7 genome GAM-ch sample showed less-intense WGA products, DNA fragments from a wider size range were excised and sequenced. Linear mouse DNA was also amplified by WGA and Illumina library kits (not shown) and sequenced in parallel.
  • Each unmappable read is trimmed at its 5' end by 36 nts and mapped back to the genome. For the remaining reads that still do not align, 36 nts are trimmed at the 3' end of the read, and the resulting 36 nt read is realigned to the genome.
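The fallback order of this trimming strategy can be sketched as below. The sketch assumes that the 3' trim is applied to the original read rather than to the already 5'-trimmed one, and it uses a hypothetical exact-match aligner on a toy sequence in place of a real read mapper; the trim length is shortened to 6 nt so the toy example stays readable.

```python
def align_with_trimming(read, aligner, trim=36):
    """Try the full read, then the 5'-trimmed read, then the 3'-trimmed read.

    aligner: callable returning a genome position or None (hypothetical stand-in
    for a real read mapper).
    """
    attempts = (
        ("full", read),
        ("5'-trimmed", read[trim:]),   # drop `trim` nt from the 5' end
        ("3'-trimmed", read[:-trim]),  # drop `trim` nt from the 3' end
    )
    for label, candidate in attempts:
        if not candidate:
            continue
        position = aligner(candidate)
        if position is not None:
            return position, label
    return None, "unmapped"


def make_exact_aligner(genome):
    """Toy aligner: exact substring search, first occurrence only."""
    def aligner(seq):
        index = genome.find(seq)
        return index if index >= 0 else None
    return aligner


genome = "ACGTACGGTTCAGGCTAAGCTTGACCA"  # toy reference sequence
aligner = make_exact_aligner(genome)

read_a = "NNNNNN" + genome[3:9]    # unmappable 5' end
read_b = genome[10:16] + "NNNNNN"  # unmappable 3' end
print(align_with_trimming(read_a, aligner, trim=6))
print(align_with_trimming(read_b, aligner, trim=6))
```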
  • This trimming strategy increased the overall percentage of alignment to ~54 ± 6% (Fig. 5B). This trimming pipeline is not necessary for libraries produced using Illumina Nextera library kits, as the library production relies on tagmentation.
  • GAM-ch libraries obtained from ~0.2 genomes show a more clustered distribution of sequencing reads with higher enrichment, as expected due to lower genomic content. This is consistent with a lower diversity of DNA fragments in the ~0.2 genome libraries. The higher enrichment suggests that the amount of sequence obtained may already be sufficient to over-represent this diversity.
  • the first step in the analysis of GAM-ch samples is to detect DNA fragments that are present or absent in each GAM-ch sample analysed with subgenomic content. This requires the definition of background read distribution, and a decision about an appropriate window size.
  • the window size should reflect the average size of the DNA fragments present in 3C-like chromatin. For HindIII restriction, this corresponds to ~4 kb fragments.
  • Two different statistical approaches were performed to analyse and to compare sequencing results from multiple libraries. First, the distribution of the gap-size between adjacent covered areas of the genome was analysed, and second the sequencing depth at different window sizes was studied (Fig. 7). Both approaches were used to analyse the sequencing results from linear DNA and GAM-ch samples (Fig. 8).
  • the content of GAM-ch samples with ~10 genomes also shows an even distribution across the genome, meaning that the whole genome is covered, which suggests that DNA extraction from 3C-like chromatin is efficient.
  • the average gap size peaks at ~1 kb (Fig. 8A) and displays a second population of gap-sizes of ~100 bp. This may reflect the fact that not all genomic regions are represented in this low DNA content sample; it can be the result of interacting DNA sequences within short range distances (as seen in 4C results) being frequently brought together due to crosslinking; further sequencing experiments and analyses are currently ongoing to investigate the significance of the different gap distributions.
  • sequenced reads in the ~10 genomes sample are sequenced multiple times, such that each 1 kb window is covered by more reads, with the distribution of sequencing depths peaking at ~500 nts per 1 kb window (Fig. 8B). Since sequences are a mix of 36 and 72 nt reads, which will appear as single spikes representing a multiple unit of 36 nts in the sequencing depth distribution, each average read would contain about 50 nts ((36+72)/2 = 54). Therefore, each 1 kb window of the ~10 genomes GAM-ch sample would be covered by ~10 reads. In addition, many windows with <10 reads exist, which are visualized by the left spiky tail in the sequencing depth curve and are hardly distinguishable from the main population of windows with 10 reads.
  • GAM-ch samples with ~0.2 genomes contain only a fraction of the genome, as seen in wider gaps in the read distribution and a gap-size peaking at ~50 kb, with additional shoulders reflecting non-random spacing between DNA fragments; this is consistent with the presence of chromatin interactions in these few GAM-ch samples.
  • the less diverse set of fragments that are sequenced in the ~0.2 genomes sample are sequenced more frequently than fragments in the GAM-ch ~10 genomes sample, resulting in about 5000 sequenced nucleotides in each 1 kb window, corresponding to ~100 reads per 1 kb window.
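The read counts per window quoted for the two samples follow from dividing the sequenced nucleotides per 1 kb window by an average read length. The short calculation below uses the unrounded mean of the 36 and 72 nt reads and assumes the two read lengths are equally frequent, which the source implies but does not state.

```python
READ_LENGTHS_NT = (36, 72)
mean_read_nt = sum(READ_LENGTHS_NT) / len(READ_LENGTHS_NT)

depth_10_genomes_nt = 500    # nucleotides per 1 kb window, sample with about 10 genomes
depth_02_genomes_nt = 5000   # nucleotides per 1 kb window, sample with about 0.2 genomes

reads_10_genomes = depth_10_genomes_nt / mean_read_nt
reads_02_genomes = depth_02_genomes_nt / mean_read_nt
print(mean_read_nt, round(reads_10_genomes, 1), round(reads_02_genomes, 1))
```

With the unrounded mean of 54 nt the estimates come out slightly below the round figures of 10 and 100 reads per window, which the text obtains by taking the mean read length as about 50 nt.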
  • the GAM-ch sample with ~10 genomes did not have enough sequencing depth to sufficiently resolve the signal from the noise distribution.
  • the threshold is 790 nts (~11 reads with 72 nts), in the same order of magnitude as the residual distribution approach.
  • GAM-chIP: combining GAM-ch with immunoprecipitation to capture fragments co-occupied by RNA polymerase II phosphorylated on Serine-5.
  • the DNA fragments bound by a specific protein are selected from the bulk chromatin, e.g. by chromatin immunoprecipitation (ChIP), a strategy called GAM-chIP.
  • GAM-chIP is performed with an additional step in which crosslinked chromatin fragments, containing a given protein or protein post-translational modification, are first selected prior to their dilution between tubes, e.g. to enrich for fragments containing genes and regulatory regions (enhancers) (Fig. 11). Including this additional selection step has two advantages: first it allows for detection of chromatin contacts which are formed in the presence of the given protein or protein post-translational modification.
  • DNA fragments bound by RNA polymerase II phosphorylated on the Serine-5 residue of the CTD, which we abbreviate to RNAPII-S5p.
  • RNAPII-S5p was chosen because it has high occupancy at active genes, especially at promoters, throughout coding regions and transcription termination sites, and enhancers (Fig. 12A,B). Combining GAM-ch with ChIP for RNAPII-S5p therefore has the potential of increasing the power of GAM-ch to detect contacts between enhancers and their target genes.
  • chromatin was crosslinked using formaldehyde and fragmented by sonication, then chromatin fragments bound by RNAPII-S5p were selected by immunoprecipitation using a specific antibody coupled to beads (CTD-4H8, Covance; according to Brookes et al 2012).
  • CTD-4H8, Covance: a specific antibody coupled to beads
  • fragments resulting from ChIP were eluted from beads, and fractionated/diluted into a multitude of fractions and WGA amplified.
  • ChIP of RNAPII-S5p bound DNA fragments was performed as described previously (Stock et al. 2007; Brookes et al. 2012).
  • Mouse embryonic stem cells (ESCs) were fixed in 1% formaldehyde for 10 min. Nuclei were then extracted, counted using a haemocytometer, and chromatin was extracted using sonication. Sonicated chromatin fragments bound by RNAPII-S5p were selected by immunoprecipitation.
  • RNAPII-S5p was validated using quantitative PCR of DNA fragments known to be bound by RNAPII-S5p in mouse ESCs, namely promoters and coding regions of active and Polycomb-repressed genes (Fig. 12C); inactive gene promoter and coding region were used as negative control.
  • a control ChIP experiment was performed with nonspecific antibody against plant steroid digoxigenin, which showed no DNA fragment enrichment, as expected (Stock et al. 2007; Brookes et al. 2012). This analysis demonstrated that the antibody immunoprecipitation step had successfully and efficiently selected RNAPII-S5p-bound chromatin fragments (Fig. 12C).
  • the immunoprecipitated chromatin material was divided (aliquoted) into multiple tubes at the chosen dilution factor based on the measured DNA concentration.
  • GAM-chIP samples show a fragment size distribution of ~100 bp to ~1200 bp following WGA amplification (Fig. 13A; slightly smaller than for GAM-ch samples prepared by HindIII digestion without chromatin immunoprecipitation; Fig. 4B).
  • the fragment size distributions and the amount of DNA after amplification were comparable between different samples prepared from the same concentration of input DNA (Fig. 13B).
  • GAM-chIP samples from the first two exploratory experiments were subjected to Illumina TruSeq Nano library preparation (Table 2).
  • An exploratory GAM-chIP dataset was collected consisting of 182 GAM-chIP samples (Table 2, GAM-chIP Exp003), each generated from 1 pg of chromatin after ChIP for RNAPII-S5p, plus four positive controls containing 500 pg of the same chromatin, and four negative controls where no chromatin was added (water control).
  • GAM-chIP samples in this exploratory collection were WGA amplified and subjected to Illumina Nextera XT library preparation. DNA fragments from 300-500 bp were size-selected and sequenced.
  • the mouse genome was divided into 5 kb windows and the number of sequencing reads mapping to each window was calculated.
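The windowing step described above can be sketched in a few lines. This is only an illustration: the input format (pairs of chromosome name and 0-based read start) and the function name `count_reads_per_window` are assumptions made for the example, not part of the described pipeline, which used bedtools for coverage.

```python
from collections import Counter

WINDOW_SIZE = 5_000  # 5 kb windows, as stated in the text


def count_reads_per_window(read_positions, window_size=WINDOW_SIZE):
    """Count mapped reads per (chromosome, window index).

    read_positions: iterable of (chromosome, 0-based start) pairs.
    This input format is a hypothetical simplification.
    """
    counts = Counter()
    for chrom, start in read_positions:
        # Integer division assigns each read to the window containing its start.
        counts[(chrom, start // window_size)] += 1
    return counts


# Toy example with four reads.
reads = [("chr1", 120), ("chr1", 4_999), ("chr1", 5_000), ("chr2", 12_345)]
window_counts = count_reads_per_window(reads)
```

Assigning a read by its start coordinate is a design shortcut; reads spanning a window boundary could instead be counted in both windows.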
  • a two-curve fitting strategy was applied to distinguish signal from noise in GAM-chIP datasets.
  • the distribution of sequencing depth over 5 kb windows was fit with a negative binomial distribution (representing sequencing noise) and a lognormal distribution (representing true signal).
  • a threshold number of reads x was determined, where the probability of observing more than x "noise" reads mapping to a single genomic window was less than 0.001. Such a threshold was thus independently determined for each sample, and windows were scored as positive if the number of sequenced reads was greater than the determined threshold.
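A minimal sketch of the threshold step, assuming the negative binomial noise parameters have already been obtained from the curve fit. The parameterisation (`r` successes, success probability `p`, counting failures) and the function names are assumptions for illustration; the fitting itself is not shown.

```python
import math


def nbinom_pmf(k, r, p):
    """Negative binomial probability of k 'noise' reads (r, p parameterisation)."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def noise_threshold(r, p, alpha=0.001, max_reads=100_000):
    """Smallest x such that P(noise reads > x) < alpha."""
    cumulative = 0.0
    for x in range(max_reads + 1):
        cumulative += nbinom_pmf(x, r, p)
        if 1.0 - cumulative < alpha:
            return x
    raise ValueError("threshold not reached; check the fitted parameters")


def call_positive_windows(window_counts, threshold):
    """Windows whose read count is greater than the per-sample threshold."""
    return {w for w, n in window_counts.items() if n > threshold}
```

Because the parameters are fitted per sample, `noise_threshold` would be called once per sample and the result passed to `call_positive_windows` for that sample only.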
  • GAM-ch and GAM-chIP experiments have the greatest statistical power when the chance of a given tube containing a given locus of interest is ~0.5.
  • the loci of interest are those which are bound by the protein targeted for enrichment, which can be identified by sequencing the bulk immunoprecipitated chromatin (ChIP-seq) without dilution and WGA amplification.
  • As an estimation of the complexity of the datasets produced in the second experiment (Exp002), we determined the number of sequencing reads mapping to each 5 kb window by ChIP-seq of RNAPII-S5p, using a published ChIP-seq dataset obtained in mouse ESCs (Brookes et al. 2012). The top 2% of 5 kb windows were taken as the genomic windows "most enriched for RNAPII-S5p". The percentage of "RNAPII-S5p most enriched windows" identified as positive in each GAM-chIP sample was determined (Fig. 13C). The percentage of most enriched windows identified as positive in each GAM-chIP dataset was highest for GAM-chIP samples with larger amounts of input DNA, but was 2-16%, i.e.
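The selection of the most enriched windows and the per-sample detection percentage can be illustrated as follows. This is a sketch under assumptions: ChIP-seq read counts are given as a dictionary keyed by window identifier, and the function names are hypothetical.

```python
def most_enriched_windows(chip_counts, top_fraction=0.02):
    """Return the top fraction of windows ranked by ChIP-seq read count."""
    ranked = sorted(chip_counts, key=chip_counts.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_top])


def percent_detected(enriched, positive_windows):
    """Percentage of the enriched windows called positive in one sample."""
    return 100.0 * len(enriched & set(positive_windows)) / len(enriched)


# Toy data: 100 windows whose ChIP-seq count equals their index.
chip_counts = {i: i for i in range(100)}
enriched = most_enriched_windows(chip_counts)
pct = percent_detected(enriched, positive_windows=[99, 5])
```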
  • the exploratory GAM-chIP RNAPII-S5p dataset consisted of 182 samples containing 1 pg of ChIP DNA, four samples with 500 pg DNA (positive controls) and four samples without DNA (negative controls). Positive windows were identified for each of these 190 samples as outlined above for the other GAM-chIP datasets. Positive windows were examined in the UCSC Genome Browser and compared to the raw sequencing data, confirming that the window-calling approach was performing sensibly (Fig. 14). We confirmed that each GAM-chIP sample contained only a subset of 5 kb windows, whilst very few positive windows were identified for the negative control samples, in support of the feasibility of the approach.
  • the 182 GAM-chIP samples were collected in two batches, each of which was further divided into four pools for independent sequencing to achieve sufficient sequencing depth.
  • the first four pools were WGA amplified immediately after ChIP; the second four pools were WGA amplified from the same ChIP material following storage at -20°C after the aliquoting step but before WGA amplification.
  • This collection of GAM-chIP samples gave a total of eight pools, each containing around 24 GAM-chIP samples.
  • quality control of purity of the amplified material from very small amounts of mouse DNA fragments, i.e. 1 pg
  • the percentage of sequencing reads from each library that could be successfully mapped back to the mouse genome was plotted by library pool number (Fig. 15A).
  • the negative control samples yielded very low percentages of mapped reads to the mouse genome, indicating that they were not contaminated by mouse DNA (e.g. from the GAM-chIP samples processed in parallel) during the WGA amplification or library preparation steps.
  • Positive control samples (each with 500 pg of DNA) yielded the highest percentage of mapped reads (85% on average), whilst 178 out of 182 GAM-chIP libraries showed robust read mapping rates to the mouse genome of >70%.
  • the distribution of the percentage of mapped reads was highly reproducible between samples and between sequencing pools. In particular, pools 5 to 8 did not yield a smaller percentage of mapped reads than pools 1 to 4, indicating that they were not affected by the addition of the freezing step (Fig. 15A).
  • each GAM-chIP sample contains only a restricted subset of sequences from each chromosome (Fig. 15B).
  • No GAM-chIP sample contains more than 12% of any given chromosome, and all chromosomes are comparable in coverage except for chromosome X, which is present in only a single copy (whereas autosomal chromosomes are present in two copies), as expected in the male ESC line used.
  • RNAPII-S5p antibodies shows abundant detection of DNA fragments co-occupied by RNA polymerase II phosphorylated on Serine-5
  • RNAPII-S5p is most abundant at actively transcribed genes, and in particular at their promoters (Fig. 12A). To confirm that the promoters of genes more highly bound by RNAPII-S5p are also more frequently detected in GAM-chIP samples, 5 kb windows overlapping gene promoters were identified and sorted into five equal groups (quantiles) according to the occupancy of RNAPII-S5p (as determined by ChIP-seq, published dataset from Brookes et al. 2012; Fig. 15C). As expected, the detection frequency of 5 kb windows that overlap gene promoters (also called transcription start sites or TSSes) increases with increased chromatin occupancy of RNAPII-S5p.
  • TSSes transcription start sites
  • The TSS-overlapping 5 kb windows with the lowest binding of RNAPII-S5p are detected in 4.4% of GAM-chIP samples on average, whereas those windows with the highest binding are detected in an average of 12.5% of GAM-chIP samples (Fig. 15C).
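The grouping of promoter windows into five equal groups by RNAPII-S5p occupancy, followed by averaging the detection frequency per group, can be sketched as below. The dictionary inputs and the function name are assumptions for illustration only.

```python
def mean_detection_by_quantile(occupancy, detection_freq, n_groups=5):
    """Mean detection frequency per occupancy group, lowest occupancy first.

    occupancy: window -> ChIP-seq occupancy (hypothetical input format).
    detection_freq: window -> fraction of samples in which it was detected.
    """
    windows = sorted(occupancy, key=occupancy.get)
    if len(windows) < n_groups:
        raise ValueError("need at least one window per group")
    group_size = len(windows) // n_groups
    means = []
    for g in range(n_groups):
        start = g * group_size
        # The last group absorbs any remainder.
        end = (g + 1) * group_size if g < n_groups - 1 else len(windows)
        group = windows[start:end]
        means.append(sum(detection_freq[w] for w in group) / len(group))
    return means


# Toy data in which detection frequency rises with occupancy.
occupancy = {f"w{i}": i for i in range(10)}
detection = {f"w{i}": i / 100 for i in range(10)}
group_means = mean_detection_by_quantile(occupancy, detection)
```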
  • Future experiments will include the use of larger DNA fragment amounts per sample, to reach detection of genomic windows most abundantly occupied by RNAPII-S5p closer to the optimal 0.5 frequency of detection of each fragment, which will provide optimal chromatin contact information from the least number of samples (as expected from linear HAPPY Mapping).
  • One possible use for GAM-chIP is to identify enhancers regulating the expression of given genes.
  • RNAPII-S5p is expected to be found at transcriptionally expressed genes and enhancers but not transcriptionally silent genes (Fig. 12A,B), and was therefore chosen as a suitable target for the exploratory GAM-chIP experiment in order to increase the potential to identify interactions within and between enhancers and active genes.
  • the use of different proteins for immunoprecipitation may yield optimal co-segregation of promoters and their target enhancers.
  • mouse genes were ranked according to their expression level, as determined by mRNA-seq. The top 25% of genes were selected as most actively transcribed genes, whilst the bottom 25% of genes were selected as transcriptionally silent genes. 5 kb windows were identified that overlapped the gene body, transcription start site (TSS) or transcription end site (TES) of genes in the top or bottom 25% by expression.
  • TSS transcription start site
  • TES transcription end site
  • the percentage of 5 kb windows overlapping each feature that were identified as positive was plotted for each of the 182 GAM-chIP samples and compared to the percentage of all 5 kb windows or of 5 kb windows overlapping enhancers detected as positive in each sample (Fig. 15D).
  • 5 kb windows overlapping the gene body, TSS or TES of a silent gene were detected slightly less frequently than the average for all 5 kb windows.
  • chromatin contacts can form within the bodies of actively transcribed genes (Larkin, Cook & Papantonis, 2012). This means that distant regions within the same gene should be crosslinked both to each other and to RNAPII-S5p.
  • GAM-chIP identifies the presence or absence of genomic loci across a collection of tubes. If actively transcribed genes interact with themselves during transcription, some tubes will contain many chromatin fragments derived from the same gene, which were crosslinked to each other during the fixation step. Alternatively, if actively transcribed genes do not interact with themselves, a smaller number of tubes will contain multiple windows from the same gene by chance alone.
  • GAM-chIP detects co-association of actively transcribed genes with nearest candidate enhancer regions
  • Genomic windows overlapping enhancers should therefore co-segregate in the same GAM-chIP samples as the genomic windows overlapping their target genes. Furthermore, since different parts of each gene also contact themselves during transcription, GAM-chIP samples containing multiple positive windows from the same gene are the most likely to have originated from the gene during its transcription cycle and therefore likely to additionally co-segregate with the enhancer.
  • For each gene, we ordered the GAM-chIP samples according to the proportion of intragenic windows detected. GAM-chIP samples which contain many positive windows from the same active gene often also contain a nearby enhancer, whereas GAM-chIP samples containing few positive windows from the same gene are less likely to additionally contain the enhancer (Fig. 17A). In contrast, this behaviour is not expected for silent genes, since these genes are not expected to contact nearby regions classified as enhancers in mouse ESCs. For silent genes, the detection of a nearby enhancer is often uncorrelated with the detection of the gene itself (Fig. 17B). With a larger collection of GAM-chIP samples, each produced from fragment frequencies closer to 0.5, it should be possible to assign enhancers to their target genes based on the correlation of detection of the enhancer with detection of the gene across the collection of samples.
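One way to quantify the correlation between enhancer detection and gene detection across samples is a plain Pearson correlation. The text does not name a specific statistic, so this choice, the function name and the toy data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for one active gene across six GAM-chIP samples:
# fraction of intragenic windows detected, and enhancer presence (0/1).
intragenic_fraction = [0.0, 0.1, 0.2, 0.5, 0.6, 0.8]
enhancer_detected = [0, 0, 0, 1, 1, 1]
r = pearson(intragenic_fraction, enhancer_detected)
```

For an active gene contacting its enhancer, `r` would be expected to be clearly positive; for a silent gene it would be expected to sit near zero.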
  • GAM-ch samples with ~0.2 and ~10 genomes were subjected to WGA and detected by next-generation sequencing.
  • the sequencing profile of the GAM-ch-0.2 sample has distinct islands across the genome whereas linear DNA at high concentration is evenly distributed (Fig. 6).
  • the sequencing profile of ~0.2 genomes suggests that only a sub-fraction of the genome is captured, which is then frequently sequenced, as expected (Fig. 8B).
  • the threshold of signal detection of positive windows above background was 13 reads (~940 nts) for 4 kb windows, resulting in 45×10^3 to 50×10^3 windows of 4 kb passing the threshold (Fig. 10).
  • 45×10^3 to 50×10^3 windows of 4 kb correspond to a total of 1.8×10^8 to 2×10^8 nts (out of 2.6×10^9 bp in the total mouse genome including repetitive sequences). If ~0.2 genomes are dispensed across tubes, each molecule has a probability of 0.18 of being present in each tube assuming a Poisson distribution, which would correspond to ~4.7×10^8 bp.
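The arithmetic in this passage can be reproduced directly. Under a Poisson model with a mean of 0.2 copies per tube, the probability that a locus is present at least once is 1 - exp(-0.2). The variable names below are illustrative only.

```python
import math

GENOME_SIZE_BP = 2.6e9   # total mouse genome, as stated in the text
MEAN_COPIES = 0.2        # ~0.2 genomes dispensed per tube
WINDOW_BP = 4_000        # 4 kb windows

# Poisson probability that a given locus is present at least once in a tube.
p_present = 1.0 - math.exp(-MEAN_COPIES)

# Expected amount of genome represented in one tube.
expected_bp = p_present * GENOME_SIZE_BP

# Observed coverage implied by 45,000 to 50,000 positive 4 kb windows.
observed_low_bp = 45e3 * WINDOW_BP
observed_high_bp = 50e3 * WINDOW_BP

print(f"P(present) = {p_present:.2f}")
print(f"expected = {expected_bp:.2e} bp")
print(f"observed = {observed_low_bp:.1e} to {observed_high_bp:.1e} bp")
```

The observed 1.8×10^8 to 2×10^8 nts are therefore roughly 40% of the ~4.7×10^8 bp expected under this model.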
  • Identifying contacts between active genes and their regulatory regions is a major current challenge, especially as there is evidence for complex interactions between clustered enhancers and their target genes (Fig. 11A).
  • 3C-based technologies underestimate contacting partners of most complex interactions (i.e. interactions involving three or more fragments; O'Sullivan et al. 2013; Fig. 1).
  • FISH in interphase nuclei is limited by sensitivity of detection which requires that probes cover several kilobase pairs of genomic sequence, and by spatial resolution, which is limited to detect interactions between genomic sequences separated by several tens of kilobase pairs.
  • Novel ligation-free technologies should help detect enhancers that participate in the most complex interactions (Fig. 11B).
  • GAM-chIP after RNAPII-S5p ChIP can be performed reliably for different amounts of DNA, especially for 1 pg of DNA yielding GAM-chIP libraries with low complexity (2-10% of detection of 5 kb genomic windows; Fig. 13, 14, 15).
  • the GAM-chIP libraries produced were enriched for genomic windows containing active genes, including windows covering the gene promoters (TSS) and the gene termination sites (TES) (Fig. 15C,D). 5 kb genomic windows containing candidate enhancers were also more likely to be detected in the pool of positive windows in each GAM-chIP dataset (Fig. 15D), consistent with the presence of RNAPII-S5p at these regulatory regions.
  • Murine fetal liver and fetal brain were dissected from E14.5 wildtype mouse embryos as described previously (Hagege et al. 2007) and processed in parallel for 3C and GAM-ch. The quality of the resulting 'chromatin' preparation was determined using a chromosome conformation capture (3C)-qPCR assay, performed as previously described (Hagege et al. 2007), on the mouse β-globin gene cluster as a reference locus.
  • 3C chromosome conformation capture
  • Mouse fetal liver and brain tissue from 14.5 dpc embryos were dissected and processed into a single cell suspension as previously described (Hagege et al. 2007), resulting in a single-cell sample containing approximately 2×10^7 cells/mL in 10% (v/v) heat inactivated fetal calf serum in PBS.
  • Cells were fixed by addition of 2% formaldehyde/10% FCS/PBS and incubated for 5 or 10 min at room temperature. The crosslinking reaction was then quenched by addition of 1 M glycine solution to give 0.14 M final concentration.
  • restriction enzyme buffer 500 µL; NEB2 buffer
  • 7.5 µL of 20% (w/v) SDS solution was added to a final concentration of 0.3%, and incubated (1 h, shaking at 900 rpm) to increase chromatin accessibility for restriction enzyme digestion.
  • 50 µL of 20% Triton X-100 solution were added (2% final concentration) and incubated at 37°C (1 h, shaking) to sequester SDS.
  • HindIII 400 units; BioLab
  • digestion was performed overnight (37°C, shaking) followed by addition of 40 µL of 20% SDS solution (1.6% final concentration) and incubation at 65°C (20 min) to inactivate HindIII.
  • Aliquots of undigested and digested chromatin were taken for subsequent analysis of digestion efficiency.
  • the digested nuclei were transferred to a 50 mL Falcon tube and diluted in 6.125 mL of ligation buffer (66 mM Tris-HCl, pH 7.5; 5 mM DTT; 5 mM MgCl2; 1 mM ATP). After addition of 375 µL of 20% (v/v) Triton X-100 solution (1% final concentration), nuclei were incubated (1 h, shaking at 37°C). T4 DNA ligase (Promega) was added (100 Units) and ligation was performed at 16°C for 4 h.
  • Reversal of crosslinks was performed by addition of 30 µL of 10 mg/mL proteinase K (300 µg total; Sigma) and incubation at 65°C overnight, followed by RNase incubation (300 µg total; Roche) at 37°C (1 h), and by phenol-chloroform extraction and ethanol precipitation (Sigma).
  • the 3C material was desalted using Micro Bio-Spin P-30 chromatography columns (BioRad) before qPCR. Each qPCR reaction was performed with ~120 ng of 3C material.
  • Quantitative real-time PCR (MJ MiniOpticon, BioRad) was performed with Platinum Taq DNA Polymerase (Invitrogen) and double-dye oligonucleotides (5'FAM + 3'TAMRA) as TaqMan probes, using the following amounts per reaction: 0.1 µL Taq polymerase from kit; 2.5 µL 10× Taq buffer from kit; 0.75 µL MgCl2 (final 1.5 mM) from kit; 0.5 µL (final 200 µM); 0.25 µL of each primer (from stock solution of 0.29 µg/µL); 0.025 µL Taq probe (final 2.5 pmol); 1-2 µL DNA template, adjusted to 25 µL with H2O.
  • a real-time qPCR (95°C for 10 min; 40 cycles of 95°C for 30 seconds, 58°C for 15 seconds and 72°C for 15 seconds) with SYBR Green was performed with the undigested (UND) and digested (D) samples using 2× PCR mix (Promega) on the MJ MiniOpticon PCR engine (BioRad).
  • primer sets that amplify across each restriction site of interest (R) were used.
  • internal primers (C) not containing a restriction site were used.
  • Preparation of crosslinked nuclei from mouse fetal liver cells for GAM-ch is similar to that for 3C. Briefly, fetal liver cells were resuspended in 2% formaldehyde/10% FCS/PBS and the reaction was quenched with glycine after 10 min. Fixed cells were lysed in cold lysis buffer, and nuclei were spun as for 3C (as described above).
  • sonication buffer 50 mM HEPES pH 7.9, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% SDS
  • Nuclei were sonicated in 2.5 mL aliquots using a Bioruptor (Diagenode) for 30 min at 30 s on/off intervals at medium energy.
  • mouse fetal liver cells were embedded into DNA agarose strings at a density of ~1×10^7 cells/mL (~2×10^5 genomes/cm; prepared according to Dear P.H. et al. 1998. A high-resolution metric HAPPY map of human chromosome 14. Genomics 48:232). Agarose strings of distinct length were melted in 0.5x PCR buffer II (68°C, 10 min) and DNA was diluted in molecular biology-grade H2O (Sigma) into aliquots of ~100 genomes/µL and stored at -20°C.
  • Mouse embryonic stem cells (ESCs; 46C cell line, male) were grown in ESGRO medium (Merck, SF001-500P) supplemented by 1000 units/ml LIF (Merck), and chromatin prepared as previously described (Stock et al, 2007). Briefly, cells were treated with 1% formaldehyde (37°C, 10 min) and the reaction stopped with addition of glycine to a final concentration of 0.125 M. Cells were washed in ice-cold PBS, before "swelling" buffer (25 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl and 0.1% NP-40) was added to lyse the cells (10 min, 4°C).
  • ESGRO medium Merck, SF001-500P
  • LIF Merck
  • Protein-G-magnetic beads were first incubated with rabbit anti-mouse (IgG+IgM) bridging antibodies (Jackson Immunoresearch; 10 µg per 50 µL beads) for 1 h at 4°C and washed with sonication buffer. Seven hundred µg of chromatin was immunoprecipitated (4°C, overnight) with 10 µg of RNAPII-S5p antibody (clone CTD-4H8, Covance) and 50 µL magnetic beads. ChIP washes and elutions after immunoprecipitation were performed as described previously (Stock et al, 2007).
  • crosslinked DNA-protein complexes were eluted twice from beads (65°C, 5 min; and room temperature, 15 min) with 50 mM Tris-HCl pH 8.0, 1 mM EDTA and 1% SDS.
  • Half of the eluted immunoprecipitated chromatin was diluted into multiple tubes (based on the measured DNA concentration in the other half of eluted chromatin).
  • To measure DNA concentration, half of the eluted chromatin was incubated overnight at 65°C with addition of NaCl (160 mM final concentration) and RNase A (20 µg/mL; Sigma) to reverse cross-linking.
  • Oct4 promoter F GGCTCTCCAGAGGATGGCTGAG (SEQ ID NO: 1)
  • Oct4 promoter R TCGGATGCCCCATCGCA (SEQ ID NO: 2)
  • Nkx2.2 promoter F CAGGTTCGTGAGTGGAGCCC (SEQ ID NO: 5)
  • Nkx2.2 promoter R GCGCGGCCTCAGTTTGTAAC (SEQ ID NO: 6)
  • HoxA7 promoter R CCGACAACCTCATACCTATTCCTG (SEQ ID NO: 10)
  • Illumina libraries were prepared for HT sequencing from WGA-amplified GAM-ch DNA.
  • WGA-amplified GAM-ch samples were fragmented using a Covaris shearing system before library preparation.
  • Illumina libraries were size selected on agarose gels, enabling visualisation of the amplified DNA fragments, and therefore more careful extraction of appropriate sized fragments.
  • QIAgen Gel Extraction kit libraries were quantified by QuBit (Invitrogen) and qPCR, and library size was analysed by Bioanalyser (Agilent). Fragment sizes were within the expected size distribution of 210-600 bp (including adapters) for all libraries.
  • Chromatin precipitated with antibodies against RNAPII-S5p was quantified fluorimetrically with PicoGreen (Molecular Probes, Invitrogen) and diluted into multiple tubes (see Table 2 for amounts).
  • DNA was extracted by WGA, first by incubation in WGA fragmentation buffer containing PK for 2 h (Exp.001 and Exp.002) or 8 h (Exp.003); subsequent steps were carried out according to the manufacturer's specifications.
  • Amplified DNA was purified with MinElute 96 UF PCR Purification Kit (Qiagen) according to manufacturer's instructions.
  • DNA fragments from 300-500 bp were size-selected with Agencourt AMPure XP (Beckman Coulter) and the final DNA concentration was determined by PicoGreen fluorimetry (Molecular Probes, Invitrogen) and subjected to Illumina TruSeq Nano library preparation (GAM-chIP Exp001, GAM-chIP Exp002; Table 2) or to Illumina Nextera XT library preparation (GAM-chIP Exp003; Table 2).
  • GAM-ch libraries (4-12 pM) were loaded onto the Genome Analyser flow cell.
  • the single-stranded DNA fragments bind randomly across the surface of the flow cell due to hybridisation between the adaptor sequences added to DNA ends during library preparation, and the oligonucleotides that coat the flow cell.
  • Polymerase-based extension converts each fragment to a cluster of approximately 1000 identical fragments.
  • the amount and size of DNA fragments loaded on to the flow cell was optimised to obtain the highest number of non-overlapping clusters following cluster generation.
  • Clusters were then sequenced by synthesis, using adaptor-specific primers and incorporation of fluorescent nucleotides. Digital images were taken at each round of nucleotide incorporation and the unique fluorescent signal assigned to each nucleotide enables its correct identification. Sequential images of a given cluster therefore represent the fragment sequence.
  • GAM-chIP libraries sequenced on the HiSeq or MiSeq were not imaged for the first thirty sequencing cycles (known as dark cycles) in order to avoid issues relating to low sequence diversity in the WGA adaptor. This step avoids the need for trimming reads after sequencing used in earlier GAM-ch datasets (Fig. 5A).
  • DNA reads were first aligned to the reference mouse genome (assembly mm9) using Illumina Extended software (pipeline 1.6), allowing at most two mismatches and unique matches only. Unaligned reads were then trimmed at their 5' or 3' end and aligned to the mm9 genome using Bowtie software, version 0.9.8.1 (Langmead B. et al. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10, R25).
  • DNA reads were first aligned to the reference mouse genome (assembly mm10) using Bowtie2 and enforcing a minimum mapping quality of 20. Read depth of coverage was calculated using bedtools multibamcov (Quinlan & Hall 2010, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:6). Curve fitting was performed in Python using the fmin function from scipy. A combination of two distributions was fitted to the histogram of the number of reads per window.
  • a negative binomial distribution represents sequencing noise, and the parameters of the fit for this distribution were used to determine a threshold number of reads X where the probability of observing more than X reads mapping to a single genomic window by chance was less than 0.001. Such a threshold was thus independently determined for each sample, and windows were scored as positive if the number of sequenced reads was greater than the determined threshold.
  • a lognormal distribution representing true signal
  • positive windows were also called using JAMM (Ibrahim et al, 2015) in the peak mode with default settings.
  • ChIP-seq libraries for RNAPII-S5p and control (using non-specific antibody against plant steroid digoxigenin) were prepared from 10 ng of immunoprecipitated DNA (as measured by PicoGreen quantification) with corresponding antibodies using the NEBNext ChIP-Seq Library Prep Master Mix Set for Illumina (NEB, #E6240) following the NEB protocol, with some modifications.
  • the intermediate products from the different steps of the NEB protocol were purified using MinElute PCR purification kit (Qiagen, #28004).
  • Adaptors, PCR amplification primers and indexing primers were from the Multiplexing Sample Preparation Oligonucleotide Kit (Illumina, # PE-400-1001).
  • Samples were PCR amplified prior to size selection of DNA fragments (250-600 bp) on an agarose gel. After purification by QIAquick Gel Extraction kit (Qiagen, #28704), libraries were quantified by qPCR using Kapa Library Quantification Universal Kit (Kapa Biosystems, #KK4824). Library size distribution was assessed by 2100 Bioanalyzer (Agilent) with High Sensitivity DNA analysis Kit (Agilent, #5067-4626) before high-throughput sequencing. Libraries were quantified by Qubit and sequenced on Illumina HiSeq2000 (single-end sequencing, 51 nucleotides), according to the manufacturer's instructions.
  • Sequenced reads were aligned to the mouse genome (assembly mm10, December 2011) using Bowtie2 version 2.0.5 (Langmead and Salzberg, 2012), with default parameters. Duplicated reads (i.e. identical reads, aligned to the same genomic location) occurring more often than a threshold were removed. The threshold is computed for each dataset as the 95th percentile of the frequency distribution of reads.
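The duplicate filter can be sketched as follows. Two points are assumptions rather than statements from the text: the percentile is computed with the nearest-rank method, and reads whose multiplicity exceeds the threshold are dropped entirely (capping them at the threshold would be an equally plausible reading). The read representation and function names are hypothetical.

```python
from collections import Counter


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a collection (pct given as an integer, 0-100)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def filter_duplicates(reads, pct=95):
    """Drop reads whose duplicate count exceeds the pct-th percentile.

    reads: list of hashable read descriptors, e.g. (chrom, start, strand).
    Returns the kept reads and the threshold that was applied.
    """
    counts = Counter(reads)
    threshold = percentile_nearest_rank(counts.values(), pct)
    kept = [read for read in reads if counts[read] <= threshold]
    return kept, threshold


# Toy dataset: 19 unique reads plus one read duplicated 50 times.
reads = [("chr1", i, "+") for i in range(19)] + [("chr1", 100, "+")] * 50
kept_reads, dup_threshold = filter_duplicates(reads)
```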
  • For RNAPII-S5p and control ChIP enrichment at enhancers, the list of enhancers from Whyte et al. 2013 was used.
  • TPM Transcripts per Million
  • Genes in the top 25% by expression were classified as active, whilst genes in the bottom 25% by expression were classified as silent.
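The active/silent classification amounts to taking the top and bottom quarters of a ranking by TPM. A minimal sketch, with hypothetical gene names and an assumed dictionary input:

```python
def classify_by_expression(tpm, fraction=0.25):
    """Split genes into active (top fraction) and silent (bottom fraction) by TPM."""
    ranked = sorted(tpm, key=tpm.get)          # lowest expression first
    n = int(len(ranked) * fraction)
    silent = set(ranked[:n])
    active = set(ranked[len(ranked) - n:])
    return active, silent


# Hypothetical TPM values for eight genes.
tpm = {"g1": 0.0, "g2": 0.1, "g3": 1.5, "g4": 4.0,
       "g5": 12.0, "g6": 30.0, "g7": 95.0, "g8": 250.0}
active_genes, silent_genes = classify_by_expression(tpm)
```

Ties at the quartile boundary are broken by insertion order here; a real analysis might handle them explicitly.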
  • paired-end (2×100 bp) reads from mRNA-seq were aligned against the mouse genome using STAR (Spliced Transcripts Alignment to a Reference, v2.4.2a; Dobin et al, 2013) and expression levels were estimated in TPM with RSEM (RNA-Seq by Expectation-Maximization, v1.2.25; Li and Dewey, 2011).
  • the reference for STAR and RSEM was produced from the Mouse Genome version mm10, providing the gtf annotation from UCSC Known Genes (mm10, version 6) and associated isoform-gene relationship information from the Known Isoforms table. Both tables were downloaded from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables).
  • the detection frequency of each window overlapped by the gene, ± one window upstream/downstream, was calculated as the number of GAM-chIP samples in which the window was detected divided by the total number of GAM-chIP samples. Since each window is detected with a different frequency, each window can be described by its own binomial distribution.
  • the expected distribution of the number of positive windows from the same gene detected simultaneously in a single GAM-chIP sample was calculated as the convolution of the binomial distributions for each component window.
  • the average expected number of positive windows per GAM-chIP sample was calculated as the sum of the window detection frequencies. For each gene, the number of tubes with more than double this average was counted and compared to the expected number of tubes with more than double the average. The distribution of observed vs. expected values was plotted and compared between active genes and silent genes.
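The expected distribution described above (the convolution of one presence/absence distribution per window) can be sketched as follows. The sketch assumes windows are detected independently under the null model; the function names and toy numbers are illustrative only.

```python
def detection_frequency(calls):
    """Fraction of samples in which a window was detected (calls are 0/1)."""
    return sum(calls) / len(calls)


def convolve_window_distributions(freqs):
    """Distribution of the number of positive windows in one sample.

    Convolves one presence/absence distribution per window, assuming
    independence between windows. pmf[k] = P(exactly k windows positive).
    """
    pmf = [1.0]
    for p in freqs:
        new = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            new[k] += prob * (1.0 - p)   # this window absent
            new[k + 1] += prob * p       # this window present
        pmf = new
    return pmf


def expected_tubes_above_double(freqs, n_tubes):
    """Expected number of tubes with more than double the average window count."""
    mean = sum(freqs)  # average expected number of positive windows per tube
    pmf = convolve_window_distributions(freqs)
    tail = sum(prob for k, prob in enumerate(pmf) if k > 2 * mean)
    return n_tubes * tail


# Toy gene with three windows, each detected in 10% of 100 tubes.
expected = expected_tubes_above_double([0.1, 0.1, 0.1], n_tubes=100)
```

The observed count of such tubes for a gene would then be compared with `expected`; an excess for active genes is what intragenic contacts would predict.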
  • Enhancer loops appear stable during development and are associated with paused polymerase. Nature. 2014 Aug 7;512(7512):96-100.
  • Chromatin Interaction Analysis with Paired-End Tag (ChlA-PET) sequencing technology and application. BMC Genomics. 2014;15 Suppl 12:S11.
  • CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus.

Abstract

La présente invention concerne le domaine de l'analyse de la structure tridimensionnelle du génome, c'est-à-dire, la cartographie d'architecture du génome sur chromatine (GAM-ch). L'invention concerne un procédé permettant de déterminer l'interaction d'une pluralité de loci d'acides nucléiques dans un compartiment comprenant des acides nucléiques, comme le noyau de la cellule, consistant à séparer des acides nucléiques les uns des autres en fonction de leur interaction dans le compartiment par réticulation d'acides nucléiques les uns avec les autres directement ou indirectement, fragmenter les acides nucléiques du compartiment pour obtenir des fragments et/ou des complexes de fragments réticulés, et diviser les acides nucléiques fragmentés pour obtenir une collection de fractions de telle sorte que chaque fraction contienne, en moyenne, moins d'une copie de chaque locus ; déterminer la présence ou l'absence de la pluralité de loci dans lesdites fractions ; et déterminer la co-ségrégation de ladite pluralité de loci dans les fractions. La co-ségrégation peut alors être analysée avec des méthodes statistiques pour déterminer les interactions. Le procédé peut être utilisé par exemple, pour identifier la fréquence d'interactions sur une population de cellules entre une pluralité de loci ; et cartographier l'architecture des loci et/ou du génome, par exemple, dans le noyau, un organite, un micro-organisme ou un virus ; identifier des régions régulatrices dirigeant l'expression d'un gène spécifique par l'intermédiaire de contacts spatiaux ; identifier les contacts spatiaux entre les loci qui dépendent de leur co-association avec une/des protéine(s) spécifique(s) ou l'ARN et/ou diagnostiquer une maladie associée à une co-ségrégation perturbée de loci. L'immunoprécipitation de la chromatine (ChIP) peut être combinée avec le procédé de l'invention.
PCT/EP2016/057025 2015-03-31 2016-03-31 Genome architecture mapping on chromatin WO2016156469A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP15161949 2015-03-31
EP15161949.1 2015-03-31

Publications (1)

Publication Number Publication Date
WO2016156469A1 2016-10-06

Family

ID=52811014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/057025 WO2016156469A1 (en) 2015-03-31 2016-03-31 Genome architecture mapping on chromatin

Country Status (1)

Country Link
WO (1) WO2016156469A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018045137A1 (en) * 2016-09-02 2018-03-08 Ludwig Institute For Cancer Research Ltd Genome-wide identification of chromatin interactions
CN111727248A (en) * 2017-09-25 2020-09-29 Fred Hutchinson Cancer Research Center High-efficiency targeted in situ genome-wide profiling
CN112599189A (en) * 2020-12-29 2021-04-02 北京优迅医学检验实验室有限公司 Data quality evaluation method for whole-genome sequencing and application thereof
EP3988669A1 (en) 2020-10-22 2022-04-27 Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft Method for nucleic acid detection by oligo hybridization and PCR-based amplification
CN114842914A (en) * 2022-04-24 2022-08-02 Shandong University Deep-learning-based chromatin loop prediction method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100081141A1 (en) * 2008-08-06 2010-04-01 University Of Southern California Genome-Wide Chromosome Conformation Capture
WO2012159025A2 (fr) * 2011-05-18 2012-11-22 Life Technologies Corporation Analyse de conformation de chromosome

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANA POMBO ET AL: "Three-dimensional genome architecture: players and mechanisms", NATURE REVIEWS MOLECULAR CELL BIOLOGY, vol. 16, no. 4, 11 March 2015 (2015-03-11), pages 245 - 257, XP055207128, ISSN: 1471-0072, DOI: 10.1038/nrm3965 *
DEAR P H ET AL: "HAPPY MAPPING: A PROPOSAL FOR LINKAGE MAPPING THE HUMAN GENOME", NUCLEIC ACIDS RESEARCH, INFORMATION RETRIEVAL LTD, vol. 17, no. 17, 12 September 1989 (1989-09-12), pages 6795 - 6807, XP000371654, ISSN: 0305-1048 *
JENNIFER L CRUTCHLEY ET AL: "Chromatin conformation signatures: ideal human disease biomarkers?", BIOMARKERS IN MEDICINE, vol. 4, no. 4, 1 August 2010 (2010-08-01), pages 611 - 629, XP055155789, ISSN: 1752-0363, DOI: 10.2217/bmm.10.68 *
PHILIPPE COLLAS: "The Current State of Chromatin Immunoprecipitation", MOLECULAR BIOTECHNOLOGY, vol. 45, no. 1, 1 May 2010 (2010-05-01), pages 87 - 100, XP055021496, ISSN: 1073-6085, DOI: 10.1007/s12033-009-9239-8 *
TOLHUIS B ET AL: "Looping and interaction between hypersensitive sites in the active beta-globin locus", MOLECULAR CELL, CELL PRESS, CAMBRIDGE, MA, US, vol. 10, no. 6, 1 December 2002 (2002-12-01), pages 1453 - 1465, XP002301469, ISSN: 1097-2765, DOI: 10.1016/S1097-2765(02)00781-5 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018045137A1 (en) * 2016-09-02 2018-03-08 Ludwig Institute For Cancer Research Ltd Genome-wide identification of chromatin interactions
CN111727248A (en) * 2017-09-25 2020-09-29 Fred Hutchinson Cancer Research Center High-efficiency targeted in situ genome-wide profiling
EP3988669A1 (en) 2020-10-22 2022-04-27 Max-Delbrück-Centrum für Molekulare Medizin in der Helmholtz-Gemeinschaft Method for nucleic acid detection by oligo hybridization and PCR-based amplification
WO2022084528A1 (en) 2020-10-22 2022-04-28 Max-Delbrück-Centrum Für Molekulare Medizin In Der Helmholtz-Gemeinschaft Method for nucleic acid detection by oligo hybridization and PCR-based amplification
CN112599189A (en) * 2020-12-29 2021-04-02 北京优迅医学检验实验室有限公司 Data quality evaluation method for whole-genome sequencing and application thereof
CN114842914A (en) * 2022-04-24 2022-08-02 Shandong University Deep-learning-based chromatin loop prediction method and system
CN114842914B (en) * 2022-04-24 2024-04-05 Shandong University Deep-learning-based chromatin loop prediction method and system

Similar Documents

Publication Publication Date Title
JP7127104B2 (en) Contiguity-preserving transposition
EP3334823B1 (en) Method and kit for generating CRISPR/Cas guide RNAs
KR102425438B1 (en) Genome-wide unbiased identification of DSBs evaluated by sequencing (GUIDE-Seq)
WO2016156469A1 (en) Genome architecture mapping on chromatin
US20200248229A1 (en) Unbiased detection of nucleic acid modifications
US11807896B2 (en) Physical linkage preservation in DNA storage
EP3230465B1 (en) Genome architecture mapping
JP2022095676A (en) Recovering long-range linkage information from preserved samples
Shipkovenska et al. A conserved RNA degradation complex required for spreading and epigenetic inheritance of heterochromatin
US20220136041A1 (en) Off-Target Single Nucleotide Variants Caused by Single-Base Editing and High-Specificity Off-Target-Free Single-Base Gene Editing Tool
WO2014193980A1 (en) Substantially unbiased amplification of genomes
AU2019214956A1 (en) Sample prep for DNA linkage recovery
Mulla et al. Aneuploidy as a cause of impaired chromatin silencing and mating-type specification in budding yeast
US20230032136A1 (en) Method for determination of 3d genome architecture with base pair resolution and further uses thereof
Pinglay et al. Synthetic genomic reconstitution reveals principles of mammalian Hox cluster regulation
Lin et al. DNA sequence preference for de novo centromere formation on a Caenorhabditis elegans artificial chromosome
Willemin et al. Context-independent function of a chromatin boundary in vivo
Herbst Scalable approaches for gene tagging and genome walking sequencing
Smith Genetic and Epigenetic Identity of Centromeres
Goldberg et al. Engineered transcription-associated Cas9 targeting in eukaryotic cells
Hannigan The Functions and Regulation of mRNA Processing During Male Germ Cell Development
EP4352251A2 (en) Compositions and methods for large-scale in vivo genetic screening
Belaghzal et al. HI-C 2.0: An Optimized Hi-C Procedure for High-Resolution Genome-Wide Mapping of Chromosome Conformation [preprint]
Fasolino Epigenomic And Nuclear Architectural Insights Into Rett Syndrome

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16712365; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 16712365; Country of ref document: EP; Kind code of ref document: A1)